Oops, O_DIRECT does not actually make that big of a difference. I had updated my ad-hoc test to use O_DIRECT, but didn't check that write() now returned errors because of wrong alignment ;-)
As mentioned in the sibling comment, syncs are still slow. My initial 1-2ms number came from a desktop I bought in 2018, to which I added an NVME drive connected to an M.1 slot in 2022. On my current test system I'm seeing avg latencies of around 250us, sometimes a lot more (there a fluctuations).
# put the following in a file "fio.job" and run "fio fio.job"
# enable either direct=1 (O_DIRECT) or fsync=1 (fsync() after each write())
[Job1]
#direct=1
fsync=1
readwrite=randwrite
bs=64k # size of each write()
size=256m # total size written
As mentioned in the sibling comment, syncs are still slow. My initial 1-2ms number came from a desktop I bought in 2018, to which I added an NVME drive connected to an M.1 slot in 2022. On my current test system I'm seeing avg latencies of around 250us, sometimes a lot more (there a fluctuations).