
Those numbers seem to be TSC sampled in software from the moment it receives a full frame to the moment it starts sending a packet.
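
Roughly, I'd guess the measurement looks something like the sketch below (the handlers are hypothetical stand-ins, not the project's actual code):

    #include <cstddef>
    #include <cstdint>
    #include <x86intrin.h>   // __rdtscp

    // Placeholder handlers standing in for the real hot path (hypothetical).
    static void handle_frame(const uint8_t*, std::size_t) { /* parse, update state, decide */ }
    static void send_order() { /* hand the packet to the kernel-bypass TX API */ }

    // Software-only measurement: full frame available -> just before TX starts.
    static uint64_t measure_once(const uint8_t* frame, std::size_t len) {
        unsigned aux;
        const uint64_t t0 = __rdtscp(&aux);   // frame fully received (in software)
        handle_frame(frame, len);
        const uint64_t t1 = __rdtscp(&aux);   // about to initiate TX
        send_order();
        return t1 - t0;                       // cycles; convert via calibrated TSC frequency
    }

    int main() {
        uint8_t frame[64] = {};
        volatile uint64_t cycles = measure_once(frame, sizeof frame);
        (void)cycles;
    }

A timer like that only sees the software path; NIC, PCIe, and wire time never show up in it.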

The traditional way to measure performance in HFT is hardware timestamps on the wire, start of frame in to start of frame out.

With those measurements the performance is probably closer to 2us, which is usually the realistic limit of a non-trivial software trading system.



That’s a fair point, and I agree on wire-to-wire (SOF-in → SOF-out) hardware timestamps being the correct benchmark for HFT.

The current numbers are software-level TSC samples (full frame available → TX start) and were intended to isolate the software critical path, not to claim true market-to-market latency.

I’m actively working on mitigating the remaining sources of latency (ingress handling, batching boundaries, and NIC interaction), and feedback like this is genuinely helpful in prioritizing the next steps. Hardware timestamping is already on the roadmap so both internal and wire-level latencies can be reported side-by-side.
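
For anyone curious, a minimal sketch of the generic Linux route (a SIOCSHWTSTAMP ioctl plus SO_TIMESTAMPING) is below. The interface name is a placeholder, it assumes a NIC/driver with hardware timestamping support, and vendor kernel-bypass stacks expose NIC timestamps through their own APIs instead:

    #include <cstdio>
    #include <cstring>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <netinet/in.h>
    #include <linux/errqueue.h>     // struct scm_timestamping
    #include <linux/net_tstamp.h>   // SOF_TIMESTAMPING_*, hwtstamp_config
    #include <linux/sockios.h>      // SIOCSHWTSTAMP

    #ifndef SCM_TIMESTAMPING
    #define SCM_TIMESTAMPING SO_TIMESTAMPING
    #endif

    int main() {
        const char* ifname = "eth0";   // placeholder: set to the market-data NIC
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0) { std::perror("socket"); return 1; }

        // 1. Ask the driver to timestamp frames in NIC hardware.
        hwtstamp_config cfg = {};
        cfg.tx_type   = HWTSTAMP_TX_ON;
        cfg.rx_filter = HWTSTAMP_FILTER_ALL;
        ifreq ifr = {};
        std::strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_data = reinterpret_cast<char*>(&cfg);
        if (ioctl(fd, SIOCSHWTSTAMP, &ifr) < 0) std::perror("SIOCSHWTSTAMP");

        // 2. Ask the socket to deliver raw hardware RX timestamps as control messages.
        int flags = SOF_TIMESTAMPING_RX_HARDWARE | SOF_TIMESTAMPING_RAW_HARDWARE;
        setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof flags);

        // ... bind() to the feed's port, then receive:
        char data[2048], ctrl[512];
        iovec iov = { data, sizeof data };
        msghdr msg = {};
        msg.msg_iov = &iov;  msg.msg_iovlen = 1;
        msg.msg_control = ctrl;  msg.msg_controllen = sizeof ctrl;
        if (recvmsg(fd, &msg, 0) >= 0) {
            for (cmsghdr* c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
                if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMPING) {
                    auto* ts = reinterpret_cast<scm_timestamping*>(CMSG_DATA(c));
                    // ts->ts[2] is the raw NIC (wire-level) timestamp.
                    std::printf("hw rx: %lld.%09ld\n",
                                (long long)ts->ts[2].tv_sec, ts->ts[2].tv_nsec);
                }
            }
        }
        close(fd);
    }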

Appreciate you calling this out — guidance from people who’ve measured this properly is exactly what I’m looking for.


If that’s the case then 890ns is quite terrible. If for some reason you want to do this in software then the latency should be somewhere below 100ns.


That number is for a non-trivial software path (parsing, state updates, decision logic), not a minimal hot loop. Sub-100 ns in pure software usually means extremely constrained logic or offloading parts elsewhere. I agree there’s room to improve, and I’m working on reducing structural overheads, but this wasn’t meant to represent the absolute lower bound of what’s possible.


Just going over the PCIe bus to the NIC costs you 500-600ns with a kernel bypass stack.


It does not. If that were the case, round-trip wire-to-wire latency below 1.0-1.2 microseconds in software would be impossible, but it clearly is possible - see the benchmarks published by Solarflare, Exablaze, and others.


You mean like this one directly from AMD showing a median 1/2 RTT ef_vi latency of 590ns (for UDP)?

Using their latest-generation card that came out just a few months ago?

https://docs.amd.com/r/en-US/ug1586-onload-user/Latency-Test...


Not really. Often you can precompute your model and just do some kind of interpolation on price change, and get it done sub-1us wire-to-wire. A toy illustration of that pattern is sketched below.
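
All names and the model here are made up: the idea is just to evaluate the model off the hot path onto a price grid, so the hot path does only a lookup and a linear interpolation.

    #include <array>
    #include <cstddef>

    constexpr std::size_t kGridSize  = 4096;
    constexpr double      kPriceMin  = 90.0;
    constexpr double      kPriceStep = 0.01;

    std::array<double, kGridSize> g_quote;   // filled off the hot path

    // Slow model evaluation, run periodically outside the critical path (stand-in).
    double model(double price) { return 1.001 * price; }

    void precompute() {
        for (std::size_t i = 0; i < kGridSize; ++i)
            g_quote[i] = model(kPriceMin + static_cast<double>(i) * kPriceStep);
    }

    // Hot path: clamp, index, linear interpolation.
    inline double quote_for(double price) {
        double x = (price - kPriceMin) / kPriceStep;
        if (x < 0) x = 0;
        if (x > kGridSize - 2) x = kGridSize - 2;
        const std::size_t i = static_cast<std::size_t>(x);
        const double frac = x - static_cast<double>(i);
        return g_quote[i] + frac * (g_quote[i + 1] - g_quote[i]);
    }

    int main() {
        precompute();
        return quote_for(100.25) > 0 ? 0 : 1;
    }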


Just waiting for an MTU-sized frame to arrive off the network at 10 Gbps takes 1.2 us.
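
Back-of-the-envelope for the payload alone (preamble and the inter-frame gap add a little more on the wire):

    #include <cstdio>

    int main() {
        // Serialization delay of a full 1500-byte MTU payload at 10 Gbps.
        constexpr double bytes   = 1500.0;
        constexpr double bits    = bytes * 8.0;     // 12,000 bits
        constexpr double rate    = 10e9;            // 10 Gbit/s
        constexpr double seconds = bits / rate;     // 1.2e-6 s
        std::printf("%.2f us\n", seconds * 1e6);    // prints 1.20 us
    }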

Reacting to incomplete frames in software is possible, but realistically at this point just use FPGAs already.



