
Those numbers seem to be TSC sampled in software from the moment it receives a full frame to the moment it starts sending a packet.
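
Roughly, I'd guess the measurement looks something like the sketch below (the handlers are hypothetical stand-ins, not the project's actual code):

    #include <cstddef>
    #include <cstdint>
    #include <x86intrin.h>   // __rdtscp

    // Placeholder handlers standing in for the real hot path (hypothetical).
    static void handle_frame(const uint8_t*, std::size_t) { /* parse, update state, decide */ }
    static void send_order() { /* hand the packet to the kernel-bypass TX API */ }

    // Software-only measurement: full frame available -> just before TX starts.
    static uint64_t measure_once(const uint8_t* frame, std::size_t len) {
        unsigned aux;
        const uint64_t t0 = __rdtscp(&aux);   // frame fully received (in software)
        handle_frame(frame, len);
        const uint64_t t1 = __rdtscp(&aux);   // about to initiate TX
        send_order();
        return t1 - t0;                       // cycles; convert via calibrated TSC frequency
    }

    int main() {
        uint8_t frame[64] = {};
        volatile uint64_t cycles = measure_once(frame, sizeof frame);
        (void)cycles;
    }

A timer like that only sees the software path; NIC, PCIe, and wire time never show up in it.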

The traditional way to measure performance in HFT is hardware timestamps on the wire, start of frame in to start of frame out.

With those measurements the performance is probably closer to 2us, which is usually the realistic limit of a non-trivial software trading system.



That’s a fair point, and I agree on wire-to-wire (SOF-in → SOF-out) hardware timestamps being the correct benchmark for HFT.

The current numbers are software-level TSC samples (full frame available → TX start) and were intended to isolate the software critical path, not to claim true market-to-market latency.

I’m actively working on mitigating the remaining sources of latency (ingress handling, batching boundaries, and NIC interaction), and feedback like this is genuinely helpful in prioritizing the next steps. Hardware timestamping is already on the roadmap so both internal and wire-level latencies can be reported side-by-side.
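
For anyone curious, a minimal sketch of the generic Linux route (a SIOCSHWTSTAMP ioctl plus SO_TIMESTAMPING) is below. The interface name is a placeholder, it assumes a NIC/driver with hardware timestamping support, and vendor kernel-bypass stacks expose NIC timestamps through their own APIs instead:

    #include <cstdio>
    #include <cstring>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <netinet/in.h>
    #include <linux/errqueue.h>     // struct scm_timestamping
    #include <linux/net_tstamp.h>   // SOF_TIMESTAMPING_*, hwtstamp_config
    #include <linux/sockios.h>      // SIOCSHWTSTAMP

    #ifndef SCM_TIMESTAMPING
    #define SCM_TIMESTAMPING SO_TIMESTAMPING
    #endif

    int main() {
        const char* ifname = "eth0";   // placeholder: set to the market-data NIC
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0) { std::perror("socket"); return 1; }

        // 1. Ask the driver to timestamp frames in NIC hardware.
        hwtstamp_config cfg = {};
        cfg.tx_type   = HWTSTAMP_TX_ON;
        cfg.rx_filter = HWTSTAMP_FILTER_ALL;
        ifreq ifr = {};
        std::strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_data = reinterpret_cast<char*>(&cfg);
        if (ioctl(fd, SIOCSHWTSTAMP, &ifr) < 0) std::perror("SIOCSHWTSTAMP");

        // 2. Ask the socket to deliver raw hardware RX timestamps as control messages.
        int flags = SOF_TIMESTAMPING_RX_HARDWARE | SOF_TIMESTAMPING_RAW_HARDWARE;
        setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof flags);

        // ... bind() to the feed's port, then receive:
        char data[2048], ctrl[512];
        iovec iov = { data, sizeof data };
        msghdr msg = {};
        msg.msg_iov = &iov;  msg.msg_iovlen = 1;
        msg.msg_control = ctrl;  msg.msg_controllen = sizeof ctrl;
        if (recvmsg(fd, &msg, 0) >= 0) {
            for (cmsghdr* c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
                if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMPING) {
                    auto* ts = reinterpret_cast<scm_timestamping*>(CMSG_DATA(c));
                    // ts->ts[2] is the raw NIC (wire-level) timestamp.
                    std::printf("hw rx: %lld.%09ld\n",
                                (long long)ts->ts[2].tv_sec, ts->ts[2].tv_nsec);
                }
            }
        }
        close(fd);
    }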

Appreciate you calling this out — guidance from people who’ve measured this properly is exactly what I’m looking for.


If that’s the case then 890ns is quite terrible. If for some reason you want to do this in software then the latency should be somewhere below 100ns.


That number is for a non-trivial software path (parsing, state updates, decision logic), not a minimal hot loop. Sub-100 ns in pure software usually means extremely constrained logic or offloading parts elsewhere. I agree there’s room to improve, and I’m working on reducing structural overheads, but this wasn’t meant to represent the absolute lower bound of what’s possible.


Just going over the PCIe bus to the NIC costs you 500-600ns with a kernel bypass stack.


It does not. If that were the case, round-trip wire-to-wire latency below 1.0-1.2 microseconds in software would be impossible, but it clearly is possible - see the benchmarks published by Solarflare, Exablaze, and others.


You mean like this one directly from AMD showing a median 1/2 RTT ef_vi latency of 590ns (for UDP)?

Using their latest-generation card that came out just a few months ago?

https://docs.amd.com/r/en-US/ug1586-onload-user/Latency-Test...


Not really. Often you can precompute your model and just do some kind of interpolation on price change, and get it done sub-1us wire-to-wire. A toy illustration of that pattern is sketched below.
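
All names and the model here are made up: the idea is just to evaluate the model off the hot path onto a price grid, so the hot path does only a lookup and a linear interpolation.

    #include <array>
    #include <cstddef>

    constexpr std::size_t kGridSize  = 4096;
    constexpr double      kPriceMin  = 90.0;
    constexpr double      kPriceStep = 0.01;

    std::array<double, kGridSize> g_quote;   // filled off the hot path

    // Slow model evaluation, run periodically outside the critical path (stand-in).
    double model(double price) { return 1.001 * price; }

    void precompute() {
        for (std::size_t i = 0; i < kGridSize; ++i)
            g_quote[i] = model(kPriceMin + static_cast<double>(i) * kPriceStep);
    }

    // Hot path: clamp, index, linear interpolation.
    inline double quote_for(double price) {
        double x = (price - kPriceMin) / kPriceStep;
        if (x < 0) x = 0;
        if (x > kGridSize - 2) x = kGridSize - 2;
        const std::size_t i = static_cast<std::size_t>(x);
        const double frac = x - static_cast<double>(i);
        return g_quote[i] + frac * (g_quote[i + 1] - g_quote[i]);
    }

    int main() {
        precompute();
        return quote_for(100.25) > 0 ? 0 : 1;
    }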


Just waiting for an MTU-sized frame to arrive off the network at 10 Gbps takes 1.2 us.
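
Back-of-the-envelope for the payload alone (preamble and the inter-frame gap add a little more on the wire):

    #include <cstdio>

    int main() {
        // Serialization delay of a full 1500-byte MTU payload at 10 Gbps.
        constexpr double bytes   = 1500.0;
        constexpr double bits    = bytes * 8.0;     // 12,000 bits
        constexpr double rate    = 10e9;            // 10 Gbit/s
        constexpr double seconds = bits / rate;     // 1.2e-6 s
        std::printf("%.2f us\n", seconds * 1e6);    // prints 1.20 us
    }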

Reacting to incomplete frames in software is possible, but realistically at this point just use FPGAs already.



