Hacker News

It's a cool idea, but you have to keep in mind that binary searches are not optimal for b-tree nodes unless the node is really big. If you have a smaller node, < 32 or 64 elements, a simple linear search is faster. The reason has something to do with processor cache, branch mis-prediction, and pipelining.

A linear search can also be optimized to use SIMD, which is a big win.

If you really want to use a path hint, you could just say "start my linear search at the point where I left off last time", and if the first element is too big, wrap around to the beginning. But that might not buy you much.
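For integer keys, the kind of linear search that wins here is one with no early-exit branch, so the compiler can auto-vectorize it. A minimal sketch (function name and key type are illustrative, not from the article):

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-free linear search: count how many keys are < target.
   Because there is no data-dependent early exit, compilers can
   typically auto-vectorize this loop with SIMD compares. */
static size_t linear_search(const uint64_t *keys, size_t n, uint64_t target) {
    size_t lo = 0;
    for (size_t i = 0; i < n; i++)
        lo += (keys[i] < target);   /* no branch, just a compare+add */
    return lo;                      /* index of first key >= target */
}
```

The trade-off is that it always reads all n keys, which is exactly why it only pays off for small nodes.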



A fast binary search intended for B-tree key array searches does not have branches at all [1]. A B-tree interior node's key array is usually sized to fit in one cache line, so there isn't a difference in cache access pattern.

With 8-byte keys and the usual 64-byte cache line size, you end up with 8 keys per interior node's key array. That's a pretty bad range for the "simple linear search" most people would write, which is going to be dominated by the expected branch mispredict cost. The idea of using a hint to guess a starting position for either linear or binary search has the same issue. You need branches to capitalize on the best case vs worst case spread, and the mispredict cost will dominate for smaller arrays like this.

If you care about speed for this case, use a fast branchless binary search or a vectorized linear search (the latter will have an edge). An additional minor consideration for B-tree traversal is the latency of vector vs scalar loads, since you don't know the load address (except for the root node) until you've finished searching the parent node's key array. That constraint doesn't apply to vectorized searches of small arrays in general: when the vector load addresses are key-independent (as they are for the B-tree root node here), the few extra cycles of L1 load-to-use latency for vector loads (and higher load latencies in general) can often be hidden.

[1] By B-tree I am referring to in-memory B+ trees. A disk-based B+ tree would usually be built around page-granularity nodes. In that regime a branchless binary search for integer/float keys (with software prefetch to populate the cache ahead of the data dependency chains) that finishes off with a vectorized linear search is likely going to be the winner. The B-tree slack interval (M/2, M] helps since it guarantees a fully unrolled binary search with lg(M) steps won't waste any steps. You can still support "underflowing" B-tree nodes with the same branchless code path.
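A minimal sketch of the branchless binary search described above, assuming 8-byte integer keys. The ternary typically compiles to a conditional move rather than a branch, so there is nothing to mispredict; for a fixed node width like 8, the loop fully unrolls to lg(8) = 3 steps:

```c
#include <stddef.h>
#include <stdint.h>

/* Branchless binary search (lower bound): each step halves the window
   with a conditional move instead of a conditional branch.
   Returns the index of the first key >= target. */
static size_t branchless_search(const uint64_t *keys, size_t n, uint64_t target) {
    const uint64_t *base = keys;
    while (n > 1) {
        size_t half = n / 2;
        /* typically compiled to cmov, not a branch */
        base = (base[half - 1] < target) ? base + half : base;
        n -= half;
    }
    return (size_t)(base - keys) + (base[0] < target);
}
```

Note the loop trip count depends only on n, not on the keys, which is what makes full unrolling for a known node width possible.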


> It's a cool idea, but you have to keep in mind that binary searches are not optimal for b-tree nodes unless the node is really big. If you have a smaller node, < 32 or 64 elements, a simple linear search is faster. The reason has something to do with processor cache, branch mis-prediction, and pipelining.

I've heard that—and iirc Rust's BTreeMap type uses linear search—but does this depend on the type of key? Easy for me to see how that might be true for a BTreeMap<u64, V> but not a BTreeMap<String, V>. In the latter case, there's a pointer to chase for each comparison (and more/variable data to process; SIMD over multiple keys doesn't seem likely).


Yes, true. If your comparisons involve pointer-chasing or string comparison or anything complex, then it's not clear which strategy is faster. I always write the simplest possible code first, which is usually a linear search, benchmark it, and then try to beat the benchmark if it's warranted.


Isn't the binary search here over the whole tree, spanning multiple nodes? Each time we follow a pointer we discard a huge swath of the tree.

In a single node, a linear search is probably better, yes.


The "something" is data dependencies.


I agree... the whole point of a b-tree is to already subdivide the dataset into smaller, manageable sets. A binary search won't be much help on those slices if the tree is constructed properly.


As I understand it, the idea is that a path hint is a suggestion on which node to scan first, not a suggestion on how to binary search an individual node.


No, it'll still scan all the way from the root node down to the leaf nodes. The path hint simply says which element to start the binary search from in each of the nodes.
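One way to read that, per node: the hint is a remembered slot that seeds the search, and a single probe at the hinted position narrows the range the ordinary binary search has to cover. A hypothetical sketch (names and structure are illustrative, not the article's actual code), assuming integer keys:

```c
#include <stddef.h>
#include <stdint.h>

/* Hint-seeded search within one node: probe the hinted slot first,
   use the comparison to shrink [lo, hi), then finish with a normal
   binary search. The hint is updated for the next lookup. */
static size_t hinted_search(const uint64_t *keys, size_t n,
                            uint64_t target, size_t *hint) {
    size_t lo = 0, hi = n;
    if (*hint < n) {                     /* probe last position first */
        if (keys[*hint] < target) lo = *hint + 1;
        else                      hi = *hint;
    }
    while (lo < hi) {                    /* standard binary search on the rest */
        size_t mid = lo + (hi - lo) / 2;
        if (keys[mid] < target) lo = mid + 1;
        else                    hi = mid;
    }
    *hint = lo;                          /* remember for the next lookup */
    return lo;                           /* index of first key >= target */
}
```

When consecutive lookups hit near the same slot, the probe lands immediately and the loop does little work; when the hint misses, it degrades to a plain binary search.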



