Static search trees: 40x faster than binary search

The post introduces and optimizes a static search tree (S+ tree) for high-throughput searching of sorted data. The implementation combines several strategies: batching, prefetching, and optimized tree layouts. Optimization techniques include auto-vectorization, manual SIMD, and hugepages for better memory management. The result is a speedup of over 40x compared to traditional binary search, achieved by reducing the number of memory accesses and improving CPU efficiency.
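
For orientation, the binary search baseline mentioned above can be sketched in Rust roughly as follows. This is a minimal sketch, not the post's code: it assumes the task is to find the smallest value greater than or equal to a query in a sorted slice of `u32`, and the names `lower_bound` and `node_rank` are hypothetical. The second function illustrates, under the same assumption, a branch-free counting loop over a fixed-size node, which is the kind of code a compiler can often auto-vectorize.

```rust
/// Baseline (assumed problem shape): return the smallest value >= q
/// in a sorted slice, or None if every value is smaller than q.
fn lower_bound(sorted: &[u32], q: u32) -> Option<u32> {
    // `partition_point` binary-searches for the first index where the
    // predicate `x < q` becomes false.
    let idx = sorted.partition_point(|&x| x < q);
    sorted.get(idx).copied()
}

/// Hypothetical node-local search: count how many of the 16 sorted keys
/// in a node are < q. The count is the index of the first key >= q,
/// or 16 if there is none. There is no early exit, so the loop body is
/// the same for every key.
fn node_rank(node: &[u32; 16], q: u32) -> usize {
    node.iter().filter(|&&k| k < q).count()
}
```

Each step of the binary search touches a different part of the array, so on large inputs most steps are likely to miss the cache. Searching a small fixed-size node by counting instead reads one contiguous block of keys per step.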

#performance #rust #algorithms

Jan 01, 2025 • 30m read time • From curiouscoding.nl

Table of contents

1.1 Problem statement
1.2 Recommended reading
1.3 Binary search and Eytzinger layout
1.4 Hugepages
1.5 A note on benchmarking
1.6 Cache lines
1.7 S-trees and B-trees
2.1 Linear
2.2 Auto-vectorization
2.3 Trailing zeros
2.4 Popcount
2.5 Manual SIMD
3.1 Batching
3.2 Prefetching
3.3 Pointer arithmetic
3.4 Skip prefetch
3.5 Interleave
4.1 Left-tree
4.2 Memory layouts
4.3 Node size B = 15
4.4 Summary
5.1 Full layout
5.2 Compact subtrees
5.3 The best of both: compact first level
5.4 Overlapping trees
5.5 Human data
5.6 Prefix map
5.7 Summary
7.1 Future work