The post introduces and optimizes a static search tree (S+ tree) to enhance the high-throughput searching of sorted data. The implementation involves various strategies such as batching, prefetching, and optimized tree layouts. The optimization techniques include auto-vectorization, manual SIMD, and using hugepages for better memory management. The post provides significant speedup (over 40x) compared to traditional binary search by reducing the number of memory access and improving CPU efficiency.
Table of contents
1.1 Problem statement1.2 Recommended reading1.3 Binary search and Eytzinger layout1.4 Hugepages1.5 A note on benchmarking1.6 Cache lines1.7 S-trees and B-trees2.1 Linear2.2 Auto-vectorization2.3 Trailing zeros2.4 Popcount2.5 Manual SIMD3.1 Batching3.2 Prefetching3.3 Pointer arithmetic3.4 Skip prefetch3.5 Interleave4.1 Left-tree4.2 Memory layouts4.3 Node size B = 154.4 Summary5.1 Full layout5.2 Compact subtrees5.3 The best of both: compact first level5.4 Overlapping trees5.5 Human data5.6 Prefix map5.7 Summary7.1 Future work1 Comment
Sort: