A full-text search query on turbopuffer was taking 220ms instead of the expected ~50ms. Profiling revealed that over 60% of runtime was spent in a merge iterator, not in BM25 ranking. The root cause: Rust's zero-cost iterator abstraction, while compiling each individual call efficiently, prevented the compiler from vectorizing
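To make the contrast concrete, here is a minimal sketch of the two shapes of code involved. It is not turbopuffer's implementation: `MergeIter` and `merge_batch` are hypothetical names, the inputs are assumed to be sorted `u64` keys, and whether a given loop is actually vectorized depends on the compiler version and target. The first form decides between the two inputs one element per `next()` call; the second sees whole slices at once, so the leftover tails can be copied in bulk.

```rust
use std::iter::Peekable;

/// Hypothetical two-way merge iterator (illustrative only): yields items
/// from two sorted inputs in ascending order, one element per `next()`.
struct MergeIter<A: Iterator<Item = u64>, B: Iterator<Item = u64>> {
    left: Peekable<A>,
    right: Peekable<B>,
}

impl<A: Iterator<Item = u64>, B: Iterator<Item = u64>> MergeIter<A, B> {
    fn new(left: A, right: B) -> Self {
        MergeIter { left: left.peekable(), right: right.peekable() }
    }
}

impl<A: Iterator<Item = u64>, B: Iterator<Item = u64>> Iterator for MergeIter<A, B> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        // Copy the two heads out so no borrow is held across the calls below.
        let l = self.left.peek().copied();
        let r = self.right.peek().copied();
        // Every call takes a data-dependent branch on the two heads.
        match (l, r) {
            (Some(l), Some(r)) => {
                if l <= r { self.left.next() } else { self.right.next() }
            }
            (Some(_), None) => self.left.next(),
            (None, _) => self.right.next(),
        }
    }
}

/// Hypothetical batch alternative: merges two sorted slices into `out`.
/// The whole loop is visible to the compiler in one function, and the
/// remaining tail of either input is appended with a single bulk copy.
fn merge_batch(left: &[u64], right: &[u64], out: &mut Vec<u64>) {
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            out.push(left[i]);
            i += 1;
        } else {
            out.push(right[j]);
            j += 1;
        }
    }
    // Whatever is left over is already sorted; copy it without comparisons.
    out.extend_from_slice(&left[i..]);
    out.extend_from_slice(&right[j..]);
}
```

Both functions produce the same merged sequence. The difference is only in how much of the work the compiler can see at once.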

12m read time. From turbopuffer.com
Table of contents
Understanding the turbopuffer read path
Looking inside the merge iterator
Disassembling the abstraction
The cost hides beneath the abstraction
Breaking the abstraction to find a solution
Conclusion
turbopuffer