A full-text search query on turbopuffer was taking 220ms instead of the expected ~50ms. Profiling showed that over 60% of the runtime was spent in a merge iterator rather than in BM25 ranking. The root cause: Rust's zero-cost iterator abstraction compiled each individual call efficiently but prevented the compiler from vectorizing
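To make the shape of the problem concrete, here is a minimal sketch of a k-way merge iterator. It assumes the iterator combines sorted per-term lists of `u32` document IDs. That is an assumption, since the summary does not describe turbopuffer's data layout, and `MergeIter` and `merge_two` are hypothetical names, not turbopuffer's code or its fix. `MergeIter` yields one document per `next()` call, so each element costs a heap pop, a bounds check and a heap push, all behind data-dependent branches. For contrast, `merge_two` does the same job for two lists in a single loop over plain slices. That shape gives the optimizer the whole loop body at once, though it is still not guaranteed to vectorize.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical k-way merge over sorted doc-ID lists, one item per `next()`.
/// Duplicate IDs across lists are kept, not deduplicated.
struct MergeIter<'a> {
    lists: Vec<&'a [u32]>,
    positions: Vec<usize>,
    // Min-heap of (doc_id, list index), via `Reverse`.
    heap: BinaryHeap<Reverse<(u32, usize)>>,
}

impl<'a> MergeIter<'a> {
    fn new(lists: Vec<&'a [u32]>) -> Self {
        let mut heap = BinaryHeap::new();
        // Seed the heap with the head of every non-empty list.
        for (i, list) in lists.iter().enumerate() {
            if let Some(&first) = list.first() {
                heap.push(Reverse((first, i)));
            }
        }
        let positions = vec![0; lists.len()];
        MergeIter { lists, positions, heap }
    }
}

impl<'a> Iterator for MergeIter<'a> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // Per-element work: pop the smallest head, then refill from its list.
        let Reverse((doc, i)) = self.heap.pop()?;
        self.positions[i] += 1;
        if let Some(&next_doc) = self.lists[i].get(self.positions[i]) {
            self.heap.push(Reverse((next_doc, i)));
        }
        Some(doc)
    }
}

/// Hypothetical contrast: merge two sorted slices in one tight loop.
/// Indices advance arithmetically instead of through an if/else branch.
fn merge_two(a: &[u32], b: &[u32], out: &mut Vec<u32>) {
    out.clear();
    out.reserve(a.len() + b.len());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        let take_a = a[i] <= b[j];
        out.push(if take_a { a[i] } else { b[j] });
        i += take_a as usize;
        j += (!take_a) as usize;
    }
    // One side is exhausted; append whatever remains of the other.
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
}

fn main() {
    let a = [1u32, 4, 9];
    let b = [2u32, 3, 10];
    let c = [5u32];

    let merged: Vec<u32> = MergeIter::new(vec![&a[..], &b[..], &c[..]]).collect();
    println!("{:?}", merged);

    let mut out = Vec::new();
    merge_two(&a, &b, &mut out);
    println!("{:?}", out);
}
```

Both versions produce the same sorted output. The difference that matters here is where the per-element work lives. In `MergeIter` it is spread across repeated `next()` calls. In `merge_two` it sits inside one loop body.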
Table of contents
Understanding the turbopuffer read path
Looking inside the merge iterator
Disassembling the abstraction
The cost hides beneath the abstraction
Breaking the abstraction to find a solution
Conclusion