One of my super nerdy interests is approximate algorithms for nearest neighbors in high-dimensional spaces. The problem is simple to state. You have, say, 1M points in some high-dimensional space. Given a query point, can you find the nearest points out of that 1M set? Doing this fast turns out to be tricky.
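To make the problem concrete, here is a minimal sketch of the exact (brute-force) baseline that approximate methods try to beat. It is not code from any of the benchmarked libraries. The function name `brute_force_knn`, the Euclidean metric, and the scaled-down sizes (2,000 points in 50 dimensions rather than 1M) are all assumptions chosen for illustration.

```python
import heapq
import math
import random


def brute_force_knn(points, query, k=10):
    """Return the indices of the k points closest to `query`, nearest first.

    Hypothetical helper for illustration. It uses Euclidean distance and
    computes one distance per stored point, so a single query costs
    O(n * d) for n points in d dimensions.
    """
    return heapq.nsmallest(
        k,
        range(len(points)),
        key=lambda i: math.dist(points[i], query),
    )


# Toy data: sizes are assumptions, far smaller than the 1M-point setting.
random.seed(0)
dim = 50
points = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2000)]
query = [random.gauss(0, 1) for _ in range(dim)]

neighbors = brute_force_knn(points, query, k=5)
print(neighbors)
```

Every query scans the whole dataset, which is why this approach gets slow at 1M points. Approximate algorithms give up a little accuracy to avoid that full scan.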

Erik Bernhardsson

Erik Bernhardsson, author of Spotify's Annoy library, presents updated benchmarks for approximate nearest neighbor (ANN) search algorithms. The ANN-benchmarks project now features Dockerized algorithms and pre-computed datasets for reproducible comparisons. Across multiple datasets (GloVe, SIFT, Fashion-MNIST, GIST), HNSW from NMSLIB consistently ranks first — over 10x faster than Annoy — followed by KGraph, SW-graph, FAISS-IVF, and Annoy. Graph-based algorithms dominate, while LSH-based approaches like FALCONN have regressed. The author calls for all future ANN research papers to benchmark against these standard libraries.

New benchmarks for approximate nearest neighbors