A comprehensive benchmarking study comparing embedding models for hybrid search across multiple dimensions: model quantization (FP32, FP16, INT8), vector precision (float, bfloat16, binary), Matryoshka dimensions, and hardware platforms (Graviton3/4, T4 GPU). Key findings: INT8 quantization provides a 2.7-3.4x CPU speedup with 94-98% quality retention; binary vectors achieve a 32x memory reduction with minimal quality loss on ModernBERT models; and hybrid retrieval consistently outperforms semantic-only search by 3-5 percentage points. The post includes an interactive leaderboard with NanoBEIR evaluation results and practical Vespa configuration examples for implementing these optimizations in production.
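To see where the 32x figure for binary vectors comes from, here is a minimal sketch (an illustration, not code from the post) of sign-based binarization: each float32 dimension costs 4 bytes, while keeping only its sign bit costs 1/8 of a byte, giving 4 / (1/8) = 32x. The function names `binarize` and `hamming_distance` are hypothetical.

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Pack float32 embeddings into binary vectors: keep only each
    dimension's sign bit, 8 dimensions per byte."""
    bits = (embeddings > 0).astype(np.uint8)  # sign -> {0, 1}
    return np.packbits(bits, axis=-1)         # (n, d) -> (n, d/8) bytes

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Distance between two packed binary vectors: popcount of XOR."""
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(0)
vecs = rng.standard_normal((1000, 768)).astype(np.float32)
packed = binarize(vecs)

# float32: 768 dims * 4 bytes = 3072 bytes; binary: 768 / 8 = 96 bytes
print(vecs[0].nbytes // packed[0].nbytes)  # -> 32
```

Distance between binarized vectors is then a cheap Hamming distance (XOR plus popcount), which is what makes binary vectors attractive for a fast first retrieval phase.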

9 min read · From blog.vespa.ai
Table of contents
- What MTEB doesn’t show you
- Interactive leaderboard
- Getting started with Vespa
- A few caveats
- Wrapping up
