A comprehensive benchmarking study comparing embedding models for hybrid search across multiple dimensions: model quantization (FP32, FP16, INT8), vector precision (float, bfloat16, binary), Matryoshka dimensions, and hardware platforms (Graviton3/4, T4 GPU). Key findings include INT8 quantization delivering a 2.7-3.4x CPU speedup.
Table of contents
- What MTEB doesn’t show you
- Interactive leaderboard
- Getting started with Vespa
- A few caveats
- Wrapping up
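As background for the dimensions the study varies, here is a minimal sketch (not the study's actual code) of the three vector-side techniques named above: symmetric INT8 scalar quantization, binary precision via sign bits, and Matryoshka truncation. Function names and the per-vector scaling scheme are illustrative assumptions, not the benchmark's implementation.

```python
import numpy as np

def int8_quantize(vec: np.ndarray) -> tuple[np.ndarray, float]:
    # Illustrative symmetric scalar quantization: map floats to int8
    # with one scale per vector, so dequantization is q * scale.
    scale = float(np.max(np.abs(vec))) / 127.0
    q = np.clip(np.round(vec / scale), -127, 127).astype(np.int8)
    return q, scale

def binarize(vec: np.ndarray) -> np.ndarray:
    # Binary precision: keep only the sign of each dimension (1 bit/dim),
    # typically compared with Hamming distance at search time.
    return (vec > 0).astype(np.uint8)

def matryoshka_truncate(vec: np.ndarray, dim: int) -> np.ndarray:
    # Matryoshka-trained models concentrate coarse information in the
    # leading dimensions, so truncating and renormalizing keeps most
    # of the similarity signal at a fraction of the storage cost.
    v = vec[:dim]
    return v / np.linalg.norm(v)

if __name__ == "__main__":
    v = np.random.default_rng(0).normal(size=256).astype(np.float32)
    q, s = int8_quantize(v)          # 4x smaller than FP32
    b = binarize(v)                  # 32x smaller than FP32
    t = matryoshka_truncate(v, 64)   # 4x fewer dimensions
```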