A hands-on walkthrough of scaling a Vespa Cloud application to maximize document feed throughput using the MS MARCO passages dataset (8.8M documents). Starting from a minimal single-node setup, the tutorial progressively scales container nodes with GPUs and content nodes, using the Vespa metrics dashboard to identify bottlenecks at each step. The final configuration uses 100 GPU container nodes and 40 content nodes, achieving ~7,358 documents/second and ingesting the full dataset with embeddings (E5, ColBERT) in just over 20 minutes.
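The scaled-out topology described above could be sketched in a Vespa `services.xml` along these lines. This is a hypothetical sketch, not the application package from the post: the node counts (100 GPU container nodes, 40 content nodes) come from the summary, while the resource sizes, the `default`/`msmarco` ids, and the `passage` document type are illustrative assumptions.

```xml
<!-- Hypothetical sketch: node counts from the summary above;
     resource sizes, ids, and document type are illustrative. -->
<services version="1.0">
  <container id="default" version="1.0">
    <document-api/>
    <!-- 100 container nodes, each with one GPU for embedding inference -->
    <nodes count="100">
      <resources vcpu="4.0" memory="16Gb" disk="125Gb">
        <gpu count="1" memory="16Gb"/>
      </resources>
    </nodes>
  </container>
  <content id="msmarco" version="1.0">
    <redundancy>1</redundancy>
    <documents>
      <document type="passage" mode="index"/>
    </documents>
    <!-- 40 content nodes to absorb the write load -->
    <nodes count="40"/>
  </content>
</services>
```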

12 min read · From blog.vespa.ai
Table of contents

- Creating the Vespa Application
- Deploying and Feeding
- Scaling
- Feeding Fast: 20 GPUs
- Feeding Furiously: 100 GPUs
- Conclusion
