Retrieval-Augmented Generation (RAG) allows an LLM to answer questions using your data at query time. On their own, LLMs are powerful but limited: they can hallucinate, they have a fixed knowledge cutoff, and they know nothing about your private documents, internal wikis, or proprietary systems.

Vespa Blog

A step-by-step guide to building a RAG application on Vespa Cloud using the out-of-the-box RAG Blueprint. The setup combines hybrid retrieval (BM25 plus vector search over binary-quantized embeddings) with multiple ranking profiles, including LightGBM gradient-boosted decision trees (GBDT), for high-quality context selection. The guide covers deploying the blueprint via the Vespa Cloud console, installing NyRAG (a Python tool that handles data ingestion, chunking, embedding, and a chat UI), configuring credentials, indexing local documents or web pages, and querying through the chat interface. It explains four query profiles (hybrid, hybrid-with-gbdt, deepresearch, and deepresearch-with-gbdt), each offering a different tradeoff between speed and retrieval quality.
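To make the hybrid retrieval idea concrete, here is a minimal, self-contained sketch in plain Python. It is not the blueprint's implementation: the function names (`binarize`, `hamming`, `term_overlap`, `hybrid_rank`) are hypothetical, a simple term-overlap count stands in for BM25, and reciprocal rank fusion is an assumed way of combining the two signals (the blueprint's own ranking profiles may combine them differently). The part that mirrors the summary above is the binary quantization step: each embedding dimension is reduced to one bit, and vectors are compared by Hamming distance.

```python
def binarize(vec):
    """Keep only the sign of each dimension: bit i is set when vec[i] > 0."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two binary codes (lower means closer)."""
    return bin(a ^ b).count("1")


def term_overlap(query, text):
    """Toy lexical score (an assumed stand-in for BM25): shared distinct terms."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def hybrid_rank(query, query_vec, docs, k=60):
    """Fuse a lexical ranking and a vector ranking with reciprocal rank fusion."""
    q_code = binarize(query_vec)
    lexical = sorted(docs, key=lambda d: -term_overlap(query, d["text"]))
    semantic = sorted(docs, key=lambda d: hamming(q_code, binarize(d["vec"])))
    fused = {}
    for ranking in (lexical, semantic):
        for rank, doc in enumerate(ranking, start=1):
            fused[doc["id"]] = fused.get(doc["id"], 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical toy corpus with 4-dimensional embeddings.
docs = [
    {"id": "a", "text": "vespa hybrid retrieval guide", "vec": [0.9, -0.2, 0.4, -0.7]},
    {"id": "b", "text": "cooking pasta at home", "vec": [-0.5, 0.3, -0.1, 0.8]},
    {"id": "c", "text": "hybrid search basics", "vec": [0.7, -0.1, 0.2, 0.6]},
]
ranked = hybrid_rank("hybrid retrieval", [0.8, -0.3, 0.5, -0.6], docs)
print(ranked)
```

Binary codes trade some precision for a much smaller memory footprint and cheap bitwise comparisons, which is why a second, more expensive ranking stage is commonly layered on top of the candidates they retrieve.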

Build a High-Quality RAG App on Vespa Cloud in 15 Minutes

The Challenge: The Quality of the Context Window

The Solution: Out-of-the-Box RAG on Vespa Cloud

Deploy Vespa RAG Blueprint to Vespa Cloud

Behind the Scenes: What You Just Deployed