This post explains how to keep a vector store up to date using Change Data Capture (CDC), Python, and Redpanda. It walks step by step through building a CDC-powered indexing pipeline that streams changes from a PostgreSQL database into a vector store. Using a prototype application, it shows how to combine Docker, Quix Streams, and other tools to ingest database changes continuously and update the corresponding vectors, so that search results reflect the current state of the data rather than a stale batch index. It also gives detailed instructions for setting up and running the pipeline and explains the underlying code and architecture.
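The core idea of incremental indexing is that each change event touches only the affected vector: inserts and updates re-embed and upsert one row, and deletes remove one entry. The sketch below illustrates that logic in plain Python. It is not the post's code. It assumes a Debezium-style change envelope (`op` of `c`/`u`/`r`/`d` with `before`/`after` row images), a hypothetical `description` column, a toy hash-based `embed` function standing in for a real embedding model, and an in-memory dict standing in for the vector database; in the described pipeline, events would arrive from Redpanda via Quix Streams instead of a Python list.

```python
import hashlib
import math

DIM = 8  # hypothetical vector size for the toy embedding


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hash tokens into a unit-length vector."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB keyed by the source row's primary key."""

    def __init__(self) -> None:
        self.vectors: dict[int, tuple[list[float], dict]] = {}

    def upsert(self, key: int, vector: list[float], payload: dict) -> None:
        self.vectors[key] = (vector, payload)

    def delete(self, key: int) -> None:
        self.vectors.pop(key, None)


def apply_change(store: InMemoryVectorStore, event: dict) -> None:
    """Apply one assumed Debezium-style change event to the vector store."""
    op = event["op"]
    if op in ("c", "u", "r"):  # create, update, or snapshot read: (re)embed the new row image
        row = event["after"]
        store.upsert(row["id"], embed(row["description"]), row)
    elif op == "d":  # delete: only the old row image is available
        store.delete(event["before"]["id"])
    else:
        raise ValueError(f"unknown op: {op!r}")


# Usage: replay a small stream of changes against the store.
store = InMemoryVectorStore()
events = [
    {"op": "c", "before": None, "after": {"id": 1, "description": "red running shoes"}},
    {"op": "c", "before": None, "after": {"id": 2, "description": "wool winter hat"}},
    {"op": "u", "before": {"id": 1, "description": "red running shoes"},
     "after": {"id": 1, "description": "blue running shoes"}},
    {"op": "d", "before": {"id": 2, "description": "wool winter hat"}, "after": None},
]
for event in events:
    apply_change(store, event)

print(sorted(store.vectors))
print(store.vectors[1][1]["description"])
```

Keying the store by the source table's primary key is what makes updates idempotent: replaying the same event overwrites the same entry instead of creating a duplicate vector.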

14 min read · From ai.gopubby.com
Table of contents
- Stream Changes from a PostgreSQL Database to a Vector Store: Using continuous, event-based vector ingestion for incremental indexing
- Prerequisites
- Getting the code
- Database Setup
- Adding Data
- Using the Streamlit Vector Search UI
- How it works under the hood
- Configuring CDC
- Creating the Embeddings
- Upserting to the Vector DB
- Lessons learned
