A practical exploration of using PostgreSQL with the pg_search extension for full-text search over massive web datasets. The experiment indexed 596GB of web data from the Common Crawl Corpus (365 million documents, 156 billion words) on a single Neon instance with 8 vCPUs and 32GB RAM. pg_search handled the full dataset, but search queries took anywhere from several seconds to minutes because the index far exceeded available memory. On a 10% subset that did fit in memory, most queries completed in under one second, demonstrating pg_search's viability for large-scale applications when the instance is sized so the working set fits in RAM.
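The workflow described above reduces to a few SQL statements. A minimal sketch of how a pg_search BM25 index is typically created and queried, assuming a `documents` table with `id` and `content` columns (the table and column names are illustrative; the summary does not give the actual schema):

```sql
-- Enable the pg_search extension (must be installed on the server first)
CREATE EXTENSION IF NOT EXISTS pg_search;

-- Build a BM25 full-text index over the document text;
-- key_field identifies the unique column pg_search uses as the document key
CREATE INDEX documents_search_idx ON documents
USING bm25 (id, content)
WITH (key_field = 'id');

-- Query with pg_search's @@@ full-text operator, ranked by BM25 relevance
SELECT id, paradedb.score(id) AS score
FROM documents
WHERE content @@@ 'common crawl'
ORDER BY score DESC
LIMIT 10;
```

Index build time and query latency on a corpus of this size depend heavily on whether the index fits in memory, which is the central finding of the experiment.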
