The PostgreSQL postmaster process runs a single-threaded main loop that forks a backend for each new connection, reaps exited child processes, and launches background workers, including parallel query workers. Under extreme load, with high connection rates (1400+ connections/sec) and heavy background worker churn, this loop can saturate a single CPU core and delay connection establishment by 10-15 seconds. Profiling traced the issue to expensive fork operations, compounded by bursts of parallel query workers. Solutions include enabling huge pages (a 20% throughput improvement), adding jitter to connection attempts to reduce peak rates, and eliminating parallel query bursts. This architectural constraint explains why connection pooling tools are essential for PostgreSQL deployments at scale.
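The connection jitter mentioned above can be illustrated with a minimal sketch. It assumes clients would otherwise all connect at the same instant; the helper names (`jittered_delay`, `connect_with_jitter`), the 2-second default window, and the injected `connect` callable are hypothetical and not taken from the original post.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def jittered_delay(max_jitter_s: float, rng: Optional[random.Random] = None) -> float:
    """Return a uniformly random delay in [0, max_jitter_s] seconds."""
    if max_jitter_s < 0:
        raise ValueError("max_jitter_s must be non-negative")
    if rng is None:
        rng = random.Random()
    return rng.uniform(0.0, max_jitter_s)


def connect_with_jitter(
    connect: Callable[[], T],
    max_jitter_s: float = 2.0,  # assumed window; tune to the workload
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Wait a random delay, then call the supplied connect() callable.

    Spreading connection attempts over a window lowers the peak rate of
    new connections that the single-threaded postmaster has to fork for.
    """
    sleep(jittered_delay(max_jitter_s, rng))
    return connect()
```

A client would pass its own driver call as `connect`, for example a lambda wrapping whatever connection function it already uses. Jitter only flattens the peak; the total number of connections is unchanged.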
Table of contents
TL;DR
Slow connections to postgres
A reproduction environment
A deep dive into the postmaster
Profiling the postmaster
Huge pages
Background workers
Unravelling the mystery
Fixing the issue
Conclusion