An inside look at how OpenAI scaled PostgreSQL to millions of queries per second using replicas, caching, rate limiting, and workload isolation.

OpenAI is a research organization focused on artificial intelligence and machine learning. Readers can learn about AI research, deep learning models, and AI applications across various domains. With research papers, blog posts, and technical documentation, OpenAI provides insights and expertise for understanding and advancing the field of artificial intelligence.

OpenAI

OpenAI scaled PostgreSQL to handle millions of queries per second for 800 million ChatGPT users using a single primary Azure PostgreSQL instance with nearly 50 read replicas across multiple regions. Key optimizations included offloading reads to replicas, migrating write-heavy workloads to sharded systems like Azure Cosmos DB, implementing PgBouncer for connection pooling, deploying cache locking to prevent cache-miss storms, isolating workloads to prevent noisy-neighbor issues, and enforcing strict rate limiting. The architecture achieved five-nines availability with low double-digit millisecond p99 latency despite PostgreSQL's MVCC limitations for write-heavy workloads.
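To make the cache-locking idea in the summary concrete, here is a minimal in-process sketch in Python. It is not OpenAI's implementation, which the summary does not describe. The names `SingleFlightCache`, `slow_query`, and `demo` are hypothetical, and a production system would most likely hold the lock in a shared cache layer rather than in a single process. The point it illustrates is that when many requests miss on the same key at once, only one of them goes to the database while the rest wait and reuse its result.

```python
import threading
import time


class SingleFlightCache:
    """Toy in-process cache: at most one caller per key loads a missing value."""

    def __init__(self):
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the _key_locks dict

    def _lock_for(self, key):
        # Create (or fetch) the lock dedicated to this key.
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key, loader):
        # Fast path: cache hit, no locking needed.
        try:
            return self._values[key]
        except KeyError:
            pass
        # Slow path: serialize all callers that missed on this key.
        with self._lock_for(key):
            # Re-check: another caller may have filled the cache while we waited.
            if key in self._values:
                return self._values[key]
            value = loader(key)  # only one caller per key reaches the backend
            self._values[key] = value
            return value


def demo(num_threads=20):
    """Fire concurrent requests for one key; return (backend calls, results)."""
    cache = SingleFlightCache()
    calls = []
    calls_lock = threading.Lock()
    results = []
    results_lock = threading.Lock()

    def slow_query(key):
        # Stand-in for an expensive PostgreSQL read (hypothetical).
        with calls_lock:
            calls.append(key)
        time.sleep(0.05)
        return f"row-for-{key}"

    def worker():
        value = cache.get("user:42", slow_query)
        with results_lock:
            results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls), results


if __name__ == "__main__":
    backend_calls, results = demo()
    print(f"{len(results)} requests served with {backend_calls} backend call(s)")
```

Without the per-key lock and the re-check inside it, all twenty threads would miss at once and each would run the slow query, which is the cache-miss storm the summary refers to.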

Scaling PostgreSQL to power 800 million ChatGPT users

Woooow! Thanks, this was just what I needed to read!!! <3 Thanks to the Asian team behind OpenAI! ;) - PG is absolutely insane!!!

My ChatGPT is still very slow, please scale it more. Just kidding, thanks for the interesting insight 👏

Loved it! Great share. I have recently been exploring PostgreSQL from a scaling point of view, and this gives great insight into it.

This is an incredible deep dive. The sections on connection pooling with PgBouncer and the ‘noisy neighbor’ isolation really hit home. The way OpenAI has managed to push a single-primary architecture to 10x load through rigorous MVCC tuning and cache locking is a masterclass in ‘scaling vertically before over-complicating with shards.’ It’s a great reminder that solid engineering often beats premature complexity. Thanks for sharing the hard-won lessons on WAL streaming, definitely something I’ll be keeping in mind for my own infra projects!

I don’t think this is the only problem. The response speed isn’t that bad, and scaling will make it better, but the entire desktop app is a slideshow fr. You have to wait 5 mins for the app to start up and another 5 for the chat to fully switch, and sometimes by the time the chat is rendered, I would have sent a prompt and ChatGPT might have even responded, but I won’t even be able to see it because it’s so slow. Fix the desktop app, but until then ChatGPT isn’t worth it. I’ll just stick with Claude.