Scaling PostgreSQL for ChatGPT: How OpenAI Optimized Performance
OpenAI recently shared details about how it scaled its PostgreSQL database to handle millions of queries per second, serving hundreds of millions of users worldwide through ChatGPT and its API platform. The effort highlights the challenges of operating a single-primary PostgreSQL instance under intense, write-heavy workloads.
Growing PostgreSQL Load
Over the past year, the load on PostgreSQL grew more than tenfold. In response, OpenAI collaborated with Azure to fine-tune its deployment on Azure Database for PostgreSQL, enabling it to serve 800 million ChatGPT users while still relying on a single-primary instance. The optimizations spanned both the application and database layers, including scaling up the instance size, refining query patterns, and distributing reads across nearly 50 geo-distributed read replicas.
High Availability and Low Latency
To keep latency low, reads are distributed across these replicas, maintaining p99 latency in the low double-digit milliseconds. Writes remain centralized on the primary, with several measures in place to limit unnecessary load: OpenAI uses lazy writes and application-level optimizations to relieve pressure on the primary instance. This multi-layered strategy helps maintain consistent performance even during global traffic spikes.
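The split between centralized writes and geo-distributed reads can be sketched as a simple query router. The host names, region keys, and selection policy below are illustrative assumptions, not OpenAI's actual implementation:

```python
import random


class QueryRouter:
    """Send all writes to the single primary; spread reads across
    geo-distributed replicas, preferring one in the caller's region."""

    def __init__(self, primary, replicas_by_region):
        self.primary = primary
        # region name -> list of read-replica hosts (hypothetical layout)
        self.replicas = replicas_by_region

    def route(self, is_write, region=None):
        if is_write:
            return self.primary  # writes never touch a replica
        # prefer local replicas to keep read p99 low; fall back to any
        pool = self.replicas.get(region) or [
            host for hosts in self.replicas.values() for host in hosts
        ]
        return random.choice(pool)


router = QueryRouter(
    primary="pg-primary.eastus",
    replicas_by_region={
        "eastus": ["pg-ro-1.eastus"],
        "westeu": ["pg-ro-2.westeu"],
    },
)
```

A real deployment would also track replica lag and health, but the routing decision itself stays this simple: writes have exactly one destination.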
Addressing Operational Challenges
As traffic volumes scaled, OpenAI faced various operational challenges, including cache-miss storms and complex multi-table join patterns often produced by Object-Relational Mappers (ORMs). To combat these issues, the company moved some computational work to the application layer, introduced stricter timeouts for idle and long-running transactions, and revamped query structures to reduce interference with essential background processes such as autovacuum.
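The timeout guardrails described above map onto standard PostgreSQL session settings. The specific values below are illustrative, not OpenAI's production numbers; each setting can also be applied per role or in postgresql.conf:

```python
# Guardrails against idle and runaway transactions. These are real
# PostgreSQL parameters; the values are illustrative assumptions.
GUARDRAILS = {
    "idle_in_transaction_session_timeout": "30s",  # kill sessions stuck idle in a transaction
    "statement_timeout": "5s",                     # cap any single statement's runtime
    "lock_timeout": "2s",                          # fail fast instead of queueing on locks
}


def session_setup_sql(settings):
    """Render SET statements to run when a connection is checked out."""
    return [f"SET {name} = '{value}';" for name, value in settings.items()]


for stmt in session_setup_sql(GUARDRAILS):
    print(stmt)
```

Keeping idle-in-transaction sessions short matters in particular because an open transaction pins old row versions, blocking autovacuum from reclaiming them.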
Reducing write pressure remained a core strategy, especially given PostgreSQL’s Multiversion Concurrency Control (MVCC) model, in which every update creates a new row version that must later be vacuumed, driving up CPU and storage costs under update-heavy workloads. OpenAI tackled this by shifting shardable workloads to distributed systems, enforcing rate limits, and establishing strict operational policies to prevent cascading server overloads.
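Rate-limiting write-heavy callers can be as simple as a token bucket per workload. This is a generic sketch of the technique, not OpenAI's actual policy engine; the rate and capacity values are placeholders:

```python
import time


class TokenBucket:
    """Generic token-bucket limiter: refills at `rate` tokens/second up
    to `capacity`; each write consumes one token. When the bucket is
    empty, the caller should back off instead of hitting the primary."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # burst allowance
        self.tokens = capacity
        self.now = now              # injectable clock, eases testing
        self.last = now()

    def allow(self):
        t = self.now()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice one bucket per tenant or per workload class keeps a single misbehaving writer from overloading the shared primary.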
Connection Management
Connection pooling and workload isolation were crucial to maintaining optimal performance. OpenAI employed PgBouncer in transaction-pooling mode to manage PostgreSQL’s connection limits efficiently. This setup minimized connection-setup latency while absorbing spikes in client connections. Additionally, separating critical and non-critical workloads helped avoid noisy-neighbor effects during peak times.
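A minimal PgBouncer setup in transaction-pooling mode looks like the fragment below. The database names and pool sizes are placeholders, and splitting pools per workload class is one way (not necessarily OpenAI's) to keep batch traffic from starving user-facing queries:

```ini
[databases]
; separate logical pools so background jobs cannot exhaust the
; connections needed by latency-sensitive, user-facing queries
chat_critical   = host=pg-primary port=5432 dbname=chat pool_size=40
chat_background = host=pg-primary port=5432 dbname=chat pool_size=10

[pgbouncer]
pool_mode = transaction   ; server connection is released at transaction end
max_client_conn = 5000    ; many clients multiplex onto few server connections
default_pool_size = 20
```

Transaction pooling is what makes the multiplexing possible: a server connection is only held for the duration of a transaction, so thousands of mostly-idle clients can share a small, fixed set of PostgreSQL backends.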
Strategies for Read Replication
As the number of read replicas increased, OpenAI needed to address the additional CPU and network overhead involved in streaming Write-Ahead Logs (WAL) to each replica. To mitigate this, the company is experimenting with cascading replication, where intermediate replicas relay WAL data to downstream replicas, thereby lightening the load on the primary instance. This adjustment not only supports growth but enhances the system’s overall efficiency.
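The effect of cascading replication on the primary can be sketched with a small fan-out calculation; the replica counts here are hypothetical round numbers in the spirit of the article's "nearly 50":

```python
def wal_fanout(total_replicas, intermediates=0):
    """Return (streams served by the primary, streams per intermediate).

    Without cascading, the primary streams WAL directly to every
    replica. With cascading, it streams only to the intermediate tier,
    which relays WAL to the remaining downstream replicas.
    """
    if intermediates == 0:
        return total_replicas, 0
    downstream = total_replicas - intermediates
    per_intermediate = -(-downstream // intermediates)  # ceiling division
    return intermediates, per_intermediate


# Flat topology: the primary serves all 50 WAL streams itself.
# With 5 intermediate relays, it serves only 5, and each relay
# fans out to at most 9 downstream replicas.
```

The trade-off is extra replication lag on the downstream tier, since WAL now takes two hops instead of one.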
OpenAI’s ongoing research into sharded PostgreSQL deployments and alternative distributed systems aims to balance strong consistency requirements with rapidly escalating global traffic and diverse workloads. These efforts highlight the complexity involved in scaling databases for high-demand applications.
By implementing these strategies, OpenAI has managed to extend PostgreSQL’s limits effectively, building a scalable, reliable, and high-performance infrastructure capable of supporting its extensive user base.