Scaling PostgreSQL for ChatGPT: How OpenAI Optimized Performance
OpenAI recently shared details about how it scaled its PostgreSQL database to handle millions of queries per second, serving hundreds of millions of users worldwide through ChatGPT and its API platform. The effort highlights the challenges of operating a single-primary PostgreSQL instance under intense, write-heavy workloads.
Growing PostgreSQL Load
Over the past year, the load on PostgreSQL grew more than tenfold. In response, OpenAI collaborated with Azure to fine-tune its deployment on Azure Database for PostgreSQL, enabling it to serve 800 million ChatGPT users while still relying on a single-primary instance. The optimizations spanned both the application and database layers, including scaling up the instance size, refining query patterns, and distributing reads across nearly 50 geo-distributed read replicas.
High Availability and Low Latency
To keep latency low, reads are distributed across these replicas, maintaining p99 latency in the low double-digit milliseconds. Writes remain centralized on the primary, with several measures in place to limit unnecessary load: OpenAI uses lazy writes and application-level optimizations to relieve pressure on the primary instance. This multi-layered strategy helps maintain consistent performance even during global traffic spikes.
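The split between centralized writes and geo-distributed reads can be sketched as a simple query router. The host names, region keys, and selection policy below are illustrative assumptions, not OpenAI's actual implementation:

```python
import random


class QueryRouter:
    """Send all writes to the single primary; spread reads across
    geo-distributed replicas, preferring one in the caller's region."""

    def __init__(self, primary, replicas_by_region):
        self.primary = primary
        # region name -> list of read-replica hosts (hypothetical layout)
        self.replicas = replicas_by_region

    def route(self, is_write, region=None):
        if is_write:
            return self.primary  # writes never touch a replica
        # prefer local replicas to keep read p99 low; fall back to any
        pool = self.replicas.get(region) or [
            host for hosts in self.replicas.values() for host in hosts
        ]
        return random.choice(pool)


router = QueryRouter(
    primary="pg-primary.eastus",
    replicas_by_region={
        "eastus": ["pg-ro-1.eastus"],
        "westeu": ["pg-ro-2.westeu"],
    },
)
```

A real deployment would also track replica lag and health, but the routing decision itself stays this simple: writes have exactly one destination.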
Addressing Operational Challenges
As traffic volumes scaled, OpenAI faced various operational challenges, including cache-miss storms and complex multi-table join patterns often produced by Object-Relational Mappers (ORMs). To combat these issues, the company moved some computational work to the application layer, introduced stricter timeouts for idle and long-running transactions, and revamped query structures to reduce interference with essential background processes such as autovacuum.
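The timeout guardrails described above map onto standard PostgreSQL session settings. The specific values below are illustrative, not OpenAI's production numbers; each setting can also be applied per role or in postgresql.conf:

```python
# Guardrails against idle and runaway transactions. These are real
# PostgreSQL parameters; the values are illustrative assumptions.
GUARDRAILS = {
    "idle_in_transaction_session_timeout": "30s",  # kill sessions stuck idle in a transaction
    "statement_timeout": "5s",                     # cap any single statement's runtime
    "lock_timeout": "2s",                          # fail fast instead of queueing on locks
}


def session_setup_sql(settings):
    """Render SET statements to run when a connection is checked out."""
    return [f"SET {name} = '{value}';" for name, value in settings.items()]


for stmt in session_setup_sql(GUARDRAILS):
    print(stmt)
```

Keeping idle-in-transaction sessions short matters in particular because an open transaction pins old row versions, blocking autovacuum from reclaiming them.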
Reducing write pressure remained a core strategy, especially given PostgreSQL’s Multiversion Concurrency Control (MVCC) model, in which every update creates a new row version that must later be vacuumed, driving up CPU and storage costs under update-heavy workloads. OpenAI tackled this by shifting shardable workloads to distributed systems, enforcing rate limits, and establishing strict operational policies to prevent cascading server overloads.
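Rate-limiting write-heavy callers can be as simple as a token bucket per workload. This is a generic sketch of the technique, not OpenAI's actual policy engine; the rate and capacity values are placeholders:

```python
import time


class TokenBucket:
    """Generic token-bucket limiter: refills at `rate` tokens/second up
    to `capacity`; each write consumes one token. When the bucket is
    empty, the caller should back off instead of hitting the primary."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # burst allowance
        self.tokens = capacity
        self.now = now              # injectable clock, eases testing
        self.last = now()

    def allow(self):
        t = self.now()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice one bucket per tenant or per workload class keeps a single misbehaving writer from overloading the shared primary.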
Connection Management
Connection pooling and workload isolation were crucial to maintaining optimal performance. OpenAI employed PgBouncer in transaction-pooling mode to manage PostgreSQL’s connection limits efficiently. This setup minimized connection-setup latency while absorbing spikes in client connections. Additionally, separating critical and non-critical workloads helped avoid noisy-neighbor effects during peak times.
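A minimal PgBouncer setup in transaction-pooling mode looks like the fragment below. The database names and pool sizes are placeholders, and splitting pools per workload class is one way (not necessarily OpenAI's) to keep batch traffic from starving user-facing queries:

```ini
[databases]
; separate logical pools so background jobs cannot exhaust the
; connections needed by latency-sensitive, user-facing queries
chat_critical   = host=pg-primary port=5432 dbname=chat pool_size=40
chat_background = host=pg-primary port=5432 dbname=chat pool_size=10

[pgbouncer]
pool_mode = transaction   ; server connection is released at transaction end
max_client_conn = 5000    ; many clients multiplex onto few server connections
default_pool_size = 20
```

Transaction pooling is what makes the multiplexing possible: a server connection is only held for the duration of a transaction, so thousands of mostly-idle clients can share a small, fixed set of PostgreSQL backends.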
Strategies for Read Replication
As the number of read replicas increased, OpenAI needed to address the additional CPU and network overhead involved in streaming Write-Ahead Logs (WAL) to each replica. To mitigate this, the company is experimenting with cascading replication, where intermediate replicas relay WAL data to downstream replicas, thereby lightening the load on the primary instance. This adjustment not only supports growth but enhances the system’s overall efficiency.
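The effect of cascading replication on the primary can be sketched with a small fan-out calculation; the replica counts here are hypothetical round numbers in the spirit of the article's "nearly 50":

```python
def wal_fanout(total_replicas, intermediates=0):
    """Return (streams served by the primary, streams per intermediate).

    Without cascading, the primary streams WAL directly to every
    replica. With cascading, it streams only to the intermediate tier,
    which relays WAL to the remaining downstream replicas.
    """
    if intermediates == 0:
        return total_replicas, 0
    downstream = total_replicas - intermediates
    per_intermediate = -(-downstream // intermediates)  # ceiling division
    return intermediates, per_intermediate


# Flat topology: the primary serves all 50 WAL streams itself.
# With 5 intermediate relays, it serves only 5, and each relay
# fans out to at most 9 downstream replicas.
```

The trade-off is extra replication lag on the downstream tier, since WAL now takes two hops instead of one.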
OpenAI’s ongoing research into sharded PostgreSQL deployments and alternative distributed systems aims to balance strong consistency requirements with rapidly escalating global traffic and diverse workloads. These efforts highlight the complexity involved in scaling databases for high-demand applications.
By implementing these strategies, OpenAI has managed to extend PostgreSQL’s limits effectively, building a scalable, reliable, and high-performance infrastructure capable of supporting its extensive user base.