Read Replicas: Scaling Database Reads for High-Traffic Applications
Read replicas are copies of your primary database that serve read-only queries, distributing read traffic across multiple servers. They are among the simplest and most effective ways to scale read-heavy applications without the complexity of sharding. This guide covers how read replicas work, consistency challenges, routing strategies, and practical setup with AWS RDS.
How Read Replicas Work
A read replica continuously copies data from the primary (also called the master) database using asynchronous replication. The primary handles all write operations (INSERT, UPDATE, DELETE), while read replicas serve SELECT queries. Adding replicas scales read capacity roughly in proportion to their number.
The typical architecture looks like this: your application sends writes to the primary database and distributes reads across one or more replicas. A connection router (application-level, proxy, or load balancer) directs traffic to the appropriate destination.
```python
# Application-level read/write routing (Python example, using psycopg2)
import psycopg2
from random import choice

PRIMARY_DSN = "host=primary.db port=5432 dbname=myapp"
REPLICA_DSNS = [
    "host=replica1.db port=5432 dbname=myapp",
    "host=replica2.db port=5432 dbname=myapp",
    "host=replica3.db port=5432 dbname=myapp",
]

def get_connection(read_only=False):
    if read_only:
        dsn = choice(REPLICA_DSNS)  # Pick a random replica
    else:
        dsn = PRIMARY_DSN
    return psycopg2.connect(dsn)

# Write goes to the primary
with get_connection(read_only=False) as conn:
    with conn.cursor() as cur:
        cur.execute("INSERT INTO orders (user_id, total) VALUES (42, 99.99)")

# Read goes to a random replica
with get_connection(read_only=True) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM orders WHERE user_id = 42")
        result = cur.fetchall()
```
Consistency Concerns
Because read replicas use asynchronous replication, there is always a delay (replication lag) between when data is written to the primary and when it appears on the replica. This causes several consistency challenges.
Stale Reads
A user writes data, then immediately reads it from a replica that has not yet received the update. The user sees old data, which can be confusing or even cause application errors.
Read-After-Write Consistency
The most common solution is to route reads to the primary for a short window after a write. This ensures the user sees their own changes immediately.
```python
# Read-after-write consistency with a time window
import time

WRITE_CACHE = {}         # user_id -> last_write_timestamp
CONSISTENCY_WINDOW = 5   # seconds

def should_read_from_primary(user_id):
    last_write = WRITE_CACHE.get(user_id, 0)
    return (time.time() - last_write) < CONSISTENCY_WINDOW

def write_order(user_id, order_data):
    with get_connection(read_only=False) as conn:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO orders ...")
    WRITE_CACHE[user_id] = time.time()  # Track the write

def get_orders(user_id):
    # Use the primary if the user wrote recently, otherwise a replica
    use_primary = should_read_from_primary(user_id)
    with get_connection(read_only=not use_primary) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM orders WHERE user_id = %s", (user_id,))
            return cur.fetchall()
```
Monotonic Reads
If a user's reads are distributed across multiple replicas with different lag levels, they might see newer data and then older data on the next request. Monotonic reads ensure a user always reads from the same replica (sticky sessions), preventing this backward-in-time problem.
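One lightweight way to get sticky replica selection without storing session state is to hash the user ID to a replica index, so the same user always lands on the same replica. A minimal sketch, assuming the replica DSN list from the earlier example (hostnames are illustrative):

```python
import hashlib

REPLICA_DSNS = [
    "host=replica1.db port=5432 dbname=myapp",
    "host=replica2.db port=5432 dbname=myapp",
    "host=replica3.db port=5432 dbname=myapp",
]

def replica_for_user(user_id):
    """Pin each user to one replica so their reads never move backward in time."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(REPLICA_DSNS)
    return REPLICA_DSNS[index]
```

Note that if a replica is removed, users pinned to it get remapped; consistent hashing reduces how many users move in that case.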
Routing Reads vs Writes
Application-Level Routing
The application maintains separate connection pools for the primary and replicas. The ORM or data access layer routes queries based on type. This is the most common approach with frameworks like Django, Rails, and Spring.
```python
# Django: database routing for read replicas
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'primary.db.example.com',
        'NAME': 'myapp',
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'replica.db.example.com',
        'NAME': 'myapp',
    },
}
DATABASE_ROUTERS = ['myapp.routers.PrimaryReplicaRouter']  # register the router (path is illustrative)

# routers.py: custom database router
class PrimaryReplicaRouter:
    def db_for_read(self, model, **hints):
        return 'replica'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'
```
Proxy-Level Routing
A database proxy sits between the application and databases, automatically routing queries based on their type. ProxySQL (MySQL) and Pgpool-II (PostgreSQL) can do this transparently. Note that PgBouncer and Amazon RDS Proxy are connection poolers, not query routers: they do not inspect statements for read/write splitting, so pair them with application-level routing.
```sql
-- ProxySQL: route reads to replicas, writes to the primary
-- Rules in mysql_query_rules are evaluated in rule_id order,
-- so the FOR UPDATE exception must come before the general SELECT rule.

-- Exception: SELECT ... FOR UPDATE takes a write lock, so it goes to the primary
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES (0, 1, '^SELECT.*FOR UPDATE', 0);

-- Rule 1: all other SELECT queries go to hostgroup 1 (replicas)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES (1, 1, '^SELECT', 1);

-- Rule 2: everything else goes to hostgroup 0 (primary)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES (2, 1, '.*', 0);

-- Apply the rules at runtime and persist them
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```
DNS-Level Routing
Aurora and RDS Multi-AZ DB clusters provide separate DNS endpoints: a writer endpoint for the primary and a reader endpoint that load-balances across replicas. (Standard RDS read replicas each get their own endpoint, so the application or a proxy must balance across them.) Your application connects to different DNS names based on the operation type.
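With endpoint-based routing, the only decision left to the application is which DNS name to use for a given statement. A minimal sketch, assuming hypothetical Aurora-style endpoint names (substitute your own cluster's endpoints):

```python
# Hypothetical endpoints -- replace with your cluster's actual DNS names.
WRITER_ENDPOINT = "mydb.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mydb.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

def host_for_query(sql):
    """Send anything that modifies data (or takes a write lock) to the writer;
    plain SELECTs go to the reader endpoint."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb == "SELECT" and "FOR UPDATE" not in sql.upper():
        return READER_ENDPOINT
    return WRITER_ENDPOINT
```

This string inspection mirrors what a proxy like ProxySQL does with its match patterns, but kept in the application so no extra infrastructure is needed.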
AWS RDS Read Replicas
AWS RDS makes read replica setup straightforward. You can create up to 15 read replicas for PostgreSQL and MySQL, including cross-region replicas for disaster recovery and reduced global latency.
```shell
# AWS CLI: create a read replica
aws rds create-db-instance-read-replica \
    --db-instance-identifier mydb-replica-1 \
    --source-db-instance-identifier mydb-primary \
    --db-instance-class db.r5.xlarge \
    --availability-zone us-east-1b

# Create a cross-region replica for disaster recovery
aws rds create-db-instance-read-replica \
    --db-instance-identifier mydb-replica-eu \
    --source-db-instance-identifier arn:aws:rds:us-east-1:123456789:db:mydb-primary \
    --db-instance-class db.r5.xlarge \
    --region eu-west-1

# Monitor replication lag
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name ReplicaLag \
    --dimensions Name=DBInstanceIdentifier,Value=mydb-replica-1 \
    --start-time 2024-01-01T00:00:00Z \
    --end-time 2024-01-02T00:00:00Z \
    --period 300 \
    --statistics Average
```
RDS Reader Endpoint
For Aurora, the reader endpoint automatically distributes connections across all read replicas using round-robin. Your application only needs two connection strings: the writer endpoint and the reader endpoint.
| Feature | RDS Read Replica | Aurora Read Replica |
|---|---|---|
| Max replicas | 15 (MySQL and PostgreSQL) | 15 |
| Replication type | Asynchronous (binlog/WAL) | Shared storage (very low lag) |
| Typical lag | Seconds | Milliseconds (<20ms) |
| Failover | Manual promotion | Automatic (replica promoted) |
| Cross-region | Yes | Yes (Aurora Global Database) |
Connection Pooling with Read Replicas
Each database connection consumes memory on the server. With multiple replicas, connection pooling becomes essential to prevent connection exhaustion. Use PgBouncer for PostgreSQL or ProxySQL for MySQL to pool and reuse connections efficiently.
```ini
; PgBouncer configuration for read replica pooling
; pgbouncer.ini
[databases]
myapp_write = host=primary.db port=5432 dbname=myapp
myapp_read = host=replica1.db port=5432 dbname=myapp

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50
reserve_pool_size = 10
```
Read Replicas vs Caching
| Aspect | Read Replicas | Caching (Redis) |
|---|---|---|
| Data freshness | Seconds of lag | TTL-based staleness |
| Query flexibility | Any SQL query | Key-based lookups only |
| Latency | Milliseconds (disk I/O) | Sub-millisecond (in-memory) |
| Complexity | Low (same schema/queries) | Higher (cache invalidation) |
| Best for | Diverse read queries | Hot data, repeated lookups |
In practice, use both: read replicas for diverse SQL queries and Redis for hot data that is read thousands of times per second. For more on caching strategies, see our caching guide.
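As a sketch of that combined pattern, here is a cache-aside read path. An in-memory dict stands in for Redis so the example is self-contained, and `query_replica` is a placeholder for whatever function runs the SELECT against a replica:

```python
import time

CACHE = {}        # stand-in for Redis: key -> (value, expires_at)
CACHE_TTL = 60    # seconds

def fetch_user_profile(user_id, query_replica):
    """Cache-aside read: try the cache first, fall back to a read replica."""
    key = f"user:{user_id}"
    entry = CACHE.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                        # cache hit: sub-millisecond
    value = query_replica(user_id)             # cache miss: hit a replica
    CACHE[key] = (value, time.time() + CACHE_TTL)
    return value
```

With Redis, the dict operations become `GET`/`SETEX` calls; the routing logic is unchanged.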
When to Use Read Replicas
- Read-heavy workloads: Applications where reads outnumber writes 10:1 or more
- Reporting and analytics: Run heavy analytical queries on a replica without impacting production writes
- Geographic distribution: Place replicas closer to users in different regions
- Backup and disaster recovery: A cross-region replica serves as a warm standby
For deeper scaling strategies, explore database sharding, replication architectures, and load balancing.
Frequently Asked Questions
Can I write to a read replica?
No, read replicas are read-only by design. Any write operation sent to a replica will be rejected. All writes must go to the primary database. If you need writes in multiple regions, consider multi-primary replication or a distributed database like CockroachDB.
How do I handle replication lag in critical reads?
Implement read-after-write consistency: after a write, route the user's reads to the primary for a short window (5-10 seconds). For less critical reads, accept the lag and read from replicas. Monitor lag with database-specific metrics and alert when lag exceeds your tolerance threshold.
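For PostgreSQL, lag can also be measured on the replica itself via `pg_last_xact_replay_timestamp()`. A minimal sketch of a freshness check, where `run_query` is a thin wrapper around your driver (psycopg2, asyncpg, etc.) that returns the first column of the first row:

```python
# On a PostgreSQL replica, this returns seconds behind the primary
# (NULL if no WAL has been replayed yet).
LAG_QUERY = "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())"

def replica_usable(lag_seconds, max_lag=5.0):
    """A replica is usable only if it reports lag at or below the threshold."""
    return lag_seconds is not None and lag_seconds <= max_lag

def check_replica(run_query, max_lag=5.0):
    """Query a replica for its lag and decide whether to route reads to it."""
    return replica_usable(run_query(LAG_QUERY), max_lag)
```

A router can call this periodically and drop lagging replicas from its rotation until they catch up.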
Should I use the same instance type for replicas as the primary?
Not necessarily. If replicas only serve read traffic, they may need less write I/O capacity. However, if a replica might be promoted to primary during failover, it should be the same instance type to handle the write workload. Start with the same type and optimize after observing actual usage.
How many read replicas do I need?
Start with one or two and add more based on read traffic. Monitor CPU, memory, and connection count on each replica. A replica of the same instance class can handle roughly the same read throughput as the primary, so if your primary handles 10,000 reads/second and you need 30,000, three replicas serving all reads should suffice.