Read Replicas: Scaling Database Reads for High-Traffic Applications
Read replicas are copies of your primary database that serve read-only queries, distributing read traffic across multiple servers. They are among the simplest and most effective ways to scale read-heavy applications without the complexity of sharding. This guide covers how read replicas work, consistency challenges, routing strategies, and practical setup with AWS RDS.
How Read Replicas Work
A read replica continuously copies data from the primary (also called the master) database using asynchronous replication. The primary handles all write operations (INSERT, UPDATE, DELETE), while read replicas serve SELECT queries. Adding replicas scales read capacity roughly in proportion to their number.
The typical architecture looks like this: your application sends writes to the primary database and distributes reads across one or more replicas. A connection router (application-level, proxy, or load balancer) directs traffic to the appropriate destination.
```python
# Application-level read/write routing (Python example, using psycopg2)
import psycopg2
from random import choice

PRIMARY_DSN = "host=primary.db port=5432 dbname=myapp"
REPLICA_DSNS = [
    "host=replica1.db port=5432 dbname=myapp",
    "host=replica2.db port=5432 dbname=myapp",
    "host=replica3.db port=5432 dbname=myapp",
]

def get_connection(read_only=False):
    if read_only:
        dsn = choice(REPLICA_DSNS)  # Pick a random replica
    else:
        dsn = PRIMARY_DSN
    return psycopg2.connect(dsn)

# Write goes to the primary
with get_connection(read_only=False) as conn:
    with conn.cursor() as cur:
        cur.execute("INSERT INTO orders (user_id, total) VALUES (42, 99.99)")

# Read goes to a random replica
with get_connection(read_only=True) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM orders WHERE user_id = 42")
        result = cur.fetchall()
```
Consistency Concerns
Because read replicas use asynchronous replication, there is always a delay (replication lag) between when data is written to the primary and when it appears on the replica. This causes several consistency challenges.
Stale Reads
A user writes data, then immediately reads it from a replica that has not yet received the update. The user sees old data, which can be confusing or even cause application errors.
Read-After-Write Consistency
The most common solution is to route reads to the primary for a short window after a write. This ensures the user sees their own changes immediately.
```python
# Read-after-write consistency with a time window
import time

WRITE_CACHE = {}         # user_id -> last_write_timestamp
CONSISTENCY_WINDOW = 5   # seconds

def should_read_from_primary(user_id):
    last_write = WRITE_CACHE.get(user_id, 0)
    return (time.time() - last_write) < CONSISTENCY_WINDOW

def write_order(user_id, order_data):
    with get_connection(read_only=False) as conn:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO orders ...")
    WRITE_CACHE[user_id] = time.time()  # Track the write

def get_orders(user_id):
    # Use the primary if the user wrote recently, otherwise a replica
    use_primary = should_read_from_primary(user_id)
    with get_connection(read_only=not use_primary) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM orders WHERE user_id = %s", (user_id,))
            return cur.fetchall()
```
Monotonic Reads
If a user's reads are distributed across multiple replicas with different lag levels, they might see newer data and then older data on the next request. Monotonic reads ensure a user always reads from the same replica (sticky sessions), preventing this backward-in-time problem.
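One lightweight way to get sticky replica selection without storing session state is to hash the user ID to a replica index, so the same user always lands on the same replica. A minimal sketch, assuming the replica DSN list from the earlier example (hostnames are illustrative):

```python
import hashlib

REPLICA_DSNS = [
    "host=replica1.db port=5432 dbname=myapp",
    "host=replica2.db port=5432 dbname=myapp",
    "host=replica3.db port=5432 dbname=myapp",
]

def replica_for_user(user_id):
    """Pin each user to one replica so their reads never move backward in time."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(REPLICA_DSNS)
    return REPLICA_DSNS[index]
```

Note that if a replica is removed, users pinned to it get remapped; consistent hashing reduces how many users move in that case.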
Routing Reads vs Writes
Application-Level Routing
The application maintains separate connection pools for the primary and replicas. The ORM or data access layer routes queries based on type. This is the most common approach with frameworks like Django, Rails, and Spring.
```python
# Django: database routing for read replicas
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'primary.db.example.com',
        'NAME': 'myapp',
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'replica.db.example.com',
        'NAME': 'myapp',
    },
}
DATABASE_ROUTERS = ['myapp.routers.PrimaryReplicaRouter']  # register the router (path is illustrative)

# routers.py: custom database router
class PrimaryReplicaRouter:
    def db_for_read(self, model, **hints):
        return 'replica'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'
```
Proxy-Level Routing
A database proxy sits between the application and databases, automatically routing queries based on their type. ProxySQL (MySQL) and Pgpool-II (PostgreSQL) can do this transparently. Note that PgBouncer and Amazon RDS Proxy are connection poolers, not query routers: they do not inspect statements for read/write splitting, so pair them with application-level routing.
```sql
-- ProxySQL: route reads to replicas, writes to the primary
-- Rules in mysql_query_rules are evaluated in rule_id order,
-- so the FOR UPDATE exception must come before the general SELECT rule.

-- Exception: SELECT ... FOR UPDATE takes a write lock, so it goes to the primary
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES (0, 1, '^SELECT.*FOR UPDATE', 0);

-- Rule 1: all other SELECT queries go to hostgroup 1 (replicas)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES (1, 1, '^SELECT', 1);

-- Rule 2: everything else goes to hostgroup 0 (primary)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES (2, 1, '.*', 0);

-- Apply the rules at runtime and persist them
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```
DNS-Level Routing
Aurora and RDS Multi-AZ DB clusters provide separate DNS endpoints: a writer endpoint for the primary and a reader endpoint that load-balances across replicas. (Standard RDS read replicas each get their own endpoint, so the application or a proxy must balance across them.) Your application connects to different DNS names based on the operation type.
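With endpoint-based routing, the only decision left to the application is which DNS name to use for a given statement. A minimal sketch, assuming hypothetical Aurora-style endpoint names (substitute your own cluster's endpoints):

```python
# Hypothetical endpoints -- replace with your cluster's actual DNS names.
WRITER_ENDPOINT = "mydb.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mydb.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

def host_for_query(sql):
    """Send anything that modifies data (or takes a write lock) to the writer;
    plain SELECTs go to the reader endpoint."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb == "SELECT" and "FOR UPDATE" not in sql.upper():
        return READER_ENDPOINT
    return WRITER_ENDPOINT
```

This string inspection mirrors what a proxy like ProxySQL does with its match patterns, but kept in the application so no extra infrastructure is needed.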
AWS RDS Read Replicas
AWS RDS makes read replica setup straightforward. You can create up to 15 read replicas for PostgreSQL and MySQL, including cross-region replicas for disaster recovery and reduced global latency.
```shell
# AWS CLI: create a read replica
aws rds create-db-instance-read-replica \
    --db-instance-identifier mydb-replica-1 \
    --source-db-instance-identifier mydb-primary \
    --db-instance-class db.r5.xlarge \
    --availability-zone us-east-1b

# Create a cross-region replica for disaster recovery
aws rds create-db-instance-read-replica \
    --db-instance-identifier mydb-replica-eu \
    --source-db-instance-identifier arn:aws:rds:us-east-1:123456789:db:mydb-primary \
    --db-instance-class db.r5.xlarge \
    --region eu-west-1

# Monitor replication lag
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name ReplicaLag \
    --dimensions Name=DBInstanceIdentifier,Value=mydb-replica-1 \
    --start-time 2024-01-01T00:00:00Z \
    --end-time 2024-01-02T00:00:00Z \
    --period 300 \
    --statistics Average
```
RDS Reader Endpoint
For Aurora, the reader endpoint automatically distributes connections across all read replicas using round-robin. Your application only needs two connection strings: the writer endpoint and the reader endpoint.
| Feature | RDS Read Replica | Aurora Read Replica |
|---|---|---|
| Max replicas | 15 (MySQL and PostgreSQL) | 15 |
| Replication type | Asynchronous (binlog/WAL) | Shared storage (very low lag) |
| Typical lag | Seconds | Milliseconds (<20ms) |
| Failover | Manual promotion | Automatic (replica promoted) |
| Cross-region | Yes | Yes (Aurora Global Database) |
Connection Pooling with Read Replicas
Each database connection consumes memory on the server. With multiple replicas, connection pooling becomes essential to prevent connection exhaustion. Use PgBouncer for PostgreSQL or ProxySQL for MySQL to pool and reuse connections efficiently.
```ini
; PgBouncer configuration for read replica pooling
; pgbouncer.ini
[databases]
myapp_write = host=primary.db port=5432 dbname=myapp
myapp_read = host=replica1.db port=5432 dbname=myapp

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50
reserve_pool_size = 10
```
Read Replicas vs Caching
| Aspect | Read Replicas | Caching (Redis) |
|---|---|---|
| Data freshness | Seconds of lag | TTL-based staleness |
| Query flexibility | Any SQL query | Key-based lookups only |
| Latency | Milliseconds (disk I/O) | Sub-millisecond (in-memory) |
| Complexity | Low (same schema/queries) | Higher (cache invalidation) |
| Best for | Diverse read queries | Hot data, repeated lookups |
In practice, use both: read replicas for diverse SQL queries and Redis for hot data that is read thousands of times per second. For more on caching strategies, see our caching guide.
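As a sketch of that combined pattern, here is a cache-aside read path. An in-memory dict stands in for Redis so the example is self-contained, and `query_replica` is a placeholder for whatever function runs the SELECT against a replica:

```python
import time

CACHE = {}        # stand-in for Redis: key -> (value, expires_at)
CACHE_TTL = 60    # seconds

def fetch_user_profile(user_id, query_replica):
    """Cache-aside read: try the cache first, fall back to a read replica."""
    key = f"user:{user_id}"
    entry = CACHE.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                        # cache hit: sub-millisecond
    value = query_replica(user_id)             # cache miss: hit a replica
    CACHE[key] = (value, time.time() + CACHE_TTL)
    return value
```

With Redis, the dict operations become `GET`/`SETEX` calls; the routing logic is unchanged.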
When to Use Read Replicas
- Read-heavy workloads: Applications where reads outnumber writes 10:1 or more
- Reporting and analytics: Run heavy analytical queries on a replica without impacting production writes
- Geographic distribution: Place replicas closer to users in different regions
- Backup and disaster recovery: A cross-region replica serves as a warm standby
For deeper scaling strategies, explore database sharding, replication architectures, and load balancing.
Frequently Asked Questions
Can I write to a read replica?
No, read replicas are read-only by design. Any write operation sent to a replica will be rejected. All writes must go to the primary database. If you need writes in multiple regions, consider multi-primary replication or a distributed database like CockroachDB.
How do I handle replication lag in critical reads?
Implement read-after-write consistency: after a write, route the user's reads to the primary for a short window (5-10 seconds). For less critical reads, accept the lag and read from replicas. Monitor lag with database-specific metrics and alert when lag exceeds your tolerance threshold.
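For PostgreSQL, lag can also be measured on the replica itself via `pg_last_xact_replay_timestamp()`. A minimal sketch of a freshness check, where `run_query` is a thin wrapper around your driver (psycopg2, asyncpg, etc.) that returns the first column of the first row:

```python
# On a PostgreSQL replica, this returns seconds behind the primary
# (NULL if no WAL has been replayed yet).
LAG_QUERY = "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())"

def replica_usable(lag_seconds, max_lag=5.0):
    """A replica is usable only if it reports lag at or below the threshold."""
    return lag_seconds is not None and lag_seconds <= max_lag

def check_replica(run_query, max_lag=5.0):
    """Query a replica for its lag and decide whether to route reads to it."""
    return replica_usable(run_query(LAG_QUERY), max_lag)
```

A router can call this periodically and drop lagging replicas from its rotation until they catch up.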
Should I use the same instance type for replicas as the primary?
Not necessarily. If replicas only serve read traffic, they may need less write I/O capacity. However, if a replica might be promoted to primary during failover, it should be the same instance type to handle the write workload. Start with the same type and optimize after observing actual usage.
How many read replicas do I need?
Start with one or two and add more based on read traffic. Monitor CPU, memory, and connection count on each replica. A replica of the same instance class can handle roughly the same read throughput as the primary, so if your primary handles 10,000 reads/second and you need 30,000, three replicas serving all reads should suffice.