Scalability Cheatsheet: Patterns, Numbers, and Formulas for System Design
This cheatsheet is your quick reference for all things scalability. From load estimation formulas to scaling patterns and capacity planning, it covers everything you need to design systems that handle millions of users. Use it alongside our System Design Interview Guide and General Cheat Sheets.
Horizontal vs Vertical Scaling
| Aspect | Vertical Scaling (Scale Up) | Horizontal Scaling (Scale Out) |
|---|---|---|
| Method | Add more CPU, RAM, disk to one machine | Add more machines |
| Limit | Hardware ceiling (largest available instance) | Virtually unlimited |
| Complexity | Simple (no code changes) | Complex (distributed systems) |
| Downtime | Required for upgrade (usually) | Zero downtime (rolling deploys) |
| Cost | Exponential (bigger machines cost disproportionately more) | Linear (add commodity machines) |
| Fault Tolerance | Single point of failure | Redundant (survive node failures) |
| When to Use | Small/medium scale, databases (initially) | Large scale, web servers, stateless services |
Load Estimation Formulas
// DAU to QPS Conversion
Daily Active Users (DAU) to QPS:
QPS = DAU × (actions per user per day) / 86,400
Example: 10M DAU, 10 actions/user/day
QPS = 10,000,000 × 10 / 86,400 = ~1,157 QPS
Peak QPS = 1,157 × 3 = ~3,500 QPS (3x peak factor)
// Read-Write Split
Total QPS = 3,500
Read:Write ratio = 10:1
Read QPS = 3,500 × (10/11) = ~3,182
Write QPS = 3,500 × (1/11) = ~318
// Concurrent Users (approximate)
Concurrent users ≈ DAU × avg_session_duration / 86,400
Example: 10M DAU × 300s avg session / 86,400 = ~34,700 concurrent
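The formulas above are easy to wire into a reusable sketch. This is an illustrative Python helper (the function names are our own, not from any library) that reproduces the worked numbers:

```python
# Load-estimation helpers mirroring the formulas above.
SECONDS_PER_DAY = 86_400

def avg_qps(dau: int, actions_per_user: float) -> float:
    """Average QPS from daily active users."""
    return dau * actions_per_user / SECONDS_PER_DAY

def peak_qps(avg: float, peak_factor: float = 3.0) -> float:
    """Peak QPS using a simple peak-to-average multiplier."""
    return avg * peak_factor

def read_write_split(total_qps: float, read_ratio: int = 10, write_ratio: int = 1):
    """Split total QPS by a read:write ratio, e.g. 10:1."""
    parts = read_ratio + write_ratio
    return total_qps * read_ratio / parts, total_qps * write_ratio / parts

def concurrent_users(dau: int, avg_session_seconds: float) -> float:
    """Approximate concurrent users from DAU and session length."""
    return dau * avg_session_seconds / SECONDS_PER_DAY

avg = avg_qps(10_000_000, 10)              # ~1,157 QPS
peak = peak_qps(avg)                       # ~3,472 QPS (rounds to ~3,500)
reads, writes = read_write_split(3_500)    # ~3,182 read / ~318 write
conc = concurrent_users(10_000_000, 300)   # ~34,722 concurrent
```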
Storage Estimation
// Storage per day
Daily storage = write_QPS × 86,400 × avg_record_size
Example: 318 writes/sec × 86,400 sec × 500 bytes
= 13.7 GB/day = ~5 TB/year
// 5-year storage
Total = 5 TB/year × 5 = 25 TB
With replication (3x) = 75 TB
With overhead (1.3x) = ~100 TB
// Common object sizes
Tweet/Post: 250 bytes (text only)
User profile: 1 KB
Image metadata: 200 bytes
Thumbnail: 10 KB
Photo (compressed): 200 KB
Video (1 min, 720p): 50 MB
Log entry: 200 bytes
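The storage chain (daily → yearly → 5-year → replication → overhead) can be sketched in a few lines. Helper names and the decimal GB/TB conventions (1 GB = 10^9 bytes) are our own choices:

```python
# Storage estimation, reproducing the worked numbers above.
SECONDS_PER_DAY = 86_400

def daily_storage_bytes(write_qps: float, avg_record_bytes: int) -> float:
    """Bytes written per day at a steady write QPS."""
    return write_qps * SECONDS_PER_DAY * avg_record_bytes

daily = daily_storage_bytes(318, 500)      # ~13.7 GB/day
yearly_tb = daily * 365 / 1e12             # ~5 TB/year
five_year_tb = yearly_tb * 5               # ~25 TB
with_replication = five_year_tb * 3        # ~75 TB
with_overhead = with_replication * 1.3     # ~98 TB, i.e. ~100 TB
```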
Bandwidth Estimation
// Bandwidth = QPS × avg size
Incoming (write): 318 QPS × 500 bytes = 159 KB/s = ~1.3 Mbps
Outgoing (read): 3,182 QPS × 2 KB = 6.4 MB/s = ~51 Mbps
// Image-heavy service
Image reads: 1,000 QPS × 200 KB = 200 MB/s = ~1.6 Gbps
(This is why CDNs are essential for media-heavy apps)
// Network capacity reference
1 Gbps link ≈ 125 MB/s throughput (actual ~80-100 MB/s)
10 Gbps link ≈ 1 GB/s throughput
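The bytes-to-bits conversion is where bandwidth math usually goes wrong (× 8). A one-function sketch of the calculations above, with a hypothetical helper name:

```python
def bandwidth_mbps(qps: float, avg_size_bytes: int) -> float:
    """QPS × average payload size, converted from bytes/s to megabits/s."""
    return qps * avg_size_bytes * 8 / 1e6

incoming = bandwidth_mbps(318, 500)        # ~1.3 Mbps (writes)
outgoing = bandwidth_mbps(3_182, 2_000)    # ~51 Mbps (reads)
images = bandwidth_mbps(1_000, 200_000)    # ~1,600 Mbps = ~1.6 Gbps
```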
Cache Estimation
// Pareto principle: 20% of data = 80% of requests
Cache size = daily_read_requests × 0.2 × avg_response_size
Example:
Daily reads = 3,182 QPS × 86,400 = 275M requests/day
Cache size = 275M × 0.2 × 2 KB = ~110 GB
// Redis instance sizing
Single Redis node: up to 25 GB comfortably (64 GB max practical)
For 110 GB: 5 Redis nodes (sharded or Redis Cluster)
// Cache hit rate targets
Good: > 90%
Excellent: > 95%
If < 80%: Review cache key strategy and TTL
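The Pareto sizing rule and the Redis node count fall out of two lines of arithmetic. A sketch, assuming the ~25 GB-per-node comfort limit quoted above:

```python
import math

SECONDS_PER_DAY = 86_400
REDIS_NODE_GB = 25  # comfortable per-node limit from the sizing note above

def cache_size_gb(read_qps: float, avg_response_bytes: int,
                  hot_fraction: float = 0.2) -> float:
    """Pareto sizing: cache the hot 20% of a day's reads."""
    daily_reads = read_qps * SECONDS_PER_DAY
    return daily_reads * hot_fraction * avg_response_bytes / 1e9

size = cache_size_gb(3_182, 2_000)           # ~110 GB
redis_nodes = math.ceil(size / REDIS_NODE_GB)  # 5 nodes
```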
Scaling Patterns
Read Replicas
| Pattern | How It Works | Scaling Limit |
|---|---|---|
| Primary-Replica | Writes go to primary; reads distributed across replicas | ~5-10 replicas before replication lag becomes an issue |
| Connection Pooling | Reuse database connections (PgBouncer, ProxySQL) | 10x more concurrent users per DB instance |
| Read-through Cache | Cache serves reads; DB only on cache miss | 90%+ read reduction on database |
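The read-through pattern in the last row is simple enough to sketch end to end. This toy version uses an in-memory dict with TTLs; `db_fetch` stands in for the real database call (both names are illustrative, not from any library):

```python
import time

class ReadThroughCache:
    """Serve reads from cache; hit the database only on a miss."""
    def __init__(self, db_fetch, ttl_seconds: float = 300):
        self._db_fetch = db_fetch   # key -> value; stands in for the DB
        self._ttl = ttl_seconds
        self._store = {}            # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]
        # Miss path: fetch from the DB and populate the cache.
        self.misses += 1
        value = self._db_fetch(key)
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value

cache = ReadThroughCache(db_fetch=lambda k: f"row-{k}")
cache.get("user:1")   # miss: goes to the DB
cache.get("user:1")   # hit: served from cache, DB untouched
```

At a 90%+ hit rate, only the miss path reaches the database, which is exactly the read reduction the table claims.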
Database Sharding
When a single database cannot handle the write load, partition data across multiple databases. See our Database Cheatsheet for sharding strategies.
| Strategy | How It Works | Trade-off |
|---|---|---|
| Hash-based | hash(key) % num_shards | Even distribution; resharding is expensive |
| Range-based | Partition by key ranges (A-M, N-Z) | Efficient range queries; potential hot spots |
| Geographic | Partition by user's region | Low latency for local users; cross-region queries hard |
| Consistent Hashing | Hash ring with virtual nodes | Minimal redistribution on shard changes |
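Consistent hashing is worth seeing concretely: keys and shards hash onto the same ring, and each key belongs to the next shard clockwise. A minimal sketch (class and method names are our own):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes: changing the shard set remaps only ~1/N keys."""
    def __init__(self, shards, vnodes: int = 100):
        self._ring = []  # sorted list of (hash, shard)
        for shard in shards:
            for i in range(vnodes):
                h = self._hash(f"{shard}#{i}")  # each shard gets many ring positions
                bisect.insort(self._ring, (h, shard))

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next virtual node."""
        h = self._hash(key)
        idx = bisect.bisect_right(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["db-0", "db-1", "db-2"])
shard = ring.shard_for("user:42")   # deterministic shard assignment
```

Contrast with `hash(key) % num_shards`: changing `num_shards` there remaps nearly every key, while the ring moves only the keys that fell on the departed shard's segments.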
Caching Layers
| Layer | Technology | Latency | What to Cache |
|---|---|---|---|
| Browser Cache | HTTP cache headers | 0 ms (local) | Static assets, API responses |
| CDN | CloudFront, Cloudflare | 5-20 ms (edge) | Images, videos, static pages |
| Application Cache | In-process (LRU map) | ~0.1 ms | Hot config, frequently accessed data |
| Distributed Cache | Redis, Memcached | 0.1-1 ms | Session data, DB query results, computed values |
| Database Cache | Query cache, buffer pool | 1-5 ms | Repeat queries, indexed data |
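The application-cache row ("in-process LRU map") is a few lines of code. A minimal sketch using an `OrderedDict`; in real services you would bound memory, not just entry count:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny in-process cache: evicts the least recently used entry at capacity."""
    def __init__(self, capacity: int):
        self._capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the LRU entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # touch "a", making "b" the LRU entry
cache.put("c", 3)    # capacity exceeded: evicts "b"
```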
Auto-Scaling Strategies
| Strategy | Trigger | Response Time | Best For |
|---|---|---|---|
| Reactive (Threshold) | CPU > 70% for 5 min | 2-5 minutes | General workloads |
| Predictive (Schedule) | Known traffic patterns | Proactive (ahead of demand) | Predictable peaks (morning rush) |
| Target Tracking | Maintain target metric (e.g., 1000 req/instance) | 1-3 minutes | Steady-state optimization |
| Queue-based | Queue depth > threshold | 2-5 minutes | Async processing workers |
// Auto-scaling best practices
Scale-out: Fast (60s cooldown) — add capacity quickly
Scale-in: Slow (300s cooldown) — avoid thrashing
// Capacity formula
Required instances = peak_QPS / QPS_per_instance
With headroom (1.5x): instances = (peak_QPS / QPS_per_instance) × 1.5
Minimum instances: 3 (across 3 AZs for fault tolerance)
Example:
Peak QPS = 12,000
QPS per instance = 2,000
Required = 12,000 / 2,000 × 1.5 = 9 instances
Across 3 AZs: min 3 per AZ = 9 instances (matches!)
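The capacity formula plus the AZ constraint fits in one function. A sketch (the rounding-to-AZ-multiples step is our own convention for keeping AZs balanced):

```python
import math

def required_instances(peak_qps: float, qps_per_instance: float,
                       headroom: float = 1.5, num_azs: int = 3) -> int:
    """Instances needed at peak with headroom, spread evenly across AZs."""
    n = math.ceil(peak_qps / qps_per_instance * headroom)
    n = max(n, num_azs)                        # at least one instance per AZ
    return math.ceil(n / num_azs) * num_azs    # round up to a multiple of num_azs

needed = required_instances(12_000, 2_000)  # 9 instances, 3 per AZ
small = required_instances(1_000, 2_000)    # floor kicks in: 3 (one per AZ)
```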
Capacity Planning Worksheet
System: _______________
Planning horizon: 1 year / 3 years / 5 years
USERS
- Current DAU: ___________
- Growth rate: ___% per year
- Projected DAU: ___________
TRAFFIC
- Actions/user/day: ___________
- Average QPS: ___________
- Peak QPS (3x): ___________
- Read:Write ratio: ___:___
STORAGE
- Avg record size: ___________ bytes
- New records/day: ___________
- Daily storage: ___________ GB
- Yearly storage: ___________ TB
- With replication: ___________ TB
COMPUTE
- QPS per instance: ___________
- Required instances: ___________
- With headroom: ___________
CACHE
- Cache hit target: ___%
- Cache size needed: ___________ GB
- Redis nodes: ___________
DATABASE
- Connections needed: ___________
- Read replicas: ___________
- Shards needed: ___________ (if applicable)
BANDWIDTH
- Incoming: ___________ Mbps
- Outgoing: ___________ Mbps
- CDN offload: ___%
Connection Pooling
| Parameter | Typical Value | Notes |
|---|---|---|
| Min pool size | 5-10 | Keep warm connections ready |
| Max pool size | 20-50 per instance | DB max_connections / app_instances |
| Connection timeout | 5-10 seconds | Time to wait for available connection |
| Idle timeout | 30-300 seconds | Return idle connections to pool |
| Max lifetime | 30-60 minutes | Recycle connections to clear stale state and leaked resources |
// PostgreSQL max connections formula
DB max_connections = 100 (default, increase to 200-500)
App instances = 10
Pool size per instance = max_connections / instances = 10-50
// With PgBouncer (connection multiplexer)
PgBouncer can handle 10,000+ client connections
while only using 100 actual DB connections
Improvement: 100x connection efficiency
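The pool-sizing formula above can be made explicit. A sketch with a hypothetical helper; the `reserve` parameter is our own addition, holding back a few connections for admin and monitoring sessions:

```python
def pool_size_per_instance(db_max_connections: int, app_instances: int,
                           reserve: int = 10) -> int:
    """Divide the DB connection budget evenly across app instances,
    reserving a few connections for admin/monitoring."""
    return max(1, (db_max_connections - reserve) // app_instances)

per_instance = pool_size_per_instance(500, 10)  # 49 connections per instance
```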
Async Processing Patterns
| Pattern | When to Use | Implementation |
|---|---|---|
| Message Queue | Decouple producer and consumer | Kafka, SQS, RabbitMQ |
| Event-Driven | React to state changes | EventBridge, Kafka, SNS |
| Batch Processing | Process large data volumes periodically | Spark, MapReduce, cron jobs |
| Stream Processing | Real-time data transformation | Kafka Streams, Flink, Kinesis |
| CQRS | Separate read/write optimization | Write DB + read-optimized view |
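The message-queue row is the foundation for the rest: the producer enqueues and returns immediately, and a worker drains the queue independently. A single-process sketch using the standard library; Kafka, SQS, or RabbitMQ replace `queue.Queue` in a real deployment:

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()
results = []

def worker():
    """Consumer: drain jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(f"processed {job}")

t = threading.Thread(target=worker, daemon=True)
t.start()

# Producer: enqueue and move on, never blocked on processing.
for i in range(3):
    jobs.put(f"email-{i}")
jobs.put(None)   # sentinel: tell the worker to stop
t.join()
```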
Performance Benchmarks and Thresholds
| Metric | Target | Alert Threshold |
|---|---|---|
| API response time (p50) | < 100 ms | > 200 ms |
| API response time (p99) | < 500 ms | > 1000 ms |
| Error rate (5xx) | < 0.1% | > 1% |
| CPU utilization | < 60% | > 80% |
| Memory utilization | < 70% | > 85% |
| DB query time | < 10 ms | > 50 ms |
| Cache hit rate | > 95% | < 80% |
| Queue depth | < 1000 | > 10,000 |
Explore API security and rate limiting strategies that complement scalability. Test configurations with our API and Network Tools and Security Crypto Tools. Visit swehelper.com/tools for all interactive tools.
Frequently Asked Questions
How do I know when to scale horizontally vs vertically?
Start with vertical scaling — it is simpler and works until you hit the hardware ceiling. Switch to horizontal when: (1) a single machine cannot handle the load, (2) you need fault tolerance across multiple nodes, (3) you need zero-downtime deploys, or (4) cost becomes prohibitive for larger machines. For stateless services like web servers, horizontal scaling is almost always the right default.
How many read replicas do I need?
Calculate: total_read_QPS / QPS_per_replica. A single PostgreSQL replica can handle about 5,000-10,000 simple reads/second. If you need 30,000 read QPS, you need 3-6 replicas. Add a caching layer first — it is more cost-effective. Replicas beyond 5-10 introduce significant replication lag. At that point, consider sharding or a different database architecture.
When should I shard my database?
Shard when: (1) write QPS exceeds what a single primary can handle, (2) data size exceeds single-machine storage/memory, or (3) query latency degrades due to data volume. First exhaust simpler options: optimize queries, add indexes, introduce caching, use read replicas, archive old data. Sharding adds significant operational complexity and should be a last resort. See the Database Cheatsheet for strategies.
What is the right cache TTL?
It depends on data freshness requirements. Common defaults: user profiles (5-15 minutes), product listings (1-5 minutes), configuration (1-24 hours), session data (30 minutes), search results (1-10 minutes). Start with short TTLs and increase as you gain confidence. Too short wastes cache capacity; too long serves stale data. Monitor cache hit rates to optimize.
How do I estimate costs for cloud infrastructure?
Rough AWS/Azure monthly estimates: compute instance (m5.xlarge) ~$140/month, RDS instance (db.r5.xlarge) ~$350/month, ElastiCache Redis (r5.large) ~$200/month, S3 storage ~$23/TB/month, data transfer ~$90/TB outbound. For a system serving 10M DAU, expect $5,000-$20,000/month depending on media storage and processing requirements. Use the cloud provider's pricing calculator for precise estimates.