Scalability Cheatsheet: Patterns, Numbers, and Formulas for System Design
This cheatsheet is your quick reference for all things scalability. From load estimation formulas to scaling patterns and capacity planning, it covers everything you need to design systems that handle millions of users. Use it alongside our System Design Interview Guide and General Cheat Sheets.
Horizontal vs Vertical Scaling
| Aspect | Vertical Scaling (Scale Up) | Horizontal Scaling (Scale Out) |
|---|---|---|
| Method | Add more CPU, RAM, disk to one machine | Add more machines |
| Limit | Hardware ceiling (largest available instance) | Virtually unlimited |
| Complexity | Simple (no code changes) | Complex (distributed systems) |
| Downtime | Required for upgrade (usually) | Zero downtime (rolling deploys) |
| Cost | Exponential (bigger machines cost disproportionately more) | Linear (add commodity machines) |
| Fault Tolerance | Single point of failure | Redundant (survive node failures) |
| When to Use | Small/medium scale, databases (initially) | Large scale, web servers, stateless services |
Load Estimation Formulas
// DAU to QPS Conversion
Daily Active Users (DAU) to QPS:
QPS = DAU × (actions per user per day) / 86,400
Example: 10M DAU, 10 actions/user/day
QPS = 10,000,000 × 10 / 86,400 = ~1,157 QPS
Peak QPS = 1,157 × 3 = ~3,500 QPS (3x peak factor)
// Read-Write Split
Total QPS = 3,500
Read:Write ratio = 10:1
Read QPS = 3,500 × (10/11) = ~3,182
Write QPS = 3,500 × (1/11) = ~318
// Concurrent Users (approximate)
Concurrent users ≈ DAU × avg_session_duration / 86,400
Example: 10M DAU × 300s avg session / 86,400 = ~34,700 concurrent
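The formulas above are easy to wire into a reusable sketch. This is an illustrative Python helper (the function names are our own, not from any library) that reproduces the worked numbers:

```python
# Load-estimation helpers mirroring the formulas above.
SECONDS_PER_DAY = 86_400

def avg_qps(dau: int, actions_per_user: float) -> float:
    """Average QPS from daily active users."""
    return dau * actions_per_user / SECONDS_PER_DAY

def peak_qps(avg: float, peak_factor: float = 3.0) -> float:
    """Peak QPS using a simple peak-to-average multiplier."""
    return avg * peak_factor

def read_write_split(total_qps: float, read_ratio: int = 10, write_ratio: int = 1):
    """Split total QPS by a read:write ratio, e.g. 10:1."""
    parts = read_ratio + write_ratio
    return total_qps * read_ratio / parts, total_qps * write_ratio / parts

def concurrent_users(dau: int, avg_session_seconds: float) -> float:
    """Approximate concurrent users from DAU and session length."""
    return dau * avg_session_seconds / SECONDS_PER_DAY

avg = avg_qps(10_000_000, 10)              # ~1,157 QPS
peak = peak_qps(avg)                       # ~3,472 QPS (rounds to ~3,500)
reads, writes = read_write_split(3_500)    # ~3,182 read / ~318 write
conc = concurrent_users(10_000_000, 300)   # ~34,722 concurrent
```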
Storage Estimation
// Storage per day
Daily storage = write_QPS × 86,400 × avg_record_size
Example: 318 writes/sec × 86,400 sec × 500 bytes
= 13.7 GB/day = ~5 TB/year
// 5-year storage
Total = 5 TB/year × 5 = 25 TB
With replication (3x) = 75 TB
With overhead (1.3x) = ~100 TB
// Common object sizes
Tweet/Post: 250 bytes (text only)
User profile: 1 KB
Image metadata: 200 bytes
Thumbnail: 10 KB
Photo (compressed): 200 KB
Video (1 min, 720p): 50 MB
Log entry: 200 bytes
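The storage chain (daily → yearly → 5-year → replication → overhead) can be sketched in a few lines. Helper names and the decimal GB/TB conventions (1 GB = 10^9 bytes) are our own choices:

```python
# Storage estimation, reproducing the worked numbers above.
SECONDS_PER_DAY = 86_400

def daily_storage_bytes(write_qps: float, avg_record_bytes: int) -> float:
    """Bytes written per day at a steady write QPS."""
    return write_qps * SECONDS_PER_DAY * avg_record_bytes

daily = daily_storage_bytes(318, 500)      # ~13.7 GB/day
yearly_tb = daily * 365 / 1e12             # ~5 TB/year
five_year_tb = yearly_tb * 5               # ~25 TB
with_replication = five_year_tb * 3        # ~75 TB
with_overhead = with_replication * 1.3     # ~98 TB, i.e. ~100 TB
```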
Bandwidth Estimation
// Bandwidth = QPS × avg size
Incoming (write): 318 QPS × 500 bytes = 159 KB/s = ~1.3 Mbps
Outgoing (read): 3,182 QPS × 2 KB = 6.4 MB/s = ~51 Mbps
// Image-heavy service
Image reads: 1,000 QPS × 200 KB = 200 MB/s = ~1.6 Gbps
(This is why CDNs are essential for media-heavy apps)
// Network capacity reference
1 Gbps link ≈ 125 MB/s throughput (actual ~80-100 MB/s)
10 Gbps link ≈ 1 GB/s throughput
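The bytes-to-bits conversion is where bandwidth math usually goes wrong (× 8). A one-function sketch of the calculations above, with a hypothetical helper name:

```python
def bandwidth_mbps(qps: float, avg_size_bytes: int) -> float:
    """QPS × average payload size, converted from bytes/s to megabits/s."""
    return qps * avg_size_bytes * 8 / 1e6

incoming = bandwidth_mbps(318, 500)        # ~1.3 Mbps (writes)
outgoing = bandwidth_mbps(3_182, 2_000)    # ~51 Mbps (reads)
images = bandwidth_mbps(1_000, 200_000)    # ~1,600 Mbps = ~1.6 Gbps
```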
Cache Estimation
// Pareto principle: 20% of data = 80% of requests
Cache size = daily_read_requests × 0.2 × avg_response_size
Example:
Daily reads = 3,182 QPS × 86,400 = 275M requests/day
Cache size = 275M × 0.2 × 2 KB = ~110 GB
// Redis instance sizing
Single Redis node: up to 25 GB comfortably (64 GB max practical)
For 110 GB: 5 Redis nodes (sharded or Redis Cluster)
// Cache hit rate targets
Good: > 90%
Excellent: > 95%
If < 80%: Review cache key strategy and TTL
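The Pareto sizing rule and the Redis node count fall out of two lines of arithmetic. A sketch, assuming the ~25 GB-per-node comfort limit quoted above:

```python
import math

SECONDS_PER_DAY = 86_400
REDIS_NODE_GB = 25  # comfortable per-node limit from the sizing note above

def cache_size_gb(read_qps: float, avg_response_bytes: int,
                  hot_fraction: float = 0.2) -> float:
    """Pareto sizing: cache the hot 20% of a day's reads."""
    daily_reads = read_qps * SECONDS_PER_DAY
    return daily_reads * hot_fraction * avg_response_bytes / 1e9

size = cache_size_gb(3_182, 2_000)           # ~110 GB
redis_nodes = math.ceil(size / REDIS_NODE_GB)  # 5 nodes
```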
Scaling Patterns
Read Replicas
| Pattern | How It Works | Scaling Limit |
|---|---|---|
| Primary-Replica | Writes go to primary; reads distributed across replicas | ~5-10 replicas before replication lag becomes an issue |
| Connection Pooling | Reuse database connections (PgBouncer, ProxySQL) | 10x more concurrent users per DB instance |
| Read-through Cache | Cache serves reads; DB only on cache miss | 90%+ read reduction on database |
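The read-through pattern in the last row is simple enough to sketch end to end. This toy version uses an in-memory dict with TTLs; `db_fetch` stands in for the real database call (both names are illustrative, not from any library):

```python
import time

class ReadThroughCache:
    """Serve reads from cache; hit the database only on a miss."""
    def __init__(self, db_fetch, ttl_seconds: float = 300):
        self._db_fetch = db_fetch   # key -> value; stands in for the DB
        self._ttl = ttl_seconds
        self._store = {}            # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]
        # Miss path: fetch from the DB and populate the cache.
        self.misses += 1
        value = self._db_fetch(key)
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value

cache = ReadThroughCache(db_fetch=lambda k: f"row-{k}")
cache.get("user:1")   # miss: goes to the DB
cache.get("user:1")   # hit: served from cache, DB untouched
```

At a 90%+ hit rate, only the miss path reaches the database, which is exactly the read reduction the table claims.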
Database Sharding
When a single database cannot handle the write load, partition data across multiple databases. See our Database Cheatsheet for sharding strategies.
| Strategy | How It Works | Trade-off |
|---|---|---|
| Hash-based | hash(key) % num_shards | Even distribution; resharding is expensive |
| Range-based | Partition by key ranges (A-M, N-Z) | Efficient range queries; potential hot spots |
| Geographic | Partition by user's region | Low latency for local users; cross-region queries hard |
| Consistent Hashing | Hash ring with virtual nodes | Minimal redistribution on shard changes |
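Consistent hashing is worth seeing concretely: keys and shards hash onto the same ring, and each key belongs to the next shard clockwise. A minimal sketch (class and method names are our own):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes: changing the shard set remaps only ~1/N keys."""
    def __init__(self, shards, vnodes: int = 100):
        self._ring = []  # sorted list of (hash, shard)
        for shard in shards:
            for i in range(vnodes):
                h = self._hash(f"{shard}#{i}")  # each shard gets many ring positions
                bisect.insort(self._ring, (h, shard))

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next virtual node."""
        h = self._hash(key)
        idx = bisect.bisect_right(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["db-0", "db-1", "db-2"])
shard = ring.shard_for("user:42")   # deterministic shard assignment
```

Contrast with `hash(key) % num_shards`: changing `num_shards` there remaps nearly every key, while the ring moves only the keys that fell on the departed shard's segments.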
Caching Layers
| Layer | Technology | Latency | What to Cache |
|---|---|---|---|
| Browser Cache | HTTP cache headers | 0 ms (local) | Static assets, API responses |
| CDN | CloudFront, Cloudflare | 5-20 ms (edge) | Images, videos, static pages |
| Application Cache | In-process (LRU map) | ~0.1 ms | Hot config, frequently accessed data |
| Distributed Cache | Redis, Memcached | 0.1-1 ms | Session data, DB query results, computed values |
| Database Cache | Query cache, buffer pool | 1-5 ms | Repeat queries, indexed data |
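The application-cache row ("in-process LRU map") is a few lines of code. A minimal sketch using an `OrderedDict`; in real services you would bound memory, not just entry count:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny in-process cache: evicts the least recently used entry at capacity."""
    def __init__(self, capacity: int):
        self._capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the LRU entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # touch "a", making "b" the LRU entry
cache.put("c", 3)    # capacity exceeded: evicts "b"
```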
Auto-Scaling Strategies
| Strategy | Trigger | Response Time | Best For |
|---|---|---|---|
| Reactive (Threshold) | CPU > 70% for 5 min | 2-5 minutes | General workloads |
| Predictive (Schedule) | Known traffic patterns | Proactive (ahead of demand) | Predictable peaks (morning rush) |
| Target Tracking | Maintain target metric (e.g., 1000 req/instance) | 1-3 minutes | Steady-state optimization |
| Queue-based | Queue depth > threshold | 2-5 minutes | Async processing workers |
// Auto-scaling best practices
Scale-out: Fast (60s cooldown) — add capacity quickly
Scale-in: Slow (300s cooldown) — avoid thrashing
// Capacity formula
Required instances = peak_QPS / QPS_per_instance
With headroom (1.5x): instances = (peak_QPS / QPS_per_instance) × 1.5
Minimum instances: 3 (across 3 AZs for fault tolerance)
Example:
Peak QPS = 12,000
QPS per instance = 2,000
Required = 12,000 / 2,000 × 1.5 = 9 instances
Across 3 AZs: min 3 per AZ = 9 instances (matches!)
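The capacity formula plus the AZ constraint fits in one function. A sketch (the rounding-to-AZ-multiples step is our own convention for keeping AZs balanced):

```python
import math

def required_instances(peak_qps: float, qps_per_instance: float,
                       headroom: float = 1.5, num_azs: int = 3) -> int:
    """Instances needed at peak with headroom, spread evenly across AZs."""
    n = math.ceil(peak_qps / qps_per_instance * headroom)
    n = max(n, num_azs)                        # at least one instance per AZ
    return math.ceil(n / num_azs) * num_azs    # round up to a multiple of num_azs

needed = required_instances(12_000, 2_000)  # 9 instances, 3 per AZ
small = required_instances(1_000, 2_000)    # floor kicks in: 3 (one per AZ)
```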
Capacity Planning Worksheet
System: _______________
Planning horizon: 1 year / 3 years / 5 years
USERS
- Current DAU: ___________
- Growth rate: ___% per year
- Projected DAU: ___________
TRAFFIC
- Actions/user/day: ___________
- Average QPS: ___________
- Peak QPS (3x): ___________
- Read:Write ratio: ___:___
STORAGE
- Avg record size: ___________ bytes
- New records/day: ___________
- Daily storage: ___________ GB
- Yearly storage: ___________ TB
- With replication: ___________ TB
COMPUTE
- QPS per instance: ___________
- Required instances: ___________
- With headroom: ___________
CACHE
- Cache hit target: ___%
- Cache size needed: ___________ GB
- Redis nodes: ___________
DATABASE
- Connections needed: ___________
- Read replicas: ___________
- Shards needed: ___________ (if applicable)
BANDWIDTH
- Incoming: ___________ Mbps
- Outgoing: ___________ Mbps
- CDN offload: ___%
Connection Pooling
| Parameter | Typical Value | Notes |
|---|---|---|
| Min pool size | 5-10 | Keep warm connections ready |
| Max pool size | 20-50 per instance | DB max_connections / app_instances |
| Connection timeout | 5-10 seconds | Time to wait for available connection |
| Idle timeout | 30-300 seconds | Return idle connections to pool |
| Max lifetime | 30-60 minutes | Recycle connections to clear stale state and leaked resources |
// PostgreSQL max connections formula
DB max_connections = 100 (default, increase to 200-500)
App instances = 10
Pool size per instance = max_connections / instances = 10-50
// With PgBouncer (connection multiplexer)
PgBouncer can handle 10,000+ client connections
while only using 100 actual DB connections
Improvement: 100x connection efficiency
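The pool-sizing formula above can be made explicit. A sketch with a hypothetical helper; the `reserve` parameter is our own addition, holding back a few connections for admin and monitoring sessions:

```python
def pool_size_per_instance(db_max_connections: int, app_instances: int,
                           reserve: int = 10) -> int:
    """Divide the DB connection budget evenly across app instances,
    reserving a few connections for admin/monitoring."""
    return max(1, (db_max_connections - reserve) // app_instances)

per_instance = pool_size_per_instance(500, 10)  # 49 connections per instance
```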
Async Processing Patterns
| Pattern | When to Use | Implementation |
|---|---|---|
| Message Queue | Decouple producer and consumer | Kafka, SQS, RabbitMQ |
| Event-Driven | React to state changes | EventBridge, Kafka, SNS |
| Batch Processing | Process large data volumes periodically | Spark, MapReduce, cron jobs |
| Stream Processing | Real-time data transformation | Kafka Streams, Flink, Kinesis |
| CQRS | Separate read/write optimization | Write DB + read-optimized view |
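The message-queue row is the foundation for the rest: the producer enqueues and returns immediately, and a worker drains the queue independently. A single-process sketch using the standard library; Kafka, SQS, or RabbitMQ replace `queue.Queue` in a real deployment:

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()
results = []

def worker():
    """Consumer: drain jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(f"processed {job}")

t = threading.Thread(target=worker, daemon=True)
t.start()

# Producer: enqueue and move on, never blocked on processing.
for i in range(3):
    jobs.put(f"email-{i}")
jobs.put(None)   # sentinel: tell the worker to stop
t.join()
```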
Performance Benchmarks and Thresholds
| Metric | Target | Alert Threshold |
|---|---|---|
| API response time (p50) | < 100 ms | > 200 ms |
| API response time (p99) | < 500 ms | > 1000 ms |
| Error rate (5xx) | < 0.1% | > 1% |
| CPU utilization | < 60% | > 80% |
| Memory utilization | < 70% | > 85% |
| DB query time | < 10 ms | > 50 ms |
| Cache hit rate | > 95% | < 80% |
| Queue depth | < 1000 | > 10,000 |
Explore API security and rate limiting strategies that complement scalability. Test configurations with our API and Network Tools and Security Crypto Tools. Visit swehelper.com/tools for all interactive tools.
Frequently Asked Questions
How do I know when to scale horizontally vs vertically?
Start with vertical scaling — it is simpler and works until you hit the hardware ceiling. Switch to horizontal when: (1) a single machine cannot handle the load, (2) you need fault tolerance across multiple nodes, (3) you need zero-downtime deploys, or (4) cost becomes prohibitive for larger machines. For stateless services like web servers, horizontal scaling is almost always the right default.
How many read replicas do I need?
Calculate: total_read_QPS / QPS_per_replica. A single PostgreSQL replica can handle about 5,000-10,000 simple reads/second. If you need 30,000 read QPS, you need 3-6 replicas. Add a caching layer first — it is more cost-effective. Replicas beyond 5-10 introduce significant replication lag. At that point, consider sharding or a different database architecture.
When should I shard my database?
Shard when: (1) write QPS exceeds what a single primary can handle, (2) data size exceeds single-machine storage/memory, or (3) query latency degrades due to data volume. First exhaust simpler options: optimize queries, add indexes, introduce caching, use read replicas, archive old data. Sharding adds significant operational complexity and should be a last resort. See the Database Cheatsheet for strategies.
What is the right cache TTL?
It depends on data freshness requirements. Common defaults: user profiles (5-15 minutes), product listings (1-5 minutes), configuration (1-24 hours), session data (30 minutes), search results (1-10 minutes). Start with short TTLs and increase as you gain confidence. Too short wastes cache capacity; too long serves stale data. Monitor cache hit rates to optimize.
How do I estimate costs for cloud infrastructure?
Rough AWS/Azure monthly estimates: compute instance (m5.xlarge) ~$140/month, RDS instance (db.r5.xlarge) ~$350/month, ElastiCache Redis (r5.large) ~$200/month, S3 storage ~$23/TB/month, data transfer ~$90/TB outbound. For a system serving 10M DAU, expect $5,000-$20,000/month depending on media storage and processing requirements. Use the cloud provider's pricing calculator for precise estimates.