Scalability Cheatsheet: Patterns, Numbers, and Formulas for System Design

This cheatsheet is your quick reference for all things scalability. From load estimation formulas to scaling patterns and capacity planning, everything you need to design systems that handle millions of users. Use this alongside our System Design Interview Guide and General Cheat Sheets.

Horizontal vs Vertical Scaling

| Aspect | Vertical Scaling (Scale Up) | Horizontal Scaling (Scale Out) |
| --- | --- | --- |
| Method | Add more CPU, RAM, and disk to one machine | Add more machines |
| Limit | Hardware ceiling (largest available instance) | Virtually unlimited |
| Complexity | Simple (no code changes) | Complex (distributed systems) |
| Downtime | Usually required for upgrades | Zero downtime (rolling deploys) |
| Cost | Exponential (bigger machines cost disproportionately more) | Linear (add commodity machines) |
| Fault Tolerance | Single point of failure | Redundant (survives node failures) |
| When to Use | Small/medium scale, databases (initially) | Large scale, web servers, stateless services |

Load Estimation Formulas

```
// DAU to QPS conversion
QPS = DAU × (actions per user per day) / 86,400

Example: 10M DAU, 10 actions/user/day
  QPS = 10,000,000 × 10 / 86,400 = ~1,157 QPS
  Peak QPS = 1,157 × 3 = ~3,500 QPS (3x peak factor)

// Read-write split
Total QPS = 3,500
Read:Write ratio = 10:1
  Read QPS  = 3,500 × (10/11) = ~3,182
  Write QPS = 3,500 × (1/11)  = ~318

// Concurrent users (approximate)
Concurrent users ≈ DAU × avg_session_duration / 86,400
Example: 10M DAU × 300s avg session / 86,400 = ~34,700 concurrent
```
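The conversions above fit into a small back-of-envelope calculator. A minimal Python sketch using the example numbers from this section (function names are illustrative, not a library API):

```python
# Back-of-envelope load estimation, using the example numbers above.
SECONDS_PER_DAY = 86_400

def avg_qps(dau: int, actions_per_user_per_day: float) -> float:
    """Average queries per second implied by daily active users."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY

def peak_qps(average: float, peak_factor: float = 3.0) -> float:
    """Peak load, assuming traffic peaks at ~3x the daily average."""
    return average * peak_factor

def concurrent_users(dau: int, avg_session_seconds: float) -> float:
    """Approximate number of users online at any moment."""
    return dau * avg_session_seconds / SECONDS_PER_DAY

avg = avg_qps(10_000_000, 10)               # ~1,157 QPS
peak = peak_qps(avg)                        # ~3,472 QPS
online = concurrent_users(10_000_000, 300)  # ~34,722 concurrent
```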

Storage Estimation

```
// Storage per day
Daily storage = write_QPS × 86,400 × avg_record_size

Example: 318 writes/sec × 86,400 sec × 500 bytes
= 13.7 GB/day = ~5 TB/year

// 5-year storage
Total = 5 TB/year × 5 = 25 TB
With replication (3x) = 75 TB
With overhead (1.3x) = ~100 TB

// Common object sizes
Tweet/Post:           250 bytes (text only)
User profile:         1 KB
Image metadata:       200 bytes
Thumbnail:            10 KB
Photo (compressed):   200 KB
Video (1 min, 720p):  50 MB
Log entry:            200 bytes
```
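The same storage arithmetic as a Python sketch, with the section's example inputs (decimal GB/TB, which is how the rounded figures above were derived):

```python
SECONDS_PER_DAY = 86_400

def daily_storage_gb(write_qps: float, avg_record_bytes: float) -> float:
    """New storage generated per day, in decimal GB."""
    return write_qps * SECONDS_PER_DAY * avg_record_bytes / 1e9

daily = daily_storage_gb(318, 500)       # ~13.7 GB/day
yearly_tb = daily * 365 / 1_000          # ~5 TB/year
five_year_tb = yearly_tb * 5             # ~25 TB raw
provisioned_tb = five_year_tb * 3 * 1.3  # 3x replication + 1.3x overhead → ~98 TB
```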

Bandwidth Estimation

```
// Bandwidth = QPS × avg size
Incoming (write): 318 QPS × 500 bytes = 159 KB/s = ~1.3 Mbps
Outgoing (read):  3,182 QPS × 2 KB = 6.4 MB/s = ~51 Mbps

// Image-heavy service
Image reads: 1,000 QPS × 200 KB = 200 MB/s = ~1.6 Gbps
(This is why CDNs are essential for media-heavy apps)

// Network capacity reference
1 Gbps link ≈ 125 MB/s theoretical throughput (actual ~80-100 MB/s)
10 Gbps link ≈ 1 GB/s throughput
```
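The bytes-to-bits conversion is where these estimates usually go wrong, so it helps to encode it once. A small sketch (illustrative helper, not a library function):

```python
def bandwidth_mbps(qps: float, avg_bytes: float) -> float:
    """Sustained bandwidth in megabits per second (1 byte = 8 bits)."""
    return qps * avg_bytes * 8 / 1e6

incoming = bandwidth_mbps(318, 500)                  # ~1.3 Mbps (writes)
outgoing = bandwidth_mbps(3_182, 2_000)              # ~51 Mbps (reads)
image_gbps = bandwidth_mbps(1_000, 200_000) / 1_000  # ~1.6 Gbps (image-heavy)
```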

Cache Estimation

```
// Pareto principle: 20% of data serves 80% of requests
Cache size = daily_read_requests × 0.2 × avg_response_size

Example:
Daily reads = 3,182 QPS × 86,400 = 275M requests/day
Cache size = 275M × 0.2 × 2 KB = ~110 GB

// Redis instance sizing
Single Redis node: up to 25 GB comfortably (64 GB max practical)
For 110 GB: 5 Redis nodes (sharded or Redis Cluster)

// Cache hit rate targets
Good:      > 90%
Excellent: > 95%
If < 80%:  Review cache key strategy and TTL
```
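The cache sizing above, sketched in Python with the section's example numbers (the 25 GB-per-node figure is the rule of thumb from this section, not a Redis limit):

```python
import math

SECONDS_PER_DAY = 86_400

def cache_size_gb(read_qps: float, hot_fraction: float, avg_item_bytes: float) -> float:
    """Cache sized for the hot subset of a day's reads (Pareto-style estimate)."""
    daily_reads = read_qps * SECONDS_PER_DAY
    return daily_reads * hot_fraction * avg_item_bytes / 1e9

size = cache_size_gb(3_182, 0.2, 2_000)  # ~110 GB
nodes = math.ceil(size / 25)             # 5 nodes at ~25 GB each
```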

Scaling Patterns

Read Replicas

| Pattern | How It Works | Scaling Limit |
| --- | --- | --- |
| Primary-Replica | Writes go to the primary; reads are distributed across replicas | ~5-10 replicas before replication lag becomes an issue |
| Connection Pooling | Reuse database connections (PgBouncer, ProxySQL) | ~10x more concurrent users per DB instance |
| Read-through Cache | Cache serves reads; DB is hit only on cache miss | 90%+ read reduction on the database |

Database Sharding

When a single database cannot handle the write load, partition data across multiple databases. See our Database Cheatsheet for sharding strategies.

| Strategy | How It Works | Trade-off |
| --- | --- | --- |
| Hash-based | `hash(key) % num_shards` | Even distribution; resharding is expensive |
| Range-based | Partition by key ranges (A-M, N-Z) | Efficient range queries; potential hot spots |
| Geographic | Partition by user's region | Low latency for local users; cross-region queries are hard |
| Consistent Hashing | Hash ring with virtual nodes | Minimal redistribution when shards change |
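To see why hash-mod resharding is expensive, the sketch below (illustrative; MD5 is used only as a stable hash, since Python's built-in `hash()` is randomized per process) routes keys to shards and counts how many keys move when one shard is added:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Hash-based routing: hash(key) % num_shards, with a stable hash."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Going from 4 to 5 shards remaps roughly n/(n+1) ≈ 80% of keys,
# each of which must be physically migrated — hence "resharding is expensive".
keys = [f"user:{i}" for i in range(1_000)]
moved = sum(shard_for(k, 4) != shard_for(k, 5) for k in keys)
```

Consistent hashing avoids this by remapping only the keys adjacent to the new node on the hash ring, roughly 1/n of the total.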

Caching Layers

| Layer | Technology | Latency | What to Cache |
| --- | --- | --- | --- |
| Browser Cache | HTTP cache headers | ~0 ms (local) | Static assets, API responses |
| CDN | CloudFront, Cloudflare | 5-20 ms (edge) | Images, videos, static pages |
| Application Cache | In-process (LRU map) | ~0.1 ms | Hot config, frequently accessed data |
| Distributed Cache | Redis, Memcached | 0.1-1 ms | Session data, DB query results, computed values |
| Database Cache | Query cache, buffer pool | 1-5 ms | Repeat queries, indexed data |
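The "in-process (LRU map)" layer is often just a few lines of code. A minimal, non-thread-safe Python sketch using `OrderedDict`:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal in-process LRU cache. Illustrative only: not thread-safe, no TTL."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touching "a" makes "b" the eviction candidate
cache.put("c", 3)  # evicts "b"
```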

Auto-Scaling Strategies

| Strategy | Trigger | Response Time | Best For |
| --- | --- | --- | --- |
| Reactive (Threshold) | CPU > 70% for 5 min | 2-5 minutes | General workloads |
| Predictive (Schedule) | Known traffic patterns | Proactive (ahead of demand) | Predictable peaks (morning rush) |
| Target Tracking | Maintain a target metric (e.g., 1,000 req/instance) | 1-3 minutes | Steady-state optimization |
| Queue-based | Queue depth > threshold | 2-5 minutes | Async processing workers |

```
// Auto-scaling best practices
Scale-out: Fast (60s cooldown) — add capacity quickly
Scale-in:  Slow (300s cooldown) — avoid thrashing

// Capacity formula
Required instances = peak_QPS / QPS_per_instance
With headroom (1.5x): instances = (peak_QPS / QPS_per_instance) × 1.5
Minimum instances: 3 (across 3 AZs for fault tolerance)

Example:
Peak QPS = 12,000
QPS per instance = 2,000
Required = 12,000 / 2,000 × 1.5 = 9 instances
Across 3 AZs: 3 per AZ = 9 instances (matches the 3-AZ minimum spread)
```
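The capacity formula above, as a small Python helper (the AZ floor and the `ceil` rounding are the conventions stated in this section):

```python
import math

def required_instances(peak_qps: float, qps_per_instance: float,
                       headroom: float = 1.5, azs: int = 3) -> int:
    """Instance count with headroom, floored at one instance per AZ."""
    needed = math.ceil(peak_qps / qps_per_instance * headroom)
    return max(needed, azs)

required_instances(12_000, 2_000)  # 9 instances (3 per AZ)
required_instances(1_000, 2_000)   # 3 — the AZ minimum dominates at low load
```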

Capacity Planning Worksheet

```
System: _______________
Planning horizon: 1 year / 3 years / 5 years

USERS
- Current DAU:        ___________
- Growth rate:        ___% per year
- Projected DAU:      ___________

TRAFFIC
- Actions/user/day:   ___________
- Average QPS:        ___________
- Peak QPS (3x):      ___________
- Read:Write ratio:   ___:___

STORAGE
- Avg record size:    ___________ bytes
- New records/day:    ___________
- Daily storage:      ___________ GB
- Yearly storage:     ___________ TB
- With replication:   ___________ TB

COMPUTE
- QPS per instance:   ___________
- Required instances: ___________
- With headroom:      ___________

CACHE
- Cache hit target:   ___%
- Cache size needed:  ___________ GB
- Redis nodes:        ___________

DATABASE
- Connections needed: ___________
- Read replicas:      ___________
- Shards needed:      ___________ (if applicable)

BANDWIDTH
- Incoming:           ___________ Mbps
- Outgoing:           ___________ Mbps
- CDN offload:        ___%
```

Connection Pooling

| Parameter | Typical Value | Notes |
| --- | --- | --- |
| Min pool size | 5-10 | Keep warm connections ready |
| Max pool size | 20-50 per instance | ≈ DB max_connections / app instances |
| Connection timeout | 5-10 seconds | Time to wait for an available connection |
| Idle timeout | 30-300 seconds | Return idle connections to the pool |
| Max lifetime | 30-60 minutes | Recycle connections to prevent leaks |

```
// PostgreSQL max_connections sizing
DB max_connections = 100 (default; commonly increased to 200-500)
App instances = 10
Pool size per instance = max_connections / instances = 10-50

// With PgBouncer (connection multiplexer)
PgBouncer can handle 10,000+ client connections
while using only ~100 actual DB connections
Improvement: ~100x connection efficiency
```
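A quick sizing helper for the formula above (the `reserved` parameter, which holds back a few connections for admin and monitoring sessions, is an assumption of this sketch, not a PostgreSQL default):

```python
def pool_size_per_instance(db_max_connections: int, app_instances: int,
                           reserved: int = 10) -> int:
    """Per-instance pool size that keeps total app connections under the DB limit.

    `reserved` is an illustrative buffer for admin/monitoring sessions.
    """
    return (db_max_connections - reserved) // app_instances

pool_size_per_instance(200, 10)  # 19 connections per app instance
```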

Async Processing Patterns

| Pattern | When to Use | Implementation |
| --- | --- | --- |
| Message Queue | Decouple producer and consumer | Kafka, SQS, RabbitMQ |
| Event-Driven | React to state changes | EventBridge, Kafka, SNS |
| Batch Processing | Process large data volumes periodically | Spark, MapReduce, cron jobs |
| Stream Processing | Real-time data transformation | Kafka Streams, Flink, Kinesis |
| CQRS | Separate read/write optimization | Write DB + read-optimized view |
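The message-queue pattern in miniature: the producer enqueues work and moves on while a consumer drains the queue. A single-process sketch with Python's standard `queue` and `threading` modules (a real system would use a broker like SQS or RabbitMQ):

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()
results = []

def worker():
    """Consumer: process tasks until a None sentinel signals shutdown."""
    while True:
        item = tasks.get()
        if item is None:
            break
        results.append(item * 2)  # stand-in for real work
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

for i in range(5):  # producer enqueues work without waiting for each result
    tasks.put(i)
tasks.join()        # block until every enqueued task is processed
tasks.put(None)     # sentinel: tell the worker to exit
t.join()
```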

Performance Benchmarks and Thresholds

| Metric | Target | Alert Threshold |
| --- | --- | --- |
| API response time (p50) | < 100 ms | > 200 ms |
| API response time (p99) | < 500 ms | > 1,000 ms |
| Error rate (5xx) | < 0.1% | > 1% |
| CPU utilization | < 60% | > 80% |
| Memory utilization | < 70% | > 85% |
| DB query time | < 10 ms | > 50 ms |
| Cache hit rate | > 95% | < 80% |
| Queue depth | < 1,000 | > 10,000 |

Explore API security and rate limiting strategies that complement scalability. Test configurations with our API and Network Tools and Security Crypto Tools. Visit swehelper.com/tools for all interactive tools.

Frequently Asked Questions

How do I know when to scale horizontally vs vertically?

Start with vertical scaling — it is simpler and works until you hit the hardware ceiling. Switch to horizontal when: (1) a single machine cannot handle the load, (2) you need fault tolerance across multiple nodes, (3) you need zero-downtime deploys, or (4) cost becomes prohibitive for larger machines. For stateless services like web servers, horizontal scaling is almost always the right default.

How many read replicas do I need?

Calculate: total_read_QPS / QPS_per_replica. A single PostgreSQL replica can handle about 5,000-10,000 simple reads/second. If you need 30,000 read QPS, you need 3-6 replicas. Add a caching layer first — it is more cost-effective. Replicas beyond 5-10 introduce significant replication lag. At that point, consider sharding or a different database architecture.

When should I shard my database?

Shard when: (1) write QPS exceeds what a single primary can handle, (2) data size exceeds single-machine storage/memory, or (3) query latency degrades due to data volume. First exhaust simpler options: optimize queries, add indexes, introduce caching, use read replicas, archive old data. Sharding adds significant operational complexity and should be a last resort. See the Database Cheatsheet for strategies.

What is the right cache TTL?

It depends on data freshness requirements. Common defaults: user profiles (5-15 minutes), product listings (1-5 minutes), configuration (1-24 hours), session data (30 minutes), search results (1-10 minutes). Start with short TTLs and increase as you gain confidence. Too short wastes cache capacity; too long serves stale data. Monitor cache hit rates to optimize.

How do I estimate costs for cloud infrastructure?

Rough AWS/Azure monthly estimates: compute instance (m5.xlarge) ~$140/month, RDS instance (db.r5.xlarge) ~$350/month, ElastiCache Redis (r5.large) ~$200/month, S3 storage ~$23/TB/month, data transfer ~$90/TB outbound. For a system serving 10M DAU, expect $5,000-$20,000/month depending on media storage and processing requirements. Use the cloud provider's pricing calculator for precise estimates.
