
System Design Cheat Sheets: Quick Reference for Every Key Concept

This is your one-stop quick reference for system design interviews. Every formula, number, pattern, and decision framework you need — condensed into scannable tables and lists. Bookmark this page and review it before your interview. For deeper coverage, see our System Design Interview Guide and Common Questions.

Numbers Every Engineer Should Know

| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | 0.5 ns | Fastest memory access |
| L2 cache reference | 7 ns | 14x L1 |
| Main memory (RAM) | 100 ns | 200x L1 |
| SSD random read | 150 μs | ~1,500x RAM |
| HDD seek | 10 ms | ~100,000x RAM |
| Network round trip (same DC) | 500 μs | 0.5 ms |
| Network round trip (cross-continent) | 150 ms | Use a CDN to reduce |
| Read 1 MB from SSD | 1 ms | ~1 GB/s throughput |
| Read 1 MB from network (1 Gbps) | 10 ms | ~100 MB/s |
| Redis GET | 0.1-0.2 ms | In-memory, very fast |
| Simple DB query (indexed) | 1-5 ms | With warm cache |

Back-of-Envelope Estimation Formulas

// Time conversions
1 day    = 86,400 seconds ≈ 100,000 seconds (for estimation)
1 month  = 2.5 million seconds
1 year   = 31.5 million seconds

// QPS (Queries Per Second)
QPS = DAU × (avg queries per user per day) / 86,400
Peak QPS = QPS × 2-3 (peak factor)
Write QPS = QPS × write_ratio

// Storage
Storage per year = daily_new_records × 365 × avg_record_size
Total storage (5 years) = Storage per year × 5

// Bandwidth
Incoming BW = write_QPS × avg_request_size
Outgoing BW = read_QPS × avg_response_size

// Cache (80/20 rule)
Cache size = daily_read_requests × 0.2 × avg_response_size
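The formulas above can be sketched as a small script. All the inputs here (10M DAU, 10 reads and 1 write per user per day, 1M new 1 KB records daily) are illustrative numbers, not from any real system:

```python
# Back-of-envelope sketch for a hypothetical service with 10M DAU.
# Every input below is an assumed, illustrative number.

SECONDS_PER_DAY = 86_400

def qps(dau: int, actions_per_user_per_day: float) -> float:
    return dau * actions_per_user_per_day / SECONDS_PER_DAY

dau = 10_000_000
read_qps = qps(dau, 10)        # ~1,157 QPS
write_qps = qps(dau, 1)        # ~116 QPS
peak_read_qps = read_qps * 3   # peak factor of 3

# Storage: 1M new records/day at 1 KB each
storage_per_year = 1_000_000 * 365 * 1_024   # bytes, ~0.37 TB

# Cache (80/20 rule): 20% of daily reads at 1 KB per response
cache_size = dau * 10 * 0.2 * 1_024          # bytes, ~20 GB
```

In an interview, round aggressively (86,400 ≈ 100,000) and sanity-check the result against the bandwidth formulas above.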

Power of 2 Quick Reference

| Power | Exact | Approx | Name |
|---|---|---|---|
| 2^10 | 1,024 | 1 Thousand | 1 KB |
| 2^20 | 1,048,576 | 1 Million | 1 MB |
| 2^30 | 1,073,741,824 | 1 Billion | 1 GB |
| 2^40 | 1,099,511,627,776 | 1 Trillion | 1 TB |
| 2^50 | 1,125,899,906,842,624 | 1 Quadrillion | 1 PB |

CAP Theorem Quick Reference

| Property | Meaning | Example |
|---|---|---|
| Consistency | All nodes see the same data at the same time | Bank transactions |
| Availability | Every request receives a response | Social media feed |
| Partition Tolerance | System works despite network partitions | Required in distributed systems |

| Choose | Trade-off | Databases |
|---|---|---|
| CP | May be unavailable during a partition | MongoDB, HBase, Redis |
| AP | May return stale data during a partition | Cassandra, DynamoDB, CouchDB |
| CA | Not partition tolerant (single node only) | Traditional RDBMS (single-node PostgreSQL) |

Consistency Patterns

| Pattern | Guarantee | Use Case |
|---|---|---|
| Strong Consistency | Reads always return the latest write | Banking, inventory |
| Eventual Consistency | Reads eventually return the latest write | Social feeds, analytics |
| Read-your-writes | User sees their own writes immediately | User profile updates |
| Causal Consistency | Causally related writes seen in order | Comment threads |

Caching Strategies

| Strategy | How It Works | Best For |
|---|---|---|
| Cache-Aside (Lazy) | App reads cache first; on miss, reads DB and populates cache | Read-heavy, general purpose |
| Write-Through | Write to cache and DB simultaneously | When data freshness is critical |
| Write-Behind (Back) | Write to cache; async write to DB later | Write-heavy with eventual consistency OK |
| Write-Around | Write directly to DB; cache populated on read | Data rarely re-read after write |
| Read-Through | Cache loads from DB on miss transparently | Simplified application code |
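Cache-aside, the most common default, can be sketched in a few lines. Here a plain dict stands in for Redis, and `fetch_from_db` is a hypothetical loader, not a real client call:

```python
# Cache-aside sketch: the application owns the cache logic.
# A dict stands in for Redis; fetch_from_db is a hypothetical DB loader.

cache: dict[str, str] = {}

def fetch_from_db(key: str) -> str:
    return f"db-value-for-{key}"   # placeholder for a real query

def get(key: str) -> str:
    if key in cache:               # 1. try the cache first
        return cache[key]
    value = fetch_from_db(key)     # 2. on miss, read the DB
    cache[key] = value             # 3. populate the cache for next time
    return value

def update(key: str) -> None:
    # on write: update the DB (omitted), then invalidate the cached entry
    cache.pop(key, None)
```

The invalidation in `update` is the part interviewers probe: deleting the stale entry and letting the next read repopulate it is simpler and safer than trying to update the cache in place.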

Load Balancing Algorithms

| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Cycles through servers sequentially | Equal-capacity servers, stateless |
| Weighted Round Robin | More traffic to higher-capacity servers | Mixed-capacity servers |
| Least Connections | Route to server with fewest active connections | Long-lived connections, varying request times |
| IP Hash | Hash client IP to determine server | Session affinity without cookies |
| Consistent Hashing | Minimize redistribution when servers change | Caches, distributed databases |
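Consistent hashing is worth being able to sketch from memory. This is a minimal ring with virtual nodes (node and key names are illustrative); adding or removing a server only remaps the keys between it and its neighbor on the ring:

```python
import bisect
import hashlib

# Consistent-hash ring sketch with virtual nodes. Each physical node gets
# `vnodes` positions on the ring so keys spread evenly.

class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100):
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        # walk clockwise to the first virtual node at or after the key's hash
        hashes = [h for h, _ in self._ring]
        idx = bisect.bisect(hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
node = ring.get_node("user:42")   # deterministic server choice for this key
```

A production version would precompute the hash list and use a stronger spread of virtual nodes, but the ring-plus-binary-search shape is the core idea.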

Database Selection Guide

See our Database Cheatsheet for detailed comparison.

| Use Case | Database Type | Examples |
|---|---|---|
| Structured data, ACID transactions | Relational (SQL) | PostgreSQL, MySQL |
| Flexible schema, rapid iteration | Document | MongoDB, CouchDB |
| High write throughput, horizontal scale | Wide Column | Cassandra, HBase |
| Caching, sessions, leaderboards | Key-Value | Redis, Memcached, DynamoDB |
| Relationships, social graphs | Graph | Neo4j, Amazon Neptune |
| Full-text search | Search Engine | Elasticsearch, Solr |
| Metrics, monitoring, IoT | Time Series | InfluxDB, TimescaleDB |

Message Queue Comparison

| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Model | Log-based (pull) | Queue (push) | Queue (pull) |
| Throughput | Very high (millions/sec) | High (~100K/sec) | High (managed) |
| Ordering | Per-partition | Per-queue | FIFO option |
| Retention | Configurable (days/weeks) | Until consumed | 14 days max |
| Best for | Event streaming, log aggregation | Task queues, RPC | Serverless, simple decoupling |

Microservices Patterns

| Pattern | Purpose |
|---|---|
| API Gateway | Single entry point, routing, auth, rate limiting |
| Service Discovery | Services find each other dynamically |
| Circuit Breaker | Prevent cascading failures |
| Saga Pattern | Distributed transactions via compensating actions |
| CQRS | Separate read and write models |
| Event Sourcing | Store state as sequence of events |
| Sidecar | Attach helper process alongside main service |
| Strangler Fig | Incrementally migrate from monolith |
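The circuit breaker is the pattern most often asked about in detail. A minimal sketch of the state machine (closed → open → half-open), with illustrative thresholds:

```python
import time

# Circuit-breaker sketch: after `threshold` consecutive failures the
# breaker opens and fails fast; after `reset_after` seconds it goes
# half-open and lets one trial call through.

class CircuitBreaker:
    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None        # half-open: allow one trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                # any success closes the breaker
        return result
```

The key talking point: failing fast while open protects the struggling downstream service and frees the caller's threads, which is what stops the cascade.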

API Design Checklist

API Design:
[ ] RESTful resource naming (nouns, not verbs)
[ ] Consistent HTTP methods (GET=read, POST=create, PUT=update, DELETE=delete)
[ ] Proper status codes (200, 201, 400, 401, 403, 404, 429, 500)
[ ] Pagination (cursor-based for real-time data, offset for static)
[ ] Versioning strategy (URL path: /v1/ recommended)
[ ] Rate limiting headers (X-RateLimit-*)
[ ] Authentication (OAuth 2.0, JWT, API keys)
[ ] Input validation and sanitization
[ ] Error response format (consistent JSON structure)
[ ] HATEOAS links (optional, for discoverability)

Security Quick Reference

For detailed security guides, see Authentication vs Authorization, OAuth 2.0, JWT, and Encryption.

| Topic | Key Points |
|---|---|
| Authentication | OAuth 2.0 + OIDC for user-facing; mTLS for service-to-service |
| Authorization | RBAC for most apps; ABAC for fine-grained policies |
| Encryption | AES-256-GCM at rest; TLS 1.3 in transit |
| Passwords | bcrypt or Argon2id, never SHA-256 or MD5 |
| API Security | Rate limiting, input validation, CORS, security headers |

Practice with Security Crypto Tools and API Network Tools. Visit swehelper.com/tools for all interactive tools.

Monitoring Checklist

The Four Golden Signals (Google SRE):
1. Latency    — Time to process a request (p50, p95, p99)
2. Traffic    — Requests per second
3. Errors     — Rate of failed requests (5xx)
4. Saturation — How full the system is (CPU, memory, disk, connections)

RED Method (for request-driven services):
- Rate:     Requests per second
- Errors:   Number of failed requests
- Duration: Distribution of request durations

USE Method (for resources):
- Utilization: % of time resource is busy
- Saturation:  Amount of work queued
- Errors:      Count of error events
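The latency percentiles mentioned under the golden signals (p50, p95, p99) are easy to compute by hand in an interview. A sketch using a nearest-rank approach over a window of request durations (the sample values are made up):

```python
# Latency percentile sketch (p50/p99) using a nearest-rank approach
# over a window of request durations in milliseconds.

def percentile(samples: list[float], p: float) -> float:
    ordered = sorted(samples)
    # smallest value with roughly p% of samples at or below it
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12.0, 15.0, 11.0, 250.0, 14.0, 13.0, 16.0, 12.5, 900.0, 14.5]
p50 = percentile(latencies_ms, 50)   # typical request: 14.0 ms
p99 = percentile(latencies_ms, 99)   # tail dominated by outliers: 900.0 ms
```

This is why averages mislead: the mean of this window is ~126 ms, which describes no actual request, while p50 and p99 expose the typical case and the tail separately.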

Availability SLA Reference

| SLA | Downtime/Year | Downtime/Month | Downtime/Day |
|---|---|---|---|
| 99% (two 9s) | 3.65 days | 7.3 hours | 14.4 minutes |
| 99.9% (three 9s) | 8.76 hours | 43.8 minutes | 1.44 minutes |
| 99.99% (four 9s) | 52.6 minutes | 4.38 minutes | 8.6 seconds |
| 99.999% (five 9s) | 5.26 minutes | 26.3 seconds | 0.86 seconds |
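These numbers all come from one formula, which is worth deriving on the spot rather than memorizing:

```python
# Downtime allowed by an availability SLA: total period × (1 - SLA).

SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000

def downtime_per_year_seconds(sla_percent: float) -> float:
    return SECONDS_PER_YEAR * (1 - sla_percent / 100)

four_nines = downtime_per_year_seconds(99.99) / 60    # ~52.6 minutes/year
five_nines = downtime_per_year_seconds(99.999) / 60   # ~5.26 minutes/year
```

A useful rule of thumb that falls out of it: each extra nine cuts the allowed downtime by 10x.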

Frequently Asked Questions

What is the most important concept for system design interviews?

Trade-offs. Every design decision involves trade-offs between consistency and availability, latency and throughput, simplicity and scalability, cost and performance. The ability to articulate why you chose one approach over another is the single most important skill. See our Interview Guide for the full framework.

How do I decide between SQL and NoSQL?

Default to SQL (PostgreSQL) unless you have a specific reason for NoSQL. Use NoSQL when you need: horizontal write scaling (Cassandra), flexible schemas (MongoDB), extreme read speed (Redis), or graph queries (Neo4j). See our Database Cheatsheet for detailed decision criteria.

When should I introduce caching in my design?

Introduce caching when: reads significantly outnumber writes (10:1+), data changes infrequently, latency requirements are strict, or you need to reduce database load. Cache-aside with Redis is the safest default. Always discuss cache invalidation strategy — it is one of the hardest problems in computer science.

What is the difference between horizontal and vertical scaling?

Vertical scaling (scale up) means adding more CPU/RAM to a single machine. It is simpler but has a hard ceiling. Horizontal scaling (scale out) means adding more machines. It requires distributed systems thinking (load balancing, sharding, consistency) but scales nearly infinitely. Most interview designs should plan for horizontal scaling. See our Scalability Cheatsheet for patterns.
