Design Netflix: A Video Streaming Architecture
Netflix serves over 260 million subscribers across 190+ countries, streaming billions of hours of content monthly. Its architecture is a masterclass in microservices, resilience engineering, and content delivery at global scale. This guide covers how to design a Netflix-like streaming platform, focusing on its unique CDN (Open Connect), recommendation engine, microservices patterns, and resilience strategies.
1. Requirements
Functional Requirements
- Users can browse a catalog of movies, TV shows, and documentaries.
- Personalized homepage with rows of recommended content.
- Video playback with adaptive bitrate streaming across devices (TV, mobile, web, tablet).
- Search for titles, actors, genres, and directors.
- User profiles (up to 5 per account) with individual watch history and preferences.
- Continue watching: resume playback from where the user left off.
- Download content for offline viewing on mobile devices.
- Multiple audio tracks and subtitle languages.
Non-Functional Requirements
- High availability: 99.99% uptime. Streaming must work even during partial failures.
- Low latency: Content should start playing within 3 seconds.
- Scalability: Handle 260M+ subscribers with peak concurrent streams of 15M+.
- Global reach: Low-latency streaming in 190+ countries.
- Resilience: Graceful degradation under failure, no cascading failures.
2. Capacity Estimation
| Metric | Estimate |
|---|---|
| Total subscribers | 260 million |
| Daily active users | ~100 million |
| Peak concurrent streams | ~15 million |
| Average stream bitrate | 5 Mbps (1080p average) |
| Peak bandwidth | 15M × 5 Mbps = 75 Tbps |
| Content library | ~17,000 titles |
| Encoded versions per title | ~1,200 (resolutions × bitrates × audio × subtitles) |
| Total storage (all encoded versions) | ~100 PB |
| API requests per day | ~500 billion (browsing, search, play events) |
Netflix accounts for approximately 15% of global internet downstream bandwidth during peak hours. This is why they built their own CDN.
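The headline numbers above follow from straightforward multiplication; a quick sanity check of the table (all inputs are the estimates given above, and the ~5 GB average size per encoded version is an illustrative assumption, not a published figure):

```javascript
// Back-of-envelope check of the capacity table above.
const peakStreams = 15e6;         // peak concurrent streams
const avgBitrateMbps = 5;         // 1080p average
const peakTbps = peakStreams * avgBitrateMbps / 1e6;  // Mbps -> Tbps
console.log(`Peak egress: ${peakTbps} Tbps`);         // 75 Tbps

// Storage: ~17,000 titles x ~1,200 encoded versions. If a version averages
// ~5 GB (assumed), the library lands right around the ~100 PB estimate.
const titles = 17000;
const versionsPerTitle = 1200;
const avgVersionGB = 5;                               // assumed average
const totalPB = titles * versionsPerTitle * avgVersionGB / 1e6;  // GB -> PB
console.log(`Library storage: ~${totalPB} PB`);       // ~102 PB
```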
3. High-Level Design
Netflix has two distinct planes: the Control Plane (runs on AWS) for all application logic, and the Data Plane (Open Connect CDN) for video delivery.
| Plane | Component | Responsibility |
|---|---|---|
| Control | API Gateway (Zuul) | Request routing, authentication, rate limiting |
| Control | Recommendation Service | Personalized content suggestions |
| Control | Playback Service | License validation, stream URL generation |
| Control | User/Profile Service | Account, profiles, preferences |
| Control | Search Service | Content search and filtering |
| Control | Content Ingestion Pipeline | Encoding, quality checks, CDN distribution |
| Data | Open Connect Appliances (OCAs) | Edge servers inside ISPs for video delivery |
| Data | Open Connect Control Plane | CDN health, routing, content placement |
4. Detailed Component Design
4.1 Open Connect CDN
Netflix's custom CDN, called Open Connect, is one of the largest CDNs in the world. It consists of Open Connect Appliances (OCAs) — custom-built servers deployed directly inside Internet Service Provider (ISP) networks.
How content gets to edge servers:
- New content is encoded in the cloud (AWS) into ~1,200 versions per title.
- During off-peak hours (typically 2-6 AM local time), the encoded files are proactively pushed to OCA servers worldwide.
- Content placement is optimized: popular titles are replicated to more OCAs, niche content to fewer.
- Each OCA has ~100-200 TB of storage, holding the most popular content for its region.
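The popularity-driven placement in the steps above can be sketched as a replica-count policy plus a greedy assignment to the least-full clusters. The thresholds, function names, and cluster IDs here are illustrative assumptions, not Netflix's actual policy:

```javascript
// Decide how many OCA clusters should hold a title, from predicted
// popularity (0..1): hits go everywhere, the long tail goes to a few.
function replicaCount(popularity, totalClusters) {
  if (popularity >= 0.8) return totalClusters;              // top titles: every cluster
  if (popularity >= 0.3) return Math.ceil(totalClusters * 0.5);
  if (popularity >= 0.05) return Math.ceil(totalClusters * 0.1);
  return 2;                                                 // niche: minimal redundancy
}

// Greedily place a title on the N clusters with the most spare capacity.
function placeTitle(title, clusters) {
  const n = replicaCount(title.popularity, clusters.length);
  return [...clusters]
    .sort((a, b) => a.usedTB - b.usedTB)   // least-full clusters first
    .slice(0, n)
    .map(c => c.id);
}

const clusters = [
  { id: "oca-ams", usedTB: 120 },
  { id: "oca-lhr", usedTB: 80 },
  { id: "oca-fra", usedTB: 150 },
  { id: "oca-cdg", usedTB: 90 },
];
console.log(placeTitle({ popularity: 0.9 }, clusters));   // all 4 clusters
console.log(placeTitle({ popularity: 0.02 }, clusters));  // 2 least-full clusters
```

In production this decision runs per region and per fill window, informed by ML popularity predictions rather than fixed thresholds.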
How streaming works:
- User clicks play → Playback Service determines the best OCA cluster based on user location and server health.
- Client receives a list of OCA URLs ranked by preference.
- Client streams directly from the OCA (video segments over HTTPS).
- If the primary OCA fails, the client automatically fails over to the next OCA in the list.
// Playback initiation (simplified)
POST /api/v1/playback/start
Body: { "title_id": "80100172", "profile_id": "prof_123" }
Response: {
"manifest_url": "https://oca1.nflxvideo.net/range/80100172/manifest.mpd",
"fallback_urls": [
"https://oca2.nflxvideo.net/range/80100172/manifest.mpd",
"https://oca3.nflxvideo.net/range/80100172/manifest.mpd"
],
"drm_license": "...",
"resume_position_ms": 1234567,
"audio_tracks": ["en", "es", "fr", "de", "ja"],
"subtitle_tracks": ["en", "es", "fr", "de", "ja", "ko"]
}
This architecture means that during playback, no video traffic touches AWS. Streaming bytes stay within the ISP network (only control traffic such as license renewals and telemetry goes to the control plane), resulting in lower latency and reduced bandwidth costs for both Netflix and the ISP.
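The ranked-URL failover described in the playback flow can be sketched client-side. The `fetchManifest` function is injected here purely to keep the sketch self-contained; a real player would issue HTTPS range requests for segments:

```javascript
// Try each OCA manifest URL in ranked order; fall through on failure.
async function openStream(manifestUrls, fetchManifest) {
  const errors = [];
  for (const url of manifestUrls) {
    try {
      return { url, manifest: await fetchManifest(url) };  // first healthy OCA wins
    } catch (err) {
      errors.push({ url, err });                           // record it, try the next
    }
  }
  throw new Error(`all ${errors.length} OCAs failed`);
}

// Example: the primary OCA is down, the second responds.
const fakeFetch = async (url) => {
  if (url.includes("oca1")) throw new Error("connection refused");
  return { segments: 1200 };
};
openStream(
  ["https://oca1.nflxvideo.net/m.mpd", "https://oca2.nflxvideo.net/m.mpd"],
  fakeFetch
).then(r => console.log(r.url));  // the oca2 URL
```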
4.2 Content Ingestion Pipeline
When new content is ingested (e.g., a new movie from a studio), it goes through a multi-stage pipeline:
- Ingest: Receive the original master file (often in Mezzanine format, ~50 GB+).
- Quality Analysis: Per-shot video analysis determines optimal encoding parameters. Netflix's "per-title encoding" analyzes each shot's complexity to allocate bitrate efficiently.
- Encoding: Encode into ~1,200 versions: multiple resolutions (240p to 4K HDR) × bitrates × codecs (H.264, H.265, VP9, AV1) × audio formats (AAC, Dolby Atmos, etc.).
- DRM Packaging: Wrap encoded files with DRM (Widevine, FairPlay, PlayReady).
- CDN Distribution: Push encoded files to OCAs worldwide during off-peak hours.
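The per-title encoding idea in step 2 can be sketched as deriving a bitrate ladder from a measured complexity score. The scores, rungs, and scaling factors below are illustrative assumptions, not Netflix's real encoding parameters:

```javascript
// Build a bitrate ladder from a title's visual complexity (0 = static
// talking heads, 1 = high-motion action). Complex content gets more bits.
function bitrateLadder(complexity) {
  const heights = [240, 480, 720, 1080];
  // Baseline kbps per rung for "average" content, scaled by complexity.
  const baseline = { 240: 300, 480: 1000, 720: 2500, 1080: 5000 };
  const scale = 0.6 + 0.8 * complexity;   // 0.6x simple .. 1.4x complex
  return heights.map(height => ({
    height,
    kbps: Math.round(baseline[height] * scale),
  }));
}

console.log(bitrateLadder(0.1));  // animation-like content: cheap ladder
console.log(bitrateLadder(0.9));  // action movie: expensive ladder
```

This is why a cartoon can look pristine at a fraction of the bitrate an action film needs, which is where the 20-30% bandwidth savings mentioned later come from.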
4.3 Recommendation Engine
Recommendations drive over 80% of content watched on Netflix. The system uses multiple ML models:
| Algorithm | How It Works | Used For |
|---|---|---|
| Collaborative Filtering | Users who liked X also liked Y | "Because You Watched" row |
| Content-Based Filtering | Match content attributes (genre, actors, director) to user preferences | Genre rows, similar titles |
| Trending Now | Real-time popularity tracking per region | "Trending Now" row |
| Page-Level Ranking | Determines which rows appear and in what order on the homepage | Homepage layout |
| Artwork Personalization | Different thumbnail images shown to different users based on their tastes | Title card images |
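The "users who liked X also liked Y" idea in the table above can be shown with simple co-occurrence counting over watch histories. Real systems use matrix factorization or deep models; this is the toy version, with made-up title IDs:

```javascript
// For a seed title, rank other titles by how often they co-occur in the
// same users' histories -- "Because You Watched" in miniature.
function becauseYouWatched(seed, histories, topN = 3) {
  const counts = new Map();
  for (const history of histories) {
    if (!history.includes(seed)) continue;    // only users who watched the seed
    for (const title of history) {
      if (title === seed) continue;
      counts.set(title, (counts.get(title) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])              // most co-watched first
    .slice(0, topN)
    .map(([title]) => title);
}

const histories = [
  ["dark", "ozark", "narcos"],
  ["dark", "ozark", "lupin"],
  ["ozark", "narcos"],
  ["dark", "narcos"],
];
console.log(becauseYouWatched("dark", histories));  // ozark and narcos lead
```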
// Homepage generation pipeline
function generateHomepage(profileId) {
const profile = getProfile(profileId);
const watchHistory = getWatchHistory(profileId);
// Generate candidate rows
const rows = [
{ type: "continue_watching", items: getContinueWatching(profileId) },
{ type: "trending", items: getTrending(profile.country) },
{ type: "top_10", items: getTop10(profile.country) },
{ type: "because_you_watched", items: getBYW(watchHistory.last5) },
...getPersonalizedGenreRows(profile, 20),
{ type: "new_releases", items: getNewReleases(7) }
];
// Rank rows by predicted engagement
const rankedRows = pageRankingModel.rank(rows, profile);
// For each row, select personalized artwork
for (const row of rankedRows) {
for (const item of row.items) {
item.artwork = artworkModel.selectBest(item.titleId, profile);
}
}
return rankedRows.slice(0, 40);
}
4.4 Microservices Architecture
Netflix runs 1,000+ microservices on AWS. Key patterns include:
- API Gateway (Zuul): Single entry point for all client requests. Handles routing, authentication, request transformation, and canary deployments.
- Service Discovery (Eureka): Services register themselves; clients discover service instances dynamically without hardcoded addresses.
- Inter-Service Communication: Primarily HTTP/REST with gRPC for performance-critical paths.
- Configuration (Archaius): Dynamic configuration changes without redeployment.
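Eureka-style discovery reduces to two operations: instances register with a heartbeat lease, and clients resolve a currently-healthy instance. A minimal in-memory sketch (Eureka's real API is HTTP-based with renewable leases; the class and method names here are illustrative):

```javascript
// Minimal service registry: heartbeat-based leases + round-robin resolution.
class Registry {
  constructor(leaseMs = 30000) {
    this.leaseMs = leaseMs;
    this.instances = new Map();   // service name -> [{ addr, lastBeat }]
    this.cursor = new Map();      // round-robin position per service
  }
  heartbeat(service, addr, now = Date.now()) {
    const list = this.instances.get(service) ?? [];
    const existing = list.find(i => i.addr === addr);
    if (existing) existing.lastBeat = now;    // renew the lease
    else list.push({ addr, lastBeat: now });  // first registration
    this.instances.set(service, list);
  }
  resolve(service, now = Date.now()) {
    const healthy = (this.instances.get(service) ?? [])
      .filter(i => now - i.lastBeat < this.leaseMs);  // drop expired leases
    if (healthy.length === 0) throw new Error(`no healthy ${service}`);
    const n = (this.cursor.get(service) ?? 0) % healthy.length;
    this.cursor.set(service, n + 1);
    return healthy[n].addr;
  }
}

const reg = new Registry();
reg.heartbeat("playback", "10.0.0.1:8080");
reg.heartbeat("playback", "10.0.0.2:8080");
console.log(reg.resolve("playback"));  // 10.0.0.1:8080
console.log(reg.resolve("playback"));  // 10.0.0.2:8080
```

An instance that stops heartbeating simply falls out of `resolve` results once its lease expires; there is no explicit deregistration needed.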
4.5 Resilience Patterns
Netflix pioneered many resilience patterns that are now industry standards:
| Pattern | Implementation | Purpose |
|---|---|---|
| Circuit Breaker | Hystrix (now retired; Resilience4j is the usual successor) | Stop calling a failing service; return fallback |
| Bulkhead | Thread pool isolation | Isolate failures; one slow service does not exhaust all threads |
| Retry with Backoff | Exponential backoff + jitter | Handle transient failures without thundering herd |
| Fallback | Pre-computed cached responses | Serve stale but functional data when a service is down |
| Chaos Engineering | Chaos Monkey, Chaos Kong | Proactively inject failures to test resilience |
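Retry with backoff from the table above, as a small helper. This is the "full jitter" variant (delay drawn uniformly from zero up to an exponentially growing cap); the base delay and cap values are illustrative:

```javascript
// Exponential backoff with full jitter:
// delay = random(0, min(cap, base * 2^attempt)).
// Jitter spreads retries out so a recovering service isn't hit by a
// synchronized thundering herd of clients.
function backoffDelayMs(attempt, baseMs = 100, capMs = 10000, rand = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return rand() * ceiling;
}

async function withRetry(fn, {
  retries = 3,
  sleep = ms => new Promise(r => setTimeout(r, ms)),
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;     // out of attempts: surface it
      await sleep(backoffDelayMs(attempt));  // wait before the next try
    }
  }
}
```

A call site would look like `withRetry(() => playbackService.start(titleId))`; transient failures are absorbed, persistent ones still surface after the final attempt.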
// Circuit breaker pattern example
class RecommendationClient {
constructor() {
this.circuitBreaker = new CircuitBreaker({
failureThreshold: 5, // Open after 5 failures
resetTimeout: 30000, // Try again after 30 seconds
monitorInterval: 10000
});
}
async getRecommendations(profileId) {
return this.circuitBreaker.execute(
// Primary call
() => this.recoService.getPersonalized(profileId),
// Fallback when circuit is open
() => this.cache.getStaleRecommendations(profileId)
|| this.getGenericTopContent()
);
}
}
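The `CircuitBreaker` class used above is assumed; a minimal implementation of that `execute(primary, fallback)` interface might look like the sketch below. The state machine is simplified to closed/open (production breakers add a half-open probe state), and the fallback runs on every failure, matching the usage above where a degraded response is always acceptable:

```javascript
// Minimal circuit breaker matching the execute(primary, fallback) shape.
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeout = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeout = resetTimeout;
    this.failures = 0;
    this.openedAt = null;                      // null = circuit closed
  }
  isOpen(now = Date.now()) {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.resetTimeout) {
      this.openedAt = null;                    // reset window elapsed: allow a trial call
      this.failures = 0;
      return false;
    }
    return true;
  }
  async execute(primary, fallback) {
    if (this.isOpen()) return fallback();      // short-circuit while open
    try {
      const result = await primary();
      this.failures = 0;                       // a success resets the count
      return result;
    } catch (err) {
      if (++this.failures >= this.failureThreshold) this.openedAt = Date.now();
      return fallback();                       // degrade instead of propagating
    }
  }
}
```

Wired into `RecommendationClient`, this stops hammering a failing recommendation service once the threshold trips, serves the cached fallback, and probes the primary again after `resetTimeout` elapses.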
5. Database Schema
Netflix uses a polyglot persistence strategy: different data stores for different access patterns.
-- User and account data (relational; MySQL syntax shown -- Postgres would
-- model the enums with CHECK constraints or native enum types)
CREATE TABLE accounts (
id BIGINT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
plan ENUM('basic','standard','premium') NOT NULL,
country VARCHAR(2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE profiles (
id BIGINT PRIMARY KEY,
account_id BIGINT NOT NULL REFERENCES accounts(id),
name VARCHAR(50) NOT NULL,
avatar_url TEXT,
maturity_level ENUM('kids','teen','adult') DEFAULT 'adult',
language VARCHAR(5) DEFAULT 'en'
);
-- Content catalog (Cassandra for global reads)
-- Partition by title_id for fast single-title lookups
CREATE TABLE titles (
title_id UUID PRIMARY KEY,
title_type TEXT,
name TEXT,
description TEXT,
release_year INT,
maturity_rating TEXT,
genres SET<TEXT>,
cast_list LIST<TEXT>,
duration_minutes INT,
season_count INT,
available_countries SET<TEXT>,
created_at TIMESTAMP
);
-- Watch history (Cassandra for high write throughput)
CREATE TABLE watch_history (
profile_id UUID,
watched_at TIMESTAMP,
title_id UUID,
episode_id UUID,
position_ms BIGINT,
duration_ms BIGINT,
completed BOOLEAN,
PRIMARY KEY (profile_id, watched_at)
) WITH CLUSTERING ORDER BY (watched_at DESC);
-- Viewing activity for analytics (Kafka + Druid/ClickHouse)
-- Events: play_start, pause, seek, quality_change, play_end
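The watch_history layout above (clustered newest-first per profile) makes continue-watching a single-partition read; collapsing the rows into one entry per title can be sketched as:

```javascript
// Given watch_history rows already ordered newest-first (which the
// clustering order guarantees), keep the most recent row per title that
// isn't finished -- that is the continue-watching list.
function continueWatching(rows, maxItems = 10) {
  const seen = new Set();
  const out = [];
  for (const row of rows) {                 // rows arrive watched_at DESC
    if (seen.has(row.title_id)) continue;   // older row for a handled title
    seen.add(row.title_id);
    if (row.completed) continue;            // finished titles don't resume
    out.push({ title_id: row.title_id, position_ms: row.position_ms });
    if (out.length === maxItems) break;
  }
  return out;
}

const rows = [
  { title_id: "t2", position_ms: 500000, completed: false },
  { title_id: "t1", position_ms: 0,      completed: true  },  // finished later
  { title_id: "t1", position_ms: 90000,  completed: false },  // older, ignored
];
console.log(continueWatching(rows));  // only t2 resumes
```

Whether this collapse happens in the service or is pre-materialized is an implementation choice; the partition layout supports either.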
6. Key Trade-offs
| Decision | Trade-off |
|---|---|
| Own CDN vs third-party CDN | Open Connect gives Netflix full control, lower cost at scale, and ISP partnerships. But it requires massive upfront investment and ongoing maintenance of 17,000+ servers. At Netflix's scale, the economics strongly favor a custom CDN. |
| Pre-positioning vs on-demand caching | Netflix proactively pushes content to edge servers during off-peak hours. This guarantees zero cache misses for popular content but wastes bandwidth for content nobody watches. ML models predict popularity to optimize placement. |
| Per-title encoding vs fixed encoding ladder | Per-title encoding analyzes each title's visual complexity and creates a custom bitrate ladder. This saves 20-30% bandwidth compared to a fixed ladder but requires more compute for encoding. The bandwidth savings justify the encoding cost. |
| Microservices vs monolith | 1,000+ microservices enable independent team ownership and deployment. But they introduce complexity: distributed tracing, service mesh, and the need for resilience patterns everywhere. Netflix invested heavily in tooling (Zuul, Eureka, Hystrix) to manage this complexity. |
7. Scaling Considerations
7.1 Multi-Region Deployment
Netflix runs on three AWS regions (US-East, US-West, EU-West). All regions are active-active with data replicated across them. If one region fails completely, traffic is redirected to the remaining regions. This was validated by Chaos Kong exercises that simulated entire region failures. Multi-region replication carries the usual CAP-theorem trade-off: cross-region replication is asynchronous, so reads in a remote region may be briefly stale — availability is favored over strict consistency.
7.2 Data Tier Scaling
- Cassandra: Hundreds of clusters across regions for content catalog, watch history, and user data. Data is replicated across regions. Read from the local region for low latency. Sharding is handled natively by Cassandra's consistent hashing.
- EVCache (Memcached): Netflix's distributed caching layer. Trillions of cache reads per day. Replicated across zones for availability.
- Kafka: For event streaming — play events, clickstream data, logging. Processes trillions of messages per day.
7.3 Handling Peak Traffic
Auto-scaling on AWS handles compute peaks. The CDN pre-positions content before expected peaks (e.g., new season launches). Load balancing across microservices uses client-side load balancing (Ribbon) with health-aware routing.
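Client-side load balancing in the Ribbon style reduces to filtering out unhealthy instances and picking among the rest; a least-loaded sketch (instance fields and addresses here are illustrative):

```javascript
// Pick a target instance: skip unhealthy ones, prefer the least loaded.
function chooseInstance(instances) {
  const healthy = instances.filter(i => i.healthy);
  if (healthy.length === 0) throw new Error("no healthy instances");
  return healthy.reduce((best, i) =>
    i.inFlight < best.inFlight ? i : best);  // fewest outstanding requests wins
}

const pool = [
  { addr: "10.0.1.1", healthy: true,  inFlight: 12 },
  { addr: "10.0.1.2", healthy: false, inFlight: 0  },  // failing health checks
  { addr: "10.0.1.3", healthy: true,  inFlight: 4  },
];
console.log(chooseInstance(pool).addr);  // 10.0.1.3
```

Because the decision is made in the client library, there is no extra network hop through a central load balancer, and each client reacts to health changes as soon as its local view updates.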
8. Frequently Asked Questions
Q1: How does Netflix achieve sub-3-second start time for video playback?
Three factors: (1) Content is pre-positioned on OCA servers inside the user's ISP network, so the first byte travels a very short distance. (2) Netflix uses a technique called "prebuffering" where the client starts downloading the first few seconds of video in low quality while the manifest loads. (3) Adaptive bitrate starts at a low quality and ramps up, avoiding buffering at the start.
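The low-start-then-ramp behavior in (3) can be sketched as picking the highest ladder rung that fits safely inside measured throughput, starting conservatively. The safety fraction and ladder values are illustrative:

```javascript
// Adaptive bitrate: start at the lowest rung, then for each segment pick
// the highest rung that fits within a safety fraction of measured throughput.
function nextBitrate(ladderKbps, measuredKbps, { safety = 0.8 } = {}) {
  if (measuredKbps === null) return ladderKbps[0];  // no measurement yet: start low
  const budget = measuredKbps * safety;             // headroom for fluctuation
  const fitting = ladderKbps.filter(b => b <= budget);
  return fitting.length ? Math.max(...fitting) : ladderKbps[0];
}

const ladder = [300, 1000, 2500, 5000];   // kbps
console.log(nextBitrate(ladder, null));   // 300: fast start, instant first frame
console.log(nextBitrate(ladder, 4000));   // 2500: budget is 4000 * 0.8 = 3200
console.log(nextBitrate(ladder, 8000));   // 5000: full quality once bandwidth allows
```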
Q2: What happens if the recommendation service goes down?
Netflix uses circuit breakers and fallback strategies. If the personalized recommendation service fails, the system falls back to pre-computed cached recommendations (slightly stale but still personalized). If even the cache is unavailable, it falls back to generic top-10 content for the user's region. The user always sees something useful, even during failures.
Q3: How does Netflix handle a new show launch that everyone watches at once?
For anticipated launches, Netflix pre-positions all episodes on every OCA worldwide well in advance. They also pre-warm CDN caches and increase capacity on the control plane services. Auto-scaling provisions additional compute. The separation of control plane (AWS) and data plane (Open Connect) ensures that even if API servers are stressed, streaming continues uninterrupted from OCAs.
Q4: Why did Netflix build its own CDN instead of using Akamai or CloudFront?
At Netflix's scale (15%+ of global internet traffic), the cost of third-party CDN would be enormous. Open Connect saves Netflix billions annually. Additionally, owning the CDN gives them: (1) full control over caching logic and content placement, (2) direct ISP partnerships that reduce network congestion, (3) the ability to optimize the stack end-to-end for video streaming, and (4) custom hardware optimized for streaming workloads.