Design Netflix: A Video Streaming Architecture
Netflix serves over 260 million subscribers across 190+ countries, streaming billions of hours of content monthly. Its architecture is a masterclass in microservices, resilience engineering, and content delivery at global scale. This guide covers how to design a Netflix-like streaming platform, focusing on its unique CDN (Open Connect), recommendation engine, microservices patterns, and resilience strategies.
1. Requirements
Functional Requirements
- Users can browse a catalog of movies, TV shows, and documentaries.
- Personalized homepage with rows of recommended content.
- Video playback with adaptive bitrate streaming across devices (TV, mobile, web, tablet).
- Search for titles, actors, genres, and directors.
- User profiles (up to 5 per account) with individual watch history and preferences.
- Continue watching: resume playback from where the user left off.
- Download content for offline viewing on mobile devices.
- Multiple audio tracks and subtitle languages.
Non-Functional Requirements
- High availability: 99.99% uptime. Streaming must work even during partial failures.
- Low latency: Content should start playing within 3 seconds.
- Scalability: Handle 260M+ subscribers with peak concurrent streams of 15M+.
- Global reach: Low-latency streaming in 190+ countries.
- Resilience: Graceful degradation under failure, no cascading failures.
2. Capacity Estimation
| Metric | Estimate |
|---|---|
| Total subscribers | 260 million |
| Daily active users | ~100 million |
| Peak concurrent streams | ~15 million |
| Average stream bitrate | 5 Mbps (1080p average) |
| Peak bandwidth | 15M × 5 Mbps = 75 Tbps |
| Content library | ~17,000 titles |
| Encoded versions per title | ~1,200 (resolutions × bitrates × audio × subtitles) |
| Total storage (all encoded versions) | ~100 PB |
| API requests per day | ~500 billion (browsing, search, play events) |
Netflix accounts for approximately 15% of global internet downstream bandwidth during peak hours. This is why they built their own CDN.
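The headline numbers above follow from straightforward multiplication; a quick sanity check of the table (all inputs are the estimates given above, and the ~5 GB average size per encoded version is an illustrative assumption, not a published figure):

```javascript
// Back-of-envelope check of the capacity table above.
const peakStreams = 15e6;         // peak concurrent streams
const avgBitrateMbps = 5;         // 1080p average
const peakTbps = peakStreams * avgBitrateMbps / 1e6;  // Mbps -> Tbps
console.log(`Peak egress: ${peakTbps} Tbps`);         // 75 Tbps

// Storage: ~17,000 titles x ~1,200 encoded versions. If a version averages
// ~5 GB (assumed), the library lands right around the ~100 PB estimate.
const titles = 17000;
const versionsPerTitle = 1200;
const avgVersionGB = 5;                               // assumed average
const totalPB = titles * versionsPerTitle * avgVersionGB / 1e6;  // GB -> PB
console.log(`Library storage: ~${totalPB} PB`);       // ~102 PB
```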
3. High-Level Design
Netflix has two distinct planes: the Control Plane (runs on AWS) for all application logic, and the Data Plane (Open Connect CDN) for video delivery.
| Plane | Component | Responsibility |
|---|---|---|
| Control | API Gateway (Zuul) | Request routing, authentication, rate limiting |
| Control | Recommendation Service | Personalized content suggestions |
| Control | Playback Service | License validation, stream URL generation |
| Control | User/Profile Service | Account, profiles, preferences |
| Control | Search Service | Content search and filtering |
| Control | Content Ingestion Pipeline | Encoding, quality checks, CDN distribution |
| Data | Open Connect Appliances (OCAs) | Edge servers inside ISPs for video delivery |
| Data | Open Connect Control Plane | CDN health, routing, content placement |
4. Detailed Component Design
4.1 Open Connect CDN
Netflix's custom CDN, called Open Connect, is one of the largest CDNs in the world. It consists of Open Connect Appliances (OCAs) — custom-built servers deployed directly inside Internet Service Provider (ISP) networks.
How content gets to edge servers:
- New content is encoded in the cloud (AWS) into ~1,200 versions per title.
- During off-peak hours (typically 2-6 AM local time), the encoded files are proactively pushed to OCA servers worldwide.
- Content placement is optimized: popular titles are replicated to more OCAs, niche content to fewer.
- Each OCA has ~100-200 TB of storage, holding the most popular content for its region.
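The popularity-driven placement in the steps above can be sketched as a replica-count policy plus a greedy assignment to the least-full clusters. The thresholds, function names, and cluster IDs here are illustrative assumptions, not Netflix's actual policy:

```javascript
// Decide how many OCA clusters should hold a title, from predicted
// popularity (0..1): hits go everywhere, the long tail goes to a few.
function replicaCount(popularity, totalClusters) {
  if (popularity >= 0.8) return totalClusters;              // top titles: every cluster
  if (popularity >= 0.3) return Math.ceil(totalClusters * 0.5);
  if (popularity >= 0.05) return Math.ceil(totalClusters * 0.1);
  return 2;                                                 // niche: minimal redundancy
}

// Greedily place a title on the N clusters with the most spare capacity.
function placeTitle(title, clusters) {
  const n = replicaCount(title.popularity, clusters.length);
  return [...clusters]
    .sort((a, b) => a.usedTB - b.usedTB)   // least-full clusters first
    .slice(0, n)
    .map(c => c.id);
}

const clusters = [
  { id: "oca-ams", usedTB: 120 },
  { id: "oca-lhr", usedTB: 80 },
  { id: "oca-fra", usedTB: 150 },
  { id: "oca-cdg", usedTB: 90 },
];
console.log(placeTitle({ popularity: 0.9 }, clusters));   // all 4 clusters
console.log(placeTitle({ popularity: 0.02 }, clusters));  // 2 least-full clusters
```

In production this decision runs per region and per fill window, informed by ML popularity predictions rather than fixed thresholds.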
How streaming works:
- User clicks play → Playback Service determines the best OCA cluster based on user location and server health.
- Client receives a list of OCA URLs ranked by preference.
- Client streams directly from the OCA (video segments over HTTPS).
- If the primary OCA fails, the client automatically fails over to the next OCA in the list.
// Playback initiation (simplified)
POST /api/v1/playback/start
Body: { "title_id": "80100172", "profile_id": "prof_123" }
Response: {
"manifest_url": "https://oca1.nflxvideo.net/range/80100172/manifest.mpd",
"fallback_urls": [
"https://oca2.nflxvideo.net/range/80100172/manifest.mpd",
"https://oca3.nflxvideo.net/range/80100172/manifest.mpd"
],
"drm_license": "...",
"resume_position_ms": 1234567,
"audio_tracks": ["en", "es", "fr", "de", "ja"],
"subtitle_tracks": ["en", "es", "fr", "de", "ja", "ko"]
}
This architecture means that during playback, no video traffic touches AWS. Streaming bytes stay within the ISP network (only control traffic such as license renewals and telemetry goes to the control plane), resulting in lower latency and reduced bandwidth costs for both Netflix and the ISP.
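The ranked-URL failover described in the playback flow can be sketched client-side. The `fetchManifest` function is injected here purely to keep the sketch self-contained; a real player would issue HTTPS range requests for segments:

```javascript
// Try each OCA manifest URL in ranked order; fall through on failure.
async function openStream(manifestUrls, fetchManifest) {
  const errors = [];
  for (const url of manifestUrls) {
    try {
      return { url, manifest: await fetchManifest(url) };  // first healthy OCA wins
    } catch (err) {
      errors.push({ url, err });                           // record it, try the next
    }
  }
  throw new Error(`all ${errors.length} OCAs failed`);
}

// Example: the primary OCA is down, the second responds.
const fakeFetch = async (url) => {
  if (url.includes("oca1")) throw new Error("connection refused");
  return { segments: 1200 };
};
openStream(
  ["https://oca1.nflxvideo.net/m.mpd", "https://oca2.nflxvideo.net/m.mpd"],
  fakeFetch
).then(r => console.log(r.url));  // the oca2 URL
```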
4.2 Content Ingestion Pipeline
When new content is ingested (e.g., a new movie from a studio), it goes through a multi-stage pipeline:
- Ingest: Receive the original master file (often in Mezzanine format, ~50 GB+).
- Quality Analysis: Per-shot video analysis determines optimal encoding parameters. Netflix's "per-title encoding" analyzes each shot's complexity to allocate bitrate efficiently.
- Encoding: Encode into ~1,200 versions: multiple resolutions (240p to 4K HDR) × bitrates × codecs (H.264, H.265, VP9, AV1) × audio formats (AAC, Dolby Atmos, etc.).
- DRM Packaging: Wrap encoded files with DRM (Widevine, FairPlay, PlayReady).
- CDN Distribution: Push encoded files to OCAs worldwide during off-peak hours.
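The per-title encoding idea in step 2 can be sketched as deriving a bitrate ladder from a measured complexity score. The scores, rungs, and scaling factors below are illustrative assumptions, not Netflix's real encoding parameters:

```javascript
// Build a bitrate ladder from a title's visual complexity (0 = static
// talking heads, 1 = high-motion action). Complex content gets more bits.
function bitrateLadder(complexity) {
  const heights = [240, 480, 720, 1080];
  // Baseline kbps per rung for "average" content, scaled by complexity.
  const baseline = { 240: 300, 480: 1000, 720: 2500, 1080: 5000 };
  const scale = 0.6 + 0.8 * complexity;   // 0.6x simple .. 1.4x complex
  return heights.map(height => ({
    height,
    kbps: Math.round(baseline[height] * scale),
  }));
}

console.log(bitrateLadder(0.1));  // animation-like content: cheap ladder
console.log(bitrateLadder(0.9));  // action movie: expensive ladder
```

This is why a cartoon can look pristine at a fraction of the bitrate an action film needs, which is where the 20-30% bandwidth savings mentioned later come from.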
4.3 Recommendation Engine
Recommendations drive over 80% of content watched on Netflix. The system uses multiple ML models:
| Algorithm | How It Works | Used For |
|---|---|---|
| Collaborative Filtering | Users who liked X also liked Y | "Because You Watched" row |
| Content-Based Filtering | Match content attributes (genre, actors, director) to user preferences | Genre rows, similar titles |
| Trending Now | Real-time popularity tracking per region | "Trending Now" row |
| Page-Level Ranking | Determines which rows appear and in what order on the homepage | Homepage layout |
| Artwork Personalization | Different thumbnail images shown to different users based on their tastes | Title card images |
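The "users who liked X also liked Y" idea in the table above can be shown with simple co-occurrence counting over watch histories. Real systems use matrix factorization or deep models; this is the toy version, with made-up title IDs:

```javascript
// For a seed title, rank other titles by how often they co-occur in the
// same users' histories -- "Because You Watched" in miniature.
function becauseYouWatched(seed, histories, topN = 3) {
  const counts = new Map();
  for (const history of histories) {
    if (!history.includes(seed)) continue;    // only users who watched the seed
    for (const title of history) {
      if (title === seed) continue;
      counts.set(title, (counts.get(title) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])              // most co-watched first
    .slice(0, topN)
    .map(([title]) => title);
}

const histories = [
  ["dark", "ozark", "narcos"],
  ["dark", "ozark", "lupin"],
  ["ozark", "narcos"],
  ["dark", "narcos"],
];
console.log(becauseYouWatched("dark", histories));  // ozark and narcos lead
```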
// Homepage generation pipeline
function generateHomepage(profileId) {
const profile = getProfile(profileId);
const watchHistory = getWatchHistory(profileId);
// Generate candidate rows
const rows = [
{ type: "continue_watching", items: getContinueWatching(profileId) },
{ type: "trending", items: getTrending(profile.country) },
{ type: "top_10", items: getTop10(profile.country) },
{ type: "because_you_watched", items: getBYW(watchHistory.last5) },
...getPersonalizedGenreRows(profile, 20),
{ type: "new_releases", items: getNewReleases(7) }
];
// Rank rows by predicted engagement
const rankedRows = pageRankingModel.rank(rows, profile);
// For each row, select personalized artwork
for (const row of rankedRows) {
for (const item of row.items) {
item.artwork = artworkModel.selectBest(item.titleId, profile);
}
}
return rankedRows.slice(0, 40);
}
4.4 Microservices Architecture
Netflix runs 1,000+ microservices on AWS. Key patterns include:
- API Gateway (Zuul): Single entry point for all client requests. Handles routing, authentication, request transformation, and canary deployments.
- Service Discovery (Eureka): Services register themselves; clients discover service instances dynamically without hardcoded addresses.
- Inter-Service Communication: Primarily HTTP/REST with gRPC for performance-critical paths.
- Configuration (Archaius): Dynamic configuration changes without redeployment.
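Eureka-style discovery reduces to two operations: instances register with a heartbeat lease, and clients resolve a currently-healthy instance. A minimal in-memory sketch (Eureka's real API is HTTP-based with renewable leases; the class and method names here are illustrative):

```javascript
// Minimal service registry: heartbeat-based leases + round-robin resolution.
class Registry {
  constructor(leaseMs = 30000) {
    this.leaseMs = leaseMs;
    this.instances = new Map();   // service name -> [{ addr, lastBeat }]
    this.cursor = new Map();      // round-robin position per service
  }
  heartbeat(service, addr, now = Date.now()) {
    const list = this.instances.get(service) ?? [];
    const existing = list.find(i => i.addr === addr);
    if (existing) existing.lastBeat = now;    // renew the lease
    else list.push({ addr, lastBeat: now });  // first registration
    this.instances.set(service, list);
  }
  resolve(service, now = Date.now()) {
    const healthy = (this.instances.get(service) ?? [])
      .filter(i => now - i.lastBeat < this.leaseMs);  // drop expired leases
    if (healthy.length === 0) throw new Error(`no healthy ${service}`);
    const n = (this.cursor.get(service) ?? 0) % healthy.length;
    this.cursor.set(service, n + 1);
    return healthy[n].addr;
  }
}

const reg = new Registry();
reg.heartbeat("playback", "10.0.0.1:8080");
reg.heartbeat("playback", "10.0.0.2:8080");
console.log(reg.resolve("playback"));  // 10.0.0.1:8080
console.log(reg.resolve("playback"));  // 10.0.0.2:8080
```

An instance that stops heartbeating simply falls out of `resolve` results once its lease expires; there is no explicit deregistration needed.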
4.5 Resilience Patterns
Netflix pioneered many resilience patterns that are now industry standards:
| Pattern | Implementation | Purpose |
|---|---|---|
| Circuit Breaker | Hystrix (now retired; Resilience4j is the usual successor) | Stop calling a failing service; return fallback |
| Bulkhead | Thread pool isolation | Isolate failures; one slow service does not exhaust all threads |
| Retry with Backoff | Exponential backoff + jitter | Handle transient failures without thundering herd |
| Fallback | Pre-computed cached responses | Serve stale but functional data when a service is down |
| Chaos Engineering | Chaos Monkey, Chaos Kong | Proactively inject failures to test resilience |
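Retry with backoff from the table above, as a small helper. This is the "full jitter" variant (delay drawn uniformly from zero up to an exponentially growing cap); the base delay and cap values are illustrative:

```javascript
// Exponential backoff with full jitter:
// delay = random(0, min(cap, base * 2^attempt)).
// Jitter spreads retries out so a recovering service isn't hit by a
// synchronized thundering herd of clients.
function backoffDelayMs(attempt, baseMs = 100, capMs = 10000, rand = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return rand() * ceiling;
}

async function withRetry(fn, {
  retries = 3,
  sleep = ms => new Promise(r => setTimeout(r, ms)),
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;     // out of attempts: surface it
      await sleep(backoffDelayMs(attempt));  // wait before the next try
    }
  }
}
```

A call site would look like `withRetry(() => playbackService.start(titleId))`; transient failures are absorbed, persistent ones still surface after the final attempt.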
// Circuit breaker pattern example
class RecommendationClient {
constructor() {
this.circuitBreaker = new CircuitBreaker({
failureThreshold: 5, // Open after 5 failures
resetTimeout: 30000, // Try again after 30 seconds
monitorInterval: 10000
});
}
async getRecommendations(profileId) {
return this.circuitBreaker.execute(
// Primary call
() => this.recoService.getPersonalized(profileId),
// Fallback when circuit is open
() => this.cache.getStaleRecommendations(profileId)
|| this.getGenericTopContent()
);
}
}
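The `CircuitBreaker` class used above is assumed; a minimal implementation of that `execute(primary, fallback)` interface might look like the sketch below. The state machine is simplified to closed/open (production breakers add a half-open probe state), and the fallback runs on every failure, matching the usage above where a degraded response is always acceptable:

```javascript
// Minimal circuit breaker matching the execute(primary, fallback) shape.
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeout = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeout = resetTimeout;
    this.failures = 0;
    this.openedAt = null;                      // null = circuit closed
  }
  isOpen(now = Date.now()) {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.resetTimeout) {
      this.openedAt = null;                    // reset window elapsed: allow a trial call
      this.failures = 0;
      return false;
    }
    return true;
  }
  async execute(primary, fallback) {
    if (this.isOpen()) return fallback();      // short-circuit while open
    try {
      const result = await primary();
      this.failures = 0;                       // a success resets the count
      return result;
    } catch (err) {
      if (++this.failures >= this.failureThreshold) this.openedAt = Date.now();
      return fallback();                       // degrade instead of propagating
    }
  }
}
```

Wired into `RecommendationClient`, this stops hammering a failing recommendation service once the threshold trips, serves the cached fallback, and probes the primary again after `resetTimeout` elapses.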
5. Database Schema
Netflix uses a polyglot persistence strategy: different data stores for different access patterns.
-- User and account data (relational; MySQL syntax shown -- Postgres would
-- model the enums with CHECK constraints or native enum types)
CREATE TABLE accounts (
id BIGINT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
plan ENUM('basic','standard','premium') NOT NULL,
country VARCHAR(2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE profiles (
id BIGINT PRIMARY KEY,
account_id BIGINT NOT NULL REFERENCES accounts(id),
name VARCHAR(50) NOT NULL,
avatar_url TEXT,
maturity_level ENUM('kids','teen','adult') DEFAULT 'adult',
language VARCHAR(5) DEFAULT 'en'
);
-- Content catalog (Cassandra for global reads)
-- Partition by title_id for fast single-title lookups
CREATE TABLE titles (
title_id UUID PRIMARY KEY,
title_type TEXT,
name TEXT,
description TEXT,
release_year INT,
maturity_rating TEXT,
genres SET<TEXT>,
cast_list LIST<TEXT>,
duration_minutes INT,
season_count INT,
available_countries SET<TEXT>,
created_at TIMESTAMP
);
-- Watch history (Cassandra for high write throughput)
CREATE TABLE watch_history (
profile_id UUID,
watched_at TIMESTAMP,
title_id UUID,
episode_id UUID,
position_ms BIGINT,
duration_ms BIGINT,
completed BOOLEAN,
PRIMARY KEY (profile_id, watched_at)
) WITH CLUSTERING ORDER BY (watched_at DESC);
-- Viewing activity for analytics (Kafka + Druid/ClickHouse)
-- Events: play_start, pause, seek, quality_change, play_end
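The watch_history layout above (clustered newest-first per profile) makes continue-watching a single-partition read; collapsing the rows into one entry per title can be sketched as:

```javascript
// Given watch_history rows already ordered newest-first (which the
// clustering order guarantees), keep the most recent row per title that
// isn't finished -- that is the continue-watching list.
function continueWatching(rows, maxItems = 10) {
  const seen = new Set();
  const out = [];
  for (const row of rows) {                 // rows arrive watched_at DESC
    if (seen.has(row.title_id)) continue;   // older row for a handled title
    seen.add(row.title_id);
    if (row.completed) continue;            // finished titles don't resume
    out.push({ title_id: row.title_id, position_ms: row.position_ms });
    if (out.length === maxItems) break;
  }
  return out;
}

const rows = [
  { title_id: "t2", position_ms: 500000, completed: false },
  { title_id: "t1", position_ms: 0,      completed: true  },  // finished later
  { title_id: "t1", position_ms: 90000,  completed: false },  // older, ignored
];
console.log(continueWatching(rows));  // only t2 resumes
```

Whether this collapse happens in the service or is pre-materialized is an implementation choice; the partition layout supports either.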
6. Key Trade-offs
| Decision | Trade-off |
|---|---|
| Own CDN vs third-party CDN | Open Connect gives Netflix full control, lower cost at scale, and ISP partnerships. But it requires massive upfront investment and ongoing maintenance of 17,000+ servers. At Netflix's scale, the economics strongly favor a custom CDN. |
| Pre-positioning vs on-demand caching | Netflix proactively pushes content to edge servers during off-peak hours. This guarantees zero cache misses for popular content but wastes bandwidth for content nobody watches. ML models predict popularity to optimize placement. |
| Per-title encoding vs fixed encoding ladder | Per-title encoding analyzes each title's visual complexity and creates a custom bitrate ladder. This saves 20-30% bandwidth compared to a fixed ladder but requires more compute for encoding. The bandwidth savings justify the encoding cost. |
| Microservices vs monolith | 1,000+ microservices enable independent team ownership and deployment. But they introduce complexity: distributed tracing, service mesh, and the need for resilience patterns everywhere. Netflix invested heavily in tooling (Zuul, Eureka, Hystrix) to manage this complexity. |
7. Scaling Considerations
7.1 Multi-Region Deployment
Netflix runs on three AWS regions (US-East, US-West, EU-West). All regions are active-active with data replicated across them. If one region fails completely, traffic is redirected to the remaining regions. This was validated by Chaos Kong exercises that simulated entire region failures. Multi-region replication carries the usual CAP-theorem trade-off: cross-region replication is asynchronous, so reads in a remote region may be briefly stale — availability is favored over strict consistency.
7.2 Data Tier Scaling
- Cassandra: Hundreds of clusters across regions for content catalog, watch history, and user data. Data is replicated across regions. Read from the local region for low latency. Sharding is handled natively by Cassandra's consistent hashing.
- EVCache (Memcached): Netflix's distributed caching layer. Trillions of cache reads per day. Replicated across zones for availability.
- Kafka: For event streaming — play events, clickstream data, logging. Processes trillions of messages per day.
7.3 Handling Peak Traffic
Auto-scaling on AWS handles compute peaks. The CDN pre-positions content before expected peaks (e.g., new season launches). Load balancing across microservices uses client-side load balancing (Ribbon) with health-aware routing.
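Client-side load balancing in the Ribbon style reduces to filtering out unhealthy instances and picking among the rest; a least-loaded sketch (instance fields and addresses here are illustrative):

```javascript
// Pick a target instance: skip unhealthy ones, prefer the least loaded.
function chooseInstance(instances) {
  const healthy = instances.filter(i => i.healthy);
  if (healthy.length === 0) throw new Error("no healthy instances");
  return healthy.reduce((best, i) =>
    i.inFlight < best.inFlight ? i : best);  // fewest outstanding requests wins
}

const pool = [
  { addr: "10.0.1.1", healthy: true,  inFlight: 12 },
  { addr: "10.0.1.2", healthy: false, inFlight: 0  },  // failing health checks
  { addr: "10.0.1.3", healthy: true,  inFlight: 4  },
];
console.log(chooseInstance(pool).addr);  // 10.0.1.3
```

Because the decision is made in the client library, there is no extra network hop through a central load balancer, and each client reacts to health changes as soon as its local view updates.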
8. Frequently Asked Questions
Q1: How does Netflix achieve sub-3-second start time for video playback?
Three factors: (1) Content is pre-positioned on OCA servers inside the user's ISP network, so the first byte travels a very short distance. (2) Netflix uses a technique called "prebuffering" where the client starts downloading the first few seconds of video in low quality while the manifest loads. (3) Adaptive bitrate starts at a low quality and ramps up, avoiding buffering at the start.
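The low-start-then-ramp behavior in (3) can be sketched as picking the highest ladder rung that fits safely inside measured throughput, starting conservatively. The safety fraction and ladder values are illustrative:

```javascript
// Adaptive bitrate: start at the lowest rung, then for each segment pick
// the highest rung that fits within a safety fraction of measured throughput.
function nextBitrate(ladderKbps, measuredKbps, { safety = 0.8 } = {}) {
  if (measuredKbps === null) return ladderKbps[0];  // no measurement yet: start low
  const budget = measuredKbps * safety;             // headroom for fluctuation
  const fitting = ladderKbps.filter(b => b <= budget);
  return fitting.length ? Math.max(...fitting) : ladderKbps[0];
}

const ladder = [300, 1000, 2500, 5000];   // kbps
console.log(nextBitrate(ladder, null));   // 300: fast start, instant first frame
console.log(nextBitrate(ladder, 4000));   // 2500: budget is 4000 * 0.8 = 3200
console.log(nextBitrate(ladder, 8000));   // 5000: full quality once bandwidth allows
```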
Q2: What happens if the recommendation service goes down?
Netflix uses circuit breakers and fallback strategies. If the personalized recommendation service fails, the system falls back to pre-computed cached recommendations (slightly stale but still personalized). If even the cache is unavailable, it falls back to generic top-10 content for the user's region. The user always sees something useful, even during failures.
Q3: How does Netflix handle a new show launch that everyone watches at once?
For anticipated launches, Netflix pre-positions all episodes on every OCA worldwide well in advance. They also pre-warm CDN caches and increase capacity on the control plane services. Auto-scaling provisions additional compute. The separation of control plane (AWS) and data plane (Open Connect) ensures that even if API servers are stressed, streaming continues uninterrupted from OCAs.
Q4: Why did Netflix build its own CDN instead of using Akamai or CloudFront?
At Netflix's scale (15%+ of global internet traffic), the cost of third-party CDN would be enormous. Open Connect saves Netflix billions annually. Additionally, owning the CDN gives them: (1) full control over caching logic and content placement, (2) direct ISP partnerships that reduce network congestion, (3) the ability to optimize the stack end-to-end for video streaming, and (4) custom hardware optimized for streaming workloads.