Design YouTube: A Video Streaming Platform

YouTube is the world's largest video-sharing platform, serving over 1 billion hours of video daily to 2+ billion users. Designing YouTube tests your knowledge of video upload pipelines, transcoding, adaptive bitrate streaming, CDN architecture, and recommendation systems. This guide covers the complete system design for a YouTube-like platform.

1. Requirements

Functional Requirements

  • Users can upload videos of various formats and sizes (up to 12 hours, 256 GB).
  • Videos are transcoded into multiple resolutions (144p to 4K) and formats.
  • Smooth video streaming with adaptive bitrate based on network conditions.
  • Video search by title, description, tags, and captions.
  • Personalized video recommendations (homepage and "up next").
  • Like, comment, subscribe, and share functionality.
  • View count tracking and analytics for creators.
  • Video thumbnails generated automatically or uploaded by creators.

Non-Functional Requirements

  • High availability: 99.99% uptime for video streaming.
  • Low latency: Video playback should start within 2 seconds.
  • Scalability: Handle 500+ hours of video uploaded per minute.
  • Durability: Uploaded videos must never be lost.
  • Global reach: Low-latency streaming for users worldwide.
  • Read-heavy: video watches vastly outnumber uploads.

2. Capacity Estimation

| Metric | Estimate |
|---|---|
| Daily Active Users | 800 million |
| Videos uploaded per day | 720,000 (500 hours/min × 60 × 24 ÷ 60 min avg) |
| Average original video size | 500 MB (avg 10 min at moderate quality) |
| Storage per day (originals) | 720K × 500 MB = 360 TB/day |
| Transcoded versions (5 resolutions) | ≈3× original → 1.08 PB/day total |
| Storage per year | ~400 PB/year |
| Daily video views | 5 billion |
| Average video stream size | ~50 MB (5-min average view, blended across resolutions) |
| Streaming bandwidth | 5B × 50 MB ÷ 86,400 s ≈ 2.9 TB/sec outbound |

The massive outbound bandwidth requirement makes a CDN absolutely critical. YouTube uses its own CDN infrastructure (Google Global Cache) deployed within ISPs.
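The table's arithmetic can be sanity-checked in a few lines (the constants come from the estimates above; the variable names are illustrative):

```javascript
// Inputs from the capacity estimates above
const UPLOAD_HOURS_PER_MIN = 500;
const AVG_VIDEO_MIN = 60;
const AVG_ORIGINAL_MB = 500;
const DAILY_VIEWS = 5e9;
const AVG_STREAM_MB = 50;

// 500 hours/min -> minutes of video per day, divided by the 60-min average length
const videosPerDay = (UPLOAD_HOURS_PER_MIN * 60 * 24 * 60) / AVG_VIDEO_MIN;

// Original upload storage per day, in TB
const originalTBPerDay = (videosPerDay * AVG_ORIGINAL_MB) / 1e6;

// Transcoded copies bring the total footprint to roughly 3x the originals
const totalPBPerDay = (originalTBPerDay * 3) / 1000;

// Outbound streaming bandwidth, TB per second
const egressTBPerSec = (DAILY_VIEWS * AVG_STREAM_MB) / 1e6 / 86400;

console.log({ videosPerDay, originalTBPerDay, totalPBPerDay, egressTBPerSec });
```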

3. High-Level Design

The system has two major flows: the upload pipeline (write) and the streaming pipeline (read).

| Component | Responsibility |
|---|---|
| Upload Service | Handles chunked video uploads with resume support |
| Transcoding Pipeline | Converts videos into multiple resolutions and formats |
| Object Storage | Stores original and transcoded video files |
| CDN | Caches and serves video segments from edge locations |
| Streaming Service | Serves video manifests and coordinates adaptive streaming |
| Metadata Service | Stores video titles, descriptions, tags, view counts |
| Search Service | Full-text search over video metadata and captions |
| Recommendation Engine | Generates personalized video suggestions |
| Analytics Service | Tracks views, watch time, engagement metrics |
| Message Queue | Decouples upload from transcoding and post-processing |

4. Detailed Component Design

4.1 Video Upload Pipeline

Uploading large video files reliably requires chunked, resumable uploads:

// Resumable upload protocol (similar to tus.io)
// Step 1: Initialize upload
POST /api/v1/uploads
Body: { "filename": "video.mp4", "size": 524288000, "content_type": "video/mp4" }
Response: { "upload_id": "upload_abc", "upload_url": "/uploads/upload_abc" }

// Step 2: Upload chunks (5 MB each)
PATCH /uploads/upload_abc
Headers: Content-Range: bytes 0-5242879/524288000
Body: [binary chunk data]
Response: { "offset": 5242880 }

// Step 3: If upload interrupted, resume from last offset
HEAD /uploads/upload_abc
Response: Upload-Offset: 5242880
// Client resumes from byte 5242880

Once the upload completes, the Upload Service publishes a message to the transcoding queue.
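The client-side chunking and resume logic of the protocol above can be sketched as follows (`buildContentRange` and `planChunks` are illustrative helpers, not a real upload SDK):

```javascript
const CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB chunks, as in the protocol example

// Build the Content-Range header for the chunk starting at `offset`
function buildContentRange(offset, totalSize) {
  const end = Math.min(offset + CHUNK_SIZE, totalSize) - 1;
  return `bytes ${offset}-${end}/${totalSize}`;
}

// On resume: given the server-reported offset (from HEAD), list the
// Content-Range values for every remaining chunk
function planChunks(serverOffset, totalSize) {
  const ranges = [];
  for (let off = serverOffset; off < totalSize; off += CHUNK_SIZE) {
    ranges.push(buildContentRange(off, totalSize));
  }
  return ranges;
}
```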

4.2 Transcoding Pipeline

Transcoding converts the original video into multiple resolutions and formats for adaptive streaming. This is the most compute-intensive part of the system.

| Resolution | Bitrate | File Size (10 min) |
|---|---|---|
| 144p | 200 Kbps | 15 MB |
| 360p | 700 Kbps | 52 MB |
| 720p | 2.5 Mbps | 187 MB |
| 1080p | 5 Mbps | 375 MB |
| 4K | 20 Mbps | 1.5 GB |

The pipeline architecture uses a message queue with parallel workers:

// Transcoding pipeline (DAG of tasks)
Upload Complete
    |
    v
[Split into segments] --> parallel segment transcoding
    |
    +--> Transcode to 144p (H.264) --> Generate HLS segments
    +--> Transcode to 360p (H.264) --> Generate HLS segments
    +--> Transcode to 720p (H.264 + VP9) --> Generate HLS/DASH segments
    +--> Transcode to 1080p (H.264 + VP9) --> Generate HLS/DASH segments
    +--> Transcode to 4K (VP9 + AV1) --> Generate HLS/DASH segments
    |
    v
[Generate thumbnail sprites]
    |
    v
[Generate subtitles via speech-to-text]
    |
    v
[Content moderation scan]
    |
    v
[Update metadata: status = "published"]
    |
    v
[Notify subscribers via push notification]

Each transcoding job can be parallelized by splitting the video into segments (e.g., 10-second chunks) and transcoding them independently, then concatenating.
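The split-and-fan-out step can be sketched like this (`splitIntoSegments` and `fanOutJobs` are hypothetical names; a real pipeline must also cut on keyframe boundaries, which this sketch ignores):

```javascript
// Compute fixed-length segment boundaries for a video
function splitIntoSegments(durationSeconds, segmentLength = 10) {
  const segments = [];
  for (let start = 0; start < durationSeconds; start += segmentLength) {
    segments.push({ start, end: Math.min(start + segmentLength, durationSeconds) });
  }
  return segments;
}

// Each (segment, rendition) pair becomes an independent queue job,
// so a long video transcodes in parallel across many workers
function fanOutJobs(videoId, durationSeconds, renditions) {
  const jobs = [];
  for (const rendition of renditions) {
    for (const seg of splitIntoSegments(durationSeconds)) {
      jobs.push({ videoId, rendition, ...seg });
    }
  }
  return jobs;
}
```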

4.3 Adaptive Bitrate Streaming (ABR)

YouTube uses DASH (Dynamic Adaptive Streaming over HTTP) and HLS (HTTP Live Streaming). The video player dynamically switches quality based on network bandwidth.

// HLS Master Playlist (m3u8)
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=200000,RESOLUTION=256x144
https://cdn.example.com/video123/144p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=700000,RESOLUTION=640x360
https://cdn.example.com/video123/360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
https://cdn.example.com/video123/720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
https://cdn.example.com/video123/1080p/playlist.m3u8

// Individual quality playlist
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
https://cdn.example.com/video123/720p/segment_001.ts
#EXTINF:10.0,
https://cdn.example.com/video123/720p/segment_002.ts

The player starts with a low-quality stream and quickly upgrades based on measured download speed. If bandwidth drops, it seamlessly switches to a lower quality without buffering.
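A minimal sketch of the player's quality-switching decision, using the bitrate ladder from the manifest above (`pickRendition` and the 0.8 safety factor are illustrative, not YouTube's actual heuristic):

```javascript
// Bitrate ladder matching the master playlist above
const RENDITIONS = [
  { resolution: "144p", bandwidth: 200_000 },
  { resolution: "360p", bandwidth: 700_000 },
  { resolution: "720p", bandwidth: 2_500_000 },
  { resolution: "1080p", bandwidth: 5_000_000 },
];

// Pick the highest rendition whose bitrate fits within a safety
// fraction of the measured throughput (bits per second)
function pickRendition(measuredBps, safetyFactor = 0.8) {
  const budget = measuredBps * safetyFactor;
  let chosen = RENDITIONS[0]; // never drop below the lowest rung
  for (const r of RENDITIONS) {
    if (r.bandwidth <= budget) chosen = r;
  }
  return chosen;
}
```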

4.4 CDN Architecture

YouTube uses Google Global Cache (GGC) — servers placed directly inside ISP networks. Content is tiered:

  • Tier 1 (ISP edge): Most popular videos cached inside ISP data centers.
  • Tier 2 (Regional PoPs): Less popular videos at Google's regional points of presence.
  • Tier 3 (Origin): All videos stored in Google's core data centers.

Popular videos are "hot" and cached everywhere; long-tail videos are pulled from origin on demand and cached temporarily.
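The tiered lookup can be sketched with in-memory maps standing in for cache servers (`fetchSegment` is an illustrative name; real CDNs also apply eviction and admission policies this sketch omits):

```javascript
// Try each tier in order (ISP edge -> regional PoP), fall back to origin,
// and fill the caches on the way back
function fetchSegment(key, tiers, originFetch) {
  for (const tier of tiers) {
    if (tier.cache.has(key)) {
      return { body: tier.cache.get(key), source: tier.name }; // cache hit
    }
  }
  const body = originFetch(key); // miss everywhere: go to origin
  for (const tier of tiers) tier.cache.set(key, body);
  return { body, source: "origin" };
}
```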

4.5 Recommendation Engine

YouTube's recommendation system drives 70%+ of watch time. A simplified architecture:

  1. Candidate Generation: From millions of videos, narrow to ~1,000 candidates using collaborative filtering (users who watched X also watched Y) and content-based signals.
  2. Ranking: A deep neural network scores each candidate based on predicted watch time, considering user history, video features, context (time of day, device), and freshness.
  3. Re-ranking: Apply diversity rules, remove duplicates, boost fresh content, filter inappropriate content.

// Simplified recommendation pipeline (helper services are illustrative)
function getRecommendations(userId, context) {
    // Stage 1: Candidate generation (~1000 videos)
    const candidates = [
        ...collaborativeFilter.getCandidates(userId, 500),
        ...contentBased.getSimilar(getRecentWatches(userId), 300),
        ...trending.getVideos(context.country, 200)
    ];

    // Stage 2: Ranking (predict watch time)
    const scored = rankingModel.score(candidates, {
        userProfile: getUserProfile(userId),
        watchHistory: getWatchHistory(userId, { days: 30 }),
        context: context
    });

    // Stage 3: Re-ranking (diversity, freshness, dedup)
    return rerank(scored, {
        diversityWeight: 0.3,
        freshnessBoost: 0.2,
        maxPerChannel: 3
    }).slice(0, 20);
}

5. Database Schema

CREATE TABLE videos (
    id BIGINT PRIMARY KEY,
    channel_id BIGINT NOT NULL,
    title VARCHAR(500) NOT NULL,
    description TEXT,
    duration_seconds INT,
    status ENUM('uploading','transcoding','published','removed') DEFAULT 'uploading',
    privacy ENUM('public','unlisted','private') DEFAULT 'public',
    view_count BIGINT DEFAULT 0,
    like_count INT DEFAULT 0,
    dislike_count INT DEFAULT 0,
    comment_count INT DEFAULT 0,
    manifest_url TEXT,
    thumbnail_url TEXT,
    upload_size_bytes BIGINT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    published_at TIMESTAMP
);

CREATE INDEX idx_videos_channel ON videos(channel_id, published_at DESC);

CREATE TABLE channels (
    id BIGINT PRIMARY KEY,
    user_id BIGINT UNIQUE NOT NULL,
    name VARCHAR(100) NOT NULL,
    description TEXT,
    subscriber_count BIGINT DEFAULT 0,
    video_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE subscriptions (
    subscriber_id BIGINT NOT NULL,
    channel_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (subscriber_id, channel_id)
);

CREATE INDEX idx_subs_channel ON subscriptions(channel_id);

CREATE TABLE comments (
    id BIGINT PRIMARY KEY,
    video_id BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    parent_id BIGINT,
    content TEXT NOT NULL,
    like_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_comments_video ON comments(video_id, created_at);

CREATE TABLE watch_history (
    user_id BIGINT NOT NULL,
    video_id BIGINT NOT NULL,
    watched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    watch_duration_seconds INT,
    watch_percentage DECIMAL(5,2),
    PRIMARY KEY (user_id, watched_at, video_id)
);

6. Key Trade-offs

| Decision | Trade-off |
|---|---|
| Pre-transcode all resolutions vs on-demand | Pre-transcoding guarantees instant playback but costs 3× the storage. On-demand saves storage but adds initial latency. YouTube pre-transcodes popular resolutions (360p–1080p) for all videos and transcodes 4K on demand. |
| View count accuracy vs performance | Real-time COUNT queries are expensive at scale. Use approximate counters with batch updates. YouTube shows approximate counts and reconciles periodically. |
| HLS vs DASH | HLS has broader device support (required for iOS). DASH is a more flexible, open standard. YouTube supports both but primarily uses DASH on the web. |
| Codec choice: H.264 vs VP9 vs AV1 | H.264 has universal hardware support. VP9 saves ~30% bandwidth. AV1 saves ~50% but encoding is ~10× slower. YouTube uses all three, serving the most efficient codec the client supports. |
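The approximate-counter trade-off can be sketched in a few lines, with in-memory maps standing in for Redis and the database (`ApproxViewCounter` is a hypothetical class, not a real API):

```javascript
// Buffers view increments in memory and flushes them to durable
// storage in batches, trading exactness for write throughput
class ApproxViewCounter {
  constructor(flushThreshold = 100) {
    this.pending = new Map();   // videoId -> unflushed increments ("Redis")
    this.persisted = new Map(); // videoId -> durable count ("database")
    this.flushThreshold = flushThreshold;
  }
  recordView(videoId) {
    const n = (this.pending.get(videoId) || 0) + 1;
    this.pending.set(videoId, n);
    if (n >= this.flushThreshold) this.flush(videoId); // batch write
  }
  flush(videoId) {
    const delta = this.pending.get(videoId) || 0;
    this.persisted.set(videoId, (this.persisted.get(videoId) || 0) + delta);
    this.pending.delete(videoId);
  }
  // Displayed count = durable count + whatever is still buffered
  approxCount(videoId) {
    return (this.persisted.get(videoId) || 0) + (this.pending.get(videoId) || 0);
  }
}
```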

7. Scaling Considerations

7.1 Transcoding at Scale

With 720K videos/day, transcoding requires massive compute. Use a priority queue: popular creators and shorter videos get priority. Leverage spot/preemptible instances for cost efficiency. Shard the work across thousands of transcoding workers.

7.2 Storage Optimization

  • Videos with zero views after 30 days: remove higher-resolution transcodes, keep only 360p.
  • Use tiered storage: hot (SSD for recent/popular) → warm (HDD) → cold (tape/archive for rarely accessed).
  • Deduplication: detect and link re-uploads of the same video.

7.3 Database Sharding

Shard the videos table by video_id using hash-based sharding. Watch history is sharded by user_id. Comments are sharded by video_id. Use Redis caching for hot video metadata and view counts.
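The shard routing described above can be sketched as follows (`fnv1a` is a simple stand-in hash; production systems typically use consistent hashing to ease resharding):

```javascript
// FNV-1a: a small, deterministic 32-bit hash for illustration
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const NUM_SHARDS = 64;

// videos and comments route by video_id; watch_history routes by user_id
function shardFor(key) {
  return fnv1a(String(key)) % NUM_SHARDS;
}
```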

7.4 Search

Index video metadata (title, description, tags, auto-generated captions) in Elasticsearch. Use a separate inverted index for caption-based search (searching within video content). Rank results by relevance, freshness, view count, and personalization signals.


8. Frequently Asked Questions

Q1: How does adaptive bitrate streaming work?

The video is split into small segments (2-10 seconds each) and encoded at multiple quality levels. The player downloads a manifest file listing all available qualities. It starts with a low quality, measures download speed, and dynamically switches to higher or lower quality for subsequent segments. This happens seamlessly without the user needing to manually change quality, ensuring smooth playback even on fluctuating networks.

Q2: How does YouTube handle video uploads that fail midway?

YouTube uses resumable uploads. The client uploads in chunks (typically 5-8 MB each). The server tracks the last successfully received byte offset. If the upload is interrupted, the client queries the server for the current offset and resumes from that point. This is critical for large video files on unreliable connections.

Q3: How are view counts tracked at scale?

View events are published to Kafka and processed asynchronously. A streaming pipeline (like Flink) aggregates counts in near-real-time. The aggregated count is periodically flushed to the database. Redis holds the approximate live count for display. YouTube also applies fraud detection to filter bot views, which is why view counts sometimes freeze around 301 for new viral videos while being verified.

Q4: Why not just use a single CDN provider?

At YouTube's scale (2.9 TB/sec outbound), no single CDN can handle the load cost-effectively. YouTube built its own CDN (Google Global Cache) with servers placed directly inside ISP networks. This reduces transit costs, improves latency, and gives Google full control over caching policies and capacity planning.
