Design Instagram: A Photo-Sharing Social Platform

Instagram serves over 2 billion monthly active users, handling photo uploads, news feed generation, stories, and social interactions at massive scale. This system design explores how to build the core features of Instagram, covering storage, feed generation, CDN integration, and the follower graph. It is a classic interview question that tests your understanding of media storage, fan-out strategies, and social graph management.

1. Requirements

Functional Requirements

Users can upload photos and short videos with captions and hashtags.
Users can follow other users (asymmetric relationship).
Generate a personalized news feed showing posts from followed users, ranked by relevance.
Like and comment on posts.
Explore/discover page with trending and recommended content.
Search for users, hashtags, and locations.
User profiles displaying their posts, follower/following counts.

Non-Functional Requirements

High availability: 99.99% uptime. Feed generation and media serving must be resilient.
Low latency: News feed loads in under 500ms. Photo uploads complete in under 5 seconds.
Scalability: Support 500M+ DAU with 100M+ photo uploads per day.
Durability: Uploaded photos must never be lost.
Eventual consistency: Feed updates can tolerate seconds of delay.
Read-heavy system: feed reads vastly outnumber writes (posts).

2. Capacity Estimation

Metric	Estimate
Daily Active Users	500 million
Photo uploads per day	100 million
Average photo size (original)	2 MB
Resized versions per photo	4 sizes (thumbnail, small, medium, original) ≈ 4 MB total (a conservative figure; the variant sizes listed in section 4.2 sum to about 2.2 MB)
Storage per day (photos)	100M × 4 MB = 400 TB/day
Storage per year (photos)	~146 PB/year
Feed reads per day	500M users × 10 feed opens = 5 billion
Feed read QPS	5B / 86,400 ≈ 58,000/sec
Write QPS (posts)	100M / 86,400 ≈ 1,160/sec
Upload bandwidth	1,160 × 2 MB = 2.3 GB/sec inbound

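CDN bandwidth: Outbound image traffic dwarfs inbound uploads. As a rough illustration, if each of the ~58,000 feed reads per second loaded ten 150 KB medium images, that alone would be about 87 GB/sec. Serving images from CDN edge locations rather than from origin is therefore essential.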
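The arithmetic in the table above can be checked with a few lines of JavaScript. This is only a back-of-the-envelope sketch: the inputs are the estimates from the table, the function and field names are illustrative, and decimal units (1 TB = 1,000,000 MB) are assumed.

```javascript
// Back-of-the-envelope check of the capacity table (decimal units assumed).
const SECONDS_PER_DAY = 86_400;

function estimateCapacity({ dau, uploadsPerDay, originalMB, storedMBPerPhoto, feedOpensPerUser }) {
    const storageTBPerDay = (uploadsPerDay * storedMBPerPhoto) / 1e6;   // MB to TB
    const writeQps = uploadsPerDay / SECONDS_PER_DAY;
    return {
        storageTBPerDay,
        storagePBPerYear: (storageTBPerDay * 365) / 1e3,                 // TB to PB
        feedReadQps: (dau * feedOpensPerUser) / SECONDS_PER_DAY,
        writeQps,
        uploadGBPerSec: (writeQps * originalMB) / 1e3,                   // MB to GB
    };
}

const estimate = estimateCapacity({
    dau: 500e6,
    uploadsPerDay: 100e6,
    originalMB: 2,
    storedMBPerPhoto: 4,
    feedOpensPerUser: 10,
});
console.log(estimate);
```

The unrounded write rate is about 1,157 posts per second; the table rounds it to 1,160 before deriving the 2.3 GB/sec upload figure.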

3. High-Level Design

Component	Responsibility
API Gateway	Authentication, rate limiting, request routing
Post Service	Handles photo upload, metadata creation
Media Processing Pipeline	Resize, compress, generate thumbnails
Object Storage (S3)	Stores photo files durably
CDN	Serves photos from edge locations globally
Feed Service	Generates and serves personalized news feeds
Social Graph Service	Manages follow/unfollow relationships
Search Service	Indexes users, hashtags, locations for search
Notification Service	Sends push notifications for likes, comments, follows
Cache Layer	Redis caches for feed, user profiles, hot posts

4. Detailed Component Design

4.1 Photo Upload Pipeline

The upload flow is a multi-step asynchronous pipeline:

Client uploads photo to a pre-signed S3 URL (direct-to-storage upload bypasses application servers).
Client sends metadata (caption, tags, location) to the Post Service via REST API.
Post Service creates a post record in the database with status "processing."
Media Processing Pipeline (triggered via message queue) resizes the photo into multiple dimensions, applies compression, strips EXIF data, and stores all versions in S3.
Post status updated to "published" once processing completes.
Fan-out Service is triggered to distribute the post to followers' feeds.

// Photo upload API
POST /api/v1/posts
Headers: Authorization: Bearer <token>
Body (JSON; the photo itself was already uploaded directly to S3): {
    "caption": "Beautiful sunset",
    "hashtags": ["sunset", "nature"],
    "location": {"lat": 34.05, "lng": -118.24, "name": "Los Angeles"},
    "media_key": "s3://uploads/user123/photo_abc.jpg"
}
Response: {
    "post_id": "post_789",
    "status": "processing",
    "created_at": "2025-01-25T10:00:00Z"
}
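The Post Service side of this flow can be sketched in memory as follows. This is a minimal illustration, not a real service: `posts` and `processingQueue` stand in for the database and the message queue, and `createPost` and `markPublished` are hypothetical names.

```javascript
// In-memory sketch of the Post Service side of the upload flow.
// `posts` and `processingQueue` stand in for the database and message queue.
const posts = new Map();
const processingQueue = [];
let nextId = 1;

function createPost(userId, { caption = "", hashtags = [], location = null, media_key }) {
    if (!media_key) throw new Error("media_key is required");
    const post = {
        post_id: `post_${nextId++}`,
        user_id: userId,
        caption, hashtags, location, media_key,
        status: "processing",
        created_at: new Date().toISOString(),
    };
    posts.set(post.post_id, post);
    // Enqueue a job so the media pipeline can resize and compress the photo.
    processingQueue.push({ post_id: post.post_id, media_key });
    return { post_id: post.post_id, status: post.status, created_at: post.created_at };
}

// Called by the (hypothetical) media worker once all variants are stored.
function markPublished(postId) {
    const post = posts.get(postId);
    if (!post || post.status !== "processing") return false;
    post.status = "published";
    return true;   // the caller would now trigger fan-out
}
```

Returning `false` for a post that is not in the "processing" state makes `markPublished` safe to retry if the queue delivers the same job twice.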

4.2 Media Storage Strategy

Each uploaded photo is stored in multiple resolutions in object storage:

Version	Dimensions	Avg Size	Use Case
Thumbnail	150x150	15 KB	Grid view, notifications
Small	320x320	50 KB	Low-bandwidth feed
Medium	640x640	150 KB	Standard feed view
Original	Up to 1080x1080	2 MB	Full-screen view

The CDN URL structure encodes the size variant: cdn.instagram.com/photos/{post_id}/{size}.jpg. The client requests the appropriate size based on screen resolution and network conditions.
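A client-side helper for choosing the size variant might look like the sketch below. The widths come from the table above; the "step down one size on a slow network" policy and the function names are assumptions made for illustration.

```javascript
// Size variants from the table above, smallest first (width in pixels).
const VARIANTS = [
    { name: "thumbnail", width: 150 },
    { name: "small", width: 320 },
    { name: "medium", width: 640 },
    { name: "original", width: 1080 },
];

// Pick the smallest variant that covers the display width;
// on a slow connection, step down one size (hypothetical policy).
function pickVariant(displayWidthPx, slowNetwork = false) {
    let index = VARIANTS.findIndex((v) => v.width >= displayWidthPx);
    if (index === -1) index = VARIANTS.length - 1;   // wider than any variant: use the largest
    if (slowNetwork && index > 0) index -= 1;
    return VARIANTS[index].name;
}

function photoUrl(postId, displayWidthPx, slowNetwork = false) {
    const size = pickVariant(displayWidthPx, slowNetwork);
    return `https://cdn.instagram.com/photos/${encodeURIComponent(postId)}/${size}.jpg`;
}
```

Because the size is part of the URL path, each variant is cached by the CDN as its own immutable object.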

4.3 News Feed Generation

This is the most complex part of the system. Two fundamental approaches exist:

Fan-out on Write (Push Model)

When a user publishes a post, immediately push the post ID into every follower's pre-computed feed cache.

async function fanOutOnWrite(post) {
    const followers = await socialGraph.getFollowers(post.user_id);
    // Redis sorted-set scores must be numeric, so convert the timestamp to epoch milliseconds
    const score = new Date(post.created_at).getTime();

    for (const followerId of followers) {
        // Push post_id to each follower's feed in a Redis sorted set
        // Score = timestamp for chronological ordering
        await redis.zadd(`feed:${followerId}`, score, post.post_id);
        // Trim the feed to the newest 1000 posts (removes the lowest-scored entries)
        await redis.zremrangebyrank(`feed:${followerId}`, 0, -1001);
    }
}

Pros: Fast feed reads (pre-computed). Cons: Slow writes for users with millions of followers (celebrity problem). High memory usage for pre-computed feeds.

Fan-out on Read (Pull Model)

When a user requests their feed, fetch recent posts from all users they follow and merge them in real time.

Pros: No write amplification. Cons: Slow feed reads; requires querying many users' post lists and merging.
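As a rough sketch of the pull model, the merge step can be written as a k-way merge over each followee's newest-first post list. `postsByUser` is a hypothetical in-memory stand-in for the per-user post index, and `created_at` is assumed to be a numeric timestamp.

```javascript
// Pull-model sketch: merge each followee's newest-first post list into one feed.
// `postsByUser` is a hypothetical in-memory stand-in for per-user post lists.
function pullFeed(followeeIds, postsByUser, limit = 20) {
    // One cursor per followee, pointing at that user's newest unconsumed post.
    const cursors = followeeIds.map((id) => ({ posts: postsByUser.get(id) ?? [], i: 0 }));
    const feed = [];
    while (feed.length < limit) {
        let best = null;
        for (const c of cursors) {
            const candidate = c.posts[c.i];
            if (candidate && (!best || candidate.created_at > best.posts[best.i].created_at)) {
                best = c;
            }
        }
        if (!best) break;              // every list is exhausted
        feed.push(best.posts[best.i]);
        best.i += 1;
    }
    return feed;
}
```

The linear scan over cursors costs one pass over all followees per returned post, which illustrates why read latency grows with the number of accounts followed; a heap would reduce that cost but not remove the per-followee lookups.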

Hybrid Approach (Recommended)

Large platforms such as Instagram and Twitter are commonly described as using a hybrid: fan-out on write for normal users (below a follower threshold, for example 10K), and fan-out on read for celebrity accounts above that threshold, which can have millions of followers. When generating a feed, merge the pre-computed feed with fresh posts from followed celebrities.

// lastFeedRefresh: timestamp of the user's previous feed load, supplied by the caller
async function getFeed(userId, page, lastFeedRefresh) {
    const pageSize = 20;

    // 1. Get one page of the pre-computed feed (pushed by normal users), newest first
    const feedPostIds = await redis.zrevrange(`feed:${userId}`, page * pageSize, (page + 1) * pageSize - 1);

    // 2. Get followed celebrities
    const celebrities = await socialGraph.getFollowedCelebrities(userId);

    // 3. Fetch recent posts from celebrities (pull)
    const celebrityPosts = await postService.getRecentPosts(celebrities, { since: lastFeedRefresh });

    // 4. Merge, rank, and return
    const mergedFeed = rankAndMerge(feedPostIds, celebrityPosts);
    return hydratePosts(mergedFeed);  // Fetch full post data
}
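The `rankAndMerge` helper is not defined in this article. One possible sketch is shown below, with two loud assumptions: both inputs are lists of `{ post_id, created_at }` entries with numeric timestamps (for the pushed feed, that means reading the sorted set together with its scores), and plain reverse-chronological order stands in for a real ranking model.

```javascript
// Hypothetical rankAndMerge: chronological ordering stands in for a real ranking model.
function rankAndMerge(pushedEntries, celebrityPosts, limit = 20) {
    const byId = new Map();
    for (const entry of [...pushedEntries, ...celebrityPosts]) {
        byId.set(entry.post_id, entry);          // de-duplicate by post ID
    }
    return [...byId.values()]
        .sort((a, b) => b.created_at - a.created_at)   // newest first
        .slice(0, limit)
        .map((entry) => entry.post_id);
}
```

De-duplication matters because an account can cross the celebrity threshold while some of its older posts are still sitting in followers' pushed feeds.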

4.4 Social Graph

The follow relationship is asymmetric (A following B does not mean B follows A). Store it in a graph-like structure:

CREATE TABLE follows (
    follower_id BIGINT NOT NULL,
    followee_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)
);

CREATE INDEX idx_follows_followee ON follows(followee_id, follower_id);

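The primary key and the secondary index together allow efficient queries in both directions: the primary key (follower_id, followee_id) answers "who does user X follow?" and idx_follows_followee answers "who follows user X?" For users with millions of followers, store the follower count in a separate counter cache (Redis) to avoid COUNT queries. Shard the table by user ID for scalability; note that sharding by follower_id keeps "who does X follow?" on a single shard, while "who follows X?" then spans shards unless a second copy keyed by followee_id is maintained.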
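The two access paths can be illustrated with an in-memory sketch in which one map mirrors the primary key order and the other mirrors the secondary index. The class and method names are hypothetical; a real Social Graph Service would sit on top of the sharded table and the Redis counters.

```javascript
// In-memory sketch of the follows table with both access paths.
class SocialGraph {
    constructor() {
        this.following = new Map();  // follower_id -> Set of followee_ids (primary key order)
        this.followers = new Map();  // followee_id -> Set of follower_ids (secondary index order)
    }
    static add(index, key, value) {
        if (!index.has(key)) index.set(key, new Set());
        index.get(key).add(value);
    }
    follow(followerId, followeeId) {
        if (followerId === followeeId) return false;                        // no self-follows
        if (this.following.get(followerId)?.has(followeeId)) return false;  // already following
        SocialGraph.add(this.following, followerId, followeeId);
        SocialGraph.add(this.followers, followeeId, followerId);
        return true;
    }
    unfollow(followerId, followeeId) {
        const removed = this.following.get(followerId)?.delete(followeeId) ?? false;
        if (removed) this.followers.get(followeeId).delete(followerId);
        return removed;
    }
    getFollowing(userId) { return [...(this.following.get(userId) ?? [])]; }
    getFollowers(userId) { return [...(this.followers.get(userId) ?? [])]; }
    // Constant-time size lookup, playing the role of the counter cache.
    followerCount(userId) { return this.followers.get(userId)?.size ?? 0; }
}
```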

4.5 Explore and Search

The Explore page shows trending and personalized content from users you do not follow. It relies on:

Engagement signals: Posts with high like velocity, comment count, and share rate.
Content-based filtering: Analyze hashtags, captions, and image features to match user interests.
Collaborative filtering: "Users similar to you liked these posts."
Search: Elasticsearch indexes usernames, hashtags, captions, and location names for full-text search.
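To make the engagement signal concrete, here is one way a trending score could be computed. The formula, the weights, and the decay exponent are purely illustrative assumptions, not a description of how Instagram scores posts.

```javascript
// Hypothetical trending score: weighted engagement divided by a time-decay term.
// The weights and the decay exponent are illustrative, not tuned values.
function trendingScore(post, nowMs) {
    const ageHours = Math.max(0, (nowMs - post.created_at_ms) / 3_600_000);
    const engagement = post.likes + 2 * post.comments + 3 * post.shares;
    return engagement / Math.pow(ageHours + 2, 1.5);
}

function topTrending(posts, nowMs, limit = 10) {
    return [...posts]
        .sort((a, b) => trendingScore(b, nowMs) - trendingScore(a, nowMs))
        .slice(0, limit);
}
```

Dividing by a power of the post's age approximates "like velocity": a recent post with moderate engagement can outrank an older post with more total likes.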

5. Database Schema

CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    username VARCHAR(30) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    display_name VARCHAR(100),
    bio TEXT,
    profile_photo_url TEXT,
    follower_count INT DEFAULT 0,
    following_count INT DEFAULT 0,
    post_count INT DEFAULT 0,
    is_verified BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE posts (
    id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    caption TEXT,
    location_name VARCHAR(255),
    location_lat DECIMAL(9,6),
    location_lng DECIMAL(9,6),
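    -- Inline ENUM(...) is MySQL syntax; CHECK constraints also work in PostgreSQL
    media_type VARCHAR(16) CHECK (media_type IN ('photo', 'video', 'carousel')),
    like_count INT DEFAULT 0,
    comment_count INT DEFAULT 0,
    status VARCHAR(16) DEFAULT 'processing' CHECK (status IN ('processing', 'published', 'deleted')),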
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_posts_user_id ON posts(user_id, created_at DESC);

CREATE TABLE post_media (
    id BIGINT PRIMARY KEY,
    post_id BIGINT NOT NULL REFERENCES posts(id),
    media_url_template TEXT NOT NULL,
    width INT,
    height INT,
    duration_seconds INT,
    sort_order INT DEFAULT 0
);

CREATE TABLE likes (
    user_id BIGINT NOT NULL,
    post_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, post_id)
);

CREATE INDEX idx_likes_post ON likes(post_id, created_at DESC);

CREATE TABLE comments (
    id BIGINT PRIMARY KEY,
    post_id BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    parent_comment_id BIGINT,
    content TEXT NOT NULL,
    like_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_comments_post ON comments(post_id, created_at);

6. Key Trade-offs

Decision	Option A	Option B	Instagram's Choice
Feed generation	Fan-out on write	Fan-out on read	Hybrid (push for normal, pull for celebrities)
Feed ranking	Chronological	ML-based relevance	ML-ranked with chronological option
Photo storage	Own infrastructure	Cloud object storage	S3 (originally) → custom (at scale)
Database	PostgreSQL	Cassandra	PostgreSQL (sharded) + Cassandra for feeds
Like counts	Real-time COUNT query	Denormalized counter	Denormalized with async counter update
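The last row of the table (a denormalized like count with asynchronous updates) could be sketched as below. `store` is a hypothetical stand-in for the like_count column, and the batching policy is an assumption; a production system would typically drive the flush from a queue consumer or a timer.

```javascript
// Sketch of a denormalized like counter with batched (asynchronous) flushes.
// `store` is a hypothetical stand-in for the posts table's like_count column.
class LikeCounter {
    constructor(store) {
        this.store = store;          // Map: post_id -> persisted like_count
        this.pending = new Map();    // post_id -> delta not yet flushed
    }
    like(postId)   { this.bump(postId, +1); }
    unlike(postId) { this.bump(postId, -1); }
    bump(postId, delta) {
        this.pending.set(postId, (this.pending.get(postId) ?? 0) + delta);
    }
    // Run periodically (for example, from a timer or a queue consumer).
    flush() {
        for (const [postId, delta] of this.pending) {
            if (delta !== 0) this.store.set(postId, (this.store.get(postId) ?? 0) + delta);
        }
        this.pending.clear();
    }
    // Reads combine the persisted value with any unflushed delta.
    count(postId) {
        return (this.store.get(postId) ?? 0) + (this.pending.get(postId) ?? 0);
    }
}
```

Batching turns many small row updates on a hot post into a single write per flush interval, at the cost of the persisted count lagging briefly behind.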

7. Scaling Considerations

7.1 CDN for Media Delivery

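A CDN is non-negotiable. Photos are cached at CDN edge locations worldwide, typically pulled from origin on the first request in each region. Because photos are immutable, Cache-Control headers with long TTLs keep origin fetches to a minimum. At the 5 billion daily feed opens estimated in section 2, even a handful of images per open implies tens of billions of image requests per day, nearly all of which should be served from the edge.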
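As a small illustration, an origin could attach headers like the following to every photo variant. The one-year TTL and the helper name are assumptions; `immutable` is a standard Cache-Control directive telling caches not to revalidate the object while it is fresh.

```javascript
// Response headers an origin could send for an immutable photo variant.
// One year is a common upper bound for max-age; the exact TTL is an assumption.
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

function immutablePhotoHeaders(contentType = "image/jpeg") {
    return {
        "Content-Type": contentType,
        "Cache-Control": `public, max-age=${ONE_YEAR_SECONDS}, immutable`,
    };
}
```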

7.2 Database Sharding Strategy

Shard the users and posts tables by user_id using consistent hashing. This co-locates a user's posts with their profile for efficient profile page loads. Cross-shard queries (like feed generation) are handled by the feed service aggregating from the pre-computed Redis cache.
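A minimal consistent-hashing ring for routing a user_id to a shard might look like this. It is a sketch under stated assumptions: shard names are hypothetical, MD5 is used only as a convenient well-distributed hash, and 100 virtual nodes per shard is an arbitrary choice.

```javascript
const { createHash } = require("node:crypto");

// Map a string to a 32-bit position on the ring using the first 4 bytes of MD5.
function ringPosition(key) {
    return createHash("md5").update(String(key)).digest().readUInt32BE(0);
}

class HashRing {
    constructor(shards, virtualNodes = 100) {
        this.ring = [];
        for (const shard of shards) {
            for (let v = 0; v < virtualNodes; v++) {
                this.ring.push({ position: ringPosition(`${shard}#${v}`), shard });
            }
        }
        this.ring.sort((a, b) => a.position - b.position);
    }
    // First virtual node clockwise from the key's position (wrapping around).
    shardFor(userId) {
        const position = ringPosition(userId);
        let lo = 0, hi = this.ring.length;
        while (lo < hi) {
            const mid = (lo + hi) >>> 1;
            if (this.ring[mid].position < position) lo = mid + 1; else hi = mid;
        }
        return this.ring[lo % this.ring.length].shard;
    }
}
```

The property that matters for resharding is that adding a shard only moves keys onto the new shard; keys never shuffle between existing shards.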

7.3 Caching Layers

Multiple caching layers are essential:

Feed cache (Redis): Pre-computed list of post IDs per user.
Post cache (Redis or Memcached): Full post objects for hot posts.
User profile cache: Follower counts, profile data.
Session cache: Authentication tokens.
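Each of these layers usually follows the cache-aside pattern: read from the cache, fall back to the database on a miss, and repopulate. The sketch below shows the idea with an in-process map; `TtlCache` and `loadFromDb` are hypothetical names, and a real deployment would use Redis or Memcached instead of a local Map.

```javascript
// Cache-aside sketch with a TTL. `loadFromDb` is a hypothetical async loader.
class TtlCache {
    constructor(ttlMs, now = () => Date.now()) {
        this.ttlMs = ttlMs;
        this.now = now;               // injectable clock, useful for testing
        this.entries = new Map();     // key -> { value, expiresAt }
    }
    async getOrLoad(key, loadFromDb) {
        const hit = this.entries.get(key);
        if (hit && hit.expiresAt > this.now()) return hit.value;   // cache hit
        const value = await loadFromDb(key);                        // miss: go to the database
        this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
        return value;
    }
    // Call after a write to the source of truth so readers do not see stale data.
    invalidate(key) { this.entries.delete(key); }
}
```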

7.4 Handling the Celebrity Problem

A user with 100M followers would require 100M Redis writes on each post under fan-out on write. Instead, mark users above a follower threshold (e.g., 10K followers) as celebrities. Their posts are not fanned out; they are pulled in real time when followers request their feed.
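The routing decision at publish time can be expressed in a few lines. The threshold is the example value from the text and should be treated as a tunable assumption; `planDelivery` is a hypothetical helper.

```javascript
// Threshold from the text; treat it as a tunable assumption.
const CELEBRITY_THRESHOLD = 10_000;

// Decide how a new post reaches followers and how many feed-cache writes it costs.
function planDelivery(followerCount, threshold = CELEBRITY_THRESHOLD) {
    if (followerCount > threshold) {
        return { strategy: "pull", feedWrites: 0 };          // merged at read time
    }
    return { strategy: "push", feedWrites: followerCount };  // one write per follower
}
```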

8. Frequently Asked Questions

Q1: How does Instagram handle the celebrity problem in feed generation?

Instagram uses a hybrid fan-out approach. For regular users with fewer than ~10K followers, posts are fanned out on write (pushed to each follower's feed cache). For celebrities with millions of followers, posts are fetched on read (pulled when a follower opens their feed). The feed service merges both sources, ranks them, and returns the combined feed.

Q2: How do you store and serve billions of photos efficiently?

Photos are stored in object storage (like S3) in multiple resolutions. A CDN serves photos from edge locations closest to users. Photos are immutable (never modified), which makes CDN caching highly effective with long TTLs. The client requests the appropriate resolution based on screen size and network speed, reducing bandwidth usage.

Q3: How would you handle real-time notifications for likes and comments?

When a user likes or comments on a post, an event is published to a message queue. The Notification Service consumes these events and sends push notifications to the post owner. To prevent notification spam, aggregate notifications (e.g., "User A and 15 others liked your post"). Use a brief delay (30-60 seconds) before sending to allow aggregation.
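The aggregation step could be sketched as follows. For simplicity this version groups events into fixed 30-second windows per post rather than starting a timer at the first event; the event shape `{ postId, actor, atMs }` and the function name are assumptions.

```javascript
// Collapse like events per post into one notification per aggregation window.
// Events are { postId, actor, atMs }; the 30-second window is taken from the text.
function aggregateLikes(events, windowMs = 30_000) {
    const buckets = new Map();   // "postId:windowIndex" -> { postId, actors }
    for (const { postId, actor, atMs } of events) {
        const key = `${postId}:${Math.floor(atMs / windowMs)}`;
        if (!buckets.has(key)) buckets.set(key, { postId, actors: [] });
        const bucket = buckets.get(key);
        if (!bucket.actors.includes(actor)) bucket.actors.push(actor);   // ignore repeat likes
    }
    return [...buckets.values()].map(({ postId, actors }) => {
        const others = actors.length - 1;
        const who = others === 0
            ? actors[0]
            : `${actors[0]} and ${others} other${others === 1 ? "" : "s"}`;
        return { postId, message: `${who} liked your post` };
    });
}
```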

Q4: How is the Explore page generated?

The Explore page uses a multi-stage pipeline: (1) A candidate generation phase selects thousands of potentially interesting posts using collaborative filtering and content signals. (2) A ranking model scores each candidate based on predicted engagement. (3) A diversity filter ensures variety in topics and content types. (4) Results are cached per-user with a TTL of a few minutes for freshness.
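The first three stages can be sketched as a small pipeline. Everything here is illustrative: `predictedEngagement` is a hypothetical precomputed model score, candidate generation is reduced to filtering out followed accounts, and the diversity rule is a simple per-topic cap.

```javascript
// Three-stage sketch: candidates, then ranking, then a diversity filter.
// `predictedEngagement` is a hypothetical precomputed model score on each post.
function buildExplore(candidates, followedIds, { limit = 3, maxPerTopic = 1 } = {}) {
    // 1. Candidate generation: drop posts from accounts the user already follows.
    const pool = candidates.filter((post) => !followedIds.has(post.authorId));
    // 2. Ranking: highest predicted engagement first.
    const ranked = [...pool].sort((a, b) => b.predictedEngagement - a.predictedEngagement);
    // 3. Diversity: cap how many posts any single topic contributes.
    const perTopic = new Map();
    const result = [];
    for (const post of ranked) {
        const used = perTopic.get(post.topic) ?? 0;
        if (used >= maxPerTopic) continue;
        perTopic.set(post.topic, used + 1);
        result.push(post);
        if (result.length === limit) break;
    }
    return result;
}
```

The output of such a pipeline is what would be cached per user for a few minutes, as described in step (4).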