Design Instagram: A Photo-Sharing Social Platform
Instagram serves over 2 billion monthly active users, handling photo uploads, news feed generation, stories, and social interactions at massive scale. This system design explores how to build the core features of Instagram, covering storage, feed generation, CDN integration, and the follower graph. It is a classic interview question that tests your understanding of media storage, fan-out strategies, and social graph management.
1. Requirements
Functional Requirements
- Users can upload photos and short videos with captions and hashtags.
- Users can follow other users (asymmetric relationship).
- Generate a personalized news feed showing posts from followed users, ranked by relevance.
- Like and comment on posts.
- Explore/discover page with trending and recommended content.
- Search for users, hashtags, and locations.
- User profiles displaying their posts, follower/following counts.
Non-Functional Requirements
- High availability: 99.99% uptime. Feed generation and media serving must be resilient.
- Low latency: News feed loads in under 500ms. Photo uploads complete in under 5 seconds.
- Scalability: Support 500M+ DAU with 100M+ photo uploads per day.
- Durability: Uploaded photos must never be lost.
- Eventual consistency: Feed updates can tolerate seconds of delay.
- Read-heavy system: feed reads vastly outnumber writes (posts).
2. Capacity Estimation
| Metric | Estimate |
|---|---|
| Daily Active Users | 500 million |
| Photo uploads per day | 100 million |
| Average photo size (original) | 2 MB |
| Resized versions per photo | 4 sizes (thumbnail, small, medium, original) ≈ 4 MB total (a conservative figure; the average variant sizes in section 4.2 sum to roughly 2.2 MB) |
| Storage per day (photos) | 100M × 4 MB = 400 TB/day |
| Storage per year (photos) | ~146 PB/year |
| Feed reads per day | 500M users × 10 feed opens = 5 billion |
| Feed read QPS | 5B / 86,400 ≈ 58,000/sec |
| Write QPS (posts) | 100M / 86,400 ≈ 1,160/sec |
| Upload bandwidth | 1,160 × 2 MB = 2.3 GB/sec inbound |
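CDN bandwidth: Because the system is read-heavy, every uploaded photo is downloaded many times, so outbound image traffic to 500M users far exceeds the 2.3 GB/sec inbound upload rate. A CDN is essential to serve that traffic from edge locations rather than from origin storage.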
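The estimates above can be reproduced with a few lines of arithmetic. The sketch below uses the table's inputs with decimal units (1 TB = 10^6 MB). The `imagesPerFeedOpen` value and the choice of the 150 KB medium variant for outbound traffic are assumptions made only for illustration, not figures from the table.

```javascript
// Back-of-the-envelope capacity math using the estimates in the table above.
function estimateCapacity() {
  const SECONDS_PER_DAY = 86_400;
  const uploadsPerDay = 100e6;    // 100 million photos per day
  const storedMBPerPhoto = 4;     // all resized versions combined
  const originalMB = 2;           // average original size
  const dailyActiveUsers = 500e6;
  const feedOpensPerUser = 10;

  // Assumptions NOT in the table, used only to size CDN egress.
  const imagesPerFeedOpen = 10;   // hypothetical
  const servedKBPerImage = 150;   // "medium" variant

  const storageTBPerDay = (uploadsPerDay * storedMBPerPhoto) / 1e6; // MB -> TB
  const storagePBPerYear = (storageTBPerDay * 365) / 1000;          // TB -> PB
  const feedReadQps = (dailyActiveUsers * feedOpensPerUser) / SECONDS_PER_DAY;
  const writeQps = uploadsPerDay / SECONDS_PER_DAY;
  const uploadGBPerSec = (writeQps * originalMB) / 1000;            // MB -> GB
  const cdnGBPerSec =
    (feedReadQps * imagesPerFeedOpen * servedKBPerImage) / 1e6;     // KB -> GB

  return {
    storageTBPerDay,
    storagePBPerYear,
    feedReadQps,
    writeQps,
    uploadGBPerSec,
    cdnGBPerSec,
  };
}

console.log(estimateCapacity());
```

Under these assumptions outbound traffic comes to roughly 87 GB/sec, well over an order of magnitude above the inbound upload rate, which is the gap the CDN has to absorb.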
3. High-Level Design
| Component | Responsibility |
|---|---|
| API Gateway | Authentication, rate limiting, request routing |
| Post Service | Handles photo upload, metadata creation |
| Media Processing Pipeline | Resize, compress, generate thumbnails |
| Object Storage (S3) | Stores photo files durably |
| CDN | Serves photos from edge locations globally |
| Feed Service | Generates and serves personalized news feeds |
| Social Graph Service | Manages follow/unfollow relationships |
| Search Service | Indexes users, hashtags, locations for search |
| Notification Service | Sends push notifications for likes, comments, follows |
| Cache Layer | Redis caches for feed, user profiles, hot posts |
4. Detailed Component Design
4.1 Photo Upload Pipeline
The upload flow is a multi-step asynchronous pipeline:
- Client uploads photo to a pre-signed S3 URL (direct-to-storage upload bypasses application servers).
- Client sends metadata (caption, tags, location) to the Post Service via REST API.
- Post Service creates a post record in the database with status "processing."
- Media Processing Pipeline (triggered via message queue) resizes the photo into multiple dimensions, applies compression, strips EXIF data, and stores all versions in S3.
- Post status updated to "published" once processing completes.
- Fan-out Service is triggered to distribute the post to followers' feeds.
// Create-post API (called after the photo has been uploaded to the pre-signed S3 URL)
POST /api/v1/posts
Headers: Authorization: Bearer <token>
         Content-Type: application/json
Body (JSON): {
  "caption": "Beautiful sunset",
  "hashtags": ["sunset", "nature"],
  "location": {"lat": 34.05, "lng": -118.24, "name": "Los Angeles"},
  "media_key": "s3://uploads/user123/photo_abc.jpg"
}
Response: {
  "post_id": "post_789",
  "status": "processing",
  "created_at": "2025-01-25T10:00:00Z"
}
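The status transitions in the pipeline can be sketched with in-memory stand-ins. Everything here is hypothetical scaffolding: `posts` plays the role of the database, `processingQueue` the message queue, and `processNext` the media worker. No real resizing or S3 calls are made; the worker only records the object keys it would have written.

```javascript
// In-memory stand-ins for the post table and the processing queue (hypothetical).
const posts = new Map();
const processingQueue = [];
const SIZES = ["thumbnail", "small", "medium", "original"];
let nextId = 1;

// Steps 2-3: the Post Service stores metadata with status "processing"
// and enqueues the post for the media pipeline.
function createPost(userId, mediaKey, caption) {
  const post = {
    postId: `post_${nextId++}`,
    userId,
    mediaKey,
    caption,
    status: "processing",
    variants: {},
  };
  posts.set(post.postId, post);
  processingQueue.push(post.postId);
  return { post_id: post.postId, status: post.status };
}

// Steps 4-6: a worker takes one job, records a key per size variant,
// marks the post "published", then triggers fan-out via the callback.
function processNext(onPublished) {
  const postId = processingQueue.shift();
  if (postId === undefined) return null; // queue is empty
  const post = posts.get(postId);
  for (const size of SIZES) {
    post.variants[size] = `photos/${postId}/${size}.jpg`;
  }
  post.status = "published";
  onPublished(post);
  return post;
}
```

The API call returns immediately with status "processing"; followers only see the post after the worker has run and the fan-out callback fires.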
4.2 Media Storage Strategy
Each uploaded photo is stored in multiple resolutions in object storage:
| Version | Dimensions | Avg Size | Use Case |
|---|---|---|---|
| Thumbnail | 150x150 | 15 KB | Grid view, notifications |
| Small | 320x320 | 50 KB | Low-bandwidth feed |
| Medium | 640x640 | 150 KB | Standard feed view |
| Original | Up to 1080x1080 | 2 MB | Full-screen view |
The CDN URL structure encodes the size variant: cdn.instagram.com/photos/{post_id}/{size}.jpg. The client requests the appropriate size based on screen resolution and network conditions.
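Client-side size selection might look like the following sketch. The variant widths come from the table above. The selection rule (smallest variant that covers the display width, one step down on a slow network) and the function names `pickVariant` and `buildPhotoUrl` are assumptions for illustration.

```javascript
// Variant widths taken from the storage table above.
const VARIANTS = [
  { name: "thumbnail", width: 150 },
  { name: "small", width: 320 },
  { name: "medium", width: 640 },
  { name: "original", width: 1080 },
];

// Pick the smallest variant that covers the display width.
// On a slow network, step down one size (never below the thumbnail).
function pickVariant(displayWidthPx, slowNetwork = false) {
  let idx = VARIANTS.findIndex((v) => v.width >= displayWidthPx);
  if (idx === -1) idx = VARIANTS.length - 1; // wider than any variant
  if (slowNetwork && idx > 0) idx -= 1;
  return VARIANTS[idx].name;
}

// Build a URL following the cdn.instagram.com/photos/{post_id}/{size}.jpg pattern.
function buildPhotoUrl(postId, size) {
  return `https://cdn.instagram.com/photos/${encodeURIComponent(postId)}/${size}.jpg`;
}

console.log(buildPhotoUrl("post_789", pickVariant(400)));
```

Because the size is part of the URL path, each variant is cached independently at the edge.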
4.3 News Feed Generation
This is the most complex part of the system. Two fundamental approaches exist:
Fan-out on Write (Push Model)
When a user publishes a post, immediately push the post ID into every follower's pre-computed feed cache.
async function fanOutOnWrite(post) {
  const followers = await socialGraph.getFollowers(post.user_id);
  // Sorted-set scores must be numeric, so convert the ISO-8601
  // created_at string to epoch milliseconds.
  const score = Date.parse(post.created_at);
  for (const followerId of followers) {
    const key = `feed:${followerId}`;
    // Push post_id to the follower's feed in a Redis sorted set
    // Score = timestamp for chronological ordering
    await redis.zadd(key, score, post.post_id);
    // Trim the feed to the newest 1,000 posts (ranks 0..-1001 are the oldest)
    await redis.zremrangebyrank(key, 0, -1001);
  }
}
Pros: Fast feed reads (pre-computed). Cons: Slow writes for users with millions of followers (celebrity problem). High memory usage for pre-computed feeds.
Fan-out on Read (Pull Model)
When a user requests their feed, fetch recent posts from all users they follow and merge them in real-time.
Pros: No write amplification. Cons: Slow feed reads; requires querying many users' post lists and merging.
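A minimal sketch of the pull model, assuming each user's posts are already available newest-first in a `postsByUser` map (a stand-in for per-user post queries). The function name and the `perUserLimit` parameter are hypothetical.

```javascript
// postsByUser: Map<userId, Array<{ postId, createdAt }>>, each list newest-first.
function pullFeed(followeeIds, postsByUser, limit = 20, perUserLimit = 20) {
  const candidates = [];
  for (const id of followeeIds) {
    // One lookup per followed user: this is the read-time cost of the pull model.
    const recent = (postsByUser.get(id) || []).slice(0, perUserLimit);
    candidates.push(...recent);
  }
  // Merge by sorting all candidates newest-first, then cut to one page.
  candidates.sort((a, b) => b.createdAt - a.createdAt);
  return candidates.slice(0, limit);
}
```

The loop makes the trade-off visible: read cost grows with the number of accounts followed, whereas the push model pays that cost once at write time.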
Hybrid Approach (Recommended)
Large platforms such as Instagram and Twitter reportedly use a hybrid: fan-out on write for normal users (fewer than ~10K followers) and fan-out on read for accounts above that threshold, such as celebrities with millions of followers. When generating a feed, merge the pre-computed feed with fresh posts from followed celebrities.
// lastFeedRefresh: time of the user's previous feed load, supplied by the caller
async function getFeed(userId, page, lastFeedRefresh) {
  const PAGE_SIZE = 20;
  // 1. Get pre-computed feed (from push for normal users), newest first
  const feedPostIds = await redis.zrevrange(
    `feed:${userId}`, page * PAGE_SIZE, (page + 1) * PAGE_SIZE - 1
  );
  // 2. Get followed celebrities
  const celebrities = await socialGraph.getFollowedCelebrities(userId);
  // 3. Fetch recent posts from celebrities (pull)
  const celebrityPosts = await postService.getRecentPosts(celebrities, { since: lastFeedRefresh });
  // 4. Merge, rank, and return
  const mergedFeed = rankAndMerge(feedPostIds, celebrityPosts);
  return hydratePosts(mergedFeed); // Fetch full post data
}
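The merge step is left abstract above. One possible shape for `rankAndMerge` is sketched here, assuming both inputs carry `{ postId, createdAt }` entries (for the pushed feed, the sorted-set score can supply the timestamp). Chronological order stands in for a real ranking model.

```javascript
// Merge pushed (pre-computed) and pulled (celebrity) entries into one page.
// Assumption: both inputs are arrays of { postId, createdAt }.
function rankAndMerge(pushed, pulled, limit = 20) {
  const byId = new Map();
  for (const entry of [...pushed, ...pulled]) {
    // De-duplicate: a post may appear in both sources.
    if (!byId.has(entry.postId)) byId.set(entry.postId, entry);
  }
  // Newest-first ordering is a placeholder for relevance ranking.
  return [...byId.values()]
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit);
}
```

De-duplication matters when an account crosses the celebrity threshold: its older posts may already sit in followers' pushed feeds while newer ones arrive through the pull path.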
4.4 Follower System (Social Graph)
The follow relationship is asymmetric (A follows B does not mean B follows A). Store in a graph-like structure:
CREATE TABLE follows (
  follower_id BIGINT NOT NULL,
  followee_id BIGINT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (follower_id, followee_id)
);
CREATE INDEX idx_follows_followee ON follows(followee_id, follower_id);
The primary key and the secondary index together allow efficient queries in both directions: "who does user X follow?" (primary key, which leads with follower_id) and "who follows user X?" (idx_follows_followee). For users with millions of followers, store the follower count in a separate counter cache (Redis) to avoid COUNT queries. Use database sharding by user_id for scalability.
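The two access paths can be mirrored in memory with a pair of maps, one per direction. This is only an illustration of the data structure; the class name `FollowGraph` and its methods are hypothetical, and a production system would back it with the sharded table above.

```javascript
class FollowGraph {
  constructor() {
    this.following = new Map(); // follower_id -> Set of followee_ids
    this.followers = new Map(); // followee_id -> Set of follower_ids
  }

  // Returns true if a new edge was created.
  follow(followerId, followeeId) {
    if (followerId === followeeId) return false; // no self-follow
    const out = this._setFor(this.following, followerId);
    if (out.has(followeeId)) return false;       // already following
    out.add(followeeId);
    this._setFor(this.followers, followeeId).add(followerId);
    return true;
  }

  // Returns true if an edge was removed.
  unfollow(followerId, followeeId) {
    const out = this.following.get(followerId);
    if (!out || !out.delete(followeeId)) return false;
    this.followers.get(followeeId).delete(followerId);
    return true;
  }

  getFollowing(userId) {
    return [...(this.following.get(userId) || [])];
  }

  getFollowers(userId) {
    return [...(this.followers.get(userId) || [])];
  }

  // Reading the set size avoids scanning edges, like the counter cache.
  followerCount(userId) {
    return (this.followers.get(userId) || new Set()).size;
  }

  _setFor(map, key) {
    if (!map.has(key)) map.set(key, new Set());
    return map.get(key);
  }
}
```

Every follow writes to both maps, just as every inserted row updates both the primary key and the secondary index.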
4.5 Explore and Search
The Explore page shows trending and personalized content from users you do not follow. It relies on:
- Engagement signals: Posts with high like velocity, comment count, and share rate.
- Content-based filtering: Analyze hashtags, captions, and image features to match user interests.
- Collaborative filtering: "Users similar to you liked these posts."
- Search: Elasticsearch indexes usernames, hashtags, captions, and location names for full-text search.
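The engagement signals above could be folded into a single trending score. The sketch below is one hypothetical formula, not Instagram's: the weights and the time-decay exponent are illustrative assumptions chosen so that fresh posts with fast engagement rise to the top.

```javascript
const MS_PER_HOUR = 3_600_000;

// Hypothetical trending score: weighted engagement divided by an age penalty.
function trendingScore(post, nowMs, weights = { like: 1, comment: 2, share: 3 }) {
  const ageHours = Math.max((nowMs - post.createdAtMs) / MS_PER_HOUR, 0);
  const engagement =
    post.likes * weights.like +
    post.comments * weights.comment +
    post.shares * weights.share;
  // The +2 offset keeps brand-new posts from dividing by a tiny number;
  // the 1.5 exponent makes older posts fade faster than linearly.
  return engagement / Math.pow(ageHours + 2, 1.5);
}

// Return the k highest-scoring posts without mutating the input.
function topTrending(posts, nowMs, k) {
  return [...posts]
    .sort((a, b) => trendingScore(b, nowMs) - trendingScore(a, nowMs))
    .slice(0, k);
}
```

With equal engagement, a two-hour-old post outranks a ten-hour-old one, which approximates "like velocity" without storing per-minute counters.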
5. Database Schema
CREATE TABLE users (
  id BIGINT PRIMARY KEY,
  username VARCHAR(30) UNIQUE NOT NULL,
  email VARCHAR(255) UNIQUE NOT NULL,
  display_name VARCHAR(100),
  bio TEXT,
  profile_photo_url TEXT,
  follower_count INT DEFAULT 0,
  following_count INT DEFAULT 0,
  post_count INT DEFAULT 0,
  is_verified BOOLEAN DEFAULT FALSE,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE posts (
  id BIGINT PRIMARY KEY,
  user_id BIGINT NOT NULL REFERENCES users(id),
  caption TEXT,
  location_name VARCHAR(255),
  location_lat DECIMAL(9,6),
  location_lng DECIMAL(9,6),
  -- CHECK constraints stand in for inline ENUM, which PostgreSQL does not support
  media_type VARCHAR(10) CHECK (media_type IN ('photo', 'video', 'carousel')),
  like_count INT DEFAULT 0,
  comment_count INT DEFAULT 0,
  status VARCHAR(10) DEFAULT 'processing' CHECK (status IN ('processing', 'published', 'deleted')),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_posts_user_id ON posts(user_id, created_at DESC);
CREATE TABLE post_media (
  id BIGINT PRIMARY KEY,
  post_id BIGINT NOT NULL REFERENCES posts(id),
  media_url_template TEXT NOT NULL,
  width INT,
  height INT,
  duration_seconds INT,
  sort_order INT DEFAULT 0
);
CREATE TABLE likes (
  user_id BIGINT NOT NULL,
  post_id BIGINT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (user_id, post_id)
);
CREATE INDEX idx_likes_post ON likes(post_id, created_at DESC);
CREATE TABLE comments (
  id BIGINT PRIMARY KEY,
  post_id BIGINT NOT NULL,
  user_id BIGINT NOT NULL,
  parent_comment_id BIGINT,
  content TEXT NOT NULL,
  like_count INT DEFAULT 0,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_comments_post ON comments(post_id, created_at);
6. Key Trade-offs
| Decision | Option A | Option B | Instagram's Choice |
|---|---|---|---|
| Feed generation | Fan-out on write | Fan-out on read | Hybrid (push for normal, pull for celebrities) |
| Feed ranking | Chronological | ML-based relevance | ML-ranked with chronological option |
| Photo storage | Own infrastructure | Cloud object storage | S3 (originally) → custom (at scale) |
| Database | PostgreSQL | Cassandra | PostgreSQL (sharded) + Cassandra for feeds |
| Like counts | Real-time COUNT query | Denormalized counter | Denormalized with async counter update |
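The "denormalized with async counter update" choice in the last row can be illustrated with a small buffer that batches increments before writing them back. This is a sketch under assumptions: `persisted` stands in for the `like_count` column, and the class name and methods are hypothetical.

```javascript
class LikeCounter {
  constructor() {
    this.persisted = new Map(); // stand-in for posts.like_count in the database
    this.pending = new Map();   // buffered deltas not yet written back
  }

  like(postId) {
    this._bump(postId, 1);
  }

  unlike(postId) {
    this._bump(postId, -1);
  }

  // Apply all buffered deltas; returns the number of row updates issued.
  flush() {
    let writes = 0;
    for (const [postId, delta] of this.pending) {
      if (delta === 0) continue; // likes and unlikes cancelled out
      this.persisted.set(postId, (this.persisted.get(postId) || 0) + delta);
      writes += 1;
    }
    this.pending.clear();
    return writes;
  }

  // Read path: stored value plus anything still buffered.
  count(postId) {
    return (this.persisted.get(postId) || 0) + (this.pending.get(postId) || 0);
  }

  _bump(postId, delta) {
    this.pending.set(postId, (this.pending.get(postId) || 0) + delta);
  }
}
```

A thousand likes on a hot post collapse into a single row update per flush interval, at the cost of the stored counter lagging by up to one interval.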
7. Scaling Considerations
7.1 CDN for Media Delivery
A CDN is non-negotiable. Photos are cached at CDN edge locations worldwide. Because photos are immutable, Cache-Control headers with long TTLs keep origin fetches to a minimum. At the scale estimated in section 2 (5 billion feed opens per day, each loading several images), the CDN handles more than ten billion image requests per day.
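As a concrete illustration of the long-TTL policy, the origin could attach headers like the ones below to every photo response. The one-year `max-age` is an assumed value; `immutable` is a standard Cache-Control directive that tells caches not to revalidate the object while it is fresh.

```javascript
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // 31,536,000

// Headers for an immutable photo variant served through the CDN.
function photoCacheHeaders() {
  return {
    "Cache-Control": `public, max-age=${ONE_YEAR_SECONDS}, immutable`,
  };
}

console.log(photoCacheHeaders()["Cache-Control"]);
// prints "public, max-age=31536000, immutable"
```

This works only because a photo URL never changes its content; an edited photo would need to be published under a new URL.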
7.2 Database Sharding Strategy
Shard the users and posts tables by user_id using consistent hashing. This co-locates a user's posts with their profile for efficient profile page loads. Cross-shard queries (like feed generation) are handled by the feed service aggregating from the pre-computed Redis cache. See sharding patterns for details.
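A minimal consistent-hashing sketch for picking a shard from a user_id follows. The FNV-1a hash, the 100 virtual nodes per shard, and the shard names are assumptions chosen to keep the example dependency-free; a production ring would typically use a stronger hash.

```javascript
// 32-bit FNV-1a over UTF-16 code units (adequate for ASCII keys in a demo).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

class HashRing {
  constructor(shards, vnodes = 100) {
    this.vnodes = vnodes;
    this.ring = []; // sorted array of { point, shard }
    for (const shard of shards) this.addShard(shard);
  }

  // Place `vnodes` points for the shard on the ring.
  addShard(shard) {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ point: fnv1a(`${shard}#${i}`), shard });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  // Walk clockwise: the first ring point at or after the key's hash owns it.
  shardFor(userId) {
    const h = fnv1a(String(userId));
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].shard; // wrap past the last point
  }
}

const ring = new HashRing(["shard-1", "shard-2", "shard-3"]);
console.log(ring.shardFor(123456789));
```

The property that matters for resharding is that adding a shard moves keys only onto the new shard; keys never shuffle between existing shards.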
7.3 Caching Layers
Multiple caching layers are essential:
- Feed cache (Redis): Pre-computed list of post IDs per user.
- Post cache (Memcached): Full post objects for hot posts.
- User profile cache: Follower counts, profile data.
- Session cache: Authentication tokens.
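The read path through these caches usually follows the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache. The sketch below uses a tiny in-memory TTL cache as a stand-in for Redis or Memcached; `TtlCache`, `getPostCached`, and the injected clock are hypothetical names for illustration.

```javascript
class TtlCache {
  // `now` is injectable so expiry can be tested without real waiting.
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Cache-aside read: try the cache, then the database, then fill the cache.
function getPostCached(postId, cache, loadFromDb) {
  const cached = cache.get(postId);
  if (cached !== undefined) return cached;
  const post = loadFromDb(postId);
  if (post !== undefined) cache.set(postId, post);
  return post;
}
```

For hot posts nearly every read is served from memory, and the TTL bounds how stale a cached like or comment count can become.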
7.4 Handling the Celebrity Problem
A user with 100M followers would require 100M Redis writes on each post (fan-out on write). Instead, mark users with more than a threshold (e.g., 10K followers) as celebrities. Their posts are not fanned out; instead, they are pulled in real-time when followers request their feed.
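On the write side, the threshold check reduces to a small routing decision. The sketch below assumes the 10K example threshold from the text; `planFanOut` and its return shape are hypothetical.

```javascript
const CELEBRITY_THRESHOLD = 10_000; // example threshold from the text

// Decide how a new post reaches followers' feeds.
function planFanOut(authorFollowerCount, followerIds, threshold = CELEBRITY_THRESHOLD) {
  if (authorFollowerCount > threshold) {
    // Celebrity: skip the push; followers pull the post at read time.
    return { mode: "pull", feedKeys: [] };
  }
  // Normal user: push the post ID into each follower's feed cache key.
  return { mode: "push", feedKeys: followerIds.map((id) => `feed:${id}`) };
}
```

A post from an account with 100M followers therefore costs zero feed-cache writes at publish time instead of 100M.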
8. Frequently Asked Questions
Q1: How does Instagram handle the celebrity problem in feed generation?
Instagram uses a hybrid fan-out approach. For regular users with fewer than ~10K followers, posts are fanned out on write (pushed to each follower's feed cache). For celebrities with millions of followers, posts are fetched on read (pulled when a follower opens their feed). The feed service merges both sources, ranks them, and returns the combined feed.
Q2: How do you store and serve billions of photos efficiently?
Photos are stored in object storage (like S3) in multiple resolutions. A CDN serves photos from edge locations closest to users. Photos are immutable (never modified), which makes CDN caching highly effective with long TTLs. The client requests the appropriate resolution based on screen size and network speed, reducing bandwidth usage.
Q3: How would you handle real-time notifications for likes and comments?
When a user likes or comments on a post, an event is published to a message queue. The Notification Service consumes these events and sends push notifications to the post owner. To prevent notification spam, aggregate notifications (e.g., "User A and 15 others liked your post"). Use a brief delay (30-60 seconds) before sending to allow aggregation.
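The aggregation step might be sketched as follows, assuming the service has already collected the like events that arrived during one delay window. The event shape `{ postId, actorName }` and the function name are hypothetical.

```javascript
// events: like events collected during one aggregation window.
function aggregateLikes(events) {
  const actorsByPost = new Map();
  for (const { postId, actorName } of events) {
    if (!actorsByPost.has(postId)) actorsByPost.set(postId, []);
    const actors = actorsByPost.get(postId);
    if (!actors.includes(actorName)) actors.push(actorName); // ignore duplicates
  }

  const notifications = [];
  for (const [postId, actors] of actorsByPost) {
    const others = actors.length - 1;
    let message;
    if (others === 0) {
      message = `${actors[0]} liked your post`;
    } else {
      const noun = others === 1 ? "other" : "others";
      message = `${actors[0]} and ${others} ${noun} liked your post`;
    }
    notifications.push({ postId, message });
  }
  return notifications;
}
```

Sixteen likes on one post inside the window produce a single push notification rather than sixteen.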
Q4: How is the Explore page generated?
The Explore page uses a multi-stage pipeline: (1) A candidate generation phase selects thousands of potentially interesting posts using collaborative filtering and content signals. (2) A ranking model scores each candidate based on predicted engagement. (3) A diversity filter ensures variety in topics and content types. (4) Results are cached per-user with a TTL of a few minutes for freshness.
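Stages 2 and 3 can be sketched together: score each candidate, then walk the ranked list while capping how many posts any one topic contributes. The scoring function is passed in, and the `topic` field and the `maxPerTopic` cap are assumptions made for illustration.

```javascript
// candidates: posts with a `topic` field; scoreFn: predicted-engagement score.
function rankWithDiversity(candidates, scoreFn, { limit = 10, maxPerTopic = 2 } = {}) {
  // Stage 2: rank by score, highest first.
  const ranked = candidates
    .map((post) => ({ post, score: scoreFn(post) }))
    .sort((a, b) => b.score - a.score);

  // Stage 3: diversity filter, skipping posts from over-represented topics.
  const perTopic = new Map();
  const result = [];
  for (const { post } of ranked) {
    const used = perTopic.get(post.topic) || 0;
    if (used >= maxPerTopic) continue;
    perTopic.set(post.topic, used + 1);
    result.push(post);
    if (result.length === limit) break;
  }
  return result;
}
```

Without the cap, a single viral topic could fill the whole page; with it, lower-scored posts from other topics take the remaining slots.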