Design a News Feed System

A news feed is the core feature of most social media platforms — Facebook, Instagram, Twitter, and LinkedIn all depend on a well-designed feed. The news feed system must aggregate content from followed users, rank it by relevance, handle real-time updates, and serve it at low latency to hundreds of millions of users. This guide covers feed generation strategies (push vs pull vs hybrid), ranking algorithms, and scalability considerations.

1. Requirements

Functional Requirements

Display a personalized feed of posts from followed users and pages.
Support multiple content types: text, photos, videos, links, shared posts.
Rank content by relevance (not just chronological order).
Pagination: load more posts as the user scrolls (infinite scroll).
Real-time updates: new posts from followed users appear without page refresh.
Support reactions (like, love, etc.), comments, and shares.

Non-Functional Requirements

Low latency: Feed loads in under 500ms.
High availability: 99.99% uptime.
Scalability: Serve 1B+ DAU with personalized feeds.
Eventual consistency: Acceptable delay of a few seconds for new posts to appear.
Read-heavy: feed reads outnumber post writes by roughly 20 to 1 (10B reads vs 500M posts per day).

2. Capacity Estimation

Metric	Estimate
Daily Active Users	1 billion
Feed views per user per day	10
Total feed reads per day	10 billion
Feed read QPS	10B / 86,400 ≈ 116,000/sec
New posts per day	500 million
Post write QPS	500M / 86,400 ≈ 5,800/sec
Average friends/following per user	300
Average post metadata size	~1 KB (post_id + user_id + timestamp + type)
Feed cache per user (500 post IDs)	500 × 8 bytes = 4 KB
Total feed cache (all users)	1B × 4 KB = 4 TB
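
The arithmetic behind the table can be reproduced with a few lines of JavaScript. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the cache figure counts raw 8-byte post IDs, so it should be read as a lower bound (Redis sorted sets add per-entry overhead on top).

```javascript
// Back-of-the-envelope capacity math using the estimates from the table above.
const SECONDS_PER_DAY = 86_400;

const dailyActiveUsers = 1e9;
const feedViewsPerUserPerDay = 10;
const newPostsPerDay = 500e6;
const cachedPostIdsPerUser = 500;
const bytesPerPostId = 8; // 64-bit post ID, raw size only

const feedReadQps = (dailyActiveUsers * feedViewsPerUserPerDay) / SECONDS_PER_DAY;
const postWriteQps = newPostsPerDay / SECONDS_PER_DAY;
const feedCacheBytesPerUser = cachedPostIdsPerUser * bytesPerPostId;
const totalFeedCacheBytes = dailyActiveUsers * feedCacheBytesPerUser;

console.log(Math.round(feedReadQps));    // 115741
console.log(Math.round(postWriteQps));   // 5787
console.log(feedCacheBytesPerUser);      // 4000 (about 4 KB)
console.log(totalFeedCacheBytes / 1e12); // 4 (TB, decimal)
```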

3. High-Level Design

Component	Responsibility
Post Service	Create and store posts
Fan-out Service	Distribute posts to followers' feed caches
Feed Service	Assemble and serve the personalized feed
Ranking Service	Score and order feed items by relevance
Social Graph Service	Manage friend/follow relationships
Feed Cache (Redis)	Pre-computed feed per user (sorted set of post IDs)
Post Cache	Caches full post objects for hydration
Message Queue (Kafka)	Decouples post creation from fan-out
Notification Service	Real-time feed update notifications

4. Detailed Component Design

4.1 Feed Generation: Push vs Pull vs Hybrid

The most important design decision is how the feed is generated. There are three approaches:

Push Model (Fan-out on Write)

When a user creates a post, immediately push the post_id into every follower's feed cache.

async function fanOutOnWrite(post) {
    const followers = await socialGraph.getFollowers(post.userId);

    const pipeline = redis.pipeline();
    for (const followerId of followers) {
        pipeline.zadd(`feed:${followerId}`, post.createdAt, post.id);
        pipeline.zremrangebyrank(`feed:${followerId}`, 0, -501);
    }
    await pipeline.exec();
}

// Reading the feed is instant
async function getFeed(userId, offset, limit) {
    const postIds = await redis.zrevrange(`feed:${userId}`, offset, offset + limit - 1);
    return hydratePosts(postIds);
}

Pros	Cons
Feed reads are fast: one sorted-set range query, regardless of how many accounts the user follows	Celebrity problem: millions of writes per celebrity post
Simple read path	Wastes resources for inactive users who never read their feed
Real-time: post appears in feed instantly	High memory usage for pre-computed feeds

Pull Model (Fan-out on Read)

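Nothing is precomputed at write time. When a user opens their feed, fetch recent posts from every followed account and merge them on the fly.

async function getFeedOnRead(userId, since) {
    const following = await socialGraph.getFollowing(userId);

    // Fetch recent posts from each followed user (one query per followee)
    const postLists = await Promise.all(
        following.map(uid => postService.getPostsSince(uid, since, { limit: 10 }))
    );

    // Merge the per-user lists (each already sorted newest-first) into one list
    const merged = mergeKSorted(postLists);
    return merged.slice(0, 20);
}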

Pros	Cons
No write amplification	Slow reads: O(following) DB queries per feed load
No wasted work for inactive users	High read latency at scale
No celebrity problem	Expensive merge operation
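
The pull-model code calls mergeKSorted without defining it. One possible shape is sketched below, assuming each input list is already sorted newest-first and that posts carry a numeric createdAt (both are assumptions, as is the optional limit parameter). It uses a plain linear scan over the list heads for readability; a heap-based merge would be the usual choice when the number of lists is large.

```javascript
// Merge k lists that are each already sorted newest-first by createdAt.
// Hypothetical post shape: { id, createdAt } with createdAt in epoch ms.
function mergeKSorted(postLists, limit = Infinity) {
    const cursors = postLists.map(() => 0); // next unread index per list
    const merged = [];

    while (merged.length < limit) {
        let best = -1;
        for (let i = 0; i < postLists.length; i++) {
            const candidate = postLists[i][cursors[i]];
            if (candidate === undefined) continue; // list i is exhausted
            if (best === -1 ||
                candidate.createdAt > postLists[best][cursors[best]].createdAt) {
                best = i;
            }
        }
        if (best === -1) break; // every list is exhausted
        merged.push(postLists[best][cursors[best]]);
        cursors[best] += 1;
    }
    return merged;
}

const sampleLists = [
    [{ id: 'a2', createdAt: 50 }, { id: 'a1', createdAt: 10 }],
    [{ id: 'b1', createdAt: 30 }],
    [],
];
console.log(mergeKSorted(sampleLists, 2).map(p => p.id)); // [ 'a2', 'b1' ]
```

Passing a limit lets the merge stop as soon as one page is filled instead of walking every list to the end.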

Hybrid Model (Recommended)

The hybrid approach combines both: fan-out on write for normal users, fan-out on read for high-follower-count users (celebrities/influencers).

const CELEBRITY_THRESHOLD = 10000;

async function onNewPost(post) {
    const followerCount = await socialGraph.getFollowerCount(post.userId);

    if (followerCount < CELEBRITY_THRESHOLD) {
        // Fan-out on write for normal users
        await fanOutToFollowers(post);
    } else {
        // Do NOT fan-out; will be pulled on read
        await markAsCelebrity(post.userId);
    }
}

async function getHybridFeed(userId, cursor) {
    // 1. Get pre-computed feed (from pushed posts).
    //    Scores are creation timestamps, so fetch entries strictly older than
    //    the cursor, or the newest entries when no cursor is supplied.
    const maxScore = cursor ? `(${cursor}` : '+inf';
    const pushedPostIds = await redis.zrevrangebyscore(
        `feed:${userId}`, maxScore, '-inf', 'LIMIT', 0, 20
    );

    // 2. Pull recent posts from followed celebrities
    const celebrities = await socialGraph.getFollowedCelebrities(userId);
    const pulledPosts = await fetchRecentFromCelebrities(celebrities, cursor);

    // 3. Merge, rank, and return
    const allPosts = mergePosts(pushedPostIds, pulledPosts);
    const ranked = await rankingService.rank(allPosts, userId);
    return ranked.slice(0, 20);
}

4.2 Feed Ranking Algorithm

Modern feeds use ML models to rank content. The goal is to maximize engagement while maintaining a good user experience.

function computeRankScore(post, user) {
    // Feature extraction
    const features = {
        // Post features
        postAge: (Date.now() - post.createdAt) / 3600000,  // hours
        postType: post.mediaType,
        currentLikes: post.likeCount,
        currentComments: post.commentCount,

        // Author features
        authorRelationship: socialGraph.closeness(user.id, post.userId),
        authorInteractionHistory: getInteractionCount(user.id, post.userId, { days: 30 }),  // last 30 days

        // User context
        userActiveHours: isUserPeakHour(user, Date.now()),
        userPreferredContentType: user.preferences.topMediaType
    };

    // Predict engagement probability
    const pLike = likeModel.predict(features);
    const pComment = commentModel.predict(features);
    const pShare = shareModel.predict(features);
    const pClick = clickModel.predict(features);

    // Weighted score
    const score = 1.0 * pLike + 2.0 * pComment + 3.0 * pShare + 0.5 * pClick;

    // Time decay
    const decayFactor = Math.exp(-0.1 * features.postAge);

    return score * decayFactor;
}
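
To get a feel for the time-decay term, the short sketch below evaluates exp(-0.1 × age) at a few post ages and derives the implied half-life. The constant 0.1 per hour comes from the function above; the names DECAY_RATE, decayFactor and halfLifeHours are illustrative.

```javascript
const DECAY_RATE = 0.1; // per hour, the constant used in computeRankScore

function decayFactor(ageHours) {
    return Math.exp(-DECAY_RATE * ageHours);
}

// Age at which a post's score is halved: ln(2) / rate.
const halfLifeHours = Math.LN2 / DECAY_RATE;

for (const age of [0, 1, 6, 24]) {
    console.log(`${age}h -> ${decayFactor(age).toFixed(3)}`);
}
// 0h -> 1.000
// 1h -> 0.905
// 6h -> 0.549
// 24h -> 0.091
console.log(halfLifeHours.toFixed(2)); // 6.93
```

With this rate a day-old post keeps under a tenth of its engagement score, so it needs roughly eleven times the predicted engagement of a fresh post to outrank it.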

4.3 Pagination Strategy

For infinite scroll feeds, cursor-based pagination is essential (not offset-based):

// Cursor-based pagination
GET /api/v1/feed?cursor=1706140800000&limit=20

// Response
{
    "posts": [...],
    "next_cursor": "1706137200000",
    "has_more": true
}

// The cursor is the timestamp of the last post in the current page
// Next request uses this cursor to fetch older posts
// This is stable even as new posts are inserted at the top

Offset-based pagination (page=2, page=3) breaks when new posts are inserted: items shift and the user sees duplicates or misses posts.
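
The difference is easy to reproduce with an in-memory feed. The sketch below is illustrative only: pageByCursor and pageByOffset are hypothetical helpers, and the timestamps are assumed to be unique. If two posts can share a timestamp, a tie-breaker such as the post ID would need to be part of the cursor.

```javascript
// In-memory stand-in for a user's feed, sorted newest-first by timestamp.
// Hypothetical post shape: { id, ts } with unique ts values.
function pageByCursor(feed, cursor, limit) {
    const older = feed.filter(p => cursor === null || p.ts < cursor);
    const posts = older.slice(0, limit);
    return {
        posts,
        nextCursor: posts.length > 0 ? posts[posts.length - 1].ts : cursor,
        hasMore: older.length > limit,
    };
}

function pageByOffset(feed, offset, limit) {
    return feed.slice(offset, offset + limit);
}

let feed = [5, 4, 3, 2, 1].map(ts => ({ id: `p${ts}`, ts }));
const firstPage = pageByCursor(feed, null, 2); // p5, p4

// A new post arrives at the top while the user is reading page one.
feed = [{ id: 'p6', ts: 6 }, ...feed];

const secondPage = pageByCursor(feed, firstPage.nextCursor, 2);
const secondPageByOffset = pageByOffset(feed, 2, 2);

console.log(secondPage.posts.map(p => p.id));   // [ 'p3', 'p2' ]
console.log(secondPageByOffset.map(p => p.id)); // [ 'p4', 'p3' ] (p4 is shown twice)
```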

4.4 Real-Time Feed Updates

When a user is actively viewing their feed, new posts should appear without a full page refresh:

The client maintains a WebSocket or Server-Sent Events (SSE) connection.
When a new post is fanned out to the user's feed, the server pushes a notification via the connection.
The client shows a "New posts available" banner rather than automatically inserting (to avoid disruptive scroll jumps).
User clicks the banner to load and merge new posts at the top.
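
The client-side half of this flow can be sketched as a small piece of view state. The transport (WebSocket or SSE) is left out; createFeedView and its methods are hypothetical names, and onNotification stands for whatever callback the connection handler would invoke when the server pushes a post ID.

```javascript
// Client-side view state for the "New posts available" banner.
function createFeedView(initialPostIds) {
    let visible = [...initialPostIds]; // newest first
    let pending = [];                  // announced by the server, not yet shown

    return {
        onNotification(postId) {
            // Ignore duplicate deliveries of the same post ID.
            if (!visible.includes(postId) && !pending.includes(postId)) {
                pending.unshift(postId); // keep newest first
            }
        },
        bannerText() {
            return pending.length > 0 ? `${pending.length} new posts available` : null;
        },
        onBannerClick() {
            // Merge pending posts at the top only when the user asks for them.
            visible = [...pending, ...visible];
            pending = [];
        },
        visiblePostIds() {
            return [...visible];
        },
    };
}

const view = createFeedView(['p2', 'p1']);
view.onNotification('p3');
view.onNotification('p4');
view.onNotification('p3'); // duplicate, ignored
console.log(view.bannerText());     // 2 new posts available
console.log(view.visiblePostIds()); // [ 'p2', 'p1' ] (nothing moved yet)
view.onBannerClick();
console.log(view.visiblePostIds()); // [ 'p4', 'p3', 'p2', 'p1' ]
```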

5. Database Schema

CREATE TABLE posts (
    id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    content TEXT,
    media_type ENUM('text','photo','video','link','shared'),
    media_url TEXT,
    shared_post_id BIGINT,
    like_count INT DEFAULT 0,
    comment_count INT DEFAULT 0,
    share_count INT DEFAULT 0,
    visibility ENUM('public','friends','private') DEFAULT 'public',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_posts_user ON posts(user_id, created_at DESC);

CREATE TABLE follows (
    follower_id BIGINT NOT NULL,
    followee_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)
);

CREATE INDEX idx_follows_followee ON follows(followee_id);

CREATE TABLE feed_items (
    user_id BIGINT NOT NULL,
    post_id BIGINT NOT NULL,
    score DOUBLE,
    created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (user_id, created_at, post_id)
);

CREATE TABLE interactions (
    user_id BIGINT NOT NULL,
    post_id BIGINT NOT NULL,
    type ENUM('like','comment','share','click','hide'),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, post_id, type)
);

6. Key Trade-offs

Decision	Trade-off
Push vs pull vs hybrid	Hybrid is optimal for social networks with high follower variance. Pure push wastes writes for celebrities and inactive users. Pure pull is too slow. The celebrity threshold can be tuned per platform.
Chronological vs ranked feed	Chronological is simple, transparent, and low-latency. Ranked feeds can increase engagement (figures of 30-40% are sometimes quoted, though results vary by platform) but require ML infrastructure and training data, and they risk filter bubble effects. Most platforms default to ranked.
Feed cache TTL	Long TTL (hours) reduces computation but shows stale content. Short TTL (minutes) keeps feed fresh but increases cache rebuilds. Common: cache for 30 min with real-time push for new items overlaid.
Pre-rank vs lazy rank	Pre-ranking during fan-out adds latency to writes but makes reads instant. Ranking at read time gives more up-to-date scores but adds read latency. Facebook ranks at read time for freshness.

7. Scaling Considerations

7.1 Feed Cache

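The feed cache needs roughly 4 TB of Redis for 1B users, counting raw post IDs only. Shard it with Redis Cluster, which maps keys to 16,384 hash slots, or with client-side consistent hashing over standalone instances. Evict the feeds of inactive users (not opened in 30 days) and rebuild them on demand. Keep feeds compact: store only post_ids (8 bytes each), not full post data.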

7.2 Fan-out Service

The fan-out service processes 5,800 posts/sec × 300 followers/post = 1.74M Redis writes/sec. Shard fan-out workers by user_id range. Use Kafka to buffer and provide backpressure during traffic spikes.
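
One way to spread that write load across workers is to split each post's follower list into fixed-size batches, so that every batch can become one queue message and one Redis pipeline. The sketch below shows only the batching step; toFanOutBatches and the batch size of 1,000 are illustrative assumptions, not values from the design.

```javascript
// Split a follower list into fixed-size batches for fan-out workers.
const BATCH_SIZE = 1000; // illustrative value; tune per deployment

function toFanOutBatches(postId, followerIds, batchSize = BATCH_SIZE) {
    const batches = [];
    for (let i = 0; i < followerIds.length; i += batchSize) {
        batches.push({ postId, followerIds: followerIds.slice(i, i + batchSize) });
    }
    return batches;
}

const followerIds = Array.from({ length: 2500 }, (_, i) => i + 1);
const batches = toFanOutBatches('post-42', followerIds);
console.log(batches.map(b => b.followerIds.length)); // [ 1000, 1000, 500 ]

// Aggregate write rate from the estimate above: posts/sec × followers/post.
console.log(5800 * 300); // 1740000
```

Smaller batches give the queue finer-grained backpressure and retries, at the cost of more messages per post.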

7.3 Post Hydration

Feeds store only post IDs. "Hydration" means fetching full post objects. Use a multi-layer cache: L1 (local in-memory, 1 min TTL), L2 (Redis cluster, 1 hour TTL), L3 (database). Hot posts are cached at L1, reducing Redis load.
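
The lookup order can be sketched with in-memory stand-ins for all three layers. In the design above L2 would be Redis and L3 the database; here TtlCache, createHydrator and the injected clock are hypothetical helpers used so the example runs on its own.

```javascript
// Minimal TTL cache; the clock is injectable so expiry can be demonstrated.
class TtlCache {
    constructor(ttlMs, now = Date.now) {
        this.ttlMs = ttlMs;
        this.now = now;
        this.entries = new Map();
    }
    get(key) {
        const entry = this.entries.get(key);
        if (!entry) return undefined;
        if (entry.expiresAt <= this.now()) {
            this.entries.delete(key); // expired
            return undefined;
        }
        return entry.value;
    }
    set(key, value) {
        this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    }
}

// Read through L1, then L2, then the database, filling caches on the way back.
function createHydrator(l1, l2, loadFromDb) {
    const hits = { l1: 0, l2: 0, db: 0 };
    function hydrate(postId) {
        let post = l1.get(postId);
        if (post !== undefined) { hits.l1++; return post; }
        post = l2.get(postId);
        if (post !== undefined) { hits.l2++; l1.set(postId, post); return post; }
        post = loadFromDb(postId);
        hits.db++;
        if (post !== undefined) { l2.set(postId, post); l1.set(postId, post); }
        return post;
    }
    return { hydrate, hits };
}

let clock = 0;
const fakeNow = () => clock;
const fakeDb = new Map([['p1', { id: 'p1', content: 'hello' }]]);
const hydrator = createHydrator(
    new TtlCache(60_000, fakeNow),    // L1: 1 minute TTL
    new TtlCache(3_600_000, fakeNow), // L2: 1 hour TTL
    id => fakeDb.get(id),
);

hydrator.hydrate('p1'); // misses both caches, loads from the database
hydrator.hydrate('p1'); // L1 hit
clock += 61_000;        // L1 entry expires, L2 entry is still valid
hydrator.hydrate('p1'); // L2 hit, L1 refilled
console.log(hydrator.hits); // { l1: 1, l2: 1, db: 1 }
```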

7.4 Database Sharding

Shard posts by user_id for efficient user timeline queries. Shard follows by follower_id for efficient "who do I follow?" queries. Use load balancing across feed service instances.

8. Frequently Asked Questions

Q1: How does the hybrid fan-out approach handle the celebrity problem?

Users whose follower count exceeds a threshold (e.g., 10K) are marked as celebrities. Their posts are NOT fanned out on write. Instead, when a follower loads their feed, the feed service pulls the latest posts from followed celebrities at read time, merges them with the pre-computed (pushed) feed, and ranks the combined result. This limits fan-out writes to normal users while keeping feed reads fast.

Q2: How does feed ranking work in practice?

Feed ranking uses a multi-stage ML pipeline: (1) Candidate generation selects ~1000 posts from the user's feed cache and celebrity pulls. (2) A lightweight model scores each candidate using features like post age, author relationship strength, content type preference, and early engagement signals. (3) A heavier model re-ranks the top 100 for the final feed. (4) Diversity filters ensure variety. The entire pipeline runs in under 200ms.
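
The funnel shape of stages 2 to 4 can be sketched as follows. Everything here is a simplified assumption: rankFeed is a hypothetical function, lightScore and heavyScore stand in for real models, and the diversity rule is reduced to a per-author cap.

```javascript
// Stages 2-4 of the pipeline: cheap score, expensive re-rank, diversity filter.
function rankFeed(candidates, { lightScore, heavyScore, rerankSize, maxPerAuthor, pageSize }) {
    // Stage 2: cheap score on every candidate, keep the best rerankSize.
    const shortlist = candidates
        .map(post => ({ post, score: lightScore(post) }))
        .sort((a, b) => b.score - a.score)
        .slice(0, rerankSize)
        .map(entry => entry.post);

    // Stage 3: expensive score on the shortlist only.
    const reranked = shortlist
        .map(post => ({ post, score: heavyScore(post) }))
        .sort((a, b) => b.score - a.score)
        .map(entry => entry.post);

    // Stage 4: simple diversity rule, cap the number of posts per author.
    const perAuthor = new Map();
    const page = [];
    for (const post of reranked) {
        const seen = perAuthor.get(post.authorId) ?? 0;
        if (seen >= maxPerAuthor) continue;
        perAuthor.set(post.authorId, seen + 1);
        page.push(post);
        if (page.length === pageSize) break;
    }
    return page;
}

const candidates = [
    { id: 'p1', authorId: 'a', likes: 90, affinity: 0.2 },
    { id: 'p2', authorId: 'a', likes: 80, affinity: 0.9 },
    { id: 'p3', authorId: 'a', likes: 70, affinity: 0.8 },
    { id: 'p4', authorId: 'b', likes: 60, affinity: 0.5 },
    { id: 'p5', authorId: 'c', likes: 5, affinity: 1.0 },
];

const rankedPage = rankFeed(candidates, {
    lightScore: post => post.likes,    // stand-in for the lightweight model
    heavyScore: post => post.affinity, // stand-in for the heavier model
    rerankSize: 4,
    maxPerAuthor: 2,
    pageSize: 3,
});
console.log(rankedPage.map(p => p.id)); // [ 'p2', 'p3', 'p4' ]
```

Note that p5 never reaches the heavier model even though it would score highest there: the cheap first pass trades some recall for latency, which is the price of keeping the pipeline within its time budget.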

Q3: How do you handle feed pagination with real-time updates?

Use cursor-based pagination with timestamp cursors. When the user scrolls down, the next page fetches posts older than the cursor. New posts appearing at the top do not affect the cursor position, preventing duplicates or gaps. For new posts at the top, a separate notification mechanism (WebSocket or SSE) alerts the user and allows them to pull the latest content.

Q4: What happens when a user unfollows someone?

When User A unfollows User B: (1) The follows table is updated. (2) User B's posts are NOT immediately removed from User A's feed cache (too expensive to scan). (3) Instead, during feed hydration, a filter removes posts from unfollowed users. (4) Over time, User B's posts naturally age out of the feed cache as new posts push them out.
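
Step 3 amounts to a membership check against the viewer's current follow set. A minimal sketch, assuming hydrated posts carry a userId and the follow set is available as a Set (filterUnfollowed is a hypothetical helper):

```javascript
// Drop hydrated posts whose author the viewer no longer follows.
function filterUnfollowed(posts, followingIds) {
    return posts.filter(post => followingIds.has(post.userId));
}

const cachedFeed = [
    { id: 'p3', userId: 'userB' },
    { id: 'p2', userId: 'userC' },
    { id: 'p1', userId: 'userB' },
];
const followingAfterUnfollow = new Set(['userC']); // User A has unfollowed userB

console.log(filterUnfollowed(cachedFeed, followingAfterUnfollow).map(p => p.id)); // [ 'p2' ]
```

Because filtering happens after the page is read from the cache, a page can come back shorter than requested; a feed service using this approach would likely over-fetch a little to compensate.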
