Design a Chat System

A real-time chat system supports 1:1 conversations, group chats, message ordering, offline delivery, and read receipts. Whether it is Slack, Discord, or a custom enterprise chat, the fundamental challenges are the same: managing persistent connections, ensuring message ordering, handling offline users, and scaling to millions of concurrent connections. This guide covers the complete architecture for building a production-grade chat system.

1. Requirements

Functional Requirements

One-to-one (1:1) real-time messaging.
Group chat supporting up to 500 members.
Message types: text, file attachments, images, reactions, replies (threads).
Read receipts: sent, delivered, read status.
Typing indicators.
Message history with search.
Offline message delivery.
Online/offline presence indicators.
Push notifications for offline users.

Non-Functional Requirements

Low latency: Messages delivered within 100ms for online users.
High availability: 99.99% uptime.
Message ordering: Messages appear in correct order within a conversation.
Durability: No messages lost.
Scalability: Handle 50M+ concurrent connections.

2. Capacity Estimation

Metric	Estimate
Daily Active Users	100 million
Concurrent connections	50 million (50% of DAU)
Messages per day	5 billion
Messages per second (avg)	58,000/sec
Average message size	~200 bytes
Storage per day	5B messages × 200 bytes = 1 TB/day
Chat servers needed	50M / 50K per server = 1,000 servers
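
The estimates in the table above can be re-derived with a few lines of arithmetic. This is only a back-of-the-envelope sketch; the variable names are illustrative and the inputs are the figures from the table.

```javascript
// Inputs taken from the capacity table
const dailyActiveUsers = 100e6;
const concurrentShare = 0.5;          // 50% of DAU online at once
const messagesPerDay = 5e9;
const avgMessageBytes = 200;
const connectionsPerServer = 50000;
const secondsPerDay = 24 * 60 * 60;   // 86,400

// Derived estimates
const concurrentConnections = dailyActiveUsers * concurrentShare;
const avgMessagesPerSecond = messagesPerDay / secondsPerDay;
const storageBytesPerDay = messagesPerDay * avgMessageBytes;
const chatServers = Math.ceil(concurrentConnections / connectionsPerServer);

console.log(concurrentConnections);            // 50000000
console.log(Math.round(avgMessagesPerSecond)); // 57870 (rounded to ~58,000 in the table)
console.log(storageBytesPerDay / 1e12);        // 1 (TB per day, decimal units)
console.log(chatServers);                      // 1000
```

These are averages; peak traffic is typically a multiple of the average, so real provisioning would add headroom.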

3. High-Level Design

Component	Responsibility
Chat Servers	Manage WebSocket connections, send/receive messages
Connection Registry	Maps user_id to their chat server
Message Service	Persists messages and routes between servers
Message Queue (Kafka)	Buffers messages for reliability and ordering
Message Store	Persistent storage for message history
Presence Service	Online/offline/typing status
Group Service	Group membership, metadata
Push Service	APNs/FCM for offline users
File Service	Handles file/image upload and storage
Load Balancer	Routes WebSocket connections to chat servers

4. Detailed Component Design

4.1 WebSocket Connection Management

Each user establishes a persistent WebSocket connection when they open the app. The Connection Registry (Redis) maintains the mapping.

// When user connects
async function onConnect(userId, chatServerId) {
    await redis.hset(`connections`, userId, chatServerId);
    await redis.sadd(`server:${chatServerId}:users`, userId);
    presenceService.setOnline(userId);
}

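// When user disconnects
async function onDisconnect(userId, chatServerId) {
    await redis.srem(`server:${chatServerId}:users`, userId);
    // Only clear the registry entry if it still points at this server: the user
    // may already have reconnected to another server. This check-then-delete is
    // not atomic; a Lua script or transaction would close the remaining race.
    const currentServer = await redis.hget(`connections`, userId);
    if (currentServer === chatServerId) {
        await redis.hdel(`connections`, userId);
        presenceService.setOffline(userId);
    }
}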

// Find which server a user is on
async function findUserServer(userId) {
    return redis.hget(`connections`, userId);
}

4.2 Message Flow (1:1 Chat)

// User A sends message to User B
async function sendMessage(senderId, recipientId, content) {
    const message = {
        id: generateSnowflakeId(),
        conversationId: getConversationId(senderId, recipientId),
        senderId: senderId,
        content: content,
        timestamp: Date.now(),
        status: "sent"
    };

    // 1. Persist message
    await messageStore.save(message);

    // 2. Acknowledge to sender (sent status)
    await pushToUser(senderId, { type: "message_ack", messageId: message.id, status: "sent" });

    // 3. Route to recipient
    const recipientServer = await findUserServer(recipientId);
    if (recipientServer) {
        // Online: push via WebSocket
        await routeToServer(recipientServer, recipientId, message);
    } else {
        // Offline: queue for later + push notification
        await offlineQueue.enqueue(recipientId, message);
        await pushService.sendNotification(recipientId, {
            title: getSenderName(senderId),
            body: truncate(content, 100)
        });
    }
}

// When recipient's server receives the message
async function deliverToUser(userId, message) {
    const delivered = await websocket.send(userId, message);
    if (delivered) {
        // Send delivery receipt back to sender
        await sendReceipt(message.senderId, message.id, "delivered");
    }
}
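
The flow above calls getConversationId(senderId, recipientId) without defining it. One possible sketch, assuming user IDs are strings that never contain a colon, sorts the two IDs so that both directions of a 1:1 chat map to the same conversation. The "dm:" prefix and key format are assumptions made for illustration.

```javascript
// Hypothetical helper: derive one stable conversation key for a pair of users,
// regardless of which of them sends first.
function getConversationId(userA, userB) {
    const [first, second] = [String(userA), String(userB)].sort();
    return `dm:${first}:${second}`;
}

console.log(getConversationId("alice", "bob")); // dm:alice:bob
console.log(getConversationId("bob", "alice")); // dm:alice:bob
```

Because the schema in this guide stores conversation_id as a UUID, a real system might instead look the pair up in the conversations table, or hash this key into a UUID.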

4.3 Group Chat Message Flow

async function sendGroupMessage(senderId, groupId, content) {
    const message = {
        id: generateSnowflakeId(),
        conversationId: groupId,
        senderId: senderId,
        content: content,
        timestamp: Date.now()
    };

    // Persist once
    await messageStore.save(message);

    // Get group members
    const members = await groupService.getMembers(groupId);

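    // Fan-out to each member concurrently; awaiting one member at a time
    // would add a round trip per member to the delivery latency
    const recipients = Array.from(members).filter(memberId => memberId !== senderId);
    await Promise.all(recipients.map(async (memberId) => {
        const memberServer = await findUserServer(memberId);
        if (memberServer) {
            // Online: push via the member's chat server
            await routeToServer(memberServer, memberId, message);
        } else {
            // Offline: queue for later + push notification
            await offlineQueue.enqueue(memberId, message);
            await pushService.sendNotification(memberId, {
                title: getGroupName(groupId),
                body: `${getSenderName(senderId)}: ${truncate(content, 80)}`
            });
        }
    }));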
}

4.4 Message Ordering

Ensuring correct message order within a conversation requires multiple techniques:

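Snowflake IDs: Timestamp-embedded IDs sort roughly by creation time. Ordering across different ID generators is only as accurate as their clocks, so Snowflake IDs alone do not guarantee strict order.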
Kafka partitioning: Partition by conversation_id so all messages in a conversation go to the same partition, preserving order.
Sequence numbers: Each message gets an incrementing sequence number per conversation. Clients detect gaps and request missing messages.
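
To make the first technique concrete, here is a minimal sketch of a Snowflake-style ID generator. It assumes the commonly described layout of a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence; the custom epoch, the function name createSnowflakeGenerator, and the injectable clock are assumptions for illustration, not part of the design above.

```javascript
// Custom epoch (assumption): 2020-01-01T00:00:00Z in milliseconds
const EPOCH_MS = 1577836800000n;

function createSnowflakeGenerator(workerId, now = () => Date.now()) {
    if (!Number.isInteger(workerId) || workerId < 0 || workerId > 1023) {
        throw new RangeError("workerId must fit in 10 bits");
    }
    let lastMs = -1;
    let sequence = 0;

    return function nextId() {
        let ms = now();
        // If the clock moved backwards, keep using the last timestamp
        // so IDs from this generator never decrease
        if (ms < lastMs) ms = lastMs;

        if (ms === lastMs) {
            sequence = (sequence + 1) & 0xfff;
            // 4096 IDs already issued in this millisecond: borrow the next one
            if (sequence === 0) ms = lastMs + 1;
        } else {
            sequence = 0;
        }
        lastMs = ms;

        // Layout: [41-bit timestamp][10-bit worker][12-bit sequence]
        return ((BigInt(ms) - EPOCH_MS) << 22n)
             | (BigInt(workerId) << 12n)
             | BigInt(sequence);
    };
}

// Usage with a frozen clock so the result is reproducible
const nextId = createSnowflakeGenerator(7, () => 1700000000000);
const firstId = nextId();
const secondId = nextId();
console.log(secondId > firstId);                         // true
console.log(Number(firstId >> 22n) + Number(EPOCH_MS));  // 1700000000000
console.log(Number((firstId >> 12n) & 0x3ffn));          // 7
```

IDs from a single generator are strictly increasing. IDs from different workers are ordered only as well as their clocks agree, which is why the per-conversation sequence number is still needed for strict ordering.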

4.5 Read Receipts and Typing Indicators

// Read receipt: user opened conversation and saw messages
async function markAsRead(userId, conversationId, lastReadMessageId) {
    // Update read position
    await redis.hset(`read:${conversationId}`, userId, lastReadMessageId);

    // In 1:1 chat, notify the other person
    const otherUserId = getOtherUser(conversationId, userId);
    await pushToUser(otherUserId, {
        type: "read_receipt",
        conversationId: conversationId,
        readBy: userId,
        upToMessageId: lastReadMessageId
    });
}

// Typing indicator (ephemeral, not persisted)
async function sendTypingIndicator(senderId, conversationId) {
    const recipients = getConversationMembers(conversationId)
                       .filter(id => id !== senderId);

    for (const recipientId of recipients) {
        await pushToUser(recipientId, {
            type: "typing",
            conversationId: conversationId,
            userId: senderId,
            expiresIn: 5000
        });
    }
}

4.6 Offline Message Delivery

When a user comes online, deliver all pending messages:

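// lastSyncTime: time of the client's last successful sync, assumed to be
// supplied by the client when it reconnects
async function onUserReconnect(userId, lastSyncTime) {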
    // Fetch messages from offline queue
    const pendingMessages = await offlineQueue.drain(userId);

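    // Group by conversation and sort by message ID; raw timestamps can tie
    // or suffer clock skew, while Snowflake IDs give a stable total order
    const grouped = groupByConversation(pendingMessages);

    for (const [convId, messages] of grouped) {
        messages.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));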
        for (const msg of messages) {
            await websocket.send(userId, msg);
        }
    }

    // Also sync conversation list updates
    const updatedConversations = await getConversationUpdates(userId, lastSyncTime);
    await websocket.send(userId, { type: "sync", conversations: updatedConversations });
}

5. Database Schema

-- Cassandra for messages (high write throughput, time-range queries)
CREATE TABLE messages (
    conversation_id UUID,
    message_id BIGINT,
    sender_id UUID,
    content TEXT,
    content_type TEXT,
    reply_to_id BIGINT,
    created_at TIMESTAMP,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

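-- PostgreSQL for conversations and metadata
-- PostgreSQL has no inline ENUM column syntax; enum types are created first
CREATE TYPE conversation_type AS ENUM ('direct', 'group');

CREATE TABLE conversations (
    id UUID PRIMARY KEY,
    type conversation_type NOT NULL,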
    name VARCHAR(100),
    created_by UUID,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TYPE member_role AS ENUM ('owner', 'admin', 'member');

CREATE TABLE conversation_members (
    conversation_id UUID NOT NULL,
    user_id UUID NOT NULL,
    role member_role DEFAULT 'member',
    joined_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_read_message_id BIGINT DEFAULT 0,
    muted_until TIMESTAMP,
    PRIMARY KEY (conversation_id, user_id)
);

CREATE INDEX idx_members_user ON conversation_members(user_id);

CREATE TABLE user_conversations (
    user_id UUID NOT NULL,
    conversation_id UUID NOT NULL,
    last_message_at TIMESTAMP,
    unread_count INT DEFAULT 0,
    pinned BOOLEAN DEFAULT FALSE,
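    -- One row per user and conversation; last_message_at changes with every
    -- message, so it is indexed rather than made part of the key
    PRIMARY KEY (user_id, conversation_id)
);

CREATE INDEX idx_user_conversations_recent ON user_conversations(user_id, last_message_at DESC);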

See SQL vs NoSQL for why Cassandra suits messages and PostgreSQL suits metadata.

6. Key Trade-offs

Decision	Trade-off
WebSocket vs long polling	WebSocket provides true real-time bidirectional communication with minimal overhead. Long polling is simpler but has higher latency and more server load. For chat, WebSocket is the clear winner.
Fan-out on write vs read for groups	For groups under 500 members, fan-out on write (deliver to each member immediately) is fast enough. For very large groups (1000+), fan-out on read (members pull from group log) reduces write amplification.
Message ordering: timestamps vs sequence numbers	Timestamps can suffer clock skew across servers. Per-conversation sequence numbers guarantee strict ordering but require coordination. Use both: Snowflake IDs for unique, roughly time-sorted message IDs, and per-conversation sequence numbers for strict ordering and gap detection.
Store all messages vs ephemeral	Storing all messages enables full history search but costs more storage. Ephemeral messages (auto-delete after 24h) save storage but lose history. Most chat apps store permanently with optional auto-delete features.

7. Scaling Considerations

7.1 Connection Scaling

The 50M concurrent WebSocket connections are spread across 1,000 servers, each handling about 50K connections on an event-loop architecture (Node.js, Go, or Erlang/Elixir). Use sticky load balancing by user_id hash to route reconnections to the same server when possible.

7.2 Cross-Server Message Routing

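When User A is on Server 1 and User B is on Server 5, the message must be routed across servers. There are two approaches: (1) direct server-to-server communication via internal RPC, or (2) pub/sub via Redis or Kafka, where each server subscribes to a channel for the users connected to it. The pub/sub approach is usually simpler because servers do not need to know about or connect to each other, and it scales by adding channels or partitions rather than point-to-point links.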
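
The pub/sub approach can be sketched in a single process. In the example below, Node's built-in EventEmitter stands in for Redis or Kafka, a Map stands in for the Connection Registry, and each chat server listens on one channel keyed by its own ID. The names startChatServer, route, and inbox are hypothetical and exist only for this illustration.

```javascript
const { EventEmitter } = require("events");

// In-memory stand-ins (assumptions): a pub/sub bus and the Connection Registry
const bus = new EventEmitter();
const registry = new Map(); // userId -> serverId

function startChatServer(serverId) {
    const inbox = []; // stands in for WebSocket pushes to locally connected users
    bus.on(`server:${serverId}`, ({ userId, message }) => {
        inbox.push({ userId, message });
    });
    return {
        connect(userId) { registry.set(userId, serverId); },
        inbox,
    };
}

// Publish to whichever server currently holds the recipient's connection
function route(recipientId, message) {
    const serverId = registry.get(recipientId);
    if (serverId === undefined) return false; // offline: caller uses the offline queue
    bus.emit(`server:${serverId}`, { userId: recipientId, message });
    return true;
}

const server1 = startChatServer("s1");
const server5 = startChatServer("s5");
server1.connect("userA");
server5.connect("userB");

console.log(route("userB", { text: "hi" })); // true
console.log(server5.inbox.length);           // 1
console.log(server1.inbox.length);           // 0
console.log(route("userC", { text: "?" }));  // false
```

The sender's server never needs an address for Server 5; it only needs the channel name, which is what keeps this approach simple as servers are added or replaced.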

7.3 Message Store Sharding

Shard messages by conversation_id using consistent hashing. This co-locates all messages in a conversation for efficient reads. Cassandra handles this natively with its partition key architecture. User conversation lists are sharded by user_id.
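
For readers unfamiliar with consistent hashing, the following is a minimal sketch of a hash ring with virtual nodes. It is not how Cassandra implements partitioning internally; the class name HashRing, the node names, the choice of MD5, and the count of 100 virtual nodes are all assumptions made for illustration.

```javascript
const crypto = require("crypto");

// First 4 bytes of an MD5 digest, read as an unsigned 32-bit integer
function hash32(key) {
    return crypto.createHash("md5").update(key).digest().readUInt32BE(0);
}

class HashRing {
    constructor(nodes, virtualNodes = 100) {
        this.ring = [];
        for (const node of nodes) {
            // Several points per node smooth out the key distribution
            for (let i = 0; i < virtualNodes; i++) {
                this.ring.push({ point: hash32(`${node}#${i}`), node });
            }
        }
        this.ring.sort((a, b) => a.point - b.point);
    }

    // Owner is the first ring point at or after the key's hash, wrapping around
    nodeFor(key) {
        const h = hash32(key);
        let lo = 0;
        let hi = this.ring.length;
        while (lo < hi) {
            const mid = (lo + hi) >>> 1;
            if (this.ring[mid].point < h) lo = mid + 1;
            else hi = mid;
        }
        return this.ring[lo % this.ring.length].node;
    }
}

const ring = new HashRing(["shard-a", "shard-b", "shard-c"]);
// Every message in a conversation lands on the same shard
console.log(ring.nodeFor("conv-42") === ring.nodeFor("conv-42")); // true
```

The useful property is that adding a shard only moves the keys that the new shard takes over; all other conversations stay where they were.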

7.4 Presence at Scale

Presence must be tracked for all 100M users. Store each user's status in Redis with heartbeat-based expiry (30-second heartbeat, 60-second TTL), so a user who stops sending heartbeats is marked offline automatically when the key expires. Broadcast presence changes only to users with active conversations (lazy fan-out). Use in-memory caching for hot presence data.
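
The heartbeat-with-TTL idea can be sketched without Redis. The example below keeps expiry times in a Map and takes an injectable clock so the behaviour is reproducible; the class name PresenceTracker is hypothetical. In Redis, the equivalent would be a key written with a 60-second expiry on every heartbeat.

```javascript
const HEARTBEAT_TTL_MS = 60000; // 60-second TTL, refreshed by a heartbeat every 30 seconds

class PresenceTracker {
    constructor(now = () => Date.now()) {
        this.now = now;
        this.expiresAt = new Map(); // userId -> expiry time in ms
    }

    // Each heartbeat pushes the expiry one full TTL into the future
    heartbeat(userId) {
        this.expiresAt.set(userId, this.now() + HEARTBEAT_TTL_MS);
    }

    // A user is online only while their latest heartbeat has not expired
    isOnline(userId) {
        const expiry = this.expiresAt.get(userId);
        if (expiry === undefined) return false;
        if (expiry <= this.now()) {
            this.expiresAt.delete(userId);
            return false;
        }
        return true;
    }
}

let fakeNow = 0;
const presence = new PresenceTracker(() => fakeNow);

presence.heartbeat("userA");            // expires at 60000
fakeNow = 30000;
presence.heartbeat("userA");            // expires at 90000
fakeNow = 80000;
console.log(presence.isOnline("userA")); // true (one missed heartbeat is tolerated)
fakeNow = 90000;
console.log(presence.isOnline("userA")); // false
```

Setting the TTL to twice the heartbeat interval means a single lost heartbeat does not flip the user to offline.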


8. Frequently Asked Questions

Q1: How do you ensure message ordering in a distributed system?

Use Kafka with conversation_id as the partition key so all messages in a conversation are processed in order. Assign each message a Snowflake ID (which embeds timestamp) and a per-conversation sequence number. Clients sort by message_id and detect gaps by checking sequence continuity. If a gap is detected, the client requests the missing messages from the server.
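
Client-side gap detection can be as small as the sketch below. It assumes the client remembers the last sequence number it has processed for a conversation; the function name findMissingSequences is hypothetical.

```javascript
// Returns the sequence numbers missing between lastSeenSeq and the
// highest sequence number received so far.
function findMissingSequences(lastSeenSeq, receivedSeqs) {
    const received = new Set(receivedSeqs);
    const highest = Math.max(lastSeenSeq, ...receivedSeqs);
    const missing = [];
    for (let seq = lastSeenSeq + 1; seq <= highest; seq++) {
        if (!received.has(seq)) missing.push(seq);
    }
    return missing;
}

console.log(findMissingSequences(3, [4, 5, 8])); // [ 6, 7 ]
console.log(findMissingSequences(3, [4, 5]));    // []
```

A non-empty result is the list the client would request from the server before rendering the later messages.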

Q2: How do read receipts scale in group chats?

In a group of 200 members, tracking individual read positions creates 200 entries per conversation. This is manageable. However, broadcasting "User X read up to message Y" to all 199 other members is expensive. Optimize by: only sending read receipts when the sender is online, batching multiple read updates, and only showing the aggregate "seen by N people" rather than individual names in the UI.
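
The aggregate "seen by N people" value can be computed from the stored read positions alone. The sketch below assumes the positions have been loaded into a Map of user ID to last read message ID, with IDs already parsed to numbers; the function name seenByCount is hypothetical.

```javascript
// Counts members, other than the sender, whose read position has
// reached or passed the given message.
function seenByCount(readPositions, messageId, senderId) {
    let count = 0;
    for (const [userId, lastReadId] of readPositions) {
        if (userId !== senderId && lastReadId >= messageId) count++;
    }
    return count;
}

const readPositions = new Map([
    ["alice", 105], // the sender
    ["bob", 99],    // has not reached message 105 yet
    ["carol", 110],
    ["dave", 105],
]);

console.log(seenByCount(readPositions, 105, "alice")); // 2
```

Because only one number per message is shown, the server can recompute it on demand instead of broadcasting every individual read event to the whole group.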

Q3: How do you handle message delivery when both users are on different servers?

The Connection Registry (Redis hash map) tells the sender's server which server the recipient is connected to. The sender's server then routes the message to the recipient's server via internal RPC or a pub/sub channel. The recipient's server delivers it via WebSocket. If the recipient's server is unknown (user offline), the message goes to the offline queue and a push notification is sent.

Q4: How do typing indicators work without overwhelming the system?

Typing indicators are ephemeral and low-priority. The client sends a "typing" event at most once every 3 seconds (not on every keystroke). The indicator auto-expires after 5 seconds on the receiver side. Typing events are NOT persisted or queued; they are sent best-effort via WebSocket. If lost, the worst case is the typing indicator does not show, which is acceptable.
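
The client-side rate limit can be sketched as a small throttle around the send call. The function name createTypingThrottle and the injectable clock are assumptions for illustration; the 3-second interval comes from the answer above.

```javascript
const TYPING_MIN_INTERVAL_MS = 3000; // at most one typing event every 3 seconds

function createTypingThrottle(sendTypingEvent, now = () => Date.now()) {
    let lastSentAt = -Infinity;
    // Call this on every keystroke; it forwards only the first event per interval
    return function onKeystroke() {
        const currentTime = now();
        if (currentTime - lastSentAt >= TYPING_MIN_INTERVAL_MS) {
            lastSentAt = currentTime;
            sendTypingEvent();
            return true;
        }
        return false; // dropped: too soon after the previous event
    };
}

let typingClock = 0;
let sentCount = 0;
const onKeystroke = createTypingThrottle(() => { sentCount++; }, () => typingClock);

for (const keystrokeTime of [0, 1000, 2999, 3000, 7000]) {
    typingClock = keystrokeTime;
    onKeystroke();
}
console.log(sentCount); // 3 (events sent at 0, 3000, and 7000 ms)
```

Since the receiver expires the indicator after 5 seconds, a 3-second send interval keeps it visible while the user keeps typing and lets it disappear shortly after they stop.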
