Design WhatsApp: A Real-Time Messaging System

WhatsApp is one of the most widely used messaging platforms, handling over 100 billion messages per day across 2+ billion users. Designing a system like WhatsApp tests your understanding of real-time communication, WebSockets, message storage, delivery guarantees, and encryption. This guide walks through a complete system design for a WhatsApp-like messaging platform.

1. Requirements

Functional Requirements

  • One-to-one (1:1) real-time messaging between users.
  • Group messaging supporting up to 1,024 members per group.
  • Message delivery receipts: sent, delivered, and read indicators.
  • Online/offline status and last-seen timestamps.
  • Media sharing: images, videos, documents, voice notes.
  • End-to-end encryption for all messages.
  • Offline message delivery: messages are delivered when the recipient comes online.
  • Push notifications for offline users.

Non-Functional Requirements

  • Low latency: Messages delivered in under 200ms for online users.
  • High availability: 99.99% uptime. Messaging must not go down.
  • Ordering: Messages within a conversation must be ordered correctly.
  • Durability: No message loss. Every sent message must be delivered.
  • Scalability: Support billions of users and 100B+ messages/day.
  • Security: End-to-end encryption; server cannot read message content.

2. Capacity Estimation

| Metric | Estimate |
| --- | --- |
| Daily Active Users (DAU) | 500 million |
| Messages per user per day | 40 |
| Total messages per day | 20 billion |
| Messages per second (avg) | 20B / 86,400 ≈ 230,000 msg/sec |
| Peak messages per second | ~700,000 msg/sec (3x average) |
| Average message size (text) | ~100 bytes |
| Storage per day (text only) | 20B messages × 100 bytes = 2 TB/day |
| Storage per year (text only) | ~730 TB/year |
| Concurrent WebSocket connections | ~200 million (40% of DAU online at any time) |
| Bandwidth (inbound text) | 230K msg/sec × 100 bytes = 23 MB/sec |

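Connection servers: If each server handles 50,000 concurrent WebSocket connections, we need 200M / 50K = 4,000 connection servers. Note that these figures model 500 million DAU and 20 billion messages/day; reaching the 100B+ messages/day quoted in the requirements scales the message-driven numbers by roughly 5x. Learn more about WebSocket architecture.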
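
The arithmetic behind these estimates can be reproduced with a small calculator. This is a minimal sketch: the function name `estimateCapacity` and its parameter names are made up for illustration, and the inputs are the assumptions from the table above.

```javascript
// Back-of-the-envelope capacity calculator (illustrative names, inputs from the table).
const SECONDS_PER_DAY = 86400;

function estimateCapacity({ dau, messagesPerUser, avgMessageBytes, peakFactor, onlinePercent, connectionsPerServer }) {
    const messagesPerDay = dau * messagesPerUser;
    const avgMessagesPerSec = messagesPerDay / SECONDS_PER_DAY;
    const concurrentConnections = (dau * onlinePercent) / 100;
    return {
        messagesPerDay,
        avgMessagesPerSec,
        peakMessagesPerSec: avgMessagesPerSec * peakFactor,
        storageBytesPerDay: messagesPerDay * avgMessageBytes,
        inboundBytesPerSec: avgMessagesPerSec * avgMessageBytes,
        concurrentConnections,
        connectionServers: Math.ceil(concurrentConnections / connectionsPerServer),
    };
}

const estimate = estimateCapacity({
    dau: 500e6,
    messagesPerUser: 40,
    avgMessageBytes: 100,
    peakFactor: 3,
    onlinePercent: 40,
    connectionsPerServer: 50000,
});

console.log(Math.round(estimate.avgMessagesPerSec)); // 231481, which the table rounds to ~230,000
console.log(estimate.storageBytesPerDay / 1e12);     // 2 (TB/day)
console.log(estimate.connectionServers);             // 4000
```

Changing a single input, such as `connectionsPerServer`, shows how sensitive the server count is to that assumption.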

3. High-Level Design

The architecture follows a gateway-based model where clients maintain persistent connections to chat servers, and messages are routed through a messaging backbone.

| Component | Responsibility |
| --- | --- |
| Chat Servers (WebSocket) | Maintain persistent connections with clients, send/receive messages |
| Session Service | Tracks which chat server each user is connected to |
| Message Service | Routes messages between users, handles persistence |
| Message Queue (Kafka) | Decouples message production from delivery for reliability |
| Message Store | Persistent storage for message history |
| Presence Service | Tracks online/offline/last-seen status |
| Push Notification Service | Sends push notifications via APNs/FCM for offline users |
| Media Service | Handles upload, compression, and storage of media files |
| Group Service | Manages group metadata, membership, and message fan-out |
| Load Balancer | Distributes WebSocket connections across chat servers |

4. Detailed Component Design

4.1 Connection and Message Flow

When a client opens the app, it establishes a WebSocket connection to a chat server. The session service records the mapping: user_id → chat_server_id.

1:1 Message Flow:

  1. User A sends a message through their WebSocket connection to Chat Server 1.
  2. Chat Server 1 publishes the message to Kafka (partitioned by conversation_id).
  3. Message Service consumes from Kafka, persists the message to the database.
  4. Message Service looks up User B's chat server from the Session Service.
  5. If User B is online: Route message to Chat Server 2, which pushes it via WebSocket.
  6. If User B is offline: Store in an offline message queue and trigger a push notification.
  7. Once User B's device acknowledges the message, Chat Server 2 emits a delivery receipt, which is routed back to User A via the Session Service lookup and Chat Server 1.

// Message object structure
{
    "message_id": "uuid-v7-timestamp-based",
    "conversation_id": "conv_12345",
    "sender_id": "user_A",
    "recipient_id": "user_B",
    "content": "encrypted_payload_base64",
    "content_type": "text",
    "timestamp": 1706140800000,
    "status": "sent"
}
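
Steps 4 to 6 of the flow can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: `MessageRouter`, the `deliver` method on a chat server handle, and the `pushNotifier` callback are hypothetical names, and a real deployment would back the two maps with the Session Service and a persistent per-user inbox.

```javascript
// Routing sketch: online recipients get the message forwarded, offline ones get it queued.
class MessageRouter {
    constructor(pushNotifier) {
        this.sessions = new Map();      // user_id -> chat server handle (stand-in for the Session Service)
        this.offlineQueues = new Map(); // user_id -> pending messages, oldest first
        this.pushNotifier = pushNotifier;
    }

    // Called when a user's WebSocket connects; drains anything queued while offline, in order.
    connect(userId, chatServer) {
        this.sessions.set(userId, chatServer);
        const pending = this.offlineQueues.get(userId) || [];
        this.offlineQueues.delete(userId);
        for (const message of pending) {
            chatServer.deliver(userId, message);
        }
    }

    disconnect(userId) {
        this.sessions.delete(userId);
    }

    route(message) {
        const chatServer = this.sessions.get(message.recipient_id);
        if (chatServer) {
            chatServer.deliver(message.recipient_id, message);
            return "forwarded";
        }
        if (!this.offlineQueues.has(message.recipient_id)) {
            this.offlineQueues.set(message.recipient_id, []);
        }
        this.offlineQueues.get(message.recipient_id).push(message);
        this.pushNotifier(message.recipient_id); // e.g. hand off to APNs/FCM
        return "queued_offline";
    }
}
```

The sketch leaves out persistence and acknowledgements; it only shows the branch between the online and offline paths.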

4.2 Message Ordering

Correct message ordering is critical. We use multiple strategies:

  • Kafka partitioning by conversation_id: All messages in a conversation go to the same partition, preserving order within that conversation.
  • Timestamp-based message IDs: Use UUIDv7 or Snowflake IDs that embed timestamps, so messages can be sorted chronologically.
  • Sequence numbers per conversation: Each message gets an incrementing sequence number within its conversation for gap detection.
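// Client-side ordering logic
// Assumes conversation.pending is a Map (sequence_number -> message) that
// buffers messages received ahead of a gap.
function insertMessage(conversation, newMessage) {
    const seqNum = newMessage.sequence_number;
    const expectedSeq = conversation.lastSequenceNumber + 1;

    if (seqNum < expectedSeq || conversation.pending.has(seqNum)) {
        return; // Duplicate: already applied or already buffered
    }
    if (seqNum > expectedSeq) {
        // Gap detected - buffer this message and request the missing range
        conversation.pending.set(seqNum, newMessage);
        requestMissingMessages(conversation.id, expectedSeq, seqNum - 1);
        return;
    }

    // In order: append, then drain buffered messages that are now contiguous
    let next = newMessage;
    while (next) {
        conversation.messages.push(next);
        conversation.lastSequenceNumber = next.sequence_number;
        conversation.pending.delete(next.sequence_number);
        next = conversation.pending.get(next.sequence_number + 1);
    }
}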

4.3 Group Messaging

Group messages require fan-out to all group members. Two strategies:

| Strategy | How It Works | Best For |
| --- | --- | --- |
| Fan-out on Write | When a message is sent, copy it to each member's inbox queue | Small groups (<100 members) |
| Fan-out on Read | Store message once; each member reads from the group's message log | Large groups (100+ members) |

WhatsApp uses fan-out on write for most groups (capped at 1,024 members). The Group Service retrieves the member list and enqueues a delivery task for each member. Read more about fan-out strategies in our message queues guide.
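
Fan-out on write amounts to one delivery task per recipient. A minimal sketch, assuming a hypothetical `enqueueDelivery` callback that stands in for whatever queue the Group Service writes to:

```javascript
// Fan-out on write: copy the message into one delivery task per group member.
function fanOutOnWrite(message, memberIds, enqueueDelivery) {
    let enqueued = 0;
    for (const memberId of memberIds) {
        if (memberId === message.sender_id) continue; // the sender already has the message
        enqueueDelivery({ ...message, recipient_id: memberId });
        enqueued += 1;
    }
    return enqueued;
}
```

For a full 1,024-member group this produces 1,023 tasks per message, which is the write amplification that makes fan-out on read attractive for very large groups.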

4.4 Delivery Receipts

Three levels of message status:

  • Sent (single check): Server acknowledges receipt of the message from the sender.
  • Delivered (double check): Recipient's device acknowledges receipt of the message.
  • Read (blue checks): Recipient's app reports the message was displayed on screen.
// Receipt handling on chat server
// db and sessionService are the chat server's storage and session-lookup clients.
function handleReceipt(receipt) {
    // receipt: { message_id, status: "delivered"|"read", timestamp }
    db.updateMessageStatus(receipt.message_id, receipt.status, receipt.timestamp);

    // Notify the original sender, but only if they are currently connected
    const senderId = db.getMessageSender(receipt.message_id);
    const senderServer = sessionService.getChatServer(senderId);
    if (senderServer) {
        senderServer.pushReceipt(senderId, receipt);
    }
}
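
Receipts can arrive out of order (for example, a "delivered" receipt retried after "read" was already recorded), so the stored status should only ever move forward. A minimal sketch of that rule; `nextStatus` is a hypothetical helper, not part of the handler above:

```javascript
// Status may only advance: sent -> delivered -> read.
const STATUS_ORDER = ["sent", "delivered", "read"];

function nextStatus(currentStatus, incomingStatus) {
    const currentRank = STATUS_ORDER.indexOf(currentStatus);
    const incomingRank = STATUS_ORDER.indexOf(incomingStatus);
    if (currentRank === -1 || incomingRank === -1) {
        throw new Error("unknown status: " + (currentRank === -1 ? currentStatus : incomingStatus));
    }
    // Ignore stale or repeated receipts instead of downgrading the status.
    return incomingRank > currentRank ? incomingStatus : currentStatus;
}
```

A storage layer could apply the same rule before writing, so a late receipt never overwrites a newer state.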

4.5 End-to-End Encryption

WhatsApp uses the Signal Protocol for end-to-end encryption. The key concepts:

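  • Each device generates its key pairs on registration: a long-term identity key plus a set of prekeys.
  • The public halves are uploaded to the server's Key Distribution Service; private keys never leave the device.
  • When User A messages User B, User A fetches User B's public keys, derives a shared secret from them through a key agreement, and encrypts the message locally with keys derived from that secret.
  • The server only sees encrypted ciphertext and cannot decrypt message content.
  • For groups, a shared group key is distributed to all members, encrypted individually for each member. (The Signal Protocol's group scheme, Sender Keys, refines this: each sender distributes their own sender key to the other members over the pairwise encrypted sessions.)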

The server's role is purely relay; it stores and forwards encrypted blobs without any ability to read content.
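
The relay property can be illustrated with Node's built-in `crypto` module. This is deliberately not the Signal Protocol: there are no prekeys, no ratchet, and no forward secrecy. It only shows two parties deriving the same key from each other's public keys, so that anything the server relays is opaque ciphertext. The helper names, the fixed salt, and the `demo-chat-key` label are assumptions made for this sketch.

```javascript
const crypto = require("crypto");

// Each user generates an X25519 key pair; only the public half would be uploaded.
const alice = crypto.generateKeyPairSync("x25519");
const bob = crypto.generateKeyPairSync("x25519");

// Derive a 32-byte AES key from the Diffie-Hellman shared secret.
function deriveKey(myPrivateKey, theirPublicKey) {
    const sharedSecret = crypto.diffieHellman({ privateKey: myPrivateKey, publicKey: theirPublicKey });
    const salt = Buffer.alloc(32); // fixed all-zero salt: acceptable for a sketch only
    return Buffer.from(crypto.hkdfSync("sha256", sharedSecret, salt, "demo-chat-key", 32));
}

function encryptMessage(key, plaintext) {
    const iv = crypto.randomBytes(12); // fresh nonce per message
    const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
    const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
    return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function decryptMessage(key, { iv, ciphertext, tag }) {
    const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
    decipher.setAuthTag(tag);
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Alice encrypts with (her private key, Bob's public key); the server relays `envelope` as-is.
const envelope = encryptMessage(deriveKey(alice.privateKey, bob.publicKey), "hello Bob");
// Bob derives the same key from (his private key, Alice's public key).
const received = decryptMessage(deriveKey(bob.privateKey, alice.publicKey), envelope);
```

A party holding neither private key derives a different key, and AES-GCM decryption fails authentication rather than returning garbage.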

4.6 Presence Service

Tracking online/offline status for 500M daily users is expensive. Optimizations:

  • Use a distributed in-memory store (Redis) for presence data.
  • Heartbeat mechanism: clients send a heartbeat every 30 seconds. If no heartbeat for 60 seconds, mark as offline.
  • Only broadcast presence updates to users who have the app open and are viewing a conversation with that user (lazy fan-out).
  • For last-seen, update a database timestamp only on disconnect events, not on every heartbeat.
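
The heartbeat timeout and lazy fan-out described above can be sketched with plain maps standing in for Redis (where the same effect would typically come from a key with a TTL). `PresenceTracker` and its method names are hypothetical, and the clock is injectable so the timeout can be exercised without waiting.

```javascript
const HEARTBEAT_TIMEOUT_MS = 60000; // "no heartbeat for 60 seconds" from the list above

class PresenceTracker {
    constructor(now = () => Date.now()) {
        this.now = now;
        this.lastHeartbeat = new Map(); // user_id -> ms timestamp of the last heartbeat
        this.watchers = new Map();      // user_id -> Set of user_ids with that chat open
    }

    heartbeat(userId) {
        this.lastHeartbeat.set(userId, this.now());
    }

    isOnline(userId) {
        const last = this.lastHeartbeat.get(userId);
        return last !== undefined && this.now() - last < HEARTBEAT_TIMEOUT_MS;
    }

    // Called when viewerId opens a conversation with targetId.
    watch(viewerId, targetId) {
        if (!this.watchers.has(targetId)) this.watchers.set(targetId, new Set());
        this.watchers.get(targetId).add(viewerId);
    }

    // Called when viewerId leaves that conversation.
    unwatch(viewerId, targetId) {
        const viewers = this.watchers.get(targetId);
        if (viewers) viewers.delete(viewerId);
    }

    // Lazy fan-out: only users currently viewing the chat receive a presence change.
    recipientsOfUpdate(userId) {
        return [...(this.watchers.get(userId) || [])];
    }
}
```

The number of recipients per update depends on how many chats with that user are open, not on the size of their contact list.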

5. Database Schema

Messages are the core data. Given the volume (20B/day) and the access patterns (read recent messages by conversation), a wide-column store like Apache Cassandra or HBase is ideal. See SQL vs NoSQL for trade-offs.

-- Cassandra schema for messages
CREATE TABLE messages (
    conversation_id UUID,
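    message_id TIMEUUID,  -- TIMEUUID is a version-1 UUID; UUIDv7 or Snowflake IDs need a different type (e.g. BIGINT for Snowflake)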
    sender_id UUID,
    content BLOB,
    content_type TEXT,
    status TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- Conversations per user (for listing chats)
CREATE TABLE user_conversations (
    user_id UUID,
    last_message_at TIMESTAMP,
    conversation_id UUID,
    conversation_name TEXT,
    is_group BOOLEAN,
    unread_count INT,
    PRIMARY KEY (user_id, last_message_at, conversation_id)
) WITH CLUSTERING ORDER BY (last_message_at DESC);

-- Group membership
CREATE TABLE group_members (
    group_id UUID,
    user_id UUID,
    role TEXT,
    joined_at TIMESTAMP,
    PRIMARY KEY (group_id, user_id)
);

-- User profiles (PostgreSQL)
CREATE TABLE users (
    id UUID PRIMARY KEY,
    phone_number VARCHAR(20) UNIQUE NOT NULL,
    display_name VARCHAR(100),
    profile_photo_url TEXT,
    public_key BYTEA,
    last_seen TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

The messages table is partitioned by conversation_id and sorted by message_id (time-ordered) within each partition, making it efficient to fetch recent messages in a conversation.

6. Key Trade-offs

Push vs Pull for Message Delivery

| Approach | Description | Trade-off |
| --- | --- | --- |
| Push (WebSocket) | Server pushes messages to client immediately | Low latency but requires persistent connections |
| Pull (Long Polling) | Client holds an HTTP request open until the server has data, then immediately re-requests | Simpler but higher latency and more server load |
| Hybrid | WebSocket when online, push notification when offline | Best of both worlds; added complexity |

WhatsApp's approach: Push via persistent connections (originally XMPP-based, now a custom protocol) when online, and APNs/FCM push notifications when offline. This hybrid approach is the standard for messaging systems.

Message Storage: SQL vs NoSQL

At 20B messages/day, a single relational database cannot keep up, and sharding one by hand adds significant operational burden. Cassandra is a good fit because: (1) it handles massive write throughput, (2) data is partitioned by conversation_id for locality, (3) it supports time-range queries efficiently with clustering keys. However, user profiles and group metadata can remain in PostgreSQL for relational queries. Understand the CAP theorem implications: Cassandra favors AP (availability and partition tolerance), with consistency tunable per query.

7. Scaling Considerations

7.1 WebSocket Connection Management

200M concurrent connections require careful management:

  • Deploy 4,000+ chat servers, each handling ~50K connections.
  • Use a load balancer with sticky sessions (by user_id hash) for WebSocket routing.
  • Use Erlang/Elixir or a custom C++ server for efficient per-connection memory usage (~10KB per connection).
  • Graceful handoff: when a chat server needs to restart, migrate connections to a new server without message loss.

7.2 Message Queue Partitioning

Kafka topics are partitioned by conversation_id. This ensures all messages in a conversation are processed in order within a single partition. With millions of active conversations, use thousands of partitions across the Kafka cluster. Learn more about message queue architecture.
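
Keyed partitioning boils down to hashing the key and taking it modulo the partition count. The sketch below uses an FNV-1a-style hash over the string's UTF-16 code units purely for illustration; Kafka's Java client uses a murmur2 hash of the key bytes by default, so actual partition numbers will differ. `fnv1a32` and `partitionFor` are hypothetical names.

```javascript
// 32-bit FNV-1a-style hash over UTF-16 code units (illustrative, not Kafka's murmur2).
function fnv1a32(text) {
    let hash = 0x811c9dc5;
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32 bits
    }
    return hash;
}

// Every message with the same conversation_id maps to the same partition.
function partitionFor(conversationId, partitionCount) {
    return fnv1a32(conversationId) % partitionCount;
}
```

Because the mapping depends on `partitionCount`, changing the number of partitions later moves conversations between partitions, which is why the count is usually sized generously up front.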

7.3 Database Sharding

Cassandra distributes rows by hashing the partition key, so using conversation_id as the partition key places each conversation at a position on the consistent-hash ring. This co-locates all messages of a conversation on the same replica set, enabling efficient reads. For the user_conversations table, partition by user_id so a user's chat list lives in one partition. See database sharding for strategies.
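
Consistent hashing itself can be sketched in a few lines. This is a generic ring with virtual nodes, not Cassandra's implementation; `HashRing`, `ringHash`, and the default of 100 virtual nodes per server are assumptions chosen for the example.

```javascript
// Illustrative 32-bit string hash (FNV-1a style) used to place nodes and keys on the ring.
function ringHash(text) {
    let hash = 0x811c9dc5;
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash;
}

class HashRing {
    constructor(nodes, virtualNodes = 100) {
        if (nodes.length === 0) throw new Error("at least one node is required");
        this.ring = [];
        for (const node of nodes) {
            // Several points per node smooth out the key distribution.
            for (let v = 0; v < virtualNodes; v++) {
                this.ring.push({ point: ringHash(node + "#" + v), node });
            }
        }
        this.ring.sort((a, b) => a.point - b.point);
    }

    // Owner = first ring point at or after the key's hash, wrapping around at the end.
    nodeFor(key) {
        const target = ringHash(key);
        let lo = 0;
        let hi = this.ring.length;
        while (lo < hi) {
            const mid = (lo + hi) >>> 1;
            if (this.ring[mid].point < target) lo = mid + 1;
            else hi = mid;
        }
        return this.ring[lo % this.ring.length].node;
    }
}
```

The useful property is that removing a node only reassigns the keys that node owned; every other conversation stays where it was.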

7.4 Media Storage

Media files (photos, videos) are not stored in the message database. Instead:

  1. Client uploads encrypted media to object storage (S3) via the Media Service.
  2. Media Service returns a URL/reference.
  3. The message contains only the media reference, not the binary data.
  4. Recipient's client downloads and decrypts the media separately.

This keeps the message pipeline lightweight and avoids overloading the chat servers with large payloads.

7.5 Multi-Device Sync

Supporting multiple devices per user (phone, desktop, web) requires:

  • Each device maintains its own WebSocket connection and encryption keys.
  • Messages are delivered to all active devices of the recipient.
  • Sync state (last read message) is tracked per device.

8. Frequently Asked Questions

Q1: How does WhatsApp handle message delivery when the user is offline?

When the recipient is offline, messages are stored in a persistent queue (per-user inbox in the database). When the user reconnects, the chat server retrieves all pending messages from the queue and delivers them in order via the WebSocket connection. Additionally, a push notification is sent via APNs (iOS) or FCM (Android) to alert the user of new messages.

Q2: How do you ensure exactly-once message delivery?

Use idempotent message IDs. Each message has a globally unique ID (UUIDv7). The server deduplicates based on message_id before persisting. The client retries failed sends with the same message_id. On the delivery side, the client acknowledges receipt; if no ack is received, the server retries delivery. This gives at-least-once delivery at the transport layer with deduplication at the application layer for effective exactly-once semantics.
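
The combination of retries and deduplication can be sketched as follows. `IdempotentInbox` and `sendWithRetry` are hypothetical names, the lost acknowledgements are simulated with a counter, and a production system would bound the set of remembered IDs (for example with a TTL) rather than keep it forever.

```javascript
// Server side: persist a message at most once per message_id.
class IdempotentInbox {
    constructor() {
        this.seen = new Set();
        this.stored = [];
    }

    // Returns true if the message was newly persisted, false if it was a duplicate.
    accept(message) {
        if (this.seen.has(message.message_id)) return false;
        this.seen.add(message.message_id);
        this.stored.push(message);
        return true;
    }
}

// Client side: resend with the same message_id until an ack arrives.
// The first `lostAcks` acknowledgements are simulated as lost in transit.
function sendWithRetry(inbox, message, lostAcks) {
    let attempts = 0;
    let acked = false;
    while (!acked) {
        attempts += 1;
        inbox.accept(message);       // the server processes every attempt
        acked = attempts > lostAcks; // but early acks never reach the client
    }
    return attempts;
}
```

Three attempts with two lost acks still leave exactly one stored copy, which is the "effective exactly-once" behaviour described above.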

Q3: How does presence (online/offline) scale to millions of users?

Presence is stored in Redis with TTL-based expiry. Clients send heartbeats every 30 seconds. Instead of broadcasting status changes to all contacts (which would be extremely expensive), presence is fetched lazily: only when a user opens a conversation do they subscribe to the other person's presence updates. This reduces the fan-out from O(contacts) to O(active_conversations).

Q4: Why not use HTTP long polling instead of WebSockets?

Long polling requires the client to issue a new HTTP request after every response, resending full HTTP headers each time and paying a fresh TCP/TLS handshake whenever the connection is not reused. WebSockets maintain a single persistent TCP connection with minimal overhead per message (~2-6 bytes of framing for small messages). For a messaging app with frequent bidirectional communication, this removes a request round trip from each delivery and cuts server resource usage substantially.

Q5: How does end-to-end encryption work in group chats?

Each group has a shared encryption key. When a new member joins or an existing member leaves, a new group key is generated and distributed. The sender encrypts the message once with the group key. The group key itself is encrypted individually with each member's public key and distributed. This means the sender only encrypts the message once (not N times for N members), but key rotation events require N encrypted copies of the new group key.
