Distributed Caching: Scaling Cache Across Multiple Nodes
When a single cache server is no longer enough — because your dataset outgrows its memory, your request rate exceeds its throughput, or you need fault tolerance — you need distributed caching. A distributed cache spreads data across multiple nodes, allowing you to scale horizontally while maintaining fast access times. This is the backbone of caching at companies like Netflix, Twitter, and Amazon.
This guide covers the core concepts, algorithms, and technologies that make distributed caching work, from consistent hashing to Redis Cluster and Memcached.
Why Distribute Your Cache?
A single Redis instance with 64 GB of RAM can handle impressive workloads — over 100,000 operations per second for simple commands. But modern applications often need more:
- Larger datasets: If your working set is 500 GB, no single server can hold it all in memory.
- Higher throughput: A million requests per second requires spreading load across multiple nodes.
- Fault tolerance: A single server is a single point of failure. Distributed caches survive individual node failures.
- Geographic distribution: For global applications, cache nodes in multiple regions reduce latency for users worldwide.
Consistent Hashing
The fundamental challenge in distributed caching is: given a key, which node should store it? Naive approaches like node = hash(key) % num_nodes break catastrophically when nodes are added or removed — nearly every key remaps to a different node, causing a cache miss storm.
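A quick sketch makes the churn concrete. The snippet below (illustrative; key names are arbitrary) places 10,000 keys with naive modulo hashing and counts how many land on a different node after one node is added:

```python
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    # Naive placement: hash the key, take it modulo the node count
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"user:{i}" for i in range(10_000)]

# Compare placement with 4 nodes vs. 5 nodes (one node added)
moved = sum(1 for k in keys if node_for(k, 4) != node_for(k, 5))
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 80%
```

Going from 4 to 5 nodes, a key keeps its node only when `h % 4 == h % 5`, which holds for about 4 residues out of every 20 — so roughly 80% of keys remap, and nearly all of them miss on their next read.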
Consistent hashing solves this by mapping both keys and nodes onto a virtual ring (typically a circle covering the range 0 to 2^32 − 1). Each key is stored on the first node encountered moving clockwise from its position on the ring.
```python
import hashlib
import bisect

class ConsistentHash:
    def __init__(self, nodes=None, virtual_nodes=150):
        self.virtual_nodes = virtual_nodes
        self.ring = {}  # hash_value -> node
        self.sorted_keys = []
        if nodes:
            for node in nodes:
                self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:vn{i}"
            hash_val = self._hash(virtual_key)
            self.ring[hash_val] = node
            bisect.insort(self.sorted_keys, hash_val)

    def remove_node(self, node):
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:vn{i}"
            hash_val = self._hash(virtual_key)
            del self.ring[hash_val]
            self.sorted_keys.remove(hash_val)

    def get_node(self, key):
        if not self.ring:
            return None
        hash_val = self._hash(key)
        idx = bisect.bisect_right(self.sorted_keys, hash_val)
        if idx == len(self.sorted_keys):
            idx = 0  # Wrap around the ring
        return self.ring[self.sorted_keys[idx]]

# Usage
ch = ConsistentHash(["redis-node-1", "redis-node-2", "redis-node-3"])
node = ch.get_node("user:1001")  # Consistently maps to same node
print(f"Store on: {node}")
```
When a node is added or removed, only the keys that map to that specific segment of the ring need to be remapped — typically only K/N keys (where K is total keys and N is total nodes). Virtual nodes ensure even distribution across the ring.
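The K/N claim is easy to verify empirically. The sketch below rebuilds the ring logic in compact standalone form (so it runs on its own) and counts how many of 10,000 keys move when a fourth node joins a three-node ring:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(nodes, virtual_nodes=150):
    # Ring as (sorted virtual-node hashes, hash -> node lookup)
    ring = {}
    for node in nodes:
        for i in range(virtual_nodes):
            ring[_hash(f"{node}:vn{i}")] = node
    return sorted(ring), ring

def get_node(sorted_keys, ring, key):
    # First virtual node clockwise from the key's position
    idx = bisect.bisect_right(sorted_keys, _hash(key))
    return ring[sorted_keys[idx % len(sorted_keys)]]

keys = [f"user:{i}" for i in range(10_000)]
before = build_ring(["node-1", "node-2", "node-3"])
after = build_ring(["node-1", "node-2", "node-3", "node-4"])

moved = sum(1 for k in keys
            if get_node(*before, k) != get_node(*after, k))
print(f"{moved / len(keys):.0%} of keys moved")  # close to 1/4
```

Because the existing nodes' virtual positions are unchanged, only keys that fall on arcs now claimed by node-4 move — about 1/4 of them, versus the ~80% that naive modulo hashing would remap.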
Redis Cluster
Redis Cluster is Redis's built-in solution for horizontal scaling. It automatically partitions data across multiple Redis nodes using a hash slot mechanism.
Hash Slots
Redis Cluster divides the keyspace into 16,384 hash slots. Each key is mapped to a slot using CRC16(key) mod 16384. Slots are distributed across master nodes.
```
# Example: 3-node cluster
# Node A: slots 0-5460
# Node B: slots 5461-10922
# Node C: slots 10923-16383

# Check which slot a key maps to
CLUSTER KEYSLOT "user:1001"    # Returns slot number

# Hash tags force related keys to same slot
SET {user:1001}:profile "..."  # Same slot
SET {user:1001}:orders "..."   # Same slot (same hash tag)

# Cluster management
CLUSTER INFO    # Cluster state and stats
CLUSTER NODES   # List all nodes and their slot ranges
CLUSTER SLOTS   # Slot-to-node mapping
```
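The slot calculation itself fits in a few lines. Redis uses the XModem variant of CRC16 and honors hash tags: if the key contains a `{...}` section, only the substring between the first `{` and the next `}` is hashed. A minimal Python sketch:

```python
def crc16(data: bytes) -> int:
    # CRC-16/XMODEM, the variant Redis Cluster uses (poly 0x1021, init 0)
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash-tag rule: hash only the non-empty {...} content if present
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(key_slot("{user:1001}:profile") == key_slot("{user:1001}:orders"))  # True
```

Both hash-tagged keys reduce to hashing `user:1001`, which is why they are guaranteed to land in the same slot and remain usable in multi-key commands.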
Replication and Failover
Each master node can have one or more replicas. If a master fails, its replica is automatically promoted. A Redis Cluster with 3 masters and 3 replicas (one replica per master) can survive any single node failure with minimal downtime — though because replication is asynchronous, writes acknowledged just before the failure may be lost.
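Standing up that 3-master/3-replica topology looks roughly like this (the IP addresses are placeholders; `--cluster-replicas 1` tells redis-cli to pair each master with one replica):

```shell
# Create a 6-node cluster: 3 masters, each with 1 replica.
# redis-cli decides which nodes become replicas of which masters.
redis-cli --cluster create \
  10.0.0.1:6379 10.0.0.2:6379 10.0.0.3:6379 \
  10.0.0.4:6379 10.0.0.5:6379 10.0.0.6:6379 \
  --cluster-replicas 1
```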
Limitations
- Multi-key operations (MGET, MSET, transactions) only work if all keys are in the same slot. Use hash tags to colocate related keys.
- Resharding (moving slots between nodes) requires data migration and can impact performance.
- Maximum recommended cluster size is around 1,000 nodes due to gossip protocol overhead.
Memcached Distributed Architecture
Memcached takes a different approach — the servers are independent and have no awareness of each other. The client library implements the distribution logic using consistent hashing.
```python
# Python memcached client with multiple servers
from pymemcache.client.hash import HashClient

# Client handles consistent hashing across servers
client = HashClient([
    ("memcached-1.internal", 11211),
    ("memcached-2.internal", 11211),
    ("memcached-3.internal", 11211),
])

# Client determines which server gets each key
client.set("user:1001", '{"name": "Alice"}')
client.set("user:1002", '{"name": "Bob"}')

# Reads automatically go to the correct server
data = client.get("user:1001")
```
| Feature | Redis Cluster | Memcached |
|---|---|---|
| Distribution Logic | Server-side (hash slots) | Client-side (consistent hashing) |
| Automatic Failover | Yes (replica promotion) | No (client must handle) |
| Data Replication | Built-in (master-replica) | None |
| Data Structures | Rich (lists, sets, hashes, etc.) | Key-value only |
| Resharding | Online (live slot migration) | Client reconnect (cache miss storm) |
Partition Tolerance and CAP Theorem
Distributed caches must deal with network partitions — when nodes cannot communicate with each other. According to the CAP theorem, during a partition, the system must choose between consistency and availability.
Redis Cluster does not guarantee strong consistency. During a partition, the majority side keeps serving requests, while the minority side stops accepting writes once the cluster node timeout elapses; writes accepted on the minority side in that window may be lost when the partition heals. This trade-off is acceptable for a cache (losing a cached value just causes a cache miss) but important to understand if using Redis as a primary data store.
Replication Strategies
Primary-Replica (Master-Slave)
One node handles writes; replicas receive copies for read scaling and failover. Redis uses asynchronous replication — the master does not wait for replicas to acknowledge writes, so some data may be lost during failover.
Multi-Region Replication
For global applications, replicate cache data across regions. Users in Europe read from European cache nodes while users in Asia read from Asian nodes. Writes are typically directed to a single primary region and replicated asynchronously. Services like Amazon ElastiCache Global Datastore support this pattern.
Common Distributed Caching Patterns
L1/L2 Cache Hierarchy
Use a fast local in-process cache (L1) backed by a shared distributed cache (L2). The L1 cache handles the hottest data with zero network latency, while the L2 cache provides a shared layer across all application instances.
```python
# Sketch of the read path. Assumes `local_cache` (an in-process,
# TTL-aware cache shared by this application instance), `redis_client`,
# and `database` clients are constructed elsewhere.

def get_with_hierarchy(key):
    # L1: Local in-process cache — zero network hops
    value = local_cache.get(key)
    if value is not None:
        return value

    # L2: Distributed cache (Redis)
    value = redis_client.get(key)
    if value is not None:
        local_cache.set(key, value, ttl=60)  # Short L1 TTL
        return value

    # L3: Database — populate both cache layers on the way back
    value = database.get(key)
    redis_client.setex(key, 600, value)  # Longer L2 TTL
    local_cache.set(key, value, ttl=60)
    return value
```
This pattern is essential for handling hot keys that would otherwise overwhelm a single Redis node.
Monitoring Distributed Caches
Key metrics to monitor in a distributed cache:
- Hit ratio per node: Uneven hit ratios indicate key distribution problems.
- Memory usage per node: Identify nodes approaching their memory limit before evictions begin.
- Network latency between nodes: High inter-node latency degrades cluster operations.
- Replication lag: Measure how far replicas fall behind the primary to understand your consistency window.
- Eviction rate: A sudden spike in evictions indicates the cache is too small for the workload.
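The per-node hit-ratio check is simple to automate. The sketch below works from the `keyspace_hits` / `keyspace_misses` counters that Redis reports in `INFO stats` (the numbers here are stand-ins, not real measurements) and flags nodes well below the cluster mean:

```python
# Per-node counters as reported by INFO stats (values are stand-ins)
node_stats = {
    "redis-node-1": {"keyspace_hits": 920_000, "keyspace_misses": 80_000},
    "redis-node-2": {"keyspace_hits": 430_000, "keyspace_misses": 570_000},
    "redis-node-3": {"keyspace_hits": 910_000, "keyspace_misses": 90_000},
}

def hit_ratio(stats: dict) -> float:
    total = stats["keyspace_hits"] + stats["keyspace_misses"]
    return stats["keyspace_hits"] / total if total else 0.0

ratios = {node: hit_ratio(s) for node, s in node_stats.items()}
mean = sum(ratios.values()) / len(ratios)

for node, ratio in sorted(ratios.items()):
    flag = "  <-- investigate" if ratio < mean - 0.2 else ""
    print(f"{node}: {ratio:.1%}{flag}")
```

In this example, redis-node-2's 43% hit ratio against a ~75% cluster mean would point at a hot-key or slot-distribution problem on that node.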
Frequently Asked Questions
How many nodes should my distributed cache have?
Start with the minimum that meets your memory and throughput needs. For Redis Cluster, the minimum is 3 masters (for quorum-based failover), each with 1 replica, totaling 6 nodes. Scale from there based on memory requirements (divide total data size by per-node memory) and throughput needs. Most applications work well with 3-12 nodes.
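The memory side of that sizing is straightforward arithmetic. A quick sketch (the dataset size, per-node RAM, and headroom factor are illustrative assumptions, not recommendations):

```python
dataset_gb = 500          # working set you need to hold in memory
node_ram_gb = 64          # RAM per cache node
usable_fraction = 0.75    # headroom for overhead, fragmentation, growth
replicas_per_master = 1   # one replica per master for failover

usable_per_node = int(node_ram_gb * usable_fraction)  # 48 GB
masters = -(-dataset_gb // usable_per_node)           # ceiling division
total_nodes = masters * (1 + replicas_per_master)

print(f"{masters} masters, {total_nodes} nodes total")  # 11 masters, 22 nodes total
```

Throughput may then push the node count higher than memory alone requires; take the larger of the two estimates.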
What happens when I add a new node to the cluster?
With consistent hashing, adding a node remaps approximately 1/N of keys (where N is the new total number of nodes). In Redis Cluster, all slots in an existing cluster are already assigned, so you move some of them to the new node with the resharding tool (redis-cli --cluster reshard); CLUSTER ADDSLOTS applies only to slots that have never been assigned. The cluster migrates the affected keys to the new node online without downtime.
Should I use Redis Cluster or Redis Sentinel?
Use Sentinel when you need high availability (automatic failover) but your data fits on a single master. Use Cluster when you need both high availability AND horizontal scaling (data too large for one node). Sentinel is simpler to operate; Cluster is more powerful but more complex.
How do I handle cache invalidation across a distributed cache?
Cache invalidation in distributed systems requires the invalidation request to reach all nodes holding copies of the data. With Redis Cluster, deleting a key automatically routes to the correct node. For L1/L2 hierarchies, use pub/sub to broadcast invalidation events so all application instances clear their local L1 caches.
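The L1 broadcast pattern can be sketched in a few lines. The snippet below simulates the bus in-process so it runs standalone; in production the bus would be a Redis pub/sub channel that every application instance subscribes to:

```python
# In-process stand-in for a Redis pub/sub channel
class InvalidationBus:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, key):
        # Deliver the invalidated key to every subscriber
        for callback in self.subscribers:
            callback(key)

class AppInstance:
    def __init__(self, bus):
        self.l1 = {}  # local in-process cache
        bus.subscribe(self.on_invalidate)

    def on_invalidate(self, key):
        self.l1.pop(key, None)  # drop the stale local copy

bus = InvalidationBus()
app_a, app_b = AppInstance(bus), AppInstance(bus)
app_a.l1["user:1001"] = '{"name": "Alice"}'
app_b.l1["user:1001"] = '{"name": "Alice"}'

# Whichever instance performs the write publishes the key;
# every instance's L1 clears its copy
bus.publish("user:1001")
print("user:1001" in app_a.l1, "user:1001" in app_b.l1)  # False False
```

The L1 TTL bounds staleness even if an invalidation message is missed, which is why the hierarchy pattern above keeps the L1 TTL short.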