Consistent Hashing: Scalable Data Distribution
Consistent hashing is a distributed hashing technique that minimizes the number of keys that need to be remapped when the hash table is resized. Traditional hash-based partitioning (key % N) requires rehashing almost all keys when a node is added or removed. Consistent hashing remaps only K/N keys on average, where K is the total number of keys and N is the number of nodes. This makes it ideal for distributed caches, databases, and load balancers.
How Consistent Hashing Works
The core idea is to map both keys and nodes onto the same circular hash space, a ring of values from 0 to 2^32 - 1 (or whatever range the chosen hash function produces):
- Hash each node identifier to a position on the ring
- Hash each key to a position on the ring
- Walk clockwise from the key position to find the first node — that node owns the key
When a node is added, it takes over keys from its clockwise successor. When a node is removed, its keys move to its clockwise successor. In both cases, only the keys in the arc between the affected node and its predecessor on the ring are remapped.
Basic Implementation
```python
import hashlib
import bisect

class ConsistentHash:
    def __init__(self, nodes=None, virtual_nodes=150):
        self.virtual_nodes = virtual_nodes
        self.ring = {}          # hash -> node
        self.sorted_keys = []   # sorted hash values
        if nodes:
            for node in nodes:
                self.add_node(node)

    def _hash(self, key):
        md5 = hashlib.md5(key.encode()).hexdigest()
        return int(md5, 16)

    def add_node(self, node):
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:vn{i}"
            hash_val = self._hash(virtual_key)
            self.ring[hash_val] = node
            bisect.insort(self.sorted_keys, hash_val)

    def remove_node(self, node):
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:vn{i}"
            hash_val = self._hash(virtual_key)
            if hash_val in self.ring:
                del self.ring[hash_val]
                self.sorted_keys.remove(hash_val)

    def get_node(self, key):
        if not self.sorted_keys:
            return None
        hash_val = self._hash(key)
        idx = bisect.bisect_right(self.sorted_keys, hash_val)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]

# Usage
ch = ConsistentHash(["node-1", "node-2", "node-3"])
print(ch.get_node("user:1234"))  # deterministic for a given key, e.g. "node-2"
print(ch.get_node("user:5678"))

# Adding a node only remaps ~1/N of keys
ch.add_node("node-4")
print(ch.get_node("user:1234"))  # most keys keep their previous owner
```
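The ~1/N claim can be checked empirically. The self-contained sketch below (re-deriving a minimal ring rather than reusing the class above; names like `build_ring` and `owner` are illustrative) counts how many of a batch of keys change owners when a fourth node joins a three-node ring:

```python
import hashlib
import bisect

def h(s):
    # Same hashing scheme as the class above: MD5 digest as an integer
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=150):
    ring = {}
    for node in nodes:
        for i in range(vnodes):
            ring[h(f"{node}:vn{i}")] = node
    return ring, sorted(ring)

def owner(ring, sk, key):
    # First vnode clockwise from the key's hash owns the key
    idx = bisect.bisect_right(sk, h(key)) % len(sk)
    return ring[sk[idx]]

keys = [f"user:{i}" for i in range(5000)]
ring3, sk3 = build_ring(["node-1", "node-2", "node-3"])
ring4, sk4 = build_ring(["node-1", "node-2", "node-3", "node-4"])

moved = sum(owner(ring3, sk3, k) != owner(ring4, sk4, k) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")  # close to 1/4
```

Every moved key lands on node-4: adding vnodes can only reassign keys whose clockwise successor is now one of the new vnodes, so the moved fraction equals the new node's share of the ring.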
Virtual Nodes
Without virtual nodes, the distribution of keys is often uneven — some nodes get significantly more keys than others. Virtual nodes solve this by mapping each physical node to multiple positions on the ring.
| Virtual Nodes Per Server | Standard Deviation of Load | Balance Quality |
|---|---|---|
| 1 | ~50% | Very poor |
| 10 | ~20% | Moderate |
| 100 | ~6% | Good |
| 150-200 | ~5% | Excellent |
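The trend in the table can be reproduced with a short simulation. This sketch (the `load_spread` helper and its parameters are illustrative; exact numbers depend on the hash and key set, so they will not match the table precisely) measures the relative standard deviation of per-node load for different vnode counts:

```python
import hashlib
import bisect
from collections import Counter
from statistics import pstdev, mean

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def load_spread(num_vnodes, num_nodes=10, num_keys=20000):
    # Build a ring with num_vnodes virtual nodes per physical node
    ring = {}
    for n in range(num_nodes):
        for i in range(num_vnodes):
            ring[h(f"node-{n}:vn{i}")] = n
    sk = sorted(ring)
    # Assign keys and count each node's share
    counts = Counter()
    for k in range(num_keys):
        idx = bisect.bisect_right(sk, h(f"key:{k}")) % len(sk)
        counts[ring[sk[idx]]] += 1
    loads = [counts[n] for n in range(num_nodes)]
    return pstdev(loads) / mean(loads)  # relative standard deviation

for v in (1, 10, 100):
    print(v, round(load_spread(v), 3))  # spread shrinks as vnodes grow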
Virtual nodes also enable heterogeneous node support. A more powerful server can be assigned more virtual nodes, receiving a proportionally larger share of keys.
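Weighting by capacity is a one-line change: give each node a vnode count proportional to its capacity. A minimal sketch, with hypothetical node names and weights (node-big gets twice the vnodes of node-small, so it should receive roughly twice the keys):

```python
import hashlib
import bisect
from collections import Counter

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

# Hypothetical capacity weights expressed as vnode counts
weights = {"node-big": 300, "node-small": 150}

ring = {}
for node, vnodes in weights.items():
    for i in range(vnodes):
        ring[h(f"{node}:vn{i}")] = node
sk = sorted(ring)

counts = Counter()
for k in range(10000):
    idx = bisect.bisect_right(sk, h(f"key:{k}")) % len(sk)
    counts[ring[sk[idx]]] += 1
print(counts)  # node-big should hold roughly twice node-small's share
```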
Rebalancing on Node Changes
When a new node joins the ring:
```python
class ConsistentHashWithRebalancing(ConsistentHash):
    def add_node_with_migration(self, new_node):
        keys_to_migrate = {}
        for i in range(self.virtual_nodes):
            virtual_key = f"{new_node}:vn{i}"
            hash_val = self._hash(virtual_key)
            # Find the node that currently owns this position on the ring
            idx = bisect.bisect_right(self.sorted_keys, hash_val)
            if idx == len(self.sorted_keys):
                idx = 0
            current_owner = self.ring[self.sorted_keys[idx]]
            if current_owner not in keys_to_migrate:
                keys_to_migrate[current_owner] = []
            keys_to_migrate[current_owner].append(hash_val)
        self.add_node(new_node)
        # Migrate affected keys from old owners to the new node.
        # migrate_keys is an application-specific helper, e.g. copy the
        # data over the network, then delete it from the old owner.
        for old_owner, hash_ranges in keys_to_migrate.items():
            migrate_keys(old_owner, new_node, hash_ranges)
```
Real-World Usage
Amazon DynamoDB
DynamoDB uses consistent hashing to distribute data across partitions. Each partition is responsible for a range of the hash space. When a partition becomes hot, DynamoDB automatically splits it and remaps only the affected range, not the entire key space.
Apache Cassandra
Cassandra uses consistent hashing with virtual nodes (vnodes) for data distribution. Each node owns multiple token ranges on the ring. When a new node joins, it takes over a portion of the token ranges from existing nodes. The default is 256 vnodes per node.
```
# Cassandra nodetool showing token ring
$ nodetool ring
Datacenter: dc1
Address    Rack   Status  State   Load     Owns   Token
10.0.1.1   rack1  Up      Normal  15.2 GB  25.1%  -9223372036854775808
10.0.1.2   rack2  Up      Normal  14.8 GB  24.9%  -4611686018427387904
10.0.1.3   rack1  Up      Normal  15.5 GB  25.3%  0
10.0.1.4   rack2  Up      Normal  14.9 GB  24.7%  4611686018427387903
```
Memcached
Memcached client libraries use consistent hashing to determine which cache server stores each key. This ensures that adding or removing a cache server only invalidates approximately 1/N of the cached data.
Comparison: Consistent Hashing vs Modular Hashing
| Aspect | Modular (key % N) | Consistent Hashing |
|---|---|---|
| Keys remapped on resize | ~100% (almost all keys) | ~1/N of keys (minimal) |
| Load balance | Good (with good hash) | Good (with virtual nodes) |
| Heterogeneous nodes | Not supported | Supported via virtual nodes |
| Implementation | Trivial | Moderate |
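The near-total remapping in the modular column is easy to verify. Going from N=4 to N=5 servers under key % N, an integer key keeps its server only when k % 4 == k % 5, which happens only when k mod 20 is in {0, 1, 2, 3}:

```python
# Fraction of keys remapped when growing from 4 to 5 servers under key % N
keys = range(100000)
moved = sum(1 for k in keys if k % 4 != k % 5)
print(f"{moved / len(keys):.0%} of keys moved")  # 80% of keys moved
```

So modular hashing remaps 80% of keys for this resize, versus the ~20% (1/5) that consistent hashing would move.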
Consistent hashing is fundamental to many distributed systems. It connects to quorum systems for determining replica placement, gossip protocols for disseminating ring membership, and horizontal scaling strategies. Use the System Design Calculator to model data distribution across your hash ring.
Frequently Asked Questions
Q: How many virtual nodes should I use?
The sweet spot is typically 100-200 virtual nodes per physical node. This provides good load balance (~5% standard deviation) without excessive memory overhead. Cassandra defaults to 256 vnodes. More virtual nodes mean better balance but higher memory usage for the ring metadata.
Q: What hash function should I use?
MD5 provides good distribution and is commonly used. MurmurHash3 is faster and provides excellent distribution — Cassandra uses it. SHA-1 works too but is slower than necessary since cryptographic security is not required. Avoid CRC32 as its distribution is not uniform enough.
Q: How does consistent hashing handle replication?
For replication factor N, a key is stored on the N nodes clockwise from the key's position on the ring. Cassandra places replicas on the next N distinct physical nodes (skipping additional virtual nodes of the same physical node). This connects to quorum-based reads and writes for consistency.
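The clockwise walk with distinct physical nodes can be sketched as an extension of the ring lookup. This is a self-contained illustration (the `get_replicas` helper is an assumption, not any particular system's API) of collecting the first n distinct owners clockwise from a key, skipping repeated vnodes of the same physical node:

```python
import hashlib
import bisect

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=150):
    ring = {}
    for node in nodes:
        for i in range(vnodes):
            ring[h(f"{node}:vn{i}")] = node
    return ring, sorted(ring)

def get_replicas(ring, sk, key, n):
    """Walk clockwise, collecting the first n distinct physical nodes."""
    idx = bisect.bisect_right(sk, h(key))
    replicas = []
    for step in range(len(sk)):
        node = ring[sk[(idx + step) % len(sk)]]
        if node not in replicas:
            replicas.append(node)
            if len(replicas) == n:
                break
    return replicas

ring, sk = build_ring(["node-1", "node-2", "node-3", "node-4"])
print(get_replicas(ring, sk, "user:1234", 3))  # three distinct nodes
```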
Q: What is jump consistent hashing?
Jump consistent hash (Google, 2014) is an alternative that provides perfect balance with zero memory overhead. However, it only supports sequential node numbering (0 to N-1) and does not support arbitrary node addition/removal. It is ideal for sharding across a fixed number of buckets.
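The published algorithm fits in a few lines; the sketch below is a direct Python transcription. Its defining property, that growing the bucket count moves a key only to the newly added bucket or not at all, can be checked directly:

```python
def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: map a 64-bit integer key to a bucket in
    [0, num_buckets) with no stored ring state."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step from the published algorithm
        key = (key * 2862933555777941757 + 1) % (1 << 64)
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

# Growing from n to n+1 buckets: a key either stays put or moves to the
# new bucket n -- it never shuffles among existing buckets.
k = 123456789
for n in range(1, 10):
    assert jump_consistent_hash(k, n + 1) in (jump_consistent_hash(k, n), n)
```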