Quorum Systems: Tunable Consistency in Distributed Databases
Quorum systems are the foundation of tunable consistency in distributed databases. A quorum is the minimum number of nodes that must participate in a read or write operation for it to be considered successful. By adjusting the quorum sizes for reads and writes, you can trade off consistency, availability, and latency. This is the mechanism behind the consistency models of Amazon's Dynamo, Apache Cassandra, and Riak.
Understanding Read and Write Quorums
In a system with N replicas, a write quorum (W) is the number of replicas that must acknowledge a write, and a read quorum (R) is the number of replicas that must respond to a read. The fundamental rule for strong consistency is:
W + R > N
This guarantees that any read quorum overlaps with any write quorum: at least one node in every read set has seen the latest acknowledged write. When this condition holds and the client returns the value with the highest version among the R responses, reads observe the most recent successful write.
| Configuration (N=3) | W | R | Consistency | Trade-off |
|---|---|---|---|---|
| Strong Consistency | 2 | 2 | Strong (2+2>3) | Balanced reads and writes |
| Write-heavy | 1 | 3 | Strong (1+3>3) | Fast writes, slow reads |
| Read-heavy | 3 | 1 | Strong (3+1>3) | Fast reads, slow writes |
| Eventual | 1 | 1 | Eventual (1+1=2, not >3) | Fast but potentially stale |
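The table above can be checked mechanically. A minimal sketch in plain Python (function name is illustrative):

```python
def classify_quorum(n: int, w: int, r: int) -> str:
    """Classify a quorum configuration: strong if every read set
    must overlap every write set (W + R > N), else eventual."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must each be between 1 and N")
    return "strong" if w + r > n else "eventual"

# The four configurations from the table, with N=3
configs = [("balanced", 2, 2), ("write-heavy", 1, 3),
           ("read-heavy", 3, 1), ("eventual", 1, 1)]
for name, w, r in configs:
    print(f"{name}: W={w}, R={r} -> {classify_quorum(3, w, r)}")
```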
Implementation Example
```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, List, Optional


class ConsistencyError(Exception):
    """Raised when a read quorum cannot be assembled."""


@dataclass
class VersionedValue:
    value: Any
    version: int
    timestamp: float
    replica: str  # which replica returned this value (needed for read repair)


class QuorumClient:
    def __init__(self, replicas: List[str], n: int, w: int, r: int):
        self.replicas = replicas
        self.n = n
        self.w = w
        self.r = r
        assert w + r > n, "W + R must be > N for strong consistency"

    async def quorum_write(self, key: str, value: Any) -> bool:
        version = int(time.time() * 1000)
        tasks = [
            self._write_to_replica(replica, key, value, version)
            for replica in self.replicas[:self.n]
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        ack_count = sum(1 for res in results if res is True)
        return ack_count >= self.w

    async def quorum_read(self, key: str) -> Optional[Any]:
        tasks = [
            self._read_from_replica(replica, key)
            for replica in self.replicas[:self.n]
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        valid_results = [res for res in results
                         if isinstance(res, VersionedValue)]
        if len(valid_results) < self.r:
            raise ConsistencyError("Could not reach read quorum")
        # Return the value with the highest version
        latest = max(valid_results, key=lambda x: x.version)
        # Read repair: update stale replicas in the background
        asyncio.create_task(
            self._read_repair(key, latest, valid_results)
        )
        return latest.value

    async def _read_repair(self, key, latest, results):
        for result in results:
            if result.version < latest.version:
                await self._write_to_replica(
                    result.replica, key, latest.value, latest.version
                )

    # _write_to_replica and _read_from_replica are the network layer
    # (one RPC per replica); their implementations are omitted here.
```
Sloppy Quorum
A strict quorum requires responses from the designated replicas for a key. A sloppy quorum instead accepts responses from the first N reachable nodes on a larger preference list, not just the designated ones. This improves availability during network partitions: if a designated node is unreachable, a temporary stand-in accepts the write on its behalf.
```python
class SloppyQuorumClient:
    def __init__(self, all_nodes, consistent_hash,
                 preference_list_size=5, w=2, r=2):
        self.all_nodes = all_nodes
        # Consistent-hash ring exposing get_nodes(key, count)
        self.consistent_hash = consistent_hash
        self.preference_list_size = preference_list_size
        self.w = w
        self.r = r

    def primary_nodes(self, key):
        # Nodes that should own this key (via consistent hashing)
        return self.consistent_hash.get_nodes(key, count=3)

    def get_preference_list(self, key):
        primaries = self.primary_nodes(key)
        # Additional fallback nodes, tried when primaries are unreachable
        fallbacks = [n for n in self.all_nodes if n not in primaries]
        return primaries + fallbacks[:self.preference_list_size - len(primaries)]

    async def sloppy_write(self, key, value):
        preference_list = self.get_preference_list(key)
        primaries = self.primary_nodes(key)
        ack_count = 0
        hints = []
        for node in preference_list:
            if ack_count >= self.w:
                break
            try:
                if node in primaries:
                    await write_to_node(node, key, value)
                else:
                    # Substitute node: store the write as a hint for the
                    # unreachable primary this node stands in for
                    target = self.get_target_node(key, node)
                    await write_hint(node, key, value, target)
                    hints.append((node, target))
                ack_count += 1
            except ConnectionError:
                continue
        # write_to_node, write_hint, and get_target_node are
        # transport-layer helpers, omitted here
        return ack_count >= self.w
```
Hinted Handoff
When a sloppy quorum writes to a substitute node, that node stores a "hint" indicating the intended recipient. When the intended node recovers, the hint is delivered and the data is transferred. This ensures data eventually reaches the correct replica.
```python
class HintedHandoff:
    def __init__(self, node_id):
        self.node_id = node_id
        self.hints = {}  # target_node -> [(key, value, version)]

    def store_hint(self, target_node, key, value, version):
        if target_node not in self.hints:
            self.hints[target_node] = []
        self.hints[target_node].append((key, value, version))

    async def deliver_hints(self):
        for target_node, items in list(self.hints.items()):
            try:
                for key, value, version in items:
                    await write_to_node(target_node, key, value, version)
                # Versioned writes are idempotent, so redelivering items
                # after a partial failure is safe
                del self.hints[target_node]
            except ConnectionError:
                pass  # Target still unreachable; retry next cycle
```
DynamoDB Quorum Example
The original Amazon Dynamo design, which inspired DynamoDB, Cassandra, and Riak, uses quorum-based replication; DynamoDB itself replicates each table across three availability zones. For a table with N=3 replicas:
- Strongly consistent reads: read from a majority and return the latest value (effectively R=2)
- Eventually consistent reads: read from any single replica (R=1), which is faster and cheaper
- Writes: acknowledged after a majority of replicas confirm (effectively W=2)
Dynamo also pairs sloppy quorum with hinted handoff for durability during node or zone failures: hinted data is transferred automatically when the target recovers.
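To make the effect of R concrete, here is a small simulation (not DynamoDB's actual implementation) of three replicas where one holds a stale version:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Replica:
    name: str
    version: int
    value: str

def read(replicas: List[Replica], r: int) -> str:
    """Read from the first r replicas and return the value
    with the highest version among the responses."""
    responses = replicas[:r]
    latest = max(responses, key=lambda rep: rep.version)
    return latest.value

# One replica (az-a) missed the latest write (version 2)
replicas = [
    Replica("az-a", version=1, value="old"),
    Replica("az-b", version=2, value="new"),
    Replica("az-c", version=2, value="new"),
]

print(read(replicas, r=1))  # eventually consistent: may return the stale value
print(read(replicas, r=2))  # R=2 with W=2, N=3: guaranteed to see the latest
```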
Anti-Entropy and Read Repair
Quorum systems use two mechanisms to keep replicas synchronized:
| Mechanism | When It Runs | How It Works |
|---|---|---|
| Read Repair | On every read | When a read detects stale data, update the stale replica |
| Anti-Entropy | Background process | Merkle tree comparison to find and fix differences |
| Hinted Handoff | On node recovery | Transfer data stored by substitute nodes |
Quorum systems connect to consistent hashing for determining which nodes are replicas, vector clocks for detecting conflicts, and gossip protocols for membership management.
Frequently Asked Questions
Q: Does W + R > N guarantee linearizability?
No. W + R > N guarantees that every read quorum overlaps some write quorum, but quorum reads and writes alone are not linearizable: a partially completed write can be observed by one reader and then missed by a later one. Achieving linearizability requires additional mechanisms, such as performing synchronous read repair before returning a read, or using consensus or leader-based replication.
Q: What happens when a quorum cannot be reached?
The operation fails. With strict quorum, if fewer than W nodes are available for a write, the write is rejected. This is the availability trade-off for consistency. Sloppy quorum mitigates this by allowing substitute nodes to participate.
Q: How do I choose between strict and sloppy quorum?
Use strict quorum when consistency is critical (financial transactions, inventory). Use sloppy quorum when availability is more important (shopping cart, user preferences). DynamoDB uses sloppy quorum for its core key-value operations, prioritizing availability.
Q: What is the relationship between quorums and the CAP theorem?
Quorums let you tune the CAP trade-off. With W + R > N, you get consistency but sacrifice availability during partitions (CP). With W + R <= N, you get availability but sacrifice consistency (AP). The partition tolerance dimension is not optional in distributed systems.