CAP Theorem Explained: Consistency, Availability, and Partition Tolerance
The CAP theorem is one of the most fundamental concepts in distributed systems. Proposed by computer scientist Eric Brewer in 2000 and formally proven by Seth Gilbert and Nancy Lynch in 2002, it states that a distributed data store can provide at most two out of three guarantees simultaneously: Consistency, Availability, and Partition Tolerance.
Understanding the CAP theorem is essential for making informed database and architecture choices. In this guide, we will break down each property, explore real systems through the CAP lens, and learn when to choose which trade-off.
The Three Properties
Consistency (C)
Every read receives the most recent write or an error. All nodes in the system see the same data at the same time. If you write a value and immediately read it back from any node, you get the updated value.
This is not the same as ACID consistency in databases. CAP consistency refers specifically to linearizability — a strong consistency model where operations appear to happen instantaneously and in a total order.
Availability (A)
Every request receives a non-error response, without the guarantee that it contains the most recent write. The system continues to operate and serve requests even if some nodes are down. No request is left hanging indefinitely.
In CAP terms, availability means every non-failing node must return a response for every request it receives. This is a strong guarantee — it does not allow the system to reject requests just because it cannot reach other nodes.
Partition Tolerance (P)
The system continues to operate despite network partitions — that is, messages being lost or delayed between nodes. In a distributed system running across multiple machines, network failures are inevitable. Cables get cut, switches fail, and cloud regions lose connectivity.
Partition tolerance is not really optional in a distributed system. Networks will fail. The real choice is between consistency and availability when a partition occurs.
Why You Cannot Have All Three
Imagine a simple distributed system with two nodes, Node A and Node B, both holding the same data. A network partition occurs — the nodes cannot communicate with each other.
A client sends a write to Node A, updating a value from X to Y. Now another client sends a read to Node B. The system has two options:
| Choice | What Happens | Result |
|---|---|---|
| Choose Consistency | Node B refuses to respond because it cannot verify it has the latest data | Consistent but not available (CP) |
| Choose Availability | Node B returns the stale value X | Available but not consistent (AP) |
There is no way to have both. If the network is partitioned, you must choose between giving a potentially stale answer (availability) or giving no answer at all (consistency).
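The two-node scenario above can be sketched in code. This is a minimal toy model, not any real database: a `partitioned` flag simulates the lost link, and a `mode` switch shows how a CP system and an AP system answer the same read differently.

```python
# Toy model of the two-node partition scenario (not a real database).

class Node:
    def __init__(self, value):
        self.value = value

class TwoNodeStore:
    def __init__(self, value, mode):
        self.a = Node(value)
        self.b = Node(value)
        self.partitioned = False
        self.mode = mode  # "CP" or "AP"

    def write_a(self, value):
        self.a.value = value
        if not self.partitioned:
            self.b.value = value  # replication only succeeds while connected

    def read_b(self):
        if self.partitioned and self.mode == "CP":
            # Consistency chosen: refuse rather than risk a stale answer.
            raise RuntimeError("unavailable: cannot verify latest data")
        # Availability chosen: answer with whatever Node B has, possibly stale.
        return self.b.value

cp = TwoNodeStore("X", "CP")
ap = TwoNodeStore("X", "AP")
for store in (cp, ap):
    store.partitioned = True
    store.write_a("Y")  # write lands on Node A; replication to B fails

print(ap.read_b())  # "X" -> stale but available (AP)
try:
    cp.read_b()
except RuntimeError as e:
    print(e)        # unavailable -> consistent but not available (CP)
```

The same client, the same read, the same partition; the only difference is which guarantee the system was built to keep.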
CP, AP, and CA Systems
CP Systems (Consistency + Partition Tolerance)
CP systems prioritize consistency. When a partition occurs, the system makes some data unavailable rather than risk returning stale data. This is the right choice when incorrect data is worse than no data.
Examples:
- HBase: Built on HDFS, provides strong consistency with a single region server per row range. If the region server is unreachable, those rows are unavailable.
- MongoDB (with majority writes): When configured with write concern "majority", MongoDB ensures writes are replicated before acknowledging. During a partition, the minority side becomes read-only.
- Zookeeper: Used for coordination and configuration. Always returns consistent data, but becomes unavailable if a quorum cannot be reached.
- Redis (Cluster mode): During a partition, the minority side stops accepting writes, favoring consistency over availability (though asynchronous replication means acknowledged writes can still occasionally be lost, so Redis Cluster does not offer strong consistency guarantees).
When to choose CP:
- Banking and financial transactions — showing a wrong balance is worse than showing an error
- Inventory management — overselling is costly
- Leader election and distributed locks — must be correct
AP Systems (Availability + Partition Tolerance)
AP systems prioritize availability. When a partition occurs, every node continues serving requests, but the data might be stale or inconsistent across nodes. The system reconciles differences after the partition heals.
Examples:
- Cassandra: Designed for availability. Every node can accept reads and writes. Uses eventual consistency with tunable consistency levels.
- DynamoDB: Amazon's key-value store, designed for "always-on" availability. Provides eventual consistency by default, with optional strongly consistent reads.
- CouchDB: Multi-master replication with conflict resolution. Optimizes for availability and partition tolerance.
- Riak: Distributed key-value store that stays available during partitions, resolving conflicts later.
When to choose AP:
- Social media feeds — showing a slightly stale timeline is fine
- Product catalogs — a brief delay in price updates is acceptable
- DNS — must always return a result, even if slightly outdated
- Shopping carts — availability is more important than perfect consistency
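The shopping-cart case is the classic illustration from Amazon's Dynamo paper: during a partition both replicas keep accepting writes, and when the partition heals the diverged carts are merged rather than one being rejected. A minimal sketch of that merge-on-conflict idea, using set union:

```python
# Sketch of Dynamo-style shopping-cart reconciliation: both sides of a
# partition accept writes; on heal, diverged carts merge by set union.

def merge_carts(cart_a, cart_b):
    """Conflict resolution for an AP shopping cart: keep every item
    either replica saw. Deleted items can resurface, which is the
    known downside of this strategy (one Dynamo chose to accept)."""
    return cart_a | cart_b

# During a partition, each replica saw a different add:
replica_1 = {"book", "lamp"}        # client added "lamp" via node 1
replica_2 = {"book", "headphones"}  # client added "headphones" via node 2

print(merge_carts(replica_1, replica_2))
```

The trade-off is explicit: the cart stays writable on both sides of the partition (availability), at the cost of occasionally showing a merged cart the user did not expect.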
CA Systems (Consistency + Availability)
CA systems provide both consistency and availability but cannot tolerate network partitions. In practice, this means single-node systems, or systems whose network is assumed never to partition.
Examples:
- Single-node PostgreSQL: A single database server provides both consistency (ACID) and availability (if the server is up). But it cannot survive a network partition because there is only one node.
- Single-node MySQL: Same as PostgreSQL — strong consistency and availability on a single machine.
In the real world of distributed systems, CA is essentially impossible because you cannot guarantee no partitions. Any system that runs across a network must handle partitions. This is why the practical choice is always between CP and AP.
Real Databases Mapped to CAP
| Database | CAP Category | Notes |
|---|---|---|
| PostgreSQL (single node) | CA | Strong ACID, no partition tolerance |
| MySQL (single node) | CA | Same as PostgreSQL for single node |
| MongoDB | CP | Strongly consistent with replica sets |
| HBase | CP | Strong consistency, single region server per range |
| Zookeeper | CP | Coordination service, needs quorum |
| Cassandra | AP | Tunable consistency, defaults to eventual |
| DynamoDB | AP | Eventually consistent by default |
| CouchDB | AP | Multi-master with conflict resolution |
| Riak | AP | Distributed key-value store |
| Google Spanner | CP (effectively CA) | Uses TrueTime for global consistency, extremely high availability |
Beyond CAP: The PACELC Theorem
The CAP theorem only describes behavior during a partition. But what about normal operation? The PACELC theorem extends CAP:
If Partition (P), choose Availability (A) or Consistency (C); Else (E), choose Latency (L) or Consistency (C).
PACELC Examples:
- Cassandra: PA/EL — available during partitions, low latency otherwise
- DynamoDB: PA/EL — same as Cassandra
- MongoDB: PC/EC — consistent during partitions, consistent otherwise
- HBase: PC/EC — always prioritizes consistency
- Spanner: PC/EC — consistent always, accepts higher latency
PACELC is more nuanced than CAP because it also captures the latency vs. consistency trade-off that exists even when the network is healthy.
Practical Decision Framework
When choosing between CP and AP for a specific use case, ask these questions:
Decision Framework:
1. What happens if a user reads stale data?
   - Catastrophic (financial loss, safety risk) → CP
   - Annoying but recoverable (stale feed, old count) → AP
2. What happens if the system is unavailable?
   - Acceptable for short periods → CP
   - Must always respond, even with stale data → AP
3. Can the application handle conflicts?
   - Yes, with last-write-wins or custom logic → AP
   - No, conflicts would corrupt data → CP
4. What is the read/write ratio?
   - Read-heavy with tolerance for staleness → AP
   - Write-heavy where order matters → CP
Common Misconceptions
- "CAP says pick two of three": This is misleading. Since partitions are inevitable, you are really choosing between consistency and availability during partitions. When there is no partition, you can have both.
- "Eventual consistency means inconsistent": Eventual consistency means the system will become consistent given enough time without new writes. It is not random or chaotic — it follows well-defined rules.
- "CP means the system is always unavailable": CP systems are only unavailable during partitions. In normal operation, they serve all requests. Partitions are rare events.
- "One database, one CAP category": Many databases offer tunable consistency. Cassandra with QUORUM reads and writes behaves like a CP system. DynamoDB offers strongly consistent reads as an option.
CAP in System Design Interviews
When discussing CAP in interviews, avoid simply labeling databases. Instead, show that you understand the trade-offs and can make informed decisions based on requirements. For example: "Since our messaging system needs to always accept new messages even during network issues, we should lean AP. We can use Cassandra or DynamoDB and handle eventual consistency by showing a 'syncing...' indicator to users."
For a broader view of system design trade-offs, and how they connect to high availability and fault tolerance, explore our other guides.
Frequently Asked Questions
Is the CAP theorem still relevant today?
Yes, but with nuance. The core insight — that you must trade off consistency and availability during partitions — remains true. However, modern systems like Google Spanner blur the lines by using specialized hardware (atomic clocks) to minimize the practical impact of partitions. The PACELC theorem provides a more complete picture.
How does Google Spanner appear to beat the CAP theorem?
Spanner does not beat CAP — it is a CP system. It achieves near-CA behavior by using TrueTime (GPS and atomic clocks) to minimize the window of unavailability during partitions to milliseconds. It also runs on Google's private network, where partitions are exceedingly rare. By design, when a partition does occur, it chooses consistency over availability.
Can I use different CAP trade-offs for different parts of the same system?
Absolutely, and this is the recommended approach. In an e-commerce system, you might use a CP system for payment processing and inventory management, but an AP system for product recommendations and user session data. Match the CAP trade-off to the specific data's requirements.
What is tunable consistency and how does it relate to CAP?
Some databases like Cassandra let you configure the consistency level per query. With QUORUM reads and writes, Cassandra behaves like a CP system. With ONE read/write, it behaves like an AP system. This lets you choose the trade-off at the query level rather than the system level, offering maximum flexibility.
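The reason QUORUM reads and writes behave like CP is the standard quorum-overlap rule: with N replicas, a read quorum R and a write quorum W are guaranteed to share at least one replica whenever R + W > N, so every read touches a replica holding the latest acknowledged write. A quick check of that arithmetic:

```python
# The quorum-overlap rule behind tunable consistency: a read quorum R
# and write quorum W must intersect whenever R + W > N.

def quorums_overlap(n, r, w):
    """True if any read quorum of size r must share at least one
    replica with any write quorum of size w, out of n replicas."""
    return r + w > n

n = 3
quorum = n // 2 + 1  # majority: 2 of 3, Cassandra's QUORUM for N=3
print(quorums_overlap(n, quorum, quorum))  # True  -> CP-like behavior
print(quorums_overlap(n, 1, 1))            # False -> ONE/ONE, AP-like
```

Dialing R and W is precisely how a single database slides between CP-like and AP-like behavior on a per-query basis.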
How do network partitions actually happen in practice?
Partitions can be caused by switch failures, cable cuts, cloud provider outages, misconfigured firewalls, BGP routing issues, or even software bugs that cause nodes to become unreachable. In cloud environments, cross-region partitions are more common than within-region ones. AWS, for example, has experienced multiple cross-region connectivity issues over the years.