CAP Theorem Explained: Consistency, Availability, and Partition Tolerance
The CAP theorem is one of the most fundamental concepts in distributed systems. Proposed by computer scientist Eric Brewer in 2000 and formally proven by Seth Gilbert and Nancy Lynch in 2002, it states that a distributed data store can provide at most two out of three guarantees simultaneously: Consistency, Availability, and Partition Tolerance.
Understanding the CAP theorem is essential for making informed database and architecture choices. In this guide, we will break down each property, explore real systems through the CAP lens, and learn when to choose which trade-off.
The Three Properties
Consistency (C)
Every read receives the most recent write or an error. All nodes in the system see the same data at the same time. If you write a value and immediately read it back from any node, you get the updated value.
This is not the same as ACID consistency in databases. CAP consistency refers specifically to linearizability — a strong consistency model where operations appear to happen instantaneously and in a total order.
Availability (A)
Every request receives a non-error response, without the guarantee that it contains the most recent write. The system continues to operate and serve requests even if some nodes are down. No request is left hanging indefinitely.
In CAP terms, availability means every non-failing node must return a response for every request it receives. This is a strong guarantee — it does not allow the system to reject requests just because it cannot reach other nodes.
Partition Tolerance (P)
The system continues to operate despite network partitions — that is, messages being lost or delayed between nodes. In a distributed system running across multiple machines, network failures are inevitable. Cables get cut, switches fail, and cloud regions lose connectivity.
Partition tolerance is not really optional in a distributed system. Networks will fail. The real choice is between consistency and availability when a partition occurs.
Why You Cannot Have All Three
Imagine a simple distributed system with two nodes, Node A and Node B, both holding the same data. A network partition occurs — the nodes cannot communicate with each other.
A client sends a write to Node A, updating a value from X to Y. Now another client sends a read to Node B. The system has two options:
| Choice | What Happens | Result |
|---|---|---|
| Choose Consistency | Node B refuses to respond because it cannot verify it has the latest data | Consistent but not available (CP) |
| Choose Availability | Node B returns the stale value X | Available but not consistent (AP) |
There is no way to have both. If the network is partitioned, you must choose between giving a potentially stale answer (availability) or giving no answer at all (consistency).
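The two-node scenario above can be sketched in code. This is a minimal toy model, not any real database: a `partitioned` flag simulates the lost link, and a `mode` switch shows how a CP system and an AP system answer the same read differently.

```python
# Toy model of the two-node partition scenario (not a real database).

class Node:
    def __init__(self, value):
        self.value = value

class TwoNodeStore:
    def __init__(self, value, mode):
        self.a = Node(value)
        self.b = Node(value)
        self.partitioned = False
        self.mode = mode  # "CP" or "AP"

    def write_a(self, value):
        self.a.value = value
        if not self.partitioned:
            self.b.value = value  # replication only succeeds while connected

    def read_b(self):
        if self.partitioned and self.mode == "CP":
            # Consistency chosen: refuse rather than risk a stale answer.
            raise RuntimeError("unavailable: cannot verify latest data")
        # Availability chosen: answer with whatever Node B has, possibly stale.
        return self.b.value

cp = TwoNodeStore("X", "CP")
ap = TwoNodeStore("X", "AP")
for store in (cp, ap):
    store.partitioned = True
    store.write_a("Y")  # write lands on Node A; replication to B fails

print(ap.read_b())  # "X" -> stale but available (AP)
try:
    cp.read_b()
except RuntimeError as e:
    print(e)        # unavailable -> consistent but not available (CP)
```

The same client, the same read, the same partition; the only difference is which guarantee the system was built to keep.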
CP, AP, and CA Systems
CP Systems (Consistency + Partition Tolerance)
CP systems prioritize consistency. When a partition occurs, the system makes some data unavailable rather than risk returning stale data. This is the right choice when incorrect data is worse than no data.
Examples:
- HBase: Built on HDFS, provides strong consistency with a single region server per row range. If the region server is unreachable, those rows are unavailable.
- MongoDB (with majority writes): When configured with write concern "majority", MongoDB ensures writes are replicated before acknowledging. During a partition, the minority side becomes read-only.
- Zookeeper: Used for coordination and configuration. Always returns consistent data, but becomes unavailable if a quorum cannot be reached.
- Redis (Cluster mode): During a partition, the minority side stops accepting writes, favoring consistency over availability (though asynchronous replication means acknowledged writes can still occasionally be lost, so Redis Cluster does not offer strong consistency guarantees).
When to choose CP:
- Banking and financial transactions — showing a wrong balance is worse than showing an error
- Inventory management — overselling is costly
- Leader election and distributed locks — must be correct
AP Systems (Availability + Partition Tolerance)
AP systems prioritize availability. When a partition occurs, every node continues serving requests, but the data might be stale or inconsistent across nodes. The system reconciles differences after the partition heals.
Examples:
- Cassandra: Designed for availability. Every node can accept reads and writes. Uses eventual consistency with tunable consistency levels.
- DynamoDB: Amazon's key-value store, designed for "always-on" availability. Provides eventual consistency by default, with optional strongly consistent reads.
- CouchDB: Multi-master replication with conflict resolution. Optimizes for availability and partition tolerance.
- Riak: Distributed key-value store that stays available during partitions, resolving conflicts later.
When to choose AP:
- Social media feeds — showing a slightly stale timeline is fine
- Product catalogs — a brief delay in price updates is acceptable
- DNS — must always return a result, even if slightly outdated
- Shopping carts — availability is more important than perfect consistency
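The shopping-cart case is the classic illustration from Amazon's Dynamo paper: during a partition both replicas keep accepting writes, and when the partition heals the diverged carts are merged rather than one being rejected. A minimal sketch of that merge-on-conflict idea, using set union:

```python
# Sketch of Dynamo-style shopping-cart reconciliation: both sides of a
# partition accept writes; on heal, diverged carts merge by set union.

def merge_carts(cart_a, cart_b):
    """Conflict resolution for an AP shopping cart: keep every item
    either replica saw. Deleted items can resurface, which is the
    known downside of this strategy (one Dynamo chose to accept)."""
    return cart_a | cart_b

# During a partition, each replica saw a different add:
replica_1 = {"book", "lamp"}        # client added "lamp" via node 1
replica_2 = {"book", "headphones"}  # client added "headphones" via node 2

print(merge_carts(replica_1, replica_2))
```

The trade-off is explicit: the cart stays writable on both sides of the partition (availability), at the cost of occasionally showing a merged cart the user did not expect.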
CA Systems (Consistency + Availability)
CA systems provide both consistency and availability but cannot tolerate network partitions. In practice, this means single-node systems, or systems whose network is assumed never to partition.
Examples:
- Single-node PostgreSQL: A single database server provides both consistency (ACID) and availability (if the server is up). But it cannot survive a network partition because there is only one node.
- Single-node MySQL: Same as PostgreSQL — strong consistency and availability on a single machine.
In the real world of distributed systems, CA is essentially impossible because you cannot guarantee no partitions. Any system that runs across a network must handle partitions. This is why the practical choice is always between CP and AP.
Real Databases Mapped to CAP
| Database | CAP Category | Notes |
|---|---|---|
| PostgreSQL (single node) | CA | Strong ACID, no partition tolerance |
| MySQL (single node) | CA | Same as PostgreSQL for single node |
| MongoDB | CP | Strongly consistent with replica sets |
| HBase | CP | Strong consistency, single region server per range |
| Zookeeper | CP | Coordination service, needs quorum |
| Cassandra | AP | Tunable consistency, defaults to eventual |
| DynamoDB | AP | Eventually consistent by default |
| CouchDB | AP | Multi-master with conflict resolution |
| Riak | AP | Distributed key-value store |
| Google Spanner | CP (effectively CA) | Uses TrueTime for global consistency, extremely high availability |
Beyond CAP: The PACELC Theorem
The CAP theorem only describes behavior during a partition. But what about normal operation? The PACELC theorem extends CAP:
If Partition (P), choose Availability (A) or Consistency (C); Else (E), choose Latency (L) or Consistency (C).
PACELC Examples:
- Cassandra: PA/EL — available during partitions, low latency otherwise
- DynamoDB: PA/EL — same as Cassandra
- MongoDB: PC/EC — consistent during partitions, consistent otherwise
- HBase: PC/EC — always prioritizes consistency
- Spanner: PC/EC — consistent always, accepts higher latency
PACELC is more nuanced than CAP because it also captures the latency vs. consistency trade-off that exists even when the network is healthy.
Practical Decision Framework
When choosing between CP and AP for a specific use case, ask these questions:
Decision Framework:
1. What happens if a user reads stale data?
   - Catastrophic (financial loss, safety risk) → CP
   - Annoying but recoverable (stale feed, old count) → AP
2. What happens if the system is unavailable?
   - Acceptable for short periods → CP
   - Must always respond, even with stale data → AP
3. Can the application handle conflicts?
   - Yes, with last-write-wins or custom logic → AP
   - No, conflicts would corrupt data → CP
4. What is the read/write ratio?
   - Read-heavy with tolerance for staleness → AP
   - Write-heavy where order matters → CP
Common Misconceptions
- "CAP says pick two of three": This is misleading. Since partitions are inevitable, you are really choosing between consistency and availability during partitions. When there is no partition, you can have both.
- "Eventual consistency means inconsistent": Eventual consistency means the system will become consistent given enough time without new writes. It is not random or chaotic — it follows well-defined rules.
- "CP means the system is always unavailable": CP systems are only unavailable during partitions. In normal operation, they serve all requests. Partitions are rare events.
- "One database, one CAP category": Many databases offer tunable consistency. Cassandra with QUORUM reads and writes behaves like a CP system. DynamoDB offers strongly consistent reads as an option.
CAP in System Design Interviews
When discussing CAP in interviews, avoid simply labeling databases. Instead, show that you understand the trade-offs and can make informed decisions based on requirements. For example: "Since our messaging system needs to always accept new messages even during network issues, we should lean AP. We can use Cassandra or DynamoDB and handle eventual consistency by showing a 'syncing...' indicator to users."
For a broader view of system design trade-offs, and how they connect to high availability and fault tolerance, explore our other guides.
Frequently Asked Questions
Is the CAP theorem still relevant today?
Yes, but with nuance. The core insight — that you must trade off consistency and availability during partitions — remains true. However, modern systems like Google Spanner blur the lines by using specialized hardware (atomic clocks) to minimize the practical impact of partitions. The PACELC theorem provides a more complete picture.
How does Google Spanner appear to beat the CAP theorem?
Spanner does not beat CAP — it is a CP system. It achieves near-CA behavior by using TrueTime (GPS and atomic clocks) to minimize the window of unavailability during partitions to milliseconds. It also runs on Google's private network, where partitions are exceedingly rare. By design, when a partition does occur, it chooses consistency over availability.
Can I use different CAP trade-offs for different parts of the same system?
Absolutely, and this is the recommended approach. In an e-commerce system, you might use a CP system for payment processing and inventory management, but an AP system for product recommendations and user session data. Match the CAP trade-off to the specific data's requirements.
What is tunable consistency and how does it relate to CAP?
Some databases like Cassandra let you configure the consistency level per query. With QUORUM reads and writes, Cassandra behaves like a CP system. With ONE read/write, it behaves like an AP system. This lets you choose the trade-off at the query level rather than the system level, offering maximum flexibility.
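The reason QUORUM reads and writes behave like CP is the standard quorum-overlap rule: with N replicas, a read quorum R and a write quorum W are guaranteed to share at least one replica whenever R + W > N, so every read touches a replica holding the latest acknowledged write. A quick check of that arithmetic:

```python
# The quorum-overlap rule behind tunable consistency: a read quorum R
# and write quorum W must intersect whenever R + W > N.

def quorums_overlap(n, r, w):
    """True if any read quorum of size r must share at least one
    replica with any write quorum of size w, out of n replicas."""
    return r + w > n

n = 3
quorum = n // 2 + 1  # majority: 2 of 3, Cassandra's QUORUM for N=3
print(quorums_overlap(n, quorum, quorum))  # True  -> CP-like behavior
print(quorums_overlap(n, 1, 1))            # False -> ONE/ONE, AP-like
```

Dialing R and W is precisely how a single database slides between CP-like and AP-like behavior on a per-query basis.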
How do network partitions actually happen in practice?
Partitions can be caused by switch failures, cable cuts, cloud provider outages, misconfigured firewalls, BGP routing issues, or even software bugs that cause nodes to become unreachable. In cloud environments, cross-region partitions are more common than within-region ones. AWS, for example, has experienced multiple cross-region connectivity issues over the years.