The CAP theorem doesn’t actually say you have to choose between Consistency, Availability, and Partition Tolerance. It says that in the presence of a network partition, you can only guarantee two of the three.

Let’s see this in action. Imagine a simple distributed key-value store. We have two nodes, Node A and Node B, both holding the key user:123.

Scenario 1: No Partition (Normal Operation)

If we write user:123 with value {"name": "Alice"} to Node A, and then immediately read user:123 from Node B, a strongly consistent system would ensure that Node B also returns {"name": "Alice"}. This is because there’s no partition, so all nodes are in communication and can synchronize their data.

# On Node A
curl -X PUT -d '{"name": "Alice"}' http://node-a:8080/data/user:123

# On Node B (immediately after)
curl http://node-b:8080/data/user:123
# Expected output: {"name": "Alice"}

Scenario 2: Network Partition (The CAP Dilemma)

Now, let’s introduce a partition between Node A and Node B. They can no longer talk to each other.

  • If we prioritize Consistency © and Partition Tolerance (P): When a client tries to write user:123 with value {"name": "Bob"} to Node A, Node A knows it can’t replicate this change to Node B due to the partition. To maintain consistency (meaning all nodes would see the same data if they could communicate), Node A must refuse the write operation. Similarly, if a client tries to read from Node B, and Node B hasn’t received the latest update from Node A, it would return stale data or an error to avoid returning inconsistent information. In this CP system, Availability (A) is sacrificed during a partition.

    # Assume partition is active. Client tries to write to Node A.
    curl -X PUT -d '{"name": "Bob"}' http://node-a:8080/data/user:123
    # Node A returns an error, e.g., "503 Service Unavailable" because it can't guarantee consistency.
    
    # Client tries to read from Node B.
    curl http://node-b:8080/data/user:123
    # Node B might return old data or an error, depending on its read policy.
    
  • If we prioritize Availability (A) and Partition Tolerance (P): When the partition occurs, Node A will happily accept the write user:123 with value {"name": "Bob"} and Node B will happily accept a read request, returning its last known value (which might be {"name": "Alice"}). Both nodes remain available to their clients. However, because they can’t communicate, their data diverges. Once the partition heals, the system needs a reconciliation strategy to decide which version is "correct" (e.g., last write wins, or a more complex merge). This is an AP system.

    # Assume partition is active. Client writes to Node A.
    curl -X PUT -d '{"name": "Bob"}' http://node-a:8080/data/user:123
    # Node A succeeds and returns 200 OK.
    
    # Client reads from Node B.
    curl http://node-b:8080/data/user:123
    # Node B returns {"name": "Alice"} (stale data).
    
  • What about CA? A system that is strictly Consistent and Available must not tolerate partitions. If a partition occurs, it has to shut down or become unavailable to avoid inconsistency. This is rare for large-scale distributed systems, as network partitions are a fact of life. You’re almost always dealing with P.

The Core Problem: Partitions are Unavoidable

Network partitions are not a "bug" to be fixed; they are a fundamental characteristic of distributed systems. Cables get cut, routers fail, firewalls misconfigure. The CAP theorem forces us to acknowledge that when these events happen, we can’t have it all.

How Systems Implement CAP

Most distributed databases today are designed as AP systems. They accept that during a partition, data might become temporarily inconsistent across nodes. They rely on mechanisms like:

  • Replication: Data is copied across multiple nodes.
  • Quorums: Reads and writes must succeed on a certain number of nodes (a quorum) to be considered successful. For example, a write might need to be acknowledged by W nodes and a read by R nodes. If W + R > N (where N is the total number of replicas), you can achieve strong consistency when there is no partition. However, during a partition, if you can’t reach a quorum on one side, that side becomes unavailable for that operation.
  • Conflict Resolution: When a partition heals, the system needs a way to reconcile divergent data. This can be:
    • Last Write Wins (LWW): The update with the latest timestamp is applied. This is simple but can lose data if clocks aren’t perfectly synchronized or if concurrent writes happen.
    • Vector Clocks / Version Vectors: More sophisticated methods that track causality between updates, allowing for detection of concurrent, non-conflicting updates and identifying true conflicts that need application-level resolution.
    • Application-Specific Logic: The application itself defines how to merge conflicting data.

The "choice" in CAP is really about when you are willing to sacrifice availability. Do you want your system to remain responsive even if it might be serving stale data during an outage between nodes, or do you want it to be strictly correct but potentially unresponsive when nodes can’t talk to each other?

The most surprising true thing about the CAP theorem is that it’s often misinterpreted as a strict, static choice made at design time. In reality, many systems dynamically adjust their behavior. For instance, a system might aim for strong consistency by default, but if it detects a long-standing partition and the business needs dictate, it might shift to prioritize availability for certain operations to avoid complete service interruption. The theorem describes the guarantees you can have, not necessarily the static configuration of a system.

The next concept you’ll grapple with is how different consistency models (like eventual consistency, causal consistency, and linearizability) map onto the CAP theorem’s guarantees.

Want structured learning?

Take the full Distributed Systems course →