Paxos is a consensus algorithm that allows a distributed system to agree on a single value, even if some nodes fail or messages are lost.

Let’s see it in action. Imagine a simple distributed key-value store where multiple replicas need to agree on the latest value for a key.

{
  "key": "user_profile:123",
  "value": {
    "name": "Alice",
    "email": "alice@example.com"
  },
  "version": 5
}

Here, version is crucial. Every update to a key increments its version. Paxos ensures that all replicas agree on the highest version number and its associated value.

At its core, Paxos involves two main roles: Proposers and Acceptors. Proposers suggest values, and Acceptors vote on these suggestions. The goal is to have a majority of Acceptors agree on a single proposed value.

The process has two phases:

Phase 1: Prepare

  1. A Proposer wanting to propose a value (say, v with version n) sends a Prepare(n) request to a majority of Acceptors.
  2. An Acceptor that receives Prepare(n):
    • If n is greater than any proposal number it has seen before, it promises not to accept any future proposals with a number less than n. It then responds with Promise(n, highest_accepted_proposal_number, highest_accepted_value). The highest_accepted_proposal_number and highest_accepted_value are the latest proposal it has accepted (if any).
    • If n is not greater than a proposal number it has already promised, it ignores the request.

Phase 2: Accept

  1. If a Proposer receives Promise responses from a majority of Acceptors for its Prepare(n) request:
    • If any Acceptor reported having already accepted a value, the Proposer must choose the value associated with the highest proposal number reported by those Acceptors. If no value was reported, it can propose its own v.
    • The Proposer then sends an Accept(n, chosen_value) request to the same majority of Acceptors.
  2. An Acceptor that receives Accept(n, chosen_value):
    • If n is greater than or equal to any proposal number it has previously promised, it accepts the proposal. It records (n, chosen_value) as its latest accepted proposal and responds with Accepted(n, chosen_value).
    • Otherwise, it ignores the request.

If a Proposer receives Accepted responses from a majority of Acceptors, the value is considered chosen.

The problem Paxos solves is consistency in the face of failures. In a distributed system, nodes can crash, network partitions can occur, and messages can be delayed or lost. Without a mechanism like Paxos, different replicas of your data could end up with different versions of the truth. Paxos guarantees that even under these conditions, the system will eventually converge on a single, agreed-upon value.

The "real levers" you control when implementing or tuning Paxos are primarily related to:

  • Quorum Size: The number of Acceptors required to form a majority. This directly impacts fault tolerance. A larger majority means more nodes can fail, but it also increases latency as more nodes need to respond.
  • Proposal Numbers: These must be strictly increasing and unique. Typically, they are composed of a sequence number and a node ID (e.g., (10, node_3)). This ensures that older proposals are always superseded by newer ones.
  • Leader Election (for Multi-Paxos): In practice, a single Proposer (often called a Leader) is elected to simplify the process and reduce contention. If the Leader fails, a new one is elected, and it resumes the Paxos protocol.

One of the most subtle aspects of Paxos is how it handles a Proposer that crashes after sending Prepare but before sending Accept. If another Proposer then successfully runs the protocol and chooses a value, the original Proposer (if it recovers) might try to re-initiate a Prepare with an even higher sequence number. However, because it has promised not to accept proposals lower than its previous Prepare number, it might get stuck. The key is that any new Proposer must check for previously accepted values from any Acceptor that has already promised. If an Acceptor has already accepted a value, the new Proposer must use that value, ensuring progress.

The next concept you’ll grapple with is how to make Paxos efficient enough for real-world databases, which often leads to exploring variations like Multi-Paxos or Raft.

Want structured learning?

Take the full Distributed Systems course →