Distributed systems can’t agree on the "time" of events across different machines, so they invent their own "clocks" to keep track of causality.

Let’s see how these clocks work with a simple example. Imagine we have three processes, P1, P2, and P3, each with its own state and a clock.

graph TD
    P1(Process 1)
    P2(Process 2)
    P3(Process 3)

    P1 -- Message A --> P2
    P2 -- Message B --> P3
    P3 -- Message C --> P1

When P1 sends Message A to P2, P1 increments its clock. When P2 receives Message A, it updates its clock to be at least one greater than the received timestamp and then increments it again. This ensures that P2’s clock is always ahead of P1’s clock after the message exchange, indicating that P1’s event happened before P2’s.

This is the essence of Lamport clocks. Each process maintains a single integer counter.

  • When a process experiences an internal event (e.g., performs a computation), it increments its counter.
  • When a process sends a message, it piggybacks its current counter value on the message.
  • When a process receives a message, it updates its counter to be the maximum of its current counter and the received message’s counter, and then increments its counter.

This mechanism guarantees that if event a happens before event b in real time, then LamportClock(a) < LamportClock(b). However, the converse is not always true: LamportClock(a) < LamportClock(b) does not necessarily mean a happened before b. This is because two events can have causally unrelated timestamps that still satisfy this inequality. For instance, if P1 sends a message to P2, and P3 sends a message to P4, and their clocks happen to be 5 and 6 respectively, we can’t say anything about the causal relationship between these events.

To address this limitation and capture more precise causal relationships, we use Vector Clocks. A Vector Clock is not a single number but a vector of integers, where each element in the vector corresponds to the logical clock of a specific process in the system.

Consider our three processes again, P1, P2, and P3. Their Vector Clocks would be represented as VC = [VC1, VC2, VC3].

  • Initially, all vector clocks are [0, 0, 0].
  • When P1 experiences an internal event: VC1 increments. So, [1, 0, 0].
  • When P1 sends a message to P2: P1 increments its own clock ([2, 0, 0]) and sends this vector along with the message.
  • When P2 receives the message from P1: P2 updates its vector clock by taking the element-wise maximum of its own clock and the received clock, and then increments its own element. If P2’s clock was [1, 1, 0] and it receives [2, 0, 0], its new clock becomes [max(1,2), max(1,0), max(0,0)] which is [2, 1, 0], and then it increments its own component: [3, 1, 0].

This process ensures that VC(a) < VC(b) (meaning VCa[i] <= VCb[i] for all i, and VCa[k] < VCb[k] for at least one k) if and only if event a causally precedes event b. This is a much stronger guarantee than Lamport clocks.

Let’s trace a scenario:

  1. P1: [1, 0, 0] (internal event)
  2. P1 sends to P2: [2, 0, 0] (message timestamp)
  3. P2 receives: VC_P2 becomes [max(VC_P2[0], 2), max(VC_P2[1], 0), max(VC_P2[2], 0)] and then VC_P2[1] increments. If VC_P2 was [0,0,0], it becomes [2, 1, 0].
  4. P2: [2, 2, 0] (internal event)
  5. P2 sends to P3: [2, 3, 0] (message timestamp)
  6. P3 receives: VC_P3 becomes [max(VC_P3[0], 2), max(VC_P3[1], 3), max(VC_P3[2], 0)] and then VC_P3[2] increments. If VC_P3 was [0,0,0], it becomes [2, 3, 1].
  7. P3: [2, 3, 2] (internal event)
  8. P3 sends to P1: [2, 3, 3] (message timestamp)
  9. P1 receives: VC_P1 becomes [max(VC_P1[0], 2), max(VC_P1[1], 3), max(VC_P1[2], 3)] and then VC_P1[0] increments. If VC_P1 was [2,0,0], it becomes [max(2,2), max(0,3), max(0,3)] which is [2, 3, 3], and then VC_P1[0] increments: [3, 3, 3].

Notice that when P1 receives from P3, its vector clock is [2, 3, 3]. This means P1 knows that P2’s clock is at least 3 and P3’s clock is at least 3. P1 itself has only advanced its own clock to 3. This correctly reflects that P1’s last internal event happened after it received the message from P3, and P3’s clock was already at 3 when it sent the message.

The key difference lies in how they handle concurrent events. If two events a and b are concurrent (neither causally precedes the other), their Lamport timestamps might be ordered, leading to ambiguity. With vector clocks, if VC(a) and VC(b) are not comparable (i.e., VCa[i] > VCb[i] for some i and VCa[j] < VCb[j] for some other j), then a and b are concurrent. This allows systems to detect and handle concurrent operations correctly.

A common pitfall is forgetting to update the local clock after merging a received vector clock. If P2 receives a message from P1 with timestamp [2, 0, 0] while its own clock is [1, 1, 0], it first computes [max(1,2), max(1,0), max(0,0)] = [2, 1, 0]. If it stops there, it incorrectly signals that P1’s event is concurrent with P2’s second internal event. The crucial step is then incrementing its own component: [2, 1, 0] becomes [3, 1, 0]. This final increment is what signifies that P2 has now experienced an event after processing the message from P1.

The comparison of vector clocks VCa and VCb is defined as:

  • VCa <= VCb if VCa[i] <= VCb[i] for all i.
  • VCa < VCb if VCa <= VCb and VCa != VCb.
  • VCa || VCb (concurrent) if neither VCa <= VCb nor VCb <= VCa.

The next concept to grapple with is how these clocks are used for garbage collection of distributed state, specifically how to know when a piece of data can be safely deleted because no process can possibly refer to it anymore.

Want structured learning?

Take the full Distributed Systems course →