CRDTs are a lie. They don’t actually merge data; they prevent merge conflicts from ever happening.
Let’s say you have a simple counter. Normally, if two people increment it simultaneously, you get a race condition: one increment might be lost. With a CRDT counter, this is impossible.
Imagine two clients, Alice and Bob, both starting with a counter value of 0.
Alice increments her counter. Her local state becomes 1.
Bob increments his counter. His local state becomes 1.
Now, Alice and Bob sync their states. Instead of a simple "last write wins" or a complex merge algorithm, CRDTs have a specific mathematical property. For a counter, this is often implemented as a pair of values: a "grow-only counter" (G-counter) that only ever increases, and a "tombstone" or "replicated grow-only counter" (RG-counter) that also only increases. When merging, you take the maximum of the two components.
Here’s a simplified view of what’s happening under the hood for a G-counter:
// Alice's state
{
"counters": {
"clientA": 1,
"clientB": 0
}
}
// Bob's state
{
"counters": {
"clientA": 0,
"clientB": 1
}
}
When Alice and Bob sync, their states are merged by taking the maximum value for each key:
// Merged state
{
"counters": {
"clientA": Math.max(alice.counters.clientA, bob.counters.clientA), // Math.max(1, 0) = 1
"clientB": Math.max(alice.counters.clientB, bob.counters.clientB) // Math.max(0, 1) = 1
}
}
The total value of the counter is the sum of all values in the counters map. In this case, 1 + 1 = 2. Both Alice and Bob will eventually see 2 without any explicit conflict resolution.
This "magic" works because CRDTs are designed to be commutative, associative, and idempotent. This means the order of operations doesn’t matter, you can group them however you like, and applying an operation multiple times has the same effect as applying it once.
Let’s look at another common CRDT: a Last-Writer-Wins Register (LWW-Register). This is useful for fields like a user’s name or a status message.
Alice changes her name to "Alice". Her local state:
{"name": "Alice", "timestamp": 1678886400000}
Bob changes his name to "Bob". His local state:
{"name": "Bob", "timestamp": 1678886405000}
When they sync, the merge logic is simple: take the value with the higher timestamp.
{"name": "Bob", "timestamp": 1678886405000}
Bob’s change "wins" because it has a later timestamp. This is predictable and deterministic.
What if they have the same timestamp? This is where things get interesting. Most CRDT implementations will use a tie-breaker, often the unique client ID.
Alice changes her name to "Alice". Her local state:
{"name": "Alice", "timestamp": 1678886400000, "client_id": "clientA"}
Bob changes his name to "Bob". His local state:
{"name": "Bob", "timestamp": 1678886400000, "client_id": "clientB"}
The merge logic:
- Compare timestamps. If different, pick the higher one.
- If timestamps are equal, compare client IDs. Pick the one that comes later lexicographically (or according to some predefined ordering).
In this case, clientB comes after clientA, so Bob’s change wins.
{"name": "Bob", "timestamp": 1678886400000, "client_id": "clientB"}
The system doesn’t just handle conflicts; it’s designed so that conflicts, in the traditional sense, never arise. Instead, you have concurrent updates that are merged according to deterministic rules. This leads to eventual consistency, where all replicas will eventually agree on the same state, provided they receive all updates.
The true power of CRDTs lies in their ability to provide strong guarantees about convergence without requiring a central authority for every single operation. This makes them ideal for distributed systems, collaborative editing, and scenarios where network partitions are common.
The most surprising aspect is how simple data structures like sets and lists can be transformed into conflict-free versions. For a set, you might have a "grow-only set" (G-Set) where elements can only be added, or a "replicated grow-only set" (RG-Set) that supports adding and removing elements. For removals, you use a separate "tombstone" set. When merging, you add all elements from both sets and then remove any element that exists in the tombstone set of either replica. This ensures that once an element is "removed" (marked as a tombstone), it stays removed across all replicas.
If you’re working with distributed systems and need to ensure data consistency across multiple nodes that might be offline or experiencing network issues, you’ll eventually need to consider how to handle concurrent writes.