CockroachDB’s ability to survive the simultaneous failure of an entire data center is its most surprising strength, and it achieves this not by magic, but by treating every node as a potential witness.
Imagine you have three nodes, node1, node2, and node3, spread across us-east-1a, us-east-1b, and eu-west-1a respectively. This is a typical multi-region deployment. CockroachDB uses a consensus protocol called Raft to ensure data consistency. For any piece of data, a majority of the replicas for that data must agree on its state. In a three-replica setup (the default for most data), this means at least two replicas must agree.
Let’s see this in action. We’ll set up a small cluster with three nodes, each in a different region.
# Node 1: us-east-1a
cockroach start --certs-dir=certs --node=1 \
--locality=region:us-east1,cloud:aws,aws-region:us-east-1a \
--join=node2:26257,node3:26257 \
--advertise-addr=node1.example.com:26257 \
--http-addr=node1.example.com:8080 \
--background
# Node 2: us-east-1b
cockroach start --certs-dir=certs --node=2 \
--locality=region:us-east1,cloud:aws,aws-region:us-east-1b \
--join=node1:26257,node3:26257 \
--advertise-addr=node2.example.com:26257 \
--http-addr=node2.example.com:8080 \
--background
# Node 3: eu-west-1a
cockroach start --certs-dir=certs --node=3 \
--locality=region:eu-west1,cloud:aws,aws-region:eu-west-1a \
--join=node1:26257,node2:26257 \
--advertise-addr=node3.example.com:26257 \
--http-addr=node3.example.com:8080 \
--background
Once the cluster is up, you can connect to it using the SQL client:
cockroach sql --certs-dir=certs --host=node1.example.com:26257
Inside the SQL client, you can see the localities of your nodes:
SHOW CLUSTER SETTINGS;
You’ll see output like:
variable | value
-----------+------------------------------------------------------
... | ...
geo.regions | {"aws":{"us-east-1a":{},"us-east-1b":{}},"eu-west-1":{"eu-west-1a":{}}}
... | ...
This geo.regions setting is crucial. It tells CockroachDB how to distribute data. By default, for each range of data, CockroachDB will try to place one replica in each of the specified regions. So, for a range of data, you might have replicas on node1 (us-east-1a), node2 (us-east-1b), and node3 (eu-west-1a).
When a write operation occurs, say to a table in the defaultdb database, the Raft leader for that range will coordinate the write. If the leader is on node1, it will send the write request to its followers on node2 and node3. For the write to be committed, a majority of these replicas (at least two) must acknowledge that they’ve received and persisted the write. This ensures that even if node1 (and its entire region) goes down, the data is still available and can be served by node2 and node3.
The key to multi-region resilience is the locality setting on each node and the geo.regions cluster setting. When you define locality=region:us-east1,cloud:aws,aws-region:us-east-1a, you’re not just tagging the node; you’re telling CockroachDB that this node is part of the us-east1 logical region, within the aws cloud, and specifically in the us-east-1a availability zone. CockroachDB uses this information to make intelligent placement decisions for data replicas.
The geo.regions cluster setting is where you declare the distinct geographical regions your cluster spans. CockroachDB uses this to ensure that for critical data, it attempts to place replicas across these regions. The default replication factor is 3, meaning it tries to have 3 copies of your data. With multi-region, it aims to place one replica in each region you’ve defined in geo.regions, up to the replication factor. If you have more regions than your replication factor, it will pick regions to distribute across. If you have fewer replicas than regions, it will place one in each region.
A common pitfall is assuming that simply having nodes in different regions automatically makes your data resilient. You must configure locality correctly on each node and ensure geo.regions reflects your deployment. If your geo.regions setting is not populated or incorrectly configured, CockroachDB might place all replicas within a single region, defeating the purpose of a multi-region deployment.
When you configure a table_locality for a specific table, you can override the cluster-wide default. For example, ALTER TABLE my_table_high_traffic EXPERIMENTAL_RELOCATE VALUES FROM (DEFAULT) TO (REGION 'us-east1', REGION 'us-west1'); instructs CockroachDB to ensure replicas for my_table_high_traffic are distributed across us-east1 and us-west1. This is how you achieve geo-partitioning and tailor data placement for specific workloads.
The "least surprising" aspect of CockroachDB’s multi-region setup is how it leverages the Raft consensus protocol. Raft inherently requires a majority of nodes to agree on operations. By ensuring that these majority nodes are geographically distributed, you achieve resilience against regional outages. The surprise is that this isn’t a separate, complex mechanism; it’s a direct consequence of how Raft works when nodes are aware of their distributed localities.
The next challenge you’ll face is optimizing query performance across these regions, which often involves understanding and tuning crdb_internal.region_liveness to influence query routing.