CockroachDB’s geo-partitioning lets you pin data to specific geographic regions, but the guarantee is narrower than you might assume: it controls where replicas are stored, not which nodes can read or serve the data.

Let’s see this in action. Imagine we have a table users and we want to ensure user data from the US stays in us-east1 and user data from Europe stays in europe-west1.

CREATE TABLE users (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    name STRING,
    country STRING NOT NULL,
    -- The partitioning column must be a prefix of the primary key
    PRIMARY KEY (country, id)
)
PARTITION BY LIST (country) (
    PARTITION us VALUES IN ('US'),
    PARTITION europe VALUES IN ('DE', 'FR', 'ES', 'IT')
);

-- Now pin each partition to a region with a zone configuration.
-- This assumes the nodes were started with --locality=region=us-east1
-- and --locality=region=europe-west1 respectively.
ALTER PARTITION us OF TABLE users
    CONFIGURE ZONE USING constraints = '[+region=us-east1]';

ALTER PARTITION europe OF TABLE users
    CONFIGURE ZONE USING constraints = '[+region=europe-west1]';

When you execute INSERT INTO users (id, name, country) VALUES (gen_random_uuid(), 'Alice', 'US');, CockroachDB writes the row into the partition that contains 'US', and the replicas of that partition’s ranges live on nodes in us-east1. Similarly, a row with country 'DE' is stored in europe-west1. The statement itself can be sent to any node in the cluster; the node that receives it forwards the write to the range that owns the row.
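To check that the partitions and their placement rules are what you expect, you can inspect them directly. This is a minimal sketch that assumes the table has a partition named us (a hypothetical name chosen for illustration) with a zone configuration attached to it.

```sql
-- List every partition defined on the table, with its values and zone config.
SHOW PARTITIONS FROM TABLE users;

-- Show the zone configuration that applies to one partition.
-- "us" is an assumed partition name; substitute your own.
SHOW ZONE CONFIGURATION FROM PARTITION us OF TABLE users;
```

If the second statement reports the table’s or the cluster’s default configuration rather than a region constraint, the partition exists but is not pinned anywhere yet.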

The problem this solves is data residency and compliance. If regulations require certain data to reside within specific geographic boundaries, geo-partitioning allows you to meet those requirements. It also improves performance by reducing latency for users accessing data geographically close to them.

Internally, CockroachDB combines two things: partition definitions and zone configurations. Partitioning splits the table’s primary index into separate key ranges by country, and each partition’s zone configuration tells the cluster which nodes may hold replicas of those ranges, based on the locality each node was started with. When a statement arrives, the optimizer uses the country value in the WHERE clause or the INSERT statement to work out which partition is involved, and the receiving node sends the request to the leaseholder of the matching range. If a value doesn’t match any partition, the row is governed by the table’s own zone configuration, which by default allows replicas anywhere in the cluster.
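Because reads and writes are served by a range’s leaseholder, it can help to state a lease preference alongside the replica constraint, particularly if only some replicas are pinned to the region. The sketch below assumes partitions named us and europe (hypothetical names) and nodes whose locality includes region=us-east1 and region=europe-west1.

```sql
-- Prefer leaseholders in the same region as the partition's data.
ALTER PARTITION us OF TABLE users
    CONFIGURE ZONE USING lease_preferences = '[[+region=us-east1]]';

ALTER PARTITION europe OF TABLE users
    CONFIGURE ZONE USING lease_preferences = '[[+region=europe-west1]]';

-- Inspect the plan for a query that filters on the partitioning column.
EXPLAIN SELECT name FROM users WHERE country = 'US';
```

A lease preference is a preference rather than a hard rule: if no node in the preferred region is available, the lease can move elsewhere.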

The exact levers you control are the PARTITION BY clause, which defines how rows are grouped (in this case, by country code), and the zone configuration attached to each partition with ALTER PARTITION ... CONFIGURE ZONE, which maps that partition to a CockroachDB region. The zone configuration is crucial; without it, a partition is only a logical grouping and has no effect on where the data is physically stored.

What’s often overlooked is that zone constraints govern where replicas are stored, not where queries run. If a node in us-east1 receives a statement for a row that lives in europe-west1 (perhaps because the application connected to the wrong region), it still executes the query by fetching the row across regions, so the data leaves its home region in transit and the query incurs cross-region latency. The system doesn’t block access based on geography by default. Placement also has gaps: constraint changes are applied asynchronously as replicas are rebalanced, and secondary indexes that are not partitioned follow the table’s zone configuration rather than the partition’s.
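One placement gap you can close explicitly is secondary indexes, which are partitioned and configured separately from the primary index. The following is a sketch only: the index name users_by_name is hypothetical, and it assumes the same country groupings and region names used for the table.

```sql
-- Partition the secondary index by the same column as the table.
CREATE INDEX users_by_name ON users (country, name)
    PARTITION BY LIST (country) (
        PARTITION us VALUES IN ('US'),
        PARTITION europe VALUES IN ('DE', 'FR', 'ES', 'IT')
    );

-- Pin each index partition to the same region as the matching table partition.
ALTER PARTITION us OF INDEX users@users_by_name
    CONFIGURE ZONE USING constraints = '[+region=us-east1]';

ALTER PARTITION europe OF INDEX users@users_by_name
    CONFIGURE ZONE USING constraints = '[+region=europe-west1]';
```

This keeps index entries in the same region as the rows they point to, but it does not stop a node in another region from reading them.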

The next logical step is understanding how to handle queries that span multiple geo-partitions and the implications for transaction consistency.
