Hash-sharded indexes are the secret weapon for taming index hotspots in CockroachDB, letting you scale writes beyond the capacity of a single range.

Let’s see it in action. Imagine a users table with a high-volume INSERT workload and a sequential id column. Here id defaults to unique_rowid(), which generates roughly time-ordered values, so every new row lands at the tail of the primary index and that part of the index becomes a hotspot.

CREATE TABLE users (
    id INT8 PRIMARY KEY DEFAULT unique_rowid(),
    username STRING UNIQUE,
    email STRING,
    created_at TIMESTAMP DEFAULT now()
);

-- High insert rate
INSERT INTO users (username, email) VALUES
('user1', 'user1@example.com'),
('user2', 'user2@example.com');
-- ... millions more

Without sharding, these inserts all target the range that holds the highest id values, so a single range (and the nodes holding its replicas) absorbs the entire write load.
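The hotspot can be illustrated with a toy model in Python. This is only a sketch: the split points and id values below are made up, and real CockroachDB ranges split by size and load rather than at fixed boundaries.

```python
import bisect
from collections import Counter

# Hypothetical split points: each range owns the keys from its split key
# up to (but not including) the next one. Illustrative values only.
SPLIT_KEYS = [0, 1000, 2000, 3000]

def range_for_key(key: int) -> int:
    """Return the index of the range whose span contains `key`."""
    return bisect.bisect_right(SPLIT_KEYS, key) - 1

# 500 new rows with monotonically increasing ids, all past the last split point.
new_ids = range(3000, 3500)
writes_per_range = Counter(range_for_key(k) for k in new_ids)
print(dict(writes_per_range))  # {3: 500}
```

Every write lands in the last range, which is the pattern a sequential key produces on an ordered index.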

The solution? Hash-sharding the primary index. We’ll create a new table whose primary key is declared with USING HASH. This tells CockroachDB to distribute the index keys across multiple ranges based on a hash of the primary key. The id column here defaults to unique_rowid(), a roughly time-ordered generator, which is the kind of sequential key that benefits from sharding.

CREATE TABLE users_sharded (
    id INT8 DEFAULT unique_rowid(),
    username STRING UNIQUE,
    email STRING,
    created_at TIMESTAMP DEFAULT now(),
    PRIMARY KEY (id) USING HASH
);

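Now, when you INSERT data, CockroachDB computes a hash of the id (the primary key), reduces it modulo the number of buckets, and stores the result in a hidden computed shard column that prefixes the index key. Consecutive ids therefore land in different parts of the keyspace, which spreads the write load across multiple ranges and, in turn, multiple nodes.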
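To make the hashing step concrete, here is a small Python simulation. It is a sketch under stated assumptions: zlib.crc32 stands in for CockroachDB's internal hash function, and the id values and bucket count are illustrative.

```python
import zlib
from collections import Counter

BUCKET_COUNT = 16  # illustrative bucket count

def shard_for(key: int, bucket_count: int = BUCKET_COUNT) -> int:
    """Stand-in shard function: deterministic hash of the key, modulo the bucket count."""
    return zlib.crc32(str(key).encode()) % bucket_count

# The same kind of sequential ids that would pile up at the tail of an ordered index.
sequential_ids = range(3000, 3500)
writes_per_shard = Counter(shard_for(k) for k in sequential_ids)

# Print how many of the 500 writes each shard received.
for shard, count in sorted(writes_per_shard.items()):
    print(f"shard {shard:2d}: {count} writes")
```

Because the shard value leads the index key, neighbouring ids no longer sit next to each other in the keyspace, so no single range receives all of the new rows.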

The mental model is simple: instead of one hot spot at the tail of the index, you have many independent shards, each of which takes its own share of the writes. By hashing the primary key, we ensure that new keys are spread across these shards, preventing any single range from becoming a bottleneck. The USING HASH clause is the switch. CockroachDB handles the mechanics of distributing data and routing queries automatically.

You control the level of sharding by specifying the number of buckets (shards). The default is 16, which is usually a good starting point. You can set it explicitly with the bucket_count storage parameter.

CREATE TABLE users_sharded_16 (
    id INT8 DEFAULT unique_rowid(),
    username STRING UNIQUE,
    email STRING,
    created_at TIMESTAMP DEFAULT now(),
    PRIMARY KEY (id) USING HASH WITH (bucket_count = 16)
);

Choosing the bucket count involves a trade-off. More buckets give finer-grained distribution and potentially higher write throughput. Fewer buckets reduce the fan-out of range scans: because consecutive keys live in different shards, a scan over a range of key values has to visit every shard, while a point lookup can compute the shard from the key and visit only one. Hash sharding pays off for sequential keys such as timestamps or sequence-generated ids. Randomly distributed keys, such as UUIDs from gen_random_uuid(), already spread across ranges and usually do not need it.
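The trade-off can be sketched with the same kind of toy model. Again, zlib.crc32 is only a stand-in for the real hash function, and the keys and bucket counts are hypothetical; the point is to compare how many shards a point lookup and a range of consecutive keys touch.

```python
import zlib

def shard_for(key: int, bucket_count: int) -> int:
    """Stand-in shard function: deterministic hash of the key, modulo the bucket count."""
    return zlib.crc32(str(key).encode()) % bucket_count

def shards_for_point_lookup(key: int, bucket_count: int) -> set:
    # The shard is a pure function of the key, so a lookup targets exactly one shard.
    return {shard_for(key, bucket_count)}

def shards_holding_range(lo: int, hi: int, bucket_count: int) -> set:
    # Consecutive keys hash to unrelated shards, so a key range is scattered.
    return {shard_for(k, bucket_count) for k in range(lo, hi)}

for buckets in (4, 16, 64):
    point = shards_for_point_lookup(3250, buckets)
    scattered = shards_holding_range(3000, 3500, buckets)
    print(f"buckets={buckets}: point lookup -> {len(point)} shard, "
          f"500-key range -> {len(scattered)} shards")
```

The point lookup always resolves to one shard, while the number of shards a key range is scattered over grows with the bucket count, which is the cost paid for better write distribution.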

The surprising thing is that USING HASH applies to an index, not to the table itself. This means you can hash-shard a secondary index just as easily as the primary. When you do this, CockroachDB hashes the indexed columns to determine which shard each index entry belongs to, distributing the index maintenance load. This matters when your hotspot is on a secondary index that serves your WHERE or ORDER BY clauses rather than on the primary key.

CREATE TABLE orders (
    order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID,
    order_date DATE,
    status STRING,
    INDEX customer_date_idx (customer_id, order_date) STORING (status)
);

-- Hotspot on customer_id for a specific date range

To shard the customer_date_idx:

CREATE TABLE orders_sharded_idx (
    order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID,
    order_date DATE,
    status STRING,
    INDEX customer_date_idx (customer_id, order_date) USING HASH STORING (status) WITH (bucket_count = 512)
);

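This breaks up the index entries for (customer_id, order_date) across multiple shards, distributing the write load on the index. Note that the shard is derived from both columns: a query that fixes customer_id and order_date can go straight to one shard, while a query on customer_id alone has to check every shard.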
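A short Python sketch shows why the set of filtered columns matters. It assumes, for illustration only, that the shard is a crc32 hash of both indexed columns modulo the bucket count; the customer value and helper names are hypothetical.

```python
import zlib
from datetime import date

BUCKET_COUNT = 512  # bucket count used for customer_date_idx in the example

def index_shard(customer_id: str, order_date: date,
                bucket_count: int = BUCKET_COUNT) -> int:
    """Stand-in shard function over both indexed columns."""
    key = f"{customer_id}|{order_date.isoformat()}".encode()
    return zlib.crc32(key) % bucket_count

def shards_to_probe(customer_id: str, order_date: date = None,
                    bucket_count: int = BUCKET_COUNT) -> list:
    # With both columns known, the shard can be computed directly.
    if order_date is not None:
        return [index_shard(customer_id, order_date, bucket_count)]
    # With order_date unknown, the shard cannot be derived: check them all.
    return list(range(bucket_count))

customer = "hypothetical-customer-id"
print(len(shards_to_probe(customer, date(2024, 1, 15))))  # 1
print(len(shards_to_probe(customer)))                     # 512
```

If most queries filter on customer_id alone, a large bucket count multiplies the work of each such query, so it is worth weighing against the write distribution it buys.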

When you hash-shard an index, CockroachDB adds a hidden computed shard column to the table and makes it the leading column of that index. The shard value is derived only from the columns participating in the index, so the primary key plays no part in the hash. The sharded secondary index still carries the primary key columns, as every secondary index does, so that a lookup can find the full row. Because the hash depends only on the indexed columns, index maintenance is distributed the same way regardless of how large the primary key is.

Understanding how the bucket count interacts with your data cardinality is the next step to mastering performance.
