CockroachDB doesn’t actually use traditional B-tree indexes for joins; it uses them for lookup which is a completely different beast.

Let’s see how CockroachDB handles joins, and how you can make them sing.

Consider these two tables:

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100)
);

CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    product VARCHAR(50),
    amount DECIMAL(10, 2),
    INDEX user_idx (user_id) -- This index is key
);

And some data:

INSERT INTO users (username, email) VALUES ('alice', 'alice@example.com'), ('bob', 'bob@example.com');

INSERT INTO orders (user_id, product, amount) VALUES
('00000000-0000-0000-0000-000000000001', 'widget', 10.00), -- Assuming this is alice's UUID
('00000000-0000-0000-0000-000000000001', 'gadget', 25.50),
('00000000-0000-0000-0000-000000000002', 'thingamajig', 5.75); -- Assuming this is bob's UUID

Now, a common query:

SELECT u.username, o.product, o.amount
FROM users u
JOIN orders o ON u.id = o.user_id;

If you run EXPLAIN on this, you’ll see something like:

• distributed_sort
  • hash_join
    • scan_unfinished_transactions
      • distributed_lookup: users
    • scan_unfinished_transactions
      • distributed_lookup: orders

This is where the magic (and potential confusion) happens. CockroachDB uses hash joins by default for joins between tables that aren’t physically co-located or where the join condition doesn’t allow for a simple merge.

Here’s the breakdown:

  1. distributed_lookup: users: This part is scanning the users table. Since users.id is the primary key, CockroachDB can efficiently look up rows by id.
  2. distributed_lookup: orders: This part is scanning the orders table. Crucially, it’s using the user_idx index on orders.user_id. Without this index, it would have to scan the entire orders table, which would be much slower.
  3. hash_join: This is the core of the join operation. CockroachDB will:
    • Build a hash table in memory on one side of the join (usually the smaller side, or the side that’s being iterated over first).
    • Iterate through the other side, hashing each join key and looking for matches in the hash table.
    • This is highly efficient for large datasets if the hash table fits in memory.

Why is the INDEX user_idx (user_id) so important here?

Without user_idx, the distributed_lookup: orders would become a full table scan. CockroachDB would have to read every single row in orders and then, for each row, look up the corresponding user in users. With the index, it can quickly find all orders associated with a given user_id. This is the "lookup" part that the index enables.

What if the join condition isn’t on an indexed column?

If you had users.username and orders.product_name and joined on those, and neither column had an index, both distributed_lookup steps would become full table scans. You’d then be performing a Cartesian product and filtering, which is terrible for performance.

When does CockroachDB use something other than a hash join?

If the data is co-located (e.g., you have a CREATE TABLE ... AS SELECT that preserves locality, or you’ve explicitly used REGIONAL BY TABLE or GLOBAL table settings that align join keys), and the join keys are sorted identically on both sides, CockroachDB might opt for a merge_join. This is even more efficient as it avoids building a hash table, simply merging sorted streams. However, hash joins are far more common and generally what you’ll see.

The Counterintuitive Part: Indexes and Joins Aren’t "Linked" Like in Traditional RDBMS

In many traditional databases, an index on orders.user_id would be directly used by the query planner to "seek" into the users table. In CockroachDB, the index on orders.user_id is used to efficiently scan the orders table for matching user_ids. The join itself is then performed by the hash_join operator, which takes the output of the distributed_lookup operations (which used the indexes for efficient lookups/scans) and combines them. The index doesn’t directly feed into the users table’s lookup during the join; rather, both sides of the join are efficiently prepared using their respective indexes (or primary keys).

To optimize this join, you need to ensure:

  1. Indexes on Join Columns: The orders table must have an index on user_id. If you were joining users to orders on users.username = orders.username, you’d need an index on orders.username as well.
  2. Hash Join Efficiency: The hash_join is generally good. If it’s spilling to disk (indicated by EXPLAIN showing disk_spill or high temp_storage usage in SHOW CLUSTER METRICS), you might need to increase the max_memory setting for your nodes, or ensure your join is on a smaller subset of data.
  3. Data Locality (Advanced): For extreme performance, if users.id and orders.user_id frequently join and you have a high cardinality of users, you might consider co-locating the orders table data with the users table data. This is typically done by creating the orders table using AS SELECT from users and selecting users.id as the range key, or by using REGIONAL BY TABLE and ensuring user_id is the first column in the orders table’s range key. This can allow for merge joins or even eliminate distributed joins entirely if the join is within a single node.

The next problem you’ll likely encounter is understanding how EXPLAIN output changes when you introduce EXPLAIN (OPT, ENV).

Want structured learning?

Take the full Cockroachdb course →