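CockroachDB’s query optimizer is only as good as the options you give it. It picks the cheapest plan among those your indexes make possible, judged by the statistics it has collected, and it’s your job to make sure the best plan is among them.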
Let’s see it in action. Imagine we have a simple users table:
CREATE TABLE users (
user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
username STRING NOT NULL UNIQUE,
email STRING NOT NULL UNIQUE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT now(),
last_login TIMESTAMP WITH TIME ZONE,
profile_data JSONB
);
And we want to find users who logged in after a specific date:
SELECT user_id, username, email
FROM users
WHERE last_login > '2023-10-26 10:00:00+00';
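Without an index on last_login, CockroachDB has to scan the entire users table, checking last_login for every single row. The table has only its primary index on user_id plus the unique indexes on username and email, and none of these can narrow this filter. On a large table, this is slow.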
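The EXPLAIN statement is your primary tool here. Running EXPLAIN SELECT ... shows the plan the optimizer chose. Abridged (the exact format varies between CockroachDB versions), a full table scan looks something like this:
• filter
│ filter: last_login > '2023-10-26 10:00:00+00'
│
└── • scan
      estimated row count: 1,000,000
      table: users@users_pkey
      spans: FULL SCAN
The spans: FULL SCAN line on the primary index (users_pkey) tells you it’s going to examine every row.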
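As a sketch using the table and cutoff from above, the statements below produce that plan for our query. EXPLAIN only plans the statement. EXPLAIN ANALYZE actually executes it and reports measured row counts and execution time alongside the plan, so run it only where executing the query is acceptable.

```sql
-- Show the optimizer's chosen plan without running the query.
EXPLAIN SELECT user_id, username, email
FROM users
WHERE last_login > '2023-10-26 10:00:00+00';

-- Execute the query and report actual rows read and execution time
-- next to the estimated plan.
EXPLAIN ANALYZE SELECT user_id, username, email
FROM users
WHERE last_login > '2023-10-26 10:00:00+00';
```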
To speed this up, we need an index on last_login. The simplest option is CREATE INDEX idx_last_login ON users (last_login);. That already turns the full scan into a range scan, but it isn’t the best possible index for this query.
Here is a variant that also picks a sort direction:
CREATE INDEX idx_users_last_login ON users (last_login DESC);
Notice DESC. CockroachDB can scan an index in either direction, so for a plain range filter like ours the direction matters little. It pays off when queries ask for the most recent rows first (ORDER BY last_login DESC, often with a LIMIT): a descending index returns rows in that order without a reverse scan. If you mostly read the oldest items first, ASC is the natural choice. This choice depends on your typical query patterns.
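A descending index pays off most clearly in a "latest N" query whose ORDER BY matches the index direction. The sketch below assumes such a query is part of your workload; the LIMIT of 20 is an arbitrary illustration.

```sql
-- Most recent 20 logins after the cutoff. With idx_users_last_login
-- (last_login DESC), index entries are already in this order, so the plan
-- should need no separate sort and can stop after 20 entries.
-- username and email still require a lookup into the primary index per row.
SELECT user_id, username, email
FROM users
WHERE last_login > '2023-10-26 10:00:00+00'
ORDER BY last_login DESC
LIMIT 20;
```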
Now, EXPLAIN SELECT ... will show a plan that uses idx_users_last_login (abridged):
• index join
│ estimated row count: 100,000
│ table: users@users_pkey
│
└── • scan
      estimated row count: 100,000
      table: users@idx_users_last_login
This is already a huge improvement, because the scan reads only the index entries after the cutoff instead of the whole table. But our query also returns username and email. This index holds only last_login, plus the primary key user_id, which every secondary index carries implicitly. To get the remaining columns, CockroachDB still has to go back to the primary index for each row the scan finds. That step is the index join at the top of the plan, often called a "lookup."
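We can make this even faster with a covering index, which stores every column the query needs:
CREATE INDEX idx_users_last_login_cover ON users (last_login DESC) STORING (username, email);
STORING keeps username and email in the index without making them part of the sort key. user_id doesn’t need to be listed because it is the primary key. Listing all four columns as key columns, as in (last_login DESC, user_id, username, email), also covers the query, but it makes every index key larger.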
Now, when you EXPLAIN the query, you might see something like (abridged):
• scan
  estimated row count: 100,000
  table: users@idx_users_last_login_cover
This plan reads only the covering index. Every column the SELECT needs is stored there, so there are no lookups into the primary index.
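Two follow-up statements are worth sketching here. SHOW INDEXES lets you confirm what each index on users actually contains. Once the covering index exists, the narrower idx_users_last_login can serve no query that idx_users_last_login_cover can’t, yet it still adds write work whenever last_login changes. Dropping it is a judgment call, so first check that nothing refers to it by name, such as a forced-index hint.

```sql
-- One row per index column. Key columns have storing = false, columns kept
-- via STORING have storing = true, and primary key columns that CockroachDB
-- appends to secondary indexes are marked implicit = true.
SHOW INDEXES FROM users;

-- Optional cleanup, only if no query or hint depends on the narrower index.
DROP INDEX users@idx_users_last_login;
```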
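The EXPLAIN output is crucial. Check three things:
- The scan’s table: line names the index you expect, for example users@idx_users_last_login_cover.
- spans shows a bounded range rather than FULL SCAN.
- No index join sits above the scan when you intended a covering index.

A FULL SCAN on a large table is the main thing to avoid. If the optimizer isn’t picking your index, you might need to: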
- Ensure statistics are fresh: the optimizer relies on statistics about your data distribution. CockroachDB collects them automatically by default, but you can refresh them manually, for example after a bulk load, with CREATE STATISTICS stats_users_last_login ON last_login FROM users; or ANALYZE users;.
- Force an index (with caution): in rare cases, you can hint the optimizer with SELECT user_id, username, email FROM users@{FORCE_INDEX=idx_users_last_login_cover} WHERE last_login > '2023-10-26 10:00:00+00';. This is a last resort, because the hint stays in place even when changing data makes a different plan cheaper.
- Check index selectivity: if your index is not selective (e.g., indexing a boolean column where 99% of values are true), the optimizer might correctly decide that a full table scan is faster.
- Revisit the query structure: sometimes rewriting the query helps the optimizer find a better plan. Where possible, avoid wrapping indexed columns in functions or casts in the WHERE clause (e.g., WHERE last_login::DATE > ...), because the optimizer generally can’t turn such a predicate into an index span.
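The statistics checks from the list above can be sketched as follows. stats_users_last_login is just the example name used above; any statistics name works.

```sql
-- Automatic statistics collection is enabled by default; confirm it.
SHOW CLUSTER SETTING sql.stats.automatic_collection.enabled;

-- Inspect the statistics the optimizer currently has for users,
-- including row_count, distinct_count, null_count and creation time.
SHOW STATISTICS FOR TABLE users;

-- Refresh statistics on last_login on demand, e.g. right after a bulk load.
CREATE STATISTICS stats_users_last_login ON last_login FROM users;
```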
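CockroachDB’s memory configuration also plays a significant role in how efficiently queries execute, especially complex ones involving aggregations or large sorts. Two knobs matter most:
- The --max-sql-memory flag passed to cockroach start caps the total memory a node uses for SQL work.
- The sql.distsql.temp_storage.workmem cluster setting limits how much memory a single sort, join, or aggregation may use before spilling to disk.

If these are set too low, queries may spill to disk or fail with memory budget errors, which hurts performance. Cluster settings can be inspected with SHOW CLUSTER SETTING and changed with SET CLUSTER SETTING. --max-sql-memory is set when the node starts.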
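A minimal sketch of inspecting and raising the per-operator budget. The value of 128MiB is an arbitrary illustration, not a recommendation; size it against your nodes’ RAM and concurrency.

```sql
-- Per-operator memory budget for sorts, joins and aggregations before
-- they spill to temporary disk storage (64 MiB by default).
SHOW CLUSTER SETTING sql.distsql.temp_storage.workmem;

-- Illustrative value only.
SET CLUSTER SETTING sql.distsql.temp_storage.workmem = '128MiB';

-- --max-sql-memory is a node start flag, not a cluster setting, e.g.:
--   cockroach start --max-sql-memory=.25 --cache=.25 ...
```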
The most surprising thing about CockroachDB’s indexing is how strongly it favors covering indexes for SELECT statements that retrieve specific columns. It’s not just about finding rows faster. It’s about fetching all the required data directly from the index itself, eliminating the primary index lookup entirely. This means an index definition like CREATE INDEX idx_foo ON t (col1 DESC) INCLUDE (col2, col3); (INCLUDE is a synonym for STORING) can dramatically outperform a simple CREATE INDEX idx_foo ON t (col1 DESC); if your query is SELECT col2, col3 FROM t WHERE col1 > value;.
When you start tuning for high-throughput OLTP workloads, you’ll encounter the concept of "hot ranges," where a specific range of data is being accessed by a disproportionate number of transactions, leading to contention.