EXPLAIN ANALYZE is your best friend for diagnosing slow CockroachDB queries, but it’s not just about identifying bottlenecks; it’s about understanding how CockroachDB chooses to execute your query and then gently nudging it towards a better path.

Let’s watch it in action. Imagine you have a table users with millions of rows, and you’re trying to find a specific user by their email address:

SELECT * FROM users WHERE email = 'alice@example.com';

When this query runs slowly, your first instinct is to EXPLAIN ANALYZE it:

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';

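The output might look something like this (simplified and flattened into a table for clarity; real CockroachDB output is a tree of operators, each annotated with fields such as its estimated and actual row counts):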

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Distribution | Node |                       Statement                       | Estimated Rows | Actual Rows |   Time
--------------+------+-------------------------------------------------------+----------------+-------------+----------
              |      | SELECT * FROM users WHERE email = 'alice@example.com' |              1 |           1 | 123.45ms
              |      |   -> Table: users (full scan)                         |              1 |           1 | 122.90ms
              |      |        Filters: email = 'alice@example.com'           |                |             |
(3 rows)

This initial output tells you that CockroachDB performed a "full scan" on the users table, which is almost certainly the problem if you have many users. The scan read every row in the table and the filter then discarded all but one, which is why Actual Rows is 1 even though the whole table was read. The query took 123.45 milliseconds, 122.90 of them in the scan. The estimate of 1 row was correct, so statistics are not the issue here: with no index on email, a full scan was the only plan available.

The Mental Model: How CockroachDB Chooses a Plan

CockroachDB, like most modern databases, uses a cost-based optimizer (CBO). When you submit a query, the CBO analyzes it, considers the available indexes, table statistics, and cluster state, and then estimates the "cost" of various possible execution plans. It picks the plan it believes will be the fastest.

The CBO’s job is to balance several factors:

  • Data Scans: How much data needs to be read? Full table scans are expensive. Index scans are usually cheaper.
  • Joins: How are tables combined? The order and method of joining can drastically affect performance.
  • Data Movement: CockroachDB is distributed. Data might need to be sent between nodes. This network I/O is costly.
  • Sorting and Aggregation: Operations that require collecting and processing data can be resource-intensive.
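
To see which plan the optimizer settled on, and how it costed that plan, without waiting for a slow query to run, plain EXPLAIN can be used instead of EXPLAIN ANALYZE. A minimal sketch, assuming the same users table as above:

```sql
-- Show the chosen plan without executing the query.
-- Only estimates are available here; there are no actual row counts or timings.
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';

-- Show the optimizer's own view of the plan, including the estimated cost
-- and the statistics it used for each operator.
EXPLAIN (OPT, VERBOSE) SELECT * FROM users WHERE email = 'alice@example.com';
```

Because neither statement runs the query, both are safe to use against a query that is too slow or too expensive to execute repeatedly.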

EXPLAIN ANALYZE gives you two crucial pieces of information:

  1. Estimated Rows vs. Actual Rows: If these numbers are wildly different, your table statistics are probably missing or stale, and the CBO is making bad decisions based on wrong assumptions.
  2. Time Spent per Operation: This is the core of identifying bottlenecks. You see exactly which part of the plan took the longest.
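
To follow up on the first point, the statistics the optimizer currently holds can be inspected directly. A minimal sketch, assuming the users table from the example:

```sql
-- List the statistics collected for users, including when each set was
-- created and the row and distinct-value counts the optimizer relies on.
SHOW STATISTICS FOR TABLE users;

-- Ask for more per-operator detail when timing the query.
EXPLAIN ANALYZE (VERBOSE) SELECT * FROM users WHERE email = 'alice@example.com';
```

If the statistics are old relative to how quickly the table changes, or the recorded row count is far from the real table size, stale statistics are a likely culprit.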

Leveraging EXPLAIN ANALYZE for Performance Tuning

The most common reason for slow queries is the absence of a suitable index. In our users example, email is a unique identifier, so it’s a prime candidate for an index.

Diagnosis: The EXPLAIN ANALYZE output clearly shows a "full scan" on users.

Fix: Create an index on the email column.

CREATE INDEX users_email_idx ON users (email);

Why it works: This creates a secondary index, a sorted structure keyed on email whose entries point back to the matching rows. When you query WHERE email = 'alice@example.com', CockroachDB can now look the address up in this index and fetch Alice’s row directly, instead of reading every single row in the table.
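
Two variations on this index may be worth considering. The sketch below is illustrative only: the index names are hypothetical, and the name column is an assumed column that the original schema may not have.

```sql
-- Variation 1: since email is meant to be unique, a unique index enforces
-- that constraint in addition to speeding up the lookup.
CREATE UNIQUE INDEX users_email_key ON users (email);

-- Variation 2: a covering index. STORING copies the listed columns into the
-- index, so a query that needs only those columns never touches the main table.
-- "name" is an assumed column used for illustration.
CREATE INDEX users_email_covering_idx ON users (email) STORING (name);

-- This query can be answered from the covering index alone.
SELECT email, name FROM users WHERE email = 'alice@example.com';

-- Confirm which indexes exist on the table.
SHOW INDEXES FROM users;
```

Only one of these indexes is needed in practice; extra indexes slow down writes, so pick the one that matches how the table is actually queried.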

After creating the index, run EXPLAIN ANALYZE again:

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';

The output should now look very different:

                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Distribution | Node |                                  Statement                                   | Estimated Rows | Actual Rows |   Time
--------------+------+------------------------------------------------------------------------------+----------------+-------------+----------
              |      | SELECT * FROM users WHERE email = 'alice@example.com'                        |              1 |           1 | 1.23ms
              |      |   -> Index Scan using users_email_idx on users (email = 'alice@example.com') |              1 |           1 | 0.90ms
(2 rows)

Notice the change from "full scan" to "Index Scan using users_email_idx". The time has dropped dramatically from 122.90ms to 0.90ms.

Other Common Issues and Fixes:

  • Stale Statistics: If Estimated Rows is vastly different from Actual Rows for many operations, the CBO is flying blind.

    • Diagnosis: Observe significant discrepancies between estimated and actual row counts in EXPLAIN ANALYZE output.
    • Fix: Run CREATE STATISTICS users_email_stats ON email FROM users; to refresh statistics for that column, or CREATE STATISTICS users_stats FROM users; (or simply ANALYZE users;) to refresh them for the whole table. This tells CockroachDB to collect fresh statistics about the data distribution in your users table.
    • Why it works: Accurate statistics allow the CBO to correctly estimate the cost of different plans, leading it to choose more efficient ones.
  • Inefficient Joins: When joining multiple tables, the order and method of joining matter.

    • Diagnosis: EXPLAIN ANALYZE shows a high time spent in a "hash join" or "merge join" between two large intermediate results.
    • Fix: Ensure you have indexes on the columns used in your JOIN conditions. For example, if joining orders and users on users.id = orders.user_id, ensure orders.user_id is indexed. Sometimes, a different join order can be hinted or achieved by rewriting the query.
    • Why it works: Indexes speed up the lookup of matching rows for the join, and a good join order reduces the intermediate data that needs to be processed.
  • Data Skew: One or a few nodes might be doing disproportionately more work. This is common in distributed systems.

    • Diagnosis: EXPLAIN ANALYZE output shows that a particular Node is responsible for a much larger percentage of the total query time, or data movement between nodes is high.
    • Fix: Review your table’s primary key, since CockroachDB splits data into ranges by key order. If the key is a monotonically increasing integer id, new rows all land in the last range, so writes and reads of recent rows concentrate on one node. Consider a randomly distributed key such as a UUID, or a hash-sharded index, if appropriate.
    • Why it works: A key whose values are evenly spread distributes data and load across ranges, and therefore across nodes, preventing hot spots.
  • Subqueries / Correlated Subqueries: These can sometimes be executed repeatedly, leading to poor performance.

    • Diagnosis: EXPLAIN ANALYZE shows the subquery being run once per row of the outer query (in CockroachDB this typically appears as an apply join), so its total cost grows with the outer row count.
    • Fix: Rewrite the subquery as a JOIN or a Common Table Expression (CTE) to ensure it’s executed only once.
    • Why it works: Joins and CTEs are typically planned and executed more efficiently by the optimizer than repeatedly executing a correlated subquery.
  • Large ORDER BY or GROUP BY Without Index Support: If you’re sorting or grouping data and there’s no index that can satisfy the ordering or grouping directly, CockroachDB might have to collect all data and sort/group it.

    • Diagnosis: High time spent in sort or group operations in EXPLAIN ANALYZE, especially after filtering.
    • Fix: Create an index that matches the ORDER BY or GROUP BY columns, potentially including columns from the WHERE clause. For example, for SELECT ... FROM users WHERE status = 'active' ORDER BY created_at DESC, an index on (status, created_at DESC) would be beneficial.
    • Why it works: The index can provide the data in the desired order, eliminating the need for a separate, costly sort operation.
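
Several of the fixes in the list above can be sketched in SQL as follows. This is illustrative only: the orders table with its id and user_id columns, the events table, and the status and created_at columns on users are assumptions based on the examples in the text, and all index names are hypothetical.

```sql
-- Inefficient joins: index the join column on the table being joined in.
-- Assumes a hypothetical orders table with a user_id column.
CREATE INDEX orders_user_id_idx ON orders (user_id);

-- Optionally hint the join algorithm if the optimizer picks a poor one.
-- A lookup join fetches each matching user's orders through the index above.
SELECT u.id AS user_id, o.id AS order_id
FROM users AS u
INNER LOOKUP JOIN orders AS o ON o.user_id = u.id
WHERE u.email = 'alice@example.com';

-- Data skew: random UUID keys spread new rows across ranges instead of
-- appending them all to the end of the key space.
CREATE TABLE events (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Correlated subquery: the inner query refers to u.id from the outer row.
SELECT u.id,
       (SELECT count(*) FROM orders AS o WHERE o.user_id = u.id) AS order_count
FROM users AS u;

-- The same result written as a join with aggregation.
-- Assumes orders.id is a NOT NULL primary key, so count(o.id) counts orders
-- and yields 0 for users without any.
SELECT u.id, count(o.id) AS order_count
FROM users AS u
LEFT JOIN orders AS o ON o.user_id = u.id
GROUP BY u.id;

-- Sorting: an index that matches both the filter and the requested order.
CREATE INDEX users_status_created_at_idx ON users (status, created_at DESC);

-- Assuming id is the primary key of users, this query can read rows from the
-- index already in order, so no separate sort step is needed.
SELECT id, created_at
FROM users
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 20;
```

After each change, rerun EXPLAIN ANALYZE on the affected query to confirm that the plan actually changed and that the time moved in the right direction.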

The most subtle but often powerful aspect of EXPLAIN ANALYZE is understanding the cost model and how it prioritizes operations. For instance, CockroachDB might choose a lookup join over a hash join because it estimates that the preceding filter will produce only a handful of rows, making one index lookup per row cheap. If your statistics are off, it might estimate very few rows when in reality there are millions, leading it to favor a plan that is disastrously slow for large datasets.

The next thing you’ll likely encounter is optimizing queries involving complex aggregations or window functions, where understanding the interaction between indexes, data distribution, and the execution plan becomes even more intricate.
