ClickHouse read queries can feel like they’re stuck in molasses, and the culprit isn’t always obvious.

Here’s how to dig in and find out why your SELECT statements are dragging their feet.

1. Isolate the Slow Query

First, we need to confirm which query is the problem. The ClickHouse server logs slow queries by default.

Diagnosis:

Check your ClickHouse server logs (usually /var/log/clickhouse-server/clickhouse-server.log or similar, depending on your OS and installation). Look for lines containing Query took.

Example Log Entry:

2023.10.27 10:30:00.123 [ 12345 ] <Info> <>: <Query> SELECT count() FROM my_table WHERE event_date = '2023-10-26' Query took 15.678 sec. Rows: 1000000000 Processed rows: 1000000000

This tells you the exact query and how long it took.

2. Analyze the Query Plan

Once you have the slow query, let’s see how ClickHouse plans to execute it. This is crucial for understanding where time is being spent.

Diagnosis:

Prepend EXPLAIN to your slow query.

Command:

EXPLAIN SELECT count() FROM my_table WHERE event_date = '2023-10-26';

Output Interpretation:

The EXPLAIN output will show you the stages of the query. Look for stages involving:

  • MergeTree operations (reading data from disk)
  • Filter steps (applying WHERE clauses)
  • Aggregate steps (if you have GROUP BY or aggregations)
  • Join steps (if applicable)

Pay close attention to the estimated number of rows processed at each stage. If a stage estimates processing millions or billions of rows when you expect far fewer, that’s a red flag.

3. Check Data Skipping

ClickHouse uses various indices and data structures to skip reading unnecessary data. If this isn’t working, it will scan way more than it needs.

Common Causes & Fixes:

  • Primary Key Granularity:

    • Diagnosis: Examine your table definition. The ORDER BY clause defines the primary key. If it’s not selective enough (e.g., just a timestamp, or a low-cardinality field), ClickHouse might not be able to skip much data.
    • Fix: Ensure your ORDER BY (which also serves as the primary key for data skipping) is based on columns that are frequently used in WHERE clauses and have high cardinality. For example, ORDER BY (event_date, user_id) is often better than ORDER BY event_date.
    • Why it works: ClickHouse uses the primary key to create sparse indices. A more selective primary key allows the index to point to smaller chunks of data, reducing disk I/O.
  • Secondary Indices (Skip Indexes):

    • Diagnosis: If you’ve defined secondary indices (e.g., INDEX index_name column TYPE minmax GRANULARITY 1), check if they are being used. You can often infer this from the EXPLAIN plan or by looking at the system.query_log table for read_bytes vs. bytes_uncompressed.
    • Fix: Create appropriate skip indexes. For example, if you frequently filter by user_id and event_date in a table not ordered by them:
      ALTER TABLE my_table ADD INDEX user_id_idx user_id TYPE bloom_filter GRANULARITY 1024;
      ALTER TABLE my_table ADD INDEX event_date_idx event_date TYPE set(10000) GRANULARITY 1024;
      
    • Why it works: Bloom filters, min/max indices, and set indices allow ClickHouse to quickly determine if a block of data might contain relevant rows, enabling it to skip blocks that definitely do not. GRANULARITY determines how often the index is updated within a data part.
  • Data Partitioning:

    • Diagnosis: Check your PARTITION BY clause in the CREATE TABLE statement. If you’re partitioning by a granularity that’s too large (e.g., partitioning by month when you query daily data), ClickHouse will scan all data for the current month instead of just a day.
    • Fix: Adjust partitioning. For daily data, PARTITION BY toYYYYMMDD(event_date) is common.
    • Why it works: Partitioning physically separates data on disk based on the partition key. When you filter by the partition key, ClickHouse only needs to read data from the relevant partitions, drastically reducing I/O.

4. Evaluate Data Compression and Codec

Ineffective compression can lead to larger data files, meaning more data to read from disk.

Diagnosis:

Use system.columns to check the CompressionCodec for your table columns.

Command:

SELECT name, compression_codec FROM system.columns WHERE database = 'default' AND table = 'my_table';

Output Example:

┌─name──────┬─compression_codec─┐
│ event_date│ LZ4               │
│ user_id   │ ZSTD(3)           │
│ value     │ DELTA,LZ4         │
└─────────────┴───────────────────┘

Fix:

  • ZSTD: Generally offers the best compression ratio for a good balance of speed. ZSTD(1) is fast, ZSTD(3) or ZSTD(5) offer better compression at a slightly higher CPU cost.
  • LZ4: Very fast decompression, but lower compression ratio. Good for columns that are read very frequently and where CPU is a bottleneck.
  • Delta/DoubleDelta: Excellent for sequential or nearly sequential data (like timestamps or counters). Often combined with LZ4 or ZSTD.

If you see NONE or just LZ4 on columns that benefit from better compression (like numeric IDs or timestamps), consider changing it:

ALTER TABLE my_table MODIFY COLUMN user_id ZSTD(3);
ALTER TABLE my_table MODIFY COLUMN event_date CODEC(ZSTD(1), LZ4); -- Apply multiple codecs

Why it works: Better compression means smaller files on disk. While decompression adds CPU overhead, it often saves significantly more time by reducing the amount of data that needs to be read from disk (which is typically much slower).

5. Examine Data Granularity (GRANULARITY)

The GRANULARITY setting in MergeTree tables dictates how many rows are grouped together for the primary index.

Diagnosis:

Check your CREATE TABLE statement for the GRANULARITY parameter. Default is 8192.

Fix:

  • Too High: If GRANULARITY is too high (e.g., 65536), the primary index becomes too sparse, and ClickHouse has to read more data per index entry.
  • Too Low: If GRANULARITY is too low (e.g., 128), index files become very large, increasing memory usage and potentially slowing down index lookups.

For most analytical workloads, the default GRANULARITY = 8192 is a good starting point. If your queries are still slow and EXPLAIN shows large row counts being scanned, consider decreasing it slightly (e.g., to 4096).

-- Note: Changing GRANULARITY requires rebuilding the table or parts.
-- This is a more advanced operation and usually done during table creation or via ALTER TABLE ... MODIFY ... which may rewrite data.
-- For existing tables, consider creating a new table with the desired granularity and migrating data.

Why it works: GRANULARITY balances index size and data skipping efficiency. A lower granularity means more index entries, but each entry covers fewer rows, allowing for more precise data skipping.

6. Check for Unnecessary Data Loading or Joins

Sometimes, the slowness isn’t in reading from the main table, but in how data is brought in or combined.

Diagnosis:

  • Subqueries/CTEs: Analyze your EXPLAIN plan for subqueries or Common Table Expressions (CTEs). If they’re materialized and large, they could be a bottleneck.
  • Joins: If your query involves JOINs, check the join keys and the size of the tables being joined. Unoptimized joins, especially with large right-hand tables, are notorious performance killers.
  • IN vs. JOIN: For large lists in WHERE col IN (list), consider if a JOIN with a temporary table or a pre-aggregated table would be more efficient.

Fix:

  • Subqueries: Rewrite subqueries to be more efficient, or materialize them differently if possible.
  • Joins: Ensure join keys are indexed and have matching data types. Use GLOBAL JOIN or ANY JOIN where appropriate. Consider denormalizing your schema if joins are consistently problematic.
  • IN clause: If the list in IN is large and static, consider creating a temporary table and joining against it.

Why it works: ClickHouse is optimized for analytical queries. Complex procedural logic or inefficient joins can bypass its strengths, leading to performance degradation.

7. Query Profiling

For very complex or elusive performance issues, dive into the query profile.

Diagnosis:

Run your query with FORMAT PROFILEEvents.

Command:

SELECT count() FROM my_table WHERE event_date = '2023-10-26' FORMAT PROFILEEvents;

Output Interpretation:

This provides a detailed breakdown of events during query execution. Look for:

  • ReadBuffer_ (I/O operations)
  • Execute (CPU time spent in various functions)
  • Network (if applicable)
  • UncompressedCache (if enabled and being hit/missed)

Fix:

This output often points back to the previous issues. For example, high ReadBuffer_ counts suggest I/O bottlenecks (likely due to poor data skipping or compression). High Execute counts in specific functions might indicate inefficient UDFs or complex aggregations.

Why it works: The profile provides a granular, event-by-event view of what the query is doing, allowing you to pinpoint the exact operations consuming the most resources.

If you’ve addressed data skipping, compression, and partitioning, and your queries are still slow, the next bottleneck you’ll likely encounter is CPU saturation from complex aggregations or data transformations.

Want structured learning?

Take the full Clickhouse course →