ClickHouse doesn’t just tell you that a query was slow; it can show you exactly why, down to the microsecond, by letting you peer into the execution plan as it runs.

Let’s watch a slow query get dissected. Imagine this query is grinding your dashboard to a halt:

SELECT
    toDate(event_time) AS event_date,
    count() AS event_count
FROM events
WHERE event_time BETWEEN '2023-10-01 00:00:00' AND '2023-10-31 23:59:59'
GROUP BY event_date
ORDER BY event_date;

This query, at first glance, looks innocent. It’s a simple aggregation over a date range. But if it’s slow, we need to see where the time is being spent.

The first tool is EXPLAIN. Running EXPLAIN on your query gives you the intended execution plan, not the actual one. It’s a blueprint.

EXPLAIN SELECT toDate(event_time) AS event_date, count() AS event_count FROM events WHERE event_time BETWEEN '2023-10-01 00:00:00' AND '2023-10-31 23:59:59' GROUP BY event_date ORDER BY event_date;

This will output a tree of plan steps, read from the bottom up, something like this (simplified; step names and descriptions vary between ClickHouse versions):

Expression ((Projection + Before ORDER BY))
  Sorting (Sorting for ORDER BY)
    Expression (Before ORDER BY)
      Aggregating
        Expression (Before GROUP BY)
          Filter (WHERE)
            ReadFromMergeTree (default.events)

That shows the shape of the query, but no timings. EXPLAIN variants such as PIPELINE or SYNTAX reveal more of the structure, but for performance, we need to see the query in action.
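Before turning to logs, two EXPLAIN variants are worth a look. This is a minimal sketch against the same events table; what it reports depends on your ClickHouse version and on the table's actual sorting and partition keys.

```sql
-- Show which parts and granules the partition key and primary key can prune.
EXPLAIN indexes = 1
SELECT
    toDate(event_time) AS event_date,
    count() AS event_count
FROM events
WHERE event_time BETWEEN '2023-10-01 00:00:00' AND '2023-10-31 23:59:59'
GROUP BY event_date
ORDER BY event_date;

-- Show the processors that will execute the query and how they are connected.
EXPLAIN PIPELINE
SELECT
    toDate(event_time) AS event_date,
    count() AS event_count
FROM events
WHERE event_time BETWEEN '2023-10-01 00:00:00' AND '2023-10-31 23:59:59'
GROUP BY event_date
ORDER BY event_date;
```

In the first output, the ReadFromMergeTree step lists the indexes it applied together with the parts and granules selected out of the total. If nearly all granules are selected, the WHERE clause is not being helped by the primary key.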

To do that, we enable trace logs. This requires a configuration change. In your config.xml or a file in config.d/, you’d add or modify the <logger> section:

<clickhouse>
    <logger>
        <level>trace</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>100M</size> <!-- rotate each file at 100 MB -->
        <count>10</count> <!-- keep 10 rotated files -->
    </logger>
</clickhouse>

After restarting the ClickHouse server, every query (not only slow ones) writes trace-level messages to /var/log/clickhouse-server/clickhouse-server.log, each tagged with its query ID. Note that log_queries and log_query_threads are not logger options: they are query settings, set in a user profile or per session, and they control what is recorded in the system.query_log and system.query_thread_log tables.
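If editing the server configuration is not an option, the same messages can be streamed to a single clickhouse-client session with the send_logs_level setting. A minimal sketch, assuming you run it interactively in clickhouse-client:

```sql
-- Stream server log messages for this session's queries to the client.
SET send_logs_level = 'trace';

SELECT
    toDate(event_time) AS event_date,
    count() AS event_count
FROM events
WHERE event_time BETWEEN '2023-10-01 00:00:00' AND '2023-10-31 23:59:59'
GROUP BY event_date
ORDER BY event_date;

-- Stop streaming server logs to this session when finished.
SET send_logs_level = 'none';
```

This affects only the current session, so it is a low-risk way to inspect one query on a busy server.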

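Let’s simulate a slow query and look at the trace log output. Suppose our events table is massive and the event_time column isn’t part of the sorting key. The excerpt below is schematic rather than verbatim: real ClickHouse log lines are formatted differently and are emitted by internal components such as executeQuery and AggregatingTransform, but they carry the same kind of information (how much data each stage handled and how long it took):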

2023-10-27 10:30:00.123456 [12345] <Trace> void DB::executeQuery(const DB::String&, const DB::BlockIO&, bool, bool, bool, bool, bool) - Query: SELECT toDate(event_time) AS event_date, count() AS event_count FROM events WHERE event_time BETWEEN '2023-10-01 00:00:00' AND '2023-10-31 23:59:59' GROUP BY event_date ORDER BY event_date
...
2023-10-27 10:30:05.456789 [12345] <Trace> void DB::ProcessThreadPool::workerThread() - Thread 1: Processed 1000000000 rows in 5.333333 seconds. Operation: ReadPart. Path: /var/lib/clickhouse/data/default/events/202310_1/
...
2023-10-27 10:30:08.789012 [12345] <Trace> void DB::ProcessThreadPool::workerThread() - Thread 2: Processed 1000000000 rows in 3.333333 seconds. Operation: Filter. Data: 1000000000 rows filtered, 500000000 kept.
...
2023-10-27 10:30:10.111222 [12345] <Trace> void DB::ProcessThreadPool::workerThread() - Thread 3: Processed 500000000 rows in 1.333333 seconds. Operation: Aggregation.
...
2023-10-27 10:30:11.555666 [12345] <Trace> void DB::ProcessThreadPool::workerThread() - Thread 4: Processed 31 rows in 1.444444 seconds. Operation: Sorting.

The key is to look for the Operation and the time attached to it. Here, "ReadPart" took over 5 seconds, which means ClickHouse scanned a huge amount of raw data from disk. "Filter" took 3.3 seconds and discarded half of the rows it was given, so much of that read was wasted. "Aggregation" was relatively fast. "Sorting" is charged 1.4 seconds for only 31 rows, which is out of proportion to the work involved and worth a second look.

The core problem here is the inefficient data scanning. ClickHouse’s performance hinges on minimizing the amount of data it needs to read and process.

Common Causes and Fixes:

  1. No Primary Key or Poorly Chosen Primary Key: If event_time isn’t a leading column of your ORDER BY clause (which defines the sorting key and, by default, the primary key for MergeTree engines), ClickHouse cannot use the primary index to skip data and may scan entire data parts.

    • Diagnosis: Run SHOW CREATE TABLE events and look at the ORDER BY clause (DESCRIBE TABLE lists the columns but not the sorting key). If event_time isn’t the first column (or among the first few), this is likely the issue.
    • Fix: Recreate the table with event_time leading the sorting key. For example:
      -- Backup existing data if necessary
      CREATE TABLE events_new (...) ENGINE = MergeTree() ORDER BY (event_time, ...); -- Add other columns as needed for sorting
      INSERT INTO events_new SELECT * FROM events;
      RENAME TABLE events TO events_old, events_new TO events;
      -- Verify the row counts in the new table before dropping the old one
      DROP TABLE events_old;
      
    • Why it works: The ORDER BY clause on a MergeTree table sorts the data on disk and builds a sparse primary index with one entry per granule (8192 rows by default). With event_time first, ClickHouse can use that index to find the granules that cover the date range and skip the rest, drastically reducing I/O.
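  2. Wide Tables (Too Many Columns): If your events table has hundreds of columns, any query that references many of them (SELECT * is the usual culprit) reads far more data than it needs, because every referenced column is read from every scanned part.

    • Diagnosis: DESCRIBE TABLE events. Count the columns, then check how many of them the slow query actually references.
    • Fix: Select only the columns you need, or create a materialized view that feeds a narrow, pre-aggregated table.
      CREATE TABLE events_minimal (event_date Date, event_count UInt64) ENGINE = SummingMergeTree ORDER BY event_date;
      CREATE MATERIALIZED VIEW events_mv TO events_minimal AS
      SELECT toDate(event_time) AS event_date, count() AS event_count FROM events GROUP BY event_date;
      -- The view only captures rows inserted after it is created; backfill older data separately.
      -- Then query events_minimal, re-aggregating because parts may not be fully merged yet:
      SELECT event_date, sum(event_count) AS total_events FROM events_minimal GROUP BY event_date ORDER BY event_date;
      
    • Why it works: ClickHouse stores data in columns, so a query reads only the columns it references from disk, not all of them. The pre-aggregated table goes further: it holds a handful of rows per day instead of one row per event.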
  3. Large Number of Small Data Parts: Frequent small inserts can lead to many data parts. Querying across many parts incurs overhead for opening and merging them.

    • Diagnosis: SELECT count() FROM system.parts WHERE table = 'events' AND active; If the count is in the thousands, this is a problem.
    • Fix: Batch inserts into fewer, larger blocks so that fewer parts are created, then wait for background merges or trigger one manually. If merges can’t keep up across many tables, you can also raise background_pool_size and background_merges_mutations_concurrency_ratio in config.xml.
      OPTIMIZE TABLE events FINAL; -- Use with caution on very large tables
      
    • Why it works: OPTIMIZE TABLE ... FINAL forces ClickHouse to merge the parts within each partition into a single, larger part, reducing the overhead of opening and reading numerous small files.
  4. Unnecessary ORDER BY at the End: The ORDER BY event_date at the end of the query can be expensive if the number of groups is large.

    • Diagnosis: The trace log shows a significant time spent in the "Sorting" operation after aggregation.
    • Fix: If the order doesn’t matter for the application consuming the data, remove the ORDER BY clause. If it does, ensure the ORDER BY column is already sorted by the primary key or that the aggregation can produce sorted output.
      -- Remove ORDER BY if not strictly necessary
      SELECT toDate(event_time) AS event_date, count() AS event_count FROM events WHERE event_time BETWEEN '2023-10-01 00:00:00' AND '2023-10-31 23:59:59' GROUP BY event_date;
      
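    • Why it works: Sorting is an O(N log N) operation, where N is the number of rows left after aggregation, not the number of source rows. With at most 31 daily groups the sort in this query is negligible; it becomes a real cost only when grouping produces millions of rows. Removing the ORDER BY skips the step entirely.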
  5. Inefficient Data Types or Functions: Using functions like toDate() on a large number of rows can be a bottleneck.

    • Diagnosis: The trace log shows significant time in the "Function" or "Transform" phase where toDate is applied.
    • Fix: Keep event_time as a DateTime (not a String) and, if you group by day often, pre-calculate the date during ingestion into its own Date column.
      -- If event_time is already a DateTime, casting to Date is usually fast.
      -- If event_time is a String, it's much slower. Convert to DateTime or Date.
      -- Best: Store as Date/DateTime from the start.
      -- If not possible, consider a materialized view to pre-calculate dates.
      
    • Why it works: Applying functions on every row during query execution is costly. Pre-calculating or storing data in a format that doesn’t require runtime transformations is much faster.
  6. Insufficient Server Resources: While the above are query-specific, a general lack of CPU or I/O capacity will slow down all queries.

    • Diagnosis: Monitor system CPU, RAM, and disk I/O during query execution using tools like htop, iostat, or ClickHouse’s system.metrics table.
    • Fix: Scale up hardware or optimize other resource-intensive queries.
    • Why it works: The query is starved of the resources it needs to execute quickly.
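
A few of the checks and fixes from the list above can be sketched in SQL. This assumes the events table lives in the current database; the column name event_day is hypothetical and chosen only for illustration.

```sql
-- Cause 1: inspect the sorting key and primary key without reading the full DDL.
SELECT sorting_key, primary_key
FROM system.tables
WHERE database = currentDatabase() AND name = 'events';

-- Cause 3: count active parts per partition to spot partitions with too many parts.
SELECT
    partition,
    count() AS part_count,
    sum(rows) AS total_rows
FROM system.parts
WHERE database = currentDatabase() AND table = 'events' AND active
GROUP BY partition
ORDER BY part_count DESC;

-- Cause 5: add a Date column that is computed once, at insert time.
-- event_day is a hypothetical name; pick one that fits your schema.
ALTER TABLE events
    ADD COLUMN event_day Date MATERIALIZED toDate(event_time);
```

For parts written before the ALTER, the new column is computed on read until those parts are rewritten, so the benefit appears gradually.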

After applying these fixes, you’ll find the trace logs show much shorter times for "ReadPart", "Filter", and "Sorting", and the overall query duration plummets.
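
One way to confirm the improvement is to compare recent runs in system.query_log. This is a minimal sketch, assuming query logging is enabled (the log_queries setting) and that the query text contains "FROM events":

```sql
-- Make sure recently finished queries have been written to the log table.
SYSTEM FLUSH LOGS;

-- Compare duration and data read for recent runs against the events table.
SELECT
    event_time,
    query_duration_ms,
    read_rows,
    formatReadableSize(read_bytes) AS read_size,
    formatReadableSize(memory_usage) AS peak_memory
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_date = today()
  AND query ILIKE '%FROM events%'
  AND query NOT ILIKE '%system.query_log%'  -- exclude this query itself
ORDER BY event_time DESC
LIMIT 10;
```

A successful fix should show up as a drop in read_rows and read_size, not only in query_duration_ms.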

The next challenge you’ll encounter is understanding how to use system.query_log for historical analysis of slow queries.
