ClickHouse’s sparse indexes don’t index every row; they index blocks of data, making them incredibly efficient for analytical workloads.
Let’s see this in action with a common scenario: querying a massive events table by event_timestamp.
-- Assume a table with billions of rows
CREATE TABLE events (
event_timestamp DateTime,
event_type String,
user_id UInt64,
payload String
) ENGINE = MergeTree()
ORDER BY (event_type, user_id, event_timestamp);
Now, imagine we want to find all events for a specific user within a particular hour:
SELECT count(*)
FROM events
WHERE user_id = 1234567890
AND event_timestamp BETWEEN '2023-10-27 10:00:00' AND '2023-10-27 10:59:59';
Without sparse indexes, ClickHouse would have to scan a significant portion of the event_timestamp column, potentially billions of rows, to find the matching data. This is where the ORDER BY clause and ClickHouse’s indexing magic come into play.
The ORDER BY (event_type, user_id, event_timestamp) in the MergeTree engine definition is crucial. It tells ClickHouse to physically sort the data on disk according to these columns. This sorting is what enables the sparse index.
Instead of indexing every single event_timestamp, ClickHouse creates an index entry for every granule of data. A granule is a contiguous block of data, typically 8192 rows. For each granule, the sparse index stores the minimum and maximum values of the columns specified in the ORDER BY key.
So, for our events table, the sparse index would contain entries like this (conceptually):
| Granule ID | Min event_type |
Max event_type |
Min user_id |
Max user_id |
Min event_timestamp |
Max event_timestamp |
|---|---|---|---|---|---|---|
| 1 | 'click' | 'view' | 1000 | 99999 | '2023-10-27 00:00:00' | '2023-10-27 01:30:00' |
| 2 | 'click' | 'view' | 5000 | 150000 | '2023-10-27 01:30:01' | '2023-10-27 03:00:00' |
| … | … | … | … | … | … | … |
| N | 'purchase' | 'view' | 1234567890 | 1234567890 | '2023-10-27 10:00:00' | '2023-10-27 10:59:59' |
| … | … | … | … | … | … | … |
When you execute the query WHERE user_id = 1234567890 AND event_timestamp BETWEEN '2023-10-27 10:00:00' AND '2023-10-27 10:59:59', ClickHouse doesn’t scan the whole table. Instead, it first consults the sparse index.
- Index Seek: It looks at the index entries for
user_idandevent_timestamp. - Range Elimination: For each granule, it checks if the granule’s
min_user_idtomax_user_idrange andmin_event_timestamptomax_event_timestamprange could possibly overlap with your query’suser_id = 1234567890andevent_timestamp BETWEEN '2023-10-27 10:00:00' AND '2023-10-27 10:59:59'. - Data Block Pruning: If a granule’s index range cannot contain any matching rows (e.g., a granule’s
max_user_idis less than1234567890and itsmax_event_timestampis before10:00:00), ClickHouse completely skips reading the data for that entire granule. This is the "sparse" part – it’s not looking at every row, but at blocks of rows. - Data Retrieval: Only the granules that might contain matching data are read from disk.
This dramatically reduces the amount of I/O required. The ORDER BY key is paramount; the columns used in WHERE clauses that are also part of the ORDER BY key are the ones that benefit most directly from this primary sparse index.
The size of a granule (8192 rows) is configurable, but this default is a good balance between index granularity and memory footprint. You can inspect the index for a MergeTree table by querying system.parts_columns and system.columns or by using DESCRIBE TABLE ... to see the ORDER BY key.
The key takeaway is that ClickHouse’s MergeTree engine, when combined with a well-chosen ORDER BY key, builds a primary index that is sparse by design. This index operates on granules, allowing ClickHouse to discard huge chunks of data that cannot possibly satisfy the query conditions, leading to lightning-fast analytical queries on massive datasets.
The next challenge is understanding how to optimize for queries that don’t perfectly align with the primary ORDER BY key.