How ClickHouse Sparse Indexes and Granules Speed Up Queries (2026)

ClickHouse’s sparse indexes don’t index every row; they index blocks of data, making them incredibly efficient for analytical workloads.

Let’s see this in action with a common scenario: querying a massive events table by event_timestamp.

-- Assume a table with billions of rows
CREATE TABLE events (
    event_timestamp DateTime,
    event_type String,
    user_id UInt64,
    payload String
) ENGINE = MergeTree()
ORDER BY (event_type, user_id, event_timestamp);

Now, imagine we want to find all events for a specific user within a particular hour:

SELECT count(*)
FROM events
WHERE user_id = 1234567890
  AND event_timestamp BETWEEN '2023-10-27 10:00:00' AND '2023-10-27 10:59:59';

Without sparse indexes, ClickHouse would have to scan a significant portion of the event_timestamp column, potentially billions of rows, to find the matching data. This is where the ORDER BY clause and ClickHouse’s indexing magic come into play.

The ORDER BY (event_type, user_id, event_timestamp) in the MergeTree engine definition is crucial. It tells ClickHouse to physically sort the data on disk according to these columns. This sorting is what enables the sparse index.

Instead of indexing every single event_timestamp, ClickHouse creates an index entry for every granule of data. A granule is a contiguous block of data, typically 8192 rows. For each granule, the sparse index stores the minimum and maximum values of the columns specified in the ORDER BY key.

So, for our events table, the sparse index would contain entries like this (conceptually):

Granule ID	Min `event_type`	Max `event_type`	Min `user_id`	Max `user_id`	Min `event_timestamp`	Max `event_timestamp`
1	'click'	'view'	1000	99999	'2023-10-27 00:00:00'	'2023-10-27 01:30:00'
2	'click'	'view'	5000	150000	'2023-10-27 01:30:01'	'2023-10-27 03:00:00'
…	…	…	…	…	…	…
N	'purchase'	'view'	1234567890	1234567890	'2023-10-27 10:00:00'	'2023-10-27 10:59:59'
…	…	…	…	…	…	…

When you execute the query WHERE user_id = 1234567890 AND event_timestamp BETWEEN '2023-10-27 10:00:00' AND '2023-10-27 10:59:59', ClickHouse doesn’t scan the whole table. Instead, it first consults the sparse index.

Index Seek: It looks at the index entries for user_id and event_timestamp.
Range Elimination: For each granule, it checks if the granule’s min_user_id to max_user_id range and min_event_timestamp to max_event_timestamp range could possibly overlap with your query’s user_id = 1234567890 and event_timestamp BETWEEN '2023-10-27 10:00:00' AND '2023-10-27 10:59:59'.
Data Block Pruning: If a granule’s index range cannot contain any matching rows (e.g., a granule’s max_user_id is less than 1234567890 and its max_event_timestamp is before 10:00:00), ClickHouse completely skips reading the data for that entire granule. This is the "sparse" part – it’s not looking at every row, but at blocks of rows.
Data Retrieval: Only the granules that might contain matching data are read from disk.

This dramatically reduces the amount of I/O required. The ORDER BY key is paramount; the columns used in WHERE clauses that are also part of the ORDER BY key are the ones that benefit most directly from this primary sparse index.

The size of a granule (8192 rows) is configurable, but this default is a good balance between index granularity and memory footprint. You can inspect the index for a MergeTree table by querying system.parts_columns and system.columns or by using DESCRIBE TABLE ... to see the ORDER BY key.

The key takeaway is that ClickHouse’s MergeTree engine, when combined with a well-chosen ORDER BY key, builds a primary index that is sparse by design. This index operates on granules, allowing ClickHouse to discard huge chunks of data that cannot possibly satisfy the query conditions, leading to lightning-fast analytical queries on massive datasets.

The next challenge is understanding how to optimize for queries that don’t perfectly align with the primary ORDER BY key.