Materialized views in ClickHouse are not just pre-aggregated tables; they are a fundamental mechanism for query acceleration that operates by proactively materializing query results rather than lazily computing them on demand.

Let’s see this in action. Imagine a clicks table logging user interactions:

CREATE TABLE clicks
(
    event_time DateTime,
    user_id UInt64,
    event_type String,
    country String
) ENGINE = MergeTree()
ORDER BY (event_time, user_id);

INSERT INTO clicks VALUES
('2023-10-26 10:00:00', 123, 'view', 'US'),
('2023-10-26 10:01:00', 456, 'click', 'CA'),
('2023-10-26 10:02:00', 123, 'purchase', 'US'),
('2023-10-26 10:03:00', 789, 'view', 'DE'),
('2023-10-26 10:04:00', 123, 'view', 'US');

Now, we frequently need to count daily unique users per country. A direct query would scan the clicks table:

SELECT
    toDate(event_time) AS event_date,
    country,
    count(DISTINCT user_id) AS unique_users
FROM clicks
WHERE event_time >= '2023-10-26 00:00:00' AND event_time < '2023-10-27 00:00:00'
GROUP BY event_date, country
ORDER BY event_date, country;

This is fine for small datasets, but for billions of rows, it’s slow. A materialized view, however, pre-computes these aggregates as data is inserted into the base table.

CREATE MATERIALIZED VIEW daily_unique_users_mv
TO clicks_aggregated -- This is the target table where data will be stored
AS
SELECT
    toDate(event_time) AS event_date,
    country,
    count(DISTINCT user_id) AS unique_users
FROM clicks
GROUP BY event_date, country;

And we need a destination table for the materialized view:

CREATE TABLE clicks_aggregated
(
    event_date Date,
    country String,
    unique_users AggregateFunction(count(DISTINCT user_id), UInt64) -- Stores aggregate state
) ENGINE = AggregatingMergeTree(event_date, country)
ORDER BY (event_date, country);

Now, when we insert data into clicks:

INSERT INTO clicks VALUES ('2023-10-26 10:05:00', 999, 'view', 'US');

The daily_unique_users_mv automatically processes this new data and updates clicks_aggregated. The key here is that clicks_aggregated stores AggregateFunction states, not raw counts. This is crucial for incremental updates.

To query the materialized view, we don’t query daily_unique_users_mv directly. Instead, we query the target table (clicks_aggregated) using finalizeAggregation:

SELECT
    event_date,
    country,
    finalizeAggregation(unique_users) AS unique_users
FROM clicks_aggregated
WHERE event_date = '2023-10-26'
GROUP BY event_date, country
ORDER BY event_date, country;

This query is lightning fast because it reads pre-aggregated, summarized data from clicks_aggregated and only performs the final aggregation step.

The mental model is this: the materialized view acts as a background process. For every INSERT into the source table (clicks), it runs the SELECT query defined in the view’s CREATE statement, but instead of writing to a regular table, it inserts into a designated target table (clicks_aggregated). The AggregateFunction engine in the target table is what makes this efficient for aggregations; it stores intermediate states that can be combined incrementally. The ORDER BY clause of the target table (AggregatingMergeTree) should ideally align with the GROUP BY of the materialized view for optimal merging and aggregation.

The ENGINE = AggregatingMergeTree and the use of AggregateFunction in the target table are not strictly required for all materialized views, but they are the most common and powerful pattern for speeding up aggregation queries. If the materialized view’s definition involved a simple SUM(amount) without DISTINCT, the target table could be a regular MergeTree and the view would store the sums directly. However, for COUNT(DISTINCT) or AVG, AggregateFunction is essential.

The most surprising thing about materialized views is that they don’t just store results; they store intermediate aggregation states that can be merged and finalized incrementally, allowing for near-instantaneous updates to aggregated results as new data arrives in the base table. This is fundamentally different from a traditional pre-aggregated table where you’d have to rebuild the entire aggregate.

What most people don’t realize is that the ORDER BY clause of the target table for an AggregatingMergeTree should ideally match the GROUP BY clause of the materialized view definition. This alignment allows ClickHouse to perform more efficient merges of the aggregate states within the AggregatingMergeTree engine, significantly reducing the computational cost of background merges and improving query performance when finalizeAggregation is called. For example, if your MV groups by event_date, country, your AggregatingMergeTree table should also be ordered by event_date, country.

Next, you’ll likely encounter the challenge of managing the storage for materialized views, especially when they become very large, leading into the realm of data lifecycle management and potentially using different MergeTree variants for the target tables.

Want structured learning?

Take the full Clickhouse course →