Materialized views in BigQuery don’t speed up queries by pre-calculating every possible combination of your data. Instead, they precompute and store the results of a query you define, typically an aggregation over the subset of data that most of your queries need.
Let’s see this in action. Imagine you have a massive orders table with billions of rows, and you frequently query for the total sales per day.
```sql
-- Your original, slow query
SELECT
  DATE(order_timestamp) AS order_date,
  SUM(order_total) AS daily_sales
FROM
  `my_project.my_dataset.orders`
WHERE
  -- Filter on the calendar date so all of Dec 31 is included; comparing the
  -- raw TIMESTAMP to '2023-12-31' would stop at midnight on that day.
  DATE(order_timestamp) BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY
  order_date
ORDER BY
  order_date;
```

On an unpartitioned table, this query reads the `order_timestamp` and `order_total` values of every row in `orders`, which is slow and expensive at this scale.
Now, let’s create a materialized view that pre-aggregates sales per day:

```sql
-- Creating the materialized view
CREATE MATERIALIZED VIEW `my_project.my_dataset.daily_sales_mv`
OPTIONS (
  enable_refresh = true,
  refresh_interval_minutes = 60  -- Refresh at most once an hour
) AS
SELECT
  DATE(order_timestamp) AS order_date,
  SUM(order_total) AS daily_sales
FROM
  `my_project.my_dataset.orders`
GROUP BY
  order_date;
```

With this materialized view in place, BigQuery’s query optimizer can rewrite your original query to read `daily_sales_mv` instead of the base `orders` table (BigQuery calls this automatic rewriting smart tuning). When you run the same query as before:

```sql
-- The SAME query as before
SELECT
  DATE(order_timestamp) AS order_date,
  SUM(order_total) AS daily_sales
FROM
  `my_project.my_dataset.orders`
WHERE
  DATE(order_timestamp) BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY
  order_date
ORDER BY
  order_date;
```
BigQuery notices that `daily_sales_mv` already contains the aggregated `order_date` and `daily_sales` columns. It then rewrites the query internally to read the materialized view instead, filtering on `order_date`, provided the `WHERE` clause can be expressed in terms of the view’s columns. The query plan shows a read from `daily_sales_mv` rather than `orders`; because the view holds at most one row per day, the query typically runs much faster and costs far less.
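For comparison, here is a sketch of roughly what the rewritten query amounts to, written directly against the view. It assumes the view was created as `daily_sales_mv` as shown earlier; querying the view explicitly like this is also an option if you would rather not depend on automatic rewriting.

```sql
-- Roughly what the optimizer's rewrite amounts to: read the small,
-- pre-aggregated view instead of the billions of raw rows in `orders`.
SELECT
  order_date,
  daily_sales
FROM
  `my_project.my_dataset.daily_sales_mv`
WHERE
  order_date BETWEEN '2023-01-01' AND '2023-12-31'
ORDER BY
  order_date;
```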
The core problem materialized views solve is reducing the amount of data scanned for repetitive aggregation or filtering operations. Instead of scanning billions of raw rows every time, BigQuery scans a pre-computed, smaller table. This is particularly effective for:
- Aggregations: `SUM`, `COUNT`, `AVG`, `MIN`, `MAX` over specific dimensions.
- Filtering on frequently used columns: if you always filter by `region` or `product_category`.
- Joining smaller dimension tables to large fact tables: pre-joining and aggregating can be very effective.
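As a sketch of the pre-join pattern, the view below joins orders to a dimension table and aggregates sales per day and category. The `products` table, the `product_id` column on `orders`, and the view name are hypothetical; adapt them to your schema. BigQuery restricts which joins a materialized view may contain and how they refresh, so check the current limitations before relying on a definition like this.

```sql
-- Hypothetical schema: assumes `orders` has a product_id column and a
-- `products` dimension table keyed by product_id with a product_category.
CREATE MATERIALIZED VIEW `my_project.my_dataset.category_daily_sales_mv` AS
SELECT
  DATE(o.order_timestamp) AS order_date,
  p.product_category,
  SUM(o.order_total) AS category_sales,
  COUNT(*) AS order_count  -- one row per order, assuming one product per order
FROM
  `my_project.my_dataset.orders` AS o
INNER JOIN
  `my_project.my_dataset.products` AS p
ON
  o.product_id = p.product_id
GROUP BY
  DATE(o.order_timestamp),
  p.product_category;
```

Queries that filter or group by `product_category` per day could then be answered from this much smaller table instead of re-joining the fact table every time.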
How it Works Internally:
When you create a materialized view, BigQuery doesn’t just store the result of the SELECT statement. It stores the definition of the view and a mechanism to incrementally update it.
- Creation: BigQuery runs the `CREATE MATERIALIZED VIEW` statement and populates the view’s storage with the initial aggregated data.
- Query rewriting: When you submit a query, BigQuery’s optimizer analyzes it. If the query can be satisfied entirely or partially by an existing materialized view, the optimizer rewrites it to use that view. This rewriting is automatic and transparent to the user.
- Incremental refresh: The `enable_refresh = true` and `refresh_interval_minutes` options tell BigQuery to refresh the materialized view automatically in the background. When new data arrives in the base table (`orders` in our example), BigQuery detects the changes and applies them to the view. For supported view definitions this is incremental: only the new or changed base-table data is processed, which makes a refresh much cheaper than a full recomputation.
- Staleness: The data stored in a materialized view can lag behind the base table between refreshes, but by default query results are still fresh. BigQuery combines the view’s stored results with any base-table changes made since the last refresh, and if those changes can’t be combined this way (for example, some updates or deletions), it reads the base table directly. `refresh_interval_minutes` therefore affects how much catch-up work a query has to do, not whether it sees new rows. Only if you explicitly set the `max_staleness` option can BigQuery return results that are older, in exchange for lower cost.
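To watch the refresh machinery directly, you can trigger a refresh by hand with the `BQ.REFRESH_MATERIALIZED_VIEW` system procedure and then look up when the view was last refreshed. The second query is a sketch that assumes the dataset-level `INFORMATION_SCHEMA.MATERIALIZED_VIEWS` view, with its `last_refresh_time` and `refresh_watermark` columns, is available in your project.

```sql
-- Refresh now instead of waiting for the next automatic refresh.
CALL BQ.REFRESH_MATERIALIZED_VIEW('my_project.my_dataset.daily_sales_mv');

-- Assumption: dataset-scoped INFORMATION_SCHEMA.MATERIALIZED_VIEWS exposes
-- last_refresh_time and refresh_watermark for each view in the dataset.
SELECT
  table_name,
  last_refresh_time,
  refresh_watermark
FROM
  `my_project.my_dataset`.INFORMATION_SCHEMA.MATERIALIZED_VIEWS
WHERE
  table_name = 'daily_sales_mv';
```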
Levers You Control:
- `OPTIONS (enable_refresh = true, refresh_interval_minutes = X)`: This is your primary control over whether the view refreshes automatically and how often. A smaller `X` keeps the stored data closer to the base table, so queries do less catch-up work, at the price of more frequent (and potentially more costly) refreshes.
- The `SELECT` statement within `CREATE MATERIALIZED VIEW`: This defines what data is precomputed. Design it carefully to cover your most common query patterns.
- Partitioning and clustering of the materialized view: Just like base tables, materialized views can be partitioned and clustered to speed up reads of the view itself. For example, if queries against the view filter heavily on `order_date`, partitioning the view by `order_date` is a good idea. BigQuery only allows this when the view’s partitioning lines up with the base table’s, so `orders` would need to be partitioned on `order_timestamp` as well.
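Here is a sketch of the partitioning and clustering lever, followed by changing the refresh interval of an existing view. It assumes `orders` is partitioned by day on `order_timestamp` (so the view’s partitioning can align with it) and has a `region` column; both are assumptions, and the view name `daily_sales_by_region_mv` is hypothetical.

```sql
-- Assumes `orders` is partitioned by day on order_timestamp and has a
-- hypothetical `region` column; the view's partitioning must align with it.
CREATE MATERIALIZED VIEW `my_project.my_dataset.daily_sales_by_region_mv`
PARTITION BY order_date
CLUSTER BY region
OPTIONS (
  enable_refresh = true,
  refresh_interval_minutes = 60
) AS
SELECT
  DATE(order_timestamp) AS order_date,
  region,
  SUM(order_total) AS daily_sales
FROM
  `my_project.my_dataset.orders`
GROUP BY
  DATE(order_timestamp),
  region;

-- Tune the refresh frequency later without recreating the view.
ALTER MATERIALIZED VIEW `my_project.my_dataset.daily_sales_mv`
SET OPTIONS (refresh_interval_minutes = 30);
```

Clustering on `region` lets queries that filter by region within a date range skip blocks inside each partition of the view.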
A common misconception is that materialized views are a silver bullet for all performance problems. They are most effective when there’s a significant overlap between the data required by your queries and the data pre-computed by the materialized view. If your queries constantly hit different subsets of data, or require extremely low latency (near real-time), a materialized view might not be the best fit, or its definition might need to be more complex. BigQuery’s query optimizer is quite sophisticated; it will only use a materialized view if it determines it will result in a faster and/or cheaper query plan. You can inspect the query plan to confirm if a materialized view was used.
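Besides opening the query plan in the console, one way to check whether a materialized view was used is to inspect recent jobs in `INFORMATION_SCHEMA`. This sketch assumes your jobs run in the US multi-region (adjust `region-us` otherwise) and that your jobs view exposes a `materialized_view_statistics` column reporting which views were considered for each query and whether each was chosen; treat that column as an assumption to verify.

```sql
-- Assumptions: jobs run in the US multi-region, and JOBS_BY_USER exposes
-- materialized_view_statistics (views considered, chosen or rejected).
SELECT
  job_id,
  creation_time,
  total_bytes_processed,
  materialized_view_statistics
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND job_type = 'QUERY'
ORDER BY
  creation_time DESC
LIMIT 20;
```

Comparing `total_bytes_processed` for the same query before and after creating the view is also a quick, schema-independent signal that the rewrite happened.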
The next step after optimizing aggregation queries with materialized views is often exploring how to handle complex joins that can also be accelerated.