ClickHouse’s window functions compute values across a set of rows related to the current row without collapsing those rows into one, which makes them a natural fit for time-series data.
Let’s see this in action. Imagine we have a table sensor_readings with timestamp (DateTime), sensor_id (UInt8), and value (Float64) columns. We want to calculate the 5-minute moving average of value for each sensor_id.
SELECT
    timestamp,
    sensor_id,
    value,
    avg(value) OVER (
        PARTITION BY sensor_id
        ORDER BY timestamp
        -- 300 seconds = 5 minutes; ClickHouse takes a numeric offset here, not INTERVAL syntax
        RANGE BETWEEN 300 PRECEDING AND CURRENT ROW
    ) AS moving_avg_5min
FROM sensor_readings
WHERE timestamp BETWEEN '2023-10-26 00:00:00' AND '2023-10-26 01:00:00'
ORDER BY sensor_id, timestamp;
This query, when run on a moderately sized dataset, will output something like this:
┌───────────timestamp─┬─sensor_id─┬─value─┬────moving_avg_5min─┐
│ 2023-10-26 00:00:00 │         1 │  25.5 │               25.5 │
│ 2023-10-26 00:01:30 │         1 │  26.1 │               25.8 │
│ 2023-10-26 00:03:00 │         1 │  25.8 │               25.8 │
│ 2023-10-26 00:05:00 │         1 │  27.0 │               26.1 │
│ 2023-10-26 00:07:00 │         1 │  26.5 │ 26.433333333333334 │
│ 2023-10-26 00:00:00 │         2 │  15.2 │               15.2 │
│ 2023-10-26 00:02:00 │         2 │  14.8 │               15.0 │
│ 2023-10-26 00:04:00 │         2 │  15.5 │ 15.166666666666666 │
└─────────────────────┴───────────┴───────┴────────────────────┘
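To reproduce the rows above, a minimal setup might look like the following sketch. The column names and types come from the description of sensor_readings; the MergeTree engine and the (sensor_id, timestamp) sorting key are assumptions chosen for illustration.

```sql
-- Hypothetical schema: engine and sorting key are assumptions, not taken from the article.
CREATE TABLE sensor_readings
(
    timestamp DateTime,
    sensor_id UInt8,
    value     Float64
)
ENGINE = MergeTree
ORDER BY (sensor_id, timestamp);

-- Sample rows matching the readings shown in the result table.
INSERT INTO sensor_readings (timestamp, sensor_id, value) VALUES
    ('2023-10-26 00:00:00', 1, 25.5),
    ('2023-10-26 00:01:30', 1, 26.1),
    ('2023-10-26 00:03:00', 1, 25.8),
    ('2023-10-26 00:05:00', 1, 27.0),
    ('2023-10-26 00:07:00', 1, 26.5),
    ('2023-10-26 00:00:00', 2, 15.2),
    ('2023-10-26 00:02:00', 2, 14.8),
    ('2023-10-26 00:04:00', 2, 15.5);
```

Sorting the table by (sensor_id, timestamp) mirrors the PARTITION BY and ORDER BY of the window, which is a common choice for per-sensor time-series queries.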
The core problem window functions solve is performing calculations over a "window" of rows without resorting to self-joins or complex subqueries, which are often inefficient for large datasets, especially time-series. For time-series metrics, this means easily calculating things like moving averages, cumulative sums, or ranking over time partitions.
The OVER clause defines the window. PARTITION BY sensor_id tells ClickHouse to perform the calculation independently for each sensor_id. ORDER BY timestamp is crucial for time-series, as it defines the order within each partition. The frame clause then selects rows relative to the current one. A 5-minute lookback is written RANGE BETWEEN 300 PRECEDING AND CURRENT ROW, because ClickHouse’s documentation lists INTERVAL syntax as unsupported in RANGE frames over DateTime and asks for the offset in seconds instead. For each row, the frame covers all preceding rows in the same sensor_id partition whose timestamp is at most 300 seconds earlier, plus the current row itself.
The avg() function then operates on the value column for all rows within this window. This usually performs better than a self-join of sensor_readings to itself on a time-interval condition, because each partition is sorted once and the frame slides over it instead of re-matching rows for every output row. The ORDER BY inside OVER does require the data to be sorted by sensor_id and timestamp; when the table’s sorting key already matches that order, ClickHouse may be able to skip part of that sort.
The PARTITION BY clause is your primary tool for segmenting your data. If you were calculating a moving average across all sensors, you’d omit it. If the table also had a region_id column and you wanted one average per region, you would use PARTITION BY region_id; PARTITION BY region_id, sensor_id keeps a separate window for each sensor within each region. The ORDER BY within the OVER clause dictates the order within each partition, which is almost always chronological for time-series. The frame clause (ROWS BETWEEN or RANGE BETWEEN) defines how many rows or what value range constitutes the "window" for each calculation. RANGE BETWEEN is particularly useful for time-series because it is independent of row density, whereas ROWS BETWEEN always spans a fixed number of data points.
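As a rough illustration of how PARTITION BY changes the result, the sketch below computes the per-sensor average next to an average over all sensors in one query. It assumes the sensor_readings table described earlier; the output column names are made up for this example, and the 300-second offset stands in for 5 minutes.

```sql
SELECT
    timestamp,
    sensor_id,
    value,
    -- One independent window per sensor.
    avg(value) OVER (
        PARTITION BY sensor_id
        ORDER BY timestamp
        RANGE BETWEEN 300 PRECEDING AND CURRENT ROW
    ) AS sensor_avg_5min,
    -- No PARTITION BY: a single window that mixes readings from every sensor.
    avg(value) OVER (
        ORDER BY timestamp
        RANGE BETWEEN 300 PRECEDING AND CURRENT ROW
    ) AS all_sensors_avg_5min
FROM sensor_readings
ORDER BY timestamp, sensor_id;
```

Because a RANGE frame treats rows with equal ORDER BY values as peers, readings from different sensors that share a timestamp all contribute to all_sensors_avg_5min for that timestamp.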
A common pitfall is confusing RANGE frames with ROWS frames. ROWS counts physical rows: ROWS BETWEEN 5 PRECEDING AND CURRENT ROW looks at the previous 5 rows (if they exist) and the current row, regardless of their timestamps. Its offsets are plain row counts, so a time interval has no meaning there. This is usually not what you want for time-series analysis, where the time interval is the critical factor. RANGE BETWEEN 300 PRECEDING AND CURRENT ROW instead compares values of the ORDER BY timestamp column, so the window boundaries follow actual time (300 seconds, i.e. 5 minutes).
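The difference is easiest to see side by side. The following sketch, again assuming the sensor_readings table from earlier and using illustrative column names, computes both frames over the same partitioning and ordering.

```sql
SELECT
    timestamp,
    sensor_id,
    value,
    -- Row-count frame: the current row plus up to 5 earlier rows, whatever their timestamps.
    avg(value) OVER (
        PARTITION BY sensor_id
        ORDER BY timestamp
        ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) AS avg_last_6_rows,
    -- Value-based frame: rows whose timestamp is at most 300 seconds (5 minutes) earlier.
    avg(value) OVER (
        PARTITION BY sensor_id
        ORDER BY timestamp
        RANGE BETWEEN 300 PRECEDING AND CURRENT ROW
    ) AS avg_last_5_min
FROM sensor_readings
ORDER BY sensor_id, timestamp;
```

For sensor 1’s reading at 00:07:00 in the sample output, the ROWS frame spans all five readings (seven minutes of data), while the RANGE frame keeps only the three readings from 00:02:00 onward.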
When using RANGE with an offset, the window must be ordered by exactly one column, and that column must be numeric or a date/time type. ClickHouse uses the values in the ORDER BY column to determine which rows fall within the specified range, and both ends of the range are inclusive. For example, if the current row’s timestamp is 2023-10-26 00:05:00, a frame of 300 PRECEDING (5 minutes expressed in seconds) includes rows with timestamps from 2023-10-26 00:00:00 up to and including 2023-10-26 00:05:00.
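One way to check which rows a frame actually covers is to run ordinary aggregate functions over the same window. This is a debugging sketch under the same assumptions as before (the sensor_readings table, a 300-second offset for 5 minutes); the window name w and the output column names are illustrative.

```sql
SELECT
    timestamp,
    sensor_id,
    value,
    count() OVER w           AS rows_in_frame,        -- how many rows the frame contains
    min(timestamp) OVER w    AS oldest_row_in_frame,  -- earliest timestamp still inside the frame
    groupArray(value) OVER w AS values_in_frame       -- the values that avg() would see
FROM sensor_readings
WINDOW w AS (
    PARTITION BY sensor_id
    ORDER BY timestamp
    RANGE BETWEEN 300 PRECEDING AND CURRENT ROW
)
ORDER BY sensor_id, timestamp;
```

Defining the window once in a WINDOW clause keeps the three expressions on exactly the same frame, which matters when the goal is to explain a surprising average.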
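You can also access values from previous or subsequent rows within the partition. Standard SQL does this with LAG() and LEAD(); ClickHouse has long provided lagInFrame() and leadInFrame(), which only see rows inside the window frame, and depending on your ClickHouse version the standard LAG() and LEAD() names may not be available. For example, lagInFrame(value, 1, 0) OVER (PARTITION BY sensor_id ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) gives you the value of the previous row for that sensor, defaulting to 0 if there’s no previous row. This is useful for calculating deltas or comparing consecutive measurements.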
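A delta between consecutive readings could be sketched as follows. The example uses ClickHouse’s lagInFrame(), the frame-aware counterpart of LAG(), and widens the frame to the whole partition so that the behavior does not depend on the default frame. Using the current value as the default (so the first delta is 0) is a choice made for this example, as are the column names.

```sql
SELECT
    timestamp,
    sensor_id,
    value,
    -- Previous reading for the same sensor; falls back to the current value on the first row.
    lagInFrame(value, 1, value) OVER w AS prev_value,
    -- Change since the previous reading; 0 for the first reading of each sensor.
    value - lagInFrame(value, 1, value) OVER w AS delta
FROM sensor_readings
WINDOW w AS (
    PARTITION BY sensor_id
    ORDER BY timestamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY sensor_id, timestamp;
```

The same window works for leadInFrame(), which needs the frame to extend past the current row in order to see following rows.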
The next logical step after mastering moving averages and simple aggregations is to explore cumulative aggregations with window functions, like SUM(...) OVER (PARTITION BY sensor_id ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
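As a starting point, a cumulative aggregation over the same table might look like this sketch. The frame grows from the first row of each partition up to the current row; the column names are illustrative.

```sql
SELECT
    timestamp,
    sensor_id,
    value,
    sum(value) OVER w AS running_sum,       -- total of all readings so far for this sensor
    max(value) OVER w AS running_max,       -- highest reading seen so far
    count()    OVER w AS readings_so_far    -- number of readings so far
FROM sensor_readings
WINDOW w AS (
    PARTITION BY sensor_id
    ORDER BY timestamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
ORDER BY sensor_id, timestamp;
```

Using ROWS here means each reading extends the running totals by exactly one row, even if two readings for a sensor share the same timestamp.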