ClickHouse’s Time To Live (TTL) feature can automatically delete old data, but it’s not the "set it and forget it" feature many expect.

Let’s see it in action. Imagine a logs table:

CREATE TABLE logs (
    event_date Date,
    event_time DateTime,
    message String
) ENGINE = MergeTree()
ORDER BY (event_date, event_time);

We want to delete logs older than 30 days. We add a TTL clause:

ALTER TABLE logs
    MODIFY TTL event_time + INTERVAL 30 DAY DELETE;
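
The same rule can also be declared when a table is created. A minimal sketch, using a hypothetical table name logs_with_ttl so it does not clash with logs:

```sql
-- Hypothetical table: same schema as logs, with the TTL declared up front.
CREATE TABLE logs_with_ttl (
    event_date Date,
    event_time DateTime,
    message String
) ENGINE = MergeTree()
ORDER BY (event_date, event_time)
TTL event_time + INTERVAL 30 DAY DELETE;

-- Show the definition ClickHouse stored for the original table,
-- including the TTL clause added by ALTER.
SHOW CREATE TABLE logs;
```

DELETE is the default TTL action, so the keyword can be omitted.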

Now, when you insert data:

INSERT INTO logs VALUES (toDate('2023-01-01'), toDateTime('2023-01-01 10:00:00'), 'Log message 1');
INSERT INTO logs VALUES (toDate('2023-01-02'), toDateTime('2023-01-02 11:00:00'), 'Log message 2');
-- ... many more inserts ...
INSERT INTO logs VALUES (toDate('2023-03-15'), toDateTime('2023-03-15 12:00:00'), 'Recent log');

ClickHouse doesn’t immediately purge the old rows, and it doesn’t flag individual rows either. Each data part records the earliest and latest TTL timestamps of the rows it holds, and the actual deletion happens during background merges: when a part containing expired rows is rewritten, those rows are left out of the resulting part.
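
One way to see the delay, as a sketch against the logs table above, is to count rows that are already past the 30-day cutoff but are still stored:

```sql
-- Rows whose TTL timestamp is in the past but that have not been merged away yet.
SELECT count() AS expired_but_present
FROM logs
WHERE event_time + INTERVAL 30 DAY < now();
```

A non-zero result right after inserting is expected. Queries that must never return expired rows should apply the same filter explicitly rather than rely on TTL timing.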

The core problem TTL solves is managing storage growth in analytical databases where historical data often becomes less relevant but still consumes significant space. Manually deleting old data can be inefficient and error-prone. TTL automates this process based on defined conditions.

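Internally, ClickHouse handles TTL as part of its background merge machinery. Each part stores TTL metadata, and when a part holds expired rows, ClickHouse schedules a merge that rewrites the part without them; a part whose rows have all expired can be dropped outright. The other TTL actions work differently: TO DISK and TO VOLUME move whole parts to other storage once every row in the part has passed the threshold, RECOMPRESS rewrites parts with a different codec, and GROUP BY collapses expired rows into aggregates. There is no TTL action that updates rows or writes them to a different table.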
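
The per-part bookkeeping can be inspected in system.parts. A sketch, assuming the logs table lives in the current database and has a DELETE TTL:

```sql
SELECT
    name,
    rows,
    delete_ttl_info_min,  -- earliest row expiry recorded for the part
    delete_ttl_info_max   -- latest row expiry recorded for the part
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'logs'
  AND active
ORDER BY name;
```

A part whose delete_ttl_info_max is already in the past contains only expired rows.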

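The primary lever you control is the TTL expression. It is evaluated for each row and must produce a Date or DateTime value. It can be a simple time offset, like event_time + INTERVAL 30 DAY, or a more complex expression involving other columns. The action that follows the expression decides what happens once that time has passed. For example, you could move older, less frequently accessed data to colder storage:

ALTER TABLE logs
    MODIFY TTL event_time + INTERVAL 90 DAY TO VOLUME 'cold';

Here, 'cold' is a placeholder name: it must be a volume defined in the table’s storage policy, typically backed by cheaper, slower disks. TTL cannot move rows into a separate table, so archiving into something like logs_archive, with a different engine or schema optimized for archival, has to be done outside TTL, for example with a scheduled INSERT ... SELECT.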
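
A TTL rule can also be narrowed with a WHERE clause so that it applies only to some rows. A sketch, assuming (hypothetically) that debug lines in message start with the literal prefix DEBUG:

```sql
ALTER TABLE logs
    MODIFY TTL
        -- Assumed convention: debug messages begin with 'DEBUG'.
        event_time + INTERVAL 7 DAY DELETE WHERE message LIKE 'DEBUG%';
```

MODIFY TTL replaces the table’s existing TTL rather than adding to it, so after this statement only the debug rule remains. To keep several rules, list them comma-separated in a single MODIFY TTL.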

The DELETE action for TTL is asynchronous. A row is not removed the moment event_time + INTERVAL 30 DAY passes; it stays physically in its data part, and visible to queries, until a merge rewrites that part. Besides regular merges, ClickHouse schedules TTL-driven merges, throttled by the MergeTree setting merge_with_ttl_timeout (14400 seconds, or 4 hours, by default). Cleanup timing therefore depends on that setting, on the volume of data being inserted, and on how busy the background pool is. You can monitor merge activity via system tables like system.merges.
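
As a sketch of how the monitoring and tuning might look for the logs table (the 3600-second value is only an illustration, not a recommendation):

```sql
-- Merges currently running for the table; merge_type distinguishes
-- regular merges from TTL-driven ones.
SELECT table, elapsed, progress, merge_type
FROM system.merges
WHERE database = currentDatabase() AND table = 'logs';

-- Allow TTL merges to be scheduled more often than the default.
ALTER TABLE logs MODIFY SETTING merge_with_ttl_timeout = 3600;

-- Force a merge now. This rewrites parts and can be expensive on large tables.
OPTIMIZE TABLE logs FINAL;
```

Forcing merges trades background I/O for faster cleanup, so it is better suited to occasional use than to a routine schedule on large tables.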

If you’re expecting immediate data removal upon expiration, you’ll be disappointed. The delay is inherent to how MergeTree handles data modifications and optimizations.

The next challenge will be managing the lifecycle of data once TTL has moved it to colder storage.
