The ClickHouse Log engine family (TinyLog, StripeLog, and Log) is a specialized set of table engines designed for scenarios where data is appended and rarely, if ever, updated or deleted. In those specific use cases, they carry less overhead than the more general-purpose MergeTree family.
Let’s see what this looks like in practice. Imagine you’re ingesting a high volume of application logs. You’d typically create a table like this:
CREATE TABLE application_logs (
    log_timestamp DateTime,
    level String,
    message String,
    host String
) ENGINE = Log;
When you insert data, each column’s values are simply appended to the end of that column’s file.
INSERT INTO application_logs VALUES (now(), 'INFO', 'User logged in', 'webserver01');
INSERT INTO application_logs VALUES (now() - INTERVAL 1 MINUTE, 'WARN', 'Disk space low', 'dbserver01');
Now, let’s contrast this with a MergeTree table for the same data.
CREATE TABLE application_logs_mergetree (
    log_timestamp DateTime,
    level String,
    message String,
    host String
) ENGINE = MergeTree()
ORDER BY log_timestamp;
When you insert into application_logs_mergetree, each INSERT writes a new data part whose rows are sorted by the log_timestamp ordering key, and ClickHouse later merges those parts in the background. This sorting and merging, while essential for query performance on MergeTree, is overhead that Log engines bypass.
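One way to observe the part-per-insert behavior is to look at the system.parts table after a couple of inserts. The following is a minimal sketch, assuming the table lives in the current database and that inserts are synchronous; the row values are made up for illustration, and the number of parts you see may differ if a background merge has already run.

```sql
-- Two separate INSERTs; each one is expected to create its own data part.
INSERT INTO application_logs_mergetree VALUES (now(), 'INFO', 'User logged in', 'webserver01');
INSERT INTO application_logs_mergetree VALUES (now(), 'WARN', 'Disk space low', 'dbserver01');

-- List the currently active parts of the table.
SELECT name, rows, active
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'application_logs_mergetree'
  AND active;

-- Optionally force a merge instead of waiting for the background scheduler.
OPTIMIZE TABLE application_logs_mergetree FINAL;
```

Running the SELECT again after the OPTIMIZE should show the small parts combined into fewer, larger ones.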
The core problem Log engines solve is the overhead associated with data manipulation in MergeTree tables. MergeTree tables are optimized for analytical queries that benefit from sorted data and efficient data skipping. This is achieved through sorting data by the ORDER BY key, creating sparse primary indexes, and background merging of data parts. However, for pure append-only workloads, these features are either unnecessary or introduce performance penalties during ingestion. The Log engine, by contrast, is designed for cheap, simple appends. It writes each column sequentially to its own file (plus a small marks file that enables parallel reads), with no sorting, background merging, or primary index. This simplicity keeps per-insert overhead low, but it has a cost: every query reads the requested columns in full, and the table is locked for the duration of each write. The ClickHouse documentation accordingly positions these engines for many small tables (up to about one million rows each) rather than for one very large table.
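The Log engine has two siblings in the same family, and creating them looks identical apart from the engine name. The sketch below uses hypothetical table names (application_logs_tiny and application_logs_stripe); the comments summarize the on-disk layout as described in the ClickHouse documentation.

```sql
-- TinyLog: one file per column and no marks file; the simplest variant,
-- without parallel reads inside a single query.
CREATE TABLE application_logs_tiny (
    log_timestamp DateTime,
    level String,
    message String,
    host String
) ENGINE = TinyLog;

-- StripeLog: all columns are stored together in a single data file,
-- which keeps the file count low when there are many small tables.
CREATE TABLE application_logs_stripe (
    log_timestamp DateTime,
    level String,
    message String,
    host String
) ENGINE = StripeLog;
```

None of the three accepts an ORDER BY key or builds an index, so the choice between them is mostly about file layout and read parallelism.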
The primary levers you control are the choice of engine itself and, for MergeTree variants, the ORDER BY key and partitioning. For Log engines, there’s no ORDER BY or partitioning to configure; the engine’s behavior is fixed. This makes them incredibly simple to use but also less flexible.
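To make those MergeTree levers concrete, here is a minimal sketch of the same schema with both a partitioning expression and a compound ordering key. The table name application_logs_partitioned and the specific choice of keys are assumptions for illustration, not a recommendation for every workload.

```sql
CREATE TABLE application_logs_partitioned (
    log_timestamp DateTime,
    level String,
    message String,
    host String
) ENGINE = MergeTree()
-- One partition per calendar month, which makes it cheap to drop old months.
PARTITION BY toYYYYMM(log_timestamp)
-- Rows within each part are sorted by host, then by time.
ORDER BY (host, log_timestamp);
```

A Log table has no equivalent clauses: its definition ends at ENGINE = Log.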
One limitation that is easy to overlook is that Log-family tables do not support replication and are not intended for concurrent writes from multiple servers in a distributed ClickHouse setup. They are designed for single-server ingestion where you might have one or more client applications writing to a single ClickHouse node. If you need distributed ingestion with append-only semantics, you’d typically use a MergeTree table on each shard (ReplacingMergeTree or CollapsingMergeTree only if you have specific deduplication or collapsing needs) and either manage the distributed writes at the application or ingestion layer, or insert through a Distributed table, which routes rows to the appropriate shards.
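As a rough sketch of the shard-routing option, a Distributed table can sit in front of the per-shard MergeTree tables. This assumes a cluster named my_cluster is already defined in the server configuration (the name is hypothetical) and that application_logs_mergetree exists on every shard; the table name application_logs_dist and the rand() sharding key are illustrative choices.

```sql
-- Routing table: stores no data itself, only forwards reads and writes.
-- my_cluster is a hypothetical cluster name from the server configuration.
CREATE TABLE application_logs_dist AS application_logs_mergetree
ENGINE = Distributed(my_cluster, currentDatabase(), application_logs_mergetree, rand());

-- Inserts go to the Distributed table and are spread across shards by rand().
INSERT INTO application_logs_dist VALUES (now(), 'INFO', 'User logged in', 'webserver01');

-- Reads fan out to every shard and the results are combined.
SELECT level, count() AS events
FROM application_logs_dist
GROUP BY level;
```

A random sharding key spreads rows evenly; a key such as a hash of host would instead keep each host's rows on one shard.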
The next concept you’ll likely encounter is handling data retention and querying performance on these very simple, un-indexed tables.