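With asynchronous inserts enabled, ClickHouse doesn’t make you wait for data to be written to disk before it tells you the insert succeeded, and that’s the most surprising thing about the feature. It is opt-in: a default INSERT into a MergeTree table returns only after a new data part has been written.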

Let’s watch an insert happen.

# Terminal 1: Start a ClickHouse client and prepare to insert
clickhouse-client --host localhost --port 9000 --user default --password ''

# In the client:
:) CREATE TABLE test_async (id UInt64, value String) ENGINE = MergeTree ORDER BY id;
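:) INSERT INTO test_async SETTINGS async_insert = 1, wait_for_async_insert = 0 VALUES (1, 'hello'), (2, 'world');
# Notice how fast this returns. With wait_for_async_insert = 0 it's "successful" as soon as the server has buffered the rows.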

# Terminal 2: Watch the data actually land in MergeTree
# This command tails the ClickHouse server logs, looking for MergeTree activity.
# Adjust path to your ClickHouse log directory if needed.
tail -f /var/log/clickhouse-server/clickhouse-server.log | grep -i 'MergeTree'

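With async_insert = 1 and wait_for_async_insert = 0, the log entries for the new data part appear shortly after the INSERT in Terminal 1 has returned, once the server flushes its buffer. Entries for merges follow later as parts accumulate. Without those settings, the part is written before the INSERT returns.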

The Problem: Insert Latency

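Traditional databases often block your insert operation until the data is durably written to disk. This is great for consistency but can be a major bottleneck for high-throughput applications or services that need to acknowledge data reception quickly. ClickHouse’s default insert path has a similar cost: every INSERT writes a new data part before it returns, so many small inserts are slow individually and also leave many small parts for the server to merge. Imagine a web server logging millions of events per second. Waiting for each log entry to become its own part on disk before responding would cripple performance.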

The Solution: Asynchronous Inserts

ClickHouse’s async inserts decouple receiving the data from writing it as a data part. When you send an INSERT with async_insert = 1 over the HTTP or native interface, the server places the rows in an in-memory buffer instead of writing a part immediately. With wait_for_async_insert = 0 the server acknowledges the request as soon as the rows are buffered; with the default wait_for_async_insert = 1 it holds the acknowledgment until the buffer has been flushed.

Behind the scenes, a background thread within the ClickHouse server flushes each buffer into a new data part once the buffer reaches a size threshold (async_insert_max_data_size) or has waited long enough (async_insert_busy_timeout_ms). Rows from many small inserts end up in one part, which is much cheaper than one part per insert. The usual MergeTree background merges then combine parts over time.

Here’s a simplified breakdown of the flow:

  1. Client Sends Data: Your application sends an INSERT query with async_insert = 1 to the ClickHouse HTTP or native interface.
  2. Immediate Acknowledgment: ClickHouse receives the data and appends it to an in-memory buffer. With wait_for_async_insert = 0 it responds to the client with a success status right away; with the default value of 1 it responds only after step 3.
  3. Background Flush: A background thread in the server watches these buffers. When a buffer is large enough or old enough, it is written to disk as an immutable data part.
  4. Data Visibility: Once a data part is written to disk, it becomes visible for SELECT queries. Rows that are still in the buffer are not visible yet.
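
The four steps above can be sketched as a toy model. This is not ClickHouse code: the class name, `max_rows`, and `busy_timeout_s` are hypothetical stand-ins, loosely modelled on the size and timeout thresholds the server uses, and time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Row = Tuple[int, str]


@dataclass
class ToyAsyncInsertBuffer:
    """Toy model of a server-side insert buffer (illustrative only)."""

    max_rows: int                            # hypothetical size threshold
    busy_timeout_s: float                    # hypothetical time threshold
    write_part: Callable[[List[Row]], None]  # called once per flushed batch
    _rows: List[Row] = field(default_factory=list)
    _first_row_at: float = 0.0

    def insert(self, rows: List[Row], now: float) -> str:
        """Steps 1 and 2: buffer the rows and acknowledge at once."""
        if not self._rows:
            self._first_row_at = now  # start the timeout clock
        self._rows.extend(rows)
        if len(self._rows) >= self.max_rows:
            self.flush()
        return "ok"

    def tick(self, now: float) -> None:
        """Step 3: background check that flushes an old enough buffer."""
        if self._rows and now - self._first_row_at >= self.busy_timeout_s:
            self.flush()

    def flush(self) -> None:
        """Write everything buffered so far as one part (step 4 follows)."""
        batch, self._rows = self._rows, []
        self.write_part(batch)


parts: List[List[Row]] = []  # each element stands for one data part
buf = ToyAsyncInsertBuffer(max_rows=3, busy_timeout_s=0.2, write_part=parts.append)

buf.insert([(1, "hello")], now=0.00)
buf.insert([(2, "world")], now=0.05)
buf.tick(now=0.10)  # too early, nothing is flushed
buf.tick(now=0.25)  # timeout reached: both rows become one part
buf.insert([(3, "a"), (4, "b"), (5, "c")], now=0.30)  # size threshold reached

print(f"{len(parts)} parts written")
```

The point of the sketch is that two separate inserts end up in a single part, while the caller received "ok" for each of them before anything was written.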

When to Use Async Inserts

Async inserts are ideal for:

  • High-volume, low-latency ingestion: Applications that need to send a lot of data quickly and don’t require immediate disk durability for every single record. Think IoT sensor data, application logs, clickstream events.
  • Microservices: When a service needs to quickly confirm it has "received" data before moving on to its next task.
  • Batching: ClickHouse performs best with large batches. Async inserts move the batching to the server, so clients that can only send small inserts still get a fast acknowledgment while the server combines their rows into larger parts.
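
As a rough illustration of how a service might send such inserts, here is a minimal sketch that builds an HTTP request with the async settings passed as URL parameters. The helper name `build_async_insert_request`, the JSONEachRow format, and the address `http://localhost:8123` (the usual default HTTP port) are assumptions for this example; the request is only constructed, not sent.

```python
import json
from typing import Dict, List
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_async_insert_request(
    base_url: str, table: str, rows: List[Dict], wait: bool = False
) -> Request:
    """Build (but do not send) an HTTP INSERT that uses async inserts.

    Hypothetical helper: the table name is interpolated as-is, so it must
    come from trusted code, not from user input.
    """
    params = {
        "query": f"INSERT INTO {table} FORMAT JSONEachRow",
        "async_insert": 1,
        # 0: acknowledge once buffered; 1: acknowledge after the flush
        "wait_for_async_insert": 1 if wait else 0,
    }
    # One JSON object per line, as the JSONEachRow format expects.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    url = f"{base_url}/?{urlencode(params, quote_via=quote)}"
    return Request(url, data=body, method="POST")


req = build_async_insert_request(
    "http://localhost:8123",  # assumed local server on the default HTTP port
    "test_async",
    [{"id": 1, "value": "hello"}, {"id": 2, "value": "world"}],
)
print(req.full_url)
# To send it to a running server: urllib.request.urlopen(req)
```

With `wait=False`, a failure during the later flush is not reported back in the HTTP response, so this mode suits data where an occasional lost batch is acceptable.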

Configuration Levers

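Asynchronous inserts are controlled by query-level settings, which can be set per query (a SETTINGS clause or an HTTP URL parameter), per session, or in a user profile, rather than only in config.xml. The core switches are async_insert (buffer inserts on the server) and wait_for_async_insert (hold the acknowledgment until the buffer is flushed, which is the default), with async_insert_max_data_size and async_insert_busy_timeout_ms deciding when a buffer is flushed. The following settings are related and often confused with them: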

  • insert_quorum: For replicated tables, this setting defines how many replicas must confirm a write before the insert is considered successful. The default is 0, which disables quorum writes (values below 2 have no quorum effect): the insert is acknowledged once the receiving replica has written it, and the other replicas fetch the data in the background. Setting insert_quorum = 'auto' means it will wait for a majority, (N/2) + 1 replicas with integer division, where N is the total number of replicas. Quorum controls replication acknowledgment and is independent of async_insert buffering.
  • background_pool_size and background_schedule_pool_size: These settings control thread pools for background work. background_pool_size covers merges and mutations of MergeTree tables; background_schedule_pool_size covers lightweight periodic tasks such as those of replicated tables. They do not flush async insert buffers, but a larger merge pool can help the server keep up with merging the parts that inserts create, especially on multi-core servers.
  • max_insert_block_size: This defines the maximum number of rows in a block that ClickHouse forms as a single unit when it parses inserted data. Larger blocks produce fewer, larger parts, which leaves less work for background merges.
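
To make the (N/2) + 1 rule from the insert_quorum bullet concrete, here is a small arithmetic sketch. The function name `majority_quorum` is made up for illustration, and the division is assumed to be integer division.

```python
def majority_quorum(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority of replica_count."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    return replica_count // 2 + 1  # integer division, then add one


for n in (2, 3, 4, 5):
    print(f"{n} replicas -> wait for {majority_quorum(n)}")
```

Note that with two replicas a majority means both of them, so a single unavailable replica blocks quorum inserts.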

The "Hidden" Asynchronous Behavior

The key insight is that even a default INSERT into a non-replicated MergeTree table, with no async or quorum settings, is not fully durable when it returns. ClickHouse writes the new data part to the filesystem before acknowledging the insert, but by default it does not fsync it (the MergeTree setting fsync_after_insert is off), so the data may still sit in the operating system’s page cache. Merging that part with others also happens later in the background. Turning on async_insert goes one step further: with wait_for_async_insert = 0, the INSERT returns once the rows are in the server’s in-memory buffer, before any part exists, and a failure during the later flush is not reported back to the client.

The next thing you’ll encounter is handling deduplication for these potentially fast, but perhaps duplicated, inserts.
