ClickHouse’s background merge process, essential for maintaining data efficiency, can become a significant source of I/O if not properly tuned, impacting query performance.
When ClickHouse writes data, it creates new, small data parts. To optimize storage and query speed, these parts are periodically merged into larger ones in the background. This merge process is I/O intensive because it reads existing data parts, combines them, and writes new, larger parts. If merges happen too frequently or involve too many small parts, the constant disk activity can saturate your I/O subsystem, slowing down both background operations and user queries.
The core of ClickHouse’s merge strategy is governed by settings of the MergeTree family of table engines (MergeTree and its variants), together with the server-level background pool sizes. These settings control when merges are triggered and how aggressively they proceed.
1. max_bytes_to_merge_at_max_space_in_pool and max_bytes_to_merge_at_min_space_in_pool
These MergeTree settings dictate the maximum total size (in bytes) of the parts that ClickHouse will merge into one part in a single background task. The first limit applies when the background pool has plenty of free slots, the second when the pool is almost fully occupied. Available disk space can cap the merge size further.
- Diagnosis: Check your current settings:
    SELECT name, value FROM system.merge_tree_settings WHERE name LIKE '%max_bytes_to_merge%';
  Observe the current values. If they are very low, ClickHouse might be triggering many small merges.
- Cause: The limits may be too conservative for your hardware (recent releases default to 150 GiB for the upper limit and 1 MiB for the lower one). If your disks have ample free space and I/O headroom, you can afford to merge larger chunks at once, reducing merge frequency.
  - High I/O: Low max_bytes_to_merge values lead to more frequent, smaller merges.
  - Low I/O: High max_bytes_to_merge values lead to less frequent, larger merges.
- Fix: Increase these values. For example, to allow merges of up to 500 GiB when the pool is mostly idle and 100 GiB when it is nearly full:
    <!-- config.xml: MergeTree defaults live in the <merge_tree> section -->
    <merge_tree>
        <max_bytes_to_merge_at_max_space_in_pool>536870912000</max_bytes_to_merge_at_max_space_in_pool>
        <max_bytes_to_merge_at_min_space_in_pool>107374182400</max_bytes_to_merge_at_min_space_in_pool>
    </merge_tree>
  (Replace with your desired values in bytes; 500 GiB is 536870912000 and 100 GiB is 107374182400.)
- Why it works: By allowing larger merges, ClickHouse needs fewer merge passes to consolidate the same data, so each byte is rewritten fewer times and overall read/write amplification drops.
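Because these limits are plain byte counts, it is easy to confuse decimal gigabytes with binary gibibytes when writing them out. The short Python sketch below converts GiB to bytes; the helper name `gib_to_bytes` is an illustrative assumption, not part of any ClickHouse tooling.

```python
GIB = 1024 ** 3  # one gibibyte expressed in bytes


def gib_to_bytes(gib: float) -> int:
    """Convert a size in GiB to the integer byte count expected in the config."""
    return int(gib * GIB)


if __name__ == "__main__":
    # The 500 GiB and 100 GiB figures are the example limits used in this section.
    for label, gib in (("upper limit", 500), ("lower limit", 100)):
        print(f"{label}: {gib} GiB = {gib_to_bytes(gib)} bytes")
```

Computing the value this way avoids writing 500000000000 (500 GB) where 500 GiB was intended.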
2. number_of_free_entries_in_pool_to_lower_max_size_of_merge
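This MergeTree setting controls when ClickHouse starts reducing the maximum size of a merge as the background pool fills up: once fewer than this many pool entries are free, the size limit begins to drop from the max-space value toward the min-space value. The intent is to keep slots available for small merges instead of filling the pool with long-running ones.
- Diagnosis:
    SELECT name, value FROM system.merge_tree_settings WHERE name = 'number_of_free_entries_in_pool_to_lower_max_size_of_merge';
  Compare the value with the capacity of the background pool. A value close to or above that capacity means that even with a few merges running, ClickHouse will significantly shrink the size of subsequent merges.
- Cause: If this value is too high relative to the pool (for example, equal to the pool’s capacity), ClickHouse reduces merge sizes as soon as a few background merges are active, leading to many small, inefficient merges.
- Fix: Keep this value well below the pool’s capacity. The documented default is 8; lowering it (for example to 4) lets large merges continue for longer, while raising it makes the size reduction start earlier:
    <!-- config.xml, inside the <merge_tree> section -->
    <number_of_free_entries_in_pool_to_lower_max_size_of_merge>4</number_of_free_entries_in_pool_to_lower_max_size_of_merge>
  Setting it too low has a cost: long-running merges can occupy every slot and delay the small merges that keep part counts down.
- Why it works: A lower threshold allows ClickHouse to continue merging larger parts even when some background merge tasks are already running, promoting larger, more efficient merges before resorting to smaller ones.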
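To make the threshold’s effect concrete, the sketch below models how the merge size limit might shrink as the pool fills. It assumes the documented behavior (the limit starts dropping once fewer than the threshold number of entries are free) and an exponential interpolation between the two byte limits. The function name and the exact curve are illustrative assumptions, and the model ignores other caps such as free disk space.

```python
import math


def modeled_max_merge_bytes(free_entries: int, threshold: int,
                            min_bytes: int, max_bytes: int) -> int:
    """Simplified model of the merge size limit for a given pool occupancy.

    free_entries: currently free slots in the background pool
    threshold:    number_of_free_entries_in_pool_to_lower_max_size_of_merge
    min_bytes:    limit used when the pool is almost full
    max_bytes:    limit used when the pool has enough free slots
    """
    if threshold <= 0 or free_entries >= threshold:
        # Enough free slots: the full upper limit applies.
        return max_bytes
    # Fewer free slots than the threshold: interpolate exponentially
    # from min_bytes (no free slots) up to max_bytes (threshold reached).
    ratio = free_entries / threshold
    return int(min_bytes * math.exp(math.log(max_bytes / min_bytes) * ratio))


if __name__ == "__main__":
    min_b, max_b = 1024 ** 2, 150 * 1024 ** 3  # 1 MiB and 150 GiB
    for free in (0, 2, 4, 6, 8, 12):
        limit = modeled_max_merge_bytes(free, 8, min_b, max_b)
        print(f"free={free:2d} -> limit about {limit / 1024 ** 2:.0f} MiB")
```

In this model, a larger threshold moves the point where shrinking begins to a less busy pool, which is why an oversized threshold tends to produce smaller merges.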
3. background_pool_size
This setting determines the number of threads ClickHouse dedicates to background merges and mutations for MergeTree tables.
- Diagnosis:
    SELECT name, value FROM system.server_settings WHERE name = 'background_pool_size';
  (On older releases, where this is still a profile-level setting, query system.settings instead.) Check whether the value is too high or too low for your system’s core count and I/O capabilities.
- Cause:
  - Too High: If background_pool_size is too high, it can lead to excessive competition for I/O resources, causing all background tasks (including merges) to slow down and potentially starve foreground queries.
  - Too Low: If it’s too low, merges might not keep up with data ingestion, leading to an ever-increasing number of small parts.
- Fix: Adjust based on your CPU cores and I/O capacity. A good starting point is often 16 for I/O-bound systems or number_of_cores / 2 for CPU-bound systems.
    <!-- config.xml -->
    <background_pool_size>16</background_pool_size>
- Why it works: This balances the number of concurrent merge operations against available system resources, preventing I/O saturation while ensuring merges progress reasonably.
4. background_schedule_pool_size
This controls the number of threads used for lightweight, periodic background tasks, including the scheduling of merges. The merges themselves run in the pool sized by background_pool_size.
- Diagnosis:
    SELECT name, value FROM system.server_settings WHERE name = 'background_schedule_pool_size';
  (On older releases, query system.settings instead.)
- Cause: If this is too low, ClickHouse might not be able to schedule merges efficiently, leading to delays in starting new merge tasks. If it’s too high, it might consume unnecessary CPU cycles.
- Fix: Values such as 16 or 32 are commonly suggested, but compare them with your version’s default first: recent releases may ship with a higher default, in which case these values would be a reduction.
    <!-- config.xml -->
    <background_schedule_pool_size>32</background_schedule_pool_size>
- Why it works: Ensures that the scheduling of merge tasks isn’t a bottleneck, allowing the background_pool_size threads to be utilized effectively.
5. max_parts_in_total and max_rows_to_merge_at_max_space
While the max_bytes_to_merge limits are primary, these also play a role. max_parts_in_total limits the total number of active parts per table across all partitions, and max_rows_to_merge_at_max_space is meant to limit merges by row count.
- Diagnosis:
    SELECT name, value FROM system.merge_tree_settings WHERE name LIKE '%max_parts_in_total%' OR name LIKE '%max_rows_to_merge%';
  If the query returns no row for one of these names, that setting is not available on your version.
- Cause: If max_parts_in_total is set very low, inserts start failing with a "Too many parts" error before merges can catch up. If max_rows_to_merge_at_max_space is too restrictive, it can also lead to many small merges.
- Fix: Generally, max_parts_in_total can be left at its default (100000 in recent releases), or increased if you have many small inserts. max_rows_to_merge_at_max_space should be considered in conjunction with the max_bytes_to_merge limits. Ensure it’s not overly restrictive.
    <!-- Example: raise max_parts_in_total if needed; usually not the primary tuning knob for I/O -->
    <!-- <max_parts_in_total>200000</max_parts_in_total> -->
- Why it works: These provide secondary constraints, ensuring that the table doesn’t become excessively fragmented and that merges don’t grow too large by row count, complementing the byte-based limits.
General Tuning Strategy:
- Monitor: Use system.merges and system.query_log to understand merge activity and its impact on query performance. Look at elapsed, rows_read, and rows_written in system.merges. High elapsed times for many small merges indicate a problem.
- Increase the max_bytes_to_merge limits: Start by raising max_bytes_to_merge_at_max_space_in_pool and max_bytes_to_merge_at_min_space_in_pool to allow larger, more efficient merges.
- Adjust Pool Sizes: Tune background_pool_size and background_schedule_pool_size to match your hardware.
- Iterate: Make changes gradually and monitor their impact.
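The monitoring step can be sketched in Python as follows. The example assumes you have already exported rows from system.merges into dictionaries keyed by that table’s column names (elapsed, num_parts, total_size_bytes_compressed). The sample rows, the 64 MiB "small merge" cutoff, and the 30-second "slow" cutoff are hypothetical values chosen only for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def small_merge_share(merges, small_bytes=64 * MIB):
    """Fraction of merges whose compressed input is below small_bytes."""
    if not merges:
        return 0.0
    small = sum(1 for m in merges if m["total_size_bytes_compressed"] < small_bytes)
    return small / len(merges)


def slow_small_merges(merges, small_bytes=64 * MIB, slow_seconds=30.0):
    """Small merges that have nevertheless been running longer than slow_seconds."""
    return [
        m for m in merges
        if m["total_size_bytes_compressed"] < small_bytes and m["elapsed"] > slow_seconds
    ]


# Hypothetical snapshot of system.merges, not real measurements.
sample = [
    {"table": "events", "elapsed": 1.2, "num_parts": 6, "total_size_bytes_compressed": 8 * MIB},
    {"table": "events", "elapsed": 45.0, "num_parts": 9, "total_size_bytes_compressed": 12 * MIB},
    {"table": "logs", "elapsed": 0.8, "num_parts": 4, "total_size_bytes_compressed": 30 * MIB},
    {"table": "logs", "elapsed": 900.0, "num_parts": 12, "total_size_bytes_compressed": 40 * GIB},
]

if __name__ == "__main__":
    print(f"share of small merges: {small_merge_share(sample):.2f}")
    for m in slow_small_merges(sample):
        print(f"slow small merge on {m['table']}: {m['elapsed']}s, {m['num_parts']} parts")
```

A high share of small merges, especially slow ones, suggests the size limits or pool settings above deserve a second look before changing anything else.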
When you’ve successfully tuned your merge settings, the next challenge you’ll likely encounter is optimizing query execution plans to leverage the larger, more efficient data parts.