Resource pools in ClickHouse are how you carve up your server’s CPU and memory to ensure different types of queries get the resources they need, preventing one runaway query from starving others.
Let’s say you have a ClickHouse cluster and you want to make sure your interactive dashboards don’t get bogged down by heavy batch ETL jobs. You can use resource pools to dedicate a certain amount of CPU and memory to each.
Here’s a simplified setup in config.xml for two pools: one for "interactive" queries and another for "batch" queries.
<yandex>
    <resources>
        <pools>
            <pool>
                <name>interactive</name>
                <max_memory_usage>10G</max_memory_usage>
                <max_cpu_cores>4</max_cpu_cores>
            </pool>
            <pool>
                <name>batch</name>
                <max_memory_usage>50G</max_memory_usage>
                <max_cpu_cores>8</max_cpu_cores>
            </pool>
        </pools>
    </resources>
</yandex>
When you submit a query, you can assign it to a specific pool using the SETTINGS clause. For example, to run a dashboard query against the interactive pool:
SELECT count() FROM events SETTINGS resource_pool = 'interactive';
And a bulk data loading query against the batch pool:
INSERT INTO analytics_data SELECT ... FROM raw_data SETTINGS resource_pool = 'batch';
max_memory_usage caps the RAM available to queries running in a pool. If a query tries to allocate more memory than the limit allows, ClickHouse fails that query with a MEMORY_LIMIT_EXCEEDED error. max_cpu_cores limits the number of CPU cores that queries within the pool may use concurrently. If the total CPU demand from queries in a pool exceeds this value, those queries are throttled.
Pools are not created implicitly. Each one must be defined in the server’s config.xml file, under <resources> and then <pools> inside the root element. The examples here use the legacy <yandex> root; recent ClickHouse releases use <clickhouse> as the root element and still accept <yandex> for backward compatibility. Each <pool> element defines a named resource pool with its own constraints.
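A malformed pool definition is easy to miss, so it can help to sanity-check the file before restarting the server. The sketch below parses a configuration of the shape shown earlier with Python's standard xml.etree.ElementTree. The function name load_pools and the returned dictionary layout are illustrative choices, not part of ClickHouse, and the memory values are kept as raw strings because how suffixes such as "10G" are interpreted is not specified here.

```python
import xml.etree.ElementTree as ET

# Same shape as the config.xml example in this article.
CONFIG = """
<yandex>
    <resources>
        <pools>
            <pool>
                <name>interactive</name>
                <max_memory_usage>10G</max_memory_usage>
                <max_cpu_cores>4</max_cpu_cores>
            </pool>
            <pool>
                <name>batch</name>
                <max_memory_usage>50G</max_memory_usage>
                <max_cpu_cores>8</max_cpu_cores>
            </pool>
        </pools>
    </resources>
</yandex>
"""


def load_pools(xml_text: str) -> dict:
    """Return {pool_name: {"max_memory_usage": str, "max_cpu_cores": int}}.

    Hypothetical helper for checking a config file; raises ValueError on
    missing fields or duplicate pool names.
    """
    root = ET.fromstring(xml_text.strip())
    pools = {}
    for pool in root.findall("./resources/pools/pool"):
        name = pool.findtext("name")
        memory = pool.findtext("max_memory_usage")
        cores = pool.findtext("max_cpu_cores")
        if not name or memory is None or cores is None:
            raise ValueError("pool is missing name, max_memory_usage or max_cpu_cores")
        if name in pools:
            raise ValueError(f"duplicate pool name: {name!r}")
        pools[name] = {"max_memory_usage": memory, "max_cpu_cores": int(cores)}
    return pools


pools = load_pools(CONFIG)
```

Running a check like this in a deployment pipeline catches typos in pool names before a query ever references them.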
The real magic happens in how ClickHouse’s query scheduler uses these definitions. When a query arrives, if it’s explicitly assigned to a pool, the scheduler tries to fit it within that pool’s resource constraints. If it’s not assigned, it falls into a default pool. You can even set a default pool in your users.xml configuration to control what happens to unassigned queries.
Let’s look at users.xml to assign a default pool to a user:
<yandex>
    <users>
        <!-- ClickHouse names each user by its element tag, not by a <name> child -->
        <dashboard_user>
            <default_resource_pool>interactive</default_resource_pool>
            <networks>
                <ip>::/0</ip>
            </networks>
            <profile>default</profile>
        </dashboard_user>
    </users>
</yandex>
This ensures that dashboard_user’s queries, if not explicitly assigned to another pool, will automatically try to use the interactive pool’s resources.
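The resolution order described above (an explicit SETTINGS value first, then the user's default pool, then the server's default pool) can be modeled in a few lines of Python. This is a sketch of the described behavior, not ClickHouse's scheduler code: the function name, the fallback pool name "default", and the choice to raise an error for an unknown pool are all assumptions made for illustration.

```python
from typing import Optional

# Assumed name of the pool that unassigned queries fall into.
FALLBACK_POOL = "default"


def resolve_pool(query_setting: Optional[str],
                 user_default: Optional[str],
                 known_pools: set) -> str:
    """Pick a pool: explicit query setting, then user default, then fallback."""
    for candidate in (query_setting, user_default):
        if candidate is not None:
            # Assumption: referencing an undefined pool is treated as an error.
            if candidate not in known_pools:
                raise ValueError(f"unknown resource pool: {candidate!r}")
            return candidate
    return FALLBACK_POOL


known = {"interactive", "batch"}

# dashboard_user overrides its default for one heavy query.
explicit = resolve_pool("batch", "interactive", known)
# dashboard_user runs a query without a SETTINGS clause.
from_user = resolve_pool(None, "interactive", known)
# A user with no default pool and no SETTINGS clause.
unassigned = resolve_pool(None, None, known)
```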
max_memory_usage is more nuanced than a plain RAM cap: it is a per-query limit within the pool, not a limit on the pool as a whole. With max_memory_usage set to 10G for the interactive pool, each query in that pool can use up to 10 GB of RAM. The pool definition does not bound the combined memory of its queries; that total is governed by the ClickHouse server’s overall memory accounting. The pool setting only prevents any individual query from consuming excessive memory.
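A small worked example makes the per-query semantics concrete. The toy model below is an assumption-laden sketch rather than ClickHouse code: the admit function and the MemoryLimitExceeded class are hypothetical stand-ins, and sizes are plain gigabyte integers to keep the arithmetic visible.

```python
class MemoryLimitExceeded(Exception):
    """Stand-in for ClickHouse's MEMORY_LIMIT_EXCEEDED error (toy model only)."""


def admit(query_gb: int, per_query_limit_gb: int) -> None:
    """Raise if a single query asks for more than the per-query limit."""
    if query_gb > per_query_limit_gb:
        raise MemoryLimitExceeded(
            f"query wants {query_gb} GB, limit is {per_query_limit_gb} GB"
        )


INTERACTIVE_LIMIT_GB = 10  # value from the example pool definition

# Three concurrent queries of 8 GB each all pass the per-query check...
concurrent = [8, 8, 8]
for usage in concurrent:
    admit(usage, INTERACTIVE_LIMIT_GB)

# ...even though together they hold 24 GB, well above the 10 GB setting.
total_gb = sum(concurrent)

# A single 11 GB query, by contrast, is rejected.
try:
    admit(11, INTERACTIVE_LIMIT_GB)
    rejected = False
except MemoryLimitExceeded:
    rejected = True
```

The practical consequence is that pool sizing has to account for expected concurrency, not just the per-query value.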
ClickHouse doesn’t enforce max_cpu_cores as a strict hard limit in the same way it does memory. Instead, it uses this as a hint for the scheduler. If a pool has max_cpu_cores: 4, the scheduler will try to ensure that no more than 4 cores are actively executing tasks for queries within that pool at any given moment. This is achieved by queueing or throttling queries when the limit is approached. It’s a way to manage concurrency and prevent CPU starvation for other pools.
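The queueing behavior described here can be pictured as a simple admission controller. The following Python sketch is a toy model under stated assumptions, not how ClickHouse schedules threads: the CpuPool class, FIFO ordering, and the rule that one query is never granted more cores than the pool allows are all illustrative choices.

```python
from collections import deque


class CpuPool:
    """Toy admission controller: at most max_cpu_cores cores busy at once."""

    def __init__(self, max_cpu_cores: int) -> None:
        self.max_cpu_cores = max_cpu_cores
        self.running = {}       # query_id -> cores granted
        self.waiting = deque()  # (query_id, cores granted once admitted), FIFO

    def cores_in_use(self) -> int:
        return sum(self.running.values())

    def submit(self, query_id: str, cores_wanted: int) -> bool:
        """Start the query if it fits, otherwise queue it. True if started."""
        # Throttle: one query never gets more cores than the pool allows.
        grant = min(cores_wanted, self.max_cpu_cores)
        # Keep FIFO order: do not jump ahead of queries already waiting.
        if not self.waiting and self.cores_in_use() + grant <= self.max_cpu_cores:
            self.running[query_id] = grant
            return True
        self.waiting.append((query_id, grant))
        return False

    def finish(self, query_id: str) -> None:
        """Release a query's cores, then admit waiting queries in FIFO order."""
        del self.running[query_id]
        while self.waiting:
            next_id, grant = self.waiting[0]
            if self.cores_in_use() + grant > self.max_cpu_cores:
                break
            self.waiting.popleft()
            self.running[next_id] = grant


pool = CpuPool(max_cpu_cores=4)
started_q1 = pool.submit("q1", 3)   # fits: 3 of 4 cores busy
started_q2 = pool.submit("q2", 2)   # would need 5 cores, so it waits
pool.finish("q1")                   # frees 3 cores; q2 is admitted
```

A real scheduler would more likely share cores at a finer granularity than whole queries, but the model shows why a saturated pool delays its own queries instead of taking cores from other pools.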
The most surprising thing about resource pools is that they don’t isolate resource usage at the operating system level (for example, with cgroups). ClickHouse manages these limits entirely within its own process. You can control how ClickHouse divides CPU and memory internally, but the operating system still sees a single ClickHouse process, so you need to stay mindful of overall server resources. A "greedy" pool can still affect other processes on the same host, and those processes can in turn starve ClickHouse itself.
If you define a resource pool with max_memory_usage set to 0, the memory limit for that pool is disabled, and queries within it can consume as much memory as is available on the server, up to the system’s limits. This is rarely desirable but can be useful for debugging or for specific, controlled scenarios.
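The zero-means-unlimited convention can be expressed as a small helper. This is a sketch only: effective_limit is a hypothetical function, and server_limit_bytes stands in for whatever server-wide or system ceiling applies on a given host.

```python
GIB = 1024 ** 3


def effective_limit(pool_limit_bytes: int, server_limit_bytes: int) -> int:
    """Return the memory ceiling that actually applies to a query.

    A pool limit of 0 means "no pool-level limit", so only the
    server-wide ceiling remains.
    """
    if pool_limit_bytes < 0 or server_limit_bytes <= 0:
        raise ValueError("pool limit must be >= 0 and server limit must be > 0")
    if pool_limit_bytes == 0:
        return server_limit_bytes
    return min(pool_limit_bytes, server_limit_bytes)


unlimited_pool = effective_limit(0, 64 * GIB)        # only the server cap applies
normal_pool = effective_limit(10 * GIB, 64 * GIB)    # the pool cap applies
oversized_pool = effective_limit(100 * GIB, 64 * GIB)  # server cap still wins
```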
The next concept you’ll likely encounter is how to monitor the actual resource usage of these pools to ensure your configuration is effective.