ClickHouse’s tiered storage for S3 stores ordinary MergeTree data parts in an S3 bucket and presents that bucket to the server as just another disk, so cold data stays queryable without occupying local storage.
Let’s say you have a massive dataset in ClickHouse, and you want to keep it accessible but reduce your hot storage costs. Tiered storage lets you define a storage policy with a "hot" volume on local disk and a "cold" volume on an S3-backed disk. ClickHouse moves data parts from the hot volume to the cold one according to TTL rules, free-space thresholds, or explicit commands. When a query hits a part on the cold volume, ClickHouse transparently reads the byte ranges it needs from S3; nothing is kept locally afterwards unless you have configured a cache for that disk. It’s like having a local disk, but the backing store is S3.
Here’s a table definition that uses tiered storage. Notice the storage_policy setting at the end:
CREATE TABLE my_s3_data (
    event_date Date,
    user_id UInt64,
    event_type String,
    value Float64
)
ENGINE = MergeTree()
ORDER BY (event_date, user_id)
PARTITION BY toYYYYMM(event_date)
SETTINGS storage_policy = 's3_cold_policy';
And here’s how the s3_cold_policy would be defined in your ClickHouse config.xml or a separate file in conf.d:
<clickhouse> <!-- Older releases use <yandex> as the root element -->
    <storage_configuration>
        <disks>
            <!-- The "default" local disk is implicit: it is the server's data path, typically /var/lib/clickhouse/ -->
            <s3_disk>
                <type>s3</type>
                <!-- Bucket, region, and key prefix are all part of the endpoint URL, which must end with a slash -->
                <endpoint>https://my-clickhouse-cold-bucket.s3.us-east-1.amazonaws.com/disks/s3_disk/</endpoint>
                <!-- Optional: static credentials if not using IAM roles -->
                <!--
                <access_key_id>YOUR_ACCESS_KEY_ID</access_key_id>
                <secret_access_key>YOUR_SECRET_ACCESS_KEY</secret_access_key>
                -->
                <!-- Take credentials from the environment or the EC2 instance profile -->
                <use_environment_credentials>true</use_environment_credentials>
            </s3_disk>
        </disks>
        <policies>
            <s3_cold_policy>
                <volumes>
                    <!-- Volumes are used in order: new parts go to the first volume that can hold them -->
                    <hot_local>
                        <disk>default</disk>
                    </hot_local>
                    <s3_cold>
                        <disk>s3_disk</disk>
                    </s3_cold>
                </volumes>
                <!-- Start moving parts to the next volume when free space on a volume drops below 20% -->
                <move_factor>0.2</move_factor>
            </s3_cold_policy>
        </policies>
    </storage_configuration>
</clickhouse>
The key is that storage_policy on the table points to a policy that includes a volume backed by S3. Every data part lives on exactly one disk, and ClickHouse tracks which one. When a query touches a part that sits on the S3 disk, ClickHouse reads the required column data from S3 on demand rather than from local storage.
This is often confused with ClickHouse’s S3 table engine, which reads and writes plain files (CSV, Parquet, and so on) in a bucket and offers none of MergeTree’s primary-key indexing or background merges. Tiered storage is for MergeTree data that was or could be on local disks but whose persistent storage cost you want to offload to S3 while keeping query performance relatively high.
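To see which disk each part currently lives on, the system tables can be queried. This is a minimal sketch that assumes the table and policy names used in this article (my_s3_data and s3_cold_policy) and that the table is in the current database:

```sql
-- Volumes of the policy, in priority order, with the disks behind each one.
SELECT policy_name, volume_name, volume_priority, disks
FROM system.storage_policies
WHERE policy_name = 's3_cold_policy'
ORDER BY volume_priority;

-- Where the active parts of the table currently live.
SELECT partition, name, disk_name, formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE database = currentDatabase() AND table = 'my_s3_data' AND active
ORDER BY partition, name;
```

Parts whose disk_name is the S3 disk are the ones that will be read over the network at query time.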
A volume carries no explicit hot or cold flag. The roles come from the order of the volumes inside the policy: new parts are written to the first volume that can hold them, and parts are moved down to later volumes as space runs out or TTL rules fire. Parts on every volume in the policy, including the S3 one, are served to queries in the same way. If you would rather ClickHouse not spend S3 requests on background merges of cold parts, the volume-level prefer_not_to_merge setting disables merges on that volume.
The most surprising thing about this setup is how seamlessly ClickHouse integrates the remote S3 data into its query execution plan. It doesn’t feel like a separate data source; it’s just another "disk" for ClickHouse, albeit one with higher latency. Note that a policy does not cap how much data ends up in S3. The size-related settings are max_data_part_size_bytes on a volume, which limits the size of any single part stored there, and move_factor on the policy, which sets the fraction of free space on a volume below which ClickHouse starts moving parts to the next volume (0.1 by default).
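Background moves between volumes are driven by how full each disk is, so it can help to check what ClickHouse itself reports. The query below is a small sketch against the system.disks table; the free_ratio column is just an illustrative calculation, and the numbers shown for an S3 disk are not a real capacity:

```sql
-- Free and total space per configured disk, as seen by the server.
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    round(free_space / total_space, 3) AS free_ratio
FROM system.disks;
```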
When data is written to a table using a policy whose first volume is local, ClickHouse writes new parts to that local volume first. Parts reach S3 in one of three ways: a TTL ... TO VOLUME or TTL ... TO DISK rule on the table takes effect, free space on the local volume falls below the policy’s move_factor threshold, or you move them explicitly with ALTER TABLE ... MOVE PARTITION ... TO VOLUME. ALTER TABLE ... FREEZE does not move anything; it only creates a backup snapshot of the parts. The "tiering" aspect is managed by ClickHouse’s background move processes and storage policies, which decide which data parts reside on local disks and which in S3.
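As a rough illustration of how parts can be sent to S3, the statements below use the table and the s3_cold volume name from this article. The 90-day threshold and the 202301 partition value are assumptions chosen for the example, not recommendations:

```sql
-- Move a part to the S3 volume once every row in it is older than 90 days.
ALTER TABLE my_s3_data
    MODIFY TTL event_date + INTERVAL 90 DAY TO VOLUME 's3_cold';

-- Move one monthly partition immediately, without waiting for the TTL.
ALTER TABLE my_s3_data
    MOVE PARTITION 202301 TO VOLUME 's3_cold';
```

The TTL rule keeps working in the background as data ages, while the explicit move is a one-off operation that is handy for backfilling older partitions.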
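If your ClickHouse nodes are running on EC2 instances, you should strongly consider using IAM roles attached to the instances instead of embedding access_key_id and secret_access_key directly in your configuration; setting use_environment_credentials to true on the S3 disk tells ClickHouse to pick the credentials up from the environment or the instance metadata. This is more secure and avoids exposing credentials. Keeping the bucket in the same AWS region as the nodes also keeps latency and data transfer costs down.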
The next step after setting this up is understanding how ClickHouse manages data placement and movement between volumes within a policy, especially when you have multiple hot and cold volumes defined.