For a long time, ClickHouse had no built-in, single command for taking a full, consistent snapshot of your entire cluster and restoring it (native BACKUP and RESTORE statements only arrived in later releases). That’s where clickhouse-backup comes in.

Let’s see it in action. Imagine you have a single ClickHouse node running, and you want to back up a table named my_table in the default database.

# Install clickhouse-backup (if you haven't already)
# -f makes curl fail on an HTTP error instead of saving the error page as the binary
curl -fsSL https://github.com/AlexAkulov/clickhouse-backup/releases/download/v2.0.10/clickhouse-backup_linux_amd64 -o clickhouse-backup
chmod +x clickhouse-backup

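# Take a backup of one table (the last argument is the backup name)
./clickhouse-backup -c my_clickhouse_config.yml create --tables default.my_table my_backup

# Restore that table from the backup.
# The data comes back under the original table name; restoring into a
# differently named table needs a table-mapping option that only some
# releases provide, so check ./clickhouse-backup restore --help first.
./clickhouse-backup -c my_clickhouse_config.yml restore --tables default.my_table my_backup

Here, my_clickhouse_config.yml would look something like this:

general:
  remote_storage: none
clickhouse:
  host: localhost
  port: 9000
  username: default
  password: ''

This setup defines how clickhouse-backup connects to your ClickHouse instance. With remote_storage set to none, backups stay on the local disk in the backup directory under ClickHouse’s data path (typically /var/lib/clickhouse/backup/); compression is configured per remote storage backend and applies when a backup is uploaded. The create command takes the backup, restore does the opposite, and the --tables flag limits either one to the matching tables.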

The core problem clickhouse-backup solves is getting a consistent, file-level copy of a running ClickHouse server. ClickHouse stores data in shards and replicas, and a simple SELECT ... INTO OUTFILE only exports query results, not tables and their metadata in a restorable form. clickhouse-backup connects to the server, asks it to FREEZE each table (which snapshots the current data parts without pausing writes), copies the actual data files from disk, and then records the table metadata. In a cluster it works on one server at a time, so you run it on every shard (typically against one replica per shard). It also handles compression and uploading to various storage backends like S3, GCS, or SFTP.

Internally, when you run a backup, clickhouse-backup does a few key things:

  1. Connects to ClickHouse: Uses the provided credentials to establish a connection.
  2. Freezes the Table: For a consistent snapshot, it issues an ALTER TABLE ... FREEZE command. This hard-links the table’s current data parts into ClickHouse’s shadow directory, giving a point-in-time copy without stopping the server or blocking writes.
  3. Copies Data Parts: It then takes the frozen data files (for the specified table or database) from the shadow directory (by default /var/lib/clickhouse/shadow/) and places them, together with the table metadata, in the local backup directory.
  4. Compresses Data: Applies the configured compression algorithm (e.g., lz4, zstd).
  5. Uploads to Storage: If a remote storage is configured (like S3), it uploads the compressed backup files.
  6. Cleans Up: Removes the temporary FREEZE snapshot.
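
The freeze step is cheap because it relies on hard links rather than copying bytes. The following is a minimal sketch of that idea using a scratch directory; the directory layout and file names are made up for illustration and are not a real ClickHouse data path.

```shell
#!/bin/sh
# Illustrative only: mimics a hard-link snapshot in a temp directory.
# None of these paths belong to a real ClickHouse installation.
set -eu

work=$(mktemp -d)
mkdir -p "$work/data/my_table/all_1_1_0" "$work/shadow/1/my_table/all_1_1_0"

# A "data part" holding one column file.
echo "column data" > "$work/data/my_table/all_1_1_0/a.bin"

# "Freeze": hard-link the part's file into the snapshot directory.
# No bytes are copied; both names point at the same inode.
ln "$work/data/my_table/all_1_1_0/a.bin" "$work/shadow/1/my_table/all_1_1_0/a.bin"

# Later the server merges the part away and removes the original name.
rm -r "$work/data/my_table/all_1_1_0"

# The snapshot name still resolves, so the file can be copied to backup storage.
snapshot_content=$(cat "$work/shadow/1/my_table/all_1_1_0/a.bin")
echo "snapshot still holds: $snapshot_content"
```

Because data parts are immutable once written, a hard link is enough to pin a consistent view of the table while inserts and merges continue around it.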

Restoring reverses this process: downloading from storage, decompressing, recreating the table from the saved metadata if it does not exist, placing the data parts in the table’s detached directory, and attaching them with ALTER TABLE ... ATTACH PART. A server restart is normally not required.
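
To make the attach step concrete, here is a rough sketch of a helper that prints one ALTER TABLE ... ATTACH PART statement per part directory. The function name attach_statements and the demo directory are hypothetical, and the tool’s own restore logic is more involved than this; the script only prints SQL and never talks to a server.

```shell
#!/bin/sh
# Hypothetical helper: print one ATTACH PART statement for every part
# directory found in a table's detached/ folder. Nothing is executed.
attach_statements() {
    table=$1         # e.g. default.my_table
    detached_dir=$2  # e.g. the table's detached/ directory
    for part in "$detached_dir"/*/; do
        [ -d "$part" ] || continue   # skip when the glob matches nothing
        name=$(basename "$part")
        printf "ALTER TABLE %s ATTACH PART '%s';\n" "$table" "$name"
    done
}

# Demo against a scratch directory containing two fake parts.
demo=$(mktemp -d)
mkdir -p "$demo/detached/all_1_1_0" "$demo/detached/all_2_2_0"
attach_statements default.my_table "$demo/detached"
```

On a real server the printed statements could be reviewed and then piped to clickhouse-client, which is roughly what a manual restore of copied parts looks like.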

The levers you control are primarily in the configuration file: database connection details, the storage backend and its credentials, compression settings, and which tables/databases to include or exclude. On the command line, subcommands such as create, upload, download, restore, list, delete, and clean select the action, while flags such as --config, --tables, --schema, and --data narrow what it applies to.
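
As a sketch of how configuration and command line fit together, the wrapper below builds a timestamped backup name and calls the create and upload subcommands. The variables CONFIG, TABLE, DRY_RUN, and UPLOAD and the run helper are illustrative names, not part of clickhouse-backup; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a backup wrapper. CONFIG, TABLE, DRY_RUN and UPLOAD are
# made-up knobs for this example, not clickhouse-backup settings.
CONFIG=${CONFIG:-my_clickhouse_config.yml}
TABLE=${TABLE:-default.my_table}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_name="my_table_$(date +%Y%m%d_%H%M%S)"

# Local backup of one table, then optionally push it to remote storage
# (upload only makes sense when a remote backend is configured).
run clickhouse-backup -c "$CONFIG" create --tables "$TABLE" "$backup_name"
if [ "${UPLOAD:-0}" = "1" ]; then
    run clickhouse-backup -c "$CONFIG" upload "$backup_name"
fi
```

Running it with DRY_RUN=0 executes the commands for real; keeping the dry run as the default makes it harder to trigger a backup by accident from cron or a shell history recall.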

A common misconception is that clickhouse-backup automatically handles schema changes between backup and restore. It doesn’t. If you back up a table with columns A and B, and then drop column B before restoring, the restore will likely fail because the data parts won’t match the current schema. You need to ensure schema compatibility or manage schema migrations separately.
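
One way to catch this before a restore is to compare the column list saved at backup time with the live table’s columns. The sketch below assumes both lists already exist as plain text files, one column name per line; the missing_columns function and the file names are hypothetical, and the comment shows one possible way to produce such a list.

```shell
#!/bin/sh
# Hypothetical pre-restore check: print columns that existed at backup
# time but are absent from the live table.
# A column list could be produced with something like:
#   clickhouse-client --query "SELECT name FROM system.columns
#       WHERE database = 'default' AND table = 'my_table'"
missing_columns() {
    backup_cols=$1
    live_cols=$2
    # Print lines of backup_cols that match no whole line of live_cols.
    grep -vxFf "$live_cols" "$backup_cols" || true
}

# Demo: the backup had columns A and B, the live table only has A.
tmp=$(mktemp -d)
printf 'A\nB\n' > "$tmp/backup_columns.txt"
printf 'A\n' > "$tmp/live_columns.txt"

missing=$(missing_columns "$tmp/backup_columns.txt" "$tmp/live_columns.txt")
if [ -n "$missing" ]; then
    echo "columns missing from the live table: $missing"
fi
```

A non-empty result is a signal to reconcile the schema first, for example by re-adding the column or restoring into a table created from the backed-up definition.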

The next logical step after mastering single-node backups is understanding how clickhouse-backup handles distributed clusters with multiple shards and replicas.
