cbbackupmgr is the official Couchbase tool for backing up and restoring your data, and it’s surprisingly powerful for how simple it can feel.
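Here’s how it works in practice. Imagine you have a Couchbase cluster running and you want to take a backup. cbbackupmgr keeps backups in an archive (a directory on disk) that contains one or more named repositories, so you first create a repository and then back up into it (my_repo is just an example name):
cbbackupmgr config \
--archive /mnt/backups/my_couchbase_backup \
--repo my_repo
cbbackupmgr backup \
--archive /mnt/backups/my_couchbase_backup \
--repo my_repo \
--cluster http://10.0.0.1:8091 \
--username Administrator \
--password password123
The config command creates the repository my_repo inside the archive at /mnt/backups/my_couchbase_backup. The backup command then connects to the cluster at 10.0.0.1:8091 with the provided credentials and streams the data from each bucket, vBucket by vBucket, into a new timestamped backup directory inside that repository. Within that directory, each bucket gets its own subdirectory holding its data and metadata files.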
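For repeated use, the create-then-back-up sequence can be wrapped in a small function. The following is only a sketch: it assumes the archive/repository layout used by cbbackupmgr (--archive and --repo, with each repository stored as a subdirectory of the archive), and the names run_backup, CBM, and BACKUP_PASSWORD are made up for illustration.

```shell
#!/bin/sh
# CBM names the binary to run; set CBM=echo for a dry run (hypothetical knob).
CBM="${CBM:-cbbackupmgr}"

# run_backup ARCHIVE REPO CLUSTER USER
# The password is taken from the BACKUP_PASSWORD variable (illustrative name)
# so that it is not hard-coded in the script.
run_backup() {
    archive=$1 repo=$2 cluster=$3 user=$4

    # Create the repository on first use. Assumption: a repository is a
    # subdirectory of the archive carrying the repository's name.
    if [ ! -d "$archive/$repo" ]; then
        "$CBM" config --archive "$archive" --repo "$repo" || return 1
    fi

    "$CBM" backup --archive "$archive" --repo "$repo" \
        --cluster "$cluster" --username "$user" --password "$BACKUP_PASSWORD"
}
```

A call such as run_backup /mnt/backups/my_couchbase_backup my_repo http://10.0.0.1:8091 Administrator would then configure the repository if needed and take a backup. Running it first with CBM=echo prints the commands instead of executing them, which is a cheap way to check the arguments.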
Restoring is just as straightforward. If you needed to bring that backup back to a cluster (either the same one or a different one), you’d use:
cbbackupmgr restore \
--archive /mnt/backups/my_couchbase_backup \
--repo my_repo \
--cluster http://10.0.0.2:8091 \
--username Administrator \
--password password123 \
--include-data my_bucket_name
This command targets a different cluster at 10.0.0.2:8091 and restores the data held in the my_repo repository of the archive at /mnt/backups/my_couchbase_backup, limited to the bucket named my_bucket_name. If you omit --include-data, it attempts to restore every bucket found in the repository. By default the target buckets must already exist on the destination cluster; the --auto-create-buckets flag asks cbbackupmgr to create any that are missing.
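The "one bucket or everything" choice can be captured in a small wrapper. This is a sketch under assumptions: it uses the --archive/--repo layout and the --include-data filter found in recent cbbackupmgr releases (older releases named the filter differently), and run_restore, CBM, and BACKUP_PASSWORD are illustrative names.

```shell
#!/bin/sh
# CBM names the binary to run; set CBM=echo for a dry run (hypothetical knob).
CBM="${CBM:-cbbackupmgr}"

# run_restore ARCHIVE REPO CLUSTER USER [BUCKETS]
# BUCKETS is an optional comma-separated list of buckets to restore.
run_restore() {
    archive=$1 repo=$2 cluster=$3 user=$4 buckets=${5:-}

    # Build the argument list in the positional parameters.
    set -- restore --archive "$archive" --repo "$repo" \
        --cluster "$cluster" --username "$user" --password "$BACKUP_PASSWORD"

    # Add the filter only when a bucket list was given; without it,
    # everything in the repository is restored.
    if [ -n "$buckets" ]; then
        set -- "$@" --include-data "$buckets"
    fi

    "$CBM" "$@"
}
```

Calling run_restore with a fifth argument of my_bucket_name restores just that bucket, while leaving it off restores the whole repository.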
The primary problem cbbackupmgr solves is producing a consistent, restorable backup of your Couchbase data while the cluster stays online. Unlike manually copying files, cbbackupmgr understands Couchbase’s internal data structures and ensures that the backup is a valid, restorable representation of your cluster’s data. It handles the complexities of distributed data, capturing every vBucket of a bucket from every node that hosts it.
Internally, cbbackupmgr streams data out of the cluster over DCP (Database Change Protocol), the same change-streaming protocol Couchbase uses for replication and indexing. When you run a backup, it opens streams to the nodes that hold each bucket’s vBuckets; the data travels over the network to the machine where cbbackupmgr is running and is written to disk there. The tool also records metadata for the backup, including which buckets were included and where their data files live. For restoration, it reverses this process, reading the data files and writing the documents to the target Couchbase cluster.
You have fine-grained control over what gets backed up and restored. The --include-data and --exclude-data options limit a repository (when passed to config) or a restore (when passed to restore) to specific buckets, allowing you to back up only critical data or restore only specific datasets. A single archive can hold several repositories, selected with --repo, and each backup within a repository is stored in its own directory named after the time it was taken, for example /mnt/backups/my_couchbase_backup/my_repo/2023-10-27T10_00_00Z (the exact timestamp format varies by release). The --start and --end options of restore select which of those backups to restore from.
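Choosing which backups in a repository to restore comes down to passing a start and an end to the restore command. The sketch below assumes the --start and --end options of cbbackupmgr restore, which accept backup names as they appear in the repository; restore_range, CBM, and BACKUP_PASSWORD are illustrative names.

```shell
#!/bin/sh
# CBM names the binary to run; set CBM=echo for a dry run (hypothetical knob).
CBM="${CBM:-cbbackupmgr}"

# restore_range ARCHIVE REPO START END CLUSTER USER
# START and END are backup names from the repository. Giving the same name
# for both restores only what that single backup contains.
restore_range() {
    if [ "$#" -ne 6 ]; then
        echo "usage: restore_range ARCHIVE REPO START END CLUSTER USER" >&2
        return 2
    fi
    archive=$1 repo=$2 start=$3 end=$4 cluster=$5 user=$6

    "$CBM" restore --archive "$archive" --repo "$repo" \
        --start "$start" --end "$end" \
        --cluster "$cluster" --username "$user" --password "$BACKUP_PASSWORD"
}
```

Because later backups in a repository hold only changes, restoring the state as of a given backup usually means starting the range at the first backup and ending it at the one you want.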
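cbbackupmgr also manages compression and encryption of the backup data. Compression of document values is controlled with the --value-compression option, which takes a policy (unchanged, uncompressed, or compressed) rather than a choice of algorithm. Encryption, available in recent Enterprise Edition releases, is turned on when the repository is created by passing --encrypted to config together with a passphrase (--passphrase) or a key from an external key-management service. This is important for both storage efficiency and data security. When backing up to or restoring from an encrypted repository, the same passphrase or key must be supplied again so the data can be decrypted before it’s loaded into the target cluster.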
The most surprising thing about cbbackupmgr is that incremental backups come for free, which is not obvious from its basic usage. There is no separate incremental command: the first backup into a repository is a full backup, and every later backup into the same repository captures only the changes since the previous one. cbbackupmgr records how far it got through each vBucket’s change stream and resumes from that point the next time, so no extra replication setup is needed. This is not a block-level incremental but a record of the documents that changed, which can significantly reduce backup storage and time compared with taking a full backup every time. Restoring replays the chain of backups in order, and the merge command can consolidate a range of backups into one to keep that chain short.
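A common way to use this is a schedule that takes a full backup once a week and incremental backups on the other days. The sketch below assumes the --full-backup flag of recent cbbackupmgr releases, which forces a full backup into an existing repository; scheduled_backup, CBM, and BACKUP_PASSWORD are illustrative names, and the weekly policy itself is just an example.

```shell
#!/bin/sh
# CBM names the binary to run; set CBM=echo for a dry run (hypothetical knob).
CBM="${CBM:-cbbackupmgr}"

# scheduled_backup ARCHIVE REPO CLUSTER USER DAY_OF_WEEK
# DAY_OF_WEEK is 1 (Monday) through 7 (Sunday), as printed by `date +%u`.
scheduled_backup() {
    archive=$1 repo=$2 cluster=$3 user=$4 day=$5

    set -- backup --archive "$archive" --repo "$repo" \
        --cluster "$cluster" --username "$user" --password "$BACKUP_PASSWORD"

    # On Sundays force a fresh full backup; on other days a plain backup
    # into the existing repository is incremental.
    if [ "$day" -eq 7 ]; then
        set -- "$@" --full-backup
    fi

    "$CBM" "$@"
}
```

From cron, this could be invoked as scheduled_backup /mnt/backups/my_couchbase_backup my_repo http://10.0.0.1:8091 Administrator "$(date +%u)".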
After successfully restoring your data, you’ll likely encounter the challenge of re-establishing any network or security configurations that were not part of the data backup itself.