CockroachDB’s backup and restore functionality to S3 is surprisingly flexible: if a backup is taken with revision history, you can restore not only to the moment the backup was taken but also to an earlier point in time that the backup covers.

Let’s see it in action. Imagine we have a running CockroachDB cluster and we want to back it up to an S3 bucket.

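First, we need to give the cluster access to S3. BACKUP and RESTORE run inside the cluster’s nodes, so the credentials have to reach the nodes, not just your shell. You can embed the keys in the storage URI as query parameters (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), or append AUTH=implicit to the URI so that each node uses the credentials from its own environment, such as environment variables, a shared AWS credentials file, or an IAM role. For simplicity, let’s use environment variables, set in the environment of every cockroach node before it starts: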

export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
export AWS_REGION="us-east-1"
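If you would rather pass the keys in the storage URI than rely on the nodes’ environment, CockroachDB’s S3 URIs accept AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as query parameters. The secret key can contain characters such as / and +, so it has to be percent-encoded first. The following is a minimal bash sketch; urlencode and s3_uri are hypothetical helper names, not part of the cockroach CLI, and the sketch assumes the keys are plain ASCII.

```shell
# Percent-encode a string for use inside a URI query parameter.
# Assumes ASCII input, which holds for AWS access keys and secrets.
urlencode() {
  local LC_ALL=C
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                 # unreserved characters pass through
      *) out+=$(printf '%%%02X' "'$c") ;;           # everything else becomes %XX
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: build an S3 URI that carries explicit credentials,
# read from the environment variables exported above.
s3_uri() {
  local bucket="$1" path="$2"
  printf 's3://%s/%s?AWS_ACCESS_KEY_ID=%s&AWS_SECRET_ACCESS_KEY=%s&AWS_REGION=%s' \
    "$bucket" "$path" \
    "$(urlencode "$AWS_ACCESS_KEY_ID")" \
    "$(urlencode "$AWS_SECRET_ACCESS_KEY")" \
    "$(urlencode "$AWS_REGION")"
}
```

Calling s3_uri your-bucket-name backups prints a URI you can paste into a BACKUP or RESTORE statement. Keep in mind that a URI built this way contains the secret, so it will show up in shell history and job descriptions unless you take care to avoid that.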

Now, let’s perform a full backup of our cluster:

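cockroach sql --certs-dir=/path/to/certs --host=your-cockroachdb-host:26257 -e "BACKUP INTO 's3://your-bucket-name/backups?AUTH=implicit' AS OF SYSTEM TIME '-10s';"

BACKUP is a SQL statement rather than a cockroach subcommand, so we run it through cockroach sql. With no target listed, the statement takes a full backup of all data in your cluster and writes it to the specified S3 path; AUTH=implicit tells each node to use the credentials from its own environment. You don’t need to put a timestamp in the path yourself, because BACKUP INTO treats the URI as a backup collection and creates a date-based subdirectory for each full backup. AS OF SYSTEM TIME '-10s' backs up slightly stale data, which reduces contention with running transactions. The certs-dir flag points the SQL client at the certificates it needs to connect to a secure cluster.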

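To restore, you’ll need the URI of your backup collection in S3. RESTORE is also a SQL statement. A full-cluster restore (RESTORE FROM LATEST IN ...) only works on a new cluster with no user-created databases or tables, so let’s restore a single database under a different name instead. Here your_db is a placeholder for a database contained in the backup:

cockroach sql --certs-dir=/path/to/certs --host=your-restore-host:26257 -e "RESTORE DATABASE your_db FROM LATEST IN 's3://your-bucket-name/backups?AUTH=implicit' WITH new_db_name = 'restored_db';"

This statement starts the restore job. LATEST selects the most recent backup in the collection; to restore an older one, replace it with that backup’s subdirectory. new_db_name specifies the name of the database the data is restored into. If you omit it, the database is restored under its original name, which fails if a database with that name already exists in the target cluster.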
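Before restoring, it helps to check what the collection actually contains. The sketch below composes two inspection statements, SHOW BACKUPS IN and SHOW BACKUP FROM LATEST IN, and only sends them to the cluster when asked to. The function names and the RUN_SQL switch are hypothetical, and the collection URI, certificate path, and host are placeholders.

```shell
# Placeholder collection URI; AUTH=implicit assumes node-side credentials.
COLLECTION="s3://your-bucket-name/backups?AUTH=implicit"

# Compose the inspection statements as plain strings.
list_backups_sql() { printf "SHOW BACKUPS IN '%s';" "$1"; }          # one row per backup subdirectory
show_latest_sql()  { printf "SHOW BACKUP FROM LATEST IN '%s';" "$1"; } # contents of the newest backup

# Run them only when RUN_SQL is set and the cockroach binary is installed.
if [ -n "${RUN_SQL:-}" ] && command -v cockroach >/dev/null 2>&1; then
  cockroach sql --certs-dir=/path/to/certs --host=your-restore-host:26257 \
    -e "$(list_backups_sql "$COLLECTION")" \
    -e "$(show_latest_sql "$COLLECTION")"
fi
```

The subdirectory names returned by the first statement are what you would substitute for LATEST when restoring an older backup.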

The mental model for backup and restore in CockroachDB revolves around distributed snapshots. When you initiate a backup, CockroachDB coordinates across its nodes to export a consistent view of your data as of a single timestamp. The nodes write that data directly to your configured destination, in this case S3. Restore works by reading these files from S3 and ingesting them into a target cluster or database, so the restored data is consistent as of that same timestamp.

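The backup itself is a distributed operation. CockroachDB doesn’t just dump a file; it orchestrates the creation of multiple files that represent the state of your data at a specific moment. By default those files hold only the latest version of each row as of the backup timestamp. If you take the backup WITH revision_history, they also record every change made within the garbage collection period leading up to that timestamp, and this is what lets you specify AS OF SYSTEM TIME during a restore operation. It allows you to rewind your data to a state before the backup was taken, which is incredibly powerful for recovering from logical errors or accidental data modifications. For example, if you discover at 10:00 AM that a bad write corrupted data at 8:45 AM, and your last backup was taken with revision history at 9:00 AM, you can restore from the 9:00 AM backup but specify AS OF SYSTEM TIME '2023-10-27 08:44:00' to get the state of your data just before the corruption occurred, even though the latest state in the backup already includes it. Restore cannot go forward in time: changes made after 9:00 AM are not in that backup, so recovering them requires a later backup.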
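As a sketch of how a point-in-time restore fits together, the two statements below pair a backup taken WITH revision_history with a database restore that uses AS OF SYSTEM TIME. The helper function names, your_db, restored_db, and the timestamp are illustrative assumptions; the script only prints the statements so you can review them before running them through cockroach sql.

```shell
# Placeholder collection URI; AUTH=implicit assumes node-side credentials.
COLLECTION="s3://your-bucket-name/backups?AUTH=implicit"

# Full backup that also records row revisions, enabling point-in-time restore.
backup_with_history_sql() {
  printf "BACKUP INTO '%s' AS OF SYSTEM TIME '-10s' WITH revision_history;" "$1"
}

# Restore one database as it existed at a given timestamp, under a new name.
# The timestamp must fall inside the revision history the backup captured.
restore_as_of_sql() {
  local db="$1" collection="$2" ts="$3" new_db="$4"
  printf "RESTORE DATABASE %s FROM LATEST IN '%s' AS OF SYSTEM TIME '%s' WITH new_db_name = '%s';" \
    "$db" "$collection" "$ts" "$new_db"
}

# Print both statements for review.
backup_with_history_sql "$COLLECTION"; echo
restore_as_of_sql your_db "$COLLECTION" '2023-10-27 08:44:00' restored_db; echo
```

Restoring under a new name leaves the damaged database in place, which makes it possible to compare the two and copy back only the rows you need.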

The "trick" most people miss is the granular control over the restore point. Many think backup means "restore to this exact state." However, for backups taken with revision history, CockroachDB’s RESTORE statement supports the AS OF SYSTEM TIME clause, allowing you to restore data as it existed at any point in time that the revision history covers: from the start of the garbage collection window preceding the full backup up to the backup’s own timestamp. This means that even when a logical error is already baked into the latest backup, you can still recover by restoring to a point in time just before the error.
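Because a restore timestamp outside the covered window is rejected, a small pre-flight check can save a failed job. The sketch below is one way to do that in bash; in_restore_window is a hypothetical helper, and it assumes you already know the window’s bounds (for instance the backup’s end time reported by SHOW BACKUP) and that all three timestamps use the same 'YYYY-MM-DD HH:MM:SS' format in the same time zone.

```shell
# Return 0 if target lies within [window_start, backup_end], else 1.
# With identically formatted timestamps, byte-wise string order
# matches chronological order, so no date parsing is needed.
in_restore_window() {
  local LC_ALL=C
  local target="$1" window_start="$2" backup_end="$3"
  [[ ! "$target" < "$window_start" && ! "$target" > "$backup_end" ]]
}

# Example bounds: a revision-history backup that ended at 09:00.
WINDOW_START='2023-10-26 09:00:00'
BACKUP_END='2023-10-27 09:00:00'

if in_restore_window '2023-10-27 08:44:00' "$WINDOW_START" "$BACKUP_END"; then
  echo "08:44 is restorable from this backup"
fi
if ! in_restore_window '2023-10-27 09:59:00' "$WINDOW_START" "$BACKUP_END"; then
  echo "09:59 is after the backup; a later backup is needed"
fi
```

The second check fails by design: 09:59 is later than the backup’s end time, so that state is simply not in the files.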

The next step in mastering data management is understanding incremental backups and how they can significantly reduce backup times and storage costs.
