A robust backup script in Bash isn’t just about copying files; it’s about building an auditable, efficient system that you can actually restore from when you need it most.

Let’s walk through building one, assuming you want to back up a directory /data to a remote server backup.example.com via SSH, keeping daily backups for 7 days.

#!/bin/bash

# --- Configuration ---
SOURCE_DIR="/data"
BACKUP_HOST="backup.example.com"
BACKUP_USER="backupuser"
BACKUP_BASE_DIR="/backups/myserver"
RETENTION_DAYS=7
SSH_KEY="/home/user/.ssh/id_rsa_backup" # Private key for passwordless SSH

# --- Timestamp and Filename ---
TIMESTAMP=$(date +"%Y-%m-%d_%H-%M-%S")
BACKUP_FILENAME="data_backup_${TIMESTAMP}.tar.gz"
REMOTE_BACKUP_PATH="${BACKUP_BASE_DIR}/${BACKUP_FILENAME}"

# --- Logging ---
LOG_FILE="/var/log/backup_script.log"
exec > >(tee -a "$LOG_FILE") 2>&1 # Redirect stdout and stderr to log file and console

echo "--- Starting backup: $(date) ---"

# --- Pre-checks ---
if [ ! -d "$SOURCE_DIR" ]; then
  echo "ERROR: Source directory '$SOURCE_DIR' does not exist."
  exit 1
fi

if [ ! -f "$SSH_KEY" ]; then
  echo "ERROR: SSH key '$SSH_KEY' not found."
  exit 1
fi

# Ensure the remote backup directory exists (mkdir -p is a no-op if it already does)
if ! ssh -i "$SSH_KEY" "${BACKUP_USER}@${BACKUP_HOST}" "mkdir -p \"${BACKUP_BASE_DIR}\""; then
  echo "ERROR: Could not create or access remote backup directory '${BACKUP_BASE_DIR}' on ${BACKUP_HOST}."
  exit 1
fi

# --- Create Backup ---
echo "Creating archive: $BACKUP_FILENAME"
# --absolute-names (-P) keeps the leading "/" in member names (/data/...);
# without it, GNU tar strips the slash and stores data/... instead.
# --exclude skips sockets and cache/tmp directories.
tar --absolute-names -czvf "$BACKUP_FILENAME" --exclude="*.sock" --exclude="cache/" --exclude="tmp/" "$SOURCE_DIR"
if [ $? -ne 0 ]; then
  echo "ERROR: tar command failed during archive creation."
  exit 1
fi

# --- Transfer Backup ---
echo "Transferring archive to ${BACKUP_HOST}:${REMOTE_BACKUP_PATH}"
scp -i "$SSH_KEY" "$BACKUP_FILENAME" "${BACKUP_USER}@${BACKUP_HOST}:${REMOTE_BACKUP_PATH}"
if [ $? -ne 0 ]; then
  echo "ERROR: scp command failed during file transfer."
  # Clean up the local archive if transfer failed
  rm -f "$BACKUP_FILENAME"
  exit 1
fi

# --- Cleanup Local Archive ---
echo "Cleaning up local archive: $BACKUP_FILENAME"
rm -f "$BACKUP_FILENAME"

# --- Remote Cleanup (Retention Policy) ---
echo "Applying retention policy: keeping last ${RETENTION_DAYS} days on ${BACKUP_HOST}"
# Use find to locate old backups and delete them.
# -type f: only consider files.
# -name 'data_backup_*.tar.gz': match our backup file naming convention.
# -mtime +$((RETENTION_DAYS - 1)): match files at least RETENTION_DAYS full days old
#   (find counts whole 24-hour periods, so +6 means "7 or more").
# -delete: remove the found files.
ssh -i "$SSH_KEY" "${BACKUP_USER}@${BACKUP_HOST}" "find \"${BACKUP_BASE_DIR}\" -type f -name 'data_backup_*.tar.gz' -mtime +$((RETENTION_DAYS - 1)) -delete"
if [ $? -ne 0 ]; then
  echo "WARNING: Remote cleanup command encountered an issue, but backup was successful."
  # Don't exit here, as the primary goal (backup) was achieved.
fi

echo "--- Backup finished successfully: $(date) ---"
exit 0

This script leverages tar for compression and archiving, and scp for secure transfer. The ssh command is used for pre-flight checks and remote cleanup.

The core of the backup is tar --absolute-names -czvf "$BACKUP_FILENAME" ... "$SOURCE_DIR". The --absolute-names flag (short form -P) tells tar to keep the leading / on each member name, so files are stored as /data/file.txt. Without it, GNU tar strips the leading slash, prints a warning, and stores data/file.txt. Either way the directory structure under /data is preserved; the flag mainly matters at restore time, because extracting with -P writes back to the original absolute locations, while extracting without it unpacks relative to the current directory (or the directory given with -C). The c flag creates an archive, z compresses it with gzip, v is for verbose output (helpful for debugging, though you might remove it for production), and f specifies the output filename.
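To see what tar actually stores, the following sketch archives a scratch directory twice and prints the first member name of each archive. It assumes GNU tar (bsdtar treats -P similarly); the scratch directory and variable names are illustrative and not part of the backup script.

```shell
#!/bin/bash
# Sketch: compare stored member names with and without -P (--absolute-names).
# Assumes GNU tar; demo_root and the file names are hypothetical.

demo_root=$(mktemp -d)
mkdir -p "$demo_root/data"
echo "hello" > "$demo_root/data/file.txt"

# Default behaviour: the leading "/" is stripped (tar notes this on stderr).
tar -czf "$demo_root/default.tar.gz" "$demo_root/data" 2>/dev/null

# With -P the absolute path is stored unchanged.
tar -P -czf "$demo_root/absolute.tar.gz" "$demo_root/data"

# List the first member of each archive (-P when listing shows names as stored).
default_first=$(tar -tzf "$demo_root/default.tar.gz" | sed -n '1p')
absolute_first=$(tar -P -tzf "$demo_root/absolute.tar.gz" | sed -n '1p')

echo "default : $default_first"
echo "absolute: $absolute_first"

rm -rf "$demo_root"
```

The first line should show a path without a leading slash and the second the same path with one.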

For transfer, scp -i "$SSH_KEY" "$BACKUP_FILENAME" "${BACKUP_USER}@${BACKUP_HOST}:${REMOTE_BACKUP_PATH}" securely copies the generated archive. The -i "$SSH_KEY" ensures it uses your specific SSH private key, which should be configured for passwordless login to backup.example.com for the backupuser.

The remote cleanup is handled by ssh ... "find ... -mtime +$((RETENTION_DAYS - 1)) -delete", which runs on the backup.example.com server. find searches BACKUP_BASE_DIR for regular files (-type f) whose names match our pattern (-name 'data_backup_*.tar.gz') and whose age passes the -mtime test. The arithmetic matters: find measures a file's age in whole 24-hour periods, discarding any fraction, and -mtime +N matches when that count is greater than N. With RETENTION_DAYS=7 the test becomes -mtime +6, which matches archives that are at least 7 full days old; -mtime +7 would keep them a day longer. Note that $((RETENTION_DAYS - 1)) is expanded by the local shell before the command is sent to the server. The -delete action removes the matched files.
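The -mtime rounding is easy to check locally. The sketch below creates three empty files with back-dated modification times and runs the same find test with -print instead of -delete. It assumes GNU touch and find; the make_aged helper and the file names are hypothetical.

```shell
#!/bin/bash
# Sketch: which ages does "-mtime +6" match? Assumes GNU touch and GNU find.
RETENTION_DAYS=7
demo_dir=$(mktemp -d)
now=$(date +%s)

# make_aged NAME SECONDS_OLD: create a file whose mtime is SECONDS_OLD in the past.
make_aged() {
  touch -d "@$(( now - $2 ))" "$demo_dir/$1"
}

make_aged data_backup_6d23h.tar.gz $(( 6 * 86400 + 23 * 3600 ))  # just under 7 days
make_aged data_backup_7d01h.tar.gz $(( 7 * 86400 + 3600 ))       # just over 7 days
make_aged data_backup_9d.tar.gz    $(( 9 * 86400 ))              # well past retention

# Same test as the backup script, but -print so nothing is removed.
matched=$(find "$demo_dir" -type f -name 'data_backup_*.tar.gz' \
  -mtime +$(( RETENTION_DAYS - 1 )) -print | sort)

echo "$matched"
rm -rf "$demo_dir"
```

Only the 7d01h and 9d files should be listed; the file that is 6 days and 23 hours old falls inside the retention window.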

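The exec > >(tee -a "$LOG_FILE") 2>&1 line sends both stdout and stderr through tee, so you see output on the console and it’s also appended to /var/log/backup_script.log. This is invaluable for troubleshooting. The user running the script needs write access to that file.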
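If you would rather have a timestamp on every log line, one option is a small helper used in place of the blanket exec redirection (using both would record each line twice). This is only a sketch: the log function and the temporary log path are hypothetical and do not appear in the script above.

```shell
#!/bin/bash
# Sketch: timestamped logging helper. LOG_FILE is a stand-in for
# /var/log/backup_script.log so the example can run without root.
LOG_FILE=$(mktemp)

# log LEVEL MESSAGE...: print a timestamped line and append it to LOG_FILE.
log() {
  local level=$1
  shift
  printf '%s [%s] %s\n' "$(date +'%Y-%m-%d %H:%M:%S')" "$level" "$*" | tee -a "$LOG_FILE"
}

log INFO "Starting backup"
log ERROR "Source directory missing"
```

Each call prints the line to the console and appends the same line to the log file.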

A common point of confusion is what the stored paths mean at restore time. With GNU tar’s defaults, /data/file.txt is stored as data/file.txt, and extracting with -C /mnt/restore places it at /mnt/restore/data/file.txt. An archive created with --absolute-names stores /data/file.txt, but extracting it without -P still strips the leading slash, so the file lands in the same place. Only extracting with -P writes straight to /data/file.txt, overwriting whatever is there, so reserve that for a deliberate in-place restore.
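The restore behaviour can be tried safely in a scratch directory. The sketch below builds one archive with default names and one with -P, then extracts both without -P into separate staging directories. It assumes GNU tar; all paths and variable names are illustrative.

```shell
#!/bin/bash
# Sketch: restoring into a staging directory with -C. Assumes GNU tar.
work=$(mktemp -d)
mkdir -p "$work/src/data" "$work/restore_a" "$work/restore_b"
echo "hello" > "$work/src/data/file.txt"

# One archive with default (slash-stripped) names, one with absolute names.
tar -czf "$work/default.tar.gz" "$work/src/data" 2>/dev/null
tar -P -czf "$work/absolute.tar.gz" "$work/src/data"

# Extract both WITHOUT -P: the leading "/" is dropped in either case,
# so everything is unpacked beneath the -C directory.
tar -xzf "$work/default.tar.gz"  -C "$work/restore_a"
tar -xzf "$work/absolute.tar.gz" -C "$work/restore_b" 2>/dev/null

# Record where file.txt ended up, relative to each staging directory.
restored_a=$(cd "$work/restore_a" && find . -type f -name file.txt)
restored_b=$(cd "$work/restore_b" && find . -type f -name file.txt)

echo "from default archive : $restored_a"
echo "from absolute archive: $restored_b"

rm -rf "$work"
```

Both lines should show the same relative path, which is why a staging restore works the same way for either kind of archive.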

The next step you’ll likely face is handling verification. After the backup, you’ll want to run a checksum on the remote file and compare it to a checksum generated locally before transfer.
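A minimal sketch of that verification step follows. It assumes sha256sum is installed on both machines; the three helper functions are hypothetical names, and the remote call is shown only as a comment so the example runs without a server.

```shell
#!/bin/bash
# Sketch: checksum verification helpers. Assumes sha256sum on both hosts.

# sha256_of FILE: print only the hex digest of a local file.
sha256_of() {
  sha256sum "$1" | awk '{print $1}'
}

# remote_sha256 USER@HOST KEY PATH: the same digest, computed on the remote host.
remote_sha256() {
  ssh -i "$2" "$1" "sha256sum \"$3\"" | awk '{print $1}'
}

# checksums_match A B: succeed only if both digests are non-empty and equal.
checksums_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Intended use in the backup script (not executed here):
#   local_sum=$(sha256_of "$BACKUP_FILENAME")
#   remote_sum=$(remote_sha256 "${BACKUP_USER}@${BACKUP_HOST}" "$SSH_KEY" "$REMOTE_BACKUP_PATH")
#   checksums_match "$local_sum" "$remote_sum" || { echo "ERROR: checksum mismatch"; exit 1; }
```

The local digest has to be taken before the local archive is deleted, so in the script above this check would sit between the transfer and the local cleanup.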
