Database replication and failover are critical for ensuring your applications stay online even when a database server goes down. Most people think of replication as just copying data, but its real power is in enabling automatic takeover when the primary fails.
Let’s see this in action with PostgreSQL. We’ll set up a primary server and a replica.
Primary Server (pg_primary.conf)
listen_addresses = '*'
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1024MB # Or wal_keep_segments for older versions
archive_mode = on
archive_command = 'cp %p /var/lib/postgresql/wal_archive/%f'
hot_standby = on
Replica Server (pg_replica.conf)
hot_standby = on
On the primary, we create a replication user and grant them privileges:
-- On primary
CREATE USER replicator WITH REPLICATION PASSWORD 'mysecretpassword';
GRANT CONNECT ON DATABASE mydatabase TO replicator;
Then, on the replica, we initialize it from the primary’s data directory. This involves stopping the replica, clearing its data, and then using pg_basebackup:
# On replica
pg_ctl stop -D /var/lib/postgresql/data
# Clean out existing data (be careful!)
rm -rf /var/lib/postgresql/data/*
# Perform the base backup
pg_basebackup -h pg_primary_ip_address -U replicator -D /var/lib/postgresql/data -Fp -Xs -P -R
The -R flag automatically creates the standby.signal file and populates postgresql.auto.conf (or recovery.conf for older versions) on the replica with connection information to the primary. The pg_basebackup command copies the entire data directory from the primary to the replica, sets up the replica to be a standby, and configures it to stream WAL (Write-Ahead Logging) records from the primary. The wal_level = replica on the primary ensures that enough information is logged to support replication. max_wal_senders defines how many streaming replication connections the primary can handle. wal_keep_size (or wal_keep_segments) on the primary ensures that WAL files are retained on the primary until the replica has successfully received and applied them, preventing the replica from falling too far behind. archive_mode = on and archive_command on the primary are for point-in-time recovery and can also be used by a replica to catch up if streaming replication is interrupted. hot_standby = on allows read-only queries to be executed on the replica while it’s replicating.
The core mechanism is WAL streaming. The primary writes all changes to its WAL files. The replica connects to the primary using a WAL sender process and requests WAL records. These records are sent over the network and applied by the replica’s WAL receiver process, keeping the replica’s data in sync.
The real magic happens with failover. When the primary becomes unavailable, a separate tool or process detects this and promotes the replica to become the new primary. PostgreSQL has built-in support for this, but it often relies on external agents like repmgr or Patroni to manage the promotion process and DNS updates.
Here’s a common configuration for repmgr on the replica:
# repmgr.conf on replica
node.id=2
node.name='pg_replica'
conninfo='host=pg_replica_ip_address dbname=repmgr user=repmgr password=mysecretpassword connect_timeout=2'
data_directory='/var/lib/postgresql/data'
pg_bindir='/usr/pgsql-13/bin'
# Primary connection string for repmgr to monitor
primary_conninfo='host=pg_primary_ip_address dbname=repmgr user=repmgr password=mysecretpassword connect_timeout=2'
And on the primary:
# repmgr.conf on primary
node.id=1
node.name='pg_primary'
conninfo='host=pg_primary_ip_address dbname=repmgr user=repmgr password=mysecretpassword connect_timeout=2'
data_directory='/var/lib/postgresql/data'
pg_bindir='/usr/pgsql-13/bin'
You’d also need a repmgr database and user on both servers, created using repmgr -h <host> -U repmgr -d repmgr -p <port> primary register and repmgr -h <host> -U repmgr -d repmgr -p <port> standby register --primary-node-id=1.
The most surprising thing about automatic failover is how often it doesn’t involve complex network configurations or custom scripts. Tools like Patroni manage the PostgreSQL cluster state, using a distributed consensus system like etcd or Consul to elect a leader and coordinate failover. This eliminates single points of failure in the failover logic itself.
When a primary fails, Patroni on the surviving replica(s) will detect the primary’s absence via its health checks against etcd. If the replica is healthy and no other replica has been promoted, it will initiate the promotion process. This involves stopping WAL streaming, replaying any remaining WAL from the primary’s last known state, and then starting PostgreSQL in standalone mode, allowing it to accept writes. Patroni can also be configured to update DNS records or load balancer configurations to direct application traffic to the new primary.
The critical piece that often trips people up is ensuring the replica can actually catch up to the primary’s state. If the primary crashes before all its committed transactions are written to WAL and sent to the replica, the replica might not have all the data. This is where archive_mode becomes vital. If streaming replication breaks, a replica can potentially recover by fetching missing WAL segments from the archive.
The next challenge you’ll face is managing multiple replicas, ensuring consistent read scaling, and handling split-brain scenarios during network partitions.