Set Up MySQL Primary-Replica Failover with Automatic Promotion (2026)

The most surprising truth about MySQL primary-replica failover is that the "automatic" part is often a carefully orchestrated illusion, relying on external tools to detect failure and manually trigger promotion.

Imagine this setup: a primary MySQL server (let’s call it mysql-primary) handles all writes, and one or more replica servers (mysql-replica-1, mysql-replica-2) mirror its data for read scaling and disaster recovery. When mysql-primary goes down, we need mysql-replica-1 to seamlessly take over as the new primary.

Here’s a simplified view of the components involved.

MySQL Replication Basics:

Binary Log (binlog): The primary server records all data-changing events (inserts, updates, deletes) in its binary log.
Replication Threads:
- I/O Thread (on replica): Connects to the primary, reads binlog events, and writes them to a relay log on the replica.
- SQL Thread (on replica): Reads events from the relay log and applies them to the replica’s data.
CHANGE MASTER TO: This command on the replica tells it where to connect to the primary and which binlog file and position to start reading from.
SHOW MASTER STATUS: On the primary, this shows the current binlog file and position.
SHOW REPLICA STATUS (or SHOW SLAVE STATUS in older versions): On the replica, this shows the I/O and SQL thread states, the last read binlog file/position from the primary, and the executed binlog file/position. Crucially, Seconds_Behind_Master (or Seconds_Behind_Source) tells you how far behind the replica is.

The Failover Illusion:

"Automatic" failover typically involves:

Monitoring: A separate process (like Orchestrator, MHA, or a custom script) constantly checks the health of the primary. This might involve pinging the server, checking if SHOW MASTER STATUS returns valid data, or even running a quick query.
Failure Detection: When the monitor can’t reach the primary or detects an error, it flags a failure.
Replica Identification: The monitor identifies a suitable replica to promote. This usually means the replica that is most caught up (lowest Seconds_Behind_Master) and healthy.
Promotion: The monitor executes commands on the chosen replica to:
- Stop the I/O and SQL threads (if they are running).
- Discard any remaining relay log events that might be inconsistent.
- Tell MySQL to stop replicating from the old primary and start accepting writes as a new primary. This is often done by setting read_only = 0 (if it was set) and potentially reconfiguring server_id.
Reconfiguration: The monitor then updates other replicas to replicate from the new primary, and directs application traffic to it.

Setting Up a Basic Failover Scenario (Conceptual Example):

Let’s say we have two MySQL instances, db-prod-1 (primary) and db-prod-2 (replica).

On db-prod-1 (Primary):

-- Ensure binary logging is enabled
SHOW VARIABLES LIKE 'log_bin';
-- Expected output: log_bin | ON

-- Set a unique server ID (important for replication)
SHOW VARIABLES LIKE 'server_id';
-- Expected output: server_id | 1 (or some unique number)

-- Create a replication user
CREATE USER 'replicator'@'%' IDENTIFIED BY 'your_strong_password';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;

-- Get the current binlog position
SHOW MASTER STATUS;
-- Example output: File | mysql-bin.000001, Position | 12345

On db-prod-2 (Replica):

-- Set a unique server ID, different from the primary
SHOW VARIABLES LIKE 'server_id';
-- Expected output: server_id | 2 (or some unique number)

-- Configure replication to point to the primary
CHANGE MASTER TO
  MASTER_HOST='db-prod-1',
  MASTER_USER='replicator',
  MASTER_PASSWORD='your_strong_password',
  MASTER_LOG_FILE='mysql-bin.000001', -- From SHOW MASTER STATUS on primary
  MASTER_LOG_POS=12345;             -- From SHOW MASTER STATUS on primary

-- Start replication threads
START SLAVE; -- Or START REPLICA;

-- Check replica status
SHOW SLAVE STATUS\G; -- Or SHOW REPLICA STATUS\G
-- Look for:
-- Slave_IO_Running: Yes
-- Slave_SQL_Running: Yes
-- Seconds_Behind_Master: 0 (or a small, stable number)

The "Automatic" Part - Using Orchestrator:

Orchestrator is a popular tool for this. You’d install it on a separate server. It scans your MySQL topology.

Configuration: You’d tell Orchestrator how to connect to your MySQL instances (credentials, hostnames).
Monitoring: Orchestrator continuously checks the health of db-prod-1. It might query SHOW MASTER STATUS and check Seconds_Behind_Master on replicas.
Failure Detection: If db-prod-1 becomes unreachable, Orchestrator detects it.
Promotion Command (Orchestrator UI/CLI): You would typically go to the Orchestrator web UI and click "Discover," then "Promote" on db-prod-2. Orchestrator then executes the necessary steps:
- It stops replication on db-prod-2.
- It makes db-prod-2 writable (SET GLOBAL read_only = 0;).
- It tells other replicas (if any) to start replicating from db-prod-2 as the new primary.

The Real Mechanics of Promotion:

When Orchestrator (or similar tool) promotes db-prod-2, it’s essentially doing this on db-prod-2:

STOP SLAVE; (or STOP REPLICA;) - Prevents further replication from the old primary.
RESET SLAVE ALL; - Clears replication configuration. This is a strong indicator that this server is no longer a replica.
SET GLOBAL read_only = 0; - Makes the server writable.
SET GLOBAL server_id = <old_primary_server_id>; (Optional, but can simplify things if you want the promoted server to retain the original primary’s ID).
FLUSH TABLES WITH READ LOCK; (briefly, if needed to ensure data consistency before reconfiguring other replicas).

Then, it reconfigures other replicas to point to the new primary (db-prod-2) using CHANGE MASTER TO with db-prod-2’s hostname and its current binlog position.

The key is that MySQL itself doesn’t have a built-in "failover" command. It’s a series of SQL commands executed by an external orchestrator after it determines the primary is dead.

The next hurdle you’ll typically face is managing split-brain scenarios, where network issues might cause both the old primary (if it recovers) and the new primary to accept writes independently, leading to data divergence.