MySQL’s semi-synchronous replication can make your data seem more durable than it actually is if you don’t understand its core limitation.

Let’s see it in action. First, we need a primary and a replica.

On the primary:

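-- Binary logging can't be switched on at runtime (log_bin is read-only);
-- enable it with log-bin in my.cnf. It is on by default in MySQL 8.0+. Verify:
SELECT @@log_bin;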
-- Set unique server ID
SET GLOBAL server_id = 1;
-- Set binlog format to ROW
SET GLOBAL binlog_format = 'ROW';
-- Create replication user
CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
FLUSH PRIVILEGES;
-- Get primary's current binlog position
SHOW MASTER STATUS;

Note down the File and Position from SHOW MASTER STATUS.

On the replica:

-- Set unique server ID
SET GLOBAL server_id = 2;
-- Configure replica to connect to primary
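CHANGE MASTER TO
  MASTER_HOST='<primary_ip_address>',
  MASTER_USER='repl',
  MASTER_PASSWORD='password',
  MASTER_LOG_FILE='<primary_log_file_from_show_master_status>',
  MASTER_LOG_POS=<primary_log_pos_from_show_master_status>,
  -- Lets 'repl' authenticate with caching_sha2_password (the 8.0 default) without TLS
  GET_MASTER_PUBLIC_KEY=1;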
-- Start replication
START SLAVE;
-- Check status
SHOW SLAVE STATUS\G

Look for Slave_IO_Running: Yes and Slave_SQL_Running: Yes.

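Now, to make it semi-synchronous, we need two plugins: rpl_semi_sync_master on the primary and rpl_semi_sync_replica on the replica. (MySQL 8.0.26 introduced rpl_semi_sync_source and rpl_semi_sync_replica as replacements for rpl_semi_sync_master and rpl_semi_sync_slave. The older master plugin used here is deprecated but still available in MySQL 8.0.)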

On the primary, load and enable the master plugin:

-- Load the plugin (the shared library is semisync_master.so)
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
-- Enable the plugin
SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- How long to wait for an ACK before falling back to async (10000 ms is the default)
SET GLOBAL rpl_semi_sync_master_timeout = 10000; -- milliseconds
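Before moving on, it's worth confirming that the plugin actually loaded. This is a minimal check against the standard INFORMATION_SCHEMA.PLUGINS table. The LIKE pattern is only a convenience filter, and the same query works on the replica.

```sql
-- Confirm the semi-sync plugin is installed and active on this server
SELECT PLUGIN_NAME, PLUGIN_STATUS
FROM INFORMATION_SCHEMA.PLUGINS
WHERE PLUGIN_NAME LIKE 'rpl_semi_sync%';
-- Expect rpl_semi_sync_master with PLUGIN_STATUS = ACTIVE
```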

On the replica, load and enable the replica plugin:

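-- Load the plugin (MySQL 8.0.26+; the shared library is semisync_replica.so)
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
-- Enable the plugin
SET GLOBAL rpl_semi_sync_replica_enabled = 1;
-- Restart the I/O thread so the replica re-registers with the primary as semi-sync
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;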
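To confirm the two servers are actually talking semi-sync, check the plugin's status variables on each side. This sketch assumes the plugin names used in this walkthrough.

```sql
-- On the primary: number of replicas currently registered as semi-sync
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_clients';

-- On the replica: ON once its I/O thread has registered as semi-sync
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_replica_status';
```

If Rpl_semi_sync_master_clients stays at 0, the usual cause is a replica I/O thread that was not restarted after the plugin was enabled.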

With semi-synchronous replication, a commit on the primary does not return to the client until at least one replica acknowledges that it has received the transaction's events. The replica sends that ACK after writing the events to its relay log and flushing them to disk, not after applying them. When the primary waits depends on rpl_semi_sync_master_wait_point. With AFTER_SYNC (the default since MySQL 5.7), it waits after syncing the binary log but before committing to the storage engine. With AFTER_COMMIT, it commits in the storage engine first and then waits before replying to the client. If no ACK arrives within rpl_semi_sync_master_timeout, the primary stops waiting and falls back to asynchronous replication so that application writes are not blocked indefinitely.
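Both the wait point and the number of required ACKs are configurable. The following is a minimal sketch for inspecting and pinning them on the primary, using the rpl_semi_sync_master_* variable names that belong to the plugin loaded there.

```sql
-- Show every semi-sync setting on the primary
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master%';

-- Wait for the ACK after the binlog sync but before the storage-engine commit
-- (the default since 5.7); AFTER_COMMIT waits only after the engine commit
SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC';

-- Number of replica ACKs required per transaction (default 1)
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1;
```

AFTER_SYNC is the safer choice. With AFTER_COMMIT, other sessions can already read a transaction that no replica has acknowledged, so a failover at that moment can lose data that clients have already seen.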

The core limitation is that the primary waits only for the replica to receive the transaction, not to apply it. The replica can send its ACK and then crash before its SQL (applier) thread has applied the transaction to its data files. The primary treats the write as safe because it got an ACK, but the only other copy is an unapplied event in the replica's relay log. That copy can be lost in two ways. The replica's disk can fail. Or the primary can also fail before the replica restarts with relay_log_recovery enabled, which discards unapplied relay logs and re-fetches them from the primary. Until the event is applied, the row is also invisible to reads on the replica.
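You can watch the gap between receipt and apply directly. This sketch assumes a hypothetical, initially empty table test.t with an integer id column on both servers. It stops only the replica's SQL thread, so the I/O thread keeps receiving and acknowledging events, and then writes on the primary.

```sql
-- On the replica: stop applying, but keep receiving (and ACKing) events
STOP SLAVE SQL_THREAD;

-- On the primary: this commit still returns promptly, because the replica's
-- I/O thread writes the event to its relay log and sends the ACK
INSERT INTO test.t (id) VALUES (1);

-- On the replica: the event was received but not applied, so no row comes back
SELECT * FROM test.t WHERE id = 1;
-- Read_Master_Log_Pos (received) is now ahead of Exec_Master_Log_Pos (applied)
SHOW SLAVE STATUS\G

-- On the replica: resume applying; the row becomes visible
START SLAVE SQL_THREAD;
```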

The acknowledgment itself is a small network packet sent to the primary by the replica's I/O (receiver) thread, not by its applier thread. The primary flags the events that need a reply, and the rpl_semi_sync_replica plugin on the replica sends that reply once the events have been written to the relay log and flushed to disk. On the primary, the rpl_semi_sync_master plugin runs a dedicated ACK receiver thread that collects these replies and releases the sessions waiting on them.

What most people miss is how much depends on rpl_semi_sync_master_timeout. If it is too short for a slow or congested network, the primary falls back to asynchronous replication without any error reaching the application, and you go on believing your writes are protected. If it is too long, writes stall for the full timeout before the fallback kicks in whenever no replica is acknowledging. Falling back on timeout is the plugin's standard behavior, not something new in MySQL 8.0, and the primary switches back to semi-sync automatically once a semi-sync replica catches up. To know which mode you are actually in, monitor these status variables on the primary:

- Rpl_semi_sync_master_status: ON while semi-sync is active.
- Rpl_semi_sync_master_no_times: how many times the primary has fallen back to async.
- Rpl_semi_sync_master_no_tx: commits that no replica acknowledged.

On the replica, monitor Rpl_semi_sync_replica_status.
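To see the silent fallback for yourself, cut off the ACKs and time a write. This sketch assumes a hypothetical table test.t with an integer id column, the 10000 ms timeout configured on the primary, and the default rpl_semi_sync_master_wait_no_slave = ON, which keeps the primary waiting out the timeout even after the replica disconnects.

```sql
-- On the replica: stop receiving events, so no ACKs are sent
STOP SLAVE IO_THREAD;

-- On the primary: this commit blocks for about rpl_semi_sync_master_timeout,
-- then succeeds without any error reaching the client
INSERT INTO test.t (id) VALUES (2);

-- On the primary: status is now OFF and the fallback counters have grown
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('Rpl_semi_sync_master_status',
   'Rpl_semi_sync_master_no_times',
   'Rpl_semi_sync_master_no_tx');

-- On the replica: reconnect; once it catches up, the primary returns to semi-sync
START SLAVE IO_THREAD;
```

At the application layer, the only visible symptom is that pause followed by success. Everything else shows up only in the status counters.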

The next step is to consider how to handle network partitions and make sure failover doesn't lose data, which often leads to exploring MySQL Group Replication or Galera Cluster.
