Prevent and Recover from MySQL Galera Cluster Split-Brain (2026)

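Under the wrong conditions, a Galera cluster can split into two or more partitions that no longer synchronize with each other. Quorum normally keeps at most one side accepting writes, but a network failure combined with misconfiguration or a hasty manual bootstrap can leave more than one side live.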

Here’s how to prevent and recover from that dreaded "split-brain" scenario.

Prevention is Key

The most common culprit for split-brain is a network interruption that isolates nodes from each other, preventing them from sending or receiving replication traffic. This can manifest as a complete network outage between data centers, a faulty switch, or a misconfigured firewall blocking the Galera ports (by default 3306 for MySQL clients, 4567 for Galera group communication, 4568 for Incremental State Transfer, and 4444 for State Snapshot Transfer).

1. Network Stability and Redundancy: This sounds obvious, but it’s the bedrock. Ensure your network infrastructure between Galera nodes is robust.

Diagnosis: Monitor network latency and packet loss between nodes. Tools like ping (with -c 100 for 100 packets) and mtr are your friends.
```
ping -c 100 <node_ip>
mtr --report --report-cycles 100 <node_ip>
```
Fix: Implement redundant network paths. Use bonded interfaces on your servers. If you’re in a multi-datacenter setup, ensure your inter-DC links are highly available and have sufficient bandwidth. This prevents a single cable or switch failure from taking down the cluster.
Why it works: Redundant paths ensure that if one network link fails, traffic can still flow between nodes, maintaining quorum and preventing isolation.
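The following is a minimal POSIX shell sketch of turning the ping summary into a pass/fail check, for example from cron or a monitoring agent. It assumes the Linux iputils summary line format ("100 packets transmitted, 98 received, 2% packet loss, ..."); the function names ping_loss_pct and ping_loss_ok and the 1% default threshold are hypothetical choices, not Galera recommendations.

```shell
# Sketch, meant to be sourced: parse the summary line printed by Linux
# (iputils) ping, e.g.
#   100 packets transmitted, 98 received, 2% packet loss, time 99133ms
# Function names and the default threshold are hypothetical.

# Read ping output on stdin; print the loss percentage (e.g. "2" or "0.5").
ping_loss_pct() (
  sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p' | head -n 1
)

# Succeed if the loss read from stdin is below the threshold in percent
# (default 1). Missing summary output counts as a failure.
ping_loss_ok() (
  threshold=${1:-1}
  loss=$(ping_loss_pct)
  [ -n "$loss" ] || exit 1
  # awk compares numerically, so fractional values such as 0.5 work
  awk -v l="$loss" -v t="$threshold" 'BEGIN { exit !(l < t) }'
)
```

Usage, with the functions sourced into your shell: ping -c 100 -q <node_ip> | ping_loss_ok 1 || echo "packet loss to <node_ip> above 1%".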

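2. Quorum and wsrep_provider_options: Galera relies on a quorum to make decisions. A partition that cannot reach a majority of the cluster (measured by node weight, pc.weight, which defaults to 1 per node) becomes non-Primary and refuses queries to prevent inconsistencies. Run an odd number of nodes, or add the Galera Arbitrator (garbd), so a clean split always leaves one side with a majority. Never set pc.ignore_sb or pc.ignore_quorum in wsrep_provider_options on a multi-writer cluster, because both disable exactly this protection. The wsrep_cluster_address setting is also critical here.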

Diagnosis: Check your my.cnf (or galera.cnf) for the wsrep_cluster_address setting. It should list all nodes in the cluster.
```
[galera]
wsrep_cluster_address = "gcomm://192.168.1.101,192.168.1.102,192.168.1.103"
```
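Fix: Ensure this setting accurately lists all nodes intended to be in the cluster. A starting node needs to reach only one listed peer to learn the full membership, but if every peer it lists is down or missing, it cannot join. Never leave the value as an empty gcomm:// in the config file: a node started that way bootstraps a brand-new cluster instead of joining the existing one, which is a classic way to create split-brain. Restart the node after correcting the setting.
Why it works: This directive tells each node which other nodes it should try to connect to when it starts or rejoins. A complete list gives it several chances to find a live member of the existing cluster.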
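To make the quorum rule concrete: Galera keeps a partition Primary only if the summed pc.weight of its members is more than half the weight of the last Primary Component, after discounting nodes that left gracefully. The POSIX shell function below is a minimal sketch of that arithmetic for reasoning about topologies; has_quorum is a hypothetical name, and it does not query a live cluster.

```shell
# Sketch of Galera's weighted-quorum rule. A partition keeps (or forms) the
# Primary Component only if its members' total weight is strictly greater
# than half of (previous Primary Component weight - weight that left
# gracefully). has_quorum is a hypothetical helper name.
#
# Usage: has_quorum PARTITION_WEIGHT PREVIOUS_TOTAL_WEIGHT [GRACEFUL_LEFT_WEIGHT]
has_quorum() (
  partition=$1
  total=$2
  left=${3:-0}
  # 2 * partition > (total - left) keeps everything in integer arithmetic
  [ $((2 * partition)) -gt $((total - left)) ]
)
```

For the four-node layout used in the recovery section (N1 and N2 against N3 and N4, all with the default weight of 1), has_quorum 2 4 fails for both halves, so both go non-Primary. With three or five nodes, a single clean split into two sides always leaves one of them with a majority.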

3. wsrep_sst_donor and SST Failures: When a new node joins or a restarted node needs to catch up, it performs a State Snapshot Transfer (SST). If this process fails, it can leave a node in an inconsistent state or prevent it from joining.

Diagnosis: Check the MySQL error logs (mysqld.log or similar) on both the donor and the joining node for SST-related errors. Look for messages indicating failed connections, timeouts, or data transfer issues.
```
grep "SST failed" /var/log/mysql/mysqld.log
```
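Fix: Ensure the SST user (wsrep_sst_auth, on versions that use it) has the privileges your SST method needs on the donor node: typically RELOAD, LOCK TABLES, PROCESS and REPLICATION CLIENT for xtrabackup-v2, while rsync needs no database user at all. Verify that wsrep_sst_method (e.g., rsync, xtrabackup-v2) is configured correctly and that the necessary tools are installed and accessible on all nodes. If using xtrabackup, ensure its version is compatible with your MySQL version. You can also set wsrep_sst_donor to the wsrep_node_name of a lightly loaded node so joiners don’t pick a busy donor.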
Why it works: A successful SST ensures the joining node receives a consistent, up-to-date copy of the data, allowing it to rejoin the cluster without causing divergence.
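Because SST error wording differs between Galera distributions and versions, a fixed string like "SST failed" can miss relevant lines. The following is a broader, minimal sketch; sst_errors is a hypothetical helper, and its default log path is the one used above, which may differ on your system.

```shell
# Sketch: print SST-related problem lines from a MySQL/MariaDB error log.
# Matches any line mentioning SST together with an error, failure or
# timeout keyword, and exits non-zero when nothing matches.
# sst_errors is a hypothetical helper; the default path is an assumption.
sst_errors() (
  log=${1:-/var/log/mysql/mysqld.log}
  grep -i 'sst' "$log" | grep -iE 'error|fail|timeout|timed out'
)
```

Run it on both the donor and the joiner (sst_errors /var/log/mysql/mysqld.log). Since it exits non-zero when nothing matches, it can also drive a simple monitoring check.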

4. wsrep_cluster_name Consistency: A simple typo or omission in this parameter can prevent nodes from recognizing each other as part of the same cluster.

Diagnosis: Verify the wsrep_cluster_name setting in my.cnf on all nodes.
```
[galera]
wsrep_cluster_name = "my_galera_cluster"
```
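Fix: Ensure wsrep_cluster_name is identical across all nodes. A node whose name differs is refused by the others, so it either fails to join or, if bootstrapped, runs as a separate cluster of its own. Correct the name on the mismatched nodes and restart them one at a time so the rest of the cluster keeps quorum.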
Why it works: This parameter acts as a unique identifier for your cluster. Nodes will only attempt to join or communicate with other nodes that share the exact same cluster name.
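Here is a minimal sketch for comparing wsrep_cluster_name across nodes, assuming you have copied each node's option file locally (for example with scp) and that each option sits on its own "key = value" line. cnf_value and same_cnf_value are hypothetical helpers; they ignore [section] headers (mysqld only reads certain groups) and do not treat dashes and underscores in option names as equivalent (mysqld does), so treat a mismatch report as a prompt to look rather than a verdict.

```shell
# Sketch: compare an option's value across locally copied MySQL option files.
# Assumptions: one "key = value" per line, optional double quotes, no
# trailing comments. cnf_value and same_cnf_value are hypothetical helpers.

# Print the value of option $1 in file $2 (the last occurrence wins).
cnf_value() (
  awk -v key="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (line ~ /^[#;]/) next            # skip comment lines
      n = index(line, "=")
      if (n == 0) next                    # skip section headers and the like
      k = substr(line, 1, n - 1)
      sub(/[ \t]+$/, "", k)
      if (k != key) next
      v = substr(line, n + 1)
      sub(/^[ \t]+/, "", v)
      sub(/[ \t]+$/, "", v)
      gsub(/^"|"$/, "", v)                # strip surrounding double quotes
      val = v
    }
    END { print val }
  ' "$2"
)

# Succeed if option $1 has the same non-empty value in every file given.
same_cnf_value() (
  key=$1
  shift
  first=""
  for f in "$@"; do
    v=$(cnf_value "$key" "$f")
    if [ -z "$v" ]; then
      echo "$f: $key is not set" >&2
      exit 1
    fi
    if [ -z "$first" ]; then
      first=$v
    elif [ "$v" != "$first" ]; then
      echo "$f: $key = '$v', expected '$first'" >&2
      exit 1
    fi
  done
)
```

Usage: same_cnf_value wsrep_cluster_name n1.cnf n2.cnf n3.cnf && echo "cluster names match".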

5. gmcast.listen_addr and gmcast.mcast_addr: These settings relate to how Galera nodes discover each other, especially in multicast or specific network configurations. Incorrect settings here can lead to nodes being unable to find peers.

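Diagnosis: Look for gmcast.listen_addr and gmcast.mcast_addr in the wsrep_provider_options line of your my.cnf. Both are Galera provider options, so they must go inside that single semicolon-separated string; as standalone lines they are not valid server options.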
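```
[galera]
wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4567"
# Only if using multicast (all provider options share one string):
# wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4567; gmcast.mcast_addr=239.255.255.255"
```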
If you’re not using multicast (unicast is generally the more stable choice), leave gmcast.mcast_addr unset and make sure gmcast.listen_addr is correct.
Fix: For unicast, ensure wsrep_cluster_address lists the peers’ IP addresses in its gcomm:// URL. Remove gmcast.mcast_addr unless you explicitly need and understand multicast, as it can be unreliable on many modern networks. If you do use it, ensure firewalls allow UDP traffic to the multicast address.
Why it works: These settings dictate how nodes broadcast their presence and listen for other nodes. Correct configuration ensures nodes can discover and communicate with each other.
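Because gmcast.* (and pc.*) settings live inside the single semicolon-separated wsrep_provider_options string, it helps to be able to pull one key out of it. The following is a minimal sketch; provider_opt is a hypothetical helper that only splits on semicolons and the first "=" and trims surrounding whitespace.

```shell
# Sketch: print the value of one provider option (e.g. gmcast.listen_addr)
# from a wsrep_provider_options string made of "key=value" pairs separated
# by semicolons. provider_opt is a hypothetical helper.
provider_opt() (
  printf '%s\n' "$1" | tr ';' '\n' | awk -v key="$2" '
    {
      n = index($0, "=")
      if (n == 0) next                  # skip empty or malformed segments
      k = substr($0, 1, n - 1)
      v = substr($0, n + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)    # trim whitespace around key
      gsub(/^[ \t]+|[ \t]+$/, "", v)    # and around value
      if (k == key) print v
    }
  '
)
```

On a running node you can feed it the live value: provider_opt "$(mysql -N -B -e 'SELECT @@wsrep_provider_options')" gmcast.listen_addr. The same call with pc.ignore_sb should not print true on a multi-writer cluster, because true disables split-brain protection.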

6. innodb_flush_log_at_trx_commit and Data Integrity: While not a direct cause of split-brain, an incorrect setting here can exacerbate data loss after a split occurs.

Diagnosis: Check innodb_flush_log_at_trx_commit. A value of 1 is ACID compliant but can be slower. Values of 0 or 2 are faster but risk data loss on crash.
```
[mysqld]
innodb_flush_log_at_trx_commit = 1
```
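Fix: For Galera, innodb_flush_log_at_trx_commit = 1 is the safe default. Synchronous replication means a single node that crashes with unflushed transactions can usually get them back from its peers through IST or SST when it rejoins, which is why some deployments accept 0 or 2 for speed. That safety net disappears if every node goes down at once (for example, a power failure in a single data center): transactions that were acknowledged but not yet flushed on any node are then lost.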
Why it works: Setting this to 1 ensures that each committed transaction’s log entry is flushed to disk synchronously, guaranteeing durability even if the server crashes immediately after acknowledging the commit.

Recovering from Split-Brain

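When split-brain happens, or a partition loses quorum, you’ll typically see nodes refuse queries with "WSREP has not yet prepared node for application use", SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'; report non-Primary, and the MySQL error logs show messages about failing to reach quorum or diverging states.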

The General Strategy:

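Identify the "True" Cluster: Determine which of the split partitions contains the most up-to-date and correct data. This often involves comparing wsrep_last_committed on running nodes (or the seqno in grastate.dat on stopped ones), checking application state, or simply seeing which partition has the majority of nodes. If both partitions accepted writes, no single number settles the question; application knowledge has to.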
Isolate the "Bad" Partition: Stop MySQL (or Galera) on all nodes in the partition you’ve deemed incorrect. This prevents them from accepting new writes or further diverging.
Bootstrap the "Good" Partition: If the good partition has lost quorum (it reports non-Primary) or has been shut down, bootstrap it from its node with the most recent state; the remaining nodes of that partition then rejoin it normally.
Re-integrate and Re-sync: Start the nodes from the "bad" partition one by one, ensuring they perform an SST from a node in the "good" partition.
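When the nodes are stopped, a common way to judge which one holds the most recent state is the seqno in its grastate.dat. A value of -1 means the node crashed or was still running when the file was written, so its position must be recovered first, for example with mysqld --wsrep-recover. Below is a minimal POSIX shell sketch, assuming you have copied each node's grastate.dat locally and that all nodes share the same cluster uuid; grastate_seqno and pick_bootstrap_node are hypothetical helpers. If both partitions accepted writes, the highest seqno only tells you which side did more work, not which data is correct.

```shell
# Sketch: choose a bootstrap candidate by comparing grastate.dat seqno values.
# Assumes locally copied files that all carry the same cluster uuid (seqno
# values from different uuids are not comparable). Helper names are
# hypothetical.

# Print the seqno recorded in a grastate.dat file.
grastate_seqno() (
  awk '$1 == "seqno:" { print $2 }' "$1"
)

# Arguments: pairs of "NODE_NAME GRASTATE_FILE". Prints the node with the
# highest usable seqno; nodes with a missing seqno or -1 are skipped with
# a warning on stderr.
pick_bootstrap_node() (
  best_node=""
  best_seqno=""
  while [ "$#" -ge 2 ]; do
    node=$1
    file=$2
    shift 2
    seqno=$(grastate_seqno "$file")
    if [ -z "$seqno" ] || [ "$seqno" = "-1" ]; then
      echo "warning: no usable seqno for $node ($file); recover it first" >&2
      continue
    fi
    if [ -z "$best_seqno" ] || [ "$seqno" -gt "$best_seqno" ]; then
      best_node=$node
      best_seqno=$seqno
    fi
  done
  [ -n "$best_node" ] || exit 1
  echo "$best_node"
)
```

Usage: pick_bootstrap_node N1 n1.grastate.dat N2 n2.grastate.dat N3 n3.grastate.dat N4 n4.grastate.dat prints the node to bootstrap from, with warnings for any node whose position must be recovered first.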

Specific Recovery Steps (Common Scenario: Two Partitions, A and B)

Let’s say nodes N1, N2 are in partition A, and N3, N4 are in partition B. You’ve determined A is the correct partition.

Stop Writes on Partition B:
- On N3:
```
sudo systemctl stop mysql
# Or: sudo service mysql stop
```
- On N4:
```
sudo systemctl stop mysql
# Or: sudo service mysql stop
```
- Why it works: This prevents any further writes from occurring on the partition that is considered "wrong," stopping data divergence.

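Bootstrap Partition A (if necessary): If N1 and N2 are still running and can see each other, check SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'; on N1. If it reports Primary, there is nothing to do. In a 2/2 split, however, neither half holds a majority, so both report non-Primary and partition A has to be forced to become Primary. If N1 and N2 have both stopped, bootstrap from N1 without touching its data.

On N1 (assuming it has the most recent state):

```
# Case 1: N1 and N2 are running but report non-Primary.
# Force partition A to become the Primary Component (run on ONE node only):
mysql -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';"

# Case 2: N1 and N2 are both stopped.
# NEVER delete grastate.dat or data files on the node whose data you keep.
sudo cat /var/lib/mysql/grastate.dat
# If safe_to_bootstrap is 0 and you are sure N1 has the latest state, set it to 1:
sudo sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
# Start N1 as a new Primary Component; the command depends on the distribution:
sudo galera_new_cluster                           # MariaDB and others that ship it
# sudo systemctl start mysql@bootstrap.service    # Percona XtraDB Cluster
# Afterwards, on N2, start MySQL normally so it joins N1:
# sudo systemctl start mysql
```

Why it works: pc.bootstrap=YES tells a running partition to declare itself the Primary Component even without a majority, which is only safe because partition B has already been stopped. Bootstrapping a stopped node makes it start a new Primary Component instead of waiting for peers, resuming from the position recorded in its grastate.dat; the other nodes then join it and catch up through IST or SST. The safe_to_bootstrap flag exists to stop you bootstrapping from a node that may not hold the latest state, so override it only on the node you have verified. Deleting grastate.dat or data files on this node would throw away exactly the state you decided to keep.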

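Re-integrate Partition B Nodes: Now bring N3 and N4 back online so they join partition A, forcing each of them to take a full SST from a node in partition A. The SST overwrites their local data, so if partition B accepted writes during the split that you may need to reconcile, back up N3’s and N4’s data directories first.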
- On N3 (assuming N1 is now a healthy node in the correct cluster):
```
# Ensure N3 is clean. Stop MySQL if running.
sudo systemctl stop mysql
# Reset N3's Galera state to ensure it performs an SST.
# Typically, this means ensuring grastate.dat indicates it needs SST.
# The safest way is often to remove grastate.dat and let it start fresh.
sudo rm /var/lib/mysql/grastate.dat
# Restart MySQL. It will see no grastate.dat and attempt to join the cluster
# specified in wsrep_cluster_address, performing an SST.
sudo systemctl start mysql
```
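- Repeat for N4, but only after N3 reports Synced in wsrep_local_state_comment, so the donor serves one SST at a time.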
- Why it works: By removing grastate.dat, you signal to Galera that this node doesn’t have a valid cluster state and needs to perform a full State Snapshot Transfer (SST) from a healthy node in the cluster it’s trying to join. This overwrites its local data with a consistent copy.
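Verify and Monitor: Once all nodes are back up, run SHOW GLOBAL STATUS LIKE 'wsrep_%'; on each node and check that wsrep_cluster_size shows the expected number of nodes (4 in this example), wsrep_cluster_status is Primary, and wsrep_local_state_comment is Synced. Keep monitoring the error logs for new errors.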
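The following is a minimal sketch that turns those checks into a single pass/fail test, assuming you feed it the tab-separated output of mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"; galera_node_ok is a hypothetical helper name.

```shell
# Sketch: read "name<TAB>value" rows of wsrep status on stdin and succeed
# only if the node is in the Primary Component, Synced, and sees the
# expected cluster size. galera_node_ok is a hypothetical helper.
galera_node_ok() (
  expected_size=$1
  awk -F '\t' -v want="$expected_size" '
    $1 == "wsrep_cluster_status"      { status = $2 }
    $1 == "wsrep_local_state_comment" { state = $2 }
    $1 == "wsrep_cluster_size"        { size = $2 }
    END {
      ok = (status == "Primary" && state == "Synced" && size == want)
      if (!ok)
        printf "status=%s state=%s size=%s (expected %s)\n", status, state, size, want | "cat 1>&2"
      exit !ok
    }
  '
)
```

Usage: mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | galera_node_ok 4 on each of N1 to N4 should succeed once recovery is complete.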

The next error you’ll hit after fixing split-brain is usually related to application-level data consistency issues if the split-brain wasn’t handled perfectly, or perhaps a resource exhaustion problem if your cluster was already under heavy load.