MySQL’s binary log format isn’t just a technical detail; it fundamentally dictates how your database handles replication, point-in-time recovery, and even certain types of statement execution. The choice between Row, Statement, and Mixed formats dramatically impacts performance, safety, and the types of operations you can reliably perform.
Let’s see it in action. Imagine you have a simple table and you execute an UPDATE statement.
-- Table definition
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100),
last_login TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Data
INSERT INTO users (name) VALUES ('Alice'), ('Bob');
-- Update statement
UPDATE users SET last_login = NOW() WHERE name = 'Alice';
Now, let’s peek into the binary log with different formats.
Statement-Based Replication (SBR)
In SBR, the actual SQL statement that modified the data is written to the binary log.
If you were to view the binary log using mysqlbinlog, you’d see something like this (simplified):
# ... metadata ...
# at 123
# mysql-bin.000001:123
# START TRANSACTION
# at 150
# SET TIMESTAMP=1678886400
# ...
# UPDATE users SET last_login = NOW() WHERE name = 'Alice'
# COMMIT
How it works: The replica server simply re-executes the exact same UPDATE statement.
When it’s good:
- Smaller log size: For operations that affect many rows with a single statement (like
UPDATE users SET status = 'inactive' WHERE last_login < '2023-01-01'), SBR logs only one line, whereas RBR would log each row modification. - Simpler to read: The log is human-readable SQL.
When it’s bad:
- Non-deterministic statements:
NOW(),RAND(),UUID(), and stored routines that rely on non-deterministic functions or have side effects can lead to replication divergence. The replica might executeNOW()at a different time than the primary, resulting in different data. LOAD DATA INFILE: This statement is problematic because the file itself isn’t part of the log, and its contents might differ on the replica.ALTER TABLEand other DDL: These can be tricky. While generally safe, certain DDL operations might behave differently across versions or configurations.
Row-Based Replication (RBR)
In RBR, the binary log records the changes to individual rows. For each row modified, it logs the old and new values.
For the same UPDATE statement, the binary log (simplified) might look like this:
# ... metadata ...
# at 123
# mysql-bin.000001:123
# START TRANSACTION
# at 150
# SET TIMESTAMP=1678886400
# ...
# UPDATE employees SET last_login = '2023-03-15 10:00:00' WHERE id = 1
# ... (details of the row change) ...
# COMMIT
This would typically be represented in a more structured, binary format within the log, detailing the table, the primary key(s) of the row(s) affected, and the columns that changed with their new values.
How it works: The replica server receives the precise row modifications and applies them directly. It doesn’t re-execute the statement, but rather the result of the statement on specific rows.
When it’s good:
- Safety and Determinism: RBR is the safest format. It guarantees that the replica will have the exact same data as the primary because it logs the actual data changes, not the statements that produced them. Non-deterministic functions like
NOW()are resolved on the primary, and that specific timestamp is logged. - Reliable for all statements: Works perfectly with
UPDATE,DELETE,INSERT,LOAD DATA INFILE, and most DDL.
When it’s bad:
- Larger log size: If a single
UPDATEstatement affects thousands of rows, RBR will log each of those row changes, potentially leading to a much larger binary log file compared to SBR. - Performance overhead: Writing detailed row data can sometimes be more I/O intensive than writing a simple SQL statement.
Mixed-Based Replication (MBR)
Mixed-based replication is a hybrid approach. It defaults to Statement-Based Replication for safety and efficiency but switches to Row-Based Replication for statements that are deemed unsafe for SBR.
How it works: MySQL analyzes each statement. If it’s a simple, deterministic statement (like UPDATE users SET name = 'Alice' WHERE id = 1), it’s logged using SBR. If the statement uses non-deterministic functions (NOW()), involves LOAD DATA INFILE, or is otherwise flagged as potentially unsafe for SBR, it’s automatically logged using RBR.
When it’s good:
- Best of both worlds: Offers the performance and log size advantages of SBR for most common operations, while providing the safety of RBR for problematic statements.
- Automatic safety: You don’t have to manually identify and convert risky statements; MySQL handles it.
When it’s bad:
- Still a potential for SBR issues: While it switches to RBR for known unsafe statements, there might be edge cases or future SQL constructs that MySQL doesn’t automatically detect as unsafe for SBR, leading to potential divergence.
- Log complexity: The log can contain a mix of statement and row events, making it slightly more complex to parse manually.
How to Choose and Configure
The binlog_format system variable controls this. You can set it globally or per session.
To check your current setting:
SHOW VARIABLES LIKE 'binlog_format';
To change it (requires SUPER privilege and a server restart, or can be set dynamically for new connections):
-- Set globally for new connections (dynamic)
SET GLOBAL binlog_format = 'ROW';
-- Set in a specific session
SET SESSION binlog_format = 'ROW';
Recommendation:
- For most modern applications,
ROWis the safest and recommended format. The potential for log size increase is usually manageable, and the guarantee of data consistency is paramount. MIXEDis a good compromise if you have a large, existing codebase and are concerned about log size, but want better safety than pure SBR.STATEMENTis generally discouraged unless you have a very specific, well-understood workload where you can guarantee all statements are deterministic and safe, and log size is a critical constraint.
The next thing you’ll likely encounter is understanding how these formats interact with specific MySQL features like GTID (Global Transaction Identifiers) and binlog_row_image.