MySQL’s Performance Schema is more than a diagnostic tool: used well, it tells you exactly which tuning changes will pay off.

Let’s see it in action. Imagine we’re running a busy e-commerce site. Users are complaining about slow page loads, especially during peak hours. We suspect disk I/O is the culprit.

Here’s a query we might run against Performance Schema’s events_waits_summary_global_by_event_name table during a period of slowness:

SELECT
  EVENT_NAME,
  COUNT_STAR AS total_count,
  -- Timer columns are in picoseconds; divide by 10^12 for seconds.
  SUM_TIMER_WAIT / 1000000000000 AS total_time_seconds,
  AVG_TIMER_WAIT / 1000000000000 AS avg_time_seconds
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/io/file/%'
  AND COUNT_STAR > 0
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

And the output might look something like this:

+--------------------------------------+-------------+----------------------+--------------------+
| EVENT_NAME                           | total_count | total_time_seconds   | avg_time_seconds   |
+--------------------------------------+-------------+----------------------+--------------------+
| wait/io/file/sql/innodb              | 15000000    | 120.50               | 8.03e-06           |
| wait/io/file/innodb/innodb_log_file  | 500000      | 85.20                | 0.000170           |
| wait/io/file/innodb/innodb_data_file | 10000000    | 35.30                | 3.53e-06           |
| wait/io/file/sql/archive_log         | 20000       | 5.10                 | 0.000255           |
+--------------------------------------+-------------+----------------------+--------------------+

This output tells us that wait/io/file/innodb/innodb_log_file is consuming a disproportionate amount of time: 85.2 seconds across only 500,000 operations, or about 170 microseconds each. That is roughly 20 to 50 times the per-operation cost of the two higher-volume events. This is a strong indicator that writing to the InnoDB redo log is a major bottleneck.

The problem we’re solving is that MySQL, particularly InnoDB, relies heavily on disk I/O for its operations. When disk I/O becomes saturated, the database server can’t keep up with requests, leading to slow queries and overall poor performance. Performance Schema allows us to pinpoint which specific types of I/O operations are causing the slowdown, rather than just knowing "disk is slow."

Internally, Performance Schema works by instrumenting various points within the MySQL server, including I/O operations. When a thread performs an instrumented I/O operation, the server records the event and the time the thread spent waiting. Timers are reported in picoseconds, which is why the query above divides by 10^12 to get seconds. These events are aggregated, allowing us to see patterns and identify the most time-consuming operations. The events_waits_summary_global_by_event_name table provides a high-level overview of these aggregated wait events across the entire server.
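Aggregates are only collected for instruments that are switched on, so it is worth confirming that before trusting the numbers. The following is a minimal sketch against the setup_instruments table; the 'wait/io/file/innodb/%' pattern in the UPDATE is an assumption about which instruments matter here, so adjust it to your case.

```sql
-- List the file I/O wait instruments and whether they are collecting data.
-- ENABLED = 'YES' means events are counted; TIMED = 'YES' means they are also timed.
SELECT NAME, ENABLED, TIMED
FROM performance_schema.setup_instruments
WHERE NAME LIKE 'wait/io/file/%'
ORDER BY NAME;

-- If an instrument of interest is off, it can be switched on at runtime.
-- (Assumed pattern: the InnoDB file instruments.)
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'wait/io/file/innodb/%';
```

Changes made this way apply to the running server only and are lost on restart.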

The key to reading this table is knowing what each EVENT_NAME means and weighing its total_time_seconds against its avg_time_seconds. For example, wait/io/file/innodb/innodb_log_file covers waits on the InnoDB redo log files. High values here point to potential issues with synchronous writes to the redo log, which are critical for durability.
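To see whether redo log time is going into the writes themselves or into syncing them to disk, the file_summary_by_event_name table splits each file event into read, write, and miscellaneous operations. This sketch assumes the redo log instrument is named wait/io/file/innodb/innodb_log_file, as it is on stock MySQL 5.7 and 8.0. The MISC columns cover everything that is not a read or a write, which is where sync calls are counted.

```sql
-- Break redo log file waits down by operation type.
-- Timer columns are in picoseconds, so divide by 10^12 for seconds.
SELECT
  EVENT_NAME,
  COUNT_WRITE,
  SUM_TIMER_WRITE / 1000000000000 AS write_seconds,
  COUNT_MISC,                                       -- includes sync calls
  SUM_TIMER_MISC / 1000000000000 AS misc_seconds,
  SUM_NUMBER_OF_BYTES_WRITE AS bytes_written
FROM performance_schema.file_summary_by_event_name
WHERE EVENT_NAME = 'wait/io/file/innodb/innodb_log_file';
```

If misc_seconds dwarfs write_seconds, most of the wait is likely sync time rather than the volume of log data written.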

If we see a lot of time in wait/io/file/innodb/innodb_log_file, the first thing to check is the innodb_flush_log_at_trx_commit setting. If it’s set to 1 (the default, required for full ACID compliance), every transaction commit writes the redo log and synchronously flushes it to disk with fsync(). This is the safest but slowest option.

Diagnosis and Fix:

  1. Identify the bottleneck: As seen above, wait/io/file/innodb/innodb_log_file is high.
  2. Check configuration:
    SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
    
    If it’s 1, this is likely the cause.
  3. Consider alternative settings:
    • Setting to 2: SET GLOBAL innodb_flush_log_at_trx_commit = 2; With this setting the log is written to the OS file cache at every commit, and InnoDB flushes it to disk about once per second. Commits no longer wait for an fsync(), which significantly reduces I/O. A crash of mysqld alone loses nothing, but an OS crash or power failure can lose up to about one second of committed transactions.
    • Setting to 0: SET GLOBAL innodb_flush_log_at_trx_commit = 0; This is the fastest option. Nothing is written to the log file at commit; instead the log buffer is written and flushed to disk about once per second. The risk is higher, because any crash of mysqld, not just of the OS, can lose up to about one second of transactions.
  4. Why it works: Changing innodb_flush_log_at_trx_commit from 1 to 2 or 0 takes the synchronous fsync() out of the commit path. Instead of one fsync() per commit, InnoDB flushes in the background about once per second (the interval is governed by innodb_flush_log_at_timeout), so many commits share a single flush and the I/O load drops.
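One way to check whether the change actually helped is to reset the aggregated counters, apply the setting, and read the same event again after a comparable period of traffic. The sequence below is a sketch under the assumption that nothing else (a monitoring tool, for instance) depends on the running totals, since TRUNCATE resets them for everyone. Record the baseline figures before step 1.

```sql
-- 1. Reset the aggregated wait counters so the next reading covers only the test window.
TRUNCATE TABLE performance_schema.events_waits_summary_global_by_event_name;

-- 2. Apply the candidate setting (takes effect immediately, lost on restart).
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- 3. After a representative period of traffic, read the redo log waits again.
SELECT
  EVENT_NAME,
  COUNT_STAR AS total_count,
  SUM_TIMER_WAIT / 1000000000000 AS total_time_seconds,
  AVG_TIMER_WAIT / 1000000000000 AS avg_time_seconds
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME = 'wait/io/file/innodb/innodb_log_file';
```

Comparing windows with similar traffic matters more than the absolute numbers, because the counters scale with load.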

Another common culprit is wait/io/file/innodb/innodb_data_file. This indicates waits on reading or writing the InnoDB data files.

Diagnosis and Fix:

  1. Identify the bottleneck: wait/io/file/innodb/innodb_data_file is high.
  2. Check disk performance: Use OS-level tools like iostat or iotop to see if the underlying disk subsystem is saturated.
  3. Analyze query patterns: If I/O is high, it’s often due to inefficient queries causing excessive data reads.
    • Diagnosis Command:
      -- The digest table already holds one aggregated row per schema and
      -- statement digest, so no SUM() or GROUP BY is needed.
      SELECT
        q.SCHEMA_NAME,
        q.DIGEST_TEXT,
        q.SUM_ROWS_EXAMINED AS total_rows_examined,
        q.SUM_ROWS_SENT AS total_rows_sent,
        q.SUM_CREATED_TMP_DISK_TABLES AS total_tmp_disk_tables,
        q.SUM_SORT_ROWS AS total_sort_rows,
        q.SUM_TIMER_WAIT / 1000000000000 AS total_time_seconds
      FROM performance_schema.events_statements_summary_by_digest AS q
      WHERE q.SUM_ROWS_EXAMINED > 1000000 OR q.SUM_CREATED_TMP_DISK_TABLES > 100
      ORDER BY q.SUM_TIMER_WAIT DESC
      LIMIT 10;
      
    • Fix: Optimize queries identified by the above command. This might involve adding indexes, rewriting joins, or avoiding full table scans. For example, if a query is doing a full table scan on a large table and examining millions of rows, adding an appropriate index can drastically reduce the amount of data read from disk.
  4. Consider hardware: If the queries are already optimized and disk I/O is still high, it might be time to upgrade to faster storage (e.g., SSDs) or a more performant RAID configuration.
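Before blaming hardware, it can help to see which individual data files absorb the I/O, since a handful of hot tablespaces usually points back to specific tables and queries. The sketch below uses the file_summary_by_instance table and assumes the data file instrument is named wait/io/file/innodb/innodb_data_file, as on stock MySQL 5.7 and 8.0.

```sql
-- Rank individual InnoDB data files by total time spent waiting on them.
SELECT
  FILE_NAME,
  COUNT_READ,
  SUM_NUMBER_OF_BYTES_READ AS bytes_read,
  COUNT_WRITE,
  SUM_NUMBER_OF_BYTES_WRITE AS bytes_written,
  SUM_TIMER_WAIT / 1000000000000 AS total_time_seconds
FROM performance_schema.file_summary_by_instance
WHERE EVENT_NAME = 'wait/io/file/innodb/innodb_data_file'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```

With file-per-table tablespaces, FILE_NAME maps directly to a table, which narrows down which of the slow digests to optimize first.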

Beyond these specific I/O waits, Performance Schema also provides insights into other areas that can indirectly impact I/O, such as buffer pool efficiency (the Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads counters in the global_status table) and row locking (the data_locks and data_lock_waits tables in MySQL 8.0). Understanding these relationships helps build a complete picture of performance.
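As a rough check on how often the buffer pool has to go to disk, the two InnoDB status counters can be read from the global_status table (available in MySQL 5.7 and later). This is a sketch; both values are cumulative since server start, so compare two readings taken some time apart instead of relying on the lifetime totals.

```sql
-- Innodb_buffer_pool_read_requests: logical read requests.
-- Innodb_buffer_pool_reads: requests that missed the pool and went to disk.
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME IN (
  'Innodb_buffer_pool_read_requests',
  'Innodb_buffer_pool_reads'
);
```

Dividing the growth in Innodb_buffer_pool_reads by the growth in Innodb_buffer_pool_read_requests over the same interval gives the miss ratio. A rising ratio suggests the working set no longer fits in the buffer pool, which in turn drives up data file waits.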

The one detail often overlooked is that wait/io/file/... events measure wall-clock time, not CPU time: they record how long server threads sat blocked inside file I/O calls. At the OS level that time typically shows up as I/O wait, not as user CPU. If your system reports low CPU usage alongside high I/O wait, the CPU is idle not because it has nothing to do, but because it’s waiting for data to be read from or written to disk. This is the classic symptom of an I/O bottleneck. If it reports high CPU usage and high I/O wait together, MySQL is both actively processing and stuck waiting for the disk.

After addressing redo log flushing and optimizing data file I/O, the next performance bottleneck you’ll likely encounter is related to network I/O or CPU contention for query execution.
