Updating MySQL index statistics is how you give the query optimizer accurate information, so it can pick a better execution plan.

Let’s see this in action. Imagine you have a table users with a million rows, and a query like this:

SELECT * FROM users WHERE last_login < '2023-01-01';

If you’ve recently inserted a lot of new users or updated last_login for many existing ones, the statistics MySQL has about the distribution of values in the last_login column might be stale. The optimizer might think a full table scan is faster than using an index on last_login because it "thinks" most rows will match the condition.
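One way to check which plan the optimizer currently prefers is EXPLAIN. This is a minimal sketch; idx_last_login is a hypothetical index name, since the text above does not name the index.

```sql
-- Show the plan the optimizer would choose, without running the query.
EXPLAIN SELECT * FROM users WHERE last_login < '2023-01-01';

-- Columns worth reading in the output:
--   type           ALL means a full table scan; range means an index range scan
--   possible_keys  indexes the optimizer considered (e.g. the hypothetical idx_last_login)
--   key            the index actually chosen, or NULL if none was used
--   rows           the optimizer's estimate of how many rows it must examine
```

If key is NULL and type is ALL even though only a small share of rows can match, stale statistics are one plausible cause worth ruling out.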

Here’s how you’d manually update those statistics:

ANALYZE TABLE users;

After running this, MySQL re-samples the table’s indexes (including the one on last_login) and recomputes its estimates, such as the table’s row count and each index’s cardinality. A plain ANALYZE TABLE does not build a histogram; that is a separate, opt-in feature. If the optimizer now estimates that only a small percentage of rows have last_login < '2023-01-01', it will likely choose to use the index, drastically speeding up your query.
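To see whether the refresh changed anything, you can compare the optimizer’s estimates before and after. A rough sketch, using only the users table and query from above:

```sql
-- Cardinality is the estimated number of distinct values in each index.
SHOW INDEX FROM users;

ANALYZE TABLE users;

-- Look at Cardinality again, then check whether the chosen plan changed.
SHOW INDEX FROM users;
EXPLAIN SELECT * FROM users WHERE last_login < '2023-01-01';
```

The numbers are estimates based on sampling, so small differences between runs are expected.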

The mental model here is that MySQL’s query optimizer is a strategic planner that doesn’t actually see your data. Instead, it relies on metadata – the statistics – about your data’s shape and distribution. When these statistics are out of date, the planner makes bad strategic decisions, leading to inefficient query plans. ANALYZE TABLE is the command that forces a refresh of this crucial metadata.

The problem this solves is "bad query plans." You know it’s happening when a query that should be fast, especially one using an index, is suddenly crawling. Often, it’s a query that used to be fast, and you haven’t changed the query itself, but you have changed the data. The optimizer has an outdated map of your data landscape.

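Internally, for InnoDB tables, ANALYZE TABLE works by reading a sample of index pages and estimating the cardinality (number of distinct values) of each index, along with the table’s approximate row count. With persistent statistics, which are the default, these estimates are stored in the mysql.innodb_table_stats and mysql.innodb_index_stats tables and are visible through information_schema.STATISTICS. Histograms are a separate feature: in MySQL 8.0 and later they are built only when you ask for them with ANALYZE TABLE ... UPDATE HISTOGRAM ON ..., they can describe non-indexed columns too, and they are exposed through information_schema.COLUMN_STATISTICS. The query optimizer consults this metadata before deciding on an execution plan. It’s like giving a cartographer a fresh survey of the terrain.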
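If you want to look at the stored metadata yourself, the queries below are one way to do it. They assume an InnoDB table with persistent statistics, a MySQL 8.0 or later server for the histogram part, and an account allowed to read the mysql schema. The bucket count of 32 is an arbitrary illustrative value.

```sql
-- Index statistics as exposed to clients.
SELECT INDEX_NAME, COLUMN_NAME, CARDINALITY
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'users';

-- InnoDB's persistent index statistics, with sample size and refresh time.
SELECT index_name, stat_name, stat_value, sample_size, last_update
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'users';

-- MySQL 8.0+: histograms are built explicitly, per column.
ANALYZE TABLE users UPDATE HISTOGRAM ON last_login WITH 32 BUCKETS;

-- Inspect the stored histogram (a JSON document).
SELECT COLUMN_NAME, JSON_PRETTY(HISTOGRAM)
FROM information_schema.COLUMN_STATISTICS
WHERE SCHEMA_NAME = DATABASE()
  AND TABLE_NAME = 'users';
```

A histogram is not refreshed by a plain ANALYZE TABLE users; it stays as built until you run UPDATE HISTOGRAM again or remove it with DROP HISTOGRAM.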

The main lever you control is when ANALYZE TABLE runs. For InnoDB tables with persistent statistics, the innodb_stats_auto_recalc setting (enabled by default) makes MySQL recalculate statistics in the background once roughly 10% of a table’s rows have changed. However, this recalculation is asynchronous, so it can lag behind rapid data changes, and the threshold might not be aggressive enough for your workload. Manually triggering ANALYZE TABLE ensures the statistics are fresh right now.
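Before relying on the automatic refresh, it can help to confirm how the server and the table are configured. A small sketch, assuming InnoDB:

```sql
-- Are persistent statistics and background recalculation enabled server-wide?
SHOW GLOBAL VARIABLES LIKE 'innodb_stats_persistent';
SHOW GLOBAL VARIABLES LIKE 'innodb_stats_auto_recalc';

-- Opt a single table in explicitly, regardless of the global settings.
ALTER TABLE users STATS_PERSISTENT = 1, STATS_AUTO_RECALC = 1;

-- When you cannot wait for the background refresh, trigger it yourself.
ANALYZE TABLE users;
```

Setting STATS_AUTO_RECALC = 0 on a table does the opposite: its statistics then change only when you run ANALYZE TABLE, which some teams prefer for plan stability.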

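You can also influence how ANALYZE TABLE samples data. For InnoDB, the sample size is determined by the innodb_stats_persistent_sample_pages system variable (default 20), or by innodb_stats_transient_sample_pages when persistent statistics are disabled; the older innodb_stats_sample_pages name is deprecated. MyISAM has no comparable sampling variable; its related setting, myisam_stats_method, controls how NULL values are counted. Increasing the sample size makes the analysis more thorough but also takes longer and consumes more I/O. For example, to make ANALYZE TABLE on InnoDB more thorough, you could run SET GLOBAL innodb_stats_persistent_sample_pages = 100; before running ANALYZE TABLE.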
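If you would rather not change sampling for the whole server, InnoDB also accepts a per-table setting. This sketch assumes persistent statistics are in use; the value 100 is illustrative, not a recommendation.

```sql
-- Sample 100 index pages for this table only, leaving the global default alone.
ALTER TABLE users STATS_SAMPLE_PAGES = 100;

-- Recompute statistics using the new sample size.
ANALYZE TABLE users;

-- Confirm how many pages were actually sampled per index statistic.
SELECT index_name, stat_name, sample_size
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'users';
```

A per-table setting keeps the extra I/O confined to the tables whose value distribution is skewed enough to need it.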

A common misconception is that ANALYZE TABLE only matters after bulk INSERTs. It’s equally critical after large DELETE operations or UPDATEs that significantly alter the distribution of values within an indexed column. If you delete 80% of your data, the statistics need to reflect that vast emptiness; otherwise, the optimizer might still plan as though the table were its old size.
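As an illustration of the DELETE case, the sketch below purges old rows and refreshes statistics right away. The cutoff date is hypothetical, and the last query assumes InnoDB persistent statistics and read access to the mysql schema.

```sql
-- Hypothetical cleanup: remove accounts that have been inactive since before 2015.
DELETE FROM users WHERE last_login < '2015-01-01';

-- Refresh statistics immediately instead of waiting for the background recalculation.
ANALYZE TABLE users;

-- InnoDB's row estimate should now reflect the smaller table.
SELECT n_rows, last_update
FROM mysql.innodb_table_stats
WHERE database_name = DATABASE()
  AND table_name = 'users';
```

On a large table you would typically delete in batches; the statistics refresh belongs after the final batch.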

The next error you’ll hit after everything is fixed is likely a SELECT statement that now returns results in sub-second time, which might seem like an error if you were expecting it to take minutes.
