The most counterintuitive truth about MySQL composite indexes is that their effectiveness hinges less on which columns you include, and more on the order in which you list them.
Let’s see it in action. Imagine a table orders with millions of rows:
CREATE TABLE orders (
order_id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
order_date DATE NOT NULL,
status VARCHAR(20) NOT NULL,
total_amount DECIMAL(10, 2)
);
And a common query:
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 12345 AND status = 'Shipped';
Without an index, MySQL would perform a full table scan. With a composite index on (customer_id, status), it’s lightning fast. But what if we created it as (status, customer_id)?
-- Bad order:
CREATE INDEX idx_status_customer ON orders (status, customer_id);
-- Good order:
CREATE INDEX idx_customer_status ON orders (customer_id, status);
The query SELECT order_id, total_amount FROM orders WHERE customer_id = 12345 AND status = 'Shipped'; will use idx_customer_status efficiently. It can quickly find all rows where customer_id is 12345, and within that subset, it can efficiently find rows where status is 'Shipped'. This is because the index is structured like a sorted directory: first by customer, then by status for each customer.
If we used idx_status_customer, MySQL would first find all rows where status is 'Shipped'. Then, within that larger group, it would have to scan for customer_id = 12345. This is less efficient because the 'Shipped' group is likely much larger and less selective than the customer_id = 12345 group.
The mental model is that a composite index acts like a multi-level sorted list. When you query with equality conditions (=) on columns included in the index, MySQL can traverse this list. It can only effectively "jump" down the index levels if the preceding columns in the index definition are also specified in the WHERE clause, in the same order.
Think of the index (colA, colB, colC).
WHERE colA = 'valueA': Uses the index fully.WHERE colA = 'valueA' AND colB = 'valueB': Uses the index fully.WHERE colA = 'valueA' AND colB = 'valueB' AND colC = 'valueC': Uses the index fully.WHERE colB = 'valueB': Does not use the index effectively forcolB’s position. It might use it for a full scan ifcolAis also in the index, but it can’t "seek" tovalueBdirectly.WHERE colA = 'valueA' AND colC = 'valueC': Does not use the index effectively forcolC’s position becausecolBis missing.
The rule of thumb is to place columns with higher selectivity (columns that filter out more rows) earlier in the index. This typically means columns that are more unique or used in equality checks (=) should come before columns used in range checks (>, <, BETWEEN) or with fewer distinct values.
Consider a slightly different query: SELECT order_id FROM orders WHERE customer_id = 12345 AND order_date BETWEEN '2023-01-01' AND '2023-12-31';
An index on (customer_id, order_date) would be highly effective. MySQL would quickly locate all orders for customer_id = 12345 and then efficiently scan the order_date index entries within that subset.
The one thing most people don’t fully grasp is how this "left-prefix" rule applies to range queries and OR conditions. For a query like WHERE colA = 'A' AND colB > 'B', an index on (colA, colB) works perfectly. MySQL seeks to 'A' on colA, and then efficiently scans colB for values greater than 'B'. However, if the query was WHERE colA > 'A' AND colB = 'B', an index on (colA, colB) would not be able to use the colB part of the index for seeking. It would have to scan colA and then check colB for each matching row. This is why column order is paramount.
The next concept you’ll need to wrap your head around is how indexes handle OR conditions and the trade-offs of covering indexes.