MySQL’s B-tree indexes, the default and most common type, aren’t just sorted lists; they’re balanced trees where each node points to other nodes, allowing for efficient range scans and exact matches by traversing a predictable path from root to leaf.
Let’s see a B-tree index in action. Imagine a users table with columns id (primary key, automatically indexed), username (indexed), and email (indexed).
-- Table structure
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(50) NOT NULL,
email VARCHAR(100) NOT NULL,
INDEX idx_username (username),
INDEX idx_email (email)
);
-- Sample data
INSERT INTO users (username, email) VALUES
('alice', 'alice@example.com'),
('bob', 'bob@example.com'),
('charlie', 'charlie@example.com'),
('david', 'david@example.com');
-- Query using the idx_username index
EXPLAIN SELECT * FROM users WHERE username = 'bob';
The EXPLAIN output for this query would show idx_username being used, indicating that MySQL traverses the B-tree for username to quickly locate the row where username is 'bob'.
Now, consider a query that doesn’t benefit as much from a single index:
EXPLAIN SELECT * FROM users WHERE username = 'bob' AND email = 'bob@example.com';
If only idx_username and idx_email exist, MySQL might use one index and then filter the results with the other condition, or it might perform a full table scan if neither index is deemed efficient enough. This is where composite indexes shine.
A composite index is an index on multiple columns, like (username, email). The order of columns in a composite index is crucial. MySQL can use a composite index for queries that filter on the leading columns of the index. For example, an index on (username, email) can be used for:
WHERE username = 'bob'WHERE username = 'bob' AND email = 'bob@example.com'
But it cannot be used efficiently for:
WHERE email = 'bob@example.com'
The system builds the composite index by sorting rows first by username, and then for rows with the same username, it sorts them by email. This structure is what allows for efficient lookups on the leading columns.
-- Adding a composite index
ALTER TABLE users ADD INDEX idx_username_email (username, email);
-- Query using the composite index
EXPLAIN SELECT * FROM users WHERE username = 'bob' AND email = 'bob@example.com';
The EXPLAIN output will now show idx_username_email being used, and importantly, it will be able to satisfy both conditions directly from the index without further filtering.
Full-text indexes are a different beast, designed for searching text within character-based columns (VARCHAR, TEXT, etc.) using natural language queries. Unlike B-trees which are good for exact matches and range scans, full-text indexes break down text into words (tokens) and build an inverted index. This index maps words to the documents (rows) they appear in.
Consider a articles table with a body column containing article content:
CREATE TABLE articles (
id INT AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(255),
body TEXT,
FULLTEXT INDEX ft_body (body)
);
INSERT INTO articles (title, body) VALUES
('MySQL Basics', 'This article covers the basics of MySQL indexing.'),
('Advanced SQL', 'Learn advanced SQL techniques, including subqueries and joins.');
A query using a full-text index looks quite different:
EXPLAIN SELECT * FROM articles WHERE MATCH(body) AGAINST('MySQL indexing' IN NATURAL LANGUAGE MODE);
The MATCH...AGAINST syntax is specific to full-text searches. The ft_body index allows MySQL to quickly find articles containing the words "MySQL" and "indexing" without scanning every row and performing string comparisons. The "natural language mode" is the default and treats the search string as a phrase. Other modes, like BOOLEAN MODE, offer more control with operators like +, -, and *.
The most surprising thing about B-tree indexes is that their performance degrades significantly for queries that only use a suffix of the indexed columns, even in a composite index. You can’t efficiently search for email if your index is (username, email). This is because the B-tree is ordered by the first column, then the second, and so on. If you don’t provide a value for the first column, the tree structure doesn’t help you narrow down the search space for the subsequent columns.
One common pitfall is misunderstanding how MySQL chooses which index to use when multiple are available. The query optimizer, not you, makes the final decision based on estimated row counts and other statistics. Sometimes, even with a perfectly good index, the optimizer might choose a full table scan if it thinks it will be faster (e.g., if the query selects a very large percentage of rows). This is why EXPLAIN is your best friend.
The next concept you’ll likely encounter is understanding index selectivity and how it impacts the optimizer’s choices, leading into discussions about index cardinality and when to consider dropping or altering existing indexes.