MySQL’s temporary tables and derived tables, while both used to hold intermediate result sets, operate on fundamentally different principles, leading to significant performance implications that often catch people off guard.
Let’s see this in action. Imagine you have two tables: users (with id, name) and orders (with user_id, amount). We want to find users who have placed orders totaling more than $1000.
Scenario 1: Derived Table
SELECT u.name
FROM users u
JOIN (
SELECT user_id, SUM(amount) as total_amount
FROM orders
GROUP BY user_id
) AS user_totals ON u.id = user_totals.user_id
WHERE user_totals.total_amount > 1000;
In this case, MySQL processes the subquery (SELECT user_id, SUM(amount) as total_amount FROM orders GROUP BY user_id) first. It calculates the total_amount for each user_id and conceptually holds these results in memory or, if too large, on disk as an unnamed, internal temporary table. This internal structure is optimized for the immediate needs of the outer query and is discarded as soon as the outer query finishes. The optimizer tries to be smart here, potentially pushing predicates down or using indexes effectively on the orders table.
Scenario 2: Explicit Temporary Table
CREATE TEMPORARY TABLE temp_user_totals AS
SELECT user_id, SUM(amount) as total_amount
FROM orders
GROUP BY user_id;
-- Create an index on the temporary table for faster joins
CREATE INDEX idx_temp_user_id ON temp_user_totals(user_id);
SELECT u.name
FROM users u
JOIN temp_user_totals ON u.id = temp_user_totals.user_id
WHERE temp_user_totals.total_amount > 1000;
DROP TEMPORARY TABLE temp_user_totals;
Here, we explicitly create a named temporary table temp_user_totals. This table is a real table, albeit one that exists only for the duration of the current session or until explicitly dropped. Crucially, it’s stored in a dedicated temporary tablespace (often tmpfs or a specific directory configured for temporary tables). We can then add indexes to this table, which is a key differentiator. The subsequent join and filtering operations will use this indexed temporary table, potentially offering much better performance if the intermediate result set is large and needs to be accessed multiple times or if complex joins are involved.
The core problem these solve is breaking down complex queries into manageable, intermediate steps. Derived tables are syntactic sugar for subqueries that are correlated to the outer query’s scope. Temporary tables are explicit, session-scoped storage for intermediate results that can be indexed and reused.
The internal mechanism for derived tables involves MySQL materializing the subquery’s results. Depending on the size and complexity, it might use in-memory structures or spill to disk. The key is that the optimizer has a lot of leeway and might not materialize it at all, instead trying to "flatten" the query. Explicit temporary tables, on the other hand, are materialized into a separate storage area and are subject to standard table operations, including index creation.
The most surprising aspect for many is how much impact indexing a temporary table can have. When you create a derived table, you cannot add an index to it. MySQL’s optimizer will attempt to use existing indexes on the base tables or optimize the subquery execution as a whole. However, with an explicit temporary table, you gain the ability to create indexes. This is incredibly powerful if your intermediate result set is large and you’re performing joins or lookups against it in the outer query. For instance, if temp_user_totals contains millions of rows, and you join it with a similarly large users table, an index on temp_user_totals.user_id can turn a slow, full-scan join into a rapid, indexed lookup.
The next hurdle is understanding how MySQL manages the lifecycle of these temporary tables, especially in distributed environments or when dealing with very large datasets that might exceed available memory.