Once again it’s time for another EXPLAIN FORMAT=JSON is cool! post. This post will discuss how EXPLAIN FORMAT=JSON allows the nested_loop command to make the JOIN operation hierarchy transparent.
The regular
EXPLAIN
command lists each table that participates in a
JOIN
operation on a single row. This works perfectly for simple queries:
mysql> explain select * from employees join titles join salariesG *************************** 1. row *************************** id: 1 select_type: SIMPLE table: employees partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 299379 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: SIMPLE table: titles partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 442724 filtered: 100.00 Extra: Using join buffer (Block Nested Loop) *************************** 3. row *************************** id: 1 select_type: SIMPLE table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2745434 filtered: 100.00 Extra: Using join buffer (Block Nested Loop) 3 rows in set, 1 warning (0.00 sec)
You can see that the first accessed table was
employees
, then
titles
and finally
salaries
. Everything is clear.
EXPLAIN FORMAT=JSON
in this case puts everything into the
nested_loop
array (even if “MySQL isn’t limited to nested-loop joins”):
mysql> explain format=json select * from employees join titles join salariesG *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "7.277755124e16" }, "nested_loop": [ { "table": { "table_name": "employees", "access_type": "ALL", "rows_examined_per_scan": 299379, "rows_produced_per_join": 299379, "filtered": "100.00", "cost_info": { "read_cost": "929.00", "eval_cost": "59875.80", "prefix_cost": "60804.80", "data_read_per_join": "13M" }, "used_columns": [ "emp_no", "birth_date", "first_name", "last_name", "gender", "hire_date" ] } }, { "table": { "table_name": "titles", "access_type": "ALL", "rows_examined_per_scan": 442724, "rows_produced_per_join": 132542268396, "filtered": "100.00", "using_join_buffer": "Block Nested Loop", "cost_info": { "read_cost": "62734.88", "eval_cost": "26508453679.20", "prefix_cost": "26508577218.88", "data_read_per_join": "7T" }, "used_columns": [ "emp_no", "title", "from_date", "to_date" ] } }, { "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 2745434, "rows_produced_per_join": 363886050091503872, "filtered": "100.00", "using_join_buffer": "Block Nested Loop", "cost_info": { "read_cost": "314711040856.92", "eval_cost": "7.277721002e16", "prefix_cost": "7.277755124e16", "data_read_per_join": "5171P" }, "used_columns": [ "emp_no", "salary", "from_date", "to_date" ] } } ] } } 1 row in set, 1 warning (0.00 sec)
For a simple query this output does not add much. Except cost info and information about used columns and efficiency of composite indexes.
But what if you not only join tables, but use the other SQL language options? For example, for the query below, which has two
JOIN
operations and two subqueries, a regular
EXPLAIN
returns this plan:
mysql> explain select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <subquery2> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: PRIMARY table: dept_manager partitions: NULL type: ref possible_keys: PRIMARY,emp_no key: PRIMARY key_len: 4 ref: <subquery2>.emp_no rows: 1 filtered: 100.00 Extra: NULL *************************** 3. row *************************** id: 1 select_type: PRIMARY table: employees partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: <subquery2>.emp_no rows: 1 filtered: 100.00 Extra: NULL *************************** 4. row *************************** id: 2 select_type: MATERIALIZED table: <derived3> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 9 filtered: 100.00 Extra: NULL *************************** 5. row *************************** id: 3 select_type: DERIVED table: titles partitions: NULL type: index possible_keys: PRIMARY,emp_no key: emp_no key_len: 4 ref: NULL rows: 442724 filtered: 7.51 Extra: Using where; Using index; Using temporary; Using filesort; LooseScan *************************** 6. row *************************** id: 3 select_type: DERIVED table: salaries partitions: NULL type: ref possible_keys: PRIMARY,emp_no key: PRIMARY key_len: 4 ref: employees.titles.emp_no rows: 9 filtered: 100.00 Extra: NULL 6 rows in set, 1 warning (0.00 sec)
It’s pretty hard to understand which part is a subquery and which is not. It’s also it is hard to find out if
DERIVED
belongs to the first
JOIN
or to the second. And I am not quite sure why
<subquery2>
was marked as
PRIMARY
, which is supposed to indicate “Outermost SELECT”.
The real issue here is that the internal representation of
JOIN
is hierarchical, and MySQL Server (like in the case for
UNION
) has trouble representing an object as a “flat” table.
EXPLAIN FORMAT=JSON
, with its hierarchical nature, can help us in this case.
mysql> explain format=json select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "39.45" }, "nested_loop": [ { "table": { "table_name": "<subquery2>", "access_type": "ALL", "materialized_from_subquery": { "using_temporary_table": true, "query_block": { "table": { "table_name": "t", "access_type": "ALL", "rows_examined_per_scan": 9, "rows_produced_per_join": 9, "filtered": "100.00", "cost_info": { "read_cost": "10.45", "eval_cost": "1.80", "prefix_cost": "12.25", "data_read_per_join": "144" }, "used_columns": [ "emp_no", "salary" ], "materialized_from_subquery": { "using_temporary_table": true, "dependent": false, "cacheable": true, "query_block": { "select_id": 3, "cost_info": { "query_cost": "176246.11" }, "grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "9.54" }, "nested_loop": [ { "table": { "table_name": "titles", "access_type": "index", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "emp_no", "used_key_parts": [ "emp_no" ], "key_length": "4", "rows_examined_per_scan": 442724, "rows_produced_per_join": 33229, "filtered": "7.51", "using_index": true, "loosescan": true, "cost_info": { "read_cost": "3380.56", "eval_cost": "6645.94", "prefix_cost": "63199.96", "data_read_per_join": "2M" }, "used_columns": [ "emp_no", "title", "from_date" ], "attached_condition": "(`employees`.`titles`.`title` like '%manager%')" } }, { "table": { "table_name": "salaries", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "employees.titles.emp_no" ], "rows_examined_per_scan": 9, "rows_produced_per_join": 9, "filtered": "100.00", "cost_info": { "read_cost": "49622.62", "eval_cost": "1.91", "prefix_cost": "176236.57", "data_read_per_join": "152" }, "used_columns": [ "emp_no", "salary", "from_date" ] } } ] } } } } } } } }, { "table": { "table_name": "dept_manager", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "<subquery2>.emp_no" ], "rows_examined_per_scan": 1, "rows_produced_per_join": 9, "filtered": "100.00", "cost_info": { "read_cost": "9.00", "eval_cost": "1.80", "prefix_cost": "23.05", "data_read_per_join": "144" }, "used_columns": [ "dept_no", "emp_no", "from_date", "to_date" ] } }, { "table": { "table_name": "employees", "access_type": "eq_ref", "possible_keys": [ "PRIMARY" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "<subquery2>.emp_no" ], "rows_examined_per_scan": 1, "rows_produced_per_join": 1, "filtered": "100.00", "cost_info": { "read_cost": "1.00", "eval_cost": "0.20", "prefix_cost": "39.45", "data_read_per_join": "48" }, "used_columns": [ "emp_no", "birth_date", "first_name", "last_name", "gender", "hire_date" ] } } ] } } 1 row in set, 1 warning (0.01 sec)
At first we see that all our tables,
JOIN
operations and subqueries are in the
nested_loop
array:
"nested_loop": [ { "table": { "table_name": "<subquery2>", ... { "table": { "table_name": "dept_manager", ... { "table": { "table_name": "employees", ... } ]
Then we see that the first table,
<subquery2>
, was materialized_from_subquery:
"table": { "table_name": "<subquery2>", "access_type": "ALL", "materialized_from_subquery": { ...
Which, in its turn, was
materialized_from_subquery
too:
"table": { "table_name": "<subquery2>", "access_type": "ALL", "materialized_from_subquery": { ... "materialized_from_subquery": { ...
This last subquery performs
grouping_operation
on the other
nested_loop
(
JOIN
) of tables
titles
and
salaries
:
"grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "9.54" }, "nested_loop": [ { "table": { "table_name": "titles", ... }, { "table": { "table_name": "salaries", ...
Now we have a better picture of how the query was optimized: tables
titles
and
salaries
were joined first, then
GROUP BY
was executed on the result, then the result was materialized and queried. The result of the query
select emp_no from <materialized> t
was materialized again as
<subquery2>
, and only after it joined with two other tables.
Conclusion:
EXPLAIN FORMAT=JSON
helps to understand how complex queries are optimized.