
Once again it’s time for another EXPLAIN FORMAT=JSON is cool! post. This post will discuss how EXPLAIN FORMAT=JSON allows the nested_loop command to make the JOIN operation hierarchy transparent.
The regular
EXPLAIN
command lists each table that participates in a
JOIN
operation on a single row. This works perfectly for simple queries:
mysql> explain select * from employees join titles join salariesG
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: employees
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 299379
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: titles
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 442724
filtered: 100.00
Extra: Using join buffer (Block Nested Loop)
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: salaries
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2745434
filtered: 100.00
Extra: Using join buffer (Block Nested Loop)
3 rows in set, 1 warning (0.00 sec)
You can see that the first accessed table was
employees
, then
titles
and finally
salaries
. Everything is clear.
EXPLAIN FORMAT=JSON
in this case puts everything into the
nested_loop
array (even if “MySQL isn’t limited to nested-loop joins”):
mysql> explain format=json select * from employees join titles join salariesG
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "7.277755124e16"
},
"nested_loop": [
{
"table": {
"table_name": "employees",
"access_type": "ALL",
"rows_examined_per_scan": 299379,
"rows_produced_per_join": 299379,
"filtered": "100.00",
"cost_info": {
"read_cost": "929.00",
"eval_cost": "59875.80",
"prefix_cost": "60804.80",
"data_read_per_join": "13M"
},
"used_columns": [
"emp_no",
"birth_date",
"first_name",
"last_name",
"gender",
"hire_date"
]
}
},
{
"table": {
"table_name": "titles",
"access_type": "ALL",
"rows_examined_per_scan": 442724,
"rows_produced_per_join": 132542268396,
"filtered": "100.00",
"using_join_buffer": "Block Nested Loop",
"cost_info": {
"read_cost": "62734.88",
"eval_cost": "26508453679.20",
"prefix_cost": "26508577218.88",
"data_read_per_join": "7T"
},
"used_columns": [
"emp_no",
"title",
"from_date",
"to_date"
]
}
},
{
"table": {
"table_name": "salaries",
"access_type": "ALL",
"rows_examined_per_scan": 2745434,
"rows_produced_per_join": 363886050091503872,
"filtered": "100.00",
"using_join_buffer": "Block Nested Loop",
"cost_info": {
"read_cost": "314711040856.92",
"eval_cost": "7.277721002e16",
"prefix_cost": "7.277755124e16",
"data_read_per_join": "5171P"
},
"used_columns": [
"emp_no",
"salary",
"from_date",
"to_date"
]
}
}
]
}
}
1 row in set, 1 warning (0.00 sec)
For a simple query this output does not add much. Except cost info and information about used columns and efficiency of composite indexes.
But what if you not only join tables, but use the other SQL language options? For example, for the query below, which has two
JOIN
operations and two subqueries, a regular
EXPLAIN
returns this plan:
mysql> explain select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <subquery2>
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: dept_manager
partitions: NULL
type: ref
possible_keys: PRIMARY,emp_no
key: PRIMARY
key_len: 4
ref: <subquery2>.emp_no
rows: 1
filtered: 100.00
Extra: NULL
*************************** 3. row ***************************
id: 1
select_type: PRIMARY
table: employees
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: <subquery2>.emp_no
rows: 1
filtered: 100.00
Extra: NULL
*************************** 4. row ***************************
id: 2
select_type: MATERIALIZED
table: <derived3>
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 9
filtered: 100.00
Extra: NULL
*************************** 5. row ***************************
id: 3
select_type: DERIVED
table: titles
partitions: NULL
type: index
possible_keys: PRIMARY,emp_no
key: emp_no
key_len: 4
ref: NULL
rows: 442724
filtered: 7.51
Extra: Using where; Using index; Using temporary; Using filesort; LooseScan
*************************** 6. row ***************************
id: 3
select_type: DERIVED
table: salaries
partitions: NULL
type: ref
possible_keys: PRIMARY,emp_no
key: PRIMARY
key_len: 4
ref: employees.titles.emp_no
rows: 9
filtered: 100.00
Extra: NULL
6 rows in set, 1 warning (0.00 sec)
It’s pretty hard to understand which part is a subquery and which is not. It’s also it is hard to find out if
DERIVED
belongs to the first
JOIN
or to the second. And I am not quite sure why
<subquery2>
was marked as
PRIMARY
, which is supposed to indicate “Outermost SELECT”.
The real issue here is that the internal representation of
JOIN
is hierarchical, and MySQL Server (like in the case for
UNION
) has trouble representing an object as a “flat” table.
EXPLAIN FORMAT=JSON
, with its hierarchical nature, can help us in this case.
mysql> explain format=json select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "39.45"
},
"nested_loop": [
{
"table": {
"table_name": "<subquery2>",
"access_type": "ALL",
"materialized_from_subquery": {
"using_temporary_table": true,
"query_block": {
"table": {
"table_name": "t",
"access_type": "ALL",
"rows_examined_per_scan": 9,
"rows_produced_per_join": 9,
"filtered": "100.00",
"cost_info": {
"read_cost": "10.45",
"eval_cost": "1.80",
"prefix_cost": "12.25",
"data_read_per_join": "144"
},
"used_columns": [
"emp_no",
"salary"
],
"materialized_from_subquery": {
"using_temporary_table": true,
"dependent": false,
"cacheable": true,
"query_block": {
"select_id": 3,
"cost_info": {
"query_cost": "176246.11"
},
"grouping_operation": {
"using_temporary_table": true,
"using_filesort": true,
"cost_info": {
"sort_cost": "9.54"
},
"nested_loop": [
{
"table": {
"table_name": "titles",
"access_type": "index",
"possible_keys": [
"PRIMARY",
"emp_no"
],
"key": "emp_no",
"used_key_parts": [
"emp_no"
],
"key_length": "4",
"rows_examined_per_scan": 442724,
"rows_produced_per_join": 33229,
"filtered": "7.51",
"using_index": true,
"loosescan": true,
"cost_info": {
"read_cost": "3380.56",
"eval_cost": "6645.94",
"prefix_cost": "63199.96",
"data_read_per_join": "2M"
},
"used_columns": [
"emp_no",
"title",
"from_date"
],
"attached_condition": "(`employees`.`titles`.`title` like '%manager%')"
}
},
{
"table": {
"table_name": "salaries",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"emp_no"
],
"key": "PRIMARY",
"used_key_parts": [
"emp_no"
],
"key_length": "4",
"ref": [
"employees.titles.emp_no"
],
"rows_examined_per_scan": 9,
"rows_produced_per_join": 9,
"filtered": "100.00",
"cost_info": {
"read_cost": "49622.62",
"eval_cost": "1.91",
"prefix_cost": "176236.57",
"data_read_per_join": "152"
},
"used_columns": [
"emp_no",
"salary",
"from_date"
]
}
}
]
}
}
}
}
}
}
}
},
{
"table": {
"table_name": "dept_manager",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"emp_no"
],
"key": "PRIMARY",
"used_key_parts": [
"emp_no"
],
"key_length": "4",
"ref": [
"<subquery2>.emp_no"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 9,
"filtered": "100.00",
"cost_info": {
"read_cost": "9.00",
"eval_cost": "1.80",
"prefix_cost": "23.05",
"data_read_per_join": "144"
},
"used_columns": [
"dept_no",
"emp_no",
"from_date",
"to_date"
]
}
},
{
"table": {
"table_name": "employees",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"emp_no"
],
"key_length": "4",
"ref": [
"<subquery2>.emp_no"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "100.00",
"cost_info": {
"read_cost": "1.00",
"eval_cost": "0.20",
"prefix_cost": "39.45",
"data_read_per_join": "48"
},
"used_columns": [
"emp_no",
"birth_date",
"first_name",
"last_name",
"gender",
"hire_date"
]
}
}
]
}
}
1 row in set, 1 warning (0.01 sec)
At first we see that all our tables,
JOIN
operations and subqueries are in the
nested_loop
array:
"nested_loop": [
{
"table": {
"table_name": "<subquery2>",
...
{
"table": {
"table_name": "dept_manager",
...
{
"table": {
"table_name": "employees",
...
}
]
Then we see that the first table,
<subquery2>
, was materialized_from_subquery:
"table": {
"table_name": "<subquery2>",
"access_type": "ALL",
"materialized_from_subquery": {
...
Which, in its turn, was
materialized_from_subquery
too:
"table": {
"table_name": "<subquery2>",
"access_type": "ALL",
"materialized_from_subquery": {
...
"materialized_from_subquery": {
...
This last subquery performs
grouping_operation
on the other
nested_loop
(
JOIN
) of tables
titles
and
salaries
:
"grouping_operation": {
"using_temporary_table": true,
"using_filesort": true,
"cost_info": {
"sort_cost": "9.54"
},
"nested_loop": [
{
"table": {
"table_name": "titles",
...
},
{
"table": {
"table_name": "salaries",
...
Now we have a better picture of how the query was optimized: tables
titles
and
salaries
were joined first, then
GROUP BY
was executed on the result, then the result was materialized and queried. The result of the query
select emp_no from <materialized> t
was materialized again as
<subquery2>
, and only after it joined with two other tables.
Conclusion:
EXPLAIN FORMAT=JSON
helps to understand how complex queries are optimized.
Google announced a number of new security features for Gmail users in the enterprise today. Last year, the company launched its Data Loss Prevention (DLP) feature for Google Apps Unlimited users that helps businesses keep sensitive data out of emails. Today, it’s launching the first major update of this service at the RSA Conference in San Francisco. The DLP feature allows businesses to…
The RSA security conference is being held this week in San Francisco where security pros come together to discuss strategy. IBM made several security announcements this morning ahead of the conference, headlined by the purchase of Resilient Systems. Instead of trying to prevent an attack, Resilient gives customers a plan to deal with a breach after it’s happened. While IBM offers…
MongoLab has long been the de facto service for deploying and managing MongoDB databases, but now that the company has cornered this market, it’s looking to expand into new areas. Before launching any new products, though, the company decided to change its name from MongoLab to mLab. MongoLab/mLab CEO and co-founder Will Shulman tells me that he always saw MongoLab as the first part of… 
















About 15 years ago, the Internet triggered a platform shift in the delivery of enterprise software; a wave of cloud startups quickly unseated the kings of the server era. Now, another platform evolution — the mobile era — is upon us. For nimble and opportunistic founders, the shift toward mobile is a monumental opportunity to stake ownership in a newly dominant platform.
In February 2016, I ran a week-long promotion with







