Another post in the EXPLAIN FORMAT=JSON is Cool! series! In this post, we’ll discuss how the EXPLAIN FORMAT=JSON provides optimization details for
ORDER BY
and
GROUP BY
operations in conjunction with
order_by_subqueries
and
group_by_subqueries
.
EXPLAIN FORMAT=JSON
can print details on how a subquery in
ORDER BY
is optimized:
mysql> explain format=json select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select emp_no limit 1)G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "60833.60"
},
"ordering_operation": {
"using_filesort": true,
"table": {
"table_name": "employees",
"access_type": "ALL",
"rows_examined_per_scan": 299843,
"rows_produced_per_join": 299843,
"filtered": "100.00",
"cost_info": {
"read_cost": "865.00",
"eval_cost": "59968.60",
"prefix_cost": "60833.60",
"data_read_per_join": "13M"
},
"used_columns": [
"emp_no",
"first_name",
"last_name"
]
},
"order_by_subqueries": [
{
"dependent": true,
"cacheable": false,
"query_block": {
"select_id": 2,
"message": "No tables used"
}
}
]
}
}
}
1 row in set, 2 warnings (0.00 sec)
Note (Code 1276): Field or reference 'employees.employees.emp_no' of SELECT #2 was resolved in SELECT #1
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`employees`.`emp_no` limit 1)
The above code shows member
ordering_operation
of
query_block
(which includes the
order_by_subqueries
array) with information on how the subquery in
ORDER BY
was optimized.
This is a simple example. In real life you can have larger subqueries in the
ORDER BY
clause. For example, take this more complicated and slightly crazy query:
select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)
Run a regular
EXPLAIN
on it. If we imagine this is a regular subquery, we won’t know if it can be cached or would be executed for each row sorted.
mysql> explain select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: employees
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 299843
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 2
select_type: SUBQUERY
table: dept_emp
partitions: NULL
type: index
possible_keys: PRIMARY,emp_no,dept_no
key: dept_no
key_len: 4
ref: NULL
rows: 331215
filtered: 100.00
Extra: Using index
*************************** 3. row ***************************
id: 2
select_type: SUBQUERY
table: salaries
partitions: NULL
type: ref
possible_keys: PRIMARY,emp_no
key: emp_no
key_len: 4
ref: employees.dept_emp.emp_no
rows: 10
filtered: 100.00
Extra: Using index
3 rows in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`dept_emp`.`dept_no` AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`)
EXPLAIN FORMAT=JSON
provides a completely different picture:
mysql> explain format=json select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "60833.60"
},
"ordering_operation": {
"using_filesort": false,
"table": {
"table_name": "employees",
"access_type": "ALL",
"rows_examined_per_scan": 299843,
"rows_produced_per_join": 299843,
"filtered": "100.00",
"cost_info": {
"read_cost": "865.00",
"eval_cost": "59968.60",
"prefix_cost": "60833.60",
"data_read_per_join": "13M"
},
"used_columns": [
"emp_no",
"first_name",
"last_name"
]
},
"optimized_away_subqueries": [
{
"dependent": false,
"cacheable": true,
"query_block": {
"select_id": 2,
"cost_info": {
"query_cost": "1082124.21"
},
"grouping_operation": {
"using_filesort": false,
"nested_loop": [
{
"table": {
"table_name": "dept_emp",
"access_type": "index",
"possible_keys": [
"PRIMARY",
"emp_no",
"dept_no"
],
"key": "dept_no",
"used_key_parts": [
"dept_no"
],
"key_length": "4",
"rows_examined_per_scan": 331215,
"rows_produced_per_join": 331215,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "673.00",
"eval_cost": "66243.00",
"prefix_cost": "66916.00",
"data_read_per_join": "5M"
},
"used_columns": [
"emp_no",
"dept_no"
]
}
},
{
"table": {
"table_name": "salaries",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"emp_no"
],
"key": "emp_no",
"used_key_parts": [
"emp_no"
],
"key_length": "4",
"ref": [
"employees.dept_emp.emp_no"
],
"rows_examined_per_scan": 10,
"rows_produced_per_join": 3399374,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "335333.33",
"eval_cost": "679874.87",
"prefix_cost": "1082124.21",
"data_read_per_join": "51M"
},
"used_columns": [
"emp_no",
"from_date"
]
}
}
]
}
}
}
]
}
}
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`dept_emp`.`dept_no` AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`)
We see that the subquery was optimized away: member
optimized_away_subqueries
exists, but there is no
order_by_subqueries
in the
ordering_operation
object. We can also see that the subquery was cached:
"cacheable": true
.
EXPLAIN FORMAT=JSON
also provides information about subqueries in the
GROUP BY
clause. It uses the
group_by_subqueries
array in the
grouping_operation
member for this purpose.
mysql> explain format=json select count(emp_no) from salaries group by salary > ALL (select s/c as avg_salary from (select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) t)G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "3412037.60"
},
"grouping_operation": {
"using_temporary_table": true,
"using_filesort": true,
"cost_info": {
"sort_cost": "2838638.00"
},
"table": {
"table_name": "salaries",
"access_type": "ALL",
"rows_examined_per_scan": 2838638,
"rows_produced_per_join": 2838638,
"filtered": "100.00",
"cost_info": {
"read_cost": "5672.00",
"eval_cost": "567727.60",
"prefix_cost": "573399.60",
"data_read_per_join": "43M"
},
"used_columns": [
"emp_no",
"salary",
"from_date"
]
},
"group_by_subqueries": [
{
"dependent": true,
"cacheable": false,
"query_block": {
"select_id": 2,
"cost_info": {
"query_cost": "881731.00"
},
"table": {
"table_name": "t",
"access_type": "ALL",
"rows_examined_per_scan": 3526884,
"rows_produced_per_join": 3526884,
"filtered": "100.00",
"cost_info": {
"read_cost": "176354.20",
"eval_cost": "705376.80",
"prefix_cost": "881731.00",
"data_read_per_join": "134M"
},
"used_columns": [
"dept_no",
"s",
"c"
],
"attached_condition": "((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`)))",
"materialized_from_subquery": {
"using_temporary_table": true,
"dependent": false,
"cacheable": true,
"query_block": {
"select_id": 3,
"cost_info": {
"query_cost": "1106758.94"
},
"grouping_operation": {
"using_filesort": false,
"nested_loop": [
{
"table": {
"table_name": "dept_emp",
"access_type": "index",
"possible_keys": [
"PRIMARY",
"emp_no",
"dept_no"
],
"key": "dept_no",
"used_key_parts": [
"dept_no"
],
"key_length": "4",
"rows_examined_per_scan": 331215,
"rows_produced_per_join": 331215,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "673.00",
"eval_cost": "66243.00",
"prefix_cost": "66916.00",
"data_read_per_join": "5M"
},
"used_columns": [
"emp_no",
"dept_no"
]
}
},
{
"table": {
"table_name": "salaries",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"emp_no"
],
"key": "PRIMARY",
"used_key_parts": [
"emp_no"
],
"key_length": "4",
"ref": [
"employees.dept_emp.emp_no"
],
"rows_examined_per_scan": 10,
"rows_produced_per_join": 3526884,
"filtered": "100.00",
"cost_info": {
"read_cost": "334466.14",
"eval_cost": "705376.80",
"prefix_cost": "1106758.95",
"data_read_per_join": "53M"
},
"used_columns": [
"emp_no",
"salary",
"from_date"
]
}
}
]
}
}
}
}
}
}
]
}
}
}
1 row in set, 1 warning (0.01 sec)
Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)` from `employees`.`salaries` group by <not>(<in_optimizer>(`employees`.`salaries`.`salary`,<exists>(/* select#2 */ select 1 from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s`,count(`employees`.`salaries`.`emp_no`) AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t` where ((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`))) having <is_not_null_test>((`t`.`s` / `t`.`c`)))))
Again, this output gives a clear view of query optimization: subquery in
GROUP BY
itself cannot be optimized, cached or converted into temporary table, but the subquery inside the subquery (
select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no
) could be materialized into a temporary table and cached.
A regular
EXPLAIN
command does not provide such details:
mysql> explain select count(emp_no) from salaries group by salary > ALL (select s/c as avg_salary from (select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) t)G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: salaries
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2838638
filtered: 100.00
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: <derived3>
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3526884
filtered: 100.00
Extra: Using where
*************************** 3. row ***************************
id: 3
select_type: DERIVED
table: dept_emp
partitions: NULL
type: index
possible_keys: PRIMARY,emp_no,dept_no
key: dept_no
key_len: 4
ref: NULL
rows: 331215
filtered: 100.00
Extra: Using index
*************************** 4. row ***************************
id: 3
select_type: DERIVED
table: salaries
partitions: NULL
type: ref
possible_keys: PRIMARY,emp_no
key: PRIMARY
key_len: 4
ref: employees.dept_emp.emp_no
rows: 10
filtered: 100.00
Extra: NULL
4 rows in set, 1 warning (0.01 sec)
Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)` from `employees`.`salaries` group by <not>(<in_optimizer>(`employees`.`salaries`.`salary`,<exists>(/* select#2 */ select 1 from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s`,count(`employees`.`salaries`.`emp_no`) AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t` where ((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`))) having <is_not_null_test>((`t`.`s` / `t`.`c`)))))
Most importantly, we cannot guess from the output if the
DERIVED
subquery can be cached.
Conlcusion:
EXPLAIN FORMAT=JSON
provides details on how subqueries in
ORDER BY
and
GROUP BY
clauses are optimized.