Time for another entry in the EXPLAIN FORMAT=JSON is cool! series of blog posts. This time we’ll discuss how using EXPLAIN FORMAT=JSON allows you to see that
cost_info
knows why the optimizer prefers one index to another.
Tables often have more than one index. Any of these indexes can be used to resolve query. The optimizer has to make a choice in this case. One of the metrics that can be used to help make the choice is the potential cost of the query evaluation.
For example, let’s take the table
titles
from the standard employees database:
mysql> show create table titlesG *************************** 1. row *************************** Table: titles Create Table: CREATE TABLE `titles` ( `emp_no` int(11) NOT NULL, `title` varchar(50) NOT NULL, `from_date` date NOT NULL, `to_date` date DEFAULT NULL, PRIMARY KEY (`emp_no`,`title`,`from_date`), KEY `emp_no` (`emp_no`), CONSTRAINT `titles_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.01 sec)
It has two indexes:
emp_no
and
PRIMARY
, each of which could be used to resolve query:
select distinct title from titles where year(from_date) > '1990';
At first glance,
emp_no
doesn’t really fit for this query.
PRIMARY
does fit, because it contains both the
title
and
from_date
fields. Unfortunately, it cannot be used to resolve the query, because we don’t limit the search by
emp_no
and
title
. It can, however, be used to select rows from the index. When we use
EXPLAIN
, though, it shows us that the optimizer has chosen index
emp_no
(every secondary index in InnoDB contains a link to the clustered index anyway):
mysql> explain select distinct title from titles where year(from_date) > '1990'G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: titles partitions: NULL type: index possible_keys: PRIMARY,emp_no key: emp_no key_len: 4 ref: NULL rows: 442724 filtered: 100.00 Extra: Using where; Using index; Using temporary 1 row in set, 1 warning (0.00 sec)
PRIMARY KEY
exists in the field
possible_keys
, but was not chosen.
EXPLAIN FORMAT=JSON
can show us why.
First let’s run it on the original query:
mysql> explain format=json select distinct title from titles where year(from_date) > '1990'G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "89796.80" }, "duplicates_removal": { "using_temporary_table": true, "using_filesort": false, "table": { "table_name": "titles", "access_type": "index", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "emp_no", "used_key_parts": [ "emp_no" ], "key_length": "4", "rows_examined_per_scan": 442724, "rows_produced_per_join": 442724, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "1252.00", "eval_cost": "88544.80", "prefix_cost": "89796.80", "data_read_per_join": "27M" }, "used_columns": [ "emp_no", "title", "from_date" ], "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')" } } } } 1 row in set, 1 warning (0.01 sec)
The important part here is:
"cost_info": { "query_cost": "89796.80" },
Which shows that the overall
query_cost
is 89796.80. We don’t really know what the units are for this cost, or how it is actually measured. It isn’t important; the only thing that is important for now is that smaller is better. (Think of it like shopping for a product: it doesn’t matter which you buy it from, just that you buy it at the lowest price.)
Another important member of the index is
cost_info
, which belongs to the table itself:
"cost_info": { "read_cost": "1252.00", "eval_cost": "88544.80", "prefix_cost": "89796.80", "data_read_per_join": "27M" },
Here we get even more details, such as cost of read operation and evaluation.
prefix_cost
is not useful for this example, because it contains the cost of joining to the next table in
JOIN
. Since we don’t join the table
titles
with any other value of
prefix_cost
, is equivalent to the cost of the full query.
data_read_per_join
contains the amount of data that should be read for each
JOIN
operation. In our case it is once again the same as how much data we should read to fully evaluate the query.
Now let’s force index
PRIMARY
and examine the
EXPLAIN FORMAT=JSON
output:
mysql> explain format=json select distinct title from titles force index(primary) where year(from_date) > '1990'G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "531269.80" }, "duplicates_removal": { "using_temporary_table": true, "using_filesort": false, "table": { "table_name": "titles", "access_type": "index", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no", "title", "from_date" ], "key_length": "59", "rows_examined_per_scan": 442724, "rows_produced_per_join": 442724, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "442725.00", "eval_cost": "88544.80", "prefix_cost": "531269.80", "data_read_per_join": "27M" }, "used_columns": [ "emp_no", "title", "from_date" ], "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')" } } } } 1 row in set, 1 warning (0.01 sec)
Notice the numbers are different this time. The total query cost is 531269.80, which is about 6 times greater than 89796.80:
"cost_info": { "query_cost": "531269.80" },
read_cost
is 442725.00, which is 353 times greater than 1252.00. However, the
eval_cost
and
data_read_per_join
are the same as the query that uses index
emp_no
:
"cost_info": { "read_cost": "442725.00", "eval_cost": "88544.80", "prefix_cost": "531269.80", "data_read_per_join": "27M" },
These numbers clearly explain why the optimizer prefers the index
emp_no
to
PRIMARY KEY
.
In our example above this behavior is correct. In a real life scenario, if the optimizer’s choice is wrong. these numbers can show either that there is a bug in the optimizer or that the table’s statistics are outdated and need to be updated.
Conclusion:
EXPLAIN FORMAT=JSON
can be used together with
FORCE INDEX
to find out why the optimizer prefers one index to another.