Feb
22
2016
--

EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONTime for another entry in the EXPLAIN FORMAT=JSON is cool! series of blog posts. This time we’ll discuss how using EXPLAIN FORMAT=JSON allows you to see that

cost_info

  knows why the optimizer prefers one index to another.

Tables often have more than one index. Any of these indexes can be used to resolve query. The optimizer has to make a choice in this case. One of the metrics that can be used to help make the choice is the potential cost of the query evaluation.

For example, let’s take the table

titles

  from the standard employees database:

mysql> show create table titlesG
*************************** 1. row ***************************
       Table: titles
Create Table: CREATE TABLE `titles` (
  `emp_no` int(11) NOT NULL,
  `title` varchar(50) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date DEFAULT NULL,
  PRIMARY KEY (`emp_no`,`title`,`from_date`),
  KEY `emp_no` (`emp_no`),
  CONSTRAINT `titles_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

It has two indexes:

emp_no

  and

PRIMARY

, each of  which could be used to resolve query:

select distinct title from titles where year(from_date) > '1990';

At first glance, 

emp_no

  doesn’t really fit for this query.

PRIMARY

  does fit, because it contains both the 

title

  and

from_date

  fields. Unfortunately, it cannot be used to resolve the query, because we don’t limit the search by

emp_no

  and

title

 .  It can, however, be used to select rows from the index. When we use 

EXPLAIN

 , though, it shows us that the optimizer has chosen index

emp_no

  (every secondary index in InnoDB contains a link to the clustered index anyway):

mysql> explain select distinct title from titles where year(from_date) > '1990'G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: titles
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: NULL
         rows: 442724
     filtered: 100.00
        Extra: Using where; Using index; Using temporary
1 row in set, 1 warning (0.00 sec)

PRIMARY KEY

  exists in the field

possible_keys

, but was not chosen.

EXPLAIN FORMAT=JSON

  can show us why.

First let’s run it on the original query:

mysql> explain format=json select distinct title from titles where year(from_date) > '1990'G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "89796.80"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "using_filesort": false,
      "table": {
        "table_name": "titles",
        "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "emp_no"
        ],
        "key": "emp_no",
        "used_key_parts": [
          "emp_no"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 442724,
        "rows_produced_per_join": 442724,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "1252.00",
          "eval_cost": "88544.80",
          "prefix_cost": "89796.80",
          "data_read_per_join": "27M"
        },
        "used_columns": [
          "emp_no",
          "title",
          "from_date"
        ],
        "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"
      }
    }
  }
}
1 row in set, 1 warning (0.01 sec)

The important part here is:

"cost_info": {
      "query_cost": "89796.80"
    },

Which shows that the overall

query_cost

  is 89796.80. We don’t really know what the units are for this cost, or how it is actually measured. It isn’t important; the only thing that is important for now is that smaller is better. (Think of it like shopping for a product: it doesn’t matter which you buy it from, just that you buy it at the lowest price.)

Another important member of the index is

cost_info

, which belongs to the table itself:

"cost_info": {
          "read_cost": "1252.00",
          "eval_cost": "88544.80",
          "prefix_cost": "89796.80",
          "data_read_per_join": "27M"
        },

Here we get even more details, such as cost of read operation and evaluation.

prefix_cost

  is not useful for this example, because it contains the cost of joining to the next table in

JOIN

. Since we don’t join the table

titles

  with any other value of

prefix_cost

, is equivalent to the cost of the full query.

data_read_per_join

  contains the amount of data that should be read for each

JOIN

  operation. In our case it is once again the same as how much data we should read to fully evaluate the query.

Now let’s force index

PRIMARY

  and examine the 

EXPLAIN FORMAT=JSON

  output:

mysql> explain format=json select distinct title from titles force index(primary) where year(from_date) > '1990'G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "531269.80"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "using_filesort": false,
      "table": {
        "table_name": "titles",
        "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "emp_no"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "emp_no",
          "title",
          "from_date"
        ],
        "key_length": "59",
        "rows_examined_per_scan": 442724,
        "rows_produced_per_join": 442724,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "442725.00",
          "eval_cost": "88544.80",
          "prefix_cost": "531269.80",
          "data_read_per_join": "27M"
        },
        "used_columns": [
          "emp_no",
          "title",
          "from_date"
        ],
        "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"
      }
    }
  }
}
1 row in set, 1 warning (0.01 sec)

Notice the numbers are different this time. The total query cost is 531269.80, which is about 6 times greater than 89796.80:

"cost_info": {
      "query_cost": "531269.80"
    },

read_cost

  is 442725.00, which is 353 times greater than 1252.00. However, the 

eval_cost

  and

data_read_per_join

  are the same as the query that uses index

emp_no

 :

"cost_info": {
          "read_cost": "442725.00",
          "eval_cost": "88544.80",
          "prefix_cost": "531269.80",
          "data_read_per_join": "27M"
        },

These numbers clearly explain why the optimizer prefers the index 

emp_no

  to

PRIMARY KEY

.

In our example above this behavior is correct. In a real life scenario, if the optimizer’s choice is wrong. these numbers can show either that there is a bug in the optimizer or  that the table’s statistics are outdated and need to be updated.

Conclusion:

EXPLAIN FORMAT=JSON

  can be used together with

FORCE INDEX

  to find out why the optimizer prefers one index to another.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com