Apr
06
2016
--

EXPLAIN FORMAT=JSON wrap-up

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSON wrap-upThis blog is an EXPLAIN FORMAT=JSON wrap-up for the series of posts I’ve done in the last few months.

In this series, we’ve discussed everything unique to

EXPLAIN FORMAT=JSON

. I intentionally skipped a description of members such as

table_name

,

access_type

  or

select_id

, which are not unique.

In this series, I only mentioned in passing members that replace information from the

Extra

 column in the regular

EXPLAIN

 output, such as

using_join_buffer

 ,

partitions

,

using_temporary_table

  or simply

message

. You can see these in queries like the following:

mysql> explain format=json select rand() from dual
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "message": "No tables used"
  }
}
1 row in set, 1 warning (0.00 sec)

Or

mysql> explain format=json select emp_no from titles where 'Senior Engineer' = 'Senior Cat'
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "message": "Impossible WHERE"
  }
}
1 row in set, 1 warning (0.01 sec)

Their use is fairly intuitive, similar to regular

EXPLAIN

, and I don’t think one can achieve anything from reading a blog post about each of them.

The only thing left to list is a Table of Contents for the series:

attached_condition: How EXPLAIN FORMAT=JSON can spell-check your queries

rows_examined_per_scan, rows_produced_per_join: EXPLAIN FORMAT=JSON answers on question “What number of filtered rows mean?”

used_columns: EXPLAIN FORMAT=JSON tells when you should use covered indexes

used_key_parts: EXPLAIN FORMAT=JSON provides insight into which part of multiple-column key is used

EXPLAIN FORMAT=JSON: everything about attached_subqueries, optimized_away_subqueries, materialized_from_subquery

EXPLAIN FORMAT=JSON provides insights on optimizer_switch effectiveness

EXPLAIN FORMAT=JSON: order_by_subqueries, group_by_subqueries details on subqueries in ORDER BY and GROUP BY

grouping_operation, duplicates_removal: EXPLAIN FORMAT=JSON has all details about GROUP BY

EXPLAIN FORMAT=JSON has details for subqueries in HAVING, nested selects and subqueries that update values

ordering_operation: EXPLAIN FORMAT=JSON knows everything about ORDER BY processing

EXPLAIN FORMAT=JSON knows everything about UNIONs: union_result and query_specifications

EXPLAIN FORMAT=JSON: buffer_result is not hidden!

EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

EXPLAIN FORMAT=JSON: nested_loop makes JOIN hierarchy transparent

Thanks for following the series!

Feb
29
2016
--

EXPLAIN FORMAT=JSON: nested_loop makes JOIN hierarchy transparent

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONOnce again it’s time for another EXPLAIN FORMAT=JSON is cool! post. This post will discuss how EXPLAIN FORMAT=JSON allows the nested_loop command to make the JOIN operation hierarchy transparent.

The regular

EXPLAIN

  command lists each table that participates in a 

JOIN

  operation on a single row. This works perfectly for simple queries:

mysql> explain select * from employees join titles join salariesG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299379
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: titles
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 442724
     filtered: 100.00
        Extra: Using join buffer (Block Nested Loop)
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2745434
     filtered: 100.00
        Extra: Using join buffer (Block Nested Loop)
3 rows in set, 1 warning (0.00 sec)

You can see that the first accessed table was

employees

, then

titles

  and finally 

salaries

. Everything is clear.

EXPLAIN FORMAT=JSON

 in this case puts everything into the 

nested_loop

array (even if “MySQL isn’t limited to nested-loop joins”):

mysql> explain format=json select * from employees join titles join salariesG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "7.277755124e16"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "employees",
          "access_type": "ALL",
          "rows_examined_per_scan": 299379,
          "rows_produced_per_join": 299379,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "929.00",
            "eval_cost": "59875.80",
            "prefix_cost": "60804.80",
            "data_read_per_join": "13M"
          },
          "used_columns": [
            "emp_no",
            "birth_date",
            "first_name",
            "last_name",
            "gender",
            "hire_date"
          ]
        }
      },
      {
        "table": {
          "table_name": "titles",
          "access_type": "ALL",
          "rows_examined_per_scan": 442724,
          "rows_produced_per_join": 132542268396,
          "filtered": "100.00",
          "using_join_buffer": "Block Nested Loop",
          "cost_info": {
            "read_cost": "62734.88",
            "eval_cost": "26508453679.20",
            "prefix_cost": "26508577218.88",
            "data_read_per_join": "7T"
          },
          "used_columns": [
            "emp_no",
            "title",
            "from_date",
            "to_date"
          ]
        }
      },
      {
        "table": {
          "table_name": "salaries",
          "access_type": "ALL",
          "rows_examined_per_scan": 2745434,
          "rows_produced_per_join": 363886050091503872,
          "filtered": "100.00",
          "using_join_buffer": "Block Nested Loop",
          "cost_info": {
            "read_cost": "314711040856.92",
            "eval_cost": "7.277721002e16",
            "prefix_cost": "7.277755124e16",
            "data_read_per_join": "5171P"
          },
          "used_columns": [
            "emp_no",
            "salary",
            "from_date",
            "to_date"
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)

For a simple query this output does not add much. Except cost info and information about used columns and efficiency of composite indexes.

But what if you not only join tables, but use the other SQL language options? For example, for the query below, which has two

JOIN

 operations and two subqueries, a regular

EXPLAIN

 returns this plan:

mysql> explain select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <subquery2>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: dept_manager
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: <subquery2>.emp_no
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: employees
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: <subquery2>.emp_no
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 4. row ***************************
           id: 2
  select_type: MATERIALIZED
        table: <derived3>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 9
     filtered: 100.00
        Extra: NULL
*************************** 5. row ***************************
           id: 3
  select_type: DERIVED
        table: titles
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: NULL
         rows: 442724
     filtered: 7.51
        Extra: Using where; Using index; Using temporary; Using filesort; LooseScan
*************************** 6. row ***************************
           id: 3
  select_type: DERIVED
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.titles.emp_no
         rows: 9
     filtered: 100.00
        Extra: NULL
6 rows in set, 1 warning (0.00 sec)

It’s pretty hard to understand which part is a subquery and which is not. It’s also it is hard to find out if

DERIVED

 belongs to the first

JOIN

 or to the second. And I am not quite sure why

<subquery2>

  was marked as

PRIMARY

, which is supposed to indicate “Outermost SELECT”.

The real issue here is that the internal representation of

JOIN

 is hierarchical, and MySQL Server (like in the case for

UNION

) has trouble representing an object as a “flat” table.

EXPLAIN FORMAT=JSON

, with its hierarchical nature, can help us in this case.

mysql> explain format=json  select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "39.45"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "<subquery2>",
          "access_type": "ALL",
          "materialized_from_subquery": {
            "using_temporary_table": true,
            "query_block": {
              "table": {
                "table_name": "t",
                "access_type": "ALL",
                "rows_examined_per_scan": 9,
                "rows_produced_per_join": 9,
                "filtered": "100.00",
                "cost_info": {
                  "read_cost": "10.45",
                  "eval_cost": "1.80",
                  "prefix_cost": "12.25",
                  "data_read_per_join": "144"
                },
                "used_columns": [
                  "emp_no",
                  "salary"
                ],
                "materialized_from_subquery": {
                  "using_temporary_table": true,
                  "dependent": false,
                  "cacheable": true,
                  "query_block": {
                    "select_id": 3,
                    "cost_info": {
                      "query_cost": "176246.11"
                    },
                    "grouping_operation": {
                      "using_temporary_table": true,
                      "using_filesort": true,
                      "cost_info": {
                        "sort_cost": "9.54"
                      },
                      "nested_loop": [
                        {
                          "table": {
                            "table_name": "titles",
                            "access_type": "index",
                            "possible_keys": [
                              "PRIMARY",
                              "emp_no"
                            ],
                            "key": "emp_no",
                            "used_key_parts": [
                              "emp_no"
                            ],
                            "key_length": "4",
                            "rows_examined_per_scan": 442724,
                            "rows_produced_per_join": 33229,
                            "filtered": "7.51",
                            "using_index": true,
                            "loosescan": true,
                            "cost_info": {
                              "read_cost": "3380.56",
                              "eval_cost": "6645.94",
                              "prefix_cost": "63199.96",
                              "data_read_per_join": "2M"
                            },
                            "used_columns": [
                              "emp_no",
                              "title",
                              "from_date"
                            ],
                            "attached_condition": "(`employees`.`titles`.`title` like '%manager%')"
                          }
                        },
                        {
                          "table": {
                            "table_name": "salaries",
                            "access_type": "ref",
                            "possible_keys": [
                              "PRIMARY",
                              "emp_no"
                            ],
                            "key": "PRIMARY",
                            "used_key_parts": [
                              "emp_no"
                            ],
                            "key_length": "4",
                            "ref": [
                              "employees.titles.emp_no"
                            ],
                            "rows_examined_per_scan": 9,
                            "rows_produced_per_join": 9,
                            "filtered": "100.00",
                            "cost_info": {
                              "read_cost": "49622.62",
                              "eval_cost": "1.91",
                              "prefix_cost": "176236.57",
                              "data_read_per_join": "152"
                            },
                            "used_columns": [
                              "emp_no",
                              "salary",
                              "from_date"
                            ]
                          }
                        }
                      ]
                    }
                  }
                }
              }
            }
          }
        }
      },
      {
        "table": {
          "table_name": "dept_manager",
          "access_type": "ref",
          "possible_keys": [
            "PRIMARY",
            "emp_no"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "emp_no"
          ],
          "key_length": "4",
          "ref": [
            "<subquery2>.emp_no"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 9,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "9.00",
            "eval_cost": "1.80",
            "prefix_cost": "23.05",
            "data_read_per_join": "144"
          },
          "used_columns": [
            "dept_no",
            "emp_no",
            "from_date",
            "to_date"
          ]
        }
      },
      {
        "table": {
          "table_name": "employees",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "emp_no"
          ],
          "key_length": "4",
          "ref": [
            "<subquery2>.emp_no"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 1,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "1.00",
            "eval_cost": "0.20",
            "prefix_cost": "39.45",
            "data_read_per_join": "48"
          },
          "used_columns": [
            "emp_no",
            "birth_date",
            "first_name",
            "last_name",
            "gender",
            "hire_date"
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.01 sec)

At first we see that all our tables,

JOIN

 operations and subqueries are in the

nested_loop

 array:

"nested_loop": [
      {
        "table": {
          "table_name": "<subquery2>",
...
      {
        "table": {
          "table_name": "dept_manager",
...
      {
        "table": {
          "table_name": "employees",
...
      }
    ]

Then we see that the first table,

<subquery2>

, was materialized_from_subquery:

"table": {
          "table_name": "<subquery2>",
          "access_type": "ALL",
          "materialized_from_subquery": {
...

Which, in its turn, was

materialized_from_subquery

 too:

"table": {
          "table_name": "<subquery2>",
          "access_type": "ALL",
          "materialized_from_subquery": {
...
                "materialized_from_subquery": {
...

This last subquery performs

grouping_operation

  on the other 

nested_loop

  (

JOIN

) of tables

titles

  and

salaries

:

"grouping_operation": {
                      "using_temporary_table": true,
                      "using_filesort": true,
                      "cost_info": {
                        "sort_cost": "9.54"
                      },
                      "nested_loop": [
                        {
                          "table": {
                            "table_name": "titles",
...
                        },
                        {
                          "table": {
                            "table_name": "salaries",
...

Now we have a better picture of how the query was optimized: tables

titles

 and

salaries

  were joined first, then

GROUP BY

 was executed on the result, then the result was materialized and queried. The result of the query

select emp_no from <materialized> t

  was materialized again as

<subquery2>

, and only after it joined with two other tables.

Conclusion:

EXPLAIN FORMAT=JSON

  helps to understand how complex queries are optimized.

Feb
22
2016
--

EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONTime for another entry in the EXPLAIN FORMAT=JSON is cool! series of blog posts. This time we’ll discuss how using EXPLAIN FORMAT=JSON allows you to see that

cost_info

  knows why the optimizer prefers one index to another.

Tables often have more than one index. Any of these indexes can be used to resolve query. The optimizer has to make a choice in this case. One of the metrics that can be used to help make the choice is the potential cost of the query evaluation.

For example, let’s take the table

titles

  from the standard employees database:

mysql> show create table titlesG
*************************** 1. row ***************************
       Table: titles
Create Table: CREATE TABLE `titles` (
  `emp_no` int(11) NOT NULL,
  `title` varchar(50) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date DEFAULT NULL,
  PRIMARY KEY (`emp_no`,`title`,`from_date`),
  KEY `emp_no` (`emp_no`),
  CONSTRAINT `titles_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

It has two indexes:

emp_no

  and

PRIMARY

, each of  which could be used to resolve query:

select distinct title from titles where year(from_date) > '1990';

At first glance, 

emp_no

  doesn’t really fit for this query.

PRIMARY

  does fit, because it contains both the 

title

  and

from_date

  fields. Unfortunately, it cannot be used to resolve the query, because we don’t limit the search by

emp_no

  and

title

 .  It can, however, be used to select rows from the index. When we use 

EXPLAIN

 , though, it shows us that the optimizer has chosen index

emp_no

  (every secondary index in InnoDB contains a link to the clustered index anyway):

mysql> explain select distinct title from titles where year(from_date) > '1990'G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: titles
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: NULL
         rows: 442724
     filtered: 100.00
        Extra: Using where; Using index; Using temporary
1 row in set, 1 warning (0.00 sec)

PRIMARY KEY

  exists in the field

possible_keys

, but was not chosen.

EXPLAIN FORMAT=JSON

  can show us why.

First let’s run it on the original query:

mysql> explain format=json select distinct title from titles where year(from_date) > '1990'G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "89796.80"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "using_filesort": false,
      "table": {
        "table_name": "titles",
        "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "emp_no"
        ],
        "key": "emp_no",
        "used_key_parts": [
          "emp_no"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 442724,
        "rows_produced_per_join": 442724,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "1252.00",
          "eval_cost": "88544.80",
          "prefix_cost": "89796.80",
          "data_read_per_join": "27M"
        },
        "used_columns": [
          "emp_no",
          "title",
          "from_date"
        ],
        "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"
      }
    }
  }
}
1 row in set, 1 warning (0.01 sec)

The important part here is:

"cost_info": {
      "query_cost": "89796.80"
    },

Which shows that the overall

query_cost

  is 89796.80. We don’t really know what the units are for this cost, or how it is actually measured. It isn’t important; the only thing that is important for now is that smaller is better. (Think of it like shopping for a product: it doesn’t matter which you buy it from, just that you buy it at the lowest price.)

Another important member of the index is

cost_info

, which belongs to the table itself:

"cost_info": {
          "read_cost": "1252.00",
          "eval_cost": "88544.80",
          "prefix_cost": "89796.80",
          "data_read_per_join": "27M"
        },

Here we get even more details, such as cost of read operation and evaluation.

prefix_cost

  is not useful for this example, because it contains the cost of joining to the next table in

JOIN

. Since we don’t join the table

titles

  with any other value of

prefix_cost

, is equivalent to the cost of the full query.

data_read_per_join

  contains the amount of data that should be read for each

JOIN

  operation. In our case it is once again the same as how much data we should read to fully evaluate the query.

Now let’s force index

PRIMARY

  and examine the 

EXPLAIN FORMAT=JSON

  output:

mysql> explain format=json select distinct title from titles force index(primary) where year(from_date) > '1990'G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "531269.80"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "using_filesort": false,
      "table": {
        "table_name": "titles",
        "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "emp_no"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "emp_no",
          "title",
          "from_date"
        ],
        "key_length": "59",
        "rows_examined_per_scan": 442724,
        "rows_produced_per_join": 442724,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "442725.00",
          "eval_cost": "88544.80",
          "prefix_cost": "531269.80",
          "data_read_per_join": "27M"
        },
        "used_columns": [
          "emp_no",
          "title",
          "from_date"
        ],
        "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"
      }
    }
  }
}
1 row in set, 1 warning (0.01 sec)

Notice the numbers are different this time. The total query cost is 531269.80, which is about 6 times greater than 89796.80:

"cost_info": {
      "query_cost": "531269.80"
    },

read_cost

  is 442725.00, which is 353 times greater than 1252.00. However, the 

eval_cost

  and

data_read_per_join

  are the same as the query that uses index

emp_no

 :

"cost_info": {
          "read_cost": "442725.00",
          "eval_cost": "88544.80",
          "prefix_cost": "531269.80",
          "data_read_per_join": "27M"
        },

These numbers clearly explain why the optimizer prefers the index 

emp_no

  to

PRIMARY KEY

.

In our example above this behavior is correct. In a real life scenario, if the optimizer’s choice is wrong. these numbers can show either that there is a bug in the optimizer or  that the table’s statistics are outdated and need to be updated.

Conclusion:

EXPLAIN FORMAT=JSON

  can be used together with

FORCE INDEX

  to find out why the optimizer prefers one index to another.

Feb
09
2016
--

EXPLAIN FORMAT=JSON: buffer_result is not hidden!

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONTime for another entry in the EXPLAIN FORMAT=JSON is cool! series. Today we’re going to look at how you can view the buffer result using JSON (instead of the regular

EXPLAIN

 command.

Regular

EXPLAIN

 does not identify if

SQL_BUFFER_RESULT

 was used at all. To demonstrate, let’s run this query:

mysql> explain select * from salariesG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.01 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,`employees`.`salaries`.`from_date` AS `from_date`,`employees`.`salaries`.`to_date` AS `to_date` from `employees`.`salaries`

Now, let’s compare it to this query:

mysql> explain select sql_buffer_result * from salariesG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: Using temporary
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,`employees`.`salaries`.`from_date` AS `from_date`,`employees`.`salaries`.`to_date` AS `to_date` from `employees`.`salaries`

Notice there is no difference, except the expected

"Using temporary"

 value in the

"Extra"

 row of the second query. The field 

"Using temporary"

  is expected here, because

SQL_BUFFER_RESULT

  directly instructs the MySQL server to put a result set into a temporary table to free locks. But what if the query uses the temporary table by itself? For example, for a grouping operation? In this case, the 

EXPLAIN

 result for the original query and the query that contains the 

SQL_BUFFER_RESULT

  clause will be 100% identical.

Compare:

mysql> explain select emp_no, salary/avg(salary) from salaries group by emp_no, salaryG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`

With:

mysql> explain select sql_buffer_result emp_no, salary/avg(salary) from salaries group by emp_no, salaryG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`

There is no difference! We not able to tell if we used a temporary table to resolve the query, or simply put the result set into the buffer. The 

EXPLAIN FORMAT=JSON

  command can help in this case as well. Its output is clear, and shows all the details of the query optimization:

mysql> explain format=json select sql_buffer_result emp_no, salary/avg(salary) from salaries group by emp_no, salaryG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3073970.40"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2557022.00"
      },
      "buffer_result": {
        "using_temporary_table": true,
        "table": {
          "table_name": "salaries",
          "access_type": "ALL",
          "rows_examined_per_scan": 2557022,
          "rows_produced_per_join": 2557022,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "5544.00",
            "eval_cost": "511404.40",
            "prefix_cost": "516948.40",
            "data_read_per_join": "39M"
          },
          "used_columns": [
            "emp_no",
            "salary",
            "from_date"
          ]
        }
      }
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`

Firstly, we can see how the

grouping_operation

 was optimized:

"grouping_operation": { "using_temporary_table": true, "using_filesort": true,

And it does indeed use the temporary table.

Now we can follow the details for

SQL_BUFFER_RESULT

:

"buffer_result": {
        "using_temporary_table": true,

With this output, we can be absolutely certain that the temporary table was created for both the  

SQL_BUFFER_RESULT

 and the grouping operation. This is especially helpful for support engineers who need the 

EXPLAIN

  output to help their customers to tune queries, but are afraid to ask for the same query twice — once with the 

SQL_BUFFER_RESULT

 clause and once without.

Conclusion:

EXPLAIN FORMAT=JSON

  does not hide important details for query optimizations.

Jan
29
2016
--

EXPLAIN FORMAT=JSON knows everything about UNIONs: union_result and query_specifications

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONReady for another post in the EXPLAIN FORMAT=JSON is Cool series! Great! This post will discuss how to see all the information that is contained in optimized queries with

UNION

 using the

union_result

 and

query_specifications

 commands.

 

When optimizing complicated queries with

UNION

, it is easy to get lost in the regular

EXPLAIN

  output trying to identify which part of the output belongs to each part of the

UNION

.

Let’s consider the following example:

mysql> explain
    ->     select emp_no, last_name, 'low_salary' from employees
    ->     where emp_no in (select emp_no from salaries
    ->         where salary < (select avg(salary) from salaries))
    -> union
    ->     select emp_no, last_name, 'high salary' from employees
    ->     where emp_no in (select emp_no from salaries
    ->         where salary >= (select avg(salary) from salaries))G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: employees
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299778
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.employees.emp_no
         rows: 9
     filtered: 33.33
        Extra: Using where; FirstMatch(employees)
*************************** 3. row ***************************
           id: 3
  select_type: SUBQUERY
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: NULL
*************************** 4. row ***************************
           id: 4
  select_type: UNION
        table: employees
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299778
     filtered: 100.00
        Extra: NULL
*************************** 5. row ***************************
           id: 4
  select_type: UNION
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.employees.emp_no
         rows: 9
     filtered: 33.33
        Extra: Using where; FirstMatch(employees)
*************************** 6. row ***************************
           id: 6
  select_type: SUBQUERY
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: NULL
*************************** 7. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: <union1,4>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Using temporary
7 rows in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'low_salary' AS `low_salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))) union /* select#4 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'high salary' AS `high salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`)))

While we can guess that subquery 3 belongs to the first query of the union, and subquery 6 belongs to the second (which has number 4 for some reason), we have to be very careful (especially in our case) when queries use the same tables in both parts of the

UNION

.

The main issue with the regular

EXPLAIN

 for

UNION

  is that it has to re-present the hierarchical structure as a table. The same issue occurs when you want to store objects created in programming language, such as Java, in the database.

EXPLAIN FORMAT=JSON

, on the other hand, has hierarchical structure and more clearly displays how

UNION

 was optimized:

mysql> explain format=json select emp_no, last_name, 'low_salary' from employees where emp_no in (select emp_no from salaries  where salary < (select avg(salary) from salaries)) union select emp_no, last_name, 'high salary' from employees where emp_no in (select emp_no from salaries where salary >= (select avg(salary) from salaries))G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "union_result": {
      "using_temporary_table": true,
      "table_name": "<union1,4>",
      "access_type": "ALL",
      "query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
            "cost_info": {
              "query_cost": "921684.48"
            },
            "nested_loop": [
              {
                "table": {
                  "table_name": "employees",
                  "access_type": "ALL",
                  "possible_keys": [
                    "PRIMARY"
                  ],
                  "rows_examined_per_scan": 299778,
                  "rows_produced_per_join": 299778,
                  "filtered": "100.00",
                  "cost_info": {
                    "read_cost": "929.00",
                    "eval_cost": "59955.60",
                    "prefix_cost": "60884.60",
                    "data_read_per_join": "13M"
                  },
                  "used_columns": [
                    "emp_no",
                    "last_name"
                  ]
                }
              },
              {
                "table": {
                  "table_name": "salaries",
                  "access_type": "ref",
                  "possible_keys": [
                    "PRIMARY",
                    "emp_no"
                  ],
                  "key": "PRIMARY",
                  "used_key_parts": [
                    "emp_no"
                  ],
                  "key_length": "4",
                  "ref": [
                    "employees.employees.emp_no"
                  ],
                  "rows_examined_per_scan": 9,
                  "rows_produced_per_join": 299778,
                  "filtered": "33.33",
                  "first_match": "employees",
                  "cost_info": {
                    "read_cost": "302445.97",
                    "eval_cost": "59955.60",
                    "prefix_cost": "921684.48",
                    "data_read_per_join": "4M"
                  },
                  "used_columns": [
                    "emp_no",
                    "salary"
                  ],
                  "attached_condition": "(`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))",
                  "attached_subqueries": [
                    {
                      "dependent": false,
                      "cacheable": true,
                      "query_block": {
                        "select_id": 3,
                        "cost_info": {
                          "query_cost": "516948.40"
                        },
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ALL",
                          "rows_examined_per_scan": 2557022,
                          "rows_produced_per_join": 2557022,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "5544.00",
                            "eval_cost": "511404.40",
                            "prefix_cost": "516948.40",
                            "data_read_per_join": "39M"
                          },
                          "used_columns": [
                            "salary"
                          ]
                        }
                      }
                    }
                  ]
                }
              }
            ]
          }
        },
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 4,
            "cost_info": {
              "query_cost": "921684.48"
            },
            "nested_loop": [
              {
                "table": {
                  "table_name": "employees",
                  "access_type": "ALL",
                  "possible_keys": [
                    "PRIMARY"
                  ],
                  "rows_examined_per_scan": 299778,
                  "rows_produced_per_join": 299778,
                  "filtered": "100.00",
                  "cost_info": {
                    "read_cost": "929.00",
                    "eval_cost": "59955.60",
                    "prefix_cost": "60884.60",
                    "data_read_per_join": "13M"
                  },
                  "used_columns": [
                    "emp_no",
                    "last_name"
                  ]
                }
              },
              {
                "table": {
                  "table_name": "salaries",
                  "access_type": "ref",
                  "possible_keys": [
                    "PRIMARY",
                    "emp_no"
                  ],
                  "key": "PRIMARY",
                  "used_key_parts": [
                    "emp_no"
                  ],
                  "key_length": "4",
                  "ref": [
                    "employees.employees.emp_no"
                  ],
                  "rows_examined_per_scan": 9,
                  "rows_produced_per_join": 299778,
                  "filtered": "33.33",
                  "first_match": "employees",
                  "cost_info": {
                    "read_cost": "302445.97",
                    "eval_cost": "59955.60",
                    "prefix_cost": "921684.48",
                    "data_read_per_join": "4M"
                  },
                  "used_columns": [
                    "emp_no",
                    "salary"
                  ],
                  "attached_condition": "(`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))",
                  "attached_subqueries": [
                    {
                      "dependent": false,
                      "cacheable": true,
                      "query_block": {
                        "select_id": 6,
                        "cost_info": {
                          "query_cost": "516948.40"
                        },
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ALL",
                          "rows_examined_per_scan": 2557022,
                          "rows_produced_per_join": 2557022,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "5544.00",
                            "eval_cost": "511404.40",
                            "prefix_cost": "516948.40",
                            "data_read_per_join": "39M"
                          },
                          "used_columns": [
                            "salary"
                          ]
                        }
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'low_salary' AS `low_salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))) union /* select#4 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'high salary' AS `high salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`)))

First it puts member

union_result

 in the

query_block

  at the very top level:

EXPLAIN: {
  "query_block": {
    "union_result": {

The

union_result

 object contains information about how the result set of the

UNION

 was processed:

"using_temporary_table": true,
      "table_name": "<union1,4>",
      "access_type": "ALL",

And also contains the 

query_specifications

 array which also contains all the details about queries in the

UNION

:

"query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
<skipped>
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 4,

This representation is much more clear, and also contains all the details which the regular

EXPLAIN

misses for regular queries.

Conclusion:

EXPLAIN FORMAT=JSON

 not only contains additional optimization information for each query in the

UNION

, but also has a hierarchical structure that is more suitable for the hierarchical nature of the

UNION

.

Jan
25
2016
--

EXPLAIN FORMAT=JSON has details for subqueries in HAVING, nested selects and subqueries that update values

EXPLAIN FORMAT=JSONOver several previous blog posts, we’ve already discussed what information the 

EXPLAIN FORMAT=JSON

 output provides for some subqueries. You can review those discussions here, here and here. EXPLAIN FORMAT=JSON shows many details that you can’t get with other commands. Let’s now finish this topic and discuss the output for the rest of the subquery types.

First, let’s look at the subquery in the 

HAVING

 clause, such as in the following example:

select count(emp_no), salary
from salaries
group by salary
having salary > ALL (select avg(s)
                     from (select dept_no, sum(salary) as s
                           from salaries join dept_emp using (emp_no) group by dept_no) t
                     )

This example prints the number of employees and their salaries, if their salary is greater than the average salary in their department.

EXPLAIN FORMAT=JSON

 provides a lot details on how this subquery is optimized:

mysql> explain format=json select count(emp_no), salary from salaries group by salary having salary > ALL (select avg(s) from (select dept_no, sum(salary) as s from salaries join dept_emp using (emp_no) group by dept_no) t)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3073970.40"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2557022.00"
      },
      "table": {
        "table_name": "salaries",
        "access_type": "ALL",
        "rows_examined_per_scan": 2557022,
        "rows_produced_per_join": 2557022,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "5544.00",
          "eval_cost": "511404.40",
          "prefix_cost": "516948.40",
          "data_read_per_join": "39M"
        },
        "used_columns": [
          "emp_no",
          "salary",
          "from_date"
        ]
      },
      "having_subqueries": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "771970.25"
            },
            "table": {
              "table_name": "t",
              "access_type": "ALL",
              "rows_examined_per_scan": 3087841,
              "rows_produced_per_join": 3087841,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "154402.05",
                "eval_cost": "617568.20",
                "prefix_cost": "771970.25",
                "data_read_per_join": "94M"
              },
              "used_columns": [
                "dept_no",
                "s"
              ],
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "dependent": false,
                "cacheable": true,
                "query_block": {
                  "select_id": 3,
                  "cost_info": {
                    "query_cost": "1019140.27"
                  },
                  "grouping_operation": {
                    "using_filesort": false,
                    "nested_loop": [
                      {
                        "table": {
                          "table_name": "dept_emp",
                          "access_type": "index",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no",
                            "dept_no"
                          ],
                          "key": "dept_no",
                          "used_key_parts": [
                            "dept_no"
                          ],
                          "key_length": "4",
                          "rows_examined_per_scan": 331570,
                          "rows_produced_per_join": 331570,
                          "filtered": "100.00",
                          "using_index": true,
                          "cost_info": {
                            "read_cost": "737.00",
                            "eval_cost": "66314.00",
                            "prefix_cost": "67051.00",
                            "data_read_per_join": "5M"
                          },
                          "used_columns": [
                            "emp_no",
                            "dept_no"
                          ]
                        }
                      },
                      {
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ref",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no"
                          ],
                          "key": "PRIMARY",
                          "used_key_parts": [
                            "emp_no"
                          ],
                          "key_length": "4",
                          "ref": [
                            "employees.dept_emp.emp_no"
                          ],
                          "rows_examined_per_scan": 9,
                          "rows_produced_per_join": 3087841,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "334520.92",
                            "eval_cost": "617568.35",
                            "prefix_cost": "1019140.27",
                            "data_read_per_join": "47M"
                          },
                          "used_columns": [
                            "emp_no",
                            "salary",
                            "from_date"
                          ]
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)`,`employees`.`salaries`.`salary` AS `salary` from `employees`.`salaries` group by `employees`.`salaries`.`salary` having <not>((`employees`.`salaries`.`salary` <= <max>(/* select#2 */ select avg(`t`.`s`) from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t`)))

We see that the subquery in the 

HAVING

 clause is not dependent, but cacheable:

"having_subqueries": [
        {
          "dependent": false,
          "cacheable": true,

It has its own query block:

"query_block": {
            "select_id": 2,

Which accesses table “t”:

"table": {
              "table_name": "t",
              "access_type": "ALL",
              "rows_examined_per_scan": 3087841,
              "rows_produced_per_join": 3087841,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "154402.05",
                "eval_cost": "617568.20",
                "prefix_cost": "771970.25",
                "data_read_per_join": "94M"
              },
              "used_columns": [
                "dept_no",
                "s"
              ],

Table “t” was also materialized from the subquery:

],
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "dependent": false,
                "cacheable": true,
                "query_block": {
                  "select_id": 3,

Another kind of subquery is in the 

SELECT

 list. If we want to compare the salary of an employee with the average salary in the company, for example, we can use the query

select emp_no, salary, (select avg(salary) from salaries) from salaries

. Lets examine the 

EXPLAIN

 output:

mysql> explain format=json select emp_no, salary, (select avg(salary) from salaries) from salariesG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "516948.40"
    },
    "table": {
      "table_name": "salaries",
      "access_type": "ALL",
      "rows_examined_per_scan": 2557022,
      "rows_produced_per_join": 2557022,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "5544.00",
        "eval_cost": "511404.40",
        "prefix_cost": "516948.40",
        "data_read_per_join": "39M"
      },
      "used_columns": [
        "emp_no",
        "salary"
      ]
    },
    "select_list_subqueries": [
      {
        "dependent": false,
        "cacheable": true,
        "query_block": {
          "select_id": 2,
          "cost_info": {
            "query_cost": "516948.40"
          },
          "table": {
            "table_name": "salaries",
            "access_type": "ALL",
            "rows_examined_per_scan": 2557022,
            "rows_produced_per_join": 2557022,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "5544.00",
              "eval_cost": "511404.40",
              "prefix_cost": "516948.40",
              "data_read_per_join": "39M"
            },
            "used_columns": [
              "salary"
            ]
          }
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,(/* select#2 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`) AS `(select avg(salary) from salaries)` from `employees`.`salaries`

EXPLAIN FORMAT=JSON

 in this case shows that the subquery is part of the first

query_block

, not dependent and cacheable.

The last type of subquery I want to discuss is the subquery updating values. For example, I added a new column to the

titles

 table from the standard employees database:

mysql> alter table titles add column full_title varchar(100);
Query OK, 0 rows affected (24.42 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now I want

full_title

 to contain both the department’s name and title, separated by a space. I can use 

UPDATE

 with the subquery to achieve this:

update titles
set full_title=concat((select dept_name
                       from departments
                       join dept_emp using(dept_no)
                       where dept_emp.emp_no=titles.emp_no and dept_emp.to_date='9999-01-01')
               ,' ', title)
where to_date = '9999-01-01';

To find out how it is optimized, we can use

EXPLAIN FORMAT=JSON

:

mysql> explain format=json update titles set full_title=concat((select dept_name from departments join dept_emp using(dept_no) where dept_emp.emp_no=titles.emp_no and dept_emp.to_date='9999-01-01') ,' ', title) where to_date = '9999-01-01'G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "update": true,
      "table_name": "titles",
      "access_type": "index",
      "key": "PRIMARY",
      "used_key_parts": [
        "emp_no",
        "title",
        "from_date"
      ],
      "key_length": "59",
      "rows_examined_per_scan": 442843,
      "filtered": "100.00",
      "using_temporary_table": "for update",
      "attached_condition": "(`employees`.`titles`.`to_date` = '9999-01-01')"
    },
    "update_value_subqueries": [
      {
        "dependent": true,
        "cacheable": false,
        "query_block": {
          "select_id": 2,
          "cost_info": {
            "query_cost": "1.35"
          },
          "nested_loop": [
            {
              "table": {
                "table_name": "dept_emp",
                "access_type": "ref",
                "possible_keys": [
                  "PRIMARY",
                  "emp_no",
                  "dept_no"
                ],
                "key": "PRIMARY",
                "used_key_parts": [
                  "emp_no"
                ],
                "key_length": "4",
                "ref": [
                  "employees.titles.emp_no"
                ],
                "rows_examined_per_scan": 1,
                "rows_produced_per_join": 0,
                "filtered": "10.00",
                "cost_info": {
                  "read_cost": "1.00",
                  "eval_cost": "0.02",
                  "prefix_cost": "1.22",
                  "data_read_per_join": "1"
                },
                "used_columns": [
                  "emp_no",
                  "dept_no",
                  "to_date"
                ],
                "attached_condition": "(`employees`.`dept_emp`.`to_date` = '9999-01-01')"
              }
            },
            {
              "table": {
                "table_name": "departments",
                "access_type": "eq_ref",
                "possible_keys": [
                  "PRIMARY"
                ],
                "key": "PRIMARY",
                "used_key_parts": [
                  "dept_no"
                ],
                "key_length": "4",
                "ref": [
                  "employees.dept_emp.dept_no"
                ],
                "rows_examined_per_scan": 1,
                "rows_produced_per_join": 0,
                "filtered": "100.00",
                "cost_info": {
                  "read_cost": "0.11",
                  "eval_cost": "0.02",
                  "prefix_cost": "1.35",
                  "data_read_per_join": "5"
                },
                "used_columns": [
                  "dept_no",
                  "dept_name"
                ]
              }
            }
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1276): Field or reference 'employees.titles.emp_no' of SELECT #2 was resolved in SELECT #1

We can see in this output that the subquery is dependent, not cacheable, and will be executed for each row that needs to be updated.

Conclusion:

EXPLAIN FORMAT=JSON

  provides various information about all kind of subqueries.

Jan
08
2016
--

ordering_operation: EXPLAIN FORMAT=JSON knows everything about ORDER BY processing

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONWe’ve already discussed using the ORDER BY clause with subqueries. You can also, however, use the 

ORDER BY

 clause with sorting results of one of the columns. Actually, this is most common way to use this clause.

Sometimes such queries require using temporary tables or filesort, and a regular

EXPLAIN

  clause provides this information. But it doesn’t show if this job is needed for

ORDER BY

 or for optimizing another part of the query.

For example, if we take a pretty simple query ( 

select distinct last_name from employees order by last_name asc

) and run

EXPLAIN

  on it, we can see that both the temporary table and filesort were used. However, we can’t identify if these were applied to

DISTINCT

, or to

ORDER BY

, or to any other part of the query.

mysql> explain select distinct last_name from employees order by last_name ascG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299379
     filtered: 100.00
        Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select distinct `employees`.`employees`.`last_name` AS `last_name` from `employees`.`employees` order by `employees`.`employees`.`last_name`

EXPLAIN FORMAT=JSON

 tells us exactly what happened:

mysql> explain format=json select distinct last_name from employees order by last_name ascG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "360183.80"
    },
    "ordering_operation": {
      "using_filesort": false,
      "duplicates_removal": {
        "using_temporary_table": true,
        "using_filesort": true,
        "cost_info": {
          "sort_cost": "299379.00"
        },
        "table": {
          "table_name": "employees",
          "access_type": "ALL",
          "rows_examined_per_scan": 299379,
          "rows_produced_per_join": 299379,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "929.00",
            "eval_cost": "59875.80",
            "prefix_cost": "60804.80",
            "data_read_per_join": "13M"
          },
          "used_columns": [
            "emp_no",
            "last_name"
          ]
        }
      }
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select distinct `employees`.`employees`.`last_name` AS `last_name` from `employees`.`employees` order by `employees`.`employees`.`last_name`

In the output above you see can see that 

ordering_operation

 does not use filesort:

"ordering_operation": {
      "using_filesort": false,

But

DISTINCT

 does:

"duplicates_removal": {
        "using_temporary_table": true,
        "using_filesort": true,

If we remove the 

DISTINCT

 clause, we will find that 

ORDER BY

 started using filesort, but does not need to create a temporary table:

mysql> explain format=json select last_name from employees order by last_name ascG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "360183.80"
    },
    "ordering_operation": {
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "299379.00"
      },
<rest of the output skipped>

This means that in the case of the first query, a sorting operation proceeded in parallel with the duplicate keys removal.

Conclusion:

EXPLAIN FORMAT=JSON

  provides details about

ORDER BY

  optimization which cannot be seen with a regular

EXPLAIN

 operation.

Jan
04
2016
--

grouping_operation, duplicates_removal: EXPLAIN FORMAT=JSON has all details about GROUP BY

EXPLAIN FORMAT=JSONIn the previous EXPLAIN FORMAT=JSON is Cool! series blog post, we discussed the  

group_by_subqueries

  member (which is child of

grouping_operation

). Let’s now focus on the 

grouping_operation

  and other details of 

GROUP BY

  processing.

grouping_operation

 simply shows the details of what happens when the 

GROUP BY

 clause is run:

mysql> explain format=json select dept_no from dept_emp group by dept_noG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "14.40"
    },
    "grouping_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "dept_emp",
        "access_type": "range",
        "possible_keys": [
          "PRIMARY",
          "emp_no",
          "dept_no"
        ],
        "key": "dept_no",
        "used_key_parts": [
          "dept_no"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 9,
        "rows_produced_per_join": 9,
        "filtered": "100.00",
        "using_index_for_group_by": true,
        "cost_info": {
          "read_cost": "12.60",
          "eval_cost": "1.80",
          "prefix_cost": "14.40",
          "data_read_per_join": "144"
        },
        "used_columns": [
          "emp_no",
          "dept_no"
        ]
      }
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no` from `employees`.`dept_emp` group by `employees`.`dept_emp`.`dept_no`

In the listing above, you can see which table was accessed by the 

GROUP BY

 operation, the access type, and if an index for

GROUP BY

 was used.

In case of a simple

JOIN

  of two tables, 

grouping_operation

 is usually a parent for the 

nested_loop

  object (which provides details on how the 

JOIN

  proceeded):

mysql> explain format=json select de.dept_no, count(dm.emp_no) from dept_emp de join dept_manager dm using(emp_no) group by de.dept_noG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "61.50"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "26.41"
      },
      "nested_loop": [
        {
          "table": {
            "table_name": "dm",
            "access_type": "index",
            "possible_keys": [
              "PRIMARY",
              "emp_no"
            ],
            "key": "emp_no",
            "used_key_parts": [
              "emp_no"
            ],
            "key_length": "4",
            "rows_examined_per_scan": 24,
            "rows_produced_per_join": 24,
            "filtered": "100.00",
            "using_index": true,
            "cost_info": {
              "read_cost": "1.00",
              "eval_cost": "4.80",
              "prefix_cost": "5.80",
              "data_read_per_join": "384"
            },
            "used_columns": [
              "dept_no",
              "emp_no"
            ]
          }
        },
        {
          "table": {
            "table_name": "de",
            "access_type": "ref",
            "possible_keys": [
              "PRIMARY",
              "emp_no",
              "dept_no"
            ],
            "key": "emp_no",
            "used_key_parts": [
              "emp_no"
            ],
            "key_length": "4",
            "ref": [
              "employees.dm.emp_no"
            ],
            "rows_examined_per_scan": 1,
            "rows_produced_per_join": 26,
            "filtered": "100.00",
            "using_index": true,
            "cost_info": {
              "read_cost": "24.00",
              "eval_cost": "5.28",
              "prefix_cost": "35.09",
              "data_read_per_join": "422"
            },
            "used_columns": [
              "emp_no",
              "dept_no"
            ]
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`de`.`dept_no` AS `dept_no`,count(`employees`.`dm`.`emp_no`) AS `count(dm.emp_no)` from `employees`.`dept_emp` `de` join `employees`.`dept_manager` `dm` where (`employees`.`de`.`emp_no` = `employees`.`dm`.`emp_no`) group by `employees`.`de`.`dept_no`

Surprisingly, while many

DISTINCT

 queries can be converted into equivalent queries with the 

GROUP BY

 clause, there is separate member (

duplicates_removal

) for processing it. Let’s see how it works with a simple query that performs the same job as the first one in this blog post:

mysql> explain format=json select distinct dept_no from dept_empG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "14.40"
    },
    "duplicates_removal": {
      "using_filesort": false,
      "table": {
        "table_name": "dept_emp",
        "access_type": "range",
        "possible_keys": [
          "PRIMARY",
          "emp_no",
          "dept_no"
        ],
        "key": "dept_no",
        "used_key_parts": [
          "dept_no"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 9,
        "rows_produced_per_join": 9,
        "filtered": "100.00",
        "using_index_for_group_by": true,
        "cost_info": {
          "read_cost": "12.60",
          "eval_cost": "1.80",
          "prefix_cost": "14.40",
          "data_read_per_join": "144"
        },
        "used_columns": [
          "emp_no",
          "dept_no"
        ]
      }
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select distinct `employees`.`dept_emp`.`dept_no` AS `dept_no` from `employees`.`dept_emp`

You can see that the plan is almost same, but parent element for the plan is

duplicates_removal

.

The reason there are differences between these members can be seen if we change the second, more complicated query to use

DISTINCT

 in place of

GROUP BY

:

mysql> explain format=json select distinct de.dept_no, count(dm.emp_no) from dept_emp de join dept_manager dm using(emp_no)G
ERROR 1140 (42000): In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'employees.de.dept_no'; this is incompatible with sql_mode=only_full_group_by

This example shows that 

DISTINCT

 is not exactly same as

GROUP BY

, and can be used together  if we want to count the number of managers in each department (grouped by the year when the manager started working in the department). In this case, however, we are interested only in unique pairs of such dates and don’t want to see duplicates. Duplicates will appear if one person managed same department more than two years.

mysql> explain format=json select distinct de.dept_no, count(dm.emp_no) from dept_emp de join dept_manager dm using(emp_no) group by de.dept_no, year(de.from_date)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "61.63"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "using_filesort": false,
      "grouping_operation": {
        "using_temporary_table": true,
        "using_filesort": true,
        "cost_info": {
          "sort_cost": "26.53"
        },
        "nested_loop": [
          {
            "table": {
              "table_name": "dm",
              "access_type": "index",
              "possible_keys": [
                "PRIMARY",
                "emp_no"
              ],
              "key": "emp_no",
              "used_key_parts": [
                "emp_no"
              ],
              "key_length": "4",
              "rows_examined_per_scan": 24,
              "rows_produced_per_join": 24,
              "filtered": "100.00",
              "using_index": true,
              "cost_info": {
                "read_cost": "1.00",
                "eval_cost": "4.80",
                "prefix_cost": "5.80",
                "data_read_per_join": "384"
              },
              "used_columns": [
                "dept_no",
                "emp_no"
              ]
            }
          },
          {
            "table": {
              "table_name": "de",
              "access_type": "ref",
              "possible_keys": [
                "PRIMARY",
                "emp_no"
              ],
              "key": "PRIMARY",
              "used_key_parts": [
                "emp_no"
              ],
              "key_length": "4",
              "ref": [
                "employees.dm.emp_no"
              ],
              "rows_examined_per_scan": 1,
              "rows_produced_per_join": 26,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "24.00",
                "eval_cost": "5.31",
                "prefix_cost": "35.11",
                "data_read_per_join": "424"
              },
              "used_columns": [
                "emp_no",
                "dept_no",
                "from_date"
              ]
            }
          }
        ]
      }
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select distinct `employees`.`de`.`dept_no` AS `dept_no`,count(`employees`.`dm`.`emp_no`) AS `count(dm.emp_no)` from `employees`.`dept_emp` `de` join `employees`.`dept_manager` `dm` where (`employees`.`de`.`emp_no` = `employees`.`dm`.`emp_no`) group by `employees`.`de`.`dept_no`,year(`employees`.`de`.`from_date`)

In this case, the member

grouping_operation

 is a child of

duplicates_removal

 and the temporary table used to store the result of

GROUP BY

  before removing the duplicates. A temporary table was also used to perform a filesort for the grouping operation itself.

Compare this with regular

EXPLAIN

 output.

EXPLAIN

 only shows that a temporary table was used, but does not provide insights on the operations for which it was used:

mysql> explain select distinct de.dept_no, count(dm.emp_no) from dept_emp de join dept_manager dm using(emp_no) group by de.dept_no, year(de.from_date)G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: dm
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: NULL
         rows: 24
     filtered: 100.00
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: de
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.dm.emp_no
         rows: 1
     filtered: 100.00
        Extra: NULL
2 rows in set, 1 warning (0.01 sec)
Note (Code 1003): /* select#1 */ select distinct `employees`.`de`.`dept_no` AS `dept_no`,count(`employees`.`dm`.`emp_no`) AS `count(dm.emp_no)` from `employees`.`dept_emp` `de` join `employees`.`dept_manager` `dm` where (`employees`.`de`.`emp_no` = `employees`.`dm`.`emp_no`) group by `employees`.`de`.`dept_no`,year(`employees`.`de`.`from_date`)

Conclusion: 

EXPLAIN FORMAT=JSON

 contains all the details about the 

GROUP BY

 and

DISTINCT

  optimizations.

Dec
29
2015
--

EXPLAIN FORMAT=JSON: order_by_subqueries, group_by_subqueries details on subqueries in ORDER BY and GROUP BY

EXPLAIN FORMAT=JSON

EXPLAIN FORMATAnother post in the EXPLAIN FORMAT=JSON is Cool! series! In this post, we’ll discuss how the EXPLAIN FORMAT=JSON provides optimization details for 

ORDER BY

 and  

GROUP BY

 operations in conjunction with 

order_by_subqueries

 and  

group_by_subqueries

EXPLAIN FORMAT=JSON

 can print details on how a subquery in

ORDER BY

 is optimized:

mysql> explain format=json select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select emp_no limit 1)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "60833.60"
    },
    "ordering_operation": {
      "using_filesort": true,
      "table": {
        "table_name": "employees",
        "access_type": "ALL",
        "rows_examined_per_scan": 299843,
        "rows_produced_per_join": 299843,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "865.00",
          "eval_cost": "59968.60",
          "prefix_cost": "60833.60",
          "data_read_per_join": "13M"
        },
        "used_columns": [
          "emp_no",
          "first_name",
          "last_name"
        ]
      },
      "order_by_subqueries": [
        {
          "dependent": true,
          "cacheable": false,
          "query_block": {
            "select_id": 2,
            "message": "No tables used"
          }
        }
      ]
    }
  }
}
1 row in set, 2 warnings (0.00 sec)
Note (Code 1276): Field or reference 'employees.employees.emp_no' of SELECT #2 was resolved in SELECT #1
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`employees`.`emp_no` limit 1)

The above code shows member

ordering_operation

 of

query_block

  (which includes the 

order_by_subqueries

 array) with information on how the subquery in

ORDER BY

  was optimized.

This is a simple example. In real life you can have larger subqueries in the 

ORDER BY

  clause. For example, take this more complicated and slightly crazy query:

select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)

Run a regular

EXPLAIN

 on it. If we imagine this is a regular subquery, we won’t know if it can be cached or would be executed for each row sorted.

mysql> explain  select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299843
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: dept_emp
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no,dept_no
          key: dept_no
      key_len: 4
          ref: NULL
         rows: 331215
     filtered: 100.00
        Extra: Using index
*************************** 3. row ***************************
           id: 2
  select_type: SUBQUERY
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: employees.dept_emp.emp_no
         rows: 10
     filtered: 100.00
        Extra: Using index
3 rows in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`dept_emp`.`dept_no` AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`)

EXPLAIN FORMAT=JSON

  provides a completely different picture:

mysql> explain format=json select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "60833.60"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "employees",
        "access_type": "ALL",
        "rows_examined_per_scan": 299843,
        "rows_produced_per_join": 299843,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "865.00",
          "eval_cost": "59968.60",
          "prefix_cost": "60833.60",
          "data_read_per_join": "13M"
        },
        "used_columns": [
          "emp_no",
          "first_name",
          "last_name"
        ]
      },
      "optimized_away_subqueries": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "1082124.21"
            },
            "grouping_operation": {
              "using_filesort": false,
              "nested_loop": [
                {
                  "table": {
                    "table_name": "dept_emp",
                    "access_type": "index",
                    "possible_keys": [
                      "PRIMARY",
                      "emp_no",
                      "dept_no"
                    ],
                    "key": "dept_no",
                    "used_key_parts": [
                      "dept_no"
                    ],
                    "key_length": "4",
                    "rows_examined_per_scan": 331215,
                    "rows_produced_per_join": 331215,
                    "filtered": "100.00",
                    "using_index": true,
                    "cost_info": {
                      "read_cost": "673.00",
                      "eval_cost": "66243.00",
                      "prefix_cost": "66916.00",
                      "data_read_per_join": "5M"
                    },
                    "used_columns": [
                      "emp_no",
                      "dept_no"
                    ]
                  }
                },
                {
                  "table": {
                    "table_name": "salaries",
                    "access_type": "ref",
                    "possible_keys": [
                      "PRIMARY",
                      "emp_no"
                    ],
                    "key": "emp_no",
                    "used_key_parts": [
                      "emp_no"
                    ],
                    "key_length": "4",
                    "ref": [
                      "employees.dept_emp.emp_no"
                    ],
                    "rows_examined_per_scan": 10,
                    "rows_produced_per_join": 3399374,
                    "filtered": "100.00",
                    "using_index": true,
                    "cost_info": {
                      "read_cost": "335333.33",
                      "eval_cost": "679874.87",
                      "prefix_cost": "1082124.21",
                      "data_read_per_join": "51M"
                    },
                    "used_columns": [
                      "emp_no",
                      "from_date"
                    ]
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`dept_emp`.`dept_no` AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`)

We see that the subquery was optimized away: member

optimized_away_subqueries

 exists, but there is no

order_by_subqueries

 in the

ordering_operation

 object. We can also see that the subquery was cached:

"cacheable": true

.

EXPLAIN FORMAT=JSON

 also provides information about subqueries in the 

GROUP BY

 clause. It uses the 

group_by_subqueries

 array in the 

grouping_operation

  member for this purpose.

mysql> explain format=json select count(emp_no) from salaries group by salary > ALL (select s/c as avg_salary from (select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) t)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3412037.60"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2838638.00"
      },
      "table": {
        "table_name": "salaries",
        "access_type": "ALL",
        "rows_examined_per_scan": 2838638,
        "rows_produced_per_join": 2838638,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "5672.00",
          "eval_cost": "567727.60",
          "prefix_cost": "573399.60",
          "data_read_per_join": "43M"
        },
        "used_columns": [
          "emp_no",
          "salary",
          "from_date"
        ]
      },
      "group_by_subqueries": [
        {
          "dependent": true,
          "cacheable": false,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "881731.00"
            },
            "table": {
              "table_name": "t",
              "access_type": "ALL",
              "rows_examined_per_scan": 3526884,
              "rows_produced_per_join": 3526884,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "176354.20",
                "eval_cost": "705376.80",
                "prefix_cost": "881731.00",
                "data_read_per_join": "134M"
              },
              "used_columns": [
                "dept_no",
                "s",
                "c"
              ],
              "attached_condition": "((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`)))",
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "dependent": false,
                "cacheable": true,
                "query_block": {
                  "select_id": 3,
                  "cost_info": {
                    "query_cost": "1106758.94"
                  },
                  "grouping_operation": {
                    "using_filesort": false,
                    "nested_loop": [
                      {
                        "table": {
                          "table_name": "dept_emp",
                          "access_type": "index",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no",
                            "dept_no"
                          ],
                          "key": "dept_no",
                          "used_key_parts": [
                            "dept_no"
                          ],
                          "key_length": "4",
                          "rows_examined_per_scan": 331215,
                          "rows_produced_per_join": 331215,
                          "filtered": "100.00",
                          "using_index": true,
                          "cost_info": {
                            "read_cost": "673.00",
                            "eval_cost": "66243.00",
                            "prefix_cost": "66916.00",
                            "data_read_per_join": "5M"
                          },
                          "used_columns": [
                            "emp_no",
                            "dept_no"
                          ]
                        }
                      },
                      {
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ref",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no"
                          ],
                          "key": "PRIMARY",
                          "used_key_parts": [
                            "emp_no"
                          ],
                          "key_length": "4",
                          "ref": [
                            "employees.dept_emp.emp_no"
                          ],
                          "rows_examined_per_scan": 10,
                          "rows_produced_per_join": 3526884,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "334466.14",
                            "eval_cost": "705376.80",
                            "prefix_cost": "1106758.95",
                            "data_read_per_join": "53M"
                          },
                          "used_columns": [
                            "emp_no",
                            "salary",
                            "from_date"
                          ]
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.01 sec)
Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)` from `employees`.`salaries` group by <not>(<in_optimizer>(`employees`.`salaries`.`salary`,<exists>(/* select#2 */ select 1 from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s`,count(`employees`.`salaries`.`emp_no`) AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t` where ((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`))) having <is_not_null_test>((`t`.`s` / `t`.`c`)))))

Again, this output gives a clear view of query optimization: subquery in

GROUP BY

 itself cannot be optimized, cached or converted into temporary table, but the subquery inside the subquery (

select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no

) could be materialized into a temporary table and cached.

A regular

EXPLAIN

 command does not provide such details:

mysql> explain select count(emp_no) from salaries group by salary > ALL (select s/c as avg_salary from (select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) t)G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2838638
     filtered: 100.00
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: <derived3>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3526884
     filtered: 100.00
        Extra: Using where
*************************** 3. row ***************************
           id: 3
  select_type: DERIVED
        table: dept_emp
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no,dept_no
          key: dept_no
      key_len: 4
          ref: NULL
         rows: 331215
     filtered: 100.00
        Extra: Using index
*************************** 4. row ***************************
           id: 3
  select_type: DERIVED
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.dept_emp.emp_no
         rows: 10
     filtered: 100.00
        Extra: NULL
4 rows in set, 1 warning (0.01 sec)
Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)` from `employees`.`salaries` group by <not>(<in_optimizer>(`employees`.`salaries`.`salary`,<exists>(/* select#2 */ select 1 from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s`,count(`employees`.`salaries`.`emp_no`) AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t` where ((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`))) having <is_not_null_test>((`t`.`s` / `t`.`c`)))))

Most importantly, we cannot guess from the output if the 

DERIVED

 subquery can be cached.

Conlcusion:

EXPLAIN FORMAT=JSON

  provides details on how subqueries in

ORDER BY

 and

GROUP BY

 clauses are optimized.

Dec
28
2015
--

EXPLAIN FORMAT=JSON provides insights on optimizer_switch effectiveness

optimizer_switch

EXPLAIN FORMAT=JSONThe previous post in the EXPLAIN FORMAT=JSON is Cool! series showed an example of the query

select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null)

, where the subquery was materialized into a temporary table and then joined with the outer query. This is known as a semi-join optimization. But what happens if we turn off this optimization?

EXPLAIN FORMAT=JSON

 can help us with this investigation too.

First lets look at the original output again:

mysql> explain format=json select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "16.72"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "departments",
          <skipped>
      },
      {
        "table": {
          "table_name": "<subquery2>",
          "access_type": "eq_ref",
          "key": "<auto_key>",
          "key_length": "4",
          "ref": [
            "employees.departments.dept_no"
          ],
          "rows_examined_per_scan": 1,
          "materialized_from_subquery": {
            "using_temporary_table": true,
            "query_block": {
              "table": {
                "table_name": "dept_manager",
                "access_type": "ALL",
                "possible_keys": [
                  "dept_no"
                ],
                "rows_examined_per_scan": 24,
                "rows_produced_per_join": 21,
                "filtered": "90.00",
                "cost_info": {
                  "read_cost": "1.48",
                  "eval_cost": "4.32",
                  "prefix_cost": "5.80",
                  "data_read_per_join": "345"
                },
                "used_columns": [
                  "dept_no",
                  "to_date"
                ],
                "attached_condition": "(`employees`.`dept_manager`.`to_date` is not null)"
              }
            }
          }
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` semi join (`employees`.`dept_manager`) where ((`<subquery2>`.`dept_no` = `employees`.`departments`.`dept_no`) and (`employees`.`dept_manager`.`to_date` is not null))

To repeat what happened here: the subquery was materialized into a  temporary table, then  joined with the departments table. Semi-join optimization is ON by default (as would be most likely without intervention).

What happens if we temporarily turn semi-join optimization OFF?

mysql> set optimizer_switch="semijoin=off";
Query OK, 0 rows affected (0.00 sec)

And then execute

EXPLAIN

 one more time:

mysql> explain format=json select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null) G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.80"
    },
    "table": {
      "table_name": "departments",
      "access_type": "index",
      "key": "dept_name",
      "used_key_parts": [
        "dept_name"
      ],
      "key_length": "42",
      "rows_examined_per_scan": 9,
      "rows_produced_per_join": 9,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "1.80",
        "prefix_cost": "2.80",
        "data_read_per_join": "432"
      },
      "used_columns": [
        "dept_no",
        "dept_name"
      ],
      "attached_condition": "<in_optimizer>(`employees`.`departments`.`dept_no`,`employees`.`departments`.`dept_no` in ( <materialize> (/* select#2 */ select `employees`.`dept_manager`.`dept_no` from `employees`.`dept_manager` where (`employees`.`dept_manager`.`to_date` is not null) ), <primary_index_lookup>(`employees`.`departments`.`dept_no` in <temporary table> on <auto_key> where ((`employees`.`departments`.`dept_no` = `materialized-subquery`.`dept_no`)))))",
      "attached_subqueries": [
        {
          "table": {
            "table_name": "<materialized_subquery>",
            "access_type": "eq_ref",
            "key": "<auto_key>",
            "key_length": "4",
            "rows_examined_per_scan": 1,
            "materialized_from_subquery": {
              "using_temporary_table": true,
              "dependent": true,
              "cacheable": false,
              "query_block": {
                "select_id": 2,
                "cost_info": {
                  "query_cost": "5.80"
                },
                "table": {
                  "table_name": "dept_manager",
                  "access_type": "ALL",
                  "possible_keys": [
                    "dept_no"
                  ],
                  "rows_examined_per_scan": 24,
                  "rows_produced_per_join": 21,
                  "filtered": "90.00",
                  "cost_info": {
                    "read_cost": "1.48",
                    "eval_cost": "4.32",
                    "prefix_cost": "5.80",
                    "data_read_per_join": "345"
                  },
                  "used_columns": [
                    "dept_no",
                    "to_date"
                  ],
                  "attached_condition": "(`employees`.`dept_manager`.`to_date` is not null)"
                }
              }
            }
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` where <in_optimizer>(`employees`.`departments`.`dept_no`,`employees`.`departments`.`dept_no` in ( <materialize> (/* select#2 */ select `employees`.`dept_manager`.`dept_no` from `employees`.`dept_manager` where (`employees`.`dept_manager`.`to_date` is not null) ), <primary_index_lookup>(`employees`.`departments`.`dept_no` in <temporary table> on <auto_key> where ((`employees`.`departments`.`dept_no` = `materialized-subquery`.`dept_no`)))))

Now the picture is completely different. There is no

nested_loop

 member, and instead there is an 

attached_subqueries

 array containing a single member: the temporary table materialized from the subquery

select dept_no from dept_manager where to_date is not null

 (including all the details of this materialization).

Conclusion: We can experiment with the value of

optimizer_switch

 and use

EXPLAIN FORMAT=JSON

 to examine how a particular optimization affects our queries.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com