May
20
2016
--

Introduction to Troubleshooting Performance – Troubleshooting Slow Queries webinar: Q & A

Troubleshooting Slow Queries

Troubleshooting Slow QueriesIn this blog, I will provide answers to the Q & A for the Troubleshooting Slow Queries webinar.

First, I want to thank you for attending the April 28 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: I’ve heard that is a bad idea to use

select *

; what do you recommend?

A: When I used

SELECT *

 in my slides, I wanted to underline the idea that sometimes you need to select all columns from the table. There is nothing bad about it if you need them.

SELECT *

 is bad when you need only a few columns from the table. In this case, you retrieve more data than needed, which affects performance. Another issue that  

SELECT *

 can cause is if you hard-code the statement into your application, then change table definition; the application could start retrieving columns in wrong order and output (e.g., email instead of billing address). Or even worse, it will try to access a non-existent index in the result set array. The best practice is to explicitly enumerate all columns that your application needs.

Q: I heard that using 

index_field

 length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as indexing, what happens?

A: I assume you are asking about the ability to create an index with lengths smaller than the column length? They work as follows:

Assume you have a 

TEXT

  field which contains these user questions:

  1. I’ve heard that is a bad idea to use select * what do you recommend?
  2. I heard that using 
    index_field

     length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as indexing, what happens?

  3. ….

Since this is a 

TEXT

  field you cannot create and index on it without specifying its length, so you need to make the index as minimal as possible to uniquely identify questions. If you create an index with length 10 it will contain:

  1. I’ve heard
  2. I heard th

You will index only those parts of questions that are not very distinct from each other, and do not contain useful information about what the question is. You can create index of length 255:

  1. I’ve heard that is a bad idea to use select * what do you recommend?
  2. I heard that using index_field length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as index

In this case, the index includes the whole first question and almost all the second question. This makes the index too large and requires us to use more disk space (which causes more IO). Also, information from the second question is probably too much.

If make index of length 75, we will have:

  1. I’ve heard that is a bad idea to use select * what do you recommend?
  2. I heard that using index_field length will affect the indexing principle du

This is more than enough for the first question and gives a good idea of what is in the second question. It also potentially will have enough unique entries to make its cardinality look more like the cardinality of real data distribution.

To conclude: choosing the correct index length is something that requires practice and analysis of your actual data. Try to make them as short as possible, but long enough so that the number of unique entries in the index will be similar to a number of unique entries in the table.

Q: Which view can we query to see stats?

A: Do you mean index statistics?

SHOW INDEX FROM table_name

 will do it.

Q: We have an InnoDB table with 47 fields (mostly text); some are ft-indexed. I tried to do an alter table, and it ran for 24 hours. What is the best way to run an alter table to add one extra field? The table has 1.9 M rows and 47 columns with many indexes.

A: Adding a column requires a table copy. Therefore, the speed of this operation depends on the table size and speed of your disk. If you are using version 5.6 and later, adding a column would not block parallel queries (and therefore is not a big deal). If you are using an older version, you can always use the pt-online-schema-change utility from Percona Toolkit. However, it will run even more slowly than the regular

ALTER TABLE

. Unfortunately, you cannot speed up the execution of

ALTER TABLE

 much. The only thing that you can do is to use a faster disk (with options, tuned to explore speed of the disk).

However, if you do not want to have this increased IO affect the production server, you can alter the table on the separate instance, then copy tablespace to production and then apply all changes to the original table from the binary logs. The steps will be something like:

  1. Ensure you use option
    innodb_file_per_table

      and the big table has individual tablespace

  2. Ensure that binary log is enabled
  3. Start a new server (you can also use an existent stand-by slave).
  4. Disable writes to the table
  5. Record the binary log position
  6. Copy the tablespace to the new server as described here.
  7. Enable writes on the production server
  8. Run
    ALTER TABLE

     on the new server you created in step 2 (it will still take 24 hours)

  9. Stop writes to the table on the production server
  10. Copy the tablespace, altered in step 7
  11. Apply all writes to this table, which are in the binary logs after position, recorded in step 4.
  12. Enable writes to the table

This scenario will take even more time overall, but will have minimal impact on the production server

Q: If there is a compound index like index1(emp_id,date), will the following query be able to use index? “select * from table1 where emp_id = 10”

A: Yes. At least it should.

Q: Are 

filesort

 and

temporary

 in extended info for explain not good?

A: Regarding

filesort

: it depends. For example, you will always have the word

filesort

” for tables which perform 

ORDER BY

 and cannot use an index for

ORDER BY

. This is not always bad. For example, in this query:

mysql> explain select emp_no, first_name from employees where emp_no <20000 order by first_nameG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 18722
     filtered: 100.00
        Extra: Using where; Using filesort
1 row in set, 1 warning (0,01 sec)

the primary key used to resolve rows and

filesort

 were necessary and not avoidable. You can read about different

filesort

 algorithms here.

Regarding

Using temporary

: this means what during query execution temporary table will be created. This is can be not good, especially if the temporary table is large and cannot fit into memory. In this case, it would be written to disk and slow down operations. But, again, sometimes creating temporary tables in not avoidable, for example, if you have both

GROUP BY

 and

ORDER BY

 clauses which list columns differently as stated in the user manual.

Q: Is

key_len

 length more of a good thing for query execution?

A:

key_len

 field is not

NULL

 for all queries that use and index, and just shows the length of the key part used. It is not good or bad, it is just for information. You can use this information, for example, to identify which part of combined index is used to resolve the query.

Q: Does an alter query go for an optimizer check?

A: No. You can check it either by enabling optimizer trace, running

ALTER

 and find what trace is empty. Or by enabling the debug option and searching the resulting trace for

optimize

.

Q: A query involves four columns that are all individually covered by an index. The optimizer didn’t merge indexes because of cost, and even didn’t choose the composite index I created.

A: This depends on the table definition and query you used. I cannot provide a more detailed answer based only on this information.

Q cont.: Finally, only certain composite indexes were suitable, the column order in the complex index mattered a lot. Why couldn’t the optimizer merge the four individual single column indexes, and why did the order of the columns in the composite index matter?

A: Regarding why the optimizer could not merge four indexes, I need to see how the table is defined and which data is in these indexed columns. Regarding why the order of the columns in the composite index matters, it depends on the query. Why the optimizer can use an index, say, on

(col1, col2)

 where the conditions

col1=X AND col2=Y

 and

col2=Y AND col2=X

 for the case when you use

OR

, the order is important. For example, for the condition

col1=X OR col2=Y

, where the part

col1=X

 is always executed and the part

col2=Y

  is executed only when

col1=X

 is false. The same logic applies to queries like

SELECT col1 WHERE col2=Y ORDER BY col3

. See the user manual for details.

Q: When I try to obtain the optimizer trace on the console, the result is cut off. Even if I redirect the output to a file, how to overcome that?

A: Which version of MySQL Server do you use? The 

TRACE

 column is defined as

longtext NOT NULL

, and should not cause such issues. If it does with a newer version, report a bug at http://bugs.mysql.com/.

Q: Are there any free graphical visualizers for either EXPLAIN or the optimizer TRACE output?

A: There is graphical visualizer for

EXPLAIN

 in MySQL Workbench. But it works with online data only: you cannot run it on

EXPLAIN

 output, saved into a file. I don’t know about any visualizer for the optimizer

TRACE

 output. However, since it is

JSON

 you can simply save it to file and open in web browser. It will allow a better view than if opened in simple text editor.

Q: When do you use force index instead of

use index

 hints?

A: According to user manual “

USE INDEX (index_list)

 hint tells MySQL to use only one of the named indexes to find rows in the table” and “

FORCE INDEX

  hint acts like

USE INDEX (index_list)

, with the addition that a table scan is assumed to be very expensive . . . a table scan is used only if there is no way to use one of the named indexes to find rows in the table.” This means that when you use

USE INDEX

, you are giving a hint for the optimizer to prefer a particular index to others, but not enforcing index usage if the optimizer prefers a table scan, while

FORCE INDEX

 requires using the index. I myself use only

FORCE

 and

IGNORE

  index hints.

Q: Very informative session. I missed the beginning part. Are you going to distribute the recoded session later?

A: Yes. As usual slides and recording available here.

Apr
06
2016
--

EXPLAIN FORMAT=JSON wrap-up

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSON wrap-upThis blog is an EXPLAIN FORMAT=JSON wrap-up for the series of posts I’ve done in the last few months.

In this series, we’ve discussed everything unique to

EXPLAIN FORMAT=JSON

. I intentionally skipped a description of members such as

table_name

,

access_type

  or

select_id

, which are not unique.

In this series, I only mentioned in passing members that replace information from the

Extra

 column in the regular

EXPLAIN

 output, such as

using_join_buffer

 ,

partitions

,

using_temporary_table

  or simply

message

. You can see these in queries like the following:

mysql> explain format=json select rand() from dual
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "message": "No tables used"
  }
}
1 row in set, 1 warning (0.00 sec)

Or

mysql> explain format=json select emp_no from titles where 'Senior Engineer' = 'Senior Cat'
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "message": "Impossible WHERE"
  }
}
1 row in set, 1 warning (0.01 sec)

Their use is fairly intuitive, similar to regular

EXPLAIN

, and I don’t think one can achieve anything from reading a blog post about each of them.

The only thing left to list is a Table of Contents for the series:

attached_condition: How EXPLAIN FORMAT=JSON can spell-check your queries

rows_examined_per_scan, rows_produced_per_join: EXPLAIN FORMAT=JSON answers on question “What number of filtered rows mean?”

used_columns: EXPLAIN FORMAT=JSON tells when you should use covered indexes

used_key_parts: EXPLAIN FORMAT=JSON provides insight into which part of multiple-column key is used

EXPLAIN FORMAT=JSON: everything about attached_subqueries, optimized_away_subqueries, materialized_from_subquery

EXPLAIN FORMAT=JSON provides insights on optimizer_switch effectiveness

EXPLAIN FORMAT=JSON: order_by_subqueries, group_by_subqueries details on subqueries in ORDER BY and GROUP BY

grouping_operation, duplicates_removal: EXPLAIN FORMAT=JSON has all details about GROUP BY

EXPLAIN FORMAT=JSON has details for subqueries in HAVING, nested selects and subqueries that update values

ordering_operation: EXPLAIN FORMAT=JSON knows everything about ORDER BY processing

EXPLAIN FORMAT=JSON knows everything about UNIONs: union_result and query_specifications

EXPLAIN FORMAT=JSON: buffer_result is not hidden!

EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

EXPLAIN FORMAT=JSON: nested_loop makes JOIN hierarchy transparent

Thanks for following the series!

Feb
22
2016
--

EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONTime for another entry in the EXPLAIN FORMAT=JSON is cool! series of blog posts. This time we’ll discuss how using EXPLAIN FORMAT=JSON allows you to see that

cost_info

  knows why the optimizer prefers one index to another.

Tables often have more than one index. Any of these indexes can be used to resolve query. The optimizer has to make a choice in this case. One of the metrics that can be used to help make the choice is the potential cost of the query evaluation.

For example, let’s take the table

titles

  from the standard employees database:

mysql> show create table titlesG
*************************** 1. row ***************************
       Table: titles
Create Table: CREATE TABLE `titles` (
  `emp_no` int(11) NOT NULL,
  `title` varchar(50) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date DEFAULT NULL,
  PRIMARY KEY (`emp_no`,`title`,`from_date`),
  KEY `emp_no` (`emp_no`),
  CONSTRAINT `titles_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

It has two indexes:

emp_no

  and

PRIMARY

, each of  which could be used to resolve query:

select distinct title from titles where year(from_date) > '1990';

At first glance, 

emp_no

  doesn’t really fit for this query.

PRIMARY

  does fit, because it contains both the 

title

  and

from_date

  fields. Unfortunately, it cannot be used to resolve the query, because we don’t limit the search by

emp_no

  and

title

 .  It can, however, be used to select rows from the index. When we use 

EXPLAIN

 , though, it shows us that the optimizer has chosen index

emp_no

  (every secondary index in InnoDB contains a link to the clustered index anyway):

mysql> explain select distinct title from titles where year(from_date) > '1990'G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: titles
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: NULL
         rows: 442724
     filtered: 100.00
        Extra: Using where; Using index; Using temporary
1 row in set, 1 warning (0.00 sec)

PRIMARY KEY

  exists in the field

possible_keys

, but was not chosen.

EXPLAIN FORMAT=JSON

  can show us why.

First let’s run it on the original query:

mysql> explain format=json select distinct title from titles where year(from_date) > '1990'G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "89796.80"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "using_filesort": false,
      "table": {
        "table_name": "titles",
        "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "emp_no"
        ],
        "key": "emp_no",
        "used_key_parts": [
          "emp_no"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 442724,
        "rows_produced_per_join": 442724,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "1252.00",
          "eval_cost": "88544.80",
          "prefix_cost": "89796.80",
          "data_read_per_join": "27M"
        },
        "used_columns": [
          "emp_no",
          "title",
          "from_date"
        ],
        "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"
      }
    }
  }
}
1 row in set, 1 warning (0.01 sec)

The important part here is:

"cost_info": {
      "query_cost": "89796.80"
    },

Which shows that the overall

query_cost

  is 89796.80. We don’t really know what the units are for this cost, or how it is actually measured. It isn’t important; the only thing that is important for now is that smaller is better. (Think of it like shopping for a product: it doesn’t matter which you buy it from, just that you buy it at the lowest price.)

Another important member of the index is

cost_info

, which belongs to the table itself:

"cost_info": {
          "read_cost": "1252.00",
          "eval_cost": "88544.80",
          "prefix_cost": "89796.80",
          "data_read_per_join": "27M"
        },

Here we get even more details, such as cost of read operation and evaluation.

prefix_cost

  is not useful for this example, because it contains the cost of joining to the next table in

JOIN

. Since we don’t join the table

titles

  with any other value of

prefix_cost

, is equivalent to the cost of the full query.

data_read_per_join

  contains the amount of data that should be read for each

JOIN

  operation. In our case it is once again the same as how much data we should read to fully evaluate the query.

Now let’s force index

PRIMARY

  and examine the 

EXPLAIN FORMAT=JSON

  output:

mysql> explain format=json select distinct title from titles force index(primary) where year(from_date) > '1990'G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "531269.80"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "using_filesort": false,
      "table": {
        "table_name": "titles",
        "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "emp_no"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "emp_no",
          "title",
          "from_date"
        ],
        "key_length": "59",
        "rows_examined_per_scan": 442724,
        "rows_produced_per_join": 442724,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "442725.00",
          "eval_cost": "88544.80",
          "prefix_cost": "531269.80",
          "data_read_per_join": "27M"
        },
        "used_columns": [
          "emp_no",
          "title",
          "from_date"
        ],
        "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"
      }
    }
  }
}
1 row in set, 1 warning (0.01 sec)

Notice the numbers are different this time. The total query cost is 531269.80, which is about 6 times greater than 89796.80:

"cost_info": {
      "query_cost": "531269.80"
    },

read_cost

  is 442725.00, which is 353 times greater than 1252.00. However, the 

eval_cost

  and

data_read_per_join

  are the same as the query that uses index

emp_no

 :

"cost_info": {
          "read_cost": "442725.00",
          "eval_cost": "88544.80",
          "prefix_cost": "531269.80",
          "data_read_per_join": "27M"
        },

These numbers clearly explain why the optimizer prefers the index 

emp_no

  to

PRIMARY KEY

.

In our example above this behavior is correct. In a real life scenario, if the optimizer’s choice is wrong. these numbers can show either that there is a bug in the optimizer or  that the table’s statistics are outdated and need to be updated.

Conclusion:

EXPLAIN FORMAT=JSON

  can be used together with

FORCE INDEX

  to find out why the optimizer prefers one index to another.

Jan
25
2016
--

EXPLAIN FORMAT=JSON has details for subqueries in HAVING, nested selects and subqueries that update values

EXPLAIN FORMAT=JSONOver several previous blog posts, we’ve already discussed what information the 

EXPLAIN FORMAT=JSON

 output provides for some subqueries. You can review those discussions here, here and here. EXPLAIN FORMAT=JSON shows many details that you can’t get with other commands. Let’s now finish this topic and discuss the output for the rest of the subquery types.

First, let’s look at the subquery in the 

HAVING

 clause, such as in the following example:

select count(emp_no), salary
from salaries
group by salary
having salary > ALL (select avg(s)
                     from (select dept_no, sum(salary) as s
                           from salaries join dept_emp using (emp_no) group by dept_no) t
                     )

This example prints the number of employees and their salaries, if their salary is greater than the average salary in their department.

EXPLAIN FORMAT=JSON

 provides a lot details on how this subquery is optimized:

mysql> explain format=json select count(emp_no), salary from salaries group by salary having salary > ALL (select avg(s) from (select dept_no, sum(salary) as s from salaries join dept_emp using (emp_no) group by dept_no) t)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3073970.40"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2557022.00"
      },
      "table": {
        "table_name": "salaries",
        "access_type": "ALL",
        "rows_examined_per_scan": 2557022,
        "rows_produced_per_join": 2557022,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "5544.00",
          "eval_cost": "511404.40",
          "prefix_cost": "516948.40",
          "data_read_per_join": "39M"
        },
        "used_columns": [
          "emp_no",
          "salary",
          "from_date"
        ]
      },
      "having_subqueries": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "771970.25"
            },
            "table": {
              "table_name": "t",
              "access_type": "ALL",
              "rows_examined_per_scan": 3087841,
              "rows_produced_per_join": 3087841,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "154402.05",
                "eval_cost": "617568.20",
                "prefix_cost": "771970.25",
                "data_read_per_join": "94M"
              },
              "used_columns": [
                "dept_no",
                "s"
              ],
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "dependent": false,
                "cacheable": true,
                "query_block": {
                  "select_id": 3,
                  "cost_info": {
                    "query_cost": "1019140.27"
                  },
                  "grouping_operation": {
                    "using_filesort": false,
                    "nested_loop": [
                      {
                        "table": {
                          "table_name": "dept_emp",
                          "access_type": "index",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no",
                            "dept_no"
                          ],
                          "key": "dept_no",
                          "used_key_parts": [
                            "dept_no"
                          ],
                          "key_length": "4",
                          "rows_examined_per_scan": 331570,
                          "rows_produced_per_join": 331570,
                          "filtered": "100.00",
                          "using_index": true,
                          "cost_info": {
                            "read_cost": "737.00",
                            "eval_cost": "66314.00",
                            "prefix_cost": "67051.00",
                            "data_read_per_join": "5M"
                          },
                          "used_columns": [
                            "emp_no",
                            "dept_no"
                          ]
                        }
                      },
                      {
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ref",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no"
                          ],
                          "key": "PRIMARY",
                          "used_key_parts": [
                            "emp_no"
                          ],
                          "key_length": "4",
                          "ref": [
                            "employees.dept_emp.emp_no"
                          ],
                          "rows_examined_per_scan": 9,
                          "rows_produced_per_join": 3087841,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "334520.92",
                            "eval_cost": "617568.35",
                            "prefix_cost": "1019140.27",
                            "data_read_per_join": "47M"
                          },
                          "used_columns": [
                            "emp_no",
                            "salary",
                            "from_date"
                          ]
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)`,`employees`.`salaries`.`salary` AS `salary` from `employees`.`salaries` group by `employees`.`salaries`.`salary` having <not>((`employees`.`salaries`.`salary` <= <max>(/* select#2 */ select avg(`t`.`s`) from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t`)))

We see that the subquery in the 

HAVING

 clause is not dependent, but cacheable:

"having_subqueries": [
        {
          "dependent": false,
          "cacheable": true,

It has its own query block:

"query_block": {
            "select_id": 2,

Which accesses table “t”:

"table": {
              "table_name": "t",
              "access_type": "ALL",
              "rows_examined_per_scan": 3087841,
              "rows_produced_per_join": 3087841,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "154402.05",
                "eval_cost": "617568.20",
                "prefix_cost": "771970.25",
                "data_read_per_join": "94M"
              },
              "used_columns": [
                "dept_no",
                "s"
              ],

Table “t” was also materialized from the subquery:

],
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "dependent": false,
                "cacheable": true,
                "query_block": {
                  "select_id": 3,

Another kind of subquery is in the 

SELECT

 list. If we want to compare the salary of an employee with the average salary in the company, for example, we can use the query

select emp_no, salary, (select avg(salary) from salaries) from salaries

. Lets examine the 

EXPLAIN

 output:

mysql> explain format=json select emp_no, salary, (select avg(salary) from salaries) from salariesG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "516948.40"
    },
    "table": {
      "table_name": "salaries",
      "access_type": "ALL",
      "rows_examined_per_scan": 2557022,
      "rows_produced_per_join": 2557022,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "5544.00",
        "eval_cost": "511404.40",
        "prefix_cost": "516948.40",
        "data_read_per_join": "39M"
      },
      "used_columns": [
        "emp_no",
        "salary"
      ]
    },
    "select_list_subqueries": [
      {
        "dependent": false,
        "cacheable": true,
        "query_block": {
          "select_id": 2,
          "cost_info": {
            "query_cost": "516948.40"
          },
          "table": {
            "table_name": "salaries",
            "access_type": "ALL",
            "rows_examined_per_scan": 2557022,
            "rows_produced_per_join": 2557022,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "5544.00",
              "eval_cost": "511404.40",
              "prefix_cost": "516948.40",
              "data_read_per_join": "39M"
            },
            "used_columns": [
              "salary"
            ]
          }
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,(/* select#2 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`) AS `(select avg(salary) from salaries)` from `employees`.`salaries`

EXPLAIN FORMAT=JSON

 in this case shows that the subquery is part of the first

query_block

, not dependent and cacheable.

The last type of subquery I want to discuss is the subquery updating values. For example, I added a new column to the

titles

 table from the standard employees database:

mysql> alter table titles add column full_title varchar(100);
Query OK, 0 rows affected (24.42 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now I want

full_title

 to contain both the department’s name and title, separated by a space. I can use 

UPDATE

 with the subquery to achieve this:

update titles
set full_title=concat((select dept_name
                       from departments
                       join dept_emp using(dept_no)
                       where dept_emp.emp_no=titles.emp_no and dept_emp.to_date='9999-01-01')
               ,' ', title)
where to_date = '9999-01-01';

To find out how it is optimized, we can use

EXPLAIN FORMAT=JSON

:

mysql> explain format=json update titles set full_title=concat((select dept_name from departments join dept_emp using(dept_no) where dept_emp.emp_no=titles.emp_no and dept_emp.to_date='9999-01-01') ,' ', title) where to_date = '9999-01-01'G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "update": true,
      "table_name": "titles",
      "access_type": "index",
      "key": "PRIMARY",
      "used_key_parts": [
        "emp_no",
        "title",
        "from_date"
      ],
      "key_length": "59",
      "rows_examined_per_scan": 442843,
      "filtered": "100.00",
      "using_temporary_table": "for update",
      "attached_condition": "(`employees`.`titles`.`to_date` = '9999-01-01')"
    },
    "update_value_subqueries": [
      {
        "dependent": true,
        "cacheable": false,
        "query_block": {
          "select_id": 2,
          "cost_info": {
            "query_cost": "1.35"
          },
          "nested_loop": [
            {
              "table": {
                "table_name": "dept_emp",
                "access_type": "ref",
                "possible_keys": [
                  "PRIMARY",
                  "emp_no",
                  "dept_no"
                ],
                "key": "PRIMARY",
                "used_key_parts": [
                  "emp_no"
                ],
                "key_length": "4",
                "ref": [
                  "employees.titles.emp_no"
                ],
                "rows_examined_per_scan": 1,
                "rows_produced_per_join": 0,
                "filtered": "10.00",
                "cost_info": {
                  "read_cost": "1.00",
                  "eval_cost": "0.02",
                  "prefix_cost": "1.22",
                  "data_read_per_join": "1"
                },
                "used_columns": [
                  "emp_no",
                  "dept_no",
                  "to_date"
                ],
                "attached_condition": "(`employees`.`dept_emp`.`to_date` = '9999-01-01')"
              }
            },
            {
              "table": {
                "table_name": "departments",
                "access_type": "eq_ref",
                "possible_keys": [
                  "PRIMARY"
                ],
                "key": "PRIMARY",
                "used_key_parts": [
                  "dept_no"
                ],
                "key_length": "4",
                "ref": [
                  "employees.dept_emp.dept_no"
                ],
                "rows_examined_per_scan": 1,
                "rows_produced_per_join": 0,
                "filtered": "100.00",
                "cost_info": {
                  "read_cost": "0.11",
                  "eval_cost": "0.02",
                  "prefix_cost": "1.35",
                  "data_read_per_join": "5"
                },
                "used_columns": [
                  "dept_no",
                  "dept_name"
                ]
              }
            }
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1276): Field or reference 'employees.titles.emp_no' of SELECT #2 was resolved in SELECT #1

We can see in this output that the subquery is dependent, not cacheable, and will be executed for each row that needs to be updated.

Conclusion:

EXPLAIN FORMAT=JSON

  provides various information about all kind of subqueries.

Dec
29
2015
--

EXPLAIN FORMAT=JSON: order_by_subqueries, group_by_subqueries details on subqueries in ORDER BY and GROUP BY

EXPLAIN FORMAT=JSON

EXPLAIN FORMATAnother post in the EXPLAIN FORMAT=JSON is Cool! series! In this post, we’ll discuss how the EXPLAIN FORMAT=JSON provides optimization details for 

ORDER BY

 and  

GROUP BY

 operations in conjunction with 

order_by_subqueries

 and  

group_by_subqueries

EXPLAIN FORMAT=JSON

 can print details on how a subquery in

ORDER BY

 is optimized:

mysql> explain format=json select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select emp_no limit 1)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "60833.60"
    },
    "ordering_operation": {
      "using_filesort": true,
      "table": {
        "table_name": "employees",
        "access_type": "ALL",
        "rows_examined_per_scan": 299843,
        "rows_produced_per_join": 299843,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "865.00",
          "eval_cost": "59968.60",
          "prefix_cost": "60833.60",
          "data_read_per_join": "13M"
        },
        "used_columns": [
          "emp_no",
          "first_name",
          "last_name"
        ]
      },
      "order_by_subqueries": [
        {
          "dependent": true,
          "cacheable": false,
          "query_block": {
            "select_id": 2,
            "message": "No tables used"
          }
        }
      ]
    }
  }
}
1 row in set, 2 warnings (0.00 sec)
Note (Code 1276): Field or reference 'employees.employees.emp_no' of SELECT #2 was resolved in SELECT #1
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`employees`.`emp_no` limit 1)

The above code shows member

ordering_operation

 of

query_block

  (which includes the 

order_by_subqueries

 array) with information on how the subquery in

ORDER BY

  was optimized.

This is a simple example. In real life you can have larger subqueries in the 

ORDER BY

  clause. For example, take this more complicated and slightly crazy query:

select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)

Run a regular

EXPLAIN

 on it. If we imagine this is a regular subquery, we won’t know if it can be cached or would be executed for each row sorted.

mysql> explain  select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299843
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: dept_emp
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no,dept_no
          key: dept_no
      key_len: 4
          ref: NULL
         rows: 331215
     filtered: 100.00
        Extra: Using index
*************************** 3. row ***************************
           id: 2
  select_type: SUBQUERY
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: employees.dept_emp.emp_no
         rows: 10
     filtered: 100.00
        Extra: Using index
3 rows in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`dept_emp`.`dept_no` AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`)

EXPLAIN FORMAT=JSON

  provides a completely different picture:

mysql> explain format=json select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "60833.60"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "employees",
        "access_type": "ALL",
        "rows_examined_per_scan": 299843,
        "rows_produced_per_join": 299843,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "865.00",
          "eval_cost": "59968.60",
          "prefix_cost": "60833.60",
          "data_read_per_join": "13M"
        },
        "used_columns": [
          "emp_no",
          "first_name",
          "last_name"
        ]
      },
      "optimized_away_subqueries": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "1082124.21"
            },
            "grouping_operation": {
              "using_filesort": false,
              "nested_loop": [
                {
                  "table": {
                    "table_name": "dept_emp",
                    "access_type": "index",
                    "possible_keys": [
                      "PRIMARY",
                      "emp_no",
                      "dept_no"
                    ],
                    "key": "dept_no",
                    "used_key_parts": [
                      "dept_no"
                    ],
                    "key_length": "4",
                    "rows_examined_per_scan": 331215,
                    "rows_produced_per_join": 331215,
                    "filtered": "100.00",
                    "using_index": true,
                    "cost_info": {
                      "read_cost": "673.00",
                      "eval_cost": "66243.00",
                      "prefix_cost": "66916.00",
                      "data_read_per_join": "5M"
                    },
                    "used_columns": [
                      "emp_no",
                      "dept_no"
                    ]
                  }
                },
                {
                  "table": {
                    "table_name": "salaries",
                    "access_type": "ref",
                    "possible_keys": [
                      "PRIMARY",
                      "emp_no"
                    ],
                    "key": "emp_no",
                    "used_key_parts": [
                      "emp_no"
                    ],
                    "key_length": "4",
                    "ref": [
                      "employees.dept_emp.emp_no"
                    ],
                    "rows_examined_per_scan": 10,
                    "rows_produced_per_join": 3399374,
                    "filtered": "100.00",
                    "using_index": true,
                    "cost_info": {
                      "read_cost": "335333.33",
                      "eval_cost": "679874.87",
                      "prefix_cost": "1082124.21",
                      "data_read_per_join": "51M"
                    },
                    "used_columns": [
                      "emp_no",
                      "from_date"
                    ]
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`dept_emp`.`dept_no` AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`)

We see that the subquery was optimized away: member

optimized_away_subqueries

 exists, but there is no

order_by_subqueries

 in the

ordering_operation

 object. We can also see that the subquery was cached:

"cacheable": true

.

EXPLAIN FORMAT=JSON

 also provides information about subqueries in the 

GROUP BY

 clause. It uses the 

group_by_subqueries

 array in the 

grouping_operation

  member for this purpose.

mysql> explain format=json select count(emp_no) from salaries group by salary > ALL (select s/c as avg_salary from (select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) t)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3412037.60"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2838638.00"
      },
      "table": {
        "table_name": "salaries",
        "access_type": "ALL",
        "rows_examined_per_scan": 2838638,
        "rows_produced_per_join": 2838638,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "5672.00",
          "eval_cost": "567727.60",
          "prefix_cost": "573399.60",
          "data_read_per_join": "43M"
        },
        "used_columns": [
          "emp_no",
          "salary",
          "from_date"
        ]
      },
      "group_by_subqueries": [
        {
          "dependent": true,
          "cacheable": false,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "881731.00"
            },
            "table": {
              "table_name": "t",
              "access_type": "ALL",
              "rows_examined_per_scan": 3526884,
              "rows_produced_per_join": 3526884,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "176354.20",
                "eval_cost": "705376.80",
                "prefix_cost": "881731.00",
                "data_read_per_join": "134M"
              },
              "used_columns": [
                "dept_no",
                "s",
                "c"
              ],
              "attached_condition": "((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`)))",
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "dependent": false,
                "cacheable": true,
                "query_block": {
                  "select_id": 3,
                  "cost_info": {
                    "query_cost": "1106758.94"
                  },
                  "grouping_operation": {
                    "using_filesort": false,
                    "nested_loop": [
                      {
                        "table": {
                          "table_name": "dept_emp",
                          "access_type": "index",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no",
                            "dept_no"
                          ],
                          "key": "dept_no",
                          "used_key_parts": [
                            "dept_no"
                          ],
                          "key_length": "4",
                          "rows_examined_per_scan": 331215,
                          "rows_produced_per_join": 331215,
                          "filtered": "100.00",
                          "using_index": true,
                          "cost_info": {
                            "read_cost": "673.00",
                            "eval_cost": "66243.00",
                            "prefix_cost": "66916.00",
                            "data_read_per_join": "5M"
                          },
                          "used_columns": [
                            "emp_no",
                            "dept_no"
                          ]
                        }
                      },
                      {
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ref",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no"
                          ],
                          "key": "PRIMARY",
                          "used_key_parts": [
                            "emp_no"
                          ],
                          "key_length": "4",
                          "ref": [
                            "employees.dept_emp.emp_no"
                          ],
                          "rows_examined_per_scan": 10,
                          "rows_produced_per_join": 3526884,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "334466.14",
                            "eval_cost": "705376.80",
                            "prefix_cost": "1106758.95",
                            "data_read_per_join": "53M"
                          },
                          "used_columns": [
                            "emp_no",
                            "salary",
                            "from_date"
                          ]
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.01 sec)
Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)` from `employees`.`salaries` group by <not>(<in_optimizer>(`employees`.`salaries`.`salary`,<exists>(/* select#2 */ select 1 from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s`,count(`employees`.`salaries`.`emp_no`) AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t` where ((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`))) having <is_not_null_test>((`t`.`s` / `t`.`c`)))))

Again, this output gives a clear view of query optimization: subquery in

GROUP BY

 itself cannot be optimized, cached or converted into temporary table, but the subquery inside the subquery (

select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no

) could be materialized into a temporary table and cached.

A regular

EXPLAIN

 command does not provide such details:

mysql> explain select count(emp_no) from salaries group by salary > ALL (select s/c as avg_salary from (select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) t)G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2838638
     filtered: 100.00
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: <derived3>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3526884
     filtered: 100.00
        Extra: Using where
*************************** 3. row ***************************
           id: 3
  select_type: DERIVED
        table: dept_emp
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no,dept_no
          key: dept_no
      key_len: 4
          ref: NULL
         rows: 331215
     filtered: 100.00
        Extra: Using index
*************************** 4. row ***************************
           id: 3
  select_type: DERIVED
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.dept_emp.emp_no
         rows: 10
     filtered: 100.00
        Extra: NULL
4 rows in set, 1 warning (0.01 sec)
Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)` from `employees`.`salaries` group by <not>(<in_optimizer>(`employees`.`salaries`.`salary`,<exists>(/* select#2 */ select 1 from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s`,count(`employees`.`salaries`.`emp_no`) AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t` where ((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`))) having <is_not_null_test>((`t`.`s` / `t`.`c`)))))

Most importantly, we cannot guess from the output if the 

DERIVED

 subquery can be cached.

Conlcusion:

EXPLAIN FORMAT=JSON

  provides details on how subqueries in

ORDER BY

 and

GROUP BY

 clauses are optimized.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com