Jun
20
2018
--

Webinar Thu 6/21: How to Analyze and Tune MySQL Queries for Better Performance

database query tuning

database query tuningPlease join Percona’s MySQL Database Administrator, Brad Mickel as he presents How to Analyze and Tune MySQL Queries for Better Performance on Thursday, June 21st, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

 

Query performance is essential in making any application successful. In order to finely tune your queries you first need to understand how MySQL executes them, and what tools are available to help identify problems.

In this session you will learn:

  1. The common tools for researching problem queries
  2. What an Index is, and why you should use one
  3. Index limitations
  4. When to rewrite the query instead of just adding a new index
Register Now

 

Brad Mickel

MySQL DBA

Bradley began working with MySQL in 2013 as part of his duties in healthcare billing. After 3 years in healthcare billing he joined Percona as part of the bootcamp process. After the bootcamp he has served as a remote database administrator on the Atlas team for Percona Managed Services.

The post Webinar Thu 6/21: How to Analyze and Tune MySQL Queries for Better Performance appeared first on Percona Database Performance Blog.

Oct
18
2017
--

Webinar Thursday, October 19, 2017: What You Need to Get the Most Out of Indexes – Part 2

Indexes

IndexesJoin Percona’s Senior Architect, Matthew Boehm, as he presents What You Need to Get the Most Out of Indexes – Part 2 webinar on Thursday, October 19, 2017, at 11:00 am PDT / 2:00 pm EDT (UTC-7).

Proper indexing is key to database performance. Finely tune your query writing and database performance with tips from the experts. MySQL offers a few different types of indexes and uses them in a variety of ways.

In this session you’ll learn:

  • How to use composite indexes
  • Other index usages besides lookup
  • How to find unoptimized queries
  • What is there beyond EXPLAIN?

Register for the webinar.

IndexesMatthew Boehm, Architect

Matthew joined Percona in the fall of 2012 as a MySQL consultant. His areas of knowledge include the traditional Linux/Apache/MySQL/PHP stack, memcached, MySQL Cluster, massive sharding topologies, PHP development and a bit of MySQL-C-API development. Previously, Matthew DBAed for the fifth largest MySQL installation at eBay/PayPal. He also hails from managed hosting environments. During his off-hours, Matthew is a nationally ranked, competitive West Coast Swing dancer and travels to competitions around the US. He enjoys working out, camping, biking and playing MMOs with his son.

May
20
2016
--

Introduction to Troubleshooting Performance – Troubleshooting Slow Queries webinar: Q & A

Troubleshooting Slow Queries

Troubleshooting Slow QueriesIn this blog, I will provide answers to the Q & A for the Troubleshooting Slow Queries webinar.

First, I want to thank you for attending the April 28 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: I’ve heard that is a bad idea to use

select *

; what do you recommend?

A: When I used

SELECT *

 in my slides, I wanted to underline the idea that sometimes you need to select all columns from the table. There is nothing bad about it if you need them.

SELECT *

 is bad when you need only a few columns from the table. In this case, you retrieve more data than needed, which affects performance. Another issue that  

SELECT *

 can cause is if you hard-code the statement into your application, then change table definition; the application could start retrieving columns in wrong order and output (e.g., email instead of billing address). Or even worse, it will try to access a non-existent index in the result set array. The best practice is to explicitly enumerate all columns that your application needs.

Q: I heard that using 

index_field

 length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as indexing, what happens?

A: I assume you are asking about the ability to create an index with lengths smaller than the column length? They work as follows:

Assume you have a 

TEXT

  field which contains these user questions:

  1. I’ve heard that is a bad idea to use select * what do you recommend?
  2. I heard that using 
    index_field

     length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as indexing, what happens?

  3. ….

Since this is a 

TEXT

  field you cannot create and index on it without specifying its length, so you need to make the index as minimal as possible to uniquely identify questions. If you create an index with length 10 it will contain:

  1. I’ve heard
  2. I heard th

You will index only those parts of questions that are not very distinct from each other, and do not contain useful information about what the question is. You can create index of length 255:

  1. I’ve heard that is a bad idea to use select * what do you recommend?
  2. I heard that using index_field length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as index

In this case, the index includes the whole first question and almost all the second question. This makes the index too large and requires us to use more disk space (which causes more IO). Also, information from the second question is probably too much.

If make index of length 75, we will have:

  1. I’ve heard that is a bad idea to use select * what do you recommend?
  2. I heard that using index_field length will affect the indexing principle du

This is more than enough for the first question and gives a good idea of what is in the second question. It also potentially will have enough unique entries to make its cardinality look more like the cardinality of real data distribution.

To conclude: choosing the correct index length is something that requires practice and analysis of your actual data. Try to make them as short as possible, but long enough so that the number of unique entries in the index will be similar to a number of unique entries in the table.

Q: Which view can we query to see stats?

A: Do you mean index statistics?

SHOW INDEX FROM table_name

 will do it.

Q: We have an InnoDB table with 47 fields (mostly text); some are ft-indexed. I tried to do an alter table, and it ran for 24 hours. What is the best way to run an alter table to add one extra field? The table has 1.9 M rows and 47 columns with many indexes.

A: Adding a column requires a table copy. Therefore, the speed of this operation depends on the table size and speed of your disk. If you are using version 5.6 and later, adding a column would not block parallel queries (and therefore is not a big deal). If you are using an older version, you can always use the pt-online-schema-change utility from Percona Toolkit. However, it will run even more slowly than the regular

ALTER TABLE

. Unfortunately, you cannot speed up the execution of

ALTER TABLE

 much. The only thing that you can do is to use a faster disk (with options, tuned to explore speed of the disk).

However, if you do not want to have this increased IO affect the production server, you can alter the table on the separate instance, then copy tablespace to production and then apply all changes to the original table from the binary logs. The steps will be something like:

  1. Ensure you use option
    innodb_file_per_table

      and the big table has individual tablespace

  2. Ensure that binary log is enabled
  3. Start a new server (you can also use an existent stand-by slave).
  4. Disable writes to the table
  5. Record the binary log position
  6. Copy the tablespace to the new server as described here.
  7. Enable writes on the production server
  8. Run
    ALTER TABLE

     on the new server you created in step 2 (it will still take 24 hours)

  9. Stop writes to the table on the production server
  10. Copy the tablespace, altered in step 7
  11. Apply all writes to this table, which are in the binary logs after position, recorded in step 4.
  12. Enable writes to the table

This scenario will take even more time overall, but will have minimal impact on the production server

Q: If there is a compound index like index1(emp_id,date), will the following query be able to use index? “select * from table1 where emp_id = 10”

A: Yes. At least it should.

Q: Are 

filesort

 and

temporary

 in extended info for explain not good?

A: Regarding

filesort

: it depends. For example, you will always have the word

filesort

” for tables which perform 

ORDER BY

 and cannot use an index for

ORDER BY

. This is not always bad. For example, in this query:

mysql> explain select emp_no, first_name from employees where emp_no <20000 order by first_nameG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 18722
     filtered: 100.00
        Extra: Using where; Using filesort
1 row in set, 1 warning (0,01 sec)

the primary key used to resolve rows and

filesort

 were necessary and not avoidable. You can read about different

filesort

 algorithms here.

Regarding

Using temporary

: this means what during query execution temporary table will be created. This is can be not good, especially if the temporary table is large and cannot fit into memory. In this case, it would be written to disk and slow down operations. But, again, sometimes creating temporary tables in not avoidable, for example, if you have both

GROUP BY

 and

ORDER BY

 clauses which list columns differently as stated in the user manual.

Q: Is

key_len

 length more of a good thing for query execution?

A:

key_len

 field is not

NULL

 for all queries that use and index, and just shows the length of the key part used. It is not good or bad, it is just for information. You can use this information, for example, to identify which part of combined index is used to resolve the query.

Q: Does an alter query go for an optimizer check?

A: No. You can check it either by enabling optimizer trace, running

ALTER

 and find what trace is empty. Or by enabling the debug option and searching the resulting trace for

optimize

.

Q: A query involves four columns that are all individually covered by an index. The optimizer didn’t merge indexes because of cost, and even didn’t choose the composite index I created.

A: This depends on the table definition and query you used. I cannot provide a more detailed answer based only on this information.

Q cont.: Finally, only certain composite indexes were suitable, the column order in the complex index mattered a lot. Why couldn’t the optimizer merge the four individual single column indexes, and why did the order of the columns in the composite index matter?

A: Regarding why the optimizer could not merge four indexes, I need to see how the table is defined and which data is in these indexed columns. Regarding why the order of the columns in the composite index matters, it depends on the query. Why the optimizer can use an index, say, on

(col1, col2)

 where the conditions

col1=X AND col2=Y

 and

col2=Y AND col2=X

 for the case when you use

OR

, the order is important. For example, for the condition

col1=X OR col2=Y

, where the part

col1=X

 is always executed and the part

col2=Y

  is executed only when

col1=X

 is false. The same logic applies to queries like

SELECT col1 WHERE col2=Y ORDER BY col3

. See the user manual for details.

Q: When I try to obtain the optimizer trace on the console, the result is cut off. Even if I redirect the output to a file, how to overcome that?

A: Which version of MySQL Server do you use? The 

TRACE

 column is defined as

longtext NOT NULL

, and should not cause such issues. If it does with a newer version, report a bug at http://bugs.mysql.com/.

Q: Are there any free graphical visualizers for either EXPLAIN or the optimizer TRACE output?

A: There is graphical visualizer for

EXPLAIN

 in MySQL Workbench. But it works with online data only: you cannot run it on

EXPLAIN

 output, saved into a file. I don’t know about any visualizer for the optimizer

TRACE

 output. However, since it is

JSON

 you can simply save it to file and open in web browser. It will allow a better view than if opened in simple text editor.

Q: When do you use force index instead of

use index

 hints?

A: According to user manual “

USE INDEX (index_list)

 hint tells MySQL to use only one of the named indexes to find rows in the table” and “

FORCE INDEX

  hint acts like

USE INDEX (index_list)

, with the addition that a table scan is assumed to be very expensive . . . a table scan is used only if there is no way to use one of the named indexes to find rows in the table.” This means that when you use

USE INDEX

, you are giving a hint for the optimizer to prefer a particular index to others, but not enforcing index usage if the optimizer prefers a table scan, while

FORCE INDEX

 requires using the index. I myself use only

FORCE

 and

IGNORE

  index hints.

Q: Very informative session. I missed the beginning part. Are you going to distribute the recoded session later?

A: Yes. As usual slides and recording available here.

Apr
06
2016
--

EXPLAIN FORMAT=JSON wrap-up

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSON wrap-upThis blog is an EXPLAIN FORMAT=JSON wrap-up for the series of posts I’ve done in the last few months.

In this series, we’ve discussed everything unique to

EXPLAIN FORMAT=JSON

. I intentionally skipped a description of members such as

table_name

,

access_type

  or

select_id

, which are not unique.

In this series, I only mentioned in passing members that replace information from the

Extra

 column in the regular

EXPLAIN

 output, such as

using_join_buffer

 ,

partitions

,

using_temporary_table

  or simply

message

. You can see these in queries like the following:

mysql> explain format=json select rand() from dual
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "message": "No tables used"
  }
}
1 row in set, 1 warning (0.00 sec)

Or

mysql> explain format=json select emp_no from titles where 'Senior Engineer' = 'Senior Cat'
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "message": "Impossible WHERE"
  }
}
1 row in set, 1 warning (0.01 sec)

Their use is fairly intuitive, similar to regular

EXPLAIN

, and I don’t think one can achieve anything from reading a blog post about each of them.

The only thing left to list is a Table of Contents for the series:

attached_condition: How EXPLAIN FORMAT=JSON can spell-check your queries

rows_examined_per_scan, rows_produced_per_join: EXPLAIN FORMAT=JSON answers on question “What number of filtered rows mean?”

used_columns: EXPLAIN FORMAT=JSON tells when you should use covered indexes

used_key_parts: EXPLAIN FORMAT=JSON provides insight into which part of multiple-column key is used

EXPLAIN FORMAT=JSON: everything about attached_subqueries, optimized_away_subqueries, materialized_from_subquery

EXPLAIN FORMAT=JSON provides insights on optimizer_switch effectiveness

EXPLAIN FORMAT=JSON: order_by_subqueries, group_by_subqueries details on subqueries in ORDER BY and GROUP BY

grouping_operation, duplicates_removal: EXPLAIN FORMAT=JSON has all details about GROUP BY

EXPLAIN FORMAT=JSON has details for subqueries in HAVING, nested selects and subqueries that update values

ordering_operation: EXPLAIN FORMAT=JSON knows everything about ORDER BY processing

EXPLAIN FORMAT=JSON knows everything about UNIONs: union_result and query_specifications

EXPLAIN FORMAT=JSON: buffer_result is not hidden!

EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

EXPLAIN FORMAT=JSON: nested_loop makes JOIN hierarchy transparent

Thanks for following the series!

Feb
29
2016
--

EXPLAIN FORMAT=JSON: nested_loop makes JOIN hierarchy transparent

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONOnce again it’s time for another EXPLAIN FORMAT=JSON is cool! post. This post will discuss how EXPLAIN FORMAT=JSON allows the nested_loop command to make the JOIN operation hierarchy transparent.

The regular

EXPLAIN

  command lists each table that participates in a 

JOIN

  operation on a single row. This works perfectly for simple queries:

mysql> explain select * from employees join titles join salariesG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299379
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: titles
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 442724
     filtered: 100.00
        Extra: Using join buffer (Block Nested Loop)
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2745434
     filtered: 100.00
        Extra: Using join buffer (Block Nested Loop)
3 rows in set, 1 warning (0.00 sec)

You can see that the first accessed table was

employees

, then

titles

  and finally 

salaries

. Everything is clear.

EXPLAIN FORMAT=JSON

 in this case puts everything into the 

nested_loop

array (even if “MySQL isn’t limited to nested-loop joins”):

mysql> explain format=json select * from employees join titles join salariesG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "7.277755124e16"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "employees",
          "access_type": "ALL",
          "rows_examined_per_scan": 299379,
          "rows_produced_per_join": 299379,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "929.00",
            "eval_cost": "59875.80",
            "prefix_cost": "60804.80",
            "data_read_per_join": "13M"
          },
          "used_columns": [
            "emp_no",
            "birth_date",
            "first_name",
            "last_name",
            "gender",
            "hire_date"
          ]
        }
      },
      {
        "table": {
          "table_name": "titles",
          "access_type": "ALL",
          "rows_examined_per_scan": 442724,
          "rows_produced_per_join": 132542268396,
          "filtered": "100.00",
          "using_join_buffer": "Block Nested Loop",
          "cost_info": {
            "read_cost": "62734.88",
            "eval_cost": "26508453679.20",
            "prefix_cost": "26508577218.88",
            "data_read_per_join": "7T"
          },
          "used_columns": [
            "emp_no",
            "title",
            "from_date",
            "to_date"
          ]
        }
      },
      {
        "table": {
          "table_name": "salaries",
          "access_type": "ALL",
          "rows_examined_per_scan": 2745434,
          "rows_produced_per_join": 363886050091503872,
          "filtered": "100.00",
          "using_join_buffer": "Block Nested Loop",
          "cost_info": {
            "read_cost": "314711040856.92",
            "eval_cost": "7.277721002e16",
            "prefix_cost": "7.277755124e16",
            "data_read_per_join": "5171P"
          },
          "used_columns": [
            "emp_no",
            "salary",
            "from_date",
            "to_date"
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)

For a simple query this output does not add much. Except cost info and information about used columns and efficiency of composite indexes.

But what if you not only join tables, but use the other SQL language options? For example, for the query below, which has two

JOIN

 operations and two subqueries, a regular

EXPLAIN

 returns this plan:

mysql> explain select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <subquery2>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: dept_manager
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: <subquery2>.emp_no
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: employees
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: <subquery2>.emp_no
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 4. row ***************************
           id: 2
  select_type: MATERIALIZED
        table: <derived3>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 9
     filtered: 100.00
        Extra: NULL
*************************** 5. row ***************************
           id: 3
  select_type: DERIVED
        table: titles
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: NULL
         rows: 442724
     filtered: 7.51
        Extra: Using where; Using index; Using temporary; Using filesort; LooseScan
*************************** 6. row ***************************
           id: 3
  select_type: DERIVED
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.titles.emp_no
         rows: 9
     filtered: 100.00
        Extra: NULL
6 rows in set, 1 warning (0.00 sec)

It’s pretty hard to understand which part is a subquery and which is not. It’s also it is hard to find out if

DERIVED

 belongs to the first

JOIN

 or to the second. And I am not quite sure why

<subquery2>

  was marked as

PRIMARY

, which is supposed to indicate “Outermost SELECT”.

The real issue here is that the internal representation of

JOIN

 is hierarchical, and MySQL Server (like in the case for

UNION

) has trouble representing an object as a “flat” table.

EXPLAIN FORMAT=JSON

, with its hierarchical nature, can help us in this case.

mysql> explain format=json  select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "39.45"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "<subquery2>",
          "access_type": "ALL",
          "materialized_from_subquery": {
            "using_temporary_table": true,
            "query_block": {
              "table": {
                "table_name": "t",
                "access_type": "ALL",
                "rows_examined_per_scan": 9,
                "rows_produced_per_join": 9,
                "filtered": "100.00",
                "cost_info": {
                  "read_cost": "10.45",
                  "eval_cost": "1.80",
                  "prefix_cost": "12.25",
                  "data_read_per_join": "144"
                },
                "used_columns": [
                  "emp_no",
                  "salary"
                ],
                "materialized_from_subquery": {
                  "using_temporary_table": true,
                  "dependent": false,
                  "cacheable": true,
                  "query_block": {
                    "select_id": 3,
                    "cost_info": {
                      "query_cost": "176246.11"
                    },
                    "grouping_operation": {
                      "using_temporary_table": true,
                      "using_filesort": true,
                      "cost_info": {
                        "sort_cost": "9.54"
                      },
                      "nested_loop": [
                        {
                          "table": {
                            "table_name": "titles",
                            "access_type": "index",
                            "possible_keys": [
                              "PRIMARY",
                              "emp_no"
                            ],
                            "key": "emp_no",
                            "used_key_parts": [
                              "emp_no"
                            ],
                            "key_length": "4",
                            "rows_examined_per_scan": 442724,
                            "rows_produced_per_join": 33229,
                            "filtered": "7.51",
                            "using_index": true,
                            "loosescan": true,
                            "cost_info": {
                              "read_cost": "3380.56",
                              "eval_cost": "6645.94",
                              "prefix_cost": "63199.96",
                              "data_read_per_join": "2M"
                            },
                            "used_columns": [
                              "emp_no",
                              "title",
                              "from_date"
                            ],
                            "attached_condition": "(`employees`.`titles`.`title` like '%manager%')"
                          }
                        },
                        {
                          "table": {
                            "table_name": "salaries",
                            "access_type": "ref",
                            "possible_keys": [
                              "PRIMARY",
                              "emp_no"
                            ],
                            "key": "PRIMARY",
                            "used_key_parts": [
                              "emp_no"
                            ],
                            "key_length": "4",
                            "ref": [
                              "employees.titles.emp_no"
                            ],
                            "rows_examined_per_scan": 9,
                            "rows_produced_per_join": 9,
                            "filtered": "100.00",
                            "cost_info": {
                              "read_cost": "49622.62",
                              "eval_cost": "1.91",
                              "prefix_cost": "176236.57",
                              "data_read_per_join": "152"
                            },
                            "used_columns": [
                              "emp_no",
                              "salary",
                              "from_date"
                            ]
                          }
                        }
                      ]
                    }
                  }
                }
              }
            }
          }
        }
      },
      {
        "table": {
          "table_name": "dept_manager",
          "access_type": "ref",
          "possible_keys": [
            "PRIMARY",
            "emp_no"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "emp_no"
          ],
          "key_length": "4",
          "ref": [
            "<subquery2>.emp_no"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 9,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "9.00",
            "eval_cost": "1.80",
            "prefix_cost": "23.05",
            "data_read_per_join": "144"
          },
          "used_columns": [
            "dept_no",
            "emp_no",
            "from_date",
            "to_date"
          ]
        }
      },
      {
        "table": {
          "table_name": "employees",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "emp_no"
          ],
          "key_length": "4",
          "ref": [
            "<subquery2>.emp_no"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 1,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "1.00",
            "eval_cost": "0.20",
            "prefix_cost": "39.45",
            "data_read_per_join": "48"
          },
          "used_columns": [
            "emp_no",
            "birth_date",
            "first_name",
            "last_name",
            "gender",
            "hire_date"
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.01 sec)

At first we see that all our tables,

JOIN

 operations and subqueries are in the

nested_loop

 array:

"nested_loop": [
      {
        "table": {
          "table_name": "<subquery2>",
...
      {
        "table": {
          "table_name": "dept_manager",
...
      {
        "table": {
          "table_name": "employees",
...
      }
    ]

Then we see that the first table,

<subquery2>

, was materialized_from_subquery:

"table": {
          "table_name": "<subquery2>",
          "access_type": "ALL",
          "materialized_from_subquery": {
...

Which, in its turn, was

materialized_from_subquery

 too:

"table": {
          "table_name": "<subquery2>",
          "access_type": "ALL",
          "materialized_from_subquery": {
...
                "materialized_from_subquery": {
...

This last subquery performs

grouping_operation

  on the other 

nested_loop

  (

JOIN

) of tables

titles

  and

salaries

:

"grouping_operation": {
                      "using_temporary_table": true,
                      "using_filesort": true,
                      "cost_info": {
                        "sort_cost": "9.54"
                      },
                      "nested_loop": [
                        {
                          "table": {
                            "table_name": "titles",
...
                        },
                        {
                          "table": {
                            "table_name": "salaries",
...

Now we have a better picture of how the query was optimized: tables

titles

 and

salaries

  were joined first, then

GROUP BY

 was executed on the result, then the result was materialized and queried. The result of the query

select emp_no from <materialized> t

  was materialized again as

<subquery2>

, and only after it joined with two other tables.

Conclusion:

EXPLAIN FORMAT=JSON

  helps to understand how complex queries are optimized.

Feb
22
2016
--

EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONTime for another entry in the EXPLAIN FORMAT=JSON is cool! series of blog posts. This time we’ll discuss how using EXPLAIN FORMAT=JSON allows you to see that

cost_info

  knows why the optimizer prefers one index to another.

Tables often have more than one index. Any of these indexes can be used to resolve query. The optimizer has to make a choice in this case. One of the metrics that can be used to help make the choice is the potential cost of the query evaluation.

For example, let’s take the table

titles

  from the standard employees database:

mysql> show create table titlesG
*************************** 1. row ***************************
       Table: titles
Create Table: CREATE TABLE `titles` (
  `emp_no` int(11) NOT NULL,
  `title` varchar(50) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date DEFAULT NULL,
  PRIMARY KEY (`emp_no`,`title`,`from_date`),
  KEY `emp_no` (`emp_no`),
  CONSTRAINT `titles_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

It has two indexes:

emp_no

  and

PRIMARY

, each of  which could be used to resolve query:

select distinct title from titles where year(from_date) > '1990';

At first glance, 

emp_no

  doesn’t really fit for this query.

PRIMARY

  does fit, because it contains both the 

title

  and

from_date

  fields. Unfortunately, it cannot be used to resolve the query, because we don’t limit the search by

emp_no

  and

title

 .  It can, however, be used to select rows from the index. When we use 

EXPLAIN

 , though, it shows us that the optimizer has chosen index

emp_no

  (every secondary index in InnoDB contains a link to the clustered index anyway):

mysql> explain select distinct title from titles where year(from_date) > '1990'G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: titles
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: NULL
         rows: 442724
     filtered: 100.00
        Extra: Using where; Using index; Using temporary
1 row in set, 1 warning (0.00 sec)

PRIMARY KEY

  exists in the field

possible_keys

, but was not chosen.

EXPLAIN FORMAT=JSON

  can show us why.

First let’s run it on the original query:

mysql> explain format=json select distinct title from titles where year(from_date) > '1990'G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "89796.80"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "using_filesort": false,
      "table": {
        "table_name": "titles",
        "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "emp_no"
        ],
        "key": "emp_no",
        "used_key_parts": [
          "emp_no"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 442724,
        "rows_produced_per_join": 442724,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "1252.00",
          "eval_cost": "88544.80",
          "prefix_cost": "89796.80",
          "data_read_per_join": "27M"
        },
        "used_columns": [
          "emp_no",
          "title",
          "from_date"
        ],
        "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"
      }
    }
  }
}
1 row in set, 1 warning (0.01 sec)

The important part here is:

"cost_info": {
      "query_cost": "89796.80"
    },

Which shows that the overall

query_cost

  is 89796.80. We don’t really know what the units are for this cost, or how it is actually measured. It isn’t important; the only thing that is important for now is that smaller is better. (Think of it like shopping for a product: it doesn’t matter which you buy it from, just that you buy it at the lowest price.)

Another important member of the index is

cost_info

, which belongs to the table itself:

"cost_info": {
          "read_cost": "1252.00",
          "eval_cost": "88544.80",
          "prefix_cost": "89796.80",
          "data_read_per_join": "27M"
        },

Here we get even more details, such as cost of read operation and evaluation.

prefix_cost

  is not useful for this example, because it contains the cost of joining to the next table in

JOIN

. Since we don’t join the table

titles

  with any other value of

prefix_cost

, is equivalent to the cost of the full query.

data_read_per_join

  contains the amount of data that should be read for each

JOIN

  operation. In our case it is once again the same as how much data we should read to fully evaluate the query.

Now let’s force index

PRIMARY

  and examine the 

EXPLAIN FORMAT=JSON

  output:

mysql> explain format=json select distinct title from titles force index(primary) where year(from_date) > '1990'G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "531269.80"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "using_filesort": false,
      "table": {
        "table_name": "titles",
        "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "emp_no"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "emp_no",
          "title",
          "from_date"
        ],
        "key_length": "59",
        "rows_examined_per_scan": 442724,
        "rows_produced_per_join": 442724,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "442725.00",
          "eval_cost": "88544.80",
          "prefix_cost": "531269.80",
          "data_read_per_join": "27M"
        },
        "used_columns": [
          "emp_no",
          "title",
          "from_date"
        ],
        "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"
      }
    }
  }
}
1 row in set, 1 warning (0.01 sec)

Notice the numbers are different this time. The total query cost is 531269.80, which is about 6 times greater than 89796.80:

"cost_info": {
      "query_cost": "531269.80"
    },

read_cost

  is 442725.00, which is 353 times greater than 1252.00. However, the 

eval_cost

  and

data_read_per_join

  are the same as the query that uses index

emp_no

 :

"cost_info": {
          "read_cost": "442725.00",
          "eval_cost": "88544.80",
          "prefix_cost": "531269.80",
          "data_read_per_join": "27M"
        },

These numbers clearly explain why the optimizer prefers the index 

emp_no

  to

PRIMARY KEY

.

In our example above this behavior is correct. In a real life scenario, if the optimizer’s choice is wrong. these numbers can show either that there is a bug in the optimizer or  that the table’s statistics are outdated and need to be updated.

Conclusion:

EXPLAIN FORMAT=JSON

  can be used together with

FORCE INDEX

  to find out why the optimizer prefers one index to another.

Feb
09
2016
--

EXPLAIN FORMAT=JSON: buffer_result is not hidden!

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONTime for another entry in the EXPLAIN FORMAT=JSON is cool! series. Today we’re going to look at how you can view the buffer result using JSON (instead of the regular

EXPLAIN

 command.

Regular

EXPLAIN

 does not identify if

SQL_BUFFER_RESULT

 was used at all. To demonstrate, let’s run this query:

mysql> explain select * from salariesG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.01 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,`employees`.`salaries`.`from_date` AS `from_date`,`employees`.`salaries`.`to_date` AS `to_date` from `employees`.`salaries`

Now, let’s compare it to this query:

mysql> explain select sql_buffer_result * from salariesG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: Using temporary
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,`employees`.`salaries`.`from_date` AS `from_date`,`employees`.`salaries`.`to_date` AS `to_date` from `employees`.`salaries`

Notice there is no difference, except the expected

"Using temporary"

 value in the

"Extra"

 row of the second query. The field 

"Using temporary"

  is expected here, because

SQL_BUFFER_RESULT

  directly instructs the MySQL server to put a result set into a temporary table to free locks. But what if the query uses the temporary table by itself? For example, for a grouping operation? In this case, the 

EXPLAIN

 result for the original query and the query that contains the 

SQL_BUFFER_RESULT

  clause will be 100% identical.

Compare:

mysql> explain select emp_no, salary/avg(salary) from salaries group by emp_no, salaryG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`

With:

mysql> explain select sql_buffer_result emp_no, salary/avg(salary) from salaries group by emp_no, salaryG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`

There is no difference! We not able to tell if we used a temporary table to resolve the query, or simply put the result set into the buffer. The 

EXPLAIN FORMAT=JSON

  command can help in this case as well. Its output is clear, and shows all the details of the query optimization:

mysql> explain format=json select sql_buffer_result emp_no, salary/avg(salary) from salaries group by emp_no, salaryG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3073970.40"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2557022.00"
      },
      "buffer_result": {
        "using_temporary_table": true,
        "table": {
          "table_name": "salaries",
          "access_type": "ALL",
          "rows_examined_per_scan": 2557022,
          "rows_produced_per_join": 2557022,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "5544.00",
            "eval_cost": "511404.40",
            "prefix_cost": "516948.40",
            "data_read_per_join": "39M"
          },
          "used_columns": [
            "emp_no",
            "salary",
            "from_date"
          ]
        }
      }
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`

Firstly, we can see how the

grouping_operation

 was optimized:

"grouping_operation": { "using_temporary_table": true, "using_filesort": true,

And it does indeed use the temporary table.

Now we can follow the details for

SQL_BUFFER_RESULT

:

"buffer_result": {
        "using_temporary_table": true,

With this output, we can be absolutely certain that the temporary table was created for both the  

SQL_BUFFER_RESULT

 and the grouping operation. This is especially helpful for support engineers who need the 

EXPLAIN

  output to help their customers to tune queries, but are afraid to ask for the same query twice — once with the 

SQL_BUFFER_RESULT

 clause and once without.

Conclusion:

EXPLAIN FORMAT=JSON

  does not hide important details for query optimizations.

Jan
29
2016
--

EXPLAIN FORMAT=JSON knows everything about UNIONs: union_result and query_specifications

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONReady for another post in the EXPLAIN FORMAT=JSON is Cool series! Great! This post will discuss how to see all the information that is contained in optimized queries with

UNION

 using the

union_result

 and

query_specifications

 commands.

 

When optimizing complicated queries with

UNION

, it is easy to get lost in the regular

EXPLAIN

  output trying to identify which part of the output belongs to each part of the

UNION

.

Let’s consider the following example:

mysql> explain
    ->     select emp_no, last_name, 'low_salary' from employees
    ->     where emp_no in (select emp_no from salaries
    ->         where salary < (select avg(salary) from salaries))
    -> union
    ->     select emp_no, last_name, 'high salary' from employees
    ->     where emp_no in (select emp_no from salaries
    ->         where salary >= (select avg(salary) from salaries))G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: employees
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299778
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.employees.emp_no
         rows: 9
     filtered: 33.33
        Extra: Using where; FirstMatch(employees)
*************************** 3. row ***************************
           id: 3
  select_type: SUBQUERY
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: NULL
*************************** 4. row ***************************
           id: 4
  select_type: UNION
        table: employees
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299778
     filtered: 100.00
        Extra: NULL
*************************** 5. row ***************************
           id: 4
  select_type: UNION
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.employees.emp_no
         rows: 9
     filtered: 33.33
        Extra: Using where; FirstMatch(employees)
*************************** 6. row ***************************
           id: 6
  select_type: SUBQUERY
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: NULL
*************************** 7. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: <union1,4>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Using temporary
7 rows in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'low_salary' AS `low_salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))) union /* select#4 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'high salary' AS `high salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`)))

While we can guess that subquery 3 belongs to the first query of the union, and subquery 6 belongs to the second (which has number 4 for some reason), we have to be very careful (especially in our case) when queries use the same tables in both parts of the

UNION

.

The main issue with the regular

EXPLAIN

 for

UNION

  is that it has to re-present the hierarchical structure as a table. The same issue occurs when you want to store objects created in programming language, such as Java, in the database.

EXPLAIN FORMAT=JSON

, on the other hand, has hierarchical structure and more clearly displays how

UNION

 was optimized:

mysql> explain format=json select emp_no, last_name, 'low_salary' from employees where emp_no in (select emp_no from salaries  where salary < (select avg(salary) from salaries)) union select emp_no, last_name, 'high salary' from employees where emp_no in (select emp_no from salaries where salary >= (select avg(salary) from salaries))G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "union_result": {
      "using_temporary_table": true,
      "table_name": "<union1,4>",
      "access_type": "ALL",
      "query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
            "cost_info": {
              "query_cost": "921684.48"
            },
            "nested_loop": [
              {
                "table": {
                  "table_name": "employees",
                  "access_type": "ALL",
                  "possible_keys": [
                    "PRIMARY"
                  ],
                  "rows_examined_per_scan": 299778,
                  "rows_produced_per_join": 299778,
                  "filtered": "100.00",
                  "cost_info": {
                    "read_cost": "929.00",
                    "eval_cost": "59955.60",
                    "prefix_cost": "60884.60",
                    "data_read_per_join": "13M"
                  },
                  "used_columns": [
                    "emp_no",
                    "last_name"
                  ]
                }
              },
              {
                "table": {
                  "table_name": "salaries",
                  "access_type": "ref",
                  "possible_keys": [
                    "PRIMARY",
                    "emp_no"
                  ],
                  "key": "PRIMARY",
                  "used_key_parts": [
                    "emp_no"
                  ],
                  "key_length": "4",
                  "ref": [
                    "employees.employees.emp_no"
                  ],
                  "rows_examined_per_scan": 9,
                  "rows_produced_per_join": 299778,
                  "filtered": "33.33",
                  "first_match": "employees",
                  "cost_info": {
                    "read_cost": "302445.97",
                    "eval_cost": "59955.60",
                    "prefix_cost": "921684.48",
                    "data_read_per_join": "4M"
                  },
                  "used_columns": [
                    "emp_no",
                    "salary"
                  ],
                  "attached_condition": "(`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))",
                  "attached_subqueries": [
                    {
                      "dependent": false,
                      "cacheable": true,
                      "query_block": {
                        "select_id": 3,
                        "cost_info": {
                          "query_cost": "516948.40"
                        },
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ALL",
                          "rows_examined_per_scan": 2557022,
                          "rows_produced_per_join": 2557022,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "5544.00",
                            "eval_cost": "511404.40",
                            "prefix_cost": "516948.40",
                            "data_read_per_join": "39M"
                          },
                          "used_columns": [
                            "salary"
                          ]
                        }
                      }
                    }
                  ]
                }
              }
            ]
          }
        },
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 4,
            "cost_info": {
              "query_cost": "921684.48"
            },
            "nested_loop": [
              {
                "table": {
                  "table_name": "employees",
                  "access_type": "ALL",
                  "possible_keys": [
                    "PRIMARY"
                  ],
                  "rows_examined_per_scan": 299778,
                  "rows_produced_per_join": 299778,
                  "filtered": "100.00",
                  "cost_info": {
                    "read_cost": "929.00",
                    "eval_cost": "59955.60",
                    "prefix_cost": "60884.60",
                    "data_read_per_join": "13M"
                  },
                  "used_columns": [
                    "emp_no",
                    "last_name"
                  ]
                }
              },
              {
                "table": {
                  "table_name": "salaries",
                  "access_type": "ref",
                  "possible_keys": [
                    "PRIMARY",
                    "emp_no"
                  ],
                  "key": "PRIMARY",
                  "used_key_parts": [
                    "emp_no"
                  ],
                  "key_length": "4",
                  "ref": [
                    "employees.employees.emp_no"
                  ],
                  "rows_examined_per_scan": 9,
                  "rows_produced_per_join": 299778,
                  "filtered": "33.33",
                  "first_match": "employees",
                  "cost_info": {
                    "read_cost": "302445.97",
                    "eval_cost": "59955.60",
                    "prefix_cost": "921684.48",
                    "data_read_per_join": "4M"
                  },
                  "used_columns": [
                    "emp_no",
                    "salary"
                  ],
                  "attached_condition": "(`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))",
                  "attached_subqueries": [
                    {
                      "dependent": false,
                      "cacheable": true,
                      "query_block": {
                        "select_id": 6,
                        "cost_info": {
                          "query_cost": "516948.40"
                        },
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ALL",
                          "rows_examined_per_scan": 2557022,
                          "rows_produced_per_join": 2557022,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "5544.00",
                            "eval_cost": "511404.40",
                            "prefix_cost": "516948.40",
                            "data_read_per_join": "39M"
                          },
                          "used_columns": [
                            "salary"
                          ]
                        }
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'low_salary' AS `low_salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))) union /* select#4 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'high salary' AS `high salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`)))

First it puts member

union_result

 in the

query_block

  at the very top level:

EXPLAIN: {
  "query_block": {
    "union_result": {

The

union_result

 object contains information about how the result set of the

UNION

 was processed:

"using_temporary_table": true,
      "table_name": "<union1,4>",
      "access_type": "ALL",

And also contains the 

query_specifications

 array which also contains all the details about queries in the

UNION

:

"query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
<skipped>
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 4,

This representation is much more clear, and also contains all the details which the regular

EXPLAIN

misses for regular queries.

Conclusion:

EXPLAIN FORMAT=JSON

 not only contains additional optimization information for each query in the

UNION

, but also has a hierarchical structure that is more suitable for the hierarchical nature of the

UNION

.

Jan
25
2016
--

EXPLAIN FORMAT=JSON has details for subqueries in HAVING, nested selects and subqueries that update values

EXPLAIN FORMAT=JSONOver several previous blog posts, we’ve already discussed what information the 

EXPLAIN FORMAT=JSON

 output provides for some subqueries. You can review those discussions here, here and here. EXPLAIN FORMAT=JSON shows many details that you can’t get with other commands. Let’s now finish this topic and discuss the output for the rest of the subquery types.

First, let’s look at the subquery in the 

HAVING

 clause, such as in the following example:

select count(emp_no), salary
from salaries
group by salary
having salary > ALL (select avg(s)
                     from (select dept_no, sum(salary) as s
                           from salaries join dept_emp using (emp_no) group by dept_no) t
                     )

This example prints the number of employees and their salaries, if their salary is greater than the average salary in their department.

EXPLAIN FORMAT=JSON

 provides a lot details on how this subquery is optimized:

mysql> explain format=json select count(emp_no), salary from salaries group by salary having salary > ALL (select avg(s) from (select dept_no, sum(salary) as s from salaries join dept_emp using (emp_no) group by dept_no) t)G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3073970.40"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2557022.00"
      },
      "table": {
        "table_name": "salaries",
        "access_type": "ALL",
        "rows_examined_per_scan": 2557022,
        "rows_produced_per_join": 2557022,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "5544.00",
          "eval_cost": "511404.40",
          "prefix_cost": "516948.40",
          "data_read_per_join": "39M"
        },
        "used_columns": [
          "emp_no",
          "salary",
          "from_date"
        ]
      },
      "having_subqueries": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "771970.25"
            },
            "table": {
              "table_name": "t",
              "access_type": "ALL",
              "rows_examined_per_scan": 3087841,
              "rows_produced_per_join": 3087841,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "154402.05",
                "eval_cost": "617568.20",
                "prefix_cost": "771970.25",
                "data_read_per_join": "94M"
              },
              "used_columns": [
                "dept_no",
                "s"
              ],
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "dependent": false,
                "cacheable": true,
                "query_block": {
                  "select_id": 3,
                  "cost_info": {
                    "query_cost": "1019140.27"
                  },
                  "grouping_operation": {
                    "using_filesort": false,
                    "nested_loop": [
                      {
                        "table": {
                          "table_name": "dept_emp",
                          "access_type": "index",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no",
                            "dept_no"
                          ],
                          "key": "dept_no",
                          "used_key_parts": [
                            "dept_no"
                          ],
                          "key_length": "4",
                          "rows_examined_per_scan": 331570,
                          "rows_produced_per_join": 331570,
                          "filtered": "100.00",
                          "using_index": true,
                          "cost_info": {
                            "read_cost": "737.00",
                            "eval_cost": "66314.00",
                            "prefix_cost": "67051.00",
                            "data_read_per_join": "5M"
                          },
                          "used_columns": [
                            "emp_no",
                            "dept_no"
                          ]
                        }
                      },
                      {
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ref",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no"
                          ],
                          "key": "PRIMARY",
                          "used_key_parts": [
                            "emp_no"
                          ],
                          "key_length": "4",
                          "ref": [
                            "employees.dept_emp.emp_no"
                          ],
                          "rows_examined_per_scan": 9,
                          "rows_produced_per_join": 3087841,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "334520.92",
                            "eval_cost": "617568.35",
                            "prefix_cost": "1019140.27",
                            "data_read_per_join": "47M"
                          },
                          "used_columns": [
                            "emp_no",
                            "salary",
                            "from_date"
                          ]
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)`,`employees`.`salaries`.`salary` AS `salary` from `employees`.`salaries` group by `employees`.`salaries`.`salary` having <not>((`employees`.`salaries`.`salary` <= <max>(/* select#2 */ select avg(`t`.`s`) from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t`)))

We see that the subquery in the 

HAVING

 clause is not dependent, but cacheable:

"having_subqueries": [
        {
          "dependent": false,
          "cacheable": true,

It has its own query block:

"query_block": {
            "select_id": 2,

Which accesses table “t”:

"table": {
              "table_name": "t",
              "access_type": "ALL",
              "rows_examined_per_scan": 3087841,
              "rows_produced_per_join": 3087841,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "154402.05",
                "eval_cost": "617568.20",
                "prefix_cost": "771970.25",
                "data_read_per_join": "94M"
              },
              "used_columns": [
                "dept_no",
                "s"
              ],

Table “t” was also materialized from the subquery:

],
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "dependent": false,
                "cacheable": true,
                "query_block": {
                  "select_id": 3,

Another kind of subquery is in the 

SELECT

 list. If we want to compare the salary of an employee with the average salary in the company, for example, we can use the query

select emp_no, salary, (select avg(salary) from salaries) from salaries

. Lets examine the 

EXPLAIN

 output:

mysql> explain format=json select emp_no, salary, (select avg(salary) from salaries) from salariesG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "516948.40"
    },
    "table": {
      "table_name": "salaries",
      "access_type": "ALL",
      "rows_examined_per_scan": 2557022,
      "rows_produced_per_join": 2557022,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "5544.00",
        "eval_cost": "511404.40",
        "prefix_cost": "516948.40",
        "data_read_per_join": "39M"
      },
      "used_columns": [
        "emp_no",
        "salary"
      ]
    },
    "select_list_subqueries": [
      {
        "dependent": false,
        "cacheable": true,
        "query_block": {
          "select_id": 2,
          "cost_info": {
            "query_cost": "516948.40"
          },
          "table": {
            "table_name": "salaries",
            "access_type": "ALL",
            "rows_examined_per_scan": 2557022,
            "rows_produced_per_join": 2557022,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "5544.00",
              "eval_cost": "511404.40",
              "prefix_cost": "516948.40",
              "data_read_per_join": "39M"
            },
            "used_columns": [
              "salary"
            ]
          }
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,(/* select#2 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`) AS `(select avg(salary) from salaries)` from `employees`.`salaries`

EXPLAIN FORMAT=JSON

 in this case shows that the subquery is part of the first

query_block

, not dependent and cacheable.

The last type of subquery I want to discuss is the subquery updating values. For example, I added a new column to the

titles

 table from the standard employees database:

mysql> alter table titles add column full_title varchar(100);
Query OK, 0 rows affected (24.42 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now I want

full_title

 to contain both the department’s name and title, separated by a space. I can use 

UPDATE

 with the subquery to achieve this:

update titles
set full_title=concat((select dept_name
                       from departments
                       join dept_emp using(dept_no)
                       where dept_emp.emp_no=titles.emp_no and dept_emp.to_date='9999-01-01')
               ,' ', title)
where to_date = '9999-01-01';

To find out how it is optimized, we can use

EXPLAIN FORMAT=JSON

:

mysql> explain format=json update titles set full_title=concat((select dept_name from departments join dept_emp using(dept_no) where dept_emp.emp_no=titles.emp_no and dept_emp.to_date='9999-01-01') ,' ', title) where to_date = '9999-01-01'G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "update": true,
      "table_name": "titles",
      "access_type": "index",
      "key": "PRIMARY",
      "used_key_parts": [
        "emp_no",
        "title",
        "from_date"
      ],
      "key_length": "59",
      "rows_examined_per_scan": 442843,
      "filtered": "100.00",
      "using_temporary_table": "for update",
      "attached_condition": "(`employees`.`titles`.`to_date` = '9999-01-01')"
    },
    "update_value_subqueries": [
      {
        "dependent": true,
        "cacheable": false,
        "query_block": {
          "select_id": 2,
          "cost_info": {
            "query_cost": "1.35"
          },
          "nested_loop": [
            {
              "table": {
                "table_name": "dept_emp",
                "access_type": "ref",
                "possible_keys": [
                  "PRIMARY",
                  "emp_no",
                  "dept_no"
                ],
                "key": "PRIMARY",
                "used_key_parts": [
                  "emp_no"
                ],
                "key_length": "4",
                "ref": [
                  "employees.titles.emp_no"
                ],
                "rows_examined_per_scan": 1,
                "rows_produced_per_join": 0,
                "filtered": "10.00",
                "cost_info": {
                  "read_cost": "1.00",
                  "eval_cost": "0.02",
                  "prefix_cost": "1.22",
                  "data_read_per_join": "1"
                },
                "used_columns": [
                  "emp_no",
                  "dept_no",
                  "to_date"
                ],
                "attached_condition": "(`employees`.`dept_emp`.`to_date` = '9999-01-01')"
              }
            },
            {
              "table": {
                "table_name": "departments",
                "access_type": "eq_ref",
                "possible_keys": [
                  "PRIMARY"
                ],
                "key": "PRIMARY",
                "used_key_parts": [
                  "dept_no"
                ],
                "key_length": "4",
                "ref": [
                  "employees.dept_emp.dept_no"
                ],
                "rows_examined_per_scan": 1,
                "rows_produced_per_join": 0,
                "filtered": "100.00",
                "cost_info": {
                  "read_cost": "0.11",
                  "eval_cost": "0.02",
                  "prefix_cost": "1.35",
                  "data_read_per_join": "5"
                },
                "used_columns": [
                  "dept_no",
                  "dept_name"
                ]
              }
            }
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1276): Field or reference 'employees.titles.emp_no' of SELECT #2 was resolved in SELECT #1

We can see in this output that the subquery is dependent, not cacheable, and will be executed for each row that needs to be updated.

Conclusion:

EXPLAIN FORMAT=JSON

  provides various information about all kind of subqueries.

Jan
08
2016
--

ordering_operation: EXPLAIN FORMAT=JSON knows everything about ORDER BY processing

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSONWe’ve already discussed using the ORDER BY clause with subqueries. You can also, however, use the 

ORDER BY

 clause with sorting results of one of the columns. Actually, this is most common way to use this clause.

Sometimes such queries require using temporary tables or filesort, and a regular

EXPLAIN

  clause provides this information. But it doesn’t show if this job is needed for

ORDER BY

 or for optimizing another part of the query.

For example, if we take a pretty simple query ( 

select distinct last_name from employees order by last_name asc

) and run

EXPLAIN

  on it, we can see that both the temporary table and filesort were used. However, we can’t identify if these were applied to

DISTINCT

, or to

ORDER BY

, or to any other part of the query.

mysql> explain select distinct last_name from employees order by last_name ascG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299379
     filtered: 100.00
        Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select distinct `employees`.`employees`.`last_name` AS `last_name` from `employees`.`employees` order by `employees`.`employees`.`last_name`

EXPLAIN FORMAT=JSON

 tells us exactly what happened:

mysql> explain format=json select distinct last_name from employees order by last_name ascG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "360183.80"
    },
    "ordering_operation": {
      "using_filesort": false,
      "duplicates_removal": {
        "using_temporary_table": true,
        "using_filesort": true,
        "cost_info": {
          "sort_cost": "299379.00"
        },
        "table": {
          "table_name": "employees",
          "access_type": "ALL",
          "rows_examined_per_scan": 299379,
          "rows_produced_per_join": 299379,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "929.00",
            "eval_cost": "59875.80",
            "prefix_cost": "60804.80",
            "data_read_per_join": "13M"
          },
          "used_columns": [
            "emp_no",
            "last_name"
          ]
        }
      }
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select distinct `employees`.`employees`.`last_name` AS `last_name` from `employees`.`employees` order by `employees`.`employees`.`last_name`

In the output above you see can see that 

ordering_operation

 does not use filesort:

"ordering_operation": {
      "using_filesort": false,

But

DISTINCT

 does:

"duplicates_removal": {
        "using_temporary_table": true,
        "using_filesort": true,

If we remove the 

DISTINCT

 clause, we will find that 

ORDER BY

 started using filesort, but does not need to create a temporary table:

mysql> explain format=json select last_name from employees order by last_name ascG
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "360183.80"
    },
    "ordering_operation": {
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "299379.00"
      },
<rest of the output skipped>

This means that in the case of the first query, a sorting operation proceeded in parallel with the duplicate keys removal.

Conclusion:

EXPLAIN FORMAT=JSON

  provides details about

ORDER BY

  optimization which cannot be seen with a regular

EXPLAIN

 operation.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com