Mar
07
2019
--

Reducing High CPU on MySQL: a Case Study

CPU Usage after query tuning

In this blog post, I want to share a case we worked on a few days ago. I’ll show you how we approached the resolution of a MySQL performance issue and used Percona Monitoring and Management (PMM) to support troubleshooting. The customer had noticed sustained high CPU usage in one of their MySQL instances and was not able to figure out why, as there was not much traffic hitting the app. We needed to reduce the high CPU usage on MySQL. The server is a small instance:

Models | 6xIntel(R) Xeon(R) CPU E5-2430 0 @ 2.20GHz
10GB RAM

This symptom can have many different causes. Let’s see how PMM can be used to troubleshoot the issue.

CPU

The original issue - CPU usage at almost 100% during application use

It’s important to understand where the CPU time is being consumed: user space, system space, iowait, and so on. Here we can see that CPU usage was hitting almost 100% and the majority of the time was being spent in user space, that is, the time the CPU spends executing user code such as MySQL. Once we determined that the time was being spent in user space, we could discard other possible issues. For example, we could eliminate the possibility that a large number of threads were competing for CPU resources, since that would cause an increase in context switches, which are handled by the kernel and would therefore show up as system time.
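To make the user/system/iowait split concrete, here is a minimal Python sketch of the arithmetic behind such a graph. It assumes two samples of the aggregate "cpu" line from Linux /proc/stat; the sample values and the function name cpu_breakdown are invented for illustration and are not taken from the customer's server or from PMM.

```python
# Field order of the aggregate "cpu" line in Linux /proc/stat.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

def cpu_breakdown(before: str, after: str) -> dict:
    """Return the share of CPU time per mode between two 'cpu' samples, in percent."""
    def parse(line):
        values = [int(v) for v in line.split()[1:1 + len(FIELDS)]]
        return dict(zip(FIELDS, values))

    b, a = parse(before), parse(after)
    delta = {name: a[name] - b[name] for name in FIELDS}
    total = sum(delta.values())
    return {name: round(100.0 * d / total, 1) for name, d in delta.items()}

# Hypothetical samples taken a few seconds apart (values are made up).
sample_1 = "cpu 1000 0 200 5000 50 0 10 0 0 0"
sample_2 = "cpu 1940 0 230 5020 52 0 18 0 0 0"
breakdown = cpu_breakdown(sample_1, sample_2)
print(breakdown)
```

With these made-up numbers almost all of the elapsed time lands in "user" and very little in "system" or "iowait", which is the same pattern described above.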

With that we decided to look into MySQL metrics.

MySQL

Thread activity graph in PMM for MySQL

Queries per second

As expected, there weren’t a lot of threads running (10 on average) and MySQL wasn’t being hammered with questions/transactions. It was running from 500 to 800 QPS (queries per second). The next step was to check the type of workload that was running on the instance:

All the commands are of a SELECT type, in red in this graph

In red we can see that almost all commands are SELECTs. With that in mind, we checked the handlers using SHOW STATUS LIKE 'Handler%' to verify whether those selects were doing an index scan, a full table scan, or something else.

Showing that the query was a full table scan

Blue in this graph represents Handler_read_rnd_next, which is the counter MySQL increments every time it reads a row when it’s doing a full table scan. Bingo! Around 350 selects were reading 2.5 million rows. But wait: why was this causing CPU issues rather than IO issues? If you refer to the first graph (the CPU graph), we cannot see iowait.

That is because the data was stored in the InnoDB Buffer Pool, so instead of having to read those 2.5M rows per second from disk, it was fetching them from memory. The stress had moved from disk to CPU. Now that we had identified that the issue was caused by one or more queries, we went to QAN to verify the queries and check their status:

identifying the long running query in QAN

The first query, a SELECT on table store.clients, was responsible for 98% of the load and was executing in 20+ seconds.

The initial query load

EXPLAIN confirmed our suspicions. The query was accessing the table using type ALL, which is the last type we want as it means “Full Table Scan”. Taking a look into the fingerprint of the query, we identified that it was a simple query:

Fingerprint of query
Indexes on table did not include a key column

The query was filtering clients based on the status field:

SELECT * FROM store.clients WHERE status = ?

As shown in the indexes, that column was not indexed. Talking with the customer, this turned out to be a query that was introduced as part of a new software release.
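The effect of a missing index on the filtered column can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server); the table contents and the index name idx_clients_status are hypothetical. SQLite's EXPLAIN QUERY PLAN plays the role that EXPLAIN plays in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO clients (name, status) VALUES (?, ?)",
    [(f"client{i}", "active" if i % 10 else "inactive") for i in range(1000)],
)

def plan(sql: str) -> str:
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("inactive",)).fetchall()
    return "; ".join(row[-1] for row in rows)

query = "SELECT * FROM clients WHERE status = ?"
plan_before = plan(query)   # no index on status: the whole table is scanned
conn.execute("CREATE INDEX idx_clients_status ON clients (status)")
plan_after = plan(query)    # same query, now resolved through the index
print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, but the first plan reports a scan of the table and the second a search using the index, which mirrors the move away from access type ALL in MySQL.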

From that point, we were confident that we had identified the problem. There could be more, but this particular query was definitely hurting the performance of the server. We decided to add an index and also sent an annotation to PMM, so we could refer back to the graphs to check when the index had been added, check if CPU usage had dropped, and also check Handler_read_rnd_next.

To run the alter we decided to use pt-online-schema-change as it was a busy table, and the tool has safeguards to prevent the situation from becoming even worse. For example, we wanted to pause or even abort the alter if the number of Threads_running exceeded a certain threshold. The thresholds are controlled by --max-load (25 by default), which pauses the copy, and --critical-load (50 by default), which aborts it:

pmm-admin annotate "Started ALTER store.clients ADD KEY (status)" && \
pt-online-schema-change --alter "ADD KEY (status)" --execute u=root,D=store,t=clients && \
pmm-admin annotate "Finished ALTER store.clients ADD KEY (status)"
Your annotation was successfully posted.
No slaves found. See --recursion-method if host localhost.localdomain has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
analyze_table, 10, 1
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering `store`.`clients`...
Creating new table...
Created new table store._clients_new OK.
Altering new table...
Altered `store`.`_clients_new` OK.
2019-02-22T18:26:25 Creating triggers...
2019-02-22T18:27:14 Created triggers OK.
2019-02-22T18:27:14 Copying approximately 4924071 rows...
Copying `store`.`clients`: 7% 05:46 remain
Copying `store`.`clients`: 14% 05:47 remain
Copying `store`.`clients`: 22% 05:07 remain
Copying `store`.`clients`: 30% 04:29 remain
Copying `store`.`clients`: 38% 03:59 remain
Copying `store`.`clients`: 45% 03:33 remain
Copying `store`.`clients`: 52% 03:06 remain
Copying `store`.`clients`: 59% 02:44 remain
Copying `store`.`clients`: 66% 02:17 remain
Copying `store`.`clients`: 73% 01:50 remain
Copying `store`.`clients`: 79% 01:23 remain
Copying `store`.`clients`: 87% 00:53 remain
Copying `store`.`clients`: 94% 00:24 remain
2019-02-22T18:34:15 Copied rows OK.
2019-02-22T18:34:15 Analyzing new table...
2019-02-22T18:34:15 Swapping tables...
2019-02-22T18:34:27 Swapped original and new tables OK.
2019-02-22T18:34:27 Dropping old table...
2019-02-22T18:34:32 Dropped old table `store`.`_clients_old` OK.
2019-02-22T18:34:32 Dropping triggers...
2019-02-22T18:34:32 Dropped triggers OK.
Successfully altered `store`.`clients`.
Your annotation was successfully posted.
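The load safeguards mentioned before the command can be summarized as a small decision function. This is not pt-online-schema-change's own code (the tool is written in Perl); it is a simplified Python sketch of the documented behavior using the default thresholds, and the function name load_action and the sample values are hypothetical.

```python
def load_action(threads_running: int, max_load: int = 25, critical_load: int = 50) -> str:
    """Decide what a chunked copy should do given the current Threads_running value."""
    if threads_running > critical_load:
        return "abort"      # load is dangerous: stop the schema change
    if threads_running > max_load:
        return "pause"      # load is high: wait before copying the next chunk
    return "continue"

# Hypothetical Threads_running samples observed between chunks.
actions = [load_action(n) for n in (10, 30, 60)]
print(actions)  # ['continue', 'pause', 'abort']
```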

Results

MySQL Handlers after query tuning
MySQL query throughput after query tuning
Query analysis by EXPLAIN in PMM after tuning

As we can see above, CPU usage dropped to less than 25%, which is 1/4 of the previous usage level. Handler_read_rnd_next dropped, and we can’t even see it once pt-osc has finished. We had a small increase in Handler_read_next, as expected, because MySQL is now using the index to resolve the WHERE clause. One interesting outcome is that the instance was able to increase its QPS by 2x after the index was added, as CPU/Full Table Scan was no longer limiting performance. On average, query time dropped from 20s to only 661ms.

Summary:

  1. Applying the correct troubleshooting steps to your problems is crucial:
    a) Understand what resources have been saturated.
    b) Understand what, if anything, is causing an error.
    c) From there you can divert into the areas that are related to that resource and start to narrow down the issue.
    d) Tackle the problems bit by bit.
  2. Having the right tools for the job is key to success. PMM is a great example of a tool that can help you quickly identify, drill in, and fix bottlenecks.
  3. Have realistic load tests. In this case, they had tested the new release at a concurrency level that did not resemble production.
  4. By identifying the culprit query we were able to:
    a) Drop average query time from 20s to 661ms
    b) Increase QPS by 2x
    c) Reduce the usage of CPU to 1/4 of its level prior to our intervention

Disclosure: For security reasons, sensitive information, such as database, table, and column names, has been modified and graphs have been recreated to simulate a similar problem.

Oct
03
2018
--

Finding Table Differences on Nullable Columns Using MySQL Generated Columns

MySQL generated columns

Some time ago, a customer had a performance issue with an internal process. He was comparing, finding, and reporting the rows that were different between two tables. This is simple if you use a LEFT JOIN and an IS NULL comparison over the second table in the WHERE clause, but what if the column could be null? That is why he used UNION, GROUP BY and HAVING clauses, which resulted in poor performance.

The challenge was to be able to compare each row using a LEFT JOIN over NULL values.

The challenge in more detail

I’m not going to use the customer’s real table. Instead, I will be comparing two sysbench tables with the same structure:

CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned DEFAULT NULL,
  `c` char(120) DEFAULT NULL,
  `pad` char(60) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `k` (`k`,`c`,`pad`)
) ENGINE=InnoDB

It is slightly different from the original sysbench schema, as this version can hold NULL values. Both tables have the same number of rows. We are going to set k to NULL in one row of each table:

update sbtest1 set k=null limit 1;
update sbtest2 set k=null limit 1;

If we execute the comparison query, we get this result:

mysql> select "sbtest1",a.* from
    -> sbtest1 a left join
    -> sbtest2 b using (k,c,pad)
    -> where b.id is null union
    -> select "sbtest2",a.* from
    -> sbtest2 a left join
    -> sbtest1 b using (k,c,pad)
    -> where b.id is null;
+---------+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| sbtest1 | id   | k    | c                                                                                                                       | pad                                                         |
+---------+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| sbtest1 | 4462 | NULL | 64568100364-99474573987-46567807085-85185678273-10829479379-85901445105-43623848418-63872374080-59257878609-82802454375 | 07052127207-33716235481-22978181904-76695680520-07986095803 |
| sbtest2 | 4462 | NULL | 64568100364-99474573987-46567807085-85185678273-10829479379-85901445105-43623848418-63872374080-59257878609-82802454375 | 07052127207-33716235481-22978181904-76695680520-07986095803 |
+---------+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
2 rows in set (3.00 sec)

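As you can see, column k is NULL. In both cases the comparison failed and reported those rows as different, because in SQL NULL = NULL evaluates to NULL rather than true, so the join finds no match. This is not new in MySQL, but it would be nice to have a way to sort this issue out.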
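The same behavior can be reproduced on a tiny data set. The sketch below uses Python's built-in sqlite3 module instead of MySQL (an assumption made so the example is self-contained; NULL comparison semantics in joins are the same here), with two hypothetical two-row tables that are identical, including one row where k is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("sbtest1", "sbtest2"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, k INTEGER, c TEXT, pad TEXT)")
    conn.executemany(
        f"INSERT INTO {table} (id, k, c, pad) VALUES (?, ?, ?, ?)",
        [(1, 10, "c1", "p1"), (2, None, "c2", "p2")],  # row 2 has k = NULL
    )

# Rows of sbtest1 with no match in sbtest2, using the same kind of join as above.
diff = conn.execute(
    "SELECT a.id, a.k FROM sbtest1 a LEFT JOIN sbtest2 b USING (k, c, pad) "
    "WHERE b.id IS NULL"
).fetchall()
print(diff)  # [(2, None)]

# The root cause: comparing NULL with NULL yields NULL, not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
print(null_equals_null)  # None
```

Although both tables hold exactly the same data, the row with the NULL value is reported as a difference.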

Solution

The solution is based on GENERATED COLUMNS with a hash function (md5) and stored in a binary(16) column:

ALTER TABLE sbtest1
ADD COLUMN `real_id` binary(16) GENERATED ALWAYS AS (unhex(md5(concat(ifnull(`k`,'NULL'),ifnull(`c`,'NULL'),ifnull(`pad`,'NULL'))))) VIRTUAL,
ADD INDEX (real_id);
ALTER TABLE sbtest2
ADD COLUMN `real_id` binary(16) GENERATED ALWAYS AS (unhex(md5(concat(ifnull(`k`,'NULL'),ifnull(`c`,'NULL'),ifnull(`pad`,'NULL'))))) VIRTUAL,
ADD INDEX (real_id);
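To see why this expression makes rows with NULL values comparable, the following Python sketch mirrors it outside the database. It is only an illustration of the hashing idea: the function name real_id is borrowed from the column, the sample values are made up, and it assumes integer values are concatenated in their plain decimal form.

```python
import hashlib

def real_id(k, c, pad) -> bytes:
    """Mirror unhex(md5(concat(ifnull(k,'NULL'), ifnull(c,'NULL'), ifnull(pad,'NULL'))))."""
    parts = ["NULL" if value is None else str(value) for value in (k, c, pad)]
    return hashlib.md5("".join(parts).encode("utf-8")).digest()  # 16 raw bytes

row_a = real_id(None, "64568100364", "07052127207")
row_b = real_id(None, "64568100364", "07052127207")
row_c = real_id(42, "64568100364", "07052127207")

print(len(row_a))       # 16
print(row_a == row_b)   # True: two rows with k = NULL now compare as equal
print(row_a == row_c)   # False: a different k gives a different key
```

One trade-off of this approach is that a column containing the literal string 'NULL' hashes the same as a real NULL, and concatenating without a separator cannot tell ('ab', 'c') from ('a', 'bc'), so it fits best when the data cannot contain such values.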

Adding the index is also part of the solution. Now, let’s execute the query using the new column to join the tables:

mysql> select "sbtest1",a.k,a.c,a.pad from
    -> sbtest1 a left join
    -> sbtest2 b using (real_id)
    -> where b.id is null union
    -> select "sbtest2",a.k,a.c,a.pad from
    -> sbtest2 a left join
    -> sbtest1 b using (real_id)
    -> where b.id is null;
Empty set (2.31 sec)

We can see an improvement in the query performance (it now takes 2.31 sec, whereas before it took 3.00 sec) and the result is as expected. We could say that that’s all, and that no further improvement can be made. However, that is not true. Even though the query is running faster, it is possible to optimize it in this way:

mysql> select "sbtest1",a.k,a.c,a.pad
    -> from sbtest1 a
    -> where a.id in (select a.id
    ->   from sbtest1 a left join
    ->   sbtest2 b using (real_id)
    ->   where b.id is null) union
    -> select "sbtest2",a.k,a.c,a.pad
    -> from sbtest2 a
    -> where a.id in (select a.id
    ->   from sbtest2 a left join
    ->   sbtest1 b using (real_id)
    ->   where b.id is null);
Empty set (1.60 sec)

Why is this faster? The first query is performing two subqueries. Each subquery is very similar. Let’s check the explain plan:

mysql> explain select "sbtest1",a.k,a.c,a.pad from
    -> sbtest1 a left join
    -> sbtest2 b using (real_id)
    -> where b.id is null;
+----+-------------+-------+------------+------+---------------+---------+---------+------------------+--------+----------+--------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref              | rows   | filtered | Extra                                |
+----+-------------+-------+------------+------+---------------+---------+---------+------------------+--------+----------+--------------------------------------+
|  1 | SIMPLE      | a     | NULL       | ALL  | NULL          | NULL    | NULL    | NULL             | 315369 |   100.00 | NULL                                 |
|  1 | SIMPLE      | b     | NULL       | ref  | real_id       | real_id | 17      | sbtest.a.real_id |     27 |    10.00 | Using where; Not exists; Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+------------------+--------+----------+--------------------------------------+

As you can see, it is performing a full table scan over the first table and using real_id to join the second table. The real_id is a generated column, so it needs to execute the function to get the value to join the second table. That means that it’s going to take time.

If we analyze the subquery of the second query:

mysql> explain select "sbtest1",a.k,a.c,a.pad
    -> from sbtest1 a
    -> where a.id in (select a.id
    ->   from sbtest1 a left join
    ->   sbtest2 b using (real_id)
    ->   where b.id is null);
+----+--------------+-------------+------------+--------+---------------+------------+---------+------------------+--------+----------+--------------------------------------+
| id | select_type  | table       | partitions | type   | possible_keys | key        | key_len | ref              | rows   | filtered | Extra                                |
+----+--------------+-------------+------------+--------+---------------+------------+---------+------------------+--------+----------+--------------------------------------+
|  1 | SIMPLE       | a           | NULL       | index  | PRIMARY       | k          | 187     | NULL             | 315369 |   100.00 | Using where; Using index             |
|  1 | SIMPLE       | <subquery2> | NULL       | eq_ref | <auto_key>    | <auto_key> | 4       | sbtest.a.id      |      1 |   100.00 | NULL                                 |
|  2 | MATERIALIZED | a           | NULL       | index  | PRIMARY       | real_id    | 17      | NULL             | 315369 |   100.00 | Using index                          |
|  2 | MATERIALIZED | b           | NULL       | ref    | real_id       | real_id    | 17      | sbtest.a.real_id |     27 |    10.00 | Using where; Not exists; Using index |
+----+--------------+-------------+------------+--------+---------------+------------+---------+------------------+--------+----------+--------------------------------------+

We can see that it is performing a full index scan over the first table, and that the generated column expression is never evaluated. That is how we can go from an inconsistent result in three seconds, to a consistent result in 2.31 seconds, and finally to a performant query that takes 1.60 seconds.

Conclusions

This is not the first blog post that I’ve done about generated columns. I think that it is a useful feature for several scenarios where you need to improve performance. In this particular case, it’s also presenting a workaround to expected inconsistencies with LEFT JOINS with NULL values. It is also important to mention that this improved a process in a real world scenario.

The post Finding Table Differences on Nullable Columns Using MySQL Generated Columns appeared first on Percona Database Performance Blog.

Sep
25
2018
--

Why Optimization derived_merge can Break Your Queries

MySQL optimizer bugs

Lately, I worked on several queries which started returning wrong results after upgrading MySQL Server to version 5.7. The reason for the failure was the derived merge optimization, which is one of the default optimizer_switch options. Issues were solved, though at the price of performance, when we turned it OFF. But, more importantly, we could not predict if any other query would start returning incorrect data, to allow us to fix the application before it was too late. Therefore I tried to find reasons why derived_merge can fail.

Analyzing the problem

In the first run, we turned SQL Mode ONLY_FULL_GROUP_BY on, and this removed most of the problematic queries. That said, a few of the queries that were successfully working with ONLY_FULL_GROUP_BY were affected.

A quick search in the MySQL bugs database gave me a not-so-short list of open bugs:

At first glance, the reported queries do not follow any pattern, and we still cannot quickly identify which would break and which would not.

Then I took a second look by running all of the provided test cases in my environment and found that for four bugs, the optimizer rewrote the query. For three of the bugs, it rewrote in both 5.7 and 8.0, and in one case it rewrote in 8.0 only.

The remaining three buggy queries (Bug #85117, Bug #91418, Bug #91878) have things in common. Let’s first look at them:

  1. Bug #85117
    select
        temp.sel
    from
        table1 t1
        left join (
            select *,1 as sel from table1_use t1u where t1u.`table1id`=1
        ) temp on temp.table1id = t1.id
    order by t1.value
  2. Bug #91418
    select
        TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID
    from
        TEST_SUB_PROBLEM TST
        join (
            select uuid() as DIST_UID, vw.*
            from (
                select DISTINCT BID, THING_NAME
                from TEST_SUB_PROBLEM
            ) vw
        ) vw2
    on vw2.BID = TST.BID;
  3. Bug #91878
    SELECT
        Virtual_Table.T_FP AS T_FP,
        (
            SELECT COUNT(Virtual_Table.T_FP)
            FROM t1 t
            WHERE t.f1 = Virtual_Table.T_FP
            AND Virtual_Table.T_FP = 731834939448428685
        ) AS Test_Value
    FROM (
        SELECT t.f1 AS T_FP, tv.f1 AS TV_FP
        FROM t1 AS t
        JOIN t2 AS tv
        ON t.f1 = tv.t1_f1
    ) AS Virtual_Table
    GROUP BY
        Virtual_Table.TV_FP
    HAVING
        Test_Value > 0;

Two of the queries use DISTINCT or GROUP BY, and one uses an ORDER BY clause. The cases do not have the same clause in common, which is what I’d expect to see, and so, surprisingly, these are not the cause of the failure. However, all three queries use generated values: a constant in the first one, and the UUID() and COUNT() functions in the second and third respectively. This similarity is something we need to investigate.

To find out why derived_merge might work incorrectly for these queries we need to understand how this optimization works and why it was introduced.

The intent behind derived_merge

First I recommend checking the official MySQL User Reference Manual and MariaDB knowledge base. It is correct to use both manuals: even if low-level implementations vary, the high-level architecture and the purpose of this optimization are the same.

In short: derived_merge is used for queries that have subqueries in the FROM clause, also called “derived tables”, and practically converts them into JOIN queries. This optimization avoids unnecessary materialization (creating internal temporary tables to hold results). It is virtually the same thing as manually rewriting a query with a subquery into a query that has JOIN clause(s) only. The only difference is that when we rewrite queries manually, we can compare the expected and actual result, then adjust the resulting query if needed. The MySQL optimizer has to do a correct rewrite at the first attempt. And sometimes this effort fails.
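As an illustration of what such a manual rewrite looks like when it is safe, the sketch below runs a derived-table query and its join-only equivalent and compares the results. It uses Python's built-in sqlite3 module rather than MySQL, and the customers and orders tables are hypothetical; it shows the equivalence the optimizer relies on, not MySQL's internal rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 50), (2, 1, 150), (3, 2, 300);
""")

# Derived-table form: the subquery in FROM could be materialized first.
derived = conn.execute("""
    SELECT c.name, o.total
    FROM customers c
    JOIN (SELECT customer_id, total FROM orders WHERE total > 100) o
      ON o.customer_id = c.id
    ORDER BY c.name, o.total
""").fetchall()

# Merged form: the same logic expressed with plain joins only.
merged = conn.execute("""
    SELECT c.name, o.total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE o.total > 100
    ORDER BY c.name, o.total
""").fetchall()

print(derived == merged)  # True
```

Here the derived table only filters rows, so merging it is harmless. The problematic cases discussed in this post are the ones where the derived table also computes a value.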

Let’s check why this happens for these particular queries, reported in the MySQL Bugs Database.

Case Study 1: a Query from Bug #85117

Original query

select
      temp.sel
from
    table1 t1
    left join (
         select *,1 as sel from table1_use t1u where t1u.`table1id`=1
    ) temp on temp.table1id = t1.id
order by t1.value

was rewritten to:

Note (Code 1003):
/* select#1 */
select 1 AS `sel`
    from
        `test`.`table1` `t1`
    left join
        (`test`.`table1_use` `t1u`)
    on(((`test`.`t1`.`id` = 1) and (`test`.`t1u`.`table1id` = 1)))
    where 1
    order by `test`.`t1`.`value`;

You can always find the query that the optimizer converts the original one to in the SHOW WARNINGS output following EXPLAIN [EXTENDED] for the query.

In this case, the original query asks to return all rows from the table table1, but selects only the generated field from the subquery. The subquery selects the only row with table1id=1.

Avoiding derived merge optimization is practically the same as joining table table1 with a table with one row. You can see how it works in this code snippet:

mysql> create temporary table temp as select *,1 as sel from table1_use t1u where t1u.`table1id`=1;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0
mysql> select * from temp;
+----+----------+------+-----+
| id | table1id | uid  | sel |
+----+----------+------+-----+
|  1 |        1 |   99 |   1 |
+----+----------+------+-----+
1 row in set (0.00 sec)
mysql> select temp.sel from table1 t1 left join temp on temp.table1id = t1.id order by t1.value;
+------+
| sel  |
+------+
|    1 |
| NULL |
| NULL |
+------+
3 rows in set (0.00 sec)

However, when the optimizer uses derived-merge optimization, it completely ignores the fact that the resulting table has one row, and that the calculated value would be either NULL or 1 depending on whether a row corresponding to table1 exists in the table. The fact that it prints select 1 AS `sel` in the EXPLAIN output while actually using select NULL AS `sel` does not change anything: both are wrong. The correct query without a subquery should look like:

mysql> select if(`test`.`t1u`.`table1id`, 1, NULL) AS `sel`
    -> from `test`.`table1` `t1`
    -> left join (`test`.`table1_use` `t1u`)
    -> on(((`test`.`t1`.`id` = 1) and (`test`.`t1u`.`table1id` = 1)))
    -> where 1
    -> order by `test`.`t1`.`value`;
+------+
| sel  |
+------+
|    1 |
| NULL |
| NULL |
+------+
3 rows in set (0.00 sec)

This report is the easiest of the bugs we will discuss in this post, and is also fixed in MariaDB.

Case Study 2: a Query from Bug #91418

mysql> select * from TEST_SUB_PROBLEM;
+-----+--------+------------+---------------------+
| UID | BID    | THING_NAME | OTHER_IFO           |
+-----+--------+------------+---------------------+
|   1 | thing1 | name1      | look a chicken      |
|   2 | thing1 | name1      | look an airplane    |
|   3 | thing2 | name2      | look a mouse        |
|   4 | thing3 | name3      | look a taperecorder |
|   5 | thing3 | name3      | look an explosion   |
|   6 | thing4 | name4      | look at the stars   |
+-----+--------+------------+---------------------+
6 rows in set (0.00 sec)
mysql> select
    ->     TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID
    -> from
    ->     TEST_SUB_PROBLEM TST
    -> join (
    ->     select uuid() as DIST_UID, vw.*
    ->     from (
    ->         select DISTINCT BID, THING_NAME
    ->         from TEST_SUB_PROBLEM
    ->     ) vw
    -> ) vw2
    -> on vw2.BID = TST.BID;
+-----+--------+------------+---------------------+--------------------------------------+
| UID | BID    | THING_NAME | OTHER_IFO           | DIST_UID                             |
+-----+--------+------------+---------------------+--------------------------------------+
|   1 | thing1 | name1      | look a chicken      | e4c288fd-b29c-11e8-b0d7-0242673a86b2 |
|   2 | thing1 | name1      | look an airplane    | e4c28aef-b29c-11e8-b0d7-0242673a86b2 |
|   3 | thing2 | name2      | look a mouse        | e4c28c47-b29c-11e8-b0d7-0242673a86b2 |
|   4 | thing3 | name3      | look a taperecorder | e4c28d92-b29c-11e8-b0d7-0242673a86b2 |
|   5 | thing3 | name3      | look an explosion   | e4c28ed9-b29c-11e8-b0d7-0242673a86b2 |
|   6 | thing4 | name4      | look at the stars   | e4c29031-b29c-11e8-b0d7-0242673a86b2 |
+-----+--------+------------+---------------------+--------------------------------------+
6 rows in set (0.00 sec)

This query should create a unique DIST_UID for each unique BID name. But, instead, it generates a unique ID for each row.

First, let’s split the query into a couple of queries using temporary tables, to confirm our assumption that it was written correctly in the first place:

mysql> create temporary table vw as select DISTINCT BID, THING_NAME from TEST_SUB_PROBLEM;
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0
mysql> select * from vw;
+--------+------------+
| BID    | THING_NAME |
+--------+------------+
| thing1 | name1      |
| thing2 | name2      |
| thing3 | name3      |
| thing4 | name4      |
+--------+------------+
4 rows in set (0.00 sec)
mysql> create temporary table vw2 as select uuid() as DIST_UID, vw.* from vw;
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0
mysql> select * from vw2;
+--------------------------------------+--------+------------+
| DIST_UID                             | BID    | THING_NAME |
+--------------------------------------+--------+------------+
| eb155f0e-b29d-11e8-b0d7-0242673a86b2 | thing1 | name1      |
| eb158c05-b29d-11e8-b0d7-0242673a86b2 | thing2 | name2      |
| eb159b28-b29d-11e8-b0d7-0242673a86b2 | thing3 | name3      |
| eb15a916-b29d-11e8-b0d7-0242673a86b2 | thing4 | name4      |
+--------------------------------------+--------+------------+
4 rows in set (0.00 sec)
mysql> select
    -> TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID
    -> from TEST_SUB_PROBLEM TST
    -> join vw2
    -> on vw2.BID = TST.BID;
+-----+--------+------------+---------------------+--------------------------------------+
| UID | BID    | THING_NAME | OTHER_IFO           | DIST_UID                             |
+-----+--------+------------+---------------------+--------------------------------------+
|   1 | thing1 | name1      | look a chicken      | eb155f0e-b29d-11e8-b0d7-0242673a86b2 |
|   2 | thing1 | name1      | look an airplane    | eb155f0e-b29d-11e8-b0d7-0242673a86b2 |
|   3 | thing2 | name2      | look a mouse        | eb158c05-b29d-11e8-b0d7-0242673a86b2 |
|   4 | thing3 | name3      | look a taperecorder | eb159b28-b29d-11e8-b0d7-0242673a86b2 |
|   5 | thing3 | name3      | look an explosion   | eb159b28-b29d-11e8-b0d7-0242673a86b2 |
|   6 | thing4 | name4      | look at the stars   | eb15a916-b29d-11e8-b0d7-0242673a86b2 |
+-----+--------+------------+---------------------+--------------------------------------+
6 rows in set (0.01 sec)
mysql> select distinct DIST_UID
    -> from (
    ->     select
    ->         TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID
    ->     from TEST_SUB_PROBLEM TST
    ->     join vw2
    ->     on vw2.BID = TST.BID
    -> ) t;
+--------------------------------------+
| DIST_UID                             |
+--------------------------------------+
| eb155f0e-b29d-11e8-b0d7-0242673a86b2 |
| eb158c05-b29d-11e8-b0d7-0242673a86b2 |
| eb159b28-b29d-11e8-b0d7-0242673a86b2 |
| eb15a916-b29d-11e8-b0d7-0242673a86b2 |
+--------------------------------------+
4 rows in set (0.00 sec)

With temporary tables, we have precisely four unique DIST_UID values, unlike the six values that our original query returned.

Let’s check how the original query was rewritten:

Note (Code 1003):
/* select#1 */
select
    `test`.`TST`.`UID` AS `UID`,
    `test`.`TST`.`BID` AS `BID`,
    `test`.`TST`.`THING_NAME` AS `THING_NAME`,
    `test`.`TST`.`OTHER_IFO` AS `OTHER_IFO`,
    uuid() AS `DIST_UID`
from `test`.`TEST_SUB_PROBLEM` `TST`
join
    (/* select#3 */
    select
        distinct `test`.`TEST_SUB_PROBLEM`.`BID` AS `BID`,
        `test`.`TEST_SUB_PROBLEM`.`THING_NAME` AS `THING_NAME`
    from `test`.`TEST_SUB_PROBLEM`) `vw`
where (`vw`.`BID` = `test`.`TST`.`BID`)

You can see that the optimizer did not wholly remove the subquery here. Let’s run this modified query, and run a test with a temporary table one more time:

mysql> select
    ->     `test`.`TST`.`UID` AS `UID`,
    ->     `test`.`TST`.`BID` AS `BID`,
    ->     `test`.`TST`.`THING_NAME` AS `THING_NAME`,
    ->     `test`.`TST`.`OTHER_IFO` AS `OTHER_IFO`,
    ->     uuid() AS `DIST_UID`
    -> from
    ->     `test`.`TEST_SUB_PROBLEM` `TST`
    -> join
    -> (/* select#3 */
    ->     select
    ->         distinct `test`.`TEST_SUB_PROBLEM`.`BID` AS `BID`,
    ->         `test`.`TEST_SUB_PROBLEM`.`THING_NAME` AS `THING_NAME`
    ->     from
    ->         `test`.`TEST_SUB_PROBLEM`
    -> ) `vw`
    -> where (`vw`.`BID` = `test`.`TST`.`BID`)
    -> ;
+-----+--------+------------+---------------------+--------------------------------------+
| UID | BID    | THING_NAME | OTHER_IFO           | DIST_UID                             |
+-----+--------+------------+---------------------+--------------------------------------+
|   1 | thing1 | name1      | look a chicken      | 12c5f554-b29f-11e8-b0d7-0242673a86b2 |
|   2 | thing1 | name1      | look an airplane    | 12c5f73a-b29f-11e8-b0d7-0242673a86b2 |
|   3 | thing2 | name2      | look a mouse        | 12c5f894-b29f-11e8-b0d7-0242673a86b2 |
|   4 | thing3 | name3      | look a taperecorder | 12c5f9de-b29f-11e8-b0d7-0242673a86b2 |
|   5 | thing3 | name3      | look an explosion   | 12c5fb20-b29f-11e8-b0d7-0242673a86b2 |
|   6 | thing4 | name4      | look at the stars   | 12c5fc7d-b29f-11e8-b0d7-0242673a86b2 |
+-----+--------+------------+---------------------+--------------------------------------+
6 rows in set (0.01 sec)

This time the changed query result is no different to the one we received from the original one. Let’s manually replace the subquery with temporary tables, and check if it affects the result again.

mysql> create temporary table vw
    -> select
    ->     distinct `test`.`TEST_SUB_PROBLEM`.`BID` AS `BID`,
    ->     `test`.`TEST_SUB_PROBLEM`.`THING_NAME` AS `THING_NAME`
    -> from `test`.`TEST_SUB_PROBLEM`;
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0
mysql> select * from vw;
+--------+------------+
| BID    | THING_NAME |
+--------+------------+
| thing1 | name1      |
| thing2 | name2      |
| thing3 | name3      |
| thing4 | name4      |
+--------+------------+
4 rows in set (0.00 sec)
mysql> select
    ->     `test`.`TST`.`UID` AS `UID`,
    ->     `test`.`TST`.`BID` AS `BID`,
    ->     `test`.`TST`.`THING_NAME` AS `THING_NAME`,
    ->     `test`.`TST`.`OTHER_IFO` AS `OTHER_IFO`,
    ->      uuid() AS `DIST_UID`
    -> from `test`.`TEST_SUB_PROBLEM` `TST`
    -> join vw where (`vw`.`BID` = `test`.`TST`.`BID`) ;
+-----+--------+------------+---------------------+--------------------------------------+
| UID | BID    | THING_NAME | OTHER_IFO           | DIST_UID                             |
+-----+--------+------------+---------------------+--------------------------------------+
|   1 | thing1 | name1      | look a chicken      | e11dbe61-b2a0-11e8-b0d7-0242673a86b2 |
|   2 | thing1 | name1      | look an airplane    | e11dc050-b2a0-11e8-b0d7-0242673a86b2 |
|   3 | thing2 | name2      | look a mouse        | e11dc1af-b2a0-11e8-b0d7-0242673a86b2 |
|   4 | thing3 | name3      | look a taperecorder | e11dc2be-b2a0-11e8-b0d7-0242673a86b2 |
|   5 | thing3 | name3      | look an explosion   | e11dc3a8-b2a0-11e8-b0d7-0242673a86b2 |
|   6 | thing4 | name4      | look at the stars   | e11dc4e9-b2a0-11e8-b0d7-0242673a86b2 |
+-----+--------+------------+---------------------+--------------------------------------+
6 rows in set (0.00 sec)

In this case, the temporary table contains the correct number of rows (four), but the outer query calculates a UUID value for every row in the table TEST_SUB_PROBLEM. It does not take into account that the user initially asked for a unique UUID for each unique BID, not for each unique UID. Instead, it simply moves the call to the UUID() function into the outer query, which creates a unique value for each row in TEST_SUB_PROBLEM and ignores the fact that the temporary table contains only four rows. In this case, it would not be easy to build an effective query that generates distinct UUID values for rows with different BIDs and the same UUID value for rows with the same BID.
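The intended semantics, one generated value per distinct BID that is shared by every row with that BID, can be illustrated outside SQL. The following is a minimal Python sketch and not a MySQL fix: the (UID, BID) pairs are copied from the example output above, the helper name assign_uuid_per_bid is hypothetical, and uuid4 stands in for MySQL's UUID() purely as a source of unique values.

```python
import uuid

# (UID, BID) pairs copied from the TEST_SUB_PROBLEM example output.
rows = [
    (1, "thing1"), (2, "thing1"), (3, "thing2"),
    (4, "thing3"), (5, "thing3"), (6, "thing4"),
]

def assign_uuid_per_bid(rows):
    """Return (UID, BID, DIST_UID) tuples with one UUID per distinct BID."""
    uuid_by_bid = {}
    result = []
    for uid, bid in rows:
        # Generate the value only the first time a BID is seen, then reuse it.
        if bid not in uuid_by_bid:
            uuid_by_bid[bid] = str(uuid.uuid4())
        result.append((uid, bid, uuid_by_bid[bid]))
    return result

labelled = assign_uuid_per_bid(rows)
for row in labelled:
    print(row)
```

Six rows come back, but only four distinct DIST_UID values appear, which is the result the user expected from the original query.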

Case Study 3: a Query from Bug #91878

This query is supposed to calculate a number of rows based on complex conditions:

SELECT
Virtual_Table.T_FP AS T_FP,
(SELECT COUNT(Virtual_Table.T_FP) FROM t1 t WHERE t.f1 = Virtual_Table.T_FP AND Virtual_Table.T_FP = 731834939448428685) AS Test_Value
FROM
(SELECT t.f1 AS T_FP, tv.f1 AS TV_FP FROM t1 AS t JOIN t2 AS tv ON t.f1 = tv.t1_f1) AS Virtual_Table
GROUP BY Virtual_Table.TV_FP
HAVING Test_Value > 0;

However, it returns no rows when it should return 22 (check the bug report for the full test case).

mysql> SELECT Virtual_Table.T_FP AS T_FP,
    -> (
    ->     SELECT
    ->         COUNT(Virtual_Table.T_FP)
    ->     FROM t1 t
    ->     WHERE
    ->         t.f1 = Virtual_Table.T_FP
    ->     AND
    ->         Virtual_Table.T_FP = 731834939448428685
    -> ) AS Test_Value
    -> FROM (
    ->     SELECT
    ->         t.f1 AS T_FP, tv.f1 AS TV_FP
    ->     FROM t1 AS t
    ->     JOIN t2 AS tv
    ->     ON t.f1 = tv.t1_f1
    -> ) AS Virtual_Table
    -> GROUP BY Virtual_Table.TV_FP
    -> HAVING Test_Value > 0;
Empty set (1.28 sec)

To find out why this happens, let’s perform a temporary table check first.

mysql> create temporary table Virtual_Table SELECT t.f1 AS T_FP, tv.f1 AS TV_FP FROM t1 AS t JOIN t2 AS tv ON t.f1 = tv.t1_f1;
Query OK, 18722 rows affected (2.12 sec)
Records: 18722  Duplicates: 0  Warnings: 0
mysql> SELECT Virtual_Table.T_FP AS T_FP,
    -> (SELECT COUNT(Virtual_Table.T_FP) FROM t1 t
    -> WHERE t.f1 = Virtual_Table.T_FP AND Virtual_Table.T_FP = 731834939448428685) AS Test_Value
    -> FROM  Virtual_Table GROUP BY Virtual_Table.TV_FP HAVING Test_Value > 0;
+--------------------+------------+
| T_FP               | Test_Value |
+--------------------+------------+
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
+--------------------+------------+
22 rows in set (1.62 sec)

The rewritten query returned the correct result, as we expected.

To identify why the original query fails, let’s check how the optimizer rewrote it:

Note (Code 1003):
/* select#1 */
select
    `test`.`t`.`f1` AS `T_FP`,
    (/* select#2 */
        select
            count(`test`.`t`.`f1`)
        from
            `test`.`t1` `t`
        where
            (('731834939448428685' = 731834939448428685)
        and (`test`.`t`.`f1` = 731834939448428685))
    ) AS `Test_Value`
    from
        `test`.`t1` `t`
    join
        `test`.`t2` `tv`
    where
        (`test`.`tv`.`t1_f1` = `test`.`t`.`f1`)
    group by `test`.`tv`.`f1`
    having (`Test_Value` > 0)

Interestingly, when I ran this query on the original tables, it returned all 18722 rows that exist in table t2.

This output means that we cannot entirely rely on the EXPLAIN output. But we can still see the same symptoms:

  • The subquery uses a function to generate a value
  • The subquery in the FROM clause is converted into a JOIN, and its values are accessible by an outer subquery

We also see that the query has GROUP BY and HAVING clauses, thus adding a complication.

The query is almost correct, but in this case, the optimizer mixed aliases: it uses the same alias in the internal query as in the external one. If you change the alias from t to t2 in the subquery, the rewritten query starts returning correct results:

mysql> select
    ->     `test`.`t`.`f1` AS `T_FP`,
    -> (/* select#2 */
    ->     select
    ->         count(`test`.`t`.`f1`)
    ->     from
    ->         `test`.`t1` `t`
    ->     where (
    ->         ('731834939448428685' = 731834939448428685)
    ->     and
    ->         (`test`.`t`.`f1` = 731834939448428685)
    ->     )
    -> ) AS `Test_Value`
    -> from
    ->     `test`.`t1` `t`
    -> join
    ->     `test`.`t2` `tv`
    -> where
    ->     (`test`.`tv`.`t1_f1` = `test`.`t`.`f1`)
    -> group by `test`.`tv`.`f1`
    -> having (`Test_Value` > 0);
...
| 731834939454553991 |          1 |
| 731834939453739998 |          1 |
+--------------------+------------+
18722 rows in set (0.49 sec)
mysql> select
    ->     `test`.`t`.`f1` AS `T_FP`,
    -> (/* select#2 */
    ->     select
    ->         count(`test`.`t`.`f1`)
    ->     from
    ->         `test`.`t1` `t2`
    ->     where (
    ->         (t2.f1=t.f1)
    ->     and
    ->         (`test`.`t`.`f1` = 731834939448428685)
    ->     )
    -> ) AS `Test_Value`
    -> from
    ->     `test`.`t1` `t`
    -> join
    ->     `test`.`t2` `tv`
    -> where
    ->     (`test`.`tv`.`t1_f1` = `test`.`t`.`f1`)
    -> group by `test`.`tv`.`f1`
    -> having (`Test_Value` > 0);
+--------------------+------------+
| T_FP               | Test_Value |
+--------------------+------------+
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
+--------------------+------------+
22 rows in set (1.82 sec)

While the calculated value is not the reason why this query returns incorrect results, it is similar to the previous examples because the optimizer does not take into account that the value of `test`.`t`.`f1` in the outer query is not necessarily equal to 731834939448428685.

It is also interesting that neither Oracle nor PostgreSQL accepts such a query; both complain about improper use of the GROUP BY clause. Meanwhile, MySQL accepts this query even with SQL Mode set to ONLY_FULL_GROUP_BY. This has been reported as bug #92020.

Conclusion and recommendations

While derived_merge is a very effective optimization, it can rewrite queries destructively. Safety measures when using this optimization are:

  1. Make sure that you use the latest version of MySQL/Percona/MariaDB servers, which include all of the new bug fixes.
  2. Treat generated values in subquery results, whether constants or values returned by functions, as a red flag.
  3. Relaxing SQL Mode ONLY_FULL_GROUP_BY is always dangerous and should not be used together with derived_merge.

As a last resort, you can consider rewriting queries to JOIN manually or turning the derived_merge optimization OFF.
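Before changing anything, it helps to know whether the optimization is currently enabled. The optimizer_switch system variable holds a comma-separated list of flag=value pairs, and the flag can be turned off with SET optimizer_switch='derived_merge=off'. The Python sketch below only parses such a string: the sample value is abbreviated and illustrative (a real server reports many more flags), and parse_optimizer_switch is a hypothetical helper name.

```python
def parse_optimizer_switch(value):
    """Turn 'a=on,b=off' into {'a': True, 'b': False}."""
    flags = {}
    for item in value.split(","):
        name, _, state = item.strip().partition("=")
        if name:
            flags[name] = (state == "on")
    return flags

# Abbreviated, illustrative value; not copied from a real server.
sample = "index_merge=on,derived_merge=on,semijoin=on"

flags = parse_optimizer_switch(sample)
if flags.get("derived_merge"):
    print("derived_merge is ON; derived tables may be merged into the outer query")
else:
    print("derived_merge is OFF or not reported by this server")
```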

 

The post Why Optimization derived_merge can Break Your Queries appeared first on Percona Database Performance Blog.

Jun
20
2018
--

Webinar Thu 6/21: How to Analyze and Tune MySQL Queries for Better Performance

Please join Percona’s MySQL Database Administrator, Brad Mickel, as he presents How to Analyze and Tune MySQL Queries for Better Performance on Thursday, June 21st, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

 

Query performance is essential in making any application successful. To fine-tune your queries, you first need to understand how MySQL executes them and what tools are available to help identify problems.

In this session you will learn:

  1. The common tools for researching problem queries
  2. What an Index is, and why you should use one
  3. Index limitations
  4. When to rewrite the query instead of just adding a new index

 

Brad Mickel

MySQL DBA

Bradley began working with MySQL in 2013 as part of his duties in healthcare billing. After three years in healthcare billing, he joined Percona as part of the bootcamp process. Since the bootcamp, he has served as a remote database administrator on the Atlas team for Percona Managed Services.

The post Webinar Thu 6/21: How to Analyze and Tune MySQL Queries for Better Performance appeared first on Percona Database Performance Blog.

Apr
06
2016
--

EXPLAIN FORMAT=JSON wrap-up

This blog is an EXPLAIN FORMAT=JSON wrap-up for the series of posts I’ve done in the last few months.

In this series, we’ve discussed everything unique to EXPLAIN FORMAT=JSON. I intentionally skipped a description of members such as table_name, access_type or select_id, which are not unique.

In this series, I only mentioned in passing members that replace information from the Extra column in the regular EXPLAIN output, such as using_join_buffer, partitions, using_temporary_table or simply message. You can see these in queries like the following:

mysql> explain format=json select rand() from dual\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "message": "No tables used"
  }
}
1 row in set, 1 warning (0.00 sec)

Or

mysql> explain format=json select emp_no from titles where 'Senior Engineer' = 'Senior Cat'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "message": "Impossible WHERE"
  }
}
1 row in set, 1 warning (0.01 sec)

Their use is fairly intuitive, similar to regular EXPLAIN, and I don’t think one would gain anything from reading a blog post about each of them.

The only thing left to list is a Table of Contents for the series:

attached_condition: How EXPLAIN FORMAT=JSON can spell-check your queries

rows_examined_per_scan, rows_produced_per_join: EXPLAIN FORMAT=JSON answers on question “What number of filtered rows mean?”

used_columns: EXPLAIN FORMAT=JSON tells when you should use covered indexes

used_key_parts: EXPLAIN FORMAT=JSON provides insight into which part of multiple-column key is used

EXPLAIN FORMAT=JSON: everything about attached_subqueries, optimized_away_subqueries, materialized_from_subquery

EXPLAIN FORMAT=JSON provides insights on optimizer_switch effectiveness

EXPLAIN FORMAT=JSON: order_by_subqueries, group_by_subqueries details on subqueries in ORDER BY and GROUP BY

grouping_operation, duplicates_removal: EXPLAIN FORMAT=JSON has all details about GROUP BY

EXPLAIN FORMAT=JSON has details for subqueries in HAVING, nested selects and subqueries that update values

ordering_operation: EXPLAIN FORMAT=JSON knows everything about ORDER BY processing

EXPLAIN FORMAT=JSON knows everything about UNIONs: union_result and query_specifications

EXPLAIN FORMAT=JSON: buffer_result is not hidden!

EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

EXPLAIN FORMAT=JSON: nested_loop makes JOIN hierarchy transparent

Thanks for following the series!

Feb
29
2016
--

EXPLAIN FORMAT=JSON: nested_loop makes JOIN hierarchy transparent

Once again it’s time for another EXPLAIN FORMAT=JSON is cool! post. This post will discuss how the nested_loop member of the EXPLAIN FORMAT=JSON output makes the JOIN operation hierarchy transparent.

The regular EXPLAIN command lists each table that participates in a JOIN operation on a single row. This works perfectly for simple queries:

mysql> explain select * from employees join titles join salaries\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299379
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: titles
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 442724
     filtered: 100.00
        Extra: Using join buffer (Block Nested Loop)
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2745434
     filtered: 100.00
        Extra: Using join buffer (Block Nested Loop)
3 rows in set, 1 warning (0.00 sec)

You can see that the first accessed table was employees, then titles and finally salaries. Everything is clear.

EXPLAIN FORMAT=JSON in this case puts everything into the nested_loop array (even if “MySQL isn’t limited to nested-loop joins”):

mysql> explain format=json select * from employees join titles join salaries\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "7.277755124e16"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "employees",
          "access_type": "ALL",
          "rows_examined_per_scan": 299379,
          "rows_produced_per_join": 299379,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "929.00",
            "eval_cost": "59875.80",
            "prefix_cost": "60804.80",
            "data_read_per_join": "13M"
          },
          "used_columns": [
            "emp_no",
            "birth_date",
            "first_name",
            "last_name",
            "gender",
            "hire_date"
          ]
        }
      },
      {
        "table": {
          "table_name": "titles",
          "access_type": "ALL",
          "rows_examined_per_scan": 442724,
          "rows_produced_per_join": 132542268396,
          "filtered": "100.00",
          "using_join_buffer": "Block Nested Loop",
          "cost_info": {
            "read_cost": "62734.88",
            "eval_cost": "26508453679.20",
            "prefix_cost": "26508577218.88",
            "data_read_per_join": "7T"
          },
          "used_columns": [
            "emp_no",
            "title",
            "from_date",
            "to_date"
          ]
        }
      },
      {
        "table": {
          "table_name": "salaries",
          "access_type": "ALL",
          "rows_examined_per_scan": 2745434,
          "rows_produced_per_join": 363886050091503872,
          "filtered": "100.00",
          "using_join_buffer": "Block Nested Loop",
          "cost_info": {
            "read_cost": "314711040856.92",
            "eval_cost": "7.277721002e16",
            "prefix_cost": "7.277755124e16",
            "data_read_per_join": "5171P"
          },
          "used_columns": [
            "emp_no",
            "salary",
            "from_date",
            "to_date"
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)

For a simple query, this output does not add much beyond cost information, the list of used columns, and details on the efficiency of composite indexes.
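Because the output is plain JSON, the join order and the running cost can be pulled out programmatically. This is a minimal Python sketch, assuming the EXPLAIN text has already been fetched into a string; the plan below is trimmed from the output above to the fields the code reads, and join_order is a hypothetical helper name.

```python
import json

# Trimmed copy of the EXPLAIN FORMAT=JSON output shown above.
plan_text = """
{
  "query_block": {
    "select_id": 1,
    "cost_info": {"query_cost": "7.277755124e16"},
    "nested_loop": [
      {"table": {"table_name": "employees", "access_type": "ALL",
                 "cost_info": {"prefix_cost": "60804.80"}}},
      {"table": {"table_name": "titles", "access_type": "ALL",
                 "using_join_buffer": "Block Nested Loop",
                 "cost_info": {"prefix_cost": "26508577218.88"}}},
      {"table": {"table_name": "salaries", "access_type": "ALL",
                 "using_join_buffer": "Block Nested Loop",
                 "cost_info": {"prefix_cost": "7.277755124e16"}}}
    ]
  }
}
"""

def join_order(plan):
    """Return (table_name, access_type, prefix_cost) for each joined table."""
    steps = []
    for entry in plan["query_block"].get("nested_loop", []):
        table = entry["table"]
        steps.append((table["table_name"],
                      table["access_type"],
                      table["cost_info"]["prefix_cost"]))
    return steps

steps = join_order(json.loads(plan_text))
for position, (name, access, cost) in enumerate(steps, start=1):
    print(f"{position}. {name} (access_type={access}, prefix_cost={cost})")
```

The array order matches the order of rows in the regular EXPLAIN output, and the prefix_cost of the last table equals the overall query_cost.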

But what if you not only join tables, but also use other SQL language features? For example, for the query below, which has two JOIN operations and two subqueries, a regular EXPLAIN returns this plan:

mysql> explain select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <subquery2>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: dept_manager
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: <subquery2>.emp_no
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: employees
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: <subquery2>.emp_no
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 4. row ***************************
           id: 2
  select_type: MATERIALIZED
        table: <derived3>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 9
     filtered: 100.00
        Extra: NULL
*************************** 5. row ***************************
           id: 3
  select_type: DERIVED
        table: titles
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: NULL
         rows: 442724
     filtered: 7.51
        Extra: Using where; Using index; Using temporary; Using filesort; LooseScan
*************************** 6. row ***************************
           id: 3
  select_type: DERIVED
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.titles.emp_no
         rows: 9
     filtered: 100.00
        Extra: NULL
6 rows in set, 1 warning (0.00 sec)

It’s pretty hard to understand which part is a subquery and which is not. It is also hard to find out if DERIVED belongs to the first JOIN or to the second. And I am not quite sure why <subquery2> was marked as PRIMARY, which is supposed to indicate “Outermost SELECT”.

The real issue here is that the internal representation of JOIN is hierarchical, and MySQL Server (as in the case of UNION) has trouble representing such an object as a “flat” table.

EXPLAIN FORMAT=JSON, with its hierarchical nature, can help us in this case.

mysql> explain format=json select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "39.45"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "<subquery2>",
          "access_type": "ALL",
          "materialized_from_subquery": {
            "using_temporary_table": true,
            "query_block": {
              "table": {
                "table_name": "t",
                "access_type": "ALL",
                "rows_examined_per_scan": 9,
                "rows_produced_per_join": 9,
                "filtered": "100.00",
                "cost_info": {
                  "read_cost": "10.45",
                  "eval_cost": "1.80",
                  "prefix_cost": "12.25",
                  "data_read_per_join": "144"
                },
                "used_columns": [
                  "emp_no",
                  "salary"
                ],
                "materialized_from_subquery": {
                  "using_temporary_table": true,
                  "dependent": false,
                  "cacheable": true,
                  "query_block": {
                    "select_id": 3,
                    "cost_info": {
                      "query_cost": "176246.11"
                    },
                    "grouping_operation": {
                      "using_temporary_table": true,
                      "using_filesort": true,
                      "cost_info": {
                        "sort_cost": "9.54"
                      },
                      "nested_loop": [
                        {
                          "table": {
                            "table_name": "titles",
                            "access_type": "index",
                            "possible_keys": [
                              "PRIMARY",
                              "emp_no"
                            ],
                            "key": "emp_no",
                            "used_key_parts": [
                              "emp_no"
                            ],
                            "key_length": "4",
                            "rows_examined_per_scan": 442724,
                            "rows_produced_per_join": 33229,
                            "filtered": "7.51",
                            "using_index": true,
                            "loosescan": true,
                            "cost_info": {
                              "read_cost": "3380.56",
                              "eval_cost": "6645.94",
                              "prefix_cost": "63199.96",
                              "data_read_per_join": "2M"
                            },
                            "used_columns": [
                              "emp_no",
                              "title",
                              "from_date"
                            ],
                            "attached_condition": "(`employees`.`titles`.`title` like '%manager%')"
                          }
                        },
                        {
                          "table": {
                            "table_name": "salaries",
                            "access_type": "ref",
                            "possible_keys": [
                              "PRIMARY",
                              "emp_no"
                            ],
                            "key": "PRIMARY",
                            "used_key_parts": [
                              "emp_no"
                            ],
                            "key_length": "4",
                            "ref": [
                              "employees.titles.emp_no"
                            ],
                            "rows_examined_per_scan": 9,
                            "rows_produced_per_join": 9,
                            "filtered": "100.00",
                            "cost_info": {
                              "read_cost": "49622.62",
                              "eval_cost": "1.91",
                              "prefix_cost": "176236.57",
                              "data_read_per_join": "152"
                            },
                            "used_columns": [
                              "emp_no",
                              "salary",
                              "from_date"
                            ]
                          }
                        }
                      ]
                    }
                  }
                }
              }
            }
          }
        }
      },
      {
        "table": {
          "table_name": "dept_manager",
          "access_type": "ref",
          "possible_keys": [
            "PRIMARY",
            "emp_no"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "emp_no"
          ],
          "key_length": "4",
          "ref": [
            "<subquery2>.emp_no"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 9,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "9.00",
            "eval_cost": "1.80",
            "prefix_cost": "23.05",
            "data_read_per_join": "144"
          },
          "used_columns": [
            "dept_no",
            "emp_no",
            "from_date",
            "to_date"
          ]
        }
      },
      {
        "table": {
          "table_name": "employees",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "emp_no"
          ],
          "key_length": "4",
          "ref": [
            "<subquery2>.emp_no"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 1,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "1.00",
            "eval_cost": "0.20",
            "prefix_cost": "39.45",
            "data_read_per_join": "48"
          },
          "used_columns": [
            "emp_no",
            "birth_date",
            "first_name",
            "last_name",
            "gender",
            "hire_date"
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.01 sec)

First, we see that all our tables, JOIN operations and subqueries are in the nested_loop array:

"nested_loop": [
      {
        "table": {
          "table_name": "<subquery2>",
...
      {
        "table": {
          "table_name": "dept_manager",
...
      {
        "table": {
          "table_name": "employees",
...
      }
    ]

Then we see that the first table, <subquery2>, was materialized_from_subquery:

"table": {
          "table_name": "<subquery2>",
          "access_type": "ALL",
          "materialized_from_subquery": {
...

That subquery, in turn, was materialized_from_subquery too:

"table": {
          "table_name": "<subquery2>",
          "access_type": "ALL",
          "materialized_from_subquery": {
...
                "materialized_from_subquery": {
...

This last subquery performs grouping_operation on the other nested_loop (JOIN) of tables titles and salaries:

"grouping_operation": {
                      "using_temporary_table": true,
                      "using_filesort": true,
                      "cost_info": {
                        "sort_cost": "9.54"
                      },
                      "nested_loop": [
                        {
                          "table": {
                            "table_name": "titles",
...
                        },
                        {
                          "table": {
                            "table_name": "salaries",
...

Now we have a better picture of how the query was optimized: tables titles and salaries were joined first, then GROUP BY was executed on the result, then the result was materialized and queried. The result of the query select emp_no from <materialized> t was materialized again as <subquery2>, and only then was it joined with the two other tables.
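The same reading of the hierarchy can be automated with a small recursive walk. The Python sketch below is only an illustration: the plan is reduced from the output above to the members that define its structure, the outline function and the STRUCTURAL tuple are hypothetical names, and only the three structural members discussed in this post are recognized.

```python
import json

# Structure-only copy of the EXPLAIN FORMAT=JSON output shown above.
plan_text = """
{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {"table": {
        "table_name": "<subquery2>",
        "materialized_from_subquery": {
          "query_block": {
            "table": {
              "table_name": "t",
              "materialized_from_subquery": {
                "query_block": {
                  "select_id": 3,
                  "grouping_operation": {
                    "nested_loop": [
                      {"table": {"table_name": "titles"}},
                      {"table": {"table_name": "salaries"}}
                    ]
                  }
                }
              }
            }
          }
        }
      }},
      {"table": {"table_name": "dept_manager"}},
      {"table": {"table_name": "employees"}}
    ]
  }
}
"""

STRUCTURAL = ("nested_loop", "grouping_operation", "materialized_from_subquery")

def outline(node, depth=0):
    """Yield indented lines describing tables and structural members."""
    if isinstance(node, list):
        for item in node:
            yield from outline(item, depth)
    elif isinstance(node, dict):
        for key, value in node.items():
            if key == "table":
                yield "  " * depth + "table " + value["table_name"]
                yield from outline(value, depth + 1)
            elif key in STRUCTURAL:
                yield "  " * depth + key
                yield from outline(value, depth + 1)
            else:
                # Other members (query_block, cost_info, ...) add no level.
                yield from outline(value, depth)

lines = list(outline(json.loads(plan_text)))
print("\n".join(lines))
```

Reading the printed outline from the deepest indentation upward gives the execution order described above: titles and salaries first, then the grouping, then the two materializations, and finally the join with dept_manager and employees.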

Conclusion: EXPLAIN FORMAT=JSON helps to understand how complex queries are optimized.

Feb
22
2016
--

EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

Time for another entry in the EXPLAIN FORMAT=JSON is cool! series of blog posts. This time we’ll discuss how the cost_info member of the EXPLAIN FORMAT=JSON output shows why the optimizer prefers one index to another.

Tables often have more than one index. Any of these indexes can be used to resolve a query, so the optimizer has to make a choice. One of the metrics that can help it choose is the potential cost of the query evaluation.

For example, let’s take the table titles from the standard employees database:

mysql> show create table titles\G
*************************** 1. row ***************************
       Table: titles
Create Table: CREATE TABLE `titles` (
  `emp_no` int(11) NOT NULL,
  `title` varchar(50) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date DEFAULT NULL,
  PRIMARY KEY (`emp_no`,`title`,`from_date`),
  KEY `emp_no` (`emp_no`),
  CONSTRAINT `titles_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

It has two indexes, emp_no and PRIMARY, each of which could be used to resolve the query:

select distinct title from titles where year(from_date) > '1990';

At first glance, emp_no doesn’t really fit this query. PRIMARY does fit, because it contains both the title and from_date fields. Unfortunately, it cannot be used to resolve the query, because we don’t limit the search by emp_no and title. It can, however, be used to select rows from the index. When we use EXPLAIN, though, it shows us that the optimizer has chosen index emp_no (every secondary index in InnoDB contains a link to the clustered index anyway):

mysql> explain select distinct title from titles where year(from_date) > '1990'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: titles
   partitions: NULL
         type: index
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: NULL
         rows: 442724
     filtered: 100.00
        Extra: Using where; Using index; Using temporary
1 row in set, 1 warning (0.00 sec)

PRIMARY KEY exists in the field possible_keys, but was not chosen. EXPLAIN FORMAT=JSON can show us why.

First let’s run it on the original query:

mysql> explain format=json select distinct title from titles where year(from_date) > '1990'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "89796.80"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "using_filesort": false,
      "table": {
        "table_name": "titles",
        "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "emp_no"
        ],
        "key": "emp_no",
        "used_key_parts": [
          "emp_no"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 442724,
        "rows_produced_per_join": 442724,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "1252.00",
          "eval_cost": "88544.80",
          "prefix_cost": "89796.80",
          "data_read_per_join": "27M"
        },
        "used_columns": [
          "emp_no",
          "title",
          "from_date"
        ],
        "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"
      }
    }
  }
}
1 row in set, 1 warning (0.01 sec)

The important part here is:

"cost_info": {
      "query_cost": "89796.80"
    },

Which shows that the overall

query_cost

  is 89796.80. We don’t really know what the units are for this cost, or how it is actually measured. It isn’t important; the only thing that matters for now is that smaller is better. (Think of it like shopping for a product: it doesn’t matter which store you buy it from, just that you buy it at the lowest price.)

Another important member of the output is

cost_info

, which belongs to the table itself:

"cost_info": {
          "read_cost": "1252.00",
          "eval_cost": "88544.80",
          "prefix_cost": "89796.80",
          "data_read_per_join": "27M"
        },

Here we get even more details, such as the cost of the read operation and the cost of evaluation.

prefix_cost

  is not useful for this example, because it contains the cost of joining to the next table in

JOIN

. Since we don’t join the table

titles

  with any other table, the value of

prefix_cost

  is equivalent to the cost of the full query.

data_read_per_join

  contains the amount of data that should be read for each

JOIN

  operation. In our case it is once again the same as how much data we should read to fully evaluate the query.
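To make the relationship between these members concrete, here is a minimal sketch in Python that reads the table's cost_info as copied from the output above and checks the arithmetic. The names TABLE_JSON and cost_parts are illustrative only, and the per-row figure at the end is just an observation about this particular output, not a documented constant.

```python
import json
from decimal import Decimal

# cost_info for table `titles`, copied from the EXPLAIN FORMAT=JSON output above.
TABLE_JSON = """
{
  "table_name": "titles",
  "rows_produced_per_join": 442724,
  "cost_info": {
    "read_cost": "1252.00",
    "eval_cost": "88544.80",
    "prefix_cost": "89796.80",
    "data_read_per_join": "27M"
  }
}
"""


def cost_parts(table):
    """Return read_cost, eval_cost and prefix_cost as exact decimals."""
    info = table["cost_info"]
    return tuple(Decimal(info[k]) for k in ("read_cost", "eval_cost", "prefix_cost"))


table = json.loads(TABLE_JSON)
read_cost, eval_cost, prefix_cost = cost_parts(table)

# With a single table, reading plus evaluating accounts for the whole prefix cost.
print(read_cost + eval_cost == prefix_cost)

# Observation for this output only: evaluation cost per produced row.
per_row = eval_cost / table["rows_produced_per_join"]
print(per_row)
```

The costs are stored as strings in the JSON, so the sketch converts them with Decimal rather than float to keep the comparison exact.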

Now let’s force index

PRIMARY

  and examine the 

EXPLAIN FORMAT=JSON

  output:

mysql> explain format=json select distinct title from titles force index(primary) where year(from_date) > '1990'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "531269.80"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "using_filesort": false,
      "table": {
        "table_name": "titles",
        "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "emp_no"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "emp_no",
          "title",
          "from_date"
        ],
        "key_length": "59",
        "rows_examined_per_scan": 442724,
        "rows_produced_per_join": 442724,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "442725.00",
          "eval_cost": "88544.80",
          "prefix_cost": "531269.80",
          "data_read_per_join": "27M"
        },
        "used_columns": [
          "emp_no",
          "title",
          "from_date"
        ],
        "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"
      }
    }
  }
}
1 row in set, 1 warning (0.01 sec)

Notice the numbers are different this time. The total query cost is 531269.80, which is about 6 times greater than 89796.80:

"cost_info": {
      "query_cost": "531269.80"
    },

read_cost

  is 442725.00, which is about 353 times greater than 1252.00. However, the 

eval_cost

  and

data_read_per_join

  are the same as the query that uses index

emp_no

 :

"cost_info": {
          "read_cost": "442725.00",
          "eval_cost": "88544.80",
          "prefix_cost": "531269.80",
          "data_read_per_join": "27M"
        },

These numbers clearly explain why the optimizer prefers the index 

emp_no

  to

PRIMARY KEY

.
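The comparison can also be done mechanically. The following is a small sketch in Python, assuming the cost figures have been copied by hand from the two outputs above into a dictionary; the names plans and ratio are hypothetical helpers, not part of any MySQL tooling.

```python
from decimal import Decimal

# Figures copied from the two EXPLAIN FORMAT=JSON outputs above.
plans = {
    "emp_no": {"query_cost": "89796.80", "read_cost": "1252.00", "eval_cost": "88544.80"},
    "PRIMARY": {"query_cost": "531269.80", "read_cost": "442725.00", "eval_cost": "88544.80"},
}


def ratio(field):
    """How many times larger the field is for PRIMARY than for emp_no."""
    return Decimal(plans["PRIMARY"][field]) / Decimal(plans["emp_no"][field])


for field in ("query_cost", "read_cost", "eval_cost"):
    print(f"{field}: PRIMARY / emp_no = {ratio(field):.2f}")

# The index with the lowest total cost is the one the optimizer picked.
cheapest = min(plans, key=lambda name: Decimal(plans[name]["query_cost"]))
print("cheapest:", cheapest)

# The whole difference in query_cost comes from read_cost.
extra_total = Decimal(plans["PRIMARY"]["query_cost"]) - Decimal(plans["emp_no"]["query_cost"])
extra_read = Decimal(plans["PRIMARY"]["read_cost"]) - Decimal(plans["emp_no"]["read_cost"])
print(extra_total == extra_read)
```

Since eval_cost is identical for both plans, the read cost alone decides the choice in this example.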

In our example above this behavior is correct. In a real life scenario, if the optimizer’s choice is wrong, these numbers can show either that there is a bug in the optimizer or that the table’s statistics are outdated and need to be updated.

Conclusion:

EXPLAIN FORMAT=JSON

  can be used together with

FORCE INDEX

  to find out why the optimizer prefers one index to another.

Feb
09
2016
--

EXPLAIN FORMAT=JSON: buffer_result is not hidden!

Time for another entry in the EXPLAIN FORMAT=JSON is cool! series. Today we’re going to look at how you can view the buffer result using JSON (instead of the regular

EXPLAIN

 command).

Regular

EXPLAIN

 does not identify if

SQL_BUFFER_RESULT

 was used at all. To demonstrate, let’s run this query:

mysql> explain select * from salaries\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.01 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,`employees`.`salaries`.`from_date` AS `from_date`,`employees`.`salaries`.`to_date` AS `to_date` from `employees`.`salaries`

Now, let’s compare it to this query:

mysql> explain select sql_buffer_result * from salaries\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: Using temporary
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,`employees`.`salaries`.`from_date` AS `from_date`,`employees`.`salaries`.`to_date` AS `to_date` from `employees`.`salaries`

Notice there is no difference, except the expected

"Using temporary"

 value in the

"Extra"

 row of the second query. The field 

"Using temporary"

  is expected here, because

SQL_BUFFER_RESULT

  directly instructs the MySQL server to put a result set into a temporary table to free locks. But what if the query uses the temporary table by itself? For example, for a grouping operation? In this case, the 

EXPLAIN

 result for the original query and the query that contains the 

SQL_BUFFER_RESULT

  clause will be 100% identical.

Compare:

mysql> explain select emp_no, salary/avg(salary) from salaries group by emp_no, salary\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`

With:

mysql> explain select sql_buffer_result emp_no, salary/avg(salary) from salaries group by emp_no, salary\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`

There is no difference! We are not able to tell if we used a temporary table to resolve the query, or simply put the result set into the buffer. The 

EXPLAIN FORMAT=JSON

  command can help in this case as well. Its output is clear, and shows all the details of the query optimization:

mysql> explain format=json select sql_buffer_result emp_no, salary/avg(salary) from salaries group by emp_no, salary\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3073970.40"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2557022.00"
      },
      "buffer_result": {
        "using_temporary_table": true,
        "table": {
          "table_name": "salaries",
          "access_type": "ALL",
          "rows_examined_per_scan": 2557022,
          "rows_produced_per_join": 2557022,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "5544.00",
            "eval_cost": "511404.40",
            "prefix_cost": "516948.40",
            "data_read_per_join": "39M"
          },
          "used_columns": [
            "emp_no",
            "salary",
            "from_date"
          ]
        }
      }
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`

Firstly, we can see how the

grouping_operation

 was optimized:

"grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,

And it does indeed use the temporary table.

Now we can follow the details for

SQL_BUFFER_RESULT

:

"buffer_result": {
        "using_temporary_table": true,

With this output, we can be absolutely certain that the temporary table was created for both the  

SQL_BUFFER_RESULT

 and the grouping operation. This is especially helpful for support engineers who need the 

EXPLAIN

  output to help their customers to tune queries, but are afraid to ask for the same query twice — once with the 

SQL_BUFFER_RESULT

 clause and once without.
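If you receive such output from a customer, the same check can be scripted. Below is a minimal sketch in Python, using a trimmed copy of the JSON above; the function name temporary_table_users is hypothetical, and the sketch assumes only that the flag is spelled using_temporary_table as it appears in the output.

```python
import json

# Trimmed copy of the EXPLAIN FORMAT=JSON output above (cost details omitted).
EXPLAIN_JSON = """
{
  "query_block": {
    "select_id": 1,
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "buffer_result": {
        "using_temporary_table": true,
        "table": {
          "table_name": "salaries",
          "access_type": "ALL"
        }
      }
    }
  }
}
"""


def temporary_table_users(node, path=()):
    """Yield the path of every object whose using_temporary_table is true."""
    if isinstance(node, dict):
        if node.get("using_temporary_table") is True:
            yield "/".join(path)
        for key, value in node.items():
            yield from temporary_table_users(value, path + (key,))
    elif isinstance(node, list):
        for item in node:
            yield from temporary_table_users(item, path)


plan = json.loads(EXPLAIN_JSON)
users = list(temporary_table_users(plan))
for user in users:
    print(user)
```

Each printed path names one operation that needed a temporary table, so the grouping operation and the buffer result are reported separately.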

Conclusion:

EXPLAIN FORMAT=JSON

  does not hide important details for query optimizations.

Jan
29
2016
--

EXPLAIN FORMAT=JSON knows everything about UNIONs: union_result and query_specifications

Ready for another post in the EXPLAIN FORMAT=JSON is Cool series! Great! This post will discuss how to see all the information that is contained in optimized queries with

UNION

 using the

union_result

 and

query_specifications

 members.

 

When optimizing complicated queries with

UNION

, it is easy to get lost in the regular

EXPLAIN

  output trying to identify which part of the output belongs to each part of the

UNION

.

Let’s consider the following example:

mysql> explain
    ->     select emp_no, last_name, 'low_salary' from employees
    ->     where emp_no in (select emp_no from salaries
    ->         where salary < (select avg(salary) from salaries))
    -> union
    ->     select emp_no, last_name, 'high salary' from employees
    ->     where emp_no in (select emp_no from salaries
    ->         where salary >= (select avg(salary) from salaries))\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: employees
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299778
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.employees.emp_no
         rows: 9
     filtered: 33.33
        Extra: Using where; FirstMatch(employees)
*************************** 3. row ***************************
           id: 3
  select_type: SUBQUERY
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: NULL
*************************** 4. row ***************************
           id: 4
  select_type: UNION
        table: employees
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299778
     filtered: 100.00
        Extra: NULL
*************************** 5. row ***************************
           id: 4
  select_type: UNION
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.employees.emp_no
         rows: 9
     filtered: 33.33
        Extra: Using where; FirstMatch(employees)
*************************** 6. row ***************************
           id: 6
  select_type: SUBQUERY
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: NULL
*************************** 7. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: <union1,4>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Using temporary
7 rows in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'low_salary' AS `low_salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))) union /* select#4 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'high salary' AS `high salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`)))

While we can guess that subquery 3 belongs to the first query of the union, and subquery 6 belongs to the second (which has number 4 for some reason), we have to be very careful (especially in our case) when queries use the same tables in both parts of the

UNION

.

The main issue with the regular

EXPLAIN

 for

UNION

  is that it has to represent the hierarchical structure as a table. The same issue occurs when you want to store objects created in a programming language, such as Java, in the database.

EXPLAIN FORMAT=JSON

, on the other hand, has a hierarchical structure and more clearly displays how

UNION

 was optimized:

mysql> explain format=json select emp_no, last_name, 'low_salary' from employees where emp_no in (select emp_no from salaries  where salary < (select avg(salary) from salaries)) union select emp_no, last_name, 'high salary' from employees where emp_no in (select emp_no from salaries where salary >= (select avg(salary) from salaries))\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "union_result": {
      "using_temporary_table": true,
      "table_name": "<union1,4>",
      "access_type": "ALL",
      "query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
            "cost_info": {
              "query_cost": "921684.48"
            },
            "nested_loop": [
              {
                "table": {
                  "table_name": "employees",
                  "access_type": "ALL",
                  "possible_keys": [
                    "PRIMARY"
                  ],
                  "rows_examined_per_scan": 299778,
                  "rows_produced_per_join": 299778,
                  "filtered": "100.00",
                  "cost_info": {
                    "read_cost": "929.00",
                    "eval_cost": "59955.60",
                    "prefix_cost": "60884.60",
                    "data_read_per_join": "13M"
                  },
                  "used_columns": [
                    "emp_no",
                    "last_name"
                  ]
                }
              },
              {
                "table": {
                  "table_name": "salaries",
                  "access_type": "ref",
                  "possible_keys": [
                    "PRIMARY",
                    "emp_no"
                  ],
                  "key": "PRIMARY",
                  "used_key_parts": [
                    "emp_no"
                  ],
                  "key_length": "4",
                  "ref": [
                    "employees.employees.emp_no"
                  ],
                  "rows_examined_per_scan": 9,
                  "rows_produced_per_join": 299778,
                  "filtered": "33.33",
                  "first_match": "employees",
                  "cost_info": {
                    "read_cost": "302445.97",
                    "eval_cost": "59955.60",
                    "prefix_cost": "921684.48",
                    "data_read_per_join": "4M"
                  },
                  "used_columns": [
                    "emp_no",
                    "salary"
                  ],
                  "attached_condition": "(`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))",
                  "attached_subqueries": [
                    {
                      "dependent": false,
                      "cacheable": true,
                      "query_block": {
                        "select_id": 3,
                        "cost_info": {
                          "query_cost": "516948.40"
                        },
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ALL",
                          "rows_examined_per_scan": 2557022,
                          "rows_produced_per_join": 2557022,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "5544.00",
                            "eval_cost": "511404.40",
                            "prefix_cost": "516948.40",
                            "data_read_per_join": "39M"
                          },
                          "used_columns": [
                            "salary"
                          ]
                        }
                      }
                    }
                  ]
                }
              }
            ]
          }
        },
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 4,
            "cost_info": {
              "query_cost": "921684.48"
            },
            "nested_loop": [
              {
                "table": {
                  "table_name": "employees",
                  "access_type": "ALL",
                  "possible_keys": [
                    "PRIMARY"
                  ],
                  "rows_examined_per_scan": 299778,
                  "rows_produced_per_join": 299778,
                  "filtered": "100.00",
                  "cost_info": {
                    "read_cost": "929.00",
                    "eval_cost": "59955.60",
                    "prefix_cost": "60884.60",
                    "data_read_per_join": "13M"
                  },
                  "used_columns": [
                    "emp_no",
                    "last_name"
                  ]
                }
              },
              {
                "table": {
                  "table_name": "salaries",
                  "access_type": "ref",
                  "possible_keys": [
                    "PRIMARY",
                    "emp_no"
                  ],
                  "key": "PRIMARY",
                  "used_key_parts": [
                    "emp_no"
                  ],
                  "key_length": "4",
                  "ref": [
                    "employees.employees.emp_no"
                  ],
                  "rows_examined_per_scan": 9,
                  "rows_produced_per_join": 299778,
                  "filtered": "33.33",
                  "first_match": "employees",
                  "cost_info": {
                    "read_cost": "302445.97",
                    "eval_cost": "59955.60",
                    "prefix_cost": "921684.48",
                    "data_read_per_join": "4M"
                  },
                  "used_columns": [
                    "emp_no",
                    "salary"
                  ],
                  "attached_condition": "(`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))",
                  "attached_subqueries": [
                    {
                      "dependent": false,
                      "cacheable": true,
                      "query_block": {
                        "select_id": 6,
                        "cost_info": {
                          "query_cost": "516948.40"
                        },
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ALL",
                          "rows_examined_per_scan": 2557022,
                          "rows_produced_per_join": 2557022,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "5544.00",
                            "eval_cost": "511404.40",
                            "prefix_cost": "516948.40",
                            "data_read_per_join": "39M"
                          },
                          "used_columns": [
                            "salary"
                          ]
                        }
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'low_salary' AS `low_salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))) union /* select#4 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'high salary' AS `high salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`)))

First it puts member

union_result

 in the

query_block

  at the very top level:

EXPLAIN: {
  "query_block": {
    "union_result": {

The

union_result

 object contains information about how the result set of the

UNION

 was processed:

"using_temporary_table": true,
      "table_name": "<union1,4>",
      "access_type": "ALL",

It also contains the 

query_specifications

 array, which holds all the details about the queries in the

UNION

:

"query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
<skipped>
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 4,

This representation is much clearer, and also contains all the details which the regular

EXPLAIN

misses for regular queries.
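Because the structure is hierarchical, the question "which subquery belongs to which part of the UNION?" can be answered with a short traversal. Here is a sketch in Python over a heavily trimmed copy of the output above; the function name select_ids and the owners dictionary are hypothetical, and only the member names that appear in the output are relied upon.

```python
import json

# Heavily trimmed copy of the EXPLAIN FORMAT=JSON output above.
EXPLAIN_JSON = """
{
  "query_block": {
    "union_result": {
      "using_temporary_table": true,
      "table_name": "<union1,4>",
      "query_specifications": [
        {"query_block": {"select_id": 1, "nested_loop": [
          {"table": {"table_name": "employees"}},
          {"table": {"table_name": "salaries", "attached_subqueries": [
            {"query_block": {"select_id": 3, "table": {"table_name": "salaries"}}}
          ]}}
        ]}},
        {"query_block": {"select_id": 4, "nested_loop": [
          {"table": {"table_name": "employees"}},
          {"table": {"table_name": "salaries", "attached_subqueries": [
            {"query_block": {"select_id": 6, "table": {"table_name": "salaries"}}}
          ]}}
        ]}}
      ]
    }
  }
}
"""


def select_ids(node):
    """Yield every select_id found in node, outermost first."""
    if isinstance(node, dict):
        if "select_id" in node:
            yield node["select_id"]
        for value in node.values():
            yield from select_ids(value)
    elif isinstance(node, list):
        for item in node:
            yield from select_ids(item)


plan = json.loads(EXPLAIN_JSON)
owners = {}
for spec in plan["query_block"]["union_result"]["query_specifications"]:
    # The first id is the UNION part itself, the rest are its subqueries.
    part, *subqueries = select_ids(spec["query_block"])
    owners[part] = subqueries

print(owners)
```

With the regular EXPLAIN the same mapping had to be guessed from the order of the rows.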

Conclusion:

EXPLAIN FORMAT=JSON

 not only contains additional optimization information for each query in the

UNION

, but also has a hierarchical structure that is more suitable for the hierarchical nature of the

UNION

.

Jan
25
2016
--

EXPLAIN FORMAT=JSON has details for subqueries in HAVING, nested selects and subqueries that update values

Over several previous blog posts, we’ve already discussed what information the 

EXPLAIN FORMAT=JSON

 output provides for some subqueries. You can review those discussions here, here and here. EXPLAIN FORMAT=JSON shows many details that you can’t get with other commands. Let’s now finish this topic and discuss the output for the rest of the subquery types.

First, let’s look at the subquery in the 

HAVING

 clause, such as in the following example:

select count(emp_no), salary
from salaries
group by salary
having salary > ALL (select avg(s)
                     from (select dept_no, sum(salary) as s
                           from salaries join dept_emp using (emp_no) group by dept_no) t
                     )

This example prints each salary and the number of employees who have it, if that salary is greater than the average of the per-department salary totals.

EXPLAIN FORMAT=JSON

 provides a lot of details on how this subquery is optimized:

mysql> explain format=json select count(emp_no), salary from salaries group by salary having salary > ALL (select avg(s) from (select dept_no, sum(salary) as s from salaries join dept_emp using (emp_no) group by dept_no) t)\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3073970.40"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2557022.00"
      },
      "table": {
        "table_name": "salaries",
        "access_type": "ALL",
        "rows_examined_per_scan": 2557022,
        "rows_produced_per_join": 2557022,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "5544.00",
          "eval_cost": "511404.40",
          "prefix_cost": "516948.40",
          "data_read_per_join": "39M"
        },
        "used_columns": [
          "emp_no",
          "salary",
          "from_date"
        ]
      },
      "having_subqueries": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "771970.25"
            },
            "table": {
              "table_name": "t",
              "access_type": "ALL",
              "rows_examined_per_scan": 3087841,
              "rows_produced_per_join": 3087841,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "154402.05",
                "eval_cost": "617568.20",
                "prefix_cost": "771970.25",
                "data_read_per_join": "94M"
              },
              "used_columns": [
                "dept_no",
                "s"
              ],
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "dependent": false,
                "cacheable": true,
                "query_block": {
                  "select_id": 3,
                  "cost_info": {
                    "query_cost": "1019140.27"
                  },
                  "grouping_operation": {
                    "using_filesort": false,
                    "nested_loop": [
                      {
                        "table": {
                          "table_name": "dept_emp",
                          "access_type": "index",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no",
                            "dept_no"
                          ],
                          "key": "dept_no",
                          "used_key_parts": [
                            "dept_no"
                          ],
                          "key_length": "4",
                          "rows_examined_per_scan": 331570,
                          "rows_produced_per_join": 331570,
                          "filtered": "100.00",
                          "using_index": true,
                          "cost_info": {
                            "read_cost": "737.00",
                            "eval_cost": "66314.00",
                            "prefix_cost": "67051.00",
                            "data_read_per_join": "5M"
                          },
                          "used_columns": [
                            "emp_no",
                            "dept_no"
                          ]
                        }
                      },
                      {
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ref",
                          "possible_keys": [
                            "PRIMARY",
                            "emp_no"
                          ],
                          "key": "PRIMARY",
                          "used_key_parts": [
                            "emp_no"
                          ],
                          "key_length": "4",
                          "ref": [
                            "employees.dept_emp.emp_no"
                          ],
                          "rows_examined_per_scan": 9,
                          "rows_produced_per_join": 3087841,
                          "filtered": "100.00",
                          "cost_info": {
                            "read_cost": "334520.92",
                            "eval_cost": "617568.35",
                            "prefix_cost": "1019140.27",
                            "data_read_per_join": "47M"
                          },
                          "used_columns": [
                            "emp_no",
                            "salary",
                            "from_date"
                          ]
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)`,`employees`.`salaries`.`salary` AS `salary` from `employees`.`salaries` group by `employees`.`salaries`.`salary` having <not>((`employees`.`salaries`.`salary` <= <max>(/* select#2 */ select avg(`t`.`s`) from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t`)))
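The nesting of query blocks in this output can also be listed programmatically. The following is a minimal sketch in Python over a trimmed copy of the JSON above; the function name outline is hypothetical, and the sketch assumes that subquery containers are either named materialized_from_subquery or end in _subqueries, as they do in the outputs shown in this series.

```python
import json

# Trimmed copy of the EXPLAIN FORMAT=JSON output above (cost details omitted).
EXPLAIN_JSON = """
{
  "query_block": {
    "select_id": 1,
    "grouping_operation": {
      "table": {"table_name": "salaries"},
      "having_subqueries": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "table": {
              "table_name": "t",
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "query_block": {
                  "select_id": 3,
                  "grouping_operation": {"using_filesort": false}
                }
              }
            }
          }
        }
      ]
    }
  }
}
"""


def outline(node, role="top level", depth=0):
    """Yield (depth, select_id, role) for every query block, outermost first."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "query_block":
                yield depth, value["select_id"], role
                yield from outline(value, role, depth + 1)
            elif key.endswith("_subqueries") or key == "materialized_from_subquery":
                # Remember which kind of subquery the nested blocks belong to.
                yield from outline(value, key, depth)
            else:
                yield from outline(value, role, depth)
    elif isinstance(node, list):
        for item in node:
            yield from outline(item, role, depth)


blocks = list(outline(json.loads(EXPLAIN_JSON)))
for depth, select_id, role in blocks:
    print("  " * depth + f"select #{select_id} ({role})")
```

The indentation in the printed outline mirrors how deeply each query block is nested.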

We see that the subquery in the 

HAVING

 clause is not dependent, but cacheable:

"having_subqueries": [
        {
          "dependent": false,
          "cacheable": true,

It has its own query block:

"query_block": {
            "select_id": 2,

Which accesses table “t”:

"table": {
              "table_name": "t",
              "access_type": "ALL",
              "rows_examined_per_scan": 3087841,
              "rows_produced_per_join": 3087841,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "154402.05",
                "eval_cost": "617568.20",
                "prefix_cost": "771970.25",
                "data_read_per_join": "94M"
              },
              "used_columns": [
                "dept_no",
                "s"
              ],

Table “t” is itself materialized from a subquery:

],
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "dependent": false,
                "cacheable": true,
                "query_block": {
                  "select_id": 3,
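
Picking these flags out of a long plan by eye is tedious, so here is a minimal sketch of how they could be collected programmatically. It is not part of MySQL or any Percona tool: the function name find_subqueries and the sample plan are hypothetical. The sample is a hand-trimmed fragment that keeps only the keys quoted in this post, and real output nests them under additional keys.

```python
import json

# Keys under which EXPLAIN FORMAT=JSON nests subquery blocks.
# Only the ones that appear in this post are listed (an assumption:
# other subquery kinds may use keys not covered here).
SUBQUERY_KEYS = (
    "having_subqueries",
    "select_list_subqueries",
    "update_value_subqueries",
    "materialized_from_subquery",
)


def find_subqueries(node, found=None):
    """Recursively collect (kind, select_id, dependent, cacheable) tuples."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in SUBQUERY_KEYS:
                # The value is a list of subqueries, or a single object
                # in the case of materialized_from_subquery.
                items = value if isinstance(value, list) else [value]
                for item in items:
                    block = item.get("query_block", {})
                    found.append((key, block.get("select_id"),
                                  item.get("dependent"), item.get("cacheable")))
            # Keep descending: subqueries can be nested inside each other.
            find_subqueries(value, found)
    elif isinstance(node, list):
        for item in node:
            find_subqueries(item, found)
    return found


# Hypothetical, hand-trimmed plan modelled on the HAVING example.
plan = json.loads("""
{
  "query_block": {
    "select_id": 1,
    "having_subqueries": [
      {
        "dependent": false,
        "cacheable": true,
        "query_block": {
          "select_id": 2,
          "table": {
            "table_name": "t",
            "materialized_from_subquery": {
              "using_temporary_table": true,
              "dependent": false,
              "cacheable": true,
              "query_block": {"select_id": 3}
            }
          }
        }
      }
    ]
  }
}
""")

for kind, select_id, dependent, cacheable in find_subqueries(plan):
    print(f"select #{select_id}: {kind}, dependent={dependent}, cacheable={cacheable}")
```

For the fragment above, the walker reports select #2 (the HAVING subquery) and select #3 (the materialized derived table), both non-dependent and cacheable, which matches what we read from the plan by hand.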

Another kind of subquery is in the SELECT list. If we want to compare the salary of an employee with the average salary in the company, for example, we can use the query select emp_no, salary, (select avg(salary) from salaries) from salaries. Let's examine the EXPLAIN output:

mysql> explain format=json select emp_no, salary, (select avg(salary) from salaries) from salaries\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "516948.40"
    },
    "table": {
      "table_name": "salaries",
      "access_type": "ALL",
      "rows_examined_per_scan": 2557022,
      "rows_produced_per_join": 2557022,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "5544.00",
        "eval_cost": "511404.40",
        "prefix_cost": "516948.40",
        "data_read_per_join": "39M"
      },
      "used_columns": [
        "emp_no",
        "salary"
      ]
    },
    "select_list_subqueries": [
      {
        "dependent": false,
        "cacheable": true,
        "query_block": {
          "select_id": 2,
          "cost_info": {
            "query_cost": "516948.40"
          },
          "table": {
            "table_name": "salaries",
            "access_type": "ALL",
            "rows_examined_per_scan": 2557022,
            "rows_produced_per_join": 2557022,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "5544.00",
              "eval_cost": "511404.40",
              "prefix_cost": "516948.40",
              "data_read_per_join": "39M"
            },
            "used_columns": [
              "salary"
            ]
          }
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,(/* select#2 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`) AS `(select avg(salary) from salaries)` from `employees`.`salaries`

EXPLAIN FORMAT=JSON in this case shows that the subquery is part of the first query_block, not dependent and cacheable.

The last type of subquery I want to discuss is a subquery that supplies the values for an UPDATE. For example, I added a new column to the titles table from the standard employees database:

mysql> alter table titles add column full_title varchar(100);
Query OK, 0 rows affected (24.42 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now I want full_title to contain both the department’s name and the title, separated by a space. I can use UPDATE with a subquery to achieve this:

update titles
set full_title=concat((select dept_name
                       from departments
                       join dept_emp using(dept_no)
                       where dept_emp.emp_no=titles.emp_no and dept_emp.to_date='9999-01-01')
               ,' ', title)
where to_date = '9999-01-01';

To find out how it is optimized, we can use EXPLAIN FORMAT=JSON:

mysql> explain format=json update titles set full_title=concat((select dept_name from departments join dept_emp using(dept_no) where dept_emp.emp_no=titles.emp_no and dept_emp.to_date='9999-01-01') ,' ', title) where to_date = '9999-01-01'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "update": true,
      "table_name": "titles",
      "access_type": "index",
      "key": "PRIMARY",
      "used_key_parts": [
        "emp_no",
        "title",
        "from_date"
      ],
      "key_length": "59",
      "rows_examined_per_scan": 442843,
      "filtered": "100.00",
      "using_temporary_table": "for update",
      "attached_condition": "(`employees`.`titles`.`to_date` = '9999-01-01')"
    },
    "update_value_subqueries": [
      {
        "dependent": true,
        "cacheable": false,
        "query_block": {
          "select_id": 2,
          "cost_info": {
            "query_cost": "1.35"
          },
          "nested_loop": [
            {
              "table": {
                "table_name": "dept_emp",
                "access_type": "ref",
                "possible_keys": [
                  "PRIMARY",
                  "emp_no",
                  "dept_no"
                ],
                "key": "PRIMARY",
                "used_key_parts": [
                  "emp_no"
                ],
                "key_length": "4",
                "ref": [
                  "employees.titles.emp_no"
                ],
                "rows_examined_per_scan": 1,
                "rows_produced_per_join": 0,
                "filtered": "10.00",
                "cost_info": {
                  "read_cost": "1.00",
                  "eval_cost": "0.02",
                  "prefix_cost": "1.22",
                  "data_read_per_join": "1"
                },
                "used_columns": [
                  "emp_no",
                  "dept_no",
                  "to_date"
                ],
                "attached_condition": "(`employees`.`dept_emp`.`to_date` = '9999-01-01')"
              }
            },
            {
              "table": {
                "table_name": "departments",
                "access_type": "eq_ref",
                "possible_keys": [
                  "PRIMARY"
                ],
                "key": "PRIMARY",
                "used_key_parts": [
                  "dept_no"
                ],
                "key_length": "4",
                "ref": [
                  "employees.dept_emp.dept_no"
                ],
                "rows_examined_per_scan": 1,
                "rows_produced_per_join": 0,
                "filtered": "100.00",
                "cost_info": {
                  "read_cost": "0.11",
                  "eval_cost": "0.02",
                  "prefix_cost": "1.35",
                  "data_read_per_join": "5"
                },
                "used_columns": [
                  "dept_no",
                  "dept_name"
                ]
              }
            }
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1276): Field or reference 'employees.titles.emp_no' of SELECT #2 was resolved in SELECT #1

We can see in this output that the subquery is dependent, not cacheable, and will be executed for each row that needs to be updated.
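
To get a feel for what "executed for each row" can add up to, here is a back-of-the-envelope sketch using the two figures from the plan above. It is my own rough estimate, not something the optimizer reports. It assumes one subquery execution per examined row, which is a ceiling: the plan does not tell us how many rows actually match to_date = '9999-01-01'.

```python
# Figures copied from the EXPLAIN output above.
rows_examined_per_scan = 442843  # outer index scan of `titles`
subquery_cost = 1.35             # "query_cost" of select #2, per execution

# Assumption: every examined row triggers one subquery execution.
# The WHERE clause on to_date reduces this, so treat it as a ceiling.
upper_bound = rows_examined_per_scan * subquery_cost

print(f"subquery cost ceiling: {upper_bound:,.2f} cost units")
```

Even though a single execution of the subquery is cheap, the ceiling is of the same order as the full-table-scan costs shown in the earlier examples, which is why dependent subqueries deserve a second look.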

Conclusion:

EXPLAIN FORMAT=JSON provides detailed information about all kinds of subqueries: where each one is used, whether it is dependent, and whether it is cacheable.
