Using Explain Analyze in MySQL 8

Explain Analyze in MySQL

Explain Analyze in MySQLIn MySQL 8.0.18 there is a new feature called Explain Analyze when for many years we mostly had only the traditional Explain. I know there are different formats, but those based on the same information just show it in a different format with some extra details.

But Explain Analyze is a different concept. It is actually going to run the query and measure execution time by using the new iterator executor for each step. That topic itself deserves its own blog post on how the new iterator executor works, and I will write a post about that as well. But if you cannot wait and you would like to read up, here are some links to some additional information: Iterator executor analytics queries, Volcano iterator design, and Volcano iterator semijoin.

In this post, we are going to focus on what Explain Analyze can give us.

As always let’s start with some testing. I have my test server with a sbtest1 table:

`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB AUTO_INCREMENT=262125 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

There are almost 1 million rows in it:

mysql> select count(*) from sbtest1;

| count(*) |
| 999999 |
1 row in set (0.32 sec)

Traditional Explain

mysql> explain select count(*) from sbtest1 where k > 500000\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sbtest1
partitions: NULL
type: range
possible_keys: idx3
key: idx3
key_len: 4
ref: NULL
rows: 493204
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

Hopefully, most of you already know this output; we can see which table and index will be used and we can also see it is a range query and approximately it has to read 493204 Rows. With InnoDB we know that is just an estimation, it is not the real number. In the past, we had two options: either run the query and see the real number or run the query and check the handler statistics to get even more detailed information.

Now I am going to run the query and see the real row count:

mysql> select count(*) from sbtest1 where k > 500000\G
*************************** 1. row ***************************
count(*): 625262
1 row in set (0.10 sec)

So the query has to read 625262 rows and takes 0.10s.  Let’s have a look at Explain Analyze.

Explain Analyze

mysql> explain analyze select count(*) from sbtest1 where k > 500000\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (actual time=178.225..178.225 rows=1 loops=1)
-> Filter: (sbtest1.k > 500000) (cost=98896.53 rows=493204) (actual time=0.022..147.502 rows=625262 loops=1)
-> Index range scan on sbtest1 using idx3 (cost=98896.53 rows=493204) (actual time=0.021..96.488 rows=625262 loops=1)

1 row in set (0.18 sec)

It always uses the tree format and it took me some time to actually understand what all this information means. Unfortunately, the manual page does not really explain it.

Let me try to fill the gaps:

Index range scan on sbtest1 using idx3 –  this part is quite trivial, it says it is going to use a range scan on sbtest1 table and will use the idx3 index.

cost=98896.53 rows=493204 –  this is the cost and the same row number that traditional Explain gives us.

actual time=0.021..96.488 rows=625262 loops=1:

  • 0.021 – The time to return first row (Init + first Read) in milliseconds.
  • 96.488 – The time to return all rows (Init + all Read calls) in milliseconds.
  • rows=625262 – The number of rows returned by this iterator. Finally, it is the exact number.
  • loops=1 – The number of loops (number of Init calls).

The source for this information is here: Implement EXPLAIN ANALYZE.

We can see all this information with each step, which gives us great help and insights for query optimization.

But it is important to notice in the last step that the time is 178ms and it also reports 1 row in set (0.18 sec). So it says this query takes 0.18s! But the original query took only 0.10s.

The manual says:

This has naturally some overhead, but it seems this is only around 5–6% for an instrumented query
(entirely free for a non-instrumented query).

But with this very simple query, the overhead is already 80%, so we cannot really trust those numbers.

Let’s have a look on another query which takes more time, and see how this difference behaves:

mysql> select count(*) from sbtest1 t1 left join sbtest1 t2 on t1.k=t2.k;
| count(*) |
| 77902613 |
1 row in set (15.42 sec)

mysql> explain analyze select count(*) from sbtest1 t1 left join sbtest1 t2 on t1.k=t2.k\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (actual time=26501.232..26501.233 rows=1 loops=1)
-> Nested loop left join (cost=810799.35 rows=4550475) (actual time=0.036..22572.530 rows=77902616 loops=1)
-> Index scan on t1 using idx3 (cost=108280.56 rows=986408) (actual time=0.026..176.645 rows=999999 loops=1)
-> Index lookup on t2 using idx3 (k=t1.k) (cost=0.25 rows=5) (actual time=0.001..0.017 rows=78 loops=999999)

1 row in set (26.50 sec)

It reports that the query is 60% slower than in real life.

Just out of curiosity I tried Explain Analyze with a Stored Procedure, but it didn’t work. One of the most painful tasks is to analyze and optimize a Stored Procedure because MySQL does not log the individual queries from a Procedure and it’s hard to actually see what is going under the hood.  (In Percona Server there are log_slow_sp_statements which will allow you to log the individual queries into a slow query log.)

I would love to see if Explain Analyze would run the procedure and then display the actual execution plan in a tree format. But maybe I am believing in a perfect world and I am too naive.

I also noticed some interesting behavior:

mysql> explain analyze select count(*) from sbtest1\G
*************************** 1. row ***************************
EXPLAIN: -> Count rows in sbtest1

1 row in set (0.13 sec)

If there isn’t a where condition, or group by, or anything else and just a simple count query, it does not give us the time for that step or how many rows were read or which index was used. I think here its a bit too minimalistic and some basic information would be good to see.


I think Explain Analyze is a great step in the right direction, and I am excited to see if in the future it gets even more features, but it should report more realistic times for execution time.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com