Abstract:
By diving into the details of our case study, we will explain how incorrect table statistics may lead the optimizer to choose a suboptimal execution plan. We will also go into how MySQL calculates the table statistics and the ways to correct the table statistics to prevent it from happening again.
Case study: Incorrect table statistics lead the optimizer to choose a poor execution plan.
A customer reported a drastic performance degradation of a query while there were no code changes and no configuration changes made. The data in this article has been edited for brevity and modified to mitigate the exposure of confidential information. The case has also been approved for publication by the customer.
We obtained the query execution plan, and got the results as shown below (execution plan #1):
mysql> explain -> SELECT count(con.id) , -> MAX(DAYNAME(con.date)) , -> now() , -> pcz.type, -> pcz.c_c -> FROM con AS con -> join orders o on con.o_id = o.id -> JOIN pcz AS pcz ON o.d_p_c_z_id = pcz.id -> left join c c on con.c_id = c.id -> WHERE con.date = current_date() and pcz.type = "T_D" -> GROUP BY con.date, pcz.c_c, pcz.type; +----+-------------+-------+------------+--------+-------------------+----------+---------+----------------------------+------+----------+----------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+-------------------+----------+---------+----------------------------+------+----------+----------------------------------------------+ | 1 | SIMPLE | pcz | NULL | ALL | PRIMARY | NULL | NULL | NULL | 194 | 10.00 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | o | NULL | ref | PRIMARY,dpcz_FK | dpcz_FK | 9 | custom.pcz.id | 1642 | 100.00 | Using index | | 1 | SIMPLE | con | NULL | ref | FK_order,IDX_date | FK_order | 8 | custom.o.id | 1 | 4.23 | Using where | | 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 8 | custom.con.c_id | 1 | 100.00 | Using index | +----+-------------+-------+------------+--------+-------------------+----------+---------+----------------------------+------+----------+----------------------------------------------+
The estimated rows to be examined = (194*10%)*1642*(1*4.23%)=1347
Upon taking a closer look at the query, we see the condition: con.date = current_date() . This condition seems to be a better choice to help filter results, but why did the MySQL optimizer skip out on using the index? Let’s take a look at the execution plan by forcing the use of the index on the con.date field. The explain output (execution plan #2) will be:
mysql> explain -> SELECT count(con.id) , -> MAX(DAYNAME(con.date)) , -> now() , -> pcz.type, -> pcz.c_c -> FROM con AS con USE INDEX(IDX_date) -> join orders o on con.o_id = o.id -> JOIN p_c_z AS pcz ON o.d_p_c_z_id = pcz.id -> left join c c on con.c_id = c.id -> WHERE con.date = current_date() and pcz.type = "T_D" -> GROUP BY con.date, pcz.c_c, pcz.type; +----+-------------+-------+------------+--------+-----------------+----------+---------+---------------------------------------+--------+----------+---------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+-----------------+----------+---------+---------------------------------------+--------+----------+---------------------------------+ | 1 | SIMPLE | con | NULL | ref | IDX_date | IDX_date | 3 | const | 110446 | 100.00 | Using temporary; Using filesort | | 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 8 | custom.con.c_id | 1 | 100.00 | Using index | | 1 | SIMPLE | o | NULL | eq_ref | PRIMARY,dpcz_FK | PRIMARY | 8 | custom.con.o_id | 1 | 100.00 | Using where | | 1 | SIMPLE | pcz | NULL | eq_ref | PRIMARY | PRIMARY | 8 | custom.o.d_p_c_z_id | 1 | 10.00 | Using where | +----+-------------+-------+------------+--------+-----------------+----------+---------+---------------------------------------+--------+----------+---------------------------------+
The estimated rows to be examined = 110446*(1*10%)=11045 rows
Based on the estimate, because 1347 is around one-tenth of 11045, it is reasonable for MySQL to choose execution plan #1, as it appears to be optimal.
However, in comparing the expected response times with the actual results, something was clearly off. While the second execution plan returned results within the expected time (before the performance degradation), execution plan #1 exceeded the estimated time for response.
Taking a further look at the structure of the table orders and execution plan #1, we found out that there are actually 194 rows of the table pcz. Moreover, looking at the index orders.dpcz_FK, the table orders will return 1642 rows, because of the FOREIGN KEY constraint orders_ibfk_10
as shown below – this implies that the number of rows in the table orders should be 194*1642=318548, but the actual number of rows of the table orders is 32508150, which is more than 100 times the estimated amount of 318548.
CREATE TABLE `orders` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, ... `d_p_c_z_id` bigint(20) DEFAULT NULL, ..., PRIMARY KEY (`id`), ... KEY `dpcz_FK` (`d_p_c_z_id`), ... CONSTRAINT `orders_ibfk_10` FOREIGN KEY (`d_p_c_z_id`) REFERENCES `p_c_z` (`id`) ON DELETE CASCADE ON UPDATE CASCADE, ... ) ENGINE=InnoDB .... mysql> select * from mysql.innodb_table_stats where database_name='cutom' and table_name='orders'; +---------------+------------+---------------------+----------+----------------------+--------------------------+ | database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes | +---------------+------------+---------------------+----------+----------------------+--------------------------+ | custom | orders | 2022-03-03 21:58:18 | 32508150 | 349120 | 697618 | +---------------+------------+---------------------+----------+----------------------+--------------------------+
As such, we suspected that the table statistics of orders.dpcz_FK are not accurate. We verified it by running the test below:
mysql> select * from mysql.innodb_index_stats where database_name='cutom' and table_name='orders' and index_name='dpcz_FK'; mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK'; +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | custom | orders | dpcz_FK | 2022-02-28 12:35:30 | n_diff_pfx01 | 19498 | 50 | d_p_c_z_id | | custom | orders | dpcz_FK | 2022-02-28 12:35:30 | n_diff_pfx02 | 32283087 | 128 | d_p_c_z_id,id | | custom | orders | dpcz_FK | 2022-02-28 12:35:30 | n_leaf_pages | 55653 | NULL | Number of leaf pages in the index | | custom | orders | dpcz_FK | 2022-02-28 12:35:30 | size | 63864 | NULL | Number of pages in the index | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ mysql> select count(distinct d_p_c_z_id) from orders; +----------------------------------------------+ | count(distinct d_p_c_z_id) | +----------------------------------------------+ | 195 | +----------------------------------------------+
Bingo! The actual cardinality of the column d_p_c_z_id (of which the index dpcz_FK index on) of the table orders is 195. In the statistics table mysql.innodb_index_stats the stat_value for the index dpcz_FK is 19498, which is incorrect and is a large difference from the actual value! Moreover, the stat_value for an index is supposed to be the cardinality of the column of the table.
Using the correct stat_value 195 for the index dpcz_FK, we can get the actual number of rows to return in line 2 of execution plan #1 to be 32508150/195=166708, and then the estimated rows to be examined will be (194*10%)*166708*(1*4.23%)=136804. As this new value is over 10 times larger than 11045, the estimated row size of execution plan #2, MySQL will now correctly select execution plan #2 without us having to force the use of the index.
But why did MySQL calculate the table statistics incorrectly, and how can we fix it?
To answer that question, we first need to know how MySQL calculates the table statistics and what parameters control the process. Then the path to the solution can be discovered easily.
How InnoDB Calculates the Table Statistics
The Table Statistics can be collected explicitly or automatically. People usually enable(also by default) innodb_stats_auto_recalc to automatically recalculate persistent statistics after the data in a table is changed substantially. When 10% of the rows in the table are changed, InnoDB will recalculate the statistics. Alternatively, we can use ANALYZE TABLE to recalculate statistics explicitly.
Behind the specifications, InnoDB uses the sampling technique known as a random dive which samples random pages from each index on a table to estimate the cardinality of the index. The innodb_stats_persistent_sample_pages controls the number of sampled pages. Refer to the link.
As per the code and description, random sampling is not fully random. The sampling pages are actually selected based on the sampling algorithm. Ultimately, the total number of different key values, namely, the stat_value of the index will be calculated by the formula: N * R * N_DIFF_AVG_LEAF. where:
N: the number of leaf pages
R: the ratio of the number of different key values on level LA to the total number of records on level LA
N_DIFF_AVG_LEAF: the average number of different key values found in all the A leaf pages.
The details of the sampling algorithm code can be found in the link:
With the above understanding, we know that when the index of a table is fragmented, both the number of leaf pages(N) and the ratio of the number of different key values on level LA to the total number of records on level LA(R) becomes more and more inaccurate, and so the calculation of the stat_value may be incorrect. Once that happens, unless the parameter innodb_stats_persistent_sample_pages is changed or the index is reconstructed, recalculation explicitly(manually run ANALYZE TABLE) will not be able to produce the correct stat_value.
Solution: How can we correct the table statistics and prevent it from happening again?
Due to the sampling algorithm as discussed above, we now know that there are only two factors that impact the calculation: the parameter innodb_stats_persistent_sample_pages: A; and how the index is organized.
To allow InnoDB to get the correct table statistics, we will have to either increase the innodb_stats_persistent_sample_pages or rebuild/reconstruct the index. The head-on approach to reconstructing the index is to rebuild the table, for example, execute a no-op alter against the table.
Let’s take a look at the following three examples:
- ANALYZE TABLE without rebuilding, keeping innodb_stats_persistent_sample_pages as it is (128), the stat_value slightly changed to 19582, close to the original incorrect 19498, still off. The number of the leaf pages in the index slightly changed from 55653 to 55891, the number of pages in the index also slightly changed from 63864 to 64248:
mysql> show variables = 'innodb_stats_persistent_sample_pages; +--------------------------------------+-------+ | Variable_name | Value | +--------------------------------------+-------+ | innodb_stats_persistent_sample_pages | 128 | +--------------------------------------+-------+ mysql> analyze table orders; +---------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------------+---------+----------+----------+ | custom.orders | analyze | status | OK | +---------------+---------+----------+----------+ mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK'; +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | custom | orders | dpcz_FK | 2022-03-03 21:58:18 | n_diff_pfx01 | 19582 | 50 | d_p_c_z_id | | custom | orders | dpcz_FK | 2022-03-03 21:58:18 | n_diff_pfx02 | 32425512 | 128 | d_p_c_z_id,id | | custom | orders | dpcz_FK | 2022-03-03 21:58:18 | n_leaf_pages | 55891 | NULL | Number of leaf pages in the index | | custom | orders | dpcz_FK | 2022-03-03 21:58:18 | size | 64248 | NULL | Number of pages in the index | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
- ANALYZE TABLE without rebuilding, but increasing innodb_stats_persistent_sample_pages from 128 to 512, got stat_value to 192 very close to the real cardinality 195. There was a big change in the number of leaf pages in the index, from 55653 to 44188. The number of pages in the index also changed drastically, from 63864 to 50304.
mysql> show variables like '%persistent_sample%'; +--------------------------------------+-------+ | Variable_name | Value | +--------------------------------------+-------+ | innodb_stats_persistent_sample_pages | 512 | +--------------------------------------+-------+ mysql> analyze table orders; +---------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------------+---------+----------+----------+ | custom.orders | analyze | status | OK | +---------------+---------+----------+----------+ mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK'; +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | custom | orders | dpcz_FK | 2022-03-09 06:54:29 | n_diff_pfx01 | 192 | 179 | d_p_c_z_id | | custom | orders | dpcz_FK | 2022-03-09 06:54:29 | n_diff_pfx02 | 31751321 | 512 | d_p_c_z_id,id | | custom | orders | dpcz_FK | 2022-03-09 06:54:29 | n_leaf_pages | 44188 | NULL | Number of leaf pages in the index | | custom | orders | dpcz_FK | 2022-03-09 06:54:29 | size | 50304 | NULL | Number of pages in the index | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
Rebuilding the table, keeping innodb_stats_persistent_sample_pages as 128, also got the correct stat_value 187 close to the real cardinality 195. The number of leaf pages in the index substantially changed, from 55653 to 43733, and the number of pages in the index also changed from 63864 to 50111.
mysql> show variables = 'innodb_stats_persistent_sample_pages'; +--------------------------------------+-------+ | Variable_name | Value | +--------------------------------------+-------+ | innodb_stats_persistent_sample_pages | 128 | +--------------------------------------+-------+ mysql> alter table orders engine=innodb; Query OK, 0 rows affected (11 min 16.37 sec) mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK'; +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | custom | orders | dpcz_FK | 2022-03-07 18:44:43 | n_diff_pfx01 | 187 | 128 | d_p_c_z_id | | custom | orders | dpcz_FK | 2022-03-07 18:44:43 | n_diff_pfx02 | 31531493 | 128 | d_p_c_z_id,id | | custom | orders | dpcz_FK | 2022-03-07 18:44:43 | n_leaf_pages | 43733 | NULL | Number of leaf pages in the index | | custom | orders | dpcz_FK | 2022-03-07 18:44:43 | size | 50111 | NULL | Number of pages in the index | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
After the table statistics data is corrected, the MySQL optimizer would choose the correct execution plan as well:
mysql> explain SELECT count(con.id) , MAX(DAYNAME(con.date)) , now() , pcz.type, pcz.c_c FROM con AS con join orders o on con.order_id = o.id JOIN p_c_z AS pcz ON o.d_p_c_z_id = pcz.id left join c c on con.c_id = c.id WHERE con.date = current_date() and pcz.type = "T_D" GROUP BY con.date, pcz.c_c, pcz.type; +----+-------------+-------+------------+--------+-------------------+----------+---------+---------------------------------------+------+----------+---------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+-------------------+----------+---------+---------------------------------------+------+----------+---------------------------------+ | 1 | SIMPLE | con | NULL | ref | FK_order,IDX_date | IDX_date | 3 | const | 3074 | 100.00 | Using temporary; Using filesort | | 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 8 | custom.con.c_id | 1 | 100.00 | Using index | | 1 | SIMPLE | o | NULL | eq_ref | PRIMARY,dpcz_FK | PRIMARY | 8 | custom.con.order_id | 1 | 100.00 | Using where | | 1 | SIMPLE | pcz | NULL | eq_ref | PRIMARY | PRIMARY | 8 | custom.o.d_p_c_z_id | 1 | 10.00 | Using where | +----+-------------+-------+------------+--------+-------------------+----------+---------+---------------------------------------+------+----------+---------------------------------+ 4 rows in set, 1 warning (0.01 sec)
Conclusion
MySQL optimizer depends on the accuracy of the table statistics in order to select the optimal execution plan. We can control the accuracy of the table statistics by changing the parameter innodb_stats_persistent_sample_pages. We can also choose to force a full recalculation of the table statistics by rebuilding/reconstructing the table while defragmenting the indexes, which helps to improve the accuracy of the table statistics. To reconstruct the table, we can directly alter the table with no-op or use pt-online-schema-change to achieve the same effect.