Sep
11
2017
--

Updating InnoDB Table Statistics Manually

InnoDB Tables

InnoDB TablesIn this post, we will discuss how to fix cardinality for InnoDB tables manually.

As a support engineer, I often see situations when the cardinality of a table is not correct. When InnoDB calculates the cardinality of an index, it does not scan the full table by default. Instead it looks at random pages, as determined by options innodb_stats_sample_pages, innodb_stats_transient_sample_pages and innodb_stats_persistent_sample_pages, or by the 

CREATE TABLE

 option

STATS_SAMPLE_PAGES

. The default value for persistent statistics is 20. This approach works fine when the number of unique values in your secondary key grows in step with the size of the table. But what if you have a column that has a comparatively small number of unique values? This could be a common service, many-to-many relationship table, for example, or just a table containing a list of sell orders that belong to one of a dozen shops owned by the company. Such tables could grow up to billions of rows with a small (less than 100) number of unique shop IDs.

At some point, InnoDB will report the wrong values for such indexes. Really! If 20 pages have 100 unique shop IDs, how many unique shop IDs would 20000 pages have? 100 times 1000? This seems logical, and after a certain number of rows such indexes will have extraordinarily large cardinality values.

ANALYZE TABLE

 will not help, because it uses the same algorithm. Increasing the number of “stats” sample pages would help, but it has its own downside: the more pages you have to examine, the slower

ANALYZE TABLE

 runs. While this command is not blocking, it still creates side effects as described in this blog post. And the longer it runs, the less control you have.

Another issue with InnoDB statistics: even if it is persistent and

STATS_AUTO_RECALC

 is set to 0, it still adds values for secondary indexes as shown in lp:1538765. Eventually, after you insert million of rows, your statistics get corrupted.

ANALYZE TABLE

  can fix it only if you specify a very large number of “stats” sample pages.

Can we do anything about it?

InnoDB stores statistics in the “mysql” database, in the tables

innodb_table_stats

 and

innodb_index_stats

. Since they are regular MySQL tables, privileged users can access them. We can update them and modify statistics as we like. And these statistics are used by the Optimizer!

I created a small example showing how to do this trick. I used Percona Server for MySQL version 5.7.19, but the trick will work on any supported MySQL and Percona Server for MySQL version.

First, let’s create test tables. The first table has shops, with a few shop profiles with the shop ID and name:

create table shops(
  shop_id int not null auto_increment primary key,
  name varchar(32)
) engine=innodb;

The second table refers to the “shops” table:

create table goods(
  id int not null auto_increment primary key,
  shop_id int not null,
  name varchar(32),
  create_date datetime DEFAULT NULL,
  key (shop_id, create_date)
) engine=innodb;

Let’s check how many unique shops we have:

mysql> select count(distinct shop_id) from shops;
+-------------------------+
| count(distinct shop_id) |
+-------------------------+
| 100                     |
+-------------------------+
1 row in set (0.02 sec)

With 100 distinct shops, and a key on

(shop_id, create_date)

, we expect cardinality in table goods to be not much different than this query result:

mysql> select count(distinct id) as `Cardinality for PRIMARY`,
    -> count(distinct shop_id) as `Cardinality for shop_id column in index shop_id`,
    -> count(distinct shop_id, create_date) as `Cardinality for create_date column in index shop_id`
    -> from goods
*************************** 1. row ***************************
Cardinality for PRIMARY: 8000000
Cardinality for shop_id column in index shop_id: 100
Cardinality for create_date column in index shop_id: 169861
1 row in set (2 min 8.74 sec)

However, 

SHOW INDEX

 returns dramatically different values for the column

shop_id

:

mysql> show index from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 |  PRIMARY |            1 |          id |         A |     7289724 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            1 |     shop_id |         A |       13587 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            2 | create_date |         A |      178787 |     NULL |   NULL |  YES |      BTREE |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.09 sec)

ANALYZE TABLE

 does not help:

mysql> analyze table goods;
+------------+---------+----------+----------+
|      Table |      Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze |   status |       OK |
+------------+---------+----------+----------+
1 row in set (0.88 sec)
mysql> show index from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 |  PRIMARY |            1 |          id |         A |     7765796 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            1 |     shop_id |         A |       14523 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            2 | create_date |         A |      168168 |     NULL |   NULL |  YES |      BTREE |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

As a result, if we join the two tables, Optimizer chooses the wrong

JOIN

 order and query execution plan:

mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
| id | select_type | table | partitions |  type | possible_keys |     key | key_len |                ref | rows | filtered |                    Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
|  1 |      SIMPLE | shops |       NULL | index |       PRIMARY | PRIMARY |       4 |               NULL |  100 |   100.00 | Using where; Using index |
|  1 |      SIMPLE | goods |       NULL |   ref |       shop_id | shop_id |       4 | test.shops.shop_id |  534 |    11.11 |    Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.13 sec)
mysql> P md5sum
PAGER set to 'md5sum'
mysql> select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
4a94dabc4bfbfb7dd225bcb50278055b -
31896 rows in set (43.32 sec)

If compared to 

STRAIGHT_JOIN

 order:

mysql> explain select goods.* from goods straight_join shops on(goods.shop_id = shops.shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
| id | select_type | table | partitions |   type | possible_keys |     key | key_len |                ref |  rows | filtered |                 Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
|  1 |      SIMPLE | goods |       NULL |  range |       shop_id | shop_id |      10 |               NULL | 31997 |   100.00 | Using index condition |
|  1 |      SIMPLE | shops |       NULL | eq_ref |       PRIMARY | PRIMARY |       4 | test.goods.shop_id |     1 |   100.00 |           Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
2 rows in set, 1 warning (0.14 sec)
mysql> P md5sum
PAGER set to 'md5sum'
mysql> select goods.* from goods straight_join shops on(goods.shop_id = shops.shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
4a94dabc4bfbfb7dd225bcb50278055b -
31896 rows in set (7.94 sec)

The time difference for a small 8M row table is around six times! For a big table with many columns, it would be even larger.

Is

STRAIGHT_JOIN

 the only solution for this case?

No! It’s also not a great solution because if the query is complicated and involves more than two tables, it may be affected by bug fixes and improvements in the Optimizer code. Then the query order might not be optimal for new versions and updates. Therefore, you’ll need to test such queries at each upgrade, including minor ones.

So why does

ANALYZE TABLE

 not work? Because the default number of pages it uses to calculate statistics is too small for the difference. You can increase the table option

STATS_SAMPLE_PAGES

  until you find a proper one. The drawback is that the greater you set 

STATS_SAMPLE_PAGES

, the longer it takes for 

ANALYZE TABLE

 to finish. Also, if you update a large portion of the table, you are often affected by lp:1538765. At some point, the statistics will again be inaccurate.

Now let’s try our manual statistics update trick

InnoDB stores its persistent statistics in the tables

mysql.innodb_table_stats

  and

mysql.innodb_index_stats

:

mysql> alter table goods stats_persistent=1, stats_auto_recalc=0;
Query OK, 0 rows affected (0.11 sec)
Records: 0 Duplicates: 0 Warnings: 0
+---------------+------------+---------------------+---------+----------------------+--------------------------+
| database_name | table_name |         last_update |  n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
|          test |      goods | 2017-09-05 00:21:12 | 7765796 |                34624 |                    17600 |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
1 row in set (0.00 sec)
mysql> select * from mysql.innodb_index_stats where table_name='goods';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name |         last_update |    stat_name | stat_value | sample_size |                  stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
|          test |      goods |    PRIMARY | 2017-09-05 00:21:12 | n_diff_pfx01 |    7765796 |          20 |                                id |
|          test |      goods |    PRIMARY | 2017-09-05 00:21:12 | n_leaf_pages |      34484 |        NULL | Number of leaf pages in the index |
|          test |      goods |    PRIMARY | 2017-09-05 00:21:12 |         size |      34624 |        NULL |      Number of pages in the index |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 | n_diff_pfx01 |      14523 |          20 |                           shop_id |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 | n_diff_pfx02 |     168168 |          20 |               shop_id,create_date |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 | n_diff_pfx03 |    8045310 |          20 |            shop_id,create_date,id |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 | n_leaf_pages |      15288 |        NULL | Number of leaf pages in the index |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 |         size |      17600 |        NULL |      Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
8 rows in set (0.00 sec)

And we can update these tables directly:

mysql> update mysql.innodb_table_stats set n_rows=8000000 where table_name='goods';
Query OK, 1 row affected (0.18 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=8000000 where stat_description in('id', 'shop_id,create_date,id') and table_name='goods';
Query OK, 2 rows affected (0.08 sec)
Rows matched: 2 Changed: 2 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=100 where stat_description in('shop_id') and table_name='goods';
Query OK, 1 row affected (0.09 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=169861 where stat_description in('shop_id,create_date') and table_name='goods';
Query OK, 1 row affected (0.08 sec)
Rows matched: 1 Changed: 1 Warnings: 0

I took index values from earlier, as calculated by this query:

select count(distinct id) as `Cardinality for PRIMARY`, count(distinct shop_id) as `Cardinality for shop_id column in index shop_id`, count(distinct shop_id, create_date) as `Cardinality for create_date column in index shop_id` from goods;

mysql> select * from mysql.innodb_table_stats where table_name='goods';
+---------------+------------+---------------------+---------+----------------------+--------------------------+
| database_name | table_name |         last_update |  n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
|          test |      goods | 2017-09-05 00:47:45 | 8000000 |                34624 |                    17600 |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
1 row in set (0.00 sec)
mysql> select * from mysql.innodb_index_stats where table_name='goods';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name |         last_update |    stat_name | stat_value | sample_size |                  stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
|          test |      goods |    PRIMARY | 2017-09-05 00:48:32 | n_diff_pfx01 |    8000000 |          20 |                                id |
|          test |      goods |    PRIMARY | 2017-09-05 00:21:12 | n_leaf_pages |      34484 |        NULL | Number of leaf pages in the index |
|          test |      goods |    PRIMARY | 2017-09-05 00:21:12 |         size |      34624 |        NULL |      Number of pages in the index |
|          test |      goods |    shop_id | 2017-09-05 00:49:13 | n_diff_pfx01 |        100 |          20 |                           shop_id |
|          test |      goods |    shop_id | 2017-09-05 00:49:26 | n_diff_pfx02 |     169861 |          20 |               shop_id,create_date |
|          test |      goods |    shop_id | 2017-09-05 00:48:32 | n_diff_pfx03 |    8000000 |          20 |            shop_id,create_date,id |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 | n_leaf_pages |      15288 |        NULL | Number of leaf pages in the index |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 |         size |      17600 |        NULL |      Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
8 rows in set (0.00 sec)

Now the statistics are up to date, but not used:

mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
| id | select_type | table | partitions |  type | possible_keys |     key | key_len |                ref | rows | filtered |                    Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
|  1 |      SIMPLE | shops |       NULL | index |       PRIMARY | PRIMARY |       4 |               NULL |  100 |   100.00 | Using where; Using index |
|  1 |      SIMPLE | goods |       NULL |   ref |       shop_id | shop_id |       4 | test.shops.shop_id |  534 |    11.11 |    Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.04 sec)

To finalize the changes, we need to run

FLUSH TABLE goods

:

mysql> FLUSH TABLE goods;
Query OK, 0 rows affected (0.00 sec)
mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
| id | select_type | table | partitions |   type | possible_keys |     key | key_len |                ref |  rows | filtered |                 Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
|  1 |      SIMPLE | goods |       NULL |  range |       shop_id | shop_id |      10 |               NULL | 31997 |   100.00 | Using index condition |
|  1 |      SIMPLE | shops |       NULL | eq_ref |       PRIMARY | PRIMARY |       4 | test.goods.shop_id |     1 |   100.00 |           Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
2 rows in set, 1 warning (0.28 sec)
mysql> P md5sum
PAGER set to 'md5sum'
mysql> select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
4a94dabc4bfbfb7dd225bcb50278055b -
31896 rows in set (7.79 sec)

Now everything is good.

But

FLUSH TABLE

 is a blocking operation, right? Won’t it block queries and create a worse scenario than described for ANALYZE TABLE in this post?

At first glance this is true. But we can use the same trick Percona Toolkit uses: set

lock_wait_timeout

 to 1 and call

FLUSH

 in a loop. To demonstrate how it works, I use a similar scenario as described in the

ANALYZE TABLE

 blog post.

First, let’s reset the statistics to ensure our

FLUSH

 works as expected:

mysql> analyze table goods;
+------------+---------+----------+----------+
|      Table |      Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze |   status |       OK |
+------------+---------+----------+----------+
1 row in set (0.38 sec)
mysql> show indexes from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 |  PRIMARY |            1 |          id |         A |     7765796 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            1 |     shop_id |         A |       14523 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            2 | create_date |         A |      168168 |     NULL |   NULL |  YES |      BTREE |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

And then update

mysql.innodb_*_stats

 tables manually. Then check that Optimizer still sees outdated statistics:

mysql> update mysql.innodb_table_stats set n_rows=8000000 where table_name='goods';
Query OK, 1 row affected (0.09 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=8000000 where stat_description in('id', 'shop_id,create_date,id') and table_name='goods';
Query OK, 2 rows affected (0.09 sec)
Rows matched: 2 Changed: 2 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=100 where stat_description in('shop_id') and table_name='goods';
Query OK, 1 row affected (0.11 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=169861 where stat_description in('shop_id,create_date') and table_name='goods';
Query OK, 1 row affected (0.10 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> show indexes from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 |  PRIMARY |            1 |          id |         A |     7765796 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            1 |     shop_id |         A |       14523 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            2 | create_date |         A |      168168 |     NULL |   NULL |  YES |      BTREE |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

Now let’s start a long running query in one session that blocks our

FLUSH TABLE

 command:

mysql> select sleep(1) from goods limit 1000, 300;

And let’s run

FLUSH TABLE

 in a loop:

sveta@Thinkie:~/build/ps-5.7/mysql-test$ until (`mysqlmtr -P13001 -e "set lock_wait_timeout=1; flush table goods;" test`); do sleep 1; done
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
...

Now let’s ensure we can access the table:

mysql> select * from goods order by id limit 10;
^C

We cannot! We cannot even connect to the database where the table is stored:

sveta@Thinkie:~/build/ps-5.7/mysql-test$ mysqlmtr -P13001 test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
^C

The reason for this is that while the 

FLUSH TABLE

 command was killed due to the metadata lock wait timeout, it also requested table lock for flushing and blocked other incoming queries.

But we can enclose

FLUSH TABLE

 into

LOCK TABLE ... WRITE; ... UNLOCK TABLES;

 operations. In this case, the 

LOCK TABLE

 command gets blocked until all queries release metadata lock on the table. Then it exclusively locks the table,

FLUSH TABLE

 runs and then the script immediately unlocks the table. Since closing the session causes an implicit unlock, I used a PHP one-liner to have everything in a single session:

$ php -r '
> $link = new mysqli("127.0.0.1", "root", "", "test", 13001);
> $link->query("set lock_wait_timeout=1");
> while(!$link->query("lock table goods write")) {sleep(1);}
> $link->query("flush table goods");
> $link->query("unlock tables");'

We can confirm if a parallel session can access the table:

mysql> select * from goods order by id limit 10;
+----+---------+----------------------------------+---------------------+
| id | shop_id |                             name |         create_date |
+----+---------+----------------------------------+---------------------+
|  1 |      58 | 5K0z2sHTgjWKKdryTaniQdZmjGjA9wls | 2015-09-19 00:00:00 |
|  2 |      17 | xNll02kgUTWAFURj6j5lL1zXAubG0THG | 2013-10-19 00:00:00 |
|  3 |      30 | clHX7uQopKmoTtEFH5LYBgQncsxRtTIB | 2017-08-01 00:00:00 |
|  4 |      93 | bAzoQTN98AmFjPOZs7PGfbiGfaf9Ye4b | 2013-02-24 00:00:00 |
|  5 |      20 | rQuTO5GHjP60kDbN6WoPpE2S8TtMbrVL | 2017-08-05 00:00:00 |
|  6 |      37 | WxqxA5tBHxikaKbuvbIF84H9QuaCnqQ3 | 2013-10-18 00:00:00 |
|  7 |      13 | DoYnFpQZSVV8UswBsWklgGBUc8zW9mVW | 2017-02-06 00:00:00 |
|  8 |      81 | dkNxMQyZNZuTrONEX4gxRLa0DOedatIs | 2015-07-05 00:00:00 |
|  9 |      12 | Z0t2uQ9itexpPf01KUpa7qBWlT5fBmXR | 2014-06-25 00:00:00 |
| 10 |      90 | 6urABBQyaUVVyxljvd11D3kUxbdDRPRV | 2013-10-23 00:00:00 |
+----+---------+----------------------------------+---------------------+
10 rows in set (0.00 sec)
mysql> update goods set name='test' where id=100;
Query OK, 1 row affected (0.08 sec)
Rows matched: 1 Changed: 1 Warnings: 0

After the PHP script finishes its job, statistics are corrected:

mysql> show index from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 |  PRIMARY |            1 |          id |         A |     8000000 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            1 |     shop_id |         A |         100 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            2 | create_date |         A |      169861 |     NULL |   NULL |  YES |      BTREE |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

Conclusion

We can manually update persistent InnoDB statistics to fix Optimizer plans for our queries, with almost no impact on a live server.

Dec
17
2014
--

InnoDB’s multi-versioning handling can be Achilles’ heel

I believe InnoDB storage engine architecture is great for a lot of online workloads, however, there are no silver bullets in technology and all design choices have their trade offs. In this blog post I’m going to talk about one important InnoDB limitation that you should consider.

InnoDB is a multiversion concurrency control (MVCC) storage engine which means many versions of the single row can exist at the same time. In fact there can be a huge amount of such row versions. Depending on the isolation mode you have chosen, InnoDB might have to keep all row versions going back to the earliest active read view, but at the very least it will have to keep all versions going back to the start of SELECT query which is currently running.

In most cases this is not a big deal – if you have many short transactions happening you will have only a few row versions to deal with. If you just use the system for reporting queries but do not modify data aggressively at the same time you also will not have many row versions. However, if you mix heavy updates with slow reporting queries going at the same time you can get into a lot of trouble.

Consider for example an application with a hot row (something like actively updated counter) which has 1000 updates per second together with some heavy batch job that takes 1000 to run. In such case we will have 1M of row versions to deal with.

Let’s now talk about how those old-row versions are stored in InnoDB – they are stored in the undo space as an essentially linked list where each row version points to the previous row version together with transaction visibility information that helps to decide which version will be visible by this query. Such design favors short new queries that will typically need to see one of the newer rows, so they do not have to go too far in this linked list. This might not be the case with reporting queries that might need to read rather old row version which correspond to the time when the query was started or logical backups that use consistent reads (think mysqldump or mydumper) which often would need to access such very old row versions.

So going through the linked list of versions is expensive, but how expensive it can get? In this case a lot depends upon whenever UNDO space fits in memory, and so the list will be traversed efficiently – or it does not, in which case you might be looking at the massive disk IO. Keep in mind undo space is not clustered by PRIMARY key, as normal data in InnoDB tables, so if you’re updating multiple rows at the same time (typical case) you will be looking at the row-version chain stored in many pages, often as little as one row version per page, requiring either massive IO or a large amount of UNDO space pages to present in the InnoDB Buffer pool.

Where it can get even worse is Index Scan. This is because Indexes are structured in InnoDB to include all row versions corresponding to the key value, current and past. This means for example the index for KEY=5 will contain pointers to all rows that either have value 5 now or had value 5 some time in the past and have not been purged yet. Now where it can really bite is the following – InnoDB needs to know which of the values stored for the key are visible by the current transaction – and that might mean going through all long-version chains for each of the keys.

This is all theory, so lets see how we can simulate such workloads and see how bad things really can get in practice.

I have created 1Bil rows “sysbench” table which takes some 270GB space and I will use a small buffer pool – 6GB. I will run sysbench with 64 threads pareto distribution (hot rows) while running a full table scan query concurrently: select avg(k) from sbtest1 Here is exact sysbench run done after prepare.

sysbench --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=pareto --oltp-table-size=1000000000 --mysql-user root --mysql-password=password  --test /usr/share/doc/sysbench/tests/db/oltp.lua run

Here is the explain for the “reporting” query that you would think to be a rather efficient index scan query. With just 4 bytes 1 Billion of values would be just 4G (really more because of InnoDB overhead) – not a big deal for modern systems:

mysql> explain select avg(k) from sbtest1 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sbtest1
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 953860873
        Extra: Using index
1 row in set (0.00 sec)

2 days have passed and the “reporting” query is still running… furthermore the foreground workload started to look absolutely bizarre:

[207850s] threads: 64, tps: 0.20, reads: 7.40, writes: 0.80, response time: 222481.28ms (95%), errors: 0.70, reconnects:  0.00
[207860s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207870s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207880s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207890s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207900s] threads: 64, tps: 2.70, reads: 47.60, writes: 11.60, response time: 268815.49ms (95%), errors: 0.00, reconnects:  0.00
[207910s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207920s] threads: 64, tps: 2.30, reads: 31.60, writes: 9.50, response time: 294954.28ms (95%), errors: 0.00, reconnects:  0.00
[207930s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207940s] threads: 64, tps: 2.90, reads: 42.00, writes: 12.20, response time: 309332.04ms (95%), errors: 0.00, reconnects:  0.00
[207950s] threads: 64, tps: 0.20, reads: 4.60, writes: 1.00, response time: 318922.41ms (95%), errors: 0.40, reconnects:  0.00
[207960s] threads: 64, tps: 0.20, reads: 1.90, writes: 0.50, response time: 335170.09ms (95%), errors: 0.00, reconnects:  0.00
[207970s] threads: 64, tps: 0.60, reads: 13.20, writes: 2.60, response time: 292842.88ms (95%), errors: 0.60, reconnects:  0.00
[207980s] threads: 64, tps: 2.60, reads: 37.60, writes: 10.20, response time: 351613.43ms (95%), errors: 0.00, reconnects:  0.00
[207990s] threads: 64, tps: 5.60, reads: 78.70, writes: 22.10, response time: 186407.21ms (95%), errors: 0.00, reconnects:  0.00
[208000s] threads: 64, tps: 8.10, reads: 120.20, writes: 32.60, response time: 99179.05ms (95%), errors: 0.00, reconnects:  0.00
[208010s] threads: 64, tps: 19.50, reads: 280.50, writes: 78.90, response time: 27559.69ms (95%), errors: 0.00, reconnects:  0.00
[208020s] threads: 64, tps: 50.70, reads: 691.28, writes: 200.70, response time: 5214.43ms (95%), errors: 0.00, reconnects:  0.00
[208030s] threads: 64, tps: 77.40, reads: 1099.72, writes: 311.31, response time: 2600.66ms (95%), errors: 0.00, reconnects:  0.00
[208040s] threads: 64, tps: 328.20, reads: 4595.40, writes: 1313.40, response time: 911.36ms (95%), errors: 0.00, reconnects:  0.00
[208050s] threads: 64, tps: 538.20, reads: 7531.90, writes: 2152.10, response time: 484.46ms (95%), errors: 0.00, reconnects:  0.00
[208060s] threads: 64, tps: 350.70, reads: 4913.45, writes: 1404.09, response time: 619.42ms (95%), errors: 0.00, reconnects:  0.00
[208070s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208080s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208090s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208100s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208110s] threads: 64, tps: 1.60, reads: 24.20, writes: 6.80, response time: 42385.40ms (95%), errors: 0.10, reconnects:  0.00
[208120s] threads: 64, tps: 0.80, reads: 28.20, writes: 3.40, response time: 51381.54ms (95%), errors: 2.80, reconnects:  0.00
[208130s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208140s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208150s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208160s] threads: 64, tps: 0.60, reads: 14.20, writes: 2.40, response time: 93248.04ms (95%), errors: 0.80, reconnects:  0.00

As you can see we have long stretches of times when there are no queries completed at all… going to some spikes of higher performance. This is how it looks on the graph:

Corresponding CPU usage:

This shows what we are not only observing something we would expect with InnoDB design but also there seems to be some serve starvation happening in this case which we can confirm:

Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790809552640 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790804735744 has waited at row0sel.cc line 3506 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790810756864 has waited at row0sel.cc line 4125 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790811158272 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode  wait exclusive

Waiting for the given buffer pool block to become available for more than 3 minutes is a big issue – this lock should never be held by more than a few microseconds.

SHOW PROCESSLIST confirms even most basic selects by primary key can get stalled for long time

|  5499 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5500 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5501 | root          | localhost | sbtest             | Query   |    185 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5502 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5503 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99 ORDER BY c                                |         0 |             0 |
|  5504 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5505 | root          | localhost | sbtest             | Query   |     14 | updating     | UPDATE sbtest1 SET k=k+1 WHERE id=1                                                                  |         0 |             0 |
|  5506 | root          | localhost | sbtest             | Query   |    236 | updating     | DELETE FROM sbtest1 WHERE id=1                                                                       |         0 |             0 |
|  5507 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5508 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99                                                    |         0 |             0 |
|  5509 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5510 | root          | localhost | sbtest             | Query   |     14 | updating     | UPDATE sbtest1 SET c='80873149502-45132831358-41942500598-17481512835-07042367094-39557981480-593123 |         0 |             0 |
|  5511 | root          | localhost | sbtest             | Query   |    236 | updating     | UPDATE sbtest1 SET k=k+1 WHERE id=18                                                                 |         0 |             1 |
|  5512 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=7                                                                     |         0 |             0 |
|  6009 | root          | localhost | sbtest             | Query   | 195527 | Sending data | select avg(k) from sbtest1                                                                           |         0 |             0 |

How do I know it is UNDO space related issue in this case? Because it ends up taking majority of buffer pool

mysql> select page_type,count(*) from INNODB_BUFFER_PAGE group by page_type;
+-------------------+----------+
| page_type         | count(*) |
+-------------------+----------+
| EXTENT_DESCRIPTOR |        1 |
| FILE_SPACE_HEADER |        1 |
| IBUF_BITMAP       |      559 |
| IBUF_INDEX        |      855 |
| INDEX             |     2186 |
| INODE             |        1 |
| SYSTEM            |      128 |
| UNDO_LOG          |   382969 |
| UNKNOWN           |     6508 |
+-------------------+----------+
9 rows in set (1.04 sec)

And it does so in a very crazy way – when there is almost no work being done UNDO_LOG contents of the buffer pool is growing very rapidly while when we’re getting some work done the INDEX type pages take a lot more space. To me this seems like as the index scan is going it touches some hot rows and some not-so-hot ones, containing less row versions and so does not put much pressure on “undo space.”

Take Away: Now you might argue that this given workload and situation is rather artificial and rather narrow. It well might be. My main point here is what if you’re looking at just part of your workload, such as your main short application queries, and not taking reporting or backups into account “because their performance is not important.” In this case you might be in for a big surprise. Those background activities might be taking much more than you would expect, and in addition, they might have much more of a severe impact to your main application workload, like in this case above.

P.S: I’ve done more experiments to validate how bad the problem really is and I can repeat it rather easily even without putting system into overdrive. Even if I run sysbench injecting just 25% of the transactions the system is possibly capable of handling at peak I have “select avg(k) from sbtest1″ query on 1 billion row table to never complete as it looks like the new entries are injected into the index at this point faster than Innodb can examine which of them are visible.

The post InnoDB’s multi-versioning handling can be Achilles’ heel appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com