Nov
01
2023
--

Is ANALYZE TABLE Safe on a Busy MySQL Database Server?

ANALYZE TABLE Safe on a Busy Database

Sometimes, there is a need to update the table and index statistics manually using the ANALYZE TABLE command. Without going further into the reasons for such a need, I wanted to refresh this subject in terms of overhead related to running the command on production systems. However, the overhead discussed here is unrelated to the usual cost of diving into table rows to gather statistics, which we can control by setting the number of sample pages

Five years ago, my colleague Sveta posted a nice blog post about an improvement introduced in Percona Server for MySQL to address unnecessary stalls related to running the command:

ANALYZE TABLE Is No Longer a Blocking Operation

Historically, the problem with running the ANALYZE TABLE command in MySQL was that the query needed an exclusive lock on the table definition cache entry for the table. This makes the query wait for any long-running queries to finish but also can trigger cascading waiting for other incoming requests. In short, ANALYZE could lead to nasty stalls in busy production environments.

A lot has changed since then, but many production systems alive today still run with affected versions. Let’s recap how the situation has evolved over the years. 

MySQL Server – Community Edition

The problem applies to all versions of the upstream MySQL Community up to 8.0.23. There were no improvements in the 5.7 series (btw, EOL will be reached this month!), which means even the latest 5.7.43 is affected. Here is an example scenario you may end up here:

mysql > select @@version,@@version_comment;
+-----------+------------------------------+
| @@version | @@version_comment            |
+-----------+------------------------------+
| 5.7.43    | MySQL Community Server (GPL) |
+-----------+------------------------------+
1 row in set (0.00 sec)

mysql > show processlist;
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+
| Id | User     | Host      | db   | Command | Time | State                   | Info                                                           |
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+
|  4 | msandbox | localhost | db1  | Query   |   54 | Sending data            | select avg(k) from sbtest1 where pad not like '%f%' group by c |
| 13 | msandbox | localhost | db1  | Query   |   29 | Waiting for table flush | analyze table sbtest1                                          |
| 17 | msandbox | localhost | db1  | Query   |    0 | starting                | show processlist                                               |
| 18 | msandbox | localhost | db1  | Query   |   15 | Waiting for table flush | select * from sbtest1 where id=100                             |
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+
4 rows in set (0.00 sec)

One long query made the ANALYZE wait, but another, normally very fast query, is now waiting, too.

The same situation may happen in MySQL 8.0 series, including 8.0.23. Fortunately, there was a fix in version 8.0.24 addressing this problem. We can only read a bit restrained comment in the release notes about the “wait eliminated”:

https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-24.html

Indeed, since version 8.0.24, a similar test during a long-running query results in instant query execution:

mysql > select @@version,@@version_comment;
+-----------+------------------------------+
| @@version | @@version_comment            |
+-----------+------------------------------+
| 8.0.24    | MySQL Community Server - GPL |
+-----------+------------------------------+
1 row in set (0.00 sec)

mysql > analyze table sbtest1;
+-------------+---------+----------+----------+
| Table       | Op      | Msg_type | Msg_text |
+-------------+---------+----------+----------+
| db1.sbtest1 | analyze | status   | OK       |
+-------------+---------+----------+----------+
1 row in set (0.00 sec)

However, we can still find a warning in the official documentation, even for the 8.1 version, like this:

ANALYZE TABLE removes the table from the table definition cache, which requires a flush lock. If there are long running statements or transactions still using the table, subsequent statements and transactions must wait for those operations to finish before the flush lock is released. Because ANALYZE TABLE itself typically finishes quickly, it may not be apparent that delayed transactions or statements involving the same table are due to the remaining flush lock.

I requested an update of the related bug report as well as the documentation problem accordingly:

https://bugs.mysql.com/bug.php?id=87065
https://bugs.mysql.com/bug.php?id=112670

Percona Server for MySQL

As mentioned above, Percona introduced a fix and removed unnecessary table definition cache lock as a result of solving this bug report:

https://jira.percona.com/browse/PS-2503

When using the Percona variant, running ANALYZE TABLE was safe already since versions 5.6.38 and 5.7.20, as these were the active development series at the time. You may read the announcement in the release notes here:

https://docs.percona.com/percona-server/5.7/release-notes/Percona-Server-5.7.20-18.html#bugs-fixed

Percona Server for MySQL version 8.0 has been free from the issue since the very first release (I tested back, including the first GA release 8.0.13-3), as the improvement was merged from the Percona Server for MySQL 5.7 series.

MariaDB server

The locking ANALYZE TABLE problem applies to all MariaDB versions up to 10.5.3. In version 10.5.4, the solution from Percona was implemented as described in the following report:
https://jira.mariadb.org/browse/MDEV-15101

Therefore, when you run the query in 10.5.3 or lower, and in any previous series, like even the latest 10.4.31, a similar situation may occur:

mysql > select @@version,@@version_comment;
+----------------+-------------------+
| @@version      | @@version_comment |
+----------------+-------------------+
| 10.5.3-MariaDB | MariaDB Server    |
+----------------+-------------------+
1 row in set (0.000 sec)

mysql > show processlist;
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+----------+
| Id | User     | Host      | db   | Command | Time | State                   | Info                                                           | Progress |
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+----------+
|  4 | msandbox | localhost | db1  | Query   |   18 | Sending data            | select avg(k) from sbtest1 where pad not like '%f%' group by c |    0.000 |
| 13 | msandbox | localhost | db1  | Query   |   16 | Waiting for table flush | analyze table sbtest1                                          |    0.000 |
| 14 | msandbox | localhost | db1  | Query   |   14 | Waiting for table flush | select * from sbtest1 where id=100                             |    0.000 |
| 15 | msandbox | localhost | NULL | Query   |    0 | starting                | show processlist                                               |    0.000 |
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+----------+
4 rows in set (0.000 sec)

mysql > select @@version,@@version_comment;
+-----------------+-------------------+
| @@version       | @@version_comment |
+-----------------+-------------------+
| 10.4.31-MariaDB | MariaDB Server    |
+-----------------+-------------------+
1 row in set (0.000 sec)

mysql > show processlist;
+----+-------------+-----------+------+---------+------+--------------------------+----------------------------------------------------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                    | Info                                                           | Progress |
+----+-------------+-----------+------+---------+------+--------------------------+----------------------------------------------------------------+----------+
|  1 | system user |           | NULL | Daemon  | NULL | InnoDB purge coordinator | NULL                                                           |    0.000 |
|  2 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL                                                           |    0.000 |
|  3 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL                                                           |    0.000 |
|  4 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL                                                           |    0.000 |
|  5 | system user |           | NULL | Daemon  | NULL | InnoDB shutdown handler  | NULL                                                           |    0.000 |
|  9 | msandbox    | localhost | db1  | Query   |   18 | Sending data             | select avg(k) from sbtest1 where pad not like '%f%' group by c |    0.000 |
| 18 | msandbox    | localhost | db1  | Query   |   16 | Waiting for table flush  | analyze table sbtest1                                          |    0.000 |
| 19 | msandbox    | localhost | db1  | Query   |   12 | Waiting for table flush  | select * from sbtest1 where id=100                             |    0.000 |
| 22 | msandbox    | localhost | NULL | Query   |    0 | Init                     | show processlist                                               |    0.000 |
+----+-------------+-----------+------+---------+------+--------------------------+----------------------------------------------------------------+----------+
9 rows in set (0.000 sec)

Summary

As long as your database runs on the most recent version of MySQL or MariaDB variant, running ANALYZE TABLE should be absolutely safe and not cause any unexpected stalls.

Users of all three major Percona Server for MySQL series – 5.6.38+, 5.7.20+, and 8.0.x are all safe.

When, for any reason, you are not able to upgrade the community variant to the latest MySQL 8.0.24+ version yet and have to stick with 5.6 or 5.7 series for now, you may just swap MySQL Community binaries to Percona Server for MySQL ones, which are 100% compatible yet free from the problem. At the same time, you may check our post-EOL support for Percona Server for MySQL 5.7.

MariaDB users must upgrade to 10.5.4 or later to avoid the locking problem.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!

Mar
27
2018
--

ANALYZE TABLE Is No Longer a Blocking Operation

analyze table

analyze tableIn this post, I’ll discuss the fix for lp:1704195 (migrated to PS-2503), which prevents

ANALYZE TABLE

 from blocking all subsequent queries on the same table.

In November 2017, Percona released a fix for lp:1704195 (migrated to PS-2503), created by Laurynas Biveinis. The fix, included with Percona Server for MySQL since versions 5.6.38-83.0 and 5.7.20-18, stops

ANALYZE TABLE

 from invalidating query and table definition cache content for supported storage engines (InnoDB, TokuDB and MyRocks).

Why is this important?

In short, it is now safe to run

ANALYZE TABLE

 in production environments because it won’t trigger a situation where all queries on the same table stack are in the state

"Waiting for table flush"

. Check this blog post for details on how this situation can happen.

Why do we need to run ANALYZE TABLE?

When Optimizer decides which index to choose to resolve the query, it uses statistics stored for this table by storage engine. If the statistics are not up to date, Optimizer might choose the wrong index when it creates the query execution plan. This can cause performance to suffer.

To prevent this, storage engines support automatic and manual statistics updates. While automatic statistics updates usually work fine, there are cases when they do not do their job properly.

For example, InnoDB uses 20 sample 16K pages when it updates persistent statistics, and eight 16K pages when it updates transient statistics. If your data distribution is even, it does not matter how big your table is: even for 1T tables, using a sample of 320K is enough. But if your data isn’t even, statistics might get wrongly created. The solution for this issue is to increase either the innodb_stats_transient_sample_pages or innodb_stats_persistent_sample_pages variable. But increasing the number of pages to examine while collecting statistics leads to longer update runs, and thus higher IO activity, which is probably not what you want to happen often.

To control this, you can disable automatic statistics updates for such tables, and schedule a job that periodically runs 

ANALYZE TABLE

.

Will it be safe before the fix for lp:1704195 (migrated to PS-2503)?

Theoretically yes, but we could easily hit a situation as described in this blog post by Miguel Angel Nieto. The article describes what if some long-running query starts and doesn’t finish before

ANALYZE TABLE

. All the queries on the analyzing table get stuck in the state

"Waiting for table flush"

 at some time.

This happens because before the fix, 

ANALYZE TABLE

 worked as follows:

  1. Opens table statistics: concurrent DML operations (
    INSERT/UPDATE/DELETE/SELECT

    ) are allowed

  2. Updates table statistics: concurrent DML operations are allowed
  3. Update finished
  4. Invalidates table entry in the table definition cache: concurrent DML operations are forbidden
    1. What happens here is
      ANALYZE TABLE

       marks the currently open table share instances as invalid. This does not affect running queries: they will complete as usual. But all incoming queries will not start until they can re-open table share instance. And this will not happen until all currently running queries complete.

  5. Invalidates query cache: concurrent DML operations are forbidden

Last two operations are usually fast, but they cannot finish if another query touched either the table share instance or acquired query cache mutex. And, in its turn, it cannot allow for incoming queries to start.

However

ANALYZE TABLE

 modifies table statistics, not table definition!

Practically, it cannot affect already running queries in any way. If a query started before

ANALYZE TABLE

 finished updating statistics, it uses old statistics.

ANALYZE TABLE

 does not affect data in the table. Thus old entries in the query cache will still be correct. It hasn’t changed the definition of the table. Therefore there is no need to remove it from the table definition cache. As a result, we avoid operations 4 and 5 above.

The fix for lp:1704195 (migrated to PS-2503) removes these additional updates and locks required for them, and makes

ANALYZE TABLE

 always safe to run in busy production environments.

The post ANALYZE TABLE Is No Longer a Blocking Operation appeared first on Percona Database Performance Blog.

Feb
27
2013
--

MySQL optimizer: ANALYZE TABLE and Waiting for table flush

The MySQL optimizer makes the decision of what execution plan to use based on the information provided by the storage engines. That information is not accurate in some engines like InnoDB and they are based in statistics calculations therefore sometimes some tune is needed. In InnoDB these statistics are calculated automatically, check the following blog post for more information:

http://www.mysqlperformanceblog.com/2011/10/06/when-does-innodb-update-table-statistics-and-when-it-can-bite/

There are some variables to tune how that statistics are calculated but we need to wait until the gathering process triggers again to see if there is any improvement. Usually the first step to try to get back to the previous execution plan is to force that process with ANALYZE TABLE that is usually fast enough to not cause troubles.

Let’s see an example of how a simple and fast ANALYZE can cause a downtime.

Waiting for table flush:

In order to trigger this problem we need:

– Lot of concurrency
– A long running query
– Run an ANALYZE TABLE on a table accessed by the long running query

So first we need a long running query against table t:

SELECT * FROM t WHERE c > '%c%';

Then in our efforts to get a better execution plan for another query we run ANALYZE TABLE:

mysql> analyze table t;
+--------+---------+----------+----------+
| Table  | Op      | Msg_type | Msg_text |
+--------+---------+----------+----------+
| test.t | analyze | status   | OK       |
+--------+---------+----------+----------+
1 row in set (0.00 sec)

Perfect, very fast! But then some seconds later we realize that our application is down. Let’s see the process list. I’ve removed most of the columns to make it clearer:

+------+-------------------------+---------------------------------+
| Time | State                   | Info                            |
|   32 | Writing to net          | select * from t where c > '%0%' |
|   12 | Waiting for table flush | select * from test.t where i=1  |
|   12 | Waiting for table flush | select * from test.t where i=2  |
|   12 | Waiting for table flush | select * from test.t where i=3  |
|   11 | Waiting for table flush | select * from test.t where i=7  |
|   10 | Waiting for table flush | select * from test.t where i=11 |
|   11 | Waiting for table flush | select * from test.t where i=5  |
|   11 | Waiting for table flush | select * from test.t where i=4  |
|   11 | Waiting for table flush | select * from test.t where i=9  |
|   11 | Waiting for table flush | select * from test.t where i=8  |
|   11 | Waiting for table flush | select * from test.t where i=12 |
|   11 | Waiting for table flush | select * from test.t where i=14 |
|   10 | Waiting for table flush | select * from test.t where i=6  |
|   10 | Waiting for table flush | select * from test.t where i=15 |
|   10 | Waiting for table flush | select * from test.t where i=10 |
[...]

The ANALYZE TABLE runs perfect but after it the rest of the threads that are running a query against that table need to wait. This is because MySQL has detected that the underlying table has changed and it needs to close and reopen it using FLUSH. Therefore the table will be locked until all queries that are using that table finish. There are only two solutions to this situation, wait until the long query finishes or kill the query. Also, we have to take in account that killing a query could cause even more troubles. If we are dealing with a write query on InnoDB the rollback process could take even more time to finish than the original query. On the other hand, if the table is MyISAM there will be no rollback process so all the already updated rows can’t be recovered.

This particular example is not only a problem of ANALYZE. Other commands like FLUSH TABLES, ALTER, RENAME, OPTIMIZE or REPAIR can cause threads to wait on “Waiting for tables”, “Waiting for table” and “Waiting for table flush”.

Conclusion

Before running an ANALYZE table or any other command listed before, check the running queries. If the table that you are going to work on is very used the recommendation is to run it during the low peak of load or a maintenance window.

The post MySQL optimizer: ANALYZE TABLE and Waiting for table flush appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com