Mar
27
2018
--

ANALYZE TABLE Is No Longer a Blocking Operation

analyze table

analyze tableIn this post, I’ll discuss the fix for lp:1704195 (migrated to PS-2503), which prevents

ANALYZE TABLE

 from blocking all subsequent queries on the same table.

In November 2017, Percona released a fix for lp:1704195 (migrated to PS-2503), created by Laurynas Biveinis. The fix, included with Percona Server for MySQL since versions 5.6.38-83.0 and 5.7.20-18, stops

ANALYZE TABLE

 from invalidating query and table definition cache content for supported storage engines (InnoDB, TokuDB and MyRocks).

Why is this important?

In short, it is now safe to run

ANALYZE TABLE

 in production environments because it won’t trigger a situation where all queries on the same table stack are in the state

"Waiting for table flush"

. Check this blog post for details on how this situation can happen.

Why do we need to run ANALYZE TABLE?

When Optimizer decides which index to choose to resolve the query, it uses statistics stored for this table by storage engine. If the statistics are not up to date, Optimizer might choose the wrong index when it creates the query execution plan. This can cause performance to suffer.

To prevent this, storage engines support automatic and manual statistics updates. While automatic statistics updates usually work fine, there are cases when they do not do their job properly.

For example, InnoDB uses 20 sample 16K pages when it updates persistent statistics, and eight 16K pages when it updates transient statistics. If your data distribution is even, it does not matter how big your table is: even for 1T tables, using a sample of 320K is enough. But if your data isn’t even, statistics might get wrongly created. The solution for this issue is to increase either the innodb_stats_transient_sample_pages or innodb_stats_persistent_sample_pages variable. But increasing the number of pages to examine while collecting statistics leads to longer update runs, and thus higher IO activity, which is probably not what you want to happen often.

To control this, you can disable automatic statistics updates for such tables, and schedule a job that periodically runs 

ANALYZE TABLE

.

Will it be safe before the fix for lp:1704195 (migrated to PS-2503)?

Theoretically yes, but we could easily hit a situation as described in this blog post by Miguel Angel Nieto. The article describes what if some long-running query starts and doesn’t finish before

ANALYZE TABLE

. All the queries on the analyzing table get stuck in the state

"Waiting for table flush"

 at some time.

This happens because before the fix, 

ANALYZE TABLE

 worked as follows:

  1. Opens table statistics: concurrent DML operations (
    INSERT/UPDATE/DELETE/SELECT

    ) are allowed

  2. Updates table statistics: concurrent DML operations are allowed
  3. Update finished
  4. Invalidates table entry in the table definition cache: concurrent DML operations are forbidden
    1. What happens here is
      ANALYZE TABLE

       marks the currently open table share instances as invalid. This does not affect running queries: they will complete as usual. But all incoming queries will not start until they can re-open table share instance. And this will not happen until all currently running queries complete.

  5. Invalidates query cache: concurrent DML operations are forbidden

Last two operations are usually fast, but they cannot finish if another query touched either the table share instance or acquired query cache mutex. And, in its turn, it cannot allow for incoming queries to start.

However

ANALYZE TABLE

 modifies table statistics, not table definition!

Practically, it cannot affect already running queries in any way. If a query started before

ANALYZE TABLE

 finished updating statistics, it uses old statistics.

ANALYZE TABLE

 does not affect data in the table. Thus old entries in the query cache will still be correct. It hasn’t changed the definition of the table. Therefore there is no need to remove it from the table definition cache. As a result, we avoid operations 4 and 5 above.

The fix for lp:1704195 (migrated to PS-2503) removes these additional updates and locks required for them, and makes

ANALYZE TABLE

 always safe to run in busy production environments.

The post ANALYZE TABLE Is No Longer a Blocking Operation appeared first on Percona Database Performance Blog.

Feb
27
2013
--

MySQL optimizer: ANALYZE TABLE and Waiting for table flush

The MySQL optimizer makes the decision of what execution plan to use based on the information provided by the storage engines. That information is not accurate in some engines like InnoDB and they are based in statistics calculations therefore sometimes some tune is needed. In InnoDB these statistics are calculated automatically, check the following blog post for more information:

http://www.mysqlperformanceblog.com/2011/10/06/when-does-innodb-update-table-statistics-and-when-it-can-bite/

There are some variables to tune how that statistics are calculated but we need to wait until the gathering process triggers again to see if there is any improvement. Usually the first step to try to get back to the previous execution plan is to force that process with ANALYZE TABLE that is usually fast enough to not cause troubles.

Let’s see an example of how a simple and fast ANALYZE can cause a downtime.

Waiting for table flush:

In order to trigger this problem we need:

– Lot of concurrency
– A long running query
– Run an ANALYZE TABLE on a table accessed by the long running query

So first we need a long running query against table t:

SELECT * FROM t WHERE c > '%c%';

Then in our efforts to get a better execution plan for another query we run ANALYZE TABLE:

mysql> analyze table t;
+--------+---------+----------+----------+
| Table  | Op      | Msg_type | Msg_text |
+--------+---------+----------+----------+
| test.t | analyze | status   | OK       |
+--------+---------+----------+----------+
1 row in set (0.00 sec)

Perfect, very fast! But then some seconds later we realize that our application is down. Let’s see the process list. I’ve removed most of the columns to make it clearer:

+------+-------------------------+---------------------------------+
| Time | State                   | Info                            |
|   32 | Writing to net          | select * from t where c > '%0%' |
|   12 | Waiting for table flush | select * from test.t where i=1  |
|   12 | Waiting for table flush | select * from test.t where i=2  |
|   12 | Waiting for table flush | select * from test.t where i=3  |
|   11 | Waiting for table flush | select * from test.t where i=7  |
|   10 | Waiting for table flush | select * from test.t where i=11 |
|   11 | Waiting for table flush | select * from test.t where i=5  |
|   11 | Waiting for table flush | select * from test.t where i=4  |
|   11 | Waiting for table flush | select * from test.t where i=9  |
|   11 | Waiting for table flush | select * from test.t where i=8  |
|   11 | Waiting for table flush | select * from test.t where i=12 |
|   11 | Waiting for table flush | select * from test.t where i=14 |
|   10 | Waiting for table flush | select * from test.t where i=6  |
|   10 | Waiting for table flush | select * from test.t where i=15 |
|   10 | Waiting for table flush | select * from test.t where i=10 |
[...]

The ANALYZE TABLE runs perfect but after it the rest of the threads that are running a query against that table need to wait. This is because MySQL has detected that the underlying table has changed and it needs to close and reopen it using FLUSH. Therefore the table will be locked until all queries that are using that table finish. There are only two solutions to this situation, wait until the long query finishes or kill the query. Also, we have to take in account that killing a query could cause even more troubles. If we are dealing with a write query on InnoDB the rollback process could take even more time to finish than the original query. On the other hand, if the table is MyISAM there will be no rollback process so all the already updated rows can’t be recovered.

This particular example is not only a problem of ANALYZE. Other commands like FLUSH TABLES, ALTER, RENAME, OPTIMIZE or REPAIR can cause threads to wait on “Waiting for tables”, “Waiting for table” and “Waiting for table flush”.

Conclusion

Before running an ANALYZE table or any other command listed before, check the running queries. If the table that you are going to work on is very used the recommendation is to run it during the low peak of load or a maintenance window.

The post MySQL optimizer: ANALYZE TABLE and Waiting for table flush appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com