May 04, 2018

How Binary Logs (and Filesystems) Affect MySQL Performance

I want to take a closer look at MySQL performance with binary logs enabled on different filesystems, especially as MySQL 8.0 comes with binary logs enabled by default.

As part of my benchmarks of the MyRocks storage engine, I noticed an unusual variance in throughput for the InnoDB storage engine, even though we spent a lot of time making it as stable as possible in Percona Server for MySQL. In the end, the culprit was the enabled binary logs. There is also always the question, “If there is a problem with EXT4, does XFS perform differently?” To answer that, I will repeat the same benchmark on the EXT4 and XFS filesystems.

You can find our previous experiments with binary logs here: https://www.percona.com/blog/2016/06/03/binary-logs-make-mysql-5-7-slower-than-5-6/.

Benchmark Setup

A short overview of the benchmark setup:

  • Percona Server for MySQL 5.7.21
  • InnoDB storage engine
  • In contrast to the previous benchmark, I enabled foreign keys, used the REPEATABLE-READ isolation level, and used the UTF8 character set. Because of these changes, the results are not directly comparable with the previous ones.
  • The dataset is the same: sysbench-tpcc with ten table sets of 100 warehouses each, resulting in 1000 warehouses total and about a 90GB dataset size.
  • I will use innodb_buffer_pool_size values of 80GB, 70GB, and 60GB to emulate different IO loads and evaluate how they affect binary log writes.

Initial Results

For the first run, let’s check the results without binary logs vs. with binary log enabled, but with sync_binlog=0:

[Chart: Binary Log Performance]

We can see that results without binary logs are generally better, but we can also see that with binary logs enabled and sync_binlog=0, there are regular drops to 0 throughput for 1-2 seconds. This basically results in stalls in any connected application.

So, enabling binary logs may result in regular application stalls. The reason is that there is a limit on the size of a binary log file (max_binlog_size), which defaults to 1GB. When the limit is reached, MySQL has to perform a binary log rotation. With sync_binlog=0, all previous writes to the binary log are cached in the OS cache, and during rotation MySQL forces a synchronous flush of all changes to disk. This results in complete stalls every ~40 seconds (the time it takes to fill 1GB of binary log in the above tests).

How can we deal with this? The obvious solution is to sync the binary logs more frequently, which can be achieved by setting sync_binlog > 0. The popular choice is the strictest setting, sync_binlog=1, which provides the strongest guarantees but also comes with a noted performance penalty. I will also test sync_binlog=1000 and sync_binlog=10000, which perform a synchronous write of the binary log every 1000 and 10000 transactions, respectively.
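To make these test configurations concrete, here is a minimal sketch of how the relevant settings can be inspected and changed at runtime (both variables are dynamic; persist any final choice in my.cnf as well):

mysql -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('sync_binlog', 'max_binlog_size');"
# try one of the tested values: 0, 1, 1000 or 10000
mysql -e "SET GLOBAL sync_binlog = 1000;"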

The Results

[Chart: Binary Log Performance 1]

The same results in tabular format, with median throughput (tps; more is better):

Buffer pool   sync_binlog=0   sync_binlog=1   sync_binlog=1000   sync_binlog=10000   no binlog
60GB          4174.945        3598.12         3950.19            4205.165            4277.955
70GB          5053.11         4541.985        4714               4997.875            5328.96
80GB          5701.985        5263.375        5303.145           5664.155            6087.925

Some conclusions we can make:

  • sync_binlog=1 comes with the biggest performance penalty, but with minimal variance, comparable to running without binary logs.
  • sync_binlog=0 provides the best performance (for enabled binary logs), but the variance is huge.
  • sync_binlog=1000 is a good compromise, providing better performance than sync_binlog=1 with minimal variance.
  • sync_binlog=10000 might not be a good choice: it shows less variance than sync_binlog=0, but the variance is still big.

So what value should we use? This is probably a choice between sync_binlog=1 and some value like 1000. It depends on your use case and your storage solution. With slow storage, sync_binlog=1 may show a bigger penalty than what I see on my enterprise SATA SSD SAMSUNG SM863.

Filesystems

All of the above results were on an EXT4 filesystem. Let’s compare to XFS. Will it show different throughput and variance?

[Chart: Binary Log Performance 2]

The median throughput in tabular format:

sync_binlog   Buffer pool (GB)   EXT4       XFS
0             60                 4174.945   3902.055
0             70                 5053.11    4884.075
0             80                 5701.985   5596.025
1             60                 3598.12    3526.545
1             70                 4541.985   4538.455
1             80                 5263.375   5255.38
1000          60                 3950.19    3620.05
1000          70                 4714       4526.49
1000          80                 5303.145   5150.11
10000         60                 4205.165   3874.03
10000         70                 4997.875   4845.85
10000         80                 5664.155   5557.61
No binlog     60                 4277.955   4169.215
No binlog     70                 5328.96    5139.625
No binlog     80                 6087.925   5957.015

We can observe the general trend that median throughput on XFS is a little worse than with EXT4, with practically identical variance.

The difference in throughput is minimal. You can use either XFS or EXT4.

Hardware Spec

Supermicro server:

  • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
  • 2 sockets / 28 cores / 56 threads
  • Memory: 256GB of RAM
  • Storage: SAMSUNG SM863 1.9TB Enterprise SSD
  • Filesystem: ext4/xfs
  • Percona-Server-5.7.21-20
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic

Extra Raw Results, Scripts and Config

My goal is to provide fully repeatable benchmarks. To that end, I’ve shared all the scripts and settings I used in the following GitHub repo:

https://github.com/Percona-Lab-results/201805-sysbench-tpcc-binlog-fs


Apr 19, 2018

Congratulations to Our Friends at Oracle with the MySQL 8.0 GA Release!

It is a great day for the whole MySQL community: MySQL 8.0 was just released as GA!

Geir Høydalsvik has a great summary in his “What’s New in MySQL 8.0” blog post. You can find additional information about MySQL 8.0 Replication and MySQL 8.0 Document Store that is also worth reading.

If you can’t wait to upgrade to MySQL 8.0, please make sure to read the Upgrading to MySQL 8.0 section in the manual, and pay particular attention to changes to Connection Authentication. It requires special handling for most applications.
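For example, if your connector does not yet support the new caching_sha2_password default, one common workaround is to fall back to the previous authentication plugin (a sketch; the account name and password are hypothetical):

# in my.cnf, under [mysqld], make new accounts use the pre-8.0 default plugin:
#   default_authentication_plugin=mysql_native_password
# or convert a single account:
mysql -e "ALTER USER 'app'@'%' IDENTIFIED WITH mysql_native_password BY 'app_password';"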

Also keep in mind that while MySQL 8.0 passed through an extensive QA process, this is the first GA release. It is not yet as mature and polished as MySQL 5.7. If you’re just now starting application development, however, you should definitely start with MySQL 8.0 — by the time you launch your application, 8.0 will be good. 

All of us at Percona – and me personally – are very excited about this release. You can learn more details about what we expect from it in our Why We’re Excited about MySQL 8.0 webinar recording.    

We also wrote extensively about MySQL 8.0 on our blog, with posts on various features as well as thoughts on the various RCs.

The best way to learn about MySQL 8.0, though, is to attend the Percona Live Open Source Database Conference 2018, taking place in Santa Clara, CA next week. We have an outstanding selection of MySQL 8.0-focused talks, both from the MySQL Engineering team and the community at large (myself included).

You can still get tickets to the conference. Come by and learn about MySQL 8.0. If you can’t make it, please check back later for slides.

Done reading? Go ahead and download MySQL 8.0 and check it out!


Apr 16, 2018

Binlog and Replication Improvements in Percona Server for MySQL

Due to continuous development and improvement, Percona Server for MySQL incorporates a number of improvements related to binary log handling and replication. These result in replication specifics that distinguish it from MySQL Server.

Temporary tables and mixed logging format

Summary of the fix:

When using the mixed binlog format, as soon as a statement involving a temporary table appeared, MySQL switched to row-based logging for all statements until the end of the session (or until all temporary tables used in the session were dropped). This is inconvenient when you have long-lasting connections, including replication-related ones. Percona Server for MySQL fixes the situation by switching between statement-based and row-based logging as necessary.

Details:

The mixed binary logging format supported by Percona Server for MySQL means that the server runs in statement-based logging by default, but switches to row-based logging when replication would be unpredictable, for example in the case of a nondeterministic SQL statement that could cause data divergence if reproduced on a slave server. The switch is done when matching any condition from a long list, and one of these conditions is the use of temporary tables.

Temporary tables are never logged using row-based format, but any statement that touches a temporary table is logged in row mode. This way, we intercept all the side effects that temporary tables can produce on non-temporary ones.

There is no need to use the row logging format for any other statements solely because a temporary table is present. However, MySQL took this excessive precaution: once a statement with a temporary table had appeared and row-based logging was used, MySQL unconditionally logged all subsequent statements in row format.

Percona Server for MySQL has implemented more accurate behavior. Instead of switching to row-based logging until the last temporary table is closed, the usual rules of row vs. statement format apply, and we don’t consider the presence of currently opened temporary tables. This change was introduced with the fix of bug #151 (upstream #72475).
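A short session illustrates the difference (a sketch; t1 and tmp1 are hypothetical tables, and the comments contrast the two behaviors):

SET SESSION binlog_format = 'MIXED';
CREATE TEMPORARY TABLE tmp1 (id INT);
INSERT INTO t1 SELECT id FROM tmp1; -- touches a temporary table: logged in row format
INSERT INTO t1 VALUES (1);          -- tmp1 is still open: upstream MySQL logs this in row
                                    -- format too, Percona Server uses statement format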

Temporary table drops and binlogging on a GTID-enabled server

Summary of the fix:

MySQL logs DROP statements for all temporary tables regardless of the logging mode under which these tables were created. This produces binlog writes and errant GTIDs on slaves with row and mixed logging. Percona Server for MySQL fixes this by tracking the binlog format at temporary table creation time and using it to decide whether a DROP should be logged or not.

Details:

Even with read_only mode enabled, the server permits some operations, including ones with temporary tables. With the previous fix, temporary table operations are not binlogged in row or mixed mode. But the MySQL server doesn’t track what the logging mode was when a temporary table was created, and therefore unconditionally logs DROP statements for all temporary tables. These DROP statements receive an IF EXISTS clause, which is intended to make them harmless.

Percona Server for MySQL has fixed this with the bug fixes #964, upstream #83003, and upstream #85258. Moreover, with all the binlogging fixes discussed so far, nothing involving temporary tables is logged to the binary log in row or mixed format, so there is no need to consider CREATE/DROP TEMPORARY TABLE unsafe for use in stored functions, triggers, and multi-statement transactions in row/mixed format. Therefore, we introduced an additional fix to mark the creation and drop of temporary tables as unsafe inside transactions in statement-based replication only (the fixed bug is #1816, while the corresponding upstream one, #89467, is still open).

Safety of statements with a LIMIT clause

Summary of the fix:

MySQL Server considers all UPDATE/DELETE/INSERT ... SELECT statements with a LIMIT clause unsafe, regardless of whether they really produce nondeterministic results. Percona Server for MySQL is more accurate: it recognizes such statements as safe when they include an ORDER BY PK or a WHERE condition.

Details:

MySQL Server treats UPDATE/DELETE/INSERT ... SELECT statements with the LIMIT clause as unsafe, considering that they produce an unpredictable number of rows. But some such statements can still produce an absolutely predictable result. One such deterministic case takes place when a statement with the LIMIT clause has an ORDER BY PK or WHERE condition.
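For illustration (a sketch; the table and column names are hypothetical):

DELETE FROM orders LIMIT 10;               -- unsafe: which 10 rows are deleted is undefined
DELETE FROM orders ORDER BY id LIMIT 10;   -- deterministic thanks to the primary key ordering,
                                           -- and recognized as safe by Percona Server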

The patch, which treats updates and deletes with a LIMIT as safe if they have an ORDER BY pk_column clause, was initially provided on the upstream bug report and later incorporated into Percona Server for MySQL with additional improvements. Bug fixed #44 (upstream #42415).

Performance improvements

There are also two modifications in Percona Server related to multi-source replication that improve performance on slaves.

The first improvement concerns the relay log position, which was always updated in multi-source replication setups regardless of whether the committed transaction had already been executed. Percona Server omits relay log position updates for already logged GTIDs.

These unconditional relay log position updates caused additional fsync operations in the case of relay-log-info-repository=TABLE. The more channels transmitting such duplicate (already executed) transactions, the proportionally worse the situation became. The problem was solved in Percona Server 5.7.18-14. Bug fixed #1786 (upstream #85141).

The second improvement decreases the load on slave nodes configured to update the master status and connection information only on log file rotation. MySQL additionally updated this information in the case of multi-source replication when a slave had to skip the already executed GTID event. This behavior was the cause of substantially higher write loads on slaves and lower replication throughput.

The configuration with master_info_repository=TABLE and sync_master_info=0 makes the slave update the master status and connection information in this table on log file rotation, rather than after each sync_master_info event, but it didn’t work in multi-source replication setups. Heartbeats sent to the slave to skip GTID events that it had already executed were treated as relay log rotation events, triggering a sync of the mysql.slave_master_info table. This inaccuracy could produce a huge (up to five times on some setups) increase in write load on the slave before the problem was fixed in Percona Server for MySQL 5.7.20-19. Bug fixed #1812 (upstream #85158).

Current status of fixes

The three issues related to temporary tables were fixed in Percona Server 5.5 and contributed upstream; the final fixes of bugs #72475, #83003, and #85258 have landed in MySQL Server 8.0.4.


Apr 05, 2018

Percona Live Europe 2018 – Save the Date!

We’ve been searching for a great venue for Percona Live Europe 2018, and I am thrilled to announce we’ll be hosting it in Frankfurt, Germany! Please block November 5-7, 2018 on your calendar now and plan to join us at the Radisson Blu Frankfurt for the premier open source database conference.

We’re in the final days of organizing Percona Live 2018 in Santa Clara. You can still purchase tickets for an amazing lineup of keynote speakers, tutorials and sessions. We have ten tracks, including MySQL, MongoDB, Cloud, PostgreSQL, Containers and Automation, Monitoring and Ops, and Database Security. Major areas of focus at the conference will include:

  • Database operations and automation at scale, featuring speakers from Facebook, Slack, Github and more
  • Databases in the cloud – how database-as-a-service (DBaaS) is changing the DB landscape, featuring speakers from AWS, Microsoft, Alibaba and more
  • Security and compliance – how GDPR and other government regulations are changing the way we manage databases, featuring speakers from Fastly, Facebook, Pythian, Percona and more
  • Bridging the gap between developers and DBAs – finding common ground, featuring speakers from Square, Oracle, Percona and more

The Call for Papers for Percona Live Europe will open soon. We look forward to seeing you in Santa Clara!


Mar 30, 2018

Multi-Source Replication Performance with GTID


In this blog post, we’ll look at the performance of multi-source replication with GTID.

Multi-Source Replication is a topology I’ve seen discussed recently, so I decided to look into how it performs with the different replication concepts. Multi-source replication uses replication channels, which allow a slave to replicate from multiple masters. This is a great way to consolidate data that has been sharded for production, or to simplify the analytics process by using the same server. Since multiple masters are taking writes, care is needed not to overload the slave. The traditional replication concept uses the binary log file name and the position inside that file.

This was the standard until the release of global transaction identifiers (GTID). I have set up a test environment to validate which concept would perform better, and be a better choice for use in this topology.

SETUP

My test setup is rather simple, consisting of only three virtual machines: two masters and one slave. The slave’s replication channels are set up using the same concept for each run, and no run had any replication filters. To prevent replication errors, each master took writes against a different schema, and user grants are identical on all three servers. The setup below ran with both replication channels using the binary log file and position; then the tables were dropped and the servers were changed to use GTID for the next run.
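For reference, a channel definition looks roughly like this (a sketch; host names and credentials are placeholders). The file/position runs used explicit binary log coordinates, while the GTID runs used MASTER_AUTO_POSITION=1 instead:

CHANGE MASTER TO MASTER_HOST='master1', MASTER_USER='repl', MASTER_PASSWORD='<secret>',
MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4 FOR CHANNEL 'db1';
START SLAVE FOR CHANNEL 'db1';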

Prepare the sysbench tables:

sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db1 --range_size=100 --table_size=1000000 --tables=5 --threads=5 --events=0 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua prepare
sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db3 --range_size=100 --table_size=1000000 --tables=5 --threads=5 --events=0 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua prepare

I used a read-only sysbench to warm up the InnoDB buffer pool. Both commands ran on the slave to ensure both schemas were loaded into the buffer pool:

sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db1 --range_size=100 --table_size=1000000 --tables=5 --threads=5 --events=0 --time=3600 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua run
sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db3 --range_size=100 --table_size=1000000 --tables=5 --threads=5 --events=0 --time=3600 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua run

After warming up the buffer pool, the slave should be fully caught up with both masters. To remove IO contention as a possible influencer, I stopped the SQL thread while I generated load on the master. Leaving the IO thread running allowed the slave to write the relay logs during this process, and help ensure that the test only measures the difference in the slave SQL thread.

stop slave sql thread for channel 'db1'; stop slave sql thread for channel 'db3';

Each master had a sysbench run against it for the schema that was designated to it in order to generate the writes:

sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db1 --range_size=100 --table_size=1000000 --tables=5 --threads=1 --events=0 --time=3600 --rand-type=uniform /usr/share/sysbench/oltp_write_only.lua run
sysbench --db-driver=mysql --mysql-user= --mysql-password='' --mysql-db=db3 --range_size=100 --table_size=1000000 --tables=5 --threads=1 --events=0 --time=3600 --rand-type=uniform /usr/share/sysbench/oltp_write_only.lua run

Once the writes completed, I monitored the IO activity on the slave to ensure it was 100% idle and that all relay logs were fully captured. Once everything was fully written, I enabled a capture of the replication lag once per minute for each replication channel, and started the slave’s SQL threads:

/usr/bin/pt-heartbeat -D db1 -h localhost --master-server-id=101 --check
/usr/bin/pt-heartbeat -D db3 -h localhost --master-server-id=103 --check
start slave sql thread for channel 'db1'; start slave sql thread for channel 'db3';

The above chart depicts the cumulative lag seen on the slave by pt-heartbeat since starting the sql_thread. The first item to notice is that the replication delay was higher overall with the binary log. This could be because the SQL threads were stopped for different amounts of time. This may appear to give GTID an advantage in this test, but remember that with this test the amount of delay is less important than the processing rate. Focusing on when replication began to display a significant change towards catching up, you can see two distinct drops in delay. This is caused by the fact that the slave has two replication threads that individually monitor their delay: one of the replication threads caught up fully, while the other was delayed for a bit longer.

In every test run, GTID took slightly longer to fully catch up than the traditional method. There are a couple of reasons to expect GTIDs to be slightly slower. One possibility is that there are additional writes on the slave to keep track of all the GTIDs the slave ran. I removed the initial write to the relay log, but we must retain the committed GTIDs, and this causes additional writes. I used the default settings for MySQL, and as such log_slave_updates was disabled. This causes the replicated GTIDs to be stored in a table, which is periodically compressed. You can find more details on how log_slave_updates impacts GTID replication here.

So the question still exists: why should we use GTID, especially with multi-source replication? I’ve found that the answer lies in the composition of a GTID. From MySQL’s GTID Concepts, a GTID is composed of two parts: the source_id and the transaction_id. The source_id uniquely identifies the server that originally wrote the transaction. This allows you to identify in the binary log which master took the initial write, so you can pinpoint problems much more easily.

The excerpt below, from DB1’s binary log (a master in this test), shows that “SET @@SESSION.GTID_NEXT” runs before the transaction is written. This is the GTID that you can follow through the rest of the topology to identify the same transaction.

“d1ab72e9-0220-11e8-aee7-00155dab6104” is the server_uuid for DB1, and 270035 is the transaction id.

SET @@SESSION.GTID_NEXT= 'd1ab72e9-0220-11e8-aee7-00155dab6104:270035'/*!*/;
# at 212345
#180221 15:37:56 server id 101 end_log_pos 212416 CRC32 0x758a2d77 Query thread_id=15 exec_time=0 error_code=0
SET TIMESTAMP=1519245476/*!*/;
BEGIN
/*!*/;
# at 212416
#180221 15:37:56 server id 101 end_log_pos 212472 CRC32 0x4363b430 Table_map: `db1`.`sbtest1` mapped to number 109
# at 212472
#180221 15:37:56 server id 101 end_log_pos 212886 CRC32 0xebc7dd07 Update_rows: table id 109 flags: STMT_END_F
### UPDATE `db1`.`sbtest1`
### WHERE
### @1=654656 /* INT meta=0 nullable=0 is_null=0 */
### @2=575055 /* INT meta=0 nullable=0 is_null=0 */
### @3='20363719684-91714942007-16275727909-59392501704-12548243890-89454336635-33888955251-58527675655-80724884750-84323571901' /* STRING(120) meta=65144 nullable=0 is_null=0 */
### @4='97609582672-87128964037-28290786562-40461379888-28354441688' /* STRING(60) meta=65084 nullable=0 is_null=0 */
### SET
### @1=654656 /* INT meta=0 nullable=0 is_null=0 */
### @2=575055 /* INT meta=0 nullable=0 is_null=0 */
### @3='17385221703-35116499567-51878229032-71273693554-15554057523-51236572310-30075972872-00319230964-15844913650-16027840700' /* STRING(120) meta=65144 nullable=0 is_null=0 */
### @4='97609582672-87128964037-28290786562-40461379888-28354441688' /* STRING(60) meta=65084 nullable=0 is_null=0 */
# at 212886
#180221 15:37:56 server id 101 end_log_pos 212942 CRC32 0xa6261395 Table_map: `db1`.`sbtest3` mapped to number 111
# at 212942
#180221 15:37:56 server id 101 end_log_pos 213166 CRC32 0x2782f0ba Write_rows: table id 111 flags: STMT_END_F
### INSERT INTO `db1`.`sbtest3`
### SET
### @1=817058 /* INT meta=0 nullable=0 is_null=0 */
### @2=390619 /* INT meta=0 nullable=0 is_null=0 */
### @3='01297933619-49903746173-24451604496-63437351643-68022151381-53341425828-64598253099-03878171884-20272994102-36742295812' /* STRING(120) meta=65144 nullable=0 is_null=0 */
### @4='29893726257-50434258879-09435473253-27022021485-07601619471' /* STRING(60) meta=65084 nullable=0 is_null=0 */
# at 213166
#180221 15:37:56 server id 101 end_log_pos 213197 CRC32 0x5814a60c Xid = 2313
COMMIT/*!*/;
# at 213197

Conclusion

Based on the sysbench tests I ran, GTID replication has slightly lower throughput. It took about two to three minutes longer to process an hour’s worth of writes on two masters, compared to binary log replication. GTID’s strengths lie more in how it eases the management and troubleshooting of complex replication topologies.

The GTID concept allows a slave to know exactly which server initially wrote the transaction, even in a tiered environment. This means that if you need to promote a slave from the bottom tier to the middle tier, simply changing the master is all that is needed. The slave can pick up from the last transaction it ran on that server and continue replicating without a problem. Stephane Combaudon explains this in detail in a pair of blogs: you can find part 1 here and part 2 here. Facebook also has a great post about their experience deploying GTID-based replication and the troubles they faced.


Mar 27, 2018

ANALYZE TABLE Is No Longer a Blocking Operation

In this post, I’ll discuss the fix for lp:1704195 (migrated to PS-2503), which prevents ANALYZE TABLE from blocking all subsequent queries on the same table.

In November 2017, Percona released a fix for lp:1704195 (migrated to PS-2503), created by Laurynas Biveinis. The fix, included with Percona Server for MySQL since versions 5.6.38-83.0 and 5.7.20-18, stops ANALYZE TABLE from invalidating query and table definition cache content for supported storage engines (InnoDB, TokuDB and MyRocks).

Why is this important?

In short, it is now safe to run ANALYZE TABLE in production environments, because it won’t trigger a situation where all queries on the same table stack up in the state "Waiting for table flush". Check this blog post for details on how this situation can happen.

Why do we need to run ANALYZE TABLE?

When the optimizer decides which index to use to resolve a query, it relies on statistics stored for this table by the storage engine. If the statistics are not up to date, the optimizer might choose the wrong index when it creates the query execution plan. This can cause performance to suffer.

To prevent this, storage engines support automatic and manual statistics updates. While automatic statistics updates usually work fine, there are cases when they do not do their job properly.

For example, InnoDB samples 20 16K pages when it updates persistent statistics, and eight 16K pages when it updates transient statistics. If your data distribution is even, it does not matter how big your table is: even for a 1T table, a 320K sample is enough. But if your data is skewed, the statistics might be created incorrectly. The solution for this issue is to increase either the innodb_stats_transient_sample_pages or innodb_stats_persistent_sample_pages variable. But increasing the number of pages examined while collecting statistics leads to longer update runs, and thus higher IO activity, which is probably not something you want to happen often.

To control this, you can disable automatic statistics updates for such tables, and schedule a job that periodically runs ANALYZE TABLE.
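A sketch of that approach (the table name is hypothetical; STATS_AUTO_RECALC and STATS_SAMPLE_PAGES are standard InnoDB table options):

ALTER TABLE big_skewed_table STATS_AUTO_RECALC=0, STATS_SAMPLE_PAGES=256;
-- then refresh statistics from a scheduled job, for example nightly:
ANALYZE TABLE big_skewed_table;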

Was it safe to run ANALYZE TABLE before the fix for lp:1704195 (migrated to PS-2503)?

Theoretically yes, but we could easily hit the situation described in this blog post by Miguel Angel Nieto. The article describes what happens if a long-running query starts and doesn’t finish before ANALYZE TABLE: at some point, all queries on the analyzed table get stuck in the state "Waiting for table flush".

This happens because, before the fix, ANALYZE TABLE worked as follows:

  1. Opens table statistics: concurrent DML operations (INSERT/UPDATE/DELETE/SELECT) are allowed
  2. Updates table statistics: concurrent DML operations are allowed
  3. Update finished
  4. Invalidates the table entry in the table definition cache: concurrent DML operations are forbidden. What happens here is that ANALYZE TABLE marks the currently open table share instances as invalid. This does not affect running queries: they will complete as usual. But incoming queries will not start until they can re-open a table share instance, and that will not happen until all currently running queries complete.
  5. Invalidates the query cache: concurrent DML operations are forbidden

The last two operations are usually fast, but they cannot finish if another query has touched either the table share instance or acquired the query cache mutex. And they, in turn, prevent incoming queries from starting.

However, ANALYZE TABLE modifies table statistics, not the table definition! Practically, it cannot affect already running queries in any way: if a query started before ANALYZE TABLE finished updating statistics, it uses the old statistics. ANALYZE TABLE does not affect data in the table, thus old entries in the query cache will still be correct. And it hasn’t changed the definition of the table, therefore there is no need to remove it from the table definition cache. As a result, we avoid operations 4 and 5 above.

The fix for lp:1704195 (migrated to PS-2503) removes these additional updates and the locks required for them, and makes ANALYZE TABLE always safe to run in busy production environments.


Mar 26, 2018

New MySQL 8.0 innodb_dedicated_server Variable Optimizes InnoDB from the Get-Go

In this post, we’ll look at the MySQL 8.0 innodb_dedicated_server variable.

MySQL 8.0 introduces a new variable called innodb_dedicated_server. When enabled, it automatically tunes innodb_buffer_pool_size, innodb_log_file_size and innodb_flush_method at startup (if these variables are not explicitly defined in my.cnf).

The new MySQL 8.0 variable automatically sizes the following variables based on the RAM size of the system:

innodb_buffer_pool_size:

    • <1G: 128M (default value if innodb_dedicated_server is OFF)
    • <=4G: Detected Physical RAM * 0.5
    • >4G: Detected Physical RAM * 0.75

innodb_log_file_size:

    • <1G: 48M (default value if innodb_dedicated_server is OFF)
    • <=4G: 128M
    • <=8G: 512M
    • <=16G: 1024M
    • >16G: 2G

The variable also sets the following:

innodb_flush_method: 

    • Set to O_DIRECT_NO_FSYNC if the setting is available on the system. If not, set it to the default InnoDB flush method

These new default values are very reasonable, and the changes to these three variables show considerable performance improvements from the get-go compared to the old default values. As stated in the worklog for this feature, the current MySQL version (5.7) uses only around 512M of RAM with the default settings. With the new feature, these variables easily adapt to the amount of RAM allocated to the server, for the convenience of the system/database administrator.
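Enabling the feature is a single setting, whether on the command line or in my.cnf (a sketch; note that explicitly setting any of the three auto-tuned variables overrides the corresponding automatic value):

mysqld --innodb_dedicated_server=ON
# after startup, verify what was chosen:
mysql -e "SELECT @@innodb_buffer_pool_size, @@innodb_log_file_size, @@innodb_flush_method;"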

With that said, you can achieve the best setting for these three variables by tuning it to your workload and hardware.

For InnoDB buffer pool size (based on this article), consider allocating 80% of physical RAM for starters. You can increase it to as large as needed and possible, as long as the system doesn’t swap on the production workload.

For the InnoDB log file size, it should be able to handle one hour of writes to allow InnoDB to optimize writing the redo log to disk. You can calculate an estimate by following the steps here, which sample one minute’s worth of writes to the redo log. You could also get a better estimate from hourly log file usage with Percona Monitoring and Management (PMM) graphs.
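The one-minute sampling mentioned above boils down to reading the InnoDB log sequence number twice and extrapolating (a sketch of the approach):

lsn1=$(mysql -e "SHOW ENGINE INNODB STATUS\G" | awk '/Log sequence number/ {print $4}')
sleep 60
lsn2=$(mysql -e "SHOW ENGINE INNODB STATUS\G" | awk '/Log sequence number/ {print $4}')
echo "$(( (lsn2 - lsn1) * 60 / 1024 / 1024 )) MB of redo per hour"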

Finally, for innodb_flush_method, O_DIRECT_NO_FSYNC prevents double buffering between the OS cache and disk, and works well with low-latency IO devices such as RAID subsystem with write cache. On the other hand, in high-latency IO devices, commonly found on deployments where MySQL is stored in SAN drives, having an OS cache with the default flush method fsync is more beneficial.

All in all, the MySQL 8.0 innodb_dedicated_server variable provides a fairly well-tuned InnoDB configuration at startup. But if it’s not enough, you can still tune these variables based on your workload and hardware. While MySQL 8.0 isn’t released yet, you can take a look at this article that helps you tune the current version (MySQL 5.7) right after installation.


Feb 22, 2018

How to Restore MySQL Logical Backup at Maximum Speed

The ability to restore MySQL logical backups is a significant part of disaster recovery procedures. It’s a last line of defense.

Even if you lost all data from a production server, physical backups (data file snapshots created with an offline copy or with Percona XtraBackup) could contain the same internal database structure corruption as the production data. Backups in a simple plain text format allow you to avoid such corruption and to migrate between database formats (e.g., during a software upgrade or downgrade), or even help with migration from a completely different database solution.

Unfortunately, the restore speed for logical backups is usually bad, and for a big database it could take days or even weeks to get the data back. Thus it’s important to tune backups and MySQL for the fastest data restore, and to change settings back before resuming production operations.

Disclaimer

All results are specific to my combination of hardware and dataset, but could be used as an illustration for MySQL database tuning procedures related to logical backup restore.

Benchmark

There is no general advice for tuning a MySQL database for a bulk logical backup load, and any parameter should be verified with a test on your hardware and database. In this article, we will explore some variables that help that process. To illustrate the tuning procedure, I’ve downloaded IMDB CSV files and created a MySQL database with pyimdb.

You may repeat the whole benchmark procedure, or just look at settings changed and resulting times.

Database:

  • 16GB – InnoDB database size
  • 6.6GB – uncompressed mysqldump sql
  • 5.8GB – uncompressed CSV + create table statements.

The simplest restore procedure for logical backups created by the mysqldump tool:

mysql -e 'create database imdb;'
time mysql imdb < imdb.sql
# real 129m51.389s

This requires slightly more than two hours to restore the backup into the MySQL instance started with default settings.

I’m using the Docker image percona:latest – it contains Percona Server 5.7.20-19 running on a laptop with 16GB RAM, Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, two disks: SSD KINGSTON RBU-SNS and HDD HGST HTS721010A9.

Let’s start with some “good” settings: a buffer pool bigger than default, 2x1GB transaction log files, sync disabled (because we are using a slow HDD), and big values for IO capacity. The load should be faster with big batches, so we use 1GB for max_allowed_packet.

Values were chosen to be bigger than the default MySQL parameters because I’m trying to see the difference from the usually suggested values (like 80% of RAM should belong to the InnoDB buffer pool).

docker run --publish-all --name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
  --innodb_buffer_pool_size=4GB
  --innodb_log_file_size=1G
  --skip-log-bin
  --innodb_flush_log_at_trx_commit=0
  --innodb_flush_method=nosync
  --innodb_io_capacity=2000
  --innodb_io_capacity_max=3000
  --max_allowed_packet=1G
  time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
  # real 59m34.252s

The load is IO-bounded, and there is no reaction to set global foreign_key_checks=0 and unique_checks=0, because these settings are already disabled in the dump file.

How can we reduce IO?

Disable InnoDB double write: --innodb_doublewrite=0

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 44m49.963s

A huge improvement, but we still have an IO-bounded load.

We will not be able to improve the load time significantly for an IO-bounded load. Let’s move to SSD:

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 33m36.975s

Is it vital to disable disk sync for the InnoDB transaction log?

sudo rm -rf mysql/*
docker rm p57
docker run -v /home/ihanick/Private/Src/tmp/data-movies/imdb.sql:/root/imdb.sql -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql
--name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
--innodb_buffer_pool_size=4GB
--innodb_log_file_size=1G
--skip-log-bin
--innodb_flush_log_at_trx_commit=0
--innodb_io_capacity=700
--innodb_io_capacity_max=1500
--max_allowed_packet=1G
--innodb_doublewrite=0
# real 33m49.724s

There is no significant difference.

By default, mysqldump produces SQL data, but it could also save data to CSV format:

cd /var/lib/mysql-files
mkdir imdb
chown mysql:mysql imdb/
time mysqldump --max_allowed_packet=128M --tab /var/lib/mysql-files/imdb imdb1
# real 1m45.983s
sudo rm -rf mysql/*
docker rm p57
docker run -v /srv/ihanick/tmp/imdb:/var/lib/mysql-files/imdb -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql
--name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
--innodb_buffer_pool_size=4GB
--innodb_log_file_size=1G
--skip-log-bin
--innodb_flush_log_at_trx_commit=0
--innodb_io_capacity=700
--innodb_io_capacity_max=1500
--max_allowed_packet=1G
--innodb_doublewrite=0
time (
mysql -e 'drop database imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1 ;
for i in $PWD/*.txt ; do mysqlimport imdb1 $i ; done
)
# real 21m56.049s
1.5X faster, just because of changing the format from SQL to CSV!

We’re still using only one CPU core; let’s improve the load with the --use-threads=4 option:

time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
mysqlimport --use-threads=4 imdb1 $PWD/*.txt
)
# real 15m38.147s

In the end, the load is still not fully parallel due to a big table: all other tables are loaded, but one thread is still active.

Let’s split CSV files into smaller ones. For example, 100k rows in each file and load with GNU/parallel:

# /var/lib/mysql-files/imdb/test-restore.sh
apt-get update ; apt-get install -y parallel
cd /var/lib/mysql-files/imdb
time (
cd split1
for i in ../*.txt ; do echo $i ; split -a 6 -l 100000 -- $i `basename $i .txt`. ; done
for i in `ls *.*|sed 's/^[^.]*\.//'|sort -u` ; do
mkdir ../split-$i
for j in *.$i ; do mv $j ../split-$i/${j/$i/txt} ; done
done
)
# real 2m26.566s
time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
parallel 'mysqlimport imdb1 /var/lib/mysql-files/imdb/{}/*.txt' ::: split-*
)
#real 16m50.314s

Split is not free, but you can split your dump files right after backup.

The load is parallel now, but the single big table strikes back with ‘setting auto-inc lock’ in SHOW ENGINE INNODB STATUS\G

Using the --innodb_autoinc_lock_mode=2 option fixes this issue: 16m2.567s.

We got slightly better results with just mysqlimport --use-threads=4. Let’s check if hyperthreading helps, and whether the problem is caused by the “parallel” tool:

  • Using four parallel jobs for load: 17m3.662s
  • Using four parallel jobs for load and two threads: 16m4.218s

There is no difference between GNU/Parallel and --use-threads option of mysqlimport.

Why 100k rows? With 500k rows: 15m33.258s

Now we have performance better than for mysqlimport --use-threads=4.

How about 1M rows at once? Just 16m52.357s.

I see periodic “flushing logs” messages, so let’s try bigger transaction logs (2x4GB): 12m18.160s:

--innodb_buffer_pool_size=4GB --innodb_log_file_size=4G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2 --performance-schema=0

Let’s compare these numbers with myloader 0.6.1, also running with four threads (myloader has only a -d parameter; each myloader execution time is listed under the corresponding mydumper command):

# oversized statement size to get 0.5M rows in one statement, single statement per chunk file
mydumper -B imdb1 --no-locks --rows 500000 --statement-size 536870912 -o 500kRows512MBstatement
17m59.866s
mydumper -B imdb1 --no-locks -o default_options
17m15.175s
mydumper -B imdb1 --no-locks --chunk-filesize 128 -o chunk128MB
16m36.878s
mydumper -B imdb1 --no-locks --chunk-filesize 64 -o chunk64MB
18m15.266s

It will be great to test mydumper with CSV format, but unfortunately, it wasn’t implemented in the last 1.5 years: https://bugs.launchpad.net/mydumper/+bug/1640550.

Returning to the parallel CSV file load, with even bigger transaction logs (2x8GB): 11m15.132s.

What about a bigger buffer pool: --innodb_buffer_pool_size=12G? 9m41.519s

Let’s check six-year-old server-grade hardware: Intel(R) Xeon(R) CPU E5-2430 with SAS raid (used only for single SQL file restore test) and NVMe (Intel Corporation PCIe Data Center SSD, used for all other tests).

I’m using similar options as for previous tests, with 100k rows split for CSV files load:

--innodb_buffer_pool_size=8GB --innodb_log_file_size=8G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2

  • Single SQL file created by mysqldump loaded for 117m29.062s = 2x slower.
  • 24 parallel processes of mysqlimport: 11m51.718s
  • Again, hyperthreading makes a huge difference! 12 parallel jobs: 18m3.699s.
  • Due to higher concurrency, adaptive hash index is a reason for locking contention. After disabling it with --skip-innodb_adaptive_hash_index: 10m52.788s.
  • In many places, disabling unique checks is referred to as a performance booster: 10m52.489s. You can spend more time reading advice about unique_checks, but it might only help for databases with many unique indexes (in addition to the primary key).
  • The buffer pool is smaller than the dataset; can you change the old/new page split to make inserts faster? No: --innodb_old_blocks_pct=5: 10m59.517s.
  • O_DIRECT is also recommended: --innodb_flush_method=O_DIRECT: 11m1.742s.
  • O_DIRECT is not able to improve performance by itself, but if you can use a bigger buffer pool: O_DIRECT + 30% bigger buffer pool (--innodb_buffer_pool_size=11G): 10m46.716s.

Conclusions

  • There is no common solution to improve logical backup restore procedure.
  • If you have IO-bounded restore: disable InnoDB double write. It’s safe because even if the database crashes during restore, you can restart the operation.
  • Do not use SQL dumps for databases > 5-10GB. CSV files are much faster for mysqldump+mysql. Implement mysqldump --tab + mysqlimport, or use mydumper/myloader with an appropriate chunk-filesize.
  • The number of rows per load data infile batch is important. Usually 100K-1M, use binary search (2-3 iterations) to find a good value for your dataset.
  • InnoDB log file size and buffer pool size are really important options for backup restore performance.
  • O_DIRECT reduces insert speed, but it’s good if you can increase the buffer pool size.
  • If you have enough RAM or SSD, the restore procedure is limited by CPU. Use a faster CPU (higher frequency, turboboost).
  • Hyperthreading also counts.
  • A powerful server could be slower than your laptop (12×2.4GHz vs. 4×2.8+turboboost).
  • Even with modern hardware, it’s hard to expect backup restore faster than 50MBps (for the final size of InnoDB database).
  • You can find a lot of different advice on how to improve backup load speed. Unfortunately, it’s not possible to implement improvements blindly, and you should know the limits of your system with general Unix performance tools like vmstat, iostat and various MySQL commands like SHOW ENGINE INNODB STATUS (all can be collected together with pt-stalk).
  • Percona Monitoring and Management (PMM) also provides good graphs, but you should be careful with QAN: full slow query log during logical database dump restore can cause significant processing load.
  • Default MySQL settings could cost you a 10x backup restore slowdown.
  • This benchmark is aimed at speeding up the restore procedure while the application is not running and the server is not used in production. Make sure that you have reverted all configuration parameters back to production values after the load. For example, if you disable the InnoDB double write buffer during restore and leave it enabled in production, you may get scary data corruption due to partial InnoDB page writes.
  • If the application is running during restore, in most cases you will get an inconsistent database due to missing support for locking or correct transactions for restore methods (discussed above).

Feb 19, 2018

Percona Server for MySQL 5.7.21-20 Is Now Available

Percona announces the GA release of Percona Server for MySQL 5.7.21-20 on February 19, 2018. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

Based on MySQL 5.7.21, including all the bug fixes in it, Percona Server for MySQL 5.7.21-20 is the current GA release in the Percona Server for MySQL 5.7 series. Percona provides completely open-source and free software.

New Features:
  • A new string variable, version_suffix, allows you to change the suffix for the Percona Server version string returned by the read-only version variable. Also, version_comment was converted from a global read-only to a global read-write variable.
  • A new keyring_vault_timeout variable allows you to set the Vault server connection timeout in seconds. Bug fixed #298.
Bugs Fixed:
  • The mysqld startup script was unable to detect the jemalloc library location for preloading, which prevented starting Percona Server on systemd-based machines. Bugs fixed #3784 and #3791.
  • There was a problem with fulltext search, which could find a word with punctuation marks in natural language mode only, but not in boolean mode. Bugs fixed #258, #2501 (upstream #86164).
  • Build errors were present on FreeBSD (caused by fixing the bug #255 in Percona Server 5.6.38-83.0) and on MacOS (caused by fixing the bug #264 in Percona Server 5.7.20-19). Bugs fixed #2284 and #2286.
  • A bunch of fixes was introduced to remove GCC 7 compilation warnings for the Percona Server build. Bugs fixed #3780 (upstream #89420, #89421, and #89422).
  • A CMake error occurred during compilation with the bundled zlib. Bug fixed #302.
  • A GCC 7 warning fix introduced a regression in Percona Server that led to a wrong SQL query being built to access the remote server when the Federated storage engine was used. Bug fixed #1134.
  • It was possible to enable encrypt_binlog with no binary or relay logging enabled. Bug fixed #287.
  • Long buffer wait times were occurring on busy servers for the IMPORT TABLESPACE command. Bug fixed #276.
  • Server queries that contained JSON special characters and were logged by Audit Log Plugin in JSON format caused invalid output due to lack of escaping. Bug fixed #1115.
  • Percona Server now uses Travis CI for additional tests. Bug fixed #3777.

Other bugs fixed: #257, #264, #1090 (upstream #78048), #1109, #1127, #2204, #2414, #2415, #3767, #3794, and #3804 (upstream #89598).

This release also contains fixes for the following CVE issues: CVE-2018-2565, CVE-2018-2573, CVE-2018-2576, CVE-2018-2583, CVE-2018-2586, CVE-2018-2590, CVE-2018-2612, CVE-2018-2600, CVE-2018-2622, CVE-2018-2640, CVE-2018-2645, CVE-2018-2646, CVE-2018-2647, CVE-2018-2665, CVE-2018-2667, CVE-2018-2668, CVE-2018-2696, CVE-2018-2703, CVE-2017-3737.
MyRocks Changes:
  • A new behavior makes Percona Server fail to restart on detected data corruption; the rocksdb_allow_to_start_after_corruption variable can be passed to mysqld as a command line parameter to switch off this restart failure.
  • A new CMake option, ALLOW_NO_SSE42, was introduced to allow building MyRocks on hosts not supporting the SSE 4.2 instruction set, which makes MyRocks usable without FastCRC32-capable hardware. Bug fixed MYR-207.
  • The rocksdb_bytes_per_sync and rocksdb_wal_bytes_per_sync variables were turned into dynamic ones.
  • rocksdb_flush_memtable_on_analyze variable has been removed.
  • rocksdb_concurrent_prepare is now deprecated, as it has been renamed upstream to rocksdb_two_write_queues.
  • rocksdb_row_lock_deadlocks and rocksdb_row_lock_wait_timeouts global status counters were added to track the number of deadlocks and the number of row lock wait timeouts.
  • Creating a table with a string indexed column and a non-binary collation now generates a warning about using an inefficient collation instead of an error. Bug fixed MYR-223.
TokuDB Changes:
  • A memory leak was fixed in the PerconaFT library, caused by not destroying PFS key objects on shutdown. Bug fixed TDB-98.
  • A clang-format configuration was added to PerconaFT and TokuDB. Bug fixed TDB-104.
  • A data race was fixed in minicron utility of the PerconaFT. Bug fixed TDB-107.
  • Row count and cardinality decreased to zero after a long-running REPLACE load.

Other bugs fixed: TDB-48, TDB-78, TDB-93, and TDB-99.

The release notes for Percona Server for MySQL 5.7.21-20 are available in the online documentation. Please report any bugs on the project bug tracking system.

Feb 15, 2018

Troubleshooting MySQL Crashes Webinar: Q&A

In this blog, I will provide answers to the Q&A for the Troubleshooting MySQL Crashes webinar.

First, I want to thank everybody for attending our January 25, 2018, webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I was unable to answer fully during the webinar.

Q: I have the 600-second “Long semaphore wait” assertion failure / crashing issue following DDL queries, sometimes on the master, sometimes just on the slaves. Any hints for troubleshooting these? How can I understand what the semaphore-holding threads are doing?

A: These are the hardest errors to troubleshoot, especially because in some cases (like long-running CHECK TABLE commands) long semaphore waits can be expected and appropriate behavior. If you see long semaphore waits when performing DDL operations, it makes sense to consider using the pt-online-schema-change or gh-ost utilities. Also, check the list of supported online DDL operations in the MySQL User Reference Manual.
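For example, a direct blocking ALTER can often be replaced with a non-blocking online change (a command sketch; the database, table and column names are hypothetical):

pt-online-schema-change --alter "ADD COLUMN notes VARCHAR(255)" D=db1,t=orders --execute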

But if you want to know how to analyze such messages, let’s check the output from page #17 in the slide deck used in the webinar:

2018-01-19T20:38:43.381127Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139970010412800 has waited at ibuf0ibuf.cc line 3454 for 321.00 seconds the semaphore:
S-lock on RW-latch at 0x7f4dde2ea310 created in file buf0buf.cc line 1453
a writer (thread id 139965530261248) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: fffffffff0000000
Last time read locked in file ibuf0ibuf.cc line 3454
Last time write locked in file /mnt/workspace/percona-server-5.7-binaries-release/label_exp/
debian-wheezy-x64/percona-server-5.7.14-8/storage/innobase/btr/btr0btr.cc line 177
2018-01-19T20:38:43.381143Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139965135804160 has waited at buf0buf.cc line 4196 for 321.00 seconds the semaphore:
S-lock on RW-latch at 0x7f4f257d33c0 created in file hash0hash.cc line 353
a writer (thread id 139965345621760) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0buf.cc line 4196
Last time write locked in file ...

The line

--Thread 139970010412800 has waited at ibuf0ibuf.cc line 3454 for 321.00 seconds the semaphore:

shows that some transaction was waiting for a semaphore. The code responsible for this wait is located on line 3454 in the file ibuf0ibuf.cc. I received this crash when I ran Percona Server for MySQL version 5.7.14-8. Therefore, to check what this code is doing, I need to use the Percona Server 5.7.14-8 source code:

sveta@Thinkie:~/mysql_packages/percona-server-5.7.14-8$ vim storage/innobase/ibuf/ibuf0ibuf.cc
...
3454 btr_pcur_open(ibuf->index, ibuf_entry, PAGE_CUR_LE, mode, &pcur, &mtr);
...

A few lines above in the same file contain function definition and comment:

3334 /** Buffer an operation in the insert/delete buffer, instead of doing it
3335 directly to the disk page, if this is possible.
3336 @param[in] mode BTR_MODIFY_PREV or BTR_MODIFY_TREE
3337 @param[in] op operation type
3338 @param[in] no_counter TRUE=use 5.0.3 format; FALSE=allow delete
3339 buffering
3340 @param[in] entry index entry to insert
3341 @param[in] entry_size rec_get_converted_size(index, entry)
3342 @param[in,out] index index where to insert; must not be unique
3343 or clustered
3344 @param[in] page_id page id where to insert
3345 @param[in] page_size page size
3346 @param[in,out] thr query thread
3347 @return DB_SUCCESS, DB_STRONG_FAIL or other error */
3348 static MY_ATTRIBUTE((warn_unused_result))
3349 dberr_t
3350 ibuf_insert_low(
3351 ulint mode,
3352 ibuf_op_t op,
3353 ibool no_counter,
3354 const dtuple_t* entry,
3355 ulint entry_size,
3356 dict_index_t* index,
3357 const page_id_t& page_id,
3358 const page_size_t& page_size,
3359 que_thr_t* thr)
3360 {
...

The first line of the comment gives us the idea that InnoDB tries to insert data into the change buffer.

Now, let’s check the next line from the error log file:

S-lock on RW-latch at 0x7f4dde2ea310 created in file buf0buf.cc line 1453
sveta@Thinkie:~/mysql_packages/percona-server-5.7.14-8$ vim storage/innobase/buf/buf0buf.cc
...
1446 /* If PFS_SKIP_BUFFER_MUTEX_RWLOCK is defined, skip registration
1447 of buffer block rwlock with performance schema.
1448
1449 If PFS_GROUP_BUFFER_SYNC is defined, skip the registration
1450 since buffer block rwlock will be registered later in
1451 pfs_register_buffer_block(). */
1452
1453 rw_lock_create(PFS_NOT_INSTRUMENTED, &block->lock, SYNC_LEVEL_VARYING);
...

And again let’s check what this function is doing:

1402 /********************************************************************//**
1403 Initializes a buffer control block when the buf_pool is created. */
1404 static
1405 void
1406 buf_block_init(

Even without knowing how InnoDB works internally, just by reading these comments I can guess that a thread waits for some global InnoDB lock when it tries to insert data into the change buffer. The solution for this issue could be disabling the change buffer, limiting write concurrency, upgrading, or using a software solution that allows you to scale writes.
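If you choose the change buffer route, it can be disabled dynamically (a sketch; weigh the possible insert-performance cost first):

SET GLOBAL innodb_change_buffering = 'none';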

Q: For the page cleaner messages, when running app using replication we didn’t get them. After switching to PXC we started getting them. Something we should look at particular to PXC to help resolve this?

A: Page cleaner messages can be a symptom of starved IO activity. You need to compare the Percona XtraDB Cluster (PXC) and standalone server installations and check how exactly the write load increased.

Q: Hi, I have one question. We have a query joining on BLOB or TEXT fields that is causing system locks and high CPU alerts. Can you please suggest how we can make it work? Can you please send the answer in text? I missed some information.

A: If you are joining on BLOB or TEXT fields, you most likely don’t use indexes. This means that InnoDB has to perform a full table scan. That increases IO and CPU activity by itself, but also increases the number of locks that InnoDB has to set to resolve the query. Even if you have partial indexes on the BLOB and TEXT columns, mysqld has to compare the full values for the equality check, so it cannot use the index alone to resolve the ON clause. It is a best practice to avoid such kinds of JOINs; you can use surrogate integer keys, for example.
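One common pattern (a sketch; the table and column names are hypothetical) is to join on an indexed hash of the long value instead of the value itself:

ALTER TABLE t1
ADD COLUMN text_hash BINARY(16) GENERATED ALWAYS AS (UNHEX(MD5(long_text))) STORED,
ADD INDEX idx_text_hash (text_hash);
-- then join on t1.text_hash = t2.text_hash instead of comparing the TEXT columns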

Q: Hi, please notice that “MySQL server has gone away” is the worst one, in my opinion, and there was no mention of it. Can you share some tips on this? Thank you. By the way, neither the Oracle MySQL nor the Percona error log helps with that.

A: The “MySQL server has gone away” error may be the result of a crash. In that case, you need to handle it like any other crash symptom. But in most cases, this is a symptom of network failure. Unfortunately, MySQL doesn’t have much information about why connection failures happen, probably because, from mysqld’s point of view, a problematic network only means that the client disconnected unexpectedly after a timeout, and the client still waiting for a response receives “MySQL server has gone away”. I discussed these kinds of errors in my “Troubleshooting hardware resource usage” webinar. When you see this kind of error often, a good practice is to not leave idle connections open for a long time.
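A quick way to check the relevant server-side limits (a sketch; make sure your application pool closes or pings idle connections before these limits expire):

SHOW GLOBAL VARIABLES WHERE Variable_name IN ('wait_timeout', 'interactive_timeout', 'max_allowed_packet');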

Q: I see that a lot of work goes into hard investigation of the possibilities of what is going wrong. Is there a plan on the development roadmap to improve error log output messages? Can you comment on that?

A: Percona Engineering does a lot for better diagnostics. For example, Percona Server for MySQL has an extended slow log file format, and Percona Server for MySQL 5.7.20 introduced a new innodb_print_lock_wait_timeout_info variable that allows logging information about all InnoDB lock wait timeout errors (manual). More importantly, it logs not only the blocked transaction, but also the locking transaction. This feature was requested at lp:1657737 by one of our Percona Support customers and is now implemented.

Oracle’s MySQL Engineering team also does a lot for better error logging. These improvements started in version 5.7.2, when the log_error_verbosity variable was introduced. Version 8.0.4 added much better tuning control. You can read about it in the Release Notes.

Q: Hello, do you use strace to find exactly which table has problems when there is no clear information in the MySQL error log?

A: I am not a big fan of strace when debugging mysqld crashes, but Percona Support certainly uses this tool. I myself prefer to work with strace when debugging client issues, such as trying to identify why Percona XtraBackup behaves incorrectly.

Thanks everybody for attending the webinar. You can find the slides and recording of the webinar at the Troubleshooting MySQL Crashes web page.
