Sep 23, 2024
--

Understanding MySQL Indexes: Types, Benefits, and Best Practices

When it comes to MySQL databases, performance is everything. As more activities move online and data volumes grow exponentially, ensuring efficient data retrieval and query execution becomes crucial. Database indexing plays a significant role in this by providing powerful tools to optimize operations in MySQL. Without an index, MySQL must perform a full table scan, […]

Jul 17, 2024
--

Ultimate Guide to Improving MySQL Query Performance

MySQL is certainly a powerful open source database management system, but even the most robust engine struggles when queries take an eternity to execute. For DBAs and developers, improving MySQL query performance is an ongoing goal. Efficient query performance is crucial for ensuring the smooth operation and optimal user experience of applications powered by MySQL […]

Dec 29, 2021
--

Q & A on Webinar “MySQL Performance for DevOps”


First, I want to thank everyone who attended my November 16, 2021 webinar “MySQL Performance for DevOps”. The recording and slides are available on the webinar page.

Here are answers to the questions from participants which I was not able to provide during the webinar.

Q: Hi! We have trouble with DELETE queries. We have to remove some data periodically (hourly or daily), and we have short-term server stalls during these DELETEs. The server is running on modern NVMe drives, so we wonder why we have this situation. The DELETEs are not that large, around 10,000–15,000 records, but the tables on which the DELETEs are performed are updated frequently.

A: I would test if a similar DELETE statement is slow when you run it on a development server in an isolated environment, while no other session is connected to the MySQL server instance. If it is slow in this case too, check if MySQL uses indexes to resolve the WHERE condition of the DELETE statement. You can use the EXPLAIN statement for the DELETE, or convert the DELETE into a similar SELECT query and experiment.
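For instance, checking the plan for such a DELETE and its SELECT equivalent could look like this (a minimal sketch; the events table and the retention condition are hypothetical names for illustration):

EXPLAIN DELETE FROM events WHERE created_at < NOW() - INTERVAL 1 DAY;
-- Equivalent SELECT for experimenting with the same WHERE condition:
EXPLAIN SELECT COUNT(*) FROM events WHERE created_at < NOW() - INTERVAL 1 DAY;

If neither plan shows a usable key, an index on created_at would be the first thing to try.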

If the DELETE statement runs fast when called in the isolated environment, check how parallel sessions affect its performance. If the tables you are deleting from are updated frequently, DELETE statements could cause, and be affected by, locking conflicts. To resolve this situation, study how MySQL works with locks; a great presentation about InnoDB locks, “InnoDB Locking Explained with Stick Figures”, can be found at https://www.slideshare.net/billkarwin/innodb-locking-explained-with-stick-figures. Then you need to optimize the DELETE and UPDATE statements so they finish faster. Alternatively, you can separate them in time, so they have less effect on each other. You may also split the DELETE statements so they remove fewer records at a time, as sketched below.
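A minimal sketch of such batching, assuming the same hypothetical events table (rerun the statement until it affects zero rows):

-- Each small batch holds its locks only briefly:
DELETE FROM events
 WHERE created_at < NOW() - INTERVAL 1 DAY
 LIMIT 1000;
-- ROW_COUNT() = 0 means all matching rows are gone:
SELECT ROW_COUNT();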

Q: Question 2. We have innodb_buffer_pool_size set to around 260GB on a dedicated server with about 320GB of total RAM. Still, we have 99.9% of memory full, and there are no other large memory consumers, only MySQL (Percona 8.0.23). The server starts, and within around 3 hours it takes all available memory, regardless of the innodb_buffer_pool_size setting. We never had anything like this with 5.7. Do you have any ideas?

A: MySQL uses memory not only for the InnoDB buffer pool but also for other data, such as session-based and operation-based buffers. For example, if you have 100 connections that use underlying temporary tables to resolve queries, and you set the size of the internal temporary table to 100MB, you will use around 10G of additional memory for these tables. Query the memory digest tables in Performance Schema, and the views on those tables in the sys schema, to find the operations that allocate memory in your MySQL server.
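For example, these standard sys schema views show the top allocators (the LIMIT is arbitrary):

SELECT * FROM sys.memory_global_by_current_bytes LIMIT 10;
SELECT * FROM sys.memory_by_thread_by_current_bytes LIMIT 10;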

Q: Can we get a copy of this presentation?

A: You should have received a copy of the slides. If you did not, they are attached to this blog post: DevOps_Perf_202111

Q: buffer_pool_size should be what percentage of the host RAM?

A: The percentage of the host RAM is a very rough estimation of the ideal amount of memory to allocate for the InnoDB buffer pool. For example, the MySQL user manual in the past recommended an InnoDB buffer pool size of up to 80% of the available RAM. But 80% of RAM is very different if the host has, say, 8G or 1024G. In the former case, 80% is 6.4G, and the host is left with 1.6G for other MySQL buffers and the operating system, which may not be enough. In the latter case, 80% is 819.2G, and the host is left with 204.8G for other needs; depending on your workload, that could be a huge waste of resources. I recommend reading this blog post: https://www.percona.com/blog/2015/06/02/80-ram-tune-innodb_buffer_pool_size/, following the links at the end, and then choosing a size appropriate for your data set and workload.

Q: How can we fit RAM size to data size?

For example: if I have 1G of data, how much RAM do I need to get 100 QPS, and if I have 100G of data, how much RAM do I need to get 100 QPS?

A: RAM size, dataset size, and the number of queries per second that your server can handle are not directly related. You need to test your queries and observe how they are executed. For example, if you select everything from an InnoDB table holding either 1G or 100G of data, and you do not access any other table on the server, the very first run will be slower than the following ones, because InnoDB has to read the data into the buffer pool. After that, performance and the number of queries per second will be limited only by the network speed and bandwidth between your client and server, provided you can allocate about 100G for your buffer pool. The cached size will stay almost the same as the table size no matter how many connections you have; your MySQL server will only use a small amount of memory for new connection buffers.

In another case, however, you may have a comparatively small table that you access with a quite complicated query. For example, if you try to repeat the test case for the still-valid https://bugs.mysql.com/bug.php?id=29423, a single query on a 184M table will run for a much longer time than you expect. In this case, the number of queries per second will also be very low.

Q: Do you have a recommendation parameter list for MySQL RDS on AWS?

A: It is the same as for a dedicated MySQL server, but you may not be able to change some of the options.

Q: If you know you have SSDs, but ROTA = 1, what has to be configured to make use of the SSDs?

A: For SSDs, ROTA should be 0. If you are sure you have SSDs but they are shown as rotational disks, this means that your storage is configured incorrectly. Depending on the configuration, you may still get the same performance as if the disks were recognized properly. If this is not the case, check your storage, RAID controller, and system configuration.

MySQL just sends system commands for reading, writing, and syncing data. It does not care if the disk is rotational or not; for MySQL performance, the value of ROTA does not really matter.

Q: If you believed you tuned both master and slave for the best performance, but seconds behind master continues to increase, you decide to split the shard, but xtrabackup fails with log wrap.  But even if you were to get a good backup, once it is online, the slave will never catch up.  The Kobayashi Maru, a no win situation – have you been there?  What did you do?

A: First, make sure you have configured a multi-threaded replica. If you use the parallel type LOGICAL_CLOCK, study the option binlog_transaction_dependency_tracking, and in particular how it behaves in practice when set to WRITESET or WRITESET_SESSION. To avoid log wrap during the backup, increase the redo log file size. If you can stop the source server, stop it and set up the replica by copying the datadir: it is faster than using XtraBackup, because you do not need to copy the changes written to the redo log files while the backup is running.
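A sketch of the relevant settings (a my.cnf fragment; the worker count of 8 is an illustrative value, not a recommendation from the webinar):

# On the source:
binlog_transaction_dependency_tracking = WRITESET
# On the replica:
slave_parallel_type    = LOGICAL_CLOCK
slave_parallel_workers = 8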

Q: In MySQL 5.7, the tmp tablespace is now InnoDB, how can you tune tmp to take advantage of RAM and not use disk?

A: The tablespace file on disk is used only when the in-memory table is converted into a disk-based table. Otherwise, temporary tables continue using memory.

Q: What are the top 6 variables to get the best performance? How can you verify how effective their settings are by looking at the global status, and how can you know when those variables can be increased to get the best utilization of CPUs/RAM/disk/network?

A: While I showed variables that can improve performance in most cases on my “Conclusion” slides, I recommend starting from the issue you are trying to solve, and adjusting variables only when you understand what you are doing.

The effectiveness of some of these variables can be measured. For example, if the number of free buffers in the output of SHOW ENGINE INNODB STATUS is small, and the buffer pool hit rate shows that the number of disk accesses is consistently greater than the number of buffer pool hits, it indicates that the buffer pool size may be too small for your workload and data.
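The same signal is visible in the global status counters; Innodb_buffer_pool_reads counts requests that had to go to disk:

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- Innodb_buffer_pool_reads growing fast relative to
-- Innodb_buffer_pool_read_requests suggests an undersized buffer pool.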

Regarding CPU, if the number of active threads is high, and you see a performance drop when concurrency increases while the operating system shows low CPU usage, it may be a symptom that either:

– you have limited the upper number of active engine threads, or

– the disk does not support so many parallel operations, and active threads are waiting for IO.

Another CPU performance issue can happen if the upper limit on the number of active engine threads is not set, or is too high, and threads spend time doing nothing while waiting in the priority queue.

The only option that directly limits IO activity is innodb_io_capacity, which limits the speed of background InnoDB operations. If set too low, InnoDB may underuse your fast disk; if set too high, InnoDB may start writing too fast, so each write request wastes time waiting in its queue.

Q: What was the last InnoDB setting, the one which should be set up to the number of CPU cores?

A: This is innodb_thread_concurrency, which limits the number of InnoDB threads that can run in parallel. You should set it either to 0 or to the number of CPU cores.

Q: Which is more secure and faster: Community MySQL, Percona Server for MySQL, or AWS RDS?

A: Percona Server for MySQL has performance and diagnostic improvements, as well as Enterprise-level features, available as open source. AWS RDS supports hardware scaling on demand and physical replication that uses InnoDB redo log files instead of binary logs; however, it does not give you the same control over the server as your own physical instance. Community MySQL works on a larger number of platforms, and thus uses function calls that work on all of them, where Percona Server or AWS RDS may use optimized variants. So each of them has its own advantages and disadvantages.

Q: In the case where open tables >>> open_files (and you cannot change open_files), how should you set table_open_cache? “As big as possible”?

A: The status variable Open_files is “the number of files that are open. This count includes regular files opened by the server. It does not include other types of files such as sockets or pipes. Also, the count does not include files that storage engines open using their own internal functions rather than asking the server level to do so.” (https://dev.mysql.com/doc/refman/8.0/en/server-status-variables.html#statvar_Open_files) The status variable Open_tables is “the number of tables that are open”. They are not related to each other. You need to watch that Opened_tables (“the number of tables that have been opened”) is not continuously growing much faster than Open_tables, which would mean tables are being evicted from the table cache.

There is an operating system option, “open files”, visible if you run the command ulimit -n. This option should be greater than the maximum number of files that your MySQL instance can have open simultaneously. Speaking about Open_tables: you cannot have this value larger than the operating system “open files” limit, unless your tables are stored in a shared or general tablespace.
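A quick way to inspect these counters and limits from the client:

SHOW GLOBAL STATUS LIKE 'Open%';               -- Open_files, Open_tables, Opened_tables
SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
SHOW GLOBAL VARIABLES LIKE 'open_files_limit';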

Q: How to tell if we should tune join_buffer_size? wait events anywhere?

A: Tune it if you use JOIN queries that do not use indexes and they perform slowly because of this. Start with regular query tuning, using the slow query log, Performance Schema, and Query Analyzer in PMM, to find queries that require optimization. In Query Analyzer, add a “Full Join” column to your query list. In the Performance Schema, search for statements where the value of SELECT_FULL_JOIN is greater than 0 in the events_statements_* tables.
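For example, at the digest level (a standard Performance Schema summary table):

SELECT DIGEST_TEXT, COUNT_STAR, SUM_SELECT_FULL_JOIN
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_SELECT_FULL_JOIN > 0
 ORDER BY SUM_SELECT_FULL_JOIN DESC
 LIMIT 10;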

Check also my “Introduction to MySQL Query Tuning for Dev[Op]s” webinar.

Q: How to measure memory consumption of table_open_cache? 15K/table? FRM-related? some way to estimate?

A: This is the event “memory/sql/TABLE_SHARE::mem_root”. Check also this blog post.
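The current allocation for that event can be read directly from the Performance Schema memory summary:

SELECT EVENT_NAME, CURRENT_NUMBER_OF_BYTES_USED
  FROM performance_schema.memory_summary_global_by_event_name
 WHERE EVENT_NAME = 'memory/sql/TABLE_SHARE::mem_root';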

Q: Hello guys!

Do we need to prepare different optimizations depending on the MySQL engine, e.g. XtraDB vs. InnoDB? If yes, could you please explain the differences?

Best regards,

Oleg Stelmach

A: XtraDB is an enhanced version of InnoDB in Percona Server: https://www.percona.com/doc/percona-server/8.0/percona_xtradb.html. So the differences are the added features in Percona Server, namely the options that exist in Percona Server and do not exist in upstream Community MySQL.

Q: Regarding threads: is it better to use hyperthreading/multithreading for a MySQL instance, or do we need to turn this function off?

Best regards,

Oleg Stelmach

A: You do not have to turn this option off, but you may see that MySQL performance is not linearly predictable under highly concurrent workloads. I recommend checking this blog post with hyperthreading benchmarks on MySQL, and the comments on it, for a better understanding of how hyperthreading can affect MySQL performance.

Q: Besides setting OS swappiness correctly, would you also recommend enabling memlock in my.cnf?

A: Normally you do not need it.

Aug 01, 2018
--

Saving With MyRocks in The Cloud

The main focus of a previous blog post was the performance of MyRocks when using fast SSD devices. However, I figured that MyRocks would also be beneficial for cloud workloads, where storage is either slow or expensive.

In that earlier post, we demonstrated the benefits of MyRocks, especially for heavy IO workloads. Meanwhile, Mark wrote in his blog that the CPU overhead in MyRocks might be significant for CPU-bound workloads, but that this should not be an issue for IO-bound workloads.

In the cloud, the cost of resources is a major consideration. Let’s review the annual cost of the processing and storage resources.

Resource                  Resource cost/year, $   IO cost, $/year   Total, $/year
c5.9xlarge                7881                                      7881
1TB io1 5000 IOPS         1500                    3900              5400
1TB io1 10000 IOPS        1500                    7800              9300
1TB io1 15000 IOPS        1500                    11700             13200
1TB io1 20000 IOPS        1500                    15600             17100
1TB io1 30000 IOPS        1500                    23400             24900
3.4TB GP2 (10000 IOPS)    4800                                      4800

The scenario

The server version is Percona Server 5.7.22

I used c5.9xlarge instances. The reason for choosing c5 is that it provides high-performance Nitro virtualization: Brendan Gregg describes this in his blog post. The rationale for the 9xlarge size was to be able to utilize io1 volumes with 30000 IOPS throughput – smaller instances cap io1 throughput at a lower level.

I also used huge gp2 volumes: 3400GB, as a volume of this size provides a guaranteed 10000 IOPS even without using io1. This is a cheaper alternative to io1 volumes for achieving 10000 IOPS.

For the workload, I used sysbench-tpcc 5000W (50 tables * 100W), which for InnoDB resulted in about 471GB of used storage space.

For the cache, I used 27GB and 54GB buffer pool sizes, so the workload is IO-heavy.

I wanted to compare how InnoDB and RocksDB performed under this scenario.

If you are curious I prepared my terraform+ansible deployment files here: https://github.com/vadimtk/terraform-ansible-percona

Before jumping to the results, I should note that for MyRocks I used LZ4 compression for all levels, which brought the final size to 91GB. That is five times less than the InnoDB size. This alone provides operational benefits: for example, copying the InnoDB files (471GB) from a backup volume takes longer than an hour, while for MyRocks it is about five times faster.
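For reference, per-level compression in MyRocks is driven by the column-family options; a my.cnf fragment along these lines would apply LZ4 everywhere (the exact option string used in this benchmark is not shown in the post, so treat this as an assumption):

[mysqld]
rocksdb_default_cf_options = compression=kLZ4Compression;bottommost_compression=kLZ4Compression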

The benchmark results

So let’s review the results.

[Chart: InnoDB versus MyRocks throughput in the cloud]

Or presenting average throughput in a tabular form:

cache size (GB)  IOPS  engine  avg TPS
27 5000 innodb 132.66
27 5000 rocksdb 481.03
27 10000 innodb 285.93
27 10000 rocksdb 1224.14
27 10000gp2 innodb 227.19
27 10000gp2 rocksdb 1268.89
27 15000 innodb 436.04
27 15000 rocksdb 1839.66
27 20000 innodb 584.69
27 20000 rocksdb 2336.94
27 30000 innodb 753.86
27 30000 rocksdb 2508.97
54 5000 innodb 197.51
54 5000 rocksdb 667.63
54 10000 innodb 433.99
54 10000 rocksdb 1600.01
54 10000gp2 innodb 326.12
54 10000gp2 rocksdb 1559.98
54 15000 innodb 661.34
54 15000 rocksdb 2176.83
54 20000 innodb 888.74
54 20000 rocksdb 2506.22
54 30000 innodb 1097.31
54 30000 rocksdb 2690.91

 

We can see that MyRocks outperformed InnoDB in every single combination, but it is also important to note the following:

MyRocks on io1 5000 IOPS showed the performance that InnoDB showed in io1 15000 IOPS.

That means that InnoDB requires three times more storage throughput. Looking at the storage cost, that corresponds to three times more expensive storage. And given that MyRocks requires less storage, it is possible to save even more on storage capacity.

On the most economical storage (3400GB gp2, which will provide 10000 IOPS) MyRocks showed 4.7 times better throughput.

For the 30000 IOPS storage, MyRocks was still better by 2.45 times.

However, it is worth noting that MyRocks showed a greater variance in throughput during the runs. Let’s review the charts with 1-second resolution for GP2 and io1 30000 IOPS storage:

[Chart: throughput at 1-second resolution for GP2 and io1 30000 IOPS storage, MyRocks versus InnoDB]

Such variance might be problematic for workloads that require stable throughput and where periodical slowdowns are unacceptable.

Conclusion

MyRocks is suitable and beneficial not only for fast SSDs, but also for cloud deployments. By requiring fewer IOPS, MyRocks can provide better performance and save on storage costs.

However, before evaluating MyRocks, make sure that your workload is IO-bound, i.e. the working set is much bigger than the available memory. For CPU-intensive workloads (where the working set fits into memory), MyRocks will be less beneficial, or will even perform worse than InnoDB (as described in the blog post A Look at MyRocks Performance).

 

 

 


Jul 13, 2018
--

On MySQL and Intel Optane performance

MySQL 8 versus Percona Server with heavy IO application on Intel Optane

Recently, Dimitri published the results of measuring MySQL 8.0 on an Intel Optane storage device. In this blog post, I want to look at this in more detail and explore the performance of MySQL 8, MySQL 5.7, and Percona Server for MySQL using a similar setup. The Intel Optane is a very capable device, so I was puzzled that Dimitri chose MySQL options that are either not safe or not recommended for production workloads.

Since we have an Intel Optane in our labs, I wanted to run a similar benchmark, but using settings that we would recommend our customers to use, namely:

  • use innodb_checksum
  • use innodb_doublewrite
  • use binary logs with sync_binlog=1
  • enable (by default) Performance Schema

I still used charset=latin1 (even though the default is utf8mb4 in MySQL 8), and I set the total size of the InnoDB log files to 30GB (as in Dimitri’s benchmark). This setting allocates big InnoDB log files to ensure there is no pressure from adaptive flushing. Though I have concerns about how this works in MySQL 8, that is a topic for other research.
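Putting the settings above together, the my.cnf fragment would look roughly like this (the split of the 30GB redo log into 15 files of 2GB is my assumption; only the total is stated in the post):

[mysqld]
log_bin                   = mysql-bin
sync_binlog               = 1
innodb_doublewrite        = 1
innodb_checksum_algorithm = crc32
performance_schema        = ON
character_set_server      = latin1
innodb_log_file_size      = 2G     # assumption: 15 x 2G = 30G total
innodb_log_files_in_group = 15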

So let’s see how MySQL 8.0 performed with these settings, and compare it with MySQL 5.7 and Percona Server for MySQL 5.7.

I used an Intel Optane SSD 905P 960GB device on the server with 2 socket Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz CPUs.

To highlight the performance difference I wanted to show, I used a single case: sysbench 8 tables 50M rows each (which is about ~120GB of data) and buffer pool 32GB. I ran sysbench oltp_read_write in 128 threads.

First, let’s review the results for MySQL 8 vs MySQL 5.7

After achieving a steady state, we can see that MySQL 8 does not show ANY performance improvement over MySQL 5.7.

Let’s compare this with Percona Server for MySQL 5.7

[Chart: MySQL 8 versus Percona Server for MySQL with a heavy IO application on Intel Optane]

Percona Server for MySQL 5.7 shows about 60% performance improvement over both MySQL 5.7 and MySQL 8.

How did we achieve this? All our improvements are described here: https://www.percona.com/doc/percona-server/LATEST/performance/xtradb_performance_improvements_for_io-bound_highly-concurrent_workloads.html. In short:

  1. Parallel doublewrite.  In both MySQL 5.7 and MySQL 8 writes are serialized by writing to doublewrite.
  2. Multi-threaded LRU flusher. We reported and proposed a solution here: https://bugs.mysql.com/bug.php?id=70500. However, Oracle has not incorporated the solution upstream.
  3. Single page eviction. This is another problematic area in MySQL’s flushing algorithm. The bug https://bugs.mysql.com/bug.php?id=81376 was reported over 2 years ago, but unfortunately it’s still overlooked.

Summarizing performance findings:

  • For Percona Server for MySQL during this workload, I observed 1.4 GB/sec reads and 815 MB/sec writes
  • For MySQL 5.7 and MySQL 8, the numbers are 824 MB/sec reads and 530 MB/sec writes.

My opinion is that Oracle focused on addressing the wrong performance problems in MySQL 8 and did not address the real issues. In this benchmark, using real production settings, MySQL 8 does not show any significant performance benefit over MySQL 5.7 for workloads characterized by heavy IO writes.

With this, I should admit that Intel Optane is a very performant storage device. By comparison, on an Intel 3600 SSD under the same workload, Percona Server can achieve only 2000 tps, which is 2.5 times slower than with Intel Optane.

Drawing some conclusions

So there are a few outcomes I can highlight:

  • Intel Optane is a very capable drive, it is easily the fastest of those we’ve tested so far
  • MySQL 8 is not able to utilize all the power of Intel Optane, unless you use unsafe settings (which to me is the equivalent of driving 200 MPH on a highway without working brakes)
  • Oracle has focused on addressing the wrong IO bottlenecks and has overlooked the real ones
  • To get all the benefits of Intel Optane performance, use a proper server—Percona Server for MySQL—which is able to utilize more IOPS from the device.


Jun 19, 2018
--

Chunk Change: InnoDB Buffer Pool Resizing


Since MySQL 5.7.5, we have been able to dynamically resize the InnoDB Buffer Pool. This feature also introduced a new variable — innodb_buffer_pool_chunk_size — which defines the chunk size by which the buffer pool is enlarged or reduced. This variable is not dynamic, and if it is incorrectly configured it can lead to undesired situations.

Let’s first see how innodb_buffer_pool_size, innodb_buffer_pool_instances, and innodb_buffer_pool_chunk_size interact:

The buffer pool can hold several instances, and each instance is divided into chunks. There are two limits we need to take into account: the number of instances can go from 1 to 64, and the total number of chunks should not exceed 1000.

So, for a server with 3GB RAM, a buffer pool of 2GB with 8 instances and chunks at the default value (128MB) is going to give us 2 chunks per instance:

This means that there will be 16 chunks.
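You can check the arithmetic directly on a running server:

mysql> SELECT @@innodb_buffer_pool_size / @@innodb_buffer_pool_instances / @@innodb_buffer_pool_chunk_size AS chunks_per_instance;
-- 2147483648 / 8 / 134217728 = 2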

I’m not going to explain the benefits of having multiple instances, I will focus on resizing operations. Why would you want to resize the buffer pool? Well, there are several reasons, such as:

  • on a virtual server you can add more memory dynamically
  • for a physical server, you might want to reduce database memory usage to make way for other processes
  • on systems where the database size is smaller than available RAM
  • if you expect a huge growth and want to increase the buffer pool on demand

Reducing the buffer pool

Let’s start reducing the buffer pool:

mysql> show global variables like 'innodb_buffer_pool_%';
| innodb_buffer_pool_size | 2147483648 |
| innodb_buffer_pool_instances | 8     |
| innodb_buffer_pool_chunk_size | 134217728 |
mysql> set global innodb_buffer_pool_size=1073741824;
Query OK, 0 rows affected (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 1073741824 |
+-------------------------+------------+
1 row in set (0.00 sec)

If we try to decrease it to 1.5GB, the buffer pool size will not change, and a warning will be shown:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+---------------------------------------------------------------------------------+
| Level   | Code | Message                                                                         |
+---------+------+---------------------------------------------------------------------------------+
| Warning | 1210 | InnoDB: Cannot resize buffer pool to lesser than chunk size of 134217728 bytes. |
+---------+------+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 2147483648 |
+-------------------------+------------+
1 row in set (0.01 sec)

Increasing the buffer pool

When we try to increase the value from 1GB to 1.5GB, the buffer pool is resized, but the requested innodb_buffer_pool_size is considered incorrect and is truncated:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+-----------------------------------------------------------------+
| Level   | Code | Message                                                         |
+---------+------+-----------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect innodb_buffer_pool_size value: '1610612736' |
+---------+------+-----------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 2147483648 |
+-------------------------+------------+
1 row in set (0.01 sec)

And the final size is 2GB. Yes! You intended to set the value to 1.5GB and you succeeded in setting it to 2GB. Even if you set it just 1 byte higher, to 1073741825, you will end up with a buffer pool of 2GB.

mysql> set global innodb_buffer_pool_size=1073741825;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_%size' ;
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 2147483648 |
+-------------------------------+------------+
2 rows in set (0.01 sec)

Interesting scenarios

Increasing size in the config file

Let’s suppose that one day you decide to change or tune some variables on your server, and since you have free memory you decide to increase the buffer pool. In this example, we are going to use a server with innodb_buffer_pool_instances = 16 and 2GB of buffer pool size, which will be increased to 2.5GB.

So, we set in the configuration file:

innodb_buffer_pool_size = 2684354560

But then after restart, we found:

mysql> show global variables like 'innodb_buffer_pool_%size' ;
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 4294967296 |
+-------------------------------+------------+
2 rows in set (0.00 sec)

And the error log says:

2018-05-02T21:52:43.568054Z 0 [Note] InnoDB: Initializing buffer pool, total size = 4G, instances = 16, chunk size = 128M

So, after we set innodb_buffer_pool_size in the config file to 2.5GB, the database gives us a 4GB buffer pool, because of the number of instances and the chunk size. What the message doesn’t tell us is the number of chunks, which would be useful for understanding why there is such a huge difference.

Let’s take a look at how that’s calculated.

Increasing instances and chunk size

Changing the number of instances or the chunk size will require a restart and will take into consideration the buffer pool size as an upper limit to set the chunk size. For instance, with this configuration:

innodb_buffer_pool_size = 2147483648
innodb_buffer_pool_instances = 32
innodb_buffer_pool_chunk_size = 134217728

We get this chunk size:

mysql> show global variables like 'innodb_buffer_pool_%size' ;
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 67108864   |
| innodb_buffer_pool_size       | 2147483648 |
+-------------------------------+------------+
2 rows in set (0.00 sec)

However, we need to understand how this really works. To get the innodb_buffer_pool_chunk_size, it makes this calculation: innodb_buffer_pool_size / innodb_buffer_pool_instances, with the result rounded down to a multiple of 1MB.

In our example, the calculation is 2147483648 / 32 = 67108864, and 67108864 % 1048576 = 0, so no rounding is needed. The number of chunks will be one chunk per instance.

When does it decide that it needs to use more chunks per instance? When the difference between the requested size and the size computed so far is greater than or equal to 1MB.

That is why, for instance, if you try to set the innodb_buffer_pool_size equal to 1GB + 1MB – 1B you will get 1GB of buffer pool:

innodb_buffer_pool_size = 1074790399
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632
2018-05-07T09:26:43.328313Z 0 [Note] InnoDB: Initializing buffer pool, total size = 1G, instances = 16, chunk size = 64M

But if you set innodb_buffer_pool_size equal to 1GB + 1MB, you will get a 2GB buffer pool:

innodb_buffer_pool_size = 1074790400
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632
2018-05-07T09:25:48.204032Z 0 [Note] InnoDB: Initializing buffer pool, total size = 2G, instances = 16, chunk size = 64M

This is because it considers that two chunks will fit. We can say that this is how the InnoDB Buffer pool size is calculated:

# Runnable Python rendering of the calculation (the original pseudocode
# compared in the wrong direction; the condition below matches the
# 1GB + 1MB example above):
MB = 1024 * 1024

def innodb_effective_buffer_pool_size(pool_size, instances, chunk_size):
    # determine_best_chunk_size: if the configured chunk does not fit at
    # least once per instance, shrink it to pool_size / instances,
    # rounded down to a multiple of 1MB.
    if pool_size / instances < chunk_size:
        chunk_size = (pool_size // instances) // MB * MB
    # determine_amount_of_chunks: one chunk per instance to start with,
    # plus one more if the configured size exceeds that total by 1MB or more.
    chunks_per_instance = pool_size // instances // chunk_size
    if pool_size - chunks_per_instance * instances * chunk_size >= MB:
        chunks_per_instance += 1
    return instances * chunk_size * chunks_per_instance

What is the best setting?

In order to analyze the best setting, you need to know that there is an upper limit of 1000 chunks. In our example with 16 instances, we can have no more than 62 chunks per instance.

Another thing to consider is what each chunk represents in percentage terms. Continuing with the example, each chunk per instance represents 1.61% of the total, which means that we can increase or decrease the complete buffer pool size only in multiples of this percentage.

From a management point of view, I think you might want to consider at least a range of 2% to 5% to increase or decrease the buffer. I performed some tests to see the impact of having small chunks and found no issues, but this is something that needs to be thoroughly tested.


Jun 15, 2016
--

Troubleshooting hardware resource usage webinar: Q & A


In this blog, I provide answers to the Q & A for the Troubleshooting hardware resource usage webinar.

First, I want to thank everybody who attended the May 26 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: How did you find the memory IO LEAK?

A: Do you mean the replication bug I was talking about in the webinar? I wrote about this bug here. See also comments in the bug report itself.

Q: Do you have common formulas you use to tune MySQL?

A: There are best practices: relate thread concurrency to the number of CPU cores you have, set the InnoDB buffer pool size large enough that it can contain your whole working dataset (which is not always possible), and do not set the Query Cache size larger than 512MB (or, even better, turn it off) to avoid issues with the global lock that is taken when it needs to be de-fragmented. I prefer not to call them “formulas,” because all options need to be adjusted to match the workload; if this weren’t the case, MySQL Server would have an automatic configuration. There is also a separate webinar on configuration (Troubleshooting configuration issues) where I discuss these practices.
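For instance, turning the Query Cache off entirely in 5.6/5.7 (it was removed in MySQL 8.0) is a my.cnf change:

[mysqld]
query_cache_type = 0
query_cache_size = 0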

Q: Slide 11: is this real time? Can we get this info for a job that has already finished?

A: Yes, this is real time. No, it is not possible to get this info for a thread that no longer exists.

Q: Slide 11: what do negative numbers mean?

A: The numbers are taken from the CURRENT_NUMBER_OF_BYTES_USED field of the memory_summary_by_thread_by_event_name table in Performance Schema. These values, in turn, are calculated as (memory allocated by thread) – (memory freed by thread). Negative numbers here mean either a memory leak or an incorrect calculation of the memory used by the thread. I reported this behavior in the MySQL Bugs database. Please subscribe to the bug report and wait to see how the InnoDB and Performance Schema developers answer.
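To see the raw values yourself:

SELECT THREAD_ID, EVENT_NAME, CURRENT_NUMBER_OF_BYTES_USED
  FROM performance_schema.memory_summary_by_thread_by_event_name
 WHERE CURRENT_NUMBER_OF_BYTES_USED < 0
 ORDER BY CURRENT_NUMBER_OF_BYTES_USED
 LIMIT 10;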

Q: Are TokuDB memory usage stats also recorded in the sys.memory_by_thread_by_current_bytes table? Do we have to set something to enable this collection? I ran the query, but it shows 0 for everything.

A: TokuDB currently does not support Performance Schema, thus its memory statistics are not instrumented. See the user manual on how memory instrumentation works.

Q: With disk what we will check for disk I/O?

A: I don’t quite understand the question. Are you asking on which disks we should check IO statistics? Look at the datadir and at any other locations where MySQL stores data and log files (if you set custom locations).

Q: How can we make the CPU process multiple client requests in parallel? Put multiple requests in memory in a parallel way, by defining a transaction? Or are there any query hints?

A: We cannot directly make the CPU process requests in parallel, but we can affect it indirectly by tuning InnoDB thread-related options (innodb_thread_concurrency, innodb_read_io_threads, innodb_write_io_threads) and by using the thread pool.

Q: Is there any information in the Performance Schema that is not found in the sys schema?

A: Yes. For example, sys schema does not have a view for statistics about prepared statements, while Performance Schema does, because sys schema takes its statement statistics from digest tables (which make no sense for prepared statements).

Q: What is your favorite tool to investigate a memory issue with a task/job that has already finished?

A: I don’t know that there is such a tool suitable for use in production. In a test environment, you can use valgrind or similar tools. You can also make core dumps of the mysqld process and investigate them after the issue is gone.

Jun 03, 2016
--

MySQL 5.7 By Default 1/3rd Slower Than 5.6 When Using Binary Logs

binary logs make MySQL 5.7 slower

binary logs make MySQL 5.7 slower

Researching a performance issue, we came to a startling discovery:

MySQL 5.7 + binlogs is by default 37-45% slower than MySQL 5.6 + binlogs when otherwise using the default MySQL settings

Test server MySQL versions used:
i7, 8 threads, SSD, Centos 7.2.1511
mysql-5.6.30-linux-glibc2.5-x86_64
mysql-5.7.12-linux-glibc2.5-x86_64

mysqld --options:

--no-defaults --log-bin=mysql-bin --server-id=2

Run details:
Sysbench version 0.5, 4 threads, socket file connection

Sysbench Prepare: 

sysbench --test=/usr/share/doc/sysbench/tests/db/parallel_prepare.lua --oltp-auto-inc=off --mysql-engine-trx=yes --mysql-table-engine=innodb --oltp_table_size=1000000 --oltp_tables_count=1 --mysql-db=test --mysql-user=root --db-driver=mysql --mysql-socket=/path_to_socket_file/your_socket_file.sock prepare

Sysbench Run:

sysbench --report-interval=10 --oltp-auto-inc=off --max-time=50 --max-requests=0 --mysql-engine-trx=yes --test=/usr/share/doc/sysbench/tests/db/oltp.lua --init-rng=on --oltp_index_updates=10 --oltp_non_index_updates=10 --oltp_distinct_ranges=15 --oltp_order_ranges=15 --oltp_tables_count=1 --num-threads=4 --oltp_table_size=1000000 --mysql-db=test --mysql-user=root --db-driver=mysql --mysql-socket=/path_to_socket_file/your_socket_file.sock run

Results:

5.6.30: transactions: 7483 (149.60 per sec.)
5.7.12: transactions: 4689 (93.71 per sec.)  — That is a 37.36% decrease!

Note: on high-end systems with premium IO (think Fusion-IO, memory-only, high-end SSD with good caching throughput), the difference would be much smaller or negligible.

The reason?

A helpful comment from Shane Bester on a related bug report made me realize what was happening. Note the following in the MySQL Manual:

“Prior to MySQL 5.7.7, the default value of sync_binlog was 0, which configures no synchronizing to disk—in this case, the server relies on the operating system to flush the binary log’s contents from time to time as for any other file. MySQL 5.7.7 and later use a default value of 1, which is the safest choice, but as noted above can impact performance.” — https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html#sysvar_sync_binlog

The culprit is thus the --sync_binlog=1 change, which was made in 5.7.7 (in 5.6 it is 0 by default). While this may indeed be “the safest choice,” one has to wonder why Oracle chose to implement this default change in 5.7.7. After all, there are many other options to aid crash safety.

A related blog post from the MySQL HA team states:

“Indeed, [with sync_binlog=1,] it increases the total number of fsyncs called, but since MySQL 5.6, the server groups transactions and fsync’s them together, which minimizes greatly a potential performance hit.” — http://mysqlhighavailability.com/replication-defaults-in-mysql-5-7-7/ (ref item #4)

This seems incorrect given our findings, unless perhaps it requires tuning some other option.

This raises some action points/questions for Oracle’s team: why change this now? Was 5.6 never crash-safe in terms of binary logging? How about the other options that aid crash safety? Is anything [before 5.7.7] really ACID compliant by default?

In 2009, my colleague Peter Zaitsev had already posted on performance matters in connection with sync_binlog issues. More than seven years later, the questions asked in his post may still be valid today:

“May be opening binlog with O_DSYNC flag if sync_binlog=1 instead of using fsync will help? Or may be binlog pre-allocation would be good solution.” — PZ

Testing the same setup again, but this time with sync_binlog=0 and sync_binlog=1 set up identically on both servers, we see:

Results for sync_binlog=0:

5.6.30: transactions: 7472 (149.38 per sec.)
5.7.12: transactions: 6594 (131.86 per sec.)  — A 11.73% decrease

Results for sync_binlog=1:

5.6.30: transactions: 3854 (77.03 per sec.)
5.7.12: transactions: 4597 (91.89 per sec.)  — A 19.29% increase

Note: the increase here is to some extent negated by the fact that enabling sync_binlog overall still causes a significant (30% on 5.7 and 48% on 5.6) performance drop. Also interesting is that this could be the effect of “tuning the defaults” in 5.7, and it makes one think about the possibility of further defaults tuning/optimization in this area.

Results for sync_binlog=100:

5.6.30: transactions: 7564 (151.12 per sec.)
5.7.12: transactions: 6515 (130.22 per sec.) — A 13.83% decrease

Thus, while 5.7.12 made some improvements when it comes to --sync_binlog=1, when --sync_binlog is turned off or set to 100 we still see a ~11% decrease in performance. The same is true when not using binary logging at all, as a test with only --no-defaults (i.e. 100% vanilla out-of-the-box MySQL 5.6.30 versus MySQL 5.7.12) shows:

Results without binlogs enabled:

5.6.30: transactions: 7891 (157.77 per sec.)
5.7.12: transactions: 6963 (139.22 per sec.)  — A 11.76% decrease

This raises another question for Oracle’s team: why, with four threads, is there a ~11% decrease in performance for 5.7.12 versus 5.6.30 (both vanilla)?

Discussing this internally, we were interested to see whether the arbitrarily low number of four threads skewed the results and perhaps only showed a less realistic use case. However, when testing with more threads, the numbers became worse still:

Results with 100 threads:

5.6.30: transactions: 20216 (398.89 per sec.)
5.7.12: transactions: 11097 (218.43 per sec.) — A 45.24% decrease

Results with 150 threads:

5.6.30: transactions: 11852 (233.01 per sec.)
5.7.12: transactions: 6606 (129.80 per sec.) — A 44.29% decrease

The findings in this article were compiled from a group effort.

Apr 27, 2015
--

Indexing 101: Optimizing MySQL queries on a single table

I have recently seen several cases when performance for MySQL queries on a single table was terrible. The reason was simple: the wrong indexes were added and so the execution plan was poor. Here are guidelines to help you optimize various kinds of single-table queries.

Disclaimer: I will be presenting general guidelines and I do not intend to cover all scenarios. I am pretty confident that you can find examples where what I am writing does not work, but I am also confident that it will help you most of the time. Also I will not discuss features you can find in MySQL 5.6+ like Index Condition Pushdown to keep things simple. Be aware that such features can actually make a significant difference in query response time (for good or for bad).

What an index can do for you

An index can perform up to three actions: filter, sort/group, and cover. While the first two actions are self-explanatory, not everyone may know what a ‘covering index’ is. Actually, that’s very easy. The general workflow for a basic query is:
1. Use an index to find matching records and get the pointers to the data.
2. Use the pointers to fetch the corresponding data.
3. Return the records.

When a covering index can be used, the index already covers all fields requested in the query, so step #2 can be skipped and the workflow becomes:
1. Use an index to find matching records.
2. Return the records.

In many cases, indexes are small and can fit in memory while the data is large and does not: by using a covering index, you can avoid lots of disk operations, and performance can be orders of magnitude better.
Let’s now look at different common scenarios.

Single equality

This is the most basic scenario:

SELECT * FROM t WHERE c = 100

The idea is of course to add an index on (c). However, note that if the criterion is not selective enough, the optimizer may choose to perform a full table scan, which will then certainly be more efficient.
Also note that a frequent variation of this query is when you only select a small subset of fields instead of all fields:

SELECT c1, c2 FROM t WHERE c = 100

Here it could make sense to create an index on (c, c1, c2) because it will be a covering index. Do not create an index on (c1, c2, c)! It will still be covering but it will not be usable for filtering (remember that you can only use a left-most prefix of an index to filter).
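As a sketch on the hypothetical table t:

ALTER TABLE t ADD INDEX idx_c_c1_c2 (c, c1, c2);   -- filters on c, covers c1 and c2
-- ALTER TABLE t ADD INDEX idx_bad (c1, c2, c);    -- covering but cannot filter: avoid
EXPLAIN SELECT c1, c2 FROM t WHERE c = 100;        -- should report "Using index"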

Multiple equalities

SELECT * FROM t WHERE c = 100 and d = 'xyz'

It is also very easy to optimize: just add an index on (c, d) or (d, c).

The main mistake here is to add 2 indexes: one on (c) and one on (d). Granted, MySQL is able to use both indexes with the index_merge algorithm, but it is almost always a very bad option.

Equality and inequality

SELECT * FROM t WHERE c > 100 and d = 'xyz'

Here we must be careful, because once we use a column with an inequality, we can no longer use the columns that follow it in the index.

Therefore, if we create an index on (d, c), we will be able to filter both on c and d; this is good. But if we create an index on (c, d), we will only be filtering on c, which is less efficient.

So, unlike the situation where you only have equalities, the order of columns matters when inequalities are used.

Multiple inequalities

SELECT * FROM t WHERE c > 100 and b < 10 and d = 'xyz'

As we have 2 inequalities, we already know that we will not be able to filter on both conditions (*). So we have to make a decision: will we filter on (d, b) or on (d, c)?

It is not possible to tell which option is better without looking at the data: simply choose the column where the inequality is the most selective. The main point is that you must put the column(s) with an equality first.

(*) Actually there is a way to ‘filter’ on both inequalites: partition on b and add an index on (d, c) or partition on c and add an index on (d, b). The details are out of the scope of this post but it might be an option for some situations.

Equalities and sort

SELECT * FROM t WHERE c = 100 and d = 'xyz' ORDER BY b

As mentioned in the first paragraph, an index can filter and sort so this query is easy to optimize. However like for inequalities, we must carefully choose the order of the columns in the index: the rule is that we will filter first, and then sort.

With that in mind, it is easy to know that (c, d, b) or (d, c, b) will be good indexes while (b, c, d) or (b, d, c) are not as good (they will sort but not filter).

And if we have:

SELECT c1, c2 FROM t WHERE c = 100 and d = 'xyz' ORDER BY b

We can create a super efficient index that will filter, sort and be covering: (c, d, b, c1, c2).
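As a sketch, again on the hypothetical table t:

ALTER TABLE t ADD INDEX idx_cdb_cover (c, d, b, c1, c2);
EXPLAIN SELECT c1, c2 FROM t
 WHERE c = 100 AND d = 'xyz'
 ORDER BY b;   -- no filesort expected, and "Using index"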

Inequality and sort

We have 2 main variations here. The first one is:

SELECT * FROM t WHERE c > 100 and d = 'xyz' ORDER BY b

Two options look reasonable in this case:
1. filter on d and sort by b.
2. filter on d and c.

Which strategy is more efficient? It will depend on your data, so you will have to experiment.

The second variation is:

SELECT * FROM t WHERE c > 100 ORDER BY b

This time we have no equality so we have to choose between filtering and sorting. Most likely you will choose filtering.

Conclusion

Not all cases have been covered in this post but you can already see that in some cases you will create poor MySQL indexes if you are not careful. In a future post, I will present a case that can look confusing at first sight but which is easy to understand if you already know everything mentioned here.


Jan 19, 2015
--

Looking deeper into InnoDB’s problem with many row versions

A few days ago I wrote about the MySQL performance implications of InnoDB isolation modes, and I touched briefly upon the bizarre performance regression I found with InnoDB handling a large number of versions for a single row. Today I wanted to look a bit deeper into the problem, which I have also filed as a bug.

First, I validated under which conditions the problem happens. It seems to happen only in the REPEATABLE-READ isolation mode, and only when there are some hot rows which get many row versions during a benchmark run. For example, the problem does NOT happen if I run sysbench with a “uniform” distribution.

In terms of concurrent selects, it also seems to require some very special conditions – you need a connection that lets some history accumulate by holding a read snapshot open, and that then does it again once the history is high. The exact queries used to do that seem not to be that important.

Contrary to what I expected, this problem also does not require multiple connections – I can repeat it with only one read and one write connection, which means it can happen in a lot more workloads.

Here is the simplified case to repeat it:

sysbench  --num-threads=1 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=uniform --oltp-table-size=1 --mysql-user=root --mysql-password= --mysql-db=sbinnodb  --test=/usr/share/doc/sysbench/tests/db/update_index.lua run

It is repeatable on a table with just 1 row, where this row is very hot!

Select query sequence:

begin;
select c  from sbtest1 limit 1;
select sleep(20);
commit;
begin;
select c  from sbtest1 limit 1;
select sleep(300);
commit;

It is interesting, though, that in this case it does not look like it is enough to open a consistent snapshot – the issue only happens when I run the query on the same table the update workload is running against (I assume it has to touch the same data).

Let’s look at some graphs:

The transaction rate indeed suffers dramatically – in this case going down more than 100x, from close to 15K to around 60 at the lowest point.

It is interesting that CPU consumption is reduced but not as much as the throughput.

This is what I find the most enlightening about what seems to happen. The number of buffer pool read requests increases dramatically while the number of transactions goes down. If we do the math, we have 940K buffer pool reads per 14.5K updates in the normal case, giving us 65 buffer reads per update, which goes up to 37000 per update when the regression is happening. This is a dramatic change, and it points to something going badly wrong.

The history size is of interest. We can see it grows quickly at first – this is when we have the first transaction and sleep(20) – and then grows very slowly while the second transaction is open, as the update rate is low. Finally, it goes to zero very quickly once the transaction is committed. What is worth noting is that the history length here peaks at just 280K, which is really quite a trivial size.

The InnoDB contention graph is interesting from two viewpoints. First, it shows that the contention picture changes at the time when the transaction is held open, and also right afterward, when the history is being worked out.

If you look at this graph focusing only on the os_wait metrics, we can see that mutex_os_waits goes down during the problem while x-lock os_waits increase significantly. This means there are some very long exclusive-lock os-waits happening, which is indicative of long waits. We can’t actually get lock times from status variables – we would need data from Performance Schema for that, which unfortunately does not cover the key mutexes, and which I unfortunately can’t graph easily yet.

Let’s look at some more data.

From the great processlist view in the sys schema we can get:

*************************** 4. row ***************************
                thd_id: 536
               conn_id: 516
                  user: root@localhost
                    db: sbinnodb
               command: Query
                 state: updating
                  time: 0
     current_statement: UPDATE sbtest1 SET k=k+1 WHERE id=1
          lock_latency: 36.00 us
         rows_examined: 1
             rows_sent: 0
         rows_affected: 0
            tmp_tables: 0
       tmp_disk_tables: 0
             full_scan: NO
        last_statement: NULL
last_statement_latency: NULL
             last_wait: wait/io/table/sql/handler
     last_wait_latency: Still Waiting
                source: handler.cc:7692

This is unfortunately not very helpful, as it does not show the actual lock being waited on – it just says the wait is somewhere inside a storage engine call, which is obvious.

We can get some good summary information from another sys schema view:

mysql> select * from waits_global_by_latency;
+------------------------------------------------------+-----------+---------------+-------------+-------------+
| events                                               | total     | total_latency | avg_latency | max_latency |
+------------------------------------------------------+-----------+---------------+-------------+-------------+
| wait/synch/cond/sql/Item_func_sleep::cond            |       384 | 00:31:27.18   | 4.91 s      | 5.00 s      |
| wait/synch/rwlock/innodb/index_tree_rw_lock          | 239460870 | 00:12:13.82   | 3.06 us     | 495.27 ms   |
| wait/synch/mutex/sql/THD::LOCK_thd_data              | 601160083 | 36.48 s       | 60.41 ns    | 643.18 us   |
| wait/io/table/sql/handler                            |      1838 | 29.12 s       | 15.84 ms    | 211.21 ms   |
| wait/synch/mutex/mysys/THR_LOCK::mutex               | 240456674 | 16.57 s       | 68.86 ns    | 655.16 us   |
| wait/synch/cond/sql/MDL_context::COND_wait_status    |        15 | 14.11 s       | 940.37 ms   | 999.36 ms   |
| wait/synch/mutex/innodb/trx_mutex                    | 360685304 | 10.52 s       | 29.11 ns    | 4.01 ms     |
| wait/synch/mutex/innodb/trx_undo_mutex               | 120228229 | 3.87 s        | 31.93 ns    | 426.16 us   |
| wait/io/file/innodb/innodb_data_file                 |       399 | 1.46 s        | 3.66 ms     | 11.21 ms    |
| wait/synch/mutex/innodb/buf_pool_flush_state_mutex   |     32317 | 587.10 ms     | 18.17 us    | 11.27 ms    |
| wait/synch/rwlock/innodb/fil_space_latch             |   4154178 | 386.96 ms     | 92.96 ns    | 74.57 us    |
| wait/io/file/innodb/innodb_log_file                  |       987 | 271.76 ms     | 275.34 us   | 14.23 ms    |
| wait/synch/rwlock/innodb/hash_table_locks            |   8509138 | 255.76 ms     | 30.05 ns    | 53.41 us    |
| wait/synch/mutex/innodb/srv_threads_mutex            |  11938747 | 249.49 ms     | 20.66 ns    | 15.06 us    |
| wait/synch/rwlock/innodb/trx_purge_latch             |   8488041 | 181.01 ms     | 21.28 ns    | 23.51 us    |
| wait/synch/rwlock/innodb/dict_operation_lock         |       232 | 100.48 ms     | 433.09 us   | 61.88 ms    |
| wait/synch/rwlock/innodb/checkpoint_lock             |        10 | 32.34 ms      | 3.23 ms     | 8.38 ms     |
| wait/io/socket/sql/client_connection                 |      2171 | 30.87 ms      | 14.22 us    | 3.95 ms     |
| wait/synch/mutex/innodb/log_sys_mutex                |    260485 | 6.57 ms       | 25.04 ns    | 52.00 us    |
| wait/synch/rwlock/innodb/btr_search_latch            |      4982 | 1.71 ms       | 343.05 ns   | 17.83 us    |
| wait/io/file/myisam/kfile                            |       647 | 1.70 ms       | 2.63 us     | 39.28 us    |
| wait/synch/rwlock/innodb/dict_table_stats            |     46323 | 1.63 ms       | 35.06 ns    | 6.55 us     |
| wait/synch/mutex/innodb/os_mutex                     |     11410 | 904.63 us     | 79.19 ns    | 26.67 us    |
| wait/lock/table/sql/handler                          |      1838 | 794.21 us     | 431.94 ns   | 9.70 us     |
| wait/synch/rwlock/sql/MDL_lock::rwlock               |      7056 | 672.34 us     | 95.15 ns    | 53.29 us    |
| wait/io/file/myisam/dfile                            |       139 | 518.50 us     | 3.73 us     | 31.61 us    |
| wait/synch/mutex/sql/LOCK_global_system_variables    |     11692 | 462.58 us     | 39.44 ns    | 363.39 ns   |
| wait/synch/mutex/innodb/rseg_mutex                   |      7238 | 348.25 us     | 47.89 ns    | 10.38 us    |
| wait/synch/mutex/innodb/lock_mutex                   |      5747 | 222.13 us     | 38.50 ns    | 9.84 us     |
| wait/synch/mutex/innodb/trx_sys_mutex                |      4601 | 187.43 us     | 40.69 ns    | 326.15 ns   |
| wait/synch/mutex/sql/LOCK_table_cache                |      2173 | 185.14 us     | 85.14 ns    | 10.02 us    |
| wait/synch/mutex/innodb/fil_system_mutex             |      3481 | 144.60 us     | 41.32 ns    | 375.91 ns   |
| wait/synch/rwlock/sql/LOCK_grant                     |      1217 | 121.86 us     | 99.85 ns    | 12.07 us    |
| wait/synch/mutex/innodb/flush_list_mutex             |      3191 | 108.82 us     | 33.80 ns    | 8.81 us     |
| wait/synch/mutex/innodb/purge_sys_bh_mutex           |      3600 | 83.52 us      | 23.16 ns    | 123.95 ns   |
| wait/synch/mutex/sql/MDL_map::mutex                  |      1456 | 75.02 us      | 51.33 ns    | 8.78 us     |
| wait/synch/mutex/myisam/MYISAM_SHARE::intern_lock    |      1357 | 55.55 us      | 40.69 ns    | 320.51 ns   |
| wait/synch/mutex/innodb/log_flush_order_mutex        |       204 | 40.84 us      | 200.01 ns   | 9.42 us     |
| wait/io/file/sql/FRM                                 |        16 | 31.16 us      | 1.95 us     | 16.40 us    |
| wait/synch/mutex/mysys/BITMAP::mutex                 |       378 | 30.52 us      | 80.44 ns    | 9.16 us     |
| wait/synch/mutex/innodb/dict_sys_mutex               |       463 | 24.15 us      | 51.96 ns    | 146.17 ns   |
| wait/synch/mutex/innodb/buf_pool_LRU_list_mutex      |       293 | 23.37 us      | 79.50 ns    | 313.94 ns   |
| wait/synch/mutex/innodb/buf_dblwr_mutex              |       398 | 22.60 us      | 56.65 ns    | 380.61 ns   |
| wait/synch/rwlock/myisam/MYISAM_SHARE::key_root_lock |       388 | 14.86 us      | 38.19 ns    | 254.16 ns   |
| wait/synch/mutex/sql/LOCK_plugin                     |       250 | 14.30 us      | 56.97 ns    | 137.41 ns   |
| wait/io/socket/sql/server_unix_socket                |         2 | 9.35 us       | 4.67 us     | 4.72 us     |
| wait/synch/mutex/sql/THD::LOCK_temporary_tables      |       216 | 8.97 us       | 41.32 ns    | 465.74 ns   |
| wait/synch/mutex/sql/hash_filo::lock                 |       151 | 8.35 us       | 55.09 ns    | 150.87 ns   |
| wait/synch/mutex/mysys/THR_LOCK_open                 |       196 | 7.84 us       | 39.75 ns    | 110.80 ns   |
| wait/synch/mutex/sql/TABLE_SHARE::LOCK_ha_data       |       159 | 6.23 us       | 39.13 ns    | 108.92 ns   |
| wait/synch/mutex/innodb/lock_wait_mutex              |        29 | 5.38 us       | 185.30 ns   | 240.38 ns   |
| wait/synch/mutex/innodb/ibuf_mutex                   |        29 | 5.26 us       | 181.23 ns   | 249.46 ns   |
| wait/synch/mutex/sql/LOCK_status                     |        62 | 4.12 us       | 66.36 ns    | 127.39 ns   |
| wait/synch/mutex/sql/LOCK_open                       |       109 | 4.05 us       | 36.93 ns    | 105.79 ns   |
| wait/io/file/sql/dbopt                               |         3 | 3.34 us       | 1.11 us     | 1.99 us     |
| wait/synch/mutex/innodb/buf_pool_free_list_mutex     |        30 | 1.58 us       | 52.58 ns    | 106.73 ns   |
| wait/synch/mutex/innodb/srv_innodb_monitor_mutex     |        31 | 1.38 us       | 44.45 ns    | 89.52 ns    |
| wait/synch/mutex/sql/LOCK_thread_count               |        33 | 1.10 us       | 33.18 ns    | 50.71 ns    |
| wait/synch/mutex/sql/LOCK_prepared_stmt_count        |        32 | 1.10 us       | 34.12 ns    | 67.92 ns    |
| wait/synch/mutex/innodb/srv_sys_mutex                |        42 | 1.07 us       | 25.35 ns    | 49.14 ns    |
| wait/synch/mutex/mysys/KEY_CACHE::cache_lock         |        24 | 1.02 us       | 42.26 ns    | 103.60 ns   |
| wait/synch/mutex/sql/MDL_wait::LOCK_wait_status      |        18 | 544.62 ns     | 30.05 ns    | 42.57 ns    |
| wait/synch/rwlock/sql/CRYPTO_dynlock_value::lock     |         2 | 509.25 ns     | 254.47 ns   | 287.02 ns   |
| wait/synch/rwlock/sql/LOCK_dboptions                 |         6 | 262.92 ns     | 43.82 ns    | 84.82 ns    |
| wait/synch/rwlock/sql/MDL_context::LOCK_waiting_for  |         4 | 262.92 ns     | 65.73 ns    | 103.60 ns   |
| wait/synch/mutex/sql/LOCK_connection_count           |         4 | 238.51 ns     | 59.47 ns    | 123.01 ns   |
| wait/synch/mutex/sql/LOCK_sql_rand                   |         2 | 172.15 ns     | 86.08 ns    | 94.84 ns    |
| wait/synch/rwlock/sql/LOCK_system_variables_hash     |         2 | 141.16 ns     | 70.43 ns    | 78.25 ns    |
| wait/synch/rwlock/sql/gtid_commit_rollback           |         2 | 95.47 ns      | 47.58 ns    | 51.65 ns    |
| wait/synch/mutex/innodb/autoinc_mutex                |         2 | 90.14 ns      | 45.07 ns    | 45.07 ns    |
| wait/synch/mutex/sql/LOCK_thd_remove                 |         2 | 85.14 ns      | 42.57 ns    | 52.58 ns    |
| wait/synch/mutex/innodb/ibuf_bitmap_mutex            |         1 | 76.37 ns      | 76.37 ns    | 76.37 ns    |
| wait/synch/mutex/sql/LOCK_xid_cache                  |         2 | 62.91 ns      | 31.30 ns    | 34.43 ns    |
+------------------------------------------------------+-----------+---------------+-------------+-------------+
73 rows in set (0.00 sec)

This shows a couple of hotspots: the well-known index_tree_rw_lock hotspot, which should be gone in MySQL 5.7, as well as trx_mutex and trx_undo_mutex. It is especially worth noting that this index tree rw_lock was waited on for almost 500 ms at the maximum (this is data for only a couple of minutes, so it is unlikely to be a very rare outlier). This hints to me that with many row versions, the index_tree_rw_lock might be held for a prolonged period of time.
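
A summary like the one above can be pulled from the Performance Schema wait summary table. A minimal sketch of such a query (the timer columns are stored in picoseconds, hence the unit conversions; the rounding is my own choice, not part of any standard report):

SELECT EVENT_NAME, COUNT_STAR,
       ROUND(SUM_TIMER_WAIT/1000000000000, 2) AS total_sec,
       ROUND(AVG_TIMER_WAIT/1000, 2)          AS avg_ns,
       ROUND(MAX_TIMER_WAIT/1000000000, 2)    AS max_ms
  FROM performance_schema.events_waits_summary_global_by_event_name
 WHERE COUNT_STAR > 0
 ORDER BY SUM_TIMER_WAIT DESC;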

What is interesting is that the reliable “SHOW ENGINE INNODB STATUS” remains quite handy:

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 58847289
--Thread 140386357991168 has waited at btr0cur.cc line 304 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fb0bea2ca40 '&block->lock'
a writer (thread id 140386454755072) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1082
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/row/row0row.cc line 815
--Thread 140386374776576 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr/btr0cur.cc line 577
--Thread 140386324420352 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr/btr0cur.cc line 577
--Thread 140386332813056 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr/btr0cur.cc line 577
--Thread 140386445960960 has waited at btr0cur.cc line 586 for 0.0000 seconds the semaphore:
S-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr/btr0cur.cc line 577
--Thread 140386366383872 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr/btr0cur.cc line 577
--Thread 140386341205760 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr/btr0cur.cc line 577
--Thread 140386349598464 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr/btr0cur.cc line 577

Another sample….

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 61753307
--Thread 140386332813056 has waited at row0row.cc line 815 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fb0bea2cd40 '&block->lock'
a writer (thread id 140386454755072) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1082
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/row/row0row.cc line 815
--Thread 140386366383872 has waited at row0row.cc line 815 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fb0bea2cd40 '&block->lock'
a writer (thread id 140386454755072) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1082
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/row/row0row.cc line 815
--Thread 140386341205760 has waited at row0row.cc line 815 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fb0bea2cd40 '&block->lock'
a writer (thread id 140386454755072) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1082
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/row/row0row.cc line 815
--Thread 140386374776576 has waited at row0row.cc line 815 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fb0bea2cd40 '&block->lock'
a writer (thread id 140386454755072) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1082
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/row/row0row.cc line 815
--Thread 140386445960960 has waited at btr0cur.cc line 257 for 0.0000 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7fb0bea2f2c0 '&block->lock'
a writer (thread id 140386445960960) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file btr0cur.cc line 257
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr/btr0cur.cc line 257
OS WAIT ARRAY INFO: signal count 45260762
Mutex spin waits 1524575466, rounds 5092199292, OS waits 43431355
RW-shared spins 14500456, rounds 258217222, OS waits 3721069
RW-excl spins 37764438, rounds 730059390, OS waits 12592324
Spin rounds per wait: 3.34 mutex, 17.81 RW-shared, 19.33 RW-excl

The first one shows mostly the index lock contention we have already seen. The second, though, shows a lot of contention on the block lock, something that is not covered by Performance Schema instrumentation in MySQL 5.6.
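
To track how these semaphore waits evolve over time, one simple approach is to sample the SEMAPHORES section in a loop. A minimal sketch, assuming the mysql client can log in non-interactively (e.g. via ~/.my.cnf) and that no deadlock section appears between SEMAPHORES and TRANSACTIONS in the output:

while true; do
  echo "-- $(date)" >> semaphores.log
  mysql -e 'SHOW ENGINE INNODB STATUS\G' \
    | sed -n '/^SEMAPHORES$/,/^TRANSACTIONS$/p' >> semaphores.log
  sleep 10
done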

Now, to tell the truth, this two-thread test case is not your typical case of contention: unlike the case with 64 threads, we do not see that drastic a CPU drop, and CPU usage stays above the 25% that a single fully saturated thread would show on this system. This, however, is where DBA-level profiling gets tough with MySQL. As I mentioned, the only indication of the excessive activity going on is a high number of buffer pool requests; there is no instrumentation I'm aware of that tells us anything about scanning many old row versions in undo space. Perhaps some more metrics around this subsystem need to be added to INNODB_METRICS?
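
Meanwhile, the closest existing proxies are the transaction and purge counters in INNODB_METRICS; trx_rseg_history_len (the undo history list length) is the most telling one when old row versions pile up. A minimal sketch of checking them (note that many counters in this table are disabled by default and must first be enabled with innodb_monitor_enable):

SELECT NAME, SUBSYSTEM, COUNT, STATUS
  FROM information_schema.INNODB_METRICS
 WHERE SUBSYSTEM IN ('transaction', 'purge')
 ORDER BY NAME;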

A good tool for seeing CPU consumption, which both developers and DBAs can use, is oprofile; its overhead is low enough to use in production. Note that to get any meaningful information about CPU usage inside the program, you need to have debug symbols installed. For Percona Server 5.6 on Ubuntu you can do apt-get install percona-server-5.6-dbg

To profile just the MySQL process you can do:

root@ts140i:~# operf --pid 8970
operf: Press Ctl-c or 'kill -SIGINT 9113' to stop profiling
operf: Profiler started
^C
Profiling done.

To get top symbols (functions) called in mysqld I can do:

root@ts140i:~# opreport -l | more
Using /root/oprofile_data/samples/ for samples directory.
warning: /no-vmlinux could not be found.
warning: [vdso] (tgid:8970 range:0x7fffba5fe000-0x7fffba5fffff) could not be found.
CPU: Intel Haswell microarchitecture, speed 3.201e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
212162   12.4324  mysqld                   buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*)
204230   11.9676  mysqld                   dict_table_copy_types(dtuple_t*, dict_table_t const*)
179783   10.5351  no-vmlinux               /no-vmlinux
144335    8.4579  mysqld                   trx_undo_update_rec_get_update(unsigned char*, dict_index_t*, unsigned long, unsigned long, unsigned long, unsigned long, trx_t*, mem_block_info_t*, upd_t**)
95984     5.6245  mysqld                   trx_undo_prev_version_build(unsigned char const*, mtr_t*, unsigned char const*, dict_index_t*, unsigned long*, mem_block_info_t*, unsigned char**)
78785     4.6167  mysqld                   row_build(unsigned long, dict_index_t const*, unsigned char const*, unsigned long const*, dict_table_t const*, dtuple_t const*, unsigned long const*, row_ext_t**, mem_block_info_t*)
76656     4.4919  libc-2.19.so             _int_free
63040     3.6941  mysqld                   rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**)
52794     3.0937  mysqld                   mtr_commit(mtr_t*)

This points to some hot functions; quite as expected, we see functions working with undo space here. In some cases it is also helpful to look at the call graph to understand where the functions are being called from, if that is not clear.
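
If the call sites are not obvious, oprofile can record them too. A minimal sketch (both operf and opreport accept a --callgraph option; the PID is the same mysqld process as before):

root@ts140i:~# operf --callgraph --pid 8970
root@ts140i:~# opreport --callgraph | more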

Now, in many cases, it is hard to figure out from an oprofile report what is “wrong” without having a baseline. In this case we can look at the oprofile run for sysbench alone:

root@ts140i:~# opreport -l | more
Using /root/oprofile_data/samples/ for samples directory.
warning: /no-vmlinux could not be found.
warning: [vdso] (tgid:8970 range:0x7fffba5fe000-0x7fffba5fffff) could not be found.
CPU: Intel Haswell microarchitecture, speed 3.201e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
1331386  28.4544  no-vmlinux               /no-vmlinux
603873   12.9060  mysqld                   ut_delay(unsigned long)
158428    3.3859  mysqld                   buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*)
88172     1.8844  mysqld                   my_timer_cycles
75948     1.6232  mysqld                   srv_worker_thread
74350     1.5890  mysqld                   MYSQLparse(THD*)
68075     1.4549  mysqld                   mutex_spin_wait(void*, bool, char const*, unsigned long)
59321     1.2678  mysqld                   que_run_threads(que_thr_t*)
59152     1.2642  mysqld                   start_mutex_wait_v1
57495     1.2288  mysqld                   rw_lock_x_lock_func(rw_lock_t*, unsigned long, char const*, unsigned long, bool, bool)
53133     1.1356  mysqld                   mtr_commit(mtr_t*)
44404     0.9490  mysqld                   end_mutex_wait_v1
40226     0.8597  libpthread-2.19.so       pthread_mutex_lock
38889     0.8311  mysqld                   log_block_calc_checksum_innodb(unsigned char const*)

As you can see, this report is very different in terms of its top functions.
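
One practical way to make such comparisons is to keep each run's samples in a separate session directory and diff the reports afterwards. A sketch, where the directory names are my own choice (--session-dir is a standard operf/opreport option):

root@ts140i:~# operf --session-dir=/root/prof-baseline --pid 8970   # during the clean sysbench run
root@ts140i:~# operf --session-dir=/root/prof-problem --pid 8970    # while the problem reproduces
root@ts140i:~# diff <(opreport --session-dir=/root/prof-baseline -l | head -20) \
                    <(opreport --session-dir=/root/prof-problem -l | head -20)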

Final thoughts: I had two purposes with this blog post. First, I hope it contains some helpful information for developers so this bug can be fixed, at least in MySQL 5.7. It is quite a nasty one, as it can come out of nowhere if the application unexpectedly develops hot rows. Second, I hope it shows some of the methods you can use to capture and analyze data about MySQL performance.
