Jan 26, 2015

MySQL benchmarks on eXFlash DIMMs

In this blog post, we will discuss MySQL performance on eXFlash DIMMs. Earlier we measured the IO performance of these storage devices with sysbench fileio.

Environment

The benchmarking environment was the same one we used for the sysbench fileio tests.

CPU: 2x Intel Xeon E5-2690 (hyper threading enabled)
FusionIO driver version: 3.2.6 build 1212
Operating system: CentOS 6.5
Kernel version: 2.6.32-431.el6.x86_64

In this case, we used a separate machine for testing, which had a 10G Ethernet connection to this server. That separate machine executed sysbench, and the client was not the bottleneck. The environment is described in greater detail at the end of the blog post.

Sysbench OLTP write workload

The graph shows throughput for sysbench OLTP. We will examine only the dark areas of the graph, which correspond to the read/write case at high concurrency.

Each table in the following sections has these columns:

column	explanation
storage	The device that was used for the measurement.
threads	The number of sysbench client threads used in the benchmark.
ro_rw	Read-only or read-write. In the whitepaper you can find detailed information about read-only data as well.
sd	The standard deviation of the metric in question.
mean	The mean of the metric in question.
95thpct	The 95th percentile of the metric in question (the maximum without the highest 5 percent of the samples).
max	The maximum of the metric in question.
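
To make the column definitions concrete, here is a minimal sketch of how such a summary could be produced from a list of per-interval samples. The post does not say how the numbers were computed, so both the sample standard deviation (n-1 denominator) and the nearest-rank 95th percentile are assumptions, and the function name summarize is hypothetical.

```shell
#!/bin/bash
# Summarize one numeric sample per line from stdin.
# Prints: sd mean 95thpct max (space separated, two decimals).
# Assumptions: sample standard deviation (n-1 denominator) and the
# nearest-rank definition of the 95th percentile.
summarize() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      mean = sum / NR
      for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
      sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0
      # Nearest rank: smallest sorted position covering at least 95% of the samples.
      rank = int(0.95 * NR)
      if (rank < 0.95 * NR) rank++
      printf "%.2f %.2f %.2f %.2f\n", sd, mean, v[rank], v[NR]
    }'
}

# Example: twenty samples with the values 1 through 20.
seq 1 20 | summarize
```

With twenty samples, the 95th percentile is the 19th value in sorted order, so the single highest sample is excluded while the max column still reports it.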

Sysbench OLTP throughput

storage	threads	ro_rw	sd	mean	95thpct	max
eXFlash DIMM_4	128	rw	714.09605	5996.5105	7172.0725	7674.87
eXFlash DIMM_4	256	rw	470.95410	6162.4271	6673.0205	7467.99
eXFlash DIMM_8	128	rw	195.57857	7140.5038	7493.4780	7723.13
eXFlash DIMM_8	256	rw	173.51373	6498.1460	6736.1710	7490.95
fio	128	rw	588.14282	1855.4304	2280.2780	7179.95
fio	256	rw	599.88510	2187.5271	2584.1995	7467.13

Going from 4 to 8 eXFlash DIMMs mostly means more consistent throughput. The mean throughput is higher with 8 DIMMs (about 19% at 128 threads and about 5% at 256 threads), but the 95th percentile and the maximum values are not much different; the lower standard deviation with 8 DIMMs shows the same thing. The reason they are not much different is that these benchmarks are CPU bound (check the CPU idle percentage table later in this post or the graphs in the whitepaper). The PCI-E flash drive, on the other hand, delivers less than half of the throughput of the eXFlash DIMMs (the most relevant comparison is the 95th percentile value).
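
The "less than half" comparison can be checked directly from the throughput table. This is only a small arithmetic sketch: the helper name ratio is hypothetical, and the numbers are the 95thpct values for fio and eXFlash DIMM_4 copied from the table above.

```shell
#!/bin/bash
# ratio A B: print A divided by B, rounded to two decimals.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# 95th percentile throughput, fio relative to eXFlash DIMM_4.
ratio 2280.2780 7172.0725   # 128 threads
ratio 2584.1995 6673.0205   # 256 threads
```

Both ratios come out below 0.5, which matches the statement about the PCI-E flash drive.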

Sysbench OLTP response time

storage	threads	ro_rw	sd	mean	95thpct	max
eXFlash DIMM_4	128	rw	4.4187784	37.931489	44.2600	64.54
eXFlash DIMM_4	256	rw	9.6642741	90.789317	109.0450	176.45
eXFlash DIMM_8	128	rw	2.1004085	28.796017	32.1600	67.10
eXFlash DIMM_8	256	rw	5.5932572	94.060628	101.6300	121.92
fio	128	rw	51.2343587	138.052150	203.1160	766.11
fio	256	rw	72.9901355	304.851844	392.7660	862.00

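At 128 threads, the 95th percentile response time of the eXFlash DIMMs is less than 1/4 of that of the PCI-E flash device (44.26 and 32.16 versus 203.12). At 256 threads it is slightly more than 1/4 (109.05 and 101.63 versus 392.77).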

CPU idle percentage

storage	threads	ro_rw	sd	mean	95thpct	max
eXFlash DIMM_4	128	rw	1.62846674	3.3683857	6.2600	22.18
eXFlash DIMM_4	256	rw	1.06980095	2.2930634	3.9170	26.37
eXFlash DIMM_8	128	rw	0.42987637	0.8553543	1.2900	15.28
eXFlash DIMM_8	256	rw	1.32328435	4.4861795	6.7100	9.40
fio	128	rw	4.21156996	26.1278994	31.5020	55.49
fio	256	rw	5.49489852	19.3123639	27.6715	47.34

The CPU idle percentage shows that with eXFlash DIMMs the CPU was the performance bottleneck in this benchmark. This was true with both 4 and 8 DIMMs, which is why we didn’t see a substantial throughput difference between the two setups. For the PCI-E flash, however, the storage device itself was the bottleneck.

If you are interested in more details, download the free white paper which contains the full analysis of sysbench OLTP and linkbench benchmarks.

The post MySQL benchmarks on eXFlash DIMMs appeared first on MySQL Performance Blog.

Written by Peter Boros in: benchmarks, eXFlash DIMMs, MySQL, MySQL benchmarks, Peter Boros, Primary, sysbench, Zend Developer

Sep 02, 2014

Using sysbench 0.5 for performing MySQL benchmarks

Given the recent excitement and interest around OpenStack, I wanted to make sure I was ready to conduct appropriate evaluations of system performance. I generally turn to sysbench since it comes with a variety of different tests (selected via the --test= option), including:

fileio – File I/O test
cpu – CPU performance test
memory – Memory functions speed test
threads – Threads subsystem performance test
mutex – Mutex performance test

As you can see, sysbench lets you stress many of the fundamental components of your hardware and infrastructure, such as your disk subsystem, along with your CPUs and memory. An additional option exists that is designed to perform synthetic stress testing of MySQL, and I was surprised when I didn’t see it in the above list on version 0.5, as it used to show up as “oltp – OLTP test”. What happened to --test=oltp?

This list is from the latest release of sysbench, which is 0.5. You will only be on this version if you build it yourself or if you use the package provided courtesy of Frederic Descamps (thanks lefred!). If you’re using the version from EPEL, Ubuntu 14.04, or Debian 7, you’re still on version 0.4.12 (check with sysbench --version). One thing you’ll notice is that the OLTP test type doesn’t show up anymore. What gives? I was scratching my head until I asked on Percona IRC and found out that in 0.5 the standard OLTP test type was replaced with a different syntax: instead of passing parameters to sysbench, you reference scripts written in Lua. The advantage is that you now have an interface for writing your own specific load tests (provided you know Lua, but it isn’t hard). The pre-canned load tests still exist, but you need to have them installed as part of the RPM or otherwise copied to your system.

Fortunately, if you use the package provided by lefred you’ll find these Lua scripts here (this is on an Amazon AMI as of August 4th, 2014):

[root@pxc-control ~]# ls -l /usr/share/doc/sysbench/tests/db/
total 44
-rw-r--r-- 1 root root 3585 Sep 7 2012 common.lua
-rw-r--r-- 1 root root 340 Sep 7 2012 delete.lua
-rw-r--r-- 1 root root 830 Sep 7 2012 insert.lua
-rw-r--r-- 1 root root 2925 Sep 7 2012 oltp.lua
-rw-r--r-- 1 root root 342 Sep 7 2012 oltp_simple.lua
-rw-r--r-- 1 root root 425 Sep 7 2012 parallel_prepare.lua
-rw-r--r-- 1 root root 343 Sep 7 2012 select.lua
-rw-r--r-- 1 root root 3964 Sep 7 2012 select_random_points.lua
-rw-r--r-- 1 root root 4066 Sep 7 2012 select_random_ranges.lua
-rw-r--r-- 1 root root 343 Sep 7 2012 update_index.lua
-rw-r--r-- 1 root root 552 Sep 7 2012 update_non_index.lua

So the trick (if you want to call it that) is that instead of passing a single word to the --test directive, you pass the full path to the Lua script.

This is the old way (sysbench 0.4.12 from EPEL repo):

--test=oltp --oltp-test-mode=complex

This is the new way (sysbench 0.5):

--test=/usr/share/doc/sysbench/tests/db/insert.lua
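
If the same wrapper has to run on hosts with either version, the choice between the two forms can be scripted. The following is only a sketch: the function name sysbench_test_arg is hypothetical, the default directory is the one from the listing above and may differ on other installs, and anything that is not 0.4.x is simply treated as 0.5.

```shell
#!/bin/bash
# Print the --test argument for a given sysbench version string.
#   $1: version, e.g. "0.4.12" or "0.5"
#   $2: Lua script name without extension (default: oltp)
#   $3: directory holding the Lua scripts (default taken from the listing above)
sysbench_test_arg() {
  local version="$1" script="${2:-oltp}"
  local dir="${3:-/usr/share/doc/sysbench/tests/db}"
  case "$version" in
    0.4*) echo "--test=oltp" ;;                      # old single-word syntax
    *)    echo "--test=${dir}/${script}.lua" ;;      # 0.5: full path to a Lua script
  esac
}

sysbench_test_arg 0.4.12
sysbench_test_arg 0.5 insert
```

The version string itself could be taken from the second field of the sysbench --version output, assuming it prints the program name followed by the version.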

Here is an example of a test I’m running through haproxy against a 3-node PXC cluster doing the INSERT-only test type so you can see the full syntax I pass to sysbench:

[root@pxc-control ~]# cat sys_haproxy.sh
#!/bin/bash
# Each option line ends with a backslash so that everything is passed
# to a single sysbench invocation; $1 is the sysbench command (e.g. run).
sysbench \
  --test=/usr/share/doc/sysbench/tests/db/insert.lua \
  --mysql-host=pxc-control \
  --mysql-port=9999 \
  --mysql-user=sysbench-haproxy \
  --mysql-password=sysbench-haproxy \
  --mysql-db=sbtest \
  --mysql-table-type=innodb \
  --oltp-test-mode=complex \
  --oltp-read-only=off \
  --oltp-reconnect=on \
  --oltp-table-size=1000000 \
  --max-requests=100000000 \
  --num-threads=3 \
  --report-interval=1 \
  --report-checkpoints=10 \
  --tx-rate=24 \
  "$1"

And here’s what the insert.lua script looks like:

[root@pxc-control ~]# cat /usr/share/doc/sysbench/tests/db/insert.lua
pathtest = string.match(test, "(.*/)") or ""
dofile(pathtest .. "common.lua")
function thread_init(thread_id)
   set_vars()
end
function event(thread_id)
   local table_name
   local i
   local c_val
   local k_val
   local pad_val
   table_name = "sbtest".. sb_rand_uniform(1, oltp_tables_count)
   if (oltp_auto_inc) then
      i = 0
   else
      i = sb_rand_uniq(1, oltp_table_size)
   end
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])
   rs = db_query("INSERT INTO " .. table_name ..  " (id, k, c, pad) VALUES " .. string.format("(%d, %d, '%s', '%s')",i, k_val, c_val, pad_val))
end

The thing that I like most about sysbench 0.5 (beyond the Lua interface, of course!) is that it now comes with a --report-interval option (which I generally set to 1) so that you get output while the script is running. No more waiting until the end of the test to get feedback! Here’s a sample of sysbench 0.5 in action, running the INSERT test through a local haproxy instance and writing to three nodes in a PXC cluster, such as OpenStack Trove might do:

[root@pxc-control ~]# ./sys_haproxy.sh run
sysbench 0.5:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 3
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored
Threads started!
[   1s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1099.28, response time: 9.86ms (95%)
[   2s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 973.02, response time: 10.77ms (95%)
[   3s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1181.01, response time: 6.23ms (95%)
[   4s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1103.00, response time: 6.77ms (95%)
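
Because the interval lines have a regular shape, they are easy to post-process. Here is a small sketch that averages the writes/s field over all interval lines; the function name mean_writes_per_sec is hypothetical, and the parsing assumes the exact line format shown in the output above.

```shell
#!/bin/bash
# Read sysbench 0.5 interval report lines on stdin and print the mean writes/s.
# Lines without a "writes/s:" field (banner, options, etc.) are ignored.
mean_writes_per_sec() {
  awk '
    /writes\/s:/ {
      for (i = 1; i < NF; i++)
        if ($i == "writes/s:") {
          v = $(i + 1)
          sub(/,$/, "", v)   # drop the trailing comma after the number
          sum += v
          n++
        }
    }
    END { if (n) printf "%.1f\n", sum / n }'
}

# Possible usage: ./sys_haproxy.sh run | mean_writes_per_sec
```

For the four interval lines shown above, this averages 1099.28, 973.02, 1181.01 and 1103.00.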

I would also like to call your attention to a blog post by Nilnandan Joshi from Percona’s Support team where he describes a method to build sysbench 0.5 on Debian 7. Thanks Nil for pointing this out!

I hope that helps others out there who upgrade to sysbench 0.5 and then wonder where --test=oltp went. I’d love to hear your own sysbench use cases, and whether anyone else is publishing Lua scripts for their own load testing!

The post Using sysbench 0.5 for performing MySQL benchmarks appeared first on MySQL Performance Blog.

Written by Michael Coburn in: benchmarks, Frederic Descamps, Insight for DBAs, Michael Coburn, MySQL, MySQL benchmarks, Nilnandan Joshi, OpenStack, Percona MySQL Consulting, Percona MySQL Support, sysbench 0.5, Zend Developer

Oct 08, 2013

A closer look at Percona Server 5.6

Yesterday we announced the GA release of Percona Server 5.6, the latest release of our enhanced, drop-in replacement for MySQL. Percona Server 5.6 is the best free MySQL alternative for demanding applications. Our third major release, Percona Server 5.6 offers all the improvements found in MySQL 5.6 Community Edition plus scalability, availability, backup, and security features, some of which are found only in MySQL 5.6 Enterprise Edition.

Percona Server 5.6 comes with:

General performance improvements
Extensive diagnostics via TABLE/INDEX/USER STATISTIC and slow query log
Thread Pool plugin
Pluggable Authentication Module plugin
Integration with Percona XtraBackup: “Real” incremental backups and Archive logs backups
Statement timeouts
Fully drop-in compatible with MySQL

General performance improvements
MySQL 5.6 by itself comes with a great list of performance fixes; however, we discovered that their focus was on small datasets that fit into memory, in other words mostly on CPU-bound workloads.

In our research we found that in IO-bound cases there is still room for improvement so we took action:

Ported the good old buffer mutex split from Percona Server 5.5 to Percona Server 5.6. This decreases contention on the buffer pool even further.
Implemented “priority” mutexes and rw-locks in InnoDB, see https://blueprints.launchpad.net/percona-server/+spec/xtradb-priority-mutex for details. Free list refill now gets priority in obtaining mutexes.
Implemented “Thread scheduling.” It is now possible to change the priority of the InnoDB cleaner thread. This helps to stabilize performance, as we found that in IO-heavy workloads the cleaner thread was “starving” and not getting enough CPU time to do its job.
Implemented new waiting algorithms with exponential wait on the access to shared resources.
Additional tunings to Page Cleaner thread.

For performance improvements and testing I would like to give credit to Percona engineers Laurynas Biveinis and Alexey Stroganov.

Diagnostics via TABLE/INDEX/USER STATISTIC and slow query log
Even with MySQL 5.6’s rich PERFORMANCE_SCHEMA information, we decided to keep our diagnostics. Why? Because they are very easy to use. Check Domas’ post on this topic.

Integration with Percona XtraBackup: “Real” incremental backups and Archive logs backups
Percona Server comes with the following backup features:

Changed page tracking, AKA “Real” incremental backups. Using this feature avoids full table scans for incremental backups, because information on changed pages is now available in bitmap files.
Archive Logs. An alternative way to perform incremental backups by copying InnoDB transactional logs and applying them to the backup.

These features are unique to Percona Server, and, in combination with Percona XtraBackup, they allow users to achieve greater flexibility in backup schemas.

Statement timeouts
This feature was ported from Twitter’s fork of MySQL and allows users to control execution time of statements.

You are welcome to review what has changed in Percona Server 5.6 in our summary, compare with previous Percona Server releases, or compare with MySQL 5.6.

How do we do QA of Percona Server
For QA testing I want to give credit to Roel Van De Paar and his Random Query Generator extensions specific to the new Percona Server features. By using RQG together with a new option combinatorics approach (expect a blog post on this soon), we are confident in the quality of Percona Server.
Also, we found, reported and fixed quite a large number of bugs for upstream MySQL.

Performance results
And of course I want to share benchmark results. What kind of performance gain can we expect with all these performance improvements I explained above?

For the tests I used the sysbench OLTP read-write workload with a Pareto distribution. The dataset is 32 tables of 10 million rows each, which totals about 77GB of data. Our interest is an intensive IO-bound workload, so the buffer pool size is 25GB, and we ran the load with 250 user threads.
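
A quick back-of-the-envelope check shows why this setup is IO-bound. The sketch below only uses the figures from the paragraph above (32 tables, 10 million rows each, about 77GB of data, 25GB buffer pool); the function name dataset_summary is hypothetical.

```shell
#!/bin/bash
# Print the total row count and the share of the dataset the buffer pool can hold.
dataset_summary() {
  awk 'BEGIN {
    tables = 32; rows_per_table = 10000000   # 32 tables, 10 million rows each
    data_gb = 77; buffer_pool_gb = 25        # sizes quoted in the post
    printf "total rows: %d\n", tables * rows_per_table
    printf "buffer pool covers %.0f%% of the data\n", 100 * buffer_pool_gb / data_gb
  }'
}

dataset_summary
```

Roughly a third of the data fits in memory, so a large share of page accesses has to go to storage.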

For hardware I used a Cisco UCS 250 server with two Intel(R) Xeon(R) CPU X5670, Ubuntu 12.04.3 LTS as the OS and very high-end PCIe SSD storage (capable of 100,000 IOPS in random 16KB writes).

So let’s compare Percona Server 5.6 and MySQL 5.6 in this workload. The graphs show the timeline of a 30-minute run at 1-second resolution.
Throughput (more is better):

95% Response time (less is better):

We find that Percona Server 5.6 provides 2x better performance (in both throughput and response time) with much less variance.

This is achieved by decreasing internal contention in InnoDB and by prioritizing the page cleaner thread and free list refill.

Now let me share the configuration file for this run:

[mysqld]
innodb_data_file_path=ibdata1:10M:autoextend
innodb_log_files_in_group=2
innodb_log_file_size=2G
innodb_buffer_pool_size=25GB
innodb_lru_scan_depth=4000
innodb_flush_neighbors=0
innodb_log_buffer_size=256M
innodb_io_capacity = 25000
innodb_io_capacity_max= 50000
innodb_flush_log_at_trx_commit = 1
innodb_buffer_pool_instances=15
innodb_file_format = Barracuda
innodb_checksum_algorithm = crc32
innodb_file_per_table = true
innodb_doublewrite=1
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_purge_threads=4
table_open_cache=15000
open_files_limit=15000
max_connections=15000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_change_buffering=all
loose-innodb_sync_array_size=16
sync_binlog=0
query_cache_type=OFF
thread_cache_size=1000
back_log=2000
connect_timeout=15
loose-metadata_locks_hash_instances=256
max_prepared_stmt_count=1048560
loose-performance_schema=0
# --- below is Percona Server Specific ---
innodb_sched_priority_cleaner=39
innodb_log_block_size=4096
innodb_adaptive_hash_index_partitions=65

And some comments on it:

1. Please note that we are using fully durable settings:
innodb_flush_log_at_trx_commit = 1
innodb_doublewrite=1
innodb_checksum_algorithm = crc32
This is different from the results provided by MySQL, where, to get better numbers, they disable data protection.

2. innodb_checksum_algorithm = crc32. The new hardware crc32 checksums actually provide much better performance, and we recommend using them whenever possible. (Please note this affects only newly created databases, not databases created in previous versions of the server.)

3. Percona Server-specific settings:
innodb_sched_priority_cleaner=39 – to give highest priority to page cleaner thread
innodb_log_block_size=4096 – to use 4096 block size for InnoDB logs
innodb_adaptive_hash_index_partitions=65 – to enable partitioning of adaptive hash index, otherwise quite often this is a contention point.
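
When comparing benchmark results from different sources, it helps to confirm that a configuration really uses the durable settings from comment 1. Here is a minimal sketch, assuming a plain my.cnf-style file; the function name check_durable is hypothetical, and it only looks at values set explicitly in the file, not at server defaults.

```shell
#!/bin/bash
# check_durable FILE: succeed only if FILE explicitly sets the three
# durable values named in comment 1.
check_durable() {
  local cnf="$1" want
  for want in \
      innodb_flush_log_at_trx_commit=1 \
      innodb_doublewrite=1 \
      innodb_checksum_algorithm=crc32; do
    # Strip spaces and tabs so that "name = 1" and "name=1" compare equal,
    # then require an exact whole-line match.
    if ! tr -d ' \t' < "$cnf" | grep -qxF "$want"; then
      echo "not durable: expected $want"
      return 1
    fi
  done
  echo "durable settings present"
}

# Possible usage: check_durable /etc/my.cnf
```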

And some variables which are used by default in Percona Server:
innodb_foreground_preflush=exponential_backoff
innodb_empty_free_list_algorithm=backoff
innodb_cleaner_lsn_age_factor=high_checkpoint

As a disclaimer I should mention that I expect the difference between Percona Server 5.6 and MySQL 5.6 performance will grow even wider with a larger dataset, more memory and faster storage.

For a small dataset which fits into memory on low-end servers, Percona Server 5.6 performance will be identical to MySQL 5.6 performance. Credit where credit is due: MySQL did a great job optimizing InnoDB for small datasets in memory.

These characteristics of Percona Server 5.6 are actually very important. With our server you are able to scale your workload with a simple hardware upgrade: by increasing CPU speed, adding memory, or upgrading your storage, you get better performance with Percona Server.

You are welcome to try Percona Server 5.6 yourself and give us your feedback!

What is in the future?
We are not stopping here. Expect new improvements and more features.

More InnoDB performance improvements. What we have done so far is only the tip of the iceberg and has opened the door for further research
TokuDB support. We will ship TokuDB in one of the next Percona Server 5.6 releases
Per query variables. We will add a new scope (in addition to global and session) for MySQL variables: per query. You will be able to change some variables for one specific query only
Percona XtraDB Cluster 5.6, based on Percona Server 5.6 with all of its improvements, will be available in a few months

Should you need any assistance in planning your Percona Server 5.6 upgrade or migration for your company, we’re here to help. Percona has seasoned support and consulting professionals available around the world that are ready for your call. Contact us today.

The post A closer look at Percona Server 5.6 appeared first on MySQL Performance Blog.

Written by Vadim Tkachenko in: benchmarks, drop-in MySQL replacement, MySQL, MySQL 5.6 Community Edition, MySQL benchmarks, percona server, Percona Software, Vadim Tkachenko, Zend Developer

Personal blog of Yzmir Ramirez

The Glass is twice as large as it needs to be
