May
06
2012
--

Load management Techniques for MySQL

One of the very frequent cases with performance problems with MySQL is what they happen every so often or certain times. Investigating them we find out what the cause is some batch jobs, reports and other non response time critical activities are overloading the system causing user experience to degrade.

The first thing you need to know it is not MySQL problem, might be even not problem with your MySQL configuration, queries and hardware, even though fixing these does help in many cases. Whatever powerful and well tuned system you have if you put too heavy of concurrent load on it the response times will increase and user experience will suffer.

So what you can do to prevent this problem from happening ? The answer is easy. Throttle the side load so it does not consume too much system resources. Here are some specific techniques to use.

Do push concurrency too high Many developers will test script with multiple level of concurrency and find out doing work from 32 processes is faster than just having one process. This is true if you have system completely at your disposal. If you however need system to serve other users too you typically need to reduce concurrency to where it does not overload the system. Unless it is really time critical process I would not use more than 4 parallel processes heavily writing to database.

Introduce Throttling Sometimes even single process overloads system too much in this case throttling by having relatively short queries and introducing “sleeps” between them can be a good idea. It also often helps with monopolizing replication thread. For example if I need to delete old data instead of DELETE FROM TBL WHERE ts<"2010-01-01" I’ll do “DELETE FROM TBL WHERE TS<"2010-01-01" LIMIT 1000 in the loop until no more rows need to be deleted. When I may inject “sleep” between iterations which to be as long as query execution – so the longer queries run (and the more system is loaded) the more “rest” it will get. Alternatively you can look at “threads_running” variable which is very good simple identifier of the current load and sleep based on its value – for example you may want chose to pause the script at all if the load is too high and wait for threads_running to go below certain value.

Tuning Cron It also often helps to look into your cron or other scheduling system you’re using. Frequently way too many scripts can be started at once, or very close to each other so they start to overlap and so producing the overload. Solutions could be spacing them out, introducing some “job control” to ensure scripts do not run in parallel if they should not (and especially you do not get many copies of same script running at once). One simple solution is instead of having bunch of scripts scheduled at midnight, 1AM, 2AM to start I can put them into nightly.sh one after another and schedule that to run at midnight – this way I get scripts ran one after another at their own pace.

Dedicated Slave I remember listening to Cary Millsap’s talk once and he recommended moving the load in time and space as optimization technique. We spoke about moving load in time before, but we also can move in space – putting it on the different system, which in MySQL space is most commonly dedicated slave. In a lot of environments especially with low level of operational/development discipline to enforce previous solutions it can be a life saver. Of course it only works for read jobs which is important limitation. Getting slave(s) for batch jobs also can help in other ways too – such as competition for buffer pool between different kinds of workloads is reduced.

innodb_old_blocks_time Surprisingly simple but effective, setting innodb_old_blocks_time=1000 can often be very helpful in avoiding batch jobs washing away buffer pool contents and so making normal user queries a lot more disk bound and slower. I wrote about it in more details few months ago.

Finally lets touch upon discovery question. To deal with load management you need to understand whenever the problem is happening in your environment (we want to catch it before users complain right?) and if it does what jobs exactly cause the overload. In complex environments it might be harder question than it looks. pt-stalk is a great tool for this purpose. Getting it running can help you to collect the state of your system when it was overloaded with side load (as well as performing poorly for other reasons). Analyzing wealth of data it generate will most likely contain answers you’re looking for.

May
04
2012
--

Testing Virident FlashMAX 1400

I still continue to run benchmarks of different SSD cards. This time I show numbers for Virident FlashMAX 1400. This is a MLC PCIe SSD device. There are couple notes on these results.
First, this time I use a different server. For this benchmark it is Cisco UCS C250, while for previous results I used HP ProLiant DL380 G6.

Second note is, that I use a mode “turbo=1″ for Virident card. What does that mean? Apparently PCIe specification has a limitation on available power. If I am not mistaken it is 25W, however Virident to provide full write performance requires 28W. And while many servers can handle 28W on PCIe, this is a non-standard mode, and Virident by default uses 25W (turbo=0). To force full power, I load a driver with turbo=1. I also use “maxperformance” formatting for Virident, which gives less capacity (1.2TB visible for user), but reserves internally more space to provide better write performance.

So as usually I start with random writes, async.

Soon after initial period, the result stabilizes at 550 MiB/sec level.

Random read, async:

Random read throughput is very close to perfect line, and it is 1450 MiB/sec.
This is best read throughput I’ve seen so far in my benchmarks.

To see distribution of response time, the results for random read synchronous IO.

There we can see that 1450 MiB/sec is not quite achievable in sync mode, and only 64 threads are getting close.

Response time:

In the conclusion, from all tested cards, Virident FlashMAX shows the most stable results and the best absolute performance so far.

For reference, other results in series:


May
03
2012
--

Testing Fusion-io ioDrive

Following my series of posts on testing different SSD, in my last post I mentioned that SATA SSD performance is getting closer to PCIe cards. It really makes sense to test it under MySQL workload, but before getting to that, let me review the same workload on Fusion-io ioDrive PCIe card. This is yet previous generation of Fusion-io cards, but this is the one that has biggest installation base.

Driver information: Fusion-io driver version: 2.3.10 build 110; Firmware v5.0.7, rev 107053

Following the format of previous benchmarks, first is random write async 16KB case.

We can see some wave-like pattern with throughput 350-400 MiB/sec.

Random reads, async:

Interesting to see that there is quite unstable throughput in range 450-500 MiB/sec.
This is not usual for read-only workload to have such variety in throughput.

It gets even more interesting when we go to read sync IO, with 1-64 threads.
Throughput:

Response time:

For same cases (i.e. 4 threads) we see some interesting patterns.
As for response time, actually it does not seem much better than for Intel 520.
For 8 threads it is 0.6 ms ( for Intel 520 – 0.69 ms).

To better understand patterns in the read synchronous case, let me unfold results and show them in timeline (from 0 to 1800 sec):

I am not sure how to explain it, that with 4 and 8 threads the pattern is less stable than with 32 threads.

It is curios that I published results already for bunch of cards:

and each card shows individuals patterns and different handling of write and read IO cases.


May
01
2012
--

Testing Intel SSD 520

Following my previous benchmarks of SATA SSD cards I got Intel SSD 520 240GB into my hands. In this post I show the results of raw IO performance of this card.

The benchmark methodology I described in previous posts, so let me jump directly to results.

First case is random write asynchronous 8 threads IO, the test is done just after a secure erase operation on the card.

The card is doing stable 380 MiB/sec level, but after around 4000 sec, as garbage collector kicks in, we see a performance drop to around 300 MiB/sec with some instability, which I will research in later charts.

Now, random reads, still asynchronous

It gives almost stable 370 MiB/sec throughput, with some strange small periodic drops.

To better understand response time ranges, we need to switch to synchronous IO and vary amount of threads.

Throughput:

And response times:

We still see small hiccups in throughput and response times even for small amount of threads.
For 8 threads the 95% response time is 0.69ms.

Now let me get back to random write case. I will try synchronous IO varying amount of threads and with measurements every 1 sec to see how bad are drops.

So there is more or less stable performance only for 1 thread. For 2 or more, the throughput varies a lot from second to second. I draw boxplots, which show 25-50-75 percentiles. So there is no grow in throughput after 2 threads, and the result averages at 300 MiB/sec.

I am still interesting in asynchronous IO, as MySQL 5.5 uses async IO for writes. Maybe 8 threads in the first graph is too much and we should go with 1 thread?

So even with 1 async write thread the throughput jumps a lot in range 200 – 400 MiB/sec.

As conclusion, I should say that 300 MiB/sec level for random reads and writes is very decent result for SATA card. I think with this performance SATA is getting closer to level of PCIe cards. Of course PCIe still provides better numbers, but the question is how much MySQL can use. In his keynote Mark Callaghan mentioned that Fusion-io cards they use are highly underutilized.

With the performance variance we see it is a good question how does it affect MySQL performance, and I am going to run some MySQL workloads on these cards to understand it better.

If you are interested more in SSD and MySQL questions – I will be giving a webinary “MySQL and SSD” on May-9. It will be the same as my talk on Percona Live MySQL Conference 2012, if you did not attend my talk – you are welcome to join the webinar.


Apr
26
2012
--

Percona Server 5.1.62-13.3 released!

Percona is glad to announce the release of Percona Server 5.1.62-13.3 on April 25, 2012 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.1.62, including all the bug fixes in it, Percona Server 5.1.62-13.3 is now the current stable release in the 5.1 series. All of Percona‘s software is open-source and free, all the details of the release can be found in the 5.1.62-13.3 milestone at Launchpad.

New option rewrite-db has been added to the mysqlbinlog utility that allows the changing names of the used databases in both Row-Based and Statement-Based replication. This was possible before by using tools like grep, awk and sed but only for SBR, because with RBR database name is encoded within the BINLOG ‘….’ statement.

Release notes for Percona Server 5.1.62-13.3 are available in our online documentation.

Apr
26
2012
--

Percona Server 5.5.22-25.2 released!

Percona is glad to announce the release of Percona Server 5.5.22-25.2 on April 25, 2012 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.5.22, including all the bug fixes in it, Percona Server 5.5.22-25.2 is now the current stable release in the 5.5 series. All of Percona‘s software is open-source and free, all the details of the release can be found in the 5.5.22-25.1 milestone at Launchpad.

Bugs Fixed:

  • While running the test case found that MEMORY engine may return rows in non-deterministic order for equal keys. Bug fixed #892951 (Laurynas Biveinis).

Release notes for Percona Server 5.5.22-25.2 are available in our online documentation.

Apr
25
2012
--

Testing STEC SSD MACH16 200GB SLC

Following my previous benchmark of Samsung 830, today I want to show results for STEC MACH16 SATA card, 200GB size, this card is based on SLC, and regarding STEC website, it is an enterprise grade storage.

For tests I use sysbench fileio, 16KiB block size (to match workload from InnoDB, as this is primary usage for me), and recently I switched to use async IO mode. There are two reasons for that. First, MySQL/InnoDB uses async writes, so this will emulate database load, and second, async mode allows to see maximal possible throughput, it does not show reliable latency though, as it appears there is no a reliable way in the Linux asynchronous IO library to get time metrics for particular IO block.

so my testing command line looks like:

sysbench --test=fileio --file-total-size=${size}G --file-test-mode=rndwr --max-time=18000 --max-requests=0 --num-threads=$numthreads --rand-init=on --file-num=64 --file-io-mode=async --file-extra-flags=direct --file-fsync-freq=0 --file-block-size=16384 --report-interval=10 run

You may see I gather metrics every 10 sec to see how stable the performance is, and it really helps to observe some artifacts, as you will see in following graphs.

Hardware for tests: HP ProLiant DL380 G6, filesystem: ext4, mounted with nobarrier.

The results for random write case (8 async IO threads):

In general it shows stable throughput topping to 148 MiB/sec, but every 20 min, there is small drop to 87 MiB/sec, which I guess is related to internal garbage collector activity.

The results for random read case:

Very stable throughput on line 222 MiB/sec

To understand better what kind of response time we should expect, I ran random read sync IO mode, now for 1-64 threads.

The throughput:

We are getting to the peak throughput at 8 threads.

And response time:

For 8 threads, we may expect 0.62ms response time.

In general I have very good experience with this card, and it seems suitable to work with MySQL. I will publish sysbench oltp benchmarks running MySQL on RAID10 over 4 STEC MACH16 cards.

If you are interested more in SSD and MySQL questions – I will be giving a webinary “MySQL and SSD” on May-9. It will be the same as my talk on Percona Live MySQL Conference 2012, if you did not attend my talk – you are welcome to join the webinar.

Disclaimer: This benchmark is done as part of consulting work for STEC, but this post is totally independent and fully reflects our opinion.


Apr
25
2012
--

Testing Samsung SSD SATA 256GB 830 – not all SSD created equal

I personally like PCIe based Flash, but from a pricing point our customers are looking for cheaper alternatives. SATA SSD is an options. There is many products based on MLC technology, and Intel 320 I would say is the most popular. I do not particularly like its write performance – I wrote about it before, that’s why I am looking for comparable alternatives. Samsung 830 256GB looked like a good product, that’s why I decided to test it.

For tests I use sysbench fileio, 16KiB block size (to match workload from InnoDB, as this is primary usage for me), and recently I switched to use async IO mode. There are two reasons for that. First, MySQL/InnoDB uses async writes, so this will emulate database load, and second, async mode allows to see maximal possible throughput, it does not show reliable latency though, as it appears there is no a reliable way in the Linux asynchronous IO library to get time metrics for particular IO block.

so my testing command line looks like:

sysbench --test=fileio --file-total-size=${size}G --file-test-mode=rndwr --max-time=18000 --max-requests=0 --num-threads=$numthreads --rand-init=on --file-num=64 --file-io-mode=async --file-extra-flags=direct --file-fsync-freq=0 --file-block-size=16384 --report-interval=10 run

You may see I gather metrics every 10 sec to see how stable the performance is, and it really helps to observe some artifacts, as you will see in following graphs.

Hardware for tests: HP ProLiant DL380 G6, filesystem: ext4, mounted with nobarrier.

The results for random write case (8 async IO threads):

It seems that InnoDB is not alone with its flashing problems. You can see there periodical stalls in throughput (0 throughput for 20-30 sec period of time). When there is no drops, the drive keep write throughput on 323 MiB/sec level.

I really thought that these stalls are related, so I was totally surprised them in random reads also.
The results for random read case:

I do not have a good explanation for this. When there is no drop, the drive keeps 375 MiB/sec throughput. I may do a wild guess about drops – the drive periodically cleans an internal cache or something.

To understand better what kind of response time we should expect, I ran random read sync IO mode, now for 1-64 threads.

The throughput:

We are getting to the peak throughput at 16-32 threads.

And response time:

For 16 threads, we may expect 0.96ms response time, which increases to 1.62ms under 32 threads.

The periodic drops that I observe for both random reads and random writes do not allow me to recommend this drive for a database server usage, even in general this drive provides much better throughput than Intel 320 (some results for Intel 320).

If you are interested more in SSD and MySQL questions – I will be giving a webinary “MySQL and SSD” on May-9. It will be the same as my talk on Percona Live MySQL Conference 2012, if you did not attend my talk – you are welcome to join the webinar.


Apr
20
2012
--

Joining many tables in MySQL – optimizer_search_depth

Working on customer case today I ran into interesting problem – query joining about 20 tables (thank you ORM by joining all tables connected with foreign keys just in case) which would take 5 seconds even though in the read less than 1000 rows and doing it completely in memory. The plan optimizer picked was very good one, yet you could notice EXPLAIN itself was taking same 5 seconds, which points to problem with optimizer performance. Note though if you have subqueries these might need to be executed during EXPLAIN phase yet making it unusable to check the optimizer performance.

Solution for this problem was to use set optimizer_search_depth=0, rarely used option which as per manual will chose best value automatically. Making this change I could bring optimization, and full query execution time to less than 50ms. Low values, such as 3,4 provided a bit better performance but I decided against using this as I did not want to risk likehood of execution plans changing for some over queries joining less number of tables.

I was wondering if 0 is automatic selection why do we have value of 62 being default in MySQL 5.5 which can produce very expensive plan selections ? Investigating this further I found the following explanation from Timour Katchaounov in MySQL mailing list archives

I have some recollection that there were few main reasons for the
decision to keep exhaustive search as the default:
- backwards compatibility,
- the hypothesis that most users have joins with few tables,
- it is not clear how far from optimal plans do we get by using a greedy
search.

From the same discussion we can learn how automatic selection works – it picks value of min(number of tables, 7) essentially limiting search depth to no more than 7 at which complexity is reasonable. This makes Timour explanation somewhat conflicting though as if we assume MySQL users do not join lots of tables (less than 7) when using 0 as default value would not impact them.
For people who have more than 7 tables in join I think faster execution plan computation would be more important than backward compatibility.

In MySQL 5.6 things are likely to get even better handling joins of many tables as optimizer heuristics are improved so much higher search depths are feasible now.

Apr
20
2012
--

Benchmarks challenges of XtraDB Cluster

We are running internally a lot of benchmarks on our recently announced Percona XtraDB Cluster, and I am going to publish these results soon.
But before that I wanted to mention that proper benchmark of distributed system comes with a lot of challenges.
I am saying that not to complain, but to make sure, if you are going to benchmark XtraDB Cluster yourself, there is a lot of things to take into account.

And it seems that one component, which was not much important before, now appears as critical peace, which easily can became bottleneck in the benchmarks – this is network.

In case of simple client-server setup, the network is not fully utilized.

But as we start testing a cluster setup, the 1Gb network between client and switch is getting fully utilized by sysbench communication with 3 nodes.

In this setup it does not make sense to increase number of nodes, as we will not be able to load them properly.

The solution would be to increase network capacity or add additional client boxes.

Now take into account that there is an internal network communication between nodes also, and that makes a network tuning as the critical part of a cluster setup. This is not something we paid much attention before.

The main conclusion of this post is that if you are going to benchmark a Percona XtraDB Cluster or just use it under intensive communication workload, pay an attention to network component. It is very easy that a client or a client network becomes bottleneck.


Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com