Maximal write throughput in MySQL

I was recently asked what maximal number of transactions per second we can get with MySQL and the XtraDB / InnoDB storage engine on a high-end server. A good question, though not easy to answer, as it depends on:

– the durability setting ( innodb_flush_log_at_trx_commit = 0 or 1 )
– whether we use binary logs ( I used ROW-based replication for 5.1 )
– whether we use the sync_binlog option.

So why not take these as variable parameters and run a simple benchmark?
I took the sysbench update_key scenario (an update of an indexed field on a simple table)
and used a Dell PowerEdge R900 with 16 cores, FusionIO as storage for the table, and RAID10 with BBU as storage for the InnoDB log files, the InnoDB system tablespace, and the binary logs. I used Percona-XtraDB-5.1.43-9.1 for the benchmarks. All partitions were formatted with XFS and mounted with the nobarrier option.

I ran update_key for various thread counts with the following parameter combinations:

  • trx_commit=0 : innodb_flush_log_at_trx_commit = 0 and no binary logs
  • trx_commit=1 : innodb_flush_log_at_trx_commit = 1 and no binary logs
  • trx_commit=0 & binlog : innodb_flush_log_at_trx_commit = 0 and binary logs
  • trx_commit=1 & binlog : innodb_flush_log_at_trx_commit = 1 and binary logs
  • trx_commit=1 & binlog & sync_bin : innodb_flush_log_at_trx_commit = 1 and binary logs and sync_binlog=1
Here are the results I got:

I found the results quite interesting.
With innodb_flush_log_at_trx_commit = 0 the maximal throughput is 36332.02 tps, which drops to 23115.04 tps as
we switch to innodb_flush_log_at_trx_commit = 1. Since we use RAID10 with BBU, I did not expect the drop to be so significant. In the second case InnoDB has to flush the log to disk on every commit, which partly serializes transactions.

Enabling binary logs drops the result to 17451.01 tps with innodb_flush_log_at_trx_commit = 0 and to 12097.39 tps with innodb_flush_log_at_trx_commit = 1. So with binary logs the serialization gets even worse.

Enabling sync_binlog makes things really bad, and the maximal result I got is
3086.7 tps. So you should decide whether binary log protection is worth such a drop.
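A quick back-of-the-envelope script (using the tps figures quoted above) puts the relative cost of each setting side by side:

```python
# Relative throughput cost of each durability setting, computed from the
# tps numbers quoted above (baseline: trx_commit=0, no binary logs).
baseline = 36332.02

results = {
    "trx_commit=1": 23115.04,
    "trx_commit=0 & binlog": 17451.01,
    "trx_commit=1 & binlog": 12097.39,
    "trx_commit=1 & binlog & sync_bin": 3086.7,
}

for name, tps in results.items():
    drop = (1 - tps / baseline) * 100
    print(f"{name}: {tps:.2f} tps ({drop:.0f}% below baseline)")
```

This makes the pattern clear: each durability or logging feature costs roughly a third of the remaining throughput, and sync_binlog costs most of what is left.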

    UPDATE ( 3/4/2010 )

    Results with innodb_flush_log_at_trx_commit = 2

    Results with innodb_flush_log_at_trx_commit = 2 and binlogs

    Entry posted by Vadim |



MySQL 5.5-m2 scalability

Oracle recently announced the MySQL 5.5.2-m2 milestone, and I have a prediction that MySQL 5.5 will be announced as GA at the MySQL UC 2010 in April.

So let’s take a quick look at the scalability characteristics we should expect from the new release.

I ran sysbench oltp benchmarks on 10 mln rows (about 2.5GB of data) on our Dell R900 system (16 cores, 32GB of RAM, FusionIO + RAID10, though disk does not matter in this case).

Here are the results for various thread counts (results are in tps; more is better):

I think these are good results, and there are a couple of conclusions we can draw:

  • InnoDB can now scale up to 16 threads pretty well
  • InnoDB can keep the load up to 128 threads, and at 256 threads the result is still acceptable, though it drops by 25% compared to the peak at 16 threads

If you are interested in what limits scaling beyond 16 threads in these
benchmarks: it is contention on the rollback segment (a global shared structure in InnoDB).

sysbench command for reference:

./sysbench --test=oltp --oltp-table-size=10000000 --init-rng=1 --num-threads=$i --max-requests=0 --oltp-dist-type=uniform --max-time=180 run
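To sweep across thread counts, a small driver can be sketched as follows; the sysbench binary path and the thread list are my assumptions, not from the post:

```python
# Sketch of a driver that runs the sysbench oltp test for a range of
# thread counts. The binary path and the thread list are assumptions.
import shlex

SYSBENCH = "./sysbench"
THREADS = [1, 2, 4, 8, 16, 32, 64, 128, 256]

def build_command(num_threads: int) -> list[str]:
    """Build the sysbench invocation from the post for one thread count."""
    cmd = (
        f"{SYSBENCH} --test=oltp --oltp-table-size=10000000 --init-rng=1 "
        f"--num-threads={num_threads} --max-requests=0 "
        f"--oltp-dist-type=uniform --max-time=180 run"
    )
    return shlex.split(cmd)

for n in THREADS:
    print(" ".join(build_command(n)))
    # subprocess.run(build_command(n))  # uncomment to actually run sysbench
```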

Entry posted by Vadim |



Why you should ignore MySQL’s key cache hit ratio

I have not caused a fist fight in a while, so it’s time to take off the gloves. I claim that somewhere around of 99% of advice about tuning MySQL’s key cache hit ratio is wrong, even when you hear it from experts. There are two major problems with the key buffer hit ratio, and a host of smaller ones. If you make some assumptions that are very hard to prove, there actually is a very limited use for the statistics from which the ratio is derived (but not the ratio itself, which you should ignore). Read on for the details.

In this article, I will use key buffer, key_buffer, and key cache interchangeably. However, I will be careful about the difference between “rate” and “ratio”. In this article, the key cache miss rate is defined as the number of misses per unit of time, with the units of operations per second. The key cache miss ratio is the ratio between reads from the disk and reads from the cache; it is a dimensionless number because the units are the same in the numerator and denominator, and they cancel out. The key cache hit ratio is simply one minus the miss ratio.

The key_buffer hit ratio

The metrics we’re interested in are all defined in terms of counters you can get from SHOW GLOBAL STATUS. I’ll start out by copying and pasting from the MySQL manual:

  • Key_read_requests

    The number of requests to read a key block from the cache.

  • Key_reads

    The number of physical reads of a key block from disk.

So far, so good. All of the above is mostly factual (more on this later). Here’s another fact from the manual:

The cache miss rate can be calculated as Key_reads/Key_read_requests. [Note: their use of ‘rate’ doesn’t match mine; I would call this the ‘miss ratio’].
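For concreteness, here is that formula in code (the counter values are invented for illustration only):

```python
# The "miss ratio" from the manual: Key_reads / Key_read_requests.
# The counter values here are invented for illustration only.
key_reads = 23
key_read_requests = 100

miss_ratio = key_reads / key_read_requests
hit_ratio = 1 - miss_ratio
print(f"miss ratio: {miss_ratio:.2%}, hit ratio: {hit_ratio:.2%}")
# Note: this number discards both magnitude and time, which is exactly
# the problem discussed below.
```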

The problem occurs when you start to assign any importance to this ratio. The MySQL manual doesn’t fall into this trap, but if you search Google for Key_read_requests, you will find lots of advice on “tuning by ratio,” including phrases such as “The ratio of key_reads to key_read_requests should be as low as possible, no more than 1:100” or “your key cache hit ratio should be very high, ideally above 99.9%” or “if your key cache hit ratio is low, your key_buffer_size is too small.”

So here’s a summary of two pieces of bad advice:

  • Bad advice #1: you should care about the key cache hit ratio.
  • Bad advice #2: you should set your key_buffer_size according to this ratio.

Tuning by ratio is one of those things that is widely accepted because of “proof by repeated assertion,” but as you know, that doesn’t make it valid. Let’s see why the above two things are bad advice.

Problem 1: Ratios don’t show magnitude

Ratios of counters are virtually meaningless for two major reasons. The first is that ratios obscure magnitude. Look at the following and tell me what you think:

  • Server #1 has a key cache miss ratio of 23%
  • Server #2 has a key cache miss ratio of 0.001%

Which server is badly tuned? Maybe you’re objecting that you don’t know enough about the servers. OK, I’ll give you more information. Both servers are real production servers, with powerful hardware and heavy query workloads. Both have a key_buffer_size of 4GB. Now can you tell which server is badly tuned? — No, you can’t tell anything meaningful based on a ratio of counters, because the process of dividing one counter by the other to get the ratio has discarded vital information. You don’t know how many Key_reads and Key_read_requests those servers have done.

Just for fun, consider these fake but entirely possible scenarios: server #1 has 23 Key_reads and 100 Key_read_requests (23% miss ratio). Server #2 has one trillion Key_reads, and one hundred quadrillion Key_read_requests (a 0.001% miss ratio). Given that information, which server is badly tuned? If you said “I still can’t tell,” maybe you want more information, so I’ll tell you that both servers are identically tuned, and they have identical data, hardware, and workload. Even that doesn’t help, though.

Problem 2: Counters don’t measure time

The reason you still can’t tell which server is badly tuned is because, even if you know the absolute numbers, you are missing the element of time, in two important ways. First, you don’t know how long of an interval I used to measure the statistics on those two servers. Maybe I measured the first server immediately after starting it, and that’s why its counters are so small. The second server has been online practically forever, and that’s why its counters are big. Let’s say this is the case. Now, you’ve got all the information you need to form an opinion, right? Instead of asking the same annoying question, let me ask it a different way: is either of these servers badly tuned?

There’s still not enough information — I hope you’re beginning to appreciate that tuning by ratio is a waste of time! The Oracle folks arrived at this conclusion a long time before the MySQL world started to come around. There are even tuning utilities (anti-tuning anti-utilities?) for Oracle, specifically designed to mock and frustrate those who would tune by ratio. They are capable of creating any buffer hit ratio the user desires by running silly queries that do nothing but cause buffer hits, skewing the result towards “this ratio looks great!”

The second kind of time information you’re lacking is how much time each buffer hit or miss takes. If you approach application performance optimization from the standpoint of response time measurements, which you should, you will eventually arrive at this question. “I have a query I know is slow and is a problem for my application. I have profiled it with SHOW STATUS and I know it causes a great many Key_reads to occur. How much of this query’s execution time is consumed by those operations? Should I try to reduce Key_reads?”

There is no way to know. All you get is counters — you don’t get the time elapsed. In technical terms, counters are surrogate measures. They are not helpful. And as Cary Millsap says, the unfortunate problem is that surrogate measures work sometimes, simply because there is sometimes a correlation (but not a cause) relationship between the counter events and the query’s execution time. Alas, that correlation fools us into thinking it’s a cause, and we optimize-by-surrogate-measure a time or two and it appears to work — so we turn into little Pavlovian DBAs and try to do that every time. It would be better if optimizing-by-counter never worked!

A partially valid use of Key_reads

There is a partially valid reason to examine Key_reads, assuming that we care about the number of physical reads that occur, because we know that disks are very slow relative to other parts of the computer. And here’s where I return to what I called “mostly factual” above, because Key_reads actually aren’t physical disk reads at all. If the requested block of data isn’t in the operating system’s cache, then a Key_read is a disk read — but if it is cached, then it’s just a system call. However, let’s make our first hard-to-prove assumption:

  • Hard-to-prove assumption #1: A Key_read might correspond to a physical disk read, maybe.

If we take that assumption as true, then what other reason might we have for caring about Key_reads? This assumption leads to “a cache miss is significantly slower than a cache hit,” which makes sense. If it were just as fast to do a Key_read as a Key_read_request, what use would the key buffer be anyway? Let’s trust MyISAM’s creators on this one, because they designed a cache hit to be faster than a miss.

  • Hard-to-prove assumption #2: A key cache miss is probably slower than a hit, maybe.

What else? Maybe this physical I/O operation is randomly positioned, which is a worst-case scenario for spinning disks. This is also very hard to prove, but seems reasonable based on the structure of a B-tree index, so let’s assume anyway:

  • Hard-to-prove assumption #3: A Key_read might cause a random I/O operation, maybe.

Now, given those assumptions, we can further assume the following:

  • It is good to minimize Key_reads because they are slow and cause random disk I/O.

Notice that we still don’t know anything about any relationship between Key_reads and the execution time of our query. All we can do is guess, like good Pavlovian DBAs, that there is a relationship. However, we can again reason that random I/O can cause collateral damage: if the disk head is seeking all over for random I/O, then other I/O (including non-random I/O) is likely to be impacted. If we manage to reduce Key_reads, we might make the database server faster overall, and perhaps the query of interest will accidentally get faster too, and we’ll get a treat.

There is one interesting question that we haven’t really addressed yet. How bad is bad? This is where we return to the notion of the key cache miss rate in units of operations per second. Given our assumed correlation between a Key_read and a random physical disk I/O, it is partially valid to say that we are going to get in trouble when Key_reads gets close to the number of random I/Os our disk can do. Here’s another formula for you:

Key_cache_miss_rate = Key_reads / Uptime

Note the conspicuous absence of Key_read_requests in the formula. The number of requests is absolutely irrelevant — who cares how often the key is requested? What’s relevant is that our assumed connection between Key_reads and random I/Os means that Key_reads/Uptime is assumed to be the same as “random I/Os per second.”

And now, I would finally like to show you something partially useful you can do with Key_reads:


  [baron@localhost ~]$ mysqladmin ext -ri10 | grep Key_reads
  | Key_reads                         | 6030962       |
  | Key_reads                         | 98            |
  | Key_reads                         | 89            |
  | Key_reads                         | 104           |

This server is doing approximately 100 Key_reads every ten seconds, so we can assume Key_reads are causing about ten random I/Os per second. Compare that to what your disks are capable of, and draw your own conclusions about whether this is a performance problem. I know what I’d like: I’d like to ask the disk itself how much random I/O it’s doing. But alas, that’s virtually impossible on most systems I work on. So there you have it — yet another surrogate measure.
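That per-second estimate is trivial to compute from the interval samples; here is a minimal sketch using the numbers above:

```python
# Convert Key_reads deltas sampled at a fixed interval (mysqladmin ext -ri10)
# into a per-second miss rate. The first mysqladmin line is the absolute
# counter value, so only the relative samples are used here.
interval_seconds = 10
relative_samples = [98, 89, 104]  # the deltas from the output above

rate_per_second = sum(relative_samples) / (len(relative_samples) * interval_seconds)
print(f"~{rate_per_second:.1f} Key_reads/sec, i.e. roughly that many "
      f"assumed random I/Os per second")
```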

How to choose a key_buffer_size

Let’s recap. So far I’ve shown you the fallacy of tuning by ratio, and told you to ignore the ratio and in fact, ignore Key_read_requests altogether. I’ve explained that counters are a surrogate measure, but the fact that they’re easy to get and sometimes correlated with the true problem causes people to mistake counter analysis for a true performance optimization method. I’ve shown that if we make some assumptions that are hard to prove, we can compare Key_reads to the disk’s physical capacity for random I/O and get an idea of whether index I/O might be causing a performance problem.

But I haven’t shown you how to choose an appropriate key_buffer_size. Let’s look at that now.

This topic deserves an entire blog post, because there are many subtleties including the possibility of having multiple key caches. But I’ll give the simple version here. In my opinion, you should choose a key_buffer_size that is large enough to hold your working set — the index blocks that are frequently used. How large is that? This is yet another thing that’s really hard to measure, alas! So we need to either pick a surrogate, or pull a number out of thin air. Here are some suggestions that are about as good as any:

  1. Just set it really big and forget it. If you have enough memory, who cares. The memory isn’t allocated until it’s used — if you set it to 4GB, that doesn’t mean that 4GB is actually used. This is not as much of an abdication of responsibility as it might sound like on a machine that’s dedicated to MyISAM tables.
  2. Consider your mixture of storage engines (some InnoDB, some MyISAM, which is more important to you, etc) and choose an amount of memory based on how important those tables are to you, how big the indexes are on disk, and so on.
  3. Raise the key_buffer_size until, when the buffer is full, Key_reads/Uptime reduces to a number you’re comfortable with.
  4. Set key_buffer_size really big, and then measure its size as it fills up, in something like 1 minute intervals. Pull this into a spreadsheet and graph it as a curve. When the curve’s growth tapers off, pick that point on the curve and use it as a heuristic for how big your working set is. Set the key_buffer_size to that.
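Method 4 can be sketched roughly as follows; the synthetic fill curve and the 1% growth threshold are my assumptions, since MySQL gives you no direct working-set measurement:

```python
# Sketch of method 4: sample the key buffer's used size at fixed intervals,
# then pick the point where growth tapers off as a working-set estimate.
# The fill curve below is synthetic and the 1% threshold is an assumption.

def working_set_estimate(used_bytes_samples, growth_threshold=0.01):
    """Return the first sampled size after which per-interval growth
    drops below growth_threshold of the final size."""
    final = used_bytes_samples[-1]
    for prev, cur in zip(used_bytes_samples, used_bytes_samples[1:]):
        if (cur - prev) < growth_threshold * final:
            return cur
    return final

# Synthetic one-minute samples of key buffer usage, in MB:
samples = [200, 900, 1500, 1900, 2100, 2150, 2160, 2165]
print(f"estimated working set: ~{working_set_estimate(samples)} MB")
```

In practice you would feed this with Key_blocks_used (or buffer-size-minus-unused) samples; the point is only to find the knee of the curve, not a precise number.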

If the above methods shock you with their unscientific-ness, they shouldn’t. The reality is that this server setting is very subjective, and there is no good instrumentation in MySQL to guide your decisions. It is also not the be-all and end-all of MySQL performance, and people frequently obsess over it far out of proportion. But again, 99% of the advice I’ve seen is based on something much worse: a red herring that only sounds scientific and authoritative — the “key cache hit ratio.” This is a shame. When you are new to MySQL, trying to configure my.cnf, and you have heard guidance that seems so definite, mathematical, and authoritative, but still makes no sense, why wouldn’t you obsess over it?

What about InnoDB tuning?

You might be wondering, what about InnoDB tuning? What is the best way to choose an innodb_buffer_pool_size setting? This is a topic that deserves its own article too, but the short version is: ratio-based tuning is just as wrong for InnoDB as it is for MyISAM. Ratio-based tuning is invalid and wrong in general, not just for specific things. All of the above points (loss of magnitude, lack of timing information, etc) apply to all types of ratio-based and counter-based tuning techniques.


Major points in this article:

  • Counter ratios are meaningless as a performance analysis metric because they are a) ratios of b) counters.
    • You need to look at absolute magnitude, not ratios.
    • For performance analysis, you need to measure elapsed time, not just the number of times something happens.
  • In the absence of timing information, and if you trust the cache’s creator and assume that misses are more expensive than hits, then you care about cache misses, not cache hits.
  • A Key_read is not guaranteed to be a random physical I/O, but it might be. Gather Key_reads/Uptime over 10-second or 60-second intervals and compare the results to your IO system’s capabilities.
  • MySQL doesn’t have good instrumentation for scientifically choosing a key_buffer_size setting, but there are many unscientific approaches that are better than ratio-based tuning.
  • Counter ratios suck for everything, not just for MyISAM tuning.

I’ve had some heated arguments over these points, so I don’t expect the above to pass without controversy. But really, it’s time to stop with the bad advice about counter ratios. The sooner we do that, the sooner we can move on to better things.

Entry posted by Baron Schwartz |



Index lock and adaptive search – next two biggest InnoDB problems

Running many benchmarks on fast storage (FusionIO, SSDs) and multi-core CPU systems, I constantly face two contention problems.

So I suspect these are going to be the next biggest issues for InnoDB scaling on high-end systems.

This is also the reason why in the benchmarks I posted previously CPU usage is only about 50%, leaving the other 50% idle.

The first problem is the index->lock mutex.
InnoDB uses a single mutex per index, so when you run mixed read/write queries, InnoDB locks the index for the write operation and thus keeps all selects waiting while the update/insert is done. It is implemented this way because a write operation may cause a B-tree page split, and InnoDB needs to move records between pages to finish the operation. It gets even worse when the write needs additional I/O to bring a page into the buffer_pool.

What could be done internally: there are lock-free and page-level-locking B-tree algorithms, so an operation does not need to block the whole B-tree.
From the end-user point of view, to fight this problem you may need to partition (manually or using 5.1 partitioning) a table with a big index into a couple of smaller tables. It’s ugly, but it can help until the main problem is solved.

The second problem is the adaptive search index.
It appears when you have select queries scanning by a secondary key and write queries at the same time.
InnoDB again uses a single global mutex for the adaptive search (a single mutex for ALL tables and ALL indexes), so a write query blocks ALL select queries.
Usually the first action is to disable the adaptive search (possible via a global variable), but it rarely helps in practice. With the adaptive index disabled, InnoDB needs to perform many more operations reading secondary keys.

How it could be solved internally: I think some hashing algorithms could be applied so that select queries are not locked. We may look at how to implement it.

Until then, InnoDB basically can’t utilize powerful hardware.
For example, even under an IO-intensive load I get the same result
for a single FusionIO card and for two FusionIO cards coupled in RAID0 (which theoretically doubles throughput).

Entry posted by Vadim |



Misleading Innodb message on recovery

As I wrote about 2 years ago, the Innodb feature of storing a copy of the master’s position in the slave’s Innodb tablespace got broken. There is a lot of discussion at the corresponding bug report, while the outcome of the fix remains uncertain to me (the bug is marked as a duplicate, while the bugs it supposedly duplicates describe different issues).
Anyway. A customer came to me today with the following message in the error log after a slave crash, while running MySQL 5.1.41 (an extra-virgin version, without XtraDB or Percona patches):

InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 10000000, file name mysql-bin.005000
InnoDB: Last MySQL binlog file position 0 20000000, file name ./mysql-bin.003000

He had tried to restart replication from mysql-bin.005000 position 10000000, which failed with the “Could not find first log file name in binary log index file” error message.

Looking at the master for this slave, I could see its binary log files going only up to about 2000, so the binary log file mentioned never existed on this master. What is going on?
The thing is, Innodb does not update this information any more; however, if it is stored in the tablespace, the code to print it is still there. This database was running an older MySQL version a while back, which was updated to MySQL 5.1 months ago and moved to new hardware by physical copy; the log file numbers restarted back from 1, but the tablespace still contained the ancient data.

I reported a bug on this, which should be easy to fix. Otherwise it is an easy mistake to make. We also have a patch which restores this information and uses it in slave crash recovery.

Entry posted by peter |




The question of which isolation level is better pops up again and again. Recently it was discussed in “InnoDB: Any real performance improvement when using READ COMMITTED isolation level?” and in “Repeatable read versus read committed for InnoDB”.
Serge in his post explains why READ COMMITTED is better for a TPCC load, so
why don’t we take the tpcc-mysql benchmark and check the results.

I took 3 different datasets: 1000w (100GB of data), 300w (30GB) and 10w (1GB), for a box with 32GB of RAM and a 26GB buffer_pool. The last case, 10w, is interesting as I expect a lot of contention at the row level with a small dataset.
As usual I used the tpcc-mysql benchmark, with 16 and 32 (for 10w) concurrent users.

Also I had the binary log enabled in ROW mode (as READ COMMITTED does not support STATEMENT-based replication in 5.1), and everything was run under XtraDB-9 (based on InnoDB-plugin 1.0.6).

So here are the results:

For 1000w:


Here READ COMMITTED seems preferable; however, the difference is very small.

For 300w:

I would say both modes are even here; there is no winner for me.

For 10w:

Almost identical results in this case.

However, however.
For the 10w run I got 60 deadlock errors (“1213, Deadlock found when trying to get lock; try restarting transaction”) in READ COMMITTED mode, and 0 deadlocks in REPEATABLE READ mode.

I understand that 60 deadlocks out of a total of 2704687 transactions can be ignored, but it seems you have a better chance of getting a deadlock in READ COMMITTED than in REPEATABLE READ.

So both modes look even to me, though there are some facts to consider:

  • READ COMMITTED is used more rarely than REPEATABLE READ (the default), so it is less tested
  • READ COMMITTED does not work with statement-based replication in 5.1
  • with READ COMMITTED you may get more deadlocks

Entry posted by Vadim |



Blob Storage in Innodb

I’ve run into this misconception for the second time in a week or so, so it is time to blog about it.
How are blobs stored in Innodb? This depends on 3 factors: blob size, full row size, and Innodb row format.

But before we look into how BLOBs are really stored, let’s see what the misconception is about. A lot of people seem to think that in the standard (“Antelope”) format the first 768 bytes are stored in the row itself while the rest is stored in external pages, which would make such blobs really bad. I have even seen a solution that stores several smaller blob or varchar fields which are concatenated to get the real data. This is not exactly what happens :)

With the COMPACT and REDUNDANT row formats (used before the Innodb plugin and named “Antelope” in the Innodb Plugin and XtraDB), Innodb tries to fit the whole row onto an Innodb page. At least 2 rows have to fit on each page, plus some page data, which makes the limit about 8000 bytes. If the row fits completely, Innodb stores it on the page and does not use external blob storage pages. For example, a 7KB blob can be stored on the page. However, if the row does not fit on the page, for example because it contains two 7KB blobs, Innodb will have to pick some of them and store them in external blob pages. It will, however, keep at least 768 bytes from each such BLOB on the row page itself. With two 7KB blobs, one blob will be stored on the page completely, while the other will have 768 bytes stored on the row page and the remainder on an external page.

The decision to store the first 768 bytes of the BLOB may look strange, especially as MySQL internally has no optimizations to read portions of a blob: it is either read completely or not at all, so the 768 bytes on the row page are of little use; if the BLOB is accessed, the external page will always have to be read. This decision seems to be rooted in the desire to keep the code simple while implementing initial BLOB support for Innodb: a BLOB can have a prefix index, and it was easier to implement indexing of BLOBs if their prefix is always stored on the row page.

This decision also causes strange data storage “bugs”: you can store a 200K BLOB easily, but you can’t store 20 10K blobs. Why? Because each of them will try to store 768 bytes on the row page itself, and together they will not fit.

Another thing to beware of with Innodb BLOB storage is the fact that external blob pages are not shared among blobs. Each blob, even if only 1 byte of it does not fit on the page, will have its own 16K page allocated. This can be pretty inefficient, so I’d recommend avoiding multiple large blobs per row when possible. A much better decision in many cases is to combine the data in a single large blob (and potentially compress it).
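A back-of-the-envelope model of the rules described above (the roughly 8000-byte on-page limit, the 768-byte prefix kept for each externalized blob, and dedicated 16K external pages per blob) can be sketched as follows; the numbers are approximations from the text, not exact InnoDB constants:

```python
# Back-of-the-envelope model of Antelope (COMPACT/REDUNDANT) blob storage.
# Approximate numbers from the text: ~8000 bytes of row data fit on a page,
# each blob that overflows keeps a 768-byte prefix on the row page, and the
# overflow goes to dedicated 16K external pages (not shared between blobs).
PAGE_ROW_LIMIT = 8000
BLOB_PREFIX = 768
EXT_PAGE = 16 * 1024

def antelope_row_fits(blob_sizes):
    """True if the row fits once overflowing blobs are reduced to prefixes.

    Greedily move the largest blobs off-page (keeping 768-byte prefixes)
    until the row fits or every blob is external.
    """
    on_page = sorted(blob_sizes)
    external = []
    while sum(on_page) + BLOB_PREFIX * len(external) > PAGE_ROW_LIMIT:
        if not on_page:
            return False  # even the 768-byte prefixes alone don't fit
        external.append(on_page.pop())  # externalize the largest blob
    return True

def external_pages(blob_size):
    """16K pages consumed by one externalized blob's overflow."""
    overflow = max(0, blob_size - BLOB_PREFIX)
    return -(-overflow // EXT_PAGE)  # ceiling division

# One 200K blob is fine: 768 bytes stay on the page, the rest is external.
print(antelope_row_fits([200 * 1024]))
# Twenty 10K blobs fail: 20 * 768 = 15360 bytes of prefixes alone > 8000.
print(antelope_row_fits([10 * 1024] * 20))
```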

If not all columns fit on the page completely, Innodb will automatically choose some of them to be stored on the page and some externally. This is not clearly documented, nor can it be hinted or observed. Furthermore, depending on column sizes, it may vary for different rows. I wish Innodb had some way to tune this, allowing me to force actively read columns to be stored inline while pushing others to external storage. Maybe one day we’ll come to implementing this in XtraDB :)

So BLOB storage was not very efficient in the REDUNDANT (MySQL 4.1 and below) and COMPACT (MySQL 5.0 and above) formats, and the fix comes with the Innodb Plugin in the “Barracuda” format with ROW_FORMAT=DYNAMIC. In this format Innodb stores either the whole blob on the row page or only a 20-byte BLOB pointer, giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them. BLOBs can still have prefix indexes, but this no longer requires the column prefix to be stored on the page: you can build prefix indexes on blobs which are often stored outside the page.

The COMPRESSED row format is similar to DYNAMIC when it comes to handling blobs and uses the same strategy of storing BLOBs completely off-page. It will, however, always compress blobs which do not fit on the row page, even if KEY_BLOCK_SIZE is not specified and compression for normal data and index pages is not enabled.

If you’re interested in learning more about the Innodb row format, check out this page in the Innodb docs:

It is worth noting that I use BLOB here as a very general term. From a storage perspective, BLOB and TEXT as well as long VARCHAR columns are handled the same way by Innodb. This is why the Innodb manual calls them “long columns” rather than BLOBs.

Entry posted by peter |



Introducing percona-patches for 5.1

Our patches for 5.0 have attracted significant interest. You can read about SecondLife’s experience here, as well as what Flickr had to say on their blog. The main improvements are performance gains and better diagnostics (such as the improved slow log output and INDEX_STATISTICS).

Despite having many requests to port these patches to 5.1, we simply haven’t had the bandwidth, as our main focus has been on developing XtraDB and XtraBackup. Thankfully, a customer (who prefers to stay unnamed) has stepped up and sponsored the work to port the patches to 5.1.

To refresh, the most interesting patches are:

Two new features which are not available in 5.0:

  • In the slow log, for a stored procedure call you can see profiling for each individual query from the procedure, not just the call storproc() itself
  • With userstat you can get additional THREADS_STATISTICS, which shows information similar to USER/CLIENT_STATISTICS but with per-THREAD granularity (useful if you have a connection pool)

At this stage the patches are available only in source code; you
can get them from Launchpad: https://code.launchpad.net/~percona-dev/percona-patches/5.1.43. Binaries are also on the way and will be ready soon. We are running intensive stress-testing loads on them to provide stable, quality packages.

And to finalize, here are the results for a tpce-like benchmark, where I compare MySQL-5.1.43 vs percona-5.1.43.

The results were made for a TPCE configuration with 2000 customers, 300 trade days, and 16 concurrent users on our R900 server. The dataset is about 25GB, fully fitting into the buffer_pool, so disk does not really matter, but the data was stored on a FusionIO 320GB MLC card.

On the chart with the results I show the number of TradeResult transactions per 10 sec during a 3600-second session (more is better).

As you see, with the percona patches you can get about a 10x improvement.
Yeah, that sounds too cool, so let me explain where the difference comes from.

As I mentioned in the tpce workload details, the load is very SELECT-intensive, and these SELECTs are mainly scans by secondary keys (not primary keys), so it hits problems in the InnoDB rw-lock implementation and in buffer_pool mutex contention, which are already fixed in the percona patches (and in XtraDB and the InnoDB plugin as well).

So you are welcome to try it!

Entry posted by Vadim |



Introducing tpce-like workload for MySQL

We have been using the tpcc-mysql benchmark for a long time, and there are many results published in our blog, but that’s just a single workload. That’s why we are looking into different benchmarks, and one
of them is TPCE. Yasufumi made some efforts to make TPCE work with MySQL, and we are making it available for public consideration.

You can download it from our Launchpad Percona-tools project; it’s
bzr branch lp:~percona-dev/perconatools/tpcemysql

To use this package you should agree with the TPC-E License Agreement,
which in human words says:

  • You can’t name the results “TPC Benchmark Results”
  • You can’t compare the results with results published on http://www.tpc.org/ and you can’t pretend the results are comparable with those published by TPC.

And we are not going to do any of that; our primary goals are XtraDB/InnoDB performance research and/or comparison with the available storage engines for MySQL.

The workload in tpce is quite different from tpcc. Tpcc is write-intensive, while tpce
is read-oriented.
To give more details, here are the stats for 10 seconds:


  | Com_select                            | 46272       |
  | Com_update                            | 5214        |
  | Com_delete                            | 385         |
  | Com_insert                            | 3468        |
  | Com_commit                            | 5404        |
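From those counters (deltas over a 10-second interval) the read/write mix is easy to compute:

```python
# Read/write mix from the 10-second SHOW STATUS deltas quoted above.
com = {
    "Com_select": 46272,
    "Com_update": 5214,
    "Com_delete": 385,
    "Com_insert": 3468,
    "Com_commit": 5404,
}

reads = com["Com_select"]
writes = com["Com_update"] + com["Com_delete"] + com["Com_insert"]
print(f"{reads / writes:.1f} selects per write; "
      f"~{reads / 10:.0f} selects/sec, ~{writes / 10:.0f} writes/sec")
```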

The output is quite chatty:


  |    |     [MEE]    | [DM] |                         [CE]                          |
  sec. |    TR,    MF |   DM |   BV,    CP,    MW,    SD,    TL,    TO,    TS,    TU | MEEThreads, ReqQueue
        (1st line: count, 2nd line: 90%ile response [msec.])
   260 |   402,    39,     0,   195,   532,   749,   588,   342,   415,   816,    88 | 30, 0
            20,    60,     0,    30,    20,    20,    20,    50,    20,   310,    60
   270 |   395,    40,     0,   201,   608,   842,   608,   358,   449,   833,    89 | 30, 0
            30,    40,     0,    30,    20,    20,    20,    50,    20,   300,    50

but it allows you to see the count of 11 different transaction types per 10 secs, plus the 90% response time.

and the final result:


  [TradeResult(TR) transaction]
  Succeed: 150243
  Lated:   0
  Retried: 3
  Failed:  0
  41.7342 TpsE

where you can see the count of successful TR (TradeResult) transactions and
the summary result in TpsE (transactions per second).

Expect our results soon!

Entry posted by Vadim |

