Jan
31
2011
--

MySQL & Friends Meetup @ FOSDEM

Kris Buytaert organized a MySQL Meetup at FOSDEM last year, and because of its success we’ll be doing it again this year, in the same restaurant: Sogno d’Italia.

Everybody is invited to come to the dinner. Just register on http://mysqlandfriendsfosdem2011.eventbrite.com/ so we have an idea of how many reservations we should make. The maximum capacity is about 40 people.

We plan to meet at 19:00 on Saturday, under the big tree in front of the AW building. From there we’ll walk to the restaurant, which is about five minutes away on foot.


Jan
31
2011
--

What is exec_time in binary logs?

If you’ve used MySQL’s mysqlbinlog tool, you’ve probably seen something like “exec_time=0” in the output. What is exec_time? It seems to be the query’s execution time, but it is not.

In most cases, the exec_time will be similar to the query execution time on the server where the query originally ran. If you’re running replication, then on the replication master, the exec_time will look believable. But on replicas, exec_time can be completely different. Suppose you execute an UPDATE that takes one second to run on the master. This gets logged into the binary log and copied to replicas for replay. Now suppose that you have also set log_slave_updates on the replica. When the statement executes in the replication thread, it will be logged to the replica’s binary log so it can be replicated to other servers in turn.

If you use mysqlbinlog to examine the replica’s binary log, you might see a very different exec_time, such as exec_time=1000, even if the statement did not take 1000 seconds to execute. Why is that?

The reason is that the exec_time is the difference between the statement’s original start timestamp and the time at which it completed executing. On the originating server, that’s accurate to within the tolerance of a one-second timestamp. However, replicas might execute the statement much later. In those cases, the end timestamp will be much later than the start timestamp recorded on the originating server, causing the statement to appear to have taken a long time to execute.
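To make the timestamp arithmetic concrete, here is a minimal sketch; the function and the timestamps are illustrative, not MySQL internals:

```python
# Hypothetical illustration (assumed names): exec_time is the completion
# time minus the event's original start timestamp, in whole seconds.
def exec_time(original_start_ts, completed_at_ts):
    return int(completed_at_ts - original_start_ts)

# On the master: an UPDATE starts at t=1000 and finishes at t=1001.
print(exec_time(1000, 1001))   # 1

# A replica replays the same event 1000 seconds later; the start timestamp
# stored in the event is still the master's original one.
print(exec_time(1000, 2001))   # 1001
```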

Jan
28
2011
--

Announcing Ignite Percona – San Francisco, Feb 16th.

We’re excited to announce that, in addition to Percona Live, we will also be holding an evening of Ignite talks on February 16th.

For those not familiar with the format, you should check out the Best of Ignite from the MySQL conference last year, as well as Baron’s What is the Difference Between XtraDB and Others? talk.

Here are the key details:

  • Same location as Percona Live (Bently Reserve, San Francisco).
  • A ticket to Percona Live also gets you into Ignite Percona.
  • Free registration to Ignite Percona is also available (space is limited).
  • We are accepting talk submissions from prospective speakers until February 4th. Be quick!

For more information, please check out the Ignite Percona event page on our website.

Jan
26
2011
--

Modeling InnoDB Scalability on Multi-Core Servers

Mat Keep’s blog post on InnoDB-vs-MyISAM benchmarks that Oracle recently published prompted me to do some mathematical modeling of InnoDB’s scalability as the number of cores in the server increases. Vadim runs lots of benchmarks that measure what happens under increasing concurrency while holding the hardware constant, but not as many with varying numbers of cores, so I decided to use Mat Keep’s data for this. The modeling I performed is Universal Scalability Law modeling, which can predict both software and hardware scalability, depending on how it is used.
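For reference, the Universal Scalability Law predicts throughput at N cores as C(N) = λN / (1 + σ(N-1) + κN(N-1)). Here is a minimal sketch of that formula; the coefficients below are illustrative values I made up, not the ones fitted to Oracle's benchmark data:

```python
def usl_throughput(n, lam, sigma, kappa):
    """Universal Scalability Law: predicted throughput at n cores.
    lam   -- single-core throughput
    sigma -- contention (serialization) coefficient
    kappa -- coherency (crosstalk) coefficient
    """
    return lam * n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))

# Illustrative coefficients only; real values come from fitting the model
# to the measured benchmark points.
for cores in (1, 8, 16, 32, 48):
    print(cores, round(usl_throughput(cores, 1000.0, 0.02, 0.0005), 1))
```

With a nonzero κ the curve eventually peaks and turns down, which is why the model can predict a point of diminishing (or negative) returns as cores are added.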

In brief, the benchmarks are sysbench’s read-only and read-write tests, and the server has two Intel SSDs, 64GB of memory, and 4 x 12-core AMD Opteron 6172 “Magny-Cours” 2.1GHz CPUs. It is a reasonably typical commodity machine except for the high core count, which is more than I can remember seeing in the wild. The database was MySQL 5.5.7-rc. I am not sure why they didn’t run the GA version of MySQL for this benchmark. Maybe they wrote the paper before 5.5 went GA.

The following are plots of the read-only and read-write scalability models that I generated, based on these benchmarks.

Read-Only Results

Read-Write Results

The model predicts that the server will continue to provide more throughput as the core count climbs into the mid-50s, although the bang for the buck isn’t very good at that point. Also, there appears to be some bottleneck that hits more sharply than the model predicts at high core counts. It would be great if the benchmark were re-run with the same core counts and with sysbench on another machine, instead of taking 12 cores away from MySQL and giving them to sysbench. That way we could test with 48 cores and see what happens. My gut feeling is that the results will not be as good as the model predicts at high numbers of cores. But as Neil Gunther says, this wouldn’t mean the model is broken; it would mean that there is potentially something to fix in the server at high core counts. Without the model, there wouldn’t even be a basis for discussion.

The biggest thing I want to point out here is the dramatic improvement over just a few years ago, when you could “upgrade” from 4 to 8 cores and see a reduction in throughput. Oracle (and Percona, and lots of others) have done great work in the last couple of years making InnoDB scale and perform better on modern hardware.

Jan
25
2011
--

Percona Live: Venu Anuganti; early bird registration expires tomorrow

Today we are announcing Venu Anuganti as a guest speaker to Percona Live: San Francisco.

Venu works as a data architect, evaluating and implementing high-performance SQL/NoSQL data store solutions for large-scale OLTP and OLAP infrastructure. He previously worked as a database kernel engineer at companies including SolidDB, MySQL, ANTs Data Server, and Yahoo!

Venu will be presenting on the “Role of MySQL in Analytics, Warehouse and Large Data At Scale”, where he will share some of his latest research. We have Venu to thank for the SHOW TEMPORARY TABLES feature, which has made its way into Percona Server, and for detailed investigative blog posts such as How read_buffer_size Impacts Write Buffering and Write Performance.

In addition to this announcement, we have a warning: the $50 early bird ticket registration expires tomorrow! After that, pricing goes up to $125 per attendee.

Jan
16
2011
--

Blog Redesign 2011

After more than four years of using a theme that can charitably be described as rustic, we’ve updated our look. We’re using the News theme from StudioPress, which is a bargain; their themes and theme framework are an engineering marvel. We’re still using the amazing Sphinx search plugin from Ivinco for fast and relevant searches. For the archives, here are before and after screenshots.

MySQL Performance Blog 2006 and 2011

Jan
13
2011
--

Different flavors of InnoDB flushing

In my recent benchmarks, such as this one on the Virident TachIon card, I used different values for innodb_buffer_pool_size (13GB, 52GB, and 144GB) while testing a 100GB tpcc-mysql database. This was needed in order to test different memory-to-dataset-size ratios. But why is this ratio important, and how does it affect how InnoDB works internally? Let me show some details.

Internally, InnoDB uses two lists for flushing (writing pages from the InnoDB buffer pool memory to disk): the LRU list and the flush list. You can see these lists tracked in SHOW ENGINE INNODB STATUS:

...
Pending writes: LRU 0, flush list 129, single page 0
...

It is important to understand which list is being used for flushing, because that determines which InnoDB tuning settings apply.

  • The LRU list is used when InnoDB tries to read data from disk, but there are no free pages. In this case, InnoDB has to flush some data to be able to perform the read. For this, InnoDB uses the LRU list and flushes the least recently used page (this is what LRU stands for).
  • The flush list is used when the percentage of dirty pages reaches innodb_max_dirty_pages_pct, or when flushing is needed for checkpoint activity.
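As a quick way to watch these counters, here is a small sketch that pulls the pending-write figures for both lists out of SHOW ENGINE INNODB STATUS text (the regex assumes the line format shown above):

```python
import re

# Extract per-list pending write counters from SHOW ENGINE INNODB STATUS
# output. Returns None if the expected line is not present.
def pending_writes(status_text):
    m = re.search(r"Pending writes: LRU (\d+), flush list (\d+)", status_text)
    if m is None:
        return None
    return {"lru": int(m.group(1)), "flush_list": int(m.group(2))}

sample = "Pending writes: LRU 0, flush list 129, single page 0"
print(pending_writes(sample))  # {'lru': 0, 'flush_list': 129}
```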

What is important about this? If your workload is mostly bound by the LRU list, then innodb_adaptive_flushing, innodb_max_dirty_pages_pct, innodb_io_capacity, and innodb_log_file_size do not matter, and changing these settings won’t help.

Ironically, these settings are most important for tuning MySQL 5.5 to get stable performance (as you may see from Dimitri’s post, http://dimitrik.free.fr/blog/archives/2010/12/mysql-performance-analyzing-perconas-tpcclike-workload-on-mysql-55.html). But you can’t really detect what kind of flushing is used. There is a variable in SHOW STATUS, innodb_buffer_pool_pages_flushed, but it shows the sum of flushing from both the LRU list and the flush list.
In Percona Server 5.5, I added the SHOW STATUS variable innodb_buffer_pool_pages_LRU_flushed, which shows only pages flushed by the LRU list.

So, let’s see what flushing we have using 13GB and 52GB for innodb_buffer_pool_size settings in Percona Server 5.5, using the Cisco UCS C250 server and the Virident tachIOn PCI-E Flash card.

As you can see, in the 13GB case (where the memory-to-data ratio is about 1/10), all flushing comes from the LRU list, while with 52GB, the LRU list barely comes into play.

Here are also graphs of reads and writes per second for both cases:

(BTW, I am really impressed at what a stable level of reads and writes per second the Virident tachIOn card provides.)

And here is a graph with the percentage of dirty pages for each case:

As you can see, with 13GB we stay under the innodb_max_dirty_pages_pct=75 limit, while with 52GB we reach the 75% dirty-page maximum.

How can you detect in regular MySQL that your flushing comes from the LRU list? You could periodically check SHOW ENGINE INNODB STATUS, and see something like

Pending writes: LRU N, flush list 0, single page 0

where N>0, and flush list is 0.

A second way to detect it is when the number of dirty pages is less than innodb_max_dirty_pages_pct and checkpoint age is less than 3/4 of innodb_log_file_size * innodb_log_files_in_group. If you have these conditions and observe intensive writes, InnoDB is using LRU list flushing.
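That second detection method can be sketched as a simple check. The variable names here are mine; the inputs would come from SHOW GLOBAL STATUS/VARIABLES and the checkpoint age reported in SHOW ENGINE INNODB STATUS:

```python
def looks_lru_bound(dirty_pct, max_dirty_pct, checkpoint_age,
                    log_file_size, log_files_in_group):
    """Heuristic from the text: if dirty pages are below
    innodb_max_dirty_pages_pct AND checkpoint age is below 3/4 of the
    total redo log capacity, then intensive writes point to LRU-list
    flushing rather than flush-list flushing."""
    log_capacity = log_file_size * log_files_in_group
    return dirty_pct < max_dirty_pct and checkpoint_age < 0.75 * log_capacity

# Example: 40% dirty pages, 1GB checkpoint age, 2 x 2GB redo logs.
print(looks_lru_bound(40, 75, 1 << 30, 2 << 30, 2))  # True
```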

So why is it important to know what the mix of InnoDB flushing types is in your workload? There are a couple of reasons:

  • It defines a different I/O pattern on your I/O subsystem. Also, it seems different file systems behave differently. You will see this in my upcoming posts.
  • It defines your MySQL InnoDB tuning. Now I am about to make a strong statement: If your workload is LRU list bound, then InnoDB and MySQL settings (mentioned above) do not matter a lot, and performance is mostly defined by available memory and the I/O subsystem. Also, it seems MySQL 5.1 with the InnoDB-plugin, MySQL 5.5, and Percona Server show about the same performance in this case, and it does not matter what version you pick from a performance standpoint. If you are LRU list bound, the strategy which will give the biggest benefit is: Install more memory or upgrade the I/O subsystem.
  • If you are flush list bound, then for MySQL 5.1/InnoDB-plugin or MySQL 5.5 the setting of innodb_adaptive_flushing, innodb_max_dirty_pages_pct, innodb_io_capacity, and innodb_log_file_size are important for tuning. Also, upgrading to Percona Server, with a more advanced flushing algorithm, may give a good performance gain. If you already use Percona Server, then increasing innodb_log_file_size may help, but check checkpoint age history to see if that is really needed. Of course, more memory and/or a better I/O subsystem should also be helpful.

(Post edited by Fred Linhoss)

Jan
12
2011
--

Innodb undo segment size and transaction isolation

You might know that if you have long-running transactions, you risk accumulating a lot of “garbage” in the undo segment, which can cause performance degradation as well as increased disk space usage. Long transactions can also be bad for other reasons, such as holding row-level locks that prevent other transactions from executing; this, however, only applies to write transactions, while the undo space growth problem affects read-only transactions as well.

This is what a long-running transaction looks like:

---TRANSACTION 17402D749, ACTIVE 15867 sec, process no 19349, OS thread id 1630148928
MySQL thread id 188790, query id 14796224615 host.domain.com 127.0.0.1 root

This one had been running for over 4 hours, so InnoDB could have been preventing purge from happening for a long time, and there could be a lot of garbage in the undo segment. If you’re running Percona Server, it is easy to check:

mysql> select * from information_schema.innodb_rseg;
+---------+----------+----------+---------+------------+-----------+
| rseg_id | space_id | zip_size | page_no | max_size   | curr_size |
+---------+----------+----------+---------+------------+-----------+
|       0 |        0 |        0 |       6 | 4294967294 |    134528 |
+---------+----------+----------+---------+------------+-----------+
1 row in set (0.00 sec)

The curr_size column gives the current size in pages, so we’re looking at about 2GB of waste in this case.
If you’re not running Percona Server, you can check the history size in SHOW INNODB STATUS:

History list length 4000567

Bear in mind, however, that this size is measured in transactions, and so is very workload-dependent. You may have a lot of waste even with a small history size if the transactions made a large number of changes.
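To put the curr_size figure above in perspective, here is the pages-to-bytes arithmetic, assuming the default 16KB InnoDB page size:

```python
# Convert the rollback segment size from pages to bytes, assuming the
# default 16KB InnoDB page size.
PAGE_SIZE = 16 * 1024

curr_size_pages = 134528  # from the INNODB_RSEG output above
undo_bytes = curr_size_pages * PAGE_SIZE
print(round(undo_bytes / 2**30, 2))  # about 2.05 GB
```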

There is something else you need to know: not all changes are created equal in their impact on undo space usage. Inserts need only a small record in undo space, so the row can be deleted on rollback. Updates and deletes, however, need to put the old row version in undo space and so require more space, depending on row size. Long blobs (the ones not stored on the page) may require less space than you think: only the portion of the row stored on the data page needs to be placed in undo space. The blob itself does not need to be copied, because if it is updated, the new blob version is placed in a new location. If a rollback is performed, just the on-page part of the row needs to be recovered from undo space, and it will contain the correct pointer to the appropriate blob version.

Now let’s move on to the main topic: how does transaction isolation affect purging? InnoDB is smart and tries to purge data as soon as possible. In the default (REPEATABLE-READ) mode, InnoDB has to maintain row versions going back to the first read the oldest transaction performed, because that transaction may require the state of any row in any InnoDB table as of that point in time. If you use READ-COMMITTED mode, InnoDB does NOT need to maintain row versions going back to the first data access in the transaction; it can purge the data as soon as each statement completes. Note that long-running queries will cause problems in this case anyway: even if you set transaction isolation to the lowest level, READ-UNCOMMITTED, InnoDB will not purge rows from the undo space until the statement/query completes.

As a result, if you’re dealing with long-running transactions consisting of many queries, setting the isolation mode to READ-COMMITTED can be a good idea if your application supports it. Note that you can set the isolation mode at transaction start, so you do not have to pick just one for everything. InnoDB is smart enough to figure out how far it can purge the data, which you can even see in SHOW INNODB STATUS:

Trx read view will not see trx with id >= F58464, sees < F58462

But whatever isolation mode you’re using, beware of long-running selects, as they prevent purge activity in every isolation mode.

Jan
12
2011
--

Conflict Avoidance with auto_increment_increment and auto_increment_offset

A lot of people run MySQL Master-Master replication pairs in Active-Passive mode for high availability, using MMM or other solutions. Such solutions generally have one major problem: you have to be very careful when switching writes. If you do not do it atomically (for example, some scripts continue to write to the old master), or if there was slave lag during the switch, replication can stop with an error or silently become inconsistent. You can pick which you dislike most.

Some people will tell you that if you just use auto_increment_increment and auto_increment_offset you will not have any conflicts. I would tell you they are wrong, but there is still a good reason you may want to consider using these options.

Let me start by saying these options do help avoid some conflicts: if you insert into a table with an auto_increment column, you will almost certainly get a duplicate key error in the replication thread if you happen to insert data on both masters at once. This can be solved with auto_increment_increment and auto_increment_offset. However, a lot of other things can’t be: updates and deletes to the data, as well as inserts into tables with unique keys or non-auto-increment primary keys. The problem is that a duplicate key error is one of the few errors statement-based replication will raise; deletes and updates resulting in different data will just happen silently in most cases.
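The interleaving these two settings produce can be sketched as follows (a toy model of value assignment, not MySQL code):

```python
# Toy model of the auto_increment values each master hands out with
# auto_increment_increment=2 and auto_increment_offset of 1 and 2.
def generated_ids(offset, increment, count):
    return [offset + i * increment for i in range(count)]

master1 = generated_ids(1, 2, 5)  # [1, 3, 5, 7, 9]
master2 = generated_ids(2, 2, 5)  # [2, 4, 6, 8, 10]

# The two sequences never collide, so concurrent inserts on both masters
# cannot produce duplicate auto_increment keys.
assert not set(master1) & set(master2)
```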

If we use row-based replication, it can be configured to be more or less picky. In the default slave_exec_mode=STRICT mode, replication will stop if the row to be updated or deleted is not found, in addition to stopping on duplicate key errors. If you set slave_exec_mode=IDEMPOTENT, both duplicate key errors and row-not-found errors will be ignored. Strict mode with row-level replication still will not detect all conflicts: it is possible, for example, for the same row to receive different updates on master and slave, and replication will silently get out of sync.

So, back to our topic of auto_increment_increment and auto_increment_offset: besides helping avoid some conflicts, they can be helpful in manually resolving inconsistencies later on, as they allow you to identify which server inserted the data in question (and thus which server was the master, at least for that application, at that time). Sure, you can find the same information in the binary logs, but having it in the database too speeds up the process in many cases.

Jan
12
2011
--

Percona Live Keynote Speaker: Mark Callaghan

Mark Callaghan has graciously agreed to be the closing keynote speaker for Percona Live: San Francisco!

Mark is best known for his work behind MySQL @ Facebook, where he and his team maintain one of the largest MySQL installations around.  They also contribute back to the community with a publicly available branch of enhancements, improved diagnostic tools, and bug reports which help make MySQL better.

Mark’s keynote will be on “High-value Transaction Processing”.  I assure you, this is a presentation not to be missed.