Give me 8 hours, and I’ll help you build a better application

I have run into a number of cases recently that all had a similar look and feel. In most of these cases, the symptoms were very complicated, but they boiled down to just a few problems that can be prevented in very simple ways.

If you are not doing any of the following three simple things, you probably should be. These are essential practices for building successful applications, and the best part is how easy they are to adopt. Read on for the details.

1. Archive or Purge Your Database

Data archiving is crucial. There is a difference between data that your application needs now, and data that your application used to need. Do not co-mingle the two. Plan from the outset to archive or purge data that is no longer needed. Everybody wants to know how to deal with an application that is growing. The simplest way is to prevent it from growing. Archiving and purging unnecessary data is probably one of the highest value activities that you could ever do.
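
What "plan from the outset" can look like is a small batched purge job. Here is a minimal sketch in Python, using sqlite3 purely as a stand-in for a real database server; the events table, created_at column, and batch size are invented for illustration. Deleting in small batches keeps each transaction short, so the purge never blocks the live workload for long.

```python
import sqlite3

def purge_old_rows(conn, table, ts_column, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` in small batches so the purge
    never holds one huge transaction. Identifier names are interpolated
    directly here for brevity; a real job should validate them."""
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN ("
            f"  SELECT rowid FROM {table} WHERE {ts_column} < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit per batch: short transactions, small locks
        total += cur.rowcount
        if cur.rowcount < batch_size:
            return total

# tiny demo: 10 rows, purge the 6 oldest
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER)")
conn.executemany("INSERT INTO events (created_at) VALUES (?)",
                 [(t,) for t in range(10)])
purged = purge_old_rows(conn, "events", "created_at", cutoff=6)
print(purged)  # 6
```

In production the same loop would run from cron against an archive cutoff date, optionally copying rows to an archive table before deleting them.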

At a more technical level, archiving reduces the size of your workload’s “working set” — the data that the server really needs to keep in memory to function well. That’s because unused rows can bloat and dilute your indexes and tables. Storage engines such as InnoDB also don’t have features such as midpoint insertion algorithms for cache control, so a single table scan can flush really interesting data out of your LRU list and replace it with a bunch of stuff that’s needed only once for the table scan. Keeping those indexes small enough to fit in memory is a big step towards better performance, and archiving can help do that.

2. Monitor Your Servers

Everybody should have an automated monitoring system that is watching key elements of their systems. This can be very simple, even just an external service such as Pingdom or PagerDuty. Or it can be your own in-house hosted Nagios installation. The point is, your users should not be telling you when the service is down. Be the first to know, and have enough information to help you find the root of the problem.

Setting up monitoring can seem to be a daunting task, in part because of the steep learning curve that comes with most tools worth using. Another problem is the plethora of well-meaning but misguided advice and plugins for such tools, which will lead you to believe that you should monitor every ratio and counter there is in your systems — a sure recipe for an inbox flooded with useless noise, a monitoring screen that looks like the scene of a crime, and ultimately people who are numb to the monitoring system and ignore it when something really does go wrong. Keep it simple and monitor things that matter to your business: does the website load in the expected time and contain the expected text? Does the database server respond okay to a sample query that your application actually uses? Is the disk getting full?
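
The "keep it simple" checks above fit in a few lines. This is a hedged sketch, not a replacement for Pingdom or Nagios: check_service and the sample fetcher are hypothetical names, and in practice fetch would issue a real HTTP request against your homepage or run a sample query your application actually uses.

```python
import time

def check_service(fetch, max_seconds, expected_text):
    """Run one business-level check: does the page come back in time
    and contain the text we expect? Returns (ok, seconds, reason)."""
    start = time.monotonic()
    try:
        body = fetch()
    except Exception as exc:
        return (False, time.monotonic() - start, f"fetch failed: {exc}")
    elapsed = time.monotonic() - start
    if elapsed > max_seconds:
        return (False, elapsed, "too slow")
    if expected_text not in body:
        return (False, elapsed, "expected text missing")
    return (True, elapsed, "ok")

# Here we fake the fetch; in production it would be an HTTP GET or a
# sample database query.
ok, elapsed, reason = check_service(lambda: "Welcome to the shop",
                                    max_seconds=2.0,
                                    expected_text="Welcome")
print(ok, reason)  # True ok
```

The point of the design is that each check answers a business question (is the site up, is it fast enough) rather than watching a raw counter.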

3. Capture and Store Historical Data

Likewise, it is vital to have something recording metrics over time. A lot of people use an RRDTool-based system, such as Cacti. Again, it doesn’t really matter which one you use. At some point in the future, you are going to be very grateful that you have historical metrics to understand changes in your application, or understand what happened just prior to a crash. Here’s an example from a customer: the server crashed, and we could see in the log when that happened, but not why. The Cacti graph for disk space had a sharp curve — it showed the disk space suddenly filling up at the rate of gigabytes per minute. We looked at the disk and found it’d filled up with files due to an infinite loop in some code. How long would it have taken us to notice this otherwise, while checking dozens of possible causes?

Unlike with monitoring, you should measure and store everything you can, for as long as you can. You will never stay awake at night kicking yourself for measuring too much about your application.
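
A minimal sketch of "measure everything, keep the raw data": append one timestamped row per metric to a flat file (CSV here for simplicity; tools like Cacti use RRD files instead). The metric names below are invented for the example.

```python
import csv
import io
import time

def record_metrics(writer, metrics, now=None):
    """Append one timestamped row per metric. Store raw samples;
    aggregation can always be done later, the reverse is impossible."""
    ts = int(now if now is not None else time.time())
    for name, value in sorted(metrics.items()):
        writer.writerow([ts, name, value])

# In production the writer would wrap an append-only file on disk.
buf = io.StringIO()
w = csv.writer(buf)
record_metrics(w, {"disk_free_gb": 42.5, "qps": 1180}, now=1250000000)
print(buf.getvalue())
```

Run from cron every minute, a recorder like this is enough to answer "what happened just before the crash" questions months later.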

All of the above are quite simple and fast to do once you’ve gone through the process a few times. I’d say that a practiced consultant can do a pretty comprehensive job in most applications in 8 hours or less. If it’s your first time through, it will take longer, and some good books might help. Either way, it is an investment with a very high ROI.

Entry posted by Baron Schwartz


Innodb performance gotcha with larger queries

A couple of days ago I was looking for a way to improve update performance for an application, replacing single-value UPDATE statements with multiple-value REPLACE (though I also saw the same problem with INSERT ON DUPLICATE KEY UPDATE).
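
The batching being described here, collapsing many single-row statements into one multi-value REPLACE, can be sketched as follows. sqlite3 stands in for MySQL (it also accepts REPLACE INTO), and the counters table is hypothetical.

```python
import sqlite3

def replace_batch(conn, rows, batch_size=100):
    """Issue multi-value REPLACE statements instead of one statement
    per row: one round trip and one parse per `batch_size` rows."""
    sql_prefix = "REPLACE INTO counters (k, v) VALUES "
    for i in range(0, len(rows), batch_size):
        chunk = rows[i:i + batch_size]
        placeholders = ",".join(["(?, ?)"] * len(chunk))
        flat = [x for row in chunk for x in row]
        conn.execute(sql_prefix + placeholders, flat)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (k TEXT PRIMARY KEY, v INTEGER)")
replace_batch(conn, [(f"key{i}", i) for i in range(250)], batch_size=100)
print(conn.execute("SELECT count(*) FROM counters").fetchone()[0])  # 250
```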

As I went from 1 value to 3 or 10 values per batch, performance improved, especially for the network tests. However, with batches of 100 and 1000 values performance started to degrade badly: the process which took 45 seconds with single-value statements took over 12 minutes with 1000 values per batch. This was a big surprise to me, as I have used batching with great performance gains many times.

I had two thoughts: either something was going on in the parser, or something nasty was happening at the Innodb level, so I tried running the test with MyISAM tables instead. The process completed in 12 seconds with 1000 rows per batch vs 40 seconds with single-row statements. So with single statements MyISAM was about 10% faster, which is quite expected, but with 1000 rows per batch the difference grew to 60 times!

I looked at the oprofile results:

samples   %        image name    app name    symbol name
312528    53.8091  mysqld        mysqld      dict_scan_to
150327    25.8823                            __ctype_toupper_loc
71924     12.3834  mysqld        mysqld      .plt
18071      3.1113  no-vmlinux    no-vmlinux  (no symbols)

This shows a funky function, dict_scan_to, taking over 50% of the time – not a function you would expect to see on top for an update workload!

I asked Yasufumi, our Innodb code ninja, to take a look at what it could be, and here is what he told me:

This may be a kind of performance bug in 5.0.

At row0sel.c:row_search_for_mysql(), "PHASE 3:",

InnoDB judges whether the SQL is a "SELECT" or not.

5.0 scans the SQL text each time:

if (trx->isolation_level <= TRX_ISO_READ_COMMITTED
    && prebuilt->select_lock_type != LOCK_NONE
    && trx->mysql_query_str) {

        /* Scan the MySQL query string; check if SELECT is the first
        word there */
        ibool   success;

        dict_accept(*trx->mysql_query_str, "SELECT", &success);

        if (success) {
                /* It is a plain locking SELECT and the isolation
                level is low: do not lock gaps */

                set_also_gap_locks = FALSE;
The "REPLACE 1000 rows" SQL doesn't contain "SELECT", so InnoDB scans the whole SQL string each time…
(* "INSERT" may not use this function? "UPDATE" may also be affected)

On the other hand, the same place in current 5.1 is:

if (trx->isolation_level <= TRX_ISO_READ_COMMITTED
    && prebuilt->select_lock_type != LOCK_NONE
    && trx->mysql_thd != NULL
    && thd_is_select(trx->mysql_thd)) {
        /* It is a plain locking SELECT and the isolation
        level is low: do not lock gaps */

        set_also_gap_locks = FALSE;

looking up the flag on mysql_thd instead…

Scanning the statement for each update – that is a bummer! It is great to see, however, that it is fixed in MySQL 5.1.
It also explains why the problem took a while to uncover: the most typical giant-sized statement we use is INSERT, which does not use this function, so it did not hit the bug.
INSERT ON DUPLICATE KEY UPDATE, REPLACE and UPDATE should all be affected, though I have not tested this carefully.

There is one more interesting thing in this code: the scan only runs if trx->isolation_level <= TRX_ISO_READ_COMMITTED, so the problem does not exist if you use the default REPEATABLE-READ isolation level. If you try to save by lowering the isolation mode, you get this unexpected regression. This is why I typically recommend people stick to the default settings (which everybody uses) unless there is a proven, measurable gain from the change. This especially applies to behavior-changing settings, which can cause a more significant impact than settings which merely size something.

After putting the isolation mode back to REPEATABLE-READ, the batch with 1000 rows per statement ran in 15 seconds instead of 12 minutes. What a change!
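
The cost difference between the two code paths can be caricatured in Python. This is only an analogy, not InnoDB's actual logic: the point is that classifying a statement once and checking a flag is constant-time per row, while re-scanning a large non-SELECT statement text costs time proportional to its length.

```python
import timeit

# a REPLACE statement with 1000 values, like the one in the post
big_sql = "REPLACE INTO t (k, v) VALUES " + ",".join(
    "(%d, %d)" % (i, i) for i in range(1000))

def is_select_by_scanning(sql):
    # 5.0-style: inspect the statement text on every call; when the
    # statement is not a SELECT, the search walks the whole string
    return "SELECT" in sql

IS_SELECT = False  # 5.1-style: classified once by the server, kept as a flag
def is_select_by_flag():
    return IS_SELECT

scan = timeit.timeit(lambda: is_select_by_scanning(big_sql), number=50_000)
flag = timeit.timeit(is_select_by_flag, number=50_000)
print("scan: %.4fs  flag: %.4fs for 50k calls" % (scan, flag))
```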

Entry posted by peter

Looking at Redis

Recently I had a chance to take a look at the Redis project, which is a semi-persistent in-memory database with an idea somewhat similar to memcache, but with a richer feature set.

Redis has a simple single-process, event-driven design, which means it does not have to deal with locks, which are a performance killer for many applications. This, however, limits its scalability to a single core. Still, at 100K+ operations a second, this single-core performance will be good enough for many applications. Also, nothing stops you from running many Redis instances on a single server to take advantage of multiple cores.

I call Redis semi-persistent because it does not store data on disk immediately, but rather dumps its whole database every so often – you can configure the time and the number of updates between dumps. Because the dump is basically a serial write, Redis does not need an expensive IO subsystem. And because the dump happens in the background, it does not affect read/write performance against the in-memory database. In the tests I've done, I've seen Redis writing some 4MB/sec for probably 50% of the test duration, where Innodb had to write 50MB/sec for about a third of the throughput, doing a lot of random IO along the way. This is, among other things, because Innodb has to flush full 16K pages when it flushes.

The background flush in Redis is designed the following way: the Redis process forks, and the child just dumps the database it has in the background. Unix copy-on-write takes care of making another copy of pages as they are modified, which keeps the overhead rather low. The database is dumped into a temporary file, which is renamed only after an fsync, meaning that if you crash during the dump you simply discard the partial file.
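
The dump-to-temp-file-then-rename trick is easy to reproduce in your own tools. Here is a hedged sketch in Python (JSON stands in for Redis's dump format): write the temp file, fsync, then rename, so a crash mid-dump leaves the previous snapshot intact.

```python
import json
import os
import tempfile

def dump_atomically(path, data):
    """Write the snapshot to a temp file in the same directory, fsync,
    then rename over the old file. A crash mid-dump leaves the previous
    snapshot intact and only a partial temp file to discard."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name, suffix=".part")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # data is durable before we publish it
        os.replace(tmp, path)     # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)
        raise

with tempfile.TemporaryDirectory() as d:
    snap = os.path.join(d, "dump.json")
    dump_atomically(snap, {"a": 1})
    dump_atomically(snap, {"a": 2})  # replaces the old snapshot atomically
    with open(snap) as f:
        print(json.load(f))  # {'a': 2}
```

Readers of the snapshot always see either the old complete file or the new complete file, never a half-written one.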

I also liked the full pipelining support in the protocol: you can send multiple commands at once – any commands – and the Redis server will process them in order and return the results to you. This allows not only multi-get and multi-set, but any batch of commands to be submitted. The API support for this is relatively weak, but the feature is there.
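
The pipelining idea – queue any mix of commands and get all replies back in order – can be modeled with a toy class. This is not the Redis wire protocol or any real client API, just an illustration of the batching semantics: one flush instead of N round trips.

```python
class PipelinedStore:
    """Toy model of pipelining: queue any mix of commands, flush once,
    and receive all replies back in order."""
    def __init__(self):
        self._data = {}
        self._queue = []

    def queue(self, op, *args):
        # buffer the command locally; nothing is "sent" yet
        self._queue.append((op, args))

    def flush(self):
        # process the whole batch in order, one reply per command
        replies = []
        for op, args in self._queue:
            if op == "SET":
                key, value = args
                self._data[key] = value
                replies.append("OK")
            elif op == "GET":
                replies.append(self._data.get(args[0]))
        self._queue.clear()
        return replies

p = PipelinedStore()
p.queue("SET", "a", 1)
p.queue("SET", "b", 2)
p.queue("GET", "a")
print(p.flush())  # ['OK', 'OK', 1]
```

With a real client the win is that the round-trip latency is paid once per batch rather than once per command.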

When it comes to data types, Redis supports simple key-value storage just as memcache does, but it also adds support for lists, which are similar to the familiar linked-list data type, as well as sets, which let you store sets of strings with support for various set operations.

I kind of miss support for something like an associative array/hash/map among the data types, but I guess there is nothing in the architecture stopping it from being added later.

Redis also has support for "databases", which are basically key spaces inside the server. This should allow the server to be used by different applications, different versions, for testing, and so on. You have to be careful with these, though: there is only simple per-instance password-based authentication in place, so if an application can talk to the instance, it can access all databases.

I am also a bit surprised that databases are numbered instead of named. Naming them would make it simpler to avoid unwanted conflicts, etc.

Redis also supports master/slave replication out of the box, and it is extremely simple: you just specify which node to replicate from, and that is it. It is even simpler than with MySQL, as you do not need to deal with snapshots or binary log positions. Replication is asynchronous and low overhead – Redis performs a database dump and buffers the commands received since the start of the process. The slave gets the dump (which is basically the set of commands to repopulate the database) and then receives further commands from the master as they come. I did not benchmark replication capacity, but I'd expect it to be close to the 100K writes/sec a single instance can handle.

The benchmarks I ran were for an application which is very update intensive, with updates being pretty much random single-row updates that are hard to batch. With MySQL/Innodb, the server was able to handle some 30,000 updates/sec on a 16-core server, with replication able to handle 10,000 updates/sec. This used about 5 cores, so you could probably run 4 MySQL instances on this server and get up to 100K updates/sec, with up to 40K updates/sec being able to replicate.

With Redis I got about 3 times more updates/sec – close to 100,000 updates/sec – with about 1.5 cores being used. I have not tried running multiple instances, and I'm not sure the network and TCP stack would scale linearly in this case, but either way we're speaking about hundreds of thousands of updates/sec.

I think Redis can be a great piece of architecture for a number of applications. You can use it as a database or as a cache (it supports data expiration too).

I have not benchmarked it against memcache in terms of performance and memory usage. That may be another project to look at.

Entry posted by peter

A few administrative updates

I wanted to write a few administrative updates in one so I didn’t spam everyone’s feed readers too much. Here we go:

  • We’ve had reports of some lost comments.  We reported this via Twitter a while ago, but thought it was fixed.  We’ll try and pay more attention to spam filtering, but we wanted to reach out and say – let us know if you are one of the people having problems.  We’re listening.  (Of course if you have any suggestions on how to manage/filter through thousands of spam items/week, we’d like to know as well).
  • Pingbacks were recently dropped when we upgraded to WordPress 2.8.  Miss them?  We’ve turned them back on.
  • Our training sales for Santa Clara and San Francisco are going well.  As a result we’re planning to have an east coast location at some point in the future.  Please feel free to comment where you would like that to be!
  • We have notified the winners of our training competition.  I wanted to share their comments:
    My favorite Percona patch: the Innodb fast recovery patch. Reason: Innodb recovery has always been a big headache for any DBA, since it used to take ages to complete a recovery after a crash. Now, with the patch, I am able to do a recovery in less than a tenth of the time it used to take. Great job by Yasufumi in hacking the code (not just this patch :) ), which has been a nightmare for all DBAs.


    My favorite patch is the "Microsecond slow query patch", as it allows me to identify queries which take only a few seconds to execute but whose frequency is just too high, i.e. a few hundred times an hour. By optimizing these queries I can free up server resources to do other useful work.

  • And the last item is a reminder: the deadline for our early bird registration on training is Monday.

Entry posted by Morgan Tocker



Dear Community,

Release 0.9 of XtraBackup, the open-source backup tool for InnoDB and XtraDB, is available for download.


  • tar4ibd could not handle files over 8GB; the maximum supported size is now 64GB
  • the prepare-speed-hack option has been added

Fixed bugs:

The binary packages for RHEL4 and 5, Debian, FreeBSD, and MacOS, as well as the source code of XtraBackup, are available on

The project lives on Launchpad: and you can report bugs to the Launchpad bug system: The documentation is available on our Wiki.

For general questions use our Percona-discussions group, and for development questions the Percona-dev group.

For support, commercial and sponsorship inquiries contact Percona.

Entry posted by Aleksandr Kuzminsky


XtraDB storage engine release 1.0.3-7

Dear Community,

Today we are announcing XtraDB release 7.
This is the last release based on InnoDB plugin version 1.0.3; as you might know, Innobase has released 1.0.4.

The release includes the following new features:

  • MySQL 5.1.37 as a base release
  • speed hack for buf_flush_insert_sorted_into_flush_list controlled by the new variable innodb_fast_recovery

Fixed bugs:

The builds for RedHat 4 and 5 and Debian are located on
The latest source code of XtraDB, including the development branch, can be found on Launchpad.

Please report any bugs found on Bugs in Percona XtraDB Storage Engine for MySQL.
For general questions use our Percona-discussions group, and for development questions the Percona-dev group.

For support, commercial and sponsorship inquiries contact Percona

Entry posted by Aleksandr Kuzminsky


A micro-benchmark of stored routines in MySQL

Ever wondered how fast stored routines are in MySQL? I just ran a quick micro-benchmark to compare the speed of a stored function against a “roughly equivalent” subquery. The idea — and there may be shortcomings that are poisoning the results here, your comments welcome — is to see how fast the SQL procedure code is at doing basically the same thing the subquery code does natively (so to speak).

Before we go further, I want to make sure you know that the queries I’m writing here are deliberately mis-optimized to force a bad execution plan. You should never use IN() subqueries the way I do, at least not in MySQL 5.1 and earlier.

I loaded the World sample database and cooked up this query:


SELECT sql_no_cache sum(ci.Population) FROM City AS ci
  WHERE CountryCode IN (
    SELECT DISTINCT co.Code FROM Country AS co
      INNER JOIN CountryLanguage AS cl ON cl.CountryCode = co.Code
    WHERE lower(cl.LANGUAGE) = 'English');
+--------------------+
| sum(ci.Population) |
+--------------------+
|          237134840 |
+--------------------+
1 row in set (0.23 sec)

This pretty consistently runs in just about 1/4th of a second. If you look at the abridged explain plan below, you'll see the query is doing a table scan against the City table, and then executing the subquery for each row:


mysql> EXPLAIN SELECT ….\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: ci
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4079
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY

Now I took the subquery and basically rewrote it as a stored function.


mysql> delimiter //
mysql> CREATE FUNCTION speaks_english(c char(3)) returns integer deterministic
    -> begin
    -> declare res int;
    -> SELECT count(DISTINCT co.Code) INTO res FROM Country AS co INNER JOIN CountryLanguage AS cl ON cl.CountryCode = co.Code WHERE lower(cl.LANGUAGE) = 'English' AND co.Code = c;
    -> RETURN res;
    -> end//
mysql> delimiter ;

Now the query can be rewritten as this:


mysql> SELECT sql_no_cache sum(ci.Population) FROM City AS ci WHERE speaks_english(CountryCode) > 0;
+--------------------+
| sum(ci.Population) |
+--------------------+
|          237134840 |
+--------------------+
1 row in set (1.00 sec)

If we explain it, we get output similar to the first table shown above, but the further two rows are not shown. The query can’t be optimized to use indexes, and the stored function is opaque to the optimizer. This is why I purposefully wrote the subquery badly in the first query! (If you think of a better way to compare apples and uhm, apples… please comment).

The poorly-optimized-subquery portion of the query essentially happens inside that function now.
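
The per-row function-call overhead the benchmark exposes is not unique to MySQL; the same shape is easy to reproduce in any interpreter. A rough Python analogy follows (numbers vary by machine, so no expected timings are shown): the same arithmetic, once written inline and once routed through a function call per row.

```python
import timeit

data = list(range(10_000))

def double(x):
    # stand-in for the stored function: one opaque call per row
    return x * 2

inline = timeit.timeit(lambda: sum(x * 2 for x in data), number=50)
routed = timeit.timeit(lambda: sum(double(x) for x in data), number=50)
print("inline: %.4fs  via function: %.4fs" % (inline, routed))
```

Both produce the same answer; only the per-row dispatch cost differs, which is exactly the pattern the stored-function query shows.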

And it’s four times slower, consistently, and that’s all I wanted to show here. Thanks for reading.

Entry posted by Baron Schwartz


D&D: 3.5 Initiative Combat Cards

Here are some modified 3.5 Initiative Combat Cards that were adapted from the Heroforge 3.5 Initiative Card:

D&D: 3.5 Initiative Combat Cards – for Players

D&D: 3.5 Initiative Combat Cards – for Monsters

D&D: 3.5 Initiative Combat Cards – Excel – this was used to make the above cards


XtraDB: The Top 10 enhancements

Note: This post is part 2 of 4 on building our training workshop.

Last week I talked about why you don’t want to shard. This week I’m following up with the top 10 enhancements that XtraDB has over the built-in InnoDB included in MySQL 5.0 and 5.1.  Building this list was not really a scientific process – it’s always difficult to say which feature is better than another, because a lot depends on the individual workload.  My ranking method was to pick first the features that have the highest impact and are most applicable to all workloads:

  1. CPU scalability fixes – XtraDB improves performance on systems with multi-cpus (see docs 1, 2).
  2. Import/Export Tables – XtraDB allows you to import an arbitrary  table from one server to another, by backing up the .ibd file with Xtrabackup (see docs).
  3. IO scalability fixes – Many of InnoDB’s internal algorithms are based on the non-configurable assumption that the server has only a single disk installed (100 iops).  One problem this causes in particular is that the algorithm which decides whether InnoDB is too busy to flush dirty pages can consider itself busy very easily.  Keeping a large percentage of pages dirty increases recovery time, and leads to more work when checkpoints are eventually forced at the end of a log file.  XtraDB improves this with innodb_io_capacity, as well as configuration items for innodb_read_threads and innodb_write_threads (see docs).
  4. Better Diagnostics – The SHOW ENGINE INNODB STATUS command in XtraDB shows a lot more information than the standard InnoDB status (see docs).  The built-in InnoDB status also has some problems with the placement of items (a long transaction list will prevent the rest of the information shown).  In addition to this, XtraDB diagnostics include the ability to see the contents of the buffer pool (see docs), and InnoDB row statistics are inserted into the slow query log (see docs).
  5. Fast Crash Recovery – In the built-in InnoDB, the crash recovery process is sometimes best measured in hours and days – this restricts users to using very small transaction log files (innodb_log_file_size), which is worse for performance.  In a simple test, XtraDB recovered ten times faster (see docs).
  6. InnoDB Plugin Features – XtraDB is derived from the InnoDB plugin, which has fast index creation (as opposed to recreating the whole table!) and page compression.
  7. Adaptive Checkpointing – The built-in InnoDB can have erratic dips in performance as it approaches the end of a log file and needs to checkpoint – which can cause a denial of service to your application (this can be seen in any benchmark – such as this one).  In XtraDB, adaptive checkpointing can smooth out the load, and checkpoint data more aggressively as you approach the end of a log file (see docs).
  8. Insert Buffer control – The insert buffer is a great feature of InnoDB that is not often discussed.  It allows you to delay the writing of non-unique secondary index pages, which can often lead to many merged requests and reduced IO.  The problem with the insert buffer in the built-in InnoDB is that there are no options to tweak it.  It can grow to 1/2 the size of your buffer pool, and when it does, it doesn’t try to aggressively free entries (a full buffer provides no benefit) or reduce its size (see docs).
  9. Data dictionary control – Once an InnoDB table is opened it is never freed from the in-memory data dictionary (which is unlimited in size).  XtraDB introduces a patch to be able to control this, which is useful in cases where users have a large number of tables. (see docs).
  10. Additional undo slots – In the built-in InnoDB, the number of open transactions is limited to 1023 (see bug report).  XtraDB allows this to be expanded to 4072 (Warning: Incompatible change!).  (see docs).

All of these 10 items will be covered in our Training workshops for InnoDB and XtraDB.  In Santa Clara / San Francisco between 14-16 September?  Come along!

My next post in this series will be on XtraDB: The Top 10 Configuration Parameters.

Entry posted by Morgan Tocker


Innodb plugin 1.0.4 released – great job Innobase

As you might have seen, Innodb Plugin 1.0.4 was released today. I am very excited to see this release, which comes exactly 5 months after the release of Innodb Plugin 1.0.3 (I honestly expected Innodb Plugin 1.0.4 to be released by the MySQL Conference and Expo in April). This is also still an “early adopter” version of the plugin, which is a bit of a disappointment, as we can’t wait for the Innodb plugin to become stable/GA; but considering the number of improvements, this is probably a good thing.

We’re also pleased to see some of Percona’s contributions included in this release (in modified form), while others were evaluated and provided ideas for different implementations.

Among other changes, I’m especially pleased with the fixed Group Commit Bug, which I reported about 4 years ago and which was known for about 5 years, since the early MySQL 5.0 beta releases. We had a semi-fix for this problem earlier this year, but it is great to see a solution with even less mutex code.

Now the fun part starts: we need to evaluate, benchmark and stress test Innodb Plugin 1.0.4 to integrate it into the next XtraDB release.

Entry posted by peter
