New OLAP Wikistat benchmark: Introduction and call for feedback

I’ve seen that my posts on the Ontime Air Traffic and Star Schema Benchmark datasets got a lot of interest.

However, those benchmarks by themselves did not cover all the cases I wanted, so I was thinking about a better scenario. The biggest problem is getting a real dataset that is big enough, and I thank Bradley C. Kuszmaul, who pointed me to the statistics on access to Wikipedia pages, and Domas, who made the stats accessible. Link to the archives: http://dammit.lt/wikistats/archive/ , or see the original Domas’s announcement.

Although the table does not carry much varied information, I think it is good enough to represent cases you can face in a Web application (log processing, page visits, clickstream, etc.).

I made some effort to normalize the data into a classic star schema and prepared queries that can be run on the proposed dataset (John Sichi, lead of LucidDB, helped me draft some queries).
You can see details on our Percona Wikistat benchmark Wiki.

I have the following goals for the proposed benchmark:

  • Compare engines on OLAP queries used for planning, predicting growth, analyzing access patterns to wiki pages, and drawing trends.
  • Compare engines on statistical queries for end users, which can be executed in real time, e.g. how many times a given page was accessed yesterday vs. today.
  • Understand the specific features and characteristics of each engine.
  • Compare throughput on simple queries (queries and scenario yet to be drafted).
  • Check the ability to load data and serve queries at the same time, i.e. availability during data load (queries and scenario yet to be drafted).

So the proposed schema has four tables:
pagestat (the fact table), plus pages, datesinfo, and projects (dimension tables).

The dimension tables are static and do not change; we vary the data size
by varying the number of months loaded into the fact table (this is the scale factor).

EER diagram

( made with MySQL Workbench )

In the current dataset, which you can download from an Amazon snapshot (name: “percona-wikistat”, ID: snap-a5f9bacc), we have:

  • Table pages: 724,550,811 rows; data size: 40,476M
  • Table datesinfo: 9,624 rows; one entry represents 1 hour
  • Table projects: 2,025 rows
  • Table pagestats
    Data for 2009-06: 3,453,013,109 rows / size 68,352M
    Data for 2009-07: 3,442,375,618 rows / size 68,152M

So with two months of stats we have about 172GB of data, with about 7 billion rows in the fact table.
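A quick sanity check of those figures in Python, using the sizes and row counts quoted above:

```python
# Sizes (in MB) and row counts quoted above.
pages_mb = 40476
pagestats_mb = 68352 + 68152                      # 2009-06 + 2009-07
fact_rows = 3_453_013_109 + 3_442_375_618

total_gb = (pages_mb + pagestats_mb) / 1024
print(round(total_gb, 1))         # 172.8 -> "about 172GB"
print(round(fact_rows / 1e9, 1))  # 6.9   -> "about 7 billion rows"
```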

Example query (again, the full list is on the Benchmark Wiki):


  SELECT project, SUM(page_count) sm
    FROM pagestat
      JOIN datesinfo di ON (di.id = date_id)
      JOIN projects p ON (p.id = project_id)
    WHERE di.calmonth = 7 AND di.calyear = 2009
    GROUP BY project
    ORDER BY sm DESC
    LIMIT 20;
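To make the star-schema join concrete, here is a minimal self-contained sketch in Python using SQLite; the table and column names follow the schema above, but the rows are toy data and the types are simplified:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Simplified star schema: the fact table plus two of the dimension tables.
cur.executescript("""
CREATE TABLE projects  (id INTEGER PRIMARY KEY, project TEXT);
CREATE TABLE datesinfo (id INTEGER PRIMARY KEY, calyear INT, calmonth INT);
CREATE TABLE pagestat  (project_id INT, date_id INT, page_count INT);
""")
cur.executemany("INSERT INTO projects VALUES (?,?)", [(1, "en"), (2, "de")])
cur.executemany("INSERT INTO datesinfo VALUES (?,?,?)",
                [(1, 2009, 7), (2, 2009, 6)])
cur.executemany("INSERT INTO pagestat VALUES (?,?,?)",
                [(1, 1, 100), (1, 1, 50), (2, 1, 30), (1, 2, 999)])

# The example query from the post: top projects by page count for July 2009.
rows = cur.execute("""
SELECT project, SUM(page_count) sm
  FROM pagestat
  JOIN datesinfo di ON (di.id = date_id)
  JOIN projects  p  ON (p.id = project_id)
 WHERE di.calmonth = 7 AND di.calyear = 2009
 GROUP BY project
 ORDER BY sm DESC
 LIMIT 20
""").fetchall()
print(rows)  # [('en', 150), ('de', 30)]
```

The real benchmark runs this same shape of query against billions of fact rows, which is where the engines will differ.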

I am going to load the data and run the queries against the available engines:

  • MySQL MyISAM / InnoDB (to have reference results)
  • InfoBright
  • InfiniDB
  • MonetDB
  • LucidDB
  • Greenplum

and I will report my results (so stay tuned to MySQLPerformanceBlog ;) )

I’d also like to test Paraccel, Vertica, and Kickfire systems, but I do not have access to them.

I welcome your feedback on the benchmark, and on what else you would like to see here.

Entry posted by Vadim |



Percona is hiring a Shift Support Captain

Percona is hiring. As part of our growth process, we introduced the role of the Shift Support Captain in 2009
(see the original announcement here) to
provide customers with a 24×7 technical contact person.

The Shift Support Team dispatches incoming emergencies, assigns new issues, handles or escalates incoming Nagios alerts from some customers, and in general makes sure things get handled
and customers get the attention they need.

You must be detail-oriented, service-oriented, and tech-savvy, and like all Percona staff, able to work from home with little supervision. This is a growth position, so we’re also looking for people who can become
consultants over time.

The basic requirements are:

  • Excellent written and spoken English.
  • Knowledge of MySQL and InnoDB.
  • Knowledge of Linux systems administration.
  • Knowledge of popular LAMP technologies such as Apache.
  • General programming and scripting abilities.
  • Proven success working in a distributed environment where
    e-mail, IRC and voice calls are the only interaction with
    clients, colleagues and managers on a daily basis.

Work is scheduled in 8-hour shifts, and some weekend shifts will be required.

Some of the benefits of working for Percona include good pay and benefits, working from home, and a team of experts who work on challenging projects every day and who develop and contribute to many cool projects.

Some travel is required, at least for a yearly company meeting, and for some conferences and other events.

If you think you’re a good candidate, please fill out the contact form on our website. Thanks!

Entry posted by fernando |



Community Events February-March

February and March are busy months for Community events.  There’s MySQL University, FOSDEM, the Seattle MySQL Meetup & Confoo.ca. Here are the details:

  • February 4th – I’ll be presenting a talk via MySQL University on Optimizing Queries with Explain.  This talk will be about learning to read the output of MySQL’s EXPLAIN command, and optimizing some example queries from the IMDB dataset.  At 5.7G in InnoDB tables (imported before adding any secondary indexes), IMDB is one of my new favorite example databases: it’s about the right size that on most desktop PCs you can measure the difference between a query that’s optimized and one that’s not.  All material comes courtesy of a new chapter I’ve been writing for our Developer’s class on Schema & Indexing.
  • February 7th – My colleague Piotr will be presenting on Multi-Master Replication Manager for MySQL at the MySQL & Friends Dev Room.  Besides Piotr’s talk, there will be a number of MySQL enthusiasts in Brussels for FOSDEM.  I recommend reading Lenz’s announcement or the wiki page for more information.
  • February 16th – I’ll be at the Seattle MySQL User Group giving a talk on Quick Wins with third party patches for MySQL.  This is an old talk I’ve given previously, but I’ll be updating it to demonstrate some of the really cool things you can do with XtraDB R-9.
  • February 23rd – The Boston MySQL Meetup is organizing a social dinner.  More details to come.
  • March 8-12th – I’ll be at Confoo.ca in Montréal presenting two training workshops, plus the talks “I’m going to tell you what MySQL is bad at!” and “Diagnosing and Fixing MySQL Database Problems”.  Telling you what something is bad at is my provocative way of saying that there are some really awesome tools that complement MySQL well.

Entry posted by Morgan Tocker |



When should you store serialized objects in the database?

A while back, FriendFeed posted a blog post explaining how they changed from storing data in MySQL columns to serializing it and storing it inside TEXT/BLOB columns. Since then, the technique has become more popular, with Ruby gems now available to do this for you automatically.

So when is it a good idea to use this technique?

If the application really is schema-less and has a lot of optional parameters that do not appear in every record, serializing the data in one column can be a better idea than having many extra columns that are NULL. The restriction is that searching on these fields now becomes more difficult[1]. A good example of this optional nature of data is user preferences – you only really need to store the settings that differ from the default values.
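For instance, user preferences could be stored as a serialized blob of overrides only. A sketch in Python with SQLite; the table name, keys, and defaults are invented for illustration:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE user_prefs (user_id INTEGER PRIMARY KEY, prefs TEXT)")

# Store only the settings that differ from the default values.
defaults = {"theme": "light", "timezone": "UTC", "items_per_page": 25}
overrides = {"theme": "dark"}  # every other setting falls back to a default
con.execute("INSERT INTO user_prefs VALUES (?, ?)",
            (1, json.dumps(overrides)))

# Reading back merges the defaults with the stored overrides.
(blob,) = con.execute(
    "SELECT prefs FROM user_prefs WHERE user_id = 1").fetchone()
prefs = dict(defaults)
prefs.update(json.loads(blob))
print(prefs["theme"], prefs["items_per_page"])  # dark 25
```

Note the trade-off described above: a query like “all users with theme = dark” can no longer use an index on a real column, which is why FriendFeed maintained separate index tables.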

The second situation where I can see this technique making sense is when, on update, a large percentage of the blob is actually modified, since writing back the full blob for a single small change is expensive.

Another potential pro for this technique is that ALTER TABLE commands are no longer required. Until the 5.1 InnoDB plugin, simple operations like adding a secondary index on an InnoDB table required the whole table to be rebuilt (now just the index is built). I don’t really buy this pro, since with master-slave switching I’ve always been able to emulate online index adds. Projects like MMM and Flipper make this even easier.

In FriendFeed’s case, they also used serialization to be able to compress the objects.  As of the 5.1 InnoDB plugin, compression is available natively.

[1] FriendFeed solved this indexing problem by creating separate ‘index’ tables.

What are the downsides?

I would say that the first serious downside is write amplification. If you are constantly making small updates to one piece of data in a very large blob, the work MySQL has to do is greatly increased.

The second downside I would mention is that this pattern tends to force you to read and write larger amounts of data at once. We’ve all been trained to know that SELECT * is bad, yet this creates a pattern where SELECT * is not optional. In practical terms, I would expect this to increase the size of the working set, since more data must be kept in the buffer pool.

The next obvious restriction is a clear loss of functionality. You can no longer easily perform aggregate functions on the data (MIN, MAX, AVG). You are storing the data in a way that is very specific to one application, and you can’t just point a BI tool at it for processing.

It can become difficult to apply even the simplest constraints to the data, such as a maximum character length, that an age must be a number, or that an age must be unsigned.  MySQL doesn’t go as far as having check constraints, but what it has is better than nothing.

Some smaller issues: if you are using a standard serialization format like JSON, it’s difficult to store pieces of data like numbers or IP addresses in their most efficient form, and technically this design breaks 1NF.

What would I recommend?

Josh Berkus (of PgExperts) calls this serialization pattern E-blob. Much like EAV, he criticizes it as one of 10 ways to wreck your database (slides). I tend to side with Josh.  I’m optimistic that this design has its place in a smaller component of your application, but I’m wary every time I hear someone decide to switch to it exclusively.

Before making such decisions for yourself I would recommend reading Peter’s KISS KISS KISS post.

Entry posted by Morgan Tocker |



XtraDB feature: save / restore buffer pool

We recently released XtraDB-9, and while we did not highlight it in the announcement, the release’s headline feature is the ability to save and restore the InnoDB buffer pool.
The idea is not new: it was originally developed by Jeremy Cole (sorry, I do not have the link at hand) some time ago, and now we have implemented it in XtraDB.

Why would we need to save and restore the contents of the buffer pool?
There are several reasons.
First, it’s not rare on modern servers to have 32GB+ of RAM, with an InnoDB buffer pool of 26GB or more. When you restart the server, it may take a long time to populate the cache with useful data before you can bring it back to serve production load. It’s not rare to see a
maintenance cycle take two or more hours, mainly because the slave needs to catch up with the master and warm its cache.
In the case of a server crash it is even worse: you need to wait a possibly long time for InnoDB
recovery (we have a patch for that too; in that post you can see InnoDB recovery taking 1h to complete) and only then warm the caches.

Second, it is useful for some HA schemes, like DRBD, where in case of failover you need to start the passive instance cold.

So let’s see what results we get.
Details about the patch are at http://www.percona.com/docs/wiki/percona-xtradb:patch:innodb_lru_dump_restore (Yasufumi calls it LRU dump/restore, because he thinks of the buffer pool as an LRU list, which is how it is represented internally).

To save the buffer pool, execute

select * from information_schema.XTRADB_ADMIN_COMMAND /*!XTRA_LRU_DUMP*/;

and to restore:

select * from information_schema.XTRADB_ADMIN_COMMAND /*!XTRA_LRU_RESTORE*/;

This will create/read the file ib_lru_dump in your database directory.

You may want to sort ib_lru_dump by page order within tablespaces, so the restore is
performed as sequentially as possible. A small python script, lrusort.py, to sort ib_lru_dump is available
in our Launchpad branch lp:~percona-dev/percona-xtradb/extensions-1.0.6
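The idea behind the sort is simple: order the saved entries by tablespace and page number so the restore reads each data file mostly sequentially rather than in LRU order. A conceptual sketch in Python (the real ib_lru_dump is a binary file; the (space_id, page_no) tuples here merely stand in for its entries):

```python
def sort_dump(entries):
    """Sort buffer-pool page entries by (space_id, page_no) so that a
    restore touches each tablespace file in sequential page order."""
    return sorted(entries)

# Pages in the order the LRU list happened to hold them:
lru_order = [(2, 900), (1, 7), (2, 3), (1, 5)]
print(sort_dump(lru_order))  # [(1, 5), (1, 7), (2, 3), (2, 900)]
```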

I ran a small tpcc benchmark to show the effect of a restored buffer pool (the benchmark
conditions are the same as in my runs on fast storage, and I
used RAID10 to store the InnoDB files).
The first run (xtradb cold) was made just after restart and ran for 1h.
After that I saved the buffer pool, restarted mysqld, restored the buffer pool (it took about 4 min
to load 26GB worth of data), and ran tpcc again (xtradb warm).

Here are the graphical results (in New Transactions Per 10 sec; more is better):


As you can see, in the cold run it took 1500-1800 sec to reach a stable state, while
in the warm run that happened almost from the start. There was a period of unstable results, but it
did not affect the ability to serve load.

You are welcome to test it; it is available in the XtraDB-9 release and also in MariaDB 5.1.41-RC.

Entry posted by Vadim |


Why Swapping is Bad for MySQL Performance

So MySQL or another application is using too much memory on the box, or the OS is behaving oddly, using more memory for cache and pushing the application into swap. This causes swapping and performance problems. That much is obvious. But how bad is it? Should you count it the same as the normal disk IO the box is doing, or is it worse than that?

Swapping impacts your performance more than normal IO does, and here are 3 reasons why. If you know more, please let me know; for my taste these 3 are bad enough that I have not been looking for more.

Cache in the Swap File will multiply IO compared to simply having less cache. What happens when a page in the cache is replaced, but that page is itself swapped out? First you have to find space to swap the page in (we’re talking about memory pressure, right?), which means swapping out some other page. This would normally happen in the background, but it still has to be done. Then the page is swapped in, which is a second IO, and finally the page being cached is read from the hard drive into the cache. That gives us 3 IOs instead of one. Nice :)
Flushing dirty pages, or even discarding a page, will cause extra IO and slow things down.

Skewing up all Algorithms Database internal algorithms are tuned for things being in memory, and when the data they deal with is on disk they often stop working with any reasonable efficiency: when a database handles on-disk data it uses a different set of algorithms, optimized to limit the number of IOs or make them more sequential (most of these were designed before the SSD era). For example, the Insert Buffer in MySQL makes a special effort to avoid (delay) IO; if it happens to end up in the swap file, that more than defeats its purpose. A number of background threads are designed with the assumption that they can check page statuses in the buffer pool very efficiently, which again stops being true as soon as page accesses cause disk IO.

Escalated Locking/Latching If the breakage of internal operations is not bad enough on its own, let’s see what swapping does to concurrent (multi-CPU, multi-client) processing. Database locks and latches are typically designed to be held for as short a time as possible; the smaller the portion of execution time a thread spends holding exclusive locks, the better the system scales. It is a big no-no to hold any critical lock while doing disk IO, because IOs take a long time. When the system is swapping, all of this gets messed up: when the database thinks it is taking a lock for a few instructions only, it can be a long while until the IO completes, and if it is a critical lock it is possible to see everything in the system waiting on this IO, even transactions working with data that is not in the swap file.

The bottom line: you should configure the system so that no swapping activity goes on during normal operation. The swap file itself may be justified: if you have an unexpected memory consumption spike, you may prefer a slowdown to MySQL being killed for running out of memory. But react to such spikes promptly, and do not treat the situation as normal.
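On Linux, one common way to implement this is to keep the swap file but tell the kernel to avoid using it during normal operation via the vm.swappiness sysctl (a sketch; the value is a starting point to test, not a universal recommendation):

```
# /etc/sysctl.conf -- prefer dropping page cache over swapping out MySQL.
# 0 tells the kernel to swap only under severe memory pressure.
vm.swappiness = 0
```

Apply it with sysctl -p, and watch the si/so columns in vmstat to confirm that no swap traffic occurs under normal load.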

Entry posted by peter |



Fast storage: 8 SSD Intel X-25M 80GB benchmarks

I appreciate the opportunity Jos van Dongen from Tholis Consulting gave me: he granted me access to a server with 8 attached Intel X-25M 80GB MLC drives. The drives are attached to 2 Adaptec 5805 RAID controllers, 4 drives per controller.

The cost of the setup is 8 x $260 (X-25M) + 2 x $500 (Adaptec 5805) = ~$3000.
Available space varies from 300GB to 600GB depending on the RAID setup.

The logical comparison is against the FusionIO 320GB MLC card, so I will copy results from my FusionIO 320GB MLC benchmarks.

For the benchmarks I used the sysbench fileio test.
All raw results are available on the Percona benchmarks wiki; here I will highlight the most interesting points.

A couple of words on the tested setups. We used two configurations:

  • In the first runs (software), each drive is connected to the Adaptec as an individual device, so the OS sees 8 individual drives. The drives are configured in software RAIDs: RAID0, RAID10, RAID5, RAID50.
  • In the second round (marked as hardware), each Adaptec is configured as a hardware RAID0 over 4 drives, so the OS sees 2 devices. The devices are combined in software RAID0 and RAID10, with different OS schedulers over each device.

I should note that I do not see production use for RAID0, as in my opinion SSD drives
have not yet reached a sufficient level of reliability (see the comments on the FusionIO 320GB MLC benchmarks post); however, I include the results to show the theoretical maximum.

So let’s start with random reads:


I’d say the SSDs show results comparable to FusionIO at 16+ threads, but at 4-8 threads the difference is significant. On SSD you can get 160MB/sec with 4 threads and 260MB/sec with 8 threads.

Random writes:

There are a couple of things to note (besides the SSDs doing much worse than FusionIO).
1. Something is wrong with random-write scaling in this setup. It is a point for research;
I think there is some serialization in the Linux software RAID, the Linux scheduler, or at the Adaptec hardware level.
2. Drives connected in hardware RAID show worse results than drives connected as single devices (you can see the results on http://www.percona.com/docs/wiki/benchmark:ssd:start, in the summary table, randwr rows)
3. For drives connected in hardware RAID, DEADLINE performs much worse than CFQ.

Sequential reads:

– For sequential reads you can get pretty decent results from the SSDs (230MB/sec at 4+ threads)
– Drives connected in hardware RAID do much better
– DEADLINE outperforms CFQ here (the opposite of random writes)
– Software RAID0 performed quite badly, so I chose to show the hardware RAID0 results

Sequential writes:

I’d say sequential writes are a hard task for both the SSDs and FusionIO; they do not scale well.
You may want to look into other options if your load requires sequential writes (e.g. I
put the InnoDB transaction logs on rotating drives instead of SSD in my InnoDB on FusionIO benchmarks).

So in summary I can say:

  • With SSD drives you can get decent results for random and sequential reads.
    I think they are competitive with FusionIO on a price/performance basis (remember FusionIO is twice as expensive).
  • Random writes did not work as I expected, and this is a point for investigation.
  • RAID5, as expected, is only competitive for reads, not writes.
  • The complexity of running 8 SSD drives can be significant. You will want to
    evaluate the different options: connecting them in software or hardware RAID, which scheduler to
    pick, etc. I suggest running sysbench fileio or similar (e.g. iozone) to check whether you get the performance you expect.
  • In my opinion, maintaining 8 SSD drives per server is much more hassle than
    dealing with a single FusionIO card; however, an important point is that with SSDs you can
    hot-remove and insert drives, while FusionIO, which plugs into the PCI-E bus, requires
    a server shutdown to replace.

Entry posted by Vadim |



How Expensive is MySQL Replication for the Master?

I generally thought of MySQL replication as quite low overhead on the master, depending on the number of slaves. What kind of extra load does a slave cause? It just gets a copy of the binary log streamed to it, and all slaves typically read the last few events in the binary log, so those are in the cache. In most cases having some 10 slaves is relatively low overhead and does not impact master performance much. However, I ran into a case where performance was significantly impacted by going from 2 to 4 slaves on the box. Why?

The reason was that the master had a lot of very small updates going on: over 15,000 transactions a second. Each time an event is logged to the binary log, all 4 threads feeding binary logs to the slaves have to be woken up to send the update notification. With 4 slaves connected, this means 60,000 thread wakeups per second, sending some 60,000 packets (there may be some buffering on the TCP side merging multiple sends, but still).

I guess this scenario just has not caught any developer’s attention yet, as it should be rather easy to optimize. Just as network cards are designed to throttle the number of interrupts they generate and process several packets at a time, we could make replication threads wake up in batches. For example, we could tune the system to wake up the thread feeding a slave no more often than 1000 times a second, with each wakeup sending multiple events to the slave. It should be possible to make this number tunable, as rarer wakeups mean less overhead but can also add a bit of replication latency. It is also possible to auto-detect a good value by timing how long it really takes all threads to send their data to the slaves. If you have a large number of slaves, the delay from an event executing on the master to the last thread sending its packet to a slave can be significant.
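The batching idea can be sketched as a small pure function: given event timestamps and a minimum interval between wakeups, coalesce the events so that each wakeup delivers a batch. This is an illustration in Python, not MySQL internals; the function name and the interval are invented for the example.

```python
def batch_wakeups(event_times, min_interval):
    """Coalesce event timestamps into wakeup batches: a new wakeup fires only
    if at least min_interval has elapsed since the previous wakeup; events
    arriving sooner ride along with the pending batch."""
    batches = []
    last_wakeup = None
    for t in event_times:
        if last_wakeup is None or t - last_wakeup >= min_interval:
            batches.append([t])        # new wakeup, new batch
            last_wakeup = t
        else:
            batches[-1].append(t)      # merged into the pending wakeup
    return batches

# 15 events arriving 100 microseconds apart, wakeups capped at one per
# 1000 microseconds: 2 wakeups instead of 15.
events = list(range(0, 1500, 100))
print(len(batch_wakeups(events, 1000)))  # 2
```

At 15,000 transactions a second, capping wakeups at 1,000 per second per slave thread would cut the 60,000 wakeups down to at most 4,000 across 4 slaves, at the cost of up to 1 ms of added replication latency.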

What does this case teach me in general? To always look at the data and question your assumptions. If something is unlikely to be the bottleneck, that does not mean it is not :)

Entry posted by peter |



2010 Percona Training Schedule

After a nice long vacation, it’s time to unveil our destinations for public classes in 2010.  We are now offering a course for Developers as well as DBAs.  The dates are:

Do you run a meetup group in one of the cities listed? Get in touch! I would be happy to come along and give a presentation, or just answer your MySQL questions over a dinner or beer.  You can reach me via @percona or email.

Entry posted by Morgan Tocker |



XtraDB storage engine release 1.0.6-9

Dear Community,

As of today, Release 9 of the XtraDB storage engine is available.

The release includes the following new features:

  • The release is based on version 1.0.6 of the InnoDB plugin.
  • MySQL 5.1.42 is used as the base release.
  • Separate purge thread and LRU dump are implemented (these features were actually added in Release 8, but somehow went unmentioned).
  • New patch innodb_relax_table_creation.
  • Added extended statistics to the slow log.
  • Adjusted defaults with performance in mind.
  • Added a parameter to control checkpoint age.
  • Added recovery statistics output during crash recovery (disabled by default).
  • Patch to dump and restore the innodb_buffer_pool.

Fixed bugs:

  • Bug #488315: renaming columns and adding a unique index cause serious inconsistency with mysqld
  • Bug #495342: “MySQL 5.1.41 + InnoDB 1.0.6 + XtraDB patches extensions-1.0.6, rev. 127 fails to compile on OpenSolaris”
  • Bug #47621: changed the fix of http://bugs.mysql.com/47621 to be more reasonable

The builds for RedHat 4 and 5, Debian, and Ubuntu are located at http://www.percona.com/mysql/xtradb/5.1.42-9/ . As of Release 9, .tar.gz packages are available for FreeBSD.

There is a Percona-release RPM. It installs Percona.repo into /etc/yum.repos.d/ along with the public key we use to sign RPMs. A copy of the key is located on our web site.

Once you install the Percona-release repo, upgrading becomes as easy as

  yum update MySQL-server-percona MySQL-client-percona

The latest source code of XtraDB, including the development branch, can be found on Launchpad.

Please report any bugs found on Bugs in Percona XtraDB Storage Engine for MySQL.
For general questions use our Percona-discussions group, and for development questions the Percona-dev group.

For support, commercial, and sponsorship inquiries, contact Percona.

Entry posted by Aleksandr Kuzminsky |

