As Mark pointed out, there isn’t a lot of detail in the release notes about what could potentially be a very serious problem that is fixed in MySQL 5.1.60. I’ll repeat here the full documentation from the release notes:

“InnoDB Storage Engine: Data from BLOB columns could be lost if the server crashed at a precise moment when other columns were being updated in an InnoDB table. (Bug #12704861)”

This wasn’t much detail…so I went digging. Even though the BZR trees have not yet been updated with the 5.1.60 release tags, there’s enough history there to attempt to piece together what the story is.

Sometime after 5.1.57, Marko fixed “Bug#12612184 Race condition after btr_cur_pessimistic_update()” (revision id: ). I mention this bug as it is mentioned in the commit message for the fix for the bug we’re interested in, 12704861.

The fix for our bug is in revision id (so you can find it in the bzr tree if you’re so interested). The first paragraph of the commit message is rather telling:

The fix of Bug#12612184 broke crash recovery. When a record that contains off-page columns (BLOBs) is updated, we must first write redo log about the BLOB page writes, and only after that write the redo log about the B-tree changes. The buggy fix would log the B-tree changes first, meaning that after recovery, we could end up having a record that contains a null BLOB pointer.

So, from this we can tell that crash recovery for updated records that contain BLOBs is broken in 5.1.58 and 5.1.59. Now, just because this bug exists doesn’t mean you’ll hit it all the time: InnoDB has carried recovery bugs for long stretches before, and people only seldom hit them.

The last paragraph of the commit message is also interesting:

There is a known issue in tablespace extension. If the request to allocate a BLOB page leads to the tablespace being extended, crash recovery could see BLOB writes to pages that are off the tablespace file bounds. This should trigger an assertion failure in fil_io() at crash recovery. The safe thing would be to write redo log about the tablespace extension to the mini-transaction of the BLOB write, not to the mini-transaction of the record update. However, there is no redo log record for file extension in the current redo log format.

What does this mean? If writing a BLOB causes the tablespace to be extended and you crash at exactly the wrong moment, you can end up in a situation where crash recovery will itself crash. The last sentence (“there is no redo log record for file extension in the current redo log format”) suggests to me that if we were to go looking for bugs, we could certainly find more around various operations and growing tablespace files. The two obvious recommendations for avoiding such bugs, don’t use autoextend and create huge initial tablespaces, both have drawbacks. Perhaps a compromise is to make the autoextend increment large enough that extensions happen rarely, which greatly reduces your exposure to such bugs.
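For example, the growth step of the auto-extending system tablespace is controlled by innodb_autoextend_increment (in megabytes, default 8); here is a minimal sketch of raising it, with 64MB chosen purely for illustration:

-- extend the system tablespace in 64MB steps instead of the default 8MB
SET GLOBAL innodb_autoextend_increment = 64;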

For completeness, I’ll include the full text of Marko’s commit message for the fix as it’s interesting reading for those of us who are into InnoDB internals:

The fix of Bug#12612184 broke crash recovery. When a record that contains off-page columns (BLOBs) is updated, we must first write redo log about the BLOB page writes, and only after that write the redo log about the B-tree changes. The buggy fix would log the B-tree changes first, meaning that after recovery, we could end up having a record that contains a null BLOB pointer.

Because we will be redo logging the writes off the off-page columns before the B-tree changes, we must make sure that the pages chosen for the off-page columns are free both before and after the B-tree changes. In this way, the worst thing that can happen in crash recovery is that the BLOBs are written to free pages, but the B-tree changes are not applied. The BLOB pages would correctly remain free in this case. To achieve this, we must allocate the BLOB pages in the mini-transaction of the B-tree operation. A further quirk is that BLOB pages are allocated from the same file segment as leaf pages. Because of this, we must temporarily “hide” any leaf pages that were freed during the B-tree operation by “fake allocating” them prior to writing the BLOBs, and freeing them again before the mtr_commit() of the B-tree operation, in btr_mark_freed_leaves().

btr_cur_mtr_commit_and_start(): Remove this faulty function that was introduced in the Bug#12612184 fix. The problem that this function was trying to address was that when we did mtr_commit() the BLOB writes before the mtr_commit() of the update, the new BLOB pages could have overwritten clustered index B-tree leaf pages that were freed during the update. If recovery applied the redo log of the BLOB writes but did not see the log of the record update, the index tree would be corrupted. The correct solution is to make the freed clustered index pages unavailable to the BLOB allocation. This function is also a likely culprit of InnoDB hangs that were observed when testing the Bug#12612184 fix.

btr_mark_freed_leaves(): Mark all freed clustered index leaf pages of a mini-transaction allocated (nonfree=TRUE) before storing the BLOBs, or freed (nonfree=FALSE) before committing the mini-transaction.

btr_freed_leaves_validate(): A debug function for checking that all clustered index leaf pages that have been marked free in the mini-transaction are consistent (have not been zeroed out).

btr_page_alloc_low(): Refactored from btr_page_alloc(). Return the number of the allocated page, or FIL_NULL if out of space. Add the parameter “mtr_t* init_mtr” for specifying the mini-transaction where the page should be initialized, or if this is a “fake allocation” (init_mtr=NULL) by btr_mark_freed_leaves(nonfree=TRUE).

btr_page_alloc(): Add the parameter init_mtr, allowing the page to be initialized and X-latched in a different mini-transaction than the one that is used for the allocation. Invoke btr_page_alloc_low(). If a clustered index leaf page was previously freed in mtr, remove it from the memo of previously freed pages.

btr_page_free(): Assert that the page is a B-tree page and it has been X-latched by the mini-transaction. If the freed page was a leaf page of a clustered index, link it by a MTR_MEMO_FREE_CLUST_LEAF marker to the mini-transaction.

btr_store_big_rec_extern_fields_func(): Add the parameter alloc_mtr, which is NULL (old behaviour in inserts) and the same as local_mtr in updates. If alloc_mtr!=NULL, the BLOB pages will be allocated from it instead of the mini-transaction that is used for writing the BLOBs.

fsp_alloc_from_free_frag(): Refactored from fsp_alloc_free_page(). Allocate the specified page from a partially free extent.

fseg_alloc_free_page_low(), fseg_alloc_free_page_general(): Add the parameter “mtr_t* init_mtr” for specifying the mini-transaction where the page should be initialized, or NULL if this is a “fake allocation” that prevents the reuse of a previously freed B-tree page for BLOB storage. If init_mtr==NULL, try harder to reallocate the specified page and assert that it succeeded.

fsp_alloc_free_page(): Add the parameter “mtr_t* init_mtr” for specifying the mini-transaction where the page should be initialized. Do not allow init_mtr == NULL, because this function is never to be used for “fake allocations”.

mtr_t: Add the operation MTR_MEMO_FREE_CLUST_LEAF and the flag mtr->freed_clust_leaf for quickly determining if any MTR_MEMO_FREE_CLUST_LEAF operations have been posted.

row_ins_index_entry_low(): When columns are being made off-page in insert-by-update, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.

row_build(): Correct a comment, and add a debug assertion that a record that contains NULL BLOB pointers must be a fresh insert.

row_upd_clust_rec(): When columns are being moved off-page, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.

buf_reset_check_index_page_at_flush(): Remove. The function fsp_init_file_page_low() already sets bpage->check_index_page_at_flush=FALSE.

There is a known issue in tablespace extension. If the request to allocate a BLOB page leads to the tablespace being extended, crash recovery could see BLOB writes to pages that are off the tablespace file bounds. This should trigger an assertion failure in fil_io() at crash recovery. The safe thing would be to write redo log about the tablespace extension to the mini-transaction of the BLOB write, not to the mini-transaction of the record update. However, there is no redo log record for file extension in the current redo log format.

rb:693 approved by Sunny Bains


Percona Server 5.5.17-22.1 released

Percona is glad to announce the release of Percona Server 5.5.17-22.1 on November 19th, 2011 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.5.17 and including all the bug fixes in it, Percona Server 5.5.17-22.1 is now the current stable release in the 5.5 series. All of Percona’s software is open source and free. All the details of the release can be found in the 5.5.17-22.1 milestone at Launchpad or in the Release Notes for 5.5.17-22.1 in the documentation.

Bug Fixes

  • MyISAM repair-by-sort buffer could not be greater than 4GB even on 64bit architectures. Bug Fixed: #878404 (Alexey Kopytov).
  • The kill idle transactions feature in XtraDB (if enabled) could sometimes cause the server to crash. Bug Fixed: #871722 (Yasufumi Kinoshita).
  • In a master-master setup when using SET user variables it was possible to have SHOW SLAVE STATUS give incorrect output due to a corrupted relay log. Bug Fixed: #860910 (Alexey Kopytov).

MySQL Life Cycle. Your Feedback is needed.

When I started with MySQL 3.22 I would run MySQL from early beta (if not alpha) and update it the same day a release hit the web. Since then I have matured, and so has the MySQL ecosystem. MySQL is powering far more demanding and business-critical applications now than it did 12 years ago, and it is also a much more complicated product with more things that can go wrong. I see a lot fewer customers running their systems on non-GA software and a lot more customers running MySQL a few major versions behind the most current one: if you’re not running the most demanding or quickly changing application, that may well be good enough.

Many users take a rather traditional, prudent approach these days and will only start considering an upgrade to the next major MySQL version 6+ months after its release. MySQL 5.5 has been out for almost a year, and I would still estimate more people are running MySQL or Percona Server 5.1 than 5.5. Facebook also sticks with MySQL 5.1, albeit a highly patched and improved version of it.

MySQL 5.0, which went GA in October 2005, is in the minority these days, though it is not exceptional to see it, with a number of people jumping from version 5.0 straight to 5.5. That means 6 years since the GA release date, which is quite respectable.

The cases where you see old MySQL versions in use are typically when the application is not overly demanding, so the performance improvements in recent releases are not critical. Older applications which have not been actively developed recently are also more likely to run on an old MySQL release, which is understandable: an upgrade requires extra QA effort and possibly application changes, which might not be practical for old applications. For the same reasons we often see old MySQL versions not being upgraded to the most recent release in their series. It is frequent to see something like MySQL 5.0.44, which shipped as the packaged release in some distributions.

As many people use the MySQL releases which come with their distributions, and some distributions do not offer an out-of-the-box upgrade to the next major MySQL version, it is not a big surprise that people running older Linux distributions are more likely to stay on old MySQL releases.

Now, where I’m looking for your feedback is in understanding what kind of version and platform support you think is adequate for MySQL. Oracle seems to have extended support for MySQL versions rather significantly, with Extended Support going 8 years past the release date. This does not seem to apply to MySQL 5.0, though, which is described as EOL in the manual even though it was released just about 6 years ago. With supported platforms it gets a bit more confusing. Per this table, RHEL4 is supported all the way to MySQL 5.5, yet it does not seem to be available from the download pages. It looks like that page is outdated in more places: support for Itanium was announced to be stopped, yet that page still mentions it.

Do you think such policies are reasonable? Do we need a longer support cycle, or would a shorter one make sense? How long do older platforms need to be supported? I think, for example, it makes sense to support RHEL4 with MySQL 5.1, but support with MySQL 5.5 is not needed, as people doing a major MySQL upgrade usually have the option to upgrade the operating system as well.

Or is upgrading MySQL to a new version while sticking to a very old operating system important for you? Finally, I’d like to know: what is the oldest platform/operating system you are running MySQL on now? Do we have any RHEL4 still in production? RHEL3?

I need your feedback so we can understand what policies would make sense for our builds and MySQL Support offerings. Supporting many platforms and versions for a long time takes resources which could otherwise be used to do various cool stuff with Percona Server, Percona Xtrabackup and the other open source software Percona develops.

I would appreciate your feedback on this matter, either as comments on this blog post or as direct email to my initials


Eventual Consistency in MySQL

We’re told that foreign key constraints are the best way to enforce logical referential integrity (RI) in SQL, preventing rows from becoming orphaned.  But then we learn that the enforcement of foreign keys incurs a significant performance overhead. [1][2]

MySQL allows us to set FOREIGN_KEY_CHECKS=0 to disable enforcement of RI when the overhead is too high.  But if you later restore enforcement, MySQL does not immediately scan all your data to verify that you haven’t broken any references while the enforcement was disabled.  That’s undoubtedly the best choice, but it leaves our database in a strange state where RI constraints are active, but we cannot assume that all our data satisfies RI.  At some point we do want to perform some quality control, to verify consistency.  How can we do that?
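For instance, a bulk load might be wrapped like this (a minimal sketch; the file and table names are only placeholders):

SET FOREIGN_KEY_CHECKS = 0;
-- load or repair data without paying the foreign key enforcement cost
LOAD DATA INFILE '/tmp/bar.csv' INTO TABLE Bar;  -- placeholder path and table
SET FOREIGN_KEY_CHECKS = 1;
-- re-enabling checks does NOT re-validate rows written while checks were off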

Quality Control Queries

We need to check for orphaned rows in every parent-child relationship in the database.  That is, do an exclusion join from the child table to its referenced parent table; if no matching parent row is found, the child is an orphan.  For example, suppose we have a parent table Foo and a child table Bar:
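The tables might be defined like this; the INT column types are an assumption, and the unnamed foreign key will be auto-named Bar_ibfk_1 by InnoDB, which is the constraint name that shows up later in this post:

CREATE TABLE Foo (
  A INT NOT NULL,
  B INT NOT NULL,
  PRIMARY KEY (A, B)
) ENGINE=InnoDB;

CREATE TABLE Bar (
  ID INT NOT NULL PRIMARY KEY,
  X INT NOT NULL,
  Y INT NOT NULL,
  FOREIGN KEY (X, Y) REFERENCES Foo (A, B)
) ENGINE=InnoDB;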


You can find orphaned rows in Bar:
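Using an exclusion join, with the same row-constructor comparison the generated queries later in this post use:

SELECT Bar.ID
  FROM Bar
  LEFT OUTER JOIN Foo ON (Bar.X, Bar.Y) = (Foo.A, Foo.B)
 WHERE Foo.B IS NULL;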


If the result set of this query is empty, then there are no orphaned rows.

But there are probably hundreds of foreign key relationships in any complex database.  We can find all of them by querying the INFORMATION_SCHEMA: the KEY_COLUMN_USAGE table contains information about both primary keys and foreign keys, and if the REFERENCED_* columns are non-null, the row describes a foreign key.
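A query along these lines produces output like the excerpt below (only the first several columns of each row are reproduced here):

SELECT *
  FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
 WHERE TABLE_SCHEMA = 'test'
   AND REFERENCED_TABLE_NAME IS NOT NULL\G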

*************************** 1. row ***************************
           CONSTRAINT_CATALOG: def
            CONSTRAINT_SCHEMA: test
              CONSTRAINT_NAME: Bar_ibfk_1
                TABLE_CATALOG: def
                 TABLE_SCHEMA: test
                   TABLE_NAME: Bar
                  COLUMN_NAME: X
             ORDINAL_POSITION: 1
*************************** 2. row ***************************
           CONSTRAINT_CATALOG: def
            CONSTRAINT_SCHEMA: test
              CONSTRAINT_NAME: Bar_ibfk_1
                TABLE_CATALOG: def
                 TABLE_SCHEMA: test
                   TABLE_NAME: Bar
                  COLUMN_NAME: Y
             ORDINAL_POSITION: 2
2 rows in set (0.00 sec)

This shows that test.Bar has columns X,Y which reference test.Foo columns A,B.

Generating SQL with SQL

Now that we know the information in KEY_COLUMN_USAGE, we can write a query to fetch every distinct KEY_COLUMN_USAGE.CONSTRAINT_NAME where the REFERENCED_* columns are non-null.  From that information, we can generate an exclusion-join query for each foreign key relationship:
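A simplified sketch of such a generator is shown here; it emits one exclusion-join statement per foreign key constraint. Unlike the query that produced the row below (which also looked up each child table's primary key so that orphans are reported by their ID), this version simply reports the child table's foreign-key columns:

-- one exclusion-join statement per foreign key constraint;
-- group_concat_max_len may need raising for schemas with long identifiers
SELECT CONCAT(
         'SELECT ',
         GROUP_CONCAT(CONCAT(K.CONSTRAINT_NAME, '.', K.COLUMN_NAME)
                      ORDER BY K.ORDINAL_POSITION),
         ' FROM ', MIN(K.TABLE_SCHEMA), '.', MIN(K.TABLE_NAME), ' AS ', K.CONSTRAINT_NAME,
         ' LEFT OUTER JOIN ', MIN(K.REFERENCED_TABLE_SCHEMA), '.', MIN(K.REFERENCED_TABLE_NAME),
         ' AS ', MIN(K.REFERENCED_TABLE_NAME),
         ' ON (', GROUP_CONCAT(CONCAT(K.CONSTRAINT_NAME, '.', K.COLUMN_NAME)
                               ORDER BY K.ORDINAL_POSITION),
         ') = (', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME)
                               ORDER BY K.ORDINAL_POSITION),
         ') WHERE ', MIN(K.REFERENCED_TABLE_NAME), '.', MAX(K.REFERENCED_COLUMN_NAME), ' IS NULL;'
       ) AS _SQL
  FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS K
 WHERE K.REFERENCED_TABLE_NAME IS NOT NULL
 GROUP BY K.CONSTRAINT_SCHEMA, K.CONSTRAINT_NAME;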


 | _SQL                                                                                                                                                                   |
 | SELECT Bar_ibfk_1.ID AS `test.Bar.ID` FROM test.Bar AS Bar_ibfk_1 LEFT OUTER JOIN test.Foo AS Foo  ON (Bar_ibfk_1.X,Bar_ibfk_1.Y) = (Foo.A,Foo.B) WHERE Foo.B IS NULL; |
 1 row in set (0.00 sec)

It can be a dizzying exercise to write SQL queries that in turn output valid SQL queries, but with a little care and good testing, we can do it.

Using The Quality Control Query

We now have a query that can verify the referential integrity between Foo and Bar.  If we run this query and it returns a non-empty result set, it shows us which rows in Bar are orphans.

mysql> INSERT INTO Foo (A,B) VALUES (111,2222), (333,444);
mysql> INSERT INTO Bar (ID,X,Y) VALUES (21,333,444);

mysql> INSERT INTO Bar (ID,X,Y) VALUES (42,555,666); -- THIS IS AN ORPHAN

mysql> SELECT Bar_ibfk_1.ID AS `test.Bar.ID` FROM test.Bar AS Bar_ibfk_1 LEFT OUTER JOIN test.Foo AS Foo  ON (Bar_ibfk_1.X,Bar_ibfk_1.Y) = (Foo.A,Foo.B) WHERE Foo.B IS NULL;
| test.Bar.ID |
|          42 |
1 row in set (0.00 sec)

That shows that the quality control query can find orphans.  Because we have a way to generate quality control queries automatically, we can run them at any time:

$ cat generate_foreign_key_checks.sql | mysql -N | mysql -E
*************************** 1. row ***************************
test.Bar.ID: 42


We can get information about foreign key constraints from the INFORMATION_SCHEMA, but what if our database includes relationships for which we never declared foreign key constraints at all? There’s no foreign key constraint, therefore the INFORMATION_SCHEMA gives us no information about the relationship. That’s okay, we can create a conventional table with similar structure, and track our database design manually, so we can use the same SQL queries for quality control:
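A sketch of such a tracking table follows; the column types and primary key are assumptions, and only the columns the generator query needs are included. We can then record each undeclared relationship by hand, as in the INSERT below (whose REFERENCED_COLUMN_NAME value is likewise an assumption):

CREATE TABLE test.MY_KEY_COLUMN_USAGE (
  CONSTRAINT_SCHEMA        VARCHAR(64) NOT NULL,
  CONSTRAINT_NAME          VARCHAR(64) NOT NULL,
  TABLE_SCHEMA             VARCHAR(64) NOT NULL,
  TABLE_NAME               VARCHAR(64) NOT NULL,
  COLUMN_NAME              VARCHAR(64) NOT NULL,
  ORDINAL_POSITION         INT NOT NULL,
  REFERENCED_TABLE_SCHEMA  VARCHAR(64),
  REFERENCED_TABLE_NAME    VARCHAR(64),
  REFERENCED_COLUMN_NAME   VARCHAR(64),
  PRIMARY KEY (CONSTRAINT_SCHEMA, CONSTRAINT_NAME, ORDINAL_POSITION)
);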

INSERT INTO test.MY_KEY_COLUMN_USAGE
          SET CONSTRAINT_SCHEMA = 'ecommerce',
                CONSTRAINT_NAME = 'fk_lineitems',
                   TABLE_SCHEMA = 'ecommerce',
                     TABLE_NAME = 'LineItems',
                    COLUMN_NAME = 'order_id',
               ORDINAL_POSITION = 1,
        REFERENCED_TABLE_SCHEMA = 'ecommerce',
          REFERENCED_TABLE_NAME = 'Orders',
         REFERENCED_COLUMN_NAME = 'order_id';  -- assumed: the referenced column name was not preserved in the original

All that remains is to employ test.MY_KEY_COLUMN_USAGE in our query generator script instead of INFORMATION_SCHEMA.KEY_COLUMN_USAGE.
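Or, if we want one run of the generator to cover both declared and undeclared relationships, its FROM clause could read from a union of the two sources instead, something along these lines:

SELECT CONSTRAINT_SCHEMA, CONSTRAINT_NAME, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME,
       ORDINAL_POSITION, REFERENCED_TABLE_SCHEMA, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
  FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
 WHERE REFERENCED_TABLE_NAME IS NOT NULL
UNION ALL
SELECT CONSTRAINT_SCHEMA, CONSTRAINT_NAME, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME,
       ORDINAL_POSITION, REFERENCED_TABLE_SCHEMA, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
  FROM test.MY_KEY_COLUMN_USAGE;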


Without real foreign key constraints, the database can’t prevent rows from being orphaned, and it’s almost a given that this will occur from time to time.  You can use techniques like the above to automate quality control and detect orphans early so you can correct them.


[1] Innodb locking and Foreign Keys

[2] Instrumentation and the cost of Foreign Keys


Stewart speaking at OSDC 2011

I’ve just arrived at ANU in Canberra for the Open Source Developers Conference 2011 (OSDC). I’ve spoken at several of the past OSDCs that have been held around Australia: 2005, 2007, 2008, 2010 and now 2011. It’s one of those conferences with great energy and great people that’s organised by dedicated members in the community who build the conference they want to go to.

I’ll be giving two talks this year:


Sphinx 2.0.2 Beta is released, Sphinx Users Conference in December

My friends at Sphinx Technologies have finally released a new beta of Sphinx, Sphinx 2.0.2. It includes about 6 months of development, with over 30 new features and tons of bug fixes. I’m happy to see how Sphinx 2.0 is shaping up: a lot of rough corners are being polished, and I’m hopeful we will see a very solid stable Sphinx 2.0 within the next 3 to 6 months. In fact, Sphinx 2.0.3-rc is promised within 1 month.

What MySQL users will find interesting in this release is a lot of work on SphinxQL, Sphinx’s SQL-like language. It now supports a number of statements such as SET NAMES, which makes it possible to connect to Sphinx as if it were a MySQL server using the more advanced connectors that issue such commands when they set up a connection.

The Sphinx team also announced a Sphinx Users Conference in St. Petersburg, Russia on December 4th. This is going to be a free, Russian-speaking event.

This is the second Sphinx event in Russia, and I really wish there were an international Sphinx conference somewhere in the US or Western Europe. I hope we’ll see something like it in 2012.

Another thought: with InnoDB full-text search coming up in MySQL 5.6, and claimed to be faster than MyISAM’s, I really would like to see some benchmarks comparing it to Sphinx, especially to Sphinx’s real-time indexes. An easy-to-understand feature comparison table would also be helpful, as in many cases people use Sphinx not because it is faster (speed might not be the main thing on smaller collections) but because of its various full-text search features.


Fishing with dynamite, brought to you by the randgen and dbqp

I tend to speak highly of the random query generator as a testing tool and thought I would share a story that shows how it can really shine. At our recent dev team meeting, we spent approximately 30 minutes of hack time to produce test cases for 3 rather hard to duplicate bugs. Of course, I would also like to think that the way we have packaged our randgen tests into unittest format for dbqp played some small part, but I might be mildly biased.

The best description of the randgen’s power comes courtesy of Andrew Hutchings – “fishing with dynamite”. This is a very apt metaphor for how the tool works – it can be quite effective for stressing a server and finding bugs, but it can also be quite messy, possibly even fatal if one is careless. ; ) However, I am not writing this to share any horror stories, but glorious tales of bug hunting!

The randgen uses yacc-style grammar files that define a realm of possible queries (provided you did it right…the zen of grammar writing is a topic for another day). Doing this allows us to produce high volumes of queries that are hopefully interesting (see previous comment about grammar-writing-zen).

It takes a certain amount of care to produce a grammar that is useful and interesting, but the gamble is that this effort will produce more interesting effects on the database than the hand-written queries that could be produced in similar time. This is especially useful when you aren’t quite sure where a problem is and are just trying to see what shakes out under a certain type of stress. Another win is that a well-crafted grammar can be used for a variety of scenarios. The transactional grammars that were originally written for testing Drizzle’s replication system have been reused many times (including for two of these bugs!)

This brings us to our first bug:
mysql process crashes after setting innodb_dict_size

The basics of this were that the server was crashing under load when innodb_dict_size_limit was set to a smaller value. In order to simulate the situation, Stewart suggested we use a transactional load against a large number of tables. We were able to make this happen in 4 easy steps:
1) Create a test case module that we can execute. All of the randgen test cases are structured similarly, so all we had to do was copy an existing test case and tweak our server options and randgen command line as needed.

2) Make an altered copy of the general-purpose percona.zz gendata file. This file is used by the randgen to determine the number, composition, and population of the test tables we want, and to generate them for us. The original reporter indicated they had a fair number of tables:

$tables = {
    rows => [1..50],
    partitions => [ undef ],
};

The value in the ‘rows’ section tells the data generator to produce 50 tables, with sizes from 1 row to 50 rows.

3) Specify the server options. We wanted the server to hit similar limits as the original bug reporter, but we were working on a smaller scale.
To make this happen, we set the following options in the test case:

server_requirements = [["--innodb-dict-size-limit=200k --table-open-cache=10"]]

Granted, these are insanely small values, but this is a test and we’re trying to do horrible things to the server ; )

4) Set up our test_* method in our testcase class. This is all we need to specify in our test case:

def test_bug758788(self):
    test_cmd = ("./ "
                "--gendata=conf/percona/innodb_dict_size_limit.zz "
                "--grammar=conf/percona/translog_concurrent1.yy "
                "--queries=1000 ")
    retcode, output = execute_randgen(test_cmd, test_executor, servers)
    self.assertTrue(retcode==0, output)

The test is simply to ensure that the server remains up and running under a basic transactional load.

From there, we only need to use the following command to execute the test:
./ --default-server-type=mysql --basedir=/path/to/Percona-Server --suite=randgen_basic innodbDictSizeLimit_test
This enabled us to reproduce the crash within 5 seconds.

The reason I think this is interesting is that we were unable to duplicate this bug otherwise. The combination of the randgen’s power and dbqp’s organization helped us knock this out with about 15 minutes of tinkering.

Once we had a bead on this bug, we went on to try a couple of other bugs:

Crash when query_cache_strip_comments enabled

For this one, we only modified the grammar file to include this as a possible WHERE clause for SELECT queries:

WHERE X . char_field_name != 'If you need to translate Views labels into other languages, consider installing the <a href=\" !path\">Internationalization</a> package\'s Views translation module.'

The test value was taken from the original bug report.
Similar creation of a test case file + modifications resulted in another easily reproduced crash.
I will admit that there may be other ways to go about hitting that particular bug, but we *were* practicing with new tools and playing with dynamite can be quite exhilarating ; )
parallel option breaks backups and restores

For this bug, we needed to ensure that the server used --innodb_file_per_table and that we used Xtrabackup‘s --parallel option. I also wanted to create multiple schemas, and we did so via a little randgen / python magic:

# populate our server with a test bed
test_cmd = "./ --gendata=conf/percona/bug826632.zz "
retcode, output = execute_randgen(test_cmd, test_executor, servers)
# create additional schemas for backup
for i in range(6):
    schema = schema_basename+str(i)
    query = "CREATE SCHEMA %s" %(schema)
    retcode, result_set = execute_query(query, master_server)
    self.assertEquals(retcode,0, msg=result_set)
    retcode, output = execute_randgen(test_cmd, test_executor, servers, schema)

This gave us 7 schemas, each with 100 tables (with rows 1-100). From here we take a backup with --parallel=50 and then try to restore it. These are basically the same steps we use in our basic_test from the xtrabackup suite; we just copied and modified the test case to suit our needs for this bug. With this setup, we get a crash / failure during the prepare phase of the backup. Interestingly, this only happens with this number of tables, schemas, and --parallel threads.

Not too shabby for about 30 minutes of hacking + explaining things, if I do say so myself. One of the biggest difficulties in fixing bugs comes from being able to recreate them reliably and easily. Between the randgen’s brutal ability to produce test data and queries and dbqp’s efficient test organization, we are now able to quickly produce complicated test scenarios and reproduce more bugs so our amazing dev team can fix them into oblivion : )


dbqp and Xtrabackup testing

So I’m back from the Percona dev team’s recent meeting.  While there, we spent a fair bit of time discussing Xtrabackup development.  One of our challenges is that as we add richer features to the tool, we need equivalent testing capabilities.  However, it seems a constant in the MySQL world that available QA tools often leave something to be desired.  The randgen is a literal wonder-tool for database testing, but it is also occasionally frustrating / doesn’t scratch every testing itch.  It is based on technology SQL Server was using in 1998 (MySQL began using it in ~2007, IIRC).  So this is no knock, it is merely meant to be an example of a poor QA engineer’s frustrations ; )  While the current Xtrabackup test suite is commendable, it also has its limitations. Enter the flexible, adaptable, and expressive answer: dbqp.

One of my demos at the dev meeting was showing how we can set up tests for Xtrabackup using the unittest paradigm.  While this sounds fancy, basically, we take advantage of Python’s unittest and write classes that use their code.  The biggest bit dbqp does is search the specified server code (to make sure we have everything we should), allocate and manage servers as requested by the test cases, and do some reporting and management of the test cases.  As the tool matures, I will be striving to let more of the work be done by unittest code rather than things I have written : )

To return to my main point, we now have two basic tests of xtrabackup:

Basic test of backup + restore:

  1. Populate server
  2. Take a validation snapshot (mysqldump)
  3. Take the backup (via innobackupex)
  4. Clean datadir
  5. Restore from backup
  6. Take restored state snapshot and compare to original state

Slave setup

  1. Similar to our basic test except we create a slave from the backup, replicating from the backed up server.
  2. After the initial setup, we ensure replication is set up ok, then we do additional work on the master and compare master and slave states

One of the great things about this is that we have the magic of assertions.  We can insert them at any point of the test we feel like validating and the test will fail with useful output at that stage.  The backup didn’t take correctly?  No point going through any other steps — FAIL! : )  The assertion methods just make it easy to express what behavior we are looking for.  We want the innobackupex prepare call to run without error?
Boom goes the dynamite!:

# prepare our backup
cmd = ("%s --apply-log --no-timestamp --use-memory=500M "
       "--ibbackup=%s %s" % (innobackupex
                             , xtrabackup
                             , backup_path))
retcode, output = execute_cmd(cmd, output_path, exec_path, True)
self.assertEqual(retcode, 0, msg = output)

From these basic tests, it will be easy to craft more complex test cases.  Creating the slave test was simply matter of adapting the initial basic test case slightly.  Our plans include: *heavy* crash testing of both xtrabackup and the server, enhancing / expanding replication tests by creating heavy randgen loads against the master during backup and slave setup, and other assorted crimes against database software.  We will also be porting the existing test suite to use dbqp entirely…who knows, we may even start working on Windows one day ; )

These tests are by no means the be-all-end-all, but I think they do represent an interesting step forward.  We can now write actual, honest-to-goodness Python code to test the server.  On top of that, we can make use of the included unittest module to give us all sorts of assertive goodness to express what we are looking for.  We will need to and plan to refine things as time moves forward, but at the moment, we are able to do some cool testing tricks that weren’t easily do-able before.

If you’d like to try these tests out, you will need the following:
* dbqp (bzr branch lp:dbqp)
* DBD::mysql installed (the tests use the randgen and this is required…hey, it is a WONDER-tool!) : )
* Innobackupex, a MySQL / Percona server and the appropriate xtrabackup binary.

The tests live in dbqp/percona_tests/xtrabackup_basic and are named and, respectively.

To run them:
$./ --suite=xtrabackup_basic --basedir=/path/to/mysql --xtrabackup-path=/mah/path --innobackupex-path=/mah/other/path --default-server-type=mysql --no-shm

Some next steps for dbqp include:
1)  Improved docs
2)  Merging into the Percona Server trees
3)  Setting up test jobs in Jenkins (crashme / sqlbench / randgen)
4)  Other assorted awesomeness

Naturally, this testing goodness will also find its way into Drizzle (which currently has a 7.1 beta out).  We definitely need to see some Xtrabackup test cases for Drizzle’s version of the tool (mwa ha ha!) >: )


MLC SSD card lifetime and write amplification

As MLC-based SSD cards rise in popularity, there is also rising concern about how long they can survive. As we know, an MLC NAND module can handle 5,000-10,000 erase cycles, after which it becomes unusable, so obviously an SSD card based on MLC NAND has a limited lifetime. There are a lot of misconceptions and misunderstandings about how long such a card can last, so I want to show some calculations to shed light on this question.

As a baseline I will take the Virident FlashMAX M1400 (1.4TB) card. Virident guarantees 15PB (PB as in petabytes) of writes on this card.
15PB sounds impressive, but how many years does it correspond to? Of course it depends on your workload, mainly on how write-intensive it is. But there are some facts that can help you estimate it.

On Linux you can look into the /proc/diskstats file, which shows something like:

 251       0 vgca0 30273954 0 968968610 416767 122670649 0 8492649856 19260417 0 19677184 220200747

where 8492649856 is the number of sectors written since the reboot (a sector is 512 bytes).

Now you might say that we can sample /proc/diskstats at a 1-hour interval, see how many bytes per hour we write, and in that way calculate the potential lifetime.
That would be only partially correct. There is a factor called write amplification, which is very well described on Wikipedia; basically, due to their internal organization, SSD cards write more data to the flash than comes from the application (write amplification = bytes physically written / bytes sent by the host).
Usually the write amplification is equal or very close to 1 (meaning there is no overhead) for sequential writes, and it reaches its maximum for fully random writes. That value can be 2 – 5 or more, and depends on many factors such as the used capacity and the space reserved for over-provisioning.

Basically it means you should look at the card’s own statistics to get the exact number of bytes written.
For the Virident FlashMAX it is:

vgc-monitor -d /dev/vgca  | grep writes
                                 379835046150144 (379.84TB) (writes)

Having this info, let’s take a look at what lifetime we can expect under a tpcc-mysql workload.
I ran 32 user threads against a 5000W dataset (about 500GB of data on disk) for 1 hour.

After 1 hour, /proc/diskstats shows 984,442,441,728 bytes written, which is 984.44GB, and the Virident stats show 1,125,653,692,416 bytes written, which is 1,125.65GB.
This allows us to calculate the write amplification factor, which in our case is
1,125,653,692,416 / 984,442,441,728 = 1.143. That looks very decent, but remember we use only 500GB out of 1400GB, and the factor will grow as we fill up more space.

Please note we applied quite an intensive write load during this hour.
MySQL handled 25,000 updates/sec, 20,000 inserts/sec and 1,500 deletes/sec, which corresponds to
a write throughput of 273.45MB/sec from MySQL to disk.

And it lets us calculate the lifetime of the card if we ran such a workload 24/7 non-stop:
15PB (of total writes) / 1125.65GB (per hour) = 13,325.634 hours = 555.23 days = 1.52 years

That is, under a non-stop tpcc-mysql workload we may expect the card to last about 1.52 years. However, in real production you do not have a uniform load every hour, so you may want to base your estimate on daily or weekly stats.

Unfortunately there is no easy way to predict this number until you start your workload on the SSD.
You can look at /proc/diskstats, but:
1. there is a write amplification factor which you do not know, and
2. the throughput on regular RAID is much lower than on SSD, so you do not know what your throughput will be once you put the workload on the SSD.


Installing and using Coffee-Script with Assetic on Windows

There are a lot of poor souls who are forced to use Windows as the OS for their daily development job. I’m one of them. I have even heard of people enjoying working on Windows ;) There is one thing they all have in common: using cutting-edge tools like CoffeeScript, node.js and others usually doesn’t work easily out of the box. But in the case of CoffeeScript there is good news. Since the release of Node.js 0.6 with native Windows support, everything works pretty well. Here is what you need to do to get it working.

Installing Node.JS

The best way to use CoffeeScript is to install it via Node.js. The setup of Node.js can easily be done with the new Windows installer package that is available on the project site. There is nothing special here. After Node.js is installed you will need to set up the Node.js package manager (npm). You will need to have a Git client installed on your machine. The best you can get is msysgit, which brings a Git bash console. Please make sure you have version 1.7.6 or newer installed on your PC. Open up a Git bash console and change to the folder where you want to install npm (C:\Program Files for example). After that execute the following commands:
> git config --system http.sslcainfo /bin/curl-ca-bundle.crt
> git clone --recursive git://
> cd npm
> node cli.js install npm -gf

You now have the Node.js package manager installed.

Installing CoffeeScript

Installing CoffeeScript is now a piece of cake. Just use the following command:
> npm install -g coffee-script
The -g switch ensures that your CoffeeScript installation is available system-wide. If everything went right you can now start the CoffeeScript commandline tool from the console.

Coffee-Script commandline

If you need tutorials or reference material for CoffeeScript here is what I found useful:

Using Coffee-Script with Assetic

Working with CoffeeScript and the command-line compiler leads to an uncomfortable development process. But there is light at the end of the tunnel. As with all assets, Assetic will be of great help when using CoffeeScript files in your Symfony project. The AsseticBundle is part of the Symfony2 Standard Edition, so there is no need to install anything. You just have to do some configuration work up front. This is done in the app/config/config.yml file, where you have to add a filter for CoffeeScript.
By default this filter looks in fixed folders for the CoffeeScript and node.js binaries. These paths are Unix ones, so we need to configure them to match our installation folders:
bin: c:/nodejs/bin/node.exe
node: c:/npm/bin/npm.exe
From now on there is no difference in using CoffeeScript files with Assetic compared to Unix systems. Here is how to use them in Twig templates:
{% javascripts filter="coffee"

{% endjavascripts %}

If you need more info on how to use CoffeeScript with Assetic, take a look at Richard Miller’s blog.

