Oct
30
2012
--

Percona Live London 2012 talks you do not want to miss

Percona Live London 2012 continues the streak of “not to be missed” events in London, following in the footsteps of the Royal Wedding, the Diamond Jubilee and the Summer Olympics. We have prepared a great set of Tutorials and Sessions for this two-day event. Here is my personal selection of the talks I’d love to attend, though I rarely have such a chance :)

MariaDB – All the New Features by Sergei Golubchik is a great overview of MariaDB features. If you’re interested in learning more about the current state of MariaDB, this is a great talk to attend.

Choosing Hardware for MySQL by Kenny Gryp is a great talk if you want to learn how to choose hardware for the best MySQL performance and maximum efficiency.

MySQL Compatible Open Source Connectors by Andrew Hutchings. There are more MySQL connectors available than the standard “libmysql”, and Andrew will tell you all about them. You should be especially interested in this talk if the GPL licensing requirements of the standard MySQL client libraries make them challenging to use in your environment.

common_schema: DBA’s framework for MySQL by Shlomi Noach is a talk about a very powerful set of views and routines that simplify many common DBA administration and analysis tasks.

Survey of Percona Toolkit: Command-line Tools for MySQL by Bill Karwin. This talk is a brief overview of 9 essential Percona Toolkit command line tools.

I’m also giving talks myself – the Using Percona Server in Production tutorial and Optimizing MySQL Configuration – and I’d love to see you at these.

Use the “Petersblog” discount code to sign up now and receive 20% off.

See you in London!

Oct
29
2012
--

Ocean of Dust: Cover reveal!

I’m really excited to reveal the cover for my upcoming OCEAN OF DUST, which will hit all good ebook retailers sometime in November. Watch this space. I won’t say anything else, since the cover speaks for itself. I think it’s stunning! A big thank you to Erin, the artist.

Copyright (c) 2012, EDHGraphics

Oct
29
2012
--

MySQL and predictive option handling – beware

MySQL command line utilities have an interesting property – if you type only a prefix of an option or command, the tool will go over the list of available commands, and if there is exactly one command matching that prefix it will execute it with no warning of any kind; otherwise it will report an error. For example, mysqladmin e works because there is only one command, “extended-status”, which starts with “e”. mysqladmin f, however, does not work because there are multiple commands which start with “f”.
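The resolution logic can be sketched roughly like this (a simplified illustration of unambiguous-prefix matching, not the actual mysqladmin source; the command list below is an abbreviated subset):

```python
def resolve_command(prefix, commands):
    """Return the single command matching the prefix, or None if it is
    ambiguous or unknown (mysqladmin reports an error in those cases)."""
    matches = [c for c in commands if c.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]  # executed silently, even for a one-letter prefix
    return None

# Abbreviated subset of mysqladmin commands, for illustration:
COMMANDS = ["extended-status", "flush-hosts", "flush-logs", "shutdown",
            "start-slave", "stop-slave"]

print(resolve_command("e", COMMANDS))     # extended-status - unique prefix
print(resolve_command("f", COMMANDS))     # None - ambiguous
print(resolve_command("stop", COMMANDS))  # stop-slave - NOT a shutdown!
```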

This behavior can cause all kinds of issues, especially if you make a typo. I made one today (thankfully on a test box). MySQL init scripts use “stop” and “start” commands to start and stop the server, while mysqladmin uses “shutdown”. I mixed these up while trying to stop Percona Server:

root@smt2:/var/lib/mysql# mysqladmin stop
Slave stopped

It would be quite confusing if mysqladmin had a “stop” command which did not do the same thing as “shutdown” – and in effect it does. The commands it actually has are “start-slave” and “stop-slave”, but as these are the only commands with those prefixes, “start” and “stop” are matched to them.

I think such automated option prediction is very dangerous, and it gets more dangerous as more options are added to a tool, since it becomes more likely that your mistake will actually match something… I think a much better solution would be explicitly created synonyms and shortcuts where they are needed. For example, “extended-status” is a rather long option, and we could allow an “extended” shortcut for it, or maybe even “ext”, but allowing “e” is overkill.

Oct
28
2012
--

Seeking opinions is dangerous to your sanity

We all seek opinions about everything, every day, either at home, school or work. We ask friends and family, we ask acquaintances on Facebook or other social media. Do you like this? What should I wear? Is that movie any good? Did you like this book? What kind of person is Joe? Was my report useful at the management meeting?

Of course, right now, the entire US is engaged in the largest mission of opinion-seeking there is – who do we think will make the best President. I think we are training ourselves that polls are of no value when both candidates are 1% apart with a 3% margin of error.

But this post isn’t about politics. I hear a sigh of relief. This post is about writing.

Throughout the process of writing our book, especially after the first draft, we begin to solicit the opinions of others. Creatives are by nature paranoid and insecure. Are we writing a pile of poop that no one will want to read? Our work represents months, often years of our life. We need opinions – we can’t complete it in a vacuum.

Let’s look at the pros and cons of soliciting certain groups:

  • Family: Often the easiest to ask since they are right on hand. They love us, they understand our egos. They will be gentle. But that’s the problem: They are inherently biased toward telling us that everything is great – they love our book.
  • Friends: Some friends will act just like family and sugarcoat their opinions. True friends should have the guts and enough mutual respect to tell us the good and the bad. There’s a hidden minefield here. They may understand when we ask for a brutally honest opinion, and they may deliver it, but somehow our brains fritz out when we hear tough criticism from a friend.
  • Writer’s Group: Now we are getting very honest, objective and useful opinions from others who are traveling the same road. They understand the craft, they understand the pitfalls, they’ve studied markets and genres just like we have. A good writer’s group operates on a covenant of mutual assistance: I help you make your book the best it can be, and you reciprocate. Fantastic. But… and you knew there was a but… writers can be overly picky, get deep in the weeds, and feel pressure to offer highly technical and detailed critiques. Many things that a writer’s group obsesses over, a reader will never notice. Often it can be easier to copyedit a piece with lots of markup than to step away from the pen and offer big-picture opinions. Are the characters progressing? Is the pacing right? Is the suspense building or flopping?
  • Beta Readers: Coupled with a writer’s group, now you have the best of both worlds. Readers won’t be as technical (though practiced ones can be), but won’t get caught in the weeds of the craft. They represent the “end-user” so their opinions should carry considerable weight. Don’t expect detailed critiques, but even a simple note at the end of each page or chapter can suffice to teach us a phenomenal amount. “I skim-read it, it bored me”. “I couldn’t wait to turn the page.” “Susan just wouldn’t behave that way.” “The ending was flat.” “OMG, Frank was just so horrible but I really admired Jane’s strength in dealing with him. I clapped and cheered when she pushed him over the cliff.”

We won’t talk about reviewers, since that is a completely different kettle of fish (and sharks).

So all these opinions are great, right? We can get constant feedback about whether we are on the right path. If it passes the writer’s group, and passes 2 or 3 beta readers, then I have a bestseller on my hands, right?

Er… no. But you knew that. You might have gathered a handful, maybe a dozen opinions along the way. How representative is that of your audience? Likely, many folks in your writer’s group don’t even read your genre. Beta readers will have their own biases. Just as we ask ourselves how accurate a CNN election poll of 30,000 people can be against a population of hundreds of millions, so should we ask ourselves whether we are seeking opinions from the right people, and from enough people.

By now we should have all accepted the truth that we can’t please everyone. We won’t even capture most of them. It all depends on your own definition of success. If 1 million people read your book, yet 90% of them disliked it, how much would you value those 100,000 fans? As an optimist, I call that a success, though the percentage tempts us to believe the opposite. All we are after is a core set of readers that like what we like. We must put opinions into perspective.

Here comes the reason for the title of my post. Chasing accurate opinions will drive you insane. How do I know? I just went through it.

Just a few weeks out from publishing my first book, I began to solicit opinions on my cover, on my blurb and hookline. What we’re looking for of course is a consensus. If I ask 20 people and 15 people agree, I have to assume that is representative and I can relax. Those are great odds. What I found were clusters of people sharing an opinion, but no clear leader. Stalemate! So I asked more people, seeking clarity. More opinions, more dilution of the majority, more insanity. I have to find a way to know which opinion is accurate! Who else can I ask?

Luckily, I stopped myself before my brain exploded. The problem was that I had written my best blurb and hookline, and worked with an artist on a superb cover, and expected everyone to agree. Not likely. Once I restructured the question in my head, everything became clear. Intuition tells me that I had crafted the materials to best represent my book. All I need to be sure of is that everyone isn’t unanimous AGAINST that. I won’t please everyone. I just need to ensure that I am pleasing SOMEONE(S).

Am I saying ignore opinions? Absolutely not. I have changed many decisions due to the opinions of my friends and family, and my book would be much weaker without my awesome writing group. What I am saying is weigh them against their inherent bias, and against your own intuition. You be the arbiter; don’t expect everyone to make the decision for you. That way lies madness.

Ask yourself the simple question: Am I being stubborn against an overwhelming trend of opinion? If the opinions are mixed – that’s OK. Embrace it. As long as you have been diligent in putting forward your best, be grateful you are pleasing someone. A mixed opinion, amplified by billions of readers worldwide, could still net you a wonderfully diverse readership.

Have you agonized over trying to please everyone? How do you handle it when opinions vary from awful to awesome?


Oct
28
2012
--

Percona XtraDB Cluster – installation and setup webinar follow up Q&A

Thanks to all who attended my webinar. I got many questions, and I want to take this opportunity to answer them.

Q: Even ntp has a delay of 0.3-0.4 between servers – does that mean a 0.25 as seen in the logs can be an issue?
A: My demo VMs were running for a few hours before the webinar in my local VirtualBox instance, and I tried to show the minimal installation required for XtraDB Cluster. Unfortunately, I didn’t include NTP, which caused SSTs to fail with the tar stream being from the future. The 0.3 – 0.4 second delay seems too much to me. According to the NTP standard, if the delay between the hosts is greater than 128 ms, NTP won’t work. So much for background and theory; from the practical point of view, you should have ntpd running, and you should monitor the stratum. If you are using ntp servers, the stratum should be 3 or less; you can monitor it with

echo sysinfo | ntpdc

For the record, when I set the clock on my VMs, they were off by almost 90s compared to the reference.
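To illustrate the 128 ms figure mentioned above (a rough sketch; real ntpd behavior around its step threshold is more nuanced than a single comparison):

```python
def ntp_can_slew(offset_seconds):
    """ntpd gradually slews the clock only for small offsets; beyond
    roughly 128 ms it cannot keep the hosts in sync this way."""
    STEP_THRESHOLD = 0.128  # seconds
    return abs(offset_seconds) <= STEP_THRESHOLD

print(ntp_can_slew(0.05))  # True  - a healthy offset
print(ntp_can_slew(0.35))  # False - the 0.3-0.4 s delay from the question
print(ntp_can_slew(90.0))  # False - my VMs, off by almost 90 s
```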

Q: You have to bootstrap it every time a node is down?
A: No. You have to bootstrap only when all nodes in the cluster are down and you bring up the first one. If a single node is down, when it comes back online it has to perform a state transfer, which can be either IST (Incremental State Transfer) or SST (State Snapshot Transfer). If the node has recent enough data – that is, the write sets it needs to apply are still in an online node’s gcache – it will only perform an IST. When I mention bootstrapping here and in the webinar, I mean wsrep_cluster_address=’gcomm://’ and not ‘pc.bootstrap=true’.
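To make the distinction concrete, here is a minimal my.cnf sketch (the IP address is made up for illustration):

```
# On the FIRST node, only when the whole cluster is down (bootstrap):
wsrep_cluster_address = 'gcomm://'

# On every other node (and on the first node once the cluster is running),
# point at an existing member so the node joins and requests IST/SST:
wsrep_cluster_address = 'gcomm://192.0.2.10'
```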

Q: Can you quickly show the my.cnf settings again?
A: Sure, see this post or webinar recording.

Q: Fault tolerance: Can you recommend a three node cluster for production use?
A: Yes, that’s the recommended configuration. The number of nodes/data centers should be odd, so you can always decide which part is in the majority in case of a network split.
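The reasoning behind the odd number is plain quorum arithmetic – after a split, only a partition holding a strict majority of nodes may keep serving. A simplified sketch (Galera’s actual quorum calculation also supports node weights):

```python
def has_quorum(partition_size, cluster_size):
    """A partition may keep serving only with a strict majority of nodes."""
    return partition_size > cluster_size / 2

# 3-node cluster split 2/1: the larger side keeps quorum, the other stops.
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False
# 4-node cluster split 2/2: neither side has a majority - everything stalls.
print(has_quorum(2, 4))  # False
```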

Q: How does latency and bandwidth affect performance? Is it practical to build the cluster over a WAN connection with ~5 ms latency?
A: Yes, it can be practical. The response time of your transactions will be affected by the latency, but since we are using parallel replication here, with many parallel threads, you can achieve high throughput despite the response time limitations. Even with parallel replication, however, the response time limits the number of transactions per second that you can do on a single record. This means that if you have hot database areas (single records that are modified very frequently), high latency can potentially be an issue; otherwise it is mitigated by parallel replication. In this case the cost of latency on commit will be 5 ms each time.
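Back-of-the-envelope arithmetic for the hot-record case (ignoring local execution time):

```python
# Updates to a SINGLE hot record serialize behind the commit latency,
# so a 5 ms WAN round trip caps that one record's update rate:
commit_latency = 0.005  # seconds added to every commit
max_updates_per_hot_record = 1 / commit_latency
print(max_updates_per_hot_record)  # 200.0 updates/second on that record

# Overall cluster throughput is not capped this way: independent
# transactions are applied by many parallel replication threads.
```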

Q: Can I change the replication method between servers?
A: Switching back and forth between Galera and built-in MySQL (asynchronous) replication for a given node is not possible. However, a node in PXC can be a master or a slave in asynchronous replication; Galera and built-in MySQL replication are independent of each other.

Q: I installed Percona as a cluster database working with Zabbix (InnoDB), do you have a recommended configuration?
A: We don’t have any specific recommendations for Zabbix, but general InnoDB tuning recommendations apply here. In large Zabbix deployments, you may want to partition your large history tables so the database can keep up with the writes, and/or rethink the retention of the raw data.

Q: What are the steps to automatically restart a Percona cluster after an outage?
A: It depends on the type of failure. First you can try to simply restart the nodes. If you can’t bring the cluster up that way, your last resort is to bootstrap the first node and do an SST/IST with the rest of the nodes.

Q: We are missing the SST_AUTH parameter, right? It is not possible to do SST without this, as per my testing… or am I wrong here?
A: In the demo I used a blank root password on all nodes, so it worked without it.

Q: For automatic cluster starting, should it be ok to have in [mysqld_safe] the following: wsrep_urls=gcomm://IP1:4567,gcomm://IP2:4567,gcomm://IP3:4567,gcomm://
A: This will work for automatic cluster starting, but unfortunately it can lead to inconsistency in case of a network split. If a network split happens, a node decides to shut down, and it is somehow restarted by this automatic process, it gets bootstrapped because of the trailing gcomm:// entry, and you can end up with 2 different clusters serving data to different groups of clients. So, the last gcomm:// option should be avoided. Although it can be helpful when you are testing certain scenarios, don’t forget to remove it before going to production.

Q: How would you migrate to XtradbCluster from a traditional Master Slave scenario when the data set is very large? Can you swap in Binaries ?
A: Check out Jay Janssen’s webinar about the topic.

Q: When one node in a 3 node cluster is synchronizing with the group, are the other two nodes available to serve data during the synchronization?
A: The synchronizing node has to get the data from somewhere, namely from another node. That donor node can remain available during the process if you are using the xtrabackup SST method. Even if you are using an SST method which is blocking (rsync, for example), you will still have one node available in this situation.

Q: Does galera async replication fit in with percona XtraDB Cluster? i.e. can I run a production cluster in synchronous, and a disaster recovery location replicated asynchronously?
A: You can use asynchronous replication between 2 PXC clusters, with one node of the first cluster being the master and one node of the second cluster being the slave. However, if you lose either of them, in most cases you can’t get a consistent binlog position to continue replication.

Q: I hope that one point that is addressed, if not part of the presentation then as a comment near the end, is how to start up a cluster from scratch (i.e. a standalone XtraDB Cluster machine, convert it to a cluster type, and add nodes to it)
A: For details on starting up a cluster from scratch, see my earlier blog post; for migration, see Jay Janssen’s webinar about the topic.

Q: Do you recommend this configuration for servers in different datacenters? Thanks!
A: Yes, as long as there are at least 3 datacenters, and the total number of datacenters is odd.

Q: How is the arbitrator configured?
A: You should use the garbd binary for that; it takes wsrep_cluster_address and wsrep_cluster_name as arguments. It will join the cluster, but won’t store or serve data.

Q: Is the configuration file my.cnf ?
A: Yes.

Q: If the cluster blocks writes while it commits to another host, does that not limit write capacity to the “weakest link”? Or have I missed something?
A: The cluster doesn’t block for the duration of the write, only while certifying that write. See the virtually synchronous replication of the cluster explained here.

Q: What about MyISAM storage? Is it safe to use? Does xtrabackup work as an SST method for MyISAM?
A: Yes, but for MyISAM the xtrabackup method is blocking, since MyISAM tables are backed up by the wrapper script called innobackupex, not by xtrabackup itself, while holding a lock with FLUSH TABLES WITH READ LOCK. MyISAM support is experimental; apart from some very special edge cases, I would not recommend using MyISAM.

Q: How does write performance scale with an increasing number of nodes?
A: Since all the nodes have to apply all the writes, you can’t scale writes infinitely by adding nodes. I would expect that writing to a few nodes is faster than writing to only 1; the reason is that only 1 node has to parse SQL, while the rest just have to apply RBR events.

Q: I saw an rm -rf * in the demo – what folder is that in, and when is it appropriate to do it?
A: I wanted to show you that during the SST the node is actually rebuilt; you can also force an SST this way (although for forcing an SST it’s enough to delete grastate.dat). So, it is appropriate for demo purposes, when you would like to show that the full data set is copied again during SST.

Q: Thanx Peter, I loved the webinar! Especially the fact that everything is live!
A: Cheers, you are welcome, I am glad you liked it :). Thanks everyone for all these questions, feel free to ask additional ones in the comments.

Oct
27
2012
--

Edge-case behavior of INSERT…ODKU

A few weeks back, I was working on a customer issue wherein they were observing database performance that dropped through the floor (to the point of an outage) roughly every 4 weeks or so. Nothing special about the environment, the hardware, or the queries; really, the majority of the database was a single table with an auto-incrementing integer PK and a secondary UNIQUE KEY.

The queries being run against this table were almost exclusively INSERT … ON DUPLICATE KEY UPDATE (INSERT ODKU), with the columns from the INSERT part of the statement corresponding to the columns in the secondary index, and they were coming in at a rate of approximately 1500 to 2000 per second, sustained, 24h per day. The mathematically-astute among you may already be able to see where this is going.

For purposes of discussion, we can use the following table to illustrate the situation:

CREATE TABLE update_test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  username VARCHAR(20) NOT NULL,
  host_id TINYINT UNSIGNED NOT NULL,
  last_modified TIMESTAMP NULL DEFAULT NULL,
  PRIMARY KEY(id),
  UNIQUE KEY(username)
) ENGINE=InnoDB;

Now consider the following sequence of events.

(root@localhost) [test]> INSERT INTO update_test (username, host_id, last_modified) VALUES ('foo',3,NOW());
Query OK, 1 row affected (0.00 sec)

(root@localhost) [test]> select * from update_test;
+----+----------+---------+---------------------+
| id | username | host_id | last_modified       |
+----+----------+---------+---------------------+
| 1  | foo      |       3 | 2012-10-05 22:36:30 |
+----+----------+---------+---------------------+

Nothing crazy here, right? We’ve inserted one row into an empty table, and if we were to do a SHOW CREATE TABLE we’d see that the AUTO_INCREMENT counter is currently set to 2. If we do an INSERT … ODKU on this table, we see the following:

(root@localhost) [test]> insert into update_test (username,host_id) values ('foo',1) on duplicate key update last_modified=NOW();
Query OK, 2 rows affected (0.00 sec)

(root@localhost) [test]> select * from update_test;
+----+----------+---------+---------------------+
| id | username | host_id | last_modified       |
+----+----------+---------+---------------------+
|  1 | foo      |       3 | 2012-10-05 22:58:28 |
+----+----------+---------+---------------------+
1 row in set (0.00 sec)

And now, even though we didn’t insert a new row, our auto-increment counter has increased to 3. This is actually expected behavior; InnoDB checks constraints in the order in which they were defined, and the PRIMARY KEY is always considered to be defined first. So MySQL checks our INSERT, sees that the next auto-inc value is available, and claims it, but then it checks the UNIQUE KEY and finds a violation, so instead it does an UPDATE. If we look at the handler status counters, we can see that there was 1 request to insert a row (which failed) and 1 request to update a row, which succeeded (this explains why, when there’s a row update, we see 2 rows affected and not 1).

(root@localhost) [test]> show status like 'handler%';
 *** some rows omitted ***
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_rollback           | 0     |
| Handler_update             | 1     |
| Handler_write              | 1     |
+----------------------------+-------+

At this point, you might be thinking, “Ok, so what?” Let’s go back to our customer. 1500 INSERT ODKUs per second, sustained for 24 hours per day. The PK on their table is the same as what I’ve used in my demonstration table – INT UNSIGNED. Do the math. The maximum value for an auto-increment INT UNSIGNED is 4294967295. Divide that by 1500 qps and then again by 86400, which is the number of seconds in a day, and we get 33.1 days, or a little over 4 weeks. Coincidence? I think not.
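The arithmetic, spelled out:

```python
UINT_MAX = 4294967295   # maximum AUTO_INCREMENT value for INT UNSIGNED
odku_per_second = 1500  # sustained INSERT ... ODKU rate (each burns one value)
seconds_per_day = 86400

days_to_exhaustion = UINT_MAX / odku_per_second / seconds_per_day
print(round(days_to_exhaustion, 1))  # 33.1 - a little over 4 weeks
```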

So what actually happens when we run out of auto-increment space? Some of the behavior might surprise you. Let’s go back to our demonstration table and insert a row at the end of the auto-increment range, and then try to insert another one:

(root@localhost) [test]> insert into update_test (id,username,host_id) values (4294967295, 'bar', 10);
Query OK, 1 row affected (0.00 sec)
(root@localhost) [test]> flush status;
(root@localhost) [test]> insert into update_test (username,host_id) values ('baz', 10);
ERROR 1062 (23000): Duplicate entry '4294967295' for key 'PRIMARY'

(root@localhost) [test]> show status like 'handler%';
 *** some rows omitted ***
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_rollback           | 1     |
| Handler_write              | 1     |
+----------------------------+-------+

Ok, so we tried to insert a row, and it failed because the counter for auto-increment was already at its maximum value, and the statement was rolled back. But what happens if we try an INSERT … ODKU? First, recall what’s in our table:

(root@localhost) [test]> select * from update_test;
+------------+----------+---------+---------------------+
| id         | username | host_id | last_modified       |
+------------+----------+---------+---------------------+
|          1 | foo      |       3 | 2012-10-05 22:58:28 |
| 4294967295 | bar      |      10 | NULL                |
+------------+----------+---------+---------------------+
2 rows in set (0.00 sec)

(root@localhost) [test]> INSERT INTO update_test (username, host_id) VALUES ('foo', 7) ON DUPLICATE KEY UPDATE host_id=7, last_modified=NOW();
Query OK, 2 rows affected (0.00 sec)

Looks fine, right? 2 rows affected, so obviously, the row that we wanted, i.e., the one that has the username = ‘foo’, was updated with the proper host_id and the last_modified time, and we can happily go about our day. Unfortunately, this isn’t the case.

(root@localhost) [test]> select * from update_test;
+------------+----------+---------+---------------------+
| id         | username | host_id | last_modified       |
+------------+----------+---------+---------------------+
|          1 | foo      |       3 | 2012-10-05 22:58:28 |
| 4294967295 | bar      |       7 | 2012-10-05 23:24:49 |
+------------+----------+---------+---------------------+
2 rows in set (0.00 sec)

Yep, it’s actually THE LAST ROW, the one where the id is equal to the auto-increment max value, which is the one that got updated. The secondary UNIQUE on username is, for all intents and purposes, ignored.

For the customer whose database served as the inspiration for this post, we can see fairly easily what the problem turned out to be. 1500 queries per second all trying to lock and update the same row is not going to end well: deadlocks, rollbacks, contention, and all sorts of related unpleasantness. There is, of course, a trivial solution to this: just change the AUTO_INCREMENT column to use a BIGINT, and problem solved.

As it turns out, this is documented behavior; the manual states that our INSERT … ODKU on a table with multiple unique indexes such as this one is equivalent to “UPDATE update_test SET host_id=7, last_modified=NOW() WHERE id=4294967295 OR username=’foo’ LIMIT 1″, and of course a PK lookup is more likely to be chosen by the optimizer than a lookup on the secondary UNIQUE index.

So what do we learn here?

  • It’s easier to burn through AUTO_INCREMENT values than you might think; the original customer table in question had less than 500K rows in it.
  • Using signed types for AUTO_INCREMENT columns is almost never a good idea; it wastes half of the column’s available range.
  • Intuition, like the laws of physics, often breaks down in edge case scenarios.
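
To put numbers on the first two lessons (simple range arithmetic):

```python
int_signed_max = 2**31 - 1    # 2147483647: an AUTO_INCREMENT counter only
int_unsigned_max = 2**32 - 1  # 4294967295   counts up, so signed wastes half
bigint_unsigned_max = 2**64 - 1

print(int_unsigned_max // int_signed_max)  # 2 - unsigned doubles the headroom

# At the same 1500 ODKUs/sec that exhausted INT UNSIGNED in ~33 days,
# BIGINT UNSIGNED lasts hundreds of millions of years:
years = bigint_unsigned_max // 1500 // 86400 // 365
print(years)
```
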
Oct
26
2012
--

Announcing Percona Server 5.1.66-14.1

Percona is glad to announce the release of Percona Server 5.1.66-14.1 on October 26th, 2012 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.1.66, and including all the bug fixes in it, Percona Server 5.1.66-14.1 is now the current stable release in the 5.1 series. All of Percona’s software is open-source and free; all the details of the release can be found in the 5.1.66-14.1 milestone at Launchpad.

Bugs Fixed:

  • Percona Server would disconnect clients if gdb was attached and detached. This was caused by wrong signal handling. Bugs fixed #805805 and #1060136 (Laurynas Biveinis).
  • Fixed the upstream MySQL bug #61509. A crash in YaSSL was causing SSL test failures on Ubuntu Oneiric hosts. Bug fixed #902471 (Laurynas Biveinis).
  • Fixed the upstream MySQL bug #62856 where check for “stack overrun” wouldn’t work with gcc-4.6 and caused the server crash. Bug fixed #902472 (Laurynas Biveinis).
  • Resolved the Ubuntu Percona Server package conflicts with upstream packages. Bug fixed #907499 (Ignacio Nin).
  • Percona Server would crash on a DDL statement if an XtraDB internal SYS_STATS table was corrupted or overwritten. This is now fixed by detecting the corruption and creating a new SYS_STATS table. Bug fixed #978036 (Laurynas Biveinis).
  • Postfix would crash on CentOS/RHEL 6.x when using shared dependency (libmysqlclient.so). Fixed by building packages with OpenSSL support rather than the bundled YaSSL library. Bug fixed #1028240 (Ignacio Nin).
  • The fix for bug #905334 caused a regression by adding debug-specific code with a missing local variable that would break the debug build. Bug fixed #1046389 (Laurynas Biveinis).
  • The fix for bug #686534 caused a regression by mishandling the LRU list mutex. Bug fixed #1053087 (George Ormond Lorch III).
  • Fixed the upstream MySQL bug #67177, Percona Server 5.1 was incompatible with Automake 1.12. Bug fixed #1064953 (Alexey Kopytov).
  • Flashcache support resulted in confusing messages in the error log on Percona Server startup, even when flashcache was not used. This was fixed by adding a new boolean option, flashcache. When set to 0 (the default), flashcache checks are disabled; when set to 1, checks are enabled. The error message has also been made more verbose, including the error number and system error message. Bug fixed #747032 (Sergei Glushchenko).
  • Custom server builds would crash when compiled with a non-default maximum number of indexes per table. Upstream MySQL bugs fixed: #54127, #61178, #61179 and #61180. Bug fixed #1042517 (Sergei Glushchenko).
  • Cleaned up the test duplicates in regular atomic operation tests. Bug fixed #1039931 (Laurynas Biveinis).

Release notes for Percona Server 5.1.66-14.1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

Oct
26
2012
--

Announcing Percona Server 5.5.28-29.1

Percona is glad to announce the release of Percona Server 5.5.28-29.1 on October 26th, 2012 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.5.28, and including all the bug fixes in it, Percona Server 5.5.28-29.1 is now the current stable release in the 5.5 series. All of Percona’s software is open-source and free; all the details of the release can be found in the 5.5.28-29.1 milestone at Launchpad.

New Features:

  • Percona Server has ported Twitter’s MySQL NUMA patch. This patch implements improved NUMA support as it prevents imbalanced memory allocation across NUMA nodes.

Bugs Fixed:

  • Percona Server would disconnect clients if gdb was attached and detached. This was caused by wrong signal handling. Bugs fixed #805805 and #1060136 (Laurynas Biveinis).
  • Fixed the upstream MySQL bug #62856, where the slave server would crash after an update statement. Bug fixed #1053342 (George Ormond Lorch III).
  • Reads from tablespaces being deleted would result in buffer pool locking error. Bug fixed #1042640 (Stewart Smith).
  • Resolved the Ubuntu Percona Server package conflicts with upstream packages. Bug fixed #907499 (Ignacio Nin).
  • Crash-resistant replication would break with binlog XA transaction recovery. If a crash happened between the XA PREPARE and COMMIT stages, the prepared InnoDB transaction would not have the slave position recorded and thus would fail to update it once it was replayed during binlog crash recovery. Bug fixed #1012715 (Laurynas Biveinis).

Release notes for Percona Server 5.5.28-29.1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

Oct
25
2012
--

L2 cache for MySQL

The idea of using SSD/Flash as a cache is not new, and there are different solutions for this, both open source, like L2ARC for ZFS and Flashcache from Facebook, and proprietary, like directCache from Fusion-io.
They all, however, have some limitations; that’s why I am considering having an L2 cache at the database level, as an extension to the InnoDB buffer pool.
Fortunately, there is a project in progress, Flash_Cache_For_InnoDB by David, which implements this.
David helped us port his work to the latest Percona Server, and you can get it from our Launchpad branch of Percona Server 5.5.28.

I think that the Flash_Cache name is confusing due to its similarity to Flashcache from Facebook, so I prefer to call it L2cache.

Here is a quick benchmark on tpcc-mysql, 2500W (250GB). I used:

  • data on RAID
  • data on SSD (Intel SSD 910)
  • data on RAID with the L2 cache on an Intel SSD 910, with a cache size of 150GB

As we can see, the result is quite good and promising, so I am going to continue researching further integration with Percona Server.
Right now there are two challenging questions that we need to resolve:

  • Backup. In the current state only mysqldump will work. To make backups with Percona XtraBackup, we will need additional support for the L2 cache in XtraBackup.
  • Recovery time. Right now, if a crash happens the cache is safe, but recovery time is on the longer side; we need to see how it can be improved.

Oct
25
2012
--

Replaying database load with Percona Playback

If you are planning to upgrade or make any configuration change on your MySQL database the first advice usually is:

– Benchmark!

How should we run that benchmark? People usually use generic benchmark tools like sysbench, tpcc or mysqlslap, which are good for measuring the number of transactions per second a database can do, but they do not take into account your workload, data set or queries. Those tools just run random queries against randomly generated data.

The best way to run a benchmark is to replay the load of your production server against a different database server, and this is where Percona Playback can help us. Percona Playback is a new tool that can replay the load captured from a production server on a different server. It can replay queries from tcpdump captures or slow query logs. With this tool you can measure how a database upgrade, a change to my.cnf or a schema change affects the overall performance of your application.

In this example I’m going to show how to measure the impact of my.cnf changes by replaying a slow query log. The server starts with an empty my.cnf, so all parameters are set to their default values. We have two servers, PROD and DEV. PROD is the database in production, where our customers are working. DEV is where we are going to test all the changes and replay the load. Both servers should have an identical copy of the data, either by using DEV as a slave of PROD or by taking an xtrabackup copy just before the data capture.

Important: don’t run the benchmark on the production server, because Percona Playback runs both SELECT and DML queries. There is still no --read-only parameter :)

So now we need to enable the slow query log on the PROD server. It should be enabled during peak time to capture valuable information:

SET GLOBAL slow_query_log=1;
SET GLOBAL long_query_time=0;

With long_query_time=0 we log every query executed in the database. After capturing enough data and copying the slow query log to the DEV server, we can replay the load:

percona-playback --mysql-host=127.0.0.1 --mysql-user=root --mysql-schema=my_db --query-log-file=slow.log

Executed 429467 queries
Spent 00:00:59.668649 executing queries versus an expected 00:00:47.085301 time.
98966 queries were quicker than expected, 330501 were slower
A total of 51 queries had errors.
Expected 372462 rows, got 371851 (a difference of 611)
Number of queries where number of rows differed: 324.

That’s the report of the tool. It seems the replay needed 59 seconds to execute the load, versus 47 seconds originally on the production server. 98966 queries were faster and 330501 were slower. We also see that some queries had errors. Those errors are printed to stderr, so you can redirect and review them later. The typical reason for an error is that a particular query returned a different number of rows. For example:

Connection 43 Rows Sent: 210 != expected 211 for query: SELECT DISTINCT ol_i_id FROM order_line WHERE ol_w_id = 1 AND ol_d_id = 10 AND ol_o_id < 3415 AND ol_o_id >= (3415 - 20);

Why do some queries return a different number of rows? Usually because the data set is not exactly the same on the PROD and DEV servers. In the previous report we see that 324 queries out of 429467 returned a different number of rows.
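As a rough illustration, the mismatch lines that Percona Playback prints to stderr could be tallied with a short script. This is a hypothetical helper of my own, not part of the tool, and it assumes the “Rows Sent: X != expected Y” line format shown above:

```python
import re

# Matches Playback's row-count mismatch lines, e.g.
# "Connection 43 Rows Sent: 210 != expected 211 for query: SELECT ..."
LINE_RE = re.compile(r"Rows Sent: (\d+) != expected (\d+) for query: (.*)")

def tally_mismatches(stderr_lines):
    """Return (mismatch_count, total_row_difference) over the error lines."""
    count, diff = 0, 0
    for line in stderr_lines:
        m = LINE_RE.search(line)
        if m:
            count += 1
            diff += abs(int(m.group(1)) - int(m.group(2)))
    return count, diff

lines = [
    "Connection 43 Rows Sent: 210 != expected 211 for query: SELECT ...",
    "some unrelated log line",
]
print(tally_mismatches(lines))  # (1, 1)
```

Redirecting stderr to a file during the replay and feeding it to a helper like this makes it easy to see how far the DEV data set has drifted from PROD.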

Let’s modify the my.cnf to see if we can get any improvement. Take into account that the data set on DEV has been modified by the previous replay, so you should restore it before the next benchmark. I’m going to add 1GB of buffer pool and run the benchmark again (innodb_buffer_pool_size=1G).

Executed 429467 queries
Spent 00:00:59.856656 executing queries versus an expected 00:00:47.085301 time.
98730 queries were quicker than expected, 330737 were slower
A total of 51 queries had errors.
Expected 372462 rows, got 371851 (a difference of 611)
Number of queries where number of rows differed: 324.

Not much change; it is very similar to the previous report. What is the reason we see no improvement? Well, I haven’t warmed up the buffer pool. Usually when doing a benchmark it is a good idea to read the data first to warm up the buffer pool. This blog post is not a benchmark per se, just an overview of Percona Playback, so we’re going to ignore that. Second try: add 2GB of InnoDB log files (innodb_log_file_size=1G, innodb_log_files_in_group=2):

Executed 429467 queries
Spent 00:00:56.772552 executing queries versus an expected 00:00:47.085301 time.
99959 queries were quicker than expected, 329508 were slower
A total of 51 queries had errors.
Expected 372462 rows, got 371851 (a difference of 611)
Number of queries where number of rows differed: 324.

Now we see an improvement, but we can go further. Last change: flush logs every second instead of at every commit (innodb_flush_log_at_trx_commit=2):

Executed 429467 queries
Spent 00:00:51.605932 executing queries versus an expected 00:00:47.085301 time.
111833 queries were quicker than expected, 317634 were slower
A total of 51 queries had errors.
Expected 372462 rows, got 371851 (a difference of 611)
Number of queries where number of rows differed: 324.

Much better, 51 seconds. Very close to the original value :)
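When comparing several runs, it can be handy to turn the “Spent … versus an expected …” line of each report into a single slowdown percentage. A small sketch of such a helper (again my own, assuming the report format shown above):

```python
import re

# Matches the wall-time summary line of a Playback report, e.g.
# "Spent 00:00:51.605932 executing queries versus an expected 00:00:47.085301 time."
SPENT_RE = re.compile(
    r"Spent (\d+):(\d+):([\d.]+) executing queries versus an expected "
    r"(\d+):(\d+):([\d.]+) time\."
)

def slowdown(report):
    """Percentage by which the replay was slower than the original capture."""
    h1, m1, s1, h2, m2, s2 = (float(x) for x in SPENT_RE.search(report).groups())
    spent = h1 * 3600 + m1 * 60 + s1
    expected = h2 * 3600 + m2 * 60 + s2
    return (spent - expected) / expected * 100.0

report = ("Spent 00:00:51.605932 executing queries versus an "
          "expected 00:00:47.085301 time.")
print(round(slowdown(report), 1))  # 9.6 (percent slower than the capture)
```

For the runs above, this would show the slowdown shrinking from roughly 27% with the default my.cnf to under 10% after the three changes.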

This is an easy example of how we can test changes or upgrades using real data and real queries from our application. The tool is under active development; if you find any bugs, we would appreciate them being filed on launchpad. You can download the tool from:

http://www.percona.com/downloads/Percona-Playback/
