Apr 30, 2010

MongoDB Approach to Availability

Another thing I find interesting about MongoDB is its approach to Durability, Data Consistency and Availability. It is very relaxed and will not work for some applications, but for others it can be usable in its current form. Let me explain some concepts and compare them to technologies in the MySQL space.

First, I think MongoDB is best compared not to MySQL Server but to MySQL Cluster, especially in newer versions which implement “sharding”. Just as a commit to the NDB storage engine does not normally mean a commit to disk but rather a commit to the network, a commit in MongoDB does not mean a commit to disk either. Furthermore, MongoDB uses asynchronous replication, meaning it may take some time before data exists on more than one node. You can also use getLastError() to ensure data has propagated to the slave. So you can see it as a hybrid between MySQL Cluster and innodb_flush_log_at_trx_commit=2 mode. The second difference, of course, is the fact that MongoDB is not crash safe – similar to MyISAM, the database will need to be repaired if it crashes. Still, I find the behavior somewhat similar – you’re not expected to run MySQL Cluster without replication, and MongoDB is practically the same.
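
For reference, the InnoDB mode mentioned above is enabled like this; with this setting a commit only guarantees the log record was written to the operating system cache, and the actual flush to disk happens roughly once per second (a minimal illustration, not part of the original post):

SQL:

  -- commit = write to OS cache; background flush to disk about once a second
  SET GLOBAL innodb_flush_log_at_trx_commit = 2;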

Second, if we look at Replica Sets, we find them very similar to MySQL Cluster, though designed to work over a wide area network and hence with asynchronous replication. Voting is required to pick the master node in case of node failure, and at least 3 servers are recommended; some servers can only cast votes and hold no data. The other difference is that there is only one master rather than multiple. This is because multi-master operation with asynchronous replication requires conflict resolution, which can be tricky in the general case, and MongoDB wants simplicity of operation for developers and administrators.

Third, if we look at how failover happens – as with NDB (native API), it is handled at the driver level. When you connect to a replica set you connect to a set of servers, not to one of them, and if one server fails the driver fails over to a different master. Things are again tuned to deal with asynchronous replication. Consistency is maintained, but at the expense that certain changes may be thrown away / “rolled back” in case of failover.

This approach is not as clean as the ideal of “no committed data loss with almost instant failover”, but it makes sense for a large number of applications. In fact, using MySQL Replication for failover puts us in a rather similar situation, just with a lot less automation.

The good question, of course, is how robust these features are in MongoDB – many of them are new, and Replica Sets are still in development. It may take some time for them to stabilize, and for tools to be developed around them. How do you check whether two MongoDB nodes are indeed in sync? How do you take hot backups with point-in-time recovery? These and many similar questions need to be answered and bugs worked out. One good example of the early stage of MongoDB replication is a bug mentioned during a presentation today: replication breaking if the time on the master server is changed (MongoDB uses timestamps to identify events in the replication log). As I understood it, it was fixed just last month.

At the same time, many things, including replication, are a lot simpler with MongoDB, and there is a lot less old baggage, so I hope it will be able to stabilize and mature quickly.


Entry posted by peter
Apr 30, 2010

MongoDB Approach to Database Synchronization

I went to MongoSF today – quite an event, and I hope to have a chance to write more about it. This post is about one replication problem and how MongoDB solves it.

If you’re using MySQL Replication and your master goes down, it is possible for some writes to have been executed on the master but not on the slave, which gets promoted to the new master. When the old master comes back up, it has some updates applied to it which make it inconsistent with the data on the new master. In the MySQL world we can choose to either ignore this problem (or maybe even replay those changes on the slaves and hope it works out), re-clone the old master from the slave, or use mk-table-checksum to find inconsistencies and re-sync them with mk-table-sync. The last two operations can be very expensive for large databases.

The MongoDB approach used in Replica Sets is for the failed master to scan its log files to find all object ids which were modified after the point where the slave last synchronized successfully, and to retrieve those objects back from the new master (or delete them if they no longer exist). Such an approach allows quick synchronization without any complex support for rolling back changes. There is a catch with this approach in MongoDB – because there is no local durability, it only works as long as the network goes down but the server stays up. However, once Single Server Durability is implemented it will be pretty cool.

What is really interesting is that it should be possible to apply the same concept to MySQL Replication, possibly with the help of tools like MMM. Row-based replication makes it possible to identify the rows which were changed on the master after the failover to the slave happened; they can be dumped to a local file (in case one wants to synchronize them manually) and then fetched again from the new master.
This of course will require IDEMPOTENT slave mode, but otherwise it should work unless you have DDL operations in between.
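
For reference, the MySQL settings this scheme relies on would look roughly like the following; this is only a sketch of the prerequisites, and the tool that would actually extract and re-fetch the changed rows is hypothetical:

SQL:

  -- on the master: log changes row by row so changed rows can be identified
  SET GLOBAL binlog_format = 'ROW';

  -- on the recovered old master (now replicating from the new master):
  -- apply row events idempotently, suppressing duplicate-key and
  -- missing-row errors while the data is brought back in sync
  SET GLOBAL slave_exec_mode = 'IDEMPOTENT';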

In general, listening to the great presentation on MongoDB replication by Dwight Merriman, as well as previously looking at how replication is done in Redis, I should say things can be done in a much simpler way when there is no schema and when you do not have to mess with complex features like triggers or multiple storage engines.


Entry posted by peter
Apr 29, 2010

Seeking volunteers for Percona documentation

Percona wants to upgrade our documentation to improve its readability
and to make it more useful for you, our clients and partners. We are
so busy developing software and handling your needs that we have
trouble finishing all the documentation! We think you can help.

Helping us will give you a chance to interact closely with lead Percona
developers and learn more about Percona’s products as well as our
development process.

Tasks include talking to developers, writing text, and interacting
with an editor. (Andy Oram, our O’Reilly editor on the book High
Performance MySQL, will take on this role.) You should have some
understanding of MySQL and Percona’s extensions, and of the way
XtraDB and XtraBackup work.

Documents we want to start with include:

* Product features (we made a start at
http://www.percona.com/docs/wiki/percona-xtradb:features:start)

* Release notes

* Installation instructions for XtraBackup

Eventually we’ll also upgrade our user guides, a bigger job.

If this work intrigues you, please contact Andy Oram at
andyo@praxagora.com. Andy has offered to coordinate efforts for free,
although we will pay him for professional editing.


Entry posted by Vadim
Apr 29, 2010

Presentations Announcement: 2010 O’Reilly MySQL Conference & Expo

The Percona team participated in this year’s O’Reilly MySQL Conference & Expo, held April 12-15, 2010 in Santa Clara, California. We gave a lot of talks on various topics, and all of those presentations are now available. Here’s the list:

You can also read more about this conference on the Percona site.


Entry posted by Michael Rikmas

Apr 27, 2010

Level 2 Flash cache is there

As I mentioned in my talk An Overview of Flash Storage for Databases, I see a lot of interest in the near- and middle-term future in using Flash storage as a Level 2 caching layer. The price-capacity trade-off makes Flash a very good fit for a cache layer.
Actually, it is not a new idea; it has already been implemented in ZFS’s L2ARC for two years, and it is also described in the ACM article “Flash storage memory” by Adam Leventhal.

So I am excited to see it is available for Linux now. Paul Saab and Mohan Srinivasan from Facebook have released Flashcache, which allows you to use Flash as a cache in front of regular hard drives.

The implementation is based on dm-cache and introduces a new block level device to the system, which you use instead of a regular partition.

I congratulate Facebook on implementing a great idea, and I am happy to see Facebook releasing it under the GPL license. I am working on builds to see if we can provide binaries for different platforms.

I also see another idea implemented by David Jiang (http://code.google.com/p/david-mysql-tools/wiki/innodb_secondary_buffer_pool): it allows using a file on a Flash partition as an L2 cache for the InnoDB buffer pool. I see a lot of potential here, and I am going to evaluate it for inclusion into XtraDB.


Entry posted by Vadim
Apr 26, 2010

XtraDB / InnoDB internals in drawing

I did a drawing exercise and put the XtraDB / InnoDB internals into a Visio diagram:

The XtraDB differences and main parameters are marked out.

The PDF version is available at http://www.percona.com/docs/wiki/percona-xtradb:internals:start.


Entry posted by Vadim

Apr 24, 2010

How fast is FLUSH TABLES WITH READ LOCK?

A week or so ago at the MySQL conference, I visited one of the backup vendors in the Expo Hall. I started to chat with them about their MySQL backup product. One of the representatives told me that their backup product uses FLUSH TABLES WITH READ LOCK, which he admitted takes a global lock on the whole database server. However, he proudly told me that it only takes a lock for “a couple of milliseconds.” This is a harmful misconception that many backup vendors seem to hold dear.

The truth is, this command can take a lock for an indeterminate amount of time. It might complete in milliseconds on a test system in a laboratory, but I have seen it take an extremely long time on production systems, measured in many minutes, or potentially even hours. And during this time, the server will get completely blocked (not just read-only!). To understand why, let’s look at what this command actually does. There are several important parts of processing involved in the command.

Requesting the lock

The FLUSH TABLES WITH READ LOCK command immediately requests the global read lock. As soon as this happens, even before the lock is granted to it, all other processes attempting to modify anything in the system are locked out. In theory, this might not seem so bad because after all, the command acquires only a read lock. Other commands that need only a read lock can coexist with this. However, in practice, most tables are both read and written. The first write query to each table will immediately block against the requested global read lock, and subsequent read queries will block against the write query’s requested table lock, so the real effect is that the table is exclusively locked, and all new requests into the system are blocked. Even read queries!
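
To make this cascade concrete, here is a sketch of three concurrent sessions against a hypothetical table t (the table name and session order are illustrative only):

SQL:

  -- session 1: requests the global read lock
  -- (it may itself still be waiting for running queries, as described below)
  FLUSH TABLES WITH READ LOCK;

  -- session 2: any write now blocks against the pending global read lock
  INSERT INTO t (b) VALUES (0);

  -- session 3: even a plain read queues up behind session 2's table lock
  -- request, so in effect the table is locked exclusively
  SELECT * FROM t;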

Waiting for the lock

Before the FLUSH TABLES WITH READ LOCK command can successfully acquire the lock, anything else that currently holds the lock must finish what it’s doing. That means that every currently running query, including SELECT queries, must finish. So if there is a long-running query on the system, or an open transaction or another process that holds a table lock, the FLUSH TABLES WITH READ LOCK command itself will block until the other queries finish and all locks are released. This can take a very long time. It is not uncommon for me to log on to a customer’s system and see a query that has been running for minutes or hours. If such a query were to begin running just before the FLUSH TABLES WITH READ LOCK command is issued, the results could be very bad.

Here’s one example of what the system can look like while this process is ongoing:

SQL:

  mysql> SHOW PROCESSLIST;
  +----+------+-----------+------+------------+------+-------------------+-----------------------------------------------------------------------+
  | Id | User | Host      | db   | Command    | Time | State             | Info                                                                  |
  +----+------+-----------+------+------------+------+-------------------+-----------------------------------------------------------------------+
  |  4 | root | localhost | test | Query      |   80 | Sending data      | SELECT count(*) FROM t t1 JOIN t t2 JOIN t t3 JOIN t t4 WHERE t1.b=0  |
  |  5 | root | localhost | test | Query      |   62 | Flushing tables   | FLUSH TABLES WITH READ LOCK                                           |
  |  6 | root | localhost | test | Field List |   35 | Waiting for table |                                                                       |
  |  7 | root | localhost | test | Query      |    0 | NULL              | SHOW PROCESSLIST                                                      |
  +----+------+-----------+------+------------+------+-------------------+-----------------------------------------------------------------------+
  4 rows in set (0.00 sec)

Notice that connection 6 can’t even log in because it was a MySQL command-line client that wasn’t started with -A, and it’s trying to get a list of tables and columns in the current database for tab-completion. Note also that “Flushing tables” is a misnomer — connection 5 is not flushing tables yet. It’s waiting to get the lock.

Flushing tables

After the FLUSH TABLES WITH READ LOCK command finally acquires the lock, it must begin flushing data. This does not apply to all storage engines. MyISAM, however, does not attempt to flush its own data to disk during normal processing; it relies on the operating system to flush the data blocks to disk when it decides to. As a result, a system that has a lot of MyISAM data might have a lot of dirty blocks in the operating system buffer cache, and these can take a long time to flush. During that time, the entire system is still locked. After all the data is flushed, the FLUSH TABLES WITH READ LOCK command completes and sends its response to the client that issued it.

Holding the lock

The final part of this command is the duration during which the lock is held. The lock is released with UNLOCK TABLES or a number of other commands. Most backup systems that use FLUSH TABLES WITH READ LOCK are performing a relatively short operation inside of the lock, such as initiating a filesystem snapshot. So in practice, this often ends up being the shortest portion of the operation.
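
For illustration, a snapshot-style backup typically holds the lock only around a short sequence like this (a sketch; the filesystem or LVM snapshot itself is created outside MySQL):

SQL:

  -- take the global read lock and record the replication coordinates
  FLUSH TABLES WITH READ LOCK;
  SHOW MASTER STATUS;

  -- (create the filesystem / LVM snapshot from another shell here)

  -- release the lock as soon as the snapshot has been created
  UNLOCK TABLES;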

Conclusion

A backup system that is designed for real production usage must not assume that FLUSH TABLES WITH READ LOCK will complete quickly. In some cases, it is unavoidable. This includes backing up a mixture of MyISAM and InnoDB data. But many installations do not mix their data this way, and should be able to configure a backup system to avoid this global lock. There is no reason to take a lock at all for backing up only InnoDB data. Completely lock-free backups are easy to take. Backup vendors should build this capability into their products.
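
As an example of what a lock-free InnoDB-only backup can rely on, a logical dump can use InnoDB’s consistent read view instead of any lock; this is the same mechanism behind mysqldump’s --single-transaction option. A minimal sketch, with placeholder database and table names:

SQL:

  -- open a consistent read view; no global lock is taken
  START TRANSACTION WITH CONSISTENT SNAPSHOT;

  -- every read inside this transaction sees the data as of the
  -- moment the snapshot was opened
  SELECT * FROM mydb.t1;
  SELECT * FROM mydb.t2;

  COMMIT;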


Entry posted by Baron Schwartz

Apr 21, 2010

MySQL 5.5.4 in tpcc-like workload

MySQL 5.5.4 is a great release with performance improvements; let’s see how it performs in a tpcc-like workload.

The full details are on the Wiki page: http://www.percona.com/docs/wiki/benchmark:mysql:554-tpcc:start

I took MySQL 5.5.4 with InnoDB 1.1 and the tpcc-mysql benchmark with 200 warehouses (about 18GB worth of data). The InnoDB log files are 3.8GB in size, and I ran with different buffer pool sizes from 20GB down to 6GB. The storage is a 320GB FusionIO MLC card with XFS (nobarrier).

While the raw results are available on the Wiki, here are the graphical results.

I intentionally put all lines on the same graph to show the trends.

It seems adaptive_flushing is not able to keep up, and you see periodic drops when InnoDB starts flushing. I hope the InnoDB team will fix it before 5.5 GA.

I expect the reasonable request to compare it with Percona Server/XtraDB, so here is the same load on our server:

As you see, our adaptive_checkpoint algorithm performs much more stably.

And for a direct comparison, here are side-by-side results for the 10GB buffer pool case.

So as you see, InnoDB is doing great and trying to keep performance even; in the previous release there was about a 1.7x difference. I expect to see more improvements in 5.5 GA.


Entry posted by Vadim

Apr 16, 2010

NetBeans Settings File for Symfony

As I’m working with NetBeans when developing Symfony applications under Windows, I created some code templates and shortcuts to work more effectively with NetBeans and Symfony. If you are interested, you can download the settings file and import it into your NetBeans installation.

You can get it here: http://www.symfony-zone.com/wordpress/netbeans/

Apr 14, 2010

Is there a performance difference between JOIN and WHERE?

I’ve heard this question a lot, but never thought to blog about the answer. “Is there a performance difference between putting the JOIN conditions in the ON clause or the WHERE clause in MySQL?”

No, there’s no difference. The following queries are algebraically equivalent inside MySQL and will have the same execution plan.

SQL:

  SELECT * FROM A, B WHERE A.ID = B.ID;

  SELECT * FROM A JOIN B ON A.ID = B.ID;

  SELECT * FROM A JOIN B USING(ID);
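
If you want to verify this on your own tables, comparing the optimizer output is enough; A and B below are just the placeholder tables from the queries above:

SQL:

  -- both forms should produce identical plans
  EXPLAIN SELECT * FROM A, B WHERE A.ID = B.ID;
  EXPLAIN SELECT * FROM A JOIN B ON A.ID = B.ID;

  -- EXPLAIN EXTENDED followed by SHOW WARNINGS shows how the
  -- optimizer has rewritten the statement
  EXPLAIN EXTENDED SELECT * FROM A JOIN B ON A.ID = B.ID;
  SHOW WARNINGS;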

Personally, I prefer to stay away from “comma joins” because a) the ANSI join syntax is more expressive and you’re going to use it anyway for LEFT JOIN, and mixing styles is asking for trouble, so you might as well just use one style; b) I think ANSI style is clearer. But that’s just personal preference.


Entry posted by Baron Schwartz
