Oct 31, 2011

Percona Training: Dublin, San Diego, NYC, and WDC

If you have been waiting for Percona’s highly rated MySQL courses to be taught near you, your wait is over. We will be holding our Developer, DBA, InnoDB, and Operations training in the following cities:

  • Dublin, Ireland – December 5 – 8, 2011
  • San Diego, CA – December 12 – 15, 2011
  • New York, NY – January 9 – 12, 2012
  • Washington, DC – January 16 – 19, 2012
Full details on the courses can be found on the Percona website.
We are providing a special discount code to our MySQL Performance Blog readers. When you purchase your ticket, enter discount code PBLOG to receive 10% off. Act now while early registration is open to save even more.
Oct 26, 2011

Clustrix benchmarks under tpcc-mysql workload

I’ve been working with the Clustrix team for a long time on the evaluation of the Clustrix product, and this is my report on the performance characteristics of Clustrix under the tpcc-mysql workload.

I tested tpcc 5000W (~500GB of data in InnoDB) on Clustrix systems with 3, 6, and 9 nodes, and, to have a baseline for comparison, ran the same workload on an HP ProLiant DL380 G6 powered by a Fusion-io card and on a SuperMicro server powered by 7 Intel SSD 320 drives (this server is equivalent to the hardware Clustrix uses for its nodes).

The full report is available on our page with whitepapers, and in this post I would like to highlight the most interesting points.

The chart below compares all systems (results are in throughput per 10 seconds; more is better):

So my conclusions from this benchmark:

  • Clustrix shows very good scalability under highly concurrent workloads as nodes are added. In fact, throughput improves by more than 2 times (3 times) when the number of nodes is doubled (tripled). This is possible because Clustrix automatically redistributes data across the new nodes, so the data/memory ratio decreases, which allows better throughput per node.
  • Clustrix is able to handle a workload as complex as tpcc, and it automatically distributes load between nodes despite multi-statement transactions and foreign key relations.
  • For a workload with a small number of threads, Clustrix does not perform as well as the system with the Fusion-io card.
  • We should also take into account that Clustrix automatically provides high availability, maintaining redundant information across nodes. The other systems in the comparison are not fault- or crash-tolerant.

Looking at the results, Clustrix might not be your first choice for single-threaded or low-concurrency workloads from a performance point of view, but consider other factors such as high availability and transparent auto-rebalancing out of the box. For highly concurrent workloads, Clustrix provides great performance, and if you need more throughput, you can just add more nodes.

The other factor that would be interesting to compare, though I did not do so in this research, is the total cost of each system. I need to ask Clustrix how the cost of the 3-, 6-, and 9-node systems compares to the other systems in the comparison.

Standard disclaimer: this post is part of a paid evaluation we performed for Clustrix, but it is totally independent and fully reflects our opinion.


Oct 25, 2011

Looking for InnoDB/XtraDB hacker

We are happy to see that Percona Server/XtraDB and XtraBackup are rising in popularity, and to keep the excitement going we want to add more features and more performance fixes! If you are an experienced software engineer and feel confident enough to hack on InnoDB/XtraDB code, we invite you to join our development team to work on problems including (but not limited to) the following:

If any of that sounds interesting, please contact us or drop me an email directly.

Oct 25, 2011

Free Percona Live London Reception at Revolution Bar

Even if you didn’t make it to Percona Live London, you still have a chance to hang out with us.
We’re holding a reception at the Revolution Bar America Square, just a few meters’ walk from the Tower Hill Underground station, between 17:30 and 19:30 today (Tuesday, 25 October 2011).

All friends of Percona are welcome. See you there!

Thanks to Clustrix for sponsoring the drinks.

Oct 24, 2011

Symfony Day 2011 in Cologne (Slides)

Last week I attended Symfony Day 2011 in Cologne. The event was great and really well organized. I went there for the conference day on Friday; from what I heard, the Thursday workshops were good too. The venue was pretty crowded and packed, with over 250 Symfony people attending.

The day was packed with sessions. Igor Wiedler was first with a nice introduction to the Silex framework. I had already used Silex in a small project, but Igor was nevertheless pretty inspiring, and I’m now more convinced that I will use the framework for some use cases in the future.

Next up was Marc Weistroff with a talk about developer habits and how to learn from the Symfony2 source code. The presentation was nice and there were some useful tidbits, but I think it would have been more useful with more concrete examples.

The last session before the lunch break was held by Stefan Koopmanschap and Christian Schäfer. Their talk about “Catching Opportunities with Open Source” was very well prepared (maybe the best prepared at Symfony Day). The presentation style was fluid and humorous, and it was totally motivating. Even though it was not directly Symfony-related, there was so much to learn from the two speakers’ own experience with open source that it was the right decision to place such a talk at Symfony Day.

After the lunch break the sessions got more technical. The first one was about the Sonata Admin Bundle. This was the session I was most looking forward to: I know Thomas Rabaix did a wonderful job with the Admin Bundle, and I hadn’t used it except in a small demo that didn’t scratch the surface, so I was eager to hear more about it. Sorry to say, the talk didn’t live up to my expectations. There was too much in-depth detail, with hardly any overview of what’s possible or any examples.

Next up was Hugo Hamon with his talk about how to create command-line applications/scripts with Symfony2 and the Symfony Console component. The talk was really informative, nicely prepared, and well presented.

The last session came from Richard Miller, who has blogged a lot about Symfony2 and dependency injection in recent months. His talk about DI was a pretty good introduction to the topic. Interestingly, Richard was the only native English speaker among the speakers, but the one I had the most difficulty understanding :)

The highlight of the day was the keynote held by Fabien Potencier, in which he showed his understanding of what Symfony2 is and what it is not. His view on MVC and the Symfony components was enlightening. He also showed the new profiling capabilities of Symfony 2.1.

Next year, Symfony Day will become Symfony Live and will take place in Berlin. I hope I will find time to attend the workshops and the hack day as well.


Oct 21, 2011

Find where your data is replicated, win a Percona Live ticket

Percona Live London starts on Monday, so this is the last in our series of free Percona Live ticket giveaways. But first: have you ever wondered what servers are in your MySQL replication hierarchy?

We have, too. As consultants, we often need to get a quick overview of the whole replication hierarchy and specific things about it, such as replication filters, versions, whether replication is running, how far behind it is, and so on. These things are very important for making sure that we don’t break something in the system, don’t get blamed for breaking replication when it was already broken, and so on.

So we built pt-slave-find (it was actually born as mk-slave-find, ages ago before I even worked at Percona). It automatically detects and connects to every server in the replication hierarchy, and prints out a summary of each one’s most important details.

Here’s an example. Notice how one of the servers has a replication filter on it, the servers don’t all have the same version installed, and some are in the AWS cloud. All of these things are important to know — if we run a checksum on the non-cloud servers, for example, we could make the cloud servers fall far behind in replication.

$ perl pt-slave-find localhost
localhost
Version         5.1.57-rel12.8-log
Server ID       6275
Uptime          10+01:18:10 (started 2011-10-08T19:22:04)
Replication     Is a slave, has 5 slaves connected, is not read_only
Filters
Binary logging  MIXED
Slave status    0 seconds behind, running, no errors
Slave mode      IDEMPOTENT
Auto-increment  increment 4, offset 2
InnoDB version  1.0.16-12.8
+- 10.124.62.76
   Version         5.1.57-rel12.8-log
   Server ID       6276
   Uptime          12+03:35:17 (started 2011-10-06T17:04:57)
   Replication     Is a slave, has 2 slaves connected, is read_only
   Filters
   Binary logging  MIXED
   Slave status    0 seconds behind, running, no errors
   Slave mode      IDEMPOTENT
   Auto-increment  increment 4, offset 2
   InnoDB version  1.0.16-12.8
   +- 10.124.62.78
      Version         5.1.57-rel12.8-log
      Server ID       6278
      Uptime          12+03:53:59 (started 2011-10-06T16:46:15)
      Replication     Is a slave, has 0 slaves connected, is read_only
      Filters
      Binary logging  MIXED
      Slave status    0 seconds behind, running, no errors
      Slave mode      IDEMPOTENT
      Auto-increment  increment 4, offset 2
      InnoDB version  1.0.16-12.8
+- 10.124.62.77
   Version         5.1.57-rel12.8-log
   Server ID       6277
   Uptime          12+04:00:24 (started 2011-10-06T16:39:50)
   Replication     Is a slave, has 0 slaves connected, is read_only
   Filters
   Binary logging  MIXED
   Slave status    0 seconds behind, running, no errors
   Slave mode      IDEMPOTENT
   Auto-increment  increment 4, offset 2
   InnoDB version  1.0.16-12.8
+- 10.124.62.86
   Version         5.1.57-rel12.8-log
   Server ID       6286
   Uptime          12+02:52:15 (started 2011-10-06T17:47:59)
   Replication     Is a slave, has 4 slaves connected, is not read_only
   Filters
   Binary logging  MIXED
   Slave status    0 seconds behind, running, no errors
   Slave mode      STRICT
   Auto-increment  increment 1, offset 1
   InnoDB version  1.0.16-12.8
   +- [redacted].compute-1.amazonaws.com
      Version         5.1.43-60.hardy.7-log
      Server ID       3141592
      Uptime          11:29:58 (started 2011-10-18T09:10:16)
      Replication     Is a slave, has 0 slaves connected, is not read_only
      Filters         replicate_wild_ignore_table=foo%.mount%
      Binary logging  ROW
      Slave status    0 seconds behind, running, no errors
      Slave mode      STRICT
      Auto-increment  increment 1, offset 1
      InnoDB version  1.0.6-unknown
   +- [redacted].compute-1.amazonaws.com
      Version         5.1.43-60.hardy.7-log
      Server ID       19924124
      Uptime          12+23:32:10 (started 2011-10-05T21:08:04)
      Replication     Is a slave, has 1 slaves connected, is not read_only
      Filters
      Binary logging  STATEMENT
      Slave status    0 seconds behind, running, no errors
      Slave mode      STRICT
      Auto-increment  increment 1, offset 1
      InnoDB version  1.0.6-unknown
   +- [redacted].compute-1.amazonaws.com
      Version         5.1.43-60.hardy.7-log
      Server ID       207
      Uptime          12+23:40:41 (started 2011-10-05T20:59:34)
      Replication     Is a slave, has 1 slaves connected, is not read_only
      Filters
      Binary logging  STATEMENT
      Slave status    0 seconds behind, running, no errors
      Slave mode      STRICT
      Auto-increment  increment 1, offset 1
      InnoDB version  1.0.6-unknown
   +- [redacted].compute-1.amazonaws.com
      Version         5.1.43-60.hardy.7-log
      Server ID       330612
      Uptime          12+23:44:28 (started 2011-10-05T20:55:47)
      Replication     Is a slave, has 0 slaves connected, is not read_only
      Filters
      Binary logging  STATEMENT
      Slave status    0 seconds behind, running, no errors
      Slave mode      STRICT
      Auto-increment  increment 1, offset 1
      InnoDB version  1.0.6-unknown
+- [redacted].compute-1.amazonaws.com
   Version         5.1.43-60.hardy.7-log
   Server ID       215924
   Uptime          12+08:15:08 (started 2011-10-06T12:25:08)
   Replication     Is a slave, has 6 slaves connected, is read_only
   Filters
   Binary logging  MIXED
   Slave status    0 seconds behind, running, no errors
   Slave mode      IDEMPOTENT
   Auto-increment  increment 4, offset 3
   InnoDB version  1.0.6-unknown
+- 10.153.116.5
   Version         5.1.57-rel12.8-log
   Server ID       1165
   Uptime          20+09:44:46 (started 2011-09-28T10:55:30)
   Replication     Is a slave, has 0 slaves connected, is read_only
   Filters
   Binary logging  MIXED
   Slave status    0 seconds behind, running, no errors
   Slave mode      STRICT
   Auto-increment  increment 4, offset 3
   InnoDB version  1.0.16-12.8

So that’s how you can quickly find out what’s in your replication hierarchy, how it’s configured, and a few basic bits of status information.

Now how do you win a free ticket to Percona Live London? Watch our @percona Twitter stream to find out!

Oct 16, 2011

Percona Server 5.1.59-13.0

Percona is glad to announce the release of Percona Server 5.1.59-13.0 on October 15, 2011 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.1.59 and including all the bug fixes in it, Percona Server 5.1.59-13.0 is now the current stable release in the 5.1 series. All of Percona’s software is open source and free. All the details of the release can be found in the 5.1.59-13.0 milestone at Launchpad.

New Features

InnoDB Fake Changes

When restarting a slave server in a replication environment, the process can be sped up by having prefetch threads warm the server: replay statements and then roll back at commit.

That makes prefetching simple, but it has high overhead from locking rows only to undo the changes at rollback.

To remove that overhead and make prefetching faster, support for Fake Changes has been implemented.

With Fake Changes enabled, the rows for INSERT, UPDATE, and DELETE statements are read (warming the buffer pool) but not actually modified, so the rollback is very fast, as in most cases there is nothing to do.
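A minimal sketch of how a prefetch tool might drive this feature, assuming the session variable is named innodb_fake_changes as introduced in this release (the table t and the statement are purely illustrative):

-- Enable Fake Changes for this session: DML reads rows and pages
-- into the buffer pool but suppresses the actual modifications.
SET SESSION innodb_fake_changes = 1;
BEGIN;
UPDATE t SET val = val + 1 WHERE id = 42;  -- warms the buffer pool only
ROLLBACK;  -- near-instant, since nothing was really changed
SET SESSION innodb_fake_changes = 0;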

InnoDB Kill Idle Transactions

NOTE: Percona classes this feature as Beta; it may not yet be suited for production environments.

This feature limits the age of idle XtraDB transactions. If a transaction is idle for more seconds than the specified threshold, it is killed. This prevents users from blocking purge by mistake.
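A sketch of enabling it, assuming the variable is named innodb_kill_idle_transaction and takes a threshold in seconds (verify the exact name and scope in the documentation):

-- Kill transactions that have been idle for more than 5 minutes:
SET GLOBAL innodb_kill_idle_transaction = 300;
-- Confirm the setting:
SHOW GLOBAL VARIABLES LIKE 'innodb_kill_idle_transaction';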

Block startup until LRU dump is loaded

Added a new boolean option, --innodb-blocking-lru-restore, which is OFF by default. When set to ON, restoring from the LRU dump file is synchronous, i.e. InnoDB waits until it is complete before reporting successful startup to the server. #785489 (Alexey Kopytov).

Behavior changes

The Fast Index Creation Feature has been disabled by default to align the behavior with upstream. The boolean variable innodb_expand_fast_index_creation has been introduced for enabling or disabling this feature. #858945 (Alexey Kopytov).

Bugs Fixed

  • InnoDB requires a full table rebuild for foreign key changes. This unnecessarily delays their creation in mysqldump output, so --innodb-optimize-keys has been changed to ignore foreign key constraints. #859078 (Alexey Kopytov).
  • After adding an index using Fast Index Creation, statistics for that index provided by InnoDB were left in a bogus state until an explicit ANALYZE TABLE was executed. #857590 (Alexey Kopytov).
  • QUERY_RESPONSE_TIME did not respect QUERY_RESPONSE_TIME_STATS. #855312 (Oleg Tsarev).
  • The mysqldump option --innodb-optimize-keys did not work correctly with tables where the first UNIQUE key on non-nullable columns was picked as the clustered index by InnoDB in the absence of a PRIMARY KEY. #851674 (Alexey Kopytov).
  • Backported the fix for MySQL bug #53761 (wrong estimate for RANGE query with compound indexes). #832528 (Alexey Kopytov).
  • Added a new boolean option, --innodb-blocking-lru-restore, which is OFF by default. When set to ON, restoring from the LRU dump file is synchronous, i.e. InnoDB waits until it is complete before reporting successful startup to the server. #785489 (Alexey Kopytov).
  • Fixed an assertion failure in InnoDB. #814404 (Yasufumi Kinoshita).
  • Since AUTO_INCREMENT columns must be defined as keys, omitting the key specifications and then adding them back in ALTER TABLE doesn’t work for them. mysqldump --innodb-optimize-keys has been fixed to take this into account. #812179 (Alexey Kopytov).

Other Changes

  • Improvements and fixes on general distribution: #858467, #845019 (Alexey Kopytov).

More Information

For more information, please see the following links:

Oct 16, 2011

Percona Server 5.5.16-22.0

Percona is glad to announce the release of Percona Server 5.5.16-22.0 on October 14, 2011 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.5.16 and including all the bug fixes in it, Percona Server 5.5.16-22.0 is now the current stable release in the 5.5 series. All of Percona’s software is open source and free. All the details of the release can be found in the 5.5.16-22.0 milestone at Launchpad.

New Features

InnoDB Fake Changes

When restarting a slave server in a replication environment, the process can be sped up by having prefetch threads warm the server: replay statements and then roll back at commit.

That makes prefetching simple, but it has high overhead from locking rows only to undo the changes at rollback.

To remove that overhead and make prefetching faster, support for Fake Changes has been implemented.

With Fake Changes enabled, the rows for INSERT, UPDATE, and DELETE statements are read (warming the buffer pool) but not actually modified, so the rollback is very fast, as in most cases there is nothing to do.

InnoDB Kill Idle Transactions

NOTE: Percona classes this feature as Beta; it may not yet be suited for production environments.

This feature limits the age of idle XtraDB transactions. If a transaction is idle for more seconds than the specified threshold, it is killed. This prevents users from blocking purge by mistake.

Block startup until LRU dump is loaded

Added a new boolean option, --innodb-blocking-buffer-pool-restore (in 5.1 it is --innodb-blocking-lru-restore), which is OFF by default. When set to ON, restoring from the LRU dump file is synchronous, i.e. InnoDB waits until it is complete before reporting successful startup to the server. #785489 (Alexey Kopytov).
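A sketch of how this might look in my.cnf, using the option name from the release notes (verify against the documentation for your version):

[mysqld]
# Block startup until the buffer pool LRU dump has been fully restored;
# mysqld reports successful startup only after the restore completes.
innodb_blocking_buffer_pool_restore = ON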

Behavior changes

The Fast Index Creation Feature has been disabled by default to align the behavior with upstream. The boolean variable innodb_expand_fast_index_creation has been introduced for enabling or disabling this feature. #858945 (Alexey Kopytov).
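A sketch of turning the feature back on, assuming innodb_expand_fast_index_creation is dynamic with session scope (the release notes name the variable but not its scope, so verify before relying on this):

-- Re-enable expanded Fast Index Creation for the current session:
SET SESSION innodb_expand_fast_index_creation = ON;
SHOW VARIABLES LIKE 'innodb_expand_fast_index_creation';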

Bugs Fixed

  • InnoDB requires a full table rebuild for foreign key changes. This unnecessarily delays their creation in mysqldump output, so --innodb-optimize-keys has been changed to ignore foreign key constraints. #859078 (Alexey Kopytov).
  • After adding an index using Fast Index Creation, statistics for that index provided by InnoDB were left in a bogus state until an explicit ANALYZE TABLE was executed. #857590 (Alexey Kopytov).
  • QUERY_RESPONSE_TIME did not respect QUERY_RESPONSE_TIME_STATS. #855312 (Oleg Tsarev).
  • The mysqldump option --innodb-optimize-keys did not work correctly with tables where the first UNIQUE key on non-nullable columns was picked as the clustered index by InnoDB in the absence of a PRIMARY KEY. #851674 (Alexey Kopytov).
  • Added a new boolean option, --innodb-blocking-lru-restore, which is OFF by default. When set to ON, restoring from the LRU dump file is synchronous, i.e. InnoDB waits until it is complete before reporting successful startup to the server. #785489 (Alexey Kopytov).
  • The Slow Query Log did not log the error number correctly. #830199 (Oleg Tsarev).
  • The variable log-slow-admin-statements was not listed in SHOW VARIABLES output. #830199 (Oleg Tsarev).
  • Fixed an assertion failure in InnoDB. #814404 (Yasufumi Kinoshita).
  • Since AUTO_INCREMENT columns must be defined as keys, omitting the key specifications and then adding them back in ALTER TABLE doesn’t work for them. mysqldump --innodb-optimize-keys has been fixed to take this into account. #812179 (Alexey Kopytov).

Other Changes

  • Improvements and fixes on general distribution: #845019, #702376, #795747 (Alexey Kopytov, Ignacio Nin, Yasufumi Kinoshita).

More information

For more information, please see the following links:

Oct 13, 2011

Benchmarking Galera replication overhead

When I mention Galera replication, as in my previous post on this topic, the most popular question is how it affects performance.

Of course you should expect some performance overhead, as with Galera replication we add a network round trip and a certification process. How big is it? In this post I present some data from my benchmarks.

For the tests I used tpcc-mysql with a data size of 600W (~60GB) and a 52GB buffer pool. The workload was run with 48 user connections.
Hardware:

Software: Percona Server 5.5.15, both regular and with Galera replication.

During the tests I measured throughput every 10 seconds, which allows observing the stability of the results. As the final result I take the median (the line that divides the top 50% of measurements from the bottom 50%).

In the text I use the following code names:

  • single – result on a single node, without any replication; this is used to establish the baseline
  • wsrep – result on a single node with wsrep_provider enabled. wsrep stands for Write Set REPlication, which is what Galera uses
  • wsrep 2 nodes – result with 2 nodes in a Galera cluster
  • replication – result using 2 nodes under regular MySQL replication
  • semisync – result using 2 nodes with semi-synchronous MySQL replication
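Before looking at the numbers, a quick way to check which mode a node is actually running in, assuming the standard Galera variable and status names:

-- On a wsrep-enabled node, confirm the provider is loaded
-- and see how many nodes have joined the cluster:
SHOW VARIABLES LIKE 'wsrep_provider';
SHOW STATUS LIKE 'wsrep_cluster_size';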

And now, on to the results.

1. Comparing the baseline to the same node with wsrep enabled.

That is, we have 12432 vs. 10821 NOT/10sec (new-order transactions per 10 seconds).
I think the main overhead may come from writing the write sets to disk; Galera 1.0 stores events on disk.

2. We add a second node connected to the first.

The result drops a bit further, as network communication adds its own overhead.
We have 10384 for 2 nodes vs. 10821 for 1 node.
The drop is only about 4.2%.

3. It is unfair to compare a two-node system to a single node, so let’s see how regular MySQL replication performs here.

So we see that with regular replication we get a better result:
11180 for MySQL replication vs. 10384 for 2 Galera nodes.
However, there are two things to consider:

  • You can see quite periodic drops for MySQL replication; I think they are related to binary log rotation.
  • And, second, a much more serious problem: after 1800 seconds of the run, the slave in regular MySQL replication was behind the master by 1000 seconds. You can calculate how many transactions the slave is missing. For Galera replication this is NOT a problem.

4. Now it is interesting to see how semi-sync replication does under this workload.

For semi-sync replication we get 6439 NOT/10sec, which is significantly slower than any other run.
And even with that, the slave was still behind the master by 300 seconds after 1800 seconds of the run.

And here are all the results gathered together:

I personally consider the Galera results very good, taking into account that the second node does not fall behind and stays consistent with the first node.

As a follow-up, it will be interesting to see what overhead we get in a 3-node setup, and also what total throughput we can get if we put load on ALL nodes rather than on a single one.

Scripts and raw results are available on Benchmarks Launchpad.



Oct 13, 2011

When EXPLAIN estimates can go wrong!

I have been working on a few customer cases, and one interesting case popped up. The customer was facing a peculiar problem where the rows column in the EXPLAIN output of a query was totally off: the actual number of rows was 18 times more than the number of rows reported by MySQL in the EXPLAIN output. This can be a real pain, as MySQL uses the row estimate to pick and choose indexes, and it could really be picking the wrong index simply because of a wrong estimate.

The customer reported that he had changed the value of innodb_stats_sample_pages from 8 to 256 with no effect. That is expected: innodb_stats_sample_pages would really have no effect on this row estimate, because page sampling is used to generate index cardinality estimates, which are in turn used by MySQL to pick and choose indexes, and hence it is irrelevant to this problem.
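For reference, this is roughly what such a change looks like (using the test table whose definition appears below); it refreshes index cardinality statistics but does not affect the range estimate in question:

-- Sample more pages when computing index cardinality statistics:
SET GLOBAL innodb_stats_sample_pages = 256;
-- Force the statistics to be recomputed:
ANALYZE TABLE test_estimate;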

Next, I proceeded to test using MySQL 5.1.58 and 5.5.15 vanilla releases.

Following is the definition of the table that I used for the test:

mysql [localhost] {msandbox} (foo2) > show create table test_estimate \G
*************************** 1. row ***************************
       Table: test_estimate
Create Table: CREATE TABLE `test_estimate` (
  `id` int(11) NOT NULL DEFAULT '0',
  `created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `type` tinyint(4) NOT NULL DEFAULT '0',
  KEY `id` (`id`),
  KEY `type_created` (`type`,`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

First I did a test for a SIMPLE SELECT to see the actual count of rows:

mysql [localhost] {msandbox} (foo2) > select count(*) from test_estimate where type=6 \G
*************************** 1. row ***************************
count(*): 3372104

And then proceeded to run the EXPLAIN on both 5.1 and 5.5:

On 5.1:

mysql [localhost] {msandbox} (foo2) > explain select count(*) from test_estimate where type=6 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_estimate
         type: ref
possible_keys: type_created
          key: type_created
      key_len: 1
          ref: const
         rows: 185440
        Extra: Using index

On 5.5:

mysql [localhost] {msandbox} (foo2) > explain select count(*) from test_estimate where type=6 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_estimate
         type: ref
possible_keys: type_created
          key: type_created
      key_len: 1
          ref: const
         rows: 3135918
        Extra: Using index

The estimate on 5.1 is way off; you don’t expect that much variation between the actual number of rows and the estimated rows. The results on 5.5, on the other hand, are much closer and acceptable.

Next, I decided to do a test run for a SELECT involving RANGE SCAN:

The actual count of rows is:

mysql [localhost] {msandbox} (foo2) > select count(*) from test_estimate where type between 3 and 6 \G
*************************** 1. row ***************************
count(*): 19391022

And the EXPLAIN result,

On 5.1:

mysql [localhost] {msandbox} (foo2) > explain select count(*) from test_estimate where type between 3 and 6 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_estimate
         type: range
possible_keys: type_created
          key: type_created
      key_len: 1
          ref: NULL
         rows: 339184
        Extra: Using where; Using index

On 5.5:

mysql [localhost] {msandbox} (foo2) > explain select count(*) from test_estimate where type between 3 and 6 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_estimate
         type: range
possible_keys: type_created
          key: type_created
      key_len: 1
          ref: NULL
         rows: 14313458
        Extra: Using where; Using index

Again, the row estimate on 5.1 is way off: it is as much as 57 times less than the actual number of rows, which is not acceptable at all, while 5.5 returns an acceptable estimate.

This test shows that there is a bug in how MySQL 5.1 calculates row estimates. The bug was tracked down to http://bugs.mysql.com/bug.php?id=53761 and was indeed fixed in 5.5, as my tests show.

Vasil Dimov goes on to explain the reason for the bug:
“For a given level the algo knows the number of pages in the requested range and the number of records on the leftmost and the rightmost page. Then it assumes all pages in between contain the average between the two border pages and multiplies this average number by the number of intermediate pages.”

Obviously this assumption, that all the pages in the requested range hold a roughly similar number of records, fails when the pages between the leftmost and the rightmost page have a much smaller or larger number of records per page. This does not happen in all cases, but there are two scenarios where it can (a concrete example follows the list):

  1. when you have a large number of records concentrated under a key value that happens to lie in the intermediate pages,
  2. when you have a large number of records concentrated under key values that lie in the leftmost and rightmost pages, and very few records under the key values that lie in the intermediate pages.
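To make this concrete with hypothetical numbers: suppose a range spans 1,000 pages, the leftmost and rightmost pages each hold 6 records, and the intermediate pages each hold about 340. The old algorithm estimates 1,000 * (6 + 6) / 2 = 6,000 rows, while the true count is closer to 340,000, an underestimate by a factor of about 57, in line with what we saw above.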

The fix that is introduced is:
“Same idea, but peek a few (10) of the intermediate pages to get a better estimate of the average number of records per page. If there are less than 10 intermediate pages then all of them will be scanned and the result will be precise, not an estimation.”

This seems to be a good fix, as sampling more of the intermediate pages increases the quality of the estimate, and it makes the estimate precise if the index tree is not very wide.

