May
31
2010
--

On Good Instrumentation

In so many cases troubleshooting applications I keep thinking how much more efficient things could be going if only there would be a good instrumentation available. Most of applications out there have very little code to help understand what is going on and if it is there it is frequently looking at some metrics which are not very helpful.

If you look at the system from bird eye view – system needs to process transactions and you want it to successfully complete large number of transactions it gets (this is what called availability) and we want it to serve them with certain response time, which is what is called performance. There could be many variables in environment which change – load, number of concurrent users, database, the way users use the system but in the nutshell all what you really care is having predictable response time within certain range. So if we care about response time – this is exactly what our instrumentation should measure

Response Time Summary We want to understand where exactly response time comes from. For example if we define transaction as the time it took to generate HTML page we want to understand how much time was spent waiting on the database, memcache, other external services, as well as how much CPU time it consumed.

Now what is important we need this information for individual transactions. It may be every transaction which is best and easily achievable for small-medium systems or at least for large enough sample. It is very important this information is available for individual transactions not the average. Average is useless because 100 transactions taking 1 sec and 99 transactions having 1ms and 1 taking 99.1 sec will have the same average while for sake of performance analyzes these are completely different. When you have transaction sample make sure it contains fair population of transaction – getting only transactions which are slow is not helpful as we might want to compare them to the fast transactions to understand why they are slow.

What kind of components do you need to have in response time summary – all components which are significant enough. If your instrumentation has 95% of response time unaccounted for it is useless. You also want blocks not to mix apples with oranges. For example “mysql and memcache” block would not be helpful. Even further I would prefer to split “mysql time” in the “connect time” and “query time” as there are situations when one but not other would be affected.

In is important for response time summary stored in the logs which are easy to query so you can analyze data in a lot of various ways. Sometimes you may find the response time is impacted by queries from certain user, in others it may be attributed to different application/web server.

The goal for Response Time Summary is to quickly point direction where problem happens. Whenever you have spike in response time or it is bad response time for certain kind of request you can quickly understand where does it come from ? Is it wasted CPU time Slow response from MySQL or Memcache.

I also like to see numbers of calls stored together with attributed response time. For example I’d like to see number of mysql calls in addition to MySQL response time. This helps to understand if it is the issue with number of queries or their performance. If I see 2 queries taking 30 seconds it is clearly slow queries. If it is 10.000 queries executed and total response time is 4 sec I know it is pretty much as good as it gets with standard Ethernet network and finding a ways to reduce number of queries is going to be more helpful.

The Glue Our applications involve multiple layers and typically higher layer can only report response time it took to call lower layer, but not the reason for that response time. For example we can report time it took to execute SQL query from PHP application side, but we can’t say why it has taken so long. Was it row level lock ? waiting on disk IO or was it simply question of burning a lot of CPU. On the other hand this information may be available in the instrumentation stats from that lower layer – for example in MySQL Query Log. What is important is however to be able to connect the data from these logs – glue them together. The easy way to do it is to provide an unique identifier to all requests and put it in the logs with request of the lower levels. With mySQL the simple way to do it is to put it in the comments for queries you execute.

Optional Tracing The information in lower layers logs is very helpful however it typically have two problems. First not every layer has good logs. For example if you’re running memcached you probably do not have the logs detailing all requests and their response time. Second – the lower layer may only know response time from its vantage point, which in many cases does not include network communication time which can be very important.

Tracing should be optional and normally applied to the small sample of requests, though it needs to be detailed. Typically you would include the calls to the lower level services together with timestamp, the response summary with timestamp again. The information about request has to be complete enough to identify target action and response completely. For example if I’m speaking about memcache I’d like to know which server:port request was issued to which key was requested, and on response I’d like to know if it was hit, miss or error.

The way I use it may be as follows. I see the increased response time for given kind of request. I see response time is coming from MySQL. I check the number of MySQL Queries and it is 5x when it usually is for this kind of request. Looking at memcache stats I can see high number of misses. Looking at some available traces shows the server memcache01 has very high miss ratio. Checking what is going on with memcache01 shows it just was restarted (and hence has almost empty cache). This is important example as it shows your increased response time from MySQL may not have anything to do with MySQL itself but you would not know unless you’re capturing the right data.

If you’re looking for nice example framework for instrumentation, check out instrumentation for PHP – It has everything mentioned by tracing which is trivial to add.


Entry posted by peter |
One comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Written by in: Zend Developer |
May
26
2010
--

Zend Framework 1.10.5 Released

The Zend Framework team announces the immediate availability of Zend
Framework 1.10.5, our fifth maintenance release in the 1.10 series. This
release includes around 60 bug fixes, many due to the bug hunt days
held last week .

You may download it from the Zend Framework site .

May
26
2010
--

Percona Welcomes Justin Swanhart

Percona is pleased to officially welcome Justin Swanhart to our team of consultants.

Before joining Percona, Justin worked as a MySQL DBA at Gazillion, Yahoo, and Kickfire. Justin has become a regular contributor here on the MySQL Performance Blog as well as being an active blogger at http://swanhart.livejournal.com/. He is very active in the community, maintaining FlexViews, a stored procedure managed solution for materialized view creation and maintenance in MySQL 5.1 as well as instrumentation-for-php, a suite of PHP classes for easing the implementation of instrumentation in applications.

Justin, a big welcome – we are fortunate indeed you’re working with us!


Entry posted by Ryan Lowe |
2 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Written by in: Zend Developer |
May
25
2010
--

FlashCache: tpcc workload

This is my last post in series on FlashCache testing when the cache is placed on Intel SSD card.

This time I am using tpcc-like workload with 1000 Warehouses ( that gives 100GB of data) on Dell PowerEdge R900 with 32GB of RAM, 22GB allocated for buffer pool and I put 70GB on FlashCache partition ( just to simply test the case when data does not fit into cache partition).

Please note in this configuration the benchmark is very write intensive, and it is not going be easy for FlashCache, as in background it has to write blocks to RAID anyway, and write rate in final place is limited by RAID. So all performance benefits will come from read hits

The full report and results are available on benchmark Wiki
http://www.percona.com/docs/wiki/benchmark:flashcache:tpcc:start.

Short version of results are on graph:

In summary:
on RAID final result: 2556.592 TpmC
on Intel SSD X25-M: 7084.483 TpmC
on FlashCache with 20% dirty pages: 2632.892 TpmC
on FlashCache with 80% dirty pages: 4468.883 TpmC

So with 20% dirty pages the benefit are really miserable, and it is quite explainable ( see note above about write intensive workload), but really on the graph we can see that probably 2h was not enough to warmup FlashCache enough.
And this is interesting problem with FlashCache what I see. Warmup by simple copying files does not work (you need O_DIRECT with proper blocksize), and you only rely on InnoDB in this case, and it takes about 30min+ to fill FlashCache. Solution there would be PRELOAD TABLE / INDEX, and it is in our roadmap for XtraDB.

With 80% dirty pages the performance gain in 1.7x and it is pretty decent, as you can get it for 500$ ( price of Intel X25-M card) ( for this particular workload, your experience may vary!).

On this stage I consider FlashCache as pretty stable and ready for an evaluation on real workloads ( kudos to Facebook team, they provide really stable release).

I actually did pretty bad test – just turned off power on SSD drive in the middle of tpcc-mysql run,
just SSD power, not whole server. No wonder FlashCache complained on failed writes, and after that I restarted full system. I was expecting that database is going to trash, but after restart FlashCache was able to attach previous cache, and MySQL was able to start and finish crash recovery. I am impressed.

In my next rounds I am looking to run similar benchmarks on FusionIO card.

P.S. And if CentOS team reads this post – please change default IO scheduler from CFQ to Deadline. Seriously, it makes so much damage on performance on servers with IO intensive workloads, so it should be the first action after CentOS installation. And I doubt that there big usage of CentOS on desktop systems anyway.


Entry posted by Vadim |
10 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Written by in: Zend Developer |
May
25
2010
--

Web Builder Zone: Zend_Test for Acceptance TDD

On the Web Builder Zone (of DZone ) Giorgio Sironi has posted an article looking at the Zend_Test component of the Zend Framework and how to use it for acceptance test-driven development .

May
25
2010
--

PHPBuilder.com: Managing Zend Framework Layouts

On PHPBuilder.com there’s a new tutorial on layouts in Zend Framework applications. The tools the framework gives you makes things much simpler when it comes to changing layouts and updating the general structure of your site.

May
25
2010
--

Some seats free for the upcoming german symfony training

There are some seats free for my upcoming german symfony training session (http://bit.ly/adl47z). This training takes place mid-june (16-18th June) in a training center in Heilbronn.

Es sind noch ein paar Plätze für meine kommende offene Symfony-Schulung verfügbar (http://bit.ly/adl47z) Diese findet am 16. bis 18. Juni in einem Trainingszentrum in Heilbronn statt.

May
24
2010
--

Tuning InnoDB Concurrency Tickets

InnoDB has an oft-unused parameter innodb_concurrency_tickets that seems widely misunderstood. From the docs: “The number of threads that can enter InnoDB concurrently is determined by the innodb_thread_concurrency variable. A thread is placed in a queue when it tries to enter InnoDB if the number of threads has already reached the concurrency limit. When a thread is allowed to enter InnoDB, it is given a number of “free tickets” equal to the value of innodb_concurrency_tickets, and the thread can enter and leave InnoDB freely until it has used up its tickets. After that point, the thread again becomes subject to the concurrency check (and possible queuing) the next time it tries to enter InnoDB. The default value is 500…”

What this means from a practical perspective is that each query is allocated 500 tickets when it begins executing. Each time it enters InnoDB, this number is decremented until it reaches zero (“entering InnoDB” appears only to occur when a row is accessed). When it reaches zero, it may-or-may-not be put into a queue and wait to continue execution. InnoDB doesn’t provide us a way in which to determine how many concurrency tickets a query uses, making this parameter notoriously difficult to tune. It is important to note that this variable only comes in to play when innodb_thread_concurrency is greater than zero.

On a stock install of MySQL, here are some example queries and the corresponding number of concurrency tickets used for each:

SQL:

  1. mysql> CREATE TABLE test_table (
  2.     ->     id int
  3.     -> ) ENGINE=InnoDB; – 0 Tickets Used
  4. Query OK, 0 rows affected (0.36 sec)
  5.  
  6. mysql> INSERT INTO test_table (id) VALUES (1); – 0 Tickets Used
  7. Query OK, 1 row affected (0.00 sec)
  8.  
  9. mysql> SELECTFROM test_table; – 1 Ticket Used
  10. +——+
  11. | id   |
  12. +——+
  13. |    1 |
  14. +——+
  15. 1 row IN SET (0.00 sec)
  16.  
  17. mysql> INSERT INTO test_table (id) VALUES (2),(3); – 0 Tickets Used
  18. Query OK, 2 rows affected (0.00 sec)
  19. Records: 2  Duplicates: 0  Warnings: 0
  20.  
  21. mysql> SELECT COUNT(*) FROM test_table; – 3 Tickets Used
  22. +———-+
  23. | COUNT(*) |
  24. +———-+
  25. |        3 |
  26. +———-+
  27. 1 row IN SET (0.00 sec)
  28.  
  29. mysql> UPDATE test_table SET id=4 WHERE id=1; – 4 Tickets Used (because no index, a table scan is performed)
  30. Query OK, 1 row affected (0.00 sec)
  31. Rows matched: 1  Changed: 1  Warnings: 0
  32.  
  33. mysql> ALTER TABLE test_table ADD INDEX (id); – 5 Tickets Used
  34. Query OK, 3 rows affected (0.01 sec)
  35. Records: 3  Duplicates: 0  Warnings: 0

And now on to a more interesting scenario: foreign keys

SQL:

  1. mysql> CREATE TABLE parent (id INT NOT NULL,
  2.     ->                      PRIMARY KEY (id)
  3.     -> ) ENGINE=INNODB; – 0 Tickets Used
  4. Query OK, 0 rows affected (0.01 sec)
  5.  
  6. mysql> CREATE TABLE child (id INT, parent_id INT,
  7.     ->                     INDEX par_ind (parent_id),
  8.     ->                     FOREIGN KEY (parent_id) REFERENCES parent(id)
  9.     ->                       ON DELETE CASCADE
  10.     -> ) ENGINE=INNODB; – 0 Tickets Used
  11. Query OK, 0 rows affected (0.00 sec)
  12.  
  13. mysql> INSERT INTO parent (id) VALUES (1),(2),(3),(4); – 3 Tickets Used
  14. Query OK, 4 rows affected (0.03 sec)
  15.  
  16. mysql> INSERT INTO child (id, parent_id) VALUES (1,1),(1,1),(2,1); – 2 Tickets Used
  17. Query OK, 3 rows affected (0.00 sec)
  18. Records: 3  Duplicates: 0  Warnings: 0
  19.  
  20. mysql> DELETE FROM child WHERE 1; – 6 Tickets Used
  21. Query OK, 3 rows affected (0.02 sec)
  22.  
  23. mysql> ALTER TABLE `child` ADD PRIMARY KEY (`id`,`parent_id`); – 0 Tickets Used
  24. Query OK, 0 rows affected (0.02 sec)
  25. Records: 0  Duplicates: 0  Warnings: 0
  26.  
  27. mysql> INSERT INTO `child` (`id`,`parent_id`) VALUES (1,1), (1,2), (2,1),(2,2); – 3 Tickets Used
  28. Query OK, 4 rows affected (0.01 sec)
  29. Records: 4  Duplicates: 0  Warnings: 0

So, how can we put this into practice, since this information isn’t available to most users?

INSERT w/PRIMARY KEY defined: Number of rows inserted – 1
INSERT w/FOREIGN KEY constraint: Number of rows inserted – 1
SELECT: 1 ticket per row returned
UPDATE: 1 ticket per row examined + 1 ticket per row updated
DELETE: 1 ticket per row examined + 1 ticket per row deleted
ALTER: (2 * rows in the table) – 1

As with any performance optimization effort, you will want to optimize for the common case. If you have a very simple workload, you can calculate these values by hand. But for most workloads with a complex access pattern, we’ll need to estimate or wait for InnoDB to expose this information to us.

What happens in the case where I have two distinct access patterns: single row primary-key lookups and SELECT statements that examine 900 rows? If innodb_concurrency_tickets is set to 500, then all of the single row PK lookups will execute without ever being subject to an additional concurrency check (there is always one when a thread first enters InnoDB) while the 900-row SELECT statements will always be subject to one additional concurrency check (we actually care less about the concurrency check itself than the possibility that it may become queued). Your first instinct may be to increase innodb_concurrency_tickets to >=900 in this case, but that isn’t necessarily the best decision. As stated in the docs, the number of threads that can enter InnoDB is limited by innodb_thread_concurrency (which is why these two variables are most often tuned in concert). To continue the example, if innodb_thread_concurrency is set to 8 and eight 900-row-SELECT statements come in, they will effectively block the PK lookups until one of them is subject to a concurrency check or complete execution and exit InnoDB. If innodb_concurrency_tickets had been increased to >= 900, then ALL of the PK lookups would be blocked until the 900-row-SELECT statements complete execution.

With a maximum value of 4,294,967,295 this has the potential to block other queries for a significant amount of time. Setting innodb_concurrency_tickets too high can have startlingly negative performance implications. On the other hand, if we determine that 99% of the traffic are these single row PK lookups and only 1% are the 900-row SELECTs, we may be tempted to lower the setting to 1 to accommodate the “typical case”. The effects of this, though, would be to cause the 900-row SELECT statements to be subject to 899 concurrency checks. This means 899 potential opportunities to be queued! So, as with most other parameters, this is a balancing act.

It really comes down to the importance of the applicable queries. Imagine those 900-row SELECT statements were actually 10,000 row selects, this would become a more pressing issue. If they are reporting queries used only internally, then it is not so much of an issue and you can leave innodb_concurrency_tickets rather small. If, on the other hand, these are the queries that lead to revenue generation, you may want to give them a bit more dedicated CPU time so they execute that much faster (even at the expense of the PK lookups). In other words, if you’re optimizing for throughput in this scenario, you will tune innodb_concurrency_tickets to the 99th percentile of small PK lookups. If you’re optimizing for response time, you would set it larger to accommodate the larger (important) select statements.

A quick sysbench run gives us the following results (X-axis is innodb_concurrency_tickets, Y-axis is txn/sec. More is better). Since all sysbench queries are 10 rows or less, we don’t really expect to see much of a difference here:

Details:

CODE:

  1. sysbench –test=oltp –oltp-table-size=80000000 –oltp-read-only=off –init-rng=on –num-threads=16 –max-requests=0 –oltp-dist-type=uniform –max-time=300  –mysql-user=root –mysql-socket=/var/lib/mysql/mysql.sock run

Applicable my.cnf settings:

CODE:

  1. innodb_buffer_pool_size=24G
  2. innodb_data_file_path=ibdata1:10M:autoextend
  3. innodb_file_per_table=1
  4. innodb_flush_log_at_trx_commit = 1
  5. innodb_log_buffer_size               = 8M
  6. innodb_log_files_in_group=2
  7. innodb_log_file_size=1900M
  8. innodb_thread_concurrency=16
  9. innodb_flush_method             = O_DIRECT
  10. innodb_write_io_threads=8
  11. innodb_read_io_threads=8
  12. innodb_io_capacity=500
  13. innodb_max_dirty_pages_pct=90
  14. max_connections=3000
  15. query_cache_size=0
  16. skip-name-resolve
  17. table_cache=10000

Entry posted by Ryan Lowe |
2 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Written by in: Zend Developer |
May
23
2010
--

Percona Server security fix releases

As you may know MySQL ® announced 5.0.91 and 5.1.47 with serious security fixes, so we provide binary releases of Percona Server 5.0 and Percona Server / XtraDB 5.1 with security patches.

Fixed bugs:

Release Percona Server 5.0.91-rel22 is available in our download area:
http://www.percona.com/downloads/Percona-Server-5.0/Percona-Server-5.0.91-22/

Release Percona Server/XtraDB 5.1.45-rel10.2 is available also ( we backported patches from 5.1.47 to 5.1.45):
http://www.percona.com/downloads/Percona-Server-5.1/Percona-Server-5.1.45-10.2/

Updated RPM and DEB are also available in our repositories.

We know that there still are many active MySQL 4.0 and MySQL 4.1 installations and we understand how hard it is upgrade 4.0/4.1 to 5.1. With published exploits these installation are now under serious risks. If you are interested in backporting patches to 4.0 / 4.1 or to older 5.0 and 5.1 releases, it is available under Percona Maintenance package or regular Consulting


Entry posted by Vadim |
4 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Written by in: Zend Developer |
May
22
2010
--

Extending Index for Innodb tables can hurt performance in a surprising way

One schema optimization we often do is extending index when there are queries which can use more key part. Typically this is safe operation, unless index length increases dramatically queries which can use index can also use prefix of the new index are they ? It turns there are special cases when this is not the case.

SQL:

  1. CREATE TABLE `idxitest` (
  2.   `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  3.   `a` int(11) NOT NULL,
  4.   `b` int(11) NOT NULL,
  5.   PRIMARY KEY (`id`),
  6.   KEY `a` (`a`)
  7. ) ENGINE=InnoDB AUTO_INCREMENT=6029313 DEFAULT CHARSET=latin1
  8.  
  9. mysql> SELECT count(*) FROM idxitest WHERE a=5 AND b=5;
  10. +———-+
  11. | count(*) |
  12. +———-+
  13. |    60434 |
  14. +———-+
  15. 1 row IN SET (0.69 sec)
  16.  
  17. mysql> EXPLAIN SELECT count(*) FROM idxitest WHERE a=5 AND b=5;
  18. +—-+————-+———-+——+—————+——+———+——-+——–+————-+
  19. | id | select_type | TABLE    | type | possible_keys | KEY  | key_len | ref   | rows   | Extra       |
  20. +—-+————-+———-+——+—————+——+———+——-+——–+————-+
  21. 1 | SIMPLE      | idxitest | ref  | a             | a    | 4       | const | 707820 | USING WHERE |
  22. +—-+————-+———-+——+—————+——+———+——-+——–+————-+
  23. 1 row IN SET (0.00 sec)

The obvious optimization is to extend index from column (a) to column (a,b) right which will make it faster and should not hurt any other queries a lot, right ?

SQL:

  1. mysql> ALTER TABLE idxitest DROP KEY a,ADD KEY(a,b);
  2. Query OK, 0 rows affected (24.84 sec)
  3. Records: 0  Duplicates: 0  Warnings: 0
  4.  
  5. mysql> SELECT count(*) FROM idxitest WHERE a=5 AND b=5;
  6. +———-+
  7. | count(*) |
  8. +———-+
  9. |    60434 |
  10. +———-+
  11. 1 row IN SET (0.02 sec)
  12.  
  13. mysql> EXPLAIN SELECT count(*) FROM idxitest WHERE a=5 AND b=5;
  14. +—-+————-+———-+——+—————+——+———+————-+——–+————-+
  15. | id | select_type | TABLE    | type | possible_keys | KEY  | key_len | ref         | rows   | Extra       |
  16. +—-+————-+———-+——+—————+——+———+————-+——–+————-+
  17. 1 | SIMPLE      | idxitest | ref  | a             | a    | 8       | const,const | 120640 | USING INDEX |
  18. +—-+————-+———-+——+—————+——+———+————-+——–+————-+
  19. 1 row IN SET (0.00 sec)

Wow. The query runs 30 times faster – not only because it has to scan less rows but also because it is index covering query now – it does not need to access the data.

It turns out it is too early to celebrate. Application also had another query which previously ran so fast it hardly could be noticed. It however became a lot slower:

SQL:

  1. # Old Schema
  2.  
  3. mysql> SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  4. +———+—–+—+
  5. | id      | a   | b |
  6. +———+—–+—+
  7. | 3000000 | 100 | 7 |
  8. +———+—–+—+
  9. 1 row IN SET (0.00 sec)
  10.  
  11. mysql> EXPLAIN SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  12. +—-+————-+———-+——+—————+——+———+——-+——–+————-+
  13. | id | select_type | TABLE    | type | possible_keys | KEY  | key_len | ref   | rows   | Extra       |
  14. +—-+————-+———-+——+—————+——+———+——-+——–+————-+
  15. 1 | SIMPLE      | idxitest | ref  | a             | a    | 4       | const | 126074 | USING WHERE |
  16. +—-+————-+———-+——+—————+——+———+——-+——–+————-+
  17. 1 row IN SET (0.00 sec)
  18.  
  19.  
  20.  
  21. # new Schema
  22.  
  23. mysql> SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  24. +———+—–+—+
  25. | id      | a   | b |
  26. +———+—–+—+
  27. | 3000000 | 100 | 7 |
  28. +———+—–+—+
  29. 1 row IN SET (1.01 sec)
  30.  
  31. mysql> EXPLAIN SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  32. +—-+————-+———-+——-+—————+———+———+——+——+————-+
  33. | id | select_type | TABLE    | type  | possible_keys | KEY     | key_len | ref  | rows | Extra       |
  34. +—-+————-+———-+——-+—————+———+———+——+——+————-+
  35. 1 | SIMPLE      | idxitest | INDEX | a             | PRIMARY | 4       | NULL |   36 | USING WHERE |
  36. +—-+————-+———-+——-+—————+———+———+——+——+————-+
  37. 1 row IN SET (0.00 sec)
  38.  
  39. # The plan also can look something like this:
  40.  
  41. mysql> EXPLAIN SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  42. +—-+————-+———-+——+—————+——+———+——-+——+——————————————+
  43. | id | select_type | TABLE    | type | possible_keys | KEY  | key_len | ref   | rows | Extra                                    |
  44. +—-+————-+———-+——+—————+——+———+——-+——+——————————————+
  45. 1 | SIMPLE      | idxitest | ref  | a             | a    | 4       | const |    1 | USING WHERE; USING INDEX; USING filesort |
  46. +—-+————-+———-+——+—————+——+———+——-+——+——————————————+
  47. 1 row IN SET (0.01 sec)

So why did this query become so much slower ? The reason is its plan benefits from Innodb specific feature – index entries being sorted by primary key for each complete key value. So when you have index (a) and id is a primary key the real index is (a,id) when we extend index to (a,b) it really becomes (a,b,id). So if there is a query which used both a and id key part from original index it will quite likely be unable to use new index to full extent.

What is solution ? You can have “redundant” indexes on (a) and (a,b) at the same time. This is something what suppose to work but it well often does not:

SQL:

  1. CREATE TABLE `idxitest` (
  2.   `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  3.   `a` int(11) NOT NULL,
  4.   `b` int(11) NOT NULL,
  5.   PRIMARY KEY (`id`),
  6.   KEY `a` (`a`),
  7.   KEY `a_2` (`a`,`b`)
  8. ) ENGINE=InnoDB AUTO_INCREMENT=6029313 DEFAULT CHARSET=latin1
  9.  
  10. mysql> SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  11. +———+—–+—+
  12. | id      | a   | b |
  13. +———+—–+—+
  14. | 3000000 | 100 | 7 |
  15. +———+—–+—+
  16. 1 row IN SET (1.03 sec)
  17.  
  18. mysql> EXPLAIN SELECT * FROM  idxitest  WHERE a=100 ORDER BY id DESC LIMIT 1;
  19. +—-+————-+———-+——-+—————+———+———+——+——+————-+
  20. | id | select_type | TABLE    | type  | possible_keys | KEY     | key_len | ref  | rows | Extra       |
  21. +—-+————-+———-+——-+—————+———+———+——+——+————-+
  22. 1 | SIMPLE      | idxitest | INDEX | a,a_2         | PRIMARY | 4       | NULL | 2247 | USING WHERE |
  23. +—-+————-+———-+——-+—————+———+———+——+——+————-+
  24. 1 row IN SET (0.00 sec)

MySQL Optimizer considers using both (a) and (a,b) indexes and in the end decides to use neither rather doing full index scan until it finds a=100. This looks like an optimizer glitch in this case because it estimates it will scan 2247 rows in the selected plan, while using (a) index you can get result scanning only 1 row guaranteed.

To really get things going you will need to use FORCE INDEX(a) to force MySQL optimizer using right plan.

These results mean you should be very careful applying index changes from mk-duplicate-key-checker key checker when it comes to redundant indexes. If you have query plans depending on Innodb ordering of data by primary key inside indexes they can become significantly affected.

Optimizer behavior may be different in different MySQL versions. These tests were done with 5.1.45 though I’ve seen same behavior with MySQL 5.0 too.


Entry posted by peter |
9 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Written by in: Zend Developer |

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com