Oct
25
2010
--

MySQL Limitations Part 3: Subqueries

This is the third in a series on what’s seriously limiting MySQL in certain circumstances (links: part 1, 2). This post is about subqueries, which in some cases execute outside-in instead of inside-out as users expect.

It’s easy to pick on subqueries in MySQL, so I’ll try to be gentle. The following query will surprise users unpleasantly:

select * from a where a.id in (select id from b);

Users expect the inner query to execute first, then the results to be substituted into the IN() list. But what happens instead is usually a full scan or index scan of table a, followed by N queries to table b. This is because MySQL rewrites the query to make the inner query dependent on the outer query, which could be an optimization in some cases, but de-optimizes the query in many other cases. NOT IN(SELECT …) queries execute badly, too. (Note: putting a literal list of items in the IN() clause performs fine. It’s only when there is a SELECT inside it that it works poorly.)

The fix for this has been in progress for a few years, and Sergey Petrunia committed working code to the stalled 6.0 release. But it’s not quite clear whether that code was a complete solution. It has not been in any GA or RC release, so it hasn’t been used widely.

To be fair, many other database servers also have poor subquery performance, or have had it in the past and have fixed it. And many MySQL users have learned to simply write JOINs instead, so it isn’t that much of a limitation. But it would be a big improvement if it were fixed.

See if you can guess what limitation number 4 will be!


Entry posted by Baron Schwartz |
17 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Mar
18
2010
--

When the subselect runs faster

A few weeks ago, we had a query optimization request from one of our customer.

The query was very simple like:

CODE:

  1. SELECT * FROM `table` WHERE (col1=‘A’||col1=‘B’) ORDER BY id DESC LIMIT 20 OFFSET 0

This column in the table is looks like this:

CODE:

  1. `col1` enum(‘A’,‘B’,‘C’,‘CD’,‘DE’,‘F’,‘G’,‘HI’) default NULL

The table have 549252 rows and of course, there is an index on the col1. MySQL estimated the cardinality of that index as 87, though what was of course misleading as index cardinality in this case can’t be over 9, as there is only 8(+ NULL) different possible values for this column.

CODE:

  1. +—-+————-+——-+——-+—————+——+———+——+——–+—————————–+
  2. | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra                       |
  3. +—-+————-+——-+——-+—————+——+———+——+——–+—————————–+
  4. 1 | SIMPLE      | table  | range | col1         | col1 | 2       | NULL | 549252 | Using where; Using filesort |
  5. +—-+————-+——-+——-+—————+——+———+——+——–+—————————–+

This query took more than 5 minutes (the rows are large and table does not fit in cache well)

When you want to run this query mysql first will try to find each row where col1 is A or B using index. Then its going to order by the ID using file sort and then send first 20 rows ignoring the rest.

In this case MySQL has 2 indexes where one is usable to find rows, while other is usable to return them in the right order. MySQL can chose only one of them to execute the query – use index to find rows. This is sensible strategy if there is no LIMIT, however it is poor chose if there is one – it is often a lot faster to retrieve rows in order checking WHERE clause for them until required number of rows were returned. Especially in the cases when WHERE clause is not very selective.

So I tried this:

CODE:

  1. select * from table where id in (SELECT id FROM `table` WHERE (col1=‘A’||col1=‘B’)) ORDER BY id DESC LIMIT 20 OFFSET 0;

In this case we forcing MySQL to do retrieve rows in sorted order and checking if it matches our original WHERE clause with subselects. It looks scary if we look at EXPLAIN but in reality the dependent subquery is only executed enough times to produce 20 rows in result set.

CODE:

  1. +—-+——————–+——-+—————–+—————+———+———+——+——–+————-+
  2. | id | select_type        | table | type            | possible_keys | key     | key_len | ref  | rows   | Extra       |
  3. +—-+——————–+——-+—————–+—————+———+———+——+——–+————-+
  4. 1 | PRIMARY            | table  | index           | NULL          | PRIMARY | 4       | NULL | 765105 | Using where |
  5. 2 | DEPENDENT SUBQUERY | table  | unique_subquery | PRIMARY,col1  | PRIMARY | 4       | func |      1 | Using where |
  6. +—-+——————–+——-+—————–+—————+———+———+——+——–+————-+

The result is a lot better result time:

CODE:

  1. (20 rows in set (0.01 sec))

So by rewriting query using subqueries we actually improved it performance 100 times. So subqueries are
not always slowing things down.

Even though proving subqueries are not always slow this way is not the most optimal. We do not really need separate subselect to make MySQL check WHERE clause while scanning table in index order. We can just use FORCE INDEX hint to override MySQL index choice:

CODE:

  1. mysql> explain select * from table FORCE INDEX(PRIMARY) where (col1=‘A’||col1=‘B’) order by id desc limit 20 offset 0;
  2. +—-+————-+——-+——-+—————+———+———+——+——–+————-+
  3. | id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
  4. +—-+————-+——-+——-+—————+———+———+——+——–+————-+
  5. 1 | SIMPLE      | table  | index | NULL          | PRIMARY | 4       | NULL | 549117 | Using where |
  6. +—-+————-+——-+——-+—————+———+———+——+——–+————-+
  7.  
  8. mysql> select * from table FORCE INDEX(PRIMARY) where (col1=‘A’||col1=‘B’) order by id desc limit 20 offset 0;
  9. 20 rows in set (0.00 sec)

This approach works well if WHERE clause is not very selective, otherwise MySQL may need to scan very many rows to find enough matching rows. You can use another trick Peter
wrote. about couple of years ago.


Entry posted by Istvan Podor |
7 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com