Ever wondered how fast stored routines are in MySQL? I just ran a quick micro-benchmark comparing the speed of a stored function against a “roughly equivalent” subquery. The idea is to see how fast SQL procedure code is at doing basically the same thing the subquery does natively (so to speak). There may be shortcomings that are skewing the results here, so your comments are welcome.
Before we go further, I want to make sure you know that the queries I’m writing here are deliberately mis-optimized to force a bad execution plan. You should never use IN() subqueries the way I do, at least not in MySQL 5.1 and earlier.
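For contrast, here is one way the query below could be written without an IN() subquery: join City to a derived table that lists each English-speaking country code once. This is a sketch I have not timed here. It assumes the World sample database and keeps the same (deliberately index-hostile) lower() predicate so the result should be identical. In MySQL 5.1 a derived table in the FROM clause is materialized once, rather than re-run for every City row.

```sql
-- Join-based form of the IN() query (sketch; assumes the World sample database).
-- "en" is an alias chosen here for illustration. DISTINCT guarantees each
-- country code appears once, so each City row is matched at most once and
-- SUM() is not inflated.
SELECT SQL_NO_CACHE SUM(ci.Population)
FROM City AS ci
INNER JOIN (
    SELECT DISTINCT co.Code
    FROM Country AS co
    INNER JOIN CountryLanguage AS cl ON cl.CountryCode = co.Code
    WHERE LOWER(cl.Language) = 'English'
) AS en ON en.Code = ci.CountryCode;
```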
I loaded the World sample database and cooked up this query:
```
SELECT sql_no_cache sum(ci.Population) FROM City AS ci
WHERE CountryCode IN (
   SELECT DISTINCT co.Code FROM Country AS co
   INNER JOIN CountryLanguage AS cl ON cl.CountryCode = co.Code
   WHERE lower(cl.Language) = 'English');
+--------------------+
| sum(ci.Population) |
+--------------------+
|          237134840 |
+--------------------+
1 row in set (0.23 sec)
```
This pretty consistently runs in about a quarter of a second. If you look at the abridged EXPLAIN plan below, you’ll see that the query does a full table scan of City (the outer query) and then executes the subquery for each row:
```
mysql> EXPLAIN SELECT ….\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: ci
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4079
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
```
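The DEPENDENT SUBQUERY label comes from how MySQL 5.1 handles IN(): instead of building the subquery’s result set once, the optimizer pushes the outer value into the subquery (the documented IN-to-EXISTS strategy) and re-runs it for each City row. Logically, the query then behaves like the correlated EXISTS form sketched below. This illustrates the semantics only; it is not the optimizer’s literal internal rewrite. The extra co.Code = ci.CountryCode condition is the same per-row filter that the stored function below applies with co.Code = c.

```sql
-- Correlated EXISTS form of the IN() query (sketch; assumes the World sample
-- database). The subquery refers to ci.CountryCode from the outer row, so it
-- is evaluated once per City row, just like the dependent subquery above.
SELECT SQL_NO_CACHE SUM(ci.Population)
FROM City AS ci
WHERE EXISTS (
    SELECT 1
    FROM Country AS co
    INNER JOIN CountryLanguage AS cl ON cl.CountryCode = co.Code
    WHERE LOWER(cl.Language) = 'English'
      AND co.Code = ci.CountryCode
);
```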
Next, I took the subquery and rewrote it as a stored function:
```
mysql> delimiter //
mysql> CREATE FUNCTION speaks_english(c char(3)) returns integer deterministic
    -> begin
    -> declare res int;
    -> SELECT count(DISTINCT co.Code) INTO res FROM Country AS co
    -> INNER JOIN CountryLanguage AS cl ON cl.CountryCode = co.Code
    -> WHERE lower(cl.Language) = 'English' AND co.Code = c;
    -> RETURN res;
    -> end//
mysql> delimiter ;
```
Now the query can be rewritten like this:
```
mysql> SELECT sql_no_cache sum(ci.Population) FROM City AS ci WHERE speaks_english(CountryCode) > 0;
+--------------------+
| sum(ci.Population) |
+--------------------+
|          237134840 |
+--------------------+
1 row in set (1.00 sec)
```
If we EXPLAIN it, we get output similar to the first row of the plan shown above, but without the two DEPENDENT SUBQUERY rows. The query can’t be optimized to use an index, because the stored function is opaque to the optimizer. That is why I deliberately wrote the subquery badly in the first query: it forces both versions into a full scan of City with the inner lookup repeated for each row. (If you can think of a better way to compare apples and, um, apples, please comment.)
In effect, the poorly optimized subquery now runs inside that function instead.
And it’s consistently about four times slower (1.00 sec versus 0.23 sec). That’s all I wanted to show here. Thanks for reading.
Entry posted by Baron Schwartz