Jun
26
2019
--

Stored Functions and Temporary Tables are Not a Good Fit

Stored Functions

Stored FunctionsIn this blog post, I am going to show why we have to be careful with stored functions in select list, as a single query can cause thousands of queries in the background if we aren’t cautious.

For this example, I am only going to use the SLEEP function to demonstrate the issue, but you could use any other stored functions.

Here is the test schema and data:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
 id INT NOT NULL PRIMARY KEY auto_increment,
 gcol INT NOT NULL,
 fcol INT NOT NULL,
 INDEX (gcol)
);
INSERT INTO t1 VALUES (NULL,1,1),(NULL,2,1),(NULL,3,1),(NULL,4,2),(NULL,5,1);

And the first query:

mysql [localhost] {msandbox} (test) > SELECT gcol, SLEEP(1) FROM t1 WHERE 1 GROUP BY gcol LIMIT 1;
+------+----------+
| gcol | SLEEP(1) |
+------+----------+
|    1 |        0 |
+------+----------+
1 row in set (1.00 sec)

The query takes one second, which means the SLEEP was executed after/during the LIMIT step.

The second query creates a temp table:

mysql [localhost] {msandbox} (test) > SELECT fcol, SLEEP(1) FROM t1 WHERE 1 GROUP BY fcol LIMIT 1;
+------+----------+
| fcol | SLEEP(1) |
+------+----------+
|    1 |        0 |
+------+----------+
1 row in set (5.02 sec)

It takes 5 seconds, meaning the SLEEP was executed for every row before LIMIT, and the only differences between these two queries is the second one uses temp tables. (Yes, the schema and query could be improved to avoid temp tables, but this is only a test case and you can’t avoid temp tables all the time.)

MySQL uses a different order of steps here and does the select list evaluation before/during creating the temp tables.

As you can see, there are functions in the select list, and MySQL will create temp tables, and that could cause thousands of extra queries in the background. I recently had a case where running a single query with functions in the select list resulted in 333,906 queries in the background! In the query, there was a Limit 0,10.

There is a very good blog post describing the order of the operations in SQL, which should be the following:

- From
- Where
- Group By
- Aggregations
- Having
- Window
- Select
- Distinct
- Uninon
- Order by
- Offset
- Limit

Based on this, if MySQL strictly follows the SQL order, the LIMIT should be the last, and the first query should take 5s because the Select list evaluation should happen before that for every row. But I guess this is a performance optimization in MySQL, to only run the evaluation for the limited number of rows.

When temp tables are involved, MySQL will use a different logic and do the evaluation before/during the group by when it creates the temp tables.

When MySQL creates a temp table, it has to materialize every single row, and that includes the functions as well. If your query is going to create a huge temp table, MySQL will execute the function for all those rows.

MySQL does not have generated columns in internal temp tables, but what might do the trick here is to just point to a function that needs to be executed when MySQL reads the data out from the temp table.

I have created a ticket, but we are still debating if there is any room for improvement here.

Using DML queries in these functions

If you are using DML (insert/update/delete) queries in stored functions, and you are calling them in the select list, you have to be careful because if MySQL creates temp tables it will call those queries for all the rows in the temp table as well. So you might just want to insert only one row in another table, as you will end up with thousands if not.

How can we avoid this?

If you are facing this issue, you have to analyze your queries. You might only need an extra index to avoid temp tables, or you could rewrite the query.

Where can I see if this is happening?

Unfortunately, in this case, the slow query log cannot help us, because these function calls and the queries from the functions are not logged. This could also be a feature request because it would make debugging much easier. (I have already created a feature request in Percona Server to log queries from stored functions.)

However, in the slow query log for the original query, the Rows_examined would be very high if you are using stored functions and that number is way higher than it should be and might cause problems. Also, the general log can be helpful for investigating the problem here, because in the general log we can see all these function calls as well.

Conclusion

If you are using stored functions in Select list, you have to make sure the query does not use temp tables, otherwise, it could cause many extra queries in the background.

Jul
12
2018
--

Why MySQL Stored Procedures, Functions and Triggers Are Bad For Performance

Execution map for func1()

MySQL stored procedures, functions and triggers are tempting constructs for application developers. However, as I discovered, there can be an impact on database performance when using MySQL stored routines. Not being entirely sure of what I was seeing during a customer visit, I set out to create some simple tests to measure the impact of triggers on database performance. The outcome might surprise you.

Why stored routines are not optimal performance wise: short version

Recently, I worked with a customer to profile the performance of triggers and stored routines. What I’ve learned about stored routines: “dead” code (the code in a branch which will never run) can still significantly slow down the response time of a function/procedure/trigger. We will need to be careful to clean up what we do not need.

Profiling MySQL stored functions

Let’s compare these four simple stored functions (in MySQL 5.7):

Function 1:

CREATE DEFINER=`root`@`localhost` FUNCTION `func1`() RETURNS int(11)
BEGIN
	declare r int default 0;
RETURN r;
END

This function simply declares a variable and returns it. It is a dummy function

Function 2:

CREATE DEFINER=`root`@`localhost` FUNCTION `func2`() RETURNS int(11)
BEGIN
    declare r int default 0;
    IF 1=2
    THEN
		select levenshtein_limit_n('test finc', 'test func', 1000) into r;
    END IF;
RETURN r;
END

This function calls another function, levenshtein_limit_n (calculates levenshtein distance). But wait: this code will never run – the condition IF 1=2 will never be true. So that is the same as function 1.

Function 3:

CREATE DEFINER=`root`@`localhost` FUNCTION `func3`() RETURNS int(11)
BEGIN
    declare r int default 0;
    IF 1=2 THEN
		select levenshtein_limit_n('test finc', 'test func', 1) into r;
    END IF;
    IF 2=3 THEN
		select levenshtein_limit_n('test finc', 'test func', 10) into r;
    END IF;
    IF 3=4 THEN
		select levenshtein_limit_n('test finc', 'test func', 100) into r;
    END IF;
    IF 4=5 THEN
		select levenshtein_limit_n('test finc', 'test func', 1000) into r;
    END IF;
RETURN r;
END

Here there are four conditions and none of these conditions will be true: there are 4 calls of “dead” code. The result of the function call for function 3 will be the same as function 2 and function 1.

Function 4:

CREATE DEFINER=`root`@`localhost` FUNCTION `func3_nope`() RETURNS int(11)
BEGIN
    declare r int default 0;
    IF 1=2 THEN
		select does_not_exit('test finc', 'test func', 1) into r;
    END IF;
    IF 2=3 THEN
		select does_not_exit('test finc', 'test func', 10) into r;
    END IF;
    IF 3=4 THEN
		select does_not_exit('test finc', 'test func', 100) into r;
    END IF;
    IF 4=5 THEN
		select does_not_exit('test finc', 'test func', 1000) into r;
    END IF;
RETURN r;
END

This is the same as function 3 but the function we are running does not exist. Well, it does not matter as the

select does_not_exit

  will never run.

So all the functions will always return 0. We expect that the performance of these functions will be the same or very similar. Surprisingly it is not the case! To measure the performance I used the “benchmark” function to run the same function 1M times. Here are the results:

+-----------------------------+
| benchmark(1000000, func1()) |
+-----------------------------+
|                           0 |
+-----------------------------+
1 row in set (1.75 sec)
+-----------------------------+
| benchmark(1000000, func2()) |
+-----------------------------+
|                           0 |
+-----------------------------+
1 row in set (2.45 sec)
+-----------------------------+
| benchmark(1000000, func3()) |
+-----------------------------+
|                           0 |
+-----------------------------+
1 row in set (3.85 sec)
+----------------------------------+
| benchmark(1000000, func3_nope()) |
+----------------------------------+
|                                0 |
+----------------------------------+
1 row in set (3.85 sec)

As we can see func3 (with four dead code calls which will never be executed, otherwise identical to func1) runs almost 3x slower compared to func1(); func3_nope() is identical in terms of response time to func3().

Visualizing all system calls from functions

To figure out what is happening inside the function calls I used performance_schema / sys schema to create a trace with ps_trace_thread() procedure

  1. Get the thread_id for the MySQL connection:
    mysql> select THREAD_ID from performance_schema.threads where processlist_id = connection_id();
    +-----------+
    | THREAD_ID |
    +-----------+
    |        49 |
    +-----------+
    1 row in set (0.00 sec)
  2. Run ps_trace_thread in another connection passing the thread_id=49:
    mysql> CALL sys.ps_trace_thread(49, concat('/var/lib/mysql-files/stack-func1-run1.dot'), 10, 0, TRUE, TRUE, TRUE);
    +--------------------+
    | summary            |
    +--------------------+
    | Disabled 0 threads |
    +--------------------+
    1 row in set (0.00 sec)
    +---------------------------------------------+
    | Info                                        |
    +---------------------------------------------+
    | Data collection starting for THREAD_ID = 49 |
    +---------------------------------------------+
    1 row in set (0.00 sec)
  3. At that point I switched to the original connection (thread_id=49) and run:
    mysql> select func1();
    +---------+
    | func1() |
    +---------+
    |       0 |
    +---------+
    1 row in set (0.00 sec)
  4. The sys.ps_trace_thread collected the data (for 10 seconds, during which I ran the
    select func1()

     ), then it finished its collection and created the dot file:

    +-----------------------------------------------------------------------+
    | Info                                                                  |
    +-----------------------------------------------------------------------+
    | Stack trace written to /var/lib/mysql-files/stack-func3nope-new12.dot |
    +-----------------------------------------------------------------------+
    1 row in set (9.21 sec)
    +-------------------------------------------------------------------------------+
    | Convert to PDF                                                                |
    +-------------------------------------------------------------------------------+
    | dot -Tpdf -o /tmp/stack_49.pdf /var/lib/mysql-files/stack-func3nope-new12.dot |
    +-------------------------------------------------------------------------------+
    1 row in set (9.21 sec)
    +-------------------------------------------------------------------------------+
    | Convert to PNG                                                                |
    +-------------------------------------------------------------------------------+
    | dot -Tpng -o /tmp/stack_49.png /var/lib/mysql-files/stack-func3nope-new12.dot |
    +-------------------------------------------------------------------------------+
    1 row in set (9.21 sec)
    Query OK, 0 rows affected (9.45 sec)

I repeated these steps for all the functions above and then created charts of the commands.

Here are the results:

Func1()

Execution map for func1()

Func2()

Execution map for func2()

Func3()

Execution map for func3()

 

As we can see there is a sp/jump_if_not call for every “if” check followed by an opening tables statement (which is quite interesting). So parsing the “IF” condition made a difference.

For MySQL 8.0 we can also see MySQL source code documentation for stored routines which documents how it is implemented. It reads:

Flow Analysis Optimizations
After code is generated, the low level sp_instr instructions are optimized. The optimization focuses on two areas:

Dead code removal,
Jump shortcut resolution.
These two optimizations are performed together, as they both are a problem involving flow analysis in the graph that represents the generated code.

The code that implements these optimizations is sp_head::optimize().

However, this does not explain why it executes “opening tables”. I have filed a bug.

When slow functions actually make a difference

Well, if we do not plan to run one million of those stored functions we will never even notice the difference. However, where it will make a difference is … inside a trigger. Let’s say that we have a trigger on a table: every time we update that table it executes a trigger to update another field. Here is an example: let’s say we have a table called “form” and we simply need to update its creation date:

mysql> update form set form_created_date = NOW() where form_id > 5000;
Query OK, 65536 rows affected (0.31 sec)
Rows matched: 65536  Changed: 65536  Warnings: 0

That is good and fast. Now we create a trigger which will call our dummy func1():

CREATE DEFINER=`root`@`localhost` TRIGGER `test`.`form_AFTER_UPDATE`
AFTER UPDATE ON `form`
FOR EACH ROW
BEGIN
	declare r int default 0;
	select func1() into r;
END

Now repeat the update. Remember: it does not change the result of the update as we do not really do anything inside the trigger.

mysql> update form set form_created_date = NOW() where form_id > 5000;
Query OK, 65536 rows affected (0.90 sec)
Rows matched: 65536  Changed: 65536  Warnings: 0

Just adding a dummy trigger will add 2x overhead: the next trigger, which does not even run a function, introduces a slowdown:

CREATE DEFINER=`root`@`localhost` TRIGGER `test`.`form_AFTER_UPDATE`
AFTER UPDATE ON `form`
FOR EACH ROW
BEGIN
	declare r int default 0;
END
mysql> update form set form_created_date = NOW() where form_id > 5000;
Query OK, 65536 rows affected (0.52 sec)
Rows matched: 65536  Changed: 65536  Warnings: 0

Now, lets use func3 (which has “dead” code and is equivalent to func1):

CREATE DEFINER=`root`@`localhost` TRIGGER `test`.`form_AFTER_UPDATE`
AFTER UPDATE ON `form`
FOR EACH ROW
BEGIN
	declare r int default 0;
	select func3() into r;
END
mysql> update form set form_created_date = NOW() where form_id > 5000;
Query OK, 65536 rows affected (1.06 sec)
Rows matched: 65536  Changed: 65536  Warnings: 0

However, running the code from the func3 inside the trigger (instead of calling a function) will speed up the update:

CREATE DEFINER=`root`@`localhost` TRIGGER `test`.`form_AFTER_UPDATE`
AFTER UPDATE ON `form`
FOR EACH ROW
BEGIN
    declare r int default 0;
    IF 1=2 THEN
		select levenshtein_limit_n('test finc', 'test func', 1) into r;
    END IF;
    IF 2=3 THEN
		select levenshtein_limit_n('test finc', 'test func', 10) into r;
    END IF;
    IF 3=4 THEN
		select levenshtein_limit_n('test finc', 'test func', 100) into r;
    END IF;
    IF 4=5 THEN
		select levenshtein_limit_n('test finc', 'test func', 1000) into r;
    END IF;
END
mysql> update form set form_created_date = NOW() where form_id > 5000;
Query OK, 65536 rows affected (0.66 sec)
Rows matched: 65536  Changed: 65536  Warnings: 0

Memory allocation

Potentially, even if the code will never run, MySQL will still need to parse the stored routine—or trigger—code for every execution, which can potentially lead to a memory leak, as described in this bug.

Conclusion

Stored routines and trigger events are parsed when they are executed. Even “dead” code that will never run can significantly affect the performance of bulk operations (e.g. when running this inside the trigger). That also means that disabling a trigger by setting a “flag” (e.g.

if @trigger_disable = 0 then ...

 ) can still affect performance of bulk operations.

The post Why MySQL Stored Procedures, Functions and Triggers Are Bad For Performance appeared first on Percona Database Performance Blog.

Aug
16
2009
--

A micro-benchmark of stored routines in MySQL

Ever wondered how fast stored routines are in MySQL? I just ran a quick micro-benchmark to compare the speed of a stored function against a “roughly equivalent” subquery. The idea — and there may be shortcomings that are poisoning the results here, your comments welcome — is to see how fast the SQL procedure code is at doing basically the same thing the subquery code does natively (so to speak).

Before we go further, I want to make sure you know that the queries I’m writing here are deliberately mis-optimized to force a bad execution plan. You should never use IN() subqueries the way I do, at least not in MySQL 5.1 and earlier.

I loaded the World sample database and cooked up this query:

SQL:

  1. SELECT sql_no_cache sum(ci.Population) FROM City AS ci
  2.   WHERE CountryCode IN (
  3.     SELECT DISTINCT co.Code FROM Country AS co
  4.       INNER JOIN CountryLanguage AS cl ON cl.CountryCode = co.Code
  5.     WHERE lower(cl.LANGUAGE) = ‘English’);
  6. +——————–+
  7. | sum(ci.Population) |
  8. +——————–+
  9. |          237134840 |
  10. +——————–+
  11. 1 row IN SET (0.23 sec)

This pretty consistently runs in just about 1/4th of a second. If you look at the abridged explain plan below, you’ll see the query is doing a table scan against the first query, and then executing the subquery for each row:

SQL:

  1. mysql> EXPLAIN SELECT ….\G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: PRIMARY
  5.         TABLE: ci
  6.          type: ALL
  7. possible_keys: NULL
  8.           KEY: NULL
  9.       key_len: NULL
  10.           ref: NULL
  11.          rows: 4079
  12.         Extra: USING WHERE
  13. *************************** 2. row ***************************
  14.            id: 2
  15.   select_type: DEPENDENT SUBQUERY
  16. *************************** 3. row ***************************
  17.            id: 2
  18.   select_type: DEPENDENT SUBQUERY

Now I took the subquery and basically rewrote it as a stored function.

SQL:

  1. mysql> delimiter //
  2. mysql> CREATE FUNCTION speaks_english(c char(3)) returns integer deterministic
  3.     > begin
  4.     > declare res int;
  5.     > SELECT count(DISTINCT co.Code) INTO res FROM Country AS co INNER JOIN CountryLanguage AS cl ON cl.CountryCode = co.Code WHERE lower(cl.LANGUAGE) = ‘English’ AND co.Code = c;
  6.     > RETURN res;
  7.     > end//
  8. mysql> delimiter ;

Now the query can be rewritten as this:

SQL:

  1. mysql> SELECT sql_no_cache sum(ci.Population) FROM City AS ci WHERE speaks_english(CountryCode)> 0;
  2. +——————–+
  3. | sum(ci.Population) |
  4. +——————–+
  5. |          237134840 |
  6. +——————–+
  7. 1 row IN SET (1.00 sec)

If we explain it, we get output similar to the first table shown above, but the further two rows are not shown. The query can’t be optimized to use indexes, and the stored function is opaque to the optimizer. This is why I purposefully wrote the subquery badly in the first query! (If you think of a better way to compare apples and uhm, apples… please comment).

The poorly-optimized-subquery portion of the query essentially happens inside that function now.

And it’s four times slower, consistently, and that’s all I wanted to show here. Thanks for reading.


Entry posted by Baron Schwartz |
7 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com