Apr
27
2015
--

Indexing 101: Optimizing MySQL queries on a single table

I have recently seen several cases when performance for MySQL queries on a single table was terrible. The reason was simple: the wrong indexes were added and so the execution plan was poor. Here are guidelines to help you optimize various kinds of single-table queries.

Disclaimer: I will be presenting general guidelines and I do not intend to cover all scenarios. I am pretty confident that you can find examples where what I am writing does not work, but I am also confident that it will help you most of the time. Also I will not discuss features you can find in MySQL 5.6+ like Index Condition Pushdown to keep things simple. Be aware that such features can actually make a significant difference in query response time (for good or for bad).

What an index can do for you

An index can perform up to 3 actions: filter, sort/group and cover. While the first 2 actions are self-explanatory, not everyone may know what a ‘covering index’ is. Actually that’s very easy. The general workflow for a basic query is:
1. Use an index to find matching records and get the pointers to data.
2. Use the pointers to the corresponding data.
3. Return records

When a covering index can be used, the index already covers all fields requested in the query, so step #2 can be skipped and the workflow is now:
1. Use an index to find matching records
2. Return records

In many cases, indexes are small and can fit in memory while data is large and does not fit in memory: by using a covering index, you can avoid lots of disk operations and performance can be order of magnitudes better.
Let’s now look at different common scenarios.

Single equality

This is the most basic scenario:

SELECT * FROM t WHERE c = 100

The idea is of course to add an index on (c). However note that if the criteria is not selective enough, the optimizer may choose to perform a full table scan that will certainly be more efficient.
Also note that a frequent variation of this query is when you only select a small subset of fields instead of all fields:

SELECT c1, c2 FROM t WHERE c = 100

Here it could make sense to create an index on (c, c1, c2) because it will be a covering index. Do not create an index on (c1, c2, c)! It will still be covering but it will not be usable for filtering (remember that you can only use a left-most prefix of an index to filter).

Multiple equalities

SELECT * FROM t WHERE c = 100 and d = 'xyz'

It is also very easy to optimize: just add an index on (c, d) or (d, c).

The main mistake here is to add 2 indexes: one on (c) and one on (d). Granted, MySQL is able to use both indexes with the index_merge algorithm, but it is almost always a very bad option.

Equality and inequality

SELECT * FROM t WHERE c > 100 and d = 'xyz'

Here we must be careful because as long as we are using a column with an inequality, this will prevent us from using further columns in the index.

Therefore if we create an index on (d, c), we will be able to filter both on c and d, this is good.
But if we create an index on (c, d), we will only be filtering on c, which is less efficient.

So unlike the situation when you have equalities, order of columns matters when inequalities are used.

Multiple inequalities

SELECT * FROM t WHERE c > 100 and b < 10 and d = 'xyz'

As we have 2 inequalities, we already know that we will not be able to filter on both conditions (*). So we have to make a decision: will we filter on (d, b) or on (d, c)?

It is not possible to tell which option is better without looking at the data: simply choose the column where the inequality is the most selective. The main point is that you must put the column(s) with an equality first.

(*) Actually there is a way to ‘filter’ on both inequalites: partition on b and add an index on (d, c) or partition on c and add an index on (d, b). The details are out of the scope of this post but it might be an option for some situations.

Equalities and sort

SELECT * FROM t WHERE c = 100 and d = 'xyz' ORDER BY b

As mentioned in the first paragraph, an index can filter and sort so this query is easy to optimize. However like for inequalities, we must carefully choose the order of the columns in the index: the rule is that we will filter first, and then sort.

With that in mind, it is easy to know that (c, d, b) or (d, c, b) will be good indexes while (b, c, d) or (b, d, c) are not as good (they will sort but not filter).

And if we have:

SELECT c1, c2 FROM t WHERE c = 100 and d = 'xyz' ORDER BY b

We can create a super efficient index that will filter, sort and be covering: (c, d, b, c1, c2).

Inequality and sort

We have 2 main variations here. The first one is:

SELECT * FROM t WHERE c > 100 and d = 'xyz' ORDER BY b

Two options look reasonable in this case:
1. filter on d and sort by b.
2. filter on d and c.

Which strategy is more efficient? It will depend on your data, so you will have to experiment.

The second variation is:

SELECT * FROM t WHERE c > 100 ORDER BY b

This time we have no equality so we have to choose between filtering and sorting. Most likely you will choose filtering.

Conclusion

Not all cases have been covered in this post but you can already see that in some cases you will create poor MySQL indexes if you are not careful. In a future post, I will present a case that can look confusing at first sight but which is easy to understand if you already know everything mentioned here.

The post Indexing 101: Optimizing MySQL queries on a single table appeared first on MySQL Performance Blog.

Apr
16
2015
--

Profiling MySQL queries from Performance Schema

When optimizing queries and investigating performance issues, MySQL comes with built in support for profiling queries aka

SET profiling = 1;

 . This is already awesome and simple to use, but why the PERFORMANCE_SCHEMA alternative?

Because profiling will be removed soon (already deprecated on MySQL 5.6 ad 5.7); the built-in profiling capability can only be enabled per session. This means that you cannot capture profiling information for queries running from other connections. If you are using Percona Server, the profiling option for log_slow_verbosity is a nice alternative, unfortunately, not everyone is using Percona Server.

Now, for a quick demo: I execute a simple query and profile it below. Note that all of these commands are executed from a single session to my test instance.

mysql> SHOW PROFILES;
+----------+------------+----------------------------------------+
| Query_ID | Duration   | Query                                  |
+----------+------------+----------------------------------------+
|        1 | 0.00011150 | SELECT * FROM sysbench.sbtest1 LIMIT 1 |
+----------+------------+----------------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> SHOW PROFILE SOURCE FOR QUERY 1;
+----------------------+----------+-----------------------+------------------+-------------+
| Status               | Duration | Source_function       | Source_file      | Source_line |
+----------------------+----------+-----------------------+------------------+-------------+
| starting             | 0.000017 | NULL                  | NULL             |        NULL |
| checking permissions | 0.000003 | check_access          | sql_parse.cc     |        5797 |
| Opening tables       | 0.000021 | open_tables           | sql_base.cc      |        5156 |
| init                 | 0.000009 | mysql_prepare_select  | sql_select.cc    |        1050 |
| System lock          | 0.000005 | mysql_lock_tables     | lock.cc          |         306 |
| optimizing           | 0.000002 | optimize              | sql_optimizer.cc |         138 |
| statistics           | 0.000006 | optimize              | sql_optimizer.cc |         381 |
| preparing            | 0.000005 | optimize              | sql_optimizer.cc |         504 |
| executing            | 0.000001 | exec                  | sql_executor.cc  |         110 |
| Sending data         | 0.000025 | exec                  | sql_executor.cc  |         190 |
| end                  | 0.000002 | mysql_execute_select  | sql_select.cc    |        1105 |
| query end            | 0.000003 | mysql_execute_command | sql_parse.cc     |        5465 |
| closing tables       | 0.000004 | mysql_execute_command | sql_parse.cc     |        5544 |
| freeing items        | 0.000005 | mysql_parse           | sql_parse.cc     |        6969 |
| cleaning up          | 0.000006 | dispatch_command      | sql_parse.cc     |        1874 |
+----------------------+----------+-----------------------+------------------+-------------+
15 rows in set, 1 warning (0.00 sec)

To demonstrate how we can achieve the same with Performance Schema, we first identify our current connection id. In the real world, you might want to get the connection/processlist id of the thread you want to watch i.e. from

SHOW PROCESSLIST

 .

mysql> SELECT THREAD_ID INTO @my_thread_id
    -> FROM threads WHERE PROCESSLIST_ID = CONNECTION_ID();
Query OK, 1 row affected (0.00 sec)

Next, we identify the bounding EVENT_IDs for the statement stages. We will look for the statement we wanted to profile using the query below from the

events_statements_history_long

table. Your LIMIT clause may vary depending on how much queries the server might be getting.

mysql> SELECT THREAD_ID, EVENT_ID, END_EVENT_ID, SQL_TEXT, NESTING_EVENT_ID
    -> FROM events_statements_history_long
    -> WHERE THREAD_ID = @my_thread_id
    ->   AND EVENT_NAME = 'statement/sql/select'
    -> ORDER BY EVENT_ID DESC LIMIT 3 G
*************************** 1. row ***************************
       THREAD_ID: 13848
        EVENT_ID: 419
    END_EVENT_ID: 434
        SQL_TEXT: SELECT THREAD_ID INTO @my_thread_id
FROM threads WHERE PROCESSLIST_ID = CONNECTION_ID()
NESTING_EVENT_ID: NULL
*************************** 2. row ***************************
       THREAD_ID: 13848
        EVENT_ID: 374
    END_EVENT_ID: 392
        SQL_TEXT: SELECT * FROM sysbench.sbtest1 LIMIT 1
NESTING_EVENT_ID: NULL
*************************** 3. row ***************************
       THREAD_ID: 13848
        EVENT_ID: 353
    END_EVENT_ID: 364
        SQL_TEXT: select @@version_comment limit 1
NESTING_EVENT_ID: NULL
3 rows in set (0.02 sec)

From the results above, we are mostly interested with the EVENT_ID and END_EVENT_ID values from the second row, this will give us the stage events of this particular query from the

events_stages_history_long

table.

mysql> SELECT EVENT_NAME, SOURCE, (TIMER_END-TIMER_START)/1000000000 as 'DURATION (ms)'
    -> FROM events_stages_history_long
    -> WHERE THREAD_ID = @my_thread_id AND EVENT_ID BETWEEN 374 AND 392;
+--------------------------------+----------------------+---------------+
| EVENT_NAME                     | SOURCE               | DURATION (ms) |
+--------------------------------+----------------------+---------------+
| stage/sql/init                 | mysqld.cc:998        |        0.0214 |
| stage/sql/checking permissions | sql_parse.cc:5797    |        0.0023 |
| stage/sql/Opening tables       | sql_base.cc:5156     |        0.0205 |
| stage/sql/init                 | sql_select.cc:1050   |        0.0089 |
| stage/sql/System lock          | lock.cc:306          |        0.0047 |
| stage/sql/optimizing           | sql_optimizer.cc:138 |        0.0016 |
| stage/sql/statistics           | sql_optimizer.cc:381 |        0.0058 |
| stage/sql/preparing            | sql_optimizer.cc:504 |        0.0044 |
| stage/sql/executing            | sql_executor.cc:110  |        0.0008 |
| stage/sql/Sending data         | sql_executor.cc:190  |        0.0251 |
| stage/sql/end                  | sql_select.cc:1105   |        0.0017 |
| stage/sql/query end            | sql_parse.cc:5465    |        0.0031 |
| stage/sql/closing tables       | sql_parse.cc:5544    |        0.0037 |
| stage/sql/freeing items        | sql_parse.cc:6969    |        0.0056 |
| stage/sql/cleaning up          | sql_parse.cc:1874    |        0.0006 |
+--------------------------------+----------------------+---------------+
15 rows in set (0.01 sec)

As you can see the results are pretty close, not exactly the same but close. SHOW PROFILE shows Duration in seconds, while the results above is in milliseconds.

Some limitations to this method though:

  • As we’ve seen it takes a few hoops to dish out the information we need. Because we have to identify the statement we have to profile manually, this procedure may not be easy to port into tools like the sys schema or pstop.
  • Only possible if Performance Schema is enabled (by default its enabled since MySQL 5.6.6, yay!)
  • Does not cover all metrics compared to the native profiling i.e. CONTEXT SWITCHES, BLOCK IO, SWAPS
  • Depending on how busy the server you are running the tests, the sizes of the history tables may be too small, as such you either have to increase or loose the history to early i.e.
    performance_schema_events_stages_history_long_size

    variable. Using ps_history might help in this case though with a little modification to the queries.

  • The resulting Duration per event may vary, I would think this may be due to the additional as described on performance_timers table. In any case we hope to get this cleared up as result when this bug is fixed.

The post Profiling MySQL queries from Performance Schema appeared first on MySQL Performance Blog.

Apr
16
2015
--

Profiling MySQL queries from Performance Schema

When optimizing queries and investigating performance issues, MySQL comes with built in support for profiling queries aka

SET profiling = 1;

 . This is already awesome and simple to use, but why the PERFORMANCE_SCHEMA alternative?

Because profiling will be removed soon (already deprecated on MySQL 5.6 ad 5.7); the built-in profiling capability can only be enabled per session. This means that you cannot capture profiling information for queries running from other connections. If you are using Percona Server, the profiling option for log_slow_verbosity is a nice alternative, unfortunately, not everyone is using Percona Server.

Now, for a quick demo: I execute a simple query and profile it below. Note that all of these commands are executed from a single session to my test instance.

mysql> SHOW PROFILES;
+----------+------------+----------------------------------------+
| Query_ID | Duration   | Query                                  |
+----------+------------+----------------------------------------+
|        1 | 0.00011150 | SELECT * FROM sysbench.sbtest1 LIMIT 1 |
+----------+------------+----------------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> SHOW PROFILE SOURCE FOR QUERY 1;
+----------------------+----------+-----------------------+------------------+-------------+
| Status               | Duration | Source_function       | Source_file      | Source_line |
+----------------------+----------+-----------------------+------------------+-------------+
| starting             | 0.000017 | NULL                  | NULL             |        NULL |
| checking permissions | 0.000003 | check_access          | sql_parse.cc     |        5797 |
| Opening tables       | 0.000021 | open_tables           | sql_base.cc      |        5156 |
| init                 | 0.000009 | mysql_prepare_select  | sql_select.cc    |        1050 |
| System lock          | 0.000005 | mysql_lock_tables     | lock.cc          |         306 |
| optimizing           | 0.000002 | optimize              | sql_optimizer.cc |         138 |
| statistics           | 0.000006 | optimize              | sql_optimizer.cc |         381 |
| preparing            | 0.000005 | optimize              | sql_optimizer.cc |         504 |
| executing            | 0.000001 | exec                  | sql_executor.cc  |         110 |
| Sending data         | 0.000025 | exec                  | sql_executor.cc  |         190 |
| end                  | 0.000002 | mysql_execute_select  | sql_select.cc    |        1105 |
| query end            | 0.000003 | mysql_execute_command | sql_parse.cc     |        5465 |
| closing tables       | 0.000004 | mysql_execute_command | sql_parse.cc     |        5544 |
| freeing items        | 0.000005 | mysql_parse           | sql_parse.cc     |        6969 |
| cleaning up          | 0.000006 | dispatch_command      | sql_parse.cc     |        1874 |
+----------------------+----------+-----------------------+------------------+-------------+
15 rows in set, 1 warning (0.00 sec)

To demonstrate how we can achieve the same with Performance Schema, we first identify our current connection id. In the real world, you might want to get the connection/processlist id of the thread you want to watch i.e. from

SHOW PROCESSLIST

 .

mysql> SELECT THREAD_ID INTO @my_thread_id
    -> FROM threads WHERE PROCESSLIST_ID = CONNECTION_ID();
Query OK, 1 row affected (0.00 sec)

Next, we identify the bounding EVENT_IDs for the statement stages. We will look for the statement we wanted to profile using the query below from the

events_statements_history_long

table. Your LIMIT clause may vary depending on how much queries the server might be getting.

mysql> SELECT THREAD_ID, EVENT_ID, END_EVENT_ID, SQL_TEXT, NESTING_EVENT_ID
    -> FROM events_statements_history_long
    -> WHERE THREAD_ID = @my_thread_id
    ->   AND EVENT_NAME = 'statement/sql/select'
    -> ORDER BY EVENT_ID DESC LIMIT 3 G
*************************** 1. row ***************************
       THREAD_ID: 13848
        EVENT_ID: 419
    END_EVENT_ID: 434
        SQL_TEXT: SELECT THREAD_ID INTO @my_thread_id
FROM threads WHERE PROCESSLIST_ID = CONNECTION_ID()
NESTING_EVENT_ID: NULL
*************************** 2. row ***************************
       THREAD_ID: 13848
        EVENT_ID: 374
    END_EVENT_ID: 392
        SQL_TEXT: SELECT * FROM sysbench.sbtest1 LIMIT 1
NESTING_EVENT_ID: NULL
*************************** 3. row ***************************
       THREAD_ID: 13848
        EVENT_ID: 353
    END_EVENT_ID: 364
        SQL_TEXT: select @@version_comment limit 1
NESTING_EVENT_ID: NULL
3 rows in set (0.02 sec)

From the results above, we are mostly interested with the EVENT_ID and END_EVENT_ID values from the second row, this will give us the stage events of this particular query from the

events_stages_history_long

table.

mysql> SELECT EVENT_NAME, SOURCE, (TIMER_END-TIMER_START)/1000000000 as 'DURATION (ms)'
    -> FROM events_stages_history_long
    -> WHERE THREAD_ID = @my_thread_id AND EVENT_ID BETWEEN 374 AND 392;
+--------------------------------+----------------------+---------------+
| EVENT_NAME                     | SOURCE               | DURATION (ms) |
+--------------------------------+----------------------+---------------+
| stage/sql/init                 | mysqld.cc:998        |        0.0214 |
| stage/sql/checking permissions | sql_parse.cc:5797    |        0.0023 |
| stage/sql/Opening tables       | sql_base.cc:5156     |        0.0205 |
| stage/sql/init                 | sql_select.cc:1050   |        0.0089 |
| stage/sql/System lock          | lock.cc:306          |        0.0047 |
| stage/sql/optimizing           | sql_optimizer.cc:138 |        0.0016 |
| stage/sql/statistics           | sql_optimizer.cc:381 |        0.0058 |
| stage/sql/preparing            | sql_optimizer.cc:504 |        0.0044 |
| stage/sql/executing            | sql_executor.cc:110  |        0.0008 |
| stage/sql/Sending data         | sql_executor.cc:190  |        0.0251 |
| stage/sql/end                  | sql_select.cc:1105   |        0.0017 |
| stage/sql/query end            | sql_parse.cc:5465    |        0.0031 |
| stage/sql/closing tables       | sql_parse.cc:5544    |        0.0037 |
| stage/sql/freeing items        | sql_parse.cc:6969    |        0.0056 |
| stage/sql/cleaning up          | sql_parse.cc:1874    |        0.0006 |
+--------------------------------+----------------------+---------------+
15 rows in set (0.01 sec)

As you can see the results are pretty close, not exactly the same but close. SHOW PROFILE shows Duration in seconds, while the results above is in milliseconds.

Some limitations to this method though:

  • As we’ve seen it takes a few hoops to dish out the information we need. Because we have to identify the statement we have to profile manually, this procedure may not be easy to port into tools like the sys schema or pstop.
  • Only possible if Performance Schema is enabled (by default its enabled since MySQL 5.6.6, yay!)
  • Does not cover all metrics compared to the native profiling i.e. CONTEXT SWITCHES, BLOCK IO, SWAPS
  • Depending on how busy the server you are running the tests, the sizes of the history tables may be too small, as such you either have to increase or loose the history to early i.e.
    performance_schema_events_stages_history_long_size

    variable. Using ps_history might help in this case though with a little modification to the queries.

  • The resulting Duration per event may vary, I would think this may be due to the additional as described on performance_timers table. In any case we hope to get this cleared up as result when this bug is fixed.

The post Profiling MySQL queries from Performance Schema appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com