MySQL query digest with Performance Schema

Data AnalysisQuery analysis is a fantastic path in the pursuit to achieve high performance. It’s also probably the most repeated part of a DBA’s daily adventure. For most of us, the weapon of choice is definitely pt-query-digest, which is one of the best tools for slow query analysis out there.

Why not use pt-query-digest? Well, sometimes getting the slow log can be a challenge, such as with RDS instances or when your database is running as part of a DBaaS, which is a common practice in certain organizations.

In those cases it’s good to have an alternative. And in this case, the chosen one is the Performance Schema. We have already talked about the events_statements_* tables; however, this is the moment for the events_statements_summary_by_digest. In this table each row summarizes events for given schema/digest values (note that before MySQL 5.6.9, there is no SCHEMA_NAME column and grouping is based on DIGEST values only).

In order for MySQL to start to aggregate information in summary tables, you must verify that the consumer statement_digest is enabled.

Now, the most straightforward way to get data is to simply query the table, like this:

round(sum_timer_wait/ 1000000000000, 6),
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC LIMIT 10;

This will show you the picture of volume and frequency of SQL statements in your server. As simple as that. But there are some caveats:

  • Statements are normalized to a digest text. Instead of seeing a query like SELECT age FROM population WHERE id BETWEEN 153 AND 153+69 you will have the fingerprint version: SELECT age FROM population WHERE id BETWEEN ? AND ? + ?
  • The events_statements_summary_by_digest table has a limited maximum number of rows (200 by default, but MySQL 5.6.5 can be modified with the performance_schema_digests_size variable). As a consequence, when the table is full, statement digest values that have no already existing row will be added to a special “catch-all” row with DIGEST = NULL. In plain English: you won’t have meaningful info for those statements.

To solve the first issue, we can use the events_statements_history table to get complete queries for all the digests. I chose not to use events_statements_currents because of the short life the rows have on that table. With history, there are more chances to get more queries in the same amount of time.

Now, when using pt-query-digest, the first step is always to collect a representative amount of data, commonly from the slow log, and then process. With Performance Schema, let’s collect a representative amount of complete queries so we can have examples for every aggregated statement.

To address the second issue, we just need to TRUNCATE the events_statements_summary_by_digest table. This way the summary will start from scratch.

Since Performance Schema is available on all the platforms supported by MySQL, I chose to run the tests on an Amazon RDS MySQL instance. The only thing I didn’t like is that P_S is disabled by default on RDS and to enable it requires an instances reboot. Other than that, everything is the same as in a regular instance.

The steps are:

  1. Enable the events_statements_history consumer
  2. Create a MEMORY table to hold the data
  3. Truncate tables to have a fresh start
  4. Create a MySQL EVENT that will fill the table with data
  5. Once the event has ended, get the query digest.

The table schema is the following:

CREATE TABLE IF NOT EXISTS percona.digest_seen
(schema_name varchar(64) DEFAULT NULL,
digest varchar(32) DEFAULT NULL,
sql_text varchar(1024) DEFAULT NULL,
PRIMARY KEY USING BTREE (schema_name,digest)) engine=memory;

The original SQL_TEXT field from the events_statements_history table is defined as longtext, but unless you are using Percona Server (5.5+), you won’t be able to use longtext on a memory table. This is possible in Percona Server because the Improved Memory Engine permits the use of Blob and Text fields on the Memory Storage Engine. The workaround is to define that field as a varchar 1024. Why 1024? That’s another requirement from the table: The SQL_TEXT is fixed at 1024 chars. It’s only after MySQL 5.7.6 that the maximum number of bytes to display can be modified by changing the performance_schema_max_sql_text_length system variable at server startup.

Also, since we are going to use EVENTS on RDS, the “event_scheduler” variable has to be set to ON. Luckily, it is a dynamic variable so there’s no need to reboot the instance after modifying the Parameter Group. If using a non-RDS, it’s enough to execute “SET GLOBAL event_scheduler = ON;”

Here is the complete list of steps:

UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME = 'events_statements_history';
SET SESSION max_heap_table_size = 1024*1024;
USE percona;
CREATE TABLE IF NOT EXISTS percona.digest_seen (schema_name varchar(64) DEFAULT NULL, digest varchar(32) DEFAULT NULL, sql_text varchar(24) DEFAULT NULL, PRIMARY KEY USING BTREE (schema_name,digest)) engine=memory;
TRUNCATE TABLE percona.digest_seen;
TRUNCATE performance_schema.events_statements_summary_by_digest;
TRUNCATE performance_schema.events_statements_history;
INSERT IGNORE INTO percona.digest_seen SELECT CURRENT_SCHEMA, DIGEST, SQL_TEXT FROM performance_schema.events_statements_history WHERE DIGEST IS NOT NULL GROUP BY current_schema, digest LIMIT 50;

The event is defined to be run immediately, once per second, for 5 minutes. After the event is complete, it will be deleted.

When the event is done, we are in position to get the query digest. Simply execute this query:

ROUND(d.SUM_TIMER_WAIT / 1000000000000, 6) as EXECUTION_TIME,
ROUND(d.AVG_TIMER_WAIT / 1000000000000, 6) as AVERAGE_TIME,
FROM performance_schema.events_statements_summary_by_digest d
LEFT JOIN percona.digest_seen s USING (digest)
GROUP BY s.digest

The order by is similar to the one pt-query-digest does by default, but it could be any one you want.

The output is:

| SCHEMA_NAME | SQL_TEXT                                                                                                                                                                  | EXECUTION_TIME | AVERAGE_TIME | COUNT_STAR |
| percona     | UPDATE population SET age=age+1 WHERE id=148                                                                                                                              |  202304.145758 |     1.949487 |     103773 |
| percona     | SELECT age FROM population WHERE id BETWEEN 153 AND 153+69                                                                                                                |     488.572609 |     0.000176 |    2771352 |
| percona     | SELECT sex,age,estimate2012 FROM population WHERE id BETWEEN 174 AND 174+69 ORDER BY sex                                                                                  |     108.841575 |     0.000236 |     461412 |
| percona     | SELECT census2010 FROM population WHERE id=153                                                                                                                            |      62.742239 |     0.000090 |     693526 |
| percona     | SELECT SUM(estimate2014) FROM population WHERE id BETWEEN 154 AND 154+69                                                                                                  |      44.940020 |     0.000195 |     230810 |
| percona     | SELECT DISTINCT base2010 FROM population WHERE id BETWEEN 154 AND 154+69 ORDER BY base2010                                                                                |      33.909593 |     0.000294 |     115338 |
| percona     | UPDATE population SET estimate2011='52906609184-39278192019-93190587310-78276160274-48170779146-66415569224-40310027367-70054020251-87998206812-01032761541' WHERE id=154 |       8.231353 |     0.000303 |      27210 |
| percona     | COMMIT                                                                                                                                                                    |       2.630153 |     0.002900 |        907 |
| percona     | BEGIN                                                                                                                                                                     |       0.705435 |     0.000031 |      23127 |
|             | SELECT 1                                                                                                                                                                  |       0.422626 |     0.000102 |       4155 |
10 rows in set (0.10 sec)

Finally, you can do some cleanup:

DROP event IF EXISTS getDigest;
DROP TABLE IF EXISTS percona.digest_seen;
SET SESSION max_heap_table_size = @@max_heap_table_size;
UPDATE performance_schema.setup_consumers SET ENABLED = 'NO' WHERE NAME = 'events_statements_history';

Summary: Performance Schema is doing the query digest already for you. It is just a matter of how to access the data in a way that suits your requirements.

The post MySQL query digest with Performance Schema appeared first on MySQL Performance Blog.


Capture database traffic using the Performance Schema

Capturing data is a critical part of performing a query analysis, or even just to have an idea of what’s going on inside the database.

There are several known ways to achieve this. For example:

  • Enable the General Log
  • Use the Slow Log with long_query_time = 0
  • Capture packets that go to MySQL from the network stream using TCPDUMP 
  • Use the pt-query-digest with the –processlist parameter

However, these methods can add significant overhead and might even have negative performance consequences, such as:

Now, sometimes you just need to sneak a peek at the traffic. Nothing fancy. In that case, probably the faster and easiest way to gather some traffic data is to use pt-query-digest with the –processlist. It doesn’t require any change in the server’s configuration nor critical handling of files. It doesn’t even require access to the server, just a user with the proper permissions to run “show full processlist”. But, and this is a significantly big “but,” you have to take into account that polling the SHOW PROCESSLIST command misses quite a number of queries and gives very poor timing resolution, among other things (like the processlist Mutex).

What’s the alternative? Use the Performance Schema. Specifically: The events_statements_* and threads tables.

First, we have to make sure that we have the correspondent consumers enabled:

mysql> select * from setup_consumers where name like 'events%statement%' and enabled = 'yes';
| NAME                           | ENABLED |
| events_statements_current      | YES     |
| events_statements_history      | YES     |
| events_statements_history_long | YES     |
3 rows in set (0.00 sec)

Additionally, for statistics to be collected for statements, it is not sufficient to enable only the final statement/sql/* instruments used for individual statement types. The abstract statement/abstract/* instruments must be enabled as well. This should not normally be an issue because all statement instruments are enabled by default.

If you can’t see the event_statements_* consumers on your setup_consumers tables, you’re probably running a MySQL version prior to 5.6.3. Before that version, the events_statements_* tables didn’t exists. MySQL 5.6 might not be more widely used, as was already pointed out in this same blog.

Before continuing, it’s important to note that the most important condition at the moment of capture data is that:

The statements must have ended.

If the statement is still being executed, it can’t be part of the collected traffic. For the ones out there that want to know what’s running inside MySQL, there’s already a detailed non-blocking processlist view to replace [INFORMATION_SCHEMA. | SHOW FULL] PROCESSLIST available with Sys Schema (that will come as default in MySQL 5.7).

Our options to capture data are: get it from one of the three available tables: events_statements_current, events_statements_history or events_statements_history_long.

First option: use the events_statements_current table, which contains current statement events. Since we only want to get statements that have ended, the query will need to add the condition END_EVENT_ID IS NOT NULL to the query. This column is set to NULL when the event starts and updated to the thread current event number when the event ends, but when testing, there were too many missing queries. This is probably because between iterations, the associated threads were removed from the threads table or simply because the time between END_EVENT_ID being updated and the row being removed from the table is too short. This option is discarded.

Second option: The events_statements_history table contains the most recent statement events per thread and since statement events are not added to the events_statements_history table until they have ended, using this table will do the trick without additional conditions in order to know if the event is still running or not. Also, as new events are added, older events are discarded if the table is full.

That means that this table size is fixed. You can change the table size by modifying the variable performance_schema_events_statements_history_size. In the server version I used (5.6.25-73.1-log Percona Server (GPL), Release 73.1, Revision 07b797f) the table size is, by default, defined as autosized (-1) and can have 10 rows per thread. For example: if you are running 5 threads, the table will have 50 rows.

Since it is a fixed size, chances are that some events might be lost between iterations.

Third option: The events_statements_history_long table, which is kind of an extended version of events_statements_history table. Depending on the MySQL version, by default it can hold up to 10000 rows or be autosized (also modifiable with the variable performance_schema_events_statements_history_long_size)

One major -and not cool at all- drawback for this table is that “When a thread ends, its rows are removed from the table”. So it is not history-history data. It will go as far as the oldest thread, with the older event still alive.

The logical option to choose would be the third one: use the events_statements_history_long table. I’ve created a small script (available here) to collect infinite iterations on all the events per thread between a range of event_id’s. The idea of the range is to avoid capturing the same event more than once. Turns out that the execute a query against this table is pretty slow, something between 0.53 seconds and 1.96 seconds. It can behave in a quite invasive way.

Which leave us with the second option: The events_statements_history table.

Since the goal is to capture data in a slow log format manner,  additional information needs to be obtained from the threads table, which has a row for each server thread. The most important thing to remember: access to threads does not require a mutex and has minimal impact on server performance.

Combined, these two tables give us enough information to simulate a very comprehensive slow log format. We just need the proper query:

'','# Time: ', date_format(CURDATE(),'%y%m%d'),' ',TIME_FORMAT(NOW(6),'%H:%i:%s.%f'),'\n'
,'# Schema: ',CURRENT_SCHEMA,'  Last_errno: ',MYSQL_ERRNO,'  ','\n'
,'# Query_time: ',ROUND(s.TIMER_WAIT / 1000000000000, 6),' Lock_time: ',ROUND(s.LOCK_TIME / 1000000000000, 6),'  Rows_sent: ',ROWS_SENT,'  Rows_examined: ',ROWS_EXAMINED,'  Rows_affected: ',ROWS_AFFECTED,'\n'
,'# Tmp_tables: ',CREATED_TMP_TABLES,'  Tmp_disk_tables: ',CREATED_TMP_DISK_TABLES,'  ','\n'
,'# Full_scan: ',IF(SELECT_SCAN=0,'No','Yes'),'  Full_join: ',IF(SELECT_FULL_JOIN=0,'No','Yes'),'  Tmp_table: ',IF(CREATED_TMP_TABLES=0,'No','Yes'),'  Tmp_table_on_disk: ',IF(CREATED_TMP_DISK_TABLES=0,'No','Yes'),'\n'
FROM performance_schema.events_statements_history s
JOIN performance_schema.threads t using(thread_id)
AND t.PROCESSLIST_ID != connection_id()

The idea of this query is to get a Slow Log format as close as possible to the one that can be obtained by using all the options from the log_slow_filter variable.

The other conditions are:

  • t.TYPE = ‘FOREGROUND’: The threads table provides information about background threads, which we don’t intend to analyze. User connection threads are foreground threads.
  • t.PROCESSLIST_INFO IS NOT NULL: This field is NULL if the thread is not executing any statement.
  • t.PROCESSLIST_ID != connection_id(): Ignore me (this query).

The output of the query will look like a proper Slow Log output:

# Time: 150928 18:13:59.364770
# User@Host: root[root] @ localhost []  Id: 58918
# Schema: test  Last_errno: 0
# Query_time: 0.000112 Lock_time: 0.000031  Rows_sent: 1  Rows_examined: 1  Rows_affected: 0
# Tmp_tables: 0  Tmp_disk_tables: 0
# Full_scan: No  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
INSERT INTO sbtest1 (id, k, c, pad) VALUES (498331, 500002, '94319277193-32425777628-16873832222-63349719430-81491567472-95609279824-62816435936-35587466264-28928538387-05758919296'
, '21087155048-49626128242-69710162312-37985583633-69136889432');

And this file can be used with pt-query-digest to aggregate similar queries, just as it was a regular slow log output. I ran a small test which consists of:

  • Generate traffic using sysbench. This is the sysbench command used:
    sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=localhost --mysql-port=3306 --mysql-user=root --mysql-password= --mysql-db=test --mysql-table-engine=innodb --oltp-test-mode=complex --oltp-read-only=off --oltp-reconnect=on --oltp-table-size=1000000 --max-requests=100000000 --num-threads=4 --report-interval=1 --report-checkpoints=10 --tx-rate=0 run
  • Capture the data using slow log + long_query_time = 0
  • Capture data using pt-query-digest –processlist
  • Capture data from Performance Schema
  • Run pt-query-digest on the 3 files

The results were:

– Slow Log:

# Profile
# Rank Query ID           Response time Calls  R/Call V/M   Item
# ==== ================== ============= ====== ====== ===== ==============
#    1 0x813031B8BBC3B329 47.7743 18.4%  15319 0.0031  0.01 COMMIT
#    2 0x737F39F04B198EF6 39.4276 15.2%  15320 0.0026  0.00 SELECT sbtest?
#    3 0x558CAEF5F387E929 37.8536 14.6% 153220 0.0002  0.00 SELECT sbtest?
#    4 0x84D1DEE77FA8D4C3 30.1610 11.6%  15321 0.0020  0.00 SELECT sbtest?
#    5 0x6EEB1BFDCCF4EBCD 24.4468  9.4%  15322 0.0016  0.00 SELECT sbtest?
#    6 0x3821AE1F716D5205 22.4813  8.7%  15322 0.0015  0.00 SELECT sbtest?
#    7 0x9270EE4497475EB8 18.9363  7.3%   3021 0.0063  0.00 SELECT performance_schema.events_statements_history performance_schema.threads
#    8 0xD30AD7E3079ABCE7 12.8770  5.0%  15320 0.0008  0.01 UPDATE sbtest?
#    9 0xE96B374065B13356  8.4475  3.3%  15319 0.0006  0.00 UPDATE sbtest?
#   10 0xEAB8A8A8BEEFF705  8.0984  3.1%  15319 0.0005  0.00 DELETE sbtest?
# MISC 0xMISC              8.5077  3.3%  42229 0.0002   0.0 <10 ITEMS>

– pt-query-digest –processlist

# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ===============
#    1 0x737F39F04B198EF6 53.4780 16.7%  3676 0.0145  0.20 SELECT sbtest?
#    2 0x813031B8BBC3B329 50.7843 15.9%  3577 0.0142  0.10 COMMIT
#    3 0x558CAEF5F387E929 50.7241 15.8%  4024 0.0126  0.08 SELECT sbtest?
#    4 0x84D1DEE77FA8D4C3 35.8314 11.2%  2753 0.0130  0.11 SELECT sbtest?
#    5 0x6EEB1BFDCCF4EBCD 32.3391 10.1%  2196 0.0147  0.21 SELECT sbtest?
#    6 0x3821AE1F716D5205 28.1566  8.8%  2013 0.0140  0.17 SELECT sbtest?
#    7 0x9270EE4497475EB8 22.1537  6.9%  1381 0.0160  0.22 SELECT performance_schema.events_statements_history performance_schema.threads
#    8 0xD30AD7E3079ABCE7 15.4540  4.8%  1303 0.0119  0.00 UPDATE sbtest?
#    9 0xE96B374065B13356 11.3250  3.5%   885 0.0128  0.09 UPDATE sbtest?
#   10 0xEAB8A8A8BEEFF705 10.2592  3.2%   792 0.0130  0.09 DELETE sbtest?
# MISC 0xMISC              9.7642  3.0%   821 0.0119   0.0 <3 ITEMS>

– Performance Schema

# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ==============
#    1 0x813031B8BBC3B329 14.6698 24.8% 12380 0.0012  0.00 COMMIT
#    2 0x558CAEF5F387E929 12.0447 20.4% 10280 0.0012  0.00 SELECT sbtest?
#    3 0x737F39F04B198EF6  7.9803 13.5% 10280 0.0008  0.00 SELECT sbtest?
#    4 0x3821AE1F716D5205  4.6945  7.9%  5520 0.0009  0.00 SELECT sbtest?
#    5 0x84D1DEE77FA8D4C3  4.6906  7.9%  7350 0.0006  0.00 SELECT sbtest?
#    6 0x6EEB1BFDCCF4EBCD  4.1018  6.9%  6310 0.0007  0.00 SELECT sbtest?
#    7 0xD30AD7E3079ABCE7  3.7983  6.4%  3710 0.0010  0.00 UPDATE sbtest?
#    8 0xE96B374065B13356  2.3878  4.0%  2460 0.0010  0.00 UPDATE sbtest?
#    9 0xEAB8A8A8BEEFF705  2.2231  3.8%  2220 0.0010  0.00 DELETE sbtest?
# MISC 0xMISC              2.4961  4.2%  2460 0.0010   0.0 <7 ITEMS>

The P_S data is closer to the Slow Log one than the captured with regular SHOW FULL PROCESSLIST, but it is still far from being accurate. Remember that this is an alternative for a fast and easy way to capture traffic without too much trouble, so that’s a trade-off that you might have to accept.

Summary: Capture traffic always comes with a tradeoff, but if you’re willing to sacrifice accuracy it can be done with minimal impact on server performance, using the Performance Schema. Because P_S is enabled by default since MySQL 5.6.6 you might already be living with the overhead (if using 5.6). If you are one of the lucky ones that have P_S on production, don’t be afraid to use it. There’s a lot of data already in there.

The post Capture database traffic using the Performance Schema appeared first on MySQL Performance Blog.


Advanced MySQL Query Tuning (Aug. 6) and MySQL 5.6 Performance Schema (Aug. 13) webinars

I will be presenting two webinars in August:

This Wednesday’s webinar on advanced MySQL query tuning will be focused on tuning the “usual suspects”: queries with “Group By”, “Order By” and subqueries; those query types are usually perform bad in MySQL and add an additional load as MySQL may need to create a temporary table(s) or perform a filesort. New this year: I will talk more about new MySQL 5.6 features that can increase MySQL query performance for subqueries and “order by” queries.Percona MySQL webinars

Next week’s performance schema webinar will be focused on practical use of Performance Schema in MySQL 5.6. I will show real-world examples of MySQL monitoring and troubleshooting using MySQL 5.6 Performance Schema. Performance Schema can give you a lots of valuable information about MySQL internal operations. For example, if using a multi-tenant shared MySQL installation, Performance Schema can give you a lots of extensive information about your MySQL load: breakdown by user, queries, tables and schema (see more examples in my previous blog post: Using MySQL 5.6 Performance Schema in multi-tenant environments).



The post Advanced MySQL Query Tuning (Aug. 6) and MySQL 5.6 Performance Schema (Aug. 13) webinars appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com