Oct
15
2018
--

Identifying High Load Spots in MySQL Using Slow Query Log and pt-query-digest

pt-query-digest MySQL slow queries

pt-query-digest MySQL slow queriespt-query-digest is one of the most commonly used tool when it comes to query auditing in MySQL®. By default, pt-query-digest reports the top ten queries consuming the most amount of time inside MySQL. A query that takes more time than the set threshold for completion is considered slow but it’s not always true that tuning such queries makes them faster. Sometimes, when resources on server are busy, it will impact every other operation on the server, and so will impact queries too. In such cases, you will see the proportion of slow queries goes up. That can also include queries that work fine in general.

This article explains a small trick to identify such spots using pt-query-digest and the slow query log. pt-query-digest is a component of Percona Toolkit, open source software that is free to download and use.

Some sample data

Let’s have a look at sample data in Percona Server 5.7. Slow query log is configured to capture queries longer than ten seconds with no limit on rate of logging, which is generally considered to throttle the IO that comes while writing slow queries to the log file.

mysql> show variables like 'log_slow_rate%' ;
+---------------------+---------+
| Variable_name       | Value    |
+---------------------+---------+
| log_slow_rate_limit | 1       |  --> Log all queries
| log_slow_rate_type  | session |
+---------------------+---------+
2 rows in set (0.00 sec)
mysql> show variables like 'long_query_time' ;
+-----------------+-----------+
| Variable_name   | Value     |
+-----------------+-----------+
| long_query_time | 10.000000 |  --> 10 seconds
+-----------------+-----------+
1 row in set (0.01 sec)

When I run pt-query-digest, I see in the summary report that 80% of the queries have come from just three query patterns.

# Profile
# Rank Query ID                      Response time    Calls R/Call   V/M
# ==== ============================= ================ ===== ======== =====
#    1 0x7B92A64478A4499516F46891... 13446.3083 56.1%   102 131.8266  3.83 SELECT performance_schema.events_statements_history
#    2 0x752E6264A9E73B741D3DC04F...  4185.0857 17.5%    30 139.5029  0.00 SELECT table1
#    3 0xAFB5110D2C576F3700EE3F7B...  1688.7549  7.0%    13 129.9042  8.20 SELECT table2
#    4 0x6CE1C4E763245AF56911E983...  1401.7309  5.8%    12 116.8109 13.45 SELECT table4
#    5 0x85325FDF75CD6F1C91DFBB85...   989.5446  4.1%    15  65.9696 55.42 SELECT tbl1 tbl2 tbl3 tbl4
#    6 0xB30E9CB844F2F14648B182D0...   420.2127  1.8%     4 105.0532 12.91 SELECT tbl5
#    7 0x7F7C6EE1D23493B5D6234382...   382.1407  1.6%    12  31.8451 70.36 INSERT UPDATE tbl6
#    8 0xBC1EE70ABAE1D17CD8F177D7...   320.5010  1.3%     6  53.4168 67.01 REPLACE tbl7
#   10 0xA2A385D3A76D492144DD219B...   183.9891  0.8%    18  10.2216  0.00 UPDATE tbl8
#      MISC 0xMISC                     948.6902  4.0%    14  67.7636   0.0 <10 ITEMS>

Query #1 is generated by the qan-agent from PMM and runs approximately once a minute. These results will be handed over to PMM Server. Similarly queries #2 & #3 are pretty simple. I mean, they scan just one row and will return either zero or one rows. They also use indexing, which makes me think that this is not because of something just with in MySQL. I wanted to know if I could find any common aspect of all these occurrences.

Let’s take a closer look at the queries recorded in slow query log.

# grep -B3 DIGEST mysql-slow_Oct2nd_4th.log
....
....
# User@Host: ztrend[ztrend] @ localhost []  Id: 6431601021
# Query_time: 139.279651  Lock_time: 64.502959 Rows_sent: 0  Rows_examined: 0
SET timestamp=1538524947;
SELECT DIGEST, CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history;
# User@Host: ztrend[ztrend] @ localhost []  Id: 6431601029
# Query_time: 139.282594  Lock_time: 83.140413 Rows_sent: 0  Rows_examined: 0
SET timestamp=1538524947;
SELECT DIGEST, CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history;
# User@Host: ztrend[ztrend] @ localhost []  Id: 6431601031
# Query_time: 139.314228  Lock_time: 96.679563 Rows_sent: 0  Rows_examined: 0
SET timestamp=1538524947;
SELECT DIGEST, CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history;
....
....

Now you can see two things.

  • All of them have same Unix timestamp
  • All of them were spending more than 70% of their execution time waiting for some lock.

Analyzing the data from pt-query-digest

Now I want to check if I can group the count of queries based on their time of execution. If there are multiple queries at a given time captured into the slow query log, time will be printed for the first query but not all. Fortunately, in this case I can rely on the Unix timestamp to compute the counts. The timestamp is gets captured for every query. Luckily, without a long struggle, a combination of grep and awk utilities have displayed what I wanted to display.

# grep -A1 Query_time mysql-slow_Oct2nd_4th.log | grep SET | awk -F "=" '{ print $2 }' | uniq -c
2   1538450797;
1   1538524822;
3   1538524846;
7   1538524857;
167 1538524947;   ---> 72% of queries have happened at this timestamp.
1   1538551813;
3   1538551815;
6   1538602215;
1   1538617599;
33  1538631015;
1   1538631016;
1   1538631017;

You can use the command below to check the regular date time format of a given timestamp. So, Oct 3, 05:32 is when there was something wrong on the server:

# date -d @1538524947
Wed Oct 3 05:32:27 IST 2018

Query tuning can be carried out alongside this, but identifying such spots helps avoiding spending time on query tuning where badly written queries are not the problem. Having said that, from this point, further troubleshooting may take different sub paths such as checking log files at that particular time, looking at CPU reports, reviewing past pt-stalk reports if set up to run in the background, and dmesg etc. This approach is useful for identifying at what time (or time range) MySQL was more stressed just using slow query log when no robust monitoring tools, like Percona Monitoring and Management (PMM), are deployed.

Using PMM to monitor queries

If you have PMM, you can review Query Analytics to see the topmost slow queries, along with details like execution counts, load etc. Below is a sample screen copy for your reference:

Slow query log from PMM dashboard

NOTE: If you use Percona Server for MySQL, slow query log can report time in micro seconds. It also supports extended logging of  other statistics about query execution. These provide extra power to see the insights of query processing. You can see more information about these options here.

Sep
13
2018
--

Analyzing Amazon Aurora Slow Logs with pt-query-digest

Amazon Aurora MySQL slow query logs with pt-query-digest slow

Amazon Aurora MySQL slow query logs with pt-query-digest slowIn this blog post we shall discuss how you can analyze slow query logs from Amazon Aurora for MySQL, (referred to as Amazon Aurora in the remaining blog). The tools and techniques explained here apply to the other MySQL compatible services available under Amazon Aurora. However, we’ll focus specially on analyzing slow logs from Amazon Aurora version 2 (MySQL 5.7 compatible) using pt-query-digest. We believe there is a bug in Aurora where it logs really big numbers for query execution and lock times for otherwise really fast queries.

So, the main steps we need are:

  1. Enable slow query logging on your Amazon Aurora DB parameter group, apply the change when appropriate.
  2. Download the slow log(s) that match the time that you are interested to investigate, and optionally concatenate them.
  3. Run pt-query-digest on the downloaded logs and check the results.

Enable slow query logging

For our testing we decided to capture all the SELECT queries that were hitting our Amazon Aurora instance, mainly because we had a sysbench OLTP read only workload and that wouldn’t really have a lot of slow queries. An easy way to do so is to enable the capture of slow query logs and set long_query_time to 0 — you will need to enable slow query logging. To achieve that, we created a new DB parameter group and applied it to our test Aurora instance with the following three parameters set as below:

slow_query_log=1
long_query_time=0
min_examined_row_limit=0

Once you have the above configuration applied to Amazon RDS, you will be able to see slow query logs being created in the Amazon RDS console.

Download the log file

You can download the log file of your choice using either the Amazon RDS console OR you can use the following AWS CLI command to achieve the same:

$ aws rds download-db-log-file-portion --db-instance-identifier perconasupport  --starting-token 0 --output text --log-file-name slowquery/mysql-slowquery.log.2018-09-03.09 > mysql-slowquery.log.2018-09-03.09

Depending on the size of the chosen log file, the above command will take some time to complete the download.

Run pt-query-digest on the log file

Once the file has been downloaded you can analyse that using the following pt-query-digest command.

$ pt-query-digest --group-by fingerprint --order-by Query_time:sum mysql-slowquery.log.2018-09-03.09

On our Aurora test slow log file, the initial results didn’t look right so we had to apply a workaround. Here is the header of the initial results from pt-query-digest:

# 456.2s user time, 2.5s system time, 43.80M rss, 141.48M vsz
# Current date: Tue Sep 4 15:54:21 2018
# Hostname: aahmed-GL503VD
# Files: mysql-slowquery.log.2018-09-03.09
# Overall: 5.13M total, 60 unique, 1.43k QPS, 507.43Gx concurrency _______
# Time range: 2018-09-03T08:00:04 to 2018-09-03T09:00:03
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 1826227663297288s 1us 18446744073710s 355917782s 761us 80127878922s 93us
# Lock time 1401952549601936s 0 18446744073710s 273229812s 44us 70205933577s 23us
# Rows sent 94.71M 0 100 19.35 97.36 37.62 0.99
# Rows examine 216.26M 0 300 44.19 299.03 84.74 0.99
# Query size 196.24M 5 1.24k 40.08 72.65 18.90 36.69
# Profile
# Rank Query ID Response time Calls R/Call
# ==== ====================== =========================== ======= ========
# 1 0xE81D0B3DB4FB31BC5... 1346612317380813.0000 73.7% 3194111 421592210.5966 18... SELECT sbtest?
# 2 0x9934EF6887CC7A638... 147573952589685.0625 8.1% 319381 462062403.8051 18... SELECT sbtest?
# 3 0x8D589AFA4DFAEEED8... 110680464442264.1094 6.1% 319411 346514254.1812 18... BEGIN
# 4 0xFF7C69F51BBD3A736... 92233720368565.1875 5.1% 319388 288782673.0139 18... SELECT sbtest?
# 5 0xFFFCA4D67EA0A7888... 73786976294861.9844 4.0% 321238 229695665.8143 18... COMMIT
# MISC 0xMISC 55340232221335.8281 3.0% 657509 84166501.4796 0.0 <43 ITEMS>

What’s wrong with the above results is that the total query Exec time and Lock time are very large numbers. Digging deeper into the logs revealed a problem with the slow logs themselves that had very large numbers for Query time & Lock time for some queries. For instance in our case, of 5.13 million queries in the log file, only 111 had the anomaly. Even so, it was enough to skew the results.

# Time: 2018-09-03T08:41:47.363522Z
--
SELECT c FROM sbtest1 WHERE id=24278;
# Time: 2018-09-03T08:41:49.363224Z
# User@Host: perconasupport[perconasupport] @ [172.30.2.111] Id: 20869
# Query_time: 18446744073709.550781 Lock_time: 18446744073709.550781 Rows_sent: 1 Rows_examined: 1
SET timestamp=1535964109;
SELECT c FROM sbtest2 WHERE id=989322;
# Time: 2018-09-03T08:41:49.363296Z
--
BEGIN;
# Time: 2018-09-03T08:41:53.362947Z
# User@Host: perconasupport[perconasupport] @ [172.30.2.111] Id: 20873
# Query_time: 18446744073709.550781 Lock_time: 18446744073709.550781 Rows_sent: 1 Rows_examined: 1
SET timestamp=1535964113;
SELECT c FROM sbtest1 WHERE id=246889;
# Time: 2018-09-03T08:41:53.363003Z

Incorrect logging

The above two queries are, in fact, really fast, but for some reason the execution time & lock times are wrongly logged in the slow query log. Since the number of such query log records is statistically negligible compared to the total number of queries, we decided to ask pt-query-digest to ignore them using the command line parameter –attribute-value-limit . The default value of this parameter is 0. We decided to increase that to 2^32, and make it ignore the large numbers from the slow query log. So, the pt-query-digest command became:

$ pt-query-digest --group-by fingerprint --order-by Query_time:sum --attribute-value-limit=4294967296 mysql-slowquery.log.2018-09-03.09

This caused the 111 queries with the bad log times to be ignored and the results looked good. In our case, the ignored queries were bad variants of queries for which good versions existed. You can tell this because the number of unique queries remained the same as before after the bad variants were ignored. However, this may not always hold true and one should expect to lose some fidelity, especially if you are analyzing a smaller slow log.

# 441s user time, 450ms system time, 38.19M rss, 111.76M vsz
# Current date: Tue Sep 4 16:23:33 2018
# Hostname: aahmed-GL503VD
# Files: mysql-slowquery.log.2018-09-03.09
# Overall: 5.13M total, 60 unique, 1.43k QPS, 0.30x concurrency __________
# Time range: 2018-09-03T08:00:04 to 2018-09-03T09:00:03
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 1096s 1us 198ms 213us 761us 431us 93us
# Lock time 180s 0 103ms 34us 44us 161us 23us
# Rows sent 94.71M 0 100 19.35 97.36 37.62 0.99
# Rows examine 216.26M 0 300 44.19 299.03 84.74 0.99
# Query size 196.24M 5 1.24k 40.08 72.65 18.90 36.69
# Profile
# Rank Query ID Response time Calls R/Call V/M Ite
# ==== =========================== ============== ======= ====== ===== ===
# 1 0xE81D0B3DB4FB31BC558CAE... 400.1469 36.5% 3194111 0.0001 0.00 SELECT sbtest?
# 2 0xF0C5AE75A52E847D737F39... 161.4065 14.7% 319453 0.0005 0.00 SELECT sbtest?
# 3 0xFFFCA4D67EA0A788813031... 155.8740 14.2% 321238 0.0005 0.00 COMMIT
# 4 0x8D589AFA4DFAEEED85FFF5... 107.9827 9.9% 319411 0.0003 0.00 BEGIN
# 5 0x9934EF6887CC7A6384D1DE... 94.1002 8.6% 319381 0.0003 0.00 SELECT sbtest?
# 6 0xFF7C69F51BBD3A736EEB1B... 79.9279 7.3% 319388 0.0003 0.00 SELECT sbtest?
# 7 0xA729E7889F57828D3821AE... 75.3969 6.9% 319398 0.0002 0.00 SELECT sbtest?
# MISC 0xMISC 21.1212 1.9% 18658 0.0011 0.0 <41 ITEMS>
# Query 1: 1.27k QPS, 0.16x concurrency, ID 0xE81D0B3DB4FB31BC558CAEF5F387E929 at byte 358647353
# Scores: V/M = 0.00
# Time range: 2018-09-03T08:00:04 to 2018-09-03T08:42:00
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 62 3194111
# Exec time 36 400s 10us 198ms 125us 332us 300us 80us
# Lock time 74 134s 0 26ms 42us 49us 154us 27us
# Rows sent 3 3.01M 0 1 0.99 0.99 0.11 0.99
# Rows examine 1 3.01M 0 1 0.99 0.99 0.11 0.99
# Query size 57 112.37M 32 38 36.89 36.69 0.53 36.69
# String:
# Databases perconasupport
# Hosts 172.30.2.111
# Users perconasupport
# Query_time distribution
# 1us
# 10us ################################################################
# 100us ##############
# 1ms #
# 10ms #
# 100ms #
# 1s

That number looks familiar

The really big number 18446744073709.550781 seemed to ring a bell. A quick web search revealed that it could be a regression of an old bug in MySQL’s code. The following bugs were found to have the same value being reported for query exec time & query lock time.

  1. https://bugs.mysql.com/bug.php?id=59757
  2. https://bugs.mysql.com/bug.php?id=63524
  3. https://bugs.mysql.com/bug.php?id=35396
Once slow logs were enabled, we used this sysbench command  to generate the workload for the Amazon Aurora instance. You might like to try it yourselves. Please note that this used sysbench version 1.0.14.
$ sysbench --db-driver=mysql --mysql-user=perconasupport --mysql-host=perconasupport-1234567.cgmobiazycdv.eu-west-1.rds.amazonaws.com --mysql-password=XXXXXXX  --mysql-db=perconasupport --range_size=100 --table_size=1000000 --tables=2 --threads=6 --events=0 --time=600 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua run

If you are an Amazon Aurora user, have you found any problems analyzing slow query logs? You are welcome to use the comments section, below, to let me know.

Percona Toolkit

pt-query-digest is part of Percona Toolkit, a collection of advanced open source command-line tools, developed and used by the Percona technical staff. Percona Toolkit is open source and free to download and use.

The post Analyzing Amazon Aurora Slow Logs with pt-query-digest appeared first on Percona Database Performance Blog.

Mar
19
2014
--

How to log slow queries on Slave in MySQL 5.0 with pt-query-digest

Working as a Percona Support Engineer, every day we are seeing lots of issues related to MySQL replication. One very common issue is slave lagging. There are many reasons for slave lag but one common reason is that queries are taking more time on slave then master. How to check and log those long-running queries?  From MySQL 5.1, log-slow-slave-statements variable was introduced, which you can enable on slave and log slow queries. But what if you want to log slow queries on slave in earlier versions like MySQL 5.0?  There is a good solution/workaround: pt-query-digest. How? let’s take a look….

If you want to log all queries that are running on slave (including those, running by sql thread), you can use pt-query-digest with –processlist and –print (In pt-query-digest 2.1.9) OR –output (In pt-query-digest 2.2.7) options and log all queries in specific file. I have tested it in my local environment and it works.

You can start pt-query-digest like below on Slave,

nil@Dell:~$ /percona-toolkit-2.1.9/bin/pt-query-digest --processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock --print --no-report
OR
nil@Dell:-$ /percona-toolkit-2.2.7/bin/pt-query-digest --processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock --no-report --output=slowlog

Run some long running queries on Master,

nil@Dell:~$ mysql -umsandbox -p --socket=/tmp/mysql_sandbox34497.sock
Enter password:
mysql> use percona
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> delete from test limit 5000000;
Query OK, 5000000 rows affected (1 min 54.33 sec)
mysql> delete from test limit 5000000;
Query OK, 5000000 rows affected (1 min 56.42 sec)

mysql>

and you’ll see the output on Slave like this,

nil@Dell:~/Downloads/percona-toolkit-2.1.9/bin$ ./pt-query-digest --processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock --print --no-report
# Time: 2014-03-18T12:10:57
# User@Host: system user[system user] @ []
# Query_time: 114.000000 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
use percona;
delete from test limit 5000000;
nil@Dell:~/Downloads/percona-toolkit-2.2.7/bin$ pt-query-digest --processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock --no-report --output=slowlog
# Time: 2014-03-18T12:21:05
# User@Host: system user[system user] @ []
# Query_time: 117.000000 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
use percona;
delete from test limit 5000000;

You can also run pt-query-digest in background like a daemon and send output to the specific file like slow.log and review it.

i.e /percona-toolkit-2.1.9/bin/pt-query-digest –processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock –print –no-report > slow.log 2>&1

OR

i.e /percona-toolkit-2.2.7/bin/pt-query-digest –processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock –no-report –output=slowlog > slow.log 2>&1

Here, the default output will be just like slow query log. If we have master-master replication where every master is slave too and we want to log only those statements that are executing by sql_thread then –filter option can be used like this:

pt-query-digest –filter ‘$event->user eq “system user”‘ –no-report –output=slowlog

Since pt-query-digest–processlist polls 10 times/second ( –interval option), it’s not reliable to use for collecting complete query logs, because quick queries could fall in between the polling intervals. And in any case, it won’t measure query time with precision any better than 1/10th of a second. But if the goal is to identify queries that are very long-running, it should be adequate.

The post How to log slow queries on Slave in MySQL 5.0 with pt-query-digest appeared first on MySQL Performance Blog.

Mar
14
2014
--

Tools and tips for analysis of MySQL’s Slow Query Log

MySQL's Slow Query LogMySQL has a nice feature, slow query log, which allows you to log all queries that exceed a predefined about of time to execute. Peter Zaitsev first wrote about this back in 2006 – there have been a few other posts here on the MySQL Performance Blog since then (check this and this, too) but I wanted to revisit his original subject in today’s post.

Query optimization is essential for good database server performance and usually DBAs need to ensure the top performance possible for all queries. In MySQL, the desirable way is to generate a query log for all running queries within a specific time period and then run a query analysis tool to identify the bad queries. Percona Toolkit’s pt-query-digest is one of the most powerful tools for SQL analysis. That’s because pt-query-digest can generate a very comprehensive report that spots problematic queries very efficiently. It works equally well with Oracle MySQL server. This post will focus mainly on pt-query-digest.

Slow query log is great at spotting really slow queries that are good candidates for optimization. Beginning with MySQL 5.1.21, the minimum value is 0 for long_query_time, and the value can be specified to a resolution of microseconds. In Percona Server additional statistics may be output to the slow query log. You can find the full details here. For our clients, we often need to identify queries that impact an application the most. It does not always have to be the slowest queries – queries that runs more frequently with lower execution time per call put more load on a server than queries running with lower frequency. We of course want to get rid of really slow queries but to really optimize application throughput, we also need to investigate queries that generate most of the load. Further, if you enable option log_queries_not_using_indexes  then MySQL will log queries doing full table scans which doesn’t always reflect that the query is slow, because in some situations the query optimizer chooses full table scan rather than using any available index or probably showing all records from a small table.

Our usual recommendation is to generate the slow log with long_query_time=0. This will record all the traffic but this will be I/O intensive and will eat up disk space very quickly depending on your workload. So beware of running with long_query_time=0 for only a specific period of time and revert it back to logging only very slow queries. In Percona Server there is nice option where you can limit the rate of logging, log_slow_rate_limit is the option to handle it. Filtering slow query log is very helpful too in some cases e.g. if we know the main performance issue is table scans we can log queries only doing full table scans or if we see I/O is bottleneck we can collect queries doing full scans and queries creating on disk temporary tables. Again, this is only possible in Percona Server with the log_slow_filter option. Also, you may want to collect everything on slow query log and then filter with pt-query-digest. Depending on I/O capacity, you might prefer one or another way, as collecting everything in slow query log allows us to investigate other queries too if needed. Finally, use pt-query-digest to generate an aggregate report over slow query log which highlights the problematic part very efficiently. Again, pt-query-digest can bring up server load high so our usual recommendation on it is to move slow query log to some staging/dev server and run pt-query-digest over there to generate the report.

Note: changing the long_query_time parameter value only affects newly created connections to log queries exceeds long_query_time threshold. In Percona Server there is feature which changes variable scope to global instead of local. Enabling slow_query_log_use_global_control  log queries for connected sessions too after changing long_query_time parameter threshold. You can read more about this patch here.

I am not going to show you a detailed report of pt-query-digest and explain each part of it here, because it is well defined already by my colleague Ovais Tariq in this post. However, I will show you some of the other aspects of pt-query-digest tool here.

Let me show you code snippets that enable slow query log for only a specific time period with long_query_time=0 and log_slow_verbosity to ‘full’. log_slow_verbosity is a Percona Server variable which logs extra stats such as information on query cache, Filesort, temporary tables, InnoDB statistics etc. Once you are done collecting logs, revert back the values for long_query_time to the previous value, and finally run pt-query-digest on the log to generate report. Note: run the below code in same MySQL session.

-- Save previous settings
mysql> SELECT @@global.log_slow_verbosity INTO @__log_slow_verbosity;
mysql> SELECT @@global.long_query_time INTO @__long_query_time;
mysql> SELECT @@global.slow_query_log INTO @__slow_query_log;
mysql> SELECT @@global.log_slow_slave_statements INTO @__log_slow_slave_statements;
-- Keep this in safe place, we'll need to run pt-query-digest
mysql> SELECT NOW() AS "Time Since";
-- Set values to enable query collection
mysql> SET GLOBAL slow_query_log_use_global_control='log_slow_verbosity,long_query_time';
mysql> SET GLOBAL log_slow_verbosity='full';
mysql> SET GLOBAL slow_query_log=1;
mysql> SET GLOBAL long_query_time=0;
mysql> SET GLOBAL log_slow_slave_statements=1;
-- Verify settings are OK
mysql> SELECT @@global.long_query_time, @@global.slow_query_log, @@global.log_slow_verbosity;
-- wait for 30 - 60 minutes
-- Keep this one too, also for pt-query-digest
mysql> SELECT NOW() AS "Time Until";
-- Revert to previous values
mysql> SET GLOBAL slow_query_log=@__slow_query_log;
mysql> SET GLOBAL long_query_time=@__long_query_time;
mysql> SET GLOBAL log_slow_verbosity=@__log_slow_verbosity; -- if percona server
mysql> SET GLOBAL log_slow_slave_statements=@__log_slow_slave_statements;
-- Verify settings are back to previous values
mysql> SELECT @@global.long_query_time, @@global.slow_query_log, @@global.log_slow_verbosity, @@global.slow_query_log_file;
-- Then with pt-query-digest run like (replace values for time-since, time-until and log name)
$ pt-query-digest --since='<time-since>' --until='<time-until>' --limit=100% /path/to/slow_query_log_file.log > /path/to/report.out
-- If you're not using Percona Server then you need to remove all references to log_slow_verbosity, slow_query_log_use_global_control and log_slow_slave_statements (priot MySQL 5.6).

My colleague Bill Karwin wrote bash script that does almost the same as the above code. You can find the script to collect slow logs here. This script doesn’t hold connection to the database session while you wait for logs to accumulate and it sets all the variables back to the state they were before. For full documentation view this.

Further, you can also get explain output into the report from the pt-query-digest tool. For that you need to use –explain parameter similar to as follows.

$ pt-query-digest --explain u=<user>,p=<password>,h=<hostname> /path/to/slow.log > /path/to/report.out

Explain output in query report will get you all the information for query execution plan and explain output signal towards how that particular query going to be executed. Note that, if you execute pt-query-digest over slow query log other than originated server of slow query log as I mentioned above e.g. staging/dev you may get different execution path for the query in the report or lower number of rows to examined, etc., because usually staging/dev servers has different data distribution, different MySQL versions, or different indexes. MySQL explain adds overhead as queries needs to be prepared on the server to generate intended query execution path. For this reason, you may want to run pt-query-digest with –explain on a production replica.

It’s worth mentioning that logging queries with log_slow_verbosity in Percona Server is really handy as it shows lots of additional statistics and it is more helpful in situations when the explain plan reports a different execution path than when the query is executed. On that particular topic, you may want to check this nice post.

pt-query-digest also supports filters. You can read more about it here. Let me show you an example. The following command will discard everything apart from insert/update/delete queries in pt-query-digest output report.

$ pt-query-digest --filter '$event->{arg} =~ m/^(insert|update|delete)/i' --since='<time-since>' --until='<time-until>' --limit=100% /path/to/slow_query_log_file.log > /path/to/report.out

If you’re looking for some GUI tools for pt-query-digest then I would recommend reading this nice blogpost from my colleague Roman. Further, our CEO Peter Zaitsev also wrote a post recently where he shows the comparison between performance_schema and slow query log. Check here for details.

In related new, Percona recently announced Percona Cloud Tools, the next generation of tools for MySQL. It runs a client-side agent (pt-agent) which runs pt-query-digest on the server with some intervals and uploads the aggregated data to the Percona Cloud Tools API which process it further.  Query Analytics is one tool from the Percona Cloud Tools that provides advanced query metrics. It  is a nice visualization tool. You may be interested to learn more about it here, and it’s also worth viewing this related webinar about Percona Cloud Tools from our CTO Vadim Tkachenko.

Conclusion:
pt-query-digest from Percona Toolkit is a versatile (and free) tool for slow query log analysis. It provides good insight about every individual query, especially in Percona Server with log_slow_verbosity enabled, e.g. log queries with microsecond precision, log information about the query’s execution plan. On top of that, Percona Cloud Tools includes Query Analytics which provides you with good visuals about query performance and also provides a view of historical data.

The post Tools and tips for analysis of MySQL’s Slow Query Log appeared first on MySQL Performance Blog.

Feb
11
2014
--

PERFORMANCE_SCHEMA vs Slow Query Log

A couple of weeks ago, shortly after Vadim wrote about Percona Cloud Tools and using Slow Query Log to capture the data, Mark Leith asked why don’t we just use Performance Schema instead? This is an interesting question and I think it deserves its own blog post to talk about.

First, I would say main reason for using Slow Query Log is compatibility. Basic Slow query log with microsecond query time precision is available starting in MySQL 5.1, while events_statements_summary_by_digest table was only added in MySQL 5.6 which was out for about a year now but which is still far from complete market domination. It is especially interesting if you look at the low-end market – users who just run some web applications using whatever MySQL Version their hosting provider installed for them. If you look at WordPress Users for example you will see MySQL 5.6 at just 1.3% as of today. As time passes and MySQL 5.6 takes a larger share of the market we surely should add support for Performance Schema based query sampling to Percona Cloud Tools.

The second reason is amount of data available. There is a fair amount of data which Performance Schema digest table providers including some which are not available in Percona Server logs:

mysql> select * from events_statements_summary_by_digest where digest_text like "%sbtest%" \G
*************************** 1. row ***************************
                SCHEMA_NAME: sbtest
                     DIGEST: 2062ac01bc1798df1eebd3e111a22b59
                DIGEST_TEXT: SELECT c FROM sbtest WHERE id = ?
                 COUNT_STAR: 882262
             SUM_TIMER_WAIT: 933683089690000
             MIN_TIMER_WAIT: 106418000
             AVG_TIMER_WAIT: 1058283000
             MAX_TIMER_WAIT: 1031299058000
              SUM_LOCK_TIME: 60853469000000
                 SUM_ERRORS: 0
               SUM_WARNINGS: 0
          SUM_ROWS_AFFECTED: 0
              SUM_ROWS_SENT: 889205
          SUM_ROWS_EXAMINED: 890279
SUM_CREATED_TMP_DISK_TABLES: 0
     SUM_CREATED_TMP_TABLES: 0
       SUM_SELECT_FULL_JOIN: 0
SUM_SELECT_FULL_RANGE_JOIN: 0
           SUM_SELECT_RANGE: 0
     SUM_SELECT_RANGE_CHECK: 0
            SUM_SELECT_SCAN: 0
      SUM_SORT_MERGE_PASSES: 0
             SUM_SORT_RANGE: 0
              SUM_SORT_ROWS: 0
              SUM_SORT_SCAN: 0
          SUM_NO_INDEX_USED: 0
     SUM_NO_GOOD_INDEX_USED: 0
                 FIRST_SEEN: 2014-02-10 17:39:39
                  LAST_SEEN: 2014-02-10 17:40:39
1 row in set (0.00 sec)

BTW – note Rows Sent here not being equal to rows examined while in reality they should be exactly the same for this benchmark. This is the approximate accounting of Performance Schema in action, though.

Now compare it to the sample for the same query in the slow query log in Percona Server

SET timestamp=1392071614;
SELECT c from sbtest where id=387872;
# Time: 140210 17:33:34.837517
# User@Host: root[root] @ localhost []  Id:    95
# Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 0.001000  Lock_time: 0.000054  Rows_sent: 1  Rows_examined: 1  Rows_affected: 0
# Bytes_sent: 74  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 90E34CF
# QC_Hit: No  Full_scan: No  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
#   InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 1
# Log_slow_rate_type: query  Log_slow_rate_limit: 100

What I believe is the most valuable here is the information about Innodb IO which instantly allows us to isolate the query patterns which are disk IO bound as well as information about Bytes Sent which allows to see which queries are responsible for generating high volumes of network traffic.

I wish Performance Schema would be enhanced to return data in something like JSON where for each digest the top waits are accounted as in reality it can be different by the query. Some queries might be waiting on IO other on Locks, yet another could be bound by some specific mutexes. Having exact information about what limits performance of queries of the specific type would be a gem.

The third reason for using Slow Query Log is using placeholders. Note in the query above has “SELECT c FROM sbtest WHERE id = ?” which is not very convenient – I can’t even run EXPLAIN for such query to see what could be the reason for its slowness. Log contains exact queries and we are able to show exact queries in reports (pt-query-digest and Percona Cloud Tools) or you can opt for seeing only query digests if you do not want to see the values for privacy/security reasons. Picking the constant for a query with worse plan usually works very well to check out worse case scenario.

This might look like very simple problem – why you can’t just come up with ID and reconstruct the query but for more complicated queries with multiple conditions it is virtually impossible to reconstruct the realistic query.

Now in theory you can look up actual query from events_statements_history_long and join the data together, however it does not really work at the high query rates as it is very likely rare queries will not have a sample available in the history table.

The forth reason is support for prepared statements. Enable prepared statements and you will not see the actual query in the digest. This may or may not be an issue for your application but it further limits usability of this feature. I can’t count on simply looking at events_statements_summary_by_digest to always find which queries are responsible for majority of the load.

The fifth reason is performance or actually not much of a reason. I really think Performance Schema overhead is reasonable for most workloads. In simple queries benchmark which I’ve done:

sysbench --test=oltp --oltp-test-mode=simple  --num-threads=16 --max-requests=0 --max-time=60   --mysql-user=root --oltp-table-size=1000000 run

On my old server I got some 20.2K QPS with Performance Schema Disabled and 19.4 QPS with Performance Schema enabled which is overhead of less than 5%.

For most workloads paying 5% to have insight about what is happening with the system is a very fair trade.

The slow query log overhead actually can be much larger. The moderate level of details “microtime” resulted in 15.1K queries and log_slow_verbosity=”full” takes this value down to 11.6K having over 40% overhead. Note I designed this test as worse case scenario and for more complicated queries the overhead is likely to be less (while overhead with Performance Schema can stay the same or even increase depending on what queries are doing).

Some people set long_query_time to some non zero value to reduce amount of queries logged. This is bad idea because the workload logged will be very different from your real one – chances are majority of your load comes from simple quick queries which will be very unrepresented with non zero long query time with only outliers logged.

A much better idea is to enable Sampling which is available in latest version of Percona Server – this way only one out of every so many queries will be logged:

mysql> set global log_slow_rate_limit=100;
Query OK, 0 rows affected (0.00 sec)
mysql> set global log_slow_rate_type="query";
Query OK, 0 rows affected (0.00 sec)

This will get one out ever 100 queries randomly logged which should give you good idea of your workload without such skew. It works well unless you have some rare and very complicated queries which impact your workload disproportionally and which you can’t ignore for performance analyses. To deal with this situation we added slow_query_log_always_write_time option to Percona Server, which allows you to always log such queries in the log even if they would not be selected because of sampling.

Enabling sampling 1/100 queries for this workload with full level of details I get 19.8K queries giving us overhead less than 2% which is even less than Performance Schema and selecting 1/1000 queries to be logged I can get overhead to about 1%. So with Slow Query Log I can make a choice between accuracy and overhead.

I wish Performance Schema would offer something similar – instead of figuring out what probes and statistic tables I need (not all of them are enabled by default) I could just chose to get the sampled data and play with accuracy vs overhead instead of missing the data all together.

Summary: There is a great amount of data in Performance Schema in MySQL 5.6 and Percona Server 5.6 though there are a number of reasons that you also might not want to discard the old and proven tools based on the slow query log just yet.

P.S., If you’re interested in seeing Query Performance Analyses in action please join Vadim’s webinar on Wednesday Feb 12. If you missed it check out recording at that same link.

The post PERFORMANCE_SCHEMA vs Slow Query Log appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com