Oct
15
2018
--

Identifying High Load Spots in MySQL Using Slow Query Log and pt-query-digest

pt-query-digest MySQL slow queries

pt-query-digest MySQL slow queriespt-query-digest is one of the most commonly used tool when it comes to query auditing in MySQL®. By default, pt-query-digest reports the top ten queries consuming the most amount of time inside MySQL. A query that takes more time than the set threshold for completion is considered slow but it’s not always true that tuning such queries makes them faster. Sometimes, when resources on server are busy, it will impact every other operation on the server, and so will impact queries too. In such cases, you will see the proportion of slow queries goes up. That can also include queries that work fine in general.

This article explains a small trick to identify such spots using pt-query-digest and the slow query log. pt-query-digest is a component of Percona Toolkit, open source software that is free to download and use.

Some sample data

Let’s have a look at sample data in Percona Server 5.7. Slow query log is configured to capture queries longer than ten seconds with no limit on rate of logging, which is generally considered to throttle the IO that comes while writing slow queries to the log file.

mysql> show variables like 'log_slow_rate%' ;
+---------------------+---------+
| Variable_name       | Value    |
+---------------------+---------+
| log_slow_rate_limit | 1       |  --> Log all queries
| log_slow_rate_type  | session |
+---------------------+---------+
2 rows in set (0.00 sec)
mysql> show variables like 'long_query_time' ;
+-----------------+-----------+
| Variable_name   | Value     |
+-----------------+-----------+
| long_query_time | 10.000000 |  --> 10 seconds
+-----------------+-----------+
1 row in set (0.01 sec)

When I run pt-query-digest, I see in the summary report that 80% of the queries have come from just three query patterns.

# Profile
# Rank Query ID                      Response time    Calls R/Call   V/M
# ==== ============================= ================ ===== ======== =====
#    1 0x7B92A64478A4499516F46891... 13446.3083 56.1%   102 131.8266  3.83 SELECT performance_schema.events_statements_history
#    2 0x752E6264A9E73B741D3DC04F...  4185.0857 17.5%    30 139.5029  0.00 SELECT table1
#    3 0xAFB5110D2C576F3700EE3F7B...  1688.7549  7.0%    13 129.9042  8.20 SELECT table2
#    4 0x6CE1C4E763245AF56911E983...  1401.7309  5.8%    12 116.8109 13.45 SELECT table4
#    5 0x85325FDF75CD6F1C91DFBB85...   989.5446  4.1%    15  65.9696 55.42 SELECT tbl1 tbl2 tbl3 tbl4
#    6 0xB30E9CB844F2F14648B182D0...   420.2127  1.8%     4 105.0532 12.91 SELECT tbl5
#    7 0x7F7C6EE1D23493B5D6234382...   382.1407  1.6%    12  31.8451 70.36 INSERT UPDATE tbl6
#    8 0xBC1EE70ABAE1D17CD8F177D7...   320.5010  1.3%     6  53.4168 67.01 REPLACE tbl7
#   10 0xA2A385D3A76D492144DD219B...   183.9891  0.8%    18  10.2216  0.00 UPDATE tbl8
#      MISC 0xMISC                     948.6902  4.0%    14  67.7636   0.0 <10 ITEMS>

Query #1 is generated by the qan-agent from PMM and runs approximately once a minute. These results will be handed over to PMM Server. Similarly queries #2 & #3 are pretty simple. I mean, they scan just one row and will return either zero or one rows. They also use indexing, which makes me think that this is not because of something just with in MySQL. I wanted to know if I could find any common aspect of all these occurrences.

Let’s take a closer look at the queries recorded in slow query log.

# grep -B3 DIGEST mysql-slow_Oct2nd_4th.log
....
....
# User@Host: ztrend[ztrend] @ localhost []  Id: 6431601021
# Query_time: 139.279651  Lock_time: 64.502959 Rows_sent: 0  Rows_examined: 0
SET timestamp=1538524947;
SELECT DIGEST, CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history;
# User@Host: ztrend[ztrend] @ localhost []  Id: 6431601029
# Query_time: 139.282594  Lock_time: 83.140413 Rows_sent: 0  Rows_examined: 0
SET timestamp=1538524947;
SELECT DIGEST, CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history;
# User@Host: ztrend[ztrend] @ localhost []  Id: 6431601031
# Query_time: 139.314228  Lock_time: 96.679563 Rows_sent: 0  Rows_examined: 0
SET timestamp=1538524947;
SELECT DIGEST, CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history;
....
....

Now you can see two things.

  • All of them have same Unix timestamp
  • All of them were spending more than 70% of their execution time waiting for some lock.

Analyzing the data from pt-query-digest

Now I want to check if I can group the count of queries based on their time of execution. If there are multiple queries at a given time captured into the slow query log, time will be printed for the first query but not all. Fortunately, in this case I can rely on the Unix timestamp to compute the counts. The timestamp is gets captured for every query. Luckily, without a long struggle, a combination of grep and awk utilities have displayed what I wanted to display.

# grep -A1 Query_time mysql-slow_Oct2nd_4th.log | grep SET | awk -F "=" '{ print $2 }' | uniq -c
2   1538450797;
1   1538524822;
3   1538524846;
7   1538524857;
167 1538524947;   ---> 72% of queries have happened at this timestamp.
1   1538551813;
3   1538551815;
6   1538602215;
1   1538617599;
33  1538631015;
1   1538631016;
1   1538631017;

You can use the command below to check the regular date time format of a given timestamp. So, Oct 3, 05:32 is when there was something wrong on the server:

# date -d @1538524947
Wed Oct 3 05:32:27 IST 2018

Query tuning can be carried out alongside this, but identifying such spots helps avoiding spending time on query tuning where badly written queries are not the problem. Having said that, from this point, further troubleshooting may take different sub paths such as checking log files at that particular time, looking at CPU reports, reviewing past pt-stalk reports if set up to run in the background, and dmesg etc. This approach is useful for identifying at what time (or time range) MySQL was more stressed just using slow query log when no robust monitoring tools, like Percona Monitoring and Management (PMM), are deployed.

Using PMM to monitor queries

If you have PMM, you can review Query Analytics to see the topmost slow queries, along with details like execution counts, load etc. Below is a sample screen copy for your reference:

Slow query log from PMM dashboard

NOTE: If you use Percona Server for MySQL, slow query log can report time in micro seconds. It also supports extended logging of  other statistics about query execution. These provide extra power to see the insights of query processing. You can see more information about these options here.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com