pt-query-digest is one of the most commonly used tool when it comes to query auditing in MySQL®. By default, pt-query-digest reports the top ten queries consuming the most amount of time inside MySQL. A query that takes more time than the set threshold for completion is considered slow but it’s not always true that tuning such queries makes them faster. Sometimes, when resources on server are busy, it will impact every other operation on the server, and so will impact queries too. In such cases, you will see the proportion of slow queries goes up. That can also include queries that work fine in general.
This article explains a small trick to identify such spots using pt-query-digest and the slow query log. pt-query-digest is a component of Percona Toolkit, open source software that is free to download and use.
Some sample data
Let’s have a look at sample data in Percona Server 5.7. Slow query log is configured to capture queries longer than ten seconds with no limit on rate of logging, which is generally considered to throttle the IO that comes while writing slow queries to the log file.
mysql> show variables like 'log_slow_rate%' ; +---------------------+---------+ | Variable_name | Value | +---------------------+---------+ | log_slow_rate_limit | 1 | --> Log all queries | log_slow_rate_type | session | +---------------------+---------+ 2 rows in set (0.00 sec) mysql> show variables like 'long_query_time' ; +-----------------+-----------+ | Variable_name | Value | +-----------------+-----------+ | long_query_time | 10.000000 | --> 10 seconds +-----------------+-----------+ 1 row in set (0.01 sec)
When I run pt-query-digest, I see in the summary report that 80% of the queries have come from just three query patterns.
# Profile # Rank Query ID Response time Calls R/Call V/M # ==== ============================= ================ ===== ======== ===== # 1 0x7B92A64478A4499516F46891... 13446.3083 56.1% 102 131.8266 3.83 SELECT performance_schema.events_statements_history # 2 0x752E6264A9E73B741D3DC04F... 4185.0857 17.5% 30 139.5029 0.00 SELECT table1 # 3 0xAFB5110D2C576F3700EE3F7B... 1688.7549 7.0% 13 129.9042 8.20 SELECT table2 # 4 0x6CE1C4E763245AF56911E983... 1401.7309 5.8% 12 116.8109 13.45 SELECT table4 # 5 0x85325FDF75CD6F1C91DFBB85... 989.5446 4.1% 15 65.9696 55.42 SELECT tbl1 tbl2 tbl3 tbl4 # 6 0xB30E9CB844F2F14648B182D0... 420.2127 1.8% 4 105.0532 12.91 SELECT tbl5 # 7 0x7F7C6EE1D23493B5D6234382... 382.1407 1.6% 12 31.8451 70.36 INSERT UPDATE tbl6 # 8 0xBC1EE70ABAE1D17CD8F177D7... 320.5010 1.3% 6 53.4168 67.01 REPLACE tbl7 # 10 0xA2A385D3A76D492144DD219B... 183.9891 0.8% 18 10.2216 0.00 UPDATE tbl8 # MISC 0xMISC 948.6902 4.0% 14 67.7636 0.0 <10 ITEMS>
Let’s take a closer look at the queries recorded in slow query log.
# grep -B3 DIGEST mysql-slow_Oct2nd_4th.log .... .... # User@Host: ztrend[ztrend] @ localhost [] Id: 6431601021 # Query_time: 139.279651 Lock_time: 64.502959 Rows_sent: 0 Rows_examined: 0 SET timestamp=1538524947; SELECT DIGEST, CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history; # User@Host: ztrend[ztrend] @ localhost [] Id: 6431601029 # Query_time: 139.282594 Lock_time: 83.140413 Rows_sent: 0 Rows_examined: 0 SET timestamp=1538524947; SELECT DIGEST, CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history; # User@Host: ztrend[ztrend] @ localhost [] Id: 6431601031 # Query_time: 139.314228 Lock_time: 96.679563 Rows_sent: 0 Rows_examined: 0 SET timestamp=1538524947; SELECT DIGEST, CURRENT_SCHEMA, SQL_TEXT FROM performance_schema.events_statements_history; .... ....
Now you can see two things.
- All of them have same Unix timestamp
- All of them were spending more than 70% of their execution time waiting for some lock.
Analyzing the data from pt-query-digest
Now I want to check if I can group the count of queries based on their time of execution. If there are multiple queries at a given time captured into the slow query log, time will be printed for the first query but not all. Fortunately, in this case I can rely on the Unix timestamp to compute the counts. The timestamp is gets captured for every query. Luckily, without a long struggle, a combination of grep and awk utilities have displayed what I wanted to display.
# grep -A1 Query_time mysql-slow_Oct2nd_4th.log | grep SET | awk -F "=" '{ print $2 }' | uniq -c 2 1538450797; 1 1538524822; 3 1538524846; 7 1538524857; 167 1538524947; ---> 72% of queries have happened at this timestamp. 1 1538551813; 3 1538551815; 6 1538602215; 1 1538617599; 33 1538631015; 1 1538631016; 1 1538631017;
You can use the command below to check the regular date time format of a given timestamp. So, Oct 3, 05:32 is when there was something wrong on the server:
# date -d @1538524947 Wed Oct 3 05:32:27 IST 2018
Query tuning can be carried out alongside this, but identifying such spots helps avoiding spending time on query tuning where badly written queries are not the problem. Having said that, from this point, further troubleshooting may take different sub paths such as checking log files at that particular time, looking at CPU reports, reviewing past pt-stalk reports if set up to run in the background, and dmesg etc. This approach is useful for identifying at what time (or time range) MySQL was more stressed just using slow query log when no robust monitoring tools, like Percona Monitoring and Management (PMM), are deployed.
Using PMM to monitor queries
If you have PMM, you can review Query Analytics to see the topmost slow queries, along with details like execution counts, load etc. Below is a sample screen copy for your reference:
NOTE: If you use Percona Server for MySQL, slow query log can report time in micro seconds. It also supports extended logging of other statistics about query execution. These provide extra power to see the insights of query processing. You can see more information about these options here.