Jan
16
2014
--

Analyzing WordPress MySQL queries with Query Analytics

This blog, MySQLPerformanceBlog.com, is powered by WordPress, but we never really looked into what kind of queries WordPress sends to MySQL. So for a couple of months we ran a Query Analytics (part of Percona Cloud Tools) agent there, and now it is interesting to take a look at the queries. Query Analytics uses reports produced by pt-query-digest, but it is quite different in that it lets you see the trends and dynamics of a particular query, in contrast to pt-query-digest, which is just one static report.

Why is looking into queries important? I gave an intro in my previous post in this series.

So Query Analytics gives a report on the top queries. How do we detect which query is “bad”?
One of the metrics I typically look at is the ratio of “Rows examined” to “Rows sent”. In an OLTP workload
I expect “Rows sent” to be close to “Rows examined”, because otherwise the query handles a lot of rows (“examined”) that are not used in the final result set (“sent”), which means wasted CPU cycles and even unnecessary I/O if the rows are not in memory.
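
If you want a quick way to eyeball this ratio yourself, outside of Query Analytics, something along these lines works on MySQL 5.6+ with performance_schema statement digests enabled (a minimal sketch; Query Analytics derives the same kind of metric from the slow log):

-- Rank query digests by rows examined per row sent (illustrative only).
SELECT DIGEST_TEXT,
       COUNT_STAR,
       SUM_ROWS_EXAMINED,
       SUM_ROWS_SENT,
       SUM_ROWS_EXAMINED / GREATEST(SUM_ROWS_SENT, 1) AS examined_per_sent
FROM performance_schema.events_statements_summary_by_digest
ORDER BY examined_per_sent DESC
LIMIT 10;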

Looking at the WordPress queries, it does not take long to find one:
[Image: query1]

This one actually looks quite bad… It examines up to ~186,000 rows to return 0 or, in the best case, 1 row.
The full query text is (and this is available in Query Analytics; you do not need to dig through logs to find it):

SELECT comment_ID FROM wp_comments WHERE comment_post_ID = '154' AND comment_parent = '0' AND comment_approved != 'trash' AND ( comment_author = 'poloralphlauren.redhillrecords' OR comment_author_email = 'spam@gmail.com' ) AND comment_content = 'Probabilities are in case you are like the ma spam jorityof people nowadays, you\'re f lululemonaddictoutletcanadaSale.angelasauceda ighting tooth and nail just looking to keep up together with your existence. Acquiring organized can help you win the fight. Appear to the ugg factors in just abo spam ut every of your spaces (desk, workplace, living room, bed' LIMIT 1;

We can see how the execution time of this query changes over time
[Image: query_1_time]

and also how many rows it has examined over the last month
[Image: query_max_rows]

It is clearly an upward trend, and obviously the query does not scale well as there is more and more data.
I find these trending graphs very useful, and they are available in Query Analytics as we continuously digest and analyze queries. We can see that over the last month alone the number of rows this query examines increased from ~130K to ~180K.

So, the obvious question is how to optimize this query?

We look at the EXPLAIN plan

+----+-------------+-------------+------+----------------------------------------------------------+-----------------+---------+-------+--------+-------------+
| id | select_type | table       | type | possible_keys                                            | key             | key_len | ref   | rows   | Extra       |
+----+-------------+-------------+------+----------------------------------------------------------+-----------------+---------+-------+--------+-------------+
|  1 | SIMPLE      | wp_comments | ref  | comment_post_ID,comment_approved_date_gmt,comment_parent | comment_post_ID | 8       | const | 188482 | Using where |
+----+-------------+-------------+------+----------------------------------------------------------+-----------------+---------+-------+--------+-------------+

and SHOW CREATE TABLE

CREATE TABLE `wp_comments` (
  `comment_ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `comment_post_ID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `comment_author` tinytext NOT NULL,
  `comment_author_email` varchar(100) NOT NULL DEFAULT '',
  `comment_author_url` varchar(200) NOT NULL DEFAULT '',
  `comment_author_IP` varchar(100) NOT NULL DEFAULT '',
  `comment_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `comment_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `comment_content` text NOT NULL,
  `comment_karma` int(11) NOT NULL DEFAULT '0',
  `comment_approved` varchar(20) NOT NULL DEFAULT '1',
  `comment_agent` varchar(255) NOT NULL DEFAULT '',
  `comment_type` varchar(20) NOT NULL DEFAULT '',
  `comment_parent` bigint(20) unsigned NOT NULL DEFAULT '0',
  `user_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`comment_ID`),
  KEY `comment_post_ID` (`comment_post_ID`),
  KEY `comment_approved_date_gmt` (`comment_approved`,`comment_date_gmt`),
  KEY `comment_date_gmt` (`comment_date_gmt`),
  KEY `comment_parent` (`comment_parent`)
)

Obviously WordPress did not design this schema to handle 180,000 comments on a single post.
There are several ways to fix this; I will take the easiest way and change the key
KEY comment_post_ID (comment_post_ID)
to
KEY comment_post_ID (comment_post_ID,comment_content(300))
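
A minimal sketch of the corresponding statement (the exact command is not shown in the post; on a busy server you may prefer to apply it with pt-online-schema-change):

ALTER TABLE wp_comments
  DROP KEY comment_post_ID,
  ADD KEY comment_post_ID (comment_post_ID, comment_content(300));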

and it changes the execution plan to

+----+-------------+-------------+------+----------------------------------------------------------+--------------+---------+-------------+------+-------------+
| id | select_type | table       | type | possible_keys                                            | key          | key_len | ref         | rows | Extra       |
+----+-------------+-------------+------+----------------------------------------------------------+--------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | wp_comments | ref  | comment_post_ID,comment_approved_date_gmt,comment_parent | comment_post | 910     | const,const |    1 | Using where |
+----+-------------+-------------+------+----------------------------------------------------------+--------------+---------+-------------+------+-------------+

From 186,000 rows to 910 rows: that's quite an improvement!

How does it affect execution time? Let's let the query run for a while and check our trending graph again:

[Image: query_1_time_after]

The execution time drops from ~600ms to ~34ms,

and for Rows examined:
[Image: query_1_after]

The second query is also not too hard to find, and it is again on the wp_comments table:
[Image: query_2]

The query examines up to 16K rows, sending only 123 in the best case.

The query text is (this one is from a different instance of WordPress, so the table structure is different):

SELECT comment_post_ID FROM wp_comments WHERE LCASE(comment_author_email) = 'spam@gmail.com' AND comment_subscribe='Y' AND comment_approved = '1' GROUP BY comment_post_ID

and the EXPLAIN for this particular one:

+----+-------------+-------------+------+---------------------------+---------------------------+---------+-------+------+----------------------------------------------+
| id | select_type | table       | type | possible_keys             | key                       | key_len | ref   | rows | Extra                                        |
+----+-------------+-------------+------+---------------------------+---------------------------+---------+-------+------+----------------------------------------------+
|  1 | SIMPLE      | wp_comments | ref  | comment_approved_date_gmt | comment_approved_date_gmt | 62      | const | 6411 | Using where; Using temporary; Using filesort |
+----+-------------+-------------+------+---------------------------+---------------------------+---------+-------+------+----------------------------------------------+

The table structure is:

CREATE TABLE `wp_comments` (
  `comment_ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `comment_post_ID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `comment_author` tinytext NOT NULL,
  `comment_author_email` varchar(100) NOT NULL DEFAULT '',
  `comment_author_url` varchar(200) NOT NULL DEFAULT '',
  `comment_author_IP` varchar(100) NOT NULL DEFAULT '',
  `comment_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `comment_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `comment_content` text NOT NULL,
  `comment_karma` int(11) NOT NULL DEFAULT '0',
  `comment_approved` varchar(20) NOT NULL DEFAULT '1',
  `comment_agent` varchar(255) NOT NULL DEFAULT '',
  `comment_type` varchar(20) NOT NULL DEFAULT '',
  `comment_parent` bigint(20) unsigned NOT NULL DEFAULT '0',
  `user_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `comment_reply_ID` int(11) NOT NULL DEFAULT '0',
  `comment_subscribe` enum('Y','N') NOT NULL DEFAULT 'N',
  `openid` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`comment_ID`),
  KEY `comment_post_ID` (`comment_post_ID`),
  KEY `comment_date_gmt` (`comment_date_gmt`),
  KEY `comment_approved_date_gmt` (`comment_approved`,`comment_date_gmt`),
  KEY `comment_parent` (`comment_parent`)
)

There are again several ways to make this query more optimal, but let's do a little exercise: please
propose your solution in the comments, and for the solution I like the most by the end of February, I will give my personal discount code for the Percona Live MySQL Conference and Expo 2014.

So, in conclusion:

  • Query Analytics from Percona Cloud Tools gives immediate insight as to which query requires attention
  • With continuously running reports, we can see the trends and the effects of our optimizations

Please also note that Query Analytics does not require you to install a MySQL proxy, any third-party middleware, or any kind of TCP traffic analyzer. It operates entirely on the slow log generated by MySQL, Percona Server or MariaDB (Percona Server provides much more information in the slow log than vanilla MySQL).
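
For reference, a slow-log setup along these lines gives Query Analytics plenty to work with (a minimal sketch; the values are illustrative, and the last two variables are Percona Server extensions that vanilla MySQL does not have):

[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 0     # log everything; raise this if the overhead is a concern
log_slow_verbosity  = full  # Percona Server: extra per-query metrics in the slow log
log_slow_rate_limit = 100   # Percona Server: sample roughly 1 of every 100 sessions
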
So try Percona Cloud Tools for free while it's still in beta. You'll be up and running in minutes!

Actually, would you like me to take a look at your queries and provide my advice? It is possible.
I will be running a webinar titled “Analyze MySQL Query Performance with Percona Cloud Tools” on Feb. 12, 2014; please register and see the conditions.

The post Analyzing WordPress MySQL queries with Query Analytics appeared first on MySQL Performance Blog.

Jun
28
2013
--

Percona Toolkit 2.2.3 released; bug fixes include pt-online-schema-change

Among all of the excellent events going on, like YAPC, Velocity and Percona MySQL University, we recently released Percona Toolkit 2.2.3. It's a small update that includes the following:

  • pt-online-schema-change did not handle the failure of the DROP TRIGGER statements correctly
  • Created pt-agent
  • pt-query-digest --output json now includes more data

The pt-online-schema-change issue was of high importance. The bug fix is bug 1188002: pt-online-schema-change causes “ERROR 1146 (42S02): Table ‘db._t_new’ doesn’t exist”. This happens when the tool's triggers cannot be dropped. If that occurred, the tool still dropped the new table, which meant the triggers were referencing a missing table, causing errors on writes to the original table.

This is handled properly now: if triggers can’t be dropped, then the new table isn’t dropped either. This leaves a half-finished new table, but that’s better than write errors on the original table. Those using pt-online-schema-change should definitely upgrade.

The “pt-query-digest --output json” change now delivers more information. The previous JSON output was minimal; the output as of 2.2.3 is much more extensive and, consequently, has a new structure. This means that --output json in 2.2.3 is not backwards-compatible (but we warned in the first 2.2 release that --output json would be in flux).

See https://launchpad.net/percona-toolkit/+milestone/2.2.3 for the full list of bugs fixed in Percona Toolkit 2.2.3.

Visit http://www.percona.com/software/percona-toolkit to download Percona Toolkit 2.2.3; it’s also in our main repos.

The post Percona Toolkit 2.2.3 released; bug fixes include pt-online-schema-change appeared first on MySQL Performance Blog.

Apr
23
2013
--

High-load problems? Investigate them with ‘pt-query-digest’

I had the chance to work on an interesting case last week, and I thought I'd share what I think is a little-known goodie from Percona Toolkit for MySQL called pt-query-digest.

One customer was suffering from periods of high load on their database server, leading to degraded application performance, and sometimes even short moments of downtime. As usual with these cases, we set up pt-stalk to trigger a capture if Threads_running went above a known ‘good’ value for this workload, so we would have lots of diagnostics information to review even if we couldn’t be logged in to the server at the time the problem happened.

As I began reviewing available captures, I was able to correlate most load peaks with temporary increases in user activity, so it became a matter of figuring out if the server was hitting its capacity, or if there was another limiting factor in the workload that made it scale poorly. This is when the ‘SHOW FULL PROCESSLIST’ captures generated by pt-stalk come in handy, but if you've used them, you know it can be a daunting task to review them on servers with thousands of connections.

Enter pt-query-digest. The tool supports its share of format parsers, but the one that saved the day here is also the simplest: the ‘raw’ log parser. If you specify --type rawlog, the tool expects the input to have one query per line, and it will do its aggregation job on that. Granted, this means there is no run-time data available, but at least you get a nice ranking of queries sorted by the number of times they appear in the processlist.

In this case, I extracted the first snapshot from each *-processlist capture (under normal circumstances, this file will have --run-time snapshots separated by timestamps) and ran that through pt-query-digest --type rawlog --report-format profile, to get just the ranking of queries. Luckily, one query stood out right away, always appearing many times more often than any other one during the peaks. Rewriting that query (actually the customer removed it altogether, by merging it with another query that ended up having a much better execution plan, but that is beyond the scope of this post!) made the problem go away. The server took the next round of peak traffic without issues.
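
The invocation is as simple as it sounds (a sketch with a hypothetical file name; the input file just needs one query per line, e.g. the Info column pulled from a single SHOW FULL PROCESSLIST snapshot):

pt-query-digest --type rawlog --report-format profile processlist-queries.txt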

If you regularly work with MySQL databases, getting familiar with Percona Toolkit for MySQL will let you do more work, in a safer way, and in less time. If you need to get started, there are good recorded webinars on the subject, and the upcoming Percona Live conference's tutorials and sessions will include a wealth of information on everything MySQL-related, including Percona Toolkit.

The post High-load problems? Investigate them with ‘pt-query-digest’ appeared first on MySQL Performance Blog.

Mar
22
2013
--

5 Percona Toolkit Tools for MySQL That Could Save Your Day: April 3 Webinar

On April 3 at 10 a.m. PST, I'll be giving a webinar titled “5 Percona Toolkit Tools for MySQL That Could Save Your Day.” In this presentation you'll learn how to perform typical but challenging MySQL database administration tasks.

My focus will be on the following tools:

  • pt-query-digest, to select the queries you should try to improve to get optimal response times
  • pt-archiver, to efficiently purge data from huge tables
  • pt-table-checksum/pt-table-sync, to check if data on replicas is in sync with data on the master
  • pt-stalk, to gather data when performance problems happen randomly or are very short
  • pt-online-schema-change, to run ALTER TABLE statements on large tables without downtime

You can reserve your spot for this free MySQL webinar now. If you can't attend, you'll receive a link to the recording and the slides after the webinar.

More info on “5 Percona Toolkit Tools for MySQL That Could Save Your Day”

Percona Toolkit for MySQL is a must-have set of tools that can help serious MySQL administrators perform tasks that are common but difficult to do manually. Years of deployment by tens of thousands of users, including some of the best-known Internet sites, have proven the reliability of the Percona Toolkit tools for MySQL.

In this presentation, you will learn how you can use some of these tools to solve typical, real-world MySQL database administration challenges, such as:

  • Selecting which queries you should try to optimize to get better response times
  • How to efficiently purge data from a huge table without putting too much load on your server
  • How to check which tables are affected when someone accidentally wrote to a replica and fix the problem without rebuilding the tables
  • How to gather data for performance problems that happen randomly and last only a few seconds
  • How to run ALTER TABLE on your largest tables without downtime

REGISTER NOW for this MySQL webinar
DOWNLOAD Percona Toolkit for MySQL

The post 5 Percona Toolkit Tools for MySQL That Could Save Your Day: April 3 Webinar appeared first on MySQL Performance Blog.

Aug
30
2012
--

Visualization tools for pt-query-digest tables

When you process MySQL slow query logs using pt-query-digest, you can store samples of each query in the query_review table and historical values for trend analysis in the query_review_history table. But it can be difficult to browse those tables without a good GUI tool.
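
For example, an invocation along these lines populates both tables (a minimal sketch; the DSN values and the log file name are placeholders):

pt-query-digest --review h=localhost,D=percona,t=query_review,u=percona,p=pass \
                --review-history h=localhost,D=percona,t=query_review_history,u=percona,p=pass \
                mysqld-slow-queries.log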

For visually browsing the tables created by pt-query-digest, you may want to use some kind of web tool besides phpMyAdmin ;-)

Query Digest UI

This is an advanced, but easy-to-install, UI for pt-query-digest or mk-query-digest.

Main features:
* Dynamic filtering and searching of queries
* Colorized and normalized SQL syntax
* Explain the query dynamically
* Integrated pt-query-advisor support
* Detailed historic query stats, with support for log_slow_verbosity.

Actually, this is a very simple and straightforward tool for browsing slow queries. The web interface is AJAX-based. Please refer to the screenshots below to see which columns you can filter the report on. Personally, I find this tool useful if you want to easily find a certain query by ID, have it syntax-highlighted, or find queries first seen since a specified date or within a date range.
You can also explain queries, see their stats and post reviews.

Box Anemometer

Anemometer is a tool for visualizing collected data from the MySQL Slow Query Log. The name comes from the instrument in a weather station that measures wind speed. SQL queries are like the wind — ephemeral and hard to get a hold of. Anemometer helps you get a handle on your queries; it makes it easier to figure out what to optimize and how to track performance over time.

Anemometer relies on Percona Toolkit to do the slow query log collection. Specifically, you run pt-query-digest to parse your slow logs and insert them into a database for reporting.

Personally, I find this tool to be an advanced instrument that could be useful for:

* browsing queries with optional filtering on first seen date;
* custom reports, choosing which columns to show;
* filtering queries by hosts;
* filtering reports by different query conditions;
* graphing reports;
* explaining queries;
* searching and displaying samples;
* displaying table status and CREATE statements;
* reviewing and commenting on queries.

Despite its lack of ability to save predefined reports in the web interface for easy access, it does have a Permalink feature and an API for relative date ranges. The web interface is pretty flexible, as it is built on jQuery UI.

Take a look at the screenshots:

Aug
28
2012
--

Hidden columns of query_review_history table

You can use pt-query-digest to process a MySQL slow query log and store historical values for trend analysis in the query_review_history table. According to its official documentation, you can populate many columns in that table, but there are other important ones, such as ‘user’, ‘host’ and ‘db’, which are not included by default. I will explain how to implement this.

Also the documentation says:

Any columns not mentioned above are inspected to see if they follow a certain naming convention. The column is special if the name ends with an underscore followed by any of these MAGIC_history_cols values:

pct|avt|cnt|sum|min|max|pct_95|stddev|median|rank

If the column ends with one of those values, then the prefix is interpreted as the event attribute to store in that column, and the suffix is interpreted as the metric to be stored.
For a full list of attributes, see http://code.google.com/p/maatkit/wiki/EventAttributes

pt-query-digest works on events, which are a collection of key/value pairs called attributes. You’ll recognize most of the attributes right away: Query_time, Lock_time, and so on. You can just look at a slow log and see them. However, there are some that don’t exist in the slow log, and slow logs may actually include different kinds of attributes (for example, you may have a server with the Percona patches).
With creative use of --filter, you can create new attributes derived from existing attributes.

Note: the event attributes list could be outdated as Percona Toolkit was moved from Google Code to Launchpad.

So, according to the above, you can manually add the following useful columns to the query_review_history table and have them populated (a sketch of the corresponding ALTER TABLE follows the list):
* `user_max` varchar(64) DEFAULT NULL – User who executed the query.
* `host_max` varchar(64) DEFAULT NULL – Client host from which the query was executed.
* `db_max` varchar(64) DEFAULT NULL – Current database, which comes from USE database statements.
* `hostname_max` varchar(64) NOT NULL – Hostname of the server where the slow log is located or was copied from.
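
A minimal sketch of adding these columns to an existing query_review_history table (the types follow the list above):

ALTER TABLE query_review_history
  ADD COLUMN user_max     VARCHAR(64) DEFAULT NULL,
  ADD COLUMN host_max     VARCHAR(64) DEFAULT NULL,
  ADD COLUMN db_max       VARCHAR(64) DEFAULT NULL,
  ADD COLUMN hostname_max VARCHAR(64) NOT NULL;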

The first three columns can be populated automatically when you run pt-query-digest, but in order to set the hostname you need to specify it in the --filter option. For example:

pt-query-digest --review h=localhost,D=percona,t=query_review,p=pass,u=percona \
                --review-history h=localhost,D=percona,t=query_review_history,p=pass,u=percona \
                --filter=" \$event->{Bytes} = length(\$event->{arg}) and \$event->{hostname}=\"$HOSTNAME\"" mysqld-slow-queries.log

$HOSTNAME will reflect the current machine's hostname. Replace $HOSTNAME with the appropriate hostname if you copied the slow query log from another server. Having the hostname set can be useful for filtering queries in a database that aggregates slow queries from multiple servers.
