Jan
16
2014
--

Analyzing WordPress MySQL queries with Query Analytics

This blog, MySQLPerformanceBlog.com, is powered by WordPress, but we never really looked into what kind of queries to MySQL are used by WordPress. So for couple months we ran a Query Analytics (part of Percona Cloud Tools) agent there, and now it is interesting to take a look on queries. Query Analytics uses reports produced by pt-query-digest, but it is quite different as it allows to see trends and dynamics of particular query, in contrast to pt-query-digest, which is just one static report.

Why looking into queries important? I gave an intro in my previous post from this series.

So Query Analytics give the report on the top queries. How to detect which query is “bad”?
One of metrics I am typically looking into is ratio of “Rows examined” to “Rows sent”. In OLTP workload
I expect “Rows sent” to be close to “Rows examined”, because otherwise it means that a query handles a lot of rows (“examined”) which are not used in final result set (“sent”), and it means wasted CPU cycles and even unnecessary IOs if rows are not in memory.

Looking on WordPress queries it does not take long to find one:
query1

This one actually looks quite bad… It examines up to ~186000 rows to return 0 or in the best case 1 row.
The full query text is (and this is available in Query Analytics, you do not need to dig through logs to find it):

SELECT comment_ID FROM wp_comments WHERE comment_post_ID = '154' AND comment_parent = '0' AND comment_approved != 'trash' AND ( comment_author = 'poloralphlauren.redhillrecords' OR comment_author_email = 'spam@gmail.com' ) AND comment_content = 'Probabilities are in case you are like the ma spam jorityof people nowadays, you\'re f lululemonaddictoutletcanadaSale.angelasauceda ighting tooth and nail just looking to keep up together with your existence. Acquiring organized can help you win the fight. Appear to the ugg factors in just abo spam ut every of your spaces (desk, workplace, living room, bed' LIMIT 1;

We can see how execution time of this query changes overtime
query_1_time

and also how many rows it examines for the last month
query_max_rows

It is clearly an upward trend, and obviously the query does not scale well as there more and more data.
I find these trending graphs very useful and they are available in Query Analytics as we continuously digest and analyze queries. We can see that only for the last month amount of rows this query examines increased from ~130K to ~180K.

So, the obvious question is how to optimize this query?

We look into the explain plan

+----+-------------+-------------+------+----------------------------------------------------------+-----------------+---------+-------+--------+-------------+
| id | select_type | table       | type | possible_keys                                            | key             | key_len | ref   | rows   | Extra       |
+----+-------------+-------------+------+----------------------------------------------------------+-----------------+---------+-------+--------+-------------+
|  1 | SIMPLE      | wp_comments | ref  | comment_post_ID,comment_approved_date_gmt,comment_parent | comment_post_ID | 8       | const | 188482 | Using where |
+----+-------------+-------------+------+----------------------------------------------------------+-----------------+---------+-------+--------+-------------+

and SHOW CREATE TABLE

CREATE TABLE `wp_comments` (
  `comment_ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `comment_post_ID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `comment_author` tinytext NOT NULL,
  `comment_author_email` varchar(100) NOT NULL DEFAULT '',
  `comment_author_url` varchar(200) NOT NULL DEFAULT '',
  `comment_author_IP` varchar(100) NOT NULL DEFAULT '',
  `comment_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `comment_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `comment_content` text NOT NULL,
  `comment_karma` int(11) NOT NULL DEFAULT '0',
  `comment_approved` varchar(20) NOT NULL DEFAULT '1',
  `comment_agent` varchar(255) NOT NULL DEFAULT '',
  `comment_type` varchar(20) NOT NULL DEFAULT '',
  `comment_parent` bigint(20) unsigned NOT NULL DEFAULT '0',
  `user_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`comment_ID`),
  KEY `comment_post_ID` (`comment_post_ID`),
  KEY `comment_approved_date_gmt` (`comment_approved`,`comment_date_gmt`),
  KEY `comment_date_gmt` (`comment_date_gmt`),
  KEY `comment_parent` (`comment_parent`)
)

Obviously WordPress did not design this schema to handle 180000 comments to a single post.
There are several ways to fix it, I will take the easiest way and change the key
KEY comment_post_ID (comment_post_ID)
to
KEY comment_post_ID (comment_post_ID,comment_content(300))

and it changes execution plan to

+----+-------------+-------------+------+----------------------------------------------------------------------------------------------------------------+-------------------+---------+-------------+------+-------------+
| id | select_type | table       | type | possible_keys                                                                                                  | key               | key_len | ref         | rows | Extra       |
+----+-------------+-------------+------+----------------------------------------------------------------------------------------------------------------+-------------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | wp_comments | ref  | comment_post_ID,comment_approved_date_gmt,comment_parent | comment_post | 910     | const,const |    1 | Using where |
+----+-------------+-------------+------+----------------------------------------------------------------------------------------------------------------+-------------------+---------+-------------+------+-------------+

From 186000 rows to 910 rows – that’s quite improvement!

How does it affect execution time? Let’s query run for a while and see again in our trending graph:

query_1_time_after

The drop from ~600ms to ~34ms

and for Rows examined:
query_1_after

The 2nd query is also not to hard to find, and it is again on wp_comments table
query_2

The query examines up to 16K rows, sending only 123 in the best case.

Query text is (this one is from different instance of WordPress, so the table structure is different)

SELECT comment_post_ID FROM wp_comments WHERE LCASE(comment_author_email) = 'spam@gmail.com' AND comment_subscribe='Y' AND comment_approved = '1' GROUP BY comment_post_ID

and EXPLAIN for this particular one

+----+-------------+-------------+------+---------------------------+---------------------------+---------+-------+------+----------------------------------------------+
| id | select_type | table       | type | possible_keys             | key                       | key_len | ref   | rows | Extra                                        |
+----+-------------+-------------+------+---------------------------+---------------------------+---------+-------+------+----------------------------------------------+
|  1 | SIMPLE      | wp_comments | ref  | comment_approved_date_gmt | comment_approved_date_gmt | 62      | const | 6411 | Using where; Using temporary; Using filesort |
+----+-------------+-------------+------+---------------------------+---------------------------+---------+-------+------+----------------------------------------------+

This table structure is

CREATE TABLE `wp_comments` (
  `comment_ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `comment_post_ID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `comment_author` tinytext NOT NULL,
  `comment_author_email` varchar(100) NOT NULL DEFAULT '',
  `comment_author_url` varchar(200) NOT NULL DEFAULT '',
  `comment_author_IP` varchar(100) NOT NULL DEFAULT '',
  `comment_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `comment_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `comment_content` text NOT NULL,
  `comment_karma` int(11) NOT NULL DEFAULT '0',
  `comment_approved` varchar(20) NOT NULL DEFAULT '1',
  `comment_agent` varchar(255) NOT NULL DEFAULT '',
  `comment_type` varchar(20) NOT NULL DEFAULT '',
  `comment_parent` bigint(20) unsigned NOT NULL DEFAULT '0',
  `user_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `comment_reply_ID` int(11) NOT NULL DEFAULT '0',
  `comment_subscribe` enum('Y','N') NOT NULL DEFAULT 'N',
  `openid` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`comment_ID`),
  KEY `comment_post_ID` (`comment_post_ID`),
  KEY `comment_date_gmt` (`comment_date_gmt`),
  KEY `comment_approved_date_gmt` (`comment_approved`,`comment_date_gmt`),
  KEY `comment_parent` (`comment_parent`)
)

There again several ways how to make the query more optimal, but let’s make a little exercise: Please
propose your solution in comments, and for the one I like the most by the end of February, I will give my personal discount code to Percona Live MySQL Conference and Expo 2014

So in the conclusion:

  • Query Analytics from Percona Cloud Tools gives immediate insight as to which query requires attention
  • With continuously running reports we can see the trends and effects from our optimizations

Please also note, that Query Analytics does not require you to install MySQL proxy, some third-party middle-ware or any kind of tcp-traffic analyzers. It fully operates with slow-log generated by MySQL, Percona Server or MariaDB (Percona Server provides much more information in slow-log than vanilla MySQL).
So try Percona Cloud Tools for free while it’s still in beta. You’ll be up and running minutes!

Actually, Would you like me to take look on your queries and provide my advice? It is possible.
I will be running a webinar titled “Analyze MySQL Query Performance with Percona Cloud Tools” on Feb-12, 2014; please register and see conditions.

The post Analyzing WordPress MySQL queries with Query Analytics appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com