Oct
13
2015
--

MySQL query digest with Performance Schema

Data AnalysisQuery analysis is a fantastic path in the pursuit to achieve high performance. It’s also probably the most repeated part of a DBA’s daily adventure. For most of us, the weapon of choice is definitely pt-query-digest, which is one of the best tools for slow query analysis out there.

Why not use pt-query-digest? Well, sometimes getting the slow log can be a challenge, such as with RDS instances or when your database is running as part of a DBaaS, which is a common practice in certain organizations.

In those cases it’s good to have an alternative. And in this case, the chosen one is the Performance Schema. We have already talked about the events_statements_* tables; however, this is the moment for the events_statements_summary_by_digest. In this table each row summarizes events for given schema/digest values (note that before MySQL 5.6.9, there is no SCHEMA_NAME column and grouping is based on DIGEST values only).

In order for MySQL to start to aggregate information in summary tables, you must verify that the consumer statement_digest is enabled.

Now, the most straightforward way to get data is to simply query the table, like this:

SELECT
SCHEMA_NAME,
digest,
digest_text,
round(sum_timer_wait/ 1000000000000, 6),
count_star
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC LIMIT 10;

This will show you the picture of volume and frequency of SQL statements in your server. As simple as that. But there are some caveats:

  • Statements are normalized to a digest text. Instead of seeing a query like SELECT age FROM population WHERE id BETWEEN 153 AND 153+69 you will have the fingerprint version: SELECT age FROM population WHERE id BETWEEN ? AND ? + ?
  • The events_statements_summary_by_digest table has a limited maximum number of rows (200 by default, but MySQL 5.6.5 can be modified with the performance_schema_digests_size variable). As a consequence, when the table is full, statement digest values that have no already existing row will be added to a special “catch-all” row with DIGEST = NULL. In plain English: you won’t have meaningful info for those statements.

To solve the first issue, we can use the events_statements_history table to get complete queries for all the digests. I chose not to use events_statements_currents because of the short life the rows have on that table. With history, there are more chances to get more queries in the same amount of time.

Now, when using pt-query-digest, the first step is always to collect a representative amount of data, commonly from the slow log, and then process. With Performance Schema, let’s collect a representative amount of complete queries so we can have examples for every aggregated statement.

To address the second issue, we just need to TRUNCATE the events_statements_summary_by_digest table. This way the summary will start from scratch.

Since Performance Schema is available on all the platforms supported by MySQL, I chose to run the tests on an Amazon RDS MySQL instance. The only thing I didn’t like is that P_S is disabled by default on RDS and to enable it requires an instances reboot. Other than that, everything is the same as in a regular instance.

The steps are:

  1. Enable the events_statements_history consumer
  2. Create a MEMORY table to hold the data
  3. Truncate tables to have a fresh start
  4. Create a MySQL EVENT that will fill the table with data
  5. Once the event has ended, get the query digest.

The table schema is the following:

CREATE TABLE IF NOT EXISTS percona.digest_seen
(schema_name varchar(64) DEFAULT NULL,
digest varchar(32) DEFAULT NULL,
sql_text varchar(1024) DEFAULT NULL,
PRIMARY KEY USING BTREE (schema_name,digest)) engine=memory;

The original SQL_TEXT field from the events_statements_history table is defined as longtext, but unless you are using Percona Server (5.5+), you won’t be able to use longtext on a memory table. This is possible in Percona Server because the Improved Memory Engine permits the use of Blob and Text fields on the Memory Storage Engine. The workaround is to define that field as a varchar 1024. Why 1024? That’s another requirement from the table: The SQL_TEXT is fixed at 1024 chars. It’s only after MySQL 5.7.6 that the maximum number of bytes to display can be modified by changing the performance_schema_max_sql_text_length system variable at server startup.

Also, since we are going to use EVENTS on RDS, the “event_scheduler” variable has to be set to ON. Luckily, it is a dynamic variable so there’s no need to reboot the instance after modifying the Parameter Group. If using a non-RDS, it’s enough to execute “SET GLOBAL event_scheduler = ON;”

Here is the complete list of steps:

UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME = 'events_statements_history';
SET SESSION max_heap_table_size = 1024*1024;
CREATE DATABASE IF NOT EXISTS percona;
USE percona;
CREATE TABLE IF NOT EXISTS percona.digest_seen (schema_name varchar(64) DEFAULT NULL, digest varchar(32) DEFAULT NULL, sql_text varchar(24) DEFAULT NULL, PRIMARY KEY USING BTREE (schema_name,digest)) engine=memory;
TRUNCATE TABLE percona.digest_seen;
TRUNCATE performance_schema.events_statements_summary_by_digest;
TRUNCATE performance_schema.events_statements_history;
CREATE EVENT IF NOT EXISTS getDigest
ON SCHEDULE EVERY 1 SECOND
ENDS CURRENT_TIMESTAMP + INTERVAL 5 MINUTE ON COMPLETION NOT PRESERVE
DO
INSERT IGNORE INTO percona.digest_seen SELECT CURRENT_SCHEMA, DIGEST, SQL_TEXT FROM performance_schema.events_statements_history WHERE DIGEST IS NOT NULL GROUP BY current_schema, digest LIMIT 50;

The event is defined to be run immediately, once per second, for 5 minutes. After the event is complete, it will be deleted.

When the event is done, we are in position to get the query digest. Simply execute this query:

SELECT
s.SCHEMA_NAME,
s.SQL_TEXT,
ROUND(d.SUM_TIMER_WAIT / 1000000000000, 6) as EXECUTION_TIME,
ROUND(d.AVG_TIMER_WAIT / 1000000000000, 6) as AVERAGE_TIME,
COUNT_STAR
FROM performance_schema.events_statements_summary_by_digest d
LEFT JOIN percona.digest_seen s USING (digest)
WHERE s.SCHEMA_NAME IS NOT NULL
GROUP BY s.digest
ORDER BY EXECUTION_TIME DESC LIMIT 10;

The order by is similar to the one pt-query-digest does by default, but it could be any one you want.

The output is:

+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------+------------+
| SCHEMA_NAME | SQL_TEXT                                                                                                                                                                  | EXECUTION_TIME | AVERAGE_TIME | COUNT_STAR |
+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------+------------+
| percona     | UPDATE population SET age=age+1 WHERE id=148                                                                                                                              |  202304.145758 |     1.949487 |     103773 |
| percona     | SELECT age FROM population WHERE id BETWEEN 153 AND 153+69                                                                                                                |     488.572609 |     0.000176 |    2771352 |
| percona     | SELECT sex,age,estimate2012 FROM population WHERE id BETWEEN 174 AND 174+69 ORDER BY sex                                                                                  |     108.841575 |     0.000236 |     461412 |
| percona     | SELECT census2010 FROM population WHERE id=153                                                                                                                            |      62.742239 |     0.000090 |     693526 |
| percona     | SELECT SUM(estimate2014) FROM population WHERE id BETWEEN 154 AND 154+69                                                                                                  |      44.940020 |     0.000195 |     230810 |
| percona     | SELECT DISTINCT base2010 FROM population WHERE id BETWEEN 154 AND 154+69 ORDER BY base2010                                                                                |      33.909593 |     0.000294 |     115338 |
| percona     | UPDATE population SET estimate2011='52906609184-39278192019-93190587310-78276160274-48170779146-66415569224-40310027367-70054020251-87998206812-01032761541' WHERE id=154 |       8.231353 |     0.000303 |      27210 |
| percona     | COMMIT                                                                                                                                                                    |       2.630153 |     0.002900 |        907 |
| percona     | BEGIN                                                                                                                                                                     |       0.705435 |     0.000031 |      23127 |
|             | SELECT 1                                                                                                                                                                  |       0.422626 |     0.000102 |       4155 |
+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------+------------+
10 rows in set (0.10 sec)

Finally, you can do some cleanup:

DROP event IF EXISTS getDigest;
DROP TABLE IF EXISTS percona.digest_seen;
SET SESSION max_heap_table_size = @@max_heap_table_size;
UPDATE performance_schema.setup_consumers SET ENABLED = 'NO' WHERE NAME = 'events_statements_history';

Summary: Performance Schema is doing the query digest already for you. It is just a matter of how to access the data in a way that suits your requirements.

The post MySQL query digest with Performance Schema appeared first on MySQL Performance Blog.

Nov
14
2014
--

RDS for Aurora unveiled at AWS re:Invent

One of the big announcements at the Amazon Web Services re:Invent 2014 conference this week was the unveiling of Aurora. The result of years of internal work, Aurora, currently in preview, is a MySQL 5.6-compatible option that “combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases” on the AWS infrastructure. The Aurora database engine will be made available through the Amazon RDS for Aurora service. This new database option is another example of the vibrant innovation coming from the MySQL ecosystem and key role that relational databases play in applications of today and tomorrow.

Integration with other AWS components

Since the Aurora relational database engine will only be made available as a service via RDS, Amazon Web Services can do some interesting new things including:

  • Fully leveraging available compute, memory and networking resources
  • Automatically growing the size of database volumes as storage needs grow
  • Tighter integration with AWS monitoring and restore capabilities
  • Moving from RDS for MySQL to RDS for Aurora will be enabled by a migration tool

Aurora performance

In the keynote, Amazon Web Services shared that Aurora delivers up to five times the performance of stock MySQL 5.6 without requiring any changes to most MySQL applications. Their FAQ reveals some insights into their testing methodology: “Amazon Aurora delivers over 500,000 SELECTs/sec and 100,000 updates/sec, five times higher than MySQL running the same benchmark on the same hardware.”

More RDS for Aurora details coming soon

We’re very excited about Amazon pushing the boundaries in relational databases for the enterprise and look forward to see what we, as a MySQL community, can learn from their new cloud-centric and cross-service approach.

Like many, Percona is looking forward to trying out Aurora. We’ll definitely do some benchmark testing to compare additional database options such as a tuned Percona Server 5.6. Additionally, we’ll share our expert advice and ideas for improvement that we uncover from our testing of RDS for Aurora back to the team at AWS so the service becomes better for all.

The post RDS for Aurora unveiled at AWS re:Invent appeared first on MySQL Performance Blog.

Sep
26
2014
--

Logical MySQL backup tool mydumper 0.6.2 now available

We are pleased to announce the third release in the 0.6 series of mydumper, a tool for performing logical MySQL backups. In this release, we focused on simplifying compiling the code and added new features for making logical backups. These new features include enhancements to AWS RDS support and extending TokuDB support.

Due to recent changes (or not so much) on mysql libs, it became impossible to compile mydumper without the complete mysql source code. To simplify this, we had to disable the binlog functionality by default, as it was the one affected by the mysql libs changes. Now you should be able to compile against any mysql version without issues. If you still want the binlog feature, it is still there and you can enable it with:

cmake . -DWITH_BINLOG=ON

The one caveat is that you will need to compile against a mysql version greater than 5.5.34. It’s also possible to compile with binlog enabled against the latest 5.6 versions. In this case, you will need the source code to make some changes. You can find related information about this issue at these locations:

Download mydumper-0.6.2 source code here.

Bugs Fixed:

  • #1347392 Last row of table not dumped if it brings statement over statement_size
  • #1157113 Compilation of latest branch fails on CentOS 6.3 64bit
  • #1326368 Can’t make against Percona-Server-devel-55 headers
  • #1282862 Unknown type name ‘HASH’
  • #1336860 k is used twice
  • #913307 Can’t compile – missing libs crypto and ssl
  • #1364393 Rows chunks doesn’t increase non innodb jobs

New MySQL Backup Features:

  • --lock-all-tables

Use LOCK TABLE for all instead of FLUSH TABLES WITH READ LOCK. With this option you will be able to backup RDS instances and also get the binlog coordinates (5.6).

  • TokuDB support

Now TokuDB tables are dumped within the consistent snapshot instead of being locked like MyISAM.

  • Support to dump tables from different schemas

mydumper has two arguments,

    --database (-B)

and

    --tables-list (-T)

so until now you were able to do;

-B db1

or

-B db1 -T t1,t2,tn

To dump a whole database or a list of tables from one database respectively. In 0.6.2 you can list tables in different databases in this way;

-T db1.t1,db1.t2,db2.t1,db2.t1

NOTE: You should not use -B here because mydumper will take the -T list as table names.

The post Logical MySQL backup tool mydumper 0.6.2 now available appeared first on Percona Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com