Dec
29
2011
--

How To Test Your Upgrades – pt-upgrade

Upgrades are usually one of the biggest parts of any database infrastructure maintenance. Even with careful planning, something can still go wrong after you move your production application to the new version. Let's look at how one Percona Toolkit tool, pt-upgrade, can help you identify what to expect and test your upgrades better, an important step when upgrading your servers.

pt-upgrade

This tool lets you run your SELECT queries against multiple MySQL servers and reports how each query fingerprint performs on every server you test against. It can report differences in row counts, checksums, column counts, and query execution times, as well as execution errors and warnings.

So how exactly do you test your queries on servers running different versions?

  1. First, ideally, each server must have an identical set of data. The most recommended approach is to upgrade a slave to your target version and let it replicate until you are ready to test. Another is to populate spare servers from a backup and test against those. With the former, you can easily verify that the data is consistent across the master and the slave with pt-table-checksum.
  2. Capture your production queries. We recommend either the slow query log (use long_query_time = 0) or tcpdump (other input types supported by pt-query-digest's --type option are also possible, see below).
  3. Test away! Some example scenarios below.

Using slow query log:

# You can also run the slow log through pt-query-digest like the commands below
#    to limit the number of queries per fingerprint to test with.
pt-upgrade --fingerprints --run-time=1h mysqld-slow.log h=127.0.0.1,P=5091 h=127.0.0.1,P=5517

Using sources supported by pt-query-digest --type:

# tcpdump - You can replace --type tcpdump with the specific type and
#    path to the appropriate type of file you wish to input as source of queries
# --print simply returns the queries in slow log format and --sample
#    limits the output to 100 queries per fingerprint
# --no-report skips the overhead of computing and printing the usual
#    pt-query-digest report
pt-query-digest --print --no-report --sample 100 --type tcpdump /path/to/tcpdump.out \
   | pt-upgrade --fingerprints --run-time=1h h=mysql50 h=mysql51 h=mysql55
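
For reference, the tcpdump.out file above could be captured with something along these lines; this mirrors the recipe from the pt-query-digest documentation, and the interface, port and packet count are only placeholders to adjust for your environment:

tcpdump -s 65535 -x -nn -q -tttt -i any -c 20000 port 3306 > /path/to/tcpdump.out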

A few essential notes when using pt-upgrade:

  1. Make sure to cover all your query types from your log (slow log or tcpdump). If you have 5GB of log collected within 2 hours, it does not make sense to test all 5GB if your run time will not cover all fingerprints. Passing the queries through pt-query-digest and using a sample of 100 as above should be enough samples per fingerprint and reduces the run time (--run-time) of the tests.
  2. Using a separate database and table with --temp-database and --temp-table is recommended; this keeps the test table separate from your production databases and tables.
  3. Use a separate account for testing, one that has read access (i.e. SELECT) to all objects on the database server and all privileges on the --temp-database specified in #2 (see the sketch after this list). pt-upgrade is a read-only tool, but I can only emphasize that you should use an account with only the necessary privileges.
  4. It is also important to run pt-upgrade itself on a third server if the servers you are testing against are on two different machines. This helps account for network latency and provides more accurate results.
  5. Do not test against production servers: 1) it can incur extra load, and 2) the existing production load can affect your results. Testing on EC2 (+EBS) is no exception; although you can easily spin up instances for testing, the unpredictable nature of the cloud can also affect your results.
  6. Lastly, if you are upgrading your hardware alongside your MySQL version, you should also consider running these tests on similar hardware and OS configurations. Sometimes a slight difference in these factors can skew your results greatly.
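
A minimal sketch of the account setup from note #3; the account name, password and temp database name here are placeholders, not anything pt-upgrade requires:

-- run on each test server
CREATE DATABASE IF NOT EXISTS pt_upgrade_tmp;
GRANT SELECT ON *.* TO 'pt_upgrade'@'%' IDENTIFIED BY 'choose-a-password';
GRANT ALL PRIVILEGES ON pt_upgrade_tmp.* TO 'pt_upgrade'@'%';
-- then pass --temp-database=pt_upgrade_tmp and this user in the DSNs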

Here is a simple report output from pt-upgrade. As you can see, there are no errors or warnings, only differences in query times.

# Query 7: ID 0x76323E2525BA457C at byte 0 _______________________________
# Found 441 differences in 510 samples:
#   checksums       0
#   column counts   0
#   column types    0
#   query times     441
#   row counts      0
#   warning counts  0
#   warning levels  0
#   warnings        0
#            127.0.0.1:50910 127.0.0.1:55170
# Errors                   0               0
# Warnings                 0               0
# Query_time
#   sum                442ms           506ms
#   min                391us           480us
#   max                  4ms             2ms
#   avg                867us           992us
#   pct_95               1ms             1ms
#   stddev             234us           210us
#   median             881us             1ms
# row_count
#   sum                  510             510
#   min                    1               1
#   max                    1               1
#   avg                    1               1
#   pct_95                 1               1
#   stddev                 0               0
#   median                 1               1
# Fingerprint
#   select * from categories where parent = ? and site = ?

Looking at the sample above, there are microsecond-level differences between the query times. The most important number is the average (avg), which tells you in general how each fingerprint performs on the test servers. About 10% variation is tolerable and can be accounted for by other overhead such as network latency.

pt-upgrade provides valuable insight into how your production queries will behave on your target version, but it is only one part of the process. There is still no alternative to doing your research on incompatibilities, changes, and bug fixes or new bugs introduced that can potentially break your application.

Dec
29
2011
--

Actively monitoring replication connectivity with MySQL’s heartbeat

Until MySQL 5.5, the only variable available to identify a network connectivity problem between master and slave was slave-net-timeout. This variable specifies the number of seconds to wait for more binary log events from the master before aborting the connection and establishing it again. With a default value of 3600, this has historically been a badly configured variable, and stalled connections or high latency peaks were not detected for a long period of time, or not detected at all. We needed an active master/slave connection check, and this is where replication's heartbeat can help us.

This feature was introduced in 5.5 as another parameter to the CHANGE MASTER TO command. After you enable it, the MASTER starts to send "beat" packets (of 106 bytes) to the SLAVE every X seconds, where X is a value you define. If the network link goes down or the latency rises beyond the time threshold, the SLAVE IO thread will disconnect and try to connect again. This means we now measure the connection time or latency, not the time without binary log events. We are actively checking the communication.

How can I configure replication’s heartbeat?

It is very easy to set up, with negligible overhead:

mysql_slave > STOP SLAVE;
mysql_slave > CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD=1;
mysql_slave > START SLAVE;

MASTER_HEARTBEAT_PERIOD is a value in seconds, in the range from 0 to 4294967, with a resolution of milliseconds. After the loss of a beat, the SLAVE IO thread will disconnect and try to connect again. Here is the SHOW SLAVE STATUS output after an error:

mysql_slave > show slave status\G
[...]
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes
[...]
Last_IO_Errno: 2003
Last_IO_Error: error reconnecting to master 'rsandbox@127.0.0.1:19972' - retry-time: 60 retries: 86400
[...]

It is interesting to note that having a 5.5 slave with replication's heartbeat enabled and connected to a 5.1 master doesn't break replication. Of course, the heartbeat will not work in this case, because the master doesn't know what a beat is or how to send it :)

What status variables do I have?

There are two: the heartbeat check period and the number of beats received.

mysql_slave > SHOW STATUS LIKE '%heartbeat%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| Slave_heartbeat_period    | 1.000 |
| Slave_received_heartbeats | 1476  |
+---------------------------+-------+
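
If you want a rough liveness check from a monitoring script, a minimal sketch is to sample the counter twice and compare; note that heartbeats are only sent while the master has no other events to send, so combine this with Slave_IO_Running rather than relying on it alone:

mysql_slave > SHOW STATUS LIKE 'Slave_received_heartbeats';
-- wait a few multiples of MASTER_HEARTBEAT_PERIOD in the monitoring script,
-- run the same statement again, and alert if the counter has not advanced
-- on an otherwise idle replication link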

Conclusion

If you need to know exactly when the connection between your master and slaves breaks, then replication's heartbeat is the easiest and fastest solution to implement.

Dec
29
2011
--

Identifying the load with the help of pt-query-digest and Percona Server

Overview

Profiling, analyzing and then fixing queries is probably the most often-repeated part of a DBA's job, and one that keeps evolving: as new features are added to the application, new queries pop up that need to be analyzed and fixed. There are not too many tools out there that can make your life easy here. One that can is pt-query-digest (from Percona Toolkit), which provides all the data points you need to attack the right query in the right way. Vanilla MySQL has its limitations, though: it logs only a subset of statistics. Percona Server, by comparison, logs extra statistics such as information about the query's execution plan (whether the query cache was used, whether a filesort was needed, whether a tmp table was created in memory or on disk, whether a full scan was done, etc.) as well as InnoDB statistics (IO read operations, the number of unique pages the query accessed, the length of time the query waited for row locks, etc.). So you can see there is a plethora of useful information reported by Percona Server. Another great thing about Percona Server is the ability to enable logging atomically for all connections, not just for new connections as in MySQL. This is very helpful for measurement, as otherwise we might not catch some long-running connections.

Now let’s get started.

Before We Start!

But before we start, make sure you have enabled slow query logging and set a low enough value for long_query_time. We normally use long_query_time=0, because if you set it to some other value, say 0.1 seconds, it will miss all queries shorter than that, which may well be the majority of your workload. So if you want to analyze what causes load on your server, you had better use a value of 0. But remember to leave it at 0 only for a period of time that allows you to gather enough statistics about the queries; after that, set it back to a value greater than 0, otherwise you can end up with a really large log file. Another thing that we normally do is set the variable log_slow_verbosity to 'full'; this variable is available in Percona Server and allows us to log all the extra stats I mentioned above in the overview section. A sketch of these settings follows.
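
Here is a minimal sketch of those capture settings; log_slow_verbosity exists only in Percona Server, and whether you set these dynamically as below or in my.cnf is up to you:

SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0;
SET GLOBAL log_slow_verbosity = 'full';
-- capture for long enough to be representative, then restore, e.g.:
SET GLOBAL long_query_time = 1;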

So, if you were using the vanilla MySQL server, you would see an entry like this in the slow query log:

# Time: 111229  3:02:26
# User@Host: msandbox[msandbox] @ localhost []
# Query_time: 2.365434  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 655360
use test;
SET timestamp=1325145746;
select count(*) from auto_inc;

Compare that to Percona Server with log_slow_verbosity=full:

# Time: 111229  3:11:26
# User@Host: msandbox[msandbox] @ localhost []
# Thread_id: 1  Schema: test  Last_errno: 0  Killed: 0
# Query_time: 0.117904  Lock_time: 0.002886  Rows_sent: 1  Rows_examined: 655360  Rows_affected: 0  Rows_read: 655361
# Bytes_sent: 68  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: F00
# QC_Hit: No  Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
#   InnoDB_IO_r_ops: 984  InnoDB_IO_r_bytes: 16121856  InnoDB_IO_r_wait: 0.001414
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 973
SET timestamp=1325146286;
select count(*) from auto_inc;

Note that logging all queries in this fashion, as opposed to using the general query log, gives us statistics gathered after the query has actually executed, while no such statistics are available for queries logged with the general query log.

Installing the pt-query-digest tool (as well as the other tools from Percona Toolkit) is very easy, and is explained here.
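
For reference only, one common route is to install the packaged toolkit, assuming the Percona APT or YUM repository is already configured on the machine:

# Debian/Ubuntu
apt-get install percona-toolkit
# RHEL/CentOS
yum install percona-toolkit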

Before we move forward, I would like to point out that the results shown in this blog post are from queries gathered from a Percona Server instance running on my personal Amazon micro instance.

Using pt-query-digest

Using pt-query-digest is pretty straightforward:

pt-query-digest /path/to/slow-query.log

Note that running pt-query-digest can be pretty CPU- and memory-intensive, so ideally you would want to copy the slow query log to another machine and run it there.

Analyzing pt-query-digest Output

Now let's see what output it returns. The first part of the output is an overall summary:

# 8.1s user time, 60ms system time, 26.23M rss, 62.49M vsz
# Current date: Thu Dec 29 07:09:32 2011
# Hostname: somehost.net
# Files: slow-query.log.1
# Overall: 20.08k total, 167 unique, 16.04 QPS, 0.01x concurrency ________
# Time range: 2011-12-28 18:42:47 to 19:03:39
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time             8s     1us    44ms   403us   541us     2ms    98us
# Lock time          968ms       0    11ms    48us   119us   134us    36us
# Rows sent        105.76k       0    1000    5.39    9.83   32.69       0
# Rows examine     539.46k       0  15.65k   27.52   34.95  312.56       0
# Rows affecte       1.34k       0      65    0.07       0    1.35       0
# Rows read        105.76k       0    1000    5.39    9.83   32.69       0
# Bytes sent        46.63M      11 191.38k   2.38k   6.63k  11.24k  202.40
# Merge passes           0       0       0       0       0       0       0
# Tmp tables         1.37k       0      61    0.07       0    0.91       0
# Tmp disk tbl         490       0      10    0.02       0    0.20       0
# Tmp tbl size      72.52M       0 496.09k   3.70k       0  34.01k       0
# Query size         3.50M      13   2.00k  182.86  346.17  154.34   84.10
# InnoDB:
# IO r bytes        96.00k       0  32.00k   20.86       0  816.04       0
# IO r ops               6       0       2    0.00       0    0.05       0
# IO r wait           64ms       0    26ms    13us       0   530us       0
# pages distin      28.96k       0      48    6.29   38.53   10.74    1.96
# queue wait             0       0       0       0       0       0       0
# rec lock wai           0       0       0       0       0       0       0
# Boolean:
# Filesort       4% yes,  95% no
# Filesort on    0% yes,  99% no
# Full scan      4% yes,  95% no
# QC Hit         0% yes,  99% no
# Tmp table      4% yes,  95% no
# Tmp table on   2% yes,  97% no

It tells you that a total of 20.08k queries were captured, which are actually invocations of 167 distinct queries. Following that are summaries of various data points, such as the total and average query execution time, the number of tmp tables created in memory vs. on disk, the percentage of queries that needed a full scan, InnoDB IO stats, etc. One thing I suggest here is to give more weight to the values reported in the 95% (95th percentile) column, as they give a more accurate picture. For example, it shows that at the 95th percentile a query reads approximately 38.53 distinct InnoDB pages (meaning 616.48K of data), yet the 95th percentile for InnoDB IO r ops is 0, which means it accesses these pages in memory. What it does tell you, though, is that if such a query ran on a cold MySQL instance, it would require reading nearly 40 pages from disk.

Let's analyze the next part of the output produced by pt-query-digest.

# Profile
# Rank Query ID           Response time Calls R/Call Apdx V/M   Item
# ==== ================== ============= ===== ====== ==== ===== ==========
#    1 0x92F3B1B361FB0E5B  4.0522 50.0%   312 0.0130 1.00  0.00 SELECT wp_options
#    2 0xE71D28F50D128F0F  0.8312 10.3%  6412 0.0001 1.00  0.00 SELECT poller_output poller_item
#    3 0x211901BF2E1C351E  0.6811  8.4%  6416 0.0001 1.00  0.00 SELECT poller_time
#    4 0xA766EE8F7AB39063  0.2805  3.5%   149 0.0019 1.00  0.00 SELECT wp_terms wp_term_taxonomy wp_term_relationships
#    5 0xA3EEB63EFBA42E9B  0.1999  2.5%    51 0.0039 1.00  0.00 SELECT UNION wp_pp_daily_summary wp_pp_hourly_summary wp_pp_hits wp_posts
#    6 0x94350EA2AB8AAC34  0.1956  2.4%    89 0.0022 1.00  0.01 UPDATE wp_options
#    7 0x7AEDF19FDD3A33F1  0.1381  1.7%   909 0.0002 1.00  0.00 SELECT wp_options
#    8 0x4C16888631FD8EDB  0.1160  1.4%     5 0.0232 1.00  0.00 SELECT film
#    9 0xCFC0642B5BBD9AC7  0.0987  1.2%    50 0.0020 1.00  0.01 SELECT UNION wp_pp_daily_summary wp_pp_hourly_summary wp_pp_hits
#   10 0x88BA308B9C0EB583  0.0905  1.1%     4 0.0226 1.00  0.01 SELECT poller_item
#   11 0xD0A520C9DB2D6AC7  0.0850  1.0%   125 0.0007 1.00  0.00 SELECT wp_links wp_term_relationships wp_term_taxonomy
#   12 0x30DA85C940E0D491  0.0835  1.0%   542 0.0002 1.00  0.00 SELECT wp_posts
#   13 0x8A52FE35D340A347  0.0767  0.9%     4 0.0192 1.00  0.00 TRUNCATE TABLE poller_time
#   14 0x3E84BF7C0C2A3005  0.0624  0.8%   272 0.0002 1.00  0.00 SELECT wp_postmeta
#   15 0xA01053DA94ED829E  0.0567  0.7%   213 0.0003 1.00  0.00 SELECT data_template_rrd data_input_fields
#   16 0xBE797E1DD5E4222F  0.0524  0.6%    79 0.0007 1.00  0.00 SELECT wp_posts
#   17 0xF8EC4434E0061E89  0.0475  0.6%    62 0.0008 1.00  0.00 SELECT wp_terms wp_term_taxonomy
#   18 0xCDFFAD848B0C1D52  0.0465  0.6%     9 0.0052 1.00  0.01 SELECT wp_posts wp_term_relationships
#   19 0x5DE709416871BF99  0.0454  0.6%   260 0.0002 1.00  0.00 DELETE poller_output
#   20 0x428A588445FE580B  0.0449  0.6%   260 0.0002 1.00  0.00 INSERT poller_output
# MISC 0xMISC              0.8137 10.0%  3853 0.0002   NS   0.0 <147 ITEMS>

This part of the output ranks the queries and shows those with the largest impact, that is, the longest total run time, which typically (but not always) identifies the queries causing the highest load on the server. As we can see here, the one causing the highest load is the SELECT wp_options query. This is basically a unique way of identifying the query and simply means it is a SELECT executed against the wp_options table. Another thing to note is the last line in the output, the # MISC part: it tells you how much of the "load" is not covered by the top queries. We have 10% in MISC, which means that by reviewing these top 20 queries we have essentially reviewed most of the load.

Now let's take a look at the most important part of the output:

# Query 1: 0.26 QPS, 0.00x concurrency, ID 0x92F3B1B361FB0E5B at byte 14081299
# This item is included in the report because it matches --limit.
# Scores: Apdex = 1.00 [1.0], V/M = 0.00
# Query_time sparkline: |   _^   |
# Time range: 2011-12-28 18:42:47 to 19:03:10
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          1     312
# Exec time     50      4s     5ms    25ms    13ms    20ms     4ms    12ms
# Lock time      3    32ms    43us   163us   103us   131us    19us    98us
# Rows sent     59  62.41k     203     231  204.82  202.40    3.99  202.40
# Rows examine  13  73.63k     238     296  241.67  246.02   10.15  234.30
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read     59  62.41k     203     231  204.82  202.40    3.99  202.40
# Bytes sent    53  24.85M  46.52k  84.36k  81.56k  83.83k   7.31k  79.83k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables     0       0       0       0       0       0       0       0
# Tmp disk tbl   0       0       0       0       0       0       0       0
# Tmp tbl size   0       0       0       0       0       0       0       0
# Query size     0  21.63k      71      71      71      71       0      71
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin  40  11.77k      34      44   38.62   38.53    1.87   38.53
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Full scan    100% yes,   0% no
# String:
# Databases    wp_blog_one (264/84%), wp_blog_tw… (36/11%)... 1 more
# Hosts
# InnoDB trxID 86B40B (1/0%), 86B430 (1/0%), 86B44A (1/0%)... 309 more
# Last errno   0
# Users        wp_blog_one (264/84%), wp_blog_two (36/11%)... 1 more
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  #################
#  10ms  ################################################################
# 100ms
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_options'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_options`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'\G

This is the part of the output dealing with the analysis of the query taking up the longest total run time, the query ranked #1. The first row in the table above shows the count of how many times this query was executed. Now let's look at the values in the 95% column: this query takes up to 20ms 95% of the time, sends about 202 rows and 83.83k of data per execution, and examines about 246 rows per execution. Another important thing shown here is that every execution reads approximately 38.53 distinct InnoDB pages (meaning 616.48k of data). You can also see that this query does a full scan every time it runs. The "Databases" section of the output shows the names of the databases this query was executed against. Next, the "Query_time distribution" section shows how this query's execution time is distributed; you can see that the majority lies in the range >= 10ms and < 100ms. The "Tables" section lists the queries you can use to gather more data about the underlying tables involved and the query execution plan used by MySQL.
The end result might be that you end up limiting the number of results returned by the query, either with a LIMIT clause or by filtering on the option_name column, or you might even compress the values stored in the option_value column so that less data is read and sent.

Let's analyze another query, this time the query ranked #4 by pt-query-digest.

# Query 4: 0.12 QPS, 0.00x concurrency, ID 0xA766EE8F7AB39063 at byte 4001761
# This item is included in the report because it matches --limit.
# Scores: Apdex = 1.00 [1.0], V/M = 0.00
# Query_time sparkline: |  .^_   |
# Time range: 2011-12-28 18:42:47 to 19:02:57
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          0     149
# Exec time      3   281ms   534us    17ms     2ms     4ms     2ms     1ms
# Lock time      1    14ms    56us   179us    92us   159us    29us    80us
# Rows sent      8   9.01k       0     216   61.90  202.40   59.80   24.84
# Rows examine   8  45.05k       0   1.05k  309.59 1012.63  299.07  124.25
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read      8   9.01k       0     216   61.90  202.40   59.80   24.84
# Bytes sent     1 622.17k     694  13.27k   4.18k  11.91k   3.35k   2.06k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables    10     149       1       1       1       1       0       1
# Tmp disk tbl  30     149       1       1       1       1       0       1
# Tmp tbl size  16  12.00M       0 287.72k  82.47k 270.35k  79.86k  33.17k
# Query size     1  45.68k     286     345  313.91  329.68   13.06  313.99
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin   2     683       2       7    4.58    6.98    1.29    4.96
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Filesort     100% yes,   0% no
# Tmp table    100% yes,   0% no
# Tmp table on 100% yes,   0% no
# String:
# Databases    wp_blog_one (105/70%), wp_blog_tw... (34/22%)... 1 more
# Hosts
# InnoDB trxID 86B40F (1/0%), 86B429 (1/0%), 86B434 (1/0%)... 146 more
# Last errno   0
# Users        wp_blog_one (105/70%), wp_blog_two (34/22%)... 1 more
# Query_time distribution
#   1us
#  10us
# 100us  ###################
#   1ms  ################################################################
#  10ms  #
# 100ms
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_terms'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_terms`\G
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_term_taxonomy'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_term_taxonomy`\G
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_term_relationships'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_term_relationships`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT t.*, tt.*, tr.object_id FROM wp_terms AS t INNER JOIN wp_term_taxonomy AS tt ON tt.term_id = t.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category', 'post_tag', 'post_format') AND tr.object_id IN (733) ORDER BY t.name ASC\G

Let's again take a look at the 95% column in the above output. The query execution time is 4ms, and it sends about 202 rows, for which it has to examine about 1012 rows, per execution. What is interesting here is that this query needs to do a filesort 100% of the time and also needs to create an on-disk tmp table every time it is executed. Those are the two important things you would probably want to fix for this query. The tmp table size needed per query is 270.35k, which is not much considering that the tmp_table_size variable is set to 32M on the server, so the on-disk tables are probably being created because of BLOB/TEXT columns being accessed by the query. A quick fix here could be to select only the needed columns, excluding the BLOB/TEXT ones, instead of selecting every column from all the tables involved in the query (a sketch of this follows).
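
A hypothetical rewrite along those lines; the column list below is purely illustrative and is not taken from the original post, so it would need to match what the application actually uses:

SELECT t.term_id, t.name, t.slug, tt.term_taxonomy_id, tt.taxonomy, tr.object_id
FROM wp_terms AS t
INNER JOIN wp_term_taxonomy AS tt ON tt.term_id = t.term_id
INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id
WHERE tt.taxonomy IN ('category', 'post_tag', 'post_format')
  AND tr.object_id IN (733)
ORDER BY t.name ASC;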

Conclusion

The only conclusion I can draw is: get yourself Percona Server, turn on log_slow_verbosity, and start using pt-query-digest. Your job of identifying the queries producing the most load will be all the simpler for it.

Dec
27
2011
--

MySQL Training in Frankfurt, Germany

We will be holding our highly acclaimed MySQL workshops in Frankfurt, Germany the week of February 13th. Early registration is open; enroll today and secure your seat. Details and enrollment can be found here.

Dec
23
2011
--

Solving INFORMATION_SCHEMA slowness

Many of us find INFORMATION_SCHEMA painfully slow to work with when it comes to retrieving table metadata. Many people resort to file system tools instead to find, for example, how much space InnoDB tables are using, and similar things. Besides being slow, accessing INFORMATION_SCHEMA can often impact server performance dramatically. The majority of this slowness is not caused by opening and closing tables, which can be addressed with a decent table cache size and which is very fast for InnoDB anyway, but by the fact that MySQL, by default, refreshes InnoDB statistics each time a table is queried through INFORMATION_SCHEMA.

The solution is simple: just set innodb_stats_on_metadata=0, which will prevent the statistics update when you query INFORMATION_SCHEMA. Most likely you do not want it anyway. This will not make InnoDB operate without statistics at all, as InnoDB will still compute statistics for a table the first time it opens it.

Here are some numbers from my test box:

mysql> select count(*),sum(data_length) from information_schema.tables;
+----------+------------------+
| count(*) | sum(data_length) |
+----------+------------------+
|      130 |       2856365892 |
+----------+------------------+
1 row in set (1.08 sec)

mysql> set global innodb_stats_on_metadata=0;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(*),sum(data_length) from information_schema.tables;
+----------+------------------+
| count(*) | sum(data_length) |
+----------+------------------+
|      130 |       2856365892 |
+----------+------------------+
1 row in set (0.00 sec)

As you can see, the performance gains are huge.
Note that disabling stats-on-metadata will not make INFORMATION_SCHEMA stale when it comes to the important stuff: data_length, for example, will still be returned correctly by INFORMATION_SCHEMA as it changes.
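
To make the change survive a restart, you can also put it in the configuration file; a minimal sketch, assuming the usual [mysqld] group:

[mysqld]
innodb_stats_on_metadata = 0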

Dec
23
2011
--

Online MySQL Configuration Wizard from Percona

Merry Christmas! Just in time for the holidays, we have released a new tool to help you configure and manage your MySQL servers. Our online MySQL Configuration Wizard can help you generate a good basic configuration file for a server. This MySQL tuning wizard is our answer to the commonly asked question, “what is a good default configuration file for my server with 16 GB of RAM?”

We have a raft of new features planned for future releases, including advanced configuration options, supersafe settings to prevent bad things from happening, and much more. In the future we plan to add more online tools to help you be more productive.

Please give it a spin and let us know what you think. Credit for this tool must go to Miguel Trias, our talented lead developer. Thanks also to the many experts inside Percona who helped test it and made suggestions for improving the tool.

Dec
22
2011
--

Optimizing InnoDB for creating 30,000 tables (and nothing else)

Once upon a time, it would have been considered madness to even attempt to create 30,000 tables in InnoDB. That time is now a memory. We have customers with a lot more tables than a mere 30,000. There have historically been no tests for anything near this many tables in the MySQL test suite.

So, in fleshing out the test cases for this and innodb_dict_size_limit, I was left with the not-so-awesome task of making the test case run in remotely reasonable time. The test case itself is pretty simple: a loop in the not-at-all-exciting mysqltest language that creates 30,000 identical tables, inserts a row into each of them and then drops them.

Establishing the ground rules: I do not care about durability. This is a test case, not a production system holding important data, which means I can lie, cheat and steal to get performance.

The simplest way is to use libeatmydata. This is a small shared library designed to be LD_PRELOADed that disables just about every way an application can write data safely to disk. This is perfect for running a test suite for a database server as if your machine crashes halfway through a test run, you’ll just start the test run again – you are not dealing with any important data. Obviously, you should never, ever, ever use libeatmydata with anything you care about; it is called libeat-my-data for a reason.
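
For illustration, running the test suite under libeatmydata looks roughly like this; the library path and the test name are placeholders for whatever applies on your system:

LD_PRELOAD=/usr/lib/libeatmydata.so ./mysql-test-run.pl --big-test innodb_many_tables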

Without libeatmydata, and using the default options for the MySQL test suite, I noticed that I was only creating about 10-15 tables per second, which means it would take a very long time to create 30,000. After a bit of time, it sped up to about 20-25 per second. Of course, with the default timeout for a MySQL test (15 minutes), we quickly (well... in 15 minutes) hit that and the test is failed for taking too long.

With libeatmydata the test takes about 77 seconds – a huge improvement.

So how could I possibly get this test to run (even with the --big-test option to mysql-test-run) in reasonable time? Well, I can set some InnoDB options! I'm going to try the obvious first: innodb-flush-method, sync-frm and innodb-flush-log-at-trx-commit. There is an undocumented value for innodb-flush-method called "nosync" that is meant to not flush data to disk. Since you could hear how much syncing to disk was going on during my test run, not syncing to disk all the time should get closer to the libeatmydata performance. I also want to disable syncing of the FRM file to disk and set log flushing to happen as infrequently as possible. With these options (sketched below) I started to get somewhere between 25-90 CREATE TABLE per second. This gets the test execution time down to 12 minutes, so it just escapes the timeout.
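
Roughly, that combination in my.cnf form; this is a sketch for test environments only, and how you pass the options depends on whether you use a my.cnf for the test run or mysql-test-run's own switches:

[mysqld]
# undocumented value, never use it where you care about the data
innodb_flush_method = nosync
# do not fsync .frm files on CREATE TABLE
sync_frm = 0
# flush the log roughly once a second instead of on every commit
innodb_flush_log_at_trx_commit = 0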

I then added the options innodb-adaptive-checkpoint=0 and flush-neighbor-pages=0 in the hope of avoiding a bunch of background flushing (which calls fsync). It didn't help.

I noticed that there was an fsync() call when extending the data file, so I tried setting a higher innodb-autoextend-increment and a larger initial size. This also did not help.

So how fast is InnoDB under all of this? Am I hitting a fundamental limitation in InnoDB?

Well... I went and wrote a program using HailDB, which is InnoDB as a shared library that you can call through an easy-to-use C API.

Writing a simple test program that creates 30,000 tables in an InnoDB configuration similar to default MySQL is pretty easy (easier than writing it in the mysqltest language, that's for sure). After an "I'm glad this isn't an SSD" killer amount of fsync() activity, it took a total of 14.5 minutes. Not too bad. This is less than my initial test with MySQL, probably due to not writing and syncing FRM files. If I run the same program with libeatmydata, it only takes 15-20 seconds. Clearly it's syncing things to disk that takes all the time.

If we make the HailDB program set flush_method to nosync and flush_log_at_trx_commit=2, the execution time is only 1 minute. This is much closer to the libeatmydata time than MySQL ever got.

With HailDB you can do more than one data dictionary operation in a single transaction. So if, instead of setting flush_method and flush_log_at_trx_commit, I group the CREATE TABLE operations into transactions of 100 tables at a time, I get an execution time of 3 minutes. This is a big improvement over the original 14.5 minutes.

What are the practical applications of all this? Not many (unless you're writing complex test cases), but it is clear that loading large amounts of DDL could be a lot faster than it currently is (although loading the data into the tables is still going to take a while too).

Dec
22
2011
--

XtraBackup 1.6.4 respin

Very shortly after the XtraBackup 1.6.4 binaries hit the downloads site and repositories, a bug was found where the master information was not properly stored, breaking point-in-time recovery and the ability to start a slave from a backup.

This was caught and fixed really quickly, and we've re-spun the XtraBackup 1.6.4 binaries to include the fix. Since we include the BZR revision number in the versions, you will see that 1.6.4-314 replaces 1.6.4-313. The window for getting a faulty binary was very small, so not many people should be affected.

The sad-but-funny part of all this? We have very nice automated testing for exactly this problem in our new QA infrastructure, which goes live this week, shortly after the 1.6.4 release. In other words, if we had been a week quicker in deploying our new infrastructure, or had delayed the 1.6.4 release by a week, we would have caught this before we even handed the tree over to be packaged.

You can find out if you were one of the few unlucky people to get the old binaries in one of a number of ways:

  1. Your package is version 1.6.4-313 instead of 1.6.4-314
  2. The command below returns a matching line:
    $ grep 'innobackup hello' innobackupex
            if ($line =~ m/innobackup hello/) {
  3. A backup produced with the faulty version will have "xtrabackup ping 5" in the xtrabackup_binlog_info file

As I mentioned before, the time window for getting the faulty build was very small, and we already had in place all the steps needed to keep this from happening again. If you are using YUM or APT to manage packages, everything will automatically be taken care of for you, even if you did initially get the faulty build.

For reference, the bug report is here.

Dec
21
2011
--

Percona Testing: Innodb crash / recovery tests available

Not everyone may know this, but there are precious few innodb crash recovery tests available.

Some folks have noticed this and asked for something to be done about it, but unfortunately, no tests have been created for the main MySQL branch.

The MySQL at Facebook branch has a number of tests that are quite interesting.  They basically create a master-slave pair, subject the master to a transactional load, crash the master, restart it, then ensure the master and slave are in sync (once the test load is stopped).

The team at Percona has been known to tinker with Innodb now and again, and we were also very interested in having some tests to ensure things were working as expected. To that end, I have created the innodbCrash suite for kewpie.
These tests follow the Facebook team's lead for test load + validation, but we use the random query generator for generating our transactions. This is for two reasons:

  1. My attempts at copying the Facebook multi-threaded python load-generation code seemed very, very slow
  2. I just like the randgen (it is fun!)

The tests require a debug build as they make use of the DBUG points noted in the crash test bug:

  • DBUG_EXECUTE_IF("crash_commit_before", DBUG_SUICIDE(););
  • DBUG_EXECUTE_IF("crash_commit_after_prepare", DBUG_SUICIDE(););
  • DBUG_EXECUTE_IF("crash_commit_after_log", DBUG_SUICIDE(););
  • DBUG_EXECUTE_IF("crash_commit_before_unlog", DBUG_SUICIDE(););
  • DBUG_EXECUTE_IF("crash_commit_after", DBUG_SUICIDE(););
  • DBUG_EXECUTE_IF("half_binlogged_transaction", DBUG_SUICIDE(););
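
On a debug build you can also trigger one of these points by hand; a minimal sketch (the table name is just a placeholder):

SET SESSION debug = 'd,crash_commit_after_log';
-- the next committed transaction will hit DBUG_SUICIDE() at that point
INSERT INTO test.t1 VALUES (1);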

What happens is that the randgen grammars have a small chance of hitting a rule that issues a SET SESSION debug="d,crash_commit_*". The test will then continue to issue DML until the server hits that debug crash point. We can confirm that the randgen actually did encounter a server crash by examining the contents of kewpie/workdir/bot0/log/randgen.out:

$ tail workdir/bot0/log/randgen.out
# 2011-12-21T14:25:02 bindir is mysql-5.5/sql
# 2011-12-21T14:25:02 core is
"kewpie/randgen/" is not a core dump: Is a directory
No stack.
kewpie/randgen/backtrace.gdb:20: Error in sourced command file:
No registers.
# 2011-12-21T14:25:03
"kewpie/randgen/" is not a core dump: Is a directory
# 2011-12-21T14:25:03
# 2011-12-21T14:25:03 Test completed with failure status STATUS_SERVER_CRASHED (101)

 

I also took pains to ensure that the randgen workload was having an effect on the test tables between crash/recovery/validate cycles. To examine this for yourself during a run, you can try the new option --test-debug.

I created this option because many tests have points where we'd like to see what is going on when things fail (or when we are changing things and want to make sure they still work), but that output would be annoying for general test runs. For example, we can view the comparisons of master-slave checksums after the master has crashed and restarted:

$ ./kewpie.py --suite=innodbCrash --basedir=mysql-5.5  --test-debug crashCommitAfter_test

20111221-142500 INFO MASTER_PORT: 9307
20111221-142500 INFO SOCKET_FILE: kewpie/workdir/bot0/var_s0/s0.sock
20111221-142500 INFO VARDIR: kewpie/workdir/bot0/var_s0
20111221-142500 INFO STATUS: 1
20111221-142500 TEST_DEBUG Crashes remaining: 1
20111221-142505 TEST_DEBUG A: master_checksum= (('test.A', 1651057258L),) | slave_checksum= (('test.A', 1651057258L),)
20111221-142505 TEST_DEBUG ################################################################################
20111221-142505 TEST_DEBUG AA: master_checksum= (('test.AA', 1110194204L),) | slave_checksum= (('test.AA', 1110194204L),)
20111221-142505 TEST_DEBUG ################################################################################
20111221-142505 TEST_DEBUG B: master_checksum= (('test.B', 0L),) | slave_checksum= (('test.B', 0L),)
20111221-142505 TEST_DEBUG ################################################################################
20111221-142505 TEST_DEBUG BB: master_checksum= (('test.BB', 100984011L),) | slave_checksum= (('test.BB', 100984011L),)
20111221-142505 TEST_DEBUG ################################################################################
20111221-142505 TEST_DEBUG C: master_checksum= (('test.C', 2978384584L),) | slave_checksum= (('test.C', 2978384584L),)
20111221-142505 TEST_DEBUG ################################################################################
20111221-142505 TEST_DEBUG CC: master_checksum= (('test.CC', 521860080L),) | slave_checksum= (('test.CC', 521860080L),)
20111221-142505 TEST_DEBUG ################################################################################
20111221-142505 TEST_DEBUG D: master_checksum= (('test.D', 1148937120L),) | slave_checksum= (('test.D', 1148937120L),)
20111221-142505 TEST_DEBUG ################################################################################
20111221-142505 TEST_DEBUG DD: master_checksum= (('test.DD', 2110127969L),) | slave_checksum= (('test.DD', 2110127969L),)
20111221-142505 TEST_DEBUG ################################################################################
20111221-142507  ===============================================================
20111221-142507  TEST NAME                                  [ RESULT ] TIME (ms)
20111221-142507  ===============================================================
20111221-142507  innodbCrash.crashCommitAfter_test          [ pass ]     6059
20111221-142507  ===============================================================
20111221-142507 INFO Test execution complete in 16 seconds

Finally, one of the most useful features of the randgen is its ability to use --seed values to change the data and/or queries for a test run. Seed values are deterministic: each randgen run with a given seed should produce the same queries and data (there are unit tests to monitor that this doesn't break, IIRC). In the past, I have created separate variants of tests that take a --seed=time argument to give wider coverage. However, this approach gets tiresome when one has many test cases (more to maintain!).

My solution to this problem was the addition of another option, --randgen-seed. By default this is 1, but one can also say --randgen-seed=time. For those test cases that are written to take advantage of this (like our shiny new Innodb crash tests!), it is very easy to shuffle data and queries for all tests:

./kewpie.py --suite=innodbCrash --basedir=/path/to/basedir --force --randgen-seed=time --repeat=3 gets us:

20111221-142820 TEST_DEBUG Crashes remaining: 1
20111221-142825 TEST_DEBUG A: master_checksum= (('test.A', 2990808711L),) | slave_checksum= (('test.A', 2990808711L),)
20111221-142825 TEST_DEBUG ################################################################################
20111221-142825 TEST_DEBUG AA: master_checksum= (('test.AA', 2850101684L),) | slave_checksum= (('test.AA', 2850101684L),)
20111221-142825 TEST_DEBUG ################################################################################
20111221-142825 TEST_DEBUG B: master_checksum= (('test.B', 1893458598L),) | slave_checksum= (('test.B', 1893458598L),)
20111221-142825 TEST_DEBUG ################################################################################
20111221-142825 TEST_DEBUG BB: master_checksum= (('test.BB', 686510945L),) | slave_checksum= (('test.BB', 686510945L),)
20111221-142825 TEST_DEBUG ################################################################################
20111221-142825 TEST_DEBUG C: master_checksum= (('test.C', 1036631423L),) | slave_checksum= (('test.C', 1036631423L),)
20111221-142825 TEST_DEBUG ################################################################################
20111221-142825 TEST_DEBUG CC: master_checksum= (('test.CC', 3171836798L),) | slave_checksum= (('test.CC', 3171836798L),)
20111221-142825 TEST_DEBUG ################################################################################
20111221-142825 TEST_DEBUG D: master_checksum= (('test.D', 4208439607L),) | slave_checksum= (('test.D', 4208439607L),)
20111221-142825 TEST_DEBUG ################################################################################
20111221-142825 TEST_DEBUG DD: master_checksum= (('test.DD', 1325583310L),) | slave_checksum= (('test.DD', 1325583310L),)
20111221-142825 TEST_DEBUG ################################################################################
20111221-142827  ===============================================================
20111221-142827  TEST NAME                                  [ RESULT ] TIME (ms)
20111221-142827  ===============================================================
20111221-142827  innodbCrash.crashCommitAfter_test          [ pass ]     5743
20111221-142829 TEST_DEBUG Crashes remaining: 1
20111221-142834 TEST_DEBUG A: master_checksum= (('test.A', 0L),) | slave_checksum= (('test.A', 0L),)
20111221-142834 TEST_DEBUG ################################################################################
20111221-142834 TEST_DEBUG AA: master_checksum= (('test.AA', 4174656109L),) | slave_checksum= (('test.AA', 4174656109L),)
20111221-142834 TEST_DEBUG ################################################################################
20111221-142834 TEST_DEBUG B: master_checksum= (('test.B', 0L),) | slave_checksum= (('test.B', 0L),)
20111221-142834 TEST_DEBUG ################################################################################
20111221-142834 TEST_DEBUG BB: master_checksum= (('test.BB', 502335766L),) | slave_checksum= (('test.BB', 502335766L),)
20111221-142834 TEST_DEBUG ################################################################################
20111221-142834 TEST_DEBUG C: master_checksum= (('test.C', 2318403278L),) | slave_checksum= (('test.C', 2318403278L),)
20111221-142834 TEST_DEBUG ################################################################################
20111221-142834 TEST_DEBUG CC: master_checksum= (('test.CC', 2060894679L),) | slave_checksum= (('test.CC', 2060894679L),)
20111221-142834 TEST_DEBUG ################################################################################
20111221-142834 TEST_DEBUG D: master_checksum= (('test.D', 3959154326L),) | slave_checksum= (('test.D', 3959154326L),)
20111221-142834 TEST_DEBUG ################################################################################
20111221-142834 TEST_DEBUG DD: master_checksum= (('test.DD', 2507050488L),) | slave_checksum= (('test.DD', 2507050488L),)
20111221-142834 TEST_DEBUG ################################################################################
20111221-142840  innodbCrash.crashCommitAfter_test          [ pass ]     5603
20111221-142844 TEST_DEBUG Crashes remaining: 1
20111221-142849 TEST_DEBUG A: master_checksum= (('test.A', 488788349L),) | slave_checksum= (('test.A', 488788349L),)
20111221-142849 TEST_DEBUG ################################################################################
20111221-142849 TEST_DEBUG AA: master_checksum= (('test.AA', 880803763L),) | slave_checksum= (('test.AA', 880803763L),)
20111221-142849 TEST_DEBUG ################################################################################
20111221-142849 TEST_DEBUG B: master_checksum= (('test.B', 0L),) | slave_checksum= (('test.B', 0L),)
20111221-142849 TEST_DEBUG ################################################################################
20111221-142849 TEST_DEBUG BB: master_checksum= (('test.BB', 3997853573L),) | slave_checksum= (('test.BB', 3997853573L),)
20111221-142849 TEST_DEBUG ################################################################################
20111221-142849 TEST_DEBUG C: master_checksum= (('test.C', 2684719484L),) | slave_checksum= (('test.C', 2684719484L),)
20111221-142849 TEST_DEBUG ################################################################################
20111221-142849 TEST_DEBUG CC: master_checksum= (('test.CC', 1041735070L),) | slave_checksum= (('test.CC', 1041735070L),)
20111221-142849 TEST_DEBUG ################################################################################
20111221-142849 TEST_DEBUG D: master_checksum= (('test.D', 3171892155L),) | slave_checksum= (('test.D', 3171892155L),)
20111221-142849 TEST_DEBUG ################################################################################
20111221-142849 TEST_DEBUG DD: master_checksum= (('test.DD', 1865390033L),) | slave_checksum= (('test.DD', 1865390033L),)
20111221-142849 TEST_DEBUG ################################################################################
20111221-142851  innodbCrash.crashCommitAfter_test          [ pass ]     5675
20111221-142851  ===============================================================
20111221-142851 INFO Test execution complete in 41 seconds

So far, my tests have been passing without incident, but I also haven’t dug into partitions or blobs >: )

If you'd like to try these out yourself, they are in lp:kewpie (I changed the name from dbqp). You'll need the DBD::mysql Perl module for the randgen and MySQLdb for kewpie, as well as a debug build of the server, but other than that, they are ready to run out of the box. The Percona development team will be adding Jenkins runs for these tests soon.

 

Dec
20
2011
--

Percona Toolkit PDF manual now available

Not to be outdone by Vadim's announcement, I also have one: Percona Toolkit's manual is now available as a PDF (requires registration). You can get it here: http://www.percona.com/software/percona-toolkit/

Thanks to Mauricio Stekl, who did all the hard work to create the PDF of the manual — and did it fast, too.
