Dec
29
2011
--

How To Test Your Upgrades – pt-upgrade

Upgrades are usually one of the biggest part of any database infrastructure maintenance. Even with enough planning something else can go bad after sending your production application to the version you’ve upgraded to. Let’s look at how one Percona Toolkit tool, pt-upgrade can help you identify what to expect and test your upgrades better which is one important step when upgrading your servers.

pt-upgrade

This tool lets you test your SELECT queries against multiple MySQL servers and reports on how each type of query performs from every server you let it test on. It can report differences with results like row counts, checksums, column count, query execution time including execution errors and warnings.

So how do you exactly test your queries on servers of multiple versions.

  1. First, ideally, each server must have identical set of data. One approach and most recommended is by having a slave upgraded to your target version and let it replicate until such time you are ready to test.  Another is populating spare servers from backup and testing with them. With the former, you can easily verify that data is consistent across the master and the slave with pt-table-checksum.
  2. Capture your production queries, we recommend either slow query log (use long_query_time = 0) or tcpdump (other types supported by pt-query-digest’s –type option is also possible, see below).
  3. Test away! Some example scenarios below.

Using slow query log:

# You can also run the slow log through pt-query-digest like the commands below
#    to limit the number of queries per fingerprint to test with.
pt-upgrade --fingerprints --run-time=1h mysqld-slow.log h=127.0.0.1,P=5091 h=127.0.0.1,P=5517

Using sources supported by pt-query-digest –type:

# tcpdump - You can replace --type tcpdump with the specific type and
#    path to the appropriate type of file you wish to input as source of queries
# --print simply returns the queries in slow log format and --sample
#    limits this to 100 queries only
# --no-report simply skips the overhead pt-query-digest had to compute
#    values for reports
pt-query-digest --print --no-report --sample 100 --type tcpdump /path/to/tcpdump.out \
   | pt-upgrade --fingerprints --run-time=1h h=mysql50 h=mysql51 h=mysql55

Few essential notes when using pt-upgrade:

  1. Make sure to cover all your query types from your log (slow log or tcpdump). If you have 5GB of log collected within 2hrs, it does not make sense to test all 5GB if your runtime will not cover all fingerprints. Passing the queries to pt-query-digest and using a sample of 100 like above should be enough sample per fingerprint and reduces the runtime (–runt-time) of the tests.
  2. Using a separate database and table with –temp-database and –temp-table is recommended, this ensures separation of the test table from your production databases and tables.
  3. Use a separate account for testing – one that has read (i.e. SELECT) to all objects within the database server and has all privileges to the specified –temp-database from #1. pt-upgrade is a read-only tool, however I can only emphasize you should try to use an account with only the necessary privileges.
  4. It is also important to note to run pt-upgrade itself on a 3rd server if the servers you are testing against are on two different machines. This helps account for the network latency and provide more accurate results.
  5. Do not test against production servers, 1) because it can incur extra load and 2) the existing production load can affect your results. Testing on EC2 (+EBS) is no exception, although you can easily spin up instances for testing, the unpredictable nature of the cloud can also affect your results.
  6. Lastly, although if you are upgrading your hardware alongside your MySQL version, you should also consider running these tests on similar hardware and OS configuration. Sometimes, a slight difference in these factors can skew your results greatly.

Here is a simple report output from pt-upgrade. As you can see, there are no errors or warnings and only differences with query times.

# Query 7: ID 0x76323E2525BA457C at byte 0 _______________________________
# Found 441 differences in 510 samples:
#   checksums       0
#   column counts   0
#   column types    0
#   query times     441
#   row counts      0
#   warning counts  0
#   warning levels  0
#   warnings        0
#            127.0.0.1:50910 127.0.0.1:55170
# Errors                   0               0
# Warnings                 0               0
# Query_time
#   sum                442ms           506ms
#   min                391us           480us
#   max                  4ms             2ms
#   avg                867us           992us
#   pct_95               1ms             1ms
#   stddev             234us           210us
#   median             881us             1ms
# row_count
#   sum                  510             510
#   min                    1               1
#   max                    1               1
#   avg                    1               1
#   pct_95                 1               1
#   stddev                 0               0
#   median                 1               1
# Fingerprint
#   select * from categories where parent = ? and site = ?

Looking at the sample above, there are microsecond differences between the query times. The most important is the average (avg) which tells in general how each fingerprint performs on the test servers. About 10% variation is tolerable and can be accounted for other MySQL overhead like network latency.

pt-upgrade provides valuable insight on how your production queries will behave on your target version, but it is only a part of the process. There is still no alternative to doing your research for incompatibilities, changes and bug fixes or new bugs introduced that can potentially break your application.

Dec
29
2011
--

Actively monitoring replication connectivity with MySQL’s heartbeat

Until MySQL 5.5 the only variable used to identify a network connectivity problem between Master and Slave was slave-net-timeout. This variable specifies the number of seconds to wait for more Binary Logs events from the master before abort the connection and establish it again. With a default value of 3600 this has been a historically bad configured variable and stalled connections or high latency peaks were not detected for a long period of time or not detected at all. We needed an active master/slave connection check. And here is where replication’s heartbeat can help us.

This feature was introduce in 5.5 as another parameter to the CHANGE MASTER TO command. After you enable it, the MASTER starts to send “beat” packages (of 106 bytes) to the SLAVE every X seconds where X is a value you can define. If the network link goes down or the latency goes up for more than the time threshold, then the SLAVE IO thread will disconnect and try to connect again. This means we now measure the connection time or latency, not the time without binary log events. We’re actively checking the communication.

How can I configure replication’s heartbeat?

Is very easy to setup with negligible overhead:

mysql_slave > STOP SLAVE;
mysql_slave > CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD=1;
mysql_slave > START SLAVE;

MASTER_HEATBEAT_PERIOD is a value in seconds in the range between 0 to 4294967 with resolution in milliseconds. After the loss of a beat the SLAVE IO Thread will disconnect and try to connect again. Here is the SHOW SLAVE STATUS output after an error:

mysql_slave > show slave status\G
[...]
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes
[...]
Last_IO_Errno: 2003
Last_IO_Error: error reconnecting to master 'rsandbox@127.0.0.1:19972' - retry-time: 60 retries: 86400
[...]

Is interesting to note that having a 5.5 slave with replication’s heartbeat enabled and connected to a 5.1 master doesn’t break the replication. Of course, the heartbeat will not work in this case because the master doesn’t know what is a beat or how to send it :)

What status variables do I have?

The heartbeat check period time and the number of beats received.

mysql_slave > SHOW STATUS LIKE '%heartbeat%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| Slave_heartbeat_period | 1.000 |
| Slave_received_heartbeats | 1476 |
+---------------------------+-------+

Conclusion

If you need to know when exactly the connection between your Master/Slaves breaks then replication’s heartbeat is the easiest and fastest solution to implement.

Dec
29
2011
--

Identifying the load with the help of pt-query-digest and Percona Server

Overview

Profiling, analyzing and then fixing queries is likely the most oft-repeated part of a job of a DBA and one that keeps evolving, as new features are added to the application new queries pop up that need to be analyzed and fixed. And there are not too many tools out there that can make your life easy. However, there is one such tool, pt-query-digest (from Percona Toolkit) which provides you with all the data points you need to attack the right query in the right way. But vanilla MySQL does have its limitations, it reports only a subset of stats, however if you compare that to Percona server, it reports extra stats such as information about the queries’ execution plan (which includes things like whether Query cache was used or not, if Filesort was used, whether tmp table was created in memory or on disk, if full scan was done, etc) as well as InnoDB statistics (such as IO read operations, the number of unique pages the query accessed, the length of time query waited for row locks, etc). So you can see there is a plethora of useful information reported by Percona Server. Another great thing about Percona Server is the ability to enable logging atomically, not just for new connections as in MySQL. This is very helpful for measurement as otherwise we might not catch some long running connections.

Now let’s get started.

Before We Start!

But before we start, make sure you have enabled slow query logging and set a low enough value for long_query_time. We normally use a value of long_query_time=0, because if you set it to some other value say 0.1 seconds, it will miss all queries shorter than that which well may be majority of your workload. So if you want to analyze what causes load on your server you better use a value of 0. But remember that you would only want to set it to 0, for a period of time that allows you to gather enough statistics about the queries, after that time period remember to set it back to a value greater than 0, because otherwise you can have a really large log file generated. Another thing that we normally do is set the variable log_slow_verbosity to ‘full’, this variable is available in Percona Server and allows us to log all the extra stats I mentioned above in the overview section.

So say if you were using the vanilla MySQL server, you would see an entry like this in the slow query log:

# Time: 111229  3:02:26
# User@Host: msandbox[msandbox] @ localhost []
# Query_time: 2.365434  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 655360
use test;
SET timestamp=1325145746;
select count(*) from auto_inc;

Compare that to Percona Server with log_slow_verbosity=full:

# Time: 111229  3:11:26
# User@Host: msandbox[msandbox] @ localhost []
# Thread_id: 1  Schema: test  Last_errno: 0  Killed: 0
# Query_time: 0.117904  Lock_time: 0.002886  Rows_sent: 1  Rows_examined: 655360  Rows_affected: 0  Rows_read: 655361
# Bytes_sent: 68  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: F00
# QC_Hit: No  Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
#   InnoDB_IO_r_ops: 984  InnoDB_IO_r_bytes: 16121856  InnoDB_IO_r_wait: 0.001414
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 973
SET timestamp=1325146286;
select count(*) from auto_inc;

Note that logging all queries in this fashion as opposed to the general query log, enables us to have the statistics available after the query is actually executed, while no such statistics are available for queries that are logged using the general query log.

Installing pt-query-digest tool (as well as other tools from Percona Toolkit) is very easy, and is explained here at this link.

Now before we move forward, I would like to point out that the results shown in this blog post are from the queries that I have gathered from one of the Percona Server instance running on my personal Amazon micro instance.

Using pt-query-digest

Using pt-query-digest is pretty straight forward:

pt-query-digest /path/to/slow-query.log

Note that executing pt-query-digest can be pretty CPU and memory consuming, so ideally you would want to download the "slow query log" to another machine and run it there.

Analyzing pt-query-digest Output

Now let's see what output it returns. The first part of the output is an overall summary:

# 8.1s user time, 60ms system time, 26.23M rss, 62.49M vsz
# Current date: Thu Dec 29 07:09:32 2011
# Hostname: somehost.net
# Files: slow-query.log.1
# Overall: 20.08k total, 167 unique, 16.04 QPS, 0.01x concurrency ________
# Time range: 2011-12-28 18:42:47 to 19:03:39
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time             8s     1us    44ms   403us   541us     2ms    98us
# Lock time          968ms       0    11ms    48us   119us   134us    36us
# Rows sent        105.76k       0    1000    5.39    9.83   32.69       0
# Rows examine     539.46k       0  15.65k   27.52   34.95  312.56       0
# Rows affecte       1.34k       0      65    0.07       0    1.35       0
# Rows read        105.76k       0    1000    5.39    9.83   32.69       0
# Bytes sent        46.63M      11 191.38k   2.38k   6.63k  11.24k  202.40
# Merge passes           0       0       0       0       0       0       0
# Tmp tables         1.37k       0      61    0.07       0    0.91       0
# Tmp disk tbl         490       0      10    0.02       0    0.20       0
# Tmp tbl size      72.52M       0 496.09k   3.70k       0  34.01k       0
# Query size         3.50M      13   2.00k  182.86  346.17  154.34   84.10
# InnoDB:
# IO r bytes        96.00k       0  32.00k   20.86       0  816.04       0
# IO r ops               6       0       2    0.00       0    0.05       0
# IO r wait           64ms       0    26ms    13us       0   530us       0
# pages distin      28.96k       0      48    6.29   38.53   10.74    1.96
# queue wait             0       0       0       0       0       0       0
# rec lock wai           0       0       0       0       0       0       0
# Boolean:
# Filesort       4% yes,  95% no
# Filesort on    0% yes,  99% no
# Full scan      4% yes,  95% no
# QC Hit         0% yes,  99% no
# Tmp table      4% yes,  95% no
# Tmp table on   2% yes,  97% no

It tells you that a total of 20.08k queries were captured which are actually invocations of 167 different queries. Following that there are summaries of various data points such as the total query execution time and the average query execution time, the number of tmp tables created in memory vs on-disk, percentage of queries that needed full scan, InnoDB IO stats, etc. One thing I suggest here is that, you should probably give more importance to the times/values reported in the 95% (95th percentile) column as that gives us more accurate understanding. Now, for example it is shown here that every query is reading approximately 38.53 distinct InnoDB pages (meaning 616.48K of data), however, 95% of the times InnoDB r ops is 0, which means it accesses these pages in memory. What it says though is that, if this query would run on a cold MySQL instance, then it would require reading nearly 40 pages from disk.

Let's analyze next part of the output produced by pt-query-digest.

# Profile
# Rank Query ID           Response time Calls R/Call Apdx V/M   Item
# ==== ================== ============= ===== ====== ==== ===== ==========
#    1 0x92F3B1B361FB0E5B  4.0522 50.0%   312 0.0130 1.00  0.00 SELECT wp_options
#    2 0xE71D28F50D128F0F  0.8312 10.3%  6412 0.0001 1.00  0.00 SELECT poller_output poller_item
#    3 0x211901BF2E1C351E  0.6811  8.4%  6416 0.0001 1.00  0.00 SELECT poller_time
#    4 0xA766EE8F7AB39063  0.2805  3.5%   149 0.0019 1.00  0.00 SELECT wp_terms wp_term_taxonomy wp_term_relationships
#    5 0xA3EEB63EFBA42E9B  0.1999  2.5%    51 0.0039 1.00  0.00 SELECT UNION wp_pp_daily_summary wp_pp_hourly_summary wp_pp_hits wp_posts
#    6 0x94350EA2AB8AAC34  0.1956  2.4%    89 0.0022 1.00  0.01 UPDATE wp_options
#    7 0x7AEDF19FDD3A33F1  0.1381  1.7%   909 0.0002 1.00  0.00 SELECT wp_options
#    8 0x4C16888631FD8EDB  0.1160  1.4%     5 0.0232 1.00  0.00 SELECT film
#    9 0xCFC0642B5BBD9AC7  0.0987  1.2%    50 0.0020 1.00  0.01 SELECT UNION wp_pp_daily_summary wp_pp_hourly_summary wp_pp_hits
#   10 0x88BA308B9C0EB583  0.0905  1.1%     4 0.0226 1.00  0.01 SELECT poller_item
#   11 0xD0A520C9DB2D6AC7  0.0850  1.0%   125 0.0007 1.00  0.00 SELECT wp_links wp_term_relationships wp_term_taxonomy
#   12 0x30DA85C940E0D491  0.0835  1.0%   542 0.0002 1.00  0.00 SELECT wp_posts
#   13 0x8A52FE35D340A347  0.0767  0.9%     4 0.0192 1.00  0.00 TRUNCATE TABLE poller_time
#   14 0x3E84BF7C0C2A3005  0.0624  0.8%   272 0.0002 1.00  0.00 SELECT wp_postmeta
#   15 0xA01053DA94ED829E  0.0567  0.7%   213 0.0003 1.00  0.00 SELECT data_template_rrd data_input_fields
#   16 0xBE797E1DD5E4222F  0.0524  0.6%    79 0.0007 1.00  0.00 SELECT wp_posts
#   17 0xF8EC4434E0061E89  0.0475  0.6%    62 0.0008 1.00  0.00 SELECT wp_terms wp_term_taxonomy
#   18 0xCDFFAD848B0C1D52  0.0465  0.6%     9 0.0052 1.00  0.01 SELECT wp_posts wp_term_relationships
#   19 0x5DE709416871BF99  0.0454  0.6%   260 0.0002 1.00  0.00 DELETE poller_output
#   20 0x428A588445FE580B  0.0449  0.6%   260 0.0002 1.00  0.00 INSERT poller_output
# MISC 0xMISC              0.8137 10.0%  3853 0.0002   NS   0.0 <147 ITEMS>

The above part of the output ranks the queries and shows the top queries with largest impact - longest sum of run time which typically (not always) shows queries causing highest load on the server. As we can see here the one causing the highest load is the SELECT wp_options query, this is basically a unique way of identifying the query and simply implies that this is a SELECT query executed against the wp_options table. Another thing to note is the last line in the output the # MISC part, it tells you how much of "load" is not covered by top queries, we have 10% in MISC which means that by reviewing these top 20 queries we essentially reviewed most of the load.

Now let's take a look at the most important part of the output:

# Query 1: 0.26 QPS, 0.00x concurrency, ID 0x92F3B1B361FB0E5B at byte 14081299
# This item is included in the report because it matches --limit.
# Scores: Apdex = 1.00 [1.0], V/M = 0.00
# Query_time sparkline: |   _^   |
# Time range: 2011-12-28 18:42:47 to 19:03:10
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          1     312
# Exec time     50      4s     5ms    25ms    13ms    20ms     4ms    12ms
# Lock time      3    32ms    43us   163us   103us   131us    19us    98us
# Rows sent     59  62.41k     203     231  204.82  202.40    3.99  202.40
# Rows examine  13  73.63k     238     296  241.67  246.02   10.15  234.30
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read     59  62.41k     203     231  204.82  202.40    3.99  202.40
# Bytes sent    53  24.85M  46.52k  84.36k  81.56k  83.83k   7.31k  79.83k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables     0       0       0       0       0       0       0       0
# Tmp disk tbl   0       0       0       0       0       0       0       0
# Tmp tbl size   0       0       0       0       0       0       0       0
# Query size     0  21.63k      71      71      71      71       0      71
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin  40  11.77k      34      44   38.62   38.53    1.87   38.53
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Full scan    100% yes,   0% no
# String:
# Databases    wp_blog_one (264/84%), wp_blog_tw… (36/11%)... 1 more
# Hosts
# InnoDB trxID 86B40B (1/0%), 86B430 (1/0%), 86B44A (1/0%)... 309 more
# Last errno   0
# Users        wp_blog_one (264/84%), wp_blog_two (36/11%)... 1 more
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  #################
#  10ms  ################################################################
# 100ms
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_options'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_options`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'\G

This is the actual part of the output dealing with analysis of the query that is taking up the longest sum of run time, query ranked #1. The first row in the table above shows the Count of number of times this query was executed. Now let's take a look at the values in the 95% column, we can see that this query is taking up 20ms 95% of the times and is sending 202 rows and 83.83k of data per query while its also examining 246 rows for every query. Another important thing that is shown here is that every query is reading approximately 38.53 distinct InnoDB pages (meaning 616.48k of data). While you can also see that this query is doing a full scan every time its run. The "Databases" section of the output also shows the name of the databases where this query was executed. Next the "Query_time distribution" section shows how this query times mostly, which you can see majority of the time lies in the range >= 10ms and < 100ms. The "Tables" section lists the queries that you can use to gather more data about the underlying tables involved and the query execution plan used by MySQL.
The end result might be that you end up limiting the number of results returned by the query, by using a LIMIT clause or by filtering based on the option_name column, or you might even compress the values stored in the option_value column so that less data is read and sent.

Let’s analyze another query, this time query ranked #4 by pt-query-digest.

# Query 4: 0.12 QPS, 0.00x concurrency, ID 0xA766EE8F7AB39063 at byte 4001761
# This item is included in the report because it matches --limit.
# Scores: Apdex = 1.00 [1.0], V/M = 0.00
# Query_time sparkline: |  .^_   |
# Time range: 2011-12-28 18:42:47 to 19:02:57
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          0     149
# Exec time      3   281ms   534us    17ms     2ms     4ms     2ms     1ms
# Lock time      1    14ms    56us   179us    92us   159us    29us    80us
# Rows sent      8   9.01k       0     216   61.90  202.40   59.80   24.84
# Rows examine   8  45.05k       0   1.05k  309.59 1012.63  299.07  124.25
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read      8   9.01k       0     216   61.90  202.40   59.80   24.84
# Bytes sent     1 622.17k     694  13.27k   4.18k  11.91k   3.35k   2.06k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables    10     149       1       1       1       1       0       1
# Tmp disk tbl  30     149       1       1       1       1       0       1
# Tmp tbl size  16  12.00M       0 287.72k  82.47k 270.35k  79.86k  33.17k
# Query size     1  45.68k     286     345  313.91  329.68   13.06  313.99
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin   2     683       2       7    4.58    6.98    1.29    4.96
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Filesort     100% yes,   0% no
# Tmp table    100% yes,   0% no
# Tmp table on 100% yes,   0% no
# String:
# Databases    wp_blog_one (105/70%), wp_blog_tw... (34/22%)... 1 more
# Hosts
# InnoDB trxID 86B40F (1/0%), 86B429 (1/0%), 86B434 (1/0%)... 146 more
# Last errno   0
# Users        wp_blog_one (105/70%), wp_blog_two (34/22%)... 1 more
# Query_time distribution
#   1us
#  10us
# 100us  ###################
#   1ms  ################################################################
#  10ms  #
# 100ms
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_terms'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_terms`\G
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_term_taxonomy'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_term_taxonomy`\G
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_term_relationships'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_term_relationships`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT t.*, tt.*, tr.object_id FROM wp_terms AS t INNER JOIN wp_term_taxonomy AS tt ON tt.term_id = t.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category', 'post_tag', 'post_format') AND tr.object_id IN (733) ORDER BY t.name ASC\G

Let’s again take a look at the 95% column in the above output. The query execution time is 4ms, it sends 202 rows for which it has to examine 1012 rows (per every query), interesting here is that this query needs to do Filesort 100% of the times and also needs to create on-disk tmp tables every time its executed. Those are the two important things that you would probably like to fix with respect to this query. The tmp table size needed per query is 270.35k, which is not much considering the fact that tmp_tbl_size variable is set to 32M on the server, so on-disk tables are probably being created because of blob columns being accessed by the query. So a quick fix here could be to instead of selecting every column from all the tables involved in the query, probably selecting only the needed columns which could exclude the blob ones.

Conclusion

The only conclusion, I can make out is “Get yourself Percona Server, turn on log_slow_verbosity and start using pt-query-digest”, your job of identifying queries producing most load will be all the more simpler then.

Dec
27
2011
--

MySQL Training in Frankfurt, Germany

We will be holding our highly acclaimed MySQL workshops in Frankfurt, Germany the week of February 13th. Early registration is open; enroll today and secure your seat. Detail and enrollment can be found here.

Dec
26
2011
--

Symfony 2.0.8 released

Symfony 2.0.8 has just been released.

The CHANGELOG
has all the details about the changes done in this release and you can also
have a look at the full
diff.

If you are starting a new project, you can get the Symfony Standard Edition
distribution on the download page.

If you already have a project based on the Symfony Standard Edition 2.0.x, you
can easily upgrade to 2.0.8 by getting the new
deps and
deps.lock
files.

Then, run the vendors script:

$ ./bin/vendors install

And don’t forget to clear your cache:

$ php ./app/console cache:clear

Remember that the Symfony2 Components are also available as standalone
libraries. You can get them via their dedicated read-only repositories on
Github (https://github.com/symfony/Finder for instance), install them via
PEAR (pear install symfony2/Finder), or even
install them via Composer.


Be trained by Symfony experts
2012-02-06 Paris
2012-02-13 Paris
2012-02-13 Paris

Written by in: Zend Developer |
Dec
26
2011
--

A week of symfony #260 (19->25 December 2011)

This week, the Symfony2 master branch focused on a big DoctrineBridge refactorization. In addition, the Filesystem class was moved to its own component. Lastly, the 2.0 branch was made compatible with the upcoming Twig 1.5 version.

Development mailing list

symfony 1 development highlights

Changelog:

  • r33292:
    [1.4]
    fixed test for PHP 5.3

Symfony2 development highlights

Master branch:

  • 517eebc:
    [DoctrineBridge] refactored entity choice list to be ORM independant using an EntityLoader interface
  • fd174a2:
    [Tests] use proper assertion for counting
  • b919d92:
    [DoctrineBridge] optimized fetching of entities to use WHERE IN and fix other inefficencies
  • 3b5c617:
    [DoctrineBridge] removed large parts of the EntityChoiceList code that were completly unnecessary
  • 3c81b62:
    [DoctrineBridge] cleanup and move loader into its own method
  • 200ed54:
    [DoctrineBridge] extracted the common type and made the choice list generic
  • f1199c0:
    [DoctrineBridge] decoupled the EntityUserProvider from the ORM (it can now be used by any Doctrine project implementing the interfaces from Doctrine Common 2.2)
  • 84ad40d:
    added cache clear hook
  • 24319bb:
    [DoctrineBridge] made it possible to change the manager used by the provider
  • 9653be6:
    moved the EntityFactory to the bridge
  • 818a332:
    [Component] moved Filesystem class to its own component
  • 74cfd04:
    [Security] made the logout path check configurable
  • 9daa2a6:
    [Profiler] added function to get parent token directly

2.0.x branch:

  • 29f4111:
    [DoctrineBridge] added a failing test showing the issue for proxy users
  • 649fa52:
    [DoctrineBridge] fixed the entity provider to support proxies
  • ebc9979:
    [Process] fixed unit tests on PHP 5.4
  • 3dd1072:
    [HttpKernel] fixed unit tests that can fail randomly
  • 5a6c989:
    [FrameworkBundle] added test-attribute in xsd-schema to write functional-tests if using xml-configurations
  • 8235848:
    [HttpFoundation] added flv file default extension
  • 1b4aaa2:
    [HttpFoundation] fixed ApacheRequest (pathinfo was incorrect when using mod_rewrite; added better test coverage)
  • bebdd07:
    [TwigBridge] simplified code
  • 6e98730:
    added forwards compatibility with Symfony 2.1 for the Filesystem component
  • adea589:
    [Twig] made code compatible with Twig 1.5

Repository summary: 3,640 watchers (#1 in PHP, #22 overall) and 943 forks (#1 in PHP, #12 overall).

Updated plugins

  • dcReloadedFormExtra:

    • added onChange event on autocomplete
  • WebPurify:

    • replaced simple_xml_call with sfWebBrowser, which uses CURL
    • added 10 second timeout
  • sfSphinx:

    • updated main lib to version 2.0.2-beta
  • cpUniForm:

    • added a missing require_once
  • apostrophe:

    • fixed bug with singleton slots and the areaHideWhenEmpty flag
    • more flexible form fields in the blog / events admin sidebar
    • the placeholder of media slots is now sized automatically and clickable
    • created a utility function for checking and calling a callback
  • apostropheBlog:

    • added more descriptive labels for the delete button in the admin form
    • when app_a_simple_permissions is true (PR), there is no list of allowed editors for a blog post
    • moved that refactored JS to a.js from aBlog.js because it can be shared by both admin generator skeletons

They talked about us


Be trained by Symfony experts
2012-02-06 Paris
2012-02-13 Paris
2012-02-13 Paris

Written by in: Zend Developer |
Dec
23
2011
--

Solving INFORMATION_SCHEMA slowness

Many of us find INFORMATION_SCHEMA painfully slow to work it when it comes to retrieving table meta data. Many people resort to using file system tools instead to
find for example how much space innodb tables are using and things like it. Besides being just slow accessing information_schema can often impact server performance
dramatically. The cause of majority of this slowness is not opening and closing tables, which can be solved with decent table cache size, and which is very fast for
Innodb but by the fact MySQL by default looks to refresh Innodb statistics each time table is queried from information schema.

The solution is simple, just set innodb_stats_on_metadata=0 which will prevent statistic update when you query information_schema. Most likely
you do not want it anyway. This will not make Innodb to operate without statistics at all as Innodb will still compute statistics for the table first time it opens it.

Here are some numbers from my test box:

mysql> select count(*),sum(data_length) from information_schema.tables;
+----------+------------------+
| count(*) | sum(data_length) |
+----------+------------------+
|      130 |       2856365892 |
+----------+------------------+
1 row in set (1.08 sec)

mysql> set global innodb_stats_on_metadata=0;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(*),sum(data_length) from information_schema.tables;
+----------+------------------+
| count(*) | sum(data_length) |
+----------+------------------+
|      130 |       2856365892 |
+----------+------------------+
1 row in set (0.00 sec)

As you can see performance gains are huge.
Note enabling this option will not make information_schema to be stale when it comes to important stuff – data_length for example will be correctly returned by information schema as it changes.

Dec
23
2011
--

Online MySQL Configuration Wizard from Percona

Merry Christmas! Just in time for the holidays, we have released a new tool to help you configure and manage your MySQL servers. Our online MySQL Configuration Wizard can help you generate a good basic configuration file for a server. This MySQL tuning wizard is our answer to the commonly asked question, “what is a good default configuration file for my server with 16 GB of RAM?”

We have a raft of new features planned for future releases, including advanced configuration options, supersafe settings to prevent bad things from happening, and much more. In the future we plan to add more online tools to help you be more productive.

Please give it a spin and let us know what you think. Credit for this tool must go to Miguel Trias, our talented lead developer. Thanks also to the many experts inside Percona who helped test, and made suggestions for improving the tool.

Dec
22
2011
--

Optimizing InnoDB for creating 30,000 tables (and nothing else)

Once upon a time, it would have been considered madness to even attempt to create 30,000 tables in InnoDB. That time is now a memory. We have customers with a lot more tables than a mere 30,000. There have historically been no tests for anything near this many tables in the MySQL test suite.

So, in fleshing out the test cases for this and innodb_dict_size_limit I was left with the not so awesome task of making the test case run in remotely reasonable time. The test case itself is pretty simple, a simple loop in the not at all exciting mysqltest language that will create 30,000 identical tables, insert a row into each of them and then drop them.

Establishing the ground rules: I do not care about durability. This is a test case, not a production system holding important data which means I can lie, cheat and steal to get performance.

The simplest way is to use libeatmydata. This is a small shared library designed to be LD_PRELOADed that disables just about every way an application can write data safely to disk. This is perfect for running a test suite for a database server as if your machine crashes halfway through a test run, you’ll just start the test run again – you are not dealing with any important data. Obviously, you should never, ever, ever use libeatmydata with anything you care about; it is called libeat-my-data for a reason.

Without libeatmydata and using the default options for the MySQL test suite, I noticed that I was only creating about 10-15 tables every second which means we’d take a very long time to create 30,000. After a bit of time, it sped up to about 20-25 per second. Of course, with the default timeout for a MySQL test (15 minutes), we quickly (well.. in 15 minutes) hit that and the test is failed for taking too long.

With libeatmydata the test takes about 77 seconds – a huge improvement.

So how could I possibly get this test to run (even with –big-test option to mysql-test-run) in reasonable time? Well… I can set some InnoDB options! I’m going to try the obvious first: innodb-flush-method, sync-frm and innodb-flush-log-at-trx-commit. There is an undocumented option for innodb-flush-method called “nosync” that is meant to not flush data to disk. Since you could hear how much syncing to disk was going on during my test run, not syncing to disk all the time would get closer to the libeatmydata performance. I also want to disable syncing of the FRM file to disk and set log flushing to happen as infrequently as possible. With these options I started to get somewhere between 25-90 CREATE TABLE per second. This gets the test execution time down to 12 minutes, so that just escapes the timeout.

I then added the options of innodb-adaptive-checkpoint=0 and flush-neighbor-pages=0 in the hope of avoiding a bunch of background flushing (which called fsync). It didn’t help.

I noticed that there was an fsync() call when extending the data file, so I tried setting a higher innodb-autoextend-increment and a larger initial size. This also did not help.

So how fast is InnoDB under all of this? Am I hitting a fundamental limitation in InnoDB?

Well…. I went and wrote a program using HailDB – which is InnoDB as a shared library that you can call using an easy to use C API.

Writing a simple test program that creates 30,000 tables in a similar InnoDB configuration as default MySQL is pretty easy (easier than writing the mysqltest language that’s for sure). After a “I’m glad this isn’t a SSD” killer amount of fsync() activity, it took a total of 14.5 minutes. Not too bad. This is less than my initial test with MySQL, probably due to not writing and syncing FRM files. If I run the same program with libeatmydata, it only takes 15-20 seconds. Clearly it’s syncing things to disk that takes all the time.

If we make the HailDB program set flush_method to nosync and flush_log_at_trx_commit=2, the execution time is only 1 minute. This is much closer to the libeatmydata time than MySQL ever got.

With HailDB you can do more than one data dictionary operation in a single transaction. So if instead of setting flush_method and flush_log_at_trx_commit I instead group the CREATE TABLE into transactions of creating 100 tables at a time, I get an execution time of 3 minutes. This is a big difference to the original 14.5 minutes.

What’s the practical applications of all of this? Not much (unless you’re writing complex test cases) but it is clear that loading large amounts of DDL could be a lot faster than it is currently (although loading the data into tables is still going to take a while too).

Dec
22
2011
--

XtraBackup 1.6.4 respin

Very shortly after the XtraBackup 1.6.4 binaries hit the downloads site and repositories, it was found to have a bug where the master information was not properly stored, thus breaking point in time recovery and the ability to start a slave from a backup.

This was caught and fixed really quickly and we’ve re-spun the XtraBackup 1.6.4 binaries to include this fix. Since we include the BZR revision number in the versions, you will see that 1.6.4-314 replaces 1.6.4-313. The gap to get a faulty binary was very small, so not many people should be affected.

The sad-but-funny part of all this? We have very nice automated testing for exactly this problem in our new QA infrastructure that goes live this week, shortly after the 1.6.4 release. i.e. if we were a week quicker in deploying our new infrastructure or delayed our 1.6.4 release a week, we would have caught this before we even handed the tree over to be packaged.

You can find out if you were one of the few unlucky people to get the old binaries in one of a number of ways:

  1. Your package is version 1.6.4-313 instead of 1.6.4-314
  2. The command below returns a matching line:
    $ grep 'innobackup hello' innobackupex
            if ($line =~ m/innobackup hello/) {
  3. A backup produced with the faulty version will have “xtrabackup ping 5″ in the xtrabackup_binlog_info file

As I mentioned before, the time period of getting the faulty build was very small, we already had in place all the steps needed for this not to happen and if you are using YUM or APT to manage packages, everything will automatically be taken care of for you even if you did initially get the faulty build.

For reference, the bug report is here.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com