Apr
18
2013
--

Rotating MySQL slow logs safely

This blog post is part two of two. Like part one, published Wednesday, this is a cross-post from Groupon’s engineering blog. Thanks again to Kyle Oppenheim at Groupon. And one more reminder that I’ll be at the Percona Live MySQL Conference and Expo next week in Santa Clara, California so look for me there. You can checkout the sessions I’ll be leading here.

In my last post, I described a solution for keeping the caches of a MySQL standby server hot using MySQL slow logs with long_query_time set to 0. Here are a some lessons we learned when logging a high volume of queries to the slow log.

Do not use copytruncate

Logrotate offers two techniques for log rotation (your log rotation scheme likely offers similar options with a different name):

  1. copytruncate – Copies the file to a new name, and then truncates the original file.
  2. no copytruncate – Uses the rename() system call to move the file to a new name, and then expects the daemon to be signaled to reopen its log file.

MySQL has a mutex for slow log writes. Truncation can block MySQL because the OS serializes access to the inode during the truncate operation. This problem is particularly evident when using the ext3 file system (instead of xfs).

Use FLUSH LOGS instead of sending SIGHUP

When copytruncate is disabled, MySQL must be told to reopen the slow log file. There are two options for signaling:

  1. Send a HUP signal to the mysqld process.
  2. Use the mysql console or mysqladmin utility to FLUSH LOGS;

These options should be equivalent, but MySQL bug 65481 explains that the HUP signal also flushes tables in addition to logs. Flushing tables can impact running queries.

Disable MySQL slow logs during rotation

Flushing logs takes time. Meanwhile, queries are still being executed. To prevent MySQL from filling the slow log buffer, we disable the MySQL slow logs temporarily during log rotation.

Putting it all together

Here is a logrotate configuration file for a slow log that illustrates these best practices:

/var/mysql/slow_query.log {
    nocompress
    create 660 mysql mysql
    size 1G
    dateext
    missingok
    notifempty
    sharedscripts
    postrotate
       /usr/local/bin/mysql -e 'select @@global.long_query_time into @lqt_save; set global long_query_time=2000; select sleep(2); FLUSH LOGS; select sleep(2); set global long_query_time=@lqt_save;'
    endscript
    rotate 150

The post Rotating MySQL slow logs safely appeared first on MySQL Performance Blog.

Apr
16
2013
--

Is your MySQL buffer pool warm? Make it sweat!

Today’s blog post diving into the waters of the MySQL buffer pool is a cross-post from Groupon’s engineering blog, and is Part 1 of 2. Thank you to Kyle Oppenheim at Groupon for contributing to this project and post. We’ll be posting Part 2 on Thursday. I’ll be at the Percona Live MySQL Conference and Expo next week in Santa Clara, California so look for me there – I’d love to connect and talk more about MySQL buffer pools or anything else that’s on your mind!

There are numerous solutions for MySQL high availability. Many rely on MySQL’s asynchronous replication to maintain a warm standby server which is flipped into service if the active master server has an issue. At Groupon, our standard MySQL database configuration follows this active/passive pattern. There is a virtual IP that points to the active server of a pair. The passive server has mysqld running and replicating from the active master. In theory, failing over is the simple matter of moving the virtual IP. In practice, it’s slightly more complicated. To manage this process, we use a tool co-developed with Percona.

“Warm standby server”? Did you catch that? What does that mean? In most of these high-availability solutions it means that mysqld is running on the standby server and that replication is not lagging. Unfortunately, that’s often not sufficient for failover during peak traffic. The standby server does not process query traffic except for replication. The buffer pool and adaptive hash index on the standby server will not have recently accessed pages. When it starts handling queries after failover, the lower cache hit rates can lead to outages. Specifically, at Groupon, our servers would be heavily I/O bound after failover as the buffer pool pages were loaded from disk.

Replaying Queries

Working with Groupon, we have developed a solution to keep the standby server’s caches hot. (See my Fosdem 2013 slides for information about discarded designs and benchmarks.)

First, we set long_query_time to 0 in order to log every query. (See part two for handling massive slow log volume.) The slow logs are served, via HTTP, by mysql_slowlogd. This daemon is similar to running tail -f slow.log, except that it knows how to follow the log stream across log rotation events. On the standby server, the logs are replayed with Percona Playback by streaming the slow log from the active server.

wget -q -O - http://master_server:3307/slow | percona-playback --mysql-host 127.0.0.1 --mysql-username playback --mysql-password PaSSwOrd --mysql-schema schema_name --query-log-stdin --dispatcher-plugin thread-pool --thread-pool-threads-count 100 --session-init-query \"set innodb_fake_changes=1\" > /var/log/playback.log 2>&1 &

Our awesome development team added a few features to Percona Playback to make it work better for this use case. You will need version 0.6 or later to get these features. Be aware that playback output is really verbose, in production, most likely you want it to redirect to /dev/null, and only have a log file for debugging purposes.

  1. Streaming logs from stdin Percona Playback now supports the –query-log-stdin command-line option for accepting a never-ending stream of queries to playback.
  2. Read-only playback Using the –session-init-query command-line option, we set the option innodb_fake_changes to prevent INSERTs, UPDATEs, and DELETEs from corrupting the data on the standby server. You will need Percona Server in order to use innodb_fake_changes.
  3. Thread pool Percona Playback added a connection pool option via –dispatcher-plugin-thread-pool that will allow connection reuse. This is necessary when running a large stream of queries.

Benchmarks

We benchmarked with slow query logs captured from our production systems. We restored a production database backup to our test database so that our test database was consistent before applying the captured query traffic. This is an important step because update statements that match no rows or insert statements that have duplicate key errors may be faster than an actual database write.

The slow logs were split into chunks, each containing roughly 1M queries. We warmed the cold database with the first chunk and replayed the second chunk after the warmup.

Disk read I/O for chunk 1 followed by 2 - MySQL buffer pool

The y axis is logarithmic, so the difference between the IO usage is 2 orders of magnitude. All graphs looked like this (we did 39 measurements), the next graph shows chunk 4’s workload warmed up with chunk 3.

Disk read IO for chunk 3 followed by 4 - MySQL buffer pool

The result is similar for every single graph, each chunk warmed up the buffer pool for the next one.

disk_io_chunk_1and1 - MySQL buffer pool

As an additional experiment we tried replaying the same chunk again. We expected everything to be cached if we warmed the cache with the exact same data. All the graphs from such self-warming experiments look like this one. The green part of the graph lines up with the blue part.

Check back Thursday for Part 2!

The post Is your MySQL buffer pool warm? Make it sweat! appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com