Percona Toolkit and systemd

After some recent work with systemd I’ve realized its power, and I can come clean: I am a fan. There are multitudes of posts out there with arguments both for and against systemd, but let’s look at some nice ways to have systemd provide us with (but not limited to) pt-kill-as-a-service.

This brief post introduces you to a systemd unit file and how we can leverage one to enable pt-kill at startup, have it start after mysqld and ensure that MySQL is running by using the mysql service as a dependency of pt-kill. By using systemd to handle this we don’t have to complicate matters by ‘monitoring the monitor’ using hacky shell scripts, cron or utilities like monit.

So then, a quick primer on systemd, because let’s face it, we’ve all been avoiding it. Systemd is not new, but it made recent headlines in the Linux world when some of the major distros announced their intention to migrate upcoming releases to systemd.

What is it? Well, due to its depth it is best described as a suite of management daemons, libraries and tools that will replace the traditional init scripts. So remember how you start a service, mount a volume or read the system logs? Well, start forgetting all of that, because systemd is disrupting this space. With systemd come some really neat tricks for administering your machines, and I’m really only beginning to see the tip of this iceberg. There is admittedly a lot to learn with systemd, but this should serve as a pragmatic entrée.

Systemd what? When did this happen?

| Linux distribution           | Date released as default |
| Arch Linux                   | October 2012             |
| CoreOS                       | October 2013 (v94.0.0)   |
| Debian                       | April 2015 (v8 aka jessie) |
| Fedora                       | May 2011 (v15)           |
| Gentoo Linux                 | N/A                      |
| Mageia                       | May 2012 (v2.0)          |
| openSUSE                     | September 2012 (v12.2)   |
| Red Hat Enterprise Linux     | June 2014 (v7.0)         |
| Slackware                    | N/A                      |
| SUSE Linux Enterprise Server | October 2014 (v12)       |
| Ubuntu                       | April 2015 (v15.04)      |

Lennart Poettering, the name most frequently attached to systemd, is seeking to modernize the most fundamental processes of the Linux startup system by bringing in the paradigms of modern computing: concurrency, parallelism and efficiency. The dependency tree of processes and services is more intuitive and the structure of the underlying startup scripts is unified. I feel that the direction proposed by systemd is an evolutionary one which promotes consistency within the startup scripts, enabling conventions that can be more easily understood by a broader audience.

Systemd and Percona Toolkit

This post aims to show that we can rely on systemd to handle processes such as pt-kill, pt-stalk, and other daemonized scripts that we like to have running perpetually, fired at startup and reinstated after failure.

The scenario is this: I want pt-kill to drop all sleeping connections from a certain application user, let’s call them ‘caffeinebob’, because they never close connections. For various reasons we can’t make changes in the application, so we’re employing a Percona Toolkit favourite, pt-kill, to do this for us. For convenience we want this to persist across server restarts. In the olden days we might have had a cron job that fires a shell script in combination with a sentinel file to ensure it’s running. I’m pretty sure that this kitty could be skinned many ways.

The systemd Unit File

After some research and testing, the unit file below plays nicely on a CentOS 7 node with systemd at its core. In this example I am running Percona Server 5.6 installed using Percona’s yum repo, with the mysql.service unit file generated at installation. I suspect that there could be some systemd deviation with other MySQL variants; however, this configuration is working for me.

[Unit]
Description = pt-kill caffeinebob
After = mysql.service
Requires = mysql.service

[Service]
Type = simple
PIDFile = /var/run/
ExecStart = /usr/bin/pt-kill \
    --match-user caffeinebob \
    --busy-time 10

[Install]
WantedBy = multi-user.target

Let’s examine the above and see what we’re working with. Systemd unit files come in various types. The example above is a simple Service unit file. This means we are enacting a process controlled and supervised by systemd. The significance of the After directive is that this service will not attempt startup until after the units it names, mysql.service among them, have been called. The Requires directive makes ptkill.service dependent on the mysql.service startup being successful.

The next part, the [Service] grouping, details the actions to be taken by the service. The Type can be one of many, but as it’s a simple call to a script I’ve used the simple type. We are describing the command and the handling of it. The ExecStart is evidently the pt-kill command that we would usually run from the shell prompt or from within a shell script. This is a very coarse example, because we could opt to parameterize the command with the assistance of an Environment file. Note the use of the Restart directive, used so that systemd can react should a failure interrupt the process.

Finally, under the [Install] grouping we’re telling systemd that this service should start up on a multi-user system, which could be thought of as runlevel 2 or 3 (multi-user mode).

So, provided that we’ve got all the relevant paths, users and dependencies in place, once you reboot your host mysql.service should initiate mysqld and, when that dependency is met, systemd will initiate pt-kill with our desired parameters to cull connections that meet the criteria stipulated in our configuration. This means you rely on systemd to manage pt-kill for you and you don’t necessarily need to remember to start this or similar processes when you restart your node.
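To see the three groupings side by side, here is a minimal sketch that writes a unit file of this shape into a scratch directory for review. Several details are my assumptions rather than the author’s exact file: Restart=on-failure is one plausible value for the Restart directive, pt-kill is run in the foreground (no --daemonize and no PIDFile) because Type=simple treats the process started by ExecStart as the main process, and the scratch directory exists only so nothing is installed by accident.

```shell
#!/bin/sh
# Sketch only: write a ptkill.service unit into a scratch directory.
# Assumptions (not from the original unit file): Restart=on-failure and a
# foreground ExecStart without --daemonize or PIDFile.

unit_dir=$(mktemp -d)
unit_file="$unit_dir/ptkill.service"

cat > "$unit_file" <<'EOF'
[Unit]
Description=keep pt-kill persistent across restarts
After=mysql.service
Requires=mysql.service

[Service]
Type=simple
ExecStart=/usr/bin/pt-kill --defaults-file=/root/.my.cnf --interval=5 --match-user caffeinebob --busy-time 10 --kill --print
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $unit_file"
# To install it for real (as root), something like:
#   cp "$unit_file" /etc/systemd/system/ptkill.service
#   systemctl daemon-reload
```

Running in the foreground means whatever pt-kill prints with --print goes to the journal rather than to a log file, which may or may not suit your setup.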

Start up & enable

Now, to invoke our service manually and enable it to start on boot, we run the following systemctl commands:

[moore@localhost ~]$ sudo systemctl start ptkill.service
[moore@localhost ~]$ sudo systemctl enable ptkill.service

No feedback but no errors, so we can check the status of the service:

[moore@localhost ~]$ sudo systemctl status ptkill -l
ptkill.service - keep pt-kill persistent across restarts
   Loaded: loaded (/etc/systemd/system/ptkill.service; enabled)
   Active: active (running) since Wed 2015-08-12 02:39:13 BST; 1h 19min ago
 Main PID: 2628 (perl)
   CGroup: /system.slice/ptkill.service
           └─2628 perl /usr/bin/pt-kill --daemonize --pid=/var/run/ --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print

Perfect. We can also instruct systemd to stop and/or disable our service once the application is changed and caffeinebob close()s all those open connections.

[moore@localhost ~]$ sudo systemctl stop ptkill.service
[moore@localhost ~]$ sudo systemctl disable ptkill.service

Now, after successful implementation, we see that our process is running delightfully:

[moore@localhost ~]$ ps -ef | grep pt-kill
root      2547     1  0 02:37 ?        00:00:00 perl /usr/bin/pt-kill --daemonize --pid=/var/run/ --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print

Catch me if I fall

Let’s issue a kill signal to the process and observe its behaviour using journalctl:

[moore@localhost ~]$ sudo kill -SEGV 2547

This will write entries similar to these into the system log:

[moore@localhost ~]$ sudo journalctl -xn -f
Aug 12 02:39:13 localhost.localdomain sudo[2624]: moore : TTY=pts/1 ; PWD=/home/moore ; USER=root ; COMMAND=/bin/kill -SEGV 2547
Aug 12 02:39:13 localhost.localdomain systemd[1]: ptkill.service: main process exited, code=killed, status=11/SEGV
Aug 12 02:39:13 localhost.localdomain systemd[1]: Unit ptkill.service entered failed state.
Aug 12 02:39:13 localhost.localdomain systemd[1]: ptkill.service holdoff time over, scheduling restart.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Stopping keep pt-kill persistent across restarts...
-- Subject: Unit ptkill.service has begun shutting down
-- Defined-By: systemd
-- Support:
-- Unit ptkill.service has begun shutting down.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Starting keep pt-kill persistent across restarts...
-- Subject: Unit ptkill.service has begun with start-up
-- Defined-By: systemd
-- Support:
-- Unit ptkill.service has begun starting up.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Started keep pt-kill persistent across restarts.
-- Subject: Unit ptkill.service has finished start-up
-- Defined-By: systemd
-- Support:
-- Unit ptkill.service has finished starting up.
-- The start-up result is done.

Pt-kill flaps after the kill signal but systemd has been instructed to restart on failure so we don’t see caffeinebob saturate our processlist with sleeping connections.
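If you want to keep an eye on how often this happens, a small helper can count the restarts systemd has scheduled. This is a sketch that assumes the journal message wording shown in the log above (“holdoff time over, scheduling restart”); other systemd versions may word it differently, and count_restarts is a hypothetical helper name.

```shell
#!/bin/sh
# count_restarts UNIT: read journal text on stdin and print how many times
# systemd scheduled a restart for UNIT. Assumes the message wording seen in
# the log excerpt above.
count_restarts() {
    grep -F -c "$1 holdoff time over, scheduling restart" || true
}

# Demo on two lines taken from the log above; in practice you would pipe
# in something like: journalctl -u ptkill.service --no-pager
count_restarts ptkill.service <<'EOF'
Aug 12 02:39:13 localhost.localdomain systemd[1]: Unit ptkill.service entered failed state.
Aug 12 02:39:13 localhost.localdomain systemd[1]: ptkill.service holdoff time over, scheduling restart.
EOF
```

A count that keeps climbing suggests pt-kill is failing repeatedly rather than being killed once, which is worth investigating before relying on the restart behaviour.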

Another bonus with this workflow is use within orchestration. Any standardized unit files can be propagated to your fleet of hosts with tools such as Ansible, Chef, Puppet or Saltstack.

Closing note

I’d love to hear from the pragmatists from the systemd world to understand if this approach can be improved or whether there are any flaws in this example unit file that would require addressing. This is very much a new-school of thought for me and feedback is both welcome and encouraged.

Thank you for your time, happy systemd-ing.

The post Percona Toolkit and systemd appeared first on Percona Data Performance Blog.


Calling all MySQL DBAs: How do you use Percona Toolkit?

Percona Toolkit is one of our most mature open source applications. Derived from Maatkit and Aspersa, Percona Toolkit has evolved significantly over the years. The software now contains 32 tools, over 4,000 tests, and has been downloaded over 250,000 times. Anyone who manages a database – from DBAs to system administrators to even software developers – benefits from Percona Toolkit’s ability to perform a variety of MySQL server and system tasks that are too difficult or complex to perform manually.

We continue to make Percona Toolkit better each month. Over the last 9 months alone Percona has had 6 releases and resolved nearly 50 issues.


While Percona team members in Support, Consulting, and Managed Services are big drivers of identifying bugs and new features (driven mostly by Percona customer needs), the community of Percona Toolkit users plays a significant role in making the open source software what it is today.

We’d like to learn how we can make Percona Toolkit even better for your needs. Please take a brief survey so we can learn how you actually use the software. As a thank you for taking the survey, we are randomly giving away five $50 gift cards to participants. It’s a small token but one that we hope you’ll appreciate.

Recent additions to Percona Toolkit have included better Percona XtraDB Cluster support as well as multiple fixes and improvements to pt-online-schema-change, pt-kill, pt-query-digest, pt-stalk, and preparation for the MySQL 5.7 GA. Help us continue to improve Percona Toolkit by taking part in our survey. If you use Percona Toolkit and are attending Percona Live next month, please keep a look out for me. I’d like to hear about your experiences.



3 handy tools to remove problematic MySQL processes

DBAs often encounter situations where they need to kill queries to ensure there are no long-running queries on a MySQL server that would impact performance. Long-running queries can be the result of many factors. Fortunately, Percona Server contains some handy tools to remove problematic MySQL processes. I will highlight all of the tools via some examples in this post.

There have been some good posts on this blog about the pt-kill tool, like this one by Arunjith Aravindan titled “How a set of queries can be killed in MySQL using Percona Toolkit’s pt-kill.” Let’s dive into pt-kill a bit further with a few more examples. What does pt-kill do? It kills MySQL connections. Say you wanted to run pt-kill from a cronjob and then get an email on every killed process/query. Here is a typical example for that.

$ pt-kill --interval 1 --run-time 1 --busy-time 5 --log /path/to/kill_long_running_thread.log --match-info "^(select|SELECT|Select)" --kill --print --user=xxxxxx --password=xxxxxxxxxx

Assume this is running from a cronjob. When pt-kill executes, it will kill queries that have been running longer than 5 seconds. By default, pt-kill runs forever; the --run-time option tells pt-kill how long to run before exiting. If the --interval and --busy-time parameters are used together, the explicit --interval value is used. Otherwise the default interval is 30 seconds. Note: this will only kill read queries, as per the --match-info parameter.

The above command will log all killed queries in the file referenced with the --log option. If you need to be notified via email for every killed query, the command below will do it. Of course, you need to have the system configured to send e-mail.

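# ALERT_EMAIL is a placeholder; set it to the recipient address before running.
tail -n 0 -F /path/to/kill_long_running_thread.log | while read -r LOG; do
    echo "$LOG" | mail -s "pt-kill alert" "$ALERT_EMAIL"
done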

You can run this shell script in the background, within screen or with the nohup utility, to monitor the log file continuously. It will send an email whenever a newly killed query is written to the log file referenced in the pt-kill command. Unfortunately, there is no option to notify-by-email in pt-kill at the moment, so this is sort of a workaround.

In order to log all killed queries into a database table you will need to use the --log-dsn option, as per the example below.

$ pt-kill --interval 1 --busy-time 1 --create-log-table --log-dsn=h=localhost,D=percona,t=kill_log --daemonize --match-info "^(select|SELECT|Select)" --kill

All killed queries will be logged into the percona.kill_log table. The --daemonize option runs this command in the background forever, and it will kill all SELECT queries running longer than 1 second (--busy-time 1). The --interval option instructs pt-kill to scan processes every 1 second (--interval 1).

mysql> select * from kill_log;
| kill_id | server_id | timestamp           | reason                  | kill_error | Id    | User | Host      | db   | Command | Time | State      | Info            | Time_ms |
|      17 |         1 | 2015-01-10 08:38:33 | Query matches Info spec |            | 35146 | root | localhost | NULL | Query   |    0 | User sleep | SELECT SLEEP(5) |    NULL |
|      20 |         1 | 2015-01-10 08:38:34 | Query matches Info spec |            | 35223 | root | localhost | NULL | Query   |    0 | User sleep | SELECT SLEEP(5) |    NULL |

With killed queries logged into a database table, you can easily get trends and statistics on killed queries via SQL.
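As one example of such a report, the sketch below prints a query that counts kills per user and reason. It uses only the column names visible in the kill_log output above; kill_log_report_sql is a hypothetical helper name, and the grouping is just one of many you might want.

```shell
#!/bin/sh
# kill_log_report_sql: print a query that summarizes percona.kill_log by
# user and reason. Column names are taken from the table output shown above.
kill_log_report_sql() {
    cat <<'SQL'
SELECT User, reason, COUNT(*) AS kills, MAX(`timestamp`) AS last_kill
FROM percona.kill_log
GROUP BY User, reason
ORDER BY kills DESC;
SQL
}

# Print the statement for review. To run it, pipe it to the client:
#   kill_log_report_sql | mysql
kill_log_report_sql
```

Printing the statement rather than executing it keeps the helper safe to run anywhere; connection options for the mysql client are left to you.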

By default the tool kills the oldest of the queries that have been running for more than the given --busy-time. If you need to kill all the threads that have been busy for more than the specified --busy-time, then this will do it:

$ pt-kill --victims=all --busy-time=60

Statement Timeout in Percona Server:
The max-statement-time feature is ported from the Twitter patches. This feature can be used to limit the query execution time by specifying the timeout value in the max_statement_time variable. When the specified number of milliseconds is reached the server aborts the statement and returns the error below to the client.

ERROR 1877 (70101): Query execution was interrupted, max_statement_time exceeded

Let me demonstrate this through another example:

mysql [localhost] {msandbox} (world) > SET max_statement_time=1;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (world) > show variables like 'max_statement_time';
| Variable_name      | Value |
| max_statement_time | 1     |
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (world) > SELECT * FROM City WHERE District = 'abcd';
ERROR 1885 (70101): Query execution was interrupted, max_statement_time exceeded
mysql [localhost] {msandbox} (world) > UPDATE City SET District='abcd' WHERE ID = 2001;
ERROR 1885 (70101): Query execution was interrupted, max_statement_time exceeded
mysql [localhost] {msandbox} (world) > ALTER TABLE City ADD INDEX district_idx (district);
ERROR 1885 (70101): Query execution was interrupted, max_statement_time exceeded

As you can see from this example, the statement timeout feature works for all statements, including SELECT, DML and DDL queries.

mysql [localhost] {msandbox} (world) > show status like 'Max_statement%';
| Variable_name                 | Value |
| Max_statement_time_exceeded   | 3     |
| Max_statement_time_set        | 19    |
| Max_statement_time_set_failed | 0     |
3 rows in set (0.00 sec)

The status variables above are statistics for the statement timeout feature. Max_statement_time_exceeded reports the total number of statements that exceeded the defined timeout. Max_statement_time_set reports the number of statements for which an execution time limit was set. You can find more details in this documentation. The statement timeout feature was introduced in Percona Server 5.6. You can check whether your specific version of Percona Server supports this feature via the have_statement_timeout variable.

mysql [localhost] {msandbox} (world) > show global variables like 'have_statement_timeout';
| Variable_name          | Value |
| have_statement_timeout | YES   |
1 row in set (0.00 sec)

Bugs you should be aware of:

  • This affects how the feature interacts with stored procedures. If you use stored procedures, max_statement_time might not behave as you expect.
  • This is a documentation bug. Percona Server timeouts might not be safe for some statements like DDL and should not be used with such queries; the documentation does not reflect this. Be very careful if you set a global statement timeout, as it affects data-changing queries as well. For best results, set the max_statement_time variable in a session before running queries that you want killed if they execute too long, instead of using a global variable.
  • This affects the statement timeout feature at the query level. You must set max_statement_time in a session or globally instead; however, this bug is fixed in the latest version, i.e. Percona Server 5.6.22-72.0.
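The session-level advice can be sketched as a small wrapper that prefixes a statement with its own timeout, so the limit applies only to that client session. The helper name with_statement_timeout is hypothetical, and the value is in milliseconds as described earlier in this post.

```shell
#!/bin/sh
# with_statement_timeout MS SQL: print a batch that sets the session-level
# max_statement_time (in milliseconds) and then runs SQL. Nothing is sent
# to a server here; the batch is only printed.
with_statement_timeout() {
    printf 'SET SESSION max_statement_time=%d;\n%s\n' "$1" "$2"
}

# Print the batch for review. To run it against the world sandbox schema:
#   with_statement_timeout 1000 "SELECT * FROM City WHERE District = 'abcd';" | mysql world
with_statement_timeout 1000 "SELECT * FROM City WHERE District = 'abcd';"
```

Because both statements travel in the same client session, the timeout disappears when that session ends and the global setting is never touched.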

InnoDB Kill Idle Transactions:
This feature was introduced in Percona Server 5.5. It limits the age of idle XtraDB transactions and will kill transactions that stay idle longer than the threshold specified in innodb_kill_idle_transaction. This feature is useful when autocommit is disabled on the server side, you are relying on the application to commit transactions, and you want to avoid long-running uncommitted transactions. Application logic errors sometimes leave transactions uncommitted. Let me demonstrate it quickly through one example:

mysql [localhost] {msandbox} (world) > show variables like 'autocommit';
| Variable_name | Value |
| autocommit    | OFF   |
mysql [localhost] {msandbox} (world) > show global variables like 'innodb_kill_idle_transaction';
| Variable_name                | Value |
| innodb_kill_idle_transaction | 10    |
mysql [localhost] {msandbox} (world) > START TRANSACTION; SELECT NOW(); INSERT INTO City_backup (Name,CountryCode,District,Population) VALUES ('Karachi','PK','Sindh','1000000');
Query OK, 0 rows affected (0.00 sec)
| NOW()               |
| 2015-01-31 07:11:39 |
1 row in set (0.00 sec)
Query OK, 1 row affected (0.01 sec)
mysql [localhost] {msandbox} (world) > SHOW ENGINE INNODB STATUS\G
*************************** 1. row ***************************
---TRANSACTION 173076, ACTIVE 10 sec
1 lock struct(s), heap size 360, 0 row lock(s), undo log entries 1
MySQL thread id 15, OS thread handle 0x7f862e9bf700, query id 176 localhost msandbox init
TABLE LOCK table `world`.`City_backup` trx id 173076 lock mode IX
mysql [localhost] {msandbox} (world) > SELECT NOW(); SELECT * FROM City_backup;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    16
Current database: world
| NOW()               |
| 2015-01-31 07:12:06 |
1 row in set (0.01 sec)
Empty set (0.00 sec)

In this post, I shared some tools that can help you get rid of long-running transactions to help ensure that you don’t run into performance issues. This is one of the many good reasons to use Percona Server, which has some extra features on top of vanilla MySQL server.




Percona Toolkit 2.2.13 is now available

Percona is pleased to announce the availability of Percona Toolkit 2.2.13, released January 26, 2015. Percona Toolkit is a collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

This release is the current GA (Generally Available) stable release in the 2.2 series. It includes multiple bug fixes for pt-table-checksum with better support for Percona XtraDB Cluster, various other fixes, as well as continued preparation for MySQL 5.7 compatibility. Full details are below. Downloads are available here and from the Percona Software Repositories.

New Features:

  • pt-kill now supports new --query-id option. This option can be used to print a query fingerprint hash after killing a query to enable the cross-referencing with the pt-query-digest output. This option can be used along with --print option as well.

Bugs Fixed:

  • Fixed bug 1408375: Percona Toolkit was vulnerable to a MITM attack which could allow exfiltration of MySQL configuration information via the --version-check option. This vulnerability was logged as CVE-2015-1027.
  • Fixed bug 1019479: pt-table-checksum now works with ONLY_FULL_GROUP_BY SQL mode.
  • Fixed bug 1394934: running pt-table-checksum in debug mode would cause an error.
  • Fixed bug 1396868: regression introduced in Percona Toolkit 2.2.12 caused pt-online-schema-change not to honor --ask-pass option.
  • Fixed bug 1399789: pt-table-checksum would fail to find Percona XtraDB Cluster nodes when variable wsrep_node_incoming_address was set to AUTO.
  • Fixed bug 1321297: pt-table-checksum was reporting differences on timestamp columns with replication from 5.5 to 5.6 server version, although the data was identical.
  • Fixed bug 1388870: pt-table-checksum was showing differences if the master and slave were in different time zones.
  • Fixed bug 1402668: pt-mysql-summary would exit if Percona XtraDB Cluster was in Donor/Desynced state.
  • Fixed bug 1266869: pt-stalk would fail to start if $HOME environment variable was not set.

Details of the release can be found in the release notes and the 2.2.13 milestone at Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.



How a set of queries can be killed in MySQL using Percona Toolkit’s pt-kill

You might have encountered situations where you had to kill some specific select queries that were running for long periods and choking the database. This post will go into more detail with an example of report query offloading.

Offloading report (select) queries to a slave server is a common practice to reduce the workload of the master DB server. The long-running selects are executed on the slave for report generation. I have observed many cases where the slave gets delayed or the slave DB slows down due to heavy, long-running orphaned selects from faulty reports.

There are two main ways to kill queries in MySQL: 1. use custom scripts that match on a regular expression, or 2. use a tool written and supported by Percona that is designed to kill queries based on matching conditions. Below is one script that will help you to kill those queries. The script will take the process list from MySQL and filter the long-running select query considering the “User”, “Time” and “State” from the list. However I suggest that you use the pt-kill tool from Percona Toolkit which provides a more reliable choice and options based on your requirements.

Process list:

| Id  | User | Host      | db   | Command | Time | State                        | Info                        |
| 103 | root | localhost | test | Query   |    0 | init                         | show processlist            |
| 104 | root | localhost | test | Sleep   |  383 |                              | NULL                        |
| 106 | root | localhost | test | Query   |  377 | Waiting for table level lock | SELECT * FROM t FOR UPDATE  |
| 107 | root | localhost | test | Query   |  364 | Waiting for table level lock | insert into t value(5)      |
| 108 | rpt  | localhost | test | Query   |  345 | Waiting for table level lock | SELECT c1 FROM t FOR UPDATE |

1. Shell script example:

List all queries from rpt user having query time greater than 1 minute:
[root@test3 ~]# for id in $(mysql -t -e "SHOW FULL PROCESSLIST" | /bin/awk -F "|" '($6 ~ /Query/) && (60< $7) &&
 ($3 ~ /rpt/) { print $2}'); do echo "KILL QUERY $id;"; done
Kill all queries from rpt user having query time greater than 1 minute:
[root@test3 ~]# for id in $(mysql -t -e "SHOW FULL PROCESSLIST" | /bin/awk -F "|" '($6 ~ /Query/) && (60< $7) &&
 ($3 ~ /rpt/)  { print $2}'); do mysql  -e "KILL QUERY $id;"; done
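To try the filter without touching a live server, the same awk idea can be wrapped in a function and fed a saved process list. This is a sketch, not the script above verbatim: kill_statements is a hypothetical name, it matches the user exactly rather than by regular expression, and it only prints the KILL QUERY statements.

```shell
#!/bin/sh
# kill_statements USER MIN_SECS: read `mysql -t -e "SHOW FULL PROCESSLIST"`
# output on stdin and print a KILL QUERY statement for each matching row.
kill_statements() {
    awk -F '|' -v user="$1" -v min_secs="$2" '
        # In the pipe-delimited table: $2=Id, $3=User, $6=Command, $7=Time
        {
            id = $2; u = $3; cmd = $6
            gsub(/ /, "", id); gsub(/ /, "", u); gsub(/ /, "", cmd)
        }
        cmd == "Query" && u == user && ($7 + 0) > min_secs {
            print "KILL QUERY " id ";"
        }'
}

# Demo on rows from the process list above; prints: KILL QUERY 108;
kill_statements rpt 60 <<'EOF'
| Id  | User | Host      | db   | Command | Time | State                        | Info                        |
| 106 | root | localhost | test | Query   |  377 | Waiting for table level lock | SELECT * FROM t FOR UPDATE  |
| 108 | rpt  | localhost | test | Query   |  345 | Waiting for table level lock | SELECT c1 FROM t FOR UPDATE |
EOF
```

Once the printed statements look right, they can be piped to the mysql client to perform the kills.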

2. Pt-kill example:

List all queries from rpt user having query time greater than 1 minute (--print):

[root@test3 ~]# pt-kill --busy-time 60 --match-command Query --user root --print --match-user rpt \
--socket /tmp/mysql.sock
# 2014-11-12T03:51:01 KILL 108 (Query 485 sec) SELECT c1 FROM t FOR UPDATE
# 2014-11-12T03:51:31 KILL 108 (Query 515 sec) SELECT c1 FROM t FOR UPDATE
# 2014-11-12T03:52:01 KILL 108 (Query 545 sec) SELECT c1 FROM t FOR UPDATE

Kill all queries from rpt user having query time greater than 1 minute (--kill-query):

[root@test3 ~]# pt-kill --busy-time 60 --match-command Query --user root --print --kill-query --match-user rpt \
--socket /tmp/mysql.sock
# 2014-11-12T03:53:26 KILL QUERY 108 (Query 630 sec) SELECT c1 FROM t FOR UPDATE

Process list:

| Id  | User | Host      | db   | Command | Time | State                        | Info                       |
| 103 | root | localhost | test | Query   |    0 | init                         | show processlist           |
| 104 | root | localhost | test | Sleep   |  843 |                              | NULL                       |
| 106 | root | localhost | test | Query   |  837 | Waiting for table level lock | SELECT * FROM t FOR UPDATE |
| 107 | root | localhost | test | Query   |  824 | Waiting for table level lock | insert into t value(5)     |
| 108 | rpt  | localhost | test | Sleep   |  805 |                              | NULL                       |
| 111 | root | localhost | NULL | Sleep   |   25 |                              | NULL                       |

Note: the --kill-query option makes pt-kill kill matching queries. This requires MySQL 5.0 or newer. Unlike --kill, which kills the connection for matching queries, this option only kills the query, not its connection.

pt-kill captures queries from SHOW PROCESSLIST, filters them, and then either kills or prints them. This is also known as a “slow query sniper” in some circles. The idea is to watch for queries that might be consuming too many resources, and kill them. For details please read the tool’s (pt-kill) documentation.

