Sep
14
2016
--

pmp-check-pt-table-checksum Percona Monitoring Plugin

pmp-check-pt-table-checksum

pmp-check-pt-table-checksumRecently, I worked on a customer case where the customer needed to monitor the checksum via Nagios monitoring. The pmp-check-pt-table-checksum plugin from Percona Monitoring Plugins for MySQL achieves this goal. I thought it was worth a blogpost.

pmp-check-pt-table-checksum

 alerts you when the pt-table-checksum tool from Percona Toolkit finds data drifts on a replication slave.

pmp-checksum-pt-table-checksum

 monitors data differences on the slave from the checksum table as per information in the last checksum performed by the

pt-table-checksum

 tool. By default, the plugin queries the percona.checksum table to fetch information about data discrepancies. You can override this behavior with the “-T” option. You can check the

pmp-check-pt-table-checksum

 documentation for details.

Let’s demonstrate checksum monitoring via Nagios. My setup contains a master with two slave(s) connected, as follows:

  • Host 10.0.3.131 is master.
  • Host 10.0.3.83 is slave1
  • Host 10.0.3.36 is slave2

I intentionally generated more data on the master so

pt-table-checksum

 can catch the differences on the slave(s). Here’s what it looks like:

mysql-master> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
+------+
10 rows in set (0.00 sec)
mysql-slave1> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
5 rows in set (0.00 sec)
mysql-slave2> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
5 rows in set (0.00 sec)

As you can see, slave1 and slave2 are different from the master: the master has ten rows while the slave(s) have five rows each (table t1).

Then, I executed

pt-table-checksum

 from the master to check for data discrepancies:

[root@master]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
 TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-25T04:57:10 0 1 10 1 0 0.018 test.t1
[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
Differences on slave1
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t1 1 -5 1
Differences on slave2
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t1 1 -5 1

pt-table-checksum

 correctly identifies the differences for the test.t1 table on slave1 and slave2. Now, you can use the 

pmp-check-pt-table-checksum

  Percona checksum monitoring plugin. Let’s try to run it locally (via CLI) from the Nagios host.

[root@nagios]# pmp-check-pt-table-checksum -H slave1 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1
[root@nagios]# pmp-check-pt-table-checksum -H slave2 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1]

NOTE: The

checksum_user

 database user needs SELECT privileges on both the checksum table (Percona.checksums) and the slave(s) in order for SQL to alert for checksum differences on slave(s).

On the Nagios monitoring server, you need to add the 

pmp-check-pt-table-checksum

 command to the commands.cfg file:

define command{
command_name            pmp-check-pt-table-checksum
command_line            $USER1$/pmp-check-pt-table-checksum -H $HOSTADDRESS$ -c $ARG1$
        }

NOTE: I used “-c” option for

pmp-check-pt-table-checksum

, which raises a critical error instead of a warning.

And, on the existing hosts.cfg file (i.e., slave1.cfg and slave2.cfg), you need to add a monitoring command accordingly as below:

define service{
        use                             generic-service
        host_name                       slave1
        service_description             Checksum Status
        check_command                   pmp-check-pt-table-checksum!1
}

In this command “1” is an argument to command “-c $ARG1$” so

pmp-check-pt-table-checksum

will raise a critical error when one or more chunks on the slave(s) are different from the master.

Last but not least, restart the Nagios daemon on the monitoring host to make the change.

Below is how it looks like on the Nagios monitoring on the web:

pmp-check-pt-table-checksum
pmp-check-pt-table-checksum

I also think the “INTERVAL” option is useful:

-i INTERVAL     Interval over which to ensure pt-table-checksum was run, in days; default - not to check.

It makes sure that chunks are recent on the checksum table. Used the other way around, it checks on how old your chunks are. This option ensures the checksum cron executes at a defined number of days. Let’s say you have

pt-table-checksum

 cron running once per week. In that case, setting INTERVAL 14 or 21 alerts you if chunks are older then defined number of days (i.e., the INTERVAL number).

Conclusion:

Percona Monitoring plugins for MySQL are very useful and easy to embed in your centralize monitoring dashboard. You can schedule

pt-table-checksum

 via a cron job, and get reports regarding master/slave(s) data drifts (if any) from one global dashboard on the monitoring host. There are various plugins available from Percona, e.g. processlist plugin, replication delay plugin, etc. Along with that, Percona offers Cacti and Zabbix templates to graph various MySQL activities.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com