Mar
09
2015
--

5 free handy tools for monitoring and managing MySQL replication

MySQL Replication is very simple to set up. In this post I’ll discuss its importance and five handy tools for monitoring and managing MySQL replication.

What is MySQL Replication? It’s the process of copying the (real-time events) data from one master instance to another slave instance and maintaining the redundant consistent data in a different machine. This enables a distributed database system that shares the same level of information.

In MySQL the replication works based on the three threads as shown below.

1) I/O thread on the slave server:  To start on receiving replication events, an I/O thread starts on the slave server and connects to the master server.

2) Master connection handler thread:  As a connection handier, master starts a thread whenever a replication slave connects to a master. The master server sends the events from its binary log file to the slave I/O thread, notifying slave about newly written events to its binary log. The slave I/O thread which records them to in the slave’s relay log file.

3) Slave SQL thread:  When it starts, immediately reads the events from the relay log and applies on the Slave DB. Once it finishes the processing of every relay log and if the I/O thread is writing the events to a new relay log file then it deletes the processed one. Suppose if the  I/O thread is writing  the events on a relay log and which is the same file SQL thread is reading then the SQL thread pauses until more events are available in the relay log.

MySQL replication (slave) redundant instances is an excellent method of improving data performance and stability. It ensures the availability of another copy of a database whenever there arises any issues with the master server. One of the other advantages is the report query (select) offloading to a slave server, which is a common practice to reduce the workload of the master DB server as there are multiple servers that are able to respond to the queries. The third advantage is to schedule the backup from the slave server, etc.

All the benefits discussed above are smart and effective only if replication is up and running and the data is in sync with the master.

Let us see the set of very useful tools from Percona Toolkit which help you in monitoring and managing the MySQL replication (Slaves).

5 free handy tools for monitoring and managing MySQL replication1) pt-heartbeat: Tool measures/monitor replication lag on a MySQL in real time. It is important to have a replication monitoring system to confirm that replication is up and running and lag is current.

In typical way of monitoring, we use “SHOW SLAVE STATUS” to find out the information like Slave_IO_Running: Yes, Slave_SQL_Running: Yes and Seconds_Behind_Master: 0 etc, but is not reliable as  Seconds_Behind_Master shows difference between last timestamp read in the binlogs and current time. Many reasons like slow network, long running queries, blocking operations or a 2nd level slaves(Master > slave1> slave2) etc. can produce an irrelevant value for the variable.

So I recommend to use pt-heartbeat, which depends only on the heartbeat record being replicated to slave instead of the above said unreliable method of finding the lag. pt-heartbeat will insert/update a row in the master and the time delay is calculated depending on when the data was inserted and when it became available to read in the slave. It works at any depth in the replication hierarchy. For example, it will reliably report how far a slave lags its original master (master’s master).

Example :

On Master:
[root@Tst1Master ~]#pt-heartbeat --daemonize -D test --update -h<IP address> --create-table
On Slave:
[root@Tst1Slave ~]#pt-heartbeat -D test --monitor --master-server-id 1
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]

We used to schedule the backup from the slave to avoid the additional load with the master server. In this case it is important to confirm the slave is current with the master to ascertain the backup is having the recent data. Here is a simple script you can use to verify the replication status on a periodical basis(cron) and to know the status just before the backup scheduled.

#!/bin/bash
#     <300 - [Good]
#     300> <600 - [Warning]
#     > 600 - [Critical]
MAIL_FROM="root@`hostname`"
MAIL_TO="mailid@mail.com"
Warningthreshold=300
Criticalthreshold=600
backup=$1
CMD=$(/root/bin/pt-heartbeat -D test --master-server-id 1 --check | cut -d. -f1)
# Pass the parameter "test.sh backup" to denote the call is from the backup script.
if [ $CMD -lt $Warningthreshold ]
then
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Good] current delay: "$CMD;
elif [ $CMD -gt $Warningthreshold ] && [ $CMD -lt $Criticalthreshold ]
then
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Warning] current delay: "$CMD;
elif [ $CMD -gt $Criticalthreshold ]
then
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Critical] current delay: $CMD Check the replication"
else
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Error] Replication status check failed need to investigate."
fi
#No arguments supplied"
if [ -z "$1" ] && [ $CMD -gt $Warningthreshold ]
then
(echo "Subject: Replication status on `hostname`";
echo "Replication status : "
echo $MESSAGE
)  | /usr/sbin/sendmail -O NoRecipientAction=add-to -f${MAIL_FROM} ${MAIL_TO}
elif [ $# -eq 1 ]
then
(echo "Subject: Replication status check prior to backup on `hostname`";
echo "Replication status prior to backup:"
echo $MESSAGE
)  | /usr/sbin/sendmail -O NoRecipientAction=add-to -f${MAIL_FROM} ${MAIL_TO}
fi

2) pt-slave-find: Finds and prints replication hierarchy of the slaves – shows you the topology and replication hierarchy of your MySQL replication instances.

Example :

[root@Tst1Master ~]# ./pt-slave-find --host=192.168.56.10
192.168.56.10
Version         5.6.22-72.0-log
Server ID       1
Uptime          42:09 (started 2015-03-03T01:40:42)
Replication     Is not a slave, has 1 slaves connected, is not read_only
Filters
Binary logging  STATEMENT
Slave status
Slave mode      STRICT
Auto-increment  increment 1, offset 1
InnoDB version  5.6.22-72.0
+- 192.168.56.11
   Version         5.6.22-72.0
   Server ID       2
   Uptime          41:48 (started 2015-03-03T01:41:03)
   Replication     Is a slave, has 0 slaves connected, is not read_only
   Filters
   Binary logging  STATEMENT
   Slave status    0 seconds behind, running, no errors
   Slave mode      STRICT
   Auto-increment  increment 1, offset 1
   InnoDB version  5.6.22-72.0

[root@Tst1Master ~]# ./pt-slave-find --host=192.168.56.10 --report-format=hostname
192.168.56.10
+- 192.168.56.11

3) pt-slave-restart: Watches the MySQL replication slaves for any error and tries to restart the replication.

The tool is very useful for skipping statements that cause errors and continuing replication. If you use this carelessly, the slave will be having the inconsistent data. However  when you use the tool, I recommended you to confirm the consistency of data between master and slave with help of pt-table-checksum.

Example : Restart the slave for error-numbers=1062 (Duplicate entry ‘1’ for key ‘PRIMARY’)

#pt-slave-restart --socket=/var/lib/mysql/custom-feeds/mysql.sock --ask-pass --error-numbers=1062

4) pt-table-checksum: Performs an online replication consistency check by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master.

Example :

[root@Tst1Master ~]# ./pt-table-checksum -dD
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-03T02:34:44      0      1        2       1       0   0.011 d.t

Note: It  is important to run the pt-table-checksum tool regardless of whether or not you’ve ever skipped an event with pt-slave-restart to make sure we are having the identical data on the slave side.

5) pt-table-sync: Sync the slave with their master (synchronizes data efficiently between MySQL tables.)

Example :

[root@Tst1Slave ~]# ./pt-table-sync -dD --print  --sync-to-master 192.168.56.11
REPLACE INTO `d`.`t`(`id`, `data`) VALUES ('1', 'Test1') /*percona-toolkit src_db:d src_tbl:t src_dsn:P=3306,h=192.168.56.10 dst_db:d dst_tbl:t dst_dsn:h=192.168.56.11 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:6435 user:root host:Tst1Slave.mysql*/;
REPLACE INTO `d`.`t`(`id`, `data`) VALUES ('2', 'Test2') /*percona-toolkit src_db:d src_tbl:t src_dsn:P=3306,h=192.168.56.10 dst_db:d dst_tbl:t dst_dsn:h=192.168.56.11 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:6435 user:root host:Tst1Slave.mysql*/;
[root@Tst1Slave ~]#

[root@Tst1Slave ~]# ./pt-table-sync -dD  --verbose  --execute  --sync-to-master 192.168.56.11
# Syncing h=192.168.56.11
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       2      0      0 Chunk     03:38:09 03:38:09 2    d.t

We have successfully sync the tables so let us try the checksum again and confirm the table is in sync.

[root@Tst1Master ~]# ./pt-table-checksum -dD
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-03T03:03:40      0      0        2       1       0   0.111 d.t

The aforesaid are the simple examples and based on your needs, you can choose the tools, options or modify the scripts. I also recommend that you to go through the documentations for more details on each tools.

The post 5 free handy tools for monitoring and managing MySQL replication appeared first on MySQL Performance Blog.

May
06
2013
--

Percona Toolkit 2.2.2 released; bug fixes include pt-heartbeat & pt-archiver

During the Percona Live MySQL Conference & Expo 2013 the week before last, we quietly released Percona Toolkit 2.2.2 with a few bug fixes:

  • pt-archiver –bulk-insert may corrupt data
  • pt-heartbeat –utc –check always returns 0
  • pt-query-digest 2.2 prints unwanted debug info on tcpdump parsing errors
  • pt-query-digest 2.2 prints too many string values
  • Some tools don’t have –help or –version
  • Some tools use @@hostname without /*!50038*/
  • pt-stalk prints the wrong variable name in verbose mode when –function is used
  • And more

Percona ToolkitThe first two bugs are important. Those using pt-archiver –bulk-insert with UTF-8 data should definitely upgrade. Those using pt-heartbeat –utc should also definitely upgrade.

Users may note the revival of the –show-all option in pt-query-digest. This had been removed in 2.2.1, but resulted in too much output in certain cases.

A new –recursion-method was added to pt-table-checksum: cluster. This method attempts to auto-discover cluster nodes, alleviating the need to specify cluster node DSNs in a DSN table (–recursion-method=dsn).

See https://launchpad.net/percona-toolkit/+milestone/2.2.2 for the full list of bugs fixed in Percona Toolkit 2.2.2.

Visit http://www.percona.com/software/percona-toolkit to download Percona Toolkit 2.2.2; it’s also in our main repos.

The post Percona Toolkit 2.2.2 released; bug fixes include pt-heartbeat & pt-archiver appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com