Mar 09, 2015

5 free handy tools for monitoring and managing MySQL replication

MySQL Replication is very simple to set up. In this post I’ll discuss its importance and five handy tools for monitoring and managing MySQL replication.

What is MySQL Replication? It is the process of copying data (as real-time events) from one (master) instance to another (slave) instance, maintaining a redundant, consistent copy of the data on a different machine. This enables a distributed database system in which the servers share the same information.

In MySQL, replication works based on three threads, as shown below.

1) I/O thread on the slave server:  To start receiving replication events, an I/O thread starts on the slave server and connects to the master server.

2) Master connection handler thread:  The master starts a connection handler thread whenever a replication slave connects to it. The master server sends events from its binary log file to the slave I/O thread and notifies the slave about newly written events in its binary log. The slave I/O thread records them in the slave’s relay log file.

3) Slave SQL thread:  When it starts, it immediately reads events from the relay log and applies them on the slave DB. Once it has processed a relay log file and the I/O thread has moved on to writing events to a new relay log file, it deletes the processed one. If the I/O thread is still writing events to the same relay log file the SQL thread is reading, the SQL thread pauses until more events are available in the relay log. A quick way to observe these threads is shown below.
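
The following minimal sketch shows where each thread is visible (the master host is a placeholder, and the commands assume a working master-slave pair):

# On the slave, the I/O and SQL threads show up as "system user" sessions:
mysql -e "SHOW PROCESSLIST" | grep -i "system user"
# On the master, the connection handler started for each slave appears as a Binlog Dump thread:
mysql -h <master-ip> -e "SHOW PROCESSLIST" | grep -i "binlog dump"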

Maintaining redundant MySQL replication (slave) instances is an excellent way to improve performance and reliability. It ensures that another copy of the database is available whenever issues arise with the master server. Another advantage is offloading report queries (selects) to a slave server, which is a common practice to reduce the workload of the master DB server, since multiple servers are able to respond to queries. A third advantage is scheduling backups from the slave server, and so on.

All of the benefits discussed above are effective only if replication is up and running and the data is in sync with the master.

Let’s look at a set of very useful tools from Percona Toolkit that help you monitor and manage MySQL replication (slaves).

1) pt-heartbeat: This tool measures and monitors replication lag on MySQL in real time. It is important to have a replication monitoring system to confirm that replication is up and running and to know the current lag.

In the typical way of monitoring, we use “SHOW SLAVE STATUS” to check information such as Slave_IO_Running: Yes, Slave_SQL_Running: Yes and Seconds_Behind_Master: 0, but this is not reliable because Seconds_Behind_Master shows the difference between the last timestamp read in the binlogs and the current time. Many factors, such as a slow network, long-running queries, blocking operations, or second-level slaves (Master > Slave1 > Slave2), can produce a misleading value for this variable.
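
For reference, that traditional check can be reduced to something like this (a minimal sketch; it only confirms the threads are running and reports the potentially misleading Seconds_Behind_Master):

mysql -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"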

So I recommend using pt-heartbeat, which depends only on a heartbeat record being replicated to the slave, instead of the unreliable method described above. pt-heartbeat inserts/updates a row on the master, and the time delay is calculated from when the data was inserted and when it became available to read on the slave. It works at any depth in the replication hierarchy. For example, it will reliably report how far a slave lags its original master (its master’s master).

Example :

On Master:
[root@Tst1Master ~]#pt-heartbeat --daemonize -D test --update -h<IP address> --create-table
On Slave:
[root@Tst1Slave ~]#pt-heartbeat -D test --monitor --master-server-id 1
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]

We usually schedule backups from the slave to avoid additional load on the master server. In this case it is important to confirm the slave is current with the master, to be sure the backup contains recent data. Here is a simple script you can use to verify the replication status on a periodic basis (via cron) and to check the status just before a scheduled backup.

#!/bin/bash
#   delay < 300         - [Good]
#   300 <= delay <= 600 - [Warning]
#   delay > 600         - [Critical]
MAIL_FROM="root@`hostname`"
MAIL_TO="mailid@mail.com"
Warningthreshold=300
Criticalthreshold=600
backup=$1
CMD=$(/root/bin/pt-heartbeat -D test --master-server-id 1 --check | cut -d. -f1)
# Pass the parameter "test.sh backup" to denote the call is from the backup script.
if [ $CMD -lt $Warningthreshold ]
then
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Good] current delay: "$CMD;
elif [ $CMD -ge $Warningthreshold ] && [ $CMD -le $Criticalthreshold ]
then
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Warning] current delay: "$CMD;
elif [ $CMD -gt $Criticalthreshold ]
then
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Critical] current delay: $CMD Check the replication"
else
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Error] Replication status check failed, needs investigation."
fi
# No argument supplied: alert by mail only when the delay exceeds the warning threshold.
if [ -z "$1" ] && [ $CMD -gt $Warningthreshold ]
then
(echo "Subject: Replication status on `hostname`";
echo "Replication status : "
echo $MESSAGE
)  | /usr/sbin/sendmail -O NoRecipientAction=add-to -f${MAIL_FROM} ${MAIL_TO}
elif [ $# -eq 1 ]
then
(echo "Subject: Replication status check prior to backup on `hostname`";
echo "Replication status prior to backup:"
echo $MESSAGE
)  | /usr/sbin/sendmail -O NoRecipientAction=add-to -f${MAIL_FROM} ${MAIL_TO}
fi

2) pt-slave-find: Finds and prints the replication hierarchy of the slaves – it shows you the topology and replication hierarchy of your MySQL replication instances.

Example :

[root@Tst1Master ~]# ./pt-slave-find --host=192.168.56.10
192.168.56.10
Version         5.6.22-72.0-log
Server ID       1
Uptime          42:09 (started 2015-03-03T01:40:42)
Replication     Is not a slave, has 1 slaves connected, is not read_only
Filters
Binary logging  STATEMENT
Slave status
Slave mode      STRICT
Auto-increment  increment 1, offset 1
InnoDB version  5.6.22-72.0
+- 192.168.56.11
   Version         5.6.22-72.0
   Server ID       2
   Uptime          41:48 (started 2015-03-03T01:41:03)
   Replication     Is a slave, has 0 slaves connected, is not read_only
   Filters
   Binary logging  STATEMENT
   Slave status    0 seconds behind, running, no errors
   Slave mode      STRICT
   Auto-increment  increment 1, offset 1
   InnoDB version  5.6.22-72.0

[root@Tst1Master ~]# ./pt-slave-find --host=192.168.56.10 --report-format=hostname
192.168.56.10
+- 192.168.56.11

3) pt-slave-restart: Watches the MySQL replication slaves for any error and tries to restart the replication.

The tool is very useful for skipping statements that cause errors and continuing replication. However, if you use it carelessly the slave will end up with inconsistent data. Whenever you use the tool, I recommend confirming the consistency of data between master and slave with the help of pt-table-checksum.

Example : Restart the slave for error-numbers=1062 (Duplicate entry ‘1’ for key ‘PRIMARY’)

#pt-slave-restart --socket=/var/lib/mysql/custom-feeds/mysql.sock --ask-pass --error-numbers=1062

4) pt-table-checksum: Performs an online replication consistency check by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master.

Example :

[root@Tst1Master ~]# ./pt-table-checksum -dD
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-03T02:34:44      0      1        2       1       0   0.011 d.t

Note: It is important to run the pt-table-checksum tool, regardless of whether or not you’ve ever skipped an event with pt-slave-restart, to make sure the slave has identical data.

5) pt-table-sync: Syncs a slave with its master (synchronizes data efficiently between MySQL tables).

Example :

[root@Tst1Slave ~]# ./pt-table-sync -dD --print  --sync-to-master 192.168.56.11
REPLACE INTO `d`.`t`(`id`, `data`) VALUES ('1', 'Test1') /*percona-toolkit src_db:d src_tbl:t src_dsn:P=3306,h=192.168.56.10 dst_db:d dst_tbl:t dst_dsn:h=192.168.56.11 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:6435 user:root host:Tst1Slave.mysql*/;
REPLACE INTO `d`.`t`(`id`, `data`) VALUES ('2', 'Test2') /*percona-toolkit src_db:d src_tbl:t src_dsn:P=3306,h=192.168.56.10 dst_db:d dst_tbl:t dst_dsn:h=192.168.56.11 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:6435 user:root host:Tst1Slave.mysql*/;
[root@Tst1Slave ~]#

[root@Tst1Slave ~]# ./pt-table-sync -dD  --verbose  --execute  --sync-to-master 192.168.56.11
# Syncing h=192.168.56.11
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       2      0      0 Chunk     03:38:09 03:38:09 2    d.t

We have successfully synced the tables, so let us run the checksum again and confirm the tables are in sync.

[root@Tst1Master ~]# ./pt-table-checksum -dD
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-03T03:03:40      0      0        2       1       0   0.111 d.t

These are simple examples; based on your needs you can choose the tools and options, or modify the scripts. I also recommend going through the documentation for more details on each tool.


Jan 02, 2015

The MySQL Query Cache: How it works, plus workload impacts (good and bad)

Query caching is one of the prominent features in MySQL and a vital part of query optimization. It is important to know how it works as it has the potential to cause significant performance improvements – or a slowdown – of your workload.

The MySQL query cache is global and shared among sessions. It caches a SELECT query along with its result set, which enables identical SELECTs to execute faster because the data is fetched from memory. It is important that everything be identical: no added comments or spaces, and most significantly no differences in the WHERE clause. Basically, when you issue a SELECT query, if it is available in the cache the result is fetched from there; otherwise the query is treated as a new one and goes to the parser.
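
To see how strict that byte-for-byte matching is, here is a minimal sketch (it assumes the query cache is enabled and uses the table d.t1 from the examples below): the same statement is served from the cache on its second run, while adding a single space makes it a new cache entry.

mysql -e "SELECT * FROM d.t1" > /dev/null       # first run: parsed, executed and inserted into the cache
mysql -e "SELECT * FROM d.t1" > /dev/null       # identical text: answered from the cache (Qcache_hits +1)
mysql -e "SELECT  * FROM d.t1" > /dev/null      # one extra space: treated as a new query (Qcache_inserts +1)
mysql -e "SHOW STATUS LIKE 'Qcache_hits'; SHOW STATUS LIKE 'Qcache_inserts'"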

Even though it has some nice advantages, the MySQL query cache has its downsides too. Let’s think about this: if you are frequently updating a table, you are invalidating the query cache for ALL queries cached for that table. So really, anytime you have a frequently updated table, you’re probably not going to get any good usage out of the query cache. See the example below.

mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1031320 |
| Qcache_hits             | 0       |
| Qcache_inserts          | 0       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 1       |
| Qcache_queries_in_cache | 0       |
| Qcache_total_blocks     | 1       |
+-------------------------+---------+
8 rows in set (0.06 sec)
mysql> select * from d.t1;
405 rows in set (0.05 sec)
mysql> select * from d.t1 where id=88995159;
1 row in set (0.00 sec)
mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1020600 |
| Qcache_hits             | 0       |
| Qcache_inserts          | 2       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 1       |
| Qcache_queries_in_cache | 2       |
| Qcache_total_blocks     | 6       |
+-------------------------+---------+
8 rows in set (0.00 sec)

From the above we can be sure the queries are cached. Let us try an insert and check the status; it will invalidate the query cache and reclaim the memory.

mysql> insert into d.t1 (data)value('Welcome');
Query OK, 1 row affected (0.05 sec)
mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1031320 |
| Qcache_hits             | 0       |
| Qcache_inserts          | 2       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 1       |
| Qcache_queries_in_cache | 0       |
| Qcache_total_blocks     | 1       |
+-------------------------+---------+
8 rows in set (0.00 sec)

Now let us think about how to decide the query cache size:

To illustrate, I have a MySQL instance with two tables, “t” and “t1”. Table “t” has numerous records and “t1” has fewer records. Let’s restart MySQL and look at the query cache details.

mysql> show variables like 'query_cache_size';
+------------------+---------+
| Variable_name    | Value   |
+------------------+---------+
| query_cache_size | 1048576 |
+------------------+---------+
1 row in set (0.00 sec)
mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1031320 |
| Qcache_hits             | 0       |
| Qcache_inserts          | 0       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 1       |
| Qcache_queries_in_cache | 0       |
| Qcache_total_blocks     | 1       |
+-------------------------+---------+
8 rows in set (0.01 sec)

From the above status, note the following four points.

1) There is around 1 MB free space with Qcache.

2) The queries in Qcache are zero.

3) There are no Qcache hits.

4) Qcache lowmem prunes is zero.

mysql> select * from d.t1;
405 rows in set (0.03 sec)
mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1021624 |
| Qcache_hits             | 0       |
| Qcache_inserts          | 1       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 1       |
| Qcache_queries_in_cache | 1       |
| Qcache_total_blocks     | 4       |
+-------------------------+---------+
8 rows in set (0.01 sec)

From the above status it is clear the query has been cached; it should execute much faster on the next try and increase the Qcache_hits status variable by one.

mysql> select * from d.t1;
405 rows in set (0.00 sec).
mysql>  SHOW STATUS LIKE "%Qcache_hits%";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Qcache_hits   | 1     |
+---------------+-------+
1 row in set (0.00 sec)

Now let us see how data gets pruned from the Qcache. For this I will execute a select on table “t”, which has a massive number of records.

mysql> select * from d.t where id > 78995159;
mysql>  SHOW STATUS LIKE "Qcache_lowmem_prunes";
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Qcache_lowmem_prunes | 1     |
+----------------------+-------+
1 row in set (0.00 sec)
mysql> select * from d.t1;
405 rows in set (0.02 sec)
mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1021624 |
| Qcache_hits             | 1       |
| Qcache_inserts          | 2       |
| Qcache_lowmem_prunes    | 1       |
| Qcache_not_cached       | 6       |
| Qcache_queries_in_cache | 1       |
| Qcache_total_blocks     | 4       |
+-------------------------+---------+
8 rows in set (0.01 sec)

Qcache_lowmem_prunes is the status variable that indicates how many times MySQL had to clear/prune data from the Qcache to make space for the results of other queries. We need to observe the Qcache_lowmem_prunes status variable and increase/adjust the size of the cache until the prune rate stays very low.
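
One rough way to keep an eye on this is sketched below; the two ratios are common rules of thumb rather than official server metrics.

mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Qcache_hits','Com_select','Qcache_inserts','Qcache_lowmem_prunes')"
# hit ratio   ~ Qcache_hits / (Qcache_hits + Com_select)
# prune ratio ~ Qcache_lowmem_prunes / Qcache_inserts
# If the prune ratio stays high even after increasing query_cache_size, the cache is likely not helping this workload.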

It is also undesirable to set the query cache to a relatively high value such as 256 MB, because Qcache invalidation becomes costly. For details, Peter Zaitsev wrote about this a few years ago in a post that’s still relevant today, titled “Beware large Query_Cache sizes.”

Contention often makes the query cache a bottleneck rather than a help when you have many CPU cores. Generally, the query cache should be off unless proven useful for your workload. So it is important to know your environment well before enabling the query cache and deciding what the query cache size should be.

There will also be circumstances where there is no chance of identical selects, and in this case it is important to set both the query_cache_size and query_cache_type variables to zero. The query_cache_type variable controls the query cache, and setting it to zero removes significant overhead from query execution. In a highly concurrent environment, the query cache mutex may become a source of contention and a bottleneck. Setting query_cache_type to zero at startup avoids the query cache mutex entirely, because the query cache then cannot be enabled at runtime, which reduces the overhead in query execution. Please go through the details of the QUERY CACHE ENHANCEMENTS in Percona Server.
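
If you decide to turn the query cache off, a minimal sketch looks like this (the my.cnf lines take effect at the next restart, which is what keeps the mutex out of the picture; the runtime SET only releases the memory):

# In my.cnf, under [mysqld]:
#   query_cache_type = 0
#   query_cache_size = 0
# At runtime you can already release the cache memory:
mysql -e "SET GLOBAL query_cache_size = 0"
mysql -e "SHOW VARIABLES LIKE 'query_cache%'"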


Nov 19, 2014

How a set of queries can be killed in MySQL using Percona Toolkit’s pt-kill

You might have encountered situations where you had to kill some specific select queries that were running for long periods and choking the database. This post will go into more detail with an example of report query offloading.

Report query (select) offloading to a slave server is a common practice to reduce the workload of the master DB server. The long-running selects are executed on the slave for report generation. I have observed many cases where the slave gets delayed or the slave DB slows down due to heavy, long-running orphaned selects coming from badly written reports.

There are two main ways to kill queries in MySQL: 1. use custom scripts that match on a regular expression, or 2. use a tool written and supported by Percona that is designed to kill queries based on matching conditions. Below is one script that will help you kill those queries. The script takes the process list from MySQL and filters long-running select queries based on the “User”, “Time” and “State” columns. However, I suggest using the pt-kill tool from Percona Toolkit, which provides a more reliable choice and options tailored to your requirements.

Process list:

+-----+------+-----------+------+---------+------+------------------------------+-----------------------------+
| Id  | User | Host      | db   | Command | Time | State                        | Info                        |
+-----+------+-----------+------+---------+------+------------------------------+-----------------------------+
| 103 | root | localhost | test | Query   |    0 | init                         | show processlist            |
| 104 | root | localhost | test | Sleep   |  383 |                              | NULL                        |
| 106 | root | localhost | test | Query   |  377 | Waiting for table level lock | SELECT * FROM t FOR UPDATE  |
| 107 | root | localhost | test | Query   |  364 | Waiting for table level lock | insert into t value(5)      |
| 108 | rpt  | localhost | test | Query   |  345 | Waiting for table level lock | SELECT c1 FROM t FOR UPDATE |
+-----+------+-----------+------+---------+------+------------------------------+-----------------------------+

1. Shell script example:

List all queries from rpt user having query time greater than 1 minute:
[root@test3 ~]# for id in $(mysql -t -e "SHOW FULL PROCESSLIST" | /bin/awk -F "|" '($6 ~ /Query/) && (60< $7) &&
 ($3 ~ /rpt/) { print $2}'); do echo "KILL QUERY $id;"; done
KILL QUERY 108;
Kill all queries from rpt user having query time greater than 1 minute:
[root@test3 ~]# for id in $(mysql -t -e "SHOW FULL PROCESSLIST" | /bin/awk -F "|" '($6 ~ /Query/) && (60< $7) &&
 ($3 ~ /rpt/)  { print $2}'); do mysql  -e "KILL QUERY $id;"; done

2. pt-kill example:

List all queries from the rpt user with query time greater than 1 minute (--print):

[root@test3 ~]# pt-kill --busy-time 60 --match-command Query --user root --print --match-user rpt
--socket /tmp/mysql.sock
# 2014-11-12T03:51:01 KILL 108 (Query 485 sec) SELECT c1 FROM t FOR UPDATE
# 2014-11-12T03:51:31 KILL 108 (Query 515 sec) SELECT c1 FROM t FOR UPDATE
# 2014-11-12T03:52:01 KILL 108 (Query 545 sec) SELECT c1 FROM t FOR UPDATE

Kill all queries from the rpt user with query time greater than 1 minute (--kill-query):

[root@test3 ~]# pt-kill --busy-time 60 --match-command Query --user root --print --kill-query --match-user rpt
--socket /tmp/mysql.sock
# 2014-11-12T03:53:26 KILL QUERY 108 (Query 630 sec) SELECT c1 FROM t FOR UPDATE

Process list:

+-----+------+-----------+------+---------+------+------------------------------+----------------------------+
| Id  | User | Host      | db   | Command | Time | State                        | Info                       |
+-----+------+-----------+------+---------+------+------------------------------+----------------------------+
| 103 | root | localhost | test | Query   |    0 | init                         | show processlist           |
| 104 | root | localhost | test | Sleep   |  843 |                              | NULL                       |
| 106 | root | localhost | test | Query   |  837 | Waiting for table level lock | SELECT * FROM t FOR UPDATE |
| 107 | root | localhost | test | Query   |  824 | Waiting for table level lock | insert into t value(5)     |
| 108 | rpt  | localhost | test | Sleep   |  805 |                              | NULL                       |
| 111 | root | localhost | NULL | Sleep   |   25 |                              | NULL                       |
+-----+------+-----------+------+---------+------+------------------------------+----------------------------+

Note: The --kill-query option makes pt-kill kill matching queries. This requires MySQL 5.0 or newer. Unlike --kill, which kills the connection for matching queries, this option only kills the query, not its connection.

pt-kill captures queries from SHOW PROCESSLIST, filters them, and then either kills or prints them. This is also known as a “slow query sniper” in some circles. The idea is to watch for queries that might be consuming too many resources, and kill them. For details please read the tool’s (pt-kill) documentation.
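
In practice you would usually leave such a query sniper running in the background rather than invoking it interactively; a minimal sketch reusing the options from the examples above (--daemonize, --interval and --log are standard pt-kill options, and the paths are placeholders):

pt-kill --daemonize --interval 30 --busy-time 60 --match-command Query --match-user rpt \
  --kill-query --print --log /var/log/pt-kill.log --socket /tmp/mysql.sock --user root --ask-pass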


Oct 13, 2014

How to avoid hash collisions when using MySQL’s CRC32 function

Percona Toolkit’s  pt-table-checksum performs an online replication consistency check by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master – and the tool pt-table-sync synchronizes data efficiently between MySQL tables.

The tools use CRC32 by default. Other good choices include MD5 and SHA1. If you have installed the FNV_64 user-defined function, pt-table-sync will detect it and prefer to use it, because it is much faster than the built-ins. You can also use MURMUR_HASH if you’ve installed that user-defined function. Both of these are distributed with Maatkit. For details please see the tool’s documentation.
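
If you want the tools to use one of the faster hash functions, you can first confirm the UDF is actually installed and then pass it explicitly via --function (a sketch; FNV_64 is only available when the UDF has been loaded, for example the one shipped with Percona Server, and the connection options mirror the examples below):

# An error from this statement means the FNV_64 UDF is not installed:
mysql -e "SELECT FNV_64('test')"
# Ask pt-table-checksum to use it instead of the default CRC32:
pt-table-checksum --function=FNV_64 --replicate=percona.checksum --databases=db1 --tables=t1 localhost --user=root --ask-pass --no-check-binlog-format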

Below are test cases similar to what you might have encountered. Using a table checksum, we can confirm that two tables are identical, which is useful for verifying that a slave server is in sync with its master. The following test cases with pt-table-checksum and pt-table-sync will help you use the tools more accurately.

For example, in a master-slave setup we have a table with a primary key on column “a” and a unique key on column “b”. Here the master and slave tables are not in sync: the tables have two identical rows and two differing rows. The pt-table-checksum tool should be able to identify the difference between master and slave, and pt-table-sync should in this case sync the tables with two REPLACE queries.

Master           Slave
+-----+-----+    +-----+-----+
|  a  |  b  |    |  a  |  b  |
+-----+-----+    +-----+-----+
|  2  |  1  |    |  2  |  1  |
|  1  |  2  |    |  1  |  2  |
|  4  |  3  |    |  3  |  3  |
|  3  |  4  |    |  4  |  4  |
+-----+-----+    +-----+-----+

Case 1:  Non-cryptographic hash function (CRC32) and a hash collision.

The tables on the source and target have two differing rows, and in the general way of thinking the tools should identify the difference. But the scenarios below explain how the tools can be used incorrectly, how to avoid that, and how to make things more consistent and reliable when using the tools in production.

The tools use CRC32 checksums by default, and CRC32 is prone to hash collisions. In the case below, the non-cryptographic function (CRC32) is not able to identify the two distinct rows: the checksum comes out the same even though the tables contain distinct values.

CREATE TABLE `t1` (
  `a` int(11) NOT NULL,
  `b` int(11) NOT NULL,
  PRIMARY KEY (`a`),
  UNIQUE KEY `b` (`b`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Master           Slave
+-----+-----+    +-----+-----+
|  a  |  b  |    |  a  |  b  |
+-----+-----+    +-----+-----+
|  2  |  1  |    |  2  |  1  |
|  1  |  2  |    |  1  |  2  |
|  4  |  3  |    |  3  |  3  |
|  3  |  4  |    |  4  |  4  |
+-----+-----+    +-----+-----+

Master:
[root@localhost mysql]#  pt-table-checksum --replicate=percona.checksum --create-replicate-table --databases=db1 --tables=t1
localhost --user=root --password=*** --no-check-binlog-format
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-17T00:59:45      0      0        4       1       0   1.081 db1.t1
Slave:
[root@localhost bin]# ./pt-table-sync --print --execute --replicate=percona.checksum --tables db1.t1 --user=root --password=***
 --verbose  --sync-to-master  192.**.**.**
# Syncing via replication h=192.**.**.**,p=...,u=root
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE

Narrowed down to BIT_XOR:

Master:
mysql> SELECT BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) FROM `db1`.`t1`;
+------------------------------------------------------------+
| BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) |
+------------------------------------------------------------+
|                                                    6581445 |
+------------------------------------------------------------+
1 row in set (0.00 sec)
Slave:
mysql> SELECT BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) FROM `db1`.`t1`;
+------------------------------------------------------------+
| BIT_XOR(CAST(CRC32(CONCAT_WS('#', `a`, `b`)) AS UNSIGNED)) |
+------------------------------------------------------------+
|                                                    6581445 |
+------------------------------------------------------------+
1 row in set (0.16 sec)
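
To see why the aggregate collides here, look at the per-row checksums (a sketch you can run on either server). CRC32 behaves linearly with respect to XOR for equal-length strings, and the concatenated row strings '4#3' and '3#4' on the master XOR to exactly the same bytes as '3#3' and '4#4' on the slave, so the BIT_XOR of the per-row CRC32 values comes out identical even though the row sets differ.

mysql -e "SELECT a, b, CRC32(CONCAT_WS('#', a, b)) AS row_crc FROM db1.t1"                     # per-row checksums differ between master and slave
mysql -e "SELECT BIT_XOR(CAST(CRC32(CONCAT_WS('#', a, b)) AS UNSIGNED)) AS agg FROM db1.t1"    # but the aggregate (6581445 above) matches on both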

Case 2: Since the tools are not able to identify the difference, let us add a new row to the slave and check whether the tools can now identify the distinct values. So I am adding a new row (5,5) to the slave.

mysql> insert into db1.t1 values(5,5);
Query OK, 1 row affected (0.05 sec)
Master           Slave
+-----+-----+    +-----+-----+
|  a  |  b  |    |  a  |  b  |
+-----+-----+    +-----+-----+
|  2  |  1  |    |  2  |  1  |
|  1  |  2  |    |  1  |  2  |
|  4  |  3  |    |  3  |  3  |
|  3  |  4  |    |  4  |  4  |
+-----+-----+    |  5  |  5  |
                 +-----+-----+

[root@localhost mysql]#  pt-table-checksum --replicate=percona.checksum --create-replicate-table --databases=db1 --tables=t1
 localhost --user=root --password=*** --no-check-binlog-format
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-17T01:01:13      0      1        4       1       0   1.054 db1.t1
[root@localhost bin]# ./pt-table-sync --print --execute --replicate=percona.checksum --tables db1.t1 --user=root --password=***
 --verbose  --sync-to-master  192.**.**.**
# Syncing via replication h=192.**.**.**,p=...,u=root
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
DELETE FROM `db1`.`t1` WHERE `a`='5' LIMIT 1 /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.**.**.**.
10,p=...,u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.**.**.**,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum
replicate:percona.checksum bidirectional:0 pid:5205 user:root host:localhost.localdomain*/;
REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('3', '4') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.**.**.**,
p=...,u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.**.**.**,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum
 replicate:percona.checksum bidirectional:0 pid:5205 user:root host:localhost.localdomain*/;
REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('4', '3') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.**.**.**,
p=...,u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.**.**.**,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum
replicate:percona.checksum bidirectional:0 pid:5205 user:root host:localhost.localdomain*/;
#      1       2      0      0 Chunk     01:01:43 01:01:43 2    db1.t1

Well, apparently the tools are now able to identify the newly added row on the slave as well as the two other rows that differ.

Case 3: Advantage of cryptographic hash functions (e.g. MD5)

Now let us restore the tables to the state in Case 1 and ask the tools to use a cryptographic hash function (MD5) instead of the usual non-cryptographic one. The default CRC32 function provides no security due to its simple mathematical structure and is prone to hash collisions, while MD5 provides a better level of integrity. So let us try with --function=md5 and see the result.

Master           Slave
+-----+-----+    +-----+-----+
|  a  |  b  |    |  a  |  b  |
+-----+-----+    +-----+-----+
|  2  |  1  |    |  2  |  1  |
|  1  |  2  |    |  1  |  2  |
|  4  |  3  |    |  3  |  3  |
|  3  |  4  |    |  4  |  4  |
+-----+-----+    +-----+-----+

Narrowed down to BIT_XOR:

Master:
mysql> SELECT 'test', 't2', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING
(@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := md5(CONCAT_WS('#', `a`, `b`))
, 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc FROM `db1`.`t1`;
+------+----+---+------+------+------+-----+----------------------------------+
| test | t2 | 1 | NULL | NULL | NULL | cnt | crc                              |
+------+----+---+------+------+------+-----+----------------------------------+
| test | t2 | 1 | NULL | NULL | NULL |   4 | 000000000000000063f65b71e539df48 |
+------+----+---+------+------+------+-----+----------------------------------+
1 row in set (0.00 sec)
Slave:
mysql> SELECT 'test', 't2', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING
(@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := md5(CONCAT_WS('#', `a`, `b`))
, 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc FROM `db1`.`t1`;
+------+----+---+------+------+------+-----+----------------------------------+
| test | t2 | 1 | NULL | NULL | NULL | cnt | crc                              |
+------+----+---+------+------+------+-----+----------------------------------+
| test | t2 | 1 | NULL | NULL | NULL |   4 | 0000000000000000df024e1a4a32c31f |
+------+----+---+------+------+------+-----+----------------------------------+
1 row in set (0.00 sec)

[root@localhost mysql]# pt-table-checksum --replicate=percona.checksum --create-replicate-table --function=md5 --databases=db1
 --tables=t1 localhost --user=root --password=*** --no-check-binlog-format
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-23T23:57:52      0      1       12       1       0   0.292 db1.t1
[root@localhost bin]# ./pt-table-sync --print --execute --replicate=percona.checksum --tables db1.t1 --user=root --password=amma
 --verbose --function=md5 --sync-to-master  192.***.***.***
# Syncing via replication h=192.168.56.102,p=...,u=root
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('3', '4') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.168.56.101,p=...,
u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.***.***.***,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum
replicate:percona.checksum bidirectional:0 pid:5608 user:root host:localhost.localdomain*/;
REPLACE INTO `db1`.`t1`(`a`, `b`) VALUES ('4', '3') /*percona-toolkit src_db:db1 src_tbl:t1 src_dsn:P=3306,h=192.168.56.101,p=...,
u=root dst_db:db1 dst_tbl:t1 dst_dsn:h=192.***.**.***,p=...,u=root lock:1 transaction:1 changing_src:percona.checksum
replicate:percona.checksum bidirectional:0 pid:5608 user:root host:localhost.localdomain*/;
#      0       2      0      0 Chunk     04:46:04 04:46:04 2    db1.t1

Master           Slave
+-----+-----+    +-----+-----+
|  a  |  b  |    |  a  |  b  |
+-----+-----+    +-----+-----+
|  2  |  1  |    |  2  |  1  |
|  1  |  2  |    |  1  |  2  |
|  4  |  3  |    |  4  |  3  |
|  3  |  4  |    |  3  |  4  |
+-----+-----+    +-----+-----+

MD5 did the trick and solved the problem. See the BIT_XOR result for MD5 given above: the function is able to identify the distinct rows in the tables and produces different checksum values. MD5 (Message-Digest algorithm 5) is a well-known cryptographic hash function with a 128-bit hash value. MD5 is widely used in security-related applications and is also frequently used to check integrity, but MD5() and SHA1() are very CPU-intensive, with slower checksumming once chunk time is taken into account.

 

