This is the second and final tutorial blog post on using the pt-table-checksum / pt-table-sync tools for MySQL replication.
In the first post, I showed you how to use the pt-table-checksum / pt-table-sync tools in a typical MySQL replication setup. In this post, we’ll discuss MySQL replication for more advanced topologies. I will show you how to use these tools in a chained master/slave replication setup in general, and in Percona XtraDB Cluster specifically. (Here is another good post about how to use the pt-table-checksum tool in Percona XtraDB Cluster.)
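Both tools connect with a dedicated MySQL account (checksum_user in the examples that follow). As a rough sketch of how such an account might be set up (verify the exact privilege list against your Percona Toolkit version’s documentation; the percona schema is where the checksums and DSN tables will live):

```sql
CREATE USER 'checksum_user'@'%' IDENTIFIED BY 'checksum_password';
-- pt-table-checksum needs to inspect replication and set binlog_format per session:
GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE ON *.* TO 'checksum_user'@'%';
-- Full access to the schema holding the checksums (and later dsns) tables:
GRANT ALL ON percona.* TO 'checksum_user'@'%';
```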
Let me first show you an example where I tried to run pt-table-checksum / pt-table-sync on a chained master/slave setup. In this example, the setup is Master -> Slave1 -> Slave2, where the master has its binary log format set to STATEMENT, while the chained Slave1 has its binary log format set to ROW.
Master:
mysql> SHOW GLOBAL VARIABLES LIKE 'binlog_format';
+---------------+-----------+
| Variable_name | Value |
+---------------+-----------+
| binlog_format | STATEMENT |
+---------------+-----------+
Slave1:
mysql> SHOW VARIABLES LIKE 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW |
+---------------+-------+
[root@master ~]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
Replica slave1 has binlog_format ROW which could cause pt-table-checksum to break replication. Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation. If you understand the risks, specify --no-check-binlog-format to disable this check.
As soon as I tried to execute pt-table-checksum on this replication topology, it failed because the binary logs use different formats. The pt-table-checksum documentation mentions this issue in the Limitations section.
The problem here is that Slave1 receives changes from its master as statements (pt-table-checksum forces binlog_format=STATEMENT in its session). Slave1 executes the checksum queries, and all seems good up to that point. But Slave1 then writes those changes to its own binary log, in ROW format, to replicate to Slave2. pt-table-checksum only works properly when the checksum queries themselves are replicated as statements; with ROW format, the already-computed rows of the checksums table are simply copied from the chained master, and no checksumming happens on the downstream replica. That is what would happen to Slave2 here, which is why pt-table-checksum refuses to run.
To remedy this problem, we can either skip this check by using --no-check-binlog-format, or change the binary log format to STATEMENT on the chained master (Slave1).
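A minimal sketch of the second option, run on Slave1 (SET GLOBAL only affects new sessions, so the replication SQL thread must be restarted to pick it up):

```sql
-- On Slave1, the chained master in this topology:
STOP SLAVE;
SET GLOBAL binlog_format = 'STATEMENT';
START SLAVE;
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
```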
In our second example, I’m going to run pt-table-checksum on a three-node Percona XtraDB Cluster. Data inconsistencies can occur in Galera Cluster because of human errors or bugs. For this purpose, I inserted a few rows on Node1, after which the test.dummy table data on Node1 and Node2 look like the following:
Node1 & Node2:
mysql> SELECT * FROM test.dummy;
+----+------+
| id | name |
+----+------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
| 5 | e |
+----+------+
5 rows in set (0.00 sec)
Whereas Node3 is missing the last two records, as shown below (I intentionally deleted them with wsrep_on set to OFF, re-enabling it afterward):
Node3:
mysql> SELECT * FROM dummy;
+----+------+
| id | name |
+----+------+
| 1 | a |
| 2 | b |
| 3 | c |
+----+------+
3 rows in set (0.00 sec)
I have filled the DSN table accordingly with checksum user credentials:
mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.10,u=checksum_user,p=checksum_password,P=3306');
mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.14,u=checksum_user,p=checksum_password,P=3306');
mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.169,u=checksum_user,p=checksum_password,P=3306');
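For reference, the percona.dsns table that stores these DSNs follows the simple schema given in the pt-table-checksum documentation (reproduced here as a sketch; check your version’s docs for the canonical definition):

```sql
CREATE TABLE percona.dsns (
  id        INT NOT NULL AUTO_INCREMENT,
  parent_id INT DEFAULT NULL,
  dsn       VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
);
```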
pt-table-checksum on Percona XtraDB Cluster works with a single cluster, where all nodes are cluster nodes and not regular replicas. The example above is a good one: Node1, Node2 and Node3 all belong to a single cluster. Cluster-to-cluster setups aren’t supported. (Refer to the documentation for more details.)
Let’s try to run pt-table-checksum from the authoritative node, Node1:
root@node1:# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-21T08:27:12 0 1 5 1 0 0.045 test.dummy
root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
Differences on node3
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1
pt-table-checksum prints messages during the checksum process saying it can’t check replica lag. The reason is obvious: SHOW SLAVE STATUS is not supported on cluster nodes as it would be in a traditional master -> slave(s) replication setup. Further, pt-table-checksum found the difference on the test.dummy table. Now, pt-table-checksum also supports --recursion-method=cluster, which auto-discovers cluster nodes by querying the wsrep_incoming_addresses status variable. This recursion method works equally well in Percona XtraDB Cluster, and you may use it in place of --recursion-method=dsn. Let’s find the differences again using --recursion-method=cluster:
root@node1:~# pt-table-checksum --empty-replicate-table --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=cluster
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
02-24T02:19:48 0 0 3 1 0 0.017 percona.dsns
02-24T02:19:48 0 1 5 1 0 0.017 test.dummy
root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=cluster
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
Differences on node3
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1
The differences were found on Node3. Now the next big challenge is to synchronize Node3 with the other nodes in the cluster. pt-table-sync is the tool for synchronizing MySQL table data differences, but unfortunately it is not supported on Galera Cluster. I tried a few hacks with pt-table-sync in Percona XtraDB Cluster, but with no luck (as shown below):
Sync test.dummy on node1 to node3:
root@node1:~# pt-table-sync --dry-run h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10
# NOTE: --dry-run does not show if data needs to be synced because it
# does not access, compare or sync data. --dry-run only shows
# the work that would be done.
# Syncing D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user in dry-run mode, without accessing or comparing data
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
# 0 0 0 0 Chunk 09:11:31 09:11:31 0 test.dummy
root@node1:~# pt-table-sync --print h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10
INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:48372 user:root host:node1*/;
INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:48372 user:root host:node1*/;
I tried to run pt-table-sync from Node1 to sync the differences on Node3. I first used --dry-run to make sure everything was good. Later, I passed the --print option to review the queries before actually syncing. In XtraDB Cluster, pt-table-sync prints INSERT/UPDATE/DELETE queries instead of REPLACE queries. Executing pt-table-sync with the --execute option would break the cluster, as those changes would be replicated to all the cluster nodes. Since Node1 and Node2 already have those records in their table, the cluster would break with a “Duplicate Key” error. To work around this problem, use the pt-table-sync --replace option (as shown below):
root@node1:/var/lib/mysql# pt-table-sync --print --replace h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:942 user:root host:node1*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:942 user:root host:node1*/;
When I ran pt-table-sync with --execute and --replace, it broke with deadlock errors (see https://bugs.launchpad.net/percona-toolkit/+bug/1250450). Let’s try to run those REPLACE statements from one of the authoritative nodes (i.e., Node1):
mysql> SHOW BINLOG EVENTS IN 'node4-bin.000002';
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| node4-bin.000002 | 4 | Format_desc | 1 | 120 | Server ver: 5.6.24-72.2-56-log, Binlog ver: 4 |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
1 row in set (0.03 sec)
mysql> source replace.sql;
Query OK, 1 row affected (0.02 sec)
Query OK, 1 row affected (0.00 sec)
mysql> SHOW BINLOG EVENTS IN 'node4-bin.000002';
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| node4-bin.000002 | 4 | Format_desc | 1 | 120 | Server ver: 5.6.24-72.2-56-log, Binlog ver: 4 |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
1 row in set (0.00 sec)
As you can see, the pt-table-sync tool prints REPLACE INTO statements, which seems correct: those will be no-op changes on nodes that already have the data, and replication should sync the unsynchronized nodes. But here is the problem: Galera Cluster replicates through the ROW binary log format, and these REPLACE statements never reach the unsynchronized node. Because the rows already exist on the nodes where the statements are executed (Node1 and Node2 in this case), no row image actually changes, so no binary log event is ever recorded for the REPLACEs.
So how do we synchronize those changes in Galera Cluster? I’ve found a couple of workarounds!
Approach # 1:
Generate a .sql script via the pt-table-sync tool with the --print option, and review the changes that need to occur on the unsynchronized node. Once you are satisfied, you can execute this script directly on the problematic node with binary logging disabled for that particular session: add SET SQL_LOG_BIN=0 as the first line of the script, and re-enable it with SET SQL_LOG_BIN=1 as the last line. This ensures those changes don’t replicate to the other nodes of the cluster.
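The wrapping step can be sketched as a short shell snippet. The file names (sync_rows.sql, sync_node3.sql) are illustrative assumptions, and the sample INSERTs stand in for real pt-table-sync --print output:

```shell
# Sample pt-table-sync --print output (stand-in content for illustration)
cat > sync_rows.sql <<'EOF'
INSERT INTO `test`.`dummy` (`id`, `name`) VALUES ('4', 'd');
INSERT INTO `test`.`dummy` (`id`, `name`) VALUES ('5', 'e');
EOF

# Wrap it so the statements are applied locally only, under a table lock
{
  printf 'SET SQL_LOG_BIN=0;\nLOCK TABLES test.dummy WRITE;\n'
  cat sync_rows.sql
  printf 'UNLOCK TABLES;\nSET SQL_LOG_BIN=1;\n'
} > sync_node3.sql
```

The resulting sync_node3.sql can then be reviewed and fed to the mysql client on the problematic node only.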
NOTE: This could be risky if you forget to disable binary logging (i.e., SET SQL_LOG_BIN=0) for the synchronization process, because, as previously explained, the changes will replicate to all nodes, and all the remaining nodes in the cluster will go down (as those rows already exist on the other nodes). So, be careful with this method!
Also, it’s important to mention that manually syncing with an SQL file is not an atomic operation. Between the time the SQL file is generated and the time it is executed, there is a chance that the rows in question could change. To deal with this, take a LOCK TABLE tablename WRITE lock first and release it with UNLOCK TABLES after the syncing process. In fact, the most secure method is to stop writes to the table that needs to be synchronized on all the nodes before running pt-table-sync, and resume writes after the sync operation for the target table completes. One other important thing: if there is a vast number of rows to sync via this method, it makes sense to enable wsrep_desync (SET GLOBAL wsrep_desync=ON) until the node is synchronized, and then disable it again. During the synchronization the node may lag behind, and with wsrep_desync enabled, flow control will not throttle the cluster on behalf of the desynced node.
mysql> SHOW STATUS LIKE 'wsrep_cluster_size%';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
mysql> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)
mysql> LOCK TABLE dummy WRITE;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.14,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:3800 user:root host:node3*/;
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.14,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:3800 user:root host:node3*/;
Query OK, 1 row affected (0.00 sec)
mysql> select * from test.dummy;
+----+------+
| id | name |
+----+------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
| 5 | e |
+----+------+
5 rows in set (0.00 sec)
mysql> SHOW STATUS LIKE 'wsrep_cluster_size%';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.01 sec)
mysql> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)
mysql> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0.00 sec)
Approach # 2:
Another approach is to rebuild the database table in question. I got this idea from Jervin’s impressive post. You need to use the pt-online-schema-change tool with a no-op ALTER, run from one of the authoritative (correctly synchronized) nodes of the cluster, as below:
root@node2:~# pt-online-schema-change --execute --alter 'ENGINE=INNODB' D=test,t=dummy
No slaves found. See --recursion-method if host node4 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering `test`.`dummy`...
Creating new table...
Created new table test._dummy_new OK.
Altering new table...
Altered `test`.`_dummy_new` OK.
2016-02-11T01:23:22 Creating triggers...
2016-02-11T01:23:22 Created triggers OK.
2016-02-11T01:23:22 Copying approximately 5 rows...
2016-02-11T01:23:22 Copied rows OK.
2016-02-11T01:23:22 Swapping tables...
2016-02-11T01:23:22 Swapped original and new tables OK.
2016-02-11T01:23:22 Dropping old table...
2016-02-11T01:23:22 Dropped old table `test`.`_dummy_old` OK.
2016-02-11T01:23:22 Dropping triggers...
2016-02-11T01:23:22 Dropped triggers OK.
Successfully altered `test`.`dummy`.
Once the table is rebuilt, ROW images will be copied to the node needing synchronization. It might be a good idea to use the --dry-run option before doing the actual changes with --execute.
You’ll need to decide which of these approaches to choose: while the second approach seems a much safer option, it may take some time to complete. If the table is gigabytes or terabytes in size and the differing rows are few compared to the table size, then the first approach could be faster (but riskier).
In the next example, I have added an async node as a replica of Node1, where Node1 acts as the master server for the async slave. The setup looks like this:
node1 <-> node2 <-> node3
|
|
+-> replica (async)
pt-table-checksum can only detect differences on a replica from its master node (i.e., Node1). In this case, the tool needs to be executed from the master node (Node1). If you run pt-table-checksum from Node2 or Node3, it won’t detect diffs on the replica, because the replica’s master node is Node1. You can find details about this in the documentation.
Let’s see all this in the examples below. In this test, I’m going to run the pt-table-checksum tool from Node1, the master of the async slave, to see if it can find the differences on the async replica node:
root@node1:~# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
09-03T03:02:48 0 0 3 1 0 0.022 percona.dsns
09-03T03:02:48 0 1 5 1 0 0.041 test.dummy
root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password
Differences on async1
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1
The pt-table-checksum tool can find the differences on the async node’s test.dummy table. Let’s try to sync those diffs with the help of the pt-table-sync tool:
root@async1:~# pt-table-sync --dry-run --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy
# NOTE: --dry-run does not show if data needs to be synced because it
# does not access, compare or sync data. --dry-run only shows
# the work that would be done.
# Syncing D=test,h=localhost,p=...,t=dummy,u=checksum_user in dry-run mode, without accessing or comparing data
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
# 0 0 0 0 Chunk 08:38:55 08:38:55 0 test.dummy
root@async1:~# pt-table-sync --print --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,P=3306,h=10.0.3.14,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:49118 user:root host:async1*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,P=3306,h=10.0.3.14,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:49118 user:root host:async1*/;
root@async1:~# pt-table-sync --execute --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy
Deadlock found when trying to get lock; try restarting transaction at line 6115 while doing test.dummy on localhost
As you can see, I executed the pt-table-sync tool from the async node to find the exact differences, and it printed the two rows that were missing from the dummy table on the async replica. However, when executed with the --execute option, the tool fails with a “Deadlock” error; in reality, no deadlock shows up in the InnoDB status. In order to sync the slave with its master, you can take one of the approaches described above: either execute the .sql script directly on the slave server, or rebuild the table on the master Galera node so the rebuild replicates to the slaves.
Let me show you one final example, where I tried to run pt-table-checksum from Node3 to find the differences on the async slave whose master host is Node1:
root@node3:~# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
node3 is a cluster node but no other nodes or regular replicas were found. Use --recursion-method=dsn to specify the other nodes in the cluster.
Here, Node3 refused to look for differences because there is no direct replica attached to it. The documentation mentions this behavior: in a cluster environment, differences can only be detected from the master node directly connected to the async slave (as is the case here with Node1 and the async replica). In order to run a checksum from Node3 and check for differences between cluster nodes, use the --recursion-method options mentioned in the previous examples.
Conclusion:
Galera Cluster doesn’t support pt-table-sync in general, but we’ve found a couple of workarounds for this problem:
- You can sync the changes manually by redirecting pt-table-sync --print output to a file to generate a .sql template; review the changes carefully and execute the script on the nodes with missing data. Again, this approach is fast but potentially dangerous.
- Another option is the no-op ALTER method, using pt-online-schema-change to rebuild the table. It’s the recommended method, but it could be slower.
Finally, after synchronizing the data changes, it is a good idea to re-run the checksum tool to verify there are no remaining discrepancies.