It is a common use case to synchronize data in two tables inside MySQL servers. This blog post describes one specific case: how to synchronize data between two different tables on the same MySQL server. This could be useful, for example, if you test DML query performance and do not want to affect production data. After few experiments, tables get out of sync and you may need to update the test one to continue working on improving your queries. There are other use cases when you may need to synchronize the content of the two different tables on the same server, and this blog will show you how to do it.
Table Content Synchronization
The industry-standard tool for table content synchronization – pt-table-sync – is designed to synchronize data between different MySQL servers and does not support bulk synchronization between two different databases on the same server yet. If you try it, you will receive an error message:
$ pt-table-sync D=db1 D=db2 --execute --no-check-slave You specified a database but not a table in D=db1. Are you trying to sync only tables in the 'db1' database? If so, use '--databases db1' instead.
However, it is possible to synchronize two individual tables on the same server by providing table names as DSN parameters:
$ pt-table-sync D=db1,t=foo D=db2,t=foo --execute --verbose # Syncing D=db2,t=foo # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # 0 0 5 0 GroupBy 03:24:26 03:24:26 2 db1.foo
You may even synchronize two tables in the same database:
$ pt-table-sync D=db2,t=foo D=db2,t=bar --execute --verbose # Syncing D=db2,t=bar # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # 0 0 5 0 GroupBy 03:25:34 03:25:34 2 db2.foo
We can use this feature to perform bulk synchronization.
First, we need to prepare a list of tables we want to synchronize:
$ mysql --skip-column-names -se "SHOW TABLES IN db2" > db1-db2.sync $ cat db1-db2.sync bar baz foo
Then we can invoke the tool as follows:
$ for i in `cat db1-db2.sync`; do pt-table-sync D=db1,t=$i D=db2,t=$i --execute --verbose; done # Syncing D=db2,t=bar # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # 0 0 0 0 GroupBy 03:31:52 03:31:52 0 db1.bar # Syncing D=db2,t=baz # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # 0 0 5 0 GroupBy 03:31:52 03:31:52 2 db1.baz # Syncing D=db2,t=foo # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # 0 0 0 0 GroupBy 03:31:52 03:31:52 0 db1.foo
If you have multiple database pairs to sync, you can agree on the file name and parse it before looping through table names. For example, if you use pattern
SOURCE_DATABASE-TARGET_DATABASE.sync
you can use the following loop:
$ for tbls in `ls *.sync` > do dbs=`basename -s .sync $tbls` > source=${dbs%-*} > target=${dbs##*-} > for i in `cat $tbls` > do pt-table-sync D=$source,t=$i D=$target,t=$i --execute --verbose > done > done # Syncing D=cookbook_copy,t=limbs # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # 0 0 4 0 GroupBy 04:07:07 04:07:07 2 cookbook.limbs # Syncing D=cookbook_copy,t=limbs_myisam # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # 5 0 5 0 GroupBy 04:07:08 04:07:08 2 cookbook.limbs_myisam # Syncing D=db2,t=bar # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # 0 0 5 0 GroupBy 04:07:08 04:07:08 2 db1.bar # Syncing D=db2,t=baz # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # 5 0 5 0 GroupBy 04:07:08 04:07:08 2 db1.baz # Syncing D=db2,t=foo # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # 5 0 0 0 GroupBy 04:07:08 04:07:08 2 db1.foo
Note that
pt-table-sync
synchronizes only tables that exist in both databases. It does not create tables that do not exist in the target database and does not remove those that do not exist in the source database. If your schema could be out of sync, you need to synchronize it first.
I used option
--verbose
in all my examples, so you can see what the tool is doing. If you omit this option the tool still is able to synchronize tables on the same server.
Complete the 2021 Percona Open Source Data Management Software Survey