Oct
04
2021
--

Synchronize Tables on the Same Server with pt-table-sync

Synchronize Tables on the Same MySQL Server

Synchronize Tables on the Same MySQL ServerIt is a common use case to synchronize data in two tables inside MySQL servers. This blog post describes one specific case: how to synchronize data between two different tables on the same MySQL server. This could be useful, for example, if you test DML query performance and do not want to affect production data. After few experiments, tables get out of sync and you may need to update the test one to continue working on improving your queries. There are other use cases when you may need to synchronize the content of the two different tables on the same server, and this blog will show you how to do it.

Table Content Synchronization

The industry-standard tool for table content synchronization – pt-table-sync – is designed to synchronize data between different MySQL servers and does not support bulk synchronization between two different databases on the same server yet. If you try it, you will receive an error message:

$ pt-table-sync D=db1 D=db2 --execute --no-check-slave
You specified a database but not a table in D=db1.  Are you trying to sync only tables in the 'db1' database?  If so, use '--databases db1' instead.

However, it is possible to synchronize two individual tables on the same server by providing table names as DSN parameters:

$ pt-table-sync D=db1,t=foo D=db2,t=foo --execute --verbose
# Syncing D=db2,t=foo
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      5      0 GroupBy   03:24:26 03:24:26 2    db1.foo

You may even synchronize two tables in the same database:

$ pt-table-sync D=db2,t=foo D=db2,t=bar --execute --verbose
# Syncing D=db2,t=bar
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      5      0 GroupBy   03:25:34 03:25:34 2    db2.foo

We can use this feature to perform bulk synchronization.

First, we need to prepare a list of tables we want to synchronize:

$ mysql --skip-column-names -se "SHOW TABLES IN db2" > db1-db2.sync

$ cat db1-db2.sync
bar
baz
foo

Then we can invoke the tool as follows:

$ for i in `cat db1-db2.sync`; do pt-table-sync D=db1,t=$i D=db2,t=$i --execute --verbose; done
# Syncing D=db2,t=bar
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 GroupBy   03:31:52 03:31:52 0    db1.bar
# Syncing D=db2,t=baz
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      5      0 GroupBy   03:31:52 03:31:52 2    db1.baz
# Syncing D=db2,t=foo
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 GroupBy   03:31:52 03:31:52 0    db1.foo

If you have multiple database pairs to sync, you can agree on the file name and parse it before looping through table names. For example, if you use pattern

SOURCE_DATABASE-TARGET_DATABASE.sync

  you can use the following loop:

$ for tbls in `ls *.sync`
>   do dbs=`basename -s .sync $tbls`
>   source=${dbs%-*}
>   target=${dbs##*-}
>   for i in `cat $tbls`
>     do pt-table-sync D=$source,t=$i D=$target,t=$i --execute --verbose 
>   done
> done
# Syncing D=cookbook_copy,t=limbs
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      4      0 GroupBy   04:07:07 04:07:07 2    cookbook.limbs
# Syncing D=cookbook_copy,t=limbs_myisam
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      5       0      5      0 GroupBy   04:07:08 04:07:08 2    cookbook.limbs_myisam
# Syncing D=db2,t=bar
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      5      0 GroupBy   04:07:08 04:07:08 2    db1.bar
# Syncing D=db2,t=baz
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      5       0      5      0 GroupBy   04:07:08 04:07:08 2    db1.baz
# Syncing D=db2,t=foo
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      5       0      0      0 GroupBy   04:07:08 04:07:08 2    db1.foo

Note that

pt-table-sync

synchronizes only tables that exist in both databases. It does not create tables that do not exist in the target database and does not remove those that do not exist in the source database. If your schema could be out of sync, you need to synchronize it first.

I used option

--verbose

in all my examples, so you can see what the tool is doing. If you omit this option the tool still is able to synchronize tables on the same server.

Complete the 2021 Percona Open Source Data Management Software Survey

Have Your Say!

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com