I recently worked through an issue with a client trying to detect data drift across some servers that were located in different timezones. Unfortunately, several of the tables had timestamp fields and were set to a default value of CURRENT_TIMESTAMP. From the manual, here is how MySQL handles timezone locality with timestamp fields:
Values for TIMESTAMP
columns are converted from the current time zone to UTC for storage, and from UTC to the current time zone for retrieval.
This can result in every row being deemed out of sync when each server is using in it’s own timezone. Fortunately, there is a workaround for this as a result of how pt-table-checksum/pt-table-sync calculate their checksums. Lets look at an example:
Node1 > PDT > select timediff(now(),convert_tz(
-> now(),
-> @@session.time_zone,'+00:00'
-> )) as tz_offset;
+-----------+
| tz_offset |
+-----------+
| -07:00:00 |
+-----------+
Node2 > EDT > select timediff(now(),convert_tz(
-> now(),
-> @@session.time_zone,'+00:00'
-> )) as tz_offset;
+-----------+
| tz_offset |
+-----------+
| -04:00:00 |
+-----------+
Node1 is configured in PDT time while node2 is configured in EDT time. Now, lets insert some rows into a table with timestamp field configured to use CURRENT_TIMESTAMP as the default and verify that they look different when no timezone manipulation takes place:
Node1 > PDT > select * from foo.bar;
+----+-------------+---------------------+
| id | data | date_created |
+----+-------------+---------------------+
| 1 | node1, row1 | 2012-09-24 09:38:49 |
| 2 | node1, row2 | 2012-09-24 09:38:49 |
| 3 | node1, row3 | 2012-09-24 09:38:49 |
+----+-------------+---------------------+
Node2 > EDT > select * from foo.bar;
+----+-------------+---------------------+
| id | data | date_created |
+----+-------------+---------------------+
| 1 | node1, row1 | 2012-09-24 12:38:49 |
| 2 | node1, row2 | 2012-09-24 12:38:49 |
| 3 | node1, row3 | 2012-09-24 12:38:49 |
+----+-------------+---------------------+
So, when we run pt-table-checksum against node1, we see that even though MySQL is storing these values in UTC internally, we have “data drift” on both nodes:
pt-table-checksum –replicate=percona.checksums –databases=foo h=localhost
Node1 > PDT > select this_crc, master_crc from percona.checksums;
+----------+------------+
| this_crc | master_crc |
+----------+------------+
| 763d97f1 | 763d97f1 |
+----------+------------+
Node2 > EDT > select this_crc, master_crc from percona.checksums;
+----------+------------+
| this_crc | master_crc |
+----------+------------+
| b890c395 | 763d97f1 |
+----------+------------+
Naturally, pt-table-sync finds this and reports that all 3 rows in this chunk are different and gives the REPLACE INTO statements bring node2 in sync:
[root@test-master mbenshoof]# pt-table-sync --print --replicate percona.checksums --sync-to-master h=node2
REPLACE INTO `foo`.`bar`(`id`, `data`, `date_created`) VALUES ('1', 'node1, row1', '2012-09-24 09:38:49') /*percona-toolkit src_db:foo src_tbl:bar src_dsn:P=3306,h=192.168.1.45,p=...,u=percona dst_db:foo dst_tbl:bar dst_dsn:h=192.168.1.46,p=...,u=percona lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1821 user:root host:test-master*/;
REPLACE INTO `foo`.`bar`(`id`, `data`, `date_created`) VALUES ('2', 'node1, row2', '2012-09-24 09:38:49') /*percona-toolkit src_db:foo src_tbl:bar src_dsn:P=3306,h=192.168.1.45,p=...,u=percona dst_db:foo dst_tbl:bar dst_dsn:h=192.168.1.46,p=...,u=percona lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1821 user:root host:test-master*/;
REPLACE INTO `foo`.`bar`(`id`, `data`, `date_created`) VALUES ('3', 'node1, row3', '2012-09-24 09:38:49') /*percona-toolkit src_db:foo src_tbl:bar src_dsn:P=3306,h=192.168.1.45,p=...,u=percona dst_db:foo dst_tbl:bar dst_dsn:h=192.168.1.46,p=...,u=percona lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1821 user:root host:test-master*/;
This is where we can use the @@session.time_zone variable to our advantage. If both servers were set up using UTC as the timezone (which is the recommended practice – store everything in UTC and then perform the timezone logic in the application), we wouldn’t see any data drift. And as explained above, the timestamps are actually stored in UTC on the server regardless of the timezone. So, to overcome the timezone inconsistencies, we can use the –set-vars option with pt-table-checksum like this:
pt-table-checksum –replicate percona.checksums –databases=foo –set-vars=”time_zone=’+00:00′” h=localhost
This effectively sets both servers to UTC (in terms of display logic) and the checksums are calculated in the same locality:
Node1 > PDT > select this_crc, master_crc from percona.checksums;
+----------+------------+
| this_crc | master_crc |
+----------+------------+
| 4cdfe786 | 4cdfe786 |
+----------+------------+
Node2 > EDT > select this_crc, master_crc from percona.checksums;
+----------+------------+
| this_crc | master_crc |
+----------+------------+
| 4cdfe786 | 4cdfe786 |
+----------+------------+
This same option can and should be applied to pt-table-sync as well, especially if the checksums were created without it. Since it uses the checksum table to find the chunks but still recalculates the checksums on the fly to get the actual rows in need of syncing, it will not find those rows even though pt-table-checksum reported the chunk as out of sync. You can verify this yourself looking at the actual SQL generated by pt-table-sync when looking at rows:
SELECT /*foo.bar:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS(‘#’, `id`, `data`, `date_created` + 0, CONCAT(ISNULL(`data`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `foo`.`bar` FORCE INDEX (`PRIMARY`) WHERE (1=1) FOR UPDATE;
When you update your session timezone (SET @@session.time_zone = “+01:00″ for example), you will see different output for this. Experiment with different timezones and each time, the checksum will change. However, the underlying data isn’t touched and remains the same.
In general, it is safest to always use UTC as your timezone for MySQL to prevent this extra logic from being required, but the use of –set-vars can be a very powerful tool when you need to influence the session variables on different servers for whatever reason.