May
06
2014
--

Row-based replication, MySQL 5.6 upgrades and temporal data types

Whither your rollback plan?

MySQL 5.6 upgrades are in full swing these days and knowing how to safely upgrade from MySQL 5.5 to 5.6 is important. When upgrading a replication environment, it’s important that you can build a migration plan that safely allows for your upgrade with minimal risk — rollback is often a very important component to this.

For many people this means upgrading slaves first and then the master.  The strategy of an older master replicating to a newer slave is well known and has been supported in MySQL replication for a very long time.  To be specific:  you can have a MySQL 5.6 slave of a 5.5 master and this should work fine until you upgrade your master and/or promote one of the slaves to be the master.

However, there are those of us who like to live on the edge and do unsupported things.  Suppose that when you cut over to that MySQL 5.6 master your application completely breaks.  What would your rollback plan be?   In such a case, leaving a 5.5 slave of the new 5.6 master (or perhaps a dual-master setup with 5.5 and 5.6) would be useful to allow you to rollback to but still have the data written on the 5.6 master.

What might break?

With Statement-based replication (SBR), you are generally ok with this type of setup, provided you aren’t doing any MySQL 5.6 syntax-specific things until you don’t have any more 5.5 slaves.  However, with Row-based replication (RBR), things are a bit trickier, particularly when column formats change.

Now, one nice new feature of MySQL 5.6 is the improvement of the storage requirements for DATETIME fields as well as the addition of fractional second support for TIME, DATETIME, and TIMESTAMP.   This is great, but unfortunately this is a new column format that 5.5 clearly would not understand.  Does this put our 5.6 to 5.5 replication in jeopardy?    The answer is, if we’re careful, NO.

Quite simply, MySQL 5.6 supports both old and new types and mysql_upgrade does not make such a conversion on existing tables.  Only NEW tables or REBUILT tables in 5.6 will use the new format.  Any tables from 5.5 with a simple mysql_upgrade to 5.6 will still be using the old types.  For more information on how to find columns in 5.6 that are using the old format, see Ike Walker’s excellent blog post on the topic.  (Thanks Ike!)

An imperfect test

To test this out, I created a simple experiment.  I have a master and slave using RBR, both on 5.5, and I setup pt-heartbeat to update the master.  I realized that pt-heartbeat actually uses a varchar for the timestamp field — I suspect this makes multiple database support easier.  However, since pt-heartbeat’s update uses a NOW() to populate that field, I can convert it to a DATETIME:

[root@master ~]# pt-heartbeat --update  --database percona --create-table
CREATE TABLE `heartbeat` (
  `ts` varchar(26) NOT NULL,
  `server_id` int(10) unsigned NOT NULL,
  `file` varchar(255) DEFAULT NULL,
  `position` bigint(20) unsigned DEFAULT NULL,
  `relay_master_log_file` varchar(255) DEFAULT NULL,
  `exec_master_log_pos` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`server_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
master mysql> alter table heartbeat drop column ts, add column ts DATETIME;
slave mysql> select * from heartbeat\G
 *************************** 1. row ***************************
             server_id: 1
                  file: master-bin.000002
              position: 5107583
 relay_master_log_file: NULL
   exec_master_log_pos: NULL
                    ts: 2014-05-02 17:03:59
 1 row in set (0.00 sec)
 CREATE TABLE `heartbeat` (
   `server_id` int(10) unsigned NOT NULL,
   `file` varchar(255) DEFAULT NULL,
   `position` bigint(20) unsigned DEFAULT NULL,
   `relay_master_log_file` varchar(255) DEFAULT NULL,
   `exec_master_log_pos` bigint(20) unsigned DEFAULT NULL,
   `ts` datetime DEFAULT NULL,
   PRIMARY KEY (`server_id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1 

So my heartbeat table now has a 5.5 DATETIME, pt-heartbeat is working properly, and the heartbeat is replicating to the slave.  Now I will upgrade my master to MySQL 5.6:

[root@master ~]# rpm -e Percona-Server-devel-55-5.5.36-rel34.2.el6.x86_64 Percona-Server-shared-55-5.5.36-rel34.2.el6.x86_64 Percona-Server-client-55-5.5.36-rel34.2.el6.x86_64 Percona-Server-server-55-5.5.36-rel34.2.el6.x86_64 --nodeps
[root@master ~]# yum install Percona-Server-server-56.x86_64
==============================================================================================================
Package                             Arch              Version                       Repository          Size
==============================================================================================================
Installing:
Percona-Server-server-56            x86_64            5.6.16-rel64.2.el6            Percona             19 M
Installing for dependencies:
Percona-Server-client-56            x86_64            5.6.16-rel64.2.el6            Percona            6.8 M
Percona-Server-shared-56            x86_64            5.6.16-rel64.2.el6            Percona            712 k
Transaction Summary
==============================================================================================================
Install       3 Package(s)
...
[root@master ~]# service mysql start
Starting MySQL (Percona Server)....... SUCCESS!
[root@master ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.16-64.2-56-log Percona Server (GPL), Release 64.2, Revision 569
[root@master ~]# mysql_upgrade
Looking for 'mysql' as: mysql
Looking for 'mysqlcheck' as: mysqlcheck
Running 'mysqlcheck with default connection arguments
Running 'mysqlcheck with default connection arguments
mysql.columns_priv                                 OK
mysql.db                                           OK
mysql.event                                        OK
mysql.func                                         OK
mysql.general_log                                  OK
mysql.help_category                                OK
mysql.help_keyword                                 OK
mysql.help_relation                                OK
mysql.help_topic                                   OK
mysql.host                                         OK
mysql.ndb_binlog_index                             OK
mysql.plugin                                       OK
mysql.proc                                         OK
mysql.procs_priv                                   OK
mysql.proxies_priv                                 OK
mysql.servers                                      OK
mysql.slow_log                                     OK
mysql.tables_priv                                  OK
mysql.time_zone                                    OK
mysql.time_zone_leap_second                        OK
mysql.time_zone_name                               OK
mysql.time_zone_transition                         OK
mysql.time_zone_transition_type                    OK
mysql.user                                         OK
Running 'mysql_fix_privilege_tables'...
Running 'mysqlcheck with default connection arguments
Running 'mysqlcheck with default connection arguments
percona.heartbeat                                  OK
OK

I can now verify that Ike’s INFORMATION_SCHEMA queries correctly detect the ‘heartbeat.ts’ column as the old format:

master mysql> select t.table_schema,t.engine,t.table_name,c.column_name,c.column_type
from information_schema.tables t
  inner join information_schema.columns c on c.table_schema = t.table_schema and c.table_name = t.table_name
  left outer join information_schema.innodb_sys_tables ist on ist.name = concat(t.table_schema,'/',t.table_name)
  left outer join information_schema.innodb_sys_columns isc on isc.table_id = ist.table_id and isc.name = c.column_name
where c.column_type in ('time','timestamp','datetime')
  and t.table_schema not in ('mysql','information_schema','performance_schema')
  and t.table_type = 'base table'
  and (t.engine != 'innodb' or (t.engine = 'innodb' and isc.mtype = 6))
order by t.table_schema,t.table_name,c.column_name;
 +--------------+--------+------------+-------------+-------------+
 | table_schema | engine | table_name | column_name | column_type |
 +--------------+--------+------------+-------------+-------------+
 | percona      | InnoDB | heartbeat  | ts          | datetime    |
 +--------------+--------+------------+-------------+-------------+
 1 row in set (0.04 sec)

To make replication work from MySQL 5.6 to 5.5, I also had to add a few backwards compatibility options on the master:

log_bin_use_v1_row_events = ON
binlog_checksum = NONE

Once I fixed that up, I can verify my slave is still working after this and receiving heartbeats. Clearly the new formats are not a show-stopper for backwards replication compatibility.

slave mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.70.2
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000005
          Read_Master_Log_Pos: 120
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 267
        Relay_Master_Log_File: master-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
master mysql> select * from heartbeat;
+-----------+-------------------+----------+-----------------------+---------------------+---------------------+
| server_id | file              | position | relay_master_log_file | exec_master_log_pos | ts                  |
+-----------+-------------------+----------+-----------------------+---------------------+---------------------+
|         1 | master-bin.000002 |  5115935 | NULL                  |                NULL | 2014-05-02 17:04:23 |
+-----------+-------------------+----------+-----------------------+---------------------+---------------------+
1 row in set (0.01 sec)
slave mysql> select * from heartbeat;
+-----------+-------------------+----------+-----------------------+---------------------+---------------------+
| server_id | file              | position | relay_master_log_file | exec_master_log_pos | ts                  |
+-----------+-------------------+----------+-----------------------+---------------------+---------------------+
|         1 | master-bin.000002 |  5115935 | NULL                  |                NULL | 2014-05-02 17:04:23 |
+-----------+-------------------+----------+-----------------------+---------------------+---------------------+
1 row in set (0.00 sec)

But, if I’m not careful on MySQL 5.6, and rebuild the table, the new format does clearly bite me:

master mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)
master mysql> alter table percona.heartbeat force;
Query OK, 1 row affected, 1 warning (0.18 sec)
Records: 1  Duplicates: 0  Warnings: 1
master mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------+
| Level | Code | Message                                                                             |
+-------+------+-------------------------------------------------------------------------------------+
| Note  | 1880 | TIME/TIMESTAMP/DATETIME columns of old format have been upgraded to the new format. |
+-------+------+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
slave mysql> show slave status\G
*************************** 1. row ***************************
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
...
                   Last_Errno: 1677
                   Last_Error: Column 5 of table 'percona.heartbeat' cannot be converted from type '' to type 'datetime'
...
               Last_SQL_Errno: 1677
               Last_SQL_Error: Column 5 of table 'percona.heartbeat' cannot be converted from type '' to type 'datetime'
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)

TL;DR

What does all this teach us?

While the MySQL version is important, for RBR what matters most is the actual current format for each column.  Your master and slave(s) MUST have the same column formats for RBR to work right.

So, the new temporal formats do not necessarily break RBR replication back to 5.5, provided:

  • All base MySQL 5.6 enhancements to replication are disabled (GTIDs, binlog checksums and the RBR v2 format)
  • Tables with temporal formats are preserved in their 5.5 formats until all 5.5 nodes are retired.
  • You can avoid creating any new tables on the MySQL 5.6 master with temporal formats

However, I want  to make it clear that MySQL 5.6 to 5.5 replication is technically unsupported.  I have not exhausted all possibilities for problems with 5.6 to 5.5 RBR replication, just this specific one. If you choose to make an upgrade strategy that relies on backwards replication in this way, be prepared for it to not work and test it thoroughly in advance.  The purpose of this post is to simply point out that data type formats, in and of themselves, do not necessarily break RBR backwards compatibility.

The post Row-based replication, MySQL 5.6 upgrades and temporal data types appeared first on MySQL Performance Blog.

Jul
23
2013
--

Another reason why SQL_SLAVE_SKIP_COUNTER is bad in MySQL

It is everywhere in the world of MySQL that if your replication is broken because an event caused a duplicate key or a row was not found and it cannot be updated or deleted, then you can use ‘

STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

 ’ and be done with it. In some cases this is fine and you can repair the offending row or statements later on. But what if the statement is part of a multi-statement transaction? Well, then it becomes more interesting, because skipping the offending statement will cause the whole transaction to be skipped. This is well documented in the manual by the way. So here’s a quick example.

3 rows on the master:

master> select * from t;
+----+-----+
| id | pid |
+----+-----+
|  1 |   1 |
|  2 |   2 |
|  3 |   3 |
+----+-----+
3 rows in set (0.00 sec)

2 on the slave:

slave> select * from t;
+----+-----+
| id | pid |
+----+-----+
|  1 |   1 |
|  3 |   3 |
+----+-----+
2 rows in set (0.00 sec)

Execute a transaction on the master to break replication:

master> BEGIN;
Query OK, 0 rows affected (0.00 sec)
master> DELETE FROM t WHERE id = 1;
Query OK, 1 row affected (0.00 sec)
master> DELETE FROM t WHERE id = 2;
Query OK, 1 row affected (0.00 sec)
master> DELETE FROM t WHERE id = 3;
Query OK, 1 row affected (0.00 sec)
master> COMMIT;
Query OK, 0 rows affected (0.01 sec)

Broken slave:

slave> show slave status \G
*************************** 1. row ***************************
...
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Delete_rows event on table test.t; Can't find record in 't', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000002, end_log_pos 333
...
1 row in set (0.00 sec)

An attempt to fix replication only caused bigger inconsistencies on slave:

slave> STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;
Query OK, 0 rows affected (0.00 sec)
slave> select * from t;
+----+-----+
| id | pid |
+----+-----+
|  1 |   1 |
|  3 |   3 |
+----+-----+
2 rows in set (0.00 sec)

This happens because the replication honors transaction boundaries, and is definitely something you should consider the next time you try to use this workaround on a broken slave. Of course, there is pt-table-checksum and pt-table-sync to rescue you when inconsistencies occur, however, prevention is always better than cure. Make sure to put safeguards in place to prevent your slaves from drifting.

Lastly, the example above is for ROW-based replication as my colleague pointed out, but can similarly happen with STATEMENT for example with a duplicate key error.  You can optionally fix the error above by temporarily setting slave_exec_mode to IDEMPOTENT so errors because of missing rows are skipped, but then again, it does not apply in all cases like an UPDATE statement that cannot be applied because the row on the slave is missing.

Here is a demonstration of the problem with STATEMENT-based replication:

master> select * from t;
+----+-----+
| id | pid |
+----+-----+
|  4 |   1 |
|  6 |   3 |
+----+-----+
2 rows in set (0.00 sec)
slave> select * from t;
+----+-----+
| id | pid |
+----+-----+
|  4 |   1 |
|  5 |   2 |
|  6 |   3 |
+----+-----+
3 rows in set (0.00 sec)
master> BEGIN;
Query OK, 0 rows affected (0.00 sec)
master> delete from t where id = 4;
Query OK, 1 row affected (0.00 sec)
master> insert into t values (5,2);
Query OK, 1 row affected (0.00 sec)
master> delete from t where id = 6;
Query OK, 1 row affected (0.00 sec)
master> COMMIT;
Query OK, 0 rows affected (0.15 sec)
slave> show slave status \G
*************************** 1. row ***************************
...
               Last_SQL_Errno: 1062
               Last_SQL_Error: Error 'Duplicate entry '5' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into t values (5,2)'
...
1 row in set (0.00 sec)
slave> stop slave; set global sql_slave_skip_counter = 1; start slave;
Query OK, 0 rows affected (0.05 sec)
slave> select * from t;
+----+-----+
| id | pid |
+----+-----+
|  4 |   1 |
|  5 |   2 |
|  6 |   3 |
+----+-----+
3 rows in set (0.00 sec)

 

The post Another reason why SQL_SLAVE_SKIP_COUNTER is bad in MySQL appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com