Mar
13
2018
--

The Multi-Source GTID Replication Maze

Multi-Source GTID Replication

In this blog post, we’ll look at how to navigate some of the complexities of multi-source GTID replication.

GTID replication is often a real challenge for DBAs, especially if this has to do with multi-source GTID replication. A while back, I came across a really interesting customer environment with shards where multi-master, multi-source, multi-threaded MySQL 5.6 MIXED replication was active. This is a highly complex environment that has both pros and cons, introducing risks as a trade-off for specific customer requirements.

This is the set up of part of this environment:

I started looking into this setup when a statement broke replication between db1 and db10. Replication broke due to a statement executed on a schema that was not present on db10. This also resulted in changes originating from db1 to not being pushed down to db100 as db10, as we stopped the replication thread (for db1 channel).

On the other hand, replication was not stopped on db2 because the schema in question was present on db2. Replication between db2 and db20 was broken as well because the schema was not present in db20.

In order to fix db1->db10 replication, four GTID sets were injected in db10.

Here are some interesting blog posts regarding how to handle/fix GTID replication issues:

After injecting the GTID sets, we started replication again and everything ran fine.

 

After that, we had to check the db2->db20 replication, which, as I’ve already said, was broken as well. In this case, injecting only the first GTID trx into db20 instead of all of those causing issues on db10 was enough!

You may wonder how this is possible. Right? The answer is that the rest of them were replicated from db10 to db20, although the channel was not the same.

Another strange thing is the fact that although the replication thread for the db2->db20 channel was stopped (broken), checking the slave status on db20 showed that Executed_Gtid_Set was moving for all channels even though Retrieved_Gtid_Set for the broken one was stopped! So what was happening there?

This raised my curiosity, so I decided to do some further investigation and created scenarios regarding other strange things that could happen. An interesting one was about the replication filters. In our case, I thought “What would happen in the following scenario … ?”

Let’s say we write a row from db1 to db123.table789. This row is replicated to db10 (let’s say using channel 1) and to db2 (let’s say using channel2). On channel 1, we filter out the db123.% tables, on channel2 we don’t. db1 writes the row and the entry to the binary log. db2 writes the row after reading the entry from the binary log and subsequently writes the entry to its own binary log and replicates this change to db20. This change is also replicated to db10. So now, on db10 (depending on which channel finds the GTID first) it either gets filtered on channel1 and written to its own bin log at just startcommit with any actual DDL/DML removed, or if it is read first on channel2 (db1->db2 and then db20->db10) then it is NOT filtered out and executed instead. Is this correct? It definitely ISN’T!

Points of interest

You can find answers to the above questions in the points of interest listed below. Although it’s not really clear through the official documentation, this is what happens with GTID replication and multi-source GTID replication:

  • As we know GTID sets are unique across all nodes in a given cluster. In multi-source replication, Executed_Gtid_Set is common for all channels. This means that regardless the originating channel, when a GTID transaction is executed it is recorded in all channels’ Executed_Gtid_Set. Although it’s logical (each database is unique, so if a trx is going to affect a database it shouldn’t be tightened to a single channel regardless of the channel it uses), the documentation doesn’t provide much info around this.
  • When we have multi-source, multi-level replication, there are cases where the GTID sets originating from one master can end up on one slave via different replication paths. It’s not clear if it applies any special algorithm (although it doesn’t seem that there could be one), but the preferred method seems to be FIFO. The fastest wins! This means that GTID sets can travel to the slave via different channels, and it’s related to how fast the upper-level slaves can commit changes. In fact, the path doesn’t really matter as it only executes each GTID trx once.
  • Replication filters are global regardless the channel. This means they apply each filter to all channels. This is normal as we can’t define a replication filter per channel. In order to be able to debug such cases, adding a small replication delay per channel seems a good idea.
Nov
04
2015
--

MySQL 5.7: CHANGE REPLICATION FILTER online

shutterstock_253451584MySQL 5.7 has a lot of enhancements and new features. I summarized this list previously in this blog post.

Adding replication filters online is one of the MySQL 5.7 features described in this manual. However, I will describe and summarize a few examples in this blog post.

Filtering replication events is also known as Partial Replication. Partial replication can be done from the master or slave. Filtering events on the master server via binlog-do-db and binlog-ignore-db is not a good idea, as explained in this post. However, if you really need it, partial replication on the slave is a better option. Our CEO Peter Zaitsev wrote a detailed blog post some time back on filtered MySQL replication that you may find useful.

Partial replication works differently for statement-based and row-based replication. You can find details in the manual and on this manual page. It is useful to know how MySQL evaluates partial replication rules.

Prior to MySQL 5.7, adding/changing replication rules required bouncing the MySQL server. In MySQL 5.7 adding/changing replication filter rules becomes an online operation without restarting MySQL server, using the CHANGE REPLICATION FILTER command. Let me share a few examples:

Currently, the slave server runs without filtered replication and this can be verified by the slave status and the last five variables. Replicate_* has an empty value, meaning no replication filter rules are set.

mysql> show slave statusG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000002
          Read_Master_Log_Pos: 351
               Relay_Log_File: centos59-relay-bin.000003
                Relay_Log_Pos: 566
        Relay_Master_Log_File: master-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
.

The master database server contains db1-db4. Let’s replicate only db1 and db2 out of the four databases. This can be done with the help of replicate-do-db option.

mysql> CHANGE REPLICATION FILTER REPLICATE_DO_DB = (db1, db2);
ERROR 3017 (HY000): This operation cannot be performed with a running slave sql thread; run STOP SLAVE SQL_THREAD first

This error is CHANGE REPLICATION FILTER is not supported on a running slave. We have to stop the SQL slave thread first and re-run the command to set replicate-do-db option. Checking the slave status confirmed that the slave now only replicates db1 and db2 out of the four databases from the master.

mysql> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)
mysql> CHANGE REPLICATION FILTER REPLICATE_DO_DB = (db1, db2);
Query OK, 0 rows affected (0.00 sec)
mysql> show slave statusG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000002
          Read_Master_Log_Pos: 505
               Relay_Log_File: centos59-relay-bin.000003
                Relay_Log_Pos: 720
        Relay_Master_Log_File: master-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: db1,db2
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
.

In order to remove that filter, you need an empty value for the filter name; i.e., replicate-do-db as below. The slave status verified that there are no replication filters set against the replicate-do-db variable.

mysql> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.03 sec)
mysql> CHANGE REPLICATION FILTER REPLICATE_DO_DB = ();
Query OK, 0 rows affected (0.00 sec)
mysql> START SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW SLAVE STATUSG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000002
          Read_Master_Log_Pos: 1629
               Relay_Log_File: centos59-relay-bin.000003
                Relay_Log_Pos: 1844
        Relay_Master_Log_File: master-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
.

Moreover, multiple, different replication filters can be set in one command and should be separated with a comma as below:

mysql> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.03 sec)
mysql> CHANGE REPLICATION FILTER
REPLICATE_WILD_DO_TABLE = ('db1.db1_new%'),
REPLICATE_WILD_IGNORE_TABLE =  ('db1.db1_old%');
mysql> START SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW SLAVE STATUSG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000003
          Read_Master_Log_Pos: 448
               Relay_Log_File: centos59-relay-bin.000006
                Relay_Log_Pos: 663
        Relay_Master_Log_File: master-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table: db1.db1_new%
  Replicate_Wild_Ignore_Table: db1.db1_old1%
.

The slave status verifies that there are a couple of replication filters set where db1.db1_new replicates binary log events to slave, which ignores replication events on the slave for db1.db1_old table(s) as per Replicate_Wild_Ignore_Table filter. Also, if the database or table name doesn’t contain any special characters, then it’s not necessary to be quoted as a value for the filter. However, Replicate_Wild_Do_Table and Replicate_Wild_Ignore_Table are string expressions and may contain wild cards, so they must be quoted.

With the CHANGE REPLICATION FILTER command, you can’t set the same filtering rule multiple times, as opposed to behavior where you can set multiple filters for the same rule in my.cnf file by specifying it multiple times. With the CHANGE REPLICATION FILTER command, if you try to set multiple filters for the same rule then only the last rule will be activated and all above rules will be ignored as illustrated in the following example:

mysql> SELECT * FROM db1.db1_old;
Empty set (0.00 sec)
mysql> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.03 sec)
mysql> CHANGE REPLICATION FILTER
REPLICATE_WILD_DO_TABLE = ('db2.db2_tbl1%'),
REPLICATE_WILD_DO_TABLE =  ('db2.db2_tbl2%');
mysql> START SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW SLAVE STATUSG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000003
          Read_Master_Log_Pos: 980
               Relay_Log_File: centos59-relay-bin.000006
                Relay_Log_Pos: 1195
        Relay_Master_Log_File: master-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table: db2.db2_tbl2%
  Replicate_Wild_Ignore_Table:
.

As you can see, db2.db2_tbl1 table is ignored and only the last rule for db2.db2_tbl2 table is activated.
As I mentioned before, to unset filters of any given type you need to set that particular filter to an empty value. The below example will unset Replicate_Wild_Do_Table filter.

mysql> CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ();

However, you may set multiple filters in one command by separating each rule with a comma as in the previous example with Replicate_Do_DB. Let’s set multiple rules for the Replicate_Wild_Do_Table option via the CHANGE REPLICATION FILTER command.

mysql> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.03 sec)
mysql> CHANGE REPLICATION FILTER
REPLICATE_WILD_DO_TABLE = ('db2.db2_tbl1%','db2.db2_tbl2%');
Query OK, 0 rows affected (0.00 sec)
mysql> START SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW SLAVE STATUSG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000008
          Read_Master_Log_Pos: 154
               Relay_Log_File: centos59-relay-bin.000013
                Relay_Log_Pos: 369
        Relay_Master_Log_File: master-bin.000008
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table: db2.db2_tbl1%,db2.db2_tbl2%
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
.
.

Conclusion:
Partial replication is not a great solution in most cases. All of the replicate options replicate-do-table, replicate-ignore-table, replicate-wild-do-table and replicate-wild-ignore-table work differently. You need to use the default database in order to work filtering normally and it behaves differently with a different binlog format. Filters other than replicate-wild-do-table and replicate-wild-ignore-table might not work as expected and events with wild% filters stored procedures and stored functions may be inconsistent.

The post MySQL 5.7: CHANGE REPLICATION FILTER online appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com