This past week was marked by a series of personal findings related to the use of Global Transaction IDs (GTIDs) on Galera-based clusters such as Percona XtraDB Cluster (PXC). The main one is the fact that transactions touching MyISAM tables (and FLUSH PRIVILEGES!) issued on a given node of the cluster are recorded in a GTID set bearing the node’s server_uuid as “source id” and added to the binary log (if the node has binlog enabled), thus being replicated to any async replicas connected to it. However, they won’t be replicated across the cluster. All of this is by design, as long as wsrep_replicate_myisam is disabled, which it is by default.
My colleague Stéphane covered this story in one of his recent blog posts, “Percona XtraDB Cluster 5.6: a tale of 2 GTIDs,” explaining how those local (in reference to the node) transactions are saved in a GTID set different from the cluster’s main one and the impact this may have when re-pointing an async slave to replicate from a different node.
GTIDs are a feature introduced in MySQL 5.6 that made replication management much easier, and considering there’s a series of advantages in having an async replica attached to a PXC cluster, why hadn’t this popped up earlier for either of us? I guess there aren’t that many people using GTIDs with Galera-based clusters yet, so here’s a post showing you how to do it.
Initializing a PXC cluster configured with GTIDs
My testing environment for a 3-node cluster is composed of node1 (192.168.70.2), node2 (.3) and node3 (.4). All of them have the same PXC binaries installed:
$ rpm -qa |grep -i percona-xtradb-cluster
Percona-XtraDB-Cluster-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-galera-3-3.8-1.3390.rhel6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-client-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.21-25.8.938.el6.x86_64
and are configured with an almost identical /etc/my.cnf (apart from wsrep_node_address and the prompt line):
[mysql]
prompt='mysql {node1} > '

[mysqld]
datadir = /var/lib/mysql
log_warnings=2
server_id=1
log_bin=percona-bin
log_slave_updates
binlog_format = ROW
enforce_gtid_consistency=1
gtid_mode=on
wsrep_cluster_name = my-three-node-cluster
wsrep_cluster_address = gcomm://192.168.70.2,192.168.70.3,192.168.70.4
wsrep_node_address = 192.168.70.2
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sst:secret
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2
server_id, log_bin, log_slave_updates and binlog_format are not needed for the cluster to operate, but they are required to set up regular replication later on, so I’ve added them to leave the nodes ready to operate as masters.
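If you later want to double-check that a running node picked up these replication- and GTID-related settings, a query along these lines (just a sanity check of mine, not part of the original walkthrough) will do:

mysql [node1] > SHOW GLOBAL VARIABLES WHERE Variable_name IN ('server_id', 'log_bin', 'log_slave_updates', 'binlog_format', 'gtid_mode', 'enforce_gtid_consistency');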
We start with an empty, nonexistent datadir on all nodes, so I’ll use the mysql_install_db script to create a base datadir with everything MySQL needs to work on node1, which will be the reference node of the cluster:
[node1]$ mysql_install_db --user=mysql
We’re now ready to bootstrap the cluster from this reference node:
[node1]$ service mysql bootstrap-pxc
With that, we have an operational reference node:
mysql [node1] > select @@server_uuid;
+--------------------------------------+
| @@server_uuid                        |
+--------------------------------------+
| 834bca7f-b45e-11e4-a7b5-0800272e951c |
+--------------------------------------+
1 row in set (0.00 sec)
Note from the my.cnf settings above that I’ve chosen xtrabackup-v2 as the State Snapshot Transfer (SST) method, which requires authentication (wsrep_sst_auth). For this reason, if we now try to start MySQL on node2 it will fail, with its error log showing:
2015-02-14 16:58:26 24149 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.70.3' --auth 'sst:secret' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '24149' --binlog 'percona-bin' : 1 (Operation not permitted)
2015-02-14 16:58:26 24149 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-02-14 16:58:26 24149 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2015-02-14 16:58:26 24149 [ERROR] Aborting
The problem here is that the SST method requires MySQL authentication: even though the credentials are shared in the wsrep_sst_auth variable of each node’s my.cnf configuration file, this only tells xtrabackup-v2 which credentials to use; it doesn’t actually create the account in MySQL. That’s a step left for us:
mysql [node1] > GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sst'@'localhost' IDENTIFIED BY 'secret';
Query OK, 0 rows affected (0.02 sec)
And this constitutes our very first committed transaction, which goes into the cluster’s GTID set:
mysql [node1] > select @@global.gtid_executed;
+----------------------------------------+
| @@global.gtid_executed                 |
+----------------------------------------+
| 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1 |
+----------------------------------------+
1 row in set (0.00 sec)
This GRANT only needs to be issued once, on the reference node. Now you can start MySQL on node2 and node3: they will use XtraBackup to take a backup of node1’s dataset, which will contain this GRANT, and restore it in their own datadir:
[node2]$ service mysql start
and:
[node3]$ service mysql start
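Once node2 and node3 have joined, a quick sanity check I like to run (a suggestion of mine, not a required step) is to confirm from any node that the cluster now counts three members:

mysql [node1] > SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';

The Value column should report 3.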
OK, that’s done. But how do I attach an async replica to the cluster?
Just to make it clear, you cannot attach an async replica to the cluster as a whole: you need to choose a node that is a member of the cluster and make it the master for the async replica. Considering all nodes should have the same data replicated, it should be easy to change the async replica’s configuration and make it replicate from any other node in the cluster – and it is, though you may need to do some adjustments (more on this later).
The initial procedure is exactly the same one used with regular replication: you start by taking a backup of the master and restoring it on the replica. We’ll use XtraBackup again to perform the backup and we’ll start by having node2 as the master of our async replica (192.168.70.7). We could stream the backup from node2 directly to the async replica and later “prepare” it (by applying the logs, which needs to be done using the same version of Percona XtraBackup that you’ve used to take the backup), but to make things simple we’ll first take the backup on node2:
[node2]$ innobackupex /tmp
(...)
innobackupex: Using server version 5.6.21-70.1-56-log
innobackupex: Created backup directory /tmp/2015-02-14_17-53-22
(...)
150214 17:53:26  innobackupex: completed OK!
then “prepare” it:
[node2]$ innobackupex --apply-log /tmp/2015-02-14_17-53-22
(...)
150214 17:56:10  innobackupex: Starting the apply-log operation
(...)
150214 17:56:14  innobackupex: completed OK!
and from our async replica we’ll copy the backup from node2 using rsync over ssh (you could use scp or any other method to copy the files from one server to the other):
[replica]$ rsync -av -e 'ssh -l root' 192.168.70.3:/tmp/2015-02-14_17-53-22/ /var/lib/mysql
We now need to change the ownership of those files to the 'mysql' user:
[replica]$ chown mysql:mysql -R /var/lib/mysql
and take note of the “replication coordinates” – GTID-based ones in this case. We have those in the xtrabackup_info file:
[replica]$ cat /var/lib/mysql/xtrabackup_info
(...)
binlog_pos = GTID of the last change '7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1'
(...)
In our case we took the backup very early in this database’s lifecycle (as the low GTID number in there shows), but in practice that’s seldom the case. We can now start MySQL:
[replica]$ service mysql start
Before we can point our async replica to replicate from node2 we need to create a replication user with the right set of privileges there:
mysql [node2] > GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO repl@'192.168.70.%' IDENTIFIED BY 'p4ssword';
Query OK, 0 rows affected (0.06 sec)
Now we can use CHANGE MASTER TO on our async replica to point it to node2 using the above credentials:
mysql [replica] > CHANGE MASTER TO MASTER_HOST='192.168.70.3', MASTER_USER='repl', MASTER_PASSWORD='p4ssword', MASTER_AUTO_POSITION = 1;
Query OK, 0 rows affected, 2 warnings (0.05 sec)
If we START SLAVE now we’ll run into error 1236: we first need to set the replication coordinates from when the backup was taken. With GTID replication, this is done in a different manner: instead of providing replication coordinates in the CHANGE MASTER TO command (where we’ve used MASTER_AUTO_POSITION=1 instead), we do it by setting the gtid_purged global variable to the GTID sets we got from the xtrabackup_info file (in this example there’s only one set):
mysql [replica] > SET @@global.gtid_purged='7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1';
Query OK, 0 rows affected (0.02 sec)
We can now START SLAVE and check its status:
mysql [replica] > START SLAVE; SHOW SLAVE STATUS\G
Query OK, 0 rows affected (0.00 sec)
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.70.3
                  Master_User: repl
(...)
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
(...)
             Master_Server_Id: 2
                  Master_UUID: 8a157d9c-b465-11e4-aafa-0800272e951c
(...)
           Retrieved_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:2
            Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-2
                Auto_Position: 1
1 row in set (0.00 sec)
Note that Executed_Gtid_Set above shows a second transaction in the cluster’s GTID set: this is the GRANT statement we issued on node2 to set up the replication account, and it means the account is now configured on all nodes of the cluster (and on this async replica as well). With that in place we can easily point our async replica to replicate from a different node, such as node3:
mysql [replica] > STOP SLAVE; CHANGE MASTER TO MASTER_HOST='192.168.70.4'; START SLAVE;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
Let’s verify this is indeed the case:
mysql [replica] > SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.70.4
                  Master_User: repl
(...)
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
(...)
             Master_Server_Id: 3
                  Master_UUID: aa9acb85-b465-11e4-ab09-0800272e951c
(...)
            Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-2
                Auto_Position: 1
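If you want to convince yourself that any member could serve as the async replica’s master at this point, compare the executed GTID set on node3 (or any other node) with the replica’s – they should be identical. This is just an extra verification of mine, not part of the original procedure:

mysql [node3] > select @@global.gtid_executed;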
Nice! What about the caveats you were talking about in the other blog post?
The problem arises when you start to have other GTID sets in the gtid_executed variable of the nodes. This happens when a DML transaction involving a MyISAM table is issued on a node while wsrep_replicate_myisam is disabled: instead of going to the cluster’s main GTID set, that transaction goes to a new one, bearing the node’s server_uuid as source id. In fact, that’s the behavior you find on regular MySQL replication configured with GTIDs. Here’s an example:
mysql [node1] > CREATE TABLE test.fernando1 (id int) ENGINE=MYISAM;
Query OK, 0 rows affected (0.03 sec)
Creating a MyISAM table per se is not an issue, because this is a DDL, so the transaction will go to the cluster’s GTID set and be replicated to the other nodes:
mysql [node1] > select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3
1 row in set (0.00 sec)
But doing an INSERT in such a table is a DML, so the transaction goes to a different GTID set. After issuing an INSERT into test.fernando1 on node1, we see:
mysql [node1] > select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
834bca7f-b45e-11e4-a7b5-0800272e951c:1
1 row in set (0.00 sec)
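Incidentally, you can confirm the node is indeed running with wsrep_replicate_myisam disabled (the default) with a simple variable check – again, an extra verification of mine rather than part of the original example:

mysql [node1] > SHOW GLOBAL VARIABLES LIKE 'wsrep_replicate_myisam';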
Now if we move our async replica to node1 it might just work:
mysql [replica] > STOP SLAVE; CHANGE MASTER TO MASTER_HOST='192.168.70.2'; START SLAVE;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.02 sec)
Query OK, 0 rows affected (0.01 sec)

mysql [replica] > show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.70.2
(...)
           Retrieved_Gtid_Set: 834bca7f-b45e-11e4-a7b5-0800272e951c:1
            Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
834bca7f-b45e-11e4-a7b5-0800272e951c:1
                Auto_Position: 1
1 row in set (0.00 sec)
and that is because transaction '834bca7f-b45e-11e4-a7b5-0800272e951c:1' could still be found in the binary logs of node1. Let’s do something slightly different now, but on node2:
mysql [node2] > INSERT INTO test.fernando1 values (2);
Query OK, 1 row affected (0.01 sec)

mysql [node2] > select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
8a157d9c-b465-11e4-aafa-0800272e951c:1
1 row in set (0.00 sec)

mysql [node2] > SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: percona-bin.000008
         Position: 923
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
8a157d9c-b465-11e4-aafa-0800272e951c:1
1 row in set (0.00 sec)

mysql [node2] > FLUSH LOGS;
Query OK, 0 rows affected (0.01 sec)

mysql [node2] > SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: percona-bin.000009
         Position: 231
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
8a157d9c-b465-11e4-aafa-0800272e951c:1
1 row in set (0.00 sec)

mysql [node2] > PURGE BINARY LOGS TO 'percona-bin.000009';
Query OK, 0 rows affected (0.01 sec)
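A quick extra check at this point (not in the original steps): SHOW BINARY LOGS on node2 should now list only percona-bin.000009, confirming that the earlier binary log file has really been removed:

mysql [node2] > SHOW BINARY LOGS;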
Transaction '8a157d9c-b465-11e4-aafa-0800272e951c:1', which contains the INSERT statement we issued from node2, was recorded in binary log file percona-bin.000008, but the subsequent PURGE BINARY LOGS TO command deleted that file. Now, if we point our async replica back to node2 we’ll run into a problem:
mysql [replica] > STOP SLAVE; CHANGE MASTER TO MASTER_HOST='192.168.70.3'; START SLAVE;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)

mysql [replica] > show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 192.168.70.3
(...)
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
(...)
            Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
834bca7f-b45e-11e4-a7b5-0800272e951c:1
                Auto_Position: 1
1 row in set (0.00 sec)
The master can no longer provide transaction '8a157d9c-b465-11e4-aafa-0800272e951c:1' to the replica as requested, thus breaking replication.
Ouch! Is there a fix for this?
Well, kind of. You can trick MySQL into believing it has processed this transaction by injecting an empty transaction bearing the same GTID as the missing one:
mysql [replica] > SELECT GTID_SUBTRACT('7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,8a157d9c-b465-11e4-aafa-0800272e951c:1', '7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,834bca7f-b45e-11e4-a7b5-0800272e951c:1')\G
*************************** 1. row ***************************
GTID_SUBTRACT('7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,8a157d9c-b465-11e4-aafa-0800272e951c:1', '7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,834bca7f-b45e-11e4-a7b5-0800272e951c:1'): 8a157d9c-b465-11e4-aafa-0800272e951c:1
1 row in set (0.00 sec)

mysql [replica] > SET GTID_NEXT='8a157d9c-b465-11e4-aafa-0800272e951c:1';
Query OK, 0 rows affected (0.00 sec)

mysql [replica] > BEGIN; COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

mysql [replica] > SET gtid_next = 'AUTOMATIC';
Query OK, 0 rows affected (0.00 sec)

mysql [replica] > show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.70.3
(...)
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
(...)
            Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
834bca7f-b45e-11e4-a7b5-0800272e951c:1,
8a157d9c-b465-11e4-aafa-0800272e951c:1
                Auto_Position: 1
1 row in set (0.00 sec)
The problem is that you’ll end up with an inconsistent dataset – injecting an empty transaction is the same as using SQL_SLAVE_SKIP_COUNTER in regular MySQL replication.
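A simple way to see the resulting inconsistency in this particular example (my own verification, not from the original walkthrough) is to compare the table on node2, where the INSERT actually ran, with the replica, which only executed an empty transaction in its place:

mysql [node2] > SELECT * FROM test.fernando1 WHERE id = 2;
mysql [replica] > SELECT * FROM test.fernando1 WHERE id = 2;

The row will be returned by node2 but will be missing on the replica.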
Take-home lesson
It is not complicated to configure a PXC cluster to use GTIDs; in fact, you may prefer to do so once you get used to it. And it makes it much easier to have an asynchronous slave replicating from one of the nodes, and even to move it around to replicate from a different node. It should all go well while the nodes record their transactions in the cluster’s main GTID set, which is shared by all nodes: this implies all of them will have the same transactions in their binlogs, as designed. The problem appears when we start seeing a deviation from this pattern, with a few nodes recording local transactions in their “own” GTID sets. This won’t bother the cluster’s operation per se, as these other GTID sets are simply ignored and won’t be replicated to other nodes, but they may complicate things for attached async replicas if you need to point them to a different node.
In such environments, strive to maintain a unified GTID set around the cluster’s main one and, if you find out that one of the nodes has started adding transactions to a different set, investigate it. You shouldn’t be using MyISAM tables on an XtraDB Cluster as they aren’t officially supported, but if you must use them you should probably consider enabling wsrep_replicate_myisam. Operations on the mysql system tables should be done through DDL-like statements instead of DMLs (GRANT instead of INSERT) and, above all, keep your distance from this bug.
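To illustrate that last recommendation (the account name, host and privilege below are made-up placeholders):

mysql [node1] > GRANT SELECT ON test.* TO 'app'@'192.168.70.%' IDENTIFIED BY 'app_password';

is an account-management statement, so it is recorded in the cluster’s shared GTID set and replicated to every node, whereas something like:

mysql [node1] > UPDATE mysql.user SET Password=PASSWORD('app_password') WHERE User='app';
mysql [node1] > FLUSH PRIVILEGES;

is a DML against the MyISAM-based grant tables (plus a FLUSH PRIVILEGES) and ends up in node1’s local GTID set only, never being replicated to the rest of the cluster.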