Apr
21
2023
--

Fixing Errant GTID With Orchestrator: The Easy Way Out

Fixing Errant GTID With Orchestrator

In this article, we will discuss errant transactions/GTIDs and how we can solve them with the Orchestrator tool.

Orchestrator is a MySQL high availability and replication management tool that runs as a service and provides command-line access, an HTTP API, and a web interface. I will not go into the details of Orchestrator, but will explore one of the features that can help us resolve an errant GTID in a replication topology.
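For example, the same data that orchestrator-client returns later in this article can also be fetched straight from the HTTP API. This is only a sketch: the host and port are placeholders for wherever your Orchestrator service listens, and jq is used purely for pretty-printing.

[root@monitor ~]# curl -s http://orchestrator.example.com:3000/api/cluster/testcluster | jq .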

What are errant transactions?

Simply stated, they are transactions executed directly on a replica. Thus they only exist on a specific replica. This could result from a mistake (the application wrote to a replica instead of writing to the source) or by design (you need additional tables for reports).

What problem can errant transactions cause?

The major problem arises during a planned (or unplanned) change in the MySQL replication topology: if a replica holding an errant transaction is promoted to source, the other replicas will request that transaction. If it is no longer present in the binary logs, it cannot be sent to them, which causes a replication error; if it is present, it gets applied to servers that were never meant to have it.
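Orchestrator will surface errant GTIDs for us in a moment, but you can also check manually with MySQL’s GTID functions. A minimal sketch, assuming you can query both servers (the quoted set is a placeholder for the source’s Executed_Gtid_Set):

mysql> -- run on the replica: anything left after subtracting the source set is errant
mysql> select gtid_subtract(@@global.gtid_executed, '<source Executed_Gtid_Set>') as errant_transactions;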

So let’s jump into generating and fixing an errant transaction. Below is my current topology:

[root@monitor ~]# orchestrator-client -c topology-tabulated -alias testcluster | tr '|' '\t'
192.168.56.10:3306  0s ok 5.7.41-44-log rw ROW GTID
+ 192.168.56.20:3306 0s ok 5.7.41-44-log ro ROW GTID
+ 192.168.56.30:3306 0s ok 5.7.41-44-log ro ROW GTID

Now let’s make some changes on any of the replicas, which will generate an errant transaction. On 192.168.56.20:3306, I created a test database:

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

This will result in an errant transaction, so let’s see how Orchestrator shows the topology now.

[root@monitor ~]# orchestrator-client -c topology-tabulated -alias testcluster | tr '|' '\t'
192.168.56.10:3306   0s ok 5.7.41-44-log  rw ROW   GTID
+ 192.168.56.20:3306 0s ok 5.7.41-44-log  ro ROW   GTID:errant
+ 192.168.56.30:3306 0s ok 5.7.41-44-log  ro ROW   GTID

Now you can see that we have an errant transaction; we can get more detail using orchestrator-client as below:

[root@monitor ~]# orchestrator-client -c which-gtid-errant -i 192.168.56.20:3306
a71a855a-dcdc-11ed-99d7-080027e6334b:1

To find out which binary logs contain this errant transaction, check with the command below:

[root@monitor ~]# orchestrator-client -c locate-gtid-errant -i 192.168.56.20:3306
mysqlbinlog.000001

Checking the binary logs is very important: we should know exactly what changes were made on the replica, and we can inspect that binary log for the specific GTID.
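For example, you can dump just that transaction with mysqlbinlog. This is a sketch: the file path below is an assumption and depends on your datadir and log_bin settings, and the command runs on the replica itself.

[root@replica ~]# mysqlbinlog --verbose --include-gtids='a71a855a-dcdc-11ed-99d7-080027e6334b:1' /var/lib/mysql/mysqlbinlog.000001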

We can also get this from the replication-analysis API, and you can use this API feature in your own code if you want to monitor the topology for errant transactions:

[root@monitor ~]# orchestrator-client -c api -path replication-analysis | jq . | grep -A2 -B2 "StructureAnalysis"
      "Analysis": "NoProblem",
      "Description": "",
      "StructureAnalysis": [
        "ErrantGTIDStructureWarning"

There is a more detailed way to compare ExecutedGtidSet and GtidErrant across the whole topology, as shown below:

[root@monitor ~]# sudo orchestrator-client -c api -path cluster/testcluster | jq -C '.[] | {Port: .Key.Port, Hostname: .Key.Hostname,ServerUUID: .ServerUUID, ExecutedGtidSet: .ExecutedGtidSet, GtidErrant:.GtidErrant}'
{
  "Port": 3306,
  "Hostname": "192.168.56.10",
  "ServerUUID": "3b678bc9-dcdc-11ed-b9fc-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10",
  "GtidErrant": ""
}
{
  "Port": 3306,
  "Hostname": "192.168.56.20",
  "ServerUUID": "a71a855a-dcdc-11ed-99d7-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": "a71a855a-dcdc-11ed-99d7-080027e6334b:1"
}
{
  "Port": 3306,
  "Hostname": "192.168.56.30",
  "ServerUUID": "ea6c6af9-dcdc-11ed-9e09-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10",
  "GtidErrant": ""
}

So now that we know about the issue, let’s fix it with Orchestrator.

The first way to fix it is to inject an empty transaction, which can be done as below:

[root@monitor ~]# orchestrator-client -c gtid-errant-inject-empty -i 192.168.56.20:3306
192.168.56.20:3306

[root@monitor ~]# orchestrator-client -c topology-tabulated -alias testcluster | tr '|' '\t'
192.168.56.10:3306   0s ok 5.7.41-44-log  rw ROW   GTID
+ 192.168.56.20:3306 0s ok 5.7.41-44-log  ro ROW   GTID
+ 192.168.56.30:3306 0s ok 5.7.41-44-log  ro ROW   GTID

The gtid-errant-inject-empty command reconciles the errant Global Transaction Identifier (GTID) by injecting an empty transaction with that GTID into the replication topology. GTIDs uniquely identify transactions in a MySQL cluster, and keeping the executed GTID sets consistent is critical for maintaining data integrity.
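On the source, the effect is roughly equivalent to committing an empty transaction with the errant GTID by hand, as in this sketch (Orchestrator issues the statements itself; the GTID is the one reported by which-gtid-errant above):

mysql> SET GTID_NEXT='a71a855a-dcdc-11ed-99d7-080027e6334b:1';
mysql> BEGIN;
mysql> COMMIT;
mysql> SET GTID_NEXT='AUTOMATIC';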

When injecting an empty transaction, Orchestrator injects it at the top of the topology; it replicates down to the bottom, and the replica that already has that GTID simply ignores it. Now you can see that the executed GTID set has changed and contains the GTID with the UUID of replica 192.168.56.20:3306.

[root@monitor ~]# sudo orchestrator-client -c api -path cluster/testcluster | jq -C '.[] | {Port: .Key.Port, Hostname: .Key.Hostname,ServerUUID: .ServerUUID, ExecutedGtidSet: .ExecutedGtidSet, GtidErrant: .GtidErrant}'
{
  "Port": 3306,
  "Hostname": "192.168.56.10",
  "ServerUUID": "3b678bc9-dcdc-11ed-b9fc-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}
{
  "Port": 3306,
  "Hostname": "192.168.56.20",
  "ServerUUID": "a71a855a-dcdc-11ed-99d7-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}
{
  "Port": 3306,
  "Hostname": "192.168.56.30",
  "ServerUUID": "ea6c6af9-dcdc-11ed-9e09-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}

Another way to fix this, a DANGEROUS one, is to reset the master.

Orchestrator has a command, gtid-errant-reset-master, which is applied on an instance. This command “fixes” errant GTID transactions via RESET MASTER; SET GLOBAL gtid_purged…

This command is, of course, destructive to the server’s binary logs. If binary logs are assumed to enable incremental restore, then this command is dangerous.
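On the affected replica, this amounts to roughly the following sketch (the exact sequence is driven by Orchestrator, and the gtid_purged value is a placeholder for the executed set minus the errant GTID):

mysql> STOP SLAVE;
mysql> RESET MASTER;  -- destroys the binary logs on this server and clears gtid_executed
mysql> SET GLOBAL gtid_purged = '<previous gtid_executed minus the errant GTID>';
mysql> START SLAVE;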

Here is an example of fixing an errant transaction this way:

[root@monitor ~]# orchestrator-client -c topology-tabulated -alias testcluster | tr '|' '\t'
192.168.56.10:3306   0s ok 5.7.41-44-log  rw ROW   GTID
+ 192.168.56.20:3306 0s ok 5.7.41-44-log  ro ROW   GTID:errant
+ 192.168.56.30:3306 0s ok 5.7.41-44-log  ro ROW   GTID

[root@monitor ~]# orchestrator-client -c which-gtid-errant -i 192.168.56.20:3306
a71a855a-dcdc-11ed-99d7-080027e6334b:2

This is how it looks:

{
  "Port": 3306,
  "Hostname": "192.168.56.20",
  "ServerUUID": "a71a855a-dcdc-11ed-99d7-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,na71a855a-dcdc-11ed-99d7-080027e6334b:1-2",
  "GtidErrant": "a71a855a-dcdc-11ed-99d7-080027e6334b:2"
}

Let’s reset the master.

[root@monitor ~]# orchestrator-client -c gtid-errant-reset-master -i 192.168.56.20:3306
192.168.56.20:3306

Now you can see that the replica’s ExecutedGtidSet is in sync with the source’s ExecutedGtidSet.

[root@monitor ~]# sudo orchestrator-client -c api -path cluster/testcluster | jq -C '.[] | {Port: .Key.Port, Hostname: .Key.Hostname,ServerUUID: .ServerUUID, ExecutedGtidSet: .ExecutedGtidSet, GtidErrant: .GtidErrant}'
{
  "Port": 3306,
  "Hostname": "192.168.56.10",
  "ServerUUID": "3b678bc9-dcdc-11ed-b9fc-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}
{
  "Port": 3306,
  "Hostname": "192.168.56.20",
  "ServerUUID": "a71a855a-dcdc-11ed-99d7-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}
{
  "Port": 3306,
  "Hostname": "192.168.56.30",
  "ServerUUID": "ea6c6af9-dcdc-11ed-9e09-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}

But this option is risky because the command actually purges the binary logs. If any application is tailing the logs, or if the binary logs are assumed to enable incremental restore, this command is dangerous and not recommended. It is better to use gtid-errant-inject-empty; if you still want to use gtid-errant-reset-master on a busy replica, stop replication first, wait two or three minutes, and only then run gtid-errant-reset-master.

Conclusion

If you want to switch to GTID-based replication, make sure to check for errant transactions before any planned or unplanned replication topology change, and be specifically careful if you use a tool that reconfigures replication for you. It is always recommended to use pt-table-checksum and pt-table-sync if you ever end up in a situation where changes were made directly on a replica.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!

Dec
02
2015
--

Fixing errant transactions with mysqlslavetrx prior to a GTID failover

GTID and errant transactions

Errant transactions are a major issue when using GTID replication. Although this isn’t something new, the drawbacks are more noticeable with GTID than with regular replication.

The situation where an errant transaction bites you is a common DBA task: failover. Now that tools like MHA have support for GTID replication (starting from version 0.56), this protocol is becoming more popular, and so are the issues with errant transactions. Luckily, the fix is as simple as injecting an empty transaction into the servers that lack the transaction. You can easily do this through the master, and it will be propagated to all the slaves.

Let’s consider the following situations:

  • What happens when the master blows up into the air and is out of the picture?
  • What happens when there’s not just one but dozens of errant transactions?
  • What happens when you have a high number of slaves?

Things start to become a little more complex.

A side note for the first case: when your master is no longer available, how can you find errant transactions? Well, you can’t compare against the master anymore. In this case, you should check for errant transactions between your slaves and your former slave/soon-to-be master.

Let’s think about alternatives. What’s the workaround to injecting empty transactions by hand for every single errant transaction on every single slave? The MySQL utility mysqlslavetrx. Basically, this utility allows us to skip multiple transactions on multiple slaves in a single step.

One way to install the MySQL utilities is by executing the following steps:

  • wget http://dev.mysql.com/get/Downloads/MySQLGUITools/mysql-utilities-1.6.2.tar.gz
  • tar -xvzf mysql-utilities-1.6.2.tar.gz
  • cd mysql-utilities-1.6.2
  • python ./setup.py build
  • sudo python ./setup.py install

And you’re ready.

What about some examples? Let’s say we have a master/slave setup with GTID replication, with the current status as follows:

mysql> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000002 | 530      |              |                  | 66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)
mysql> show slave status\G
...
Executed_Gtid_Set: 66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2
Auto_Position: 1
1 row in set (0.00 sec)

Add chaos to the slave in the form of a new schema:

mysql> create database percona;
Query OK, 1 row affected (0.00 sec)

Now we have an errant transaction!!!!!

The slave status looks different:

mysql> show slave status\G
...
Executed_Gtid_Set: 66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2,
674a625e-976e-11e5-a8fb-125cab082fc3:1
Auto_Position: 1
1 row in set (0.00 sec)

By using the GTID_SUBSET function we can confirm that things go from “all right” to “no-good”:

Before:

mysql> select gtid_subset('66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2','66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2') as is_subset;
+-----------+
| is_subset |
+-----------+
| 1         |
+-----------+
1 row in set (0.00 sec)

After:

mysql> select gtid_subset('66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2,674a625e-976e-11e5-a8fb-125cab082fc3:1','66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2') as is_subset;
+-----------+
| is_subset |
+-----------+
| 0         |
+-----------+
1 row in set (0.00 sec)

All right, it’s a mess, got it. What’s the errant transaction? The GTID_SUBTRACT function will tell us:

mysql> select gtid_subtract('66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2,674a625e-976e-11e5-a8fb-125cab082fc3:1','66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2') as errand;
+----------------------------------------+
| errand                                 |
+----------------------------------------+
| 674a625e-976e-11e5-a8fb-125cab082fc3:1 |
+----------------------------------------+
1 row in set (0.00 sec)

The classic way to fix this is by injecting an empty transaction:

mysql> SET GTID_NEXT='674a625e-976e-11e5-a8fb-125cab082fc3:1';
Query OK, 0 rows affected (0.00 sec)
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GTID_NEXT='AUTOMATIC';
Query OK, 0 rows affected (0.00 sec)

After this, the errant transaction won’t be errant anymore.

mysql> show master status;
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                |
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| mysql-bin.000002 | 715      |              |                  | 66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2,
674a625e-976e-11e5-a8fb-125cab082fc3:1                                                                                                             |
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Okay, let’s add another slave to the mix. Now is the moment where the mysqlslavetrx utility becomes very handy.

What you need to know is:

  • The slave’s IP address
  • The GTID set

It will be simple to execute:

mysqlslavetrx --gtid-set=6aa9a742-8284-11e5-a09b-12aac3869fc9:1 --verbose --slaves=user:password@172.16.1.143:3306,user:password@172.16.1.144

The verbose output will look something like this:

# GTID set to be skipped for each server:
# - 172.16.1.143@3306: 6aa9a742-8284-11e5-a09b-12aac3869fc9:1
# - 172.16.1.144@3306: 6aa9a742-8284-11e5-a09b-12aac3869fc9:1
#
# Injecting empty transactions for '172.16.1.143:3306'...
# - 6aa9a742-8284-11e5-a09b-12aac3869fc9:1
# Injecting empty transactions for '172.16.1.144:3306'...
# - 6aa9a742-8284-11e5-a09b-12aac3869fc9:1
#
#...done.
#

You can run mysqlslavetrx from anywhere (the master or any slave). You just need to be sure that the user and password are valid and that the account has the SUPER privilege, which is required to set the gtid_next variable.
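For instance, a minimal sketch of creating such an account (the user name, host, and password below are placeholders; you will likely want a more restrictive host):

mysql> CREATE USER 'trx_skipper'@'%' IDENTIFIED BY 'a_strong_password';
mysql> GRANT SUPER ON *.* TO 'trx_skipper'@'%';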

As a summary: Take advantage of the MySQL utilities. In this particular case, mysqlslavetrx is extremely useful when using GTID replication and you want to perform a clean failover. It can be added as a pre-script for MHA failover (which supports GTID since the 0.56 version) or can be simply used to maintain consistency between master and slaves.


May
18
2014
--

Errant transactions: Major hurdle for GTID-based failover in MySQL 5.6

I have previously written about the new replication protocol that comes with GTIDs in MySQL 5.6. Because of this new replication protocol, you can inadvertently create errant transactions that may turn any failover to a nightmare. Let’s see the problems and the potential solutions.

In short

  • Errant transactions may cause all kinds of data corruption/replication errors when failing over.
  • Detection of errant transactions can be done with the GTID_SUBSET() and GTID_SUBTRACT() functions.
  • If you find an errant transaction on one server, commit an empty transaction with the GTID of the errant one on all other servers.
  • If you are using a tool to perform the failover for you, make sure it can detect errant transactions. At the time of writing, only mysqlfailover and mysqlrpladmin from MySQL Utilities can do that.

What are errant transactions?

Simply stated, they are transactions executed directly on a slave. Thus they only exist on a specific slave. This could result from a mistake (the application wrote to a slave instead of writing to the master) or this could be by design (you need additional tables for reports).

Why can they create problems that did not exist before GTIDs?

Errant transactions have always existed. However, because of the new replication protocol for GTID-based replication, they can have a significant impact on all servers if a slave holding an errant transaction is promoted as the new master.

Compare what happens in this master-slave setup, first with position-based replication and then with GTID-based replication. A is the master, B is the slave:

# POSITION-BASED REPLICATION
# Creating an errant transaction on B
mysql> create database mydb;
# Make B the master, and A the slave
# What are the databases on A now?
mysql> show databases like 'mydb';
Empty set (0.01 sec)

As expected, the mydb database is not created on A.

# GTID-BASED REPLICATION
# Creating an errant transaction on B
mysql> create database mydb;
# Make B the master, and A the slave
# What are the databases on A now?
mysql> show databases like 'mydb';
+-----------------+
| Database (mydb) |
+-----------------+
| mydb            |
+-----------------+

mydb has been recreated on A because of the new replication protocol: when A connects to B, they exchange their sets of executed GTIDs, and the master (B) sends any missing transactions. Here, that is the create database statement.

As you can see, the main issue with errant transactions is that when failing over you may execute transactions ‘coming from nowhere’ that can silently corrupt your data or break replication.

How to detect them?

If the master is running, it is quite easy with the GTID_SUBSET() function. As all writes should go to the master, the GTIDs executed on any slave should always be a subset of the GTIDs executed on the master. For instance:

# Master
mysql> show master status\G
*************************** 1. row ***************************
             File: mysql-bin.000017
         Position: 376
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 8e349184-bc14-11e3-8d4c-0800272864ba:1-30,
8e3648e4-bc14-11e3-8d4c-0800272864ba:1-7
# Slave
mysql> show slave status\G
[...]
Executed_Gtid_Set: 8e349184-bc14-11e3-8d4c-0800272864ba:1-29,
8e3648e4-bc14-11e3-8d4c-0800272864ba:1-9
# Now, let's compare the 2 sets
mysql> select gtid_subset('8e349184-bc14-11e3-8d4c-0800272864ba:1-29,
8e3648e4-bc14-11e3-8d4c-0800272864ba:1-9','8e349184-bc14-11e3-8d4c-0800272864ba:1-30,
8e3648e4-bc14-11e3-8d4c-0800272864ba:1-7') as slave_is_subset;
+-----------------+
| slave_is_subset |
+-----------------+
|               0 |
+-----------------+

Hmm, it looks like the slave has executed transactions that the master has not; this indicates that the slave has executed at least one errant transaction. Can we find the GTIDs of these transactions? Sure, let’s use GTID_SUBTRACT():

select gtid_subtract('8e349184-bc14-11e3-8d4c-0800272864ba:1-29,
8e3648e4-bc14-11e3-8d4c-0800272864ba:1-9','8e349184-bc14-11e3-8d4c-0800272864ba:1-30,
8e3648e4-bc14-11e3-8d4c-0800272864ba:1-7') as errant_transactions;
+------------------------------------------+
| errant_transactions                      |
+------------------------------------------+
| 8e3648e4-bc14-11e3-8d4c-0800272864ba:8-9 |
+------------------------------------------+

This means that the slave has 2 errant transactions.

Now, how can we check for errant transactions if the master is not running (for instance, the master has crashed, and we want to fail over to one of the slaves)? In this case, we will have to follow these steps:

  • Check all slaves to see if they have executed transactions that are not found on any other slave: this is the list of potential errant transactions.
  • Discard all transactions originating from the master: now you have the list of errant transactions of each slave.

Some of you may wonder how you can know which transactions come from the master when it is not available: SHOW SLAVE STATUS gives you the master’s UUID, which is used in the GTIDs of all transactions coming from the master.
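Putting the two steps together, a rough sketch (with placeholder values) looks like this: run the subtraction on each slave against another slave’s executed set, then discard anything whose UUID matches the former master’s UUID.

mysql> -- step 1, run on each slave: what has this slave executed that another slave has not?
mysql> select gtid_subtract(@@global.gtid_executed, '<gtid_executed of another slave>') as only_on_this_slave;
mysql> -- step 2: from the result, ignore any GTIDs whose UUID belongs to the former master;
mysql> -- whatever remains is an errant transaction on this slave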

How to get rid of them?

This is pretty easy, but it can be tedious if you have many slaves: just inject an empty transaction on all the other servers with the GTID of the errant transaction.

For instance, if you have 3 servers, A (the master), B (slave with an errant transaction: XXX:3), and C (slave with 2 errant transactions: YYY:18-19), you will have to inject the following empty transactions in pseudo-code:

# A
- Inject empty trx(XXX:3)
- Inject empty trx(YYY:18)
- Inject empty trx(YYY:19)
# B
- Inject empty trx(YYY:18)
- Inject empty trx(YYY:19)
# C
- Inject empty trx(XXX:3)
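In real SQL, each “Inject empty trx” line corresponds to something like the following (a sketch, using XXX:3 as the placeholder GTID from the example above):

mysql> SET GTID_NEXT='XXX:3';
mysql> BEGIN;
mysql> COMMIT;
mysql> SET GTID_NEXT='AUTOMATIC';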

Conclusion

If you want to switch to GTID-based replication, make sure to check errant transactions before any planned or unplanned replication topology change. And be specifically careful if you use a tool that reconfigures replication for you: at the time of writing, only mysqlrpladmin and mysqlfailover from MySQL Utilities can warn you if you are trying to perform an unsafe topology change.

