Jan
27
2026
--

Automatic “Multi-Source” Async Replication Failover Using PXC Replication Manager

Automatic "Multi-Source" Async Replication Failover Using PXC Replication ManagerThe replication  manager script can be particularly useful in complex PXC/Galera topologies that require Async/Multi-source replication. This will ease the auto source and replica failover to ensure all replication channels are healthy and in sync. If certain nodes shouldn’t  be part of a async/multi-source replication, we can disable the replication manager script there to tightly controlled the flow. Alternatively, node participation can be controlled by adjusting the weights in the percona.weight table, allowing replication behavior to be managed more precisely.

Jan
23
2026
--

MySQL January 2026 Performance Review

This article is focused on describing the latest performance benchmarking executed on the latest releases of Community MySQL, Percona Server for MySQL, and MariaDB. In this set of tests, I have used the machine described here. Assumptions: There are many ways to run tests, and we know that results may vary depending on how you […]

Jan
12
2026
--

Using PXC Replication Manager to Auto Manage Both Source and Replica Failover in Galera-Based Environments

In this blog post, we will discuss the PXC Replication Manager script/tool, which facilitates both source and replica failover when working with multiple PXC clusters across different DCs/networks connected via an asynchronous replication mechanism. Such topologies emerge from requirements like database version upgrades, reporting or streaming for applications, separate disaster recovery or backup solutions, […]

Dec
18
2024
--

How To Deal with an AUTO_INCREMENT Max Value Problem in MySQL and MariaDB

An application going down because it can no longer write into a table that has reached its maximum allowed auto-increment value may be one of the worst nightmares of a DBA. Typical errors related to this problem in MySQL will look like this: […] While the solution could be easy and fairly quick […]

May
23
2024
--

How to Migrate From MariaDB to MySQL

MySQL and MariaDB are two major open source database management systems that share a common codebase and history. MariaDB started as a MySQL fork in 2009 to provide an alternative database version following Oracle's acquisition of MySQL. While MariaDB has many features and has grown in popularity among users seeking a more open and community-driven development […]

May
16
2024
--

Choosing the Right Database: Comparing MariaDB vs. MySQL, PostgreSQL, and MongoDB

Selecting the appropriate database technology for your needs is crucial for ensuring compatibility with existing applications and infrastructure, optimizing application performance, and reducing costs. With so many different choices available, it can be difficult to make such a decision! This blog post provides a head-to-head comparison of MariaDB vs. PostgreSQL, MongoDB, and MySQL, highlighting their similarities […]

Nov
01
2023
--

Is ANALYZE TABLE Safe on a Busy MySQL Database Server?


Sometimes, there is a need to update table and index statistics manually using the ANALYZE TABLE command. Without going further into the reasons for such a need, I wanted to refresh this subject in terms of the overhead related to running the command on production systems. However, the overhead discussed here is unrelated to the usual cost of diving into table rows to gather statistics, which we can control by setting the number of sample pages.
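For reference, the sampling depth can be tuned before running the command; a minimal sketch, assuming InnoDB persistent statistics (the default):

mysql > SET GLOBAL innodb_stats_persistent_sample_pages = 64;
mysql > ANALYZE TABLE db1.sbtest1;

A lower value makes the statistics gathering cheaper but less precise; the stalls discussed below are a separate matter entirely.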

Five years ago, my colleague Sveta posted a nice blog post about an improvement introduced in Percona Server for MySQL to address unnecessary stalls related to running the command:

ANALYZE TABLE Is No Longer a Blocking Operation

Historically, the problem with running the ANALYZE TABLE command in MySQL was that the query needed an exclusive lock on the table definition cache entry for the table. This made the query wait for any long-running queries to finish, but it could also trigger cascading waits for other incoming requests. In short, ANALYZE could lead to nasty stalls in busy production environments.

A lot has changed since then, but many production systems alive today still run with affected versions. Let’s recap how the situation has evolved over the years. 

MySQL Server – Community Edition

The problem applies to all versions of upstream MySQL Community up to 8.0.23. There were no improvements in the 5.7 series (by the way, it reaches EOL this month!), which means even the latest 5.7.43 is affected. Here is an example of the scenario you may end up in:

mysql > select @@version,@@version_comment;
+-----------+------------------------------+
| @@version | @@version_comment            |
+-----------+------------------------------+
| 5.7.43    | MySQL Community Server (GPL) |
+-----------+------------------------------+
1 row in set (0.00 sec)

mysql > show processlist;
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+
| Id | User     | Host      | db   | Command | Time | State                   | Info                                                           |
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+
|  4 | msandbox | localhost | db1  | Query   |   54 | Sending data            | select avg(k) from sbtest1 where pad not like '%f%' group by c |
| 13 | msandbox | localhost | db1  | Query   |   29 | Waiting for table flush | analyze table sbtest1                                          |
| 17 | msandbox | localhost | db1  | Query   |    0 | starting                | show processlist                                               |
| 18 | msandbox | localhost | db1  | Query   |   15 | Waiting for table flush | select * from sbtest1 where id=100                             |
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+
4 rows in set (0.00 sec)

One long query made the ANALYZE wait, but another, normally very fast query, is now waiting, too.
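On an affected version, it may therefore be worth checking for long-running statements before issuing ANALYZE TABLE; a simple check could look like this (the 60-second threshold is just an example):

mysql > SELECT id, time, info FROM information_schema.processlist WHERE command = 'Query' AND time > 60;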

The same situation may happen in the MySQL 8.0 series up to and including 8.0.23. Fortunately, version 8.0.24 shipped a fix for this problem. We can only read a rather restrained comment in the release notes about the “wait eliminated”:

https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-24.html

Indeed, since version 8.0.24, a similar test during a long-running query results in instant query execution:

mysql > select @@version,@@version_comment;
+-----------+------------------------------+
| @@version | @@version_comment            |
+-----------+------------------------------+
| 8.0.24    | MySQL Community Server - GPL |
+-----------+------------------------------+
1 row in set (0.00 sec)

mysql > analyze table sbtest1;
+-------------+---------+----------+----------+
| Table       | Op      | Msg_type | Msg_text |
+-------------+---------+----------+----------+
| db1.sbtest1 | analyze | status   | OK       |
+-------------+---------+----------+----------+
1 row in set (0.00 sec)

However, we can still find a warning like this in the official documentation, even for version 8.1:

ANALYZE TABLE removes the table from the table definition cache, which requires a flush lock. If there are long running statements or transactions still using the table, subsequent statements and transactions must wait for those operations to finish before the flush lock is released. Because ANALYZE TABLE itself typically finishes quickly, it may not be apparent that delayed transactions or statements involving the same table are due to the remaining flush lock.

I requested corresponding updates to the related bug report as well as to the documentation:

https://bugs.mysql.com/bug.php?id=87065
https://bugs.mysql.com/bug.php?id=112670

Percona Server for MySQL

As mentioned above, Percona introduced a fix and removed the unnecessary table definition cache lock as a result of solving this bug report:

https://jira.percona.com/browse/PS-2503

When using the Percona variant, running ANALYZE TABLE has been safe since versions 5.6.38 and 5.7.20, as these were the active development series at the time. You may read the announcement in the release notes here:

https://docs.percona.com/percona-server/5.7/release-notes/Percona-Server-5.7.20-18.html#bugs-fixed

Percona Server for MySQL 8.0 has been free from the issue since the very first release (I tested back to the first GA release, 8.0.13-3), as the improvement was merged from the Percona Server for MySQL 5.7 series.

MariaDB Server

The locking ANALYZE TABLE problem applies to all MariaDB versions up to 10.5.3. In version 10.5.4, the solution from Percona was implemented as described in the following report:
https://jira.mariadb.org/browse/MDEV-15101

Therefore, when you run the query in 10.5.3 or lower, or in any previous series (even the latest 10.4.31), a similar situation may occur:

mysql > select @@version,@@version_comment;
+----------------+-------------------+
| @@version      | @@version_comment |
+----------------+-------------------+
| 10.5.3-MariaDB | MariaDB Server    |
+----------------+-------------------+
1 row in set (0.000 sec)

mysql > show processlist;
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+----------+
| Id | User     | Host      | db   | Command | Time | State                   | Info                                                           | Progress |
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+----------+
|  4 | msandbox | localhost | db1  | Query   |   18 | Sending data            | select avg(k) from sbtest1 where pad not like '%f%' group by c |    0.000 |
| 13 | msandbox | localhost | db1  | Query   |   16 | Waiting for table flush | analyze table sbtest1                                          |    0.000 |
| 14 | msandbox | localhost | db1  | Query   |   14 | Waiting for table flush | select * from sbtest1 where id=100                             |    0.000 |
| 15 | msandbox | localhost | NULL | Query   |    0 | starting                | show processlist                                               |    0.000 |
+----+----------+-----------+------+---------+------+-------------------------+----------------------------------------------------------------+----------+
4 rows in set (0.000 sec)

mysql > select @@version,@@version_comment;
+-----------------+-------------------+
| @@version       | @@version_comment |
+-----------------+-------------------+
| 10.4.31-MariaDB | MariaDB Server    |
+-----------------+-------------------+
1 row in set (0.000 sec)

mysql > show processlist;
+----+-------------+-----------+------+---------+------+--------------------------+----------------------------------------------------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                    | Info                                                           | Progress |
+----+-------------+-----------+------+---------+------+--------------------------+----------------------------------------------------------------+----------+
|  1 | system user |           | NULL | Daemon  | NULL | InnoDB purge coordinator | NULL                                                           |    0.000 |
|  2 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL                                                           |    0.000 |
|  3 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL                                                           |    0.000 |
|  4 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL                                                           |    0.000 |
|  5 | system user |           | NULL | Daemon  | NULL | InnoDB shutdown handler  | NULL                                                           |    0.000 |
|  9 | msandbox    | localhost | db1  | Query   |   18 | Sending data             | select avg(k) from sbtest1 where pad not like '%f%' group by c |    0.000 |
| 18 | msandbox    | localhost | db1  | Query   |   16 | Waiting for table flush  | analyze table sbtest1                                          |    0.000 |
| 19 | msandbox    | localhost | db1  | Query   |   12 | Waiting for table flush  | select * from sbtest1 where id=100                             |    0.000 |
| 22 | msandbox    | localhost | NULL | Query   |    0 | Init                     | show processlist                                               |    0.000 |
+----+-------------+-----------+------+---------+------+--------------------------+----------------------------------------------------------------+----------+
9 rows in set (0.000 sec)

Summary

As long as your database runs on a recent version of MySQL or MariaDB, running ANALYZE TABLE should be absolutely safe and not cause any unexpected stalls.

Users of all three major Percona Server for MySQL series (5.6.38+, 5.7.20+, and 8.0.x) are safe.

When, for any reason, you cannot yet upgrade the community variant to MySQL 8.0.24+ and have to stick with the 5.6 or 5.7 series for now, you may simply swap the MySQL Community binaries for Percona Server for MySQL ones, which are 100% compatible yet free from the problem. At the same time, you may check our post-EOL support for Percona Server for MySQL 5.7.
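On a Debian/Ubuntu system, the swap is roughly the following (a sketch only: it assumes the percona-release package is already installed, the package names vary by distribution and version, and you should take a backup and consult the official installation docs first):

$ sudo systemctl stop mysql
$ sudo percona-release setup ps57
$ sudo apt remove mysql-server-5.7 mysql-client-5.7
$ sudo apt install percona-server-server-5.7
$ sudo systemctl start mysql

The on-disk data format is compatible, so the Percona Server binaries start on the existing data directory in place.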

MariaDB users must upgrade to 10.5.4 or later to avoid the locking problem.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!

Oct
31
2023
--

Resolving Data Drift in a Dual-Primary Topology With Replica in MySQL/MariaDB


Hello friends,

In Managed Services, we have the opportunity to see different technologies and various topologies, which makes the work fascinating and challenging at the same time.

This time, I’m going to tell you about a particular case: a client with a dual-primary topology plus a replica, as detailed below:

PS-primary-1=192.168.0.14 [RW]
  |___ PS-primary-2=192.168.0.59 [RW] (Slave_delay: 0)
  |___ PS-replica-1=192.168.0.99 [R] (Slave_delay: 0)

[RW] means Read/Write access. 

[R] means Read Only access.

We received an alert of this kind on PS-primary-2:

Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table foo.persons; Can't find record in 'persons', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000360, end_log_pos 58177

Seconds later, we noticed that someone (not us) resolved the replication issue, possibly by skipping the error in some way (using pt-slave-restart) or simply executing something like:

mysql> STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

Or by injecting an empty transaction.
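For GTID-based replication, injecting an empty transaction typically looks like the following (the UUID:sequence value is purely illustrative; in practice it would be the GTID of the failing transaction):

mysql> STOP SLAVE;
mysql> SET GTID_NEXT='aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1234';
mysql> BEGIN; COMMIT;
mysql> SET GTID_NEXT='AUTOMATIC';
mysql> START SLAVE;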

(Disclaimer: We won’t deeply analyze the root cause here, but rather the surprise we found during the analysis.)

Having seen this, the most advisable action is to run pt-table-checksum, and if any differences appear, use pt-table-sync to fix the data drift.

Therefore, we executed pt-table-checksum in the following way:

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T12:40:31      0      1        6          1       1       0   0.193 foo.persons

(Second disclaimer: We have populated the table with a small amount of data to simplify and improve visibility in this case. With larger volumes, the same scenario also applies.)

(Credentials, for security reasons, are stored in the file /home/user/.my.cnf, which is why you don’t see them in any executed command. If you want to know how to create it, here’s an example.)
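(A minimal version of such a file might look like the following; the credentials are placeholders, and the file should be chmod 600.)

[client]
user=percona
password=s3cret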

At a glance, there seem to be differences. Let’s check the other primary instance and the replica as well.

PS-primary-2> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 08:40:31 | foo | persons |          5 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

PS-replica-1> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 09:53:10 | foo | persons |          4 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

The next step is to execute pt-table-sync with the --print option to review what the utility will actually do. The following output appears:

$ pt-table-sync --print h=192.168.0.59 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='3' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;

A DELETE statement has appeared. How is it possible?

As this is a dual-primary topology, it is advisable to perform a checksum on the other primary server as well, which is PS-primary-2. This will help identify any discrepancies and ensure data consistency between both primary servers.

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T13:08:12      0      1        5          1       1       0   0.147 foo.persons

That’s when we notice the servers disagree on the row count: six rows on one primary, five on the other, and four on the replica. How is this possible? Let’s review:

PS-primary-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

PS-replica-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

Let’s see what data the table currently contains:

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

As you can see, there is data drift everywhere. 

And how can this be possible? Data drift and the integrity of replica data is a complex topic. Let’s outline some common ways this can happen (generally speaking; this is by no means an exhaustive list):

  • A statement was executed on a primary with SET SESSION sql_log_bin = OFF (see the sketch after this list)
  • A statement was executed directly on the replica
    • This can happen if the replica was not in super_read_only and a super user executed a statement
    • This can happen if the replica was not in read_only
  • A statement was executed on a replica, and the replica was later promoted to a primary without GTID in place
  • A primary server was not configured for full ACID compliance, and it crashed
  • At some point, the primary was not configured for row-based replication (even briefly)

These are the most common ways that data drift occurs, but certainly not the only ways that it can happen. More exotic cases can involve bugs, engine differences, version differences, etc.
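For illustration, drift of the first kind takes only a few statements (a hypothetical session; the DELETE below is never written to the binary log, so the replicas never see it):

mysql> SET SESSION sql_log_bin = OFF;
mysql> DELETE FROM foo.persons WHERE personid = 3;
mysql> SET SESSION sql_log_bin = ON;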

Coming back to the issue, if we run pt-table-sync targeting the second primary instance, we will see the following result:

$ pt-table-sync --print h=192.168.0.14 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='1' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
DELETE FROM `foo`.`persons` WHERE `personid`='7' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;

Here, the output is less promising: it suggests deleting the rows with ID 1 and 7, and performing a REPLACE on ID 3.

Therefore, note that it asks us to delete rows, which is not correct. So, what to do? After discussing with the customer everything we had gathered, we agreed on the following:

  • We need to add the row with ID=3 to the instances where it’s missing.
  • We need to add the row with ID=1 to the instances where it’s missing.
  • We need to add the row with ID=7 to the instances where it’s missing.
  • We DO NOT need to delete any rows; no DELETES are necessary.

Important note: The client informed us that the data that exists only in one instance and not in the others is necessary. The client also verified that the actions we proposed were correct. Remember: we, as DBAs, are the guardians of the data and its integrity, but not necessarily its owners, and for ethical reasons we do not necessarily know its content.

How do we achieve this, then?

Unfortunately, in this case, using pt-table-sync is not recommended since, as we have seen with the --print option, it would perform DELETEs in both cases, regardless of which primary instance we consider the source of truth.

We also cannot simply execute the REPLACE on PS-primary-1, because the REPLACE command follows a specific logic, explained here:

MySQL uses the following algorithm for REPLACE (and LOAD DATA … REPLACE):

  1. Try to insert the new row into the table
  2. While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
    1. Delete from the table the conflicting row that has the duplicate key value
    2. Try again to insert the new row into the table
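A tiny standalone illustration of that algorithm, using a hypothetical table t: when the new row collides on the primary key, the server reports two affected rows, one for the delete and one for the insert.

mysql> CREATE TABLE t (id INT PRIMARY KEY, v VARCHAR(10));
mysql> INSERT INTO t VALUES (1, 'old');
mysql> REPLACE INTO t VALUES (1, 'new');
Query OK, 2 rows affected (0.00 sec)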

That means that if we execute a REPLACE on PS-primary-1 for a row that does not exist there but does exist on PS-primary-2 (the row with ID 3, in our case), the statement succeeds locally but results in a duplicate key error when it replicates, as the row already exists there. I’ll show you below:

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 09:20:13');
Query OK, 1 row affected (0.01 sec)

We connected to PS-primary-2, and indeed, we found the duplicate key error:

PS-primary-2 (none)> PAGER grep "Last_Err"; SHOW SLAVE STATUS\G
PAGER set to 'grep "Last_Err"'
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table foo.persons; Duplicate entry '3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000359, end_log_pos 26840
1 row in set (0.01 sec)

The commands we should execute would be the following:

set sql_log_bin=0;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');

However, as we observed, the first one will fail due to a duplicate key. So, how do we resolve this? The safest option, in my humble opinion, is to execute the three REPLACE statements on each of the instances, first invoking this command on every one of them:

PS-primary-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-1 > Bye


PS-primary-2 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-2 > Bye


PS-replica-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > set global super_read_only=1; set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

Finally, we verify that all instances have the same data, either with the SELECT statement or the pt-table-checksum command.

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:32      0      0        7          0       1       0   0.185 foo.persons

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons
Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:50      0      0        7          0       1       0   0.148 foo.persons

Conclusion

Maintaining this type of topology (dual primaries) can be a headache, as it is prone to such problems, and resolving them is not always as simple as it may seem. The tools pt-table-checksum and pt-table-sync are essential for resolving these issues in most cases and for efficient verification, as in this case.

Remember that if you have any questions or concerns, you can always contact us.

Lastly, don’t forget to run pt-table-sync with the --print option, to verify and share with the client or data owner whether the data to be corrected and the proposed corrections seem appropriate. Always.

Let’s stay synchronized, and I’ll see you in the next blog!

 

Oct
31
2023
--

Resolving Data Drift in a Dual-Primary Topology With Replica in MySQL/MariaDB

Resolving Data Drift in a Dual-Primary Topology With Replica

Hello friends,

In Managed Services, we have the opportunity to see different technologies and various topologies, which makes the work fascinating and challenging at the same time.

This time, I’m going to tell you about a particular case: a client with a dual-primary topology plus a replica, as detailed below:

PS-primary-1=192.168.0.14 [RW]
  |___ PS-primary-2=192.168.0.59 [RW] (Slave_delay: 0)
  |___ PS-replica-1=192.168.0.99 [R] (Slave_delay: 0)

[RW] means Read/Write access. 

[R] means Read Only access.

We received an alert of this kind on PS-primary-2:

Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table foo.persons; Can't find record in 'persons', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000360, end_log_pos 58177

Seconds later, we noticed that someone (not us) resolved the replication issue, possibly by skipping the error in some way (using pt-slave-restart) or simply executing something like:

mysql> STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

Or by injecting an empty transaction.

(Disclaimer: We won’t deeply analyze the root cause here, but rather the surprise we found during the analysis.)

Having seen this, the most advisable action is to run pt-table-checksum, and if any differences appear, use pt-table-sync to fix the data drift.

Therefore, we executed the pt-table-checksum in the following way:

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T12:40:31      0      1        6          1       1       0   0.193 foo.persons

(Second disclaimer: We have populated the table with a small amount of data to simplify and improve visibility in this case. With larger volumes, the same scenario also applies.)

(Credentials, for security reasons, are stored in the file /home/user/.my.cnf, which is why you don’t see them in any executed command. If you want to know how to create it, here’s an example)

At a glance, there seem to be differences. Let’s check the other primary instance and the replica as well.

PS-primary-2> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 08:40:31 | foo | persons |          5 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

PS-replica-1> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 09:53:10 | foo | persons |          4 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

The next step is to execute pt-table-sync with the –print option to review what the utility will actually do. The following output appears:

$ pt-table-sync --print h=192.168.0.59 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='3' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;

A DELETE statement has appeared. How is it possible?

As this is a dual-primary topology, it is advisable to perform a checksum on the other primary server as well, which is PS-primary-2. This will help identify any discrepancies and ensure data consistency between both primary servers.

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T13:08:12      0      1        5          1       1       0   0.147 foo.persons

That’s when we notice that one server has six rows, and the other has four. How is it possible? Let’s review:

PS-primary-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

PS-replica-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

Let’s see what data the table currently contains:

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

As you can see, there is data drift everywhere. 

And how can this be possible? I’ll enumerate several possibilities, and it is not an exhaustive list

Data drift and integrity of replica data is complex. Let’s outline some common ways this can happen (generally speaking):

  • A statement is executed on a primary with: SET SESSION sql_log_bin = OFF
  • A statement was executed directly on the replica
  • This can happen if the replica was not in super_read_only and a super user executed
  • This can happen if the replica was not in read_only
  • A statement was executed on a replica, and the replica was later promoted to a primary without GTID in place
  • A primary server is not configured for full ACID compliance, and it crashed
  • At some point, the primary was not configured for row-based replication (even briefly)

These are the most common ways that data drift occurs, but certainly not the only ways that it can happen. More exotic cases can involve bugs, engine differences, version differences, etc.

Coming back to the issue, if we run pt-table-sync targeting the second primary instance, we will see the following result:

$ pt-table-sync --print h=192.168.0.14 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='1' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
DELETE FROM `foo`.`persons` WHERE `personid`='7' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;

Here, the output is less promising: it suggests deleting the rows with ID 1 and 7, and performing a REPLACE on ID 3.

Therefore, note that it asks us to delete rows, which is not correct. So, what to do?  After a discussion with the customer (we’ve shared with them all that we gathered), we agreed on the following:

  • We need to add the row with ID=3 to the instances where it’s missing.
  • We need to update the row with ID=1 in the instances where it’s missing.
  • We need to add the row with ID=7 to the instances where it’s missing.
  • We DO NOT need to delete any rows; no DELETES are necessary.

Important note: The client informed us that the data that exists only in one instance and not in the others is necessary.  The client also verified that the actions we proposed to them were correct. Remember: we, as DBAs, are the guardians of the data and its integrity, but not necessarily the owners, let alone know the content for ethical reasons.

How do we achieve this, then?

Unfortunately, in this case, using pt-table-sync is not recommended since, as we have seen with the –print option, it would perform DELETES in both cases, regardless of which primary instance we consider as the source of truth.

We also cannot execute the REPLACE on PS-primary-1 because the REPLACE command follows a specific logic explained here

MySQL uses the following algorithm for REPLACE (and LOAD DATA … REPLACE):

  1. Try to insert the new row into the table
  2. While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
    1. Delete from the table the conflicting row that has the duplicate key value
    2. Try again to insert the new row into the table

That means that if we execute a REPLACE on PS-primary-1 where the row with ID 7 does not exist but does exist in PS-primary-2, it will result in a duplicate key error, which I’ll show you below (as the row already exists):

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 09:20:13');
Query OK, 1 row affected (0.01 sec)

We connected to PS-primary-2, and indeed, we found the duplicate key error:

PS-primary-2 (none)> PAGER grep "Last_Err"; SHOW SLAVE STATUSG
PAGER set to 'grep "Last_Err"'
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table foo.persons; Duplicate entry '3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000359, end_log_pos 26840
1 row in set (0.01 sec)

The commands we should execute would be the following:

set sql_log_bin=0;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');

However, as we observed, the first one will fail due to a duplicate key.  So, how do we resolve this?  The safest option, in my humble opinion, would be to execute the three REPLACE statements on each of the instances by first invoking this command:

PS-primary-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-1 > Bye


PS-primary-2 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-2 > Bye


PS-replica-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > set global super_read_only=1; set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

Finally, we verify that all instances have the same data, either with the SELECT statement or the pt-table-checksum command.

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:32      0      0        7          0       1       0   0.185 foo.persons

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons
Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:50      0      0        7          0       1       0   0.148 foo.persons

Conclusion

Maintaining this type of topology (dual primaries) can be a headache, and it is prone to exactly this kind of problem, which is not always as simple to resolve as it may seem. In most cases, pt-table-checksum and pt-table-sync are essential tools, both for repairing the inconsistency and for verifying the result efficiently, as shown here.

Remember that if you have any questions or concerns, you can always contact us.

Lastly, always run pt-table-sync with the --print option first, so you can review the proposed corrections and share them with the client or data owner to confirm the changes are appropriate.
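For reference, such a dry run against the hosts from this example could look like the following; the DSN values simply reuse the IPs above, so adjust them to your environment before running anything:

$ pt-table-sync --print h=192.168.0.14,D=foo,t=persons h=192.168.0.59

With --print, pt-table-sync only prints the REPLACE/DELETE statements it would run; nothing is modified until you re-invoke it with --execute.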

Let’s stay synchronized, and I’ll see you in the next blog!

 

Sep
15
2023
--

Convert MariaDB Binary Log File and Position-Based Replication to GTID Replication

At Percona Managed Services, we manage Percona Server for MySQL, Community MySQL, and MariaDB. There are slight differences when configuring and managing MariaDB GTID replication. In this blog, we’ll show you how to convert MariaDB binary log file and position-based replication to GTID replication.

In my lab, we have two test nodes; both servers run Debian 11 and MariaDB 10.5.19 with mariabackup installed.

The PRIMARY server is deb11m8m (IP: 192.168.56.190), and the REPLICA server name is deb11m8s (IP: 192.168.56.191).

1. Install MariaDB on Debian 11 on PRIMARY and REPLICA

wget https://downloads.mariadb.com/MariaDB/mariadb_repo_setup
chmod +x ./mariadb_repo_setup
./mariadb_repo_setup --mariadb-server-version="mariadb-10.5.19"
apt-get install mariadb-server mariadb-backup
systemctl daemon-reload
root@deb11m8m:~# wget https://downloads.mariadb.com/MariaDB/mariadb_repo_setup
Connecting to downloads.mariadb.com (downloads.mariadb.com)|104.17.191.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36188 (35K) [application/octet-stream]
Saving to: ‘mariadb_repo_setup’
2023-08-30 14:42:51 (26.0 MB/s) - ‘mariadb_repo_setup’ saved [36188/36188]
root@deb11m8m:~#  chmod +x  ./mariadb_repo_setup
root@deb11m8m:~#  ./mariadb_repo_setup --mariadb-server-version="mariadb-10.5.19"
# [info] Checking for script prerequisites.
# [warning] Found existing file at /etc/apt/sources.list.d/mariadb.list. Moving to /etc/apt/sources.list.d/mariadb.list.old_1
# [info] MariaDB Server version 10.5.19 is valid
# [info] Repository file successfully written to /etc/apt/sources.list.d/mariadb.list
# [info] Adding trusted package signing keys...
# [info] Running apt-get update…
# [info] Done adding trusted package signing keys
root@deb11m8m:~#
root@deb11m8m:~#  apt-get install mariadb-server mariadb-backup
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
 libconfig-inifiles-perl libmecab2 libopengl0
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  libdbd-mariadb-perl libmariadb3 mariadb-client-10.5 mariadb-client-core-10.5 mariadb-common mariadb-server-10.5 mariadb-server-core-10.5
Suggested packages:
  mailx mariadb-test netcat-openbsd
Selecting previously unselected package mariadb-client-core-10.5.
Preparing to unpack .../mariadb-client-core-10.5_1%3a10.5.19+maria~deb11_amd64.deb ...
Unpacking mariadb-client-core-10.5 (1:10.5.19+maria~deb11) ...
……
Setting up mariadb-server (1:10.5.19+maria~deb11) ...
Processing triggers for man-db (2.9.4-2) ...
Processing triggers for libc-bin (2.31-13+deb11u6) ...
root@deb11m8m:~#

1.1 Reset root user password and create a replication user

MariaDB [(none)]> ALTER USER 'root'@'localhost' IDENTIFIED BY '#######';
MariaDB [(none)]> GRANT PROXY ON ''@'%' TO 'root'@'localhost' WITH GRANT OPTION;
MariaDB [(none)]> FLUSH PRIVILEGES;
MariaDB [(none)]> CREATE USER 'repl'@'%' IDENTIFIED BY '#######';
MariaDB [(none)]> GRANT RELOAD, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repl'@'%';
MariaDB [(none)]> FLUSH PRIVILEGES;
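To double-check that the replication user ended up with the expected privileges, an optional verification:

MariaDB [(none)]> SHOW GRANTS FOR 'repl'@'%';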

1.2 Configure my.cnf on PRIMARY and REPLICA

PRIMARY

[mysqld]
pid-file                = /run/mysqld/mysqld.pid
basedir                 = /usr
server_id=1
bind-address=192.168.56.190
log-bin = /var/lib/mysql/bin_log
expire_logs_days        = 10
character-set-server  = utf8mb4
collation-server      = utf8mb4_general_ci

REPLICA
[mysqld]
pid-file                = /run/mysqld/mysqld.pid
basedir                 = /usr
server_id=2
bind-address=192.168.56.191
log-bin = /var/lib/mysql/bin_log
expire_logs_days        = 10
character-set-server  = utf8mb4
collation-server      = utf8mb4_general_ci

2. Let’s set up binary log and position-based replication between PRIMARY and REPLICA with the stream copy method.

2.1 on REPLICA

root@deb11m8s:~# nc -l -p 2222 | mbstream -x -C /var/lib/mysql

2.2 on PRIMARY

root@deb11m8m:/var/lib/mysql# mariabackup --stream=mbstream --parallel=4 --backup | nc -w 2 192.168.56.191 2222
[00] 2023-08-30 18:02:08 Connecting to server host: localhost, user: root, password: set, port: not set, socket: /run/mysqld/mysqld.sock
[00] 2023-08-30 18:02:08 Using server version 10.5.19-MariaDB-1:10.5.19+maria~deb11-log
mariabackup based on MariaDB server 10.5.19-MariaDB debian-linux-gnu (x86_64)
[00] 2023-08-30 18:02:08 uses posix_fadvise().
[00] 2023-08-30 18:02:08 cd to /var/lib/mysql/
[00] 2023-08-30 18:02:08 open files limit requested 0, set to 1024
[00] 2023-08-30 18:02:08 mariabackup: using the following InnoDB configuration:
[00] 2023-08-30 18:02:08 innodb_data_home_dir =
[00] 2023-08-30 18:02:08 innodb_data_file_path = ibdata1:12M:autoextend
[00] 2023-08-30 18:02:08 innodb_log_group_home_dir = ./
[00] 2023-08-30 18:02:08 InnoDB: Using Linux native AIO
2023-08-30 18:02:08 0 [Note] InnoDB: Number of pools: 1
[00] 2023-08-30 18:02:08 mariabackup: Generating a list of tablespaces
[00] 2023-08-30 18:02:08 >> log scanned up to (10912597)
[00] 2023-08-30 18:02:08 mariabackup: Starting 4 threads for parallel data files transfer
[03] 2023-08-30 18:02:08 Streaming ./mysql/innodb_index_stats.ibd
[04] 2023-08-30 18:02:08 Streaming ibdata1
……..
'/var/lib/mysql/xtrabackup_backupfiles/'
[00] 2023-08-30 18:02:10 MySQL binlog position: filename 'bin_log.000002', position '326', GTID of the last change ''
[00] 2023-08-30 18:02:10 Streaming backup-my.cnf
[00] 2023-08-30 18:02:10 Streaming xtrabackup_info
[00] 2023-08-30 18:02:10 Redo log (from LSN 10912585 to 10912597) was copied.
[00] 2023-08-30 18:02:10 completed OK!

2.3 On REPLICA, get binlog information

root@deb11m8s:/var/lib/mysql# cat xtrabackup_binlog_info
bin_log.000002 326

2.4 On REPLICA, prepare the backup

root@deb11m8s:/var/lib/mysql# mariabackup --prepare --use-memory=1G --target-dir=/var/lib/mysql
mariabackup based on MariaDB server 10.5.19-MariaDB debian-linux-gnu (x86_64)
[00] 2023-08-30 18:08:32 cd to /var/lib/mysql/
[00] 2023-08-30 18:08:32 open files limit requested 0, set to 1024
.
2023-08-30 18:08:32 0 [Note] InnoDB: Initializing buffer pool, total size = 1073741824, chunk size = 1073741824
2023-08-30 18:08:32 0 [Note] InnoDB: Completed initialization of buffer pool
2023-08-30 18:08:32 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=10912585,10912585
[00] 2023-08-30 18:08:32 Last binlog file , position 0
[00] 2023-08-30 18:08:32 completed OK!
root@deb11m8s:/var/lib/mysql#

2.5 Set up the replication 

root@deb11m8s:/var/lib/mysql# chown -R mysql:mysql /var/lib/mysql
root@deb11m8s:/var/lib/mysql#  systemctl start mysql
root@deb11m8s:/var/lib/mysql# mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 6
Server version: 10.5.19-MariaDB-0+deb11u2-log Debian 11
MariaDB [(none)]> stop slave; reset slave; reset slave all; reset master;
Query OK, 0 rows affected, 1 warning (0.000 sec)
Query OK, 0 rows affected (0.001 sec)
Query OK, 0 rows affected (0.000 sec)
Query OK, 0 rows affected (0.009 sec)
MariaDB [(none)]> CHANGE MASTER TO
    MASTER_HOST='192.168.56.190',
    MASTER_USER='repl',
    MASTER_PASSWORD='#######',
    MASTER_PORT=3306,
    MASTER_LOG_FILE='bin_log.000002',
    MASTER_LOG_POS=326,
    MASTER_CONNECT_RETRY=10;
Query OK, 0 rows affected (0.022 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.003 sec)
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 192.168.56.190
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 10
               Master_Log_File: bin_log.000002
           Read_Master_Log_Pos: 326
                Relay_Log_File: mysqld-relay-bin.000002
                 Relay_Log_Pos: 553
         Relay_Master_Log_File: bin_log.000002
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
                             …
           Exec_Master_Log_Pos: 326
               Relay_Log_Space: 863
                             …
            Master_SSL_Crlpath:
                    Using_Gtid: No
                   Gtid_IO_Pos:
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: optimistic
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)

We have now set up binary log and position-based replication between the PRIMARY and REPLICA servers. Our next step is to convert this replication to GTID-based replication.

MariaDB and MySQL have different GTID implementations.

MariaDB GTID consists of three numbers separated by dashes ('-'). For example, in 0-1-10, the first number (0) is the domain ID, the second (1) is the server ID, and the third (10) is the sequence number.

MySQL GTID has two parts, source_id and transaction_id, separated by a colon character (:).

e.g., 3E11FA47-71CA-11E1-9E33-C80AA9429562:23

The source_id is the source server's server_uuid, and the transaction_id is the transaction sequence number committed on the source.
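On the MariaDB side, you can see how these pieces map to a running server by querying the related system variables (the exact values depend on your server):

MariaDB [(none)]> SELECT @@gtid_domain_id, @@server_id, @@gtid_binlog_pos;

Here, @@gtid_binlog_pos reports the GTID of the last event written to the binary log, in the domain-server-sequence format described above.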

Let’s continue.

On PRIMARY, we enable GTID by running the commands below:

MariaDB [(none)]> SET GLOBAL gtid_domain_id = 1;
MariaDB [(none)]> SET GLOBAL gtid_strict_mode = 1;

Persist both settings in my.cnf, and verify GTID is enabled with the query below:

MariaDB [(none)]> select @@server_id,@@bind_address,@@log_bin,@@gtid_strict_mode,@@gtid_domain_id;
+-------------+----------------+-----------+--------------------+------------------+
| @@server_id | @@bind_address | @@log_bin | @@gtid_strict_mode | @@gtid_domain_id |
+-------------+----------------+-----------+--------------------+------------------+
|           1 | 192.168.56.190 |         1 |                  1 |                1 |
+-------------+----------------+-----------+--------------------+------------------+
1 row in set (0.000 sec)

On REPLICA, we do the same with its own domain ID:

MariaDB [(none)]> SET GLOBAL gtid_domain_id = 2;
MariaDB [(none)]> SET GLOBAL gtid_strict_mode = 1;

Again, persist both settings in my.cnf and verify:

MariaDB [(none)]> select @@server_id,@@bind_address,@@log_bin,@@gtid_strict_mode,@@gtid_domain_id;
+-------------+----------------+-----------+--------------------+------------------+
| @@server_id | @@bind_address | @@log_bin | @@gtid_strict_mode | @@gtid_domain_id |
+-------------+----------------+-----------+--------------------+------------------+
|           2 | 192.168.56.191 |         1 |                  1 |                2 |
+-------------+----------------+-----------+--------------------+------------------+
1 row in set (0.000 sec)
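The corresponding my.cnf entries are a one-line addition per variable under the existing [mysqld] section; a minimal sketch (remember that gtid_domain_id is 1 on PRIMARY and 2 on REPLICA):

[mysqld]
gtid_domain_id   = 1
gtid_strict_mode = 1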

On REPLICA, we need to stop replication and extract the values of Relay_Master_Log_File and Exec_Master_Log_Pos from the output.

MariaDB [test_db]> STOP SLAVE;
Query OK, 0 rows affected (0.007 sec)
MariaDB [test_db]> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State:
                   Master_Host: 192.168.56.190
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 10
               Master_Log_File: bin_log.000002
           Read_Master_Log_Pos: 1206
                Relay_Log_File: mysqld-relay-bin.000007
                 Relay_Log_Pos: 553
         Relay_Master_Log_File: bin_log.000002
              Slave_IO_Running: No
             Slave_SQL_Running: No
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 1206
               Relay_Log_Space: 863
                        ……..
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)

In our case, the Relay_Master_Log_File value is bin_log.000002, and the Exec_Master_Log_Pos is 1206.

On PRIMARY, we can use the BINLOG_GTID_POS() function to get the GTID position that corresponds to these binary log coordinates.

MariaDB [test_db]> SELECT BINLOG_GTID_POS('bin_log.000002', 1206), @@hostname;
+-----------------------------------------+------------+
| BINLOG_GTID_POS('bin_log.000002', 1206) | @@hostname |
+-----------------------------------------+------------+
| 1-1-1,0-1-4                             | deb11m8m   |
+-----------------------------------------+------------+
1 row in set (0.000 sec)

We can now modify the replica configuration on the REPLICA server deb11m8s by executing the following statements:

MariaDB [test_db]> SET GLOBAL gtid_slave_pos = '1-1-1,0-1-4';
MariaDB [test_db]> CHANGE MASTER TO master_use_gtid=slave_pos;
MariaDB [test_db]> START SLAVE;
MariaDB [test_db]> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 192.168.56.190
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 10
               Master_Log_File: bin_log.000002
           Read_Master_Log_Pos: 1391
                Relay_Log_File: mysqld-relay-bin.000002
                 Relay_Log_Pos: 910
         Relay_Master_Log_File: bin_log.000002
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 1391
               Relay_Log_Space: 1220
                         …….
                Master_SSL_Key:
         Seconds_Behind_Master: 0
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 1-1-2,0-1-4
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: optimistic
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 1
1 row in set (0.000 sec)

The Using_Gtid: Slave_Pos and Gtid_IO_Pos: 1-1-2,0-1-4 fields in the SHOW SLAVE STATUS output indicate that the replica now uses GTIDs to track replication. We can use the same commands to set up GTID replication on other replicas as well.

When running CHANGE MASTER TO master_use_gtid=slave_pos; and setting the MASTER_USE_GTID replication parameter, you have the option of enabling Global Transaction IDs to use either the current_pos or slave_pos values.

Using the value current_pos causes the replica to set its position based on the gtid_current_pos system variable, which is a union of gtid_binlog_pos and gtid_slave_pos. Using the value slave_pos causes the replica to instead set its position based on the gtid_slave_pos system variable. You may run into issues when you use the value current_pos if you write any local transactions on the replica. For details, please refer to https://mariadb.com/kb/en/gtid/.
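The relationship between these variables is easy to inspect directly, since gtid_current_pos is derived from the other two; a quick check (values will differ on your server):

MariaDB [(none)]> SELECT @@gtid_binlog_pos, @@gtid_slave_pos, @@gtid_current_pos;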

On Community MySQL or Percona Server for MySQL, the parameters that enable GTID replication are gtid_mode=ON and enforce_gtid_consistency=ON.

The command is:

change master to
master_host = '192.168.1.120',
master_port=3306,
master_user = 'repl',
master_password = 'password',
master_auto_position=1;
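On the MySQL side, these settings are typically persisted in my.cnf as well; a minimal sketch:

[mysqld]
gtid_mode                = ON
enforce_gtid_consistency = ON

With master_auto_position=1, the replica negotiates its starting point from its gtid_executed set rather than from a binary log file name and position.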

Conclusion

We can see that the procedure to enable GTIDs on MariaDB replication is simple, but the commands and parameters are a bit different.

As mentioned, Percona offers Support and Managed Services for Community versions of MariaDB. At the same time, we encourage MariaDB users to explore Percona Server for MySQL as an alternative. Should you decide to migrate to Percona software for MySQL, our experts will assist you throughout the migration process and support you after it is complete.

 

Contact us to discuss migration options

 

Hope this is helpful for your daily MySQL and MariaDB management.
