Jun
25
2024
--

MySQL Replication: How To Deal With the ‘Got Fatal Error 1236’ or MY-013114 Error

Replication has been the core functionality enabling high availability in MySQL for decades. However, you may still encounter replication errors that keep you awake at night. One of the most common and most challenging to deal with starts with: “Got fatal error 1236 from source when reading data from binary log“. This blog post is […]

May
29
2024
--

Valkey/Redis: Setting Up Replication

In most database systems, like MySQL, PostgreSQL, and MongoDB, replication of some kind is used to create a highly available architecture. Valkey/Redis is no different in this regard. Replication is native functionality in Valkey, supporting multiple replicas and even chains of replication. To clear up any confusion, understand that Valkey replication is a different concept compared […]

Dec
14
2023
--

How to Convert PostgreSQL Streaming Replication to Logical Replication

While streaming replication is a hallmark feature of PostgreSQL, some workloads demand a more refined and adaptable replication approach. Logical replication offers a distinct advantage over traditional streaming replication by providing a more detailed and selective approach to data replication, enabling organizations to replicate specific tables selectively and even filter out particular types of data.

In this blog post, we will look into the steps of transforming an existing PostgreSQL streaming replication node into a logical replication node. This method facilitates a smooth transition to logical replication without the need to start from scratch, allowing organizations to harness the enhanced capabilities of logical replication effectively.

Prerequisites

To switch from the current streaming replication to logical replication, confirm that the primary node fulfills the following prerequisites (a quick way to check them is sketched after the list):

  • Set the wal_level to logical.
  • Adjust max_replication_slots and max_wal_senders as needed.
  • Ensure all tables have a replica identity set.
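
For reference, a minimal set of checks and changes on the primary might look like the following sketch (the parameter values and the public schema filter are illustrative):

-- Check the current settings
SHOW wal_level;                 -- must be 'logical'
SHOW max_replication_slots;
SHOW max_wal_senders;

-- Raise wal_level if needed (requires a server restart to take effect)
ALTER SYSTEM SET wal_level = 'logical';

-- List tables in the public schema that still rely on the default replica identity
-- but have no primary key (these need a primary key, a unique index, or REPLICA IDENTITY FULL)
SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
  AND c.relreplident = 'd'
  AND NOT EXISTS (SELECT 1 FROM pg_index i WHERE i.indrelid = c.oid AND i.indisprimary);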

Steps to convert streaming replication to logical replication

The lab setup includes two servers: primary (172.31.46.28) and standby (172.31.39.50), currently configured with streaming replication. The following steps will guide the transition from streaming to logical replication. 

1. Verify that the current streaming replication setup is synchronized and that there is no lag between the primary and standby nodes.

percona=# select usename,client_addr,state,replay_lag from pg_stat_replication;
 usename | client_addr  |   state   | replay_lag
---------+--------------+-----------+------------
 percona | 172.31.39.50 | streaming |
(1 row)

percona=# select slot_name,slot_type,active from pg_replication_slots;
       slot_name       | slot_type | active
-----------------------+-----------+--------
 physical_replica_slot | physical  | t
(1 row)

2. Create a logical replication slot on the primary node.

percona=# select pg_create_logical_replication_slot('logical_replica_slot', 'pgoutput');
 pg_create_logical_replication_slot
------------------------------------
 (logical_replica_slot,0/7000180)
(1 row)
percona=# select * from pg_replication_slots ;
       slot_name       |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 physical_replica_slot |          | physical  |        |          | f         | t      |     116306 |      |              | 0/7000180   |                     | reserved   |               | f
 logical_replica_slot  | pgoutput | logical   |  16384 | percona  | f         | f      |            |      |          768 | 0/7000148   | 0/7000180           | reserved   |               | f
(2 rows)

3. Create publication for tables on the primary node.

percona=# CREATE PUBLICATION my_pub FOR ALL TABLES;
CREATE PUBLICATION

4. Next, promote the intended standby node to a standalone node.

percona=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)
postgres=# SELECT pg_promote();
 pg_promote
------------
 t
(1 row)
postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

5. Review the PostgreSQL log file on the promoted standby server and take note of the Log Sequence Number (LSN) associated with the “redo done at” message. In the log file below, the LSN is 0/7000640.

Promoted standby PostgreSQL log file:
2023-12-04 13:41:11.527 UTC [37749] LOG:  received promote request
2023-12-04 13:41:11.527 UTC [37750] FATAL:  terminating walreceiver process due to administrator command
2023-12-04 13:41:11.527 UTC [37749] LOG:  invalid record length at 0/7000678: wanted 24, got 0
2023-12-04 13:41:11.527 UTC [37749] LOG:  redo done at 0/7000640 system usage: CPU: user: 0.00 s, system: 0.01 s, elapsed: 1638.50 s
2023-12-04 13:41:11.527 UTC [37749] LOG:  last completed transaction was at log time 2023-12-04 13:40:25.619889+00
2023-12-04 13:41:11.535 UTC [37749] LOG:  selected new timeline ID: 2
2023-12-04 13:41:11.622 UTC [37749] LOG:  archive recovery complete
2023-12-04 13:41:11.633 UTC [37747] LOG:  checkpoint starting: force
2023-12-04 13:41:11.637 UTC [37746] LOG:  database system is ready to accept connections
2023-12-04 13:41:11.639 UTC [37883] WARNING:  archive_mode enabled, yet archiving is not configured
2023-12-04 13:41:11.955 UTC [37747] LOG:  checkpoint complete: wrote 5 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.305 s, sync=0.005 s, total=0.322 s; sync files=5, longest=0.004 s, average=0.001 s; distance=1 kB, estimate=14745 kB

The log entry “redo done at 0/7000640” tells us the location of the most recently applied transaction. We use this Log Sequence Number (LSN) to advance the logical replication slot created in step two, so that only transactions occurring after this point are replicated to the newly promoted node.
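
If you prefer not to scan the log by eye, the same LSN can be pulled out of the promoted standby’s log with a quick grep (the log file path below is illustrative; adjust it to your logging configuration):

$ grep "redo done at" /var/lib/pgsql/data/log/postgresql.log | tail -1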

6. On the primary server, advance the logical replication slot created in step two.

percona=# select pg_replication_slot_advance('logical_replica_slot', '0/7000640');
   pg_replication_slot_advance    
----------------------------------
(logical_replica_slot,0/7000640)

7. On the standby server, create the subscription for the publication created in step three.

percona=# CREATE SUBSCRIPTION my_sub CONNECTION 'user=percona password=<redacted> host=<host_ip> port=5432' PUBLICATION my_pub WITH (copy_data=false, slot_name='logical_replica_slot', create_slot=false);

8. At this stage, logical replication will commence, which can be verified by inspecting the replication status on the primary server.

percona=# select slot_name,slot_type,active from pg_replication_slots;
       slot_name       | slot_type | active
-----------------------+-----------+--------
 physical_replica_slot | physical  | f      
 logical_replica_slot  | logical   | t      
(2 rows)
percona=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 116806
usesysid         | 16434
usename          | percona
application_name | my_sub
client_addr      | 172.31.39.50
client_hostname  |
client_port      | 52512
backend_start    | 2023-12-04 13:52:46.200631+00
backend_xmin     |
state            | streaming
sent_lsn         | 0/7002148
write_lsn        | 0/7002148
flush_lsn        | 0/7002148
replay_lsn       | 0/7002148
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2023-12-04 14:06:19.464122+00
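
On the subscriber side (the promoted standby), the subscription created in step seven can also be checked; a minimal sketch:

percona=# SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time FROM pg_stat_subscription;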

9. It’s important to remove the now-unused physical replication slot; an inactive slot retains WAL and would eventually fill up the primary server’s disk.

postgres=# select pg_drop_replication_slot('physical_replica_slot');
 pg_drop_replication_slot 
--------------------------
(1 row)

The steps covered in this process make the transition to logical replication smoother, allowing for selective replication and improved data management across different parts of your database system. This change provides more control over how you manage and share data within PostgreSQL.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Oct
31
2023
--

Resolving Data Drift in a Dual-Primary Topology With Replica in MySQL/MariaDB


Hello friends,

In Managed Services, we have the opportunity to see different technologies and various topologies, which makes the work fascinating and challenging at the same time.

This time, I’m going to tell you about a particular case: a client with a dual-primary topology plus a replica, as detailed below:

PS-primary-1=192.168.0.14 [RW]
  |___ PS-primary-2=192.168.0.59 [RW] (Slave_delay: 0)
  |___ PS-replica-1=192.168.0.99 [R] (Slave_delay: 0)

[RW] means Read/Write access. 

[R] means Read Only access.

We received an alert of this kind on PS-primary-2:

Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table foo.persons; Can't find record in 'persons', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000360, end_log_pos 58177

Seconds later, we noticed that someone (not us) resolved the replication issue, possibly by skipping the error in some way (using pt-slave-restart) or simply executing something like:

mysql> STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

Or by injecting an empty transaction.
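
For a GTID-based setup, injecting an empty transaction typically looks like the following sketch (the UUID and sequence number are placeholders for the failing transaction’s GTID):

mysql> STOP SLAVE;
mysql> SET GTID_NEXT='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:123';
mysql> BEGIN; COMMIT;
mysql> SET GTID_NEXT='AUTOMATIC';
mysql> START SLAVE;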

(Disclaimer: We won’t deeply analyze the root cause here, but rather the surprise we found during the analysis.)

Having seen this, the most advisable action is to run pt-table-checksum, and if any differences appear, use pt-table-sync to fix the data drift.

Therefore, we executed the pt-table-checksum in the following way:

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T12:40:31      0      1        6          1       1       0   0.193 foo.persons

(Second disclaimer: We have populated the table with a small amount of data to simplify and improve visibility in this case. With larger volumes, the same scenario also applies.)

(Credentials, for security reasons, are stored in the file /home/user/.my.cnf, which is why you don’t see them in any executed command. If you want to know how to create it, here’s an example)
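
A minimal ~/.my.cnf for this purpose might look like the following sketch (values are placeholders):

[client]
user=percona
password=<redacted>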

At a glance, there seem to be differences. Let’s check the other primary instance and the replica as well.

PS-primary-2> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 08:40:31 | foo | persons |          5 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

PS-replica-1> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 09:53:10 | foo | persons |          4 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

The next step is to execute pt-table-sync with the --print option to review what the utility will actually do. The following output appears:

$ pt-table-sync --print h=192.168.0.59 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='3' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;

A DELETE statement has appeared. How is it possible?

As this is a dual-primary topology, it is advisable to perform a checksum on the other primary server as well, which is PS-primary-2. This will help identify any discrepancies and ensure data consistency between both primary servers.

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T13:08:12      0      1        5          1       1       0   0.147 foo.persons

That’s when we notice that one server has six rows, and the other has four. How is it possible? Let’s review:

PS-primary-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

PS-replica-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

Let’s see what data the table currently contains:

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

As you can see, there is data drift everywhere. 

How can this be possible? Data drift and the integrity of replica data is a complex topic, and the list below is not exhaustive. Let’s outline some common ways this can happen (generally speaking; a quick check of the replica’s read-only settings is sketched after the list):

  • A statement was executed on a primary with SET SESSION sql_log_bin = OFF
  • A statement was executed directly on the replica:
    • This can happen if the replica was not in super_read_only and a user with the SUPER privilege executed a write
    • This can happen if the replica was not in read_only
  • A statement was executed on a replica, and the replica was later promoted to a primary without GTIDs in place
  • A primary server was not configured for full ACID compliance, and it crashed
  • At some point, the primary was not configured for row-based replication (even briefly)
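
A quick way to check whether a replica is currently protected against direct writes (a minimal sketch; run it on the replica):

mysql> SELECT @@global.read_only, @@global.super_read_only;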

These are the most common ways that data drift occurs, but certainly not the only ways that it can happen. More exotic cases can involve bugs, engine differences, version differences, etc.

Coming back to the issue, if we run pt-table-sync targeting the second primary instance, we will see the following result:

$ pt-table-sync --print h=192.168.0.14 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='1' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
DELETE FROM `foo`.`persons` WHERE `personid`='7' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;

Here, the output is less promising: it suggests deleting the rows with ID 1 and 7, and performing a REPLACE on ID 3.

Therefore, note that it asks us to delete rows, which is not correct. So, what should we do? After a discussion with the customer (we shared with them everything we had gathered), we agreed on the following:

  • We need to add the row with ID=3 to the instances where it’s missing.
  • We need to add the row with ID=1 to the instances where it’s missing.
  • We need to add the row with ID=7 to the instances where it’s missing.
  • We DO NOT need to delete any rows; no DELETES are necessary.

Important note: The client informed us that the data that exists on only one instance and not on the others is needed. The client also verified that the actions we proposed were correct. Remember: we, as DBAs, are the guardians of the data and its integrity, but not necessarily its owners, and for ethical reasons we may not even know its content.

How do we achieve this, then?

Unfortunately, in this case, using pt-table-sync is not recommended since, as we have seen with the --print option, it would perform DELETEs in both cases, regardless of which primary instance we consider the source of truth.

We also cannot simply execute the REPLACE statements on PS-primary-1, because REPLACE follows a specific logic, documented in the MySQL manual:

MySQL uses the following algorithm for REPLACE (and LOAD DATA … REPLACE):

  1. Try to insert the new row into the table
  2. While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
    1. Delete from the table the conflicting row that has the duplicate key value
    2. Try again to insert the new row into the table

That means that if we execute a REPLACE on PS-primary-1 for a row that does not exist there but does exist on PS-primary-2 (such as the row with ID 3), the replicated event will result in a duplicate key error on PS-primary-2, as I’ll show you below:

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 09:20:13');
Query OK, 1 row affected (0.01 sec)

We connected to PS-primary-2, and indeed, we found the duplicate key error:

PS-primary-2 (none)> PAGER grep "Last_Err"; SHOW SLAVE STATUS\G
PAGER set to 'grep "Last_Err"'
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table foo.persons; Duplicate entry '3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000359, end_log_pos 26840
1 row in set (0.01 sec)

The commands we should execute would be the following:

set sql_log_bin=0;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');

However, as we observed earlier, if these statements replicate, the first one will fail on the other primary with a duplicate key error. So, how do we resolve this? The safest option, in my humble opinion, is to execute the three REPLACE statements on each of the instances, first invoking this command to disable binary logging for the session:

PS-primary-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-1 > Bye


PS-primary-2 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-2 > Bye


PS-replica-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > set global super_read_only=1; set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

Finally, we verify that all instances have the same data, either with the SELECT statement or the pt-table-checksum command.

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:32      0      0        7          0       1       0   0.185 foo.persons

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons
Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:50      0      0        7          0       1       0   0.148 foo.persons

Conclusion

Maintaining this type of topology (dual primaries) can be a headache, as it is prone to exactly this kind of problem, and resolving it is not always as simple as it may seem. The tools pt-table-checksum and pt-table-sync are essential for verifying and resolving these issues in most cases, as they were here.

Remember that if you have any questions or concerns, you can always contact us.

Lastly, don’t forget to run pt-table-sync with the --print option first, and verify with the client or data owner that the data to be corrected and the proposed corrections are appropriate. Always.

Let’s stay synchronized, and I’ll see you in the next blog!

 

Sep
20
2023
--

How to Read Simplified SHOW REPLICA STATUS Output


As a MySQL database administrator, you’re likely familiar with the SHOW REPLICA STATUS command. It is an important command for monitoring the replication status on your MySQL replicas. However, its output can be overwhelming for beginners, especially regarding the binary log coordinates. I have seen confusion among new DBAs about which binary log file and position represent what in replication.

In this guide, we’ll simplify the SHOW REPLICA STATUS output, focusing on the critical binary log coordinates essential for troubleshooting and managing replication.

The key binlog coordinates

Before we delve into the output, let’s understand the key binlog coordinates we’ll be working with (a quick way to pull just these fields is sketched after the list):

  • Master_Log_File: This is the name of the primary binary log file that the I/O thread is currently reading from.
  • Read_Master_Log_Pos: It represents the position up to which the I/O thread has read in the current primary binary log file.
  • Relay_Log_File: This is the name of the relay log file that the SQL thread is currently processing.
  • Relay_Log_Pos: It shows the position up to which the SQL thread has finished processing in the current relay log file.
  • Relay_Master_Log_File: This is the name of the primary binary log file that contains the most recent event executed by the SQL thread.
  • Exec_Master_Log_Pos: It indicates the position up to which the SQL thread has processed in the current master binary log file. It can be used to start a new replica from a current replica with the CHANGE MASTER TO … MASTER_LOG_POS option.
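
A quick way to pull just these fields out of the full output is the mysql client’s PAGER trick (a sketch; the pattern simply matches the coordinate field names):

mysql> PAGER grep -E "Log_File|Log_Pos"
mysql> SHOW REPLICA STATUS\G
mysql> nopager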

Simplified SHOW REPLICA STATUS output

The following image should help you understand the binary log coordinates in the SHOW REPLICA STATUS output.

(Image: binary log coordinates in the SHOW REPLICA STATUS output)

Decoding the SHOW REPLICA STATUS output

Now, let’s break down the SHOW REPLICA STATUS output to understand these binlog coordinates:

  • Master_Log_File and Read_Master_Log_Pos: These values tell you which primary binary log file and position the I/O thread is currently reading. It’s like the bookmark in a book, showing you where the replication process is in the Primary’s log.
  • Relay_Log_File and Relay_Log_Pos: These values reveal the name of the relay log file and the position up to which the SQL thread has processed. Think of it as the progress report of the SQL thread.
  • Relay_Master_Log_File and Exec_Master_Log_Pos: These parameters are essential when you need to reset replication on a replica:
    • Relay_Master_Log_File specifies the name of the primary binary log file containing the most recent event executed by the SQL thread.
    • Exec_Master_Log_Pos tells you the precise position within that file. Together, they enable you to pick up where replication left off.

Troubleshooting and managing replication

Understanding these binlog coordinates simplifies troubleshooting and managing replication:

  • When replication breaks, you can use Relay_Master_Log_File and Exec_Master_Log_Pos to identify the exact location and resume from there (a sketch follows this list).
  • Monitoring Master_Log_File and Read_Master_Log_Pos helps you keep track of the I/O thread’s progress on the primary.
  • Checking Relay_Log_File and Relay_Log_Pos lets you know how far the SQL thread has come in processing events.
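
For example, repointing a replica using those two values might look like the following sketch (the host, user, and file/position values are illustrative; on versions before MySQL 8.0.23, use CHANGE MASTER TO with the equivalent MASTER_* options):

STOP REPLICA;
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST='source-host',
  SOURCE_USER='repl',
  SOURCE_PASSWORD='<redacted>',
  SOURCE_LOG_FILE='mysql-bin.000360',  -- value of Relay_Master_Log_File
  SOURCE_LOG_POS=58177;                -- value of Exec_Master_Log_Pos
START REPLICA;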

By grasping these key binlog coordinates, you can confidently manage MySQL replication, resolve issues efficiently, and keep your databases in sync.
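
As a quick illustration, here is a minimal sketch of using these coordinates to point a replica at the source. The host, credentials, file name, and position are hypothetical placeholders you would take from SHOW REPLICA STATUS on an existing replica (Relay_Master_Log_File and Exec_Master_Log_Pos); the syntax shown is the MySQL 8.0.23+ CHANGE REPLICATION SOURCE TO form, with CHANGE MASTER TO … MASTER_LOG_FILE/MASTER_LOG_POS being the older equivalent.

-- Placeholder values taken from SHOW REPLICA STATUS on an existing replica:
--   Relay_Master_Log_File -> SOURCE_LOG_FILE, Exec_Master_Log_Pos -> SOURCE_LOG_POS
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'source.example.com',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = '******',
  SOURCE_LOG_FILE = 'mysql-bin.000123',
  SOURCE_LOG_POS = 45678;
START REPLICA;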

Quick tip for DBAs

We know monitoring is one of the important components of your architecture. We recommend using Percona Monitoring and Management, the best monitoring tool for your open source databases.

The MySQL Replication Summary dashboard comes in really handy when monitoring the replication status. Please find the sample snapshot from our test monitoring node:

mysql replication dashboard - PMM

Conclusion

The SHOW REPLICA STATUS output shouldn’t be something that confuses you. By focusing on the binlog coordinates, you gain valuable insights into the replication process. Whether you’re troubleshooting a broken replica or monitoring ongoing replication, these coordinates guide success in managing MySQL replication. Next time you encounter the SHOW REPLICA STATUS output, remember that it’s simply telling you where your replication stands in the grand scheme of your MySQL environment.

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

 

Download Percona Monitoring and Management Today

Aug
17
2023
--

InnoDB ClusterSet Deployment With MySQLRouter

InnoDB ClusterSet Deployment With MySQLRouter

This blog post will cover the basic setup of the InnoDB ClusterSet environment, which provides disaster tolerance for InnoDB Cluster deployments by associating a primary InnoDB Cluster with one or more replicas in alternate locations/different data centers. InnoDB ClusterSet automatically manages replication from the primary cluster to the replica clusters via a specific ClusterSet Async replication channel. If the primary cluster becomes inaccessible due to a loss of network connectivity or a data center issue, you can make a replica cluster active in its place.

Now, let’s see in detail how exactly we can configure the topology.

InnoDB ClusterSet Deployment

We have used the sandbox environment available via MySQLShell utility for this setup.

Environment

Cluster1:
         127.0.0.1:3308
         127.0.0.1:3309
         127.0.0.1:3310

Cluster2:
         127.0.0.1:3311
         127.0.0.1:3312
         127.0.0.1:3313

Router:
         127.0.0.1:6446/6447

Let’s set up the first cluster (“cluster1”)

  •  Deploying the sandboxes.
MySQL JS > dba.deploySandboxInstance(3308) 
MySQL JS > dba.deploySandboxInstance(3309) 
MySQL JS > dba.deploySandboxInstance(3310)

  •  Then, we need to perform some pre-checks before initiating the cluster.
###connecting to the concerned nodes one by one. 

MySQL JS > shell.connect('root@localhost:3308') 
MySQL localhost:3308 ssl JS > shell.connect('root@localhost:3309') 
MySQL localhost:3309 ssl JS > shell.connect('root@localhost:3310') 

###The below commands check whether each instance satisfies the InnoDB cluster requirements (group replication settings) and fix any missing requirements automatically. Here, we have configured a new user "iroot" for cluster deployment. 

MySQL localhost:3308 ssl JS > dba.checkInstanceConfiguration('root@localhost:3308') 
MySQL localhost:3308 ssl JS > dba.configureInstance('root@127.0.0.1:3308',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'}) 

MySQL localhost:3309 ssl JS > dba.checkInstanceConfiguration('root@localhost:3309') 
MySQL localhost:3309 ssl JS > dba.configureInstance('root@127.0.0.1:3309',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'}) 

MySQL localhost:3310 ssl JS > dba.checkInstanceConfiguration('root@localhost:3310') 
MySQL localhost:3310 ssl JS > dba.configureInstance('root@127.0.0.1:3310',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})

Once all the instances are prepared, we can plan to create the cluster with the seed node. The createCluster() command will perform all the hidden steps of initializing group replication, and later on, the other nodes join the group via distributed recovery or the clone plugin.

InnoDB Cluster is built on top of Group Replication, which provides automatic membership management, fault tolerance, and automatic failover. It provides us with an easy interface to deploy and manage complex topologies with DR support.

  • We will bootstrap the cluster with an initial node ("localhost:3308").
MySQL localhost:3310 ssl JS > shell.connect('iroot@localhost:3308') 
MySQL localhost:3308 ssl JS > cluster1 = dba.createCluster('Cluster1') 
MySQL localhost:3308 ssl JS > cluster1 = dba.getCluster()

Output:

MySQL localhost:3308 ssl JS > cluster1.status()
{
    "clusterName": "Cluster1", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "127.0.0.1:3308", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures.", 
        "topology": {
            "127.0.0.1:3308": {
                "address": "127.0.0.1:3308", 
                "memberRole": "PRIMARY", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.31"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "127.0.0.1:3308"
}

  • Here, we have successfully bootstrapped the first node. Next, the other nodes will join the cluster using the CLONE Plugin.
MySQL localhost:3308 ssl JS > cluster1.addInstance("iroot@localhost:3309",{password:'Iroot@1234'})

Output:

* Waiting for clone to finish...

NOTE: 127.0.0.1:3309 is being cloned from 127.0.0.1:3308

** Stage DROP DATA: Completed 

** Clone Transfer 

    FILE COPY  ############################################################  100%  Completed

    PAGE COPY  ############################################################  100%  Completed

    REDO COPY  ############################################################  100%  Completed

NOTE: 127.0.0.1:3309 is shutting down...

* Waiting for server restart... ready 

* 127.0.0.1:3309 has restarted, waiting for clone to finish...

** Stage RESTART: Completed

* Clone process has finished: 73.66 MB transferred in about 1 second (~73.66 MB/s)

State recovery already finished for '127.0.0.1:3309'

The instance '127.0.0.1:3309' was successfully added to the cluster.

 

MySQL localhost:3308 ssl JS > cluster1.addInstance("iroot@localhost:3310",{password:'Iroot@1234'})

Output:

* Waiting for clone to finish...

NOTE: 127.0.0.1:3310 is being cloned from 127.0.0.1:3309

** Stage DROP DATA: Completed 

** Clone Transfer 

    FILE COPY  ############################################################  100%  Completed

    PAGE COPY  ############################################################  100%  Completed

    REDO COPY  ############################################################  100%  Completed

NOTE: 127.0.0.1:3310 is shutting down...

* Waiting for server restart... ready 

* 127.0.0.1:3310 has restarted, waiting for clone to finish...

** Stage RESTART: Completed

* Clone process has finished: 73.66 MB transferred in about 1 second (~73.66 MB/s)

State recovery already finished for '127.0.0.1:3310'

The instance '127.0.0.1:3310' was successfully added to the cluster.

  • At this stage, our first cluster is ready with all three nodes.
MySQL localhost:3308 ssl JS > cluster1.status()

Output:

{

    "clusterName": "Cluster1", 

    "defaultReplicaSet": {

        "name": "default", 

        "primary": "127.0.0.1:3308", 

        "ssl": "REQUIRED", 

        "status": "OK", 

        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 

        "topology": {

            "127.0.0.1:3308": {

                "address": "127.0.0.1:3308", 

                "memberRole": "PRIMARY", 

                "mode": "R/W", 

                "readReplicas": {}, 

                "replicationLag": "applier_queue_applied", 

                "role": "HA", 

                "status": "ONLINE", 

                "version": "8.0.31"

            }, 

            "127.0.0.1:3309": {

                "address": "127.0.0.1:3309", 

                "memberRole": "SECONDARY", 

                "mode": "R/O", 

                "readReplicas": {}, 

                "replicationLag": "applier_queue_applied", 

                "role": "HA", 

                "status": "ONLINE", 

                "version": "8.0.31"

            }, 

            "127.0.0.1:3310": {

                "address": "127.0.0.1:3310", 

                "memberRole": "SECONDARY", 

                "mode": "R/O", 

                "readReplicas": {}, 

                "replicationLag": "applier_queue_applied", 

                "role": "HA", 

                "status": "ONLINE", 

                "version": "8.0.31"

            }

        }, 

        "topologyMode": "Single-Primary"

    }, 

    "groupInformationSourceMember": "127.0.0.1:3308"

}

Let’s now proceed with the second cluster (“cluster2”) setup

  • Deploying the sandboxes via MySqlShell.
MySQL JS > dba.deploySandboxInstance(3311) 
MySQL JS > dba.deploySandboxInstance(3312) 
MySQL JS > dba.deploySandboxInstance(3313)

  •  Similarly, perform some pre-checks as we did for “cluster1” nodes.
# connecting to the concerned nodes. 

MySQL  JS > shell.connect('root@localhost:3311') 
MySQL  JS > shell.connect('root@localhost:3312')
MySQL  JS > shell.connect('root@localhost:3313')

# The below commands check whether each instance satisfies the InnoDB cluster requirements (group replication settings) and fix any missing requirements automatically. Here, we have configured a new user "iroot" for cluster deployment. 

MySQL  localhost:3308 ssl  JS > dba.checkInstanceConfiguration('root@localhost:3311')
MySQL  localhost:3308 ssl  JS > dba.configureInstance('root@127.0.0.1:3311',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})

MySQL  localhost:3308 ssl  JS > dba.checkInstanceConfiguration('root@localhost:3312')
MySQL  localhost:3308 ssl  JS > dba.configureInstance('root@127.0.0.1:3312',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})

MySQL  localhost:3308 ssl  JS > dba.checkInstanceConfiguration('root@localhost:3313')
MySQL  localhost:3308 ssl  JS > dba.configureInstance('root@127.0.0.1:3313',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})

  •  Next, we will create the ClusterSet topology by triggering the sync on node 127.0.0.1:3311 from the existing cluster1 nodes. Node 127.0.0.1:3311 will be the primary node for cluster2, and the remaining nodes will join it via the clone/incremental recovery process.
1) First, connect to “cluster1” node.

MySQL localhost:3308 ssl JS > \c iroot@127.0.0.1:3308
MySQL 127.0.0.1:3308 ssl JS > cluster1 = dba.getCluster()

2) Here, “cluster1” joins the ClusterSet topology:

MySQL 127.0.0.1:3308 ssl JS > myclusterset = cluster1.createClusterSet('firstclusterset')

Output:

ClusterSet successfully created. Use ClusterSet.createReplicaCluster() to add Replica Clusters to it.
<ClusterSet:firstclusterset>

3) Verifying the status.

MySQL 127.0.0.1:3308 ssl JS > myclusterset.status({extended: 1})

Output:

{

    "clusters": {

        "Cluster1": {

            "clusterRole": "PRIMARY", 

            "globalStatus": "OK", 

            "primary": "127.0.0.1:3308", 

            "status": "OK", 

            "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 

            "topology": {

                "127.0.0.1:3308": {

                    "address": "127.0.0.1:3308", 

                    "memberRole": "PRIMARY", 

                    "mode": "R/W", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3309": {

                    "address": "127.0.0.1:3309", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3310": {

                    "address": "127.0.0.1:3310", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }

            }, 

            "transactionSet": "39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-85,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5"

        }

    }, 

    "domainName": "firstclusterset", 

    "globalPrimaryInstance": "127.0.0.1:3308", 

    "metadataServer": "127.0.0.1:3308", 

    "primaryCluster": "Cluster1", 

    "status": "HEALTHY", 

    "statusText": "All Clusters available."

}

 4) Now, node 127.0.0.1:3311 will sync with the existing “cluster1” via the async replication channel.

MySQL  127.0.0.1:3308 ssl  JS > \c iroot@127.0.0.1:3311
MySQL  127.0.0.1:3311 ssl  JS > cluster2 = myclusterset.createReplicaCluster("127.0.0.1:3311", "cluster2", {recoveryProgress: 1, timeout: 10})

Output:

... Replica Cluster 'cluster2' successfully created on ClusterSet 'firstclusterset'. ...

5) Next, the other nodes join the “cluster2” with the clone process.

MySQL  127.0.0.1:3311 ssl  JS > cluster2.addInstance("iroot@127.0.0.1:3312",{password:'Iroot@1234'})
MySQL  127.0.0.1:3311 ssl  JS > cluster2.addInstance("iroot@127.0.0.1:3313",{password:'Iroot@1234'})

6) Finally, checking the status of our clusterset environment.

MySQL  127.0.0.1:3311 ssl  JS > myclusterset = dba.getClusterSet()
MySQL  127.0.0.1:3311 ssl  JS > myclusterset.status({extended: 1})

Output:

{

    "clusters": {

        "Cluster1": {

            "clusterRole": "PRIMARY", 

            "globalStatus": "OK", 

            "primary": "127.0.0.1:3308", 

            "status": "OK", 

            "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 

            "topology": {

                "127.0.0.1:3308": {

                    "address": "127.0.0.1:3308", 

                    "memberRole": "PRIMARY", 

                    "mode": "R/W", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3309": {

                    "address": "127.0.0.1:3309", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3310": {

                    "address": "127.0.0.1:3310", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }

            }, 

            "transactionSet": "39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-124,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5"

        }, 

        "cluster2": {

            "clusterRole": "REPLICA", 

            "clusterSetReplication": {

                "applierStatus": "APPLIED_ALL", 

                "applierThreadState": "Waiting for an event from Coordinator", 

                "applierWorkerThreads": 4, 

                "receiver": "127.0.0.1:3311", 

                "receiverStatus": "ON", 

                "receiverThreadState": "Waiting for source to send event", 

                "source": "127.0.0.1:3308"

            }, 

            "clusterSetReplicationStatus": "OK", 

            "globalStatus": "OK", 

            "status": "OK", 

            "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 

            "topology": {

                "127.0.0.1:3311": {

                    "address": "127.0.0.1:3311", 

                    "memberRole": "PRIMARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3312": {

                    "address": "127.0.0.1:3312", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3313": {

                    "address": "127.0.0.1:3313", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }

            }, 

            "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-124,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5", 

            "transactionSetConsistencyStatus": "OK", 

            "transactionSetErrantGtidSet": "", 

            "transactionSetMissingGtidSet": ""

        }

    }, 

    "domainName": "firstclusterset", 

    "globalPrimaryInstance": "127.0.0.1:3308", 

    "metadataServer": "127.0.0.1:3308", 

    "primaryCluster": "Cluster1", 

    "status": "HEALTHY", 

    "statusText": "All Clusters available."

}

Here, the ClusterSet topology is now ready with all six nodes.

In the next phase, we will bootstrap MySQLRouter with our newly created ClusterSet environment:

  • First, we will generate a dedicated user for MySQLRouter monitoring/management.
MySQL  127.0.0.1:3311 ssl  JS > \c iroot@localhost:3308
MySQL  localhost:3308 ssl  JS > cluster1 = dba.getCluster();
MySQL  localhost:3308 ssl  JS > cluster1.setupRouterAccount('router_usr')

Output:

Missing the password for new account router_usr@%. Please provide one.
Password for new account: **********
Confirm password: **********
Creating user router_usr@%.
Account router_usr@% was successfully created.

  • Bootstrap the router with the user ("router_usr") and router name ("Router1").
[vagrant@localhost ~]$ sudo mysqlrouter --bootstrap iroot@127.0.0.1:3308 --account=router_usr --name='Router1' --user root --force

We are using --force here because, without --force, mysqlrouter won’t recognize the ClusterSet. This will reconfigure the existing setup.

Here, we will see some useful information that is later required to connect to the database or manage the services.

# MySQL Router 'Router1' configured for the ClusterSet 'firstclusterset'

After this MySQL Router has been started with the generated configuration

    $ /etc/init.d/mysqlrouter restart

or

    $ systemctl start mysqlrouter

or

    $ mysqlrouter -c /etc/mysqlrouter/mysqlrouter.conf

ClusterSet 'firstclusterset' can be reached by connecting to:

## MySQL Classic protocol

- Read/Write Connections: localhost:6446

- Read/Only Connections:  localhost:6447

## MySQL X protocol

- Read/Write Connections: localhost:6448

- Read/Only Connections:  localhost:6449

  • Finally, start the mysqlrouter service:
sudo mysqlrouter -c /etc/mysqlrouter/mysqlrouter.conf &

Validating the connection route

  •  Connect to the router port “6446” and create some demo table/data:
shell> mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6446 -e "create database sbtest;use sbtest;create table sbtest1 (id int(10) not null auto_increment primary key, user varchar(50));insert into sbtest1(user) values('test');"

  •  Connect to the router port “6447” for reading purposes. Here, the connections will, by default, be balanced among the nodes of the primary cluster (cluster1).
[vagrant@localhost ~]$ mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6447 -e "use sbtest;select * from sbtest1;select @@server_id;"
+----+------+
| id | user |
+----+------+
|  1 | test |
+----+------+
+-------------+
| @@server_id |
+-------------+
|   194452202 |
+-------------+

[vagrant@localhost ~]$ mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6447 -e "use sbtest;select * from sbtest1;select @@server_id;"
+----+------+
| id | user |
+----+------+
|  1 | test |
+----+------+
+-------------+
| @@server_id |
+-------------+
|  2376678236 |
+-------------+

[vagrant@localhost ~]$ mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6447 -e "use sbtest;select * from sbtest1;select @@server_id;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+------+
| id | user |
+----+------+
|  1 | test |
+----+------+
+-------------+
| @@server_id |
+-------------+
|   194452202 |
+-------------+

So, by default, all the connections will route to the default “Primary” cluster, which, in our case, is “Cluster1”; however, we can change the primary component based on the requirement.

Changing ClusterSet topology

MySQL  localhost:3308 ssl  JS > myclusterset=dba.getClusterSet()
MySQL  localhost:3308 ssl  JS > myclusterset.status({extended:1})

Output:

<ClusterSet:firstclusterset>

{

    "clusters": {

        "Cluster1": {

            "clusterRole": "PRIMARY", 

            "globalStatus": "OK", 

            "primary": "127.0.0.1:3308", 

            "status": "OK", 

            "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 

            "topology": {

                "127.0.0.1:3308": {

                    "address": "127.0.0.1:3308", 

                    "memberRole": "PRIMARY", 

                    "mode": "R/W", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3309": {

                    "address": "127.0.0.1:3309", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3310": {

                    "address": "127.0.0.1:3310", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }

            }, 

            "transactionSet": "39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-143,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5"

        }, 

        "cluster2": {

            "clusterRole": "REPLICA", 

            "clusterSetReplication": {

                "applierStatus": "APPLIED_ALL", 

                "applierThreadState": "Waiting for an event from Coordinator", 

                "applierWorkerThreads": 4, 

                "receiver": "127.0.0.1:3311", 

                "receiverStatus": "ON", 

                "receiverThreadState": "Waiting for source to send event", 

                "source": "127.0.0.1:3308"

            }, 

            "clusterSetReplicationStatus": "OK", 

            "globalStatus": "OK", 

            "status": "OK", 

            "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 

            "topology": {

                "127.0.0.1:3311": {

                    "address": "127.0.0.1:3311", 

                    "memberRole": "PRIMARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3312": {

                    "address": "127.0.0.1:3312", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3313": {

                    "address": "127.0.0.1:3313", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }

            }, 

            "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-143,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5", 

            "transactionSetConsistencyStatus": "OK", 

            "transactionSetErrantGtidSet": "", 

            "transactionSetMissingGtidSet": ""

        }

    }, 

    "domainName": "firstclusterset", 

    "globalPrimaryInstance": "127.0.0.1:3308", 

    "metadataServer": "127.0.0.1:3308", 

    "primaryCluster": "Cluster1", 

    "status": "HEALTHY", 

    "statusText": "All Clusters available."

}

  • Changing the Primary cluster from “cluster1” to “cluster2:
MySQL localhost:3308 ssl JS > myclusterset.setPrimaryCluster('cluster2')

Output:

Switching the primary cluster of the clusterset to 'cluster2'

* Verifying clusterset status

** Checking cluster cluster2

  Cluster 'cluster2' is available

** Checking cluster Cluster1

  Cluster 'Cluster1' is available

* Reconciling 5 internally generated GTIDs

* Refreshing replication account of demoted cluster

* Synchronizing transaction backlog at 127.0.0.1:3311

** Transactions replicated  ############################################################  100% 

* Updating metadata

* Updating topology

** Changing replication source of 127.0.0.1:3309 to 127.0.0.1:3311

** Changing replication source of 127.0.0.1:3310 to 127.0.0.1:3311

** Changing replication source of 127.0.0.1:3308 to 127.0.0.1:3311

* Acquiring locks in replicaset instances

** Pre-synchronizing SECONDARIES

** Acquiring global lock at PRIMARY

** Acquiring global lock at SECONDARIES

* Synchronizing remaining transactions at promoted primary

** Transactions replicated  ############################################################  100% 

* Updating replica clusters

Cluster 'cluster2' was promoted to PRIMARY of the clusterset. The PRIMARY instance is '127.0.0.1:3311'

  • If we see the output again, we can observe that  “clusterRole:PRIMARY” is shifted to “cluster2”.
MySQL  localhost:3308 ssl  JS > myclusterset.status({extended:1})

Output:

{

    "clusters": {

        "Cluster1": {

            "clusterRole": "REPLICA", 

            "clusterSetReplication": {

                "applierStatus": "APPLIED_ALL", 

                "applierThreadState": "Waiting for an event from Coordinator", 

                "applierWorkerThreads": 4, 

                "receiver": "127.0.0.1:3308", 

                "receiverStatus": "ON", 

                "receiverThreadState": "Waiting for source to send event", 

                "source": "127.0.0.1:3311"

            }, 

            "clusterSetReplicationStatus": "OK", 

            "globalStatus": "OK", 

            "status": "OK", 

            "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 

            "topology": {

                "127.0.0.1:3308": {

                    "address": "127.0.0.1:3308", 

                    "memberRole": "PRIMARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3309": {

                    "address": "127.0.0.1:3309", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3310": {

                    "address": "127.0.0.1:3310", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }

            }, 

            "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-145,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5", 

            "transactionSetConsistencyStatus": "OK", 

            "transactionSetErrantGtidSet": "", 

            "transactionSetMissingGtidSet": ""

        }, 

        "cluster2": {

            "clusterRole": "PRIMARY", 

            "globalStatus": "OK", 

            "primary": "127.0.0.1:3311", 

            "status": "OK", 

            "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 

            "topology": {

                "127.0.0.1:3311": {

                    "address": "127.0.0.1:3311", 

                    "memberRole": "PRIMARY", 

                    "mode": "R/W", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3312": {

                    "address": "127.0.0.1:3312", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }, 

                "127.0.0.1:3313": {

                    "address": "127.0.0.1:3313", 

                    "memberRole": "SECONDARY", 

                    "mode": "R/O", 

                    "replicationLagFromImmediateSource": "", 

                    "replicationLagFromOriginalSource": "", 

                    "status": "ONLINE", 

                    "version": "8.0.31"

                }

            }, 

            "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-145,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5"

        }

    }, 

    "domainName": "firstclusterset", 

    "globalPrimaryInstance": "127.0.0.1:3311", 

    "metadataServer": "127.0.0.1:3311", 

    "primaryCluster": "cluster2", 

    "status": "HEALTHY", 

    "statusText": "All Clusters available."

}

So, we have changed the Primary component from cluster1 to cluster2, but the routing is still set for cluster1. In order to send traffic to cluster2, we also have to change the routing option.

MySQL localhost:3308 ssl JS > myclusterset.listRouters()

Output:

{
    "domainName": "firstclusterset",
    "routers": {
        "localhost.localdomain::Router1": {
            "hostname": "localhost.localdomain",
            "lastCheckIn": "2023-07-22 02:47:42",
            "roPort": "6447",
            "roXPort": "6449",
            "rwPort": "6446",
            "rwXPort": "6448",
            "targetCluster": "primary",
            "version": "8.0.32"
        },

  • Changing the connection target from “cluster1” to “cluster2”:
MySQL localhost:3308 ssl JS > myclusterset.setRoutingOption('localhost.localdomain::Router1', 'target_cluster', 'cluster2')

MySQL localhost:3308 ssl JS > myclusterset.listRouters()

Output:

MySQL localhost:3308 ssl JS > myclusterset.listRouters()
{

    "domainName": "firstclusterset", 

    "routers": {

        "localhost.localdomain::Router1": {

            "hostname": "localhost.localdomain", 

            "lastCheckIn": "2023-07-22 02:47:42", 

            "roPort": "6447", 

            "roXPort": "6449", 

            "rwPort": "6446", 

            "rwXPort": "6448", 

            "targetCluster": "cluster2", 

            "version": "8.0.32"

        }

  • Verifying the routing policy in the existing ClusterSet:

MySQL localhost:3308 ssl JS > myclusterset.routingOptions()

Output:

{

    "domainName": "firstclusterset", 

    "global": {

        "invalidated_cluster_policy": "drop_all", 

        "stats_updates_frequency": 0, 

        "target_cluster": "primary"

    }, 

    "routers": {

        "localhost.localdomain::Router1": {

            "target_cluster": "cluster2"

        }

          }

}

There are situations when the primary cluster is not available or reachable. The immediate solution in such situations would be to perform an emergency failover in order to avoid an application outage.

An emergency failover basically switches to a selected replica cluster from the primary InnoDB Cluster for the InnoDB ClusterSet deployment. During an emergency failover process, data consistency is not assured due to async replication and other network factors, so for safety, the original primary cluster is marked as invalidated during the failover process.

So if by any chance the original primary cluster remains online, it should be shut down. Later, the invalidated primary cluster can join the clusterset via rejoin/repair process.

Perform emergency failover

myclusterset.forcePrimaryCluster("cluster2")
myclusterset.setRoutingOption('localhost.localdomain::Router1', 'target_cluster', 'cluster2')
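
Once the invalidated original primary cluster is healthy and reachable again, it can be brought back into the topology. The following is only a minimal sketch, assuming the ClusterSet AdminAPI rejoinCluster() method and our cluster name "Cluster1"; if the old cluster has diverged (errant transactions), it may instead need to be removed and re-provisioned.

// Sketch: rejoin the previously invalidated cluster after it comes back online
myclusterset = dba.getClusterSet()
myclusterset.rejoinCluster('Cluster1')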

Summary

With the help of the ClusterSet implementation, deploying DR support across different regions is no longer a complex challenge. MySQL Shell and InnoDB Cluster handle all the configuration and syncing processes behind the scenes. Disaster recovery and failover time can be minimized with the help of the Admin API/MySQL Shell commands.

There is one caveat with ClusterSet functionality: it does not support high availability/auto-promotion of a new primary cluster if the existing one goes down. We must take care of that with manual intervention or via some automated process of our own.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!

Jan
23
2023
--

Working of MySQL Replication Filters When Using Statement-based and Row-based Replication

MySQL Replication Filters

A couple of days ago, I was creating an index on the source, and when I checked the replica side, it was not replicated. So I wanted to explain how replication filters may increase the complexity of your DBA operations.

Replication occurs by reading events from the binary log of the source and then executing them on the replica. The events in the binary log are recorded in different formats, depending on the type of event. These formats are determined by the binary logging format used when the events were initially recorded on the source. The relationship between the binary logging formats and the terminology used during replication is as follows:

When using statement-based binary logging, the source writes SQL statements to the binary log. Replication of the source to the replica is performed by executing these SQL statements on the replica. This is known as statement-based replication (SBR), and it corresponds to the statement-based binary logging format in MySQL.

When using row-based binary logging, the source writes events to the binary log that shows how individual rows in tables are changed. Replication of the source to the replica is done by copying these events, representing the changes in the table rows, to the replica. This is known as row-based replication (RBR) and it corresponds to the row-based binary logging format in MySQL.

Row-based logging is the default method.

You can also configure MySQL to use a mix of both statement-based and row-based logging, depending on which is most appropriate for the change to be logged. This is called mixed-format binary logging.

MySQL supports two types of replication filters: those that apply at the database level and those that apply at the table level:

There are filters at the database level, known as binlog-do-db and binlog-ignore-db, that control what is included in the binary log. However, it’s important to note that if events are filtered out of the binary log, they can no longer be used for point-in-time recovery that involves those tables. It’s therefore recommended to use filters that apply to the replica rather than the binary log.

When using ROW-based replication, the filter applies to the specific table the change is made on. However, when using STATEMENT-based replication, the database-level filters apply to the default database. And when using table-level filters with STATEMENT-based replication, the filter applies to the table change.

Mixing database and table-level filters creates more complex rules. First, the database-level filters are checked, and if the update qualifies, the table-level filters are also checked. This can result in different outcomes for STATEMENT-based and ROW-based replication when using non-default databases. Additionally, using the MIXED format for binary logging can also cause issues, as a small change in the query can change whether the statement is logged in the STATEMENT-based or ROW-based format. For this reason, it’s safer to use table-level replication filters rather than database-level filters.

See Determination of Safe and Unsafe Statements in Binary Logging for details of how it is decided whether a statement is logged using the STATEMENT-based or ROW-based format when using MIXED mode replication.
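
As a small, hedged illustration (the pattern and schema name match the examples below; adjust them to your own tables), a table-level filter can be configured on the replica either statically in my.cnf or dynamically with CHANGE REPLICATION FILTER, which requires the replication SQL thread to be stopped:

# my.cnf on the replica (requires a restart)
replicate-wild-do-table = databasefilter.t1%

-- Or dynamically from a SQL client:
STOP REPLICA SQL_THREAD;
CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ('databasefilter.t1%');
START REPLICA SQL_THREAD;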

Examples

All the following examples use the following schema:

mysql> CREATE DATABASE databasefilter; 
Query OK, 1 row affected (0.02 sec) 
mysql> CREATE DATABASE databasewithoutfilter; 
Query OK, 1 row affected (0.00 sec) 
mysql> use databasefilter; 
Database changed 
mysql> CREATE TABLE t1 (uid int unsigned NOT NULL PRIMARY KEY) ENGINE=InnoDB; 
Query OK, 0 rows affected (0.06 sec) 
mysql> CREATE TABLE t2 (uid int unsigned NOT NULL PRIMARY KEY) ENGINE=InnoDB; 
Query OK, 0 rows affected (0.04 sec)

 

Example one

Replication filter:

replicate-wild-do-table = databasefilter.t1\_%

Statements:

use databasefilter; 
INSERT INTO t1 VALUES (1);

Replicates?

Statement-based replication: Yes

Row-based replication: Yes

The replication filter is set to “replicate-wild-do-table = databasefilter.t1_%”, meaning that any table in the “databasefilter” database with a name starting with “t1_” will be replicated. If the statement “use databasefilter; INSERT INTO t1 VALUES (1);” is executed, it will be replicated using both statement-based and row-based replication methods.

Example two

Replication filter:

replicate-wild-do-table = databasefilter.t2\_%

Statements:

use databasefilter;
INSERT INTO t1 VALUES (1);

Replicates?

Statement-based replication: No

Row-based replication: No

The replication filter “replicate-wild-do-table = databasefilter.t2_%” is set, which means that only tables that match the pattern “databasefilter.t2_%” will be replicated. When the statement “use databasefilter;” is executed, the current database will be set to “databasefilter”.

However, when the statement “INSERT INTO t1 VALUES (1);” is executed, it will not replicate in either statement-based or row-based replication because the table “t1” does not match the pattern specified in the replication filter.

Example three

Replication filter:

replicate-do-db = databasefilter

Statements:

use databasefilter;
INSERT INTO t1 VALUES (1);

Replicates?

Statement-based replication: Yes

Row-based replication: Yes

The replication filter “replicate-do-db = databasefilter” is set, which means that only statements executed in the “databasefilter” database will be replicated. When the statement “use databasefilter;” is executed, the current database will be set to “databasefilter”.

After that, when the statement “INSERT INTO t1 VALUES (1);” is executed, it will replicate in both statement-based and row-based replication because it is executed in the “databasefilter” database that matches the pattern specified in the replication filter.

Example four

Replication filter:

replicate-do-db = databasefilter

Statements:

use databasewithoutfilter;
INSERT INTO databasefilter.t1 VALUES (1);

Replicates?

Statement-based replication: No

Row-based replication: Yes

The replication filter “replicate-do-db = databasefilter” is set, which means that only statements executed in the “databasefilter” database will be replicated. When the statement “use databasewithoutfilter;” is executed, the current database will be set to “databasewithoutfilter”, which does not match the pattern specified in the replication filter. However, when the statement “INSERT INTO databasefilter.t1 VALUES (1);” is executed, it will replicate in row-based replication but not in statement-based replication.

The reason for this is that statement-based replication replicates the entire statement, including the “use databasewithoutfilter” statement, which does not match the pattern specified in the replication filter. But in row-based replication, it only replicates the actual data change and it does not care about the current database.

Conclusion

MySQL replication filters can be used to control which events are replicated from the source to the replica. These filters can be applied at the database level or the table level and can increase the complexity of DBA operations. It is important to note that if events are filtered out of the binary log, they can no longer be used for point-in-time recovery that involves those tables. When using ROW-based replication, the filter applies to the specific table the change is made on. However, when using STATEMENT-based replication, the database-level filters apply to the default database. Mixing database and table-level filters creates more complex rules and can cause issues when using the MIXED format for binary logging.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure, open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Download Percona Distribution for MySQL Today

Dec
06
2022
--

Importance of Delay Replica in Data Recovery: Recover Deleted Records

Delay Replica in Data Recovery

What is a Delay Replica and how does it help?

MySQL Replication is useful, easy to set up, and used for very different purposes. For example:

  • split reads and writes
  • run data mining or reporting processes on them
  • disaster recovery

To learn more, check out How Does MySQL Replication Work?

It’s important to mention that a replication server is not a backup by itself. A mistake on the source, for example, a DROP DATABASE or an UPDATE without a WHERE clause will be replicated nearly instantly to all replica servers. Simply having a replica is not going to be helpful here. How can we avoid that kind of mistake? By having a replica server that intentionally lags behind.

We can never avoid human error in database infrastructure operations. But rolling back to the last known good state from a delayed source/replica is the best recommendation during database infrastructure corruption scenarios.

Delayed replication can be used for several purposes:

  • To test how the system behaves when there is a lag. For example, in an application, a lag might be caused by a heavy load on the replica. However, it can be difficult to generate this load level. Delayed replication can simulate the lag without having to simulate the load. It can also be used to debug conditions related to a lagging replica.
  • To inspect what the database looked like in the past, without having to reload a backup. For example, by configuring a replica with a delay of one week, if you then need to see what the database looked like before the last few days’ worth of development, the delayed replica can be inspected.
  • In our case, as we have a six-hour replication delay, we can recover the non-problematic state of the table by going back to the six-hour delayed replica if a wrong DML was executed on the source (a minimal configuration sketch follows this list).
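
As a minimal sketch of how such a delay is configured on the replica (six hours = 21600 seconds; MySQL 8.0.23+ syntax shown, with CHANGE MASTER TO … MASTER_DELAY being the older equivalent):

STOP REPLICA;
CHANGE REPLICATION SOURCE TO SOURCE_DELAY = 21600;
START REPLICA;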

Testing the Delay Replica which can help us recover the deleted record

Let’s understand the situation first: Someone deleted the data on a Percona XtraDB Cluster 8 (PXC) server accidentally. As per architecture, we always configure a delayed replica. We have stopped the replication on the delayed replica and now we will restore the deleted records.

From the initial investigation, we learned from the application operations team that the below query was executed.

delete from Schema_g2.per_sch order by id desc;

Almost 20k records were deleted. Let us do our pre-reqs and initial investigation based on the requirements we have.

In this article, PXC-8 is our source and pxc-backup is our delayed backup Replica. Click through to learn about our product Percona XtraDB Cluster and our backup solution Percona XtraBackup.

Percona XtraDB Cluster

  1. All records for Schema_g2.per_sch were deleted from PXC-8 by mistake.
  2. We have stopped the delayed replica pxc-backup to retrieve these records. All deleted records are present here right now. Below are the number of records we have until replication was stopped.
Pxc-backup > select count(*) from Schema_g2.per_sch;
+----------+
| count(*) |
+----------+
|    21762 |
+----------+

At this point, we have already stopped the backup replica. 

  3. After deletion, below are the new records inserted on the PXC-8 source for this table.
PXC-8 (none)> select count(*) from Schema_g2.per_sch;
+----------+
| count(*) |
+----------+
|      215 |
+----------+

  4. The id column is auto_increment, so newly inserted rows get the next values.
id int(11) NOT NULL AUTO_INCREMENT,

PXC-8 (none)> select * from Schema_g2.per_sch;
+--------+---------------+-----------+-------+
| id     | permission_id | school_id | grant |
+--------+---------------+-----------+-------+
| 178852 |           446 |     48887 |     0 |
| 178853 |           448 |     48887 |     0 |

...

  5. Records on the backup server:
Pxc-backup > select * from Schema_g2.per_sch limit 10;
+-----+---------------+-----------+-------+
| id  | permission_id | school_id | grant |
+-----+---------------+-----------+-------+
|   0 |             0 |         0 |     0 |
| 105 |           426 |         1 |     0 |
| 429 |            89 |        16 |     0 |
| 431 |            93 |        16 |     0 |

...

| 178629 |           194 |     35758 |     0 |
| 178630 |           195 |     35758 |     0 |
| 178631 |          8239 |     35758 |     0 |
+--------+---------------+-----------+-------+

  6. A few records between 178631 and 178852 are missing on the backup node, as the replica was stopped in between.
  7. Exact position before the DELETE was executed on the source (the application team provided this; we verified it as an UPDATE query just before the DELETE):
START SLAVE UNTIL MASTER_LOG_FILE = 'PXC-8-bin.001611', MASTER_LOG_POS = 613996753;

From binlog we see below:

#210922 11:44:05 server id 8  end_log_pos 613996753 CRC32 0xee39f244    Query   thread_id=36995659      exec_time=0     error_code=0
SET TIMESTAMP=1632300245/*!*/;
BEGIN
/*!*/;
# at 613996753
#210922 11:44:05 server id 8  end_log_pos 613997049 CRC32 0x92aea136    Table_map: `Schema_g2`.`usr` mapped to number 109
# at 613997049
#210922 11:44:05 server id 8  end_log_pos 613997697 CRC32 0x1132b4ad    Update_rows: table id 109 flags: STMT_END_F
### UPDATE `Schema_g2`.`usr`

.......

#210922 11:44:04 server id 8  end_log_pos 613997872 CRC32 0xf41297bc    Query   thread_id=37823889      exec_time=0     error_code=0
SET TIMESTAMP=1632300244/*!*/;
BEGIN
/*!*/;
# at 613997872
#210922 11:44:04 server id 8  end_log_pos 613997943 CRC32 0xe341c435    Table_map: `Schema_g2`.`per_sch` mapped to number 664
# at 613997943
#210922 11:44:04 server id 8  end_log_pos 614006154 CRC32 0xd71595b1    Delete_rows: table id 664

 

Steps to restore

  1. Start replication on the backup server until the position just before the DELETE statement.
START SLAVE UNTIL MASTER_LOG_FILE = 'PXC-8-bin.001611', MASTER_LOG_POS = 613996753;

 Verify the binlog position, confirm it has caught up to the position above, and verify the new record count on the backup server.

 Verify replication is stopped again.

select count(*) from Schema_g2.per_sch;

Verify the last id is < 178852.

 Use the --where clause if we notice any duplicate rows (the same as on PXC-8) on the backup server:

--where="id < 178852"

  2. Take a backup from the backup server with the below options to avoid dropping the newly added rows.
mysqldump -h backup-server-ip --single-transaction --skip-add-drop-table --no-create-info Schema_g2 per_sch > per_sch_backup.sql

Verify that no DROP/CREATE statements are present in the backup file.

  3. Restore the backup on source PXC-8:
mysql -h Source-server-ip Schema_g2 < per_sch_backup.sql

Verify it completes ok and records are added back on source:

mysql -h Source-server-ip -e "select count(*) from Schema_g2.per_sch"

Verify records < 178852 and above also exist on the source.

  4. Start replication on the backup server normally.
stop slave;

start slave;

Once the records are loaded on the source, they will be replicated to the delayed replica and the other replicas, and the data will be in sync again.

Conclusion

It is always recommended to have a delayed replica in your architecture so that you can avoid and quickly resolve such data losses.

Jul
29
2022
--

How to Benchmark Replication Performance in MySQL

Benchmark Replication Performance in MySQL

In this blog, I will cover important aspects that you need to test when benchmarking a replication setup. MySQL has great tools that can be used to test its performance. They include:

sysbench – https://github.com/akopytov/sysbench

BMK-kit – http://dimitrik.free.fr/blog/posts/mysql-perf-bmk-kit.html

mysqlslap – https://dev.mysql.com/doc/refman/8.0/en/mysqlslap.html

LinkBench – https://github.com/facebookarchive/linkbench

I will not describe how to use them here, as you can find instructions on the provided links or in the Percona blog by browsing tags sysbench, benchmark, benchmarks, and the category benchmarks.

However, most of the instructions take into account standalone MySQL servers. If you want to test how your replication setup performs, you may want to perform additional checks.

Can the replica catch up to the source server?

To answer this question, you need to run a load on the source machine and monitor the replica delay. You can do it by examining the SHOW REPLICA STATUS (SHOW SLAVE STATUS) output, particularly the value of the Seconds_Behind_Source (Seconds_Behind_Master) parameter. The same parameter is monitored by the “MySQL Replication Lag” graph under the “MySQL Replication Summary” dashboard in Percona Monitoring and Management (PMM).

A more advanced check is to compare the Exec_Source_Log_Pos (Exec_Master_Log_Pos), Relay_Log_File pair with the Read_Source_Log_Pos (Read_Master_Log_Pos), Source_Log_File pair, since Seconds_Behind_Source may be affected by long-running commands and return wrong values.

You may tune your replica server by adding parallelization if you see increasing lag. Also check how the option binlog_transaction_dependency_tracking works: https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_transaction_dependency_tracking
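
A quick way to eyeball both pairs is to read the relevant fields straight from the replica status output; the field names below are the MySQL 8.0.22+ ones, and older versions use the *_Master_* equivalents.

SHOW REPLICA STATUS\G
-- Compare these pairs in the output:
--   Source_Log_File / Read_Source_Log_Pos        (what the I/O thread has received)
--   Relay_Source_Log_File / Exec_Source_Log_Pos  (what the SQL thread has executed)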

Can the replica run queries while applying updates from the source server?

Once you are happy with the replication lag, you may start adding the load you plan to run on this replica and test its performance. For example, if you plan to use the replica for handling read-only queries, you can run a read-only benchmark on it. You need to watch the performance of the read-only benchmark and replication lag.

If you plan to run analytical queries on the replica: test them. If you plan to use this replica for taking backups: take the backup and measure how it will affect the performance of the replication.

Synchronous replication

Synchronous replication solutions such as Galera, Percona XtraDB Cluster (PXC), and Group Replication are affected by the flow control paused time. You can use PMM to monitor it.

For PXC/Galera clusters, watch the “Flow Control Paused Time” graph under the “PXC/Galera Cluster Summary” dashboard, or watch the value of the wsrep_flow_control_paused variable. See Galera Flow Control in Percona XtraDB Cluster for MySQL for more details.

For Group Replication, you need to check the value of the COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE column in the table performance_schema.replication_group_member_stats. That shows how many transactions are waiting in the queue on the secondary node to apply. You may also watch the “Replication Lag” graph under the “MySQL Group Replication” dashboard in PMM.
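
For reference, a minimal sketch of checking both values from a SQL client:

-- PXC/Galera: fraction of time replication was paused by flow control
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';

-- Group Replication: transactions waiting in the applier queue per member
SELECT MEMBER_ID, COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
FROM performance_schema.replication_group_member_stats;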

Your best test is your production

As said by Dimitri Kravtchuk at Fosdem 2020:

While it is hard to create such a benchmark on the server that accepts writes from users, for the replica it is comparatively easy. All you need to do is create a replica of your production server and not make it available to the application yet. First, watch if it can handle the replication lag while idle, then start adding load to it. You may also use the mirroring feature of ProxySQL; note, however, that you need to check whether some of the limitations described in this post still apply.

How fast will the replica catch up?

One more thing you may want to test is how fast the replica will recover after a delay in replicating updates from the source server. To perform this test, stop the replica while running the source server’s load, restart it after a certain amount of time, and watch replication lag. If you are testing a replica that serves read-only load, let the load test continue.

It is better to perform stop, wait, and start in one series of commands, so you will have control over how long the replica was stopped. For example, to stop replica for one hour, use the following commands:

STOP REPLICA;
SELECT SLEEP(3600);
START REPLICA;

Then wait until the replica is running:

SELECT SERVICE_STATE FROM performance_schema.replication_applier_status;

Once this query returns ON, start monitoring the value of Seconds_Behind_Source in the SHOW REPLICA STATUS output until it becomes 0. Once that happens, record the time. This way, you will know how much time is needed to apply one hour of changes.
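
As an illustration only (bash; the host, credentials, and 10-second polling interval are arbitrary placeholders), the catch-up time can be recorded with a small loop such as:

# Record how long the replica needs to catch up after being restarted.
start=$(date +%s)
while true; do
  lag=$(mysql -h replica-host -u monitor -p'******' -e "SHOW REPLICA STATUS\G" | awk '/Seconds_Behind_Source/ {print $2}')
  [ "$lag" = "0" ] && break
  sleep 10
done
echo "Catch-up took $(( $(date +%s) - start )) seconds"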

Conclusion

You can use the same benchmarking tools that are recommended for standalone MySQL servers to perform performance tests against replication setups. You only need to monitor replication lag together with other performance characteristics.
