Replication has been a core feature of MySQL, providing high availability, for decades. However, you may still encounter replication errors that keep you awake at night. One of the most common and challenging to deal with starts with: “Got fatal error 1236 from source when reading data from binary log“. This blog post is […]
Valkey/Redis: Setting Up Replication
In most database systems, like MySQL, PostgreSQL, and MongoDB, replication of some kind is used to create a highly available architecture. Valkey/Redis is no different in this regard. Replication is native functionality in Valkey, supporting multiple replicas and even chains of replication. To clear up any confusion, understand that Valkey replication is a different concept compared […]
How to Convert PostgreSQL Streaming Replication to Logical Replication
While streaming replication is a hallmark feature of PostgreSQL, some workloads demand a more refined and adaptable replication approach. Logical replication offers a distinct advantage over traditional streaming replication by providing a more detailed and selective approach to data replication, enabling organizations to replicate specific tables selectively and even filter out particular types of data.
In this blog post, we will look into the steps of transforming an existing PostgreSQL streaming replication node into a logical replication node. This method facilitates a smooth transition to logical replication without the need to start from scratch, allowing organizations to harness the enhanced capabilities of logical replication effectively.
Prerequisites
To switch from streaming replication to logical replication, confirm that the primary node fulfills the following prerequisites (a minimal configuration sketch follows the list):
- Set the wal_level to logical.
- Adjust max_replication_slots and max_wal_senders as needed.
- Ensure all tables have a replica identity set.
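As a minimal sketch of these prerequisites (the parameter values and the table name are illustrative, and changing wal_level requires a restart of the primary):

ALTER SYSTEM SET wal_level = 'logical';          -- requires a PostgreSQL restart
ALTER SYSTEM SET max_replication_slots = 10;     -- illustrative value; size for your slots
ALTER SYSTEM SET max_wal_senders = 10;           -- illustrative value; size for your senders

-- Tables with a primary key already have a usable replica identity (DEFAULT).
-- For a hypothetical table without one, an explicit identity could be set, e.g.:
ALTER TABLE foo.bar REPLICA IDENTITY FULL;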
Steps to convert streaming replication to logical replication:
The lab setup includes two servers: primary (172.31.46.28) and standby (172.31.39.50), currently configured with streaming replication. The following steps will guide the transition from streaming to logical replication.
1. Verify that the current streaming replication setup is synchronized and that there is no lag between the primary and standby nodes.
percona=# select usename,client_addr,state,replay_lag from pg_stat_replication;
 usename | client_addr  |   state   | replay_lag
---------+--------------+-----------+------------
 percona | 172.31.39.50 | streaming |
(1 row)

percona=# select slot_name,slot_type,active from pg_replication_slots;
       slot_name        | slot_type | active
------------------------+-----------+--------
 physical_replica_slot  | physical  | t
(1 row)
2. Create a logical replication slot on the primary node.
percona=# select pg_create_logical_replication_slot('logical_replica_slot', 'pgoutput'); pg_create_logical_replication_slot ------------------------------------ (logical_replica_slot,0/7000180) (1 row) percona=# select * from pg_replication_slots ; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase -----------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+----------- physical_replica_slot | | physical | | | f | t | 116306 | | | 0/7000180 | | reserved | | f logical_replica_slot | pgoutput | logical | 16384 | percona | f | f | | | 768 | 0/7000148 | 0/7000180 | reserved | | f (2 rows)
3. Create publication for tables on the primary node.
percona=# CREATE PUBLICATION my_pub FOR ALL TABLES; CREATE PUBLICATION
4. Next, promote the intended standby node to a standalone node.
percona=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

postgres=# SELECT pg_promote();
 pg_promote
------------
 t
(1 row)

postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)
5. Review the PostgreSQL log file on the promoted standby server and take note of the Log Sequence Number (LSN) associated with the “redo done at” message. In the below log file, the LSN is 0/7000640.
promoted standby postgresql Logfile:
2023-12-04 13:41:11.527 UTC [37749] LOG:  received promote request
2023-12-04 13:41:11.527 UTC [37750] FATAL:  terminating walreceiver process due to administrator command
2023-12-04 13:41:11.527 UTC [37749] LOG:  invalid record length at 0/7000678: wanted 24, got 0
2023-12-04 13:41:11.527 UTC [37749] LOG:  redo done at 0/7000640 system usage: CPU: user: 0.00 s, system: 0.01 s, elapsed: 1638.50 s
2023-12-04 13:41:11.527 UTC [37749] LOG:  last completed transaction was at log time 2023-12-04 13:40:25.619889+00
2023-12-04 13:41:11.535 UTC [37749] LOG:  selected new timeline ID: 2
2023-12-04 13:41:11.622 UTC [37749] LOG:  archive recovery complete
2023-12-04 13:41:11.633 UTC [37747] LOG:  checkpoint starting: force
2023-12-04 13:41:11.637 UTC [37746] LOG:  database system is ready to accept connections
2023-12-04 13:41:11.639 UTC [37883] WARNING:  archive_mode enabled, yet archiving is not configured
2023-12-04 13:41:11.955 UTC [37747] LOG:  checkpoint complete: wrote 5 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.305 s, sync=0.005 s, total=0.322 s; sync files=5, longest=0.004 s, average=0.001 s; distance=1 kB, estimate=14745 kB
The log entry “redo done at 0/7000640” tells us the location of the most recently applied transaction. We use this Log Sequence Number (LSN) to advance the logical replication slot created in step two, so that only transactions occurring after this point are applied to the newly promoted node.
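If you prefer not to scan the log by eye, the same LSN can be pulled out with a quick grep; the log path below is only an example and should be adjusted to your data directory and logging settings:

$ grep "redo done at" /var/lib/pgsql/data/log/postgresql-*.log
2023-12-04 13:41:11.527 UTC [37749] LOG:  redo done at 0/7000640 ...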
6. On the primary server, advance the logical replication slot created in step two.
percona=# select pg_replication_slot_advance('logical_replica_slot', '0/7000640');
   pg_replication_slot_advance
----------------------------------
 (logical_replica_slot,0/7000640)
7. On the standby server, create the subscription for the publication created in step three.
percona=# CREATE SUBSCRIPTION my_sub CONNECTION 'user=percona password=<redacted> host=<host_ip> port=5432' PUBLICATION my_pub WITH (copy_data=false, slot_name='logical_replica_slot', create_slot=false);
8. At this stage, logical replication will commence, which can be verified by inspecting the replication status on the primary server (a similar check for the subscriber side follows the output below).
percona=# select slot_name,slot_type,active from pg_replication_slots; slot_name | slot_type | active -----------------------+-----------+-------- physical_replica_slot | physical | f logical_replica_slot | logical | t (2 rows) percona=# select * from pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 116806 usesysid | 16434 usename | percona application_name | my_sub client_addr | 172.31.39.50 client_hostname | client_port | 52512 backend_start | 2023-12-04 13:52:46.200631+00 backend_xmin | state | streaming sent_lsn | 0/7002148 write_lsn | 0/7002148 flush_lsn | 0/7002148 replay_lsn | 0/7002148 write_lag | flush_lag | replay_lag | sync_priority | 0 sync_state | async reply_time | 2023-12-04 14:06:19.464122+00
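The subscriber side can be checked as well; a minimal sanity check using the standard catalog views (run on the newly promoted standby) could look like this:

-- Progress of the apply worker for our subscription
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time FROM pg_stat_subscription;

-- Definition of the subscription itself
SELECT subname, subenabled, subpublications FROM pg_subscription;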
9. It’s important to remove the old physical replication slot on the primary; an inactive slot retains WAL and would eventually fill up the primary server’s disk.
postgres=# select pg_drop_replication_slot('physical_replica_slot');
 pg_drop_replication_slot
--------------------------

(1 row)
The steps covered in this process make the transition to logical replication smoother, allowing for selective replication and improved data management across different parts of your database system. This change provides more control over how you manage and share data within PostgreSQL.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.
Resolving Data Drift in a Dual-Primary Topology With Replica in MySQL/MariaDB
Hello friends,
In Managed Services, we have the opportunity to see different technologies and various topologies, which makes the work fascinating and challenging at the same time.
This time, I’m going to tell you about a particular case: a client with a dual-primary topology plus a replica, as detailed below:
PS-primary-1=192.168.0.14 [RW]
 |___ PS-primary-2=192.168.0.59 [RW] (Slave_delay: 0)
 |___ PS-replica-1=192.168.0.99 [R]  (Slave_delay: 0)
[RW] means Read/Write access.
[R] means Read Only access.
We received an alert of this kind on PS-primary-2:
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table foo.persons; Can't find record in 'persons', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000360, end_log_pos 58177
Seconds later, we noticed that someone (not us) resolved the replication issue, possibly by skipping the error in some way (using pt-slave-restart) or simply executing something like:
mysql> STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;
Or by injecting an empty transaction, as sketched below.
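For reference, injecting an empty transaction on a GTID-based replica is the GTID equivalent of skipping one event; the UUID:sequence below is a placeholder for the failing transaction's GTID, which you would take from the replica's error or its Retrieved_Gtid_Set:

STOP SLAVE;
SET GTID_NEXT = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:12345';  -- placeholder GTID
BEGIN; COMMIT;                                                 -- the empty transaction
SET GTID_NEXT = 'AUTOMATIC';
START SLAVE;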
(Disclaimer: We won’t deeply analyze the root cause here, but rather the surprise we found during the analysis.)
Having seen this, the most advisable action is to run pt-table-checksum, and if any differences appear, use pt-table-sync to fix the data drift.
Therefore, we executed the pt-table-checksum in the following way:
$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons
Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T12:40:31      0      1        6          1       1       0   0.193 foo.persons
(Second disclaimer: We have populated the table with a small amount of data to simplify and improve visibility in this case. With larger volumes, the same scenario also applies.)
(Credentials, for security reasons, are stored in the file /home/user/.my.cnf, which is why you don’t see them in any executed command. If you want to know how to create one, a minimal example follows.)
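A minimal version of that file (the values are placeholders; keep it readable only by your user, e.g., chmod 600) could look like this:

[client]
user=percona
password=<your_password_here>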
At a glance, there seem to be differences. Let’s check the other primary instance and the replica as well.
PS-primary-2> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl; +---------------------+-----+---------+------------+--------+ | max(ts) | db | tbl | total_rows | chunks | +---------------------+-----+---------+------------+--------+ | 2023-07-28 08:40:31 | foo | persons | 5 | 1 | +---------------------+-----+---------+------------+--------+ 1 row in set (0.00 sec) PS-replica-1> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl; +---------------------+-----+---------+------------+--------+ | max(ts) | db | tbl | total_rows | chunks | +---------------------+-----+---------+------------+--------+ | 2023-07-28 09:53:10 | foo | persons | 4 | 1 | +---------------------+-----+---------+------------+--------+ 1 row in set (0.00 sec)
The next step is to execute pt-table-sync with the --print option to review what the utility will actually do. The following output appears:
$ pt-table-sync --print h=192.168.0.59 --sync-to-master --database=foo --table=persons --replicate percona.checksums DELETE FROM `foo`.`persons` WHERE `personid`='3' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/; REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/; REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
A DELETE statement has appeared. How is that possible?
As this is a dual-primary topology, it is advisable to perform a checksum on the other primary server as well, which is PS-primary-2. This will help identify any discrepancies and ensure data consistency between both primary servers.
$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons
Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T13:08:12      0      1        5          1       1       0   0.147 foo.persons
That’s when we notice that one server has six rows while another has only four. How is that possible? Let’s review:
PS-primary-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl; +---------------------+-----+---------+------------+--------+ | max(ts) | db | tbl | total_rows | chunks | +---------------------+-----+---------+------------+--------+ | 2023-07-28 10:03:00 | foo | persons | 6 | 1 | +---------------------+-----+---------+------------+--------+ PS-replica-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl; +---------------------+-----+---------+------------+--------+ | max(ts) | db | tbl | total_rows | chunks | +---------------------+-----+---------+------------+--------+ | 2023-07-28 10:03:00 | foo | persons | 6 | 1 | +---------------------+-----+---------+------------+--------+
Let’s see what data the table currently contains:
$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done 192.168.0.14 +----------+-----------+-----------+---------+------------+---------------------+ | PersonID | LastName | FirstName | Address | City | Insert_timestamp | +----------+-----------+-----------+---------+------------+---------------------+ | 1 | Iglesias | Par | IND | Jaipur | 2023-07-27 07:10:10 | | 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 | | 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 | | 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 | | 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 | | 7 | Ewd | Woo | CAN | Vancouver | 2023-07-28 07:10:10 | +----------+-----------+-----------+---------+------------+---------------------+ 192.168.0.59 +----------+-----------+-----------+---------+------------+---------------------+ | PersonID | LastName | FirstName | Address | City | Insert_timestamp | +----------+-----------+-----------+---------+------------+---------------------+ | 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 | | 3 | Colonel | Vil | PER | Lima | 2023-07-27 07:10:10 | | 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 | | 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 | | 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 | +----------+-----------+-----------+---------+------------+---------------------+ 192.168.0.99 +----------+-----------+-----------+---------+------------+---------------------+ | PersonID | LastName | FirstName | Address | City | Insert_timestamp | +----------+-----------+-----------+---------+------------+---------------------+ | 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 | | 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 | | 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 | | 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 | +----------+-----------+-----------+---------+------------+---------------------+
As you can see, there is data drift everywhere.
How can this be possible? Data drift and replica data integrity are complex topics. Let’s outline some common (though by no means exhaustive) ways this can happen:
- A statement was executed on a primary with SET SESSION sql_log_bin = OFF
- A statement was executed directly on the replica
- This can happen if the replica was not in super_read_only and a user with SUPER privileges executed a write
- This can happen if the replica was not in read_only
- A statement was executed on a replica, and the replica was later promoted to a primary without GTID in place
- A primary server is not configured for full ACID compliance, and it crashed
- At some point, the primary was not configured for row-based replication (even briefly)
These are the most common ways that data drift occurs, but certainly not the only ways that it can happen. More exotic cases can involve bugs, engine differences, version differences, etc.
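A quick way to audit how exposed each node is to several of the causes above is to check the relevant settings on every server. This is only a sketch; the variable names below apply to MySQL/Percona Server, and MariaDB differs slightly (for example, it has no super_read_only):

SELECT @@global.read_only       AS read_only,
       @@global.super_read_only AS super_read_only,
       @@session.sql_log_bin    AS sql_log_bin,
       @@global.binlog_format   AS binlog_format,
       @@global.gtid_mode       AS gtid_mode;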
Coming back to the issue, if we run pt-table-sync targeting the second primary instance, we will see the following result:
$ pt-table-sync --print h=192.168.0.14 --sync-to-master --database=foo --table=persons --replicate percona.checksums DELETE FROM `foo`.`persons` WHERE `personid`='1' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/; DELETE FROM `foo`.`persons` WHERE `personid`='7' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/; REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
Here, the output is less promising: it suggests deleting the rows with ID 1 and 7, and performing a REPLACE on ID 3.
Therefore, note that it asks us to delete rows, which is not what we want. So, what to do? After a discussion with the customer (we shared with them everything we had gathered), we agreed on the following:
- We need to add the row with ID=3 to the instances where it’s missing.
- We need to add the row with ID=1 to the instances where it’s missing.
- We need to add the row with ID=7 to the instances where it’s missing.
- We DO NOT need to delete any rows; no DELETES are necessary.
Important note: The client informed us that the data that exists on only one instance and not the others is necessary. The client also verified that the actions we proposed were correct. Remember: as DBAs, we are the guardians of the data and its integrity, but not necessarily its owners, and for ethical reasons we may not even know its content.
How do we achieve this, then?
Unfortunately, in this case, using pt-table-sync is not recommended since, as we have seen with the --print option, it would perform DELETES in both cases, regardless of which primary instance we consider as the source of truth.
We also cannot simply execute the REPLACE statements on PS-primary-1, because the REPLACE command follows a specific logic, described in the MySQL documentation.
MySQL uses the following algorithm for REPLACE (and LOAD DATA … REPLACE), illustrated by the short sketch after the list:
- Try to insert the new row into the table
- While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
- Delete from the table the conflicting row that has the duplicate key value
- Try again to insert the new row into the table
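As a tiny, self-contained illustration of that delete-then-insert behavior (using a hypothetical scratch table, not the customer's data), note the "2 rows affected" when the key already exists:

CREATE TABLE test.replace_demo (id INT PRIMARY KEY, val VARCHAR(10));
INSERT  INTO test.replace_demo VALUES (1, 'old');   -- 1 row affected: plain insert
REPLACE INTO test.replace_demo VALUES (1, 'new');   -- 2 rows affected: delete + insert
REPLACE INTO test.replace_demo VALUES (2, 'only');  -- 1 row affected: behaves like an insert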
That means that if we execute a REPLACE on PS-primary-1 for a row (for example, ID 3) that does not exist there but does exist on PS-primary-2, the statement succeeds locally but replicates as a write that hits a duplicate key error on PS-primary-2, which I’ll show you below:
PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 09:20:13'); Query OK, 1 row affected (0.01 sec)
We connected to PS-primary-2, and indeed, we found the duplicate key error:
PS-primary-2 (none)> PAGER grep "Last_Err"; SHOW SLAVE STATUS\G
PAGER set to 'grep "Last_Err"'
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table foo.persons; Duplicate entry '3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000359, end_log_pos 26840
1 row in set (0.01 sec)
The commands we should execute would be the following:
set sql_log_bin=0;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
However, as we just observed, running these REPLACE statements with binary logging enabled would break replication again with a duplicate key error on the other primary. So, how do we resolve this? The safest option, in my humble opinion, is to execute the three REPLACE statements on each of the instances, first disabling binary logging for the session with this command:
PS-primary-1 > set sql_log_bin=0; Query OK, 0 rows affected (0.00 sec) PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10'); Query OK, 1 row affected (0.01 sec) PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10'); Query OK, 1 row affected (0.01 sec) PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10'); Query OK, 1 row affected (0.01 sec) PS-primary-1 > Bye PS-primary-2 > set sql_log_bin=0; Query OK, 0 rows affected (0.00 sec) PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10'); Query OK, 1 row affected (0.00 sec) PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10'); Query OK, 1 row affected (0.01 sec) PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10'); Query OK, 1 row affected (0.01 sec) PS-primary-2 > Bye PS-replica-1 > set sql_log_bin=0; Query OK, 0 rows affected (0.00 sec) PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10'); Query OK, 1 row affected (0.01 sec) PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10'); Query OK, 1 row affected (0.01 sec) PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10'); Query OK, 1 row affected (0.01 sec) PS-replica-1 > set global super_read_only=1; set global read_only=1; Query OK, 0 rows affected (0.00 sec)
Finally, we verify that all instances have the same data, either with the SELECT statement or the pt-table-checksum command.
$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done 192.168.0.14 +----------+-----------+-----------+---------+------------+---------------------+ | PersonID | LastName | FirstName | Address | City | Insert_timestamp | +----------+-----------+-----------+---------+------------+---------------------+ | 1 | Iglesias | Par | IND | Jaipur | 2023-07-27 07:10:10 | | 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 | | 3 | Colonel | Vil | PER | Lima | 2023-07-27 07:10:10 | | 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 | | 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 | | 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 | | 7 | Ewd | Woo | CAN | Vancouver | 2023-07-28 07:10:10 | +----------+-----------+-----------+---------+------------+---------------------+ 192.168.0.59 +----------+-----------+-----------+---------+------------+---------------------+ | PersonID | LastName | FirstName | Address | City | Insert_timestamp | +----------+-----------+-----------+---------+------------+---------------------+ | 1 | Iglesias | Par | IND | Jaipur | 2023-07-27 07:10:10 | | 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 | | 3 | Colonel | Vil | PER | Lima | 2023-07-27 07:10:10 | | 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 | | 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 | | 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 | | 7 | Ewd | Woo | CAN | Vancouver | 2023-07-28 07:10:10 | +----------+-----------+-----------+---------+------------+---------------------+ 192.168.0.99 +----------+-----------+-----------+---------+------------+---------------------+ | PersonID | LastName | FirstName | Address | City | Insert_timestamp | +----------+-----------+-----------+---------+------------+---------------------+ | 1 | Iglesias | Par | IND | Jaipur | 2023-07-27 07:10:10 | | 2 | Rooster | War | IND | Tamil Nadu | 2023-07-27 07:10:10 | | 3 | Colonel | Vil | PER | Lima | 2023-07-27 07:10:10 | | 4 | SoS | Syl | CAN | Ontario | 2023-07-27 07:10:10 | | 5 | Al-lot | Pat | IND | Delhi | 2023-07-27 07:10:10 | | 6 | The Crazy | Vai | IND | Vadodara | 2023-07-27 07:10:10 | | 7 | Ewd | Woo | CAN | Vancouver | 2023-07-28 07:10:10 | +----------+-----------+-----------+---------+------------+---------------------+ $ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons Checking if all tables can be checksummed ... Starting checksum ... TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE 07-28T14:12:32 0 0 7 0 1 0 0.185 foo.persons $ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons Checking if all tables can be checksummed ... Starting checksum ... TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE 07-28T14:12:50 0 0 7 0 1 0 0.148 foo.persons
Conclusion
Maintaining this type of topology (dual primaries) can be a headache, as it is prone to exactly these kinds of problems, and resolving them is not always as simple as it may seem. The tools pt-table-checksum and pt-table-sync are essential for resolving these issues in most cases and for efficient verification, as in this case.
Remember that if you have any questions or concerns, you can always contact us.
Lastly, always run pt-table-sync with the --print option first, and verify with the client or data owner that the data to be corrected and the proposed corrections are appropriate. Always.
Let’s stay synchronized, and I’ll see you in the next blog!
How to Read Simplified SHOW REPLICA STATUS Output
As a MySQL database administrator, you’re likely familiar with the SHOW REPLICA STATUS command. It is an important command for monitoring the replication status on your MySQL replicas. However, its output can be overwhelming for beginners, especially regarding the binary log coordinates. I have seen confusion amongst new DBAs about which binary log file and position represent what in replication.
In this guide, we’ll simplify the SHOW REPLICA STATUS output, focusing on the critical binary log coordinates essential for troubleshooting and managing replication.
The key binlog coordinates
Before we delve into the output, let’s understand the key binlog coordinates we’ll be working with (an illustrative excerpt follows the list):
- Master_Log_File: This is the name of the primary binary log file that the I/O thread is currently reading from.
- Read_Master_Log_Pos: It represents the position up to which the I/O thread has read in the current primary binary log file.
- Relay_Log_File: This is the name of the relay log file that the SQL thread is currently processing.
- Relay_Log_Pos: It shows the position up to which the SQL thread has finished processing in the current relay log file.
- Relay_Master_Log_File: This is the name of the primary binary log file that contains the most recent event executed by the SQL thread.
- Exec_Master_Log_Pos: It indicates the position up to which the SQL thread has processed in the current master binary log file. It can be used to start a new replica from a current replica with the CHANGE MASTER TO … MASTER_LOG_POS option.
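For orientation, here is a trimmed, purely illustrative excerpt showing only the fields discussed above (the file names and positions are made up; on MySQL 8.0.22+, SHOW REPLICA STATUS prints the Source_* equivalents of these field names):

mysql> SHOW REPLICA STATUS\G
...
          Master_Log_File: mysql-bin.000042
      Read_Master_Log_Pos: 1540
           Relay_Log_File: relay-bin.000007
            Relay_Log_Pos: 920
    Relay_Master_Log_File: mysql-bin.000042
      Exec_Master_Log_Pos: 1298
...

In this example, the I/O thread has read up to position 1540 of mysql-bin.000042, while the SQL thread has executed events only up to position 1298 of that same file.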
Simplified SHOW REPLICA STATUS output
The following image should help you understand where each binary log coordinate fits in the SHOW REPLICA STATUS output.
Decoding the SHOW REPLICA STATUS output
Now, let’s break down the SHOW REPLICA STATUS output to understand these binlog coordinates:
- Master_Log_File and Read_Master_Log_Pos: These values tell you which primary binary log file and position the I/O thread is currently reading. It’s like the bookmark in a book, showing you where the replication process is in the Primary’s log.
- Relay_Log_File and Relay_Log_Pos: These values reveal the name of the relay log file and the position up to which the SQL thread has processed. Think of it as the progress report of the SQL thread.
- Relay_Master_Log_File and Exec_Master_Log_Pos: These parameters are essential when you need to reset replication on a replica.
- Relay_Master_Log_File specifies the name of the primary binary log file containing the most recent event executed by the SQL thread.
- Exec_Master_Log_Pos tells you the precise position within that file. Together, they enable you to pick up where replication left off.
Troubleshooting and managing replication
Understanding these binlog coordinates simplifies troubleshooting and managing replication:
- When replication breaks, you can use Relay_Master_Log_File and Exec_Master_Log_Pos to identify the exact location and resume from there.
- Monitoring Master_Log_File and Read_Master_Log_Pos helps you keep track of the I/O thread’s progress on the primary.
- Checking Relay_Log_File and Relay_Log_Pos lets you know how far the SQL thread has come in processing events.
By grasping these key binlog coordinates, you can confidently manage MySQL replication, resolve issues efficiently, and keep your databases in sync.
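Building on the first point above, here is a hedged sketch of re-pointing a replica using those coordinates; the host, user, file, and position are placeholders, and the CHANGE REPLICATION SOURCE TO syntax shown requires MySQL 8.0.23+ (older versions use CHANGE MASTER TO with the MASTER_* options):

STOP REPLICA;
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST     = '192.0.2.10',        -- placeholder primary host
  SOURCE_USER     = 'repl',              -- placeholder replication user
  SOURCE_PASSWORD = '<redacted>',
  SOURCE_LOG_FILE = 'mysql-bin.000042',  -- value of Relay_Master_Log_File
  SOURCE_LOG_POS  = 1298;                -- value of Exec_Master_Log_Pos
START REPLICA;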
Quick tip for DBAs
We know monitoring is one of the important components of your architecture. We recommend using Percona Monitoring and Management, the best monitoring tool for your open source databases.
The MySQL Replication Summary dashboard comes in really handy when monitoring the replication status. Please find the sample snapshot from our test monitoring node:
Conclusion
The SHOW REPLICA STATUS output shouldn’t be something that confuses you. By focusing on the binlog coordinates, you gain valuable insights into the replication process. Whether you’re troubleshooting a broken replica or monitoring ongoing replication, these coordinates guide you to success in managing MySQL replication. Next time you encounter the SHOW REPLICA STATUS output, remember that it’s simply telling you where your replication stands in the grand scheme of your MySQL environment.
Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.
InnoDB ClusterSet Deployment With MySQLRouter
This blog post will cover the basic setup of the InnoDB ClusterSet environment, which provides disaster tolerance for InnoDB Cluster deployments by associating a primary InnoDB Cluster with one or more replicas in alternate locations/different data centers. InnoDB ClusterSet automatically manages replication from the primary cluster to the replica clusters via a specific ClusterSet Async replication channel. If the primary cluster becomes inaccessible due to a loss of network connectivity or a data center issue, you can make a replica cluster active in its place.
Now, let’s see in detail how exactly we can configure the topology.
We have used the sandbox environment available via the MySQL Shell utility for this setup.
Environment
Cluster1: 127.0.0.1:3308, 127.0.0.1:3309, 127.0.0.1:3310
Cluster2: 127.0.0.1:3311, 127.0.0.1:3312, 127.0.0.1:3313
Router:   127.0.0.1:6446/6447
Let’s set up the first cluster (“cluster1”)
- Deploying the sandboxes.
MySQL JS > dba.deploySandboxInstance(3308)
MySQL JS > dba.deploySandboxInstance(3309)
MySQL JS > dba.deploySandboxInstance(3310)
- Then, we need to perform some pre-checks before initiating the cluster.
### Connecting to the concerned nodes one by one.
MySQL JS > shell.connect('root@localhost:3308')
MySQL localhost:3308 ssl JS > shell.connect('root@localhost:3309')
MySQL localhost:3309 ssl JS > shell.connect('root@localhost:3310')

### The commands below check whether each instance satisfies the InnoDB Cluster requirements (group replication settings) and fix any missing requirements automatically. Here, we configure a new user "iroot" for cluster deployment.
MySQL localhost:3308 ssl JS > dba.checkInstanceConfiguration('root@localhost:3308')
MySQL localhost:3308 ssl JS > dba.configureInstance('root@127.0.0.1:3308',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})
MySQL localhost:3309 ssl JS > dba.checkInstanceConfiguration('root@localhost:3309')
MySQL localhost:3309 ssl JS > dba.configureInstance('root@127.0.0.1:3309',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})
MySQL localhost:3310 ssl JS > dba.checkInstanceConfiguration('root@localhost:3310')
MySQL localhost:3310 ssl JS > dba.configureInstance('root@127.0.0.1:3310',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})
Once all the instances are prepared, we can create the cluster with the seed node. The createCluster() command performs all the hidden steps of initializing group replication; later on, the other nodes join the group via distributed recovery or the clone plugin.
InnoDB cluster is built on top of group replication which provides (automatic membership management, fault tolerance, and automatic failover). It provides us with an easy interface to deploy/manage the complex topologies with DR support.
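Since an InnoDB Cluster is Group Replication underneath, you can cross-check membership with plain SQL on any member once the cluster is up. This is not part of the original walkthrough, just a minimal verification sketch run from a regular MySQL client:
-- Run against any cluster member, e.g. 127.0.0.1:3308;
-- every instance managed by the cluster should appear as ONLINE.
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;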
- We will bootstrap the cluster with an initial node ("localhost:3308").
MySQL localhost:3310 ssl JS > shell.connect('iroot@localhost:3308') MySQL localhost:3308 ssl JS > cluster1 = dba.createCluster('Cluster1') MySQL localhost:3308 ssl JS > cluster1 = dba.getCluster()
Output:
MySQL localhost:3308 ssl JS > cluster1.status() { "clusterName": "Cluster1", "defaultReplicaSet": { "name": "default", "primary": "127.0.0.1:3308", "ssl": "REQUIRED", "status": "OK_NO_TOLERANCE", "statusText": "Cluster is NOT tolerant to any failures.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/W", "readReplicas": {}, "replicationLag": "applier_queue_applied", "role": "HA", "status": "ONLINE", "version": "8.0.31" } }, "topologyMode": "Single-Primary" }, "groupInformationSourceMember": "127.0.0.1:3308" }
- Here, we have successfully bootstrapped the first node. Next, the other nodes will join the cluster using the CLONE Plugin.
MySQL localhost:3308 ssl JS > cluster1.addInstance("iroot@localhost:3309",{password:'Iroot@1234'})
Output:
* Waiting for clone to finish... NOTE: 127.0.0.1:3309 is being cloned from 127.0.0.1:3308 ** Stage DROP DATA: Completed ** Clone Transfer FILE COPY ############################################################ 100% Completed PAGE COPY ############################################################ 100% Completed REDO COPY ############################################################ 100% Completed NOTE: 127.0.0.1:3309 is shutting down... * Waiting for server restart... ready * 127.0.0.1:3309 has restarted, waiting for clone to finish... ** Stage RESTART: Completed * Clone process has finished: 73.66 MB transferred in about 1 second (~73.66 MB/s) State recovery already finished for '127.0.0.1:3309' The instance '127.0.0.1:3309' was successfully added to the cluster.
MySQL localhost:3308 ssl JS > cluster1.addInstance("iroot@localhost:3310",{password:'Iroot@1234'})
Output:
* Waiting for clone to finish... NOTE: 127.0.0.1:3310 is being cloned from 127.0.0.1:3309 ** Stage DROP DATA: Completed ** Clone Transfer FILE COPY ############################################################ 100% Completed PAGE COPY ############################################################ 100% Completed REDO COPY ############################################################ 100% Completed NOTE: 127.0.0.1:3310 is shutting down... * Waiting for server restart... ready * 127.0.0.1:3310 has restarted, waiting for clone to finish... ** Stage RESTART: Completed * Clone process has finished: 73.66 MB transferred in about 1 second (~73.66 MB/s) State recovery already finished for '127.0.0.1:3310' The instance '127.0.0.1:3310' was successfully added to the cluster.
- At this stage, our first cluster is ready with all three nodes.
MySQL localhost:3308 ssl JS > cluster1.status()
Output:
{ "clusterName": "Cluster1", "defaultReplicaSet": { "name": "default", "primary": "127.0.0.1:3308", "ssl": "REQUIRED", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/W", "readReplicas": {}, "replicationLag": "applier_queue_applied", "role": "HA", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3309": { "address": "127.0.0.1:3309", "memberRole": "SECONDARY", "mode": "R/O", "readReplicas": {}, "replicationLag": "applier_queue_applied", "role": "HA", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3310": { "address": "127.0.0.1:3310", "memberRole": "SECONDARY", "mode": "R/O", "readReplicas": {}, "replicationLag": "applier_queue_applied", "role": "HA", "status": "ONLINE", "version": "8.0.31" } }, "topologyMode": "Single-Primary" }, "groupInformationSourceMember": "127.0.0.1:3308" }
Let’s now proceed with the second cluster (“cluster2”) setup
- Deploying the sandboxes via MySqlShell.
MySQL JS > dba.deploySandboxInstance(3311) MySQL JS > dba.deploySandboxInstance(3312) MySQL JS > dba.deploySandboxInstance(3313)
- Similarly, perform some pre-checks as we did for “cluster1” nodes.
# Connecting to the concerned nodes. MySQL JS > shell.connect('root@localhost:3311') MySQL JS > shell.connect('root@localhost:3312') MySQL JS > shell.connect('root@localhost:3313') # The commands below check whether each instance satisfies the InnoDB Cluster requirements (group replication settings) and fix any missing requirements automatically. Here as well, we configure the "iroot" user for cluster administration. MySQL localhost:3308 ssl JS > dba.checkInstanceConfiguration('root@localhost:3311') MySQL localhost:3308 ssl JS > dba.configureInstance('root@127.0.0.1:3311',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'}) MySQL localhost:3308 ssl JS > dba.checkInstanceConfiguration('root@localhost:3312') MySQL localhost:3308 ssl JS > dba.configureInstance('root@127.0.0.1:3312',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'}) MySQL localhost:3308 ssl JS > dba.checkInstanceConfiguration('root@localhost:3313') MySQL localhost:3308 ssl JS > dba.configureInstance('root@127.0.0.1:3313',{clusterAdmin: 'iroot', clusterAdminPassword: 'Iroot@1234'})
- Next, we will create the ClusterSet topology by syncing the node 127.0.0.1:3311 from the existing cluster1 nodes. Node 127.0.0.1:3311 will be the primary node of cluster2, and the remaining nodes will join it via the clone/incremental recovery process.
1) First, connect to the "cluster1" primary node.
MySQL localhost:3308 ssl JS > \c iroot@127.0.0.1:3308 MySQL 127.0.0.1:3308 ssl JS > cluster1 = dba.getCluster()
2) Here, "cluster1" becomes the primary cluster of the new ClusterSet topology:
MySQL 127.0.0.1:3308 ssl JS > myclusterset = cluster1.createClusterSet('firstclusterset')
Output:
ClusterSet successfully created. Use ClusterSet.createReplicaCluster() to add Replica Clusters to it. <ClusterSet:firstclusterset>
3) Verifying the status.
MySQL 127.0.0.1:3308 ssl JS > myclusterset.status({extended: 1})
Output:
{ "clusters": { "Cluster1": { "clusterRole": "PRIMARY", "globalStatus": "OK", "primary": "127.0.0.1:3308", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/W", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3309": { "address": "127.0.0.1:3309", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3310": { "address": "127.0.0.1:3310", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-85,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5" } }, "domainName": "firstclusterset", "globalPrimaryInstance": "127.0.0.1:3308", "metadataServer": "127.0.0.1:3308", "primaryCluster": "Cluster1", "status": "HEALTHY", "statusText": "All Clusters available." }
4) Now, node 127.0.0.1:3311 will sync with the existing "cluster1" via the asynchronous replication process.
MySQL 127.0.0.1:3308 ssl JS > \c iroot@127.0.0.1:3311 MySQL 127.0.0.1:3311 ssl JS > cluster2 = myclusterset.createReplicaCluster("127.0.0.1:3311", "cluster2", {recoveryProgress: 1, timeout: 10})
Output:
... Replica Cluster 'cluster2' successfully created on ClusterSet 'firstclusterset'. ...
5) Next, the other nodes join "cluster2" via the clone process.
MySQL 127.0.0.1:3311 ssl JS > cluster2.addInstance("iroot@127.0.0.1:3312",{password:'Iroot@1234'}) MySQL 127.0.0.1:3311 ssl JS > cluster2.addInstance("iroot@127.0.0.1:3313",{password:'Iroot@1234'})
6) Finally, check the status of our ClusterSet environment.
MySQL 127.0.0.1:3311 ssl JS > myclusterset = dba.getClusterSet() MySQL 127.0.0.1:3311 ssl JS > myclusterset.status({extended: 1})
Output:
{ "clusters": { "Cluster1": { "clusterRole": "PRIMARY", "globalStatus": "OK", "primary": "127.0.0.1:3308", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/W", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3309": { "address": "127.0.0.1:3309", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3310": { "address": "127.0.0.1:3310", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-124,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5" }, "cluster2": { "clusterRole": "REPLICA", "clusterSetReplication": { "applierStatus": "APPLIED_ALL", "applierThreadState": "Waiting for an event from Coordinator", "applierWorkerThreads": 4, "receiver": "127.0.0.1:3311", "receiverStatus": "ON", "receiverThreadState": "Waiting for source to send event", "source": "127.0.0.1:3308" }, "clusterSetReplicationStatus": "OK", "globalStatus": "OK", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3311": { "address": "127.0.0.1:3311", "memberRole": "PRIMARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3312": { "address": "127.0.0.1:3312", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3313": { "address": "127.0.0.1:3313", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-124,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5", "transactionSetConsistencyStatus": "OK", "transactionSetErrantGtidSet": "", "transactionSetMissingGtidSet": "" } }, "domainName": "firstclusterset", "globalPrimaryInstance": "127.0.0.1:3308", "metadataServer": "127.0.0.1:3308", "primaryCluster": "Cluster1", "status": "HEALTHY", "statusText": "All Clusters available."
Here, the ClusterSet topology is now ready with all six nodes.
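You can also confirm the ClusterSet asynchronous replication channel with plain SQL on the replica cluster's primary (127.0.0.1:3311). A small sketch; the channel name 'clusterset_replication' is the one InnoDB ClusterSet normally creates, so adjust it if your deployment uses a different name:
-- Run on the replica cluster primary (127.0.0.1:3311)
SELECT CHANNEL_NAME, SERVICE_STATE
FROM performance_schema.replication_connection_status
WHERE CHANNEL_NAME = 'clusterset_replication';
-- The classic view of the same channel:
SHOW REPLICA STATUS FOR CHANNEL 'clusterset_replication'\G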
In the next phase, we will bootstrap MySQLRouter with our newly created ClusterSet environment:
- First, we will generate a dedicated user for MySQLRouter monitoring/management.
MySQL 127.0.0.1:3311 ssl JS > \c iroot@localhost:3308 MySQL localhost:3308 ssl JS > cluster1 = dba.getCluster(); MySQL localhost:3308 ssl JS > cluster1.setupRouterAccount('router_usr')
Output:
Missing the password for new account router_usr@%. Please provide one. Password for new account: ********** Confirm password: ********** Creating user router_usr@%. Account router_usr@% was successfully created.
- Bootstrap the router with the user ("router_usr") and the router name ("Router1").
[vagrant@localhost ~]$ sudo mysqlrouter --bootstrap iroot@127.0.0.1:3308 --account=router_usr --name='Router1' --user root --force
We are using --force here because without --force, mysqlrouter won’t recognize the ClusterSet. This will reconfigure the existing ClusterSet.
Here, we will see some useful information that will be required later to connect to the database or manage the services.
# MySQL Router 'Router1' configured for the ClusterSet 'firstclusterset' After this MySQL Router has been started with the generated configuration $ /etc/init.d/mysqlrouter restart or $ systemctl start mysqlrouter or $ mysqlrouter -c /etc/mysqlrouter/mysqlrouter.conf ClusterSet 'firstclusterset' can be reached by connecting to: ## MySQL Classic protocol - Read/Write Connections: localhost:6446 - Read/Only Connections: localhost:6447 ## MySQL X protocol - Read/Write Connections: localhost:6448 - Read/Only Connections: localhost:6449
- Finally, start the mysqlrouter service:
sudo mysqlrouter -c /etc/mysqlrouter/mysqlrouter.conf &
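Once the router service is up, its registration can also be checked in the InnoDB Cluster metadata schema. A quick sketch; the column layout differs between metadata versions, so SELECT * keeps it simple:
-- Run on the primary instance; lists every router bootstrapped against this ClusterSet
SELECT * FROM mysql_innodb_cluster_metadata.routers\G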
Validating the connection route
- Connect to the router port “6446” and create some demo table/data:
shell> mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6446 -e "create database sbtest;use sbtest;create table sbtest1 (id int(10) not null auto_increment primary key, user varchar(50));insert into sbtest1(user) values('test');"
- Connect to the router port "6447" for read traffic. By default, these connections are balanced among the nodes of the primary cluster (cluster1).
[vagrant@localhost ~]$ mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6447 -e "use sbtest;select * from sbtest1;select @@server_id;" +----+------+ | id | user | +----+------+ | 1 | test | +----+------+ +-------------+ | @@server_id | +-------------+ | 194452202 | +-------------+ [vagrant@localhost ~]$ mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6447 -e "use sbtest;select * from sbtest1;select @@server_id;" +----+------+ | id | user | +----+------+ | 1 | test | +----+------+ +-------------+ | @@server_id | +-------------+ | 2376678236 | +-------------+ [vagrant@localhost ~]$ mysql -h 127.0.0.1 -u root -pRoot@1234 -P 6447 -e "use sbtest;select * from sbtest1;select @@server_id;" mysql: [Warning] Using a password on the command line interface can be insecure. +----+------+ | id | user | +----+------+ | 1 | test | +----+------+ +-------------+ | @@server_id | +-------------+ | 194452202 | +-------------+
So, by default, all connections are routed to the "Primary" cluster, which in our case is "Cluster1"; however, we can change the primary component based on the requirement.
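A quick way to double-check the split is to ask each router port which backend it landed on and whether that backend is writable; a simple sketch:
-- Through the read/write port (6446) this should land on the primary (super_read_only = 0):
--   mysql -h 127.0.0.1 -P 6446 -u root -p
SELECT @@server_id, @@port, @@super_read_only;
-- Through the read-only port (6447) it should land on a secondary (super_read_only = 1):
--   mysql -h 127.0.0.1 -P 6447 -u root -p
SELECT @@server_id, @@port, @@super_read_only;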
Changing ClusterSet topology
MySQL localhost:3308 ssl JS > myclusterset=dba.getClusterSet() MySQL localhost:3308 ssl JS > myclusterset.status({extended:1})
Output:
<ClusterSet:firstclusterset> { "clusters": { "Cluster1": { "clusterRole": "PRIMARY", "globalStatus": "OK", "primary": "127.0.0.1:3308", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/W", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3309": { "address": "127.0.0.1:3309", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3310": { "address": "127.0.0.1:3310", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-143,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5" }, "cluster2": { "clusterRole": "REPLICA", "clusterSetReplication": { "applierStatus": "APPLIED_ALL", "applierThreadState": "Waiting for an event from Coordinator", "applierWorkerThreads": 4, "receiver": "127.0.0.1:3311", "receiverStatus": "ON", "receiverThreadState": "Waiting for source to send event", "source": "127.0.0.1:3308" }, "clusterSetReplicationStatus": "OK", "globalStatus": "OK", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3311": { "address": "127.0.0.1:3311", "memberRole": "PRIMARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3312": { "address": "127.0.0.1:3312", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3313": { "address": "127.0.0.1:3313", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-143,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5", "transactionSetConsistencyStatus": "OK", "transactionSetErrantGtidSet": "", "transactionSetMissingGtidSet": "" } }, "domainName": "firstclusterset", "globalPrimaryInstance": "127.0.0.1:3308", "metadataServer": "127.0.0.1:3308", "primaryCluster": "Cluster1", "status": "HEALTHY", "statusText": "All Clusters available." }
- Changing the Primary cluster from "cluster1" to "cluster2":
MySQL localhost:3308 ssl JS > myclusterset.setPrimaryCluster('cluster2')
Output:
Switching the primary cluster of the clusterset to 'cluster2' * Verifying clusterset status ** Checking cluster cluster2 Cluster 'cluster2' is available ** Checking cluster Cluster1 Cluster 'Cluster1' is available * Reconciling 5 internally generated GTIDs * Refreshing replication account of demoted cluster * Synchronizing transaction backlog at 127.0.0.1:3311 ** Transactions replicated ############################################################ 100% * Updating metadata * Updating topology ** Changing replication source of 127.0.0.1:3309 to 127.0.0.1:3311 ** Changing replication source of 127.0.0.1:3310 to 127.0.0.1:3311 ** Changing replication source of 127.0.0.1:3308 to 127.0.0.1:3311 * Acquiring locks in replicaset instances ** Pre-synchronizing SECONDARIES ** Acquiring global lock at PRIMARY ** Acquiring global lock at SECONDARIES * Synchronizing remaining transactions at promoted primary ** Transactions replicated ############################################################ 100% * Updating replica clusters Cluster 'cluster2' was promoted to PRIMARY of the clusterset. The PRIMARY instance is '127.0.0.1:3311'
- If we check the status output again, we can observe that "clusterRole: PRIMARY" has shifted to "cluster2".
MySQL localhost:3308 ssl JS > myclusterset.status({extended:1})
Output:
{ "clusters": { "Cluster1": { "clusterRole": "REPLICA", "clusterSetReplication": { "applierStatus": "APPLIED_ALL", "applierThreadState": "Waiting for an event from Coordinator", "applierWorkerThreads": 4, "receiver": "127.0.0.1:3308", "receiverStatus": "ON", "receiverThreadState": "Waiting for source to send event", "source": "127.0.0.1:3311" }, "clusterSetReplicationStatus": "OK", "globalStatus": "OK", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3308": { "address": "127.0.0.1:3308", "memberRole": "PRIMARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3309": { "address": "127.0.0.1:3309", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3310": { "address": "127.0.0.1:3310", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-145,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5", "transactionSetConsistencyStatus": "OK", "transactionSetErrantGtidSet": "", "transactionSetMissingGtidSet": "" }, "cluster2": { "clusterRole": "PRIMARY", "globalStatus": "OK", "primary": "127.0.0.1:3311", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "127.0.0.1:3311": { "address": "127.0.0.1:3311", "memberRole": "PRIMARY", "mode": "R/W", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3312": { "address": "127.0.0.1:3312", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" }, "127.0.0.1:3313": { "address": "127.0.0.1:3313", "memberRole": "SECONDARY", "mode": "R/O", "replicationLagFromImmediateSource": "", "replicationLagFromOriginalSource": "", "status": "ONLINE", "version": "8.0.31" } }, "transactionSet": "2e71122e-2862-11ee-b81c-5254004d77d3:1-5,39c28b63-285a-11ee-a411-5254004d77d3:1-4,59d8e60a-285d-11ee-bb44-5254004d77d3:1-145,59d8f3a6-285d-11ee-bb44-5254004d77d3:1-5" } }, "domainName": "firstclusterset", "globalPrimaryInstance": "127.0.0.1:3311", "metadataServer": "127.0.0.1:3311", "primaryCluster": "cluster2", "status": "HEALTHY", "statusText": "All Clusters available." }
So, we have changed the primary component from cluster1 to cluster2, but the routing is still pointed at cluster1. In order to send traffic to cluster2, we also have to change the routing option.
MySQL localhost:3308 ssl JS > myclusterset.listRouters()
Output:
{ "domainName": "firstclusterset", "routers": { "localhost.localdomain::Router1": { "hostname": "localhost.localdomain", "lastCheckIn": "2023-07-22 02:47:42", "roPort": "6447", "roXPort": "6449", "rwPort": "6446", "rwXPort": "6448", "targetCluster": "primary", "version": "8.0.32" },
- Changing the connection target from “cluster1” to “cluster2”:
MySQL localhost:3308 ssl JS > myclusterset.setRoutingOption('localhost.localdomain::Router1', 'target_cluster', 'cluster2')
MySQL localhost:3308 ssl JS > myclusterset.listRouters()
Output:
{ "domainName": "firstclusterset", "routers": { "localhost.localdomain::Router1": { "hostname": "localhost.localdomain", "lastCheckIn": "2023-07-22 02:47:42", "roPort": "6447", "roXPort": "6449", "rwPort": "6446", "rwXPort": "6448", "targetCluster": "cluster2", "version": "8.0.32" }
Verifying the routing policy in the existing clusterset
MySQL localhost:3308 ssl JS > myclusterset.routingOptions()
Output:
{ "domainName": "firstclusterset", "global": { "invalidated_cluster_policy": "drop_all", "stats_updates_frequency": 0, "target_cluster": "primary" }, "routers": { "localhost.localdomain::Router1": { "target_cluster": "cluster2" } } }
There are situations when the primary cluster is not available or reachable. The immediate solution in such situations is to perform an emergency failover in order to avoid an application outage.
An emergency failover switches to a selected replica cluster from the primary InnoDB Cluster of the InnoDB ClusterSet deployment. During an emergency failover, data consistency is not assured due to asynchronous replication and other network factors, so for safety, the original primary cluster is marked as invalidated during the failover process.
So, if by any chance the original primary cluster remains online, it should be shut down. Later, the invalidated primary cluster can rejoin the ClusterSet via the rejoin/repair process.
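Before shutting down an invalidated primary cluster, it is worth confirming whether its members are still accepting writes. A minimal check, run against each member of the old primary cluster:
-- 0 means the node is still writable; fence it or shut it down before clients reach it
SELECT @@read_only, @@super_read_only;
-- Also check whether it still believes it is part of a healthy group
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
FROM performance_schema.replication_group_members;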
Perform emergency failover
myclusterset.forcePrimaryCluster("cluster2") myclusterset.setRoutingOption('localhost.localdomain::Router1', 'target_cluster', 'cluster2')
Summary
With the help of the ClusterSet implementation, deploying DR support across different regions is no longer a complex challenge. MySQLShell and InnoDB Cluster tackle all the configuration and syncing processes behind the scenes. Disaster recovery and failover time can be minimized with the help of the Admin API/MySQLShell commands.
There is one caveat with how ClusterSet functions: it does not support automatic promotion of a new primary cluster if the existing one goes down. We have to take care of that with manual intervention or some external automated process.
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!
23
2023
Working of MySQL Replication Filters When Using Statement-based and Row-based Replication
A couple of days ago, I was creating an index on the source, and when I checked the replica side, it had not been replicated. So I wanted to explain how replication filters may increase the complexity of your DBA operations.
Replication occurs by reading events from the binary log of the source and then executing them on the replica. The events in the binary log are recorded in different formats, depending on the type of event. These formats are determined by the binary logging format used when the events were initially recorded on the source. The relationship between the binary logging formats and the terminology used during replication is as follows:
When using statement-based binary logging, the source writes SQL statements to the binary log. Replication of the source to the replica is performed by executing these SQL statements on the replica. This is known as statement-based replication (SBR), and it corresponds to the statement-based binary logging format in MySQL.
When using row-based binary logging, the source writes events to the binary log that show how individual rows in tables are changed. Replication of the source to the replica is done by copying these events, representing the changes in the table rows, to the replica. This is known as row-based replication (RBR), and it corresponds to the row-based binary logging format in MySQL.
Row-based logging is the default method.
You can also configure MySQL to use a mix of both statement-based and row-based logging, depending on which is most appropriate for the change to be logged. This is called mixed-format binary logging.
MySQL supports two types of replication filters: those that apply at the database level and those that apply at the table level (a runtime example follows the list):
- Database-level filters: replicate-do-db and replicate-ignore-db
- Table-level filters: replicate-do-table, replicate-ignore-table, replicate-wild-do-table, and replicate-wild-ignore-table
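In this post the filters are set as server options, but since MySQL 5.7 the replica-side filters can also be changed at runtime with CHANGE REPLICATION FILTER while replication is stopped. A small illustrative sketch (the pattern below is only an example, not the exact filter used later in this post; use STOP/START SLAVE on older releases):
STOP REPLICA;
-- Set a database-level and a table-level wildcard filter dynamically
CHANGE REPLICATION FILTER
  REPLICATE_DO_DB = (databasefilter),
  REPLICATE_WILD_DO_TABLE = ('databasefilter.t1%');
START REPLICA;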
There are filters at the database level, known as binlog-do-db and binlog-ignore-db, that control what is included in the binary log. However, it’s important to note that if events are filtered out of the binary log, they can no longer be used for point-in-time recovery that involves those tables. It’s therefore recommended to use filters that apply to the replica rather than the binary log.
When using ROW-based replication, the filter applies to the specific table the change is made on. However, when using STATEMENT-based replication, the database-level filters apply to the default database. And when using table-level filters with STATEMENT-based replication, the filter applies to the table change.
Mixing database and table-level filters creates more complex rules. First, the database-level filters are checked, and if the update qualifies, the table-level filters are also checked. This can result in different outcomes for STATEMENT-based and ROW-based replication when using non-default databases. Additionally, using the MIXED format for binary logging can also cause issues, as a small change in the query can change whether the statement is logged in the STATEMENT- or ROW-based format. For this reason, it is safer to use table-level replication filters rather than database-level filters.
See Determination of Safe and Unsafe Statements in Binary Logging for details of how it is decided whether a statement is logged using the STATEMENT-based or ROW-based format when using MIXED mode replication.
Examples
All the following examples use the following schema:
mysql> CREATE DATABASE databasefilter; Query OK, 1 row affected (0.02 sec) mysql> CREATE DATABASE databasewithoutfilter; Query OK, 1 row affected (0.00 sec) mysql> use databasefilter; Database changed mysql> CREATE TABLE t1 (uid int unsigned NOT NULL PRIMARY KEY) ENGINE=InnoDB; Query OK, 0 rows affected (0.06 sec) mysql> CREATE TABLE t2 (uid int unsigned NOT NULL PRIMARY KEY) ENGINE=InnoDB; Query OK, 0 rows affected (0.04 sec)
Example one
Replication filter:
replicate-wild-do-table = databasefilter.t1\_%
Statements:
use databasefilter; INSERT INTO t1 VALUES (1);
Replicates?
Statement-based replication: Yes
Row-based replication: Yes
The replication filter is set to “replicate-wild-do-table = databasefilter.t1_%”, meaning that any table in the “databasefilter” database with a name starting with “t1_” will be replicated. If the statement “use databasefilter; INSERT INTO t1 VALUES (1);” is executed, it will be replicated using both statement-based and row-based replication methods.
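If you want to confirm which filters are actually active on the replica (whether configured in my.cnf or at runtime), performance_schema exposes them in MySQL 8.0; for example:
-- Lists every applier-side filter together with its rule and how it was configured
SELECT FILTER_NAME, FILTER_RULE, CONFIGURED_BY
FROM performance_schema.replication_applier_filters;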
Example two
Replication filter:
replicate-wild-do-table = databasefilter.t2\_%
Statements:
use databasefilter; INSERT INTO t1 VALUES (1);
Replicates?
Statement-based replication: No
Row-based replication: No
The replication filter “replicate-wild-do-table = databasefilter.t2_%” is set, which means that only tables that match the pattern “databasefilter.t2_%” will be replicated. When the statement “use databasefilter;” is executed, the current database will be set to “databasefilter”.
However, when the statement “INSERT INTO t1 VALUES (1);” is executed, it will not replicate in either statement-based or row-based replication because the table “t1” does not match the pattern specified in the replication filter.
Example three
Replication filter:
replicate-do-db = databasefilter
Statements:
use databasefilter; INSERT INTO t1 VALUES (1);
Replicates?
Statement-based replication: Yes
Row-based replication: Yes
The replication filter “replicate-do-db = databasefilter” is set, which means that only statements executed in the “databasefilter” database will be replicated. When the statement “use databasefilter;” is executed, the current database will be set to “databasefilter”.
After that, when the statement “INSERT INTO t1 VALUES (1);” is executed, it will replicate in both statement-based and row-based replication because it is executed in the “databasefilter” database that matches the pattern specified in the replication filter.
Example four
Replication filter:
replicate-do-db = databasefilter
Statements:
use databasewithoutfilter; INSERT INTO databasefilter.t1 VALUES (1);
Replicates?
Statement-based replication: No
Row-based replication: Yes
The replication filter “replicate-do-db = databasefilter” is set, which means that only statements executed in the “databasefilter” database will be replicated. When the statement “use databasewithoutfilter;” is executed, the current database will be set to “databasewithoutfilter”, which does not match the pattern specified in the replication filter. However, when the statement “INSERT INTO databasefilter.t1 VALUES (1);” is executed, it will replicate in row-based replication but not in statement-based replication.
The reason for this is that statement-based replication replicates the entire statement in the context of the default database set by “use databasewithoutfilter”, which does not match the pattern specified in the replication filter. In row-based replication, only the actual data change is replicated, and the current default database does not matter.
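You can observe this difference directly in the binary log: with row-based logging the change appears as a Table_map event for the actual table, while statement-based logging records the statement along with the default database. A hedged sketch; 'binlog.000001' is a placeholder, so take the current file name from SHOW MASTER STATUS:
SHOW BINLOG EVENTS IN 'binlog.000001' LIMIT 20;
-- ROW format:       look for Table_map `databasefilter`.`t1` followed by Write_rows
-- STATEMENT format: look for a Query event whose default database is databasewithoutfilter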
Conclusion
MySQL replication filters can be used to control which events are replicated from the source to the replica. These filters can be applied at the database level or the table level and can increase the complexity of DBA operations. It is important to note that if events are filtered out of the binary log, they can no longer be used for point-in-time recovery that involves those tables. When using ROW-based replication, the filter applies to the specific table the change is made on. However, when using STATEMENT-based replication, the database-level filters apply to the default database. Mixing database and table-level filters creates more complex rules and can cause issues when using the MIXED format for binary logging.
Percona Distribution for MySQL is the most complete, stable, scalable, and secure, open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!
06
2022
Importance of Delay Replica in Data Recovery: Recover Deleted Records
What is a Delay Replica and how does it help?
MySQL Replication is useful, easy to set up, and used for very different purposes. For example:
- split reads and writes
- run data mining or reporting processes on them
- disaster recovery
To learn more, check out How Does MySQL Replication Work?
It’s important to mention that a replication server is not a backup by itself. A mistake on the source, for example, a DROP DATABASE or an UPDATE without a WHERE clause, will be replicated nearly instantly to all replica servers. Simply having a replica is not going to be helpful here. How can we avoid that kind of mistake? By having a replica server that intentionally lags behind.
We can never completely avoid human error in database infrastructure operations. But rolling back to the last known good state from a delayed source/replica is the best recommendation for database infrastructure corruption scenarios.
Delayed replication can be used for several purposes:
- To test how the system behaves when there is a lag. For example, in an application, a lag might be caused by a heavy load on the replica. However, it can be difficult to generate this load level. Delayed replication can simulate the lag without having to simulate the load. It can also be used to debug conditions related to a lagging replica.
- To inspect what the database looked like in the past, without having to reload a backup. For example, by configuring a replica with a delay of one week, if you then need to see what the database looked like before the last few days’ worth of development, the delayed replica can be inspected.
- In our case, as we have a six-hour replication delay, we can recover the non-problematic state of the table by going back to the six-hour-delayed replica if a wrong DML was executed on the source (a configuration sketch follows this list).
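For reference, configuring such a delay is a single statement on the replica. A minimal sketch using MySQL 8.0.23+ syntax; older versions use CHANGE MASTER TO ... MASTER_DELAY instead:
STOP REPLICA;
-- 21600 seconds = six hours of intentional lag
CHANGE REPLICATION SOURCE TO SOURCE_DELAY = 21600;
START REPLICA;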
Testing the Delay Replica which can help us recover the deleted record
Let’s understand the situation first: Someone deleted the data on a Percona XtraDB Cluster 8 (PXC) server accidentally. As per architecture, we always configure a delayed replica. We have stopped the replication on the delayed replica and now we will restore the deleted records.
From the initial investigation, we learned from the application operations team that the query below was executed.
delete from Schema_g2.per_sch order by id desc;
Almost 20k records were deleted. Let us do our pre-reqs and initial investigation based on the requirements we have.
In this article, PXC-8 is our source and pxc-backup is our delayed backup Replica. Click through to learn about our product Percona XtraDB Cluster and our backup solution Percona XtraBackup.
- All records for Schema_g2.per_sch were deleted from PXC-8 by mistake.
- We have stopped the delayed replica pxc-backup to retrieve these records. All the deleted records are present there right now. Below is the number of records we had at the point replication was stopped.
Pxc-backup > select count(*) from Schema_g2.per_sch; +----------+ | count(*) | +----------+ | 21762 | +----------+
At this point, we have already stopped the backup replica.
- After deletion, below are the new records inserted on the PXC-8 source for this table.
PXC-8 (none)> select count(*) from Schema_g2.per_sch; +----------+ | count(*) | +----------+ | 215 | +----------+
- The id column is AUTO_INCREMENT, so newly inserted rows get the next values.
id int(11) NOT NULL AUTO_INCREMENT,
PXC-8 (none)> select * from Schema_g2.per_sch; +--------+---------------+-----------+-------+ | id | permission_id | school_id | grant | +--------+---------------+-----------+-------+ | 178852 | 446 | 48887 | 0 | | 178853 | 448 | 48887 | 0 | ...
- Records on the backup server:
Pxc-backup > select * from Schema_g2.per_sch limit 10; +-----+---------------+-----------+-------+ | id | permission_id | school_id | grant | +-----+---------------+-----------+-------+ | 0 | 0 | 0 | 0 | | 105 | 426 | 1 | 0 | | 429 | 89 | 16 | 0 | | 431 | 93 | 16 | 0 | ... | 178629 | 194 | 35758 | 0 | | 178630 | 195 | 35758 | 0 | | 178631 | 8239 | 35758 | 0 | +--------+---------------+-----------+-------+
- A few records between 178631 and 178852 are missing on the backup node because the replica was stopped in between.
- Exact position before the DELETE was executed on the source (the application team provided this; we verified it as an UPDATE query that ran just before the DELETE):
START SLAVE UNTIL MASTER_LOG_FILE = 'PXC-8-bin.001611', MASTER_LOG_POS = 613996753;
From the binlog, we see the following:
#210922 11:44:05 server id 8 end_log_pos 613996753 CRC32 0xee39f244 Query thread_id=36995659 exec_time=0 error_code=0 SET TIMESTAMP=1632300245/*!*/; BEGIN /*!*/; # at 613996753 #210922 11:44:05 server id 8 end_log_pos 613997049 CRC32 0x92aea136 Table_map: `Schema_g2`.`usr` mapped to number 109 # at 613997049 #210922 11:44:05 server id 8 end_log_pos 613997697 CRC32 0x1132b4ad Update_rows: table id 109 flags: STMT_END_F ### UPDATE `Schema_g2`.`usr` ....... #210922 11:44:04 server id 8 end_log_pos 613997872 CRC32 0xf41297bc Query thread_id=37823889 exec_time=0 error_code=0 SET TIMESTAMP=1632300244/*!*/; BEGIN /*!*/; # at 613997872 #210922 11:44:04 server id 8 end_log_pos 613997943 CRC32 0xe341c435 Table_map: `Schema_g2`.`per_sch` mapped to number 664 # at 613997943 #210922 11:44:04 server id 8 end_log_pos 614006154 CRC32 0xd71595b1 Delete_rows: table id 664
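If the exact position were not known up front, it could also be located from the binary log itself, for example by listing the events around a known position (using the file and an event position from this incident):
-- The Pos/End_log_pos columns are the coordinates that START SLAVE UNTIL expects
SHOW BINLOG EVENTS IN 'PXC-8-bin.001611' FROM 613996753 LIMIT 20;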
Steps to restore
- Start replication on the backup server until just before the accidental DELETE statement.
START SLAVE UNTIL MASTER_LOG_FILE = 'PXC-8-bin.001611', MASTER_LOG_POS = 613996753;
Verify the binlog position and that the replica has caught up to the position above, and verify the new record count on the backup server.
Verify replication is stopped again.
select count(*) from Schema_g2.per_sch;
- Verify that the last id is < 178852.
Use the --where clause if we notice any duplicate rows (the same ids as on PXC-8) on the backup server:
--where="id < 178852"
- Take a backup from the backup server with the below options to avoid dropping the newly added rows.
mysqldump -h backup-server-ip --single-transaction --skip-add-drop-table --no-create-info Schema_g2 per_sch > per_sch_backup.sql
Verify that no DROP/CREATE statements are present in the backup file.
- Restore backup on source PXC-8:
mysql -h Source-server-ip Schema_g2 < per_sch_backup.sql
Verify that it completes OK and that the records are added back on the source:
mysql -h Source-server-ip -e "select count(*) from Schema_g2.per_sch"
Verify that the records with id < 178852, as well as the newly inserted ones, now exist on the source.
- Start replication on the backup server normally.
stop slave; start slave;
Once the records are loaded on the source, they will be replicated to the delayed replica and the other replicas, and the data will be in sync again.
Conclusion
It is always recommended to have a delayed replica in your architecture so you can quickly recover from such data loss scenarios.
29
2022
How to Benchmark Replication Performance in MySQL
In this blog, I will cover the important aspects you need to test when benchmarking a replication setup. MySQL has great tools that can be used to test its performance. They include:
sysbench – https://github.com/akopytov/sysbench
BMK-kit – http://dimitrik.free.fr/blog/posts/mysql-perf-bmk-kit.html
mysqlslap – https://dev.mysql.com/doc/refman/8.0/en/mysqlslap.html
LinkBench – https://github.com/facebookarchive/linkbench
I will not describe how to use them here, as you can find instructions on the provided links or in the Percona blog by browsing tags sysbench, benchmark, benchmarks, and the category benchmarks.
However, most of the instructions take into account standalone MySQL servers. If you want to test how your replication setup performs, you may want to perform additional checks.
Can the replica catch up to the source server?
To answer this question, you need to run a load on the source machine and monitor the replica delay. You can do it by examining the SHOW REPLICA STATUS (SHOW SLAVE STATUS) output, particularly the value of the Seconds_Behind_Source (Seconds_Behind_Master) parameter. The same parameter is monitored by the “MySQL Replication Lag” graph under the “MySQL Replication Summary” dashboard in Percona Monitoring and Management (PMM).
The more advanced check would be to compare the Exec_Source_Log_Pos (Exec_Master_Log_Pos), Relay_Log_File pair with the Read_Source_Log_Pos (Read_Master_Log_Pos), Source_Log_File pair, since Seconds_Behind_Source may be affected by long-running commands and return wrong values.
You may tune your replica server by adding parallelization if you see increasing lag. Also check how the binlog_transaction_dependency_tracking option affects parallel applier performance in your setup.
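A hedged sketch of the usual knobs (variable names per MySQL 8.0.26+; older versions use the slave_* equivalents):
-- On the source: generate dependency information that allows more parallelism on replicas
SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';
-- On the replica: apply transactions with several workers while preserving commit order
STOP REPLICA;
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL replica_parallel_workers = 8;
SET GLOBAL replica_preserve_commit_order = ON;
START REPLICA;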
Can the replica run queries while applying updates from the source server?
Once you are happy with the replication lag, you may start adding the load you plan to run on this replica and test its performance. For example, if you plan to use the replica for handling read-only queries, you can run a read-only benchmark on it. You need to watch the performance of the read-only benchmark and replication lag.
If you plan to run analytical queries on the replica: test them. If you plan to use this replica for taking backups: take the backup and measure how it will affect the performance of the replication.
Synchronous replication
Synchronous replication solutions such as Galera, Percona XtraDB Cluster (PXC), and Group Replication are affected by the flow control paused time. You can use PMM to monitor it.
For PXC/Galera clusters, watch graph “Flow Control Paused Time” under “PXC/Galera Cluster Summary” dashboard.
Or you can watch the value of the wsrep_flow_control_paused variable. See Galera Flow Control in Percona XtraDB Cluster for MySQL for more details.
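On a PXC/Galera node, the same value can be read from the status counters, for example:
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';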
For Group Replication, you need to check the value of the COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE column in the performance_schema.replication_group_member_stats table. It shows how many transactions are waiting in the queue on the secondary node to be applied.
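A direct query looks like this:
SELECT MEMBER_ID, COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
FROM performance_schema.replication_group_member_stats;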
You may also watch the “Replication Lag” graph under the “MySQL Group Replication” dashboard in PMM.
Your best test is your production
As said by Dimitri Kravtchuk at Fosdem 2020:
While it is hard to create such a benchmark on a server that accepts writes from users, for the replica it is comparatively easy. All you need to do is create a replica of your production server and not make it available to the application yet. First, watch if it can handle the replication lag while idle, then start adding load to it. You may also use the mirroring feature of ProxySQL. Note, however, that you need to check whether some of the limitations described in this post still apply.
How fast will the replica catch up?
One more thing you may want to test is how fast the replica will recover after a delay in replicating updates from the source server. To perform this test, stop the replica while running the source server’s load, restart it after a certain amount of time, and watch replication lag. If you are testing a replica that serves read-only load, let the load test continue.
It is better to perform stop, wait, and start in one series of commands, so you will have control over how long the replica was stopped. For example, to stop replica for one hour, use the following commands:
STOP REPLICA; SELECT SLEEP(3600); START REPLICA;
Then wait until the replica is running:
SELECT SERVICE_STATE FROM performance_schema.replication_applier_status;
Once this query returns ON, start monitoring the value of Seconds_Behind_Source in the SHOW REPLICA STATUS output until it becomes 0. Once it happens, record the time. This way, you will know how much time is needed to apply one hour of changes.
Conclusion
You can use the same benchmarking tools that are recommended for standalone MySQL servers to perform performance tests against replication setups. You only need to monitor replication lag together with other performance characteristics.