Apr
12
2021
--

Replay the Execution of MySQL With RR (Record and Replay)

MySql Record and Replay

MySql Record and ReplayChasing bugs can be a tedious task, and multi-threaded software doesn’t make it any easier. Threads will be scheduled at different times, instructions will not have deterministic results, and in order for one to reproduce a particular issue, it might require the exact same threads, doing the exact same work, at the exact same time. As you can imagine, this is not straightforward.

Let’s say your database is crashing or even having a transient stall.  By the time you get to it, the crash has happened and you are stuck restoring service quickly and doing after-the-fact forensics.  Wouldn’t it be nice to replay the work from right before or during the crash and see exactly what was happening?

Record and Replay is a technique where we record the execution of a program allowing it to be replayed over and over producing the same result. Engineers at Mozilla have created RR, and basically, this open source tool allows you to record the execution of the software and replay it under the well-known GDB.

A Backup Problem

To demonstrate how powerful the tool is, we will be walking through how we used it to narrow down the issue from PXB-2180 (Special thanks to Satya Bodapati, who helped with all the InnoDB internals research for this bug). 

In summary, we were seeing Percona XtraBackup crashing at the prepare stage (remember, always test your backup!). The crash was happening randomly, sometimes after the second incremental, sometimes after the 10th incremental, with no visible pattern.

The stack trace was also not always the same. It was crashing on different parts of InnoDB, but here we had one commonality from all crashes – it always happened while trying to apply a redo log record to the same block page and space id:

#12 0x00000000015ad05f in recv_parse_or_apply_log_rec_body (type=MLOG_COMP_REC_INSERT, ptr=0x7f2849150556 "\003K4G", '\377' <repeats 13 times>, end_ptr=0x7f2849150573 "", space_id=<optimized out>, page_no=<optimized out>, block=0x7f2847d7da00, mtr=0x7f286857b4f0, parsed_bytes=18446744073709551615) at /home/marcelo.altmann/percona-xtrabackup/storage/innobase/log/log0recv.cc:2002
2002         ptr = page_cur_parse_insert_rec(FALSE, ptr, end_ptr, block, index, mtr);
(gdb) p block->page->id
+p block->page->id
$3 = {
  m_space = 4294967294,
  m_page_no = 5
}

Our suspicion was that the page layout on this block diverged between MySQL and XtraBackup. When working with these types of bugs, the crash is always the consequence of something that happened earlier, eg.: a crash on the sixth incremental backup could be the consequence of an issue that happened on the fourth incremental. 

The main goal at this step is to prove and identify where the page layout has diverted.

With this information, we ran MySQL under RR and reran the backup until we saw the same issue at prepare. We can now replay the MySQL execution and check how it compares. Our idea is to:

  1. Read the LSNs for this same page before/after each backup prepare.
  2. Identify all changes to  m_space = 4294967294 & m_page_no = 5 at mysqld.

Before we progress further, let’s explain a few things:

  1. m_space = 4294967294 correspond to the MySQL data dictionary (mysql.ibd) – dict0dict.h:1146
  2. On disk page, LSN is stored at the 16th byte of the page and has a size of 8 bytes – fil0types.h:66
  3. Pages are written sequentially to disk, as an example, for the default 16k page size, from bytes 1 to 16384 will have the data for page 0, from byte 16385 to 32768 data from page 1, and so on. 
  4. Frame is raw data of a page – buf0buf.h:1358

Replaying the Execution

To start, let’s read what LSN we have on mysql.ibd for page five before the backup. We will be using od (check man od for more information) and the information explained above:

$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 10 21 85
0240030

And check if it matches an LSN stamp from mysqld. For that we will add a conditional breakpoint on the replay execution of MySQL at function buf_flush_note_modification:

$ rr replay .
. . .
(rr) b buf_flush_note_modification if block->page->id->m_space == 4294967294 && block->page->id->m_page_no == 5
+b buf_flush_note_modification if block->page->id->m_space == 4294967294 && block->page->id->m_page_no == 5
Breakpoint 1 at 0x495beb1: file /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic, line 69.
(rr) c
[Switching to Thread 18839.18868]

Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=17892965, end_lsn=17893015, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$1 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x10,
  [0x6] = 0x21,
  [0x7] = 0x85}
(rr)

We can see the LSN stamp from before the preparation of full backup and the first stamp from the replay session match. Time to prepare the backup, advance the replay execution, and recheck:

xtrabackup --prepare --apply-log-only --target-dir=full/
. . .
Shutdown completed; log sequence number 17897577
Number of pools: 1
210402 17:46:29 completed OK!


$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 11 07 06
0240030


(rr) c
+c
Continuing.
[Switching to Thread 18839.18868]

Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=19077332, end_lsn=19077382, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$16 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x11,
  [0x6] = 0x7,
  [0x7] = 0x6}
(rr)

Same LSN stamp on both, server and backup. Time to move on and start to apply the incrementals:

xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc1/
. . .
Shutdown completed; log sequence number 19082430
. . .
210402 18:12:20 completed OK!


$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 23 19 06
0240030


(rr) c
+c
Continuing.
Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=20262758, end_lsn=20262808, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$17 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x23,
  [0x6] = 0x19,
  [0x7] = 0x6}
(rr)

Once again, we have a matching LSN stamp on both sides. Moving to the next incremental:

xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc2/
. . .
Shutdown completed; log sequence number 20269669
. . .
210402 18:15:04 completed OK!


$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 35 2f 98
0240030


(rr) c
+c
Continuing.

Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=21449997, end_lsn=21450047, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$18 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x35,
  [0x6] = 0x2f,
  [0x7] = 0x98}
(rr)

Incremental two applied and matching LSN stamp from mysqld. Let’s keep doing this until we find a mismatch:

xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc3/
. . .
Shutdown completed; log sequence number 21455916
. . .
210402 18:18:25 completed OK!


$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 47 4d 3f
0240030


(rr) c
+c
Continuing.

Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=25529471, end_lsn=25529521, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$19 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x47,
  [0x6] = 0x4d,
  [0x7] = 0x3f}
(rr)


xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc4/
. . .
Shutdown completed; log sequence number 23044902
. . .
210402 18:24:00 completed OK!

$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 5f a3 26
0240030


(rr) c
+c
Continuing.

Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=27218464, end_lsn=27218532, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$242 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x9f,
  [0x6] = 0x3f,
  [0x7] = 0xc9}
(rr)

Ok, here we have something. Backup files jumped from 0x01474d3f to 0x015fa326  when applying incremental four while the server moved from 0x01474d3f to 0x019f3fc9 . Perhaps we missed some other place where we can update the LSN stamp of a page? But now, we are at a point in the future with our replay execution of the MySQL server.

Replaying the Execution Backward

Here is (yet) another very cool feature from RR, it allows you to replay the execution backward. To eliminate the possibility of missing a place that is also updating the LSN of this block, let’s add a hardware watchpoint on the block->frame memory address and reverse the execution:

(rr) p block->frame
+p block->frame
$243 = (unsigned char *) 0x7fd2e0758000 "\327\064X["
(rr) watch *(unsigned char *) 0x7fd2e0758000
+watch *(unsigned char *) 0x7fd2e0758000
Hardware watchpoint 2: *(unsigned char *) 0x7fd2e0758000
(rr) disa 1
+disa 1
(rr) reverse-cont
+reverse-cont
+continue
Continuing.
Hardware watchpoint 2: *(unsigned char *) 0x7fd2e0758000

Old value = 215 '\327'
New value = 80 'P'

0x0000000004c13903 in mach_write_to_4 (b=0x7fd2e0758000 "P\257\"\347", n=3610531931) at /home/marcelo.altmann/percona-server/storage/innobase/include/mach0data.ic:135
135   b[0] = static_cast<byte>(n >> 24);
++rr-set-suppress-run-hook 1
++rr-set-suppress-run-hook 1
(rr) p/x buf_flush_init_for_writing::block->frame[16]@8
+p/x buf_flush_init_for_writing::block->frame[16]@8
$11 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x9f,
  [0x6] = 0x3f,
  [0x7] = 0xc9}
(rr) reverse-cont
+reverse-cont
+continue
Continuing.
Hardware watchpoint 2: *(unsigned char *) 0x7fd2e0758000

Old value = 80 'P'
New value = 43 '+'
0x0000000004c13903 in mach_write_to_4 (b=0x7fd2e0758000 "+k*\304", n=1353655015) at /home/marcelo.altmann/percona-server/storage/innobase/include/mach0data.ic:135
135   b[0] = static_cast<byte>(n >> 24);
++rr-set-suppress-run-hook 1
++rr-set-suppress-run-hook 1
(rr) p/x buf_flush_init_for_writing::block->frame[16]@8
+p/x buf_flush_init_for_writing::block->frame[16]@8
$12 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x47,
  [0x6] = 0x4d,
  [0x7] = 0x3f}
(rr)

By replaying the execution backward we can see that indeed the server changed the LSN  from 0x01474d3f to 0x019f3fc9. This confirms the issue is at incremental backup four as the LSN 0x015fa326 we see at end of incremental four was never a valid LSN at the server execution.

Root Cause

Now that we have limited the scope from six backups to a single one, things will become easier.

If we look closely at the log messages from the –prepare of the backup we can see that LSN of mysql.ibd matches the LSN stamp at the end of the backup:

xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc4/
. . .
Shutdown completed; log sequence number 23044902
. . .
210402 18:24:00 completed OK!

$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 5f a3 26
0240030


$ echo $(( 16#015fa326 ))
23044902

By checking the stack trace of the issue and examining further the block we have parsed we can see that this is innodb_dynamic_metadata index:

(gdb) f 13
+f 13
#13 0x00000000015af3dd in recv_recover_page_func (just_read_in=just_read_in@entry=true, block=block@entry=0x7f59efd7da00) at /home/marcelo.altmann/percona-xtrabackup/storage/innobase/log/log0recv.cc:2624
2624       recv_parse_or_apply_log_rec_body(recv->type, buf, buf + recv->len,
(gdb) p/x block->frame[66]@8
+p/x block->frame[66]@8
$4 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x0,
  [0x5] = 0x0,
  [0x6] = 0x0,
  [0x7] = 0x2}

You might be wondering where 66 came from; this is from examining position FIL_PAGE_DATA + PAGE_INDEX_ID. That gave us index ID 2. This is below 1024, which is reserved for Data Dictionary tables. By checking what is the second table on that list, we can see that it’s innodb_dynamic_metadata. With all this information summed up we can look at what the server does at shutdown, and it becomes clear what the issue is:
srv0start.cc:3965

/** Shut down the InnoDB database. */
void srv_shutdown() {
  . . .

  /* Write dynamic metadata to DD buffer table. */
  dict_persist_to_dd_table_buffer();
. . .
}

As part of the shutdown process, we are persisting dirty metadata back to the DD Buffer table (innodb_dynamic_metadata), which is wrong. Those changes will likely be persisted by the server and redo logged once the server performs a checkpoint. Also, more data can be merged together by the point of when the backup was taken and when the server itself persists this data to DD Tables. This is a result of the implementation of WL#7816 and WL#6204 which required Percona XtraBackup to change how it handles these types of redo records.

Summary

In this blog, we walked through the process of analyzing a real Percona XtraBackup bug. This bug exposes a challenge we face in various types of bugs, where the crash/malfunction is a consequence of something that happened way before, and by the time we have a stack trace/coredump, it is too late to perform a proper analysis. Record and Replay enabled us to consistently replay the execution of the source server, making it possible to narrow down the issue to where the root cause was. 


Percona XtraBackup is a free, open source, complete database backup solution for all versions of Percona Server for MySQL and MySQL

Apr
02
2021
--

Percona XtraBackup Point-In-Time Recovery for the Single Database

Percona XtraBackup Point-In-Time Recovery

Percona XtraBackup Point-In-Time RecoveryRecovering to a particular time in the past is called Point-In-Time Recovery (PITR). With PITR you can rollback unwanted DELETE without WHERE clause or any other harmful command.

PITR with Percona XtraBackup is pretty straightforward and perfectly described in the user manual. You need to restore the data from the backup, then apply all binary logs created or updated after the backup was taken, but skip harmful event(s).

However, if your data set is large you may want to recover only the affected database or table. This is possible but you need to be smart when filtering events from the binary log. In this post, I will show how to perform such a partial recovery using Percona XtraBackup, mysql command-line client, and mysqlbinlog programs only. There is an alternative approach that involves creating a fake source server, that is described in MySQL Point in Time Recovery the Right Way. You may consider it, especially if you need to apply changes to a single table.

Percona XtraBackup Point-In-Time Recovery

For our example we will create data first, then run DROP and DELETE commands on two different tables. Then we will rollback these commands.

First, let’s assume we have a server with two databases: test and sbtest. We are using GTIDs and row-based binary log format. We also run the server with the option innodb_file_per_table=1 and all our InnoDB tables use individual tablespaces. Otherwise, the individual restore method would not work.

mysql> show tables from sbtest;
+------------------+
| Tables_in_sbtest |
+------------------+
| sbtest1          |
| sbtest2          |
| sbtest3          |
| sbtest4          |
| sbtest5          |
| sbtest6          |
| sbtest7          |
| sbtest8          |
+------------------+
8 rows in set (0.00 sec)

mysql> show tables from test;
+----------------+
| Tables_in_test |
+----------------+
| bar            |
| baz            |
| foo            |
+----------------+
3 rows in set (0.00 sec)

We will experiment with tables foo and bar. We assume that at the time of our first backup, each of the tables contained five rows. Tables in the database sbtest also contain data, but it does not really matter for our experiment.

mysql> select count(*) from foo;
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from bar;
+----------+
| count(*) |
+----------+
| 5        |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from baz;
+----------+
| count(*) |
+----------+
| 0        |
+----------+
1 row in set (0.00 sec)

Since we want to restore individual tables, we need to make a preparation before taking a backup: store database structure. We will do it with help of the mysqldump command. In this example, I store structure per database to make partial PITR easier, but you are free to use the option --all-databases.

mysqldump --no-data --set-gtid-purged=OFF --triggers --routines --events test > test_structure.sql
mysqldump --no-data --set-gtid-purged=OFF --triggers --routines --events sbtest > sbtest_structure.sql

Then we are ready to take the backup.

xtrabackup --parallel=8 --target-dir=./full_backup --backup

I am using the option --parallel to speed up the backup process.

Now let’s do some testing. First, let’s update rows in the table foo.

mysql> update foo set f1=f1*2;
Query OK, 5 rows affected (0.01 sec)
Rows matched: 5 Changed: 5 Warnings: 0

mysql> select * from foo;
+----+------+
| id | f1   |
+----+------+
|  1 |    2 |
|  2 |    4 |
|  3 |    6 |
|  4 |    8 |
|  5 |   10 |
+----+------+
5 rows in set (0.00 sec)

And then drop it and delete all rows from the table bar.

mysql> drop table foo;
Query OK, 0 rows affected (0.02 sec)

mysql> delete from bar;
Query OK, 5 rows affected (0.01 sec)

Finally, let’s insert a few rows into the tables bar and baz.

mysql> insert into bar(f1) values(6),(7),(8),(9),(10);
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0

mysql> insert into baz(f1) values(1),(2),(3),(4),(5);
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0

Assume that the DROP TABLE and DELETE command was an accident and we want to restore the state of the tables foo and bar as they were before these unlucky statements.

First, we need to prepare the backup.

Since we are interested in restoring only tables in the database test we need to prepare the backup with a special option --export that exports tablespaces in a way that they could be later imported:

xtrabackup --prepare --export --target-dir=./full_backup

Now the directory for the database test contains not only table definition files (.frm, only before 8.0) and tablespace files (.ibd) but also configuration files (.cfg).

Since we want all changes that happened after backup and before the problematic DROP TABLE and DELETE statements were applied, we need to identify which binary log and position were actual at the backup time. We can find it in the xtrabackup_binlog_info file:

$ cat full_backup/xtrabackup_binlog_info
master-bin.000004 1601 0ec00eed-87f3-11eb-acd9-98af65266957:1-56

Now we are ready to perform restore.

First, let’s restore the table foo from the backup. Restoring individual tablespaces requires the ALTER TABLE ... IMPORT TABLESPACE command. This command assumes that the table exists in the server. However, in our case, it was dropped and therefore we need to re-create it.

We will recreate the full database test from the file test_structure.sql

Since we do not want these administrative tasks to be re-applied, I suggest disabling binary logging for the session which will recreate the database structure.

mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

mysql> source test_structure.sql
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)
....

Once tables are recreated discard their tablespaces. I will show an example for the table foo. Adjust the code for the rest of the tables.

mysql> alter table foo discard tablespace;
Query OK, 0 rows affected (0.01 sec)

Then, in another terminal, copy the tablespace and configuration files from the backup to the database directory:

cp full_backup/test/foo.{ibd,cfg} var/mysqld.1/data/test/

And, finally, import the tablespace:

mysql> alter table foo import tablespace;
Query OK, 0 rows affected (0.05 sec)

Repeat for the other tables in the database test.

Now you can enable binary logging back.

You can do the same task in a script. For example:

for table in `mysql test --skip-column-names --silent -e "show tables"`
> do
>   mysql test -e "set sql_log_bin=0; alter table $table discard tablespace"
>   cp full_backup/test/$table.{ibd,cfg} var/mysqld.1/data/test/
>   mysql test -e "set sql_log_bin=0; alter table $table import tablespace"
> done

Our tables are recovered but do not have the updates made after the backup.

mysql> select * from foo;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
+----+------+
5 rows in set (0.00 sec)

mysql> select * from bar;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
+----+------+
5 rows in set (0.00 sec)

mysql> select * from baz;
Empty set (0.00 sec)

Therefore, we need to restore data from the binary logs.

To do it we first need to identify the GTID of the disaster event. It can be done if we dump all binary logs updated and created after backup into a dump file and then search for the DROP TABLE and DELETE statements and skipping them.

First, let’s check which binary logs do we have.

mysql> show binary logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| master-bin.000001 |   1527476 |
| master-bin.000002 |      3035 |
| master-bin.000003 |      1987 |
| master-bin.000004 |      2466 |
| master-bin.000005 |       784 |
+-------------------+-----------+
5 rows in set (0.00 sec)

So we need to parse them, starting from the log master-bin.000004 and position 1601:

mysqlbinlog --start-position=1601 -vvv --base64-output=decode-rows --database=test master-bin.000004 master-bin.000005 > binlog_test.sql

I used options -vvv that prints SQL representation of row events, so we can find the one which we want to skip and --base64-output=decode-rows to not print row events at all. We will not use this file for the restore, only for searching the DROP TABLE and DELETE events.

Here they are, at the positions 2007 and 2123, with GTID 0ec00eed-87f3-11eb-acd9-98af65266957:58 and 0ec00eed-87f3-11eb-acd9-98af65266957:59

# at 2007
#210321 13:29:58 server id 1 end_log_pos 2123 CRC32 0xd1eb9854 Query thread_id=138 exec_time=0 error_code=0
use `test`/*!*/;
SET TIMESTAMP=1616322598/*!*/;
DROP TABLE `foo` /* generated by server */
/*!*/;
# at 2123
#210321 13:30:08 server id 1 end_log_pos 2188 CRC32 0xfc9b2088 GTID last_committed=7 sequence_number=8 rbr_only=yes original_committed_timestamp=0 immediate_commit_timestamp=0 transaction_length=0
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=0 (1970-01-01 02:00:00.000000 EET)
# immediate_commit_timestamp=0 (1970-01-01 02:00:00.000000 EET)
/*!80001 SET @@session.original_commit_timestamp=0*//*!*/;
/*!80014 SET @@session.original_server_version=0*//*!*/;
/*!80014 SET @@session.immediate_server_version=0*//*!*/;
SET @@SESSION.GTID_NEXT= '0ec00eed-87f3-11eb-acd9-98af65266957:59'/*!*/;
# at 2188
#210321 13:30:08 server id 1 end_log_pos 2260 CRC32 0x1d525b11 Query thread_id=138 exec_time=0 error_code=0
SET TIMESTAMP=1616322608/*!*/;
BEGIN
/*!*/;
# at 2260
#210321 13:30:08 server id 1 end_log_pos 2307 CRC32 0xb57ecb73 Table_map: `test`.`bar` mapped to number 226
# at 2307
#210321 13:30:08 server id 1 end_log_pos 2387 CRC32 0x6770a7e2 Delete_rows: table id 226 flags: STMT_END_F
### DELETE FROM `test`.`bar`
### WHERE
### @1=1 /* INT meta=0 nullable=0 is_null=0 */
### @2=1 /* INT meta=0 nullable=1 is_null=0 */
### DELETE FROM `test`.`bar`
### WHERE
...

Note that decoded row event contains a DELETE command for each affected row.

We may also find to which binary log this event belongs if search for the "Rotate to" event. In our case “Rotate to master-bin.000005” happened after the found positions, so we only need file master-bin.000004 In your case, you may need to skip events from the previous log files too.

So to restore the data we need to run mysqlbinlog one more time, this time with parameters:

mysqlbinlog  --start-position=1601 --exclude-gtids=0ec00eed-87f3-11eb-acd9-98af65266957:58-59 --database=test --skip-gtids=true master-bin.000004 master-bin.000005 > binlog_restore.sql

I removed options -vvvbecause we are not going to examine this restore file and option --base64-output=decode-rows because we need row events to present in the resulting file. I also used option --exclude-gtids=0ec00eed-87f3-11eb-acd9-98af65266957:58-59 to exclude GTIDs that we do not want to re-apply. We also need to use --skip-gtids=true because otherwise updates will be skipped since such GTIDs already exist on the server.

Now binlog_restore.sql contains all updates to the database test made after the backup and before the DROP statement. Let’s restore it.

mysql test < binlog_restore.sql

Restore went successfully. Our tables have all past updates.

mysql> select * from foo;
+----+------+
| id | f1   |
+----+------+
|  1 |    2 |
|  2 |    4 |
|  3 |    6 |
|  4 |    8 |
|  5 |   10 |
+----+------+
5 rows in set (0.01 sec)

mysql> select count(*) from bar;
+----------+
| count(*) |
+----------+
|       10 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from baz;
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

Conclusion

You may save the time required for PITR if use the per-database restore method. However, you need to take into account the following considerations:

  • mysqlbinlog does not support filtering per table, therefore you either need to restore the full database or use a fake server method, described in MySQL Point in Time Recovery the Right Way.
  • Per-database filters depend on the USE statement in the statement-based binary log format. Therefore option --database can only be considered safe with a row-based format.
  • If you do not use GTID you still can use this method. You will need to combine options --start-position and --stop-position to skip the event.

Percona XtraBackup is a free, open source database backup solution for Percona Server for MySQL and MySQL.

Jan
27
2021
--

Percona XtraBackup 8 Enables –lock-ddl by Default

Percona XtraBackup 8 Enables --lock-ddl by Default

Percona XtraBackup 8 Enables --lock-ddl by DefaultPercona XtraBackup 8.0.23 enables the “lock-ddl” option by default to ensure any DDL events do not corrupt the backups. Any DML events continue to occur, and only DDL events are blocked.

A DDL lock protects the definition of tables and views. With the “–lock-ddl” option disabled, Percona XtraBackup allows backups while concurrent DDL events continue to happen. These backups are invalid and fail at the Prepare stage. If DDL events have occurred without the backup user’s knowledge, they may be unaware the backup is corrupt.

MySQL 8.0 introduced “Lock Instance for Backup”. This backup lock is lightweight, and the performance for DML operations is almost the same with or without backup locks. This lock is not available in 5.7, because Percona Server for MySQL has its own lightweight backup lock for that version.

Percona XtraBackup 8 also has “Lock Tables for Backup”. An active “Lock Tables for Backup” metadata lock (MDL) blocks all DDL statements and updates to MyISAM, CSV, MEMORY, ARCHIVE, TokuDB, and MyRocks tables. The PERFORMANCE_SCHEMA or PROCESSLIST displays “Waiting for backup log” when this lock is active.

Jan
19
2021
--

The MySQL Clone Wars: Plugin vs. Percona XtraBackup

MySQL Plugin vs. Percona XtraBackup

MySQL Plugin vs. Percona XtraBackupLarge replication topologies are quite common nowadays, and this kind of architecture often requires a quick method to rebuild a replica from another server.

The Clone Plugin, available since MySQL 8.0.17, is a great feature that allows cloning databases out of the box. It is easy to rebuild a replica or to add new nodes to a cluster using the plugin. Before the release of the plugin, the best open-source alternative was Percona XtraBackup for MySQL Databases.

In this blog post, we compare both alternatives for cloning purposes. If you need to perform backups, Percona XtraBackup is a better tool as it supports compression and incremental backups, among other features not provided by the plugin. The plugin supports compression only for network transmission, not for storage.

But one of the plugin’s strong points is simplicity. Once installed and configured, cloning a database is straightforward. Just issuing a command from the destination database is enough.

Percona XtraBackup, on the other side, is a more complex tool. The cloning process involves several stages: backup, stream, write, and prepare. These stages can take place in parallel: we can stream the backup to the new server using netcat and, at the same time, we can write it into the destination directory. The only stage that is sequential is the last one: prepare.

Test Characteristics

We used sysbench to create 200 tables of 124Mb each for a total of 24Gb. Both source and replica virtual machines run 4 cores, 8 Gb RAM, and 60Gb storage. We created the disks on the same datastore.

During the tests, we did not generate additional operations on the database. We measured only the clone process, reducing the benchmark complexity. Otherwise, we would have to take into consideration things like application response time, or the number of transactions executed. This is beyond the scope of this assessment.

We tested different combinations of clone and Percona XtraBackup operations. For XtraBackup, we tested 1 to 4 threads, with and without compression. In the case of compression, we allocated the same number of threads to compression and decompression. For the clone plugin, we tested auto (which lets the server decide how many threads will perform the clone) and 1 to 4 threads. We also tested with and without compression. Finally, we executed all the tests using three different network limits: 500mbps, 1000mbps, and 4000mbps. These make a total of 54 tests, executed 12+1 times each.

All times are in seconds. In the graphs below, lower values are better.

Method

Clone

Out of the required parameters to operate the clone plugin, the following were set up accordingly in the recipient server:

  • clone_max_concurrency=<maximum number of threads> Defines the maximum number of threads used for a remote cloning operation with autotune enabled. Otherwise, this is the exact number of threads that remote cloning uses.
  • clone_autotune_concurrency If enabled the clone operation uses up to clone_max_concurrency threads. The default is 16.
    • clone_enable_compression If enabled, the remote clone operation will use compression.

Percona XtraBackup

To stream the backup we used the xbstream format and sent the data to the remote server using netcat. We applied the following parameters:

  • parallel=<number of threads> Xtrabackup and xbstream parameter that defines the number of threads used for backup and restore operations.
  • rebuild-threads The number of threads used for the rebuild (prepare) operation.
  • decompress_threads and compress_threads Xtrabackup and xbstream parameters that define the number of threads used for compression operations.

Some people use additional parameters like innodb-read-io-threads, innodb-write-io-threads, or innoDB-io-capacity, but these parameters only affect the behavior of InnoDB background threads. They have no impact during backup and restore operations.

Results

Clone

No compression

For the lower bandwidth tests, the number of threads used does not make a difference. Once we increase bandwidth we see that time cuts by half when we move from one thread to two. Going beyond that value improves slightly. Probably we reach the disk i/o limit.

Clone Plugin performance without compression.

The auto option is consistently the fastest one.

Compression

Compression is supposed to improve performance for lower bandwidth connections, but we see that this is not the case. Bandwidth has no impact on execution time and compression makes the clone slower. Again auto gives the best results, equivalent to 4 threads.
clone plugin with compression

Percona XtraBackup

No Compression

Without compression, we see again that the number of threads does not make any difference in the lower bandwidth test. When we increase bandwidth, the number of threads is important, but we quickly reach i/o limits.

Percona Xtrabackup stream without compression

Compression

When using compression, we see that requires less time to complete in almost every case compared with the option without compression, even when bandwidth is not the limit.

Percona Xtrabackup stream with compression

Conclusion

We see that, when using compression, the clone plugin is the slower option while Percona XtraBackup gives great results for all bandwidths. Without compression, the clone plugin is faster when using more than 2 threads. XtraBackup is faster for fewer threads.

Xtrabackup vs. Clone plugin - results summary

Below, we have a chart comparing the worst and best results. As expected, the worst results correspond to one thread executions.

The Clone Plugin is a great option for simplicity. Percona XtraBackup is excellent to save bandwidth and provides better results with fewer threads. With enough threads and bandwidth available, both solutions provide comparable results.

Jan
12
2021
--

MySQL Backup and Recovery Best Practices

MySQL Backup and Recovery Best Practices

MySQL Backup and Recovery Best PracticesIn this blog, we will review all the backup and restore strategies for MySQL, the cornerstones of any application. There are a few options, depending on your topology, MySQL versions, etc. And based on that, there are some questions we need to ask ourselves to make sure we make the right choices.

How many backups we need to keep safe, or what’s the best retention policy for us?

This means the number of backups to safeguard, whether local or remote (external fileserver, cloud). The retention policy can be daily, weekly, or monthly, depending on the free space available.

What is the Recovery Time Objective?

The Recovery Time Objective (RTO) refers to the amount of time that may pass during a disruption before it exceeds the maximum allowable threshold specified in the Business Continuity Plan.

The key question related to RTO is, “How quickly must the data on this system be restored?”

What is the Recovery Point Objective?

The Recovery Point Objective (RPO) is the duration of time and service level within which a business process must be stored after a disaster in order to avoid unacceptable consequences associated with a break in continuity.

The key question related to RPO is, “How much data can we lose?”

Different Types of Backups

There are two backup types: physical and logical.

  • Physical (Percona XtraBackup, RDS/LVM Snapshots, MySQL Enterprise Backup), and also you can use cp or rsync command lines to copy the datadir as long as mysql is down/stopped.
  • Logical (mysqldump, mydumper, mysqlpump, mysql shell only for mysql 8)

Also is recommended to take a copy of binlog files, why? Well, this will help us to recover until the last transaction.

Why are backups needed?

Backups are needed in case of multiple problems:

  • Host Failure: We can get multiple problems from disks stalled or broken disks. Also from cloud services, our DB instance can be broken and it’s non-accessible.
  • Corrupted Data: This can happen on a power outage, MySQL wasn’t able to write correctly and close the file, sometimes when MySQL starts again it cannot start due to corrupted data and the crash recovery process cannot fix it.
  • Inconsistent Data: When a human mistake, delete/update erroneous data over the primary or replica node.
  • DataCenter Failure: power outage or internet provider issues.
  • Legislation/Regulation: provide consistent business value and customer satisfaction.

Now let me explain those different types of backups mentioned above, but before I continue, it’s important to configure a new and dedicated replica node for backups purposes, due to the high CPU load to avoid any issue on any other replica node (AKA backup server).

Logical Backup

This is a dump from logical database structure (CREATE DATABASE, CREATE TABLE statements) and content (INSERT statements). This is recommended to be used against smaller amounts of data. The disadvantage of this method is slower (backup and restore) if you compare it with physical backups. Using mydumper you can backup and restore a single database or a single table if it’s needed, and this is useful to copy some data to a different environment to run tests. Also, mydumper can take a consistent (as long as all the tables are InnoDB engine) backup and provides accurate master and slave log positions.

The output is larger than for physical backup, particularly when saved in text format, but it can be compressed on the fly depending on the software you are using. Mydumper can compress and mysqldump needs to add a pipe to redirect the output to gzip, for example.

Logical backups are used to address data corruption or the need to restore a subset of tables.

Physical (Raw) Backup

In short, this consists of exact copies of database directories and files. This can be a copy for all or a part from MySQL datadir directory. This kind of backup is most used to restore or create a new replica node easily and quickly and is used to address host failure. It’s recommended to restore using the same MySQL version. I recommend using Percona XtraBackup because it can include any related files such as configuration files like cnf config files.

Snapshot Backups

Some file system implementations enable “snapshots” to be taken. These provide logical copies of the file system at a given point in time, without requiring a physical copy of the entire file system. MySQL itself does not provide the capability for taking file system snapshots but it is available using third-party solutions such as LVM or ZFS.

The disadvantage is that sometimes physical backups do not compress much, because data is usually in a binary format and sometimes the table is already compressed.

Binary Log Backups

Binlog backups specifically address RPO. Binary log files contain records of each SQL query executed that made changes.

From MySQL 5.6 on, you can use mysqlbinlog to stream binary logs from a remote server. You can combine binlog backups with Percona XtraBackup or mydumper backup to allow restoration up to the end of the most-recently-backed-up binary log.

Incremental / Differential Backups

An incremental backup is a backup of everything that has changed since the last backup (a binary log backup is a special case of an incremental backup). This is a very good option if the dataset size is huge, as you can take a full backup at the beginning of the week and run incremental backups per day. Also, the backup size is smaller than the full backup.

The main risks associated with incremental backups are:

– A single corrupt incremental backup may invalidate all the others

– Incremental backups typically negatively affect the RTO

For a differential backup, it copies the differences from your last backup, and the advantage is that a lot of data does not change from one backup to the next, so the result can be significantly smaller backups. This saves disk space.

Percona XtraBackup supports both incremental and differential backups.

Offsite Storage

It’s highly recommended to copy all the backup methods to another place, like the cloud or an external file server, so in case of host failure or data center failure, you have another copy.

Not all the backup files need to be uploaded to the cloud, sometimes the time you need to spend in the download is bigger than the time consumed in the recovery process.

A good approach is to keep 1-7 days locally on the backup server in case a fast recovery is needed, and this depends on your business regulations.

Encryption

Backups have sensitive data, so it’s highly recommended to encrypt, especially for offsite storage. This adds more time when you need to restore a backup but it keeps your data safe.

GPG is a good option to encrypt backups, and if you use this option or some other alternative, don’t forget to get a copy of the keys/passphrase. If you lose it, your backups will be useless.

Restore Testing

Depending on your business, it’s highly recommended to test your backups at least once per month. This action validates your backups are not corrupted and it provides critical metrics on recovery time. This process should be automated to get the full backup, restore it, and finally configure this server as a replica from the current primary or another replica. This is good as well to validate that the replication process has no errors.

Many customers are using this methodology to refresh their QA/STG environment to have fresh data from production backups.

In addition to the above, it is recommended to create a manual or automated restore documentation process to keep all the steps together, so in case of disaster, you can follow it without wasting time.

Retention Requirements

Last but not least, it is very important to keep multiple copies of different backup types.

Our best recommendation is:

  • One or two physical backups locally on the backup server (as long as space allows it).
  • Seven daily and four weekly logical backups locally on the backup server.
  • 30 days of binlog backups locally on the backup server.
  • For offsite backups (like S3, Google Cloud, etc.), keep monthly backups for one year or more.

For local backups, keep in mind you will need a minimum of 2.5 times the current dataset size as free disk space to save/meet these retention policies. Don’t forget to encrypt all the backup types!

Legal or regulatory requirements may also dictate how long data must be archived.

Percona Can Help

Percona can help you choose, implement, and optimize the most appropriate MySQL backup and recovery solution for your MySQL ecosystem. If your current solution unexpectedly fails, we can facilitate your recovery with onsite, remote, or emergency consulting services. We can also help you take steps to prevent another occurrence. Every situation is unique and we will work with you to create the most effective solution for your business.

Contact Us

Dec
22
2020
--

Redesign of –lock-ddl-per-table in Percona XtraBackup

Redesign of -lock-ddl-per-table in Percona XtraBackup

Redesign of -lock-ddl-per-table in Percona XtraBackupMySQL 5.7, alongside other many improvements, brought bulk load for creating an index (WL#7277 to be specific), which made ADD INDEX operations much faster by disabling redo logging and making the changes directly to tablespace files. This change requires extra care for backup tools. To block DDL statements on an instance, Percona Server for MySQL implemented LOCK TABLES FOR BACKUP. Percona XtraBackup (PXB) uses this lock for the duration of the backup. This lock does not affect DML statements.

MySQL 5.7 doesn’t have an option to block an instance against DDL and to allow all MDL’s. Thus, Percona XtraBackup has also implemented –lock-ddl-per-table. Before we go into other details, let’s understand how –lock-ddl-per-table works up to now:

  1. PXB starts its work by parsing and copying all redo logs after the checkpoint mark.
  2. Fork a dedicated thread that will keep following new redo log entries.
  3. Get a list of all tablespaces that it will require to copy.
  4. Iterate through the list of tablespaces, for each tablespace, it does the following:
    •  Query INFORMATION_SCHEMA.INNODB_SYS_TABLES or in case of a 8.0 server INFORMATION_SCHEMA.INNODB_TABLES check which table or tables belong to that tablespace ID and take an MDL on the underlying table or tables in case of a shared tablespace.
    • Copy the tablespace .ibd file.

This approach works on the promise that if an MLOG_INDEX_LOAD event (Redo log event generated by bulk load operations to notify backup tools that changes to data files have been omitted from redo log) is encountered by the redo follow thread, it’s safe to continue as tablespaces that we have already copied are protected by MDL and the  MLOG_INDEX_LOAD event is for a tablespace that is yet to be copied.

This promise is not always correct and can lead to inconsistent backup; here are a few examples:

Full-Text Index

Full-Text Index has its own tablespace:

mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE name LIKE '%FTS%';
+----------+----------------------------------------------------+-------+
| table_id | name                                               | space |
+----------+----------------------------------------------------+-------+
|     1169 | test/FTS_000000000000002e_0000000000000508_INDEX_1 |  1157 |
|     1170 | test/FTS_000000000000002e_0000000000000508_INDEX_2 |  1158 |
|     1171 | test/FTS_000000000000002e_0000000000000508_INDEX_3 |  1159 |
|     1172 | test/FTS_000000000000002e_0000000000000508_INDEX_4 |  1160 |
|     1173 | test/FTS_000000000000002e_0000000000000508_INDEX_5 |  1161 |
|     1174 | test/FTS_000000000000002e_0000000000000508_INDEX_6 |  1162 |
|     1175 | test/FTS_000000000000002e_BEING_DELETED            |  1163 |
|     1176 | test/FTS_000000000000002e_BEING_DELETED_CACHE      |  1164 |
|     1177 | test/FTS_000000000000002e_CONFIG                   |  1165 |
|     1178 | test/FTS_000000000000002e_DELETED                  |  1166 |
|     1179 | test/FTS_000000000000002e_DELETED_CACHE            |  1167 |
+----------+----------------------------------------------------+-------+
11 rows in set (0.01 sec)

With the current approach, PXB will try to run a SELECT on FTS_000000000000002e_0000000000000508_INDEX_1 for example, which is not something we can do. Here the underlying table that this FTS belongs to may or may not have been protected by MDL, which can cause the FTS index to be copied without protection.

Full-Text Index has a defined name schema, on the above situation we can easily extract the table ID by translating the first 16 characters after FTS_ from hex to decimal, which would give us the underlying table that the FTS belongs to:

session 1> CREATE FULLTEXT INDEX full_index on joinit2 (s);
session 2> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE table_id >= 1319;
+----------+----------------------------------------------------+-------+
| table_id | name                                               | space |
+----------+----------------------------------------------------+-------+
|     1321 | test/#sql-ib1320-2000853746                        |  1309 |
|     1322 | test/FTS_0000000000000529_00000000000005b6_INDEX_1 |  1310 |
|     1323 | test/FTS_0000000000000529_00000000000005b6_INDEX_2 |  1311 |
|     1324 | test/FTS_0000000000000529_00000000000005b6_INDEX_3 |  1312 |
|     1325 | test/FTS_0000000000000529_00000000000005b6_INDEX_4 |  1313 |
|     1326 | test/FTS_0000000000000529_00000000000005b6_INDEX_5 |  1314 |
|     1327 | test/FTS_0000000000000529_00000000000005b6_INDEX_6 |  1315 |
|     1328 | test/FTS_0000000000000529_BEING_DELETED            |  1316 |
|     1329 | test/FTS_0000000000000529_BEING_DELETED_CACHE      |  1317 |
|     1330 | test/FTS_0000000000000529_CONFIG                   |  1318 |
|     1331 | test/FTS_0000000000000529_DELETED                  |  1319 |
|     1332 | test/FTS_0000000000000529_DELETED_CACHE            |  1320 |
|     1320 | test/joinit2                                       |  1308 |
+----------+----------------------------------------------------+-------+

FTS_0000000000000529 translates to table_id 1321, however, as you can see above, the FTS in question belongs to a temporary table. That is because when FTS is created for the first time, it has to rebuild the table to create the FTS_DOC_ID column. Again, it is not possible to copy the files under MDL protection.

New Table Added in the Middle of the Backup

Because of the per-table MDL acquisition, if a table has been created after PXB has gathered the list of tablespaces, it’s .ibd will not be copied. Instead, it will be recreated as part of the –prepare phase based on the data added to redo logs. As you can imagine, if the changes are redo-skipped after recreating the table based on redo information, the table will be incomplete.

Shared Tablespaces

Once a shared tablespace is parsed by the lock-ddl-per-table function, it will get a list of all tables created on that tablespace and acquire the MDL on those tables, however, there is no tablespace level MDL, which means there is nothing blocking a new table to be created on this tablespace. If the tablespace has already been copied, this will follow the previous point and be recreated at –prepare phase by parsing redo logs.

Best/Worst Case Scenario

The outcome of such inconsistent backups can be unknown. In the best-case scenario, you will get a crash either in the backup/prepare phase or when using the server. Yes, a crash is a best-case scenario because you will notice the issue right away.

In the worst-case scenario, data will be missed without you noticing it. Here instead of explaining let’s reproduce it.

Get a brand-new instance of MySQL/Percona Server for MySQL 5.7 and download Percona XtraBackup 2.4 prior to 2.4.21 (same can be reproducible with MySQL/PSMySQL 8 and PXB8).

In order to reproduce this scenario, we will be using gdb to pause PXB execution of the backup at a certain time, however, one can do the same by having a big table, which will take some time to copy.

gdb xtrabackup ex 'set args --backup --lock-ddl-per-table --target-dir=/tmp/backup' -ex 'b mdl_lock_table' -ex 'r'

This will start the backup. On a separate session, connect to MySQL and execute:

CREATE DATABASE a;
USE a;
CREATE TABLE tb1 (ID INT PRIMARY KEY, name CHAR(1));
INSERT INTO tb1 VALUES (3,'c'), (4, 'd'), (5, 'e');
CREATE INDEX n_index ON tb1(name);

Back to gdb session, execute:

disa 1
c
quit

Your backup will complete. You can now prepare it and restore it on MySQL. Now try to query a.tb1 using the below queries:

USE a;
SELECT * FROM tb1;
SELECT * FROM tb1 FORCE INDEX(PRIMARY);
SELECT * FROM tb1 FORCE INDEX(n_index);
SELECT * FROM tb1 WHERE name = 'd';

Here is an example:

mysql> SELECT * FROM tb1;
Empty set (0.00 sec)

mysql> SELECT * FROM tb1 FORCE INDEX(PRIMARY);
+----+------+
| ID | name |
+----+------+
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM tb1 FORCE INDEX(n_index);
Empty set (0.00 sec)

mysql> SELECT * FROM tb1 WHERE name = 'd';
Empty set (0.00 sec)

As you can see, querying using the PK shows that data is indeed present on the data. The index n_index exists but the corresponding index entries are not present, leading to inconsistent results, which may cause a lot of harm before you notice something is wrong.

Redesign of –lock-ddl-per-table

With the above points in mind, –lock-ddl-per-table has been reworked in order to guarantee a consistent backup. Percona XtraBackup 2.4.21 and Percona XtraBackup 8.0.22 have the below main changes:

  • MDL lock is now performed at the beginning of the backup before we do the first part/catch-up of redo logs.
  • Then the first scan of redo logs happens, and in this first scan, we can still have this MLOG_INDEX_LOAD event recorded in case a CREATE INDEX has happened right before the backup started. For now, it’s still safe to parse and accept it.
  • Once the first scan has finished, the thread responsible for following new redo log events is started and this thread will now abort the backup in case of encountering MLOG_INDEX_LOAD events.
  • Gather the list of tablespaces to copy.
  • Start to copy the actual .ibd files.

Other improvements have also been performed:

  • We now skip trying to run a SELECT if the .ibd file belongs to a temporary table as the SELECT query will never work.
  • Since we are taking MDL before we copy individual files, we are also skipping the lock in case we are dealing with an FTS. For FTS we will eventually (or have already) taken an MDL on the base table, making it safe to skip the work for those files too.
  • The query that takes MDL has been improved to not retrieve any data since a SELECT 1 FROM table LIMIT 0 will be enough for acquiring an MDL lock on the table.
  • –lock-ddl-per-table is a workaround for the changes done in WL#7277 when happening in the middle of a backup. In Percona Server 5.7 we have implemented LOCK TABLES FOR BACKUP (that gets executed with –lock-ddl parameter of PXB) which acquires an instance-level MDL lock. If a user is using –lock-ddl-per-table on Percona Server for MySQL 5.7 a warning is trow advising it should be using –lock-ddl instead.
  • For MySQL 8, upstream has implemented LOCK INSTANCE FOR BACKUP, which works similar to Percona Server LOCK TABLES FOR BACKUP, in the case of Percona XtraBackup 8, –lock-ddl-per-table is been deprecated, it still works but a warning is also issued advising users to switch to –lock-ddl.

Summary

As described throughout this post, WL#7277 brought some real challenges that can affect backups in different ways, some of them not easy to spot. Percona XtraBackup will always favor consistency and has been enhanced to work around those limitations when possible (taking MDL earlier in the process) and aborting the backup when an inconsistency is inevitable.

Percona Server for MySQL users and MySQL 8 should always use –lock-ddl which is a more robust and safe lock for those types of situations.

Oct
23
2020
--

MySQL New Releases and Percona XtraBackup Incompatibilities

MySQL Percona Backup Incompatibilities

MySQL Percona Backup IncompatibilitiesEarlier this week, Oracle released their Q4 releases series. As on the previous releases, backward compatibility has been broken with previous versions of the server. This time on both MySQL 5.7 and 8.0:

MySQL 5.7.32

While our QA team was performing an extensive test on it,  we found out this version introduced a new compression format version. This change breaks backward compatibility with older versions of MySQL, which is expected on the 8.0 series but is not on 5.7. As Percona XtraBackup (PXB) is based on MySQL code, it makes MySQL 5.7.32 incompatible with current versions of Percona XtraBackup 2.4.20 and prior.

The issue does not affect only Percona XtraBackup but also prevents users from downgrading the server from 5.7.32 to any lower version on the 5.7 series – More details at https://bugs.mysql.com/bug.php?id=101266.

In summary, if you have tables with compression flag as below:

CREATE TABLE t1 (c1 INT) COMPRESSION="zlib";

The issue will manifest if a user using 5.7.32:

  • Creates a new compressed table.
  • Runs any ALTER TABLE  that uses the algorithm copy (table rebuild) on a compressed table.

At this moment, we advise users using compressed tables to hold the upgrade to 5.7.32.

We are currently working on making Percona XtraBackup 2.4.21 fully compatible with 5.7.32.

MySQL 8.0.22

Percona XtraBackup 8.0.14 (the latest version available) is not compatible with MySQL 8.0.22 due to disk format changes introduced in the 8.0.22 release.

WL#13782: InnoDB: Add dynamic config option to use fallocate() on Linux introduced a new redo log record MLOG_FILE_EXTEND which is written on the file extension and doesn’t depend on –innodb-extend-and-initialize option. Unfortunately this time, the redo log format version is not bumped up. Percona XtraBackup 8.0.14 during backup, cannot parse this new redo log record and so backup fails.

If by chance, MLOG_FILE_EXTEND is checkpointed, PXB during backup doesn’t see this new record. This leads to a misleading successful backup that cannot be prepared. Let’s see why.

Bug#31587625: PERFORMANCE DEGRADATION AFTER WL14073
This bug fix in 8.0.22, increased the DD version to 8022. PXB during prepare, de-serializes the SDI from IBD file to bootstrap dictionary. Due to the higher DD_VERSION in SDI, PXB 8.0.14 cannot deserialize the SDI and prepare fails.

At this moment, we advise all users to hold the upgrade to 8.0.22.

We are working on these incompatible issues, look out for an upcoming release of PXB release to take successful, consistent backups of 8.0.22

Aug
26
2020
--

Creating an External Replica of AWS Aurora MySQL with Mydumper

Oftentimes, we need to replicate between Amazon Aurora and an external MySQL server. The idea is to start by taking a point-in-time copy of the dataset. Next, we can configure MySQL replication to roll it forward and keep the data up-to-date.

This process is documented by Amazon, however, it relies on the mysqldump method to create the initial copy of the data. If the dataset is in the high GB/TB range, this single-threaded method could take a very long time. Similarly, there are ways to improve the import phase (which can easily take 2x the time of the export).

Let’s explore some tricks to significantly improve the speed of this process.

Preparation Steps

The first step is to enable binary logs in Aurora. Go to the Cluster-level parameter group and make sure binlog_format is set to ROW. There is no log_bin option in Aurora (in case you are wondering), simply setting binlog_format is enough. The change requires a restart of the writer instance, so it, unfortunately, means a few minutes of downtime.

We can check if a server is generating binary logs as follows:

mysql> SHOW MASTER LOGS;

+----------------------------+-----------+
| Log_name                   | File_size |
+----------------------------+-----------+
| mysql-bin-changelog.034148 | 134219307 |
| mysql-bin-changelog.034149 | 134218251 |
...

Otherwise, you will get an error:

ERROR 1381 (HY000): You are not using binary logging

We also need to ensure a proper binary log retention period. For example, if we expect the initial data export/import to take one day, we can set the retention period to something like three days to be on the safe side. This will help ensure we can roll forward the restored data.

mysql> call mysql.rds_set_configuration('binlog retention hours', 72);
Query OK, 0 rows affected (0.27 sec)

mysql> CALL mysql.rds_show_configuration;
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| name                   | value | description                                                                                          |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| binlog retention hours | 72    | binlog retention hours specifies the duration in hours before binary logs are automatically deleted. |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
1 row in set (0.25 sec)

The next step is creating a temporary cluster to take the export. We need to do this for a number of reasons: first to avoid overloading the actual production cluster by our export process, also because mydumper relies on FLUSH TABLES WITH READ LOCK to get a consistent backup, which in Aurora is not possible (due to the lack of SUPER privilege).

Go to the RDS console and restore a snapshot that was created AFTER the date/time where you enabled the binary logs. The restored cluster should also have binlog_format set, so select the correct Cluster parameter group.

Next, capture the binary log position for replication. This is done by inspecting the Recent events section in the console. After highlighting your new temporary writer instance in the console, you should see something like this:

Binlog position from crash recovery is mysql-bin-changelog.034259 32068147

So now we have the information to prepare the CHANGE MASTER command to use at the end of the process.

Exporting the Data

To get the data out of the temporary instance, follow these steps:

  1. Backup the schema
  2. Save the user privileges
  3. Backup the data

This gives us added flexibility; we can do some schema changes, add indexes, or extract only a subset of the data.

Let’s create a configuration file with the login details, for example:

tee /backup/aurora.cnf <<EOF
[client]
user=percona
password=percona
host=percona-tmp.cgutr97lnli6.us-west-1.rds.amazonaws.com
EOF

For the schema backup, use mydumper to do a no-rows export:

mydumper --no-data \
--triggers \
--routines \
--events \
-v 3 \
--no-locks \
--outputdir /backup/schema \
--logfile /backup/mydumper.log \
--regex '^(?!(mysql|test|performance_schema|information_schema|sys))' \
--defaults-file /backup/aurora.cnf

To get the user privileges I normally like to use pt-show-grants. Aurora is, however, hiding the password hashes when you run SHOW GRANTS statement, so pt-show-grants will print incomplete statements e.g.:

mysql> SHOW GRANTS FOR 'user'@'%';
+---------------------------------------------------------+
| Grants for user@%                                       |
+---------------------------------------------------------+
| GRANT USAGE ON *.* TO 'user'@'%' IDENTIFIED BY PASSWORD |
| GRANT SELECT ON `db`.* TO 'user'@'%'                    |
+---------------------------------------------------------+

We can still gather the hashes and replace them manually in the pt-show-grants output if there is a small-ish number of users.

pt-show-grants --user=percona -ppercona -hpercona-tmp.cgutr97lnli6.us-west-1.rds.amazonaws.com  > grants.sql

mysql> select user, password from mysql.user;

Finally, run mydumper to export the data:

mydumper -t 8 \
--compress \
--triggers \
--routines \
--events \
—-rows=10000000 \
-v 3 \
--long-query-guard 999999 \
--no-locks \
--outputdir /backup/export \
--logfile /backup/mydumper.log \
--regex '^(?!(mysql|test|performance_schema|information_schema|sys))' \
-O skip.txt \
--defaults-file /backup/aurora.cnf

The number of threads should match the number of CPUs of the instance running mydumper. In the skip.txt file, you can include any tables that you don’t want to copy. The –rows argument will give you the ability to split tables in chunks of X number of rows. Each chunk can run in parallel, so it is a huge speed bump for big tables.

Importing the Data

We need to stand up a MySQL instance to do the data import. In order to speed up the process as much as possible, I suggest doing a number of optimizations to my.cnf as follows:

[mysqld]
pid-file=/var/run/mysqld/mysqld.pid
log-error=/var/log/mysqld.log
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log_slave_updates
innodb_buffer_pool_size=16G
binlog_format=ROW
innodb_log_file_size=1G
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=0
server-id=1000
log-bin=/log/mysql-bin
sync_binlog=0
master_info_repository=TABLE
relay_log_info_repository=TABLE
query_cache_type=0
query_cache_size=0
innodb_flush_neighbors=0
innodb_io_capacity_max=10000
innodb_stats_on_metadata=off
max_allowed_packet=1G
net_read_timeout=60
performance_schema=off
innodb_adaptive_hash_index=off
expire_logs_days=3
sql_mode=NO_ENGINE_SUBSTITUTION
innodb_doublewrite=off

Note that mydumper is smart enough to turn off the binary log for the importer threads.

After the import is complete, it is important to revert these settings to “safer” values: innodb_doublewriteinnodb_flush_log_at_trx_commit, sync_binlog, and also enable performance_schema again.

The next step is to create an empty schema by running myloader:

myloader \
-d /backup/schema \
-v 3 \
-h localhost \
-u root \
-p percona

At this point, we can easily introduce modifications like adding indexes, since the tables are empty. We can also restore the users at this time:

(echo "SET SQL_LOG_BIN=0;" ; cat grants.sql ) | mysql -uroot -ppercona -f

Now we are ready to restore the actual data using myloader. It is recommended to run this inside a screen session:

myloader -t 4 \
-d /backup/export \
-q 100 \
-v 3 \
-h localhost \
-u root \
-p percona

The rule of thumb here is to use half the number of vCPU threads. I also normally like to reduce mydumper default transaction size (1000) to avoid long transactions, but your mileage may vary.

After the import process is done, we can leverage faster methods (like snapshots or Percona Xtrabackup) to seed any remaining external replicas.

Setting Up Replication

The final step is setting up replication from the actual production cluster (not the temporary one!) to your external instance.

It is a good idea to create a dedicated user for this process in the source instance, as follows:

CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

Now we can start replication, using the binary log coordinates that we captured before:

CHANGE MASTER TO MASTER_HOST='aurora-cluster-gh5s6lnli6.us-west-1.rds.amazonaws.com', MASTER_USER='repl', MASTER_PASSWORD='percona', MASTER_LOG_FILE='mysql-bin-changelog.034259', MASTER_LOG_POS=32068147;
START SLAVE;

Final Words

Unfortunately, there is no quick and easy method to get a large dataset out of an Aurora cluster. We have seen how mydumper and myloader can save a lot of time when creating external replicas, by introducing parallel operations. We also reviewed some good practices and configuration tricks for speeding up the data loading phase as much as possible.


Optimize your database performance with Percona Monitoring and Management, a free, open source database monitoring tool. Designed to work with Amazon RDS MySQL and Amazon Aurora MySQL with a specific dashboard for monitoring Amazon Aurora MySQL using Cloudwatch and direct sampling of MySQL metrics.

Visit the Demo

Aug
20
2020
--

Streaming Percona XtraBackup for MySQL to Multiple Destinations

Percona XtraBackup for MySQL to Multiple Destinations

Percona XtraBackup for MySQL to Multiple DestinationsHave you ever had to provision a large number of instances from a single backup? The most common use case is having to move to new hardware, but there are other scenarios as well. This kind of procedure can involve multiple backup/restore operations which can easily become a pain to administer. Let’s look at a potential way to make it easier using Percona Xtrabackup. The Percona XtraBackup tool provides a method of performing fast and reliable backups of your MySQL data while the system is running.

Leveraging Named Pipes

As per the Linux manual page, a FIFO special file (a named pipe) is similar to a pipe except that it is accessed as part of the filesystem. It can be opened by multiple processes for reading or writing.

For this particular case, we can leverage FIFOs and netcat utility to build a “chain” of streams from one target host to the next.

The idea is we take the backup on the source server and pipe it over the network to the first target. In this target, we create a FIFO that is then piped over the network to the next target. We can then repeat this process until we reach the final target.

Since the FIFO can be read by many processes at the same time, we can use it to restore the backup locally, in addition to piping it over to the next host.

Implementation

In order to perform the following operations, we need the netcat, percona-xtrabackup and qpress packages installed.

Assume we have the following servers:

  • source, target1, target2, target3, target4

We can set up a “chain” of streams as follows:

  • source -> target1 -> target2 -> target3 -> target4

Looking at the representation above, we have to build the chain in reverse order to ensure the “listener” end is started before the “sender” tries to connect. Let’s see what the process looks like:

  1. Create listener on the final node that extracts the stream (e.g. target4):
    nc -l 3306 | xbstream -p 4 -x -C /data/mysql/

    Note: the -p argument specifies the number of worker threads for reading/writing. It should be sized based on the available resources.

  2. Setup the next listener node. On target3:
    # create the fifo
    mkfifo xbackup.fifo 
    
    # forward the fifo to target4
    nc <target4> 3306 < xbackup.fifo & 
    
    # also extract the fifo locally
    nc -l 3306 | tee xbackup.fifo | xbstream -p 4 -x -C /data/mysql/
  3. Repeat step 2 for all the remaining nodes in the chain (minding the order).
    On target 2:

    # create the fifo
    mkfifo xbackup.fifo 
    
    # forward the fifo to target3
    nc <target4> 3306 < xbackup.fifo & 
    
    # also extract the fifo locally
    nc -l 3306 | tee xbackup.fifo | xbstream -p 4 -x -C /data/mysql/

    On target 1:

    # create the fifo
    mkfifo xbackup.fifo 
    
    # forward the fifo to target2
    nc <target4> 3306 < xbackup.fifo & 
    
    # also extract the fifo locally
    nc -l 3306 | tee xbackup.fifo | xbstream -p 4 -x -C /data/mysql/

    Note that we can introduce as many intermediate targets as we need.

  4. Finally, we start the backup on the source, and send it to the first target node:
    xtrabackup --user=root --password=percona --backup --compress --compress-threads=4 --parallel=6 --stream=xbstream --target-dir=/tmp | nc <target1> 3306

    If we got it right, all servers should start populating the target dir.

Wrapping Up

After the backup streaming is done, we need to decompress and recover on each node:

xtrabackup --decompress --remove-original --parallel=8 --target-dir=/data/mysql/ 
xtrabackup --prepare --use-memory=10G --target-dir=/data/mysql

Also, adjust permissions and start the restored server:

chown -R mysql: /data/mysql
service mysql start

Conclusion

We have seen how using named pipes, in combination with netcat, can make our lives easier when having to distribute a single backup across many different target hosts. As a final note, keep in mind that netcat sends the output over the network unencrypted. If transferring over the public internet, it makes sense to use Percona XtraBackup encryption, or replace netcat with ssh.

Aug
18
2020
--

Aligning Percona XtraBackup Versions with Percona Server for MySQL

Aligning Percona XtraBackup Versions with Percona Server for MySQL

Aligning Percona XtraBackup Versions with Percona Server for MySQLWe are excited to let you know about two upcoming changes to Percona XtraBackup which will align Percona XtraBackup Versions with Percona Server for MySQL.  These changes are to bring Percona XtraBackup release naming line with Percona Server and MySQL and ensure Percona XtraBackup executes complete backups.

The first is a change to the naming structure of releases.  This change is something we believe will help when working with Percona products and is designed to position Percona XtraBackup to be in sync with the current release naming strategy of Percona Server for MySQL and Percona XtraDB Cluster.

The current naming structure of releases is x.y.zz where x.y is the upstream major release which is in concert with the Percona XtraBackup release (i.e. 8.0) and zz represents the build.  So 8.0.13 is based on the upstream 8.0 release and is the 13th build by Percona based on 8.0.

The new naming convention will be x.y.aa-zz.b where x.y remain the same – the upstream base (i.e 8.0) and zz will still be the Percona build number.  The aa will be the upstream build number (i.e. 21 in the 8.0.21 release) and the b will be any custom builds created on the base release.

For demonstration purposes here are some examples:

Current Naming Future Naming
8.0.14 8.0.21-14 (where aa would be 21 which is the current upstream release and 14 is the next Percona build based on the upstream release)
8.0.15 8.0.22-15 (where aa would be 22 which is the next upstream release and 15 is the next Percona build)
8.0.15-1 8.0.22-15.1 (where aa would be 22 which is the next upstream release and 15 is the next Percona build and .1 is a custom build based on the 8.0.22-15)

 

We believe it is important to provide advanced notice of this upcoming change so that any required analysis of automated release processing can be done so you are ready when we do make this change.

The second change coming for those who use Percona XtraBackup is a processing change.  With the last two upstream releases have come changes to MySQL which caused Percona XtraBackup processing to be affected.  In order to make sure we are supporting our customers and providing a verified solution we will be implementing changes to Percona XtraBackup as follows:

  1. When a backup is requested a check will be made to ensure the version of Percona XtraBackup is at or above the version of the database being backed up (MySQL, Percona Server for MySQL).  This will be the default configuration.
  2. If the Percona XtraBackup version is lower than the database version, processing will be stopped and Percona XtraBackup will not be allowed to continue UNLESS the default configuration provided by Percona is modified to ignore this check.
  3. Please note if this override is applied our customers will be taking responsibility for the results which can include the appearance that a successful backup has been completed when in fact the backup is actually not viable to be used in a future restoration.

The safest way to ensure your data is backed up and available for restore is to keep the Percona XtraBackup version at or above your database version and implement the new default configuration.

We are providing this information on these upcoming changes so that our customers and the community are aware and can position their environments appropriately.  We intend to make these changes within the next 6 months.  We are also looking for your feedback to ensure that this is considered as we are developing this change.  Feel free to comment below or reach out to us on the Percona XtraBackup Forum.


Our solution brief “Get Up and Running with Percona Server for MySQL” outlines setting up a MySQL® database on-premises using Percona Server for MySQL. It includes failover and basic business continuity components.

Download PDF

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com