How to Quickly Add a Node to an InnoDB Cluster or Group Replication

Quickly Add a Node in InnoDB Cluster or Group Replication

Quickly Add a Node to an InnoDB Cluster or Group Replication

Quickly Add a Node to an InnoDB Cluster or Group Replication (Shutterstock)

In this blog, we’ll look at how to quickly add a node to an InnoDB Cluster or Group Replication using Percona XtraBackup.

Adding nodes to a Group Replication cluster can be easy (documented here), but it only works if the existing nodes have retained all the binary logs since the creation of the cluster. Obviously, this is possible if you create a new cluster from scratch. The nodes rotate old logs after some time, however. Technically, if the


 set is non-empty, it means you will need another method to add a new node to a cluster. You also need a different method if data becomes inconsistent across cluster nodes for any reason. For example, you might hit something similar to this bug, or fall prey to human error.

Hot Backup to the Rescue

The quick and simple method I’ll present here requires the Percona XtraBackup tool to be installed, as well as some additional small tools for convenience. I tested my example on Centos 7, but it works similarly on other Linux distributions. First of all, you will need the Percona repository installed:

# yum install -y -q

Then, install Percona XtraBackup and the additional tools. You might need to enable the EPEL repo for the additional tools and the experimental Percona repo for XtraBackup 8.0 that works with MySQL 8.0. (Note: XtraBackup 8.0 is still not GA when writing this article, and we do NOT recommend or advise that you install XtraBackup 8.0 into a production environment until it is GA). For MySQL 5.7, Xtrabackup 2.4 from the regular repo is what you are looking for:

# grep -A3 percona-experimental-\$basearch /etc/yum.repos.d/percona-release.repo
name = Percona-Experimental YUM repository - $basearch
baseurl =$releasever/RPMS/$basearch
enabled = 1

# yum install pv pigz nmap-ncat percona-xtrabackup-80 -q
 Package                             Arch                 Version                             Repository                                 Size
 nmap-ncat                           x86_64               2:6.40-13.el7                       base                                      205 k
 percona-xtrabackup-80               x86_64               8.0.1-2.alpha2.el7                  percona-experimental-x86_64                13 M
 pigz                                x86_64               2.3.4-1.el7                         epel                                       81 k
 pv                                  x86_64               1.4.6-1.el7                         epel                                       47 k
Installing for dependencies:
 perl-DBD-MySQL                      x86_64               4.023-6.el7                         base                                      140 k
Transaction Summary
Install  4 Packages (+1 Dependent package)
Is this ok [y/d/N]: y

You need to do it on both the source and destination nodes. Now, my existing cluster node (I will call it a donor) – gr01 looks like this:

gr01 > select * from performance_schema.replication_group_members\G
*************************** 1. row ***************************
  CHANNEL_NAME: group_replication_applier
     MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
   MEMBER_HOST: gr01
   MEMBER_PORT: 3306
1 row in set (0.00 sec)
gr01 > show global variables like 'gtid%';
| Variable_name                    | Value                                         |
| gtid_executed                    | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662 |
| gtid_executed_compression_period | 1000                                          |
| gtid_mode                        | ON                                            |
| gtid_owned                       |                                               |
| gtid_purged                      | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-295538 |
5 rows in set (0.01 sec)

The new node candidate (I will call it a joiner) – gr02, has no data but the same MySQL version installed. It also has the required settings in place, like the existing node address in group_replication_group_seeds, etc. The next step is to stop the MySQL service on the joiner (if already running), and wipe out it’s datadir:

[root@gr02 ~]# rm -fr /var/lib/mysql/*

and start the “listener” process, that waits to receive the data snapshot (remember to open the TCP port if you have a firewall):

[root@gr02 ~]# nc -l -p 4444 |pv| unpigz -c | xbstream -x -C /var/lib/mysql

Then, start the backup job on the donor:

[root@gr01 ~]# xtrabackup --user=root --password=*** --backup --parallel=4 --stream=xbstream --target-dir=./ 2> backup.log |pv|pigz -c --fast| nc -w 2 4444
240MiB 0:00:02 [81.4MiB/s] [ <=>

On the joiner side, we will see:

[root@gr02 ~]# nc -l -p 4444 |pv| unpigz -c | xbstream -x -C /var/lib/mysql
21.2MiB 0:03:30 [ 103kiB/s] [ <=> ]
[root@gr02 ~]# du -hs /var/lib/mysql
241M /var/lib/mysql

BTW, if you noticed the difference in transfer rate between the two, please note that on the donor side I put


 before the compressor while in the joiner before decompressor. This way, I can monitor the compression ratio at the same time!

The next step will be to prepare the backup on joiner:

[root@gr02 ~]# xtrabackup --use-memory=1G --prepare --target-dir=/var/lib/mysql 2>prepare.log
[root@gr02 ~]# tail -1 prepare.log
181019 19:18:56 completed OK!

and fix the files ownership:

[root@gr02 ~]# chown -R mysql:mysql /var/lib/mysql

Now we should verify the GTID position information and restart the joiner (I have the


 in my.cnf):

[root@gr02 ~]# cat /var/lib/mysql/xtrabackup_binlog_info
binlog.000023 893 aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662
[root@gr02 ~]# systemctl restart mysqld

Now, let’s check if the position reported by the node is consistent with the above:

gr02 > select @@GLOBAL.gtid_executed;
| @@GLOBAL.gtid_executed                        |
| aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302660 |
1 row in set (0.00 sec)

No, it is not. We have to correct it:

gr02 > reset master; set global gtid_purged="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662";
Query OK, 0 rows affected (0.05 sec)
Query OK, 0 rows affected (0.00 sec)

Finally, start the replication:

Query OK, 0 rows affected (3.91 sec)

Let’s check the cluster status again:

gr01 > select * from performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: a60a4124-d3d4-11e8-8ef2-525400cae48b
2 rows in set (0.00 sec)
gr01 > select * from performance_schema.replication_group_member_stats\G
*************************** 1. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 15399708149765074:4
                                 MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
                  COUNT_CONFLICTS_DETECTED: 0
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302666
            LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:302665
*************************** 2. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 15399708149765074:4
                                 MEMBER_ID: a60a4124-d3d4-11e8-8ef2-525400cae48b
                  COUNT_CONFLICTS_DETECTED: 0
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302666
2 rows in set (0.00 sec)

OK, our cluster is consistent! The new node joined successfully as secondary. We can proceed to add more nodes!


The Importance of mysqlbinlog –version

Importance of MySQL binlog version

Importance of MySQL binlog versionWhen deciding on your backup strategy, one of the key components for Point In Time Recovery (PITR) will be the binary logs. Thankfully, the mysqlbinlog command allows you to easily take binary log backups, including those that would otherwise be encrypted on disk using encrypt_binlog=ON.



  is used with

--raw --read-from-remote-server --stop-never --verify-binlog-checksum

  then it will retrieve binary logs from whichever master it is pointed to, and store them locally on disk in the same format as they were written on the master. Here is an example with the extra arguments that would normally be used:

/usr/bin/mysqlbinlog --raw --read-from-remote-server \
 --stop-never --connection-server-id=1234 \
 --verify-binlog-checksum \
 --host=localhost --port=3306 mysql-bin.000024

This would retrieve the localhost binary logs (starting from mysql-bin.000024) reporting as server_id 1234, verify the checksum and then write each of them to disk.

Changes to the mysqlbinlog source code are relatively infrequent, except for when developing for a new major version, so you may be fooled into thinking that the specific version that you use is not so important—a little like the client version. This is something that is more likely to vary when you are taking remote backups.

Here is the result from the 5.7 branch of mysql-server to show the history of commits by year:

$ git blame --line-porcelain client/ | egrep "^(committer-time|committer-tz)" | cut -f2 -d' ' | while read ct; do read ctz; date --date "Jan 1, 1970 00:00:00 ${ctz} + ${ct} seconds" --utc +%Y; done | sort -n | uniq -c
   105 2000
    52 2001
    52 2002
   103 2003
   390 2004
   113 2005
    58 2006
   129 2007
   595 2008
    53 2009
   349 2010
   151 2011
   382 2012
   191 2013
   308 2014
   404 2015
    27 2016
    42 2017
    15 2018

Since the first GA release of 5.7 (October 2015), there haven’t been too many bugs and so if you aren’t using new features then you may think that it is OK to keep using the same version as before:

$ git log --regexp-ignore-case --grep bug --since="2015-10-19" --oneline client/
1ffd7965a5e Bug#27558169 BACKPORT TO 5.7 BUG #26826272: REMOVE GCC 8 WARNINGS [noclose]
17c92835bb3 Bug #24674276 MYSQLBINLOG -R --HEXDUMP CRASHES FOR INTVAR,                   USER_VAR, OR RAND EVENTS
052dbd7b079 BUG#26878022 MYSQLBINLOG: ASSERTION `(OLD_MH->M_KEY == KEY) ||              (OLD_MH->M_KEY == 0)' FAILED
543129a577c BUG#26878022 MYSQLBINLOG: ASSERTION `(OLD_MH->M_KEY == KEY) || (OLD_MH->M_KEY == 0)' FAILED
ba1a99c5cd7 Bug#26825211 BACKPORT FIX FOR #25643811 TO 5.7
567bb732bc0 Bug#22932576 MYSQL5.6 DOES NOT BUILD ON SOLARIS12
efc42d99469 Bug#22932576 MYSQL5.6 DOES NOT BUILD ON SOLARIS12
6772eb52d66 Bug#21697461 MEMORY LEAK IN MYSQLBINLOG

However, this is not always the case and some issues are more obvious than others! To help show this, here are a couple of the issues that you might happen to notice.

Warning: option ‘stop-never-slave-server-id’: unsigned value <xxxxxxxx> adjusted to <yyyyy>

The server_id that is used by a server in a replication topology should always be unique within the topology. One of the easy ways to ensure this is to use a conversion of the external IPv4 address to an integer, such as INET_ATON , which provides you with an unsigned integer.

The introduction of


 (which deprecates


 ) changes the behaviour here (for the better). Prior to this you may experience warnings where your server_id was cast to the equivalent of an UNSIGNED SMALLINT. This didn’t seem to be a reported bug, just fixed as a by-product of the change.

ERROR: Could not find server version: Master reported unrecognized MySQL version ‘xxx’

When running mysqlbinlog, the version of MySQL is checked so that the event format is set correctly. Here is the code from MySQL 5.7:

switch (*version) {
 case '3':
   glob_description_event= new Format_description_log_event(1);
 case '4':
   glob_description_event= new Format_description_log_event(3);
 case '5':
     The server is soon going to send us its Format_description log
     event, unless it is a 5.0 server with 3.23 or 4.0 binlogs.
     So we first assume that this is 4.0 (which is enough to read the
     Format_desc event if one comes).
   glob_description_event= new Format_description_log_event(3);
   glob_description_event= NULL;
   error("Could not find server version: "
         "Master reported unrecognized MySQL version '%s'.", version);
   goto err;

This section of the code last changed in 2008, but of course there is another vendor that no longer uses a 5-prefixed-version number: MariaDB. With MariaDB, it is impossible to take a backup without using a MariaDB version of the program, as you are told that the version is unrecognised. The MariaDB source code contains a change to this section to resolve the issue when the version was bumped to 10:

83c02f32375b client/ (Michael Widenius    2012-05-31 22:39:11 +0300 1900) case 5:
83c02f32375b client/ (Michael Widenius    2012-05-31 22:39:11 +0300 1901) case 10:

Interestingly, MySQL 8.0 gets a little closer to not producing an error (although it still does), but finally sees off those rather old ancestral relatives:

 switch (*version) {
   case '5':
   case '8':
   case '9':
       The server is soon going to send us its Format_description log
     glob_description_event = new Format_description_log_event;
     glob_description_event = NULL;
         "Could not find server version: "
         "Master reported unrecognized MySQL version '%s'.",
     goto err;

These are somewhat trivial examples. In fact, you are more likely to suffer from more serious differences, perhaps ones that do not become immediately apparent, if you are not matching the mysqlbinlog version to the one provided by the version for the server producing the binary logs.

Sadly, it is not so easy to check the versions as the reported version was seemingly left unloved for quite a while (Ver 3.4), so you should check the binary package versions (e.g. using Percona-Server-client-57-5.7.23 with Percona-Server-server-57-5.7.23). Thankfully, the good news is that MySQL 8.0 fixes it!

So reduce the risk and match your package versions!


PostgreSQL Backup Strategy for an Enterprise-Grade Environment

PostgreSQL enterprise-grade backup and recovery

PostgreSQL® logoIn this post we cover the methods used to achieve an enterprise-grade backup strategy for the PostgreSQL cluster. In setting up our demonstration system, we use pg_basebackup and continuous archiving. The size of the database and your database environment—if it is on colo or cloud—matters. It is very important to understand how we can ensure minimalistic or no data loss at the time of disasters using our preferred backup tools.

As discussed in the introductory blog post, in our webinar of October 10, 2018 we highlight important aspects an enterprise should consider for their PostgreSQL environments. In our last blog post we looked at security.

Backing up…

The ability to recover a database to a certain point in time is always the ultimate aim of a backup strategy. Without a recoverable backup, you have no backup! So there are always two parts to a backup strategy: backup AND recovery. The backup tool you use to achieve your aims may vary. Let’s discuss some of the options.


The community version of PostgreSQL comes with a trusted backup tool called pg_basebackup. This utility can help you take a consistent online binary backup that is very fast, can be used for point-in-time-recovery, and also for setting up slaves/replicas.

Archiving and continuous archiving

Postgres does not flush data pages to disk upon every transaction commit. To ensure that PostgreSQL achieves durability and endures crash recovery, changes are written to transactions logs (a.k.a. WALs, Write-Ahead Logs) stored on disk.

WALs in PostgreSQL are similar to transaction log files in the InnoDB storage engine for MySQL. However, WAL files are recycled according to the values set to the parameters wal_keep_segments and max_wal_size. Hence, if WAL files are not copied to a safe location—such as a backup server or another file system—it won’t be possible to achieve point-in-time-recovery (PITR).

In order to archive WAL segments to a safe location, the parameter archive_mode must be set to ‘ON’ and we must pass an appropriate shell command to the archive_command parameter. The shell command can be a copy (cp command in Linux) or a bash script that tells postgres where to archive its WAL files. Administrators commonly use an NFS share or SAN partition for this, mounted locally to facilitate the storage of these files.You may refer to the manual page PostgreSQL Archiving to learn more about archiving.

Demonstration scenario

For our demonstration set up, and taking into account the size of the database and the number of transactions, we use the following backup strategy:

  1. Nightly pg_basebackup that runs every day at 01:00 am.
  2. Continuous archiving of WAL’s to a remote backup server that is redundant and avoids single point of failure.

Using the above strategy, we can ensure that we are storing full backups daily. At the same time, the changes/transactions are backed up using the archiving solution available with PostgreSQL. In this way, we can achieve recovery to any point-back-in-time, depending on our backup retention policies.

Be sure to check out our webinar, Enterprise-grade PostgreSQL Built on Open Source Tools, to see this in practice. If you are reading this in time to join us on October 10th, you’ll have the chance to ask questions, too.

Other options

Depending on your database size, number of transactions and the environment, you may also consider using pgBackRest, Barman or WAL-g. These are stable open source backup solutions that have been contributed and are continuously maintained by community members. They help us to achieve one or more of the following features:

  1. Incremental backups
  2. Differential backups
  3. Features for building standby replicas
  4. Ability to stream backups to another server
  5. Streaming backups to AWS S3 or object store, removing the need to store backups locally or on locally mounted network shares before uploading to the cloud.

Here is a snippet of what is/isn’t possible with each of these backup tools today.

postgresl backup feature comparison




The post PostgreSQL Backup Strategy for an Enterprise-Grade Environment appeared first on Percona Database Performance Blog.


Faster Point In Time Recovery (PITR) in PostgreSQL Using a Delayed Standby

PostgreSQL Point in Time Recovery

PostgreSQL Point in Time RecoveryThe need to recover a database back to a certain point in time can be a nerve-racking task for DBAs and for businesses. Can this be simplified? Could it be made to work faster? Can we recover to a given point in time with zero loss of transactions/records? Fortunately, the answer to these questions is yes. PostgreSQL Point in Time Recovery (PITR) is an important facility. It offers DBAs the ability to restore a PostgreSQL database simply, quickly and without the loss of transactions or data.

In this post, we’ll help you to understand how this can be achieved, and reduce the potential for pain in the event of panic situations where you need to perform a PITR.

Before proceeding further, let us understand what could force us to perform a PITR.

  1. Someone has accidentally dropped or truncated a table.
  2. A failed deployment has made changes to the database that are difficult to reverse.
  3. You accidentally deleted or modified a lot of data, and as a consequence you cannot run your applications.

In such scenarios, you would immediately look for the latest full backup and the relevant transaction logs (aka WALs in PostgreSQL) to recover up to a known point in the past, before the error occurred. But what if your backup is corrupt and not valid?

Well, it is very important to perform a backup and recovery validation to ensure that the backups are always recoverable—we will address this in a future post. But, if the backup that you are looking at is corrupt, that can be a nightmare. One such unlucky incident for GitLab, where there was a backup restoration failure, caused a major outage followed by a data loss after recovery.

Even the best of plans can be hard to realize in practice.

It may be that our backups are intact and recoverable. Can we afford to wait until we copy/download the backup and recover it to another disk or server? What if the database size is several hundreds of GBs or several TBs like GitLab’s?

The solution to the problem is: add another standby that is always delayed by a few hours or a day.

This is one of the great features available in PostgreSQL. If you have migrated from Oracle RDBMS to PostgreSQL, you can think of it as an equivalent to FLASHBACK DATABASE in Oracle. Flashback database helps you to rewind data back in time. However, the technique does not work if you have dropped a data file. In fact, this is the case for both Oracle RDBMS and PostgreSQL PITR. ?

Adding a Delayed Standby in PostgreSQL

It is important that we use features like streaming replication to achieve high availability in PostgreSQL. Most of the environments have 1 master with 1 or more slaves (standby), either in the same data centre or geographically distributed. To save the time needed for PITR, you can add another slave that can always be delayed by a certain amount of time—this could be hours or days.

For example, if I know that my deployment is determined to be successful when no issues are observed in the first 12 hours, then I might delay one of the standbys by 12 hours.

To delay a standby, once you have setup streaming replication between your PostgreSQL master and slave, the following parameter needs to be added to the recovery.conf file of the slave, followed by a restart.

recovery_min_apply_delay = '12h' # or '1min' or 1d'

Now, let’s consider an example where you have inserted 10000 records at 10:27:34 AM and you have accidentally deleted 5000 records at 10:28:43 AM. Let’s say that you have a standby that is delayed by 1 hour. The steps to perform PITR using the delayed standby through until 10:27:34 AM look like this:

Steps to perform PostgreSQL Point in Time Recovery using a delayed standby

Step 1

Stop the slave (delayed standby) immediately, as soon as you have noticed that an accidental change has happened. If you know that the change has been already applied on the slave, then you cannot perform the point in time recovery using this method.

$ pg_ctl -D $PGDATA stop -mf

Step 2

Rename the recovery.conf file in your standby to another name.

$ mv $PGDATA/recovery.conf $PGDATA/recovery.conf.old

Step 3

Create a new recovery.conf file with the required parameters for PITR.

# recovery.conf file always exists in the Data Directory of Slave
recovery_target_time = '2018-06-07 10:27:34 EDT'
restore_command = 'sh /var/lib/pgsql/scripts/ %p %f'
recovery_target_action = 'pause'
recovery_target_inclusive = 'false'


Specifies the timestamp up to which you wish to recover your database.


Shell command that can be used by PostgreSQL to fetch the required Transaction Logs (WALs) for recovery.
PostgreSQL sends the arguments %p (path to WAL file) and % f (WAL file name) to this shell command. These arguments can be used in the script you use to copy your WALs.

Here is an example script for your reference. This example relies on rsync. The script connects to the backup server to fetch the WALs requested by PostgreSQL. (We’ll cover the procedure to archive these WALs in another blog post soon: this could be a good time to subscribe to the Percona blog mailing list!)

$ cat /var/lib/pgsql/scripts/
# Enable passwordless ssh to Backup Server
# $1 is %p substituted by postgres as the path to WAL File
# $2 is the %f substituted by postgres as the WAL File Name
rsync --no-motd -ave ssh ${Backup_Server}:${ArchiveDir}/${wal} ${wal_with_path} >>$LOG 2>&1
if [ "$?" -ne "0" ]
echo "Restore Failed for WAL : $wal" >> $LOG
exit 1


This is the action that needs to be performed after recovering the instance up to the recovery_target_time. Setting this to pause would let you modify the recovery_target_time after recovery, if you need to. You can then replay the transactions at a slow pace until your desired recovery target is reached. For example, you can recover until 2018-06-07 10:26:34 EDT and then modify recovery_target_time to 2018-06-07 10:27:34 EDT when using pause.

When you know that all the data you are looking for has been recovered, you can issue the following command to stop the recovery process, change the timeline and open the database for writes.

select pg_wal_replay_resume();

Other possible settings for this parameter are promote and shutdown. These do not allow you to replay a few more future transactions after the recovery, as you can with pause.


Whether to stop recovery just after the specified recovery_target_time(true) or before(false).

Step 4

Start PostgreSQL using pg_ctl. Now, it should read the parameters in recovery.conf and perform the recovery until the time you set in the recovery_target_time.

$ pg_ctl -D $PGDATA start

Step 5

Here is how the log appears. It says that has performed point-in-time-recovery and has reached a consistent state as requested.

2018-06-07 10:43:22.303 EDT [1423] LOG: starting point-in-time recovery to 2018-06-07 10:27:34-04
2018-06-07 10:43:22.607 EDT [1423] LOG: redo starts at 0/40005B8
2018-06-07 10:43:22.607 EDT [1423] LOG: consistent recovery state reached at 0/40156B0
2018-06-07 10:43:22.608 EDT [1421] LOG: database system is ready to accept read only connections
2018-06-07 10:43:22.626 EDT [1423] LOG: recovery stopping before commit of transaction 570, time 2018-06-07 10:28:59.645685-04
2018-06-07 10:43:22.626 EDT [1423] LOG: recovery has paused
2018-06-07 10:43:22.626 EDT [1423] HINT: Execute pg_wal_replay_resume() to continue.

Step 6

You can now stop recovery and open the database for writes after PITR.

Before executing the next command, you may want to verify that you have got all the desired data by connecting to the database and executing some SQL’s. You can still perform reads before you stop recovery. If you notice that you need another few minutes (or hours) of transactions, then modify the parameter recovery_target_time and go back to step 4. Otherwise, you can stop the recovery by running the following command.

$ psql
select pg_wal_replay_resume();

Summing up

Using PostgreSQL Point in time Recovery is the most simple of procedures that does not involve any effort in identifying the latest backups, transaction logs and space or server to restore in a database emergency. These things happen! Also, it could save a lot of time because the replay of WALs is much faster than rebuilding an entire instance using backups, especially when you have a huge database.

Important post script: I tested and recorded these steps using PostgreSQL 10.4. It is possible with PostgreSQL 9.x versions, however, the parameters could change slightly and you should refer to the PostgreSQL documentation for the correct syntax.

The post Faster Point In Time Recovery (PITR) in PostgreSQL Using a Delayed Standby appeared first on Percona Database Performance Blog.


Webinar Thurs 6/14: MongoDB Backup and Recovery Field Guide

mongodb backup and recovery field guide

mongodb backup and recovery field guidePlease join Percona’s Sr. Technical Operations Architect, Tim Vaillancourt as he presents MongoDB Backup and Recovery Field Guide on Thursday, June 14, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

This talk will cover backup and recovery solutions for MongoDB replica sets and clusters, focusing on online and low-impact solutions for production systems.

Register for the webinar

Tim Vaillancourt

Senior Technical Operations Architect

With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim is based in Amsterdam, NL and enjoys traveling, coding and music. Prior to Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS.

Prior to moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

The post Webinar Thurs 6/14: MongoDB Backup and Recovery Field Guide appeared first on Percona Database Performance Blog.


Flashback: Another Take on Point-In-Time Recovery (PITR) in MySQL/MariaDB/Percona Server

Point-In-Time Recovery

Point-In-Time RecoveryIn this blog post, I’ll look at point-in-time recovery (PITR) options for MySQL, MariaDB and Percona Server for MySQL.

It is a common good practice to extend data safety by having additional measures apart from regular data backups, such as delayed slaves and binary log backups. These two options provide the ability to restore the data to any given point in time, or just revert from some bad accidents. These methods have their limitations of course: delayed slaves only help if a deadly mistake is noticed fast enough, while full point-in-time recovery (PITR) requires the last full backup and binary logs (and therefore usually takes a lot of time).

How to reverse from disaster faster

Alibaba engineers and the MariaDB team implemented an interesting feature in their version of the mysqlbinlog tool: the --flashback option. Based on ROW-based DML events, it can transform the binary log and reverse purposes. That means it can help undo given row changes extremely fast. For instance, it can change DELETE events to INSERTs and vice versa, and it will swap WHERE and SET parts of the UPDATE events. This simple idea can dramatically speed up recovery from certain types of mistakes or disasters.

The question is whether it works with non-MariaDB variants. To verify that, I tested this feature with the latest available Percona Server for MySQL 5.7 (which is fully compatible with upstream MySQL).

master [localhost] {msandbox} ((none)) > select @@version,@@version_comment;
| @@version     | @@version_comment                                      |
| 5.7.21-20-log | Percona Server (GPL), Release 20, Revision ed217b06ca3 |
1 row in set (0.00 sec)

First, let’s simulate one possible deadly scenario: a forgotten WHERE in DELETE statement:

master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
| count(*) |
| 200      |
1 row in set (0.00 sec)
master [localhost] {msandbox} ((none)) > delete from test.sbtest1;
Query OK, 200 rows affected (0.04 sec)
slave1 [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
| count(*) |
| 0        |
1 row in set (0.00 sec

So, our data is lost on both the master and slave!

Let’s start by downloading the latest MariaDB server 10.2.x package, which I’m hoping has a mysqlbinlog tool that works with MySQL 5.7, and unpack it to some custom location:

$ dpkg -x mariadb-server-10.2_10.2.13+maria~wheezy_amd64.deb /opt/maria/
$ /opt/maria/usr/bin/mysqlbinlog --help|grep flash
-B, --flashback Flashback feature can rollback you committed data to a

It has the function we are looking for. Now, we have to find the culprit transaction or set of transactions we want to revert. A simplified example may look like this:

$ mysqlbinlog -v --base64-output=DECODE-ROWS mysql-bin.000002 > mysql-bin.000002.sql
$ less mysql-bin.000002.sql

By searching through the decoded binary log, we are looking for transactions that have wiped out the table test.sbtest1. It looks like this (as the table had 200 rows, it is pretty long, so I’ve pasting only the beginning and the end):

# at 291
#180314 15:30:34 server id 1  end_log_pos 348 CRC32 0x06cd193e  Table_map: `test`.`sbtest1` mapped to number 111
# at 348
#180314 15:30:34 server id 1  end_log_pos 8510 CRC32 0x064634c5         Delete_rows: table id 111
### DELETE FROM `test`.`sbtest1`
###   @1=200
###   @2=101
###   @3='26157116088-21551255803-13077038767-89418462090-07321921109-99464656338-95996554805-68102077806-88247356874-53904987561'
###   @4='51157774706-69740598871-18633441857-39587481216-98251863874'
# at 38323
#180314 15:30:34 server id 1  end_log_pos 38354 CRC32 0x6dbb7127        Xid = 97

It is very important to take the proper start and stop positions. We need the ones exactly after BEGIN and before the final COMMIT. Then, let’s test if the tool produces the reverse statements as expected. First, decode the rows to the .sql file:

$ /opt/maria/usr/bin/mysqlbinlog --flashback -v --base64-output=DECODE-ROWS --start-position=291 --stop-position=38323 mysql-bin.000002 > mysql-bin.000002_flash.sql

Inside, we find 200 of those. Looks good:

### INSERT INTO `test`.`sbtest1`
### SET
### @1=200

Since we verified the positions are correct, we can prepare a binary log file:

$ /opt/maria/usr/bin/mysqlbinlog --flashback --start-position=291 --stop-position=38323 mysql-bin.000002 > mysql-bin.000002_flash.bin

and load it back to our master:

master [localhost] {msandbox} (test) > source mysql-bin.000002_flash.bin
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.04 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
master [localhost] {msandbox} (test) > select count(*) from test.sbtest1;
| count(*) |
| 200      |
1 row in set (0.00 sec)

and double check they restored on slaves:

slave1 [localhost] {msandbox} (test) > select count(*) from test.sbtest1;
| count(*) |
| 200      |
1 row in set (0.00 sec)

GTID problem

MariaDB has a completely different GTID implementation from MySQL and Percona Server. You can expect problems when decoding incompatible GTID enabled binary logs with MariaDB. As MariaDB’s mysqlbinlog does not support –start/stop-gtid options (even for its own implementation), we have to take the usual positions anyway. From a GTID-enabled binary log, for example, delete can look like this:

# at 2300
#180315 9:37:31 server id 1 end_log_pos 2365 CRC32 0x09e4d815 GTID last_committed=1 sequence_number=2 rbr_only=yes
SET @@SESSION.GTID_NEXT= '00020996-1111-1111-1111-111111111111:2'/*!*/;
# at 2365
#180315 9:37:31 server id 1 end_log_pos 2433 CRC32 0xac62a20d Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1521103051/*!*/;
# at 2433
#180315 9:37:31 server id 1 end_log_pos 2490 CRC32 0x275601d6 Table_map: `test`.`sbtest1` mapped to number 108
# at 2490
#180315 9:37:31 server id 1 end_log_pos 10652 CRC32 0xe369e169 Delete_rows: table id 108
# at 42355
#180315 9:37:31 server id 1 end_log_pos 42386 CRC32 0xe01ff558 Xid = 31
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;

The tool seems to work, and transforms the delete transaction to a sequence of INSERTs. However, the server rejects it when we try to load it on a GTID-enabled master:

master [localhost] {msandbox} ((none)) > source mysql-bin.000003.flash
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
| count(*) |
| 0 |
1 row in set (0.00 sec)

Unfortunately, the solution here is either to disable GTID mode for the recovery time (which is surely tricky in replicated clusters), or try to add GTID-related information to the resulting binary log with the

--flashback option

. In my case, adding these lines worked (I used the next free available GTID sequence):

$ diff -u mysql-bin.000003.flash mysql-bin.000003.flash.gtid
--- mysql-bin.000003.flash 2018-03-15 10:20:20.080487998 +0100
+++ mysql-bin.000003.flash.gtid 2018-03-15 10:25:02.909953620 +0100
@@ -4,6 +4,10 @@
#180315 9:32:51 server id 1 end_log_pos 123 CRC32 0x941b189a Start: binlog v 4, server v 5.7.21-20-log created 180315 9:32:51 at startup
+# at 154
+#180315 9:37:05 server id 1 end_log_pos 219 CRC32 0x69e4ce26 GTID last_committed=0 sequence_number=1 rbr_only=yes
+SET @@SESSION.GTID_NEXT= '00020996-1111-1111-1111-111111111111:5'/*!*/;
@@ -724,6 +728,7 @@
+SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
# End of log file
ROLLBACK /* added by mysqlbinlog */;

master [localhost] {msandbox} ((none)) > source mysql-bin.000003.flash.gtid
master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
| count(*) |
| 200      |
1 row in set (0.00 sec


Obviously, flashback cannot help after DROP/TRUNCATE or other DDL commands. These are not transactional, and affected rows are never recorded in the binary log. It doesn’t work with encrypted or compressed binary logs either. But most importantly, to produce complete events that can reverse bad transactions, the binary format must be ROW. The row image also must be FULL:

master [localhost] {msandbox} ((none)) > select @@binlog_format,@@binlog_row_image;
| @@binlog_format | @@binlog_row_image |
| ROW             | FULL               |
1 row in set (0.00 sec)

If these conditions are not met (or if you’re dealing with a too-complicated GTID issue), you will have to follow the standard point-in-time recovery procedure.

The post Flashback: Another Take on Point-In-Time Recovery (PITR) in MySQL/MariaDB/Percona Server appeared first on Percona Database Performance Blog.


Upcoming Webinar Wednesday August 16: Lock, Stock and Backup – Data Guaranteed


BackupJoin Percona’s, Technical Services Manager, Jervin Real as he presents Lock, Stock and Backup: Data Guaranteed on Wednesday, August 16, 2017 at 7:00 am PDT / 10:00 am EDT (UTC-7).

Backups are crucial in a world where data is digital and uptime is revenue. Environments are no longer bound to traditional data centers, and span multiple cloud providers and many heterogeneous environments. We need bulletproof backups and impeccable recovery processes. This talk aims to answer the question “How should I backup my MySQL databases?” by providing 3-2-1 backup designs, best practices and real-world solutions leveraging key technologies, automation techniques and major cloud provider services.

Register for the webinar here.

Jervin RealJervin Real

As Technical Services Manager, Jervin partners with Percona’s customers on building reliable and highly performant MySQL infrastructures while also doing other fun stuff like watching cat videos on the internet. Jervin joined Percona in April 2010. Starting as a PHP programmer, Jervin quickly learned the LAMP stack. He has worked on several high-traffic sites and a number of specialized web applications (such as mobile content distribution). Before joining Percona, Jervin also worked with several hosting companies, providing care for customer hosted services and data on both Linux and Windows.

Dealing with corrupted InnoDB data


MySQLData corruption! It can happen. Maybe because of a bug or storage problem that you didn’t expect, or MySQL crashes when a page checksum’s result is different from what it expected. Either way, corrupted data can and does occur. What do you do then?

Let’s look at the following example and see what can be done when you face this situation.

We have some valuable data:

> select * from t limit 4;
| i | c      |
| 1 | Miguel |
| 2 | Angel  |
| 3 | Miguel |
| 4 | Angel  |
> select count(*) from t;
| count(*) |
|  2097152 |

One day the query you usually run fails and your application stops working. Even worse, it causes the crash already mentioned:

> select * from t where i=2097151;
ERROR 2006 (HY000): MySQL server has gone away

Usually this is the point when panic starts. The error log shows:

2016-01-13 08:01:48 7fbc00133700 InnoDB: uncompressed page, stored checksum in field1 2912050650, calculated checksums for field1: crc32 1490770609, innodb 1549747911, none 3735928559, stored checksum in field2 1670385167, calculated checksums for field2: crc32 1490770609, innodb 2416840536, none 3735928559, page LSN 0 130051648, low 4 bytes of LSN at page end 1476903022, page number (if stored to page already) 4651, space id (if created with >= MySQL-4.1.1 and stored already) 7
InnoDB: Page may be an index page where index id is 22
InnoDB: (index "PRIMARY" of table "test"."t")
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 4651.
InnoDB: You may have to recover from a backup.
InnoDB: It is also possible that your operating
InnoDB: system has corrupted its own file cache
InnoDB: and rebooting your computer removes the
InnoDB: error.
InnoDB: If the corrupt page is an index page
InnoDB: you can also try to fix the corruption
InnoDB: by dumping, dropping, and reimporting
InnoDB: the corrupt table. You can use CHECK
InnoDB: TABLE to scan your table for corruption.
InnoDB: See also
InnoDB: about forcing recovery.
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 4651.
InnoDB: You may have to recover from a backup.
2016-01-13 08:01:48 7fbc00133700 InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex ad925dda0000122b0000122affffffff0000000007c06e4045bf00000000000000000

OK, our database is corrupted and it is printing the page dump in ASCII and hex. Usually, the recommendation is to recover from a backup. In case you don’t have one, the recommendation would be the same as the one given by the error log. When we hit corruption, first thing we should try is dumping the data and then re-importing to another server (if possible). So, how we can read a corrupted TABLE and avoid the crash? In most cases, the 


  option will help us. It has values from 1 to 6. They are documented here:

The idea is to start with 1. If that doesn’t work, proceed to 2. If it fails again, then go to 3 . . . until you find a value that allows you to dump the data. In this case I know that the problem is a corrupted InnoDB page, so a value of 1 should be enough:

“Lets the server run even if it detects a corrupt page. Tries to make SELECT * FROM tbl_name jump over corrupt index records and pages, which helps in dumping tables.”

We add


 and restart the service. Now it’s time to try and dump our data with


. If the corruption is even worse you need to keep trying different modes. For example, I have this error:

> create table t2 like t;
> insert into t2 select * from t;
ERROR 1034 (HY000): Incorrect key file for table 't'; try to repair it
> insert into t2 select * from t;
ERROR 1712 (HY000): Index t is corrupted


 doesn’t work here. It doesn’t allow me to dump the data:

# mysqldump -uroot -pmsandbox --port 5623 -h --all-databases > dump.sql
Error: Couldn't read status information for table t ()

but in my test server, it seems that



This procedure sounds good and usually works. The problem is that the feature is mostly broken after 5.6.15.


 values greater or equal 4 won’t allow the database to start:

2015-07-08 10:25:25 315 [ERROR] Unknown/unsupported storage engine: InnoDB
2015-07-08 10:25:25 315 [ERROR] Aborting

Bug are reported and verified here:

That means that if you have Insert Buffer, Undo Log or Redo log corruption (values 4, 5 and 6) you can’t continue. What to do?

  • You can install a older version of MySQL (previous to 5.6.15) to use higher values of

    . Modes 4, 5 and 6 can corrupt your data (even more) so they are dangerous. If there are no backups this is our only option, so my recommendation would be to make a copy of the data we have now and then proceed with higher values of




  • If you are using Percona Server,

      can be used to dump the data. You can use the value “salvage”. When the option value is salvage, XtraDB allows read access to a corrupted tablespace, but ignores corrupted pages.

If you can’t still dump your data, then you should try more advance solutions like Undrop for InnoDB. Also, it would be good idea to start planning to create regular database backups.    :)

Powered by WordPress | Theme: Aeros 2.0 by