Jun
13
2018
--

Webinar Thurs 6/14: MongoDB Backup and Recovery Field Guide

mongodb backup and recovery field guide

mongodb backup and recovery field guidePlease join Percona’s Sr. Technical Operations Architect, Tim Vaillancourt as he presents MongoDB Backup and Recovery Field Guide on Thursday, June 14, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

This talk will cover backup and recovery solutions for MongoDB replica sets and clusters, focusing on online and low-impact solutions for production systems.

Register for the webinar

Tim Vaillancourt

Senior Technical Operations Architect

With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim is based in Amsterdam, NL and enjoys traveling, coding and music. Prior to Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS.

Prior to moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

The post Webinar Thurs 6/14: MongoDB Backup and Recovery Field Guide appeared first on Percona Database Performance Blog.

Apr
18
2018
--

Restore a MongoDB Logical Backup

MongoDB Logical Backup

MongoDB Logical BackupIn this article, we will explain how to restore a MongoDB logical backup performed via ‘mongodump’ to a mongod instance.

MongoDB logical backup requires the use of the ‘mongorestore‘ tool to perform the restore backup. This article focuses on this tool and process.

Note: Percona develops a backup tool named Percona-Lab/mongodb-consistent-backup, which is a wrapper for ‘mongodump‘, adding cluster-wide backup consistency. The backups created by mongodb_consistent_backup (in Dump/Mongodump mode) can be restored using the exact same steps as a regular ‘mongodump’ backup – no special steps!

Mongorestore Command Flags

–host/–port (and –user/–password)

Required, even if you’re using the default host/port (localhost:27017). If authorization is enabled, add –user/–password flags also.

–drop

This is almost always required. This causes ‘mongodump‘ to drop the collection that is being restored before restoring it. Without this flag, the documents from the backup are inserted one at a time and if they already exist the restore fails.

–oplogReplay

This is almost always required. Replays the oplog that was dumped by mongodump. It is best to include this flag on replset-based backups unless there is a specific reason not to. You can tell if the backup was from a replset by looking for the file ‘oplog.bson‘ at the base of the dump directory.

–dir

Required. The path to the mongodump data.

–gzip

Optional. For mongodump >= 3.2, enables inline compression on the restore. This is required if ‘mongodump‘ used the –gzip flag (look for *.bson.gz files if you’re not sure if the collection files have no .gz suffix, don’t use –gzip).

–numParallelCollections=<number>

Optional. For mongodump >= 3.2 only, sets the number of collections to insert in parallel. By default four threads are used, and if you have a large server and you want to restore faster (more resource usage though), you could increase this number. Note that each thread uncompresses bson if the ‘–gzip‘ flag is used, so consider this when raising this number.

Steps

  1. (Optional) If the backup is archived (mongodb_consistent_backup defaults to creating tar archives), un-archive the backup so that ‘mongorestore‘ can access the .bson/.bson.gz files:
    $ tar -C /opt/mongodb/backup/testbackup/20160809_1306 -xvf /opt/mongodb/backup/testbackup/20160809_1306/test1.tar
    test1/
    test1/dump/
    test1/dump/wikipedia/
    test1/dump/wikipedia/pages.metadata.json.gz
    test1/dump/wikipedia/pages.bson.gz
    test1/dump/oplog.bson

    ** This command un-tars the backup to ‘/opt/mongodb/backup/testbackup/20160809_1306/test1/dump’ **

  2. Check (and then check again!) that you’re restoring the right backup to the right host. When in doubt, it is safer to ask the customer or others.
    1. The Percona ‘mongodb_consistent_backup‘ tool names backup subdirectories by replica set name, so you can ensure you’re restoring the right backup by checking the replica set name of the node you’re restoring to, if it exists.
    2. If you’re restoring to a replica set you will need to restore to the PRIMARY member and there needs to be a majority (so writes are accepted – some exceptions if you override write-concern, but not advised).
  3. Use ‘mongorestore‘ to restore the data by dropping/restoring each collection (–drop flag) and replay the oplog changes (–oplogReplay flag), specifying the restore dir explicitly (–dir flag) to the ‘mongorestore‘ command. In this example I also used authorization (–user/–password flags) and un-compression (–gzip flag):
    $ mongorestore --drop --host localhost --port 27017 --user secret --password secret --oplogReplay --gzip --dir /opt/mongodb/backup/testbackup/20160809_1306/test1/dump
    2016-08-09T14:23:04.057+0200    building a list of dbs and collections to restore from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump dir
    2016-08-09T14:23:04.065+0200    reading metadata for wikipedia.pages from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump/wikipedia/pages.metadata.json.gz
    2016-08-09T14:23:04.067+0200    restoring wikipedia.pages from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump/wikipedia/pages.bson.gz
    2016-08-09T14:23:07.058+0200    [#######.................]  wikipedia.pages  63.9 MB/199.0 MB  (32.1%)
    2016-08-09T14:23:10.058+0200    [###############.........]  wikipedia.pages  127.7 MB/199.0 MB  (64.1%)
    2016-08-09T14:23:13.060+0200    [###################.....]  wikipedia.pages  160.4 MB/199.0 MB  (80.6%)
    2016-08-09T14:23:16.059+0200    [#######################.]  wikipedia.pages  191.5 MB/199.0 MB  (96.2%)
    2016-08-09T14:23:19.071+0200    [########################]  wikipedia.pages  223.5 MB/199.0 MB  (112.3%)
    2016-08-09T14:23:22.062+0200    [########################]  wikipedia.pages  255.6 MB/199.0 MB  (128.4%)
    2016-08-09T14:23:25.067+0200    [########################]  wikipedia.pages  271.4 MB/199.0 MB  (136.4%)
    ...
    ...
    2016-08-09T14:24:19.058+0200    [########################]  wikipedia.pages  526.9 MB/199.0 MB  (264.7%)
    2016-08-09T14:24:22.058+0200    [########################]  wikipedia.pages  558.9 MB/199.0 MB  (280.8%)
    2016-08-09T14:24:23.521+0200    [########################]  wikipedia.pages  560.6 MB/199.0 MB  (281.6%)
    2016-08-09T14:24:23.522+0200    restoring indexes for collection wikipedia.pages from metadata
    2016-08-09T14:24:23.528+0200    finished restoring wikipedia.pages (32725 documents)
    2016-08-09T14:24:23.528+0200    replaying oplog
    2016-08-09T14:24:23.597+0200    done
    1. If you encounter problems with ‘mongorestore‘, carefully read the error message or rerun with several ‘-v‘ flags, e.g.: ‘-vvv‘. Once you have an error, attempt to troubleshoot the cause.
  4. Check to see that you saw “replaying oplog” and “done” after the restore (last two lines in the example). If you don’t see this, there is a problem.

As you notice, using this tool for MongoDB logical backup is very simple. However, when using sharding please note that –oplog is not available and the mongodump uses the primaries for each shard. As this is not advised typically in production, you might consider looking at Percona-Lab/mongodb-consistent-backup to ensure you are consistent and hitting secondary nodes, like mongodump with replica sets, will work.

If MongoDB and topics like this interest you, please see the document below, we are hiring!

{
  hiring: true,
  role: "Consultant",
  tech: "MongoDB",
  location: "USA",
  moreInfo: "https://www.percona.com/about-percona/careers/mongodb-consultant-usa-based"
}

The post Restore a MongoDB Logical Backup appeared first on Percona Database Performance Blog.

Apr
12
2018
--

Flashback: Another Take on Point-In-Time Recovery (PITR) in MySQL/MariaDB/Percona Server

Point-In-Time Recovery

Point-In-Time RecoveryIn this blog post, I’ll look at point-in-time recovery (PITR) options for MySQL, MariaDB and Percona Server for MySQL.

It is a common good practice to extend data safety by having additional measures apart from regular data backups, such as delayed slaves and binary log backups. These two options provide the ability to restore the data to any given point in time, or just revert from some bad accidents. These methods have their limitations of course: delayed slaves only help if a deadly mistake is noticed fast enough, while full point-in-time recovery (PITR) requires the last full backup and binary logs (and therefore usually takes a lot of time).

How to reverse from disaster faster

Alibaba engineers and the MariaDB team implemented an interesting feature in their version of the mysqlbinlog tool: the --flashback option. Based on ROW-based DML events, it can transform the binary log and reverse purposes. That means it can help undo given row changes extremely fast. For instance, it can change DELETE events to INSERTs and vice versa, and it will swap WHERE and SET parts of the UPDATE events. This simple idea can dramatically speed up recovery from certain types of mistakes or disasters.

The question is whether it works with non-MariaDB variants. To verify that, I tested this feature with the latest available Percona Server for MySQL 5.7 (which is fully compatible with upstream MySQL).

master [localhost] {msandbox} ((none)) > select @@version,@@version_comment;
+---------------+--------------------------------------------------------+
| @@version     | @@version_comment                                      |
+---------------+--------------------------------------------------------+
| 5.7.21-20-log | Percona Server (GPL), Release 20, Revision ed217b06ca3 |
+---------------+--------------------------------------------------------+
1 row in set (0.00 sec)

First, let’s simulate one possible deadly scenario: a forgotten WHERE in DELETE statement:

master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 200      |
+----------+
1 row in set (0.00 sec)
master [localhost] {msandbox} ((none)) > delete from test.sbtest1;
Query OK, 200 rows affected (0.04 sec)
slave1 [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 0        |
+----------+
1 row in set (0.00 sec

So, our data is lost on both the master and slave!

Let’s start by downloading the latest MariaDB server 10.2.x package, which I’m hoping has a mysqlbinlog tool that works with MySQL 5.7, and unpack it to some custom location:

$ dpkg -x mariadb-server-10.2_10.2.13+maria~wheezy_amd64.deb /opt/maria/
$ /opt/maria/usr/bin/mysqlbinlog --help|grep flash
-B, --flashback Flashback feature can rollback you committed data to a

It has the function we are looking for. Now, we have to find the culprit transaction or set of transactions we want to revert. A simplified example may look like this:

$ mysqlbinlog -v --base64-output=DECODE-ROWS mysql-bin.000002 > mysql-bin.000002.sql
$ less mysql-bin.000002.sql

By searching through the decoded binary log, we are looking for transactions that have wiped out the table test.sbtest1. It looks like this (as the table had 200 rows, it is pretty long, so I’ve pasting only the beginning and the end):

BEGIN
/*!*/;
# at 291
#180314 15:30:34 server id 1  end_log_pos 348 CRC32 0x06cd193e  Table_map: `test`.`sbtest1` mapped to number 111
# at 348
#180314 15:30:34 server id 1  end_log_pos 8510 CRC32 0x064634c5         Delete_rows: table id 111
...
### DELETE FROM `test`.`sbtest1`
### WHERE
###   @1=200
###   @2=101
###   @3='26157116088-21551255803-13077038767-89418462090-07321921109-99464656338-95996554805-68102077806-88247356874-53904987561'
###   @4='51157774706-69740598871-18633441857-39587481216-98251863874'
# at 38323
#180314 15:30:34 server id 1  end_log_pos 38354 CRC32 0x6dbb7127        Xid = 97
COMMIT/*!*/;

It is very important to take the proper start and stop positions. We need the ones exactly after BEGIN and before the final COMMIT. Then, let’s test if the tool produces the reverse statements as expected. First, decode the rows to the .sql file:

$ /opt/maria/usr/bin/mysqlbinlog --flashback -v --base64-output=DECODE-ROWS --start-position=291 --stop-position=38323 mysql-bin.000002 > mysql-bin.000002_flash.sql

Inside, we find 200 of those. Looks good:

### INSERT INTO `test`.`sbtest1`
### SET
### @1=200
...

Since we verified the positions are correct, we can prepare a binary log file:

$ /opt/maria/usr/bin/mysqlbinlog --flashback --start-position=291 --stop-position=38323 mysql-bin.000002 > mysql-bin.000002_flash.bin

and load it back to our master:

master [localhost] {msandbox} (test) > source mysql-bin.000002_flash.bin
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.04 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
master [localhost] {msandbox} (test) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 200      |
+----------+
1 row in set (0.00 sec)

and double check they restored on slaves:

slave1 [localhost] {msandbox} (test) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 200      |
+----------+
1 row in set (0.00 sec)

GTID problem

MariaDB has a completely different GTID implementation from MySQL and Percona Server. You can expect problems when decoding incompatible GTID enabled binary logs with MariaDB. As MariaDB’s mysqlbinlog does not support –start/stop-gtid options (even for its own implementation), we have to take the usual positions anyway. From a GTID-enabled binary log, for example, delete can look like this:

# at 2300
#180315 9:37:31 server id 1 end_log_pos 2365 CRC32 0x09e4d815 GTID last_committed=1 sequence_number=2 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= '00020996-1111-1111-1111-111111111111:2'/*!*/;
# at 2365
#180315 9:37:31 server id 1 end_log_pos 2433 CRC32 0xac62a20d Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1521103051/*!*/;
BEGIN
/*!*/;
# at 2433
#180315 9:37:31 server id 1 end_log_pos 2490 CRC32 0x275601d6 Table_map: `test`.`sbtest1` mapped to number 108
# at 2490
#180315 9:37:31 server id 1 end_log_pos 10652 CRC32 0xe369e169 Delete_rows: table id 108
...
# at 42355
#180315 9:37:31 server id 1 end_log_pos 42386 CRC32 0xe01ff558 Xid = 31
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;

The tool seems to work, and transforms the delete transaction to a sequence of INSERTs. However, the server rejects it when we try to load it on a GTID-enabled master:

master [localhost] {msandbox} ((none)) > source mysql-bin.000003.flash
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)

Unfortunately, the solution here is either to disable GTID mode for the recovery time (which is surely tricky in replicated clusters), or try to add GTID-related information to the resulting binary log with the

--flashback option

. In my case, adding these lines worked (I used the next free available GTID sequence):

$ diff -u mysql-bin.000003.flash mysql-bin.000003.flash.gtid
--- mysql-bin.000003.flash 2018-03-15 10:20:20.080487998 +0100
+++ mysql-bin.000003.flash.gtid 2018-03-15 10:25:02.909953620 +0100
@@ -4,6 +4,10 @@
DELIMITER /*!*/;
#180315 9:32:51 server id 1 end_log_pos 123 CRC32 0x941b189a Start: binlog v 4, server v 5.7.21-20-log created 180315 9:32:51 at startup
ROLLBACK/*!*/;
+# at 154
+#180315 9:37:05 server id 1 end_log_pos 219 CRC32 0x69e4ce26 GTID last_committed=0 sequence_number=1 rbr_only=yes
+/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
+SET @@SESSION.GTID_NEXT= '00020996-1111-1111-1111-111111111111:5'/*!*/;
BINLOG '
sy+qWg8BAAAAdwAAAHsAAAAAAAQANS43LjIxLTIwLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAACzL6paEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
@@ -724,6 +728,7 @@
'/*!*/;
COMMIT
/*!*/;
+SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;

master [localhost] {msandbox} ((none)) > source mysql-bin.000003.flash.gtid
(...)
master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1;
+----------+
| count(*) |
+----------+
| 200      |
+----------+
1 row in set (0.00 sec

Limitations

Obviously, flashback cannot help after DROP/TRUNCATE or other DDL commands. These are not transactional, and affected rows are never recorded in the binary log. It doesn’t work with encrypted or compressed binary logs either. But most importantly, to produce complete events that can reverse bad transactions, the binary format must be ROW. The row image also must be FULL:

master [localhost] {msandbox} ((none)) > select @@binlog_format,@@binlog_row_image;
+-----------------+--------------------+
| @@binlog_format | @@binlog_row_image |
+-----------------+--------------------+
| ROW             | FULL               |
+-----------------+--------------------+
1 row in set (0.00 sec)

If these conditions are not met (or if you’re dealing with a too-complicated GTID issue), you will have to follow the standard point-in-time recovery procedure.

The post Flashback: Another Take on Point-In-Time Recovery (PITR) in MySQL/MariaDB/Percona Server appeared first on Percona Database Performance Blog.

Apr
10
2018
--

Migrating Database Charsets to utf8mb4: A Story from the Trenches

utf8mb4

utf8mb4In this blog post, we’ll look at options for migrating database charsets to utf8mb4.

Migrating charsets, in my opinion, is one of the most tedious tasks in a DBA’s life. There are so many things involved that can screw up our data, making it work is always hard. Sometimes what seems like a trivial task can become a nightmare very easily, and keeps us working for longer than expected.

I’ve recently worked on a case that challenged me with lots of tests due to some existing schema designs that made InnoDB suffer. I’ve decided to write this post to put together some definitive guide to enact charset conversion with minimal downtime and pain.

  • First disclosure: I can’t emphasize enough that you need to always backup your data. If something goes wrong, you can always roll things back by keeping a healthy set of backups.
  • Second disclosure: A backup can’t be considered a good backup until you test it, so I can’t emphasize enough that running regular backups and also performing regular restore tests is a must-to-do task for being in the safe side.
  • Third and last disclosure: I’m not pretending to present the best or only way to do this exercise. This is the way I consider easiest and painless to perform a charset conversion with minimal downtime.

My approach involves at least one slave for failover and logical/physical backup operations to make sure that data is loaded properly using the right charset.

In this case, we are moving from latin1 (default until MySQL 8.0.0) to utf8mb4 (new default from 8.0.1). In this post, Lefred refers to this change and some safety checks for upgrading. For our change, an important thing to consider: Latin1 charset stores one byte per character, while utf8mb4 can store up to four bytes per character. This change definitely impacts the disk usage, but also makes us hit some limits that I describe later in the plan.

So let’s put out hands in action. First, let’s create a slave using a fresh (non-locking) backup. Remember that these operations are designed to minimize downtime and reduce any potential impact on our production server.

If you already have a slave that can act as a master replacement then you can skip this section. In our source server, configure binlog_format and flush logs to start with fresh binary logs:

set global binlog_format=MIXED;
flush logs;

Start a streaming backup using Percona Xtrabackup through netcat in the destination server:

nc -l 9999 | cat - > /dest/folder/backup.tar

and in our source server:

innobackupex --stream=tar ./ | nc dest_server_ip 9999

Once the backup is done, untar and restore the backup. Then set up the slave:

tar -xif /dest/folder/backup.tar
innobackupex --apply-log /dest/folder/
/etc/init.d/mysql stop
rm -rf /var/lib/mysql/
mv /dest/folder/* /var/lib/mysql/
chown -R mysql:mysql /var/lib/mysql
/etc/init.d/mysql start
cat /var/lib/mysql/xtrabackup_binlog_info
change master to master_host='master_host', master_user='master_user, master_password='master_password', master_log_file='file_printed_in_xtrabackup_binlog_info', master_log_pos=pos_printed_in_xtrabackup_binlog_info;
start slave;

Now that we have the slave ready, we prepare our dataset by running two mysqldump processes so we have data and schemas in separate files. You can also run this operation using MyDumper or mysqlpump, but I will keep it easy:

STOP SLAVE;
SHOW SLAVE STATUS;

Write down this output, as it may be needed later:

mysqldump --skip-set-charset --no-data --databases `mysql --skip-column-names -e "SELECT GROUP_CONCAT(schema_name SEPARATOR ' ') FROM information_schema.schemata WHERE schema_name NOT IN ('mysql','performance_schema','information_schema');"` > schema.sql
mysqldump --skip-set-charset -n -t --databases `mysql --skip-column-names -e "SELECT GROUP_CONCAT(schema_name SEPARATOR ' ') FROM information_schema.schemata WHERE schema_name NOT IN ('mysql','performance_schema','information_schema');"` > data.sql

Notice that I’m passing a command as an argument to –databases to dump all databases but mysql, performance_schema and information_schema (hack stolen from this post, with credit to Ronald Bradford).  It is very important to keep the replication stopped, as we will resume replication after fully converting our charset.

Now we have to convert our data to utf8mb4. This is easy as we just need to touch the schema.sql file by running few commands:

sed -e "s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8mb4/g" schema.sql
sed -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci /" schema.sql
sed -e "s/SET character_set_client = utf8/SET character_set_client = utf8mb4/" schema.sql

Can this be a one-liner? Yes, but I’m not a good basher. ?

Now we are ready to restore our data using new encoding:

mysql -e "set global innodb_large_prefix=1;"
mysql < schema.sql
mysql < data.sql

Notice I’ve enabled the variable innodb_large_prefix. This is important because InnoDB limits index prefixes to 768 bytes by default. If you have an index based in a varchar(255) data type, you will get an error because the new charset exceeds this limit (up to four bytes per character goes beyond 1000 bytes) unless you limit the index prefix. To avoid issues during data load, we enable this variable to extend the limit to 3072 bytes.

Finally, let’s configure our server and restart it to make sure to set new defaults properly. In the my.cnf file, add:

[client]
default-character-set=utf8mb4
[mysqld]
skip-slave-start
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
innodb_large_prefix=1

Let’s resume replication after the restart, and make sure everything is ok:

START SLAVE;
SHOW SLAVE STATUS;

Ok, at this point we should be fine and our data should be already converted to utf8mb4. So far so good. The next step is to failover applications to use the new server, and rebuild the old server using a fresh backup using xtrabackup as described above.

There are few things we need to consider now before converting this slave into master:

  1. Make sure you properly configured applications. Charset and collation values can be set as session level, so if you set your connection driver to another charset then you may end up mixing things in your data.
  2. Make sure the new slave is powerful enough to handle traffic from the master.
  3. Test everything before failing over production applications. Going from Latin1 to utf8mb4 should be straightforward, as utf8mb4 includes all the characters in Latin1. But let’s face it, things can go wrong and we are trying to avoid surprises.
  4. Last but not least, all procedures were done in a relatively small/medium sized dataset (around 600G). But this conversion (done via logical backups) is more difficult when talking about big databases (i.e., in the order of TBs). In these cases, the procedure helps but might not be good enough due to time restrictions (imagine loading a 1TB table from a logical dump — it take ages). If you happen to face such a conversion, here is a short, high-level plan:
    • Convert only smaller tables in the slave (i.e., those smaller than 500MB) following same procedure. Make sure to exclude big tables from the dump using the –ignore-tables parameter in mysqldump.
    • Convert bigger tables via alter table, as follows:
      ALTER TABLE big_table MODIFY latin1_column varbinary(250);
      ALTER TABLE big_table MODIFY latin1_column varchar(250) CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
    • Once everything is finished, you can resume replication. Notice you can do dump/conversion/restore in parallel with the altering of bigger tables, which should reduce the time required for conversion.

It’s important to understand why we need the double conversion from latin1 to varbinary to utf8mb4. This post from Marco Tusa largely explains this.

Conclusion

I wrote this guide from my experience working with these type of projects. If you Google a bit, you’ll find a lot of resources that make this work, along with different solutions. What I’ve tried to present here is a guide to help you deal with these projects. Normally, we have to perform these changes in existing datasets that sometimes are big enough to prevent any work getting done via ALTER TABLE commands. Hopefully, you find this useful!

The post Migrating Database Charsets to utf8mb4: A Story from the Trenches appeared first on Percona Database Performance Blog.

Apr
06
2018
--

Free, Fast MongoDB Hot Backup with Percona Server for MongoDB

MongoDB Hot Backups

In this blog post, we will discuss the MongoDB Hot Backup feature in Percona Server for MongoDB and how it can help you get a safe backup of your data with minimal impact.

Percona Server for MongoDB

Percona Server for MongoDB is Percona’s open-source fork of MongoDB, aimed at having 100% feature compatibility (much like our MySQL fork).MongoDB Hot Backup We have added a few extra features to our fork for free that are only available with MongoDB Enterprise binaries for an additional fee.

The feature pertinent to this article is our free, open-source Hot Backup feature for WiredTiger and RocksDB, only available in Percona Server for MongoDB.

Essentially, this Hot Backup feature adds a MongoDB server command that creates a full binary backup of your data set to a new directory with no locking or impact to the database, aside from some increased resource usage due to copying the data.

It’s important to note these backups are binary-level backups, not logical backups (such as mongodump would produce).

Logical vs. Binary Backups

Before the concept of a MongoDB Hot Backup, the only way to backup a MongoDB instance, cluster or replica set was using the logical backup tool ‘mongodump’, or using block-device (binary) snapshots.

A “binary-level” backup means backup data contains the data files (WiredTiger or RocksDB) that MongoDB stores on disk. This is different from the BSON representation of the data that ‘mongodump’ (a “logical” backup tool) outputs.

Binary-level backups are generally faster than logical because a logical backup (mongodump) requires the server to read all data and return it to the MongoDB Client API. Once received, ‘mongodump’ serializes the payload into .bson files on disk. Important areas like indices are not backed up in full, merely the metadata describing the index is backed up. On restore, the entire process is reversed: ‘mongorestore’ must deserialize the data created by ‘mongodump’, send it over the MongoDB Client API to the server, then the server’s storage engine must translate this into data files on disk. Due to only metadata of indices being backed up in this approach, all indices must be recreated at restore time. This is a serial operation for each collection! I have personally restored several databases where the majority of the restore time was the index rebuilds and NOT the actual restore of the raw data!

In contrast, binary-level backups are much simpler: the storage-engine representation of the data is what is backed up. This style of backup includes the real index data, meaning a restore is as simple as copying the backup directory to be in the location of the MongoDB dbPath and restarting MongoDB. No recreation of indices is necessary! For very large datasets, this can be a massive win. Hours or even days of restore time can be saved due to this efficiency.

Of course, there are always some tradeoffs. Binary-level backups can take a bit more space on disk and care must be taken to ensure the files are restored to the right version of MongoDB on matching CPU architecture. Generally, backing up the MongoDB Configuration File and version number with your backups addresses this concern.

‘createBackup’ Command

The Hot Backup feature is triggered by a simple admin command via the ‘mongo’ shell named ‘createBackup’. This command requires only one input: the path to output the backup to, named ‘backupDir’. This backup directory must not exist or an error is returned.

If you have MongoDB Authorization enabled (I hope you do!), this command requires the built-in role: ‘backup’ or a role that inherits the “backup” role.

An example in the  ‘mongo’ shell:

> db.adminCommand({
    createBackup: 1,
    backupDir: "/data/backup27017"
  })
{ "ok" : 1 }

When this command returns an “ok”, a full backup is available to be read at the location specified in ‘backupDir’. This end-result is similar to using block-device snapshots such as LVM snapshots, however with less overhead vs. the 10-30% write degradation many users report at LVM snapshot time.

This backup directory can be deleted with a regular UNIX/Linux “rm -rf ...” command once it is no longer required. A typical deployment archives this directory and/or upload the backup to a remote location (Rsync, NFS, AWS S3, Google Cloud Storage, etc.) before removing the backup directory.

WiredTiger vs. RocksDB Hot Backups

Although the ‘createBackup’ command is syntactically the same for WiredTiger and RocksDB, there are some big differences in the implementation of the backup behind-the-scenes.

RocksDB is a storage engine available in Percona Server for MongoDB that uses a level-tiered compaction strategy that is highly write-optimized. As RocksDB uses immutable (write-once) files on disk, it can provide much more efficient backups of the database by using filesystem “hardlinks” to a single inode on disk. This is important to know for large data sets as this requires exponentially less overhead to create a backup.

If your RocksDB-based server ‘createBackup’ command uses an output path that is on the same disk volume as the MongoDB dbPath (very important requirement), hardlinks are used instead of copying most of the database data! If only 5% of the data changes during backup, only 5% of data is duplicated/copied. This makes backups potentially much faster than WiredTiger, which needs to make a full copy of the data and use two-times as much disk space as a result.

Here is an example of a ‘createBackup’ command on a RocksDB-based mongod instance that uses ‘/data/mongodb27017’ as a dbPath:

$ mongo --port=27017
test1:PRIMARY> db.adminCommand({
    createBackup: 1,
    backupDir: "/data/backup27017.rocksdb"
})
{ "ok" : 1 }
test1:PRIMARY> quit()

Seeing we received { “ok”: 1 }, the backup is ready at our output path. Let’s see:

$ cd /data/backup27017.rocksdb
$ ls -alh
total 4.0K
drwxrwxr-x. 3 tim tim  36 Mar  6 15:25 .
drwxr-xr-x. 9 tim tim 147 Mar  6 15:25 ..
drwxr-xr-x. 2 tim tim 138 Mar  6 15:25 db
-rw-rw-r--. 1 tim tim  77 Mar  6 15:25 storage.bson
$ cd db
$ ls -alh
total 92K
drwxr-xr-x. 2 tim tim  138 Mar  6 15:25 .
drwxrwxr-x. 3 tim tim   36 Mar  6 15:25 ..
-rw-r--r--. 2 tim tim 6.4K Mar  6 15:21 000013.sst
-rw-r--r--. 2 tim tim  18K Mar  6 15:22 000015.sst
-rw-r--r--. 2 tim tim  36K Mar  6 15:25 000017.sst
-rw-r--r--. 2 tim tim  12K Mar  6 15:25 000019.sst
-rw-r--r--. 1 tim tim   16 Mar  6 15:25 CURRENT
-rw-r--r--. 1 tim tim  742 Mar  6 15:25 MANIFEST-000008
-rw-r--r--. 1 tim tim 4.1K Mar  6 15:25 OPTIONS-000005

Inside the RocksDB ‘db’ subdirectory we can see .sst files containing the data are there! As this MongoDB instance stores data on the same disk at ‘/data/mongod27017’, let’s prove that RocksDB created a “hardlink” instead of a full copy of the data.

First, we get the Inode number of an example .sst file using the ‘stat’ command. I chose the RocksDB data file: ‘000013.sst’:

$ stat 000013.sst
  File: ‘000013.sst’
  Size: 6501      	Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d	Inode: 33556899    Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1000/     tim)   Gid: ( 1000/     tim)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2018-03-06 15:21:10.735310581 +0200
Modify: 2018-03-06 15:21:10.738310479 +0200
Change: 2018-03-06 15:25:56.778556981 +0200
 Birth: -

Notice the Inode number for this file is 33556899. After the ‘find’ command can be used to find all files pointing to Inode 33556899 on /data:

$ find /data -inum 33556899
/data/mongod27017/db/000013.sst
/data/backup27017.rocksdb/db/000013.sst

Using the ‘-inum’ (Inode Number) flag of find, here we can see .sst files in both the live MongoDB instance (/data/mongodb27017) and the backup (/data/backup27017.rocksdb) are pointing to the same inode on disk for their ‘000013.sst’ file, meaning this file was NOT duplicated or copied during the hot backup process. Only metadata was written to point to the same inode! Now, imagine this file was 1TB+ and this becomes very impressive!

Restore Time

MongoDB Hot BackupIt bears repeating that restoring a logical, mongodump-based backup is very slow; Indices are rebuilt in serial for each collection and both mongorestore and the server need to spend time translating data from logical to binary representations.

Sadly, it is extremely rare that backup restore times are tested and I’ve seen large users of MongoDB disappointed to find that logical-backup restores will take several hours, an entire day or longer while their Production is on fire.

Thankfully, binary-level backups are very easy to restore: the backup directory needs to be copied to the location of the (stopped) MongoDB instance dbPath and then the instance just needs to be started with the same configuration file and version of MongoDB. No indices are rebuilt and there is no time spent rebuilding the data files from logical representations!

Percona-Labs/mongodb_consistent_backup Support

We have plans for our Percona-Labs/mongodb_consistent_backup tool to support the ‘createBackup’/binary-level method of backups in the future. See more about this tool in this Percona Blog post: https://www.percona.com/blog/2017/05/10/percona-lab-mongodb_consistent_backup-1-0-release-explained.

Currently, this project supports cluster-consistent backups using ‘mongodump’ (logical backups only), which are very time consuming for large systems.

Support for ‘createBackup’ in this tool greatly reduces the overhead and time required for backups of clusters, but it requires some added complexity to support remote filesystems it would use.

Conclusion

As this article outlines, there are a lot of exciting developments that have reduced the impact of taking backups of your systems. More important than the backup efficiency is the time it takes to recover your important data from backup, an area where “hot” binary backups are the clear winner by leaps and bounds.

If MongoDB hot backup and topics like this interest you, please see the document below, we are hiring!

{
  hiring: true,
  role: "Consultant",
  tech: "MongoDB",
  location: "USA",
  moreInfo: "https://www.percona.com/about-percona/careers/mongodb-consultant-usa-based"
}

The post Free, Fast MongoDB Hot Backup with Percona Server for MongoDB appeared first on Percona Database Performance Blog.

Apr
04
2018
--

AWS adds automated point-in-time recovery to DynamoDB

One of the joys of cloud computing is handing over your data to the cloud vendor and letting them handle the heavy lifting. Up until now that has meant they updated the software or scaled the hardware for you. Today, AWS took that to another level when it announced Amazon DynamoDB Continuous Backups and Point-In-Time Recovery (PITR).

With this new service, the company lets you simply enable the new backup tool, and the backup happens automatically. Amazon takes care of the rest, providing a continuous backup of all the data in your DynamoDB database.

But it doesn’t stop there, it lets the backup system act as a recording of sorts. You can rewind your data set to any point in time in the backup to any time with “per second granularity” up to 35 days in the past. What’s more, you can access the tool from the AWS Management Console, an API call or via the AWS Command Line Interface (CLI).

Screenshot: Amazon

“We built this feature to protect against accidental writes or deletes. If a developer runs a script against production instead of staging or if someone fat-fingers a DeleteItem call, PITR has you covered. We also built it for the scenarios you can’t normally predict,” Amazon’s Randall Hunt wrote in the blog post announcing the new feature.

If you’re concerned about the 35 day limit, you needn’t be as the system is an adjunct to your regular on-demand backups, which you can keep for as long as you need.

Amazon’s Chief Technology Officer, Werner Vogels, who introduced the new service at the Amazon Summit in San Francisco today, said it doesn’t matter how much data you have. Even with a terabyte of data, you can make use of this service. “This is a truly powerful mechanism here,” Vogels said.

The new service is available in various regions today. You can learn about regional availability and pricing options here.

Feb
22
2018
--

How to Restore MySQL Logical Backup at Maximum Speed

Restore MySQL Logical Backup

Restore MySQL Logical BackupThe ability to restore MySQL logical backups is a significant part of disaster recovery procedures. It’s a last line of defense.

Even if you lost all data from a production server, physical backups (data files snapshot created with an offline copy or with Percona XtraBackup) could show the same internal database structure corruption as in production data. Backups in a simple plain text format allow you to avoid such corruptions and migrate between database formats (e.g., during a software upgrade and downgrade), or even help with migration from completely different database solution.

Unfortunately, the restore speed for logical backups is usually bad, and for a big database it could require days or even weeks to get data back. Thus it’s important to tune backups and MySQL for the fastest data restore and change settings back before production operations.

Disclaimer

All results are specific to my combination of hardware and dataset, but could be used as an illustration for MySQL database tuning procedures related to logical backup restore.

Benchmark

There is no general advice for tuning a MySQL database for a bulk logical backup load, and any parameter should be verified with a test on your hardware and database. In this article, we will explore some variables that help that process. To illustrate the tuning procedure, I’ve downloaded IMDB CSV files and created a MySQL database with pyimdb.

You may repeat the whole benchmark procedure, or just look at settings changed and resulting times.

Database:

  • 16GB – InnoDB database size
  • 6.6GB – uncompressed mysqldump sql
  • 5.8GB – uncompressed CSV + create table statements.

The simplest restore procedure for logical backups created by the mysqldump tool:

mysql -e 'create database imdb;'
time mysql imdb < imdb.sql
# real 129m51.389s

This requires slightly more than two hours to restore the backup into the MySQL instance started with default settings.

I’m using the Docker image percona:latest – it contains Percona Server 5.7.20-19 running on a laptop with 16GB RAM, Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, two disks: SSD KINGSTON RBU-SNS and HDD HGST HTS721010A9.

Let’s start with some “good” settings: buffer pool bigger than default, 2x1GB transaction log files, disable sync (because we are using slow HDD), and set big values for IO capacity,
the load should be faster with big batches thus use 1GB for max_allowed_packet.

Values were chosen to be bigger than the default MySQL parameters because I’m trying to see the difference between the usually suggested values (like 80% of RAM should belong to InnoDB buffer pool).

docker run --publish-all --name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
  --innodb_buffer_pool_size=4GB
  --innodb_log_file_size=1G
  --skip-log-bin
  --innodb_flush_log_at_trx_commit=0
  --innodb_flush_method=nosync
  --innodb_io_capacity=2000
  --innodb_io_capacity_max=3000
  --max_allowed_packet=1G
  time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
  # real 59m34.252s

The load is IO bounded, and there is no reaction on set global foreign_key_checks=0 and unique_checks=0 because these variables are already disabled in the dump file.

How can we reduce IO?

Disable InnoDB double write: --innodb_doublewrite=0

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 44m49.963s

A huge improvement, but we still have an IO-bounded load.

We will not be able to improve load time significantly for IO bounded load. Let’s move to SSD:

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 33m36.975s

Is it vital to disable disk sync for the InnoDB transaction log?

sudo rm -rf mysql/*
docker rm p57
docker run -v /home/ihanick/Private/Src/tmp/data-movies/imdb.sql:/root/imdb.sql -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql
--name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
--innodb_buffer_pool_size=4GB
--innodb_log_file_size=1G
--skip-log-bin
--innodb_flush_log_at_trx_commit=0
--innodb_io_capacity=700
--innodb_io_capacity_max=1500
--max_allowed_packet=1G
--innodb_doublewrite=0
# real 33m49.724s

There is no significant difference.

By default, mysqldump produces SQL data, but it could also save data to CSV format:

cd /var/lib/mysql-files
mkdir imdb
chown mysql:mysql imdb/
time mysqldump --max_allowed_packet=128M --tab /var/lib/mysql-files/imdb imdb1
# real 1m45.983s
sudo rm -rf mysql/*
docker rm p57
docker run -v /srv/ihanick/tmp/imdb:/var/lib/mysql-files/imdb -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql
--name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
--innodb_buffer_pool_size=4GB
--innodb_log_file_size=1G
--skip-log-bin
--innodb_flush_log_at_trx_commit=0
--innodb_io_capacity=700
--innodb_io_capacity_max=1500
--max_allowed_packet=1G
--innodb_doublewrite=0
time (
mysql -e 'drop database imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1 ;
for i in $PWD/*.txt ; do mysqlimport imdb1 $i ; done
)
# real 21m56.049s
1.5X faster, just because of changing the format from SQL to CSV!

We’re still using only one CPU core, let’s improve the load with the –use-threads=4 option:

time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
mysqlimport --use-threads=4 imdb1 $PWD/*.txt
)
# real 15m38.147s

In the end, the load is still not fully parallel due to a big table: all other tables are loaded, but one thread is still active.

Let’s split CSV files into smaller ones. For example, 100k rows in each file and load with GNU/parallel:

# /var/lib/mysql-files/imdb/test-restore.sh
apt-get update ; apt-get install -y parallel
cd /var/lib/mysql-files/imdb
time (
cd split1
for i in ../*.txt ; do echo $i ; split -a 6 -l 100000 -- $i `basename $i .txt`. ; done
for i in `ls *.*|sed 's/^[^.]+.//'|sort -u` ; do
mkdir ../split-$i
for j in *.$i ; do mv $j ../split-$i/${j/$i/txt} ; done
done
)
# real 2m26.566s
time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
parallel 'mysqlimport imdb1 /var/lib/mysql-files/imdb/{}/*.txt' ::: split-*
)
#real 16m50.314s

Split is not free, but you can split your dump files right after backup.

The load is parallel now, but the single big table strikes back with ‘setting auto-inc lock’ in SHOW ENGINE INNODB STATUSG

Using the --innodb_autoinc_lock_mode=2 option fixes this issue: 16m2.567s.

We got slightly better results with just mysqlimport --use-threads=4. Let’s check if hyperthreading helps and if the problem caused by “parallel” tool:

  • Using four parallel jobs for load: 17m3.662s
  • Using four parallel jobs for load and two threads: 16m4.218s

There is no difference between GNU/Parallel and --use-threads option of mysqlimport.

Why 100k rows? With 500k rows: 15m33.258s

Now we have performance better than for mysqlimport --use-threads=4.

How about 1M rows at once? Just 16m52.357s.

I see periodic flushing logs message with bigger transaction logs (2x4GB): 12m18.160s:

--innodb_buffer_pool_size=4GB --innodb_log_file_size=4G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2 --performance-schema=0

Let’s compare the number with myloader 0.6.1 also running with four threads (myloader have only -d parameter, myloader execution time is under corresponding mydumper command):

# oversized statement size to get 0.5M rows in one statement, single statement per chunk file
mydumper -B imdb1 --no-locks --rows 500000 --statement-size 536870912 -o 500kRows512MBstatement
17m59.866s
mydumper -B imdb1 --no-locks -o default_options
17m15.175s
mydumper -B imdb1 --no-locks --chunk-filesize 128 -o chunk128MB
16m36.878s
mydumper -B imdb1 --no-locks --chunk-filesize 64 -o chunk64MB
18m15.266s

It will be great to test mydumper with CSV format, but unfortunately, it wasn’t implemented in the last 1.5 years: https://bugs.launchpad.net/mydumper/+bug/1640550.

Returning back to parallel CSV files load, even bigger transaction logs 2x8GB: 11m15.132s.

What about a bigger buffer pool: --innodb_buffer_pool_size=12G? 9m41.519s

Let’s check six-year-old server-grade hardware: Intel(R) Xeon(R) CPU E5-2430 with SAS raid (used only for single SQL file restore test) and NVMe (Intel Corporation PCIe Data Center SSD, used for all other tests).

I’m using similar options as for previous tests, with 100k rows split for CSV files load:

--innodb_buffer_pool_size=8GB --innodb_log_file_size=8G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2

  • Single SQL file created by mysqldump loaded for 117m29.062s = 2x slower.
  • 24 parallel processes of mysqlimport: 11m51.718s
  • Again hyperthreading making a huge difference! 12 parallel jobs: 18m3.699s.
  • Due to higher concurrency, adaptive hash index is a reason for locking contention. After disabling it with --skip-innodb_adaptive_hash_index: 10m52.788s.
  • In many places, disable unique checks referred as a performance booster: 10m52.489s
    You can spend more time reading advice about unique_checks, but it might help for some databases with many unique indexes (in addition to primary one).
  • The buffer pool is smaller than the dataset, can you change old/new pages split to make insert faster? No: --innodb_old_blocks_pct=5 : 10m59.517s.
  • O_DIRECT is also recommended: --innodb_flush_method=O_DIRECT: 11m1.742s.
  • O_DIRECT is not able to improve performance by itself, but if you can use a bigger buffer pool: O_DIRECT + 30% bigger buffer pool: --innodb_buffeer_pool_size=11G: 10m46.716s.

Conclusions

  • There is no common solution to improve logical backup restore procedure.
  • If you have IO-bounded restore: disable InnoDB double write. It’s safe because even if the database crashes during restore, you can restart the operation.
  • Do not use SQL dumps for databases > 5-10GB. CSV files are much faster for mysqldump+mysql. Implement mysqldump --tabs+mysqlimport or use mydumper/myloader with appropriate chunk-filesize.
  • The number of rows per load data infile batch is important. Usually 100K-1M, use binary search (2-3 iterations) to find a good value for your dataset.
  • InnoDB log file size and buffer pool size are really important options for backup restore performance.
  • O_DIRECT reduces insert speed, but it’s good if you can increase the buffer pool size.
  • If you have enough RAM or SSD, the restore procedure is limited by CPU. Use a faster CPU (higher frequency, turboboost).
  • Hyperthreading also counts.
  • A powerful server could be slower than your laptop (12×2.4GHz vs. 4×2.8+turboboost).
  • Even with modern hardware, it’s hard to expect backup restore faster than 50MBps (for the final size of InnoDB database).
  • You can find a lot of different advice on how to improve backup load speed. Unfortunately, it’s not possible to implement improvements blindly, and you should know the limits of your system with general Unix performance tools like vmstat, iostat and various MySQL commands like SHOW ENGINE INNODB STATUS (all can be collected together with pt-stalk).
  • Percona Monitoring and Management (PMM) also provides good graphs, but you should be careful with QAN: full slow query log during logical database dump restore can cause significant processing load.
  • Default MySQL settings could cost you 10x backup restore slowdown
  • This benchmark is aimed at speeding up the restore procedure while the application is not running and the server is not used in production. Make sure that you have reverted all configuration parameters back to production values after load. For example, if you disable the InnoDB double write buffer during restore and left it enabled in production, you may have scary data corruption due to partial InnoDB pages writes.
  • If the application is running during restore, in most cases you will get an inconsistent database due to missing support for locking or correct transactions for restore methods (discussed above).
Nov
27
2017
--

autoxtrabackup v1.5.0: A Tool for Automatic Backups

autoxtrabackup

autoxtrabackupThere is a new version of the autoxtrabackup tool. In this post, I’ll provide some of the highlights available this time around.

autoxtrabackup is a tool created by PerconLabs. We’ve now put out the 1.5.0 version, and you can test it further.

Note: PerconaLabs is a repository for various open source scripts and tools created by Percona staff. Items added to the PerconaLabs repository are not official Percona products, and aren’t covered by support or services agreements. 

autoxtrabackup is written in Python3 and hosted in PerconaLab (forked from Shako’s repo). Basically, this tool automates backup/prepare/copy-back actions. I want to talk about recent changes and additions.

First of all, autoxtrabackup now has a --test_mode option, intended to test XtraBackup automation process.

Here is the brief flow for this:

  • Clone percona-qa repo
  • Clone Percona Server for MySQL 5.6 and 5.7 from github.
  • Build PS servers in debug mode.
  • Get 2.3 and 2.4 versions of XtraBackup
  • Generate autoxtrabackup .conf files for each version of PS and XtraBackup
  • Pass different combination of options to PS start command and initialize PS servers each time with different options
  • Run sysbench against each started PS server
  • Take backup in cycles for each started PS + prepare
  • If make_slaves is defined, then create slave1 server from this backup (i.e., copy-back to another directory and start the slave from it)
  • Then take a backup, prepare and copy-back from this new slave1 to create slave2
  • Run pt-table-checksum on the master to check backup consistency

I have prepared my environment, and now want to start --test_mode. Basically, it creates option combinations and passes them to the start script:

2017-11-15 22:28:21 DEBUG    Starting cycle1
2017-11-15 22:28:21 DEBUG    Will start MySQL with --innodb_buffer_pool_size=1G --innodb_log_file_size=1G
--innodb_page_size=64K --early-plugin-load=keyring_file.so
--keyring_file_data=/home/shahriyar.rzaev/XB_TEST/server_dir/PS131117-percona-server-5.7.19-17-linux-x86_64/mysql-keyring/keyring
--log-bin=mysql-bin --log-slave-updates --server-id=1 --gtid-mode=ON --enforce-gtid-consistency --binlog-format=row

So as you see, it is starting MySQL with --innodb_buffer_pool_size=1G --innodb_log_file_size=1G --innodb_page_size=64K. In cycle2, it will likely pick --innodb_buffer_pool_size=1G --innodb_log_file_size=1G --innodb_page_size=32K, and so on. It depends what you have passed in config:

# Do not touch; this is for --test_mode, which is testing for XtraBackup itself.
[TestConf]
ps_branches=5.6 5.7
gitcmd=--recursive --depth=1 https://github.com/percona/percona-server.git
testpath=/home/shahriyar.rzaev/XB_TEST/server_dir
incremental_count=3
#make_slaves=1
xb_configs=xb_2_4_ps_5_6.conf xb_2_4_ps_5_7.conf xb_2_3_ps_5_6.conf
default_mysql_options=--log-bin=mysql-bin,--log-slave-updates,--server-id={},--gtid-mode=ON,--enforce-gtid-consistency,--binlog-format=row
mysql_options=--innodb_buffer_pool_size=1G 2G 3G,--innodb_log_file_size=1G 2G 3G,--innodb_page_size=4K 8K 16K 32K 64K

You can pass more options by changing the mysql_options in the config file. Also you can specify how many incremental backups you want by setting the incremental_count option. You can enable creating slaves from backup to test it as well, by enabling the make_slaves option. This is not recommended for daily usage. You can read more about it here: –test_mode.

For daily backup actions, I have added the --tag and --show_tags options, which can be quite useful. They help you to tag your backups. Take a full backup:

$ sudo autoxtrabackup --tag="My Full backup" -v
-lf /home/shahriyar.rzaev/autoxtrabackup_2_4_5_7.log
-l DEBUG --defaults_file=/home/shahriyar.rzaev/XB_TEST/server_dir/xb_2_4_ps_5_7.conf --backup

Take an incremental one:

$ autoxtrabackup --tag="First incremental backup" -v
-lf /home/shahriyar.rzaev/autoxtrabackup_2_4_5_7.log
-l DEBUG --defaults_file=/home/shahriyar.rzaev/XB_TEST/server_dir/xb_2_4_ps_5_7.conf --backup

Take a second incremental one:

$ autoxtrabackup --tag="Second incremental backup" -v
-lf /home/shahriyar.rzaev/autoxtrabackup_2_4_5_7.log
-l DEBUG --defaults_file=/home/shahriyar.rzaev/XB_TEST/server_dir/xb_2_4_ps_5_7.conf --backup

Now you can use the --show_tags to list tags:

$ sudo autoxtrabackup --show_tags
--defaults_file=/home/shahriyar.rzaev/XB_TEST/server_dir/xb_2_4_ps_5_7.conf
Backup              Type    Status  TAG
-------------------------------------------
2017-11-16_20-10-53 Full        OK  'My Full backup'
2017-11-16_20-12-23 Inc         OK  'First incremental backup'
2017-11-16_20-13-39 Inc         OK  'Second incremental backup'

It would be quite nice if we could prepare those backups with a tag name. In other words, if I have a full backup and five incremental backups, what if I want to prepare until the second or third incremental, or just a full backup?

Pass the tag name with the --prepare option, and it will do the trick:

$ autoxtrabackup --tag="First incremental backup" -v
-lf /home/shahriyar.rzaev/autoxtrabackup_2_4_5_7.log
-l DEBUG --defaults_file=/home/shahriyar.rzaev/XB_TEST/server_dir/xb_2_4_ps_5_7.conf --prepare

It will prepare the full and “First incremental backup” – the remaining incremental backups will be ignored.

autoxtrabackup 1.5.0 also has a --dry_run option, which is going to show but not run exact commands. It is described here: –dry_run.

How about autoxtrabackup 1.5.0 installation? You can install it from the source or use pip3:

pip3 install mysql-autoxtrabackup

For more please read: Installation.

Do you want to enable encryption and compression for backups? Yes? You can enable this from the autoxtrabackup config as described here: Config file structure.

You can enable taking partial backups again by editing the config: partial backups.

autoxtrabackup 1.5.0 allows you to perform a partial recovery – i.e., restoring only a single specified table from a full backup. If the table was dropped,  autoxtrabackup will try to extract the create table statement from the .frm file using the mysqlfrm tool and then discard/import the tablespace from full backup. This is related to the transportable tablespace concept. You can read more here: restoring-single-table-after-drop.

For a full list of available options, read the DOC: autoxtrabackup DOC.

Thanks for reading! If you are going to try autoxtrabackup 1.5.0, don’t hesitate to provide some feedback!

Oct
23
2017
--

MySQL Point in Time Recovery the Right Way

MySQL Point In Time Recovery

MySQL Point In Time RecoveryIn this blog, I’ll look at how to do MySQL point in time recovery (PITR) correctly.

Sometimes we need to restore from a backup, and then replay the transactions that happened after the backup was taken. This is a common procedure in most disaster recovery plans, when for example you accidentally drop a table/database or run an update/delete without the “where” clause and lose data.

The usual way is to get a copy of your binlogs and use mysqlbinlog to replay those transactions. But this approach has many pitfalls that can make the whole PITR process a nightmare. Some examples:

  • You need to make sure to run a single mysqldump command with all related binlogs, and pipe them to mysql at once. Otherwise, if binlog.000001 creates a temporary table, and binlog.000002 requires that temporary table, it will not be present. Each execution of MySQL creates a new connection:
shell> mysqlbinlog binlog.000001 | mysql -u root -p # Creates tmp table X
shell> mysqlbinlog binlog.000002 | mysql -u root -p # Uses tmp table X

  • We can say that it has to be an atomic operation. If it fails halfway through, it will be very difficult to know where it failed and even more difficult to resume from that point forward. There are many reasons for it to fail: InnoDB lock wait timeout / deadlock with some concurrent transaction, server and client have different
    max_allowed_packet

     and you get a Lost connection to MySQL server during query error, and so on.

So how can we overcome those limitations and have a reliable way to do Point In Time Recovery?

We can restore the backup on the desired server, build a second server with just the minimal MySQL required data and move the all binary logs to this “fake” server datadir. Then we need to configure the server where we want the PITR to happen as a slave of the fake server. From this point forward, it’s going to be pure MySQL replication happening.

To illustrate it, I create a Docker container on the machine. I have Percona Server for MySQL running on the box listening on 3306, and have already restored the backup on it. There is a tarball there with all binlogs required. The saved positions for PITR are as follows:

[root@localhost ~]# cat /var/lib/mysql/xtrabackup_binlog_info
master-bin.000007	1518932

I create a folder to store the Docker MySQL datadir:

mkdir /tmp/pitr
chown -R 1001 /tmp/pitr

I start the Docker container. As we can see from xtrabackup_binlog_info, my binlogs are named master-bin and I’ll be setting the same server-id as original master:

docker run --name ps_pitr -v /tmp/pitr:/var/lib/mysql
-p 3307:3306 -e MYSQL_ROOT_PASSWORD=secret
-d percona/percona-server:5.7.18
--log_bin=master-bin --server-id=10

In case you want to make usage of GTID, append --gtid-mode=ON --enforce_gtid_consistency=ON to the end of the Docker command.

The command above starts a MySQL instance, invokes mysqld –initialize, sets the root password to secret and it’s port 3306 is mapped back to my local 3307 port. Now I’ll stop it, remove the binlogs that it created, uncompress and move all required binlogs to its datadir mapped folder and start it again:

docker stop ps_pitr
rm /tmp/pitr/master-bin.*
tar -zxf binlogs.tgz -C /tmp/pitr
chown -R 1001 /tmp/pitr/master-bin.*
docker start ps_pitr

If it all worked correctly, at this point we can see the full list of binary logs on the Docker container by connecting on port 3307:

mysql -u root -psecret -P 3307 --protocol=TCP -e "SHOW BINARY LOGS"
mysql: [Warning] Using a password on the command line interface can be insecure.
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| master-bin.000005 |  26216208 |
| master-bin.000006 |  26214614 |
| master-bin.000007 |  26214902 |
. . .
| master-bin.000074 |       154 |
+-------------------+-----------+

Now, all we need to do is connect to our server, which has the backup restored, and configure it as a slave from 3307:

mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 6
Server version: 5.7.18-16 Percona Server (GPL), Release 16, Revision d7301f8
Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
mysql> CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=3307, MASTER_USER='root', MASTER_PASSWORD='secret', MASTER_LOG_FILE='master-bin.000007', MASTER_LOG_POS=1518932;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
mysql> START SLAVE;
Query OK, 0 rows affected (0.01 sec)
mysql> SHOW SLAVE STATUSG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: root
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: master-bin.000008
          Read_Master_Log_Pos: 449696
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 28957
        Relay_Master_Log_File: master-bin.000007
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 15217950
              Relay_Log_Space: 11476311
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 4382
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 10
                  Master_UUID: 80b9fe26-a945-11e7-aa1d-0242ac110002
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Opening tables
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.17 sec)
. . .
mysql> SHOW SLAVE STATUSG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: root
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: master-bin.000074
          Read_Master_Log_Pos: 154
               Relay_Log_File: localhost-relay-bin.000133
                Relay_Log_Pos: 381
        Relay_Master_Log_File: master-bin.000074
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 154
              Relay_Log_Space: 819
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 10
                  Master_UUID: 80b9fe26-a945-11e7-aa1d-0242ac110002
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.01 sec)

If you want to apply logs up to a particular time you can make use of mysqlbinlog to verify what the last position / GTID it should apply, and use START SLAVE UNTIL MASTER_LOG_FILE = 'log_name', MASTER_LOG_POS = log_pos or START SLAVE SQL_THREAD UNTIL SQL_AFTER_GTIDS = 3E11FA47-71CA-11E1-9E33-C80AA9429562:11-56.

Special thanks to Marcos Albe, who originally showed me this MySQL point in time recovery approach.

Oct
16
2017
--

When Should I Enable MongoDB Sharding?

MongoDB Sharding

MongoDB ShardingIn this blog post, we will talk about MongoDB sharding and walk through the main reasons why you should start a cluster (independent of the approach you have chosen).

Note: I will cover this subject in my webinar How To Scale with MongoDB on Wednesday, October 18, 2017, at 11:00 am PDT / 2:00 pm EDT (UTC-7).

Sharding is the most complex architecture you can deploy using MongoDB, and there are two main approaches as to when to shard or not. The first is to configure the cluster as soon as possible – when you predict high throughput and fast data growth.

The second says you should use a cluster as the best alternative when the application demands more resources than the replica set can offer (such as low memory, an overloaded disk or high processor load). This approach is more corrective than preventative, but we will discuss that in the future.

1) Disaster recovery plan

Disaster recovery (DR) is a very delicate topic: how long would you tolerate an outage? If necessary, how long would it take you to restore the entire database? Depending on the database size and on disk speed, a backup/restore process might take hours or even days!
There is no hard number in Gigabytes to justify a cluster. But in general, you should engage when the database is more than 200GB the backup and restore processes might take a while to finish.
Let’s consider the case where we have a replica set with a 300GB database. The full restore process might last around four hours, whereas if the database has two shards, it will take about two hours – and depending on the number of shards we can improve that time. Simple math: if there are two shards, the restore process takes half of the time to restore when compared to a single replica set.

2) Hardware limitations

Disk and memory are inexpensive nowadays. However, this is not true when companies need to scale out to high numbers (such as TB of RAM). Suppose your cloud provider can only offer you up to 5,000 IOPS in the disk subsystem, but the application needs more than that to work correctly. To work around this performance limitation, it is better to start a cluster and divide the writes among instances. That said, if there are two shards the application will have 10000 IOPS available to use for writes and reads in the disk subsystem.

3) Storage engine limitations

There are a few storage engine limitations that can be a bottleneck in your use case. MMAPv2 does have a lock per collection, while WiredTiger has tickets that will limit the number of writes and reads happening concurrently. Although we can tweak the number of tickets available in WiredTiger, there is a virtual limit – which means that changing the available tickets might generate processor overload instead of increasing performance. If one of these situations becomes a bottleneck in your system, you start a cluster. Once you shard the collection, you distribute the load/lock among the different instances.

4) Hot data vs. cold data

Several databases only work with a small percentage of the data being stored. This is called hot data or working set. Cold data or historical data is rarely read, and demands considerable system resources when it is. So why spend money on expensive machines that only store cold data or low-value data? With a cluster deployment we can choose where the cold data is stored, and use cheap devices and disks to do so. The same is true for hot data – we can use better machines to have better performance. This methodology also speeds up writes and reads on the hot data, as the indexes are smaller and add less overhead to the system.

5) Geo-distributed data

It doesn’t matter whether this need comes from application design or legal compliance. If the data must stay within continent or country borders, a cluster helps make that happen. It is possible to limit data localization so that it is stored solely in a specific “part of the world.” The number of shards and their geographic positions is not essential for the application, as it only views the database. This is commonly used in worldwide companies for better performance, or simply to comply with the local law.

6) Infrastructure limitations

Infrastructure and hardware limitations are very similar. When thinking about infrastructure, however, we focus on specific cases when the instances should be small. An example is running MongoDB on Mesos. Some providers only offer a few cores and a limited amount of RAM. Even if you are willing to pay more for that, it is not possible to purchase more than they offer as their products. A cluster provides the option to split a small amount of data among a lot of shards, reaching the same performance a big and expensive machine provides.

7) Failure isolation

Consider that a replica set or a single instance holds all the data. If for any reason this instance/replica set goes down, the whole application goes down. In a cluster, if we lose one of the five shards, 80% of the data is still available. Running a few shards helps to isolate failures. Obviously, running a bunch of instances makes the cluster prone to have a failed instance, but as each shard must have at least three instances the probability of the entire shard being down is minimal. For providers that offer different zones, it is good practice to have different members of the shard in different availability zones (or even different regions).

8) Speed up queries

Queries can take too long, depending on the number of reads they perform. In a clustered deployment, queries can run in parallel and speed up the query response time. If a query runs in ten seconds in a replica set, it is very likely that the same query will run in five to six seconds if the cluster has two shards, and so on.

I hope this helps with MongoDB sharding. Having a cluster solves several other problems as well, and we have listed only a few of them. Don’t miss our webinar regarding scaling out MongoDB next Wednesday, October 18, 2017!

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com