Feb 22, 2018

How to Restore MySQL Logical Backup at Maximum Speed

The ability to restore MySQL logical backups is a significant part of disaster recovery procedures. It’s a last line of defense.

Even if you lose all data from a production server, physical backups (data-file snapshots created with an offline copy or with Percona XtraBackup) can contain the same internal database structure corruption as the production data. Backups in a simple plain text format let you avoid such corruption and migrate between database formats (e.g., during a software upgrade or downgrade), or even help with migration from a completely different database solution.

Unfortunately, the restore speed for logical backups is usually poor, and for a big database it can take days or even weeks to get the data back. It is therefore important to tune backups and MySQL for the fastest possible restore, and to change the settings back before resuming production operations.

Disclaimer

All results are specific to my combination of hardware and dataset, but could be used as an illustration for MySQL database tuning procedures related to logical backup restore.

Benchmark

There is no general advice for tuning a MySQL database for a bulk logical backup load, and any parameter should be verified with a test on your hardware and database. In this article, we will explore some variables that help that process. To illustrate the tuning procedure, I’ve downloaded IMDB CSV files and created a MySQL database with pyimdb.

You may repeat the whole benchmark procedure, or just look at the changed settings and the resulting times. (A sketch for measuring the dataset sizes follows the list below.)

Database:

  • 16GB – InnoDB database size
  • 6.6GB – uncompressed mysqldump sql
  • 5.8GB – uncompressed CSV + create table statements.
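
For reference, here is one sketch of how such sizes can be measured, assuming the database is named imdb (as created below) and the dump files sit in the current directory:

# InnoDB data + index size according to the data dictionary (approximate)
mysql -e "SELECT ROUND(SUM(data_length + index_length)/1024/1024/1024, 1) AS size_gb FROM information_schema.tables WHERE table_schema = 'imdb';"
# on-disk size of the logical dumps
du -sh imdb.sql /var/lib/mysql-files/imdb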

The simplest restore procedure for logical backups created by the mysqldump tool:

mysql -e 'create database imdb;'
time mysql imdb < imdb.sql
# real 129m51.389s

This requires slightly more than two hours to restore the backup into the MySQL instance started with default settings.
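
For comparison, a quick way to inspect the defaults you are starting from (in stock 5.7 that is roughly a 128MB buffer pool, 48MB redo log files and fully durable innodb_flush_log_at_trx_commit=1):

mysql -e "SELECT @@innodb_buffer_pool_size/1024/1024 AS bp_mb, @@innodb_log_file_size/1024/1024 AS redo_mb, @@innodb_flush_log_at_trx_commit AS trx_commit, @@max_allowed_packet/1024/1024 AS packet_mb;"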

I’m using the Docker image percona:latest – it contains Percona Server 5.7.20-19 running on a laptop with 16GB RAM, Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, two disks: SSD KINGSTON RBU-SNS and HDD HGST HTS721010A9.

Let’s start with some “good” settings: a buffer pool bigger than the default, 2x1GB transaction log files, disabled sync (because we are using a slow HDD), and big values for IO capacity. The load should be faster with big batches, so use 1GB for max_allowed_packet.

Values were chosen to be bigger than the MySQL defaults because I’m trying to see the difference made by the usually suggested values (like “80% of RAM should belong to the InnoDB buffer pool”).

docker run --publish-all --name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7 \
  --innodb_buffer_pool_size=4G \
  --innodb_log_file_size=1G \
  --skip-log-bin \
  --innodb_flush_log_at_trx_commit=0 \
  --innodb_flush_method=nosync \
  --innodb_io_capacity=2000 \
  --innodb_io_capacity_max=3000 \
  --max_allowed_packet=1G

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql)
# real 59m34.252s

The load is IO-bound, and SET GLOBAL foreign_key_checks=0 and unique_checks=0 have no effect, because these checks are already disabled in the dump file itself.
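
This is easy to verify: a dump produced by mysqldump typically begins with session-level statements that disable both checks for the duration of the load, so the global variables change nothing:

head -n 30 imdb.sql | grep -E 'UNIQUE_CHECKS|FOREIGN_KEY_CHECKS'
# /*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
# /*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;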

How can we reduce IO?

Disable InnoDB double write: --innodb_doublewrite=0

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 44m49.963s

A huge improvement, but the load is still IO-bound.
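
A minimal way to confirm that the load is IO-bound (assuming the sysstat package is installed): watch device utilization while the import runs; sustained %util near 100 with mostly idle CPU points at the disk:

iostat -xm 5   # per-device %util and MB/s written
vmstat 5       # high 'wa' (IO wait) with low 'us'/'sy' confirms the bottleneck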

We will not be able to improve the load time significantly while it stays IO-bound. Let’s move to SSD:

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 33m36.975s

Is it vital to disable disk sync for the InnoDB transaction log?

sudo rm -rf mysql/*
docker rm p57
docker run -v /home/ihanick/Private/Src/tmp/data-movies/imdb.sql:/root/imdb.sql \
  -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql \
  --name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7 \
  --innodb_buffer_pool_size=4G \
  --innodb_log_file_size=1G \
  --skip-log-bin \
  --innodb_flush_log_at_trx_commit=0 \
  --innodb_io_capacity=700 \
  --innodb_io_capacity_max=1500 \
  --max_allowed_packet=1G \
  --innodb_doublewrite=0
# real 33m49.724s

There is no significant difference.

By default, mysqldump produces SQL data, but it could also save data to CSV format:

cd /var/lib/mysql-files
mkdir imdb
chown mysql:mysql imdb/
time mysqldump --max_allowed_packet=128M --tab /var/lib/mysql-files/imdb imdb1
# real 1m45.983s
sudo rm -rf mysql/*
docker rm p57
docker run -v /srv/ihanick/tmp/imdb:/var/lib/mysql-files/imdb \
  -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql \
  --name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7 \
  --innodb_buffer_pool_size=4G \
  --innodb_log_file_size=1G \
  --skip-log-bin \
  --innodb_flush_log_at_trx_commit=0 \
  --innodb_io_capacity=700 \
  --innodb_io_capacity_max=1500 \
  --max_allowed_packet=1G \
  --innodb_doublewrite=0

cd /var/lib/mysql-files/imdb
time (
mysql -e 'drop database imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
for i in $PWD/*.txt ; do mysqlimport imdb1 $i ; done
)
# real 21m56.049s
1.5X faster, just because of changing the format from SQL to CSV!
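
One detail that matters for the chunked loads below: mysqlimport derives the target table name from the data file name with the extension stripped, which is why split chunks have to be renamed back to table.txt (in separate directories) rather than being left as table.aaaaaa:

# loads into table `title`: file name minus extension = table name
mysqlimport imdb1 /var/lib/mysql-files/imdb/title.txt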

We’re still using only one CPU core; let’s improve the load with the --use-threads=4 option:

time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
mysqlimport --use-threads=4 imdb1 $PWD/*.txt
)
# real 15m38.147s

In the end, the load is still not fully parallel because of one big table: all other tables are loaded, but one thread remains active.

Let’s split the CSV files into smaller ones, for example 100k rows each, and load them with GNU parallel:

# /var/lib/mysql-files/imdb/test-restore.sh
apt-get update ; apt-get install -y parallel
cd /var/lib/mysql-files/imdb
time (
mkdir split1 ; cd split1
# split each table file into 100k-row chunks: <table>.aaaaaa, <table>.aaaaab, ...
for i in ../*.txt ; do echo $i ; split -a 6 -l 100000 -- $i `basename $i .txt`. ; done
# regroup the chunks by suffix into split-<suffix>/<table>.txt
for i in `ls *.*|sed 's/^[^.]*\.//'|sort -u` ; do
mkdir ../split-$i
for j in *.$i ; do mv $j ../split-$i/${j/$i/txt} ; done
done
)
# real 2m26.566s
time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
parallel 'mysqlimport imdb1 /var/lib/mysql-files/imdb/{}/*.txt' ::: split-*
)
# real 16m50.314s

Splitting is not free, but you can split your dump files right after the backup completes.
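
For example, a sketch of a post-backup step that produces already-split chunks, using the same split invocation as the script above, run right after mysqldump --tab:

cd /var/lib/mysql-files/imdb
for t in *.txt ; do split -a 6 -l 100000 -- $t `basename $t .txt`. ; done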

The load is parallel now, but the single big table strikes back with ‘setting auto-inc lock’ waits visible in SHOW ENGINE INNODB STATUS\G.

Using the --innodb_autoinc_lock_mode=2 option fixes this issue: 16m2.567s.
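
Note that innodb_autoinc_lock_mode is not a dynamic variable, so it must be set at server startup. A quick way to confirm the mode and watch for remaining auto-inc waits:

mysql -e 'SELECT @@innodb_autoinc_lock_mode;'   # 2 = interleaved, no table-level auto-inc lock
mysql -e 'SHOW ENGINE INNODB STATUS\G' | grep -i 'auto-inc'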

We got slightly better results with just mysqlimport --use-threads=4. Let’s check whether hyperthreading helps and whether the problem is caused by the “parallel” tool:

  • Four parallel jobs: 17m3.662s
  • Four parallel jobs with two mysqlimport threads each: 16m4.218s

There is no difference between GNU/Parallel and --use-threads option of mysqlimport.

Why 100k rows? With 500k rows per file: 15m33.258s

Now the performance is better than with mysqlimport --use-threads=4.

How about 1M rows at once? Slower again: 16m52.357s.

The error log shows periodic log-flushing activity, so redo log size matters. With bigger transaction logs (2x4GB) the time drops to 12m18.160s:

--innodb_buffer_pool_size=4G --innodb_log_file_size=4G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2 --performance-schema=0
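
The pressure on the redo log can be watched directly: the difference between the current log sequence number and the last checkpoint is the checkpoint age, and when it approaches roughly 80% of the combined log file size, InnoDB switches to aggressive flushing and the load stalls:

mysql -e 'SHOW ENGINE INNODB STATUS\G' | grep -A 5 '^LOG'
# 'Log sequence number' minus 'Last checkpoint at' = current checkpoint age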

Let’s compare the numbers with myloader 0.6.1, also running with four threads (myloader has only a -d parameter; each myloader execution time is listed under the corresponding mydumper command):

# oversized statement size to get 0.5M rows in one statement, single statement per chunk file
mydumper -B imdb1 --no-locks --rows 500000 --statement-size 536870912 -o 500kRows512MBstatement
# myloader: 17m59.866s
mydumper -B imdb1 --no-locks -o default_options
# myloader: 17m15.175s
mydumper -B imdb1 --no-locks --chunk-filesize 128 -o chunk128MB
# myloader: 16m36.878s
mydumper -B imdb1 --no-locks --chunk-filesize 64 -o chunk64MB
# myloader: 18m15.266s

It would be great to test mydumper with CSV format, but unfortunately it hasn’t been implemented in the last 1.5 years: https://bugs.launchpad.net/mydumper/+bug/1640550.

Returning to the parallel CSV file load: with even bigger transaction logs (2x8GB), 11m15.132s.

What about a bigger buffer pool: --innodb_buffer_pool_size=12G? 9m41.519s

Let’s check six-year-old server-grade hardware: Intel(R) Xeon(R) CPU E5-2430 with SAS raid (used only for single SQL file restore test) and NVMe (Intel Corporation PCIe Data Center SSD, used for all other tests).

I’m using similar options as for previous tests, with 100k rows split for CSV files load:

--innodb_buffer_pool_size=8G --innodb_log_file_size=8G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2

  • A single SQL file created by mysqldump loaded in 117m29.062s = 2x slower.
  • 24 parallel processes of mysqlimport: 11m51.718s
  • Again, hyperthreading makes a huge difference! With only 12 parallel jobs: 18m3.699s.
  • Due to higher concurrency, the adaptive hash index becomes a source of locking contention. After disabling it with --skip-innodb_adaptive_hash_index: 10m52.788s.
  • Disabling unique checks is referred to in many places as a performance booster: 10m52.489s. Here the difference is negligible (you could spend more time reading the advice about unique_checks than it saves), but it might help for databases with many unique indexes (in addition to the primary one).
  • The buffer pool is smaller than the dataset; can you change the old/new pages split to make inserts faster? No: --innodb_old_blocks_pct=5: 10m59.517s.
  • O_DIRECT is also often recommended: --innodb_flush_method=O_DIRECT: 11m1.742s.
  • O_DIRECT does not improve performance by itself, but it lets you use a bigger buffer pool: O_DIRECT + a 30% bigger buffer pool (--innodb_buffer_pool_size=11G): 10m46.716s. The full restore-only option set is collected in the sketch after this list.
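
Putting it all together, this is the restore-only option set the tests above converged on. It is a sketch for the restore window only: crash safety and durability are deliberately sacrificed, and every option must be reverted before production use.

mysqld \
  --innodb_buffer_pool_size=11G \
  --innodb_log_file_size=8G \
  --skip-log-bin \
  --innodb_flush_log_at_trx_commit=0 \
  --innodb_doublewrite=0 \
  --innodb_autoinc_lock_mode=2 \
  --skip-innodb_adaptive_hash_index \
  --innodb_flush_method=O_DIRECT \
  --max_allowed_packet=1G \
  --performance-schema=0
# revert all of the above (and re-enable the doublewrite buffer!)
# before returning the server to production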

Conclusions

  • There is no common solution to improve logical backup restore procedure.
  • If you have IO-bounded restore: disable InnoDB double write. It’s safe because even if the database crashes during restore, you can restart the operation.
  • Do not use SQL dumps for databases > 5-10GB; CSV files load much faster. Implement mysqldump --tab + mysqlimport, or use mydumper/myloader with an appropriate chunk-filesize.
  • The number of rows per load data infile batch is important. Usually 100K-1M, use binary search (2-3 iterations) to find a good value for your dataset.
  • InnoDB log file size and buffer pool size are really important options for backup restore performance.
  • O_DIRECT reduces insert speed, but it’s good if you can increase the buffer pool size.
  • If you have enough RAM or SSD, the restore procedure is limited by CPU. Use a faster CPU (higher frequency, turboboost).
  • Hyperthreading also counts.
  • A powerful server could be slower than your laptop (12×2.4GHz vs. 4×2.8+turboboost).
  • Even with modern hardware, it’s hard to expect backup restore faster than 50MBps (for the final size of InnoDB database).
  • You can find a lot of different advice on how to improve backup load speed. Unfortunately, it’s not possible to implement improvements blindly, and you should know the limits of your system with general Unix performance tools like vmstat, iostat and various MySQL commands like SHOW ENGINE INNODB STATUS (all can be collected together with pt-stalk).
  • Percona Monitoring and Management (PMM) also provides good graphs, but you should be careful with QAN: full slow query log during logical database dump restore can cause significant processing load.
  • Default MySQL settings could cost you a 10x backup restore slowdown.
  • This benchmark is aimed at speeding up the restore procedure while the application is not running and the server is not used in production. Make sure that you have reverted all configuration parameters back to production values after the load. For example, if you disable the InnoDB double write buffer during restore and leave it disabled in production, you may get scary data corruption due to partial InnoDB page writes.
  • If the application is running during restore, in most cases you will end up with an inconsistent database, because the restore methods discussed above do not provide locking or consistent transactions.
Apr 17, 2013

Disconnecting a replication slave is easier with MySQL 5.5+ (RESET SLAVE vs. RESET SLAVE ALL)

It’s not uncommon to promote a server from slave to master. One of the key things to protect your data integrity is to make sure that the promoted slave is permanently disconnected from its old master. If not, it may get writes from the old master, which can cause all kinds of data corruption. MySQL provides the handy RESET SLAVE command. But as we’ll see, its behavior has changed along with the MySQL versions and it’s easy to shoot yourself in the foot if you use it incorrectly. So how do you safely disconnect a replication slave?

In short

  • For MySQL 5.0 and 5.1, run STOP SLAVE, CHANGE MASTER TO MASTER_HOST='' and then RESET SLAVE.
  • For MySQL 5.5 and 5.6, run STOP SLAVE and then RESET SLAVE ALL.
  • For all versions, never set master-user, master-host or master-password in my.cnf: they can cause huge problems (and they are no longer supported from MySQL 5.5 anyway).

If you want to know more details, please read on!

MySQL 5.0/5.1

First let’s consider MySQL 5.0 and 5.1. RESET SLAVE will remove the master.info and relay-log.info files as well as all the relay log files. This looks great, but does it ensure the replica is disconnected from its master?
Let’s try:

mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> reset slave;
Query OK, 0 rows affected (0.03 sec)
mysql> show slave status\G
*************************** 1. row ***************************
             Slave_IO_State:
                Master_Host: 127.0.0.1
                Master_User: test
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File:
        Read_Master_Log_Pos: 4
             Relay_Log_File: mysql_sandbox35302-relay-bin.000001
              Relay_Log_Pos: 4
      Relay_Master_Log_File:
           Slave_IO_Running: No
          Slave_SQL_Running: No
          [...]

This is not expected: instead of removing all settings, some of them are reset to default values. This means that if you run START SLAVE (or if it’s done automatically, for instance when restarting the server without the skip-slave-start option), replication may start again. But as the master position has been deleted, replication will restart at the beginning of the first available binary log, which is very likely to corrupt your data by reexecuting some queries.

Here’s a trick to make RESET SLAVE work as expected: use CHANGE MASTER TO MASTER_HOST='':

mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> change master to master_host='';
Query OK, 0 rows affected (0.02 sec)
mysql> reset slave;
Query OK, 0 rows affected (0.04 sec)
mysql> show slave status\G
Empty set (0.00 sec)
mysql> start slave;
ERROR 1200 (HY000): The server is not configured as slave; fix in config file or with CHANGE MASTER TO

Much better! If we try to restart replication, it fails. However, I don’t like the error message, specifically the ‘fix in config file’ part. What happens if we specify the master-user, master-password, master-host and master-port in the my.cnf file?

# cat my.cnf
[...]
master-user=rsandbox
master-password=rsandbox
master-host=127.0.0.1
master-port=35301
[...]

Let’s disconnect the slave:

mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> change master to master_host='';
Query OK, 0 rows affected (0.03 sec)
mysql> reset slave;
Query OK, 0 rows affected (0.03 sec)
mysql> show slave status\G
*************************** 1. row ***************************
             Slave_IO_State:
                Master_Host: 127.0.0.1
                Master_User: rsandbox
                Master_Port: 35301
              Connect_Retry: 60
            Master_Log_File:
        Read_Master_Log_Pos: 4
             Relay_Log_File: mysql_sandbox35302-relay-bin.000001
              Relay_Log_Pos: 4
      Relay_Master_Log_File:
           Slave_IO_Running: No
          Slave_SQL_Running: No
          [...]

Connection settings are automatically restored, which makes disconnecting the replica impossible. And again, if you restart replication, it will read events from the first available binary log file on the master, which is probably not what you want. So never set master-xxx variables in my.cnf!

From MySQL 5.5

Starting with MySQL 5.5, the situation has slightly changed. First the master-xxx variables are no longer supported, which is a great improvement. But the RESET SLAVE statement also behaves differently:

mysql> stop slave;
Query OK, 0 rows affected (0,01 sec)
mysql> reset slave;
Query OK, 0 rows affected (0,11 sec)
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 18675
                Connect_Retry: 60
              Master_Log_File:
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysql_sandbox18676-relay-bin.000001
                Relay_Log_Pos: 4
        [...]

As stated in the documentation, the connection parameters are still held in memory. In any case, you will be able to restart replication, but again as no replication coordinate is specified, replication will start at the beginning of the first available binary log file, with all the nasty consequences we can imagine.

Even worse, the CHANGE MASTER TO MASTER_HOST='' trick no longer works:

mysql> stop slave;
Query OK, 0 rows affected (0,01 sec)
mysql> change master to master_host='';
ERROR 1210 (HY000): Incorrect arguments to MASTER_HOST

Fortunately, the documentation also specifies that we can use RESET SLAVE ALL to remove all replication-related configuration:

mysql> stop slave;
Query OK, 0 rows affected (0,00 sec)
mysql> reset slave all;
Query OK, 0 rows affected (0,04 sec)
mysql> show slave status\G
Empty set (0,00 sec)

Very good! The command works as expected without any additional tricks. As soon as you are aware of the difference between RESET SLAVE and RESET SLAVE ALL, disconnecting a replication slave is much easier with MySQL 5.5+.
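
To recap both procedures in one place (run the variant matching your server version; RESET SLAVE ALL does not exist before 5.5):

STOP SLAVE;

-- MySQL 5.5/5.6:
RESET SLAVE ALL;

-- MySQL 5.0/5.1 (the CHANGE MASTER trick is required first):
CHANGE MASTER TO MASTER_HOST='';
RESET SLAVE;

-- in both cases, this should now return the empty set:
SHOW SLAVE STATUS\G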

