Jan
12
2021
--

MySQL Backup and Recovery Best Practices

MySQL Backup and Recovery Best Practices

MySQL Backup and Recovery Best PracticesIn this blog, we will review all the backup and restore strategies for MySQL, the cornerstones of any application. There are a few options, depending on your topology, MySQL versions, etc. And based on that, there are some questions we need to ask ourselves to make sure we make the right choices.

How many backups we need to keep safe, or what’s the best retention policy for us?

This means the number of backups to safeguard, whether local or remote (external fileserver, cloud). The retention policy can be daily, weekly, or monthly, depending on the free space available.

What is the Recovery Time Objective?

The Recovery Time Objective (RTO) refers to the amount of time that may pass during a disruption before it exceeds the maximum allowable threshold specified in the Business Continuity Plan.

The key question related to RTO is, “How quickly must the data on this system be restored?”

What is the Recovery Point Objective?

The Recovery Point Objective (RPO) is the duration of time and service level within which a business process must be stored after a disaster in order to avoid unacceptable consequences associated with a break in continuity.

The key question related to RPO is, “How much data can we lose?”

Different Types of Backups

There are two backup types: physical and logical.

  • Physical (Percona XtraBackup, RDS/LVM Snapshots, MySQL Enterprise Backup), and also you can use cp or rsync command lines to copy the datadir as long as mysql is down/stopped.
  • Logical (mysqldump, mydumper, mysqlpump, mysql shell only for mysql 8)

Also is recommended to take a copy of binlog files, why? Well, this will help us to recover until the last transaction.

Why are backups needed?

Backups are needed in case of multiple problems:

  • Host Failure: We can get multiple problems from disks stalled or broken disks. Also from cloud services, our DB instance can be broken and it’s non-accessible.
  • Corrupted Data: This can happen on a power outage, MySQL wasn’t able to write correctly and close the file, sometimes when MySQL starts again it cannot start due to corrupted data and the crash recovery process cannot fix it.
  • Inconsistent Data: When a human mistake, delete/update erroneous data over the primary or replica node.
  • DataCenter Failure: power outage or internet provider issues.
  • Legislation/Regulation: provide consistent business value and customer satisfaction.

Now let me explain those different types of backups mentioned above, but before I continue, it’s important to configure a new and dedicated replica node for backups purposes, due to the high CPU load to avoid any issue on any other replica node (AKA backup server).

Logical Backup

This is a dump from logical database structure (CREATE DATABASE, CREATE TABLE statements) and content (INSERT statements). This is recommended to be used against smaller amounts of data. The disadvantage of this method is slower (backup and restore) if you compare it with physical backups. Using mydumper you can backup and restore a single database or a single table if it’s needed, and this is useful to copy some data to a different environment to run tests. Also, mydumper can take a consistent (as long as all the tables are InnoDB engine) backup and provides accurate master and slave log positions.

The output is larger than for physical backup, particularly when saved in text format, but it can be compressed on the fly depending on the software you are using. Mydumper can compress and mysqldump needs to add a pipe to redirect the output to gzip, for example.

Logical backups are used to address data corruption or the need to restore a subset of tables.

Physical (Raw) Backup

In short, this consists of exact copies of database directories and files. This can be a copy for all or a part from MySQL datadir directory. This kind of backup is most used to restore or create a new replica node easily and quickly and is used to address host failure. It’s recommended to restore using the same MySQL version. I recommend using Percona XtraBackup because it can include any related files such as configuration files like cnf config files.

Snapshot Backups

Some file system implementations enable “snapshots” to be taken. These provide logical copies of the file system at a given point in time, without requiring a physical copy of the entire file system. MySQL itself does not provide the capability for taking file system snapshots but it is available using third-party solutions such as LVM or ZFS.

The disadvantage is that sometimes physical backups do not compress much, because data is usually in a binary format and sometimes the table is already compressed.

Binary Log Backups

Binlog backups specifically address RPO. Binary log files contain records of each SQL query executed that made changes.

From MySQL 5.6 on, you can use mysqlbinlog to stream binary logs from a remote server. You can combine binlog backups with Percona XtraBackup or mydumper backup to allow restoration up to the end of the most-recently-backed-up binary log.

Incremental / Differential Backups

An incremental backup is a backup of everything that has changed since the last backup (a binary log backup is a special case of an incremental backup). This is a very good option if the dataset size is huge, as you can take a full backup at the beginning of the week and run incremental backups per day. Also, the backup size is smaller than the full backup.

The main risks associated with incremental backups are:

– A single corrupt incremental backup may invalidate all the others

– Incremental backups typically negatively affect the RTO

For a differential backup, it copies the differences from your last backup, and the advantage is that a lot of data does not change from one backup to the next, so the result can be significantly smaller backups. This saves disk space.

Percona XtraBackup supports both incremental and differential backups.

Offsite Storage

It’s highly recommended to copy all the backup methods to another place, like the cloud or an external file server, so in case of host failure or data center failure, you have another copy.

Not all the backup files need to be uploaded to the cloud, sometimes the time you need to spend in the download is bigger than the time consumed in the recovery process.

A good approach is to keep 1-7 days locally on the backup server in case a fast recovery is needed, and this depends on your business regulations.

Encryption

Backups have sensitive data, so it’s highly recommended to encrypt, especially for offsite storage. This adds more time when you need to restore a backup but it keeps your data safe.

GPG is a good option to encrypt backups, and if you use this option or some other alternative, don’t forget to get a copy of the keys/passphrase. If you lose it, your backups will be useless.

Restore Testing

Depending on your business, it’s highly recommended to test your backups at least once per month. This action validates your backups are not corrupted and it provides critical metrics on recovery time. This process should be automated to get the full backup, restore it, and finally configure this server as a replica from the current primary or another replica. This is good as well to validate that the replication process has no errors.

Many customers are using this methodology to refresh their QA/STG environment to have fresh data from production backups.

In addition to the above, it is recommended to create a manual or automated restore documentation process to keep all the steps together, so in case of disaster, you can follow it without wasting time.

Retention Requirements

Last but not least, it is very important to keep multiple copies of different backup types.

Our best recommendation is:

  • One or two physical backups locally on the backup server (as long as space allows it).
  • Seven daily and four weekly logical backups locally on the backup server.
  • 30 days of binlog backups locally on the backup server.
  • For offsite backups (like S3, Google Cloud, etc.), keep monthly backups for one year or more.

For local backups, keep in mind you will need a minimum of 2.5 times the current dataset size as free disk space to save/meet these retention policies. Don’t forget to encrypt all the backup types!

Legal or regulatory requirements may also dictate how long data must be archived.

Percona Can Help

Percona can help you choose, implement, and optimize the most appropriate MySQL backup and recovery solution for your MySQL ecosystem. If your current solution unexpectedly fails, we can facilitate your recovery with onsite, remote, or emergency consulting services. We can also help you take steps to prevent another occurrence. Every situation is unique and we will work with you to create the most effective solution for your business.

Contact Us

Nov
06
2020
--

Various Backup Compression Methods Using Mysqlpump

Backup Compression Methods Using Mysqlpump

Backup Compression Methods Using MysqlpumpMysqlpump is a client program that was released with MySQL 5.7.8 and is used to perform logical backups in a better way. Mysqlpump supports parallelism and it has the capability of creating compressed output. Pablo already wrote a blog about this utility (The mysqlpump Utility), and in this blog, I am going to explore the available compression techniques in the Mysqlpump utility.

Overview

Mysqlpump has three options to perform the compression backup.

–compress: Used to compress all the information sent between client and server.

–compression-algorithm: It was added in MySQL 8.0.18. Used to define the compression algorithm for all incoming connections to the server. (available options: zlib, zstd, uncompressed )

–compress-output: Used to define the compression algorithm for the backup file (available options: lz4, zlib)

Here, “–compress-output” is the option used to define the compression algorithm for the backup file. Which has two algorithms.

  • Lz4
  • Zlib

Lz4: LZ4 is a lossless data compression algorithm that is focused on compression and decompression speed.

Zlib: zlib is a software library used for data compression. zlib compressed data are typically written with a gzip or a zlib wrapper. 

Lab Setup

To experiment with both compression techniques, I have installed the MySQL (8.0.22) server in my local environment. I also created the table “percona_test.mp_test” which has an 11GB size.

[root@localhost]# mysql -e "select @@version, @@version_comment\G"
*************************** 1. row ***************************
        @@version: 8.0.22
@@version_comment: MySQL Community Server - GPL

[root@localhost]# mysql -e "select count(*) from percona_test.mp_test\G"
*************************** 1. row ***************************
count(*): 70698024

[root@localhost percona_test]# ls -lrth
total 11G
-rw-r-----. 1 mysql mysql 11G Oct 23 11:20 mp_test.ibd

Now, I am going to experiment with both compression algorithms.

Compression with Lz4

I am going to take the backup (table: mp_test) using the lz4 compression algorithm.

[root@localhost]# time mysqlpump --set-gtid-purged=off --compress --compress-output=lz4 percona_test mp_test > percona_test.mp_test.lz4
Dump progress: 0/1 tables, 250/70131715 rows
Dump progress: 0/1 tables, 133000/70131715 rows
Dump progress: 0/1 tables, 278500/70131715 rows
...
...
Dump progress: 0/1 tables, 70624000/70131715 rows
Dump completed in 540824
real 9m0.857s

It took 9.1 minutes to complete. And, the file size is 1.1 GB, looks like 10x compression.

[root@dc1 percona_test]# ls -lrth | grep lz4

-rw-r--r--. 1 root  root  1.1G Oct 23 12:47 percona_test.mp_test.lz4

Compression with Zlib

Now, I am going to start the backup with “zlib” algorithm.

[root@dc1]# time mysqlpump --set-gtid-purged=off --compress --compress-output=zlib percona_test mp_test > percona_test.mp_test.zlib
Dump progress: 0/1 tables, 250/70131715 rows
Dump progress: 0/1 tables, 133250/70131715 rows
Dump progress: 0/1 tables, 280250/70131715 rows
Dump progress: 0/1 tables, 428750/70131715 rows
...
...
Dump progress: 0/1 tables, 70627000/70131715 rows
Dump completed in 546249
real 10m6.436s

It took 10.6 minutes to complete the process. And the file size is the same 1.1 GB (10x compression).

[root@dc1]# ls -lrth | grep -i zlib

-rw-r--r--. 1 root  root  1.1G Oct 23 13:06 percona_test.mp_test.zlib

 

 

How to Decompress the Backup

MySQL community provides two utilities to decompress the backups.

  • zlib_decompress ( for zlib compression files )
  • lz4_decompress ( for lz4 compression files )

lz4_decompress

[root@dc1]# time lz4_decompress percona_test.mp_test.lz4 percona_test.mp_test.sql
real 0m45.287s
user 0m1.114s
sys 0m6.568s
[root@dc1]# ls -lrth | grep percona_test.mp_test.sql
-rw-r--r--. 1 root  root  9.1G Oct 23 13:30 percona_test.mp_test.sql

lz4 took 45 seconds to decompress the backup file.

zlib_decompress

[root@dc1]# time zlib_decompress percona_test.mp_test.zlib percona_test.mp_test.sql
real 0m35.553s
user 0m6.642s
sys 0m7.105s
[root@dc1]# ls -lrth | grep percona_test.mp_test.sql
-rw-r--r--. 1 root  root  9.1G Oct 23 13:49 percona_test.mp_test.sql

zlib took 36 seconds to decompress the backup file.

This is the procedure we have to compress/decompress the backups with Mysqlpump. It seems both the algorithms provide the 10x compression. Also, there is not much difference in the execution time as well, but it may be the big one with a large dataset. 

Aug
07
2019
--

Percona XtraBackup 8.0.7 Is Now Available

Percona XtraBackup

Percona XtraBackupPercona is glad to announce the release of Percona XtraBackup 8.0.7 on August 7, 2019. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

In release 8.0.7, Percona XtraBackup enables making backups of databases that contain the encrypted system tablespace. Encrypted mysql tablespace is now also supported.

Percona XtraBackup 8.0.7 implements the support of the lz4 compression algorithm so that you could make compressed backups using lz4 (--compress=lz4) in addition to the default quicklz method.

All Percona software is open-source and free.

New Features and Improvements

  • Add support of the system tablespace encryption. More information in PXB-1649
  • Implemented the support of the lz4 compression algorithm. More information in PXB-1857.

Bugs Fixed

  • When the encrypted tablespaces feature was enabled, encrypted and compressed tables were not usable on the joiner node (Percona XtraDB Cluster) via SST (State Snapshot Transfer) with the xtrabackup-v2 method. Bug fixed PXB-1867.
  • xbcloud did not update date related fields of the HTTP header when retrying a request. Bug fixed PXB-1874.
  • xbcloud did not retry to send the request after receiving the HTTP 408 error (request timeout). Bug fixed PXB-1875.
  • xtrabackup did not accept decimal fractions as values of the innodb_max_dirty_pages_pct option. Bug fixed PXB-1807.
  • If the user tried to merge an already prepared incremental backup, a misleading error was produced without informing that incremental backups may not be used twice. Bug fixed PXB-1862.

Other bugs fixed: PXB-1493, PXB-1557, PXB-1887, PXB-1870, PXB-1879, PXB-1901.

Release notes with all the improvements for version 8.0.7 are available in our online documentation. Please report any bugs to the issue tracker.

Jul
10
2019
--

Percona XtraBackup 2.4.15 Is Now Available

Percona XtraBackup 8.0

Percona XtraBackupPercona is glad to announce the release of Percona XtraBackup 2.4.15 on July 10, 2019. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

All Percona software is open-source and free.

Bugs Fixed

  • When the encrypted tablespaces feature was enabled, encrypted and compressed
    tables were not usable on the joiner node (Percona XtraDB Cluster) via SST
    (State Snapshot Transfer) with the xtrabackup-v2 method. Bug fixed PXB-1867.
  • xbcloud did not update date related fields of the HTTP
    header when retrying a request. Bug fixed PXB-1874.
  • xbcloud did not retry to send the request after receiving the HTTP 408
    error (request timeout). Bug fixed PXB-1875.
  • If the user tried to merge an already prepared incremental backup, a
    misleading error was produced without informing that incremental backups may
    not be used twice. Bug fixed PXB-1862.
  • xbcloud could crash with the Swift storage when project options were
    not included. Bug fixed PXB-1844.
  • xtrabackup did not accept decimal fractions as values of the
    innodb_max_dirty_pages_pct option. Bug fixed PXB-1807.

Other bugs fixed:  PXB-1850, PXB-1879, PXB-1887, PXB-1888, PXB-1890.

Release notes with all the improvements for Percona XtraBackup 2.4.15 are available in our online documentation. Please report any bugs to the issue tracker.

May
09
2019
--

Percona XtraBackup 8.0.6 Is Now Available

Percona XtraBackup 8.0

Percona XtraBackupPercona is glad to announce the release of Percona XtraBackup 8.0.6 on May 9, 2019. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

In version 8.0.6, Percona XtraBackup introduces the support of the MyRocks storage engine with Percona Server for MySQL version 8.0.15-6 or higher.

Percona XtraBackup 8.0.6 enables saving backups to an Amazon S3, MinIO, and Google Cloud Storage (using interoperability mode) when using xbcloud. The following example demonstrates how to use an Amazon S3 storage to make a full backup:

$ xtrabackup --backup --stream=xbstream --extra-lsndir=/tmp --target-dir=/tmp | \
 xbcloud put --storage=s3 \
 --s3-endpoint='s3.amazonaws.com' \
 --s3-access-key='YOUR-ACCESSKEYID' \
 --s3-secret-key='YOUR-SECRETACCESSKEY' \
 --s3-bucket='mysql_backups'
 --parallel=10 \
 ${date -I}-full_backup

All Percona software is open-source and free.

New Features

  • Amazon S3 is now supported in xbcloud. More information in PXB-1813.
  • The MyRocks storage engine is now supported with Percona XtraBackup. More information in PXB-1754.

Bugs Fixed

  • Percona XtraBackup could fail to restore the undo tablespace created during or before incremental backup. Bug fixed PXB-1780.
  • A backup could fail if log_bin_index was defined in my.cnf. Bug fixed PXB-1801.
  • When the row format was changed during the backup, xtrabackup could crash during the incremental prepare stage. Bug fixed PXB-1824.
  • During the prepare phase, PXB could freeze and never finish execution. Bug fixed PXB-1819.
  • Percona XtraBackup could crash during the prepare stage when making a backup of a host running MySQL Server v8.0.16. Bug fixed PXB-1839.

Other bugs fixed: PXB-1809PXB-1810PXB-1832PXB-1837.

Release notes with all the improvements for version 8.0.6 are available in our online documentation. Please report any bugs to the issue tracker.

May
01
2019
--

Percona XtraBackup 2.4.14 Is Now Available

Percona XtraBackup 8.0

Percona XtraBackupPercona is glad to announce the release of Percona XtraBackup 2.4.14 on May 1, 2019. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

Percona XtraBackup 2.4.14 enables saving backups to an Amazon S3, MinIO, and Google Cloud Storage (using interoperability mode) when using xbcloud. The following example demonstrates how to use an Amazon S3 storage to make a full backup:

$ xtrabackup --backup --stream=xbstream --extra-lsndir=/tmp --target-dir=/tmp | \
 xbcloud put --storage=s3 \
 --s3-endpoint='s3.amazonaws.com' \
 --s3-access-key='YOUR-ACCESSKEYID' \
 --s3-secret-key='YOUR-SECRETACCESSKEY' \
 --s3-bucket='mysql_backups'
 --parallel=10 \
 ${date -I}-full_backup

All Percona software is open-source and free.

New Features

  • Amazon S3 is now supported in xbcloud. More information in PXB-1813.

Bugs Fixed

  • When the row format was changed during the backup, xtrabackup could crash
    during the incremental prepare stage. Bug fixed PXB-1824.
  • If compressed InnoDB undo tablespaces were not removed beforehand, the
    incremental backup could crash at the prepare stage. Bug fixed PXB-1552.

Other bugs fixed:  PXB-1771, PXB-1809, PXB-1837.

Release notes with all the improvements for version 2.4.14 are available in our online documentation. Please report any bugs to the issue tracker.

Feb
26
2019
--

Percona XtraBackup Now Supports Dump of InnoDB Buffer Pool

percona-xtra-backup buffer pool restore

percona-xtra-backup buffer pool restoreInnoDB keeps hot data in memory on its buffer named InnoDB Buffer Pool. For a long time, when a MySQL instance needed to bounce, this hot cached data was lost and the instance required a warm-up period to perform as well as it did before the service restart.

That is not the case anymore. Newer versions of MySQL/MariaDB allow users to save the state of this buffer by dumping tablespace ID’s and page ID’s to a file on disk that will be loaded automatically on startup, making the newly started server buffer pool as it was prior the restart.

Details about the MySQL implementation can be found at https://dev.mysql.com/doc/refman/5.7/en/innodb-preload-buffer-pool.html

With that in mind, Percona XtraBackup versions 2.4.13 can now instruct MySQL to dump the content of buffer pool while taking a backup. This means you can restore the backup on a new server and make MySQL perform just like the other instance in terms of InnoDB Buffer Pool data.

How it works

The buffer pool dump happens at the beginning of backup if --dump-innodb-buffer-pool is set.

The user can choose to change the default innodb_buffer_pool_dump_pct. If --dump-innodb-buffer-pool-pct is set, it stores the current MySQL innodb_buffer_pool_dump_pct value, then it changes it to the desired percentage. After the end of the backup, original values is restored back.

The actual file copy happens at the end of the backup.

Percona XtraDB Cluster

A very good use case is PXC/Galera. When a node initiates SST, we would like the joiner to have a copy of InnoDB Buffer Pool from the donor. We can configure PXC nodes to do that:

[xtrabackup]
dump-innodb-buffer-pool
dump-innodb-buffer-pool-pct=100

Here is an example of a PXC node that just received SST:

Before PXB-1548:

[root@marcelo-altmann-pxb-pxc-3 ~]# systemctl stop mysql && rm -rf /var/lib/mysql/* && systemctl start mysql && mysql -psekret -e "SHOW ENGINE INNODB STATUS\G" | grep 'Database pages'
mysql: [Warning] Using a password on the command line interface can be insecure.
Database pages 311

Joiner started with a cold buffer pool.

After adding dump-innodb-buffer-pool and dump-innodb-buffer-pool-pct=100 to my.cnf :

[root@marcelo-altmann-pxb-pxc-3 ~]# systemctl stop mysql && rm -rf /var/lib/mysql/* && systemctl start mysql && mysql -psekret -e "SHOW ENGINE INNODB STATUS\G" | grep 'Database pages'
mysql: [Warning] Using a password on the command line interface can be insecure.
Database pages 30970

Joiner started with a copy of the buffer pool from the donor, which will reduce the joiner warm-up period.

Conclusion

The new version of Percona XtraBackup can help to minimize the time a newly restored backup will take to perform like source server


Photo by Jametlene Reskp on Unsplash

Jan
18
2019
--

Percona XtraBackup 2.4.13 Is Now Available

Percona XtraBackup 8.0

Percona XtraBackupPercona is glad to announce the release of Percona XtraBackup 2.4.13 on January 18, 2018. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

New features and improvements:

  • PXB-1548: Percona XtraBackup enables updating the ib_buffer_pool file with the latest pages present in the buffer pool using the --dump-innodb-buffer-pool option. Thanks to Marcelo Altmann for contribution.

Bugs fixed

  • xtrabackup did not delete missing tables from the partial backup which led to error messages logged by the server on startup. Bug fixed PXB-1536.
  • The --history option did not work when autocommit was disabled. Bug fixed PXB-1569.
  • xtrabackup could fail to backup encrypted tablespace when it was recently created or altered. Bug fixed PXB-1648.
  • When the --throttle option was used, the applied value was different from the one specified by the user (off by one error). Bug fixed PXB-1668.
  • It was not allowed for MTS (multi-threaded slaves) without GTID to be backed up with --safe-slave-backup. Bug fixed PXB-1672.
  • Percona Xtrabackup could crash when the ALTER TABLE … TRUNCATE PARTITION command was run during a backup without locking DDL. Bug fixed PXB-1679.
  • xbcrypt could display an assertion failure and generated core if the required parameters are missing. Bug fixed PXB-1683.
  • Using --lock-ddl-per-table caused the server to scan all records of partitioned tables which could lead to the “out of memory error”. Bugs fixed PXB-1691 and PXB-1698.
  • xtrabackup --prepare could hang while performing insert buffer merge. Bug fixed PXB-1704.
  • Incremental backups did not update xtrabackup_binlog_info with --binlog-info=lockless. Bug fixed PXB-1711

Other bugs fixed:  PXB-1570PXB-1609PXB-1632

Release notes with all the improvements for version 2.4.13 are available in our online documentation. Please report any bugs to the issue tracker.

Jan
16
2019
--

AWS launches Backup, a fully managed backup service for AWS

Amazon’s AWS cloud computing service today launched Backup, a new tool that makes it easier for developers on the platform to back up their data from various AWS services and their on-premises apps. Out of the box, the service, which is now available to all developers, lets you set up backup policies for services like Amazon EBS volumes, RDS databases, DynamoDB tables, EFS file systems and AWS Storage Gateway volumes. Support for more services is planned, too. To back up on-premises data, businesses can use the AWS Storage Gateway.

The service allows users to define their various backup policies and retention periods, including the ability to move backups to cold storage (for EFS data) or delete them completely after a certain time. By default, the data is stored in Amazon S3 buckets.

Most of the supported services, except for EFS file systems, already feature the ability to create snapshots. Backup essentially automates that process and creates rules around it, so it’s no surprise that pricing for Backup is the same as for using those snapshot features (with the exception of the file system backup, which will have a per-GB charge). It’s worth noting that you’ll also pay a per-GB fee for restoring data from EFS file systems and DynamoDB backups.

Currently, Backup’s scope is limited to a given AWS region, but the company says that it plans to offer cross-region functionality later this year.

“As the cloud has become the default choice for customers of all sizes, it has attracted two distinct types of builders,” writes Bill Vass, AWS’s VP of Storage, Automation, and Management Services. “Some are tinkerers who want to tweak and fine-tune the full range of AWS services into a desired architecture, and other builders are drawn to the same breadth and depth of functionality in AWS, but are willing to trade some of the service granularity to start at a higher abstraction layer, so they can build even faster. We designed AWS Backup for this second type of builder who has told us that they want one place to go for backups versus having to do it across multiple, individual services.”

Early adopters of AWS Backup are State Street Corporation, Smile Brands and Rackspace, though this is surely a service that will attract its fair share of users as it makes the life of admins quite a bit easier. AWS does have quite a few backup and storage partners, though, who may not be all that excited to see AWS jump into this market, too — though they often offer a wider range of functionality than AWS’s service, including cross-region and offsite backups.

 

Dec
21
2018
--

Backup and Restore a PostgreSQL Cluster With Multiple Tablespaces Using pg_basebackup

PostgreSQL backup cluster multiple tablespaces

PostgreSQL logopg_basebackup is a widely used PostgreSQL backup tool that allows us to take an ONLINE and CONSISTENT file system level backup. These backups can be used for point-in-time-recovery or to set up a slave/standby. You may want to refer to our previous blog posts, PostgreSQL Backup StrategyStreaming Replication in PostgreSQL and Faster PITR in PostgreSQL where we describe how we used pg_basebackup for different purposes. In this post, I’ll demonstrate the steps to restore a backup taken using pg_basebackup when we have many tablespaces that store databases or their underlying objects.

A simple backup can be taken using the following syntax.

Tar and Compressed Format
$ pg_basebackup -h localhost -p 5432 -U postgres -D /backupdir/latest_backup -Ft -z -Xs -P
Plain Format
$ pg_basebackup -h localhost -p 5432 -U postgres -D /backupdir/latest_backup -Fp -Xs -P

Using a tar and compressed format is advantageous when you wish to use less disk space to backup and store all tablespaces, data directory and WAL segments, with everything in just one directory (target directory for backup).

Whereas a plain format stores a copy of the data directory as is, in the target directory. When you have one or more non-default tablespaces, tablespaces may be stored in a separate directory. This is usually the same as the original location, unless you use

--tablespace-mapping

  to modify the destination for storing the tablespaces backup.

PostgreSQL supports the concept of tablespaces. In simple words, a tablespace helps us maintain multiple locations to scatter databases or their objects. In this way, we can distribute the IO and balance the load across multiple disks.

To understand what happens when we backup a PostgreSQL cluster that contains multiple tablespaces, let’s consider the following example. We’ll take these steps:

  • Create two tablespaces in an existing master-slave replication setup.
  • Take a backup and see what is inside the backup directory.
  • Restore the backup.
  • Conclude our findings

Create 2 tablespaces and take a backup (tar format) using pg_basebackup

Step 1 :

I set up a replication cluster using PostgreSQL 11.2. You can refer to our blog post Streaming Replication in PostgreSQL to reproduce the same scenario. Here are the steps used to create two tablespaces:

$ sudo mkdir /data_pgbench
$ sudo mkdir /data_pgtest
$ psql -c "CREATE TABLESPACE data_pgbench LOCATION '/data_pgbench'"
$ psql -c "CREATE TABLESPACE data_pgtest LOCATION '/data_pgtest'"
$ psql -c "select oid, spcname, pg_tablespace_location(oid) from pg_tablespace"
oid | spcname | pg_tablespace_location
-------+--------------+------------------------
1663 | pg_default |
1664 | pg_global |
16419 | data_pgbench | /data_pgbench
16420 | data_pgtest | /data_pgtest
(4 rows)

Step 2 :

Now, I create two databases in two different tablespaces, using pgbench to create a few tables and load some data in them.

$ psql -c "CREATE DATABASE pgbench TABLESPACE data_pgbench"
$ psql -c "CREATE DATABASE pgtest TABLESPACE data_pgtest"
$ pgbench -i pgbench
$ pgbench -i pgtest

In a master-slave setup built using streaming replication, you must ensure that the directories exist in the slave, before running a

"CREATE TABLESPACE ..."

  on the master. This is because, the same statements used to create a tablespace are shipped/applied to the slave through WALs – this is unavoidable. The slave crashes with the following message, when these directories do not exist:

2018-12-15 12:00:56.319 UTC [13121] LOG: consistent recovery state reached at 0/80000F8
2018-12-15 12:00:56.319 UTC [13119] LOG: database system is ready to accept read only connections
2018-12-15 12:00:56.327 UTC [13125] LOG: started streaming WAL from primary at 0/9000000 on timeline 1
2018-12-15 12:26:36.310 UTC [13121] FATAL: directory "/data_pgbench" does not exist
2018-12-15 12:26:36.310 UTC [13121] HINT: Create this directory for the tablespace before restarting the server.
2018-12-15 12:26:36.310 UTC [13121] CONTEXT: WAL redo at 0/9000448 for Tablespace/CREATE: 16417 "/data_pgbench"
2018-12-15 12:26:36.311 UTC [13119] LOG: startup process (PID 13121) exited with exit code 1
2018-12-15 12:26:36.311 UTC [13119] LOG: terminating any other active server processes
2018-12-15 12:26:36.314 UTC [13119] LOG: database system is shut down
2018-12-15 12:27:01.906 UTC [13147] LOG: database system was interrupted while in recovery at log time 2018-12-15 12:06:13 UTC
2018-12-15 12:27:01.906 UTC [13147] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

Step 3 :

Let’s now use pg_basebackup to take a backup. In this example, I use a tar format backup.

$ pg_basebackup -h localhost -p 5432 -U postgres -D /backup/latest_backup -Ft -z -Xs -P
94390/94390 kB (100%), 3/3 tablespaces

In the above log, you could see that there are three tablespaces that have been backed up: one default, and two newly created tablespaces. If we go back and check how the data in the two tablespaces are distributed to appropriate directories, we see that there are symbolic links created inside the pg_tblspc directory (within the data directory) for the oid’s of both tablespaces. These links are directed to the actual location of the tablespaces, we specified in Step 1.

$ ls -l $PGDATA/pg_tblspc
total 0
lrwxrwxrwx. 1 postgres postgres 5 Dec 15 12:31 16419 -> /data_pgbench
lrwxrwxrwx. 1 postgres postgres 6 Dec 15 12:31 16420 -> /data_pgtest

Step 4 :

Here are the contents inside the backup directory, that was generated through the backup taken in Step 3.

$ ls -l /backup/latest_backup
total 8520
-rw-------. 1 postgres postgres 1791930 Dec 15 12:54 16419.tar.gz
-rw-------. 1 postgres postgres 1791953 Dec 15 12:54 16420.tar.gz
-rw-------. 1 postgres postgres 5113532 Dec 15 12:54 base.tar.gz
-rw-------. 1 postgres postgres 17097 Dec 15 12:54 pg_wal.tar.gz

Tar Files :

16419.tar.gz

 and

16420.tar.gz

 are created as a backup for the two tablespaces. These are created with the same names as the OIDs of their respective tablespaces.

Let’s now take a look how we can restore this backup to completely different locations for data and tablespaces.

Restore a backup with multiple tablespaces

Step 1 :

In order to proceed further with the restore, let’s first extract the base.tar.gz file. This file contains some important files that help us to proceed further.

$ tar xzf /backup/latest_backup/base.tar.gz -C /pgdata
$ ls -larth /pgdata
total 76K
drwx------. 2 postgres postgres 18 Dec 14 14:15 pg_xact
-rw-------. 1 postgres postgres 3 Dec 14 14:15 PG_VERSION
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_twophase
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_subtrans
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_snapshots
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_serial
drwx------. 4 postgres postgres 36 Dec 14 14:15 pg_multixact
-rw-------. 1 postgres postgres 1.6K Dec 14 14:15 pg_ident.conf
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_dynshmem
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_commit_ts
drwx------. 6 postgres postgres 54 Dec 14 14:18 base
-rw-------. 1 postgres postgres 4.5K Dec 14 16:16 pg_hba.conf
-rw-------. 1 postgres postgres 208 Dec 14 16:18 postgresql.auto.conf
drwx------. 2 postgres postgres 6 Dec 14 16:18 pg_stat
drwx------. 2 postgres postgres 58 Dec 15 00:00 log
drwx------. 2 postgres postgres 6 Dec 15 12:54 pg_stat_tmp
drwx------. 2 postgres postgres 6 Dec 15 12:54 pg_replslot
drwx------. 4 postgres postgres 68 Dec 15 12:54 pg_logical
-rw-------. 1 postgres postgres 224 Dec 15 12:54 backup_label
drwx------. 3 postgres postgres 28 Dec 15 12:57 pg_wal
drwx------. 2 postgres postgres 4.0K Dec 15 12:57 global
drwx------. 2 postgres postgres 32 Dec 15 13:01 pg_tblspc
-rw-------. 1 postgres postgres 55 Dec 15 13:01 tablespace_map
-rw-------. 1 postgres postgres 24K Dec 15 13:04 postgresql.conf
-rw-r--r--. 1 postgres postgres 64 Dec 15 13:07 recovery.conf
-rw-------. 1 postgres postgres 44 Dec 15 13:07 postmaster.opts
drwx------. 2 postgres postgres 18 Dec 15 13:07 pg_notify
-rw-------. 1 postgres postgres 30 Dec 15 13:07 current_logfiles

Step 2 :

The files that we need to consider for our recovery are :

  • backup_label
  • tablespace_map

When you open the backup_label file, we see the start WAL location, backup start time, etc. These are some details that help us perform a point-in-time-recovery.

$ cat backup_label
START WAL LOCATION: 0/B000028 (file 00000001000000000000000B)
CHECKPOINT LOCATION: 0/B000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2018-12-15 12:54:10 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1

Now, let us see what is inside the

tablespace_map

 file.

$ cat tablespace_map
16419 /data_pgbench
16420 /data_pgtest

In the above log, you could see that there are two entries – one for each tablespace. This is a file that maps a tablespace (oid) to its location. When you start PostgreSQL after extracting the tablespace and WAL tar files, symbolic links are created automatically by postgres – inside the pg_tblspc directory for each tablespace – to the appropriate tablespace location, using the mapping done in this files.

Step 3 :

Now, in order to restore this backup in the same postgres server from where the backup was taken, you must remove the existing data in the original tablespace directories. This allows you to extract the tar files of each tablespaces to the appropriate tablespace locations.

The actual commands for extracting tablespaces from the backup in this case were the following:

$ tar xzf 16419.tar.gz -C /data_pgbench (Original tablespace location)
$ tar xzf 16420.tar.gz -C /data_pgtest  (Original tablespace location)

In a scenario where you want to restore the backup to the same machine from where the backup was originally taken, we must use different locations while extracting the data directory and tablespaces from the backup. In order to achieve that, tar files for individual tablespaces may be extracted to different directories than the original directories specified in

tablespace_map

 file, upon which we can modify the

tablespace_map

 file with the new tablespace locations. The next two steps should help you to see how this works.

Step 3a :

Create two different directories and extract the tablespaces to them.

$ tar xzf 16419.tar.gz -C /pgdata_pgbench (Different location for tablespace than original)
$ tar xzf 16420.tar.gz -C /pgdata_pgtest  (Different location for tablespace than original)

Step 3b :

Edit the

tablespace_map

 file with the new tablespace locations. Replace the original location of each tablespace with the new location, where we have extracted the tablespaces in the previous step. Here is how it appears after the edit.

$ cat tablespace_map
16419 /pgdata_pgbench
16420 /pgdata_pgtest

Step 4 :

Extract pg_wal.tar.gz from backup to pg_wal directory of the new data directory.

$ tar xzf pg_wal.tar.gz -C /pgdata/pg_wal

Step 5 :

Create

recovery.conf

 to specify the time until when you wish to perform a point-in-time-recovery. Please refer to our previous blog post – Step 3, to understand recovery.conf for PITR in detail.

Step 6 :

Once all of the steps above are complete you can start PostgreSQL.
You should see the following files renamed after recovery.

backup_label   --> backup_label.old
tablespace_map --> tablespace_map.old
recovery.conf  --> recovery.done

To avoid the exercise of manually modifying the tablespace_map file, you can use

--tablespace-mapping

 . This is an option that works when you use a plain format backup, but not with tar. Let’s see why you may prefer a tar format when compared to plain.

Backup of PostgreSQL cluster with tablespaces using plain format

Consider the same scenario where you have a PostgreSQL cluster with two tablespaces. You might see the following error when you do not use

--tablespace-mapping

 .

$ pg_basebackup -h localhost -p 5432 -U postgres -D /backup/latest_backup -Fp -Xs -P -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/22000028 on timeline 1
pg_basebackup: directory "/data_pgbench" exists but is not empty
pg_basebackup: removing contents of data directory "/backup/latest_backup"

What the above error means is that the pg_basebackup is trying to store the tablespaces in the same location as the original tablespace directory. Here

/data_pgbench

 is the location of tablespace :

data_pgbench.

 And, now, pg_basebackup is trying to store the tablespace backup in the same location. In order to overcome this error, you can apply tablespace mapping using the following syntax.

$ pg_basebackup -h localhost -p 5432 -U postgres -D /backup/latest_backup -T "/data_pgbench=/pgdata_pgbench" -T "/data_pgtest=/pgdata_pgtest" -Fp -Xs -P

-T

 is used to specify the tablespace mapping.

-T

 can be replaced by

--tablespace-mapping

.

The advantage of using -T (

--tablespace-mapping

 ) is that the tablespaces are stored separately in the mapping directories. In this example with plain format backup, you must extract all the following three directories in order to restore/recover the database using backup.

  • /backup/latest_backup
  • /pgdata_pgtest
  • /pgdata_pgbench

However, you do not need a

tablespace_map

  file in this scenario, as it is automatically taken care of by PostgreSQL.
If you take a backup in tar format, you see all the tar files for base, tablespaces and WAL segments stored in the same backup directory, and just this directory can be extracted for performing restore/recovery. However, you must manually extract the tablespaces and WAL segments to appropriate locations and edit the tablespace_map file, as discussed above.


Image based on Photos by Alan James Hendry on Unsplash   and  Tanner Boriack on Unsplash

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com