Sep
18
2020
--

MongoDB Backup Best Practices

In this blog, we will discuss different backup strategies for MongoDB and their use cases, along with the pros and cons of each.

Why Take Backups?

Regular database backups are a crucial safeguard against unintended data loss. Whether the cause is hardware failure, a natural disaster, or malicious activity, lost data is gone for good unless you have a backup to restore from.

Generally, there are two types of backups used with database technologies like MongoDB:

  • Logical Backups
  • Physical Backups

Additionally, incremental backups (a variant of logical backups) capture the deltas, or data changes, made between full backups to minimize data loss in case of a disaster. We will discuss both backup options, how to take them, and which one suits your requirements and environment setup better.

Logical Backups

These are backups where data is dumped from the databases into backup files. With MongoDB, a logical backup means dumping the data into BSON-formatted files.

During a logical backup, the client API reads the data from the server, serializes it, and writes it to ".bson", ".json", or ".csv" backup files on disk, depending on the backup utility used.

MongoDB offers the below utility to take logical backups:

Mongodump: Takes a dump/backup of the databases in ".bson" format, which can later be restored by reinserting the documents captured in the dump files back into the databases.

mongodump --host=mongodb1.example.net --port=27017 --username=user --authenticationDatabase=admin --db=demo --collection=events --out=/opt/backup/mongodump-2011-10-24

Note: If we don't specify the database name or collection name explicitly in the above "mongodump" syntax, the backup covers all databases or all collections of the specified database, respectively. If "authorization" is enabled, we must also specify the "authenticationDatabase".

Also, use "--oplog" to capture the incremental changes written while the backup is still running. Keep in mind that "--oplog" cannot be combined with --db or --collection, since it only works for backups of the entire instance.

mongodump --host=mongodb1.example.net --port=27017 --username=user --authenticationDatabase=admin --oplog --out=/opt/backup/mongodump-2011-10-24

Pros:

  1. It can take the backup at a more granular level, such as a specific database or a collection, which is helpful during restoration.
  2. It does not require halting writes on the node where the backup runs, so that node remains available for other operations.

Cons:

  1. Since it reads all the data, it can be slow, and it requires disk reads for databases larger than the RAM available to the WiredTiger (WT) cache. The increased WT cache pressure slows down performance.
  2. It captures only index definitions, not index data, in the metadata backup file, so all indexes have to be rebuilt after the collection data is reinserted during a restore. This is done in one pass through the collection after the inserts have finished, so it can add a lot of time to restores of big collections.
  3. Backup speed also depends on the allocated IOPS and the type of storage, since many reads and writes happen during this process.

Note: It is always advisable to run backups on secondary servers to avoid unnecessary performance degradation on the Primary node.

Different environment setups call for different approaches:

  1. Replica set: Prefer running the backup against a secondary node (see the example after this list).
  2. Sharded cluster: Take a backup of the config server replica set and of each shard individually, using their secondary nodes.
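
For example, here is a hedged sketch of running mongodump against a secondary by setting a read preference (the replica set name "demo" and the second host are illustrative placeholders):

mongodump --host="demo/mongodb1.example.net:27017,mongodb2.example.net:27017" --username=user --authenticationDatabase=admin --readPreference=secondary --oplog --out=/opt/backup/mongodump-2020-09-18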

Since we are discussing a distributed database system like a sharded cluster, we should also keep in mind that our backups need to be consistent to a point in time (replica set backups using mongodump are generally made consistent with "--oplog").

Consider a scenario where the application is still writing data and cannot be stopped for business reasons. Even if we back up the config server and each shard separately, the backups will finish at different times because of differences in data volume, load, and so on. Hence, restoring them together can introduce inconsistencies.

For that, Percona Backup for MongoDB is very useful (it uses the mongodump libraries internally), since it tails the oplog on each shard separately while the backup is still running, until completion. More references can be found in the release notes, and a minimal usage sketch follows.
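
As a minimal usage sketch (assuming a recent PBM release with pbm-agent already running on every node and remote storage configured via pbm config), a cluster-wide backup and restore look like this:

pbm backup
pbm list
pbm restore 2020-09-18T10:30:00Z

Here, pbm list prints the name of each completed snapshot, and that name is what pbm restore expects as its argument.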

Now comes the restoration part of dealing with logical backups. As with backups, MongoDB provides the below utility for restoration purposes.

Mongorestore: Restores dump files created by "mongodump". Index recreation takes place once the data is restored, which consumes additional memory and time.

mongorestore --host=mongodb1.example.net --port=27017 --username=user  --password --authenticationDatabase=admin --db=demo --collection=events /opt/backup/mongodump-2011-10-24/events.bson

To restore the incremental (oplog) portion of a dump, we can add --oplogReplay to the above syntax to replay the oplog entries as well.

Note: "--oplogReplay" can't be used with the --db and --collection flags, as it only works when restoring all databases.
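
For example, a full restore of the earlier --oplog dump, replaying the captured oplog entries at the end (host, credentials, and path are the same illustrative values used above):

mongorestore --host=mongodb1.example.net --port=27017 --username=user --password --authenticationDatabase=admin --oplogReplay /opt/backup/mongodump-2011-10-24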

Physical/Filesystem Backups

These involve snapshotting or copying the underlying MongoDB data files (--dbPath) at a point in time, and allowing the database to cleanly recover using the state captured in the snapshotted files. They are instrumental in backing up large databases quickly, especially when using filesystem snapshots such as LVM snapshots or block storage volume snapshots.

There are several methods to take filesystem-level (physical) backups:

  1. Manually copying the entire data files (e.g., using rsync; speed depends on network bandwidth)
  2. LVM-based snapshots
  3. Cloud-based disk snapshots (AWS/GCP/Azure or any other cloud provider)
  4. Percona hot backup

We'll discuss these options below, but first, let's look at their pros and cons compared to logical backups.

Pros:

  1. They are at least as fast as, and usually faster than, logical backups.
  2. Can be easily copied over or shared with remote servers or attached NAS.
  3. Recommended for large datasets because of speed and reliability.
  4. Can be convenient while building new nodes within the same cluster or new cluster.

Cons:

  1. Restores cannot be performed at a more granular level, such as a specific database or collection.
  2. Incremental backups cannot be achieved yet.
  3. A dedicated node (possibly a hidden one) is recommended for the backup, as consistency requires halting writes on that node or shutting down "mongod" cleanly prior to the snapshot; a sketch of the write-halting flow follows this list.
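
As a hedged sketch of that write-halting flow using an LVM snapshot (the volume group "vg0" and logical volume "mongodb" names are assumptions for illustration):

# flush pending writes and block new ones on the backup node
mongo --port 27017 --eval 'db.fsyncLock()'
# snapshot the volume that holds the --dbPath
sudo lvcreate --size 1G --snapshot --name mdb-snap01 /dev/vg0/mongodb
# unlock the node so it can resume applying writes
mongo --port 27017 --eval 'db.fsyncUnlock()'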

Below is the backup time consumption comparison for the same dataset:

DB Size: 267.6GB

Index Size: <1MB (since it was only on _id for testing)

demo:PRIMARY> db.runCommand({dbStats: 1, scale: 1024*1024*1024})
{
        "db" : "test",
        "collections" : 1,
        "views" : 0,
        "objects" : 137029,
        "avgObjSize" : 2097192,
        "dataSize" : 267.6398703530431,
        "storageSize" : 13.073314666748047,
        "numExtents" : 0,
        "indexes" : 1,
        "indexSize" : 0.0011749267578125,
        "scaleFactor" : 1073741824,
        "fsUsedSize" : 16.939781188964844,
        "fsTotalSize" : 49.98826217651367,
        "ok" : 1,
        ...
}
demo:PRIMARY>

1. Hot backup

Syntax:

> use admin

switched to db admin

> db.runCommand({createBackup: 1, backupDir: "/my/backup/data/path"})

{ "ok" : 1 }

 

Note: The backup path "backupDir" should be absolute. The createBackup command also supports storing backups on the filesystem and in AWS S3 buckets.
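
As an illustrative sketch only (the bucket, path, and endpoint below are placeholders, and the exact fields of the s3 sub-document should be verified against the Percona Server for MongoDB documentation for your version):

> db.runCommand({createBackup: 1, s3: {bucket: "my-backup-bucket", path: "psmdb/2020-09-18", endpoint: "s3.amazonaws.com"}})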

[root@ip-172-31-37-92 tmp]# time mongo  < hot.js
Percona Server for MongoDB shell version v4.2.8-8
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("c9860482-7bae-4aae-b0e7-5d61f8547559") }
Percona Server for MongoDB server version: v4.2.8-8
switched to db admin
{
        "ok" : 1,
        ...
}
bye

real    3m51.773s
user    0m0.067s
sys     0m0.026s
[root@ip-172-31-37-92 tmp]# ls
hot  hot.js  mongodb-27017.sock  nohup.out  systemd-private-b8f44077314a49899d0a31f99b31ed7a-chronyd.service-Qh7dpD  tmux-0
[root@ip-172-31-37-92 tmp]# du -sch hot
15G     hot
15G     total

Notice that the time taken by "Percona hot backup" was only about 4 minutes. It is also very helpful when rebuilding a node or spinning up new instances/clusters with the same dataset. The best part is that it doesn't require locking writes and has no significant performance impact. However, it is still recommended to run it against secondaries.

2. Filesystem Snapshot

The approximate time taken for the snapshot to be completed was only 4 minutes.

[root@ip-172-31-37-92 ~]# aws ec2 describe-snapshots  --query "sort_by(Snapshots, &StartTime)[-1].{SnapshotId:SnapshotId,StartTime:StartTime}"
{
    "SnapshotId": "snap-0f4403bc0fa0f2e9c",
    "StartTime": "2020-08-26T12:26:32.783Z"
}

[root@ip-172-31-37-92 ~]# aws ec2 describe-snapshots \
> --snapshot-ids snap-0f4403bc0fa0f2e9c
{
    "Snapshots": [
        {
            "Description": "This is my snapshot backup",
            "Encrypted": false,
            "OwnerId": "021086068589",
            "Progress": "100%",
            "SnapshotId": "snap-0f4403bc0fa0f2e9c",
            "StartTime": "2020-08-26T12:26:32.783Z",
            "State": "completed",
            "VolumeId": "vol-0def857c44080a556",
            "VolumeSize": 50
        }
    ]
}

3. Mongodump

[root@ip-172-31-37-92 ~]# time nohup mongodump -d test -c collG -o /mongodump/ &
[1] 44298

[root@ip-172-31-37-92 ~]# sed -n '1p;$p' nohup.out
2020-08-26T12:36:20.842+0000    writing test.collG to /mongodump/test/collG.bson
2020-08-26T12:51:08.832+0000    [####....................]  test.collG  27353/137029  (20.0%)

Note: For the same dataset where the snapshot and hot backup took only 3-5 minutes, "mongodump" took almost 15 minutes for just 20% of the dump. The backup speed is clearly much slower than the other two options. On top of that, the only way to restore such a backup is "mongorestore", which makes the whole process even slower.

Conclusion

So, which backup method is best? It depends on factors like the type of infrastructure, environment, dataset size, and load. Generally, if the dataset is around 100GB or less, logical backups are the best option, along with scheduled incremental backups, depending on your RTO (Recovery Time Objective) and RPO (Recovery Point Objective) needs. For larger datasets, physical backups, combined with incremental (oplog) backups, are the better choice.

Interested in trying Percona Backup for MongoDB? Download it for free! 

Jul
01
2019
--

Setting World-Writable File Permissions Prior to Preparing the Backup Can Break It

It's bad practice to provide world-writable access to critical files in Linux, though we've seen time and time again that this is done to conveniently share files with other users, applications, or services. With Xtrabackup, however, preparing a backup can go wrong if the backup configuration file has world-writable permissions.

Say you performed a backup of a MySQL instance configured with data-at-rest encryption using the keyring plugin. In the backup directory, the generated backup-my.cnf contains these instructions to load the plugin, which Xtrabackup needs while preparing the backup:

backup-my.cnf

[mysqld]
innodb_checksum_algorithm=crc32
innodb_log_checksum_algorithm=strict_crc32
innodb_data_file_path=ibdata1:12M:autoextend
innodb_log_files_in_group=2
innodb_log_file_size=1073741824
innodb_fast_checksum=false
innodb_page_size=16384
innodb_log_block_size=512
innodb_undo_directory=./
innodb_undo_tablespaces=0
server_id=0
redo_log_version=1
plugin_load=keyring_file.so
server_uuid=00005726-0000-0000-0000-000000005726
master_key_id=1

Perhaps you wanted to share the backup with another user, but made the mistake of making the directory and its contents world-writable: chmod -R 777 /backup/mysql
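
For contrast, a safer way to share a backup is group-based access rather than 777; a minimal sketch, assuming a shared group named backupgrp that the other user belongs to:

sudo chgrp -R backupgrp /backup/mysql
sudo chmod -R 750 /backup/mysql   # owner full access, group read/execute, nothing for others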

When that user prepares the backup, the corresponding output will show that Xtrabackup ignored reading backup-my.cnf and so it doesn’t know that it has to load the keyring plugin to decrypt the .ibd files:

~$ xtrabackup --prepare --keyring-file-data=/backup/mysql/keyring --target-dir=/backup/mysql
xtrabackup: [Warning] World-writable config file '/backup/mysql/backup-my.cnf' is ignored.
xtrabackup: recognized server arguments:
xtrabackup: [Warning] World-writable config file '/backup/mysql/backup-my.cnf' is ignored.
xtrabackup: recognized client arguments: --prepare=1 --target-dir=/backup/mysql
xtrabackup version 2.4.14 based on MySQL server 5.7.19 Linux (x86_64) (revision id: ef675d4)
xtrabackup: cd to /backup/mysql/
xtrabackup: This target seems to be not prepared yet.
InnoDB: Number of pools: 1
xtrabackup: xtrabackup_logfile detected: size=215089152, start_lsn=(3094928949)
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup:   innodb_data_home_dir = .
xtrabackup:   innodb_data_file_path = ibdata1:10M:autoextend
xtrabackup:   innodb_log_group_home_dir = .
xtrabackup:   innodb_log_files_in_group = 1
xtrabackup:   innodb_log_file_size = 215089152
xtrabackup: [Warning] World-writable config file './backup-my.cnf' is ignored.
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup:   innodb_data_home_dir = .
xtrabackup:   innodb_data_file_path = ibdata1:10M:autoextend
xtrabackup:   innodb_log_group_home_dir = .
xtrabackup:   innodb_log_files_in_group = 1
xtrabackup:   innodb_log_file_size = 215089152
xtrabackup: Starting InnoDB instance for recovery.
xtrabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
InnoDB: PUNCH HOLE support available
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Uses event mutexes
InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
InnoDB: Compressed tables use zlib 1.2.8
InnoDB: Number of pools: 1
InnoDB: Using CPU crc32 instructions
InnoDB: Initializing buffer pool, total size = 100M, instances = 1, chunk size = 100M
InnoDB: Completed initialization of buffer pool
InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
InnoDB: Highest supported file format is Barracuda.
InnoDB: Encryption can't find master key, please check the keyring plugin is loaded.
InnoDB: Encryption information in datafile: ./sbtest/sbtest2.ibd can't be decrypted.
InnoDB: Encryption can't find master key, please check the keyring plugin is loaded.
InnoDB: Encryption information in datafile: ./sbtest/sbtest1.ibd can't be decrypted.
InnoDB: Encryption can't find master key, please check the keyring plugin is loaded.
InnoDB: Encryption information in datafile: ./sbtest/sbtest4.ibd can't be decrypted.
InnoDB: Encryption can't find master key, please check the keyring plugin is loaded.
InnoDB: Encryption information in datafile: ./sbtest/sbtest3.ibd can't be decrypted.
InnoDB: Encryption can't find master key, please check the keyring plugin is loaded.
InnoDB: Encryption information in datafile: ./sbtest/sbtest5.ibd can't be decrypted.
InnoDB: Log scan progressed past the checkpoint lsn 3094928949
** redacted **
InnoDB: Doing recovery: scanned up to log sequence number 3097681408 (1%)
InnoDB: Doing recovery: scanned up to log sequence number 3102924288 (4%)
InnoDB: Doing recovery: scanned up to log sequence number 3108167168 (6%)
InnoDB: Doing recovery: scanned up to log sequence number 3113410048 (9%)
InnoDB: Doing recovery: scanned up to log sequence number 3118652928 (12%)
InnoDB: Doing recovery: scanned up to log sequence number 3123895808 (15%)
InnoDB: Doing recovery: scanned up to log sequence number 3129138688 (17%)
InnoDB: Doing recovery: scanned up to log sequence number 3134381568 (20%)
InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
** redacted **
InnoDB: Doing recovery: scanned up to log sequence number 3265453568 (89%)
InnoDB: Doing recovery: scanned up to log sequence number 3270696448 (91%)
InnoDB: Doing recovery: scanned up to log sequence number 3275939328 (94%)
InnoDB: Doing recovery: scanned up to log sequence number 3281182208 (97%)
InnoDB: Doing recovery: scanned up to log sequence number 3286158358 (100%)
InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: xtrabackup: Last MySQL binlog file position 568369058, file name mysql-bin.000004
InnoDB: Encryption can't find master key, please check the keyring plugin is loaded.
InnoDB: Encryption information in datafile: ./sbtest/sbtest1.ibd can't be decrypted.
InnoDB: Removing missing table `sbtest/sbtest1` from InnoDB data dictionary.
InnoDB: Encryption can't find master key, please check the keyring plugin is loaded.
InnoDB: Encryption information in datafile: ./sbtest/sbtest2.ibd can't be decrypted.
InnoDB: Removing missing table `sbtest/sbtest2` from InnoDB data dictionary.
InnoDB: Encryption can't find master key, please check the keyring plugin is loaded.
InnoDB: Encryption information in datafile: ./sbtest/sbtest3.ibd can't be decrypted.
InnoDB: Removing missing table `sbtest/sbtest3` from InnoDB data dictionary.
InnoDB: Encryption can't find master key, please check the keyring plugin is loaded.
InnoDB: Encryption information in datafile: ./sbtest/sbtest4.ibd can't be decrypted.
InnoDB: Removing missing table `sbtest/sbtest4` from InnoDB data dictionary.
InnoDB: Encryption can't find master key, please check the keyring plugin is loaded.
InnoDB: Encryption information in datafile: ./sbtest/sbtest5.ibd can't be decrypted.
InnoDB: Removing missing table `sbtest/sbtest5` from InnoDB data dictionary.
InnoDB: Creating shared tablespace for temporary tables
InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
InnoDB: File './ibtmp1' size is now 12 MB.
InnoDB: 96 redo rollback segment(s) found. 1 redo rollback segment(s) are active.
InnoDB: 32 non-redo rollback segment(s) are active.
InnoDB: page_cleaner: 1000ms intended loop took 6627ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
InnoDB: 5.7.19 started; log sequence number 3286158358
InnoDB: xtrabackup: Last MySQL binlog file position 568369058, file name mysql-bin.000004

Even if you fix the permissions on backup-my.cnf and try to prepare the same backup again, Xtrabackup will warn you that the backup has already been prepared.

~$ xtrabackup --prepare --keyring-file-data=/backup/mysql/keyring --target-dir=/backup/mysql
xtrabackup: recognized server arguments: --innodb_checksum_algorithm=crc32 --innodb_log_checksum_algorithm=strict_crc32 --innodb_data_file_path=ibdata1:12M:autoextend --innodb_log_files_in_group=2 --innodb_log_file_size=1073741824 --innodb_fast_checksum=0 --innodb_page_size=16384 --innodb_log_block_size=512 --innodb_undo_directory=./ --innodb_undo_tablespaces=0 --server-id=0 --redo-log-version=1
xtrabackup: recognized client arguments: --innodb_checksum_algorithm=crc32 --innodb_log_checksum_algorithm=strict_crc32 --innodb_data_file_path=ibdata1:12M:autoextend --innodb_log_files_in_group=2 --innodb_log_file_size=1073741824 --innodb_fast_checksum=0 --innodb_page_size=16384 --innodb_log_block_size=512 --innodb_undo_directory=./ --innodb_undo_tablespaces=0 --server-id=0 --redo-log-version=1 --prepare=1 --target-dir=/backup/mysql
xtrabackup version 2.4.14 based on MySQL server 5.7.19 Linux (x86_64) (revision id: ef675d4)
xtrabackup: cd to /backup/mysql/
xtrabackup: This target seems to be already prepared.
InnoDB: Number of pools: 1

This means that changes made while the backup was taking place will not be applied, and what you have restored is an inconsistent, potentially corrupt backup. You need to perform a full backup again and make sure that you do not place world/other-writable permissions on the backup this time around, so that you do not face the same issue.
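
Before running --prepare, it is also worth checking the backup directory for world-writable files; a quick sketch:

# list any world-writable files or directories inside the backup
find /backup/mysql -perm -0002 -ls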

Jun
17
2019
--

Percona Backup for MongoDB 0.5.0 Early Release is Now Available

Percona is pleased to announce the early release of our latest software product, Percona Backup for MongoDB 0.5.0, on June 17, 2019. The GA version is scheduled to be released later in 2019.

Percona Backup for MongoDB is a distributed, low-impact solution for consistent backups of MongoDB sharded clusters and replica sets. This is a tool for creating consistent backups across a MongoDB sharded cluster (or a single replica set), and for restoring those backups to a specific point in time. Percona Backup for MongoDB uses a distributed client/server architecture to perform backup/restore actions. The project was inspired by (and intends to replace) the Percona-Lab/mongodb_consistent_backup tool.

Percona Backup for MongoDB supports Percona Server for MongoDB or MongoDB Community Server version 3.6 or higher with MongoDB replication enabled. Binaries for the supported platforms as well as the tarball with source code are available from the Percona Backup for MongoDB download page. For more information about Percona Backup for MongoDB and the installation steps, see the documentation.

Percona Backup for MongoDB 0.5.0 features the following:

  • Enables storing backup metadata on Amazon Simple Storage Service (S3) storage.
  • The API of Percona Backup for MongoDB introduces HTTP basic authentication to prevent an unauthorized user from running backups or restoring data if they manage to access the API port.
  • To optimize the usage of network resources, the pbm-agent on mongos is no longer needed, and the backup coordinator automatically establishes a connection to the appropriate mongos instance.
  • The output of pbmctl list nodes now includes the replica set name and informs the backup status of the node.

Percona doesn't recommend this release for production, as its API and configuration fields are still likely to change and it only features basic API-level security. Please report any bugs you encounter in our bug tracking system.

New Features and Improvements

  • PBM-93: Support storage of backup metadata on AWS S3.
  • PBM-99: pbm-agent is deprecated on mongos
  • PBM-105: Log a warning if a Primary node-type is used for a backup
  • PBM-122: Include the replica set name in the output of pbmctl list nodes
  • PBM-130: Add HTTP Basic Authentication to gRPC servers (API and RPC)
  • PBM-139: Support listing backup status in the output of pbmctl list nodes
  • PBM-170: Enable setting the ‘stopOnError’ attribute in mongorestore to ensure consistency of the data being restored.

Percona Backup for MongoDB, a free open source backup solution, enables you to manage your own backups without third-party involvement or costly licenses. Percona Backup for MongoDB is distributed under the terms of the Apache License v2.0.

We look forward to sharing the GA version of Percona Backup for MongoDB later this year and appreciate any feedback that you might have on our 0.5.0 version in the meantime.

May
10
2019
--

pgBackRest – A Great Backup Solution and a Wonderful Year of Growth

pgBackRest addresses many of the must-have features that you'll want to look for in a PostgreSQL backup solution. I have been a great fan of the pgBackRest project for quite some time, and it keeps getting better. Historically it was written in Perl, and over the last year the project has been making steady progress converting to native C code. At the time of writing, the latest version is 2.13, and it still depends on a long list of Perl libraries. If you've never tried pgBackRest, now is a great time to do it. This post should help you to set up a simple backup with a local backup repository.

Installation

The pgBackRest project packages are maintained in the PGDG repository. If you have already added the PGDG repository to your package manager, installation is a breeze.

On RHEL/CentOS/OEL:

$ sudo yum install pgbackrest

On Ubuntu/Debian

$ sudo apt install pgbackrest

This will fetch all the required Perl libraries too.

pgBackRest is now a native executable (as of version 2):

$ file /usr/bin/pgbackrest
/usr/bin/pgbackrest: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=5e3f6123d02e0013b53f6568f99409378d43ad89, not stripped

Some of the other changes DBAs should keep in mind are:

  1. The thread-max option is no longer valid; use process-max instead (see the sample configuration after this list).
  2. The archive-max-mb option is no longer valid and has been replaced with the archive-push-queue-max option, which has different semantics.
  3. The default for backup-user (deprecated in favor of the new repo-host-user option) has changed from backrest to pgbackrest.
  4. The configuration file has moved from /etc/pgbackrest.conf to /etc/pgbackrest/pgbackrest.conf.
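
For instance, a minimal sketch of enabling parallel processing in the global section of the new configuration file (four processes is just an illustrative value):

[global]
repo1-path=/var/lib/pgbackrest
process-max=4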

Building from source

We may want to build pgBackRest from source, depending on our environment and version. Building pgBackRest from source on Debian/Ubuntu is already covered in the official documentation. Below are the steps to follow for the Red Hat family.

Get the tarball of the latest release:

curl -LO https://github.com/pgbackrest/pgbackrest/archive/release/2.13.tar.gz

In order to build from source, we need the build tools and libraries:

sudo yum install gcc make zlib-devel libxml2-devel openssl-devel

Now we can unpack the tarball and move into the unpacked directory:

tar -xvf 2.13.tar.gz
cd pgbackrest-release-2.13

Before proceeding any further, we should make sure that Perl is built with the USE_64_BIT_INT option.

$ perl -V | grep USE_64_BIT_INT
----------output-------------
USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES

pgBackRest depends on a few Perl libraries. If they are not already present on the system, we have to install them:

sudo yum install perl-devel perl-ExtUtils-Embed.noarch perl-JSON-PP.noarch perl-Compress-Raw-Zlib.x86_64 perl-DBD-Pg.x86_64

Now we can build pgBackRest by executing "make" in the src directory:

cd src
make

If the build is successful, we should see the pgbackrest executable in the src directory. We should move it to a standard path location such as /usr/bin:

sudo cp ./pgbackrest /usr/bin/

Now we can create the repository location and log directory for pgBackRest, and grant privileges on those directories to the OS user that will execute pgBackRest.

sudo mkdir /var/lib/pgbackrest
sudo chown postgres:postgres /var/lib/pgbackrest
sudo mkdir -p /var/log/pgbackrest
sudo chown postgres:postgres /var/log/pgbackrest

Configuration

The very first step is to create the configuration file /etc/pgbackrest/pgbackrest.conf with a stanza. A stanza defines the backup configuration for a specific PostgreSQL database cluster. Settings in the global configuration section can be overridden with stanza-specific settings.

sudo mkdir /etc/pgbackrest
sudo bash -c 'cat << EOF  > /etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
[pg0app]
pg1-path=/var/lib/pgsql/11/data
pg1-port=5432
EOF'

Here, we have specified the backup repository location to be /var/lib/pgbackrest.

Once we have defined the configuration file, we can proceed to create the stanza:

$ pgbackrest stanza-create --stanza=pg0app --log-level-console=info

This will produce output like this:

2019-04-23 11:47:50.640 P00   INFO: stanza-create command begin 2.13: --log-level-console=info --pg1-path=/var/lib/pgsql/11/data --pg1-port=5432 --repo1-path=/var/lib/pgbackrest --stanza=pg0app
2019-04-23 11:47:51.166 P00   INFO: stanza-create command end: completed successfully (527ms)

Now we need to set up the database parameters to use pgbackrest. In particular, we should use pgbackrest in archive_command. One of the major concerns with using cp-like utilities in archive_command is that they are lazy writers: they do not ensure that everything is written and that fsync is called. This is a potential hole in many backup configurations, and pgbackrest solves this issue. Here is a sample set of parameter changes:

ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = 'pgbackrest --stanza=pg0app archive-push %p';
ALTER SYSTEM SET max_wal_senders = '10';
ALTER SYSTEM SET hot_standby = 'on';

From version 2.12 onwards, the archive-push used in the above setting is entirely coded in C, making it more lightweight and faster to execute.

The basic backup configuration is now complete, and we can check the backup setup:

$ pgbackrest check --stanza=pg0app --log-level-console=info

This should produce output like this:

2019-04-23 13:57:33.241 P00   INFO: check command begin 2.13: --log-level-console=info --pg1-path=/var/lib/pgsql/11/data --pg1-port=5432 --repo1-path=/var/lib/pgbackrest --stanza=pg0app
2019-04-23 13:57:35.184 P00   INFO: WAL segment 00000001000000000000003D successfully stored in the archive at '/var/lib/pgbackrest/archive/pg0app/11-1/0000000100000000/00000001000000000000003D-9335cf780f0e1e468de62e0cbf22e7953a84c0c4.gz'
2019-04-23 13:57:35.185 P00   INFO: check command end: completed successfully (1946ms)

Taking a full backup

Now we are all set to take a full backup. We can run a backup without specifying its type, like this:

$ pgbackrest backup --stanza=pg0app --log-level-console=info

In my case, I received this output:

2019-04-23 13:58:31.912 P00   INFO: backup command begin 2.13: --log-level-console=info --pg1-path=/var/lib/pgsql/11/data --pg1-port=5432 --repo1-path=/var/lib/pgbackrest --repo1-retention-full=2 --stanza=pg0app
WARN: no prior backup exists, incr backup has been changed to full
2019-04-23 13:58:32.775 P00   INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2019-04-23 13:58:32": backup begins after the next regular checkpoint completes
2019-04-23 13:58:32.981 P00   INFO: backup start archive = 00000001000000000000003F, lsn = 0/3F000028
2019-04-23 13:58:36.712 P01   INFO: backup file /var/lib/pgsql/11/data/base/18126/26241 (128MB, 57%) checksum 40a6c0cc69e81b9aaa93977bc5e3f809ae336b79
2019-04-23 13:58:37.958 P01   INFO: backup file /var/lib/pgsql/11/data/base/18126/26259 (21.4MB, 66%) checksum c34d484c34431dcc757e48c709c9ae68e694fb13
2019-04-23 13:58:38.223 P01   INFO: backup file /var/lib/pgsql/11/data/base/18126/26272 (15.7MB, 73%) checksum fe3155a0c6b7f6c94e846c47557f5e1eca6a89ef

There are a couple of points to note here.

  1. Since we are taking a backup for the first time, pgBackRest detects that there is no prior backup and switches to a full database backup.
  2. The output shows that pg_start_backup() is issued in non-exclusive mode; pgBackRest handles the creation of the backup label file.

At the end of the backup, I received these details, and you should look for something similar:

2019-04-23 13:58:56.024 P01   INFO: backup file /var/lib/pgsql/11/data/base/1/13723 (0B, 100%)
2019-04-23 13:58:56.031 P01   INFO: backup file /var/lib/pgsql/11/data/base/1/13718 (0B, 100%)
2019-04-23 13:58:56.076 P00   INFO: full backup size = 224.5MB
2019-04-23 13:58:56.076 P00   INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive
2019-04-23 13:58:56.200 P00   INFO: backup stop archive = 00000001000000000000003F, lsn = 0/3F000130
2019-04-23 13:58:56.675 P00   INFO: new backup label = 20190423-135832F
2019-04-23 13:58:56.727 P00   INFO: backup command end: completed successfully (24817ms)
2019-04-23 13:58:56.728 P00   INFO: expire command begin
2019-04-23 13:58:56.737 P00   INFO: full backup total < 2 - using oldest full backup for 11-1 archive retention
2019-04-23 13:58:56.739 P00   INFO: expire command end: completed successfully (11ms)

Check the backup-related information:

$ pgbackrest info
stanza: pg0app
    status: ok
    cipher: none
    db (current)
        wal archive min/max (11-1): 000000010000000000000043/000000010000000000000043
        full backup: 20190423-143526F
            timestamp start/stop: 2019-04-23 14:35:26 / 2019-04-23 14:35:39
            wal start/stop: 000000010000000000000043 / 000000010000000000000043
            database size: 224.6MB, backup size: 224.6MB
            repository size: 14.1MB, repository backup size: 14.1MB

As we can see, pgBackRest has compressed the backup to 14.1MB in size. This represents a great reduction compared to the size of the original database.

If a good full backup exists, then by default pgBackRest will attempt to perform an incremental backup.
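
You can also request a backup type explicitly; a quick sketch:

$ pgbackrest backup --stanza=pg0app --type=full    # force a full backup
$ pgbackrest backup --stanza=pg0app --type=diff    # differential since the last full backup
$ pgbackrest backup --stanza=pg0app --type=incr    # incremental since the last backup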

Restoring backup

Restoring the backup is as simple as a one-liner. pgBackRest will find the correct full, differential, and incremental backups and do the restore for you.

$ pgbackrest restore --stanza=pg0app --log-level-console=info

The restore will display details on screen. Towards the end of the restore you should see a few lines as follows:

2019-04-23 15:28:29.113 P01   INFO: restore file /var/lib/pgsql/11/data/base/1/13723 (0B, 100%)
2019-04-23 15:28:29.120 P01   INFO: restore file /var/lib/pgsql/11/data/base/1/13718 (0B, 100%)
2019-04-23 15:28:29.121 P00   INFO: write /var/lib/pgsql/11/data/recovery.conf
2019-04-23 15:28:29.123 P00   INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2019-04-23 15:28:29.127 P00   INFO: restore command end: completed successfully (22290ms)

As the log output shows, pgBackRest creates a recovery.conf file automatically and inserts the restore_command used to fetch archived WAL segments:

restore_command = 'pgbackrest --log-level-console=info --stanza=pg0app archive-get %f "%p"'

As a DBA user, you can just start up PostgreSQL as usual. In my case I just needed this command:

$ sudo systemctl start postgresql-11

PostgreSQL executes the above-mentioned restore_command as it starts, fetching all the archived WALs needed to complete the recovery before accepting connections.
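
If you need to recover to a specific point in time rather than to the end of the archived WAL, pgBackRest can set the recovery target for you; a hedged sketch (the timestamp is illustrative):

$ pgbackrest restore --stanza=pg0app --delta --type=time --target="2019-04-23 15:10:00" --log-level-console=info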

Summary

pgBackRest is continuously evolving, and as it matures it's emerging as one of the best backup solutions for PostgreSQL. The new features, especially in version 2, are impressive. It takes away the mundane tasks of backup configuration and retention management, which is extremely helpful.

Don't forget! At Percona Live in Austin, May 28-30 2019, we'll have two days of PostgreSQL content in a dedicated PostgreSQL track. Please see all our PostgreSQL talks here.


Photo of elephant by Magda Ehlers from Pexels

Apr
29
2019
--

ZFS For MongoDB Backups

We have successfully used ZFS for MySQL® backups, and MongoDB® is no different. Normally, backups are taken from a hidden secondary, either with mongodump, WT hot backup, or filesystem snapshots. For the latter, instead of LVM2 we will use ZFS, and discuss its other potential benefits.

Preparation for initial snapshot

Before taking a ZFS snapshot, it is important to use db.fsyncLock(). This blocks writes, allowing a consistent on-disk copy of the data, and gives the server the time it needs to commit the journal to disk before the snapshot is taken.

My MongoDB instance below is running on a ZFS volume, and we will take an initial snapshot.

revin@mongodb:~$ sudo zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
zfs-mongo        596M  9.04G    24K  /zfs-mongo
zfs-mongo/data   592M  9.04G   592M  /zfs-mongo/data
revin@mongodb:~$ mongo --port 28020 --eval 'db.serverCmdLineOpts().parsed.storage' --quiet
{
    "dbPath" : "/zfs-mongo/data/m40",
    "journal" : {
        "enabled" : true
    },
    "wiredTiger" : {
        "engineConfig" : {
            "cacheSizeGB" : 0.25
        }
    }
}
revin@mongodb:~$ mongo --port 28020 --eval 'db.fsyncLock()' --quiet
{
    "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
    "lockCount" : NumberLong(1),
...
}
revin@mongodb:~$ sleep 0.6
revin@mongodb:~$ sudo zfs snapshot zfs-mongo/data@full
revin@mongodb:~$ mongo --port 28020 --eval 'db.fsyncUnlock()' --quiet
{
    "info" : "fsyncUnlock completed",
    "lockCount" : NumberLong(0),
...
}

Notice the sleep added just before the snapshot command above. This is to ensure that even with the maximum storage.journal.commitIntervalMs of 500ms, we allow enough time to commit the data to disk. This is simply an extra layer of guarantee and may not be necessary if you have a very low journal commit interval.

revin@mongodb:~$ sudo zfs list -t all
NAME                  USED  AVAIL  REFER  MOUNTPOINT
zfs-mongo             596M  9.04G    24K  /zfs-mongo
zfs-mongo/data        592M  9.04G   592M  /zfs-mongo/data
zfs-mongo/data@full   192K      -   592M  -

Now I have a snapshot…

At this point, I have a snapshot I can use for a number of purposes.

  • Replicate full and delta snapshots to remote storage or another region with tools like zrepl. This adds an extra layer of redundancy and disaster recovery.
  • Use the snapshots to rebuild, replace, or create new secondary nodes, or to refresh test/development servers regularly.
  • Use the snapshots for point-in-time recovery. ZFS snapshots are relatively cost-free, so it is possible to take them even at five-minute intervals! This is actually my favorite use case and feature.

Let's say we take snapshots every five minutes. If a collection was accidentally dropped or even just a few documents were deleted, we can mount the last snapshot taken before the event. If the event is discovered within five minutes (perhaps that's optimistic), we only need to replay less than five minutes of oplog!
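
As a hedged sketch of such a schedule (reusing the dataset name and port from this post; snapshot naming and retention are up to you), a small script run from cron every five minutes could look like this:

#!/bin/bash
# lock writes, take a timestamped snapshot, then unlock
mongo --port 28020 --quiet --eval 'db.fsyncLock()'
sleep 0.6   # allow the journal commit interval to elapse, as discussed above
sudo zfs snapshot zfs-mongo/data@$(date +%Y%m%d-%H%M)
mongo --port 28020 --quiet --eval 'db.fsyncUnlock()'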

Point-in-Time-Recovery

To start a PITR, first clone the snapshot. Cloning the snapshot like below will automatically mount it. We can then start a temporary mongod instance with this mounted directory.

revin@mongodb:~$ sudo zfs clone zfs-mongo/data@full zfs-mongo/data-clone
revin@mongodb:~$ sudo zfs list -t all
NAME                   USED  AVAIL  REFER  MOUNTPOINT
zfs-mongo              606M  9.04G    24K  /zfs-mongo
zfs-mongo/data         600M  9.04G   592M  /zfs-mongo/data
zfs-mongo/data@full   8.46M      -   592M  -
zfs-mongo/data-clone     1K  9.04G   592M  /zfs-mongo/data-clone
revin@mongodb:~$ ./mongodb-linux-x86_64-4.0.8/bin/mongod \
	--dbpath /zfs-mongo/data-clone/m40 \
	--port 28021 --oplogSize 200 --wiredTigerCacheSizeGB 0.25

Once mongod has started, I would like to find out the last oplog event it has completed.

revin@mongodb:~$ mongo --port 28021 local --quiet \
>     --eval 'db.oplog.rs.find({},{ts: 1}).sort({ts: -1}).limit(1)'
{ "ts" : Timestamp(1555356271, 1) }

We can use this timestamp to dump the oplog from the current production instance and replay it on our temporary instance.

revin@mongodb:~$ mkdir ~/mongodump28020
revin@mongodb:~$ cd ~/mongodump28020
revin@mongodb:~/mongodump28020$ mongodump --port 28020 -d local -c oplog.rs \
>     --query '{ts: {$gt: Timestamp(1555356271, 1)}}'
2019-04-16T23:57:50.708+0000	writing local.oplog.rs to
2019-04-16T23:57:52.723+0000	done dumping local.oplog.rs (186444 documents)

Assuming the incident occurred 30 seconds after this snapshot was taken, we can apply the oplog dump with mongorestore. Be aware that you would have to identify this point from your own oplog.

revin@mongodb:~/mongodump28020$ mv dump/local/oplog.rs.bson dump/oplog.bson
revin@mongodb:~/mongodump28020$ rm -rf dump/local
revin@mongodb:~/mongodump28020$ mongo --port 28021 percona --quiet --eval 'db.session.count()'
79767
revin@mongodb:~/mongodump28020$ mongorestore --port 28021 --dir=dump/ --oplogReplay \
>     --oplogLimit 1555356302 -vvv

Note that the oplogLimit above is 31 seconds past the snapshot's timestamp. Since we want to apply the next 30 seconds of changes after the snapshot was taken, and mongorestore replays oplog entries only up to, but not including, the specified limit, oplogLimit takes a value one second past the target.

2019-04-17T00:06:46.410+0000	using --dir flag instead of arguments
2019-04-17T00:06:46.412+0000	checking options
2019-04-17T00:06:46.413+0000		dumping with object check disabled
2019-04-17T00:06:46.414+0000	will listen for SIGTERM, SIGINT, and SIGKILL
2019-04-17T00:06:46.418+0000	connected to node type: standalone
2019-04-17T00:06:46.418+0000	standalone server: setting write concern w to 1
2019-04-17T00:06:46.419+0000	using write concern: w='1', j=false, fsync=false, wtimeout=0
2019-04-17T00:06:46.420+0000	mongorestore target is a directory, not a file
2019-04-17T00:06:46.421+0000	preparing collections to restore from
2019-04-17T00:06:46.421+0000	using dump as dump root directory
2019-04-17T00:06:46.421+0000	found oplog.bson file to replay
2019-04-17T00:06:46.421+0000	enqueued collection '.oplog'
2019-04-17T00:06:46.421+0000	finalizing intent manager with multi-database longest task first prioritizer
2019-04-17T00:06:46.421+0000	restoring up to 4 collections in parallel
...
2019-04-17T00:06:46.421+0000	replaying oplog
2019-04-17T00:06:46.446+0000	timestamp 6680204450717499393 is not below limit of 6680204450717499392; ending oplog restoration
2019-04-17T00:06:46.446+0000	applied 45 ops
2019-04-17T00:06:46.446+0000	done

After applying 45 oplog events, we can see that additional documents have been added to the percona.session collection.

revin@mongodb:~/mongodump28020$ mongo --port 28021 percona --quiet --eval 'db.session.count()'
79792

Conclusion

Because snapshots are immediately available, and because of its support for deltas, ZFS is well suited to large datasets that would otherwise take hours for other backup tools to cover.


Photo by Designecologist from Pexels

Mar
04
2019
--

Percona XtraBackup 8.0.5 Is Now Available

Percona is glad to announce the release of Percona XtraBackup 8.0.5 on March 4, 2019. Downloads are available from our download site and from apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

Percona XtraBackup 8.0.5 introduces support for undo tablespaces created using the new syntax (CREATE UNDO TABLESPACE) available since MySQL 8.0.14. Percona XtraBackup also supports the binary log encryption introduced in MySQL 8.0.14.

Two new options were added to xbstream. Use the --decompress option with xbstream to decompress individual qpress files. With the --decompress-threads option, specify the number of threads to apply when decompressing. Thanks to Rauli Ikonen for this contribution.
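
For example, a hedged sketch of extracting a streamed backup and decompressing the qpress files in one pass (the stream file name is illustrative):

$ xbstream -x -C /data/backup --decompress --decompress-threads=4 < backup.xbstream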

This release of Percona XtraBackup is a General Availability release ready for use in a production environment.

All Percona software is open-source and free.

Please note the following about this release:

  • The deprecated innobackupex has been removed. Use the xtrabackup command to back up your instances: $ xtrabackup --backup --target-dir=/data/backup
  • When migrating from earlier database server versions, back up and restore using Percona XtraBackup 2.4, and then use mysql_upgrade from MySQL 8.0.x.
  • If using yum or apt repositories to install Percona XtraBackup 8.0.5, ensure that you have enabled the new tools repository. You can do this with the percona-release enable tools release command and then install the percona-xtrabackup-80 package.

New Features

  • PXB-1548: Percona XtraBackup enables updating the ib_buffer_pool file with the latest pages present in the buffer pool using the --dump-innodb-buffer-pool option. Thanks to Marcelo Altmann for the contribution.
  • PXB-1768: Added support for undo tablespaces created with the new MySQL 8.0.14 syntax.
  • PXB-1781: Added support for binary log encryption introduced in MySQL 8.0.14.
  • PXB-1797: For xbstream, two new options were added. The --decompress option enables xbstream to decompress individual qpress files. The --decompress-threads option controls the number of threads to apply when decompressing. Thanks to Rauli Ikonen for this contribution.

Bugs Fixed

  • Using --lock-ddl-per-table caused the server to scan all records of partitioned tables which could lead to the “out of memory” error. Bugs fixed PXB-1691 and PXB-1698.
  • When Percona XtraBackup was run with the --slave-info option, incorrect coordinates were written to the xtrabackup_slave_info file. Bug fixed PXB-1737.
  • Percona XtraBackup could crash at the prepare stage when making an incremental backup if the variable innodb-rollback-segments was changed after starting the MySQL Server. Bug fixed PXB-1785.
  • The full backup could fail when Percona Server was started with the --innodb-encrypt-tables parameter. Bug fixed PXB-1793.

Other bugs fixed: PXB-1632, PXB-1715, PXB-1770, PXB-1771, PXB-1773.

Jan
10
2019
--

Percona Backup for MongoDB 0.2.0-Alpha Is Now Available

Percona announces the first public release of Percona Backup for MongoDB 0.2.0-Alpha on January 10, 2019.

Percona Backup for MongoDB is a distributed, low-impact solution for consistent backups of MongoDB sharded clusters and replica sets. This is a tool for creating consistent backups across a MongoDB sharded cluster (or a single replica set), and for restoring those backups to a specific point in time. Percona Backup for MongoDB uses a distributed client/server architecture to perform backup/restore actions. The project was inspired by (and intends to replace) the Percona-Lab/mongodb_consistent_backup tool.

This release features:

  • Consistent backup of sharded clusters
  • Compression of oplogs and logical backups
  • Backup and restore from local files
  • Backup to S3
  • Running the backup on a single replica set using the safest node (preferably non-Primary or hidden nodes with the lowest replication priority and smallest replication lag)

Future releases will include:

Percona Backup for MongoDB supports Percona Server for MongoDB or MongoDB Community Server version 3.6 or higher with MongoDB replication enabled. Binaries for the supported platforms as well as the tarball with source code are available from the GitHub repository (https://github.com/percona/percona-backup-mongodb/releases/tag/v0.2.0). For more information about Percona Backup for MongoDB and the installation steps, see this README file.

Note: Percona doesn't recommend this release for production, as its API and configuration fields are likely to change in the future and it does not feature any API-level security. You are welcome to report any bugs you encounter in our bug tracking system.

Percona Backup for MongoDB process and interactions between key components.

 

Dec
21
2018
--

Backup and Restore a PostgreSQL Cluster With Multiple Tablespaces Using pg_basebackup

pg_basebackup is a widely used PostgreSQL backup tool that allows us to take an ONLINE and CONSISTENT file-system-level backup. These backups can be used for point-in-time recovery or to set up a slave/standby. You may want to refer to our previous blog posts, PostgreSQL Backup Strategy, Streaming Replication in PostgreSQL, and Faster PITR in PostgreSQL, where we describe how we used pg_basebackup for different purposes. In this post, I'll demonstrate the steps to restore a backup taken using pg_basebackup when we have many tablespaces that store databases or their underlying objects.

A simple backup can be taken using the following syntax.

Tar and Compressed Format
$ pg_basebackup -h localhost -p 5432 -U postgres -D /backupdir/latest_backup -Ft -z -Xs -P
Plain Format
$ pg_basebackup -h localhost -p 5432 -U postgres -D /backupdir/latest_backup -Fp -Xs -P

Using a tar and compressed format is advantageous when you wish to use less disk space to back up and store all tablespaces, the data directory, and WAL segments, with everything in just one directory (the backup target directory).

A plain-format backup, in contrast, stores a copy of the data directory as-is in the target directory. When you have one or more non-default tablespaces, each tablespace is stored in a separate directory, usually the same as its original location, unless you use --tablespace-mapping to modify the destination for the tablespace backups.

PostgreSQL supports the concept of tablespaces. In simple words, a tablespace helps us maintain multiple locations to scatter databases or their objects. In this way, we can distribute the IO and balance the load across multiple disks.

To understand what happens when we backup a PostgreSQL cluster that contains multiple tablespaces, let’s consider the following example. We’ll take these steps:

  • Create two tablespaces in an existing master-slave replication setup.
  • Take a backup and see what is inside the backup directory.
  • Restore the backup.
  • Conclude our findings

Create 2 tablespaces and take a backup (tar format) using pg_basebackup

Step 1 :

I set up a replication cluster using PostgreSQL 11.2. You can refer to our blog post Streaming Replication in PostgreSQL to reproduce the same scenario. Here are the steps used to create two tablespaces:

$ sudo mkdir /data_pgbench
$ sudo mkdir /data_pgtest
$ psql -c "CREATE TABLESPACE data_pgbench LOCATION '/data_pgbench'"
$ psql -c "CREATE TABLESPACE data_pgtest LOCATION '/data_pgtest'"
$ psql -c "select oid, spcname, pg_tablespace_location(oid) from pg_tablespace"
oid | spcname | pg_tablespace_location
-------+--------------+------------------------
1663 | pg_default |
1664 | pg_global |
16419 | data_pgbench | /data_pgbench
16420 | data_pgtest | /data_pgtest
(4 rows)

Step 2 :

Now, I create two databases in two different tablespaces, using pgbench to create a few tables and load some data in them.

$ psql -c "CREATE DATABASE pgbench TABLESPACE data_pgbench"
$ psql -c "CREATE DATABASE pgtest TABLESPACE data_pgtest"
$ pgbench -i pgbench
$ pgbench -i pgtest

In a master-slave setup built using streaming replication, you must ensure that the directories exist on the slave before running "CREATE TABLESPACE ..." on the master. This is because the same statements used to create a tablespace are shipped and applied to the slave through WALs; this is unavoidable. The slave crashes with the following message when these directories do not exist:

2018-12-15 12:00:56.319 UTC [13121] LOG: consistent recovery state reached at 0/80000F8
2018-12-15 12:00:56.319 UTC [13119] LOG: database system is ready to accept read only connections
2018-12-15 12:00:56.327 UTC [13125] LOG: started streaming WAL from primary at 0/9000000 on timeline 1
2018-12-15 12:26:36.310 UTC [13121] FATAL: directory "/data_pgbench" does not exist
2018-12-15 12:26:36.310 UTC [13121] HINT: Create this directory for the tablespace before restarting the server.
2018-12-15 12:26:36.310 UTC [13121] CONTEXT: WAL redo at 0/9000448 for Tablespace/CREATE: 16417 "/data_pgbench"
2018-12-15 12:26:36.311 UTC [13119] LOG: startup process (PID 13121) exited with exit code 1
2018-12-15 12:26:36.311 UTC [13119] LOG: terminating any other active server processes
2018-12-15 12:26:36.314 UTC [13119] LOG: database system is shut down
2018-12-15 12:27:01.906 UTC [13147] LOG: database system was interrupted while in recovery at log time 2018-12-15 12:06:13 UTC
2018-12-15 12:27:01.906 UTC [13147] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

Step 3 :

Let’s now use pg_basebackup to take a backup. In this example, I use a tar format backup.

$ pg_basebackup -h localhost -p 5432 -U postgres -D /backup/latest_backup -Ft -z -Xs -P
94390/94390 kB (100%), 3/3 tablespaces

In the above log, you can see that three tablespaces have been backed up: the default one and the two newly created tablespaces. If we go back and check how the data of the two tablespaces is distributed to the appropriate directories, we see that symbolic links are created inside the pg_tblspc directory (within the data directory) for the OIDs of both tablespaces. These links point to the actual locations of the tablespaces we specified in Step 1.

$ ls -l $PGDATA/pg_tblspc
total 0
lrwxrwxrwx. 1 postgres postgres 5 Dec 15 12:31 16419 -> /data_pgbench
lrwxrwxrwx. 1 postgres postgres 6 Dec 15 12:31 16420 -> /data_pgtest

Step 4 :

Here are the contents of the backup directory that was generated by the backup taken in Step 3.

$ ls -l /backup/latest_backup
total 8520
-rw-------. 1 postgres postgres 1791930 Dec 15 12:54 16419.tar.gz
-rw-------. 1 postgres postgres 1791953 Dec 15 12:54 16420.tar.gz
-rw-------. 1 postgres postgres 5113532 Dec 15 12:54 base.tar.gz
-rw-------. 1 postgres postgres 17097 Dec 15 12:54 pg_wal.tar.gz

Tar files: 16419.tar.gz and 16420.tar.gz are created as backups of the two tablespaces. They are named after the OIDs of their respective tablespaces.

Let's now take a look at how we can restore this backup to completely different locations for the data and tablespaces.

Restore a backup with multiple tablespaces

Step 1 :

In order to proceed with the restore, let's first extract the base.tar.gz file. It contains some important files that help us proceed further.

$ tar xzf /backup/latest_backup/base.tar.gz -C /pgdata
$ ls -larth /pgdata
total 76K
drwx------. 2 postgres postgres 18 Dec 14 14:15 pg_xact
-rw-------. 1 postgres postgres 3 Dec 14 14:15 PG_VERSION
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_twophase
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_subtrans
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_snapshots
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_serial
drwx------. 4 postgres postgres 36 Dec 14 14:15 pg_multixact
-rw-------. 1 postgres postgres 1.6K Dec 14 14:15 pg_ident.conf
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_dynshmem
drwx------. 2 postgres postgres 6 Dec 14 14:15 pg_commit_ts
drwx------. 6 postgres postgres 54 Dec 14 14:18 base
-rw-------. 1 postgres postgres 4.5K Dec 14 16:16 pg_hba.conf
-rw-------. 1 postgres postgres 208 Dec 14 16:18 postgresql.auto.conf
drwx------. 2 postgres postgres 6 Dec 14 16:18 pg_stat
drwx------. 2 postgres postgres 58 Dec 15 00:00 log
drwx------. 2 postgres postgres 6 Dec 15 12:54 pg_stat_tmp
drwx------. 2 postgres postgres 6 Dec 15 12:54 pg_replslot
drwx------. 4 postgres postgres 68 Dec 15 12:54 pg_logical
-rw-------. 1 postgres postgres 224 Dec 15 12:54 backup_label
drwx------. 3 postgres postgres 28 Dec 15 12:57 pg_wal
drwx------. 2 postgres postgres 4.0K Dec 15 12:57 global
drwx------. 2 postgres postgres 32 Dec 15 13:01 pg_tblspc
-rw-------. 1 postgres postgres 55 Dec 15 13:01 tablespace_map
-rw-------. 1 postgres postgres 24K Dec 15 13:04 postgresql.conf
-rw-r--r--. 1 postgres postgres 64 Dec 15 13:07 recovery.conf
-rw-------. 1 postgres postgres 44 Dec 15 13:07 postmaster.opts
drwx------. 2 postgres postgres 18 Dec 15 13:07 pg_notify
-rw-------. 1 postgres postgres 30 Dec 15 13:07 current_logfiles

Step 2 :

The files that we need to consider for our recovery are:

  • backup_label
  • tablespace_map

When we open the backup_label file, we see the start WAL location, backup start time, etc. These details help us perform a point-in-time recovery.

$ cat backup_label
START WAL LOCATION: 0/B000028 (file 00000001000000000000000B)
CHECKPOINT LOCATION: 0/B000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2018-12-15 12:54:10 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1

Now, let us see what is inside the tablespace_map file.

$ cat tablespace_map
16419 /data_pgbench
16420 /data_pgtest

In the above output, you can see that there are two entries, one for each tablespace. This is a file that maps a tablespace (OID) to its location. When you start PostgreSQL after extracting the tablespace and WAL tar files, symbolic links are created automatically by postgres inside the pg_tblspc directory for each tablespace, pointing to the appropriate tablespace location using the mapping recorded in this file.

Step 3 :

Now, in order to restore this backup on the same postgres server from which the backup was taken, you must remove the existing data in the original tablespace directories. This allows you to extract the tar file of each tablespace to the appropriate tablespace location.
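
Clearing out the original tablespace directories might look like the following sketch. This is destructive, so treat it as an illustration only, and double-check that these paths really are the tablespace locations from Step 1 before running anything like it:

$ rm -rf /data_pgbench/* /data_pgtest/*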

The actual commands for extracting tablespaces from the backup in this case were the following:

$ tar xzf 16419.tar.gz -C /data_pgbench (Original tablespace location)
$ tar xzf 16420.tar.gz -C /data_pgtest  (Original tablespace location)

In a scenario where you want to restore the backup to the same machine from which it was originally taken, but without overwriting the original directories, you must use different locations while extracting the data directory and tablespaces from the backup. In order to achieve that, the tar file of each tablespace may be extracted to a directory other than the original one specified in the tablespace_map file, after which we modify the tablespace_map file with the new tablespace locations. The next two steps show how this works.

Step 3a :

Create two different directories and extract the tablespaces to them.

$ tar xzf 16419.tar.gz -C /pgdata_pgbench (Different location for tablespace than original)
$ tar xzf 16420.tar.gz -C /pgdata_pgtest  (Different location for tablespace than original)

Step 3b :

Edit the tablespace_map file with the new tablespace locations. Replace the original location of each tablespace with the new location, where we have extracted the tablespaces in the previous step. Here is how it appears after the edit.

$ cat tablespace_map
16419 /pgdata_pgbench
16420 /pgdata_pgtest

Step 4 :

Extract pg_wal.tar.gz from the backup to the pg_wal directory of the new data directory.

$ tar xzf pg_wal.tar.gz -C /pgdata/pg_wal

Step 5 :

Create recovery.conf to specify the time until which you wish to perform a point-in-time recovery. Please refer to Step 3 of our previous blog post to understand recovery.conf for PITR in detail.
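
For illustration only, a minimal recovery.conf might look like the sketch below; the archive path and target time are placeholders and assume WAL archiving was set up as described in that earlier post:

restore_command = 'cp /backup/archivedir/%f %p'
recovery_target_time = '2018-12-15 13:00:00'
recovery_target_action = 'promote'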

Step 6 :

Once all of the steps above are complete, you can start PostgreSQL.
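
Starting the restored instance might look like the following sketch, assuming /pgdata is the new data directory and pg_ctl is in the PATH:

$ pg_ctl -D /pgdata start
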
You should see the following files renamed after recovery.

backup_label   --> backup_label.old
tablespace_map --> tablespace_map.old
recovery.conf  --> recovery.done

To avoid the exercise of manually modifying the tablespace_map file, you can use --tablespace-mapping. This is an option that works when you use a plain format backup, but not with tar. Let’s see why you may prefer a tar format when compared to plain.

Backup of PostgreSQL cluster with tablespaces using plain format

Consider the same scenario where you have a PostgreSQL cluster with two tablespaces. You might see the following error when you do not use --tablespace-mapping.

$ pg_basebackup -h localhost -p 5432 -U postgres -D /backup/latest_backup -Fp -Xs -P -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/22000028 on timeline 1
pg_basebackup: directory "/data_pgbench" exists but is not empty
pg_basebackup: removing contents of data directory "/backup/latest_backup"

What the above error means is that pg_basebackup is trying to store the tablespace backup in the same location as the original tablespace directory. Here, /data_pgbench is the location of the tablespace data_pgbench, and pg_basebackup is attempting to write the backup of that tablespace to the very same directory. In order to overcome this error, you can apply tablespace mapping using the following syntax.

$ pg_basebackup -h localhost -p 5432 -U postgres -D /backup/latest_backup -T "/data_pgbench=/pgdata_pgbench" -T "/data_pgtest=/pgdata_pgtest" -Fp -Xs -P

-T is used to specify the tablespace mapping; -T can be replaced by --tablespace-mapping.

The advantage of using -T (--tablespace-mapping) is that the tablespaces are stored separately in the mapped directories. In this example with a plain format backup, you need all of the following three directories in order to restore/recover the database using the backup. A short sketch of how they fit together follows the list.

  • /backup/latest_backup
  • /pgdata_pgtest
  • /pgdata_pgbench
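
As a rough sketch only (using the paths from this example and a hypothetical empty target data directory /pgdata_plain), the plain-format base backup directory simply becomes the new data directory, and its pg_tblspc symlinks already point at the mapped tablespace locations:

$ mkdir /pgdata_plain && chmod 700 /pgdata_plain
$ cp -a /backup/latest_backup/. /pgdata_plain/   # plain-format base backup becomes the data directory
$ ls -l /pgdata_plain/pg_tblspc                  # symlinks point to /pgdata_pgbench and /pgdata_pgtest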

However, you do not need a tablespace_map file in this scenario, as it is automatically taken care of by PostgreSQL.

If you take a backup in tar format, you see all the tar files for the base directory, the tablespaces, and the WAL segments stored in the same backup directory, and that single directory is all you need to carry around for a restore/recovery. However, you must manually extract the tablespaces and WAL segments to the appropriate locations and edit the tablespace_map file, as discussed above.
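
To recap, the tar-format restore flow demonstrated earlier boils down to the following sequence (paths are the ones used in this example):

$ tar xzf /backup/latest_backup/base.tar.gz -C /pgdata            # base backup -> new data directory
$ tar xzf /backup/latest_backup/16419.tar.gz -C /pgdata_pgbench   # tablespace with OID 16419
$ tar xzf /backup/latest_backup/16420.tar.gz -C /pgdata_pgtest    # tablespace with OID 16420
$ tar xzf /backup/latest_backup/pg_wal.tar.gz -C /pgdata/pg_wal   # WAL needed for recovery
$ vi /pgdata/tablespace_map                                       # point 16419/16420 at the new locations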



Dec
10
2018
--

Percona XtraBackup 8.0.4 Is Now Available

Percona is glad to announce the release of Percona XtraBackup 8.0.4 on December 10, 2018. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

This release of Percona XtraBackup is a General Availability release ready for use in a production environment.

Please note the following about this release:

  • The deprecated innobackupex has been removed. Use the xtrabackup command to back up your instances: $ xtrabackup --backup --target-dir=/data/backup
  • When migrating from earlier database server versions, back up and restore using XtraBackup 2.4, and then run mysql_upgrade from MySQL 8.0.x
  • If you use the yum or apt repositories to install Percona XtraBackup 8.0.4, ensure that you have enabled the new tools repository. You can do this with the percona-release enable tools release command and then install the percona-xtrabackup-80 package (see the sketch after this list).
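
On a yum-based system, those two commands might look like the following sketch; the exact invocation depends on your package manager:

$ sudo percona-release enable tools release
$ sudo yum install percona-xtrabackup-80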

All Percona software is open-source and free. We are grateful to the community for the invaluable contributions to Percona XtraBackup. We would especially like to highlight the input of Alexey Kopytov who has been actively offering improvements and submitting bug reports for Percona XtraBackup.

New Features

  • Percona XtraBackup 8.0.4 is based on MySQL 8.0.13 and fully supports Percona Server for MySQL 8.0 series and MySQL 8.0 series.

Bugs Fixed

  • PXB-1699: xtrabackup --prepare could fail on backups of MySQL 8.0.13 databases
  • PXB-1704: xtrabackup --prepare could hang while performing insert buffer merge
  • PXB-1668: When the --throttle option was used, the applied value was different from the one specified by the user (off-by-one error)
  • PXB-1679: PXB could crash when an ALTER TABLE … TRUNCATE PARTITION command was run during a backup without locking DDL

Nov
05
2018
--

How to Quickly Add a Node to an InnoDB Cluster or Group Replication


In this blog, we’ll look at how to quickly add a node to an InnoDB Cluster or Group Replication using Percona XtraBackup.

Adding nodes to a Group Replication cluster can be easy (documented here), but it only works if the existing nodes have retained all the binary logs since the creation of the cluster. Obviously, this is possible if you create a new cluster from scratch. The nodes rotate old logs after some time, however. Technically, if the gtid_purged set is non-empty, it means you will need another method to add a new node to a cluster. You also need a different method if data becomes inconsistent across cluster nodes for any reason. For example, you might hit something similar to this bug, or fall prey to human error.
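
A quick way to check whether that purged set is empty on an existing node (a sketch using the mysql client, with credentials omitted):

$ mysql -e "SELECT @@GLOBAL.gtid_purged\G"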

Hot Backup to the Rescue

The quick and simple method I’ll present here requires the Percona XtraBackup tool to be installed, as well as some additional small tools for convenience. I tested my example on CentOS 7, but it works similarly on other Linux distributions. First of all, you will need the Percona repository installed:

# yum install http://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm -y -q

Then, install Percona XtraBackup and the additional tools. You might need to enable the EPEL repo for the additional tools, and the experimental Percona repo for XtraBackup 8.0, which works with MySQL 8.0. (Note: XtraBackup 8.0 is still not GA at the time of writing this article, and we do NOT recommend or advise that you install XtraBackup 8.0 into a production environment until it is GA). For MySQL 5.7, XtraBackup 2.4 from the regular repo is what you are looking for:

# grep -A3 percona-experimental-\$basearch /etc/yum.repos.d/percona-release.repo
[percona-experimental-$basearch]
name = Percona-Experimental YUM repository - $basearch
baseurl = http://repo.percona.com/experimental/$releasever/RPMS/$basearch
enabled = 1

# yum install pv pigz nmap-ncat percona-xtrabackup-80 -q
==============================================================================================================================================
 Package                             Arch                 Version                             Repository                                 Size
==============================================================================================================================================
Installing:
 nmap-ncat                           x86_64               2:6.40-13.el7                       base                                      205 k
 percona-xtrabackup-80               x86_64               8.0.1-2.alpha2.el7                  percona-experimental-x86_64                13 M
 pigz                                x86_64               2.3.4-1.el7                         epel                                       81 k
 pv                                  x86_64               1.4.6-1.el7                         epel                                       47 k
Installing for dependencies:
 perl-DBD-MySQL                      x86_64               4.023-6.el7                         base                                      140 k
Transaction Summary
==============================================================================================================================================
Install  4 Packages (+1 Dependent package)
Is this ok [y/d/N]: y
#

You need to do it on both the source and destination nodes. Now, my existing cluster node (I will call it a donor) – gr01 looks like this:

gr01 > select * from performance_schema.replication_group_members\G
*************************** 1. row ***************************
  CHANNEL_NAME: group_replication_applier
     MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
   MEMBER_HOST: gr01
   MEMBER_PORT: 3306
  MEMBER_STATE: ONLINE
   MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.13
1 row in set (0.00 sec)
gr01 > show global variables like 'gtid%';
+----------------------------------+-----------------------------------------------+
| Variable_name                    | Value                                         |
+----------------------------------+-----------------------------------------------+
| gtid_executed                    | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662 |
| gtid_executed_compression_period | 1000                                          |
| gtid_mode                        | ON                                            |
| gtid_owned                       |                                               |
| gtid_purged                      | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-295538 |
+----------------------------------+-----------------------------------------------+
5 rows in set (0.01 sec)

The new node candidate (I will call it a joiner) – gr02 – has no data but the same MySQL version installed. It also has the required settings in place, like the existing node address in group_replication_group_seeds, etc. The next step is to stop the MySQL service on the joiner (if already running), and wipe out its datadir:

[root@gr02 ~]# rm -fr /var/lib/mysql/*

and start the “listener” process that waits to receive the data snapshot (remember to open the TCP port if you have a firewall):

[root@gr02 ~]# nc -l -p 4444 |pv| unpigz -c | xbstream -x -C /var/lib/mysql
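
If the joiner host runs firewalld (the CentOS 7 default), opening that port for the transfer might look like this sketch:

[root@gr02 ~]# firewall-cmd --add-port=4444/tcp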

Then, start the backup job on the donor:

[root@gr01 ~]# xtrabackup --user=root --password=*** --backup --parallel=4 --stream=xbstream --target-dir=./ 2> backup.log |pv|pigz -c --fast| nc -w 2 192.168.56.98 4444
240MiB 0:00:02 [81.4MiB/s] [ <=>

On the joiner side, we will see:

[root@gr02 ~]# nc -l -p 4444 |pv| unpigz -c | xbstream -x -C /var/lib/mysql
21.2MiB 0:03:30 [ 103kiB/s] [ <=> ]
[root@gr02 ~]# du -hs /var/lib/mysql
241M /var/lib/mysql

BTW, if you noticed the difference in transfer rate between the two, please note that on the donor side I put |pv| before the compressor, while on the joiner I put it before the decompressor. This way, I can monitor the compression ratio at the same time!

The next step will be to prepare the backup on joiner:

[root@gr02 ~]# xtrabackup --use-memory=1G --prepare --target-dir=/var/lib/mysql 2>prepare.log
[root@gr02 ~]# tail -1 prepare.log
181019 19:18:56 completed OK!

and fix the files ownership:

[root@gr02 ~]# chown -R mysql:mysql /var/lib/mysql

Now we should verify the GTID position information and restart the joiner (I have group_replication_start_on_boot=off in my.cnf):

[root@gr02 ~]# cat /var/lib/mysql/xtrabackup_binlog_info
binlog.000023 893 aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662
[root@gr02 ~]# systemctl restart mysqld

Now, let’s check if the position reported by the node is consistent with the above:

gr02 > select @@GLOBAL.gtid_executed;
+-----------------------------------------------+
| @@GLOBAL.gtid_executed                        |
+-----------------------------------------------+
| aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302660 |
+-----------------------------------------------+
1 row in set (0.00 sec)

No, it is not. We have to correct it:

gr02 > reset master; set global gtid_purged="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662";
Query OK, 0 rows affected (0.05 sec)
Query OK, 0 rows affected (0.00 sec)

Finally, start the replication:

gr02 > START GROUP_REPLICATION;
Query OK, 0 rows affected (3.91 sec)

Let’s check the cluster status again:

gr01 > select * from performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
MEMBER_HOST: gr01
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.13
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: a60a4124-d3d4-11e8-8ef2-525400cae48b
MEMBER_HOST: gr02
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.13
2 rows in set (0.00 sec)
gr01 > select * from performance_schema.replication_group_member_stats\G
*************************** 1. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 15399708149765074:4
                                 MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
               COUNT_TRANSACTIONS_IN_QUEUE: 0
                COUNT_TRANSACTIONS_CHECKED: 3
                  COUNT_CONFLICTS_DETECTED: 0
        COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302666
            LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:302665
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
         COUNT_TRANSACTIONS_REMOTE_APPLIED: 2
         COUNT_TRANSACTIONS_LOCAL_PROPOSED: 3
         COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 2. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 15399708149765074:4
                                 MEMBER_ID: a60a4124-d3d4-11e8-8ef2-525400cae48b
               COUNT_TRANSACTIONS_IN_QUEUE: 0
                COUNT_TRANSACTIONS_CHECKED: 0
                  COUNT_CONFLICTS_DETECTED: 0
        COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302666
            LAST_CONFLICT_FREE_TRANSACTION:
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
         COUNT_TRANSACTIONS_REMOTE_APPLIED: 0
         COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
         COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
2 rows in set (0.00 sec)

OK, our cluster is consistent! The new node joined successfully as a secondary. We can proceed to add more nodes!
