Jan 24, 2023

Backup Databases on Kubernetes With VolumeSnapshots

Databases on Kubernetes continue their rising trend. We see the growing adoption of our Percona Kubernetes Operators and the demand to migrate workloads to the cloud-native platform. Our Operators provide built-in backup and restore capabilities, but some users are still looking for old-fashioned ways, like storage-level snapshots (i.e., AWS EBS Snapshots).

In this blog post, you will learn:

  1. How to back up and restore from storage snapshots using Percona Operators
  2. What the risks and limitations are of such backups

Overview

Volume Snapshots went GA in Kubernetes 1.20. Both your storage and your Container Storage Interface (CSI) driver must support snapshots. All major cloud providers support them but might require some steps to enable them. For example, for GKE, you must create a VolumeSnapshotClass resource first.
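As an illustration, a minimal VolumeSnapshotClass for GKE could look like the sketch below. It assumes the GKE Persistent Disk CSI driver (pd.csi.storage.gke.io), and the class name matches the one referenced later in this post:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: gke-snapshotclass
driver: pd.csi.storage.gke.io
deletionPolicy: Delete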

At a high level, snapshotting on Kubernetes works like this: just as a PersistentVolume represents the real storage volume, a VolumeSnapshot is the Kubernetes resource that represents a snapshot of that volume in the cloud.

Getting ready for backups

First, we need to be sure that VolumeSnapshots are supported. For the major clouds, read the following docs:

Once you have CSI configured and Volume Snapshot Class is in place, proceed to create a backup.

Take the backup

Identify the PersistentVolumeClaims (PVCs) that you want to snapshot. For example, for my MongoDB cluster, I have six PVCs: three for the replica set nodes and three for the config server nodes.

$ kubectl get pvc
NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongod-data-my-cluster-name-cfg-0   Bound    pvc-c9fb5afa-1fc9-41f9-88f3-4ed457f88e58   3Gi        RWO            standard-rwo   78m
mongod-data-my-cluster-name-cfg-1   Bound    pvc-b9253264-f79f-4fd0-8496-1d88105d84e5   3Gi        RWO            standard-rwo   77m
mongod-data-my-cluster-name-cfg-2   Bound    pvc-5d462005-4015-47ad-9269-c205b7a3dfcb   3Gi        RWO            standard-rwo   76m
mongod-data-my-cluster-name-rs0-0   Bound    pvc-410acf85-36ad-4bfc-a838-f311f9dfd40b   3Gi        RWO            standard-rwo   78m
mongod-data-my-cluster-name-rs0-1   Bound    pvc-a621dd8a-a671-4a35-bb3b-3f386550c101   3Gi        RWO            standard-rwo   77m
mongod-data-my-cluster-name-rs0-2   Bound    pvc-484bb835-0e2d-4a40-b5a3-1ba340ec0567   3Gi        RWO            standard-rwo   76m

Each PVC will have its own VolumeSnapshot. Here is an example for mongod-data-my-cluster-name-cfg-0:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mongod-data-my-cluster-name-cfg-0-snap
spec:
  volumeSnapshotClassName: gke-snapshotclass
  source:
    persistentVolumeClaimName: mongod-data-my-cluster-name-cfg-0

I have listed all my VolumeSnapshot objects in one YAML manifest here.

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/volume-snapshots/mongo-volumesnapshots.yaml
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-cfg-0-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-cfg-1-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-cfg-2-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-rs0-0-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-rs0-1-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-rs0-2-snap created

A VolumeSnapshotContent object is created and bound to every VolumeSnapshot resource. Its status tells you the name of the snapshot in the cloud and whether the snapshot is ready:

$ kubectl get volumesnapshotcontent snapcontent-0e67c3b5-551f-495b-b775-09d026ea3c8f -o yaml
…
status:
  creationTime: 1673260161919000000
  readyToUse: true
  restoreSize: 3221225472
  snapshotHandle: projects/percona-project/global/snapshots/snapshot-0e67c3b5-551f-495b-b775-09d026ea3c8f

  • snapshot-0e67c3b5-551f-495b-b775-09d026ea3c8f is the snapshot I have in GCP for the volume.
  • readyToUse: true – indicates that the snapshot is ready
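To check readiness across all snapshots at once, rather than inspecting each VolumeSnapshotContent individually, a quick kubectl query works (custom columns are just one way to do this):

$ kubectl get volumesnapshot -o custom-columns=NAME:.metadata.name,READY:.status.readyToUse

On a recent kubectl you can also block until everything is ready with kubectl wait --for=jsonpath='{.status.readyToUse}'=true volumesnapshot --all.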

Restore

The restoration process, in a nutshell, looks as follows:

  1. Create persistent volumes using the snapshots. The names of the volumes must match the naming convention that the Operator uses.
  2. Provision the cluster

As with any other cluster, the restored one must have its Secrets in place: TLS certificates and user credentials.

You can use this restoration process to clone existing clusters as well; just make sure you change the cluster, PVC, and Secret names.

Create persistent volumes from the snapshots. This is the same as creating a regular PersistentVolumeClaim, but with a dataSource section that points to the snapshot:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongod-data-my-cluster-name-rs0-0
spec:
  dataSource:
    name: mongod-data-my-cluster-name-rs0-0-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: standard-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/volume-snapshots/mongo-pvc-restore.yaml
persistentvolumeclaim/mongod-data-my-cluster-name-cfg-0 created
persistentvolumeclaim/mongod-data-my-cluster-name-cfg-1 created
persistentvolumeclaim/mongod-data-my-cluster-name-cfg-2 created
persistentvolumeclaim/mongod-data-my-cluster-name-rs0-0 created
persistentvolumeclaim/mongod-data-my-cluster-name-rs0-1 created
persistentvolumeclaim/mongod-data-my-cluster-name-rs0-2 created

Once done, spin up the cluster as usual. The volumes you created earlier will be used automatically. Restoration is done.
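In practice, "spin up the cluster as usual" for the Percona Operators simply means applying your cluster custom resource again. A minimal sketch, assuming the Operator's usual deploy/cr.yaml manifest (adjust the path to wherever your cluster definition lives):

$ kubectl apply -f deploy/cr.yaml
$ kubectl get pvc   # the pre-created PVCs should show as Bound to the new pods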

Risks and limitations

Storage support

Both storage and the storage plugin in Kubernetes must support volume snapshots. This limits the choices. Apart from public clouds, there are open source solutions like Ceph (rook.io for k8s) that can provide snapshotting capabilities.

Point-in-time recovery

Point-in-time recovery (PITR) allows you to reduce your Recovery Point Objective (RPO) by restoring or rolling back the database to a specific transaction or time.

Volume snapshots in the clouds store data in increments. The first snapshot holds all the data, and the following ones only store the changes. This significantly reduces your cloud bill. But snapshots cannot provide you with the same RPO as native database mechanisms.

Data consistency and corruption

Snapshots are not data-aware. When a snapshot is taken, numerous transactions and data modifications can happen. For example, heavy write activity and simultaneous compound index creation in MongoDB might lead to snapshot corruption. The biggest problem is that you will learn about data corruption during restoration.

Locking or freezing the filesystem before taking the snapshot helps to avoid such issues. Solutions like Velero or Veeam take the first steps towards data awareness and can create consistent snapshots by automating filesystem freezes or stopping replication.
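As a rough illustration of what such automation does under the hood, freezing the filesystem around snapshot creation might look like the sketch below. The pod and container names follow the example cluster above but are assumptions, as is the /data/db mount point, and it presumes the container image ships fsfreeze; a real solution would also coordinate this across nodes and handle failures:

# flush dirty pages and block new writes on the data volume
$ kubectl exec my-cluster-name-rs0-0 -c mongod -- fsfreeze --freeze /data/db

# ... create the VolumeSnapshot while the filesystem is frozen ...

# unblock writes once the snapshot has been cut
$ kubectl exec my-cluster-name-rs0-0 -c mongod -- fsfreeze --unfreeze /data/db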

Percona Services teams use various tools to automate the snapshot creation safely. Please contact us here to ensure data safety.

Cost

Public clouds store snapshots on cheap object storage but charge you extra for the convenience. For example, AWS EBS snapshots are priced at $0.05/GB-month, whereas S3 standard storage is only $0.023/GB-month. That is roughly a 2x difference, which for giant data sets might significantly increase your bill.

Time to recover

It is not a risk or limitation but a common misconception I often see: that recovery from snapshots takes only a few seconds. It does not. Creating an EBS volume from a snapshot does take only a few seconds, but the volume you just created does not yet contain the data; it is loaded lazily in the background. You can read more about the internals of EBS snapshots in this nice blog post.

Conclusion

Volume Snapshots on Kubernetes can be used for databases but come with certain limitations and risks. Data safety and consistency are the most important factors when choosing a backup solution. For Percona Operators, we strongly recommend using built-in solutions which guarantee data consistency and minimize your recovery time and point objectives.

Learn More About Percona Kubernetes Operators

Mar 11, 2022

PostgreSQL 101 for Non-Postgres DBAs (Simple Backup and Restore)

PostgreSQL 101 Backup and Restore

It’s no surprise that PostgreSQL is becoming the de facto go-to database for many. Just a few of the many reasons include advanced technology, scalability, and ways to save money. With that said, we see many experienced DBAs being tasked with migrating existing databases from Oracle, MySQL, SQL Server, and others to Postgres. Although, fundamentally speaking, a good DBA should have a conceptual understanding of database fundamentals, the way you perform daily tasks differs from one technology to another. With that in mind, this blog is addressed to those experienced DBAs who have a well-known and proven set of routines in their old technology and want to know how to perform them in Postgres.

Postgres offers several utilities for performing both physical and logical backups and restores. We will talk about these and how to use them here.

For the purpose of this mini-tutorial, we are assuming all tasks will be performed by the user “postgres”, which has superuser privileges on the database unless otherwise noted.

I Want To …..

 

Logical Backups

Logical backups are processed with native tools such as pg_dump and pg_dumpall. These tools should be included in the default bin directory of your Postgres installation, such as /usr/pgsql-11/bin. If that bin directory is not in your path, you may want to add it.

There are many options that can be used when running these tools to customize your data dumps. So, we will cover a few scenarios in this blog.

 

Physical Backups

Physical backups are processed with native tools such as pg_basebackup. Again, these tools should be included in the default bin directory of your Postgres installation, such as /usr/pgsql-11/bin. If that bin directory is not in your path, you may want to add it.

You can also use system tools for physical backups such as tar or other archiving tools at your disposal.

 

Prerequisite for Remote Backups

The source database server has to allow a remote connection for the user performing the task. Remember, we are assuming for our examples that the user is postgres. 

 

  1. Create an entry in the pg_hba.conf file similar to the following under the IPv4 connections section.

host    all        postgres        0.0.0.0/0               md5

  2. Edit your postgresql.conf file, or whatever file you may be loading for runtime configs, and change the parameter listen_addresses to the following:

listen_addresses = '*'

Once the above changes are made, reload your configuration file or restart postgres. 

The above examples are pretty open. For security, you most likely will restrict the entry in the pg_hba.conf file to a more specific IP address or subnet.

In our example, we are allowing postgres to connect from anywhere with password authentication; thus the 0.0.0.0/0 and md5. You could change the 0.0.0.0/0 to the address of the other database server, like 192.168.1.2/32. We also specify the user postgres with the -U option since it is the user we opened up in the pg_hba.conf file.
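For example, a tighter pg_hba.conf entry that only allows the postgres user to connect from a single remote server (the address below is only a placeholder) would look like this:

host    all        postgres        192.168.1.2/32          md5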

If the user running the commands has different credentials on the source/target servers, you will need to save the password to .pgpass or set the environment variable PGPASSWORD so you are not prompted for the password whenever it is needed.
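A minimal sketch of both options follows; the host, port, and password are placeholders, and the .pgpass line format is hostname:port:database:username:password.

# for the current shell session only
export PGPASSWORD='secret'

# or persist it for the OS user running the commands
echo '192.168.1.2:5432:*:postgres:secret' >> ~/.pgpass
chmod 600 ~/.pgpass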

I want to dump my entire database, including users and credentials, to a file.

This is quite a simple task to perform if you have the correct privileges and configuration settings along with the storage needed depending on your database size.

Performing the Data Dump Locally

If you have only one instance of postgres running on your server and have minimal / default configuration for the pg_hba.conf file and your path includes the postgres bin directory, all you need to do as user postgres is ….

pg_dumpall > savedfile.sql

The above works well for small databases where you have space on the local server and just want a quick and simple dump of your database.
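If local disk space is tight, a common variation is to compress the dump as it is written and decompress it on restore (gzip is shown here; any stream compressor works):

pg_dumpall | gzip > savedfile.sql.gz
gunzip -c savedfile.sql.gz | psql postgres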

If you are running multiple instances on the local server and want to dump a specific instance all you do is …

pg_dumpall -p port > savedfile.sql

Replace the port above with the port number the instance you wish to dump is running on.

Performing the data dump remotely.

Although this is pretty much the same thing as on a local server, there are a few things you need to have configured in order to execute this data dump remotely. Plus, your prerequisites need to be addressed.

Now from our remote client or server, we can run the following commands as long as the postgres tools are installed.

pg_dumpall -h host -p port -U postgres > savedfile.sql

Replace the host above with the address of the source DB and port with the port number it is running on.

There are other flags and options you can use. Have a look here for the usage options

I want to dump a specific database only.

Performing the data dump locally.

Similar to the other commands with a slight variation

pg_dump -d dbname > savedfile.sql

Like in other scenarios, the above works well for small databases where you have space on the local server and just want a quick and simple dump of your database.

If you are running multiple instances on the local server and want to dump from a specific instance all you do is …

pg_dump -p port -d dbname > savedfile.sql

I want to dump a specific database and specific table or tables only.

On a local server

Similar to the other commands with a slight variation

pg_dump -d dbname -t tablename > savedfile.sql

Like in other scenarios, the above works well for small databases where you have space on the local server and just want a quick and simple dump of your database.

If you are running multiple instances on the local server and want to dump from a specific instance all you do is …

pg_dump -p port -d dbname -t tablename > savedfile.sql

If you want more than one table, list their names or patterns like so …

pg_dump -d dbname -t table1 -t table2 -t table3 > savedfile.sql

From a remote server

Just like in previous examples, specify the connection options with -h host -p port

I only want to dump the users and credentials to restore them somewhere else.

This is just as simple as the above data dumps. However, keep in mind that this will not get you what you need if your instance is an RDS instance. Amazon really locks down what you can do as a privileged user on an RDS instance, even as the postgres user.

From a local server

pg_dumpall -g > users.sql

From a remote server or client. ( saves file locally )

pg_dumpall -g -h host -p port -U postgres > users.sql

You can edit the above dump file and remove any user you do not wish to apply when you restore the file to a different server.

Restoring a Logical Dump

Restoring the newly created backup is a simple task. There are several ways to accomplish this, and we will go over a few of them just to get you going. Keep in mind there is a pg_restore utility as well, which we will not be addressing in this blog; pg_restore lets you get more creative with your dumps and imports.

Again, we assume all actions here are executed as user postgres.

Restoring a pg_dumpall to a local server from a saved file.

psql postgres -f savedfile.sql

Restoring a pg_dumpall to a remote server from a saved file.

psql -h host -p port postgres -f savedfile.sql

Restoring a pg_dumpall to a remote server from the source server.

pg_dumpall | psql -h host -p port postgres

Restoring from a pg_dumpall from a remote server to a remote server.

pg_dumpall -h src_host -p src_port | psql -h target_host -p target_port postgres

Restoring a pg_dump of a specific database from a saved file.

psql dbname -f savedfile.sql

Restoring a pg_dump of a specific database to a remote server from a saved file.

psql -h host -p port dbname -f savedfile.sql

Restoring a pg_dump with a different owner on the target.

Sometimes you don’t have access to the users and credentials on a source database or want them to be different on your target/restored database. Follow these steps to achieve this.

  1. Perform your pg_dump command as noted previously but add the --no-owner option.
  2. Perform the restore as noted above but run the commands as the new owner. 

pg_dump -d database --no-owner > savedfile.sql

psql -U newowner dbname -f savedfile.sql

Remember for remote servers as noted in the other examples, use the -h host -p port and any other connection string option needed.

If the user’s credentials are different and you are prompted for passwords,  read the prerequisites section of this blog.

Let’s Get Physical with pg_basebackup

A common way of performing physical backups in Postgres is with the use of pg_basebackup. This tool allows us to generate a physical backup with the necessary WAL files needed to restore or stand up a stand-alone instance.

There are many flags and options for this tool including compression but for the sake of this blog, we will focus on the basic use of pg_basebackup with minimal options.

For the purpose of this document, we will cover physical backups using the native pg_basebackup tool.

NOTE: Typically, one specifies the destination path for the physical backup. This is noted with the -D option of pg_basebackup.

Saving the backup to destination path

pg_basebackup -D /destination/path -Pv --checkpoint=fast

Sending the backup as tar files to the directory path specified

pg_basebackup -D /destination/path -Pv --checkpoint=fast -F t

The above will generate two tar files: a base.tar and a pg_wal.tar.

Create a Physical Backup From a Remote Instance

Make sure you have set up the prerequisites as explained here

The only difference between remote and local execution is that for remote, we specify the source server with -h remote_host and the port postgres is running on with -p remote_port.

pg_basebackup -h host -p port -D /destination/path -Pv --checkpoint=fast

If the user executing pg_basebackup is not trusted directly from the server executing the pg_basebackup, add the additional option of -U username. For example …

pg_basebackup -U postgres -h host -p port -D /destination/path -Pv --checkpoint=fast

Stand up a Local Instance of Postgres using pg_basebackup

Tar file method

If you execute pg_basebackup with the tar file option, it will generate two tar files: a base.tar and a pg_wal.tar.

Extract the base.tar. If you do not have different WAL files to restore, extract the pg_wal.tar and place the WAL segment files in the pg_wal directory.
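As a sketch of that extraction, assuming the tar files were written to /destination/path and that directory will become the new data directory:

cd /destination/path
tar -xvf base.tar
tar -C pg_wal -xvf pg_wal.tar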

Directory method

Make sure the directory where the new cluster will be located exists with the proper permissions and storage capacity. Remember, this will consume the same amount of space as the source database.

Define where the target database will reside.

  • mkdir -p /destination/path
  • chmod 700 /destination/path
  • chown postgres:postgres  /destination/path

As user postgres, run the following command assuming pg_basebackup is in your path.

Source database is local

pg_basebackup -D /destination/path -Pv --checkpoint=fast -X stream

Source database is on a remote server

pg_basebackup -h host -p port -D /destination/path -Pv --checkpoint=fast -X stream

What does the above do?

  1. Assumes postgres is running on the localhost using the default port of 5432 and that the user executing it has the necessary privileges to do so.
  2. Initiates a pg_basebackup of the current and running instance of postgres.
  3. Saves the copy to the path specified after the -D option.
  4. Optionally, the -Pv will show the progress and verbose output of the process.
  5. Performs a fast checkpoint rather than spreading it out, which makes the backup start sooner.
  6. Streams the WAL changes that are happening on the running cluster and saves them in the new cluster. This will allow starting the new cluster without additional WALs.

The above applies to whether the database is remote or not.

Starting the separate instance of postgres

When the pg_basebackup completes, to start up the new local instance, go into the new data directory /destination/path and modify the postgresql.conf file (or whatever file you may have defined your previous port in).

  • Set the port to a number not in use, such as 5433 (i.e., port = 5433)
  • Modify any memory parameters necessary
  • Make sure, if archiving is enabled, it archives to a different location than the original cluster (a sketch of these edits follows below)
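Taken together, the postgresql.conf edits might look like the following sketch; the values are illustrative only, and the archive settings apply only if archiving is enabled on the source:

port = 5433
# scale memory down if the copy shares a host with the source (example value)
shared_buffers = 512MB
# archive somewhere other than the original cluster's location
archive_mode = on
archive_command = 'cp %p /destination/archive/%f'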

You can then proceed to start the new instance of postgres as follows:

pg_ctl -D /destination/path -o "-p 5433" start

You should now be able to connect to the new cluster with the exact same credentials as the source cluster:

psql -p 5433

Stand up a remote cluster

This process is pretty much identical to the local cluster process above. The only difference is you will specify a host and credentials.

From the remote target host 

pg_basebackup -h source_server -p port -U username -D /destination/path -Pv --checkpoint=fast -X stream

As you can see, we are simply adding a connection string to the original command we ran for the local copy. This will generate the backup on the remote host and save it to the local destination path.

Once the copy is placed on the target host, if necessary, change your port and archive location if archiving is enabled as mentioned above.

Last words

The above examples are meant to get you started with basic backups and restores. They do not cover more advanced options such as archiving of wal files, point in time recovery, etc … This will be addressed in a future blog or by simply searching online.  Furthermore, using backups to stand up replicas will also be addressed in future blog postings.

Apr 02, 2021

Percona XtraBackup Point-In-Time Recovery for the Single Database

Percona XtraBackup Point-In-Time Recovery

Recovering to a particular time in the past is called Point-In-Time Recovery (PITR). With PITR you can roll back an unwanted DELETE without a WHERE clause or any other harmful command.

PITR with Percona XtraBackup is pretty straightforward and perfectly described in the user manual. You need to restore the data from the backup, then apply all binary logs created or updated after the backup was taken, but skip harmful event(s).

However, if your data set is large, you may want to recover only the affected database or table. This is possible, but you need to be smart when filtering events from the binary log. In this post, I will show how to perform such a partial recovery using only Percona XtraBackup, the mysql command-line client, and the mysqlbinlog program. There is an alternative approach that involves creating a fake source server, described in MySQL Point in Time Recovery the Right Way. You may consider it, especially if you need to apply changes to a single table.

Percona XtraBackup Point-In-Time Recovery

For our example we will create data first, then run DROP and DELETE commands on two different tables. Then we will roll back these commands.

First, let’s assume we have a server with two databases: test and sbtest. We are using GTIDs and row-based binary log format. We also run the server with the option innodb_file_per_table=1 and all our InnoDB tables use individual tablespaces. Otherwise, the individual restore method would not work.

mysql> show tables from sbtest;
+------------------+
| Tables_in_sbtest |
+------------------+
| sbtest1          |
| sbtest2          |
| sbtest3          |
| sbtest4          |
| sbtest5          |
| sbtest6          |
| sbtest7          |
| sbtest8          |
+------------------+
8 rows in set (0.00 sec)

mysql> show tables from test;
+----------------+
| Tables_in_test |
+----------------+
| bar            |
| baz            |
| foo            |
+----------------+
3 rows in set (0.00 sec)

We will experiment with tables foo and bar. We assume that at the time of our first backup, each of the tables contained five rows. Tables in the database sbtest also contain data, but it does not really matter for our experiment.

mysql> select count(*) from foo;
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from bar;
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from baz;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Since we want to restore individual tables, we need to do some preparation before taking the backup: store the database structure. We will do it with the help of the mysqldump command. In this example, I store the structure per database to make partial PITR easier, but you are free to use the option --all-databases.

mysqldump --no-data --set-gtid-purged=OFF --triggers --routines --events test > test_structure.sql
mysqldump --no-data --set-gtid-purged=OFF --triggers --routines --events sbtest > sbtest_structure.sql

Then we are ready to take the backup.

xtrabackup --parallel=8 --target-dir=./full_backup --backup

I am using the option --parallel to speed up the backup process.

Now let’s do some testing. First, let’s update rows in the table foo.

mysql> update foo set f1=f1*2;
Query OK, 5 rows affected (0.01 sec)
Rows matched: 5 Changed: 5 Warnings: 0

mysql> select * from foo;
+----+------+
| id | f1   |
+----+------+
|  1 |    2 |
|  2 |    4 |
|  3 |    6 |
|  4 |    8 |
|  5 |   10 |
+----+------+
5 rows in set (0.00 sec)

And then drop it and delete all rows from the table bar.

mysql> drop table foo;
Query OK, 0 rows affected (0.02 sec)

mysql> delete from bar;
Query OK, 5 rows affected (0.01 sec)

Finally, let’s insert a few rows into the tables bar and baz.

mysql> insert into bar(f1) values(6),(7),(8),(9),(10);
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0

mysql> insert into baz(f1) values(1),(2),(3),(4),(5);
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0

Assume that the DROP TABLE and DELETE commands were an accident, and we want to restore the tables foo and bar to the state they were in before these unlucky statements.

First, we need to prepare the backup.

Since we are interested in restoring only tables in the database test, we need to prepare the backup with a special option --export, which exports tablespaces in a way that they can later be imported:

xtrabackup --prepare --export --target-dir=./full_backup

Now the directory for the database test contains not only table definition files (.frm, only before 8.0) and tablespace files (.ibd) but also configuration files (.cfg).

Since we want all changes that happened after the backup and before the problematic DROP TABLE and DELETE statements were applied, we need to identify which binary log and position were current at backup time. We can find it in the xtrabackup_binlog_info file:

$ cat full_backup/xtrabackup_binlog_info
master-bin.000004 1601 0ec00eed-87f3-11eb-acd9-98af65266957:1-56

Now we are ready to perform restore.

First, let’s restore the table foo from the backup. Restoring individual tablespaces requires the ALTER TABLE ... IMPORT TABLESPACE command. This command assumes that the table exists in the server. However, in our case, it was dropped and therefore we need to re-create it.

We will recreate the full database test from the file test_structure.sql

Since we do not want these administrative tasks to be re-applied, I suggest disabling binary logging for the session which will recreate the database structure.

mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

mysql> source test_structure.sql
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)
....

Once the tables are recreated, discard their tablespaces. I will show an example for the table foo. Adjust the code for the rest of the tables.

mysql> alter table foo discard tablespace;
Query OK, 0 rows affected (0.01 sec)

Then, in another terminal, copy the tablespace and configuration files from the backup to the database directory:

cp full_backup/test/foo.{ibd,cfg} var/mysqld.1/data/test/

And, finally, import the tablespace:

mysql> alter table foo import tablespace;
Query OK, 0 rows affected (0.05 sec)

Repeat for the other tables in the database test.

Now you can turn binary logging back on.
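For the session used above, that is a single statement:

mysql> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)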

You can do the same task in a script. For example:

for table in `mysql test --skip-column-names --silent -e "show tables"`
> do
>   mysql test -e "set sql_log_bin=0; alter table $table discard tablespace"
>   cp full_backup/test/$table.{ibd,cfg} var/mysqld.1/data/test/
>   mysql test -e "set sql_log_bin=0; alter table $table import tablespace"
> done

Our tables are recovered but do not have the updates made after the backup.

mysql> select * from foo;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
+----+------+
5 rows in set (0.00 sec)

mysql> select * from bar;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
+----+------+
5 rows in set (0.00 sec)

mysql> select * from baz;
Empty set (0.00 sec)

Therefore, we need to restore data from the binary logs.

To do that, we first need to identify the GTIDs of the disaster events. We can do this by dumping all binary logs created or updated after the backup into a text file, then searching for the DROP TABLE and DELETE statements so that we can skip them.

First, let’s check which binary logs we have.

mysql> show binary logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| master-bin.000001 |   1527476 |
| master-bin.000002 |      3035 |
| master-bin.000003 |      1987 |
| master-bin.000004 |      2466 |
| master-bin.000005 |       784 |
+-------------------+-----------+
5 rows in set (0.00 sec)

So we need to parse them, starting from the log master-bin.000004 and position 1601:

mysqlbinlog --start-position=1601 -vvv --base64-output=decode-rows --database=test master-bin.000004 master-bin.000005 > binlog_test.sql

I used the option -vvv, which prints the SQL representation of row events so we can find the ones we want to skip, and --base64-output=decode-rows to not print the encoded row events at all. We will not use this file for the restore, only for finding the DROP TABLE and DELETE events.

Here they are, at the positions 2007 and 2123, with GTID 0ec00eed-87f3-11eb-acd9-98af65266957:58 and 0ec00eed-87f3-11eb-acd9-98af65266957:59

# at 2007
#210321 13:29:58 server id 1 end_log_pos 2123 CRC32 0xd1eb9854 Query thread_id=138 exec_time=0 error_code=0
use `test`/*!*/;
SET TIMESTAMP=1616322598/*!*/;
DROP TABLE `foo` /* generated by server */
/*!*/;
# at 2123
#210321 13:30:08 server id 1 end_log_pos 2188 CRC32 0xfc9b2088 GTID last_committed=7 sequence_number=8 rbr_only=yes original_committed_timestamp=0 immediate_commit_timestamp=0 transaction_length=0
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=0 (1970-01-01 02:00:00.000000 EET)
# immediate_commit_timestamp=0 (1970-01-01 02:00:00.000000 EET)
/*!80001 SET @@session.original_commit_timestamp=0*//*!*/;
/*!80014 SET @@session.original_server_version=0*//*!*/;
/*!80014 SET @@session.immediate_server_version=0*//*!*/;
SET @@SESSION.GTID_NEXT= '0ec00eed-87f3-11eb-acd9-98af65266957:59'/*!*/;
# at 2188
#210321 13:30:08 server id 1 end_log_pos 2260 CRC32 0x1d525b11 Query thread_id=138 exec_time=0 error_code=0
SET TIMESTAMP=1616322608/*!*/;
BEGIN
/*!*/;
# at 2260
#210321 13:30:08 server id 1 end_log_pos 2307 CRC32 0xb57ecb73 Table_map: `test`.`bar` mapped to number 226
# at 2307
#210321 13:30:08 server id 1 end_log_pos 2387 CRC32 0x6770a7e2 Delete_rows: table id 226 flags: STMT_END_F
### DELETE FROM `test`.`bar`
### WHERE
### @1=1 /* INT meta=0 nullable=0 is_null=0 */
### @2=1 /* INT meta=0 nullable=1 is_null=0 */
### DELETE FROM `test`.`bar`
### WHERE
...

Note that decoded row event contains a DELETE command for each affected row.

We can also find which binary log this event belongs to by searching for the "Rotate to" event. In our case, “Rotate to master-bin.000005” happened after the found positions, so we only need the file master-bin.000004. In your case, you may need to skip events from the previous log files too.

So to restore the data we need to run mysqlbinlog one more time, this time with parameters:

mysqlbinlog  --start-position=1601 --exclude-gtids=0ec00eed-87f3-11eb-acd9-98af65266957:58-59 --database=test --skip-gtids=true master-bin.000004 master-bin.000005 > binlog_restore.sql

I removed the option -vvv because we are not going to examine this restore file, and the option --base64-output=decode-rows because we need the row events to be present in the resulting file. I also used the option --exclude-gtids=0ec00eed-87f3-11eb-acd9-98af65266957:58-59 to exclude the GTIDs that we do not want to re-apply. We also need --skip-gtids=true because otherwise the updates would be skipped, since such GTIDs already exist on the server.

Now binlog_restore.sql contains all updates to the database test made after the backup and before the DROP statement. Let’s restore it.

mysql test < binlog_restore.sql

The restore completed successfully. Our tables have all the past updates.

mysql> select * from foo;
+----+------+
| id | f1   |
+----+------+
|  1 |    2 |
|  2 |    4 |
|  3 |    6 |
|  4 |    8 |
|  5 |   10 |
+----+------+
5 rows in set (0.01 sec)

mysql> select count(*) from bar;
+----------+
| count(*) |
+----------+
|       10 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from baz;
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

Conclusion

You may save time on PITR if you use the per-database restore method. However, you need to take into account the following considerations:

  • mysqlbinlog does not support filtering per table; therefore, you either need to restore the full database or use the fake server method described in MySQL Point in Time Recovery the Right Way.
  • Per-database filters depend on the USE statement in the statement-based binary log format. Therefore, the option --database can only be considered safe with the row-based format.
  • If you do not use GTIDs, you can still use this method: combine the options --start-position and --stop-position to skip the unwanted events (a sketch follows below).
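As a sketch of that non-GTID variant, reusing the positions from this example (1601 from xtrabackup_binlog_info and 2007 where the DROP TABLE event starts): END_OF_DELETE_TX below is a placeholder for the position right after the unwanted DELETE transaction's COMMIT, which you would read from the decoded dump.

# everything from the backup position up to (but not including) the DROP TABLE
mysqlbinlog --start-position=1601 --stop-position=2007 --database=test master-bin.000004 > binlog_restore.sql

# everything after the unwanted transactions
mysqlbinlog --start-position=END_OF_DELETE_TX --database=test master-bin.000004 master-bin.000005 >> binlog_restore.sql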

Percona XtraBackup is a free, open source database backup solution for Percona Server for MySQL and MySQL.

Apr 18, 2018

Restore a MongoDB Logical Backup

MongoDB Logical Backup

In this article, we will explain how to restore a MongoDB logical backup performed via ‘mongodump’ to a mongod instance.

Restoring a MongoDB logical backup requires the ‘mongorestore’ tool. This article focuses on this tool and process.

Note: Percona develops a backup tool named Percona-Lab/mongodb-consistent-backup, which is a wrapper for ‘mongodump‘, adding cluster-wide backup consistency. The backups created by mongodb_consistent_backup (in Dump/Mongodump mode) can be restored using the exact same steps as a regular ‘mongodump’ backup – no special steps!

Mongorestore Command Flags

--host/--port (and --user/--password)

Required, even if you’re using the default host/port (localhost:27017). If authorization is enabled, add the --user/--password flags also.

--drop

This is almost always required. This causes ‘mongorestore’ to drop the collection that is being restored before restoring it. Without this flag, the documents from the backup are inserted one at a time, and if they already exist, the restore fails.

--oplogReplay

This is almost always required. Replays the oplog that was dumped by mongodump. It is best to include this flag on replset-based backups unless there is a specific reason not to. You can tell if the backup was from a replset by looking for the file ‘oplog.bson‘ at the base of the dump directory.

--dir

Required. The path to the mongodump data.

--gzip

Optional. For mongodump >= 3.2, enables inline compression on the restore. This is required if ‘mongodump’ used the --gzip flag (look for *.bson.gz files if you’re not sure; if the collection files have no .gz suffix, don’t use --gzip).

--numParallelCollections=<number>

Optional. For mongodump >= 3.2 only, sets the number of collections to insert in parallel. By default four threads are used, and if you have a large server and want to restore faster (at the cost of more resource usage), you can increase this number. Note that each thread decompresses BSON if the --gzip flag is used, so consider this when raising this number.

Steps

  1. (Optional) If the backup is archived (mongodb_consistent_backup defaults to creating tar archives), un-archive the backup so that ‘mongorestore‘ can access the .bson/.bson.gz files:
    $ tar -C /opt/mongodb/backup/testbackup/20160809_1306 -xvf /opt/mongodb/backup/testbackup/20160809_1306/test1.tar
    test1/
    test1/dump/
    test1/dump/wikipedia/
    test1/dump/wikipedia/pages.metadata.json.gz
    test1/dump/wikipedia/pages.bson.gz
    test1/dump/oplog.bson

    ** This command un-tars the backup to ‘/opt/mongodb/backup/testbackup/20160809_1306/test1/dump’ **

  2. Check (and then check again!) that you’re restoring the right backup to the right host. When in doubt, it is safer to ask the customer or others.
    1. The Percona ‘mongodb_consistent_backup‘ tool names backup subdirectories by replica set name, so you can ensure you’re restoring the right backup by checking the replica set name of the node you’re restoring to, if it exists.
    2. If you’re restoring to a replica set, you will need to restore to the PRIMARY member, and a majority of members must be available (so writes are accepted; there are some exceptions if you override the write concern, but that is not advised).
  3. Use ‘mongorestore’ to restore the data by dropping/restoring each collection (--drop flag) and replaying the oplog changes (--oplogReplay flag), specifying the restore dir explicitly (--dir flag) to the ‘mongorestore’ command. In this example I also used authorization (--user/--password flags) and decompression (--gzip flag):
    $ mongorestore --drop --host localhost --port 27017 --user secret --password secret --oplogReplay --gzip --dir /opt/mongodb/backup/testbackup/20160809_1306/test1/dump
    2016-08-09T14:23:04.057+0200    building a list of dbs and collections to restore from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump dir
    2016-08-09T14:23:04.065+0200    reading metadata for wikipedia.pages from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump/wikipedia/pages.metadata.json.gz
    2016-08-09T14:23:04.067+0200    restoring wikipedia.pages from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump/wikipedia/pages.bson.gz
    2016-08-09T14:23:07.058+0200    [#######.................]  wikipedia.pages  63.9 MB/199.0 MB  (32.1%)
    2016-08-09T14:23:10.058+0200    [###############.........]  wikipedia.pages  127.7 MB/199.0 MB  (64.1%)
    2016-08-09T14:23:13.060+0200    [###################.....]  wikipedia.pages  160.4 MB/199.0 MB  (80.6%)
    2016-08-09T14:23:16.059+0200    [#######################.]  wikipedia.pages  191.5 MB/199.0 MB  (96.2%)
    2016-08-09T14:23:19.071+0200    [########################]  wikipedia.pages  223.5 MB/199.0 MB  (112.3%)
    2016-08-09T14:23:22.062+0200    [########################]  wikipedia.pages  255.6 MB/199.0 MB  (128.4%)
    2016-08-09T14:23:25.067+0200    [########################]  wikipedia.pages  271.4 MB/199.0 MB  (136.4%)
    ...
    ...
    2016-08-09T14:24:19.058+0200    [########################]  wikipedia.pages  526.9 MB/199.0 MB  (264.7%)
    2016-08-09T14:24:22.058+0200    [########################]  wikipedia.pages  558.9 MB/199.0 MB  (280.8%)
    2016-08-09T14:24:23.521+0200    [########################]  wikipedia.pages  560.6 MB/199.0 MB  (281.6%)
    2016-08-09T14:24:23.522+0200    restoring indexes for collection wikipedia.pages from metadata
    2016-08-09T14:24:23.528+0200    finished restoring wikipedia.pages (32725 documents)
    2016-08-09T14:24:23.528+0200    replaying oplog
    2016-08-09T14:24:23.597+0200    done
    1. If you encounter problems with ‘mongorestore‘, carefully read the error message or rerun with several ‘-v‘ flags, e.g.: ‘-vvv‘. Once you have an error, attempt to troubleshoot the cause.
  4. Check to see that you saw “replaying oplog” and “done” after the restore (last two lines in the example). If you don’t see this, there is a problem.

As you can see, using this tool to restore a MongoDB logical backup is very simple. However, when using sharding, please note that --oplog is not available and mongodump uses the primaries for each shard. As this is typically not advised in production, you might consider looking at Percona-Lab/mongodb-consistent-backup, which ensures the backup is consistent and reads from secondary nodes, much as mongodump can with replica sets.

If MongoDB and topics like this interest you, please see the document below; we are hiring!

{
  hiring: true,
  role: "Consultant",
  tech: "MongoDB",
  location: "USA",
  moreInfo: "https://www.percona.com/about-percona/careers/mongodb-consultant-usa-based"
}

