Jun 15, 2022

Moving MongoDB Cluster to a Different Environment with Percona Backup for MongoDB

Percona Backup for MongoDB (PBM) is a distributed backup and restore tool for sharded and non-sharded clusters. In 1.8.0, we added the replset-remapping functionality that allows you to restore data on a new compatible cluster topology.

The new environment can have different replset names and/or serve on different hosts and ports. PBM handles this hard work for you, making such a migration indistinguishable from a usual restore. In this blog post, I’ll show you how to migrate to a new cluster in practice.

The Problem

Usually, changing a cluster topology requires lots of manual steps. PBM simplifies the process.

Let’s have a look at a case where we will have an initial cluster and a desired one.

Initial cluster:

configsrv: "configsrv/conf:27017"
shards:
  - "rs0/rs0:27017,rs1:27017,rs2:27017"
  - "extra-shard/extra:27018"

The cluster consists of the configsvr replset (the config server) with a single node and two shards: rs0 (a 3-node replset) and extra-shard (a single-node replset). The names, hosts, and ports are not conventional across the cluster, but we will resolve this.

Target cluster:

configsrv: "cfg/cfg0:27019"
shards:
  - "rs0/rs00:27018,rs01:27018,rs02:27018"
  - "rs1/rs10:27018,rs11:27018,rs12:27018"
  - "rs2/rs20:27018,rs21:27018,rs22:27018"

Here we have the cfg configsvr replset with a single node and three shards, rs0 through rs2, where each shard is a 3-node replset.

Think about how you can do this.

With PBM, all we need is a deployed cluster and a logical backup made with PBM 1.5.0 or later. The following simple command will do the rest:

pbm restore $BACKUP_NAME --replset-remapping "cfg=configsrv,rs1=extra-shard"

Migration in Action

Let me show you how it looks in practice. I’ll provide details at the end of the post. In the repo, you can find all configs, scripts, and output used here.

As mentioned above, we need a backup. For this, we will deploy a cluster, seed data, and then make the backup.

Deploying the initial cluster

$> initial/deploy >initial/deploy.out
$> docker compose -f "initial/compose.yaml" exec pbm-conf \
     pbm status -s cluster
 
Cluster:
========
configsvr:
  - configsvr/conf:27019: pbm-agent v1.8.0 OK
rs0:
  - rs0/rs00:27017: pbm-agent v1.8.0 OK
  - rs0/rs01:27017: pbm-agent v1.8.0 OK
  - rs0/rs02:27017: pbm-agent v1.8.0 OK
extra-shard:
  - extra-shard/extra:27018: pbm-agent v1.8.0 OK

links: initial/deploy, initial/deploy.out

The cluster is ready and we can add some data.

Seed data

We will insert the first 1000 numbers in a natural number sequence: 1 – 1000.

$> mongosh "mongo:27017/rsmap" --quiet --eval "
     for (let i = 1; i <= 1000; i++)
       db.coll.insertOne({ i })" >/dev/null

Getting the data state

These documents should be partitioned across all shards at insert time. Let’s see, in general, how. We will use the dbHash command on all shards to capture the collections’ state. It will be useful for verification later.

We will also do a quick check on shards and mongos.

$> initial/dbhash >initial/dbhash.out && cat initial/dbhash.out
 
# rs00:27017  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs01:27017  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs02:27017  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# extra:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs00:27017  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 520, false ]
# extra:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 480, false ]
# mongo:27017
[ 1000, true ]

links: initial/dbhash, initial/dbhash.out

All rs0 members have the same data, so the secondaries replicate from the primary correctly.

The quickcheck.js used in the initial/dbhash script describes our documents. It returns the number of documents and whether these documents make the natural number sequence.
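
For reference, here is the same quick check run as a standalone command against mongos; the snippet is the one visible in the output above:

mongosh "mongo:27017/rsmap" --quiet --eval '
  // Prints [number of documents, whether they form the natural sequence 1..N]
  printjson(db.coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2))'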

We have data for the backup. Time to make the backup.

Making a backup

$> docker compose -f initial/compose.yaml exec pbm-conf bash
pbm-conf> pbm backup --wait
 
Starting backup '2022-06-15T08:18:44Z'....
Waiting for '2022-06-15T08:18:44Z' backup.......... done
 
pbm-conf> pbm status -s backups
 
Backups:
========
FS  /data/pbm
  Snapshots:
    2022-06-15T08:18:44Z 28.23KB <logical> [complete: 2022-06-15T08:18:49Z]

We have a backup. It’s enough for migration to the new cluster.

Let’s destroy the initial cluster and deploy the target environment. (Destroying the initial cluster is not a requirement. I just don’t want to waste resources on it.)

Deploying the target cluster

pbm-conf> exit
$> docker compose -f initial/compose.yaml down -v >/dev/null
$> target/deploy >target/deploy.out

links: target/deploy, target/deploy.out

Let’s check the PBM status.

PBM Status

$> docker compose -f target/compose.yaml exec pbm-cfg0 bash
pbm-cfg0> pbm config --force-resync  # ensure agents sync from storage
 
Storage resync started
 
pbm-cfg0> pbm status -s backups
 
Backups:
========
FS  /data/pbm
  Snapshots:
    2022-06-15T08:18:44Z 28.23KB <logical> [incompatible: Backup doesn't match current cluster topology - it has different replica set names. Extra shards in the backup will cause this, for a simple example. The extra/unknown replica set names found in the backup are: extra-shard, configsvr. Backup has no data for the config server or sole replicaset] [2022-06-15T08:18:49Z]

As expected, it is incompatible with the new deployment.

Let’s see how to make it work.

Resolving PBM Status

pbm-cfg0> export PBM_REPLSET_REMAPPING="cfg=configsvr,rs1=extra-shard"
pbm-cfg0> pbm status -s backups
 
Backups:
========
FS  /data/pbm
  Snapshots:
    2022-06-15T08:18:44Z 28.23KB <logical> [complete: 2022-06-15T08:18:49Z]

Nice. Now we can restore.

Restoring

pbm-cfg0> pbm restore '2022-06-15T08:18:44Z' --wait
 
Starting restore from '2022-06-15T08:18:44Z'....Started logical restore.
Waiting to finish.....Restore successfully finished!

The --wait flag blocks the shell session till the restore completes. You can also skip the flag and check the status later:

pbm-cfg0> pbm list --restore
 
Restores history:
  2022-06-15T08:18:44Z

Everything is going well so far. Almost done.

Let’s verify the data.

Data verification

pbm-cfg0> exit
$> target/dbhash >target/dbhash.out && cat target/dbhash.out
 
# rs00:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs01:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs02:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs10:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs11:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs12:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs20:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ }
# rs21:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ }
# rs22:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ }
# rs00:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 520, false ]
# rs10:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 480, false ]
# rs20:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ ]
# mongo:27017
[ 1000, true ]

links: target/dbhash, target/dbhash.out

As you can see, the rs2 shard is empty. The other two have the same dbHash and quickcheck results as in the initial cluster. The balancer can tell us something about this.

Balancer status

$> mongosh "mongo:27017" --quiet --eval "sh.balancerCollectionStatus('rsmap.coll')"
 
{
  balancerCompliant: false,
  firstComplianceViolation: 'chunksImbalance',
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1655281436, i: 1 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1655281436, i: 1 })
}

We know what to do. Let’s start the balancer and check the status again.

$> mongosh "mongo:27017" --quiet --eval "sh.startBalancer().ok"

1
 
$> mongosh "mongo:27017" --quiet --eval "sh.balancerCollectionStatus('rsmap.coll')"
 
{
  balancerCompliant: true,
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1655281457, i: 1 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1655281457, i: 1 })
}
 
$> target/dbhash >target/dbhash-2.out && cat target/dbhash-2.out

# rs00:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs01:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs02:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs10:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs11:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs12:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs20:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "6a54e10a5526e0efea0d58b5e2fbd7c5" }
# rs21:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "6a54e10a5526e0efea0d58b5e2fbd7c5" }
# rs22:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "6a54e10a5526e0efea0d58b5e2fbd7c5" }
# rs00:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 520, false ]
# rs10:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 480, false ]
# rs20:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 229, false ]
# mongo:27017
[ 1000, true ]

links: target/dbhash-2.out

Interesting. The rs2 shard has some data now. However, rs0 and rs1 haven’t changed. It’s expected that mongos moves some chunks to rs2 and updates the router config; the physical deletion of moved chunks on a shard is a separate step. That’s why querying data directly on a shard is inaccurate: the data could disappear at any time, and the cursor returns all documents available in a replset at that moment, regardless of the router config.

Anyway, we shouldn’t care about it anymore. It is the mongos/mongod responsibility now to update the router config, query the right shards, and remove moved chunks from shards on demand. In the end, we have valid data through mongos.

That’s it.

But wait, we didn’t make a backup! Never forget to make another solid backup.

Making a new backup

It is better to change the storage so that backups for the new deployment are kept in a different place, and we won’t keep seeing errors about incompatible backups from the initial cluster.
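
For illustration, $NEW_PBM_CONFIG could point to a minimal storage config like the sketch below; the file name and the filesystem path are hypothetical, and any storage type supported by PBM (S3, Azure, filesystem) would work the same way:

# Hypothetical config file for a storage location dedicated to the new cluster.
cat >/tmp/pbm-new-storage.yaml <<'EOF'
storage:
  type: filesystem
  filesystem:
    path: /data/pbm-new
EOF
export NEW_PBM_CONFIG=/tmp/pbm-new-storage.yaml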

$> pbm config --file "$NEW_PBM_CONFIG" >/dev/null
$> pbm config --force-resync >/dev/null
$> pbm backup -w >/dev/null
pbm-cfg0> pbm status -s backups
 
Backups:
========
FS  /data/pbm
  Snapshots:
    2022-06-15T08:25:44Z 165.34KB <logical> [complete: 2022-06-15T08:25:49Z]

Now we’re done. And can sleep better.

One More Thing: Possible Misconfiguration

Let’s review another imaginary case to explain all possible errors.

Initial cluster: cfg, rs0, rs1, rs2, rs3, rs4, rs5

Target cluster: cfg, rs0, rs1, rs2, rs3, rs4, rs6

If we apply the remapping rs0=rs0,rs1=rs2,rs2=rs1,rs3=rs4, we will get an error like “missed replsets: rs3, rs5”, and nothing about rs6.

The missed rs5 should be obvious: the backup topology has an rs5 replset, but it is missing on the target. And the target rs6 does not have data to restore from. Adding rs6=rs5 fixes this.

But the missed rs3 could be confusing. Let’s visualize:

init | curr
-----+-----
cfg     cfg  # unchanged
rs0 --> rs0  # mapped. unchanged
rs1 --> rs2
rs2 --> rs1
rs3 -->      # err: no shard
rs4 --> rs3
     -> rs4  # ok: no data
rs5 -->      # err: no shard
     -> rs6  # ok: no data

When we remap the backup from rs4 to rs3, the target rs3 is reserved. The rs3 in the backup does not have a target replset now. Remapping rs3 to the available rs4 fixes it too.

This reservation avoids data duplication. That’s why we use the quick check via mongos.

Details

Compatible topology

Simply speaking, a compatible topology means the target deployment has an equal or larger number of shards. In our example, we had 2 shards initially but restored to 3 shards. PBM restored data to two shards only; MongoDB can distribute it across the remaining shards later when the balancer is enabled (sh.startBalancer()). The number of replset members does not matter because PBM takes a backup from one member per replset and restores it to the primary only. Other data-bearing members replicate the data from the primary. So you could make a backup from a multi-member replset and then restore it to a single-member replset.

You cannot restore to a different replset type like from shardsvr to configsvr.

Preconfigured environment

The cluster should be deployed with all shards added. Users and permissions should be created and assigned in advance. PBM agents should be configured to use the same storage, and that storage must be accessible from the new cluster.

Note: PBM agents store backup metadata on the storage and keep a cache in MongoDB. pbm config --force-resync lets you refresh the cache from the storage. Do it on a new cluster right after deployment to see backups/oplog chunks made from the initial cluster.

Understanding replset remapping

You can remap replset names with the --replset-remapping flag or the PBM_REPLSET_REMAPPING environment variable. If both are set, the flag takes precedence.

For full restore, point-in-time recovery, and oplog replay, the PBM CLI sends the mapping as a parameter in the command. Each command gets a separate explicit mapping (or none). It can be done only via the CLI: agents neither use the environment variable nor have the flag.
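
For example, these two invocations are equivalent; the mapping values are the ones from the walkthrough above:

# Per-command flag (takes precedence if the environment variable is also set):
pbm status -s backups --replset-remapping "cfg=configsvr,rs1=extra-shard"

# Or via the environment variable, picked up by subsequent CLI calls:
export PBM_REPLSET_REMAPPING="cfg=configsvr,rs1=extra-shard"
pbm status -s backups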

pbm status and pbm list use the flag/envvar to remap replsets in backup/oplog metadata and apply this mapping to the current deployment to display them properly. If backup and present replset names do not match, pbm list will not show these backups, and pbm status prints an error with the missed replset names.

Restoring with remapping works with logical backups only.

How does PBM do this?

During restore, PBM reviews the current topology and assigns members’ snapshots and oplog chunks to each shard/replset by name. The remapping changes this default assignment.

After the restore is done, PBM agents sync the router config to make the restored data “native” to this cluster.

Behind the scenes

The config.shards collection describes the current topology. PBM uses it to know where and what to restore. PBM does not modify this collection. But the restored data contains other router configuration entries for the initial topology.

PBM updates two collections to replace the old shard names with the new ones in the restored data:

  • config.databases – primary shard for non-sharded databases
  • config.chunks – shards where chunks are

After this, MongoDB knows where databases, collections, and chunks are in the new cluster.
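
As a quick read-only sanity check after such a restore, you can inspect the router config through mongos; the host name follows the example above:

mongosh "mongo:27017" --quiet --eval '
  const cfg = db.getSiblingDB("config");
  // Shard names known to the router, database primaries, and chunk owners.
  print("shards:       " + cfg.shards.find().toArray().map(s => s._id).join(", "));
  print("db primaries: " + cfg.databases.find().toArray().map(d => d._id + "->" + d.primary).join(", "));
  print("chunk owners: " + cfg.chunks.distinct("shard").join(", "));
'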

Conclusion

Migration of a cluster requires much attention, knowledge, and calm. The replset-remapping functionality in Percona Backup for MongoDB reduces complexity during migration between two different environments. I would say it is close to a routine job now.

Have a nice day!

May 27, 2022

Physical Backup Support in Percona Backup for MongoDB

Percona Backup for MongoDB (PBM) is a backup utility custom-built by Percona to help solve the needs of customers and users who don’t want to pay for proprietary software like MongoDB Enterprise and Ops Manager but want a fully-supported open-source backup tool that can perform cluster-wide consistent backups in MongoDB.

Version 1.7.0 of PBM was released in April 2022 and comes with a technical preview of physical backup functionality. This functionality enables users to benefit from the reduced recovery time. 

With logical backups, you extract the data from the database and store it in some binary or text format, and for recovery, you write all this data back to the database. For huge data sets, this is a time-consuming operation and might take hours or even days. Physical backups take all the files that belong to the database from the disk itself, and recovery is just putting these files back. Such a recovery is much faster as it does not depend on database performance at all.

In this blog post, you will learn about the architecture of the physical backups feature in PBM, see how fast it is compared to logical backups, and try it out yourself.

Tech Peek

Architecture Review

In general, physical backup means a copy of a database’s physical files. In the case of Percona Server for MongoDB (PSMDB), these are the WiredTiger *.wt B-tree, config, metadata, and journal files you can usually find in /data/db. The trick is to copy all those files without stopping the cluster and interrupting running operations, and to be sure that the data in the files is consistent and no data will be changed during copying. Another challenge is to achieve consistency in a sharded cluster; in other words, how to be sure that we will be able to restore data to the same cluster time across all shards.

PBM’s physical backups are based on the backupCursors feature of PSMDB. This implies that to use this feature, you should use Percona Server for MongoDB.

Backup

On each replica set, PBM uses $backupCursor to retrieve a list of files that need to be copied to achieve the backup. Having that list, the next step is to ensure cluster-wide consistency. For that, each replica set posts the cluster time of the latest observed operation. The backup leader picks the most recent one. This will be the common backup timestamp (recovery timestamp), saved as last_write_ts in the backup metadata. After agreeing on the backup time, the pbm-agent on each replica set opens a $backupCursorExtend. The cursor will return its result only after the node reaches the given timestamp. Thus the returned list of logs (journals) will contain the “common backup timestamp”. At that point, we have a list of all files that have to be in the backup. So each node copies them to the storage, saves the metadata, closes the cursors, and calls it a backup. Here is a blog post explaining Backup Cursors in great detail.
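
For illustration, here is roughly what opening a backup cursor looks like from mongosh on a PSMDB node. This is only a sketch (the host is hypothetical and the exact document shape varies by server version); PBM drives the cursors itself, so you never need to do this for a PBM backup:

mongosh "mongodb://localhost:27017" --quiet --eval '
  // $backupCursor is a collectionless aggregation stage provided by PSMDB.
  // The first batch contains a metadata document followed by the files to copy.
  const res = db.getSiblingDB("admin").runCommand({
    aggregate: 1,
    pipeline: [{ $backupCursor: {} }],
    cursor: { batchSize: 10 }
  });
  printjson(res.ok ? res.cursor.firstBatch.slice(0, 3) : res);
  // In a real backup, the cursor (res.cursor.id) must be kept alive with getMore
  // while the files are being copied, and closed afterward.
'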

Of course, PBM does a lot more behind the scenes starting from electing appropriate nodes for the backup, coordinating operations across the cluster, logging, error handling, and many more. But all these subjects are for other posts.

Backup’s Recovery Timestamp

Restoring any backup, PBM returns the cluster to some particular point in time. Here we’re talking about time not in terms of wall time, but of MongoDB’s cluster-wide logical clock, so that point in time is consistent across all nodes and replica sets in a cluster. For both logical and physical backups, that time is reflected in the complete section of the pbm list and pbm status outputs. E.g.:

    2022-04-19T15:36:14Z 22.29GB <physical> [complete: 2022-04-19T15:36:16]
    2022-04-19T14:48:40Z 10.03GB <logical> [complete: 2022-04-19T14:58:38]

This time is not the time when a backup finished, but the time at which the cluster state was captured (hence the time the cluster will be returned to after the restore). In PBM’s logical backups, the recovery timestamp tends to be closer to the backup finish: to define it, PBM has to wait until the snapshot finishes on all replica sets, and then it captures the oplog from the backup start up to that time. With physical backups, PBM picks a recovery timestamp right after the backup starts. Holding the backup cursor open guarantees the checkpoint data won’t change during the backup, so PBM can define the complete-time right away.

Restore

There are a few considerations for restoration.

First of all, files in the backup may contain operations beyond the target time (commonBackupTimestamp). To deal with that, PBM uses a special function of the replication subsystem’s startup process to set the limit of the oplog being restored. It’s done by setting the oplogTruncateAfterPoint value in the local DB’s replset.oplogTruncateAfterPoint collection.

Along with the oplogTruncateAfterPoint, the database needs some other changes and clean-up before starting. This requires a series of restarts of PSMDB in standalone mode.
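
For reference, that value lives in an ordinary collection on each node and can be inspected read-only; it is managed by the server and PBM, so don’t edit it by hand (the host below is hypothetical):

mongosh "mongodb://localhost:27017" --quiet --eval '
  // The collection name contains a dot, so use getCollection().
  printjson(db.getSiblingDB("local")
              .getCollection("replset.oplogTruncateAfterPoint")
              .findOne());
'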

This, in turn, brings some hassle to the PBM operation. To communicate and coordinate its work across all agents, PBM relies on PSMDB itself. But once a cluster is taken down, PBM has to switch to communication via the storage. Also, during standalone runs, PBM is unable to store its logs in the database. Hence, at some point during the restore, pbm-agent logs are available only in the agents’ stderr, and pbm logs won’t have access to them. We’re planning to solve this problem by the physical backups GA.

Also, we had to decide on the restore strategy within a replica set. One way is to restore one node, then delete all data on the rest and let PSMDB replication do the job. Although it’s a bit easier, it means the cluster will be of little use until the initial sync finishes. Besides, logical replication at this stage almost negates all the speed benefits (more on that later) that the physical restore brings to the table. So we went with restoring each node in a replica set, and making sure that after the cluster starts, no node spots any difference and starts a resync.

As with PBM’s logical backups, the physical ones can currently be restored only to a cluster with the same topology, meaning the replica set names in the backup and the target cluster should match. (This won’t be an issue for logical backups starting from the next PBM version, and later this feature will be extended to physical backups as well.) Along with that, the number of replica sets in the cluster can be greater than in the backup, but not vice versa, meaning all data in the backup must be restored.

Performance Review

We used the following setup:

  • Cluster: 3-node replica set. Each mongod+pbm-agent on Digital Ocean droplet: 16GB, 8vCPU (CPU optimized).
  • Storage: nyc3.digitaloceanspaces.com
  • Data: randomly generated, ~1MB documents

In general, a logical backup should be more beneficial on small databases (a few hundred megabytes), since at such a scale the extra overhead that physical files bring on top of the data still makes a difference. Basically, reading/writing only user data during a logical backup means less data needs to be transferred over the network. But as the database grows, the overhead of logical reads (selects) and especially writes (inserts) becomes a bottleneck for logical backups. As for physical backups, the speed is almost always bounded only by the network bandwidth to/from the remote storage. In our tests, restoration time from physical backups grows linearly with the dataset size, whereas logical restoration time grows non-linearly: the more data you have, the longer it takes to replay all the data and rebuild the indexes. For example, for a 600GB dataset, physical restore took 5x less time compared to logical.

But on a small DB, the difference is negligible, a couple of minutes, so the main benefit of logical backups lies beyond performance: it’s flexibility. Logical backups allow partial backup/restore of the database (on the roadmap for PBM); you can choose particular databases and/or collections to work with. As physical backups work directly with the database storage-engine files, they operate in an all-or-nothing frame.

Hands-on

PBM Configuration

In order to start using PBM with PSMDB or MongoDB, install all the necessary packages according to the installation instructions. Please note that starting from version 1.7.0, the user running the pbm-agent process should also have read/write access to the PSMDB data directory for the purpose of performing operations with datafiles during physical backup or restore.

Considering the design, starting from 1.7.0 the default user for pbm-agent has changed from pbm to mongod. So unless PSMDB runs under a user other than mongod, no extra actions are required. Otherwise, please carefully re-check your configuration and provide the necessary permissions to ensure proper PBM functioning.
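
A quick way to verify this on a node is to compare the user running pbm-agent (and mongod) with the owner of the data directory; the dbPath below is an assumption, so use the storage.dbPath from your own mongod configuration:

# Which users run the agent and the server?
ps -o user=,comm= -C pbm-agent,mongod
# Who owns the data directory? (assumed dbPath)
ls -ld /var/lib/mongo
# If they don't match, grant the agent's user access, for example:
# sudo chown -R mongod:mongod /var/lib/mongo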

In addition, keep in mind that for using PBM physical backups, you should run Percona Server for MongoDB starting from versions 4.2.15-16 and 4.4.6-8 and higher – this is where hotBackups and backup cursors were introduced.

Creating a Backup

With the new PBM version, you can specify what type of backup you wish to make: physical or logical. By default when no type is selected, PBM makes a logical backup.

> pbm backup
Starting backup '2022-04-20T11:12:53Z'....
Backup '2022-04-20T11:12:53Z' to remote store 's3://https://storage.googleapis.com/pbm-bucket' has started

> pbm backup -t physical
Starting backup '2022-04-20T12:34:06Z'....
Backup '2022-04-20T12:34:06Z' to remote store 's3://https://storage.googleapis.com/pbm-bucket' has started

> pbm status -s cluster -s backups
Cluster:
========
rs0:
  - rs0/mongo1.perconatest.com:27017: pbm-agent v1.7.0 OK
  - rs0/mongo2.perconatest.com:27017: pbm-agent v1.7.0 OK
  - rs0/mongo3.perconatest.com:27017: pbm-agent v1.7.0 OK
Backups:
========
S3 us-east-1 s3://https://storage.googleapis.com/pbm-bucket
  Snapshots:
    2022-04-20T12:34:06Z 797.38KB <physical> [complete: 2022-04-20T12:34:09]
    2022-04-20T11:12:53Z 13.66KB <logical> [complete: 2022-04-20T11:12:58]

Point-in-Time Recovery

Point-in-Time Recovery is currently supported only for logical backups. It means that a logical backup snapshot is required for pbm-agent to start periodically saving consecutive slices of the oplog. You can still make a physical backup while PITR is enabled; it won’t break or change the oplog saving process.

The restoration process to the specific point in time will also use a respective logical backup snapshot and oplog slices which will be replayed on top of the backup.
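
For context, PITR itself is toggled through the PBM configuration; assuming the pbm config --set syntax available in recent PBM versions, enabling it looks like this:

# Start periodic oplog slicing; a logical base snapshot must already exist
# (or be scheduled) for slicing to begin.
pbm config --set pitr.enabled=true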

Checking the Logs

During physical backup, PBM logs are available via the pbm logs command, as well as for all other operations.

> pbm logs -e backup/2022-04-20T12:34:06Z
2022-04-20T12:34:07Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] backup started
2022-04-20T12:34:12Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] uploading files
2022-04-20T12:34:54Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] uploading done
2022-04-20T12:34:56Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] backup finished

As for restore, the pbm logs command doesn’t provide information about restores from a physical backup. This is caused by peculiarities of the restore procedure and will be improved in upcoming PBM versions. However, pbm-agent still saves logs locally, so it’s possible to check information about the restore process on each node:

> sudo journalctl -u pbm-agent.service | grep restore
pbm-agent[12560]: 2022-04-20T19:37:56.000+0000 I [restore/2022-04-20T12:34:06Z] restore started
.......
pbm-agent[12560]: 2022-04-20T19:38:22.000+0000 I [restore/2022-04-20T12:34:06Z] copying backup data
.......
pbm-agent[12560]: 2022-04-20T19:38:39.000+0000 I [restore/2022-04-20T12:34:06Z] preparing data
.......
pbm-agent[12560]: 2022-04-20T19:39:12.000+0000 I [restore/2022-04-20T12:34:06Z] restore finished <nil>
pbm-agent[12560]: 2022-04-20T19:39:12.000+0000 I [restore/2022-04-20T12:34:06Z] restore finished successfully

Restoring from a Backup

The restore process from a physical backup is similar to a logical one but requires several extra steps after the restore is finished by PBM.

> pbm restore 2022-04-20T12:34:06Z
Starting restore from '2022-04-20T12:34:06Z'.....Restore of the snapshot from '2022-04-20T12:34:06Z' has started. Leader: mongo1.perconatest.com:27017/rs0

After starting the restore process, the pbm CLI returns the leader node ID, so it’s possible to track the restore progress by checking the logs of the pbm-agent leader. In addition, the status is written to a metadata file created on the remote storage. The status file is created in the root of the storage path and has the format .pbm.restore/<restore_timestamp>.json. As an option, it’s also possible to pass the -w flag during restore, which will block the current shell session and wait for the restore to finish:

> pbm restore 2022-04-20T12:34:06Z -w
Starting restore from '2022-04-20T12:34:06Z'....Started physical restore. Leader: mongo2.perconatest.com:27017/rs0
Waiting to finish...........................Restore successfully finished!

After the restore is complete, it’s required to perform the following steps:

  • Restart all mongod (and mongos, if present) nodes
  • Restart all pbm-agents
  • Run the following command to resync the backup list with the storage:

    $ pbm config --force-resync

Conclusion

MongoDB allows users to store enormous amounts of data. Especially if we talk about sharded clusters, where users are not limited by a single storage volume size limit. Database administrators often have to implement various home-grown solutions to ensure timely backups and restores of such big clusters. The usual approach is a storage-level snapshot. Such solutions do not guarantee data consistency and provide false confidence that data is safe.

Percona Backup for MongoDB with physical backup and restore capabilities enables users to back up and restore data fast and, at the same time, comes with data-consistency guarantees.

Physical Backup functionality is in the Technical Preview stage. We encourage you to read more about it in our documentation and try it out. In case you face any issues feel free to contact us on the forum or raise the JIRA issue.

May 19, 2021

Refreshing Test/Dev Environments With Prod Data Using Percona Backup for MongoDB

This is a very straightforward article written with the intention to show you how easy it is to refresh your Test/Dev environments with PROD data using Percona Backup for MongoDB (PBM). This article will cover all the steps from the PBM configuration to the restore, assuming that the PBM agents are all up and running on all the replica set members of both the PROD and Dev/Test servers.

Taking the Backup on PROD

This step is quite simple and it demands no more than two commands:

1. Configuring the Backup

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:40001/?replSetName=rbprepPROD?authSource=admin'
$ pbm config --file /etc/pbm/pbm-s3.yaml
[Config set]
------
pitr:
  enabled: false
storage:
  type: s3
  s3:
    provider: aws
    region: us-west-1
    bucket: rafapbmtest
    prefix: bpPROD
    credentials:
      access-key-id: '***'
      secret-access-key: '***'

Backup list resync from the store has started

An important note on two things: I am storing my backups in an S3 bucket, and I am defining a prefix. When defining a prefix in the PBM storage configuration, a subdirectory is created automatically and the backup files are stored in that subdirectory instead of the root of the S3 bucket.

2. Taking the Backup

Having the PBM properly configured, it is time to take the backup. (You can skip this step if you already have PBM backups to use, of course.)

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:40001/?replSetName=rbprepPROD?authSource=admin'

$ pbm backup
Starting backup '2021-05-08T08:34:47Z'...................
Backup '2021-05-08T08:34:47Z' to remote store 's3://rafapbmtest/bpPROD' has started

If we run the pbm status command, we will see the snapshot running; once it is complete, pbm status will show it as completed, like below:

$ pbm status

Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:40001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpPROD
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]

Configuring the PBM Space on a DEV/TEST Environment

All right, now my PROD has a proper backup routine configured. I will move one step forward and configure my PBM space but this time in a Dev/Test environment – named here as DEV.

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:50001/?replSetName=rbprepDEV?authSource=admin'

$ pbm config --file /etc/pbm/pbm-s3.yaml 
[Config set]
------
pitr:
  enabled: false
storage:
  type: s3
  s3:
    provider: aws
    region: us-west-1
    bucket: rafapbmtest
    prefix: bpDEV
    credentials:
      access-key-id: '***'
      secret-access-key: '***'

The backup list resync from the store has started.

Note that the S3 bucket is exactly the same one where PROD stores its backups, but with a different prefix. If I run a status command, I will see it is configured but with no snapshots available yet:

$ pbm status

Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:50001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpDEV
(none)

Lastly, note that the replica set name is exactly the same as PROD’s. If this were a sharded cluster rather than a non-sharded replica set, all the replica set names would have to match in the target cluster. PBM is guided by the replica set name, and if my DEV env had a different one, it would not be possible to load backup metadata from PROD to DEV.

Transferring the Desired Backup Files

The next step will be transferring the backup files from the PROD prefix to the target prefix. I will use the AWS CLI to achieve that, but there is one important thing to keep in mind first: determining which files belong to a certain backup set (snapshot). Let’s go back to the pbm status output taken on PROD previously:

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:40001/?replSetName=rbprepPROD?authSource=admin'

$ pbm status
Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:40001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpPROD
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]

The PBM snapshots are named with the timestamp of when the backup started. If we check the S3 prefix where it is stored, we will see that the file names contain that timestamp:

$ aws s3 ls s3://rafapbmtest/bpPROD/
2021-05-08 10:26:11          5 .pbm.init
2021-05-08 10:35:14       1428 2021-05-08T08:34:47Z.pbm.json
2021-05-08 10:35:10      11606 2021-05-08T08:34:47Z_bprepPROD.dump.s2
2021-05-08 10:35:13        949 2021-05-08T08:34:47Z_bprepPROD.oplog.s2

So, it is easy now to know which files I have to copy.
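
As an alternative to copying the files one by one (shown below), you can copy everything that belongs to a snapshot in a single command by filtering on the timestamp prefix:

aws s3 cp "s3://rafapbmtest/bpPROD/" "s3://rafapbmtest/bpDEV/" \
  --recursive --exclude "*" --include "2021-05-08T08:34:47Z*"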

$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z.pbm.json' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z.pbm.json to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z.pbm.json

$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.dump.s2' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.dump.s2 to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z_bprepPROD.dump.s2

$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.oplog.s2' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.oplog.s2 to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z_bprepPROD.oplog.s2

Checking the DEV prefix:

$ aws s3 ls s3://rafapbmtest/bpDEV/
2021-05-08 10:43:59          5 .pbm.init
2021-05-08 10:52:02       1428 2021-05-08T08:34:47Z.pbm.json
2021-05-08 10:52:13      11606 2021-05-08T08:34:47Z_bprepPROD.dump.s2
2021-05-08 10:52:24        949 2021-05-08T08:34:47Z_bprepPROD.oplog.s2

The files are already there and PBM has already automatically loaded their metadata into the DEV PBM collections:

$ pbm status

Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:50001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpDEV
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]

Finally – Restoring It

Believe it or not, now comes the easiest part: the restore. It is only one command and nothing else:

$ pbm restore '2021-05-08T08:34:47Z'
....Restore of the snapshot from '2021-05-08T08:34:47Z' has started

Refreshing Dev/Test environments with PROD data is a very common and required task in corporations worldwide. I hope this article helps to clarify the practical questions regarding using PBM for it!

May 13, 2021

Percona Backup for MongoDB v1.5 Released

Percona Backup for MongoDB (PBM) has reached a new step with the release of version 1.5.0 today, May 13th, 2021.

Azure Blob Storage Support

Now you can use Azure Blob Storage as well as S3-compatible object stores.

Configuration example:

   storage:
     type: azure
     azure:
       account: <string>
       container: <string>
       prefix: <string>
       credentials:
         key: <your-access-key>

Preference Weight for Backup Source Nodes

Until now, PBM would use a secondary as a backup source if there was one with a pbm-agent on it; otherwise, as a fallback, it would use the primary.

There are, however, plenty of users who would like a certain mongod node, for example, those in a datacenter closer to the backup storage, to be the preferred mongod nodes to copy the data from. This is only a preference – if the user-preferred node is down then another one will be chosen.

Setting the priority section is entirely optional. If you don’t specify any preferences PBM will choose this way by default: Hidden secondaries are top preference (PBM-494), normal secondaries are next, primaries are last.

Configuration example of manually-chosen priorities:

   backup:
     priority:
       "mdb-c1.bc.net:28019": 2.5
       "mdb-s1n1.ad.net:27018": 2.5
       "mdb-s1n2.bc.net:27020": 2.0
       "mdb-s1n3.bc.net:27017": 0.1

The default preference weight is 1.0, so other nodes not explicitly listed above will have a priority above that of the “mdb-s1n3.bc.net:27017” example node above.

You cannot set the priority to a value <= 0 as a way to ban a node. A banned mongod node might be the last healthy one at a given moment and the backup would fail to start, so a design decision was made not to allow banning nodes.

Important note: as soon as you begin specifying any node preference it is assumed you are taking full manual control. At this point the default rules, eg. to prefer secondaries to primaries, stop working.

Users and Roles Backup Method Change

Important notice for database administrators: The backup file format for v1.5 has an incompatible difference with v1.4 (or earlier) backups. v1.5 will not be able to restore backups of v1.4.x or earlier.

Restoring users and roles has some constraints. To prevent a collection drop of system.users or system.roles disrupting the pbm-agent <-> mongod connection, they are not re-inserted under their original collection name. Instead, they are inserted into a temporary location and the user and role records are copied one by one.

A catch in the point at which the collections were previously renamed to temporary names interfered with a requirement regarding restoring collection UUIDs, which in turn blocked a fix for bug PBM-646.

Because PBM uses a single archive file for each full snapshot backup, there is no way to fix the embedded collection names in the backups for v1.4 before the restore process begins. This means there is no workaround that will allow v1.5 PBM to restore <= v1.4.x PBM backups.

The only workaround to restore <= 1.4.x backups after deploying v1.5 would be to roll back PBM executables (pbm and pbm-agent) to v1.4.1 just for the restore.

Bug Fixes

  • PBM-636: Restores of sharded collections needed to be fixed by setting the same UUID value. Thanks to Nikolay and Dmitry for reporting the issue.
  • PBM-646 – Stop the balancer during backup to make sure it doesn’t start running during restore.
  • PBM-642 – Display priority=0 members on agent list in PBM Status output.

May 10, 2021

Point-in-Time Recovery for MongoDB on Kubernetes

Running MongoDB in Kubernetes with Percona Operator is not only simple but by design also provides a highly available MongoDB cluster suitable for mission-critical applications. In the latest 1.8.0 version, we added a feature that implements point-in-time recovery (PITR). It allows users to store the oplog on an external S3 bucket and to perform recovery to a specific date and time when needed. The main value of this approach is a significantly lower Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

In this blog post, we will look into how we delivered this feature and review some architectural decisions.

Internals

For full backups and the PITR feature, the Operator relies on Percona Backup for MongoDB (PBM), which by design supports storing operations logs (oplogs) on S3-compatible storage. We run PBM as a sidecar container in replica set Pods, including Config Server Pods. So each replica set Pod has two containers from the very beginning: Percona Server for MongoDB (PSMDB) and Percona Backup for MongoDB.

While PBM is a great tool, it comes with some limitations that we needed to keep in mind when implementing the PITR feature.

One Bucket

If PITR is enabled, PBM stores backups on S3 storage in a chained mode: Oplogs are stored right after the full backup and require it. PBM stores metadata about the backups in the MongoDB cluster itself and creates a copy on S3 to maintain the full visibility of the state of backups and operation logs.

When a user wants to recover to a specific date and time, PBM figures out which full backup to use, recovers from it, and applies the oplogs.

If the user decides to use multiple S3 buckets to store backups, it means that oplogs are also scattered across these buckets. This complicates the recovery process because PBM only knows about the last S3 bucket used to store the full backup.

To simplify things and to avoid these split-brain situations with multiple buckets we made the following design decisions:

  • Do not enable the PITR feature if the user specifies multiple buckets in the backup.storages section. This should cover most of the cases. We throw an error if the user tries that:

"msg":"Point-in-time recovery can be enabled only if one bucket is used in spec.backup.storages"

  • There are still cases where users can get into a situation with multiple buckets (e.g., disabling PITR and enabling it again with another bucket).
    • That is why, to recover from a backup, we ask the user to specify the backupName (the psmdb-backup Custom Resource name) in the recover.yaml manifest (a sketch of such a manifest follows this list). From this CR we get the storage, and PBM fetches the oplogs that follow the full backup.
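
For illustration only, a restore manifest of that kind might look roughly like the sketch below; the apiVersion and field names differ between Operator versions, so check deploy/backup/restore.yaml in the Operator repository for the exact schema:

# Sketch of a hypothetical recover.yaml; all names are placeholders.
cat <<'EOF' | kubectl apply -f -
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore1
spec:
  clusterName: my-cluster-name   # the PerconaServerMongoDB cluster to restore into
  backupName: backup2            # the psmdb-backup Custom Resource to restore from
EOF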

The obvious question is: why can’t the Operator handle the logic and somehow store metadata from multiple buckets?

There are several answers here:

  1. Bucket configurations can change during a cluster’s lifecycle and keeping all this data is possible, but the data may become obsolete over time. Also, our Operator is stateless and we want to keep it that way.
  2. We don’t want to bring this complexity into the Operator and are assessing the feasibility of adding this functionality into PBM instead (K8SPSMDB-460).

Full Backup Needed

We mentioned before that Oplogs require full backups. Without a full backup, PBM will not start uploading oplogs and the Operator will throw the following error:

"msg":"Point-in-time recovery will work only with full backup. Please create one manually or wait for scheduled backup to be created (if configured).

There are two cases when this can happen:

  1. User enables PITR for the cluster
  2. User recovers from backup

In this release, we decided not to create the full backup automatically, but leave it to the user or backup schedule. We might introduce the flag in the following releases which would allow users to configure this behavior, but for now, we decided that current primitives are enough to automate the full backup creation. 

10 Minutes RPO

Right now PBM uploads oplogs to the S3 bucket every 10 minutes. This time span is not configurable and hardcoded for now. What it means to the user is that a Recovery Point Objective (RPO) can be as much as ten minutes. 

This is going to be improved in the following releases of Percona Backup for MongoDB and is captured in the PBM-543 JIRA issue. Once it is there, the user will be able to control the period between oplog uploads with spec.backup.pitr.timeBetweenUploads in cr.yaml.

Which Backups do I Have?

So the user has Full backups and PITR enabled. PBM has a nice feature that shows all the backups and Oplog (PITR) time frames:

$ pbm list

Backup snapshots:
     2020-12-10T12:19:10Z [complete: 2020-12-10T12:23:50]
     2020-12-14T10:44:44Z [complete: 2020-12-14T10:49:03]
     2020-12-14T14:26:20Z [complete: 2020-12-14T14:34:39]
     2020-12-17T16:46:59Z [complete: 2020-12-17T16:51:07]
PITR <on>:
     2020-12-14T14:26:40 - 2020-12-16T17:27:26
     2020-12-17T16:47:20 - 2020-12-17T16:57:55

But in the Operator, the user can see full backup details but cannot yet see the oplog information without going into the backup container manually:

$ kubectl get psmdb-backup backup2 -o yaml
…
status:
  completed: "2021-05-05T19:27:36Z"
  destination: "2021-05-05T19:27:11Z"
  lastTransition: "2021-05-05T19:27:36Z"
  pbmName: "2021-05-05T19:27:11Z"
  s3:
    bucket: my-bucket
    credentialsSecret: s3-secret
    endpointUrl: https://storage.googleapis.com
    region: us-central-1

The obvious idea is to somehow store this information in the psmdb-backup Custom Resource, but to do that we need to keep it updated. Updating hundreds of these objects all the time in a reconcile loop might put pressure on the Operator and even the Kubernetes API. We are still assessing different options here.

Conclusion

Point-in-time recovery is an important feature for Percona Operator for MongoDB as it reduces both RTO and RPO. The feature has been present in PBM for some time already and was battle-tested in multiple production deployments. With the Operator, we want to reduce the manual burden to a minimum and automate day-2 operations as much as possible. Here is a quick summary of what is coming in the following releases of the Operator related to PITR:

  • Reduce RPO even more with configurable Oplogs upload period (PBM-543, K8SPSMDB-388)
  • Take full backup automatically if PITR is enabled (K8SPSMDB-460)
  • Provide users the visibility into available Oplogs time frames (K8SPSMDB-461)

Our roadmap is available publicly here, and we would be curious to learn more about your ideas. If you are willing to contribute, a good starting point would be CONTRIBUTING.md in our GitHub repository. It has all the details about how to contribute code, submit new ideas, and report a bug. A good place to ask questions is our Community Forum, where anyone can freely share their thoughts and suggestions regarding Percona software.

Nov 10, 2020

Restore a Replica Set to a New Environment with Percona Backup for MongoDB

Percona Backup for MongoDB (PBM) is our open source tool for backing up MongoDB clusters. Initially, the tool was developed for restoring backups in the same environment where they were taken. In this post, I will show you how to restore a backup to a new environment instead.

Let’s assume you followed the instructions to install Percona Backup for MongoDB packages on your newly provisioned replica set, and you already have at least one full backup of the source stored in remote backup storage.

Create the Backup User

Note: I am using a 3-node replica set running on CentOS 7 for this example.

The first step is to create the backup role on the target cluster’s primary:

db.getSiblingDB("admin").createRole({ "role": "pbmAnyAction",
      "privileges": [
         { "resource": { "anyResource": true },
           "actions": [ "anyAction" ]
         }
      ],
      "roles": []
   });

Now, let’s also create the backup user and give it the proper permissions:

db.getSiblingDB("admin").createUser({user: "pbmuser",
       "pwd": "secretpwd",
       "roles" : [
          { "db" : "admin", "role" : "readWrite", "collection": "" },
          { "db" : "admin", "role" : "backup" },
          { "db" : "admin", "role" : "clusterMonitor" },
          { "db" : "admin", "role" : "restore" },
          { "db" : "admin", "role" : "pbmAnyAction" }
       ]
    });

Configure PBM Agent

The next step is configuring the credentials for pbm-agent on each server. It is important to point each agent to its local node only (don’t use the replica set URI here):

tee /etc/sysconfig/pbm-agent <<EOF
PBM_MONGODB_URI="mongodb://pbmuser:secretpwd@localhost:27017"
EOF

Now we can start the agent on all nodes of the new cluster:

systemctl start pbm-agent

We have to specify the location where backups are stored. This is saved inside MongoDB itself. The easiest way to load the configuration options at first is to create a YAML file and upload it. For example, given the following file:

tee /etc/pbm-agent-storage.conf <<EOF
type: s3
s3:
   region: us-west-2
   bucket: pbm-test-bucket-78967
   credentials:
      access-key-id: "your-access-key-id-here"
      secret-access-key: "your-secret-key-here"
EOF

Use the pbm config --file command to save (or update) the admin.pbmConfig collection, which all pbm-agents will refer to.

$ pbm config --file=/etc/pbm-agent-storage.conf
[Config set]
------
pitr:
  enabled: false
storage:
  type: filesystem
  filesystem:
    path: /backup

Backup list resync from the store has started

Sync the Backups and Perform the Restore

As you can see, PBM automatically starts scanning the remote destination for backup files. After a few moments, you should be able to list the existing backups:

$ pbm list --mongodb-uri mongodb://pbmuser:secretpwd@localhost:27017/?replicaSet=testRPL
Backup snapshots:
  2020-11-02T16:53:53Z
PITR <off>:
  2020-11-02T16:54:15 - 2020-11-05T11:43:26

Note: in the case of a sharded cluster, the above connection must be to the config server replica set.
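
For example, a config server replica set connection for listing backups might look like this; the host names and replica set name are hypothetical:

pbm list --mongodb-uri "mongodb://pbmuser:secretpwd@cfg1:27019,cfg2:27019,cfg3:27019/?replicaSet=cfgRS&authSource=admin"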

You can also use the following command if you need to re-run the scan for any reason:

pbm config --force-resync

The last step is to fire off the restore:

$ pbm restore 2020-11-02T16:53:53Z --mongodb-uri mongodb://pbmuser:secretpwd@localhost:27017/?replicaSet=testRPL
...Restore of the snapshot from '2020-11-02T16:53:53Z' has started

We can check the progress by tailing the journal:

$ journalctl -u pbm-agent -f

Nov 05 13:00:31 mongo0 pbm-agent[10875]: 2020-11-05T13:00:31.000+0000 [INFO] got command restore [name: 2020-11-05T13:00:31.580485314Z, backup name: 2020-11-02T16:53:53Z] <ts: 1604581231>
Nov 05 13:00:31 mongo0 pbm-agent[10875]: 2020-11-05T13:00:31.000+0000 [INFO] restore/2020-11-02T16:53:53Z: restore started
Nov 05 13:00:34 mongo0 pbm-agent[10875]: 2020-11-05T13:00:34.918+0000        preparing collections to restore from
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.011+0000        reading metadata for admin.pbmRUsers from archive on stdin
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.051+0000        restoring admin.pbmRUsers from archive on stdin
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.517+0000        restoring indexes for collection admin.pbmRUsers from metadata
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.548+0000        finished restoring admin.pbmRUsers (3 documents, 0 failures)
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.548+0000        reading metadata for admin.pbmRRoles from archive on stdin
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.558+0000        restoring admin.pbmRRoles from archive on stdin
Nov 05 13:00:36 mongo0 pbm-agent[10875]: 2020-11-05T13:00:36.011+0000        restoring indexes for collection admin.pbmRRoles from metadata
Nov 05 13:00:36 mongo0 pbm-agent[10875]: 2020-11-05T13:00:36.031+0000        finished restoring admin.pbmRRoles (2 documents, 0 failures)
Nov 05 13:00:36 mongo0 pbm-agent[10875]: 2020-11-05T13:00:36.050+0000        reading metadata for admin.test from archive on stdin
Nov 05 13:00:36 mongo0 pbm-agent[10875]: 2020-11-05T13:00:36.061+0000        restoring admin.test from archive on stdin
Nov 05 13:01:09 mongo0 pbm-agent[10875]: 2020-11-05T13:01:09.775+0000        no indexes to restore
Nov 05 13:01:09 mongo0 pbm-agent[10875]: 2020-11-05T13:01:09.776+0000        finished restoring admin.test (1000000 documents, 0 failures)
Nov 05 13:01:09 mongo0 pbm-agent[10875]: 2020-11-05T13:01:09.901+0000        reading metadata for admin.pbmLockOp from archive on stdin
Nov 05 13:01:09 mongo0 pbm-agent[10875]: 2020-11-05T13:01:09.993+0000        restoring admin.pbmLockOp from archive on stdin
Nov 05 13:01:11 mongo0 pbm-agent[10875]: 2020-11-05T13:01:11.379+0000        restoring indexes for collection admin.pbmLockOp from metadata
Nov 05 13:01:11 mongo0 pbm-agent[10875]: 2020-11-05T13:01:11.647+0000        finished restoring admin.pbmLockOp (0 documents, 0 failures)
Nov 05 13:01:11 mongo0 pbm-agent[10875]: 2020-11-05T13:01:11.751+0000        reading metadata for test.test from archive on stdin
Nov 05 13:01:11 mongo0 pbm-agent[10875]: 2020-11-05T13:01:11.784+0000        restoring test.test from archive on stdin
Nov 05 13:01:27 mongo0 pbm-agent[10875]: 2020-11-05T13:01:27.772+0000        no indexes to restore
Nov 05 13:01:27 mongo0 pbm-agent[10875]: 2020-11-05T13:01:27.776+0000        finished restoring test.test (533686 documents, 0 failures)
Nov 05 13:01:27 mongo0 pbm-agent[10875]: 2020-11-05T13:01:27.000+0000 [INFO] restore/2020-11-02T16:53:53Z: mongorestore finished
Nov 05 13:01:30 mongo0 pbm-agent[10875]: 2020-11-05T13:01:30.000+0000 [INFO] restore/2020-11-02T16:53:53Z: starting oplog replay
Nov 05 13:01:30 mongo0 pbm-agent[10875]: 2020-11-05T13:01:30.000+0000 [INFO] restore/2020-11-02T16:53:53Z: oplog replay finished on {0 0}
Nov 05 13:01:30 mongo0 pbm-agent[10875]: 2020-11-05T13:01:30.000+0000 [INFO] restore/2020-11-02T16:53:53Z: restoring users and roles
Nov 05 13:01:31 mongo0 pbm-agent[10875]: 2020-11-05T13:01:31.000+0000 [INFO] restore/2020-11-02T16:53:53Z: restore finished successfully

Conclusion

Percona Backup for MongoDB is a must-have tool for sharded environments, because of multi-shard consistency. This article shows how PBM can be used for disaster recovery; everything is simple and automatic.

A caveat here is that unless you want to go into the rabbit hole of manual metadata renaming, you should keep the same replica set names on both the source and target clusters.

If you would like to follow the development, report a bug, or have ideas for feature requests, make sure to check out the PBM project in the Percona issue tracker.

Oct
19
2020
--

5 Things Developers Should Know Before Deploying MongoDB


MongoDB is one of the most popular databases and one of the easiest NoSQL databases to set up. Oftentimes, developers want a quick environment to test out an idea for an application, or to figure out a good data model for their data without waiting for their operations team to spin up the infrastructure. What sometimes happens is that these quick, one-off instances grow, and before you know it that little test DB is your production DB supporting your new app. For anyone who finds themselves in this situation, I encourage you to check out our Percona blogs, as we have lots of great information for those both new and experienced with MongoDB. Don't let the ease of installing MongoDB lull you into a false sense of security; there are things you need to consider as a developer before deploying MongoDB. Here are five things developers should know before deploying MongoDB in production.

1) Enable Authentication and Authorization

Security is of utmost importance to your database. While the days when security was disabled by default in MongoDB are gone, it is still easy to start MongoDB without security enabled. With security disabled and your database bound to a public IP, anyone can connect to your database and steal your data. By adding a few important security options to your configuration file, you can ensure that your data is protected. You can also configure MongoDB to use native LDAP or Kerberos for authentication. Enabling authentication and authorization is one of the simplest ways to secure your MongoDB database. The most important configuration option is turning on authorization, which enables users and roles and requires clients to authenticate and hold the proper roles to access your data.

security:
  authorization: enabled
  keyFile: /path/to/our.keyfile
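
To make the effect of enabling authorization concrete, here is a minimal mongo shell sketch that creates an administrative user and a least-privilege application user. The user names, databases, and roles shown are placeholders for illustration, not values taken from this article.

use admin
db.createUser({
  user: "clusterAdmin",                 // placeholder administrative account
  pwd: passwordPrompt(),                // prompt instead of hardcoding a password
  roles: [ { role: "userAdminAnyDatabase", db: "admin" },
           { role: "clusterMonitor", db: "admin" } ]
})

db.createUser({
  user: "appUser",                      // placeholder application account
  pwd: passwordPrompt(),
  roles: [ { role: "readWrite", db: "myAppDb" } ]   // only what the application needs
})

With the keyFile in place, replica set members authenticate to each other, while clients must authenticate with credentials like the ones sketched above.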

 

2) Connect to a Replica Set/Multiple Mongos, Not Individual Nodes

MongoDB's drivers all support connecting directly to a standalone node, a replica set, or a mongos for sharded clusters. Sometimes your database starts off with one specific node that is always your primary, and it's easy to set your connection string to connect only to that node. But what happens when that node goes down? If you don't have a highly available connection string in your application configuration, you're missing out on a key advantage of MongoDB replica sets: connecting to the primary no matter which node it currently is. All of MongoDB's supported language drivers support the MongoDB URI connection string format and implement the failover logic. Here are some examples of connection strings for PyMongo, MongoDB's Python driver: a standalone connection string, a replica set connection string, and an SRV record connection string. If you have the privilege to set up SRV DNS records, they allow you to standardize your connection string to point to an address without needing to worry about the underlying infrastructure changing.

Standalone Connection String:

client = MongoClient('mongodb://hostabc.example.com:27017/?authSource=admin')

 

Replica Set Connection String:

client = MongoClient('mongodb://hostabc.example.com:27017,hostdef.example.com:27017,hostxyz.example.com:27017/?replicaSet=foo&authSource=admin')

 

SRV Connection String:

client = MongoClient('mongodb+srv://host.example.com/')

Post-script for clusters: If you're just starting out, you're usually not setting up a sharded cluster. But if it is a cluster, then instead of using a replica set connection you will connect to a mongos node. To get automatic failover in the event of a mongos node being restarted (or otherwise being down), start mongos on multiple hosts and put them, comma-concatenated, in your connection string's host list. As with replica sets, you can use SRV records for these too.
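
As a rough sketch of that idea from the mongo shell (the mongos host names are invented), both mongos instances go into one URI and the client fails over between them. In application code, the same URI would go into your driver's client constructor, just like the PyMongo examples above.

// Hypothetical mongos hosts; connect() accepts a standard MongoDB URI.
db = connect("mongodb://mongos1.example.com:27017,mongos2.example.com:27017/admin?authSource=admin")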

3) Sharding Can Help Performance But Isn’t Always a Silver Bullet

Sharding is how MongoDB handles the partitioning of data. This practice is used to distribute load across more replica sets for a variety of reasons, such as write performance, low-latency geographical writes, and archiving data to shards that use slower and cheaper storage. Sharding helps keep your working set in memory because it lowers the amount of data each shard has to deal with.

As previously mentioned, sharding can also be used to reduce latency by separating your shards by geographic region; a common example is having a US-based shard, an EU-based shard, and a shard in Asia, where the data is kept local to its origin. Although it is not the only application for shard zoning, "geo-sharding" like this is a common one. This approach can also help applications comply with the various data regulations that are becoming more important and more strict throughout the world.

While sharding often helps write performance, that sometimes comes at the detriment of read performance. An easy example of poor read performance would be a query that needs to find all orders regardless of their origin. This find would need to be sent to the US shard, the EU shard, and the shard in Asia, with all the network latency that comes with reading from the non-local regions, and the returned records would then need to be sorted on the mongos query router before being sent back to the client. This kind of give and take should guide how you choose a shard key and how you weigh its impact on your typical query patterns.
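
To make the geo-sharding idea concrete, here is a hedged mongo shell sketch; the database, collection, shard, and zone names are invented for illustration and are not from this article.

// Shard on a compound key that leads with the region the document belongs to.
sh.enableSharding("shop")
sh.shardCollection("shop.orders", { region: 1, _id: 1 })

// Assign shards to zones and pin each region's key range to its zone,
// so documents stay on shards in (or near) their region.
sh.addShardToZone("shardUS", "US")
sh.addShardToZone("shardEU", "EU")
sh.updateZoneKeyRange("shop.orders", { region: "US", _id: MinKey }, { region: "US", _id: MaxKey }, "US")
sh.updateZoneKeyRange("shop.orders", { region: "EU", _id: MinKey }, { region: "EU", _id: MaxKey }, "EU")

A query that filters on region can then be routed to a single zone, while the cross-region "all orders" query described above still has to fan out to every shard.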

4) Replication ≠ Backups

MongoDB replication, while powerful and easy to set up, is not a substitute for a good backup strategy. Some might think that their replica set members in a DR data center will be sufficient to keep them up in a data loss scenario. While a replica set member in a DR center will surely help you in a DR situation, it will not help you if you accidentally drop a database or a collection in production, as that delete will quickly be replicated to your secondary in your DR data center.

Another common misconception is that delayed replica set members keep you safe. Delayed members still rely on you finding the issue you want to restore from before it gets applied to the delayed member. Are your processes so rock-solid that you can guarantee you'll find the issue before it reaches your delayed member?
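
For reference, a delayed member is just replica set configuration. The sketch below is a minimal example (the member index and delay value are arbitrary) that makes one secondary hidden, non-electable, and one hour behind the primary.

// Make the third member a hidden, delayed secondary, one hour behind the primary.
cfg = rs.conf()
cfg.members[2].priority = 0        // never eligible to become primary
cfg.members[2].hidden = true       // invisible to application reads
cfg.members[2].slaveDelay = 3600   // renamed secondaryDelaySecs in MongoDB 5.0+
rs.reconfig(cfg)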

Backups are just as important with MongoDB as they are with any other database.  There are tools like mongodump, mongoexport, Percona Backup for MongoDB, and Ops Manager (Enterprise Edition only) that support point-in-time recovery, oplog backups, hot backups, and full and incremental backups.  Backups can be run from any node in your replica set; the best practice is to run them from a secondary node so you don't put unnecessary pressure on your primary.  In addition to the above methods, you can also take snapshots of your data, as long as you pause writes to the node you're snapshotting, for example by freezing the file system, to ensure a consistent snapshot of your MongoDB database.

5) Schemaless is a Myth, Schemas Still Matter

MongoDB was originally touted as a schemaless database, which was attractive to developers who had long struggled to update and maintain their schemas in relational databases.  But those schemas succeeded for good reasons in the early days of databases, and while MongoDB gives you the flexibility to skip designing your schema up front and create it on the fly, this often led to poor-performing schema designs and anti-patterns.  There are lots of stories out in the wild of users not enforcing any structured schema on their MongoDB data models and running into various performance problems as their schema became unwieldy.  Today, MongoDB supports JSON Schema and schema validation.  These features let you apply as much or as little structure to your schemas as is needed, so you keep the flexibility of MongoDB's looser schema structure while still enforcing rules that keep your application performing well and your data model consistent.
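
As a small, hypothetical example of schema validation (the collection and field names are invented for illustration), a $jsonSchema validator can be attached when creating a collection:

// Reject user documents that are missing required fields or use the wrong types.
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "name", "email" ],
      properties: {
        name:  { bsonType: "string" },
        email: { bsonType: "string", pattern: "^.+@.+$" },
        age:   { bsonType: "int", minimum: 0 }
      }
    }
  },
  validationLevel: "moderate",   // only validate inserts and updates to already-valid documents
  validationAction: "error"      // reject violations rather than just logging a warning
})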

Another area affected by poor schema design in MongoDB is the aggregation framework.  The aggregation framework lets you run more analytical query patterns such as sorting and grouping, plus useful operations such as unwinding arrays and joining collections, and a whole lot more.  Without a good schema, these sorts of queries can suffer from very poor performance.
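
For instance, assuming a hypothetical orders collection with customerId, status, and total fields (not from this article), a simple aggregation that groups and sorts might look like the sketch below; putting the $match stage first lets it use an index and keeps the rest of the pipeline small.

// Total revenue per customer for completed orders, biggest spenders first.
db.orders.aggregate([
  { $match: { status: "complete" } },   // filter early so an index on status can be used
  { $group: { _id: "$customerId", revenue: { $sum: "$total" }, orders: { $sum: 1 } } },
  { $sort: { revenue: -1 } },
  { $limit: 10 }
])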

MongoDB was also popular due to its lack of support for joins, which can be expensive; avoiding them allowed MongoDB to run quite fast.  Though MongoDB has since added $lookup to support left outer joins, embedded documents are the typical workaround, and that approach comes with its pros and cons.  As with relational databases, embedding documents essentially creates a one-to-N relationship; this is covered in greater detail in this blog.  In MongoDB, the value of N matters: if it's one-to-few (2-10) or one-to-many (10-1000), this can still be a good schema design as long as your indexes support your queries.  When you get to one-to-tons (10,000+), you need to consider things like MongoDB's 16 MB limit per document or using references to the parent document.

Examples of each of these approaches:

One-to-Few, consider having multiple phone numbers for a user:

{  "_id" : ObjectId("1234567890"),
  "name" :  "John Doe",
  "phone" : [     
     { "type" : "mobile", "number" : "+1-585-555-5555" }, 
     { "type" : "work", "number" : "+1-585-555-1111"}  
            ]
}

One-to-Many, consider a parts list for a product with multiple items:

{ "_id" : ObjectId("123"),
 “Item” : “Widget”,
 “Price” : 100 
}

{  "_id" : ObjectId("0123456789"), 
   "manufacturer" : "Percona",
   "catalog_number" : 123456,
   "parts" : [    
      { “item”: ObjectID("123")},  
      { “item”: ObjectID("456")},
      { “item”: ObjectID("789")},
       ...  
              ] 
}

One-to-Tons, consider a social network type application:

{  "_id" : ObjectId("123"),
   "username" : "Jane Doe" 
}

{  "_id" : ObjectId("456"),
   "username" : "Eve DBA"
 }

{  "_id" : ObjectId("9876543210"),
   "username" : "Percona",
   "followers" : [     
                    ObjectID("123"),
                    ObjectID("456"),
                    ObjectID("789"),
                    ...  
                 ]
}
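
Tying this back to the one-to-many parts example above, here is a hedged sketch of resolving those references with $lookup, assuming the product and part documents live in products and parts collections (collection names invented for illustration):

// Join each product's referenced part ids to the full part documents.
db.products.aggregate([
  { $lookup: {
      from: "parts",              // collection holding the part documents
      localField: "parts.item",   // ObjectIds stored on the product
      foreignField: "_id",
      as: "resolved_parts"
  } }
])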

 

Bonus Topic: Transactions

MongoDB has supported multi-document transactions since MongoDB 4.0 (replica sets) and MongoDB 4.2 (sharded clusters).  Transactions in MongoDB work quite similarly to how they work in relational databases: either all actions in the transaction succeed or they all fail.  Here's an example of a transaction in MongoDB:

rs1:PRIMARY> session = db.getMongo().startSession()
rs1:PRIMARY> session.startTransaction()
rs1:PRIMARY> session.getDatabase("percona").test.insert({today : new Date()})
WriteResult({ "nInserted" : 1 })
rs1:PRIMARY> session.getDatabase("percona").test.insert({some_value : "abc"})
WriteResult({ "nInserted" : 1 })
rs1:PRIMARY> session.commitTransaction()

Transactions can be quite powerful if they are truly needed for your application, but do realize the performance implications as all queries in a transaction will wait to finish until the whole transaction succeeds or fails.

Takeaways:

While MongoDB is easy to get started with and has a lower barrier to entry, just like any other database there are some key things that you, as a developer, should consider before deploying it.  We've covered enabling authentication and authorization to ensure you have a secure application and don't leak data.  We've highlighted using highly available connection strings, whether to your replica set, a mongos node list, or via SRV records, to ensure you're always connecting to the appropriate nodes.  We've looked at the balancing act of choosing a shard key while considering the impact on both reads and writes, and understanding the tradeoffs you are making.  The importance of backups, and of not relying on replication as a backup method, was also covered.  Finally, we covered the fact that schemas still matter with MongoDB, but you have flexibility in defining how rigid they are.  We hope this gives you a better idea of the things to consider when deploying MongoDB for your next application.  Thanks for reading!

Sep
18
2020
--

MongoDB Backup Best Practices


In this blog, we will be discussing different backup strategies for MongoDB and their use cases, along with the pros and cons of each.

Why Take Backups?

Regular database backups are a crucial step in guarding against unintended data loss events. Whether you lose your data to mechanical failure, a natural disaster, or criminal malice, it is gone. However, the data doesn't need to be lost for good: you can back it up.

Generally, there are two types of backups used with database technologies like MongoDB:

  • Logical Backups
  • Physical Backups

Additionally, we have the option of incremental backups (as part of logical backups), where we capture the deltas, or incremental data changes, made between full backups to minimize data loss in case of a disaster. We will be discussing these two backup options, how to proceed with them, and which one suits you better depending upon your requirements and environment setup.

Logical Backups

These are backups where the data is dumped from the databases into backup files. A logical backup of MongoDB means you'll be dumping the data into BSON-formatted files.

During a logical backup, the data is read from the server through the client API and the backup utility serializes it into ".bson", ".json", or ".csv" files on disk, depending on the utility used.

MongoDB offers the below utility to take logical backups:

Mongodump: Takes a dump/backup of the databases into ".bson" format, which can later be restored by replaying (re-inserting) the documents captured in the dump files back into the databases.

mongodump --host=mongodb1.example.net --port=27017 --username=user --authenticationDatabase=admin --db=demo --collection=events --out=/opt/backup/mongodump-2011-10-24

Note: If we don't specify the DB name or collection name explicitly in the above "mongodump" syntax, then the backup will be taken for all databases, or for all collections in the specified database, respectively. If "authorization" is enabled, we must specify the "authenticationDatabase".

Also, you should use "--oplog" to capture the incremental data written while the backup is still running. Keep in mind that it won't work with --db and --collection, since it only works for backups of the whole instance (all databases).

mongodump --host=mongodb1.example.net --port=27017 --username=user --authenticationDatabase=admin --oplog --out=/opt/backup/mongodump-2011-10-24

Pros:

  1. It can take a backup at a more granular level, like a specific database or collection, which is helpful during restoration.
  2. It does not require you to halt writes against the node where the backup is running, so the node remains available for other operations.

Cons:

  1. As it reads all the data, it can be slow and will require disk reads for databases that are larger than the RAM available for the WiredTiger cache. The increased cache pressure slows down overall performance.
  2. It doesn't capture the index data in the metadata backup file, so during a restore all the indexes have to be built again after the collection data is reinserted. This is done in one pass through the collection after the inserts have finished, so it can add a lot of time for big collection restores.
  3. The speed of the backup also depends on the allocated IOPS and the type of storage, since lots of reads/writes happen during this process.

Note: It is always advisable to use secondary servers for backups to avoid unnecessary performance degradation on the primary node.

As we have different types of environment setups, we should approach each of them as below.

  1. Replica set: It is always preferred to run backups on secondaries.
  2. Sharded clusters: Take a backup of the config server replica set and of each shard individually, using their secondary nodes.

Since we are discussing a distributed database system like a sharded cluster, we should also keep in mind that our backups need to be consistent to a single point in time (replica set backups taken with mongodump are generally consistent when "--oplog" is used).

Consider a scenario where the application is still writing data and cannot be stopped for business reasons. Even if we back up the config server and each shard separately, the backups will finish at different times because of differences in data volume, load, and so on. As a result, restoring them may introduce inconsistencies for the same reason.

This is where Percona Backup for MongoDB is very useful (it uses the mongodump libraries internally), since it tails the oplog on each shard separately while the backup is running, until completion. More references can be found here in the release notes.

Now comes the restoration part when dealing with Logical backups. Same as for backups, MongoDB provides the below utilities for restoration purposes.

Mongorestore: Restores dump files created by "mongodump". Index recreation takes place once the data is restored, which consumes additional memory resources and time.

mongorestore --host=mongodb1.example.net --port=27017 --username=user  --password --authenticationDatabase=admin --db=demo --collection=events /opt/backup/mongodump-2011-10-24/events.bson

For the restore of an incremental dump, we can add --oplogReplay to the above syntax to replay the oplog entries as well.

Note: "--oplogReplay" can't be used with the --db and --collection flags, as it only works when restoring all the databases.

Physical/Filesystem Backups

These backups involve snapshotting or copying the underlying MongoDB data files (--dbpath) at a point in time and letting the database cleanly recover using the state captured in the snapshotted files. They are instrumental in backing up large databases quickly, especially when used with filesystem snapshots such as LVM snapshots or block storage volume snapshots.

There are several methods to take the filesystem level backup, also known as Physical backups, as below.

  1. Manually copying the entire data files (for example, using rsync; speed depends on network bandwidth)
  2. LVM based snapshots
  3. Cloud-based disk snapshots (AWS/GCP/Azure or any other cloud provider)
  4. Percona hot backup here

We'll be discussing all of the above options, but first let's see their pros and cons compared to logical backups.

Pros:

  1. They are at least as fast as, and usually faster than, logical backups.
  2. Can be easily copied over or shared with remote servers or attached NAS.
  3. Recommended for large datasets because of speed and reliability.
  4. Can be convenient while building new nodes within the same cluster or new cluster.

Cons:

  1. Restoring at a more granular level, such as a specific database or collection, is not possible.
  2. Incremental backups cannot be achieved yet.
  3. A dedicated node (it might be a hidden one) is recommended for the backup, as it requires halting writes or shutting down "mongod" cleanly prior to the snapshot to achieve consistency (see the sketch below).
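
One common way to quiesce a node for a consistent snapshot is to flush and lock writes around the snapshot. The mongo shell sketch below assumes you are on the (preferably hidden or secondary) node being snapshotted and that the snapshot covers the entire dbPath; it is only a sketch, not a complete runbook.

db.fsyncLock()      // flush data to disk and block new writes on this node

// ... take the LVM / cloud-disk / filesystem snapshot here ...

db.fsyncUnlock()    // release the lock once the snapshot has been created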

Below is the backup time consumption comparison for the same dataset:

DB Size: 267.6GB

Index Size: <1MB (since it was only on _id for testing)

demo:PRIMARY> db.runCommand({dbStats: 1, scale: 1024*1024*1024})
{
        "db" : "test",
        "collections" : 1,
        "views" : 0,
        "objects" : 137029,
        "avgObjSize" : 2097192,
        "dataSize" : 267.6398703530431,
        "storageSize" : 13.073314666748047,
        "numExtents" : 0,
        "indexes" : 1,
        "indexSize" : 0.0011749267578125,
        "scaleFactor" : 1073741824,
        "fsUsedSize" : 16.939781188964844,
        "fsTotalSize" : 49.98826217651367,
        "ok" : 1,
        ...
}
demo:PRIMARY>

        1. Hot backup

Syntax : 

> use admin

switched to db admin

> db.runCommand({createBackup: 1, backupDir: "/my/backup/data/path"})

{ "ok" : 1 }

 

Note: The backup path “backupDir” should be absolute. It also supports storing the backups on the filesystem and AWS S3 buckets.

[root@ip-172-31-37-92 tmp]# time mongo  < hot.js
Percona Server for MongoDB shell version v4.2.8-8
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("c9860482-7bae-4aae-b0e7-5d61f8547559") }
Percona Server for MongoDB server version: v4.2.8-8
switched to db admin
{
        "ok" : 1,
        ...
}
bye

real    3m51.773s
user    0m0.067s
sys     0m0.026s
[root@ip-172-31-37-92 tmp]# ls
hot  hot.js  mongodb-27017.sock  nohup.out  systemd-private-b8f44077314a49899d0a31f99b31ed7a-chronyd.service-Qh7dpD  tmux-0
[root@ip-172-31-37-92 tmp]# du -sch hot
15G     hot
15G     total

Notice that the "Percona hot backup" took only about 4 minutes. It is also very helpful when rebuilding a node or spinning up new instances/clusters with the same dataset. The best part is that it does not block writes or cause any notable performance hit. However, it is still recommended to run it against secondaries.

       2.  Filesystem Snapshot

The approximate time taken for the snapshot to be completed was only 4 minutes.

[root@ip-172-31-37-92 ~]# aws ec2 describe-snapshots  --query "sort_by(Snapshots, &StartTime)[-1].{SnapshotId:SnapshotId,StartTime:StartTime}"
{
    "SnapshotId": "snap-0f4403bc0fa0f2e9c",
    "StartTime": "2020-08-26T12:26:32.783Z"
}

[root@ip-172-31-37-92 ~]# aws ec2 describe-snapshots \
> --snapshot-ids snap-0f4403bc0fa0f2e9c
{
    "Snapshots": [
        {
            "Description": "This is my snapshot backup",
            "Encrypted": false,
            "OwnerId": "021086068589",
            "Progress": "100%",
            "SnapshotId": "snap-0f4403bc0fa0f2e9c",
            "StartTime": "2020-08-26T12:26:32.783Z",
            "State": "completed",
            "VolumeId": "vol-0def857c44080a556",
            "VolumeSize": 50
        }
    ]
}

       3. Mongodump

[root@ip-172-31-37-92 ~]# time nohup mongodump -d test -c collG -o /mongodump/ &
[1] 44298

[root@ip-172-31-37-92 ~]# sed -n '1p;$p' nohup.out
2020-08-26T12:36:20.842+0000    writing test.collG to /mongodump/test/collG.bson
2020-08-26T12:51:08.832+0000    [####....................]  test.collG  27353/137029  (20.0%)

Note: Just to give an idea, we can clearly see that for the same dataset where the snapshot and hot backup took only 3-5 minutes, "mongodump" took almost 15 minutes for just 20% of the dump. Hence, the speed of backing up the data is definitely very slow compared to the other two options. On top of that, we would be left with only one option to restore the backup, "mongorestore", which makes the whole process even slower.

Conclusion

So, which backup method is best? It completely depends on factors like the type of infrastructure, the environment, the dataset size, the load, and so on. Generally, if the dataset is around 100GB or less, then logical backups are the best option, along with scheduled incremental backups, depending upon your RTO (Recovery Time Objective) and RPO (Recovery Point Objective) needs. However, if the dataset is larger than that, we should always go for physical backups, including incremental (oplog) backups as well.

Interested in trying Percona Backup for MongoDB? Download it for free! 
