May 19, 2021

Refreshing Test/Dev Environments With Prod Data Using Percona Backup for MongoDB

This is a straightforward article intended to show how easy it is to refresh your Test/Dev environments with PROD data using Percona Backup for MongoDB (PBM). It covers all the steps from the PBM configuration to the restore, assuming that the PBM agents are up and running on all the replica set members of both the PROD and Dev/Test servers.

Taking the Backup on PROD

This step is quite simple and it demands no more than two commands:

1. Configuring the Backup

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:40001/?replicaSet=bprepPROD&authSource=admin'
$ pbm config --file /etc/pbm/pbm-s3.yaml
[Config set]
------
pitr:
  enabled: false
storage:
  type: s3
  s3:
    provider: aws
    region: us-west-1
    bucket: rafapbmtest
    prefix: bpPROD
    credentials:
      access-key-id: '***'
      secret-access-key: '***'

Backup list resync from the store has started

Two things are worth noting here: I am storing my backups in an S3 bucket, and I am defining a prefix. When a prefix is defined in the PBM storage configuration, a subdirectory is created automatically and the backup files are stored in that subdirectory instead of the root of the S3 bucket.
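
For illustration, listing the bucket root after this configuration shows the prefix as a folder-like entry (the listing below is illustrative):

$ aws s3 ls s3://rafapbmtest/
                           PRE bpPROD/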

2. Taking the Backup

Having the PBM properly configured, it is time to take the backup. (You can skip this step if you already have PBM backups to use, of course.)

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:40001/?replicaSet=bprepPROD&authSource=admin'

$ pbm backup
Starting backup '2021-05-08T08:34:47Z'...................
Backup '2021-05-08T08:34:47Z' to remote store 's3://rafapbmtest/bpPROD' has started

If we run the pbm status command, we will see the snapshot in progress; once it is complete, pbm status will show it as completed, like below:

$ pbm status

Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:40001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpPROD
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]

Configuring the PBM Space on a DEV/TEST Environment

All right, now my PROD has a proper backup routine configured. The next step is to configure my PBM space, but this time in a Dev/Test environment – named here as DEV.

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:50001/?replicaSet=bprepPROD&authSource=admin'

$ pbm config --file /etc/pbm/pbm-s3.yaml 
[Config set]
------
pitr:
  enabled: false
storage:
  type: s3
  s3:
    provider: aws
    region: us-west-1
    bucket: rafapbmtest
    prefix: bpDEV
    credentials:
      access-key-id: '***'
      secret-access-key: '***'

Backup list resync from the store has started

Note that the S3 bucket is exactly the same one where PROD stores its backups, but with a different prefix. If I run pbm status, I will see that PBM is configured but no snapshots are available yet:

$ pbm status

Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:50001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpDEV
(none)

Lastly, note that the replica set name is exactly the same as PROD’s. If this were a sharded cluster rather than a non-sharded replica set, all the replica set names would have to match in the target cluster. PBM is guided by the replica set name, and if my DEV environment had a different one, it would not be possible to load backup metadata from PROD to DEV.
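
If you are unsure what the target environment’s replica set name is, a quick way to check it from the shell (connection details as configured above; output illustrative):

$ mongo "mongodb://pbmuser:secretpwd@127.0.0.1:50001/?authSource=admin" --quiet --eval 'rs.status().set'
bprepPROD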

Transferring the Desired Backup Files

The next step is transferring the backup files from the PROD prefix to the target prefix. I will use the AWS CLI for this, but there is one important thing to know in advance: how to determine which files belong to a certain backup set (snapshot). Let’s go back to the pbm status output taken on PROD previously:

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:40001/?replicaSet=bprepPROD&authSource=admin'

$ pbm status
Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:40001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpPROD
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]

PBM snapshots are named with the timestamp of when the backup started. If we look at the S3 prefix where the backup is stored, we will see that the file names contain that same timestamp:

$ aws s3 ls s3://rafapbmtest/bpPROD/
2021-05-08 10:26:11          5 .pbm.init
2021-05-08 10:35:14       1428 2021-05-08T08:34:47Z.pbm.json
2021-05-08 10:35:10      11606 2021-05-08T08:34:47Z_bprepPROD.dump.s2
2021-05-08 10:35:13        949 2021-05-08T08:34:47Z_bprepPROD.oplog.s2

So it is easy to tell which files I have to copy.

$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z.pbm.json' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z.pbm.json to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z.pbm.json

$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.dump.s2' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.dump.s2 to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z_bprepPROD.dump.s2

$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.oplog.s2' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.oplog.s2 to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z_bprepPROD.oplog.s2
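
Alternatively, since all of a snapshot’s files share its timestamp, the same transfer can be done in one command with standard AWS CLI filters (an equivalent sketch):

$ aws s3 cp 's3://rafapbmtest/bpPROD/' 's3://rafapbmtest/bpDEV/' --recursive \
    --exclude '*' --include '2021-05-08T08:34:47Z*'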

Checking the DEV prefix:

$ aws s3 ls s3://rafapbmtest/bpDEV/
2021-05-08 10:43:59          5 .pbm.init
2021-05-08 10:52:02       1428 2021-05-08T08:34:47Z.pbm.json
2021-05-08 10:52:13      11606 2021-05-08T08:34:47Z_bprepPROD.dump.s2
2021-05-08 10:52:24        949 2021-05-08T08:34:47Z_bprepPROD.oplog.s2

The files are already there and PBM has already automatically loaded their metadata into the DEV PBM collections:

$ pbm status

Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:50001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpDEV
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]

Finally – Restoring It

Believe it or not, now comes the easiest part: the restore. It takes just one command and nothing else:

$ pbm restore '2021-05-08T08:34:47Z'
....Restore of the snapshot from '2021-05-08T08:34:47Z' has started
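
If you want to follow the restore’s progress, the usual tools apply (a quick sketch; a running operation is reported by pbm status, and the agent logs carry the details):

$ pbm status                   # a running restore shows up here
$ journalctl -u pbm-agent -f   # detailed progress from the agent logs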

Refreshing Dev/Test environments with PROD data is a common and necessary task in corporations worldwide. I hope this article helps clarify the practical questions around using PBM for it!

May 13, 2021

Percona Backup for MongoDB v1.5 Released

Percona Backup for MongoDB (PBM) has reached a new milestone with the release of version 1.5.0 today, May 13th, 2021.

Azure Blob Storage Support

Now you can use Azure Blob Storage as well as S3-compatible object stores.

Configuration example:

   storage:
     type: azure
     azure:
       account: <string>
       container: <string>
       prefix: <string>
       credentials:
         key: <your-access-key>
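
Assuming the snippet above is saved to a file (the path here is hypothetical), it is applied exactly like any other PBM storage configuration:

$ pbm config --file /etc/pbm/pbm-azure.yaml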

Preference Weight for Backup Source Nodes

Until now, PBM would use a secondary as the backup source if one had a pbm-agent running on it; otherwise, as a fallback, it would use the primary.

There are, however, plenty of users who would like a certain mongod node – for example, one in a datacenter closer to the backup storage – to be the preferred node to copy the data from. This is only a preference: if the user-preferred node is down, another one will be chosen.

Setting the priority section is entirely optional. If you don’t specify any preferences, PBM chooses this way by default: hidden secondaries are the top preference (PBM-494), normal secondaries are next, and primaries are last.

Configuration example of manually-chosen priorities:

   backup:
     priority:
       "mdb-c1.bc.net:28019": 2.5
       "mdb-s1n1.ad.net:27018": 2.5
       "mdb-s1n2.bc.net:27020": 2.0
       "mdb-s1n3.bc.net:27017": 0.1

The default preference weight is 1.0, so nodes not explicitly listed will have a higher priority than the “mdb-s1n3.bc.net:27017” example node.
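
Priorities live in the same configuration document as the storage settings, so applying them might look like this (host names taken from the example above; the file path is hypothetical):

$ tee /etc/pbm/pbm-priority.yaml <<EOF
storage:
  type: s3
  s3:
    region: us-west-1
    bucket: rafapbmtest
    credentials:
      access-key-id: '***'
      secret-access-key: '***'
backup:
  priority:
    "mdb-s1n1.ad.net:27018": 2.5
    "mdb-s1n3.bc.net:27017": 0.1
EOF
$ pbm config --file /etc/pbm/pbm-priority.yaml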

You cannot set a priority of zero or less as a way to ban a node. A banned mongod node might be the last healthy one at a given moment, and the backup would then fail to start, so a design decision was made not to allow banning nodes.

Important note: as soon as you specify any node preference, it is assumed you are taking full manual control. At that point the default rules, e.g. preferring secondaries to primaries, stop applying.

Users and Roles Backup Method Change

Important notice for database administrators: The backup file format for v1.5 has an incompatible difference with v1.4 (or earlier) backups. v1.5 will not be able to restore backups of v1.4.x or earlier.

Restoring users and roles has some constraints. To prevent a collection drop of system.users or system.roles from disrupting the pbm-agent <-> mongod connection, they are not re-inserted under their original collection names. Instead, they are inserted into a temporary location, and the user and role records are copied one by one.

A catch: the approach previously used – renaming the collections to temporary names at a certain point in time – interfered with a requirement regarding restoring collection UUIDs, which in turn blocked a fix for bug PBM-646.

Because PBM uses a single archive file for each full snapshot backup, there is no way to fix the embedded collection names in the backups for v1.4 before the restore process begins. This means there is no workaround that will allow v1.5 PBM to restore <= v1.4.x PBM backups.

The only workaround to restore <= 1.4.x backups after deploying v1.5 would be to roll back PBM executables (pbm and pbm-agent) to v1.4.1 just for the restore.

Bug Fixes

  • PBM-636: Fix restores of sharded collections by setting the same UUID value. Thanks to Nikolay and Dmitry for reporting the issue.
  • PBM-646: Stop the balancer during backup to make sure it doesn’t start running during restore.
  • PBM-642: Display priority=0 members in the agent list in pbm status output.
May 10, 2021

Point-in-Time Recovery for MongoDB on Kubernetes

Running MongoDB in Kubernetes with the Percona Operator is not only simple, but by design it provides a highly available MongoDB cluster suitable for mission-critical applications. In the latest 1.8.0 version, we added a feature that implements point-in-time recovery (PITR). It allows users to store the oplog in an external S3 bucket and to perform recovery to a specific date and time when needed. The main value of this approach is a significantly lower Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

In this blog post, we will look into how we delivered this feature and review some architectural decisions.

Internals

For full backups and the PITR feature, the Operator relies on Percona Backup for MongoDB (PBM), which by design supports storing operation logs (oplogs) on S3-compatible storage. We run PBM as a sidecar container in replica set Pods, including Config Server Pods. So each Pod in a replica set has two containers from the very beginning: Percona Server for MongoDB (PSMDB) and Percona Backup for MongoDB.

While PBM is a great tool, it comes with some limitations that we needed to keep in mind when implementing the PITR feature.

One Bucket

If PITR is enabled, PBM stores backups on S3 storage in a chained mode: Oplogs are stored right after the full backup and require it. PBM stores metadata about the backups in the MongoDB cluster itself and creates a copy on S3 to maintain the full visibility of the state of backups and operation logs.

When a user wants to recover to a specific date and time, PBM figures out which full backup to use, recovers from it, and applies the oplogs.
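
On the PBM side, such a recovery is a point-in-time restore; as a hedged example, the target time is passed with the --time flag (timestamp illustrative):

$ pbm restore --time="2021-05-05T19:27:00"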

If the user decides to use multiple S3 buckets to store backups, it means that oplogs are also scattered across these buckets. This complicates the recovery process because PBM only knows about the last S3 bucket used to store the full backup.

To simplify things and to avoid these split-brain situations with multiple buckets we made the following design decisions:

  • Do not enable the PITR feature if the user specified multiple buckets in the backup.storages section. This should cover most of the cases. We throw an error if the user tries that:

"msg":"Point-in-time recovery can be enabled only if one bucket is used in spec.backup.storages"

  • There are still cases where users can end up with multiple buckets (e.g., disabling PITR and enabling it again with another bucket). That is why, to recover from a backup, we ask the user to specify the backupName (the psmdb-backup Custom Resource name) in the recover.yaml manifest. From this CR we get the storage, and PBM fetches the oplogs which follow the full backup.

The obvious question is: why can’t the Operator handle the logic and somehow store metadata from multiple buckets?

There are several answers here:

  1. Bucket configurations can change during a cluster’s lifecycle and keeping all this data is possible, but the data may become obsolete over time. Also, our Operator is stateless and we want to keep it that way.
  2. We don’t want to bring this complexity into the Operator and are assessing the feasibility of adding this functionality into PBM instead (K8SPSMDB-460).

Full Backup Needed

We mentioned before that oplogs require a full backup. Without a full backup, PBM will not start uploading oplogs, and the Operator will throw the following error:

"msg":"Point-in-time recovery will work only with full backup. Please create one manually or wait for scheduled backup to be created (if configured).

There are two cases when this can happen:

  1. User enables PITR for the cluster
  2. User recovers from backup

In this release, we decided not to create the full backup automatically but to leave it to the user or the backup schedule. We might introduce a flag in the following releases to let users configure this behavior, but for now we decided that the current primitives are enough to automate full backup creation.
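
So before enabling PITR (or right after a restore), make sure a fresh full backup exists. A minimal sketch of requesting an on-demand full backup through a psmdb-backup Custom Resource (cluster and storage names here are hypothetical):

$ kubectl apply -f - <<EOF
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: manual-full-backup
spec:
  psmdbCluster: my-cluster-name   # hypothetical cluster name
  storageName: s3-us-west         # hypothetical storage from backup.storages
EOF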

10 Minutes RPO

Right now, PBM uploads oplogs to the S3 bucket every 10 minutes. This time span is hardcoded and not configurable for now. What this means for the user is that the Recovery Point Objective (RPO) can be as much as ten minutes.

This is going to be improved in the following releases of Percona Backup for MongoDB and is captured in the PBM-543 JIRA issue. Once it is there, the user will be able to control the period between oplog uploads with spec.backup.pitr.timeBetweenUploads in cr.yaml.
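
Purely as a sketch (the option is not released yet, so the exact shape and units are assumptions based on the field name above), the cr.yaml excerpt could look like:

   spec:
     backup:
       pitr:
         enabled: true
         timeBetweenUploads: 120   # illustrative value; units assumed to be seconds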

Which Backups do I Have?

So the user has Full backups and PITR enabled. PBM has a nice feature that shows all the backups and Oplog (PITR) time frames:

$ pbm list

Backup snapshots:
     2020-12-10T12:19:10Z [complete: 2020-12-10T12:23:50]
     2020-12-14T10:44:44Z [complete: 2020-12-14T10:49:03]
     2020-12-14T14:26:20Z [complete: 2020-12-14T14:34:39]
     2020-12-17T16:46:59Z [complete: 2020-12-17T16:51:07]
PITR <on>:
     2020-12-14T14:26:40 - 2020-12-16T17:27:26
     2020-12-17T16:47:20 - 2020-12-17T16:57:55

In the Operator, however, the user can see the full backup details but cannot yet see the oplog information without going into the backup container manually:

$ kubectl get psmdb-backup backup2 -o yaml
…
status:
  completed: "2021-05-05T19:27:36Z"
  destination: "2021-05-05T19:27:11Z"
  lastTransition: "2021-05-05T19:27:36Z"
  pbmName: "2021-05-05T19:27:11Z"
  s3:
    bucket: my-bucket
    credentialsSecret: s3-secret
    endpointUrl: https://storage.googleapis.com
    region: us-central-1

The obvious idea is to store this information in the psmdb-backup Custom Resource, but to do that we need to keep it updated. Updating hundreds of these objects all the time in a reconcile loop might put pressure on the Operator and even on the Kubernetes API. We are still assessing different options here.

Conclusion

Point-in-time recovery is an important feature for Percona Operator for MongoDB as it reduces both RTO and RPO. The feature was present in PBM for some time already and was battle-tested in multiple production deployments. With Operator we want to reduce the manual burden to a minimum and automate day-2 operations as much as possible. Here is a quick summary of what is coming in the following releases of the Operator related to PITR:

  • Reduce RPO even more with configurable Oplogs upload period (PBM-543, K8SPSMDB-388)
  • Take full backup automatically if PITR is enabled (K8SPSMDB-460)
  • Provide users the visibility into available Oplogs time frames (K8SPSMDB-461)

Our roadmap is available publicly here, and we would be curious to learn more about your ideas. If you are willing to contribute, a good starting point would be CONTRIBUTING.md in our GitHub repository. It has all the details about how to contribute code, submit new ideas, and report a bug. A good place to ask questions is our Community Forum, where anyone can freely share their thoughts and suggestions regarding Percona software.

Nov 10, 2020

Restore a Replica Set to a New Environment with Percona Backup for MongoDB

Percona Backup for MongoDB (PBM) is our open source tool for backing up MongoDB clusters. Initially, the tool was developed for restoring backups in the same environment where they were taken. In this post, I will show you how to restore a backup to a new environment instead.

Let’s assume you followed the instructions to install Percona Backup for MongoDB packages on your newly provisioned replica set, and you already have at least one full backup of the source stored in remote backup storage.

Create the Backup User

Note: I am using a 3-node replica set running on CentOS 7 for this example.

The first step is to create the backup role on the target cluster’s primary:

db.getSiblingDB("admin").createRole({ "role": "pbmAnyAction",
      "privileges": [
         { "resource": { "anyResource": true },
           "actions": [ "anyAction" ]
         }
      ],
      "roles": []
   });

Now, let’s also create the backup user and give it the proper permissions:

db.getSiblingDB("admin").createUser({user: "pbmuser",
       "pwd": "secretpwd",
       "roles" : [
          { "db" : "admin", "role" : "readWrite", "collection": "" },
          { "db" : "admin", "role" : "backup" },
          { "db" : "admin", "role" : "clusterMonitor" },
          { "db" : "admin", "role" : "restore" },
          { "db" : "admin", "role" : "pbmAnyAction" }
       ]
    });

Configure PBM Agent

The next step is configuring the credentials for the pbm-agent on each server. It is important to point each agent to its local node only (don’t use the replica set URI here):

tee /etc/sysconfig/pbm-agent <<EOF
PBM_MONGODB_URI="mongodb://pbmuser:secretpwd@localhost:27017"
EOF

Now we can start the agent on all nodes of the new cluster:

systemctl start pbm-agent
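
Before moving on, it is worth confirming the agent came up cleanly on every node; standard systemd tooling is enough (a quick sketch):

$ systemctl status pbm-agent --no-pager
$ journalctl -u pbm-agent -n 20 --no-pager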

We have to specify the location where backups are stored. This is saved inside MongoDB itself. The easiest way to load the configuration options the first time is to create a YAML file and upload it. For example, given the following file:

tee /etc/pbm-agent-storage.conf <<EOF
type: s3
s3:
   region: us-west-2
   bucket: pbm-test-bucket-78967
   credentials:
      access-key-id: "your-access-key-id-here"
      secret-access-key: "your-secret-key-here"
EOF

Use the pbm config --file command to save (or update) the admin.pbmConfig collection, which all pbm-agents will refer to.

$ pbm config --file=/etc/pbm-agent-storage.conf
[Config set]
------
pitr:
  enabled: false
storage:
  type: s3
  s3:
    region: us-west-2
    bucket: pbm-test-bucket-78967
    credentials:
      access-key-id: '***'
      secret-access-key: '***'

Backup list resync from the store has started

Sync the Backups and Perform the Restore

As you can see, PBM automatically starts scanning the remote destination for backup files. After a few moments, you should be able to list the existing backups:

$ pbm list --mongodb-uri mongodb://pbmuser:secretpwd@localhost:27017/?replicaSet=testRPL
Backup snapshots:
  2020-11-02T16:53:53Z
PITR <off>:
  2020-11-02T16:54:15 - 2020-11-05T11:43:26

Note: in the case of a sharded cluster, the above connection must be to the config server replica set.

You can also use the following command if you need to re-run the scan for any reason:

pbm config --force-resync

The last step is to fire off the restore:

$ pbm restore 2020-11-02T16:53:53Z --mongodb-uri mongodb://pbmuser:secretpwd@localhost:27017/?replicaSet=testRPL
...Restore of the snapshot from '2020-11-02T16:53:53Z' has started

We can check the progress by tailing the journal:

$ journalctl -u pbm-agent -f

Nov 05 13:00:31 mongo0 pbm-agent[10875]: 2020-11-05T13:00:31.000+0000 [INFO] got command restore [name: 2020-11-05T13:00:31.580485314Z, backup name: 2020-11-02T16:53:53Z] <ts: 1604581231>
Nov 05 13:00:31 mongo0 pbm-agent[10875]: 2020-11-05T13:00:31.000+0000 [INFO] restore/2020-11-02T16:53:53Z: restore started
Nov 05 13:00:34 mongo0 pbm-agent[10875]: 2020-11-05T13:00:34.918+0000        preparing collections to restore from
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.011+0000        reading metadata for admin.pbmRUsers from archive on stdin
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.051+0000        restoring admin.pbmRUsers from archive on stdin
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.517+0000        restoring indexes for collection admin.pbmRUsers from metadata
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.548+0000        finished restoring admin.pbmRUsers (3 documents, 0 failures)
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.548+0000        reading metadata for admin.pbmRRoles from archive on stdin
Nov 05 13:00:35 mongo0 pbm-agent[10875]: 2020-11-05T13:00:35.558+0000        restoring admin.pbmRRoles from archive on stdin
Nov 05 13:00:36 mongo0 pbm-agent[10875]: 2020-11-05T13:00:36.011+0000        restoring indexes for collection admin.pbmRRoles from metadata
Nov 05 13:00:36 mongo0 pbm-agent[10875]: 2020-11-05T13:00:36.031+0000        finished restoring admin.pbmRRoles (2 documents, 0 failures)
Nov 05 13:00:36 mongo0 pbm-agent[10875]: 2020-11-05T13:00:36.050+0000        reading metadata for admin.test from archive on stdin
Nov 05 13:00:36 mongo0 pbm-agent[10875]: 2020-11-05T13:00:36.061+0000        restoring admin.test from archive on stdin
Nov 05 13:01:09 mongo0 pbm-agent[10875]: 2020-11-05T13:01:09.775+0000        no indexes to restore
Nov 05 13:01:09 mongo0 pbm-agent[10875]: 2020-11-05T13:01:09.776+0000        finished restoring admin.test (1000000 documents, 0 failures)
Nov 05 13:01:09 mongo0 pbm-agent[10875]: 2020-11-05T13:01:09.901+0000        reading metadata for admin.pbmLockOp from archive on stdin
Nov 05 13:01:09 mongo0 pbm-agent[10875]: 2020-11-05T13:01:09.993+0000        restoring admin.pbmLockOp from archive on stdin
Nov 05 13:01:11 mongo0 pbm-agent[10875]: 2020-11-05T13:01:11.379+0000        restoring indexes for collection admin.pbmLockOp from metadata
Nov 05 13:01:11 mongo0 pbm-agent[10875]: 2020-11-05T13:01:11.647+0000        finished restoring admin.pbmLockOp (0 documents, 0 failures)
Nov 05 13:01:11 mongo0 pbm-agent[10875]: 2020-11-05T13:01:11.751+0000        reading metadata for test.test from archive on stdin
Nov 05 13:01:11 mongo0 pbm-agent[10875]: 2020-11-05T13:01:11.784+0000        restoring test.test from archive on stdin
Nov 05 13:01:27 mongo0 pbm-agent[10875]: 2020-11-05T13:01:27.772+0000        no indexes to restore
Nov 05 13:01:27 mongo0 pbm-agent[10875]: 2020-11-05T13:01:27.776+0000        finished restoring test.test (533686 documents, 0 failures)
Nov 05 13:01:27 mongo0 pbm-agent[10875]: 2020-11-05T13:01:27.000+0000 [INFO] restore/2020-11-02T16:53:53Z: mongorestore finished
Nov 05 13:01:30 mongo0 pbm-agent[10875]: 2020-11-05T13:01:30.000+0000 [INFO] restore/2020-11-02T16:53:53Z: starting oplog replay
Nov 05 13:01:30 mongo0 pbm-agent[10875]: 2020-11-05T13:01:30.000+0000 [INFO] restore/2020-11-02T16:53:53Z: oplog replay finished on {0 0}
Nov 05 13:01:30 mongo0 pbm-agent[10875]: 2020-11-05T13:01:30.000+0000 [INFO] restore/2020-11-02T16:53:53Z: restoring users and roles
Nov 05 13:01:31 mongo0 pbm-agent[10875]: 2020-11-05T13:01:31.000+0000 [INFO] restore/2020-11-02T16:53:53Z: restore finished successfully

Conclusion

Percona Backup for MongoDB is a must-have tool for sharded environments because of its multi-shard consistency. This article shows how PBM can be used for disaster recovery; everything is simple and automatic.

A caveat here is that unless you want to go into the rabbit hole of manual metadata renaming, you should keep the same replica set names on both the source and target clusters.

If you would like to follow the development, report a bug, or have ideas for feature requests, make sure to check out the PBM project in the Percona issue tracker.

Oct 19, 2020

5 Things Developers Should Know Before Deploying MongoDB

MongoDB is one of the most popular databases and is one of the easiest NoSQL databases to set up. Oftentimes, developers want a quick environment to test out an idea they have for an application, or to figure out a good data model for their data without waiting for their operations team to spin up the infrastructure. What can sometimes happen is that these quick, one-off instances grow, and before you know it that little test DB is your production DB supporting your new app. For anyone who finds themselves in this situation, I encourage you to check out our Percona blogs, as we have lots of great information for those both new and experienced with MongoDB. Don’t let the ease of installing MongoDB fool you into a false sense of security; there are things you need to consider as a developer before deploying MongoDB. Here are five things developers should know before deploying MongoDB in production.

1) Enable Authentication and Authorization

Security is of the utmost importance for your database. While gone are the days when security was disabled by default for MongoDB, it’s still easy to start MongoDB without security. Without security, and with your database bound to a public IP, anyone can connect to your database and steal your data. By simply adding some important security options to your configuration file, you can ensure that your data is protected. You can also configure MongoDB to use native LDAP or Kerberos for authentication. Setting up authentication and authorization is one of the simplest ways to ensure that your MongoDB database is secure. The most important configuration option is turning on authorization, which enables users and roles and requires you to authenticate and have the proper roles to access your data.

security:
  authorization: enabled
  keyFile: /path/to/our.keyfile
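
With authorization enabled, the first administrative user can be created through the localhost exception; a minimal sketch (user name and password are placeholders):

mongo admin --eval 'db.createUser({
   user: "admin",        // placeholder
   pwd: "change-me",     // placeholder
   roles: [ { role: "root", db: "admin" } ]
})'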

2) Connect to a Replica Set/Multiple Mongos, Not Individual Nodes

MongoDB’s drivers all support connecting directly to a standalone node, a replica set, or a mongos for sharded clusters. Sometimes your database starts off with one specific node that is always your primary, and it’s easy to set your connection string to connect only to that node. But what happens when that node goes down? If you don’t have a highly available connection string in your application configuration, you’re missing out on a key advantage of MongoDB replica sets: connecting to the primary no matter which node holds that role. All of MongoDB’s supported language drivers support the MongoDB URI connection string format and implement the failover logic. Here are some example connection strings for PyMongo, MongoDB’s Python driver: a standalone connection, a replica set connection, and an SRV record connection. If you have the ability to set up SRV DNS records, they let you standardize your connection string on one address without worrying about changes to the underlying infrastructure.

Standalone Connection String:

client = MongoClient('mongodb://hostabc.example.com:27017/?authSource=admin')

Replica Set Connection String:

client = MongoClient('mongodb://hostabc.example.com:27017,hostdef.example.com:27017,hostxyz.example.com:27017/?replicaSet=foo&authSource=admin')

SRV Connection String:

client = MongoClient('mongodb+srv://host.example.com/')

Post-script for clusters: if you’re just starting out, you’re usually not setting up a sharded cluster. But if it is a cluster, then instead of using a replica set connection you will connect to mongos nodes. To get automatic failover in the event of a mongos node being restarted (or otherwise being down), start mongos on multiple hosts and put them, comma-concatenated, in your connection string’s host list. As with replica sets, you can use SRV records for these too.

3) Sharding Can Help Performance But Isn’t Always a Silver Bullet

Sharding is how MongoDB handles the partitioning of data. This practice is used to distribute load across more replica sets for a variety of reasons, such as write performance, low-latency geographical writes, and archiving data to shards using slower and cheaper storage. These sharding approaches are helpful in keeping your working set in memory because they lower the amount of data each shard has to deal with.

As previously mentioned, sharding can also be used to reduce latency by separating your shards by geographic region; a common example is having a US-based shard, an EU-based shard, and a shard in Asia, where the data is kept local to its origin. Although it is not the only application for shard zoning, “geo-sharding” like this is a common one. This approach can also help applications comply with the various data regulations that are becoming more important and more strict throughout the world.

While sharding can oftentimes help write performance, that sometimes comes at the detriment of read performance. An easy example of poor read performance would be a query to find all of the orders regardless of their origin. This find query would need to be sent to the US shard, the EU shard, and the shard in Asia, with all the network latency that comes with reading from the non-local regions, and then the mongos query router would need to sort all the returned records before returning them to the client. This kind of give and take should help you determine your approach to choosing a shard key and weigh its impact on your typical query patterns.

4) Replication ≠ Backups

MongoDB replication, while powerful and easy to set up, is not a substitute for a good backup strategy. Some might think that their replica set members in a DR data center will be sufficient to keep them up in a data loss scenario. While a replica set member in a DR center will surely help you in a DR situation, it will not help you if you accidentally drop a database or a collection in production, as that delete will quickly be replicated to your secondary in your DR data center.

Another common misconception is that delayed replica set members keep you safe. Delayed members still rely on you finding the issue you want to restore from before it gets applied to your delayed member. Are your processes so rock-solid that you can guarantee you’ll find the issue before it reaches your delayed member?

Backups are just as important with MongoDB as they are with any other database. There are tools like mongodump, mongoexport, Percona Backup for MongoDB, and Ops Manager (Enterprise Edition only) that support point-in-time recovery, oplog backups, hot backups, and full and incremental backups. Backups can be run from any node in your replica set; the best practice is to run your backup from a secondary node so you don’t put unnecessary pressure on your primary. In addition to the above methods, you can also take snapshots of your data; this is possible as long as you pause writes to the node that you’re snapshotting by freezing the file system, to ensure a consistent snapshot of your MongoDB database.
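
For the snapshot case, a minimal sketch of pausing and resuming writes around a snapshot using db.fsyncLock()/db.fsyncUnlock() (the snapshot command itself depends on your storage layer):

$ mongo --eval "db.fsyncLock()"      # flush pending writes and block new ones
$ # ... take the filesystem or block storage snapshot here ...
$ mongo --eval "db.fsyncUnlock()"    # resume writes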

5) Schemaless is a Myth, Schemas Still Matter

MongoDB was originally touted as a schemaless database, which was attractive to developers who had long struggled to update and maintain their schemas in relational databases. But those schemas succeeded for good reasons in the early days of databases, and while MongoDB gave you the flexibility to skip schema design and create it on the fly, this often led to poor-performing schema designs and anti-patterns. There are lots of stories out in the wild of users not enforcing any structured schema on their MongoDB data models and running into various performance problems as their schemas became unwieldy. Today, MongoDB supports JSON schema and schema validation. These approaches let you apply as much or as little structure to your schemas as is needed, so you still have the flexibility of MongoDB’s looser schema structure while enforcing schema rules that will keep your application performing well and your data model consistent.

Another aspect affected by poor schema design in MongoDB is the aggregation framework. The aggregation framework lets you run more analytical query patterns, such as sorting and grouping, along with useful things such as unwinding arrays, support for joins, and a whole lot more. Without a good schema, these sorts of queries can really suffer poor performance.

MongoDB was also popular for its lack of support for joins. Joins can be expensive, and avoiding them allowed MongoDB to run quite fast. Though MongoDB has since added $lookup to support left outer joins, embedded documents are a typical workaround. This approach comes with its pros and cons. As with relational databases, embedding documents essentially creates a one-to-N relationship; this is covered in greater detail in this blog. In MongoDB, the value of N matters: if it’s one-to-few (2-10) or one-to-many (10-1000), this can still be a good schema design as long as your indexes support your queries. When you get to one-to-tons (10,000+), you need to consider things like MongoDB’s 16 MB per-document limit or using references to the parent document.

Examples of each of these approaches:

One-to-Few, consider having multiple phone numbers for a user:

{  "_id" : ObjectId("1234567890"),
  "name" :  "John Doe",
  "phone" : [     
     { "type" : "mobile", "number" : "+1-585-555-5555" }, 
     { "type" : "work", "number" : "+1-585-555-1111"}  
            ]
}

One-to-Many, consider a parts list for a product with multiple items:

{ "_id" : ObjectId("123"),
 "Item" : "Widget",
 "Price" : 100
}

{  "_id" : ObjectId("0123456789"), 
   "manufacturer" : "Percona",
   "catalog_number" : 123456,
   "parts" : [    
      { "item": ObjectId("123") },
      { "item": ObjectId("456") },
      { "item": ObjectId("789") },
       ...  
              ] 
}

One-to-Tons, consider a social network type application:

{  "_id" : ObjectId("123"),
   "username" : "Jane Doe" 
}

{  "_id" : ObjectId("456"),
   "username" : "Eve DBA"
 }

{  "_id" : ObjectId("9876543210"),
   "username" : "Percona",
   "followers" : [     
                    ObjectID("123"),
                    ObjectID("456"),
                    ObjectID("789"),
                    ...  
                 ]
}

Bonus Topic: Transactions

MongoDB has supported multi-document transactions since version 4.0 (replica sets) and version 4.2 (sharded clusters). Transactions in MongoDB work quite similarly to how they work in relational databases: either all actions in the transaction succeed or they all fail. Here’s an example of a transaction in MongoDB:

rs1:PRIMARY> session = db.getMongo().startSession()
rs1:PRIMARY> session.startTransaction()
rs1:PRIMARY> session.getDatabase("percona").test.insert({today : new Date()})
WriteResult({ "nInserted" : 1 })
rs1:PRIMARY> session.getDatabase("percona").test.insert({some_value : "abc"})
WriteResult({ "nInserted" : 1 }) 
rs1:PRIMARY> session.commitTransaction()

Transactions can be quite powerful if they are truly needed for your application, but do realize the performance implications as all queries in a transaction will wait to finish until the whole transaction succeeds or fails.

Takeaways:

While MongoDB is easy to get started with and has a low barrier to entry, just like any other database there are some key things that you, as a developer, should consider before deploying it. We’ve covered enabling authentication and authorization to ensure you have a secure application and don’t leak data. We’ve highlighted using highly available connection strings – whether to your replica set, a mongos node list, or via SRV records – to ensure you’re always connecting to the appropriate nodes. We’ve discussed the balancing act of selecting a shard key: considering the impact on both reads and writes and understanding the tradeoffs you are making. The importance of backups, and of not relying on replication as a backup method, was also covered. Finally, we covered the fact that schemas still matter with MongoDB, but you have flexibility in defining how rigid they are. We hope this gives you a better idea of the things to consider when deploying MongoDB for your next application. Thanks for reading!

Sep 18, 2020

MongoDB Backup Best Practices

In this blog, we will discuss different backup strategies for MongoDB and their use cases, along with the pros and cons of each.

Why Take Backups?

Regular database backups are a crucial safeguard against unintended data loss events. Whether you lose your data to mechanical failure, a natural disaster, or criminal malice, it is gone. However, the data doesn’t need to be lost for good. You can back it up.

Generally, there are two types of backups used with database technologies like MongoDB:

  • Logical Backups
  • Physical Backups

Additionally, we have the option of incremental backups (as part of logical backups), where we capture the deltas, or incremental data changes, made between full backups, to minimize data loss in case of disaster. We will discuss these two backup options, how to proceed with them, and which one suits better depending on the requirements and environment setup.

Logical Backups

These are backups where data is dumped from the databases into backup files. A logical backup with MongoDB means dumping the data into BSON-formatted files.

During a logical backup using a client API, the data is read from the server and returned to that API, which serializes it and writes it into “.bson”, “.json”, or “.csv” backup files on disk, depending on the backup utility used.

MongoDB offers the following utility for taking logical backups:

Mongodump: Takes a dump/backup of the databases in “.bson” format, which can later be restored by replaying the same logical statements captured in the dump files back to the databases.

mongodump --host=mongodb1.example.net --port=27017 --username=user --authenticationDatabase=admin --db=demo --collection=events --out=/opt/backup/mongodump-2011-10-24

Note: If we don’t specify the database name or collection name explicitly in the “mongodump” syntax above, the backup will include all databases or all collections, respectively. If “authorization” is enabled, we must specify the “authenticationDatabase”.

Also, you should use “--oplog” to capture the incremental data written while the backup is still running. Keep in mind that it won’t work with --db and --collection, since it only works for full-instance backups.

mongodump --host=mongodb1.example.net --port=27017 --username=user --authenticationDatabase=admin --oplog --out=/opt/backup/mongodump-2011-10-24

Pros:

  1. It can take the backup at a more granular level, like a specific database or collection, which is helpful during restoration.
  2. It does not require you to halt writes on the node where the backup runs, so the node remains available for other operations.

Cons:

  1. As it reads all the data, it can be slow and will require disk reads for databases that are larger than the RAM available for the WiredTiger (WT) cache. The WT cache pressure increases, which slows down performance.
  2. It doesn’t capture the index data in the metadata backup file, so when restoring, all the indexes have to be built again after the collection data is reinserted. This is done in one pass through the collection after the inserts have finished, so it can add a lot of time to big collection restores.
  3. The speed of the backup also depends on the allocated IOPS and the type of storage, since lots of reads/writes happen during this process.

Note: It is always advisable to run backups on secondary servers to avoid unnecessary performance degradation on the primary node.

As we have different types of environment setups, we should approach each of them as below:

  1. Replica set: Always prefer to run backups on secondaries.
  2. Sharded clusters: Take a backup of the config server replica set and of each shard individually, using their secondary nodes.

Since we are discussing a distributed database system like a sharded cluster, we should also keep in mind the point-in-time consistency of our backups (replica set backups using mongodump are generally consistent when using “--oplog”).

Consider the scenario where the application is still writing data and cannot be stopped for business reasons. Even if we take backups of the config server and each shard separately, the backups will finish at different times because of data volume, load, etc. Hence, restoring them might introduce inconsistencies for the same reason.

For that, Percona Backup for MongoDB is very useful (it uses the mongodump libraries internally), since it tails the oplog on each shard separately while the backup is running, until completion. More references can be found in the release notes.

Now comes the restoration part when dealing with logical backups. As with backups, MongoDB provides the following utility for restoration:

Mongorestore: Restores dump files created by “mongodump”. Index recreation takes place once the data is restored, which consumes additional memory and time.

mongorestore --host=mongodb1.example.net --port=27017 --username=user  --password --authenticationDatabase=admin --db=demo --collection=events /opt/backup/mongodump-2011-10-24/events.bson

For restoring the incremental dump, we can add --oplogReplay to the above syntax to replay the oplog entries as well.

Note: “--oplogReplay” can’t be used with the --db and --collection flags, as it only works when restoring all the databases.
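
For example, a full-instance restore with oplog replay, assuming the “--oplog” dump taken earlier, might look like:

mongorestore --host=mongodb1.example.net --port=27017 --username=user --password --authenticationDatabase=admin --oplogReplay /opt/backup/mongodump-2011-10-24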

Physical/Filesystem Backups

This involves snapshotting or copying the underlying MongoDB data files (--dbPath) at a point in time, and letting the database cleanly recover using the state captured in the snapshotted files. Physical backups are instrumental for backing up large databases quickly, especially when used with filesystem snapshots, such as LVM snapshots, or block storage volume snapshots.

There are several methods of taking filesystem-level backups, also known as physical backups, as below (an LVM sketch follows the list):

  1. Manually copying the entire data files (e.g., using rsync; depends on network bandwidth)
  2. LVM-based snapshots
  3. Cloud-based disk snapshots (AWS/GCP/Azure or any other cloud provider)
  4. Percona Server for MongoDB hot backup
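
As an illustration of option 2, a hedged LVM sketch (the volume group, logical volume, and target paths are placeholders; pause writes first, as noted in the Cons list below):

$ lvcreate --snapshot --size 10G --name mongo-snap /dev/vg0/mongodata   # placeholder VG/LV names
$ mkdir -p /mnt/mongo-snap && mount /dev/vg0/mongo-snap /mnt/mongo-snap
$ rsync -a /mnt/mongo-snap/ backupserver:/backups/mongo-$(date +%F)/    # copy the files off at leisure
$ umount /mnt/mongo-snap && lvremove -y /dev/vg0/mongo-snap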

We’ll discuss all of these options, but first let’s see their pros and cons compared with logical backups.

Pros:

  1. They are at least as fast as, and usually faster than, logical backups.
  2. They can easily be copied over or shared with remote servers or attached NAS.
  3. Recommended for large datasets because of speed and reliability.
  4. Convenient when building new nodes within the same cluster or a new cluster.

Cons:

  1. Restores at a more granular level, such as a specific database or collection, are impossible.
  2. Incremental backups cannot be achieved yet.
  3. A dedicated node (possibly a hidden one) is recommended for backups, since writes must be halted or “mongod” shut down cleanly prior to the snapshot to achieve consistency.

Below is a comparison of backup times for the same dataset:

DB size: 267.6 GB

Index size: <1 MB (since it was only on _id, for testing)

demo:PRIMARY> db.runCommand({dbStats: 1, scale: 1024*1024*1024})
{
        "db" : "test",
        "collections" : 1,
        "views" : 0,
        "objects" : 137029,
        "avgObjSize" : 2097192,
        "dataSize" : 267.6398703530431,
        "storageSize" : 13.073314666748047,
        "numExtents" : 0,
        "indexes" : 1,
        "indexSize" : 0.0011749267578125,
        "scaleFactor" : 1073741824,
        "fsUsedSize" : 16.939781188964844,
        "fsTotalSize" : 49.98826217651367,
        "ok" : 1,
        ...
}
demo:PRIMARY>

1. Hot Backup

Syntax:

> use admin
switched to db admin
> db.runCommand({createBackup: 1, backupDir: "/my/backup/data/path"})
{ "ok" : 1 }

Note: The backup path “backupDir” must be absolute. Hot backup also supports storing backups on the filesystem and in AWS S3 buckets.

[root@ip-172-31-37-92 tmp]# time mongo  < hot.js
Percona Server for MongoDB shell version v4.2.8-8
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("c9860482-7bae-4aae-b0e7-5d61f8547559") }
Percona Server for MongoDB server version: v4.2.8-8
switched to db admin
{
        "ok" : 1,
        ...
}
bye

real    3m51.773s
user    0m0.067s
sys     0m0.026s
[root@ip-172-31-37-92 tmp]# ls
hot  hot.js  mongodb-27017.sock  nohup.out  systemd-private-b8f44077314a49899d0a31f99b31ed7a-chronyd.service-Qh7dpD  tmux-0
[root@ip-172-31-37-92 tmp]# du -sch hot
15G     hot
15G     total

Notice that the time taken by the Percona hot backup was only about 4 minutes. It is also very helpful when rebuilding a node or spinning up new instances/clusters with the same dataset. The best part is that it doesn’t lock writes or cause any performance hit. However, it is still recommended to run it against secondaries.

2. Filesystem Snapshot

The approximate time taken for the snapshot to complete was also only about 4 minutes.
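
For reference, the snapshot itself was presumably requested with the standard EC2 call; a sketch using the volume ID from the output below:

$ aws ec2 create-snapshot --volume-id vol-0def857c44080a556 --description "This is my snapshot backup"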

[root@ip-172-31-37-92 ~]# aws ec2 describe-snapshots  --query "sort_by(Snapshots, &StartTime)[-1].{SnapshotId:SnapshotId,StartTime:StartTime}"
{
    "SnapshotId": "snap-0f4403bc0fa0f2e9c",
    "StartTime": "2020-08-26T12:26:32.783Z"
}

[root@ip-172-31-37-92 ~]# aws ec2 describe-snapshots \
> --snapshot-ids snap-0f4403bc0fa0f2e9c
{
    "Snapshots": [
        {
            "Description": "This is my snapshot backup",
            "Encrypted": false,
            "OwnerId": "021086068589",
            "Progress": "100%",
            "SnapshotId": "snap-0f4403bc0fa0f2e9c",
            "StartTime": "2020-08-26T12:26:32.783Z",
            "State": "completed",
            "VolumeId": "vol-0def857c44080a556",
            "VolumeSize": 50
        }
    ]
}

3. Mongodump

[root@ip-172-31-37-92 ~]# time nohup mongodump -d test -c collG -o /mongodump/ &
[1] 44298

[root@ip-172-31-37-92 ~]# sed -n '1p;$p' nohup.out
2020-08-26T12:36:20.842+0000    writing test.collG to /mongodump/test/collG.bson
2020-08-26T12:51:08.832+0000    [####....................]  test.collG  27353/137029  (20.0%)

Note: Just to give an idea, we can clearly see that for the same dataset where the snapshot and hot backup took only 3-5 minutes, “mongodump” took almost 15 minutes for just 20% of the dump. The speed of backing up the data is therefore much slower compared with the other two options. And on top of that, we would be left with only one option to restore the backup, “mongorestore”, which makes the whole process even slower.

Conclusion

So, which backup method is the best? It completely depends on factors like the type of infrastructure, environment, dataset size, load, etc. But generally, if the dataset is around 100 GB or less, logical backups are the best option, along with scheduled incremental backups, depending on your RTO (Recovery Time Objective) and RPO (Recovery Point Objective) needs. For larger datasets, we should always go for physical backups, including incremental backups (oplogs) as well.

Interested in trying Percona Backup for MongoDB? Download it for free! 
