Percona-Lab/mongodb_consistent_backup: 1.0 Release Explained


In this blog post, I will cover the Percona-Lab/mongodb_consistent_backup tool and the improvements in the 1.0.1 release of the tool.


mongodb_consistent_backup is a tool for performing cluster-consistent backups of MongoDB sharded clusters or single replica sets. This tool is open-source Python code, developed by Percona and published under our Percona-Lab GitHub repository. Percona-Lab is a place for code projects that Percona maintains and supports on a best-effort basis only.

By considering the entire MongoDB cluster’s shards and individual shard members, mongodb_consistent_backup can back up a cluster with one or many shards to a single point in time. Single-point-in-time consistency of cluster backups is critical to data integrity for any “sharded” database technology, and it is a topic often overlooked in database deployments.

This topic is explained in detail by David Murphy in this Percona blog post:

1.0 Release

mongodb_consistent_backup began as a single-replica-set backup script used internally at Percona, and it grew into a large multi-threaded, concurrent Python project. It was released to the public (under Percona-Lab) with some rough edges.

This release focuses on improving the efficiency and reliability of the existing components, addressing many of the pain points in extending, deploying and troubleshooting the tool, and adding some small features.

New Features: Config File Overhaul

The tool now uses a structured, nested YAML config file instead of the messy config format used in 0.x.

You can see a full example of this new format at this URL:

Here’s an example of a very basic config file that uses three replica-set config servers as “seed hosts” (a new feature in 1.0!), username/password authentication and the optional Nagios NSCA notification method:

  production:
    host: csReplSet/config01:27019,config02:27019,config03:27019
    username: mongodb_consistent_password
    password: "correct horse battery staple"
    authdb: admin
    log_dir: /var/log/mongodb_consistent_backup
    backup:
      method: mongodump
      name: production-eu
      location: /var/lib/mongodb_consistent_backup
    archive:
      method: tar
    notify:
      method: nsca
      nsca:
        check_host: mongodb-production-eu
        check_name: "mongodb_consistent_backup"
    upload:
      method: none
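The nested sections line up with the dotted option names the tool accepts on the command line, such as `backup.name` or `archive.method`. As a minimal sketch (not the tool's own code), here is the example config written as the Python dict a YAML loader such as PyYAML's `yaml.safe_load()` would roughly return, assuming the indented keys sit under `production`, `backup`, `archive`, `notify` (with an `nsca` sub-section) and `upload` sections, flattened into dotted keys:

```python
from typing import Any, Dict

# The example config as a nested dict (assumed section names: production,
# backup, archive, notify/nsca, upload).
CONFIG: Dict[str, Any] = {
    "production": {
        "host": "csReplSet/config01:27019,config02:27019,config03:27019",
        "username": "mongodb_consistent_password",
        "password": "correct horse battery staple",
        "authdb": "admin",
        "log_dir": "/var/log/mongodb_consistent_backup",
        "backup": {
            "method": "mongodump",
            "name": "production-eu",
            "location": "/var/lib/mongodb_consistent_backup",
        },
        "archive": {"method": "tar"},
        "notify": {
            "method": "nsca",
            "nsca": {
                "check_host": "mongodb-production-eu",
                "check_name": "mongodb_consistent_backup",
            },
        },
        "upload": {"method": "none"},
    }
}


def flatten(section: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse nested sections into dotted keys, e.g. archive/method -> 'archive.method'."""
    flat: Dict[str, Any] = {}
    for key, value in section.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


if __name__ == "__main__":
    # Pick one environment (the top-level key), then print its settings as dotted keys.
    for key, value in sorted(flatten(CONFIG["production"]).items()):
        print(f"{key} = {value}")
```

The flattened names (`backup.name`, `archive.method`, `notify.nsca.check_host`, ...) mirror the dotted style of the tool's command-line options.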

New Features: Logging

Overall, much more is logged in this release, in both “regular” and “verbose” modes. A highlight of this release is live logging of mongodump’s output, which was missing from the 0.x versions of the tool.

Now you can see the progress of each shard/replica set backup in a cluster! Below, the config server replica set csReplSet dumps many collections and completes its backup. Afterwards, the replica sets “test1” and “test2” are shown dumping “wikipedia.pages”.

[2017-05-05 20:11:05,366] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	done dumping config.settings (1 document)
[2017-05-05 20:11:05,367] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	writing config.version to
[2017-05-05 20:11:05,372] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	done dumping config.version (1 document)
[2017-05-05 20:11:05,373] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	writing config.locks to
[2017-05-05 20:11:05,377] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	done dumping config.locks (3 documents)
[2017-05-05 20:11:05,378] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	writing config.databases to
[2017-05-05 20:11:05,381] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	done dumping config.databases (1 document)
[2017-05-05 20:11:05,383] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	writing config.tags to
[2017-05-05 20:11:05,385] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	done dumping config.tags (0 documents)
[2017-05-05 20:11:05,387] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	writing config.changelog to
[2017-05-05 20:11:05,399] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	done dumping config.changelog (112 documents)
[2017-05-05 20:11:05,401] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/	writing captured oplog to
[2017-05-05 20:11:05,578] [INFO] [MongodumpThread-7] [MongodumpThread:run:133] Backup csReplSet/ completed in 0.71 seconds, 0 oplog changes
[2017-05-05 20:11:08,042] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/	[........................]  wikipedia.pages  636/35080  (1.8%)
[2017-05-05 20:11:08,071] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/	[........................]  wikipedia.pages  878/35118  (2.5%)
[2017-05-05 20:11:11,041] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/	[#.......................]  wikipedia.pages  1853/35080  (5.3%)
[2017-05-05 20:11:11,068] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/	[#.......................]  wikipedia.pages  2063/35118  (5.9%)
[2017-05-05 20:11:14,043] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/	[##......................]  wikipedia.pages  2983/35080  (8.5%)
[2017-05-05 20:11:14,075] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/	[##......................]  wikipedia.pages  3357/35118  (9.6%)
[2017-05-05 20:11:17,040] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/	[##......................]  wikipedia.pages  4253/35080  (12.1%)
[2017-05-05 20:11:17,070] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/	[###.....................]  wikipedia.pages  4561/35118  (13.0%)
[2017-05-05 20:11:20,038] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/	[###.....................]  wikipedia.pages  5180/35080  (14.8%)
[2017-05-05 20:11:20,067] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/	[###.....................]  wikipedia.pages  5824/35118  (16.6%)
[2017-05-05 20:11:23,050] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/	[####....................]  wikipedia.pages  6216/35080  (17.7%)
[2017-05-05 20:11:23,072] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/	[####....................]  wikipedia.pages  6964/35118  (19.8%)
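If you want to feed this progress into your own monitoring, the per-replica-set progress lines above are regular enough to parse. A minimal sketch, assuming the line layout stays exactly as shown (it is mongodump's own progress output relayed by the tool, so it may differ between versions); `DumpProgress` and `parse_progress` are hypothetical names:

```python
import re
from typing import NamedTuple, Optional


class DumpProgress(NamedTuple):
    replset: str
    namespace: str
    done: int
    total: int
    percent: float


# Matches e.g. "test1/\t[####....................]  wikipedia.pages  6216/35080  (17.7%)"
PROGRESS_RE = re.compile(
    r"(?P<replset>\S+)/\s+\[[#.]+\]\s+(?P<ns>\S+)\s+"
    r"(?P<done>\d+)/(?P<total>\d+)\s+\((?P<pct>[\d.]+)%\)"
)


def parse_progress(line: str) -> Optional[DumpProgress]:
    """Return the progress fields of a log line, or None for non-progress lines."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return DumpProgress(m["replset"], m["ns"], int(m["done"]), int(m["total"]), float(m["pct"]))
```

Lines such as "done dumping config.settings" return `None`, so the function can be applied to the whole log stream.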

Also, while backup data is being gathered, the status of each oplog tailing thread is now logged every 30 seconds (by default):

[2017-05-05 20:12:09,648] [INFO] [TailThread-2] [TailThread:status:60] Oplog tailer test1/ status: 256 oplog changes, ts: Timestamp(1494020048, 6)
[2017-05-05 20:12:11,033] [INFO] [TailThread-3] [TailThread:status:60] Oplog tailer test2/ status: 1588 oplog changes, ts: Timestamp(1494020049, 50)
[2017-05-05 20:12:22,804] [INFO] [TailThread-4] [TailThread:status:60] Oplog tailer csReplSet/ status: 43 oplog changes, ts: Timestamp(1494020062, 1)
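The pattern behind this periodic status output can be sketched with a thread that counts oplog entries and reports on a timer driven by `threading.Event.wait()`. This is a minimal sketch, not the tool's TailThread: the class and method names are hypothetical, and the real tailer reads the oplog instead of having `record_change()` called for it:

```python
import threading
from typing import List, Optional, Tuple


class OplogTailerStatusSketch(threading.Thread):
    """Hypothetical stand-in for an oplog tailing thread that logs its status on a timer."""

    def __init__(self, replset: str, status_secs: float = 30.0) -> None:
        super().__init__(daemon=True)
        self.replset = replset
        self.status_secs = status_secs
        self.status_lines: List[str] = []
        self._changes = 0
        self._last_ts: Optional[Tuple[int, int]] = None
        self._lock = threading.Lock()  # counters are shared with the code reading the oplog
        self._stop_event = threading.Event()

    def record_change(self, ts: Tuple[int, int]) -> None:
        """Called once per oplog entry read; ts is (seconds, increment)."""
        with self._lock:
            self._changes += 1
            self._last_ts = ts

    def status(self) -> str:
        with self._lock:
            ts = f"Timestamp{self._last_ts}" if self._last_ts else "none"
            line = f"Oplog tailer {self.replset}/ status: {self._changes} oplog changes, ts: {ts}"
            self.status_lines.append(line)
        return line

    def run(self) -> None:
        # Event.wait() returns False on timeout and True once stop() sets the event,
        # so this emits one status line per interval until the tailer is stopped.
        while not self._stop_event.wait(self.status_secs):
            print(self.status())

    def stop(self) -> None:
        self._stop_event.set()
        self.join()
```

Waiting on an `Event` rather than calling `time.sleep()` lets `stop()` end the loop immediately instead of waiting out the remainder of a 30-second interval.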

You can now write log files to disk by setting the ‘log_dir’ config variable or the ‘--log-dir’ command-line flag. One log file per backup is written to this directory, with a symlink pointing to the latest log file. The previous backup’s log file is automatically compressed with gzip.
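That layout (one file per run, a “latest” symlink, gzip of the previous file) can be reproduced with only the Python standard library. A minimal sketch; the symlink name `backup.log` and the file-naming scheme are assumptions, not the tool's actual ones:

```python
import gzip
import os
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

LATEST_LINK = "backup.log"  # hypothetical name for the "latest log" symlink


def start_backup_log(log_dir: Path, backup_name: str, now: Optional[datetime] = None) -> Path:
    """Create this run's log file, gzip the previous one, and repoint the symlink."""
    log_dir.mkdir(parents=True, exist_ok=True)
    link = log_dir / LATEST_LINK
    if link.is_symlink():
        previous = log_dir / os.readlink(link)
        # Compress the previous backup's log (if still uncompressed), then drop the original.
        if previous.exists() and previous.suffix != ".gz":
            with open(previous, "rb") as src, gzip.open(f"{previous}.gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            previous.unlink()
        link.unlink()
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    new_log = log_dir / f"{backup_name}.{stamp}.log"
    new_log.touch()
    # A relative target keeps the link valid if the whole directory is moved.
    link.symlink_to(new_log.name)
    return new_log
```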

New Features: ZBackup

ZBackup is an open-source de-duplication, compression and (optional) encryption tool for archive-like data such as backups. Files fed into ZBackup are organized at the block level into pieces called “bundles”. When more files are fed into ZBackup, it can reuse existing bundles when it notices the same blocks being backed up again. This approach provides significant disk-space savings, which matters when keeping many database backups. To add to the savings, all data in ZBackup is compressed using LZMA compression, which generally compresses better than gzip/deflate or zip. ZBackup also supports optional AES-128 encryption at rest, enabled by providing ZBackup with a key file that it uses to encrypt and decrypt the data.
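The core idea of block-level de-duplication fits in a few lines of Python. This is a simplified illustration, not how ZBackup is implemented: it uses fixed-size blocks and SHA-256 digests, whereas ZBackup uses a rolling hash so it can find duplicate data at any offset. Each unique block is stored once, LZMA-compressed with the standard-library `lzma` module; `DedupStoreSketch` is a hypothetical name:

```python
import hashlib
import lzma
from typing import Dict, List

BLOCK_SIZE = 4096  # fixed-size blocks keep the sketch simple


class DedupStoreSketch:
    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}        # digest -> LZMA-compressed block
        self.backups: Dict[str, List[str]] = {}   # backup name -> ordered block digests

    def add_backup(self, name: str, data: bytes) -> int:
        """Store a backup and return how many previously unseen blocks it added."""
        digests: List[str] = []
        new_blocks = 0
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = lzma.compress(block)
                new_blocks += 1
            digests.append(digest)
        self.backups[name] = digests
        return new_blocks

    def restore(self, name: str) -> bytes:
        """Reassemble a backup from its (decompressed) blocks."""
        return b"".join(lzma.decompress(self.blocks[d]) for d in self.backups[name])
```

A second backup of a slightly grown dataset only stores the blocks that changed, which is the effect that keeps repeated database backups small.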

mongodb_consistent_backup 1.0.0 now supports ZBackup as a new archiving method!

Below is an example of ZBackup used on a small database (about 1GB) that is constantly growing.

This graph compares the disk space added by seven backups taken 10 minutes apart using two methods. The first method is mongodb_consistent_backup with mongodump’s built-in gzip compression (available via the --gzip flag since MongoDB 3.2) enabled. mongodb_consistent_backup enables mongodump gzip by default (if it’s available), so this is a good “baseline”. The second method is mongodb_consistent_backup with mongodump gzip compression disabled and ZBackup used as the mongodb_consistent_backup archiving method, a post-backup stage in our tool. Notice that each backup in the graph after the first adds only 14-18 MB of disk usage, meaning ZBackup was able to recognize similarities in the data.

To try out ZBackup as an archive method, use one of these methods:

  1. Set the field “method” under the “archive” section of your mongodb_consistent_backup config file to “zbackup”, for example:
         archive:
           method: zbackup
  2. Or, add the command-line flag “--archive.method=zbackup” to your command line.
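The tool's help output lists this option as `--archive.method {tar,zbackup,none}`, i.e. a dotted path into the nested config. To illustrate that correspondence, here is a small, hypothetical helper that applies such flags onto a nested config dict; it is only a sketch of the mapping (the tool parses its own options, and this sketch keeps every value as a string):

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], flags: List[str]) -> Dict[str, Any]:
    """Apply '--section.key=value' flags onto a nested config dict, in place."""
    for flag in flags:
        if not flag.startswith("--") or "=" not in flag:
            raise ValueError(f"expected --dotted.key=value, got {flag!r}")
        dotted, value = flag[2:].split("=", 1)
        *parents, leaf = dotted.split(".")
        node = config
        for part in parents:
            # Create intermediate sections that the config file did not define.
            node = node.setdefault(part, {})
        node[leaf] = value
    return config
```

For example, `apply_overrides(cfg, ["--archive.method=zbackup"])` turns an `archive: method: tar` config into one that archives with ZBackup, leaving the other keys untouched.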

This archive method causes mongodb_consistent_backup to create a subdirectory in the backup location named “mongodb-consistent-backup_zbackup” and import completed backups into ZBackup after the backup stage. This directory contains the ZBackup storage files that it needs to operate, and they should not be modified!

Of course, there are trade-offs. ZBackup adds some system resource usage and time to the overall backup AND restore process: both importing data into ZBackup and exporting it back out take additional time.

By default, ZBackup’s restore uses a very small amount of RAM for caching, so increasing the cache with the “--cache-size” flag may improve restore performance. ZBackup is multi-threaded, so more CPUs can also improve the performance of backups and restores.

New Features: Docker Container

We now offer a Dockerfile for building mongodb_consistent_backup with all dependencies into a Docker container! The goal for the image is to be as “thin” as possible, and so the build merely downloads a prebuilt binary of the tool and installs dependencies. See:

Some interesting use cases for a Docker-based deployment of the tool come to mind:

  • Running MongoDB backups using ephemeral containers on Apache Mesos or Kubernetes (with persistent volumes or remote upload)
  • Restricting system resources used by mongodb_consistent_backup via Docker’s cgroup-based isolation features
  • Simplified deployment or isolated dependencies (e.g., Python, mongodump, etc.)

Up-to-date images of mongodb_consistent_backup are available at this Docker Hub URL:

This image includes mongodb_consistent_backup, gzip-capable mongodump binaries and the latest stable ZBackup binaries.

To run the latest Dockerhub image:

$ docker run -i timvaillancourt/mongodb_consistent_backup:latest <mongodb_consistent_backup-flags here>

To just list the “help” page (all available options):

$ docker run -i timvaillancourt/mongodb_consistent_backup:latest --help
usage: mongodb-consistent-backup [-h] [-c CONFIGPATH]
                                 [-e {production,staging,development}] [-V]
                                 [-v] [-H HOST] [-P PORT] [-u USER]
                                 [-p PASSWORD] [-a AUTHDB] [-n BACKUP.NAME]
                                 [-l BACKUP.LOCATION] [-m {mongodump}]
                                 [-L LOG_DIR] [--lock-file LOCK_FILE]
                                 [--sharding.balancer.wait_secs SHARDING.BALANCER.WAIT_SECS]
                                 [--sharding.balancer.ping_secs SHARDING.BALANCER.PING_SECS]
                                 [--archive.method {tar,zbackup,none}]
                                 [--archive.tar.compression {gzip,none}]
                                 [--archive.tar.threads ARCHIVE.TAR.THREADS]
                                 [--archive.zbackup.binary ARCHIVE.ZBACKUP.BINARY]
                                 [--archive.zbackup.cache_mb ARCHIVE.ZBACKUP.CACHE_MB]
                                 [--archive.zbackup.compression {lzma}]

An example script for running the container with persistent Docker volumes is available here:
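As a rough illustration of the same idea (not the example script referenced above), here is a hypothetical Python helper that builds a `docker run` command with a bind-mounted host directory, so backups written to `-l BACKUP.LOCATION` inside the container survive after it exits. The host and container paths and the `mongos01` host name are made up:

```python
import shlex
from typing import List

IMAGE = "timvaillancourt/mongodb_consistent_backup:latest"


def docker_backup_command(host_dir: str, container_dir: str, tool_flags: List[str]) -> List[str]:
    """Build a `docker run` argv that keeps backups on a host directory (bind mount)."""
    return [
        "docker", "run", "-i", "--rm",        # --rm: discard the container after the run
        "-v", f"{host_dir}:{container_dir}",  # backups written here outlive the container
        IMAGE,
        "-l", container_dir,                  # -l BACKUP.LOCATION, from the tool's --help
        *tool_flags,
    ]


if __name__ == "__main__":
    cmd = docker_backup_command("/data/mongodb_backups", "/backups",
                                ["-H", "mongos01", "-P", "27017", "-n", "production-eu"])
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```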

New Features: Multiple Seed Hosts + Config Servers

mongodb_consistent_backup 1.0 introduces the ability to define a list of multiple “seed” hosts, removing a potential single point of failure in your backups! If a host in the list is unavailable, it is skipped.

Multiple hosts should be specified in this replica-set URL format, with hosts separated by commas:


Or you can specify a comma-separated list without the replica set name for non-replica-set nodes (e.g., mongos routers or non-replica-set config servers):


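The two host-list formats and the skip-if-unavailable behaviour can be illustrated with a short parser. This is a hypothetical sketch, not the tool's code: it uses the config-server string from the example config earlier in this post, `is_up` stands in for whatever connectivity check is used, and the mongos host names are made up (in practice, a driver such as PyMongo handles replica-set seed lists itself):

```python
from typing import Callable, List, NamedTuple, Optional


class SeedList(NamedTuple):
    replset: Optional[str]  # None for a plain host list (e.g. mongos routers)
    hosts: List[str]        # "host:port" entries, in the order given


def parse_seed_hosts(spec: str) -> SeedList:
    """Parse 'replset/host:port,host:port' or a bare 'host:port,host:port' list."""
    replset: Optional[str] = None
    if "/" in spec:
        replset, spec = spec.split("/", 1)
    hosts = [h.strip() for h in spec.split(",") if h.strip()]
    if not hosts:
        raise ValueError("no seed hosts given")
    return SeedList(replset or None, hosts)


def first_reachable(hosts: List[str], is_up: Callable[[str], bool]) -> str:
    """Return the first host that answers, skipping unavailable ones."""
    for host in hosts:
        if is_up(host):
            return host
    raise RuntimeError(f"none of the seed hosts are reachable: {', '.join(hosts)}")
```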
Version 1.0 also adds the ability to use the cluster config servers as seed hosts. Before version 1.0, a cluster backup had to use a single mongos router as its seed host to find all shards and cluster members. Because mongos routers can come and go as you scale, this design was brittle.

With this new functionality, mongodb_consistent_backup can map out the cluster using the config servers, which are usually three fairly static hosts in an infrastructure. This makes deploying and operating the tool simpler and more reliable.
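Config servers hold the cluster map in the `config` database: each document in `config.shards` records a shard's `_id` and a `host` string in the same `replset/host:port,...` form used for seed hosts. A minimal sketch of turning those documents into per-shard seed lists; the documents and host names below are made up, and this is not the tool's own discovery code:

```python
from typing import Dict, List, Optional, Tuple


def shards_from_config(shard_docs: List[dict]) -> Dict[str, Tuple[Optional[str], List[str]]]:
    """Map each shard _id to (replica set name, member hosts) using config.shards documents."""
    shards: Dict[str, Tuple[Optional[str], List[str]]] = {}
    for doc in shard_docs:
        name, sep, rest = doc["host"].partition("/")
        # "rs/h1:port,h2:port" for replica-set shards; a bare "host:port" otherwise.
        replset, members = (name, rest) if sep else (None, name)
        shards[doc["_id"]] = (replset, members.split(","))
    return shards


# With PyMongo connected to a config server, the documents could come from
# list(client["config"]["shards"].find()); here they are hard-coded examples.
EXAMPLE_SHARDS = [
    {"_id": "test1", "host": "test1/shard1a:27018,shard1b:27018,shard1c:27018"},
    {"_id": "test2", "host": "test2/shard2a:27018,shard2b:27018,shard2c:27018"},
]

if __name__ == "__main__":
    for shard, (replset, hosts) in shards_from_config(EXAMPLE_SHARDS).items():
        print(shard, replset, hosts)
```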

Overall Improvements

As mentioned, a focus of this release was improving the existing code. Version 1.0 completes a major refactoring of the project’s code structure, moving the major “phases” or “stages” of the tool into their own Python submodules (e.g., “Backup” and “Archive”), which then auto-load their various “methods”, such as “mongodump” or “ZBackup”.

The code was broken into these high-level stages:

  1. Backup. The stage that gathers the backup of data. During this stage, Oplog tailing and resolving also occur if the backup is for a cluster. More backup methods are coming soon!
  2. Archive. The stage that archives and optionally compresses the backup data. The new ZBackup method also adds de-duplication and encryption ability to this stage.
  3. Upload. The stage that uploads the resulting data to remote storage. Currently only AWS S3 is supported, with Google Cloud Storage and Rsync support in progress.
  4. Notify. The stage that notifies external systems of the success/failure of the backup. Currently, our tool only supports Nagios NSCA, with plans for PagerDuty and others to be added.
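The stage/method split can be pictured as a registry of methods per stage, run in order, with the Notify stage running whether or not the earlier stages succeeded (so failures get reported too). A minimal sketch under those assumptions; the decorator, registry and placeholder methods are hypothetical and only record what they would do:

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
# Hypothetical registry: stage -> method name -> function that updates the shared state.
METHODS: Dict[str, Dict[str, Callable[[State], None]]] = {}


def method(stage: str, name: str) -> Callable:
    """Register a function as the `name` method of `stage`."""
    def register(fn: Callable[[State], None]) -> Callable[[State], None]:
        METHODS.setdefault(stage, {})[name] = fn
        return fn
    return register


@method("backup", "mongodump")
def backup_mongodump(state: State) -> None:
    state["steps"].append("backup:mongodump")  # a real method would run mongodump here


@method("archive", "tar")
def archive_tar(state: State) -> None:
    state["steps"].append("archive:tar")  # a real method would build tarballs here


for _stage in ("archive", "upload", "notify"):
    # "none" is a valid choice for the optional stages: do nothing but record it.
    method(_stage, "none")(lambda state, s=_stage: state["steps"].append(f"{s}:none"))


def run_pipeline(config: Dict[str, str]) -> State:
    state: State = {"steps": [], "success": False, "error": None}
    try:
        for stage in ("backup", "archive", "upload"):
            name = config.get(f"{stage}.method", "none")
            if name not in METHODS.get(stage, {}):
                raise ValueError(f"unknown {stage} method: {name}")
            METHODS[stage][name](state)
        state["success"] = True
    except Exception as exc:  # a failed stage is still reported by Notify below
        state["error"] = str(exc)
    METHODS["notify"][config.get("notify.method", "none")](state)
    return state
```

Adding a new archive or upload method in this model only means registering one more function, without touching the pipeline loop.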

Some interesting code enhancements include:

  • Reuse of database connections. This reduces the number of connections on seed hosts.
  • Replication heartbeat time (“operational lag”). This is now considered in replica set lag calculations.
  • Added thread safety for oplog tailing threads. This resolves some issues on extremely-overloaded hosts.
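One way to read “operational lag”: what a node knows about another member's optime is only as fresh as the last heartbeat that carried it, so the heartbeat's age is added to the optime difference before secondaries are compared. A hypothetical sketch of such scoring, using the `optimeDate`, `lastHeartbeat`, `stateStr` and `health` fields of `replSetGetStatus` member entries; the threshold, tie-breaking and example data are assumptions, not the tool's actual rules:

```python
from datetime import datetime, timedelta
from typing import List, Optional


def operational_lag_secs(primary: dict, member: dict, now: datetime) -> float:
    """Optime lag behind the primary plus the age of the member's last heartbeat."""
    optime_lag = (primary["optimeDate"] - member["optimeDate"]).total_seconds()
    heartbeat_age = (now - member["lastHeartbeat"]).total_seconds()
    return max(optime_lag, 0.0) + max(heartbeat_age, 0.0)


def pick_secondary(members: List[dict], now: datetime, max_lag_secs: float = 10.0) -> Optional[dict]:
    """Choose the healthy SECONDARY with the lowest operational lag, or None."""
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        return None
    scored = [
        (operational_lag_secs(primary, m, now), m["name"], m)
        for m in members
        if m["stateStr"] == "SECONDARY" and m.get("health") == 1
    ]
    eligible = [s for s in scored if s[0] <= max_lag_secs]
    if not eligible:
        return None
    return min(eligible, key=lambda s: (s[0], s[1]))[2]


NOW = datetime(2017, 5, 5, 20, 12, 0)
EXAMPLE_MEMBERS = [  # made-up replSetGetStatus-style member entries
    {"name": "a:27017", "stateStr": "PRIMARY", "health": 1, "optimeDate": NOW},
    # b is 1s behind, but its last heartbeat was 8s ago.
    {"name": "b:27017", "stateStr": "SECONDARY", "health": 1,
     "optimeDate": NOW - timedelta(seconds=1), "lastHeartbeat": NOW - timedelta(seconds=8)},
    # c is 3s behind, with a heartbeat 1s ago.
    {"name": "c:27017", "stateStr": "SECONDARY", "health": 1,
     "optimeDate": NOW - timedelta(seconds=3), "lastHeartbeat": NOW - timedelta(seconds=1)},
]
```

Comparing optimes alone would favour b (1 s vs 3 s behind); including heartbeat age favours c (4 s vs 9 s of operational lag).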

Another focus was efficiency and preventing race conditions. The tool should be much less susceptible to error as a result, although if you see any problems we’d like to hear about them on our GitHub “Issues” page.

Lastly, we encourage the open source community to contribute additional functionality to this tool via our GitHub!

Release Notes:

  • 1.0.0
    • Move to dynamic code “Submodules” and subclassing of repeated components
    • Restructuring of YAML config to nested config
    • Safe start/stopping of oplog tailer threads, additional checking on all thread states
    • File-based logging with gzip of old log
    • Oplog tailer ‘oplogReplay’ performance optimization
    • Fixes to oplog durability to-disk
    • Live mongodump output to stdout in realtime
    • Oplog tailer status logging
    • ZBackup archive method: supporting deduplication, compression and optional AES encryption
    • Support for a list of discovery/seed hosts
    • Support configdb servers as cluster seed hosts
    • Fewer (reused) database connections
    • Database connections to use strong write concern
    • Consider replication operational lag in secondary scoring
    • Backup metadata is written for future functionality and troubleshooting
    • mongodb_consistent_backup.Errors custom exceptions for proper exception handling
    • Python PyPI support added
    • Dockerfile support for running under containers
    • Additional log messages
    • Support for MongoDB 3.4 datatypes
    • Significant reworking of existing code for efficiency, reliability and readability

More about our releases can be seen here:
