Apr 18, 2018
--

Restore a MongoDB Logical Backup


In this article, we will explain how to restore a MongoDB logical backup performed via ‘mongodump’ to a mongod instance.

Restoring a MongoDB logical backup requires the ‘mongorestore’ tool. This article focuses on that tool and process.

Note: Percona develops a backup tool named Percona-Lab/mongodb-consistent-backup, which is a wrapper for ‘mongodump‘, adding cluster-wide backup consistency. The backups created by mongodb_consistent_backup (in Dump/Mongodump mode) can be restored using the exact same steps as a regular ‘mongodump’ backup – no special steps!

Mongorestore Command Flags

--host/--port (and --user/--password)

Required, even if you’re using the default host/port (localhost:27017). If authorization is enabled, add the --user/--password flags as well.

--drop

This is almost always required. It causes ‘mongorestore’ to drop the collection that is being restored before restoring it. Without this flag, the documents from the backup are inserted one at a time, and if they already exist the restore fails.

--oplogReplay

This is almost always required. Replays the oplog that was dumped by mongodump. It is best to include this flag on replset-based backups unless there is a specific reason not to. You can tell if the backup was from a replset by looking for the file ‘oplog.bson‘ at the base of the dump directory.

--dir

Required. The path to the mongodump data.

--gzip

Optional. For mongodump >= 3.2, enables inline decompression on the restore. This is required if ‘mongodump’ used the --gzip flag (look for *.bson.gz files if you’re not sure; if the collection files have no .gz suffix, don’t use --gzip).
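A quick way to make that check programmatic, sketched here in Python against a throwaway directory that stands in for a real dump (the file names are made up):

```python
# Sketch: detect whether a dump was taken with --gzip by looking for
# *.bson.gz collection files. A temporary directory stands in for a
# real mongodump output directory.
import tempfile
from pathlib import Path

dump = Path(tempfile.mkdtemp())
(dump / "wikipedia").mkdir()
(dump / "wikipedia" / "pages.bson.gz").touch()  # hypothetical collection file

# Any *.bson.gz file anywhere under the dump means --gzip is needed.
needs_gzip = any(dump.rglob("*.bson.gz"))
print(needs_gzip)
```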

--numParallelCollections=&lt;number&gt;

Optional. For mongodump >= 3.2 only, sets the number of collections to insert in parallel. By default four threads are used; if you have a large server and want to restore faster (at the cost of more resource usage), you can increase this number. Note that each thread decompresses bson if the --gzip flag is used, so consider this when raising this number.

Steps

  1. (Optional) If the backup is archived (mongodb_consistent_backup defaults to creating tar archives), un-archive the backup so that ‘mongorestore‘ can access the .bson/.bson.gz files:
    $ tar -C /opt/mongodb/backup/testbackup/20160809_1306 -xvf /opt/mongodb/backup/testbackup/20160809_1306/test1.tar
    test1/
    test1/dump/
    test1/dump/wikipedia/
    test1/dump/wikipedia/pages.metadata.json.gz
    test1/dump/wikipedia/pages.bson.gz
    test1/dump/oplog.bson

    ** This command un-tars the backup to ‘/opt/mongodb/backup/testbackup/20160809_1306/test1/dump’ **

  2. Check (and then check again!) that you’re restoring the right backup to the right host. When in doubt, it is safer to ask the customer or others.
    1. The Percona ‘mongodb_consistent_backup‘ tool names backup subdirectories by replica set name, so you can ensure you’re restoring the right backup by checking the replica set name of the node you’re restoring to, if it exists.
    2. If you’re restoring to a replica set you will need to restore to the PRIMARY member and there needs to be a majority (so writes are accepted – some exceptions if you override write-concern, but not advised).
  3. Use ‘mongorestore’ to restore the data, dropping/restoring each collection (--drop flag) and replaying the oplog changes (--oplogReplay flag), specifying the restore directory explicitly (--dir flag). In this example I also used authorization (--user/--password flags) and decompression (--gzip flag):
    $ mongorestore --drop --host localhost --port 27017 --user secret --password secret --oplogReplay --gzip --dir /opt/mongodb/backup/testbackup/20160809_1306/test1/dump
    2016-08-09T14:23:04.057+0200    building a list of dbs and collections to restore from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump dir
    2016-08-09T14:23:04.065+0200    reading metadata for wikipedia.pages from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump/wikipedia/pages.metadata.json.gz
    2016-08-09T14:23:04.067+0200    restoring wikipedia.pages from /opt/mongodb/backup/testbackup/20160809_1306/test1/dump/wikipedia/pages.bson.gz
    2016-08-09T14:23:07.058+0200    [#######.................]  wikipedia.pages  63.9 MB/199.0 MB  (32.1%)
    2016-08-09T14:23:10.058+0200    [###############.........]  wikipedia.pages  127.7 MB/199.0 MB  (64.1%)
    2016-08-09T14:23:13.060+0200    [###################.....]  wikipedia.pages  160.4 MB/199.0 MB  (80.6%)
    2016-08-09T14:23:16.059+0200    [#######################.]  wikipedia.pages  191.5 MB/199.0 MB  (96.2%)
    2016-08-09T14:23:19.071+0200    [########################]  wikipedia.pages  223.5 MB/199.0 MB  (112.3%)
    2016-08-09T14:23:22.062+0200    [########################]  wikipedia.pages  255.6 MB/199.0 MB  (128.4%)
    2016-08-09T14:23:25.067+0200    [########################]  wikipedia.pages  271.4 MB/199.0 MB  (136.4%)
    ...
    ...
    2016-08-09T14:24:19.058+0200    [########################]  wikipedia.pages  526.9 MB/199.0 MB  (264.7%)
    2016-08-09T14:24:22.058+0200    [########################]  wikipedia.pages  558.9 MB/199.0 MB  (280.8%)
    2016-08-09T14:24:23.521+0200    [########################]  wikipedia.pages  560.6 MB/199.0 MB  (281.6%)
    2016-08-09T14:24:23.522+0200    restoring indexes for collection wikipedia.pages from metadata
    2016-08-09T14:24:23.528+0200    finished restoring wikipedia.pages (32725 documents)
    2016-08-09T14:24:23.528+0200    replaying oplog
    2016-08-09T14:24:23.597+0200    done
    1. If you encounter problems with ‘mongorestore‘, carefully read the error message or rerun with several ‘-v‘ flags, e.g.: ‘-vvv‘. Once you have an error, attempt to troubleshoot the cause.
  4. Check to see that you saw “replaying oplog” and “done” after the restore (last two lines in the example). If you don’t see this, there is a problem.
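The decision points in the steps above can be sketched as a small helper that assembles, but does not run, the mongorestore command line. The host, port, and dump layout below are placeholders, not taken from a real system:

```python
# Sketch: build the mongorestore argument list from the dump layout.
# A temporary directory stands in for .../test1/dump; the command is
# printed rather than executed.
import tempfile
from pathlib import Path

dump = Path(tempfile.mkdtemp())
(dump / "oplog.bson").touch()            # marker of a replset-based dump
(dump / "wikipedia").mkdir()
(dump / "wikipedia" / "pages.bson.gz").touch()

cmd = ["mongorestore", "--drop", "--host", "localhost", "--port", "27017"]
if (dump / "oplog.bson").is_file():      # replset dump: replay the oplog
    cmd.append("--oplogReplay")
if any(dump.rglob("*.bson.gz")):         # compressed dump: decompress inline
    cmd.append("--gzip")
cmd += ["--dir", str(dump)]
print(" ".join(cmd))
```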

As you can see, restoring a MongoDB logical backup with this tool is very simple. However, note that for sharded clusters --oplogReplay is not available, and mongodump reads from the primary of each shard. As this is typically not advised in production, you might consider Percona-Lab/mongodb-consistent-backup, which provides cluster-wide consistency and can back up from secondary nodes, just as mongodump can with replica sets.

If MongoDB and topics like this interest you, please see the document below, we are hiring!

{
  hiring: true,
  role: "Consultant",
  tech: "MongoDB",
  location: "USA",
  moreInfo: "https://www.percona.com/about-percona/careers/mongodb-consultant-usa-based"
}

The post Restore a MongoDB Logical Backup appeared first on Percona Database Performance Blog.

Apr 18, 2018
--

Webinar Thursday, April 19, 2018: Running MongoDB in Production, Part 1


Please join Percona’s Senior Technical Operations Architect, Tim Vaillancourt, as he presents Running MongoDB in Production, Part 1 on Thursday, April 19, 2018, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4).

Are you a seasoned MySQL DBA that needs to add MongoDB to your skills? Are you used to managing a small environment that runs well, but want to know what you might not know yet? This webinar helps you with running MongoDB in production environments.

MongoDB works well, but when it has issues, the number one question is “where should I go to solve a problem?”

This tutorial will cover:

Backups
– Logical vs Binary-level backups
– Sharding and Replica-Set Backup strategies
Security
– Filesystem and Network Security
– Operational Security
– External Authentication features of Percona Server for MongoDB
– Securing connections with SSL and MongoDB Authorization
– Encryption at Rest
– New Security features in 3.6
Monitoring
– Monitoring Strategy
– Important metrics to monitor in MongoDB and Linux
– Percona Monitoring and Management

Register for the webinar now.

Part 2 of this series will take place on Thursday, April 26, 2018, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4). Register for the second part of this series here.

Timothy Vaillancourt, Senior Technical Operations Architect

Tim joined Percona in 2016 as Sr. Technical Operations Architect for MongoDB, with the goal to make the operations of MongoDB as smooth as possible. With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned. Tim is based in Amsterdam, NL and enjoys traveling, coding and music.

Prior to Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS. Before moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.


Apr 13, 2018
--

MongoDB Replica Set Tag Sets


In this blog post, we will look at MongoDB replica set tag sets, which enable you to use customized write concern and read preferences for replica set members.

This blog post will cover most of the questions that come to mind before using tag sets in a production environment.

  • What scenarios are these helpful for?
  • Do these tag sets work with all read preferences modes?
  • What if we’re already using maxStalenessSeconds along with the read preferences, can we still use a tag set?
  • How can one configure tag sets in a replica set?
  • Do these tags work identically for custom read preferences and write concerns?

Now let’s answer all these questions one by one.

What scenarios are these helpful for?

You can use tags:

  • If replica set members have different configurations and queries need to be redirected to the specific secondaries as per their purpose. For example, production queries can be redirected to the higher configuration member for faster execution and queries used for internal reporting purpose can be redirected to the low configurations secondaries. This will help improve per node resource utilization.
  • When you use custom read preferences, but the reads are routed to a secondary that resides in another data center to make reads more optimized and cost-effective. You can use tag sets to make sure that specific reads are routed to the specific secondary node within the DC.
  • If you want to use custom write concerns with the tag set for acknowledging writes are propagated to the secondary nodes per the requirements.

Do these tag sets work with all read preferences modes?

Yes, these tag sets work with all read preference modes except “primary”. The “primary” read preference mode doesn’t allow you to attach any tag sets while querying.

replicaTest:PRIMARY> db.tagTest.find().readPref('primary', [{"specs" : "low","purpose" : "general"}])
Error: error: {
	"ok" : 0,
	"errmsg" : "Only empty tags are allowed with primary read preference",
	"code" : 2,
	"codeName" : "BadValue"
}

What if we’re already using maxStalenessSeconds along with the read preferences, can tag set still be used?

Yes, you can use tag sets with a maxStalenessSeconds value. In that case, priority is given to staleness first, then tags, to get the most recent data from the secondary member.
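For illustration only, here is roughly what the combined read preference looks like as the plain document a driver attaches to a query; the values below are made-up examples:

```python
# Hypothetical combined read preference, shown as the plain document a
# driver would send ($readPreference): staleness filtering is applied
# first, then tag matching, when choosing an eligible secondary.
read_pref = {
    "mode": "secondary",
    "maxStalenessSeconds": 120,  # made-up value: skip members lagging > 120s
    "tags": [{"specs": "low", "purpose": "general"}],
}
print(read_pref["mode"], read_pref["maxStalenessSeconds"])
```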

How can one configure tag sets in a replica set?

You can configure tags by adding a parameter in the replica set configuration. Consider this test case with a five members replica set:

"members" : [
		{
			"_id" : 0,
			"name" : "host1:27017",
			"stateStr" : "PRIMARY",
		},
		{
			"_id" : 1,
			"name" : "host2:27017",
			"stateStr" : "SECONDARY",
		},
		{
			"_id" : 2,
			"name" : "host3:27017",
			"stateStr" : "SECONDARY",
		},
		{
			"_id" : 3,
			"name" : "host4:27017",
			"stateStr" : "SECONDARY",
		},
		{
			"_id" : 4,
			"name" : "host5:27017",
			"stateStr" : "SECONDARY",
         }
		]

For our test case, the tag “specs” describes a member’s hardware specification, and “purpose” describes the query workload the application sends to it, so that queries can be routed to specific members in an optimized manner.

You must associate tags to each member by adding it to the replica set configuration:

cfg=rs.conf()
cfg.members[0].tags={"specs":"high","purpose":"analytics"}
cfg.members[1].tags={"specs":"high"}
cfg.members[2].tags={"specs":"low","purpose":"general"}
cfg.members[3].tags={"specs":"high","purpose":"analytics"}
cfg.members[4].tags={"specs":"low"}
rs.reconfig(cfg)

After adding tags, you can validate these changes by checking replica set configurations like:

rs.conf()
	"members" : [
		{
			"_id" : 0,
			"host" : "host1:27017",
			"tags" : {
				"specs" : "high",
				"purpose" : "analytics"
			},
		},
		{
			"_id" : 1,
			"host" : "host2:27017",
			"tags" : {
				"specs" : "high"
			},
		},
		{
			"_id" : 2,
			"host" : "host3:27017",
			"tags" : {
				"specs" : "low",
				"purpose" : "general"
			},
		},
		{
			"_id" : 3,
			"host" : "host4:27017",
			"tags" : {
				"specs" : "high",
				"purpose" : "analytics"
			},
		},
		{
			"_id" : 4,
			"host" : "host5:27017",
			"tags" : {
				"specs" : "low"
			},
		}
	]

Now, we are done with the tag-set configuration.

Do these tags work identically for custom read preferences and write concerns?

No, custom read preferences and write concerns consider tag sets in different ways.

Read preferences route read operations to the required members by matching the tag values assigned to them, but write concerns use tag values only to count members holding distinct values; they do not consider tag values when selecting specific replica set members.
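To make the read-preference side concrete, member selection by tags can be sketched as a simple superset check; this illustrates the matching rule and is not driver code (member tags taken from the configuration above):

```python
# Sketch of read-preference tag matching: a member qualifies if its tags
# contain every key/value pair of the requested tag set.
members = {
    "host1:27017": {"specs": "high", "purpose": "analytics"},
    "host3:27017": {"specs": "low", "purpose": "general"},
    "host5:27017": {"specs": "low"},
}
tag_set = {"specs": "low", "purpose": "general"}

matches = [host for host, tags in members.items()
           if all(tags.get(k) == v for k, v in tag_set.items())]
print(matches)  # only host3 carries both required tags
```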

Let us see how to use tag sets with write concerns. As per our test case, we have two unique tag values (i.e., “analytics” and “general”) defined as:

cfg=rs.conf()
cfg.settings={ getLastErrorModes: {writeNode:{"purpose": 2}}}
rs.reconfig(cfg)

You can validate these changes by checking the replica set configuration:

rs.conf()
	"settings" : {
			"getLastErrorModes" : {
			"writeNode" : {
				"purpose" : 2
			}
		},
	}

Now let’s try to insert a sample document in the collection named “tagTest” with this write concern:

db.tagTest.insert({name:"tom",tech:"nosql",status:"active"},{writeConcern:{w:"writeNode"}})
WriteResult({ "nInserted" : 1 })

Here, the write concern “writeNode” means the client gets a write acknowledgment from two nodes with unique tag set values. If the value set in the configuration exceeds the count of unique values, then it leads to an error at the time of the write:

cfg.settings={ getLastErrorModes: {writeNode:{"purpose": 4}}}
rs.reconfig(cfg)
db.tagTest.insert({name:"tom",tech:"nosql",status:"active"},{writeConcern:{w:"writeNode"}})
WriteResult({
	"nInserted" : 1,
	"writeConcernError" : {
		"code" : 100,
		"codeName" : "CannotSatisfyWriteConcern",
		"errmsg" : "Not enough nodes match write concern mode "writeNode""
	}
}
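The arithmetic behind that error can be sketched as follows: the write concern is satisfiable only if the replica set carries at least as many distinct values for the tag as the number configured (tags taken from our test configuration):

```python
# Sketch: getLastErrorModes {writeNode: {"purpose": N}} needs acknowledgment
# from N members carrying distinct "purpose" values, so it is satisfiable
# only when N <= the number of distinct values present in the replica set.
member_tags = [
    {"specs": "high", "purpose": "analytics"},
    {"specs": "high"},
    {"specs": "low", "purpose": "general"},
    {"specs": "high", "purpose": "analytics"},
    {"specs": "low"},
]
distinct_purpose = {t["purpose"] for t in member_tags if "purpose" in t}
print(len(distinct_purpose))  # 2 distinct values: w of 2 works, w of 4 errors
```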

You can perform read and write operations with tag sets like this:

db.tagTest.find({name:"tom"}).readPref("secondary",[{"specs":"low","purpose":"general"}])
db.tagTest.insert({name:"john",tech:"rdbms",status:"active"},{writeConcern:{w:"writeNode"}})

I hope this helps you to understand how to configure MongoDB replica set tag sets, how the read preferences and write concerns handle them, and where you can use them.


Apr 13, 2018
--

This Week in Data with Colin Charles 35: Percona Live 18 final countdown and a roundup of recent news


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live is just over a week away — there’s an awesome keynote lineup, and you really should register. Also don’t forget to save the date as Percona Live goes to Frankfurt, Germany November 5-7 2018! Prost!

In acquisitions, we have seen MariaDB acquire MammothDB and Idera acquire Webyog.

Some interesting Amazon notes: Amazon Aurora Continues its Torrid Growth, More than Doubling the Number of Active Customers in the Last Year (not sure I’d describe it as torrid but this is great for MySQL and PostgreSQL), comes with a handful of customer mentions. In addition, there have already been 65,000 database migrations on AWS. For context, in late November 2017, it was 40,000 database migrations.

Releases

Link List

Upcoming appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 


Apr 12, 2018
--

Percona Server for MongoDB 3.4.14-2.12 Is Now Available


Percona announces the release of Percona Server for MongoDB 3.4.14-2.12 on April 12, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.4 Community Edition. It supports MongoDB 3.4 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features. It requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.14 and does not include any additional changes.

The Percona Server for MongoDB 3.4.14-2.12 release notes are available in the official documentation.


Apr 11, 2018
--

Calling All Polyglots: Percona Live 2018 Keynote Schedule Now Available!


We’ve posted the Percona Live 2018 keynote addresses for the seventh annual Percona Live Open Source Database Conference 2018, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, CA.

This year’s keynotes explore topics ranging from how cloud and open source database adoption accelerates business growth, to leading-edge emerging technologies, to the importance of MySQL 8.0, to the growing popularity of PostgreSQL.

We’re excited by the great lineup of speakers, including our friends at Alibaba Cloud, Grafana, Microsoft, Oracle, Upwork and VividCortex, the innovative leaders on the Cool Technologies panel, and Brendan Gregg from Netflix, who will discuss how to get the most out of your database on a Linux OS, using his experiences at Netflix to highlight examples.  

With the theme of “Championing Open Source Databases,” the conference will feature multiple tracks, including MySQL, MongoDB, Cloud, PostgreSQL, Containers and Automation, Monitoring and Ops, and Database Security. Once again, Percona will be offering a low-cost database 101 track for beginning users who want to learn how to use and operate open source databases.

The Percona Live 2018 keynotes include:

Tuesday, April 24, 2018

  • Open Source for the Modern Business – Peter Zaitsev of Percona will discuss how, as open source database adoption continues to grow in enterprise organizations, the expectations and definitions of what constitutes success continue to change. A single technology for everything is no longer an option; welcome to the polyglot world. The talk will include several compelling open source projects and trends of interest to the open source database community, and will be followed by a round of lightning talks taking a closer look at some of those projects.
  • Cool Technologies Showcase – Four industry leaders will introduce key emerging industry developments. Andy Pavlo of Carnegie Mellon University will discuss the requirements for enabling autonomous database optimizations. Nikolay Samokhvalov of PostgreSQL.org will discuss new PostgreSQL tools. Sugu Sougoumarane of PlanetScale Data will explore how Vitess became a high-performance, scalable and available MySQL clustering cloud solution in line with today’s NewSQL storage systems. Shuhao Wu of Shopify explains how to use Ghostferry as a data migration tool for incompatible cloud platforms.
  • State of the Dolphin 8.0 – Tomas Ulin of Oracle will discuss the focus, strategy, investments and innovations that are evolving MySQL to power next-generation web, mobile, cloud and embedded applications – and why MySQL 8.0 is the most significant MySQL release in its history.
  • Linux Performance 2018 – Brendan Gregg of Netflix will summarize recent performance features to help users get the most out of their Linux systems, whether they are databases or application servers. Topics include the KPTI patches for Meltdown, eBPF for performance observability, Kyber for disk I/O scheduling, BBR for TCP congestion control, and more.

Wednesday, April 25, 2018

  • Panel Discussion: Database Evolution in the Cloud – An expert panel of industry leaders, including Lixun Peng of Alibaba, Sunil Kamath of Microsoft, and Baron Schwartz of VividCortex, will discuss the rapid changes occurring with databases deployed in the cloud and what that means for the future of databases, management and monitoring and the role of the DBA and developer.
  • Future Perfect: The New Shape of the Data Tier – Baron Schwartz of VividCortex will discuss the impact of macro trends such as cloud computing, microservices, containerization, and serverless applications. He will explore where these trends are headed, touching on topics such as whether we are about to see basic administrative tasks become more automated, the role of open source and free software, and whether databases as we know them today are headed for extinction.
  • MongoDB at Upwork – Scott Simpson of Upwork, the largest freelancing website for connecting clients and freelancers, will discuss how MongoDB is used at Upwork, how the company chose the database, and how Percona helps make the company successful.

We will also present the Percona Live 2018 Community Awards and Lightning Talks on Monday, April 23, 2018, during the Opening Night Reception. Don’t miss the first day of tutorials and Opening Night Reception!

Register for the conference on the Percona Live Open Source Database Conference 2018 website.

Sponsorships

Limited Sponsorship opportunities for Percona Live 2018 Open Source Database Conference are still available, and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

  • Diamond Sponsors – Percona, VividCortex
  • Platinum – Alibaba Cloud, Microsoft
  • Gold Sponsors – Facebook, Grafana
  • Bronze Sponsors – Altinity, BlazingDB, Box, Dynimize, ObjectRocket, Pingcap, Shannon Systems, SolarWinds, TimescaleDB, TwinDB, Yelp
  • Contributing Sponsors – cPanel, Github, Google Cloud, NaviCat
  • Media Sponsors – Database Trends & Applications, Datanami, EnterpriseTech, HPCWire, ODBMS.org, Packt


Apr 06, 2018
--

Free, Fast MongoDB Hot Backup with Percona Server for MongoDB


In this blog post, we will discuss the MongoDB Hot Backup feature in Percona Server for MongoDB and how it can help you get a safe backup of your data with minimal impact.

Percona Server for MongoDB

Percona Server for MongoDB is Percona’s open-source fork of MongoDB, aimed at having 100% feature compatibility (much like our MySQL fork). We have added a few extra features to our fork, for free, that are only available with MongoDB Enterprise binaries for an additional fee.

The feature pertinent to this article is our free, open-source Hot Backup feature for WiredTiger and RocksDB, only available in Percona Server for MongoDB.

Essentially, this Hot Backup feature adds a MongoDB server command that creates a full binary backup of your data set to a new directory with no locking or impact to the database, aside from some increased resource usage due to copying the data.

It’s important to note these backups are binary-level backups, not logical backups (such as mongodump would produce).

Logical vs. Binary Backups

Before the concept of a MongoDB Hot Backup, the only way to backup a MongoDB instance, cluster or replica set was using the logical backup tool ‘mongodump’, or using block-device (binary) snapshots.

A “binary-level” backup means backup data contains the data files (WiredTiger or RocksDB) that MongoDB stores on disk. This is different from the BSON representation of the data that ‘mongodump’ (a “logical” backup tool) outputs.

Binary-level backups are generally faster than logical because a logical backup (mongodump) requires the server to read all data and return it to the MongoDB Client API. Once received, ‘mongodump’ serializes the payload into .bson files on disk. Important areas like indices are not backed up in full, merely the metadata describing the index is backed up. On restore, the entire process is reversed: ‘mongorestore’ must deserialize the data created by ‘mongodump’, send it over the MongoDB Client API to the server, then the server’s storage engine must translate this into data files on disk. Due to only metadata of indices being backed up in this approach, all indices must be recreated at restore time. This is a serial operation for each collection! I have personally restored several databases where the majority of the restore time was the index rebuilds and NOT the actual restore of the raw data!

In contrast, binary-level backups are much simpler: the storage-engine representation of the data is what is backed up. This style of backup includes the real index data, meaning a restore is as simple as copying the backup directory to be in the location of the MongoDB dbPath and restarting MongoDB. No recreation of indices is necessary! For very large datasets, this can be a massive win. Hours or even days of restore time can be saved due to this efficiency.

Of course, there are always some tradeoffs. Binary-level backups can take a bit more space on disk and care must be taken to ensure the files are restored to the right version of MongoDB on matching CPU architecture. Generally, backing up the MongoDB Configuration File and version number with your backups addresses this concern.

‘createBackup’ Command

The Hot Backup feature is triggered by a simple admin command via the ‘mongo’ shell named ‘createBackup’. This command requires only one input: the path to output the backup to, named ‘backupDir’. This backup directory must not exist or an error is returned.

If you have MongoDB Authorization enabled (I hope you do!), this command requires the built-in role: ‘backup’ or a role that inherits the “backup” role.

An example in the  ‘mongo’ shell:

> db.adminCommand({
    createBackup: 1,
    backupDir: "/data/backup27017"
  })
{ "ok" : 1 }

When this command returns an “ok”, a full backup is available to be read at the location specified in ‘backupDir’. This end-result is similar to using block-device snapshots such as LVM snapshots, however with less overhead vs. the 10-30% write degradation many users report at LVM snapshot time.

This backup directory can be deleted with a regular UNIX/Linux “rm -rf ...” command once it is no longer required. A typical deployment archives this directory and/or uploads the backup to a remote location (Rsync, NFS, AWS S3, Google Cloud Storage, etc.) before removing the backup directory.
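As a sketch of that housekeeping step, here is a minimal archive-then-remove flow in Python; the directory name stands in for a real ‘createBackup’ output path:

```python
# Sketch: archive a (hypothetical) backup directory before shipping it
# off-host, then remove the raw directory. A temp dir stands in for the
# createBackup output path.
import shutil
import tarfile
import tempfile
from pathlib import Path

backup_dir = Path(tempfile.mkdtemp(prefix="backup27017."))
(backup_dir / "storage.bson").write_bytes(b"\x00")  # stand-in data file

archive = Path(str(backup_dir) + ".tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(backup_dir, arcname=backup_dir.name)

shutil.rmtree(backup_dir)   # safe once the archive exists
print(archive.exists())
```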

WiredTiger vs. RocksDB Hot Backups

Although the ‘createBackup’ command is syntactically the same for WiredTiger and RocksDB, there are some big differences in the implementation of the backup behind-the-scenes.

RocksDB is a storage engine available in Percona Server for MongoDB that uses a level-tiered compaction strategy that is highly write-optimized. As RocksDB uses immutable (write-once) files on disk, it can provide much more efficient backups of the database by using filesystem “hardlinks” to a single inode on disk. This is important for large data sets, as it requires far less overhead to create a backup.

If your RocksDB-based server ‘createBackup’ command uses an output path that is on the same disk volume as the MongoDB dbPath (very important requirement), hardlinks are used instead of copying most of the database data! If only 5% of the data changes during backup, only 5% of data is duplicated/copied. This makes backups potentially much faster than WiredTiger, which needs to make a full copy of the data and use two-times as much disk space as a result.

Here is an example of a ‘createBackup’ command on a RocksDB-based mongod instance that uses ‘/data/mongod27017’ as a dbPath:

$ mongo --port=27017
test1:PRIMARY> db.adminCommand({
    createBackup: 1,
    backupDir: "/data/backup27017.rocksdb"
})
{ "ok" : 1 }
test1:PRIMARY> quit()

Seeing we received { “ok”: 1 }, the backup is ready at our output path. Let’s see:

$ cd /data/backup27017.rocksdb
$ ls -alh
total 4.0K
drwxrwxr-x. 3 tim tim  36 Mar  6 15:25 .
drwxr-xr-x. 9 tim tim 147 Mar  6 15:25 ..
drwxr-xr-x. 2 tim tim 138 Mar  6 15:25 db
-rw-rw-r--. 1 tim tim  77 Mar  6 15:25 storage.bson
$ cd db
$ ls -alh
total 92K
drwxr-xr-x. 2 tim tim  138 Mar  6 15:25 .
drwxrwxr-x. 3 tim tim   36 Mar  6 15:25 ..
-rw-r--r--. 2 tim tim 6.4K Mar  6 15:21 000013.sst
-rw-r--r--. 2 tim tim  18K Mar  6 15:22 000015.sst
-rw-r--r--. 2 tim tim  36K Mar  6 15:25 000017.sst
-rw-r--r--. 2 tim tim  12K Mar  6 15:25 000019.sst
-rw-r--r--. 1 tim tim   16 Mar  6 15:25 CURRENT
-rw-r--r--. 1 tim tim  742 Mar  6 15:25 MANIFEST-000008
-rw-r--r--. 1 tim tim 4.1K Mar  6 15:25 OPTIONS-000005

Inside the RocksDB ‘db’ subdirectory we can see .sst files containing the data are there! As this MongoDB instance stores data on the same disk at ‘/data/mongod27017’, let’s prove that RocksDB created a “hardlink” instead of a full copy of the data.

First, we get the Inode number of an example .sst file using the ‘stat’ command. I chose the RocksDB data file: ‘000013.sst’:

$ stat 000013.sst
  File: ‘000013.sst’
  Size: 6501      	Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d	Inode: 33556899    Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1000/     tim)   Gid: ( 1000/     tim)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2018-03-06 15:21:10.735310581 +0200
Modify: 2018-03-06 15:21:10.738310479 +0200
Change: 2018-03-06 15:25:56.778556981 +0200
 Birth: -

Notice the Inode number for this file is 33556899. The ‘find’ command can then be used to find all files pointing to Inode 33556899 on /data:

$ find /data -inum 33556899
/data/mongod27017/db/000013.sst
/data/backup27017.rocksdb/db/000013.sst

Using the ‘-inum’ (Inode Number) flag of find, we can see that the .sst files in both the live MongoDB instance (/data/mongod27017) and the backup (/data/backup27017.rocksdb) point to the same inode on disk for the ‘000013.sst’ file, meaning this file was NOT duplicated or copied during the hot backup process. Only metadata was written to point to the same inode! Now, imagine this file was 1TB+ and this becomes very impressive!
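The same inode behaviour can be demonstrated without MongoDB at all; this Python sketch creates a hardlink and checks, as we did above with ‘stat’ and ‘find -inum’, that no data bytes were copied:

```python
# Sketch of why hardlink-based backups are cheap: the "backup" path shares
# the live file's inode, so only directory metadata is written.
import os
import tempfile
from pathlib import Path

d = Path(tempfile.mkdtemp())
live = d / "000013.sst"            # stands in for a live RocksDB data file
live.write_bytes(b"x" * 6501)

backup = d / "backup-000013.sst"
os.link(live, backup)              # hardlink, not a copy

same_inode = os.stat(live).st_ino == os.stat(backup).st_ino
print(same_inode)                  # both paths point at one inode
print(os.stat(live).st_nlink)      # link count rose from 1 to 2
```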

Restore Time

It bears repeating that restoring a logical, mongodump-based backup is very slow: indexes are rebuilt serially for each collection, and both mongorestore and the server spend time translating data between logical and binary representations.

Sadly, backup restore times are rarely tested, and I’ve seen large MongoDB users disappointed to find that a logical-backup restore takes several hours, an entire day or longer while their production is on fire.

Thankfully, binary-level backups are very easy to restore: the backup directory needs to be copied to the location of the (stopped) MongoDB instance dbPath and then the instance just needs to be started with the same configuration file and version of MongoDB. No indices are rebuilt and there is no time spent rebuilding the data files from logical representations!
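The binary-level restore procedure above can be sketched in a few shell steps. This is only a sketch: it assumes the target mongod is stopped, and the function name and example paths (‘/data/backup27017.rocksdb’, ‘/data/mongod27017’) are hypothetical, so adjust them to your environment:

```shell
#!/bin/bash
# Sketch: restore a binary-level backup into an (empty) dbPath.
# Assumes the target mongod is STOPPED before the copy runs.
restore_binary_backup() {
    local backup_dir="$1" dbpath="$2"
    # Copy the backup files into the dbPath, preserving permissions and times
    cp -a "${backup_dir}/." "${dbpath}/"
    # Then start mongod with the SAME configuration file and version, e.g.:
    #   mongod --config /etc/mongod.conf
}

# Example:
# restore_binary_backup /data/backup27017.rocksdb /data/mongod27017
```

The key point is that nothing is rebuilt: the copy itself is the entire restore, so recovery time is bounded by disk throughput rather than index-build speed.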

Percona-Labs/mongodb_consistent_backup Support

We have plans for our Percona-Labs/mongodb_consistent_backup tool to support the ‘createBackup’/binary-level method of backups in the future. See more about this tool in this Percona Blog post: https://www.percona.com/blog/2017/05/10/percona-lab-mongodb_consistent_backup-1-0-release-explained.

Currently, this project supports cluster-consistent backups using ‘mongodump’ (logical backups only), which are very time consuming for large systems.

Support for ‘createBackup’ in this tool would greatly reduce the overhead and time required for backups of clusters, though it will require some added complexity to support the remote filesystems it would use.

Conclusion

As this article outlines, there are a lot of exciting developments that have reduced the impact of taking backups of your systems. More important than the backup efficiency is the time it takes to recover your important data from backup, an area where “hot” binary backups are the clear winner by leaps and bounds.

If MongoDB hot backup and topics like this interest you, please see the document below; we are hiring!

{
  hiring: true,
  role: "Consultant",
  tech: "MongoDB",
  location: "USA",
  moreInfo: "https://www.percona.com/about-percona/careers/mongodb-consultant-usa-based"
}

The post Free, Fast MongoDB Hot Backup with Percona Server for MongoDB appeared first on Percona Database Performance Blog.

Apr
05
2018
--

Percona Live Europe 2018 – Save the Date!

Percona Live Europe 2018

We’ve been searching for a great venue for Percona Live Europe 2018, and I am thrilled to announce we’ll be hosting it in Frankfurt, Germany! Please block November 5-7, 2018 on your calendar now and plan to join us at the Radisson Blu Frankfurt for the premier open source database conference.

We’re in the final days of organizing for the Percona Live 2018 in Santa Clara. You can still purchase tickets for an amazing lineup of keynote speakers, tutorials and sessions. We have ten tracks, including MySQL, MongoDB, Cloud, PostgreSQL, Containers and Automation, Monitoring and Ops, and Database Security. Major areas of focus at the conference will include:

  • Database operations and automation at scale, featuring speakers from Facebook, Slack, Github and more
  • Databases in the cloud – how database-as-a-service (DBaaS) is changing the DB landscape, featuring speakers from AWS, Microsoft, Alibaba and more
  • Security and compliance – how GDPR and other government regulations are changing the way we manage databases, featuring speakers from Fastly, Facebook, Pythian, Percona and more
  • Bridging the gap between developers and DBAs – finding common ground, featuring speakers from Square, Oracle, Percona and more

The Call for Papers for Percona Live Europe will open soon. We look forward to seeing you in Santa Clara!

The post Percona Live Europe 2018 – Save the Date! appeared first on Percona Database Performance Blog.

Apr
05
2018
--

Managing MongoDB Bulk Deletes and Inserts with Minimal Impact to Production Environments

MongoDB bulk deletes and inserts

In this blog post, we’ll look at how to manage MongoDB bulk deletes and inserts with little impact on production traffic.

If you are like me, there is no end to the demands placed on you as a DBA. One of the biggest is when we want to load X% more data into the database, during peak traffic no less. I refer to this as MongoDB bulk deletes and inserts. As a DBA, my first reaction is “no, do this during off-peak hours.” However, the business person in me says what if this is due to clients loading a customer, product, or email list into the system for work during business hours. That puts it into another light, does it not?

This raises the question: how can we change data in the database as fast as possible while also giving the production system some breathing room? In this blog, I wanted to give you some nice scripts that you can load into your MongoDB shell to really simplify the process.

First, we will cover an iterative delete function that can be stopped and restarted at any time. Next, I will talk about smart updating with similarly planned overhead. Lastly, I want to talk about more advanced forms of health checking when you want to do something a bit smarter than where this basic series of scripts stop.

Bulk Deleting with a Plan

In this code, you can see there are a couple of ways to manage these deletes. Specifically, you can see how to call the delete from anywhere (deleteFromCollection). I’ve also shown how to extend the shell so you can call it on a collection directly (db.collection.deleteBulk). The latter avoids the need to provide the namespace, as it can be discovered from the context of the function.

The idea behind this function is pretty straightforward: you provide it with a find pattern for what you want to delete. This could be { } if you don’t want to restrict it, but in that case you should use .drop() instead. After that, it expects a batch size, which is the number of document IDs to delete in a single batch. There is a trade-off between smaller batches with more iterations and larger batches with fewer iterations. Keep in mind that each removed document produces its own oplog entry, so a batch of 1000 (as in my examples) generates 1000 oplog entries; consider this carefully and watch your oplog range as a result. You could improve this to check the oplog size for you, but that requires more permissions (we’ll leave that discussion for another time). Finally, between batches, the function sleeps for pauseMS milliseconds.

If you find that the overhead is too much for you, simply kill the shell running this and it will stop. You can then reduce the batch size, increase the pause, or both, to make the system handle the change better. Sadly, this is not an exact science, as different people have different thresholds for what they consider acceptable from an impact perspective with so many writes. We will talk about this more in a bit:

function parseNS(ns){
    //Expects the namespace as "db.collection"; if the collection name itself
    //contains dots (e.g. "foodb.foocoll.month.day.year"), pass an array instead.
    var database, collection;
    if (ns instanceof Array){
        database   = ns[0];
        collection = ns[1];
    }
    else{
        var tNS = ns.split(".");
        if (tNS.length > 2){
            print('ERROR: NS had more than 1 period in it, please pass it as ["dbname","coll.name.with.dots"]!');
            return false;
        }
        database   = tNS[0];
        collection = tNS[1];
    }
    return {database: database, collection: collection};
}
DBCollection.prototype.deleteBulk = function( query, batchSize, pauseMS){
    //Parse and check namespaces
    var ns = this.getFullName();
    var srcNS = {
        database:   ns.split(".")[0],
        collection: ns.split(".").slice(1).join("."),
    };
    var db = this._db;
    var coll = db.getSiblingDB(srcNS.database).getCollection(srcNS.collection);
    var batchBucket = new Array();
    var totalToProcess = coll.find(query,{_id:1}).count();
    if (totalToProcess < batchSize){ batchSize = totalToProcess; }
    var currentCount = 0;
    print("Processed "+currentCount+"/"+totalToProcess+"...");
    coll.find(query).addOption(DBQuery.Option.noTimeout).forEach(function(doc){
        batchBucket.push(doc._id);
        if ( batchBucket.length >= batchSize){
            printjson(coll.remove({_id : { "$in" : batchBucket}}));
            currentCount += batchBucket.length;
            batchBucket = [];
            sleep(pauseMS);
            print("Processed "+currentCount+"/"+totalToProcess+"...");
        }
    });
    //Remove any leftover documents from the final, partial batch
    if (batchBucket.length > 0){
        printjson(coll.remove({_id : { "$in" : batchBucket}}));
        currentCount += batchBucket.length;
        print("Processed "+currentCount+"/"+totalToProcess+"...");
    }
    print("Completed");
}
function deleteFromCollection( sourceNS, query, batchSize, pauseMS){
    //Parse and check namespaces
    var srcNS = parseNS(sourceNS);
    if (srcNS == false) { return false; }
    var coll = db.getSiblingDB(srcNS.database).getCollection(srcNS.collection);
    var batchBucket = new Array();
    var totalToProcess = coll.find(query,{_id:1}).count();
    if (totalToProcess < batchSize){ batchSize = totalToProcess; }
    var currentCount = 0;
    print("Processed "+currentCount+"/"+totalToProcess+"...");
    coll.find(query).addOption(DBQuery.Option.noTimeout).forEach(function(doc){
        batchBucket.push(doc._id);
        if ( batchBucket.length >= batchSize){
            printjson(coll.remove({_id : { "$in" : batchBucket}}));
            currentCount += batchBucket.length;
            batchBucket = [];
            sleep(pauseMS);
            print("Processed "+currentCount+"/"+totalToProcess+"...");
        }
    });
    //Remove any leftover documents from the final, partial batch
    if (batchBucket.length > 0){
        printjson(coll.remove({_id : { "$in" : batchBucket}}));
        currentCount += batchBucket.length;
        print("Processed "+currentCount+"/"+totalToProcess+"...");
    }
    print("Completed");
}
/** Example Usage:
    deleteFromCollection("foo.bar",{"type":"archive"},1000,20);
  or
    db.bar.deleteBulk({type:"archive"},1000,20);
**/

Inserting & Updating with a Plan

Not to be outdone by the deletes, MongoDB updates and inserts work equally well with the same logic. For inserts, only small changes are needed to build batches of documents and pass .insert(batchBucket) to the shell. Using “sleep” gives breathing room for other reads and actions in the system. I find we don’t need this for modern MongoDB using WiredTiger, but your mileage may vary based on workloads. Also, you might want to decide how the script should handle a document that already exists; in the case of data loading, you could wrap the inserts with error handling that ignores duplicate-key errors. Please note it’s very easy to duplicate data if you do not have a unique index and MongoDB is auto-assigning its own _id field.
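Following the same pattern as deleteBulk, a batched insert only needs to group documents and pass each group to insert(). A minimal sketch of the batching helper (the helper name chunkArray, the collection name, the batch size and the pause below are all hypothetical, chosen to mirror the examples in this post):

```javascript
// Split an array of documents into fixed-size batches.
function chunkArray(docs, batchSize) {
    var batches = [];
    for (var i = 0; i < docs.length; i += batchSize) {
        batches.push(docs.slice(i, i + batchSize));
    }
    return batches;
}

// In the mongo shell you would then insert one batch at a time,
// sleeping between batches to give production some breathing room:
// chunkArray(allDocs, 1000).forEach(function (batch) {
//     printjson(db.bar.insert(batch)); // bulk insert of one batch
//     sleep(20);                       // pauseMS between batches
// });
```

As with the delete function, kill the shell at any time to stop, then tune the batch size and pause before restarting.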

Updates are a tad trickier, as they can be expensive if the query portion is not indexed. I’ve provided you with an example, however. You should consider the query time when planning batches and pauses: the more the update relies on a collection scan, the smaller the batch you should consider. The reasoning here is that we want to avoid restarting and causing a new collection scan as much as possible. A future improvement might be to read from a secondary and perform the update itself on the primary by the _id field, to ensure a pin-pointed update query.

DBCollection.prototype.updateBulk = function( query, changeObject, batchSize, pauseMS){
    //Parse and check namespaces
    var ns = this.getFullName();
    var srcNS = {
        database:   ns.split(".")[0],
        collection: ns.split(".").slice(1).join("."),
    };
    var db = this._db;
    var coll = db.getSiblingDB(srcNS.database).getCollection(srcNS.collection);
    var flushBatch = function(batch){
        var bulk = coll.initializeUnorderedBulkOp();
        batch.forEach(function(id){
            //batch holds _id values, not full documents
            bulk.find({_id: id}).update(changeObject);
        });
        printjson(bulk.execute());
    };
    var batchBucket = new Array();
    var totalToProcess = coll.find(query,{_id:1}).count();
    if (totalToProcess < batchSize){ batchSize = totalToProcess; }
    var currentCount = 0;
    print("Processed "+currentCount+"/"+totalToProcess+"...");
    coll.find(query).addOption(DBQuery.Option.noTimeout).forEach(function(doc){
        batchBucket.push(doc._id);
        if ( batchBucket.length >= batchSize){
            flushBatch(batchBucket);
            currentCount += batchBucket.length;
            batchBucket = [];
            sleep(pauseMS);
            print("Processed "+currentCount+"/"+totalToProcess+"...");
        }
    });
    //Update any leftover documents from the final, partial batch
    if (batchBucket.length > 0){
        flushBatch(batchBucket);
        currentCount += batchBucket.length;
        print("Processed "+currentCount+"/"+totalToProcess+"...");
    }
    print("Completed");
}
/** Example Usage:
    db.bar.updateBulk({type:"archive"},{$set:{archiveDate: ISODate()}},1000,20);
**/

In each iteration, the update prints the result of the bulk execution, including any failures. You can extend this example code to write the failures to a file, or to try to automatically fix issues as appropriate. My goal here is to provide you with a starter function to build on. As with the earlier example, this assumes the JS shell, but you can follow the same logic in the programming language of your choice if you would rather use Python, Golang or Java.

If you got nothing else from this blog on MongoDB bulk deletes and inserts, I hope you learned a good deal more about writing functions in the shell, and how to use programming to add pauses to the bulk operations you need to do. Taking this forward, you could be inventive: use a query that measures latency to trigger pauses (a canary query), or even measure things like the oplog to ensure you’re not adversely impacting HA and replication. There is no right answer, but this is a great start towards more operationally safe ways to do the bigger actions DBAs are asked to do from time to time.
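The canary-query idea can be sketched as a function that scales the inter-batch pause by a measured latency. Everything here is an assumption made for illustration: the function name, the proportional back-off rule, and the example baseline and pause values are all hypothetical.

```javascript
// Scale the inter-batch pause based on a measured canary-query latency.
// If the canary is slower than the baseline, back off proportionally.
function adaptivePauseMS(basePauseMS, baselineLatencyMS, measuredLatencyMS) {
    if (measuredLatencyMS <= baselineLatencyMS) {
        return basePauseMS; // system looks healthy, keep the normal pause
    }
    // Back off in proportion to how much slower the canary has become
    var factor = measuredLatencyMS / baselineLatencyMS;
    return Math.ceil(basePauseMS * factor);
}

// In the mongo shell, the measured latency could come from timing a
// cheap indexed query before each batch, e.g.:
// var t0 = new Date(); db.bar.findOne({_id: someId}); var ms = new Date() - t0;
// sleep(adaptivePauseMS(20, 5, ms));
```

Dropping this into the batch loops above turns a fixed pause into one that automatically slows the bulk operation down whenever production latency rises.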

The post Managing MongoDB Bulk Deletes and Inserts with Minimal Impact to Production Environments appeared first on Percona Database Performance Blog.

Apr
04
2018
--

Percona Monitoring and Management 1.9.0 Is Now Available

Percona Monitoring and Management

PMM (Percona Monitoring and Management) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL® and MongoDB® servers to ensure that your data works as efficiently as possible.

There are a number of significant updates in 1.9.0 that we hope you will like. Some of the key highlights include:

  • Faster loading of the index page: We have enabled performance optimizations using gzip and HTTP2.
  • AWS improvements: We have added metrics from CloudWatch RDS to 6 dashboards, as well as changed our AWS add instance workflow, and made some changes to credentials handling.
  • Percona Snapshot Server: If you are a Percona customer you can now securely share your dashboards with Percona Engineers.
  • Exporting PMM Server logs: Retrieve logs from PMM Server for troubleshooting using a single button click, avoiding the need to log in manually to the Docker container.
  • Low RAM support: We have reduced the memory requirement so PMM Server will run on systems with 512MB of memory.
  • Dashboard improvements: We have changed MongoDB instance identification for MongoDB graphs, and set a maximum graph Y-axis on the Prometheus Exporter Status dashboard.

AWS Improvements

CloudWatch RDS metrics

Since we are already consuming Amazon Cloudwatch metrics and persisting them in Prometheus, we have improved six node-specific dashboards to now display Amazon RDS node-level metrics:

  • Cross_Server (Network Traffic)
  • Disk Performance (Disk Latency)
  • Home Dashboard (Network IO)
  • MySQL Overview (Disk Latency, Network traffic)
  • Summary Dashboard (Network Traffic)
  • System Overview (Network Traffic)

AWS Add Instance changes

We have changed our AWS add instance interface and workflow to be more clear on information needed to add an Amazon Aurora MySQL or Amazon RDS MySQL instance. We have provided some clarity on how to locate your AWS credentials.

AWS Settings

We have improved our documentation to highlight connectivity best practices, and authentication options – IAM Role or IAM User Access Key.

Enabling Enhanced Monitoring

Credentials Screen

Low RAM Support

You can now run PMM Server on instances with memory as low as 512MB RAM, which means you can deploy to the free tier of many cloud providers if you want to experiment with PMM. Our memory calculation is now:

METRICS_MEMORY_MULTIPLIED=$(( (MEMORY_AVAILABLE - 256*1024*1024) / 100 * 40 ))
if (( METRICS_MEMORY_MULTIPLIED < 128*1024*1024 )); then
    METRICS_MEMORY_MULTIPLIED=$((128*1024*1024))
fi
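To see the 128MB floor in action, here is the same arithmetic worked through for a host with 512MB available (a sketch using the same formula as the snippet above; the variable names here are simplified for illustration):

```shell
#!/bin/bash
# Worked example: metrics memory for a host with 512MB available.
MEMORY_AVAILABLE=$((512*1024*1024))
# 40% of whatever is left after reserving 256MB for the rest of PMM
METRICS_MEMORY=$(( (MEMORY_AVAILABLE - 256*1024*1024) / 100 * 40 ))
echo "raw:   $METRICS_MEMORY"         # 107374160 bytes (~102MB)
if (( METRICS_MEMORY < 128*1024*1024 )); then
    METRICS_MEMORY=$((128*1024*1024)) # clamped up to the 128MB floor
fi
echo "final: $METRICS_MEMORY"         # 134217728 bytes (128MB)
```

So on a 512MB instance the raw 40% share (~102MB) falls below the floor, and the metrics store is allocated exactly 128MB.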

Percona Snapshot Server

Snapshots are a way of sharing PMM dashboards, via a link, with individuals who do not normally have access to your PMM Server. If you are a Percona customer, you can now securely share your dashboards with Percona Engineers. We have replaced the button that shared dashboards to Grafana’s publicly hosted platform with one that shares to a platform administered by Percona. Your dashboard will be written to Percona Snapshots, and only Percona Engineers will be able to retrieve the data. We will expire old snapshots automatically at 90 days, but when sharing you will have the option to configure a shorter retention period.

Export of PMM Server Logs

In this release, the logs from PMM Server can be exported using a single button click, avoiding the need to log in manually to the Docker container. This simplifies the troubleshooting process of a PMM Server, and especially for Percona customers this feature provides a more consistent data-gathering task that you can perform on behalf of requests from Percona Engineers.

Faster Loading of the Index Page

In version 1.8.0, the index page was redesigned to reveal more useful information about the performance of your hosts, as well as immediate access to essential components of PMM. However, the index page had to load a lot of data dynamically, resulting in a noticeably longer load time. In this release we enabled gzip and HTTP2 to improve the load time of the index page. The following screenshots demonstrate the results of our tests on webpagetest.org, where we reduced page load time by half. We will continue to look for opportunities to improve the performance of the index page, and expect that when we upgrade to Prometheus 2 we will see another improvement.

The load time of the index page of PMM version 1.8.0

The load time of the index page of PMM version 1.9.0

Issues in this release

New Features

  • PMM-781: Plot new PXC 5.7.17, 5.7.18 status variables on new graphs for PXC Galera, PXC Overview dashboards
  • PMM-1274: Export PMM Server logs as zip file to the browser
  • PMM-2058: Percona Snapshot Server

Improvements

  • PMM-1587: Use mongodb_up variable for the MongoDB Overview dashboard to identify if a host is MongoDB.
  • PMM-1788: AWS Credentials form changes
  • PMM-1823: AWS Install wizard improvements
  • PMM-2010: System dashboards update to be compatible with RDS nodes
  • PMM-2118: Update grafana config for metric series that will not go above 1.0
  • PMM-2215: PMM Web speed improvements
  • PMM-2216: PMM can now be started on systems without memory limit capabilities in the kernel
  • PMM-2217: PMM Server can now run in Docker with 512 Mb memory
  • PMM-2252: Better handling of variables in the navigation menu

Bug fixes

  • PMM-605: pt-mysql-summary requires additional configuration
  • PMM-941: ParseSocketFromNetstat finds an incorrect socket
  • PMM-948: Wrong load reported by QAN due to mis-alignment of time intervals
  • PMM-1486: MySQL passwords containing the dollar sign ($) were not processed properly.
  • PMM-1905: In QAN, the Explain command could fail in some cases.
  • PMM-2090: Minor formatting issues in QAN
  • PMM-2214: Setting Send real query examples for Query Analytic OFF still shows the real query in example.
  • PMM-2221: no Rate of Scrapes for MySQL & MySQL Errors
  • PMM-2224: Exporter CPU Usage glitches
  • PMM-2227: Auto Refresh for dashboards
  • PMM-2243: Long host names in Grafana dashboards are not displayed correctly
  • PMM-2257: PXC/galera cluster overview Flow control paused time has a percentage glitch
  • PMM-2282: No data is displayed on dashboards for OVA images
  • PMM-2296: The mysql:metrics service will not start on Ubuntu LTS 16.04

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

The post Percona Monitoring and Management 1.9.0 Is Now Available appeared first on Percona Database Performance Blog.
