May
22
2018
--

Upcoming Webinar Thursday, 5/24: What’s New in MongoDB 3.6

Please join Percona’s Senior Support Engineer, Adamo Tonete, as he presents What’s New in MongoDB 3.6 on Thursday, May 24th, 2018, at 12:30 PM PDT (UTC-7) / 3:30 PM EDT (UTC-4).

In this webinar, Adamo will walk through what’s new in MongoDB 3.6, including:

  • Change streams for building reactive, real-time applications
  • Retryable writes for always-on write availability
  • Schema validation with JSON Schema for new data governance controls
  • Fully expressive array updates that perform complex array manipulations in a single atomic update operation
  • New security controls
  • End-to-end compression to create efficient, distributed architectures

This webinar is a summary and follow up to several published blog posts on MongoDB 3.6. More information can be found here.

Download the guide to MongoDB 3.6

 

Adamo Tonete, Senior Technical Services Engineer

Adamo joined Percona in 2015, after working as a MongoDB/MySQL Database Administrator for three years. As the main database member of a startup, he was responsible for suggesting the best architecture and data flows for a worldwide company in a 24/7 environment. Before that, he worked as a Microsoft SQL Server DBA in a large e-commerce company, mainly on performance tuning and automation. Adamo has almost eight years of experience working as a DBA, and in the past three years he has moved to NoSQL technologies without giving up relational databases. He likes to play video games and to study everything that is related to engines. Adamo lives with his wife in São Paulo, Brazil.

Register for the webinar

The post Upcoming Webinar Thursday, 5/24: What’s New in MongoDB 3.6 appeared first on Percona Database Performance Blog.

Mar
12
2018
--

Mass Upgrade MongoDB Versions: from 2.6 to 3.6

In this blog post, we’re going to look at how to upgrade MongoDB when leaping versions.

Here at Percona, we see every type of upgrade you could imagine. Lately, however, I have seen an increase in very old version upgrades (OVE). This is when you upgrade MongoDB from a version that is more than one step behind the version you are upgrading to. Some examples are:

  • 2.6 -> 3.2
  • 2.6 -> 3.4
  • 2.6 -> 3.6
  • 3.0 -> 3.4
  • 3.0 -> 3.6
  • 3.2 -> 3.6

Luckily, they all have the same basic path to upgrade. Unluckily, it’s not an online upgrade. You need to dump/import again on the new version. For a relatively small system with few indexes and less than 100GB of data, this isn’t a big deal.

Many people might ask: “Why can’t I just upgrade 2.6->3.0->3.2->3.4->3.6?” You can do this and it could be fine, but it is hazardous. How many times have you upgraded through five major versions in a row with no issues? How much work is involved with testing one driver change, let alone four? What about the changes in the optimizer, storage engine, and driver versions, the bulk feature differences, and moving from stand-alone config servers to the replica set they are now? What about upgrading to the new election protocol, which implies more overhead?

Even if you navigate all of that, you still have to worry about what you didn’t think about. In a perfect world, you would have the ability to build a fresh duplicate cluster on 3.6 and then run production traffic on it to make sure things still work.

My advice is to only plan an in-place upgrade for a single version, and even then you should talk to an in-house expert or consulting firm to make sure you are making the right changes for future support.

As such, I am going to break things down into two areas:

  • Upgrade from the previous version (3.4.12 -> 3.6.3, for example)
  • Upgrading using dump/import

Upgrading from previous versions when using a Replica Set and in place

Generally speaking, if you are taking this path the manual is a great help (https://docs.mongodb.com/manual/release-notes/3.6-upgrade-replica-set/#upgrade-process). However, this is specific to 3.6, and my goal is to make this a bit more generic. As such, let’s break it down into steps acceptable in all systems.

Read the upgrade page for your version. At the end of the process below, you might have extra work to do.

  1. Set the feature compatibility version to the previous version: db.adminCommand( { setFeatureCompatibilityVersion: "3.4" } )
  2. Make your current primary prefer to be primary using something like below, where I assume the primary you want is the first node in the list
    >x=rs.config()
    >x.members[0].priority=1000
    >rs.reconfig(x)
  3. Now in reverse order from rs.config().members, take the highest member ID and stop one node at a time
    1. Stop the mongod node
    2. Run yum/apt upgrade, or replace the binary files with new ones
    3. Try to start the process manually. This might fail if you missed a configuration file change that needed fixing.
      1. A good example of this is the requirement to set the engine to MMAPv1 when moving from 3.0 -> 3.2, or how “smallfiles” was removed as an option, which could cause the host not to start.
    4. Once started on the new version, make sure replication can keep up with ‘rs.printSlaveReplicationInfo()’
    5. Repeat this process one node at a time until only node “0” (your primary) is left.
  4. Reverse your work from step two, and remove the priority on the primary node. This might cause an election, but it rarely changes the primary.
  5. If the primary has not changed, run ‘rs.stepDown(300, 30)’. This tells it to let someone else be primary, gives the secondaries 30 seconds to catch up, and prevents it from becoming primary again for 270 more seconds.
  6. Inside those 270 seconds, you must shut down the node and repeat step three (but only for this one node).
  7. You are done with the replica set. However, check your notes for anything you need to do on the mongos layer.
    1. In MongoDB 3.6, config servers are required to be a replica set. This is easy to check: the configdb configuration line on a mongos is either “xxx/host1:port,host2:port2,host3:port” (replica set) or “host1:port,host2:port,host3:port” (non-replica set). If you do not convert them to a replica set BEFORE upgrading mongos, it will fail to start. Treat the config servers as a replica set upgrade if they are already in one.
  8. You can do one shard/replica set at a time, but if you do, the balancer MUST be off during this time to prevent odd confusion and rollbacks.
  9. You’re done!
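The ordering rule in the steps above (secondaries first, from the highest member ID down, with the primary last) can be sketched in plain JavaScript. This is only an illustration: `upgradeOrder` is a hypothetical helper, and the host names are made up. It takes an array shaped like `rs.config().members` and the member ID of the current primary.

```javascript
// Hypothetical helper: compute the order in which to upgrade replica set
// members, given an array shaped like rs.config().members and the _id of
// the current primary.
function upgradeOrder(members, primaryId) {
  const primary = members.find((m) => m._id === primaryId);
  if (!primary) {
    throw new Error("primary " + primaryId + " not found in members");
  }
  const secondaries = members
    .filter((m) => m._id !== primaryId)
    .sort((a, b) => b._id - a._id); // highest member ID first
  // The primary always goes last, after it has been stepped down.
  return secondaries.concat([primary]).map((m) => m.host);
}

// Example input (hosts are assumptions for illustration only).
const exampleMembers = [
  { _id: 0, host: "db1.example.com:27017" },
  { _id: 1, host: "db2.example.com:27017" },
  { _id: 2, host: "db3.example.com:27017" },
];

const order = upgradeOrder(exampleMembers, 0);
console.log(order);
```

Keeping the primary at the end of the list mirrors the step-down in steps five and six: every other node is already on the new version before the primary is touched.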

As you can see, this is pretty generic. But it is a good set of rules to follow since each version might have a deprecated feature, removed configuration options or other changes. By reading the documentation, you should be OK. However, having someone who has done tens to hundreds of these already is very helpful.

Back to the challenge at hand: how would you like to follow this process four times in a row per replica set/shard if you were moving from 2.6 to 3.6? What would be the risk of human error in all of that? Hopefully you are starting to see, just from an operational standpoint, why we advise against OVEs. But that’s only one side. During each of these iterations, you also need to redeploy the application and test to ensure it still works by running some type of load or UAT system, including upgrading the driver for each version and applying builds for that driver (as some functions may change). I don’t know about you, but as a DBA, architect, product owner and support manager, this is just too much risk.

What are our options for doing this in a much more straightforward fashion, without causing engineer fatigue, splitting risk trees and other such concerns?

Upgrading using the dump/import method

Before we get into this one, we should talk about a couple of points. You do have a choice about online and offline modes for this. I will only cover the offline mode. You need to collect and apply operations occurring during this process for the online mode, and I do not recommend this for our support customers. It is something I have helped do for our consulting customers. This is because we can make sure the process works for your environment, and at the end of the day my job is to make sure data is available and safe over anything else.

If you’re sharded, this must be done in parallel. You should use MCB (https://github.com/Percona-Lab/mongodb_consistent_backup). This is a good idea even if you’re not sharded, as it works with sharded and plain replica sets to ensure all the backups and config servers (if applicable) are “dumped” to the same point in time.

Next, if you are not using virtualization or the cloud, you’ll need to order 2x the hardware and have a place for the old equipment. While not optimal, if you don’t have the budget for anything else you might consider the in-place approach above, even with its risk, for just the last version. With virtualization or cloud, people can typically use more hardware for a short time, and the cost is only the use of the equipment for that time. This is easily budgeted as part of the upgrade cost against the risks of not upgrading.

  1. Use MCB to take a backup and save it. An example config is:
    production:
         host: localhost
         port: 27017
         log_dir: /var/log/mongodb-consistent-backup
         backup:
             method: mongodump
             name: upgrade
             location: /mongo_backups/upgrade_XX_to_3.6
         replication:
             max_lag_secs: 10
         sharding:
             balancer:
                 wait_secs: 300
                 ping_secs: 3
         archive:
             method: tar
         tar:
             compression: none
  2. MCB figures out whether the deployment is sharded or not. Additionally, it reaches out and may even back up from a different secondary as needed. When done, you will have a structure like:
    >find /mongo_backups/upgrade_XX_to_3.6
    >/mongo_backups/upgrade_XX_to_3.6/upgrade
    >/mongo_backups/upgrade_XX_to_3.6/upgrade_<datetime>
    >/mongo_backups/upgrade_XX_to_3.6/upgrade_<datetime>/rs1
    >/mongo_backups/upgrade_XX_to_3.6/upgrade_<datetime>/rs1/rs1.tar
    >/mongo_backups/upgrade_XX_to_3.6/upgrade_<datetime>/rs2
    >/mongo_backups/upgrade_XX_to_3.6/upgrade_<datetime>/rs2/rs2.tar
    >/mongo_backups/upgrade_XX_to_3.6/upgrade_<datetime>/config
    >/mongo_backups/upgrade_XX_to_3.6/upgrade_<datetime>/config/config.tar
    and so on...
  3. Now that we have a backup, we can build new hardware for rs1. As I said, we will focus only on this one replica set in this example (a backup of a single replica set would contain just that one folder):
    • Set up all nodes with a 3.6 compatible configuration file. You do not need to keep the same engine; use WiredTiger (the default) if you’re not sure.
    • Disable authentication for now.
    • Start the nodes and ensure the replica sets are working and healthy (rs.initiate, rs.add, rs.status).
    • Run the import using mongorestore with --oplogReplay on the contents of the extracted tar file.
    • Drop admin.users. If the salt has changed, you’ll need to recreate all users (2.6 -> 3.0+).
    • Once the restore is complete, use rs.printReplicationInfo or PMM to verify when the replication is caught up from the import.
    • Start up your application pointing to the new location using the new driver you’ve already tested on this version, grab a beer and you’re done!

Hopefully, you can see how much more comfortable this route is. You know all the new features are working, and you do not need to do anything else (like in the old system) to make sure you have switched to replica-set configs or something.

In this process to upgrade MongoDB, if you used MCB you can do the same for sharding. However, you will keep all of your existing sharding, which the default sharded dump/restore does for you. It should be noted that in a future version they could change the layout of the config servers, and this process might need adaptation. If you think this is the case, drop a question in the Percona Forums, Twitter, or even the contact-us page and we will be glad to help.

I want to thank you for reading this blog on how to upgrade MongoDB, and hope it helps. This is just a base guideline and there are many specific things per-version to consider that are outside of the scope of this blog. Percona has support and experts to help guide you if you have any questions.

Dec
22
2017
--

MongoDB 3.6 Sorting Changes and Useful New Features

In this blog, we will cover the new MongoDB 3.6 sorting changes and other useful features.

The new MongoDB 3.6 GA release can help you build applications and run them on MongoDB, and we will cover areas where there are changes to server behavior that may affect your application usage of MongoDB.

Sorting Changes

The most significant behavior change from an application perspective is sorting on arrays in find and aggregate queries. It is important to review this new behavior in a test environment when upgrading to MongoDB 3.6, especially if the exact order of sorted arrays in find results is crucial.

In MongoDB 3.6 sorting, a sort field containing an array is ordered with the lowest-valued element of the array first for ascending sorts and the highest-valued element of the array first for descending sorts. Before 3.6, it used the lowest-valued array element that matches the query.

Example: a collection has the following documents:

{ _id: 0, a: [-3, -2, 2, 3] }
{ _id: 1, a: [ 5, -4 ] }

And we perform this sort operation:

db.test.find({a: {$gte: 0}}).sort({a: 1});

In MongoDB 3.6 sorting, the sort operation no longer takes into account the query predicate when determining its sort key. The operation returns the documents in the following order:

{ "_id" : 1, "a" : [ 5, -4 ] }
{ "_id" : 0, "a" : [ -3, -2, 2, 3 ] }

Previous to 3.6 the result would be:

{ _id: 0, a: [-3, -2, 2, 3] }
{ _id: 1, a: [ 5, -4 ] }

More on this change here: https://docs.mongodb.com/manual/release-notes/3.6-compatibility/#array-sort-behavior and https://docs.mongodb.com/manual/release-notes/3.6-compatibility/#find-method-sorting.
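To make the difference concrete, here is a small plain-JavaScript simulation of how the ascending sort key is chosen for the two example documents. It is a sketch of the rule described above, not MongoDB's implementation; the function names `sortKey36` and `sortKeyPre36` are made up for illustration.

```javascript
// Ascending sort key in 3.6+: the smallest array element, ignoring the predicate.
function sortKey36(arr) {
  return Math.min(...arr);
}

// Ascending sort key before 3.6: the smallest element that matches the predicate.
function sortKeyPre36(arr, predicate) {
  return Math.min(...arr.filter(predicate));
}

const docs = [
  { _id: 0, a: [-3, -2, 2, 3] },
  { _id: 1, a: [5, -4] },
];
const gteZero = (v) => v >= 0; // stands in for the query {a: {$gte: 0}}

// Both documents match the query, since each has at least one element >= 0.
const matching = docs.filter((d) => d.a.some(gteZero));

const order36 = [...matching]
  .sort((x, y) => sortKey36(x.a) - sortKey36(y.a))
  .map((d) => d._id);
const orderPre36 = [...matching]
  .sort((x, y) => sortKeyPre36(x.a, gteZero) - sortKeyPre36(y.a, gteZero))
  .map((d) => d._id);

console.log(order36);    // [ 1, 0 ]  (keys -4 and -3)
console.log(orderPre36); // [ 0, 1 ]  (keys 2 and 5)
```

In 3.6 the key for document 1 is -4, even though -4 does not satisfy the query, which is why it now sorts first.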

Wire Protocol Compression

In MongoDB 3.4, wire protocol compression was added to the server and mongo shell using the snappy algorithm. However, this was disabled by default. In MongoDB 3.6, wire protocol compression becomes enabled by default and the zlib compression algorithm was added as an optional compression with snappy being the default. We recommend snappy, unless a higher level of compression (with added cost) is needed.

It’s important to note that MongoDB tools do not yet support wire protocol compression. This includes mongodump, mongorestore, mongoreplay, etc. As these tools are generally used to move a lot of data, compression would bring significant benefits when they are used over a non-localhost network.

I created this MongoDB ticket earlier this year to add wire protocol compression support to these tools: https://jira.mongodb.org/browse/TOOLS-1668. Please watch and vote for this improvement if this feature is important to you.

$jsonSchema Schema Validation

A big driver for using MongoDB is its “schema-less”, document-based data model. A drawback to this flexibility is sometimes it can result in incomplete/incorrect data in the database, due to the lack of input checking and sanitization.

The relatively unknown “Schema Validation” was introduced in MongoDB 3.2 to address this risk. This feature allowed a user to define what fields are required and what field-values are acceptable using a simple $or array condition, known as a “validator”.

In MongoDB 3.6, a much more friendly $jsonSchema format was introduced as a “validator” in Schema Validation. On top of that, the ability to query documents matching a defined $jsonSchema was introduced!

Below is an example of me creating a collection named “test” with the required field “x” that must be the bsonType: “number”:

test1:PRIMARY> db.createCollection("test", {
	validator: {
		$jsonSchema: {
			bsonType: "object",
			required: ["x"],
			properties: {
				x: {
					bsonType: "number",
					description: "field 'x' must be a number"
				}
			}
		}
	}
})
{ "ok" : 1, "operationTime" : Timestamp(1513090298, 1) }

Now when I insert a document that does not meet this criterion (“x” should be a number), I get an error:

test1:PRIMARY> db.test.insert({ x: "abc" })
WriteResult({
	"nInserted" : 0,
	"writeError" : {
		"code" : 121,
		"errmsg" : "Document failed validation"
	}
})

Of course, if my document matches the schema my insert will succeed:

test1:PRIMARY> db.test.insert({ x: 1 })
WriteResult({ "nInserted" : 1 })

To demonstrate $jsonSchema further, let’s perform a .find() query that returns documents matching my defined schema:

test1:PRIMARY> db.test.find({
	$jsonSchema:{
		bsonType: "object",
		required: ["x"],
		properties: {
			x: {
				bsonType: "number",
				description: "must be a number"
			}
		}
	}
})
{ "_id" : ObjectId("5a2fecfd6feb229a6aae374d"), "x" : 1 }

As we can see here, combining the power of the “schema-less” document model of MongoDB with the Schema Validation features is a very powerful combination! Now we can be sure our documents are complete and correct while still offering an extreme amount of developer flexibility.

If data correctness is important to your application, I suggest you implement a Schema Validator at the very start of your application development as implementing validation after data has been inserted is not straightforward.

More on $jsonSchema can be found here: https://docs.mongodb.com/manual/core/schema-validation/#json-schema
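For a collection that already holds data, one possible approach is to attach the validator with the collMod command in a lenient mode first and to look for non-conforming documents before enforcing it. The sketch below only builds the command and query documents as plain JavaScript objects; the choice of “moderate” and “warn” is an assumption for illustration, not a recommendation from the original post.

```javascript
// The same schema used in the createCollection example above.
const schema = {
  bsonType: "object",
  required: ["x"],
  properties: {
    x: { bsonType: "number", description: "field 'x' must be a number" },
  },
};

// Command document for attaching the schema to an existing collection.
const addValidatorCmd = {
  collMod: "test",
  validator: { $jsonSchema: schema },
  validationLevel: "moderate", // do not validate updates to already-invalid documents
  validationAction: "warn",    // log violations instead of rejecting the write
};

// Query document that matches documents NOT conforming to the schema.
const findInvalidQuery = { $nor: [{ $jsonSchema: schema }] };

// In the mongo shell these would be used as:
//   db.runCommand(addValidatorCmd)
//   db.test.find(findInvalidQuery)
console.log(JSON.stringify(addValidatorCmd, null, 2));
console.log(JSON.stringify(findInvalidQuery));
```

Once the query for invalid documents comes back empty, the same collMod command can be re-run with stricter settings.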

DNS SRV Connection

DNS-based seedlists for connections are a very cool addition to MongoDB 3.6. They allow the server, mongo shell and client drivers (that support the new feature) to use a DNS SRV record to gather a list of MongoDB hosts to connect to. This saves administrators from having to change seed host lists on several servers (usually in an application config) when the host topology changes.

DNS-based seedlists begin with “mongodb+srv://” and have a single DNS SRV record as the hostname.

An example:

mongodb+srv://server.example.com/

This would cause a DNS query for the SRV record ‘_mongodb._tcp.server.example.com’.

On the DNS server, we set the full list of MongoDB hosts that should be returned in this DNS SRV record query. Here is an example DNS response this feature requires:

Record                            TTL   Class    Priority Weight Port  Target
_mongodb._tcp.server.example.com. 86400 IN SRV   0        5      27317 mongodb1.example.com.
_mongodb._tcp.server.example.com. 86400 IN SRV   0        5      27017 mongodb2.example.com.

In the above example, the hosts ‘mongodb1.example.com’ and ‘mongodb2.example.com’ would be used to connect to the database. If we decided to change the list of hosts, only the DNS SRV record needs to be updated. Neat!

More on this new feature here: https://docs.mongodb.com/manual/reference/connection-string/#connections-dns-seedlist
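The mapping from connection string to SRV record name is mechanical, and a minimal sketch of it looks like this. `srvRecordName` is a hypothetical helper written for this post; real drivers do this (and the DNS lookup itself) internally.

```javascript
// Derive the SRV record name that a mongodb+srv:// URI would cause a lookup for.
function srvRecordName(uri) {
  const prefix = "mongodb+srv://";
  if (!uri.startsWith(prefix)) {
    throw new Error("not a mongodb+srv:// URI: " + uri);
  }
  // Keep everything before the first "/" or "?", then drop optional credentials.
  const authority = uri.slice(prefix.length).split(/[/?]/)[0];
  const host = authority.slice(authority.lastIndexOf("@") + 1);
  if (host === "" || host.includes(",") || host.includes(":")) {
    // A seedlist URI names exactly one hostname and carries no port.
    throw new Error("expected a single hostname without a port: " + uri);
  }
  return "_mongodb._tcp." + host;
}

console.log(srvRecordName("mongodb+srv://server.example.com/"));
// _mongodb._tcp.server.example.com
```

The ports come from the SRV records themselves, which is why the URI is rejected here if it carries one.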

dropDatabase Wait for Majority

In 3.6 the behavior of ‘dropDatabase’ was changed to wait for a majority of members to drop the database before returning success. This is a great step in the right direction to improve data integrity/correctness.

More on this change here: https://docs.mongodb.com/manual/reference/command/dropDatabase/#behavior

FTDC for mongos

On mongod instances the FTDC (full-time diagnostic capture) feature outputs .bson files to a directory named ‘diagnostic.data’ in the database path (the server dbPath variable). These files are useful for diagnostics, understanding crashes, etc.

On mongos the new FTDC support outputs the .bson files to ‘mongos.diagnostic.data’ beside the mongos log file. You can change the output path for FTDC files with the server parameter diagnosticDataCollectionDirectoryPath.

FTDC output files must be decoded to be read. The GitHub project ‘ftdc-utils’ is a great tool for reading these specially-formatted files, see more about this tool here: https://github.com/10gen/ftdc-utils.

Here is an example of how to decode the FTDC output files. We can follow the same process for mongod as well:

$ cd /path/to/mongos/mongos.diagnostic.data
$ ftdc decode metrics.2017-12-12T14-44-36Z-00000 -o output

This decodes the FTDC metrics to the file ‘output’.

listDatabases Filters

MongoDB 3.6 added the ability to filter the output of the ‘listDatabases’ server command. Also, a ‘nameOnly’ boolean option was added to only output database names without additional detail.

The filtering of output is controlled by the new ‘listDatabases’ option ‘filter’. The ‘filter’ variable must be a match-document with any combination of these available fields for filtering:

  1. name
  2. sizeOnDisk
  3. empty
  4. shards

An example filtering by “name” equal to “tim”:

test1:PRIMARY> db.adminCommand({ listDatabases:1, filter: { name: "tim" } })
{
	"databases" : [
		{
			"name" : "tim",
			"sizeOnDisk" : 8192,
			"empty" : false
		}
	],
	"totalSize" : 8192,
	"ok" : 1,
	"operationTime" : Timestamp(1513100396, 1)
}

Here, I am filtering ‘sizeOnDisk’ to find databases larger than 30,000 bytes:

test1:PRIMARY> db.adminCommand({ listDatabases:1, filter: { sizeOnDisk: { $gt: 30000 } } })
{
	"databases" : [
		{
			"name" : "admin",
			"sizeOnDisk" : 32768,
			"empty" : false
		},
		{
			"name" : "local",
			"sizeOnDisk" : 233472,
			"empty" : false
		},
		{
			"name" : "test",
			"sizeOnDisk" : 32768,
			"empty" : false
		},
		{
			"name" : "tim",
			"sizeOnDisk" : 32768,
			"empty" : false
		}
	],
	"totalSize" : 331776,
	"ok" : 1,
	"operationTime" : Timestamp(1513100566, 2)
}

This can be really useful to reduce the size of the ‘listDatabases’ result.

More on this here: https://docs.mongodb.com/manual/reference/command/listDatabases/#dbcmd.listDatabases
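The ‘nameOnly’ option mentioned above has no example yet, so here is a hedged sketch that combines it with a name filter. It only builds the command document as a plain JavaScript object; the regular expression and the database names used to exercise it are made up for illustration.

```javascript
// Command document: list only database names that start with "tim".
const listCmd = {
  listDatabases: 1,
  nameOnly: true,          // return names only, without sizeOnDisk or empty
  filter: { name: /^tim/ } // match-document on the "name" field
};

// In the mongo shell this would be run as: db.adminCommand(listCmd)
console.log(listCmd);

// The same regular expression applied locally to some example names.
const exampleNames = ["admin", "local", "test", "tim", "tim_archive"];
const matched = exampleNames.filter((n) => listCmd.filter.name.test(n));
console.log(matched); // [ 'tim', 'tim_archive' ]
```

Since name-only output carries no size details, this sketch filters on ‘name’ alone.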

Arbiter priority: 0

MongoDB 3.6 changed the arbiter replica set priority to be 0 (zero). As the arbiter’s priority is not considered, this is a more correct value. You’ll notice your replica set configuration is automatically updated when upgrading to MongoDB 3.6.

More on this change here: https://docs.mongodb.com/manual/tutorial/adjust-replica-set-member-priority/#considerations

More on MongoDB 3.6

There are many more changes in this release. It’s important to review these resources below before any upgrade. We always strongly recommend testing functionality in a non-production environment!

Check David Murphy’s blog post on MongoDB 3.6 sessions.

Release Notes: https://docs.mongodb.com/manual/release-notes/3.6/

Compatibility Changes: https://docs.mongodb.com/manual/release-notes/3.6-compatibility/

Conclusion

It is really exciting to see that with each recent major release the MongoDB project is (impressively) tackling both usability/features, while significantly hardening the existing features.

Give your deployment, developers and operations engineers the gift of these new features and optimizations this holiday season/new year! Best wishes in 2018!

Dec
15
2017
--

MongoDB 3.6 Security Improvements

In this blog post, we’ll look at MongoDB 3.6 security improvements.

As we’ve already talked about in this series, MongoDB 3.6 has a number of new features in it. But we have talked less about the new security enhancements in this release. The MongoDB 3.6 security features are particularly exciting. Some of these are just moving packaging defaults into the binary defaults, but others are evidence that MongoDB is growing up and taking in what security professionals have been asking for. We can break things down into two major areas: network listening and more restrictive access controls. Hopefully, as a MongoDB user you are excited by this.

Network Listening

This is a purposely vague topic. As you know, MongoDB is just one of the many NoSQL databases that have been hit with hacks, theft and even ransomware events. It is true this was largely due to outdated systems not following best practices, but MongoDB’s binaries did nothing to help the situation: before 3.6 they listened on all network interfaces unless told otherwise. In point of fact, both MongoDB Inc. and Percona Server for MongoDB ship packages with configurations that are more restrictive than the binaries. In 3.6, the binaries themselves bind only to localhost by default, so you must set the bind IP explicitly to accept remote connections. Percona even has a tool to print a warning that it might not be secured if it detects a public IP bound to the machine.

It should be noted for anyone coming from MySQL or the RDBMS world that this differs from having user:password@host ACL restrictions. Even with the bind IP setting, any user can connect from any whitelisted IP address. So it’s still important to consider a separate password per environment to prevent accidental issues where a developer thinks they are on a test box and instead drops a database in production. However, all is not lost: there are separate improvements on that front also.

You might ask, “How do I configure the bind IP setting?” (especially if you haven’t been). Let’s assume you have some type of NAT in your system, and 10.10.10.10 is some private NAT address that directly maps to a dedicated public one (your DB nodes are in your own data center, and your applications are somewhere else). We will also assume that while you don’t have a VPN, you do have firewall rules such that only your application hosts are allowed into the database. Not perfect, but better than some things we have seen in the news.

To enable listening on that address, you have two methods: command line and configuration file.

The command line looks like:

mongod --bind_ip 10.10.10.10 --fork --logpath /var/log/mongod/mongod.log --port 17001

The configuration file, on the other hand, uses YAML:

net:
   port: 17001
   bindIp: 10.10.10.10
   bindIpAll: false

Please note that you should almost never set bindIpAll to true, as it forces the old behavior of listening to everything. Instead, use a comma-separated list, like “10.10.10.10,68.82.123.213,192.168.10.2”.

User Restrictions – CIDR/IP Whitelisting

Just now we talked about how bindIp works. It is for the server as a whole, not per user (something many RDBMS systems have long enjoyed). David Murphy discussed this in his MongoDB 3.6 blog, and how MySQL has had it since at least 1998. Not only has MongoDB finally added host control to its abilities, but it has also raised the game using the power of the document model. Typically in the MySQL world, you define a user like:

GRANT ALL PRIVILEGES ON dbTest.* TO 'user'@'hostname' IDENTIFIED BY 'password';

Not a bad method really, but what if it allowed networks or specific IPs for a single user? This is actually a great first step, but there is more to go. For now, you can only say what sources and destinations a user can map to. You still need to make a user per environment, as you can’t define roles inside of the restriction arrays.

Let me demonstrate with some examples:

rs1:PRIMARY>devUser={
	"user" : "example_devuser",
	"roles" : [
		{
			"role" : "read",
			"db" : "foo"
		}
	],
	"authenticationRestrictions" : [
		{
			"clientSource" : [
				"10.30.0.0/16"
			],
			"serverAddress" : [
				"10.11.0.0/16"
			]
		}
	],
	"pwd" : "changeme"
}
rs1:PRIMARY>prodUser={
	"user" : "example_produser",
	"roles" : [
		{
			"role" : "readWrite",
			"db" : "foo"
		}
	],
	"authenticationRestrictions" : [
		{
			"clientSource" : [
				"10.11.0.0/16"
			],
			"serverAddress" : [
				"10.11.0.0/16"
			]
		}
	],
	"pwd" : "changeme"
}
rs1:PRIMARY> db.createUser(prodUser)
Successfully added user: {
	"user" : "example_produser",
	"roles" : [
		{
			"role" : "readWrite",
			"db" : "foo"
		}
	],
	"authenticationRestrictions" : [
		{
			"clientSource" : [
				"10.11.0.0/16"
			],
			"serverAddress" : [
				"10.11.0.0/16"
			]
		}
	]
}
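To illustrate what these restrictions mean in practice, here is a plain-JavaScript sketch that checks a client and server address pair against a user’s authenticationRestrictions. It is a simulation only, not the server’s code: it handles IPv4 only, the helper names are hypothetical, and it assumes that a connection is accepted when at least one restriction document matches on both clientSource and serverAddress.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    throw new Error("invalid IPv4 address: " + ip);
  }
  return parts.reduce((acc, p) => ((acc << 8) | Number(p)) >>> 0, 0);
}

// True if `ip` falls inside `cidr` (for example "10.11.0.0/16") or equals a bare address.
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = bitsText === undefined ? 32 : Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error("invalid prefix length in: " + cidr);
  }
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Assumption of this sketch: any one restriction document may match, and
// within a document both clientSource and serverAddress must match.
function connectionAllowed(user, clientIp, serverIp) {
  const restrictions = user.authenticationRestrictions || [];
  if (restrictions.length === 0) return true;
  return restrictions.some((r) => {
    const clientOk = !r.clientSource || r.clientSource.some((c) => inCidr(clientIp, c));
    const serverOk = !r.serverAddress || r.serverAddress.some((c) => inCidr(serverIp, c));
    return clientOk && serverOk;
  });
}

const exampleProdUser = {
  user: "example_produser",
  authenticationRestrictions: [
    { clientSource: ["10.11.0.0/16"], serverAddress: ["10.11.0.0/16"] },
  ],
};

console.log(connectionAllowed(exampleProdUser, "10.11.4.20", "10.11.0.5")); // true
console.log(connectionAllowed(exampleProdUser, "10.30.1.1", "10.11.0.5"));  // false
```

The second call shows the point of the feature: a client on the development network (10.30.0.0/16) cannot use the production user, even with the right password.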

We strongly suggest you start to use both of these MongoDB 3.6 security features to enable the best possible security by ensuring only the application host can use the application user, and a developer can only use their user. Additionally, look into using Ansible, Chef, or similar tools to make deploying the restrictions easy.

Hopefully, we can all be saved from a good number of accidental dropCollections, or ensureIndexes being built, in production rather than development environments.

Nov
22
2017
--

MongoDB 3.6 Change Streams: A Nest Temperature and Fan Control Use Case

In this post, I’ll look at what MongoDB 3.6 change streams are, in a creative way. Just in time for the holidays!

What is a change stream?

Change streams in MongoDB provide a cross-platform unified API that can be supported with sharding. It has an option for talking to secondaries, and even allows for security controls like restrictions and action controls.

How is this important? To demonstrate, I’ll walk through an example of using a smart oven with a Nest Thermostat to keep your kitchen from becoming a sauna while you bake a cake — without the need for you to moderate the room temperature yourself.

What does a change stream look like?

db.device_states.watch([
     {
         $match: {
             "documentKey.device": {
                   $in : [ "jennair_oven", "nest_kitchen_thermo" ]
             },
             operationType: "insert"
         }
     }
]);

What can we watch?

We can use change streams to watch these actions:

  • Insert
  • Delete
  • Replace
  • Update
  • Invalidate

Why not just use the Oplog?

Any change presented in the oplog could be rolled back as it’s only single node durable. Change streams need at least one other node to receive the change. In general, this represents a majority for a typical three node replica-set.

In addition, change streams are resumable. Having a collector job that survives an election is easy as pie, as by default it will automatically retry once. However, you can also record the last seen token to know how to resume where it left off.
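The idea of recording the last seen token can be sketched without a server. In the simulation below, an in-memory array stands in for the change stream, `readFrom` is a hypothetical reader that skips everything up to and including the given token, and the token values are made up. With a real stream, the saved token would instead be passed back through the resumeAfter option of watch().

```javascript
// Stand-in for a change stream: ordered events, each with a resume token in _id.
const streamEvents = [
  { _id: { _data: "t1" }, operationType: "insert" },
  { _id: { _data: "t2" }, operationType: "insert" },
  { _id: { _data: "t3" }, operationType: "update" },
  { _id: { _data: "t4" }, operationType: "insert" },
];

// Hypothetical reader: yields the events that come after `resumeAfter`
// (or every event when no token has been recorded yet).
function* readFrom(stream, resumeAfter) {
  let started = resumeAfter === null;
  for (const ev of stream) {
    if (started) {
      yield ev;
    } else if (ev._id._data === resumeAfter._data) {
      started = true; // resume with the event after this one
    }
  }
}

let lastToken = null; // in real code, persist this somewhere durable
const seen = [];

// Process at most `limit` events, recording the token after each one.
function consume(limit) {
  let n = 0;
  for (const ev of readFrom(streamEvents, lastToken)) {
    if (n === limit) break;
    seen.push(ev._id._data);
    lastToken = ev._id;
    n++;
  }
}

consume(2);  // handles t1 and t2, then the collector "restarts"
consume(10); // resumes after t2 and handles t3 and t4
console.log(seen); // [ 't1', 't2', 't3', 't4' ]
```

No event is processed twice and none is skipped, as long as the token is saved only after the event has been handled.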

Finally, since this is sharding supported with the new cluster clock (wc in Oplog), you can trust the operations order you get, even across shards. This was problematic both with the old oplog format and when managing connections manually.

In short, this is the logical evolution of oplog scraping, and helps fulfill a long-standing request to be able to tail the oplog via mongos, not per replica set.

So what’s the downside?

It’s estimated that after 1000 streams you will start to see very measurable performance drops. Why there is not a global change stream option to avoid having so many cursors floating around is not clear. I think it’s something that should be looked at for future versions of this feature. Up to now, many use cases of mongo, specifically in the multi-tenant world, might have > 1000 namespaces on a system. This would make the performance drop problematic.

What’s in a change stream anyhow?

The first thing to understand is that while some drivers have db.collection.watch(XXXX) as a function you can use, this is just an alias for an actual aggregation pipeline stage, $changeStream. This means you could mix this with much more powerful pipelines, though you should be careful. Things like projection could break the ability to resume if the token is not passed on accurately.
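As a rough sketch of what such a pipeline might look like, the array below starts with the $changeStream stage and then adds a match and a projection. The particular stages are assumptions chosen for illustration; the important detail is that the projection keeps _id, since that field carries the resume token.

```javascript
// An aggregation pipeline that begins with the $changeStream stage.
const changePipeline = [
  { $changeStream: { fullDocument: "updateLookup" } },
  { $match: { operationType: { $in: ["insert", "update"] } } },
  // Keep _id: it holds the resume token needed to pick up where we left off.
  { $project: { _id: 1, operationType: 1, fullDocument: 1 } },
];

// In the mongo shell this would be passed to aggregate, for example:
//   db.device_states.aggregate(changePipeline)
console.log(JSON.stringify(changePipeline, null, 2));
```

When using the watch() helper instead, only the stages after $changeStream are passed in, since the helper adds that first stage itself.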

So a change stream:

  1. Is a view of an oplog entry for a change. This sometimes means you know the change contents, and sometimes you don’t (for example, in a delete).
  2. Is an explicit API for drivers and code, and also ensures you can get data via mongos rather than having to connect directly to each node.
  3. Is scalable, resumable, and well ordered – even when sharded!
  4. Harnesses the power of aggregations.
  5. Provides superior ACL support via roles and privileges.

Back to a real-world example. Let’s assume you have a Nest unit in your house (because none of us techies have those right?) Let’s also assume you’re fancy and have the Jenn-Air oven which can talk to the Nest. If you’re familiar with the Nest, you might know that its API lets you enable the Jenn-Air fan or set its oven temperature remotely. Sure the oven has a fan schedule to prevent it running at night, but its ability to work with other appliances is a bit more limited.

So for our example, assume you want the temperature in the kitchen to drop by 15 degrees F whenever the oven is on, and that the fan should run even if it’s outside its standard time window.

Hopefully, you can see how such an app, powered by MongoDB, could be useful. However, there are a few more assumptions, which we have already set up: a “device_states” collection that records the original state of the temperature setting in the Nest, and that also records the oven’s status so that we know how to reset the oven using the Nest once cooking is done.

As we know we have the state changes for the devices coming in on a regular basis, we could simply say:

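// watch() takes an array of pipeline stages.
// Matching on documentKey.device assumes device is part of the document key
// (for example, the shard key); otherwise match on "fullDocument.device".
db.device_states.watch([{
    $match: {
        "documentKey.device": {
            $in: ["jennair_oven", "nest_kitchen_thermo"]
        },
        operationType: "insert"
    }
}]);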

This will watch for new state documents inserted for either of these devices. Updates to existing documents are filtered out by the operationType condition, so drop that condition if you also want to see updates.
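To see which events such a filter lets through, here is a small local stand-in for the $match stage, run against hand-written events. The passesFilter function and the sample events are hypothetical and only mirror the filter logic; the server does the real matching.

```javascript
// The device names and operation type come from the $match stage above.
var WATCHED_DEVICES = ["jennair_oven", "nest_kitchen_thermo"];

// Hypothetical local equivalent of the $match stage, for illustration only.
function passesFilter(event) {
    var key = event.documentKey || {};
    return event.operationType === "insert" &&
           WATCHED_DEVICES.indexOf(key.device) !== -1;
}

var ovenInsert = { operationType: "insert", documentKey: { device: "jennair_oven" } };
var ovenUpdate = { operationType: "update", documentKey: { device: "jennair_oven" } };
var otherInsert = { operationType: "insert", documentKey: { device: "garage_door" } };

console.log(passesFilter(ovenInsert), passesFilter(ovenUpdate), passesFilter(otherInsert));
```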

Now let’s assume that any time something comes in for the Nest, we update db.nest_settings with that document. When the oven turns on, however, we also write a secondary document with an _id of “override” to record the last known nest_setting from before the oven was enabled. This means that we can revert to it later.

This would be accomplished via something like…

Change Event document

{ 
    _id: <resume_token>,
    operationType: 'insert',
    ns: {db:'example',coll:"device_states"},
    documentKey: { device:'nest_kitchen_thermo'},
    fullDocument: { 
       _id : ObjectId(),
       device: 'nest_kitchen_thermo',
       temp: 68
    }
}

So you could easily run the following from your code:

// upsert creates the "current" document the first time an event arrives
db.nest_settings.update({_id:"current"},{_id:"current",data: event.fullDocument},{upsert: true})

Now the current document is set to the last check-in from the Nest API.

That was simple enough, but now we can do even more cool things…

Change Event document

{ 
    _id: <resume_token>,
    operationType: 'insert',
    ns: {db:'example',coll:"device_states"},
    documentKey: { device:'jennair_oven'},
    fullDocument: { 
       _id : ObjectId(),
       device: 'jennair_oven',
       temp: 350,
       power: 1,
       state: "warming"
    }
}

This next segment is more pseudocode:

// Pseudocode: watcherObj is the cursor returned by watch();
// NestObj is a stand-in for a Nest API client.
var event = watcherObj.next();
var device = event.documentKey.device;
var data = event.fullDocument;
if (device == "jennair_oven") {
     var override_doc = db.nest_settings.findOne({_id: "override"});
     if (data.power && !override_doc) {
        // First "on" event: save the current setting, lowered by 15 degrees
        override_doc = db.nest_settings.findOne({_id: "current"});
        override_doc._id = "override";
        override_doc.data.temp -= 15;
        db.nest_settings.insert(override_doc);
     }
     if (data.power) {
         // Oven is on: apply the override and run the fan
         NestObj.thermostat.updateTemp(override_doc.data.temp);
         NestObj.thermostat.enableFan(15); // Enable for 15 minutes
     } else if (override_doc) {
         // Oven is off: restore the original temperature and clear the override
         NestObj.thermostat.updateTemp(override_doc.data.temp + 15);
         NestObj.thermostat.enableFan(0); // Disable the fan
         db.nest_settings.remove({_id: "override"});
     }
}

This code is doing a good deal, but it’s pretty basic at the same time:

  1. If the oven is on, but there is no override document, create one from the most recent thermostat settings.
  2. Decrease the current temp setting by 15, and then insert it with the “override” _id value
  3. If the power is set to on
    (a) read in the current override document
    (b) set the thermostat to that setting
    (c) enable the fan for 15 minutes
  4. If the power is now off
    (a) read in the current override document
    (b) set the thermostat to 15 degrees higher
    (c) set the fan to disabled
    (d) remove the override document

Assuming you are constantly tailing the watch cursor, this means you will restore the thermostat and disable the fan as soon as the oven is off.
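Tailing the cursor can be sketched as a loop over isExhausted(), hasNext() and next(), which is the pattern used with watch cursors in the mongo shell. The tailCursor and fakeCursor names are hypothetical, and fakeCursor is only a stand-in so the sketch runs without a server: it reports itself exhausted once its events are drained, whereas a real change stream cursor stays open.

```javascript
// Drain a cursor-like object, calling handler for each event.
// cursor only needs isExhausted(), hasNext() and next().
function tailCursor(cursor, handler) {
    var handled = 0;
    while (!cursor.isExhausted()) {
        if (cursor.hasNext()) {
            handler(cursor.next());
            handled += 1;
        }
    }
    return handled;
}

// Stand-in for a watch cursor, backed by a plain array of events.
function fakeCursor(events) {
    var queue = events.slice();
    return {
        isExhausted: function () { return queue.length === 0; },
        hasNext: function () { return queue.length > 0; },
        next: function () { return queue.shift(); }
    };
}

var powerStates = [];
var count = tailCursor(
    fakeCursor([
        { documentKey: { device: "jennair_oven" }, fullDocument: { power: 1 } },
        { documentKey: { device: "jennair_oven" }, fullDocument: { power: 0 } }
    ]),
    function (event) { powerStates.push(event.fullDocument.power); }
);
console.log(count, powerStates);
```

With a real watch cursor, the handler would be the oven logic shown earlier, and the loop would keep running until the cursor is closed.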

Hopefully, this blog has helped explain how change streams work by using a real-world logical application to keep your kitchen from becoming a sweat sauna while making some cake… and then eating it!
