In this blog post, we cover the sorting change in MongoDB 3.6 and other useful features.
The MongoDB 3.6 GA release can help you build applications and run them on MongoDB. Below we cover the areas where changes to server behavior may affect how your application uses MongoDB.
Sorting Changes
The most significant behavior change from an application perspective is sorting on arrays in find and aggregate queries. It is important to review this new behavior in a test environment when upgrading to MongoDB 3.6, especially if the exact order of sorted arrays in find results is crucial.
In MongoDB 3.6, a sort on a field containing an array orders documents by the lowest-valued element of the array for ascending sorts and by the highest-valued element of the array for descending sorts. Before 3.6, the sort used the lowest-valued array element that matches the query.
Example: a collection has the following documents:
{ _id: 0, a: [-3, -2, 2, 3] }
{ _id: 1, a: [ 5, -4 ] }
And we perform this sort operation:
db.test.find({a: {$gte: 0}}).sort({a: 1});
In MongoDB 3.6, the sort operation no longer takes the query predicate into account when determining its sort key. The operation returns the documents in the following order:
{ "_id" : 1, "a" : [ 5, -4 ] }
{ "_id" : 0, "a" : [ -3, -2, 2, 3 ] }
Before 3.6, the result would be:
{ _id: 0, a: [-3, -2, 2, 3] }
{ _id: 1, a: [ 5, -4 ] }
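To make the difference concrete, here is a small sketch in plain JavaScript (no server needed) that models how the sort key is chosen in each version. It is a simplified illustration of the rule described above, not the server's actual implementation: it only handles numeric arrays, an ascending sort, and the `$gte: 0` predicate from the example. The helper names are mine.

```javascript
// Simplified model of how an ascending sort picks its sort key for an
// array field. Numeric arrays only; this is an illustration, not server code.

const docs = [
  { _id: 0, a: [-3, -2, 2, 3] },
  { _id: 1, a: [5, -4] },
];

// Query predicate from the example: { a: { $gte: 0 } }
const matches = (value) => value >= 0;

// MongoDB 3.6: the key is the lowest element of the whole array.
const sortKey36 = (doc) => Math.min(...doc.a);

// Before 3.6: the key is the lowest element that also matches the predicate.
const sortKeyPre36 = (doc) => Math.min(...doc.a.filter(matches));

// Apply the find() filter, sort ascending by the chosen key, return the _ids.
function sortedIds(keyFn) {
  return docs
    .filter((doc) => doc.a.some(matches))
    .sort((x, y) => keyFn(x) - keyFn(y))
    .map((doc) => doc._id);
}

console.log(sortedIds(sortKey36));    // [ 1, 0 ]
console.log(sortedIds(sortKeyPre36)); // [ 0, 1 ]
```

In 3.6 the keys are -3 and -4, so `_id: 1` comes first. Before 3.6 the keys were 2 and 5 (the lowest elements that satisfy `$gte: 0`), so `_id: 0` came first.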
More on this change here: https://docs.mongodb.com/manual/release-notes/3.6-compatibility/#array-sort-behavior and https://docs.mongodb.com/manual/release-notes/3.6-compatibility/#find-method-sorting.
Wire Protocol Compression
In MongoDB 3.4, wire protocol compression was added to the server and mongo shell using the snappy algorithm, but it was disabled by default. In MongoDB 3.6, wire protocol compression is enabled by default, and zlib was added as an optional algorithm, with snappy remaining the default. We recommend snappy unless a higher level of compression (at added cost) is needed.
It’s important to note that the MongoDB tools, including mongodump, mongorestore and mongoreplay, do not yet support wire protocol compression. Because these tools are generally used to move a lot of data, compression would bring significant benefits when they run over a non-localhost network.
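On the client side, compression is usually requested through the connection string. The sketch below builds such a URI using the `compressors` and `zlibCompressionLevel` connection string options. Whether your driver version honors them is an assumption you should check against its documentation. The helper name and the host are placeholders of mine.

```javascript
// Append wire protocol compression options to a MongoDB connection string.
// Assumes baseUri already ends with "/" or "/<database>" (optionally followed
// by other options), as the connection string format requires before "?".
function withCompression(baseUri, compressors, zlibLevel) {
  const params = [`compressors=${compressors.join(",")}`];
  // The zlib level only makes sense when zlib is one of the compressors.
  if (compressors.includes("zlib") && zlibLevel !== undefined) {
    params.push(`zlibCompressionLevel=${zlibLevel}`);
  }
  const separator = baseUri.includes("?") ? "&" : "?";
  return baseUri + separator + params.join("&");
}

// Hypothetical host; snappy only, matching the recommendation above.
console.log(withCompression("mongodb://db1.example.com:27017/", ["snappy"]));

// Allow zlib as well, with an explicit compression level.
console.log(withCompression("mongodb://db1.example.com:27017/", ["snappy", "zlib"], 6));
```

On the server, the equivalent setting is `net.compression.compressors` in the config file (or `--networkMessageCompressors` on the command line).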
I created this MongoDB ticket earlier this year to add wire protocol compression support to these tools: https://jira.mongodb.org/browse/TOOLS-1668. Please watch and vote for this improvement if this feature is important to you.
$jsonSchema Schema Validation
A big driver for using MongoDB is its “schema-less”, document-based data model. A drawback of this flexibility is that it can sometimes result in incomplete or incorrect data in the database, due to the lack of input checking and sanitization.
The relatively unknown “Schema Validation” was introduced in MongoDB 3.2 to address this risk. This feature allowed a user to define what fields are required and what field-values are acceptable using a simple $or array condition, known as a “validator”.
In MongoDB 3.6, a much friendlier $jsonSchema format was introduced as a “validator” in Schema Validation. On top of that, the ability to query documents matching a defined $jsonSchema was introduced!
Below is an example of me creating a collection named “test” with the required field “x” that must be the bsonType: “number”:
test1:PRIMARY> db.createCollection("test", {
  validator: { $jsonSchema: {
    bsonType: "object",
    required: ["x"],
    properties: {
      x: {
        bsonType: "number",
        description: "field 'x' must be a number"
      }
    }
  } }
})
{ "ok" : 1, "operationTime" : Timestamp(1513090298, 1) }
Now when I insert a document that does not meet this criterion (“x” should be a number), I get an error:
test1:PRIMARY> db.test.insert({ x: "abc" })
WriteResult({
  "nInserted" : 0,
  "writeError" : {
    "code" : 121,
    "errmsg" : "Document failed validation"
  }
})
Of course, if my document matches the schema my insert will succeed:
test1:PRIMARY> db.test.insert({ x: 1 })
WriteResult({ "nInserted" : 1 })
To demonstrate $jsonSchema further, let’s perform a .find() query that returns documents matching my defined schema:
test1:PRIMARY> db.test.find({ $jsonSchema: {
  bsonType: "object",
  required: ["x"],
  properties: {
    x: {
      bsonType: "number",
      description: "must be a number"
    }
  }
} })
{ "_id" : ObjectId("5a2fecfd6feb229a6aae374d"), "x" : 1 }
As we can see, combining MongoDB’s “schema-less” document model with the Schema Validation features is very powerful! We can now be confident that our documents are complete and correct while still offering developers a great deal of flexibility.
If data correctness is important to your application, I suggest you implement a Schema Validator at the very start of your application development as implementing validation after data has been inserted is not straightforward.
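If you do need to add validation to a collection that already holds data, one possible approach is the `collMod` command with a relaxed validation level and action. The sketch below only builds the command document, so it runs anywhere. The helper name is mine, and choosing `"moderate"` and `"warn"` as defaults is my assumption about a cautious rollout, not a recommendation from the MongoDB documentation.

```javascript
// Build a collMod command that attaches a $jsonSchema validator to an
// existing collection. With validationLevel "moderate", existing documents
// that already violate the schema are not validated on update; with
// validationAction "warn", violations are logged instead of rejected.
function buildValidatorCommand(collection, schema, { level = "moderate", action = "warn" } = {}) {
  return {
    collMod: collection,
    validator: { $jsonSchema: schema },
    validationLevel: level,
    validationAction: action,
  };
}

// Same schema as in the createCollection example above.
const schema = {
  bsonType: "object",
  required: ["x"],
  properties: {
    x: { bsonType: "number", description: "field 'x' must be a number" },
  },
};

const command = buildValidatorCommand("test", schema);
console.log(JSON.stringify(command, null, 2));
```

In the mongo shell you would pass the result to `db.runCommand(command)`. To find the documents that do not yet conform, you can negate the schema match: `db.test.find({ $nor: [ { $jsonSchema: schema } ] })`. Once those are cleaned up, the level and action can be tightened to `"strict"` and `"error"`.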
More on $jsonSchema can be found here: https://docs.mongodb.com/manual/core/schema-validation/#json-schema
DNS SRV Connection
DNS-based seedlists for connections are a very cool addition to MongoDB 3.6. They allow the server, mongo shell and client drivers (that support the new feature) to use a DNS SRV record to gather the list of MongoDB hosts to connect to. This saves administrators from having to change seed host lists on several servers (usually in an application config) when the host topology changes.
DNS-based seedlists begin with “mongodb+srv://” and have a single DNS SRV record as the hostname.
For example, the connection string:
mongodb+srv://server.example.com/
would cause a DNS query for the SRV record ‘_mongodb._tcp.server.example.com’.
On the DNS server, we set the full list of MongoDB hosts that should be returned in this DNS SRV record query. Here is an example DNS response this feature requires:
Record                             TTL    Class  Type  Priority  Weight  Port   Target
_mongodb._tcp.server.example.com.  86400  IN     SRV   0         5       27317  mongodb1.example.com.
_mongodb._tcp.server.example.com.  86400  IN     SRV   0         5       27017  mongodb2.example.com.
In the example above, the hosts ‘mongodb1.example.com’ and ‘mongodb2.example.com’ would be used to connect to the database. If we decided to change the list of hosts, only the DNS SRV record would need to be updated. Neat!
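The mapping in the example above can be sketched in a few lines of JavaScript. This is only an illustration of the lookup name and the resulting host list, not how any driver implements it. The function names are mine, and the record objects use the `{ name, port, priority, weight }` shape that Node's `dns.resolveSrv()` returns, hard-coded here so the sketch runs without a DNS server.

```javascript
// Derive the SRV record name that a mongodb+srv:// URI points at.
function srvQueryName(uri) {
  const prefix = "mongodb+srv://";
  if (!uri.startsWith(prefix)) {
    throw new Error("not a mongodb+srv:// URI");
  }
  // Keep everything up to the first "/" or "?", then drop any "user:pass@".
  const authority = uri.slice(prefix.length).split(/[/?]/)[0];
  const host = authority.slice(authority.lastIndexOf("@") + 1);
  return `_mongodb._tcp.${host}`;
}

// Turn SRV answers into the host:port list a classic mongodb:// URI uses.
function seedlistFromSrv(records) {
  return records.map((r) => `${r.name}:${r.port}`).join(",");
}

// The two SRV answers from the example DNS response.
const answers = [
  { name: "mongodb1.example.com", port: 27317, priority: 0, weight: 5 },
  { name: "mongodb2.example.com", port: 27017, priority: 0, weight: 5 },
];

console.log(srvQueryName("mongodb+srv://server.example.com/"));
console.log("mongodb://" + seedlistFromSrv(answers) + "/");
```

In a real application you would not do this yourself: a driver that supports the feature performs the SRV lookup when it is handed the `mongodb+srv://` URI.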
More on this new feature here: https://docs.mongodb.com/manual/reference/connection-string/#connections-dns-seedlist
dropDatabase Wait for Majority
In 3.6 the behavior of ‘dropDatabase’ was changed to wait for a majority of members to drop the database before returning success. This is a great step in the right direction to improve data integrity/correctness.
More on this change here: https://docs.mongodb.com/manual/reference/command/dropDatabase/#behavior
FTDC for mongos
On mongod instances, the FTDC (full-time diagnostic data capture) feature outputs .bson files to a directory named ‘diagnostic.data’ in the database path (the server dbPath setting). These files are useful for diagnostics, understanding crashes, etc.
On mongos the new FTDC support outputs the .bson files to ‘mongos.diagnostic.data’ beside the mongos log file. You can change the output path for FTDC files with the server parameter diagnosticDataCollectionDirectoryPath.
FTDC output files must be decoded to be read. The GitHub project ‘ftdc-utils’ is a great tool for reading these specially-formatted files. See more about this tool here: https://github.com/10gen/ftdc-utils.
Here is an example of how to decode the FTDC output files. We can follow the same process for mongod as well:
$ cd /path/to/mongos/mongos.diagnostic.data
$ ftdc decode metrics.2017-12-12T14-44-36Z-00000 -o output
This decodes the FTDC metrics into the file ‘output’.
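When a directory holds many FTDC files, it helps to know which one to decode for a given incident. The sketch below parses the start time from file names of the form shown above and picks the file that was current at a given instant. It assumes that each file covers the period from its own start time until the next file starts, which is my reading of the naming scheme rather than something documented here. The function names and the second file name are made up for illustration.

```javascript
// Parse the start time from an FTDC file name such as
// "metrics.2017-12-12T14-44-36Z-00000". Returns null for any other name.
function ftdcFileStart(fileName) {
  const m = /^metrics\.(\d{4})-(\d{2})-(\d{2})T(\d{2})-(\d{2})-(\d{2})Z-(\d+)$/.exec(fileName);
  if (!m) return null;
  const [year, month, day, hour, minute, second] = m.slice(1, 7).map(Number);
  // Date.UTC expects a zero-based month.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

// Return the file with the latest start time that is not after `instant`,
// or null if no file had started yet.
function fileCovering(fileNames, instant) {
  const candidates = fileNames
    .map((name) => ({ name, start: ftdcFileStart(name) }))
    .filter((f) => f.start !== null && f.start <= instant)
    .sort((a, b) => a.start - b.start);
  return candidates.length ? candidates[candidates.length - 1].name : null;
}

const files = [
  "metrics.2017-12-12T14-44-36Z-00000",
  "metrics.2017-12-12T18-02-10Z-00000", // hypothetical second file
  "metrics.interim",
];

console.log(fileCovering(files, new Date("2017-12-12T15:00:00Z")));
```

The name returned is the one to pass to `ftdc decode`.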
listDatabases Filters
As of MongoDB 3.6, you can filter the output of the ‘listDatabases’ server command. Also, a ‘nameOnly’ boolean option was added to output only database names without additional detail.
The filtering of output is controlled by the new ‘listDatabases’ option ‘filter’. The ‘filter’ value must be a match-document with any combination of these available fields for filtering:
- name
- sizeOnDisk
- empty
- shards
An example filtering by “name” equal to “tim”:
test1:PRIMARY> db.adminCommand({ listDatabases:1, filter: { name: "tim" } })
{
  "databases" : [
    { "name" : "tim", "sizeOnDisk" : 8192, "empty" : false }
  ],
  "totalSize" : 8192,
  "ok" : 1,
  "operationTime" : Timestamp(1513100396, 1)
}
Here, I am filtering on ‘sizeOnDisk’ to find databases larger than 30,000 bytes:
test1:PRIMARY> db.adminCommand({ listDatabases:1, filter: { sizeOnDisk: { $gt: 30000 } } })
{
  "databases" : [
    { "name" : "admin", "sizeOnDisk" : 32768, "empty" : false },
    { "name" : "local", "sizeOnDisk" : 233472, "empty" : false },
    { "name" : "test", "sizeOnDisk" : 32768, "empty" : false },
    { "name" : "tim", "sizeOnDisk" : 32768, "empty" : false }
  ],
  "totalSize" : 331776,
  "ok" : 1,
  "operationTime" : Timestamp(1513100566, 2)
}
This can be really useful to reduce the size of the ‘listDatabases’ result.
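The ‘filter’ and ‘nameOnly’ options can also be combined. Here is a small sketch of a helper that assembles the command document. The helper name is mine, and the `/^tim/` prefix pattern is just an illustrative filter.

```javascript
// Assemble a listDatabases command document, adding the optional
// 'filter' and 'nameOnly' fields only when they are requested.
function listDatabasesCommand({ filter, nameOnly = false } = {}) {
  const command = { listDatabases: 1 };
  if (filter) command.filter = filter;
  if (nameOnly) command.nameOnly = true;
  return command;
}

// Names only, for databases whose name starts with "tim".
const cmd = listDatabasesCommand({ filter: { name: /^tim/ }, nameOnly: true });
console.log(cmd);
```

In the mongo shell, `db.adminCommand(cmd)` would then return just the matching database names, which keeps the response small on deployments with many databases.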
More on this here: https://docs.mongodb.com/manual/reference/command/listDatabases/#dbcmd.listDatabases
Arbiter priority: 0
MongoDB 3.6 changed the arbiter replica set priority to be 0 (zero). As the arbiter’s priority is not considered, this is a more correct value. You’ll notice your replica set configuration is automatically updated when upgrading to MongoDB 3.6.
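If you want to see which members this will touch before you upgrade, a quick check over the replica set configuration is enough. The sketch below takes a config document of the shape `rs.conf()` returns and lists arbiters whose priority is not yet 0. The function name and host names are mine, and the sample config is trimmed to the fields the check needs.

```javascript
// List the hosts of arbiters whose configured priority is not 0.
function arbitersWithNonZeroPriority(config) {
  return config.members
    .filter((m) => m.arbiterOnly === true && m.priority !== 0)
    .map((m) => m.host);
}

// Trimmed-down example config with hypothetical hosts.
const sampleConfig = {
  _id: "test1",
  members: [
    { _id: 0, host: "mongodb1.example.com:27017", arbiterOnly: false, priority: 1 },
    { _id: 1, host: "mongodb2.example.com:27017", arbiterOnly: false, priority: 1 },
    { _id: 2, host: "arbiter1.example.com:27017", arbiterOnly: true, priority: 1 },
  ],
};

console.log(arbitersWithNonZeroPriority(sampleConfig));
```

The function is plain JavaScript, so it can be pasted into the mongo shell and called as `arbitersWithNonZeroPriority(rs.conf())`.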
More on this change here: https://docs.mongodb.com/manual/tutorial/adjust-replica-set-member-priority/#considerations
More on MongoDB 3.6
There are many more changes in this release. It’s important to review the resources below before any upgrade. We always strongly recommend testing functionality in a non-production environment!
Check David Murphy’s blog post on MongoDB 3.6 sessions.
Release Notes: https://docs.mongodb.com/manual/release-notes/3.6/
Compatibility Changes: https://docs.mongodb.com/manual/release-notes/3.6-compatibility/
Conclusion
It is really exciting to see that with each recent major release the MongoDB project is (impressively) adding usability improvements and new features while significantly hardening the existing ones.
Give your deployment, developers and operations engineers the gift of these new features and optimizations this holiday season/new year! Best wishes in 2018!