Dec 22, 2017

MongoDB 3.6 Sorting Changes and Useful New Features

In this blog, we will cover the new MongoDB 3.6 sorting behavior and other useful features.

There is a lot in the new MongoDB 3.6 GA release to help you build and run applications. Here we will cover the areas where server behavior has changed in ways that may affect how your application uses MongoDB.

Sorting Changes

The most significant behavior change from an application perspective is sorting on arrays in find and aggregate queries. It is important to review this new behavior in a test environment when upgrading to MongoDB 3.6, especially if the exact order of sorted arrays in find results is crucial.

In MongoDB 3.6, a sort field containing an array is ordered with the lowest-valued element of the array first for ascending sorts and the highest-valued element of the array first for descending sorts. Before 3.6, the sort used the lowest-valued array element that matched the query predicate.

Example: a collection has the following documents:

{ _id: 0, a: [-3, -2, 2, 3] }
{ _id: 1, a: [ 5, -4 ] }

And we perform this sort operation:

db.test.find({a: {$gte: 0}}).sort({a: 1});

In MongoDB 3.6, the sort operation no longer takes the query predicate into account when determining its sort key. The operation returns the documents in the following order:

{ "_id" : 1, "a" : [ 5, -4 ] }
{ "_id" : 0, "a" : [ -3, -2, 2, 3 ] }

Prior to 3.6, the result would have been:

{ _id: 0, a: [-3, -2, 2, 3] }
{ _id: 1, a: [ 5, -4 ] }
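
If your application depends on the old predicate-aware ordering, one possible workaround (a sketch, not an official recipe; the "sortKey" field name is ours) is to compute an explicit sort key in an aggregation pipeline, using $filter to keep only the array elements that match the predicate and $min to take the lowest of them:

db.test.aggregate([
	{ $match: { a: { $gte: 0 } } },
	// build a temporary sort key from only the matching array elements
	{ $addFields: { sortKey: { $min: { $filter: { input: "$a", as: "el", cond: { $gte: [ "$$el", 0 ] } } } } } },
	{ $sort: { sortKey: 1 } },
	// drop the temporary field from the output
	{ $project: { sortKey: 0 } }
]);

With the example documents above, this returns _id: 0 first (its lowest matching element is 2), restoring the pre-3.6 order.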

More on this change here: https://docs.mongodb.com/manual/release-notes/3.6-compatibility/#array-sort-behavior and https://docs.mongodb.com/manual/release-notes/3.6-compatibility/#find-method-sorting.

Wire Protocol Compression

In MongoDB 3.4, wire protocol compression was added to the server and mongo shell using the snappy algorithm, but it was disabled by default. In MongoDB 3.6, wire protocol compression is enabled by default, and the zlib algorithm was added as an option, with snappy remaining the default. We recommend snappy unless a higher level of compression (at added CPU cost) is needed.
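
As a sketch (assuming MongoDB 3.6 option names; the hostname and values are just examples), compression can be configured on the server in the config file and requested from the shell at connect time:

# mongod.conf: advertise both algorithms, preferring snappy
net:
  compression:
    compressors: snappy,zlib

# mongo shell: request snappy compression when connecting
$ mongo --networkMessageCompressors "snappy" mongodb1.example.com:27017/test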

It’s important to note that the MongoDB tools (mongodump, mongorestore, mongoreplay, etc.) do not yet support wire protocol compression. As these tools are generally used to move a lot of data, compression would bring significant benefits when running them over a non-localhost network.

I created this MongoDB ticket earlier this year to add wire protocol compression support to these tools: https://jira.mongodb.org/browse/TOOLS-1668. Please watch and vote for this improvement if this feature is important to you.

$jsonSchema Schema Validation

A big driver for using MongoDB is its “schema-less”, document-based data model. A drawback to this flexibility is that it can sometimes result in incomplete or incorrect data in the database, due to the lack of input checking and sanitization.

The relatively unknown “Schema Validation” feature was introduced in MongoDB 3.2 to address this risk. It allows a user to define which fields are required and which field values are acceptable using a simple $or array condition, known as a “validator”.

In MongoDB 3.6, a much friendlier $jsonSchema format was introduced as a “validator” in Schema Validation. On top of that, the ability to query for documents matching a defined $jsonSchema was introduced!

Below is an example of creating a collection named “test” with a required field “x” that must be of bsonType “number”:

test1:PRIMARY> db.createCollection("test", {
	validator: {
		$jsonSchema: {
			bsonType: "object",
			required: ["x"],
			properties: {
				x: {
					bsonType: "number",
					description: "field 'x' must be a number"
				}
			}
		}
	}
})
{ "ok" : 1, "operationTime" : Timestamp(1513090298, 1) }

Now when I insert a document that does not meet this criterion (“x” must be a number), I get an error:

test1:PRIMARY> db.test.insert({ x: "abc" })
WriteResult({
	"nInserted" : 0,
	"writeError" : {
		"code" : 121,
		"errmsg" : "Document failed validation"
	}
})

Of course, if my document matches the schema my insert will succeed:

test1:PRIMARY> db.test.insert({ x: 1 })
WriteResult({ "nInserted" : 1 })

To demonstrate $jsonSchema further, let’s perform a .find() query that returns documents matching my defined schema:

test1:PRIMARY> db.test.find({
	$jsonSchema:{
		bsonType: "object",
		required: ["x"],
		properties: {
			x: {
				bsonType: "number",
				description: "must be a number"
			}
		}
	}
})
{ "_id" : ObjectId("5a2fecfd6feb229a6aae374d"), "x" : 1 }

As we can see here, combining MongoDB’s “schema-less” document model with the Schema Validation features is very powerful! Now we can be sure our documents are complete and correct, while still offering developers an extreme amount of flexibility.

If data correctness is important to your application, I suggest you implement a Schema Validator at the very start of your application development, as implementing validation after data has been inserted is not straightforward.
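
If you do find yourself adding validation to an existing collection, a rough sketch is to attach the validator with the ‘collMod’ command, using validationLevel “moderate” so that existing non-conforming documents don’t break on update:

test1:PRIMARY> db.runCommand({
	collMod: "test",
	validator: {
		$jsonSchema: {
			bsonType: "object",
			required: ["x"],
			properties: { x: { bsonType: "number" } }
		}
	},
	// "moderate" validates inserts, and updates only to documents that already pass
	validationLevel: "moderate"
})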

More on $jsonSchema can be found here: https://docs.mongodb.com/manual/core/schema-validation/#json-schema

DNS SRV Connection

DNS-based seedlists for connections are a very cool addition to MongoDB 3.6. They allow the server, mongo shell and client drivers (that support the new feature) to use a DNS SRV record to gather a list of MongoDB hosts to connect to. This saves administrators from having to change seed host lists on several servers (usually in an application config) when the host topology changes.

DNS-based seedlists begin with “mongodb+srv://” and have a single DNS SRV record as the hostname.

An example:

mongodb+srv://server.example.com/

Would cause a DNS query to the SRV record ‘_mongodb._tcp.server.example.com’.

On the DNS server, we set the full list of MongoDB hosts that should be returned in this DNS SRV record query. Here is an example DNS response this feature requires:

Record                            TTL   Class    Priority Weight Port  Target
_mongodb._tcp.server.example.com. 86400 IN SRV   0        5      27317 mongodb1.example.com.
_mongodb._tcp.server.example.com. 86400 IN SRV   0        5      27017 mongodb2.example.com.

In the above example, the hosts ‘mongodb1.example.com’ and ‘mongodb2.example.com’ (on the ports listed) would be used to connect to the database. If we decide to change the list of hosts, only the DNS SRV record needs to be updated. Neat!
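
With those records in place, a supporting client needs only the single seedlist hostname to connect. For example, with the 3.6 mongo shell (the “test” database name is just an example; note that “mongodb+srv://” connections enable TLS/SSL by default):

$ mongo "mongodb+srv://server.example.com/test"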

More on this new feature here: https://docs.mongodb.com/manual/reference/connection-string/#connections-dns-seedlist

dropDatabase Wait for Majority

In 3.6, the behavior of ‘dropDatabase’ was changed to wait until the drop has propagated to a majority of replica set members before returning success. This is a great step in the right direction for improving data integrity/correctness.
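
The command itself is unchanged. For example, from the shell (the “olddata” database name is hypothetical, and the output is abbreviated):

test1:PRIMARY> db.getSiblingDB("olddata").dropDatabase()
{ "dropped" : "olddata", "ok" : 1 }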

More on this change here: https://docs.mongodb.com/manual/reference/command/dropDatabase/#behavior

FTDC for mongos

On mongod instances the FTDC (full-time diagnostic data capture) feature outputs .bson files to a directory named ‘diagnostic.data’ in the database path (the server’s dbPath setting). These files are useful for diagnostics, understanding crashes, etc.

On mongos the new FTDC support outputs the .bson files to ‘mongos.diagnostic.data’ beside the mongos log file. You can change the output path for FTDC files with the server parameter diagnosticDataCollectionDirectoryPath.
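
For example, a sketch of setting a custom path at mongos startup (the path shown is hypothetical, and the rest of the usual mongos options are omitted):

$ mongos --setParameter diagnosticDataCollectionDirectoryPath=/var/lib/mongo/diag.data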

FTDC output files must be decoded to be read. The GitHub project ‘ftdc-utils’ is a great tool for reading these specially-formatted files, see more about this tool here: https://github.com/10gen/ftdc-utils.

Here is an example of how to decode the FTDC output files. We can follow the same process for mongod as well:

$ cd /path/to/mongos/mongos.diagnostic.data
$ ftdc decode metrics.2017-12-12T14-44-36Z-00000 -o output

This decodes the FTDC metrics into the file ‘output’.

listDatabases Filters

New in MongoDB 3.6, you can filter the output of the ‘listDatabases’ server command. A ‘nameOnly’ boolean option was also added to output only database names, without additional detail.
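
For example, a quick way to return just the database names (a sketch; output abbreviated):

test1:PRIMARY> db.adminCommand({ listDatabases: 1, nameOnly: true })
{
	"databases" : [
		{ "name" : "admin" },
		{ "name" : "local" },
		{ "name" : "test" },
		{ "name" : "tim" }
	],
	"ok" : 1
}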

The filtering of output is controlled by the new ‘listDatabases’ option ‘filter’. The ‘filter’ value must be a match-document using any combination of these available fields:

  1. name
  2. sizeOnDisk
  3. empty
  4. shards

An example filtering by “name” equal to “tim”:

test1:PRIMARY> db.adminCommand({ listDatabases:1, filter: { name: "tim" } })
{
	"databases" : [
		{
			"name" : "tim",
			"sizeOnDisk" : 8192,
			"empty" : false
		}
	],
	"totalSize" : 8192,
	"ok" : 1,
	"operationTime" : Timestamp(1513100396, 1)
}

Here, I am filtering on ‘sizeOnDisk’ to find databases larger than 30,000 bytes:

test1:PRIMARY> db.adminCommand({ listDatabases:1, filter: { sizeOnDisk: { $gt: 30000 } } })
{
	"databases" : [
		{
			"name" : "admin",
			"sizeOnDisk" : 32768,
			"empty" : false
		},
		{
			"name" : "local",
			"sizeOnDisk" : 233472,
			"empty" : false
		},
		{
			"name" : "test",
			"sizeOnDisk" : 32768,
			"empty" : false
		},
		{
			"name" : "tim",
			"sizeOnDisk" : 32768,
			"empty" : false
		}
	],
	"totalSize" : 331776,
	"ok" : 1,
	"operationTime" : Timestamp(1513100566, 2)
}

This can be really useful to reduce the size of the ‘listDatabases‘ result.

More on this here: https://docs.mongodb.com/manual/reference/command/listDatabases/#dbcmd.listDatabases

Arbiter priority: 0

MongoDB 3.6 changed the arbiter replica set priority to be 0 (zero). As an arbiter’s priority is never considered in elections, this is a more correct value. You’ll notice your replica set configuration is automatically updated when upgrading to MongoDB 3.6.
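
To confirm this after an upgrade, a quick shell sketch (the arbiter hostname is hypothetical, output abbreviated):

test1:PRIMARY> rs.conf().members.filter(function(m) { return m.arbiterOnly; })
[ { "_id" : 2, "host" : "arbiter.example.com:27017", "arbiterOnly" : true, "priority" : 0 } ]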

More on this change here: https://docs.mongodb.com/manual/tutorial/adjust-replica-set-member-priority/#considerations

More on MongoDB 3.6

There are many more changes in this release. It’s important to review the resources below before any upgrade. We always strongly recommend testing new functionality in a non-production environment!

Check David Murphy’s blog post on MongoDB 3.6 sessions.

Release Notes: https://docs.mongodb.com/manual/release-notes/3.6/

Compatibility Changes: https://docs.mongodb.com/manual/release-notes/3.6-compatibility/

Conclusion

It is really exciting to see that with each recent major release, the MongoDB project is (impressively) tackling usability and new features while also significantly hardening existing features.

Give your deployment, developers and operations engineers the gift of these new features and optimizations this holiday season/new year! Best wishes in 2018!

Aug 13, 2009

XtraDB: The Top 10 enhancements

Note: This post is part 2 of 4 on building our training workshop.

Last week I talked about why you don’t want to shard. This week I’m following up with the top 10 enhancements that XtraDB has over the built-in InnoDB included in MySQL 5.0 and 5.1. Building this list was not really a scientific process: it’s always difficult to say which feature is better than another, because a lot of it depends on the individual workload. My ranking method was to pick the features that have the highest impact and are most applicable to all workloads first:

  1. CPU scalability fixes – XtraDB improves performance on systems with multiple CPUs (see docs 1, 2).
  2. Import/Export Tables – XtraDB allows you to import an arbitrary table from one server to another by backing up the .ibd file with Xtrabackup (see docs).
  3. IO scalability fixes – A lot of the internal algorithms of InnoDB are based on the non-configurable assumption that the server has only a single disk installed (~100 iops). One problem in particular that this causes is that the algorithm which decides whether InnoDB is too busy to flush dirty pages can consider itself busy far too easily. Keeping a large percentage of pages dirty increases recovery time, and leads to more work when checkpoints are eventually forced at the end of a log file. XtraDB improves this with innodb_io_capacity, as well as configuration items for innodb_read_threads and innodb_write_threads (see docs, and the configuration sketch after this list).
  4. Better Diagnostics – The SHOW ENGINE INNODB STATUS command in XtraDB shows a lot more information than the standard InnoDB status (see docs). The built-in InnoDB status also has some problems with the placement of items (a long transaction list can crowd out the rest of the information). In addition, XtraDB diagnostics include the ability to see the contents of the buffer pool (see docs), and InnoDB row statistics are written to the slow query log (see docs).
  5. Fast Crash Recovery – In the built-in InnoDB, the crash recovery process is sometimes best measured in hours and days – this restricts users to using very small transaction log files (innodb_log_file_size), which is worse for performance.  In a simple test, XtraDB recovered ten times faster (see docs).
  6. InnoDB Plugin Features – XtraDB is derived from the InnoDB plugin, which has fast index creation (as opposed to recreating the whole table!) and page compression.
  7. Adaptive Checkpointing – The built-in InnoDB can have erratic dips in performance as it approaches the end of a log file and needs to checkpoint – which can cause a denial of service to your application (this can be seen in any benchmark – such as this one).  In XtraDB, adaptive checkpointing can smooth out the load, and checkpoint data more aggressively as you approach the end of a log file (see docs).
  8. Insert Buffer control – The insert buffer is a great but rarely discussed feature of InnoDB. It allows the writing of non-unique secondary index pages to be delayed, which can often lead to many merged requests and reduced IO. The problem with the insert buffer in the built-in InnoDB is that there are no options to tweak it. It can grow to 1/2 the size of your buffer pool, and when it does, it doesn’t try to aggressively free entries (a full buffer provides no benefit) or reduce its size (see docs).
  9. Data dictionary control – Once an InnoDB table is opened, it is never freed from the in-memory data dictionary (which is unlimited in size). XtraDB introduces a patch to control this, which is useful in cases where users have a large number of tables (see docs).
  10. Additional undo slots – In the built-in InnoDB, the number of open transactions is limited to 1023 (see bug report).  XtraDB allows this to be expanded to 4072 (Warning: Incompatible change!).  (see docs).
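
To make a few of these items concrete, here is a minimal my.cnf sketch touching items 3, 5 and 8 (the values are illustrative only, and option names varied slightly between XtraDB releases, so check the documentation for your build):

[mysqld]
# item 3: tell InnoDB how many iops the storage can actually sustain
innodb_io_capacity = 1000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
# item 5: faster crash recovery makes larger transaction logs practical
innodb_log_file_size = 512M
# item 8: cap the insert buffer below the default 1/2 of the buffer pool
innodb_ibuf_max_size = 2G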

All of these 10 items will be covered in our Training workshops for InnoDB and XtraDB.  In Santa Clara / San Francisco between 14-16 September?  Come along!

My next post in this series will be on XtraDB: The Top 10 Configuration Parameters.


Entry posted by Morgan Tocker

