December 22, 2017

MongoDB 3.6 Sorting Changes and Useful New Features

In this blog, we will cover the new MongoDB 3.6 sorting change and other useful features.

The new MongoDB 3.6 GA release brings many improvements for building and running applications on MongoDB. In this post, we focus on changes to server behavior that may affect how your application uses MongoDB.

Sorting Changes

The most significant behavior change from an application perspective is sorting on arrays in find and aggregate queries. It is important to review this new behavior in a test environment when upgrading to MongoDB 3.6, especially if the exact order of sorted arrays in find results is crucial.

In MongoDB 3.6, when a sort field contains an array, ascending sorts use the array’s lowest-valued element as the sort key, and descending sorts use the highest-valued element. Before 3.6, the sort key was the lowest-valued array element that matched the query predicate.

Example: a collection has the following documents:

{ _id: 0, a: [-3, -2, 2, 3] }
{ _id: 1, a: [ 5, -4 ] }

And we perform this sort operation:

db.test.find({a: {$gte: 0}}).sort({a: 1});

In MongoDB 3.6, the sort operation no longer takes the query predicate into account when determining its sort key. The operation returns the documents in the following order:

{ "_id" : 1, "a" : [ 5, -4 ] }
{ "_id" : 0, "a" : [ -3, -2, 2, 3 ] }

Prior to 3.6, the result would have been:

{ _id: 0, a: [-3, -2, 2, 3] }
{ _id: 1, a: [ 5, -4 ] }
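If your application depends on the old ordering, one possible workaround (a sketch, not an official recipe: it assumes you duplicate the query predicate inside $filter) is to compute the pre-3.6 sort key yourself in an aggregation pipeline:

db.test.aggregate([
	{ $match: { a: { $gte: 0 } } },
	// sortKey = the lowest array element matching the predicate, i.e. the pre-3.6 key
	{ $addFields: { sortKey: { $min: { $filter: { input: "$a", cond: { $gte: [ "$$this", 0 ] } } } } } },
	{ $sort: { sortKey: 1 } },
	{ $project: { sortKey: 0 } }
]);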

More on this change here: https://docs.mongodb.com/manual/release-notes/3.6-compatibility/#array-sort-behavior and https://docs.mongodb.com/manual/release-notes/3.6-compatibility/#find-method-sorting.

Wire Protocol Compression

In MongoDB 3.4, wire protocol compression using the snappy algorithm was added to the server and mongo shell, but it was disabled by default. In MongoDB 3.6, wire protocol compression is enabled by default, and the zlib algorithm was added as an optional compressor, with snappy remaining the default. We recommend snappy unless a higher level of compression (at added CPU cost) is needed.
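Compression is negotiated per connection: client and server agree on a mutually supported compressor during the handshake, and simply fall back to uncompressed traffic if there is none. As a minimal sketch, the accepted compressors can be set in the mongod/mongos YAML configuration file via the net.compression.compressors option:

net:
   compression:
      compressors: snappy,zlib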

It’s important to note that the MongoDB tools do not yet support wire protocol compression. This includes mongodump, mongorestore, mongoreplay, etc. As these tools are generally used to move a lot of data, compression would offer significant benefits when using them over a non-localhost network.

I created this MongoDB ticket earlier this year to add wire protocol compression support to these tools: https://jira.mongodb.org/browse/TOOLS-1668. Please watch and vote for this improvement if this feature is important to you.

$jsonSchema Schema Validation

A big driver for using MongoDB is its “schema-less”, document-based data model. A drawback to this flexibility is that it can sometimes result in incomplete or incorrect data in the database, due to the lack of input checking and sanitization.

The relatively unknown “Schema Validation” feature was introduced in MongoDB 3.2 to address this risk. It allows a user to define which fields are required and which field values are acceptable using a simple $or array condition, known as a “validator”.

In MongoDB 3.6, a much more friendly $jsonSchema format was introduced as a “validator” in Schema Validation. On top of that, the ability to query for documents matching a defined $jsonSchema was introduced!

Below is an example of creating a collection named “test” with a required field “x” that must have the bsonType “number”:

test1:PRIMARY> db.createCollection("test", {
	validator: {
		$jsonSchema: {
			bsonType: "object",
			required: ["x"],
			properties: {
				x: {
					bsonType: "number",
					description: "field 'x' must be a number"
				}
			}
		}
	}
})
{ "ok" : 1, "operationTime" : Timestamp(1513090298, 1) }

Now when I insert a document that does not meet this criterion (“x” must be a number), I get an error:

test1:PRIMARY> db.test.insert({ x: "abc" })
WriteResult({
	"nInserted" : 0,
	"writeError" : {
		"code" : 121,
		"errmsg" : "Document failed validation"
	}
})

Of course, if my document matches the schema my insert will succeed:

test1:PRIMARY> db.test.insert({ x: 1 })
WriteResult({ "nInserted" : 1 })

To demonstrate $jsonSchema further, let’s perform a .find() query that returns documents matching my defined schema:

test1:PRIMARY> db.test.find({
	$jsonSchema:{
		bsonType: "object",
		required: ["x"],
		properties: {
			x: {
				bsonType: "number",
				description: "must be a number"
			}
		}
	}
})
{ "_id" : ObjectId("5a2fecfd6feb229a6aae374d"), "x" : 1 }

As we can see, combining MongoDB’s “schema-less” document model with the Schema Validation features is very powerful: we can be sure our documents are complete and correct while still offering developers an extreme amount of flexibility.

If data correctness is important to your application, I suggest you implement a Schema Validator at the very start of your application development, as implementing validation after data has been inserted is not straightforward.
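If you do need to add validation to a collection that already contains data, one hedged approach is the collMod command with validationLevel set to "moderate", which applies the rules to inserts and to updates of already-valid documents while leaving pre-existing invalid documents alone (this reuses the schema from the example above):

db.runCommand({
	collMod: "test",
	validator: {
		$jsonSchema: {
			bsonType: "object",
			required: ["x"],
			properties: { x: { bsonType: "number" } }
		}
	},
	// "moderate": skip validation for documents that were already invalid
	validationLevel: "moderate"
})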

More on $jsonSchema can be found here: https://docs.mongodb.com/manual/core/schema-validation/#json-schema

DNS SRV Connection

DNS-based seedlists for connections are a very cool addition to MongoDB 3.6. They allow the server, the mongo shell and client drivers (that support the new feature) to use a DNS SRV record to gather the list of MongoDB hosts to connect to. This saves administrators from having to change seed host lists on several servers (usually in an application config) when the host topology changes.

DNS-based seedlist connection strings begin with “mongodb+srv://” and use a single DNS SRV record name as the hostname.

An example:

mongodb+srv://server.example.com/

would cause a DNS query for the SRV record ‘_mongodb._tcp.server.example.com’.

On the DNS server, we set the full list of MongoDB hosts that should be returned in this DNS SRV record query. Here is an example DNS response this feature requires:

Record                            TTL   Class    Priority Weight Port  Target
_mongodb._tcp.server.example.com. 86400 IN SRV   0        5      27317 mongodb1.example.com.
_mongodb._tcp.server.example.com. 86400 IN SRV   0        5      27017 mongodb2.example.com.

In the above example, the hosts mongodb1.example.com and mongodb2.example.com (on ports 27317 and 27017, respectively) would be used to connect to the database. If we decide to change the list of hosts, only the DNS SRV record needs to be updated. Neat!
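For instance, the 3.6 mongo shell can connect using the seedlist directly (reusing the example hostname above):

mongo "mongodb+srv://server.example.com/test"

Connection options such as replicaSet and authSource can additionally be published in a DNS TXT record on the same name, so even those no longer need to live in application configs.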

More on this new feature here: https://docs.mongodb.com/manual/reference/connection-string/#connections-dns-seedlist

dropDatabase Wait for Majority

In 3.6, the behavior of ‘dropDatabase’ was changed to wait until a majority of replica set members have dropped the database before returning success. This is a great step in the right direction for improving data integrity/correctness.

More on this change here: https://docs.mongodb.com/manual/reference/command/dropDatabase/#behavior

FTDC for mongos

On mongod instances the FTDC (full-time diagnostic capture) feature outputs .bson files to a directory named ‘diagnostics.data’ in the database path (the server dbPath variable). These files are useful for diagnostics, understanding crashes, etc.

On mongos the new FTDC support outputs the .bson files to ‘mongos.diagnostic.data’ beside the mongos log file. You can change the output path for FTDC files with the server parameter diagnosticDataCollectionDirectoryPath.
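As a sketch, this parameter can be set in the mongos configuration file (the path here is purely illustrative):

setParameter:
   diagnosticDataCollectionDirectoryPath: /var/log/mongodb/mongos.diagnostic.data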

FTDC output files must be decoded to be read. The GitHub project ‘ftdc-utils’ is a great tool for reading these specially formatted files; see more about it here: https://github.com/10gen/ftdc-utils.

Here is an example of how to decode the FTDC output files (the same process works for mongod as well):

$ cd /path/to/mongos/mongos.diagnostic.data
$ ftdc decode metrics.2017-12-12T14-44-36Z-00000 -o output

This decodes the FTDC metrics into the file ‘output’.

listDatabases Filters

In MongoDB 3.6, you can now filter the results of the ‘listDatabases’ server command. Also, a ‘nameOnly’ boolean option was added to output only database names, without additional detail.

Filtering of the output is controlled by the new ‘listDatabases’ option ‘filter’. The ‘filter’ value must be a match document using any combination of these available fields:

  1. name
  2. sizeOnDisk
  3. empty
  4. shards

An example filtering by “name” equal to “tim”:

test1:PRIMARY> db.adminCommand({ listDatabases:1, filter: { name: "tim" } })
{
	"databases" : [
		{
			"name" : "tim",
			"sizeOnDisk" : 8192,
			"empty" : false
		}
	],
	"totalSize" : 8192,
	"ok" : 1,
	"operationTime" : Timestamp(1513100396, 1)
}

Here, I am filtering on ‘sizeOnDisk’ to find databases larger than 30,000 bytes:

test1:PRIMARY> db.adminCommand({ listDatabases:1, filter: { sizeOnDisk: { $gt: 30000 } } })
{
	"databases" : [
		{
			"name" : "admin",
			"sizeOnDisk" : 32768,
			"empty" : false
		},
		{
			"name" : "local",
			"sizeOnDisk" : 233472,
			"empty" : false
		},
		{
			"name" : "test",
			"sizeOnDisk" : 32768,
			"empty" : false
		},
		{
			"name" : "tim",
			"sizeOnDisk" : 32768,
			"empty" : false
		}
	],
	"totalSize" : 331776,
	"ok" : 1,
	"operationTime" : Timestamp(1513100566, 2)
}

This can be really useful to reduce the size of the ‘listDatabases‘ result.
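The ‘nameOnly’ option mentioned above works in the same spirit; a quick sketch:

test1:PRIMARY> db.adminCommand({ listDatabases: 1, nameOnly: true })

This returns only the “name” field for each database (no sizeOnDisk or empty fields), and it can be combined with ‘filter’.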

More on this here: https://docs.mongodb.com/manual/reference/command/listDatabases/#dbcmd.listDatabases

Arbiter priority: 0

MongoDB 3.6 changed the arbiter’s replica set priority to 0 (zero). As an arbiter’s priority is never considered in elections, this is a more correct value. You’ll notice your replica set configuration is automatically updated when upgrading to MongoDB 3.6.
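You can confirm this after the upgrade with a quick sketch in the shell:

rs.conf().members.forEach(function(m) {
	// arbiters should now report priority 0
	if (m.arbiterOnly) { print(m.host + " priority: " + m.priority); }
});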

More on this change here: https://docs.mongodb.com/manual/tutorial/adjust-replica-set-member-priority/#considerations

More on MongoDB 3.6

There are many more changes in this release. It’s important to review these resources below before any upgrade. We always strongly recommend testing functionality in a non-production environment!

Check David Murphy’s blog post on MongoDB 3.6 sessions.

Release Notes: https://docs.mongodb.com/manual/release-notes/3.6/

Compatibility Changes: https://docs.mongodb.com/manual/release-notes/3.6-compatibility/

Conclusion

It is really exciting to see that with each recent major release, the MongoDB project is (impressively) tackling usability and new features while significantly hardening existing ones.

Give your deployment, developers and operations engineers the gift of these new features and optimizations this holiday season/new year! Best wishes in 2018!

December 21, 2017

This Week in Data with Colin Charles 20: cPanel changes strategy, Percona Live CFP extended

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

I think the biggest news from last week was from cPanel – if you haven’t already read the post, please do – on Being a Good Open Source Community Member: Why we hesitated on MySQL 5.7. cPanel had anticipated MariaDB being the eventual replacement for MySQL, based on movements from Red Hat, Wikipedia and Google. The perceived advantages focused on transparency around security disclosure and the added features/improvements. Today though, “MySQL now consistently matches or outpaces MariaDB when it comes to development and releases, which in turn is increasing the demand on us for providing those upgraded versions of MySQL by our users.” And maybe a little more telling, “when MariaDB 10.2 became stable in May 2017 it included many features found in MySQL 5.7. However, MySQL reached stable nearly 18 months earlier in October 2015.” (emphasis mine).

So cPanel is going forth and supporting MySQL 5.7. They will continue supporting MariaDB Server for the foreseeable future. This really is cPanel ensuring they are responsive to users: “The people using and building database-driven applications are doing so with MySQL in mind, and are hesitant to add support for MariaDB. Responding to our community’s desires is one of the most important things to us, and this is something that we are hearing asked for from our community consistently.”

I, of course, think this is a great move. Users deserve choice. And MySQL has features that are sometimes still not included in MariaDB Server. Have you seen the Complete list of new features in MySQL 5.7? Or my high-level response to a MariaDB Corporation white paper?

I can only hope to see more people think pragmatically like cPanel. Ubuntu as a Linux distribution still does – you get MySQL 5.7 as a default (very unlike upstream Debian, which ships MariaDB Server nowadays). I used to be a proponent of MariaDB Server being everywhere, when it was community-developed, feature-enhanced, and backward-compatible. However, the moment it stopped being a branch and became a true fork is the moment where trouble lies for users. I think it was still marginally fine with 10.0, and maybe even 10.1, but the ability to maintain feature parity with enhanced features has long gone. Short of a rebase? But then… what would be different from the already popular branch of MySQL called Percona Server for MySQL?

While there are wins and support from cloud vendors like Amazon AWS RDS and Microsoft Azure, you’ll notice that they offer both MySQL and MariaDB Server. Google Cloud SQL notably only offers MySQL. IBM may be a sponsor of the MariaDB Foundation, but I don’t see their services like Compose offering anything other than MySQL (with group replication, nonetheless!). Platinum member Alibaba Cloud offers MySQL and PostgreSQL. However, Tencent seems to suggest that MariaDB is coming soon. Naturally, one interesting statistic to watch would be user uptake.

Events

From an events standpoint, the Percona Live 2018 Call for Papers has been extended to January 12, 2018. We expect an early announcement of maybe ten talks in the week of January 5. Please submit to the CFP. Have you got your tickets yet? Nab them during our Percona Live 2018 super saver registration while they are at their best price!

FOSDEM has Sveta and me speaking in the MySQL and Friends DevRoom, and we have more good news: Peter Zaitsev will also be at FOSDEM, speaking in the main track. We’ll have plenty of schwag at the stand, too.

I think it’s important to take note of the updates to Percona bug tracking: yes, it’s JIRA all the way. It would be good for everyone to also start looking at how the sausage is made.

Dgraph, a “distributed fast graph database”, just raised $3m and released 1.0. Have you used it?

On a lighter note, there seems to be a tweet going around by many, so I thought I’d share it here. Merry Christmas and Happy Holidays.

He’s making a database
He’s sorting it twice
SELECT * FROM girls_boys WHERE behaviour = 'nice'
SQL Claus is coming to town!

Releases

Link List

Upcoming appearances

  • FOSDEM 2018 – Brussels, Belgium – February 3-4 2018
  • SCALE16x – Pasadena, California, USA – March 8-11 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

December 21, 2017

Three P’s of a Successful Black Friday: Percona, Pepper Media Holding, and PMM


As we close out the holiday season, let’s look at some data that tells us how to guarantee a successful Black Friday (from a database perspective).

There are certain peak times of the year where companies worldwide hold their breath in the hope that their databases do not become overloaded or unresponsive. A large percentage of yearly profits are achieved in a matter of hours during peak events. It is critical that the database environment remains online and responsive. According to a recent survey, users will not wait more than 2.5 seconds for a site to load before navigating elsewhere. Percona has partnered with many clients over the years to ensure success during these critical events. Our goal is always to provide our clients with the most responsive, stable open-source database environments in order to meet their business needs.

First Stop: Germany

In this blog post, we are going to take a closer look at what happened during Black Friday for a high-demand, high-traffic, business-critical application. Pepper Media Holding runs global deals sites where users post and vote on top deals on products in real-time. To give you a better idea of what the user sees, there is a screenshot below from mydealz.de, Pepper Media Holding’s German branch.

As you can imagine, Black Friday results in a huge spike in traffic and user contribution. In order to ensure success during these crucial times, Pepper Media Holding utilizes Percona’s fully managed service offering. Percona’s Managed Services team has become an extension of Pepper Media Holding’s team by helping plan, prepare, and implement MySQL best-practices across their entire database environment.

Pepper Media Holding and Percona thought it would be interesting to reflect on Black Friday 2017 and how we worked together to flourish under huge spikes in query volume and user connections.

Below is a graph of MySQL query volume for Germany servers supporting the mydealz.de front-end. This graph is taken from Percona’s Managed Service Team’s installation of Percona Monitoring and Management (PMM), which they use to monitor Pepper Media’s environment.

As expected, MySQL query volume peaked shortly before and during midnight local time. It spiked again early in the morning as users were waking up, and the traffic waned throughout the day. The most interesting data point is the spike from 5 AM to 9 AM, which saw an 800% increase from the post-midnight dip. The sustained two-day traffic surge was on average a 200% increase compared to normal day-to-day query traffic hitting the database.

For more statistics on how mydealz.de fared from a front-end and user perspective, visit Pepper Media Holding’s newsroom, where they give a breakdown of various statistics related to website traffic during Black Friday.

Next Stop: United Kingdom

Another popular Pepper Media Holding branch is in the United Kingdom – better known as HotUKDeals. HotUKDeals hosts user-aggregated and voted-on deals for UK users. This is the busiest Pepper Media Holding database environment on average. Below is a screenshot of the user interface.

The below graphs are from our Managed Service Team’s Percona Monitoring and Management installation and representative of the UK servers supporting the HotUKDeals website traffic.

The first graph we are taking a look at is MySQL Replication Delay. As you can see, the initial midnight wave of Black Friday deals caused negligible replica delay. The Percona Monitoring and Management MySQL Replication Delay graph is based on seconds_behind_master, which is an integer value only. This means the delay was somewhere between 0 and 1 most of the time; only once did it go between 1 and 2 over the entire course of Black Friday traffic.

The graphs below highlight the MySQL traffic seen on the UK servers during the Black Friday traffic spike. One interesting note is the gradual lead-up to the midnight Black Friday spike: it looks like Black Friday is overstepping its boundaries into Gray Thursday. The traffic spikes here mimic the ones we saw in Germany: an initial spike at midnight on Black Friday, and then another as shoppers wake up for their day. The UK servers saw a 361% spike in traffic on the morning of Black Friday.

MySQL connections also saw an expected and significant spike during this time. Neglecting to consider the max_connections system parameter during an event rush might result in “ERROR 1040 (HY000): Too many connections.” However, our CEO, Peter Zaitsev, cautions against absent-mindedly setting this parameter at an unreachable level just to avoid this error. In a blog post, he explains best practices for this scenario.

The MySQL query graph below shows a 400% spike in MySQL queries during the peak Black Friday morning traffic rush. The average number of queries hitting the database over this two-day period was significantly higher than normal – approximately 183%.

Conclusion

Percona reported no emergencies during the Black Friday period for its Managed Service customers – including Pepper Media Holding. We saw similarly high traffic spikes among our customers during this 2017 Black Friday season. I hope that this run-down of a few PMM graphs taken during Pepper Media Holding’s Black Friday traffic period was informative and interesting. Special thanks to Pepper Media Holding for working with us to create this blog post.

Note: Check out our Pepper Media case study on how Percona helps them manage their database environment.

If you would like to further explore the graphs and statistics that Percona Monitoring and Management has to offer, we have a live demo available at https://pmmdemo.percona.com. To discuss how Percona Managed Services can help your database thrive during event-based traffic spikes (and all year round), please call us at +1-888-316-9775 (USA), +44 203 608 6727 (Europe), or have us contact you.

December 20, 2017

Percona Live 2018 Call for Papers Deadline Extended to January 12, 2018

Percona is extending the Percona Live 2018 call for papers deadline to January 12, 2018!

Percona’s gift to you this holiday season is the gift of time – submit your speaking topics right up until January 12, 2018!

As the year winds down, we received many requests to extend the Percona Live Open Source Database Conference 2018 call for papers deadline. Since many speakers wanted to submit during the week they’re planning vacations (from Christmas until New Year’s Day), we realized that December 22 was too soon.

If you haven’t submitted already, please consider doing so. Speaking at Percona Live is a great way to talk about what you’re doing, build up your personal and company brands, and get collaborators to your project. If selected, all speakers receive a full complimentary conference pass.

Percona Live 2018 is the destination to share, learn and explore all pertinent topics related to open source databases. The theme for Percona Live 2018 is “Championing Open Source Databases,” with topics on MySQL, MongoDB and other open source databases, including time series databases, PostgreSQL and RocksDB. Session tracks include Developers, Operations, and Business/Case Studies.

Remember, just like last year, we aren’t looking for just MySQL-ecosystem-related talks (that includes MariaDB Server and Percona Server for MySQL). We are actively looking for talks about MongoDB, as well as other open source databases (this is where you can add PostgreSQL, time series databases, graph databases, etc.). That also involves complementary technologies, such as the increasingly important cloud and container solutions like Kubernetes.

Talk about your journey to open source. Describe the technical and business values of moving to or using open source databases. How did you convince your company to make the move? Was there tangible ROI? Share your case studies, best practices and technical knowledge with an engaged audience of open source peers.

We are looking for breakout sessions (25 or 50 minutes long), tutorials (3 hours or 6 hours long), and lightning talks and birds of a feather sessions. Submit as many topics as you think you can deliver well.

The conference itself features one day of tutorials and two days of talks. There will also be exciting keynote talks. Don’t forget that registration is now open, and our Super Saver tickets are the best price you can get (Super Saver tickets are on sale until January 7, 2018).

If your company is interested in sponsoring the conference, please take a look at the sponsorship prospectus.

All in all, submit away, and remember: the Percona Live 2018 call for papers deadline is January 12, 2018. We look forward to seeing you at the conference, April 23-25, 2018, in Santa Clara.

December 20, 2017

Updates to Percona Bug Tracking

We’re completing our move of Percona bug tracking into JIRA, and the drop-dead date is December 28, 2017.

For some time now, Percona has maintained both the legacy Launchpad bug tracking system and a JIRA bug tracking system for some of the newer products. The time has come to consolidate everything into the JIRA bug tracking system.

Assuming everything goes according to schedule, on December 28, 2017, we will copy all bug reports in Launchpad into the appropriate JIRA projects (with the appropriate issue state). Each new JIRA issue will link to the original Launchpad issue, and the new JIRA issue’s link will be added to the original Launchpad issue. Once this is done, we will turn off editing on the Launchpad projects.

Q&A

Which Launchpad projects are affected?
Why are you copying all closed issues from Launchpad?

Copying all Launchpad issues to JIRA makes JIRA the one place to search for previously reported issues, instead of having to search for old issues in Launchpad and new issues in JIRA.

What should I do now to prepare?

Go to https://jira.percona.com/ and create an account.

Thanks for reporting bugs, and post any questions in the comments section.

December 19, 2017

Percona Monitoring and Management a Trend Setting Product for 2018

DBTA Trend-Setting Products in Data and Information Management for 2018

…but don’t just take our word for it…


The online industry news magazine Database Trends and Applications (DBTA) has included Percona Monitoring and Management in its roundup of Trend-Setting Products in Data and Information Management for 2018.

“…each year, Database Trends and Applications magazine looks for offerings that promise to help organizations derive greater benefit from their data, make better decisions, work more efficiently, achieve greater security, and address emerging challenges.” [December 7, 2017]

As well as including PMM in its selection of products to watch in 2018, DBTA has included a product spotlight penned by Michael Coburn, Product Manager for PMM. This provides a synopsis of the features and benefits of the product for DBAs.

While here at Percona we know that the PMM team is doing excellent work, it’s great that they’ve received independent recognition. This news marks a welcome end to a busy and productive year.

The team isn’t resting on its laurels though! We are looking forward to producing further enhancements to PMM throughout 2018.

Watch this space!

About Percona Monitoring and Management

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. PMM incorporates some best-of-breed open source tools to provide a comprehensive database monitoring and management facility. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL, MariaDB® and MongoDB servers to ensure that your data works as efficiently as possible. Recent developments have included enhanced support for Amazon RDS and Amazon Aurora.

December 18, 2017

Percona Monitoring and Management 1.5.3 Is Now Available

Percona announces the release of Percona Monitoring and Management 1.5.3. This release contains fixes for bugs found after the release of Percona Monitoring and Management 1.5.2, as well as some important fixes and improvements not related to the previous release.

Improvements

  • PMM-1874: The read timeout of the proxy server (/prometheus) has been increased from the default of 60 seconds to avoid nginx gateway timeout errors when loading data-rich dashboards.
  • PMM-1863: We improved our handling of temporary Grafana credentials.

Bug fixes

  • PMM-1828: On CentOS 6.9, pmm-admin list incorrectly reported that no monitoring services were running.
  • PMM-1842: It was not possible to restart the mysql:queries monitoring service after PMM Client was upgraded from version 1.0.4.
  • PMM-1797: It was not possible to update the CloudWatch data source credentials.
  • PMM-1829: When the user clicked a link in the Query Abstract column, an outdated version of QAN would open.
  • PMM-1836: PMM Server installed in a Docker container could not be started if the updating procedure had been temporarily interrupted.
  • PMM-1871: In some cases, RDS instances could not be discovered.
  • PMM-1845: Converted FLUSH SLOW LOGS to FLUSH NO_WRITE_TO_BINLOG SLOW LOGS so that a GTID event isn’t created.
  • PMM-1816: Fixed a rendering error in Firefox.

December 15, 2017

MongoDB 3.6 Security Improvements

In this blog post, we’ll look at MongoDB 3.6 security improvements.

As we’ve already discussed in this series, MongoDB 3.6 has a number of new features, but we have talked less about the new security enhancements in this release. The MongoDB 3.6 security features are particularly exciting. Some simply promote packaging defaults into binary defaults, but others are evidence that MongoDB is growing up and taking in what security professionals have been asking for. We can break things down into two major areas: network listening and more restrictive access controls. Hopefully, as a MongoDB user, you are excited by this.

Network Listening

This is a purposely vague topic. As you know, MongoDB is just one of many NoSQL databases that have been hit with hacks, theft and even ransomware events. It is true this was largely due to deployed systems not following best practices, but MongoDB’s binaries did nothing to help the situation. In point of fact, both MongoDB Inc.’s and Percona Server for MongoDB’s packages ship with configurations that are more restrictive than the bare binaries. Percona even has a tool that prints a warning that the installation might not be secured if it detects a public IP bound to the machine.

It should be noted, for anyone coming from MySQL or the RDBMS world, that this differs from having user:password@host ACL restrictions. Even with this setting, any user can connect from any whitelisted IP address. So it’s still important to use a separate password per environment to prevent accidental issues where a developer thinks they are on a test box and instead drops a database in production. However, all is not lost: there are separate improvements on that front also.

You might ask, “How do I configure the bind IP setting?” (especially if you haven’t been doing so). Let’s assume you have some type of NAT in your system, and 10.10.10.10 is a private NAT address that directly maps to a dedicated public one (your DB nodes are in your own data center, and your applications are somewhere else). We will also assume that while you don’t have a VPN, you do have firewall rules such that only your application hosts are allowed into the database. Not perfect, but better than some things we have seen in the news.

To enable listening on that port, you have two methods: command line and configuration file.

The command line method looks like this:

mongod --bind-ip 10.10.10.10 --fork --logpath /var/log/mongod/mongod.log --port 17001

The configuration file method, on the other hand, uses YAML format:

net:
   port: 17001
   bindIp: 10.10.10.10
   bindIpAll: false

Please note that you should almost never set bindIpAll, as it forces the old behavior of listening on everything. Instead, use a comma-separated list, like “10.10.10.10, 68.82.123.213, 192.168.10.2”.

User Restrictions – CIDR/IP Whitelisting

Just now we talked about how bindIp works. It applies to the server as a whole, not per user (per-user host restrictions are something many RDBMS systems have long enjoyed). David Murphy discussed this in his MongoDB 3.6 blog, noting that MySQL has had it since at least 1998. Not only has MongoDB finally added host control to its abilities, but it has also raised the game using the power of the document model. Typically in the MySQL world, you define a user like:

GRANT ALL PRIVILEGES ON dbTest.* To 'user'@'hostname' IDENTIFIED BY 'password'; 

Not a bad method really, but what if it allowed networks or specific IPs for a single user? The new MongoDB restrictions are a great first step, but there is more to go. For now, you can only say which sources and destinations a user can map to. You still need to make a user per environment, as you can’t define roles inside the restriction arrays.

Let me demonstrate with some examples:

rs1:PRIMARY> devUser={
	"user" : "example_devuser",
	"roles" : [
		{
			"role" : "read",
			"db" : "foo"
		}
	],
	"authenticationRestrictions" : [
		{
			"clientSource" : [
				"10.30.0.0/16"
			],
			"serverAddress" : [
				"10.11.0.0/16"
			]
		}
	],
	"pwd" : "changeme"
}
rs1:PRIMARY> prodUser={
	"user" : "example_produser",
	"roles" : [
		{
			"role" : "readWrite",
			"db" : "foo"
		}
	],
	"authenticationRestrictions" : [
		{
			"clientSource" : [
				"10.11.0.0/16"
			],
			"serverAddress" : [
				"10.11.0.0/16"
			]
		}
	],
	"pwd" : "changeme"
}
rs1:PRIMARY> db.createUser(prodUser)
Successfully added user: {
	"user" : "example_produser",
	"roles" : [
		{
			"role" : "readWrite",
			"db" : "foo"
		}
	],
	"authenticationRestrictions" : [
		{
			"clientSource" : [
				"10.11.0.0/16"
			],
			"serverAddress" : [
				"10.11.0.0/16"
			]
		}
	]
}

We strongly suggest you start using both of these MongoDB 3.6 security features to get the best possible security, by ensuring only the application host can use the application user and a developer can only use their own user. Additionally, look into using Ansible, Chef, or similar tools to make deploying the restrictions easy.

Hopefully, this will save us all from a good number of accidental dropCollection calls or ensureIndex builds happening in production rather than in development environments.

December 6, 2017

MongoDB 3.6 Community Is Here!

By now you surely know that MongoDB 3.6 Community became generally available on December 5, 2017. Of course, this is great news: it has some big ticket items that we are all excited about! But I also want to talk about my general thoughts on this release.

It is always a good idea for your internal teams to study and consider new versions. This is crucial for understanding if and when you should start using a release. After deciding to use it, there is the question of whether you want your developers using the new features (or whether those features are not yet suitably implemented to be used).

So what is in MongoDB 3.6 Community? Check it out:

  • Sessions
  • Change Streams
  • Retryable Writes
  • Security Improvement
  • Major love for Arrays in Aggregation
  • A better balancer
  • JSON Schema Validation
  • Better Time management
  • Compass is Community
  • Major WiredTiger internal overhaul

As you can see, this is an extensive list. And there are 1400+ implemented JIRA tickets on the server alone (not even counting the WiredTiger project).

To that end, I thought we should break my review into a few areas. We will have blog posts out soon covering these areas in more depth. This blog is more about my thoughts on the topics above.

Expected blogs (we will link to them as they become available):

  • Change Streams –  Nov 11 2017
  • Sessions
  • Transactions and new functions
  • Aggregation improvements
  • Security Controls to use ASAP
  • Other changes from diagnostics to Validation

Today let’s quickly recap the above areas.

Sessions

We will have a blog on this (it has some history). This move has been long-awaited by anyone using MongoDB before 2.4. There were connection changes in that release that made things complicated for load balancers, due to the inability to “re-attach” to the same session. If you were not careful in 2.4+, you could easily use a load balancer and get very odd behavior: anything from broken queries to invisibly incomplete getMores (big queries).

Sessions aim to change this. Now, the client drivers know about the internal session to the database used for reads and writes. Better yet, MongoDB tracks these sessions, so that even if an election occurs, when your driver fails over, so will the session. For anyone whose applications handled failovers badly, this is an amazing improvement. Some of the other new features that make 3.6 a solid release require this new behavior.

Does this mean this solution is perfect and works everywhere? No! Like other newer features we have seen, it leaves MMAPv1 out in the cold, since without major re-work MMAPv1 cannot support logic that is native to WiredTiger. Talking with engine engineers, it’s clear that some of the logic behind the underlying database snapshots and rollbacks added here can cause issues (which we will talk about more in the transactions blog).

Change streams

This is one of the most talked about (but most specialized) features; I can see its appeal for a very specific use case, though it is rather limited. However, I am getting ahead of myself! Let’s talk about what it is first and where it came from.

Before this feature, people streamed data out of MySQL and MongoDB into Elastic and Hadoop stacks. I mention MySQL, as this was the primer for the initial method MongoDB used. The tools would read the MySQL binlogs – typically saved off somewhere – and then apply those operations to some other technology. When they went to do the same thing in MongoDB, there was a big issue: if writes are not sent to the majority of the nodes, it can cause a rollback. In fact, such rollbacks were not uncommon. The default was w:1 (meaning the primary only needed to have the write), which resulted in data existing in Elastic that had been removed from MongoDB. Hopefully, everyone can see the issue here, and why a better solution was needed than just reading the oplog.

Enter $changeStream, which in the shell has a helper called .watch(). This is a method that uses a multi-node consistent read to ensure the data is on the majority of nodes before the command returns it in a tailable cursor. For this use case, this is amazing, as it gives a data-replicating tool much more assurance that the data is not going to vanish.

$changeStream is not without limits: if we have 10k collections and we want to watch them all, that is 10k separate operations and cursors. This puts a good deal of strain on the systems, so MongoDB Inc. suggests you do this on no more than 1000 namespaces at a time to prevent overhead issues.
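As a minimal sketch of the helper (the database and collection names are hypothetical, and a replica set is required):

var stream = db.getSiblingDB("test").inventory.watch();
while (!stream.isExhausted()) {
	if (stream.hasNext()) {
		// each event describes one majority-committed change on test.inventory
		printjson(stream.next());
	}
}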

Sadly, it is not currently possible to take a mongod-wide snapshot to support this; under the hood, the snapshot is taken per namespace, directly inside WiredTiger’s engine. So for anyone with a very large collection count, this will not be your silver bullet yet. It also means streams between collections and databases are not guaranteed to be in sync, which could be an issue even for someone with a smaller number of namespaces who expects that. Please don’t get me wrong: it’s a step in the correct direction, but it falls short.

I had very high hopes for this to simplify backups in MongoDB. Percona Labs’s GitHub has a tool called MongoDB-Consistent-Backup, which tails multiple oplogs to get a consistent sharded backup without the need to pay for MongoDB’s backup service or use the complicated design that is Ops Manager (when you host it yourself). Due to the inability to do a system-wide change stream, this type of tool still needs to use the oplog. If you are not using w:majority, it could trigger a failure if you have an election or a rollback occurs. Still, I have hopes this is something that can be considered in the future to make things better for everyone.

Retryable writes

Unlike change streams, this feature is much more helpful to the general MongoDB audience, and I am very excited about it. If you have not watched this video, please do so right now! Samantha does a good job explaining the issue and solution in detail. For now, just know the problem has been that when a write hit an issue (network, app shutdown, DB shutdown, election), you had no way to know if the write succeeded or not. This unknown situation was terrible for a developer, who would not know whether to run the command again. That is especially true if you have an ordering system and you’re trying to remove stock from your inventory. Sessions, as discussed before, allow the client to reconnect after a broken connection and keep getting results, so it knows what did or didn’t happen. To me, this is the second best feature of 3.6; only security is more important to me personally.
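Enabling the feature is driver-side. As a hedged sketch for any driver implementing the 3.6 retryable writes specification, it is switched on with the retryWrites connection string option (the hostnames here are hypothetical); the 3.6 mongo shell has a corresponding --retryWrites flag:

mongodb://mongodb1.example.com,mongodb2.example.com/test?replicaSet=rs0&retryWrites=true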

Security improvement

Speaking of security, there is one change that the security community wanted (though I don’t think it is that big of a deal). For years now, the MongoDB packaging for all OSs (and even the Percona Server for MongoDB packaging) would by default limit the bindIp setting to localhost. This was to prevent unintended situations where you had a database open to the public. With 3.6, the binaries now also default to this. So, yes, it will help some. But I would (and have) argued that when you install a database from binaries or source, you are taking more ownership of its setup compared to using Docker, Yum or Apt.

The other major move forward, however, is something I have been waiting for since 2010. Seriously, I am not joking! It offers the ability to limit users to specific CIDR or IP address ranges. Please note MySQL has had this since at least 1998. I can’t recall if it’s just always been there, so let’s say two decades.

This is also known as “authenticationRestrictions”, and it’s an array you can put into the user document when creating a user. The manual describes it as:

The authentication restrictions the server enforces on the created user. Specifies a list of IP addresses and CIDR ranges from which the user is allowed to connect to the server or from which the server can accept users.

I cannot overstate how huge this is. MongoDB Inc. did a fantastic job on it. Not only does it support the classic client address matching, it supports an array of these with matching on the server IP/host as well. This means supporting multiple IP segments with a single user is very easy. By extension, I could see a future where you could even limit some actions by range – allowing dev/load-test users to drop collections, but not production apps. While they should have separate users, I regularly see clients who have one password everywhere. That extension would save them from unplanned outages due to the human error of connecting to the wrong thing.

We will have a whole blog talking about these changes, their importance and using them in practice. Yes, security is that important!

Major love for array and more in Aggregation

This one is a bit easier to summarize. Arrays and dates in particular get some much-needed aggregation love. I could list all the new operators here, but I feel that is better served in a follow-up blog where we talk about each operator and how to get the most out of it. I will say my favorite new option is hint for aggregation. Finally, I can try to steer the query plan a bit when the optimizer is making bad decisions, which sometimes happens in any technology.
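As a short sketch of that option (this assumes a collection with an { x: 1 } index):

db.test.aggregate(
	[ { $match: { x: { $gte: 0 } } } ],
	// hint is a new aggregate option in 3.6; this forces the { x: 1 } index
	{ hint: { x: 1 } }
)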

A better balancer

Like many other areas, there was a good deal that went into balancer improvements. However, there are a few key things that continue the work of 3.4’s parallel balancing improvements.

Some of it makes a good deal of sense for anyone in a support role, such as FTDC now also existing on mongos. If you do not know what FTDC is: basically, MongoDB collects some core metrics and state data and puts it into binary files under dbPath for engineers at companies like Percona and MongoDB Inc. to review. That is not to say you can’t use this data too; think of it as a package of performance information for when a bug happens. Other diagnostic improvements include moveChunk providing data in its output about what happened when it ran. Previously you could get this data from the config.changelog or config.actionlog collections on the config servers. Obviously, more and more people are learning MongoDB’s internals, and this should be made more available to them.

Having talked about diagnostic items, let’s move into the operations wheelhouse. The single biggest frustration with sharding and replica sets is their sensitivity to time variations, which causes odd issues even when using ntpd. To this point, as of 3.6 there is now a logical clock in MongoDB. For the geekier in the crowd, this was implemented using a Lamport Clock (there is a great video on them). For the less geeky, think of it as a cluster-wide clock that prevents some of the typical issues related to varying system times. In fact, if you look closely at the oplog record format in 3.6, there is a new wt field for tracking this. Having done that, the team at MongoDB Inc. considered what other things were an issue. At times like elections of the config servers, metadata refreshes did not retry enough times and could cause a mongos to stop working or fail. Those days are gone! Now it will check three times as much, for a total of ten attempts, before giving up. This makes the system much more resilient.

A final change, still somewhat invisible to the user but one that helps make dropping collections more stable, is the removal of the issues MongoDB had with dropping and recreating sharded collections. Your namespaces look as they always have. Behind the scenes, however, they have UUIDs attached, so that if foo.bar is dropped and recreated, it gets a different UUID. This allows for less-blocking drops, and it prevents confusion in a distributed system about whether we are talking about the current or the old collection.

JSON Schema validation

Some non-developers might not know much about something called JSON Schema. It allows you to set rules on schema design more efficiently and strictly than MongoDB’s internal rules. With 3.6, you can use this directly. Read more about it here. Some key points:

  • Describes your existing data format
  • Clear, human- and machine-readable documentation
  • Complete structural validation, useful for:
    • Automated testing
    • Validating client-submitted data

You can even make it reject documents where certain fields are missing. MySQL DBAs might ask why this is a big deal: you could always have a DBA define a schema in an RDBMS, and the point of MongoDB was to be flexible. That’s a fair and correct view. However, the big point of using it is that you can apply it in production, not just in development. This gives developers the freedom to move quickly, while providing operational teams with control methods to catch mistakes or bugs before production is adversely affected. Taking a step back, it’s all about bridging the ravine between control and freedom, so that both camps are happy and able to do their jobs efficiently.

Compass is Community

If you have never used Compass, you might think this isn’t that great; you could use things like RoboMongo and such. You absolutely could, but Compass can do visualization as well as CRUD operations, and it’s a very fluid experience that everyone should know is available for use. This is especially true for QA teams who want to review how often certain types of data are present, or for a business analyst who needs to understand in two seconds what your demographics are.

Major WiredTiger internal overhaul

There is so much here that I recommend any engineer-minded person take a look at this deck, presented by one of the great minds behind WiredTiger. It does a fantastic job explaining all the reasons behind some of the 3.2 and 3.4 scaling issues MongoDB had on WiredTiger. Of particular interest is why it tended to have worse and worse performance as you added more collections and indexes. It then goes into how they fixed those issues:

  • Made evictions smarter, as eviction pressure is not uniform across collections
  • Improved assumptions around the handle cache
  • Made checkpoints better in all ways
  • Aggressively cleaned up old handles

I hope this provides a peek into the good, bad, and ugly in MongoDB 3.6 Community! Please check back as we publish more in-depth blogs on how these features work in practice, and how to best use them.

December 5, 2017

Webinar Wednesday, December 6, 2017: Gain a MongoDB Advantage with the Percona Memory Engine

Join Percona CTO Vadim Tkachenko as he presents Gain a MongoDB Advantage with the Percona Memory Engine on Wednesday, December 6, 2017, at 11:00 am PST / 2:00 pm EST (UTC-8).

Experience: Entry Level to Intermediate

Tags: Developer, DBAs, Operations

Looking for the performance of Redis or Memcache, the expressiveness of the MongoDB query language and simple high availability and sharding? Percona Memory Engine, available as part of Percona Server for MongoDB, has it all!

In this webinar, Vadim explains the architecture of the MongoDB In-Memory storage engine. He’ll also show some benchmarks comparing it to disk-based storage engines and other in-memory technologies.

Vadim will share specific use cases where Percona Memory Engine for MongoDB excels, such as:

  • Caching documents
  • Highly volatile data
  • Workloads with predictable response time requirements

Register for the webinar now.

Vadim Tkachenko, CTO

Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming helps optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products. He also co-authored the book High-Performance MySQL: Optimization, Backups, and Replication 3rd Edition. Previously, he founded a web development company in his native Ukraine and spent two years in the High-Performance Group within the official MySQL support team. Vadim received a BS in Economics and an MS in computer science from the National Technical University of Ukraine.

 
