The flexibility of MongoDB as a schemaless database is one of its strengths. In early versions, it was left to application developers to ensure that any necessary data validation was implemented. With the introduction of the JSON Schema Validator, there are new techniques to enforce data integrity in MongoDB. In this article, we use examples to show you how to use the JSON Schema Validator to introduce validation checks at the database level, and we consider the pros and cons of doing so.
Why validate?
MongoDB is a schemaless database. This means that we don’t have to define a fixed schema for a collection. We just need to insert a JSON document into a collection and that’s all. Documents in the same collection can have completely different sets of fields, and the same field can even have different types in different documents: it can be a string in some documents and a number in others.
The schemaless design gives MongoDB great flexibility and the capability to adapt the database to the changing needs of applications. This flexibility is arguably one of the main reasons to use MongoDB. Relational databases are not so flexible: you always need to define a schema first. Then, when you need to add new columns, create new tables, or change the existing structure to respond to the needs of the application, it can sometimes be a very hard task.
The real world can often be messy and MongoDB can really help, but in most cases the real world requires some kind of backbone architecture too. In real applications built on MongoDB there is almost always some kind of “fixed schema” or set of validation rules for collections and documents. It’s technically possible to have two documents in the same collection that represent two completely different things, but in most cases that doesn’t make sense for the application.
The arguments for enforcing a schema on the data are well known: schemas maintain structure, giving a clear idea of what is going into the database, reducing preventable bugs, and allowing for cleaner code. Schemas are a form of self-documenting code, as they describe exactly what type of data something should be, and they let you know what checks will be performed. It’s good to be flexible, but behind the scenes we need some firm rules.
So, what we need to do is find a balance between flexibility and schema validation. In real world applications, we need to define a sort of “backbone schema” for our data while retaining the flexibility to handle specific peculiarities. In the past, developers implemented schema validation in their applications, but starting from version 3.6, MongoDB supports the JSON Schema Validator. We can rely on it to define a fixed schema and validation rules directly in the database and free the applications from having to take care of it.
Let’s have a look at how it works.
JSON Schema Validator
In fact, document validation was already available in version 3.2, but the JSON Schema Validator introduced in the 3.6 release is by far the best and friendliest way to manage validation in MongoDB.
What we need to do is define the rules using the $jsonSchema operator in the db.createCollection command. The $jsonSchema operator requires a JSON document where we specify all the rules to be applied to each inserted or updated document: for example, which fields are required, what type each field must be, what range of values is allowed, what pattern a specific field must match, and so on.
Let’s have a look at the following example, where we create a people collection and define validation rules with the JSON Schema Validator.
db.createCollection( "people" , { validator: { $jsonSchema: { bsonType: "object", required: [ "name", "surname", "email" ], properties: { name: { bsonType: "string", description: "required and must be a string" }, surname: { bsonType: "string", description: "required and must be a string" }, email: { bsonType: "string", pattern: "^.+\@.+$", description: "required and must be a valid email address" }, year_of_birth: { bsonType: "int", minimum: 1900, maximum: 2018, description: "the value must be in the range 1900-2018" }, gender: { enum: [ "M", "F" ], description: "can be only M or F" } } } }})
Based on what we have defined, only three fields are strictly required in every document of the collection: name, surname, and email. In particular, the email field must match a specific pattern to ensure the content is a valid address. (Note: to properly validate an email address you need a more complex regular expression; here we use a simplified version that just checks for the presence of the @ symbol. A slightly stricter variant is sketched below.) The other fields are not required, but if someone inserts them, the corresponding validation rules apply.
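For reference, here is what a somewhat stricter pattern for the email property could look like. This is only a sketch, not part of the schema above, and it is still far from a complete RFC-compliant email check:

email: {
  bsonType: "string",
  // hypothetical stricter pattern: no whitespace, a single "@", and at least one dot in the domain part
  pattern: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$",
  description: "required and must look like an email address"
}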
Let’s try some example inserts to test whether everything is working as expected.
Insert a document with one of the required fields missing:
MongoDB > db.people.insert( { name : "John", surname : "Smith" } )
WriteResult({
  "nInserted" : 0,
  "writeError" : {
    "code" : 121,
    "errmsg" : "Document failed validation"
  }
})
Insert a document with all the required fields but with an invalid email address:
MongoDB > db.people.insert( { name : "John", surname : "Smith", email : "john.smith.gmail.com" } )
WriteResult({
  "nInserted" : 0,
  "writeError" : {
    "code" : 121,
    "errmsg" : "Document failed validation"
  }
})
Finally, insert a valid document:
MongoDB > db.people.insert( { name : "John", surname : "Smith", email : "john.smith@gmail.com" } )
WriteResult({ "nInserted" : 1 })
Now let’s try some more inserts that also include the optional fields.
MongoDB > db.people.insert( { name : "Bruce", surname : "Dickinson", email : "bruce@gmail.com", year_of_birth : NumberInt(1958), gender : "M" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Corrado", surname : "Pandiani", email : "corrado.pandiani@percona.com", year_of_birth : NumberInt(1971), gender : "M" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Marie", surname : "Adamson", email : "marie@gmail.com", year_of_birth : NumberInt(1992), gender : "F" } )
WriteResult({ "nInserted" : 1 })
The records were inserted correctly because all the rules on the required fields, and on the optional fields, were satisfied. (Note that numbers in the mongo shell are doubles by default, so we use NumberInt() to satisfy the bsonType: "int" constraint on year_of_birth.) Let’s now look at cases where the year_of_birth or gender field is not valid.
MongoDB > db.people.insert( { name : "Tom", surname : "Tom", email : "tom@gmail.com", year_of_birth : NumberInt(1980), gender : "X" } )
WriteResult({
  "nInserted" : 0,
  "writeError" : {
    "code" : 121,
    "errmsg" : "Document failed validation"
  }
})
MongoDB > db.people.insert( { name : "Luise", surname : "Luise", email : "tom@gmail.com", year_of_birth : NumberInt(1899), gender : "F" } )
WriteResult({
  "nInserted" : 0,
  "writeError" : {
    "code" : 121,
    "errmsg" : "Document failed validation"
  }
})
In the first insert, gender is X, but the only valid values are M or F. In the second, year_of_birth is outside the permitted range.
Now let’s try to insert documents with arbitrary extra fields that are not defined in the JSON Schema Validator.
MongoDB > db.people.insert( { name : "Tom", surname : "Tom", email : "tom@gmail.com", year_of_birth : NumberInt(2000), gender : "M", shirt_size : "XL", preferred_band : "Coldplay" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Luise", surname : "Luise", email : "tom@gmail.com", gender : "F", shirt_size : "M", preferred_band : "Maroon Five" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.find().pretty()
{
  "_id" : ObjectId("5b6b12e0f213dc83a7f5b5e8"),
  "name" : "John",
  "surname" : "Smith",
  "email" : "john.smith@gmail.com"
}
{
  "_id" : ObjectId("5b6b130ff213dc83a7f5b5e9"),
  "name" : "Bruce",
  "surname" : "Dickinson",
  "email" : "bruce@gmail.com",
  "year_of_birth" : 1958,
  "gender" : "M"
}
{
  "_id" : ObjectId("5b6b1328f213dc83a7f5b5ea"),
  "name" : "Corrado",
  "surname" : "Pandiani",
  "email" : "corrado.pandiani@percona.com",
  "year_of_birth" : 1971,
  "gender" : "M"
}
{
  "_id" : ObjectId("5b6b1356f213dc83a7f5b5ed"),
  "name" : "Marie",
  "surname" : "Adamson",
  "email" : "marie@gmail.com",
  "year_of_birth" : 1992,
  "gender" : "F"
}
{
  "_id" : ObjectId("5b6b1455f213dc83a7f5b5f0"),
  "name" : "Tom",
  "surname" : "Tom",
  "email" : "tom@gmail.com",
  "year_of_birth" : 2000,
  "gender" : "M",
  "shirt_size" : "XL",
  "preferred_band" : "Coldplay"
}
{
  "_id" : ObjectId("5b6b1476f213dc83a7f5b5f1"),
  "name" : "Luise",
  "surname" : "Luise",
  "email" : "tom@gmail.com",
  "gender" : "F",
  "shirt_size" : "M",
  "preferred_band" : "Maroon Five"
}
As we can see, we have the flexibility to add new fields with no restrictions on the permitted values.
Having a really fixed schema
The behavior we have seen so far, permitting the addition of extra fields that are not in the validation rules, is the default. If we want to be more restrictive and have a really fixed schema for the collection, we need to add additionalProperties: false to the $jsonSchema document in the createCollection command.
In the following example, we create a validator that permits only the listed fields; no extra fields are allowed.
db.createCollection( "people2" , { validator: { $jsonSchema: { bsonType: "object", additionalProperties: false, properties: { _id : { bsonType: "objectId" }, name: { bsonType: "string", description: "required and must be a string" }, age: { bsonType: "int", minimum: 0, maximum: 100, description: "required and must be in the range 0-100" } } } }})
Note a couple of differences:
- we don’t specify a required list this time; additionalProperties: false rejects any field that is not listed under properties, but the listed fields themselves remain optional unless you also add a required array (see the sketch after this list)
- we need to list even the _id field explicitly, otherwise inserts would fail because the automatically generated _id would count as an additional property
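If you want the listed fields to be mandatory as well as exclusive, you can combine a required array with additionalProperties: false. A minimal sketch, using a hypothetical collection name people2_strict that is not part of the examples above:

db.createCollection( "people2_strict", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      // reject any field that is not listed under properties
      additionalProperties: false,
      // and, in addition, make name and age mandatory
      required: [ "name", "age" ],
      properties: {
        _id:  { bsonType: "objectId" },
        name: { bsonType: "string", description: "required and must be a string" },
        age:  { bsonType: "int", minimum: 0, maximum: 100, description: "required and must be in the range 0-100" }
      }
    }
  }
})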
As you can see in the following test on people2, we are no longer allowed to add extra fields: only the name and age fields (plus _id) are accepted.
MongoDB > db.people2.insert( { name : "George", age: NumberInt(30) } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people2.insert( { name : "Maria", age: NumberInt(35), surname: "Peterson" } )
WriteResult({
  "nInserted" : 0,
  "writeError" : {
    "code" : 121,
    "errmsg" : "Document failed validation"
  }
})
In this case we give up the flexibility that is one of the main benefits of having a NoSQL database like MongoDB.
Well, it’s up to you to use it or not. It depends on the nature and goals of your application. I wouldn’t recommend it in most cases.
Add validation to existing collections
So far we have seen how to create a new collection with validation rules. But what about existing collections? How can we add rules to them?
This is quite straightforward. The $jsonSchema syntax remains the same; we just need to use the collMod command instead of createCollection. The following example shows how to create validation rules on an existing collection.
First we create a simple new collection, people3, and insert some documents.
MongoDB > db.people3.insert( { name: "Corrado", surname: "Pandiani", year_of_birth: NumberLong(1971) } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people3.insert( { name: "Tom", surname: "Cruise", year_of_birth: NumberLong(1961), gender: "M" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people3.insert( { name: "Kevin", surname: "Bacon", year_of_birth: NumberLong(1964), gender: "M", shirt_size: "L" } )
WriteResult({ "nInserted" : 1 })
Let’s create the validator.
MongoDB > db.runCommand( {
  collMod: "people3",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "name", "surname", "gender" ],
      properties: {
        name: {
          bsonType: "string",
          description: "required and must be a string"
        },
        surname: {
          bsonType: "string",
          description: "required and must be a string"
        },
        gender: {
          enum: [ "M", "F" ],
          description: "required and must be M or F"
        }
      }
    }
  },
  validationLevel: "moderate",
  validationAction: "warn"
} )
The two new options validationLevel and validationAction are important in this case.
validationLevel can have the following values:
- “off”: validation is not applied
- “strict”: the default value. Validation applies to all inserts and updates
- “moderate”: validation applies to inserts, and to updates of existing documents that already satisfy the validation rules; updates to existing documents that do not satisfy them are not checked
When creating validation rules on existing collections, the “moderate” value is the safest option.
validationAction can have the following values:
- “error”: the default value. The document must pass validation in order to be written
- “warn”: a document that fails validation is written anyway, but a warning message is logged
When adding validation rules to an existing collection, the safest option is “warn”. Since non-conforming documents are still written, it is useful to be able to track them down afterwards, as sketched below.
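One way to find the documents that do not satisfy the new rules is to use $jsonSchema as a query operator (available since MongoDB 3.6) negated with $nor. This is just a sketch that repeats the schema we defined above for people3:

// find the documents in people3 that do NOT match the validation schema
db.people3.find( {
  $nor: [ {
    $jsonSchema: {
      bsonType: "object",
      required: [ "name", "surname", "gender" ],
      properties: {
        name:    { bsonType: "string" },
        surname: { bsonType: "string" },
        gender:  { enum: [ "M", "F" ] }
      }
    }
  } ]
} )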
These two options can also be used with createCollection. We didn’t use them earlier because the default values are fine in most cases.
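For completeness, here is a sketch of what passing these options to createCollection looks like. The collection name people4 and its minimal schema are just for illustration:

db.createCollection( "people4", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "name" ],
      properties: {
        name: { bsonType: "string", description: "required and must be a string" }
      }
    }
  },
  // explicit values; these are also the defaults
  validationLevel: "strict",
  validationAction: "error"
})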
How to investigate a collection definition
If we want to see how a collection was defined, and in particular what its validation rules are, we can use the db.getCollectionInfos() command. The following example shows how to investigate the “schema” we have created for the people collection.
MongoDB > db.getCollectionInfos( { name: "people" } )
[
  {
    "name" : "people",
    "type" : "collection",
    "options" : {
      "validator" : {
        "$jsonSchema" : {
          "bsonType" : "object",
          "required" : [ "name", "surname", "email" ],
          "properties" : {
            "name" : {
              "bsonType" : "string",
              "description" : "required and must be a string"
            },
            "surname" : {
              "bsonType" : "string",
              "description" : "required and must be a string"
            },
            "email" : {
              "bsonType" : "string",
              "pattern" : "^.+@.+$",
              "description" : "required and must be a valid email address"
            },
            "year_of_birth" : {
              "bsonType" : "int",
              "minimum" : 1900,
              "maximum" : 2018,
              "description" : "the value must be in the range 1900-2018"
            },
            "gender" : {
              "enum" : [ "M", "F" ],
              "description" : "can be only M or F"
            }
          }
        }
      }
    },
    "info" : {
      "readOnly" : false,
      "uuid" : UUID("5b98c6f0-2c9e-4c10-a3f8-6c1e7eafd2b4")
    },
    "idIndex" : {
      "v" : 2,
      "key" : {
        "_id" : 1
      },
      "name" : "_id_",
      "ns" : "test.people"
    }
  }
]
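Since db.getCollectionInfos() returns an array, you can also extract just the validator in the shell; a small usage sketch:

// keep only the validator definition of the people collection
db.getCollectionInfos( { name: "people" } )[0].options.validator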
Limitations and restrictions
Validators cannot be defined for collections in the following databases: admin, local, config.
Validators cannot be defined for system.* collections.
A limitation of the current implementation of the JSON Schema Validator is that the error messages don’t help you understand which of the rules the document failed to satisfy. You have to work that out manually with some tests, and that’s not so easy when dealing with complex documents. Having more specific error strings, ideally derived from the validator definition, would be very useful when debugging application errors and warnings. This is definitely something that should be improved in future releases.
While waiting for improvements, someone has developed a wrapper for the mongo client that produces more descriptive error strings. You can have a look at https://www.npmjs.com/package/mongo-schemer. You can test it and use it, but pay attention to the caveat: “Running in prod is not recommended due to the overhead of validating documents against the schema“.
Conclusions
Doing schema validation in the application remains, in general, a best practice, but the JSON Schema Validator is a good tool for enforcing validation directly in the database.
So, even though it needs some improvements, the JSON Schema feature is good enough for most common cases. We suggest testing it and using it when you really need to create a backbone structure for your data.