Jul
28
2021
--

MongoDB: Modifying Documents Using Aggregation Pipelines and Update Expressions

MongoDB Modifying Documents

MongoDB Modifying DocumentsUpdating documents on MongoDB prior to version 4.2 was quite limited. It was not possible to set values to a conditional expression, combine fields, or update a field based on the value of another field on the server-side. Tracing a parallel to the SQL update statements, for example, it wasn’t possible to do something like the following:

Update t1 set t1.f1 = t1.f2 where…

It wasn’t possible to use a conditional expression either, something easily achieved with SQL standards:

UPDATE t1 SET t1.f1 = CASE WHEN f2 = 1 THEN 1 WHEN f2 = 2 THEN 5 END WHERE…

If something similar to both examples above was required and the deployment was 3.4+, probably the usage of $addFields would be an alternative way to accomplish it. However, it would not touch the current document because the $out output destination could only be a different collection.

With older versions, the only way around was creating a cursor with aggregation pipelines and iterating it on the client side. Inside the loop, it was possible to update using the proper $set values. It was a difficult and tedious task, which would result in a full javascript code.

With MongoDB 4.2 and onwards, it is possible to use an aggregation pipeline to update MongoDB documents conditionally, allowing the update/creation of a field based on another field. This article presents some very common/basic operations which are easily achieved with SQL databases.

Field Expressions in Update Commands (v4.2+)

Updating a field with the value of some other field:

This is similar to the classic example of an SQL command: update t1 set t1.f1 = t1.f2 + t1.f3

replset:PRIMARY> db.getSiblingDB("dbtest").colltest2.update({_id:3},[{$set:{result:{$add: [ "$f2", "$f3" ] } }} ]);
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest2.find({_id:3});
{ "_id" : 3, "f1" : 30, "f2" : 300, "f3" : 3000, "result" : 3300 }

The key point is the “$” on the front of the field names being referenced (“f2” and “f3” in this example). These are the simplest type of field path expression, as they’re called in the MongoDB documentation. You’ve probably seen them in the aggregation pipeline before, but it was only in v4.2 that you could also use them in a normal update command.

Applying “CASE” conditions:

It is quite suitable now to determine conditions for a field value while updating a collection:

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({_id:3});
{ "_id" : 3, "grade" : 8 }


replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.update(
  { _id: 3}, 
  [
    { $set: {result : { 
      $switch: {branches: [
        { case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, 
        { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, 
        { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } 
      ] } 
    } } } 
  ] 
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })


replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({ _id: 3});
{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

Adding new fields for a specific filtered doc:

Let’s say that you want to stamp a document with the updated date = NOW and add a simple comment field:

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.find({_id:3})
{ "_id" : 3, "description" : "Field 3", "rating" : 2, "updt_date" : ISODate("2021-05-06T22:00:00Z") }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.update( 
  { _id: 3 }, 
  [ 
    { $set: { "comment": "Comment3", mod_date: "$$NOW"} } 
  ] 
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.find({_id:3})
{ "_id" : 3, "description" : "Field 3", "rating" : 2, "updt_date" : ISODate("2021-05-06T22:00:00Z"), "comment" : "Comment3", "mod_date" : ISODate("2021-07-05T18:48:44.710Z") }

Reaching several Docs with the same expression:

It is possible now to either use the command updateMany() and reach multiple docs with the same pipeline. 

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({});
{ "_id" : 1, "grade" : 8}
{ "_id" : 2, "grade" : 5}
{ "_id" : 3, "grade" : 8}

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.updateMany({}, 
  [
    { $set: {result : { $switch: {branches: [{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } ] } } } } 
  ] 
)
{ "acknowledged" : true, "matchedCount" : 3, "modifiedCount" : 2 }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({});
{ "_id" : 1, "grade" : 8, "result" : "PASSED" }
{ "_id" : 2, "grade" : 5, "result" : "NOPE" }
{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

Or use the command option { multi: true } if you want to stick to using the original db.collection.update() command. Note that the default is false, which means that only the first occurrence will be updated.

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.update({}, [{ $set: {result : { $switch: {branches: [{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } ] } } } } ] )
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.find({});
{ "_id" : 1, "grade" : 8, "result" : "PASSED" }
{ "_id" : 2, "grade" : 5 }
{ "_id" : 3, "grade" : 8 }

When specifying {multi:true} the expected outcome is finally achieved:

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.update({}, [{ $set: {result : { $switch: {branches: [{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } ] } } } } ],{multi:true} )
WriteResult({ "nMatched" : 3, "nUpserted" : 0, "nModified" : 2 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.find({});
{ "_id" : 1, "grade" : 8, "result" : "PASSED" }
{ "_id" : 2, "grade" : 5, "result" : "NOPE" }
{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

Update by $merge Stage in the Aggregation Pipeline

Prior to version 4.2, addressing the result of an aggregate pipeline to a new collection was achieved by using $out. Starting on version 4.2 it is possible to use $merge which is way more flexible considering that while using $out, the entire collection will be replaced, and with merge, it is possible to replace a single document and a few or more things. You may want to refer to the comparison table described here:

https://docs.mongodb.com/manual/reference/operator/aggregation/merge/#std-label-merge-out-comparison

With MongoDB 4.4 and onwards, it is allowed to update a collection directly on the aggregate pipeline through the $merge stage. The magic happens after determining the output collection with the same name as the one being aggregated. The example below illustrates how to flag the max grade of the student Rafa in math class:

  • Original document
replset:PRIMARY> db.getSiblingDB("dbtest").students2.find({"name": "Rafa","class":"math"})
{ "_id" : ObjectId("6100081e21f08fe0d19bda41"), "name" : "Rafa", "grades" : [ 4, 5, 6, 9 ], "class" : "math" }

  • The aggregation pipeline
replset:PRIMARY> db.getSiblingDB("dbtest").students2.aggregate( [{ $match : { "name": "Rafa","class":"math" } }, {$project:{maxGrade:{$max:"$grades"}}}, {$merge : { into: { db: "dbtest", coll: "students2" }, on: "_id",  whenMatched: "merge", whenNotMatched:"discard"} } ]);

  • Checking the result
replset:PRIMARY> db.getSiblingDB("dbtest").students2.find({"name": "Rafa","class":"math"})
{ "_id" : ObjectId("6100081e21f08fe0d19bda41"), "name" : "Rafa", "grades" : [ 4, 5, 6, 9 ], "class" : "math", "maxGrade" : 9 }

Note that the maxGrade field was merged into the doc, flagging that the max grade achieved by that student in math was 9.

Watch out: behind the scenes, the merge will trigger an update against the same collection. If that update changes the physical location of the document, the update might revisit the same document multiple times or even get into an infinite loop (Halloween Problem)

The other cool thing is using the $merge stage to work exactly how a SQL command INSERT AS SELECT works (and this is possible with MongoDB 4.2 and onwards). The example below demonstrates how to fill the collection colltest_reporting with the result of an aggregation hit against colltest5.

replset:PRIMARY> db.getSiblingDB("dbtest").colltest5.aggregate( [{ $match : { class: "A" } }, { $group: { _id: "$class",maxGrade: { $max: "$grade" } }},  {$merge : { into: { db: "dbtest", coll: "colltest_reporting" }, on: "_id",  whenMatched: "replace", whenNotMatched: "insert" } } ] );
replset:PRIMARY> db.getSiblingDB("dbtest").colltest_reporting.find()
{ "_id" : "A", "maxGrade" : 8 }

Conclusion

There are plenty of new possibilities which will make a developer’s life easier (especially the life of those developers who are coming from SQL databases) considering that the aggregation framework provides several operators and various different stages to play. Although, it is important to highlight that the complexity of a pipeline may incur performance degradation (that may be a topic for another blog post). For more information on updates with aggregation pipelines, please refer to the official documentation.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com