Dec 13, 2016

MongoDB 3.4: Facet Aggregation Features and SERVER-27395 Mongod Crash


This blog post discusses the MongoDB 3.4 GA facet aggregation features and the SERVER-27395 mongod crash bug.

As you may have heard, in late November MongoDB 3.4 GA was released. One feature that stuck out for me, a Lucene enthusiast, was the addition of powerful grouping and faceted search features in MongoDB 3.4.

Faceted Search

For those unfamiliar with the term faceted search, this is a way of grouping data using one or many different grouping criteria over a large result set. It’s a tough idea to define specifically, but the aim of a faceted search is generally to show the most relevant information possible to the user and allow them to further filter what is usually a very large result for a given search criteria.

The most common day-to-day example of a faceted search is searching for a product on an e-commerce website such as eBay or Amazon. Because e-commerce sites must present a massive range of items to users who often provide limited search criteria, it is rare to see an online store today that does not have many “filters” on the right side of its website to further narrow down a given product search.

Here is an example of me searching the term “mongodb” on a popular auction site:

While this may seem like a specific search to some, at large volume this search term might not immediately show something relevant to some users. What if the user only wants a “used” copy of a MongoDB book from a specific year? What if the user was looking for a MongoDB sticker and not a book at all? This is why you’ll often see filters alongside search results (which we can call “facets”) showing item groupings such as different store departments, different item conditions (such as used/new), publication years, price ranges, review ratings, etc.

In some traditional databases, to get this kind of result we might need to issue many different expensive “GROUP BY” queries that could be painful for a database to process. Each of these queries would independently scan data, even if all queries are summarizing the same “result set.” This is very inefficient. A faceted search offers powerful groupings using a single operation on result data.

When I searched for “mongodb”, under a faceted search model the page of items (in this case MongoDB books) and all the different groupings by department, condition, price, etc., are produced by a single operation in one “pass” over the data. The result of a faceted search contains the items matching the search criteria AND the grouping results for those matched items in a single response.
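
To make the “one pass” idea concrete, here is a minimal sketch in plain JavaScript (run under Node.js, not against MongoDB). The sample documents, the field names and the facetCounts helper are all made up for illustration; the point is only that several groupings can be counted in a single loop over the matched items and returned together with them.

```javascript
// Hypothetical search results; titles and field names are illustrative only.
var searchResults = [
  { title: "MongoDB book", condition: "used", department: "books" },
  { title: "MongoDB guide", condition: "new", department: "books" },
  { title: "MongoDB sticker", condition: "new", department: "stickers" }
];

// Count the values of several facet fields in ONE loop over the documents,
// instead of scanning the documents once per grouping.
function facetCounts(docs, fields) {
  var facets = {};
  fields.forEach(function (field) { facets[field] = {}; });
  docs.forEach(function (doc) {
    fields.forEach(function (field) {
      var value = doc[field];
      facets[field][value] = (facets[field][value] || 0) + 1;
    });
  });
  return facets;
}

// A faceted response carries the matched items AND their groupings.
var response = {
  items: searchResults,
  facets: facetCounts(searchResults, ["condition", "department"])
};
console.log(JSON.stringify(response.facets));
```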

Traditionally faceted searches were mostly limited to Lucene-based search engines such as Apache Solr, Elasticsearch and various closed-source solutions. With the release of MongoDB 3.4, this has changed!

The new Aggregation Pipeline features named $bucket and $bucketAuto provide functionality for processing groupings of result data in a single aggregation stage, and $facet allows the processing of many aggregation pipelines on the same result for even more complex cases.

New Faceting Features

MongoDB 3.4 introduces these new Aggregation Pipeline operators, allowing some advanced grouping and faceted-search-like features:

  1. $facet – Processes multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document where its results are stored as an array of documents.
  2. $bucket – Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries.
  3. $bucketAuto – Similar to $bucket, but bucket boundaries are automatically determined in an attempt to evenly distribute the documents into the specified number of buckets.

As a very basic example, let’s consider this collection of store items:

> db.items.find()
{ "_id" : ObjectId("58502ade9a49537a011226fb"), "name" : "scotch", "price_usd" : 90, "department" : "food and drinks" }
{ "_id" : ObjectId("58502ade9a49537a011226fc"), "name" : "wallet", "price_usd" : 95, "department" : "clothing" }
{ "_id" : ObjectId("58502ade9a49537a011226fd"), "name" : "watch", "price_usd" : 900, "department" : "clothing" }
{ "_id" : ObjectId("58502ade9a49537a011226fe"), "name" : "flashlight", "price_usd" : 9, "department" : "hardware" }

From this example data, I’d like to gather a count of items in buckets by price (field ‘price_usd’):

  1. $0.99 to $9.99
  2. $9.99 to $99.99
  3. $99.99 to $999.99

For each price-bucket, I would also like a list of unique “department” names for the matches. Here is how I would do this with $bucket (and the result):

> db.items.aggregate([
...   { $bucket: {
...     groupBy: "$price_usd",
...     boundaries: [ 0.99, 9.99, 99.99, 999.99 ],
...     output: {
...       count: { $sum: 1 },
...       departments: { $addToSet: "$department" }
...     }
...   } }
... ])
{ "_id" : 0.99, "count" : 1, "departments" : [ "hardware" ] }
{ "_id" : 9.99, "count" : 2, "departments" : [ "clothing", "food and drinks" ] }
{ "_id" : 99.99, "count" : 1, "departments" : [ "clothing" ] }
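
If it is not obvious why the flashlight ($9) lands in the first bucket and both the scotch and the wallet land in the second, this plain JavaScript sketch (for Node.js) mimics how I understand $bucket to assign documents: each bucket includes its lower boundary and excludes its upper one, and the bucket "_id" is the lower boundary. The helper names are my own, and this is only an approximation of the stage's behaviour, not MongoDB's implementation.

```javascript
// The four documents from the collection above, without _id.
var storeItems = [
  { name: "scotch", price_usd: 90, department: "food and drinks" },
  { name: "wallet", price_usd: 95, department: "clothing" },
  { name: "watch", price_usd: 900, department: "clothing" },
  { name: "flashlight", price_usd: 9, department: "hardware" }
];
var priceBoundaries = [0.99, 9.99, 99.99, 999.99];

// Return the lower bound of the bucket [lower, upper) that holds `value`.
function findLowerBound(value, boundaries) {
  for (var i = 0; i < boundaries.length - 1; i++) {
    if (value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i];
    }
  }
  // As far as I know, $bucket errors here unless a `default` bucket is set.
  throw new Error("value " + value + " is outside all bucket boundaries");
}

function bucketByPrice(docs, boundaries) {
  var byLowerBound = {};
  docs.forEach(function (doc) {
    var lower = findLowerBound(doc.price_usd, boundaries);
    if (!byLowerBound[lower]) {
      byLowerBound[lower] = { _id: lower, count: 0, departments: [] };
    }
    var bucket = byLowerBound[lower];
    bucket.count += 1; // plays the role of { $sum: 1 }
    if (bucket.departments.indexOf(doc.department) === -1) {
      bucket.departments.push(doc.department); // plays the role of $addToSet
    }
  });
  // Emit buckets in ascending order of their lower bound.
  return Object.keys(byLowerBound)
    .map(function (key) { return byLowerBound[key]; })
    .sort(function (a, b) { return a._id - b._id; });
}

var priceBuckets = bucketByPrice(storeItems, priceBoundaries);
console.log(JSON.stringify(priceBuckets));
```

The order of names inside "departments" may differ from the shell output above; a set has no guaranteed order.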

If you wanted to do something more complex, you have the flexibility of either making the $bucket stage more complex or running several sub-pipelines over the same documents with $facet!
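
As a rough sketch of the $facet option, the pipeline below runs two sub-pipelines over the same "items" documents: the price buckets from the previous example, plus a count per department using $sortByCount (another stage added in 3.4 that I have not covered here). The output field names "byPrice" and "byDepartment" are arbitrary names I picked. I have not included the output, and the query only runs when pasted into a mongo shell connected to a database that holds the collection.

```javascript
// One $facet stage, two independent sub-pipelines over the same input.
var facetPipeline = [
  { $facet: {
    // Sub-pipeline 1: the same price buckets as in the $bucket example.
    byPrice: [
      { $bucket: {
        groupBy: "$price_usd",
        boundaries: [ 0.99, 9.99, 99.99, 999.99 ],
        output: { count: { $sum: 1 } }
      } }
    ],
    // Sub-pipeline 2: number of items per department, largest first.
    byDepartment: [
      { $sortByCount: "$department" }
    ]
  } }
];

// `db` only exists inside the mongo shell; skip the query anywhere else.
if (typeof db !== "undefined") {
  db.items.aggregate(facetPipeline).forEach(printjson);
}
```

The result should be a single document with one array per sub-pipeline, which is what a search page needs to draw its filters in one round trip.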

Mongod Crash: SERVER-27395

As I mentioned in my explanation of faceted search, it is a very complex/advanced feature that – due to the implementation challenges – is bound to have some bugs and inefficiencies.

During the evaluation of these new features, I noticed a very serious issue: I was able to crash the entire MongoDB 3.4.0 database instance using the $bucketAuto feature in combination with an $addToSet accumulator in the output definition. This is very serious!

This is the example output from my issue reproduction script, which sends the $bucketAuto query to the mongod instance and then checks whether it crashed:

$ bash -x ./run.sh
+ js='db.tweets.aggregate([
  { $bucketAuto: {
    groupBy: "$user.location",
    buckets: 1,
    output: {
      count: { $sum: 1 },
      location: { $addToSet: "$user.location" }
    }
  } }
])'
+ echo '### Running crashing $bucketAuto .aggregate() query'
### Running crashing $bucketAuto .aggregate() query
+ /opt/mongodb-linux-x86_64-3.4.0/bin/mongo --port=27017 '--eval=db.tweets.aggregate([
  { $bucketAuto: {
    groupBy: "$user.location",
    buckets: 1,
    output: {
      count: { $sum: 1 },
      location: { $addToSet: "$user.location" }
    }
  } }
])' test
MongoDB shell version v3.4.0
connecting to: mongodb://127.0.0.1:27017/test
MongoDB server version: 3.4.0
2016-12-13T12:59:10.066+0100 E QUERY    [main] Error: error doing query: failed: network error while attempting to run command 'aggregate' on host '127.0.0.1:27017'  :
DB.prototype.runCommand@src/mongo/shell/db.js:132:1
DB.prototype.runReadCommand@src/mongo/shell/db.js:109:16
DBCollection.prototype._dbReadCommand@src/mongo/shell/collection.js:183:12
DBCollection.prototype.aggregate/doAgg<@src/mongo/shell/collection.js:1298:30
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1301:15
@(shell eval):1:1
+ sleep 1
++ tail -1 mongod.log
+ '[' '-----  END BACKTRACE  -----' = '-----  END BACKTRACE  -----' ']'
+ echo '###  Crashed mongod 3.4.0!'
###  Crashed mongod 3.4.0!

As you can see above, a full server crash occurred in my test when using $bucketAuto with $addToSet accumulators. The “network error” is caused by the MongoDB shell losing connection to the now-crashed server.

The mongod log file reports the following lines before the crash (and backtrace):

2016-12-13T12:59:10.048+0100 F -        [conn2] Invalid operation at address: 0x7f1d43ba990a
2016-12-13T12:59:10.061+0100 F -        [conn2] Got signal: 8 (Floating point exception).
 0x7f1d443e0f91 0x7f1d443e0089 0x7f1d443e06f6 0x7f1d42153100 0x7f1d43ba990a 0x7f1d43ba91df 0x7f1d43bc8d2e 0x7f1d43bcae3a 0x7f1d43bce255 0x7f1d43ca4492 0x7f1d43a3b0a5 0x7f1d43a3b29c 0x7f1d43a3b893 0x7f1d43d3c31a 0x7f1d43d3cc3b 0x7f1d4398447b 0x7f1d439859a9 0x7f1d438feb2b 0x7f1d438ffd70 0x7f1d43f12afd 0x7f1d43b1c54d 0x7f1d4371082d 0x7f1d4371116d 0x7f1d4435ec22 0x7f1d4214bdc5 0x7f1d41e78ced

This has been reported as the ticket SERVER-27395, and exists in MongoDB 3.4.0. Please see the ticket for more details, updates and a full issue reproduction: https://jira.mongodb.org/browse/SERVER-27395. If this issue is important to you, please vote for this issue at the ticket URL.
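
Until you have confirmed from the ticket which versions are affected, one cautious option is to refuse to send the risky combination at all. The sketch below is only an illustration: the helper names are hypothetical, and it flags exactly version "3.4.0" because that is the only version I tested. Adjust the check to whatever the ticket says.

```javascript
// True if any $bucketAuto stage uses an $addToSet accumulator in `output`.
function usesBucketAutoWithAddToSet(pipeline) {
  return pipeline.some(function (stage) {
    var output = stage.$bucketAuto && stage.$bucketAuto.output;
    if (!output) {
      return false;
    }
    return Object.keys(output).some(function (field) {
      var accumulator = output[field];
      return accumulator !== null &&
        typeof accumulator === "object" &&
        Object.prototype.hasOwnProperty.call(accumulator, "$addToSet");
    });
  });
}

// Assumption: only 3.4.0 is treated as affected (the version I tested).
function isRiskyOnServer(pipeline, serverVersion) {
  return serverVersion === "3.4.0" && usesBucketAutoWithAddToSet(pipeline);
}

// The pipeline from my reproduction script.
var crashingPipeline = [
  { $bucketAuto: {
    groupBy: "$user.location",
    buckets: 1,
    output: {
      count: { $sum: 1 },
      location: { $addToSet: "$user.location" }
    }
  } }
];

// Inside the mongo shell, compare against the connected server's version.
if (typeof db !== "undefined" && isRiskyOnServer(crashingPipeline, db.version())) {
  print("Not running $bucketAuto with $addToSet on this server (SERVER-27395)");
}
```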

This highlights the importance of testing new features with your exact application usage pattern, especially during a major version release such as MongoDB 3.4.0. With all the new exciting ways one can aggregate data in MongoDB 3.4.0, and the infinite ways to stitch those features together in a pipeline, there are bound to be some cases where the code needs improvement.

Nonetheless, I am very excited to see the addition of these powerful new features and I look forward to them maturing.

Links

  1. https://docs.mongodb.com/manual/reference/operator/aggregation/facet/
  2. https://docs.mongodb.com/manual/reference/operator/aggregation/bucket/
  3. https://docs.mongodb.com/manual/reference/operator/aggregation/bucketAuto/
  4. https://docs.mongodb.com/manual/release-notes/3.4/#aggregation
  5. https://docs.mongodb.com/v3.4/core/aggregation-pipeline/
  6. https://en.wikipedia.org/wiki/Faceted_search
  7. https://jira.mongodb.org/browse/SERVER-27395
