Dec
22
2023
--

The Pros and Cons of Wildcard Indexes on MongoDB

https://www.percona.com/blog/wp-content/uploads/2023/07/Enterprise-Grade-MongoDB-Alternative.jpg

​​MongoDB is a schemaless database that is extremely flexible. When you create a collection, you don’t have to specify a structure in advance, providing field names and data types. Just start inserting JSON documents, and MongoDB will store them, no matter which fields and data types you provide. As a consequence, a collection can store completely different documents.

MongoDB does not require any ALTER statement to modify the schema of a collection, like in a relational database. If you need at some point to add new fields, just do it. Start inserting new JSON documents with the additional fields. Very easy.

But how can we manage the creation of indexes on a collection where I cannot foresee the fields we could have? In this article, I’ll show wildcard indexes and their pros and cons.

Create a wildcard index on a single field

The simple idea of a wildcard index is to provide the possibility to create an index without knowing in advance the fields we are expecting in the documents. You can put whatever you need and MongoDB will index everything, no matter the field’s name, no matter the data type. The feature looks amazing, but it comes at some cost.

To test wildcard indexes, let’s create a small collection for storing our users’ details. We have some fixed fields like name, date_of_birth, and gender, but also we have a subdocument userMetadata for any other attribute we don’t know in advance. This way, we can store everything we need.

db.user.insert( { name: "John", date_of_birth: new ISODate("2001-02-05"), gender: 'M', userMetadata: { "likes" : [ "dogs", "cats" ] } } )
db.user.insert( { name: "Marie", date_of_birth: new ISODate("2008-03-12"), gender: 'F', userMetadata: { "dislikes" : "hamsters" } } )
db.user.insert( { name: "Tom", date_of_birth: new ISODate("1998-12-23"), gender: 'M', userMetadata: { "age" : 25 } } )
db.user.insert( { name: "Adrian", date_of_birth: new ISODate("1991-06-22"), gender: 'M', userMetadata: "inactive" } )
db.user.insert( { name: "Janice", date_of_birth: new ISODate("1995-09-04"), gender: 'F', userMetadata: { "shoeSize": 8, "likes": [ "horses", "dogs" ] } } )
db.user.insert( { name: "Peter", date_of_birth: new ISODate("2004-01-25"), gender: 'M', userMetadata: { "drivingLicense": { class: "A", "expirationDate": new ISODate("2030-05-05") } } } )

db.user.find()
[
{
_id: ObjectId('658452229a147dcb1198d9df'),
name: 'John',
date_of_birth: ISODate('2001-02-05T00:00:00.000Z'),
gender: 'M',
userMetadata: { likes: [ 'dogs', 'cats' ] }
},
{
_id: ObjectId('658452289a147dcb1198d9e0'),
name: 'Marie',
date_of_birth: ISODate('2008-03-12T00:00:00.000Z'),
gender: 'F',
userMetadata: { dislikes: 'hamsters' }
},
{
_id: ObjectId('6584522e9a147dcb1198d9e1'),
name: 'Tom',
date_of_birth: ISODate('1998-12-23T00:00:00.000Z'),
gender: 'M',
userMetadata: { age: 25 }
},
{
_id: ObjectId('658452519a147dcb1198d9e2'),
name: 'Adrian',
date_of_birth: ISODate('1991-06-22T00:00:00.000Z'),
gender: 'M',
userMetadata: 'inactive'
},
{
_id: ObjectId('658452d69a147dcb1198d9e3'),
name: 'Janice',
date_of_birth: ISODate('1995-09-04T00:00:00.000Z'),
gender: 'F',
userMetadata: { shoeSize: 8, likes: [ 'horses', 'dogs' ] }
},
{
_id: ObjectId('658453a09a147dcb1198d9e4'),
name: 'Peter',
date_of_birth: ISODate('2004-01-25T00:00:00.000Z'),
gender: 'M',
userMetadata: {
drivingLicense: {
class: 'A',
expirationDate: ISODate('2030-05-05T00:00:00.000Z')
}
}
}
]

As you can see, the metaData subdocument contains different fields. But all those fields are not indexed. Let’s suppose our collection contains several million documents; how can we retrieve, for example, all users with a specific driving license class or a specific shoe size without triggering a full collection scan? We can create a wildcard index on the userMetadata field using the special syntax $**

Let’s do it:

db.user.createIndex({ "userMetadata.$**" : 1 })
db.user.getIndexes()
[
{ v: 2, key: { _id: 1 }, name: '_id_' },
{ v: 2, key: { 'userMetadata.$**': 1 }, name: 'userMetadata.$**_1' }
]

This tells MongoDB to create an entry in the index for every single field and for any array member inside userMetadata.

Now, we can benefit from the index to execute any kind of query.

db.user.find({ "userMetadata.likes": "dogs" })
[
{
_id: ObjectId('658452229a147dcb1198d9df'),
name: 'John',
date_of_birth: ISODate('2001-02-05T00:00:00.000Z'),
gender: 'M',
userMetadata: { likes: [ 'dogs', 'cats' ] }
},
{
_id: ObjectId('658452d69a147dcb1198d9e3'),
name: 'Janice',
date_of_birth: ISODate('1995-09-04T00:00:00.000Z'),
gender: 'F',
userMetadata: { shoeSize: 8, likes: [ 'horses', 'dogs' ] }
}
]

We can simply run explain() to certify the wildcard index is used and the query is an IXSCAN.

db.user.find({ "userMetadata.likes": "dogs" }).explain()
{
explainVersion: '1',
queryPlanner: {
namespace: 'test.user',
indexFilterSet: false,
parsedQuery: { 'userMetadata.likes': { '$eq': 'dogs' } },
queryHash: 'E2BC0D70',
planCacheKey: '7C6EEF39',
maxIndexedOrSolutionsReached: false,
maxIndexedAndSolutionsReached: false,
maxScansToExplodeReached: false,
winningPlan: {
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
keyPattern: { '$_path': 1, 'userMetadata.likes': 1 },
indexName: 'userMetadata.$**_1',
isMultiKey: true,
multiKeyPaths: {
'$_path': [],
'userMetadata.likes': [ 'userMetadata.likes' ]
},
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: {
'$_path': [ '["userMetadata.likes", "userMetadata.likes"]' ],
'userMetadata.likes': [ '["dogs", "dogs"]' ]
}
}
},
rejectedPlans: []
...
...

For example, all the following queries can benefit from the same index. You can test them on your own using explain().

db.user.find( { "userMetadata.age" : { $gt: 20 }  } )
db.user.find( { "userMetadata": "inactive" } )
db.user.find( { "userMetadata.drivingLicense.class": "A", "userMetadata.drivingLicense.expirationDate": { $lt: ISODate("2032-01-01") } } )
db.user.find( { "userMetadata.shoeSize": 8})

Create a wildcard index on the entire document

What about creating a wildcard index on the entire document? Is it possible?

Yes, that’s possible. We can do this if we don’t know anything in advance about the documents we’re going to get in the collection.

There is another special syntax for doing that. Use the $** again without specifying a field name. Let’s do it on our user collection.

db.user.createIndex( { "$**" : 1 } )

Again, you can test the same queries we did before. You can notice all fields of the document are indexed now.

db.user.find( { name: "Marie" } )
[
{
_id: ObjectId('658452289a147dcb1198d9e0'),
name: 'Marie',
date_of_birth: ISODate('2008-03-12T00:00:00.000Z'),
gender: 'F',
userMetadata: { dislikes: 'hamsters' }
}
]

db.user.find( { name: "Marie" } ).explain()
{
explainVersion: '1',
queryPlanner: {
namespace: 'test.user',
indexFilterSet: false,
parsedQuery: { name: { '$eq': 'Marie' } },
queryHash: '64908032',
planCacheKey: 'A6C0273F',
maxIndexedOrSolutionsReached: false,
maxIndexedAndSolutionsReached: false,
maxScansToExplodeReached: false,
winningPlan: {
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
keyPattern: { '$_path': 1, name: 1 },
indexName: '$**_1',
isMultiKey: false,
multiKeyPaths: { '$_path': [], name: [] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: {
'$_path': [ '["name", "name"]' ],
name: [ '["Marie", "Marie"]' ]
}
}
},
rejectedPlans: []
...
...

Pros and cons

The good thing about wildcard indexes is easy to say it’s the great flexibility they provide. Just index everything, even what you are not expecting.

The bad thing instead is the index size. As you can simply imagine, what could happen in this case, I’ll start to insert an impressive amount of fields or arrays with thousands of items in the userMetadata? Well, more and more index entries will be created. A single document can generate thousands of index entries. Am I supposed to use such a huge amount of disk and memory for managing this very large index structure?

Remember, indexes are most effective if they can fit into the memory. The size of a wildcard index can simply explode if we cannot control (or we were not able to foresee) the amount of data we create.

Our collection is very small, so the numbers should not worry. But think about what could happen in the case of very large collections. The size of the indexes can go out of control.

Let’s use a simple trick to increase the size of the collection. Run the following statement to double at any time the number of documents. Do it eight or ten times, depending on how many documents you want.

db.user.find( {}, {_id:0}).forEach(function (doc) { db.user.insertOne(doc); } )

In my case, I have a collection of 19K documents. Large enough for testing index size.

db.user.stats()
...
...
 size: 2388744,
count: 19147,
numOrphanDocs: 0,
storageSize: 540672,
totalIndexSize: 1343488,
totalSize: 1884160,
indexSizes: { _id_: 630784, 'userMetadata.$**_1': 192512, '$**_1': 520192 },
avgObjSize: 124,
ns: 'test.user',
nindexes: 3,
...

The data in the collection is around 2.3MB, and the overall size of the indexes is 1.3 MB. Indexes represent more than 50% of the data size. If this is not impressive enough, think about what the percentage should be in case you have very large subdocuments. I have seen collections in production environments being over-indexed because of a single wildcard index.

The wildcard index, which was beneficial at the beginning to make things more flexible, ended instead in a serious bottleneck for the performance, causing more memory utilization and swapping.

Also, remember that most of the time, only a few fields are used for your most frequent queries. Not always does the utilization of a wildcard index really make sense.

Conclusions

Wildcard indexes may be useful in some specific cases when you don’t know much about the documents you expect. The main suggestion is not to abuse wildcard indexes since they can end in serious bottlenecks. Single-field, compound, and multikey indexes are usually better choices than wildcard ones.

Monitor the index size for all the collections for which you have a wildcard index. If you see the size is increasing rapidly drop it and create instead other indexes on the fields that are most frequently used in your queries.

There is some mitigation you can apply, like excluding some of the fields from the wildcard index. Have a look at the following page for more details: https://www.mongodb.com/docs/manual/core/indexes/index-types/index-wildcard/create-wildcard-index-multiple-fields/

The new MongoDB 7.0 now supports the creation of compound wildcard indexes. Have a look at the following page for more details: https://www.mongodb.com/docs/manual/core/indexes/index-types/index-wildcard/index-wildcard-compound/

Further reading about indexes on MongoDB: MongoDB Indexes Explained: A Comprehensive Guide to Better MongoDB Performance.

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

 

Download Percona Distribution for MongoDB Today!

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com