Good intentions can sometimes lead to bad results. Adding indexes boosts performance until it doesn't. Avoid over-indexing.
Whether your application is fast, responsive, and able to scale often depends on how you use indexes in the database. MongoDB is no different: its performance (and the overall performance of your application) heavily depends on having the right indexes on the right fields. A simple index or two can speed up fetching data from MongoDB a million-fold for collections with millions of documents. But at the same time, having too many indexes on a large collection can lead to massive slowdowns in overall performance. You need to get your indexes just right.
In this blog post, we are going to talk about having too many indexes, and I'll help you find both duplicate and unused indexes. If you are interested in finding out whether you need additional indexes, or whether your queries are using an index, I suggest reading the previous Percona articles on query tuning (Part 1 & Part 2 of that series).
So, indexes are very good for speeding up queries. But how many indexes should I create on a collection? What are the best practices for indexing? How do I find out which indexes are being used and which are not? What if I have duplicate indexes?
Common Performance Problems
After analyzing a lot of different MongoDB environments, I can summarize the typical errors I have seen in the following list:
- Not creating indexes at all, other than the default _id index created automatically.
- I'm not joking – I have seen databases without any user-created indexes, whose owners were surprised that the server was overloaded and/or the queries were very slow.
- Over-indexing the collection.
- Developers often create a lot of indexes without a specific reason, or just to test a query, and then forget to drop them.
- In some cases, the size of all the indexes was larger than the data. This is not good; indexes should be as small as possible to be really effective.
I'm not going to consider the first case here; I'm going to discuss the second one instead.
How Many Indexes You Need in a Collection
It depends – that’s the right answer. Basically, it depends on your application workload. You should consider the following rules when indexing a collection:
- Create as many indexes as possible for your application.
- Don’t create a lot of indexes.
What? These rules contradict each other! Well, we can summarize them in just one simple rule:
- You need to create all the indexes your application really needs for solving the most frequent queries. Not one more, not one less.
That’s it.
Pros and Cons of Indexing
The big advantage of indexes is that they let queries, updates, and deletes run as fast as possible, as long as they are used. (Every update or delete also needs to do a lookup step first.) More indexes on a collection can benefit more queries.
Unfortunately, indexes require some extra work from MongoDB. Any time you run a write, all the indexes must be updated. The new values have to be inserted into or removed from the B-tree structure, possibly triggering page splits or merges, and this takes time.
The main problem is that the more indexes you have on a collection, the slower all the writes will be.
Even just 10 or 15 indexes on a very large collection can cause a significant performance loss for writes. Also, remember that indexes have to be loaded into the WiredTiger cache: more indexes mean more pressure on the memory cache. That pressure can then lead to more cache evictions and slowness.
A good example of this: when I was working with a customer a few weeks ago, we found 12 extra indexes on a collection that they did not need. The collection was around 80GB, and the total index size was larger than the data size. They had a relevant write load, with frequent inserts and updates all the time. Dropping those indexes reduced the execution time of their write queries by 25-30 percent on average. The improvement observed in this real case won't be the same in other environments, but for sure, the fewer indexes you have, the faster all the writes will be.
We need to find a balance: create the indexes you need, but not more than that.
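If you want a quick way to compare the data size with the index size of a collection, the collection statistics can help. Here is a minimal sketch in the mongo shell; mycollection is just a placeholder for your own collection name:
// compare data size and total index size of a collection
// "mycollection" is a placeholder - replace it with your collection name
var stats = db.mycollection.stats();
print("data size (bytes):        " + stats.size);
print("total index size (bytes): " + stats.totalIndexSize);
// per-index sizes, useful to spot the heaviest indexes
printjson(stats.indexSizes);
If totalIndexSize is close to, or larger than, the data size, that's usually a sign the collection deserves an index review.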
How to Reduce Over-Indexing
It's very easy to say: drop all the indexes you don't need.
There are two things you can do to identify the indexes to get dropped:
- Check for the duplicates.
- Check for the unused indexes.
To drop an index, you need to run something like the following:
db.mycollection.dropIndex("index_name")
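Note that dropIndex() also accepts the index key specification instead of the index name, so the following is equivalent, assuming the index was built on the name field in ascending order:
# drop the index by its key pattern instead of its name
db.mycollection.dropIndex( { name: 1 } )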
Find Duplicate Indexes
A duplicate index is an index with exactly the same definition as an index that already exists on the collection. Fortunately, MongoDB is able to detect this and does not permit the creation of such an index.
Let’s do a test using a simple collection with no indexes.
rs_test:PRIMARY> db.test.find()
{ "_id" : ObjectId("60521309d7268c122c7cd630"), "name" : "corrado", "age" : 49 }
{ "_id" : ObjectId("60521313d7268c122c7cd631"), "name" : "simone", "age" : 12 }
{ "_id" : ObjectId("6052131cd7268c122c7cd632"), "name" : "gabriele", "age" : 19 }
{ "_id" : ObjectId("60521323d7268c122c7cd633"), "name" : "luca", "age" : 14 }
{ "_id" : ObjectId("60521328d7268c122c7cd634"), "name" : "lucia", "age" : 49 }

# create an index on name field
rs_test:PRIMARY> db.test.createIndex( { name: 1 } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"commitQuorum" : "votingMembers",
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1615991942, 5),
		"signature" : {
			"hash" : BinData(0,"vQN6SGIL0fAMvTusJ12KgySqKOI="),
			"keyId" : NumberLong("6890926313742270469")
		}
	},
	"operationTime" : Timestamp(1615991942, 5)
}

# check indexes available
rs_test:PRIMARY> db.test.getIndexes()
[
	{ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" },
	{ "v" : 2, "key" : { "name" : 1 }, "name" : "name_1" }
]

# try to create again the same index
rs_test:PRIMARY> db.test.createIndex( { name: 1 } )
{
	"numIndexesBefore" : 2,
	"numIndexesAfter" : 2,
	"note" : "all indexes already exist",
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1615991942, 5),
		"signature" : {
			"hash" : BinData(0,"vQN6SGIL0fAMvTusJ12KgySqKOI="),
			"keyId" : NumberLong("6890926313742270469")
		}
	},
	"operationTime" : Timestamp(1615991942, 5)
}
# great, MongoDB can detect the index already exists

# let's try to see if you can create the same index with a different name
rs_test:PRIMARY> db.test.createIndex( { name: 1 }, { name: "this_is_a_different_index_name" } )
{
	"operationTime" : Timestamp(1615991981, 1),
	"ok" : 0,
	"errmsg" : "Index with name: this_is_a_different_index_name already exists with a different name",
	"code" : 85,
	"codeName" : "IndexOptionsConflict",
	"$clusterTime" : {
		"clusterTime" : Timestamp(1615991981, 1),
		"signature" : {
			"hash" : BinData(0,"whkRyQQxyJVBt+7d3HOtFvYY32g="),
			"keyId" : NumberLong("6890926313742270469")
		}
	}
}
# even in this case MongoDB doesn't permit the index creation
MongoDB is clever enough to avoid the creation of duplicate indexes. But what about creating an index that is the left prefix of an existing index? Let's test it.
# let's drop the previous index we have created
rs_test:PRIMARY> db.test.dropIndex( "name_1" )
{
	"nIndexesWas" : 2,
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1615993029, 1),
		"signature" : {
			"hash" : BinData(0,"njFiuCeyA5VcdNOOP2ASboOpWwo="),
			"keyId" : NumberLong("6890926313742270469")
		}
	},
	"operationTime" : Timestamp(1615993029, 1)
}

# check indexes. Only _id available
rs_test:PRIMARY> db.test.getIndexes()
[
	{ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" }
]

# create a compound index
rs_test:PRIMARY> db.test.createIndex( { name: 1, age: 1 } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"commitQuorum" : "votingMembers",
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1615993054, 5),
		"signature" : {
			"hash" : BinData(0,"gfaPsWsSM745opEiQORCt2L3HYo="),
			"keyId" : NumberLong("6890926313742270469")
		}
	},
	"operationTime" : Timestamp(1615993054, 5)
}

# create another index that is the leftmost prefix of the compound index
rs_test:PRIMARY> db.test.createIndex( { name: 1 } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 2,
	"numIndexesAfter" : 3,
	"commitQuorum" : "votingMembers",
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1615993060, 5),
		"signature" : {
			"hash" : BinData(0,"C2XWVA5mi+WWyPMn3Jw2VHTw/Dk="),
			"keyId" : NumberLong("6890926313742270469")
		}
	},
	"operationTime" : Timestamp(1615993060, 5)
}

# check indexes
rs_test:PRIMARY> db.test.getIndexes()
[
	{ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" },
	{ "v" : 2, "key" : { "name" : 1, "age" : 1 }, "name" : "name_1_age_1" },
	{ "v" : 2, "key" : { "name" : 1 }, "name" : "name_1" }
]
This time MongoDB permitted the creation, but we should consider a leftmost-prefix index a duplicate as well.
To benefit from a compound index, MongoDB doesn't need to use all the fields of that index; the leftmost prefix is enough. For example, an index on (A, B, C) can be used to satisfy queries on the combinations (A), (A, B), and (A, B, C), but not (B) or (B, C). As a consequence, if I have two different indexes, one on (A, B, C) and another on (A, B), the second is a duplicate: the first can equally well solve queries on the combinations (A, B) and (A).
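You can verify this behavior with explain(). Here is a minimal sketch on the test collection above (the exact explain() output varies with the MongoDB version): after dropping the duplicate { name: 1 } index, a query filtering on name only is still solved by an index scan on the compound index.
# drop the duplicate leftmost-prefix index
rs_test:PRIMARY> db.test.dropIndex( "name_1" )

# the query filters on "name" only, but it can still use the compound index
rs_test:PRIMARY> db.test.find( { name: "luca" } ).explain()
# in the output, queryPlanner.winningPlan should contain an IXSCAN stage
# with "indexName" : "name_1_age_1"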
So, find all the duplicate indexes and drop them, since they're useless. Just check first that your application doesn't use hint() on the indexes you're going to drop, as shown in the example below.
To avoid manually checking all the collections to discover duplicates, here is a JavaScript script you can run in the mongo shell:
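For reference, this is what a hinted query looks like; hint() forces a specific index by name or by key pattern, and the query fails with an error if that index no longer exists (the collection and index used here are just the ones from the earlier example):
# force the optimizer to use a specific index, by name or by key pattern
db.test.find( { name: "luca" } ).hint( "name_1" )
db.test.find( { name: "luca" } ).hint( { name: 1 } )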
var ldb = db.adminCommand( { listDatabases: 1 } );
for ( i = 0; i < ldb.databases.length; i++ ) {
    if ( ldb.databases[i].name != 'admin' && ldb.databases[i].name != 'config' && ldb.databases[i].name != 'local' ) {
        print('DATABASE ', ldb.databases[i].name);
        print("+++++++++++++++++++++++++++++++++++++++++");
        var db = db.getSiblingDB(ldb.databases[i].name);
        var cpd = db.getCollectionNames();
        for ( j = 0; j < cpd.length; j++ ) {
            if ( cpd[j] != 'system.profile' ) {
                // get the index definitions of the collection
                var indexes = JSON.parse(JSON.stringify(db.runCommand( { listIndexes: cpd[j] } ).cursor.firstBatch));
                print("COLL :" + cpd[j]);
                // normalize each index key to a comparable string,
                // e.g. { name: 1, age: 1 } becomes "name":1_"age":1
                for ( k = 0; k < indexes.length; k++ ) {
                    indexes[k] = (((JSON.stringify(indexes[k].key)).replace("{","")).replace("}","")).replace(/,/g, "_");
                }
                // an index whose string is the left prefix of another
                // index's string is a duplicate
                var founddup = false;
                for ( k1 = 0; k1 < indexes.length; k1++ ) {
                    for ( k2 = 0; k2 < indexes.length; k2++ ) {
                        if ( k1 != k2 ) {
                            if ( indexes[k1].startsWith(indexes[k2], 0) ) {
                                print("{ " + indexes[k2] + " } is the left prefix of { " + indexes[k1] + " } and should be dropped");
                                founddup = true;
                            }
                        }
                    }
                }
                if ( !founddup ) {
                    print("no duplicate indexes found");
                }
                print("\n");
            }
        }
        print("\n");
    }
}
Note: this script is just an initial test and could be improved, but it should work in most cases.
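To run the script against a whole instance from the command line, you can save it to a file and pass it to the shell. A sketch, assuming the legacy mongo shell and a file named find_dup_indexes.js (both are just examples; adjust the connection string to your environment):
mongo --quiet "mongodb://localhost:27017" find_dup_indexes.js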
Find Unused Indexes
MongoDB maintains internal statistics about index usage. Any time an index is used to solve a query, a specific counter is incremented. After MongoDB has been running for a significant amount of time – days or weeks – the statistics are reliable and we can find out which indexes have been used and which have not.
To look at the index stats, MongoDB provides a specific aggregation pipeline stage: $indexStats.
Here you can see an example:
rs_test:PRIMARY> db.restaurants.aggregate([ { $indexStats: {} } ]).pretty()
{
	"name" : "borough_1",
	"key" : { "borough" : 1 },
	"host" : "ip-172-30-2-12:27017",
	"accesses" : {
		"ops" : NumberLong(312),
		"since" : ISODate("2021-03-17T13:48:51.305Z")
	},
	"spec" : { "v" : 2, "key" : { "borough" : 1 }, "name" : "borough_1" }
}
{
	"name" : "_id_",
	"key" : { "_id" : 1 },
	"host" : "ip-172-30-2-12:27017",
	"accesses" : {
		"ops" : NumberLong(12),
		"since" : ISODate("2021-03-17T13:48:51.305Z")
	},
	"spec" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" }
}
{
	"name" : "cuisine_1_borough_1",
	"key" : { "cuisine" : 1, "borough" : 1 },
	"host" : "ip-172-30-2-12:27017",
	"accesses" : {
		"ops" : NumberLong(0),
		"since" : ISODate("2021-03-17T13:48:51.305Z")
	},
	"spec" : { "v" : 2, "key" : { "cuisine" : 1, "borough" : 1 }, "name" : "cuisine_1_borough_1" }
}
The accesses.ops field is the number of times the index has been used. In the example, you can see that { borough: 1 } has been used 312 times, the { _id } index 12 times, and the { cuisine: 1, borough: 1 } index zero times. The last one is a candidate to be dropped.
If the database has been running for a long time, with millions of queries executed, and an index has not been used, most probably it won't be used in the future either. Keep in mind that the counters are reset when the mongod process restarts, and that each replica set member maintains its own statistics, so make sure the uptime is significant and check all the nodes before deciding an index is unused.
Then you should consider dropping the unused indexes in order to improve the writes, reduce the cache pressure, and save disk space as well.
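You can also filter directly in the aggregation pipeline by adding a $match stage after $indexStats. A minimal sketch that returns only the indexes of a single collection that have never been used:
# return only the indexes with zero recorded accesses
rs_test:PRIMARY> db.restaurants.aggregate([
    { $indexStats: {} },
    { $match: { "accesses.ops": 0 } }
])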
Using the following script, you can collect the index statistics for all the collections:
var ldb = db.adminCommand( { listDatabases: 1 } );
for ( i = 0; i < ldb.databases.length; i++ ) {
    print('DATABASE ', ldb.databases[i].name);
    if ( ldb.databases[i].name != 'admin' && ldb.databases[i].name != 'config' && ldb.databases[i].name != 'local' ) {
        var db = db.getSiblingDB(ldb.databases[i].name);
        var cpd = db.getCollectionNames();
        for ( j = 0; j < cpd.length; j++ ) {
            if ( cpd[j] != 'system.profile' ) {
                print(cpd[j]);
                // run $indexStats on the collection and print the raw result
                var pui = db.runCommand({ aggregate: cpd[j], pipeline: [ { $indexStats: {} } ], cursor: { batchSize: 100 } });
                printjson(pui);
            }
        }
        print('\n\n');
    }
}
Look for the indexes having "ops" : NumberLong(0) – those are the ones that have never been used.
Conclusion
Creating indexes to solve your queries is a good habit, but be careful not to abuse indexing. Excessive indexing can lead to slower writes, more pressure on the memory cache, and more evictions.
You should maintain your indexes from time to time, dropping all the duplicates and the unused ones. The scripts provided in this article may help with your index analysis.