Nov 21, 2018

Identifying Unused Indexes in MongoDB

[Image: MongoDB index usage stats visualized in PMM]

As with MySQL, having too many indexes on a MongoDB collection not only affects overall write performance, but also disk and memory resources. While MongoDB scales both reads and writes predictably well, maintaining a healthy schema design should always remain a core characteristic of a good application stack.

Aside from knowing when to add an index to improve query performance, and how to modify indexes to satisfy changing query complexities, we also need to know how to identify unused indexes and cut their unnecessary overhead.

First of all, you can identify access operation counters for each collection using the $indexStats aggregation stage (available as the indexStats command before 3.0). It provides two important pieces of information: the ops counter value, and since, the timestamp from which the counter has been collecting. The counters are reset when the mongod instance is restarted.

m34:PRIMARY> db.downloads.aggregate( [ { $indexStats: { } } ] ).pretty()
{
	"name" : "_id_",
	"key" : {
		"_id" : 1
	},
	"host" : "mongodb:27018",
	"accesses" : {
		"ops" : NumberLong(0),
		"since" : ISODate("2018-11-10T15:53:31.429Z")
	}
}
{
	"name" : "h_id_1",
	"key" : {
		"h_id" : 1
	},
	"host" : "mongodb:27018",
	"accesses" : {
		"ops" : NumberLong(0),
		"since" : ISODate("2018-11-10T15:54:57.634Z")
	}
}

From this information, if the ops counter is zero for an index, we can assume it has not been used either since the index was added or since the server was restarted, with a few exceptions. An index might be unique and exist only to enforce the constraint (a uniqueness check on INSERT does not increment the ops counter). The documentation also indicates that the counters are not updated by TTL index expiration or by chunk split and migration operations.
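
If you want to scan a whole database for candidates rather than one collection at a time, a small script along these lines can be pasted into the mongo shell. This is only a sketch built on the same $indexStats output: the zero-ops test inherits every caveat mentioned above (unique indexes, TTL expiration, counters reset on restart), so treat the result as a starting list, not a drop list.

// Sketch: list indexes whose ops counter is still zero for every
// collection in the current database (run in the mongo shell).
db.getCollectionNames().forEach(function(coll) {
    db.getCollection(coll).aggregate([ { $indexStats: { } } ]).forEach(function(idx) {
        if (idx.accesses.ops == 0) {
            print(coll + "." + idx.name + " - no recorded use since " + idx.accesses.since);
        }
    });
});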

Be aware of occasional index use

One golden rule, however, is that this kind of observation is only as reliable as the observation window: before you decide to drop an index, make sure that the counter has been collecting for a considerable amount of time. Dropping an index that is only used once a month for some heavy reporting can be problematic.
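
If you do not have a monitoring system retaining these counters (see the PMM section below), one workaround is to snapshot them yourself on a schedule, so that a mongod restart does not erase your observation window. A minimal sketch, assuming a scratch collection named index_stats_history (an arbitrary name chosen for this example):

// Append a timestamped copy of the current counters for one collection
// to a history collection; schedule this from cron or similar.
var now = new Date();
db.downloads.aggregate([ { $indexStats: { } } ]).forEach(function(idx) {
    db.index_stats_history.insert({ ts: now, index: idx.name, accesses: idx.accesses });
});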

The same information from $indexStats can also be made available to PMM. By default, the mongo_exporter does not include this information, but it can be enabled as an additional collection parameter.

sudo pmm-admin add mongodb:metrics --uri 127.0.0.1:27018 -- -collect.indexusage

Once enabled, we can create a custom graph for this information from any PMM dashboard, as shown below. As mentioned above, any index with zero values will not have been used within the graph's current time range. One minor issue with the collector is that each metric does not carry the database and collection information. Consequently, we cannot filter down to the collection level yet; we have an improvement request open for that.

[Image: MongoDB index usage dashboard report from Percona Monitoring and Management]

An alternative view of this information in Grafana/PMM is available from the Time Series to Aggregation table panel, shown below. One advantage of having these metrics in PMM is that the data survives an instance restart. Of course, to be useful for identifying unused indexes, the retention period has to match or exceed your complete application “cycle” period.

[Image: MongoDB index usage stats from PMM]

In a MongoDB replica set you can delegate data-bearing member nodes to different roles, perhaps with tags and priorities, and you can also have nodes with different sets of indexes. Being able to identify the sets of indexes needed at the node level allows you to optimize replication, queries, and resource usage.
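
As a hedged sketch of what that delegation can look like, the snippet below tags one secondary as a reporting node and keeps it from being elected primary, then routes reporting reads to it with a tag-aware read preference. The member index and the usage tag are made-up values for this example.

// Example only: member 2 becomes a reporting-dedicated secondary.
cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].tags = { "usage" : "reporting" }
rs.reconfig(cfg)

// Route reporting reads to nodes carrying that tag.
db.getMongo().setReadPref("secondary", [ { "usage" : "reporting" } ])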

More Resources

We have an introductory series of posts on MongoDB indexes available on this blog. Read Part 1 here.

You can download Percona Server for MongoDB – all Percona software is open source and free.

Sep 6, 2018

MongoDB: Investigate Queries with explain() and Index Usage (part 2)

[Image: MongoDB explain() method as an optimization tool]

This is the second part of a two-part series. In MongoDB: index usage and MongoDB explain() we introduced the main index types supported by MongoDB, and how to create and use them. In this second article, we are going to see some examples of how to use the explain() method to investigate queries. Do you need to optimize MongoDB queries? You’ll see how to use explain() to find out how your query uses indexes. Or, perhaps, that it doesn’t!

What is explain()

explain() is a method that you can apply to simple queries or to cursors to investigate the query execution plan. The execution plan is how MongoDB resolves a query. Looking at all the information returned by explain() we can find out things like:

  • how many documents were scanned
  • how many documents were returned
  • which index was used
  • how long the query took to be executed
  • which alternative execution plans were evaluated

…and other useful information.

The aim of using explain() is to find out how to improve the query, for example by creating missing indexes or by rewriting it to make better use of existing indexes. If you are familiar with the EXPLAIN command in MySQL, the goals of MongoDB’s explain() method are exactly the same.

Explainable object

You can apply the explain() method to a query or a cursor in the following way, just as you would any other method:

MongoDB > db.mycollection.find().explain()

However, the preferred way to investigate queries in the mongo shell is to first create the explainable object.

We can create an explainable object like this:

MongoDB > var myexp = db.mycollection.explain()

Once you have created the explainable object, then any kind of operation can be run against it to investigate a query or cursor execution plan. For example:

MongoDB > myexp.find()
MongoDB > myexp.update()
MongoDB > myexp.remove()
MongoDB > myexp.aggregate()

Restaurants test database

To see some examples we need a collection with some data.
For our purposes we can use the New York restaurants database. You can download this from the following url:

https://www.w3resource.com/mongodb-exercises/retaurants.zip

Unzip the archive, and import the JSON file into MongoDB:

$ unzip retaurants.zip
$ mongoimport --host 127.0.0.1 -d test -c restaurants --file retaurants.json
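
As a quick sanity check (the import above targets the test database via -d), the document count once connected to that database should match the figure mentioned below:

MongoDB > db.restaurants.count()
3772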

This collection has 3772 documents, all the restaurants in New York City. Here is a document sample.

MongoDB > use test
switched to db test
MongoDB > db.restaurants.find().pretty().limit(1)
{
	"_id" : ObjectId("5b71b3281979e24aa18c0121"),
	"address" : {
		"building" : "1007",
		"coord" : [
			-73.856077,
			40.848447
		],
		"street" : "Morris Park Ave",
		"zipcode" : "10462"
	},
	"borough" : "Bronx",
	"cuisine" : "Bakery",
	"grades" : [
		{
			"date" : ISODate("2014-03-03T00:00:00Z"),
			"grade" : "A",
			"score" : 2
		},
		{
			"date" : ISODate("2013-09-11T00:00:00Z"),
			"grade" : "A",
			"score" : 6
		},
		{
			"date" : ISODate("2013-01-24T00:00:00Z"),
			"grade" : "A",
			"score" : 10
		},
		{
			"date" : ISODate("2011-11-23T00:00:00Z"),
			"grade" : "A",
			"score" : 9
		},
		{
			"date" : ISODate("2011-03-10T00:00:00Z"),
			"grade" : "B",
			"score" : 14
		}
	],
	"name" : "Morris Park Bake Shop",
	"restaurant_id" : "30075445"
}

Explain() verbosity

The explain() method has three verbosity modes.

  • queryPlanner – this is the default mode. At this level, explain provides information about the winning plan, including the index used or if a collection scan is needed (COLLSCAN)
  • executionStats – this mode includes all the information provided by the queryPlanner, plus the statistics. Statistics include details such as the number of documents examined and returned, the execution time in milliseconds, and so on.
  • allPlansExecution – this mode includes all the information provided by the executionStats plus information about the discarded execution plans

We’ll see the explain() output in the following examples.
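
The verbosity mode is passed as a string argument, either when building the explainable object or directly on a cursor. For example, against the restaurants collection:

MongoDB > var exp = db.restaurants.explain()                    // queryPlanner, the default
MongoDB > var exp = db.restaurants.explain("executionStats")
MongoDB > var exp = db.restaurants.explain("allPlansExecution")
MongoDB > db.restaurants.find({borough: "Manhattan"}).explain("executionStats")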

Example 1

It’s time to use the restaurants collection to run our first example: find out all the restaurants in the Manhattan borough.

Let’s first create the explainable object with the executionStats mode.

MongoDB > var exp = db.restaurants.explain("executionStats")

Then let’s investigate the query.

MongoDB > exp.find( { borough: "Manhattan"} )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"borough" : {
				"$eq" : "Manhattan"
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"borough" : {
					"$eq" : "Manhattan"
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1883,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 3772,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"borough" : {
					"$eq" : "Manhattan"
				}
			},
			"nReturned" : 1883,
			"executionTimeMillisEstimate" : 0,
			"works" : 3774,
			"advanced" : 1883,
			"needTime" : 1890,
			"needYield" : 0,
			"saveState" : 29,
			"restoreState" : 29,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 3772
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Here we can see the output of the explain(). First of all, we can clearly distinguish the “queryPlanner” and the “executionStats” modes. We won’t describe each one of the values, as some are really intuitive. Let’s have a look just at some of them:

queryPlanner.winningPlan.stage = “COLLSCAN”
This provides very important information about the winning plan: it means that MongoDB needs to do a collection scan. The query is not optimized because all the documents must be read.

queryPlanner.winningPlan.rejectedPlans = []
It’s empty. There are no rejected plans. When the query is executed with COLLSCAN, the only execution plan is the winning plan. We don’t have any indexes in the collection, apart from the one on _id, so there are no other execution plans.

executionStats.nReturned = 1883
The number of documents returned is 1883, the number of restaurants located in Manhattan.

executionStats.totalDocsExamined = 3772
The number of documents examined is exactly the number of documents in the collection. This was expected because the query uses COLLSCAN.

executionStats.executionTimeMillis = 1
The execution time of the query. It’s just 1 millisecond. This might seem good, but remember that this is the time needed to scan just 3772 documents, a very small test collection. Think about what this time could be in the case of a collection with millions of documents!

How can we improve the query?

In this case it’s simple. Let’s try to create a single field index on borough, the only condition we have in the find(). Then let’s try to explain the same query again.

MongoDB > db.restaurants.createIndex( {borough: 1} )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
MongoDB > exp.find( { borough: "Manhattan"} )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"borough" : {
				"$eq" : "Manhattan"
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"borough" : 1
				},
				"indexName" : "borough_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"borough" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"borough" : [
						"[\"Manhattan\", \"Manhattan\"]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1883,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 1883,
		"totalDocsExamined" : 1883,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 1883,
			"executionTimeMillisEstimate" : 0,
			"works" : 1884,
			"advanced" : 1883,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 14,
			"restoreState" : 14,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 1883,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 1883,
				"executionTimeMillisEstimate" : 0,
				"works" : 1884,
				"advanced" : 1883,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 14,
				"restoreState" : 14,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"borough" : 1
				},
				"indexName" : "borough_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"borough" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"borough" : [
						"[\"Manhattan\", \"Manhattan\"]"
					]
				},
				"keysExamined" : 1883,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Interpreting the output

Now, the output is completely different. Let’s have a look at some of the most relevant values.

queryPlanner.winningPlan.inputStage.stage = “IXSCAN”
This is very important. IXSCAN means that now MongoDB doesn’t need to do a collection scan but an index can be used to find the documents.

queryPlanner.winningPlan.inputStage.indexName = “borough_1”
The name of the index used. This is the default name of an index: the name of the field plus the _1 for ascending or _-1 for descending order.

queryPlanner.winningPlan.inputStage.direction = “forward”
MongoDB traverses the index in a forward direction.

executionStats.nReturned = 1883
The number of documents returned. Obviously this is the same as before.

executionStats.totalKeysExamined = 1883
The number of keys examined in the index.

executionStats.totalDocsExamined = 1883
Now the number of documents examined corresponds to the number of elements examined in the index.

We have optimized the query.

Example 2

Now we would like to examine a query to find out all the restaurants with Italian cuisine that received a grade score greater than 50.

MongoDB > var exp = db.restaurants.explain()
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"cuisine" : {
							"$eq" : "Italian"
						}
					},
					{
						"grades.score" : {
							"$gt" : 50
						}
					}
				]
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

We have a COLLSCAN again. Let’s try to improve the query by creating an index on the cuisine field.

MongoDB > var exp = db.restaurants.explain("executionStats")
MongoDB > db.restaurants.createIndex({cuisine:1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 2,
	"numIndexesAfter" : 3,
	"ok" : 1
}
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"grades.score" : {
					"$gt" : 50
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 6,
		"executionTimeMillis" : 4,
		"totalKeysExamined" : 325,
		"totalDocsExamined" : 325,
		"executionStages" : {
			"stage" : "FETCH",
			"filter" : {
				"grades.score" : {
					"$gt" : 50
				}
			},
			"nReturned" : 6,
			"executionTimeMillisEstimate" : 0,
			"works" : 326,
			"advanced" : 6,
			"needTime" : 319,
			"needYield" : 0,
			"saveState" : 2,
			"restoreState" : 2,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 325,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 325,
				"executionTimeMillisEstimate" : 0,
				"works" : 326,
				"advanced" : 325,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 2,
				"restoreState" : 2,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					]
				},
				"keysExamined" : 325,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

The query has improved. The newly created index cuisine_1 is used, but we still have 325 documents examined for only 6 documents returned. Let’s see if we can do better by creating a compound index that uses both fields in the condition: cuisine and grades.score.

MongoDB > db.restaurants.createIndex({cuisine:1, "grades.score":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 3,
	"numIndexesAfter" : 4,
	"ok" : 1
}
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1,
					"grades.score" : 1
				},
				"indexName" : "cuisine_1_grades.score_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"grades.score" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					],
					"grades.score" : [
						"(50.0, inf.0]"
					]
				}
			}
		},
		"rejectedPlans" : [
			{
				"stage" : "FETCH",
				"filter" : {
					"grades.score" : {
						"$gt" : 50
					}
				},
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"cuisine" : 1
					},
					"indexName" : "cuisine_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"cuisine" : [ ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"cuisine" : [
							"[\"Italian\", \"Italian\"]"
						]
					}
				}
			}
		]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 6,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 7,
		"totalDocsExamined" : 6,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 6,
			"executionTimeMillisEstimate" : 0,
			"works" : 9,
			"advanced" : 6,
			"needTime" : 1,
			"needYield" : 0,
			"saveState" : 0,
			"restoreState" : 0,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 6,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 6,
				"executionTimeMillisEstimate" : 0,
				"works" : 8,
				"advanced" : 6,
				"needTime" : 1,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"cuisine" : 1,
					"grades.score" : 1
				},
				"indexName" : "cuisine_1_grades.score_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"grades.score" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					],
					"grades.score" : [
						"(50.0, inf.0]"
					]
				},
				"keysExamined" : 7,
				"seeks" : 1,
				"dupsTested" : 7,
				"dupsDropped" : 1,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Now, the winning plan uses the new compound index cuisine_1_grades.score_1 and we have only 6 documents examined. Please note also that now we have a rejected plan, the one that uses the single field index cuisine_1 previously created.
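
If you want to verify the planner's choice yourself, hint() can force the rejected single field index so you can compare its executionStats against the compound index. This is just a verification technique for the shell, not something to hard-code in an application:

MongoDB > exp.find({$and: [ {"cuisine" : {$eq : "Italian"}}, {"grades.score" : {$gt : 50}} ]}).hint("cuisine_1")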

We have optimized the query.

Example 3

Let’s find out all the restaurants in Brooklyn that don’t serve “American” cuisine and achieved a grade of ‘A’. We want to see the results ordered by cuisine, descending.

At this point you should be a little familiar with the explain() output, so in the next box we truncate it for the sake of simplicity, leaving only the relevant parts.

MongoDB > var exp = db.restaurants.explain("executionStats")
MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"cuisine":-1})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"borough" : {
							"$eq" : "Brooklyn"
						}
					},
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"cuisine" : [
						"[MaxKey, \"American \")",
						"(\"American \", MinKey]"
					]
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 9,
		"totalKeysExamined" : 2518,
		"totalDocsExamined" : 2517,
...
...
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Looking at the winning plan, we see that we have an IXSCAN on the index cuisine_1. The important thing to notice here is the choice of the index on the cuisine field: it was picked because we used sort({cuisine:-1}). There is no SORT stage because the documents are extracted using the index, and so they are already sorted. Notice, too, direction: "backward". This is because we specified descending order in the query. If we execute a slightly different query, changing the sorting to name:1 instead of cuisine:-1, we’ll see a completely different winning plan.

MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"name":1})
...
		"winningPlan" : {
			"stage" : "SORT",
			"sortPattern" : {
				"name" : -1
			},
			"inputStage" : {
				"stage" : "SORT_KEY_GENERATOR",
				"inputStage" : {
					"stage" : "FETCH",
					"filter" : {
						"$and" : [
							{
								"grades.grade" : {
									"$eq" : "A"
								}
							},
							{
								"$nor" : [
									{
										"cuisine" : {
											"$eq" : "American "
										}
									}
								]
							}
						]
					},
					"inputStage" : {
						"stage" : "IXSCAN",
						"keyPattern" : {
							"borough" : 1
						},
						"indexName" : "borough_1",
						"isMultiKey" : false,
						"multiKeyPaths" : {
							"borough" : [ ]
						},
						"isUnique" : false,
						"isSparse" : false,
						"isPartial" : false,
						"indexVersion" : 2,
						"direction" : "forward",
						"indexBounds" : {
							"borough" : [
								"[\"Brooklyn\", \"Brooklyn\"]"
							]
						}
					}
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 13,
		"totalKeysExamined" : 684,
		"totalDocsExamined" : 684,
...
...

In this case, we have fewer documents examined, but since the cuisine_1 index cannot be used, a SORT stage is needed, and the index used to fetch the documents is borough_1. While MongoDB has examined fewer documents, the execution time is worse because of the extra stage used to sort the documents.

Let’s return now to the original query. We can also notice that the number of documents examined is still too high (2517) compared to the documents returned (493). That’s not optimal. Let’s see if we can further improve the query by adding another compound index on (cuisine,borough,grades.grade).

MongoDB > db.restaurants.createIndex({cuisine:1,borough:1,"grades.grade":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 4,
	"numIndexesAfter" : 5,
	"ok" : 1
}
MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"cuisine":-1})
...
...		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$nor" : [
					{
						"cuisine" : {
							"$eq" : "American "
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1,
					"borough" : 1,
					"grades.grade" : 1
				},
				"indexName" : "cuisine_1_borough_1_grades.grade_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"borough" : [ ],
					"grades.grade" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"cuisine" : [
						"[MaxKey, \"American \")",
						"(\"American \", MinKey]"
					],
					"borough" : [
						"[\"Brooklyn\", \"Brooklyn\"]"
					],
					"grades.grade" : [
						"[\"A\", \"A\"]"
					]
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 6,
		"totalKeysExamined" : 591,
		"totalDocsExamined" : 493,
...
...

Now, MongoDB uses the new index and does not need the extra sorting stage. The number of documents examined is the same as the number of documents returned. Also, the execution time is better.

We have optimized the query.

Example 4

This is the final example. Let’s find out the restaurants where the grades array contains a grade of ‘A’ and a score of 9 for a specific date.

MongoDB > exp.find({"grades.date": ISODate("2014-08-11T00:00:00Z"),
... "grades.grade":"A" ,
... "grades.score" : 9})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"grades.date" : {
							"$eq" : ISODate("2014-08-11T00:00:00Z")
						}
					},
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					},
					{
						"grades.score" : {
							"$eq" : 9
						}
					}
				]
			},
			"direction" : "forward"
		},
...

A COLLSCAN again. In the query we notice that all the conditions refer to embedded fields in an array object, so let’s try to create a multikey index. Let’s just create the index on the date field only and see what happens.

MongoDB > db.restaurants.createIndex({"grades.date":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 5,
	"numIndexesAfter" : 6,
	"ok" : 1
}
MongoDB > exp.find({"grades.date": ISODate("2014-08-11T00:00:00Z"),
... "grades.grade":"A" ,
... "grades.score" : 9})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					},
					{
						"grades.score" : {
							"$eq" : 9
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"grades.date" : 1
				},
				"indexName" : "grades.date_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"grades.date" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"grades.date" : [
						"[new Date(1407715200000), new Date(1407715200000)]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 15,
		"executionTimeMillis" : 0,
		"totalKeysExamined" : 22,
		"totalDocsExamined" : 22,
...
...

MongoDB uses the index and the winning plan is already good enough. You can try, as an exercise, to create a compound index including other embedded fields of the array, like {“grades.date”:1, “grades.grade”:1, “grades.score”:1}, and see what happens. You will probably see that the index created on date alone is good enough; enlarging the compound index only generates rejected plans. This is because the date field is the most selective.
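
For reference, the exercise would look like this; compare the resulting winningPlan and rejectedPlans with the output above:

MongoDB > db.restaurants.createIndex({"grades.date": 1, "grades.grade": 1, "grades.score": 1})
MongoDB > exp.find({"grades.date": ISODate("2014-08-11T00:00:00Z"), "grades.grade": "A", "grades.score": 9})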

Hint: when dealing with compound indexes, remember that the order of the fields is important. The first field should be the most selective one, and the last one the least selective. Or, in cases where you don’t need to put a lot of fields in the index, the most selective ones are probably all you need to improve the queries.
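
One rough way to gauge selectivity before settling on the field order is to compare distinct value counts; the field with the most distinct values is usually the most selective. A quick check in the shell (fine for a small collection like this one, though distinct() can be expensive on large collections):

MongoDB > db.restaurants.distinct("grades.date").length
MongoDB > db.restaurants.distinct("grades.grade").length
MongoDB > db.restaurants.distinct("grades.score").length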

Conclusion

That’s all folks. In this two-part series we have seen the indexes available in MongoDB and how to use explain() to investigate queries and find out how to improve their performance. We don’t expect that you’ll know everything now, but we hope this will be a good starting point for you to practice MongoDB query optimization using explain(). You’ll find plenty more information in the manual and on the internet about indexes and explain(), if you are motivated to find out more.

While you are here

If you found this article interesting, you might like to check out some of our other resources. For example, you might find this recorded webinar MongoDB Sharding 101 by my colleague Adamo Tonete to be useful. Or perhaps Tim Vaillancourt’s MongoDB backup and recovery field guide would provide some useful information. If you want to catch our webinars as they happen, please subscribe to receive notifications in good time.

