September 6, 2018

MongoDB: Investigate Queries with explain() and Index Usage (part 2)


This is the second part of a two-part series. In MongoDB: index usage and MongoDB explain() we introduced the main index types supported by MongoDB and how to create and use them. In this second article, we are going to see some examples of how to use the explain() method to investigate queries. Do you need to optimize MongoDB queries? You’ll see how to use explain() to find out how your query uses indexes. Or, perhaps, that it doesn’t!

What is explain()

explain() is a method that you can apply to simple queries or to cursors to investigate the query execution plan. The execution plan is how MongoDB resolves a query. Looking at all the information returned by explain(), we can find out things such as:

  • how many documents were scanned
  • how many documents were returned
  • which index was used
  • how long the query took to be executed
  • which alternative execution plans were evaluated

…and other useful information.

The aim of using explain() is to find out how to improve the query, for example by creating missing indexes or by rewriting it to make better use of the existing indexes. If you are familiar with the EXPLAIN command in MySQL, the goals of MongoDB’s explain() method are exactly the same.

Explainable object

You can apply the explain() method to a query or a cursor in the same way as any other method:

MongoDB > db.mycollection.find().explain()

However, the preferred way to investigate queries in the mongo shell is to first create an explainable object.

We can create an explainable object like this:

MongoDB > var myexp = db.mycollection.explain()

Once you have created the explainable object, any kind of operation can be run against it to investigate a query or cursor execution plan. For example:

MongoDB > myexp.find()
MongoDB > myexp.update()
MongoDB > myexp.remove()
MongoDB > myexp.aggregate()

Restaurants test database

To see some examples, we need a collection with some data.
For our purposes we can use the New York restaurants database. You can download it from the following URL:

https://www.w3resource.com/mongodb-exercises/retaurants.zip

Unzip the archive, and import the JSON file into MongoDB:

$ unzip retaurants.zip
$ mongoimport --host 127.0.0.1 -d test -c restaurants --file retaurants.json

This collection has 3772 documents, covering restaurants in New York City. Here is a sample document.

MongoDB > use test
switched to db test
MongoDB > db.restaurants.find().pretty().limit(1)
{
	"_id" : ObjectId("5b71b3281979e24aa18c0121"),
	"address" : {
		"building" : "1007",
		"coord" : [
			-73.856077,
			40.848447
		],
		"street" : "Morris Park Ave",
		"zipcode" : "10462"
	},
	"borough" : "Bronx",
	"cuisine" : "Bakery",
	"grades" : [
		{
			"date" : ISODate("2014-03-03T00:00:00Z"),
			"grade" : "A",
			"score" : 2
		},
		{
			"date" : ISODate("2013-09-11T00:00:00Z"),
			"grade" : "A",
			"score" : 6
		},
		{
			"date" : ISODate("2013-01-24T00:00:00Z"),
			"grade" : "A",
			"score" : 10
		},
		{
			"date" : ISODate("2011-11-23T00:00:00Z"),
			"grade" : "A",
			"score" : 9
		},
		{
			"date" : ISODate("2011-03-10T00:00:00Z"),
			"grade" : "B",
			"score" : 14
		}
	],
	"name" : "Morris Park Bake Shop",
	"restaurant_id" : "30075445"
}

Explain() verbosity

The explain() method has three verbosity modes.

  • queryPlanner – this is the default mode. At this level, explain provides information about the winning plan, including the index used or whether a collection scan (COLLSCAN) is needed.
  • executionStats – this mode includes all the information provided by queryPlanner, plus the statistics. Statistics include details such as the number of documents examined and returned, the execution time in milliseconds, and so on.
  • allPlansExecution – this mode includes all the information provided by executionStats, plus information about the discarded execution plans.

We’ll see the explain() output in the following examples.
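As a quick sketch (the borough value here is just an arbitrary example), the verbosity mode can be passed either when creating the explainable object or directly to a cursor's explain():

MongoDB > var exp = db.restaurants.explain("allPlansExecution")
MongoDB > exp.find( { borough: "Queens" } )
MongoDB > db.restaurants.find( { borough: "Queens" } ).explain("allPlansExecution")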

Example 1

It’s time to use the restaurants collection to run our first example: find out all the restaurants in the Manhattan borough.

Let’s first create the explainable object with the executionStats mode.

MongoDB > var exp = db.restaurants.explain("executionStats")

Then let’s investigate the query.

MongoDB > exp.find( { borough: "Manhattan"} )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"borough" : {
				"$eq" : "Manhattan"
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"borough" : {
					"$eq" : "Manhattan"
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1883,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 3772,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"borough" : {
					"$eq" : "Manhattan"
				}
			},
			"nReturned" : 1883,
			"executionTimeMillisEstimate" : 0,
			"works" : 3774,
			"advanced" : 1883,
			"needTime" : 1890,
			"needYield" : 0,
			"saveState" : 29,
			"restoreState" : 29,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 3772
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Here we can see the output of explain(). First of all, we can clearly distinguish the “queryPlanner” and the “executionStats” sections. We won’t describe every value, as most are intuitive. Let’s have a look at just some of them:

queryPlanner.winningPlan.stage = “COLLSCAN”
This provides very important information about the winning plan: it means that MongoDB needs to do a collection scan. The query is not optimized, because all the documents must be read.

queryPlanner.winningPlan.rejectedPlans = []
It’s empty: there are no rejected plans. When the query is executed with COLLSCAN, the only execution plan is the winning plan. We don’t have any indexes in the collection, apart from the one on _id, so there are no other execution plans.

executionStats.nReturned = 1883
The number of documents returned is 1883, the number of restaurants located in Manhattan.

executionStats.totalDocsExamined = 3772
The number of documents examined is exactly the number of documents in the collection. This was expected because the query uses COLLSCAN.

executionStats.executionTimeMillis = 1
The execution time of the query. It’s just 1 millisecond. This might seem good, but remember that this is the time needed to scan just 3772 documents, a very small test collection. Think about what this time could be in the case of a collection with millions of documents!

How can we improve the query?

In this case it’s simple. Let’s create a single field index on borough, the only field in the find() condition, and then explain the same query again.

MongoDB > db.restaurants.createIndex( {borough: 1} )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
MongoDB > exp.find( { borough: "Manhattan"} )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"borough" : {
				"$eq" : "Manhattan"
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"borough" : 1
				},
				"indexName" : "borough_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"borough" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"borough" : [
						"[\"Manhattan\", \"Manhattan\"]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1883,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 1883,
		"totalDocsExamined" : 1883,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 1883,
			"executionTimeMillisEstimate" : 0,
			"works" : 1884,
			"advanced" : 1883,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 14,
			"restoreState" : 14,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 1883,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 1883,
				"executionTimeMillisEstimate" : 0,
				"works" : 1884,
				"advanced" : 1883,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 14,
				"restoreState" : 14,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"borough" : 1
				},
				"indexName" : "borough_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"borough" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"borough" : [
						"[\"Manhattan\", \"Manhattan\"]"
					]
				},
				"keysExamined" : 1883,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Interpreting the output

Now, the output is completely different. Let’s have a look at some of the most relevant values.

queryPlanner.winningPlan.inputStage.stage = “IXSCAN”
This is very important: IXSCAN means that MongoDB no longer needs to do a collection scan, because an index can be used to find the documents.

queryPlanner.winningPlan.inputStage.indexName = “borough_1”
The name of the index used. This is the default name of an index: the field name followed by _1 for ascending or _-1 for descending order.

queryPlanner.winningPlan.inputStage.direction = “forward”
MongoDB traverses the index in a forward direction.

executionStats.nReturned = 1883
The number of documents returned. Obviously this is the same as before.

executionStats.totalKeysExamined = 1883
The number of keys examined in the index.

executionStats.totalDocsExamined  = 1883
Now the number of documents examined corresponds to the number of elements examined in the index.

We have optimized the query.

Example 2

Now we would like to examine a query to find all the restaurants with Italian cuisine that received a grade score greater than 50.

MongoDB > var exp = db.restaurants.explain()
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"cuisine" : {
							"$eq" : "Italian"
						}
					},
					{
						"grades.score" : {
							"$gt" : 50
						}
					}
				]
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

We have a COLLSCAN again. Let’s try to improve the query by creating an index on the cuisine field.

MongoDB > var exp = db.restaurants.explain("executionStats")
MongoDB > db.restaurants.createIndex({cuisine:1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 2,
	"numIndexesAfter" : 3,
	"ok" : 1
}
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"grades.score" : {
					"$gt" : 50
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 6,
		"executionTimeMillis" : 4,
		"totalKeysExamined" : 325,
		"totalDocsExamined" : 325,
		"executionStages" : {
			"stage" : "FETCH",
			"filter" : {
				"grades.score" : {
					"$gt" : 50
				}
			},
			"nReturned" : 6,
			"executionTimeMillisEstimate" : 0,
			"works" : 326,
			"advanced" : 6,
			"needTime" : 319,
			"needYield" : 0,
			"saveState" : 2,
			"restoreState" : 2,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 325,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 325,
				"executionTimeMillisEstimate" : 0,
				"works" : 326,
				"advanced" : 325,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 2,
				"restoreState" : 2,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					]
				},
				"keysExamined" : 325,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

The query has improved: the newly created index cuisine_1 is used, but we still have 325 documents examined for only 6 documents returned. Let’s see if we can do better by instead creating a compound index on both fields in the condition: cuisine and grades.score.

MongoDB > db.restaurants.createIndex({cuisine:1, "grades.score":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 3,
	"numIndexesAfter" : 4,
	"ok" : 1
}
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1,
					"grades.score" : 1
				},
				"indexName" : "cuisine_1_grades.score_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"grades.score" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					],
					"grades.score" : [
						"(50.0, inf.0]"
					]
				}
			}
		},
		"rejectedPlans" : [
			{
				"stage" : "FETCH",
				"filter" : {
					"grades.score" : {
						"$gt" : 50
					}
				},
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"cuisine" : 1
					},
					"indexName" : "cuisine_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"cuisine" : [ ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"cuisine" : [
							"[\"Italian\", \"Italian\"]"
						]
					}
				}
			}
		]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 6,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 7,
		"totalDocsExamined" : 6,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 6,
			"executionTimeMillisEstimate" : 0,
			"works" : 9,
			"advanced" : 6,
			"needTime" : 1,
			"needYield" : 0,
			"saveState" : 0,
			"restoreState" : 0,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 6,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 6,
				"executionTimeMillisEstimate" : 0,
				"works" : 8,
				"advanced" : 6,
				"needTime" : 1,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"cuisine" : 1,
					"grades.score" : 1
				},
				"indexName" : "cuisine_1_grades.score_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"grades.score" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					],
					"grades.score" : [
						"(50.0, inf.0]"
					]
				},
				"keysExamined" : 7,
				"seeks" : 1,
				"dupsTested" : 7,
				"dupsDropped" : 1,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Now, the winning plan uses the new compound index cuisine_1_grades.score_1 and only 6 documents are examined. Please note also that we now have a rejected plan: the one that uses the single field index cuisine_1 created previously.
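If you are curious how the rejected plan would have performed, one option (just a sketch, not something shown in the output above) is to force the single field index with hint() and compare the executionStats:

MongoDB > db.restaurants.find( { cuisine: "Italian", "grades.score": { $gt: 50 } } ).hint("cuisine_1").explain("executionStats")

You should see totalDocsExamined jump back to 325, which confirms why the planner preferred the compound index.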

We have optimized the query.

Example 3

Let’s find all the restaurants that don’t serve “American” cuisine, are located in Brooklyn, and achieved a grade of ‘A’. We want to see the results ordered by cuisine, descending.

At this point you should be a little familiar with the explain() output, so in the next box we truncate it for the sake of simplicity, leaving only the relevant parts.

MongoDB > var exp = db.restaurants.explain("executionStats")
MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"cuisine":-1})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"borough" : {
							"$eq" : "Brooklyn"
						}
					},
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"cuisine" : [
						"[MaxKey, \"American \")",
						"(\"American \", MinKey]"
					]
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 9,
		"totalKeysExamined" : 2518,
		"totalDocsExamined" : 2517,
...
...
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Looking at the winning plan, we see an IXSCAN on the cuisine_1 index. The important thing to notice here is that the index on the cuisine field was chosen because we used sort({cuisine:-1}). There is no SORT stage, because the documents are extracted using the index and so are already sorted. Notice, too, direction: ”backward”: this is because we specified descending order in the query. If we execute a slightly different query, sorting by name:1 instead of cuisine:-1, we’ll see a completely different winning plan.

MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"name":1})
...
		"winningPlan" : {
			"stage" : "SORT",
			"sortPattern" : {
				"name" : -1
			},
			"inputStage" : {
				"stage" : "SORT_KEY_GENERATOR",
				"inputStage" : {
					"stage" : "FETCH",
					"filter" : {
						"$and" : [
							{
								"grades.grade" : {
									"$eq" : "A"
								}
							},
							{
								"$nor" : [
									{
										"cuisine" : {
											"$eq" : "American "
										}
									}
								]
							}
						]
					},
					"inputStage" : {
						"stage" : "IXSCAN",
						"keyPattern" : {
							"borough" : 1
						},
						"indexName" : "borough_1",
						"isMultiKey" : false,
						"multiKeyPaths" : {
							"borough" : [ ]
						},
						"isUnique" : false,
						"isSparse" : false,
						"isPartial" : false,
						"indexVersion" : 2,
						"direction" : "forward",
						"indexBounds" : {
							"borough" : [
								"[\"Brooklyn\", \"Brooklyn\"]"
							]
						}
					}
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 13,
		"totalKeysExamined" : 684,
		"totalDocsExamined" : 684,
...
...

In this case, we have fewer documents examined, but since the cuisine_1 index cannot be used, a SORT stage is needed and the index used to fetch the documents is borough_1. While MongoDB has examined fewer documents, the execution time is worse because of the extra stage used to sort the documents.

Let’s return now to the original query. We can also notice that the number of documents examined (2517) is still too high compared to the number of documents returned (493). That’s not optimal. Let’s see if we can further improve the query by adding a compound index on (cuisine, borough, grades.grade).

MongoDB > db.restaurants.createIndex({cuisine:1,borough:1,"grades.grade":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 4,
	"numIndexesAfter" : 5,
	"ok" : 1
}
MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"cuisine":-1})
...
...		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$nor" : [
					{
						"cuisine" : {
							"$eq" : "American "
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1,
					"borough" : 1,
					"grades.grade" : 1
				},
				"indexName" : "cuisine_1_borough_1_grades.grade_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"borough" : [ ],
					"grades.grade" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"cuisine" : [
						"[MaxKey, \"American \")",
						"(\"American \", MinKey]"
					],
					"borough" : [
						"[\"Brooklyn\", \"Brooklyn\"]"
					],
					"grades.grade" : [
						"[\"A\", \"A\"]"
					]
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 6,
		"totalKeysExamined" : 591,
		"totalDocsExamined" : 493,
...
...

Now, MongoDB uses the new index and does not need the extra sorting stage. The number of documents examined is the same as the number of documents returned. Also, the execution time is better.

We have optimized the query.

Example 4

This is the final example. Let’s find out the restaurants where the grades array contains a grade of ‘A’ and a score of 9 for a specific date.

MongoDB > exp.find({"grades.date": ISODate("2014-08-11T00:00:00Z"),
... "grades.grade":"A" ,
... "grades.score" : 9})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"grades.date" : {
							"$eq" : ISODate("2014-08-11T00:00:00Z")
						}
					},
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					},
					{
						"grades.score" : {
							"$eq" : 9
						}
					}
				]
			},
			"direction" : "forward"
		},
...

A COLLSCAN again. We notice that all the query conditions refer to fields embedded in an array of objects, so let’s try to create a multikey index. Let’s create the index on just the date field and see what happens.

MongoDB > db.restaurants.createIndex({"grades.date":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 5,
	"numIndexesAfter" : 6,
	"ok" : 1
}
MongoDB > exp.find({"grades.date": ISODate("2014-08-11T00:00:00Z"),
... "grades.grade":"A" ,
... "grades.score" : 9})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					},
					{
						"grades.score" : {
							"$eq" : 9
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"grades.date" : 1
				},
				"indexName" : "grades.date_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"grades.date" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"grades.date" : [
						"[new Date(1407715200000), new Date(1407715200000)]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 15,
		"executionTimeMillis" : 0,
		"totalKeysExamined" : 22,
		"totalDocsExamined" : 22,
...
...

MongoDB uses the index and the winning plan is already good enough. As an exercise, you can try to create a compound index including the other embedded fields of the array, like {“grades.date”:1, “grades.grade”:1, “grades.score”:1}, and see what happens. You will probably see that the index created on the date field alone is good enough: enlarging the compound index will only generate rejected plans, because the date field is the most selective.
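If you want to try that exercise, the commands would look roughly like this (a sketch; verify the winning and rejected plans yourself):

MongoDB > db.restaurants.createIndex( { "grades.date": 1, "grades.grade": 1, "grades.score": 1 } )
MongoDB > exp.find( { "grades.date": ISODate("2014-08-11T00:00:00Z"), "grades.grade": "A", "grades.score": 9 } )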

Hint: when dealing with compound indexes, please remember that the order of the fields is important. The first field should be the most selective one and the last the least selective. Or, in cases where you don’t need to put many fields in the index, the most selective ones are probably all that you need to improve the queries. A small sketch follows.
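As an illustration of field ordering only (this index is not needed for any of the queries above), restaurant_id is nearly unique while borough has just a handful of distinct values, so in a compound index on the two, the more selective field should come first:

MongoDB > db.restaurants.createIndex( { restaurant_id: 1, borough: 1 } )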

Conclusion

That’s all, folks. In this two-part series we have seen the indexes available in MongoDB and how to use explain() to investigate queries and find out how to improve their performance. We don’t expect you to know everything by now, but we hope this is a good starting point for you to practice MongoDB query optimization using explain(). You’ll find plenty more information about indexes and explain() in the manual and on the internet, if you are motivated to find out more.

While you are here

If you found this article interesting, you might like to check out some of our other resources. For example, you might find this recorded webinar MongoDB Sharding 101 by my colleague Adamo Tonete to be useful. Or perhaps Tim Vaillancourt’s MongoDB backup and recovery field guide would provide some useful information. If you want to catch our webinars as they happen, please subscribe to receive notifications in good time.


August 1, 2016

Introduction into storage engine troubleshooting: Q & A

In this blog post, I will provide answers to the Q & A for the “Introduction into storage engine troubleshooting” webinar.

First, I want to thank everybody for attending the July 14 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: At which isolation level do pt-online-schema-change and pt-archive copy data from a table?

A: Both tools do not change the server’s default transaction isolation level. They use either the default REPEATABLE READ or whatever level you set in my.cnf.
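For example, a minimal my.cnf sketch to set the server-wide default isolation level (assuming you want REPEATABLE READ) would be:

[mysqld]
transaction-isolation = REPEATABLE-READ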

Q: Can I create an index to optimize a query which has group by A and order by B, both from different tables and A column is from the first table in the two table join?

A: Do you mean a query like SELECT ... FROM a, b GROUP BY a.A ORDER BY b.B? Yes, this is possible:

mysql> explain select A, B, count(*) from a join b on(a.A=b.id) WHERE b.B < 4 GROUP BY a.A, b.B ORDER BY b.B ASC;
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-----------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref       | rows | Extra                                                     |
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | b     | range | PRIMARY,B     | B    | 5       | NULL      |   15 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | a     | ref   | A             | A    | 5       | test.b.id |    1 | Using index                                               |
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-----------------------------------------------------------+
2 rows in set (0.00 sec)

Q: Where can I find recommendations on what kind of engine to use for different application types or use cases?

A: Storage engines are always being actively developed, therefore I suggest that you don’t search for generic recommendations. These can be outdated just a few weeks after they are written. Study engines instead. For example, just a few years ago MyISAM was the only engine (among those officially supported) that could work with FULLTEXT indexes and SPATIAL columns. Now InnoDB supports both: FULLTEXT indexes since version 5.6 and GIS features in 5.7. Today I can recommend InnoDB as a general-purpose engine for all installations, and TokuDB for write-heavy workloads when you cannot use high-speed disks.

Alternative storage engines can help to realize specific business needs. For example, CONNECT brings data to your server from many sources, SphinxSE talks to the Sphinx daemon, etc.

Other alternative storage engines increase the speed of certain workloads. Memory, for example, can be a good fit for temporary tables.

Q: Can you please explain how we find the full text of the query when we query the view ‘statements_with_full_table_scans’?

A: Do you mean the view in the sys schema? Sys schema views take information from the summary_* tables in Performance Schema and digest it, therefore they do not contain full queries (only digests). The full text of the query can be found in the events_statements_* tables in Performance Schema. Note that even the events_statements_history_long table can be rewritten very quickly, and you may want to save data from it periodically.
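As a sketch (the LIMIT and the columns selected are just an illustration), you could pull the full statement text like this:

mysql> SELECT THREAD_ID, SQL_TEXT FROM performance_schema.events_statements_history_long LIMIT 10;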

Q: Hi, is TokuDB for the new document protocol?

A: As Alex Rubin showed in his detailed blog post, the new document protocol just converts NoSQL queries into SQL, and is thus not limited to any particular storage engine. However, to use documents and collections, a storage engine must support generated columns (which TokuDB currently does not). So X Protocol support for TokuDB is limited to relational table access.

Q: Please comment on “read committed” versus “repeatable read.”
Q: Repeatable read holds the cursor on the result set for the client versus read committed where the cursor is updated after a transaction.

A: READ COMMITTED and REPEATABLE READ are transaction isolation levels, whose details are explained here.
I would not correlate locks set on table rows in different transaction isolation modes with the result set. A transaction with isolation level REPEATABLE READ instead creates a snapshot of the rows that are accessed by the transaction. Let’s consider a table:

mysql> create table ti(id int not null primary key, f1 int) engine=innodb;
Query OK, 0 rows affected (0.56 sec)
mysql> insert into ti values(1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9);
Query OK, 9 rows affected (0.03 sec)
Records: 9  Duplicates: 0  Warnings: 0

Then start the transaction and select a few rows from this table:

mysql1> begin;
Query OK, 0 rows affected (0.00 sec)
mysql1> select * from ti where id < 5;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.04 sec)

Now let’s update another set of rows in another transaction:

mysql2> update ti set f1 = id*2 where id > 5;
Query OK, 4 rows affected (0.06 sec)
Rows matched: 4  Changed: 4  Warnings: 0
mysql2> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
|  6 |   12 |
|  7 |   14 |
|  8 |   16 |
|  9 |   18 |
+----+------+
9 rows in set (0.00 sec)

You see that the first four rows, which we accessed in the first transaction, were not modified, and the last four were modified. If InnoDB only saved the cursor (as someone answered above), we would expect to see the same result if we ran the SELECT * ... query again in our old transaction, but it actually shows the whole table content as it was before the modification:

mysql1> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
|  6 |    6 |
|  7 |    7 |
|  8 |    8 |
|  9 |    9 |
+----+------+
9 rows in set (0.00 sec)

So “snapshot” is a better word than “cursor” for the result set. In the case of READ COMMITTED, the first transaction would see the modified rows:

mysql1> drop table ti;
Query OK, 0 rows affected (0.11 sec)
mysql1> create table ti(id int not null primary key, f1 int) engine=innodb;
Query OK, 0 rows affected (0.38 sec)
mysql1> insert into ti values(1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9);
Query OK, 9 rows affected (0.04 sec)
Records: 9  Duplicates: 0  Warnings: 0
mysql1> set transaction isolation level read committed;
Query OK, 0 rows affected (0.00 sec)
mysql1> begin;
Query OK, 0 rows affected (0.00 sec)
mysql1> select * from ti where id < 5;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.00 sec)

Let’s update all rows in the table this time:

mysql2> update ti set f1 = id*2;
Query OK, 9 rows affected (0.04 sec)
Rows matched: 9  Changed: 9  Warnings: 0

Now the first transaction sees both the modified rows with id >= 5 (not in the initial result set), but also the modified rows with id < 5 (which existed in the initial result set):

mysql1> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    2 |
|  2 |    4 |
|  3 |    6 |
|  4 |    8 |
|  5 |   10 |
|  6 |   12 |
|  7 |   14 |
|  8 |   16 |
|  9 |   18 |
+----+------+
9 rows in set (0.00 sec)

January 2, 2015

The MySQL Query Cache: How it works, plus workload impacts (good and bad)

Query caching is one of the prominent features in MySQL and a vital part of query optimization. It is important to know how it works as it has the potential to cause significant performance improvements – or a slowdown – of your workload.

The MySQL query cache is a global cache shared among all sessions. It caches a SELECT query together with its result set, which lets identical SELECTs execute faster because the data is fetched from memory. It is important that everything is identical: no added comments or spaces and, most significantly, no differences in the WHERE clause. Basically, when you issue a SELECT query, the server first checks whether it is available in the cache; if it is, the result is fetched from there, otherwise the query is treated as a new one and goes to the parser.
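As a small illustration (reusing the table from the examples below), these two statements return the same data but are cached as two separate entries, because the query strings differ byte-for-byte in letter case:

mysql> SELECT * FROM d.t1 WHERE id=88995159;
mysql> select * from d.t1 where id=88995159;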

Even though it has some nice advantages, the MySQL query cache has its downsides too. Let’s think about this: if you frequently update a table, you invalidate the query cache for ALL queries cached for that table. So, anytime you have a frequently updated table, you’re probably not going to get any good use out of the query cache. See the example below.

mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1031320 |
| Qcache_hits             | 0       |
| Qcache_inserts          | 0       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 1       |
| Qcache_queries_in_cache | 0       |
| Qcache_total_blocks     | 1       |
+-------------------------+---------+
8 rows in set (0.06 sec)
mysql> select * from d.t1;
405 rows in set (0.05 sec)
mysql> select * from d.t1 where id=88995159;
1 row in set (0.00 sec)
mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1020600 |
| Qcache_hits             | 0       |
| Qcache_inserts          | 2       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 1       |
| Qcache_queries_in_cache | 2       |
| Qcache_total_blocks     | 6       |
+-------------------------+---------+
8 rows in set (0.00 sec)

From the above we can be sure the queries are cached. Let us try an insert and check the status: it will invalidate the query cache and reclaim the memory.

mysql> insert into d.t1 (data)value('Welcome');
Query OK, 1 row affected (0.05 sec)
mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1031320 |
| Qcache_hits             | 0       |
| Qcache_inserts          | 2       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 1       |
| Qcache_queries_in_cache | 0       |
| Qcache_total_blocks     | 1       |
+-------------------------+---------+
8 rows in set (0.00 sec)

Now let us think about how to decide the query cache size:

To exemplify: I have a MySQL instance with two tables, “t” and “t1”. Table “t” has numerous records and “t1” has fewer records. Let’s restart MySQL and look at the query cache details.

mysql> show variables like 'query_cache_size';
+------------------+---------+
| Variable_name    | Value   |
+------------------+---------+
| query_cache_size | 1048576 |
+------------------+---------+
1 row in set (0.00 sec)
mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1031320 |
| Qcache_hits             | 0       |
| Qcache_inserts          | 0       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 1       |
| Qcache_queries_in_cache | 0       |
| Qcache_total_blocks     | 1       |
+-------------------------+---------+
8 rows in set (0.01 sec)

From the above status, note the following four points.

1) There is around 1MB of free space in the query cache.

2) There are zero queries in the query cache.

3) There are no query cache hits.

4) Qcache_lowmem_prunes is zero.

mysql> select * from d.t1;
405 rows in set (0.03 sec)
mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1021624 |
| Qcache_hits             | 0       |
| Qcache_inserts          | 1       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 1       |
| Qcache_queries_in_cache | 1       |
| Qcache_total_blocks     | 4       |
+-------------------------+---------+
8 rows in set (0.01 sec)

From the above status it is clear the query has been cached, and it should execute much faster the next time, increasing the Qcache_hits status variable by one.

mysql> select * from d.t1;
405 rows in set (0.00 sec)
mysql>  SHOW STATUS LIKE "%Qcache_hits%";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Qcache_hits   | 1     |
+---------------+-------+
1 row in set (0.00 sec)

Now let us see how data gets pruned from the query cache. For this I will execute a select on table “t”, which has a massive number of records.

mysql> select * from d.t where id > 78995159;
mysql>  SHOW STATUS LIKE "Qcache_lowmem_prunes";
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Qcache_lowmem_prunes | 1     |
+----------------------+-------+
1 row in set (0.00 sec)
mysql> select * from d.t1;
405 rows in set (0.02 sec)
mysql> SHOW STATUS LIKE "qcache%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_free_blocks      | 1       |
| Qcache_free_memory      | 1021624 |
| Qcache_hits             | 1       |
| Qcache_inserts          | 2       |
| Qcache_lowmem_prunes    | 1       |
| Qcache_not_cached       | 6       |
| Qcache_queries_in_cache | 1       |
| Qcache_total_blocks     | 4       |
+-------------------------+---------+
8 rows in set (0.01 sec)

Qcache_lowmem_prunes is the status variable that indicates how many times MySQL had to clear/prune data from the query cache to make room for the results of other queries. Observe the Qcache_lowmem_prunes status variable and adjust the size of the cache until the variable stays at a very low value.
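As a rough sketch (this ratio is a common rule of thumb, not something from the examples above), you can sample the relevant counters periodically and compute a hit rate as Qcache_hits / (Qcache_hits + Com_select), while keeping an eye on Qcache_lowmem_prunes at the same time:

mysql> SHOW GLOBAL STATUS LIKE 'Qcache_hits';
mysql> SHOW GLOBAL STATUS LIKE 'Com_select';
mysql> SHOW GLOBAL STATUS LIKE 'Qcache_lowmem_prunes';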

It is also undesirable to set the query cache to a relatively large value such as 256MB, because query cache invalidation becomes costly. For details, Peter Zaitsev wrote about this a few years ago in a post that’s still relevant today, titled “Beware large Query_Cache sizes.”

Contention often makes the query cache a bottleneck rather than a help when you have many CPU cores. Generally, the query cache should be off unless proven useful for your workload. So it is important to know your environment well when deciding whether to enable the query cache and what its size should be.

There will also be circumstances where there is no chance of identical SELECTs; in this case it is important to set both the query_cache_size and query_cache_type variables to zero. The query_cache_type variable controls the query cache, and setting it to zero removes significant overhead from query execution. In a highly concurrent environment, the query cache mutex can become a source of bottlenecks. Setting query_cache_type to zero avoids the query cache mutex entirely, since the query cache then cannot be enabled at runtime, which reduces the overhead in query execution. Please go through the details of QUERY CACHE ENHANCEMENTS with Percona Server.
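A minimal my.cnf sketch to disable the query cache completely (assuming a MySQL 5.6/5.7-era server) would be:

[mysqld]
query_cache_type = 0
query_cache_size = 0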

