Jul
06
2021
--

Sharding With Zones Based on Compound Shard-Keys on MongoDB 4.4

Compound Shard-Keys on MongoDB 4.4

Compound Shard-Keys on MongoDB 4.4This article was written with the main purpose of showing you how to determine zones on a shard when using compound shard keys.

Defining Zones in Shards means pre-defining where certain chunks will be stored and amongst which set of particular shards they will be balanced according to the shard key definition.

MongoDB 4.4 brings the possibility to shard a collection and determine zones by compound keys, including mixing a hash key with non-hashed keys.  Hashed keys may be placed in the prefix of the index (shard key) or not. Depending on the method chosen, different settings must be in compliance with the balancer and this article will also show you a couple of examples.

Defining the Index Prior to Sharding

Even though it is not always required to create indexes in advance of running the “shardCollection” command, I am defining it to better illustrate the procedure

  • The collections colltest1 will be sharded based on a key containing the hashed key in the prefix
mongos> db.getSiblingDB("dbtest").colltest1.createIndex({_id:"hashed","location":1},{unique:false})
{
	"raw" : {
		"shard02/localhost:40003" : {
			"createdCollectionAutomatically" : true,
			"numIndexesBefore" : 1,
			"numIndexesAfter" : 2,
			"commitQuorum" : "votingMembers",
			"ok" : 1
		}
	},
	"ok" : 1,
	"operationTime" : Timestamp(1624357456, 7),
	"$clusterTime" : { ... }
}

  • The collections colltest2 will be sharded based on a key containing the non-hashed key in the prefix
mongos> db.getSiblingDB("dbtest").colltest2.createIndex({"location":1,_id:"hashed"},{unique:false})
{
	"raw" : {
		"shard02/localhost:40003" : {
			"createdCollectionAutomatically" : true,
			"numIndexesBefore" : 1,
			"numIndexesAfter" : 2,
			"commitQuorum" : "votingMembers",
			"ok" : 1
		}
	},
	"ok" : 1,
	"operationTime" : Timestamp(1624357534, 2),
	"$clusterTime" : { ... }
}

It is important to highlight  a couple of important points about creating the indexes for the shard keys:

  • Once the collection is empty or if it is a new one, the command used to get a collection sharded will automatically create the indexes if they are not present yet. This will be described in the upcoming sections.
  • If the shard key prefix will not be a hashed value, the indexes shall be created with the option {unique:false}

Creating the Initial Chunks Based on Hashed Key as a Prefix

There are, basically, a couple of requirements to ensure that the collection will be in compliance with the balancer and the initial chunk distribution will be optimally performed. 

  • For each non-hashed value, a range of the hashed key must be defined with upper and lower boundaries. 
  • The collection must be sharded with the option presplitHashedZones: true if there is a hashed field.

The below example shows how to shard the collection colltest1.

Defining the Zones and Assigning to the Shards (To make things brief, only one zone “LATAM” is used in this example.)

mongos> sh.addShardToZone("shard01", "LATAM")
{
	"ok" : 1,
	"operationTime" : Timestamp(1624359227, 38),
	"$clusterTime" : { ... }
}
mongos> sh.addShardToZone("shard02", "LATAM")
{
	"ok" : 1,
	"operationTime" : Timestamp(1624359231, 1),
	"$clusterTime" : {... }
}

If you check the output of the sh.status(), you will notice that tags were created and assigned to the shards according to the zones definition:

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
  	"_id" : 1,
  	"minCompatibleVersion" : 5,
  	"currentVersion" : 6,
  	"clusterId" : ObjectId("60d1be52be9f36000b01a9f0")
  }
  shards:
        {  "_id" : "shard01",  "host" : "shard01/localhost:40002",  "state" : 1,  "tags" : [ "LATAM" ] }
        {  "_id" : "shard02",  "host" : "shard02/localhost:40003",  "state" : 1,  "tags" : [ "LATAM" ] }        
<<the rest of status output was intentionally truncated to avoid unnecessary verbosity>>

Creating the Zone Ranges

mongos> sh.updateZoneKeyRange("dbtest.colltest1",{ "_id" : MinKey, "location" : MinKey },{ _id:MaxKey,"location" : MaxKey },"LATAM");
{
	"ok" : 1,
	"operationTime" : Timestamp(1624361452, 1),
	"$clusterTime" : { ... }
}

Enabling Shard

mongos> sh.enableSharding("dbtest")
{
	"ok" : 1,
	"operationTime" : Timestamp(1624361047, 6),
	"$clusterTime" : { ... }
}
mongos> sh.shardCollection("dbtest.colltest1",{_id:"hashed","location":1},false,{ presplitHashedZones: true })
{
	"collectionsharded" : "dbtest.colltest1",
	"collectionUUID" : UUID("30b2efff-2e2e-4c72-9a82-9593f227002f"),
	"ok" : 1,
	"operationTime" : Timestamp(1624361528, 31),
	"$clusterTime" : { ... }
}

In this example, the namespace dbtest.colltest1 will be evenly distributed according to the zone LATAM which will reach the shards shard01 and shard02. Looking at the sh.status() again, you will see that the initial chunks were created following the range defined above.

Creating the Initial Chunks Based on Non-Hashed Key as a Prefix

This example section will be a little bit more complicated to make the shard key compliant with the balancer for the initial chunk distribution.

  • You need to specify MinKey for each field in the shard key, which defines the lower boundary of each zone range.
  • Every zone must comprise a range, so at least one of the fields in the shard key must have an upper-boundary value larger than its MinKey
  • Every combination of values for the fields of the shard key will fall within the range of one of the defined zones.  
  • At the moment of sharding the collection, presplitHashedZones must be set to true.
  • Not required, but likely desirable: One of the zones should define its upper boundary as MaxKey so it acts as a catchall for out-of-range values.

Defining the Zones and Assigning Them to the Shards

The zones were defined differently for this example:

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
  	"_id" : 1,
  	"minCompatibleVersion" : 5,
  	"currentVersion" : 6,
  	"clusterId" : ObjectId("60d1be52be9f36000b01a9f0")
  }
  shards:
        {  "_id" : "shard01",  "host" : "shard01/localhost:40002",  "state" : 1,  "tags" : [ "EU" ] }
        {  "_id" : "shard02",  "host" : "shard02/localhost:40003",  "state" : 1,  "tags" : [ "LATAM" ] }
        {  "_id" : "shard03",  "host" : "shard03/localhost:40004",  "state" : 1,  "tags" : [ "AMER" ] }
        {  "_id" : "shard04",  "host" : "shard04/localhost:40005",  "state" : 1,  "tags" : [ "APAC" ] }

Creating the Zone Ranges

mongos> sh.updateZoneKeyRange("dbtest2.colltest2",{ "location": "DC01", "_id" : MinKey },{ "location": "DC02", "_id" : MinKey },"LATAM");
{
	"ok" : 1,
	"operationTime" : Timestamp(1624364375, 1),
	"$clusterTime" : { ... }
}
mongos> sh.updateZoneKeyRange("dbtest2.colltest2",{ "location": "DC02", "_id" : MinKey },{ "location": MaxKey, "_id" : MinKey },"EU");
{
	"ok" : 1,
	"operationTime" : Timestamp(1624364430, 1),
	"$clusterTime" : { ... }
}

Enabling Shard

mongos> sh.enableSharding("dbtest2")
{
	"ok" : 1,
	"operationTime" : Timestamp(1624366057, 5),
	"$clusterTime" : { ... }
}
mongos> sh.shardCollection("dbtest2.colltest2",{"location":1,_id:"hashed"},false,{ presplitHashedZones: true })
{
	"collectionsharded" : "dbtest2.colltest2",
	"collectionUUID" : UUID("b8a64972-97df-4791-8019-9526d2f8d405"),
	"ok" : 1,
	"operationTime" : Timestamp(1624366085, 37),
	"$clusterTime" : { ... }
}
mongos>

The above example basically describes how to use a non-hashed field on the prefix of a shard key to ensure that certain values of that field will reach certain zones (determined as tags on shards) and the boundaries applied on the hashed field will ensure even distribution. In that case, all the docs of the namespace dbtest2.colltest2 with the minimum _id of location DC01 and the minimum _id of the location DC02, will be placed on the zone LATAM (shard02). And the next docs from the minimum _id of the location DC02 until the rest will be placed on the zone EU (shard 01)

It is very important to highlight that if the collection is not in compliance with the balancer, the migration of the chunks will never happen, though all the chunks will stay on the Primary Shard. It is possible to predict that situation right after enabling the sharding on the collection by looking at the output of the command sh.balancerCollectionStatus

mongos> sh.balancerCollectionStatus("dbtest.colltest1")
{
	"balancerCompliant" : true,
	"ok" : 1,
	"operationTime" : Timestamp(1623691376, 1),
	"$clusterTime" : { ... }
}

If the balancerCompliant is true, means that the balancer will be able to split and migrate the chunks.

Conclusion

Defining the shard key is the most important step of deploying a healthy sharded cluster. Having a field on the shard key which contains a few distinct values would compromise the shard distribution, and as consequence, the performance. Hence, this is a great improvement coming along in MongoDB 4.4 which makes it possible to have keys with low cardinality defining the boundaries, yet still ensuring that the shard will rely on a hashed distribution based on a very selective key.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com