Dec
30
2021
--

MongoDB Config Server Upgrade From SCCC to CSRS ReplicaSet

MongoDB Config Server Upgrade

MongoDB Config Server UpgradeRecently, I got some of our customers doing an upgrade to v4.x from v3.0 (Yeah, still could see older MongoDB versions and we suggest everyone do the upgrade to stay with the current GA!). There were some pain points in the upgrade process, and especially before migrating to v3.4, you will need to change from SCCC config setup to CSRS replicaSet setup. Interestingly, I did some tests in that process and would like to share them here. Even though this topic is pretty outdated, this would be beneficial to the people out there who still seek to migrate from the older version and are stuck at this config server upgrade.

In this blog post, we will see how to migrate a mirrored config server configuration (SCCC) to the replicaSet configuration (CSRS) along with the storage engine from MMAP to wiredTiger change. From MongoDB 3.2.4, the replicaSet configuration is supported for config servers too and the support of SCCC (Sync Cluster Connection Configuration) configuration is removed from MongoDB 3.4. So it is important to make sure the config servers are configured with the replicaSet before upgrading to MongoDB 3.4. Also, the config servers must run on wiredTiger when configured as replicaSet. There are two ways to do this – one is that when you want to replace existing config servers and another one is to migrate to the new set of servers/ports. Let’s see them in the following topics.

Migrate From SCCC to CSRS on the Different Hosts/Ports:

This section gives a brief about migrating the existing mirrored hosts/ports to different hosts or ports. It is always recommended to backup your config server before your steps and get downtime to avoid any writes on this crucial step. The test shared cluster setup runs on Percona Server for MongoDB (PSMDB) v3.2.22 with SCCC set up on the following ports:

47080 – mongos
47087, 47088, 47089 – mongoc – config servers with mmapv1 engine (sccc)
47081, 47082, 47083 – shard01 replicaset
47084, 47085, 47086 – shard02 replicaset

New Config servers to migrate (for testing purposes, using localhost but different ports to migrate):

47090, 47091, 47092 – new mongoc (csrs)

Now stop the balancer:

mongos> sh.stopBalancer()
Waiting for active hosts...
Waiting for the balancer lock...
Waiting again for active hosts after balancer is off...
WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : "balancer" })
mongos> 
mongos> sh.getBalancerState()
false

 

Initiate the replicaSet in one of the config servers (here using config db running on 47087). Use this member as a base for migrating from SCCC to CSRS:

$ mongo32 localhost:47087/admin
MongoDB shell version: 3.2.21-2-g105acca0d4
connecting to: localhost:47087/admin
configsvr> 
configsvr> rs.initiate({ _id: "configRepl", configsvr: true, version:1, members: [ {_id:0, host: "localhost:47087"} ] } )
{ "ok" : 1 }

 

Start the new config DB instances using the same replicaset name used in the above initiate command:

$ /home/balaguru/mongodb/mongodb-linux-x86_64-3.2.21-2-g105acca0d4/bin/mongod --dbpath /home/balaguru/mongodb/testshard32/data/configrepl/rs1/db/ --logpath /home/balaguru/mongodb/testshard32/data/configrepl/rs1/mongod.log --port 47090 --fork --configsvr --storageEngine wiredTiger --wiredTigerCacheSizeGB 1 --replSet configRepl
about to fork child process, waiting until server is ready for connections.
forked process: 476923
child process started successfully, parent exiting

 

$ /home/balaguru/mongodb/mongodb-linux-x86_64-3.2.21-2-g105acca0d4/bin/mongod --dbpath /home/balaguru/mongodb/testshard32/data/configrepl/rs2/db/ --logpath /home/balaguru/mongodb/testshard32/data/configrepl/rs2/mongod.log --port 47091 --fork --configsvr --storageEngine wiredTiger --wiredTigerCacheSizeGB 1 --replSet configRepl
about to fork child process, waiting until server is ready for connections.
forked process: 478193
child process started successfully, parent exiting

 

$ /home/balaguru/mongodb/mongodb-linux-x86_64-3.2.21-2-g105acca0d4/bin/mongod --dbpath /home/balaguru/mongodb/testshard32/data/configrepl/rs3/db/ --logpath /home/balaguru/mongodb/testshard32/data/configrepl/rs3/mongod.log --port 47092 --fork --configsvr --storageEngine wiredTiger --wiredTigerCacheSizeGB 1 --replSet configRepl
about to fork child process, waiting until server is ready for connections.
forked process: 478254
child process started successfully, parent exiting

 

Now restart 47087 with the option –replSet and –configsvrMode which allow this mirrored config DB to be in replicaset and have MMAP engine.

$ /home/balaguru/mongodb/mongodb-linux-x86_64-3.2.21-2-g105acca0d4/bin/mongod --dbpath /home/balaguru/mongodb/testshard32/data/config/db --logpath /home/balaguru/mongodb/testshard32/data/config/mongod.log --port 47087 --fork --configsvr --storageEngine mmapv1 --configsvrMode sccc --replSet configRepl
about to fork child process, waiting until server is ready for connections.
forked process: 477682
child process started successfully, parent exiting

 

Add the new config DB instances to this replicaSet (set priority/votes 0 to avoid election):

configRepl:PRIMARY> rs.add({host: "localhost:47090", priority:0, votes:0})
{ "ok" : 1 }

configRepl:PRIMARY> rs.add({host: "localhost:47091", priority:0, votes:0})
{ "ok" : 1 }

configRepl:PRIMARY> rs.add({host: "localhost:47092", priority:0, votes:0})
{ "ok" : 1 }

configRepl:PRIMARY> rs.status()
{
        "set" : "configRepl",
        "date" : ISODate("2021-11-23T08:11:13.065Z"),
        "myState" : 1,
        "term" : NumberLong(1),
        "configsvr" : true,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "localhost:47087",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 154,
                        "optime" : {
                                "ts" : Timestamp(1637655072, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2021-11-23T08:11:12Z"),
                        "electionTime" : Timestamp(1637654919, 1),
                        "electionDate" : ISODate("2021-11-23T08:08:39Z"),
                        "configVersion" : 4,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "localhost:47090",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 69,
                        "optime" : {
                                "ts" : Timestamp(1637655069, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2021-11-23T08:11:09Z"),
                        "lastHeartbeat" : ISODate("2021-11-23T08:11:11.608Z"),
                       "lastHeartbeatRecv" : ISODate("2021-11-23T08:11:11.611Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "localhost:47087",
                        "configVersion" : 4
                },
                {
                        "_id" : 2,
                        "name" : "localhost:47091",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 6,
                        "optime" : {
                                "ts" : Timestamp(1637655069, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2021-11-23T08:11:09Z"),
                        "lastHeartbeat" : ISODate("2021-11-23T08:11:11.608Z"),
                        "lastHeartbeatRecv" : ISODate("2021-11-23T08:11:09.609Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "localhost:47090",
                        "configVersion" : 4
                },
                {
                        "_id" : 3,
                        "name" : "localhost:47092",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 3,
                        "optime" : {
                                "ts" : Timestamp(1637655069, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2021-11-23T08:11:09Z"),
                        "lastHeartbeat" : ISODate("2021-11-23T08:11:11.608Z"),
                        "lastHeartbeatRecv" : ISODate("2021-11-23T08:11:11.620Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "localhost:47091",
                        "configVersion" : 4
                }
        ],
        "ok" : 1
}

 

Shut down one of the other mirrored config servers (either 47088 or 47089). If one of the mirrored DB is down in between the migration, then the sharded cluster goes into read-only mode, i.e there will be a failure when issuing DDL (mentioned in the below example). So it is advised to do the activity when there is downtime to avoid any unexpected error from the application side.

// try to create new db and collection as follows to test when 47088 is down
mongos> use vinodh  
switched to db vinodh
mongos> db.testCreateColl.insert({id:1})
WriteResult({
        "writeError" : {
                "code" : 13663,
                "errmsg" : "unable to target write op for collection vinodh.testCreateColl :: caused by :: Location13663: exception creating distributed lock vinodh/balaguru-UbuntuPC:47080:1637654366:1487823871 :: caused by :: SyncClusterConnection::update prepare failed:  localhost:47088 (127.0.0.1) failed:9001 socket exception [CONNECT_ERROR] server [couldn't connect to server localhost:47088, connection attempt failed] "
        }
})

 

Once the replicaSet is set, change the priority and votes of other nodes to participate in the election:

configRepl:PRIMARY> cfg.members[1].host
localhost:47090
configRepl:PRIMARY> cfg.members[1].priority
0
configRepl:PRIMARY> cfg.members[1].priority = 1
1
configRepl:PRIMARY> cfg.members[1].votes 
0
configRepl:PRIMARY> cfg.members[1].votes = 1
1
configRepl:PRIMARY> cfg.members[2].priority = 1
1
configRepl:PRIMARY> cfg.members[3].priority = 1
1
configRepl:PRIMARY> cfg.members[2].votes = 1
1
configRepl:PRIMARY> cfg.members[3].votes = 1
1
configRepl:PRIMARY> rs.reconfig(cfg)
{ "ok" : 1 }

 

Now change the PRIMARY role to the new member and make 47087 as SECONDARY:

configRepl:PRIMARY> rs.stepDown()
2021-11-23T13:58:37.885+0530 E QUERY    [thread1] Error: error doing query: failed: network error while attempting to run command 'replSetStepDown' on host 'localhost:47087'  :
DB.prototype.runCommand@src/mongo/shell/db.js:135:1
DB.prototype.adminCommand@src/mongo/shell/db.js:152:1
rs.stepDown@src/mongo/shell/utils.js:1202:12
@(shell):1:1

2021-11-23T13:58:37.887+0530 I NETWORK  [thread1] trying reconnect to localhost:47087 (127.0.0.1) failed
2021-11-23T13:58:37.887+0530 I NETWORK  [thread1] reconnect localhost:47087 (127.0.0.1) ok
configRepl:SECONDARY>

 

Restart the old config member 47087 without –configsvrMode option. Here 47087 will go into REMOVED which is totally fine as it still has MMAPV2 engine + running without the configsvrMode option. Then you can restart mongos with new –-configdb option mentioning only the new config servers (–configdb “configRepl/localhost:47090,localhost:47091,localhost:47092” ):

$ /home/balaguru/mongodb/mongodb-linux-x86_64-3.2.21-2-g105acca0d4/bin/mongos --logpath /home/balaguru/mongodb/testshard32/data/mongos.log --port 47080 --configdb "configRepl/localhost:47090,localhost:47091,localhost:47092" --fork

 

Now 47087 is ready to be removed from the replicaSet finally:

configRepl:PRIMARY> rs.remove("localhost:47087")
{ "ok" : 1 }

You can then check the shard status and the data through mongos. Once everything looks good, stop the other mirrored config DBs on 47088 and 47089 ports followed by enabling the balancer.

 

Migrate From SCCC to CSRS on the Same Hosts:

This section gives a brief about replacing the existing mirrored hosts/port. This has steps until enabling the replicaSet in one of the mirrored config servers. For this testing case, the shared cluster setup runs with PSMDB v3.2.22 with SCCC set up on the following ports:

27100 – mongos
27107, 27108, 27109 – mongoc – config servers with mmapv1 engine
27101, 27102, 27103 – shard01 replicaset
27104, 27105, 27106 – shard02 replicaset

Here using the instance running on port 27107 to start the migration. As said in the above method,

  • Initiate the replicaSet
  • Restart the instance with –-replSet and –-configsvrMode options

The replicaSet status on 27107 instance.

csReplSet:PRIMARY> rs.status()
{
 "set" : "csReplSet",
 "date" : ISODate("2021-10-07T09:04:20.266Z"),
 "myState" : 1,
 "term" : NumberLong(1),
 "configsvr" : true,
 "heartbeatIntervalMillis" : NumberLong(2000),
 "members" : [
  {
   "_id" : 0,
   "name" : "localhost:27107",
   "health" : 1,
   "state" : 1,
   "stateStr" : "PRIMARY",
   "uptime" : 9,
   "optime" : {
    "ts" : Timestamp(1633597452, 1),
    "t" : NumberLong(1)
   },
   "optimeDate" : ISODate("2021-10-07T09:04:12Z"),
   "infoMessage" : "could not find member to sync from",
   "electionTime" : Timestamp(1633597451, 1),
   "electionDate" : ISODate("2021-10-07T09:04:11Z"),
   "configVersion" : 1,
   "self" : true
  }
 ],
 "ok" : 1
}

 

Then stop the second config 27108 instance and clear the dbpath files to remove MMAPv2 files. Start it with the replicaset + WT engine options like below:

$ rm -rf data/config2/db/*

$ /home/vinodh.krishnaswamy/mongo/mongodb-linux-x86_64-3.2.22/bin/mongod --dbpath /home/vinodh.krishnaswamy/mongo/testshard32/data/config2/db --logpath /home/vinodh.krishnaswamy/mongo/testshard32/data/config2/mongod.log --port 27108 --fork --configsvr  --wiredTigerCacheSizeGB 1  --replSet csReplSet
about to fork child process, waiting until server is ready for connections.
forked process: 42341
child process started successfully, parent exiting

At this point, as said earlier let me remind you of the point of the sharded cluster going into a read-only mode when an existing mirrored instance is down or not reachable. So don’t be surprised if you get any warning about DDL errors while in this situation.

Now add this 27108 to the replicaset from PRIMARY (27107) as follows with priority:0 and votes:0 to avoid this server taking part in the election:

$ mongo32 localhost:27107
MongoDB shell version: 3.2.22
connecting to: localhost:27107/test
csReplSet:PRIMARY> rs.add({host: "localhost:27108", priority: 0, votes: 0 })
{ "ok" : 1 }

 

Then repeat the same for the third config server – 27109. Remove DB files + restart it with –replSet and WT options for 27109 followed by adding it to the replicaSet from PRIMARY:

csReplSet:PRIMARY> rs.add({host: "localhost:27109", priority:0 , votes: 0})
{ "ok" : 1 }

 

Once they are in sync and the replicaSet looks healthy, you can adjust the priority and votes for the added members(27108, 27109) as follows to allow them to participate in the election.

csReplSet:PRIMARY> cfg = rs.conf()
{
 "_id" : "csReplSet",
 "version" : 3,
 "configsvr" : true,
 "protocolVersion" : NumberLong(1),
 "members" : [
  {
   "_id" : 0,
   "host" : "localhost:27107",
   "arbiterOnly" : false,
   "buildIndexes" : true,
   "hidden" : false,
   "priority" : 1,
   "tags" : {
   },
   "slaveDelay" : NumberLong(0),
   "votes" : 1
  },
  {
   "_id" : 1,
   "host" : "localhost:27108",
   "arbiterOnly" : false,
   "buildIndexes" : true,
   "hidden" : false,
   "priority" : 0,
   "tags" : {
   },
   "slaveDelay" : NumberLong(0),
  "votes" : 0
  },
  {
   "_id" : 2,
   "host" : "localhost:27109",
   "arbiterOnly" : false,
   "buildIndexes" : true,
   "hidden" : false,
   "priority" : 0,
   "tags" : {
   },
   "slaveDelay" : NumberLong(0),
   "votes" : 0
  }
 ],

 "settings" : {
  "chainingAllowed" : true,
  "heartbeatIntervalMillis" : 2000,
  "heartbeatTimeoutSecs" : 10,
  "electionTimeoutMillis" : 10000,
  "getLastErrorModes" : {
  },
  "getLastErrorDefaults" : {
   "w" : 1,
   "wtimeout" : 0
  },
  "replicaSetId" : ObjectId("615eb78fdb1ed4092c166786")
 }
}

csReplSet:PRIMARY> cfg.members[1].priority = 1
1
csReplSet:PRIMARY> cfg.members[2].priority = 1
1
csReplSet:PRIMARY> cfg.members[1].votes = 1
1
csReplSet:PRIMARY> cfg.members[2].votes = 1
1
csReplSet:PRIMARY> cfg.members[2].votes
1
csReplSet:PRIMARY> rs.reconfig(cfg)
{ "ok" : 1 }

 

Then go to the first config server 27107 and step down it for another member to become PRIMARY. So that you can make it down and change the storage engine to WT in it like above as well + without configsvrMode option. When it joins back, it does the initial sync from the syscSource of the replicaSet.

csReplSet:PRIMARY> rs.stepDown()
2021-10-07T05:14:26.617-0400 E QUERY    [thread1] Error: error doing query: failed: network error while attempting to run command 'replSetStepDown' on host 'localhost:27107'  :
DB.prototype.runCommand@src/mongo/shell/db.js:135:1
DB.prototype.adminCommand@src/mongo/shell/db.js:153:16
rs.stepDown@src/mongo/shell/utils.js:1202:12
@(shell):1:1

2021-10-07T05:14:26.619-0400 I NETWORK  [thread1] trying reconnect to localhost:27107 (127.0.0.1) failed
2021-10-07T05:14:26.619-0400 I NETWORK  [thread1] reconnect localhost:27107 (127.0.0.1) ok
csReplSet:SECONDARY>

 

To remind you that if you forget to change the storage engine to WT and without —configsvrMode option, it will go into REMOVED state, so please make note of it. (Here it is started purposefully started to show as an example.)

$ mongo32 localhost:27107
MongoDB shell version: 3.2.22
connecting to: localhost:27107/test
csReplSet:REMOVED>

 

Now, bringing down the DB instance 27017 and clearing DB files + starting back with WT and replSet options (remove configsvrMode option this time)

$ rm -rf data/config/db/*

$ /home/vinodh.krishnaswamy/mongo/mongodb-linux-x86_64-3.2.22/bin/mongod --dbpath /home/vinodh.krishnaswamy/mongo/testshard32/data/config/db --logpath /home/vinodh.krishnaswamy/mongo/testshard32/data/config/mongod.log --port 27107 --fork --configsvr --wiredTigerCacheSizeGB 1 --replSet csReplSet
about to fork child process, waiting until server is ready for connections.
forked process: 35202
child process started successfully, parent exiting

 

Then restart the mongos instances with the below –configDB option for the new config to take effect:

mongos <options> –configDB csReplSet/localhost:27107,localhost:27108,localhost:27109

From the mongos logfile, you will see the below message if the config server replicaset was connected successfully:

2021-10-07T05:48:25.679-0400 I NETWORK  [conn6] Successfully connected to csReplSet/localhost:27107,localhost:27108,localhost:27109 (1 connections now open to csReplSet/localhost:27107,localhost:27108,localhost:27109 with a 0 second timeout)

Conclusion

We at Percona don’t recommend anyone to have EOL versions running as they don’t get any bugs fixed or you cannot use the recently added features. Hope this blog helps some who still looking to migrate from older versions.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com