Oct
10
2018
--

MongoDB Replica set Scenarios and Internals

MongoDB replica sets replication internals r

MongoDB replica sets replication internals rThe MongoDB® replica set is a group of nodes with one set as the primary node, and all other nodes set as secondary nodes. Only the primary node accepts “write” operations, while other nodes can only serve “read” operations according to the read preferences defined. In this blog post, we’ll focus on some MongoDB replica set scenarios, and take a look at the internals.

Example configuration

We will refer to a three node replica set that includes one primary node and two secondary nodes running as:

"members" : [
{
"_id" : 0,
"name" : "192.168.103.100:25001",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 3533,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"electionTime" : Timestamp(1537797392, 2),
"electionDate" : ISODate("2018-09-24T13:56:32Z"),
"configVersion" : 3,
"self" : true
},
{
"_id" : 1,
"name" : "192.168.103.100:25002",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 3063,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"optimeDurableDate" : ISODate("2018-09-24T14:49:44Z"),
"lastHeartbeat" : ISODate("2018-09-24T14:49:45.539Z"),
"lastHeartbeatRecv" : ISODate("2018-09-24T14:49:44.664Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "192.168.103.100:25001",
"configVersion" : 3
},
{
"_id" : 2,
"name" : "192.168.103.100:25003",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2979,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"optimeDurableDate" : ISODate("2018-09-24T14:49:44Z"),
"lastHeartbeat" : ISODate("2018-09-24T14:49:45.539Z"),
"lastHeartbeatRecv" : ISODate("2018-09-24T14:49:44.989Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "192.168.103.100:25002",
"configVersion" : 3
}

Here, the primary is running on port 25001, and the two secondaries are running on ports 25002 and 25003 on the same host.

Secondary nodes can only sync from Primary?

No, it’s not mandatory. Each secondary can replicate data from the primary or any other secondary to the node that is syncing. This term is also known as chaining, and by default, this is enabled.

In the above replica set, you can see that secondary node

"_id":2 

  is syncing from another secondary node

"_id":1

   as

"syncingTo" : "192.168.103.100:25002" 

This can also be found in the logs as here the parameter

chainingAllowed :true

   is the default setting.

settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 60000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5ba8ed10d4fddccfedeb7492') } }

Chaining?

That means that a secondary member node is able to replicate from another secondary member node instead of from the primary node. This helps to reduce the load from the primary. If the replication lag is not tolerable, then chaining could be disabled.

For more details about chaining and the steps to disable it please refer to my earlier blog post here.

Ok, then how does the secondary node select the source to sync from?

If Chaining is False

When chaining is explicitly set to be false, then the secondary node will sync from the primary node only or could be overridden temporarily.

If Chaining is True

  • Before choosing any sync node, TopologyCoordinator performs validations like:
    • Whether chaining is set to true or false.
    • If that particular node is part of the current replica set configurations.
    • Identify the node ahead with oplog with the lowest ping time.
    • The source code that includes validation is here.
  • Once the validation is done, SyncSourceSelector relies on SyncSourceResolver which contains the result and details for the new sync source
  • To get the details and response, SyncSourceResolver coordinates with ReplicationCoordinator
  • This ReplicationCoordinator is responsible for the replication, and co-ordinates with TopologyCoordinator
  • The TopologyCoordinator is responsible for topology of the cluster. It finds the primary oplog time and checks for the maxSyncSourceLagSecs
  • It will reject the source to sync from if the maxSyncSourceLagSecs  is greater than the newest oplog entry. The code for this can be found here
  • If the criteria for the source selection is not fulfilled, then BackgroundSync thread waits and restarts the whole process again to get the sync source.

Example for “unable to find a member to sync from” then, in the next attempt, finding a candidate to sync from

This can be found in the log like this. On receiving the message from rsBackgroundSync thread

could not find member to sync from

, the whole internal process restarts and finds a member to sync from i.e.

sync source candidate: 192.168.103.100:25001

, which means it is now syncing from node 192.168.103.100 running on port 25001.

2018-09-24T13:58:43.197+0000 I REPL     [rsSync] transition to RECOVERING
2018-09-24T13:58:43.198+0000 I REPL     [rsBackgroundSync] could not find member to sync from
2018-09-24T13:58:43.201+0000 I REPL     [rsSync] transition to SECONDARY
2018-09-24T13:58:59.208+0000 I REPL     [rsBackgroundSync] sync source candidate: 192.168.103.100:25001

  • Once the sync source node is selected, SyncSourceResolver probes the sync source to confirm that it is able to fetch the oplogs.
  • RollbackID is also fetched i.e. rbid  after the first batch is returned by oplogfetcher.
  • If all eligible sync sources are too fresh, such as during initial sync, then the syncSourceStatus Oplog start is missing and earliestOpTimeSeen will set a new minValid.
  • This minValid is also set in the case of rollback and abrupt shutdown.
  • If the node has a minValid entry then this is checked for the eligible sync source node.

Example showing the selection of a new sync source when the existing source is found to be invalid

Here, as the logs show, during sync the node chooses a new sync source. This is because it found the original sync source is not ahead, so not does not contain recent oplogs from which to sync.

2018-09-25T15:20:55.424+0000 I REPL     [replication-1] Choosing new sync source because our current sync source, 192.168.103.100:25001, has an OpTime ({ ts: Timestamp 1537879296000|1, t: 4 }) which is not ahead of ours ({ ts: Timestamp 1537879296000|1, t: 4 }), it does not have a sync source, and it's not the primary (sync source does not know the primary)

2018-09-25T15:20:55.425+0000 W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: sync source 192.168.103.100:25001 (config version: 3; last applied optime: { ts: Timestamp 1537879296000|1, t: 4 }; sync source index: -1; primary index: -1) is no longer valid

  • If the secondary node is too far behind the eligible sync source node, then the node will enter maintenance node and then resync needs to be call manually.
  • Once the sync source is chosen, BackgroundSync starts oplogFetcher.

Example for oplogFetcher

Here is an example of fetching oplog from the “oplog.rs” collection, and checking for the greater than required timestamp.

2018-09-26T10:35:07.372+0000 I COMMAND  [conn113] command local.oplog.rs command: getMore { getMore: 20830044306, collection: "oplog.rs", maxTimeMS: 5000, term: 7, lastKnownCommittedOpTime: { ts: Timestamp 1537955038000|1, t: 7 } } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1537903865000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 7, readConcern: { afterOpTime: { ts: Timestamp 1537903865000|1, t: 6 } } } planSummary: COLLSCAN cursorid:20830044306 keysExamined:0 docsExamined:0 numYields:1 nreturned:0 reslen:451 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 3 } } } protocol:op_command 3063398ms

When and what details replica set nodes communicate with each other?

At a regular interval, all the nodes communicate with each other to check the status of the primary node, check the status of the sync source, to get the oplogs and so on.

ReplicationCoordinator has ReplicaSetConfig that has a list of all the replica set nodes, and each node has a copy of it. This makes nodes aware of other nodes under same replica set.

This is how nodes communicate in more detail:

Heartbeats: This checks the status of other nodes i.e. alive or die

heartbeatInterval: Every node, at an interval of two seconds, sends the other nodes a heartbeat to make them aware that “yes I am alive!”

heartbeatTimeoutSecs: This is a timeout, and means that if the heartbeat is not returned in 10 seconds then that node is marked as inaccessible or simply die.

Every heartbeat is identified by these replica set details:

  • replica set config version
  • replica set name
  • Sender host address
  • id from the replicasetconfig

The source code could be referred to from here.

When the remote node receives the heartbeat, it processes this data and validates if the details are correct. It then prepares a ReplSetHeartbeatResponse, that includes:

  • Name of the replica set, config version, and optime details
  • Details about primary node as per the receiving node.
  • Sync source details and state of receiving node

This heartbeat data is processed, and if primary details are found then the election gets postponed.

TopologyCoordinator checks for the heartbeat data and confirms if the node is OK or NOT. If the node is OK then no action is taken. Otherwise it needs to be reconfigured or else initiate a priority takeover based on the config.

Response from oplog fetcher

To get the oplogs from the sync source, nodes communicate with each other. This oplog fetcher fetches oplogs through “find” and “getMore”. This will only affect the downstream node that gets metadata from its sync source to update its view from the replica set.

OplogQueryMetadata only comes with OplogFetcher responses

OplogQueryMetadata comes with OplogFetcher response and ReplSetMetadata comes with all the replica set details including configversion and replication commands.

Communicate to update Position commands:

This is to get an update for replication progress. ReplicationCoordinatorExternalState creates SyncSourceFeedback sends replSetUpdatePosition commands.

It includes Oplog details, Replicaset config version, and replica set metadata.

If a new node is added to the existing replica set, how will that node get the data?

If a new node is added to the existing replica set then the “initial sync” process takes place. This initial sync can be done in two ways:

  1. Just add the new node to the replicaset and let initial sync threads restore the data. Then it syncs from the oplogs until it reaches the secondary state.
  2. Copy the data from the recent data directory to the node, and restart this new node. Then it will also sync from the oplogs until it reaches the secondary state.

This is how it works internally

When “initial sync” or “rsync” is called by ReplicationCoordinator  then the node goes to “STARTUP2” state, and this initial sync is done in DataReplicator

  • A sync source is selected to get the data from, then it drops all the databases except the local database, and oplogs are recreated.
  • DatabasesCloner asks syncsource for a list of the databases, and for each database it creates DatabaseCloner.
  • For each DatabaseCloner it creates CollectionCloner to clone the collections
  • This CollectionCloner calls ListIndexes on the syncsource and creates a CollectionBulkLoader for parallel index creation while data cloning
  • The node also checks for the sync source rollback id. If rollback occurred, then it restarts the initial sync. Otherwise, datareplicator is done with its work and then replicationCoordinator assumes the role for ongoing replication.

Example for the “initial sync” :

Here node enters  

"STARTUP2"- "transition to STARTUP2"

Then sync source gets selected and drops all the databases except the local database.  Next, replication oplog is created and CollectionCloner is called.

Local database not dropped: because every node has its own “local” database with its own and other nodes’ information, based on itself, this database is not replicated to other nodes.

2018-09-26T17:57:09.571+0000 I REPL     [ReplicationExecutor] transition to STARTUP2
2018-09-26T17:57:14.589+0000 I REPL     [replication-1] sync source candidate: 192.168.103.100:25003
2018-09-26T17:57:14.590+0000 I STORAGE  [replication-1] dropAllDatabasesExceptLocal 1
2018-09-26T17:57:14.592+0000 I REPL     [replication-1] creating replication oplog of size: 990MB... 2018-09-26T17:57:14.633+0000 I REPL     [replication-0] CollectionCloner::start called, on ns:admin.system.version

Finished fetching all the oplogs, and finishing up initial sync.

2018-09-26T17:57:15.685+0000 I REPL     [replication-0] Finished fetching oplog during initial sync: CallbackCanceled: Callback canceled. Last fetched optime and hash: { ts: Timestamp 1537984626000|1, t: 9 }[-1139925876765058240]
2018-09-26T17:57:15.685+0000 I REPL     [replication-0] Initial sync attempt finishing up.

What are oplogs and where do these reside?

oplogs stands for “operation logs”. We have used this term so many times in this blog post as these are the mandatory logs for the replica set. These operations are in the capped collection called “oplog.rs”  that resides in “local” database.

Below, this is how oplogs are stored in the collection “oplog.rs” that includes details for timestamp, operations, namespace, output.

rplint:PRIMARY> use local
rplint:PRIMARY> show collections
oplog.rs
rplint:PRIMARY> db.oplog.rs.findOne()
{
 "ts" : Timestamp(1537797392, 1),
 "h" : NumberLong("-169301588285533642"),
 "v" : 2,
 "op" : "n",
 "ns" : "",
 "o" : {
 "msg" : "initiating set"
 }
}

It consists of rolling update operations coming to the database. Then these oplogs replicate to the secondary node(s) to maintain the high availability of the data in case of failover.

When the replica MongoDB instance starts, it creates an oplog ocdefault size. For Wired tiger, the default size is 5% of disk space, with a lower bound size of 990MB. So here in the example it creates 990MB of data. If you’d like to learn more about oplog size then please refer here

2018-09-26T17:57:14.592+0000 I REPL     [replication-1] creating replication oplog of size: 990MB...

What if the same oplog is applied multiple times, will that not lead to inconsistent data?

Fortunately, oplogs are Idempotent that means the value will remain unchanged, or will provide the same output, even when applied multiple times.

Let’s check an example:

For the $inc operator that will increment the value by 1 for the filed “item”, if this oplog is applied multiple times then the result might lead to an inconsistent record if this is not Idempotent. However, rather than increasing the item value multiple times, it is actually applied only once.

rplint:PRIMARY> use db1
//inserting one document
rplint:PRIMARY> db.col1.insert({item:1, name:"abc"})
//updating document by incrementing item value with 1
rplint:PRIMARY> db.col1.update({name:"abc"},{$inc:{item:1}})
//updated value is now item:2
rplint:PRIMARY> db.col1.find()
{ "_id" : ObjectId("5babd57cce2ef78096ac8e16"), "item" : 2, "name" : "abc" }

This is how these operations are stored in oplog, here this $inc value is stored in oplog as $set

rplint:PRIMARY> db.oplog.rs.find({ns:"db1.col1"})
//insert operation
{ "ts" : Timestamp(1537987964, 2), "t" : NumberLong(9), "h" : NumberLong("8083740413874479202"), "v" : 2, "op" : "i", "ns" : "db1.col1", "o" : { "_id" : ObjectId("5babd57cce2ef78096ac8e16"), "item" : 1, "name" : "abc" } }
//$inc operation is changed as ""$set" : { "item" : 2"
{ "ts" : Timestamp(1537988022, 1), "t" : NumberLong(9), "h" : NumberLong("-1432987813358665721"), "v" : 2, "op" : "u", "ns" : "db1.col1", "o2" : { "_id" : ObjectId("5babd57cce2ef78096ac8e16") }, "o" : { "$set" : { "item" : 2 } } }

That means that however many  times it is applied, it will generate the same results, so no inconsistent data!

I hope this blog post helps you to understand multiple scenarios for MongoDB replica sets, and how data replicates to the nodes.

May
31
2018
--

MongoDB: deploy a replica set with transport encryption (part 3/3)

MongoDB Encryption Replica Sets

MongoDB Encryption Replica SetsIn this third and final post of the series, we look at how to configure transport encryption on a deployed MongoDB replica set. Security vulnerabilities can arise when internal personnel have legitimate access to the private network, but should not have access to the data. Encrypting intra-node traffic ensures that no one can “sniff” sensitive data on the network.

In part 1 we described MongoDB replica sets and how they work.
In part 2 we provided a step-by-step guide to deploy a simple 3-node replica set, including information on replica set configuration.

Enable Role-Based Access Control

In order for the encryption to be used in our replica set, we need first to activate Role-Based Access Control (RBAC). By default, a MongoDB installation permits anyone to connect and see the data, as in the sample deployment we created in part 2. Having RBAC enabled is mandatory for encryption.

RBAC governs access to a MongoDB system. Users are created and assigned privileges to access specific resources, such as databases and collections. Likewise, for carrying out administrative tasks, users need to be created with specific grants. Once activated, every user must authenticate themselves in order to access MongoDB.

Prior to activating RBAC, let’s create an administrative user. We’ll connect to the PRIMARY member and do the following:

rs-test:PRIMARY> use admin
switched to db admin
rs-test:PRIMARY> db.createUser({user: 'admin', pwd: 'secret', roles:['root']})
Successfully added user: { "user" : "admin", "roles" : [ "root" ] }

Let’s activate the RBAC in the configuration file /etc/mongod.conf on each node

security:
      authorization: enabled

and restart the daemon

sudo service mongod restart

Now to connect to MongoDB we issue the following command:

mongo -u admin -p secret --authenticationDatabase "admin"

Certificates

MongoDB supports X.509 certificate authentication for use with a secure TLS/SSL connection. The members can use X.509 certificates to verify their membership of the replica set.

In order to use encryption, we need to create certificates on all the nodes and have a certification authority (CA) that signs them. Since having a certification authority can be quite costly, we decide to use self-signed certificates. For our purposes, this solution ensures encryption and has no cost. Using a public CA is not necessary inside a private infrastructure.

To proceed with certificate generation we need to have openssl installed on our system and certificates need to satisfy these requirements:

  • any certificate needs to be signed by the same CA
  • the common name (CN) required during the certificate creation must correspond to the hostname of the host
  • any other field requested in the certificate creation should be a non-empty value and, hopefully, should reflect our organization details
  • it is also very important that all the fields, except the CN, should match those from the certificates for the other cluster members

The following guide describes all the steps to configure internal X.509 certificate-based encryption.

1 – Connect to one of the hosts and generate a new private key using openssl

openssl genrsa -out mongoCA.key -aes256 8192

We have created a new 8192 bit private key and saved it in the file mongoCA.key
Remember to enter a strong passphrase when requested.

2 – Sign a new CA certificate

Now we are going to create our “fake” local certification authority that we’ll use later to sign each node certificate.

During certificate creation, some fields must be filled out. We could choose these randomly but they should correspond to our organization’s details.

root@psmdb1:~# openssl req -x509 -new -extensions v3_ca -key mongoCA.key -days 365 -out
    mongoCA.crt
    Enter pass phrase for mongoCA.key:
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [AU]:US
    State or Province Name (full name) [Some-State]:California
    Locality Name (eg, city) []:San Francisco
    Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Company Ltd
    Organizational Unit Name (eg, section) []:DBA
    Common Name (e.g. server FQDN or YOUR name) []:psmdb
    Email Address []:corrado@mycompany.com

3 – Issue self-signed certificates for all the nodes

For each node, we need to generate a certificate request and sign it using the CA certificate we created in the previous step.

Remember: fill out all the fields requested the same for each host, but remember to fill out a different common name (CN) that must correspond to the hostname.

For the first node issue the following commands.

openssl req -new -nodes -newkey rsa:4096 -keyout psmdb1.key -out psmdb1.csr
openssl x509 -CA mongoCA.crt -CAkey mongoCA.key -CAcreateserial -req -days 365 -in psmdb1.csr -out psmdb1.crt
cat psmdb1.key psmdb1.crt > psmdb1.pem

for the second node

openssl req -new -nodes -newkey rsa:4096 -keyout psmdb2.key -out psmdb2.csr
openssl x509 -CA mongoCA.crt -CAkey mongoCA.key -CAcreateserial -req -days 365 -in psmdb2.csr -out psmdb2.crt
cat psmdb2.key psmdb2.crt > psmdb2.pem

and for the third node

openssl req -new -nodes -newkey rsa:4096 -keyout psmdb3.key -out psmdb3.csr
openssl x509 -CA mongoCA.crt -CAkey mongoCA.key -CAcreateserial -req -days 365 -in psmdb3.csr -out psmdb3.crt
cat psmdb3.key psmdb3.crt > psmdb3.pem

4 – Place the files

We could execute all of the commands in the previous step on the same host, but now we need to copy the generated files to the proper nodes:

  • Copy to each node the CA certifcate file: mongoCA.crt
  • Copy each self signed certifcate <hostname>.pem into the relative member
  • Create on each member a directory that only the MongoDB user can read, and copy both files there
sudo mkdir -p /etc/mongodb/ssl
sudo chmod 700 /etc/mongodb/ssl
sudo chown -R mongod:mongod /etc/mongodb
sudo cp psmdb1.pem /etc/mongodb/ssl
sudo cp mongoCA.crt /etc/mongodb/ssl

Do the same on each host.

5 – Configure mongod

Finally, we need to instruct mongod about the certificates to enable the encryption.

Change the configuration file /etc/mongod.conf on each host adding the following rows:

net:
   port: 27017
   ssl:
      mode: requireSSL
      PEMKeyFile: /etc/mongodb/ssl/psmdb1.pem
      CAFile: /etc/mongodb/ssl/mongoCA.crt
      clusterFile: /etc/mongodb/ssl/psmdb1.pem
   security:
      authorization: enabled
      clusterAuthMode: x509

Restart the daemon

sudo service mongodb restart

Make sure to put the proper file names on each host (psmdb2.pem on psmdb2 host and so on)

Now, as long as we have made no mistakes, we have a properly configured replica set that is using encrypted connections.

Issue the following command to connect on node psmdb1:

mongo admin --ssl --sslCAFile /etc/mongodb/ssl/mongoCA.crt
--sslPEMKeyFile /etc/mongodb/ssl/psmdb1.pem
-u admin -p secret --host psmdb1

Access the first two articles in this series

  • Part 1: Introduces basic replica set concepts, how it works and what its main features
  • Part 2:  Provides a step-by-step guide to configure a three-node replica set

The post MongoDB: deploy a replica set with transport encryption (part 3/3) appeared first on Percona Database Performance Blog.

Jul
07
2017
--

Last Resort: How to Use a Backup to Start a Secondary Instance for MongoDB?

Backup to Start a Secondary

Backup to Start a SecondaryIn this blog post, I’ll look at how you can use a backup to start a secondary instance for MongoDB.

Although the documentation says it is not possible to use a backup to start a secondary, sometimes this is the only possible way to start a new instance. In this blog post, we will explain how to bypass this limitation and use a backup to start a secondary instance.

The initial sync/rsync or snapshot works fine when the instances are in the same data center, but it can fail or be really slow. Much slower than moving a compressed backup between data centers.

Not every backup can be used as a source for starting a replica set. The backup must have the oplog file. This means the backup must be done in a previously existent replica set using the

--oplog flag

 point in time backup when dumping the collections. The time spent to move and restore the backup file must be less than the oplog window.

Please follow the next steps to create a secondary from a backup:

  1. Create a backup using the
    --oplog

     command.

    mongodump --oplog -o /tmp/backup/
  2. Backup the replica set collection from the local database.

    mongodump -d local -c system.replset
  3. After backup finishes please confirm the oplog.rs file is in the backup folder.
  4. Use
    bsondump

     to convert the oplog to JSON. We will use the last oplog entry as a starting point for the replica set.

    bsondump oplog.rs > oplog.rs.txt
  5. Initiate the new instance without the replica set parameter configured. At this point, the instance will act as a single instance.
  6. Restore the database normally using
    --oplogreplay

     to apply all the oplogs that have been recorded while the backup was running.

    mongorestore /tmp/backup --oplogReplay
  7. Connect to this server and use the local database to create the oplog.rs collection. Please use the same value as the other members (e.g., 20 GB).
    mongo
    use local
    db.runCommand( { create: "oplog.rs", capped: true, size: (20 * 1024 * 1024 * 1024) } )
  8. From the oplog.rs.txt generated in step 4, get the last line and copy the fields ts and h values to a MongoDB document.

    tail -n 1 opslog.rs.txt
  9. Insert the JSON value to the oplog.rs collection that was created before.
    mongo
    use local
    db.oplog.rs.insert({ ts: Timestamp(12354454,1), h: NumberLong("-3434387384732843")})
  10. Restore the replset collection to the local database.
    mongorestore -d local -c system.replset ./backup/repliset.bson
  11. Stop the service and edit the parameter replica set name to match the existing replica set.
  12. Connect to the primary and add this new host. The new host must start catching up the oplog and get in sync after a few hours/minutes, depending on the number of operations the replica set handles. It is important to consider adding this new secondary as a hidden secondary, without votes if possible, to avoid triggering an election. When the secondary is added to the replica set drivers, it will start using this host to perform reads. If you don’t add the server with
    hidden: true

    , the application will read inconsistent data (old data).

    mongo
    PRIMARY>rs.add({_id : <your id>, host: "<newhost:27017>", hidden : true, votes : 0, priority :0})
  13. Please check the replication lag, and once the seconds behind master is near to zero, change the host parameters in the replica set to
    hidden: false

     and priority or

    votes

    .

    mongo
    PRIMARY> rs.printSlaveReplicationInfo()
  14. We are considering a replica set with three members, where the new secondary has the ID 2 in the member’s array. Use the following command to unhide the secondary and make it available for reads. The priority and votes depend on your environment. Please notice you might need to change the member ID.
    mongo
    PRIMARY> cfg = rs.config()
    cfg.members[2].hidden = false
    cfg.members[2].votes = 1
    cfg.members[2].priority = 1
    PRIMARY> rs.reconfig(rs)

I hope this tutorial helps in an emergency situation. Please consider using initial sync, disk snapshots and hot backups before using this method.

Feel free to reach out me on twitter @AdamoTonete or @percona.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com