Every new MongoDB release brings many changes and new features. One of them is the setDefaultRWConcern command, introduced in MongoDB 4.4. The change that came with it caused multi-document transaction writes to fail for one of my customers. In this blog post, we will look into the problem and how to resolve it.
Introduction
To set a default writeConcern for your replicaSet, used whenever a command does not specify one explicitly, you can set it via rs.conf().settings.getLastErrorDefaults. The default value is {w: 1, wtimeout: 0}, i.e. it requires an acknowledgment from the PRIMARY member alone, and this setting has been in MongoDB for a long time. After upgrading to MongoDB 4.4, the customer faced issues with multi-document transactions: the writes failed, causing application downtime.
Issue
Until MongoDB 4.2, the cluster/replicaSet-wide writeConcern could be changed via rs.conf().settings.getLastErrorDefaults. From MongoDB 4.4, however, a multi-document transaction run with non-default values of getLastErrorDefaults fails with the message below:
"errmsg" : "writeConcern is not allowed within a multi-statement transaction",
This is because changing the default value of getLastErrorDefaults was deprecated in MongoDB 4.4. To change the global writeConcern/readConcern, use the new setDefaultRWConcern command, which was introduced in MongoDB 4.4.
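Before upgrading, it can help to check whether a replicaSet carries a non-default getLastErrorDefaults at all. The following is a minimal sketch of such a check, not an official utility: the function name hasCustomGetLastErrorDefaults is hypothetical, and it assumes the w and wtimeout values in the config are plain numbers or strings, as in the outputs in this post.

```javascript
// Hypothetical helper: returns true when the replica set config carries a
// getLastErrorDefaults value other than the default { w: 1, wtimeout: 0 }.
// Assumes w and wtimeout are plain JS numbers/strings.
function hasCustomGetLastErrorDefaults(cfg) {
  var settings = (cfg && cfg.settings) || {};
  var gle = settings.getLastErrorDefaults;
  if (!gle) {
    return false; // field absent, so the defaults apply
  }
  var w = gle.w === undefined ? 1 : gle.w;
  var wtimeout = gle.wtimeout === undefined ? 0 : gle.wtimeout;
  return w !== 1 || wtimeout !== 0;
}

// Usage in the mongo shell (on a replica set member):
//   hasCustomGetLastErrorDefaults(rs.conf())
```

If this returns true on a replicaSet that runs multi-document transactions, the config is a candidate for the failure described above.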
Test Case
The default value of getLastErrorDefaults is {w: 1, wtimeout: 0}. Let’s change it to a different value, { "w" : "majority", "wtimeout" : 3600 }:
replset:PRIMARY> cfg = rs.conf()
replset:PRIMARY> cfg.settings.getLastErrorDefaults.w = "majority"
majority
replset:PRIMARY> cfg.settings.getLastErrorDefaults.wtimeout = 3600
3600
replset:PRIMARY>
replset:PRIMARY> rs.reconfig(cfg)
{
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1624733172, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1624733172, 1)
}
replset:PRIMARY> rs.conf().settings.getLastErrorDefaults
{ "w" : "majority", "wtimeout" : 3600 }
When running a transaction against the collection percona.people, the insert fails with an error because of the changed getLastErrorDefaults value:
replset:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("3ae288d5-b793-4505-ab5b-5e37d289414a") }
replset:PRIMARY> session.startTransaction()
replset:PRIMARY> session.getDatabase("percona").people.insert([{_id: 1 , name : "George"},{_id: 2, name: "Tom"}])
WriteCommandError({
"operationTime" : Timestamp(1623235947, 1),
"ok" : 0,
"errmsg" : "writeConcern is not allowed within a multi-statement transaction",
"code" : 72,
"codeName" : "InvalidOptions",
"$clusterTime" : {
"clusterTime" : Timestamp(1623235947, 1),
"signature" : {
"hash" : BinData(0,"XPcHTqxG4/LNyaScd/M3ZV6yM3g="),
"keyId" : NumberLong("6952046137905774595")
}
}
})
How to Resolve It
To resolve this issue, revert getLastErrorDefaults to its default value if it has been changed:
replset:PRIMARY> cfg = rs.conf()
replset:PRIMARY> cfg.settings.getLastErrorDefaults.w = 1
1
replset:PRIMARY> cfg.settings.getLastErrorDefaults.wtimeout = 0
0
replset:PRIMARY> rs.reconfig(cfg)
{
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1623236051, 1),
"signature" : {
"hash" : BinData(0,"gZwK9B08VTiEUcLq2/1wvxW5RJI="),
"keyId" : NumberLong("6952046137905774595")
}
},
"operationTime" : Timestamp(1623236051, 1)
}
replset:PRIMARY> rs.conf().settings.getLastErrorDefaults
{ "w" : 1, "wtimeout" : 0 }
Then set the required default writeConcern (and readConcern, if needed) via the setDefaultRWConcern command as follows:
replset:PRIMARY> db.adminCommand({ "setDefaultRWConcern" : 1, "defaultWriteConcern" : { "w" : "majority", "wtimeout": 3600 } })
{
"defaultWriteConcern" : {
"w" : "majority",
"wtimeout" : 3600
},
"updateOpTime" : Timestamp(1624786369, 1),
"updateWallClockTime" : ISODate("2021-06-27T09:32:57.906Z"),
"localUpdateWallClockTime" : ISODate("2021-06-27T09:32:57.956Z"),
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1624786377, 2),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1624786377, 2)
}
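If several replicaSets need the same migration, the command document can be derived from the old getLastErrorDefaults value instead of being typed by hand. This is a small sketch under that assumption; buildSetDefaultRWConcern is a hypothetical name, and only the setDefaultRWConcern and defaultWriteConcern fields shown above are used.

```javascript
// Hypothetical helper: turns an old getLastErrorDefaults document into the
// equivalent setDefaultRWConcern command document.
function buildSetDefaultRWConcern(gle) {
  var writeConcern = { w: gle.w };
  if (gle.wtimeout !== undefined) {
    writeConcern.wtimeout = gle.wtimeout;
  }
  // The command name must be the first key of the document.
  return { setDefaultRWConcern: 1, defaultWriteConcern: writeConcern };
}

// Usage in the mongo shell, with the value captured before the revert:
//   db.adminCommand(buildSetDefaultRWConcern({ w: "majority", wtimeout: 3600 }))
```

Capture the old value from rs.conf().settings.getLastErrorDefaults before reverting it, since the revert overwrites it.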
Then write with a transaction into the collection percona.people:
replset:PRIMARY> session2 = db.getMongo().startSession()
session { "id" : UUID("aedf3139-08a4-466d-b1df-d8ced97abd9d") }
replset:PRIMARY> session2.startTransaction()
replset:PRIMARY> session2.getDatabase("percona").people.find()
replset:PRIMARY> session2.getDatabase("percona").people.insert([{_id: 4 , name : "George"},{_id: 5, name: "Tom"}])
BulkWriteResult({
"writeErrors" : [ ],
"writeConcernErrors" : [ ],
"nInserted" : 2,
"nUpserted" : 0,
"nMatched" : 0,
"nModified" : 0,
"nRemoved" : 0,
"upserted" : [ ]
})
replset:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
replset:PRIMARY> session2.commitTransaction()
replset:PRIMARY>
Check the data from a different session to validate the writes:
replset:PRIMARY> use percona
switched to db percona
replset:PRIMARY> db.people.find()
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
Note:
Normal (non-transaction) writes are not affected by non-default values in getLastErrorDefaults. So if your application doesn’t use transactions, you don’t need to act immediately to remove non-default values from getLastErrorDefaults, though keeping them is not recommended.
How setDefaultRWConcern Works
Here, we will also verify that the default writeConcern set via setDefaultRWConcern works as expected. For testing, let’s take down one member of the 3-node replicaSet:
replset:PRIMARY> rs.status().members.forEach(function(doc){printjson(doc.name+" - "+doc.stateStr)})
"localhost:37040 - PRIMARY"
"localhost:37041 - SECONDARY"
"localhost:37042 - (not reachable/healthy)"
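Then test it via an insert command that specifies the writeConcern explicitly. It should report a write concern error, as only 2 members are alive and {w: 3} needs acknowledgement from 3 members of the replicaSet. Note that nInserted is still 1: the document is written on the PRIMARY, but the requested acknowledgement does not arrive within wtimeout: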
replset:PRIMARY> rs.conf().settings.getLastErrorDefaults
{ "w" : 1, "wtimeout" : 0 }
replset:PRIMARY> db.people.insert({"_id" : 6, "name" : "F"}, { writeConcern: { w: 3, wtimeout: 50 } })
WriteResult({
"nInserted" : 1,
"writeConcernError" : {
"code" : 64,
"codeName" : "WriteConcernFailed",
"errmsg" : "waiting for replication timed out",
"errInfo" : {
"wtimeout" : true,
"writeConcern" : {
"w" : 3,
"wtimeout" : 50,
"provenance" : "clientSupplied"
}
}
}
})
Let’s now test by setting the default writeConcern to { "w" : 3, "wtimeout": 30 } via setDefaultRWConcern as follows:
replset:PRIMARY> db.adminCommand({ "setDefaultRWConcern" : 1, "defaultWriteConcern" : { "w" : 3, "wtimeout": 30 } })
{
"defaultWriteConcern" : {
"w" : 3,
"wtimeout" : 30
},
"updateOpTime" : Timestamp(1624787659, 1),
"updateWallClockTime" : ISODate("2021-06-27T09:54:26.332Z"),
"localUpdateWallClockTime" : ISODate("2021-06-27T09:54:26.332Z"),
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1624787666, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1624787666, 1)
}
Now try a write, first normally and then within a transaction, without specifying the writeConcern explicitly:
replset:PRIMARY> db.people.insert({"_id" : 7, "name" : "G"})
WriteResult({
"nInserted" : 1,
"writeConcernError" : {
"code" : 64,
"codeName" : "WriteConcernFailed",
"errmsg" : "waiting for replication timed out",
"errInfo" : {
"wtimeout" : true,
"writeConcern" : {
"w" : 3,
"wtimeout" : 30,
"provenance" : "customDefault"
}
}
}
})
With a transaction, the error occurs on commitTransaction() as follows:
replset:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("9aa75ea2-e03f-4ee6-abfb-8969334c9d98") }
replset:PRIMARY>
replset:PRIMARY> session.startTransaction()
replset:PRIMARY>
replset:PRIMARY> session.getDatabase("percona").people.insert([{_id: 8 , name : "H"},{_id: 9, name: "I"}])
BulkWriteResult({
"writeErrors" : [ ],
"writeConcernErrors" : [ ],
"nInserted" : 2,
"nUpserted" : 0,
"nMatched" : 0,
"nModified" : 0,
"nRemoved" : 0,
"upserted" : [ ]
})
replset:PRIMARY>
replset:PRIMARY> session.commitTransaction()
uncaught exception: Error: command failed: {
"writeConcernError" : {
"code" : 64,
"codeName" : "WriteConcernFailed",
"errmsg" : "waiting for replication timed out",
"errInfo" : {
"wtimeout" : true,
"writeConcern" : {
"w" : 3,
"wtimeout" : 30,
"provenance" : "customDefault"
}
}
},
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1624787808, 2),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1624787808, 1)
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:18:14
_assertCommandWorked@src/mongo/shell/assert.js:639:17
assert.commandWorked@src/mongo/shell/assert.js:729:16
commitTransaction@src/mongo/shell/session.js:966:17
@(shell):1:1
replset:PRIMARY>
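The provenance field in the errors above tells where the applied writeConcern came from: clientSupplied for the explicit {w: 3} insert, customDefault for the value set through setDefaultRWConcern. Below is a rough sketch that maps it to a readable description. The function name writeConcernSource is hypothetical, it expects a plain document shaped like the outputs above, and the two provenance values not seen in this post (getLastErrorDefaults, implicitDefault) are included on the assumption that they match the MongoDB 4.4 documentation.

```javascript
// Hypothetical helper: describes where the write concern in a
// writeConcernError came from, based on its "provenance" field.
// Expects a plain document shaped like the error outputs in this post.
function writeConcernSource(res) {
  var err = res && res.writeConcernError;
  if (!err || !err.errInfo || !err.errInfo.writeConcern) {
    return null; // no write concern error information available
  }
  var provenance = err.errInfo.writeConcern.provenance;
  var meanings = {
    clientSupplied: "specified explicitly by the operation",
    customDefault: "set via setDefaultRWConcern",
    getLastErrorDefaults: "taken from settings.getLastErrorDefaults",
    implicitDefault: "server's built-in default"
  };
  if (Object.prototype.hasOwnProperty.call(meanings, provenance)) {
    return meanings[provenance];
  }
  return "unknown provenance: " + provenance;
}
```

Seeing customDefault on a write that never specified a writeConcern is a quick confirmation that the setDefaultRWConcern value is the one being applied.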
How to Check DefaultRWConcern
You can get the current default read/write concerns via the getDefaultRWConcern command:
replset:PRIMARY> db.adminCommand( { getDefaultRWConcern : 1 } )
{
"defaultWriteConcern" : {
"w" : 3,
"wtimeout" : 30
},
"updateOpTime" : Timestamp(1624793719, 1),
"updateWallClockTime" : ISODate("2021-06-27T11:35:22.552Z"),
"localUpdateWallClockTime" : ISODate("2021-06-27T11:35:22.552Z"),
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1624793821, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1624793821, 1)
}
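To verify in a script that the expected default is in place, the response can be compared against the intended value. This is only a sketch: defaultWriteConcernMatches is a hypothetical name, and it assumes the response has the shape shown above, with defaultWriteConcern absent when no default has been set.

```javascript
// Hypothetical helper: checks whether a getDefaultRWConcern response carries
// the expected default write concern (compares w and wtimeout only).
function defaultWriteConcernMatches(response, expected) {
  var actual = response && response.defaultWriteConcern;
  if (!actual) {
    return false; // no default write concern set
  }
  return actual.w === expected.w && actual.wtimeout === expected.wtimeout;
}

// Usage in the mongo shell:
//   defaultWriteConcernMatches(db.adminCommand({ getDefaultRWConcern: 1 }),
//                              { w: 3, wtimeout: 30 })
```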
Conclusion
From 4.4.0 through 4.4.6, the workaround for this problem is to keep getLastErrorDefaults at its default value. If you need a different cluster/replicaSet-wide write/read concern, on those versions or after 4.4.6, set it via the setDefaultRWConcern command instead. MongoDB 4.4 honors any write concern value that you specify in getLastErrorDefaults; however, specifying one is not allowed from v5.0 (refer to SERVER-56241). This behavior was reported in JIRA tickets SERVER-54896 and SERVER-55701, and it was fixed in version 4.4.7, which ignores the value of getLastErrorDefaults as per the bug fix.
Also, keep this in mind when you upgrade to MongoDB 4.4 and, if needed, change your default read/write concern settings through the setDefaultRWConcern command.
Hope this helps you!