Dec
18
2018
--

Percona Server for MongoDB 4.0.4-1 GA Is Now Available

Percona Server for MongoDB Operator

Percona announces the GA release of Percona Server for MongoDB 4.0.4-1 on December 18, 2018. Download the latest version from the Percona website or the Percona software repositories.

Date: December 18, 2018
Download: Percona website
Installation: Installing Percona Server for MongoDB

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 4.0 Community Edition. It supports MongoDB 4.0 protocols and drivers.

Percona Server for MongoDB extends the functionality of MongoDB 4.0 Community Edition by including the Percona Memory Engine storage engine, encrypted WiredTiger storage engine, audit logging, SASL authentication, hot backups, and enhanced query profiling. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release includes all features of MongoDB 4.0 Community Edition.

Note that the MMAPv1 storage engine is deprecated in MongoDB 4.0 Community Edition.

In Percona Server for MongoDB 4.0.4-1, data at rest encryption is considered BETA quality. Do not use this feature in a production environment.

Bugs Fixed

  • PSMDB-235: In some cases, hot backup did not back up the keydb directory; mongod could crash after restore.
  • PSMDB-233: When starting Percona Server for MongoDB with WiredTiger encryption options but using a different storage engine, the server started normally and produced no warnings that these options had been ignored.
  • PSMDB-239: The WiredTiger encryption was not disabled when using the Percona Memory Engine storage engine.
  • PSMDB-241: WiredTiger per-database encryption keys were not purged when the database was deleted.
  • PSMDB-243: A log message was added to indicate that the server is running with encryption.
  • PSMDB-245: KeyDB’s WiredTiger logs were not properly rotated without restarting the server.
  • PSMDB-266: When running the server with the --directoryperdb option, the user could add arbitrary collections to the keydb directory which is designated for data encryption.

Due to the fix of bug PSMDB-266, it is not possible to downgrade from version 4.0.4-1 to version 3.6.8-2.0 of Percona Server for MongoDB if using data at rest encryption (downgrading to PSMDB 3.6 will become possible as soon as PSMDB-266 is ported to that version).

Dec
13
2018
--

MongoDB Backup: How and When To Use PSMDB hotbackup and mongodb_consistent_backup

mongodb backup

We have many ways to back up a MongoDB database, using the native mongodump or external tools. However, in this article, we’ll take a look at the backup tools offered by Percona, keeping in mind the restoration scenarios for MongoDB replicaSet and Sharded Cluster environments. We’ll explore how and when to use the tool mongodb-consistent-backup from Percona-Lab to back up the database consistently in Sharded Cluster/replicaSet environments. We’ll also take a look at hotbackup, a tool that’s available in Percona Server for MongoDB (PSMDB) packages.

Backup is done – What about Restore?

Those who are responsible for data almost always think about the methods needed to back up the database and store the backups securely. But they often fail to foresee the scenario where the backup actually needs to be used to restore data. For example, unfortunately, I have seen many companies schedule the backups of config servers and shard servers separately, so they start and complete at different times based on data volumes. But can we use that backup when we need to restore and start the cluster with it? The answer is no—well, maybe yes if you can tweak the metadata, but data inconsistency may occur. With this kind of schedule, the backup is not consistent for the whole cluster, and we don’t have a single point to which we can restore the data for all shards/config dbs and start the cluster from. Consequently, we face a difficult situation when we really need to use that backup!

Let’s explore the two tools/features available from Percona to back up MongoDB, and look at which method to choose based on your restoration plan.

Hot backup for both replicaSet and Sharded Cluster

The main problem with backup is maintaining consistency, as the application still writes to the DB while the backup is going on. So to maintain consistency throughout the backup, and get a reliable full backup of all the data needed to restore the database, the backup tool needs to track changes via the oplog as well. Using the mongodump utility along with an oplog backup achieves this easily in a replicaSet environment, since you only need consistency for that one replicaSet.
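For example, here is a minimal sketch of such a replicaSet backup taken with mongodump; the host, port, credentials, and output directory are illustrative, and the resulting dump can later be restored to a consistent point with mongorestore --oplogReplay.

# Dump all databases from one replicaSet member and also capture the oplog
# entries written while the dump runs (illustrative host/credentials).
mongodump --host localhost --port 27041 \
  -u backup_usr -p backup_pass --authenticationDatabase admin \
  --oplog --out /backup/mongodb/replsetFullBackup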

But when we need a consistent backup of a Sharded Cluster, it is very difficult to achieve consistency for the whole cluster, as it involves backing up all shards and config servers together up to a particular point in time, so the backup can be reused in failover cases. Even if you run mongodump manually on each shard/config server separately and try to take a consistent backup of the whole cluster while writes are being made, it is a very tedious job, because the backup of each shard ends at a different point based on factors such as load, data volume and so on.

To remedy this, we could take a consistent hot backup of the Sharded Cluster by using our utility mongodb-consistent-backup – in other words, a point-in-time backup for the sharded cluster environment. This utility internally uses mongodump and captures the oplog changes from each node until the backup from all data nodes and config servers is complete. This ensures that the backup of the whole Sharded Cluster is consistent! You have to make sure you are using a replicaSet for your config servers too. In fact, this tool also helps you to take a consistent backup in a replicaSet environment.

This utility is available in our Percona lab but please note that it is not yet supported officially. To install this package, please make sure you install all the dependency packages, and follow the steps mentioned in this link to complete the installation process.

If you have enabled authentication in your environment, then create a user like below:

db.createUser({
	user: "backup_usr",
	pwd: "backup_pass",
	roles: [
	{ role: "clusterMonitor", db: "admin" }
	]
})

The backup can be taken as follows by connecting to one of the mongos nodes in the Sharded Cluster. Here mongos is running on port 27051, and the cluster has one config replicaSet cfg and two shards s1 and s2.

[root@app mongodb_consistent_backup-master]# ./bin/mongodb-consistent-backup -H localhost \
> -P 27051 \
> -u backup_usr \
> -p backup_pass \
> -a admin \
> -n clusterFullBackup \
> -l backup/mongodb
[2018-12-05 18:57:38,863] [INFO] [MainProcess] [Main:init:144] Starting mongodb-consistent-backup version 1.4.0 
(git commit: unknown)
[2018-12-05 18:57:38,864] [INFO] [MainProcess] [Main:init:145] Loaded config: {"archive": {"method": "tar", "tar": 
{"binary": "tar", "compression": "gzip"}, "zbackup": {"binary": "/usr/bin/zbackup", "cache_mb": 128, "compression": "lzma"}}, 
"authdb": "admin", "backup": {"location": "backup/mongodb", "method": "mongodump", "mongodump": {"binary": "/usr/bin/mongodump", 
"compression": "auto"}, "name": "clusterFullBackup"}, "environment": "production", "host": "localhost", "lock_file": 
"/tmp/mongodb-consistent-backup.lock", "notify": {"method": "none"}, "oplog": {"compression": "none", "flush": {"max_docs": 100, 
"max_secs": 1}, "tailer": {"enabled": "true", "status_interval": 30}}, "password": "******", "port": 27051, "replication": 
{"max_lag_secs": 10, "max_priority": 1000}, "sharding": {"balancer": {"ping_secs": 3, "wait_secs": 300}}, "upload": {"method": 
"none", "retries": 5, "rsync": {"path": "/", "port": 22}, "s3": {"chunk_size_mb": 50, "region": "us-east-1", "secure": true}, 
"threads": 4}, "username": "backup_usr"}
...
...
[2018-12-05 18:57:40,715] [INFO] [MongodumpThread-5] [MongodumpThread:run:204] Starting mongodump backup of s2/127.0.0.1:27043
[2018-12-05 18:57:40,722] [INFO] [MongodumpThread-7] [MongodumpThread:run:204] Starting mongodump backup of cfg/127.0.0.1:27022
[2018-12-05 18:57:40,724] [INFO] [MongodumpThread-6] [MongodumpThread:run:204] Starting mongodump backup of s1/127.0.0.1:27032
[2018-12-05 18:57:40,800] [INFO] [MongodumpThread-5] [MongodumpThread:wait:130] s2/127.0.0.1:27043:	Enter password:
[2018-12-05 18:57:40,804] [INFO] [MongodumpThread-6] [MongodumpThread:wait:130] s1/127.0.0.1:27032:	Enter password:
[2018-12-05 18:57:40,820] [INFO] [MongodumpThread-7] [MongodumpThread:wait:130] cfg/127.0.0.1:27022:	Enter password:
...
...
[2018-12-05 18:57:54,880] [INFO] [MainProcess] [Mongodump:wait:105] All mongodump backups completed successfully
[2018-12-05 18:57:54,892] [INFO] [MainProcess] [Stage:run:95] Completed running stage mongodb_consistent_backup.Backup with task 
Mongodump in 14.21 seconds
[2018-12-05 18:57:54,913] [INFO] [MainProcess] [Tailer:stop:86] Stopping all oplog tailers
[2018-12-05 18:57:55,955] [INFO] [MainProcess] [Tailer:stop:118] Waiting for tailer s2/127.0.0.1:27043 to stop
[2018-12-05 18:57:56,889] [INFO] [TailThread-2] [TailThread:run:177] Done tailing oplog on s2/127.0.0.1:27043, 2 oplog changes, 
end ts: Timestamp(1544036268, 1)
[2018-12-05 18:57:59,967] [INFO] [MainProcess] [Tailer:stop:118] Waiting for tailer s1/127.0.0.1:27032 to stop
[2018-12-05 18:58:00,801] [INFO] [TailThread-3] [TailThread:run:177] Done tailing oplog on s1/127.0.0.1:27032, 3 oplog changes, 
end ts: Timestamp(1544036271, 1)
[2018-12-05 18:58:03,985] [INFO] [MainProcess] [Tailer:stop:118] Waiting for tailer cfg/127.0.0.1:27022 to stop
[2018-12-05 18:58:04,803] [INFO] [TailThread-4] [TailThread:run:177] Done tailing oplog on cfg/127.0.0.1:27022, 8 oplog changes, 
end ts: Timestamp(1544036279, 1)
[2018-12-05 18:58:06,989] [INFO] [MainProcess] [Tailer:stop:125] Oplog tailing completed in 27.85 seconds
...
...
[2018-12-05 18:58:09,478] [INFO] [MainProcess] [Rotate:symlink:83] Updating clusterFullBackup latest symlink to current backup 
path: backup/mongodb/clusterFullBackup/20181205_1857
[2018-12-05 18:58:09,480] [INFO] [MainProcess] [Main:run:461] Completed mongodb-consistent-backup in 30.49 sec

where:
-n – backup directory name to be created
-l – backup directory
-H – hostname
-P – port
-p – password
-u – user
-a – authentication database

The log above shows the backup in progress, and how the oplog changes are captured and applied. The same command can be used to back up a replicaSet by pointing it at the proper hostname. The tool is also able to identify whether it is connected to a replicaSet or to a Sharded Cluster before proceeding with the backup. This can be seen in the log output written by the tool when running the backup, as shown below:

For a Sharded Cluster:

[2018-12-05 19:05:02,453] [INFO] [MainProcess] [Main:run:299] Running backup in sharding mode using seed node(s): localhost:27051

For a replicaSet:

[2018-12-05 19:23:05,070] [INFO] [MainProcess] [Main:run:257] Running backup in replset mode using seed node(s): localhost:27041

You can check out a couple of our blogs here and here for more details about the utility.

Hot but Cold backup

You may be wondering about the title Hot but Cold backup. Yes, Percona Server for MongoDB (PSMDB) packages include a feature to take a binary hot backup, called hotbackup. Those who know the MySQL world will already know about Percona XtraBackup, our open source and free binary hot backup utility for MySQL. PSMDB hotbackup works in a similar way: when you back up with hotbackup, you end up with a binary backup that is ready to start an instance straight from the backup directory. You don’t need to worry about restoring from scratch and recreating indexes. However, this solution works for replicaSet/standalone MongoDB instances only.
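As an illustration, a hotbackup is triggered from the mongo shell with the createBackup admin command; the target directory below is only an example and must already exist and be writable by the mongod process:

// Minimal sketch: take a binary hot backup of the running PSMDB instance
// into the given directory (the path shown is illustrative).
> use admin
> db.runCommand({ createBackup: 1, backupDir: "/data/backups/psmdb_hotbackup" })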

If you can plan well, you could feasibly use this feature to back up a Sharded Cluster by bringing down one secondary from every shard/config server replicaSet at the same time (ideally when there is little or no write traffic), starting them on a different port and without the replSet option so that those instances won’t rejoin their replicaSet, and then running hotbackup on all of these instances. Once they are finished, revert the changes in the config files and allow them to rejoin their replicaSets.

Cautionary notes: please make sure you are using low priority or hidden nodes for this purpose, so that an election is not triggered when they leave and rejoin the replicaSet, and don’t use SIGKILL (kill -9) to stop the db, as that shuts the database down abruptly. Also, plan to have at least an amount of free disk space equal to that of your shard, since a hotbackup takes approximately the same amount of space as your node.
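If the secondary you plan to use is not already a hidden or priority-0 member, it can be reconfigured from the PRIMARY before you take it down; a minimal sketch, where the member index 2 is illustrative:

// Make one secondary a hidden, priority-0 member so that stopping it for
// the backup cannot trigger an election (member index is illustrative).
cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
rs.reconfig(cfg)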

My colleague Tim Vaillancourt has written a great blog post on this. See here.

Conclusion

From the above two methods, you can now choose the backup method that best matches your RTO and RPO, as explained here. I hope this helps you! Please share your comments and feedback below, and tell me what you think!

REFERENCES:

https://www.percona.com/doc/percona-server-for-mongodb/LATEST/hot-backup.html
https://www.percona.com/forums/questions-discussions/percona-server-for-mongodb/53006-percona-mongodb-difference-between-hot-backup-and-backup-using-mongo-dump
https://www.percona.com/blog/2016/07/25/mongodb-consistent-backups/
https://www.percona.com/blog/2018/04/06/free-fast-mongodb-hot-backup-with-percona-server-for-mongodb/
https://www.bluelock.com/blog/rpo-rto-pto-and-raas-disaster-recovery-explained/
https://en.wikipedia.org/wiki/Disaster_recovery
https://www.druva.com/blog/understanding-rpo-and-rto/
https://www.percona.com/live/e17/sites/default/files/slides/Running%20MongoDB%20in%20Production%20-%20FileId%20-%20115299.pdf
https://major.io/2010/03/18/sigterm-vs-sigkill/
https://docs.mongodb.com/manual/core/sharded-cluster-config-servers


Photo by Designecologist from Pexels

 

Dec
11
2018
--

Percona XtraDB Cluster Operator Is Now Available as an Early Access Release

Percona XtraDB Cluster Operator

Percona announces the early access release of Percona XtraDB Cluster Operator.

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help you save time and effort.

Percona software builds located in the PerconaLabs and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements.

Percona XtraDB Cluster Operator simplifies the deployment and management of Percona XtraDB Cluster in a Kubernetes or OpenShift environment. Kubernetes and the Kubernetes-based OpenShift platform provide users with a distributed orchestration system that automates the deployment, management and scaling of containerized applications.

It extends the Kubernetes API with a new custom resource for deploying, configuring and managing the application through the whole life cycle. You can compare the Kubernetes Operator to a System Administrator who deploys the application and watches the Kubernetes events related to it, taking administrative/operational actions when needed.

The Percona XtraDB Cluster Operator on PerconaLabs is an early access release. It is not recommended for production environments. 

Percona XtraDB Cluster Operator can be installed on Kubernetes or OpenShift. While the operator does not support all the Percona XtraDB Cluster features in this early access release, instructions on how to install and configure it are already available, along with the operator source code, hosted in our GitHub repository.

The operator was developed with high availability in mind, so it will attempt to run ProxySQL and XtraDB Cluster instances on separate worker nodes if possible, deploying the database cluster on at least three member nodes.

Percona XtraDB Cluster is an open source, cost-effective and robust clustering solution for businesses that integrates Percona Server for MySQL with the Galera replication library to produce a highly-available and scalable MySQL® cluster complete with synchronous multi-master replication, zero data loss and automatic node provisioning using Percona XtraBackup.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

Dec
11
2018
--

Percona Server for MongoDB Operator Is Now Available as an Early Access Release

Percona Server for MongoDB Operator

Percona announces the early access release of Percona Server for MongoDB Operator.

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help you save time and effort.

Percona software builds located in the PerconaLabs and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements.

Percona Server for MongoDB Operator simplifies the deployment and management of Percona Server for MongoDB in a Kubernetes or OpenShift environment. Kubernetes and the Kubernetes-based OpenShift platform provide users with a distributed orchestration system that automates the deployment, management and scaling of containerized applications.

It extends the Kubernetes API with a new custom resource for deploying, configuring and managing the application through the whole life cycle. You can compare the Kubernetes Operator to a System Administrator who deploys the application and watches the Kubernetes events related to it, taking administrative/operational actions when needed.

The Percona Server for MongoDB Operator on PerconaLabs is an early access release. It is not recommended for production environments. 

Percona Server for MongoDB Operator can be installed on Kubernetes or OpenShift. While the operator does not support all the Percona Server for MongoDB features in this early access release, instructions on how to install and configure it are already available, along with the operator source code, which is hosted in our GitHub repository.

The operator was developed with high availability in mind, so it will attempt to run MongoDB instances on separate worker nodes where possible, deploying the database cluster as a single replica set with at least three member nodes.


Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine, as well as several enterprise-grade features. It requires no changes to MongoDB applications or code.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

Dec
04
2018
--

MongoDB 4.0: Using ACID Multi-Document Transactions

mongodb 4.0 acid compliant transactions

MongoDB 4.0 is around, and there are a lot of new features and improvements. In this article we’re going to focus on the major feature which is, undoubtedly, the support for multi-document ACID transactions. This novelty for a NoSQL database could be seen as a way to get closer to the relational world. Well, it’s not that—or maybe not just that. It’s a way to add to the document-based model a new, important, and often requested feature to address a wider range of use cases. The document model and its flexibility should remain the best way to start building an application on MongoDB. At this stage, transactions should be used in specific cases, when you absolutely need them: for example, because your application needs to guarantee data consistency and atomicity. Transactions incur a greater performance cost than single-document writes, so the denormalized data model will continue to be optimal in many cases, and this helps to minimize the need for transactions.

Single writes are atomic by design: as long as you are able to embed documents in your collections you absolutely don’t need to use a transaction. Even so, transaction support is a very good and interesting feature that you can rely on in MongoDB from now on.

MongoDB 4.0 provides fully ACID transactions support but remember:

  • multi-document transactions are available for replica set deployments only
    • you can use transactions even on a standalone server but you need to configure it as a replica set (with just one node)
  • multi-document transactions are not available for sharded clusters
    • hopefully, transactions will be available from version 4.2
  • multi-document transactions are available for the WiredTiger storage engine only

ACID transactions in MongoDB 4.0

ACID properties are well known in the world of relational databases, but let’s recap what the acronym means.

  • Atomicity: a group of commands inside the transaction must follow the “all or nothing” paradigm. If only one of the commands fails for any reason, the complete transaction fails as well.
  • Consistency: if a transaction successfully executes, it will take the database from one state that is consistent to another state that is also consistent.
  • Isolation: multiple transactions can run at the same time in the system. Isolation guarantees that each transaction is not able to view partial results of the others. Executing multiple transactions in parallel must have the same results as running them sequentially.
  • Durability: it guarantees that a transaction that has committed will remain persistent, even in the case of a system failure.

Limitations of transactions

The support for transactions introduced some limitations:

  • a collection MUST exist in order to use transactions
  • a collection cannot be created or dropped inside a transaction
  • an index cannot be created or dropped inside a transaction
  • non-CRUD operations are not permitted inside a transaction (for example, administrative commands like createUser are not permitted )
  • a transaction cannot read or write in config, admin, and local databases
  • a transaction cannot write to system.* collections
  • the size of a transaction is limited to 16MB
    • a single oplog entry is generated during the commit: the writes inside the transaction don’t have single oplog entries as in regular queries
    • the limitation is a consequence of the 16MB maximum size of any BSON document in the oplog
    • in case of larger transactions, you should consider splitting these into smaller transactions
  • by default, a transaction that executes for longer than 60 seconds will automatically expire
    • you can change this using the configuration parameter transactionLifetimeLimitSeconds (see the sketch after this list)
    • transactions rely on WiredTiger snapshot capability, and having a long running transaction can result in high pressure on WiredTiger’s cache to maintain snapshots, and lead to the retention of a lot of unflushed operations in memory
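For example, the expiration limit can be raised at runtime with setParameter; the 120-second value below is only an illustration:

// Raise the automatic transaction expiration from the default 60 seconds
// to 120 seconds (the value shown is illustrative).
db.adminCommand({ setParameter: 1, transactionLifetimeLimitSeconds: 120 })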

Sessions

Sessions were introduced in version 3.6 to support retryable writes, for example, but they are very important for transactions, too. In fact, any transaction is associated with an open session: prior to starting a transaction, a session must be created, and a transaction cannot be run outside a session.

At any given time you may have multiple running sessions in the system, but each session may run only a single transaction at a time. You can run transactions in parallel according to how many open sessions you have.

Three new commands were introduced for creating, committing, and aborting transactions:

  • session.startTransaction()
    • starts a new transaction in the current session
  • session.commitTransaction()
    • saves consistently and durably the changes made by the operations in the transaction
  • session.abortTransaction()
    • the transaction ends without saving any of the changes made by the operations in the transaction

Note: in the following examples, we use two different connections to create two sessions. We do this for the sake of simplicity, but remember that you can create multiple sessions even inside a single connection, assigning each session to a different variable.

Our first transaction

To test our first transaction, if you don’t have a replica set already configured, let’s start a standalone server like this:

#> mongod --dbpath /data/db --logpath /data/mongo.log --fork --replSet foo
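The prompts in the following examples show foo:PRIMARY because the single-node replica set has already been initialized. If you are starting from scratch, initialize it once from the mongo shell:

// One-time initialization of the single-node replica set started above.
> rs.initiate()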

Create a new collection, and insert some data.

foo:PRIMARY> use percona
switched to db percona
foo:PRIMARY> db.createCollection('people')
{
   "ok" : 1,
   "operationTime" : Timestamp(1538483120, 1),
   "$clusterTime" : {
      "clusterTime" : Timestamp(1538483120, 1),
      "signature" : {
         "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
         "keyId" : NumberLong(0)
       }
    }
}
foo:PRIMARY> db.people.insert([{_id:1, name:"Corrado"},{_id:2, name:"Peter"},{_id:3,name:"Heidi"}])

Create a session

foo:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("dcfa7de5-527d-4b1c-a890-53c9a355920d") }

Start a transaction and insert some new documents

foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.insert([{_id: 4 , name : "George"},{_id: 5, name: "Tom"}])
WriteResult({ "nInserted" : 2 })

Now read the collection from inside and outside the session and see what happens

foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }

As you might notice, since the transaction is not yet committed, you can see the modifications only from inside the session. You cannot see any of the modifications outside of the session, even in the same connection. If you try to open a new connection to the database, then you will not be able to see any of the modifications either.

Now, commit the transaction and see that you can now read the same data both inside and outside the session, as well as from any other connection.

foo:PRIMARY> session.commitTransaction()
foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

When the transaction is committed, all the data are written consistently and durably in the database, just like any typical write. So, writing to the journal file and to the oplog takes place in the same way as for any single write that’s not inside a transaction. As long as the transaction is open, any modification is stored in memory.

Isolation test

Let’s test now the isolation between two concurrent transactions.

Open the first connection, create a session and start a transaction:

//Connection #1
foo:PRIMARY> var session1 = db.getMongo().startSession()
foo:PRIMARY> session1.startTransaction()

do the same on the second connection:

//Connection #2
foo:PRIMARY> var session2 = db.getMongo().startSession()
foo:PRIMARY> session2.startTransaction()

On connection #1, update Heidi’s document to add the gender field:

//Connection #1
foo:PRIMARY> session1.getDatabase("percona").people.update({_id:3},{$set:{ gender: "F" }})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

Update the same collection on connection #2 to add the same gender field to all the males:

//Connection #2
foo:PRIMARY> session2.getDatabase("percona").people.update({_id:{$in:[1,2,4,5]}},{$set:{ gender: "M" }},{multi:"true"})
WriteResult({ "nMatched" : 4, "nUpserted" : 0, "nModified" : 4 })
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

The two transactions are isolated, each one can see only the ongoing modifications that it has made itself.

Commit the transaction in connection #1:

//Connection #1
foo:PRIMARY> session1.commitTransaction()
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

In connection #2, read the collection:

//Connection #2
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M"  }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M"  }
{ "_id" : 5, "name" : "Tom", "gender" : "M"  }

As you can see the second transaction still sees its own modifications, and cannot see the already committed updates of the other transaction. This kind of isolation works the same as the “REPEATABLE READ” level of MySQL and other relational databases.

Now commit the transaction in connection #2 and see the new values of the collection:

//Connection #2
foo:PRIMARY> session2.commitTransaction()
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

Conflicts

When two (or more) concurrent transactions modify the same documents, we may have a conflict. MongoDB can detect a conflict immediately, even while transactions are not yet committed. The first transaction to acquire the lock on a document will continue, the second one will receive the conflict error message and fail. The failed transaction can then be retried later.

Let’s see an example.

Create a new transaction in connection #1 to update Heidi’s document. We want to change the name to Luise.

//Connection #1
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Luise"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Let’s try to modify the same document in a concurrent transaction in connection #2. Modify the name from Heidi to Marie in this case.

//Connection #2
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Marie"}})
WriteCommandError({
    "errorLabels" : [
       "TransientTransactionError"
    ],
    "operationTime" : Timestamp(1538495683, 1),
    "ok" : 0,
    "errmsg" : "WriteConflict",
    "code" : 112,
    "codeName" : "WriteConflict",
    "$clusterTime" : {
       "clusterTime" : Timestamp(1538495683, 1),
       "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
       }
     }
})

We received an error and the transaction failed. We can retry it later.
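Since the error carries the TransientTransactionError label, the usual pattern is to abort the failed transaction and retry the whole thing in a new one. A minimal sketch on connection #2, assuming the transaction on connection #1 has been committed in the meantime (the document is therefore matched by _id, because its name is no longer Heidi):

//Connection #2
foo:PRIMARY> session.abortTransaction()
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({_id:3},{$set:{name:"Marie"}})
foo:PRIMARY> session.commitTransaction()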

Other details

  • the individual writes inside the transaction are not retry-able even if retryWrites is set to true
  • each commit operation is a retry-able write operation regardless of whether retryWrites is set to true. The drivers retry the commit a single time in case of an error.
  • Read Concern supports the snapshot, local, and majority values
  • Write Concern can be set at the transaction level. The individual operations inside the transaction ignore the write concern; it is evaluated during the commit (see the sketch after this list)
  • Read Preference supports only primary value
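As a sketch, read and write concern can be supplied when the transaction is started; the combination below is just one valid example:

//Set the transaction-level read and write concern when starting the transaction;
//the individual writes inside it inherit these settings.
foo:PRIMARY> session.startTransaction({ readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } })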

Conclusions

Transaction support in MongoDB 4.0 is a very interesting new feature, but it isn’t fully mature yet; there are strong limitations at this stage: a transaction cannot be larger than 16MB, you cannot use it on sharded clusters, and so on. If you absolutely need a transaction in your application, use it. But don’t use transactions only because they are cool, since in some cases a proper data model based on embedding documents in collections and denormalizing your data could be the best solution. MongoDB isn’t by its nature a relational database; as long as you are able to model your data keeping in mind that it’s a NoSQL database, you should avoid using transactions. In specific cases, or if you already have a database with strong “informal relations” between the collections that you cannot change, then you could choose to rely on transactions.

Image modified from original photo: by Annie Spratt on Unsplash

Dec
03
2018
--

Percona Live 2019 Call for Papers is Now Open!

Percona Live CFP 2019

Announcing the opening of the Percona Live 2019 Open Source Database Conference call for papers. It will be open from now until January 20, 2019. The Percona Live Open Source Database Conference 2019 takes place May 28-30 in Austin, Texas.

Our theme this year is CONNECT. ACCELERATE. INNOVATE.

As a speaker at Percona Live, you’ll have the opportunity to CONNECT with your peers—open source database experts and enthusiasts who share your commitment to improving knowledge and exchanging ideas. ACCELERATE your projects and career by presenting at the premier open source database event, a great way to build your personal and company brands. And influence the evolution of the open source software movement by demonstrating how you INNOVATE!

Community initiatives remain core to the open source ethos, and we are proud of the contribution we make with Percona Live in showcasing thought leading practices in the open source database world.

With a nod to innovation, this year we are introducing a business track to benefit those business leaders who are exploring the use of open source and are interested in learning more about its costs and benefits.

Speaking Opportunities

The Percona Live Open Source Database Conference 2019 Call for Papers is open until January 20, 2019. We invite you to submit your speaking proposal for breakout, tutorial or lightning talk sessions. Classes and talks are invited for Foundation (either entry-level or of general interest to all), Core (intermediate), and Masterclass (advanced) levels.

  • Breakout Session. Broadly cover a technology area using specific examples. Sessions should be either 25 minutes or 50 minutes in length (including Q&A).
  • Tutorial Session. Present a technical session that aims for a level between a training class and a conference breakout session. We encourage attendees to bring and use laptops for working on detailed and hands-on presentations. Tutorials will be three or six hours in length (including Q&A).
  • Lightning Talk. Give a five-minute presentation focusing on one key point that interests the open source community: technical, lighthearted or entertaining talks on new ideas, a successful project, a cautionary story, a quick tip or demonstration.

If your proposal is selected for breakout or tutorial sessions, you will receive a complimentary full conference pass.

Topics and Themes

We want proposals that cover the many aspects of application development using all open source databases, as well as new and interesting ways to monitor and manage database environments. Did you just embrace open source databases this year? What are the technical and business values of moving to or using open source databases? How did you convince your company to make the move? Was there tangible ROI?

Best practices and current trends, including design, application development, performance optimization, HA and clustering, cloud, containers and new technologies –  what’s holding your focus? Share your case studies, experiences and technical knowledge with an engaged audience of open source peers.

In the submission entry, indicate which of these themes your proposal best fits: tutorial, business needs; case studies/use cases; operations; or development. Also include which track(s) from the list below would be best suited to your talk.

Tracks

The conference committee is looking for proposals that cover the many aspects of using, deploying and managing open source databases, including:

  • MySQL. Do you have an opinion on what is new and exciting in MySQL? With the release of MySQL 8.0, are you using the latest features? How and why? Are they helping you solve any business issues, or making deployment of applications and websites easier, faster or more efficient? Did the new release influence you to change to MySQL? What do you see as the biggest impact of the MySQL 8.0 release? Do you use MySQL in conjunction with other databases in your environment?
  • MariaDB. Talks highlighting MariaDB and MariaDB compatible databases and related tools. Discuss the latest features, how to optimize performance, and demonstrate the best practices you’ve adopted from real production use cases and applications.
  • PostgreSQL. Why do you use PostgreSQL as opposed to other SQL options? Have you done a comparison or benchmark of PostgreSQL vs. other types of databases related to your applications? Why, and what were the results? How does PostgreSQL help you with application performance or deployment? How do you use PostgreSQL in conjunction with other databases in your environment?
  • MongoDB. Has the 4.0 release improved your experience in application development or time-to-market? How are the new features making your database environment better? What is it about MongoDB 4.0 that excites you? What are your experiences with Atlas? Have you moved to it, and has it lived up to its promises? Do you use MongoDB in conjunction with other databases in your environment?
  • Polyglot Persistence. How are you using multiple open source databases together? What tools and technologies are helping you to get them interacting efficiently? In what ways are multiple databases working together helping to solve critical business issues? What are the best practices you’ve discovered in your production environments?
  • Observability and Monitoring. How are you designing your database-powered applications for observability? What monitoring tools and methods are providing you with the best application and database insights for running your business? How are you using tools to troubleshoot issues and bottlenecks? How are you observing your production environment in order to understand the critical aspects of your deployments? 
  • Kubernetes. How are you running open source databases on the Kubernetes, OpenShift and other container platforms? What software are you using to facilitate their use? What best practices and processes are making containers a vital part of your business strategy? 
  • Automation and AI. How are you using automation to run databases at scale? Are you using automation to create self-running, self-healing, and self-tuning databases? Is machine learning and artificial intelligence (AI) helping you create a new generation of database automation?
  • Migration to Open Source Databases. How are you migrating to open source databases? Are you migrating on-premises or to the cloud? What are the tools and strategies you’ve used that have been successful, and what have you learned during and after the migration? Do you have real-world migration stories that illustrate how best to migrate?
  • Database Security and Compliance. All of us have experienced security and compliance challenges. From new legislation like GDPR, PCI and HIPAA, exploited software bugs, or new threats such as ransomware attacks, when is enough “enough”? What are your best practices for preventing incursions? How do you maintain compliance as you move to the cloud? Are you finding that security and compliance requirements are preventing your ability to be agile?
  • Other Open Source Databases. There are many, many great open source database software and solutions we can learn about. Submit other open source database talk ideas – we welcome talks for both established database technologies as well as the emerging new ones that no one has yet heard about (but should).
  • Business and Enterprise. Has your company seen big improvements in ROI from using Open Source Databases? Are there efficiency levels or interesting case studies you want to share? How did you convince your company to move to Open Source?

How to Respond to the Call for Papers

For information on how to submit your proposal, visit our call for papers page.

Sponsorship

If you would like to obtain a sponsor pack for Percona Live Open Source Database Conference 2019, you will find more information including a prospectus on our sponsorship page. You are welcome to contact me, Bronwyn Campbell, directly.

Nov
27
2018
--

Setup Compatible OpenLDAP Server for MongoDB and MySQL

Set up LDAP authentication for MySQL and MongoDB

By the end of this article, you should be able to have a Percona Server for MongoDB and a Percona Server for MySQL instance able to authenticate against an OpenLDAP backend. While this is mostly aimed at testing scenarios, it can easily be extended for production by following OpenLDAP production best practices, i.e. attending to security and high availability.

The first step is to install OpenLDAP via the slapd package in Ubuntu.

sudo apt update
sudo apt install slapd ldap-utils

During installation, it will ask you for a few things listed below:

  • DNS Domain Name: ldap.local
  • Organization Name: Percona
  • Administrator password: percona

All these values are arbitrary, you can choose whatever suits your organization—especially the password.

Once slapd is running, we can create our logical groups and actual users on the LDAP server. To make it simple, we use LDIF files instead of GUIs. Our first file, perconadba.ldif, contains our perconadba group definition. Take note of the root name part dc=ldap,dc=local: it is simply the broken-down value of our DNS Domain Name from the installation of slapd.

dn: ou=perconadba,dc=ldap,dc=local
objectClass: organizationalUnit
ou: perconadba

We can add this definition into LDAP with the command shown below. With the -W option, it will prompt you for a password.

ldapadd -x -W -D "cn=admin,dc=ldap,dc=local" -f perconadba.ldif

The next step is to create our user in LDAP. This user will be looked up by both MongoDB and MySQL during authentication to verify their password. Our LDIF file (percona.ldif) would look like this:

dn: uid=percona,ou=perconadba,dc=ldap,dc=local
objectClass: top
objectClass: account
objectClass: posixAccount
objectClass: shadowAccount
cn: percona
uid: percona
uidNumber: 1100
gidNumber: 100
homeDirectory: /home/percona
loginShell: /bin/bash
gecos: percona
userPassword: {crypt}x
shadowLastChange: -1
shadowMax: -1
shadowWarning: -1

The -1 values for the shadow* fields are important: we set them to negative to mean the password shadow does not expire. If these are set to zero (0), then MySQL will not be able to authenticate, since PAM will complain that the password has expired and needs to be changed.

We can then add this user into LDAP, again the command below will ask for the admin password we entered during slapd’s installation.

ldapadd -x -W -D "cn=admin,dc=ldap,dc=local" -f percona.ldif

To verify, we can search for the user we just entered using the command below. Notice we used the -w parameter to specify the admin password inline.

ldapsearch -x -D 'cn=admin,dc=ldap,dc=local' -w percona \
	-b 'ou=perconadba,dc=ldap,dc=local' '(uid=percona)'

The last step in setting up our LDAP user properly is to give it a valid password. The -s parameter below is the actual password we will set for this user.

ldappasswd -s percona -D "cn=admin,dc=ldap,dc=local" -w percona \
	-x "uid=percona,ou=perconadba,dc=ldap,dc=local"

At this point you should have a generic LDAP server that should work for both MongoDB and MySQL.

PAM Configuration for MySQL

To make this work for MySQL and support PAM authentication, take note of the following configuration files. Instructions on setting up PAM for MySQL are plentiful on this blog, so I just need to point out the Ubuntu Bionic specific configuration files that make it work.

/etc/nslcd.conf

The only important difference with this configuration—compared to Jaime’s post, for example—is the values for filter. If you are using Windows Active Directory, the map values are also important (the posixAccount objectClass has been deprecated in recent releases of Windows Active Directory).

uid nslcd
gid nslcd
uri ldap:///localhost
base ou=perconadba,dc=ldap,dc=local
filter passwd (&(objectClass=account)(objectClass=posixAccount))
filter group (&(objectClass=shadowAccount)(objectClass=account))
map    passwd uid           uid
map    passwd uidNumber     uidNumber
map    passwd gidNumber     gidNumber
map    passwd homeDirectory "/home/$uid"
map    passwd gecos         uid
map    passwd loginShell    "/bin/bash"
map    group gidNumber      gidNumber
binddn cn=admin,dc=ldap,dc=local
bindpw percona
tls_cacertfile /etc/ssl/certs/ca-certificates.crt

/etc/nsswitch.conf

Also, in nsswitch.conf, make sure that passwd, group, and shadow do LDAP lookups.

...
passwd:         compat systemd ldap
group:          compat systemd ldap
shadow:         compat systemd ldap
gshadow:        files ldap
...

SASL for MongoDB

Adamo’s excellent post on MongoDB LDAP Authentication has all the details on configuring MongoDB itself. To complement that, if you use this LDAP test setup, you need to take note of the following configuration files and their specific differences.

/etc/mongod.conf

In the mongod.conf configuration file, I explicitly added the saslauthd socket path.

security:
  authorization: enabled
setParameter:
  saslauthdPath: /var/run/saslauthd/mux
  authenticationMechanisms: PLAIN,SCRAM-SHA-1

/etc/saslauthd.conf

For the saslauthd daemon configuration, there is no real difference – just take note that I used differing values based on the LDAP setup above. Specifically, ldap_filter and ldap_search_base are the key options here; they are concatenated during an LDAP search to come up with the percona user’s account information.

ldap_servers: ldap://localhost:389/
ldap_search_base: ou=perconadba,dc=ldap,dc=local
ldap_filter: (uid=%u)
# Optional: specify a user to perform ldap queries
ldap_bind_dn: CN=admin,DC=ldap,DC=local
# Optional: specify the ldap user’s password
ldap_password: percona
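With saslauthd answering for the PLAIN mechanism, the MongoDB side only needs a user defined on the $external database whose name matches the LDAP uid. A minimal sketch from the mongo shell, where the role granted is just an example:

// Map the LDAP "percona" account to a MongoDB user; authentication is
// delegated to saslauthd/LDAP, so no password is stored in MongoDB.
db.getSiblingDB("$external").createUser({
    user: "percona",
    roles: [ { role: "readWrite", db: "percona" } ]
})

You can then verify the setup by logging in with mongo -u percona -p percona --authenticationDatabase '$external' --authenticationMechanism PLAIN.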

Enterprise quality features should not be complex and expensive. Tell us about your experience with our software and external authentication in the comments below!

Nov
26
2018
--

Upcoming Webinar Thurs 11/29: Improve MongoDB Performance with Proper Queries

Improve MongoDB Performance with Proper Queries

Please join Percona’s Sr. Technical Operations Architect, Tim Vaillancourt, as he presents Improve MongoDB Performance with Proper Queries on Thursday, November 29th, 2018, at 12:30 PM PST (UTC-8) / 3:30 PM EST (UTC-5).

Register Now

There are many different ways you can use queries in MongoDB to find the data you need. However, knowing which queries are slowing down your performance can be a challenge. In this webinar we’ll discuss the following:

  • Performing ad hoc queries on the database using the find or findOne functions and a query document.
  • How to query for ranges, set inclusion, inequalities, and more using $-conditionals.
  • How to use and sort queries that return a database cursor, which lazily returns batches of documents as you need them.
  • What pitfalls you can encounter when performing the many available meta operations on a cursor, including skipping a certain number of results, and limiting the number of results returned.

By the end of this webinar you will have a better understanding of which queries impact performance. Moreover, you’ll understand how to leverage open source tools to monitor queries.

Register for this webinar to learn how to improve MongoDB performance with the proper queries.

Nov
23
2018
--

Percona Server for MongoDB 3.4.18-2.16 Is Now Available

Percona Server for MongoDB

Percona announces the release of Percona Server for MongoDB 3.4.18-2.16 on November 23, 2018. Download the latest version from the Percona website or the Percona Software Repositories.

Percona Server for MongoDB 3.4 is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.4 Community Edition. It supports MongoDB 3.4 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine and MongoRocks storage engines, as well as several enterprise-grade features.

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.18. The following improvements have been made on top of the upstream fixes.

Improvements

  • #247: Now, AuditLog formats are listed in the output of mongod --help and mongos --help commands

Other improvements:

  • #238: audit_drop_database.js test occasionally fails with MMAPv1

 

Nov
21
2018
--

Identifying Unused Indexes in MongoDB

mongodb index usage stats PMM visualization

As with MySQL, having too many indexes on a MongoDB collection not only affects overall write performance, but disk and memory resources as well. While MongoDB scales both reads and writes predictably well, maintaining a healthy schema design should always remain a core characteristic of a good application stack.

Aside from knowing when to add an index to improve query performance, and how to modify indexes to satisfy changing query complexities, we also need to know how to identify unused indexes and cut their unnecessary overhead.

First of all, you can already identify access operation counters for each collection using the $indexStats aggregation stage (the indexStats command before 3.0). This provides two important pieces of information: the ops counter value, and since, which is when the ops counter first iterated to one. It is reset when the mongod instance is restarted.

m34:PRIMARY> db.downloads.aggregate( [ { $indexStats: { } } ] ).pretty()
{
	"name" : "_id_",
	"key" : {
		"_id" : 1
	},
	"host" : "mongodb:27018",
	"accesses" : {
		"ops" : NumberLong(0),
		"since" : ISODate("2018-11-10T15:53:31.429Z")
	}
}
{
	"name" : "h_id_1",
	"key" : {
		"h_id" : 1
	},
	"host" : "mongodb:27018",
	"accesses" : {
		"ops" : NumberLong(0),
		"since" : ISODate("2018-11-10T15:54:57.634Z")
	}
}

From this information, if the ops counter is zero for any index, then we can assume it has not been used either since the index was added or since the server was restarted, with a few exceptions. An index might be unique and not used at all (a uniqueness check on INSERT does not increment the ops counter). The documentation also indicates that the index stats counter does not get updated by TTL index expiration or by chunk split and migration operations.

Be aware of occasional index use

One golden rule, however, is that this type of observation is subjective – before you decide to drop an index, make sure that the counter has been collecting for a considerable amount of time. Dropping an index that is only used once a month for some heavy reporting can be problematic.
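Once you are satisfied that an index really is unused across a full application cycle, it can be dropped by name; using the collection and index from the $indexStats output above:

// Drop the apparently unused index identified above.
m34:PRIMARY> db.downloads.dropIndex("h_id_1")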

The same information from $indexStats can also be made available to PMM. By default, the mongodb_exporter does not include this information, but it can be enabled as an additional collection parameter.

sudo pmm-admin add mongodb:metrics --uri 127.0.0.1:27018 -- -collect.indexusage

Once enabled, we can create a custom graph for this information from any PMM dashboard, as shown below. As mentioned above, any index that has a zero value has not been used during the time range shown in the graph. One minor issue with the collector is that the metrics do not come with the database and collection information, so we cannot yet filter down to the collection level; we have an improvement request open for that.

MongoDB index usage dashboard report from percona monitoring and management

An alternative view of this information in Grafana/PMM is available from the Time Series to Aggregation table panel, shown below. One advantage of having these metrics in PMM is that the data survives an instance restart. Of course, to be useful for identifying unused indexes, the retention period has to match or exceed your complete application “cycle” period.

In a MongoDB replicaSet, you can delegate data-bearing member nodes to different roles, perhaps with tags and priorities, and you can also have nodes with different sets of indexes. Being able to identify the sets of indexes needed at the node level allows you to optimize replication, queries, and resource usage.

More Resources

We have an introductory series of posts on MongoDB indexes available on this blog. Read Part 1 here.

You can download Percona Server for MongoDB – all Percona software is open source and free.
