Jun
03
2022
--

Migration of a MongoDB Replica Set to a Sharded Cluster

In this blog post, we will discuss how we can migrate from a replica set to a sharded cluster.

Before moving to the migration, let me briefly explain replication and sharding, and why we might need to shard a replica set.

Replication: Replication creates additional copies of the data and allows automatic failover to another node in case the primary goes down. It also helps to scale reads if the application is fine with reading data that may not be the latest.

Sharding: Sharding allows horizontal scaling of data writes by partitioning data across multiple servers using a shard key. Here, we should understand that choosing a good shard key is very important for distributing the data evenly across multiple servers.

Why Do We Need a Sharded Cluster?

We need sharding for the following reasons:

  1. By adding shards, we can reduce the number of operations each shard manages. 
  2. It increases the Read/Write capacity by distributing the Reads/Writes across multiple servers. 
  3. It also gives high availability as we deploy the replicas for the shards, config servers, and multiple MongoS.

A sharded cluster includes two more components: config servers and query routers, i.e., MongoS.

Config Servers: Config servers keep the metadata for the sharded cluster. The metadata comprises the list of chunks on each shard and the ranges that define the chunks; it indicates the state of all the data and its components within the cluster.

Query Routers (MongoS): MongoS caches the metadata and uses it to route read and write operations to the respective shards. It also updates the cache whenever there are metadata changes in the sharded cluster, such as chunk splits or shard additions.

Note: Before starting the migration process it’s recommended that you perform a full backup (if you don’t have one already).

The Procedure of Migration:

  1. Initiate at least a three-member replica set for the config servers (another member can be included as a hidden node for backup purposes).
  2. Perform necessary OS, H/W, and disk-level tuning as per the existing Replica set.
  3. Set up the appropriate clusterRole for the config servers in the mongod config file.
  4. Create at least two more nodes for the query routers (MongoS).
  5. Set appropriate configDB parameters in the mongos config file.
  6. Repeat step 2 from above to tune as per the existing replica set.
  7. Apply proper SELinux policies on all the newly configured nodes of Config server and MongoS.
  8. Add the clusterRole parameter to the existing replica set nodes in a rolling fashion.
  9. Copy all the users from the replica set to any MongoS.
  10. Connect to any MongoS and add the existing replica set as a shard.

Note: Do not enable sharding on any database until the shard key is finalized; once it is, you can enable sharding.

Detailed Migration Plan:

Here, we assume that the replica set has three nodes (one primary and two secondaries).

  1. Create three servers to initiate a three-member replica set for the config servers. Perform the necessary OS, H/W, and disk-level tuning; to know more about it, please visit our blog on Tuning Linux for MongoDB.
  2. Install the same version of Percona Server for MongoDB as the existing replica set from here.
  3. In the config file of the config server mongod, add the parameters clusterRole: configsvr and port: 27019 to start it as a config server on port 27019.
  4. If the SELinux policy is enabled, set the necessary SELinux policy for dbPath, keyFile, and logs as below.
sudo semanage fcontext -a -t mongod_var_lib_t '/dbPath/mongod.*'

sudo chcon -Rv -u system_u -t mongod_var_lib_t '/dbPath/mongod'

sudo restorecon -R -v '/dbPath/mongod'

sudo semanage fcontext -a -t mongod_log_t '/logPath/log.*'

sudo chcon -Rv -u system_u -t mongod_log_t '/logPath/log'

sudo restorecon -R -v '/logPath/log'

sudo semanage port -a -t mongod_port_t -p tcp 27019
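For reference, a minimal config server mongod.conf covering step 3 might look like the following sketch (the replica set name, keyFile path, and directory paths are illustrative placeholders, not values from this setup):

# Hypothetical config server mongod.conf sketch
sharding:
  clusterRole: configsvr          # start this mongod as a config server
replication:
  replSetName: cfgRepl            # placeholder config replica set name
net:
  port: 27019                     # config servers conventionally run on 27019
security:
  keyFile: /etc/mongod/keyfile    # same keyFile as the rest of the cluster
storage:
  dbPath: /dbPath/mongod
systemLog:
  destination: file
  path: /logPath/log/mongod.log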

Start all the config server mongod instances and connect to one of them. Initiate the replica set and create a temporary user on it.

> use admin

> rs.initiate()

> db.createUser( { user: "tempUser", pwd: "<password>", roles:[{role: "root" , db:"admin"}]})

Also create a role granting the anyAction action on anyResource, and assign it to “tempUser”.

>db.getSiblingDB("admin").createRole({ "role": "pbmAnyAction",

      "privileges": [

         { "resource": { "anyResource": true },

           "actions": [ "anyAction" ]

         }

      ],

      "roles": []

   });

>db.grantRolesToUser( "tempUser", [{role: "pbmAnyAction", db: "admin"}]  )

> rs.add("config_host[2-3]:27019")

Now that our config server replica set is ready, let’s move on to deploying the query routers, i.e., MongoS.

  5. Create two instances for the MongoS and tune the OS, H/W, and disk. To do so, follow our blog Tuning Linux for MongoDB or step 1 of this detailed plan.
  6. In the mongos config file, adjust the configDB parameter and include only the non-hidden nodes of the config servers (in this blog post, we have not covered starting hidden config servers). See the sample mongos config sketch below.
  7. If SELinux is enabled, apply the SELinux policies by following step 4; keep the same keyFile and start the MongoS on port 27017.
  8. Add the below parameter to mongod.conf on the replica set nodes. Make sure the services are restarted in a rolling fashion, i.e., start with the secondaries, then step down the existing primary and restart it on port 27018.
clusterRole: shardsvr
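Going back to step 6, a minimal mongos config file might look like the following sketch (the hostnames and the config replica set name are placeholders; list only the non-hidden config server nodes):

# Hypothetical mongos config sketch
sharding:
  configDB: cfgRepl/config_host1:27019,config_host2:27019,config_host3:27019
net:
  port: 27017                     # MongoS listens on the default port
security:
  keyFile: /etc/mongod/keyfile    # same keyFile as the shard and config nodes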

Log in to any MongoS, authenticate as “tempUser”, and add the existing replica set as a shard.

> sh.addShard( "replicaSetName/<URI of the replica set>") //Provide URI of the replica set

Verify it with:

> sh.status() or db.getSiblingDB("config")['shards'].find()

Connect to the primary of the replica set and copy all the users and roles. Authenticate/authorize with the replica set user.

> var mongos = new Mongo("mongodb://put MongoS URI string here/admin?authSource=admin") //Provide the URI of the MongoS with tempUser for authentication/authorization.

>db.getSiblingDB("admin").system.roles.find().forEach(function(d) {

mongos.getDB('admin').getCollection('system.roles').insert(d)});

>db.getSiblingDB("admin").system.users.find().forEach(function(d) { mongos.getDB('admin').getCollection('system.users').insert(d)});

  9.  Connect to any MongoS and verify the copied users on it.
  10.  Shard the database if the shard key is finalized (shard key selection is out of scope here, as this post covers only the migration of a replica set to a sharded cluster).

Shard the database:

>sh.enableSharding("<db>")

Shard the collection with hash-based shard key:

>sh.shardCollection("<db>.<coll1>", { <shard key field> : "hashed" } )

Shard the collection with range based shard key:

>sh.shardCollection("<db>.<coll1>", { <shard key field> : 1, ... } )
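For example, with a hypothetical percona.users collection keyed on userId, the two steps would look like:

>sh.enableSharding("percona")

>sh.shardCollection("percona.users", { userId: "hashed" })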

Conclusion

Migrating a MongoDB replica set to a sharded cluster lets you scale horizontally and increase read/write capacity, while reducing the number of operations each shard manages.

We encourage you to try our products like Percona Server for MongoDB, Percona Backup for MongoDB, or Percona Operator for MongoDB. You can also visit our site to learn “Why MongoDB Runs Better with Percona”.

Oct
19
2021
--

How to Build Percona Server for MongoDB for Various Operating Systems

Following the series of blog posts started by Evgeniy Patlan, we’ll show you how to build Percona Server for MongoDB for various operating systems¹ using Docker on your local Linux machine/build server. In this case, we’ll build packages of Percona Server for MongoDB version 4.4.9-10 for CentOS 8 and Debian 11 (bullseye).

This can be useful when you need to test your changes to the code on different RPM/DEB-based platforms and make sure everything works as expected in different environments. In our case, this approach is used for building Percona Server for MongoDB packages/binary tarballs for all supported operating systems.

Prepare Build Environment

  • Make sure that you have at least 60GB of free disk space
  • Create a “build folder” – the folder where all the build actions will be performed, in our case “/mnt/psmdb-44/test”
  • Make sure that you have installed the package which provides Docker, and that the Docker service is up and running

Obtain Build Script of Needed Version²

You need to download the build script of the needed version to the “/mnt/psmdb-44” folder:

cd /mnt/psmdb-44/
wget https://raw.githubusercontent.com/percona/percona-server-mongodb/psmdb-4.4.9-10/percona-packaging/scripts/psmdb_builder.sh -O psmdb_builder.sh

Create Percona Server for MongoDB Source Tarball

  • Please note that for the creation of the source tarball, we use the oldest supported OS, in this case CentOS 7.
docker run -ti -u root -v /mnt/psmdb-44:/mnt/psmdb-44 centos:7 sh -c '
set -o xtrace
cd /mnt/psmdb-44
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --install_deps=1
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --repo=https://github.com/percona/percona-server-mongodb.git \
--branch=release-4.4.9-10 --psm_ver=4.4.9 --psm_release=10 --mongo_tools_tag=100.4.1 --jemalloc_tag=psmdb-3.2.11-3.1 --get_sources=1
'

  • Check that source tarball has been created:
$ ls -la /mnt/psmdb-44/source_tarball/
total 88292
drwxr-xr-x. 2 root root     4096 Oct  1 10:58 .
drwxr-xr-x. 5 root root     4096 Oct  1 10:58 ..
-rw-r--r--. 1 root root 90398894 Oct  1 10:58 percona-server-mongodb-4.4.9-10.tar.gz

Build Percona Server for MongoDB Generic Source RPM/DEB:

Please note that for building the generic source RPM/DEB, we still use the oldest supported RPM/DEB-based OS, in this case CentOS 7 / Ubuntu Xenial (16.04).

  • Build source RPM:
docker run -ti -u root -v /mnt/psmdb-44:/mnt/psmdb-44 centos:7 sh -c '
set -o xtrace
cd /mnt/psmdb-44
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --install_deps=1
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --repo=https://github.com/percona/percona-server-mongodb.git \
--branch=release-4.4.9-10 --psm_ver=4.4.9 --psm_release=10 --mongo_tools_tag=100.4.1 --jemalloc_tag=psmdb-3.2.11-3.1 --build_src_rpm=1
'

  • Build source DEB:
docker run -ti -u root -v /mnt/psmdb-44:/mnt/psmdb-44 ubuntu:xenial sh -c '
set -o xtrace
cd /mnt/psmdb-44
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --install_deps=1
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --repo=https://github.com/percona/percona-server-mongodb.git \
--branch=release-4.4.9-10 --psm_ver=4.4.9 --psm_release=10 --mongo_tools_tag=100.4.1 --jemalloc_tag=psmdb-3.2.11-3.1 --build_src_deb=1
'

  • Check that both the SRPM and source DEB have been created:
$ ls -la /mnt/psmdb-44/srpm/
total 87480
drwxr-xr-x. 2 root root     4096 Oct  1 11:35 .
drwxr-xr-x. 6 root root     4096 Oct  1 11:35 ..
-rw-r--r--. 1 root root 89570312 Oct  1 11:35 percona-server-mongodb-4.4.9-10.generic.src.rpm

$ ls -la /mnt/psmdb-44/source_deb/
total 88312
drwxr-xr-x. 2 root root     4096 Oct  1 11:45 .
drwxr-xr-x. 7 root root     4096 Oct  1 11:45 ..
-rw-r--r--. 1 root root    10724 Oct  1 11:45 percona-server-mongodb_4.4.9-10.debian.tar.xz
-rw-r--r--. 1 root root     1528 Oct  1 11:45 percona-server-mongodb_4.4.9-10.dsc
-rw-r--r--. 1 root root     2075 Oct  1 11:45 percona-server-mongodb_4.4.9-10_source.changes
-rw-r--r--. 1 root root 90398894 Oct  1 11:45 percona-server-mongodb_4.4.9.orig.tar.gz

Build Percona Server for MongoDB RPMs/DEBs:

  • Build RPMs:
docker run -ti -u root -v /mnt/psmdb-44:/mnt/psmdb-44 centos:8 sh -c '
set -o xtrace
cd /mnt/psmdb-44
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --install_deps=1
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --repo=https://github.com/percona/percona-server-mongodb.git \
--branch=release-4.4.9-10 --psm_ver=4.4.9 --psm_release=10 --mongo_tools_tag=100.4.1 --jemalloc_tag=psmdb-3.2.11-3.1 --build_rpm=1
'

  • Build DEBs:
docker run -ti -u root -v /mnt/psmdb-44:/mnt/psmdb-44 debian:bullseye sh -c '
set -o xtrace
cd /mnt/psmdb-44
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --install_deps=1
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --repo=https://github.com/percona/percona-server-mongodb.git \
--branch=release-4.4.9-10 --psm_ver=4.4.9 --psm_release=10 --mongo_tools_tag=100.4.1 --jemalloc_tag=psmdb-3.2.11-3.1 --build_deb=1
'

  • Check that the RPMs for CentOS 8 and DEBs for Debian 11 have been created:
$  ls -la /mnt/psmdb-44/rpm/
total 1538692
drwxr-xr-x. 2 root root      4096 Oct  1 13:19 .
drwxr-xr-x. 9 root root      4096 Oct  1 13:19 ..
-rw-r--r--. 1 root root      8380 Oct  1 13:19 percona-server-mongodb-4.4.9-10.el8.x86_64.rpm
-rw-r--r--. 1 root root  19603132 Oct  1 13:19 percona-server-mongodb-debugsource-4.4.9-10.el8.x86_64.rpm
-rw-r--r--. 1 root root  16199100 Oct  1 13:19 percona-server-mongodb-mongos-4.4.9-10.el8.x86_64.rpm
-rw-r--r--. 1 root root 382301668 Oct  1 13:19 percona-server-mongodb-mongos-debuginfo-4.4.9-10.el8.x86_64.rpm
-rw-r--r--. 1 root root  37794568 Oct  1 13:19 percona-server-mongodb-server-4.4.9-10.el8.x86_64.rpm
-rw-r--r--. 1 root root 829718252 Oct  1 13:19 percona-server-mongodb-server-debuginfo-4.4.9-10.el8.x86_64.rpm
-rw-r--r--. 1 root root  13310328 Oct  1 13:19 percona-server-mongodb-shell-4.4.9-10.el8.x86_64.rpm
-rw-r--r--. 1 root root 218625728 Oct  1 13:19 percona-server-mongodb-shell-debuginfo-4.4.9-10.el8.x86_64.rpm
-rw-r--r--. 1 root root  30823056 Oct  1 13:19 percona-server-mongodb-tools-4.4.9-10.el8.x86_64.rpm
-rw-r--r--. 1 root root  27196024 Oct  1 13:19 percona-server-mongodb-tools-debuginfo-4.4.9-10.el8.x86_64.rpm

$  ls -la /mnt/psmdb-44/deb/
total 2335288
drwxr-xr-x. 2 root root       4096 Oct  1 13:16 .
drwxr-xr-x. 9 root root       4096 Oct  1 13:16 ..
-rw-r--r--. 1 root root 2301998432 Oct  1 13:16 percona-server-mongodb-dbg_4.4.9-10.bullseye_amd64.deb
-rw-r--r--. 1 root root   14872728 Oct  1 13:16 percona-server-mongodb-mongos_4.4.9-10.bullseye_amd64.deb
-rw-r--r--. 1 root root   35356944 Oct  1 13:16 percona-server-mongodb-server_4.4.9-10.bullseye_amd64.deb
-rw-r--r--. 1 root root   12274928 Oct  1 13:16 percona-server-mongodb-shell_4.4.9-10.bullseye_amd64.deb
-rw-r--r--. 1 root root   26784020 Oct  1 13:16 percona-server-mongodb-tools_4.4.9-10.bullseye_amd64.deb
-rw-r--r--. 1 root root      18548 Oct  4 13:16 percona-server-mongodb_4.4.9-10.bullseye_amd64.deb

Now, the packages are ready to be installed for testing/working on CentOS 8 and Debian 11.

As you can see from the above, the process of building packages for various operating systems is quite easy and doesn’t require lots of physical/virtual machines. All you need is the build script and Docker.

Also, as you may have noticed, all the build commands are similar to each other except for the last argument, which defines the action to be performed. Such an approach allows us to unify the build process and script it, so that the last argument can be passed as a parameter to the script. Naturally, all the other arguments can and should also be passed as parameters if you are going to automate the build process.
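For instance, a thin wrapper could accept the OS image and the build action as parameters (a hypothetical sketch, assuming the build folder and psmdb_builder.sh are already in place as shown above):

#!/bin/bash
# build_psmdb.sh -- hypothetical wrapper; usage: ./build_psmdb.sh centos:8 --build_rpm=1
OS_IMAGE=$1        # e.g. centos:8, ubuntu:xenial, debian:bullseye
BUILD_ACTION=$2    # e.g. --get_sources=1, --build_src_rpm=1, --build_rpm=1, --build_deb=1

docker run -ti -u root -v /mnt/psmdb-44:/mnt/psmdb-44 "${OS_IMAGE}" sh -c "
set -o xtrace
cd /mnt/psmdb-44
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --install_deps=1
bash -x ./psmdb_builder.sh --builddir=/mnt/psmdb-44/test --repo=https://github.com/percona/percona-server-mongodb.git \
--branch=release-4.4.9-10 --psm_ver=4.4.9 --psm_release=10 --mongo_tools_tag=100.4.1 --jemalloc_tag=psmdb-3.2.11-3.1 ${BUILD_ACTION}
"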

¹ Supported operating systems (version psmdb-4.4.9-10):

  • CentOS 7
  • CentOS 8
  • Ubuntu Xenial (16.04)
  • Ubuntu Bionic (18.04)
  • Ubuntu Focal (20.04)
  • Debian Stretch (9)
  • Debian Buster (10)
  • Debian Bullseye (11)

² In order to build another version of Percona Server for MongoDB, you need to use the build script of the corresponding version. For example, to build Percona Server for MongoDB version 4.2.7-7:
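Assuming the same repository layout, the script for that version would be downloaded from the corresponding tag, e.g.:

cd /mnt/psmdb-42/
wget https://raw.githubusercontent.com/percona/percona-server-mongodb/psmdb-4.2.7-7/percona-packaging/scripts/psmdb_builder.sh -O psmdb_builder.sh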


Aug
17
2021
--

Percona Server for MongoDB 5.0.2 Release Candidate Is Now Available

We’re happy to announce the first release candidate of Percona Server for MongoDB version 5.0.2 (PSMDB). It is now available for download from the Percona website and via the Percona Software Repositories.

Percona Server for MongoDB 5.0.2 is an enhanced, source-available, and highly scalable document-oriented database that is a fully compatible drop-in replacement for MongoDB 5.0.2 Community Edition. It includes all the features of MongoDB 5.0.2 Community Edition, as well as some additional enterprise-grade features.

The most notable features in version 5.0 include the following:

  • Resharding allows you to select a new shard key for a collection and then works in the background to correct any data distribution problems caused by bad shard keys and improve performance.
  • Time Series Collections are aimed at storing sequences of measurements over a period of time. These specialized collections will store data in a highly optimized way that will improve query efficiency, allow data analysis in real-time, and optimize disk usage.
  • Resumable Index Builds means that the index build for a collection continues if a primary node in a replica set is switched to another server or when a server restarts. The build process is saved to disk and resumes from the saved position. This allows DBAs to perform maintenance and not worry about losing the index build in the process.
  • Window operators allow operations on a specified span of documents known as a window. $setWindowFields is a new pipeline stage to operate with these documents.
  • Versioned API allows you to specify which API version your application runs against when communicating with MongoDB. The Versioned API detaches the application’s lifecycle from that of the database. As a result, you modify the application only to introduce new features, instead of having to maintain compatibility with each new version of MongoDB.

Additionally, new aggregation operators such as $count, $dateAdd, $dateDiff, $dateSubtract, $sampleRate, and $rand are available with this release.
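As a quick taste of the new operators, here is a sketch (collection and field names are hypothetical) combining $setWindowFields with $dateDiff to compute a per-customer running total and the age of each order in days:

db.orders.aggregate([
  { $setWindowFields: {
      partitionBy: "$customerId",
      sortBy: { orderDate: 1 },
      output: {
        // running total of order amounts within each customer partition
        runningTotal: { $sum: "$amount",
                        window: { documents: [ "unbounded", "current" ] } }
      }
  } },
  { $project: {
      amount: 1,
      runningTotal: 1,
      // days elapsed between the order date and now
      ageInDays: { $dateDiff: { startDate: "$orderDate",
                                endDate: "$$NOW",
                                unit: "day" } }
  } }
])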

Note: As with every major release, version 5.0 comes with a significant number of new features and is still being rapidly updated. At this point, we’re making this version available as a “Release Candidate” only and we strongly suggest not to use it for production environments yet. However, we do encourage the use of this version in test and development environments.

We’re also still in the process of integrating support for version 5.0 into our other products. While Percona Backup for MongoDB 1.6.0 has just been released to support this version, some other products still need to be updated and tested.

For example, the Percona Distribution for MongoDB Operator will have PSMDB 5.0 support from version 1.10.0, which is slated to happen in mid-September.

On the Percona Monitoring and Management side, Percona Server for MongoDB 5.0 support is scheduled to be included in version 2.22.0 (currently targeting the end of September).

Because of these factors, we will not release version 5.0 of our Percona Distribution for MongoDB until we’ve updated these products and have gathered enough confidence to remove the “release candidate” label.

Jul
05
2021
--

MongoDB 5.0 Is Coming in Hot! What Do Database Experts Across the Community Think?

If you love using MongoDB databases, you’ll want to tune in to the live-stream event ‘Percona and Friends React to MongoDB.live’ at 11:00 AM EDT on July 15.

Watch or listen as industry experts from Percona, Southbank Software, and Qarbine respond to MongoDB’s conference announcements. The team will consider:

  • New features and other announcements
  • The importance of new MongoDB 5.0 features for applications
  • What this might mean for the Community Edition
  • The impact MongoDB 5.0 will have on users and the Community

This is a live event. So please bring your questions or concerns, and raise your voice to give your thoughts on the latest product news.

Or, if you’re feeling shy, you could just listen in!

Register Today

Our Community-based panel has a wide variety of expertise and experience.

Akira Kurogane

MongoDB Product Owner for Percona’s Enterprise MongoDB product additions and tools

Akira is an expert in MongoDB symptom-to-code defect analysis, diagnostics, and performance. He has helped countless distributed database clients overcome obstacles and adjust to the changing landscape. Since getting his start as a search engine and RDBMS-based developer, Akira describes himself as, “All MongoDB, all the time.”

Kimberly Wilkins

MongoDB Technical Lead with 20+ years of experience managing and architecting databases

Kimberly has been a DBA, a Principal Engineer, an architect, and has built out and managed expert database teams across multiple data store offerings over her database years. She has worked with MongoDB customers of all sizes in many industries and helped them architect, deploy, troubleshoot, and tune their databases to handle heavy workloads and keep their applications running. She specializes in MongoDB sharding to help customers scale and thrive as their businesses grow in today’s big data world. Kimberly enjoys sharing her experiences at technical conferences in the US and abroad. Why? Because after all, “there is no perfect shard key.”

Guy Harrison

CTO, ProvenDB and Southbank Software 

Author, MongoDB Performance Tuning

Not only is Guy a founder and CTO, he is also an IT professional with experience in a range of disciplines, technologies, and practices, but he is probably best known both for his longstanding involvement in relational databases (Oracle and MySQL) and for emerging database technologies such as MongoDB and blockchain. Guy is also an expert on performance tuning and has written several books on the subject, including “MongoDB Performance Tuning”, “Next Generation Databases”, and “MySQL Stored Procedure Programming”. He also writes the “MongoDB Matters” column for Database Trends and Applications.

Bill Reynolds

CTO/Co-founder of Qarbine specializing in BI solutions for enterprise investments in NoSQL databases like MongoDB

Bill has led product teams that have integrated with 23 different database APIs across many flavors of NoSQL, from MongoDB to pure object-oriented to legacy SQL.

His companies have licensed database and reporting software to most of the Fortune 500 and many others worldwide. For over three years, he has been applying that experience to developing a native MongoDB detailed reporting and analysis suite.

Join Percona and Friends as they react to MongoDB.live!

Register For Free

Jun
21
2021
--

Discover Why MongoDB Runs Better With Percona

In just under a month, MongoDB will host its annual event, MongoDB.live. And just over a month ago, Percona held its annual event, Percona Live.

Despite the naming convention similarity, these events couldn’t be more different!

Percona Live was an open source database software community event with 196 speakers and over 200 presentations. We platformed a huge range of people and companies that use and champion a variety of open source databases and tools. 

Although many people still think of MongoDB as open source, this is incorrect. The Open Source Initiative referred to MongoDB’s introduction of the Server Side Public License (SSPL) as a “fauxpen” source license.

In 2019, MongoDB CEO Dev Ittycheria stated in an interview, “MongoDB was built by MongoDB. There was no prior art. So one: it speaks to the technical acumen of the team here. And two: we didn’t open source it to get help from the community, to make the product better… We open sourced as a freemium strategy; to drive adoption.” 

For many people, this is totally contrary to open source values and the practice of open source overall. 

The move away from open source and the community means that MongoDB has become increasingly closed off. Without market alternatives, MongoDB can become a monopoly: it can raise fees without competition and lock in users. Some people believe that this is the intent behind its planned new Quarterly Release Cycle, which will provide quarterly releases only to Atlas customers.

This is where Percona can help. 

We offer a viable and secure drop-in replacement for MongoDB Community with added enterprise-level features, plus market-leading support services and open source tools.

Percona customers are not locked in and enjoy a lower total cost of ownership, with the freedom to move their data at any time, without fees or barriers.

For the next six weeks, we will be focusing on Percona’s MongoDB offering and all the benefits a move to Percona can bring to your business.

Highlights include:

  • Expert webinars on a variety of hot MongoDB topics
  • New market insight and thought leadership
  • In-depth technical blogs addressing key MongoDB pain points
  • Percona and Friends React to MongoDB.live – a live stream on July 15th where industry experts discuss the news and announcements coming from MongoDB.live

Our first webinar kicks off on June 29th as Percona experts Kimberly Wilkins, Mike Grayson, and Vinicius Grippa present ‘Unlocking the Mystery of MongoDB Shard Key Selection’ and offer advice on the measures to take if things go wrong. Please register now to attend for free.

Keep an eye on our blog and social channels for much more exciting content, insight, and events over the next few weeks. 

Jun
02
2021
--

Using the MongoDB Field Encryption Feature

Security is surely one of the main topics today. In the daily routine it can pass unnoticed, but sooner or later we all have to implement or work on some security guidelines. Today, we are going to discuss one of them: Field Encryption.

Feature Introduction

Discussing the feature per se, it is new in the 4.2+ releases. MongoDB provides two methods of field encryption: automatic and manual.

The automatic mode is available only in the Enterprise Edition and Atlas, while the manual method is supported in the Community Edition by the MongoDB drivers and the mongo shell as well.

This article will use Percona Server for MongoDB (PSMDB) running a 4.4 release with authentication enabled, and the manual method. As our objective here is to demonstrate the feature, we will use the mongo shell to run all the operations.

However, for applications, encrypting via the driver is the recommended approach. There is a detailed list of drivers supporting field-level encryption in the official documentation:

https://docs.mongodb.com/manual/core/security-client-side-encryption/#field-level-encryption-drivers

How To

   1 – To start using the feature, and for illustration purposes, we will use a locally managed keyfile as our Key Management Service (KMS).

It’s important to mention that a local keyfile is quick to set up but offers lower security, and is thus not recommended in a production environment, as it is stored alongside the database. For production, please consider using one of the following services:

As Key Management Service (KMS), MongoDB also supports:

  • Amazon Web Services KMS
  • Azure Key Vault
  • Google Cloud Platform KMS

When using a locally managed keyfile, MongoDB requires a file containing a base64-encoded 96-byte string with no line breaks[1], which can be created as in the following example:

shell#> openssl rand -hex 50 | head -c 96 | base64 | tr -d '\n' > /localkeys/client.key
shell#> chmod 600 /localkeys/client.key
shell#> chown mongod:mongod /localkeys/client.key

Please make sure to save the keyfile in a secure location to avoid losing it. Otherwise, you will not be able to decrypt and read the data later.

   2 – Once we have the key file, let’s open a mongo shell session without connecting to the database yet, using the option --nodb; also use --shell to keep the shell open after executing the supplied code (in this case, the --eval string value). This step is necessary as we have to load the key into an object which will become a database connection property later. In the following example, we are loading our key file into the shell variable LOCAL_KEY:

shell#> mongo --shell --nodb --eval "var LOCAL_KEY = cat('/localkeys/client.key')"

Percona Server for MongoDB shell version v4.4.3-5
type "help" for help

mongo> LOCAL_KEY
ODEyMTY2YmNmNDA4YWZlZWVhNTFmOTUyODk4YTJjODc1ODk0NTZiN2EzYWQwZDdjNmM4MDQ5ODUzYzRkMjlhNGZlM2UyZDVmMTNjZWQ1YjAyNjAwNzZmMmQ1ZjVkMzdi

   3 – Next, build the ClientSideFieldLevelEncryptionOptions document with the proper client-side field-level encryption configuration:

mongo> var ClientSideFieldLevelEncryptionOptions = {
"keyVaultNamespace" : "encryption.__dataKeys",
"kmsProviders" : {
  "local" : {
    "key" : BinData(0, LOCAL_KEY)
  }
}
}

   4 – After setting the variable, we can establish an FLE-enabled connection by passing ClientSideFieldLevelEncryptionOptions to the Mongo() constructor. This connection also uses normal username+password authentication, as you can see in the mongodb://…/ URI string.

mongo> csfleDatabaseConnection = Mongo("mongodb://dba:secret@localhost:27017/?authSource=admin", ClientSideFieldLevelEncryptionOptions)
connection to localhost:27017

   5 – The next step is to create the key vault object per se; until this moment, we were just setting variables and adjusting our client connection. It can be done as follows:

mongo> keyVault = csfleDatabaseConnection.getKeyVault();
{
"mongo" : connection to localhost:27017,
"keyColl" : encryption.__dataKeys
}

  5.1 – As the above command does not generate any output, for extra illustration we can check the newly created vault structure on the server from a different session:

> show dbs
admin            0.000GB
config           0.000GB
encryption       0.000GB

> use encryption
switched to db encryption

> show collections
__dataKeys

   6 – With everything set and the vault structures deployed, let’s add a data encryption key to the database connection’s key vault. If successful, createKey() returns the UUID of the new data encryption key. This UUID, which is a BSON Binary value, is our encryption key and will be used to encrypt the fields manually:

mongo> keyVault.createKey(
"local", /*Local-type key*/
"", /*Customer master key, used with external KMSes*/
[ "myFirstCSFLEDataKey" ]
)
UUID("5bd46d64-3fe8-4e31-a800-219eaa1b6a85")

   7 – Next, let’s insert a document using the above key to encrypt. Please note we did the above operations in the same mongo shell, and to encrypt we are going to use the encrypt() method along with the parameters to hide the SSN “123-45-6789”.

    7.1 – The encrypt() function expects the following three arguments:

  •   encryptionKeyId – This is our generated key.
  •   Value – In this example, we are going to hide the “123-45-6789” value of the ssn field.
  •   encryptionAlgorithm – The algorithm used to encrypt the field; we can choose between Deterministic Encryption[2] and Randomized Encryption[3].

    7.2 – With all set, we can insert the document encrypting the field as follows:

mongo> clientEncryption = csfleDatabaseConnection.getClientEncryption();
mongo> var csfleDB = csfleDatabaseConnection.getDB("percona");
mongo> csfleDB.getCollection("newcollection").insert({
"_id": 1,
"medRecNum": 1,
"firstName": "Jose",
"lastName": "Pereira",
"ssn": clientEncryption.encrypt(UUID("47130fb5-987c-4af0-9e83-5eaf672d608b"), "123-45-6789","AEAD_AES_256_CBC_HMAC_SHA_512-Random"),
"comment": "Jose Pereira's SSN encrypted."});

WriteResult({ "nInserted" : 1 })

Perfect! At this point, we were able to encrypt the field manually.

So, now you must be thinking:

How can I read that encrypted value?

Let’s demonstrate that in the next section.

Reading an Encrypted Field

At this point, if we connect without passing the encryption configuration, we will not be able to read the information:

shell#> mongo "mongodb://dba:secret@localhost:27017/?authSource=admin"

mongo> use percona
switched to db percona

mongo-shell-2> show collections
newcollection

mongo-shell-2> db.newcollection.find().pretty()
{
"_id" : 1,
"medRecNum" : 1,
"firstName" : "Jose",
"lastName" : "Pereira",
"ssn" : BinData(6,"AkcTD7WYfErwnoNer2ctYIsCVXS2nJYpSEgYFlp8ORmZ1i9PO/RGELdm+XxZyN6+ls+KLeDu1LQFtIIJs1Bwy5AMnaA3Lf4qAfm0Nmov6Iwuqer67HV2nIQk6dIa98QFLXs="),
"comment" : "Jose Pereira's SSN encrypted."
}
exit

1 – To read the data, the client must load the ClientSideFieldLevelEncryptionOptions configuration into its session: open a connection, load the options into the variable, and use the Mongo() constructor to log in, similar to what we did before, but this time without needing to set up the key vault again:

shell># mongo --shell --nodb --eval "var LOCAL_KEY = cat('/localkeys/client.key')"
Percona Server for MongoDB shell version v4.4.3-5
type "help" for help

mongo> var ClientSideFieldLevelEncryptionOptions = {
"keyVaultNamespace" : "encryption.__dataKeys",
"kmsProviders" : {
"local" : {
"key" : BinData(0, LOCAL_KEY)
}}}


mongo> csfleDatabaseConnection = Mongo("mongodb://dba:secret@localhost:27017/?authSource=admin", ClientSideFieldLevelEncryptionOptions)
connection to localhost:27017

(That’s why it is important to store the key in a safe place, as its encoding will be used in the read/write encryption routine.)

2 – In the session, with the configuration done, we can run a find() to return the plain document:

mongo> percona = csfleDatabaseConnection.getDB("percona")
percona

mongo> newcollection = percona.getCollection("newcollection")
percona.newcollection

mongo> percona.newcollection.find().pretty()
{
"_id" : 1,
"medRecNum" : 1,
"firstName" : "Jose",
"lastName" : "Pereira",
"ssn" : "123-45-6789",
"comment" : "Jose Pereira's SSN encrypted."
}

Common Questions

When implementing a new feature, we also start thinking about scenarios we would like to understand before implementing it. Here are some questions that you might have after reading this article:

     1. Can a root user read the fields?

    • No. What dictates whether a user can read encrypted fields is whether the connection is loaded with the encryption key used; without it, the user won’t be able to read them, even with root privileges on the database.
mongo localhost:4420/admin -uroot -psekret --eval "db.getSiblingDB('percona').newcollection.findOne()" --quiet
{
"_id" : 1,
"medRecNum" : 1,
"firstName" : "Jose",
"lastName" : "Pereira",
"ssn" : BinData(6,"ArIjMQ7O9Uwanyv31U9RulQCZPt7IxoZpu6mu9ekXMRcsaMKZgkJypzwNkuY+HEOMRn3eU6BMTkM71Gm5KDqi4ERTP8ExEfRMHwuDNrDmGmb1q0QA+W7CL4iMOL6oSX79uc="),
"comment" : "Jose Pereira's SSN encrypted."
} 

    2. What happens if I lose the Key?

    • Deleting or losing the customer master key renders all data encryption keys encrypted with it permanently unreadable, which in turn renders all values encrypted with those data encryption keys permanently unreadable.

     3. How does it work in a ReplicaSet or Sharded Cluster?

    • The same behavior noted in the “Reading an Encrypted Field” section applies to both configurations. Once a document with an encrypted field is inserted, it is replicated throughout the nodes as it is: encrypted. Only a user with the correct key will be able to read it.
replset:SECONDARY> use percona
switched to db percona

replset:SECONDARY> rs.secondaryOk()

replset:SECONDARY> db.getSiblingDB('percona').newcollection.findOne()
{
"_id" : 1,
"medRecNum" : 1,
"firstName" : "Jose",
"lastName" : "Pereira",
"ssn" : BinData(6,"ArIjMQ7O9Uwanyv31U9RulQCZPt7IxoZpu6mu9ekXMRcsaMKZgkJypzwNkuY+HEOMRn3eU6BMTkM71Gm5KDqi4ERTP8ExEfRMHwuDNrDmGmb1q0QA+W7CL4iMOL6oSX79uc="),
"comment" : "Jose Pereira's SSN encrypted."
}

     4. Are there any limitations to using the feature?

    • Yes, the official manual here has a comprehensive description of the limitations. We suggest you review the manual, as limitations exist for different configurations such as shard keys, unique indexes, collation, views, and so on.

And if you have any further questions, please feel free to share them with us in the comment section below.

Conclusion

Even with the restriction on using Automatic Client-Side Field Level Encryption, the manual method, and the feature per se, has shown itself to be an exciting option, mainly because it strengthens security by allowing sensitive fields to be encrypted natively in MongoDB instead of using third-party tools, helping to reduce the possible breaches a third-party tool can create.

Additionally, if you are looking for a list of security measures to implement to protect the MongoDB installation, you can use the MongoDB Security Checklist, which is a good starting point if you are not following any standard security policy.

References:

May
19
2021
--

Refreshing Test/Dev Environments With Prod Data Using Percona Backup for MongoDB

This is a very straightforward article, written with the intention of showing you how easy it is to refresh your Test/Dev environments with PROD data using Percona Backup for MongoDB (PBM). It covers all the steps from the PBM configuration through the restore, assuming that the PBM agents are all up and running on all the replica set members of both the PROD and Dev/Test servers.

Taking the Backup on PROD

This step is quite simple, and it demands no more than two commands:

1. Configuring the Backup

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:40001/?replSetName=rbprepPROD&authSource=admin'
$ pbm config --file /etc/pbm/pbm-s3.yaml
[Config set]
------
pitr:
  enabled: false
storage:
  type: s3
  s3:
    provider: aws
    region: us-west-1
    bucket: rafapbmtest
    prefix: bpPROD
    credentials:
      access-key-id: '***'
      secret-access-key: '***'

Backup list resync from the store has started

Two important notes: I am pointing my backups to an S3 bucket, and I am defining a prefix. When a prefix is defined in the PBM storage configuration, a subdirectory is automatically created, and the backup files are stored in that subdirectory instead of the root of the S3 bucket.

2. Taking the Backup

With PBM properly configured, it is time to take the backup. (You can skip this step if you already have PBM backups to use, of course.)

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:40001/?replSetName=rbprepPROD&authSource=admin'

$ pbm backup
Starting backup '2021-05-08T08:34:47Z'...................
Backup '2021-05-08T08:34:47Z' to remote store 's3://rafapbmtest/bpPROD' has started

And if we run the PBM status command, we will see the snapshot running; once it is complete, the PBM status will show it as completed, like below:

$ pbm status

Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:40001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpPROD
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]

Configuring the PBM Space on a DEV/TEST Environment

All right, now my PROD has a proper backup routine configured. I will move one step forward and configure my PBM space but this time in a Dev/Test environment – named here as DEV.

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:50001/?replSetName=rbprepDEV&authSource=admin'

$ pbm config --file /etc/pbm/pbm-s3.yaml 
[Config set]
------
pitr:
  enabled: false
storage:
  type: s3
  s3:
    provider: aws
    region: us-west-1
    bucket: rafapbmtest
    prefix: bpDEV
    credentials:
      access-key-id: '***'
      secret-access-key: '***'

The backup list resync from the store has started.

Note that the S3 bucket is exactly the same one where PROD is storing the backups, but with a different prefix. If I run a status command, I will see it is configured, but with no snapshots available yet:

$ pbm status

Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:50001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpDEV
(none)

Lastly, note that the replica set name is exactly the same as PROD’s. If this were a sharded cluster rather than a non-sharded replica set, all the replica set names would have to match in the target cluster. PBM is guided by the replica set name, and if my DEV environment had a different one, it would not be possible to load the backup metadata from PROD into DEV.

Transferring the Desired Backup Files

The next step is transferring the backup files from the PROD prefix to the target prefix. I will use the AWS CLI to achieve that, but there is one important thing to determine in advance: which files belong to a certain backup set (snapshot). Let’s go back to the PBM status output taken on PROD previously:

$ export PBM_MONGODB_URI='mongodb://pbmuser:secretpwd@127.0.0.1:40001/?replSetName=rbprepPROD&authSource=admin'

$ pbm status
Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:40001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpPROD
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]

The PBM snapshots are named with the timestamp of when the backup started. If we check the S3 prefix where it is stored, we will see that the file names contain that timestamp:

$ aws s3 ls s3://rafapbmtest/bpPROD/
2021-05-08 10:26:11          5 .pbm.init
2021-05-08 10:35:14       1428 2021-05-08T08:34:47Z.pbm.json
2021-05-08 10:35:10      11606 2021-05-08T08:34:47Z_bprepPROD.dump.s2
2021-05-08 10:35:13        949 2021-05-08T08:34:47Z_bprepPROD.oplog.s2

So it is easy to know which files I have to copy.

$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z.pbm.json' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z.pbm.json to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z.pbm.json

$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.dump.s2' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.dump.s2 to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z_bprepPROD.dump.s2

$ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.oplog.s2' 's3://rafapbmtest/bpDEV/'
copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.oplog.s2 to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z_bprepPROD.oplog.s2
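Alternatively, since all three files share the snapshot timestamp in their names, they can be copied in a single command using the AWS CLI include/exclude filters (a sketch, assuming the same bucket layout as above):

$ aws s3 cp 's3://rafapbmtest/bpPROD/' 's3://rafapbmtest/bpDEV/' --recursive --exclude '*' --include '2021-05-08T08:34:47Z*'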

Checking the DEV prefix:

$ aws s3 ls s3://rafapbmtest/bpDEV/
2021-05-08 10:43:59          5 .pbm.init
2021-05-08 10:52:02       1428 2021-05-08T08:34:47Z.pbm.json
2021-05-08 10:52:13      11606 2021-05-08T08:34:47Z_bprepPROD.dump.s2
2021-05-08 10:52:24        949 2021-05-08T08:34:47Z_bprepPROD.oplog.s2

The files are already there, and PBM has automatically loaded their metadata into the DEV PBM collections:

$ pbm status

Cluster:
========
bprepPROD:
  - bprepPROD/127.0.0.1:50001: pbm-agent v1.4.1 OK

PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 us-west-1 rafapbmtest/bpDEV
  Snapshots:
    2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]

Finally – Restoring It

Believe it or not, now comes the easiest part: the restore. It is only one command and nothing else:

$ pbm restore '2021-05-08T08:34:47Z'
....Restore of the snapshot from '2021-05-08T08:34:47Z' has started

Refreshing Dev/Test environments with PROD data is a very common and required task in corporations worldwide. I hope this article helps to clarify the practical questions regarding using PBM for it!

Apr
09
2021
--

Deploying a MongoDB Proof of Concept on Google Cloud Platform

Recently, I needed to set up a Proof of Concept (POC) and wanted to do it on Google Cloud Platform (GCP). After documenting the process, it seemed it might be helpful for others looking for the most basic guide possible to get a Mongo server up and running on GCP. The process below will set up the latest version of Percona Server for MongoDB on a Virtual Machine (VM) in GCP. This will be a minimal install on which to do further work. I will also be using the free account on GCP to do this.

The first step will be setting up your SSH access to the node.  On my Mac, I ran the following command which should work equally well on Linux:

ssh-keygen -t rsa -f ~/.ssh/gcp -C [USERNAME]

I named my key “gcp” in the example above but you can use an existing key or generate a new one with whatever name you want.

From there, you will want to log in to the GCP console in a browser and do some simple configuration. The first step will be to create a project and then add an instance. You will also choose a Region and Zone. For the final basic configuration of our VM, choose the type of machine you want. For my testing, an e2-medium is sufficient. I will also accept the default disk size and type.

Next, edit the instance details, go to the SSH Keys section, and add your SSH key. Your key will be a lot longer but will look something like the below:
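For reference, the entry follows the standard OpenSSH public key format: the key type, the base64-encoded key material, and a comment (the key material below is truncated and illustrative):

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAB...<truncated>... [USERNAME]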

Save the details and take note of the public IP of the node. Of course, you will want to test logging in using your key to ensure you can get into the server. I tested my access with the below command, replacing the key name (gcp in my case), username, and public IP:

ssh -i ~/.ssh/gcp [USERNAME]@[PUBLIC IP]

Our next step will be to install Percona Server for MongoDB.  We will do this as painlessly as possible using Percona’s RPMs.  We will start by setting up the repo:

sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
sudo percona-release enable psmdb-44 release

With the repo configured, we will install MongoDB with the following command:

sudo yum install percona-server-mongodb

You will likely want to enable the service:

sudo systemctl enable mongod
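You can then start the service and run a quick sanity check (the ping command below is just a no-op round trip to confirm the server responds):

sudo systemctl start mongod
mongo --eval 'db.runCommand({ ping: 1 })'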

By default, MongoDB does not enable authentication. If you want to enable it, you can use the following command to set up access:

sudo /usr/bin/percona-server-mongodb-enable-auth.sh

Here’s more information on enabling authentication on Percona Server for MongoDB.

Again, this is the most basic installation of Percona Server for MongoDB on Google Cloud Platform. This guide was created for those looking for a basic introduction to both platforms who just want to get their proverbial hands dirty with a simple POC.


Our Percona Distribution for MongoDB is the only truly open-source solution powerful enough for enterprise applications. It’s free to use, so try it today!

Mar
30
2021
--

What’s Running in My DB? A Journey with currentOp() in MongoDB

I have been working for a while with customers, supporting both MongoDB and MySQL technologies. Most of the time, when an issue arises, customers working with MySQL collect most of the information on what is happening in the DB server, including all the queries running at that particular time, using “show full processlist;”. This information helps us look into the problem, like which queries are taking time and where they are spending it.

But for MongoDB, most of the time we don’t receive this (in-progress operations) information, and we have to check the long queries logged in the MongoDB log file. Of course, it records most of the useful details, like planSummary (whether it used an index or not), documents/index entries scanned, time to complete, etc. Still, it’s like doing a postmortem rather than examining the issue as it happens. Actually collecting information about the operations taking time, or finding a problematic query while the issue is happening, could help you find the right one to kill (to release the pressure) or assess the situation of the database.

The in-progress operations in MongoDB can be checked via the database command currentOp(). The level of information can be controlled via the options passed to it. Most of the time, the raw output is not that pleasant to check because it contains a lot of information, making it difficult to spot the entries we need. However, MongoDB knows this and, over multiple versions, has added many options to easily filter the operations using currentOp. Some of the information regarding this is mentioned in the below release notes:

https://docs.mongodb.com/manual/release-notes/3.6/#new-aggregation-stages 

https://docs.mongodb.com/manual/release-notes/4.0/#id26

https://docs.mongodb.com/manual/release-notes/4.2/#currentop 

In this blog, I will share some tricks for working with this command and fetching the operations that we need to check. This should help a person inspect the ongoing operations and, if necessary, kill the problematic command – if they wish.

Introduction

The database command currentOp() provides information about the ongoing/currently running operations in the database. It must be run against the admin database. On servers that run with authorization, you need the inprog privilege action to view operations for all users. This is included in the built-in clusterMonitor role.

Use Cases

The command to see all the active connections:

db.currentOp()

A user without the inprog privilege can still view their own operations with the below command:

db.currentOp( { "$ownOps": true } )

To also see idle connections and operations running in the background, you can use either one of the below commands:

db.currentOp(true)
db.currentOp( { "$all": true } )

As I said before, you can use filters here to check the operations you need, like a command running for more than a few seconds, waiting for a lock, active/inactive connections, running on a particular namespace, etc. Let’s see some examples from my test environment.

The below command provides information about all active connections. 

mongos> db.currentOp()
{
	"inprog" : [
		{
			"shard" : "shard01",
			"host" : "bm-support01.bm.int.percona.com:54012",
			"desc" : "conn52",
			"connectionId" : 52,
			"client_s" : "127.0.0.1:53338",
			"appName" : "MongoDB Shell",
			"clientMetadata" : {
				"application" : {
					"name" : "MongoDB Shell"
				},
				"driver" : {
					"name" : "MongoDB Internal Client",
					"version" : "4.0.19-12"
				},
				"os" : {
					"type" : "Linux",
					"name" : "CentOS Linux release 7.9.2009 (Core)",
					"architecture" : "x86_64",
					"version" : "Kernel 5.10.13-1.el7.elrepo.x86_64"
				},
				"mongos" : {
					"host" : "bm-support01.bm.int.percona.com:54010",
					"client" : "127.0.0.1:36018",
					"version" : "4.0.19-12"
				}
			},
			"active" : true,
			"currentOpTime" : "2021-03-21T23:41:48.206-0400",
			"opid" : "shard01:1404",
			"lsid" : {
				"id" : UUID("6bd7549b-0c89-40b5-b59f-af765199bbcf"),
				"uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
			},
			"secs_running" : NumberLong(0),
			"microsecs_running" : NumberLong(180),
			"op" : "getmore",
			"ns" : "admin.$cmd",
			"command" : {
				"getMore" : NumberLong("8620961729688473960"),
				"collection" : "$cmd.aggregate",
				"batchSize" : NumberLong(101),
				"lsid" : {
					"id" : UUID("6bd7549b-0c89-40b5-b59f-af765199bbcf"),
					"uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
				},
				"$clusterTime" : {
					"clusterTime" : Timestamp(1616384506, 2),
					"signature" : {
						"hash" : BinData(0,"z/r5Z/DxrxaeH1VIKOzeok06YxY="),
						"keyId" : NumberLong("6942317981145759774")
					}
				},
				"$client" : {
					"application" : {
						"name" : "MongoDB Shell"
					},
					"driver" : {
						"name" : "MongoDB Internal Client",
						"version" : "4.0.19-12"
					},
					"os" : {
						"type" : "Linux",
						"name" : "CentOS Linux release 7.9.2009 (Core)",
						"architecture" : "x86_64",
						"version" : "Kernel 5.10.13-1.el7.elrepo.x86_64"
					},
					"mongos" : {
						"host" : "bm-support01.bm.int.percona.com:54010",
						"client" : "127.0.0.1:36018",
						"version" : "4.0.19-12"
					}
				},
				"$configServerState" : {
					"opTime" : {
						"ts" : Timestamp(1616384506, 2),
						"t" : NumberLong(1)
					}
				},
				"$db" : "admin"
			},
			"originatingCommand" : {
				"aggregate" : 1,
				"pipeline" : [
					{
						"$currentOp" : {
							"allUsers" : true,
							"truncateOps" : true
						}
					},
					{
						"$sort" : {
							"shard" : 1
						}
					}
				],
				"fromMongos" : true,
				"needsMerge" : true,
				"mergeByPBRT" : false,
				"cursor" : {
					"batchSize" : 0
				},
				"allowImplicitCollectionCreation" : true,
				"lsid" : {
					"id" : UUID("6bd7549b-0c89-40b5-b59f-af765199bbcf"),
					"uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
				},
				"$clusterTime" : {
					"clusterTime" : Timestamp(1616384506, 2),
					"signature" : {
						"hash" : BinData(0,"z/r5Z/DxrxaeH1VIKOzeok06YxY="),
						"keyId" : NumberLong("6942317981145759774")
					}
				},
				"$client" : {
					"application" : {
						"name" : "MongoDB Shell"
					},
					"driver" : {
						"name" : "MongoDB Internal Client",
						"version" : "4.0.19-12"
					},
					"os" : {
						"type" : "Linux",
						"name" : "CentOS Linux release 7.9.2009 (Core)",
						"architecture" : "x86_64",
						"version" : "Kernel 5.10.13-1.el7.elrepo.x86_64"
					},
					"mongos" : {
						"host" : "bm-support01.bm.int.percona.com:54010",
						"client" : "127.0.0.1:36018",
						"version" : "4.0.19-12"
					}
				},
				"$configServerState" : {
					"opTime" : {
						"ts" : Timestamp(1616384506, 2),
						"t" : NumberLong(1)
					}
				},
				"$db" : "admin"
			},
			"numYields" : 0,
			"locks" : {
				
			},
			"waitingForLock" : false,
			"lockStats" : {
				
			}
		},
		{
			"shard" : "shard01",
			"host" : "bm-support01.bm.int.percona.com:54012",
			"desc" : "monitoring keys for HMAC",
…
...

Some of the important parameters from the output that we may need to focus on are listed below. We will use these parameters to filter for the operations that we need.

PARAMETER: DESCRIPTION
host: The host on which the operation is running
opid: The operation id (it is used to kill that operation)
active: The connection’s status; true if it is running and false if it is idle
client: Host/IP information about where the operation originated
clientMetadata: Provides more information about the client connection
shard: Which shard is connected, if it is a sharded cluster environment
appName: Information about the type of client
currentOpTime: Start time of the operation
ns: Namespace (details about the DB and collection)
command: A document with the full command object associated with the operation
secs_running / microsecs_running: How many seconds/microseconds the particular operation has been running
op: Operation type, like insert, update, find, delete, etc.
planSummary: Whether the command uses an index (IXSCAN) or a collection scan (COLLSCAN, disk read)
cursor: Cursor information for getmore operations
locks: Type and mode of the lock. See here for more details
waitingForLock: True if the operation is waiting for a lock, false if it has the required lock
msg: A message that describes the status and progress of the operation
killPending: Whether the operation is currently flagged for termination
numYields: A counter that reports the number of times the operation has yielded to allow other operations
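Once a problematic operation is identified, the opid value from this output is what you pass to db.killOp() to terminate it. A sketch using the "shard01:1404" opid from the sample output above (on a sharded cluster, mongos accepts the "shard:opid" string form starting in MongoDB 4.0; on a plain replica set, the opid is numeric):

mongos> db.killOp("shard01:1404")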

The raw currentOp output can be processed with the JavaScript forEach method in the mongo shell, so we can use it for many operations. For example, if I want to count the number of active connections, I can use the below one:

mongos> var c=0;
mongos> db.currentOp().inprog.forEach(
... function(doc){
...   c=c+1
... }
... )
mongos> print("The total number of active connections is: "+c)
The total number of active connections is: 15

To find the number of active and inactive connections:

mongos> var active=0; var inactive=0;
mongos> db.currentOp(true).inprog.forEach( function(doc){  if(doc.active){    active=active+1 }  else if(!doc.active){    inactive=inactive+1 }  } )
mongos> print("The number of active connections is: "+active+"\nThe number of inactive connections is: "+inactive)
The number of active connections is: 15
The number of inactive connections is: 117

To find the operations (here, an import job) running for more than 1000 microseconds (for seconds, use secs_running) and with the specific namespace vinodh.testColl:

mongos> db.currentOp(true).inprog.forEach( function(doc){ if(doc.microsecs_running>1000 && doc.ns == "vinodh.testColl")  {print("\nop: "+doc.op+", namespace: "+doc.ns+", \ncommand: ");printjson(doc.command)} } )

op: insert, namespace: vinodh.testColl, 
command: 
{
  "$truncated" : "{ insert: \"testColl\", bypassDocumentValidation: false, ordered: false, documents: [ { _id: ObjectId('605a1ab05c15f7d2046d5d26'), id: 49004, name: \"Vernon Drake\", age: 19, emails: [ \"fetome@liek.gh\", \"noddo@ve.kh\", \"wunin@cu.ci\" ], born_in: \"1973\", ip_addresses: [ \"212.199.110.72\" ], blob: BinData(0, 4736735553674F6E6825) }, { _id: ObjectId('605a1ab05c15f7d2046d5d27'), id: 49003, name: \"Rhoda Burke\", age: 64, emails: [ \"zog@borvelaj.pa\", \"hoz@ni.do\", \"abfad@borup.cl\" ], born_in: \"1976\", ip_addresses: [ \"12.190.161.2\", \"16.63.87.211\" ], blob: BinData(0, 244C586A683244744F54) }, { _id: ObjectId('605a1ab05c15f7d2046d5d28'), id: 49002, name: \"Alberta Mack\", age: 25, emails: [ \"sibef@nuvaki.sn\", \"erusu@dimpu.ag\", \"miumurup@se.ir\" ], born_in: \"1971\", ip_addresses: [ \"250.239.181.203\", \"192.240.119.122\", \"196.13.33.240\" ], blob: BinData(0, 7A63566B42732659236D) }, { _id: ObjectId('605a1ab05c15f7d2046d5d29'), id: 49005, name: \"Minnie Chapman\", age: 33, emails: [ \"jirgenor@esevepu.edu\", \"jo@m..."
}

This filter can also be written directly, without forEach, by passing a query document to currentOp():

mongos> db.currentOp({ "active": true, "microsecs_running": {$gt: 1000}, "ns": /^vinodh.testColl/ })
{
  "inprog" : [
    {
      "shard" : "shard01",
      "host" : "bm-support01.bm.int.percona.com:54012",
      "desc" : "conn268",
      "connectionId" : 268,
      "client_s" : "127.0.0.1:55480",
      "active" : true,
      "currentOpTime" : "2021-03-23T13:05:32.550-0400",
      "opid" : "shard01:689582",
      "secs_running" : NumberLong(0),
      "microsecs_running" : NumberLong(44996),
      "op" : "insert",
      "ns" : "vinodh.testColl",
      "command" : {
        "$truncated" : "{ insert: \"testColl\", bypassDocumentValidation: false, ordered: false, documents: [ { _id: ObjectId('605a1fdc5c15f7d2047ee04e'), id: 16002, name: \"Linnie Walsh\", age: 25, emails: [ \"evoludecu@logejvi.ai\", \"ilahubfep@ud.mc\", \"siujo@pipazvo.ht\" ], born_in: \"1982\", ip_addresses: [ \"198.117.218.117\" ], blob: BinData(0, 244A6E702A5047405149) }, { _id: ObjectId('605a1fdc5c15f7d2047ee04f'), id: 16004, name: \"Larry Watts\", age: 47, emails: [ \"sa@hulub.gy\", \"wepo@ruvnuhej.om\", \"jorvohki@nobajmo.hr\" ], born_in: \"1989\", ip_addresses: [], blob: BinData(0, 50507461366B6F766C40) }, { _id: ObjectId('605a1fdc5c15f7d2047ee050'), id: 16003, name: \"Alejandro Jacobs\", age: 61, emails: [ \"enijaze@hihen.et\", \"gekesaco@kockod.fk\", \"rohovus@il.az\" ], born_in: \"1988\", ip_addresses: [ \"239.139.123.44\", \"168.34.26.236\", \"123.230.33.251\", \"132.222.43.251\" ], blob: BinData(0, 32213574705938385077) }, { _id: ObjectId('605a1fdc5c15f7d2047ee051'), id: 16005, name: \"Mildred French\", age: 20, emails: [ \"totfi@su.mn\"..."
      },
      "numYields" : 0,
      "locks" : {
        
      },
      "waitingForLock" : false,
      "lockStats" : {
        "Global" : {
          "acquireCount" : {
            "r" : NumberLong(16),
            "w" : NumberLong(16)
          }
        },
        "Database" : {
          "acquireCount" : {
            "w" : NumberLong(16)
          }
…

Operations waiting for a lock on a specific namespace (ns) or of a specific operation type (op) can be filtered as follows; you can alter the parameters to filter however you wish:

db.currentOp(
   {
     "waitingForLock" : true,
     "ns": /^vinodh.testColl/,
     $or: [
        { "op" : { "$in" : [ "insert", "update", "remove" ] } },
        { "command.findandmodify": { $exists: true } }
     ]
   }
)
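
The filter document can also reference nested fields with dot notation. For example, to list operations currently holding a global write lock (a minimal sketch; the lock field and mode here are just an example to adapt):

mongos> db.currentOp({ "locks.Global": "w" })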

Aggregate – currentOp():

Starting with MongoDB 3.6, currentOp is supported as an aggregation stage ($currentOp), which makes checking current operations even easier. The aggregation pipeline is also not subject to the 16MB result size limit of the currentOp command. Note that the $currentOp stage must be the first stage of the pipeline and must be run against the admin database. The usage is:

{ $currentOp: { allUsers: <boolean>, idleConnections: <boolean>, idleCursors: <boolean>, idleSessions: <boolean>, localOps: <boolean> } }

Note:

Options/Features added, version-wise, to currentOp():

  • allUsers, idleConnections – available from 3.6
  • idleSessions, localOps – available from 4.0
  • idleCursors – available from 4.2

Let's see an example: counting all connections, including idle ones, on shard02:

mongos> db.aggregate( [ { $currentOp : { allUsers: true, idleConnections: true } },    
... { $match : { shard: "shard02" }}, {$group: {_id:"shard02", count: {$sum: 1}} } ] )
{ "_id" : "shard02", "count" : 65 }

Now, using the same import job, we can find the operation as follows:

mongos> db.aggregate( [    { $currentOp : { allUsers: true, idleConnections: false } },    
... { $match : { "ns": "vinodh.testColl" }} ] )
{ "shard" : "shard01", "host" : "bm-support01.bm.int.percona.com:54012", "desc" : "conn279", "connectionId" : 279, "client_s" : "127.0.0.1:38564", "active" : true, "currentOpTime" : "2021-03-23T13:33:57.225-0400", "opid" : "shard01:722388", "secs_running" : NumberLong(0), "microsecs_running" : NumberLong(24668), "op" : "insert", "ns" : "vinodh.testColl", "command" : { "insert" : "testColl", "bypassDocumentValidation" : false, "ordered" : false, "documents" : [ { "_id" : ObjectId("605a26855c15f7d20484d217"), "id" : 12020, "name" : "Dora Watson",....tId("000000000000000000000000") ], "writeConcern" : { "getLastError" : 1, "w" : "majority" }, "allowImplicitCollectionCreation" : false, "$clusterTime" : { "clusterTime" : Timestamp(1616520837, 1000), "signature" : { "hash" : BinData(0,"yze8dSs12MUKlnb7rpw5h2YblFI="), "keyId" : NumberLong("6942317981145759774") } }, "$configServerState" : { "opTime" : { "ts" : Timestamp(1616520835, 10), "t" : NumberLong(2) } }, "$db" : "vinodh" }, "numYields" : 0, "locks" : { "Global" : "w", "Database" : "w", "Collection" : "w" }, "waitingForLock" : false, "lockStats" : { "Global" : { "acquireCount" : { "r" : NumberLong(8), "w" : NumberLong(8) } }, "Database" : { "acquireCount" : { "w" : NumberLong(8) } }, "Collection" : { "acquireCount" : { "w" : NumberLong(8) } } } }

To reduce the output, we can project only selected fields:

mongos> db.aggregate( [    
... { $currentOp : { allUsers: true, idleConnections: false } },    
... { $match : { ns: "vinodh.testColl", microsecs_running: {$gt: 10000} }}, 
... {$project: { _id:0, host:1, opid:1, secs_running: 1, op:1, ns:1, waitingForLock: 1, numYields: 1  } } ] )
{ "host" : "bm-support01.bm.int.percona.com:54012", "opid" : "shard01:777387", "secs_running" : NumberLong(0), "op" : "insert", "ns" : "vinodh.testColl", "numYields" : 0, "waitingForLock" : false }

To make the output easier to read, append pretty():

mongos> db.aggregate( [    { $currentOp : { allUsers: true, idleConnections: false } },    { $match : { ns: "vinodh.testColl", microsecs_running: {$gt: 10000} }}, {$project: { _id:0, host:1, opid:1, secs_running: 1, op:1, ns:1, waitingForLock: 1, numYields: 1  } } ] ).pretty()
{
	"host" : "bm-support01.bm.int.percona.com:54012",
	"opid" : "shard01:801285",
	"secs_running" : NumberLong(0),
	"op" : "insert",
	"ns" : "vinodh.testColl",
	"numYields" : 0,
	"waitingForLock" : false
}

I hope you now have a good idea of how to use currentOp() to check ongoing operations.

Now let's imagine you want to kill an operation that has been running for a long time. From the currentOp document you identified it with, take the opid and kill the operation using the killOp() method. In the example below, I used a sharded environment, so the opid is in "shard_id:opid" format. See here for more details.

mongos> db.aggregate( [    { $currentOp : { allUsers: true, idleConnections: false } },    { $match : { ns: "vinodh.testColl" }}, {$project: { _id:0, host:1, opid:1, microsecs_running: 1, op:1, ns:1, waitingForLock: 1, numYields: 1  } } ] ).pretty()
{
	"host" : "bm-support01.bm.int.percona.com:54012",
	"opid" : "shard01:1355440",
	"microsecs_running" : NumberLong(39200),
	"op" : "insert",
	"ns" : "vinodh.testColl",
	"numYields" : 0,
	"waitingForLock" : false
}


mongos> db.killOp("shard01:1355440")
{
	"shard" : "shard01",
	"shardid" : 1355440,
	"ok" : 1,
	"operationTime" : Timestamp(1616525284, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1616525284, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
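
If several long-running operations need to be terminated at once, the two techniques can be combined: iterate over the currentOp output and pass each matching opid to killOp(). A minimal sketch, assuming we want to kill inserts on vinodh.testColl that have been running for longer than five seconds:

mongos> db.currentOp(true).inprog.forEach( function(doc) {
...   // assumption: kill only inserts on this namespace running > 5 seconds
...   if (doc.ns == "vinodh.testColl" && doc.op == "insert" && doc.secs_running > 5) {
...     print("Killing opid: " + doc.opid);
...     db.killOp(doc.opid);
...   }
... } )

Be careful with bulk kills like this; always review the matched operations (for example, with the print statement alone) before adding the killOp() call.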

Conclusion

The next time you want to check ongoing operations, you can use these techniques to filter for operations that are waiting for a lock, running on a particular namespace, running longer than a specified time, or belonging to a specific operation type or shard. If you have other ideas on this topic, please share them in the comments; I am happy to learn from them as well.


Percona Distribution for MongoDB is the only truly open-source solution powerful enough for enterprise applications.

It’s free to use, so try it today!

Jan
22
2021
--

Getting Started with MongoDB in OpenShift Using the Percona Server for MongoDB Kubernetes Operator

Kubernetes has become an integral part of many architectures and has continued to rise in popularity.  Kubernetes is an open source platform for managing containerized workloads and services.  One approach to running Kubernetes is Red Hat's OpenShift.  OpenShift is Red Hat's name for its family of containerization products.  The main part of OpenShift, and the product we'll be writing about today, is the OpenShift Container Platform, an on-premises platform as a service built around Docker containers orchestrated and managed by Kubernetes on a foundation of Red Hat Enterprise Linux.

The other products provide this platform through different environments: OKD serves as the community-driven upstream (akin to the way that Fedora is upstream of Red Hat Enterprise Linux), OpenShift Online is the platform offered as software as a service, and OpenShift Dedicated is the platform offered as a managed service.

Operators are extensions to Kubernetes that allow for the management of a particular piece of software in a Kubernetes cluster.   In this blog post, we’ll help you get started with the Percona Kubernetes Operator for Percona Server for MongoDB.

Getting Started with OpenShift

The OpenShift Container Platform can be installed in many different ways and on many different platforms, whether on-premises or in a public cloud.  I’ve found the following guides helpful for installing on AWS, Azure, GCP, and Bare Metal.

Advantages of a Kubernetes Operator

You may be aware that you can easily spin up a containerized MongoDB instance on your container platform of choice.  So what advantages does a Kubernetes Operator provide?  The Operator handles many operational tasks that aren't always straightforward to do in a Kubernetes cluster context.  This includes, but is not limited to, creating the users needed by the Operator, performing upgrades, initializing and configuring replica sets, scaling replica sets in and out, performing backups and restores, and, as of version 1.6.0 of the Percona Kubernetes Operator for Percona Server for MongoDB, setting up a one-shard sharded cluster with all the networking needed.

Installing the Operator

First, we need to download the Operator from GitHub and change to the directory it was downloaded to.  Note that you can change the branch that you pull down by changing the value after the -b flag if you need a different version of the Operator.  At the time of writing, version 1.6.0 is the latest version of the Operator.

git clone -b v1.6.0 https://github.com/percona/percona-server-mongodb-operator

cd percona-server-mongodb-operator

After we've downloaded the Operator, we need to set up the Custom Resource Definitions (CRDs) that the Operator uses.  The CRD file tells Kubernetes about the resources the Operator will be using.  To install these, run the following against your OpenShift cluster as a cluster admin user:

oc apply -f deploy/crd.yaml
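
Before moving on, you can confirm that the CRDs were registered; the psmdb.percona.com suffix is the API group used by this Operator:

oc get crd | grep psmdb.percona.com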

There are times when you will not want your database cluster to be managed by a cluster admin user.   To have your Percona Server for MongoDB cluster Custom Resource (CR) managed with a non-privileged Kubernetes user, you would need to run the following:

oc create clusterrole psmdb-admin --verb="*" --resource=perconaservermongodbs.psmdb.percona.com,perconaservermongodbs.psmdb.percona.com/status,perconaservermongodbbackups.psmdb.percona.com,perconaservermongodbbackups.psmdb.percona.com/status,perconaservermongodbrestores.psmdb.percona.com,perconaservermongodbrestores.psmdb.percona.com/status

oc adm policy add-cluster-role-to-user psmdb-admin <some-user>

Additionally, if you have a cert-manager installed, there will be two more commands needed to be able to manage certificates with a non-privileged user.

oc create clusterrole cert-admin --verb="*" --resource=issuers.certmanager.k8s.io,certificates.certmanager.k8s.io

oc adm policy add-cluster-role-to-user cert-admin <some-user>

Creating a Project for your MongoDB Database

Next, we’ll need to create an OpenShift Project for your MongoDB Deployment.  In OpenShift, projects/namespaces are used to allow a community of users to organize and manage their content in isolation from other communities.   These projects may map to individual applications, pieces of software, or whole application stacks.

oc new-project psmdb

Configuring the Operator

The first part of configuring the Operator involves setting up Role-Based Access Control (RBAC) to define what actions can be performed by the Operator.  For example, a Role used by cert-manager should only have permissions on resources that are related to certificates and shouldn’t be granted access to say, stop and start a MongoDB deployment.

oc apply -f deploy/rbac.yaml

Next, we’ll start the Operator container.  This YAML file contains the definitions for the Operator itself.

oc apply -f deploy/operator.yaml
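
Before continuing, it's worth verifying that the Operator pod has reached the Running state; you should see a pod whose name starts with percona-server-mongodb-operator:

oc get pods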

Defining Users and Secrets

Now we need to define the users and secrets that will be created in our MongoDB deployment.  These are administrative users that perform different tasks on behalf of the Operator: a user admin for creating additional users, a cluster admin for adding or removing replica set members, a cluster monitor for monitoring specific parts of the replica set/sharded cluster with Percona Monitoring and Management (PMM), and a backup user for performing backups and restores.  Make sure you update these to match the required security policies for your organization, and don't forget to rotate passwords according to those policies during the life of your cluster.

oc create -f deploy/secrets.yaml
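
If you later need to look up one of these passwords, you can inspect the generated secret. The secret name below assumes the default cluster name, my-cluster-name; note that the values are base64-encoded:

oc get secret my-cluster-name-secrets -o yaml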

Configuring Cluster Custom Resource for OpenShift and Deploying

First, we need to update deploy/cr.yaml to set the platform to openshift. This is needed because OpenShift (compared to vanilla Kubernetes) has important security settings that are applied by default. Change:

spec:
  # platform: openshift

To:

spec:
  platform: openshift

Optionally, if you're using MiniShift, which runs an older version of OpenShift (and therefore an older version of Kubernetes) and is not suitable for production use, you need to change all instances of the following in your deploy/cr.yaml:

affinity:
  antiAffinityTopologyKey: "kubernetes.io/hostname"

To:

affinity:
  antiAffinityTopologyKey: "none"

The CR file also defines how much CPU and memory we're granting to each container, how many replica set members, config server replica set members, and mongos instances we have, and lets us add configuration options to our MongoDB-related processes.
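
For orientation, the relevant parts of deploy/cr.yaml look roughly like this (the field values are illustrative defaults; consult the cr.yaml shipped with your Operator version for the authoritative structure):

spec:
  platform: openshift
  replsets:
  - name: rs0
    size: 3
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
  sharding:
    enabled: true
    configsvrReplSet:
      size: 3
    mongos:
      size: 3

Finally, we can apply the CR file using the following command: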

oc apply -f deploy/cr.yaml

As of version 1.6 of the Operator, if you stick with all the defaults and your OpenShift cluster has sufficient memory and CPU, this will deploy a one-shard sharded MongoDB cluster to your OpenShift cluster.

To verify your cluster is up and running, run the following command:

oc get pods

Depending on the resources for your OpenShift cluster, after some time, your results should look similar to this:

mgrayson@Michaels-MBP percona-server-mongodb-operator % oc get pods
NAME                                              READY     STATUS    RESTARTS   AGE
my-cluster-name-cfg-0                             2/2       Running   0          51s
my-cluster-name-cfg-1                             2/2       Running   0          30s
my-cluster-name-cfg-2                             2/2       Running   0          15s
my-cluster-name-mongos-6d84cb48c9-c6hxv           1/1       Running   0          48s
my-cluster-name-mongos-6d84cb48c9-hhrx7           1/1       Running   0          48s
my-cluster-name-mongos-6d84cb48c9-z6j5z           1/1       Running   0          48s
my-cluster-name-rs0-0                             2/2       Running   0          53s
my-cluster-name-rs0-1                             2/2       Running   0          36s
my-cluster-name-rs0-2                             2/2       Running   0          21s
percona-server-mongodb-operator-885dcbd4f-6nc7c   1/1       Running   0          1m

Connecting to Your Cluster

So, now that we have a working cluster, how do we connect to it?  The cluster is not accessible from outside the Kubernetes cluster by default, so we need to spin up a MongoDB client container inside it.  You can use the following command to create a container with a MongoDB client:

oc run -i --rm --tty percona-client --image=percona/percona-server-mongodb:4.2.11-12 --restart=Never -- bash -il

Once you are in the container you can run the following command to connect to your cluster, substituting the username and password as appropriate:

mongo "mongodb://userAdmin:userAdmin123456@my-cluster-name-mongos.psmdb.svc.cluster.local/admin?ssl=false"

Summary

We hope this has been helpful in getting you started using our Percona Kubernetes Operator for Percona Server for MongoDB on RedHat’s OpenShift Container Platform.   Thanks for reading!
