Apr 09, 2021

Deploying a MongoDB Proof of Concept on Google Cloud Platform

Recently, I needed to set up a Proof of Concept (POC) and wanted to do it on Google Cloud Platform (GCP).  After documenting the process, it seemed it might be helpful for others looking for the most basic guide possible to get a Mongo server up and running on GCP.  The process below will set up the latest version of Percona Server for MongoDB on a Virtual Machine (VM) in GCP.  This will be a minimal install on which to do further work.  I will also be utilizing the free account on GCP to do this.

The first step will be setting up your SSH access to the node.  On my Mac, I ran the following command which should work equally well on Linux:

ssh-keygen -t rsa -f ~/.ssh/gcp -C [USERNAME]

I named my key “gcp” in the example above but you can use an existing key or generate a new one with whatever name you want.

From there, you will want to log in to the GCP console in a browser and do some simple configuration.  The first step will be to create a project and then add an instance.  You will also choose a Region and Zone.  For our final basic configuration of our VM, choose the type of machine you want.  For my testing, an e2-medium is sufficient.  I will also accept the default disk size and type.


Next, edit the instance details and go to the SSH Keys section and add your SSH key.  Your key will be a lot longer but will look something like the below:
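
As a rough illustration (the key below is a made-up placeholder, not a real key), the public key entry you paste into the SSH Keys section has this general shape:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQ... [USERNAME]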

Save out the details and take note of the public IP of the node.  Of course, you will want to test logging in using your key to ensure you can get into the server.  I tested my access with the below command, replacing your key name (gcp in my case), username, and public IP:

ssh -i ~/.ssh/gcp [USERNAME]@[PUBLIC IP]

Our next step will be to install Percona Server for MongoDB.  We will do this as painlessly as possible using Percona’s RPMs.  We will start by setting up the repo:

sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
sudo percona-release enable psmdb-44 release

With the repo configured, we will install MongoDB with the following command:

sudo yum install percona-server-mongodb

You will likely want to enable the service:

sudo systemctl enable mongod
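
If you want to start the service right away and confirm it is up (a quick sanity check; the exact output will vary), you can also run:

sudo systemctl start mongod
sudo systemctl status mongod
mongo --eval 'db.version()'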

By default, MongoDB does not enable authentication to access it.  If you want to do this, you can use the following command to set up access:

sudo /usr/bin/percona-server-mongodb-enable-auth.sh

Here’s more information on enabling authentication on Percona Server for MongoDB.

Again, this is the most basic installation of Percona Server for MongoDB on the Google Cloud Platform.  This guide was created for those looking for a basic introduction to both platforms who just want to get their proverbial hands dirty with a basic POC.


Our Percona Distribution for MongoDB is the only truly open-source solution powerful enough for enterprise applications. It’s free to use, so try it today!

Apr 07, 2021

Percona Kubernetes Operators and Azure Blob Storage


Percona Kubernetes Operators allow users to simplify deployment and management of MongoDB and MySQL databases on Kubernetes. Both operators allow users to store backups on S3-compatible storage and leverage Percona XtraBackup and Percona Backup for MongoDB to deliver backup and restore functionality. However, neither backup tool works with Azure Blob Storage, as it is not compatible with the S3 protocol.

This blog post explains how to run Percona Kubernetes Operators along with MinIO Gateway on Azure Kubernetes Service (AKS) and store backups on Azure Blob Storage.


Setup

Prerequisites:

  • Azure account
  • Azure Blob Storage account and container (the Bucket in AWS terms)
  • Cluster deployed with Azure Kubernetes Service (AKS)

Deploy MinIO Gateway

I have prepared manifests to deploy the MinIO Gateway to Kubernetes; you can find them in the GitHub repo here.

First, create a separate namespace:

kubectl create namespace minio-gw

Create the secret which contains credentials for Azure Blob Storage:

$ cat minio-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: minio-secret
stringData:
  AZURE_ACCOUNT_NAME: Azure_account_name
  AZURE_ACCOUNT_KEY: Azure_account_key

$ kubectl -n minio-gw apply -f minio-secret.yaml

Apply minio-gateway.yaml from the repository. This manifest does two things:

  1. Creates MinIO Pod backed by Deployment object
  2. Exposes this Pod on port 9000 as a ClusterIP through a Service object
$ kubectl -n minio-gw apply -f blog-data/operators-azure-blob/minio-gateway.yaml
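
Before moving on, it may be worth confirming that the gateway Pod and Service are up; the exact object names depend on the manifest, so adjust if yours differ:

$ kubectl -n minio-gw get pods,svc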

It is also possible to use Helm Charts and deploy the Gateway with the MinIO Operator. You can read more about it here. Running the MinIO Operator might be a good choice, but it is overkill for this blog post.

Deploy PXC

Get the code from Github:

git clone -b v1.7.0 https://github.com/percona/percona-xtradb-cluster-operator

Deploy the bundle with Custom Resource Definitions:

cd percona-xtradb-cluster-operator 
kubectl apply -f deploy/bundle.yaml

Create the Secret object for backup. You should use the same Azure Account Name and Key that you used to set up MinIO:

$ cat deploy/backup-s3.yaml
apiVersion: v1
kind: Secret
metadata:
  name: azure-backup
type: Opaque
data:
  AWS_ACCESS_KEY_ID: BASE64_ENCODED_AZURE_ACCOUNT_NAME
  AWS_SECRET_ACCESS_KEY: BASE64_ENCODED_AZURE_ACCOUNT_KEY
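
The values in the data section must be base64-encoded. As a sketch (with placeholder values, not real credentials), you can generate them and apply the Secret like this:

$ echo -n 'my_azure_account_name' | base64
$ echo -n 'my_azure_account_key' | base64
$ kubectl apply -f deploy/backup-s3.yaml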

Add the storage configuration into cr.yaml under spec.backup.storages.

storages:
  azure-minio:
    type: s3
    s3:
      bucket: test-azure-container
      credentialsSecret: azure-backup
      endpointUrl: http://minio-gateway-svc.minio-gw:9000

  • bucket is the container created on Azure Blob Storage.
  • endpointUrl must point to the MinIO Gateway service that was created in the previous section.

Deploy the database cluster:

$ kubectl apply -f deploy/cr.yaml
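
It can take a few minutes for all Pods to become ready. A quick way to follow the progress (object names will match your Custom Resource) is:

$ kubectl get pxc
$ kubectl get pods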

Read more about the installation of the Percona XtraDB Cluster Operator in our documentation.

Take Backups and Restore

To take the backup or restore, follow the regular approach by creating the corresponding pxc-backup or pxc-restore Custom Resources in Kubernetes. For example, to take the backup I use the following manifest:

$ cat deploy/backup/backup.yaml
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  name: backup1
spec:
  pxcCluster: cluster1
  storageName: azure-minio

This creates the pxc-backup Custom Resource object, and the Operator uploads the backup to the Container in my Storage account.
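
For completeness, a sketch of applying the manifest and checking the state of the backup object (backup1 is the name from the manifest above):

$ kubectl apply -f deploy/backup/backup.yaml
$ kubectl get pxc-backup backup1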

Read more about backup and restore functionality in the Percona Kubernetes Operator for Percona XtraDB Cluster documentation.

Conclusion

Even though Azure Blob Storage is not S3-compatible, the Cloud Native landscape provides production-ready tools for seamless integration. MinIO Gateway works for both Percona Kubernetes Operators, for MySQL and MongoDB, enabling S3-like backup and restore functionality.

The Percona team is committed to delivering smooth integration of its software products with all major clouds. Adding support for Azure Blob Storage is on the roadmap for Percona XtraBackup and Percona Backup for MongoDB, as is certification on Azure Kubernetes Service for both operators.

Mar 30, 2021

What’s Running in My DB? A Journey with currentOp() in MongoDB


I have been working for a while with customers, supporting both MongoDB and MySQL technologies. Most of the time when an issue arises, customers working with MySQL collect most of the information about what is happening on the DB server, including all the queries running at that particular time, using “show full processlist;”. This information helps us look at the problem, like which queries are taking time and where they are spending it.

But for MongoDB, most of the time we don’t receive this (in-progress operations) information. Instead, we have to check the long queries logged in the MongoDB log file which, of course, records most of the details like planSummary (whether it used an index or not), documents/index keys scanned, time to complete, etc. It’s like doing a postmortem rather than checking the issue as it happens in real time. Collecting information about the operations taking time, or finding a problematic query while the issue is happening, can help you find the right one to kill (to release the pressure) or check the situation of the database.

The in-progress operations in MongoDB can be checked via the database command currentOp(). The level of information can be controlled via the options passed to it. Most of the time, the raw output is not that easy to check because it contains a lot of information, making it difficult to spot the operations we need. However, MongoDB knows this and, over multiple versions, has added many options to easily filter the operations returned by currentOp. Some of the information regarding this is mentioned in the below release notes:

https://docs.mongodb.com/manual/release-notes/3.6/#new-aggregation-stages 

https://docs.mongodb.com/manual/release-notes/4.0/#id26

https://docs.mongodb.com/manual/release-notes/4.2/#currentop 

In this blog, I will share some tricks for working with this command and fetching the operations that we need to check. This will help you check the ongoing operations and, if necessary, kill the problematic command.

Introduction

The database command currentOp provides information about the ongoing/currently running operations in the database. It must be run against the admin database. On servers running with authorization, you need the inprog privilege action to view operations for all users. This is included in the built-in clusterMonitor role.

Use Cases

The command to see all the active connections:

db.currentOp()

A user without the inprog privilege can still view their own operations with the below command:

db.currentOp( { "$ownOps": true } )

To also see background connections and idle connections, you can use either of the below commands:

db.currentOp(true)
db.currentOp( { "$all": true } )

As I said before, you can use filters here to check the operations you need, like a command running for more than a few seconds, waiting for a lock, active/inactive connections, running on a particular namespace, etc. Let’s see some examples from my test environment.

The below command provides information about all active connections. 

mongos> db.currentOp()
{
	"inprog" : [
		{
			"shard" : "shard01",
			"host" : "bm-support01.bm.int.percona.com:54012",
			"desc" : "conn52",
			"connectionId" : 52,
			"client_s" : "127.0.0.1:53338",
			"appName" : "MongoDB Shell",
			"clientMetadata" : {
				"application" : {
					"name" : "MongoDB Shell"
				},
				"driver" : {
					"name" : "MongoDB Internal Client",
					"version" : "4.0.19-12"
				},
				"os" : {
					"type" : "Linux",
					"name" : "CentOS Linux release 7.9.2009 (Core)",
					"architecture" : "x86_64",
					"version" : "Kernel 5.10.13-1.el7.elrepo.x86_64"
				},
				"mongos" : {
					"host" : "bm-support01.bm.int.percona.com:54010",
					"client" : "127.0.0.1:36018",
					"version" : "4.0.19-12"
				}
			},
			"active" : true,
			"currentOpTime" : "2021-03-21T23:41:48.206-0400",
			"opid" : "shard01:1404",
			"lsid" : {
				"id" : UUID("6bd7549b-0c89-40b5-b59f-af765199bbcf"),
				"uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
			},
			"secs_running" : NumberLong(0),
			"microsecs_running" : NumberLong(180),
			"op" : "getmore",
			"ns" : "admin.$cmd",
			"command" : {
				"getMore" : NumberLong("8620961729688473960"),
				"collection" : "$cmd.aggregate",
				"batchSize" : NumberLong(101),
				"lsid" : {
					"id" : UUID("6bd7549b-0c89-40b5-b59f-af765199bbcf"),
					"uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
				},
				"$clusterTime" : {
					"clusterTime" : Timestamp(1616384506, 2),
					"signature" : {
						"hash" : BinData(0,"z/r5Z/DxrxaeH1VIKOzeok06YxY="),
						"keyId" : NumberLong("6942317981145759774")
					}
				},
				"$client" : {
					"application" : {
						"name" : "MongoDB Shell"
					},
					"driver" : {
						"name" : "MongoDB Internal Client",
						"version" : "4.0.19-12"
					},
					"os" : {
						"type" : "Linux",
						"name" : "CentOS Linux release 7.9.2009 (Core)",
						"architecture" : "x86_64",
						"version" : "Kernel 5.10.13-1.el7.elrepo.x86_64"
					},
					"mongos" : {
						"host" : "bm-support01.bm.int.percona.com:54010",
						"client" : "127.0.0.1:36018",
						"version" : "4.0.19-12"
					}
				},
				"$configServerState" : {
					"opTime" : {
						"ts" : Timestamp(1616384506, 2),
						"t" : NumberLong(1)
					}
				},
				"$db" : "admin"
			},
			"originatingCommand" : {
				"aggregate" : 1,
				"pipeline" : [
					{
						"$currentOp" : {
							"allUsers" : true,
							"truncateOps" : true
						}
					},
					{
						"$sort" : {
							"shard" : 1
						}
					}
				],
				"fromMongos" : true,
				"needsMerge" : true,
				"mergeByPBRT" : false,
				"cursor" : {
					"batchSize" : 0
				},
				"allowImplicitCollectionCreation" : true,
				"lsid" : {
					"id" : UUID("6bd7549b-0c89-40b5-b59f-af765199bbcf"),
					"uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
				},
				"$clusterTime" : {
					"clusterTime" : Timestamp(1616384506, 2),
					"signature" : {
						"hash" : BinData(0,"z/r5Z/DxrxaeH1VIKOzeok06YxY="),
						"keyId" : NumberLong("6942317981145759774")
					}
				},
				"$client" : {
					"application" : {
						"name" : "MongoDB Shell"
					},
					"driver" : {
						"name" : "MongoDB Internal Client",
						"version" : "4.0.19-12"
					},
					"os" : {
						"type" : "Linux",
						"name" : "CentOS Linux release 7.9.2009 (Core)",
						"architecture" : "x86_64",
						"version" : "Kernel 5.10.13-1.el7.elrepo.x86_64"
					},
					"mongos" : {
						"host" : "bm-support01.bm.int.percona.com:54010",
						"client" : "127.0.0.1:36018",
						"version" : "4.0.19-12"
					}
				},
				"$configServerState" : {
					"opTime" : {
						"ts" : Timestamp(1616384506, 2),
						"t" : NumberLong(1)
					}
				},
				"$db" : "admin"
			},
			"numYields" : 0,
			"locks" : {
				
			},
			"waitingForLock" : false,
			"lockStats" : {
				
			}
		},
		{
			"shard" : "shard01",
			"host" : "bm-support01.bm.int.percona.com:54012",
			"desc" : "monitoring keys for HMAC",
…
...

Some of the important parameters that we may need to focus on from the output are as follows. I provide this information here as we will use these parameters to filter for the operations that we need.

PARAMETER: DESCRIPTION
host: the host the operation is running on
opid: the operation id (used to kill that operation)
active: the connection’s status; true if it is running and false if it is idle
client: host/IP information about where the operation originated
clientMetadata: more information about the client connection
shard: which shard the operation is connected to, in a sharded cluster environment
appName: information about the type of client
currentOpTime: start time of the operation
ns: namespace (details about the DB and collection)
command: a document with the full command object associated with the operation
secs_running / microsecs_running: how many seconds/microseconds the particular operation has been running
op: operation type, like insert, update, find, delete, etc.
planSummary: whether the command uses an index (IXSCAN) or a collection scan (COLLSCAN, disk read)
cursor: cursor information for getmore operations
locks: type and mode of the lock; see here for more details
waitingForLock: true if the operation is waiting for a lock, false if it has the required lock
msg: a message that describes the status and progress of the operation
killPending: whether the operation is currently flagged for termination
numYields: a counter that reports the number of times the operation has yielded to allow other operations to run

The raw currentOp output can be processed with the JavaScript forEach method in the mongo shell, so we can use it for many operations. For example, if I want to count the output, i.e. the number of active connections, I can use the below one:

mongos> var c=1;
mongos> db.currentOp().inprog.forEach(
... function(doc){
...   c=c+1
... }
... )
mongos> print("The total number of active connections are: "+c)
The total number of active connections are: 16

To find the number of active and inactive connections:

mongos> var active=1; var inactive=1;
mongos> db.currentOp(true).inprog.forEach( function(doc){  if(doc.active){    active=active+1 }  else if(!doc.active){    inactive=inactive+1 }  } )
mongos> print("The number of active connections are: "+active+"\nThe number of inactive connections are: "+inactive)
The number of active connections are: 16
The number of inactive connections are: 118

To find the operations (from an import job) running for more than 1000 microseconds (for seconds, use secs_running) and against the specific namespace vinodh.testColl:

mongos> db.currentOp(true).inprog.forEach( function(doc){ if(doc.microsecs_running>1000 && doc.ns == "vinodh.testColl")  {print("\nop: "+doc.op+", namespace: "+doc.ns+", \ncommand: ");printjson(doc.command)} } )

op: insert, namespace: vinodh.testColl, 
command: 
{
  "$truncated" : "{ insert: \"testColl\", bypassDocumentValidation: false, ordered: false, documents: [ { _id: ObjectId('605a1ab05c15f7d2046d5d26'), id: 49004, name: \"Vernon Drake\", age: 19, emails: [ \"fetome@liek.gh\", \"noddo@ve.kh\", \"wunin@cu.ci\" ], born_in: \"1973\", ip_addresses: [ \"212.199.110.72\" ], blob: BinData(0, 4736735553674F6E6825) }, { _id: ObjectId('605a1ab05c15f7d2046d5d27'), id: 49003, name: \"Rhoda Burke\", age: 64, emails: [ \"zog@borvelaj.pa\", \"hoz@ni.do\", \"abfad@borup.cl\" ], born_in: \"1976\", ip_addresses: [ \"12.190.161.2\", \"16.63.87.211\" ], blob: BinData(0, 244C586A683244744F54) }, { _id: ObjectId('605a1ab05c15f7d2046d5d28'), id: 49002, name: \"Alberta Mack\", age: 25, emails: [ \"sibef@nuvaki.sn\", \"erusu@dimpu.ag\", \"miumurup@se.ir\" ], born_in: \"1971\", ip_addresses: [ \"250.239.181.203\", \"192.240.119.122\", \"196.13.33.240\" ], blob: BinData(0, 7A63566B42732659236D) }, { _id: ObjectId('605a1ab05c15f7d2046d5d29'), id: 49005, name: \"Minnie Chapman\", age: 33, emails: [ \"jirgenor@esevepu.edu\", \"jo@m..."
}

This filter can also be written directly, without forEach, as follows:

mongos> db.currentOp({ "active": true, "microsecs_running": {$gt: 1000}, "ns": /^vinodh.testColl/ })
{
  "inprog" : [
    {
      "shard" : "shard01",
      "host" : "bm-support01.bm.int.percona.com:54012",
      "desc" : "conn268",
      "connectionId" : 268,
      "client_s" : "127.0.0.1:55480",
      "active" : true,
      "currentOpTime" : "2021-03-23T13:05:32.550-0400",
      "opid" : "shard01:689582",
      "secs_running" : NumberLong(0),
      "microsecs_running" : NumberLong(44996),
      "op" : "insert",
      "ns" : "vinodh.testColl",
      "command" : {
        "$truncated" : "{ insert: \"testColl\", bypassDocumentValidation: false, ordered: false, documents: [ { _id: ObjectId('605a1fdc5c15f7d2047ee04e'), id: 16002, name: \"Linnie Walsh\", age: 25, emails: [ \"evoludecu@logejvi.ai\", \"ilahubfep@ud.mc\", \"siujo@pipazvo.ht\" ], born_in: \"1982\", ip_addresses: [ \"198.117.218.117\" ], blob: BinData(0, 244A6E702A5047405149) }, { _id: ObjectId('605a1fdc5c15f7d2047ee04f'), id: 16004, name: \"Larry Watts\", age: 47, emails: [ \"sa@hulub.gy\", \"wepo@ruvnuhej.om\", \"jorvohki@nobajmo.hr\" ], born_in: \"1989\", ip_addresses: [], blob: BinData(0, 50507461366B6F766C40) }, { _id: ObjectId('605a1fdc5c15f7d2047ee050'), id: 16003, name: \"Alejandro Jacobs\", age: 61, emails: [ \"enijaze@hihen.et\", \"gekesaco@kockod.fk\", \"rohovus@il.az\" ], born_in: \"1988\", ip_addresses: [ \"239.139.123.44\", \"168.34.26.236\", \"123.230.33.251\", \"132.222.43.251\" ], blob: BinData(0, 32213574705938385077) }, { _id: ObjectId('605a1fdc5c15f7d2047ee051'), id: 16005, name: \"Mildred French\", age: 20, emails: [ \"totfi@su.mn\"..."
      },
      "numYields" : 0,
      "locks" : {
        
      },
      "waitingForLock" : false,
      "lockStats" : {
        "Global" : {
          "acquireCount" : {
            "r" : NumberLong(16),
            "w" : NumberLong(16)
          }
        },
        "Database" : {
          "acquireCount" : {
            "w" : NumberLong(16)
          }
…

The operations waiting for a lock on a specific namespace (ns) / operation (op) can be filtered as follows, and you can alter the parameters to filter as you wish:

db.currentOp(
   {
     "waitingForLock" : true,
     "ns": /^vinodh.testColl/,
     $or: [
        { "op" : { "$in" : [ "insert", "update", "remove" ] } },
        { "command.findandmodify": { $exists: true } }
     ]
   }
)
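
If you then want to act on the matching operations, here is a minimal sketch (not something to run blindly in production) that iterates over the filtered result and kills each operation by its opid:

db.currentOp( { "waitingForLock": true, "ns": /^vinodh.testColl/ } ).inprog.forEach(
   function(op) {
      // review each operation before killing it
      print("killing opid: " + op.opid);
      db.killOp(op.opid);
   }
)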

Aggregate – currentOp():

Starting with MongoDB 3.6, the $currentOp stage is supported in the aggregation pipeline, so checking currentOp is even easier with this method. Also, the aggregation pipeline is not subject to the 16MB result size limit. The usage is:

{ $currentOp: { allUsers: <boolean>, idleConnections: <boolean>, idleCursors: <boolean>, idleSessions: <boolean>, localOps: <boolean> } }

Note:

Options/features added to $currentOp, by version:

  • allUsers, idleConnections – available from 3.6
  • idleSessions, localOps – available from 4.0
  • idleCursors – available from 4.2

Let’s see an example of the same. Count all connections, including idle connections, on shard02:

mongos> db.aggregate( [ { $currentOp : { allUsers: true, idleConnections: true } },    
... { $match : { shard: "shard02" }}, {$group: {_id:"shard02", count: {$sum: 1}} } ] )
{ "_id" : "shard02", "count" : 65 }

Now, using the same import job, we can find the operation as follows:

mongos> db.aggregate( [    { $currentOp : { allUsers: true, idleConnections: false } },    
... { $match : { "ns": "vinodh.testColl" }} ] )
{ "shard" : "shard01", "host" : "bm-support01.bm.int.percona.com:54012", "desc" : "conn279", "connectionId" : 279, "client_s" : "127.0.0.1:38564", "active" : true, "currentOpTime" : "2021-03-23T13:33:57.225-0400", "opid" : "shard01:722388", "secs_running" : NumberLong(0), "microsecs_running" : NumberLong(24668), "op" : "insert", "ns" : "vinodh.testColl", "command" : { "insert" : "testColl", "bypassDocumentValidation" : false, "ordered" : false, "documents" : [ { "_id" : ObjectId("605a26855c15f7d20484d217"), "id" : 12020, "name" : "Dora Watson",....tId("000000000000000000000000") ], "writeConcern" : { "getLastError" : 1, "w" : "majority" }, "allowImplicitCollectionCreation" : false, "$clusterTime" : { "clusterTime" : Timestamp(1616520837, 1000), "signature" : { "hash" : BinData(0,"yze8dSs12MUKlnb7rpw5h2YblFI="), "keyId" : NumberLong("6942317981145759774") } }, "$configServerState" : { "opTime" : { "ts" : Timestamp(1616520835, 10), "t" : NumberLong(2) } }, "$db" : "vinodh" }, "numYields" : 0, "locks" : { "Global" : "w", "Database" : "w", "Collection" : "w" }, "waitingForLock" : false, "lockStats" : { "Global" : { "acquireCount" : { "r" : NumberLong(8), "w" : NumberLong(8) } }, "Database" : { "acquireCount" : { "w" : NumberLong(8) } }, "Collection" : { "acquireCount" : { "w" : NumberLong(8) } } } }

To reduce the output and project only some fields in the output:

mongos> db.aggregate( [    
... { $currentOp : { allUsers: true, idleConnections: false } },    
... { $match : { ns: "vinodh.testColl", microsecs_running: {$gt: 10000} }}, 
... {$project: { _id:0, host:1, opid:1, secs_running: 1, op:1, ns:1, waitingForLock: 1, numYields: 1  } } ] )
{ "host" : "bm-support01.bm.int.percona.com:54012", "opid" : "shard01:777387", "secs_running" : NumberLong(0), "op" : "insert", "ns" : "vinodh.testColl", "numYields" : 0, "waitingForLock" : false }

To see the output in a nicer format, use .pretty():

mongos> db.aggregate( [    { $currentOp : { allUsers: true, idleConnections: false } },    { $match : { ns: "vinodh.testColl", microsecs_running: {$gt: 10000} }}, {$project: { _id:0, host:1, opid:1, secs_running: 1, op:1, ns:1, waitingForLock: 1, numYields: 1  } } ] ).pretty()
{
	"host" : "bm-support01.bm.int.percona.com:54012",
	"opid" : "shard01:801285",
	"secs_running" : NumberLong(0),
	"op" : "insert",
	"ns" : "vinodh.testColl",
	"numYields" : 0,
	"waitingForLock" : false
}

I hope you now have some idea of how to use currentOp() to check the ongoing operations.

Let’s imagine you want to kill an operation that has been running for a long time. From the same currentOp document you identified it with, you can take the opid and kill it using the killOp() method. In the example below, I used a sharded environment, so the opid is in a “shard_no:opid” format. See here for more details.

mongos> db.aggregate( [    { $currentOp : { allUsers: true, idleConnections: false } },    { $match : { ns: "vinodh.testColl" }}, {$project: { _id:0, host:1, opid:1, microsecs_running: 1, op:1, ns:1, waitingForLock: 1, numYields: 1  } } ] ).pretty()
{
	"host" : "bm-support01.bm.int.percona.com:54012",
	"opid" : "shard01:1355440",
	"microsecs_running" : NumberLong(39200),
	"op" : "insert",
	"ns" : "vinodh.testColl",
	"numYields" : 0,
	"waitingForLock" : false
}


mongos> db.killOp("shard01:1355440")
{
	"shard" : "shard01",
	"shardid" : 1355440,
	"ok" : 1,
	"operationTime" : Timestamp(1616525284, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1616525284, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}

Conclusion

So the next time you want to check the ongoing operations, you can use these techniques for filtering operations waiting for a lock, running on a namespace, running for more than a specified time, of a specific operation type, or on a specific shard, etc. Also, comment here if you have any other ideas on this topic. I am happy to learn from those as well.


Percona Distribution for MongoDB is the only truly open-source solution powerful enough for enterprise applications.

It’s free to use, so try it today!

Mar 22, 2021

Storing Kubernetes Operator for Percona Server for MongoDB Secrets in Github


More and more companies are adopting GitOps as the way of implementing Continuous Deployment. Its elegant approach, built upon a well-known tool, wins the hearts of engineers. But even if your git repository is private, it’s strongly discouraged to store keys and passwords in unencrypted form.

This blog post will show how easy it is to use GitOps and keep Kubernetes secrets for Percona Kubernetes Operator for Percona Server for MongoDB securely in the repository with Sealed Secrets or Vault Secrets Operator.

Sealed Secrets

Prerequisites:

  • Kubernetes cluster up and running
  • Github repository (optional)

Install Sealed Secrets Controller

Sealed Secrets rely on asymmetric cryptography (which is also used in TLS), where the private key (which in our case is stored in Kubernetes) can decrypt the message encrypted with the public key (which can be stored in a public git repository safely). To make this task easier, Sealed Secrets provides the kubeseal tool, which helps with the encryption of the secrets.

Install the Sealed Secrets controller into your Kubernetes cluster:

kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.15.0/controller.yaml

It will install the controller into the kube-system namespace and provide the Custom Resource Definition sealedsecrets.bitnami.com. All resources in Kubernetes with kind: SealedSecret will be handled by this Operator.

Download the kubeseal binary:

wget https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.15.0/kubeseal-linux-amd64 -O kubeseal
sudo install -m 755 kubeseal /usr/local/bin/kubeseal

Encrypt the Keys

In this example, I intend to store important secrets of the Percona Kubernetes Operator for Percona Server for MongoDB in git along with my manifests that are used to deploy the database.

First, I will seal the secret file with system users, which is used by the MongoDB Operator to manage the database. Normally it is stored in deploy/secrets.yaml.

kubeseal --format yaml < secrets.yaml  > blog-data/sealed-secrets/mongod-secrets.yaml

This command creates a file with encrypted contents; you can see it in the blog-data/sealed-secrets repository here. It is safe to store it publicly as it can only be decrypted with the private key.
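
For illustration, the sealed file is a regular Kubernetes manifest with the secret values replaced by ciphertext; it looks roughly like this (the encryptedData values below are placeholders):

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: my-secure-secret
spec:
  encryptedData:
    MONGODB_USER_ADMIN_USER: AgB4f...
    MONGODB_USER_ADMIN_PASSWORD: AgC9k...
  template:
    metadata:
      name: my-secure-secret
    type: Opaque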

Executing kubectl apply -f blog-data/sealed-secrets/mongod-secrets.yaml does the following:

  1. A sealedsecrets custom resource (CR) is created. You can see it by executing kubectl get sealedsecrets.
  2. The Sealed Secrets Operator receives the event that a new sealedsecrets CR is there and decrypts it with the private key.
  3. Once decrypted, a regular Secrets object is created which can be used as usual.

$ kubectl get sealedsecrets
NAME               AGE
my-secure-secret   20m

$ kubectl get secrets my-secure-secret
NAME               TYPE     DATA   AGE
my-secure-secret   Opaque   10     20m

Next, I will also seal the keys for my S3 bucket that I plan to use to store backups of my MongoDB database:

kubeseal --format yaml < backup-s3.yaml  > blog-data/sealed-secrets/s3-secrets.yaml
kubectl apply -f blog-data/sealed-secrets/s3-secrets.yaml

Vault Secrets Operator

Sealed Secrets is the simplest approach, but it is possible to achieve the same result with HashiCorp Vault and Vault Secrets Operator. It is a more advanced, mature, and feature-rich approach.

Prerequisites:

Vault Secrets Operator also relies on a Custom Resource, but all the keys are stored in HashiCorp Vault.

Preparation

Create a policy on the Vault for the Operator:

cat <<EOF | vault policy write vault-secrets-operator -
path "kvv2/data/*" {
  capabilities = ["read"]
}
EOF

The policy might look a bit different, depending on where your secrets are stored.
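
If the secret does not exist in Vault yet, a minimal sketch of creating it (assuming the KV version 2 engine is mounted at the kvv2 path used in the policy above, and using placeholder values) could be:

vault secrets enable -path=kvv2 kv-v2
vault kv put kvv2/mongod-secret \
  MONGODB_USER_ADMIN_USER=userAdmin \
  MONGODB_USER_ADMIN_PASSWORD=userAdmin123456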

Create and fetch the token for the policy:

$ vault token create -period=24h -policy=vault-secrets-operator

Key                  Value                                                                                                                                                                                        
---                  -----                                                                                               
token                s.0yJZfCsjFq75GiVyKiZgYVOm
...

Write down the token, as you will need it in the next step.

Create the Kubernetes Secret so that the Operator can authenticate with the Vault:

export VAULT_TOKEN=s.0yJZfCsjFq75GiVyKiZgYVOm
export VAULT_TOKEN_LEASE_DURATION=86400

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: vault-secrets-operator
type: Opaque
data:
  VAULT_TOKEN: $(echo -n "$VAULT_TOKEN" | base64)
  VAULT_TOKEN_LEASE_DURATION: $(echo -n "$VAULT_TOKEN_LEASE_DURATION" | base64)
EOF

Deploy Vault Secrets Operator

It is recommended to deploy the Operator with Helm, but first we need to create a values.yaml file to configure the operator.

environmentVars:
  - name: VAULT_TOKEN
    valueFrom:
      secretKeyRef:
        name: vault-secrets-operator
        key: VAULT_TOKEN
  - name: VAULT_TOKEN_LEASE_DURATION
    valueFrom:
      secretKeyRef:
        name: vault-secrets-operator
        key: VAULT_TOKEN_LEASE_DURATION
vault:
  address: "http://vault.vault.svc:8200"

The environment variables point to the Secret that was created in the previous section to authenticate with Vault. We also need to provide the Vault address for the Operator to retrieve the secrets.

Now we can deploy the Vault Secrets Operator:

helm repo add ricoberger https://ricoberger.github.io/helm-charts
helm repo update

helm upgrade --install vault-secrets-operator ricoberger/vault-secrets-operator -f blog-data/sealed-secrets/values.yaml

Give me the Secret

I have a key created in my HashiCorp Vault:

$ vault kv get kvv2/mongod-secret
…
Key                                 Value
---                                 -----                                                                                                                                                                         
MONGODB_BACKUP_PASSWORD             <>
MONGODB_CLUSTER_ADMIN_PASSWORD      <>
MONGODB_CLUSTER_ADMIN_USER          <>
MONGODB_CLUSTER_MONITOR_PASSWORD    <>
MONGODB_CLUSTER_MONITOR_USER        <>                                                                                                                                                               
MONGODB_USER_ADMIN_PASSWORD         <>
MONGODB_USER_ADMIN_USER             <>

It is time to create the secret out of it. First, we will create the Custom Resource object of kind: VaultSecret.

$ cat blog-data/sealed-secrets/vs.yaml
apiVersion: ricoberger.de/v1alpha1
kind: VaultSecret
metadata:
  name: my-secure-secret
spec:
  path: kvv2/mongod-secret
  type: Opaque

$ kubectl apply -f blog-data/sealed-secrets/vs.yaml

The Operator will connect to HashiCorp Vault and create a regular Secret object automatically:

$ kubectl get vaultsecret
NAME               SUCCEEDED   REASON    MESSAGE              LAST TRANSITION   AGE
my-secure-secret   True        Created   Secret was created   47m               47m

$ kubectl get secret  my-secure-secret
NAME               TYPE     DATA   AGE
my-secure-secret   Opaque   7      47m

Deploy MongoDB Cluster

Now that the secrets are in place, it is time to deploy the Operator and the DB cluster:

kubectl apply -f blog-data/sealed-secrets/bundle.yaml
kubectl apply -f blog-data/sealed-secrets/cr.yaml

The cluster will be up in a minute or two and will use the secrets we deployed.

By the way, my cr.yaml deploys a MongoDB cluster with two shards. Multiple shards support was added in version 1.7.0 of the Operator – I encourage you to try it out. Learn more about it here: Percona Server for MongoDB Sharding.

Mar 22, 2021

Want MongoDB Performance? You Will Need to Add and Remove Indexes!


Good intentions can sometimes end up with bad results.  Adding indexes boosts performance until it doesn’t. Avoid over-indexing.

The difference between your application being fast, responsive, and scaling properly is often dependent on how you use indexes in the database.  MongoDB is no different; its performance (and the overall performance of your application) is heavily dependent on getting the proper amount of indexes on the right things.  A simple index or two can speed up getting data from MongoDB a million-fold for million-document collections.  But at the same time, having too many indexes on a large collection can lead to massive slowdowns in overall performance.  You need to get your indexes just right.

For this blog, we are going to talk about having too many indexes and help you find both duplicate and unused indexes.  If you are interested in finding out if you need additional indexes or if your query is using an index, I would suggest reading previous Percona articles on query tuning (Part 1 & Part 2 of that series).

So, indexes are very good for getting faster queries. How many indexes do I need to create on a collection? What are the best practices for the indexes? How do I find which indexes are being used or not?  What if I have duplicated indexes?

Common Performance Problems

After analyzing a lot of different MongoDB environments I can provide the following list summarizing the typical errors I have seen:

  • Not creating indexes at all, other than the primary key _id created by design.
    • I’m not joking – I have seen databases without any user-created indexes, whose owners were surprised the server was overloaded and/or the queries were very slow.
  • Over-indexing the collection.
    • Some developers create a lot of indexes without a specific reason, or just for testing a query, and then forget to drop them.
    • In some cases, the size of all the indexes was larger than the data. This is not good; indexes should be as small as possible to be really effective.

I’m not considering the first case. I’m going to discuss instead the second one.

How Many Indexes You Need in a Collection

It depends – that’s the right answer. Basically, it depends on your application workload. You should consider the following rules when indexing a collection:

  • Create as many indexes as possible for your application.
  • Don’t create a lot of indexes.

What? These rules are stating the opposite thing! Well, we can summarize in just one simple rule:

  • You need to create all the indexes your application really needs for solving the most frequent queries. Not one more, not one less.

That’s it.

Pros and Cons of Indexing

The big advantage of indexes is that they permit queries, updates, and deletes to run as fast as possible if they are used (every update or delete also needs to do a lookup step first). More indexes on a collection can benefit several queries.

Unfortunately, the indexes require some extra work for MongoDB. Any time you run a write, all the indexes must be updated. The new values are stored in or removed from the B-Tree structure, some splitting or merging is needed, and this takes time.

The main problem is that the more indexes you have on a collection, the slower all the writes will be.

A very large collection with just 10 or 15 indexes can suffer a significant performance loss on writes. Also, remember that indexes have to be copied into the WiredTiger cache. More indexes also imply more pressure on the memory cache. That pressure can then lead to more cache evictions and slowness.

A good example of this: when I was working with a customer a few weeks ago, we found 12 extra indexes on a collection that they did not need. The collection was around 80GB; the total index size was more than the data size. They had a relevant write load with frequent inserts and updates all the time. Cleaning up these indexes reduced their write query execution times by 25-30 percent on average. The improvement observed for this real case won’t be the same quantitative amount in other cases, but for sure the fewer indexes you have, the faster all the writes will be.

We need to find a balance: create the indexes you need, but not too many.

How to Reduce Over-Indexing

Very easy to say: drop all the indexes you don’t need.

There are two things you can do to identify the indexes to get dropped:

  • Check for the duplicates.
  • Check for the unused indexes.

For dropping an index you need to run something like the following:

db.mycollection.dropIndex("index_name")

Find Duplicate Indexes

A duplicate index is an index with the exact same definition as another index that already exists on the collection. Fortunately, MongoDB is able to check this and does not permit the creation of such an index.

Let’s do a test using a simple collection with no indexes.

rs_test:PRIMARY> db.test.find()
{ "_id" : ObjectId("60521309d7268c122c7cd630"), "name" : "corrado", "age" : 49 }
{ "_id" : ObjectId("60521313d7268c122c7cd631"), "name" : "simone", "age" : 12 }
{ "_id" : ObjectId("6052131cd7268c122c7cd632"), "name" : "gabriele", "age" : 19 }
{ "_id" : ObjectId("60521323d7268c122c7cd633"), "name" : "luca", "age" : 14 }
{ "_id" : ObjectId("60521328d7268c122c7cd634"), "name" : "lucia", "age" : 49 }

# create an index on name field
rs_test:PRIMARY> db.test.createIndex( { name: 1 } )
{
   "createdCollectionAutomatically" : false,
   "numIndexesBefore" : 1,
   "numIndexesAfter" : 2,
   "commitQuorum" : "votingMembers",
   "ok" : 1,
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615991942, 5),
      "signature" : {
         "hash" : BinData(0,"vQN6SGIL0fAMvTusJ12KgySqKOI="),
         "keyId" : NumberLong("6890926313742270469")
      }
   },
   "operationTime" : Timestamp(1615991942, 5)
}

# check indexes available
rs_test:PRIMARY> db.test.getIndexes()
[
   {
      "v" : 2,
      "key" : {
         "_id" : 1
      },
      "name" : "_id_"
   },
   {
      "v" : 2,
      "key" : {
         "name" : 1
      },
      "name" : "name_1"
   }
]

# try to create again the same index
rs_test:PRIMARY> db.test.createIndex( { name: 1 } )
{
   "numIndexesBefore" : 2,
   "numIndexesAfter" : 2,
   "note" : "all indexes already exist",
   "ok" : 1,
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615991942, 5),
      "signature" : {
         "hash" : BinData(0,"vQN6SGIL0fAMvTusJ12KgySqKOI="),
         "keyId" : NumberLong("6890926313742270469")
      }
   },
   "operationTime" : Timestamp(1615991942, 5)
}

# great, MongoDB can detect the index already exists

# let's try to see if you can create the same index with a different name
rs_test:PRIMARY> db.test.createIndex( { name: 1 }, { name: "this_is_a_different_index_name" } )
{
   "operationTime" : Timestamp(1615991981, 1),
   "ok" : 0,
   "errmsg" : "Index with name: this_is_a_different_index_name already exists with a different name",
   "code" : 85,
   "codeName" : "IndexOptionsConflict",
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615991981, 1),
      "signature" : {
         "hash" : BinData(0,"whkRyQQxyJVBt+7d3HOtFvYY32g="),
         "keyId" : NumberLong("6890926313742270469")
      }
   }
}

# even in this case MongoDB doesn't permit the index creation

MongoDB is then clever enough to avoid the creation of duplicate indexes. But what about the creation of an index that is the left-prefix of an existing index? Let’s test it.

# let's drop the previous index we have created
rs_test:PRIMARY> db.test.dropIndex( "name_1" )
{
   "nIndexesWas" : 2,
   "ok" : 1,
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615993029, 1),
      "signature" : {
         "hash" : BinData(0,"njFiuCeyA5VcdNOOP2ASboOpWwo="),
         "keyId" : NumberLong("6890926313742270469")  
      }
   },
   "operationTime" : Timestamp(1615993029, 1)
}

# check indexes. Only _id available
rs_test:PRIMARY> db.test.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]

# create a compound index 
rs_test:PRIMARY> db.test.createIndex( { name:1, age: 1 } )
{
   "createdCollectionAutomatically" : false,
   "numIndexesBefore" : 1,
   "numIndexesAfter" : 2,
   "commitQuorum" : "votingMembers",
   "ok" : 1,
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615993054, 5),
      "signature" : {
         "hash" : BinData(0,"gfaPsWsSM745opEiQORCt2L3HYo="),
         "keyId" : NumberLong("6890926313742270469")
      }
   },
   "operationTime" : Timestamp(1615993054, 5)
}

# create another index that is the leftmost prefix of the compound index
rs_test:PRIMARY> db.test.createIndex( { name:1 } )
{
   "createdCollectionAutomatically" : false,
   "numIndexesBefore" : 2,
   "numIndexesAfter" : 3,
   "commitQuorum" : "votingMembers",
   "ok" : 1,
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615993060, 5),
      "signature" : {
         "hash" : BinData(0,"C2XWVA5mi+WWyPMn3Jw2VHTw/Dk="),
         "keyId" : NumberLong("6890926313742270469")
      }
   },
   "operationTime" : Timestamp(1615993060, 5)
}

# check indexes
rs_test:PRIMARY> db.test.getIndexes()
[
   {
      "v" : 2,
      "key" : {
         "_id" : 1 
      },
      "name" : "_id_"
   },
   {
      "v" : 2,
      "key" : {
         "name" : 1,
         "age" : 1
      },
      "name" : "name_1_age_1"
   },
   {
      "v" : 2,
      "key" : {
         "name" : 1
      },
      "name" : "name_1"
   }
]

We consider a leftmost-prefix index as a duplicate as well.

To benefit from a compound index, MongoDB doesn’t need to use all the fields of that index; the leftmost prefix is enough. For example, an index on (A,B,C) can be used to satisfy the combinations (A), (A,B), and (A,B,C), but not (B) or (B,C). As a consequence, if I have two different indexes, one on (A,B,C) and another on (A,B), the second is a duplicate because the first can be used the same way for solving queries with the combinations (A,B) and (A).

Then, find all duplicate indexes and drop them since they’re useless. Just be aware and check that your application doesn’t use hint() on the indexes you’re going to drop.

In order to avoid manually checking all the collections to discover the duplicates, I provide here a JavaScript script for that:

var ldb = db.adminCommand( { listDatabases: 1 } );

for ( i = 0; i < ldb.databases.length; i++ ) {

   if ( ldb.databases[i].name != 'admin' && ldb.databases[i].name != 'config' && ldb.databases[i].name != 'local') {

      print('DATABASE ',ldb.databases[i].name);
      print("+++++++++++++++++++++++++++++++++++++++++")

      var db = db.getSiblingDB(ldb.databases[i].name); 
      var cpd = db.getCollectionNames();

      for ( j = 0; j < cpd.length; j++ ) { 

         if ( cpd[j] != 'system.profile' ) {

            var indexes = JSON.parse(JSON.stringify(db.runCommand( { listIndexes: cpd[j] } ).cursor.firstBatch));
            print("COLL :"+cpd[j]);

            for ( k = 0; k < indexes.length; k++ ) {

               indexes[k] = (((JSON.stringify(indexes[k].key)).replace("{","")).replace("}","")).replace(/,/g ,"_");

            }

            var founddup = false;

            for ( k1 = 0; k1 < indexes.length; k1++ ) {

               for ( k2 = 0; k2 < indexes.length; k2++ ) {

                  if ( k1 != k2 ) {

                     if (indexes[k1].startsWith(indexes[k2],0)) {

                        print("{ "+indexes[k2]+" } is the left prefix of { "+indexes[k1]+" } and should be dropped");

                        founddup = true;

                     }
                  }
               } 
            }

            if (!founddup) {

               print("no duplicate indexes found");

            }

            print("\n");

         } 
      }

      print("\n");
   } 
}

Note: this script is just an initial test and could be improved, but it should work in most cases.

Find Unused Indexes

MongoDB maintains internal statistics about index usage. Any time an index is used for solving a query, a specific counter is incremented. After running MongoDB for a significant amount of time – days or weeks – the statistics are reliable and we can find out which indexes have been used or not.

For looking at the index stats, MongoDB provides a stage in the aggregation pipeline: $indexStats

Here you can see an example:

rs_test:PRIMARY> db.restaurants.aggregate([ { $indexStats: {} } ]).pretty()
{
   "name" : "borough_1",
   "key" : {
      "borough" : 1
   },
   "host" : "ip-172-30-2-12:27017",
   "accesses" : {
      "ops" : NumberLong(312),
      "since" : ISODate("2021-03-17T13:48:51.305Z")
   },
   "spec" : {
      "v" : 2,
      "key" : {
         "borough" : 1
      },
      "name" : "borough_1"
   }
}
{
   "name" : "_id_",
   "key" : {
      "_id" : 1
   },
   "host" : "ip-172-30-2-12:27017",
   "accesses" : {
      "ops" : NumberLong(12),
      "since" : ISODate("2021-03-17T13:48:51.305Z")
   },
   "spec" : {
      "v" : 2,
      "key" : {
         "_id" : 1
      },
      "name" : "_id_"
   }
}
{
   "name" : "cuisine_1_borough_1",
   "key" : {
      "cuisine" : 1,
      "borough" : 1
   },
   "host" : "ip-172-30-2-12:27017",
   "accesses" : {
      "ops" : NumberLong(0),
      "since" : ISODate("2021-03-17T13:48:51.305Z")
   },
   "spec" : { 
      "v" : 2,
      "key" : {
         "cuisine" : 1,
         "borough" : 1
      },
      "name" : "cuisine_1_borough_1"
   }
}

The accesses.ops is the number of times the index has been used. In the example you can see the { borough:1 } has been used 312 times, the index { _id } 12 times, and the index { cuisine:1, borough: 1} 0 times. The last one could be dropped.

If the database is running for a long time with millions of queries executed and if an index was not used, most probably it won’t be used even in the future.

Then you should consider dropping the unused indexes in order to improve the writes, reduce the cache pressure, and save disk space as well.

Using the following script you can find out the index statistics for all the collections:

var ldb = db.adminCommand( { listDatabases: 1 } );

for ( i = 0; i < ldb.databases.length; i++ ) {

   print('DATABASE ', ldb.databases[i].name);

   if ( ldb.databases[i].name != 'admin' && ldb.databases[i].name != 'config' && ldb.databases[i].name != 'local' ) {

      var db = db.getSiblingDB(ldb.databases[i].name);
      var cpd = db.getCollectionNames();

      for ( j = 0; j < cpd.length; j++ ) {

         if ( cpd[j] != 'system.profile' ) {

            print(cpd[j]);

            var pui = db.runCommand({ aggregate: cpd[j], pipeline: [ { $indexStats: {} } ], cursor: { batchSize: 100 } });
            printjson(pui);

         }
      }

      print('\n\n');
   }
}

Look for the indexes having “ops”: NumberLong(0)
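
If you prefer to let the server do the filtering, a small sketch using the same $indexStats stage returns only the indexes that have never been used since the statistics started being collected:

db.restaurants.aggregate([
   { $indexStats: {} },
   { $match: { "accesses.ops": 0 } }
])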

Conclusion

Creating indexes for solving queries is a good habit, but be careful not to abuse indexing. Excessive indexing can lead to slower writes, excessive pressure on the memory cache, and more evictions.

You should consider maintaining your indexes from time to time, dropping all the duplicate and unused ones. The scripts provided in this article may help with your index analysis.

Mar 11, 2021

Webinar March 26: MongoDB Backups Overview


Join Percona Technical Expert Corrado Pandiani as he presents a quick yet robust comparison of the different backup solutions that can be used with MongoDB. This webinar will highlight:

– Terminology

– Elements of MongoDB Backups

– Backup & Restore Solution

– Performance and Impact Comparison

Please join Corrado Pandiani on Friday, March 26, 2021, at 3:00 AM EST/3:00 PM Singapore GMT+8 for his webinar MongoDB Backups Overview.

Register for Webinar

If you can’t attend, sign up anyway, and we’ll send you the slides and recording afterward.

Mar 10, 2021

A Peek at Percona Kubernetes Operator for Percona Server for MongoDB New Features


The latest 1.7.0 release of Percona Kubernetes Operator for Percona Server for MongoDB came out just recently and enables users to:

  • deploy sharded MongoDB clusters with multiple shards
  • add custom sidecar containers to the database Pods
  • clean up Persistent Volume Claims (PVCs) automatically after cluster deletion

Today we will look into these new features, the use cases, and highlight some architectural and technical decisions we made when implementing them.

Sharding

The 1.6.0 release of our Operator introduced single shard support, which we highlighted in this blog post and explained why it makes sense. But horizontal scaling is not possible without support for multiple shards.

Adding a Shard

A new shard is just a new ReplicaSet which can be added under spec.replsets in cr.yaml:

spec:
  ...
  replsets:
  - name: rs0
    size: 3
  ....
  - name: rs1
    size: 3
  ...

Read more on how to configure sharding.

In the Kubernetes world, a MongoDB ReplicaSet is a StatefulSet with the number of pods specified in the spec.replsets.[].size variable.

Once pods are up and running, the Operator does the following:

  • Initiates ReplicaSet by connecting to newly created pods running mongod
  • Connects to mongos and adds a shard with sh.addShard() command


Then the output of db.adminCommand({ listShards:1 }) will look like this:

        "shards" : [
                {
                        "_id" : "replicaset-1",
                        "host" : "replicaset-1/percona-cluster-replicaset-1-0.percona-cluster-replicaset-1.default.svc.cluster.local:27017,percona-cluster-replicaset-1-1.percona-cluster-replicaset-1.default.svc.cluster.local:27017,percona-cluster-replicaset-1-2.percona-cluster-replicaset-1.default.svc.cluster.local:27017",
                        "state" : 1
                },
                {
                        "_id" : "replicaset-2",
                        "host" : "replicaset-2/percona-cluster-replicaset-2-0.percona-cluster-replicaset-2.default.svc.cluster.local:27017,percona-cluster-replicaset-2-1.percona-cluster-replicaset-2.default.svc.cluster.local:27017,percona-cluster-replicaset-2-2.percona-cluster-replicaset-2.default.svc.cluster.local:27017",
                        "state" : 1
                }
        ],


Deleting a Shard

Percona Operators are built to simplify the deployment and management of the databases on Kubernetes. Our goal is to provide resilient infrastructure, but the operator does not manage the data itself. Deleting a shard requires moving the data to another shard before removal, but there are a couple of caveats:

  • Sometimes data is not moved automatically by MongoDB – unsharded collections or jumbo chunks
  • We hit the storage problem – what if another shard does not have enough disk space to hold the data?


There are a few choices:

  1. Do not touch the data. The user needs to move the data manually and then the operator removes the empty shard.
  2. The operator decides where to move the data and deals with storage issues by upscaling if necessary.
    • Upscaling the storage can be tricky, as it requires certain capabilities from the Container Storage Interface (CSI) and the underlying storage infrastructure.

For now, we decided to pick option #1 and won’t touch the data, but in future releases, we would like to work with the community to introduce fully-automated shard removal.

When the user wants to remove the shard now, we first check if there are any non-system databases present on the ReplicaSet. If there are none, the shard can be removed:

func (r *ReconcilePerconaServerMongoDB) checkIfPossibleToRemove(cr *api.PerconaServerMongoDB, usersSecret *corev1.Secret, rsName string) error {
  systemDBs := map[string]struct{}{
    "local": {},
    "admin": {},
    "config":  {},
  }


Custom Sidecars

The sidecar container pattern allows users to extend the application without changing the main container image. They leverage the fact that all containers in the pod share storage and network resources.

Percona Operators have built-in support for Percona Monitoring and Management to gain monitoring insights for the databases on Kubernetes, but sometimes users may want to expose metrics to other monitoring systems. Let’s see how mongodb_exporter can expose metrics when running as a sidecar along with the ReplicaSet containers.

1. Create the monitoring user that the exporter will use to connect to MongoDB. Connect to mongod in the container and create the user:

> db.getSiblingDB("admin").createUser({
    user: "mongodb_exporter",
    pwd: "mysupErpassword!123",
    roles: [
      { role: "clusterMonitor", db: "admin" },
      { role: "read", db: "local" }
    ]
  })

2. Create the Kubernetes secret with this login and password. Encode both the username and password with base64:

$ echo -n mongodb_exporter | base64
bW9uZ29kYl9leHBvcnRlcg==
$ echo -n 'mysupErpassword!123' | base64
bXlzdXBFcnBhc3N3b3JkITEyMw==

Put these into the secret and apply:

$ cat mongoexp_secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: mongoexp-secret
data:
  username: bW9uZ29kYl9leHBvcnRlcg==
  password: bXlzdXBFcnBhc3N3b3JkITEyMw==

$ kubectl apply -f mongoexp_secret.yaml

3. Add a sidecar for mongodb_exporter into cr.yaml and apply:

replsets:
- name: rs0
  ...
  sidecars:
  - image: bitnami/mongodb-exporter:latest
    name: mongodb-exporter
    env:
    - name: EXPORTER_USER
      valueFrom:
        secretKeyRef:
          name: mongoexp-secret
          key: username
    - name: EXPORTER_PASS
      valueFrom:
        secretKeyRef:
          name: mongoexp-secret
          key: password
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: MONGODB_URI
      value: "mongodb://$(EXPORTER_USER):$(EXPORTER_PASS)@$(POD_IP):27017"
    args: ["--web.listen-address=$(POD_IP):9216"]

$ kubectl apply -f deploy/cr.yaml

All it takes now is to configure the monitoring system to fetch the metrics for each mongod Pod. For example, prometheus-operator will start fetching metrics once annotations are added to ReplicaSet pods:

replsets:
- name: rs0
  ...
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9216'
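
To quickly confirm that the exporter sidecar is serving metrics, one option (a sketch; substitute the name of one of your ReplicaSet pods) is to port-forward to it and fetch the metrics endpoint:

$ kubectl port-forward pod/my-cluster-name-rs0-0 9216:9216
$ curl -s http://localhost:9216/metrics | head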

PVCs Clean Up

Running CI/CD pipelines that deploy MongoDB clusters on Kubernetes is a common thing. Once these clusters are terminated, the Persistent Volume Claims (PVCs) are not removed. We have now added automation that removes PVCs after cluster deletion. We rely on Kubernetes Finalizers – asynchronous pre-delete hooks. In our case, we hook the finalizer to the Custom Resource (CR) object which is created for the MongoDB cluster.


A user can enable the finalizer through cr.yaml in the metadata section:

metadata:
  name: my-cluster-name
  finalizers:
    - delete-psmdb-pvc
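
Once the finalizer is set, deleting the cluster object removes its PVCs as well. A quick way to verify, assuming the CRD exposes the psmdb short name (otherwise use the full perconaservermongodb resource name):

$ kubectl delete psmdb my-cluster-name
$ kubectl get pvc    # the claims created for this cluster should be gone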

Conclusion

Percona is committed to providing production-grade database deployments on Kubernetes. Our Percona Kubernetes Operator for Percona Server for MongoDB is a feature-rich tool to deploy and manage your MongoDB clusters with ease. Our Operator is free and open source. Try it out by following the documentation here, or help us make it better by contributing your code and ideas to our GitHub repository.

Mar
09
2021
--

3 Percona Software Products Take the SourceForge Leader Award!

Percona Software SourceForge Award

We are so grateful to all users of our software. Thanks to you, some of our products have just been recognized as Winter 2021 category leaders by SourceForge!

The SourceForge Leader Award is only awarded to select products that have attained the highest levels of praise from user reviews on SourceForge.

This is a huge achievement, as Percona Monitoring and Management and Percona Server for MongoDB have been selected as best-in-class from over 60,000 products on SourceForge. SourceForge gets over 30 million visitors per month looking for business software and solutions.

Have open source expertise to share? Submit your talk for Percona Live ONLINE!

Thank you to all users of our open source software products for your trust and support. We highly appreciate it.

The most helpful reviews are those that include technical details and solutions. If you haven’t left a review for our software on SourceForge yet, we are looking forward to reading yours.

Percona Products Take SourceForge Leader Award

Mar
02
2021
--

Microsoft Azure expands its NoSQL portfolio with Managed Instances for Apache Cassandra

At its Ignite conference today, Microsoft announced the launch of Azure Managed Instance for Apache Cassandra, its latest NoSQL database offering and a competitor to Cassandra-centric companies like Datastax. Microsoft describes the new service as a ‘semi-managed’ offering that will help companies bring more of their Cassandra-based workloads into its cloud.

“Customers can easily take on-prem Cassandra workloads and add limitless cloud scale while maintaining full compatibility with the latest version of Apache Cassandra,” Microsoft explains in its press materials. “Their deployments gain improved performance and availability, while benefiting from Azure’s security and compliance capabilities.”

Like its counterpart, Azure SQL Managed Instance, the idea here is to give users access to a scalable, cloud-based database service. To use Cassandra in Azure before, businesses had to either move to Cosmos DB, its highly scalable database service that supports the Cassandra, MongoDB, SQL, and Gremlin APIs, or manage their own fleet of virtual machines or on-premises infrastructure.

Cassandra was originally developed at Facebook and then open-sourced in 2008. A year later, it joined the Apache Foundation, and today it’s used widely across the industry, with companies like Apple and Netflix betting on it for some of their core services. AWS launched a managed Cassandra-compatible service at its re:Invent conference in 2019 (it’s called Amazon Keyspaces today), and Microsoft launched the Cassandra API for Cosmos DB in September 2018. With today’s announcement, though, the company can now offer a full range of Cassandra-based services for enterprises that want to move these workloads to its cloud.




Feb
15
2021
--

Top 5 Features Developers Love About MongoDB

Developers Love About MongoDB

MongoDB is one of the most admired and easiest-to-set-up NoSQL databases. Developers want to spend their time building features for their application, and with MongoDB they can build the application quickly while utilizing well-supported infrastructure and high availability with automatic failover.

In this blog post, we will discuss the top five things which MongoDB does better than anyone else.

Ease of Setup

First and foremost, MongoDB is very easy to install and deploy, and a developer can start writing application code immediately. Installation is simple whether it’s on Windows, Mac, or Linux: on Linux/Mac, one can download the tarball, extract it, configure the db/log paths, and start it. Percona offers Percona Server for MongoDB, an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB Community Edition. For more details on the installation of Percona Server for MongoDB (PSMDB) on various operating systems, please visit the installation section. One can also spin up MongoDB with Kubernetes, and Percona also has a Kubernetes Operator for Percona Server for MongoDB available.

Flexible Schema

One of MongoDB’s great features is its flexible schema; it can effectively be a schemaless database. A developer isn’t stuck with a predefined schema: there is no need to declare field data types before inserting data, and a field’s data type can even differ across documents in the same collection. Two documents from an employee collection in MongoDB may look like this:

{ "emp_name" : "XYZ", "city" : "NYC" }

{ "emp_name" : "XYZ", "city" : "NYC", "country" : "US"  }

Even if you have to change the structure of a document in a collection, you only need to update the document with a new structure. Consider a document of a phone collection below:

{
  "_id" : ObjectId("5f8d175127f5862e567f676c"),
  "model_name" : "iphone12",
  "features" : {
    "5G_support" : true,
    "display" : "OLED with Ceramic Shield"
  }
}

Let’s say you want to append a new field, “screen_size”, to it. One can easily do this with an update, without specifying the type of the screen_size field (in MongoDB it’s a key).

db.phone.update({ model_name: "iphone12" }, { $set : { "screen_size" : 6.1 } })

db.phone.find({ "model_name" : "iphone12" }).pretty()
{
  "_id" : ObjectId("5f8d175127f5862e567f676c"),
  "model_name" : "iphone12",
  "features" : {
    "5G_support" : true,
    "display" : "OLED with Ceramic Shield"
  },
  "screen_size" : 6.1
}

It also allows related documents to be embedded within a single document or linked by a document reference:

Embedded single document

{
  "_id" : ObjectId("5f8d175127f5862e567f676c"),
  "model_name" : "iphone12",
  "features" : {
    "5G_support" : true,
    "display" : "OLED with Ceramic Shield"
  },
  "screen_size" : 6.1
}

A document with reference

// Customer collection's document
{
  _id : 1211,
  name : "Apple Solomon Pond Mall",
  address : "601 Donald Lynch Blvd, Marlborough, MA 01752, United States"
}

// Review collection's document
{
  _id : 442321,
  review : "iphone SE is the cheapest iphone with a similar look to the iPhone 8 but better internals with the A13 Bionic chip. However, its camera is not up to the mark...",
  cust_id : 1211
}
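
Referenced documents can still be combined at query time. As a sketch, assuming the collections are named customer and review as above, an aggregation with $lookup attaches the matching customer document to each review:

> db.review.aggregate([
    { $lookup : {
        from : "customer",
        localField : "cust_id",
        foreignField : "_id",
        as : "customer_info"
    } }
  ])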

 

Have open source expertise you want to share? Submit your talk for Percona Live ONLINE 2021!

Fault Tolerance

MongoDB has built-in replication that provides high availability and redundancy. Since copies of the data live on multiple servers, it provides a layer of fault tolerance against the loss of a database server. Keeping copies of the data in different regions increases availability and improves data locality for reads, at the cost of potentially stale reads. With zoned shards, it can also improve data locality for writes.

In a MongoDB replica set, the fault tolerance is the number of members that can become unavailable while still leaving enough members to elect a new primary.

The right fault tolerance configuration is a trade-off between business requirements and budget. To achieve replication with a fault tolerance of one, we need a minimum of three nodes: if one node goes down, a majority of nodes is still available to elect a new primary.

The below chart shows the number of required nodes to achieve fault tolerance.

Number of nodes | Majority required to elect a new primary | Fault tolerance
3               | 2                                         | 1
4               | 3                                         | 1
5               | 3                                         | 2
6               | 4                                         | 2
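
For illustration, a minimal three-member replica set (fault tolerance of one) could be initiated like this; the hostnames are hypothetical:

> rs.initiate({
    _id : "rs0",
    members : [
      { _id : 0, host : "mongo-0.example.local:27017" },
      { _id : 1, host : "mongo-1.example.local:27017" },
      { _id : 2, host : "mongo-2.example.local:27017" }
    ]
  })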

Replica sets can also increase the number of queries served to the application, as clients can send read requests to the secondaries of the replica set: a client can set the readPreference to read from a secondary, from the “nearest” member, or by a tag set. Reading from secondary nodes comes with a tradeoff, however: clients may see stale data.
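
As a sketch from the mongo shell (the "dc" tag is hypothetical and would have to exist in the replica set configuration), reads can be routed away from the primary like this:

> db.phone.find({ "model_name" : "iphone12" }).readPref("secondaryPreferred")
> db.phone.find().readPref("nearest", [ { "dc" : "east" } ])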

Scalability

Scalability is one of the key features of MongoDB. It is built on a scale-out architecture which enables it to sustain a high volume of data and traffic.

In any database system, growth can be managed by two methods: vertical and horizontal scaling. Vertical scaling involves increasing the capacity of a single server, such as a more powerful CPU, more RAM, or more disk space. Horizontal scaling involves dividing the dataset across multiple smaller machines, without any code changes at the application level.

MongoDB supports horizontal scaling through sharding. It is cost-effective: more data can be written and read as necessary because the load is distributed across your shards. When the dataset grows, a new shard can be added at any time, and MongoDB will automatically migrate the data.

In MongoDB, sharding happens at the collection level, and each document is associated with a shard key that decides which shard the document should live on. The application doesn’t send requests to the shards directly; it sends them to mongos (the query router), which routes each read/write request to the respective shard using metadata cached from the config servers.
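
As a sketch (the database name, collection name, and the choice of a hashed shard key are hypothetical), sharding a collection from the mongos shell looks like this:

> sh.enableSharding("shop")
> sh.shardCollection("shop.orders", { "customer_id" : "hashed" })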

Performance

Database performance depends on many factors, such as database design, application queries, and load. MongoDB handles large volumes of unstructured data well because it lets users model and query data in the way that best fits their workload. It is always faster to retrieve related data from a single document than to join data across multiple collections.

To get better performance, one also needs to make sure that the working set fits in RAM. All data is persisted to disk (except when using the in-memory storage engine), but during query execution the data is served from RAM. It is also important to have the right indexes and enough RAM in place to take full advantage of MongoDB’s performance.
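
For example, an index on the field queried earlier keeps the lookup from scanning the whole collection, and explain() shows whether it is actually used:

> db.phone.createIndex({ "model_name" : 1 })
> db.phone.find({ "model_name" : "iphone12" }).explain("executionStats")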

Conclusion

MongoDB is feature-rich and an easy way to get started with NoSQL databases. It has a flexible data model, an expressive and easy-to-learn query syntax, automatic failover with replica sets, and good scalability. It also has good documentation, which makes a developer’s life a lot easier.

Percona Server for MongoDB offers all of the functionality of the MongoDB Enterprise edition with a non-licensed model, meaning there is no need to worry about purchasing licenses for production or non-production environments. You can ensure consistent deployment across all environments by using non-licensed, open source software, while still meeting the security standards required by your organization. And if support is what you need, Percona has you covered there as well.

MongoDB has both Community and Enterprise editions. While the Community edition is source-available and free to use within the confines of the SSPL license, the Enterprise edition is available as part of the MongoDB Enterprise subscription, which includes MongoDB-provided support for your deployment.

To know more about what Percona Server for MongoDB covers, please visit the blog “Why Pay for Enterprise When Open Source Has You Covered?”.

 
