Jun 22, 2022

Percona Monitoring and Management in Kubernetes is now in Tech Preview

Over the years, we have seen growing interest in running databases and stateful workloads in Kubernetes. With Container Storage Interfaces (CSI) maturing and more and more Operators appearing, running stateful workloads on your favorite platform is not that scary anymore. Our Kubernetes story at Percona started with Operators for MySQL and MongoDB, with PostgreSQL added later on.

Percona Monitoring and Management (PMM) is an open source database monitoring, observability, and management tool. It can be deployed as a virtual appliance or a Docker container. For a long time, our customers have asked us for a way to deploy PMM in Kubernetes. We had an unofficial helm chart, created as a proof of concept by Percona teams and the community (GitHub).

We are introducing the Technical Preview of a helm chart, supported by Percona, to easily deploy PMM in Kubernetes. You can find it in our helm chart repository here.

Use cases

Single platform

If Kubernetes is your platform of choice, until now you needed a separate virtual machine to run Percona Monitoring and Management. With the introduction of the helm chart, that is no longer the case.

As you know, all Percona Operators integrate with PMM, which enables monitoring for databases deployed on Kubernetes. The Operators configure and deploy the pmm-client sidecar container and register the nodes on a PMM server. Bringing PMM into Kubernetes simplifies this integration and the whole flow: the network traffic between pmm-client and the PMM server no longer leaves the Kubernetes cluster at all.

All you have to do is set the correct endpoint in the pmm section of the Custom Resource manifest. For example, for Percona Operator for MongoDB, the pmm section will look like this:

 pmm:
    enabled: true
    image: percona/pmm-client:2.28.0
    serverHost: monitoring-service

Where monitoring-service is the Service created by the helm chart to expose the PMM server.

High availability

Percona Monitoring and Management has lots of moving parts: VictoriaMetrics to store time-series data, ClickHouse for Query Analytics functionality, and PostgreSQL to keep the PMM configuration. Right now, all these components run in a single container or virtual machine, with Grafana as a frontend. To provide zero-downtime deployment in any environment, we would need to decouple these components, which would substantially complicate the installation and management of PMM.

What we offer instead right now are ways to automatically recover PMM within minutes in case of failure (for example, leveraging the EC2 self-healing mechanism).

Kubernetes is a control plane for container orchestration that automates manual tasks for engineers. When you run software in Kubernetes, it is best to rely on its primitives to handle all the heavy lifting. This is what PMM looks like in Kubernetes:

PMM in Kubernetes

  • StatefulSet controls the Pod creation
  • There is a single Pod with a single container with all the components in it
  • This Pod is exposed through a Service that is utilized by PMM Users and pmm-clients
  • All the data is stored on a persistent volume
  • ConfigMap has various environment variable settings that can help to fine-tune PMM

In case of a node or Pod failure, the StatefulSet recovers the PMM Pod automatically and remounts the Persistent Volume to it. The recovery time depends on the load of the cluster and node availability, but in normal operating environments, the PMM Pod is up and running again within a minute.
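You can see this self-healing in action by deleting the Pod and watching the StatefulSet recreate it (a sketch; pmm-0 is the Pod name produced by the default release name used later in this post):

kubectl delete pod pmm-0   # simulate a failure
kubectl get pods -w        # watch the StatefulSet schedule a new pmm-0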

Deploy

Let’s see how PMM can be deployed in Kubernetes.

Add the helm chart:

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

Install PMM:

helm install pmm --set service.type="LoadBalancer" percona/pmm

You can now log in to PMM using the LoadBalancer IP address and the randomly generated password stored in the pmm-secret Secret object (the default user is admin).

The Service object created for PMM is called monitoring-service:

$ kubectl get services monitoring-service
NAME                 TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)         AGE
monitoring-service   LoadBalancer   10.68.29.40   108.59.80.108   443:32591/TCP   3m34s

$ kubectl get secrets pmm-secret -o yaml
apiVersion: v1
data:
  PMM_ADMIN_PASSWORD: LE5lSTx3IytrUWBmWEhFTQ==
…

$ echo 'LE5lSTx3IytrUWBmWEhFTQ==' | base64 --decode && echo
,NeI<w#+kQ`fXHEM
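As a convenience, you can extract and decode the password in one go (a sketch using standard kubectl JSONPath):

$ kubectl get secret pmm-secret -o jsonpath='{.data.PMM_ADMIN_PASSWORD}' | base64 --decode && echo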

Log in to PMM by connecting to https://<YOUR_PUBLIC_IP>.

Customization

A helm chart is a template engine for YAML manifests, and it allows users to customize the deployment. You can see the various parameters you can set to fine-tune your PMM installation in our README.

For example, to choose another storage class and set the desired storage size, set the following flags:

helm install pmm percona/pmm \
--set storage.storageClassName="premium-rwo" \
--set storage.size="20Gi"

You can also change these parameters in values.yaml and use the -f flag:

# values.yaml contents
storage:
  storageClassName: "premium-rwo"
  size: 20Gi


helm install pmm percona/pmm -f values.yaml

Maintenance

For most of the maintenance tasks, regular Kubernetes techniques would apply. Let’s review a couple of examples.

Compute scaling

It is possible to vertically scale PMM by adding or removing resources through the pmmResources variable in values.yaml:

pmmResources:
  requests:
    memory: "4Gi"
    cpu: "2"
  limits:
    memory: "8Gi"
    cpu: "4"

Once done, upgrade the deployment:

helm upgrade -f values.yaml pmm percona/pmm

This will restart the PMM Pod, so plan it carefully to avoid disrupting your team's work.

Storage scaling

This depends a lot on your storage interface capabilities and the underlying implementation. In most clouds, Persistent Volumes can be expanded. You can check if your storage class supports it by describing it:

kubectl describe storageclass standard
…
AllowVolumeExpansion:  True

Unfortunately, just changing the size of the storage in values.yaml (storage.size) will not do the trick and you will see the following error:

helm upgrade -f values.yaml pmm percona/pmm
Error: UPGRADE FAILED: cannot patch "pmm" with kind StatefulSet: StatefulSet.apps "pmm" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

We use a StatefulSet object to deploy PMM, and StatefulSets are mostly immutable: only a handful of fields can be changed on the fly. There is a trick, though.

First, delete the StatefulSet, but keep the Pods and PVCs:

kubectl delete sts pmm --cascade=orphan

Then recreate it with the new storage configuration:

helm upgrade -f values.yaml pmm percona/pmm

It will recreate the StatefulSet with the new storage size configuration. 

Next, edit the Persistent Volume Claim manually and change the storage size (the name of the PVC can be different for you). You need to change the storage value in the spec.resources.requests.storage section:

kubectl edit pvc pmm-storage-pmm-0
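Equivalently, here is a minimal sketch of the same change with kubectl patch (the PVC name and the 20Gi value are from this example):

kubectl patch pvc pmm-storage-pmm-0 \
  -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'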

The PVC is not resized yet and you can see the following message when you describe it:

kubectl describe pvc pmm-storage-pmm-0
…
Conditions:
  Type                      Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----                      ------  -----------------                 ------------------                ------  -------
  FileSystemResizePending   True    Mon, 01 Jan 0001 00:00:00 +0000   Thu, 16 Jun 2022 11:55:56 +0300           Waiting for user to (re-)start a pod to finish file system resize of volume on node.

The last step would be to restart the Pod:

kubectl delete pod pmm-0

Upgrade

Running helm upgrade is the recommended way to upgrade, either when a new helm chart is released or when you want to move to a newer version of PMM by replacing the image in the image section.
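A minimal sketch of both cases (the image parameter key is an assumption; check the chart's README for the exact name):

helm repo update
helm upgrade pmm percona/pmm                            # pick up the latest chart
helm upgrade pmm percona/pmm --set image.tag="2.28.0"   # or pin a specific PMM image tag (hypothetical key)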

Backup and restore

PMM stores all the data on a Persistent Volume. As said before, regular Kubernetes techniques can be applied here to back up and restore the data. There are numerous options:

  • Volume Snapshots – check if it is supported by your CSI and storage implementation
  • Third-party tools, like Velero, can handle the backups and restores of PVCs
  • Snapshots provided by your storage (e.g., AWS EBS snapshots) with manual mapping to the PVC during restoration
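As an illustration of the first option, here is a minimal sketch of a CSI Volume Snapshot (the csi-snapclass name is an assumption; use the snapshot class available in your cluster):

cat <<'EOF' | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pmm-data-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass   # assumption: your cluster's snapshot class
  source:
    persistentVolumeClaimName: pmm-storage-pmm-0
EOF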

What is coming

To keep you excited, here are some of the things we are working on or have planned to enable further Kubernetes integrations.

OpenShift support

We are working on building a rootless container so that OpenShift users can run Percona Monitoring and Management there without having to grant elevated privileges. 

Microservices architecture

This is something that we have been discussing internally for some time now. As mentioned earlier, there are lots of components in PMM. To enable proper horizontal scaling, we need to break down our monolithic container and run these components as separate microservices.

Managing all these separate containers and Pods (if we talk about Kubernetes) would require separate maintenance strategies. This raises the question of creating a separate Operator just for PMM to manage all this, but that is a significant effort. If you have an opinion here, please let us know on our community forum.

Automated k8s registration in DBaaS

As you know, Percona Monitoring and Management comes with a technical preview of Database as a Service (DBaaS). Right now, when you install PMM on a Kubernetes cluster, you still need to register the cluster to deploy databases. We want to automate this process so that after the installation you can start deploying and managing databases right away.

Conclusion

Percona Monitoring and Management enables database administrators and site reliability engineers to pinpoint issues in their open source database environments, whether it is a quick look through the dashboards or a detailed analysis with Query Analytics. Percona’s support for PMM on Kubernetes is a response to the needs of our customers who are transforming their infrastructure.


Jun 21, 2022

PostgreSQL for MySQL Database Administrators: Episodes 3 and 4

The videos for PostgreSQL for MySQL Database Administrators (DBA) episodes three and four are live here and here. Episode three covers a simple backup and restoration, while episode four covers some handy psql commands. For those of you who missed the first two videos in this series, you can find them here: Episode one and Episode two.

Many MySQL DBAs hear about PostgreSQL, and this is a guided introductory series on setting up and using PostgreSQL. The notes for the two latest episodes are below. Each of the videos in this series shows you steps and commands, and then shows a video of those commands being executed. If you are following along, these two episodes build on the previous ones.

Episode three

What we are going to do

  • Back up a database
  • Examine the backup
  • Create a new database
  • Load the backup into the new database

Creating a backup using pg_dump

$ pg_dump dvdrental > backup.sql

  • pg_dump is the name of the program
  • dvdrental is the name of the database to be backed up
  • Dumping the output to file backup.sql
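Since the dump is plain SQL, examining the backup (the second step above) can be as simple as paging through the file (a sketch):

$ head -40 backup.sql          # schema definitions appear near the top
$ grep -c '^COPY' backup.sql   # count the COPY blocks that carry table data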

Create a new database

$ sudo su - postgres
$ psql
psql (14.3 (Ubuntu 2:14.3-3-focal))
Type "help" for help.
postgres=# CREATE DATABASE newdvd;
postgres=# \q
$ ^d

Restoration

$ psql -d newdvd -f backup.sql

Episode four

What we are going to do

  • Look at some PSQL commands
  • Warn you about some PSQL commands
  • Show you some PSQL commands

A quick summary

\c dbname Switch connection to a new database
\l List available databases
\dt List available tables
\d table_name Describe a table such as a column, type, modifiers of columns, etc.
\dn List all schemas of the currently connected database
\df List available functions in the current database
\dv List available views in the current database
\du List all users and their assigned roles
SELECT version(); Retrieve the current version of PostgreSQL server
\g Execute the last command again
\s Display command history
\s filename Save the command history to a file
\i filename Execute psql commands from a file
\? List all available psql commands
\h Get help, e.g., to get detailed information on the ALTER TABLE statement use \h ALTER TABLE
\e Edit command in your own editor
\a Switch from aligned to non-aligned column output
\H Switch the output to HTML format
\q Exit psql shell

Using \g To Repeat Commands

Using \c to switch databases

Using \d to see the Contents of a Database and More

Toggling Output Formats with \a and \H
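A quick illustrative session combining a few of these (a sketch):

$ psql -d newdvd
newdvd=# \l
newdvd=# \dt
newdvd=# SELECT version();
newdvd=# \g
newdvd=# \q

Here \l lists the databases, \dt lists the tables in newdvd, and \g re-runs the SELECT version() issued just before it.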

Be sure to check out episode three and episode four of PostgreSQL for MySQL DBAs!

Jun 15, 2022

Moving MongoDB Cluster to a Different Environment with Percona Backup for MongoDB

Percona Backup for MongoDB (PBM) is a distributed backup and restore tool for sharded and non-sharded clusters. In 1.8.0, we added the replset-remapping functionality that allows you to restore data on a new compatible cluster topology.

The new environment can have different replset names and/or serve on different hosts and ports. PBM handles this hard work for you, making such a migration indistinguishable from a usual restore. In this blog post, I'll show you how to migrate to a new cluster in practice.

The Problem

Usually, changing a cluster topology involves lots of manual steps. PBM simplifies the process.

Let’s have a look at a case where we will have an initial cluster and a desired one.

Initial cluster:

configsrv: "configsrv/conf:27017"
shards:
  - "rs0/rs0:27017,rs1:27017,rs2:27017"
  - "extra-shard/extra:27018"

The cluster consists of the configsrv configsvr replset with a single node and two shards: rs0 (3 nodes in the replset) and extra-shard (1 node in the replset). The names, hosts, and ports are not consistent across the cluster, but we will resolve this.

Target cluster:

configsrv: "cfg/cfg0:27019"
shards:
  - "rs0/rs00:27018,rs01:27018,rs02:27018"
  - "rs1/rs10:27018,rs11:27018,rs12:27018"
  - "rs2/rs20:27018,rs21:27018,rs22:27018"

Here we have the cfg configsvr replset with a single node and 3 shards rs0-rs2, where each shard is a 3-node replset.

Think about how you can do this.

With PBM, all that we need is a deployed cluster and a logical backup made with PBM 1.5.0 or later. The following simple command will do the rest:

pbm restore $BACKUP_NAME --replset-remapping "cfg=configsrv,rs1=extra-shard"

Migration in Action

Let me show you how it looks in practice. I’ll provide details at the end of the post. In the repo, you can find all configs, scripts, and output used here.

As mentioned above, we need a backup. For this, we will deploy a cluster, seed data, and then make the backup.

Deploying the initial cluster

$> initial/deploy >initial/deploy.out
$> docker compose -f "initial/compose.yaml" exec pbm-conf \
     pbm status -s cluster
 
Cluster:
========
configsvr:
  - configsvr/conf:27019: pbm-agent v1.8.0 OK
rs0:
  - rs0/rs00:27017: pbm-agent v1.8.0 OK
  - rs0/rs01:27017: pbm-agent v1.8.0 OK
  - rs0/rs02:27017: pbm-agent v1.8.0 OK
extra-shard:
  - extra-shard/extra:27018: pbm-agent v1.8.0 OK

links: initial/deploy, initial/deploy.out

The cluster is ready and we can add some data.

Seed data

We will insert the first 1000 numbers in a natural number sequence: 1 – 1000.

$> mongosh "mongo:27017/rsmap" --quiet --eval "
     for (let i = 1; i <= 1000; i++)
       db.coll.insertOne({ i })" >/dev/null

Getting the data state

These documents should be partitioned across all shards at insert time. Let's see, in general, how. We will use the dbHash command on all shards to capture the collections' state. It will be useful for verification later.

We will also do a quick check on shards and mongos.

$> initial/dbhash >initial/dbhash.out && cat initial/dbhash.out
 
# rs00:27017  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs01:27017  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs02:27017  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# extra:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs00:27017  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 520, false ]
# extra:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 480, false ]
# mongo:27017
[ 1000, true ]

links: initial/dbhash, initial/dbhash.out

All rs0 members have the same data, so the secondaries replicate from the primary correctly.

The quickcheck.js used in the initial/dbhash script describes our documents. It returns the number of documents and whether these documents form the natural number sequence.

We have data for the backup. Time to make the backup.

Making a backup

$> docker compose -f initial/compose.yaml exec pbm-conf bash
pbm-conf> pbm backup --wait
 
Starting backup '2022-06-15T08:18:44Z'....
Waiting for '2022-06-15T08:18:44Z' backup.......... done
 
pbm-conf> pbm status -s backups
 
Backups:
========
FS  /data/pbm
  Snapshots:
    2022-06-15T08:18:44Z 28.23KB <logical> [complete: 2022-06-15T08:18:49Z]

We have a backup. It’s enough for migration to the new cluster.

Let’s destroy the initial cluster and deploy the target environment. (Destroying the initial cluster is not a requirement. I just don’t want to waste resources on it.)

Deploying the target cluster

pbm-conf> exit
$> docker compose -f initial/compose.yaml down -v >/dev/null
$> target/deploy >target/deploy.out

links: target/deploy, target/deploy.out

Let’s check the PBM status.

PBM Status

$> docker compose -f target/compose.yaml exec pbm-cfg0 bash
pbm-cfg0> pbm config --force-resync  # ensure agents sync from storage
 
Storage resync started
 
pbm-cfg0> pbm status -s backups
 
Backups:
========
FS  /data/pbm
  Snapshots:
    2022-06-15T08:18:44Z 28.23KB <logical> [incompatible: Backup doesn't match current cluster topology - it has different replica set names. Extra shards in the backup will cause this, for a simple example. The extra/unknown replica set names found in the backup are: extra-shard, configsvr. Backup has no data for the config server or sole replicaset] [2022-06-15T08:18:49Z]

As expected, it is incompatible with the new deployment.

Let's see how to make it work.

Resolving PBM Status

pbm-cfg0> export PBM_REPLSET_REMAPPING="cfg=configsvr,rs1=extra-shard"
pbm-cfg0> pbm status -s backups
 
Backups:
========
FS  /data/pbm
  Snapshots:
    2022-06-15T08:18:44Z 28.23KB <logical> [complete: 2022-06-15T08:18:49Z]

Nice. Now we can restore.

Restoring

pbm-cfg0> pbm restore '2022-06-15T08:18:44Z' --wait
 
Starting restore from '2022-06-15T08:18:44Z'....Started logical restore.
Waiting to finish.....Restore successfully finished!

The --wait flag blocks the shell session until the restore completes. Alternatively, you can skip waiting and check the progress later:

pbm-cfg0> pbm list --restore
 
Restores history:
  2022-06-15T08:18:44Z

Everything is going well so far. Almost done.

Let’s verify the data.

Data verification

pbm-cfg0> exit
$> target/dbhash >target/dbhash.out && cat target/dbhash.out
 
# rs00:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs01:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs02:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs10:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs11:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs12:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs20:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ }
# rs21:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ }
# rs22:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ }
# rs00:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 520, false ]
# rs10:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 480, false ]
# rs20:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ ]
# mongo:27017
[ 1000, true ]

links: target/dbhash, target/dbhash.out

As you can see, the rs2 shard is empty. The other two have identical dbHash and quickcheck results as in the initial cluster. I think the balancer can tell us something about this:

Balancer status

$> mongosh "mongo:27017" --quiet --eval "sh.balancerCollectionStatus('rsmap.coll')"
 
{
  balancerCompliant: false,
  firstComplianceViolation: 'chunksImbalance',
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1655281436, i: 1 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1655281436, i: 1 })
}

We know what to do: start the balancer and check the status again.

$> mongosh "mongo:27017" --quiet --eval "sh.startBalancer().ok"

1
 
$> mongosh "mongo:27017" --quiet --eval "sh.balancerCollectionStatus('rsmap.coll')"
 
{
  balancerCompliant: true,
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1655281457, i: 1 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1655281457, i: 1 })
}
 
$> target/dbhash >target/dbhash-2.out && cat target/dbhash-2.out

# rs00:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs01:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs02:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "550f86eb459b4d43de7999fe465e39e0" }
# rs10:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs11:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs12:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "4a79c07e0cbf3c9076d6e2d81eb77f0a" }
# rs20:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "6a54e10a5526e0efea0d58b5e2fbd7c5" }
# rs21:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "6a54e10a5526e0efea0d58b5e2fbd7c5" }
# rs22:27018  db.getSiblingDB("rsmap").runCommand("dbHash").collections
{ "coll" : "6a54e10a5526e0efea0d58b5e2fbd7c5" }
# rs00:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 520, false ]
# rs10:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 480, false ]
# rs20:27018  db.getSiblingDB("rsmap").coll
    .find().sort({ i: 1 }).toArray()
    .reduce(([count = 0, seq = true, next = 1], { i }) =>
             [count + 1, seq && next == i, i + 1], [])
    .slice(0, 2)
[ 229, false ]
# mongo:27017
[ 1000, true ]

links: target/dbhash-2.out

Interesting. The rs2 shard has some data now. However, rs0 and rs1 haven't changed. It's expected that mongos moves some chunks to rs2 and updates the router config. Physical deletion of chunks on a shard is a separate step. That's why querying data directly on a shard is inaccurate: the data could disappear at any time. The cursor returns all documents available in a replset at the moment, regardless of the router config.

Anyway, we shouldn't care about this anymore. It is the mongos/mongod responsibility now to update the router config, query the right shards, and remove moved chunks from shards on demand. In the end, we have valid data through mongos.

That’s it.

But wait, we didn’t make a backup! Never forget to make another solid backup.

Making a new backup

It is better to change the storage so that backups for the new deployment are kept in a different place and we no longer see errors about incompatible backups from the initial cluster.

$> pbm config --file "$NEW_PBM_CONFIG" >/dev/null
$> pbm config --force-resync >/dev/null
$> pbm backup -w >/dev/null
pbm-cfg0> pbm status -s backups
 
Backups:
========
FS  /data/pbm
  Snapshots:
    2022-06-15T08:25:44Z 165.34KB <logical> [complete: 2022-06-15T08:25:49Z]

Now we’re done. And can sleep better.

One More Thing: Possible Misconfiguration

Let's review another imaginary case to explain all possible errors.

Initial cluster: cfg, rs0, rs1, rs2, rs3, rs4, rs5

Target cluster: cfg, rs0, rs1, rs2, rs3, rs4, rs6

If we apply the remapping rs0=rs0,rs1=rs2,rs2=rs1,rs3=rs4, we will get an error like "missed replsets: rs3, rs5". And nothing about rs6.

The missed rs5 should be obvious: the backup topology has an rs5 replset, but it is missing on the target. And the target rs6 does not have data to restore from. Adding rs6=rs5 fixes this.

But the missed rs3 could be confusing. Let’s visualize:

init | curr
-----+-----
cfg     cfg  # unchanged
rs0 --> rs0  # mapped. unchanged
rs1 --> rs2
rs2 --> rs1
rs3 -->      # err: no shard
rs4 --> rs3
     -> rs4  # ok: no data
rs5 -->      # err: no shard
     -> rs6  # ok: no data

When we remap the backup from rs4 to rs3, the target rs3 becomes reserved. The rs3 in the backup does not have a target replset anymore. Simply remapping rs3 to the available rs4 fixes this too.

This reservation avoids data duplication. That’s why we use the quick check via mongos.

Details

Compatible topology

Simply speaking, a compatible topology has an equal or larger number of shards in the target deployment. In our example, we had 2 shards initially but restored to 3 shards. PBM restored data to two shards only. MongoDB can redistribute it across the remaining shards later, once the balancer is enabled (sh.startBalancer()). The number of replset members does not matter because PBM takes a backup from one member per replset and restores it to the primary only. Other data-bearing members replicate data from the primary. So you could make a backup from a multi-member replset and then restore it to a single-member replset.

You cannot restore to a different replset type like from shardsvr to configsvr.

Preconfigured environment

The cluster should be deployed with all shards added. Users and permissions should be added and assigned in advance. PBM agents should be configured with the same storage, and it should be accessible from the new cluster.

Note: PBM agents store backup metadata on the storage and keep a cache in MongoDB. pbm config --force-resync lets you refresh the cache from the storage. Do it on a new cluster right after deployment to see the backups/oplog chunks made from the initial cluster.

Understanding replset remapping

You can remap replset names with the --replset-remapping flag or the PBM_REPLSET_REMAPPING environment variable. If both are set, the flag takes precedence.

For full restore, point-in-time recovery, and oplog replay, PBM CLI sends the mapping as a parameter in the command. Each command gets a separate explicit mapping (or none). It can be done only via the CLI. Agents do not use the environment variable, nor do they have the flag.

pbm status and pbm list use the flag/envvar to remap replsets in backup/oplog metadata and apply this mapping to the current deployment to show them properly. If the backup and present replset names do not match, pbm list will not show these backups, and pbm status prints an error with the missed replset names.
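For example, both invocations below are equivalent for status/list purposes, and if both are set, the flag wins (a sketch):

export PBM_REPLSET_REMAPPING="cfg=configsvr,rs1=extra-shard"
pbm status -s backups
pbm list --replset-remapping "cfg=configsvr,rs1=extra-shard"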

Restoring with remapping works with logical backups only.

How does PBM do this?

During restore, PBM reviews the current topology and assigns members' snapshots and oplog chunks to each shard/replset by name. The remapping changes this default assignment.

After the restore is done, PBM agents sync the router config to make the restored data “native” to this cluster.

Behind the scene

The config.shards collection describes the current topology. PBM uses it to know where and what to restore. The collection is not modified by PBM. But the restored data contains router configuration for the initial topology.

PBM updates two collections to replace old shard names with new ones in the restored data:

  • config.databases – primary shard for non-sharded databases
  • config.chunks – shards where chunks are

After this, MongoDB knows where databases, collections, and chunks are in the new cluster.
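You can peek at both collections through mongos to see the shard names after the restore (a sketch, following the mongosh style used above):

mongosh "mongo:27017" --quiet --eval '
  db.getSiblingDB("config").databases.find({}, { primary: 1 }).toArray()'
mongosh "mongo:27017" --quiet --eval '
  db.getSiblingDB("config").chunks.distinct("shard")'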

Conclusion

Migration of a cluster requires much attention, knowledge, and calm. The replset-remapping functionality in Percona Backup for MongoDB reduces complexity during migration between two different environments. I would say it is close to a routine job now.

Have a nice day!

Jun 07, 2022

Migration of MongoDB Enterprise/Community Edition to Percona Server for MongoDB

In this blog post, we will discuss how to migrate from the enterprise/community edition of MongoDB to Percona Server for MongoDB. But before we begin, let's take a second to explain why you should migrate to Percona Server for MongoDB.

Percona Distribution for MongoDB is a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together. Percona customers benefit from no lock-in and lower total cost of ownership, along with the freedom to run their MongoDB environment wherever they want to – in a public or private cloud, on-premises, or hybrid environment.

Percona Server for MongoDB offers the same, or equivalent, security features as MongoDB Enterprise without the price tag, and Percona experts are always available to help, bringing in-depth operational knowledge of MongoDB and open source tools so you can optimize database performance. If you'd like to learn more, please click here.

Anyway, let's get back to the purpose of this blog: migrating from the enterprise/community edition of MongoDB to Percona Server for MongoDB.

Before starting the migration process it’s recommended that you perform a full backup (if you don’t have one already). See this post for MongoDB backup best practices.

The migration procedure:

  1. Back up the config files of the Mongo process.
  2. Stop the Mongo process. If it's a replica set, then do it in a rolling fashion.
  3. Remove the MongoDB community/enterprise edition packages. For a replica set, do it in a rolling fashion.
  4. Install Percona Server for MongoDB (PSMDB). It can be downloaded from here. Do it in a rolling fashion for a replica set and start the Mongo service.

Detailed migration plan:

Migrate standalone MongoDB community/enterprise edition to Percona Server for MongoDB (Debian/RHEL):

   1. Backup the mongo config and service file:

sudo cp /etc/mongod.conf /etc/mongod.conf_bkp

Debian:

sudo cp /lib/systemd/system/mongod.service /lib/systemd/system/mongod.service_bkp

RHEL/CentOS:

sudo cp /usr/lib/systemd/system/mongod.service /usr/lib/systemd/system/mongod.service_bkp

   2. Stop mongo service first and then remove mongodb-community/enterprise packages and repo:

To stop the mongo service, connect to the admin database and shut down as below:

>use admin

>db.shutdownServer()

Remove the package:

Debian:

sudo apt-get remove mongodb-org mongodb-org-mongos mongodb-org-server mongodb-org-shell mongodb-org-tools
sudo rm /etc/apt/sources.list.d/mongodb-org-4.0.list

RHEL/CentOS:

sudo yum erase $(rpm -qa | grep mongodb-org)

If it's managed by Ops Manager, then:

   a. Unmanage the project in the Ops Manager GUI.

   b. Make sure to uncheck enforce users in the Ops Manager GUI.

   c. Disable the automation agent service:

sudo systemctl disable mongodb-mms-automation-agent

   d. Remove the automation agent package:

sudo apt remove mongodb-mms-automation-agent

   3. Configure percona repo and install Percona Server for MongoDB (PSMDB):

Debian:

sudo wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb

sudo dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb

Enable the repo:

sudo percona-release enable psmdb-44 release

sudo apt-get update

Install the package:

sudo apt-get install percona-server-mongodb

RHEL/CentOS:

sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm

Enable the repo:

sudo percona-release setup pdmdb-44

Install the package:

sudo yum install percona-server-mongodb

   4. Copy back the mongod config and service file:

sudo cp /etc/mongod.conf_bkp /etc/mongod.conf

Debian:

sudo cp /lib/systemd/system/mongod.service_bkp /lib/systemd/system/mongod.service

RHEL/CentOS:

sudo cp /usr/lib/systemd/system/mongod.service_bkp /usr/lib/systemd/system/mongod.service

NOTE: Kindly check that the permissions and ownership of the data directory, keyfile, and log directory are properly updated for the mongod user. 

Also, if the SELinux policy is enabled, then set the necessary SELinux policy for dbPath, keyFile, and logs as below:

sudo semanage fcontext -a -t mongod_var_lib_t '/dbPath/mongod.*'
sudo chcon -Rv -u system_u -t mongod_var_lib_t '/dbPath/mongod'
sudo restorecon -R -v '/dbPath/mongod'
sudo semanage fcontext -a -t mongod_log_t '/logPath/log.*'
sudo chcon -Rv -u system_u -t mongod_log_t '/logPath/log'
sudo restorecon -R -v '/logPath/log'

   5. Enable and start mongod service:

sudo systemctl daemon-reload
sudo systemctl enable mongod
sudo systemctl start mongod
sudo systemctl status mongod

Migrate Replica set MongoDB Enterprise/Community edition to Percona Server for MongoDB (Debian/RHEL):

This migration process involves stopping the Mongo process on the hidden/secondary node first, removing the MongoDB community/enterprise edition packages, installing Percona Server for MongoDB, and starting it with the same data files. Then, step down the current primary node and repeat the same process.

   a. Make sure to check the current Primary and Secondary/hidden nodes.

db.isMaster().primary

   b. Start with the hidden node (if there is no hidden node then start with one of the secondary nodes with the least priority) first.

   c. Repeat steps from 1 to 5 from the section Migrate standalone MongoDB community/enterprise edition to Percona Server for MongoDB (Debian/RHEL).

   d. Wait for each node to be synced with the Primary. Verify it with:

rs.printSecondaryReplicationInfo()

   e. Once completed for all secondary nodes, step down the current primary with

rs.stepDown()

   f. Wait for the new node to be elected as a Primary node and repeat steps 1 to 5 from the section Migrate standalone MongoDB community/enterprise edition to Percona Server for MongoDB (Debian/RHEL) and wait for the former primary node to be synced with the newly elected Primary.

Migrate Sharded cluster MongoDB community/enterprise edition to Percona Server for MongoDB (Debian/RHEL):

   1. Stop the balancer first:

sh.stopBalancer()

   2. Back up the configuration and service files for shards, CSRS, and Query router.

Backup the mongo config and service file for shards and CSRS:

sudo cp /etc/mongod.conf /etc/mongod.conf_bkp

Debian:

sudo cp /lib/systemd/system/mongod.service /lib/systemd/system/mongod.service_bkp

For router:

sudo cp /etc/mongos.conf /etc/mongos.conf_bkp

sudo cp /lib/systemd/system/mongos.service /lib/systemd/system/mongos.service_bkp

RHEL/CentOS:

sudo cp /usr/lib/systemd/system/mongod.service /usr/lib/systemd/system/mongod.service_bkp

For router:

sudo cp /etc/mongos.conf /etc/mongos.conf_bkp
sudo cp /usr/lib/systemd/system/mongos.service /usr/lib/systemd/system/mongos.service_backup

   3. Start with the hidden node of the CSRS first (if there is no hidden node then start with one of the secondary nodes).

Repeat steps a to f from the section Migrate Replica set MongoDB community/enterprise edition to Percona Server for MongoDB (Debian/RHEL).

Once the CSRS is migrated to Percona Server for MongoDB, move on to the shards and repeat steps a to f from the same section.

   4. After the migration of the CSRS and Shards, start migrating the MongoS. Connect to one router at a time and execute the below steps followed by the remaining routers.

   5. Stop mongo service and then remove mongodb-community/enterprise packages and repo:

sudo systemctl stop mongos

Debian:

sudo apt-get remove mongodb-org mongodb-org-mongos mongodb-org-server mongodb-org-shell mongodb-org-tools
sudo rm /etc/apt/sources.list.d/mongodb-org-4.0.list

RHEL/CentOS:

sudo yum erase $(rpm -qa | grep mongodb-org)

   6. Configure percona repos and install Percona Server for MongoDB (PSMDB):

Debian:

sudo wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb

Enable the repo:

sudo percona-release enable psmdb-44 release
sudo apt-get update

Install the package:

sudo apt-get install percona-server-mongodb

RHEL/CentOS:

sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm

Enable the repo:

sudo percona-release setup pdmdb-44

Install the package:

sudo yum install percona-server-mongodb

Copy back the config and service file:

sudo cp /etc/mongos.conf_bkp /etc/mongos.conf

Debian:

sudo cp /lib/systemd/system/mongos.service_bkp /lib/systemd/system/mongos.service

RHEL/CentOS:

sudo cp /usr/lib/systemd/system/mongos.service_bkp /usr/lib/systemd/system/mongos.service

NOTE: Kindly check that the permissions and ownership of keyfile and log directory are properly updated for the mongod user.

   7. Enable and start mongos service:

sudo systemctl daemon-reload
sudo systemctl enable mongos
sudo systemctl start mongos
sudo systemctl status mongos

   8. Re-enable the balancer:

sh.startBalancer()

Conclusion:

To learn more about the enterprise-grade features available in the license-free Percona Server for MongoDB, we recommend going through our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered? 

We also encourage you to try our products for MongoDB like Percona Server for MongoDB, Percona Backup for MongoDB, or Percona Operator for MongoDB.

Jun 03, 2022

MyDumper’s Stream Implementation

As you might know, mysqldump is single-threaded and STDOUT is its default output. As MyDumper is multithreaded, it has to write to multiple files. Since version 0.11.3 was released in Nov 2021, MyDumper has been able to stream a backup. We spent several months deciding on the simplest way to implement it, and we also had to add support for compression. Now, after fixing several bugs, we consider it stable enough to explain how it works.

How Can You Stream if MyDumper is Multithreaded?

Receiving a stream is not a problem for myloader: it receives one file at a time and sends it to a thread to process. However, each worker thread in mydumper is connected to the database, and as soon as it reads data, it should be sent to the stream, which might cause collisions with other worker threads that are reading data from the database. To avoid this issue, we ended up with the simplest solution: mydumper takes the backup and stores it in the local file system you configured, and the filename is enqueued to be processed by the Stream Thread, which pops one file at a time and pipes it to stdout. We studied the alternative of sending chunks of the file while it is being dumped, but the way we implemented it is simpler and improves overall performance.

Implementation Details

Here is a high-level diagram of how we implemented it:

MyDumper


When a mydumper Worker Thread processes a job, it connects to the database and stores the output into a file. That didn’t change, but with stream, we are pushing the filename into the mydumper stream_queue.

The mydumper Stream Thread pops filenames from the mydumper stream_queue; it sends the header of the file to stdout and then opens the file and sends its content.

Then, the myloader Stream Thread receives and detects the header; it creates the new file with the filename from the header and stores the content in it.

After closing the file, it will enqueue the filename in the myloader stream_queue. A myloader Worker Thread is going to take that file and process it according to the kind of file it is.

By default, the files are deleted, but if you want to keep them, you can use the --no-delete option.

The header is simply "-- " prepended to the filename, so you can use myloader or the mysql client to import your database. Here is an example:

-- sbtest-schema-create.sql
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `sbtest` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci */ /*!80016 DEFAULT ENCRYPTION='N' */;

-- sbtest.sbtest1-schema.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;

/*!40103 SET TIME_ZONE='+00:00' */;
CREATE TABLE `sbtest1` (
  `id` int NOT NULL AUTO_INCREMENT,
  `k` int NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  `pad2` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=100010 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

-- sbtest.sbtest1.00000.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
INSERT INTO `sbtest1` VALUES(1,49929,"83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330","67847967377-48000963322-62604785301-91415491898-96926520291","")
…
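And since the headers are just SQL comments, the same stream is, as noted above, in principle also consumable by the mysql client directly (a sketch; myloader remains the recommended consumer):

mydumper -B <SCHEMA_NAME> -h <FROM> --stream | mysql -h <TO_SERVER>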

Simple Use Cases

A thread writes to a single file to avoid collisions, which improves performance. However, having thousands of files for a backup of a couple of tables is not manageable. So, the simplest use case is to send everything to a single file:

mydumper -B <SCHEMA_NAME> -h <FROM> --stream > filename.sql

Then you can just simply import it using:

myloader --stream -o -h <TO_SERVER> < filename.sql

Now that you can pipe from a mydumper process to myloader, this execution is possible:

mydumper -B <SCHEMA_NAME> -h <FROM> --stream | myloader --stream -o -h <TO>

pipe from a mydumper process to myloader

Or you can send the stream through the network using nc: 

mydumper -B <SCHEMA_NAME> -h <FROM_SERVER> --stream | nc <MYDUMPER_SERVER> <ANY_PORT>
nc -l <MYDUMPER_SERVER> <ANY_PORT> | myloader --stream -o -h <TO_SERVER>

stream through the network using nc

This implementation uses the backup directory on mydumper and myloader as a buffer. You must take this into account, as by default it is going to create a directory where you run it.

Another thing to take into account is that mydumper and myloader will be writing to disk: the whole backup will be written on both file systems while it is being processed, so use a file system with enough disk space.

Finally, you can keep myloader running and send several mydumper backups. First, you need to run:

nc -k -l <MYDUMPER_SERVER> <ANY_PORT> | myloader --stream -o -h <TO_SERVER>

And then execute:

mydumper -B <SCHEMA_NAME_1> -h <FROM_SERVER> --stream | nc <MYDUMPER_SERVER> <ANY_PORT>
mydumper -B <SCHEMA_NAME_2> -h <FROM_SERVER> --stream | nc <MYDUMPER_SERVER> <ANY_PORT>
mydumper -B <SCHEMA_NAME_3> -h <FROM_SERVER> --stream | nc <MYDUMPER_SERVER> <ANY_PORT>
mydumper -B <SCHEMA_NAME_4> -h <FROM_SERVER> --stream | nc -N <MYDUMPER_SERVER> <ANY_PORT>

Some versions of nc have these two options:

      -k      When a connection is completed, listen for another one.  Requires -l.

     -N      shutdown(2) the network socket after EOF on the input.  Some servers require this to finish their work.

This is very useful if you are refreshing a testing environment and you only need a couple of tables from different databases, or if you are using a where clause that only applies to some tables.

Considerations

Usually, when you send data to STDOUT, you do not have to worry about disk space usage on the dumping server. That is NOT true if you are using MyDumper: files will be stored on the mydumper server until they are transferred to the receiving server. For instance, if you have a 10TB database with very low network bandwidth compared to the disk bandwidth, you might end up filling up the disk where you keep the files temporarily.

Conclusion

We focused the implementation on speeding up the export and import processes. Unlike other software or implementations, we use the file system as a buffer, at the cost of higher disk utilization.

Jun 03, 2022

Migration of a MongoDB Replica Set to a Sharded Cluster

In this blog post, we will discuss how to migrate from a replica set to a sharded cluster.

Before moving to the migration, let me briefly explain replication and sharding, and why we might need to shard a replica set.

Replication: Creates additional copies of the data and allows for automatic failover to another node in case the Primary goes down. It also helps to scale reads if the application is fine with reading data that may not be the latest.

Sharding: Allows horizontal scaling of data writes by partitioning data across multiple servers using a shard key. Here, we should understand that the shard key is very important for distributing the data evenly across multiple servers.

Why Do We Need a Sharded Cluster?

We need sharding due to the below reasons:

  1. By adding shards, we can reduce the number of operations each shard manages. 
  2. It increases the Read/Write capacity by distributing the Reads/Writes across multiple servers. 
  3. It also gives high availability as we deploy the replicas for the shards, config servers, and multiple MongoS.

A sharded cluster includes two more components: Config Servers and Query Routers, i.e., MongoS.

Config Servers: Keep the metadata for the sharded cluster. The metadata comprises a list of chunks on each shard and the ranges that define the chunks. It indicates the state of all the data and its components within the cluster.

Query Routers (MongoS): Cache the metadata and use it to route read or write operations to the respective shards. They also update the cache on any metadata changes for the sharded cluster, like chunk splits or shard additions.

Note: Before starting the migration process it’s recommended that you perform a full backup (if you don’t have one already).

The Procedure of Migration:

  1. Initiate at least a three-member replica set for the Config Server (another member can be included as a hidden node for backup purposes).
  2. Perform necessary OS, H/W, and disk-level tuning as per the existing Replica set.
  3. Setup the appropriate clusterRole for the Config servers in the mongod config file.
  4. Create at least two more nodes for the Query routers ( MongoS )
  5. Set appropriate configDB parameters in the mongos config file.
  6. Repeat step 2 from above to tune as per the existing replica set.
  7. Apply proper SELinux policies on all the newly configured nodes of Config server and MongoS.
  8. Add clusterRole parameter into existing replica set nodes in a rolling fashion.
  9. Copy all the users from the replica set to any MongoS.
  10. Connect to any MongoS and add the existing replica set as Shard. 

Note: Do not enable sharding on any database until the shard key is finalized. If it’s finalized then we can enable the sharding.

Detailed Migration Plan:

Here, we assume that the replica set has three nodes (1 primary and 2 secondaries).

  1. Create three servers to initiate a 3-member replica set for the Config Servers.

Perform necessary OS, H/W, and disk-level tuning. To know more about it, please visit our blog on Tuning Linux for MongoDB.

  2. Install the same version of Percona Server for MongoDB as the existing replica set from here.
  3. In the config file of the config server mongod, add the parameters clusterRole: configsvr and port: 27019 to start it as a config server on port 27019.
  4. If the SELinux policy is enabled, then set the necessary SELinux policy for dbPath, keyFile, and logs as below:
sudo semanage fcontext -a -t mongod_var_lib_t '/dbPath/mongod.*'
sudo chcon -Rv -u system_u -t mongod_var_lib_t '/dbPath/mongod'
sudo restorecon -R -v '/dbPath/mongod'
sudo semanage fcontext -a -t mongod_log_t '/logPath/log.*'
sudo chcon -Rv -u system_u -t mongod_log_t '/logPath/log'
sudo restorecon -R -v '/logPath/log'
sudo semanage port -a -t mongod_port_t -p tcp 27019

Start all the config server mongod instances and connect to any one of them. Create a temporary user on it and initiate the replica set:

> use admin
> rs.initiate()
> db.createUser({ user: "tempUser", pwd: "<password>", roles: [{role: "root", db: "admin"}]})

Create a role with the anyResource resource and the anyAction action as well, and assign it to tempUser:

>db.getSiblingDB("admin").createRole({
   "role": "pbmAnyAction",
   "privileges": [
      { "resource": { "anyResource": true },
        "actions": [ "anyAction" ] }
   ],
   "roles": []
});
>db.grantRolesToUser("tempUser", [{role: "pbmAnyAction", db: "admin"}])
> rs.add("config_host[2-3]:27019")

Now our Config server replica set is ready; let's move on to deploying the Query Routers, i.e., MongoS.

  1. Create two instances for the MongoS and tune the OS, H/W, and disk. To do this, follow our blog Tuning Linux for MongoDB or point 1 from the detailed plan above.
  2. In the mongos config file, adjust the configDB parameter and include only the non-hidden nodes of the Config servers (in this blog post, we have not covered starting hidden config servers).
  3. Apply SELinux policies if enabled (as in step 4 above), keep the same keyFile, and start the MongoS on port 27017.
  4. Add the below parameter to mongod.conf on the replica set nodes. Make sure the services are restarted in a rolling fashion, i.e., start with the Secondaries, then step down the existing Primary and restart it with port 27018.
clusterRole: shardsvr
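In YAML terms, this parameter lives under the sharding section of mongod.conf, alongside the port change (a sketch of the relevant fragment):

sharding:
  clusterRole: shardsvr
net:
  port: 27018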

Log in to any MongoS, authenticate using tempUser, and add the existing replica set as a shard:

> sh.addShard( "replicaSetName/<URI of the replica set>") //Provide URI of the replica set

Verify it with:

> sh.status() or db.getSiblingDB("config")['shards'].find()

Connect to the Primary of the replica set and copy all the users and roles. To authenticate/authorize mention the replica set user.

> var mongos = new Mongo("mongodb://put MongoS URI string here/admin?authSource=admin") // Provide the URI of the MongoS with tempUser for authentication/authorization.
> db.getSiblingDB("admin").system.roles.find().forEach(function(d) { mongos.getDB('admin').getCollection('system.roles').insert(d)});
> db.getSiblingDB("admin").system.users.find().forEach(function(d) { mongos.getDB('admin').getCollection('system.users').insert(d)});

  1. Connect to any MongoS and verify the copied users on it.
  2. Shard the database if the shard key is finalized (we are not covering shard key selection here, as this post is about the migration itself).

Shard the database:

>sh.enableSharding("<db>")

Shard the collection with hash-based shard key:

>sh.shardCollection("<db>.<coll1>", { <shard key field> : "hashed" } )

Shard the collection with range based shard key:

>sh.shardCollection("<db>.<coll1>", { <shard key field> : 1, ... } )

Conclusion

Migrating a MongoDB replica set to a sharded cluster lets you scale horizontally, increase read/write throughput, and reduce the number of operations each shard manages.

We encourage you to try our products for MongoDB like Percona Server for MongoDB, Percona Backup for MongoDB, or Percona Operator for MongoDB. You can also visit our site to learn "Why MongoDB Runs Better with Percona".

Jun 01, 2022

Session Temporary Tablespaces and Disk Space Usage in MySQL

Temporary table handling in MySQL has a somewhat checkered past. Some time ago, my colleague Przemek wrote a post about the differences in implementation between versions of MySQL. If you are looking for those differences, that's a perfect place to check.

In this post, I'd like to focus on Session Temporary Tablespaces: InnoDB on-disk internal temporary tables that come into play when a query requires storing more data than tmp_table_size allows, or when the TempTable engine has allocated more than temptable_max_mmap of memory-mapped temporary files.

If that happens, the session executing the query needs to be allocated a tablespace from a pool of temporary tablespaces. The pool initially contains 10 temporary tablespaces that are created when the instance starts. The size of the pool never shrinks, and tablespaces are added to it automatically if needed. The default size of a tablespace (IBT file) is five pages, or 80 KB.

When a session disconnects, its temporary tablespaces are truncated and released back to the pool at their initial size. The truncate operation happens ONLY when the session disconnects; this also means that as long as the session is connected, the tablespaces consume disk space. MySQL can reuse the space for future queries, but if one query requires a lot of temporary space, the IBT will remain huge for the whole lifetime of the session.
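You can watch the pool and per-session usage at any time through information_schema (a sketch using the mysql CLI; the same view is queried later in this post):

mysql -e "SELECT ID, PATH, format_bytes(SIZE) AS size, STATE, PURPOSE
          FROM INFORMATION_SCHEMA.INNODB_SESSION_TEMP_TABLESPACES"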

Let’s demonstrate that behavior. Firstly, let’s create a table and populate it with one million rows. As a side note, storing SHA1 checksums in a char(40) field isn’t perfect, but it makes the test cleaner.

mysql> CREATE TABLE table01 (id int unsigned primary key auto_increment, s char(40));
mysql> INSERT INTO table01 (s) SELECT sha1(FLOOR(RAND() * 100))  from sequence_table(1000000) t;

Then, the query below, which uses temporary tables, is going to be executed. As the temporary table size is larger than the value of tmp_table_size (16 MB by default), the query will have to use the Session Temporary Tablespace.

mysql> pager pt-visual-explain
mysql> explain SELECT * FROM ( SELECT * FROM table01 UNION SELECT * FROM table01 ORDER BY s DESC) t LIMIT 1;
Table scan
rows           1991860
+- DERIVED
   table          derived(temporary(union(table01,table01)))
   +- Table scan
      +- TEMPORARY
         table          temporary(union(table01,table01))
         +- Filesort
            +- Table scan
               +- UNION
                  table          union(table01,table01)
                  +- Table scan
                  |  rows           995930
                  |  +- Table
                  |     table          table01
                  +- Table scan
                     rows           995930
                     +- Table
                        table          table01

mysql> SELECT * FROM ( SELECT * FROM table01 UNION SELECT * FROM table01 ORDER BY s DESC) t LIMIT 1;
+--------+------------------------------------------+
| id     | s                                        |
+--------+------------------------------------------+
| 999145 | fe5dbbcea5ce7e2988b8c69bcfdfde8904aabc1f |
+--------+------------------------------------------+

Let’s check the temporary tablespaces attached to that session after executing the query:

mysql> SELECT PATH, format_bytes(SIZE), STATE, PURPOSE FROM INFORMATION_SCHEMA.INNODB_SESSION_TEMP_TABLESPACES WHERE id = CONNECTION_ID();
+----------------------------+--------------------+--------+-----------+
| PATH                       | format_bytes(SIZE) | STATE  | PURPOSE   |
+----------------------------+--------------------+--------+-----------+
| ./#innodb_temp/temp_10.ibt | 392.00 MiB         | ACTIVE | INTRINSIC |
+----------------------------+--------------------+--------+-----------+

# ls -la './#innodb_temp/temp_10.ibt'
-rw-r----- 1 mysql mysql 392M 05-29 14:10 temp_10.ibt

After closing the session, the file is truncated back to its initial size:

# ls -la './#innodb_temp/temp_10.ibt'
-rw-r----- 1 mysql mysql 80K 05-29 14:10 temp_10.ibt

This works as described in the MySQL manual; however, it can be problematic. Long-living sessions aren’t uncommon in the database world, particularly when applications use connection pooling. After all, connection pooling was designed precisely for that purpose: to mitigate the overhead of creating a new connection each time an application needs one, since reusing existing connections is more efficient than opening new ones.

For instance, if ProxySQL is used as middleware between an application and MySQL, the application has little control over the backend connections. Connections to the backend will be balanced, but they will also most likely live forever. Because of that, MySQL will rarely reclaim space from Session Temporary Tablespaces, which increases overall disk space utilization.

It’s not that hard to hit: one heavy OLAP query from time to time is enough to start the snowball.

ProxySQL has an option that forces backend connections to be re-initialized: mysql-connection_max_age_ms (disabled by default). However, this only helps with ProxySQL; there are plenty of other connection poolers available, some of them implemented directly on the application side, which in general increases the complexity of the whole problem.
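
For example, capping the backend connection age to one hour from the ProxySQL admin interface looks like this (a sketch; the one-hour value is only an illustrative choice):

Admin> UPDATE global_variables SET variable_value='3600000' WHERE variable_name='mysql-connection_max_age_ms';
Admin> LOAD MYSQL VARIABLES TO RUNTIME;
Admin> SAVE MYSQL VARIABLES TO DISK;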

In an ideal world, MySQL should handle this issue by triggering the truncate process more often. I’ve submitted a Feature Request with the goal of having the “surprise” factor removed, and ultimately making it more user-friendly.
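
Until then, a practical workaround is to keep an eye on which connections hold the largest tablespaces and recycle those connections in the pooler. A sketch that maps tablespaces back to sessions:

mysql> SELECT its.ID, pl.USER, pl.HOST, its.PATH, format_bytes(its.SIZE) AS size
    ->   FROM INFORMATION_SCHEMA.INNODB_SESSION_TEMP_TABLESPACES its
    ->   JOIN INFORMATION_SCHEMA.PROCESSLIST pl ON pl.ID = its.ID
    ->   ORDER BY its.SIZE DESC;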

May
31
2022
--

Percona Platform First Steps

I am a big fan of Percona Monitoring and Management (PMM) and am happy to report that Percona Platform is just as easy to set up and offers a lot of value. Percona Platform recently reached GA status, and I think you will find it a handy addition to your infrastructure.

What is Percona Platform?

Percona Platform brings together enterprise-level distributions of MySQL, PostgreSQL, and MongoDB, plus a range of open source tools for data backup, availability, and management. The core is PMM, which provides database management, monitoring, and automated insights, making it easier to manage database deployments. The number of sites with more than 100 separate databases has grown rapidly in the past few years, and keeping command and control of that many instances from a CLI has become impossible. Businesses need to move faster in increasingly complex environments, which puts ever-increasing pressure on database administrators, developers, and everyone involved in database operations. The spiraling levels of demand make it harder to support, manage, and correct issues in database environments.

What Percona Platform provides is a unified view of the health of your entire database environment, so you can quickly identify and remediate issues. Developers can self-service many of their database demands quickly and efficiently, provisioning and managing databases across test and production instances, so you spend fewer resources and less time on the management of database complexity.

The two keys to Percona Platform are Query Analytics (QAN) and Percona Advisors. QAN provides granular insights into database behavior and helps uncover new database performance patterns for in-depth troubleshooting and performance optimization. Percona Advisors are automated insights, created by Percona experts, that identify important issues for remediation such as security vulnerabilities, misconfigurations, performance problems, policy compliance, and database design issues. Automated insights within Percona Monitoring and Management ensure your database performs at its best. The Advisors check for replication inconsistencies, durability issues, password-less users, insecure connections, and unstable OS configuration, and search for available performance improvements, among other functions.

Percona Platform is a point of control for your database infrastructure, and PMM becomes even more intelligent when connected to it. By connecting PMM with Percona Platform, you get more advanced Advisors, centralized user account management, access to support tickets, private Database as a Service, Percona expertise with the fastest SLAs, and more.

Percona Platform architecture

So How Do I Install Percona Platform?

The first step is to install PMM by following the Quick Start Guide. You need version 2.27.0 or later.
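
If you do not have PMM yet, the Docker route from the Quick Start Guide is the fastest. A minimal sketch (the volume name and published port are just one common setup):

# pull the server image, create a volume for the data, and start PMM Server
docker pull percona/pmm-server:2
docker volume create pmm-data
docker run --detach --restart always --publish 443:443 \
    --volume pmm-data:/srv --name pmm-server percona/pmm-server:2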

Second, you will need a Percona Account.

Third, you will need to connect that account to PMM.

I will assume that you already have PMM installed. Did I mention that PMM is free, open source software?

Percona Platform account signup

The signup form allows you to create a new account or use an existing account.

Now you can create a name for your organization.

Form for creating your organization and connecting PMM

After creating your username and password, create your organization.

Now log in to your PMM dashboard and select Settings / Percona Platform. You will need your ‘Public Address’, which the browser can populate for you if need be.

The PMM Server ID is automatically generated by PMM. You will need to provide a name for your server, and you will need a second browser window to log in to Percona Platform to get the Percona Platform Access Token (this token has a thirty-minute lifetime, so be quick or regenerate another token).

Go back into PMM, paste the Access Token into the Percona Platform Access Token field, and click Connect.

On the Percona Platform page, you will see your PMM instances. Congratulations, you are using Percona Platform!

Advisor Checks

All checks are hosted on Percona Platform. PMM Server automatically downloads them from there when the Advisors and Telemetry options are enabled in PMM under Configuration > Settings > Advanced Settings. Both options are enabled by default.

Depending on the entitlements available for your Percona Account, the set of advisor checks that PMM can download from Percona Platform differs in complexity and functionality.

If your PMM instance is not connected to Percona Platform, PMM can only download the basic set of Anonymous advisor checks. As soon as you connect your PMM instance to Percona Platform, it gains access to additional checks, available only for Registered PMM instances.

If you are a Percona customer with a Percona Customer Portal account, you also get access to Paid checks, which offer more advanced database health information. A list is provided below.

Check Name | Description | Tier
--- | --- | ---
MongoDB Active vs Available Connections | Checks the ratio between active and available connections. | Registered, Paid
MongoDB Authentication | Warns if MongoDB authentication is disabled. | Anonymous, Registered, Paid
MongoDB Security AuthMech | Warns if MongoDB is not using the default SHA-256 hashing as the SCRAM authentication method. | Paid
MongoDB IP Bindings | Warns if MongoDB network binding is not set as recommended. | Anonymous, Registered, Paid
MongoDB CVE Version | Shows an error if the MongoDB or Percona Server for MongoDB version is not the latest one with CVE fixes. | Anonymous, Registered, Paid
MongoDB Journal Check | Warns if the journal is disabled. | Registered, Paid
MongoDB Localhost Authentication Bypass is Enabled | Warns if MongoDB localhost bypass is enabled. | Anonymous, Registered, Paid
MongoDB Non-Default Log Level | Warns if MongoDB is not using the default log level. | Paid
MongoDB Profiling Level | Warns when the MongoDB profile level is set to collect data for all operations. | Registered, Paid
MongoDB Read Tickets | Warns if MongoDB is using more than 128 read tickets. | Paid
MongoDB Replica Set Topology | Warns if the replica set cluster has fewer than three members. | Registered, Paid
MongoDB Version | Warns if the MongoDB or Percona Server for MongoDB version is not the latest one. | Anonymous, Registered, Paid
MongoDB Write Tickets | Warns if MongoDB is using more than 128 write tickets. | Paid
Check if Binaries are 32-bits | Notifies if version_compatible_machine equals i686. | Anonymous, Registered, Paid
MySQL Automatic User Expired Password | Notifies about user accounts whose passwords have automatically expired. | Registered, Paid
MySQL InnoDB flush method and File Format check | Checks the following settings: innodb_file_format, innodb_file_format_max, innodb_flush_method, and innodb_data_file_path. | Registered, Paid
MySQL Checks based on values of MySQL configuration variables | Checks the following settings: innodb_file_format, innodb_file_format_max, innodb_flush_method, and innodb_data_file_path. | Paid
MySQL Binary Logs checks, Local infile and SQL Mode checks | Warns about non-optimal settings for Binary Log, Local Infile, and SQL mode. | Registered, Paid
MySQL Configuration Check | Warns if parameters are not following Percona best practices for infile, replication threads, and replica checksum. | Paid
MySQL Users With Granted Public Networks Access | Notifies about MySQL accounts allowed to connect from public networks. | Registered, Paid
MySQL User Check | Runs a high-level check on user setup. | Registered, Paid
MySQL Advanced User Check | Runs a detailed check on user setup. | Paid
MySQL Security Check | Runs a detailed check on user setup. | Paid
MySQL Test Database | Returns a notice if there is a database named ‘test’ or ‘test_%’. | Registered, Paid
MySQL Version | Warns if the MySQL, Percona Server for MySQL, or MariaDB version is not the latest one. | Anonymous, Registered, Paid
PostgreSQL Archiver is Failing | Verifies if the archiver has failed. | Paid
PostgreSQL Cache Hit Ratio | Checks the database hit ratio and complains when it is too low. | Paid
PostgreSQL Configuration Change Requires Restart/Reload | Warns when a configuration was changed and requires a server restart/reload. | Registered, Paid
PostgreSQL fsync is Disabled | Shows an error if the fsync configuration is disabled, as this can result in unrecoverable data corruption. | Anonymous, Registered, Paid
PostgreSQL Autovacuum Logging Is Disabled | Notifies if autovacuum logging is disabled (log_autovacuum_min_duration set to -1). | Paid
PostgreSQL Checkpoints Logging Is Disabled | Notifies if the log_checkpoints configuration option is not enabled. | Paid
PostgreSQL Max_connections is too high | Notifies if the max_connections setting is set above 300. | Anonymous, Registered, Paid
PostgreSQL Stale Replication Slot | Warns about stale replication slots, since these can lead to WAL file accumulation and a DB server outage. | Paid
PostgreSQL Super Role | Notifies if there are users with superuser privileges. | Anonymous, Registered, Paid
PostgreSQL Version Check | Warns if the PostgreSQL minor or major version is not the latest, and shows an error if the major version is 9.4 or older. | Anonymous, Registered, Paid

More Advisors will be added, and Percona always welcomes contributions. Check out Percona Platform today!

 

May
26
2022
--

Is MySQL Statement-Based / Mixed Replication Really Safe?

The binary logging format in MySQL has been ROW by default since MySQL 5.7, yet there are still many users sticking with the STATEMENT or MIXED formats for various reasons. In some cases, there is just simple hesitation to change something that has worked for years on legacy applications. But in others, there may be serious blockers; most typically, missing primary keys in badly designed schemas, which would lead to serious performance issues on the replicas.

As a Support Engineer, I still see quite a few customers using the STATEMENT or MIXED formats, even on MySQL 8.0. In many cases this is OK, but recently I had to deal with a pretty nasty case where not using the ROW format caused the replicas to silently lose data updates, without raising any replication errors! Was it some really rare edge case? Not at all! Let me demonstrate with the very simple test case below how easy it is to end up in such a bad situation.

— source 

mysql> select @@binlog_format,@@system_time_zone;
+-----------------+--------------------+
| @@binlog_format | @@system_time_zone |
+-----------------+--------------------+
| STATEMENT       | BST                |
+-----------------+--------------------+
1 row in set (0.00 sec)

mysql> CREATE TABLE `test1` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `d` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    ->   `a` varchar(30) NOT NULL,
    ->   `name` varchar(25) DEFAULT NULL,
    ->   PRIMARY KEY (`a`),
    ->   UNIQUE KEY `id` (`id`)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected, 1 warning (0.02 sec)

mysql> insert into test1 values (null,now(),"test1",0);
Query OK, 1 row affected (0.00 sec)

mysql> insert into test1 values (null,now(),"test2",0);
Query OK, 1 row affected (0.01 sec)

mysql> insert into test1 values (null,now(),"test3",0);
Query OK, 1 row affected (0.01 sec)

mysql> select * from test1;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  1 | 2022-05-22 10:13:37 | test1 | 0    |
|  2 | 2022-05-22 10:13:37 | test2 | 0    |
|  3 | 2022-05-22 10:13:38 | test3 | 0    |
+----+---------------------+-------+------+
3 rows in set (0.00 sec)

— replica

mysql> select @@system_time_zone;
+--------------------+
| @@system_time_zone |
+--------------------+
| UTC                |
+--------------------+
1 row in set (0.00 sec)

mysql> select * from db1.test1;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  1 | 2022-05-22 09:13:37 | test1 | 0    |
|  2 | 2022-05-22 09:13:37 | test2 | 0    |
|  3 | 2022-05-22 09:13:38 | test3 | 0    |
+----+---------------------+-------+------+
3 rows in set (0.00 sec)

— source 

mysql> UPDATE test1 SET name = 'foobar', d = CURRENT_TIMESTAMP WHERE a = 'test1' AND d = '2022-05-22 10:13:37';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from test1;
+----+---------------------+-------+--------+
| id | d                   | a     | name   |
+----+---------------------+-------+--------+
|  1 | 2022-05-22 10:16:15 | test1 | foobar |
|  2 | 2022-05-22 10:13:37 | test2 | 0      |
|  3 | 2022-05-22 10:13:38 | test3 | 0      |
+----+---------------------+-------+--------+
3 rows in set (0.00 sec)

— replica

mysql> select * from db1.test1;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  1 | 2022-05-22 09:13:37 | test1 | 0    |
|  2 | 2022-05-22 09:13:37 | test2 | 0    |
|  3 | 2022-05-22 09:13:38 | test3 | 0    |
+----+---------------------+-------+------+
3 rows in set (0.00 sec)

mysql> pager egrep "Running|SQL_Error"
PAGER set to 'egrep "Running|SQL_Error"'

mysql> show replica status\G
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
               Last_SQL_Error: 
    Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
     Last_SQL_Error_Timestamp: 
1 row in set (0.00 sec)

 

The replica re-executed the UPDATE statement literally; because its system time zone differs from the source’s, the value stored in column d does not match the WHERE condition there, so the update was silently skipped while replication kept running without errors.

Another test, using UTC_TIME() compared against a column that was populated with the CURRENT_TIMESTAMP function:

— source 

mysql> select * from test1 WHERE TIME(d) > DATE_SUB(UTC_TIME(), INTERVAL 11 HOUR) AND id=3;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  3 | 2022-05-22 10:13:38 | test3 | 0    |
+----+---------------------+-------+------+
1 row in set (0.00 sec)

— replica

mysql> select * from test1 WHERE TIME(d) > DATE_SUB(UTC_TIME(), INTERVAL 11 HOUR) AND id=3;
Empty set (0.00 sec)

Therefore, when a similar condition is used for update:

— source

mysql> update test1 set name="bar" WHERE TIME(d) > DATE_SUB(UTC_TIME(), INTERVAL 11 HOUR) AND id=3;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from test1 where id=3;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  3 | 2022-05-22 22:12:15 | test3 | bar  |
+----+---------------------+-------+------+
1 row in set (0.01 sec)

— replica

mysql> select * from test1 where id=3;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  3 | 2022-05-22 09:13:38 | test3 | 0    |
+----+---------------------+-------+------+
1 row in set (0.01 sec)

mysql> show replica status\G
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
               Last_SQL_Error: 
    Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
     Last_SQL_Error_Timestamp: 
1 row in set (0.00 sec)

Again, the replica ignored the update, and no replication error was reported. This particular scenario can be expected to happen quite often in geographically-distributed database environments.

As the functions used here are not considered unsafe for replication, the two usual safety actions are not performed:

  • no warning is printed in the error log when the STATEMENT format is used
  • the event is not logged in ROW format when the MIXED format is used; the original statement is replicated as-is

I find this potentially dangerous and hence reported it: https://bugs.mysql.com/bug.php?id=107293

Summary

ROW-based replication has become the standard in MySQL and is the most reliable format. It is also the only one permitted for virtually synchronous replication solutions like Percona XtraDB Cluster/Galera and MySQL Group Replication.

At the same time, the STATEMENT or even MIXED formats may lead to data consistency issues that can go undetected for a long time, making them very difficult to investigate when a replication error finally happens as a result.

If there is anything that has prevented you from switching to ROW format yet, the sooner you deal with it the better.
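
Checking and switching the format is straightforward. A sketch (sessions opened before the change keep their old format until they reconnect; SET PERSIST is available from MySQL 8.0):

mysql> SELECT @@global.binlog_format;
mysql> SET GLOBAL binlog_format = 'ROW';
mysql> SET PERSIST binlog_format = 'ROW';  -- also survives a restart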

May
26
2022
--

Looking for an Excellent MySQL Book for Beginners? The MySQL Workshop is a Great Choice


Last week at Percona Live, I was asked what book I recommend to novices seeking to learn MySQL. For a long time, there has not been a good choice for modern versions of MySQL. Luckily, I had just stumbled upon such a book. Now I am happy to recommend The MySQL Workshop – A practical guide to working with data and managing databases with MySQL by Petit and Cosentino.

The first chapter introduces database architectures, data types, storage engines (including MyRocks), and data normalization. The following chapters cover in great detail how to create a database with MySQL Workbench, how to back up and restore data, and how to create indexes. Chapter four has a very good section on working with SQL, functions, and case statements. Then JOINs and stored procedures are covered.

In another book, that would probably be enough content, but later chapters plunge into using Node.JS, Access, and Excel with MySQL.

The last few chapters cover loading data from many file formats, JSON data usage, user permissions, and logical backups. While not a comprehensive MySQL DBA guide, this material is well done and complete.

That is a lot of content in seven hundred pages. The writing style is clear, and the numerous examples are well done. Bravo to the authors for their achievement in this book.

This may not be the reference text for a grizzled MySQL veteran but for real novices and intermediates, this book is a must-have.  This would be a great classroom text for an introduction to MySQL class or a handy hardcopy reference for someone on the initial climb of the learning curve.
