Percona Monitoring and Management in Kubernetes is now in Tech Preview

Percona Monitoring and Management in Kubernetes

Over the years, we have seen growing interest in running databases and stateful workloads in Kubernetes. With Container Storage Interfaces (CSI) maturing and more and more Operators appearing, running stateful workloads on your favorite platform is not that scary anymore. Our Kubernetes story at Percona started with Operators for MySQL and MongoDB, with PostgreSQL added later on.

Percona Monitoring and Management (PMM) is an open source database monitoring, observability, and management tool. It can be deployed as a virtual appliance or a Docker container. Our customers have long asked us for a way to deploy PMM in Kubernetes. We had an unofficial helm chart, created as a proof of concept by Percona teams and the community (GitHub).

We are introducing the Technical Preview of a helm chart, supported by Percona, to easily deploy PMM in Kubernetes. You can find it in our helm chart repository.

Use cases

Single platform

If Kubernetes is your platform of choice, you currently need a separate virtual machine to run Percona Monitoring and Management. Not anymore, with the introduction of the helm chart.

As you know, Percona Operators all integrate with PMM, which enables monitoring for databases deployed on Kubernetes. The Operators configure and deploy the pmm-client sidecar container and register the nodes on a PMM server. Bringing PMM into Kubernetes simplifies this integration and the whole flow: the network traffic between pmm-client and the PMM server no longer leaves the Kubernetes cluster at all.

All you have to do is set the correct endpoint in the pmm section of the Custom Resource manifest. For example, for Percona Operator for MongoDB, the pmm section will look like this:

  pmm:
    enabled: true
    image: percona/pmm-client:2.28.0
    serverHost: monitoring-service

Where monitoring-service is the service created by a helm chart to expose a PMM server.
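For context, here is a minimal sketch of where that section sits inside the Operator's Custom Resource. The surrounding fields (apiVersion, kind, and the cluster name) are illustrative; take them from your own cr.yaml:

```
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: my-cluster-name
spec:
  pmm:
    enabled: true
    image: percona/pmm-client:2.28.0
    serverHost: monitoring-service
```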

High availability

Percona Monitoring and Management has lots of moving parts: VictoriaMetrics to store time-series data, ClickHouse for query analytics functionality, and PostgreSQL to keep the PMM configuration. Right now, all these components are part of a single container or virtual machine, with Grafana as a frontend. To provide zero-downtime deployment in any environment, we would need to decouple these components, which would substantially complicate the installation and management of PMM.

What we offer instead right now are ways to automatically recover PMM within minutes in case of a failure (for example, by leveraging the EC2 self-healing mechanism).

Kubernetes is a control plane for container orchestration that automates manual tasks for engineers. When you run software in Kubernetes it is best if you rely on its primitives to handle all the heavy lifting. This is what PMM looks like in Kubernetes:

PMM in Kubernetes

  • StatefulSet controls the Pod creation
  • There is a single Pod with a single container with all the components in it
  • This Pod is exposed through a Service that is utilized by PMM Users and pmm-clients
  • All the data is stored on a persistent volume
  • ConfigMap has various environment variable settings that can help to fine-tune PMM

In case of a node or Pod failure, the StatefulSet recovers the PMM Pod automatically and remounts the Persistent Volume to it. The recovery time depends on the load of the cluster and node availability, but in normal operating environments the PMM Pod is up and running again within a minute.


Let’s see how PMM can be deployed in Kubernetes.

Add the helm chart:

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

Install PMM:

helm install pmm --set service.type="LoadBalancer" percona/pmm

You can now log in to PMM using the LoadBalancer IP address and a randomly generated password stored in a Secret object (the default user is admin).

The Service object created for PMM is called monitoring-service:

$ kubectl get services monitoring-service
NAME                 TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)         AGE
monitoring-service   LoadBalancer   <CLUSTER-IP>   <EXTERNAL-IP>   443:32591/TCP   3m34s

The admin password is stored in the pmm-secret Secret object. Retrieve and decode it:

$ kubectl get secrets pmm-secret -o yaml
apiVersion: v1
...

$ echo 'LE5lSTx3IytrUWBmWEhFTQ==' | base64 --decode && echo

Log in to PMM by connecting to https://<YOUR_PUBLIC_IP>.


A helm chart is a template engine for YAML manifests, and it allows users to customize the deployment. You can see the various parameters you can set to fine-tune your PMM installation in our README.

For example, to choose another storage class and set the desired storage size, set the following flags:

helm install pmm percona/pmm \
--set storage.storageClassName="premium-rwo" \
--set storage.size="20Gi"

You can also change these parameters in values.yaml and use the -f flag:

# values.yaml contents
storage:
  storageClassName: "premium-rwo"
  size: 20Gi

helm install pmm percona/pmm -f values.yaml


For most maintenance tasks, regular Kubernetes techniques apply. Let’s review a couple of examples.

Compute scaling

It is possible to vertically scale PMM by adding or removing resources through the resources variable in values.yaml. For example, to set requests of 4Gi of memory and two CPUs, with limits of 8Gi and four CPUs:

resources:
  requests:
    memory: "4Gi"
    cpu: "2"
  limits:
    memory: "8Gi"
    cpu: "4"

Once done, upgrade the deployment:

helm upgrade -f values.yaml pmm percona/pmm

This will restart the PMM Pod, so plan it carefully to avoid disrupting your team’s work.

Storage scaling

This depends a lot on your storage interface capabilities and the underlying implementation. In most clouds, Persistent Volumes can be expanded. You can check if your storage class supports it by describing it:

kubectl describe storageclass standard
AllowVolumeExpansion:  True
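If expansion is not enabled, it can be set at the StorageClass level. A sketch, assuming a CSI-backed class (the provisioner shown here is GKE's and is only an example; use your own):

```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: pd.csi.storage.gke.io
allowVolumeExpansion: true
```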

Unfortunately, just changing the size of the storage in values.yaml (storage.size) will not do the trick and you will see the following error:

helm upgrade -f values.yaml pmm percona/pmm
Error: UPGRADE FAILED: cannot patch "pmm" with kind StatefulSet: StatefulSet.apps "pmm" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

We use a StatefulSet object to deploy PMM, and StatefulSets are mostly immutable: only a handful of fields can be changed on the fly. There is a trick, though.

First, delete the StatefulSet, but keep the Pods and PVCs:

kubectl delete sts pmm --cascade=orphan

Recreate it again with the new storage configuration:

helm upgrade -f values.yaml pmm percona/pmm

It will recreate the StatefulSet with the new storage size configuration. 

Next, edit the Persistent Volume Claim manually and change the storage size (the name of the PVC may be different for you). You need to change the storage value in the spec.resources.requests.storage section:

kubectl edit pvc pmm-storage-pmm-0
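The part of the PVC spec to change looks like this (using the 20Gi size from the earlier values.yaml example):

```
spec:
  resources:
    requests:
      storage: 20Gi
```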

The PVC is not resized yet and you can see the following message when you describe it:

kubectl describe pvc pmm-storage-pmm-0
  Type                      Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----                      ------  -----------------                 ------------------                ------  -------
  FileSystemResizePending   True    Mon, 01 Jan 0001 00:00:00 +0000   Thu, 16 Jun 2022 11:55:56 +0300           Waiting for user to (re-)start a pod to finish file system resize of volume on node.

The last step would be to restart the Pod:

kubectl delete pod pmm-0


Running helm upgrade is the recommended way to upgrade, either when a new helm chart is released or when you want to move to a newer version of PMM by replacing the tag in the image section.

Backup and restore

PMM stores all its data on a Persistent Volume. As mentioned before, regular Kubernetes techniques can be applied here to back up and restore the data. There are numerous options:

  • Volume Snapshots – check if they are supported by your CSI driver and storage implementation
  • Third-party tools, like Velero, can handle the backups and restores of PVCs
  • Snapshots provided by your storage (e.g., AWS EBS Snapshots) with manual mapping to the PVC during restoration
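As a sketch of the first option, a VolumeSnapshot manifest for the PMM volume could look like the following. The snapshot class name is an assumption and depends on your CSI driver; the PVC name matches the one used earlier in this post:

```
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pmm-backup
spec:
  volumeSnapshotClassName: csi-snapclass  # depends on your CSI driver
  source:
    persistentVolumeClaimName: pmm-storage-pmm-0
```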

What is coming

To keep you excited, here are some of the things we are working on or have planned to enable further Kubernetes integrations.

OpenShift support

We are working on building a rootless container so that OpenShift users can run Percona Monitoring and Management there without having to grant elevated privileges. 

Microservices architecture

This is something that we have been discussing internally for some time now. As mentioned earlier, there are lots of components in PMM. To enable proper horizontal scaling, we need to break down our monolith container and run these components as separate microservices. 

Managing all these separate containers and Pods (if we talk about Kubernetes) would require coming up with separate maintenance strategies. This raises the question of creating a separate Operator just to manage PMM, but that is a significant effort. If you have an opinion here, please let us know on our community forum.

Automated k8s registration in DBaaS

As you know, Percona Monitoring and Management comes with a technical preview of Database as a Service (DBaaS). Right now, when you install PMM on a Kubernetes cluster, you still need to register the cluster to deploy databases. We want to automate this process so that after the installation you can start deploying and managing databases right away.


Percona Monitoring and Management enables database administrators and site reliability engineers to pinpoint issues in their open source database environments, whether it is a quick look through the dashboards or a detailed analysis with Query Analytics. Percona’s support for PMM on Kubernetes is a response to the needs of our customers who are transforming their infrastructure.



Percona Platform and Percona Account Benefits

Percona Platform and Percona Account

On the 15th of April, Percona introduced the general availability of Percona Platform, and with that step, our company started a new, interesting period.

Popular questions we have received after this date include: 

  • What is Percona Platform, and how does it differ from Percona Account? 
  • What is the difference between Percona Platform and Percona Portal? 
  • Why should Percona users connect to Percona Platform?

In this blog post, I will answer these questions and provide a summary of the benefits of a Percona Account. 

What is Percona Platform, and how does it differ from Percona Account? 

First, let’s make sure we understand the following concepts:

Percona Platform – A unified experience for developers and database administrators to monitor, manage, secure, and optimize database environments on any infrastructure. It includes open source software and tools, access to Percona Expert support, automated insights, and self-service knowledge base articles.

Percona Account – A single login provides users with seamless access across all Percona properties, as well as insights into your database environment, knowledge base, support tickets, and guidance from Percona Experts. By creating a Percona Account, you get access to Percona resources like Forums, Percona Portal, ServiceNow (for current customers), and, in the future, other Percona resources (www.percona.com, Jira, etc.).

Percona Portal – One location for your Percona relationship, with a high-level overview of your subscription and key resources like the knowledge base, support tickets, and the ability to access your account team from anywhere.

Next, let’s try to understand the complete flow users will go through when they decide to start using Percona Platform. A Percona Account is foundational to Percona Platform. Users can create their own account on Percona Portal. There are two options for Percona Account creation: via an email and password, or by using a social login from Google or GitHub.

What is the difference between Percona Platform and Percona Portal? 

As described above, Percona Platform is the complete experience of open source software, services, tools, and resources that Percona offers to all users. Percona Portal is just one part of Percona Platform and can be thought of as its core component, where registered users discover the value Percona offers.

Why should Percona users connect to Percona Platform?

Percona Platform brings together enterprise-ready distributions of MySQL, PostgreSQL, and MongoDB and a range of open source tools for database monitoring, backup, and management, making it easier to run complex database environments.

Percona Platform consists of several essential units: 

  • Percona Account
  • Percona Monitoring & Management, which provides alerting, advisor insights for your environments, backups, private DBaaS, etc.
  • Percona Portal
  • Support and services

Percona Monitoring and Management, DBaaS 

The heart of Percona Platform is Percona Monitoring and Management (PMM), which provides the observability required to understand database health while offering actionable insights to remediate database incidents or performance issues. 

PMM has already won the hearts of millions of DBAs around the world with its Query Analytics (QAN) and Advisors, and now, with the Percona Platform release, we have made it even cooler! 

The new technical preview feature that I’m talking about is Private DBaaS. 

Taking less than 20 minutes to configure, Percona Private DBaaS enables you to provide self-service databases to your internal teams in any environment. 

Percona Private DBaaS enables developers to create and manage database clusters through a familiar UI and a well-documented API, providing completely open source, ready-to-use clusters with MySQL or MongoDB that contain all the necessary Kubernetes settings. Each cluster is set up with high availability, load balancing, and essential monitoring and management.

The Private DBaaS feature of PMM is free from vendor lock-in and does not enforce any cloud or on-premises infrastructure usage restrictions.

Learn more about the DBaaS functionality in our documentation.

Percona Portal 

Percona Portal is a simple UI to check and validate your relationship with Percona software and services. Stay tuned for updates. 


To start getting value from Percona Portal, just create an organization in the Portal and connect Percona Monitoring & Management using a Percona Account. Optionally, you can also invite other members to this organization to share a common view of your organization, any PMM connection details, a list of current organization members, etc. 

When existing Percona customers create their accounts with corporate email, Percona Portal already knows about their organization. They are automatically added as members and can see additional details about their account, like opened support tickets, entitlements, and contact details.

Percona Portal

Percona Account and SSO

Percona Platform is a combination of open source software and services. You can start discovering the software advantages right away by installing PMM and monitoring and managing your environments. You may ask yourself: what is the role of the Platform connection, then? Let’s briefly clarify this.

SSO is the first advantage you will see upon connecting your PMM instance to Percona Platform. All users in the organization are able to sign in to PMM using their Percona Account credentials. They will also be granted the same roles in PMM and Portal. For example, if a user is an admin on Percona Portal, they do not need a new user created in PMM; they will automatically be added as an admin in PMM as well. Portal technical users will be granted a Viewer role in PMM.

Let’s assume a Percona Platform user has already created an account, has their own organization with a couple of members in it, and has also connected PMM to the Platform with a Percona Account. They are then also able to use SSO for that organization. 

Percona Advisors

Advisors provide automated insights and recommendations within Percona Monitoring and Management, ensuring your database performs at its best. Constantly evolving with technology and user feedback, the Advisors check for:

  • Availability at risk,
  • Replication inconsistencies,
  • Durability at risk,
  • Passwordless users,
  • Insecure connections,
  • Unstable OS configuration,
  • Available performance improvements and more.

The Advisors workflow consists of the following:

  1. If the user has just started PMM and set up monitoring of some databases and environments, we provide them with a set of basic Advisors. To check the list, the user should go to the Advisors section in PMM.
    Percona Advisors
  2. After the user connects PMM and Platform on Percona Portal, this list gets bigger and the set of advisors is updated. 
  3. The final step happens with a Percona Platform subscription. If a user subscribes to Percona Platform, we provide the most advanced list of Advisors, which cover a lot of edge cases and check database environments.

See the full list of Advisors and the various tiers in our documentation.

As an open source company, we are always eager to engage the community and invite them to innovate with us. 

Here you can find the developer guides on how to create your own Advisors.

How to register a Percona Account?

There are several options for creating a Percona Account. The first is to specify your basic data: email, last name, first name, and password.

  1. Go to “Don’t have an account? Create one” on the landing page for Percona Portal. 
  2. Enter a minimal set of information (work email, password, first and last name), confirm that you agree with Percona’s Terms of Service and the Privacy Policy agreement, and hit Create.
  3. To complete your registration, check your email and follow the steps provided there. Click the confirmation link to activate your account. You may see messages coming from Okta; this is because Percona Platform uses Okta as an identity provider.
  4. After you have confirmed your account creation, you can log in to Percona Portal.

You can also register a Percona Account if you already have a Google or GitHub account.

When you create a Percona Account with Google or GitHub, Percona will store only your email and authenticate you using data from the given identity provider.

  1. Make sure you have an account on Google/GitHub.
  2. On the login page, there are dedicated buttons that will instruct you to confirm account information usage by Percona.
  3. Confirm you are signing in to your account if 2FA is enabled for your Google account.
  4. Hurray! Now you have an active account on Percona Portal.

The next option for registering a Percona Account is to Continue with GitHub. The process is similar to Google’s. 

  1. An important precondition has to be met prior to registration and account usage: set your email address in GitHub to Public.
  2. From Settings, go to the Emails section and uncheck the Keep my email address private option. These settings will be saved automatically. If you do not want to change this option in GitHub, you can use another account, like Google, or create an account with your email instead.
  3. Once you have registered an account, you are ready to go with Percona Portal. Use your Google/GitHub credentials every time you log in to your account.


Start your Percona Platform experience with Percona Monitoring & Management, an open source database monitoring, management, and observability solution for MySQL, PostgreSQL, and MongoDB.

Set up Percona Monitoring and Management (version >2.27.0) following the PMM documentation.

The next step is to start exploring Percona Platform and get more value and experience from Percona database experts by creating a Percona Account on Percona Portal. The key benefits of Percona Platform include: 

  • A Percona Account as a single authentication mechanism to use across all Percona resources,
  • Access to Percona content like blogs, forums, and the knowledge base,
  • Percona Advisors that help optimize, manage, and monitor database environments.

Once you create a Percona account, you will get basic advisors, open source software, and access to community support and forums.

When you connect Percona Monitoring and Management to your Percona Account, you will get a more advanced set of advisors for your system’s security, configuration, performance, and data design.

A Percona Platform subscription offers even more advanced advisors for Percona customers, along with a fully supported software experience, private DBaaS, and an assigned Percona Expert to ensure your success with the Platform. 

Please also consider sharing your feedback and experience with Percona on our Forums.

Visit Percona Platform online help to view Platform-related documentation, and follow our updates on the Platform what’s new page.


Monitoring MongoDB Collection Stats with Percona Monitoring and Management

Monitoring MongoDB Collection Stats

One approach to getting to know a MongoDB system we are not familiar with is to start by checking the busiest collections. MongoDB provides the top administrative command for this purpose.

From the mongo shell, we can run db.adminCommand("top") to get a snapshot of all the collections at a specific point in time:

		"test.testcol" : {
			"total" : {
				"time" : 17432,
				"count" : 58
			},
			"readLock" : {
				"time" : 358,
				"count" : 57
			},
			"writeLock" : {
				"time" : 17074,
				"count" : 1
			},
			"queries" : {
				"time" : 100,
				"count" : 1
			},
			"getmore" : {
				"time" : 0,
				"count" : 0
			},
			"insert" : {
				"time" : 17074,
				"count" : 1
			},
			"update" : {
				"time" : 0,
				"count" : 0
			},
			"remove" : {
				"time" : 0,
				"count" : 0
			},
			"commands" : {
				"time" : 0,
				"count" : 0
			}
		},

In the extract above we can see some details about the testcol collection. For each operation type, we have the amount of server time spent (measured in microseconds), and a counter for the number of operations issued. The total section is simply the sum for all operation types against the collection.

In this case, I executed a single insert and then a find command to check the results, hence the counters are one for each of those operations.

So by storing samples of the above values in a set of metrics, we can easily see the trends and historical usage of a MongoDB instance.
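As a quick sanity check of these semantics, the average time per operation is simply time divided by count. For the readLock numbers from the extract above:

```shell
# readLock: 358 microseconds of server time over 57 operations
echo $((358 / 57))   # -> 6 microseconds per read-lock operation (integer division)
```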

Top Metrics in Percona Monitoring and Management (PMM)

PMM 2.26 includes updates to mongodb_exporter, so we now have the ability to get metrics using the output of the MongoDB top command.

For example:

MongoDB Percona Monitoring and Management

As you can see, the metric names correlate with the output of the top command we saw. In this release, the database name and collection name are part of the metric name (this is subject to change in the future based on community feedback).

Collection and Index Stats

In addition to this, PMM 2.26 also includes the ability to gather database, collection, and index statistics. Using these metrics we can monitor collection counts, data growth, and index usage over time.

For example, we can collect the following information about a database:

PMM Collection and Index Stats

And here are the colstats metrics for the test.testcol collection:

PMM metrics

Note: By default, PMM won’t collect this information if you have more than 200 collections (this number is subject to change). The reason is to avoid too much overhead in metrics collection, which may adversely affect performance. You can override this behavior by using the --max-collections-limit option.

Enabling the Additional Collectors

The PMM client starts by default with only the diagnosticdata and replicasetstatus collectors enabled. We can run the PMM client with the --enable-all-collectors argument to make all the new metrics available. For example:

pmm-admin add mongodb --username=mongodb_exporter --password=percona --host= --port=27017 --enable-all-collectors

If we want to override the limit mentioned above, use something like this:

pmm-admin add mongodb --username=mongodb_exporter --password=percona --host= --port=27017 --enable-all-collectors --max-collections-limit=500

We also have the ability to enable only some of the extra collectors. For example, if you want all collectors except topmetrics, specify:

pmm-admin add mongodb --username=mongodb_exporter --password=percona --host= --port=27017 --enable-all-collectors --disable-collectors=topmetrics

We can also filter which databases and collections we want metrics for. The optional argument --stats-collections can be set with a namespace.

For example:

--stats-collections=test (get data for all collections in the test db)
--stats-collections=test.testcol (get data only from the testcol collection of the test db)

Also, check the documentation page for more information.

Creating Dashboards

We can easily create dashboards using the newly available metrics. For example, to see the index usage, we can use the following PromQL expression:

{__name__ =~ "mongodb_.*_accesses_ops"}

This is what the dashboard would look like:

MongoDB dashboard



We can use these new metrics to create dashboards with the collections most written to (or most read from). We can also see what the characteristics of the workload are at a glance to determine if we are dealing with a read-intensive, mixed, or write-intensive system.
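For instance, a “most written to” panel could be sketched with a PromQL expression like the one below. Note the metric name pattern is an assumption based on the naming shown above and may differ in your PMM version:

```
topk(10, rate({__name__ =~ "mongodb_.*_writeLock_time"}[5m]))
```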

One of the most important uses for these new collectors and metrics is performance tuning. By knowing more about your top or “hottest” collections, you will likely be able to better tune your queries and indexing to improve overall performance.

Keep in mind these features are currently a technical preview, so they are disabled by default.

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today


PMM Now Supports Monitoring of PostgreSQL Instances Connecting With Any Database (Name)

Monitoring of PostgreSQL Instances

The recent release of Percona Monitoring and Management 2.25.0 (PMM) includes a fix for bug PMM-6937: before that, PMM expected all monitoring connections to PostgreSQL servers to be made using the default postgres database. This worked well for most deployments; however, some DBaaS providers, like Heroku and DigitalOcean, do not provide direct access to the postgres database (at least not by default) and instead create custom databases for each of their instances. Starting with this new release of PMM, it is possible to specify the name of the database the Postgres exporter should connect to, both through the pmm-admin command-line tool and when configuring the monitoring of a remote PostgreSQL instance directly in the PMM web interface.

Secure Connections

Most DBaaS providers enforce or even restrict access to their instances to SSL (well, in fact, TLS) client connections, and that’s a good thing! Just pay attention to this detail when you configure the monitoring of these instances in PMM. If, when you configure your instance, the system returns a red-box alert saying:

Connection check failed: pq: no pg_hba.conf entry for host “”, user “pmm”, database “mydatabase”, SSL off.

and your firewall allows external connections to your database, chances are the problem is in the last part: “SSL off”.

Web Interface

When configuring the monitoring of a remote PostgreSQL instance in the PMM web interface, make sure you tick the box for using TLS connections:

PMM TLS connections

But note that checking this box is not enough; you have to complement this setting with one of two options:

a) Provide the TLS certificates and key, using the forms that appear once you click on the “Use TLS” check-box:

  • TLS CA
  • TLS certificate key
  • TLS certificate

The PMM documentation explains this further and more details about server and client certificates for PostgreSQL connection can be found in the SSL Support chapter of its online manual.

b) Opt to not provide TLS certificates, leaving the forms empty and checking the “Skip TLS” check-box:

What should drive this choice is the SSL mode the PostgreSQL server requires for client connections. When DBaaS providers enforce secure connections, this usually means sslmode=require, and for this, option b above is enough. Only when the PostgreSQL server employs the more restrictive verify-ca or verify-full modes will you need to go with option a and provide the TLS certificates and key.

Command-Line Tool

When configuring the monitoring of a PostgreSQL server using the pmm-admin command-line tool, the same options are available and the PMM documentation covers them as well. Here’s a quick example to illustrate configuring the monitoring for a remote PostgreSQL server when sslmode=require and providing a custom database name:

$ pmm-admin add postgresql --server-url=https://admin:admin@localhost:443 --server-insecure-tls --host=my.dbaas.domain.com --port=12345 --username=pmmuser --password=mYpassword --database=customdb --tls --tls-skip-verify

Note that option server-insecure-tls applies to the connection with the PMM server itself; the options prefixed with tls are the ones that apply to the connection to the PostgreSQL database.

Tracking Stats

You may have also noticed in the release notes that PMM 2.25.0 provides support for the release candidate of our own pg_stat_monitor. For now, however, it is unlikely this extension will be made available by DBaaS providers, so you need to stick with pg_stat_statements.


Make sure this extension is loaded for your instance’s custom database; otherwise, PMM won’t be able to track many important stats. That is already the case for all non-hobby Heroku Postgres plans, but for other providers, like DigitalOcean, it needs to be created beforehand:

defaultdb=> select count(*) from pg_stat_statements;
ERROR:  relation "pg_stat_statements" does not exist
LINE 1: select count(*) from pg_stat_statements;
defaultdb=> CREATE EXTENSION pg_stat_statements;

defaultdb=> select count(*) from pg_stat_statements;
(1 row)
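Once the extension exists, you can also query it directly to spot expensive statements. A sketch (the column names assume PostgreSQL 13+, where total_time was split into total_exec_time; on older versions use total_time):

```
SELECT queryid, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```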

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today


Custom Percona Monitoring and Management Metrics in MySQL and PostgreSQL

A few weeks ago, we did a live stream talking about Percona Monitoring and Management (PMM) and showcased some of the fun things we were doing at the OSS Summit. During the live stream, we tried to enable some custom queries to track the number of comments being added to our movie database example. We ran into a bit of a problem live and did not get it to work. As a result, I wanted to follow up, show you all how to add your own custom metrics to PMM, and point out some gotchas to avoid when building them.

Custom metrics are defined in a file deployed on each server you are monitoring (not on the PMM server itself). You can add custom metrics by navigating to one of the following paths:

  • For MySQL:  /usr/local/percona/pmm2/collectors/custom-queries/mysql
  • For PostgreSQL:  /usr/local/percona/pmm2/collectors/custom-queries/postgresql
  • For MongoDB:  This feature is not yet available – stay tuned!

You will notice the following directories under each directory:

  • high-resolution/  – every 5 seconds
  • medium-resolution/ – every 10 seconds
  • low-resolution/ – every 60 seconds

Note that you can change the frequency of the default metric collections up or down in PMM's settings.  It would be ideal if, in the future, we added a resolution config in the YML file directly, but for now it is a universal setting:

Percona Monitoring and Management metric collections

In each directory you will find an example .yml file with a format like the following:

  mysql_oss_demo:
    query: "select count(1) as comment_cnt from movie_json_test.movies_normalized_user_comments;"
    metrics:
      - comment_cnt:
          usage: "GAUGE"
          description: "count of the number of comments coming in"

Our error during the live stream was that we forgot to qualify the table with the database in our query (i.e. database_name.table_name), and a bug prevented us from seeing the error in the log files.  There is no setting for the database in the YML, so take note.

This will create a metric named mysql_oss_demo_comment_cnt in whatever resolution you specify.  Each YML file executes separately with its own connection.  This is important to understand: if you deploy lots of custom queries, you will see a steady number of extra connections (something to consider when doing custom collections).  Alternatively, you can add several queries and metrics to the same file, but they are executed sequentially.  If the entire YML file cannot complete in less time than the defined resolution (i.e. finish within five seconds for high resolution), the data will not be stored, but the query will continue to run.  This can lead to a query pile-up if you are not careful.  For instance, the above query generally takes 1-2 seconds to return the count, so I placed it in the medium bucket.  As I added load to the system, the query time backed up.

You can see the slowdown.  You need to be careful here and choose the appropriate resolution.  Moving this over to the low resolution solved the issue for me.
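To reason about bucket choice, here is a small illustrative Python sketch. The directory names and intervals come from the list above; the 2x safety factor is my own assumption, not a PMM rule:

```python
# Map each PMM custom-query directory to its collection interval (seconds).
RESOLUTIONS = {
    "high-resolution": 5,
    "medium-resolution": 10,
    "low-resolution": 60,
}

def pick_bucket(avg_runtime_s, safety_factor=2.0):
    """Return the most frequent bucket whose interval still exceeds the
    query's runtime by the safety factor, or None if even the
    low-resolution bucket is too fast for this query."""
    for name, interval in sorted(RESOLUTIONS.items(), key=lambda kv: kv[1]):
        if avg_runtime_s * safety_factor <= interval:
            return name
    return None
```

A query averaging one second fits the high-resolution bucket, a four-second query lands in medium, and an eight-second query should go to low resolution, matching what we observed above.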

That said, query response time is dynamic based on the conditions of your server.  Because these queries will run to completion (and in parallel if the run time is longer than the resolution time), you should consider limiting the query time in MySQL and PostgreSQL to prevent too many queries from piling up.

In MySQL you can use the MAX_EXECUTION_TIME optimizer hint (note the value is in milliseconds):

mysql>  select /*+ MAX_EXECUTION_TIME(4) */  count(1) as comment_cnt from movie_json_test.movies_normalized_user_comments ;
ERROR 1317 (70100): Query execution was interrupted

And on PostgreSQL you can use:

SET statement_timeout = '4s'; 
select count(1) as comment_cnt from movies_normalized_user_comments ;
ERROR:  canceling statement due to statement timeout

By forcing a timeout you can protect yourself.  That said, these are “errors” so you may see errors in the error log.
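The same guard rail can be sketched generically. Here is a rough Python analogue using SQLite's progress handler to interrupt a runaway query; SQLite stands in for MySQL/PostgreSQL purely for illustration:

```python
import sqlite3
import time

def run_with_timeout(conn, sql, timeout_s):
    """Interrupt a query after timeout_s seconds of wall-clock time,
    a rough analogue of MAX_EXECUTION_TIME / statement_timeout.
    Returns the rows, or None if the query was interrupted."""
    deadline = time.monotonic() + timeout_s
    # The handler runs every N virtual-machine instructions; a non-zero
    # return value aborts the running statement.
    conn.set_progress_handler(
        lambda: 1 if time.monotonic() > deadline else 0, 1000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        return None  # interrupted, like ERROR 1317 / statement timeout
    finally:
        conn.set_progress_handler(None, 0)
```

An infinite query (for example a recursive CTE with no limit) returns None after the deadline instead of piling up behind the next collection.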

You can check the system logs (syslog or messages) for errors with your custom queries (note that as of PMM 2.0.21, errors were not making it into these logs because of a potential bug).  If the data is being collected and everything is set up correctly, head over to the default Grafana Explore view or the “Advanced Data Exploration” dashboard in PMM.  Look for your metric and you should be able to see the data graphed out:

Advanced Data Exploration PMM

In the above screenshot, you will notice some pretty big gaps in the data (in green).  These gaps were caused by our query taking longer than the resolution bucket.  You can see when we moved to 60-second resolution (in orange), the graphs filled in.



Inspecting MySQL Servers Part 5: Percona Monitoring and Management

Inspecting MySQL Servers PMM

In the previous posts of this series, I presented how the Percona Support team approaches the analysis and troubleshooting of a MySQL server using a tried-and-tested method supported by specific tools found in the Percona Toolkit:

Inspecting MySQL Servers Part 1: The Percona Support Way

Inspecting MySQL Servers Part 2: Knowing the Server

Inspecting MySQL Servers Part 3: What MySQL?

Inspecting MySQL Servers Part 4: An Engine in Motion

A drawback of such an approach is that data collection is done in a “reactive” way, and (part of) it needs to be processed before we can interpret it. Enter Percona Monitoring and Management (PMM): PMM continually collects MySQL status variables and plots the metrics in easy-to-interpret Grafana graphs and panels. Plus, it includes a rich Query Analytics dashboard that helps identify the top slow queries and shows how they are executing. It makes an excellent complement to the approach we presented. In fact, it often takes the central role: we analyze the data available in PMM and, if necessary, look at complementing it with pt-stalk samples. In this post, I will show you how we can obtain much of the same information we got from the Percona Toolkit tools (and sometimes more) from PMM.

* As was the case in the previous posts in this series, the data and graphs used to illustrate this post do not come from a single server and have been captured using different versions of PMM.

Know the Server

Once you are connected to PMM, you can select the target server under the Node Name field in the menu located on the top-left side of the interface, then select PMM dashboards on the left menu, System (Node), and, finally, Node Summary, as shown in the screenshot below:

PMM Dashboard

The header section of the Node Summary page shows the basic hardware specs of the server as well as a few metrics and projections. On the right side of this section, you will find the full output of pt-summary, which we scrutinized extensively in the second post of this series, there waiting for you:

MySQL Node Summary
Below the header section, there are four panels dedicated to CPU, Memory, Disk, and Network, each containing graphics with specific metrics on each of these areas. It makes it easy, for example, to look at overall CPU utilization:

CPU utilization

Recent spikes in I/O activity:

And memory usage:

Note the graphs cover the last 12 hours of activity by default but you can select a different time range in the top-right menu:

What MySQL?

Taking a slightly different route by selecting MySQL instead of System (Node) and then MySQL Summary, we get to access a dashboard that displays MySQL-specific metrics for the selected instance:

MySQL Summary

Under the Service Summary panel, you will find the full output of pt-mysql-summary, which we reviewed in detail in the third post of this series:

The main goal of pt-mysql-summary is to provide a sneak peek into how MySQL is configured at a single point in time.  With PMM, you get instant access to most of the MySQL trends and status variables we only glance at in that report. We can look straight under the hood at the engine's characteristics while it is under load, anywhere from the last five minutes to the last 30 days or more!

An Engine in Motion

There is so much we can look at at this point. If we more or less follow the sequence observed in the previous posts, we can start by checking whether the table cache is big enough. The example below shows it to be just right, based on the limited time frame this particular sample covers, with an average hit ratio close to 100%:

MySQL Table Open Cache Status

Or we can look for a disruption in the pattern, such as a peak in threads connected:

And then investigate the effects it caused on the server (or was it already a consequence of something else that occurred?), for example, a change in the rate of temporary tables created at that time for both in-memory and on-disk tables:

The MySQL Instance Summary is just one of many dashboards available for MySQL:

Under the MySQL InnoDB Details dashboard we find many InnoDB-specific metrics plotted as a multitude of different graphs, providing a visual insight into things such as the number of requests that can be satisfied from data that is already loaded in the Buffer Pool versus those that must be first read from disk (does my hot data fit in memory?):

InnoDB Buffer Pool Requests

Besides MySQL status variable metrics, there is also data parsed directly from SHOW ENGINE INNODB STATUS. For instance, we can find long-running transactions based on increasing values of InnoDB's history list length:

Another perk of PMM is the ability to easily evaluate whether redo log space is big enough based on the rate of writes versus the size of the log files:

And thus observe checkpoint age, a concept that is explained in detail for PMM in How to Choose the MySQL innodb_log_file_size:
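As a back-of-the-envelope version of that evaluation (my own sketch, not PMM's actual computation), you can divide total redo capacity by the redo write rate; a commonly cited rule of thumb is sizing the redo log to absorb roughly an hour of peak writes:

```python
def redo_coverage_minutes(log_file_size_bytes, log_files_in_group,
                          redo_bytes_per_sec):
    """How many minutes of redo writes the log space can absorb before
    InnoDB must checkpoint. A common rule of thumb is to size the redo
    log for about an hour of peak write activity."""
    capacity = log_file_size_bytes * log_files_in_group
    return capacity / redo_bytes_per_sec / 60.0
```

For example, two 512MB log files under 1MB/s of redo writes give only about 17 minutes of coverage, a hint that the redo space may be undersized for that workload.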

Another evaluation made easy with PMM is whether a server’s workload is benefitting from having InnoDB’s Adaptive Hash Index (AHI) enabled. The example below shows an AHI hit-ratio close to 100% up to a certain point, from which the number of searches increased and the situation inverted:

The evaluation of settings like the size of the redo log space and the efficiency of AHI should be done at a macro level, spanning days: we should be looking for what is the best general configuration for these. However, when we are investigating a particular event, it is important to zoom in on the time frame where it occurred to better analyze the data captured at the time. Once you do this, change the data resolution from the default of auto to 1s or 5s interval/granularity so you can better see spikes and overall variation: 

QAN: Query Analytics

Query analysis is something I only hinted at but didn’t explore in the first articles in this series. The “manual” way requires processing the slow query log with a tool such as pt-query-digest and then going for details about a particular query by connecting to the server to obtain the execution plan and schema details. A really strong feature of PMM is the Query Analytics dashboard, which provides a general overview of query execution and captures all information about it for you. 

The example below comes from a simple sysbench read-write workload on my test server:

PMM Query Analytics

We can select an individual query on the list and check the details of its execution:

The query’s  EXPLAIN plan is also available, both in classic and JSON formats:

You can read more about QAN on our website as well as in other posts on our blog platform, such as How to Find Query Slowdowns Using Percona Monitoring and Management.

What PMM Does Not Include

There remains information/data we cannot obtain from PMM, such as the full output of SHOW ENGINE INNODB STATUS. For situations when obtaining this information is important, we resort back to pt-stalk. It is not one or the other, we see them as complementary tools in our job of inspecting MySQL servers. 

If you are curious about PMM and would like to see how it works in practice, check our demo website at https://pmmdemo.percona.com/. To get up and running with PMM quickly, refer to our quickstart guide.

Tuning the Engine for the Race Track

There you have it! It certainly isn't all there is, but we've got a lot packed into this series, enough to get you moving in the right direction when it comes to inspecting and troubleshooting MySQL servers. I hope you have enjoyed the journey and learned a few new tricks along the way!


MongoDB Integrated Alerting in Percona Monitoring and Management

MongoDB Integrated Alerting

Percona Monitoring and Management (PMM) recently introduced the Integrated Alerting feature as a technical preview. This was a very eagerly awaited feature, as PMM no longer needs to integrate with an external alerting system. We recently blogged about the release of this feature.

PMM includes some built-in templates, and in this post, I am going to show you how to add your own alerts.

Enable Integrated Alerting

The first thing to do is navigate to the PMM Settings by clicking the wheel on the left menu, and choose Settings:

Next, go to Advanced Settings, and click on the slider to enable Integrated Alerting down in the “Technical Preview” section.

While you’re here, if you want to enable SMTP or Slack notifications you can set them up right now by clicking the new Communications tab (which shows up after you hit “Apply Changes” turning on the feature).

The example below shows how to configure email notifications through Gmail:

You should now see the Integrated Alerting option in the left menu under Alerting, so let’s go there next:

Configuring Alert Destinations

After clicking on the Integrated Alerting option, go to the Notification Channels to configure the destination for your alerts. At the time of this writing, email via your SMTP server, Slack and PagerDuty are supported.

Creating a Custom Alert Template

Alerts are defined using MetricsQL, which is backward compatible with PromQL (the Prometheus query language). As an example, let's configure an alert to let us know if MongoDB is down.

First, let’s go to the Explore option from the left menu. This is the place to play with the different metrics available and create the expressions for our alerts:

To identify MongoDB being down, one option is to use the up metric. The following expression gives us the alert we need:

up{service_type="mongodb"}

To validate this, I shut down a member of a 3-node replica set and verified that the expression returns 0 when the node is down:

The next step is creating a template for this alert. I won’t go into a lot of detail here, but you can check Integrated Alerting Design in Percona Monitoring and Management for more information about how templates are defined.

Navigate to the Integrated Alerting page again, and click on the Add button, then add the following template:

  templates:
    - name: MongoDBDown
      version: 1
      summary: MongoDB is down
      expr: |-
        up{service_type="mongodb"} == 0
      severity: critical
      annotations:
        summary: MongoDB is down ({{ $labels.service_name }})
        description: |-
          MongoDB {{ $labels.service_name }} on {{ $labels.node_name }} is down

This is how it looks:
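The {{ $labels.* }} placeholders in the annotations are filled from the alert's labels when a notification is rendered. Here is a rough Python illustration of that substitution (my own mock, not PMM's actual template engine, which uses Go templates):

```python
import re

def render(template, labels):
    """Replace {{ $labels.name }} placeholders with values from a
    labels dict, a simplified stand-in for the Go templating PMM uses."""
    return re.sub(
        r"\{\{\s*\$labels\.(\w+)\s*\}\}",
        lambda m: labels.get(m.group(1), ""),
        template,
    )
```

Rendering the description above with service_name "mongo1" and node_name "node1" yields "MongoDB mongo1 on node1 is down".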

Next, go to the Alert Rules and create a new rule. We can use the Filters section to add comma-separated “key=value” pairs to filter alerts per node, per service, per agent, etc.

For example: node_id=/node_id/123456, service_name=mongo1, agent_id=/agent_id/123456

After you are done, hit the Save button and go to the Alerts dashboard to see if the alert is firing:

From this page, you can also silence any firing alerts.

If you configured email as a destination, you should have also received a message like this one:

For now, a single notification is sent. In the future, it will be possible to customize the behavior.

Creating MongoDB Alerts

In addition to the obvious “MongoDB is down” alert, there are a couple more things we should monitor. For starters, I’d suggest creating alerts for the following conditions:

  • Replica set member in an unusual state
mongodb_replset_member_state != 1 and mongodb_replset_member_state != 2

  • Connections higher than expected
avg by (service_name) (mongodb_connections{state="current"}) > 5000

  • Cache evictions higher than expected
avg by(service_name, type) (rate(mongodb_mongod_wiredtiger_cache_evicted_total[5m])) > 5000

  • Low WiredTiger tickets
avg by(service_name, type) (max_over_time(mongodb_mongod_wiredtiger_concurrent_transactions_available_tickets[1m])) < 50

The values listed above are just for illustrative purposes, you need to decide the proper thresholds for your specific environment(s).
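To get a feel for how an avg-by-service rule like the connections alert behaves, here is a tiny Python mock of the evaluation (hypothetical sample data; in reality PromQL computes this server-side):

```python
from collections import defaultdict

def firing_services(samples, threshold):
    """Given (service_name, value) samples, return the services whose
    average value exceeds the threshold: a rough analogue of
    avg by (service_name) (metric) > threshold."""
    totals = defaultdict(lambda: [0.0, 0])
    for service, value in samples:
        totals[service][0] += value
        totals[service][1] += 1
    return sorted(s for s, (total, n) in totals.items()
                  if total / n > threshold)
```

With samples [("mongo1", 6500), ("mongo1", 5500), ("mongo2", 800)] and a threshold of 5000, only mongo1 fires.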

As another example, let’s add the alert template for the low WiredTiger tickets:

  templates:
    - name: MongoDB Wiredtiger Tickets
      version: 1
      summary: MongoDB Wiredtiger Tickets low
      expr: avg by(service_name, type) (max_over_time(mongodb_mongod_wiredtiger_concurrent_transactions_available_tickets[1m])) < 50
      severity: warning
      annotations:
        description: "WiredTiger available tickets on (instance {{ $labels.node_name }}) are less than 50"


Integrated Alerting is a really nice feature to have. While it is still in tech preview, there are already a few built-in alerts you can test, and you can also define your own. Make sure to check the Integrated Alerting official documentation for more information on this topic.

Do you have any specific MongoDB alerts you’d like to see? Given the feature is still in technical preview, any contributions and/or feedback about the functionality are welcome as we’re looking to release this as GA very soon!


Improving Percona Monitoring and Management EC2 Instance Resilience Using CloudWatch Alarm Actions

Percona Monitoring and Management EC2 Instance Resilience Using CloudWatch

Nothing lasts forever, including the hardware running your EC2 instances. You will usually receive advance warning of hardware degradation and subsequent instance retirement, but sometimes hardware fails unexpectedly. Percona Monitoring and Management (PMM) currently doesn't have an HA setup, and such failures can leave wide gaps in monitoring if not resolved quickly.

In this post, we’ll see how to set up automatic recovery for PMM instances deployed in AWS through Marketplace. The automation will take care of the instance following an underlying systems failure. We’ll also set up an automatic restart procedure in case the PMM instance itself is experiencing issues. These simple automatic actions go a long way in improving the resilience of your monitoring setup and minimizing downtime.

Some Background

Each EC2 instance has two associated status checks: System and Instance. You can read in more detail about them on the “Types of status checks” AWS documentation page. The gist of it is, the System check fails to pass when there are some infrastructure issues. The Instance check fails to pass if there’s anything wrong on the instance side, like its OS having issues. You can normally see the results of these checks as “2/2 checks passed” markings on your instances in the EC2 console.

ec2 instance overview showing 2/2 status checks

CloudWatch, an AWS monitoring system, can react to the status check state changes. Specifically, it is possible to set up a “recover” action for an EC2 instance where a system check is failing. The recovered instance is identical to the original, but will not retain the same public IP unless it’s assigned an Elastic IP. I recommend that you use Elastic IP for your PMM instances (see also this note in PMM documentation). For the full list of caveats related to instance recovery check out the “Recover your instance” page in AWS documentation.

According to CloudWatch pricing, having two alarms set up will cost $0.20/month, an acceptable price for higher availability.

Automatically Recovering PMM on System Failure

Let’s get to actually setting up the automation. There are at least two ways to set up the alarm through GUI: from the EC2 console, and from the CloudWatch interface. The latter option is a bit involved, but it’s very easy to set up alarms from the EC2 console. Just right-click on your instance, pick the “Monitor and troubleshoot” section in the drop-down menu, and then choose the “Manage CloudWatch alarms” item.

EC2 console dropdown menu navigation to CloudWatch Alarms

Once there, choose the “Status check failed” as the “Type of data to sample”, and specify the “Recover” Alarm action. You should see something like this.

Adding CloudWatch alarm through EC2 console interface

You may notice that the GUI offers to set up a notification channel for the alarm. If you want to get a message when your PMM instance is recovered automatically, feel free to set that up.

The alarm will fire when the maximum value of the StatusCheckFailed_System metric is >=0.99 during two consecutive 300 seconds periods. Once the alarm fires, it will recover the PMM instance. We can check out our new alarm in the CloudWatch GUI.

EC2 instance restart alarm

EC2 console will also show that the alarm is set up.

Single alarm set in the instance overview

This example uses a pretty conservative alarm check duration of 10 minutes, spread over two 5-minute intervals. If you want to recover a PMM instance sooner, risking triggering on false positives, you can bring down the alarm period and number of evaluation periods. We also use the “maximum” of the metric over a 5-minute interval. That means a check could be in failed state only one minute out of five, but still count towards alarm activation. The assumption here is that checks don’t flap for ten minutes without a reason.
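The evaluation logic amounts to requiring the breach in N consecutive periods. A small Python sketch of that behavior (my own illustration, not CloudWatch code):

```python
def alarm_fires(period_maxima, threshold=0.99, evaluation_periods=2):
    """True once `evaluation_periods` consecutive per-period maxima of
    StatusCheckFailed_System reach the threshold. Checks report 0 or 1,
    so a single failed check within a period pushes its maximum to 1."""
    streak = 0
    for maximum in period_maxima:
        streak = streak + 1 if maximum >= threshold else 0
        if streak >= evaluation_periods:
            return True
    return False
```

A single bad 5-minute period ([0, 1, 0]) does not fire the alarm; two in a row ([0, 1, 1]) does.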

Automatically Restarting PMM on Instance Failure

While we’re at it, we can also set up an automatic action to execute when “Instance status check” is failing. As mentioned, usually that happens when there’s something wrong within the instance: really high load, configuration issue, etc. Whenever a system check fails, an instance check is going to be failing, too, so we’ll set this alarm to check for a longer period of time before firing. That’ll also help us to minimize the rate of false-positive restarts, for example, due to a spike in load. We use the same period of 300 seconds here but will only alarm after three periods show instance failure. The restart will thus happen after ~15 minutes. Again, this is pretty conservative, so adjust as you think works best for you.

In the GUI, repeat the same actions that we did last time, but pick the “Status check failed: instance” data type, and “Reboot” alarm action.

Adding CloudWatch alarm for instance failure through EC2 console interface

Once done, you can see both alarms reflected in the EC2 instance list.

EC2 instance overview showing 2 alarms set

Setting up Alarms Using AWS CLI

Setting up our alarm is also very easy to do using the AWS CLI.  You’ll need an AWS account with permissions to create CloudWatch alarms, and some EC2 permissions to list instances and change their state. A pretty minimal set of permissions can be found in the policy document below. This set is enough for a user to invoke CLI commands in this post.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [ ... ],
      "Resource": [ ... ],
      "Effect": "Allow"
    },
    {
      "Action": [ ... ],
      "Resource": [ ... ],
      "Effect": "Allow"
    }
  ]
}

And here are the specific AWS CLI commands. Do not forget to change the instance id (i-xxx) and region in the commands. First, set up recovery after a system status check failure.

$ aws cloudwatch put-metric-alarm \
    --alarm-name restart_PMM_on_system_failure_i-xxx \
    --alarm-actions arn:aws:automate:REGION:ec2:recover \
    --metric-name StatusCheckFailed_System --namespace AWS/EC2 \
    --dimensions Name=InstanceId,Value=i-xxx \
    --statistic Maximum --period 300 --evaluation-periods 2 \
    --datapoints-to-alarm 2 --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --tags Key=System,Value=PMM

Second, setting up restart after instance status check failure.

$ aws cloudwatch put-metric-alarm \
    --alarm-name restart_PMM_on_instance_failure_i-xxx \
    --alarm-actions arn:aws:automate:REGION:ec2:reboot \
    --metric-name StatusCheckFailed_Instance --namespace AWS/EC2 \
    --dimensions Name=InstanceId,Value=i-xxx \
    --statistic Maximum --period 300 --evaluation-periods 3 \
    --datapoints-to-alarm 3 --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --tags Key=System,Value=PMM
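If you prefer Python, the same alarms can be created with boto3's put_metric_alarm. Here is a sketch that mirrors the CLI flags above; the helper and its naming are mine, while the keyword names follow the boto3 API:

```python
def alarm_kwargs(instance_id, region, action="recover",
                 period=300, evaluation_periods=2):
    """Build the keyword arguments for boto3 CloudWatch
    put_metric_alarm, mirroring the CLI calls above."""
    check = "System" if action == "recover" else "Instance"
    return {
        "AlarmName": f"restart_PMM_on_{check.lower()}_failure_{instance_id}",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:{action}"],
        "MetricName": f"StatusCheckFailed_{check}",
        "Namespace": "AWS/EC2",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "DatapointsToAlarm": evaluation_periods,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Tags": [{"Key": "System", "Value": "PMM"}],
    }

# Usage (requires boto3 and AWS credentials):
# boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(
#     **alarm_kwargs("i-xxx", "us-east-1"))
```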

Testing New Alarms

Unfortunately, it doesn’t seem to be possible to simulate system status check failure. If there’s a way, let us know in the comments. Because of that, we’ll be testing our alarms using instance failure instead. The simplest way to fail the Instance status check is to mess up networking on an instance. Let’s just bring down a network interface. In this particular instance, it’s ens5.

# ip link
2: ens5:...
# date
Sat Apr 17 20:06:26 UTC 2021
# ip link set ens5 down

The stuff of nightmares: having no response for a command. In 15 minutes, we can check that the PMM instance is available again. We can see that alarm triggered and executed the restart action at 20:21:37 UTC.

CloudWatch showing Alarm fired actions

And we can access the server now.

$ date
Sat Apr 17 20:22:59 UTC 2021

The alarm itself takes a little more time to return to OK state. Finally, let’s take a per-minute look at the instance check metric.

CloudWatch metric overview

Instance failure was detected at 20:10 and resolved at 20:24, according to the AWS data. In reality, the server rebooted and was accessible even earlier. PMM instances deployed from the Marketplace image have all the required services set to autostart, so PMM is fully available after a restart without an operator needing to take any action.


Even though Percona Monitoring and Management itself doesn’t offer high availability at this point, you can utilize built-in AWS features to make your PMM installation more resilient to failure. PMM in the Marketplace is set up in a way that should not have problems with recovery from the instance retirement events. After your PMM instance is recovered or rebooted, you’ll be able to access monitoring without any manual intervention.

Note: While this procedure is universal and can be applied to any EC2 instance, there are some caveats explained in the “What do I need to know when my Amazon EC2 instance is scheduled for retirement?” article on the AWS site. Note specifically the Warning section. 



Running Percona Monitoring and Management v2 on Windows with Docker

PMM Windows Docker

One way to run Percona Monitoring and Management (PMM) v2 on Windows is the Virtual Appliance installation method. It works well but requires extra software to run virtual appliances, such as Oracle VirtualBox, VMware Workstation Player, etc. Or, you can use Docker.

Docker is not shipped with Windows 10, though you can get it installed for free with Docker Desktop. Modern Docker on Windows can run using either the Hyper-V backend or Windows Subsystem for Linux v2 (WSL2). Because the WSL2-based Docker installation is more involved, I'm sticking to what is now referred to as the “legacy” Hyper-V backend.

On my Windows 10 Pro (19042) system, I had to change some BIOS settings:


PMM on Windows Docker


And install the Hyper-V component for Docker Desktop to install correctly:


Percona Monitoring and Management Windows


With this, we’re good to go to install Percona Monitoring and Management on Windows.



The official instructions for installing PMM with Docker assume you're running a Linux/Unix-based system, so they need to be slightly modified for Windows. First, there is no sudo on Windows; second, “Command Prompt” does not interpret “\” the same way a shell does.

With those minor fixes though, basically, the same command to start PMM works:

docker pull percona/pmm-server:2
docker create --volume /srv --name pmm-data percona/pmm-server:2 /bin/true
docker run --detach --restart always --publish 80:80 --publish 443:443 --volumes-from pmm-data --name pmm-server percona/pmm-server:2

That’s it! You should have your Percona Monitoring and Management (PMM) v2 running!

Looking at the PMM Server instance through PMM itself, you may notice it does not get all of the host's resources by default, as Docker on Linux would.




Note there are only 2 CPU cores, 2GB of memory, and some 60GB of disk space available.  This should be enough for a test, but if you want to put some real load on this instance, you may want to adjust those settings. This is done through the “Resources” section in Docker Desktop:




Adjust these to suit your load needs and you’re done!  Enjoy your Percona Monitoring and Management!


Percona Monitoring and Management Meetup on May 6th: Join Us for Live Tuning and Roadmap Talk

We’re leveling up – would you like to join us on Discord? There we chat about databases, open source, and even play together.

On May 6th at 11 am EST, the Percona Community is talking about Percona Monitoring and Management (PMM). Our experts are also there to live tune and optimize your database – come check it out!

Are there any features you would like to see in PMM? The meetup is a good way to get your voice heard! But you can already start chatting with all of us on Discord right now.

It’s just a few more nights of sleep until Percona Live ONLINE with many interesting events for you to check out too. Join the community and let’s shape the future of databases together!
