Integrated Alerting Design in Percona Monitoring and Management


Percona Monitoring and Management 2.13 (PMM) introduced the Integrated Alerting feature as a technical preview. It adds a user-friendly way to set up and manage alerts for your databases. You can read more about using this feature in our announcement blog post and in our documentation; in this article, we focus on design and implementation details.


There are four basic entities used for IA: Alert Rule Template, Alert Rule, Alert, and Notification Channel.

Everything starts from the alert rule template. You can see its YAML representation below:

templates:
 - name: pmm_mongodb_high_memory_usage
   version: 1
   summary: Memory used by MongoDB
   expr: |-
     sum by (node_name) (mongodb_ss_mem_resident * 1024 * 1024)
     / on (node_name) (node_memory_MemTotal_bytes)
     * 100
     > [[ .threshold ]]
   params:
     - name: threshold
       summary: A percentage from configured maximum
       unit: "%"
       type: float
       range: [0, 100]
       value: 80
   for: 5m
   severity: warning
   labels:
     cultom_label: demo
   annotations:
     summary: MongoDB high memory usage ({{ $labels.service_name }})
     description: |-
       {{ $value }}% of memory (more than [[ .threshold ]]%) is used
       by {{ $labels.service_name }} on {{ $labels.node_name }}.

A template serves as the base for alert rules. It defines several fields; let’s look at them:

  • name: uniquely identifies template (required)
  • version: defines template format version (required)
  • summary: a template description (required)
  • expr: a MetricsQL query string with parameter placeholders. MetricsQL is backward compatible with PromQL and provides some additional features. (required)
  • params: contains parameter definitions required for the query. Each parameter has a name, type, and summary. It also may have a unit, available range, and default value.
  • for: specifies how long the expression must hold. The alert query must return true for this entire period before the alert is fired (required)
  • severity: specifies default alert severity level (required)
  • labels: are additional labels to be added to generated alerts (optional)
  • annotations: are additional annotations to be added to generated alerts. (optional)

A template is designed to be reused as the basis for multiple alert rules, so from a single pmm_node_high_cpu_load template you can have alerts for production vs non-production, warning vs critical, etc.


Users can create alert rules from templates. An alert rule is what’s actually executed against metrics and what produces an alert. The rule can override default values specified in the template, add filters to apply the rule only to the required services, nodes, etc., and specify target notification channels, such as email, Slack, PagerDuty, or Webhooks. If the rule has no associated notification channels, its alerts will be available only via PMM UI. Note that after creation, the rule keeps its relation to the template, and any change in the template will affect all related rules.

Here is an alert rule example:

groups:
 - name: PMM Integrated Alerting
   rules:
     - alert: /rule_id/c8e5c559-ffba-43ed-847b-921f69c031a9
       rule: test
       expr: |-
         sum by (node_name) (mongodb_ss_mem_resident * 1024 * 1024)
         / on (node_name) (node_memory_MemTotal_bytes)
         * 100
         > 40
       for: 5s
       labels:
         ia: "1"
         rule_id: /rule_id/c8e5c559-ffba-43ed-847b-921f69c031a9
         severity: error
         template_name: pmm_mongodb_high_memory_usage
         cultom_label: demo
       annotations:
         description: |-
           {{ $value }}% of memory (more than 40%) is used
           by {{ $labels.service_name }} on {{ $labels.node_name }}.
         summary: MongoDB high memory usage ({{ $labels.service_name }})

It uses the Prometheus alert rule format.
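To make the relation between the template and the rule more concrete, here is a rough Python sketch of how a [[ .threshold ]] placeholder could be turned into the literal value seen in the rule. It only illustrates the idea; it is not how managed is implemented, and the names PLACEHOLDER and render are made up for this example.

```python
import re

# Matches placeholders such as "[[ .threshold ]]" used in the template above.
PLACEHOLDER = re.compile(r"\[\[\s*\.(\w+)\s*\]\]")


def render(text, params):
    """Replace every [[ .name ]] placeholder with the value from params."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value for template parameter {name!r}")
        return str(params[name])

    return PLACEHOLDER.sub(substitute, text)


template_expr = (
    "sum by (node_name) (mongodb_ss_mem_resident * 1024 * 1024)\n"
    "/ on (node_name) (node_memory_MemTotal_bytes)\n"
    "* 100\n"
    "> [[ .threshold ]]"
)

# The template default is 80; the example rule overrides it with 40.
rule_expr = render(template_expr, {"threshold": 40})
print(rule_expr.splitlines()[-1])  # prints "> 40"
```

Note that the {{ $labels... }} and {{ $value }} expressions are left untouched by this step: they use a different delimiter and are only resolved later, when the alert is produced.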

How it Works

The Integrated Alerting feature is built on top of Prometheus Alertmanager, the VictoriaMetrics time series database (TSDB), and VMAlert.

VictoriaMetrics TSDB is the main metrics storage in PMM, VMAlert is responsible for executing alert rules, and Prometheus Alertmanager is responsible for delivering alerts. VMAlert runs queries on VictoriaMetrics TSDB, checks whether they stay positive for the specified amount of time (for example, MySQL is down for 5 minutes), and triggers alerts. All alerts are forwarded to the PMM internal Alertmanager, but they can also be duplicated to an external Alertmanager (this can be set up on the PMM Settings page).
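The "positive for the specified amount of time" check can be pictured with a small Python sketch. It models the usual Prometheus-style meaning of the for field (inactive, then pending, then firing) and is an assumption-based illustration, not VMAlert's actual code; the function name alert_states and the sample data are hypothetical.

```python
def alert_states(evaluations, for_seconds):
    """Return (timestamp, state) for a sequence of (timestamp, is_positive) evaluations."""
    active_since = None  # time of the first positive evaluation in the current streak
    states = []
    for timestamp, is_positive in evaluations:
        if not is_positive:
            active_since = None
            states.append((timestamp, "inactive"))
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        state = "firing" if held_for >= for_seconds else "pending"
        states.append((timestamp, state))
    return states


# One evaluation per minute; the condition is true from t=60 onwards.
evaluations = [(0, False), (60, True), (120, True), (180, True),
               (240, True), (300, True), (360, True)]
states = alert_states(evaluations, for_seconds=300)
print(states[-2])  # (300, 'pending')
print(states[-1])  # (360, 'firing')
```

A single negative evaluation resets the streak, which is why a short spike does not produce an alert.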

There are four available template sources:

  1. Built-in templates, shipped with the PMM distribution. They are embedded into the managed binary (a core component of PMM).
  2. Percona servers. This source is not available yet, but it will be similar to the STT checks delivery mechanism (HTTPS + file signatures).
  3. Templates created by the user via PMM UI. We persist them in PMM’s database.
  4. Templates created by the user as files in the /srv/ia/templates directory.

During PMM startup, managed loads templates from all sources into memory.

Alert rules can be created via PMM UI or simply by putting rule files in the /srv/prometheus/rules directory. Alert rules created via the UI are persisted in PMM’s internal PostgreSQL database. For each alert rule from the database, the managed binary creates a YAML file in /etc/ia/rules/ and asks VMAlert to reload the configuration and reread the rule files. VMAlert executes the query from each loaded alert rule every minute. Once the rule condition is met (the query is positive for the specified amount of time), VMAlert produces an alert and passes it to Alertmanager. Please note that /etc/ia/rules/ is controlled by managed, and any manual changes in that directory will be lost.

Managed generates configuration for Alertmanager and updates it once any related entity changes.

Managed goes through the list of existing rules and collects unique notification channel combinations. For example, if we have two rules and each of them has channels a, b, and c assigned, that is one unique channel combination. For each rule, managed generates a route, and for each unique channel combination, it generates a receiver in the Alertmanager configuration file. Each route has a target receiver and a filter by rule ID; it can also contain user-defined filters.

If a rule has no assigned notification channels, a special empty receiver is used. Users can redefine the empty receiver in Alertmanager’s base configuration file, /srv/alertmanager/alertmanager.base.yml. When a notification channel is disabled, managed recollects unique channel combinations, excluding disabled channels, and regenerates receivers and routing rules. If the rule has only one specified channel and it was disabled, a special disabled receiver is used. Unlike the empty receiver, the disabled receiver can’t be redefined by the user and always means “do nothing”. This prevents unexpected behavior after channels are disabled.

After each Alertmanager configuration update, managed asks Alertmanager to reload it.
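The receiver and route generation described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the receiver naming scheme (channel names joined with "+"), the data shapes, and the function name build_routing are invented for the example, and the sketch treats "all assigned channels are disabled" the same way as the single-channel case described above.

```python
def build_routing(rules, disabled_channels=frozenset()):
    """Return (receivers, routes) for rules given as {rule_id: [channel, ...]}."""
    receivers = {"empty": [], "disabled": []}  # the two special receivers
    routes = []
    for rule_id, channels in rules.items():
        enabled = sorted(set(channels) - set(disabled_channels))
        if not channels:
            receiver = "empty"       # rule has no channels assigned at all
        elif not enabled:
            receiver = "disabled"    # every assigned channel is disabled
        else:
            receiver = "+".join(enabled)   # hypothetical naming scheme
            receivers[receiver] = enabled  # one receiver per unique combination
        # One route per rule, filtered by the rule ID.
        routes.append({"receiver": receiver, "match": {"rule_id": rule_id}})
    return receivers, routes


rules = {
    "r1": ["a", "b", "c"],
    "r2": ["c", "b", "a"],  # same combination as r1, so the receiver is shared
    "r3": [],               # no channels: alerts are visible only in PMM UI
    "r4": ["d"],            # its only channel is disabled below
}
receivers, routes = build_routing(rules, disabled_channels={"d"})
print(sorted(receivers))                  # ['a+b+c', 'disabled', 'empty']
print([r["receiver"] for r in routes])    # ['a+b+c', 'a+b+c', 'empty', 'disabled']
```

Sharing one receiver between rules with the same channel set keeps the generated configuration small, while the per-rule routes keep the filtering by rule ID.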

When Alertmanager receives an alert from VMAlert, it uses routes to find an appropriate receiver and forwards the alert to the destination channels. The user can also observe alerts via PMM UI. In that case, managed gets all available alerts from the Alertmanager API and applies the required filters before showing them.


The Integrated Alerting feature has many moving parts, and functionally it’s more about managing configuration for different components and making them work together. It provides a really nice way to be aware of important events in your system. While it’s still in a tech preview state, it’s already helpful. With built-in templates, it’s easy to try without diving into documentation about Prometheus queries and other details. So please try it and tell us about your experience. What parameters of a system would you like to have covered with templates? What use cases do you have for alerting? We will be happy to receive any feedback.


Add Microsoft Azure Monitoring Within Percona Monitoring and Management 2.16.0


The Microsoft Azure SQL Database is among the most popular databases of 2020, according to DB-Engine’s DBMS of the Year award. Also, it’s steadily climbing up in DB-Engines Ranking. The ranking is updated monthly and places database management systems according to their popularity. In case you didn’t know, DB-Engines is an initiative to collect and present information on database management systems (DBMS).

So we are excited to share that you can now monitor Azure instances in the Percona Monitoring and Management 2.16.0 (PMM) release. PMM can collect Azure DB metrics as well as available system metrics.

Only basic metrics are provided by Azure Portal.

No Disk, virtual CPU, or RAM data are available in PMM dashboards. Here is an example of a home page with a monitored Azure service. It’s shown in the middle row.

DB metrics are collected by exporters from services directly. It allows you to have all possible metrics. You can find some screenshots of MySQL and PostgreSQL dashboards at the end of this blog post.

Simple Steps to Add an Azure DB Service and Get Metrics in PMM

  • The feature is a technical preview and has to be enabled on the Settings page. Turning this feature off will not remove added services from monitoring; it will just hide the ability to discover and add new Microsoft Azure services.

This feature is a technical preview because we are releasing it as soon as possible to get some feedback from users. We are expecting to do more work on this feature, to make it more API and resource-efficient.

  • Go to the “Add Instance” page (Configuration … PMM Inventory … Add instance)

  • Press the button “Microsoft Azure MySQL or PostgreSQL” and fill in the requested Azure and DB credentials.


Please follow the link “Where do I get the security credentials for my Azure DB instance” if some credential parameters are missing.

Also, please keep in mind that a separate node will be created for each service. The node is named after the service hostname, and this name can’t be changed. However, you may specify a service name when adding service details. By default, node and service names are equal.

That’s it. You may go to the list of dashboards and observe collected data.

If you are a Microsoft Azure user or going to become one, please give Percona Monitoring and Management a test run. We are always open to suggestions and proposals. Please contact us, leave a message on our forum, or join the Slack channel.

Here are screenshots of the “MySQL Instance Summary” and “PostgreSQL Instance Summary” dashboards for Azure instances.


Read more about the release of Percona Monitoring and Management 2.16 and all the exciting new features included with it!



Percona Monitoring and Management 2.16 Brings Microsoft Azure Monitoring via a Technical Preview


This week we release Percona Monitoring and Management 2.16 (PMM), which brings some exciting new additions we’d like to highlight!

Amazon RDS PostgreSQL Monitoring

AWS monitoring in PMM now covers the PostgreSQL RDS and PostgreSQL Aurora types. PMM includes them in the Discovery UI, where they can be added for monitoring, which provides node-related metrics as well as PostgreSQL database performance metrics. Before this release, this was available only for MySQL-related instances from Amazon RDS.

Security Threat Tool Scheduling

Security Threat Tool users are now able to control the Security Check execution time intervals for groups of checks, move checks between groups, and disable individual checks if necessary, allowing for an even more configurable experience for users.

Microsoft Azure Discovery and Node Metrics Extraction

Percona Monitoring and Management now monitors Azure instances and can collect Azure DB metrics as well as available System metrics. (Please note that only basic metrics are provided by Azure Portal.)

This means that as of today our Technical Preview has PMM providing the same level of support for Microsoft Azure Database as a Service (DBaaS) as we have for AWS’s DBaaS (RDS/Aurora on MySQL or PostgreSQL). Users are able to easily discover and add Azure databases for monitoring by PMM complete with node-level monitoring. This feature is available only if you explicitly activate it on the PMM Settings page. Deactivating it will not remove added services from monitoring, but will just hide the ability to discover and add new Microsoft Azure Services. Read more about Microsoft Azure monitoring within Percona Monitoring and Management.


Improvements to Integrated Alerting within PMM

The PMM 2.16 release also brings numerous improvements to the Technical Preview of Integrated Alerting within Percona Monitoring and Management. You can read more on the design and implementation details of this work at that link.

Additional PMM 2.16 release highlights include…

Support for pg_stat_monitor v0.8

Technical Preview: Added compatibility with the pg_stat_monitor plugin v0.8.0. This does not yet expose the plugin’s new features in PMM, but it ensures that Query Analytics metrics are collected to the same degree as with version 0.6.0 of the plugin.

[DBaaS] Resource planning and prediction (Resource calculator)

The Preview of DBaaS in PMM: While creating a DB cluster, a user can see a prediction of the resources this cluster will consume with all components, as well as the current total and available resources in the Kubernetes cluster. Users will be warned if an attempt to create a DB cluster may be unsuccessful because of insufficient available resources in the Kubernetes cluster.

[DBaaS] Percona Server for MongoDB 1.7.0 Operator Support

The Preview of DBaaS in PMM will be using the recently-released Percona Kubernetes Operator for Percona Server for MongoDB 1.7.0 to create MongoDB clusters.


The release of PMM 2.16 includes many impressive enhancements AND brand new features for our user base. We hope as always that you will continue to let us know your thoughts on these new PMM v2 features as well as any ideas you have for improvement!

Download and try Percona Monitoring and Management today! Read the PMM 2.16 full release notes.



Monitoring OPNSense Firewall with Percona Monitoring and Management


I try to use Open Source when a good solution exists, so for my home firewall, I’m using OPNSense, a very powerful FreeBSD-based firewall appliance with great features along with a powerful GUI.

One of the plugins available with OPNSense is  node_exporter, which exposes a lot of operating system metrics through the Prometheus protocol.

Installing this plugin will allow you to monitor your OPNSense based firewall with any Prometheus-compatible system including, as you have guessed,  Percona Monitoring and Management (PMM).

For best results, you will need PMM 2.14 or later, as it has improved support for external exporters.

Adding OPNSense to PMM for monitoring requires just one simple command:


pmm-admin add external-serverless --url= --external-name fw01 --group opnsense


Let’s break down what this command does:

  • We are adding this as a “serverless” exporter because there are no pmm-agent processes running on that node, and the only access we have to it is through the Prometheus protocol.
  • The address in --url is the IP of the firewall.  Port 9100 is what OPNSense uses by default.
  • I chose to name this firewall “fw01” for monitoring purposes; this is how it will be identified in PMM.
  • We put it in the group “opnsense” which will allow us to easily have dashboards that are focused on OPNSense firewalls only, not accidentally picking data from other services.

If you prefer, you can also use your PMM installation instead (See PMM -> PMM Add Instance Menu) and pick “External Service”.



After this step, we will already have some information available in our PMM installation.




The Node Summary Dashboard will pick up some of the OS metrics. However, as this dashboard is built with a focus on Linux rather than FreeBSD, not all data will be populated or tested to be correct, and whatever does work should be seen as a lucky accident rather than an expected outcome.

The next step you can take is to look if there are any dashboards available for the system you’re looking to monitor.  A quick search located this dashboard on the Grafana website.

While this dashboard was a good start, it relied on very particular naming of the hosts in order to work and had some bugs which needed fixing.   If a given dashboard was not designed to work with PMM, you also often need to make some adjustments because PMM applies different labels to the metrics compared to a vanilla Prometheus installation.

I uploaded an updated dashboard back to the Grafana website.

This makes installing it with PMM very easy; just go to Import Dashboard and Enter Dashboard ID – 14150




Once the dashboard is imported, you will see a variety of data the OPNSense built-in node_exporter provides:




That’s it!

Percona Monitoring and Management is free to download and use. Try it today!


Webinar April 14: Optimize and Troubleshoot MySQL Using Percona Monitoring and Management


Optimizing MySQL performance and troubleshooting MySQL problems are two of the most critical and challenging tasks for MySQL DBAs. The databases powering applications need to be able to handle changing traffic workloads while remaining responsive and stable in order to deliver an excellent user experience. Further, DBAs are also expected to find cost-efficient means of solving these issues.

In this webinar, we will demonstrate the advanced options of Percona Monitoring and Management V.2 that enable you to solve these challenges, which are built on free and open-source software. We will look at specific, common MySQL problems and review them.

Please join Peter Zaitsev on Wednesday, April 14th, 2021, at 11 am EDT for his webinar Optimize and Troubleshoot MySQL using Percona Monitoring and Management (PMM).

Register for Webinar

If you can’t attend, sign up anyway, and we’ll send you the slides and recording afterward.


How To Automate Dashboard Importing in Percona Monitoring and Management


In this blog post, I’ll look at how to import custom dashboards into Percona Monitoring and Management (PMM) 2.x, and give some tips on how to automate it.

The number of dashboards in PMM2 is constantly growing. For example, we recently added a new HAProxy dashboard to the latest 2.15.0 release. Even though the PMM server has more than fifty dashboards, it’s not possible to cover all common server applications.

The greatest source of dashboards is the official Grafana site. Here, anyone can share their own dashboards with the community or find already uploaded ones. Percona has its own account and publishes as-yet-unreleased or unique (non-PMM) dashboards.

Each dashboard has its own number which can be used to refer to it. For example, 12630 is assigned to the dashboard “MySQL Query Performance Troubleshooting”.

You can download dashboards as JSON files and import them into your PMM2 installation using the UI.

This is easy, but it overlooks the fact that publishers can update dashboards with new revisions. A dashboard may have received a bunch of useful changes after you downloaded it, yet you keep using the old version.

So the only way to use the latest dashboard version is to check the site from time to time. It can really be a pain in the neck, especially if you have to track more than one dashboard.

This is why it’s time to take a look at automation. Grafana has a very powerful API that I used to create this shell script. Let’s take a peek at it. It’s based on the api/dashboards/import API function. The function requires a POST request with a dashboard body.

The first step is to download a dashboard.

curl -s https://grafana.com/api/dashboards/12630/revisions/1/download --output 12630_rev1.json

Note how I used dashboard number 12630 and revision 1 in the command. By increasing the revision number I can find out the latest available dashboard version. This is exactly the approach used in the script.
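The "keep increasing the revision number" idea can be sketched in Python as follows. This is not the shell script itself, just an illustration of the approach; the revision check is passed in as a function so that the sketch runs without network access, and the names latest_revision, download_url, and the fake catalog are made up for the example.

```python
def latest_revision(dashboard_id, revision_exists, max_revision=1000):
    """Return the highest revision for which revision_exists() is true, or None."""
    latest = None
    for revision in range(1, max_revision + 1):
        if not revision_exists(dashboard_id, revision):
            break  # the first missing revision ends the search
        latest = revision
    return latest


def download_url(dashboard_id, revision):
    # URL pattern taken from the curl command above.
    return f"https://grafana.com/api/dashboards/{dashboard_id}/revisions/{revision}/download"


# Stand-in for a real HTTP check: pretend dashboard 12630 has three revisions.
fake_catalog = {12630: 3}


def exists(dashboard_id, revision):
    return revision <= fake_catalog.get(dashboard_id, 0)


rev = latest_revision(12630, exists)
print(download_url(12630, rev))
# https://grafana.com/api/dashboards/12630/revisions/3/download
```

In a real run, revision_exists would issue a request to the download URL and report whether it succeeded.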

In the next example, I’ll use a dashboard from our dashboard repository. (I will explain why later.)

curl -L -k https://github.com/percona/grafana-dashboards/raw/PMM-2.0/dashboards/Disk_Details.json --output Disk_Details.json

Now I have a file and can form a POST request to import the dashboard into a PMM installation.

$ curl -s -k -X POST -H "Content-Type: application/json" -d "{\"dashboard\":$(cat Disk_Details.json),\"overwrite\":true}" -u admin:admin

The dashboard has been uploaded. If you take a look at the output you may notice the parameter folderId. With this, it’s possible to specify a Grafana folder for my dashboards.

Here is the command for fetching a list of existing folders.

curl -s -k -u admin:admin

I now have folder IDs and can use them in the importing command. The Folder ID should be specified in a POST request as shown in the next example.
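As a rough sketch of such a request body, assuming the import function accepts a folderId field next to the dashboard and overwrite fields used earlier, the payload could be built like this in Python. The helper name build_import_payload, the sample dashboard, and the folder ID 7 are hypothetical.

```python
import json


def build_import_payload(dashboard, folder_id=None, overwrite=True):
    """Build the JSON body for an import request, optionally targeting a folder."""
    payload = {"dashboard": dashboard, "overwrite": overwrite}
    if folder_id is not None:
        payload["folderId"] = folder_id  # assumed field name, as seen in the import output
    return json.dumps(payload)


# A tiny stand-in for the contents of Disk_Details.json.
dashboard = {"title": "Disk Details", "panels": []}

body = build_import_payload(dashboard, folder_id=7)  # 7 is a made-up folder ID
print(body)
```

The resulting string can be sent as the -d argument of the same POST request that was used for the import above.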

Now that you are familiar with the API import commands, I’ll give you a closer look at community dashboards.

Most of them have the parameter “Data Sources”.
It means that for dashboard importing, you have to specify the data source names assigned by your installation.

This point makes it impossible to import any downloaded dashboards with the API without modifying them. If I execute the import command used earlier (the 12630_rev1.json file downloaded from Grafana.com) I will get an error.

So, here’s another script (cleanup-dash.py) that replaces the datasource fields in dashboards and lets the import command succeed. The script takes a dashboard file name as a parameter.

The importing script calls cleanup-dash.py automatically if an initial importing attempt was unsuccessful.
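To give an idea of what replacing the datasource fields means, here is a minimal Python sketch. It is not the actual cleanup-dash.py; it simply walks a dashboard structure and overwrites every datasource field with one name. The function name replace_datasources and the target name "Metrics" are assumptions for illustration; use whatever data source name your installation assigns.

```python
def replace_datasources(node, datasource_name):
    """Return a copy of the dashboard with every set "datasource" field replaced."""
    if isinstance(node, dict):
        return {
            key: (
                datasource_name
                if key == "datasource" and value is not None
                else replace_datasources(value, datasource_name)
            )
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [replace_datasources(item, datasource_name) for item in node]
    return node  # strings, numbers, booleans, and None are kept as they are


dashboard = {
    "title": "demo",
    "panels": [
        {"datasource": "${DS_PROMETHEUS}", "targets": [{"expr": "up"}]},
        {"datasource": None},  # an unset data source is left alone
    ],
    "templating": {"list": [{"datasource": "${DS_PROMETHEUS}"}]},
}

cleaned = replace_datasources(dashboard, "Metrics")  # "Metrics" is an assumed name
print(cleaned["panels"][0]["datasource"])  # Metrics
```

The function returns a new structure and leaves the input untouched, so the original file contents stay available if the import has to be retried.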

Note the parameters of the importing script. Here you should set the details of your PMM installation. dashboards is an array of dashboard IDs that you want to import into PMM2.

dashboards=(13266 12630 12470)

Now, you should download both scripts and try to import dashboards. Make sure that both scripts are executable and in the same folder. Here are the commands to do it.

curl -LJOs https://github.com/Percona-Lab/pmm-dashboards/raw/master/misc/import-dashboard-grafana-cloud.sh --output import-dashboard-grafana-cloud.sh
curl -LJOs https://github.com/Percona-Lab/pmm-dashboards/raw/master/misc/cleanup-dash.py --output cleanup-dash.py

chmod a+x import-dashboard-grafana-cloud.sh
chmod a+x cleanup-dash.py


You can next find the imported dashboards in your PMM installation. They were put into the ‘Insight’ folder and can be found by the keyword ‘PMM2’.


By default, the script imports all dashboards designed for PMM2 from the Percona account. Folder names and dashboard IDs can also be specified as parameters for the script.

Here are some usage examples:

import-dashboard-grafana-cloud.sh                            # default list of dashboards is uploaded into the General folder
import-dashboard-grafana-cloud.sh Insight                    # default list of dashboards is uploaded into the Insight folder
import-dashboard-grafana-cloud.sh 13266 12630 12470          # dashboards 13266, 12630, and 12470 are uploaded into the General folder
import-dashboard-grafana-cloud.sh Insight 13266 12630 12470  # dashboards 13266, 12630, and 12470 are uploaded into the Insight folder

You can define any number of dashboards in the script parameters and run the script periodically to always have the most recent dashboard versions.

Percona Monitoring and Management is free to download and use. Try it today!


3 Percona Software Products Take the SourceForge Leader Award!


We are so grateful to all users of our software. Thanks to you, some of them have just been recognized as a Winter 2021 category leader by SourceForge!

The SourceForge Leader Award is only awarded to select products that have attained the highest levels of praise from user reviews on SourceForge.

This is a huge achievement, as Percona Monitoring and Management and Percona Server for MongoDB have been selected as best-in-class from over 60,000 products on SourceForge. SourceForge gets over 30 million visitors per month looking for business software and solutions.


Thank you, all the users of the open source software products, for your trust and support. We highly appreciate it.

The best reviews are helpful to others by adding technical details and solutions. If you haven’t left a review for our software on SourceForge yet, we are looking forward to reading yours.



Percona Monitoring and Management 2.15 Brings Even MORE Reasons to Upgrade to PMM v2!


In November of 2020, we announced that in early 2021 Percona was slated to release a version of Percona Monitoring and Management (PMM) v2 that would include all of the critical functionality users of PMM v1 have come to know and love over the years. In our initial blog, we also addressed some of the specifics related to features for which we had not yet achieved parity such as external services, annotations, MongoDB Explain, and custom collectors per service to name a few.

Well, friends, the time has come, and we’re happy to announce that any remaining critical parity items have been completed… but even MORE importantly, the enhancements to Percona Monitoring and Management v2 are ones you won’t want to miss out on. This means one thing: if you haven’t already, IT’S TIME TO UPGRADE!

Some of the most recent work included in the PMM 2.15 release:

Disable collectors while adding node/service to monitoring:

  • PMM users can disable any collector PMM utilizes to gather metrics. In certain situations, disabling the collector(s) prevents PMM from flooding logs or saves infrastructure resources if the given metrics simply aren’t needed. This is an early step towards providing our users full management capabilities when it comes to the metrics they collect. We will continue to expand this effort in future releases.

External services monitoring:

  • Prior to this release, PMM v2 did not support external services monitoring on systems that couldn’t also run the PMM client. BUT as of this week, any non-native services supported by PMM can now be monitored with external services monitoring. You can see the list of possible exporters to be used here: https://prometheus.io/docs/instrumenting/exporters/.

Provide summary information for systems (pt-*-summary actions):

  • With the addition of “pt-*-summary” in PMM v2, users can now view summary information pertaining to services and nodes within their PMM dashboard. Summary information is provided in the format of pt-*-summary tools output, in order to simplify the portability of this data. This format will also be preserved when summary information is shared with the Percona Support team, simplifying their investigations of issues.

Note: “pt-*-summary” includes formats for: 

  • pt-mysql-summary
  • pt-mongodb-summary
  • pt-pg-summary
  • pt-summary




HAProxy support by PMM:

  • Users are now able to add HAProxy services to be monitored in PMM v2. This allows users who use HAProxy in their HA configuration to have this component also monitored by PMM.

As a refresher, PMM v2 users also benefit from other valuable enhancements over PMM v1, including:

  • A complete rewrite of the Query Analytics (QAN) tool, including improved speed, global sparkline hover, filtering, new dimensions to collect data, and rich searching capabilities.
  • Our Security Threat Tool (STT) so that you not only can monitor database performance but also database security vulnerabilities.
  • A robust expansion of MongoDB and PostgreSQL support (along with continued improvements for MySQL).
  • Integration with an external Alertmanager to create and deploy alerting, and “integrated alerting” to provide native alerting inside PMM itself.
  • Global and local annotations across nodes and services to highlight key events for correlation. Get to the “WHY” and easily see changes occurring in your environment(s).

There is no better time than now to upgrade to Percona Monitoring and Management v2!

One last reminder, we are flipping the latest version flag to the PMM v2 series from PMM v1 with this release.

Please note that this does NOT mean that we are “sunsetting” PMM v1 and will no longer support that application. While we are not creating new features for PMM v1, we do continue to maintain it with critical bug fixes as needed as well as support for the product for those customers on a support contract. This maintenance and support will continue until PMM moves to version 3.x at a date to be determined in the future.

Let us know your thoughts on these new PMM v2 features as well as any ideas you have for improvement.

Download and Try Percona Monitoring and Management Today!

Read the PMM 2.15 full release notes here


Tame Kubernetes Costs with Percona Monitoring and Management and Prometheus Operator


More and more companies are adopting Kubernetes, but after some time they see unexpected growth in cloud costs. Engineering teams did their part in setting up auto-scalers, but the cloud bill is still growing. Today we are going to see how Percona Monitoring and Management (PMM) can help with monitoring Kubernetes and reducing the costs of the infrastructure.

Get the Metrics


Prometheus Operator is a great tool to monitor Kubernetes as it deploys a full monitoring stack (prometheus, grafana, alertmanager, node exporters) and works out of the box. But if you have multiple k8s clusters, then it would be great to have a single pane of glass from which to monitor them all.

To get there I will have Prometheus Operator running on each cluster and pushing metrics to my PMM server. Metrics will be stored in VictoriaMetrics time-series DB, which PMM uses by default since the December 2020 release of version 2.12.



I followed this manual to the letter to install my PMM server with docker. Don’t forget to open the HTTPS port on your firewall, so that you can reach the UI from your browser, and so that the k8s clusters can push their metrics to VictoriaMetrics through NGINX.

Prometheus Operator

On each Kubernetes cluster, I will now install Prometheus Operator to scrape the metrics and send them to PMM. Bear in mind that Helm charts are stored in prometheus-community repo.

Add helm repository

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

Prepare the configuration before installing the operator

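$ cat values.yaml
serviceAccounts:
  alertmanager:
    create: false

alertmanager:
  enabled: false

configmapReload:
  alertmanager:
    enabled: false

extraScrapeConfigs: |
  remote_write:
    - url: https://{PMM_USER}:{PMM_PASS}@{YOUR_PMM_HOST}/victoriametrics/api/v1/write

server:
  global:
    external_labels:
      kubernetes_cluster_name: {UNIQUE_K8S_LABEL}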

  • Disable Alertmanager, as I will rely on PMM
  • Add a remote_write section to write metrics to PMM’s VictoriaMetrics storage
    • Use your PMM user and password to authenticate. The default username and password are admin/admin. It is highly recommended to change the defaults, see how here.
    • The /victoriametrics endpoint is exposed through NGINX on the PMM server
    • If you use HTTPS with a self-signed certificate, you may need to disable TLS verification by setting insecure_skip_verify: true in the tls_config of the remote_write entry
  • The external_labels section is important – it labels all the metrics sent from Prometheus. Each cluster must have a unique kubernetes_cluster_name label to distinguish metrics once they are merged in VictoriaMetrics.
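Because the PMM user and password are embedded directly in the remote_write URL, a password containing characters such as @ or / would break it. Below is a minimal sketch of building that URL with percent-encoded credentials. The function name, the sample password, and the host pmm.example.com are hypothetical, and it assumes the client parsing the URL decodes percent-encoded credentials.

```python
from urllib.parse import quote


def remote_write_url(user: str, password: str, host: str) -> str:
    """Build the remote_write URL placed in values.yaml.

    Credentials are percent-encoded so that reserved characters
    (for example "@" or "/") are not mistaken for URL delimiters.
    """
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"https://{safe_user}:{safe_password}@{host}/victoriametrics/api/v1/write"


if __name__ == "__main__":
    # Hypothetical host and password, for illustration only.
    print(remote_write_url("admin", "p@ss/word", "pmm.example.com"))
```

The printed value can be pasted as the url of the remote_write entry for that cluster.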

Create namespace and deploy

kubectl create namespace prometheus
helm install prometheus prometheus-community/prometheus -f values.yaml  --namespace=prometheus



  • PMM Server is up – check
  • Prometheus Operators run on Kubernetes Clusters – check

Now let’s check if metrics are getting to PMM:

  • Go to PMM Server UI
  • On the left pick Explore

  • Run the query

It should return the information about the Nodes running on the cluster with UNIQUE_K8S_LABEL. If it does – all good, metrics are there.
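The same check can be scripted. The sketch below is not the query from the Explore step. It uses the up metric, which Prometheus records for every scrape target, filtered by the kubernetes_cluster_name label. It assumes that the Prometheus-compatible query API of VictoriaMetrics is reachable under the same /victoriametrics prefix as the write endpoint, and that HTTP basic authentication with the PMM user works there. The function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_query(cluster_label: str) -> str:
    """Selector for the `up` series that carry the given cluster label."""
    return f'up{{kubernetes_cluster_name="{cluster_label}"}}'


def query_url(host: str, promql: str) -> str:
    """Instant-query URL. The /victoriametrics prefix here is an assumption."""
    params = urllib.parse.urlencode({"query": promql})
    return f"https://{host}/victoriametrics/api/v1/query?{params}"


def cluster_is_reporting(host: str, user: str, password: str, cluster_label: str) -> bool:
    """Return True if at least one series with the cluster label is stored."""
    request = urllib.request.Request(query_url(host, build_query(cluster_label)))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    return body.get("status") == "success" and bool(body["data"]["result"])
```

Calling cluster_is_reporting with your PMM host, credentials, and one cluster label per call shows which clusters are already delivering metrics. With a self-signed certificate the request fails certificate verification unless a suitable SSL context is passed to urlopen.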

Monitor the Costs

The main reasons for the growth of the cloud bill are computing and storage. Kubernetes can scale up adding more and more nodes, skyrocketing compute costs. 

We are going to add two dashboards to the PMM Server which would equip us with a detailed understanding of how resources are used and what should be tuned to reduce the number of nodes in the cluster or change instance types accordingly:

  1. Cluster overview dashboard
  2. Namespace and Pods dashboard

Import these dashboards in PMM:

[Screenshot: importing dashboards in PMM]

Dashboard #1 – Cluster Overview

The goal of this dashboard is to provide a quick overview of the cluster and its workloads.

[Screenshot: Cluster Overview dashboard]

The cluster on the screenshot has some room for improvement in utilization. It has a capacity of 1.6 thousand CPU cores but utilizes only 146 cores (~9%). Memory utilization is better – ~62%, but can be improved as well.

Quick take:

  • It is possible to reduce the number of nodes and get utilization to at least 80%
  • Looks like workloads in this cluster are mostly memory bound, so it would be wiser to run nodes with more memory and less CPU.
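The arithmetic behind this quick take can be written down in a few lines. This is a rough sketch that uses only the figures quoted above (146 of 1,600 cores, about 62% memory). It assumes usage stays constant and that capacity can be removed in proportion, which real node sizes will only approximate. The function names are hypothetical.

```python
import math


def utilization_pct(used: float, capacity: float) -> float:
    """Share of capacity in use, as a percentage."""
    return used * 100 / capacity


def capacity_for_target(used: float, target_pct: float) -> float:
    """Capacity at which the same usage would reach the target utilization."""
    return used * 100 / target_pct


def shrink_factor(current_pct: float, target_pct: float) -> float:
    """Fraction of today's capacity still needed at the target utilization."""
    return current_pct / target_pct


if __name__ == "__main__":
    print(utilization_pct(146, 1600))               # 9.125
    print(math.ceil(capacity_for_target(146, 80)))  # 183
    print(shrink_factor(62, 80))
```

CPU alone would fit in about 183 cores at 80% utilization, but memory is the binding resource. Going from 62% to 80% memory utilization keeps roughly three quarters of the current memory capacity, which is why nodes with more memory and fewer cores are the better fit here.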

Graphs in the CPU/Mem Request/Limit/Capacity section give a detailed view of resource usage over time:

[Screenshot: CPU/Mem Request/Limit/Capacity section]

Two other interesting graphs show the top 20 namespaces that are wasting resources. Waste is calculated as the difference between requests and real utilization, for CPU and for memory. The values on these graphs can be negative if requests for the containers are not set.
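To make the waste calculation concrete, here is a small sketch of the same idea outside the dashboard. The namespace names and numbers are made up for illustration, and the function is hypothetical. The dashboards themselves compute this with queries over the stored metrics.

```python
from typing import Dict, List, Tuple


def top_wasters(
    requests: Dict[str, float],
    usage: Dict[str, float],
    limit: int = 20,
) -> List[Tuple[str, float]]:
    """Rank namespaces by requested-minus-used resources, largest first.

    A namespace with no requests set counts as requesting 0, so its
    waste is negative whenever it uses anything at all.
    """
    namespaces = set(requests) | set(usage)
    waste = {ns: requests.get(ns, 0.0) - usage.get(ns, 0.0) for ns in namespaces}
    return sorted(waste.items(), key=lambda item: item[1], reverse=True)[:limit]


if __name__ == "__main__":
    # Hypothetical CPU cores per namespace.
    cpu_requests = {"analytics": 20.0, "web": 4.0}
    cpu_usage = {"analytics": 0.5, "web": 3.0, "batch": 1.5}
    for namespace, wasted in top_wasters(cpu_requests, cpu_usage):
        print(namespace, wasted)
```

In this made-up data the batch namespace has no requests set, so it shows up with a negative value, exactly the case mentioned above.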

This dashboard also has a graph showing persistent volume claims and their states. It can potentially help to reduce the number of volumes spun up on the cloud.

Dashboard #2 – Namespace and Pod

Now that we have an overview, it is time to dive deeper into the details. At the top, this dashboard allows the user to choose the Cluster, the Namespace, and the Pod.

At first, the user sees Namespace details: Quotas (might be empty if Resource Quotas are not set for the namespace), requests, limits, and real usage for CPU, Memory, Pods, and Persistent Volume Claims.

[Screenshot: Namespace and Pod dashboard]

The Namespace on the screenshot utilizes almost zero CPU cores but requests 20+ cores. If requests are tuned properly, then the capacity required to run the workloads would drop and the number of nodes can be reduced.

The next valuable insight that the user can pick from this dashboard is real Pod utilization – CPU, Memory, Network, and disks (only local storage).

[Screenshot: Pod CPU Usage]

In the case above you can see CPU and Memory container-level utilization for Prometheus Pod, which is shipping the metrics on one of my Kubernetes clusters.


This blog post equips you with a design to collect metrics from multiple Kubernetes clusters in a single time-series database and expose them in the Percona Monitoring and Management UI through dashboards that you can analyze to gain insights. These insights help you drive your infrastructure costs down and highlight issues on the clusters.

Also, look to PMM on Kubernetes for monitoring of your databases – see our demo here and contact Percona if you are interested in learning more about how to become a Percona Customer, we are here to help!

The call for papers for Percona Live is open. We’d love to receive submissions on topics related to open-source databases such as MySQL, MongoDB, MariaDB, and PostgreSQL. To find out more visit percona.com/live.


Request for Comments: Global Processlist in Percona Monitoring and Management

One piece of feedback I often hear from users of Percona Monitoring and Management (PMM) is that while the Query Analytics feature is great and provides a lot of insights into queries the server handled, it can’t help us to see which queries are running now.

Problem Statement

Real-time access to queries that are running right now can be extremely helpful in case of pileups: if the optimizer goes crazy, a bad query is deployed, or some unexpected locking situation takes place. The usual result is that many queries of the same kind pile up… and if you’re not lucky, they may not complete for many minutes or even hours, all this time invisible in PMM.

Proposed Solution

In addition to the query history that Query Analytics provides now, the proposal is to provide access to “Live Queries”. This would gather the currently running queries from all the nodes that a user is observing (could be one node or could be a hundred), and let them be grouped and sliced in a way similar to how Query Analytics works.

For example, for a given QueryId (Query Pattern), we can see how many instances of such a query are running right now, what the maximum and average execution time is so far, what database hosts and what databases it is active for, what client IPs and users this query is coming from, etc.
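To illustrate the kind of grouping described here, below is a rough sketch of summarizing a snapshot of running queries per QueryId. It is not PMM code. The record fields, class names, and sample values are hypothetical, and it assumes each node can report its running queries with a query ID and elapsed time.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class RunningQuery:
    """One currently running query as reported by a node (hypothetical shape)."""
    query_id: str
    host: str
    database: str
    user: str
    client_ip: str
    elapsed_seconds: float


@dataclass
class PatternSummary:
    """Aggregate view of all running instances of one query pattern."""
    count: int = 0
    max_elapsed: float = 0.0
    total_elapsed: float = 0.0
    hosts: Set[str] = field(default_factory=set)
    databases: Set[str] = field(default_factory=set)
    users: Set[str] = field(default_factory=set)
    client_ips: Set[str] = field(default_factory=set)

    @property
    def avg_elapsed(self) -> float:
        return self.total_elapsed / self.count if self.count else 0.0


def summarize(queries: Iterable[RunningQuery]) -> Dict[str, PatternSummary]:
    """Group running queries by query_id and collect per-pattern statistics."""
    summaries: Dict[str, PatternSummary] = {}
    for query in queries:
        summary = summaries.setdefault(query.query_id, PatternSummary())
        summary.count += 1
        summary.max_elapsed = max(summary.max_elapsed, query.elapsed_seconds)
        summary.total_elapsed += query.elapsed_seconds
        summary.hosts.add(query.host)
        summary.databases.add(query.database)
        summary.users.add(query.user)
        summary.client_ips.add(query.client_ip)
    return summaries
```

A pileup would show up in such a summary as one QueryId with a high count and a growing maximum elapsed time across several hosts.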

Some other Query Analytics features, such as EXPLAIN for a query or information about the involved tables, remain relevant for running queries too.

Also, working with current events and not just history means we can do more than just observe them. I could imagine killing some particular query instance or even all queries which match a particular pattern, which would be handy too.

What do you think? Would having such a “Global Processlist” feature in Percona Monitoring and Management be helpful for you? Anything else we should consider? Let me know in the comments!
