We’ve received numerous reports about issues with Percona Monitoring and Management (PMM) when updating to the latest version (currently 2.42.0), or when updating from an old version (2.37.1 and earlier) to any version using the update button on the UI. Symptoms: PMM won’t show any notifications on the UI about new available versions if: You’ve […]
How to Monitor Valkey/Redis with Percona Monitoring and Management
In this blog post, we will show you how to add external exporters to Percona Monitoring and Management (PMM) and monitor various services. Our focus will be on Valkey/Redis, but the principles apply to other technologies as well. PMM uses VictoriaMetrics, which is compatible with Prometheus. Prometheus has numerous exporters that you can use. When […]
High Availability: Choosing the Right Option for Your Percona Monitoring and Management
Ensuring your monitoring system stays up and running is vital. High availability (HA) minimizes downtime for Percona Monitoring and Management (PMM) during hardware failures, in times of disaster recovery, or increased usage of the tool. It’s not just about extra storage, RAM, or CPU but rather having redundant systems ready to take over seamlessly, like […]
Mastering Database Monitoring: Running PMM in High Availability Mode
Percona Monitoring and Management (PMM) has become a valuable tool for database professionals, providing comprehensive insights into database health and performance. A recent update (version 2.41.0) introduced a significant enhancement: the ability to run PMM in high availability (HA) mode. This feature, currently in technical preview, offers exciting possibilities for ensuring the reliability and robustness […]
Simplify the Use of ENV Variables in Percona Monitoring and Management AMI
The Percona Monitoring and Management (PMM) Amazon Machine Image (AMI) currently lacks native support for ENV variables. In this guide, we’ll walk through a straightforward workaround that simplifies the process of using ENV variables in PMM AMI and reapplying them after an upgrade. Step one: Adding ENV variables to /srv/.env. Begin by consolidating your ENV variables in […]
Percona Monitoring and Management High Availability – A Proof of Concept
Percona Monitoring and Management (PMM) is a state-of-the-art piece of software that exists in part thanks to great open source projects like VictoriaMetrics, PostgreSQL, and ClickHouse. The integration of those projects, plus the years of Percona expertise in the database space, makes PMM one of the best database monitoring solutions on the market.
Being composed of multiple different technologies can add complexity to a well-known concept: High Availability (HA). Achieving HA for PMM as a whole has proved to be a challenge, but not an impossible task.
The easy part
Setting up the PMM cluster is the easy part. All it needs is a reverse proxy in front of two or more PMM instances. The go-to proxy is HAProxy, configured for an active/passive topology, that is, without load distribution.
For the purpose of the PoC, a single HAProxy instance is used (running as a Docker container). The configuration file looks like this:
global
    stats socket /var/run/api.sock user haproxy group haproxy mode 660 level admin expose-fd listeners
    log stdout format raw local0 info

defaults
    mode http
    timeout client 10s
    timeout connect 5s
    timeout server 10s
    timeout http-request 10s
    log global

frontend stats
    bind *:8404
    stats enable
    stats uri /
    stats refresh 10s

frontend pmmfrontend
    bind :80
    default_backend pmmservers

backend pmmservers
    option tcp-check
    server pmm1 172.31.12.174:80 check port 80
    server pmm2 172.31.11.132:80 check port 80 backup
The Docker container is run with this command:
docker run -d --name haproxy -v $(pwd):/usr/local/etc/haproxy:ro -p 80:80 -p 443:443 -p 8404:8404 haproxytech/haproxy-alpine:2.
The -v volume mount guarantees that the local copy of the haproxy.cfg file is the one used inside the container. Whenever you make a change in the cfg file, have the haproxy container pick it up by executing:
docker kill -s HUP haproxy
And to follow the haproxy logs:
docker logs -f haproxy
We have two frontends: one for the HAProxy stats and another for PMM itself. There is a single backend, where the “passive” PMM instance (the one that is a pure “read replica”) is marked as “backup” so that traffic is only routed there if the primary fails the health check.
For simplicity, the PMM instances are configured to listen on port 80 (HTTP) on their private IPs. This avoids dealing with SSL certificates, since everything goes through the same VPC (everything runs on EC2 instances in AWS). The health check can then be a simple “tcp-check” against port 80.
As you can see, stats are available via port 8404. With this, the easy part is done.
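As a quick aside, the same stats can also be read from the command line in CSV form; a minimal sketch, assuming you run it on the HAProxy host (appending ;csv to the stats URI is standard HAProxy behavior, and field 18 is the status column):

curl -s 'http://localhost:8404/;csv' | cut -d',' -f1,2,18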
For this example, the PMM Server endpoint will be the address where the HAProxy frontend is listening, and that is the one used when registering a new PMM Client. This way, you always access PMM through the same endpoint.
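For illustration, registering a client against the proxy instead of an individual PMM server could look like the following sketch (the host placeholder and default admin credentials are assumptions, not values from this PoC):

pmm-admin config --server-url=http://admin:admin@<haproxy-host>:80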
The not-so-easy part (made easy)
The proxy is configured to be aware of two different PMM instances — pmm1 and pmm2 — (using the private IPs 172.31.12.174 and 172.31.11.132 in this case), but we haven’t mentioned anything about those PMM instances.
And here is the not-so-easy part: One has to deploy at least two PMM instances on at least two different servers AND set up replicas. How to do it? This is the actual Proof of Concept: Enter the PMM HA script: https://github.com/nethalo/pmmha
The script will take care of installing PMM (if you already have it, you can skip this step), preparing the primary, and setting up the secondary. Simple as that. PMM itself remains a black box.
The requirements are:
Steps to run the script:
git clone https://github.com/nethalo/pmmha.git
cd pmmha
bash pmm.sh
Failover
Failover is handled by HAProxy automatically when it detects that the current primary is no longer available. Traffic is then routed to the backup server in the backend, which, if properly set up as a replica, will already have the historical data for metrics and QAN, and its inventory will be ready to continue data scraping from the exporters.
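To verify the failover by hand, you can stop PMM on the primary and watch HAProxy flip to the backup. A rough sketch, assuming PMM runs as a container named pmm-server on the pmm1 host:

# on pmm1: simulate a primary failure
docker stop pmm-server

# on the HAProxy host: the logs should report pmm1 going DOWN,
# and requests should keep being answered (now by pmm2)
docker logs -f haproxy
curl -s -o /dev/null -w '%{http_code}\n' http://localhost/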
Your feedback is key
We want to hear from you! Suggestions, feature requests, and comments in general are welcome, either in the comment section or via the GitHub repository.
Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.
Setting Up Percona Monitoring and Management Alerts for External Channels (Telegram, Slack, WebHook)
Setting up Percona Monitoring and Management (PMM) alerts for multiple channels can significantly enhance your monitoring strategy. In this blog post, we will talk about the steps to configure alerts for some well-known communication platforms like Telegram, Slack, and WebHook.
Please note that I am not covering the basic alerting and configuration setup. For that, you can refer to the official manual: https://docs.percona.com/percona-monitoring-and-management/get-started/alerting.html#percona-alerting
Before using these channels, it is essential to first create some alerts for notification purposes.
https://docs.percona.com/percona-monitoring-and-management/get-started/alerting.html
Setting up PMM alerts for Slack
1) Go to the Slack API website – https://api.slack.com/apps
2) Then, click on “Create New App” and follow the prompts to create a new app for your workspace.
- Choose the Slack workspace and define the app name. You can also create/add a new workspace in Slack as required instead of using an existing one.
- Then, select the “Bots” section.
- And, click the “Review Scopes to Add” section.
- Here, we need to add the different “Scopes” levels or types of access for the Slack app.
- Then, hit the “Install to Workspace” section.
3) Now, you will see the “OAuth Token,” which we will use later in PMM Slack configurations.
4) Next, we need to add the app “pmm_alerts” to one or more Slack channels in order to get the notifications.
5) Finally, we can add the details below in the PMM -> Alerting -> Contact Points section.
TOKEN: xoxb-5840606778679-6229228252662-teofbDQUiFCnrp3cZT08epyL   ## Bot API token
Recipient: test   ## channel name
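Before saving, you can sanity-check the token outside of PMM with Slack’s Web API; a sketch using the standard auth.test and chat.postMessage methods and the channel name from above (substitute your own token):

curl -s -H "Authorization: Bearer xoxb-<your-bot-token>" https://slack.com/api/auth.test
curl -s -X POST -H "Authorization: Bearer xoxb-<your-bot-token>" \
     -H "Content-type: application/json" \
     -d '{"channel":"test","text":"PMM contact point test"}' \
     https://slack.com/api/chat.postMessage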
- On successful testing, we should see some test alerts.
- The original alert message, based on the trigger condition, will look like the one below.
Setting up PMM Alerts for Telegram
For the Telegram app, we also need the “BOT API Token” and “Chat ID” details to set up alert notifications.
1) Create a new bot in Telegram. Select “@BotFather” in the search tab and send the command “/newbot” in the chat section.
2) Provide a “name” for your bot and then a “username” for the bot. The username needs to end in “bot”.
Here, we have received the BOT API Token: 6396505873:AAEQT5DCFAlzpqqdh9p69YwiQermTespfDA
3) Now, we need to change the “group privacy policy” in order to allow the bot to read messages sent to any group it is a member of.
- In the “@BotFather” chat, type “/setprivacy” for the user “@pmm_test_alerts_bot” and set the group privacy to “Disable” as below:
4) After disabling group privacy, create a group “pmm_alerts” and add the new bot: “pmm_test_alerts” to that group.
So here, we have successfully created the group. Now, we need to send at least one message in order to activate the group.
5) Next, we need the second input, “Chat ID,” that PMM requires. This can be done using either curl or directly running the URL on the web browser.
Here, we are using the complete token “6396505873:AAEQT5DCFAlzpqqdh9p69YwiQermTespfDA” we got in step 2. In order to use this with the API, we need the prefix “bot” in the string.
curl https://api.telegram.org/bot6396505873:AAEQT5DCFAlzpqqdh9p69YwiQermTespfDA/getUpdates
Output:
{"ok":true,"result":[{"update_id":815074387, "message":{"message_id":3,"from":{"id":6452928862,"is_bot":false,"first_name":"Anil","last_name":"Joshi"},"chat":{"id":-4013864418,"title":"pmm_alerts","type":"group","all_members_are_administrators":true},"date":1700735471,"text":"hi"}}]}
So finally, we got the CHAT ID “-4013864418” as well.
6) Now it’s time to use the above details in the PMM -> Alerting -> Contact Points section.
BOT API TOKEN: 6396505873:AAEQT5DCFAlzpqqdh9p69YwiQermTespfDA
CHAT ID: -4013864418
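You can also confirm the bot can post to the group before wiring it into PMM; sendMessage is a standard Bot API method, used here with the same token and chat ID obtained above:

curl -s "https://api.telegram.org/bot6396505873:AAEQT5DCFAlzpqqdh9p69YwiQermTespfDA/sendMessage" \
     -d chat_id=-4013864418 -d text="PMM contact point test"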
- On successful testing, we should see some test alerts.
- If we trigger a real alert, it will appear like this.
Setting up PMM Alerts for WebHook
Webhooks are a powerful tool for building integrations between different applications or services, enabling them to work together seamlessly. They are widely used in web development, APIs, and cloud services to create more dynamic and responsive systems.
URL/API:
https://xxx.mn/v1/main/update/status
In simple terms, it’s just an API that can be created in any programming language (PHP, Java, Node.js, etc.) to send and receive responses and integrate with any third-party application.
Here, we are simply integrating the URL with the PMM in order to get the response.
Response from the URL
{"Info":"{"receiver":"grafana-default-email","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"pmm_mysql_down Alerting Rule","grafana_folder":"MySQL","node_name":"localhost.localdomain","percona_alerting":"1","service_name":"localhost.localdomain-mysql","severity":"critical","template_name":"pmm_mysql_down"},"annotations":{"description":"MySQL localhost.localdomain-mysql on localhost.localdomain is down.","summary":"MySQL down (localhost.localdomain-mysql)"},"startsAt":"2023-11-24T03:45:10Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"https://localhost/graph/alerting/grafana/1E1kb3SSz/view","fingerprint":"3be1993cc9a48420","silenceURL":"https://localhost/graph/alerting/silence/new?alertmanager=grafana&matcher=alertname%3Dpmm_mysql_down+Alerting+Rule&matcher=grafana_folder%3DMySQL&matcher=node_name%3Dlocalhost.localdomain&matcher=percona_alerting%3D1&matcher=service_name%3Dlocalhost.localdomain-mysql&matcher=severity%3Dcritical&matcher=template_name%3Dpmm_mysql_down","dashboardURL":null,"panelURL":null,"valueString":"[ var='A' labels={node_name=localhost.localdomain, service_name=localhost.localdomain-mysql} value=1 ]"}],"groupLabels":{"alertname":"pmm_mysql_down Alerting Rule","grafana_folder":"MySQL"},"commonLabels":{"alertname":"pmm_mysql_down Alerting Rule","grafana_folder":"MySQL","node_name":"localhost.localdomain","percona_alerting":"1","service_name":"localhost.localdomain-mysql","severity":"critical","template_name":"pmm_mysql_down"},"commonAnnotations":{"description":"MySQL localhost.localdomain-mysql on localhost.localdomain is down.","summary":"MySQL down (localhost.localdomain-mysql)"},"externalURL":"https://localhost/graph/","version":"1","groupKey":"{}:{alertname="pmm_mysql_down Alerting Rule", grafana_folder="MySQL"}","truncatedAlerts":0,"orgId":1,"title":"[FIRING:1] pmm_mysql_down Alerting Rule MySQL (localhost.localdomain 1 localhost.localdomain-mysql critical pmm_mysql_down)","state":"alerting","message":"**Firing**nnValue: [ var='A' labels={node_name=localhost.localdomain, service_name=localhost.localdomain-mysql} value=1 ]nLabels:n - alertname = pmm_mysql_down Alerting Rulen - grafana_folder = MySQLn - node_name = localhost.localdomainn - percona_alerting = 1n - service_name = localhost.localdomain-mysqln - severity = criticaln - template_name = pmm_mysql_downnAnnotations:n - description = MySQL localhost.localdomain-mysql on localhost.localdomain is down.n - summary = MySQL down (localhost.localdomain-mysql)nSource: https://localhost/graph/alerting/grafana/1E1kb3SSz/viewnSilence: https://localhost/graph/alerting/silence/new?alertmanager=grafana&matcher=alertname%3Dpmm_mysql_down+Alerting+Rule&matcher=grafana_folder%3DMySQL&matcher=node_name%3Dlocalhost.localdomain&matcher=percona_alerting%3D1&matcher=service_name%3Dlocalhost.localdomain-mysql&matcher=severity%3Dcritical&matcher=template_name%3Dpmm_mysql_downn"}"} [
The above response can be generated by submitting the URL directly or by using tools like Postman or cURL, which are widely used to interact with HTTP-based APIs.
curl -X POST -d @/tmp/file.json -H "Content-Type: application/json" https://xxx.mn/v1/main/update/status
There are a few use cases where webhooks are useful:
- Webhooks allow systems to receive real-time updates when certain events occur. For example, in a messaging application, a webhook can be used to notify a third-party service whenever a new message is received.
- Webhooks are commonly employed for sending notifications. This could include alerts for system events, status changes, or important updates. For example, a monitoring system can use webhooks to notify administrators when there’s a critical issue.
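If you just want to inspect what PMM sends before building a real endpoint, a throwaway listener is enough. A crude sketch, assuming a traditional netcat build where -l -p works; it answers every request with an empty 200 response and logs the raw payload:

while true; do
  printf 'HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n' | nc -l -p 8080 | tee -a /tmp/pmm-webhook.log
done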
How to Filter or Customize Alert Notifications in Percona Monitoring and Management (Subject and Body)
In many scenarios, the standard alert notification template in Percona Monitoring and Management (PMM), while comprehensive, may not align perfectly with specific operational needs. This often leads to an excess of details in the notification’s “Subject” and “Body”, cluttering your inbox with information that may not be immediately relevant.
The focus today is on tailoring these notifications to fit your unique requirements. We’ll guide you through the process of editing the “Subject” and “Body” in the PMM UI, ensuring that the alerts you receive are filtered and relevant to your specific business context.
Please note: This post assumes a foundational understanding of basic alerting and configuration in PMM. For those new to these concepts, we recommend consulting the documentation on “SMTP” and “PMM Integrated/Grafana alert” for a primer.
Customizing the “Subject” section of alert notification
1) The default “Subject” will look something like below.
2) Now, let’s proceed to edit the “subject” content.
I) First, we need to create a new message template called “email.subject” in Alerting -> Contact points with the following content.
Template_name: email.subject
{{ define "email.subject" }} {{ range .Alerts }} Percona Alert | {{ .Labels. }} | {{ .Labels.node_name }} | {{ .Labels.DB }} {{ end }} {{ end }}
Here, we simply use range to iterate over the alerts; from each alert’s labels we extract the alert name, the node name, and the database.
The provided template is written in Go’s templating language. For a more detailed understanding of the syntax and usage of templates, please refer to the official manual.
II) Then we need to edit the default contact point name inside “Alerting->Contact points”
And define the below “Subject” under “Optional Email Settings”.
{{ template "email.subject". }}
III) After successfully testing, we can save the changes.
That’s it. Now, if the alert triggers, we will observe a customized subject in the email.
Example:
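With the template above, the rendered subject would look roughly like this (illustrative label values; DB is a custom label assumed to exist on the alert):

Percona Alert | pmm_mysql_down Alerting Rule | localhost.localdomain | sbtest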
Customizing the “Body” section of alert notification
1) Let’s first see how the notifications appear with the native alerting. This is a basic notification alert that triggers when the database/MySQL is down. As we can see, it includes additional information, such as various labels and a summary.
2) Now, suppose we want to get rid of some content and want only a few relevant details. This can be achieved by following the steps outlined below.
I) Go to Alerting -> Contact points and add new “Message templates”.
II) Next, create a notification template named “email” with two templates in the content: “email.message_alert” and “email.message”.
The “email.message_alert” template is used to display the labels and values for each firing and resolved alert, while the “email.message” template contains the email’s structure.
Template name: email.message
### These are the key-value pairs that we want to display in our alerts. ###
{{- define "email.message_alert" -}}
AlertName = {{ index .Labels "alertname" }}{{ "\n" }}
Database = {{ index .Labels "DB" }}{{ "\n" }}
Node_name = {{ index .Labels "node_name" }}{{ "\n" }}
Service_name = {{ index .Labels "service_name" }}{{ "\n" }}
Service Type = MySQL {{ "\n" }}
Severity = {{ index .Labels "severity" }}{{ "\n" }}
TemplateName = {{ index .Labels "template_name" }}{{ "\n" }}
{{- end -}}

### Next, we have defined the main section that governs the alerting and firing rules. ###
{{ define "email.message" }}
There are {{ len .Alerts.Firing }} firing alert(s), and {{ len .Alerts.Resolved }} resolved alert(s){{ "\n" }}
### Finally, the alerts and firing rules are invoked and triggered based on the generated alerts or fixes. ###
{{ if .Alerts.Firing -}}
Firing alerts:{{ "\n" }}
{{- range .Alerts.Firing }}
- {{ template "email.message_alert" . }}
{{- end }}
{{- end }}
{{ if .Alerts.Resolved -}}
Resolved alerts:{{ "\n" }}
{{- range .Alerts.Resolved }}
- {{ template "email.message_alert" . }}
{{- end }}
{{- end }}
{{ end }}
The above template is written in Go’s templating language. To learn more about the syntax and usage of templates, you can refer to the manual.
III) Lastly, simply save the template.
3) Next, we will edit the default “Contact points” and define the below content under “Update contact point -> Optional Email settings->Message” for email. Similarly, you can add other channels as well, like Telegram, Slack, etc.
Execute the template from the “message” field in your contact point integration.
{{ template "email.message" . }}
Percona Alerting comes with a pre-configured default notification policy. This policy utilizes the grafana-default-email contact point and is automatically applied to all alerts that do not have a custom notification policy assigned to them.
Reference: https://docs.percona.com/percona-monitoring-and-management/use/alerting.html#notification-policies
After verifying a successful test message, we can save the updated contact point.
4) Finally, once the alert is triggered, you will be able to see the customized notification reflecting only the defined key/values.
Moreover, we can also use “LABEL LOOPS” instead of defining the separate “Key/Value” pairs as we did in the above steps. In this way, we can have all the default parameters in iteration without explicitly defining each of them.
Here, we use range to iterate over the alerts, so that dot refers to the current alert in the list; we then use range over the sorted labels, so dot is updated to refer to the current label. Inside that range, “.Name” and “.Value” print the name and value of each label.
### applying label loop option ###
{{- define "email.message_alert" -}}
Label Loop:
{{ range .Labels.SortedPairs }} {{ .Name }} => {{ .Value }} {{ end }}
{{- end -}}

{{ define "email.message" }}
There are {{ len .Alerts.Firing }} firing alert(s), and {{ len .Alerts.Resolved }} resolved alert(s){{ "\n" }}
{{ if .Alerts.Firing -}}
Firing alerts:{{ "\n" }}
{{- range .Alerts.Firing }}
- {{ template "email.message_alert" . }}
{{- end }}
{{- end }}
{{ if .Alerts.Resolved -}}
Resolved alerts:{{ "\n" }}
{{- range .Alerts.Resolved }}
- {{ template "email.message_alert" . }}
{{- end }}
{{- end }}
{{ end }}
To add some more options, say summary and description, to the customized alerts, the template changes below can be performed.
I) First, you can add/update the “Summary and annotations” section inside the “alert rule” based on your preference.
II) Then, edit the below Message template (“email.message”) in Alerting->contact points with the updated changes.
Template name: email.message
{{- define "email.message_alert" -}} AlertName = {{ index .Labels "alertname" }}{{ "n" }} Database = {{ index .Labels "DB" }}{{ "n" }} Node_name = {{ index .Labels "node_name" }}{{ "n" }} Service_name = {{ index .Labels "service_name" }}{{ "n" }} Service Type = {{ index .Labels "service_type" }}{{ "n" }} Severity = {{ index .Labels "severity" }}{{ "n" }} TemplateName = {{ index .Labels "template_name" }}{{ "n" }} {{- end -}} {{ define "email.message" }} There are {{ len .Alerts.Firing }} firing alert(s), and {{ len .Alerts.Resolved }} resolved alert(s){{ "n" }} {{ if .Alerts.Firing -}} Firing alerts:{{ "n" }} {{- range .Alerts.Firing }} - {{ template "email.message_alert" . }} - {{ template "alerts.summarize" . }} {{- end }} {{- end }} {{ if .Alerts.Resolved -}} Resolved alerts:{{ "n" }} {{- range .Alerts.Resolved }} - {{ template "email.message_alert" . }} - {{ template "alerts.summarize" . }} {{- end }} {{- end }} {{ end }} {{ define "alerts.summarize" -}} {{ range .Annotations.SortedPairs}} {{ .Name }} = {{ .Value }} {{ end }} {{ end }}
Reference: https://grafana.com/blog/2023/04/05/grafana-alerting-a-beginners-guide-to-templating-alert-notifications/
Sometimes, the alert notifications might appear on a single line instead of separate lines for each key. Although this is not the regular behavior, it can be fixed with the changes below.
I) Access the PMM Server:
sudo docker exec -it pmm-server bash
II) Thereafter, you can edit the file “/usr/share/grafana/public/emails/ng_alert_notification.html” and replace the text between lines 288 and 290 as below.
Replace:
{{ if gt (len .Message) 0 }} <div style="white-space: pre-line;" align="left">{{ .Message }} {{ else }}
With:
{{ if gt (len .Message) 0 }} <span style="white-space: pre-line;">{{ .Message }}</span> {{ else }}
Note: Please ensure you take a backup before making any changes to the PMM Server files. Moreover, these changes could be lost during a PMM upgrade, especially when Grafana is upgraded as part of PMM, so a backup of the edited version would also be needed for later restoration.
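For instance, a quick way to keep a copy before editing is to place it under /srv, which PMM keeps on its persistent data volume (a sketch; the path is the one referenced above):

sudo docker exec pmm-server cp /usr/share/grafana/public/emails/ng_alert_notification.html \
    /srv/ng_alert_notification.html.bak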
III) Finally, you can restart the Grafana service.
supervisorctl restart grafana
Summary
Filtering alert notifications proves useful for concealing extraneous information from users. Only the specified elements are displayed in the notification email, preventing unnecessary clutter in the alert content.
PMM Dump: A New Tool to Support Percona Customers
In this post, I will present a new Percona tool: PMM Dump. PMM Dump performs a logical backup of the performance metrics collected by the Percona Monitoring and Management (PMM) Server and imports them into a different PMM Server instance. PMM Dump allows you to securely share monitoring data collected by your PMM server with the Percona Support team. The first General Availability version of PMM Dump was released on November 22, 2023.
In Percona Support, we help our customers resolve performance issues. To do that, we need to see the performance metrics collected at the time the issue happened. PMM collects the metrics we need, and if a customer can give us access to their PMM instance, we can start resolving the issue right away.
Unfortunately, not everyone who uses PMM can provide us access due to their companies’ security restrictions. We also cannot effectively use PMM snapshots because they require individual sharing of dashboards or panels. This requires a lot of manual work on both the customer’s and our sides.
To resolve this problem, we created a tool that allows customers to export performance and query analytics metrics into a dump file that we can later restore on our own secure, private, and temporary PMM instances and access the data as if it were collected by our own servers.
Once the issue is resolved and the corresponding ticket closed, we remove all data provided by customers. You can read more about Percona security and data privacy practices at https://www.percona.com/legal/percona-security-and-data-privacy-practices.
PMM Dump supports rich filtering options and native VictoriaMetrics or ClickHouse queries so users can decide which data they want to share with us.
PMM Dump is a command line tool that can be downloaded from https://www.percona.com/get/pmm-dump. Starting from PMM version 2.27.0, it is also shipped together with PMM. Unofficial ARM packages are available at https://github.com/Percona-Lab/percona-on-arm/releases.
By default, PMM Dump dumps all performance metrics collected in the last 4 hours:
$ pmm-dump export --pmm-url='https://admin:admin@127.0.0.1' \
> --allow-insecure-certs
2023-01-05T20:01:33+03:00 INF Credential user was obtained from pmm-url
2023-01-05T20:01:33+03:00 INF Credential password was obtained from pmm-url
2023-01-05T20:01:33+03:00 INF Exporting metrics...
2023-01-05T20:01:33+03:00 INF Processing 1/49 chunk...
2023-01-05T20:01:33+03:00 INF Processing 2/49 chunk...
2023-01-05T20:01:33+03:00 INF Processing 3/49 chunk...
...
2023-01-05T20:01:39+03:00 INF Processing 49/49 chunk...
2023-01-05T20:01:39+03:00 INF Writing chunk to the dump... filename=1672934793-1672935093.bin source=vm
...
2023-01-05T20:01:39+03:00 INF Successfully exported!

$ ls *tar.gz
pmm-dump-1672938093.tar.gz
It will not collect query analytics data by default. To include it, specify the --dump-qan option.
To specify a different time range, use the --start-ts and --end-ts options:
$ pmm-dump export --start-ts="2021-01-02T15:04:05Z" --end-ts="2021-01-02T20:04:05+03:00"
To dump only specific instances, use the --instance option:
$ pmm-dump export --pmm-url='http://admin:admin@127.0.0.1' --instance='my_mysql_8025'
You can specify this option as many times as you need:
$ pmm-dump export --pmm-url='http://admin:admin@127.0.0.1' \
> --instance='supp-mongo_23599' \
> --instance='supp-mongo_23600' \
> --instance='supp-mongo_23601'
You can also filter by a dashboard:
$ pmm-dump export --pmm-url='http://admin:admin@127.0.0.1' \
> --dashboard='MySQL InnoDB Details' \
> --dashboard='Node Summary' --dashboard='MySQL Instance Summary'
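Before sending the dump to Support, you can review what it contains; the dump is an ordinary tar.gz archive of metric chunks plus metadata, so standard tooling works:

$ tar -tzf pmm-dump-1672938093.tar.gz | head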
On the Support side, we import the resulting dump using either the import command or the load-pmm-dump helper script.
$ ./pmm-dump import --pmm-url='https://admin:admin@127.0.0.1' \
> --allow-insecure-certs --dump-path=pmm-dump-1672938093.tar.gz
2023-01-05T20:09:46+03:00 INF Credential user was obtained from pmm-url
2023-01-05T20:09:46+03:00 INF Credential password was obtained from pmm-url
2023-01-05T20:09:46+03:00 INF Importing metrics...
2023-01-05T20:09:46+03:00 INF Opening dump file... path=pmm-dump-1672938093.tar.gz
2023-01-05T20:09:46+03:00 INF Processing chunk 'vm/1672923693-1672923993.bin'...
2023-01-05T20:09:46+03:00 INF Successfully processed 'vm/1672923693-1672923993.bin'
...
2023-01-05T20:09:55+03:00 INF Processing chunk 'vm/1672937793-1672938093.bin'...
2023-01-05T20:09:55+03:00 INF Successfully processed 'vm/1672937793-1672938093.bin'
2023-01-05T20:09:55+03:00 INF Successfully imported!
Once imported successfully, we can access the local PMM instance and start resolving performance issues for our customers.
For more information about PMM Dump and its rich filtering options, consult the user reference manual at https://docs.percona.com/pmm-dump-documentation/index.html.
PostgreSQL Checkpoints, Buffers, and WAL Usage with Percona Monitoring and Management
In this blog post, we will discuss how to extend Percona Monitoring and Management (PMM) to get PostgreSQL metrics on checkpointing activity, internal buffers, and WAL usage. With this data, we’ll be able to better understand and tune our Postgres servers.
We’ll assume there are working PostgreSQL and PMM environments set up already. You can search the blog for more information on this if needed. For generating load, we used pgbench with the following commands:
shell> pgbench -i -s 100 sbtest
shell> pgbench -c 8 -j 3 -T $((60*60*4)) -N -S -P 1 sbtest
Creating a custom query collector
Note: This step will not be needed with soon-to-come postgresql_exporter PMM versions!
The first step is to include a new query that gathers data on WAL usage, because this is not yet collected by default in the latest version of PMM. Note that only three graphs use these custom metrics, so if you want to try out the dashboard without this customization, you can skip this step (at the expense of having a bit less data).
For this, it’s as easy as executing the following (in the PMM client node that is monitoring the Postgres servers):
cd /usr/local/percona/pmm2/collectors/custom-queries/postgresql/high-resolution/
cat <<EOF >queries-postgres-wal.yml
# ######################################################
# This query gets information related to WAL LSN numbers in bytes,
# which can be used to monitor the write workload.
# Returns a COUNTER: https://prometheus.io/docs/concepts/metric_types/
# ######################################################
pg_wal_lsn:
  query: "SELECT pg_current_wal_lsn() - '0/0' AS bytes, pg_walfile_name(pg_current_wal_lsn()) as file, pg_current_wal_insert_lsn() - '0/0' AS insert_bytes, pg_current_wal_flush_lsn() - '0/0' AS flush_bytes"
  metrics:
    - bytes:
        usage: "COUNTER"
        description: "WAL LSN (log sequence number) in bytes."
    - file:
        usage: "COUNTER"
        description: "WAL file name being written to."
    - insert_bytes:
        usage: "COUNTER"
        description: "WAL insert LSN (log sequence number) in bytes."
    - flush_bytes:
        usage: "COUNTER"
        description: "WAL flush LSN (log sequence number) in bytes."
EOF
chown pmm-agent:pmm-agent queries-postgres-wal.yml
chmod 660 queries-postgres-wal.yml
This will result in the following new metrics on PMM:
pg_wal_lsn_bytes
pg_wal_lsn_file
pg_wal_lsn_insert_bytes
pg_wal_lsn_flush_bytes
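If you want to sanity-check the underlying query before waiting for the exporter to scrape it, you can run it manually; a sketch using the pgbench database from above:

shell> psql -d sbtest -c "SELECT pg_current_wal_lsn() - '0/0' AS bytes, pg_walfile_name(pg_current_wal_lsn()) AS file, pg_current_wal_insert_lsn() - '0/0' AS insert_bytes, pg_current_wal_flush_lsn() - '0/0' AS flush_bytes;"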
Importing the custom PMM dashboard
For viewing these new metrics, we will import a new custom PMM dashboard that presents this information in a structured and easy-to-understand way. This can be done in a few easy steps (since I’ve uploaded it to Grafana Labs).
1. Click on the Dashboards -> Import menu:
2. Import via grafana.com with ID number (19600):
3. Select the PostgreSQL folder and Metrics, and import it:
After this, you’ll be taken to the dashboard, which I generally “star” for easy access later on:
Using the new dashboard
The dashboard is divided into three main areas, all of which have relevant configurations at the top and graphs following them. The main idea is to be able to easily detect what the server is doing at any point in time so we can better understand how it reacts to the workload and tune accordingly. These graphs and panels not only show changes in usage and workload patterns but also changes in configuration, which can help during root cause analysis and performance reviews.
Checkpointing section
Here, we will find everything related to checkpointing: when are checkpoints started (and due to what) and when they finish. Checkpointing has a lot of impact on the write throughput/utilization by Postgres, so it’s important to make sure that it’s not being triggered before needed.
In the following example, we can see that at the beginning there were many instances of forced checkpointing: not only was the checkpointer running more often than every checkpoint_timeout seconds, but the Checkpoints graph shows them as requested rather than scheduled. We can’t see it yet because the WAL graphs are at the bottom, but this was fixed after we increased the value of max_wal_size.
Additionally, after the metrics refresh, we can see that the Checkpoint Timeout stat panel shows another change closer to the end of our currently selected time range:
This means that we have also changed the checkpoint_timeout value. This was done only after our workload was more stable and no more forced checkpoints were being issued.
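For reference, both max_wal_size and checkpoint_timeout can be changed without a restart; a sketch with illustrative values (not the exact ones used in this test):

shell> psql -c "ALTER SYSTEM SET max_wal_size = '8GB';"
shell> psql -c "ALTER SYSTEM SET checkpoint_timeout = '30min';"
shell> psql -c "SELECT pg_reload_conf();"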
Buffers section
This section holds configurations and metrics related to reads to and writes from the shared buffers. Again, demonstrating by example, let’s see how much of an impact correctly sizing shared_buffers has on our test workload. It’s set to 128MB by default, which is hardly enough for any serious workload and means we will see a lot of churn in both reads and writes until we increase it. The only downside is that modifying shared_buffers needs a restart, so before doing that, we can increase bgwriter_lru_maxpages (which doesn’t need a restart) to avoid the background writer stopping each round, and get a bit more performance out of it at the expense of I/O.
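Both changes can be applied with ALTER SYSTEM; a sketch with illustrative values (the PostgreSQL service name varies by distro and packaging):

shell> psql -c "ALTER SYSTEM SET bgwriter_lru_maxpages = 1000;"
shell> psql -c "SELECT pg_reload_conf();"    # takes effect without a restart
shell> psql -c "ALTER SYSTEM SET shared_buffers = '2GB';"
shell> sudo systemctl restart postgresql     # shared_buffers only applies after a restart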
We can clearly see how, at first, there were constant reads into the shared buffers, and the background writer was doing most of the writes. Additionally, we can see the positive impact on increasing bgwriter_lru_maxpages because that process is not being throttled anymore. After we increased shared_buffers and restarted Postgres, the reads decreased to almost having to read nothing from the page cache or disks, and the writes are no longer done by the background writer but by the checkpointer.
WAL usage section
The last section pertains to write-ahead logs, which are another source of potential performance bottlenecks. Note that (for now) this is the only section that needs data coming from the custom query collector we added at the beginning, so you can use the rest of the dashboard even if you decide not to include it. PMM already collects the configurations, so we will only miss data from the graphs.
The WAL Writes Per Checkpoint Timeout graph will group the number of bytes that were written in chunks of checkpoint_timeout seconds so we can have a clear view of the expected max_wal_size value. If the chunks (shown as orange bars) exceed the max_wal_size (shown as a red line), it means that there will be forced checkpointing. We can easily dimension max_wal_size knowing that it should be larger than any orange bar, and we can tell when exactly it was changed, following changes in the red line. The WAL Writes graph uses the same metric but is shown as a rate, which can help us pinpoint heavy write times more granularly. Lastly, the WAL Locations graph is included for completeness and shows the different locations for WAL pointers (such as insertion and flush locations), which, according to the documentation, are mainly used for debugging purposes.
Hints and tips
Most graphs contain additional information and links on either the configuration variables or the metrics they show, so they are useful in case we need a bit of help interpreting them. Just hover the mouse over the “i” icon on each of them:
Getting even more data
Starting from PostgreSQL 14, we have a new view on WAL metrics: pg_stat_wal.
The metrics used in this custom collector and dashboard will work for older versions, but in the future, it will be nice to see what data we can extract from this new view and how we can use it to tune our servers.
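For example, a quick look at the cumulative WAL counters the view exposes (columns as documented for pg_stat_wal):

shell> psql -c "SELECT wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_write, wal_sync FROM pg_stat_wal;"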
Conclusion
This is another example of how tunable and powerful PMM is: not only can we add our custom metrics easily, but it already collects many metrics we can use out of the box. Additionally, it’s easy to share new PMM dashboards via the Grafana Labs page by simply publishing them and sharing the ID number. With all this new information at hand, we can now better tune our PostgreSQL instances!
Lastly, we are working closely with the PMM Dev team to include this new dashboard in the PMM server itself, so all these custom steps won’t be needed in future releases. Let us know if you have any comments on it.