Mar
17
2023
--

Percona Monitoring and Management 2 Scaling and Capacity Planning


2022 was an exciting year for Percona Monitoring and Management (PMM). We’ve added and improved many features, including Alerting and Backup Management. These updates are designed to keep databases running at peak performance and simplify database operations. But as companies grow and see more demand for their databases, we need to ensure that PMM also remains scalable so you don’t need to worry about its performance while tending to the rest of your environment.

PMM2 uses VictoriaMetrics (VM) as its metrics storage engine. Percona’s co-founder Peter Zaitsev wrote a detailed post about the migration from Prometheus to VictoriaMetrics. One of the most significant performance differences in PMM2 comes from the use of VM, which can also be seen in the performance comparison of node_exporter metrics between Prometheus and VictoriaMetrics.

Planning the resources for a PMM Server host instance can be tricky because the numbers change depending on the DB instances being monitored by PMM. For example, a higher number of data samples ingested per second or a monitored database with a huge number of tables (1,000+) can affect performance; similarly, the exporter configuration or a custom metric resolution can also impact the performance of the PMM Server host. The point is that scaling up PMM isn’t linear, and this post is only meant to give you a general idea and serve as a good starting point when planning to set up PMM2.

The VictoriaMetrics team has also published some best practices that are worth consulting when planning resources for a PMM2 setup.

Home Dashboard for PMM2
We have tested PMM version 2.33.0 with its default configuration, and it can monitor more than 1,000 MySQL services, with the databases running the default sysbench read-write workload. We observed that the overall performance of PMM monitoring 1,000 database services was good, and no significant resource usage spikes were observed; this is a HUGE increase in performance and capacity over previous versions! Please note that the focus of these tests was standard metrics gathering and display; we’ll use a future blog post to benchmark some of the more intensive Query Analytics (QAN) performance numbers.

Capacity planning and setup details

We used a dedicated 32-core CPU and 64GB of RAM for our testing.

CPU Usage for PMM Server Host System

CPU usage averaged around 24%, as you can see in the picture above.

 

Memory Utilization for PMM Server Host System

Virtual Memory utilization was averaging 48 GB of RAM.

VictoriaMetrics maintains an in-memory cache for mapping active time series into internal series IDs. The cache size depends on the memory available to VictoriaMetrics on the host system; hence, planning enough RAM on the host is important for good performance and for avoiding a high percentage of slow inserts.

Looking at overall disk usage for the instance monitoring 1,000 database services, the average disk usage per datapoint comes out to roughly 0.25 bytes, so you should plan for roughly 500 GB to one TB of storage for the default 30-day retention period.
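As a rough, hedged sketch (not an exact formula), you can estimate disk usage from the ingestion rate shown on the PMM Home Dashboard; the ingestion rate below is a hypothetical placeholder in the ballpark of the 1,000-service test, so substitute your own numbers:

DATAPOINTS_PER_SEC=800000   # hypothetical ingestion rate for ~1,000 services
BYTES_PER_DATAPOINT=0.25    # average observed in our tests
RETENTION_DAYS=30           # PMM default retention

# Estimated disk usage in GB for the retention window
echo "$DATAPOINTS_PER_SEC * $BYTES_PER_DATAPOINT * $RETENTION_DAYS * 86400 / 1024^3" | bc
# ~482 GB, roughly in line with the 500 GB - 1 TB range above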

Average datapoints size

Stats on Victoria Metrics

We recommend having at least two GB of RAM and a two-core system for the PMM Server as a bare-minimum requirement to set up monitoring of your database services. With this minimum recommended setup, you can monitor up to three databases comfortably, possibly more, depending on some of your environment’s already mentioned factors.

Based on our observations and the various setups we have done with PMM, with a reasonably powerful pmm-server host system (eight+ GB RAM and eight+ cores), it is optimal to target monitoring roughly 32 databases per core, or 16 databases per GB of RAM; keeping this in mind is really helpful when planning resources for your monitoring setup (see the sketch after the table below).

Number of database services monitored    Minimum recommended requirement
0-250 services                           8 cores, 16 GB RAM
250-500 services                         16 cores, 32 GB RAM
500-1000 services                        32 cores, 64 GB RAM
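If you prefer a quick calculation over the table, here is a minimal shell sketch of the same rule of thumb (the function name and the round-up logic are ours, not part of PMM):

# 32 services per core, 16 services per GB of RAM, rounded up
suggest_pmm_size() {
  services=$1
  cores=$(( (services + 31) / 32 ))
  ram_gb=$(( (services + 15) / 16 ))
  echo "${services} services -> ~${cores} cores, ~${ram_gb} GB RAM"
}

suggest_pmm_size 500    # -> ~16 cores, ~32 GB RAM, matching the table above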

PMM scalability improved dramatically through UX and performance research

In earlier versions of PMM2, the Home Dashboard could not load with more than 400 DB services, resulting in a frustrating experience for users: interacting with UI elements such as filters and date pickers was impossible. We conducted thorough research to improve scalability and the user experience of our Home Dashboard for 1,000 database services. Our findings revealed that the design of the Home Dashboard heavily impacted scalability, and the poor UX resulted in unresponsive pages.

We redesigned the Home Dashboard as a solution, and the results were significant. The new dashboard provides a much better user experience, displays more of the critical information, and scales to environments of up to 1,000 DB services. The overall load time has improved dramatically, going from 50+ seconds to roughly 20 seconds, and there are no longer any unresponsive errors in the UI. Users can now interact with filters on other dashboards seamlessly as well!

There are still some limitations we’re working on addressing

  • The Instance Overview dashboards, which are shipped with PMM, do not work well with such a large number of instances, so it is recommended not to rely on them when such a high number of databases is being monitored. They work well only up to a maximum of around 400 database services.
  • There is a known issue where a “Request URI too Large” pop-up message appears because of some large query requests; this also causes a problem when setting a big time range for observing metrics from a monitored database. Our team is planning to implement a fix for this soon.
  • QAN takes 50+ seconds to load when 400+ database services are monitored. Also, the overall interaction with QAN feels laggy when searching and applying filters across a big list of services/nodes. Our team is working on improving the overall user experience of the QAN app, which will be addressed in future releases of PMM.

Not a formula but a rule of thumb

Overall resource usage in PMM depends on the configuration and workload, and it may vary for different setups so it’s difficult to say, “for monitoring this number of DB services, you need a machine of that size.” This post is meant to show how the PMM server scales up and performs with the default setup and all database host nodes configured in default metrics mode (push).

We also plan a follow-up post on performance and scalability, in which we will highlight results for different dashboards and QAN, showcasing the improvements we have made over the last few PMM releases.

Tell us what you think about PMM!

We’re excited about all of the improvements we’ve made, but we’re not done yet! Have some thoughts about how we can improve PMM, or want to ask questions? Come talk to us in the Percona Forums, and let us know what you’re thinking!

PMM Forum

Dec
19
2022
--

PMM V2.33: Offline Metric Collection, Guided Alerting Tour, Security Fixes, and More!

latest release of Percona Monitoring and Management

We are excited to announce the latest release of Percona Monitoring and Management (PMM) – V2.33. This release includes several new features and improvements that make PMM even more effective and user-friendly. Some of the key highlights of PMM V2.33 include:

  • Offline metric collection during PMM server outages or loss of PMM client-server network connectivity
  • A guided tour of Alerting, which helps new users get up to speed quickly and start using the alerting features of PMM
  • Easily restore your MongoDB databases to a previous state
  • Updated Grafana to version 9.2.5 to fix critical security vulnerabilities
  • Tab completion for the pmm-admin CLI command, which makes it easier to use the command line interface to manage PMM

You can get started using PMM in minutes with our PMM Quickstart guide to check out the latest version of PMM V2.33. 

Client-side caching minimizes potential for metrics loss

This new feature ensures that the PMM Client saves the monitoring data locally when a connection to the PMM server is lost, preventing gaps in the data. When the connection is restored, the data is sent to the PMM server, allowing the monitoring of your systems to continue without any data loss.

Note:

The client node is currently limited to storing only 1 GB of offline data. So, if your instance is down for three days and generates more than 1 GB of data during that time, not all of the data will be retained.

One of the core principles of our open-source philosophy is transparency, and we are committed to sharing our roadmap openly and transparently with our users. We are happy to share the roadmap for the implementation of PMM high availability (HA) in three stages, which has been a highly requested feature by our users. 

PMM HA will be rolled out in three stages. Stage one, which is included in PMM 2.33.0, involves the implementation of a data loss prevention solution using VictoriaMetrics integration for short outages. This feature is now available in the latest release of PMM. Stages two and three of PMM HA will be rolled out later, adding more features and enhancements to provide a complete high availability solution for PMM. We are excited to bring this much-anticipated feature to our users, and we look forward to sharing more details in the coming months.

 

Stages of PMM HA Solutions Provided
Stage one (included in PMM 2.33.0): As an initial step toward preventing data loss, we have developed the following:

Offline metric collection for short outages

Stage two (will be rolled out in 2023): As part of PMM HA stage two, we plan to implement the following:

HA data sources

As part of stage two, we will let users use external data sources, thereby decreasing the dependency on the file system.

Stage three (will be rolled out in 2023): As part of PMM HA stage three, we plan to implement the following:

HA Clustered PMM Servers 

Clustered PMM will be the focus of stage three. Detailed information will be included in the upcoming release notes.

 

Please feel free to book a 1:1 meeting with us to share your thoughts, needs, and feedback about PMM HA.

Tip: To improve the availability of the PMM Server until the general availability of PMM HA, PMM administrators can deploy it on Kubernetes via the Helm chart. The Kubernetes cluster can help ensure that PMM is available and able to handle different types of failures, such as the failure of a node or the loss of network connectivity.
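As a hedged sketch of such a deployment (the repository URL and chart name are taken from Percona’s public Helm charts; verify them against the PMM documentation before use):

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update
helm install pmm percona/pmm

You can customize the deployment (service type, storage size, PMM image tag, and so on) with --set flags or a values file, as with any Helm chart.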

Critical security vulnerabilities fixed

In PMM 2.33.0, we have updated Grafana to version 9.2.5, which includes important security fixes. This upgrade addresses several critical and moderate vulnerabilities, including CVE-2022-39328, CVE-2022-39307, and CVE-2022-39306. For more details, please see the Grafana 9.2.5 release notes. We strongly recommend that all users upgrade to the latest version of PMM to ensure the security of their systems.

Guided tour on Alerting

In the 2.31.0 release of PMM, we added a new feature called Percona Alerting, which provides a streamlined alerting system. To help users get started with this new feature, we have added a short in-app tutorial that automatically pops up when you first open the Alerting page. This tutorial will guide you through the fundamentals of Percona Alerting, and help you explore the various features and options available. We hope this tutorial will make it easier for users to get started with Percona Alerting and take full advantage of its capabilities.

Restore MongoDB backups more easily

Building upon the significant improvements for MongoDB backup management introduced in the previous release, we are now simplifying the process for restoring physical MongoDB backups. Starting with this release, you can restore physical backups straight from the UI, and PMM will handle the process end-to-end. Prior to this, you would need to perform additional manual steps to restart your MongoDB database service so that your applications could make use of the restored data.

Improvements on the pmm-admin CLI command

pmm-admin is a command-line tool used to manage and configure PMM. It is part of the PMM Client toolset and can be used to perform various administrative tasks, such as managing inventory. We have added tab completion for the pmm-admin CLI command. This means that you no longer have to type the entire command when using pmm-admin: simply type the initial part of the command and press Tab, and the rest of the command will be completed for you, as illustrated below. This new feature makes the command-line interface easier to use and ensures that you can quickly and easily access all of the powerful features of PMM.
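An illustrative terminal session is shown below; the subcommands used (status, add mysql, add mongodb) are real pmm-admin commands, but the exact way completion is enabled depends on your shell and installation:

$ pmm-admin st<TAB>
$ pmm-admin status

$ pmm-admin add m<TAB>
mysql    mongodb        # candidate completions are listed when the prefix is ambiguous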

What’s next?

  • A Health dashboard for MySQL is on the way. Please share your suggestions in the comments or forum if you’d like to be part of the group shaping PMM. 
  • We have started to work on two new and significant projects: High Availability in PMM and advanced Role-Based Access Control (RBAC). We’d love to hear your needs, use cases, and suggestions. You can quickly book a short call with the product team to collaborate with us. 

Install PMM 2.33 now or upgrade your installation to V2.33 by checking our documentation for more information about upgrading.

Thanks to Community and Perconians

At Percona, we are grateful for our supportive community and dedicated team, who work together to shape the future of PMM. If you would like to be a part of this community, you can join us on our forums to request new features, share your feedback, and ask for support. We value the input of our community and welcome all members to participate in the ongoing development of PMM.

See PMM in action now!

Nov
15
2022
--

PMM v2.32: Backup Management for MongoDB in GA, New Home Dashboard, and More!

Percona Monitoring and Management v2.32

We are pleased to announce the general availability of Backup Management for MongoDB and other improvements in Percona Monitoring and Management (PMM) v2.32, released in November 2022. Details are in this blog and also in the PMM 2.32 Release Notes.

PMM is now on the scene with a new Home Dashboard where you can quickly and easily check your databases’ health at a glance and detect anomalies. While there’s no one-size-fits-all approach, we created and released the new Home Dashboard to make it more user-friendly, even for users new to PMM.

You can get started using PMM in minutes with PMM Demo to check out the latest version of PMM V2.32.

Let’s have a look at the highlights of PMM 2.32:

General availability of Backup Management for MongoDB

The Backup Management for MongoDB in PMM has reached General Availability and is no longer in Technical Preview.

Supported setups

MongoDB Backup Management now supports replica set setups for the following actions:

  • Create logical snapshot backups
  • Create logical Point In Time Recovery (PITR) backups
  • Create physical snapshot backups. This is available only with Percona Server for MongoDB
  • Restore logical snapshot backups.
  • Restore physical backups. This requires additional manual operations after the restore and is only available with Percona Server for MongoDB.
  • Restore logical PITR backups from S3

Current limitations

  • Restoring logical PITR backups only supports S3 storage type
  • Restoring physical backups requires manual post-restore actions
  • Restoring a MongoDB backup on a new cluster is not yet supported
  • Restoring physical backups for containerized MongoDB setups is not supported
  • Local storage for MySQL is not supported
  • Sharded cluster setups are not supported
  • Backups that are stored locally cannot be removed automatically
  • Retention is not supported for PITR artifacts

 

Quicker and easier database health overview with the new Home Dashboard

As mentioned and promised in previous release notes, we were investigating better approaches, methods, and user-friendly presentation of database health in the Home Dashboard, which is also the entry point to PMM. Finally, we are proud to release this finalized dashboard as the new Home Dashboard. Thank you for your feedback and collaboration during all iterations of the experimental versions.

Monitor hundreds of nodes without any performance issues

If you have hundreds of nodes being monitored by the same PMM instance, the original Home Dashboard, which repeated a set of panels for each node, could take a long time to load and could even leave the page unresponsive. With these performance issues in mind, we re-designed the Home Dashboard with new logic that shows what is wrong or OK with your databases, instead of showing every metric for each node.

PMM Home Dashboard_home_select multiple nodes

Anomaly detection

Many of you probably use dozens of tools for different purposes in your daily work, meetings, and projects. These tools should make your life easier, not more demanding. With monitoring tools, the sheer number of metrics can be daunting, so analyzing data and detecting anomalies that deviate from a database’s normal behavior should be easy and fast. Functional Anomaly Detection panels, as opposed to separate graphs for each node, are a much better way to visualize and recognize problems with your databases that may require action.

  • If you see an anomaly, you can click the node name on the panel to open the Node Overview Dashboard of the related node, where you can see all the node’s metrics you need to diagnose the problem.
  • All panels except Advisor Checks can be filtered by node and environment variables
  • Graphs in the Anomaly Detection row show the data for the top 20 nodes. e.g., CPU anomalies in the top 20

PMM Anomaly Detection panels

Command Center panels

The primary motivation behind the new Home Dashboard is simplicity. While working on the new design, it was always hard to balance presenting the metrics everyone needs and, at the same time, keeping it clean, functional, and simple. So we decided to use Command Center panels, which are collapsed by default. If you see an anomaly such as Memory Usage above 90%, how do you know when it happened or started? The time-series graphs for the Top 20 in the Command Center panels help you see when the anomalies occurred: in the last hour or over the last week?

PMM Command Center Panels

PMM Command Center Panels on Home Dashboard

Enhanced main menu

We returned with two improvements we previously promised, both aimed at easier access to dashboards from the Main Menu. After the earlier changes, with every possible monitored service type represented on the Main Menu as an icon, the menu became crowded and extended with icons for service types you might not even use. In the latest version, you’ll only see the icons of currently monitored services on the Main Menu. For example, if you’re monitoring MongoDB, you will see the MongoDB Dashboard’s icon on the main menu, as opposed to previous versions, which showed all database types PMM is capable of monitoring, whether you had them in your system or not. When and if you start to monitor other services, like MySQL, they will be automatically added to the Main Menu.

Another improvement on the Main Menu is the visibility of all other dashboards. PMM provides multiple dashboards with different levels of information for each service, but only some of them appear in the main menu; the rest are available in folders. Users can miss the dashboards that are not presented in the Main Menu. Also, custom dashboards created by other users in your organization can remain invisible to you until you happen upon them in the folders. So, we added Other Dashboards links to the sub-menu of each service, so that you can easily click and see all dashboards in the service’s folder.

Quick access to other dashboards from the menu

What’s next?

  • We’ll improve the Vacuum Dashboard with more metrics. If you’d like to enhance it with us, you can share your feedback in the comments.
  • A health dashboard for MySQL is on the way. Please share your suggestions in the comments or forum if you’d like to be part of the group shaping PMM. 
  • We have started to work on two new and significant projects: High Availability in PMM and advanced Role-Based Access Control (RBAC). We’d love to hear your needs, use cases, and suggestions. You can quickly book a short call with the product team to collaborate with us. 
  • For Backup Management, we are planning to continue to iterate on the current limitations listed above and make the restore processes as seamless as possible for all database types.

Install PMM 2.32 now or upgrade your installation to V2.32 by checking our documentation for more information about upgrading.

Learn more about Percona Monitoring and Management 2.32

Thanks to Community and Perconians

We love our community and our team at Percona, who together shape the future of PMM and help us with all these changes.

You can also join us on our community forums to request new features, share your feedback, and ask for support.

Thank you for your collaboration on the new Home Dashboards:

Cihan Tunalı   @SmartMessage 

Tyson McPherson @Parts Authority

Paul Migdalen @IntelyCare

Sep
14
2022
--

Percona Monitoring and Management Home Dashboard: What’s New?

Percona Monitoring and Management Home Dashboard

The all-new Percona Monitoring and Management (PMM) Home dashboard is the answer to some of the main concerns of our users:

  • A clear entry point (where do I start?)
  • Context (is everything okay?)

How did we achieve this? By redesigning the Home dashboard so we could tackle the known performance issues that appeared when the number of monitored nodes increased, and also improve the overall experience.

Now, where is that new amazing Home dashboard? When you install or upgrade PMM to version 2.30.0, you get access to the new Home dashboard, but not by default (yet). Since we would love to have as much feedback as possible from our user base, we haven’t made the new Home dashboard the default one. So where is it?

Meet: The experimental dashboards directory

There are several ways to make it to the experimental directory, but the easiest way is to use the “Search Dashboards” option available in the left bar.

PMM Dashboard

And scroll to find the “Experimental” directory:

experimental PMM home dashboard

Or, just directly write “new-home” in the textbox and click on the link:

And that’s it! Once you’re there this is what you should be looking for:

The above screenshot is from the PMM Demo hosted by Percona, and you can access it by clicking this link.

Navigating the new Home dashboard

The idea behind this is simplicity. You as a user don’t need to be bombarded with tons of information that, in the end, is just noise. So, what are we looking at here? 

Outliers, things that are odd, wrong, and shouldn’t be.

And outliers where? At the server level. Since PMM is a polyglot monitoring tool that spans MySQL, PostgreSQL, and MongoDB, and also monitors a couple of reverse proxy tools (HAProxy and ProxySQL), we gotta make sure that the Home dashboard works for everyone – and what does everyone have in common? Server-level metrics!

Anomaly detection PMM
The Anomaly detection panel is pretty simple: the left part reports CPU/Disk Queue anomalies, and the right part shows wrong behavior, where wrong means metrics beyond a high threshold for a considerable amount of time:

  • CPU above 90% for more than 15 minutes
  • Disk Queue above 20 requests over a minute

Those values are customizable, but we consider them good enough to catch real issues while reducing the probability of false positives.

Now, the anomaly detection part. As impressive as it sounds, it came out very simple, but for the details let’s talk about the next panel.

Command Center

PMM Command Center
When designing the new Home dashboard, one of the main issues to tackle was the lack of context on the metrics provided: we can see that a CPU is at 50% but how do we know if that is expected or not? Is it normal behavior or is it an anomaly?

One of the options considered was to calculate the z-score. The standard score (or z-score) measures how many standard deviations a value lies above or below the mean (z = (x − mean) / stddev), and it is a pretty cool way to find outliers. You can read more about it on Wikipedia: https://en.wikipedia.org/wiki/Standard_score.

However, out of simplicity and with scalability as a key condition, we ended up implementing a much more basic option: seasonality, which is pretty much comparing the current value of a metric with the value of the same metric some fixed amount of time ago. In our case, we chose that amount of time to be a week. Why a week? Because it covers a big chunk of cases, for example:

Imagine you are running a restaurant reservation app and your peak traffic is on the weekends. What would you say about a CPU usage of 50%? Well, it depends on the day of the week and the hour. A Thursday night? Expected and, in fact, probably a little bit low. But what about a Monday morning? Totally wrong, an anomaly! It should be easy to spot.

We are calculating anomalies for:

  • CPU usage
  • Disk queue
  • Disk Write latency
  • Disk Read latency
  • Memory usage

In the Command Center, it is easy to spot the trends by having graphs for the last hour next to graphs for the same hour a week ago, as sketched below.
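As a hedged illustration of the same week-over-week idea, you can run a raw query against the VictoriaMetrics instance bundled with PMM; the /prometheus path, the node_cpu_seconds_total metric, the node_name label, and the admin credentials are assumptions based on a default PMM 2 setup, so adjust them to your environment:

curl -sk -u admin:admin "https://pmm-server/prometheus/api/v1/query" \
  --data-urlencode 'query=
      (1 - avg by (node_name) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
    / (1 - avg by (node_name) (rate(node_cpu_seconds_total{mode="idle"}[5m] offset 1w)))'

# A result far above 1 means CPU usage is much higher than at the same time last week.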

Making it the default Home dashboard

Do you like it enough to make it the new Home dashboard for your PMM? Here’s how to do it:

default Home dashboard PMM
At the top left, right next to the dashboard name, mark it as a FAVORITE.


Now, go to the “Preferences” section by clicking on the link available at the left bar at the bottom.


And finally, select the New-Home Dashboard as the default option. That’s it!

Final considerations

Is this good enough for debugging? No, but the idea of the Home dashboard is to work as a central health overview point where you check in on the status of your database fleet, identify any suspicious situations, and then go deeper with the summary dashboards for MySQL, Percona XtraDB Cluster, PostgreSQL, or MongoDB.

We would love to hear from you on this new Home dashboard, so feel free to hit the comment section or go directly to the forum (https://forums.percona.com/t/new-home-dashboard-feedback/16935) to leave us any feedback you have!

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Sep
06
2022
--

Percona Private DBaaS API in Action

Percona Private DBaaS API

Percona Monitoring and Management (PMM) comes with Database as a Service (DBaaS) functionality that allows you to deploy highly-available databases through a simple user interface and API. PMM DBaaS is sort of unique for various reasons:

  • It is fully open source and free
  • It runs on your premises – your data center or public cloud account
  • The databases are deployed on Kubernetes and you have full control over your data

PMM features a robust API, but while I have seen demos on the internet demonstrating the UI, I have never seen anything about the API. The PMM API is great for building tools that can automate workflows and operations at scale. In this blog post, I’m going to impersonate a developer playing with PMM’s API to deploy and manage databases. I also created an experimental CLI tool in Python to showcase a possible integration.

Percona Private DBaaS API in Action

Preparation

At the end of this step, you should have the following:

  • Percona Monitoring and Management up and running
  • Kubernetes cluster 
  • PMM API token generated

First off, you need a PMM server installed and reachable from your environment. See the various installation ways in our documentation.

For DBaaS to work you would also need a Kubernetes cluster. You can use minikube, or leverage the recently released free Kubernetes capability (in this case it will take you ~2 minutes to set everything up).

To automate various workflows, you will need programmatic access to Percona Monitoring and Management. The recommended way is API tokens. To generate it please follow the steps described here. Please keep in mind that for now, you need admin-level privileges to use DBaaS.

Using API

I will use our API documentation for all the experiments here. DBaaS has a dedicated section. In each step, I will provide an example with the cURL command, but keep in mind that our documentation has examples for cURL, Python, Golang, and many more. My PMM server address is 34.171.88.159 and my token is eyJrIjoiNmJ4VENyb0p0NWg1ODlONXRLT1FwN1N6YkU2SW5XMmMiLCJuIjoiYWRtaW5rZXkiLCJpZCI6MX0=. Just replace these with yours.

In the demo below, you can see me playing with PMM API through percona-dbaas-cli tool that I created to demonstrate possible integration for your teams. The goal here is to deploy the database with API and connect to it.

Below are the basic steps describing the path from setting up PMM to deploying your first database.

Connection check

To quickly check if everything is configured correctly, let’s try to get the PMM version.

API endpoint: /v1/version

CLI tool:

percona-dbaas-cli pmm version

 

cURL:

curl -k --request GET \
     --url https://34.171.88.159/v1/version \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer eyJrIjoiNmJ4VENyb0p0NWg1ODlONXRLT1FwN1N6YkU2SW5XMmMiLCJuIjoiYWRtaW5rZXkiLCJpZCI6MX0='

It should return information about the PMM server. If you get an error, there is no way we can proceed.

In the CLI tool, if you have not configured access to the PMM, it will ask you to do it first.

Enable DBaaS in PMM

Database as a Service in PMM is in the technical preview stage at the time of writing this blog post. So we are going to enable it if you have not enabled it during the installation.

API endpoint: /v1/Settings/Change

CLI tool:

percona-dbaas-cli dbaas enable

 

cURL:

curl -k --request POST \
     --url https://34.171.88.159/v1/Settings/Change \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer eyJrIjoiNmJ4VENyb0p0NWg1ODlONXRLT1FwN1N6YkU2SW5XMmMiLCJuIjoiYWRtaW5rZXkiLCJpZCI6MX0=' \
     --data '{"enable_dbaas": true}'

Now you should see the DBaaS icon in your PMM user interface and we can proceed with further steps. 

Register Kubernetes cluster

At this iteration, PMM DBaaS uses Percona Kubernetes Operators to run databases. It is required to register the Kubernetes cluster in PMM by submitting its kubeconfig.

API endpoint: /v1/management/DBaaS/Kubernetes/Register

CLI tool:

percona-dbaas-cli dbaas kubernetes-register

 

Registering k8s with cURL requires some magic. First, you will need to put kubeconfig into a variable and it should be all in one line. We have an example in our documentation:

KUBECONFIG=$(kubectl config view --flatten --minify | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/\\n/g')

curl -k --request POST \
      --url "http://34.171.88.159/v1/management/DBaaS/Kubernetes/Register" \
     --header "accept: application/json" \
     --header "authorization: Bearer eyJrIjoiNmJ4VENyb0p0NWg1ODlONXRLT1FwN1N6YkU2SW5XMmMiLCJuIjoiYWRtaW5rZXkiLCJpZCI6MX0=" \
     --data "{ \"kubernetes_cluster_name\": \"my-k8s\", \"kube_auth\": { \"kubeconfig\": \"${KUBECONFIG}\" }}"

It is much more elegant in Python or other languages. We will think about how to simplify this in the following iterations.
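If you would rather stay in the shell, a hedged alternative is to let jq build the JSON payload, which avoids the sed gymnastics (this assumes jq and kubectl are installed; replace the address and token with yours):

kubectl config view --flatten --minify \
  | jq -Rs '{kubernetes_cluster_name: "my-k8s", kube_auth: {kubeconfig: .}}' \
  | curl -k --request POST \
         --url "https://34.171.88.159/v1/management/DBaaS/Kubernetes/Register" \
         --header "accept: application/json" \
         --header "authorization: Bearer eyJrIjoiNmJ4VENyb0p0NWg1ODlONXRLT1FwN1N6YkU2SW5XMmMiLCJuIjoiYWRtaW5rZXkiLCJpZCI6MX0=" \
         --header "Content-Type: application/json" \
         --data-binary @-

Here jq -Rs reads the whole kubeconfig as a single raw string and escapes it correctly inside the JSON object.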

Once the Kubernetes cluster is registered, PMM does the following:

  1. Deploys Percona Operators for MySQL and for MongoDB
  2. Deploys the VictoriaMetrics Operator, so that we can get monitoring data from Kubernetes into PMM

Get the list of Kubernetes clusters

This is mostly to check whether the cluster was added successfully and whether the Operators were installed.

API endpoint: /v1/management/DBaaS/Kubernetes/List

CLI tool:

percona-dbaas-cli dbaas kubernetes-list

 

cURL:

curl -k --request POST \
     --url https://34.171.88.159/v1/management/DBaaS/Kubernetes/List \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer eyJrIjoiNmJ4VENyb0p0NWg1ODlONXRLT1FwN1N6YkU2SW5XMmMiLCJuIjoiYWRtaW5rZXkiLCJpZCI6MX0='

In the CLI tool, I decided to have a nicely formatted list of the clusters, as it is possible to have many registered in a single PMM server.

Create the database

Right now, our DBaaS solution supports MySQL (based on Percona XtraDB Cluster) and MongoDB; thus, there are two endpoints to create databases:

API endpoints: 

CLI tool:

percona-dbaas-cli dbaas databases-create

 

cURL: 

curl -k --request POST \
     --url https://34.171.88.159/v1/management/DBaaS/PSMDBCluster/Create \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer eyJrIjoiNmJ4VENyb0p0NWg1ODlONXRLT1FwN1N6YkU2SW5XMmMiLCJuIjoiYWRtaW5rZXkiLCJpZCI6MX0=' \
     --header 'Content-Type: application/json' \
     --data '{"kubernetes_cluster_name": "my-k8s", "expose": true}'

In the experimental CLI tool, I decided to go with a single command, where the user can specify the engine with the --engine flag.

Notice that I also set the expose flag to true, which instructs the Operator to create a LoadBalancer Service for my cluster. It is going to be publicly exposed to the internet, which is not a good idea for production.

There are various other parameters that you can use to tune your database when interacting with the API.

For now, there is a certain gap between the features that the Operators provide and the API. We are heading towards more flexibility; stay tuned for future releases.

Get credentials and connect

It will take some time to provision the database – in the background, Persistent Volume Claims are provisioned, the cluster is formed, and networking is set up. You can get the list of the databases and their statuses from the /v1/management/DBaaS/DBClusters/List endpoint. 
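A hedged cURL sketch for that endpoint, following the same pattern as the earlier calls (the kubernetes_cluster_name field in the request body is an assumption; check the API reference for the exact schema):

curl -k --request POST \
     --url https://34.171.88.159/v1/management/DBaaS/DBClusters/List \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer eyJrIjoiNmJ4VENyb0p0NWg1ODlONXRLT1FwN1N6YkU2SW5XMmMiLCJuIjoiYWRtaW5rZXkiLCJpZCI6MX0=' \
     --header 'Content-Type: application/json' \
     --data '{"kubernetes_cluster_name": "my-k8s"}'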

We finally have the cluster up and running. It is time to get the credentials:

API endpoints:

CLI tool:

percona-dbaas-cli dbaas get-credentials

 

cURL:

curl --request POST \
     --url https://34.171.88.159/v1/management/DBaaS/PXCClusters/GetCredentials \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer eyJrIjoiNmJ4VENyb0p0NWg1ODlONXRLT1FwN1N6YkU2SW5XMmMiLCJuIjoiYWRtaW5rZXkiLCJpZCI6MX0=' \
     --header 'Content-Type: application/json' \
     --data '{"kubernetes_cluster_name": "my-k8s","name": "my-mysql-0"}'

This will return the endpoint to connect to, along with the user and password. Use your favorite CLI tool or ODBC to connect to the database.

Conclusion

Automated database provisioning and management with a Database as a Service solution is becoming a minimal requirement for agile teams. Percona is committed to helping developers and operations teams run databases anywhere. You can deploy the fully open source Percona Monitoring and Management in the cloud or on-premises and provide a self-service experience to your teams, not only through the UI but through the API as well.

Right now PMM DBaaS is in technical preview and we encourage you to try it out. Feel free to tell us about your experience in our community forum.

Sep
02
2022
--

PMM v2.30: New Dashboards, Improved DBaaS Experience, Free K8s Cluster for Testing, and More!

Percona Monitoring and Management v2.30

In this v2.30 release, we have focused on improving Percona Monitoring and Management (PMM) usability, refining the dashboards (for effortless root cause analysis), Database as a Service (DBaaS) functionality, and seamless onboarding to align with your needs. Your valuable feedback, along with your contributions, is important to us. Please keep contributing to take PMM to the next level and make the world more open source.

Note that you can get started with PMM in minutes with the PMM Demo to check out the latest version of PMM V2.30.

Here are some of the highlights in PMM V2.30:

New experimental dashboards

We created new experimental dashboards based on your feedback and requests. The new Home Dashboard was designed to provide actionable insights while monitoring numerous nodes or services without excessive loading time. Our objective with the all-new MongoDB dashboards is to provide collection- and document-based metrics as requested by PMM users.

Important: These experimental dashboards are subject to change. It is recommended to use these dashboards for testing purposes only.

New Home Dashboard (experimental)

The redesigned and improved Home Dashboard was introduced as a more user-friendly and insightful experimental dashboard in PMM version 2.30.0. The driver behind the new design was your requests and feedback about the long loading times of the existing Home Dashboard when monitoring a large number of nodes or services. With rows repeated for each node, the dashboard could take a considerable time to load, which some users reported as a performance problem. So we reworked the Home Dashboard to provide a performance boost when monitoring at scale; the new experimental dashboard ensures fast retrieval of data. For more information, read more on the experimental Home Dashboard metrics.

New Home Dashboard (experimental)

Please do not hesitate to give feedback by posting your comments here.

Tip: If you’d like to use the experimental Home Dashboard as your default dashboard, you can follow these steps:
  • Open the new Home Dashboard and star it
  • Open the “Configuration” page by clicking it from the User Profile on the main menu
  • Select the new dashboard from the Home Dashboard dropdown under Preferences. 

New MongoDB Dashboard (experimental)

After dozens of calls with PMM users who monitor MongoDB metrics in PMM, we collected all the feedback and put it together in new experimental dashboards. (Special thanks to Kimberly Wilkins and Michael Patrick from Percona.) We have introduced the following experimental dashboards to collect more metrics and data for MongoDB and present them in a more structured way:

1. Experimental MongoDB collection overview

This dashboard contains panels for data about the hottest collections in the MongoDB database. For more information, see the documentation.

Experimental MongoDB collection overview

2. Experimental MongoDB collection details

This experimental dashboard provides detailed information about the top collections by document count, size, and document read for MongoDB databases. For more information, see the documentation.

Experimental MongoDB collection details

3. Experimental MongoDB oplog details

This real-time dashboard contains oplog details such as Recovery Window, Processing Time, Buffer Capacity, and Oplog Operations. For more information, see the documentation.

oplog details dashboard

Tip: If you’d like to move MongoDB experimental dashboards to the MongoDB folder or other folders that you internally use to gather all MongoDB dashboards, you can follow these steps:
Note: You should have at least an Editor role to do this change.
  • Open the dashboard that you want to move to another folder
  • Click the gear icon to open Dashboard Settings
  • Select the folder you want to move the dashboard to from the dropdown under Folder in the General section
  • Click the “Save Dashboard” button on the left to save the change

Improved user experience for DBaaS

Usability is one of the themes on our roadmap, and it is now more important than ever for us to deliver the best PMM experience in minutes. So we applied a couple of improvements to the configuration and deployment of DBaaS database clusters:

  • In the 2.30.0 release of PMM, we simplified the process of registering the K8s cluster to suggest defaults and populated all possible defaults on the database creation screen.
  • If you have configured the DBaaS feature with K8s registering in it, the creation of the database will be a “one click of the button” action.
  • For more information, see the documentation.
  • We have simplified the DBaaS and Percona Platform connection configuration by suggesting and automatically setting the public address of the PMM server if it’s not set up. This happens when connecting to Percona Platform, adding a K8s cluster, or adding a database cluster. 
  • If you are not a K8s user for now but want to test the DBaaS functionality of PMM, Percona is offering a cluster for free via the Percona Platform portal. Read more in the Private DBaaS with Free Kubernetes Cluster blog post.

Operators support

PMM now supports Percona Operator for MySQL based on Percona XtraDB Cluster.

What’s next?

We’re working on a new main menu on top of the Grafana 9.1 menu structure and on Vacuum monitoring with a new experimental dashboard. If you have any feedback or suggestions, please get in touch with us.

Also, another topic on our next release agenda is Podman GA. Please share your experience/feedback with Podman. 

Run PMM Server with Podman now!

We know that you’re waiting for news about Alerting in PMM, and we’ll get back to you with good news in v2.31 by combining our Integrated Alerting with Grafana Alerting to make it easier to set up and manage. Please follow our emails and release notes to stay informed about upcoming releases.

Thanks to the Community and Perconians

We love our Community and team at Percona who help us to shape PMM and make it better! 

Thank you for your collaboration on the new Home Dashboard:

  • Daniel Burgos
  • Anton Bystrov

Thank you for your collaboration on MongoDB dashboards:

  • Anton Bystrov
  • Kimberly Wilkins
  • Michael Patrick

We appreciate your feedback.

Learn more about Percona Monitoring and Management

Aug
23
2022
--

Installing PMM Server from RHEL Repositories

Installing PMM Server from RHEL Repositories

We currently provide multiple ways to install Percona Monitoring and Management (PMM) Server, with the primary way being to use Docker:

Install Percona Monitoring and Management

or Podman:

Podman – Percona Monitoring and Management

We implemented it this way to simplify deployments, as the PMM server uses multiple components like Nginx, Grafana, PostgreSQL, ClickHouse, VictoriaMetrics, etc. So we want to avoid the pain of configuring each component individually, which is labor intensive and error-prone.

For this reason, we also provide bundles as a virtual appliance:

Virtual Appliance – Percona Monitoring and Management

or as Amazon AMI:

AWS Marketplace – Percona Monitoring and Management

Recently, for even simpler and effort-free deployments, we partnered with HostedPMM. They will install and manage the PMM server for you with a 1-Click experience.

However, we still see some interest in having a PMM server installed from repos, using, for example, rpm packages. For this reason, we have crafted Ansible scripts that you can use on a RedHat 7-compatible system.

The scripts are located here:

Percona-Lab/install-repo-pmm-server (github.com)

Please note these scripts are ONLY for EXPERIMENTATION and quick evaluation of PMM-server, and this way IS NOT SUPPORTED by Percona.

The commands to install PMM Server:

git clone https://github.com/Percona-Lab/install-repo-pmm-server
cd install-repo-pmm-server/ansible
ansible-playbook -v -i 'localhost,' -c local pmm2-docker/main.yml
ansible-playbook -v -i 'localhost,' -c local /usr/share/pmm-update/ansible/playbook/tasks/update.yml

You should run these commands on an empty RedHat 7 system without any database or web server packages already installed.

Let us know if you have interest in this way of deploying PMM server!

Jul
13
2022
--

How Percona Monitoring and Management Helps You Find Out Why Your MySQL Server Is Stalling

Percona Monitoring and Management MySQL Server Is Stalling

In this blog, I will demonstrate how to use Percona Monitoring and Management (PMM) to find out the reason why the MySQL server is stalling. I will use only one typical situation for the MySQL server stall in this example, but the same dashboards, graphs, and principles will help you in all other cases.

Nobody wants it, but database servers may stop handling connections at some point. As a result, the application will slow down and then stop responding.

It is always better to know about the stall from a monitoring instrument rather than from your own customers.

PMM is a great help in this case. If you look at its graphs and notice that many of them started showing unusual behavior, you need to react. In the case of a stall, you will see that some activity either dropped to zero or jumped to a high number, and in either case it stays there without changing.

Let’s review the dashboard “MySQL Instance Summary” and its graph “MySQL Client Thread Activity” during normal operation:

PMM MySQL Instance Summary

As you see, the number of active threads fluctuates, and this is normal for any healthy application: even if all connections request data, MySQL puts some threads into an idle state while they wait for the storage engine to prepare their data, or while the client application processes the retrieved data.

The next screenshot was taken when the server was stalling:

Percona monitoring dashboard

In this picture, you see that the number of active threads is near the maximum. At the same time, the number of “MySQL Temporary Objects” dropped to zero. This by itself shows that something unusual happened, but to understand the picture better, let’s examine the storage engine graphs.

I, like most MySQL users, used InnoDB for this example. Therefore the next step for figuring out what is going on would be to examine graphs in the “MySQL InnoDB Details” dashboard.

MySQL InnoDB Details

First, we see that the number of rows InnoDB reads per second went down to zero, as did the number of rows written. This means that something is preventing InnoDB from performing its operations.

MySQL InnoDB Details PMM

More importantly, we see that all I/O operations were stopped. This is unusual even on a server that does not handle any user connection: InnoDB always performs background operations and is never completely idle.

Percona InnoDB

You may see this in the “InnoDB Logging Performance” graph: InnoDB still uses log files but only for background operations.

InnoDB Logging Performance

InnoDB Buffer Pool activity is also stopped. What is interesting here is that the number of dirty pages went down to zero. This is visible on the “InnoDB Buffer Pool Data” graph: dirty pages are colored in yellow. This actually shows that InnoDB was able to flush all dirty pages from the buffer pool when InnoDB stopped processing user queries.

At this point we can make the first conclusion that our stall was caused by some external lock, preventing MySQL and InnoDB from handling user requests.

MySQL and InnoDB

The “Transaction History” graph confirms this guess: there are no new transactions and InnoDB was able to flush all transactions that were waiting in the queue before the stall happened.

We can conclude that we are NOT experiencing hardware issues.

MySQL and InnoDB problems

This group shows why we experienced the stall. As you can see in the “InnoDB Row Lock Wait Time” graph, the wait time rose to its maximum value around 14:02 and then dropped to zero. There are no row lock waits registered during the stall itself.

This means that at some point all InnoDB transactions were waiting for a row lock and then failed with a timeout. Still, they had to be waiting for something. Since there are no hardware issues and InnoDB functions healthily in the background, this means that all threads were waiting for a global MDL lock, created by the server.

If we have Query Analytics (QAN) enabled we can find such a command easily.

Query Analytics (QAN)

For the selected time frame, we can see that many queries were running until a certain time, when a query with id 2 was issued; the other queries then stopped running and restarted a few minutes later. The query with id 2 is FLUSH TABLES WITH READ LOCK, which prevents any write activity once the tables are flushed.

This is the command that caused a full server stall.
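If you want to confirm the lock holder directly on the server, here is a hedged sketch using the performance_schema metadata lock instrumentation (the wait/lock/metadata/sql/mdl instrument must be enabled; it is on by default in MySQL 8.0):

mysql -e "
  SELECT ml.object_type, ml.lock_type, ml.lock_status,
         t.processlist_id, t.processlist_user, t.processlist_info
  FROM performance_schema.metadata_locks ml
  JOIN performance_schema.threads t ON t.thread_id = ml.owner_thread_id
  WHERE ml.object_type = 'GLOBAL';"

The GRANTED row points to the connection holding the global read lock taken by FLUSH TABLES WITH READ LOCK, and killing that connection releases the stall.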

Once we know the reason for the stall we can perform actions to prevent similar issues in the future.

Conclusion

PMM is a great tool that helps you not only identify whether your database server is stalling but also figure out the reason for the stall. I used only one scenario in this blog, but you may use the same dashboards and graphs to find other reasons for a stall, such as a DoS attack, a hardware failure, a high number of I/O operations caused by poorly optimized queries, and many more.

Jul
07
2022
--

Automate the SSL Certificate Lifecycle of your Percona Monitoring and Management Server

SSL Certificate Lifecycle of your Percona Monitoring and Management

We highly value security here at Percona, and in this blog post, we will show how to protect your Percona Monitoring and Management (PMM) Server with an SSL certificate and automate its lifecycle by leveraging a proxy server.

Introduction

As you may know, PMM Server provides a self-signed SSL certificate out-of-the-box to encrypt traffic between the client (web or CLI) and the server. While some people choose to use this certificate in non-critical environments, oftentimes protected by a private network, it is definitely not the best security practice for production environments.

I have to mention that a self-signed certificate still achieves the goal of encrypting the connection. Here are a few things, or problems, you should know about when it comes to self-signed certificates in general:

  • they cannot be verified by a trusted Certificate Authority (CA)
  • they cannot be revoked as a result of a security incident (if issued by a trusted CA, they can be revoked)
  • when used on public websites, they may negatively impact your brand or personal reputation

This is why most modern browsers will show pretty unappealing security warnings when they detect a self-signed certificate. Most of them will prevent users from accidentally opening websites that do not feature a secure connection (have you heard of the thisisunsafe hack for Chrome?). The browser vendors are obviously attempting to raise security awareness amongst the broad internet community.

We highly encourage our users to have their SSL certificates issued by a trusted Certificate Authority.

Some people find it overwhelming to keep track of expiry dates and remember to renew the certificates before they expire. Until several years ago, SSL certificates had to be paid for and were quite expensive. Projects like Let’s Encrypt — a non-profit Certificate Authority — have been devised to popularize web security by offering SSL certificates absolutely free of charge. However, their validity is only limited to three months, which is quite short.

We will explore the two most popular reverse proxy tools that allow you to leverage an SSL certificate for better security of your PMM instance. More importantly, we will cover how to automate the certificate renewal using those tools. While PMM Server is distributed in three flavors — docker, AMI, and OVF — we will focus on Docker, as it is our most popular distribution.

All scripts in this post assume you have a basic familiarity and experience of working with docker and docker compose.

Reverse proxies

While the choice of open source tools is quite abundant these days, we will talk about two of them that I consider to be the most popular: Nginx and Traefik.

In order to try one of the solutions proposed here, you’ll need the following:

Let’s take a look at our network diagram. It shows that the proxy server is standing between PMM Server and its clients. This means that the proxy, not PMM Server, takes care of terminating SSL and encrypting traffic.

PMM Server and Client Network Diagram

 

Nginx

Nginx came to the market in 2004, which is quite a solid age for any software product. It has gained tremendous adoption since then and to this date, it powers many websites as a reverse proxy. Let’s see what it takes to use Nginx for SSL certificate management.

First, let me remind you where PMM Server stores the self-signed certificates:

% docker exec -t pmm-server sh -c "ls -l /srv/nginx/*.{crt,key,pem}"
total 24
-rw-r--r-- 1 root root 6016 Jun 10 11:24 ca-certs.pem
-rw-r--r-- 1 root root  977 Jun 10 11:24 certificate.crt
-rw-r--r-- 1 root root 1704 Jun 10 11:24 certificate.key
-rw-r--r-- 1 root root  424 Jun 10 11:24 dhparam.pem

We could certainly pass our own certificates by mounting a host directory containing custom certificates or, alternatively, by copying the certificates to the container. However, that would require us to issue the certificates first. What we want to achieve is to get the certificates issued automatically. I must say that even though nginx does not offer such functionality out of the box, there are a few open source projects that effectively close that gap.

One such project is nginx-proxy. It seems to be quite mature and stable, boasting of 16K GitHub stars. It has a peer project — acme-companion — which takes care of the certificate lifecycle. We will use them both, so ultimately we will end up running three separate containers:

  1. nginx-proxy – the reverse proxy
  2. acme-companion – certificate management component
  3. pmm-server – the proxied container

Prior to launching the containers, the following steps need to be completed:

  1. Choose a root directory for your project.
  2. Create a directory ./nginx in your project root (docker will take care of creating the nested directories). This will be your project root folder.
  3. Create a file ./nginx/proxy.conf. This file is mostly needed to override the default value of client_max_body_size, which happens to be too low, so PMM can properly handle large payloads. The file’s contents should be as follows:
# proxy.conf

server_tokens off;
client_max_body_size 10m;

# HTTP 1.1 support
proxy_http_version 1.1;
proxy_buffering off;
proxy_set_header Host $http_host;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $proxy_connection;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $proxy_x_forwarded_proto;
proxy_set_header X-Forwarded-Ssl $proxy_x_forwarded_ssl;
proxy_set_header X-Forwarded-Port $proxy_x_forwarded_port;

# Mitigate httpoxy attack
proxy_set_header Proxy "";


Note that most recent versions of docker client come with docker compose sub-command, while earlier versions may require you to additionally install a python-based docker-compose tool.

Here is what docker-compose.yml looks like. You can go ahead and create it while making sure to replace my-domain.org with your own public domain name.

# docker-compose.yml

version: '3'

services:

  nginx:
    image: nginxproxy/nginx-proxy
    container_name: nginx
    environment:
      - DEFAULT_HOST=pmm.my-domain.org
    ports:
      - '80:80'
      - '443:443'
    restart: always
    volumes:
      - ./nginx/proxy.conf:/etc/nginx/proxy.conf:ro
      - ./nginx/vhost.d:/etc/nginx/vhost.d:ro
      - ./nginx/certs:/etc/nginx/certs:ro
      - ./nginx/html:/usr/share/nginx/html
      - ./nginx/dhparam:/etc/nginx/dhparam
      - /var/run/docker.sock:/tmp/docker.sock:ro

  letsencrypt:
    image: nginxproxy/acme-companion
    container_name: acme-companion
    restart: always
    environment:
      - NGINX_PROXY_CONTAINER=nginx
      - DEFAULT_EMAIL=admin@my-domain.org
    depends_on:
      - nginx
    volumes:
      - ./nginx/vhost.d:/etc/nginx/vhost.d
      - ./nginx/certs:/etc/nginx/certs
      - ./nginx/html:/usr/share/nginx/html
      - /var/run/docker.sock:/var/run/docker.sock:ro

  pmm-server:
    container_name: pmm-server
    image: percona/pmm-server:2
    restart: unless-stopped
    environment:
      - VIRTUAL_HOST=pmm.my-domain.org
      - LETSENCRYPT_HOST=pmm.my-domain.org
      - VIRTUAL_PORT=80
      - DISABLE_TELEMETRY=0
      - ENABLE_DBAAS=1
      - ENABLE_BACKUP_MANAGEMENT=1
      - ENABLE_ALERTING=1
      - GF_ANALYTICS_CHECK_FOR_UPDATES=false
      - GF_AUTH_LOGIN_COOKIE_NAME=pmm_session
      - GF_SECURITY_DISABLE_GRAVATAR=true
    depends_on:
      - nginx
    volumes:
      - pmm-data:/srv
    expose:
      - '80'

volumes:
  pmm-data:
    name: pmm-data
    external: false

Now launch your site by running docker compose up -d. That’s it! If everything works fine, you should see your PMM Server’s login page by visiting https://pmm.my-domain.org. If you type docker ps -a in the terminal, you will see all three containers listed:

% docker ps -a
CONTAINER ID   IMAGE                       COMMAND                  CREATED       STATUS                 PORTS                                                                      NAMES
07979480350c   percona/pmm-server:2        "/opt/entrypoint.sh"     1 hours ago   Up 1 hours (healthy)   80/tcp, 443/tcp                                                            pmm-server
b7ed4cbf1064   nginxproxy/acme-companion   "/bin/bash /app/entr…"   1 hours ago   Up 1 hours                                                                                        acme-companion
d35da441c103   nginxproxy/nginx-proxy      "/app/docker-entrypo…"   1 hours ago   Up 1 hours             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   nginx

Should anything go wrong, you can troubleshoot by checking the container logs; for example, docker logs pmm-server or docker logs acme-companion may be useful to explore.
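
When debugging certificate issuance specifically, following the companion's log in real time is often the quickest way to see what the ACME client is doing; for example:

# Follow the last 100 lines of the acme-companion log while a certificate is requested
docker logs -f --tail 100 acme-companion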

Restrict access to PMM

Sometimes you need to restrict access to known IPs or networks. Thankfully, nginx-proxy makes that easy to configure. All you need to do is:

  • create a text file with the name of your domain — for example, pmm.my-domain.org — and save it to ./nginx/vhost.d/
    # Contents of pmm.my-domain.org
    
    # Allow traffic from known IP addresses or networks
    allow 127.0.0.0/8;  # internal network
    allow 100.1.1.0/20; # another network   
    allow 219.24.87.93; # just one IP address
    
    # Reject traffic from all other IPs and networks
    deny all;
  • finally, restart the container with docker restart pmm-server.

From now on, if someone tries to access your PMM Server from an IP address that is not in the allow list, they will get an HTTP 403 Forbidden error.
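
To verify the restriction, you can probe the endpoint from a host whose IP is not in the allow list; a minimal check, assuming the domain from the examples above, could look like this:

# Expect "403" here when probing from a non-allowed IP
curl -s -o /dev/null -w '%{http_code}\n' https://pmm.my-domain.org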

Test the connection security

While protecting your connection with an SSL certificate is a good idea, there are a whole lot of other security concerns that the certificate alone does not address. There are a few useful web services out there that allow you to not only verify your certificate but also perform a number of other vital checks to test the connection, such as sufficient cipher strength, use of outdated security protocols, vulnerability to known SSL attacks, etc.

Qualys is one such service, and all it takes to review the certificate is to go to their web page, enter the domain name of your site, and let them do the rest. After a few minutes you should see a scan report like this:

Voilà! We have achieved a very high rating of connection security without really spending much time! That's the magic of the tool.

Before using the Nginx proxy, I could never achieve the same result on the first attempt; the best score I could get was a “B”. It took time to look up what every warning in the scan report meant, and even more time to find a proper solution and apply it to my configuration.

To wrap it up: keeping your Nginx proxy and its companion up to date saves a ton of time and keeps your TLS configuration in line with current security standards. And yes, certificates are rotated automatically before they expire.

Traefik

Traefik is a more recent product. It was first released in 2016 and has gained substantial momentum since then.

Traefik is very popular as a reverse proxy and load balancer for docker and Kubernetes alike. Many consider Traefik more powerful and flexible than Nginx, but also a bit more difficult to set up. Unlike Nginx, Traefik can manage the certificate lifecycle without additional tools, which is why we will launch only two containers: one for PMM Server and one for the Traefik proxy.

Basic configuration

To get started with a very minimal Traefik configuration please follow the steps below.

  • choose a root directory for your project
  • create a basic docker-compose.yml in the root folder with the following contents:
    version: '3'
    
    services:
    
      traefik:
        image: traefik:v2.7
        container_name: traefik
        restart: always
        command:
          - '--log.level=DEBUG'
          - '--providers.docker=true'
          - '--providers.docker.exposedbydefault=false'
          - '--entrypoints.web.address=:80'
          - '--entrypoints.websecure.address=:443'
          - '--certificatesresolvers.le.acme.httpchallenge=true'
          - '--certificatesresolvers.le.acme.httpchallenge.entrypoint=web'
          - '--certificatesresolvers.le.acme.email=postmaster@my-domain.org'
          - '--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json'
        ports:
          - '80:80'
          - '443:443'
        volumes:
          - ./letsencrypt:/letsencrypt
          - /var/run/docker.sock:/var/run/docker.sock:ro
        networks:
          - www
    
      pmm-server:
        image: percona/pmm-server:2
        container_name: pmm-server
        restart: always
        environment:
          - DISABLE_TELEMETRY=0
          - ENABLE_DBAAS=1
          - ENABLE_BACKUP_MANAGEMENT=1
          - ENABLE_ALERTING=1
          - GF_ANALYTICS_CHECK_FOR_UPDATES=false
          - GF_AUTH_LOGIN_COOKIE_NAME=pmm_session
          - GF_SECURITY_DISABLE_GRAVATAR=true
        volumes:
          - pmm-data:/srv
        expose:
          - '80'
        labels:
          - 'traefik.enable=true'
          - 'traefik.http.routers.pmm.rule=Host(`pmm.my-domain.org`)'
          - 'traefik.http.routers.pmm.entrypoints=websecure'
          - 'traefik.http.routers.pmm.tls.certresolver=le'
        networks:
          - www
    
    volumes:
      pmm-data:
        name: pmm-data
        external: false
    
    networks:
      www:
        name: www
        external: false
  • replace my-domain.org with the domain name you own
  • make sure ports 80 and 443 are open on your server and accessible publicly
  • create a directory letsencrypt in the root of your project — it will be mounted to the container so Traefik can save the certificates

Once you launch the project with docker compose up -d, you should be able to go to https://pmm.my-domain.org and see the login page of your PMM Server.

If you run docker ps -a, you will see two containers in the output:

% docker ps -a
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                    PORTS                                                                      NAMES
4a2746c26738   traefik:v2.7           "/entrypoint.sh --pr…"   52 seconds ago   Up 48 seconds             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   traefik
27943a8edfef   percona/pmm-server:2   "/opt/entrypoint.sh"     52 seconds ago   Up 48 seconds (healthy)   80/tcp, 443/tcp                                                            pmm-server

If you face any issues, you can troubleshoot by checking the Traefik container logs with docker logs traefik. Once everything works, we advise you to decrease the log verbosity by setting the log.level parameter to INFO for performance reasons.

Advanced configuration

A closer analysis of the basic configuration reveals that it suffers from a few problems, in particular:

  • Visiting http://pmm.my-domain.org results in HTTP 404 Not Found, i.e. no redirect to HTTPS is configured by default.
  • The Qualys SSL report rates our configuration with a “B”, which is insufficient for production environments. It highlights three issues:
    • the server supports the outdated, insecure TLS protocols 1.0 and 1.1
    • the server uses quite a few weak ciphers
    • absence of security headers in the proxy response

We will try to address all of the mentioned issues one by one.

  1. Redirect HTTP to HTTPS
    Traefik handles this through entrypoint-level HTTP redirect options. The redirect can be enabled by passing the following parameters to Traefik:

    --entrypoints.web.http.redirections.entryPoint.to=websecure
    --entrypoints.web.http.redirections.entryPoint.scheme=https
  2. Avoid using outdated protocols TLS 1.0 and 1.1
  3. Avoid using weak ciphers

    As both issues relate to TLS, why not solve them together? Traefik offers two types of configuration: static and dynamic. The number of options needed to tune the ciphers can get quite large, so we'll use the dynamic configuration, i.e. put everything in a file rather than inline in our docker-compose.yml. Let's create a file called dynamic.yml and place it in the project root:

    tls:
      options:
        default:
          minVersion: VersionTLS12
          sniStrict: true
          preferServerCipherSuites: true
          cipherSuites:
            - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
            - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
            - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
            - TLS_CHACHA20_POLY1305_SHA256
            - TLS_AES_256_GCM_SHA384
            - TLS_AES_128_GCM_SHA256
          curvePreferences:
            - secp521r1
            - secp384r1

    As you can see, we limit the TLS versions to the most secure ones, and the same is true for the cipher suites and curve preferences. The file name can be anything you like, but we need to point Traefik to the file containing the dynamic configuration. To do this, we will use the file provider parameters:

    - '--providers.file=true'
    - '--providers.file.filename=/dynamic.yml'
  4. Add security headers to the proxy response

    The additional security headers are configured as docker labels on the pmm-server container:

    - 'traefik.http.routers.pmm.middlewares=secureheaders@docker'
    - 'traefik.http.middlewares.secureheaders.headers.forceSTSHeader=true'
    - 'traefik.http.middlewares.secureheaders.headers.stsPreload=true'
    - 'traefik.http.middlewares.secureheaders.headers.stsIncludeSubdomains=true'
    - 'traefik.http.middlewares.secureheaders.headers.stsSeconds=315360000'
    - 'traefik.http.middlewares.secureheaders.headers.contentTypeNosniff=true'
    - 'traefik.http.middlewares.secureheaders.headers.browserXssFilter=true'
    - 'traefik.http.middlewares.secureheaders.headers.frameDeny=true'

     

That should be it for Traefik. We have successfully addressed all security concerns, which is confirmed by a fresh SSL report.

Proxy configuration - A-plus superior report

For your convenience, here is the final version of the docker-compose.yml file.

docker-compose.yml

version: '3'

services:
  traefik:
    image: traefik:v2.7
    container_name: traefik
    hostname: traefik
    command:
      - '--log.level=DEBUG'
      - '--providers.docker=true'
      - '--providers.docker.exposedbydefault=false'
      - '--providers.file=true'
      - '--providers.file.filename=/dynamic.yml'
      - '--entrypoints.web.address=:80'
      - '--entrypoints.web.http.redirections.entryPoint.to=websecure'
      - '--entrypoints.web.http.redirections.entryPoint.scheme=https'
      - '--entrypoints.websecure.address=:443'
      - '--certificatesresolvers.le.acme.httpchallenge=true'
      - '--certificatesresolvers.le.acme.httpchallenge.entrypoint=web'
      - '--certificatesresolvers.le.acme.email=postmaster@my-domain.org'
      - '--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json'
    ports:
      - '80:80'
      - '443:443'
    volumes:
      - ./letsencrypt:/letsencrypt
      - ./dynamic.yml:/dynamic.yml
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - www

  pmm-server:
    image: percona/pmm-server:2
    container_name: pmm-server
    restart: always
    environment:
      - DISABLE_TELEMETRY=0
      - ENABLE_DBAAS=1
      - ENABLE_BACKUP_MANAGEMENT=1
      - ENABLE_ALERTING=1
      - GF_ANALYTICS_CHECK_FOR_UPDATES=false
      - GF_AUTH_LOGIN_COOKIE_NAME=pmm_session
      - GF_SECURITY_DISABLE_GRAVATAR=true
    volumes:
      - pmm-data:/srv
    expose:
      - '80'
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.pmm.rule=Host(`pmm.my-domain.org`)'
      - 'traefik.http.routers.pmm.entrypoints=websecure'
      - 'traefik.http.routers.pmm.tls.certresolver=le'
      - 'traefik.http.routers.pmm.middlewares=secureheaders@docker'
      - 'traefik.http.middlewares.secureheaders.headers.forceSTSHeader=true'
      - 'traefik.http.middlewares.secureheaders.headers.stsPreload=true'
      - 'traefik.http.middlewares.secureheaders.headers.stsIncludeSubdomains=true'
      - 'traefik.http.middlewares.secureheaders.headers.stsSeconds=315360000'
      - 'traefik.http.middlewares.secureheaders.headers.contentTypeNosniff=true'
      - 'traefik.http.middlewares.secureheaders.headers.browserXssFilter=true'
      - 'traefik.http.middlewares.secureheaders.headers.frameDeny=true'
    networks:
      - www

volumes:
  pmm-data:
    name: pmm-data
    external: false

networks:
  www:
    name: www
    external: false

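With the final configuration in place, a couple of quick local checks can confirm the hardening without waiting for another online scan. This is only a rough sketch, assuming the pmm.my-domain.org domain from the examples and a curl build whose TLS library still offers the old protocol versions:

# The HSTS header added by the secureheaders middleware should now be present
curl -sI https://pmm.my-domain.org | grep -i strict-transport-security

# Old TLS versions should be refused by the proxy (the handshake is expected to fail)
curl -sS --tlsv1.1 --tls-max 1.1 https://pmm.my-domain.org -o /dev/null \
  && echo "TLS 1.1 accepted (unexpected)" || echo "TLS 1.1 rejected (expected)"
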
Some security aspects

Docker socket

I’m sure you noticed the following section in docker-compose.yml:

    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro

You may say it’s not super secure to pass the docker socket to one or more containers, right? True, but this is a hard requirement if we want to automate things. It turns out, that both the proxy server and its companion are doing many things under the hood to make it happen, so they need access to docker runtime to be able to listen to docker events and react to them.

Consider this example: if you add a new service (or site, in our case) to the same or a different docker-compose.yml and then launch it with docker compose up -d, the proxy will detect this event and request a new certificate from a Certificate Authority, or pick it up from storage if it has already been issued. Only then will it be able to serve the certificate for your service. This kind of low-level access to the docker runtime is what makes it possible for Nginx or Traefik to react to other containers. Without the socket, none of this automation would work.
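
If you are curious about what exactly the proxies react to, you can watch the same docker event stream yourself; a simple example:

# Print container lifecycle events (create, start, stop, die) as they happen
docker events --filter type=container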

Oftentimes, when developing apps or services, engineering teams put all their services (backend, frontend, and database) in one docker-compose.yml, which is convenient as it can be shared within the team. However, you will certainly want to isolate production from non-production environments. For instance, if your database server is also deployed in a docker container, it should live in a separate, much more hardened environment than this one.

The use of port 80

It may appear weird to mention port 80, the standard port for unencrypted HTTP, in a blog post about security. Nonetheless, both Nginx and Traefik rely on it being open on the server end. This is because we used the so-called HTTP challenge (read more about challenge types here), which is supported by most proxies and is the easiest to implement. When the CA is about to process the request to issue a certificate, it needs to verify your domain ownership. It does so by querying a special .well-known endpoint on your site, which must be served over port 80. Then, in case of success, it proceeds to issue the certificate.
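
If certificate issuance ever fails, a useful first sanity check is to confirm that port 80 is reachable from the outside at all. A rough probe against the well-known path (the token name below is made up; any HTTP response, even a 404 or a redirect, proves the port is open, while a timeout points to a firewall or DNS problem):

curl -v http://pmm.my-domain.org/.well-known/acme-challenge/probe -o /dev/null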

Both proxy servers provide quite convenient options for redirecting HTTP to HTTPS, which is also good for search engine optimization (SEO).

Conclusion

It is hard to disagree that automating the SSL certificate lifecycle, especially if you own or maintain more than a few public sites (or PMM servers), is a much more pleasant journey than doing it manually. Thankfully, these open source proxies are mature and feature complete; they are used and trusted by many, which is why we recommend you try them out as well.

P.S. There is so much more you can do with your configuration, e.g. add compression for static assets, harden your security even further, issue a wildcard certificate, etc., but that would be too much for one post; it is probably a good topic for the next one.

Have you tried some other proxies? How do they compare to Nginx or Traefik? We’d love to hear about your experiences, please share them with us via our community forums.

Jun
22
2022
--

Percona Monitoring and Management in Kubernetes is now in Tech Preview


Over the years, we have seen growing interest in running databases and stateful workloads in Kubernetes. With Container Storage Interfaces (CSI) maturing and more and more Operators appearing, running stateful workloads on your favorite platform is not that scary anymore. Our Kubernetes story at Percona started with Operators for MySQL and MongoDB, with PostgreSQL added later on.

Percona Monitoring and Management (PMM) is an open source database monitoring, observability, and management tool. It can be deployed as a virtual appliance or a Docker container. Our customers have long asked us to provide a way to deploy PMM in Kubernetes. Until now, we only had an unofficial helm chart, created as a proof of concept by Percona teams and the community (GitHub).

We are introducing the Technical Preview of the Percona-supported helm chart for easily deploying PMM in Kubernetes. You can find it in our helm chart repository here.

Use cases

Single platform

If Kubernetes is your platform of choice, until now you needed a separate virtual machine to run Percona Monitoring and Management. With the introduction of the helm chart, that is no longer the case.

As you know, all Percona Operators integrate with PMM, which enables monitoring for databases deployed on Kubernetes. The Operators configure and deploy the pmm-client sidecar container and register the nodes on a PMM server. Bringing PMM into Kubernetes simplifies this integration and the whole flow: the network traffic between pmm-client and the PMM server no longer leaves the Kubernetes cluster at all.

All you have to do is set the correct endpoint in the pmm section of the Custom Resource manifest. For example, for Percona Operator for MongoDB, the pmm section will look like this:

 pmm:
    enabled: true
    image: percona/pmm-client:2.28.0
    serverHost: monitoring-service

Where monitoring-service is the service created by a helm chart to expose a PMM server.

High availability

Percona Monitoring and Management has lots of moving parts: Victoria Metrics to store time-series data, ClickHouse for query analytics functionality, and PostgreSQL to keep PMM configuration. Right now all these components are part of a single container or virtual machine, with Grafana as the frontend. To provide zero-downtime deployments in any environment, we would need to decouple these components, which would substantially complicate the installation and management of PMM.

What we offer right now instead are ways to automatically recover PMM within minutes in case of failure (for example, by leveraging the EC2 self-healing mechanism).

Kubernetes is a control plane for container orchestration that automates manual tasks for engineers. When you run software in Kubernetes, it is best to rely on its primitives to handle the heavy lifting. This is what PMM looks like in Kubernetes:

PMM in Kubernetes

  • StatefulSet controls the Pod creation
  • There is a single Pod with a single container with all the components in it
  • This Pod is exposed through a Service that is utilized by PMM Users and pmm-clients
  • All the data is stored on a persistent volume
  • ConfigMap has various environment variable settings that can help to fine-tune PMM

In case of a node or Pod failure, the StatefulSet recovers the PMM Pod automatically and remounts the Persistent Volume. The recovery time depends on cluster load and node availability, but in normal operating environments the PMM Pod is up and running again within a minute.
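
You can observe this self-healing yourself by deleting the Pod and watching the StatefulSet bring it back; a small sketch, assuming the chart is installed with the release name pmm as shown in the Deploy section below (which makes the Pod name pmm-0):

# Simulate a failure and watch the Pod being recreated and becoming Ready again
kubectl delete pod pmm-0
kubectl get pod pmm-0 -w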

Deploy

Let’s see how PMM can be deployed in Kubernetes.

Add the helm chart:

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

Install PMM:

helm install pmm --set service.type="LoadBalancer" percona/pmm

You can now log in to PMM using the LoadBalancer IP address and the randomly generated password stored in the pmm-secret Secret object (the default user is admin).

The Service object created for PMM is called monitoring-service:

$ kubectl get services monitoring-service
NAME                 TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)         AGE
monitoring-service   LoadBalancer   10.68.29.40   108.59.80.108   443:32591/TCP   3m34s

$ kubectl get secrets pmm-secret -o yaml
apiVersion: v1
data:
  PMM_ADMIN_PASSWORD: LE5lSTx3IytrUWBmWEhFTQ==
…

$ echo 'LE5lSTx3IytrUWBmWEhFTQ==' | base64 --decode && echo
,NeI<w#+kQ`fXHEM
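
If you prefer a one-liner, the same value can be extracted and decoded with jsonpath:

# Extract and decode the admin password in one step
kubectl get secret pmm-secret -o jsonpath='{.data.PMM_ADMIN_PASSWORD}' | base64 --decode && echo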

Log in to PMM by connecting to https://<YOUR_PUBLIC_IP>.

Customization

A helm chart is a template engine for YAML manifests, and it allows users to customize the deployment. You can find the various parameters you can set to fine-tune your PMM installation in our README.

For example, to choose another storage class and set the desired storage size, pass the following flags:

helm install pmm percona/pmm \
--set storage.storageClassName="premium-rwo" \
--set storage.size="20Gi"

You can also change these parameters in values.yaml and use the -f flag:

# values.yaml contents
storage:
  storageClassName: "premium-rwo"
  size: 20Gi


helm install pmm percona/pmm -f values.yaml

Maintenance

For most of the maintenance tasks, regular Kubernetes techniques would apply. Let’s review a couple of examples.

Compute scaling

It is possible to vertically scale PMM by adding or removing resources through the pmmResources variable in values.yaml:

pmmResources:
  requests:
    memory: "4Gi"
    cpu: "2"
  limits:
    memory: "8Gi"
    cpu: "4"

Once done, upgrade the deployment:

helm upgrade -f values.yaml pmm percona/pmm

This will restart the PMM Pod, so plan it carefully to avoid disrupting your team's work.

Storage scaling

This depends a lot on your storage interface capabilities and the underlying implementation. In most clouds, Persistent Volumes can be expanded. You can check if your storage class supports it by describing it:

kubectl describe storageclass standard
…
AllowVolumeExpansion:  True

Unfortunately, just changing the size of the storage in values.yaml (storage.size) will not do the trick and you will see the following error:

helm upgrade -f values.yaml pmm percona/pmm
Error: UPGRADE FAILED: cannot patch "pmm" with kind StatefulSet: StatefulSet.apps "pmm" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

We use a StatefulSet object to deploy PMM, and StatefulSets are mostly immutable: only a handful of fields can be changed on the fly. There is a trick, though.

First, delete the StatefulSet, but keep the Pods and PVCs:

kubectl delete sts pmm --cascade=orphan

Recreate it with the new storage configuration:

helm upgrade -f values.yaml pmm percona/pmm

This recreates the StatefulSet with the new storage size configuration.

Next, edit the Persistent Volume Claim manually and change the storage size (the PVC name may differ in your environment). You need to change the value in the spec.resources.requests.storage section:

kubectl edit pvc pmm-storage-pmm-0
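
Alternatively, instead of editing the object interactively, the same change can be applied with kubectl patch; a sketch using the 20Gi size from the earlier example:

# Request the new size on the PVC in one command
kubectl patch pvc pmm-storage-pmm-0 --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'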

The PVC is not resized yet and you can see the following message when you describe it:

kubectl describe pvc pmm-storage-pmm-0
…
Conditions:
  Type                      Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----                      ------  -----------------                 ------------------                ------  -------
  FileSystemResizePending   True    Mon, 01 Jan 0001 00:00:00 +0000   Thu, 16 Jun 2022 11:55:56 +0300           Waiting for user to (re-)start a pod to finish file system resize of volume on node.

The last step would be to restart the Pod:

kubectl delete pod pmm-0

Upgrade

Running helm upgrade is the recommended way to upgrade, either when a new helm chart is released or when you want to move to a newer version of PMM by replacing the image in the image section.
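
For example, a routine upgrade, assuming the values.yaml customizations shown earlier, could look like this:

# Fetch the latest chart versions and apply them to the existing release
helm repo update
helm upgrade pmm percona/pmm -f values.yaml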

Backup and restore

PMM stores all the data on a Persistent Volume. As said before, regular Kubernetes techniques can be applied here to back up and restore the data. There are numerous options:

  • Volume Snapshots – check if they are supported by your CSI driver and storage implementation (see the sketch after this list)
  • Third-party tools, like Velero, that can handle the backups and restores of PVCs
  • Snapshots provided by your storage (e.g., AWS EBS Snapshots) with manual mapping to the PVC during restoration
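
As an illustration of the first option, a minimal VolumeSnapshot manifest might look like the sketch below. It assumes the snapshot CRDs are installed and your CSI driver supports snapshots; the class name is hypothetical and must match one available in your cluster:

# pmm-snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pmm-storage-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass   # hypothetical; use a class that exists in your cluster
  source:
    persistentVolumeClaimName: pmm-storage-pmm-0

Apply it with kubectl apply -f pmm-snapshot.yaml and check its readiness with kubectl get volumesnapshot.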

What is coming

To keep you excited, here are some of the things we are working on or have planned to enable further Kubernetes integration.

OpenShift support

We are working on building a rootless container so that OpenShift users can run Percona Monitoring and Management there without having to grant elevated privileges. 

Microservices architecture

This is something that we have been discussing internally for some time now. As mentioned earlier, there are lots of components in PMM. To enable proper horizontal scaling, we need to break down our monolith container and run these components as separate microservices. 

Managing all these separate containers and Pods (if we talk about Kubernetes) would require separate maintenance strategies. This raises the question of creating a dedicated Operator just to manage PMM, but that is a significant effort. If you have an opinion here, please let us know on our community forum.

Automated k8s registration in DBaaS

As you know, Percona Monitoring and Management comes with a technical preview of Database as a Service (DBaaS). Right now, when you install PMM on a Kubernetes cluster, you still need to register the cluster before deploying databases. We want to automate this process so that you can start deploying and managing databases right after installation.

Conclusion

Percona Monitoring and Management enables database administrators and site reliability engineers to pinpoint issues in their open source database environments, whether it is a quick look through the dashboards or a detailed analysis with Query Analytics. Percona’s support for PMM on Kubernetes is a response to the needs of our customers who are transforming their infrastructure.

Some useful links that would help you to deploy PMM on Kubernetes:
