Nov 15, 2022
--

PMM v2.32: Backup Management for MongoDB in GA, New Home Dashboard, and More!

Percona Monitoring and Management v2.32

We are pleased to announce the general availability of Backup Management for MongoDB, along with other improvements, in Percona Monitoring and Management (PMM) v2.32, released in November 2022. Details are in this blog and in the PMM 2.32 Release Notes.

PMM now greets you with a new Home Dashboard where you can quickly check your databases’ health at a glance and detect anomalies. While there’s no one-size-fits-all approach, we designed the new Home Dashboard to be user-friendly, even for users new to PMM.

You can get started with PMM in minutes using the PMM Demo to check out the latest version of PMM v2.32.

Let’s have a look at the highlights of PMM 2.32:

General availability of Backup Management for MongoDB

The Backup Management for MongoDB in PMM has reached General Availability and is no longer in Technical Preview.

Supported setups

MongoDB Backup Management now supports replica set setups for the following actions:

  • Create logical snapshot backups
  • Create logical Point In Time Recovery (PITR) backups
  • Create physical snapshot backups (available only with Percona Server for MongoDB)
  • Restore logical snapshot backups
  • Restore physical backups (requires additional manual operations after the restore and is available only with Percona Server for MongoDB)
  • Restore logical PITR backups from S3

Current limitations

  • Restoring logical PITR backups only supports S3 storage type
  • Restoring physical backups requires manual post-restore actions
  • Restoring a MongoDB backup on a new cluster is not yet supported
  • Restoring physical backups for containerized MongoDB setups is not supported
  • Local storage for MySQL is not supported
  • Sharded cluster setups are not supported
  • Backups that are stored locally cannot be removed automatically
  • Retention is not supported for PITR artifacts

 

Quicker and easier database health overview with the new Home Dashboard

As promised in previous release notes, we have been investigating better, more user-friendly ways to present database health on the Home Dashboard, which is also the entry point to PMM. We are proud to release the finalized dashboard as the new Home Dashboard. Thank you for your feedback and collaboration during all iterations of the experimental versions.

Monitor hundreds of nodes without any performance issues

If you monitor hundreds of nodes with the same PMM instance, the original Home Dashboard could take a long time to load, sometimes resulting in an unresponsive page, because it repeated a set of panels for each node. With those performance issues in mind, we redesigned the Home Dashboard with new logic that shows what is wrong, or what is OK, with your databases instead of showing every metric for every node.

PMM Home Dashboard: select multiple nodes

Anomaly detection

Many of you probably use dozens of tools in your daily work, meetings, and projects. These tools should make your life easier, not harder. With monitoring tools, the sheer number of metrics can be daunting, so analyzing data and detecting anomalies that deviate from a database’s normal behavior should be easy and fast. Functional Anomaly Detection panels, as opposed to separate graphs for each node, are a much better way to visualize and recognize problems with your databases that may require action.

  • If you spot an anomaly, you can click the node name on the panel to open the Node Overview Dashboard for that node, where you can see all the node metrics you need to diagnose the problem.
  • All panels except Advisor Checks can be filtered by node and environment variables.
  • Graphs in the Anomaly Detection row show data for the top 20 nodes, e.g., CPU anomalies in the top 20 (see the query sketch below).
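
For illustration, the “top 20” logic behind these graphs can be expressed as a PromQL topk() query. The sketch below runs such a query against PMM’s Prometheus-compatible endpoint; the URL path, credentials, and metric name are assumptions for illustration, not the dashboard’s actual internals:

# Hypothetical: top 20 nodes by CPU utilization over the last 5 minutes
curl -s -k --user admin:admin \
  'https://pmm-server/prometheus/api/v1/query' \
  --data-urlencode 'query=topk(20, 100 - avg by (node_name) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'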
PMM Anomaly Detection panels

Command Center panels

The primary motivation behind the new Home Dashboard is simplicity. While working on the new design, it was always hard to balance presenting the metrics everyone needs while keeping it clean, functional, and simple. So we decided to use Command Center panels, which are collapsed by default. If you see Memory Usage above 90%, how do you know when it started? The time-series graphs for the top 20 nodes in the Command Center panels help you see when the anomalies occurred: in the last hour or in the last week.

PMM Command Center panels on the Home Dashboard

Enhanced main menu

We returned with two improvements to the Main Menu that we previously promised, for easier access to dashboards. Previously, every service type PMM can monitor was represented on the Main Menu as an icon, which made the menu crowded. In the latest version, you’ll only see icons for the services you currently monitor. For example, if you’re monitoring MongoDB, you will see the MongoDB Dashboards icon on the Main Menu, as opposed to previous versions, which showed all database types PMM is capable of monitoring whether you had them in your system or not. When you start to monitor other services, like MySQL, they will be added to the Main Menu automatically.

Another improvement on the Main Menu is the visibility of all other dashboards. PMM provides multiple dashboards with different levels of information for each service, but only some of them appear in the Main Menu; the rest live in folders. Users can easily miss the dashboards that are not presented in the Main Menu, including custom dashboards created by other users in your organization, unless they stumble on them in the folders by chance. So we added an Other Dashboards link to the sub-menu of each service, letting you click through to all dashboards in that service’s folder.

Quick access to other dashboards from the menu

What’s next?

  • We’ll improve the Vacuum Dashboard with more metrics. If you’d like to enhance it with us, you can share your feedback in the comments.
  • A health dashboard for MySQL is on the way. Please share your suggestions in the comments or forum if you’d like to be part of the group shaping PMM. 
  • We have started to work on two new and significant projects: High Availability in PMM and advanced Role-Based Access Control (RBAC). We’d love to hear your needs, use cases, and suggestions. You can quickly book a short call with the product team to collaborate with us. 
  • For Backup Management, we are planning to continue to iterate on the current limitations listed above and make the restore processes as seamless as possible for all database types.

Install PMM 2.32 now or upgrade your installation to V2.32 by checking our documentation for more information about upgrading.

Learn more about Percona Monitoring and Management 2.32

Thanks to Community and Perconians

We love our community and team at Percona, who together shape the future of PMM and help us with all these changes.

You can also join us on our community forums to request new features, share your feedback, and ask for support.

Thank you for your collaboration on the new Home Dashboards:

Cihan Tunalı @SmartMessage

Tyson McPherson @Parts Authority

Paul Migdalen @IntelyCare

Sep 28, 2022
--

PMM v2.31: Enhanced Alerting, User-Friendly Main Menu, Prometheus Query Builder, Podman GA, and more!

Percona Monitoring and Management v2.31

Autumn brought cool new features and improvements to Percona Monitoring and Management (PMM) in v2.31. An enhanced user experience with the updated main menu, Alerting, and better PostgreSQL autovacuum observability with the new Vacuum dashboard are the major themes we focused on in this release. Check out the 2.31 Release Notes for the full list of new features, enhancements, and bug fixes.

You can get started with PMM in minutes with the PMM Demo to check out the latest version of PMM V2.31.

Some of the highlights in PMM V2.31 include:

General availability of Percona Alerting

We are excited to introduce a streamlined alert setup process in PMM with an overhauled, unified alerting system based on Grafana. 

All Alerting functionality is now consolidated in a single pane of glass on the Alerting page. From here, you can configure, create and monitor alerts based on Percona or Grafana templates. 

The Alert Rules tab has also evolved into a more intuitive interface with an added layer for simplifying complex Grafana rules. You’ll find that the new Percona templated alert option here offers the same functionality available in the Tech Preview of Integrated Alerting but uses a friendlier interface with very advanced alerting capabilities. 

This new Alerting feature is important and generally useful, so it is enabled by default and ready to use in production!

For more information about Percona Alerting, check out the Alerting documentation.

Deprecated Integrated Alerting

The new Percona Alerting feature fully replaces the old Integrated Alerting Tech Preview available in previous PMM versions. The new alerting brings full feature parity with Integrated Alerting, along with additional benefits like Grafana-based alert rules and a unified alerting command center.

However, alert rules created with Integrated Alerting are not automatically migrated to Percona Alerting. After upgrading, make sure to manually migrate any custom alert rules that you want to transfer to PMM 2.31 using the script.

Easier query building, enhanced main menu in PMM 2.31

We have powered up PMM with Grafana 9.1, drawing on its latest features and improvements. Here is the list of features and enhancements shipped in this release:

Redesigned expandable main menu (side menu)

With the 2.31 release, we introduce a more user-friendly and accessible main menu inspired by Grafana’s expandable side menu. PMM dashboards are the heart of monitoring, and we aimed to provide quick and easy access to frequently used dashboards from the main menu. From this menu, you can browse dashboards with one click: Operating System, MySQL, MongoDB, PostgreSQL, etc.

PMM new side menu

Pin your favorite dashboards to the main menu (side menu)

PMM provides many custom dashboards with dozens of metrics to monitor your databases. Most users in an organization use just a handful of dashboards regularly; now it is much easier to access them by saving your most frequently used dashboards to the main menu. You will see your saved dashboards under the Starred section of the main menu.

This feature is enabled by default in PMM. If you have the Server Admin or Grafana Admin role, you can turn it off via the savedItems feature flag.


Tip:

You can follow these steps to add your dashboard to Starred on the main menu:

  1. Open your dashboard.
  2. Mark it by clicking the star icon next to the dashboard name in the top right corner.
  3. Hover over the Starred icon on the main menu to see all saved dashboards.

Search dashboards by panel title

While looking for a specific metric or panel, it is easy to forget which dashboard contains it. Now you can quickly find the dashboard you need on the Search dashboards page, which also searches panel titles.

Percona Monitoring and Management v2.31

Command palette

This PMM version includes a new shortcut, called the “command palette” in Grafana. You can easily access main menu sections, dashboards, and other tasks using cmd+K (macOS) or ctrl+K (Linux/Windows). Run it from the Explore section to quickly run a query, or from the Preferences section to easily change theme preferences.

Command Palette

Visual Prometheus Query Builder in Explore (Beta)

A new Prometheus query builder was introduced in Grafana 9. It allows everyone, especially new users, to build PromQL queries without extensive expertise. The visual query builder UI in Explore lets anyone write queries and understand what a query means.

You can easily switch to the new Prometheus query builder by clicking Builder mode in the top-right corner. Builder mode lets you build queries by choosing the metric from a dropdown menu. If you want to continue in text mode, you can switch to Code mode with your changes preserved. Please check this blog to learn more about Builder mode.

Visual Prometheus Query Builder in Explore (Beta)

Add your queries to a dashboard or create a new dashboard from Explore

You’ll probably like this news if you’re a fan of the Explore feature or use it frequently. It is now possible to create a panel or dashboard from Explore with one click, sparing you from copying, pasting, or rewriting queries. Just click the “Add to dashboard” button after you run your query, and a panel will be created automatically with the query and a default visualization. You can change the visualization later by editing the panel on the dashboard. Note that you need the Editor/Admin/SuperAdmin role to save the panel to the chosen dashboard, and you must follow the usual dashboard save flow to persist the added panel; otherwise, you’ll lose the new panel.

Add your queries to a dashboard or create a new dashboard from Explore

Experimental Vacuum Dashboard

The autovacuum process in PostgreSQL is designed to prevent table bloat by removing dead tuples. These dead tuples can accumulate because of the unique way PostgreSQL handles MVCC. Because PostgreSQL’s architecture is unique in this regard, the autovacuum process is sometimes not understood well enough to tune its parameters for peak performance. After talking to many customers and realizing that this is a recurring problem, we decided to help our users with a dashboard for monitoring metrics related to the vacuum process, thereby helping you tune these parameters for better performance.

Now you can monitor PostgreSQL vacuum processes with a new experimental dashboard named PostgreSQL Vacuum Monitoring, available in the Experimental folder. We’re still working on this dashboard to add more metrics. Please share your feedback about it in the comments.

Experimental Vacuum Dashboard

Tip

If you’d like to move the Vacuum experimental dashboard to the PostgreSQL folder or other folders that you internally use to gather all PostgreSQL dashboards, please check this document to see how you can move dashboards to a different folder.

General availability of Podman

We are excited to announce the General Availability (GA) of Podman support for deploying PMM in 2.31.0. We introduced it in 2.29.0 as a preview feature, and we are now production-ready with it.

Simplified deployment with Database as a Service (DBaaS)

As part of our continued focus on an enhanced user experience, PMM 2.31.0 simplifies the deployment and configuration of DBaaS as follows:

  • With PMM 2.31.0, you can easily add a DB cluster from a newly created K8s cluster. All fields in the DB cluster window are auto-populated with values based on the existing K8s cluster.
  • In PMM 2.31.0, when accessing DBaaS, if you have an existing Kubernetes cluster configured for DBaaS, you are automatically redirected to the DB Cluster page. Otherwise, you are redirected to the Kubernetes Cluster page.

What’s next?

  • New UX improvements are baking! We’re working on making our main menu easy to use and minimal. In the next release, the main menu will present only monitored services, giving you a clearer and less crowded menu.
  • The experimental Home dashboard, available in the Experimental folder since v2.30, will replace the current Home dashboard. Please do not forget to share your feedback with us if you have tried it.
  • We’ll improve the Vacuum dashboard with more metrics. If you’d like to enhance it with us, you can share your feedback in the comments.
  • We have started work on two new and significant projects: High Availability in PMM and advanced Role-Based Access Control (RBAC). We’d love to hear your needs, use cases, and suggestions. You can quickly book a short call with the product team to collaborate with us.

Thanks to Community and Perconians

We love our community and team at Percona, who help shape PMM and make it better!

Thank you for your collaboration on the new main menu:

Pedro Fernandes, Fábio Silva, Matej Kubinec

Thank you for your collaboration on Vacuum Dashboards:

Anton Bystrov, Daniel Burgos, Jiri Ctvrtka, Nailya Kutlubaeva, Umair Shahid

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Sep 14, 2022
--

Percona Monitoring and Management Home Dashboard: What’s New?

Percona Monitoring and Management Home Dashboard

The all-new Percona Monitoring and Management (PMM) Home dashboard is the answer to some of the main concerns of our users:

  • A clear entry point (where do I start?)
  • Context (is everything okay?)

How did we achieve this? By redesigning the Home dashboard to tackle the known performance issues that appeared as the number of monitored nodes increased, and to improve the overall experience.

Now, where is that amazing new Home dashboard? When you install or upgrade PMM to version 2.30.0, you get access to the new Home dashboard, but not by default (yet). Since we would love as much feedback as possible from our user base, we haven’t made the new Home dashboard the default one. So where is it?

Meet: The experimental dashboards directory

There are several ways to reach the experimental directory, but the easiest is to use the “Search Dashboards” option available in the left bar.

PMM Dashboard

And scroll to find the “Experimental” directory:

experimental PMM home dashboard

Or, just directly write “new-home” in the textbox and click on the link:

And that’s it! Once you’re there this is what you should be looking for:

The above screenshot is from the PMM Demo hosted by Percona, and you can access it by clicking this link.

Navigating the new Home dashboard

The idea behind this is simplicity. You as a user don’t need to be bombarded with tons of information that, in the end, is just noise. So, what are we looking at here? 

Outliers, things that are odd, wrong, and shouldn’t be.

And outliers where? At the server level. Since PMM is a polyglot monitoring tool that spans MySQL, PostgreSQL, and MongoDB, and also monitors a couple of reverse proxy tools (HAProxy and ProxySQL), we have to make sure the Home dashboard works for everyone. And what does everyone have in common? Server-level metrics!

Anomaly detection PMM
The Anomaly detection panel is pretty simple: the left part reports CPU/Disk Queue anomalies, and the right part shows wrong behavior, and by wrong we mean metrics beyond a high threshold for a considerable amount of time:

  • CPU above 90% for more than 15 minutes
  • Disk Queue above 20 requests over a minute

Those values are customizable, but we consider the defaults good enough to catch real issues while reducing the probability of false positives.

Now, the anomaly detection part. As impressive as it sounds, it came out very simple, but for the details let’s talk about the next panel.

Command Center

PMM Command Center
When designing the new Home dashboard, one of the main issues to tackle was the lack of context on the metrics provided: we can see that a CPU is at 50% but how do we know if that is expected or not? Is it normal behavior or is it an anomaly?

One of the options we considered was calculating the Z-score. The standard score (or z-score) is a measure of the number of standard deviations above or below the mean. It is a pretty cool way to find outliers. You can read more about it on Wikipedia: https://en.wikipedia.org/wiki/Standard_score.
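
For reference, the formula itself is simple:

z = (x − μ) / σ

where x is the observed value, and μ and σ are the mean and standard deviation of the metric over the reference window.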

However, for simplicity, and with scalability as a key condition, we ended up implementing a much more basic option: seasonality, which is pretty much comparing the current value of a metric with its value an X amount of time ago. In our case, we chose that amount of time to be a week. Why a week? Because it covers a big chunk of cases. For example:

Imagine you are a restaurant reservation app and your peak of traffic is on the weekends. What would you have to say about a CPU usage of 50%? Well, it depends on the day of the week and the hour. A Thursday night? Expected, and, in fact, probably a little bit low. But, what about a Monday morning? Totally wrong, an anomaly! Should be easy to spot.

We are calculating anomalies for:

  • CPU usage
  • Disk queue
  • Disk Write latency
  • Disk Read latency
  • Memory usage

In the Command Center, it is easy to spot trends by having graphs for the last hour next to graphs for the same hour a week ago.
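
To make the seasonality idea concrete, here is a rough sketch of a week-over-week comparison written in PromQL with the offset modifier, run against PMM’s Prometheus-compatible endpoint. The URL path, credentials, and metric names are assumptions for illustration; this is not how the panel is implemented internally:

# Hypothetical: memory usage now minus memory usage one week ago, per node.
# A large difference flags behavior that deviates from the weekly pattern.
curl -s -k --user admin:admin 'https://pmm-server/prometheus/api/v1/query' \
  --data-urlencode 'query=(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) - (1 - (node_memory_MemAvailable_bytes offset 1w) / (node_memory_MemTotal_bytes offset 1w))'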

Making it the default Home dashboard

Do you like it enough to make it the new Home dashboard for your PMM? Here’s how to do it:

default Home dashboard PMM
In the top left, right next to the dashboard name, mark it as a favorite.


Now, go to the “Preferences” section by clicking on the link available at the left bar at the bottom.


And finally, select the New-Home Dashboard as the default option. That’s it!

Final considerations

Is this good enough for debugging? No, but the idea of the Home dashboard is to work as a central health overview point to check in a bit on the status of your database fleet, identify any suspicious situations, and then go deep with the summary dashboards either for MySQL, Percona XtraDB Cluster, PostgreSQL, or MongoDB.

We would love to hear from you on this new Home dashboard, so feel free to hit the comment section or go directly to the forum (https://forums.percona.com/t/new-home-dashboard-feedback/16935) to leave us any feedback you have!

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Sep 08, 2022
--

Enabling ProcFS UDF in Percona Monitoring and Management

Enabling ProcFS UDF in Percona Monitoring and Management

In my previous blog post, ProcFS UDF: A Different Approach to Agentless Operating System Observability in Your Database, I wrote about the ProcFS UDF MySQL plugin, which lets you get operating system stats through the MySQL database, without shell access to the server or any local agent installation.

Some of you wondered whether there is a way to use this goodness in Percona Monitoring and Management (PMM), and this blog post will show you exactly how to do that.

Unfortunately, at this point, Percona Monitoring and Management does not support the ProcFS UDF MySQL plugin out of the box. It is in the backlog, along with many other cool things. However, if this particular feature is valuable to you, please let us know.

That said, Percona Monitoring and Management is extensible, so you can actually make things work with a little bit of elbow grease using the external exporter functionality.

Here’s how:

1. Configure the MySQL host you wish to monitor with ProcFS UDF as described in this blog post.

2. Add this MySQL server as a remote instance using “Add Instance,” available in the PMM menu in the top right corner.

PMM Add Instance

3. Pick the host to capture metrics.

While you do not need any agent installed on the MySQL server you’re looking to monitor, you will need another server to capture metrics from it and pass them to the PMM server. That server needs the PMM client installed and configured. You will also need to install Node Exporter with ProcFS UDF:

docker/podman run -p 9100:9100 -d docker.io/perconalab/node_exporter:procfs --collector.mysqlprocfs="MYSQLUSER:MYSQLPASSWORD@tcp(MYSQLHOST:3306)"

If you do not want to use docker, instructions on how to compile patched Node Exporter are included in the previously mentioned ProcFS UDF Introduction blog post.

You can use one host to monitor multiple MySQL servers — just run multiple Node Exporters on different ports.
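
For example, two patched exporters for two MySQL servers could be published on consecutive ports; the hostnames and credentials below are placeholders:

# First MySQL server, exporter published on host port 9100
docker run -d -p 9100:9100 docker.io/perconalab/node_exporter:procfs \
  --collector.mysqlprocfs="pmm_user:secret@tcp(mysql1.example.com:3306)"

# Second MySQL server, exporter on host port 9101 (container port stays 9100)
docker run -d -p 9101:9100 docker.io/perconalab/node_exporter:procfs \
  --collector.mysqlprocfs="pmm_user:secret@tcp(mysql2.example.com:3306)"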

4. Configure passing information to PMM.

Now that we have MySQL metrics flowing to PMM as a remote instance, and Node Exporter running on a different host ready to provide metrics, how do we establish a connection so that those “node metrics” are attached to the correct host?

First, we need to find the node_id of the remote node we’ve added:

root@client1:# pmm-admin inventory list nodes
Nodes list.

Node type     Node name            Address           Node ID
GENERIC_NODE  mysql1               66.228.62.195
GENERIC_NODE  client1              50.116.36.182     /node_id/9ba48cd4-a7c2-43f6-9fa6-9571d1f0aebf     
….
REMOTE_NODE   procfstest           173.230.136.197   /node_id/29800e10-53fc-43f7-bba6-27c22ab3a483

Second, we need to get the external service added for this node:

root@client1:~# pmm-admin inventory add service external --name=procfstest-node --node-id=/node_id/29800e10-53fc-43f7-bba6-27c22ab3a483
External Service added.
Service ID     : /service_id/c477453f-29fb-41e1-aa64-84ee51c38cd8
Service name   : procfstest-node
Node ID        : /node_id/29800e10-53fc-43f7-bba6-27c22ab3a483
Environment    :
Cluster name   :
Replication set:
Custom labels  : map[]
Group          : external

Note: The Node ID we use here is the Node ID of the remote node we are monitoring.

This creates the external service, but it is really an orphan at this point — there is no agent to supply the data.

root@client1:~# pmm-admin inventory add agent external --runs-on-node-id=/node_id/9ba48cd4-a7c2-43f6-9fa6-9571d1f0aebf --service-id=/service_id/c477453f-29fb-41e1-aa64-84ee51c38cd8 --listen-port=9100
External Exporter added.
Agent ID              : /agent_id/93e3856a-6d74-4f62-8e8d-821f7de73977
Runs on node ID       : /node_id/9ba48cd4-a7c2-43f6-9fa6-9571d1f0aebf
Service ID            : /service_id/c477453f-29fb-41e1-aa64-84ee51c38cd8
Username              :
Scheme                : http
Metrics path          : /metrics
Listen port           : 9100

Disabled              : false
Custom labels         : map[]

This command specifies that the external service we created for our remote node will get its data from the agent running on the other node, on the port given by the listen-port option. This is the port our ProcFS-enabled Node Exporter is using.
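
Before looking in PMM, you can sanity-check that the patched Node Exporter is actually serving data on its listen port. Run this on the node where the exporter container is running:

# Node Exporter metric names start with "node_"; a non-zero count means it works
curl -s http://127.0.0.1:9100/metrics | grep -c '^node_'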

After you’ve done these steps, you should see OS data for the remote host appear on the home dashboard.

PMM Remote Host Percona

And even more important, you’ll have OS metrics populated in the Node Summary dashboard.

Node Summary dashboard PMM

Summary

While this is much harder than it has to be, I think it serves as a great proof of concept (POC) of what is possible with the ProcFS UDF MySQL plugin — getting the full power of the operating system and MySQL observability without requiring any shell access to MySQL.

I think this can be extremely valuable for MySQL provided as a Database as a Service (DBaaS), as well as for enterprises practicing great separation between their teams responsible for operating system and database operations!

Learn more about ProcFS UDF

Sep 02, 2022
--

PMM v2.30: New Dashboards, Improved DBaaS Experience, Free K8s Cluster for Testing, and More!

Percona Monitoring and Management v2.30

In this v2.30 release, we focused on improving Percona Monitoring and Management (PMM) usability, refining the dashboards (for effortless root cause analysis), Database as a Service (DBaaS) functionality, and seamless onboarding to align with your needs. Your valuable feedback, along with your contributions, is important to us. Please keep contributing to take PMM to the next level and make the world more open source.

Note that you can get started with PMM in minutes with the PMM Demo to check out the latest version of PMM V2.30.

Here are some of the highlights in PMM V2.30:

New experimental dashboards

We created new experimental dashboards based on your feedback and requests. The new Home Dashboard was designed to provide actionable insights when monitoring numerous nodes or services, without excessive loading time. Our objective with the all-new MongoDB dashboards is to provide collection- and document-based metrics, as requested by PMM users.

Important: These experimental dashboards are subject to change. It is recommended to use these dashboards for testing purposes only.

New Home Dashboard (experimental)

The redesigned and improved Home Dashboard was introduced as a more user-friendly and insightful experimental dashboard in PMM 2.30.0. The driver behind the new design was your requests and feedback about the long loading times of the existing Home dashboard when monitoring a large number of nodes or services. Repeating rows of panels for each node could take considerable time to load, which some users reported as a performance problem. So we reworked the Home dashboard to boost performance when monitoring at scale; the new experimental dashboard ensures fast retrieval of data. For more information, read more on the experimental Home Dashboard metrics.

New Home Dashboard (experimental)

Please do not hesitate to give feedback by posting your comments here.

Tip: If you’d like to use the experimental Home dashboard as your default dashboard, follow these steps:
  • Open the new Home Dashboard and star it
  • Open the “Configuration” page by clicking it from the User Profile on the main menu
  • Select the new dashboard from the Home Dashboard dropdown under Preferences

New MongoDB Dashboard (experimental)

After dozens of calls with PMM users who monitor MongoDB metrics in PMM, we collected all the feedback and put it together in new experimental dashboards. (Special thanks to Kimberly Wilkins and Michael Patrick from Percona.) We have introduced the following experimental dashboards to collect more metrics and data for MongoDB and present them in a more structured way:

1. Experimental MongoDB collection overview

This dashboard contains panels for data about the hottest collections in the MongoDB database. For more information, see the documentation.

Experimental MongoDB collection overview

2. Experimental MongoDB collection details

This experimental dashboard provides detailed information about the top collections by document count, size, and document read for MongoDB databases. For more information, see the documentation.

Experimental MongoDB collection details

3. Experimental MongoDB oplog details

This real-time dashboard contains oplog details such as Recovery Window, Processing Time, Buffer Capacity, and Oplog Operations. For more information, see the documentation.

oplog details dashboard

Tip: If you’d like to move the MongoDB experimental dashboards to the MongoDB folder, or to other folders you use internally to gather all MongoDB dashboards, follow these steps:
Note: You need at least an Editor role to make this change.
  • Open the dashboard that you want to move to another folder
  • Click the gear icon to open Dashboard Settings
  • In the General section, select the destination folder from the Folder dropdown
  • Click the “Save Dashboard” button on the left to save the change

Improved user experience for DBaaS

Usability is one of the themes on our roadmap, and it is now more important than ever for us to deliver the best PMM experience in minutes. So we made a couple of improvements to the configuration and deployment of DBaaS database clusters:

  • In the 2.30.0 release of PMM, we simplified the process of registering the K8s cluster and populated all possible defaults on the database creation screen.
  • If you have the DBaaS feature configured with a K8s cluster registered in it, creating a database becomes a one-click action. For more information, see the documentation.
  • We have simplified the DBaaS and Percona Platform connection configuration by suggesting and automatically setting the public address of the PMM server if it’s not set up. This happens when connecting to Percona Platform, adding a K8s cluster, or adding a database cluster.
  • If you are not a K8s user yet but want to test the DBaaS functionality of PMM, Percona offers a cluster for free via the Percona Platform portal. Read more in the Private DBaaS with Free Kubernetes Cluster blog post.

Operators support

PMM now supports Percona Operator for MySQL based on Percona XtraDB Cluster.

What’s next?

We’re working on a new main menu on top of the Grafana 9.1 menu structure, and on Vacuum Monitoring with a new experimental dashboard. If you have any feedback or suggestions, please get in touch with us.

Also, another topic on our next release agenda is Podman GA. Please share your experience/feedback with Podman. 

Run PMM Server with Podman now!

We know you’re waiting for news about Alerting in PMM, and we’ll get back to you with good news in v2.31, combining our Integrated Alerting with Grafana Alerting to make alerts easier to set up and manage. Please follow our emails and release notes to stay informed about upcoming releases.

Thanks to the Community and Perconians

We love our Community and team at Percona, who help us shape PMM and make it better!

Thank you for your collaboration on the new Home Dashboard:

  • Daniel Burgos
  • Anton Bystrov

Thank you for your collaboration on MongoDB dashboards:

  • Anton Bystrov
  • Kimberly Wilkins
  • Michael Patrick

We appreciate your feedback!

Learn more about Percona Monitoring and Management

Aug 23, 2022
--

Installing PMM Server from RHEL Repositories

Installing PMM Server from RHEL Repositories

We currently provide multiple ways to install Percona Monitoring and Management (PMM) Server, the primary one being Docker:

Install Percona Monitoring and Management

or Podman:

Podman – Percona Monitoring and Management
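
For reference, the Docker route typically boils down to a couple of commands. Here is a minimal sketch, assuming the standard percona/pmm-server:2 image and a named volume for the data; see the installation documentation for the full, current procedure:

# Create a persistent volume for PMM data and start the server on port 443
docker volume create pmm-data
docker run -d --restart always --name pmm-server \
  -p 443:443 -v pmm-data:/srv percona/pmm-server:2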

We implemented it this way to simplify deployments, as the PMM Server uses multiple components such as Nginx, Grafana, PostgreSQL, ClickHouse, and VictoriaMetrics. We want to spare you the pain of configuring each component individually, which is labor-intensive and error-prone.

For this reason, we also provide bundles as a VirtualBox virtual appliance:

Virtual Appliance – Percona Monitoring and Management

or as Amazon AMI:

AWS Marketplace – Percona Monitoring and Management

Recently, for even simpler, effort-free deployments, we partnered with HostedPMM, who will install and manage the PMM Server for you with a one-click experience.

However, we still see some interest in having the PMM Server installed from repositories using, for example, rpm packages. For this reason, we have crafted Ansible scripts that you can use on a RedHat 7-compatible system.

The scripts are located here:

Percona-Lab/install-repo-pmm-server (github.com)

Please note these scripts are ONLY for EXPERIMENTATION and quick evaluation of PMM Server; this installation method IS NOT SUPPORTED by Percona.

Note that you should start with an empty RedHat 7 system, without any database or web-server packages installed. The commands to install PMM Server:

# Fetch the Ansible scripts
git clone https://github.com/Percona-Lab/install-repo-pmm-server
cd install-repo-pmm-server/ansible

# Run the main installation playbook
ansible-playbook -v -i 'localhost,' -c local pmm2-docker/main.yml

# Run the update playbook to finish setting up the PMM components
ansible-playbook -v -i 'localhost,' -c local /usr/share/pmm-update/ansible/playbook/tasks/update.yml

Let us know if you are interested in this way of deploying the PMM Server!

Jul 13, 2022
--

How Percona Monitoring and Management Helps You Find Out Why Your MySQL Server Is Stalling

Percona Monitoring and Management MySQL Server Is Stalling

In this blog, I will demonstrate how to use Percona Monitoring and Management (PMM) to find out why your MySQL server is stalling. I will use only one typical MySQL stall scenario in this example, but the same dashboards, graphs, and principles will help you in other cases.

Nobody wants it, but database servers may stop handling connections at some point. As a result, the application will slow down and then stop responding.

It is always better to know about the stall from a monitoring instrument rather than from your own customers.

PMM is a great help in this case. If you look at its graphs and notice that many of them have started showing unusual behavior, you need to react. In the case of a stall, you will see that some activity either dropped to zero or climbed to unusually high numbers, and in both cases it then stays flat.

Let’s review the dashboard “MySQL Instance Summary” and its graph “MySQL Client Thread Activity” during normal operation:

PMM MySQL Instance Summary

As you can see, the number of active threads fluctuates, and this is normal for any healthy application: even when all connections request data, MySQL puts some threads into an idle state while they wait for the storage engine to prepare the data, or while the client application processes the retrieved data.

The next screenshot was taken when the server was stalling:

Percona monitoring dashboard

In this picture, you see that the number of active threads is near the maximum. At the same time, the number of “MySQL Temporary Objects” dropped to zero. This by itself shows that something unusual happened, but to understand the picture better, let’s examine the storage engine graphs.

Like most MySQL users, I used InnoDB for this example. Therefore, the next step in figuring out what is going on is to examine the graphs in the “MySQL InnoDB Details” dashboard.

MySQL InnoDB Details

First, we see that the number of rows InnoDB reads per second went down to zero, as did the number of rows written. This means something is preventing InnoDB from performing its operations.

MySQL InnoDB Details PMM

More importantly, we see that all I/O operations have stopped. This is unusual even on a server that does not handle any user connections: InnoDB always performs background operations and is never completely idle.

Percona InnoDB

You may see this in the “InnoDB Logging Performance” graph: InnoDB still uses log files but only for background operations.

InnoDB Logging Performance

InnoDB Buffer Pool activity has also stopped. What is interesting here is that the number of dirty pages went down to zero, visible in the “InnoDB Buffer Pool Data” graph, where dirty pages are colored yellow. This shows that InnoDB was able to flush all dirty pages from the buffer pool when it stopped processing user queries.

At this point we can make the first conclusion that our stall was caused by some external lock, preventing MySQL and InnoDB from handling user requests.

MySQL and InnoDB

The “Transaction History” graph confirms this guess: there are no new transactions and InnoDB was able to flush all transactions that were waiting in the queue before the stall happened.

We can conclude that we are NOT experiencing hardware issues.

MySQL and InnoDB problems

This group shows why we are experiencing the stall. As you can see in the “InnoDB Row Lock Wait Time” graph, the wait time rose to its maximum value around 14:02, then dropped to zero. No row lock waits are registered during the stall itself.

This means that at some point all InnoDB transactions were waiting for a row lock, then failed with a timeout. Still, they had to be waiting for something. Since there are no hardware issues and InnoDB functions healthily in the background, this means all threads were waiting for a global MDL lock, created by the server.

If we have Query Analytics (QAN) enabled we can find such a command easily.

Query Analytics (QAN)

In the selected time frame, we can see that many queries were running until a certain time, when the query with id 2 was issued; the other queries then stopped running and restarted a few minutes later. The query with id 2 is FLUSH TABLES WITH READ LOCK, which prevents any write activity once the tables are flushed.

This is the command that caused a full server stall.
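
If you want to see this behavior for yourself, here is a quick way to reproduce it on a test server (never in production). The SELECT SLEEP(600) keeps the session, and therefore the global read lock, alive:

# Session 1: take the global read lock and hold it for 10 minutes
mysql -e "FLUSH TABLES WITH READ LOCK; SELECT SLEEP(600);" &

# Session 2: any write now blocks; blocked threads appear in the processlist
# in the "Waiting for global read lock" state
mysql -e "SHOW PROCESSLIST\G" | grep -B 3 'global read lock'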

Once we know the reason for the stall we can perform actions to prevent similar issues in the future.

Conclusion

PMM is a great tool that helps you not only identify that your database server is stalling but also figure out the reason for the stall. I used only one scenario in this blog, but you can use the same dashboards and graphs to find other causes, such as a DoS attack, a hardware failure, a high number of I/O operations caused by poorly optimized queries, and many more.

Jul 07, 2022
--

Automate the SSL Certificate Lifecycle of your Percona Monitoring and Management Server

SSL Certificate Lifecycle of your Percona Monitoring and Management Server

We highly value security here at Percona, and in this blog post, we will show how to protect your Percona Monitoring and Management (PMM) Server with an SSL certificate and automate its lifecycle by leveraging a proxy server.

Introduction

As you may know, PMM Server provides a self-signed SSL certificate out-of-the-box to encrypt traffic between the client (web or CLI) and the server. While some people choose to use this certificate in non-critical environments, oftentimes protected by a private network, it is definitely not the best security practice for production environments.

I have to mention that a self-signed certificate still achieves the goal of encrypting the connection. Here are a few things, or problems, you should know about when it comes to self-signed certificates in general:

  • they cannot be verified by a trusted Certificate Authority (CA)
  • they cannot be revoked as a result of a security incident (if issued by a trusted CA, they can be revoked)
  • when used on public websites, they may negatively impact your brand or personal reputation

This is why most modern browsers will show pretty unappealing security warnings when they detect a self-signed certificate. Most of them will prevent users from accidentally opening websites that do not feature a secure connection (have you heard of the thisisunsafe hack for Chrome?). The browser vendors are obviously attempting to raise security awareness amongst the broad internet community.

We highly encourage our users to have their SSL certificates issued by a trusted Certificate Authority.

Some people find it overwhelming to keep track of expiry dates and remember to renew certificates before they expire. Until several years ago, SSL certificates had to be paid for and were quite expensive. Projects like Let’s Encrypt, a non-profit Certificate Authority, were devised to popularize web security by offering SSL certificates free of charge. However, their validity is limited to three months, which is quite short.

We will explore the two most popular reverse proxy tools which allow you to leverage an SSL certificate for better security of your PMM instance. More importantly, we will cover how to automate the certificate renewal using those tools. While PMM Server is distributed in three flavors — docker, AMI, and OVF — we will focus on docker being our most popular distribution.

All scripts in this post assume you have a basic familiarity and experience of working with docker and docker compose.

Reverse proxies

While the choice of open source tools is quite abundant these days, we will talk about the two that I consider the most popular: nginx and traefik.

In order to try one of the solutions proposed here, you’ll need the following:

  • a server with docker (and docker compose) installed
  • a public domain name you own, pointed at that server
  • ports 80 and 443 open and publicly accessible

Let’s take a look at our network diagram. It shows that the proxy server is standing between PMM Server and its clients. This means that the proxy, not PMM Server, takes care of terminating SSL and encrypting traffic.

PMM Server and Client Network Diagram

 

Nginx

Nginx came to the market in 2004, which is quite a solid age for any software product. It has gained tremendous adoption since then, and to this day it powers many websites as a reverse proxy. Let’s see what it takes to use Nginx for SSL certificate management.

First, let me remind you where PMM Server stores the self-signed certificates:

% docker exec -t pmm-server sh -c "ls -l /srv/nginx/*.{crt,key,pem}"
total 24
-rw-r--r-- 1 root root 6016 Jun 10 11:24 ca-certs.pem
-rw-r--r-- 1 root root  977 Jun 10 11:24 certificate.crt
-rw-r--r-- 1 root root 1704 Jun 10 11:24 certificate.key
-rw-r--r-- 1 root root  424 Jun 10 11:24 dhparam.pem

We could certainly supply our own certificates by mounting a host directory containing custom certificates or by copying the certificates into the container. However, that would require us to issue the certificates first, and what we want is to get the certificates issued automatically. Even though nginx does not offer such functionality out of the box, a few open source projects effectively close that gap.

One such project is nginx-proxy. It is quite mature and stable, boasting 16K GitHub stars. It has a peer project, acme-companion, which takes care of the certificate lifecycle. We will use both, so we will ultimately run three separate containers:

  1. nginx-proxy – the reverse proxy
  2. acme-companion – certificate management component
  3. pmm-server – the proxied container

Prior to launching the containers, the following steps need to be completed:

  1. Choose a root directory for your project. This will be your project root folder.
  2. Create a directory ./nginx in your project root (docker will take care of creating the nested directories).
  3. Create a file ./nginx/proxy.conf. This file is mostly needed to override the default value of client_max_body_size, which happens to be too low, so PMM can properly handle large payloads. The file’s contents should be as follows:
# proxy.conf

server_tokens off;
client_max_body_size 10m;

# HTTP 1.1 support
proxy_http_version 1.1;
proxy_buffering off;
proxy_set_header Host $http_host;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $proxy_connection;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $proxy_x_forwarded_proto;
proxy_set_header X-Forwarded-Ssl $proxy_x_forwarded_ssl;
proxy_set_header X-Forwarded-Port $proxy_x_forwarded_port;

# Mitigate httpoxy attack
proxy_set_header Proxy "";

Note that the most recent versions of the docker client come with the docker compose sub-command, while earlier versions may require you to additionally install the Python-based docker-compose tool.

Here is what docker-compose.yml looks like. Go ahead and create it, making sure to replace my-domain.org with your own public domain name.

# docker-compose.yml

version: '3'

services:

  nginx:
    image: nginxproxy/nginx-proxy
    container_name: nginx
    environment:
      - DEFAULT_HOST=pmm.my-domain.org
    ports:
      - '80:80'
      - '443:443'
    restart: always
    volumes:
      - ./nginx/proxy.conf:/etc/nginx/proxy.conf:ro
      - ./nginx/vhost.d:/etc/nginx/vhost.d:ro
      - ./nginx/certs:/etc/nginx/certs:ro
      - ./nginx/html:/usr/share/nginx/html
      - ./nginx/dhparam:/etc/nginx/dhparam
      - /var/run/docker.sock:/tmp/docker.sock:ro

  letsencrypt:
    image: nginxproxy/acme-companion
    container_name: acme-companion
    restart: always
    environment:
      - NGINX_PROXY_CONTAINER=nginx
      - DEFAULT_EMAIL=admin@my-domain.org
    depends_on:
      - nginx
    volumes:
      - ./nginx/vhost.d:/etc/nginx/vhost.d
      - ./nginx/certs:/etc/nginx/certs
      - ./nginx/html:/usr/share/nginx/html
      - /var/run/docker.sock:/var/run/docker.sock:ro

  pmm-server:
    container_name: pmm-server
    image: percona/pmm-server:2
    restart: unless-stopped
    environment:
      - VIRTUAL_HOST=pmm.my-domain.org
      - LETSENCRYPT_HOST=pmm.my-domain.org
      - VIRTUAL_PORT=80
      - DISABLE_TELEMETRY=0
      - ENABLE_DBAAS=1
      - ENABLE_BACKUP_MANAGEMENT=1
      - ENABLE_ALERTING=1
      - GF_ANALYTICS_CHECK_FOR_UPDATES=false
      - GF_AUTH_LOGIN_COOKIE_NAME=pmm_session
      - GF_SECURITY_DISABLE_GRAVATAR=true
    depends_on:
      - nginx
    volumes:
      - pmm-data:/srv
    expose:
      - '80'

volumes:
  pmm-data:
    name: pmm-data
    external: false

Now launch your site by running docker compose up -d. That’s it! If everything works fine, you should see your PMM Server’s login page by visiting https://pmm.my-domain.org. If you type docker ps -a in the terminal, you will see all three containers listed:

% docker ps -a
CONTAINER ID   IMAGE                       COMMAND                  CREATED       STATUS                 PORTS                                                                      NAMES
07979480350c   percona/pmm-server:2        "/opt/entrypoint.sh"     1 hours ago   Up 1 hours (healthy)   80/tcp, 443/tcp                                                            pmm-server
b7ed4cbf1064   nginxproxy/acme-companion   "/bin/bash /app/entr…"   1 hours ago   Up 1 hours                                                                                        acme-companion
d35da441c103   nginxproxy/nginx-proxy      "/app/docker-entrypo…"   1 hours ago   Up 1 hours             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   nginx

Should anything go wrong, you can troubleshoot by checking the container logs: docker logs pmm-server or docker logs acme-companion may be useful to explore.
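
Once the certificate has been issued, you can confirm what the proxy is actually serving. A quick check with openssl (assuming it is installed locally) prints the issuer and validity dates, which should now point at Let’s Encrypt rather than the self-signed default:

# Inspect the live certificate for your domain
echo | openssl s_client -connect pmm.my-domain.org:443 -servername pmm.my-domain.org 2>/dev/null \
  | openssl x509 -noout -issuer -dates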

Restrict access to PMM

Sometimes you need to restrict access to known IPs or networks. Thankfully, nginx-proxy allows you to configure that easily. All you need to do is:

  • create a text file with the name of your domain — for example, pmm.my-domain.org — and save it to ./nginx/vhost.d/
    # Contents of pmm.my-domain.org
    
    # Allow traffic from known IP addresses or networks
    allow 127.0.0.0/8;  # internal network
    allow 100.1.1.0/20; # another network   
    allow 219.24.87.93; # just one IP address
    
    # Reject traffic from all other IPs and networks
    deny all;
  • finally, restart the container with docker restart pmm-server.

From now on, if someone tries to access your PMM Server from an IP address that is not in the allow list, they should get the error HTTP 403 Forbidden.
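
You can verify the restriction from a machine that is not on the allow list:

# Expect 403 from a non-allowed IP, and 200 (or a redirect) from an allowed one
curl -s -o /dev/null -w '%{http_code}\n' https://pmm.my-domain.org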

Test the connection security

While protecting your connection with an SSL certificate is a good idea, there are a whole lot of other security concerns that the certificate alone does not address. There are a few useful web services out there that allow you to not only verify your certificate but also perform a number of other vital checks to test the connection, such as sufficient cipher strength, use of outdated security protocols, vulnerability to known SSL attacks, etc.

Qualys is one such service, and all it takes to review the certificate is to go to their web page, enter the domain name of your site, and let them do the rest. After a few minutes you should see a scan report like this:

Voilà! We have achieved a very high rating of connection security without really spending much time! That’s the magic of the tool.

Prior to using the Nginx proxy, I could never achieve the same result on the first attempt; the best score I could get was a “B”. It took me time to google what every warning in the scan report meant, and even more time to find a proper solution and apply it to my configuration.

To wrap it up: if you keep your version of Nginx and its companion updated, it will save you a ton of time and also keep your TLS compliant with the highest security standards. And yes, certificate rotation before expiry is guaranteed.

Traefik

Traefik is a more recent product. It was first released in 2016 and has gained substantial momentum since then.

Traefik is super popular as a proxy server and load balancer for docker and Kubernetes alike. Many claim Traefik to be more powerful and flexible than Nginx, but also a bit more difficult to set up. Unlike Nginx, Traefik can manage the certificate lifecycle without additional tools, which is why we will launch only two containers: one for PMM Server and one for the Traefik proxy.

Basic configuration

To get started with a very minimal Traefik configuration please follow the steps below.

  • choose a root directory for your project
  • create a basic docker-compose.yml in the root folder with the following contents:
    version: '3'
    
    services:
    
      traefik:
        image: traefik:v2.7
        container_name: traefik
        restart: always
        command:
          - '--log.level=DEBUG'
          - '--providers.docker=true'
          - '--providers.docker.exposedbydefault=false'
          - '--entrypoints.web.address=:80'
          - '--entrypoints.websecure.address=:443'
          - '--certificatesresolvers.le.acme.httpchallenge=true'
          - '--certificatesresolvers.le.acme.httpchallenge.entrypoint=web'
          - '--certificatesresolvers.le.acme.email=postmaster@my-domain.org'
          - '--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json'
        ports:
          - '80:80'
          - '443:443'
        volumes:
          - ./letsencrypt:/letsencrypt
          - /var/run/docker.sock:/var/run/docker.sock:ro
        networks:
          - www
    
      pmm-server:
        image: percona/pmm-server:2
        container_name: pmm-server
        restart: always
        environment:
          - DISABLE_TELEMETRY=0
          - ENABLE_DBAAS=1
          - ENABLE_BACKUP_MANAGEMENT=1
          - ENABLE_ALERTING=1
          - GF_ANALYTICS_CHECK_FOR_UPDATES=false
          - GF_AUTH_LOGIN_COOKIE_NAME=pmm_session
          - GF_SECURITY_DISABLE_GRAVATAR=true
        volumes:
          - pmm-data:/srv
        expose:
          - '80'
        labels:
          - 'traefik.enable=true'
          - 'traefik.http.routers.pmm.rule=Host(`pmm.my-domain.org`)'
          - 'traefik.http.routers.pmm.entrypoints=websecure'
          - 'traefik.http.routers.pmm.tls.certresolver=le'
        networks:
          - www
    
    volumes:
      pmm-data:
        name: pmm-data
        external: false
    
    networks:
      www:
        name: www
        external: false
  • replace my-domain.org with the domain name you own
  • make sure ports 80 and 443 are open on your server and accessible publicly
  • create a directory letsencrypt in the root of your project — it will be mounted to the container so Traefik can save the certificates

Once you launch the project with docker compose up -d, you should be able to go to https://pmm.my-domain.org and see the login page of your PMM Server.

If you run docker ps -a, you will see two containers in the output:

% docker ps -a
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                    PORTS                                                                      NAMES
4a2746c26738   traefik:v2.7           "/entrypoint.sh --pr…"   52 seconds ago   Up 48 seconds             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   traefik
27943a8edfef   percona/pmm-server:2   "/opt/entrypoint.sh"     52 seconds ago   Up 48 seconds (healthy)   80/tcp, 443/tcp                                                            pmm-server

If you face any issues, you can troubleshoot by checking the logs of the Traefik container with docker logs traefik. Once everything works, we advise you to decrease the log verbosity by setting the log.level parameter to INFO, since DEBUG logging carries a performance cost.
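That is a one-line change in the command section of the Traefik service:

- '--log.level=INFO'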

Advanced configuration

A closer look at the basic configuration reveals a few problems, in particular:

  • Visiting http://pmm.my-domain.org results in HTTP 404 Not Found, i.e. no redirect from HTTP to HTTPS is configured by default.
  • The Qualys SSL Labs report rates our configuration with a “B”, which is insufficient for production environments. It highlights three issues:
    • the server supports the outdated, insecure TLS protocols 1.0 and 1.1
      Proxy configuration - B-flat report
    • the server accepts quite a few weak ciphers
      Proxy configuration - weak ciphers
    • security headers are absent from the proxy response

We will address these issues one by one.

  1. Redirect HTTP to HTTPS
    Traefik handles this with its entryPoint redirection options. The redirect is enabled by passing the following parameters to Traefik:

    --entrypoints.web.http.redirections.entryPoint.to=websecure
    --entrypoints.web.http.redirections.entryPoint.scheme=https
  2. Avoid using outdated protocols TLS 1.0 and 1.1
  3. Avoid using weak ciphers

    As both of these issues relate to TLS, why not solve them together? Traefik offers two types of configuration: static and dynamic. The number of options needed to address the ciphers can get quite big, which is why we will use the dynamic configuration, i.e. put everything in a file rather than inline in our docker-compose.yml. Let's create a file called dynamic.yml and place it in the project root:

    tls:
      options:
        default:
          minVersion: VersionTLS12
          sniStrict: true
          preferServerCipherSuites: true
          cipherSuites:
            - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
            - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
            - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
            - TLS_CHACHA20_POLY1305_SHA256
            - TLS_AES_256_GCM_SHA384
            - TLS_AES_128_GCM_SHA256
          curvePreferences:
            - secp521r1
            - secp384r1

    As you can see, we limit the TLS version to only the most secure ones, and the same applies to the cipher suites and curve preferences. The file name can be anything you like, but we need to point Traefik to the file containing the dynamic configuration. To do this, we leverage the file provider parameters (note that dynamic.yml must also be mounted into the Traefik container, as shown in the volumes section of the final compose file below):

    - '--providers.file=true'
    - '--providers.file.filename=/dynamic.yml'
  4. Add security headers to the proxy response

    The additional security headers are defined as Docker labels on the pmm-server service, which Traefik's Docker provider picks up as dynamic configuration:

    - 'traefik.http.routers.pmm.middlewares=secureheaders@docker'
    - 'traefik.http.middlewares.secureheaders.headers.forceSTSHeader=true'
    - 'traefik.http.middlewares.secureheaders.headers.stsPreload=true'
    - 'traefik.http.middlewares.secureheaders.headers.stsIncludeSubdomains=true'
    - 'traefik.http.middlewares.secureheaders.headers.stsSeconds=315360000'
    - 'traefik.http.middlewares.secureheaders.headers.contentTypeNosniff=true'
    - 'traefik.http.middlewares.secureheaders.headers.browserXssFilter=true'
    - 'traefik.http.middlewares.secureheaders.headers.frameDeny=true'


That should be it for Traefik. We have successfully addressed all of the security concerns, as confirmed by a fresh SSL report.

Proxy configuration - A-plus superior report
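If you want to verify the protocol hardening locally as well, openssl's s_client offers a quick check (a sketch; it assumes your local openssl build still supports negotiating the legacy protocol versions):

% openssl s_client -connect pmm.my-domain.org:443 -tls1_1

The handshake above should now fail, while a TLS 1.2 connection should succeed:

% openssl s_client -connect pmm.my-domain.org:443 -tls1_2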

For your convenience, here is the final version of the yml file.

docker-compose.yml

version: '3'

services:
  traefik:
    image: traefik:v2.7
    container_name: traefik
    hostname: traefik
    restart: always
    command:
      - '--log.level=DEBUG'
      - '--providers.docker=true'
      - '--providers.docker.exposedbydefault=false'
      - '--providers.file=true'
      - '--providers.file.filename=/dynamic.yml'
      - '--entrypoints.web.address=:80'
      - '--entrypoints.web.http.redirections.entryPoint.to=websecure'
      - '--entrypoints.web.http.redirections.entryPoint.scheme=https'
      - '--entrypoints.websecure.address=:443'
      - '--certificatesresolvers.le.acme.httpchallenge=true'
      - '--certificatesresolvers.le.acme.httpchallenge.entrypoint=web'
      - '--certificatesresolvers.le.acme.email=postmaster@my-domain.org'
      - '--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json'
    ports:
      - '80:80'
      - '443:443'
    volumes:
      - ./letsencrypt:/letsencrypt
      - ./dynamic.yml:/dynamic.yml
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - www

  pmm-server:
    image: percona/pmm-server:2
    container_name: pmm-server
    restart: always
    environment:
      - DISABLE_TELEMETRY=0
      - ENABLE_DBAAS=1
      - ENABLE_BACKUP_MANAGEMENT=1
      - ENABLE_ALERTING=1
      - GF_ANALYTICS_CHECK_FOR_UPDATES=false
      - GF_AUTH_LOGIN_COOKIE_NAME=pmm_session
      - GF_SECURITY_DISABLE_GRAVATAR=true
    volumes:
      - pmm-data:/srv
    expose:
      - '80'
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.pmm.rule=Host(`pmm.my-domain.org`)'
      - 'traefik.http.routers.pmm.entrypoints=websecure'
      - 'traefik.http.routers.pmm.tls.certresolver=le'
      - 'traefik.http.routers.pmm.middlewares=secureheaders@docker'
      - 'traefik.http.middlewares.secureheaders.headers.forceSTSHeader=true'
      - 'traefik.http.middlewares.secureheaders.headers.stsPreload=true'
      - 'traefik.http.middlewares.secureheaders.headers.stsIncludeSubdomains=true'
      - 'traefik.http.middlewares.secureheaders.headers.stsSeconds=315360000'
      - 'traefik.http.middlewares.secureheaders.headers.contentTypeNosniff=true'
      - 'traefik.http.middlewares.secureheaders.headers.browserXssFilter=true'
      - 'traefik.http.middlewares.secureheaders.headers.frameDeny=true'
    networks:
      - www

volumes:
  pmm-data:
    name: pmm-data
    external: false

networks:
  www:
    name: www
    external: false

Some security aspects

Docker socket

I’m sure you noticed the following section in docker-compose.yml:

volumes:
  - '/var/run/docker.sock:/var/run/docker.sock:ro'

You may say it's not exactly secure to pass the Docker socket to one or more containers, right? True, but this is a hard requirement if we want to automate things. Both the proxy server and its companion do a lot under the hood to make that happen, so they need access to the Docker runtime to listen for Docker events and react to them.

Consider this example: if you add a new service (or site, in our case) to the same or a different docker-compose.yml and launch it with docker compose up -d, the proxy detects this event and requests a new certificate from the Certificate Authority, or picks it up from storage if it has already been issued. Only then can it serve the certificate for your service. This kind of low-level control over the Docker runtime is what makes it possible for Nginx or Traefik to manage other containers; the mission would be impossible without the socket.
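For illustration, here is a minimal sketch of what such an additional service could look like in its own docker-compose.yml. The traefik/whoami image is a small demo service, and whoami.my-domain.org is a placeholder host; both are illustrative assumptions. Once launched with docker compose up -d, Traefik should detect the new container and obtain a certificate for it automatically:

version: '3'

services:
  whoami:
    image: traefik/whoami
    container_name: whoami
    restart: always
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.whoami.rule=Host(`whoami.my-domain.org`)'
      - 'traefik.http.routers.whoami.entrypoints=websecure'
      - 'traefik.http.routers.whoami.tls.certresolver=le'
    networks:
      - www

networks:
  www:
    # the network already exists, created by the main compose project
    external: true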

Oftentimes, when developing apps or services, engineering teams put all their services (backend, frontend, and database) in one docker-compose.yml, which is super convenient as it can be shared within the team. However, you will certainly want to isolate prod from non-prod environments. For instance, if your database server is also deployed in a Docker container, it should live in a different, much more hardened environment than this one.

The use of port 80

It may seem odd to mention port 80, the standard port for unencrypted HTTP, in a blog post about security. Nonetheless, both Nginx and Traefik rely on it being open on the server end. This is because we used the so-called HTTP challenge (read more about challenge types here), which is supported by most proxies and is the easiest to implement. When the CA processes a request to issue a certificate, it needs to verify your domain ownership. It does so by querying a special .well-known endpoint on your site, which must be served on port 80. On success, it proceeds to issue the certificate.
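Schematically, the CA's validation request looks like the sketch below; the token is a one-time value generated during the ACME exchange (the placeholder here is purely illustrative), and the proxy answers the request itself:

% curl http://pmm.my-domain.org/.well-known/acme-challenge/<TOKEN>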

Both proxy servers provide quite convenient options for redirecting HTTP to HTTPS, which is also good for search engine optimization (SEO).

Conclusion

It is hard to disagree that automating the SSL certificate lifecycle, especially if you own or maintain more than a few public sites (or PMM servers), is a much more pleasant journey than doing it manually. Thankfully, the maturity and feature completeness of these open source proxies are impressive; they are used and trusted by many, which is why we recommend you try them out as well.

P.S. There is much more you can do with your configuration, e.g. add compression for static assets, harden security even further, issue a wildcard certificate, etc., but that would be too much for one post and is probably a good topic for the next one.

Have you tried some other proxies? How do they compare to Nginx or Traefik? We’d love to hear about your experiences, please share them with us via our community forums.

Jun
22
2022
--

Percona Monitoring and Management in Kubernetes is now in Tech Preview

Percona Monitoring and Management in Kubernetes

Over the years, we have seen growing interest in running databases and stateful workloads in Kubernetes. With Container Storage Interfaces (CSI) maturing and more and more Operators appearing, running stateful workloads on your favorite platform is not that scary anymore. Our Kubernetes story at Percona started with Operators for MySQL and MongoDB, with PostgreSQL added later on.

Percona Monitoring and Management (PMM) is an open source database monitoring, observability, and management tool. It can be deployed as a virtual appliance or a Docker container. Our customers have long asked us for a way to deploy PMM in Kubernetes. For a while, we had an unofficial helm chart created as a PoC by Percona teams and the community (GitHub).

We are introducing the Technical Preview of the helm chart, supported by Percona, to easily deploy PMM in Kubernetes. You can find it in our helm chart repository here.

Use cases

Single platform

If Kubernetes is your platform of choice, until now you needed a separate virtual machine to run Percona Monitoring and Management. No more, with the introduction of the helm chart.

As you know, Percona Operators all integrate with PMM, which enables monitoring for databases deployed on Kubernetes. The Operators configure and deploy the pmm-client sidecar container and register the nodes on a PMM server. Bringing PMM into Kubernetes simplifies this integration and the whole flow: the network traffic between pmm-client and the PMM server no longer leaves the Kubernetes cluster at all.

All you have to do is set the correct endpoint in the pmm section of the Custom Resource manifest. For example, for Percona Operator for MongoDB, the pmm section looks like this:

pmm:
  enabled: true
  image: percona/pmm-client:2.28.0
  serverHost: monitoring-service

Where monitoring-service is the Service created by the helm chart to expose the PMM server.

High availability

Percona Monitoring and Management has lots of moving parts: VictoriaMetrics to store time-series data, ClickHouse for Query Analytics, and PostgreSQL to keep the PMM configuration. Right now, all these components are part of a single container or virtual machine, with Grafana as the frontend. To provide zero-downtime deployment in any environment, we would need to decouple these components, which would substantially complicate the installation and management of PMM.

What we offer instead right now are ways to automatically recover PMM in case of failure within minutes (for example leveraging the EC2 self-healing mechanism).

Kubernetes is a container orchestration platform that automates many manual tasks for engineers. When you run software in Kubernetes, it is best to rely on its primitives to handle the heavy lifting. This is what PMM looks like in Kubernetes:

PMM in Kubernetes

  • StatefulSet controls the Pod creation
  • There is a single Pod with a single container with all the components in it
  • This Pod is exposed through a Service that is utilized by PMM Users and pmm-clients
  • All the data is stored on a persistent volume
  • ConfigMap has various environment variable settings that can help to fine-tune PMM

In case of a node or Pod failure, the StatefulSet recovers the PMM Pod automatically and remounts the Persistent Volume to it. The recovery time depends on cluster load and node availability, but in normal operating environments, the PMM Pod is up and running again within a minute.
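You can watch this self-healing in action by deleting the Pod and observing the StatefulSet recreate it (output omitted; pmm-0 is the Pod name created by the installation shown below):

kubectl delete pod pmm-0
kubectl get pods -w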

Deploy

Let’s see how PMM can be deployed in Kubernetes.

Add the helm chart:

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

Install PMM:

helm install pmm --set service.type="LoadBalancer" percona/pmm

You can now log in to PMM using the LoadBalancer IP address and the randomly generated password stored in the pmm-secret Secret object (the default user is admin).

The Service object created for PMM is called monitoring-service:

$ kubectl get services monitoring-service
NAME                 TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)         AGE
monitoring-service   LoadBalancer   10.68.29.40   108.59.80.108   443:32591/TCP   3m34s

$ kubectl get secrets pmm-secret -o yaml
apiVersion: v1
data:
  PMM_ADMIN_PASSWORD: LE5lSTx3IytrUWBmWEhFTQ==
…

$ echo 'LE5lSTx3IytrUWBmWEhFTQ==' | base64 --decode && echo
,NeI<w#+kQ`fXHEM

Log in to PMM by connecting to https://<YOUR_PUBLIC_IP>.

Customization

A helm chart is a template engine for YAML manifests, and it allows users to customize the deployment. You can see the various parameters you can set to fine-tune your PMM installation in our README.

For example, to choose another storage class and set the desired storage size, pass the following flags:

helm install pmm percona/pmm \
--set storage.storageClassName="premium-rwo" \
--set storage.size="20Gi"

You can also change these parameters in values.yaml and use the -f flag:

# values.yaml contents
storage:
  storageClassName: "premium-rwo"
  size: 20Gi


helm install pmm percona/pmm -f values.yaml

Maintenance

For most of the maintenance tasks, regular Kubernetes techniques would apply. Let’s review a couple of examples.

Compute scaling

It is possible to vertically scale PMM by adding or removing resources through the pmmResources variable in values.yaml:

pmmResources:
  requests:
    memory: "4Gi"
    cpu: "2"
  limits:
    memory: "8Gi"
    cpu: "4"

Once done, upgrade the deployment:

helm upgrade -f values.yaml pmm percona/pmm

This will restart the PMM Pod, so plan it carefully to avoid disrupting your team's work.

Storage scaling

This depends a lot on your storage interface's capabilities and the underlying implementation. In most clouds, Persistent Volumes can be expanded. You can check whether your storage class supports expansion by describing it:

kubectl describe storageclass standard
…
AllowVolumeExpansion:  True

Unfortunately, just changing the size of the storage in values.yaml (storage.size) will not do the trick, and you will see the following error:

helm upgrade -f values.yaml pmm percona/pmm
Error: UPGRADE FAILED: cannot patch "pmm" with kind StatefulSet: StatefulSet.apps "pmm" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

We use a StatefulSet object to deploy PMM, and StatefulSets are mostly immutable; only a handful of fields can be changed on the fly. There is a trick, though.

First, delete the StatefulSet, but keep the Pods and PVCs:

kubectl delete sts pmm --cascade=orphan

Recreate it again with the new storage configuration:

helm upgrade -f values.yaml pmm percona/pmm

It will recreate the StatefulSet with the new storage size configuration. 

Now edit the Persistent Volume Claim manually and change the storage size (the name of the PVC may be different for you). You need to change the size in the spec.resources.requests.storage section:

kubectl edit pvc pmm-storage-pmm-0
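Alternatively, kubectl patch does the same without an interactive editor; this sketch uses the 20Gi size and the PVC name from this example:

kubectl patch pvc pmm-storage-pmm-0 -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'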

The PVC is not resized yet and you can see the following message when you describe it:

kubectl describe pvc pmm-storage-pmm-0
…
Conditions:
  Type                      Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----                      ------  -----------------                 ------------------                ------  -------
  FileSystemResizePending   True    Mon, 01 Jan 0001 00:00:00 +0000   Thu, 16 Jun 2022 11:55:56 +0300           Waiting for user to (re-)start a pod to finish file system resize of volume on node.

The last step would be to restart the Pod:

kubectl delete pod pmm-0

Upgrade

Running helm upgrade is the recommended way to upgrade, either when a new helm chart is released or when you want to move to a newer version of PMM by replacing the tag in the image section.
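For example, a sketch of such an upgrade, assuming the chart exposes the image tag via an image.tag value (check the chart's README for the exact parameter name); --reuse-values keeps the values you set during installation:

helm upgrade pmm percona/pmm --set image.tag="2.28.0" --reuse-values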

Backup and restore

PMM stores all its data on a Persistent Volume. As said before, regular Kubernetes techniques can be applied here to back up and restore the data. There are numerous options:

  • Volume Snapshots – check if they are supported by your CSI driver and storage implementation (a minimal manifest sketch follows this list)
  • Third-party tools, like Velero, can handle the backups and restores of PVCs
  • Snapshots provided by your storage (e.g., AWS EBS Snapshots), with manual mapping to the PVC during restoration
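For the first option, a minimal VolumeSnapshot manifest could look like the sketch below. It assumes your cluster has the CSI snapshot CRDs and controller installed and defines a VolumeSnapshotClass named csi-snapclass (both assumptions; adjust to your environment), and it references the PVC name used earlier in this post:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pmm-data-snapshot
spec:
  # assumption: a VolumeSnapshotClass with this name exists in your cluster
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: pmm-storage-pmm-0

Apply it with kubectl apply -f and check its status with kubectl get volumesnapshot.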

What is coming

To keep you excited, here are some of the things we are working on or have planned to enable further Kubernetes integration.

OpenShift support

We are working on building a rootless container so that OpenShift users can run Percona Monitoring and Management there without having to grant elevated privileges. 

Microservices architecture

This is something that we have been discussing internally for some time now. As mentioned earlier, there are lots of components in PMM. To enable proper horizontal scaling, we need to break down our monolith container and run these components as separate microservices. 

Managing all these separate containers and Pods (if we talk about Kubernetes) would require separate maintenance strategies. This raises the question of creating a dedicated Operator just to manage PMM, but that is a significant effort. If you have an opinion here, please let us know on our community forum.

Automated k8s registration in DBaaS

As you know, Percona Monitoring and Management comes with a technical preview of  Database as a Service (DBaaS). Right now when you install PMM on a Kubernetes cluster, you still need to register the cluster to deploy databases. We want to automate this process so that after the installation you can start deploying and managing databases right away.

Conclusion

Percona Monitoring and Management enables database administrators and site reliability engineers to pinpoint issues in their open source database environments, whether it is a quick look through the dashboards or a detailed analysis with Query Analytics. Percona’s support for PMM on Kubernetes is a response to the needs of our customers who are transforming their infrastructure.


May
24
2022
--

Store and Manage Logs of Percona Operator Pods with PMM and Grafana Loki

Store and Manage Logs of Percona Operator Pods with PMM and Grafana Loki

While it is convenient to view the logs of MySQL or MongoDB pods with kubectl logs, sometimes the log is purged when the pod is deleted, which makes searching historical logs difficult. Grafana Loki, a log aggregation tool from Grafana Labs, can be installed in an existing Kubernetes environment to store historical logs and build a comprehensive logging stack.

What Is Grafana Loki

Grafana Loki is a set of tools that can be integrated to provide a comprehensive logging stack. Grafana, Loki, and Promtail together form the core of this solution. The Promtail agent collects logs and ships them to the Loki datastore; the logs are then visualized with Grafana. While collecting logs, Promtail can label, convert, and filter them before sending. Loki then receives the logs and indexes their metadata. At this point, the logs are ready to be visualized in Grafana, and the administrator can use Loki's query language, LogQL, to explore them.
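To make the "label, convert, and filter" part more concrete, here is a sketch of a Promtail pipeline configuration; {app="my-app"} is a placeholder selector, and the stage values are illustrative assumptions, not part of the installation below:

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    pipeline_stages:
      # drop noisy debug lines before they reach Loki
      - match:
          selector: '{app="my-app"}'
          stages:
            - drop:
                expression: '.*DEBUG.*'
      # extract the level with a named regex group and promote it to a label
      - regex:
          expression: '(?P<level>INFO|WARN|ERROR)'
      - labels:
          level: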

 

     Grafana Loki

Installing Grafana Loki in Kubernetes Environment

We will use the official helm chart to install Loki. The Loki stack helm chart supports the installation of various components like promtail, fluentd, Prometheus, and Grafana:

$ helm repo add grafana https://grafana.github.io/helm-charts
$ helm repo update
$ helm search repo grafana

NAME                                CHART VERSION APP VERSION DESCRIPTION
grafana/grafana                     6.29.2       8.5.0      The leading tool for querying and visualizing t...
grafana/grafana-agent-operator      0.1.11       0.24.1     A Helm chart for Grafana Agent Operator
grafana/fluent-bit                  2.3.1        v2.1.0     Uses fluent-bit Loki go plugin for gathering lo...
grafana/loki                        2.11.1       v2.5.0     Loki: like Prometheus, but for logs.
grafana/loki-canary                 0.8.0        2.5.0      Helm chart for Grafana Loki Canary
grafana/loki-distributed            0.48.3       2.5.0      Helm chart for Grafana Loki in microservices mode
grafana/loki-simple-scalable        1.0.0        2.5.0      Helm chart for Grafana Loki in simple, scalable...
grafana/loki-stack                  2.6.4        v2.4.2     Loki: like Prometheus, but for logs.

For a quick introduction, we will install only loki-stack and promtail. We will use the Grafana pod deployed by Percona Monitoring and Management to visualize the logs.

$ helm install loki-stack grafana/loki-stack --create-namespace --namespace loki-stack --set promtail.enabled=true,loki.persistence.enabled=true,loki.persistence.size=5Gi

Let’s see what has been installed:

$ kubectl get all -n loki-stack
NAME                            READY   STATUS    RESTARTS   AGE
pod/loki-stack-promtail-xqsnl   1/1     Running   0          85s
pod/loki-stack-promtail-lt7pd   1/1     Running   0          85s
pod/loki-stack-promtail-fch2x   1/1     Running   0          85s
pod/loki-stack-promtail-94rcp   1/1     Running   0          85s
pod/loki-stack-0                1/1     Running   0          85s

NAME                          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/loki-stack-headless   ClusterIP   None           <none>        3100/TCP   85s
service/loki-stack            ClusterIP   10.43.24.113   <none>        3100/TCP   85s

NAME                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/loki-stack-promtail   4         4         4       4            4           <none>          85s

NAME                          READY   AGE
statefulset.apps/loki-stack   1/1     85s

Promtail and loki-stack pods have been created, together with a service that loki-stack will use to publish the logs to Grafana for visualization.

You can see that the promtail pods are deployed by a daemonset, which spawns a promtail pod on every node. This ensures the logs from every pod on all the nodes are collected and shipped to the loki-stack pod for centralized storage and management.

$ kubectl get pods -n loki-stack -o wide
NAME                        READY   STATUS    RESTARTS   AGE    IP           NODE                 NOMINATED NODE   READINESS GATES
loki-stack-promtail-xqsnl   1/1     Running   0          2m7s   10.42.0.17   phong-dinh-default   <none>           <none>
loki-stack-promtail-lt7pd   1/1     Running   0          2m7s   10.42.1.12   phong-dinh-node1     <none>           <none>
loki-stack-promtail-fch2x   1/1     Running   0          2m7s   10.42.2.13   phong-dinh-node2     <none>           <none>
loki-stack-promtail-94rcp   1/1     Running   0          2m7s   10.42.3.11   phong-dinh-node3     <none>           <none>
loki-stack-0                1/1     Running   0          2m7s   10.42.0.19   phong-dinh-default   <none>           <none>

Integrating Loki With PMM Grafana

Next, we will add Loki as a data source in PMM Grafana, so we can use PMM Grafana to visualize the logs. You can do this from the GUI or with the kubectl CLI.

Below are the steps to add the data source from the PMM GUI:

Navigate to https://<PMM-IP-address>:9443/graph/datasources, then select Add data source.

Integrating Loki With PMM Grafana

Then select Loki.

Next, in the Settings, in the HTTP URL box, enter the DNS record of the loki-stack service:

Loki settings

You can also use the command below to add the data source. Make sure to specify the name of the PMM pod; in this command, monitoring-0 is the PMM pod.

$ kubectl -n default exec -it monitoring-0 -- bash -c "curl 'http://admin:verysecretpassword@127.0.0.1:3000/api/datasources' -X POST -H 'Content-Type: application/json;charset=UTF-8' --data-binary '{ \"orgId\": 1, \"name\": \"Loki\", \"type\": \"loki\", \"typeLogoUrl\": \"\", \"access\": \"proxy\", \"url\": \"http://loki-stack.loki-stack.svc.cluster.local:3100\", \"password\": \"\", \"user\": \"\", \"database\": \"\", \"basicAuth\": false, \"basicAuthUser\": \"\", \"basicAuthPassword\": \"\", \"withCredentials\": false, \"isDefault\": false, \"jsonData\": {}, \"secureJsonFields\": {}, \"version\": 1, \"readOnly\": false }'"

Exploring the Logs in PMM Grafana

Now, you can explore the pod logs in Grafana UI, navigate to Explore, and select Loki in the dropdown list as the source of metrics:

Exploring the Logs in PMM Grafana

In the Log Browser, you can select the appropriate labels to form your first LogQL query. For example, I selected the following labels:

 

{app="percona-xtradb-cluster", component="pxc", container="logs",pod="cluster1-pxc-1"} 

 

Click Show logs and you will see all the logs of the cluster1-pxc-1 pod.

You can perform a simple filter with |= "message-content". For example, filter all the messages related to State transfer with:

{app="percona-xtradb-cluster", component="pxc", container="logs",pod="cluster1-pxc-1"} |= "State transfer"
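LogQL also supports regular-expression line filters with |~; for example, a case-insensitive search for errors on the same stream could look like this:

{app="percona-xtradb-cluster", component="pxc", container="logs",pod="cluster1-pxc-1"} |~ "(?i)error"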

Conclusion

Deploying Grafana Loki in the Kubernetes environment is feasible and straightforward. Grafana Loki can be integrated easily with Percona Monitoring and Management to provide both centralized logging and comprehensive monitoring when running Percona XtraDB Cluster and Percona Server for MongoDB in Kubernetes.

It would be interesting to know if you had any thoughts while reading this; please share your comments and experiences with us.
