Mar
22
2023
--

Deploy Percona Monitoring and Management on Amazon EKS With eksctl and Helm

Deploy Percona Monitoring and Management on Amazon EKS

One of the installation methods we support for our database software is through Helm. We have a collection of Helm charts, in this repository, for the following Percona software:

Through this blog post, you will learn how to install Percona Monitoring and Management in a Kubernetes cluster on Amazon EKS using Helm and eksctl.

Requirements

For the installation of PMM with Helm, you will need:

  • Kubernetes 1.22+
  • Helm 3.2.0+
  • PV (Persistent Volume) provisioner support in the underlying infrastructure

If you want to install PMM in a Kubernetes cluster on Amazon EKS, you can use eksctl to create the required cluster. I published this blog post Creating a Kubernetes cluster on Amazon EKS with eksctl, in the Percona Community blog, where I explain how to use this tool.

For an easy way to deploy the Kubernetes cluster, check Percona My Database as a Service (MyDBaaS), a Percona Labs project. I also recommend checking this article Percona Labs Presents: Infrastructure Generator for Percona Database as a Service (DBaaS) on our blog, where the process of creating the cluster is described.

MyDBaaS will help you with cluster creation. It will generate the configuration file needed for eksctl, or it can deploy the cluster to AWS.

To use eksctl you must:

  • Install kubectl
  • Create a user with minimum IAM policies
  • Create an access key ID and secret access key for the user previously created
  • Install AWS CLI and configure authentication (
    aws configure

    , from the command line)

  • Install eksctl

Create a Kubernetes cluster

To create the cluster, you need to generate the configuration file for eksctl. Go to https://mydbaas.labs.percona.com/ and fill out the details of your cluster.

MyDBaaS - Kubernetes Cluster Configuration

Figure 1: MyDBaaS – Kubernetes Cluster Configuration

  1. Give your cluster a name. MyDbaas is the default value.
  2. Select the number of nodes. The value selected by default is three but you can create a cluster of up to five nodes.
  3. Write the instance type.
  4. Select the region. The default is us-west-2.

If you don’t know what instance type to use, go to the Instance Type Selector and select:

  • Number of CPUs
  • Memory size
  • Region
Instance Type Selector

Figure 2: MyDBaaS – Instance Type Selector

As stated on the website, this tool will only return the configuration file needed for eksctl. You can also provide your AWS credentials for the tool to deploy the EKS cluster.

After filling out the details, click on Submit and you will get the configuration file that will look like this:

addons:
- name: aws-ebs-csi-driver
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: MyDbaas
  region: us-east-1
nodeGroups:
- desiredCapacity: 2
  iam:
    withAddonPolicies:
      ebs: true
      efs: true
      fsx: true
  instanceType: m5.large
  maxSize: 5
  minSize: 1
  name: ng-1
  preBootstrapCommands:
  - echo 'OPTIONS="--default-ulimit nofile=1048576:1048576"' >> /etc/sysconfig/docker
  - systemctl restart docker
  volumeSize: 100
  volumeType: gp2

Then, create the cluster by running the following command:

eksctl create cluster -f cluster.yaml

While running, eksctl will create the cluster and all the necessary resources. It will take a few minutes to complete.

eksctl Running

Figure 3: eksctl Running

Cluster credentials can be found in

~/.kube/config

. Try

kubectl get nodes

to verify that this file is valid, as suggested by eksctl.

Install PMM with Helm

Once the cluster has been created and configured, you can install PMM using Helm.

  1. Add the repository
helm repo add percona https://percona.github.io/percona-helm-charts/

  1. Install PMM
helm install pmm --set service.type="LoadBalancer" percona/pmm

Once the PMM server runs, you must get its IP address. Run this command to get this value.

kubectl get services monitoring-service

You will get an output similar to the following.

NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP                                                              PORT(S)                      AGE
monitoring-service   LoadBalancer   10.100.17.121   a57d50410ca2f4c9d9b029da8f44f73f-254463297.us-east-1.elb.amazonaws.com   443:30252/TCP,80:31201/TCP   100s

a57d50410ca2f4c9d9b029da8f44f73f-254463297.us-east-1.elb.amazonaws.com

 is the external IP and the one you need to access PMM from the browser.

Before accessing the dashboard of PMM, get the password.

export ADMIN_PASS=$(kubectl get secret pmm-secret --namespace default -o jsonpath='{.data.PMM_ADMIN_PASSWORD}' | base64 --decode)
echo $ADMIN_PASS

The value of

$ADMIN_PASS

  is the password you need to log into the dashboard. The default user is admin.

Go to the browser and paste the external IP in the address bar.

PMM running on Kubernetes

Figure 4: PMM running on Kubernetes

Now you have PMM running in the cloud on Amazon EKS.

I recommend you check this article Percona Monitoring and Management in Kubernetes is now in Tech Preview on our blog for more information about PMM in Kubernetes using Helm.

Conclusion

Through this blog post, you learned how to create a Kubernetes cluster with eksctl and deploy PMM using Helm with the help of Percona MyDBaaS. The process would be the same for any of the Percona software in the collection of Helm charts.

Learn more about Percona MyDBaaS!

Mar
17
2023
--

Percona Monitoring and Management 2 Scaling and Capacity Planning


2022 was an exciting year for Percona Monitoring and Management (PMM). We’ve added and improved many features, including Alerting and Backup Management. These updates are designed to keep databases running at peak performance and simplify database operations. But as companies grow and see more demand for their databases, we need to ensure that PMM also remains scalable so you don’t need to worry about its performance while tending to the rest of your environment.

PMM2 uses VictoriaMetrics (VM) as its metrics storage engine. Percona’s co-Founder Peter Zaitsev wrote a detailed post about migration from Prometheus to VictoriaMetrics, One of the most significant differences in terms of performance of PMM2 comes with the usage for VM, which can also be derived from performance comparison on node_exporter metrics between Prometheus and VictoriaMetrics.

Planning for resources of a PMM Server host instance can be tricky because the numbers can change depending on the DB instances being monitored by PMM. For example, a higher number of data samples ingested per second or a monitoring database with a huge number of tables (1000+) can affect performance; similarly, the configuration for exporters or a custom metric resolution can also have an impact on the performance of a PMM server host. The point is that scaling up PMM isn’t linear, and this post is only meant to give you a general idea and serve our users as a good starting point when planning to set up PMM2.

The VictoriaMetrics team has also published some best practices, which can also be referred to while planning for resources for setting up PMM2.

Home Dashboard for PMM2
We have tested PMM version 2.33.0 with its default configuration, and it can monitor more than 1,000 MySQL services, with the databases running with the default sysbench Read-Write workload.  We observed that the overall performance of PMM monitoring 1,000 database services was good, and no significant resource usage spikes were observed; this is a HUGE increase in performance and capacity over previous versions!  Please note that the focus of these tests was around standard metrics gathering and display, we’ll use a future blog post to benchmark some of the more intensive query analytics (QAN) performance numbers.

Capacity planning and setup details

We used a dedicated 32-core CPU and 64GB of RAM for our testing.

CPU Usage for PMM Server Host System

The CPU usage averaged 24% utilization, as you can see in the above picture.

 

Memory Utilization for PMM Server Host System

Virtual Memory utilization was averaging 48 GB of RAM.

VictoriaMetrics maintains an in-memory cache for mapping active time series into internal series IDs. The cache size depends on the available memory for VictoriaMetrics in the host system; hence planning for enough RAM on the host system is important for better performance to avoid having a high percentage of slow inserts.

If we talk about overall disk usage for the instance monitoring 1,000 Database services, the average disk usage per datapoint comes out to be roughly .25 Bytes, or you should plan storage roughly between 500 GB – one TB for a default 30 day retention period.

Average Datapoints Size

Average datapoints size

Stats on Victoria Metrics

We recommended having at least two GB RAM and a two-core system for PMM Server as a bare minimum requirement to set up monitoring your database services. With this minimum recommended setup, you can monitor up to three databases comfortably, possibly more, depending on some of your environment’s already mentioned factors.

Based on our observations and various setups we have done with PMM, overall, with a reasonably powerful pmm-server host system (eight+ GB RAM and eight+ cores), it is the most optimum to target monitoring 32 databases per core or 16 databases per one GB RAM, hence keeping this in mind is really helpful while planning resources for your respective monitoring setups.

Number of Database Services Monitored Min Recommend Requirement
0-250 services 8 Core, 16 GB RAM
250-500 services 16 Core, 32 GB RAM
500-1000 services 32 Core, 64 GB RAM

PMM scalability improved dramatically through UX and performance research

In earlier versions of PMM2, the Home Dashboard could not load with more than 400 DB services, resulting in a frustrating experience for users. Interacting with UI elements such as filters and date pickers was previously impossible. We conducted thorough research to improve scalability and the user experience on our Home Dashboard for 1,000 database services. Our findings revealed that the design of the Home Dashboard heavily impacted scalability and poor UX on the UI resulting in unresponsive pages.

We redesigned the Home Dashboard as a solution, and the results were significant. The new dashboard provides a much better user experience with more critical information being displayed and scalability for environments up to 1000 DB services. The overall load time has improved dramatically, going from 50+ seconds to roughly 20 seconds, and there are no longer any unresponsive errors on the UI. Users can now interact with filters on other dashboards seamlessly as well!

There are still some limitations we’re working on addressing

  • Instance Overview Dashboards, which are shipped with PMM, do not work well with such a large number of instances, so it is recommended not to rely on them when such a high number of databases are being monitored. They would work well only with a maximum of 400 database services.
  • There is a known issue around the request “URI too Large” pop-up message that is visible because of some large query requests, this also leads to an issue with setting a big time range for observing metrics from the monitored Database.  Our team is planning to implement a fix for this soon.
  • QAN takes 50+ seconds to load up when 400+ database services are monitored. Also, the overall interaction with QAN feels laggy when searching and applying filters across a big list of services/nodes. Our team is working on improving the overall user experience of the QAN App, which will soon be fixed in future releases of PMM.

Not a formula but a rule of thumb

Overall resource usage in PMM depends on the configuration and workload, and it may vary for different setups so it’s difficult to say, “for monitoring this number of DB services, you need a machine of that size.” This post is meant to show how the PMM server scales up and performs with the default setup and all database host nodes configured in default metrics mode (push).

We plan to also work on another follow-up post on the performance and scalability where we would highlight results for different dashboards and QAN, showcasing the improvements we have made over the last few PMM releases.

Tell us what you think about PMM!

We’re excited about all of the improvements we’ve made, but we’re not done yet! Have some thoughts about how we can improve PMM, or want to ask questions? Come talk to us in the Percona Forums, and let us know what you’re thinking!

PMM Forum

Feb
24
2023
--

PMM Access Control: A Comprehensive Guide With Use Cases and Examples

Why you might need granular Access Controls

As companies grow from startups to larger organizations, they establish a hierarchical structure of roles and responsibilities that changes year to year. As a result, it becomes increasingly important to protect the confidentiality of information to prevent data leaks and facilitate easy access to relevant data sources and tools for the related teams without having to sift through unrelated databases or environmental data. This is particularly important for large enterprises, where implementing least privilege access to all software used within the organization is likely required. To achieve this in a secure and efficient manner, a key requirement for any database monitoring and management software is to have robust access management features.

Label-based access control is a security mechanism that allows companies to control who can access specific metrics based on their labels.  Monitored data may contain sensitive data that needs to be protected, and label-based access control can be used to ensure that only authorized individuals can access that data.

In a nutshell, here are a few reasons why companies may need label-based access control for their monitoring solution:

  • To reduce the risk of insider threats: Insider threats are a significant concern for many companies. Label-based access control can help reduce the risk of insider threats by limiting access to sensitive data only to employees who need it.
  • To simplify access management: Label-based access control can simplify access management by allowing companies to assign labels to metrics. This makes managing access permissions easier and ensures that only authorized individuals can access specific data.

To ensure compliance: Many industries have regulatory requirements that mandate specific data handling and protection practices. Access control can help ensure that companies comply with these regulations by limiting access to data to only those employees who are authorized to view it.

Sample use cases for label-based Access Control in Percona Monitoring and Management (PMM)

Let’s say you are a member of a company selling a subscription-based streaming service that provides a wide variety of TV shows, movies and other video content to its subscribers. Your company operates a massive distributed architecture that spans multiple data centers and cloud providers, and your databases are a crucial part of this infrastructure. So, you have dozens of teams with hundreds of engineers who are responsible for ensuring high availability and reliability, including employing fault-tolerant systems, and implementing rigorous testing and monitoring procedures. Your organization may have several teams, which are:

  • Database administration, employing DBAs
  • Development Team, employing developers
  • Site reliability engineering, SREs
  • Quality Assurance, with QA Engineers
  • Monitoring Team

Let’s take a look at an example of why you might need and how you could use label-based access control (LBAC) in your authorization flow.

Once data has been ingested into a database and then into PMM’s dashboards, the organizations have a massive amount of data from different sources that is waiting for the DBAs, Developers, SREs, etc., to monitor and diagnose. +50 applications, +1000 nodes, three environments, including qa, dev, production, are all being monitored by PMM. And it is becoming hard to diagnose, isolate, and analyze any applications or environments for which a specific team may be responsible.

PMM, monitoring big scale environment

PMM, monitoring big scale environment

Vanessa is the database administrator in the company, and she would like to set up label-based access control inside PMM to make data access more secure and easy for her organization. The idea is to limit each team’s access to the data and environments that are relevant to their specific responsibilities. For example, the QA team would only have access to data from the QA environment rather than having access to the production environment.

This approach, known as least privilege access, ensures that each team only has access to the data they need to perform their duties, while limiting the risk of unauthorized access to sensitive information. By implementing this solution, she can protect its confidential data and make it easier for teams to find the information they need without being bogged down by irrelevant data.

Structuring the authorization goals

Vanessa has an Admin role in PMM with all privileges and can create users and roles and grant them privileges. 

As a first task, Vanessa plans an access control roll-out strategy. She starts with answering these questions:

  • Why do I need granular access control instead of basic roles with viewer, editor, admin access?
  • What options regarding access control are currently available in PMM? 
  • How do I structure permissions to make them easy to manage?
  • What are the needs of the specific teams who need to  access metrics?
  • Which approach should I use when assigning roles? Should I use the Grafana UI, provisioning, or the API?

Considering needs, Vanessa decides to use both basic roles and Access Roles, which provides label-based permissions. She then created the table below as an Access Control implementation schema. 

Note: You can take advantage of your current authentication provider to manage user and team permissions for basic roles. Access Roles doesn’t provide user management with authentication provider yet. It’s currently part of our roadmap, so please stay tuned for upcoming releases of PMM and keep an eye on the release notes.

Team/User Role name Description Labels*  Privilege 
DBA Team-1 (MySQL) role_dba_mysql Read Privilege MySQL database metrics of all apps on the prod and qa environment  environment~=”(dev|prod)”, service=”mysql” Mysql,

Prod and dev,

All apps

DBA Team-2 (MongoDB) role_dba_mongodb Read Privilege to  MongoDB database metrics of all apps on the prod and dev environment  environment=”dev, prod”, service=”mongodb” MongoDB,

Prod and dev,

All apps

Dev Team-1 (App A) role_dev_appA Read Privilege to database metrics of App A on the prod and dev environment  environment=”dev, prod”, app=”A” Prod and dev,,

App A

Dev Team-2 (App B) role_dev_appB Read Privilege to database metrics of App B on the prod and dev environment  environment=”dev, prod”, app=”B” Prod and dev,

App B

QA Team role_qa Read Privilege to database metrics of the qa environment  environment=”qa” QA

All apps

Monitoring Team role_monitoring Read Privilege to database metrics of the prod environment 

environment=”prod”

Prod

All apps

 

* Labels must be in key=”value” format, both the key and value must begin with a letter, and they can only contain letters, numbers, hyphens or dashes. Please check the labels for access control page on the product documentation.

Note: Note: Team-level role assignment is currently part of our roadmap, so please keep an eye out for future releases of PMM. If you require the team-level role assignment feature, please don’t hesitate to reach out to us on the forum and let us know.

Before label implementation

A privilege comes with a label and READ permission assigned to a role. So, a role can be granted the READ permission based on labels. 

Enable Access Control

Although Vanessa, a DBA, prefers to enable and configure Access Control via the UI, it is possible to enable it when deploying PMM Server via Docker. Read more on this configuration on our product documentation page. To configure access control from the PMM UI, she does the following:

  • Navigate to Configuration / Settings on the main menu
  • Click Advanced Settings and scroll down to end of the page
  • Click toggle to enable Access Control
  • Click “Apply changes” button

enable access control

After Vanessa enables Access Control, the Access Roles page will automatically be displayed on the main menu.  

Enabled Access Roles in PMM

Enabled Access Roles in PMM

Note that when Access Control is enabled, Full Access is automatically assigned as the default role to all users, granting them access to all metrics. When users log in to PMM for the first time without a designated role, they will be automatically assigned the “Default Role.” For administrators, the default role provides a convenient way to configure default permissions for new users.

Configuration / Access Roles

Configuration / Access Roles

It’s possible to change the default role from UI or pmm-admin. Visit the feature documentation to see how you can change the default role.

Create an access role

Note: To create roles, you must have admin privileges. For more information, see this related page. 

Follow the following steps:

  • Access Control, display on main menu
  • Click “Create” button
  • Enter Role name, Description
  • Add service labels to which this role will have read access to
  • Click “Create Role” button on the top-right corner to save the role or Click “Cancel”  button to exit without saving the role
Create a role in Access Control

Create a role in Access Control

The following label matching operators exists:

  • =: Select labels that are exactly equal to the provided string.
  • !=: Select labels that are not equal to the provided string.
  • =~: Select labels that regex-match the provided string.
  • !~: Select labels that do not regex-match the provided string.

Note: The filter uses PromQL which is a query language of Prometheus. Find more on Prometheus documentation.

Assign a role

To assign a role to users;

  • Click “Users” tab or go to Configuration / Users page to see user list 
  • Click Assign role dropdown to select one or multiple roles to assign
  • It is automatically saved
Assign a role

Assign a role

Role presents basic roles, which are Viewer, Editor and Admin:

  • Viewer – authenticated user with read-only access. They can only view information and can’t add services or do any harm to the system.
  • Editor – In addition to Viewer permission, it allows users to edit Grafana dashboards.
  • Admin – authenticated user with full control.
    • Example: PMM Admin who can add servers and perform updates.

Access roles are presented to allow Admins to create new user roles based on labels, which are designed to limit access to metrics. These roles are only available when Access Control is enabled.

Basic roles and Access roles complement each other and are independent of each other. Access roles is used to filter data based on the assigned filters to the role.

Note: Team-level role assignment is currently part of our roadmap, so please keep an eye out for future releases of PMM. If you require the team-level role assignment feature, please don’t hesitate to reach out to us on the forum and let us know.

Community Forum

Update a role

To edit Access Roles,  follow these steps:

  • From the Main menu, navigate to  Configuration ? Access Roles. The Access Roles tab opens.
  • On the role you want to edit, click the ellipsis (three vertical dots) > edit role in the Options column. The Edit role page opens.
  • Edit the role and click “Save Changes” on the top right corner.

For more information, visit the Manage Access Control documentation.

Update the role

Update the role

Delete a role

Note: You can only remove a role not assigned to any user. To remove any role, first unassign the users from this role.

To edit access roles,  follow these steps:

  • From the Main menu, navigate to  Configuration ? Access Roles. The Access Roles tab opens.
  • On the role you want to delete, click the ellipsis (three vertical dots) > Delete in the Options column. 
  • Click “Delete”

Labels FAQ

Can I add a role with full access?

Yes, you can. Simply follow the same steps as you would for role creation, but make sure to leave the metric filter field blank before saving the role.

Role with full access

Role with full access

Where can I create labels?

You can add custom or predefined labels in PMM while adding a service for monitoring in PMM. You can add predefined labels using both API and UI. To add a label using API, see API documentation. Also you can create their own custom labels and filter by them as well, check our API documentation. 

Where can I assign labels?

Admin can assign labels to services using PMM UI or pmm-admin. Metrics inherit lables from the service. Please check this page for more.

Where can labels be used?

Once a label has been created and/or has been used to filter metrics within the PMM instance, Admin can use these labels to restrict which metrics can display on dashboards and organize metrics by viewing metric data on dashboards based on labels.

Can I delete a label?

No, labels can not be deleted.

Can I rename a label?

Yes, Labels can be renamed by editing a service. Please check the feature page for more.

How does assigning a user permissions based on labels work?

Labels are additive, so you can only further restrict a user’s permissions by adding more labels. If a user has access to everything labeled environment:prod, we assume no restrictions on any other category of label. This user has less restricted permissions than another user who has access to everything with environment:prod AND app:A.

For example, if the following metrics had these set of labels:

Environments = prod, dev

Applications: App A, App B

Role Labels
Role-1 environment:prod, app:A
Role-2 environment:prod
Role-3 environment:dev, app:B

Then the following through users with Source, Admin restricted with Labels will only have access to the following metrics:

User Role with Labels Access to Metrics
Matej Role-1 environment:prod, app:A
Michal Role 2 environment:prod

Any application

Maxim Role 2, Role 3
environment:prod, any application

+ 

environment:dev, app:B

 

Can I limit access to PMM features like QAN, Alerting, and Backup?

Unfortunately, label-based access control is only the initial phase of PMM Access Control and does not currently encompass role-based access control. However, we have plans to incorporate it in 2023.

We’d like to hear your needs and feedback regarding Access Control. Feel free to book a 30-min call with our product team to share feedback.

Book a 30-min call with Percona Monitoring and Management team

Feb
24
2023
--

PMM V2.35: Label-based Access Control, General Availability of Helm Chart, and More!

PMM V2.35

Today, we are excited to announce the release of Percona Monitoring and Management (PMM) V2.35, including a tech preview of label-based access control, the general availability of Helm Chart, and a range of enhancements to our Database as a Service (DBaaS) offerings, among other improvements and features. Check out all the updates on release notes.

Follow update instructions to update your PMM instance to V2.35, or you can get started using PMM in minutes with our PMM Quickstart guide.

Key highlights of PMM 2.35:

Access Control (Tech Preview): Limit access to metrics

Disclaimer: Please note that PMM Access Control is currently in technical preview and is subject to change. Therefore, we recommend using it for testing purposes only.

PMM Dashboards offer valuable insights for monitoring database metrics and troubleshooting. However, in certain scenarios, such as private or team-specific environments, projects, or applications, access to metrics must sometimes be restricted to team members only.

Previously, you could either share all dashboard metrics with every user in the organization or restrict permissions based on the inherited permissions of the dashboard folder and assign basic roles such as viewer, editor, and admin. As a result, all users could view all metrics on the dashboards.

PMM V2.35 introduces access control, which allows you to manage metric access based on labels. You can manage which data can be viewed based on label(s) by creating access roles.

For instance, you can grant developers access to data related to the specific applications they’re responsible for in the production environment. By associating multiple labels with a role, only data from series that match your defined labels is returned. This feature ensures that you have complete control over who can access specific metrics. For more information, see Access Control documentation.

PMM Access Control

PMM Access Control

In addition to label-based access control, we plan to expand Role-based access control (RBAC), including access limitations to the features like Alerting and Backup with read-edit-delete permissions. 

We would appreciate it if you could share your feedback on access control. Kindly leave your comments and feedback on the forum.

Community Forum

An upcoming blog post will provide comprehensive information on use cases, examples and upcoming features relating to Access Control. Stay tuned!

GA: Deployment of PMM in Kubernetes with Helm Chart

Helm is a package manager for Kubernetes, somewhat like  package managers (YUM, APT, npm, Pip, Gem), but it works at the application level allowing you to deploy multiple manifests together.

We are happy to announce the General Availability of PMM deployment in Kubernetes with the Helm chart. PMM deployment via Helm chart in Kubernetes has been available as Tech Preview since PMM v2.29, but now we’re delighted to offer it to users as a fully supported feature. For more information, see Percona Helm Chart documentation. Percona Helm charts can be found in the percona-helm-charts repository on Github.

What benefits does Helm offer for deploying PMM in Kubernetes?

  • Helm provides the ability to group multiple Kubernetes manifests into a single entity known as a Helm Chart, which can be easily managed for deployments, rollbacks, and storage.
  • Additionally, Helm has built-in templating for Kubernetes manifests, eliminating the need for custom template systems to replace values within manifests, such as Docker tags.
  • Helm also supports creating Charts of Charts, which contain both templates and default values, allowing for easy deployment of an entire application and its dependencies.
  • Helm enables the creation of application catalogs or Helm repositories, similar to traditional package repositories, like npm registry, CPAN, Apache Maven Central, or Ruby Gems.

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Feb
14
2023
--

Deploying Percona Monitoring and Management With Terraform

Percona Monitoring and Management With Terraform

Recently, we released Percona Monitoring and Management 2.34 (PMM) which includes upgrades for backup and Database as a Service (DBaaS) features, and we are seeking ways to simplify PMM deployment.

Previously I wrote about our Terraform provider to deploy Percona Server for MySQL — Percona Server for MySQL: Automatic Cloud Deployment with Terraform — and now we added capabilities to deploy PMM with Terraform.

You need the provider version 0.9.10 and PMM can be deployed as:

resource "percona_pmm" "pmm" {
  instance_type            = "t3.micro"                    
  key_pair_name            = "sshKey1"                     
  path_to_key_pair_storage = "/tmp/"                       
  volume_type              = "gp2"                         
}

Overall I like using Terraform as it offers the following benefits:

  1. Automation: Terraform automates the process of creating, updating, and managing infrastructure, reducing the risk of errors and increasing efficiency.
  2. Versioning: Terraform allows you to version control your infrastructure, making it easier to track changes and collaborate with others.
  3. Scalability: Terraform makes it easy to scale infrastructure up or down, so you can quickly adapt to changing needs.
  4. Reusability: Terraform provides reusable infrastructure components, so you don’t have to recreate the same infrastructure for different projects or environments.
  5. Portability: Terraform’s infrastructure as code approach makes it easy to move infrastructure between different cloud providers or data centers.
  6. Improved Collaboration: Terraform enables multiple team members to work together on infrastructure projects, improving collaboration and reducing the risk of configuration drift.

As another example, we also added PMM monitoring when deploying Percona Server for MySQL instances with Terraform:

resource "percona_ps" "ps" {
  count = 1
  instance_type            = "t3.micro" # for AWS
  key_pair_name            = "sshKey1"
  password                 = "password"
  replication_type         = "async"                          
  replication_password         = "replicaPassword"
  cluster_size             = 3
  path_to_key_pair_storage = "/tmp/"
  pmm_address              = "http://admin:XXXXXXX@18.191.19.40"    
}

The source code for our Terraform provider is here:

Percona-Lab/terraform-provider-percona: Terraform modules to deploy Percona Server and Percona XtraDB Cluster (github.com)

And you are welcome to use our prepackaged binaries from Percona-Lab/percona | Terraform Registry.

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Jan
04
2023
--

Deep Dive into MySQL’s Performance Schema

MySQL Performance Schema

MySQL Performance SchemaRecently I was working with a customer wherein our focus was to carry out a performance audit of their multiple MySQL database nodes. We started looking into the stats of the performance schema. While working, the customer raised two interesting questions: how can he make complete use of the performance schema, and how can he find what he requires? I realized that it is important to understand the insights of the performance schema and how we can make effective use of it. This blog should make it easier to understand for everyone.

The performance schema is an engine in MySQL which can easily be checked whether enabled or not using SHOW ENGINES. It is entirely built upon various sets of instruments (also can be called event names) each serving different purposes.

Instruments are the main part of the performance schema. It is useful when I want to investigate a problem and its root causes. Some of the examples are listed below (but not limited to) :

1. Which IO operation is causing MySQL to slow down?
2. Which file a process/thread is mostly waiting for?
3. At which execution stage is a query taking time, or how much time will an alter command will take?
4. Which process is consuming most of the memory or how to identify the cause of memory leakage?

What is an instrument in terms of performance schema?

Instruments are a combination of different sets of components like wait, io, sql, binlog, file, etc. If we combine these components, they become a meaningful tool to help us troubleshoot different issues. For example, wait/io/file/sql/binlog is one of the instruments providing information regarding the wait and I/O details on binary log files. Instruments are being read from left and then components will be added with delimiter “/”. The more components we add to the instrument, the more complex or more specific it becomes, i.e. the more lengthy the instrument is, the more complex it goes.

You can locate all instruments available in your MySQL version under table setup_instruments. It is worth noting that every version of MySQL has a different number of instruments.

select count(1) from performance_schema.setup_instruments;

+----------+

| count(1) |

+----------+

|     1269 |

+----------+

 

For easy understanding, instruments can be divided into seven different parts as shown below. The MySQL version I am using here is 8.0.30. In earlier versions, we used to have only four, so expect to see different types of instruments in case you are using different/lower versions.

select distinct(substring_index(name,'/',1)) from performance_schema.setup_instruments;

+-------------------------------+

| (substring_index(name,'/',1)) |

+-------------------------------+

| wait                          |

| idle                          |

| stage                         |

| statement                     |

| transaction                   |

| memory                        |

| error                         |

+-------------------------------+

7 rows in set (0.01 sec)

    1. Stage – Instrument starting with ‘stage’ provides the execution stage of any query like reading data, sending data, altering table, checking query cache for queries, etc.  For example stage/sql/altering table.
    2. Wait – Instrument starting with ‘wait’ falls here. Like mutex waiting, file waiting, I/O waiting, and table waiting. Instrument for this can be wait/io/file/sql/map.
    3. Memory – Instrument starting from “memory” providing information regarding memory usage on a per-thread basis. For example memory/sql/MYSQL_BIN_LOG
    4. Statement – Instruments starting with “statement” provide information about the type of SQL, and stored procedures.
    5. Idle – provide information on socket connection and information related to a thread.
    6. Transaction – Provide information related to the transactions and have only one instrument.
    7.  Error – This single instrument provides information related to the errors generated by user activities. There are no further components attached to this instrument.

The total number of instruments for these seven components is listed below. You can identify these instruments starting with these names only. 

select distinct(substring_index(name,'/',1)) as instrument_name,count(1) from performance_schema.setup_instruments group by instrument_name;

+-----------------+----------+

| instrument_name | count(1) |

+-----------------+----------+

| wait            |      399 |

| idle            |        1 |

| stage           |      133 |

| statement       |      221 |

| transaction     |        1 |

| memory          |      513 |

| error           |        1 |

+-----------------+----------+

How to find which instrument you need

I do remember that a customer asked me since there are thousands of instruments available, how can he find out which one he requires. As I mentioned before that instruments are being read from left to right, we can find out which instrument we require and then find its respective performance.

For example – I need to observe the performance of redo logs (log files or WAL files) of my MySQL instance and need to check if threads/connections need to wait for the redo log files to be flushed before further writing and if so then how much.

select * from setup_instruments where name like '%innodb_log_file%';

+-----------------------------------------+---------+-------+------------+------------+---------------+

| NAME                                    | ENABLED | TIMED | PROPERTIES | VOLATILITY | DOCUMENTATION |

+-----------------------------------------+---------+-------+------------+------------+---------------+

| wait/synch/mutex/innodb/log_files_mutex | NO      | NO    |            |          0 | NULL          |

| wait/io/file/innodb/innodb_log_file     | YES     | YES   |            |          0 | NULL          |

+-----------------------------------------+---------+-------+------------+------------+---------------+

 

Here you see that I have two instruments for redo log files. One is for the mutex stats on the redo log files and the second is for the IO wait stats on the redo log files. 

Example two – You need to find out those operations or instruments for which you can calculate the time required i.e. how much time a bulk update will take. Below are all the instruments that help you to locate the same.

select * from setup_instruments where PROPERTIES='progress';        

+------------------------------------------------------+---------+-------+------------+------------+---------------+

| NAME                                                 | ENABLED | TIMED | PROPERTIES | VOLATILITY | DOCUMENTATION |

+------------------------------------------------------+---------+-------+------------+------------+---------------+

| stage/sql/copy to tmp table                          | YES     | YES   | progress   |          0 | NULL          |

| stage/sql/Applying batch of row changes (write)      | YES     | YES   | progress   |          0 | NULL          |

| stage/sql/Applying batch of row changes (update)     | YES     | YES   | progress   |          0 | NULL          |

| stage/sql/Applying batch of row changes (delete)     | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/alter table (end)                       | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/alter table (flush)                     | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/alter table (insert)                    | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/alter table (log apply index)           | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/alter table (log apply table)           | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/alter table (merge sort)                | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/alter table (read PK and internal sort) | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/alter tablespace (encryption)           | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/buffer pool load                        | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/clone (file copy)                       | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/clone (redo copy)                       | YES     | YES   | progress   |          0 | NULL          |

| stage/innodb/clone (page copy)                       | YES     | YES   | progress   |          0 | NULL          |

+------------------------------------------------------+---------+-------+------------+------------+---------------+

The above instruments are the ones for which progress can be tracked.

How to prepare these instruments to troubleshoot the performance issues

To take advantage of these instruments, they need to be enabled first to make the performance schema log-related data. In addition to logging the information of running threads, it is also possible to maintain the history of such threads (statement/stages or any particular operation). Let’s see, by default, how many instruments are enabled in the version I am using. I have not enabled any other instrument explicitly.

select count(*) from setup_instruments where ENABLED='YES';

+----------+

| count(*) |

+----------+

|      810 |

+----------+

1 row in set (0.00 sec)

The below query lists the top 30 enabled instruments for which logging will take place in the tables.

select * from performance_schema.setup_instruments where enabled='YES' limit 30;


+---------------------------------------+---------+-------+------------+------------+---------------+

| NAME                                  | ENABLED | TIMED | PROPERTIES | VOLATILITY | DOCUMENTATION |

+---------------------------------------+---------+-------+------------+------------+---------------+

| wait/io/file/sql/binlog               | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/binlog_cache         | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/binlog_index         | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/binlog_index_cache   | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/relaylog             | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/relaylog_cache       | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/relaylog_index       | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/relaylog_index_cache | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/io_cache             | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/casetest             | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/dbopt                | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/ERRMSG               | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/select_to_file       | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/file_parser          | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/FRM                  | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/load                 | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/LOAD_FILE            | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/log_event_data       | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/log_event_info       | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/misc                 | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/pid                  | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/query_log            | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/slow_log             | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/tclog                | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/trigger_name         | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/trigger              | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/init                 | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/SDI                  | YES     | YES   |            |          0 | NULL          |

| wait/io/file/sql/hash_join            | YES     | YES   |            |          0 | NULL          |

| wait/io/file/mysys/proc_meminfo       | YES     | YES   |            |          0 | NULL          |

+---------------------------------------+---------+-------+------------+------------+---------------+

As I mentioned previously, it is also possible to maintain the history of the events. For example, if you are running a load test and want to analyze the performance of queries post its completion, you need to activate the below consumers if not activated yet. 

select * from performance_schema.setup_consumers;

+----------------------------------+---------+

| NAME                             | ENABLED |

+----------------------------------+---------+

| events_stages_current            | YES     |

| events_stages_history            | YES     |

| events_stages_history_long       | YES     |

| events_statements_cpu            | YES     |

| events_statements_current        | YES     |

| events_statements_history        | YES     |

| events_statements_history_long   | YES     |

| events_transactions_current      | YES     |

| events_transactions_history      | YES     |

| events_transactions_history_long | YES     |

| events_waits_current             | YES     |

| events_waits_history             | YES     |

| events_waits_history_long        | YES     |

| global_instrumentation           | YES     |

| thread_instrumentation           | YES     |

| statements_digest                | YES     |

+----------------------------------+---------+

Note – The top 15 records in the above rows are self-explanatory, but the last one for digest means to allow the digest text for SQL statements. By digest I mean, grouping similar queries and showing their performance. This is being done by hashing algorithms.

Let’s say, you want to analyze the stages of a query that is spending most of the time in, you need to enable the respective logging using the below query.

MySQL> update performance_schema.setup_consumers set ENABLED='YES' where NAME='events_stages_current';

Query OK, 1 row affected (0.00 sec)

Rows matched: 1  Changed: 1  Warnings: 0

How to take advantage of the performance schema?

Now that we know what instruments are, how to enable them, and the amount of data we want to store in, it’s time to understand how to make use of these instruments. To make it easier to understand I have taken the output of a few instruments from my test cases as it won’t be possible to cover all as there are more than a thousand instruments.

Please note that to generate the fake load, I used sysbench (if you are not familiar with it, read about it here) to create read and write traffic using the below details : 

lua : oltp_read_write.lua

Number of tables : 1

table_Size : 100000

threads : 4/10 

rate - 10

As an example, think about a case when you want to find out where memory is getting utilized. To find out this, let’s execute the below query in the table related to the memory.

select * from memory_summary_global_by_event_name order by SUM_NUMBER_OF_BYTES_ALLOC desc limit 3\G;



*************************** 1. row ***************************

                  EVENT_NAME: memory/innodb/buf_buf_pool

                 COUNT_ALLOC: 24

                  COUNT_FREE: 0

   SUM_NUMBER_OF_BYTES_ALLOC: 3292102656

    SUM_NUMBER_OF_BYTES_FREE: 0

              LOW_COUNT_USED: 0

          CURRENT_COUNT_USED: 24

             HIGH_COUNT_USED: 24

    LOW_NUMBER_OF_BYTES_USED: 0

CURRENT_NUMBER_OF_BYTES_USED: 3292102656

   HIGH_NUMBER_OF_BYTES_USED: 3292102656

*************************** 2. row ***************************

                  EVENT_NAME: memory/sql/THD::main_mem_root

                 COUNT_ALLOC: 138566

                  COUNT_FREE: 138543

   SUM_NUMBER_OF_BYTES_ALLOC: 2444314336

    SUM_NUMBER_OF_BYTES_FREE: 2443662928

              LOW_COUNT_USED: 0

          CURRENT_COUNT_USED: 23

             HIGH_COUNT_USED: 98

    LOW_NUMBER_OF_BYTES_USED: 0

CURRENT_NUMBER_OF_BYTES_USED: 651408

   HIGH_NUMBER_OF_BYTES_USED: 4075056

*************************** 3. row ***************************

                  EVENT_NAME: memory/sql/Filesort_buffer::sort_keys

                 COUNT_ALLOC: 58869

                  COUNT_FREE: 58868

   SUM_NUMBER_OF_BYTES_ALLOC: 2412676319

    SUM_NUMBER_OF_BYTES_FREE: 2412673879

              LOW_COUNT_USED: 0

          CURRENT_COUNT_USED: 1

             HIGH_COUNT_USED: 13

    LOW_NUMBER_OF_BYTES_USED: 0

CURRENT_NUMBER_OF_BYTES_USED: 2440

   HIGH_NUMBER_OF_BYTES_USED: 491936


Above are the top three records, showing where the memory is getting mostly utilized.

Instrument memory/innodb/buf_buf_pool is related to the buffer pool which is utilizing 3 GB and we can fetch this information from SUM_NUMBER_OF_BYTES_ALLOC. Another data that is also important for us to consider is CURRENT_COUNT_USED which tells us how many blocks of data have been currently allocated and once work is done, the value of this column will be modified. Looking at the stats of this record, consumption of 3GB is not a problem since MySQL uses a buffer pool quite frequently ( for example, while writing data, loading data, modifying data, etc.). But the problem rises, when you have memory leakage issues or the buffer pool is not getting used. In such cases, this instrument is quite useful to analyze.

Looking at the second instrument memory/sql/THD::main_mem_root which is utilizing 2G, is related to the sql (that’s how we should read it from the very left). THD::main_mem_root is one of the thread classes. Let us try to understand this instrument: 

THD represent thread

main_mem_root is a class of mem_root. MEM_ROOT is a structure being used to allocate memory to threads while parsing the query, during execution plans, during execution of nested queries/sub-queries and other allocations while query execution. Now, in our case we want to check which thread/host is consuming memory so that we can further optimize the query. Before digging down further, let’s understand the 3rd instrument first which is an important instrument to look for.

memory/sql/filesort_buffer::sort_keys – As I mentioned earlier, instrument names should be read starting from left. In this case, it is related to memory allocated to sql. The next component in this instrument is filesort_buffer::sort_keys which is responsible for sorting the data (it can be a buffer in which data is stored and needs to be sorted. Various examples of this can be index creation or normal order by clause)

It’s time to dig down and analyze which connection is using this memory.   To find out this, I have used table memory_summary_by_host_by_event_name and filtered out the record coming from my application server.

select * from memory_summary_by_host_by_event_name where HOST='10.11.120.141' order by SUM_NUMBER_OF_BYTES_ALLOC desc limit 2\G;

*************************** 1. row ***************************

                        HOST: 10.11.120.141

                  EVENT_NAME: memory/sql/THD::main_mem_root

                 COUNT_ALLOC: 73817

                  COUNT_FREE: 73810

   SUM_NUMBER_OF_BYTES_ALLOC: 1300244144

    SUM_NUMBER_OF_BYTES_FREE: 1300114784

              LOW_COUNT_USED: 0

          CURRENT_COUNT_USED: 7

             HIGH_COUNT_USED: 39

    LOW_NUMBER_OF_BYTES_USED: 0

CURRENT_NUMBER_OF_BYTES_USED: 129360

   HIGH_NUMBER_OF_BYTES_USED: 667744

*************************** 2. row ***************************

                        HOST: 10.11.120.141

                  EVENT_NAME: memory/sql/Filesort_buffer::sort_keys

                 COUNT_ALLOC: 31318

                  COUNT_FREE: 31318

   SUM_NUMBER_OF_BYTES_ALLOC: 1283771072

    SUM_NUMBER_OF_BYTES_FREE: 1283771072

              LOW_COUNT_USED: 0

          CURRENT_COUNT_USED: 0

             HIGH_COUNT_USED: 8

    LOW_NUMBER_OF_BYTES_USED: 0

CURRENT_NUMBER_OF_BYTES_USED: 0

   HIGH_NUMBER_OF_BYTES_USED: 327936

Event name memory/sql/THD::main_mem_root has consumed more than 1G memory ( sum ) by the host 11.11.120.141 which is my application host at the time of executing this query. Now since we know that this host is consuming memory, we can dig down further to find out the queries like nested or subquery and then try to optimize it. 

Similarly, if we see the memory allocation by filesort_buffer::sort_keys is also more than 1G (total) at the time of execution. Such instruments signal us to refer to any queries using sorting i.e. order by clause. 

Time to join all dotted lines

Let’s try to find out the culprit thread in one of the cases where most of the memory is being utilized by the file sort.  The first query helps us in finding the host and event name (instrument):

select * from memory_summary_by_host_by_event_name order by SUM_NUMBER_OF_BYTES_ALLOC desc limit 1\G;

*************************** 1. row ***************************

                        HOST: 10.11.54.152

                  EVENT_NAME: memory/sql/Filesort_buffer::sort_keys

                 COUNT_ALLOC: 5617297

                  COUNT_FREE: 5617297

   SUM_NUMBER_OF_BYTES_ALLOC: 193386762784

    SUM_NUMBER_OF_BYTES_FREE: 193386762784

              LOW_COUNT_USED: 0

          CURRENT_COUNT_USED: 0

             HIGH_COUNT_USED: 20

    LOW_NUMBER_OF_BYTES_USED: 0

CURRENT_NUMBER_OF_BYTES_USED: 0

   HIGH_NUMBER_OF_BYTES_USED: 819840

Ahan, this is my application host, and let’s find out which user is executing and its respective thread id.

select * from memory_summary_by_account_by_event_name where HOST='10.11.54.152' order by SUM_NUMBER_OF_BYTES_ALLOC desc limit 1\G

*************************** 1. row ***************************

                        USER: sbuser

                        HOST: 10.11.54.152

                  EVENT_NAME: memory/sql/Filesort_buffer::sort_keys

                 COUNT_ALLOC: 5612993

                  COUNT_FREE: 5612993

   SUM_NUMBER_OF_BYTES_ALLOC: 193239513120

    SUM_NUMBER_OF_BYTES_FREE: 193239513120

              LOW_COUNT_USED: 0

          CURRENT_COUNT_USED: 0

             HIGH_COUNT_USED: 20

    LOW_NUMBER_OF_BYTES_USED: 0

CURRENT_NUMBER_OF_BYTES_USED: 0

   HIGH_NUMBER_OF_BYTES_USED: 819840




The account-level summary confirms the user (sbuser). Finally, let’s drill down to the thread level:

select * from memory_summary_by_thread_by_event_name where EVENT_NAME='memory/sql/Filesort_buffer::sort_keys' order by SUM_NUMBER_OF_BYTES_ALLOC desc limit 1\G

*************************** 1. row ***************************

                   THREAD_ID: 84

                  EVENT_NAME: memory/sql/Filesort_buffer::sort_keys

                 COUNT_ALLOC: 565645

                  COUNT_FREE: 565645

   SUM_NUMBER_OF_BYTES_ALLOC: 19475083680

    SUM_NUMBER_OF_BYTES_FREE: 19475083680

              LOW_COUNT_USED: 0

          CURRENT_COUNT_USED: 0

             HIGH_COUNT_USED: 2

    LOW_NUMBER_OF_BYTES_USED: 0

CURRENT_NUMBER_OF_BYTES_USED: 0

   HIGH_NUMBER_OF_BYTES_USED: 81984

Now we have the complete details of the user and its thread id. Let’s see what sort of queries are being executed by this thread.

select * from events_statements_history where THREAD_ID=84 order by SORT_SCAN desc\G

*************************** 1. row ***************************

              THREAD_ID: 84

               EVENT_ID: 48091828

           END_EVENT_ID: 48091833

             EVENT_NAME: statement/sql/select

                 SOURCE: init_net_server_extension.cc:95

            TIMER_START: 145083499054314000

              TIMER_END: 145083499243093000

             TIMER_WAIT: 188779000

              LOCK_TIME: 1000000

               SQL_TEXT: SELECT c FROM sbtest2 WHERE id BETWEEN 5744223 AND 5744322 ORDER BY c

                 DIGEST: 4f764af1c0d6e44e4666e887d454a241a09ac8c4df9d5c2479f08b00e4b9b80d

            DIGEST_TEXT: SELECT `c` FROM `sbtest2` WHERE `id` BETWEEN ? AND ? ORDER BY `c`

         CURRENT_SCHEMA: sysbench

            OBJECT_TYPE: NULL

          OBJECT_SCHEMA: NULL

            OBJECT_NAME: NULL

  OBJECT_INSTANCE_BEGIN: NULL

            MYSQL_ERRNO: 0

      RETURNED_SQLSTATE: NULL

           MESSAGE_TEXT: NULL

                 ERRORS: 0

               WARNINGS: 0

          ROWS_AFFECTED: 0

              ROWS_SENT: 14

          ROWS_EXAMINED: 28

CREATED_TMP_DISK_TABLES: 0

     CREATED_TMP_TABLES: 0

       SELECT_FULL_JOIN: 0

 SELECT_FULL_RANGE_JOIN: 0

           SELECT_RANGE: 1

     SELECT_RANGE_CHECK: 0

            SELECT_SCAN: 0

      SORT_MERGE_PASSES: 0

         SORT_RANGE: 0

              SORT_ROWS: 14

          SORT_SCAN: 1

          NO_INDEX_USED: 0

     NO_GOOD_INDEX_USED: 0

       NESTING_EVENT_ID: NULL

     NESTING_EVENT_TYPE: NULL

    NESTING_EVENT_LEVEL: 0

           STATEMENT_ID: 49021382

               CPU_TIME: 185100000

       EXECUTION_ENGINE: PRIMARY

I have pasted only one record here, ordered by SORT_SCAN (a sort performed via a table scan), but you may find other similar queries in your case and can then try to optimize them, either by creating an index or with some other suitable solution.
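A natural next step, once you have the offending SQL_TEXT, is to inspect its execution plan; for example, for the statement shown above (run against the sysbench schema):

explain select c from sysbench.sbtest2 where id between 5744223 and 5744322 order by c\G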

Example Two 

Let’s try to find out the situation with table locking, i.e., which lock (read lock, write lock, etc.) has been placed on a user table and for what duration (displayed in picoseconds).

Lock a table with a write lock:

mysql> lock tables sbtest2 write;

Query OK, 0 rows affected (0.00 sec)

mysql> show processlist;

+----+--------+---------------------+--------------------+-------------+--------+-----------------------------------------------------------------+------------------+-----------+-----------+---------------+

| Id | User   | Host                | db                 | Command     | Time   | State                                                           | Info             | Time_ms   | Rows_sent | Rows_examined |

+----+--------+---------------------+--------------------+-------------+--------+-----------------------------------------------------------------+------------------+-----------+-----------+---------------+

|  8 | repl   | 10.11.139.171:53860 | NULL               | Binlog Dump | 421999 | Source has sent all binlog to replica; waiting for more updates | NULL             | 421998368 |         0 |             0 |

|  9 | repl   | 10.11.223.98:51212  | NULL               | Binlog Dump | 421998 | Source has sent all binlog to replica; waiting for more updates | NULL             | 421998262 |         0 |             0 |

| 25 | sbuser | 10.11.54.152:38060  | sysbench           | Sleep       |  65223 |                                                                 | NULL             |  65222573 |         0 |             1 |

| 26 | sbuser | 10.11.54.152:38080  | sysbench           | Sleep       |  65222 |                                                                 | NULL             |  65222177 |         0 |             1 |

| 27 | sbuser | 10.11.54.152:38090  | sysbench           | Sleep       |  65223 |                                                                 | NULL             |  65222438 |         0 |             0 |

| 28 | sbuser | 10.11.54.152:38096  | sysbench           | Sleep       |  65223 |                                                                 | NULL             |  65222489 |         0 |             1 |

| 29 | sbuser | 10.11.54.152:38068  | sysbench           | Sleep       |  65223 |                                                                 | NULL             |  65222527 |         0 |             1 |

| 45 | root   | localhost           | performance_schema | Sleep       |   7722 |                                                                 | NULL             |   7722009 |        40 |           348 |

| 46 | root   | localhost           | performance_schema | Sleep       |   6266 |                                                                 | NULL             |   6265800 |        16 |          1269 |

| 47 | root   | localhost           | performance_schema | Sleep       |   4904 |                                                                 | NULL             |   4903622 |         0 |            23 |

| 48 | root   | localhost           | performance_schema | Sleep       |   1777 |                                                                 | NULL             |   1776860 |         0 |             0 |

| 54 | root   | localhost           | sysbench           | Sleep       |    689 |                                                                 | NULL             |    688740 |         0 |             1 |

| 58 | root   | localhost           | NULL               | Sleep       |     44 |                                                                 | NULL             |     44263 |         1 |             1 |

| 59 | root   | localhost           | sysbench           | Query       |      0 | init                                                            | show processlist |         0 |         0 |             0 |

+----+--------+---------------------+--------------------+-------------+--------+-----------------------------------------------------------------+------------------+-----------+-----------+---------------+

Now, think of a situation wherein you are not aware of this session: you try to read this table and thus end up waiting on metadata locks. In this situation, we need to take the help of the lock-related instruments, i.e., wait/lock/table/sql/handler (table_handles is the table that exposes the table lock instrumentation), to find out which session is locking this table:

mysql> select * from table_handles where object_name='sbtest2' and OWNER_THREAD_ID is not null;

+-------------+---------------+-------------+-----------------------+-----------------+----------------+---------------+----------------+

| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | OBJECT_INSTANCE_BEGIN | OWNER_THREAD_ID | OWNER_EVENT_ID | INTERNAL_LOCK | EXTERNAL_LOCK  |

+-------------+---------------+-------------+-----------------------+-----------------+----------------+---------------+----------------+

| TABLE       | sysbench      | sbtest2     |       140087472317648 |             141 |             77 | NULL          | WRITE EXTERNAL |

+-------------+---------------+-------------+-----------------------+-----------------+----------------+---------------+----------------+

mysql> select * from metadata_locks;

+---------------+--------------------+------------------+-------------+-----------------------+----------------------+---------------+-------------+-------------------+-----------------+----------------+

| OBJECT_TYPE   | OBJECT_SCHEMA      | OBJECT_NAME      | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE            | LOCK_DURATION | LOCK_STATUS | SOURCE            | OWNER_THREAD_ID | OWNER_EVENT_ID |

+---------------+--------------------+------------------+-------------+-----------------------+----------------------+---------------+-------------+-------------------+-----------------+----------------+

| GLOBAL        | NULL               | NULL             | NULL        |       140087472151024 | INTENTION_EXCLUSIVE  | STATEMENT     | GRANTED     | sql_base.cc:5534  |             141 |             77 |

| SCHEMA        | sysbench           | NULL             | NULL        |       140087472076832 | INTENTION_EXCLUSIVE  | TRANSACTION   | GRANTED     | sql_base.cc:5521  |             141 |             77 |

| TABLE         | sysbench           | sbtest2          | NULL        |       140087471957616 | SHARED_NO_READ_WRITE | TRANSACTION   | GRANTED     | sql_parse.cc:6295 |             141 |             77 |

| BACKUP TABLES | NULL               | NULL             | NULL        |       140087472077120 | INTENTION_EXCLUSIVE  | STATEMENT     | GRANTED     | lock.cc:1259      |             141 |             77 |

| TABLESPACE    | NULL               | sysbench/sbtest2 | NULL        |       140087471954800 | INTENTION_EXCLUSIVE  | TRANSACTION   | GRANTED     | lock.cc:812       |             141 |             77 |

| TABLE         | sysbench           | sbtest2          | NULL        |       140087673437920 | SHARED_READ          | TRANSACTION   | PENDING     | sql_parse.cc:6295 |             142 |             77 |

| TABLE         | performance_schema | metadata_locks   | NULL        |       140088117153152 | SHARED_READ          | TRANSACTION   | GRANTED     | sql_parse.cc:6295 |             143 |            970 |

| TABLE         | sysbench           | sbtest1          | NULL        |       140087543861792 | SHARED_WRITE         | TRANSACTION   | GRANTED     | sql_parse.cc:6295 |             132 |            156 |

+---------------+--------------------+------------------+-------------+-----------------------+----------------------+---------------+-------------+-------------------+-----------------+----------------+

From here, we know that thread id 141 is holding the lock SHARED_NO_READ_WRITE on sbtest2, and thus we can take a corrective step, i.e., either commit the session or kill it once we understand whether it is still needed. To kill it, we need to find the respective processlist id from the threads table.
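For example, a lookup like the following maps the performance_schema thread id to the connection id that KILL accepts; in this session, thread 141 corresponds to processlist id 63, which is what I kill next:

select THREAD_ID, PROCESSLIST_ID, PROCESSLIST_USER from threads where THREAD_ID=141;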

mysql> kill 63;

Query OK, 0 rows affected (0.00 sec)

mysql> select * from table_handles where object_name='sbtest2' and OWNER_THREAD_ID is not null;

Empty set (0.00 sec)

Example Three 

In some situations, we need to find out where our MySQL server is spending most of its time waiting so that we can take further steps:

mysql> select * from events_waits_history order by TIMER_WAIT desc limit 2\G

*************************** 1. row ***************************

            THREAD_ID: 88

             EVENT_ID: 124481038

         END_EVENT_ID: 124481038

           EVENT_NAME: wait/io/file/sql/binlog

               SOURCE: mf_iocache.cc:1694

          TIMER_START: 356793339225677600

            TIMER_END: 420519408945931200

           TIMER_WAIT: 63726069720253600

                SPINS: NULL

        OBJECT_SCHEMA: NULL

          OBJECT_NAME: /var/lib/mysql/mysqld-bin.000009

           INDEX_NAME: NULL

          OBJECT_TYPE: FILE

OBJECT_INSTANCE_BEGIN: 140092364472192

     NESTING_EVENT_ID: 124481033

   NESTING_EVENT_TYPE: STATEMENT

            OPERATION: write

      NUMBER_OF_BYTES: 683

                FLAGS: NULL

*************************** 2. row ***************************

            THREAD_ID: 142

             EVENT_ID: 77

         END_EVENT_ID: 77

           EVENT_NAME: wait/lock/metadata/sql/mdl

               SOURCE: mdl.cc:3443

          TIMER_START: 424714091048155200

            TIMER_END: 426449252955162400

           TIMER_WAIT: 1735161907007200

                SPINS: NULL

        OBJECT_SCHEMA: sysbench

          OBJECT_NAME: sbtest2

           INDEX_NAME: NULL

          OBJECT_TYPE: TABLE

OBJECT_INSTANCE_BEGIN: 140087673437920

     NESTING_EVENT_ID: 76

   NESTING_EVENT_TYPE: STATEMENT

            OPERATION: metadata lock

      NUMBER_OF_BYTES: NULL

                FLAGS: NULL

2 rows in set (0.00 sec)

In the above example, the binary log file has waited the longest (TIMER_WAIT, in picoseconds) to perform I/O operations on mysqld-bin.000009. This may happen for several reasons, for example, the storage being full. The second record shows the details of example two, which I explained previously.
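Note that events_waits_history only keeps the most recent waits per thread. For a cumulative, server-wide view of where MySQL waits the most, a sketch against the standard global summary table looks like this:

select EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT from events_waits_summary_global_by_event_name where COUNT_STAR > 0 order by SUM_TIMER_WAIT desc limit 5;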

What else?

To make monitoring these instruments more convenient, Percona Monitoring and Management (PMM) plays an important role. For example, see the snapshots below.

Percona Monitoring and Management

We can configure almost all of these instruments, and instead of querying, we can simply use these graphs. To get familiar with them, check the PMM demo.

Obviously, knowing about the performance schema helps us a lot, but enabling all instruments incurs additional cost and impacts performance. Hence, in many cases, Percona Toolkit is helpful without impacting database performance, for example, pt-index-usage, pt-online-schema-change, and pt-query-digest.
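For instance, a typical pt-query-digest run against the slow query log (the log path here is an assumption for illustration) summarizes the most expensive query classes without enabling any extra instrumentation:

pt-query-digest /var/lib/mysql/mysqld-slow.log > slow-digest.txt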

Some important points

  1. History tables are populated after a short while, not instantly; a thread’s activity appears only once it completes.
  2. Enabling all instruments may impact the performance of your MySQL server, since we are enabling more writes to these in-memory tables, and the extra memory usage may also add to your budget. Hence, enable instruments as per your requirements only (see the sketch after this list).
  3. PMM covers most of the instruments, and it is also possible to configure more as per your requirements.
  4. You don’t need to remember the names of all the tables. You can just use PMM, or use joins to build the queries. This article breaks the entire concept into smaller chunks and thus didn’t use any joins, so that readers can understand it.
  5. The best method is to enable multiple instruments in a staging environment first, refine your findings, and only then move to production.
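As a sketch of point two, instruments can be enabled selectively at runtime rather than all at once (this change is not persisted across a restart unless also set in the configuration file):

mysql> update performance_schema.setup_instruments set ENABLED='YES', TIMED='YES' where NAME like 'wait/io/file/%';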

Conclusion

The performance schema is a great help while troubleshooting the behavior of your MySQL server; you just need to find out which instrument you need. Should you still be struggling with performance, please don’t hesitate to reach out to us, and we will be more than happy to help you.

Dec
19
2022
--

PMM V2.33: Offline Metric Collection, Guided Alerting Tour, Security Fixes, and More!

We are excited to announce the latest release of Percona Monitoring and Management (PMM) – V2.33. This release includes several new features and improvements that make PMM even more effective and user-friendly. Some of the key highlights of PMM V2.33 include:

  • Offline metric collection during PMM server outages or loss of PMM client-server network connectivity
  • A guided tour of Alerting, which helps new users get up to speed quickly and start using the alerting features of PMM
  • Easily restore your MongoDB databases to a previous state
  • Updated Grafana to version 9.2.5 to fix critical security vulnerabilities
  • Tab completion for the pmm-admin CLI command, which makes it easier to use the command line interface to manage PMM

You can get started using PMM in minutes with our PMM Quickstart guide to check out the latest version of PMM V2.33. 

Client-side caching minimizes potential for metrics loss

This new feature ensures that the PMM Client saves the monitoring data locally when a connection to the PMM server is lost, preventing gaps in the data. When the connection is restored, the data is sent to the PMM server, allowing the monitoring of your systems to continue without any data loss.

Note:

The client node is currently limited to storing only 1 GB of offline data. So, if your instance is down for three days and generates more than 1 GB of data during that time, not all of the data will be retrieved.

One of the core principles of our open source philosophy is transparency, and we are committed to sharing our roadmap openly with our users. We are happy to share the roadmap for the implementation of PMM high availability (HA), a feature highly requested by our users, in three stages.

PMM HA will be rolled out in three stages. Stage one, included in PMM 2.33.0, implements a data loss prevention solution using VictoriaMetrics integration for short outages; this feature is now available in the latest release of PMM. Stages two and three will follow, adding further features and enhancements to provide a complete high availability solution for PMM. We are excited to bring this much-anticipated feature to our users, and we look forward to sharing more details in the coming months.

 

Stages of PMM HA Solutions Provided:

  • Stage one (included in PMM 2.33.0): As an initial step toward preventing data loss, we have developed offline metric collection for short outages.
  • Stage two (to be rolled out in 2023): HA data sources. We will let users use external data sources, thereby decreasing dependency on the file system.
  • Stage three (to be rolled out in 2023): HA clustered PMM servers. Clustered PMM will be the focus of stage three; detailed information will be included in the upcoming release notes.

 

Please feel free to book a 1:1 meeting with us to share your thoughts, needs, and feedback about PMM HA.

Tip: To improve the availability of the PMM Server until the general availability of PMM HA, PMM administrators can deploy it on Kubernetes via the Helm chart. The Kubernetes cluster can help ensure that PMM is available and able to handle different types of failures, such as the failure of a node or the loss of network connectivity.

Critical security vulnerabilities fixed

In PMM 2.33.0, we have updated Grafana to version 9.2.5, which includes important security fixes. This upgrade addresses several critical and moderate vulnerabilities, including CVE-2022-39328, CVE-2022-39307, and CVE-2022-39306. For more details, please see the Grafana 9.2.5 release notes. We strongly recommend that all users upgrade to the latest version of PMM to ensure the security of their systems.

Guided tour on Alerting

In the 2.31.0 release of PMM, we added a new feature called Percona Alerting, which provides a streamlined alerting system. To help users get started with this new feature, we have added a short in-app tutorial that automatically pops up when you first open the Alerting page. This tutorial will guide you through the fundamentals of Percona Alerting, and help you explore the various features and options available. We hope this tutorial will make it easier for users to get started with Percona Alerting and take full advantage of its capabilities.

Restore MongoDB backups more easily

Building upon the significant improvements for MongoDB backup management introduced in the previous release, we are now simplifying the process for restoring physical MongoDB backups. Starting with this release, you can restore physical backups straight from the UI, and PMM will handle the process end-to-end. Prior to this, you would need to perform additional manual steps to restart your MongoDB database service so that your applications could make use of the restored data.

Improvements on the pmm-admin CLI command

pmm-admin is a command-line tool that is used to manage and configure PMM. It is part of the PMM Client toolset and can be used to perform various administrative tasks, such as managing inventory. We have added tab completion for the pmm-admin CLI command. This means that you no longer have to know the entire command when using pmm-admin. Instead, you can simply type the initial part of the command and press Tab, and the rest of the command will be automatically completed for you.  This new feature makes it easier to use the command line interface and ensures that you can quickly and easily access all of the powerful features of PMM. 

What’s next?

  • A Health dashboard for MySQL is on the way. Please share your suggestions in the comments or forum if you’d like to be part of the group shaping PMM. 
  • We have started to work on two new and significant projects: High Availability in PMM and advanced Role-Based Access Control (RBAC). We’d love to hear your needs, use cases, and suggestions. You can quickly book a short call with the product team to collaborate with us. 

Install PMM 2.33 now or upgrade your installation to V2.33 by checking our documentation for more information about upgrading.

Thanks to Community and Perconians

At Percona, we are grateful for our supportive community and dedicated team, who work together to shape the future of PMM. If you would like to be a part of this community, you can join us on our forums to request new features, share your feedback, and ask for support. We value the input of our community and welcome all members to participate in the ongoing development of PMM.

See PMM in action now!

Nov
29
2022
--

PMM, Federated Tables, Table Stats, and Lots of Connections!


Earlier in the year, I was working on an issue where one of my clients reported a massive influx of connections on their hosts after enabling Percona Monitoring and Management (PMM). This was something I had not seen before, and after researching for a couple of days I discovered that if you monitor a MySQL instance with PMM configured to collect table statistics, and the tables it gathers statistics from are Federated, it will generate a connection on the remote host for each Federated table in the instance. Let’s go over the details and provide some examples so we can understand this a bit better.

First, I’ll offer a reminder that a Federated table is simply a table that you can put in your MySQL instance that is empty locally and uses a network connection to get the data from another MySQL host when the table is queried. For example, if I have a normal table called peter_data on host mysql1, I can set up a Federated table on mysql2 that points to mysql1. Each time that mysql2 has a query on the peter_data table, it connects to mysql1, gets the data, and then returns it locally. This feature is a lot less common now than it once was given how MySQL replication has improved over time, but as you can see here in the MySQL reference guide, it’s still supported.

So how does this impact our issue where PMM was establishing so many connections? Let’s set this up in my lab and have a look!

Lab setup

Let’s start by setting up my first host centos7-1 as the “remote host”. This is the host that has the actual data on it.

[root@centos7-1 ~]# mysql
.....
mysql> use ftest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show create table t1 \G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `c1` int(11) NOT NULL,
  `c2` int(11) DEFAULT NULL,
  `c3` int(11) DEFAULT NULL,
  PRIMARY KEY (`c1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select * from t1;
+----+------+------+
| c1 | c2   | c3   |
+----+------+------+
|  1 |    1 |    1 |
|  2 |    2 |    2 |
|  3 |    3 |    3 |
+----+------+------+
3 rows in set (0.00 sec)

Now that’s done, I’ll set up my second host centos7-2 to act as the host that has the Federated table.

[root@centos7-2 ~]# mysql -u root -ppassword
mysql> select * from mysql.servers;
+-------------+-----------+-------+----------+----------+------+--------+---------+-------+
| Server_name | Host      | Db    | Username | Password | Port | Socket | Wrapper | Owner |
+-------------+-----------+-------+----------+----------+------+--------+---------+-------+
| fedlink     | 10.0.2.12 | ftest | root     | password | 3306 |        | mysql   |       |
+-------------+-----------+-------+----------+----------+------+--------+---------+-------+
1 row in set (0.00 sec)

mysql> use ftest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `c1` int(11) NOT NULL,
  `c2` int(11) DEFAULT NULL,
  `c3` int(11) DEFAULT NULL,
  PRIMARY KEY (`c1`)
) ENGINE=FEDERATED DEFAULT CHARSET=latin1 CONNECTION='fedlink/t1'
1 row in set (0.00 sec)

mysql> select * from t1;
+----+------+------+
| c1 | c2   | c3   |
+----+------+------+
|  1 |    1 |    1 |
|  2 |    2 |    2 |
|  3 |    3 |    3 |
+----+------+------+
3 rows in set (0.01 sec)
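For reference, the fedlink entry shown in mysql.servers above can be created with a CREATE SERVER statement; a sketch matching those values (not copied from the original lab setup) would be:

mysql> CREATE SERVER fedlink FOREIGN DATA WRAPPER mysql OPTIONS (HOST '10.0.2.12', DATABASE 'ftest', USER 'root', PASSWORD 'password', PORT 3306);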

Recreating the issue

Now that we have our Federated table set up, let’s test and see how querying table metadata on centos7-2, the instance with the Federated table, impacts connections on centos7-1, the remote host. What I did was connect to centos7-2, query the information_schema.tables table much in the same way that PMM does, disconnect, and then connect a second time and run the same query.

[root@centos7-2 ~]# mysql -u root -ppassword
...
mysql> select * from information_schema.tables where table_schema = 'ftest';
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE    | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME | UPDATE_TIME         | CHECK_TIME | TABLE_COLLATION   | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
| def           | ftest        | t1         | BASE TABLE | FEDERATED |      10 | Fixed      |          3 |           5461 |       16383 |               0 |            0 |         0 |           NULL | NULL        | 1969-12-31 19:33:42 | NULL       | latin1_swedish_ci |     NULL |                |               |
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
1 row in set (0.01 sec)

mysql> exit
Bye

[root@centos7-2 ~]# mysql -u root -ppassword
....
mysql> select * from information_schema.tables where table_schema = 'ftest';
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE    | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME | UPDATE_TIME         | CHECK_TIME | TABLE_COLLATION   | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
| def           | ftest        | t1         | BASE TABLE | FEDERATED |      10 | Fixed      |          3 |           5461 |       16383 |               0 |            0 |         0 |           NULL | NULL        | 1969-12-31 19:33:42 | NULL       | latin1_swedish_ci |     NULL |                |               |
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
1 row in set (0.01 sec)

mysql> exit
Bye

As you can see below, this resulted in two connections on centos7-1 that did not drop despite disconnecting and reconnecting on centos7-2.

mysql> show processlist;
+----+------+-----------------+-------+---------+------+----------+------------------+-----------+---------------+
| Id | User | Host            | db    | Command | Time | State    | Info             | Rows_sent | Rows_examined |
+----+------+-----------------+-------+---------+------+----------+------------------+-----------+---------------+
| 23 | root | localhost       | NULL  | Query   |    0 | starting | show processlist |         0 |             0 |
| 25 | root | 10.0.2.13:33232 | ftest | Sleep   |  112 |          | NULL             |         1 |             1 |
| 27 | root | 10.0.2.13:33236 | ftest | Sleep   |   71 |          | NULL             |         1 |             1 |
+----+------+-----------------+-------+---------+------+----------+------------------+-----------+---------------+
3 rows in set (0.00 sec)

This doesn’t sound like that big of a deal, especially considering that PMM usually remains connected to the host. So if you have one Federated table in your system and PMM is monitoring table stats, it will only add one connection on the remote host, right? That’s true, but in my lab I expanded this to create 145 Federated tables, and as a result, despite only querying the information_schema.tables table, 145 connections were created on centos7-1.

[root@centos7-2 ~]# mysql -u root -ppassword
...
mysql> select * from information_schema.tables where table_schema = 'ftest';
....
145 rows in set (0.08 sec)

[root@centos7-1 ~]# mysql -u root -ppassword
mysql> select * from information_schema.processlist where substring_index(host,':',1) = '10.0.2.13' and user = 'root';
+------+------+------------------+-------+---------+------+-------+------+---------+-----------+---------------+
| ID   | USER | HOST             | DB    | COMMAND | TIME | STATE | INFO | TIME_MS | ROWS_SENT | ROWS_EXAMINED |
+------+------+------------------+-------+---------+------+-------+------+---------+-----------+---------------+
| 2120 | root  | 10.0.2.13:60728 | ftest | Sleep   |    7 |       | NULL |    6477 |         1 |             1 |
| 2106 | root  | 10.0.2.13:60700 | ftest | Sleep   |    7 |       | NULL |    6701 |         1 |             1 |
....
| 2117 | root  | 10.0.2.13:60722 | ftest | Sleep   |    7 |       | NULL |    6528 |         1 |             1 |
| 2118 | root  | 10.0.2.13:60724 | ftest | Sleep   |    7 |       | NULL |    6512 |         1 |             1 |
+------+------+------------------+-------+---------+------+-------+------+---------+-----------+---------------+
145 rows in set (0.00 sec)

This can be a big problem if you have a host that doesn’t support a lot of connections and you need those connections to be available for your app!

Conclusion

Based on the lab testing above, we can see how PMM queries against the information_schema.tables table can create a large number of connections on a Federated remote host. This probably will not be a problem for most MySQL users, considering that Federated tables aren’t that common, but if you have Federated tables and you’re considering adding PMM monitoring, or any other monitoring that collects table statistics, be warned! The persistence of Federated connections on a remote host is not a bug; this is how it’s supposed to behave.

If you have Federated tables and want to avoid this problem, make sure you use the --disable-tablestats flag when adding MySQL to your local PMM client with the pmm-admin add mysql command.
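For example (the service name, credentials, and address below are placeholders, not values from this lab):

pmm-admin add mysql --disable-tablestats --username=pmm --password=<password> mysql-prod 127.0.0.1:3306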

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Nov
15
2022
--

PMM v2.32: Backup Management for MongoDB in GA, New Home Dashboard, and More!


We are pleased to announce the general availability of Backup Management for MongoDB, along with other improvements, in Percona Monitoring and Management (PMM) v2.32, released in November 2022. Details are in this blog and also in the PMM 2.32 Release Notes.

PMM now ships with a new Home Dashboard where you can quickly and easily check your databases’ health at a glance and detect anomalies. While there’s no one-size-fits-all approach, we created and released the new Home Dashboard to make it more user-friendly, even for users new to PMM.

You can get started using PMM in minutes with PMM Demo to check out the latest version of PMM V2.32.

Let’s have a look at the highlights of PMM 2.32:

General availability of Backup Management for MongoDB

The Backup Management for MongoDB in PMM has reached General Availability and is no longer in Technical Preview.

Supported setups

MongoDB Backup Management now supports replica set setups for the following actions:

  • Create logical snapshot backups
  • Create logical Point In Time Recovery (PITR) backups
  • Create physical snapshot backups. This is available only with Percona Server for MongoDB
  • Restore logical snapshot backups.
  • Restore physical backups. This requires additional manual operations after the restore and is only available with Percona Server for MongoDB.
  • Restore logical PITR backups from S3

Current limitations

  • Restoring logical PITR backups only supports S3 storage type
  • Restoring physical backups requires manual post-restore actions
  • Restoring a MongoDB backup on a new cluster is not yet supported
  • Restoring physical backups for containerized MongoDB setups is not supported
  • Local storage for MongoDB is not supported
  • Sharded cluster setups are not supported
  • Backups that are stored locally cannot be removed automatically
  • Retention is not supported for PITR artifacts

 

Quicker and easier database health overview with the new Home Dashboard

As mentioned and promised in previous release notes, we were investigating better approaches, methods, and user-friendly presentation of database health in the Home Dashboard, which is also the entry point to PMM. Finally, we are proud to release this finalized dashboard as the new Home Dashboard. Thank you for your feedback and collaboration during all iterations of the experimental versions.

Monitor hundreds of nodes without any performance issues

If you have hundreds of nodes monitored by the same PMM instance, the original Home Dashboard may have taken a long time to load and could even become unresponsive, because its design repeated panels for each node. With performance in mind, we re-designed the Home Dashboard with new logic that shows what is wrong or OK with your databases, instead of showing every metric for each node.

PMM Home Dashboard: selecting multiple nodes

Anomaly detection

Many of you probably use dozens of tools for different purposes in your daily work, meetings, and projects. These tools should make your life easier, not more intense. With monitoring tools, the sheer number of metrics can be daunting, so analyzing data and detecting anomalies that deviate from a database’s normal behavior should be easy and fast. Functional Anomaly Detection panels, as opposed to separate graphs for each node, are a much better way to visualize and recognize problems with your databases that may require action to be taken.

  • If you see an anomaly, you can click the node name on the panel to open the Node Overview Dashboard for that node, where you can see all the metrics you need to diagnose the problem.
  • All panels except Advisor Checks can be filtered by node and environment variables.
  • Graphs in the Anomaly Detection row show data for the top 20 nodes, e.g., CPU anomalies in the top 20.

PMM Anomaly Detection panels

Command Center panels

The primary motivation behind the new Home Dashboard is simplicity. While working on the new design, it was always hard to balance presenting the required metrics for everyone while keeping it clean, functional, and simple. So we decided to use Command Center panels, which are collapsed by default. If you see an anomaly such as memory usage above 90%, how do you know when it happened or started? The time-series graphs for the top 20 nodes in the Command Center panels help you see when the anomalies occurred: in the last hour, or over the last week.

PMM Command Center panels on the Home Dashboard

Enhanced main menu

We returned with two improvements we previously promised, announced back in v2.31, for easier access to dashboards from the Main Menu. After the last changes, with every possible monitored service type represented on the Main Menu as an icon, the menu had become crowded. In the latest version, you’ll only see the icons of currently monitored services on the Main Menu. For example, if you’re monitoring MongoDB, you will see the MongoDB dashboard’s icon on the main menu, as opposed to previous versions, which showed all database types PMM is capable of monitoring, whether you had them in your system or not. When and if you start to monitor other services, like MySQL, they will be automatically added to the Main Menu.

Another improvement on the Main Menu is the visibility of all other dashboards. PMM provides multiple dashboards with different levels of information for each service, but only some of them appear in the main menu; the rest live in folders. Users can miss the dashboards that are not presented in the Main Menu, and custom dashboards created by other users in your organization can remain invisible to you until you find them in the folders by chance. So, we added Other Dashboards links to the sub-menu of each service, so that you can easily click through and see all dashboards in the service’s folder.

Quick access to other dashboards from the menu

What’s next?

  • We’ll improve the Vacuum Dashboard with more metrics. If you’d like to enhance it with us, you can share your feedback in the comments.
  • A health dashboard for MySQL is on the way. Please share your suggestions in the comments or forum if you’d like to be part of the group shaping PMM. 
  • We have started to work on two new and significant projects: High Availability in PMM and advanced Role-Based Access Control (RBAC). We’d love to hear your needs, use cases, and suggestions. You can quickly book a short call with the product team to collaborate with us. 
  • For Backup Management, we are planning to continue to iterate on the current limitations listed above and make the restore processes as seamless as possible for all database types.

Install PMM 2.32 now or upgrade your installation to V2.32 by checking our documentation for more information about upgrading.

Learn more about Percona Monitoring and Management 2.32

Thanks to Community and Perconians

We love our community and our team at Percona, who together shape the future of PMM and help us with all these changes.

You can also join us on our community forums to request new features, share your feedback, and ask for support.

Thank you for your collaboration on the new Home Dashboards:

Cihan Tunalı @SmartMessage

Tyson McPherson @Parts Authority

Paul Migdalen @IntelyCare

Sep
28
2022
--

PMM v2.31: Enhanced Alerting, User-Friendly Main Menu, Prometheus Query Builder, Podman GA, and more!


Autumn brought cool new features and improvements to Percona Monitoring and Management (PMM) in v2.31. An enhanced user experience with the updated main menu, Alerting, and better PostgreSQL autovacuum observability with the new Vacuum dashboard are the major themes we focused on in this release. Check out the Release Notes of 2.31 for the full list of new features, enhancements, and bug fixes.

You can get started with PMM in minutes with the PMM Demo to check out the latest version of PMM V2.31.

Some of the highlights in PMM V2.31 include:

General availability of Percona Alerting

We are excited to introduce a streamlined alert setup process in PMM with an overhauled, unified alerting system based on Grafana. 

All Alerting functionality is now consolidated in a single pane of glass on the Alerting page. From here, you can configure, create and monitor alerts based on Percona or Grafana templates. 

The Alert Rules tab has also evolved into a more intuitive interface with an added layer for simplifying complex Grafana rules. You’ll find that the new Percona templated alert option here offers the same functionality available in the Tech Preview of Integrated Alerting but uses a friendlier interface with very advanced alerting capabilities. 

Importantly, this new Alerting feature is now enabled by default and ready to use in production!

For more information about Percona Alerting, check out Alerting doc.

Deprecated Integrated Alerting

The new Percona Alerting feature fully replaces the old Integrated Alerting Tech Preview available in previous PMM versions. The new alerting brings full feature parity with Integrated Alerting, along with additional benefits like Grafana-based alert rules and a unified alerting command center.

However, alert rules created with Integrated Alerting are not automatically migrated to Percona Alerting. After upgrading, make sure to manually migrate any custom alert rules that you want to transfer to PMM 2.31 using the script.

Easier query building, enhanced main menu in PMM 2.31

We have powered up PMM with Grafana 9.1, drawing on its latest features and improvements. Here is the list of features and enhancements shipped in this release:

Redesigned expandable main menu (side menu)

With the 2.31 release, we introduce a more user-friendly and accessible main menu inspired by Grafana’s expandable side menu. PMM dashboards are the heart of monitoring, and we aimed to provide quick and easy access to the most frequently used dashboards from the main menu. From this menu, you can browse dashboards with one click, such as Operating System, MySQL, MongoDB, PostgreSQL, etc.

PMM new side menu

Pin your favorite dashboards to the main menu (side menu)

PMM provides many custom dashboards with dozens of metrics to monitor your databases. Most users in an organization use just a handful of dashboards regularly; it is now much easier to access them by saving the most frequently used dashboards to the main menu. You will see your saved dashboards under the Starred section on the main menu.

This feature is enabled by default in PMM. You can disable it by turning off the savedItems feature flag if you have the server admin or Grafana admin role.


Tip:

You can follow these steps to add your dashboard to Starred on the main menu:

  1. Open your dashboard
  2. Mark it by clicking the star icon next to the Dashboard name on the top right corner
  3. Hover over the Starred icon on the main menu to see all your saved dashboards.

Search dashboards on Panel titles

While looking for a specific metric or panel, it is easy to forget which dashboard contains it. Now search also matches panel titles, so you can quickly find the dashboard you need on the Search dashboards page.


Command palette

A new shortcut, named the “command palette” in Grafana, is available in this PMM version. You can easily access main menu sections, dashboards, or other tasks using Cmd+K (macOS) or Ctrl+K (Linux/Windows). Run it in the Explore section to quickly run a query, or in the Preferences section to easily change theme preferences.

Command Palette

Visual Prometheus Query Builder in Explore (Beta)

A new Prometheus query builder was introduced in Grafana 9. This feature allows everyone, especially new users, to build PromQL queries without extensive expertise. The visual query builder UI in Explore lets anyone write queries and understand what a query means.

You can easily switch to the new Prometheus query builder by clicking Builder mode in the top-right corner. Builder mode allows you to build your queries by choosing the metric from a dropdown menu. If you want to continue in text mode, you can switch to Code mode with your text changes preserved. Please check this blog to learn more about Builder mode.

Visual Prometheus Query Builder in Explore (Beta)

Add your queries to a dashboard or create a new dashboard from Explore

You’ll probably like this news if you’re a fan of the Explore feature or use it frequently. It is now possible to create a panel or dashboard from Explore with one click, saving you from copy-pasting or re-writing queries. You only need to click the “Add to dashboard” button after you run your query. Your panel will then be created automatically with the query and a default visualization, which you can change later by editing the panel on the dashboard. Note that you need the Editor/Admin/SuperAdmin role to save the panel to the chosen dashboard, and you must follow the usual dashboard save flow to keep the added panel; otherwise, you’ll lose the newly added panel on your dashboard.

Add your queries to a dashboard or create a new dashboard from Explore

Experimental Vacuum Dashboard

The autovacuum process in PostgreSQL is designed to prevent table bloat by removing dead tuples. These dead tuples can accumulate because of the unique way that PostgreSQL handles MVCC. Because PostgreSQL’s architecture is so unique, the autovacuum process is sometimes not understood well enough to be able to tune its parameters for peak performance. After talking to many customers and realizing that this is a recurring problem, we decided to help our users by providing a dashboard that allows you to monitor metrics related to the vacuum process – thereby helping you tune these parameters for better performance.

Now, you can monitor PostgreSQL vacuum processes with a new experimental dashboard named PostgreSQL Vacuum Monitoring which is available in the Experimental folder. We’re still working on this dashboard to add more metrics. Please let us know your feedback about this dashboard in the comments.

Experimental Vacuum Dashboard

Tip

If you’d like to move the Vacuum experimental dashboard to the PostgreSQL folder or other folders that you internally use to gather all PostgreSQL dashboards, please check this document to see how you can move dashboards to a different folder.

General availability of Podman

We are excited to announce the General Availability (GA) of Podman support for deploying PMM in 2.31.0. We introduced it in 2.29.0 as a preview feature, and we are now production-ready with this feature.

Simplified deployment with Database as a Service (DBaaS)

In our constant endeavor and focus on an enhanced user experience, in PMM 2.31.0, we have simplified the deployment and configuration of DBaaS as follows:

  • With PMM 2.31.0, you can easily add a DB cluster from a newly created K8s cluster. All the DB cluster window fields are auto-populated with the values based on the existing K8s cluster. 
  • For PMM 2.31.0, while accessing DBaaS, if you have an existing Kubernetes cluster configured for DBaaS, you will be automatically redirected to the DB Cluster page. Otherwise, you will be redirected to the Kubernetes Cluster page.

What’s next?

  • New UX improvements are baking! We’re working on making our main menu easy to use and minimal. In the next release, the main menu will present only monitored services, so you’ll have a clearer and less crowded main menu.
  • The experimental Home dashboard, available in the Experimental folder since v2.30, will replace the current Home dashboard. Please do not forget to share your feedback with us if you have tried it.
  • We’ll improve the vacuum dashboard with more metrics. If you’d like to enhance it with us, you can share your feedback in the comments.
  • We have started to work on two new and big projects: High Availability in PMM and advanced role-based access control (RBAC). We’d love to hear your needs, use cases, and suggestions. You can quickly book a short call with the product team to collaborate with us. 

Thanks to Community and Perconians

We love our community and our team at Percona, who help shape and improve PMM!

Thank you for your collaboration on the new main menu:

Pedro Fernandes, Fábio Silva, Matej Kubinec

Thank you for your collaboration on Vacuum Dashboards:

Anton Bystrov, Daniel Burgos, Jiri Ctvrtka, Nailya Kutlubaeva, Umair Shahid

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today
