Jul
13
2022
--

How Percona Monitoring and Management Helps You Find Out Why Your MySQL Server Is Stalling

Percona Monitoring and Management MySQL Server Is Stalling

In this blog, I will demonstrate how to use Percona Monitoring and Management (PMM) to find out the reason why the MySQL server is stalling. I will use only one typical situation for the MySQL server stall in this example, but the same dashboards, graphs, and principles will help you in all other cases.

Nobody wants it, but database servers may stop handling connections at some point. As a result, the application will slow down and then stop responding.

It is always better to know about the stall from a monitoring instrument rather than from your own customers.

PMM is a great help in this case. If you look at its graphs and notice that many of them started showing unusual behavior, you need to react. In the case of a stall, you will see that some activity either dropped to zero or spiked to a high number, and in both cases it stays flat.

Let’s review the dashboard “MySQL Instance Summary” and its graph “MySQL Client Thread Activity” during normal operation:

PMM MySQL Instance Summary

As you see, the number of active threads fluctuates, and this is normal for any healthy application: even if all connections request data, MySQL puts some threads into an idle state while they wait for the storage engine to prepare the data for them, or while the client application processes the retrieved data.

The next screenshot was taken when the server was stalling:

Percona monitoring dashboard

In this picture, you see that the number of active threads is near the maximum. At the same time, the number of “MySQL Temporary Objects” dropped to zero. This by itself shows that something unusual happened, but to understand the picture better let’s examine the storage engine graphs.

Like most MySQL users, I used InnoDB for this example. Therefore, the next step in figuring out what is going on is to examine the graphs in the “MySQL InnoDB Details” dashboard.

MySQL InnoDB Details

First, we see that the number of rows InnoDB reads per second dropped to zero, as did the number of rows written. This means that something is preventing InnoDB from performing its operations.

MySQL InnoDB Details PMM

More importantly, we see that all I/O operations were stopped. This is unusual even on a server that does not handle any user connection: InnoDB always performs background operations and is never completely idle.

Percona InnoDB

You may see this in the “InnoDB Logging Performance” graph: InnoDB still uses log files but only for background operations.

InnoDB Logging Performance

InnoDB Buffer Pool activity is also stopped. What is interesting here is that the number of dirty pages went down to zero. This is visible on the “InnoDB Buffer Pool Data” graph: dirty pages are colored in yellow. This actually shows that InnoDB was able to flush all dirty pages from the buffer pool when InnoDB stopped processing user queries.

At this point, we can draw a first conclusion: our stall was caused by some external lock preventing MySQL and InnoDB from handling user requests.

MySQL and InnoDB

The “Transaction History” graph confirms this guess: there are no new transactions and InnoDB was able to flush all transactions that were waiting in the queue before the stall happened.

We can conclude that we are NOT experiencing hardware issues.

MySQL and InnoDB problems

This group shows why we experienced the stall. As you can see in the “InnoDB Row Lock Wait Time” graph, the wait time rose to its maximum value around 14:02, then dropped to zero. No row lock waits were registered during the stall itself.

This means that at some point all InnoDB transactions were waiting for a row lock, then failed with a timeout. Still, they had to be waiting for something. Since there are no hardware issues and InnoDB is functioning healthily in the background, this means that all threads were waiting for a global metadata lock (MDL) created by the server.

If we have Query Analytics (QAN) enabled, we can find the offending command easily.

Query Analytics (QAN)

For the selected time frame, we can see that many queries were running until a certain time when a query with id 2 was issued; then the other queries stopped running and restarted a few minutes later. The query with id 2 is FLUSH TABLES WITH READ LOCK, which prevents any write activity once the tables are flushed.

This is the command that caused a full server stall.
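If you do not have QAN enabled, you can usually confirm this situation from the command line as well. Below is a minimal sketch, assuming shell access to the server; the “Waiting for global read lock” thread state is standard MySQL behavior, and the performance_schema.metadata_locks table is available in MySQL 5.7 and later (provided the wait/lock/metadata/sql/mdl instrument is enabled):

# Threads blocked by FLUSH TABLES WITH READ LOCK report this state
mysql -e "SHOW FULL PROCESSLIST" | grep "Waiting for global read lock"

# FTWRL holds a GLOBAL metadata lock; blocked writers show up as PENDING
mysql -e "SELECT object_type, lock_type, lock_status, owner_thread_id
          FROM performance_schema.metadata_locks
          WHERE object_type = 'GLOBAL'"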

Once we know the reason for the stall we can perform actions to prevent similar issues in the future.

Conclusion

PMM is a great tool that helps you not only identify that your database server is stalling but also figure out the reason for the stall. I used only one scenario in this blog, but you may use the same dashboards and graphs to find other reasons for a stall, such as a DoS attack, a hardware failure, a high number of I/O operations caused by poorly optimized queries, and many more.

Jul
07
2022
--

Automate the SSL Certificate Lifecycle of your Percona Monitoring and Management Server

SSL Certificate Lifecycle of your Percona Monitoring and Management

We highly value security here at Percona, and in this blog post, we will show how to protect your Percona Monitoring and Management (PMM) Server with an SSL certificate and automate its lifecycle by leveraging a proxy server.

Introduction

As you may know, PMM Server provides a self-signed SSL certificate out-of-the-box to encrypt traffic between the client (web or CLI) and the server. While some people choose to use this certificate in non-critical environments, oftentimes protected by a private network, it is definitely not the best security practice for production environments.

I have to mention that a self-signed certificate still achieves the goal of encrypting the connection. However, here are a few problems you should know about when it comes to self-signed certificates in general:

  • they cannot be verified by a trusted Certificate Authority (CA)
  • they cannot be revoked after a security incident (certificates issued by a trusted CA can be)
  • when used on public websites, they may negatively impact your brand or personal reputation

This is why most modern browsers will show pretty unappealing security warnings when they detect a self-signed certificate. Most of them will prevent users from accidentally opening websites that do not feature a secure connection (have you heard of the thisisunsafe hack for Chrome?). The browser vendors are obviously attempting to raise security awareness amongst the broad internet community.

We highly encourage our users to have their SSL certificates issued by a trusted Certificate Authority.

Some people find it overwhelming to keep track of expiry dates and remember to renew certificates before they expire. Until several years ago, SSL certificates had to be paid for and were quite expensive. Projects like Let’s Encrypt, a non-profit Certificate Authority, were devised to popularize web security by offering SSL certificates free of charge. However, their validity is limited to three months, which is quite short.

We will explore the two most popular reverse proxy tools, which allow you to leverage an SSL certificate for better security of your PMM instance. More importantly, we will cover how to automate the certificate renewal using those tools. While PMM Server is distributed in three flavors (docker, AMI, and OVF), we will focus on docker, our most popular distribution.

All scripts in this post assume basic familiarity and experience with docker and docker compose.

Reverse proxies

While the choice of open source tools is quite abundant these days, we will talk about the two that I consider to be the most popular: nginx and traefik.

In order to try one of the solutions proposed here, you’ll need the following: a public domain name pointing to your server, ports 80 and 443 open to the internet, and docker with docker compose installed.

Let’s take a look at our network diagram. It shows the proxy server standing between PMM Server and its clients. This means that the proxy, not PMM Server, takes care of terminating SSL and encrypting the traffic.

PMM Server and Client Network Diagram

 

Nginx

Nginx came to the market in 2004, which is quite a solid age for any software product. It has gained tremendous adoption since then, and to this day it powers many websites as a reverse proxy. Let’s see what it takes to use Nginx for SSL certificate management.

First, let me remind you where PMM Server stores the self-signed certificates:

% docker exec -t pmm-server sh -c "ls -l /srv/nginx/*.{crt,key,pem}"
total 24
-rw-r--r-- 1 root root 6016 Jun 10 11:24 ca-certs.pem
-rw-r--r-- 1 root root  977 Jun 10 11:24 certificate.crt
-rw-r--r-- 1 root root 1704 Jun 10 11:24 certificate.key
-rw-r--r-- 1 root root  424 Jun 10 11:24 dhparam.pem

We could certainly pass our own certificates by mounting a host directory containing custom certificates or, alternatively, by copying the certificates to the container. However, that would require us to issue the certificates first. What we want is to get the certificates issued automatically. Even though nginx does not offer such functionality out of the box, there are a few open source projects that effectively close that gap.

One such project is nginx-proxy. It is quite mature and stable, boasting 16K GitHub stars. It has a peer project, acme-companion, which takes care of the certificate lifecycle. We will use them both, so ultimately we will end up running three separate containers:

  1. nginx-proxy – the reverse proxy
  2. acme-companion – certificate management component
  3. pmm-server – the proxied container

Prior to launching the containers, the following steps need to be completed:

  1. Choose a root directory for your project.
  2. Create a directory ./nginx in your project root (docker will take care of creating the nested directories).
  3. Create a file ./nginx/proxy.conf. This file is mostly needed to override the default value of client_max_body_size, which happens to be too low, so PMM can properly handle large payloads. The file’s contents should be as follows:
# proxy.conf

server_tokens off;
client_max_body_size 10m;

# HTTP 1.1 support
proxy_http_version 1.1;
proxy_buffering off;
proxy_set_header Host $http_host;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $proxy_connection;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $proxy_x_forwarded_proto;
proxy_set_header X-Forwarded-Ssl $proxy_x_forwarded_ssl;
proxy_set_header X-Forwarded-Port $proxy_x_forwarded_port;

# Mitigate httpoxy attack
proxy_set_header Proxy "";


Note that recent versions of the docker client come with the docker compose sub-command, while earlier versions may require you to additionally install the python-based docker-compose tool.

Here is what docker-compose.yml looks like. You can go ahead and create it while making sure to replace my-domain.org with your own public domain name.

# docker-compose.yml

version: '3'

services:

  nginx:
    image: nginxproxy/nginx-proxy
    container_name: nginx
    environment:
      - DEFAULT_HOST=pmm.my-domain.org
    ports:
      - '80:80'
      - '443:443'
    restart: always
    volumes:
      - ./nginx/proxy.conf:/etc/nginx/proxy.conf:ro
      - ./nginx/vhost.d:/etc/nginx/vhost.d:ro
      - ./nginx/certs:/etc/nginx/certs:ro
      - ./nginx/html:/usr/share/nginx/html
      - ./nginx/dhparam:/etc/nginx/dhparam
      - /var/run/docker.sock:/tmp/docker.sock:ro

  letsencrypt:
    image: nginxproxy/acme-companion
    container_name: acme-companion
    restart: always
    environment:
      - NGINX_PROXY_CONTAINER=nginx
      - DEFAULT_EMAIL=admin@my-domain.org
    depends_on:
      - nginx
    volumes:
      - ./nginx/vhost.d:/etc/nginx/vhost.d
      - ./nginx/certs:/etc/nginx/certs
      - ./nginx/html:/usr/share/nginx/html
      - /var/run/docker.sock:/var/run/docker.sock:ro

  pmm-server:
    container_name: pmm-server
    image: percona/pmm-server:2
    restart: unless-stopped
    environment:
      - VIRTUAL_HOST=pmm.my-domain.org
      - LETSENCRYPT_HOST=pmm.my-domain.org
      - VIRTUAL_PORT=80
      - DISABLE_TELEMETRY=0
      - ENABLE_DBAAS=1
      - ENABLE_BACKUP_MANAGEMENT=1
      - ENABLE_ALERTING=1
      - GF_ANALYTICS_CHECK_FOR_UPDATES=false
      - GF_AUTH_LOGIN_COOKIE_NAME=pmm_session
      - GF_SECURITY_DISABLE_GRAVATAR=true
    depends_on:
      - nginx
    volumes:
      - pmm-data:/srv
    expose:
      - '80'

volumes:
  pmm-data:
    name: pmm-data
    external: false

Now launch your site by running docker compose up -d. That’s it! If everything works fine, you should see your PMM Server’s login page by visiting https://pmm.my-domain.org. If you type docker ps -a in the terminal, you will see all three containers listed:

% docker ps -a
CONTAINER ID   IMAGE                       COMMAND                  CREATED       STATUS                 PORTS                                                                      NAMES
07979480350c   percona/pmm-server:2        "/opt/entrypoint.sh"     1 hours ago   Up 1 hours (healthy)   80/tcp, 443/tcp                                                            pmm-server
b7ed4cbf1064   nginxproxy/acme-companion   "/bin/bash /app/entr…"   1 hours ago   Up 1 hours                                                                                        acme-companion
d35da441c103   nginxproxy/nginx-proxy      "/app/docker-entrypo…"   1 hours ago   Up 1 hours             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   nginx

Should anything go wrong, you can troubleshoot by checking the container logs; for example, docker logs pmm-server or docker logs acme-companion may be useful to explore.
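Once the containers are up, it is also worth confirming from the command line that the proxy serves the CA-issued certificate rather than the self-signed one. Here is a quick sketch with openssl (replace pmm.my-domain.org with your own domain):

% openssl s_client -connect pmm.my-domain.org:443 -servername pmm.my-domain.org </dev/null 2>/dev/null | openssl x509 -noout -issuer -dates

The issuer should be Let’s Encrypt, and the notAfter date tells you when the next automatic renewal is due.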

Restrict access to PMM

Sometimes you need to restrict access to known IPs or networks. Thankfully, nginx-proxy allows you to configure that easily. All you need to do is:

  • create a text file with the name of your domain — for example, pmm.my-domain.org — and save it to ./nginx/vhost.d/
    # Contents of pmm.my-domain.org
    
    # Allow traffic from known IP addresses or networks
    allow 127.0.0.0/8;  # internal network
    allow 100.1.1.0/20; # another network   
    allow 219.24.87.93; # just one IP address
    
    # Reject traffic from all other IPs and networks
    deny all;
  • finally, restart the container with docker restart pmm-server.

From now on, if someone tries to access your PMM Server from an IP address that is not in the allow list, they should get the error HTTP 403 Forbidden.
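You can verify the restriction from a host that is not in the allow list, for example with curl (the -k flag only skips certificate verification for this test):

% curl -sk -o /dev/null -w '%{http_code}\n' https://pmm.my-domain.org
403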

Test the connection security

While protecting your connection with an SSL certificate is a good idea, there are a whole lot of other security concerns that the certificate alone does not address. There are a few useful web services out there that allow you to not only verify your certificate but also perform a number of other vital checks to test the connection, such as sufficient cipher strength, use of outdated security protocols, vulnerability to known SSL attacks, etc.

Qualys SSL Labs is one such service, and all it takes to review the certificate is to go to their web page, enter the domain name of your site, and let them do the rest. After a few minutes, you should see a scan report like this:

Voilà! We have achieved a very high rating of connection security without really spending much time! That’s the magic of the tool.

Prior to using the Nginx proxy, I could never achieve the same result on the first attempt, and the best score I could get was “B”. It took me time to google what every warning in the scan report meant, and even more time to find a proper solution and apply it to my configuration.

To wrap it up: if you keep your version of Nginx and its companion updated, it will save you a ton of time and also keep your TLS compliant with the highest security standards. And yes, certificate rotation before expiry is guaranteed.

Traefik

Traefik is a more recent product. It was first released in 2016 and has gained substantial momentum since then.

Traefik is super popular as a proxy server and a load balancer for docker and Kubernetes alike. Many claim Traefik to be more powerful and flexible than Nginx, but also a bit more difficult to set up. Unlike Nginx, Traefik can manage the certificate lifecycle without additional tools, which is why we will launch only two containers: one for PMM Server and one for the Traefik proxy.

Basic configuration

To get started with a very minimal Traefik configuration, please follow the steps below.

  • choose a root directory for your project
  • create a basic docker-compose.yml in the root folder with the following contents:
    version: '3'
    
    services:
    
      traefik:
        image: traefik:v2.7
        container_name: traefik
        restart: always
        command:
          - '--log.level=DEBUG'
          - '--providers.docker=true'
          - '--providers.docker.exposedbydefault=false'
          - '--entrypoints.web.address=:80'
          - '--entrypoints.websecure.address=:443'
          - '--certificatesresolvers.le.acme.httpchallenge=true'
          - '--certificatesresolvers.le.acme.httpchallenge.entrypoint=web'
          - '--certificatesresolvers.le.acme.email=postmaster@my-domain.org'
          - '--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json'
        ports:
          - '80:80'
          - '443:443'
        volumes:
          - ./letsencrypt:/letsencrypt
          - /var/run/docker.sock:/var/run/docker.sock:ro
        networks:
          - www
    
      pmm-server:
        image: percona/pmm-server:2
        container_name: pmm-server
        restart: always
        environment:
          - DISABLE_TELEMETRY=0
          - ENABLE_DBAAS=1
          - ENABLE_BACKUP_MANAGEMENT=1
          - ENABLE_ALERTING=1
          - GF_ANALYTICS_CHECK_FOR_UPDATES=false
          - GF_AUTH_LOGIN_COOKIE_NAME=pmm_session
          - GF_SECURITY_DISABLE_GRAVATAR=true
        volumes:
          - pmm-data:/srv
        expose:
          - '80'
        labels:
          - 'traefik.enable=true'
          - 'traefik.http.routers.pmm.rule=Host(`pmm.my-domain.org`)'
          - 'traefik.http.routers.pmm.entrypoints=websecure'
          - 'traefik.http.routers.pmm.tls.certresolver=le'
        networks:
          - www
    
    volumes:
      pmm-data:
        name: pmm-data
        external: false
    
    networks:
      www:
        name: www
        external: false
  • replace my-domain.org with the domain name you own
  • make sure ports 80 and 443 are open on your server and accessible publicly
  • create a directory letsencrypt in the root of your project — it will be mounted to the container so Traefik can save the certificates

Once you launch the project with docker compose up -d, you should be able to go to https://pmm.my-domain.org and see the login page of your PMM Server.

If you run docker ps -a, you will see two containers in the output:

% docker ps -a
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                    PORTS                                                                      NAMES
4a2746c26738   traefik:v2.7           "/entrypoint.sh --pr…"   52 seconds ago   Up 48 seconds             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   traefik
27943a8edfef   percona/pmm-server:2   "/opt/entrypoint.sh"     52 seconds ago   Up 48 seconds (healthy)   80/tcp, 443/tcp                                                            pmm-server

If you face any issues, you can troubleshoot by checking the logs of the Traefik container with docker logs traefik. For obvious performance reasons, we advise you to decrease the log verbosity by setting the log.level parameter to INFO.

Advanced configuration

A closer analysis of the basic configuration reveals that it suffers from a few problems, in particular:

  • Visiting http://pmm.my-domain.org will result in HTTP 404 Not found, i.e. no redirect is configured by default.
  • Qualys SSL report rates our configuration with a “B”, which is insufficient for prod environments. It highlights three issues:
    • the server supports outdated, or insecure, TLS protocols 1.0 and 1.1
      Proxy configuration - B-flat report
    • the server uses quite a few weak ciphers
      Proxy configuration - weak ciphers
    • absence of security headers in proxy response

We will try to address all of the mentioned issues one by one.

  1. Redirect HTTP to HTTPS
    Traefik supports redirections as an HTTP option on the entryPoint. The redirect can be easily achieved by passing the following parameters to Traefik:

    --entrypoints.web.http.redirections.entryPoint.to=websecure
    --entrypoints.web.http.redirections.entryPoint.scheme=https
  2. Avoid using outdated protocols TLS 1.0 and 1.1
  3. Avoid using weak ciphers

    As those two issues both relate to TLS, why not combine the solution for both in one? Traefik offers two types of configuration: static and dynamic. The number of options necessary to address the ciphers can get quite big, which is why we’ll use the dynamic configuration, i.e. put everything in a file rather than inline in our docker-compose.yml file. Therefore, let’s create a file called dynamic.yml and place it in the project root:

    tls:
      options:
        default:
          minVersion: VersionTLS12
          sniStrict: true
          preferServerCipherSuites: true
          cipherSuites:
            - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
            - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
            - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
            - TLS_CHACHA20_POLY1305_SHA256
            - TLS_AES_256_GCM_SHA384
            - TLS_AES_128_GCM_SHA256
          curvePreferences:
            - secp521r1
            - secp384r1

    As you can see here, we limit the TLS version to only the most secure, and the same is true for the cipher suites and curve preferences. The file name can be anything you like, but we do need to point Traefik to the file containing the dynamic configuration. To do this, we will leverage the file provider parameters:

    - '--providers.file=true'
    - '--providers.file.filename=/dynamic.yml'
  4. Add security headers to the proxy response

    The additional security headers, though, will go into the static configuration:

    - 'traefik.http.routers.pmm.middlewares=secureheaders@docker'
    - 'traefik.http.middlewares.secureheaders.headers.forceSTSHeader=true'
    - 'traefik.http.middlewares.secureheaders.headers.stsPreload=true'
    - 'traefik.http.middlewares.secureheaders.headers.stsIncludeSubdomains=true'
    - 'traefik.http.middlewares.secureheaders.headers.stsSeconds=315360000'
    - 'traefik.http.middlewares.secureheaders.headers.contentTypeNosniff=true'
    - 'traefik.http.middlewares.secureheaders.headers.browserXssFilter=true'
    - 'traefik.http.middlewares.secureheaders.headers.frameDeny=true'

     

That should be it for Traefik. We have successfully addressed all security concerns, which is confirmed by a fresh SSL report.

Proxy configuration - A-plus superior report
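If you prefer a command-line spot check over the web report, openssl can confirm that outdated protocols are now rejected. Note that this is only a sketch: some modern openssl builds have TLS 1.1 support compiled out entirely, in which case the first command fails for a different reason.

% openssl s_client -connect pmm.my-domain.org:443 -tls1_1 </dev/null
# expect a handshake failure

% openssl s_client -connect pmm.my-domain.org:443 -tls1_2 </dev/null 2>/dev/null | grep Protocol
# expect: Protocol  : TLSv1.2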

For your convenience, I will put the final version of the yml file below.

docker-compose.yml

version: '3'

services:
  traefik:
    image: traefik:v2.7
    container_name: traefik
    hostname: traefik
    command:
      - '--log.level=DEBUG'
      - '--providers.docker=true'
      - '--providers.docker.exposedbydefault=false'
      - '--providers.file=true'
      - '--providers.file.filename=/dynamic.yml'
      - '--entrypoints.web.address=:80'
      - '--entrypoints.web.http.redirections.entryPoint.to=websecure'
      - '--entrypoints.web.http.redirections.entryPoint.scheme=https'
      - '--entrypoints.websecure.address=:443'
      - '--certificatesresolvers.le.acme.httpchallenge=true'
      - '--certificatesresolvers.le.acme.httpchallenge.entrypoint=web'
      - '--certificatesresolvers.le.acme.email=postmaster@my-domain.org'
      - '--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json'
    ports:
      - '80:80'
      - '443:443'
    volumes:
      - ./letsencrypt:/letsencrypt
      - ./dynamic.yml:/dynamic.yml
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - www

  pmm-server:
    image: percona/pmm-server:2
    container_name: pmm-server
    restart: always
    environment:
      - DISABLE_TELEMETRY=0
      - ENABLE_DBAAS=1
      - ENABLE_BACKUP_MANAGEMENT=1
      - ENABLE_ALERTING=1
      - GF_ANALYTICS_CHECK_FOR_UPDATES=false
      - GF_AUTH_LOGIN_COOKIE_NAME=pmm_session
      - GF_SECURITY_DISABLE_GRAVATAR=true
    volumes:
      - pmm-data:/srv
    expose:
      - '80'
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.pmm.rule=Host(`pmm.my-domain.org`)'
      - 'traefik.http.routers.pmm.entrypoints=websecure'
      - 'traefik.http.routers.pmm.tls.certresolver=le'
      - 'traefik.http.routers.pmm.middlewares=secureheaders@docker'
      - 'traefik.http.middlewares.secureheaders.headers.forceSTSHeader=true'
      - 'traefik.http.middlewares.secureheaders.headers.stsPreload=true'
      - 'traefik.http.middlewares.secureheaders.headers.stsIncludeSubdomains=true'
      - 'traefik.http.middlewares.secureheaders.headers.stsSeconds=315360000'
      - 'traefik.http.middlewares.secureheaders.headers.contentTypeNosniff=true'
      - 'traefik.http.middlewares.secureheaders.headers.browserXssFilter=true'
      - 'traefik.http.middlewares.secureheaders.headers.frameDeny=true'
    networks:
      - www

volumes:
  pmm-data:
    name: pmm-data
    external: false

networks:
  www:
    name: www
    external: false

Some security aspects

Docker socket

I’m sure you noticed the following section in docker-compose.yml:

volumes:
  - '/var/run/docker.sock:/var/run/docker.sock:ro'

You may say it’s not super secure to pass the docker socket to one or more containers, right? True, but this is a hard requirement if we want to automate things. It turns out that both the proxy server and its companion are doing many things under the hood to make it happen, so they need access to the docker runtime to be able to listen to docker events and react to them.

Consider this example: if you add a new service (or site, in our case) to the same or a different docker-compose.yml and then launch it with docker compose up -d, the proxy will detect this event and request a new certificate from a Certificate Authority, or pick it up from storage if it has already been issued. Only then will it be able to pass the certificate to your service. This kind of low-level control over the docker runtime is what makes it possible for Nginx or Traefik to manipulate other containers. The mission would be impossible without the socket.
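You can watch this mechanism in action: the stream below shows the container lifecycle events the proxies subscribe to, and starting a new service in another terminal will produce a corresponding event:

% docker events --filter type=container --filter event=start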

Oftentimes, when developing apps or services, engineering teams will put all their services (backend, frontend, and database) in one docker-compose.yml, which is super convenient as it can be shared within the team. However, you will certainly want to isolate prod from non-prod environments. For instance, if your database server is also deployed in a docker container, it should live in a different, much more hardened environment than this.

The use of port 80

It may appear weird to mention port 80, the standard port for unencrypted HTTP, in a blog post about security. Nonetheless, both Nginx and Traefik rely on it being open on the server end. This is because we used the so-called HTTP challenge (read more about challenge types here), which is supported by most proxies and is the easiest to implement. When the CA is about to process the request to issue a certificate, it needs to verify your domain ownership. It does so by querying a special .well-known endpoint on your site, which is only available on port 80. Then, in case of success, it proceeds to issue the certificate.
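In other words, during issuance the CA fetches a validation token from your site over plain HTTP, conceptually like this (the token path is generated per request, so this is just an illustration):

% curl http://pmm.my-domain.org/.well-known/acme-challenge/<TOKEN>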

Both proxy servers provide quite convenient options for redirecting HTTP to HTTPS, which is also good for search engine optimization (SEO).

Conclusion

I find it difficult to disagree that automating the SSL certificate lifecycle, especially if you own or maintain more than just a few public sites (or PMM servers), is a much more pleasant journey than doing it manually. Thankfully, the maturity and feature completeness of the open source proxies are amazing; they are used and trusted by many, which is why we recommend you try them out as well.

P.S. There is so much more you can do with your configurations, e.g. add compression for static assets, harden your security even more, issue a wildcard certificate, etc., but that would be too much for one post and is probably a good topic to explore in the next one.

Have you tried some other proxies? How do they compare to Nginx or Traefik? We’d love to hear about your experiences, please share them with us via our community forums.

Jun
22
2022
--

Percona Monitoring and Management in Kubernetes is now in Tech Preview

Percona Monitoring and Management in Kubernetes

Over the course of the years, we have seen growing interest in running databases and stateful workloads in Kubernetes. With Container Storage Interfaces (CSI) maturing and more and more Operators appearing, running stateful workloads on your favorite platform is not that scary anymore. Our Kubernetes story at Percona started with Operators for MySQL and MongoDB, adding PostgreSQL later on.

Percona Monitoring and Management (PMM) is an open source database monitoring, observability, and management tool. It can be deployed as a virtual appliance or a Docker container. Our customers have long requested a way to deploy PMM in Kubernetes. For a while, we had an unofficial helm chart created as a PoC by Percona teams and the community (GitHub).

We are introducing the Technical Preview of the helm chart that is supported by Percona to easily deploy PMM in Kubernetes. You can find it in our helm chart repository here.

Use cases

Single platform

If Kubernetes is the platform of your choice, until now you needed a separate virtual machine to run Percona Monitoring and Management. No more, with the introduction of the helm chart.

As you know, Percona Operators all have integration with PMM, which enables monitoring for databases deployed on Kubernetes. The Operators configure and deploy the pmm-client sidecar container and register the nodes on a PMM server. Bringing PMM into Kubernetes simplifies this integration and the whole flow; now the network traffic between pmm-client and PMM Server does not leave the Kubernetes cluster at all.

All you have to do is set the correct endpoint in the pmm section of the Custom Resource manifest. For example, for Percona Operator for MongoDB, the pmm section will look like this:

 pmm:
    enabled: true
    image: percona/pmm-client:2.28.0
    serverHost: monitoring-service

Where monitoring-service is the service created by a helm chart to expose a PMM server.

High availability

Percona Monitoring and Management has lots of moving parts: Victoria Metrics to store time-series data, ClickHouse for query analytics functionality, and PostgreSQL to keep the PMM configuration. Right now, all these components are part of a single container or virtual machine, with Grafana as a frontend. To provide zero-downtime deployment in any environment, we would need to decouple these components, which would substantially complicate the installation and management of PMM.

What we offer instead right now are ways to automatically recover PMM in case of failure within minutes (for example leveraging the EC2 self-healing mechanism).

Kubernetes is a control plane for container orchestration that automates manual tasks for engineers. When you run software in Kubernetes it is best if you rely on its primitives to handle all the heavy lifting. This is what PMM looks like in Kubernetes:

PMM in Kubernetes

  • StatefulSet controls the Pod creation
  • There is a single Pod with a single container with all the components in it
  • This Pod is exposed through a Service that is utilized by PMM Users and pmm-clients
  • All the data is stored on a persistent volume
  • ConfigMap has various environment variable settings that can help to fine-tune PMM

In case of a node or a Pod failure, the StatefulSet is going to recover the PMM Pod automatically and remount the Persistent Volume to it. The recovery time depends on the load of the cluster and node availability, but in normal operating environments, the PMM Pod is up and running again within a minute.
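You can simulate such a failure yourself and watch the StatefulSet bring PMM back (pmm-0 is the Pod name used by the chart in this post):

$ kubectl delete pod pmm-0
$ kubectl get pods -w

Within a short time, you should see a new pmm-0 Pod being created and reaching the Running state, with the same Persistent Volume attached.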

Deploy

Let’s see how PMM can be deployed in Kubernetes.

Add the helm repository:

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

Install PMM:

helm install pmm --set service.type="LoadBalancer" percona/pmm

You can now log in to PMM using the LoadBalancer IP address and the randomly generated password stored in the pmm-secret Secret object (the default user is admin).

The Service object created for PMM is called monitoring-service:

$ kubectl get services monitoring-service
NAME                 TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)         AGE
monitoring-service   LoadBalancer   10.68.29.40   108.59.80.108   443:32591/TCP   3m34s

$ kubectl get secrets pmm-secret -o yaml
apiVersion: v1
data:
  PMM_ADMIN_PASSWORD: LE5lSTx3IytrUWBmWEhFTQ==
…

$ echo 'LE5lSTx3IytrUWBmWEhFTQ==' | base64 --decode && echo
,NeI<w#+kQ`fXHEM

Log in to PMM by connecting to https://<YOUR_PUBLIC_IP>.

Customization

A helm chart is a template engine for YAML manifests, and it allows users to customize the deployment. You can see the various parameters you can set to fine-tune your PMM installation in our README.

For example, to choose another storage class and set the desired storage size, set the following flags:

helm install pmm percona/pmm \
--set storage.storageClassName="premium-rwo" \
--set storage.size="20Gi"

You can also change these parameters in values.yaml and use the “-f” flag:

# values.yaml contents
storage:
  storageClassName: "premium-rwo"
  size: 20Gi


helm install pmm percona/pmm -f values.yaml

Maintenance

For most of the maintenance tasks, regular Kubernetes techniques would apply. Let’s review a couple of examples.

Compute scaling

It is possible to vertically scale PMM by adding or removing resources through the pmmResources variable in values.yaml:

pmmResources:
  requests:
    memory: "4Gi"
    cpu: "2"
  limits:
    memory: "8Gi"
    cpu: "4"

Once done, upgrade the deployment:

helm upgrade -f values.yaml pmm percona/pmm

This will restart the PMM Pod, so plan it carefully so as not to disrupt your team’s work.

Storage scaling

This depends a lot on your storage interface capabilities and the underlying implementation. In most clouds, Persistent Volumes can be expanded. You can check if your storage class supports it by describing it:

kubectl describe storageclass standard
…
AllowVolumeExpansion:  True

Unfortunately, just changing the size of the storage in values.yaml (storage.size) will not do the trick and you will see the following error:

helm upgrade -f values.yaml pmm percona/pmm
Error: UPGRADE FAILED: cannot patch "pmm" with kind StatefulSet: StatefulSet.apps "pmm" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

We use a StatefulSet object to deploy PMM, and StatefulSets are mostly immutable; only a handful of fields can be changed on the fly. There is a trick, though.

First, delete the StatefulSet, but keep the Pods and PVCs:

kubectl delete sts pmm --cascade=orphan

Then recreate it with the new storage configuration:

helm upgrade -f values.yaml pmm percona/pmm

This will recreate the StatefulSet with the new storage size configuration.

Next, edit the Persistent Volume Claim manually and change the storage size (the name of the PVC may be different for you). You need to change the storage value in the spec.resources.requests.storage section:

kubectl edit pvc pmm-storage-pmm-0

The PVC is not resized yet, and you will see the following message when you describe it:

kubectl describe pvc pmm-storage-pmm-0
…
Conditions:
  Type                      Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----                      ------  -----------------                 ------------------                ------  -------
  FileSystemResizePending   True    Mon, 01 Jan 0001 00:00:00 +0000   Thu, 16 Jun 2022 11:55:56 +0300           Waiting for user to (re-)start a pod to finish file system resize of volume on node.

The last step would be to restart the Pod:

kubectl delete pod pmm-0

Upgrade

Running helm upgrade is the recommended way to upgrade, either once a new helm chart is released or when you want to move to a newer version of PMM by replacing the image in the image section.

Backup and restore

PMM stores all its data on a Persistent Volume. As said before, regular Kubernetes techniques can be applied here to back up and restore the data. There are numerous options:

  • Volume Snapshots – check if they are supported by your CSI and storage implementation (see the sketch after this list)
  • Third-party tools, like Velero, can handle the backups and restores of PVCs
  • Snapshots provided by your storage (ex. AWS EBS Snapshots) with manual mapping to PVC during restoration
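As a sketch of the first option, here is what a CSI Volume Snapshot of the PMM volume could look like. This assumes your cluster has a VolumeSnapshotClass (the csi-snapclass name below is hypothetical) and uses the PVC name seen earlier in this post:

$ kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pmm-storage-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: pmm-storage-pmm-0
EOF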

What is coming

To keep you excited, here are the things we are working on or have planned to enable further Kubernetes integration.

OpenShift support

We are working on building a rootless container so that OpenShift users can run Percona Monitoring and Management there without having to grant elevated privileges. 

Microservices architecture

This is something that we have been discussing internally for some time now. As mentioned earlier, there are lots of components in PMM. To enable proper horizontal scaling, we need to break down our monolith container and run these components as separate microservices. 

Managing all these separate containers and Pods (if we talk about Kubernetes) would require coming up with separate maintenance strategies. This brings up the question of creating a separate Operator just to manage PMM, but that is a significant effort. If you have an opinion here, please let us know on our community forum.

Automated k8s registration in DBaaS

As you know, Percona Monitoring and Management comes with a technical preview of Database as a Service (DBaaS). Right now, when you install PMM on a Kubernetes cluster, you still need to register the cluster to deploy databases. We want to automate this process so that after the installation you can start deploying and managing databases right away.

Conclusion

Percona Monitoring and Management enables database administrators and site reliability engineers to pinpoint issues in their open source database environments, whether it is a quick look through the dashboards or a detailed analysis with Query Analytics. Percona’s support for PMM on Kubernetes is a response to the needs of our customers who are transforming their infrastructure.

Some useful links that would help you to deploy PMM on Kubernetes:

May
24
2022
--

Store and Manage Logs of Percona Operator Pods with PMM and Grafana Loki

Store and Manage Logs of Percona Operator Pods with PMM and Grafana Loki

While it is convenient to view the logs of MySQL or MongoDB pods with kubectl logs, the logs are purged when a pod is deleted, which makes searching historical logs a bit difficult. Grafana Loki, a log aggregation tool from Grafana, can be installed in an existing Kubernetes environment to help store historical logs and build a comprehensive logging stack.

What Is Grafana Loki

Grafana Loki is a set of tools that can be integrated to provide a comprehensive logging stack solution. The Grafana-Loki-Promtail combination is the core of this solution. The Promtail agent collects and ships logs to the Loki datastore, and Grafana then visualizes them. While collecting the logs, Promtail can label, convert, and filter them before sending. In the next step, Loki receives the logs and indexes their metadata. At this point, the logs are ready to be visualized in Grafana, and the administrator can use Loki’s query language, LogQL, to explore them.

 

     Grafana Loki

Installing Grafana Loki in Kubernetes Environment

We will use the official helm chart to install Loki. The Loki stack helm chart supports the installation of various components like promtail, fluentd, Prometheus, and Grafana:

$ helm repo add grafana https://grafana.github.io/helm-charts
$ helm repo update
$ helm search repo grafana

NAME                                CHART VERSION APP VERSION DESCRIPTION
grafana/grafana                     6.29.2       8.5.0      The leading tool for querying and visualizing t...
grafana/grafana-agent-operator      0.1.11       0.24.1     A Helm chart for Grafana Agent Operator
grafana/fluent-bit                  2.3.1        v2.1.0     Uses fluent-bit Loki go plugin for gathering lo...
grafana/loki                        2.11.1       v2.5.0     Loki: like Prometheus, but for logs.
grafana/loki-canary                 0.8.0        2.5.0      Helm chart for Grafana Loki Canary
grafana/loki-distributed            0.48.3       2.5.0      Helm chart for Grafana Loki in microservices mode
grafana/loki-simple-scalable        1.0.0        2.5.0      Helm chart for Grafana Loki in simple, scalable...
grafana/loki-stack                  2.6.4        v2.4.2     Loki: like Prometheus, but for logs.

For a quick introduction, we will install only loki-stack and promtail. We will use the Grafana pod deployed by Percona Monitoring and Management to visualize the logs.

$ helm install loki-stack grafana/loki-stack --create-namespace --namespace loki-stack --set promtail.enabled=true,loki.persistence.enabled=true,loki.persistence.size=5Gi

Let’s see what has been installed:

$ kubectl get all -n loki-stack
NAME                            READY   STATUS    RESTARTS   AGE
pod/loki-stack-promtail-xqsnl   1/1     Running   0          85s
pod/loki-stack-promtail-lt7pd   1/1     Running   0          85s
pod/loki-stack-promtail-fch2x   1/1     Running   0          85s
pod/loki-stack-promtail-94rcp   1/1     Running   0          85s
pod/loki-stack-0                1/1     Running   0          85s

NAME                          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/loki-stack-headless   ClusterIP   None           <none>        3100/TCP   85s
service/loki-stack            ClusterIP   10.43.24.113   <none>        3100/TCP   85s

NAME                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/loki-stack-promtail   4         4         4       4            4           <none>          85s

NAME                          READY   AGE
statefulset.apps/loki-stack   1/1     85s

Promtail and loki-stack pods have been created, together with a service that loki-stack will use to publish the logs to Grafana for visualization.

You can see the promtail pods are deployed by a daemonset, which spawns a promtail pod on every node. This makes sure the logs from every pod on all the nodes are collected and shipped to the loki-stack pod for centralized storage and management.

$ kubectl get pods -n loki-stack -o wide
NAME                        READY   STATUS    RESTARTS   AGE    IP           NODE                 NOMINATED NODE   READINESS GATES
loki-stack-promtail-xqsnl   1/1     Running   0          2m7s   10.42.0.17   phong-dinh-default   <none>           <none>
loki-stack-promtail-lt7pd   1/1     Running   0          2m7s   10.42.1.12   phong-dinh-node1     <none>           <none>
loki-stack-promtail-fch2x   1/1     Running   0          2m7s   10.42.2.13   phong-dinh-node2     <none>           <none>
loki-stack-promtail-94rcp   1/1     Running   0          2m7s   10.42.3.11   phong-dinh-node3     <none>           <none>
loki-stack-0                1/1     Running   0          2m7s   10.42.0.19   phong-dinh-default   <none>           <none>

Integrating Loki With PMM Grafana

Next, we will add Loki as a data source in PMM Grafana, so we can use PMM Grafana to visualize the logs. You can do it from the GUI or with the kubectl CLI.

Below are the steps to add the data source from the PMM GUI.

Navigate to https://<PMM-IP-address>:9443/graph/datasources, then select Add data source.

Integrating Loki With PMM Grafana

Then select Loki.

Next, in the Settings, in the HTTP URL box, enter the DNS name of the loki-stack service:

Loki settings

You can also use the command below to add the data source. Make sure to specify the name of the PMM pod; in this command, monitoring-0 is the PMM pod.

$ kubectl -n default exec -it monitoring-0 -- bash -c "curl 'http://admin:verysecretpassword@127.0.0.1:3000/api/datasources' -X POST -H 'Content-Type: application/json;charset=UTF-8' --data-binary '{ \"orgId\": 1, \"name\": \"Loki\", \"type\": \"loki\", \"typeLogoUrl\": \"\", \"access\": \"proxy\", \"url\": \"http://loki-stack.loki-stack.svc.cluster.local:3100\", \"password\": \"\", \"user\": \"\", \"database\": \"\", \"basicAuth\": false, \"basicAuthUser\": \"\", \"basicAuthPassword\": \"\", \"withCredentials\": false, \"isDefault\": false, \"jsonData\": {}, \"secureJsonFields\": {}, \"version\": 1, \"readOnly\": false }'"
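To confirm the data source was registered, you can query the Grafana API from the same pod (using the same admin credentials as above):

$ kubectl -n default exec -it monitoring-0 -- bash -c "curl -s 'http://admin:verysecretpassword@127.0.0.1:3000/api/datasources'"

The JSON response should contain an entry with "name": "Loki".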

Exploring the Logs in PMM Grafana

Now, you can explore the pod logs in Grafana UI, navigate to Explore, and select Loki in the dropdown list as the source of metrics:

Exploring the Logs in PMM Grafana

In the Log Browser, you can select the appropriate labels to form your first LogQL query. For example, I selected the following attributes:

 

{app="percona-xtradb-cluster", component="pxc", container="logs",pod="cluster1-pxc-1"} 

 

Click Show logs and you can see all the logs of the cluster1-pxc-1 pod.

You can perform a simple filter with |= "message-content". For example, filter all the messages related to state transfer with:

{app="percona-xtradb-cluster", component="pxc", container="logs",pod="cluster1-pxc-1"} |= "State transfer"

Conclusion

Deploying Grafana Loki in the Kubernetes environment is feasible and straightforward. Grafana Loki can be integrated easily with Percona Monitoring and Management to provide both centralized logging and comprehensive monitoring when running Percona XtraDB Cluster and Percona Server for MongoDB in Kubernetes.

If any thoughts came to you while reading this, we would be interested to hear them; please share your comments with us.

Apr
08
2022
--

A Story of a Failed Percona Monitoring and Management Update

Failed Percona Monitoring and Management Update

Just the other day, one of my Percona Monitoring and Management (PMM) instances failed to update and ended up in a funny state:

percona monitoring and management

 

No matter how many times I clicked the “check for updates” button in the lower right, PMM still showed “no updates available”. Basically, PMM forgot what version it is, and because of that, it thinks there are no updates available.

If you run into this (or possibly other PMM update problems), you may find the following helpful to troubleshoot or resolve the problem.

I have deployed PMM through docker, so the first step is to get inside the docker container to be able to investigate and troubleshoot the installation:

docker exec -it pmm-server bash

Percona Monitoring and Management (as of version 2.26) learns its own version from the version of the pmm-update package. In my case, this did not work due to a corrupted rpm database:

[root@812fb43b3a96 opt]# yum info pmm-update
error: rpmdb: BDB0113 Thread/process 13841/140435865503552 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm
CRITICAL:yum.main:

Error: rpmdb open failed

It is not clear to me why the database has become corrupted in my case, but unfortunately, it did and needed to be repaired:

mv /var/lib/rpm/__db* /tmp/
rpm --rebuilddb
rm -rf /var/cache/yum

These commands back up the current database to /tmp/ (just in case), rebuild the database, and remove already downloaded packages (the last step has nothing to do with the database corruption but reduces reliance on the local state).
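After the rebuild, the same command that failed earlier should now print the package information, which confirms the database is readable again:

yum info pmm-update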

# supervisorctl start pmm-update-perform
pmm-update-perform: started

This command starts the update process. You could probably also trigger the update through the Web GUI at this point, but I want you to be aware of this option, too.

As this simply starts the background update process, you do not get much information about what is going on. To find that out, you need to check the log:

tail /srv/logs/pmm-update-perform.log

Note that while the update is running, the PMM instance may not be available at all. You could see this message:

Do not freak out; wait for the update to finish before trying to use PMM again.

The update log does not have a very clear message stating that the update completed successfully. Here, however, is an example of how the end of a log from a successful update may look:

PLAY RECAP *********************************************************************
localhost                  : ok=79   changed=31   unreachable=0    failed=0    skipped=16   rescued=0    ignored=0

time="2022-04-01T15:36:39Z" level=info msg="Waiting for Grafana dashboards update to finish..."

With that, the update has been completed and my PMM instance has successfully updated to the latest version.

I hope this blog post will help you troubleshoot your PMM update problems too, if you happen to run into one!

Mar
25
2022
--

Percona Monitoring and Management Security Threat Tool

Percona Monitoring and Management Security Threat Tool

Percona Monitoring and Management (PMM) is a multi-faceted tool that includes the Security Threat Tool, which provides the ability to check your databases for potential configuration or performance problems. And boy, was I shocked to find I had problems when I installed PMM for testing.

The complete list of checks that PMM runs daily can be found at https://docs.percona.com/percona-platform/checks.html, and they range from unsecured log file permissions to low cache rates, depending on the database. PMM checks MySQL, MongoDB, and PostgreSQL instances. These checks are categorized as critical, major, or trivial depending on their respective impact, and you can silence them if the issue is chronic but has been deemed tolerable.

I installed PMM and Percona Distribution for MySQL on a test server and enabled PMM security.  On the home screen, the alert menu was instantly displayed.  

Security Alert

It is a little shocking to find your new server has potential security issues.

Yup, my test system had problems!  I clicked on the above section of the PMM home page fearful of what horrors awaited me.  

The Security Warnings

The warnings from PMM’s Security Threat Tool are clear and concise, often offering setting recommendations.

There were no ‘critical’ problems, but there were two ‘major’ and three ‘trivial’ issues. The first of the ‘major’ problems was that master_verify_checksum is disabled. The master_verify_checksum variable is used in replication. Since this was a test machine and not part of any replication topology, there really is no need to have a replication source verify events read from the binary log by examining checksums, stopping in the case of a mismatch. BTW, master_verify_checksum is disabled by default.

The other ‘major’ issue is that the binary log files are rotated too quickly, and PMM suggested a value for this setting. Once again, for an ephemeral test system, I could live with this issue, as nobody else was dependent on this system.

The ‘trivial’ issues were things that may not be considered trivial by all. The first of these is the InnoDB flush method. My test system was set up to use fsync, while PMM recommends O_DIRECT, which is usually the choice for use with local disks. I will not review all the options and opinions (and there are many), but it was nice to get a gentle prodding from PMM about this issue. If this test system were going to be around for a while, I would definitely change the current setting to the recommended one.

My second ‘trivial’ problem was more of a warning about a user who had DBA privileges. And the last problem was a recommendation to change binlog_row_image from full to minimal, which would provide better performance. You might consider this nagging by PMM, but both are issues a DBA or a Reliability Engineer would gladly be reminded of.
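If you want to double-check these settings yourself, a quick query from the MySQL command line will show the current values. This is just a convenience sketch; the three variables are the ones flagged above:

mysql> SHOW GLOBAL VARIABLES WHERE Variable_name IN ('master_verify_checksum', 'innodb_flush_method', 'binlog_row_image');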

To enable the Security Threat Tool, select the Configuration option (the gear icon) on the left side of the main PMM display, and then the second gear icon for Settings.

Config Settings

Please pick the configuration ‘gear’ icon

 

Then select advanced settings

Security Threat Tool

And finally, enable the Security Threat Tool. I would stick with the default interval values when you begin to explore this tool.

Conclusion

The Percona Monitoring and Management Security Threat Tool is a handy way to gain insight into your MySQL, PostgreSQL, or MongoDB database. It provides information that general security tools will not, packaged in Percona’s easy-to-use PMM interface. This is an open source tool that you need to have at your disposal.

Feb
16
2022
--

Monitoring MongoDB Collection Stats with Percona Monitoring and Management

Monitoring MongoDB Collection Stats

One approach to getting to know a MongoDB system we are not familiar with is to start by checking the busiest collections. MongoDB provides the top administrative command for this purpose.

From the mongo shell, we can run db.adminCommand(“top”) to get a snapshot of all the collections at a specific point in time:

...
		"test.testcol" : {
			"total" : {
				"time" : 17432,
				"count" : 58
			},
			"readLock" : {
				"time" : 358,
				"count" : 57
			},
			"writeLock" : {
				"time" : 17074,
				"count" : 1
			},
			"queries" : {
				"time" : 100,
				"count" : 1
			},
			"getmore" : {
				"time" : 0,
				"count" : 0
			},
			"insert" : {
				"time" : 17074,
				"count" : 1
			},
			"update" : {
				"time" : 0,
				"count" : 0
			},
			"remove" : {
				"time" : 0,
				"count" : 0
			},
			"commands" : {
				"time" : 0,
				"count" : 0
			}
		}
		...

In the extract above we can see some details about the testcol collection. For each operation type, we have the amount of server time spent (measured in microseconds), and a counter for the number of operations issued. The total section is simply the sum for all operation types against the collection.

In this case, I executed a single insert and then a find command to check the results, hence the counters are one for each of those operations.

So by storing samples of the above values in a set of metrics, we can easily see the trends and historical usage of a MongoDB instance.

Top Metrics in Percona Monitoring and Management (PMM)

PMM 2.26 includes updates to mongodb_exporter, so we now have the ability to get metrics using the output of the MongoDB top command.

For example:

MongoDB Percona Monitoring and Management

As you can see, the metric names correlate with the output of the top command we saw. In this release, the database and collection names are part of the metric name (this is subject to change in the future based on community feedback).

Collection and Index Stats

In addition to this, PMM 2.26 also includes the ability to gather database, collection, and index statistics. Using these metrics we can monitor collection counts, data growth, and index usage over time.

For example, we can collect the following information about a database:

PMM Collection and Index Stats

And here are the colstats metrics for the test.testcol collection:

PMM metrics

Note: By default, PMM won’t collect this information if you have more than 200 collections (this number is subject to change). The reason is to avoid too much overhead in metrics collection, which may adversely affect performance. You can override this behavior by using the --max-collections-limit option.

Enabling the Additional Collectors

The PMM client starts by default with only the diagnosticdata and replicasetstatus collectors enabled. We can run the PMM client with the --enable-all-collectors argument to make all the new metrics available. For example:

pmm-admin add mongodb --username=mongodb_exporter --password=percona --host=127.0.0.1 --port=27017 --enable-all-collectors

If we want to override the limit mentioned above, use something like this:

pmm-admin add mongodb --username=mongodb_exporter --password=percona --host=127.0.0.1 --port=27017 --enable-all-collectors --max-collections-limit=500

We also have the ability to enable only some of the extra collectors. For example, if you want all collectors except topmetrics, specify:

pmm-admin add mongodb --username=mongodb_exporter --password=percona --host=127.0.0.1 --port=27017 --enable-all-collectors --disable-collectors=topmetrics

We can also filter which databases and collections we are interested in getting metrics about. The optional argument --stats-collections can be set with a namespace.

For example:

--stats-collections=test (get data for all collections in the test db)
--stats-collections=test.testcol (get data only from the testcol collection of the test db)
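
Putting it together, a registration command that collects stats only for the test database and its testcol collection might look like this (a sketch using the same placeholder credentials as the earlier examples; --stats-collections accepts a comma-separated list of namespaces):

pmm-admin add mongodb --username=mongodb_exporter --password=percona --host=127.0.0.1 --port=27017 --enable-all-collectors --stats-collections=test,test.testcol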

Also, check the documentation page for more information.

Creating Dashboards

We can easily create dashboards using the newly available metrics. For example, to see the index usage, we can use the following PromQL expression:

{__name__ =~ "mongodb_.*_accesses_ops"}

This is what the dashboard would look like:

MongoDB dashboard

Conclusion

We can use these new metrics to create dashboards with the collections most written to (or most read from). We can also see what the characteristics of the workload are at a glance to determine if we are dealing with a read-intensive, mixed, or write-intensive system.

One of the most important uses for these new collectors and metrics is performance tuning. By knowing more about your top or “hottest” collections, you will likely be able to better tune your queries and indexing to improve overall performance.

Keep in mind these features are currently a technical preview, so they are disabled by default.

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Dec
30
2021
--

Percona Monitoring and Management 2 Test Drive Using VirtualBox and SSH Tunnels

PMM using VirtualBox and SSH

Percona Monitoring and Management 2 (PMM2) is the database monitoring suite assembled and developed by Percona. It is based on standard open source components and custom-made software integrations. PMM2 helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

This blog post will describe a method to test PMM2 using your laptop’s VirtualBox and ssh tunnels, without installing any agents on the database servers. This is a testing and evaluation environment, not intended for production. If you want to perform a full-fledged test of PMM2, we recommend an environment as similar as possible to your final production setup: enterprise virtualization, docker containers, or AWS. We assume that your laptop doesn’t have direct connectivity to the databases, which is why we use ssh tunnels.

The PMM2 architecture consists of 2+1 components:

PMM2 High Level Architecture
Two components run on your infrastructure: PMM Agents and PMM Server. The agents gather the metrics at the database and operating system levels. PMM Server takes care of processing, storing, grouping, and displaying these metrics. It can also perform additional operations like capturing serverless database metrics, backups, and sending alerts (the last two features are in technical preview as of this writing).

The other component that completes the formula is the Percona Platform, which adds more features to PMM, from advisors to DBaaS. Disclaimer: Percona Platform is in preview release with limited functionality, suitable for early adopters, development, and testing. Besides the extended features added to PMM, the Percona Platform brings together distributions of MySQL, PostgreSQL, and MongoDB, including a range of open source tools for data backup, availability, and management. You can learn more about the Platform here.

To make setup easier, PMM2 Server can be run either as a docker container or by importing an OVA image, executed using VMware vSphere, VirtualBox, or any other hypervisor. If you run your infrastructure in AWS, you can deploy PMM from the AWS Marketplace.

To run the agents, you need a Linux box. We recommend running the agents and the database on the same node. PMM can also gather the metrics using a direct connection to a serverless database, or to a server running an operating system that does not support the agent.

Often, installing the agents is a showstopper for DBAs who would like to test PMM2. Also, while containers are frequent in large organizations, we often find virtualization and containers relegated to development and quality assurance environments. These environments usually don’t have direct access to production databases.

TCP Forwarding Across SSH Connections

AllowTcpForwarding is the ssh daemon configuration option that allows forwarding TCP ports across the ssh connection. At first sight, this may seem a security risk, but as the ssh documentation states: “disabling TCP forwarding does not improve security unless users are also denied shell access, as they can always install their forwarders.”
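
Before building the tunnels, you can quickly check whether the jump host permits forwarding. A small sketch, assuming you have root (or sudo) access on that host to read the effective sshd configuration:

# sshd -T dumps the effective daemon configuration (requires root)
$ sudo sshd -T | grep -i allowtcpforwarding
allowtcpforwarding yes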

If your system administrators do not allow TCP forwarding, other options available to accomplish the same results are socat or netcat. But we will not cover them here.

If your laptop has direct access to the databases, you can skip all the ssh tunnels and use the direct access method described later in this post.

Install the PMM2 OVA

Download the Open Virtualization Format compatible image from https://www.percona.com/downloads/pmm2/ or use the command line:

$ wget https://www.percona.com/downloads/pmm2/2.25.0/ova/pmm-server-2.25.0.ova

You can import the OVA file using the UI, with the import option from the file menu, or using the command line:

$ VBoxManage import pmm-server-2.25.0.ova --vsys 0 --vmname "PMM Testing"
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Interpreting /Users/pep/Downloads/pmm-server-2.25.0.ova...
OK.
Disks:
vmdisk1 42949672960 -1 http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized PMM2-Server-2021-12-13-1012-disk001.vmdk -1 -1
vmdisk2 429496729600 -1 http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized PMM2-Server-2021-12-13-1012-disk002.vmdk -1 -1

Virtual system 0:
0: Suggested OS type: "RedHat_64"
(change with "--vsys 0 --ostype "; use "list ostypes" to list all possible values)
1: VM name specified with --vmname: "PMM Testing"
2: Suggested VM group "/"
(change with "--vsys 0 --group ")
3: Suggested VM settings file name "/Users/Pep/VirtualBox VMs/PMM2-Server-2021-12-13-1012/PMM2-Server-2021-12-13-1012.vbox"
(change with "--vsys 0 --settingsfile ")
4: Suggested VM base folder "/Users/Pep/VirtualBox VMs"
(change with "--vsys 0 --basefolder ")
5: Product (ignored): Percona Monitoring and Management
6: Vendor (ignored): Percona
7: Version (ignored): 2021-12-13
8: ProductUrl (ignored): https://www.percona.com/software/database-tools/percona-monitoring-and-management
9: VendorUrl (ignored): https://www.percona.com
10: Description "Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB performance"
(change with "--vsys 0 --description ")
11: Number of CPUs: 1
(change with "--vsys 0 --cpus ")
12: Guest memory: 4096 MB
(change with "--vsys 0 --memory ")
13: Network adapter: orig NAT, config 3, extra slot=0;type=NAT
14: SCSI controller, type LsiLogic
(change with "--vsys 0 --unit 14 --scsitype {BusLogic|LsiLogic}";
disable with "--vsys 0 --unit 14 --ignore")
15: Hard disk image: source image=PMM2-Server-2021-12-13-1012-disk001.vmdk, target path=PMM2-Server-2021-12-13-1012-disk001.vmdk, controller=14;channel=0
(change target path with "--vsys 0 --unit 15 --disk path";
disable with "--vsys 0 --unit 15 --ignore")
16: Hard disk image: source image=PMM2-Server-2021-12-13-1012-disk002.vmdk, target path=PMM2-Server-2021-12-13-1012-disk002.vmdk, controller=14;channel=1
(change target path with "--vsys 0 --unit 16 --disk path";
disable with "--vsys 0 --unit 16 --ignore")
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Successfully imported the appliance.

Once the machine is imported, we will connect it to a host-only network, which allows traffic only between the host and the virtual machines. But first, let’s find a suitable network:

$ VBoxManage list hostonlyifs
Name: vboxnet0
GUID: 786f6276-656e-4074-8000-0a0027000000
DHCP: Disabled
IPAddress: 192.168.56.1
NetworkMask: 255.255.255.0
IPV6Address:
IPV6NetworkMaskPrefixLength: 0
HardwareAddress: 0a:00:27:00:00:00
MediumType: Ethernet
Wireless: No
Status: Up
VBoxNetworkName: HostInterfaceNetworking-vboxnet0

Select the first one with Status Up and write down its name and IP address. Then make sure that there is a DHCP server assigned to that interface:

$ VBoxManage list dhcpservers
NetworkName: HostInterfaceNetworking-vboxnet0
Dhcpd IP: 192.168.56.100
LowerIPAddress: 192.168.56.101
UpperIPAddress: 192.168.56.254
NetworkMask: 255.255.255.0
Enabled: Yes
Global Configuration:
minLeaseTime: default
defaultLeaseTime: default
maxLeaseTime: default
Forced options: None
Suppressed opts.: None
1/legacy: 255.255.255.0
Groups: None
Individual Configs: None
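
If the listing comes back empty, or no DHCP server is attached to the interface, you can create both yourself. A sketch using standard VBoxManage commands; adjust the addresses if your vboxnet0 already uses a different range:

# create a host-only interface (vboxnet0 on a fresh install)
$ VBoxManage hostonlyif create
# attach a DHCP server to it with the usual 192.168.56.0/24 range
$ VBoxManage dhcpserver add --ifname vboxnet0 --ip 192.168.56.100 \
    --netmask 255.255.255.0 --lowerip 192.168.56.101 \
    --upperip 192.168.56.254 --enable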

Now we will assign two network interfaces to our PMM virtual machine. One is allocated to the internal network, and the other uses NAT to connect to the internet and, for example, check for upgrades.

$ VBoxManage modifyvm "PMM Testing" --nic1 hostonly --hostonlyadapter1 vboxnet0
$ VBoxManage modifyvm "PMM Testing" --nic2 natnetwork

Once networking is configured, we may start the virtual machine.

$ VBoxManage startvm "PMM Testing"

The next step is to retrieve the IP address assigned to our PMM box. First, we will obtain the MAC address of the network card we recently added:

$ VBoxManage showvminfo "PMM Testing" | grep -i vboxnet0
NIC 1: MAC: 08002772600D, Attachment: Host-only Interface 'vboxnet0', Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none

Using the retrieved MAC address we can look for the DHCP leases:

$ VBoxManage dhcpserver findlease --interface=vboxnet0 --mac-address=08002772600D
IP Address: 192.168.56.112
MAC Address: 08:00:27:72:60:0d
State: acked
Issued: 2021-12-21T22:15:54Z (1640124954)
Expire: 2021-12-21T22:25:54Z (1640125554)
TTL: 600 sec, currently 444 sec left

This is the IP address we will use to access the PMM server. Open a browser to connect to https://192.168.56.112 with the default credentials: admin/admin.
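
If the login page does not load, you can first confirm from the terminal that the server is answering. A sketch: /v1/readyz is the health endpoint the PMM2 docker image uses for its own health checks, but verify it against your PMM version; -k skips certificate validation because the appliance ships a self-signed certificate:

$ curl -k https://192.168.56.112/v1/readyz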

PMM2 Login window
The following step configures the tunnels to connect to the databases we monitor.

Set Up the SSH Tunnels

This is the topology of our network:

Private network topology
We will open one ssh connection per database server we want to access from PMM. Open a terminal session and execute the following command, replacing <user> with the username you normally use to connect to your jump host:

$ ssh -L 192.168.56.1:3306:10.0.0.1:3306 <user>@192.168.55.100

This creates a tunnel that connects port 3306 on the MySQL server with the same port on our local internal address. If you want to connect to more than one MySQL instance, you must use a different local port for each (see the sketch after the next command). To open the tunnel for the MongoDB server, use the following command:

$ ssh -L 192.168.56.1:27017:10.0.0.2:27017 <user>@192.168.55.100
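
As mentioned above, a second MySQL instance would need its own local port. For example, a hypothetical server at 10.0.0.3 could be mapped to local port 3307:

$ ssh -L 192.168.56.1:3307:10.0.0.3:3306 <user>@192.168.55.100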

Test the tunnel connectivity to the MySQL host using netcat:

$ nc -z 192.168.56.1 3306
Connection to 192.168.56.1 port 3306 [tcp/mysql] succeeded!

And also test the connectivity to the MongoDB host:

$ nc -z 192.168.56.1 27017
Connection to 192.168.56.1 port 27017 [tcp/*] succeeded!

This is the topology of our network including the ssh tunnels.
SSH tunnels

Configure Accounts

Follow the PMM documentation and create a MySQL account (or use an already existing account) with the required privileges:

CREATE USER 'pmm'@'10.0.0.100' IDENTIFIED BY '<password>' WITH MAX_USER_CONNECTIONS 10;
GRANT SELECT, PROCESS, REPLICATION CLIENT, RELOAD, BACKUP_ADMIN ON *.* TO 'pmm'@'10.0.0.100';

Note that we need to use the internal IP address for the jump host. If you don’t know the IP address, use the wildcard ‘%’.

Also add the credentials for MongoDB; run this in a Mongo session:

db.getSiblingDB("admin").createRole({
role: "explainRole",
privileges: [{
resource: {
db: "",
collection: ""
},
actions: [
"listIndexes",
"listCollections",
"dbStats",
"dbHash",
"collStats",
"find"
]
}],
roles:[]
})

db.getSiblingDB("admin").createUser({
user: "pmm_mongodb",
pwd: "",
roles: [
{ role: "explainRole", db: "admin" },
{ role: "clusterMonitor", db: "admin" },
{ role: "read", db: "local" }
]
})

Add the Services to PMM

We can’t install the agents because we don’t have access to our PMM testing environment from the database servers. Instead, we will configure both services as remote instances. Go to the “Configuration” menu, select “PMM Inventory”, then “Add instance”. Then choose the “MySQL” option to add a remote instance.
Complete the following fields:
Hostname: 192.168.56.1 (This is the internal Host-Only VirtualBox address)
Service name: MySQL8
Port: 3306
Username: pmm
Password: <password>

And press the button. It will check the connectivity and, if everything is correct, the MySQL service will be added to the inventory. If there is an error, double-check that the ssh connection is still open and that you entered the correct credentials. Make sure that the host you specified when creating the MySQL user is correct.
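
You can also test the credentials manually through the tunnel before retrying, assuming a mysql client is installed on your laptop:

$ mysql -h 192.168.56.1 -P 3306 -u pmm -p -e "SELECT CURRENT_USER(), VERSION();"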

We will use a similar process for MongoDB:

These are the fields you have to complete with the correct information:
Hostname: 192.168.56.1 (Again, the internal Host-Only VirtualBox address)
Service name: MongoDB
Port: 27017
Username: pmm_mongodb
Password: <password>

And press the button. It will check the connectivity and, if everything is correct, the MongoDB service will be added to the inventory. If there is an error, double-check again that the ssh connection is open and that you entered the correct credentials. You can also use the MongoDB client application to check access.
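
A quick manual check through the tunnel looks like this (assuming mongosh is installed locally; replace <password> with the one you created):

$ mongosh "mongodb://pmm_mongodb:<password>@192.168.56.1:27017/admin" --quiet --eval 'db.runCommand({ ping: 1 })'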

Once you have added both services, you just need to wait a few minutes for PMM to collect data, and then you can start testing PMM2.

PMM2 Query Analyzer

Dec
20
2021
--

PMM Now Supports Monitoring of PostgreSQL Instances Connecting With Any Database (Name)

Monitoring of PostgreSQL Instances

The recent release of Percona Monitoring and Management 2.25.0 (PMM) includes a fix for bug PMM-6937: before that, PMM expected all monitoring connections to PostgreSQL servers to be made using the default postgres database. This worked well for most deployments; however, some DBaaS providers like Heroku and DigitalOcean do not provide direct access to the postgres database (at least not by default) and instead create custom databases for each of their instances. Starting with this new release of PMM, it is possible to specify the name of the database the Postgres exporter should connect to, both through the pmm-admin command-line tool and when configuring the monitoring of a remote PostgreSQL instance directly in the PMM web interface.

Secure Connections

Most DBaaS providers enforce, or even restrict, access to their instances to SSL (well, in fact, TLS) client connections, and that’s a good thing! Just pay attention to this detail when you configure the monitoring of these instances on PMM. If, when you configure your instance, the system returns a red-box alert saying:

Connection check failed: pq: no pg_hba.conf entry for host “123.123.123.123”, user “pmm”, database “mydatabase”, SSL off.

and your firewall allows external connections to your database, chances are the problem is in the last part: “SSL off”.
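
A quick way to confirm is to connect manually with psql, forcing an encrypted session. A sketch reusing the placeholder host, port, and credentials from the command-line example further down:

$ psql "host=my.dbaas.domain.com port=12345 user=pmmuser dbname=customdb sslmode=require" -c "SELECT version();"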

Web Interface

When configuring the monitoring of a remote PostgreSQL instance in the PMM web interface, make sure you tick the box for using TLS connections:

PMM TLS connections

But note that checking this box is not enough; you have to complement this setting with one of two options:

a) Provide the TLS certificates and key, using the forms that appear once you click on the “Use TLS” check-box:

  • TLS CA
  • TLS certificate key
  • TLS certificate

The PMM documentation explains this further, and more details about server and client certificates for PostgreSQL connections can be found in the SSL Support chapter of its online manual.

b) Opt to not provide TLS certificates, leaving the forms empty and checking the “Skip TLS” check-box:

PMM Skip TLS
What should rule this choice is the SSL mode the PostgreSQL server requires for client connections. When DBaaS providers enforce secure connections, this usually means sslmode=require, and for this, option b above is enough. Only when the PostgreSQL server employs the more restrictive verify-ca or verify-full modes will you need to go with option a and provide the TLS certificates and key.

Command-Line Tool

When configuring the monitoring of a PostgreSQL server using the pmm-admin command-line tool, the same options are available and the PMM documentation covers them as well. Here’s a quick example to illustrate configuring the monitoring for a remote PostgreSQL server when sslmode=require and providing a custom database name:

$ pmm-admin add postgresql --server-url=https://admin:admin@localhost:443 --server-insecure-tls --host=my.dbaas.domain.com --port=12345 --username=pmmuser --password=mYpassword --database=customdb --tls --tls-skip-verify

Note that option server-insecure-tls applies to the connection with the PMM server itself; the options prefixed with tls are the ones that apply to the connection to the PostgreSQL database.

Tracking Stats

You may have also noticed in the release notes that PMM 2.25.0 also provides support for the release candidate of our own pg_stat_monitor. It is unlikely, however, that this extension will be made available by DBaaS providers anytime soon, so for now you need to stick with pg_stat_statements:

pg_stat_statements

Make sure this extension is loaded for your instance’s custom database; otherwise, PMM won’t be able to track many important stats. It is already available for all non-hobby Heroku Postgres plans, but for other providers, like DigitalOcean, it needs to be created beforehand:

defaultdb=> select count(*) from pg_stat_statements;
ERROR:  relation "pg_stat_statements" does not exist
LINE 1: select count(*) from pg_stat_statements;
                             ^
defaultdb=> CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION

defaultdb=> select count(*) from pg_stat_statements;
 count 
-------
    60
(1 row)

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Oct
14
2021
--

Percona Is a Finalist for Best Use of Open Source Technologies in 2021!

Percona Finalist Open Source

Percona has been named a finalist in the Computing Technology Product Awards for Best Use of Open Source Technologies. If you’re a customer, partner, or just a fan of Percona and what we stand for, we’d love your vote.

With Great Power…

You know the phrase. We’re leaving it to you and your peers in the tech world to push us to the top.

Computing’s Technology Product Awards are open to a public vote until October 29. Vote Here!

percona Best Use of Open Source Technologies

Thank you for supporting excellence in the open source database industry. We look forward to the awards ceremony on Friday, November 26, 2021.

Why We’re an Open Source Finalist

A contributing factor to our success has been Percona Monitoring and Management (PMM), an open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical MySQL, MongoDB, PostgreSQL, and MariaDB database environments, no matter where they are located or deployed. It’s impressing customers, and even competitors, in the industry.

If you want to see how Percona became a finalist, learn more about Percona Monitoring and Management, and be sure to follow @Percona on all platforms.

Vote Today!
