Sep
09
2021
--

Q&A on Webinar “Using Open Source Software to Optimize and Troubleshoot Your MySQL Environment”

Optimize and Troubleshoot Your MySQL Environment

Thanks to everyone who attended last week’s webinar on Using Open Source Software to Optimize and Troubleshoot Your MySQL Environment; hopefully you’ve found the time we spent in Percona Monitoring and Management (PMM) useful.

We had a record-breaking number of questions during the talk and unfortunately weren’t able to answer them all live, so we decided to answer them separately. Also, there were several requests for best practices around installation and configuration. This is something we are considering for the next webinar in this series, so stay tuned!

If you weren’t able to attend, the recording is available for viewing. But now, without further ado, here are the questions that we didn’t have time to cover during the presentation.

 

Q: Can PMM also be used for a web hosting server (cPanel, DirectAdmin, etc.)?

PMM by default can monitor a node to provide vital statistics on the health of the host.  From there, you can use external exporters to monitor other applications and send the data to PMM to visualize and create alerts.
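
For example, if you already have a Prometheus-style exporter running on the node, you can register it with PMM as an external service. Below is a minimal sketch; the port, service name, and group are placeholders, so check pmm-admin add external --help for the exact flags in your client version.

# Hypothetical exporter already listening on port 9187 on this node
pmm-admin add external --listen-port=9187 --service-name=my-app-exporter --group=external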

 

Q: Does it provide any query optimization suggestions if my query is bad? 

Not at present; that’s planned for the future Query Advisor.

 

Q: How soon will we be able to use the alerting manager in production?

We are looking at late Sept to early Oct. When it’s ready, you will hear about it!

 

Q: Capturing query data for performance checking can be costly, and some monitoring systems capture data every few seconds. At what level is data captured and analyzed here on live systems with lots of database traffic? At what frequency (all of it, every 2 seconds, every 1 second, etc.)?

We adhere to ‘do no harm’, so the impact of PMM is typically 1-4% on even the busiest systems. We offer custom resolutions to adjust the scrape frequency and balance the need for information against the need for performance.

 

Q: Are long-running queries that potentially slow down the system over time captured and shown as a graph/alert? Also, can you see whether more than one instance of these query types is being run repeatedly by a user?

This is something we are going to include in our Alerting capabilities (coming soon, see above).

 

Q: Can more than one of the metrics be compared against each other to gain more insight into a problem in graphical form? Can you in effect play with these graphs?

Yes, you can; this is in fact how most of the dashboards are designed: we connect different metric series together to drive graphs that explain system performance. While you may be able to edit the existing graphs, Percona recommends that you instead make a copy of the dashboard you’d like to modify and make your changes on the copy. The reason is that if you modify a dashboard distributed by PMM, it will be overwritten on the next upgrade and you’ll lose your changes.

 

Q: Could you list what can be monitored using PMM? And explain what recommended plugins are available and what they are used for? 

Natively, any Linux system and pretty much all flavors of MySQL, MariaDB, MongoDB, and PostgreSQL. You can use external exporters to gather even more data than the defaults, and since PMM’s visualization is built on Grafana, you can create custom dashboards and take advantage of a wealth of community plugins.

 

Q: Can you choose to monitor a particular set of users? Set of queries? Set of schema? 

You can filter the view down by username or a particular schema, and then filter those results by particular query strings. We can monitor as much or as little of your database as the user you define for pulling data is allowed to see.
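
For reference, how much PMM can see is governed by the grants of that monitoring user. Here is a minimal sketch of creating a dedicated user; the username, host, and password are placeholders, and you should check the PMM documentation for the recommended grant list for your MySQL version.

mysql -e "CREATE USER 'pmm'@'127.0.0.1' IDENTIFIED BY '<password>' WITH MAX_USER_CONNECTIONS 10"
mysql -e "GRANT SELECT, PROCESS, REPLICATION CLIENT, RELOAD ON *.* TO 'pmm'@'127.0.0.1'"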

 

Q: How can we work on optimization when using cloud-based services like RDS where we have limited access?

PMM can monitor RDS instances and has simplified the connection and selection process of its remote monitoring capabilities. We can provide nearly the same data as for an on-prem database; however, we don’t have access to node-level statistics.

 

Q: For Oracle MySQL 5.7.29, if you have many tables/objects in the database, will the PMM query information_schema and load the DB?

We have a predefined limit of 1,000 tables beyond which polling of information_schema is disabled, but you can configure this to your liking, both with the client and with remote monitoring. Polling CAN have a more significant impact on your system, though, especially with large table and row counts.
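
As a hedged example, this threshold can be adjusted when the service is registered; the flag names below reflect recent PMM 2 client releases, and the service name, credentials, and address are placeholders.

# Raise the limit to 2,000 tables, or pass --disable-tablestats to turn table statistics off entirely
pmm-admin add mysql --username=pmm --password=<password> \
  --disable-tablestats-limit=2000 mysql-prod 127.0.0.1:3306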

 

Q: At what point do I know I’ve done enough optimization? 

HA! It’s a never-ending game of cat and mouse considering the sheer volume of variables in play. It’s these times where monitoring data for before and after become vital.

 

Q: Can a database monitoring package be the source of database performance issues? In particular, mysqld_exporter is installed as a docker container, as I’m seeing “out of resources” on a trace on mysqld_exporter.

Of course, there are plenty of ways to generate database performance issues and it’s possible monitoring can result in some overhead. For an extreme example, here’s one way to replicate some overhead: start the pmm-client on a MySQL database and restore a blank DB from mysqldump. A few million rows at a time should generate LOTS of chaos and load between QAN and exporters. Our pmm client runs the exporter natively so no need to use a container.

 

Q: Is the query analytics somehow slowing down the database server as well? Or is it safe to enable/use it without further impact?

The impact is minimal. Most of the Query Analytics processing is done on the PMM Server; the only impact on the client is retrieving the queries from the slow log or Performance Schema, so this can have a bigger impact on extremely active DBs but should still remain below a 5% CPU hit.
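
The query source you choose when adding the service is what determines that overhead profile. A sketch with placeholder credentials and service details:

# Performance Schema is generally the lighter-weight source...
pmm-admin add mysql --query-source=perfschema --username=pmm --password=<password> mysql-prod 127.0.0.1:3306
# ...while the slow query log captures more detail at a somewhat higher cost
pmm-admin add mysql --query-source=slowlog --username=pmm --password=<password> mysql-prod 127.0.0.1:3306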

 

Q: Did I understand correctly that PMM is not for RDS users and that AWS tools are available?

PMM certainly is for RDS! Since RDS is managed by AWS, PMM cannot collect CPU/Disk/Memory metrics but all MySQL metrics are still available even in RDS.

 

Q: Do you have any instructions/steps to install PMM to monitor MySQL RDS? 

  • Gear icon → PMM Inventory → Add Instance
  • Choose AWS/RDS Add Remote Instance
  • Use your AWS credentials to view your available RDS & Aurora nodes
  • Ensure that performance_schema is enabled

 

Watch the Recording

Aug
31
2021
--

My Favorite Percona Monitoring and Management Additional Dashboards

Percona Monitoring and Management Dashboards

Percona Monitoring and Management (PMM) has dashboards that cover a lot of ground, yet PMM Superpowers come from the fact you do not need to stick to dashboards that are included with the product! You also can easily install additional dashboards provided by the Community, as well as implement your own.

In this blog post, we will cover some of the additional dashboards which I find particularly helpful.

Node Processes Dashboard

Node Processes Dashboard

Get insights into the processes on the system to better understand resource usage by your database server vs. other stuff on the system. Unexpected resource-hog processes are quite a common cause of downtime and performance issues. More information in the Understanding Processes on your Linux Host blog post.

MySQL Memory Usage Details

MySQL Memory Usage Details

Ever wondered where MySQL memory usage comes from? This dashboard can shed light on this dark place, showing the top global memory consumers as well as what users and client hosts contribute to memory usage. More details in the Understanding MySQL Memory Usage with Performance Schema blog post.
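
Note that this dashboard relies on Performance Schema memory instrumentation, which is not fully enabled by default on MySQL 5.7 (the linked post covers the details). As a rough sketch, you would enable it along these lines:

# Enable memory instruments at runtime (or set performance-schema-instrument='memory/%=ON' in my.cnf)
mysql -e "UPDATE performance_schema.setup_instruments SET ENABLED='YES' WHERE NAME LIKE 'memory/%'"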

MySQL Query Performance Troubleshooting

MySQL Query Performance Troubleshooting

Want to understand which queries are responsible for CPU, Disk, Memory, or Network Usage and get some other advanced MySQL Query Troubleshooting tools? Check out this dashboard.  Read more about it in the  MySQL Query Performance Troubleshooting blog post.

RED Method for MySQL Dashboard

RED Method for MySQL Dashboard

Want to apply the RED (Rate-Errors-Duration)  method to MySQL?  Check out this dashboard, and check out RED Method for MySQL Performance Analyses for more details.

OK, so let’s say you’re convinced and want to get those dashboards into your PMM install but manual installation does not excite you.  Here is how you can use custom dashboard provisioning  to install all of them:

curl -LJOs https://github.com/Percona-Lab/pmm-dashboards/raw/main/misc/import-dashboard-grafana-cloud.sh --output import-dashboard-grafana-cloud.sh
curl -LJOs https://github.com/Percona-Lab/pmm-dashboards/raw/main/misc/cleanup-dash.py --output cleanup-dash.py

chmod a+x import-dashboard-grafana-cloud.sh
chmod a+x cleanup-dash.py

./import-dashboard-grafana-cloud.sh -s <PMM_SERVER_IP> -u admin:<ADMIN_PASSWORD> -f Custom -d 13266 -d 12630 -d 12470 -d 14239

Note:  Node Processes and MySQL Memory Usage Details dashboards also require additional configuration on the client-side. Check out the blog posts mentioned for specifics.

Enjoy!

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Aug
24
2021
--

PostgreSQL Custom Dashboards Ported to Percona Monitoring and Management 2

Custom Dashboards for PostgreSQL

At the recent Percona Live, Agustín Gallego and I did a presentation “Optimizing and Troubleshooting PostgreSQL with PMM“. While preparing for this talk, I ported some of the older PostgreSQL custom dashboards that we have published over the years at grafana.com. Initially made for Percona Monitoring and Management (PMM) 1, they can now be added to PMM 2 easily. In this short blog post, I will show you how to set up two of these dashboards on PMM 2: “PostgreSQL Tuple Statistics” and “PostgreSQL DB/Table Size Details”. Technically, the DB/Table Size one already worked with PMM 2, but there’s an easier way to set it up in recent PMM versions.

You can also check out the original blog post covering the installation of the Tuple Statistics dashboard for PMM 1: “Adding PostgreSQL Tuple Statistics Dashboard to the PMM Plugin“.

Disclaimer About PMM Versions

PostgreSQL support is continuously improving in PMM, and the engineering team is busy with QAN and exporter improvements. In PMM release 2.16.0, postgres_exporter was updated to the community version 0.8.0, which among other things significantly improves the way custom queries are executed. In this post, I assume that you’re using PMM version 2.16.0 or above. If you’re using an older version of PMM 2, you will have to use the dblink approach proposed in the original blog post. Note that some parts of the new dashboards may not work properly with the older PMM versions.

PostgreSQL Custom Dashboards Overview

The two dashboards that were ported are “PostgreSQL Tuple Statistics” and “PostgreSQL DB/Table Size Details“, both adding important missing pieces of information to PMM. They were initially implemented by Vadim Yalovets.

“PostgreSQL Tuple Statistics”, in addition to the breakdown of tuple operations, provides an overview of the dead/live tuples ratio and details on the history of vacuum executions. A new addition with this port to PMM 2 is that the dashboard now gives some insight into database age and current progress towards the “vacuum to prevent wraparound.” Having a view of operations happening on a per-table basis can highlight the hottest objects in the databases, and the vacuum details are a welcome addition to any monitoring system that works with PostgreSQL.
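
If you want to sanity-check what the database-age panels are built on, the underlying number is simply the transaction ID age of each database, roughly along these lines:

# Rough equivalent of the data behind the wraparound/age panels
psql -c "SELECT datname, age(datfrozenxid) AS xid_age FROM pg_database ORDER BY 2 DESC"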

The “PostgreSQL DB/Table Size Details” dashboard is somewhat simpler and narrower in its scope. This dashboard gives an overview of database and table sizes and their rate of growth. Helpful when you want to understand where all the disk space went.

Setting up Custom Queries

Once you have set up the PMM client to monitor your PostgreSQL instance, you will find three directories where you can put custom queries:

# cd /usr/local/percona/pmm2/collectors/custom-queries/postgresql/
# ls -l
drwxr-xr-x. 2 pmm-agent pmm-agent  69 Jul 12 21:16 high-resolution
drwxr-xr-x. 2 pmm-agent pmm-agent 126 Jul 12 21:24 low-resolution
drwxr-xr-x. 2 pmm-agent pmm-agent  34 Jul 12 21:16 medium-resolution

Technical details can be found in the “Running Custom MySQL Queries in Percona Monitoring and Management” blog post. However, all you need to do is to create two files. Download or otherwise copy the following files to the low-resolution directory:
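
For illustration only, a custom query file follows the standard postgres_exporter format (a top-level metric prefix, a query, and the columns it exposes); the snippet below is a simplified sketch with a hypothetical file name, not one of the actual dashboard queries:

cat > /usr/local/percona/pmm2/collectors/custom-queries/postgresql/low-resolution/example-db-size.yml <<'EOF'
pg_example_database:
  query: "SELECT datname, pg_database_size(datname) AS size_bytes FROM pg_database"
  metrics:
    - datname:
        usage: "LABEL"
        description: "Database name"
    - size_bytes:
        usage: "GAUGE"
        description: "Database size in bytes"
EOF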

Once that’s done, restart the pmm-agent or kill the postgres_exporter process:

# systemctl restart pmm-agent

I recommend using the low resolution for these particular queries as, frankly, it doesn’t make a lot of sense to be checking database size and number of updates every 1 or 5 seconds.

Importing the Dashboards

Once the custom queries are prepared and the exporter is restarted, you can go ahead and import new dashboards in Grafana! See the official documentation on how to do that: Dashboards/Export and import.

You can pick up the dashboards on grafana.com: “PostgreSQL Tuple Statistics (Designed for PMM2)“, “PostgreSQL DB/Table Size Details (Designed for PMM2)“, or find raw sources over at my GitHub.

If everything worked as expected, you should see your new dashboards showing the data:

PMM Dashboard: PostgreSQL Tuple Details pt1

PostgreSQL Tuple Details pt1

PMM Dashboard: PostgreSQL Tuple Details pt2

PostgreSQL Tuple Details pt2

PMM Dashboard: PostgreSQL Tuple Details pt3

PostgreSQL Tuple Details pt3

PMM Dashboard: PostgreSQL DB/Table Size Details

PostgreSQL DB/Table Size Details

Enjoy monitoring with these PostgreSQL custom dashboards!

Jul
30
2021
--

Improve PostgreSQL Query Performance Insights with pg_stat_monitor

Improve PostgreSQL Query Performance pg_stat_monitor

Understanding query performance patterns is essentially the foundation for query performance tuning. It, in many ways, dictates how a database cluster evolves. And then there are obviously direct and indirect cost implications as well.

PostgreSQL provides very detailed statistics through a number of catalog views and extensions that can be easily added to provide more detailed query statistics. With each view focused on a particular aspect, the picture almost always needs to be stitched together by combining different datasets. That requires effort and still, the whole picture might not be complete.

The pg_stat_monitor extension attempts to provide a more holistic picture by providing much-needed query performance insights in a single view. The extension has been evolving over the past year and is now nearing the GA release.

Some Useful Extensions

Currently, you may be relying on a number of extensions to understand how a query behaves, the time taken in planning and execution phases, min/max/meantime values, index hits, query plan, and client application details. Here are some extensions that you might already be very familiar with.

pg_stat_activity

This view is available by default with PostgreSQL. It provides one row per server process along with current activity and query text.

In case you’d like to learn more about it, hop over to the official PostgreSQL documentation here.

pg_stat_statements

This extension is part of the contrib packages provided with the PostgreSQL server. However, you’d have to create the extension manually. It’s a query-wise aggregation of statistical data with min/max/mean/standard deviation for execution and planning times and various useful information and query text.

You can read more about pg_stat_statements at the official PostgreSQL documentation site.

auto_explain

Another useful extension is provided by the PostgreSQL server. It dumps query plans in the server log for any query exceeding a time threshold specified by a GUC (Grand Unified Configuration) parameter.

You can find more about auto_explain here.

pg_stat_monitor

Whilst all previously mentioned views/extensions are great in their own right, one needs to manually combine client/connection information from pg_stat_activity, statistical data from pg_stat_statements, and query plans from auto_explain to complete the dataset needed to understand query performance patterns.

And that’s precisely the pain that pg_stat_monitor alleviates.

The feature set has been growing over the past year, with it providing, in a single view, all the performance-related information that you may need to debug a poorly performing query. For more information about the extension, see our GitHub repository, or for user-specific documentation, see our user guide.

Feature Set

Some features that were part of earlier releases are already discussed in this blog, however, for completeness, I’m going to discuss those here as well.

  • Time Interval Grouping: Instead of supplying one set of ever-increasing counts, pg_stat_monitor computes stats for a configured number of time intervals; time buckets. This allows for much better data accuracy, especially in the case of high resolution or unreliable networks.
  • Multi-Dimensional Grouping:  While pg_stat_statements groups counters by (userid, dbid, queryid),  pg_stat_monitor uses a more detailed group for higher precision:
    • Bucket ID (bucket),
    • User ID (userid),
    • Database ID (dbid),
    • Query ID (queryid),
    • Client IP Address (client_ip),
    • Plan ID (planid),
    • Application Name (application_name).

This allows you to drill down into the performance of queries coming from particular client addresses and applications, which we at Percona have found to be very valuable in a number of cases.

  • Capture Actual Parameters in the Queries: pg_stat_monitor allows you to choose if you want to see queries with placeholders for parameters or actual query examples.
  • Query Plan: Each SQL is now accompanied by its actual plan that was constructed for its execution. Also, we found having query parameter values is very helpful, as you can run EXPLAIN on it, or easily play with modifying the query to make it run better, as well as making communication about the query clearer when discussing with other DBAs and application developers.
  • Tables Access Statistics for a Statement: This allows us to easily identify all queries that accessed a given table. This set is on par with the information provided by pg_stat_statements.
  • Histogram: Visual representation is very helpful when it can help identify issues. With the help of the histogram function, you can now view a timing/calling data histogram in response to a SQL query. And yes, it even works in psql.
SELECT * FROM histogram(0, 'F44CD1B4B33A47AF') AS a(range TEXT, freq INT, bar TEXT);
       range        | freq |              bar
--------------------+------+--------------------------------
  (0 - 3)}          |    2 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  (3 - 10)}         |    0 |
  (10 - 31)}        |    1 | ■■■■■■■■■■■■■■■
  (31 - 100)}       |    0 |
  (100 - 316)}      |    0 |
  (316 - 1000)}     |    0 |
  (1000 - 3162)}    |    0 |
  (3162 - 10000)}   |    0 |
  (10000 - 31622)}  |    0 |
  (31622 - 100000)} |    0 |
(10 rows)

  • Functions: It may come as a surprise, but we do understand that functions may internally execute statements! To help ease tracking and analysis, pg_stat_monitor now provides a column that specifically helps keep track of the top query for a statement so that you can backtrack to the originating function.
  • Relation Names: Relations used in a query are available in the “relations” column in the pg_stat_monitor view. This reduces work at your end and makes analysis simpler and quicker.
  • Query Types: With query classification as SELECT, INSERT, UPDATE, or DELETE, analysis becomes simpler. It’s another bit of effort saved at your end, and another simplification from pg_stat_monitor.
SELECT bucket, substr(query,0, 50) AS query, cmd_type FROM pg_stat_monitor WHERE elevel = 0;
 bucket |                       query                       | cmd_type 
--------+---------------------------------------------------+----------
      4 | END                                               | 
      4 | SELECT abalance FROM pgbench_accounts WHERE aid = | SELECT
      4 | vacuum pgbench_branches                           | 
      4 | select count(*) from pgbench_branches             | SELECT
      4 | UPDATE pgbench_accounts SET abalance = abalance + | UPDATE
      4 | truncate pgbench_history                          | 
      4 | INSERT INTO pgbench_history (tid, bid, aid, delta | INSERT

  • Query Metadata: Google’s Sqlcommenter is a useful tool that in a way bridges the gap between ORM libraries and understanding database performance. And we support it. So, you can now put any key-value data in comments using the /* … */ syntax (see the Sqlcommenter documentation for details) in your SQL statements, and the information will be parsed by pg_stat_monitor and made available in the comments column of the pg_stat_monitor view.
CREATE EXTENSION hstore;
CREATE FUNCTION text_to_hstore(s text) RETURNS hstore AS $$
BEGIN
    RETURN hstore(s::text[]);
EXCEPTION WHEN OTHERS THEN
    RETURN NULL;
END; $$ LANGUAGE plpgsql STRICT;


SELECT 1 AS num /* { "application", java_app, "real_ip", 192.168.1.1} */;
 num 
-----
   1
(1 row)

SELECT query, text_to_hstore(comments)->'real_ip' AS real_ip from pg_stat_monitor;
query                                                                       |  real_ip 
----------------------------------------------------------------------------+-------------
 SELECT $1 AS num /* { "application", psql_app, "real_ip", 192.168.1.3) */  | 192.168.1.1

  • Logging Errors and Warnings: As seen in different monitoring/statistics collector tools, most tools/extensions only monitor successful queries. But in many cases, monitoring ERROR, WARNING, and LOG messages gives meaningful information for debugging an issue. pg_stat_monitor not only monitors ERROR/WARNING/LOG entries but also collects statistics about these queries. In PostgreSQL, a query that raises an ERROR/WARNING has an error level (elevel), SQL code (sqlcode), and an error message attached. pg_stat_monitor collects all this information along with its aggregates.
SELECT substr(query,0,50) AS query, decode_error_level(elevel) AS elevel,sqlcode, calls, substr(message,0,50) message 
FROM pg_stat_monitor;
                       query                       | elevel | sqlcode | calls |                      message                      
---------------------------------------------------+--------+---------+-------+---------------------------------------------------
 select substr(query,$1,$2) as query, decode_error |        |       0 |     1 | 
 select bucket,substr(query,$1,$2),decode_error_le |        |       0 |     3 | 
 select 1/0;                                       | ERROR  |     130 |     1 | division by zero

We’ve Come a Long Way

What started as a concept is now nearing its final approach. The pg_stat_monitor extension has evolved and has become very feature-rich. We have no doubt about its usefulness for DBAs, performance engineers, application developers, and anyone who needs to look at query performance. We believe it can help save many hours and help identify unexpected query behaviors. 

pg_stat_monitor is available on GitHub. We are releasing it to get feedback from the community on what we’re doing right and what we should do differently before we release pg_stat_monitor as a generally available version to be supported for years to come. Please check it out, drop us a note, file an issue, or make a pull request!

Try Percona Distribution for PostgreSQL today; it’s free to download and use! 

Jul
07
2021
--

Inspecting MySQL Servers Part 5: Percona Monitoring and Management

Inspecting MySQL Servers PMM

In the previous posts of this series, I presented how the Percona Support team approaches the analysis and troubleshooting of a MySQL server using a tried-and-tested method supported by specific tools found in the Percona Toolkit:

Inspecting MySQL Servers Part 1: The Percona Support Way

Inspecting MySQL Servers Part 2: Knowing the Server

Inspecting MySQL Servers Part 3: What MySQL?

Inspecting MySQL Servers Part 4: An Engine in Motion

A drawback of such an approach is that data collection is done in a “reactive” way and (part of) it needs to be processed before we can interpret it. Enter Percona Monitoring and Management (PMM): PMM continually collects MySQL status variables and plots the metrics in easy-to-interpret Grafana graphs and panels. Plus, it includes a rich Query Analytics dashboard that helps identify the top slow queries and shows how they are executing. It makes for an excellent complement to the approach we presented. In fact, many times it takes the central role: we analyze the data available in PMM and, if necessary, look at complementing it with pt-stalk samples. In this post, I will show you how we can obtain much of the same information we got from the Percona Toolkit tools (and sometimes more) from PMM.

* As was the case in the previous posts in this series, the data and graphs used to illustrate this post do not come from a single server and have been captured using different versions of PMM.

Know the Server

Once you are connected to PMM, you can select the target server under the Node Name field in the menu located on the top-left side of the interface, then select PMM dashboards on the left menu, System (Node), and, finally, Node Summary, as shown in the screenshot below:

PMM Dashboard

The header section of the Node Summary page shows the basic hardware specs of the server as well as a few metrics and projections. You will find on the right side of this section the full output of pt-summary, which we have scrutinized extensively in the second post of this series, there, waiting for you:

MySQL Node Summary

Below the header section, there are four panels dedicated to CPU, Memory, Disk, and Network, each containing graphs with specific metrics for each of these areas. This makes it easy, for example, to look at overall CPU utilization:

CPU utilization

Recent spikes in I/O activity:

And memory usage:

Note the graphs cover the last 12 hours of activity by default but you can select a different time range in the top-right menu:

What MySQL?

Taking a slightly different route by selecting MySQL instead of System (Node) and then MySQL Summary, we get to access a dashboard that displays MySQL-specific metrics for the selected instance:

MySQL Summary

Under the Service Summary panel, you will find the full output of pt-mysql-summary, which we reviewed in detail in the third post of this series:

The main goal of pt-mysql-summary is to provide a sneak peek into how MySQL is configured at a single point in time. With PMM, you get instant access to most of the MySQL trends and status variables we only get a glimpse of in the report. We can go and look straight under the hood at the engine characteristics while it is under load, from the last 5 minutes to the last 30 days or more!

An Engine in Motion

There is so much we can look at at this point. If we more or less follow the sequence observed in the previous posts, we can start by checking whether the table cache is big enough. The example below shows it to be just right, if we base ourselves on the limited time frame this particular sample covers, with an average hit ratio close to 100%:

MySQL Table Open Cache Status
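
If you want to cross-check the graph from the command line, the ratio is derived from the table open cache status counters (hits versus misses, available in MySQL 5.6 and later):

mysql -e "SHOW GLOBAL STATUS LIKE 'Table_open_cache_%'"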

Or we can look for a disruption in the pattern, such as a peak in threads connected:

And then investigate the effects it caused on the server (or was it already a consequence of something else that occurred?), for example, a change in the rate of temporary tables created at that time for both in-memory and on-disk tables:

The MySQL Instance Summary is just one of many dashboards available for MySQL:

Under the MySQL InnoDB Details dashboard we find many InnoDB-specific metrics plotted as a multitude of different graphs, providing a visual insight into things such as the number of requests that can be satisfied from data that is already loaded in the Buffer Pool versus those that must be first read from disk (does my hot data fit in memory?):

InnoDB Buffer Pool Requests

Besides MySQL status variable metrics, there is also data filtered directly from SHOW ENGINE INNODB STATUS. For instance, we can find long-running transactions based on increasing values of InnoDB’s history list length:

Another perk of PMM is the ability to easily evaluate whether redo log space is big enough based on the rate of writes versus the size of the log files:

And thus observe checkpoint age, a concept that is explained in detail for PMM in How to Choose the MySQL innodb_log_file_size:
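
The linked article’s manual method boils down to sampling how much redo is written over a period of time; here is a rough sketch of that check:

# Sample the redo bytes written 60 seconds apart
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written'"; sleep 60; mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written'"
# Extrapolate the delta to an hour and compare it against the total redo log size
# (innodb_log_file_size * innodb_log_files_in_group), per the rule of thumb in the article.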

Another evaluation made easy with PMM is whether a server’s workload is benefitting from having InnoDB’s Adaptive Hash Index (AHI) enabled. The example below shows an AHI hit-ratio close to 100% up to a certain point, from which the number of searches increased and the situation inverted:

The evaluation of settings like the size of the redo log space and the efficiency of AHI should be done at a macro level, spanning days: we should be looking for what is the best general configuration for these. However, when we are investigating a particular event, it is important to zoom in on the time frame where it occurred to better analyze the data captured at the time. Once you do this, change the data resolution from the default of auto to 1s or 5s interval/granularity so you can better see spikes and overall variation: 

QAN: Query Analytics

Query analysis is something I only hinted at but didn’t explore in the first articles in this series. The “manual” way requires processing the slow query log with a tool such as pt-query-digest and then going for details about a particular query by connecting to the server to obtain the execution plan and schema details. A really strong feature of PMM is the Query Analytics dashboard, which provides a general overview of query execution and captures all information about it for you. 

The example below comes from a simple sysbench read-write workload on my test server:

PMM Query Analytics

We can select an individual query on the list and check the details of its execution:

The query’s  EXPLAIN plan is also available, both in classic and JSON formats:

You can read more about QAN on our website as well as in other posts on our blog platform, such as How to Find Query Slowdowns Using Percona Monitoring and Management.

What PMM Does Not Include

There remains information/data we cannot obtain from PMM, such as the full output of SHOW ENGINE INNODB STATUS. For situations when obtaining this information is important, we resort back to pt-stalk. It is not one or the other; we see them as complementary tools in our job of inspecting MySQL servers.

If you are curious about PMM and would like to see how it works in practice, check our demo website at https://pmmdemo.percona.com/. To get up and running with PMM quickly, refer to our quickstart guide.

Tuning the Engine for the Race Track

There you have it! It certainly isn’t all there is, but we’ve got a lot packed into this series, enough to get you moving in the right direction when it comes to inspecting and troubleshooting MySQL servers. I hope you have enjoyed the journey and learned a few new tricks along the way!

Jul
06
2021
--

Move Percona Monitoring and Management Server Data From One Instance Type to Another

Move Percona Monitoring and Management Server Data

Percona Monitoring and Management (PMM2) Server runs as a Docker container, a virtual appliance, or as an instance on Amazon or Azure cloud services. Here I’ll show how to move the PMM Server and its data from one type to another.

Note, this is only for PMM2 to PMM2—you can’t migrate data from PMM Server version 1 to version 2 because of significant architectural differences.

For this exercise, imagine that your PMM server:

  • Is running on an Amazon EC2 instance (Server A) launched from an AMI,
  • Needs to be moved to a dedicated server (Server B), where it will run as a Docker container, and
  • Monitors one client instance (node1) with a MongoDB service (mongodb1).

Here’s the output of pmm-admin status for this instance.

pmm-admin status

Export Data

PMM2 data is stored in the /srv folder for all types of installations. So first make a backup archive of it.

tar -cv /srv | gzip > pmm-data.tar.gz

Copy this archive to Server B.

scp pmm-data.tar.gz user1@172.17.0.2:~/

Prepare New Server

Connect to Server B and run all further commands on this server. Prepare the Docker container.

docker create -v /srv --name pmm-data percona/pmm-server:2 /bin/true

Next extract exported data from the archive.

tar -zxvf pmm-data.tar.gz -C /tmp

Create a container for the new PMM Server with the /srv partition on a separate container (pmm-data).

docker run -d -p 443:443 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:2

Stop all services and copy the exported data into the container.

docker exec -it pmm-server supervisorctl stop all
docker exec -it pmm-server sh -c 'cd /; rm -rf /srv/victoriametrics/data'
docker cp /tmp/srv pmm-data:/

Restore permissions for migrated data folders.

docker exec -it pmm-server chown -R root:pmm /srv/clickhouse /srv/ia /srv/nginx /srv/pmm-distribution /srv/update
docker exec -it pmm-server chown -R pmm:pmm /srv/logs /srv/victoriametrics /srv/alertmanager /srv/prometheus
docker exec -it pmm-server chown -R grafana:grafana /srv/grafana
docker exec -it pmm-server chown -R postgres:postgres /srv/postgres /srv/logs/postgresql.log

Restart PMM Server so that it reloads files with the correct permissions.

docker restart pmm-server

Switch Services to New Server

That’s it! Now you can switch your monitored node1 to use the new server (Server B).

Edit the PMM agent configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml. Set the IP address of Server B (172.17.0.2) and restart pmm-agent.

systemctl restart pmm-agent

Check Status

Check the status of pmm-agent and the monitored services with pmm-admin status.

Check Status PMM

The agent is now connected to your new server.

In the Grafana UI, you can now see the migrated data on Server B. (The time gap in the data is how long it took to run the import and switch node1 to the new server.)

Grafana UI

If the historical data is there, then we are done. Otherwise, please follow the commands provided in the next section.

Export/Import VictoriaMetrics Data

Copy the metrics from the VictoriaMetrics time-series database using its export/import API. (You can do the export remotely and run all further commands on Server B.)

curl -k -G -u admin:admin https://3.86.222.201/prometheus/api/v1/export/native -d 'match={__name__!=""}' > exported_data.dump

Next import the VictoriaMetrics data.

curl -k -u admin:admin -X POST https://172.17.0.2/prometheus/api/v1/import/native -T exported_data.dump

By default, the maximum allowed size of the client request body for PMM Server’s Nginx service is 10 MB. If exported_data.dump is bigger than this, you must increase the limit and repeat the import.

docker exec -it pmm-server bash -c "sed -i 's/client_max_body_size 10m;/client_max_body_size 1000m;/g' /etc/nginx/conf.d/pmm.conf"
docker exec -it pmm-server bash -c "supervisorctl restart nginx"

Conclusion

You can use the same process to move from any instance type to another. Also, we have a separate blog post about how to migrate if the pmm-data container isn’t used. Check it out!

Jul
01
2021
--

Percona Monitoring and Management – MySQL Semi-Sync Summary Dashboard

Percona Monitoring and Management - MySQL Semi-Sync Summary Dashboard

Some of you may use MySQL’s Semisynchronous Replication feature (aka semi-sync), and now with the MySQL Semi-Sync Summary Dashboard + Percona Monitoring and Management (PMM), you can see the most important metrics! Refer to the Install & Usage steps for deployment details (note you need a Replication Set defined!).

What is Semisynchronous Replication

When enabled, Semisynchronous Replication instructs the Primary to wait until at least one replica has received and logged the event to its local relay log before completing the COMMIT of a transaction. This provides a higher level of data integrity because it is now known that the data exists in two places. The feature strikes a balance between data integrity (the number of replicas acknowledging receipt of a transaction) and the speed of commits, which will be slower since they need to wait on replica acknowledgment. Also, keep in mind that semi-sync does not wait for the COMMIT on the replica; it only waits until the transaction is queued in the relay log. The actual execution of the transaction from the relay log is still asynchronous.
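
For reference (not specific to this dashboard), on MySQL 5.7 semi-sync is enabled through the semisync plugins and their variables, roughly as sketched below; MySQL 8.0.26 and later also ship rpl_semi_sync_source/replica equivalents.

# On the Primary
mysql -e "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'; SET GLOBAL rpl_semi_sync_master_enabled = 1"
# On each Replica (restart the IO thread so the setting takes effect)
mysql -e "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; SET GLOBAL rpl_semi_sync_slave_enabled = 1; STOP SLAVE IO_THREAD; START SLAVE IO_THREAD"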

Dashboard Layout

Now that we know we can improve data integrity but pay a penalty on writes, I want to display the following information:

  1. Replica semi-sync status – enabled or not
  2. Waits by type, on Network or on Transactions
  3. How much total time was spent waiting on Transactions – what’s my penalty due to writes slowing down
  4. How much average time was spent waiting per transaction
  5. Commit acknowledgments – what’s my replication throughput

Replica Status

This lists the states that each replica has been in, whether the Replica semi-sync was enabled or disabled:

Percona Monitoring and Management

Waits by Net & TX

How many waits occurred on the Network and on Transactions. Since the Primary only waits for one successful acknowledgment even though there could be multiple semi-sync replicas (the fastest one wins), your count of TX waits should be the same as TX commits on the Primary, but the waits on the Network can be much higher.

Waits by Net & TX

Time Spent Waiting on Transactions

This is the contribution to query latency that semi-sync incurs on the Primary related to waits on transactions.

Time spent waiting on Transactions

Average Wait Time per Transaction

This is the overhead of waiting on a single transaction acknowledgment.

Average wait time per transaction

Commit Acknowledgments

The semi-synchronous replication feature considers the possibility that Replicas may be unavailable and is controlled by rpl_semi_sync_master_timeout. This controls how long the Primary will wait on a commit for acknowledgment from a Replica before timing out and reverting to asynchronous replication. This graph shows whether a commit was acknowledged by semi-sync (Yes), or whether the Primary had lost all semi-sync replicas and the commit was not acknowledged (No), i.e., it was running in asynchronous mode. You should be seeing only Acknowledged when things are working smoothly.

Commit Acknowledgements

Installation & Usage

  1. Download the dashboard definition in JSON from https://grafana.com/grafana/dashboards/14636/
  2. Import into PMM Server (tested on 2.18 but should work on older 2.x versions)

I built the dashboard to leverage the Replication Set (--replication-set) variable (which can be set to any string you want), so you will need this set for all servers for which you want to view statistics. For example, your pmm-admin add mysql statement should look like:

pmm-admin add mysql … --replication-set=semi-sync

You can check to see whether the Replication Set variable is defined by referencing the PMM Inventory dashboard, in the last column called Other Details:

When you have the dashboard loaded, select your Replication Set from the drop-down:

New to Percona Monitoring and Management (PMM)?

Check out the PMM Quickstart guide, which helps you deploy Docker for the PMM Server and the pmm2-client package from the Percona repositories, to have you up and monitoring in minutes!

I hope you find this dashboard useful! Feel free to let me know if there are missing fields or other features you’d like to see included!

Jun
14
2021
--

MongoDB Integrated Alerting in Percona Monitoring and Management

MongoDB Integrated Alerting

Percona Monitoring and Management (PMM) recently introduced the Integrated Alerting feature as a technical preview. This was a very eagerly awaited feature, as PMM doesn’t need to integrate with an external alerting system anymore. Recently we blogged about the release of this feature.

PMM includes some built-in templates, and in this post, I am going to show you how to add your own alerts.

Enable Integrated Alerting

The first thing to do is navigate to the PMM Settings by clicking the wheel on the left menu, and choose Settings:

Next, go to Advanced Settings, and click on the slider to enable Integrated Alerting down in the “Technical Preview” section.

While you’re here, if you want to enable SMTP or Slack notifications you can set them up right now by clicking the new Communications tab (which shows up after you hit “Apply Changes” turning on the feature).

The example below shows how to configure email notifications through Gmail:

You should now see the Integrated Alerting option in the left menu under Alerting, so let’s go there next:

Configuring Alert Destinations

After clicking on the Integrated Alerting option, go to the Notification Channels to configure the destination for your alerts. At the time of this writing, email via your SMTP server, Slack and PagerDuty are supported.

Creating a Custom Alert Template

Alerts are defined using MetricsQL, which is backward compatible with PromQL (the Prometheus query language). As an example, let’s configure an alert to let us know if MongoDB is down.

First, let’s go to the Explore option from the left menu. This is the place to play with the different metrics available and create the expressions for our alerts:

To identify MongoDB being down, one option is using the up metric. The following expression would give us the alert we need:

up{service_type="mongodb"}

To validate this, I shut down a member of a 3-node replica set and verified that the expression returns 0 when the node is down:

The next step is creating a template for this alert. I won’t go into a lot of detail here, but you can check Integrated Alerting Design in Percona Monitoring and Management for more information about how templates are defined.

Navigate to the Integrated Alerting page again, and click on the Add button, then add the following template:

---
templates:
  - name: MongoDBDown
    version: 1
    summary: MongoDB is down
    expr: |-
      up{service_type="mongodb"} == 0
    severity: critical
    annotations:
      summary: MongoDB is down ({{ $labels.service_name }})
      description: |-
        MongoDB {{ $labels.service_name }} on {{ $labels.node_name }} is down

This is how it looks:

Next, go to the Alert Rules and create a new rule. We can use the Filters section to add comma-separated “key=value” pairs to filter alerts per node, per service, per agent, etc.

For example: node_id=/node_id/123456, service_name=mongo1, agent_id=/agent_id/123456

After you are done, hit the Save button and go to the Alerts dashboard to see if the alert is firing:

From this page, you can also silence any firing alerts.

If you configured email as a destination, you should have also received a message like this one:

For now, a single notification is sent. In the future, it will be possible to customize the behavior.

Creating MongoDB Alerts

In addition to the obvious “MongoDB is down” alert, there are a couple more things we should monitor. For starters, I’d suggest creating alerts for the following conditions:

  • Replica set member in an unusual state
mongodb_replset_member_state != 1 and mongodb_replset_member_state != 2

  • Connections higher than expected
avg by (service_name) (mongodb_connections{state="current"}) > 5000

  • Cache evictions higher than expected
avg by(service_name, type) (rate(mongodb_mongod_wiredtiger_cache_evicted_total[5m])) > 5000

  • Low WiredTiger tickets
avg by(service_name, type) (max_over_time(mongodb_mongod_wiredtiger_concurrent_transactions_available_tickets[1m])) < 50

The values listed above are just for illustrative purposes, you need to decide the proper thresholds for your specific environment(s).

As another example, let’s add the alert template for the low WiredTiger tickets:

---
templates:
  - name: MongoDB Wiredtiger Tickets
    version: 1
    summary: MongoDB Wiredtiger Tickets low
    expr: avg by(service_name, type) (max_over_time(mongodb_mongod_wiredtiger_concurrent_transactions_available_tickets[1m])) < 50
    severity: warning
    annotations:
      description: "WiredTiger available tickets on (instance {{ $labels.node_name }}) are less than 50"

Conclusion

Integrated alerting is a really nice to have feature. While it is still in tech preview state, we already have a few built-in alerts you can test, and also you can define your own. Make sure to check the Integrated Alerting official documentation for more information about this topic.

Do you have any specific MongoDB alerts you’d like to see? Given the feature is still in technical preview, any contributions and/or feedback about the functionality are welcome as we’re looking to release this as GA very soon!

Jun
03
2021
--

Webinar June 17: Open Mic on Open Source – Percona Monitoring and Management

June Webinar Percona Monitoring and Management

Join our expert team for a live demo of the Percona Monitoring and Management (PMM) platform and witness the power of Percona support! Bring your questions to the table in a live, open forum discussion. Brian Walters, Sr. Director of Solution Engineering, and Iwo Panowicz, Technical Support Engineer, will highlight features and pain points of Percona Monitoring and Management in an interactive demo of the product. In this unique virtual event, we want to hear from you!

PMM is a free, best-of-breed, open source database monitoring solution. It helps you:

  1. Reduce complexity
  2. Save costs
  3. Optimize performance
  4. Improve the security of your business-critical database environments on-premise, in the cloud, or hybrid.

Please join Brian Walters, Sr. Director of Solution Engineering, and Iwo Panowicz, Technical Support Engineer, on June 17, 2021, at 11:00 CST for their webinar Open Mic on Open Source: Percona Monitoring and Management.

Register for Webinar

If you can’t attend, sign up anyway, and we’ll send you the slides and recording afterward.

May
26
2021
--

Compiling a Percona Monitoring and Management v2 Client in ARM: Raspberry Pi 3 Reprise

Percona Monitoring and Management Client Raspberry Pi 3

In this follow-up blog to Compiling a Percona Monitoring and Management v2 Client in ARM Architecture, we will show what changes are needed to get the latest versions of PMM working on ARM architecture. In this case, we will do it using a Raspberry Pi 3 instead of the AWS EC2 ARM node. With these two blogs, we are covering a good deal of ground with respect to ways of deploying PMM on ARM. Let us know if you use any other device, though, and we’ll be happy to try it if we can get ahold of one!

The latest versions need vmagent to support the newly added VictoriaMetrics functionality. In this case, the binaries are precompiled by the VictoriaMetrics team, so we just need to download them and copy them to the appropriate directories. These steps are exactly the same for AWS, but you should use the ARM64 packages instead.
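
For instance, on an arm64 EC2 instance the vmutils download shown later in this post would use the arm64 tarball instead; exact file names depend on the release, so double-check them on the VictoriaMetrics releases page.

curl -LO https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.59.0/vmutils-arm64-v1.59.0.tar.gz
tar xzf vmutils-arm64-v1.59.0.tar.gz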

For ease of use, we have again included binaries in the following project (note that versions kept there may not be current latest at any point): https://github.com/guriandoro/pmm_arm, and a gist with the commands used.

Installing Dependencies

As always, let’s get some basic packages needed out of the way:

sudo apt-get install git
curl -LO https://golang.org/dl/go1.15.2.linux-armv6l.tar.gz
tar xzf go1.15.2.linux-armv6l.tar.gz
sudo mv go /usr/local/go
echo "export PATH=\$PATH:/usr/local/go/bin" >> .bashrc
source .bashrc

shell> go version
go version go1.15.2 linux/arm

After we install these tools, we will clone all needed projects to our Go source directory:

mkdir -p ~/go/src/github.com/percona/
cd ~/go/src/github.com/percona/
git clone https://github.com/percona/pmm-admin.git
git clone https://github.com/percona/pmm-agent.git
git clone https://github.com/percona/node_exporter.git

And get the VictoriaMetrics vmagent binary ready, too:

cd /tmp/
curl -LO https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.59.0/vmutils-arm-v1.59.0.tar.gz
tar xzf vmutils-arm-v1.59.0.tar.gz

Compiling

As with the previous blog, pmm-admin compilation is straightforward:

cd ~/go/src/github.com/percona/pmm-admin
make release

For node_exporter, we will need the changes for PR 1196:

cd ~/go/src/github.com/percona/node_exporter/
curl -LO https://raw.githubusercontent.com/pgier/node_exporter/1fc4d14c6613766483a8b9117ffa9cc50c1a976d/Makefile.common
make build

Lastly, for pmm-agent, we will also need an additional step. However, before we can introduce the changes, we need to compile and let it fail once:

cd ~/go/src/github.com/percona/pmm-agent
make release
...
/home/pi/go/pkg/mod/github.com/!percona-!lab/pg_query_go@v1.0.1-0.20190723081422-3fc3af54a6f7/parser/include/port/atomics.h:68:10: fatal error: port/atomics/arch-arm.h: No such file or directory
 #include "port/atomics/arch-arm.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:27: release] Error 2

We will then need to download the arch-arm.h file to that directory:

cd ~/go/pkg/mod/github.com/\!percona-\!lab/pg_query_go@v1.0.1-0.20190723081422-3fc3af54a6f7/parser/include/port/atomics/
sudo curl -LO https://raw.githubusercontent.com/postgres/postgres/master/src/include/port/atomics/arch-arm.h
sudo chown pi:pi arch-arm.h

And then we are ready to compile pmm-agent successfully:

cd ~/go/src/github.com/percona/pmm-agent
make release

Moving the Files to Their Final Destination

As the last step, let’s create the needed directories and move the files where they belong.

sudo mkdir -p /usr/local/percona/pmm2/
cd /usr/local/percona/pmm2/
sudo mkdir exporters/
sudo mkdir config/
sudo mkdir -p collectors/textfile-collector/high-resolution
sudo mkdir -p collectors/textfile-collector/medium-resolution
sudo mkdir -p collectors/textfile-collector/low-resolution

cd ~/go/src/github.com/percona/
sudo cp pmm-admin/bin/pmm-admin /usr/local/bin/
sudo cp pmm-agent/bin/pmm-agent /usr/local/bin/

sudo cp node_exporter/node_exporter /usr/local/percona/pmm2/exporters/
sudo cp /tmp/vmagent-prod /usr/local/percona/pmm2/exporters/vmagent

Starting the PMM Client

At last, we can configure and start the client!

sudo pmm-agent setup \
 --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml \
 --server-address=192.168.1.121:443 --server-insecure-tls \
 --server-username=admin --server-password=admin \
 192.168.1.247 generic raspberrypi

sudo pmm-agent --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml

Check if things are working:

shell> pmm-admin list
Service type        Service name        Address and port        Service ID

Agent type           Status           Metrics Mode        Agent ID                                              Service ID
pmm_agent            Connected                            /agent_id/2e2ed51d-f573-4510-8f4f-8b7899ef4f67
node_exporter        Running          push                /agent_id/6f34eb76-429f-47fb-a178-a1bbf4ddadcb
vmagent              Running          push                /agent_id/80a9e104-aab6-48ef-a709-475330385e32

OS and Disk Usage

Some information about the OS and kernel used:

shell> uname -a
Linux raspberrypi 5.4.83-v7+ #1379 SMP Mon Dec 14 13:08:57 GMT 2020 armv7l GNU/Linux

shell> lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 10 (buster)
Release: 10
Codename: buster

As for disk usage, the four compiled binaries will take up 72 MB of space.

Extra Tools

We can also download pt-summary, to get the reports in the Node Summary dashboards:

cd /usr/local/percona/pmm2/tools/
curl -LO percona.com/get/pt-summary
chmod +x pt-summary

We Come in Peace… We Mean you no Harm

Resource usage is in the expected (very low) overhead ranges, as we can see in the following before (completely idle) and after (only PMM client binaries running) top samples.

Before:

top - 05:27:42 up 28 min,  4 users,  load average: 0.00, 0.00, 0.00
Tasks: 135 total,   1 running, 134 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.9 us,  1.2 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :    924.8 total,    543.4 free,    104.3 used,    277.1 buff/cache
MiB Swap:    100.0 total,    100.0 free,      0.0 used.    753.9 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1082 root      20   0   10432   2864   2480 R  11.1   0.3   0:00.05 top
    1 root      20   0   33784   8072   6392 S   0.0   0.9   0:05.36 systemd
...

After:

top - 05:46:09 up 47 min,  5 users,  load average: 0.08, 0.10, 0.09
Tasks: 144 total,   1 running, 143 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.0 us,  0.7 sy,  0.0 ni, 97.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 s
MiB Mem :    924.8 total,    495.2 free,    151.9 used,    277.8 buff/cache
MiB Swap:    100.0 total,    100.0 free,      0.0 used.    713.5 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1136 root      20   0  805056  18368   8136 S   7.9   1.9   0:44.10 node_exporter
 1141 root      20   0  804132  33396   7244 S   1.3   3.5   0:20.27 vmagent
 1214 pi        20   0   10404   3084   2564 R   0.7   0.3   0:00.13 top
...

Conclusion

With this and the blog post linked at the beginning, you should have all the data you need to compile and run the Percona Monitoring and Management (PMM) client on either a Raspberry Pi 3 or an AWS EC2 ARM node.
