Resource Usage Improvements in Percona Monitoring and Management 1.13

PMM 1-13 reduction CPU usage by 5x

In Percona Monitoring and Management (PMM) 1.13 we have adopted Prometheus 2, and with this comes a dramatic improvement in resource usage, along with performance improvements!

What does it mean for you? This means you can have a significantly larger number of servers and database instances monitored by the same PMM installation. Or you can reduce the instance size you use to monitor your environment and save some money.

Let’s look at some stats!

CPU Usage

PMM 1.13 reduction in CPU usage by 5x

Percona Monitoring and Management 1.13 reduction in CPU usage after adopting Prometheus 2 by 8x

We can see an approximate 5x and 8x reduction of CPU usage on these two PMM Servers. Depending on the workload, we see CPU usage reductions to range between 3x and 10x.

Disk Writes

There is also less disk write bandwidth required:

PMM 1.13 reduction in disk write bandwidth

On this instance, the bandwidth reduction is “just” 1.5x times. Note this is disk IO for the entire PMM system, which includes more than only the Prometheus component. Prometheus 2 itself promises much more significant IO bandwidth reduction according to official benchmarks

According to the same benchmark, you should expect disk space usage reduction by 33-50% for Prometheus 2 vs Prometheus 1.8. The numbers will be less for Percona Monitoring and Management, as it also stores Query Statistics outside of Prometheus.

Resource usage on the monitored hosts

Also, resource usage on the monitored hosts is significantly reduced:

Percona Monitoring and Management 1.13 reduction of resource usage by Prometheus 2

Why does CPU usage go down on a monitored host with a Prometheus 2 upgrade? This is because PMM uses TLS for the Prometheus to monitored host communication. Before Prometheus 2, a full handshake was performed for every scrape, taking a lot of CPU time. This was optimized with Prometheus 2, resulting in a dramatic CPU usage decrease.

Query performance is also a lot better with Prometheus 2, meaning dashboards visually load a lot faster, though we did not do any specific benchmarks here to share the hard numbers. Note though this improvement only applies when you’re querying the data which is stored in Prometheus 2.

If you’re querying data that was originally stored in Prometheus 1.8, it will be queried through the much slower and less efficient “Remote Read” interface, being quite a bit slower and using a lot more CPU and memory resources.

If you love better efficiency and Performance, consider upgrading to PMM 1.13!

The post Resource Usage Improvements in Percona Monitoring and Management 1.13 appeared first on Percona Database Performance Blog.


Percona Monitoring and Management 1.13.0 Is Now Available

Percona Monitoring and Management

Percona Monitoring and ManagementPMM (Percona Monitoring and Management) is a free and open-source platform for managing and monitoring MySQL and MongoDB performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

The most significant feature in this release is Prometheus 2, however we also packed a lot of visual changes into release 1.13:

  • Prometheus 2 – Consumes less resources, and Dashboards load faster!
  • New Dashboard: Network Overview – New dashboard for all things IPv4!
  • New Dashboard: NUMA Overview – New Dashboard! Understand memory allocation across DIMMs
  • Snapshots and Updates Improvements – Clearer instructions for snapshot sharing, add ability to disable update reporting
  • System Overview Dashboard improvements – See high level summary, plus drill in on CPU, Memory, Disk, and Network
  • Improved SingleStat for percentages – Trend line now reflects percentage value

We addressed 13 new features and improvements, and fixed 13 bugs.

Prometheus 2

The long awaited Prometheus 2 release is here!  By upgrading to PMM release 1.13, Percona’s internal testing has shown you will achieve a 3x-10x reduction in CPU usage, which translates into PMM Server being able to handle more instances than you could in 1.12.  You won’t see any gaps in graphs since internally PMM Server will run two instances of Prometheus and leverage remote_read in order to provide consistent graphs!

Our Engineering teams have worked very hard to make this upgrade as transparent as possible – hats off to them for their efforts!!

Lastly on Prometheus 2, we also included a new set of graphs to the Prometheus Dashboard to help you better understand when your PMM Server may run out of space. We hope you find this useful!

Network Overview Dashboard

We’re introducing a new dashboard that focuses on all things Networking – we placed a Last Hour panel highlighting high-level network metrics, and then drill into Network Traffic + Details, then focus on TCP, UDP, and ICMP behavior.

Snapshots and Updates Improvements

Of most interest to current Percona Customers, we’ve clarified the instructions on how to take a snapshot of a Dashboard in order to highlight that you are securely sharing with Percona. We’ve also configured the sharing timeout to 30 seconds (up from 4 seconds) so that we more reliably share useful data to Percona Support Engineers, as shorter timeout led to incomplete graphs being shared.

Packed into this feature is also a change to how we report installed version, latest version, and what’s new information:

Lastly, we modified the behavior of the docker environment option DISABLE_UPDATES to remove the Update button.  As a reminder, you can choose to disable update reporting for environments where you want tighter control over (i.e. lock down) who can initiate an update by launching the PMM docker container along with the environment variable as follows:

docker run ... -e DISABLE_UPDATES=TRUE

System Overview Dashboard Improvements

We’ve updated our System Overview Dashboard to focus on the four criteria of CPU, Memory, Disk, and Network, while also presenting a single panel row of high level information (uptime, count of CPUs, load average, etc)

Our last feature we’re introducing in 1.13 is a fix to SingleStat panels where the percentage value is reflected in the level of the trend line in the background.  For example, if you have a stat panel at 20% and 86%, the line in the background should fill the respective amount of the box:Improved SingleStat for percentages

New Features & Improvements

  • PMM-2225 – Add new Dashboard: Network Overview
  • PMM-2485 – Improve Singlestat for percentage values to accurately display trend line
  • PMM-2550 – Update to Prometheus 2
  • PMM-1667 – New Dashboard: NUMA Overview
  • PMM-1930 – Reduce Durability for MySQL
  • PMM-2291 – Add Prometheus Disk Space Utilization Information
  • PMM-2444 – Increase space for legends
  • PMM-2594 – Upgrade to Percona Toolkit 3.0.10
  • PMM-2610 – Configure Snapshot Timeout Default Higher and Update Instructions
  • PMM-2637 – Check for Updates and Disable Updates Improvements
  • PMM-2652 – Fix “Unexpected error” on Home dashboard after upgrade
  • PMM-2661 – Data resolution on Dashboards became 15sec min instead of 1sec
  • PMM-2663 – System Overview Dashboard Improvements

Bug Fixes

  • PMM-1977 – after upgrade pmm-client (1.6.1-1) can’t start mysql:metrics – can’t find .my.cnf
  • PMM-2379 – Invert colours for Memory Available graph
  • PMM-2413 – Charts on MySQL InnoDB metrics are not fully displayed
  • PMM-2427 – Information loss in CPU Graph with Grafana 5 upgrade
  • PMM-2476 – AWS PMM is broken on C5/M5 instances
  • PMM-2576 – Error in logs for MySQL 8 instance on CentOS
  • PMM-2612 – Wrong information in PMM Scrapes Task
  • PMM-2639 – mysql:metrics does not work on Ubuntu 18.04
  • PMM-2643 – Socket detection and MySQL 8
  • PMM-2698 – Misleading Graphs for Rare Events
  • PMM-2701 – MySQL 8 – Innodb Checkpoint Age
  • PMM-2722 – Memory auto-configuration for Prometheus evaluates to minimum of 128MB in

How to get PMM Server

PMM is available for installation using three methods:

The post Percona Monitoring and Management 1.13.0 Is Now Available appeared first on Percona Database Performance Blog.


Percona Monitoring and Management 1.12.0 Is Now Available

Percona Monitoring and Management

Percona Monitoring and ManagementPMM (Percona Monitoring and Management) is a free and open-source platform for managing and monitoring MySQL and MongoDB performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

In release 1.12, we invested our efforts in the following areas:

  • Visual Explain in Query Analytics – Gain insight into MySQL’s query optimizer for your queries
  • New Dashboard – InnoDB Compression Metrics – Evaluate effectiveness of InnoDB Compression
  • New Dashboard – MySQL Command/Handler Compare – Contrast MySQL instances side by side
  • Updated Grafana to 5.1 – Fixed scrolling issues

We addressed 10 new features and improvements, and fixed 13 bugs.

Visual Explain in Query Analytics

We’re working on substantial changes to Query Analytics and the first part to roll out is something that users of Percona Toolkit may recognize – we’ve introduced a new element called Visual Explain based on pt-visual-explain.  This functionality transforms MySQL EXPLAIN output into a left-deep tree representation of a query plan, in order to mimic how the plan is represented inside MySQL.  This is of primary benefit when investigating tables that are joined in some logical way so that you can understand in what order the loops are executed by the MySQL query optimizer. In this example we are demonstrating the output of a single table lookup vs two table join:

Single Table Lookup Two Tables via INNER JOIN
FROM sbtest13
AND 49907
SELECT sbtest3.c
FROM sbtest1
INNER JOIN sbtest3
ON =
WHERE sbtest3.c ='long-string';

InnoDB Compression Metrics Dashboard

A great feature of MySQL’s InnoDB storage engine includes compression of data that is transparently handled by the database, saving you space on disk, while reducing the amount of I/O to disk as fewer disk blocks are required to store the same amount of data, thus allowing you to reduce your storage costs.  We’ve deployed a new dashboard that helps you understand the most important characteristics of InnoDB’s Compression.  Here’s a sample of visualizing Compression and Decompression attempts, alongside the overall Compression Success Ratio graph:


MySQL Command/Handler Compare Dashboard

We have introduced a new dashboard that lets you do side-by-side comparison of Command (Com_*) and Handler statistics.  A common use case would be to compare servers that share a similar workload, for example across MySQL instances in a pool of replicated slaves.  In this example I am comparing two servers under identical sysbench load, but exhibiting slightly different performance characteristics:

The number of servers you can select for comparison is unbounded, but depending on the screen resolution you might want to limit to 3 at a time for a 1080 screen size.

New Features & Improvements

  • PMM-2519: Display Visual Explain in Query Analytics
  • PMM-2019: Add new Dashboard InnoDB Compression metrics
  • PMM-2154: Add new Dashboard Compare Commands and Handler statistics
  • PMM-2530: Add timeout flags to mongodb_exporter (thank you unguiculus for your contribution!)
  • PMM-2569: Update the MySQL Golang driver for MySQL 8 compatibility
  • PMM-2561: Update to Grafana 5.1.3
  • PMM-2465: Improve pmm-admin debug output
  • PMM-2520: Explain Missing Charts from MySQL Dashboards
  • PMM-2119: Improve Query Analytics messaging when Host = All is passed
  • PMM-1956: Implement connection checking in mongodb_exporter

Bug Fixes

  • PMM-1704: Unable to connect to AtlasDB MongoDB
  • PMM-1950: pmm-admin (mongodb:metrics) doesn’t work well with SSL secured mongodb server
  • PMM-2134: rds_exporter exports memory in Kb with node_exporter labels which are in bytes
  • PMM-2157: Cannot connect to MongoDB using URI style
  • PMM-2175: Grafana singlestat doesn’t use consistent colour when unit is of type Time
  • PMM-2474: Data resolution on Dashboards became 15sec interval instead of 1sec
  • PMM-2581: Improve Travis CI tests by addressing pmm-admin check-network Time Drift
  • PMM-2582: Unable to scroll on “_PMM Add Instance” page when many RDS instances exist in an AWS account
  • PMM-2596: Set fixed height for panel content in PMM Add Instances
  • PMM-2600: InnoDB Checkpoint Age does not show data for MySQL
  • PMM-2620: Fix balancerIsEnabled & balancerChunksBalanced values
  • PMM-2634: pmm-admin cannot create user for MySQL 8
  • PMM-2635: Improve error message while adding metrics beyond “exit status 1”

Known Issues

  • PMM-2639: mysql:metrics does not work on Ubuntu 18.04 – We will address this in a subsequent release

How to get PMM Server

PMM is available for installation using three methods:

The post Percona Monitoring and Management 1.12.0 Is Now Available appeared first on Percona Database Performance Blog.


Upcoming Webinar Thursday, September 7: Using PMM to Troubleshoot MySQL Performance Issues

Troubleshooting MySQL Performance

Troubleshooting MySQL PerformanceJoin Percona’s Product Manager, Michael Coburn as he presents Using Percona Monitoring and Management to Troubleshoot MySQL Performance Issues on Thursday, September 7, 2017, at 10:00 am PDT / 1:00 pm EDT (UTC-7).


Successful applications often become limited by MySQL performance. Michael will show you how to get great MySQL performance using Percona Monitoring and Management (PMM). There will be a demonstration of how to leverage the combination of the query analytics and metrics monitor when troubleshooting MySQL performance issues. We’ll review the essential components of PMM, and use some of the most common database slowness cases as examples of where to look and what to do.

By the end of the webinar you will have a better understanding of:

  • Query metrics, including bytes sent, lock time, rows sent, and more
  • Metrics monitoring
  • How to identify MySQL performance issues
  • Point-in-time visibility and historical trending of database performance

Register for the webinar here.

Troubleshoot MySQL PerformanceMichael Coburn, Product Manager

Michael joined Percona as a Consultant in 2012 after having worked with high volume stock photography websites and email service provider platforms. WIth a foundation in systems administration, Michael enjoys working with SAN technologies and high availability solutions. A Canadian, Michael currently lives in the Nicoya, Costa Rica area with his wife, two children, and two dogs.

The cost of not properly managing your databases

Every day hundreds of millions of dollars are wasted by allowing improperly tuned or misconfigured systems, misunderstood infrastructure, and inefficient IT operations to live and thrive in data centers around the globe. There are both direct and indirect costs associated with allowing these unhealthy systems to continue to exist. Let’s look at some.

The setup:

Let us start by using a small example. We will start by looking at a small database setup. This setup will have a single master-slave, with a database size of lets say 500GB. Traffic is steady and let’s say this translates into 500 IOPS on the master. You have chosen to host this on Amazon’s AWS. A common way of ensuring backups occur in AWS is to setup ebs snapshots of the slave. In terms of usage, let us assume your CPU is about 50% used and you have about 20GB of hot data that needs to stay in the memory for the database.

If we look at what this would take to support in EC2 you are looking roughly at this:

  • 2 c3.4xlarge servers (16 vcpu, 30GB of memory )
  • Master-Slave Set
  • with 1TB of Provisioned IOPS SSD, over 2 volumes
  • with 500 IOPS on the master, 125 iops on the slave
  • estimated 7TB of storage for snapshots

This calculator gives us an estimated cost of $3,144.28 per month, or roughly $38,000 a year in hosting fees.  Note that you can choose other tiers of service, or reserved or spot servers to get different pricing.

Regular, steady growth:

Now let’s assume your database is growing along with its traffic at about 5% per month (these are rough numbers I know). After a year your database server would be out of steam using 86% CPU, 34GB of hot data (so relying more heavily on disk), and be consuming just about 850GB of storage space. Moving up to the next tier of servers and with additional iops you will see your spend per month jump to around $4,771.32 per month ($57,000 per year).

When tuning and auditing an environment like the above we been able to give some customers up to 50% or more improvement in performance, and often see 20-25% reduction in space. Let’s be conservative and say we can get a 25% boost in your performance, reduce your 5% monthly growth to 4%, and shrink your database by 10%. Based on that you can stave off upgrading your servers an additional 9 months, saving you almost $15,000 in that first year alone. Over 4 years this customer would end up saving an estimated $75,000 in total spend in AWS costs just based on smaller data and performance enhancements.


In this case performance enhancements are not the only place to save costs. Moving from EBS Snapshots to regular MySQL backups using Percona XtraBackup, keeping one copy on disk and sending those backups to s3, the cost of the environment drops to $2,043.87 per month ( from $3144.28).  This means a simple switch of backup methodology can net you about $1,100 a month or $13,200 a year off your hosting bill.


These numbers are are based on only two servers, the saving over dozens or even hundreds of servers can be huge. Take a look at this 10 server environment:

Often we are not only reducing the resources needed, but we can also reduce the number of servers needed to run your application through tuning. We had a recent client who was able to see a 90% reduction in their read heavy workload and actually turn off servers that used to be used to serve their application. Here is what their savings over the next couple of years would look like:

DB Cost Savings - 90% reduction

Here we helped cut this customers direct costs by two thirds.

Handling spikes:

 The one thing to keep in mind is this assumes a linear growth in terms of application and database usage.  This would mean you can predict when you will need hardware.  If your user base is growing and feature adds are controlled it is possible, however in most environments you will not see that linear growth. You will probably see something like this:


DB Workload - Cost savings

Understanding this pattern and the spikes are vital to keeping your costs down. See that giant spike up to 2,500? The first reaction for many is upgrade their hardware, then tune. Inevitably any tuning benefit is offset by the already sunk cost of the hardware upgrade which after the tuning they may not have needed. Getting in front of that spike and preventing it could have saved tens or even hundreds of thousands of dollars.

 Spikes kill performance and cost real dollars. Those spikes may not be easy to find. A few years ago I was working with a Fortune 500 client who had one of these spikes. They had been running perfectly fine with steady but controllable growth for 7 or 8 months, then the 9th month things went very wrong very quick.

A critical component of their company was to certify professionals through a testing application. During their peak time of season these certifications had stopped completely – delaying certifications for thousands of employees and clients for 2 weeks. I was flown out to help control the bleeding and hopefully fix the problem.

The number of users using the application was the same, the number of page views on the web was steady, but the number of queries to the database skyrocketed. None of the queries had hit their threshold of 1 second to be flagged as problem queries. It turned out to be one query that took 250ms to execute that was causing this company to grind to a halt. That one query ended up being executed 25,000 times per page when certain conditions were met, and those conditions were not met until the 9th month after this application was re-released.

This query lay like a trojan horse waiting to destroy this company’s ability to deliver to its customers.

 Two lessons can be learned from this. The first is even a seemingly well-tuned system may not be. Second, small things matter. In this case fixing the code is the correct solution, however, proper indexing of the tables dropped the query time from 250ms down to 50ms.  This was enough of a relief to allow the certification process to start up again until the code could be fixed. A seemingly small impacting query still should be optimized.

 Another source for these performance spikes is a company’s application release cycle. Applications are very a living entity in today’s world. They grow and expand and change on a regular basis. In order to stay ahead of any problems you need to have a process and resources in place that can proactively monitor and tune. Every release of new code should be going through a rigorous performance review to prevent trojan problems that may cause problems and extra costs down the road.

Indirect and hard to calculate costs:

All of this discussion so far has been around direct hosting costs. There is also a cost to your reputation and your ability to deliver services that meet a customer’s expectations. Customers who come to your site or are using your application can leave in droves due to poor performance. We have seen several customers who lost 50% or more of their user base due to performance problems with the application.

Lost revenue and profits are often much more difficult to quantify, and vary greatly from company to company. This cost, however, is very real. Silicon Valley is littered with the remnants of companies that did not plan to address scale or simply missed important problems in their IT infrastructure. Unfortunately I have worked directly with numerous companies that learned this lesson the hard way. These hidden costs can kill a customer quicker than any competitor or market shift.

 One of the biggest hidden costs companies needlessly pay is the cost of downtime.

The cost of downtime:

I was reading a gartner study where they estimated that the cost per minute of downtime was $5,600 dollars; other studies, like this one, have pegged the cost per minute of downtime at $7,900.

Anyway you slice it being down for even a minute costs you money. If we are conservative in our estimates, the cost of an hour of downtime can easily top $100,000. It’s amazing the number of well-established companies that don’t have a solid plan for dealing with downtime.

Let’s look at some common disaster recovery policies:

Restore from backup:

How quickly can your DBAs get alerted to an outage, then login to look at the outage, and finally make a call whether or not to restore? I submit that most people are going to take a few minutes to get an alert (let’s say 2). They will then take a few minutes to get to the computer and into the system (let’s say 5 minutes). Then they will take at least 10 minutes to try and figure out what’s going on. Fast-forward 17 minutes later…. minimum has gone by with nothing to show for it.

Restoring the backup itself could take a few minutes or several hours. Let’s just say 40 minutes total. If we use that $7,900 number, you could have just lost $316,000. That’s a huge amount that could have easily been avoided. Maybe you know that you’re not losing $7,900 a minute, maybe it’s only $1,000. That’s still $40,000!

Manual failover to a slave:

The time for getting, reacting and taking action does not change in this equation. The original 17 minutes of time (minimum) to react and start fixing just potentially cost you $134,300.

Automated failover:

Not all automated failovers are created equal. Some solutions can take several minutes to even hours to restore proper service (passive cold slaves warm up time). Just because you think you are protected does not mean you are. Having the right automated solution can mean you minimize your downtime risks to $10K or less, having the wrong one can be worse than having none at all.

It’s important to understand the cost of downtime and pick the proper solution to mitigate it.

Cost of being wrong is high:

These are just a few of the costs that companies can incur by having the improper database and infrastructure setup. Mitigating these costs requires a solid process, a high-level of expertise, and the right resources in place.

The post The cost of not properly managing your databases appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by