Grafana Labs launches observability stack for enterprise customers

Grafana Labs has created an open-source observability trifecta that includes Prometheus for monitoring, Loki for logging and Tempo for tracing. Today, the company announced it was releasing enterprise versions of these open-source projects in a unified stack designed specifically for the needs of large companies.

Company CEO Raj Dutt says that this product is really aimed at the largest companies in the world, who crave control over their software. “We’re really going after at-scale users who want a cutting-edge observability platform based on these leading open-source projects. And we are adding a lot of feature differentiation in the enterprise version along with 24/7 support from the experts, from the people who have actually created software,” he said.

Among those features is a set of plug-ins that lets these large customers pull data into the platform from leading enterprise software companies, including Splunk, New Relic, MongoDB and Snowflake. The Enterprise Stack also provides enhanced authentication and security.

Dutt calls this product self-managed to contrast it with the managed cloud versions of the product the company already has been offering for some time. “We have two main products, Grafana Cloud and now Grafana Enterprise Stack. Grafana Cloud is our hosted deployment model, and the Grafana Enterprise Stack is essentially licensed software that customers are free to run however they want, whether that’s on prem, in a colocation company like Equinix or on the cloud vendor of their choice,” Dutt explained.

They can also mix and match their deployments across the cloud or on-prem in a hybrid style, and the large enterprise customers that the company is going after with this product should like that flexibility. “It also allows them to hybridize their deployments, so they may decide to use the cloud for metrics, but their logs contain a lot of sensitive information [and they want to deploy that on prem]. And since it’s a composable stack, they may have a hybrid deployment that’s partly in the cloud and partly on prem,” he said.

When you combine this new enterprise version with the managed cloud version that already exists, it gives Grafana another potentially large revenue source. The open-source products act as a driver, giving Grafana a way into these companies, and Dutt says they know of more than 700,000 instances of the open-source products in use across the world.

While the open-source business model usually turns only a fraction of these users into paying customers, having numbers like this gives the company a huge head start, and it has gotten the attention of investors. The company has already raised over $75 million, including a $24 million Series A in 2019 and a $50 million Series B in 2020.


Praqma puts Atlassian’s Data Center products into containers

It’s KubeCon + CloudNativeCon this week and in the slew of announcements, one name stood out: Atlassian. The company is best known as the maker of tools that allow developers to work more efficiently, and not as a cloud infrastructure provider. In this age of containerization, though, even Atlassian can bask in the glory that is Kubernetes, because the company today announced that its channel partner Praqma is launching Atlassian Software in Kubernetes (ASK), a new solution that allows enterprises to run and manage their on-premises Atlassian applications, like Jira Data Center, as containers with the help of Kubernetes.

Praqma is now making ASK available as open source.

As the company notes in today’s announcement, running a Data Center application and ensuring high availability can be a lot of work using today’s methods. With ASK and by containerizing the applications, scaling and management should become easier, and downtime more avoidable.

“Availability is key with ASK. Automation keeps mission-critical applications running whatever happens,” Praqma’s team explains. “If a Jira server fails, Data Center will automatically redirect traffic to healthy servers. If an application or server crashes Kubernetes automatically reconciles by bringing up a new application. There’s also zero downtime upgrades for Jira.”

ASK handles the scaling and most admin tasks, in addition to offering a monitoring solution based on the open-source Grafana and Prometheus projects.

Containers are slowly becoming the distribution medium of choice for a number of vendors. As enterprises move their existing applications to containers, it makes sense for them to also expect that they can manage their existing on-premises applications from third-party vendors in the same systems. For some vendors, that may mean a shift away from per-server licensing to per-seat licensing, so there are business implications to this, but in general, it’s a logical move for most.


PMM’s Custom Queries in Action: Adding a Graph for InnoDB mutex waits

PMM mutex wait graph

One of the great things about Percona Monitoring and Management (PMM) is its flexibility. An example of that is how one can go beyond the exporters to collect data. One approach to achieve that is using textfile collectors, as explained in  Extended Metrics for Percona Monitoring and Management without modifying the Code. Another method, which is the subject matter of this post, is to use custom queries.

While working on a customer’s contention issue I wanted to check the behaviour of InnoDB Mutexes over time. Naturally, I went straight to PMM and didn’t find a graph suitable for my needs. No graph, no problem! Luckily anyone can enhance PMM. So here’s how I made the graph I needed.

The final result will look like this:

Custom Queries

What is it?

Starting with version 1.15.0, PMM gives users the ability to take a SQL SELECT statement and turn the result set into a metric series in PMM. That is what custom queries are.

How do I enable that feature?

This feature is ON by default. You only need to edit the configuration file, which uses YAML syntax.

Where is the configuration file located?

Config file location is /usr/local/percona/pmm-client/queries-mysqld.yml by default. You can change it when adding mysql metrics via pmm-admin:

pmm-admin add mysql:metrics ... -- --queries-file-name=/usr/local/percona/pmm-client/query.yml

How often is data being collected?

The queries are executed at the LOW RESOLUTION level, which by default is every 60 seconds.

InnoDB Mutex monitoring

The method used to gather mutex status is to query the Performance Schema, as explained here: https://dev.mysql.com/doc/refman/5.7/en/monitor-innodb-mutex-waits-performance-schema.html. I intentionally removed the SUM_TIMER_WAIT > 0 condition, so the query used looks like this:

SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/synch/mutex/innodb/%'

For this query to return data, some requirements need to be met:

  • The most important one: Performance Schema needs to be enabled
  • Consumers for “events_waits” enabled
  • Instruments for ‘wait/synch/mutex/innodb’ enabled

If performance schema is enabled, the other two requirements are met by running these two queries:

update performance_schema.setup_instruments set enabled='YES' where name like 'wait/synch/mutex/innodb%';
update performance_schema.setup_consumers set enabled='YES' where name like 'events_waits%';

YAML Configuration File

This is where the magic happens. The YAML syntax is explained in depth in the documentation: https://www.percona.com/doc/percona-monitoring-and-management/conf-mysql.html#pmm-conf-mysql-executing-custom-queries

The one used for this issue is:

mysql_global_status_innodb_mutex:
    query: "SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT FROM performance_schema.events_waits_summary_global_by_event_name WHERE EVENT_NAME LIKE 'wait/synch/mutex/innodb/%'"
    metrics:
      - EVENT_NAME:
          usage: "LABEL"
          description: "Name of the mutex"
      - COUNT_STAR:
          usage: "COUNTER"
          description: "Number of calls"
      - SUM_TIMER_WAIT:
          usage: "GAUGE"
          description: "Duration"

The key info is:

  • The metric name is mysql_global_status_innodb_mutex
  • Since EVENT_NAME is used as a label, it will be possible to have values per event

Remember that this should be in the queries-mysqld.yml file, whose full path is /usr/local/percona/pmm-client/queries-mysqld.yml on the database node.
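To make the naming concrete, here is a rough Python sketch of how one result row could be turned into metric samples under this mapping. It is not PMM's implementation: the function name rows_to_samples and the numbers in the sample row are made up for illustration. It only mirrors the convention visible in the PromQL used later in this post, where the metric name and the column name are joined with an underscore and LABEL columns are attached as labels.

```python
# Illustrative only: mimics the naming convention, not PMM's actual exporter code.
METRIC = "mysql_global_status_innodb_mutex"
COLUMNS = {  # column -> usage, as in the YAML mapping
    "EVENT_NAME": "LABEL",
    "COUNT_STAR": "COUNTER",
    "SUM_TIMER_WAIT": "GAUGE",
}

def rows_to_samples(rows):
    """Turn result rows (dicts) into Prometheus-style sample lines."""
    lines = []
    for row in rows:
        # Columns marked LABEL become labels on every sample of the row.
        labels = ",".join(
            f'{col}="{row[col]}"' for col, usage in COLUMNS.items() if usage == "LABEL"
        )
        # Every other column becomes its own series: <metric>_<column>.
        for col, usage in COLUMNS.items():
            if usage != "LABEL":
                lines.append(f"{METRIC}_{col}{{{labels}}} {row[col]}")
    return lines

# Hypothetical row; the numbers are invented for the example.
example = [{"EVENT_NAME": "wait/synch/mutex/innodb/buf_pool_mutex",
            "COUNT_STAR": 120, "SUM_TIMER_WAIT": 4500}]
for line in rows_to_samples(example):
    print(line)
```

Each row therefore yields one series per non-label column, which is why the graph queries refer to mysql_global_status_innodb_mutex_COUNT_STAR and mysql_global_status_innodb_mutex_SUM_TIMER_WAIT separately.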

Once that is done, those metrics will start to become available in Prometheus. Now we have a graph to build!

Creating the graph in Grafana

Before jumping to Grafana to add the graph, we need a proper Prometheus query (a.k.a. PromQL). I came up with these two (one for COUNT_STAR, one for SUM_TIMER_WAIT):

topk(5, label_replace(rate(mysql_global_status_innodb_mutex_COUNT_STAR{instance="$host"}[$interval]), "mutex", "$2", "EVENT_NAME", "(.*)/(.*)" ) or label_replace(irate(mysql_global_status_innodb_mutex_COUNT_STAR{instance="$host"}[5m]), "mutex", "$2", "EVENT_NAME", "(.*)/(.*)" ))


topk(5, label_replace(rate(mysql_global_status_innodb_mutex_SUM_TIMER_WAIT{instance="$host"}[$interval]), "mutex", "$2", "EVENT_NAME", "(.*)/(.*)" ) or label_replace(irate(mysql_global_status_innodb_mutex_SUM_TIMER_WAIT{instance="$host"}[5m]), "mutex", "$2", "EVENT_NAME", "(.*)/(.*)" ))

In essence, these queries return the per-second rate of each mutex event for a specific host, keeping the five highest series. The label_replace call uses a regex to keep only the name of the event, discarding everything up to and including the last slash.
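The regex part can be tried outside Prometheus. The following Python sketch approximates what label_replace does with the pattern "(.*)/(.*)" and the replacement "$2"; the helper name last_segment and the sample event name are only illustrative. Prometheus matches the pattern against the whole label value, which is why fullmatch is used here.

```python
import re

def last_segment(event_name):
    """Approximate label_replace(..., "mutex", "$2", "EVENT_NAME", "(.*)/(.*)")."""
    # Prometheus anchors the regex to the whole label value, hence fullmatch.
    m = re.fullmatch(r"(.*)/(.*)", event_name)
    # When the regex does not match, label_replace leaves the series unchanged
    # (no new label is added), which is modelled here as None.
    return m.group(2) if m else None

print(last_segment("wait/synch/mutex/innodb/buf_pool_mutex"))
```

Because the first group is greedy, it swallows everything up to the last slash, so the second group holds just the mutex name.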

Once we are good with our PromQL queries, we can go and add the graph.

Finally, I got the graph that I needed with a very small effort.

The dashboard is also published on the Grafana Labs Community dashboards site.


PMM’s collection of graphs and dashboards is quite complete, but it is also natural that some specific metrics might not be there. For those cases, you can count on the flexibility and ease of use of PMM to collect metrics and create custom graphs. So go ahead, embrace PMM, customize it, make it yours!

The JSON for this graph, so it can be imported easily, is:

{
  "aliasColors": {},
  "bars": false,
  "dashLength": 10,
  "dashes": false,
  "datasource": "Prometheus",
  "fill": 0,
  "gridPos": {
    "h": 18,
    "w": 24,
    "x": 0,
    "y": 72
  },
  "id": null,
  "legend": {
    "alignAsTable": true,
    "avg": true,
    "current": false,
    "max": true,
    "min": true,
    "rightSide": false,
    "show": true,
    "sideWidth": 0,
    "sort": "avg",
    "sortDesc": true,
    "total": false,
    "values": true
  },
  "lines": true,
  "linewidth": 2,
  "links": [],
  "nullPointMode": "null",
  "percentage": false,
  "pointradius": 0.5,
  "points": false,
  "renderer": "flot",
  "seriesOverrides": [
    {
      "alias": "/Timer Wait/i",
      "yaxis": 2
    }
  ],
  "spaceLength": 10,
  "stack": false,
  "steppedLine": false,
  "targets": [
    {
      "expr": "topk(5, label_replace(rate(mysql_global_status_innodb_mutex_COUNT_STAR{instance=\"$host\"}[$interval]), \"mutex\", \"$2\", \"EVENT_NAME\", \"(.*)/(.*)\" )) or topk(5,label_replace(irate(mysql_global_status_innodb_mutex_COUNT_STAR{instance=\"$host\"}[5m]), \"mutex\", \"$2\", \"EVENT_NAME\", \"(.*)/(.*)\" ))",
      "format": "time_series",
      "interval": "$interval",
      "intervalFactor": 1,
      "legendFormat": "{{ mutex }} calls",
      "refId": "A",
      "hide": false
    },
    {
      "expr": "topk(5, label_replace(rate(mysql_global_status_innodb_mutex_SUM_TIMER_WAIT{instance=\"$host\"}[$interval]), \"mutex\", \"$2\", \"EVENT_NAME\", \"(.*)/(.*)\" )) or topk(5, label_replace(irate(mysql_global_status_innodb_mutex_SUM_TIMER_WAIT{instance=\"$host\"}[5m]), \"mutex\", \"$2\", \"EVENT_NAME\", \"(.*)/(.*)\" ))",
      "format": "time_series",
      "interval": "$interval",
      "intervalFactor": 1,
      "legendFormat": "{{ mutex }} timer wait",
      "refId": "B",
      "hide": false
    }
  ],
  "thresholds": [],
  "timeFrom": null,
  "timeShift": null,
  "title": "InnoDB Mutex",
  "tooltip": {
    "shared": true,
    "sort": 2,
    "value_type": "individual"
  },
  "transparent": false,
  "type": "graph",
  "xaxis": {
    "buckets": null,
    "mode": "time",
    "name": null,
    "show": true,
    "values": []
  },
  "yaxes": [
    {
      "format": "short",
      "label": "",
      "logBase": 1,
      "max": null,
      "min": null,
      "show": true
    },
    {
      "decimals": null,
      "format": "ns",
      "label": "",
      "logBase": 1,
      "max": null,
      "min": "0",
      "show": true
    }
  ],
  "yaxis": {
    "align": false,
    "alignLevel": null
  }
}


Scaling Percona Monitoring and Management (PMM)

Starting with PMM 1.13, PMM uses Prometheus 2 for metrics storage, which tends to be the heaviest consumer of CPU and RAM. With the Prometheus 2 performance improvements, PMM can scale to more than 1000 monitored nodes per instance in the default configuration. In this blog post we will look into PMM scaling and capacity planning: how to estimate the resources required, and what drives resource consumption.

PMM tested with 1000 nodes

We have now tested PMM with up to 1000 nodes, using a virtualized system with 128GB of memory, 24 virtual cores, and SSD storage. We found PMM scales pretty linearly with the available memory and CPU cores, and we believe that a higher number of nodes could be supported with more powerful hardware.

What drives resource usage in PMM?

Depending on your system configuration and workload, a single node can generate very different loads on the PMM server. The main factors that impact the performance of PMM are:

  1. Number of samples (data points) ingested into PMM per second
  2. Number of distinct time series they belong to (cardinality)
  3. Number of distinct query patterns your application uses
  4. Number of queries you run against PMM, through the user interface or the API, and their complexity

These specifically can be impacted by:

  • Software version – modern database software versions expose more metrics
  • Software configuration – some metrics are only exposed in certain configurations
  • Workload – a large number of database objects and high concurrency will increase both the number of samples ingested and their cardinality
  • Exporter configuration – disabling collectors can reduce the amount of data collected
  • Scrape frequency – controlled by METRICS_RESOLUTION

All these factors together may impact resource requirements by a factor of ten or more, so do your own testing to be sure. However, the numbers in this article should serve as good general guidance and a starting point for your research.

On the system supporting 1000 instances we observed the following performance:

Performance PMM 1000 nodes load

As you can see, more than 2,000 scrapes/sec are performed, providing almost two million samples/sec, and there are more than eight million active time series. These are the main numbers that define the load placed on Prometheus.

Capacity planning to scale PMM

Both CPU and memory are very important resources for PMM capacity planning. Memory is the more important as Prometheus 2 does not have good options for limiting memory consumption. If you do not have enough memory to handle your workload, then it will run out of memory and crash.

We recommend at least 2GB of memory for a production PMM Installation. A test installation with 1GB of memory is possible. However, it may not be able to monitor more than one or two nodes without running out of memory. With 2GB of memory you should be able to monitor at least five nodes without problem.

With powerful systems (8GB or more) you can have approximately eight systems per 1GB of memory, or about 15,000 samples ingested/sec per 1GB of memory.

To calculate the CPU usage resources required, allow for about 50 monitored systems per core (or 100K metrics/sec per CPU core).
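These two rules of thumb are easy to turn into a quick estimate. The sketch below is only a starting point under the assumptions stated in the text (eight systems per 1GB, 50 systems per core, on servers with 8GB or more); the function name estimate_pmm_server is made up, and your own workload may shift the result by a large factor.

```python
import math

SYSTEMS_PER_GB = 8      # rule of thumb from the text, for servers with 8GB or more
SYSTEMS_PER_CORE = 50   # rule of thumb from the text

def estimate_pmm_server(nodes):
    """Return (memory_gb, cpu_cores) for a given number of monitored systems."""
    memory_gb = math.ceil(nodes / SYSTEMS_PER_GB)
    cores = math.ceil(nodes / SYSTEMS_PER_CORE)
    return memory_gb, cores

print(estimate_pmm_server(1000))
```

For 1000 monitored systems this gives 125GB of memory and 20 cores, which is in the same range as the 128GB, 24-core system used in the test described earlier.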

One problem you’re likely to encounter if you’re running PMM with 100+ instances is the “Home Dashboard”. This becomes way too heavy with such a large number of servers. We plan to fix this issue in future releases of PMM, but for now you can work around it in two simple ways:

You can select the host, for example “pmm-server”, in your home dashboard and save it before adding a large number of hosts to the system.

set home dashboard for PMM

Or you can make some other dashboard of your choice and set it as the home dashboard.


  • More than 1,000 monitored systems are possible per single PMM server
  • Your specific workload and configuration may significantly change the resources required
  • If deploying with 8GB or more, plan 50 systems per core, and eight systems per 1GB of RAM

The post Scaling Percona Monitoring and Management (PMM) appeared first on Percona Database Performance Blog.


Prometheus 2 Times Series Storage Performance Analyses

Prometheus 2 time series database (TSDB) is an amazing piece of engineering, offering a dramatic improvement compared to “v2” storage in Prometheus 1 in terms of ingest performance, query performance and resource use efficiency. As we’ve been adopting Prometheus 2 in Percona Monitoring and Management (PMM), I had a chance to look into the performance of Prometheus 2 TSDB. This blog post details my observations.

Understanding the typical Prometheus workload

For someone who has spent their career working with general purpose databases, the typical workload of Prometheus is quite interesting. The ingest rate tends to remain very stable: typically, devices you monitor will send approximately the same amount of metrics all the time, and infrastructure tends to change relatively slowly.

Queries to the data can come from multiple sources. Some of them, such as alerting, tend to be very stable and predictable too. Others, such as users exploring data, can be spiky, though it is not common for this to be the largest part of the load.

The Benchmark

In my assessment, I focused on handling an ingest workload. I deployed Prometheus 2.3.2 compiled with Go 1.10.1 (as part of PMM 1.14) on Linode using this StackScript. For maximally realistic load generation, I spun up multiple MySQL nodes running a real workload (Sysbench TPC-C test), with each emulating 10 nodes running MySQL and Linux, using this StackScript.

The observations below are based on a Linode instance with eight virtual cores and 32GB of memory, with 20 load-driving nodes simulating the monitoring of 200 MySQL instances. Or, in Prometheus terms: some 800 targets, 440 scrapes/sec, 380K samples ingested per second, and 1.7M active time series.
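To put these totals in perspective, they can be broken down per monitored instance. The short Python sketch below only divides the figures quoted above; nothing in it is measured separately, and the variable names are just for illustration.

```python
# Totals quoted in the text for the benchmark setup.
instances = 200
targets = 800
scrapes_per_sec = 440
samples_per_sec = 380_000
active_series = 1_700_000

per_instance = {
    "targets": targets / instances,
    "samples_per_sec": samples_per_sec / instances,
    "active_series": active_series / instances,
}
samples_per_scrape = samples_per_sec / scrapes_per_sec

print(per_instance)
print(round(samples_per_scrape))
```

So each simulated MySQL instance contributes roughly four targets, 1,900 samples/sec and 8,500 active time series, and an average scrape returns several hundred samples.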

Design Observations

The conventional approach of traditional databases, and the approach that Prometheus 1.x used, is to limit the amount of memory. If this amount of memory is not enough to handle the load, you will have high latency and some queries (or scrapes) will fail. Prometheus 2 memory usage is instead governed by storage.tsdb.min-block-duration, which determines how long samples will be stored in memory before they are flushed (the default being 2h). How much memory it requires will depend on the number of time series, the number of labels you have, and your scrape frequency, in addition to the raw ingest rate. On disk, Prometheus tends to use about three bytes per sample. Memory requirements, though, will be significantly higher.
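As a back-of-the-envelope illustration of the disk figure, the sketch below multiplies the ingest rate from this benchmark by the approximate three bytes per sample. It is a rough estimate only: the helper name disk_bytes is made up, and real usage depends on how well your data compresses.

```python
BYTES_PER_SAMPLE = 3  # approximate on-disk figure quoted in the text

def disk_bytes(samples_per_sec, days):
    """Estimated bytes written to disk for a steady ingest rate."""
    return samples_per_sec * BYTES_PER_SAMPLE * 86_400 * days

# 380K samples/sec is the ingest rate of the benchmark described above.
print(disk_bytes(380_000, 1) / 1e9, "GB per day")
```

At that rate, a day of data comes to a little under 100GB on disk.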

While the configuration knob exists to change the head block size, users are discouraged from tuning it. So you’re limited to providing Prometheus 2 with as much memory as it needs for your workload.

If there is not enough memory for Prometheus to handle your ingest rate, it will crash with an out-of-memory error or be killed by the OOM killer.

Adding more swap space as a “backup” in case Prometheus runs out of RAM does not seem to work as using swap space causes a dramatic memory usage explosion. I suspect swapping does not play well with Go garbage collection.

Another interesting design choice is aligning block flushes to specific times, rather than to time since start:

head block Prometheus 2

As you can see from this graph, flushes happen every two hours, on the clock. If you change min-block-duration to 1h, these flushes will happen every hour, at 30 minutes past the hour.

(If you want to see this and other graphs for your Prometheus installation you can use this Dashboard. It has been designed for PMM but can work for any Prometheus installation with minor adjustments.)

While the active block, called the head block, is kept in memory, blocks containing older data are accessed through mmap(). This eliminates the need to configure a cache separately, but also means you need to leave plenty of memory for the OS cache if you want to query data older than what fits in the head block.

It also means the virtual memory you will see Prometheus 2 using will get very high: do not let it worry you.

Prometheus process memory usage

Another interesting design choice is the WAL configuration. As you can see in the storage documentation, Prometheus protects against data loss during a crash by keeping a write-ahead log (WAL). The exact durability guarantees, though, are not clearly described. As of Prometheus 2.3.2, Prometheus flushes the WAL every 10 seconds, and this value is not user configurable.


Prometheus TSDB is designed somewhat similarly to LSM storage engines: the head block is flushed to disk periodically, while at the same time compactions merge a few blocks together to avoid the need to scan too many blocks for queries.

Here is the number of data blocks I observed on my system after a 24h workload:

active data blocks

If you want more details about storage, you can check out the meta.json file, which has additional information about the blocks you have and how they came about.

{
       "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
       "minTime": 1536472800000,
       "maxTime": 1536494400000,
       "stats": {
               "numSamples": 8292128378,
               "numSeries": 1673622,
               "numChunks": 69528220
       },
       "compaction": {
               "level": 2,
               "sources": [
                       ...
               ],
               "parents": [
                       {
                               "ulid": "01CPYRY9MS465Y5ETM3SXFBV7X",
                               "minTime": 1536472800000,
                               "maxTime": 1536480000000
                       },
                       {
                               "ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                               "minTime": 1536480000000,
                               "maxTime": 1536487200000
                       },
                       {
                               "ulid": "01CPZ6NR4Q3PDP3E57HEH760XS",
                               "minTime": 1536487200000,
                               "maxTime": 1536494400000
                       }
               ]
       },
       "version": 1
}
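The numbers in this meta.json can be cross-checked with a few lines of Python. The sketch below only uses values copied from the listing above; it converts the millisecond timestamps to UTC and derives the block span and the average ingest rate over that span.

```python
from datetime import datetime, timezone

# Values copied from the meta.json shown above.
min_time_ms = 1536472800000
max_time_ms = 1536494400000
num_samples = 8292128378
num_series = 1673622
num_chunks = 69528220

def to_utc(ms):
    """Convert a millisecond Unix timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

span_s = (max_time_ms - min_time_ms) / 1000

print(to_utc(min_time_ms), "->", to_utc(max_time_ms))
print("span hours:", span_s / 3600)
print("samples/sec:", round(num_samples / span_s))
print("samples per chunk:", round(num_samples / num_chunks))
print("aligned to 2h:", min_time_ms % (2 * 3600 * 1000) == 0)
```

The block covers six hours, which matches three two-hour parent blocks merged together. The start time is a multiple of two hours, and the average ingest rate over the block works out to roughly 384K samples/sec, consistent with the 380K samples/sec quoted for this benchmark.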

Compactions in Prometheus are triggered at the time the head block is flushed, and several compactions may be performed at these intervals:

Prometheus 2 compactions

Compactions do not seem to be throttled in any way, causing huge spikes of disk IO usage when they run:

spike in io activity for compactions

And a spike in CPU usage:

spike in CPU usage during compactions

This, of course, can negatively affect system performance. It is also why this is one of the great questions in LSM engines: how to run compactions to maintain great query performance without causing too much overhead.

Memory utilization as it relates to the compaction process is also interesting:

Memory utilization during compaction process

We can see after compaction a lot of memory changes from “Cached”  to “Free”, meaning potentially valuable data is washed out from memory. I wonder if


 or other techniques to minimize data washout from cache are in use, or if this is caused by the fact that the blocks which were cached are destroyed by the compaction process.

Crash Recovery

Crash recovery from the log file takes time, though it is reasonable. For an ingest rate of about 1 mil samples/sec, I observed some 25 minutes recovery time on SSD storage:

level=info ts=2018-09-13T13:38:14.09650965Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=v2.3.2, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
level=info ts=2018-09-13T13:38:14.096599879Z caller=main.go:223 build_context="(go=go1.10.1, user=Jenkins, date=20180725-08:58:13OURCE)"
level=info ts=2018-09-13T13:38:14.096624109Z caller=main.go:224 host_details="(Linux 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 1bee9e9b78cf (none))"
level=info ts=2018-09-13T13:38:14.096641396Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-09-13T13:38:14.097715256Z caller=web.go:415 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T13:38:14.098718401Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536530400000 maxt=1536537600000 ulid=01CQ0FW3ME8Q5W2AN5F9CB7R0R
level=info ts=2018-09-13T13:38:14.100315658Z caller=web.go:467 component=web msg="router prefix" prefix=/prometheus
level=info ts=2018-09-13T13:38:14.101793727Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536732000000 maxt=1536753600000 ulid=01CQ78486TNX5QZTBF049PQHSM
level=info ts=2018-09-13T13:38:14.102267346Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536537600000 maxt=1536732000000 ulid=01CQ78DE7HSQK0C0F5AZ46YGF0
level=info ts=2018-09-13T13:38:14.102660295Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536775200000 maxt=1536782400000 ulid=01CQ7SAT4RM21Y0PT5GNSS146Q
level=info ts=2018-09-13T13:38:14.103075885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536753600000 maxt=1536775200000 ulid=01CQ7SV8WJ3C2W5S3RTAHC2GHB
level=error ts=2018-09-13T14:05:18.208469169Z caller=wal.go:275 component=tsdb msg="WAL corruption detected; truncating" err="unexpected CRC32 checksum d0465484, want 0" file=/opt/prometheus/data/.prom2-data/wal/007357 pos=15504363
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-09-13T14:05:19.471604598Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499156711Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499228186Z caller=main.go:502 msg="Server is ready to receive web requests."
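The recovery time can be read straight from the timestamps in this log. The following Python sketch, with a made-up helper parse_ts, extracts the ts= field from the "Starting TSDB" and "TSDB started" lines copied from the log above and subtracts them. Fractional seconds are cut to microseconds, since the log uses up to nine digits.

```python
import re
from datetime import datetime

TS = re.compile(r"ts=(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d)\.(\d+)Z")

def parse_ts(line):
    """Parse the ts= field; fractional seconds are cut to microseconds."""
    base, frac = TS.search(line).groups()
    micros = int(frac[:6].ljust(6, "0"))
    return datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(microsecond=micros)

# Two lines copied from the log above.
start = 'level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."'
done = 'level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"'

elapsed = parse_ts(done) - parse_ts(start)
print(elapsed.total_seconds() / 60, "minutes")
```

For this particular run the gap is about 27 minutes, in line with the rough figure quoted above.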

The problem I observed with recovery is that it is very memory intensive. While the server may be capable of handling the normal load with memory to spare, if it crashes it may never be able to recover because it runs out of memory. The only solution I found for this is to disable scraping, let it perform crash recovery, and then restart the server with scraping enabled.


Another behavior to keep in mind is the need for warmup: a lower performance/higher resource usage ratio immediately after start. In some, but not all, starts I can observe significantly higher initial CPU and memory usage.

cpu usage during warmup

memory usage during warmup

The gaps in the memory utilization graph show that Prometheus is not initially able to perform all the scrapes configured, and as such some data is lost.

I have not profiled what exactly causes this extensive CPU and memory consumption. I suspect it happens when new time series entries are created in the head block at a high rate.

CPU Usage Spikes

Besides compaction, which is quite heavy on disk IO, I can also observe significant CPU spikes about every two minutes. These last longer with a higher ingest rate. They seem to be caused by Go garbage collection; during these spikes, at least some CPU cores are completely saturated.

cpu usage spikes maybe during Go Garbage collection

cpu saturation and max core usage

These spikes are not just cosmetic. It looks like when these spikes happen, the Prometheus internal /metrics endpoint becomes unresponsive, thus producing data gaps during the exact time that the spikes occur:

Prometheus 2 process memory usage

We can also see the Prometheus Exporter hitting a one second timeout:

scrape time by job

We can observe this correlates with garbage collection:

garbage collection in Prometheus processing


Prometheus 2 TSDB offers impressive performance, being able to handle a cardinality of millions of time series, and also to handle hundreds of thousands of samples ingested per second on rather modest hardware. CPU and disk IO usage are both very impressive. I got up to 200K metrics/sec per used CPU core!

For capacity planning purposes you need to ensure that you have plenty of memory available, and it needs to be real RAM. The actual amount of memory I observed was about 5GB per 100K samples/sec of ingest rate, which with additional space for OS cache makes it 8GB or so.
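As a rough planning aid, this rule of thumb can be written down as a small function. The sketch below simply scales the two figures from the text (5GB for the Prometheus process, about 8GB including OS cache, per 100K samples/sec); the name memory_gb is made up, and linear scaling is an assumption rather than something verified across ingest rates.

```python
GB_PER_100K_INGEST = 5   # Prometheus process memory observed in the text
GB_PER_100K_TOTAL = 8    # including headroom for OS cache, per the text

def memory_gb(samples_per_sec, with_os_cache=True):
    """Scale the per-100K-samples/sec rule of thumb linearly (an assumption)."""
    per_100k = GB_PER_100K_TOTAL if with_os_cache else GB_PER_100K_INGEST
    return samples_per_sec * per_100k / 100_000

print(memory_gb(380_000))         # with OS cache headroom
print(memory_gb(380_000, False))  # Prometheus process only
```

For the 380K samples/sec benchmark load this gives about 30GB in total, close to the 32GB instance used for the tests.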

There is work that remains to be done to avoid CPU and IO usage spikes, though this is not unexpected considering how young Prometheus 2 TSDB is. If we look at InnoDB, TokuDB, RocksDB and WiredTiger, all of them had similar problems in their initial releases.

The post Prometheus 2 Times Series Storage Performance Analyses appeared first on Percona Database Performance Blog.


Monitoring S.M.A.R.T. Metrics with Prometheus and PMM

In his excellent blog post, Pavel Trukhanov showed the value of S.M.A.R.T. metric collection, so I wondered how hard it would be to enable it in Percona Monitoring and Management (PMM).

A quick search led me to the text_collector plugin SmartMon, which can be easily integrated with any Prometheus installation.

For PMM, Vadim Yalovets recently showed how to do custom integrations based on text_collector.

Let’s put those together:

  1. Ensure you have the smartctl tool installed. It is available in repositories for most Linux distributions.
  2. Get smartmon.sh and place it in /usr/local/bin or another location.
  3. Install the cron job:
    echo "*/5 * * * * root bash /usr/local/bin/smartmon.sh > /tmp/smart_metrics.prom" > /etc/cron.d/smartmon
  4. Enable textfile_collector as described in this blog post.

That’s it! You should get your data flowing. Now you can use Prometheus to query device information:

use prometheus to query device

Or if you want to get a specific S.M.A.R.T. value, such as the media wearout indicator:

specific smart value wearout indicator

If you would like to see a nicer visualization in Grafana, you can install the appropriate dashboard from the Grafana web site.

visualized using Grafana
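The file that smartmon.sh writes uses the Prometheus text exposition format, so each sample line has a metric name, an optional set of labels and a value. As a rough sketch of how such a line breaks down, the Python below parses one of the sample lines from the listing further down. The function parse_sample is made up for illustration and is deliberately simplified: it ignores comment lines and optional timestamps.

```python
import re

# Simplified patterns: metric name, optional {labels}, then the value.
LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

def parse_sample(line):
    """Split one sample line into (name, labels dict, float value)."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL.findall(m.group("labels") or ""))
    return m.group("name"), labels, float(m.group("value"))

name, labels, value = parse_sample(
    'smartmon_device_smart_healthy{disk="/dev/sda",type="sat"} 1'
)
print(name, labels, value)
```

The labels (disk, type, smart_id and so on) are what let you filter or group by device in Prometheus queries and Grafana dashboards.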

The number and kind of metrics you’re going to get depends on the storage device vendor and model. Here is an example list from one of my test systems:

# HELP smartmon_smartctl_version SMART metric smartctl_version
# TYPE smartmon_smartctl_version gauge
smartmon_smartctl_version{version="6.5"} 1
# HELP smartmon_current_pending_sector_raw_value SMART metric current_pending_sector_raw_value
# TYPE smartmon_current_pending_sector_raw_value gauge
smartmon_current_pending_sector_raw_value{disk="/dev/sda",type="sat",smart_id="197"} 0.000000e+00
# HELP smartmon_current_pending_sector_threshold SMART metric current_pending_sector_threshold
# TYPE smartmon_current_pending_sector_threshold gauge
smartmon_current_pending_sector_threshold{disk="/dev/sda",type="sat",smart_id="197"} 0
# HELP smartmon_current_pending_sector_value SMART metric current_pending_sector_value
# TYPE smartmon_current_pending_sector_value gauge
smartmon_current_pending_sector_value{disk="/dev/sda",type="sat",smart_id="197"} 100
# HELP smartmon_current_pending_sector_worst SMART metric current_pending_sector_worst
# TYPE smartmon_current_pending_sector_worst gauge
smartmon_current_pending_sector_worst{disk="/dev/sda",type="sat",smart_id="197"} 100
# HELP smartmon_device_info SMART metric device_info
# TYPE smartmon_device_info gauge
smartmon_device_info{disk="/dev/sda",type="sat",vendor="",product="",revision="",lun_id="",model_family="",device_model="Crucial_CT275MX300SSD1",serial_number="16431465B53F",firmware_version="M0CR031"} 1
# HELP smartmon_device_smart_available SMART metric device_smart_available
# TYPE smartmon_device_smart_available gauge
smartmon_device_smart_available{disk="/dev/sda",type="sat"} 1
# HELP smartmon_device_smart_enabled SMART metric device_smart_enabled
# TYPE smartmon_device_smart_enabled gauge
smartmon_device_smart_enabled{disk="/dev/sda",type="sat"} 1
# HELP smartmon_device_smart_healthy SMART metric device_smart_healthy
# TYPE smartmon_device_smart_healthy gauge
smartmon_device_smart_healthy{disk="/dev/sda",type="sat"} 1
# HELP smartmon_end_to_end_error_raw_value SMART metric end_to_end_error_raw_value
# TYPE smartmon_end_to_end_error_raw_value gauge
smartmon_end_to_end_error_raw_value{disk="/dev/sda",type="sat",smart_id="184"} 0.000000e+00
# HELP smartmon_end_to_end_error_threshold SMART metric end_to_end_error_threshold
# TYPE smartmon_end_to_end_error_threshold gauge
smartmon_end_to_end_error_threshold{disk="/dev/sda",type="sat",smart_id="184"} 0
# HELP smartmon_end_to_end_error_value SMART metric end_to_end_error_value
# TYPE smartmon_end_to_end_error_value gauge
smartmon_end_to_end_error_value{disk="/dev/sda",type="sat",smart_id="184"} 100
# HELP smartmon_end_to_end_error_worst SMART metric end_to_end_error_worst
# TYPE smartmon_end_to_end_error_worst gauge
smartmon_end_to_end_error_worst{disk="/dev/sda",type="sat",smart_id="184"} 100
# HELP smartmon_offline_uncorrectable_raw_value SMART metric offline_uncorrectable_raw_value
# TYPE smartmon_offline_uncorrectable_raw_value gauge
smartmon_offline_uncorrectable_raw_value{disk="/dev/sda",type="sat",smart_id="198"} 0.000000e+00
# HELP smartmon_offline_uncorrectable_threshold SMART metric offline_uncorrectable_threshold
# TYPE smartmon_offline_uncorrectable_threshold gauge
smartmon_offline_uncorrectable_threshold{disk="/dev/sda",type="sat",smart_id="198"} 0
# HELP smartmon_offline_uncorrectable_value SMART metric offline_uncorrectable_value
# TYPE smartmon_offline_uncorrectable_value gauge
smartmon_offline_uncorrectable_value{disk="/dev/sda",type="sat",smart_id="198"} 100
# HELP smartmon_offline_uncorrectable_worst SMART metric offline_uncorrectable_worst
# TYPE smartmon_offline_uncorrectable_worst gauge
smartmon_offline_uncorrectable_worst{disk="/dev/sda",type="sat",smart_id="198"} 100
# HELP smartmon_power_cycle_count_raw_value SMART metric power_cycle_count_raw_value
# TYPE smartmon_power_cycle_count_raw_value gauge
smartmon_power_cycle_count_raw_value{disk="/dev/sda",type="sat",smart_id="12"} 2.000000e+01
# HELP smartmon_power_cycle_count_threshold SMART metric power_cycle_count_threshold
# TYPE smartmon_power_cycle_count_threshold gauge
smartmon_power_cycle_count_threshold{disk="/dev/sda",type="sat",smart_id="12"} 0
# HELP smartmon_power_cycle_count_value SMART metric power_cycle_count_value
# TYPE smartmon_power_cycle_count_value gauge
smartmon_power_cycle_count_value{disk="/dev/sda",type="sat",smart_id="12"} 100
# HELP smartmon_power_cycle_count_worst SMART metric power_cycle_count_worst
# TYPE smartmon_power_cycle_count_worst gauge
smartmon_power_cycle_count_worst{disk="/dev/sda",type="sat",smart_id="12"} 100
# HELP smartmon_power_on_hours_raw_value SMART metric power_on_hours_raw_value
# TYPE smartmon_power_on_hours_raw_value gauge
smartmon_power_on_hours_raw_value{disk="/dev/sda",type="sat",smart_id="9"} 1.313300e+04
# HELP smartmon_power_on_hours_threshold SMART metric power_on_hours_threshold
# TYPE smartmon_power_on_hours_threshold gauge
smartmon_power_on_hours_threshold{disk="/dev/sda",type="sat",smart_id="9"} 0
# HELP smartmon_power_on_hours_value SMART metric power_on_hours_value
# TYPE smartmon_power_on_hours_value gauge
smartmon_power_on_hours_value{disk="/dev/sda",type="sat",smart_id="9"} 100
# HELP smartmon_power_on_hours_worst SMART metric power_on_hours_worst
# TYPE smartmon_power_on_hours_worst gauge
smartmon_power_on_hours_worst{disk="/dev/sda",type="sat",smart_id="9"} 100
# HELP smartmon_raw_read_error_rate_raw_value SMART metric raw_read_error_rate_raw_value
# TYPE smartmon_raw_read_error_rate_raw_value gauge
smartmon_raw_read_error_rate_raw_value{disk="/dev/sda",type="sat",smart_id="1"} 0.000000e+00
# HELP smartmon_raw_read_error_rate_threshold SMART metric raw_read_error_rate_threshold
# TYPE smartmon_raw_read_error_rate_threshold gauge
smartmon_raw_read_error_rate_threshold{disk="/dev/sda",type="sat",smart_id="1"} 0
# HELP smartmon_raw_read_error_rate_value SMART metric raw_read_error_rate_value
# TYPE smartmon_raw_read_error_rate_value gauge
smartmon_raw_read_error_rate_value{disk="/dev/sda",type="sat",smart_id="1"} 100
# HELP smartmon_raw_read_error_rate_worst SMART metric raw_read_error_rate_worst
# TYPE smartmon_raw_read_error_rate_worst gauge
smartmon_raw_read_error_rate_worst{disk="/dev/sda",type="sat",smart_id="1"} 100
# HELP smartmon_reallocated_sector_ct_raw_value SMART metric reallocated_sector_ct_raw_value
# TYPE smartmon_reallocated_sector_ct_raw_value gauge
smartmon_reallocated_sector_ct_raw_value{disk="/dev/sda",type="sat",smart_id="5"} 0.000000e+00
# HELP smartmon_reallocated_sector_ct_threshold SMART metric reallocated_sector_ct_threshold
# TYPE smartmon_reallocated_sector_ct_threshold gauge
smartmon_reallocated_sector_ct_threshold{disk="/dev/sda",type="sat",smart_id="5"} 10
# HELP smartmon_reallocated_sector_ct_value SMART metric reallocated_sector_ct_value
# TYPE smartmon_reallocated_sector_ct_value gauge
smartmon_reallocated_sector_ct_value{disk="/dev/sda",type="sat",smart_id="5"} 100
# HELP smartmon_reallocated_sector_ct_worst SMART metric reallocated_sector_ct_worst
# TYPE smartmon_reallocated_sector_ct_worst gauge
smartmon_reallocated_sector_ct_worst{disk="/dev/sda",type="sat",smart_id="5"} 100
# HELP smartmon_reported_uncorrect_raw_value SMART metric reported_uncorrect_raw_value
# TYPE smartmon_reported_uncorrect_raw_value gauge
smartmon_reported_uncorrect_raw_value{disk="/dev/sda",type="sat",smart_id="187"} 0.000000e+00
# HELP smartmon_reported_uncorrect_threshold SMART metric reported_uncorrect_threshold
# TYPE smartmon_reported_uncorrect_threshold gauge
smartmon_reported_uncorrect_threshold{disk="/dev/sda",type="sat",smart_id="187"} 0
# HELP smartmon_reported_uncorrect_value SMART metric reported_uncorrect_value
# TYPE smartmon_reported_uncorrect_value gauge
smartmon_reported_uncorrect_value{disk="/dev/sda",type="sat",smart_id="187"} 100
# HELP smartmon_reported_uncorrect_worst SMART metric reported_uncorrect_worst
# TYPE smartmon_reported_uncorrect_worst gauge
smartmon_reported_uncorrect_worst{disk="/dev/sda",type="sat",smart_id="187"} 100
# HELP smartmon_smartctl_run SMART metric smartctl_run
# TYPE smartmon_smartctl_run gauge
smartmon_smartctl_run{disk="/dev/sda",type="sat"} 1535666337
# HELP smartmon_temperature_celsius_raw_value SMART metric temperature_celsius_raw_value
# TYPE smartmon_temperature_celsius_raw_value gauge
smartmon_temperature_celsius_raw_value{disk="/dev/sda",type="sat",smart_id="194"} 3.100000e+01
# HELP smartmon_temperature_celsius_threshold SMART metric temperature_celsius_threshold
# TYPE smartmon_temperature_celsius_threshold gauge
smartmon_temperature_celsius_threshold{disk="/dev/sda",type="sat",smart_id="194"} 0
# HELP smartmon_temperature_celsius_value SMART metric temperature_celsius_value
# TYPE smartmon_temperature_celsius_value gauge
smartmon_temperature_celsius_value{disk="/dev/sda",type="sat",smart_id="194"} 69
# HELP smartmon_temperature_celsius_worst SMART metric temperature_celsius_worst
# TYPE smartmon_temperature_celsius_worst gauge
smartmon_temperature_celsius_worst{disk="/dev/sda",type="sat",smart_id="194"} 59
# HELP smartmon_udma_crc_error_count_raw_value SMART metric udma_crc_error_count_raw_value
# TYPE smartmon_udma_crc_error_count_raw_value gauge
smartmon_udma_crc_error_count_raw_value{disk="/dev/sda",type="sat",smart_id="199"} 0.000000e+00
# HELP smartmon_udma_crc_error_count_threshold SMART metric udma_crc_error_count_threshold
# TYPE smartmon_udma_crc_error_count_threshold gauge
smartmon_udma_crc_error_count_threshold{disk="/dev/sda",type="sat",smart_id="199"} 0
# HELP smartmon_udma_crc_error_count_value SMART metric udma_crc_error_count_value
# TYPE smartmon_udma_crc_error_count_value gauge
smartmon_udma_crc_error_count_value{disk="/dev/sda",type="sat",smart_id="199"} 100
# HELP smartmon_udma_crc_error_count_worst SMART metric udma_crc_error_count_worst
# TYPE smartmon_udma_crc_error_count_worst gauge
smartmon_udma_crc_error_count_worst{disk="/dev/sda",type="sat",smart_id="199"} 100
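
To make use of a listing like this outside of Prometheus, the text format can be parsed with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the three sample lines are copied from the output above, and the rule "an attribute is suspect when its normalized value is at or below its threshold" is the usual S.M.A.R.T. convention, which is an assumption here rather than something this post states.

```python
import re

# Three lines copied from the sample output above.
SAMPLE = """\
smartmon_reallocated_sector_ct_threshold{disk="/dev/sda",type="sat",smart_id="5"} 10
smartmon_reallocated_sector_ct_value{disk="/dev/sda",type="sat",smart_id="5"} 100
smartmon_temperature_celsius_raw_value{disk="/dev/sda",type="sat",smart_id="194"} 3.100000e+01
"""

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (metric name, labels dict, float value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything that is not a plain sample line
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def at_or_below_threshold(samples):
    """List (disk, attribute) pairs whose normalized value <= threshold."""
    values, thresholds = {}, {}
    for name, labels, value in samples:
        disk = labels.get("disk")
        if name.endswith("_raw_value"):
            continue  # raw values are vendor specific; compare normalized ones
        if name.endswith("_value"):
            values[(disk, name[: -len("_value")])] = value
        elif name.endswith("_threshold"):
            thresholds[(disk, name[: -len("_threshold")])] = value
    return sorted(
        key for key, value in values.items()
        if key in thresholds and value <= thresholds[key]
    )


print(at_or_below_threshold(parse_samples(SAMPLE)))
```

In day-to-day use the same comparison is better expressed as a Prometheus alerting rule; the script is only meant to show how the `_value` and `_threshold` series relate to each other.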

The post Monitoring S.M.A.R.T. Metrics with Prometheus and PMM appeared first on Percona Database Performance Blog.


This Week in Data with Colin Charles 48: Coinbase Powered by MongoDB and Prometheus Graduates in the CNCF

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

The call for talk submissions for Percona Live Europe 2018 closes today. There may be a short extension, but if you have not submitted your talk yet, I suggest doing so ASAP!

I’m sure many of you have heard of cryptocurrencies, the blockchain, and so on. But how many of you realize that Coinbase, an application that handles cryptocurrency trades, matching book orders, and more, is powered by MongoDB? With the hype and growth in interest in late 2017, Coinbase has had to scale. They gave an excellent talk at MongoDB World, titled MongoDB & Crypto Mania (the video is worth a watch), and they’ve also written a blog post, How we’re scaling our platform for spikes in customer demand. They even went driver hacking (the Ruby driver for MongoDB)!

It is great to see a weekly review of happenings in the Vitess world.

PingCAP and TiDB have been to many Percona Live events to present, and PingCAP recently hired Morgan Tocker. Morgan has migrated his blog from MySQL to TiDB. Read more about his experience in This blog, now Powered by WordPress + TiDB. It reminds me of the early days of Galera Cluster and showing how Drupal could be powered by it!


Link List

  • Sys Schema MySQL 5.7+ – blogger from Wipro, focusing on an introduction to the sys schema on MySQL (note: still not available in the MariaDB Server fork).
  • Prometheus Graduates in the CNCF, so it is now considered a mature project. The graduation criteria state that “projects must demonstrate thriving adoption, a documented, structured governance process, and a strong commitment to community sustainability and inclusivity.” Percona benefits from Prometheus in Percona Monitoring & Management (PMM), so we should celebrate this milestone!
  • Replicating from MySQL 8.0 to MySQL 5.7
  • A while ago in this column, we linked to Shlomi Noach’s excellent post on MySQL High Availability at GitHub. We were also introduced to GitHub Load Balancer (GLB), which they ran on top of HAProxy. However, back then GLB wasn’t open; now you can get GLB Director: GLB: GitHub’s open source load balancer. The project describes GLB Director as: “… a Layer 4 load balancer which scales a single IP address across a large number of physical machines while attempting to minimise connection disruption during any change in servers. GLB Director does not replace services like haproxy and nginx, but rather is a layer in front of these services (or any TCP service) that allows them to scale across multiple physical machines without requiring each machine to have unique IP addresses.”
  • F1 Query: Declarative Querying at Scale – a well-written paper.

Upcoming Appearances


I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.


The post This Week in Data with Colin Charles 48: Coinbase Powered by MongoDB and Prometheus Graduates in the CNCF appeared first on Percona Database Performance Blog.


Resource Usage Improvements in Percona Monitoring and Management 1.13

In Percona Monitoring and Management (PMM) 1.13 we have adopted Prometheus 2, and with this comes a dramatic improvement in resource usage, along with performance improvements!

What does it mean for you? This means you can have a significantly larger number of servers and database instances monitored by the same PMM installation. Or you can reduce the instance size you use to monitor your environment and save some money.

Let’s look at some stats!

CPU Usage

PMM 1.13 reduction in CPU usage by 5x

Percona Monitoring and Management 1.13 reduction in CPU usage after adopting Prometheus 2 by 8x

We can see approximately 5x and 8x reductions in CPU usage on these two PMM Servers. Depending on the workload, we see CPU usage reductions ranging between 3x and 10x.
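
An "Nx reduction" is easier to compare with a cloud bill when expressed as a percentage. As a small worked example (plain arithmetic on the factors quoted above, not additional measurements), a reduction by a factor of N leaves 1/N of the original usage, so the saving is (1 - 1/N) × 100%:

```python
def percent_saved(reduction_factor):
    """Convert an 'Nx reduction' into the percentage of usage saved."""
    return (1 - 1 / reduction_factor) * 100


# Factors quoted in the text: the two servers (5x, 8x) and the observed range (3x to 10x).
for factor in (3, 5, 8, 10):
    print(f"{factor}x reduction -> {percent_saved(factor):.1f}% less CPU")
```

So the 5x and 8x servers use roughly 80% and 87.5% less CPU than before the upgrade.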

Disk Writes

There is also less disk write bandwidth required:

PMM 1.13 reduction in disk write bandwidth

On this instance, the bandwidth reduction is “just” 1.5x. Note this is disk IO for the entire PMM system, which includes more than only the Prometheus component. Prometheus 2 itself promises a much more significant IO bandwidth reduction, according to official benchmarks.

According to the same benchmarks, you should expect disk space usage to drop by 33-50% for Prometheus 2 vs Prometheus 1.8. The reduction will be smaller for Percona Monitoring and Management, as it also stores Query Statistics outside of Prometheus.

Resource usage on the monitored hosts

Also, resource usage on the monitored hosts is significantly reduced:

Percona Monitoring and Management 1.13 reduction of resource usage by Prometheus 2

Why does CPU usage go down on a monitored host with a Prometheus 2 upgrade? This is because PMM uses TLS for communication between Prometheus and the monitored hosts. Before Prometheus 2, a full handshake was performed for every scrape, taking a lot of CPU time. This was optimized with Prometheus 2, resulting in a dramatic CPU usage decrease.

Query performance is also a lot better with Prometheus 2, meaning dashboards visibly load a lot faster, though we did not run any specific benchmarks, so we have no hard numbers to share. Note, though, that this improvement only applies when you’re querying data stored in Prometheus 2.

If you’re querying data that was originally stored in Prometheus 1.8, it will be queried through the “Remote Read” interface, which is quite a bit slower and uses a lot more CPU and memory resources.

If you love better efficiency and performance, consider upgrading to PMM 1.13!

The post Resource Usage Improvements in Percona Monitoring and Management 1.13 appeared first on Percona Database Performance Blog.


This Week in Data with Colin Charles 45: OSCON and Percona Live Europe 2018 Call for Papers

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Hello again after the hiatus last week. I’m en route to Portland for OSCON, and am very excited as it is the conference’s 20th anniversary! I hope to see some of you at my talk on July 19.

On July 18, join me for a webinar: MariaDB 10.3 vs. MySQL 8.0 at 9:00 AM PDT (UTC-7) / 12:00 PM EDT (UTC-4). I’m also feverishly working on an update to MySQL vs. MariaDB: Reality Check, now that both MySQL 8.0 and MariaDB Server 10.3 are generally available.

Rather important: Percona Live Europe 2018 Call for Papers is now open. You can submit talk ideas until August 10, and the theme is Connect. Accelerate. Innovate.


Link List

Industry Updates

Upcoming appearances

  • OSCON – Portland, Oregon, USA – July 16-19, 2018


I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 45: OSCON and Percona Live Europe 2018 Call for Papers appeared first on Percona Database Performance Blog.


This Week in Data with Colin Charles 42: Security Focus on Redis and Docker a Timely Reminder to Stay Alert

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Much of last week, there was a lot of talk around this article: New research shows 75% of ‘open’ Redis servers infected. It helps to always read beyond the headlines, because they tend to be more sensationalist than you would expect. I highly recommend reading Clarifications on the Incapsula Redis security report, from the author of Redis, because in this case the problem goes beyond the headline: the content is also suspect. Antirez had to write this to help the press (we totally need to help keep reportage accurate).

Not to depart from the Redis world just yet: Antirez also collaborated with the Apple Information Security Team on the Redis Lua subsystem. The details, documented in Redis Lua scripting: several security vulnerabilities fixed, are pretty interesting, and you’ll note that the Alibaba team also found some other issues. Antirez also ensured that the Redis cloud providers (notably: Redis Labs, Amazon, Alibaba, Microsoft, Google, Heroku, Open Redis and Redis Green) got notified first (compose.io was missing, as noted in the comments, but has now been added to the list). I do not know if Linux distributions were also informed, but they will probably be rolling out updates soon.

In the “be careful where you get your software” department: some criminals figured out they could host Docker containers with crypto-currency mining software pre-installed, which you would get if you used their images. They’ve apparently made over $90,000. It is good to note the news in Backdoored images downloaded 5 million times finally removed from Docker Hub. These images, however, were up on the Docker Hub for ten months, and they managed to get over 5 million downloads across 17 images. Know what images you are pulling. Maybe this is again more reason for software providers to run their own registries?

James Turnbull is out with a new book: Monitoring with Prometheus. It was just released and I’ve grabbed a copy; a review will come shortly. He’s managed all this while pulling off what seems to be yet another great O’Reilly Velocity San Jose Conference.


A quiet week on this front.

Link List

  • INPLACE upgrade from MySQL 5.7 to MySQL 8.0
  • PostgreSQL relevant: What’s is the difference between streaming replication vs hot standby vs warm standby ?
  • A new paper on Amazon Aurora is out: Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes. It was presented at SIGMOD 2018. From the abstract: “One of the more novel differences between Aurora and other relational databases is how it pushes redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. Doing so reduces networking traffic, avoids checkpoints and crash recovery, enables failovers to replicas without loss of data, and enables fault-tolerant storage that heals without database involvement. Traditional implementations that leverage distributed storage would use distributed consensus algorithms for commits, reads, replication, and membership changes and amplify cost of underlying storage.” Aurora, as you know, avoids distributed consensus under most circumstances. Short 8-page read.
  • Dormando is blogging again, and this was of particular interest — Caching beyond RAM: the case for NVMe. This is done in the context of memcached, which I am certain many use.
  • It is particularly heartening to note that not only does MongoDB use Linkbench for some of their performance testing, they’re also contributing to making it better via a pull request.

Industry Updates

Trying something new here… To cover fundraising, and people on the move in the database industry.

  • Kenny Gorman, who has been on the program committee for several Percona Live conferences and has spoken at the event multiple times, is the founder and CEO of Eventador, a stream-processing-as-a-service company built on Apache Kafka and Apache Flink, which has just raised $3.8 million in funding to fuel their growth. They are also naturally spending this on hiring. The full press release.
  • Jimmy Guerrero (formerly of MySQL and InfluxDB) is now VP Marketing & Community at YugaByte DB. YugaByte was covered in column 13 as having raised $8 million in November 2017.

Upcoming appearances

  • DataOps Barcelona – Barcelona, Spain – June 21-22, 2018 – code dataopsbcn50 gets you a discount
  • OSCON – Portland, Oregon, USA – July 16-19, 2018
  • Percona webinar on MariaDB Server 10.3 – June 26, 2018


I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 42: Security Focus on Redis and Docker a Timely Reminder to Stay Alert appeared first on Percona Database Performance Blog.
