Sep
05
2018
--

Modifying List of Collected Metrics on PMM Linux Exporter

Changing metrics collection with PMM for Linux

Changing metrics collection with PMM for LinuxDo you need to modify the metrics collected from Linux by Percona Monitoring and Management (PMM)? In this blog post we will see how to enable, disable, and update collected metrics on PMM’s linux:metrics exporter. 

We will assume that the PMM client packages are installed, and they are configured already.

Using a custom list of metrics

Let’s now suppose we are not yet collecting any metrics on our desired client server, and we want to enable only the following: diskstats, meminfo, netdev and vmstat. We can use the following command:

pmm-admin add linux:metrics -- -collectors.enabled=diskstats,meminfo,netdev,vmstat

So, in order to enable or disable the functionality we want, we need to modify the collectors.enabled list accordingly. In the following online documentation page, we are able to find all collectors supported:

https://www.percona.com/doc/percona-monitoring-and-management/section.exporter.node.html

In this way, by adding or removing items from the collectors.enabled list, we can choose which functionality will be set in our PMM linux:metrics collectors.

Checking list of metrics used

We can use the ps aux command to check the collectors list that applies at any given time. Let’s see this in practical terms. After running the previous command, we should expect to see the following:

shell> ps aux | grep node_exporter
...
root     20450 3.2  0.0 1660248 15924 ?       Sl 13:48 0:13 /usr/local/percona/pmm-client/node_exporter -collectors.enabled=diskstats,filefd,filesystem,loadavg,meminfo,netdev,netstat,stat,time,uname,vmstat,meminfo_numa -web.listen-address=10.10.8.141:42000 -web.auth-file=/usr/local/percona/pmm-client/pmm.yml -web.ssl-cert-file=/usr/local/percona/pmm-client/server.crt -web.ssl-key-file=/usr/local/percona/pmm-client/server.key -collectors.enabled=diskstats,meminfo,netdev,vmstat

Note: there is currently a bug where you will see duplicate entries when using custom collectors. We are tracking this under issue PMM-2857. For now, if you see two different lists on ps aux outputs, the one that applies is the last one (as seen above).

Updating the list of metrics

There is currently no way to enable (or disable) metrics dynamically (however, we are working on it). We will need to stop the collector, manually get and edit the metrics list, and re-add it with the updated collectors in place. For this, we will need to know the current list of metrics used from the ps aux outputs.

Continuing with the same example, suppose we are not happy with what netdev metrics offer, and we wish to disable them to avoid unnecessary overhead. Knowing the metrics list was:

-collectors.enabled=diskstats,meminfo,netdev,vmstat

We will need to run the following commands to have it removed from the collectors list:

pmm-admin stop linux:metrics
pmm-admin rm linux:metrics
pmm-admin add linux:metrics -- -collectors.enabled=diskstats,meminfo,vmstat

Links for reference

  • Passing options to exporter: here and here
  • Collector options for linux:metrics exporter: here

Did you find this post useful?

You might also enjoy some of our other resources. Check out some of these recent technical webinars on using PMM, presented by my colleagues from Percona:

The post Modifying List of Collected Metrics on PMM Linux Exporter appeared first on Percona Database Performance Blog.

Sep
03
2018
--

Monitoring S.M.A.R.T. Metrics with Prometheus and PMM

visualized using Grafana

In his excellent blog post, Pavel Trukhanov showed the value of S.M.A.R.T. metric collections, so I wondered how hard would it be to enable their collection in Percona Monitoring and Management (PMM)

A quick search led me to the  text_collector plugin SmartMon, which can be easily integrated with any Prometheus Installation

For PMM, Vadim Yalovets recently showed how to do custom integrations based on text_collector

Let’s put those together:

  1. Ensure you have the smartctl tool installed. It is available in repositories for most Linux distributions
  2. Get  smartmon.sh and place it in /usr/local/bin or other location
  3. Install the cron job
    echo  "*/5 * * * * root bash  /usr/local/bin/smartmon.sh > /tmp/smart_metrics.prom  " > /etc/cron.d/smartmon
  4. Enable textfile_collector as described in this blog post

That’s it! You should get your data flowing. Now you can use Prometheus to query device information:

use prometheus to query device

Or if you want to get a specific S.M.A.R.T value, such as media_wearout indicator:

specific smart value wearout indicator

If you would like to see a nicer visualization in Grafana, you can install the appropriate dashboard from the Grafana web site.

visualized using Grafana

The number and kind of metrics you’re going to get depends on the storage device vendor and model. Here is an example list from one of my test systems:

# HELP smartmon_smartctl_version SMART metric smartctl_version
# TYPE smartmon_smartctl_version gauge
smartmon_smartctl_version{version="6.5"} 1
# HELP smartmon_current_pending_sector_raw_value SMART metric current_pending_sector_raw_value
# TYPE smartmon_current_pending_sector_raw_value gauge
smartmon_current_pending_sector_raw_value{disk="/dev/sda",type="sat",smart_id="197"} 0.000000e+00
# HELP smartmon_current_pending_sector_threshold SMART metric current_pending_sector_threshold
# TYPE smartmon_current_pending_sector_threshold gauge
smartmon_current_pending_sector_threshold{disk="/dev/sda",type="sat",smart_id="197"} 0
# HELP smartmon_current_pending_sector_value SMART metric current_pending_sector_value
# TYPE smartmon_current_pending_sector_value gauge
smartmon_current_pending_sector_value{disk="/dev/sda",type="sat",smart_id="197"} 100
# HELP smartmon_current_pending_sector_worst SMART metric current_pending_sector_worst
# TYPE smartmon_current_pending_sector_worst gauge
smartmon_current_pending_sector_worst{disk="/dev/sda",type="sat",smart_id="197"} 100
# HELP smartmon_device_info SMART metric device_info
# TYPE smartmon_device_info gauge
smartmon_device_info{disk="/dev/sda",type="sat",vendor="",product="",revision="",lun_id="",model_family="",device_model="Crucial_CT275MX300SSD1",serial_number="16431465B53F",firmware_version="M0CR031"} 1
# HELP smartmon_device_smart_available SMART metric device_smart_available
# TYPE smartmon_device_smart_available gauge
smartmon_device_smart_available{disk="/dev/sda",type="sat"} 1
# HELP smartmon_device_smart_enabled SMART metric device_smart_enabled
# TYPE smartmon_device_smart_enabled gauge
smartmon_device_smart_enabled{disk="/dev/sda",type="sat"} 1
# HELP smartmon_device_smart_healthy SMART metric device_smart_healthy
# TYPE smartmon_device_smart_healthy gauge
smartmon_device_smart_healthy{disk="/dev/sda",type="sat"} 1
# HELP smartmon_end_to_end_error_raw_value SMART metric end_to_end_error_raw_value
# TYPE smartmon_end_to_end_error_raw_value gauge
smartmon_end_to_end_error_raw_value{disk="/dev/sda",type="sat",smart_id="184"} 0.000000e+00
# HELP smartmon_end_to_end_error_threshold SMART metric end_to_end_error_threshold
# TYPE smartmon_end_to_end_error_threshold gauge
smartmon_end_to_end_error_threshold{disk="/dev/sda",type="sat",smart_id="184"} 0
# HELP smartmon_end_to_end_error_value SMART metric end_to_end_error_value
# TYPE smartmon_end_to_end_error_value gauge
smartmon_end_to_end_error_value{disk="/dev/sda",type="sat",smart_id="184"} 100
# HELP smartmon_end_to_end_error_worst SMART metric end_to_end_error_worst
# TYPE smartmon_end_to_end_error_worst gauge
smartmon_end_to_end_error_worst{disk="/dev/sda",type="sat",smart_id="184"} 100
# HELP smartmon_offline_uncorrectable_raw_value SMART metric offline_uncorrectable_raw_value
# TYPE smartmon_offline_uncorrectable_raw_value gauge
smartmon_offline_uncorrectable_raw_value{disk="/dev/sda",type="sat",smart_id="198"} 0.000000e+00
# HELP smartmon_offline_uncorrectable_threshold SMART metric offline_uncorrectable_threshold
# TYPE smartmon_offline_uncorrectable_threshold gauge
smartmon_offline_uncorrectable_threshold{disk="/dev/sda",type="sat",smart_id="198"} 0
# HELP smartmon_offline_uncorrectable_value SMART metric offline_uncorrectable_value
# TYPE smartmon_offline_uncorrectable_value gauge
smartmon_offline_uncorrectable_value{disk="/dev/sda",type="sat",smart_id="198"} 100
# HELP smartmon_offline_uncorrectable_worst SMART metric offline_uncorrectable_worst
# TYPE smartmon_offline_uncorrectable_worst gauge
smartmon_offline_uncorrectable_worst{disk="/dev/sda",type="sat",smart_id="198"} 100
# HELP smartmon_power_cycle_count_raw_value SMART metric power_cycle_count_raw_value
# TYPE smartmon_power_cycle_count_raw_value gauge
smartmon_power_cycle_count_raw_value{disk="/dev/sda",type="sat",smart_id="12"} 2.000000e+01
# HELP smartmon_power_cycle_count_threshold SMART metric power_cycle_count_threshold
# TYPE smartmon_power_cycle_count_threshold gauge
smartmon_power_cycle_count_threshold{disk="/dev/sda",type="sat",smart_id="12"} 0
# HELP smartmon_power_cycle_count_value SMART metric power_cycle_count_value
# TYPE smartmon_power_cycle_count_value gauge
smartmon_power_cycle_count_value{disk="/dev/sda",type="sat",smart_id="12"} 100
# HELP smartmon_power_cycle_count_worst SMART metric power_cycle_count_worst
# TYPE smartmon_power_cycle_count_worst gauge
smartmon_power_cycle_count_worst{disk="/dev/sda",type="sat",smart_id="12"} 100
# HELP smartmon_power_on_hours_raw_value SMART metric power_on_hours_raw_value
# TYPE smartmon_power_on_hours_raw_value gauge
smartmon_power_on_hours_raw_value{disk="/dev/sda",type="sat",smart_id="9"} 1.313300e+04
# HELP smartmon_power_on_hours_threshold SMART metric power_on_hours_threshold
# TYPE smartmon_power_on_hours_threshold gauge
smartmon_power_on_hours_threshold{disk="/dev/sda",type="sat",smart_id="9"} 0
# HELP smartmon_power_on_hours_value SMART metric power_on_hours_value
# TYPE smartmon_power_on_hours_value gauge
smartmon_power_on_hours_value{disk="/dev/sda",type="sat",smart_id="9"} 100
# HELP smartmon_power_on_hours_worst SMART metric power_on_hours_worst
# TYPE smartmon_power_on_hours_worst gauge
smartmon_power_on_hours_worst{disk="/dev/sda",type="sat",smart_id="9"} 100
# HELP smartmon_raw_read_error_rate_raw_value SMART metric raw_read_error_rate_raw_value
# TYPE smartmon_raw_read_error_rate_raw_value gauge
smartmon_raw_read_error_rate_raw_value{disk="/dev/sda",type="sat",smart_id="1"} 0.000000e+00
# HELP smartmon_raw_read_error_rate_threshold SMART metric raw_read_error_rate_threshold
# TYPE smartmon_raw_read_error_rate_threshold gauge
smartmon_raw_read_error_rate_threshold{disk="/dev/sda",type="sat",smart_id="1"} 0
# HELP smartmon_raw_read_error_rate_value SMART metric raw_read_error_rate_value
# TYPE smartmon_raw_read_error_rate_value gauge
smartmon_raw_read_error_rate_value{disk="/dev/sda",type="sat",smart_id="1"} 100
# HELP smartmon_raw_read_error_rate_worst SMART metric raw_read_error_rate_worst
# TYPE smartmon_raw_read_error_rate_worst gauge
smartmon_raw_read_error_rate_worst{disk="/dev/sda",type="sat",smart_id="1"} 100
# HELP smartmon_reallocated_sector_ct_raw_value SMART metric reallocated_sector_ct_raw_value
# TYPE smartmon_reallocated_sector_ct_raw_value gauge
smartmon_reallocated_sector_ct_raw_value{disk="/dev/sda",type="sat",smart_id="5"} 0.000000e+00
# HELP smartmon_reallocated_sector_ct_threshold SMART metric reallocated_sector_ct_threshold
# TYPE smartmon_reallocated_sector_ct_threshold gauge
smartmon_reallocated_sector_ct_threshold{disk="/dev/sda",type="sat",smart_id="5"} 10
# HELP smartmon_reallocated_sector_ct_value SMART metric reallocated_sector_ct_value
# TYPE smartmon_reallocated_sector_ct_value gauge
smartmon_reallocated_sector_ct_value{disk="/dev/sda",type="sat",smart_id="5"} 100
# HELP smartmon_reallocated_sector_ct_worst SMART metric reallocated_sector_ct_worst
# TYPE smartmon_reallocated_sector_ct_worst gauge
smartmon_reallocated_sector_ct_worst{disk="/dev/sda",type="sat",smart_id="5"} 100
# HELP smartmon_reported_uncorrect_raw_value SMART metric reported_uncorrect_raw_value
# TYPE smartmon_reported_uncorrect_raw_value gauge
smartmon_reported_uncorrect_raw_value{disk="/dev/sda",type="sat",smart_id="187"} 0.000000e+00
# HELP smartmon_reported_uncorrect_threshold SMART metric reported_uncorrect_threshold
# TYPE smartmon_reported_uncorrect_threshold gauge
smartmon_reported_uncorrect_threshold{disk="/dev/sda",type="sat",smart_id="187"} 0
# HELP smartmon_reported_uncorrect_value SMART metric reported_uncorrect_value
# TYPE smartmon_reported_uncorrect_value gauge
smartmon_reported_uncorrect_value{disk="/dev/sda",type="sat",smart_id="187"} 100
# HELP smartmon_reported_uncorrect_worst SMART metric reported_uncorrect_worst
# TYPE smartmon_reported_uncorrect_worst gauge
smartmon_reported_uncorrect_worst{disk="/dev/sda",type="sat",smart_id="187"} 100
# HELP smartmon_smartctl_run SMART metric smartctl_run
# TYPE smartmon_smartctl_run gauge
smartmon_smartctl_run{disk="/dev/sda",type="sat"} 1535666337
# HELP smartmon_temperature_celsius_raw_value SMART metric temperature_celsius_raw_value
# TYPE smartmon_temperature_celsius_raw_value gauge
smartmon_temperature_celsius_raw_value{disk="/dev/sda",type="sat",smart_id="194"} 3.100000e+01
# HELP smartmon_temperature_celsius_threshold SMART metric temperature_celsius_threshold
# TYPE smartmon_temperature_celsius_threshold gauge
smartmon_temperature_celsius_threshold{disk="/dev/sda",type="sat",smart_id="194"} 0
# HELP smartmon_temperature_celsius_value SMART metric temperature_celsius_value
# TYPE smartmon_temperature_celsius_value gauge
smartmon_temperature_celsius_value{disk="/dev/sda",type="sat",smart_id="194"} 69
# HELP smartmon_temperature_celsius_worst SMART metric temperature_celsius_worst
# TYPE smartmon_temperature_celsius_worst gauge
smartmon_temperature_celsius_worst{disk="/dev/sda",type="sat",smart_id="194"} 59
# HELP smartmon_udma_crc_error_count_raw_value SMART metric udma_crc_error_count_raw_value
# TYPE smartmon_udma_crc_error_count_raw_value gauge
smartmon_udma_crc_error_count_raw_value{disk="/dev/sda",type="sat",smart_id="199"} 0.000000e+00
# HELP smartmon_udma_crc_error_count_threshold SMART metric udma_crc_error_count_threshold
# TYPE smartmon_udma_crc_error_count_threshold gauge
smartmon_udma_crc_error_count_threshold{disk="/dev/sda",type="sat",smart_id="199"} 0
# HELP smartmon_udma_crc_error_count_value SMART metric udma_crc_error_count_value
# TYPE smartmon_udma_crc_error_count_value gauge
smartmon_udma_crc_error_count_value{disk="/dev/sda",type="sat",smart_id="199"} 100
# HELP smartmon_udma_crc_error_count_worst SMART metric udma_crc_error_count_worst
# TYPE smartmon_udma_crc_error_count_worst gauge
smartmon_udma_crc_error_count_worst{disk="/dev/sda",type="sat",smart_id="199"} 100

The post Monitoring S.M.A.R.T. Metrics with Prometheus and PMM appeared first on Percona Database Performance Blog.

Aug
28
2018
--

Extend Metrics for Percona Monitoring and Management Without Modifying Code

PMM Extended Metrics

Percona Monitoring and Management (PMM) provides an excellent solution for system monitoring. Sometimes, though, you’ll have the need for a metric that’s not present in the list of node_exporter metrics out of the box. In this post, we introduce a simple method and show how to extend the list of available metrics without modifying the node_exporter code. It’s based on the textfile collector.

Enable the textfile collector in pmm-client

This collector is not enabled by default in the latest version of pmm-client. So, first let’s enable the textfile collector.

# pmm-admin rm linux:metrics
OK, removed system pmm-client-hostname from monitoring.
# pmm-admin add linux:metrics -- --collectors.enabled=diskstats,filefd,filesystem,loadavg,meminfo,netdev,netstat,stat,time,uname,vmstat,textfile --collector.textfile.directory="/tmp"
OK, now monitoring this system.
# pmm-admin ls
pmm-admin 1.13.0
PMM Server      | 10.178.1.252  
Client Name     | pmm-client-hostname
Client Address  | 10.178.1.252  
Service Manager | linux-upstart
-------------- -------------------- ----------- -------- ------------ --------
SERVICE TYPE   NAME                 LOCAL PORT  RUNNING  DATA SOURCE  OPTIONS  
-------------- -------------------- ----------- -------- ------------ --------
linux:metrics  pmm-client-hostname  42000       YES      -

Notice that the whole list of default collectors has to be re-enabled. Also, don’t forget to specify the directory for reading files with the metrics (–collector.textfile.directory=”/tmp”). The exporter reads files with the extension .prom

Add a crontab task

The second step is to add a crontab task to collect metrics and place them into a file.

Here are the cron commands for collecting the number of running and stopping docker containers.

*/1 * * * *     root   echo -n "" > /tmp/docker_all.prom; /usr/bin/docker ps -a | sed -n '1!p'| /usr/bin/wc -l | sed -ne 's/^/node_docker_containers_total /p' >> /tmp/doc
ker_all.prom;
*/1 * * * *     root   echo -n "" > /tmp/docker_running.prom; /usr/bin/docker ps | sed -n '1!p'| /usr/bin/wc -l | sed -ne 's/^/node_docker_containers_running_total /p' >>
/tmp/docker_running.prom;

The result of the commands is placed into the files 

/tmp/docker_running.prom

and

/tmp/docker_running.prom

and read by exporter.

Look - we got a new metric!

Adding the crontab tasks by using a script

Also, we have a few bash scripts that make it much easier to add crontab tasks.

The first one allows you to collect the logged-in users and the size of Innodb data files.

Modifying the cron job - a script

You may use the suggested names of files and metrics or set new ones.

The second script is more universal. It allows us to get the size of any directories or files. This script can be placed directly into a crontab task. You should just specify the list of monitored instances (e.g. /var/log /var/cache/apt /var/lib/mysql/ibdata1)

echo  "*/5 * * * * root bash  /root/object_sizes.sh /var/log /var/cache/apt /var/lib/mysql/ibdata1"  > /etc/cron.d/object_size

So, I hope this has provided useful insight into how to set up the collection of new PMM metrics without the need to write code. Please feel free to use the scripts or configure commands similar to the ones provided above.

More resources you might enjoy

If you are new to PMM, there is a great demo site of the latest version, showing you those out of the box metrics. Or how about our free webinar on monitoring Amazon RDS with PMM?

The post Extend Metrics for Percona Monitoring and Management Without Modifying Code appeared first on Percona Database Performance Blog.

Oct
05
2017
--

Graph Descriptions for Metrics Monitor in Percona Monitoring and Management 1.3.0

PMM 1.3.0

The Metrics Monitor of Percona Monitoring and Management 1.3.0 (PMM) provides graph descriptions to display more information about the monitored data without cluttering the interface.

Percona Monitoring and Management 1.3.0 is a free and open-source platform for managing and monitoring MySQL®, MariaDB® and MongoDB® performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL, MariaDB® and MongoDB servers to ensure that your data works as efficiently as possible.

Each dashboard graph in PMM contains a lot of information. Sometimes, it is not easy to understand what the plotted line represents. The metric labels and the plotted data are limited and have to account for the space they can use in dashboards. It is simply not possible to provide additional information which might be helpful when interpreting the monitored metrics.

The new version of the PMM dashboards introduces on-demand descriptions with more details about the metrics in the given graph and about the data.

Percona Monitoring and Management 1.3.0

These on-demand descriptions are available when you hover the mouse pointer over the icon at the top left corner of each graph. The descriptions do not use the valuable space of your dashboard. The graph descriptions appear in small boxes. If more information exists about the monitored metrics, the description contains a link to the associated documentation.

Percona Monitoring and Management 1.3.0

In release 1.3.0 of PMM, Metrics Monitor only starts to use this convenient tool. In subsequent releases, graph descriptions are going to become a standard attribute of each Metrics Monitor dashboard.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com