Feb 20, 2018

Understand Your Prometheus Exporters with Percona Monitoring and Management (PMM)

Prometheus Exporters 2 small

In this blog post, I will look at the new dashboards in Percona Monitoring and Management (PMM) for Prometheus exporters.

Percona Monitoring and Management (PMM) uses Prometheus exporters to capture metrics data from the system it monitors. Those Prometheus exporters are an important part of your monitoring infrastructure, and understanding their performance and other operational details is critical for well-implemented monitoring.    

To help you with this we’ve added a number of new dashboards to Percona Monitoring and Management.

The Prometheus Exporters Overview dashboard provides a high-level overview of your installed Prometheus exporter infrastructure:

Prometheus Exporters

The summary shows you how many hosts are monitored and how many exporters you have running, as well as how much CPU and memory they are using.

Note that the CPU usage shown in this graph is only the CPU usage of the exporter itself. It does not include the additional resources the application or operating system consumes to produce those metrics.

Next, we have an overview of resource usage by the host:  

Prometheus Exporters 2

Prometheus Exporters 3

These graphs let us analyze resource usage across hosts and clearly see whether any host has unusually high CPU or memory usage by exporters.

You may notice some of the CPU usage reported on these graphs is very high. This is due to the fact that we use very high-resolution sampling and very underpowered instances for this demonstration environment. CPU usage numbers like this are not typical.

The next graphs show resource usage by the type of exporter:

Prometheus Exporters 4

Prometheus Exporters 5

In this case, we measure CPU usage in “CPU cores” rather than as a percentage, which is more meaningful: the same amount of actual resource usage by an exporter would otherwise look very different on a system with one core versus a system with 64 cores. Core usage numbers also have a fairly stable baseline.
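
If you want to double-check these numbers outside of Grafana, the same measurement boils down to the per-second rate of CPU seconds each exporter reports about itself. A rough example against the Prometheus API that ships inside PMM Server (the /prometheus path assumes a standard PMM 1.x install; add --user USER:PASS if your server is password-protected):

# rate() of CPU seconds consumed per second equals the number of CPU cores used
curl -sG 'http://pmm-server/prometheus/api/v1/query' \
     --data-urlencode 'query=rate(process_cpu_seconds_total[5m])'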

Then there is a list of your monitored hosts and the exporters they are running:

Prometheus Exporters 6

This shows your CPU usage and memory usage per host, as well as the number of exporters running and system details.

You can click on a host to get to the System Overview, or jump to the Prometheus Exporter Status dashboard.

The Prometheus Exporter Status dashboard lets you investigate how specific exporters are performing on a given host. Each of the well-known exporters has its own row in this dashboard.

Node Exporter Status shows us the resource usage, uptime and performance of Node Exporter (the exporter responsible for capturing OS-level metrics):   

Prometheus Exporters 7

Prometheus Exporters 8

“Collector Scrape Successful” shows which node_exporter collector categories (modules that collect specific information) have returned data reliably. If you see anything but a flat line at “1” here, you need to check for problems.

“Collector Execution Time” shows how long on average it takes to execute your enabled collectors. This shows which collectors are generally more expensive to run (or if some of them are experiencing performance problems).
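
You can also pull these collector metrics straight from node_exporter on a particular host if you want to cross-check the dashboard. A hedged example, assuming the exporter listens on port 42000 with TLS as PMM configures it by default (replace db-host and the port with the REMOTE ENDPOINT shown by pmm-admin check-network):

# 1 means the collector succeeded on the last scrape; the duration metric shows its cost
curl -sk https://db-host:42000/metrics | grep -E 'node_scrape_collector_(success|duration_seconds)'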

MySQL Exporter Status shows us how MySQL exporter is performing:

Prometheus Exporters 9

Additionally, in resource usage we see the rate of scrapes for High, Medium and Low resolution data.

Generally, you should see three flat lines here if everything is working well. That is not the case for this host: some scrapes are not successful – either failing to complete, or not being triggered by the Prometheus server at all (due to overload or connectivity issues).
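
To see exactly which scrapes are failing and why, you can ask the Prometheus targets API on the PMM Server; it lists every target together with its health state and last scrape error (the /prometheus path assumes a standard PMM 1.x install):

curl -s http://pmm-server/prometheus/api/v1/targets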

Prometheus Exporters 10

These graphs provide information about MySQL exporter errors – permission problems and other issues – and show whether the MySQL server was up during this time. Similar details are reported for the MongoDB and ProxySQL exporters if they are running on the host.

I hope these new dashboards help you to understand your Prometheus exporter performance better!

Feb 14, 2018

Amazon Aurora MySQL Monitoring with Percona Monitoring and Management (PMM)

Amazon Aurora MySQL Monitoring small

In this blog post, we’ll review additional Amazon Aurora MySQL monitoring capabilities we’ve added in Percona Monitoring and Management (PMM) 1.7.0. You can see them in action in the MySQL Amazon Aurora Metrics dashboard.

Amazon Aurora MySQL Transaction Commits

Amazon Aurora MySQL Monitoring

This graph looks at the number of commits the Amazon Aurora engine performed, as well as the average commit latency. As you can see from this graph, latency does not always correlate with the number of commits performed and can be quite high in certain situations.

Amazon Aurora MySQL Load

Amazon Aurora MySQL Monitoring 2

In Percona Monitoring and Management, we often use the concept of “Load” – which roughly corresponds to the number of operations of a type in progress. This graph shows us what statements contribute the most load on the system, as well as what load corresponds to the Amazon Aurora transaction commits (which we observed in the graph before).

Amazon Aurora MySQL Memory Usage

Amazon Aurora MySQL Monitoring 3

This graph is pretty self-explanatory. It shows how much memory is used by the Amazon Aurora lock manager, as well as the amount of memory Amazon Aurora uses to store the data dictionary.

Amazon Aurora MySQL Statement Latency

Amazon Aurora MySQL Monitoring 4

This graph shows the average latency for the most important types of statements. Latency spikes, as shown in this example, are often indicative of instance overload.

Amazon Aurora MySQL Special Command Counters

Amazon Aurora MySQL Monitoring 5

Amazon Aurora MySQL allows a number of commands that are not available in standard MySQL. This graph shows the usage of such commands. Regular “unit_test” calls can be seen in the default Amazon Aurora install, and the rest depends on your workload.

Amazon Aurora MySQL Problems

Amazon Aurora MySQL Monitoring 6

This graph is where you want to see a flat line. It shows different kinds of internal Amazon Aurora MySQL problems, which in normal operation should generally be zero.

I hope you find these Amazon Aurora MySQL monitoring improvements useful. Let us know if there is any other Amazon Aurora information that would be helpful to display!

Feb 12, 2018

Does Percona Monitoring and Management (PMM) Support External Monitoring Services? Yes It Does!

External Monitoring Services

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL and MongoDB performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

Starting with version 1.4.0 and improved in 1.7.0, PMM supports external monitoring services. This means you can plug in Prometheus exporters for technologies not directly provided by Percona. For example, you can start monitoring the metrics of your PostgreSQL database host, Memcached or Redis.

Exporters Overview

Applications store their metrics in arbitrary formats, and Prometheus exporters collect them and export them in a consistent format of key-value pairs. The keys refer to metric types, and the values are numbers in float64 format. Because of the diversity of formats applications use, a specific exporter has to be written for each of them. However, if you decide to make your application’s metrics available via PMM, consider using one of the existing Prometheus exporters first.
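
For illustration, the exported page is just plain text in the Prometheus exposition format. A hypothetical application counter might look like this (the metric and label names are made up):

# HELP myapp_requests_total Total number of requests served.
# TYPE myapp_requests_total counter
myapp_requests_total{handler="/index"} 1027
myapp_requests_total{handler="/login"} 3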

Currently, PMM offers exporters for the MySQL (mysqld_exporter) and MongoDB (mongodb_exporter) database management systems. Built-in support also exists for Percona XtraDB Cluster, MariaDB, RDS and Aurora (via mysqld_exporter) and for ProxySQL (via proxysql_exporter). These exporters are made available as monitoring services that you can add or remove as necessary. In addition, PMM includes node_exporter to capture host-level Linux metrics such as CPU, load and disk resources.

Using Exporters

On the computer where the PMM client is installed and connected to a PMM server, use the pmm-admin utility to add any built-in monitoring service directly. There is no extra effort in this case: the added monitoring service runs its exporter, and all required configuration updates are made automatically so the metrics become available in the web interface for further analysis in Query Analytics and Metrics Monitor.

In the case of external monitoring services, you need to locate, download, set up and run the specific Prometheus exporter that collects the metrics. When it is ready, you can add it as a monitoring service:

pmm-admin add external:service job_name [instance] --service-port=PORT_NUMBER

This command adds an external monitoring service bound to the Prometheus job that you specify as the job_name parameter. You should also provide the port associated with this Prometheus job as the value of the --service-port parameter. The instance parameter is optional; by default, it is assigned the name of the host where you run pmm-admin.

Example 1: Adding a PostgreSQL Monitoring Service

In order to add an external monitoring service for a PostgreSQL database server, make sure your PostgreSQL server is installed and configured. Then select a PostgreSQL Prometheus exporter from the list available on the Prometheus site, such as the PostgreSQL metric exporter for Prometheus, and refer to that exporter’s documentation for details about how to install and set it up.

As soon as your Prometheus exporter can collect metrics from your PostgreSQL database server, you are ready to add it as a monitoring service. Make sure that you have access to a configured PMM server and that your PMM client has been set up to use it. Use the pmm-admin utility, which is part of the PMM client, to add the PostgreSQL monitoring service. Assuming postgresql is the name of this monitoring service, your command should look like this:

pmm-admin add external:service --service-port=PORT_NUMBER postgresql

It is now time to display the metrics on the PMM Server. Open Metrics Monitor and check the Advanced Data Exploration dashboard. This dashboard can visualize many metrics, including those exposed by external monitoring services. In the Host field, select your host, and use the Metric field to select a metric.

External Monitoring Services
Viewing a metric exposed by a PostgreSQL exporter.

Setting up an external monitoring service requires extra work compared to adding built-in monitoring services. However, by using external monitoring services you can considerably extend the capabilities of your PMM installation.

Note that running the pmm-admin list command shows the added external monitoring services, and they appear in the JSON output as well. To remove an external service, use the remove command (or its short form rm):

pmm-admin rm external:service --service-port=PORT_NUMBER NAME_OF_EXTERNAL_MONITORING_SERVICE

$ sudo pmm-admin list
pmm-admin 1.7.0
PMM Server      | 192.0.2.2 (password-protected)
Client Name     | postgres01
Client Address  | 192.0.2.3
Service Manager | unix-systemv
Job name    Scrape interval  Scrape timeout  Metrics path  Scheme  Target         Labels                   Health
postgresql  1s               1s              /metrics      http    192.0.2.3:9187 instance="postgres01"      UP

Example 2: Adding a Redis Monitoring Service

To start with, you must install a Prometheus exporter for Redis (this exporter is listed on the Prometheus Exporters and Integrations page) on the machine where your PMM client runs. The following command adds this exporter as an external monitoring service (run it as a superuser or use sudo). This time the command has an extra parameter:

$ sudo pmm-admin add external:service redis --service-port 9121 redis01
External service added.

Notice the last positional parameter passed to the pmm-admin add external:service command (redis01 in this example): it is a label that you assign to this particular instance. The general form of the command is:

pmm-admin add external:service --service-port=PORT_NUMBER NAME_OF_EXTERNAL_MONITORING_SERVICE [INSTANCE_LABEL]

You may choose any name for this purpose. Make sure to use quotes if you decide to use a label made of two or more words.

$ sudo pmm-admin list
pmm-admin 1.7.0
PMM Server | 127.0.0.1
Client Name | percona
Client Address | 172.17.0.1
Service Manager | linux-systemd
No services under monitoring.
Job name Scrape interval Scrape timeout Metrics path Scheme Target          Labels                  Health
redis    1m0s            10s            /metrics     http   172.17.0.1:9121 instance="redis01"      UP

To view Redis related metrics you need to open the Advanced Data Exploration dashboard on your PMM Server. The redis01 label automatically appears in the Host field in the Advanced Data Exploration dashboard. In the Host field, select the redis01 option and choose a metric to view from the Metric field, such as redis_exporter_scrapes_total.

Other Ways to Add External Services

The pmm-admin add external:service command is the recommended way to add an external monitoring service, but other, more specific methods exist. For example, the pmm-admin add external:metrics command adds an external Prometheus exporter job to metrics monitoring.
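
For reference, those more specific commands look roughly like this (the job name and host:port values are placeholders for your own exporters):

# create the Prometheus job and register the first exporter instance
pmm-admin add external:metrics postgresql 192.0.2.3:9187
# register further instances under the same job
pmm-admin add external:instances postgresql 192.0.2.4:9187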

Feb 09, 2018

Collect PostgreSQL Metrics with Percona Monitoring and Management (PMM)

Collecting PostgreSQL Information using Percona Monitoring and Management

In this article, we’ll describe how to collect PostgreSQL metrics with Percona Monitoring and Management (PMM).

We designed Percona Monitoring and Management (PMM) to be the best tool for MySQL and MongoDB performance investigation. At the same time, it’s built on mature open-source components: the Prometheus time-series database and Grafana. Starting from PMM 1.4.0, it’s possible to add monitoring for any service supported by Prometheus.

Demo

# install docker and docker-compose.
git clone https://github.com/ihanick/pmm-postgresql-demo.git
cd pmm-postgresql-demo
docker-compose build
docker-compose up

At this point, we are running the exporter, PostgreSQL and the PMM server, but pmm-client on the PostgreSQL server isn’t configured yet.

docker-compose exec pg sh /root/initpmm.sh

Now we have configured the PMM client and added the external exporter.

Let’s assume that you have executed the commands above on localhost. At this point, we have several URLs available.

We also need to create graphs for our new exporter. This can be done manually (by importing the JSON), or you can import the existing Postgres_exporter dashboard published in the Grafana gallery by its number in the catalog:

  1. Go to your PMM server web interface and press the Grafana icon at the top left corner, then Dashboards, then Import.
  2. Copy and paste the dashboard ID from the Grafana site into the “Grafana.com Dashboard” field, and press Load.
  3. In the next dialog, choose Prometheus as the data source and continue.

PostgreSQL performance graphs can be seen at: http://localhost:8080/graph/dashboard/db/postgres_exporter?orgId=1

collect PostgreSQL metrics with Percona Monitoring and Management
PMM PostgreSQL postgres_exporter template

 

PMM-PostgreSQL Demo Under the Hood

To move this configuration to production, we need to understand how this demo works.

PMM Server

First of all, you need an existing PMM Server. You can find details on new server configuration at Deploying Percona Monitoring and Management.

In my demo I’m starting PMM without volumes, so all metrics are dropped after running the docker-compose down command. Also, there is no need to use port 8080 for PMM; in production, set it up with SSL support and a password.

PostgreSQL Setup

I’m modifying the latest default PostgreSQL image for this demo; the monitoring user and views it needs are shown below.

Of course, you can use a dedicated PostgreSQL server instead of one running inside a docker-compose sandbox. The only requirement is that the PMM server should be able to connect to this server.

User creation and permissions:

CREATE DATABASE postgres_exporter;
CREATE USER postgres_exporter PASSWORD 'password';
ALTER USER postgres_exporter SET SEARCH_PATH TO postgres_exporter,pg_catalog;
-- If deploying as non-superuser (for example in AWS RDS)
-- GRANT postgres_exporter TO :MASTER_USER;
CREATE SCHEMA postgres_exporter AUTHORIZATION postgres_exporter;
CREATE VIEW postgres_exporter.pg_stat_activity
AS
  SELECT * from pg_catalog.pg_stat_activity;
GRANT SELECT ON postgres_exporter.pg_stat_activity TO postgres_exporter;
CREATE VIEW postgres_exporter.pg_stat_replication AS
  SELECT * from pg_catalog.pg_stat_replication;
GRANT SELECT ON postgres_exporter.pg_stat_replication TO postgres_exporter;

To simplify setup, you can use a superuser account and access pg_catalog directly. To improve security, allow this user to connect only from the exporter host.
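
For example, a pg_hba.conf entry along these lines limits the monitoring account to connections from the exporter host (the address is a placeholder; adjust it to your network and reload PostgreSQL afterwards):

# TYPE  DATABASE           USER               ADDRESS        METHOD
host    postgres_exporter  postgres_exporter  192.0.2.10/32  md5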

PMM Client Setup on PostgreSQL Host

You can obtain database-only statistics with just the external exporter, and you can use any host with pmm-client installed. Fortunately, you can also export Linux metrics from the database host.

After installing the pmm-client package, you still need to configure the system: point it to the PMM server and register the external exporter (and optionally add the linux:metrics exporter).

#!/bin/sh
pmm-admin config --client-name pg1 --server pmm-server
pmm-admin add external:metrics postgresql pgexporter:9187
# optional
pmm-admin add linux:metrics
# other postgresql instances
pmm-admin add external:instances postgresql 172.18.0.3:9187

It’s important to keep the external exporter job name as “postgresql”, since all existing templates check for it. There is a bit of inconsistency here: the first PostgreSQL server is added with external:metrics, but all further servers should be added with external:instances.

The reason is that the first command creates the Prometheus job along with the first instance, while further servers can be added without creating the job.

PMM 1.7.0 external:service

Starting from PMM 1.7.0, the setup is simplified if the exporter is located on the same host as pmm-client:

pmm-admin config --client-name pg1 --server pmm-server
pmm-admin add external:service --service-port=9187 postgresql

pmm-admin add external:metrics and pmm-admin add external:instances are not required if you are running the exporter on the same host as pmm-client.

Exporter Setup

An exporter is a simple HTTP/HTTPS server that returns a single page of metrics. The format looks like this:

curl -si http://172.17.0.4:9187/metrics|grep pg_static
# HELP pg_static Version string as reported by postgres
# TYPE pg_static untyped
pg_static{short_version="10.1.0",version="PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18) 6.3.0 20170516, 64-bit"} 1

As you can see, it’s a self-describing set of counters and string values. The Prometheus time-series database built into PMM connects to this web server and stores the results on disk. There are multiple exporters available for PostgreSQL; postgres_exporter is listed as a third-party exporter on the official Prometheus website.

You can compile the exporter yourself or run it inside a Docker container. This and many other exporters are written in Go and compiled as static binaries, so you can copy the executable from any host with the same CPU architecture. For production setups, you will probably run the exporter directly on the database host and start the service with systemd.

To check for network configuration issues, log in to the PMM server and use the curl command from above. Do not forget to replace 172.17.0.4:9187 with the appropriate host:port (use the same IP address or DNS name as in the pmm-admin add command).

You configure postgres_exporter with a single environment variable:

DATA_SOURCE_NAME=postgresql://postgres_exporter:password@pg:5432/postgres_exporter?sslmode=disable

Make sure that you provide the correct credentials, including the database name.
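
For a production host, a minimal systemd unit sketch might look like the following, assuming you copied the binary to /usr/local/bin and reuse the credentials from the example above (the paths and unit name are illustrative, not part of any official package):

# /etc/systemd/system/postgres_exporter.service
[Unit]
Description=Prometheus PostgreSQL exporter
After=network.target

[Service]
Environment=DATA_SOURCE_NAME=postgresql://postgres_exporter:password@127.0.0.1:5432/postgres_exporter?sslmode=disable
ExecStart=/usr/local/bin/postgres_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then run systemctl daemon-reload and systemctl enable --now postgres_exporter to start it.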

Running the External Exporter Directly on the Database Server

In order to simplify production setup, you can run the exporter directly on the same server you use for running PostgreSQL. This method allows you to use the pmm-admin add external:service command recently added to PMM.

# Copy exporter binary from docker container to the local directory to skip build from sources
docker cp pmmpostgres_pgexporter_1:/postgres_exporter ./
# copy exporter binary to database host, use scp instead for existing database server.
docker cp postgres_exporter pmmpostgres_pg_1:/root/
# login to database server shell
docker exec -it pmmpostgres_pg_1 bash
# start exporter
DATA_SOURCE_NAME='postgresql://postgres_exporter:password@127.0.0.1:5432/postgres_exporter?sslmode=disable' ./postgres_exporter

Grafana Setup

In the demo, I’ve used the Postgres_exporter dashboard. Check the same site for other PostgreSQL dashboards if you need more. The exporter provides many parameters, and not all of them are visualized in this dashboard.

For huge installations, you may find that filtering servers by “instance name” is not convenient. Write your own JSON for the dashboard, or try the one from the demo repository. It’s the same as dashboard 3742, but it uses the hostname for filtering and the Prometheus job name in the legends.

All entries of instance=~"$instance" get replaced with instance=~"$host:.*".
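
If you prefer to script that substitution, a rough one-liner along these lines works on an exported dashboard JSON file (dashboard.json is a placeholder name; the escaping assumes the expressions are stored as JSON strings, so double-check the result before importing):

sed -i 's/instance=~\\"\$instance\\"/instance=~\\"$host:.*\\"/g' dashboard.json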

This modification lets you switch between PostgreSQL servers by host instead of “instance”, and see CPU and disk details for the current database server instead of a previously selected host.

Notice

This blog post on how to collect PostgreSQL metrics with Percona Monitoring and Management does not describe an official integration of PostgreSQL and PMM; I’ve tried to describe a more involved external exporter setup. Instead of PostgreSQL, you can use any other service and exporter with a similar setup, or even create your own exporter and instrument your application. It’s a great thing to see correlations between application activities and other system components like databases, web servers, etc.

Feb 09, 2018

This Week in Data with Colin Charles 27: Percona Live Tutorials Released and a Comprehensive Review of the FOSDEM MySQL DevRoom

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Santa Clara 2018 update: tutorials have been announced. The committee rated over 300 talks, and easily 70% of the schedule should go live next week as well. In practice, then, you should see about 50 talks announced next week. There’s been great competition: we only have 70 slots in total, so about 1 in 5 talks gets picked — talk about a competitive ratio.

FOSDEM

FOSDEM was truly awesome last week. From a Percona standpoint, we had a lot of excellent booth traffic (being outside of the PostgreSQL room on Saturday, and not too far out from the MySQL room on Sunday). We gave away bottle openers — useful in Brussels with all the beer; we tried a new design with a magnet to attach it to your fridge — stickers, some brochures, but most of all we had plenty of great conversations. There was quite a crowd from Percona, and it was excellent to see the MySQL & Friends DevRoom almost constantly full! A few of us brave souls managed to stay there the whole day, barely with any breaks, so as to enjoy all the talks.

I found the quality of talks to be extremely high, and when it comes to a community-run event, with all content picked by an independent program committee, FOSDEM really sets the bar high. There is plenty of competition to get a good talk in, and I enjoyed everything we picked (yes, I was on the committee too). We’ve had plenty of events in the ecosystem that sort of had “MySQL” or related days, but FOSDEM might be the only one that has really survived. I understand we will have a day of some sort at SCALE16x, but even that has been scaled down. So if you care about the MySQL ecosystem, you will really want to ensure that you are at FOSDEM next year.

This year, we started with the usual MySQL Day on Friday. I could not be present, as I was at the CentOS Dojo, giving a presentation. So, the highlight of Friday for me? The community dinner. Over 80 people showed up, I know there was a waiting list, and lots of people were trying to get tickets at the last minute. Many missed out too; sorry, better luck next year; and also, hopefully, we will get a larger venue going forward. I really thank the organizers for this — we affectionately refer to them as the Belconians (i.e. a handful of Perconians based in Belgium). The conversation, the food, the drink — they were all excellent. It’s good to see representation from all parts of the community: MySQL, Percona, MariaDB, Pythian, and others. So thank you again, Liz, Dimitri, Tom, and Kenny in absentia. I think Tjerk also deserves special mention for always helping (this year with the drinks).

As for FOSDEM itself, beyond the booth, I think the most interesting stuff was the talks. There are video recordings and slides of pretty much all talks, but I will also give you the “Cliff’s Notes” of them here.

MySQL DevRoom talk quick summaries

Beyond WHERE and GROUP BY – Sergei Golubchik

  • EXCEPT is in MariaDB Server 10.3
  • recursive CTEs are good for hierarchical data, graphs, data generation, Turing complete (you can use it to solve Sudoku even)
  • non-recursive CTEs can be an alternative syntax for subqueries in the FROM clause
  • Window functions:
    • Normal: one result per row, depend on that row only
    • Aggregate: one result per group, depending on the whole group
    • Window: one result per row, depending on the whole group
  • System versioned tables with AS OF
  • Aggregate stored functions

MySQL 8.0 Performance: InnoDB Re-Design – Dimitri Kravtchuk

  • Contention-Aware Transactions Scheduling (CATS), since 8.0.3. Not all transactions are equal, FIFO could not be optimal, unblock the most blocking transactions first
  • CATS (VATS) had a few issues, and there were bugs (they thought everything worked since MariaDB Server had implemented it). They spent about 9 months before fixing everything.
  • Where does CATS help? Workloads hitting row lock contentions. You can monitor via SHOW ENGINE INNODB MUTEX.
  • the main problem is because of repeatable read versus read committed transaction isolation on the same workload. You really need to understand your workload when it comes to VATS.

MySQL 8.0 Roles – Giuseppe Maxia

  • Created like a user, granted like privileges. You need to activate them to use them (see the short SQL sketch after this list).
  • Before roles, you created a user, then grant, grant, and more grant’s… Add another user? Same deal. Lots of repetitive work and a lot of chances to make mistakes.
  • Faster user administration – define a role, assign it many times. Centralized grant handling – grant and revoke privileges to roles, add/edit all user profiles.
  • You need to remember to set the default role.
  • A user can have many roles; default role can be a list of roles.
  • Roles are users without a login – roles are saved in user tables. This is useful from an account lock/unlock perspective.
  • You can grant a user to a user
  • SET ROLE is for session management; SET DEFAULT ROLE is a permanent assignment of a role for a user. SET ROLE DEFAULT means assign the default role for this user for this session
  • The role_edges table reports which roles are assigned to which users. default_roles keeps track of the current default roles assigned to users. A default role may not exist.
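
Putting those notes together, a minimal sketch of the workflow might look like this (the database, role and user names are made up):

CREATE ROLE app_read;
GRANT SELECT ON shop.* TO app_read;
CREATE USER 'alice'@'%' IDENTIFIED BY 'S3cret!';
GRANT app_read TO 'alice'@'%';
SET DEFAULT ROLE app_read TO 'alice'@'%';
-- within a session: SET ROLE app_read;  or  SET ROLE DEFAULT;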

Histogram support in MySQL 8.0 – Øystein Grøvlen

  • You can now do ANALYZE TABLE table UPDATE HISTOGRAM ON column WITH n BUCKETS; (a concrete example follows this list)
  • New storage engine API for sampling (default implementation is full table scan even when sampling)
  • Histogram is stored in a JSON column in the data dictionary. Grab this from the INFORMATION_SCHEMA.
  • Histograms are useful for columns that are not the first column of any index, and used in WHERE conditions of JOIN queries, queries with IN-subqueries, ORDER BY … LIMIT queries. Best fit: low cardinality columns (e.g. gender, orderStatus, dayOfWeek, enums), columns with uneven distribution (skew), stable distribution (do not change much over time)
  • How many buckets? equi-height, 100 buckets should be enough.
  • Histograms are stored in the data dictionary, so will persist over restarts of course.
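
To make the syntax in the first bullet concrete, here is a hypothetical example (the table and column names are made up):

ANALYZE TABLE orders UPDATE HISTOGRAM ON orderStatus WITH 100 BUCKETS;
-- the stored histogram is visible through the data dictionary:
SELECT * FROM information_schema.COLUMN_STATISTICS WHERE TABLE_NAME = 'orders';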

Let’s talk database optimizers – Vicențiu Ciorbaru

TLS for MySQL at Large Scale – Jaime Crespo

  • Literally took 3 lines in the my.cnf to turn on TLS
  • https://dbtree.wikimedia.org
  • They wanted to do a data centre failover and wanted to ensure replication would be encrypted.
  • They didn’t have proper orchestration in place (MySQL could have this too). Every time OpenSSL or MySQL had to be upgraded, the daemon needed restarting. If there was an incompatible change, you had to sync master/replicas too.
  • The automation and orchestration that Wikipedia uses: https://fosdem.org/2018/schedule/event/cumin_automation/ (it is called Cumin: https://wikitech.wikimedia.org/wiki/Cumin)
  • Server support was poor – OpenSSL – so they had to deploy wmf-mysql and wmf-mariadb of their own
  • Currently using MariaDB 10.0, and looking to migrate to MariaDB 10.1
  • Client library pain they’ve had
  • TLSv1.2 from the beginning (2015).
  • 20-50x slower for actual connecting; the impact is less than 5% for the actual query performance. Just fix client libraries, make them use persistent connections. They are now very interested in ProxySQL for this purpose.
  • https://grafana.wikimedia.org/?orgId=1
  • Monty asks, would a double certificate help? Jaime says sure. But he may not actually use double certificates; might not solve CA issues, and the goal is not to restart the server.
  • Monty wonders why not to upgrade to 10.2? “Let’s talk outside because it’s a much larger question.”

MySQL InnoDB Cluster – Miguel Araújo

  • group replication: update everywhere (multi-master), virtually synchronous replication, automatic server failover, distributed recovery, group reconfiguration, GCS (implementation of Paxos – group communication system). HA is a critical factor.
  • mysqlsh: interactive and batch operations. Document store (CRUD and relational access)
  • admin API in mysqlsh: create & manage clusters, hide complexity of configuration/provisioning/orchestration of the InnoDB clusters. Works with JavaScript and Python
  • Usability. HA out of the box.
  • It’s easy to join a new node; new node goes into recovery mode (and as long as you have all the binary logs, this is easy; otherwise start from a backup)
  • SET PERSIST – run a command remotely, and the configuration is persisted in the server
  • Network flapping? Group replication will just reject the node from the cluster if it’s flapping too often

Why we’re excited about MySQL 8 – Peter Zaitsev

  • Native data dictionary – atomic, crash safe, DDLs, no more MyISAM system table requirements
  • Fast INFORMATION_SCHEMA
  • utf8mb4 as default character set
  • Security: roles, breakdown of SUPER privileges, password history, faster cached-SHA2 authentication (default), builds using OpenSSL (like Percona Server), skip grants blocks remote connections, logs now encrypted when tablespace encryption enabled
  • Persistent AUTO_INCREMENT
  • auto-managed undo tablespaces – do not use system table space for undo space. Automatically reclaim space on disks.
  • Self-tuning, limited to InnoDB (innodb_dedicated_server to auto-tune)
  • partial in-place update for JSON – update a field in a JSON object without a full rewrite. Good for counters/statuses/timestamps. Update/removal of an element is supported
  • Invisible indexes – test impact of dropping indexes before actually dropping them. Maintained but unused by the optimizer. If not needed or used, then drop away.
  • TmpTable Storage Engine – more efficient storage engine for internal temporary tables. Efficient storage for VARCHAR and VARBINARY columns. Good for GROUP BY queries. Doesn’t support BLOB/TEXT columns yet (this reverts to InnoDB temp table now)
  • Backup locks – prevent operations which may result in inconsistent backups. CHECK INSTANCE FOR BACKUP (something Percona Server has had before)
  • Optimizer histograms – detailed statistics on columns, not just indexes
  • improved cost model for the optimizer – www.unofficialmysqlguide.com
  • Performance Schema – faster (via “fake” indexes), error instrumentation, response time histograms (global & per query), digest summaries
  • select * from sys.session – fast potential replacement for show processlist
  • RESTART (command)
  • SET PERSIST – e.g. change the buffer pool size, and this helps during a restart
  • assumes default storage is SSD now
  • binary log on by default, log_slave_updates enabled by default, and log expires after 30 days by default
  • query cache removed. Look at ProxySQL or some other caching solution
  • native partitioning only – remove partitions from MyISAM or convert to InnoDB
  • resource groups – isolation and better performance (map queries to specific CPU cores; can jail your costly queries, like analytical queries)
  • Feature Requests: better single thread performance, no parallel query support

MySQL Test Framework for Support and Bugs Work – Sveta Smirnova

  • MTR allows you to add multiple connections
  • has commands for flow control

ProxySQL – GTID Consistent Reads – René Cannaò, Nick Vyzas

  • threshold is configurable in increments of 1 second. Replication lag can be monitored with ProxySQL. Want to ensure you don’t have stale reads.
  • Why is GTID important? To guarantee consistency. Auto-positioning for restructuring topologies.
  • --session-track-gtids is an important feature which allows sending the GTID for a transaction on the OK packet. Not available in MariaDB.
  • There is a ProxySQL Binlog Reader now – GTID information about a MySQL server to all connected ProxySQL instances. Lightweight process to run on your MySQL server.
  • ProxySQL can be configured to enforce GTID consistency for reads on any hostgroup/replication hostgroup.
  • Live demo by René

Turbocharging MySQL with Vitess – Sugu Sougoumarane

  • trend for the cloud: container instances, short-lived containers, tolerate neighbors, discoverability. No good tools yet for Kubernetes.
  • non-ideal options: application sharing, NoSQL, paid solutions, NewSQL (CockroachDB, TiDB, Yugabyte)
  • Vitess: leverage MySQL at massive scale, opensource, 8+ years of work, and multiple production examples
  • Square uses Vitess for Square Cash application.
  • Can MySQL run on Docker? Absolutely, many of the companies do huge QPS on Docker.
  • YouTube does a major re-shard every 2-3 months. No one notices nowadays when that happens.
  • app server connects to vtgate, and only underneath it’s a bunch of smaller databases with vttablet + mysqld. The lockserver is what makes it run well in the cloud.
  • pluggable architecture with no compromise on performance: monitoring, health check, ACLs, tracing, more.
  • at most, it adds about 2ms overhead to connections
  • Go coding standards are enforced, unit tests with strict coverage requirements, end-to-end tests, Travis, CodeClimate and Netlify. Readability is king.
  • On February 5 2018, it will be a CNCF project. One year of due diligence. They said there was nothing to compare it with. Looked at maturity and contributors. It’s becoming a truly community-owned project! (CNCF to Host Vitess is already live as of now)
  • roadmap: full cross-shard queries, migration tools, simplify configurability, documentation.
  • full MySQL protocol, but a limited query set – they want to get it to a point where it accepts a full MySQL query.

Orchestrator on Raft – Shlomi Noach

  • Raft: guaranteed to be in-order replication log, an increasing index. This is how nodes choose a leader based on who has the higher index. Get periodic snapshots (node runs a full backup).
  • HashiCorp raft, a Golang raft implementation, used by Consul
  • orchestrator manages topology for HA topologies; also want orchestrator to be highly available. Now with orchestrator/raft, remove the MySQL backend dependency, and you can have data center fencing too. Now you get: better cross-DC deploys, DC-local KV control, and also Kubernetes friendly.
  • n-orchestrator nodes, each node still runs its own backend (either MySQL or SQLite). Orchestrator provides the communication for SQLite between the nodes. Only one (the Raft leader) will handle failovers
  • implementation & deployment @ Github – one node per DC (deployed at 3 different DCs). 1-second raft polling interval. 2 major DCs, one in the cloud. Step-down, raft-yield, SQLite-backed log store, and still a MySQL backend (SQLite backend use case is in the works)
  • They patched the HashiCorp raft library. The library doesn’t care about the identity of nodes, with Github they do want to control the identity of the leader. There is an “active” data center, and locality is important. This is what they mean by raft-yield (picking a candidate leader).
  • The ability for a leader to step down is also something they had to patch.
  • HashiCorp Raft only supports LMDB and another database, so the replication log is now kept in a relational SQLite backed log store. Another patch.
  • once orchestrator can’t run its own self-health check, it recognizes this. The application can tell raft now that it’s stepping down. Takes 5 seconds to step down, and raft then promotes another orchestrator node to be the leader. This is their patch.
  • can also grab leadership
  • DC fencing handles network partitioning.
  • orchestrator is Consul-aware. Upon failover, orchestrator updates Consul KV with the identity of the promoted master.
  • considerations to watch out for: what happens if, upon replay of the Raft log, you hit two failovers for the same cluster? NOW() and otherwise time-based assumptions. Reapplying snapshot/log upon startup
  • roadmap: use Kubernetes (cluster IP based configuration in progress, already container friendly via auto-re-provisioning of nodes via Raft)

MyRocks Roadmaps – Yoshinori Matsunobu

  • Facebook has a large User Database (UDB). Social graph, massively sharded, low latency, automated operations, pure flash storage (constrained by space, not CPU/IOPS)
  • They have a record cache in-front of MySQL – Tao for reads. If cache misses, then it hits the database. And all write requests go thru MySQL. UDB has to be fast to ensure a good user experience.
  • they also at Facebook run 2 instances of MySQL on the same machine, because CPU wasn’t huge, but the space savings were awesome.
  • design decisions: clustered index (same as InnoDB), slower for reads, faster for writes (bloom filters, column family), support for transactions including consistency between binlog and MyRocks. Faster data loading/deletes/replication, dynamic options (instead of having to restart mysqld), TTL (comparable to HBase TTL feature, specify the TTL, any data older than time, can be removed), online logical (for recovery purposes) & binary backup (for creating replicas)
  • Pros: smaller space, better cache hit rate, writes are faster so you get faster replication, much smaller bytes written
  • Cons: no statement based replication, GAP locks, foreign keys, full-text index, spatial index support. Need to use case sensitive collations for performance. Reads are slower, especially if the data fits in memory. Dependent on file system and OS; lack of solid direct I/O (uses buffered I/O). You need a newer than 4.6 kernel. Too many tuning options beyond buffer pool such as bloom filter, compactions, etc.
  • https://twitter.com/deniszh/status/960163082642382849
  • Completed InnoDB to MyRocks migration. Saved 50% space in UDB compared to compressed InnoDB.
  • Roadmaps: getting in MariaDB and Percona Server for MySQL. Read Mark’s blog for matching read performance vs InnoDB. Supporting mixed engines. Better replication and bigger instance sizes.
  • mixed engines: InnoDB and MyRocks on the same instance, though a single transaction does not overlap engines. Plan to extend xtrabackup to integrate myrocks_hotbackup. Backport gtid_pos_auto_engines from MariaDB?
  • Removing engine log. Could be caused by binlog and engine log, which requires 2pc and ordered commits. Use one log? Either binlog or binlog like service or RocksDB WAL? Rely on binlog now (semi-sync, binlog consumers), need to determine how much performance is gained by stopping writing to WAL.
  • Parallel replication apply is important in MySQL 8
  • support bigger instance sizes: shared nothing database is not a general purpose database. Today you can get 256GB+ RAM and 10TB+ flash on commodity servers. Why not run one big instance and put everything there? Bigger instances may help general purpose small-mid applications. Then you don’t have to worry about sharing. Atomic transactions, joins and secondary keys will just work. Amazon Aurora today supports a 60TB instance!
  • today: you can start deploying slaves with consistency check. Many status counters for instance monitoring.

ProxySQL internals – René Cannaò

  • reduce latency, scale, maximize throughput. A single instance can handle hundreds of thousands of connections and thousands of backend servers.
  • threading models: one thread per connection (blocking I/O), thread pooling (non-blocking I/O, scalable).
  • ProxySQL thread pool implementation: known as “MySQL threads”, fixed number of worker threads (configurable), all threads listen on the same port(s), client connections are not shared between threads, all threads perform their own network I/O, and it uses poll() (does that scale? True, but there is a reason why poll over epoll)
  • threads never share client connections – no need for synchronization, thread contention is reduced, each thread calls poll(). Possibly imbalanced load as a con (one thread may have way more connections than another). Is it really a problem? Most of the time, no, connections will automatically balance.
  • poll() is O(N), epoll() is O(1). poll() is faster than epoll() for fewer connections (around 1000). Performance degrades when there are a lot of connections, so by default it uses poll() instead of epoll(); at around 50,000 connections performance degrades badly – so ProxySQL has auxiliary threads.
  • MySQL_Session() is implemented as a state machine. Stores metadata associated with the client session (running timers, default hostgroup, etc.)

MySQL Point-in-time recovery like a rockstar – Frederic Descamps

Releases

  • Percona Monitoring and Management 1.7.0 (PMM) – This release features improved support for external services, which enables a PMM Server to store and display metrics for any available Prometheus exporter. For example, you could deploy the postgres_exporter and use PMM’s external services feature to store PostgreSQL metrics in PMM. Immediately, you’ll see these new metrics in the Advanced Data Exploration dashboard. Then you could leverage many of the pre-developed PostgreSQL dashboards available on Grafana.com, and with a minimal amount of edits have a working PostgreSQL dashboard in PMM!
  • MariaDB Server 10.1.31 – usual updates to storage engines, and a handful of bug fixes.

Link List

Upcoming appearances

  • SCALE16x – Pasadena, California, USA – March 8-11 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Feb 06, 2018

Announcing Experimental Percona Monitoring and Management (PMM) Functionality via Percona Labs

Experimental Percona Monitoring and Management

In this blog post, we’ll introduce how you can look at some experimental Percona Monitoring and Management (PMM) features using Percona Labs builds on GitHub.

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. While not covered by Percona support or services agreements, these handy utilities can help you save time and effort.

Percona software builds located in the PerconaLabs and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements. 

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. You can run PMM in your environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

This month we’re announcing access to Percona Labs builds of Percona Monitoring and Management so that you can experiment with new functionality that’s not yet in our mainline product. You can identify the unique builds at:

https://hub.docker.com/r/perconalab/pmm-server/tags/

Most of the entries here are the pre-release candidate images we use for QA, and they follow a format of all integers (for example “201802061627”). You’re fine to use these images, but they aren’t the ones that have the experimental functionality.

Today we have two builds of note (these DO have the experimental functionality):

  • 1.6.0-prom2.1
  • 1.5.3-prometheus2

We’re highlighting Prometheus 2.1 on top of our January 1.6 release (1.6.0-prom2.1), available in Docker format. Some of the reasons you might want to deploy this experimental build to take advantage of the Prometheus 2 benefits are:

  • Reduced CPU usage by Prometheus, meaning you can add more hosts to your PMM Server
  • Performance improvements, meaning dashboards load faster
  • Reduced disk I/O, disk space usage

Please keep in mind that this is a Percona Labs build (see our note above), so note the following two points:

  • Support is available from our Percona Monitoring and Management Forums
  • Upgrades might not work – don’t count on upgrading out of this version to a newer release (although it’s not guaranteed to block upgrades)

How to Deploy an Experimental Build from Percona Labs

The great news is that you can follow our Deployment Instructions for Docker, and the only change is where you specify a different Docker container to pull. For example, the standard way to deploy the latest stable PMM Server release with Docker is:

docker pull percona/pmm-server:latest

To use the Percona Labs build 1.6.0-prom2.1 with Prometheus 2.1, execute the following:

docker pull perconalab/pmm-server:1.6.0-prom2.1

Please share your feedback on this build on our Percona Monitoring and Management Forums.

If you’re looking to deploy Percona’s officially released PMM Server (not the Percona Labs release, but our mainline version which currently is release 1.7) into a production environment, I encourage you to consider a Percona Support contract, which includes PMM at no additional charge!

Jan 31, 2018

Percona Monitoring and Management 1.7.0 (PMM) Is Now Available


Percona announces the release of Percona Monitoring and Management 1.7.0. Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL and MongoDB performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

This release features improved support for external services, which enables a PMM Server to store and display metrics for any available Prometheus exporter. For example, you could deploy the postgres_exporter and use PMM’s external services feature to store PostgreSQL metrics in PMM. Immediately, you’ll see these new metrics in the Advanced Data Exploration dashboard. Then you could leverage many of the pre-developed PostgreSQL dashboards available on Grafana.com, and with a minimal amount of edits have a working PostgreSQL dashboard in PMM! Watch for an upcoming blog post to demonstrate a walk-through of this unlocked functionality.

New Percona Monitoring and Management 1.7.0 Features

  • PMM-1949: New dashboard: MySQL Amazon Aurora Metrics.
    Percona Monitoring and Management 1.7.0

Improvements

  • PMM-1712: Improve external exporters to let you easily add data monitoring from an arbitrary Prometheus exporter you have running on your host.
  • PMM-1510: Rename swap in and swap out labels to be more specific and help clearly see the direction of data flow for Swap In and Swap Out. The new labels are Swap In (Reads) and Swap Out (Writes) accordingly.
  • PMM-1966: Remove Grafana from a list of exporters on the dashboard to eliminate confusion with existing Grafana in the list of exporters on the current version of the dashboard.
  • PMM-1974: Add the mongodb_up graph to the Exporter Status dashboard. The new graph is added to maintain consistency of information about exporters. This is done based on new metrics implemented in PMM-1586.

Bug fixes

  • PMM-1967: Inconsistent formulas in Prometheus dashboards.
  • PMM-1986: Signing out with HTTP auth enabled leaves the browser signed in.

Jan 19, 2018

Percona Monitoring and Management (PMM) 1.6.0 Is Now Available

Percona Monitoring and Management

Percona announces the release of Percona Monitoring and Management (PMM) 1.6.0. In this release, PMM Grafana-specific metrics are available in the Advanced Data Exploration dashboard. We’ve improved the integration with MyRocks, and its data is now collected from SHOW GLOBAL STATUS.

The MongoDB Exporter now features two new metrics: mongodb_up to inform if the MongoDB Server is running and mongodb_scrape_errors_total reporting the total number of errors when scraping MongoDB.

In this release, we’ve greatly improved the performance of the mongodb:metrics monitoring service.

Percona Monitoring and Management (PMM) 1.6.0 also includes version 4.6.3 of Grafana, which contains fixes to bugs in the alert list and the alerting rules. More information is available in Grafana’s change log.

New Features

  • PMM-1773: PMM Grafana specific metrics have been added to the Advanced Data Exploration dashboard.

Improvements

  • PMM-1485: Updated MyRocks integration: MyRocks data is now collected entirely from SHOW GLOBAL STATUS, and we have eliminated SHOW ENGINE ROCKSDB STATUS as a data source in mysqld_exporter.
  • PMM-1895: Update Grafana to version 4.6.3:
    • Alert list: Now shows alert state changes even after adding manual annotations on dashboard #9951
    • Alerting: Fixes bug where rules evaluated as firing when all conditions were false and using OR operator. #9318
  • PMM-1586: The mongodb_exporter exporter exposes two new metrics: mongodb_up informing if the MongoDB Server is running and mongodb_scrape_errors_total informing the total number of times an error occurred when scraping MongoDB.
  • PMM-1764: Various small mongodb_exporter improvements.
  • PMM-1942: Improved the consistency of using labels in all Prometheus related dashboards.
  • PMM-1936: Updated the Prometheus dashboard in Metrics Monitor.
  • PMM-1937: Added the CPU Utilization Details (Cores) dashboard to Metrics Monitor.

Bug fixes

  • PMM-1549: Broken default auth db for mongodb:queries
  • PMM-1631: In some cases, percentage values were displayed incorrectly for MongoDB hosts.
  • PMM-1640: RDS exporter: simplify configuration
  • PMM-1760: After the mongodb:metrics monitoring service was added, the usage of CPU considerably increased in QAN versions 1.4.1 through 1.5.3.

    1.5.0 – CPU usage 95%
    1.5.3 – CPU usage 85%
    1.6.0 – CPU usage 1%

  • PMM-1815: QAN could show data for a MySQL host when a MongoDB host was selected.
  • PMM-1888: In QAN, query metrics were not loaded when the QAN page was refreshed.
  • PMM-1898: In QAN, the Per Query Stats graph displayed incorrect values for MongoDB.
  • PMM-1796: In Metrics Monitor, the Top Process States Hourly graph from the MySQL Overview dashboard showed incorrect data.
  • PMM-1777: In QAN, the Load column could display incorrect data.
  • PMM-1744: The error “Please provide AWS access credentials” appeared although the provided credentials could be processed successfully.
  • PMM-1676: In preparation for migration to Prometheus 2.0 we have updated the System Overview dashboard for compatibility.
  • PMM-1920: Some standard MySQL metrics were missing from the mysqld_exporter  Prometheus exporter.
  • PMM-1932: The Response Length metric was not displayed for MongoDB hosts in QAN.

Jan 17, 2018

Troubleshooting Percona Monitoring and Management (PMM) Metrics

In this blog post, I’ll look at some helpful tips on troubleshooting Percona Monitoring and Management metrics.

With any luck, Percona Monitoring and Management (PMM) works for you out of the box. Sometimes, however, things go awry and you see empty or broken graphs instead of dashboards full of insights.

Troubleshooting Percona Monitoring and Management Metrics 1

Before we go through troubleshooting steps, let’s talk about how data makes it to the Grafana dashboards in the first place. The PMM Architecture documentation page helps explain it:

Troubleshooting Percona Monitoring and Management Metrics 2

If we focus just on the “Metrics” path, we see the following requirements:

  • The appropriate “exporters” (Part of PMM Client) are running on the hosts you’re monitoring
  • The database is configured to expose all the metrics you’re looking for
  • The hosts are correctly configured in the repository on PMM Server side (stored in Consul)
  • Prometheus on the PMM Server side can scrape them successfully – meaning it can reach them successfully, does not encounter any timeouts and has enough resources to ingest all the provided data
  • The exporters can retrieve metrics that they requested (i.e., there are no permissions problems)
  • Grafana can retrieve the metrics stored in Prometheus Server and display them

Now that we understand the basic requirements let’s look at troubleshooting the tool.

PMM Client

First, you need to check if the services are actually configured properly and running:

root@rocky:/mnt/data# pmm-admin list
pmm-admin 1.5.2
PMM Server      | 10.11.13.140
Client Name     | rocky
Client Address  | 10.11.13.141
Service Manager | linux-systemd
-------------- ------ ----------- -------- ------------------------------------------- ------------------------------------------
SERVICE TYPE   NAME   LOCAL PORT  RUNNING  DATA SOURCE                                 OPTIONS
-------------- ------ ----------- -------- ------------------------------------------- ------------------------------------------
mysql:queries  rocky  -           YES      root:***@unix(/var/run/mysqld/mysqld.sock)  query_source=slowlog, query_examples=true
linux:metrics  rocky  42000       YES      -
mysql:metrics  rocky  42002       YES      root:***@unix(/var/run/mysqld/mysqld.sock)

Second, you can also instruct the PMM client to perform basic network checks. These can spot connectivity problems, time drift and other issues:

root@rocky:/mnt/data# pmm-admin check-network
PMM Network Status
Server Address | 10.11.13.140
Client Address | 10.11.13.141
* System Time
NTP Server (0.pool.ntp.org)         | 2018-01-06 09:10:33 -0500 EST
PMM Server                          | 2018-01-06 14:10:33 +0000 GMT
PMM Client                          | 2018-01-06 09:10:33 -0500 EST
PMM Server Time Drift               | OK
PMM Client Time Drift               | OK
PMM Client to PMM Server Time Drift | OK
* Connection: Client --> Server
-------------------- -------
SERVER SERVICE       STATUS
-------------------- -------
Consul API           OK
Prometheus API       OK
Query Analytics API  OK
Connection duration | 355.085µs
Request duration    | 938.121µs
Full round trip     | 1.293206ms
* Connection: Client <-- Server
-------------- ------ ------------------- ------- ---------- ---------
SERVICE TYPE   NAME   REMOTE ENDPOINT     STATUS  HTTPS/TLS  PASSWORD
-------------- ------ ------------------- ------- ---------- ---------
linux:metrics  rocky  10.11.13.141:42000  OK      YES        -
mysql:metrics  rocky  10.11.13.141:42002  OK      YES        -

If everything is working, next we can check if exporters are providing the expected data directly.

Checking Prometheus Exporters

Looking at the output from pmm-admin check-network, we can see the “REMOTE ENDPOINT”. This shows the exporter address, which you can use to access it directly in your browser:

Troubleshooting Percona Monitoring and Management Metrics 3

You can see MySQL Exporter has different sets of metrics for high, medium and low resolution, and you can click on them to see the provided metrics:

Troubleshooting Percona Monitoring and Management Metrics 4
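
You can also query an exporter from the command line. The sketch below assumes the PMM 1.x layout, where the MySQL exporter serves high, medium and low resolution metrics on the /metrics-hr, /metrics-mr and /metrics-lr paths over self-signed TLS; if you protected the exporters with a password, add curl’s --user option (the HTTP user is typically pmm):

root@rocky:/mnt/data# curl -sk https://10.11.13.141:42002/metrics-hr | head -20
root@rocky:/mnt/data# curl -sk https://10.11.13.141:42002/metrics-hr | grep -c mysql_global_status_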

There are a few possible problems you may encounter at this stage:

  • You do not see the metrics you expect to see. This could be a configuration issue on the database side (docs for MySQL and MongoDB), permission errors, or the exporter not being correctly configured to expose the needed metrics.
  • The page takes too long to load. This could mean the data capture is too expensive for your configuration. For example, if you have a million tables, you probably can’t afford to capture per-table data.

mysql_exporter_collector_duration_seconds is a great metric that allows you to see which collectors are enabled for different resolutions, and how much time it takes for a given collector to execute. This way you can find and potentially disable collectors that are too expensive for your environment.
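
If you prefer the command line to building a graph, you can run the same kind of query against the Prometheus API that PMM Server exposes under the /prometheus path. A sketch (adjust the server address, and add HTTP credentials if your PMM Server is password protected):

root@rocky:/mnt/data# curl -sG http://10.11.13.140/prometheus/api/v1/query \
  --data-urlencode 'query=topk(10, mysql_exporter_collector_duration_seconds)'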

Let’s look at some more advanced ways to troubleshoot exporters.  

Looking at the Process List

root@rocky:/mnt/data# ps aux | grep mysqld_exporter
root      1697  0.0  0.0   4508   848 ?        Ss    2017   0:00 /bin/sh -c
/usr/local/percona/pmm-client/mysqld_exporter -collect.auto_increment.columns=true
-collect.binlog_size=true -collect.global_status=true -collect.global_variables=true
-collect.info_schema.innodb_metrics=true -collect.info_schema.processlist=true
-collect.info_schema.query_response_time=true -collect.info_schema.tables=true
-collect.info_schema.tablestats=true -collect.info_schema.userstats=true
-collect.perf_schema.eventswaits=true -collect.perf_schema.file_events=true
-collect.perf_schema.indexiowaits=true -collect.perf_schema.tableiowaits=true
-collect.perf_schema.tablelocks=true -collect.slave_status=true
-web.listen-address=10.11.13.141:42002 -web.auth-file=/usr/local/percona/pmm-client/pmm.yml
-web.ssl-cert-file=/usr/local/percona/pmm-client/server.crt
-web.ssl-key-file=/usr/local/percona/pmm-client/server.key >>
/var/log/pmm-mysql-metrics-42002.log 2>&1

This shows us that the exporter is running, as well as the specific command-line options used to start it (for example, which collectors were enabled).
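
It can also be useful to confirm that the exporter is actually listening on the port reported by pmm-admin list, for example with ss (or netstat on older systems):

root@rocky:/mnt/data# ss -ltnp | grep 42002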

Checking the Log File

root@rocky:/mnt/data# tail /var/log/pmm-mysql-metrics-42002.log
time="2018-01-05T18:19:10-05:00" level=error msg="Error pinging mysqld: dial unix
/var/run/mysqld/mysqld.sock: connect: no such file or directory" source="mysqld_exporter.go:442"
time="2018-01-05T18:19:11-05:00" level=error msg="Error pinging mysqld: dial unix
/var/run/mysqld/mysqld.sock: connect: no such file or directory" source="mysqld_exporter.go:442"
time="2018-01-05T18:19:12-05:00" level=error msg="Error pinging mysqld: dial unix
/var/run/mysqld/mysqld.sock: connect: no such file or directory" source="mysqld_exporter.go:492"
time="2018-01-05T18:19:12-05:00" level=error msg="Error pinging mysqld: dial unix
/var/run/mysqld/mysqld.sock: connect: no such file or directory" source="mysqld_exporter.go:442"
time="2018-01-05T18:19:12-05:00" level=error msg="Error pinging mysqld: dial unix
/var/run/mysqld/mysqld.sock: connect: no such file or directory" source="mysqld_exporter.go:616"
time="2018-01-05T18:19:13-05:00" level=error msg="Error pinging mysqld: dial unix
/var/run/mysqld/mysqld.sock: connect: no such file or directory" source="mysqld_exporter.go:442"
time="2018-01-05T18:19:14-05:00" level=error msg="Error pinging mysqld: dial unix
/var/run/mysqld/mysqld.sock: connect: no such file or directory" source="mysqld_exporter.go:442"
time="2018-01-05T18:19:15-05:00" level=error msg="Error pinging mysqld: dial unix
/var/run/mysqld/mysqld.sock: connect: no such file or directory" source="mysqld_exporter.go:442"
time="2018-01-05T18:19:16-05:00" level=error msg="Error pinging mysqld: dial unix
/var/run/mysqld/mysqld.sock: connect: no such file or directory" source="mysqld_exporter.go:442"
2018/01/06 09:10:33 http: TLS handshake error from 10.11.13.141:56154: tls: first record does not look like a TLS handshake

If you have problems such as authentication or permission errors, you will see them in the log file. In the example above, we can see the exporter reporting many connection errors (the MySQL Server was down).
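
When the log is long, filtering for errors (or following it while you reproduce the problem) makes the relevant lines easier to spot:

root@rocky:/mnt/data# grep -c "level=error" /var/log/pmm-mysql-metrics-42002.log
root@rocky:/mnt/data# tail -f /var/log/pmm-mysql-metrics-42002.log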

Prometheus Server

Next, we can take a look at the Prometheus Server. It is exposed in PMM Server at the /prometheus path. We can go to Status->Targets to see which targets are configured and whether they are working correctly:

Troubleshooting Percona Monitoring and Management Metrics 5

In this example, some hosts are scraped successfully while others are not. As you can see, I have some hosts that are down, so scraping fails with “no route to host”. You might also see problems caused by firewall configuration and other issues.
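
The same target health information is available from the Prometheus API, which is handy when scripting checks; the “health” and “lastError” fields show what the Targets page displays. A sketch (add HTTP credentials if your PMM Server requires them; python -m json.tool is used only to pretty-print the response):

root@rocky:/mnt/data# curl -s http://10.11.13.140/prometheus/api/v1/targets | python -m json.tool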

The next area to check, especially if you have gaps in your graphs, is whether your Prometheus server has enough resources to ingest all the data reported in your environment. Percona Monitoring and Management ships with a Prometheus dashboard to help answer this question (see demo).

There is a lot of information in this dashboard, but one of the most important areas you should check is if there is enough CPU available for Prometheus:

Troubleshooting Percona Monitoring and Management Metrics 6

The most typical problem with Prometheus is getting into “Rushed Mode” and dropping some of the metrics data:

Troubleshooting Percona Monitoring and Management Metrics 7

Another issue is not allocating enough memory to buffer metrics, which is shown as “Configured Target Storage Heap Size” on the graph:

Troubleshooting Percona Monitoring and Management Metrics 8

Values of around 40% of total memory size often make sense. The PMM FAQ details how to tune this setting.
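
With the Docker-based PMM Server, this is typically done by re-creating the container with the METRICS_MEMORY environment variable (the value is in kilobytes in PMM 1.x; verify against the FAQ for your version). A sketch, assuming the standard pmm-server and pmm-data container names from the installation docs and giving Prometheus roughly 3GB:

root@rocky:/mnt/data# docker stop pmm-server && docker rm pmm-server
root@rocky:/mnt/data# docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server \
  -e METRICS_MEMORY=3145728 --restart always percona/pmm-server:1.5.2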

If the amount of memory is already configured correctly, you can explore upgrading to a more powerful instance size or reducing the number of metrics Prometheus ingests. This can be done either by adjusting the metrics resolution (as explained in the FAQ) or by disabling some of the collectors (see the manual).
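
Per-table statistics are often the most expensive part of MySQL metrics collection on servers with many tables. As a sketch of disabling them on the client side (the --disable-tablestats flag is from pmm-admin 1.x; check pmm-admin add mysql --help on your version, and supply MySQL credentials the same way you did when first adding the service):

root@rocky:/mnt/data# pmm-admin rm mysql:metrics
root@rocky:/mnt/data# pmm-admin add mysql:metrics --disable-tablestats

The metrics resolution itself is changed on the PMM Server side, via the METRICS_RESOLUTION environment variable in PMM 1.x, in the same way as METRICS_MEMORY above.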

You might wonder which collectors generate the most data. This information is available on the same Prometheus dashboard:

Troubleshooting Percona Monitoring and Management Metrics 9

While these aren’t exact values, they correlate very well with the load the collectors generate. In this case, for example, we can see that the Performance Schema is responsible for a large number of time series. As such, disabling its collectors can reduce the Prometheus load substantially.

Hopefully, these troubleshooting steps were helpful to you in diagnosing PMM’s metrics capture. In a later blog post, I will write about how to diagnose problems with Query Analytics (Demo).

Jan
16
2018
--

Webinar January 18, 2018: MySQL Troubleshooting and Performance Optimization with Percona Monitoring and Management (PMM) Part 2

Percona Monitoring and Management

Join Percona’s Product Manager Michael Coburn as he presents MySQL Troubleshooting and Performance Optimization with Percona Monitoring and Management (PMM) Part 2 on Thursday, January 18, 2018, at 11:00 am PST / 2:00 pm EST (UTC-8).

Tags: Percona Monitoring and Management, PMM, Monitoring, MySQL, Performance, Optimization, DBA, SysAdmin, DevOps
Experience Level: Expert

Optimizing MySQL performance and troubleshooting MySQL problems are two of the most critical and challenging tasks for MySQL DBAs. The databases powering your applications need to handle heavy traffic loads while remaining responsive and stable, so that you can deliver an excellent user experience. Furthermore, DBAs are also expected to find cost-efficient ways of solving these issues.

In this webinar, the second part of a two-part series, Michael discusses how you can optimize and troubleshoot MySQL performance, and demonstrates how Percona Monitoring and Management (PMM) enables you to solve these challenges using free and open source software. We will look at specific, common MySQL problems and review the essential components in PMM that allow you to diagnose and resolve them.

By the end of this webinar, you will have a better understanding of how you can troubleshoot MySQL problems in your database.

Register for the webinar now.

Michael Coburn, Product Manager

Michael joined Percona as a Consultant in 2012 and progressed through various roles, including Managing Consultant, Principal Architect, Technical Account Manager, and Technical Support Engineer. He is now the Product Manager for Percona Monitoring and Management.
