Dec
01
2017
--

Percona Monitoring and Management 1.5: QAN in Grafana Interface

Percona-Monitoring-and-Management-1.5-QAN-1 small

In this post, we’ll examine how we’ve improved the GUI layout for Percona Monitoring and Management 1.5 by moving the Query Analytics (QAN) functions into the Grafana interface.

For Percona Monitoring and Management users, you might notice that QAN appears a little differently in our 1.5 release. We’ve taken steps to unify the PMM interface so that it feels more natural to move from reviewing historical trends in Metrics Monitor to examining slow queries in QAN.  Most significantly:

  1. QAN moves from a stand-alone application into Metrics Monitor as a dashboard application
  2. We updated the color scheme of QAN to match Metrics Monitor (but you can toggle a button if you prefer to still see QAN in white!)
  3. Date picker and host selector now use the same methods as Metrics Monitor

Percona Monitoring and Management 1.5 QAN 1

Starting from the PMM landing page, you still see two buttons – one for Metrics Monitor and another for Query Analytics (this hasn’t changed):

Percona Monitoring and Management 1.5 QAN 2

Once you select Query Analytics on the left, you see the new Metrics Monitor dashboard page for PMM Query Analytics. It is now hosted as a Metrics Monitor dashboard, and notice the URL is no longer /qan:

Percona Monitoring and Management 1.5 QAN 3

Another advantage of the Metrics Monitor dashboard integration is that the QAN inherits the host selector from Grafana, which supports partial string matching. This makes it simpler to find the host you’re searching for if you have more than a handful of instances:

Percona Monitoring and Management 1.5 QAN 4

The last feature enhancement worth mentioning is the native Grafana time selector, which lets you select down to the minute resolution time frames. This was a frequent source of feature requests — previously PMM limited you to our pre-defined default ranges. Keep in mind that QAN has an internal archiving job that caps QAN history at eight days.

Percona Monitoring and Management 1.5 QAN 5

Last but not least is the ability to toggle between the default dark interface and the optional white. Look for the small lightbulb icon at the bottom left of any QAN screen (Percona Monitoring and Management 1.5 QAN 6) and give it a try!

Percona Monitoring and Management 1.5 QAN 7

We hope you enjoy the new interface, and we look forward to your feedback on these improvements!

Sep
14
2016
--

Webinar Thursday Sept. 15: Identifying and Solving Database Performance Issues with PMM

PMM

PMMPlease join Roman Vynar, Lead Platform Engineer on Thursday, September 15, 2016 at 10 am PDT (UTC-7) for a webinar on Identifying and Solving Database Performance Issues with PMM.

Database performance is the key to high-performance applications. Gaining visibility into the database is the key to improving database performance. Percona’s Monitoring and Management (PMM) provides the insight you need into your database environment.

In this webinar, we will demonstrate how using PMM for query analytics, in combination with database and host performance metrics, can more efficiently drive tuning, issue management and application development. Using PMM can result in faster resolution times, more focused development and a more efficient IT team.

Register for the webinar here.

register-now

PMMRoman Vynar, Lead Platform Engineer
Roman is a Lead Platform Engineer at Percona. He joined the company to establish and develop the Remote DBA service from scratch. Over time, the growing service successfully expanded to Managed Services. Roman develops the monitoring tools, automated scripts, backup solution, notification and incident tracking web system and currently leading Percona Monitoring and Management project.
Feb
26
2016
--

Monitoring MongoDB Response Time

Monitoring MongoDB response timeIn this blog post, we’ll discuss how using Prometheus can help with monitoring MongoDB response time. I am currently comparing the performance of different storage engines on Percona Server for MongoDB, using a slightly customized version of Tim Callaghan’s sysbench-mongodb. Since I’m interested in measuring response time for database operations, I created a very simple exporter of response time data for Prometheus.

My first approach to measuring MongoDB response time was inspired by Ignacio Nin’s work on tcprstat, some years ago – and by the way the VividCortex query agent works (which is not surprising, since, to the best of my knowledge, Baron inspired tcprstat in the first place).

With this in mind, I created mongo-response-time, which performs the only function of printing to stdout the response time of every mongodb query seen on the wire, along with a timestamp up to the second. My thanks go to the programmers of Facebook’s Go projects, as their code helped me hit the ground running.

As a first approach this was useful enough for my needs, and here is an example of a basic graph created from data generated by it: Monitoring MongoDB response time

I had to use a log scale as otherwise the graph was just a thick bar near the bottom, and a few outliers above. This is already useful, but it does not scale well. As an example, a sysbench-mongodb run of about an hour produced a csv file with a little over eight million data points. Larger rounds (like 24 hours) are just too difficult to handle with R (in one case, even though I had enough memory in my box to hold more than three copies of the file, read.csv aborted after running out of memory – if this happens to you, I suggest the use of RMySQL instead, which seems more memory-efficient than read.csv for ingesting large amounts of data).

For a second approach, I decided to live with less fidelity and settled for some quantiles and a max. For this, I created a simple Prometheus exporter that exposes 0.5, 0.9 and 0.99 quantiles, and also the max response time for every five second period.

With it, I was able to visualize the MongoDB response time data in Grafana in a way that is affordable and good enough for my needs, as can be seen in the following graphs: Monitoring MongoDB response time

The quantiles are calculated on the client side, using the Summary type from Prometheus’ Go client. The exporter also provides the same quantiles, but through a Histogram, which has the advantage of being more lightweight on clients. I decided to use the Summary as a source for this graph as the impact on the machine seems negligible for now, and I do find its data a bit more reliable (if I compare to calculating quantiles per periods analyzing the full data set in R). You can see how the max (a Gauge, glad you asked!) is useful to have, as it lets you find out about outliers that even the .99 quantile misses (which is expected, by the way).

If you want to try this out, you can find darwin and linux binaries here, and if you hit any problems, please reply here or email me directly at fernando (dot) ipar at Percona’s domain.

Happy monitoring!

Jan
21
2016
--

Tired of MySQL Making You Wait? Webinar: Questions and Answers

MySQLWe’d like to thank everybody for joining us on January 7th for our “Tired of MySQL Making You Wait?” webinar with Percona’s Alexander Rubin, Principal Consultant and SolarWinds’ Janis Griffin, Database Evangelist.

Too often developers and DBAs struggle to pinpoint the root cause of performance issues and then spend too much time in trying to fix them. In the webinar, we discussed how you can significantly increase the performance of your applications while also reducing database response time.

You can find an archived version of the webinar here.

Below are the questions that were asked during the webinar, with responses from Alex and Janis. If you need further clarification, please respond in the comments.

Thanks again, and we look forward to you joining us at our next webinar (with Percona CEO Peter Zaitsev), Compression In Open Source Databases!

 

Q: Are there special tuning tips for Galera Cluster?

A: Since Galera Cluster (Percona XtraDB Cluster) is based on MySQL, all query tuning tips will apply as well. There are a number of Galera Cluster configuration tips available: for example the blog post at this link talks about tuning the PXC for maximum write throughput: https://www.percona.com/blog/2015/06/03/optimizing-percona-xtradb-cluster-write-hotspots/

 

Q: Does DPA support Galera Cluster ?

A: Yes, DPA has the ability to group the cluster together to see load balancing, top 15 SQLs across the cluster, plus the top wait states.

 

Q: Can I create a covered index when I have “group by” and “order by” instructions together?

A: Yes, you can create a covered index and MySQL will use it to satisfy the query (you will see “using index”). If you have “group by” and “order by” on a different columns, however, MySQL will still have to perform a filesort and create a temp table. To create this index, specify all the following fields in your query in the index:

  1. All fields in the “where” condition
  2. The “group by” fields
  3. The “order by” fields
  4. The fields that the query is selecting.

Please note the limitations of such approach described here: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html

 

Q: Can we use DPA with Azure MySQL?

A: Yes, DPA will monitor, tune and analyze the SQL server performance running on Microsoft Azure.

 

Q: Do you know if MariaDB has or is planning to follow with these virtual fields and/or SYS schema enhancements from MySQL 5.7?

A: MariaDB has had virtual or computed columns since version 5.2. I don’t believe MariaDB comes with the sys schema already installed, but you can download and install it.

 

Q: Does DPA support PostgreSQL? If not, is it in the roadmap?

A: Currently, DPA does not support PostgresSQL. However, we continually re-evaluate it with each new release.

 

Q: Does DPA support RDS instances?

A: Yes, DPA supports the monitoring of RDS instances.

 

Q: Does the performance schema show any information about how the load data is performing?

A: MySQL 5.5 performance_schema became available in 5.5.3 and has only 11 tables. Most of the tables deal with wait events and file information. In addition, you would need turn on the consumers and enable the instrumentation of the wait events. Once you’ve done that, you will be able to see the threads and what they are waiting on.

 

Q: I didn’t understand the reasoning that leads to the index on ORDER BY. I can’t link it to the previous slide query.

A: I assume this question is about the ORDER BY + LIMIT optimization. When you create an index on the ORDER BY field only, MySQL can start reading the whole table in the order of the index. As the index is sorted, it can start fetching the rows and filter out the rows that don’t match the ORDER BY condition. As there is a LIMIT N on the query, MySQL will stop after fetching N rows.

 

Q: How can I analyze parts of a stored procedure that runs nightly to see where by bottlenecks are? It has 100+ update queries that it performs every night to build a table with one million plus rows.

A: You can do it using the slow query log in Percona Server (5.5/5.6) and/or Performance Schema in MySQL 5.7. If you are running Percona Server, you can enable extended stored procedures logging as described here: https://www.percona.com/doc/percona-server/5.6/diagnostics/slow_extended.html. Another way is using a deprecated “show profile” method as described here: https://www.percona.com/blog/2009/01/19/profiling-mysql-stored-routines/

 

Q: How will DPA use the index when there are more than five columns in the “where” conditions? How would you create indexes?

A: I would suggest checking the “cardinality” of the fields (= number of unique values). Usually (unless you create a covered index or are optimizing the group by) it makes much more sense to limit the number of fields in an index, and only include the fields with the high cardinality. For example, PRIMARY KEY or UNIQUE INDEX works best, whereas the “gender” field (with only two unique values, “male” and “female”) would not be very useful.

 

Q: How would the analytics tool work in an open stack VM environment, where we have 100 database servers?

A: One installation of DPA can monitor hundreds of database servers. In fact, we have several very large companies that monitor 1000s of servers worldwide.

 

Q: If you have a small table with only 100 records, is it worth creating indexes on specific fields or just do a table scan?

A: If the table is only 100 records and you are not joining it with other tables, it usually does not make sense to add indexes. But because the table is so small it doesn’t really matter either way.

 

Q: Is the SolarWinds tool better than MONyog, and how expensive is the license cost for this?

A: MONyog is also a monitoring tool, but it doesn’t have the advisors, alarms, granularity, history, or customizations that DPA gives you. The retail cost per server is currently $1,995 per monitored server, but is heavily discounted the more you purchase.

 

Q: In many cases, due to the randomness and complexity of queries thrown at various tables, I end up creating a lot of indexes. At what point would there be too many indexes? Should I then create MySQL views instead of indexes? Should one use MySQL views at all to optimize searches?

A: First of all there are no “materialized views” in MySQL, so it is not a useful replacement for indexes. You can create “summary” tables manually, which will usually help a lot. Although it is hard to say when you have too many indexes, lots of indexes can decrease the performance of your insert/update/delete operations, as well as confuse MySQL. So a great many indexes might cause MySQL to start choosing a wrong index when doing selects.

 

Q: Sometime, we need to add indices for different queries for the same table. Eventually, the table has too many indices. Any suggestion for such cases?

A: See the response to the previous question.

 

Q: Is there a way in DPA to see what queries are currently running? In other words, to know about slow queries as they run rather than only knowing about them historically?

A: Yes. In the “Current” dashboard, click the “Currently Active Sessions” box. With this option, you can sort by longest running, etc.

 

Q: Why is delay indexed in the composite key? It only covers the query, but the temp table can be avoided by the first two fields?

A: You are referring to this example:

mysql> alter table ontime_2012
add key covered(dayofweek, Carrier, DepDelayMinutes);
explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2012 where dayofweek =7 group by CarrierG
...                    
possible_keys: DayOfWeek,covered
          key: covered
          key_len: 2
          ref: const
          rows: 905138
          Extra: Using where; Using index

The reason we add DepDelayMinutes is to make the index covered, so MySQL will be able to satisfy the query with an index only.

 

Jan
14
2016
--

Prometheus as an Engine for MySQL Monitoring

prometheusWhen I first discovered Graphite years ago, I was very impressed with its monitoring capabilities.  Compared to many RRD-based tools that were popular at the time (like Cacti), Graphite separated the captured data and graphs, allowing you to do all kinds of math and transformations while visualizing data. For example, I could plot the relationship between system queries and disk IO, and capture how the system was getting more IO bound over time. It also had reasonably high performance, allowing me to capture high-resolution data for medium-sized systems.

Just last year I discovered Prometheus, and it also impressed me. I think it has the potential to take Graphite’s flexibility to the next level. Though I am in no way a Prometheus expert, I  want to share my understanding and thoughts on it so far.

Data Model

The data model is perhaps what attracted me to Prometheus the most. While it’s not obvious at first, when you do figure it out it has fantastic flexibility.

In the data model used by Whisper and Carbon in Graphite, you will use something like this to store MySQL data:

myapp.store.mysql.db01.status.questions = 5000

You can set up any hierarchy you like, but it has to have a hierarchy.

What Prometheus does instead is allow you to use a set of key-value pairs. The same data shown above could be presented like this:

questions_total{app=”myapp”,subsystem=”store”,engine=”mysql”,host=”db01”, source=”status”} = 5000

(You most likely wouldn’t use this exact structure, but it’s good for illustration.)

The difference between these approaches it that Prometheus provides you multiple dimensions on which you can filter and aggregate, plus you can add those dimensions later as you need them (without needing to redesign your tree hierarchy).

These labels are very dynamic, and I can change them in a second. For example, a MySQL server reporting as a “Master” might start reporting as a “Slave” in the very next second, and its data will be aggregated differently.

This is especially important in the modern, often cloud-based and virtualized world. For example, using Prometheus it is very easy to tag servers by their region or availability zones. I can also do things like compute MySQL space usage by both the database and storage engine. The possibilities are endless.

Data Capture

Unlike Graphite – where the main model is push and the hosts themselves choose what kind of information they want to push to monitoring system and at which intervals – with Prometheus you set up “Exporters” that have the ability to export the data. It is up to the Prometheus server configuration to choose what data to sample and how frequently.

The clear benefit of Prometheus’ approach is that you can have as many servers as you like pulling the data, so it is very easy to create a development version of your system and play around with it – without affecting production. It also provides a simple pathway to high availability.

(Both the push and pull approaches have their benefits and drawbacks. Brian Brazil wrote an excellent article advertising the pull model of monitoring.)

Prometheus does create a few challenges for me. Unless I want to set up Service Discovery, it is a hassle to monitor any development/test VMs I might spin up (that would otherwise not be open to external access at all). While this isn’t the main use case for Prometheus, it is helpful for me to test the dashboard’s behavior with different operating systems, workloads, etc.

A more significant issue I discovered is dealing with some data that can’t be captured to multiple locations, because the data capture causes the data to change.

Here is specific example: if I look at the

events_statements_summary_by_digest

 table in

PERFORMANCE_SCHEMA

, there is a

MAX_TIMER_WAIT

 field that shows me what the maximum query execution time is for the query pattern. If I want to get the maximum query execution time for every minute, for example, I would need to “truncate” the table to reset the statistics and let the maximum value be computed again. If I don’t perform that operation, the data becomes meaningless. If I make the exporter to reset the statistics during the poll, however, I can’t pull it from two Prometheus servers.

This is one instance where Prometheus’ performance schema design could be better. I could set up a Cron job or Event to clear out the statistics regularly and get a  proper maximum value for every five minutes, but that isn’t an overly convenient solution.

Another issue I discovered is that Prometheus doesn’t have any protection from bad (long) samples, or a very good method of detecting of them. Let’s imagine that I have a MySQL server and I’m sampling status data every second. For some reason the call to

SHOW GLOBAL STATUS

 took five seconds to execute. The truth is we don’t really know where in those five seconds the

SHOW GLOBAL STATUS

 output corresponds – it might be at very start, it might be at the very end. As such, you don’t really know how to process the counters. Whatever you do, you’re likely to be wrong. My preference in this case it to simply discard such samples, because even missing one percent of the samples is unlikely to change the whole picture. Constantly questioning whether you really had a couple of seconds where the QPS spiked to ten times the normal rate, or that it’s an invalid sample, is not something I on which I want to waste a lot of time!

My preferred approach is to configure the

SHOW GLOBAL STATUS

 capture so that if it takes more than ten percent of the capture interval, it will be discarded. For example, with a one second capture I would allow 100ms for the capture. If the system is not keeping up with this scale, I would be better to not fool myself and reduce the capture resolution to around five seconds.

The only protection Prometheus allows is to configure the scrape_timeout, but unfortunately it is only limited to one second resolution at this point. This is too coarse for any high-resolution capture.

Finally, it is also inconvenient to specify different resolutions for different data. In MySQL there is a often a lot of data that I want to capture, but the resolution needed for each capture is different. For example,

SHOW GLOBAL STATUS

 with one second resolution is must. At the same time, capturing the table size information from

INFORMATION_SCHEMA

 with a one second resolution would put too much load on MySQL, especially if there are a lot of tables. That level of resolution in this case isn’t really needed.

An attractive thing about Prometheus is that the Prometheus development team uses it a lot for MySQL monitoring, so the MySQL Exporter is really good. Most MySQL monitoring plugins I find resort to reporting just a few basics statistics, which is not nearly enough for advanced diagnostics. The Prometheus MySQL exporter gets tons of stuff and has been adding more in every version.

I also very much like that the Prometheus Exporters are designed using HTTP protocol. This means it is very easy to debug or see what kind of data they capture. They present it simply using a web-browser:

HTTP browser

Computational Model

I think the basic operations in Prometheus are pretty intuitive, but if you look at some of the advanced behaviors you’re going to find some inconveniences and some things that are likely to surprise you.

One inconvenience is that Prometheus is mainly designed for working with high resolution data. If there are more than five minute holes (by default) in the time series, they could disappear from the graphs. As I mentioned, for MySQL there is quite a lot of information that it makes sense to capture at a resolution lower than five minutes.

Prometheus functions are looking in the “past,” and designed in a way that the value of the function at any time (T) when it could be computed is not going to change. It all looks clean and logical, but it causes issues with holes in the data capture.  

As an example, let’s imagine following five seconds where the total number of questions from the start successfully scrapped some seconds but not others (due to a network issue, overload, etc.):

1  –  10
2  –  20
3  –  X
4  –  X
5  –  200

 When we capture data of “counter” type the most important value it has is not the actual counter value at the given time but the rate of change of the counter at different time intervals. If in this case, for example, the query rate was ten QPS for intervals one through two seconds, this can be clearly computed. But what was the query rate in the three through four second interval? We don’t really have exact data, but that is fine: we know there have been 180 queries during the two through five second interval, giving us 60 QPS (which we can use for the three through four seconds interval).

This is NOT, however, how Prometheus will compute it if you use a high irate() function (which is suppose to give you highest resolution possible). When you evaluate irate() at T=4, it doesn’t have access to the T=5 value, even if it is in the database. Instead, it will look back and find the matching previous interval (one through two) and use the corresponding value of ten QPS.

I find this pretty puzzling and inconvenient.

There is also the rate() function, which can be used to get the average rate for the period.  Unfortunately it can’t estimate the rate for a smaller period based on the available data for a longer period. So for example if I ask rate() function to compute a query rate at T=4, looking one second back, it will return no data. This isn’t a big deal when you’re working with data manually, but if you’re building zoomable dashboards it means you can zoom in to the point where the data will disappear (rather than stopping at the best possible value available).

Storage

Prometheus has its own high performance storage system which is based in part on LevelDB. It is highly optimized for time series and can achieve a very high level of compression. Be ready, though: all your label combinations will create a different time series on the low level, and will require a lot of files. This isn’t really a problem with SSD drives and modern file systems, but it something to look out for.

The capture engine and storage systems are rather efficient. Even though Prometheus does not have built in clustering for “scaling out,” you can reportedly get more than 300K metrics per second captured on a single node. You can also use multiple Prometheus servers as needed.

The problem I found with Prometheus’ storage is that it is very self contained: you can only use it from Prometheus or access it from the HTTP API. There are no tools at this point to export it for advanced analysis with R, or to dump the whole database into something like JSON format so it can be loaded into a different database engine. Some of these features might already be on roadmap.

Purging and Aggregation

Retention configuration in Prometheus is pretty spartan. You can set

storage.local.retention

 to the length you want to store the data, but that’s it. You can’t configure it to purge different data at different times. You can run multiple Prometheus instances to achieve this, but it’s quite a hassle.  It’s also not possible to instruct Prometheus to automatically build summaries in order to execute low resolution queries faster.

For example if I have MySQL’s query rate captured every second, but I want to view the data over a long time period (e.g., how it changed over last three months to estimate growth trends), data aggregated at hour intervals would be enough for that purpose.

There is support for recording rules to help achieve some of this, but it is not explicit or convenient in my opinion.

Looking at the Prometheus roadmap, some of these issues might not be fixed in Prometheus but achieved through integrating other systems such as InfluxDB (where experimental support already exists).

Purpose

A lot of these limitations make sense if you look at the purpose for which Prometheus was created: getting high-resolution data and being able to provide as much troubleshooting information as possible to its Alerting engine. It is not really designed for storing extensive history. Too bad! I would very much like to get both of those properties in the single system!

Visualization

As you install Prometheus, it has a built-in Expression Browser, which is great for debugging and interactive analyses. It also allows you to see what data you actually have in the database. It will disappoint you, however, if you’re looking for beautiful graphs!

HTTP installer

This shows I have the information about MySQL query rate from two servers, as well as the available and configured labels.

If I want to pick one server and look at the average rate of queries per five minutes, I can do this:

HTTP graphs

There are some tools available in the graph to chose the time range and resolution.

You should aware that visualizing data with the rate() function often shows you things that do not exist. In this case, it looks like the number of queries was gradually creeping up. In reality, I just started the benchmark so the number of queries jumped almost immediately. This is what the real situation looks like (using irate()):

HTTP graphs 2

As I explained before, irate() does not handle missing data points very well, plus it behaves somewhat bizarrely when you “zoom out” – providing instant rate information at sparse intervals (e.g., the instant rate computed every one second over 60 seconds) rather than smoothing things to averages.

There is also the PromDash tool available for Prometheus, which gives you nicer looking dashboards and supports a lot of Prometheus’ features. Now that Grafana has official support for Prometheus, it is my preferred tool to build dashboards – especially since it supports multiple data sources besides Prometheus.

Summary

I’m very excited about Prometheus. It allows me to get a lot of information easily and use it for Performance analyses in benchmarking or troubleshooting. It would be great if it also had a simple integrated solution for long term storage and trending. I am also looking forward to better integration with Grafana and better documentation on how to create Prometheus-based dashboards – especially with some Prometheus-based examples!  

Note: All above was written about Prometheus 0.16.1. Prometheus is rapidly evolving and may  change with newer versions.

Oct
20
2015
--

Slow query graphs using Performance Schema and Graphite

I love graphs. They just make things easier when it comes to finding patterns. I also love visibility. Having the ability to known what is going on inside the database is priceless. How about having visibility of the slow queries execution time on a graph? Let’s do it.

We’ve already described how to get query digest using performance schema. Since the MySQL server is already doing the heavy lifting for you with little-to-no overhead, this information is available practically at will. So let’s make some graphs with that data.

To accomplish this I will use the well-known tool Graphite to store and render time-series data. For those who are not familiar with Graphite, it’s actually a 3-piece tool, consisting of:

  • The Carbon metric processing daemons
  • The Whisper time-series database library
  • The Graphite-Web, a Django-based web application that renders graphs and dashboards

Instructions for installing it are out of the scope of this post, but it is not hard to find instructions surfing the web and it’s not more complicated than using yum/apt to do it.

One of the greatest things about Graphite is that feeding data to it can be quite simple. The most basic way to do it is by using the “plaintext protocol” which is nothing more than sending the data to a specific port with the proper format. The format is:

<metric_path> <metric_value> <metric_timestamp>

You can use netcat to send the information. That’s it.

I’ve decided to use an AWS MySQL RDS instance running sysbench for testing purposes. For the metric path format, I’ve used: mysql.rds.<query_formatted> <value> <timestamp>. Now, assuming that the carbon is in the localhost machine and knowing that the default port number is 2003, to send a value I just need to do something like:

echo "mysql.rds.SELECT__ 0.00123 1445026691" | nc -w1 localhost 2003

And Graphite will take care of the rest!

The questions now are: Where do I get the data from? And how do I feed Graphite in a continuous way?

The first answer is: From the Performance Schema!
The second answer is: Bash.

You can write a small script that takes care of consulting the performance_schema.events_statements_summary_by_digest table, gives a good format to the queries and sends the data to Graphite. This is the one I used for my testing:

#!/bin/bash
query="set session group_concat_max_len=2048; select concat(digest_text,'dash', round(sum_timer_wait/ 1000000000000, 6),'\n') from performance_schema.events_statements_summary_by_digest order by sum_timer_wait desc limit 20;"
IFS="
"
for i in $(mysql -Nr -e"$query"); do
digest=$(echo ${i%%dash*})
digest=${digest%%,*}
digest=$(echo $digest | tr -d "`")
digest=$(echo $digest | tr " " "_")
digest=$(echo $digest | tr -d "?")
digest=$(echo $digest | tr "." "-")
digest=$(echo $digest | tr "(" "_")
digest=$(echo $digest | tr ")" "_")
value=$(echo ${i##*dash})
echo "mysql.rds.$digest $value $(date +%s)" | nc -w 1 localhost 2003
done

Not too fancy, but it will do the trick. Calling the script inside an infinite loop at every 1 second for several hours, the result is:

Grpahite Slow log compiledThe above graph shows the delta between subsequent data points, previously normalized to 1 per second. The data points here are all the INSERT, SELECT, DELETE and UPDATE’s from the server, combined with wildcards, like this:

functionsAppliedCombinedIn plain English: the graph shows how long the queries are taking. This will give you pretty good insight information from the queries’ perspective.

You can have more detailed graphs; for example, one with explicit queries that you want to track:

Slow queries graph detailesFrom the tree on the left you can see all the queries that the bash script has collected and sent to Graphite. This is somehow dynamic, since it will add “new” queries that fulfill the criteria.

This is just a small example of what can be done, but both Performance Schema and Graphite are very versatile and you can find creative ways to use them. Readers that are familiar with tools like Anemometer or the great blog post from openark might find this very similar; however this is a direct approach, that instead of using pt-query-digest with all it’s complexity, instead uses the P_S in a more customized and simpler way. Have fun!

The post Slow query graphs using Performance Schema and Graphite appeared first on MySQL Performance Blog.

Oct
13
2015
--

MySQL query digest with Performance Schema

Data AnalysisQuery analysis is a fantastic path in the pursuit to achieve high performance. It’s also probably the most repeated part of a DBA’s daily adventure. For most of us, the weapon of choice is definitely pt-query-digest, which is one of the best tools for slow query analysis out there.

Why not use pt-query-digest? Well, sometimes getting the slow log can be a challenge, such as with RDS instances or when your database is running as part of a DBaaS, which is a common practice in certain organizations.

In those cases it’s good to have an alternative. And in this case, the chosen one is the Performance Schema. We have already talked about the events_statements_* tables; however, this is the moment for the events_statements_summary_by_digest. In this table each row summarizes events for given schema/digest values (note that before MySQL 5.6.9, there is no SCHEMA_NAME column and grouping is based on DIGEST values only).

In order for MySQL to start to aggregate information in summary tables, you must verify that the consumer statement_digest is enabled.

Now, the most straightforward way to get data is to simply query the table, like this:

SELECT
SCHEMA_NAME,
digest,
digest_text,
round(sum_timer_wait/ 1000000000000, 6),
count_star
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC LIMIT 10;

This will show you the picture of volume and frequency of SQL statements in your server. As simple as that. But there are some caveats:

  • Statements are normalized to a digest text. Instead of seeing a query like SELECT age FROM population WHERE id BETWEEN 153 AND 153+69 you will have the fingerprint version: SELECT age FROM population WHERE id BETWEEN ? AND ? + ?
  • The events_statements_summary_by_digest table has a limited maximum number of rows (200 by default, but MySQL 5.6.5 can be modified with the performance_schema_digests_size variable). As a consequence, when the table is full, statement digest values that have no already existing row will be added to a special “catch-all” row with DIGEST = NULL. In plain English: you won’t have meaningful info for those statements.

To solve the first issue, we can use the events_statements_history table to get complete queries for all the digests. I chose not to use events_statements_currents because of the short life the rows have on that table. With history, there are more chances to get more queries in the same amount of time.

Now, when using pt-query-digest, the first step is always to collect a representative amount of data, commonly from the slow log, and then process. With Performance Schema, let’s collect a representative amount of complete queries so we can have examples for every aggregated statement.

To address the second issue, we just need to TRUNCATE the events_statements_summary_by_digest table. This way the summary will start from scratch.

Since Performance Schema is available on all the platforms supported by MySQL, I chose to run the tests on an Amazon RDS MySQL instance. The only thing I didn’t like is that P_S is disabled by default on RDS and to enable it requires an instances reboot. Other than that, everything is the same as in a regular instance.

The steps are:

  1. Enable the events_statements_history consumer
  2. Create a MEMORY table to hold the data
  3. Truncate tables to have a fresh start
  4. Create a MySQL EVENT that will fill the table with data
  5. Once the event has ended, get the query digest.

The table schema is the following:

CREATE TABLE IF NOT EXISTS percona.digest_seen
(schema_name varchar(64) DEFAULT NULL,
digest varchar(32) DEFAULT NULL,
sql_text varchar(1024) DEFAULT NULL,
PRIMARY KEY USING BTREE (schema_name,digest)) engine=memory;

The original SQL_TEXT field from the events_statements_history table is defined as longtext, but unless you are using Percona Server (5.5+), you won’t be able to use longtext on a memory table. This is possible in Percona Server because the Improved Memory Engine permits the use of Blob and Text fields on the Memory Storage Engine. The workaround is to define that field as a varchar 1024. Why 1024? That’s another requirement from the table: The SQL_TEXT is fixed at 1024 chars. It’s only after MySQL 5.7.6 that the maximum number of bytes to display can be modified by changing the performance_schema_max_sql_text_length system variable at server startup.

Also, since we are going to use EVENTS on RDS, the “event_scheduler” variable has to be set to ON. Luckily, it is a dynamic variable so there’s no need to reboot the instance after modifying the Parameter Group. If using a non-RDS, it’s enough to execute “SET GLOBAL event_scheduler = ON;”

Here is the complete list of steps:

UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME = 'events_statements_history';
SET SESSION max_heap_table_size = 1024*1024;
CREATE DATABASE IF NOT EXISTS percona;
USE percona;
CREATE TABLE IF NOT EXISTS percona.digest_seen (schema_name varchar(64) DEFAULT NULL, digest varchar(32) DEFAULT NULL, sql_text varchar(24) DEFAULT NULL, PRIMARY KEY USING BTREE (schema_name,digest)) engine=memory;
TRUNCATE TABLE percona.digest_seen;
TRUNCATE performance_schema.events_statements_summary_by_digest;
TRUNCATE performance_schema.events_statements_history;
CREATE EVENT IF NOT EXISTS getDigest
ON SCHEDULE EVERY 1 SECOND
ENDS CURRENT_TIMESTAMP + INTERVAL 5 MINUTE ON COMPLETION NOT PRESERVE
DO
INSERT IGNORE INTO percona.digest_seen SELECT CURRENT_SCHEMA, DIGEST, SQL_TEXT FROM performance_schema.events_statements_history WHERE DIGEST IS NOT NULL GROUP BY current_schema, digest LIMIT 50;

The event is defined to be run immediately, once per second, for 5 minutes. After the event is complete, it will be deleted.

When the event is done, we are in position to get the query digest. Simply execute this query:

SELECT
s.SCHEMA_NAME,
s.SQL_TEXT,
ROUND(d.SUM_TIMER_WAIT / 1000000000000, 6) as EXECUTION_TIME,
ROUND(d.AVG_TIMER_WAIT / 1000000000000, 6) as AVERAGE_TIME,
COUNT_STAR
FROM performance_schema.events_statements_summary_by_digest d
LEFT JOIN percona.digest_seen s USING (digest)
WHERE s.SCHEMA_NAME IS NOT NULL
GROUP BY s.digest
ORDER BY EXECUTION_TIME DESC LIMIT 10;

The order by is similar to the one pt-query-digest does by default, but it could be any one you want.

The output is:

+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------+------------+
| SCHEMA_NAME | SQL_TEXT                                                                                                                                                                  | EXECUTION_TIME | AVERAGE_TIME | COUNT_STAR |
+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------+------------+
| percona     | UPDATE population SET age=age+1 WHERE id=148                                                                                                                              |  202304.145758 |     1.949487 |     103773 |
| percona     | SELECT age FROM population WHERE id BETWEEN 153 AND 153+69                                                                                                                |     488.572609 |     0.000176 |    2771352 |
| percona     | SELECT sex,age,estimate2012 FROM population WHERE id BETWEEN 174 AND 174+69 ORDER BY sex                                                                                  |     108.841575 |     0.000236 |     461412 |
| percona     | SELECT census2010 FROM population WHERE id=153                                                                                                                            |      62.742239 |     0.000090 |     693526 |
| percona     | SELECT SUM(estimate2014) FROM population WHERE id BETWEEN 154 AND 154+69                                                                                                  |      44.940020 |     0.000195 |     230810 |
| percona     | SELECT DISTINCT base2010 FROM population WHERE id BETWEEN 154 AND 154+69 ORDER BY base2010                                                                                |      33.909593 |     0.000294 |     115338 |
| percona     | UPDATE population SET estimate2011='52906609184-39278192019-93190587310-78276160274-48170779146-66415569224-40310027367-70054020251-87998206812-01032761541' WHERE id=154 |       8.231353 |     0.000303 |      27210 |
| percona     | COMMIT                                                                                                                                                                    |       2.630153 |     0.002900 |        907 |
| percona     | BEGIN                                                                                                                                                                     |       0.705435 |     0.000031 |      23127 |
|             | SELECT 1                                                                                                                                                                  |       0.422626 |     0.000102 |       4155 |
+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------+------------+
10 rows in set (0.10 sec)

Finally, you can do some cleanup:

DROP event IF EXISTS getDigest;
DROP TABLE IF EXISTS percona.digest_seen;
SET SESSION max_heap_table_size = @@max_heap_table_size;
UPDATE performance_schema.setup_consumers SET ENABLED = 'NO' WHERE NAME = 'events_statements_history';

Summary: Performance Schema is doing the query digest already for you. It is just a matter of how to access the data in a way that suits your requirements.

The post MySQL query digest with Performance Schema appeared first on MySQL Performance Blog.

May
07
2015
--

Introducing email reports for Percona Cloud Tools!

For those PCT users who forget to regularly check query performance, you can now get server query reports delivered to your email daily or weekly. The report contains a digest of the most critical performance information related to a particular MySQL instance, enabling you to act on any recent issues. There are convenient links that open the corresponding view of the PCT Query Analytics module.

email-report-snapshot

Server summary snapshot

The report also provides a list of ten slowest queries and ten queries with the highest load during the period. It is important to compare these lists, because some of the slowest queries may occur only several times during the period (so they probably do not overload the server), while seemingly quick queries may run millions of times and put a lot of unnecessary overhead, which you could investigate and fix.

email-report-slow

List of slowest queries

email-report-load

List of queries with highest load

Finally, the report also provides information about new queries that you add to MySQL. This enables you to estimate the efficiency of a new query and see how it compares to the worst performing ones.

email-report-new

List of new queries

Reports are enabled by default for some of our customers. We will gradually enable them for more PCT users every week to make sure that the new feature rolls out smoothly. You can manage reports in the PCT web UI under Configure > Reports at https://cloud.percona.com/org/user-reports-settings.

By default, you will receive a separate weekly report for each MySQL instance in your infrastructure. If you add a new MySQL instance, weekly reports will be automatically enabled for it.

To disable all reports, clear the Enable server query reports check box. If you do not want to enable reports for new MySQL instances that you add, clear the Automatically receive reports for new MySQL instances check box.

You can enable reports for specific MySQL instances, as well as select between daily and weekly period. Disable reports for servers that do not require regular attention and enable weekly reports for more critical servers. Use daily reports for the most active servers, where you expect to have the most amount of change and load.

If you do not have a Percona Cloud Tools account yet, register at cloud.percona.com and get regular query reports by email!

The post Introducing email reports for Percona Cloud Tools! appeared first on MySQL Performance Blog.

Mar
11
2015
--

Easy query metrics with MySQL Performance Schema

The MySQL Performance Schema exposes so much data that it’s not trivial to learn, configure, and use. With recently released Percona Agent 1.0.11 you can get query metrics – like min, max, and average query execution time – with a few clicks:

Configure PCT Query Analytics for Performance Schema

Click “Apply” and about two minutes later you’ll have query metrics from Performance Schema, collected and sent every minute.

Percona Cloud Tools (PCT) and Percona Agent handle all the details. You’ll need MySQL (or Percona Server) 5.6 and Percona Agent 1.0.11 or newer. One caveat at the moment: it only works for local MySQL instances (so not Amazon RDS). This limitation should be fixed soon; we’re already working on it.

Why use Performance Schema? We prefer Query Analytics with MySQL slow logs (especially Percona Server slow logs which expose more metrics) because slow logs expose the most metrics compared to other sources, but sometimes the slow log just isn’t an option, so Performance Schema is the next best choice, but the choice means tradeoffs. For example, Performance Schema does not expose actual query examples (just fingerprints), so EXPLAIN does not work.

For those who’ve been following PCT development, you know that Percona Agent 1.0.5 first introduced support for Performance Schema. What’s new in 1.0.11 is everything – we completely rewrote this part of the agent. It’s so much better that it’s now the required minimum version for using Query Analytics with Performance Schema. Upgrading is really easy: just run the single command line you used to install the agent and it will auto-update.

MySQL Performance Schema exposes a lot of data and insights into the server, but query metrics are perhaps the most important because the primary job of your MySQL database is to execute queries. That’s why Percona Cloud Tools makes Query Analytics with Performance Schema (and slow logs) so easy: to help you focus on the essential and leave the details to the tools.

Percona Cloud Tools is in beta, so it’s still free to sign up and free to use all the tools and features.

The post Easy query metrics with MySQL Performance Schema appeared first on MySQL Performance Blog.

Feb
02
2015
--

Faster fingerprints and Go packages for MySQL

I’m happy to announce Go packages for MySQL. Particularly exciting is a new query fingerprint function which is very fast and efficient, but I’ll talk about that later. First, go-mysql is currently three simple Go packages for parsing and aggregating MySQL slow logs. If you’ve been following Percona development, you’ve no doubt heard of Percona Cloud Tools (PCT), a somewhat new performance management web service for MySQL.

One tool in PCT is “Query Analytics” which continuously analyzes query metrics from the slow log. The slow log provides the most metrics and therefore the most performance insight into MySQL. percona-agent, the open-source agent for PCT, uses go-mysql to parse and analyze the slow log, so the code has both heavy formal testing and heavy real-world testing. If you’re working with Go, MySQL, and MySQL slow logs, we invite you to try go-mysql.

Last October we implemented a completely new query fingerprint function. (See “Fingerprints” in the pt-query-digest doc for a background on query fingerprints.) Since mydumpslow, the very first slow log parser circa 2000, fingerprints have been accomplished with regular expressions. This approach is normally fine, but percona-agent needs to be faster and more efficient than normal to reduce the cost of observation. Regex patterns are like little state machines. One regex can be very fast, but several are required to produce a good fingerprint. Therefore, the regex approach requires processing the same query several times to produce a fingerprint. Even worse: a regex can backtrack which means a single logical pass through the query can result in several physical passes. In short: regular expressions are a quick and easy solution, but they are very inefficient.

Several years ago, a former colleague suggested a different approach: a single pass, purpose-built, character-level state machine. The resulting code is rather complicated, but the resulting performance is a tremendous improvement: 3-5x faster in informal benchmarks on my machine, and it handles more edge cases. In simplest terms: the new fingerprint function does more with less, which makes percona-agent and Query Analytics better.

Check out  github.com/percona/go-mysql, browse the percona-agent source code if you wish, and try Percona Cloud Tools for free.

The post Faster fingerprints and Go packages for MySQL appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com