Oct
20
2017
--

Percona Monitoring and Management 1.4.0 Is Now Available

Percona announces the release of Percona Monitoring and Management 1.4.0.

This release introduces support for external Prometheus exporters, so that you can create dashboards in Metrics Monitor even for monitoring services other than those provided with PMM client packages. To attach an existing external Prometheus exporter, run pmm-admin add external:metrics NAME_OF_EXPORTER URL:PORT.

The list of attached monitoring services is now available not only in tabular format but also as a JSON document, to enable automatic verification of your configuration. To view the list of monitoring services in JSON format, run pmm-admin list --json.
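Since the JSON output is meant for automated verification, a short script can check it in a deployment pipeline. A minimal sketch, assuming a hypothetical output shape (the actual field names produced by pmm-admin list --json may differ):

```python
import json

# Hypothetical excerpt of `pmm-admin list --json` output; the real schema
# may differ -- treat this structure as an illustrative assumption.
sample_output = '''
{
  "services": [
    {"type": "mysql:metrics", "name": "db01", "running": true},
    {"type": "external:metrics", "name": "node_exporter", "running": true}
  ]
}
'''

def not_running(raw_json):
    """Return the names of any attached monitoring services that are not running."""
    doc = json.loads(raw_json)
    return [s["name"] for s in doc["services"] if not s["running"]]

# An empty list means every attached service is up.
print(not_running(sample_output))
```

In practice you would feed the script the live command output (for example, pmm-admin list --json | python check.py) and fail the check if the returned list is non-empty.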

In this release, Prometheus and Grafana have been upgraded. Prometheus version 1.7.2, shipped with this release, offers a number of bug fixes that will contribute to its smooth operation inside PMM. For more information, see the Prometheus change log.

Version 4.5.2 of Grafana, included in this release of PMM, offers a number of new tools that will facilitate data analysis in PMM:

  • New query editor for Prometheus expressions features syntax highlighting and autocompletion for metrics, functions and range vectors.
  • Query inspector provides detailed information about the query. Its primary goal is to enable analysis of a graph that does not display data as expected.

The complete list of new features in Grafana 4.5.0 is available from What’s New in Grafana v4.5.

For install and upgrade instructions, see Deploying Percona Monitoring and Management.

New features

  • PMM-1520: Prometheus upgraded to version 1.7.2.
  • PMM-1521: Grafana upgraded to version 4.5.2.
  • PMM-1091: The pmm-admin list produces a JSON document as output if the --json option is supplied.
  • PMM-507: External exporters are supported with pmm-admin.
  • PMM-1622: Docker images of PMM Server are available for download as tar packages.

Improvements

  • PMM-1553: Consul upgraded to the 0.8 release.

Bug fixes

  • PMM-1172: In some cases, the TABLES section of a query in QAN could contain no data and display the List of tables is empty error. The Query and Explain sections had the relevant values.
  • PMM-1519: A Prometheus instance could be forced to shut down if it contained too many targets (more than 50). When started the next time, Prometheus initiated a time-consuming crash recovery routine, which was especially slow on large installations.
Oct
18
2017
--

Webinar Thursday, October 19, 2017: What You Need to Get the Most Out of Indexes – Part 2


Join Percona’s Senior Architect, Matthew Boehm, as he presents the What You Need to Get the Most Out of Indexes – Part 2 webinar on Thursday, October 19, 2017, at 11:00 am PDT / 2:00 pm EDT (UTC-7).

Proper indexing is key to database performance. Finely tune your query writing and database performance with tips from the experts. MySQL offers a few different types of indexes and uses them in a variety of ways.

In this session you’ll learn:

  • How to use composite indexes
  • Other index usages besides lookup
  • How to find unoptimized queries
  • What is there beyond EXPLAIN?

Register for the webinar.

Matthew Boehm, Architect

Matthew joined Percona in the fall of 2012 as a MySQL consultant. His areas of knowledge include the traditional Linux/Apache/MySQL/PHP stack, memcached, MySQL Cluster, massive sharding topologies, PHP development and a bit of MySQL-C-API development. Previously, Matthew DBAed for the fifth largest MySQL installation at eBay/PayPal. He also hails from managed hosting environments. During his off-hours, Matthew is a nationally ranked, competitive West Coast Swing dancer and travels to competitions around the US. He enjoys working out, camping, biking and playing MMOs with his son.

Oct
13
2017
--

This Week in Data with Colin Charles 10: MariaDB and Upcoming Appearances


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Beyond getting ready for Velocity and Open Source Summit Europe, there was some feature testing this week comparing MySQL and MariaDB. Naturally, a long report/blog is coming soon. Stay tuned.

Releases

I reckon a lot of folks are swamped after Percona Live Europe Dublin and Oracle OpenWorld, so the releases in the MySQL universe are a bit quieter.

Link List

Upcoming Appearances

Percona’s website keeps track of community events, so check out where to see and listen to a Perconian speak. My upcoming appearances are:

Feedback

I was asked why there weren’t many talks from MariaDB Foundation / MariaDB Corporation at Percona Live Europe 2017. Simple answer: there were hardly any submissions. We had two talk submissions from one speaker from the MariaDB Foundation (we accepted the one on MariaDB 10.3). There was another talk submission from a speaker from MariaDB Corporation (again, accepted). We announced the second talk in the sneak preview, but the talk was canceled as the speaker was unable to attend. We did, however, have a broad range of talks about MariaDB, with many that discussed high availability, security, proxies and the like.

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Oct
11
2017
--

Percona Monitoring and Management 1.3.2 Is Now Available


Percona announces the release of Percona Monitoring and Management 1.3.2. This release contains only bug fixes related to usability.

For install and upgrade instructions, see Deploying Percona Monitoring and Management.

Bug fixes

  • PMM-1529: When the user selected “Today”, “This week”, “This month” or “This year” range in Metrics Monitor and clicked the Query Analytics button, the QAN page opened reporting no data for the selected range even if the data were available.
  • PMM-1528: In some cases, the page not found error could appear instead of the QAN page after upgrading by using the Upgrade button.
  • PMM-1498: In some cases, it was not possible to shut down the virtual machine containing the PMM Server imported as an OVA image.

Other bug fixes in this release: PMM-913, PMM-1215, PMM-1481, PMM-1483, PMM-1507


Oct
06
2017
--

This Week in Data with Colin Charles 9: Oracle OpenWorld and Percona Live Europe Post Mortem


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

This week: a quick roundup of releases, a summary of my thoughts about Percona Live Europe 2017 Dublin, links to look at and upcoming appearances. Oracle OpenWorld happened in San Francisco this past week, and there were lots of MySQL talks there as well (and a good community reception). I have a bit on that as well (from afar).

Look for these updates on Planet MySQL.

Releases

Percona Live Europe Dublin

I arrived on Sunday and chose to rest before my tutorial on Monday. Ronald Bradford and I delivered a tutorial on MySQL Security, and in the morning we chose to rehearse. Percona Live Europe had a full tutorial schedule this year, albeit with one cancellation: MySQL and Docker by Giuseppe Maxia, whom we missed at this conference. Check out his blog for further posts about MySQL, Docker, and SQL Roles in MySQL 8!

We had the welcome reception at Sinott’s Bar. There was a large selection of food on each table, as well as two drinks for each of us. It was lively, and I think we overtook most of the basement. Later that evening, there were drinks around the hotel bar, as people started to stream in for Tuesday’s packed schedule!

Tuesday was the conference kickoff, with Peter Zaitsev doing the opening keynote on the state of the open source database ecosystem. A bonus of this keynote was the short 5-minute talks that helped you get a sense of the important topics and themes around the conference. I heard good things about this from attendees. While most people attended the talks, I spent most of my day in meetings! Then the Community Dinner (thank you Oracle for sponsoring), where we held this year’s Lightning Talks (and plenty more to drink). A summary of the social events is at Percona Live Europe Social.

Wednesday morning we definitely wanted to start a few minutes later, considering people were streaming in slower thanks to the poor weather (yes, it rained all day). The State of the Dolphin ensured we found out lots of new things coming to MySQL 8.0 (exciting!), then the sponsor keynote by Continuent given by MC Brown, followed by a database reliability engineering panel with the authors of Database Reliability Engineering, Charity Majors and Laine Campbell. Their book signing went quickly too – they have many fans. We also heard from Pepper Media on their happy journey with Percona. Another great day of talks before the evening reception (which had fewer people, since many were flying out that evening). Feel free to also read Matthias Crauwels’ Percona Live Europe 2017 Review.

Percona Live Europe 2017 Dublin had over 350 attendees and more than 140 speakers – all in a new location! If you have any comments, please feel free to shoot me an email.

Oracle OpenWorld from Afar

At this year’s Oracle OpenWorld there was talk about Oracle’s new self-driving, machine-learning based autonomous database. There was a focus on Amazon SLAs.

It’s unclear if this will also be what MySQL gets eventually, but in the MySQL world we already have lossless semi-sync replication. Amazon RDS for MySQL is still DRBD based, and Google Cloud SQL does use semi-sync – but we need to check further whether this is lossless semi-sync or not.

Folks like Panoply.io claim they can do autonomous, self-driving databases, and have many platform integrations to boot. Is anyone using this?

Nice to see a Percona contribution to remove InnoDB buffer pool mutex get accepted, and apparently it was done the right way. This is sustainable engineering: fix and contribute back upstream!

I was particularly interested in StorageTapper released by Uber to do real-time MySQL change data streaming and transformation. The slide deck is worth a read as well.

Booking.com also gave a talk. My real takeaway from this was about why MySQL is strong: “thousands of instances, a handful of DBAs.” Doug Henschen also talks about a lot of custom automation capabilities, the bonus of which is many are probably already open source. There are some good talks and slide decks to review.

It wouldn’t be complete without Dimitri Kravtchuk doing some performance smackdowns, and I highly recommend you read MySQL Performance: 2.1M QPS on 8.0-rc.

And for a little bit of fun: there was also an award given to Alexander Rubin for fixing MySQL#2: does not make toast. It’s quite common for open source projects to have such bugs, like the famous Ubuntu bug #1. I’ve seen Alexander demo this before, and if you want to read more check out his blog post from over a year ago: Fixing MySQL Bug#2: now MySQL makes toast! (Yes, it says April 1! but really, it was done!) Most recently it was done at Percona Live Santa Clara 2017.

Link List

Upcoming appearances

Percona’s website keeps track of community events, so check out where to listen to a Perconian speak. My upcoming appearances are:

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Sep
29
2017
--

This Week in Data with Colin Charles 8: Percona Live Europe 2017 Is a Wrap!


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Europe 2017

Percona Live Europe 2017 Dublin

We’ve spent a lot of time in the last few months organizing Percona Live Europe Dublin. I want to thank all the speakers, sponsors and attendees for helping us to pull off yet another great event. While we’ll provide some perspectives, thoughts and feedback soon, all the early mornings, jam-packed meetings and 4 am bedtimes mean I’ll probably talk about this event in my next column!

In the meantime, save the date for Percona Live Santa Clara, April 23-25 2018. The call for papers will open in October 2017.

Releases

Link List

Upcoming appearances

Percona’s website keeps track of community events, so check out where to listen to a Perconian speak. My upcoming appearances are:

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Sep
26
2017
--

Percona Live Europe 2017 Keynotes Day One


After yesterday’s very successful tutorial day, everyone looked forward to an inspiring first day of the conference and the Percona Live Europe 2017 keynotes. There were some fantastic keynotes delivered, and some excellent sessions scheduled.

Note: These videos are as shot, and the slides will be superimposed very soon so you can enjoy the full conference experience!

Laurie Coffin, Chief Marketing Officer of Percona, opened proceedings with a welcome address in which she paid tribute to Jaako Pesonen: a true champion of open source and friend to our community who passed away just this month. He will be missed.

Championing Open Source Databases


Peter Zaitsev delivers his keynote “Championing Open Source Databases”

Laurie then introduced Peter Zaitsev, CEO of Percona, who delivered his keynote “Championing Open Source Databases.” He reiterated Percona’s commitment to remaining an unbiased champion of the open source database ecosystem.

At Percona, we see a lot of compelling open source projects and trends that we think the community will find interesting, and following Peter’s keynote there was a round of lightning talks from projects that we think are stellar and deserve to be highlighted.

Percona Monitoring and Management Demo


Michael Coburn delivers his keynote “Percona Monitoring and Management Demo”

The second keynote was by Percona Product Manager Michael Coburn on Percona Monitoring and Management. How can you optimize database performance if you can’t see what’s happening? Percona Monitoring and Management (PMM) is a free, open source platform for managing and monitoring MySQL, MariaDB, MongoDB and ProxySQL performance. PMM uses Metrics Monitor (Grafana + Prometheus) for visualization of data points, along with Query Analytics, to help identify and quantify non-performant queries and provide thorough time-based analysis to ensure that your data works as efficiently as possible. Michael provided a brief demo of PMM.

MySQL as a Layered Service: How to use Proxy SQL to Control Traffic and Scale-Out

René Cannaò delivers his keynote “MySQL as a Layered Service: How to use Proxy SQL to Control Traffic and Scale-Out”

The next keynote was from René Cannaò, Founder at ProxySQL. The inability to control the traffic sent to MySQL is one of the worst nightmares for a DBA. Scaling out and high availability are only buzzwords if the application doesn’t support such architectures. ProxySQL is able to create an abstraction layer between the application and the database: controlling traffic at this layer hides the complexity of the database infrastructure from the application, allowing both HA and scale out. The same layer is able to protect the database infrastructure from abusive traffic, acting as a firewall and cache, and rewriting queries.

Realtime DNS Analytics at Cloudflare with ClickHouse


Tom Arnfeld delivers his keynote “Realtime DNS Analytics at Cloudflare with ClickHouse” 

Cloudflare operates multiple DNS services that handle over 100 billion queries per day for over 6 million internet properties, collecting and aggregating logs for customer analytics, DDoS attack analysis and ad-hoc debugging. Tom Arnfeld, Systems Engineer at Cloudflare, talks briefly in his keynote on how Cloudflare securely and reliably ingests these log events, and uses ClickHouse as an OLAP system to both serve customer real-time analytics and other queries.

Why Open Sourcing our Database Tooling was a Smart Decision


Shlomi Noach delivers his keynote “Why Open Sourcing our Database Tooling was a Smart Decision” 

Drawing from experience at GitHub, Senior Infrastructure Engineer Shlomi Noach argues in his keynote that open sourcing your database infrastructure/tooling is not only a good decision but a smart business decision that may reward you in unexpected ways. Here are his observations.

MyRocks at Facebook and a Roadmap


Yoshinori Matsunobu delivers his keynote “MyRocks at Facebook and a Roadmap”

A major objective of creating MyRocks at Facebook was replacing InnoDB as the main storage engine, with more space optimisations, and without big migration pains. They have made good progress and extended their goals to cover more use cases. In this keynote, Yoshinori Matsunobu, Production Engineer at Facebook, shares MyRocks production deployment status and MyRocks development plans.

Prometheus for Monitoring Metrics


Brian Brazil, CEO of Prometheus, delivers his keynote “Prometheus for Monitoring Metrics”

From its humble beginnings in 2012, the Prometheus monitoring system has grown a substantial community with a comprehensive set of integrations. Brian Brazil, CEO of Prometheus, provides an overview of the core ideas behind Prometheus and its feature set.

That sums up today’s keynotes. Stay tuned for the next set tomorrow!

Aug
25
2017
--

This Week in Data with Colin Charles #3: More Percona Live Europe!


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

We are five weeks out to the conference! The tutorials and the sessions have been released, and there’s an added bonus – you can now look at all this in a grid view: tutorials, day one and day two. Now that you can visualize what’s being offered, don’t forget to register.

If you want a discount code, feel free to email me at colin.charles@percona.com.

We have some exciting keynotes as well. Some highlights:

  1. MySQL as a Layered Service: How to Use ProxySQL to Control Traffic and Scale Out, given by René Cannaò, the creator of ProxySQL
  2. Why Open Sourcing Our Database Tooling was the Smart Decision, given by Shlomi Noach, creator of Orchestrator, many other tools, and developer at GitHub (so expect some talk about gh-ost)
  3. MyRocks at Facebook and a Roadmap, given by Yoshinori Matsunobu, shepherd of the MyRocks project at Facebook
  4. Real Time DNS Analytics at CloudFlare with ClickHouse, given by Tom Arnfeld
  5. Prometheus for Monitoring Metrics, given by Brian Brazil, core developer of Prometheus
  6. A Q&A session with Charity Majors and Laine Campbell on Database Reliability Engineering, their new upcoming book!

Let’s not forget the usual State of the Dolphin, an update from Oracle’s MySQL team (representative: Geir Høydalsvik), as well as a keynote by Peter Zaitsev (CEO, Percona) and Continuent. There will also be a couple of Percona customers keynoting, so expect information-packed fun mornings! You can see more details about the keynotes here: day one and day two.

Releases

  • Tarantool 1.7.5 stable. The first stable release in the 1.7 series, it also comes with its own log-structured merge-tree (LSM) engine called Vinyl. They wrote it after finding RocksDB insufficient for their needs. Slides: Vinyl: why we wrote our own write-optimized storage engine rather than chose RocksDB (and check out the video).
  • MariaDB Server 10.2.8. As per my previous column, this build merges TokuDB from Percona Server 5.6.36-82.1 (fixing some bugs). There is also a new InnoDB from MySQL 5.7.19 (the current GA release). Have you tried MariaDB Backup yet? There are some GIS compatibility fixes (i.e., to make it behave like MySQL 5.7). One thing that piqued my interest is that the CONNECT storage engine (typically used for ETL operations) now has beta support for the MONGO table type. No surprises: it’s meant to read MongoDB tables via the MongoDB C Driver API. Definitely something to try!

Link List

Upcoming Appearances

Percona’s website keeps track of community events, so check out where to listen to a Perconian speak. My upcoming appearances are:

  1. db tech show case Tokyo 2017 – 5-7 September 2017, Tokyo, Japan
  2. Open Source Summit North America – 11-14 September 2017, Los Angeles, CA, USA
  3. Percona Live Europe Dublin – 25-27 September 2017, Dublin, Ireland
  4. Velocity Europe – 17-20 October 2017, London, UK
  5. Open Source Summit Europe – 23-26 October 2017, Prague, Czech Republic

Feedback

Bill Bogasky (MariaDB Corporation) says that if you’re looking for commercial support for Riak now that Basho has gone under, you could get it from Erlang Solutions or TI Tokyo. See their announcement: Riak commercial support now available post-Basho. Thanks, Bill!

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Aug
17
2017
--

IMDb Data in a Graph Database


In this first-of-its-kind post, Percona welcomes Dehowe Feng, Software Developer from Bitnine, as a guest blogger. In his blog post, Dehowe discusses how viewing data imported from IMDb into a graph database (AgensGraph) lets you quickly see how data nodes relate to each other. This blog echoes a talk given by Bitnine at the Percona Live Open Source Database Conference 2017.

Graphs help illustrate the relationships between entities through nodes, drawing connections between people and objects. Relationships in IMDb are inherently visual. Seeing how things are connected grants us a better understanding of the context underneath. By importing IMDb data as graph data, you simplify the schema and can obtain key insights.

In this post, we will examine how importing IMDb into a graph database (in this case, AgensGraph) allows us to look at data relationships in a much more visual way, providing more intuitive insights into the nature of related data.

For installation instructions for the importing scripts, go here.

The Internet Movie Database (IMDb), owned by Amazon.com, is one of the largest movie databases. It contains 4.1 million titles and 7.7 million personalities (https://en.wikipedia.org/wiki/IMDb).

Relational Schema for IMDb


Relational Schema of IMDb Info

Picture courtesy of user ofthelit on StackOverflow, https://goo.gl/SpS6Ca

Because IMDb’s file format is not easy to read and parse, rather than importing the files directly we use an additional step to load them into relational tables. For this project, we used IMDbpy to load relational data into AgensGraph in relational form. The above figure is the relational schema which IMDbpy created. This schema is somewhat complicated, but essentially there are four basic entities: Production, Person, Company and Keyword. Because there are many N-to-N relationships between these entities, the relational schema has more tables than the number of entities. This makes the schema harder to understand. For example, a person can be related to many movies, and a movie can have many characters.

Concise Graph Modeling

From there, we developed our own graph schema using Production, Person, Company and Keyword as our nodes (or end data points).

Productions lie at the “center” of the graph, with everything leading to them. Keywords describing Productions, Persons and Companies are credited for their contributions to Productions. Productions are linked to other productions as well.


Simplified Graph Database Schema

With the data in graph form, one can easily see the connections between all the nodes. The data can be visualized as a network and querying the data with Cypher allows users to explore the connections between entities.

Compared to the relational schema of IMDb, the graph schema is much simpler to understand. By merging related information for the main entities into nodes, we can access all relevant information to that node through that node, rather than having to match IDs across tables to get the information that we need. If we want to examine how a node relates to another node, we can query its edges to see the connections it forms. Being able to visually “draw a connection” from one node to another helps to illustrate how they are connected.

Furthermore, the labels of the edges describe how the nodes are connected. Edge labels in the IMDb Graph describe what kind of connection is formed, and pertinent information may be stored in attributes in the edges. For example, for the connections ACTOR_IN and ACTRESS_IN, we store role data, such as character name and character id.

Data Migration

To build vertex and edge properties, we use views that join related tables. The data is migrated into graph format by querying the relational data, using selects and joins, into a single table with the necessary information for creating each node.

For example, here is the SQL query used to create the jsonb_keyword view:

CREATE VIEW jsonb_keyword AS
SELECT row_to_json(row(keyword)) AS data
FROM keyword;

We use a view to make importing queries simpler. Once this view is created, its content can be migrated into the graph. After the graph is created, the graph_path is set, and the VLABEL is created, we can use the convenient LOAD keyword to load the JSON values from the relational table into the graph:

LOAD FROM jsonb_keyword AS keywords
CREATE (a:Keyword = data(keywords) );

Note that here LOAD is used to load data in from a relational table, but LOAD can also be used to load data from external sources as well.

Creating edges is a similar process. After creating their ELABELs, we load edges from the tables that store the ID tuples linking the entities:

LOAD FROM movie_keyword AS rel_key_movie
MATCH (a:Keyword), (b:Production)
WHERE a.id::int = (rel_key_movie).keyword_id AND
b.id::int = (rel_key_movie).movie_id
CREATE (a)-[:KEYWORD_OF]->(b);

As you can see, AgensGraph is not restricted to the CSV format when importing data. We can import relational data into its graph portion using the LOAD feature and SQL statements to refine our data sets.

How is information stored?

Most of the pertinent information is held in the nodes (vertexes). Nodes are labeled as Productions, Persons, Companies or Keywords, and their relevant information is stored as JSON. Since IMDb information is constantly updated, many fields for certain entities are left incomplete. Since JSON is semi-structured, if an entity does not have a certain piece of information, the field does not exist at all, rather than being present and marked as NULL.

We also use nested JSON arrays to store data that may have multiple fields, such as quotes that persons might have said or alternate titles to productions. This makes it possible to store “duplicate” fields in each node.
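The practical upshot of absent-versus-NULL fields can be sketched in a few lines (the property names below are illustrative assumptions, not AgensGraph's actual schema):

```python
# Two Person nodes stored as semi-structured documents: the second simply
# omits "birth_date" rather than carrying an explicit NULL, and "quotes"
# is a nested array holding multiple values inside one node.
# (Field names here are illustrative assumptions.)
ben = {
    "name": "Stiller, Ben",
    "birth_date": "1965-11-30",
    "quotes": ["quote one", "quote two"],
}
sparse = {"name": "Doe, Jane"}  # incomplete entry: no birth_date key at all

for person in (ben, sparse):
    # .get() supplies a default only on the consumer side; the stored
    # document itself has no placeholder for the missing field.
    print(person["name"], person.get("birth_date", "unknown"))
```

A query layer can therefore distinguish "not recorded" (key absent) from "recorded as empty", which a fixed relational column cannot do without a NULL convention.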

How can this information be used?

In the IMDb graph database, querying between entities is easy to learn. Using the Cypher Query Language, a user can find things such as all actors that acted in a certain production, all productions that a person has worked on, or all other companies that have worked with a certain company on any production. A graph database’s strength is the simplicity of visualizing the data. There are many ways you can query a graph database to find what you need!

Find the name of all actors that acted in Night at the Museum:

MATCH (a:Person)-[:ACTOR_IN]->(b:Production)
WHERE b.title = 'Night at the Museum'
RETURN a.name,b.title;

Result:

name | title
-----------------------+---------------------
Asprinio, Stephen | Night at the Museum
Blais, Richard | Night at the Museum
Bougere, Teagle F. | Night at the Museum
Bourdain, Anthony | Night at the Museum
Cherry, Jake | Night at the Museum
Cheng, Paul Chih-Ping | Night at the Museum
...
(56 rows)

Find all productions that Ben Stiller worked on:

MATCH (a:Person)-[b]->(c:Production)
WHERE a.name = 'Stiller, Ben'
RETURN a.name,label(b),c.title;

Result:

name | label | title
-------------+-------------+-----------------------------------------------
...
Stiller, Ben | actor_in | The Heartbreak Kid: The Egg Toss
Stiller, Ben | producer_of | The Hardy Men
Stiller, Ben | actor_in | The Heartbreak Kid: Ben & Jerry
Stiller, Ben | producer_of | The Polka King
Stiller, Ben | actor_in | The Heartbreak Kid
Stiller, Ben | actor_in | The Watch
Stiller, Ben | actor_in | The History of 'Walter Mitty'
Stiller, Ben | producer_of | The Making of 'The Pick of Destiny'
Stiller, Ben | actor_in | The Making of 'The Pick of Destiny'
...
(901 rows)

Find all actresses that worked with Sarah Jessica Parker:

MATCH (a:Person)-[b:ACTRESS_IN]->(c:Production)<-[d:ACTRESS_IN]-(e:Person)
WHERE a.name = 'Parker, Sarah Jessica'
RETURN DISTINCT e.name;

Result:

name
---------------------------------
Aaliyah
Aaron, Caroline
Aaron, Kelly
Abascal, Nati
Abbott, Diane
Abdul, Paula
...
(3524 rows)

Summary

The most powerful aspects of a graph database are its flexibility and visualization capabilities.

In the future, we plan to implement a one-step importing script. Currently, the importing script is two-phased: the first step is to load into relational tables and the second step is to load into the graph. Additionally, AgensGraph has worked with Gephi to release a data import plugin. The Gephi Connector allows for graph visualization and analysis. For more information, please visit www.bitnine.net and www.agensgraph.com.

Jul
11
2017
--

Thread_Statistics and High Memory Usage


In this blog post, we’ll look at how using thread_statistics can cause high memory usage.

I was recently working on a high memory usage issue for one of our clients and made an interesting discovery: memory usage growing with no bounds. It was really tricky to diagnose.

Below, I am going to show you how to identify that having thread_statistics enabled causes high memory usage on busy systems with many threads.

Part 1: Issue Background

I had a server with 55.0G of available memory. Percona Server for MySQL version:

 Version  | 5.6.35-80.0-log Percona Server (GPL), Release 80.0, Revision f113994f31
 Built On | debian-linux-gnu x86_64

We calculated approximately how much memory MySQL can use in a worst-case scenario with max_connections=250:

mysql> select ((@@key_buffer_size+@@innodb_buffer_pool_size+@@innodb_log_buffer_size+@@innodb_additional_mem_pool_size+@@net_buffer_length+@@query_cache_size)/1024/1024/1024)+((@@sort_buffer_size+@@myisam_sort_buffer_size+@@read_buffer_size+@@join_buffer_size+@@read_rnd_buffer_size+@@thread_stack)/1024/1024/1024*250);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ((@@key_buffer_size+@@innodb_buffer_pool_size+@@innodb_log_buffer_size+@@innodb_additional_mem_pool_size+@@net_buffer_length+@@query_cache_size)/1024/1024/1024)+((@@sort_buffer_size+@@myisam_sort_buffer_size+@@read_buffer_size+@@join_buffer_size+@@read_rnd |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 12.445816040039 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

So in our case, this shouldn’t be more than ~12.5G.
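The structure of this worst-case estimate (global buffers allocated once, plus per-session buffers multiplied by max_connections) can be sketched in Python. The variable values below are illustrative defaults, not the client's actual settings, so the resulting figure differs from the ~12.5G computed above.

```python
# Worst-case MySQL memory estimate: global buffers are allocated once,
# per-session buffers may be allocated once per connection.
# All values below are illustrative, not this server's real configuration.
GIB = 1024 ** 3

# Global buffers, allocated once per server (hypothetical values).
global_buffers = {
    "key_buffer_size":                 8 * 1024 ** 2,
    "innodb_buffer_pool_size":         8 * GIB,
    "innodb_log_buffer_size":          8 * 1024 ** 2,
    "innodb_additional_mem_pool_size": 8 * 1024 ** 2,
    "net_buffer_length":               16384,
    "query_cache_size":                64 * 1024 ** 2,
}

# Per-session buffers, potentially allocated for every connection
# (hypothetical values).
per_session_buffers = {
    "sort_buffer_size":        2 * 1024 ** 2,
    "myisam_sort_buffer_size": 8 * 1024 ** 2,
    "read_buffer_size":        128 * 1024,
    "join_buffer_size":        256 * 1024,
    "read_rnd_buffer_size":    256 * 1024,
    "thread_stack":            256 * 1024,
}

max_connections = 250

worst_case_bytes = (sum(global_buffers.values())
                    + sum(per_session_buffers.values()) * max_connections)
print(f"worst case: {worst_case_bytes / GIB:.2f} GiB")
```

Note that this is a rough upper bound: per-session buffers are only allocated when a query actually needs them, so real usage is normally well below this figure.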

After MySQL Server was restarted, it allocated about 7G. After running for a week, it had reached 44G:

# Memory #####################################################
       Total | 55.0G
        Free | 8.2G
        Used | physical = 46.8G, swap allocated = 0.0, swap used = 0.0, virtual = 46.8G
      Shared | 560.0k
     Buffers | 206.5M
      Caches | 1.0G
       Dirty | 7392 kB
     UsedRSS | 45.3G
  Swappiness | 60
 DirtyPolicy | 20, 10
 DirtyStatus | 0, 0

# Top Processes ##############################################
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2074 mysql     20   0 53.091g 0.044t   8952 S 126.0 81.7  34919:49 mysqld

I checked everything that could be related to the high memory usage, such as operating system settings like Transparent Huge Pages (which was disabled on the server), but I still didn't find the cause. I asked my teammates if they had any ideas.

Part 2: Team to the Rescue

After brainstorming and reviewing the status, metrics and profiles again and again, my colleague Yves Trudeau pointed out that User Statistics was enabled on the server.

User Statistics adds several INFORMATION_SCHEMA tables, several commands, and the userstat variable. The tables and commands can be used to better understand server activity and to identify the different load sources. Check out the documentation for more information.

mysql> show global variables like 'user%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| userstat      | ON    |
+---------------+-------+
mysql> show global variables like 'thread_statistics%';
+-------------------------------+---------------------------+
| Variable_name                 | Value                     |
+-------------------------------+---------------------------+
| thread_statistics             | ON                        |
+-------------------------------+---------------------------+

Since we saw many threads running, it was worth verifying this as the cause of the issue.

Part 3: Cause Verification – Did It Really Eat Our Memory?

I did some calculations and ran the following test cases to verify the cause:

  1. Looking at the THREAD_STATISTICS table in the INFORMATION_SCHEMA, we can see that for each connection there is a row like the following:
    mysql> select * from THREAD_STATISTICS limit 1\G
    *************************** 1. row ***************************
    THREAD_ID: 3566
    TOTAL_CONNECTIONS: 1
    CONCURRENT_CONNECTIONS: 0
    CONNECTED_TIME: 30
    BUSY_TIME: 0
    CPU_TIME: 0
    BYTES_RECEIVED: 495
    BYTES_SENT: 0
    BINLOG_BYTES_WRITTEN: 0
    ROWS_FETCHED: 27
    ROWS_UPDATED: 0
    TABLE_ROWS_READ: 0
    SELECT_COMMANDS: 11
    UPDATE_COMMANDS: 0
    OTHER_COMMANDS: 0
    COMMIT_TRANSACTIONS: 0
    ROLLBACK_TRANSACTIONS: 0
    DENIED_CONNECTIONS: 0
    LOST_CONNECTIONS: 0
    ACCESS_DENIED: 0
    EMPTY_QUERIES: 0
    TOTAL_SSL_CONNECTIONS: 1
    1 row in set (0.00 sec)
  2. We have 22 columns, each of them a BIGINT (8 bytes), which gives us ~176 bytes per row.
  3. Let’s calculate how many rows we have in this table at this time, and check once again in an hour:
    mysql> select count(*) from information_schema.thread_statistics;
    +----------+
    | count(*) |
    +----------+
    | 7864343  |
    +----------+
    1 row in set (15.35 sec)

    In an hour:

    mysql> select count(*) from information_schema.thread_statistics;
    +----------+
    | count(*) |
    +----------+
    | 12190801 |
    +----------+
    1 row in set (24.46 sec)
  4. Now let’s check how much memory is currently in use:

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     2096 mysql     20   0 12.164g 0.010t  16036 S 173.4 18.9   2274:51 mysqld
  5. We have 12,190,801 rows in the THREAD_STATISTICS table which, at ~176 bytes per row, is ~2G of memory.
  6. Issuing the following statement cleans up the statistics:

    mysql> flush thread_statistics;
    mysql> select count(*) from information_schema.thread_statistics;
    +----------+
    | count(*) |
    +----------+
    | 0        |
    +----------+
    1 row in set (0.00 sec)
  7. Now, let’s check again how much memory is in use:

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     2096 mysql     20   0 12.164g 7.855g  16036 S  99.7 14.3   2286:24 mysqld

As we can see, memory usage dropped by roughly the ~2G of statistics data that we had calculated earlier!
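The sizing arithmetic from the steps above can be sketched in Python. Note that this only counts the raw column data (22 BIGINT columns of 8 bytes each); any per-row in-memory overhead is ignored, so these figures are a lower bound.

```python
# Rough sizing of the in-memory thread_statistics table, based on the
# row counts measured above: 22 BIGINT columns x 8 bytes = ~176 bytes/row.
GIB = 1024 ** 3
BYTES_PER_ROW = 22 * 8  # 176

def table_gib(rows: int) -> float:
    """Approximate size of thread_statistics data in GiB."""
    return rows * BYTES_PER_ROW / GIB

# Size at the second measurement (12,190,801 rows): ~2 GiB.
print(f"size:   {table_gib(12_190_801):.2f} GiB")

# Growth between the two measurements taken an hour apart.
growth = table_gib(12_190_801) - table_gib(7_864_343)
print(f"growth: {growth:.2f} GiB/hour")
```

With no built-in cap on the table, growth like this continues until the statistics are flushed, which is exactly the unbounded behavior observed on the client's server.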

That was the root cause of the high memory usage in this case.

Conclusion

User Statistics (specifically thread_statistics) is a great feature that lets us identify load sources and better understand server activity. At the same time, though, it can be dangerous (from a memory usage point of view) to leave enabled as a permanent monitoring solution, because it places no limit on memory usage.

As a reminder, thread_statistics is NOT enabled by default when you enable User Statistics. If you have enabled thread_statistics for monitoring purposes, please don't forget to keep an eye on its memory usage, and flush the statistics periodically.

As a next step, we are considering submitting a feature request to implement some default limits that can prevent Out of Memory issues on busy systems.
