Jan
30
2017
--

MariaDB ColumnStore

Last month, MariaDB officially released MariaDB ColumnStore, their column store engine for MySQL. This post discusses what it is (and isn’t), why it matters and how you can approach testing it.

What is ColumnStore?

ColumnStore is a storage engine that turns traditional MySQL storage concepts on their head. Instead of storing the data by row, a column store stores the data by column (obviously). This provides advantages for certain types of data, and certain types of queries run against that data. See my previous post for more details on column-based storage systems.

ColumnStore is a fork of InfiniDB and carries forward many of the concepts behind that product. InfiniDB ceased operations in 2014. With the front end managed through MariaDB, you get access to all of the expected security and audit options of MariaDB. MariaDB designed ColumnStore as a massively parallel database that works best in an environment with multiple servers. This is somewhat different from a traditional row store database.

ColumnStore stores columnar data in a concept called an “extent.” An extent contains a range of values for a single column. Each extent contains no more than 8 million values. It stores additional values in a new extent. The extents for a single column get distributed across the database nodes, known as “Performance Modules” in ColumnStore. It stores each unique extent on more than one node, thus providing data redundancy and removing the need for replication. If a node is down, and it contains an extent needed for a query, that same extent is found on another node in the environment. This data redundancy also provides a high availability environment.

The query engine determines which extents must be read to satisfy a query. Since the data in an extent is often preordered (time series data, for example), many queries can skip entire extents that cannot contain any data needed for the query. If we are only looking for data from February 2017, for example, extents containing only data outside of that range get ignored. However, if a query requires data from many or all extents of a single column, the query takes much longer to complete.
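
For example, a range-restricted aggregate like the sketch below (the events table and its columns are hypothetical) lets the engine skip every extent whose stored minimum and maximum values for event_date fall outside the requested month:

-- Only extents that could contain February 2017 rows get scanned;
-- all other extents for event_date are eliminated up front.
SELECT customer_id, SUM(amount)
FROM events
WHERE event_date >= '2017-02-01'
  AND event_date <  '2017-03-01'
GROUP BY customer_id;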

Unlike some traditional column store vendors that take an all-or-nothing approach to storage, MariaDB decided to go with a mixed concept. In a MariaDB database, you can mix traditional InnoDB storage with the new ColumnStore storage, just like you used to mix InnoDB and MyISAM. This presents some nice options, not the least of which is that it provides a way to “dip your toe” into the world of column stores. On the other hand, it could lead to people using the ColumnStore engine in non-optimal ways. Also, the differences in what is considered optimal architecture between these two storage options make it hard to see how this plays out in the real world.
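
In practice, mixing the two engines is just a matter of the ENGINE clause in the table definition. A minimal sketch (table and column names are illustrative; note that ColumnStore tables do not use indexes, so the analytical table omits the primary key):

-- Transactional table on the row store
CREATE TABLE orders (
  id INT NOT NULL PRIMARY KEY,
  customer_id INT NOT NULL,
  amount DECIMAL(10,2) NOT NULL,
  created_at DATETIME NOT NULL
) ENGINE=InnoDB;

-- Analytical history table on the column store
CREATE TABLE orders_history (
  id INT NOT NULL,
  customer_id INT NOT NULL,
  amount DECIMAL(10,2) NOT NULL,
  created_at DATETIME NOT NULL
) ENGINE=ColumnStore;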

Data Definition

As discussed in the earlier post, column storage works great for specific types of data and queries. It is important that your data definitions are as tight as possible, and that your queries are appropriate for column-based data.

Many people set their field definition to VARCHAR(256) when setting up a new database, since they might not know what type of data will get stored in the new field. This broad definition allows you to store whatever you throw at the database. The negative effect for a row store is that it can cause over-allocation of storage – but it only has a minimal effect on queries.

In a column store, the field definition can drive decisions about the compression methods for storing the data, along with sorting implications. Columnar data can use storage more efficiently than a row store, since the data for a single column is well-defined. This allows the engine to select the best compression algorithm for the data. If the data is poorly defined, the selected algorithm might not be optimal.
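
As a quick illustration (hypothetical tables), compare a loosely defined column with a tight one; the tight definition tells the engine exactly what it is storing, so it can pick a better compression scheme:

-- Loose: the engine can assume very little about these values
CREATE TABLE events_loose (
  status VARCHAR(256)
) ENGINE=ColumnStore;

-- Tight: a small integer code compresses far better and sorts correctly
CREATE TABLE events_tight (
  status TINYINT UNSIGNED
) ENGINE=ColumnStore;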

Sorting is also a problem in a column store when the data types are not well-defined. We’ve all seen integer or date data that is sorted alphabetically. While it can be annoying, we can still adjust to that sorting method to find what we need. Since a column store is often used to perform analytical queries over a range of data, this poorly-sorted data can present a bigger problem. If you specify a column to be VARCHAR and only include date information, that data is sorted alphabetically. The same column defined as DATE causes the data to be sorted by date. This chart shows the difference (date format is m/d/yy, with no zero padding):

Alphabetic Sort   Date Sort
1/1/17            1/1/17
1/11/17           1/2/17
1/2/17            1/4/17
1/21/17           1/11/17
1/4/17            1/21/17
11/1/17           2/1/17
11/2/17           11/1/17
2/1/17            11/2/17

Imagine running a query over a range of dates (requesting all activity in the months of January and February 2017, for example). In the alphabetic sort, this requires working through the whole file, since the data for November shows up between the data for January and February. In the date sort, the query only reads until the end of February; we know there can be no more matching data after that. The alphabetic sort leads to more I/O, more query time and less happiness on the part of the user.
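
You can see the same effect in miniature with plain SQL (a throwaway example):

CREATE TABLE sort_demo (d_varchar VARCHAR(10), d_date DATE);

INSERT INTO sort_demo VALUES
  ('1/4/17',  '2017-01-04'),
  ('11/1/17', '2017-11-01'),
  ('2/1/17',  '2017-02-01');

-- Alphabetic order: 1/4/17, 11/1/17, 2/1/17
SELECT d_varchar FROM sort_demo ORDER BY d_varchar;

-- Chronological order: 2017-01-04, 2017-02-01, 2017-11-01
SELECT d_date FROM sort_demo ORDER BY d_date;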

Why Should You Care About ColumnStore?

The first reason is that it allows you to try out column storage without doing a massive shift in technology and with minimal effort. By setting up some tables in a MariaDB database to use the ColumnStore engine, you can benefit from the storage efficiencies and faster query capabilities, provided that the data you’ve selected for this purpose is sound. This means that the data definitions should be tight (always a good plan anyway), and the queries should be more analytical than transactional. For a purely transactional workflow, a row store is the logical choice. For a purely analytical workflow, a column store is the logical choice. ColumnStore allows you to easily mix the two storage options so that you can have the best match possible. It is still important to know what type of workflow you’re dealing with, and match the storage engine to that need.

Another solid reason is that it is a great fit if you are already doing analysis over massive amounts of data. Any column store shines when asked to look at relatively few columns of data (ideally the column or two that are being aggregated and other columns to locate and group the data). If you are already running these types of queries in MySQL, ColumnStore would likely be a good fit.

But There Be Dragons!

As with any new technology, ColumnStore might not be a good fit for everyone. Given that you can mix and match your storage engines, with ColumnStore for some tables and InnoDB for others, it can be tempting to just go ahead with a ColumnStore test doing things the same way you always did in the past. While this still yields results, those results might not be a true test of the technology. It’s like trying to drive your minivan the same way you used to drive your sports car. “Hey, my Alfa Romeo never flipped over taking these turns at high speed!”

To effectively use ColumnStore, it’s important to match it to a proper analytical workload. This means that you will likely do more bulk loading into these tables, since there is additional overhead in writing the data out into the column files. The overall workflow should be more read-intensive. The queries should only look for data from a small set of fields, but can span massive amounts of data within a single column. In my earlier post, there’s also a discussion about normalization of data and how denormalizing data is more common in columnar databases.
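
For example, a daily batch append might look like the following sketch (the file path, schema and table are hypothetical; MariaDB also ships a dedicated bulk loader, cpimport, for ColumnStore):

-- Append a full day of data in one bulk operation instead of row-by-row INSERTs
LOAD DATA INFILE '/data/page_views_2017-01-30.csv'
INTO TABLE analytics.page_views
FIELDS TERMINATED BY ',';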

You should address these issues in your testing for a valid conclusion.

The minimum specifications for ColumnStore also point to a need for a more advanced infrastructure than is often seen for transactional data. This is to support batch loading, read-intensive workloads and possibly different ETL processes for each type of data. In fact, MariaDB states in the installation documentation that ColumnStore must be set up as a fresh installation of MariaDB: you must remove any existing installations of MariaDB or MySQL before installing the ColumnStore-enabled RPM on a system.

Is It Right for Me?

ColumnStore might fit well into your organization. But like haggis, it’s not for everyone. If you need analytical queries, it is a great option. If your workload is more read-intensive, it could still work for you. As we move to a more Internet of Things (IoT) world, we’re likely to see a need for more of this type of query work. In order to accurately present each user with the best possible Internet experience, we might want to analyze their activities over spans of time and come up with the best match for future needs.

Seriously consider if making the move to ColumnStore is right for you. It is newer software (version 1.0.6, the first GA version, was released on December 14, 2016, and 1.0.7 was released on January 23, 2017), so it might go through changes as it matures. Though a new product, it is based on InfiniDB concepts (which are somewhat dated). MariaDB has lots of plans for additional integrations and support for ancillary products that are absent in the current release.

MariaDB took a huge step forward with ColumnStore. But do yourself a favor and consider whether it is right for you before testing it out. Also, make sure that you are not trying to force your current workflow into the column store box. Kids know that we cannot put a square peg in a round hole, but we adults often try to do just that. Finding the right peg saves you lots of time, hassle and annoyance.

Jan
27
2017
--

Percona Software News and Roadmap Update with CEO Peter Zaitsev: Q1 2017

This blog post is a summary of the webinar Percona Software News and Roadmap Update – Q1 2017 given by Peter Zaitsev on January 12, 2017.

Over the last few months, I’ve had the opportunity to meet and talk with many of Percona’s customers. I love these meetings, and I always get a bunch of questions about what we’re doing, what our plans are and what releases are coming.

I’m pleased to say there is a great deal going on at Percona, and I thought giving a quick talk about our current software and services, along with our plans, would provide a simple reference for many of these questions.

A full recording of this webinar, along with the presentation slide deck, can be found here.

Percona Solutions and Services

Let me start by laying out Percona’s company purpose:

To champion unbiased open source database solutions.

What does this mean? It means that we write software to offer you better solutions, and we use the best of what software and technology exist in the open source community.

Percona stands by a set of principles that we feel define us as a company, and are a promise to our customers:

  • 100% free and open source software
  • Focused on finding solutions to maximize your success
  • Open source database strategy consulting and implementation
  • No vendor lock-in required

We offer trusted and unbiased expert solutions, support and resources in a broad software ecosystem, including MySQL, MariaDB, MongoDB and other open source databases.

We also have specialization options for PaaS, IaaS, and SaaS solutions like Amazon Web Services, OpenStack, Google Cloud Platform, OpenShift, Ceph, Docker and Kubernetes.

Percona’s immediate business focus includes building long-term partnership relationships through support and managed services.

The next few sections detail our current service offerings, with some outlook on our plans.

98% Customer Satisfaction Rating

Over the last six months, Percona has consistently maintained a 98% Customer Satisfaction Rating!

Customer Success Team

Our expanded Customer Success Team is here to ensure you’re getting the most out of your Percona Services Subscription.

Managed Services Best Practices

  • Unification based on best practices
  • Organization changes to offer more personal service
  • Increased automation

Ongoing Services

Consulting and Training. Our consulting and training services are available to assist you with whatever project or staff needs you have.

  • Onsite and remote
  • 4 hours to full time (weeks or months)
  • Project and staff augmentation

Advanced HA Included with Enterprise and Premier Support. Starting this past spring, we included advanced high availability (HA) support as part of our Enterprise and Premier support tiers. This advanced support includes coverage for:

  • Percona XtraDB Cluster
  • MariaDB Galera Cluster
  • Galera Cluster for MySQL
  • Upcoming MySQL group replication
  • Upcoming MySQL InnoDB Cluster

Enterprise Wide Support Agreements. Our new Enterprise Wide Support option allows you to buy per-environment support coverage that covers all of the servers in your environment, rather than on a per-server basis. This method of support can save you money, because it:

  • Covers both MySQL and MongoDB
  • Means you don’t have to count servers
  • Provides highly customized coverage

Simplified Support Pricing. Get easy-to-understand support pricing quickly.

To discuss how Percona Support can help your business, please call us at +1-888-316-9775 (USA),
+44 203 608 6727 (Europe), or have us contact you.

New Percona Online Store – Easy to Buy, Pay Monthly

Percona Software

Below are the latest and upcoming features in Percona’s software. All of Percona’s software adheres to the following principles:

  • 100% free and open source
  • No restricted “Enterprise” version
  • No “open core”
  • No BS-licensing (BSL)

Percona Server for MySQL 5.7

Overview

  • 100% Compatible with MySQL 5.7 Community Edition
  • 100% Free and Open Source
  • Includes Alternatives to Many MySQL Enterprise Features
  • Includes TokuDB Storage Engine
  • Focus on Performance and Operational Visibility

Latest Improvements

Features about to be Released 

  • Integration of TokuDB and Performance Schema
  • MyRocks integration in Percona Server
  • MySQL Group Replication
  • Starting to look towards MySQL 8

Percona XtraBackup 2.4

Overview

  • #1 open source binary hot backup solution for MySQL
  • Alternative to MySQL Enterprise backup
  • Parallel backups, incremental backups, streaming, encryption
  • Supports MySQL, MariaDB, Percona Server

New Features

  • Support SHA256 passwords and secure connection to server
  • Improved Security (CVE-2016-6225)
  • Wrong passphrase detection

Percona Toolkit

Overview

  • “Swiss Army Knife” of tools
  • Helps DBAs be more efficient
  • Helps DBAs make fewer mistakes
  • Supports MySQL, MariaDB, Percona Server, Amazon RDS MySQL

New Features

  • Improved fingerprinting in pt-query-digest
  • Pause file for pt-online-schema-change
  • Provide information about transparent huge pages

Coming Soon

  • Working towards Percona Toolkit 3.0 release
  • Comprehensive support for MongoDB
  • New tools are now implemented in Go

Percona Server for MongoDB 3.2

Overview

  • 100% compatible with MongoDB 3.2 Community Edition
  • 100% open source
  • Alternatives for many MongoDB Enterprise features
  • MongoRocks (RocksDB) storage engine
  • Percona Memory Engine

New

  • Percona Server for MongoDB 3.2 – GA
  • Support for MongoRocks storage engine
  • PerconaFT storage engine deprecated
  • Implemented Percona Memory Engine

Coming Soon

  • Percona Server for MongoDB 3.4
  • Fully compatible with MongoDB 3.4 Community Edition
  • Updated RocksDB storage engine
  • Universal hot backup for WiredTiger and MongoRocks
  • Profiling rate limiting (query sampling)

Percona Memory Engine for MongoDB

Benchmarks

Percona Memory Engine for MongoDB® is a 100 percent open source in-memory storage engine for Percona Server for MongoDB.

Based on the in-memory storage engine used in MongoDB Enterprise Edition, WiredTiger, Percona Memory Engine for MongoDB delivers extremely high performance and reduced costs for a variety of use cases, including application cache, sophisticated data manipulation, session management and more.

Below are some benchmarks that we ran to demonstrate Percona Memory Engine’s performance.

WiredTiger vs MongoRocks – write intensive

Percona XtraDB Cluster 5.7

Overview

  • Based on Percona Server 5.7
  • Easiest way to bring HA to your MySQL environment
  • Designed to work well in the cloud
  • Multi-master replication with no conflicts
  • Automatic node provisioning for auto-scaling and self-healing

Goals

  • Brought PXC development in-house to serve our customers better
  • Provide a complete clustering solution, not a set of LEGO pieces
  • Improve usability and ease of use
  • Focus on quality

Highlights

  • Integrated cluster-aware load balancer with ProxySQL
  • Instrumentation with Performance Schema
  • Support for data at rest encryption (InnoDB tablespace encryption)
  • Your data is safe by default with “strict mode” – prevents using features that do not work correctly
  • Integration with Percona Monitoring and Management (PMM)

New in Percona XtraDB Cluster 5.7

  • One option to secure all network communication: pxc-encrypt-cluster-traffic
  • Zero downtime maintenance with ProxySQL and Maintenance Mode

Percona Monitoring and Management

Overview

  • Comprehensive database-focused monitoring
  • 100% open source, roll-your-own solution
  • Easy to install and use
  • Supports MySQL and MongoDB
  • Version 1.0 focuses on trending and query analytics
  • Management features to come

Examples of PMM Screens

The webinar includes PMM screenshots that walk through:

  • What queries are causing the load?
  • Why are they causing this load?
  • How to fix them
  • System information
  • What happens at the OS and hardware level, as well as at the database level

New in Percona Monitoring and Management

  • Continuing to improve and expand dashboards with every release
  • Includes Grafana 4.0 (with basic Alerting)
  • SSL support for server-agent communications
  • Supports authentication for server-agent communication
  • Added support for Amazon RDS
  • Reduced space consumption by using advanced compression

Coming Soon 

  • PMM server available as AMI and Virtual Appliance image
  • Better MongoDB dashboards
  • Continuing work on dashboard improvements
  • Query analytics application refinements
  • Short-term continuing focus on monitoring functionality

Check out the Demo

Percona Live Open Source Database Conference 2017 is right around the corner!

The Percona Live Open Source Database Conference is the premier event for the diverse and active open source database community, as well as businesses that develop and use open source database software. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, PostgreSQL and other open source databases. Tackling subjects such as analytics, architecture and design, security, operations, scalability and performance, Percona Live provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience – all to help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Register now and get the early bird rate, but hurry: prices go up January 31st.

Sponsorship opportunities are available as well. Become a Percona Live Sponsor; download the prospectus here.

Jan
27
2017
--

When MySQL Lies: Wrong seconds_behind_master with slave_parallel_workers > 0

In today’s blog, I will show an issue with seconds_behind_master that one of our clients faced when running slave_parallel_workers > 0. We found out that the seconds_behind_master value reported by SHOW SLAVE STATUS was lying. To be more specific, I’m talking about bugs #84415 and #1654091.

The Issue

MySQL will not report the correct slave lag if you have slave_parallel_workers > 0. Let’s show it in practice.

I’ll use MySQL Sandbox to spin up one master and two slaves on MySQL version 5.7.17, and sysbench to populate the database:

# Create sandboxes
make_replication_sandbox /path/to/mysql/5.7.17
# Create table with 1.5M rows on it
sysbench --test=/usr/share/sysbench/tests/db/oltp.lua --mysql-host=localhost --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox20192.sock --mysql-db=test --oltp-table-size=1500000 prepare
# Add slave_parallel_workers=5 and slave_pending_jobs_size_max=1G on node1
echo "slave_parallel_workers=5" >> node1/my.sandbox.cnf
echo "slave_pending_jobs_size_max=1G" >> node1/my.sandbox.cnf
node1/restart

Monitor Replication lag via SHOW SLAVE STATUS:

for i in {1..1000};
do
    (
        node1/use -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master" | awk '{print "Node1: " $2}' &
        sleep 0.1 ;
        node2/use -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master" | awk '{print "Node2: " $2}' &
    );
    sleep 1;
done

On a separate terminal, DELETE some rows in the test.sbtest1 table on the master, and monitor the above output once the master completes the delete command:

DELETE FROM test.sbtest1 WHERE id > 100;

Here is a sample output:

master [localhost] {msandbox} (test) > DELETE FROM test.sbtest1 WHERE id > 100;
Query OK, 1499900 rows affected (46.42 sec)
. . .
Node1: 0
Node2: 0
Node1: 0
Node2: 48
Node1: 0
Node2: 48
Node1: 0
Node2: 49
Node1: 0
Node2: 50
. . .
Node1: 0
Node2: 90
Node1: 0
Node2: 91
Node1: 0
Node2: 0
Node1: 0
Node2: 0

As you can see, node1 (which is running with slave_parallel_workers = 5) doesn’t report any lag.

The Workaround

We can workaround this issue by querying performance_schema.threads:

SELECT PROCESSLIST_TIME FROM performance_schema.threads WHERE NAME = 'thread/sql/slave_worker' AND (PROCESSLIST_STATE IS NULL  or PROCESSLIST_STATE != 'Waiting for an event from Coordinator') ORDER BY PROCESSLIST_TIME DESC LIMIT 1;

Let’s modify our monitoring script, and use the above query to monitor the lag on node1:

for i in {1..1000};
do
    (
        node1/use -BNe "SELECT PROCESSLIST_TIME FROM performance_schema.threads WHERE NAME = 'thread/sql/slave_worker' AND (PROCESSLIST_STATE IS NULL  or PROCESSLIST_STATE != 'Waiting for an event from Coordinator') ORDER BY PROCESSLIST_TIME DESC LIMIT 1 INTO @delay; SELECT IFNULL(@delay, 0) AS 'lag';" | awk '{print "Node1: " $1}' &
        sleep 0.1 ;
        node2/use -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master" | awk '{print "Node2: " $2}' &
    );
    sleep 1;
done

master [localhost] {msandbox} (test) > DELETE FROM test.sbtest1 WHERE id > 100;
Query OK, 1499900 rows affected (45.21 sec)
Node1: 0
Node2: 0
Node1: 0
Node2: 0
Node1: 45
Node2: 45
Node1: 46
Node2: 46
. . .
Node1: 77
Node2: 77
Node1: 78
Node2: 79
Node1: 0
Node2: 80
Node1: 0
Node2: 81
Node1: 0
Node2: 0
Node1: 0
Node2: 0

Please note that in our query to performance_schema.threads, we are filtering on PROCESSLIST_STATE IS NULL or PROCESSLIST_STATE != 'Waiting for an event from Coordinator'. The correct state would be “Executing Event”, but it seems MySQL doesn’t correctly report that state (bug #84655).

Summary

MySQL parallel replication is a nice feature, but we still need to make sure we are aware of any potential issues it might bring. Most monitoring systems use the output of SHOW SLAVE STATUS to verify whether or not the slave is lagging behind the master. As shown above, it has its caveats.

As always, we should test, test and test again before implementing any change like this in production!

Jan
27
2017
--

Nokia introduces MIKA, a digital assistant for engineers and telecom operators

Nokia’s got a nice, adorable name picked out for its new voice assistant. MIKA stands for “Multi-purpose Intuitive Knowledge Assistant,” letting engineers and telecom operators access information through voice commands. The system is powered by the company’s cognitive services platform, using “augmented intelligence with automated learning to provide access to… Read More

Jan
26
2017
--

Percona Live Featured Tutorial with Frédéric Descamps — MySQL InnoDB Cluster & Group Replication in a Nutshell: Hands-On Tutorial

Welcome to another post in the series of Percona Live featured tutorial speaker blogs! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we’ll meet Frédéric Descamps, MySQL Community Manager at Oracle. Frédéric is probably better known in the community as “LeFred” (Twitter: @lefred)! His tutorial is MySQL InnoDB Cluster and Group Replication in a Nutshell: Hands-On Tutorial (along with Part 2). Frédéric is delivering this talk with Percona’s MySQL Practice Manager Kenny Gryp. Attendees will get their hands on virtual machines and migrate a standard master/slave architecture to the new MySQL InnoDB Cluster (native Group Replication) with minimal downtime. I had a chance to speak with Frédéric and learn a bit more about InnoDB Cluster and Group Replication:

Percona: How did you get into database technology? What do you love about it?

Frédéric: I started with SQL on VMS and DBASE during my IT courses. I wrote my thesis on SQL Server replication. To be honest, the implementation at the time wasn’t particularly good. At the same time, I enjoyed hacking a Perl module that allowed you to run SQL queries against plain text files. Then I worked as a Progress developer for MFG/Pro (a big customized ERP). When I decided to work exclusively for an open source company, I was managing more and more customers’ databases in MySQL (3.23 and 4.x), so my colleagues and I took MySQL training. It was still MySQL AB at that time. I passed my MySQL 4.1 Core and Professional Certification, and I ended up delivering MySQL training for MySQL AB too. Some might also remember that I worked for a company called Percona for a while, before moving on to Oracle!

Percona: Your and Kenny’s tutorial is called “MySQL InnoDB Cluster & Group Replication in a nutshell: hands-on tutorial.” Why would somebody want to migrate to this technology?

Frédéric: The main reason to migrate to MySQL InnoDB Cluster is to achieve high availability (HA) easily for your MySQL database, while still maintaining performance.

Thanks to MySQL Shell’s AdminAPI, you can create a cluster in less than five minutes. All the management can be done remotely, with just some shell commands. Additionally, for Group Replication our engineers have leveraged existing standard MySQL infrastructure (GTIDs, Multi-Source Replication, the Multi-threaded Slave Applier, Binary Logs, etc.), which is already well known to the majority of MySQL DBAs.

Percona: How can moving to a high availability environment help businesses?

Frédéric: Time is money! Many businesses can’t afford downtime anymore. Having a reliable HA solution is crucial for businesses on the Internet. Expectations have changed: users want this to be natively supported inside the database technology versus externally. For example, when I started to work in IT, it was very common to have several maintenance windows during the night. But currently, and almost universally, the customer base is spread worldwide. Any time is during business hours somewhere in the world!

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Frédéric: I think that current HA solutions are too complex to set up and manage. They use external tools. In our tutorial, Kenny and I will demonstrate how MySQL InnoDB Cluster, being an internal implementation, is extremely easy to use. We will also cover some scenarios where things go wrong, and how to deal with them. Performance and ease of use were two key considerations in the design of InnoDB Cluster.

Percona: What are you most looking forward to at Percona Live?

Frédéric: Like every time I’ve attended, my main goal is to bring to the audience all the new and amazing stuff we are implementing in MySQL. MySQL has evolved quickly these last few years, and we don’t really plan to slow down. I also really enjoy the feedback from users and other MySQL professionals. It helps us focus on what really matters for our users. And finally, it’s a great opportunity to re-connect with ex-colleagues and friends.

You can find out more about Frédéric Descamps and his work with InnoDB Cluster at his Twitter handle @lefred. Register for the Percona Live Data Performance Conference 2017, and see Frédéric and Kenny present their MySQL InnoDB Cluster and Group Replication in a Nutshell: Hands-On Tutorial talk. Use the code FeaturedTalk and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Jan
26
2017
--

H2O’s Deep Water puts deep learning in the hands of enterprise users

To complement existing offerings like Sparkling Water and Steam, H2O.ai is releasing Deep Water, a new tool to help businesses make deep learning a part of everyday operations. Deep Water will open up new possibilities for the TensorFlow, MXNet and Caffe communities to engage with H2O.ai. This also means that the GPU is set to become a greater part of business operations for the entire… Read More

Jan
26
2017
--

Performance Schema Benchmarks: OLTP RW

In this blog post, we’ll look at Performance Schema benchmarks for OLTP Read/Write workloads.

I am in love with Performance Schema and talk a lot about it. Performance Schema is a revolutionary MySQL troubleshooting instrument, but earlier versions had performance issues. Many of these issues are fixed now, and the default options work quickly and reliably. However, there is no free lunch! It is expected that the more instruments you turn ON, the more overhead you’ll have.

The advice I give my customers is that when in doubt, only turn ON the instruments that are required to troubleshoot your issue. Many of them ask: what exactly are the overhead costs for one instrumentation or another? I believe the best answer is “test on your system!” No generic benchmark can exactly repeat a workload on your site. But while I was working on the “OpenSource Databases on Big Machines” project, I decided to test the performance of Performance Schema as well.

I only tested a Read/Write workload. I used the same fast machine (144 CPU cores), the same MySQL options and the same SysBench commands that I described in this post. The option innodb_flush_method was changed to O_DIRECT, because it’s more reasonable for real-life workloads. I also upgraded the MySQL Server version to Oracle’s MySQL 5.7.17. The reason for the upgrade was to test whether the issue described in this post is repeatable with the latest Oracle MySQL server version. But since I tested Performance Schema, the effect on Percona Server for MySQL should be the same.

I tested nine different scenarios:

  1. “All disabled”: Performance Schema is ON, but all instruments and consumers are disabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
  2. “All enabled”: Performance Schema is ON, and all instruments and consumers are enabled.
    update setup_instruments set enabled='yes';
    update setup_consumers set enabled='yes';
  3. “Default”: Performance Schema is ON, and only default instruments and consumers are enabled.
  4. “MDL only”: only Metadata Lock instrumentation is enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name='wait/lock/metadata/sql/mdl';
  5. “Memory only”: only Memory instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'memory%';
  6. “Stages and Statements”: only Stages and Statements instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'statement%';
    update setup_consumers set enabled='yes' where name like 'events_statements%';
    update setup_instruments set enabled='yes' where name like 'stage%';
    update setup_consumers set enabled='yes' where name like 'events_stages%';
  7. “Stages only”: only Stages instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'stage%';
    update setup_consumers set enabled='yes' where name like 'events_stages%';
  8. “Statements only”: only Statements instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'statement%';
    update setup_consumers set enabled='yes' where name like 'events_statements%';
  9. “Waits only”: only Waits instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'wait%' ;
    update setup_consumers set enabled='yes' where name like 'events_waits%';
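
Note that changes made through the setup_consumers and setup_instruments tables are not persistent: they revert to defaults at the next server restart. To bring the server up directly in one of these configurations (the “Waits only” scenario, for example), the equivalent startup options can be placed in my.cnf. This is only a sketch, assuming MySQL 5.7 option syntax:

# my.cnf sketch: reproduce the "Waits only" scenario at startup
[mysqld]
performance_schema = ON
performance-schema-instrument = '%=OFF'
performance-schema-instrument = 'wait/%=ON'
performance-schema-consumer-events-waits-current = ON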

Here are the overall results.

As you can see, some instrumentation only slightly affects performance, while others affect it a lot. I created separate graphs to make the picture clearer.

As expected, enabling all instrumentation makes performance lower:

Does this mean that to use Performance Schema, you need to start the server with it ON and then disable all instruments? No! The default options have very little effect on performance.
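
Before changing anything, it’s easy to check what your server currently instruments by querying the standard setup tables:

-- What is currently enabled?
SELECT * FROM performance_schema.setup_consumers;

SELECT enabled, COUNT(*)
FROM performance_schema.setup_instruments
GROUP BY enabled;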

The same is true for Metadata Locks, Memory and Statements instrumentation:

Regarding statements, I should note that I used prepared statements (which are instrumented in version 5.7). But it makes sense to repeat the tests without prepared statements.

The Stages instrumentation starts affecting performance:

However, the slowdown is reasonable, and it happens only after we reach 32 concurrent threads. The instrumentation still provides great insight into what is happening during query execution.

The real performance killer is Waits instrumentation:

It affects performance in much the same way as enabling all instruments.

Conclusion

Using Performance Schema with the default options, or with the Memory, Metadata Locks and Statements instrumentation, doesn’t have a great impact on read-write workload performance. You might notice slowdowns with Stages instrumentation after reaching 32 actively running parallel connections. The real performance killer is Waits instrumentation. And even with it on, you will only start to notice a performance drop after passing 10,000 transactions per second.

Jan
26
2017
--

IBM adds support for Google’s Tensorflow to its PowerAI machine learning framework

PowerAI is IBM’s machine learning framework for companies that use servers based on its Power processors and NVIDIA’s NVLink high-speed interconnects that allow for data to pass extremely quickly between the processor and the GPU that does most of the deep learning calculations. Today, the company announced that PowerAI now supports Google’s popular Tensorflow machine… Read More

Jan
26
2017
--

Peer5 leverages viewers’ devices for a P2P approach to streaming video

Anyone who’s tried to watch a popular show like Game of Thrones online has probably had moments where everything pauses and you’re stuck watching a spinning wheel or blinking dots as the streaming service struggles to handle viewer demand. Peer5, one of the startups in the current class at incubator Y Combinator, is working to solve this problem — in fact, it’s… Read More

Jan
25
2017
--

Teachable books $4 million to turn everybody into educators online

Education tech startup Teachable (formerly known as Fedora) has raised $4 million in a Series A round of funding, according to CEO and founder Ankur Nagpal. The company provides a platform that’s like a Shopify or a SquareSpace for tutors or teachers. Its platform allows subject matter experts to quickly construct online courses and sell or give them away to their followers, setting their… Read More
