Jun 23, 2017

Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication High Availability Webinar: Q & A

Thank you for attending the Wednesday, June 21, 2017 high availability webinar titled Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication. In this blog, I will provide answers to the Q & A for that webinar.

You can find the slides and a recording of the webinar here.

Is there a minimum MySQL server version for Group Replication?

MySQL Group Replication has been GA since MySQL Community Server 5.7.17. This is the lowest version you should use for the Group Replication feature; anything earlier is a beta version.

Since 5.7.17 was the GA release, it’s strongly recommended you use the latest 5.7 minor release. Bugs get fixed and features added in each of the minor releases (as can be seen in the Limitations section in the slide deck).

In MySQL 5.6 and earlier versions, Group Replication is not supported. Note that Percona Server for MySQL 5.7.17 and beyond also ships with Group Replication.
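
If you want to double-check what a server is running before relying on the feature, a minimal sketch looks like this (the plugin ships as group_replication.so with MySQL and Percona Server 5.7.17+; the rest of the group configuration is omitted here):

-- Confirm the server is at least 5.7.17
SELECT @@version;

-- Install the Group Replication plugin (further group configuration is required before use)
INSTALL PLUGIN group_replication SONAME 'group_replication.so';

-- Verify the plugin loaded
SELECT PLUGIN_NAME, PLUGIN_STATUS
FROM INFORMATION_SCHEMA.PLUGINS
WHERE PLUGIN_NAME = 'group_replication';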

Can I use Percona XtraDB Cluster with MariaDB v10.2? Or must I use Percona Server for MySQL?

Percona XtraDB Cluster is Percona Server for MySQL and Percona XtraBackup with the modified Galera library. You cannot run Percona XtraDB Cluster on MariaDB.

However, as Percona XtraDB Cluster is open source, it is possible that MariaDB/Codership implements our modifications into their codebase.

If Percona XtraDB Cluster only allows InnoDB tables, how do we typically deal with applications that need to use MyISAM tables?

You cannot use MyISAM with Percona XtraDB Cluster, Galera or Group Replication. There is experimental MyISAM support in Galera/Percona XtraDB Cluster, but we strongly recommend that you don’t use it in production. It effectively executes all statements in Total Order Isolation, which results in bad performance.
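
If you still want to experiment with it outside of production, the behavior is controlled by a Galera setting. A minimal sketch, for testing only (the table name and values are illustrative, and on Percona XtraDB Cluster you would likely also need to relax pxc_strict_mode):

-- Enable the experimental MyISAM replication (OFF by default)
SET GLOBAL wsrep_replicate_myisam = ON;

-- Changes to MyISAM tables are then replicated, but statements run in
-- Total Order Isolation, which serializes work across the cluster
CREATE TABLE myisam_test (id INT PRIMARY KEY, payload VARCHAR(64)) ENGINE=MyISAM;
INSERT INTO myisam_test VALUES (1, 'replicated to all nodes');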

What is a typical business use case for the Group Replication? I specifically like the writes order feature.

Typical use cases are:

  • Environments with strict durability requirements
  • Write to multiple nodes simultaneously while keeping data consistent
  • Reducing failover time
  • Using other nodes for read scaling, where the application cannot easily tolerate stale reads (as opposed to standard asynchronous replication)

The use cases for Galera and Percona XtraDB Cluster are similar.

Where do you run ProxySQL, on a separate server? We are using HAProxy.

You can deploy ProxySQL in many different ways. One common method of installation is to run ProxySQL on a separate layer of servers (ensuring there is failover on this layer). Another commonly used method is to run a ProxySQL daemon on every application server.
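
Whichever layer you choose, ProxySQL itself is configured through its SQL-like admin interface. A minimal sketch of registering backends (assuming a three-node cluster; the host addresses and the default admin port 6032 are illustrative):

-- Connect to the admin interface, e.g.: mysql -u admin -p -h 127.0.0.1 -P 6032
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (0, '10.0.0.1', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (0, '10.0.0.2', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (0, '10.0.0.3', 3306);

-- Push the configuration to the running proxy and persist it
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;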

Do you support KVM?

Yes, there are no limitations on virtualization solutions.

Can you give some examples of an “arbitrator”?

Some useful links:

What does Percona XtraDB add to make it more performant than InnoDB?

The scalability and performance improvements of Percona XtraDB are listed on the Percona Server for MySQL documentation page: https://www.percona.com/doc/percona-server/LATEST/index.html

How scalable is Percona XtraDB Cluster storage wise? Do we have any limitations?

Storage happens through the storage engine (which is InnoDB). Percona XtraDB Cluster does not have any limitations different from Percona Server for MySQL or MySQL.

However, we need to also consider the practical side of things: the larger the cluster gets, the longer certain operations take. For example, when adding a new node to the cluster another node must be the donor and provide all the data. This will take substantially longer with larger datasets. Certain operational aspects might therefore become more complex.

Is there any development to add multiple nodes simultaneously?

No, at the moment only one node can join the cluster at a time. Other nodes automatically wait until it has finished before joining.

Why does Galera say we cannot use READ COMMITTED isolation for multimaster mode, even though we can start the cluster with READ-COMMITTED?

You can use READ-COMMITTED as transaction isolation level. The limitation is that you cannot use SERIALIZABLE: http://galeracluster.com/documentation-webpages/isolationlevels.html.

Galera Cluster and MariaDB currently do not prevent a user from using this transaction isolation level. Percona XtraDB Cluster implemented the strict mode to prevent these operations: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/pxc-strict-mode.html#explicit-table-locking
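
A minimal illustration, assuming a Percona XtraDB Cluster node with pxc_strict_mode at its default of ENFORCING:

-- Supported: READ COMMITTED works as the session isolation level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Not supported for replicated workloads: with pxc_strict_mode=ENFORCING
-- the statement below is rejected with an error
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;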

MariaDB 10.2 fixed the CHECK constraints issue. When will Percona fix this issue?

There are currently no plans to support CHECK constraints in Percona Server for MySQL (and therefore Percona XtraDB Cluster as well).

As Percona Server is effectively a fully backwards-compatible (but modified) MySQL Community Server, CHECK constraints are a feature that would normally be implemented in MySQL Community Server first.

Can you share your performance benchmark git repository (if you have one)?

We don’t have the performance benchmark in a git repository. You can get detailed information about this benchmark in this blog post: Performance improvements in Percona XtraDB Cluster 5.7.17-29.20.

On your slide pointing to scalability charts, how many nodes did you run your test against?

We used a three-node cluster for this performance benchmark.

The product is using master-master replication. As such, what do you mean when you talk about failover in such a configuration?
Where do you maintain the cluster state?

All technologies automatically maintain the cluster state as you add and remove nodes.

What are the network/IP requirements for ProxySQL?

There are no specific requirements. More documentation about ProxySQL can be found here: https://github.com/sysown/proxysql/wiki.

Jun 22, 2017

ClickHouse in a General Analytical Workload (Based on a Star Schema Benchmark)

In this blog post, we’ll look at how ClickHouse performs in a general analytical workload using the star schema benchmark test.

We have mentioned ClickHouse in some recent posts (ClickHouse: New Open Source Columnar Database, Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs. Apache Spark), where it showed excellent results. ClickHouse by itself seems to be an event-oriented RDBMS, as its name suggests (clicks). Its primary use case, Yandex Metrica (a system similar to Google Analytics), also points to an event-based nature. We can also see there is a requirement for date-stamped columns.

It is possible, however, to use ClickHouse in a general analytical workload. This blog post shares my findings. For these tests, I used a Star Schema Benchmark, slightly modified to accommodate ClickHouse specifics.

First, let’s talk about schemas. We need to adapt the schema to ClickHouse data types. For example, the biggest fact table in SSB is “lineorder”. Below is how it is defined for Amazon Redshift (as taken from https://docs.aws.amazon.com/redshift/latest/dg/tutorial-tuning-tables-create-test-data.html):

CREATE TABLE lineorder
(
  lo_orderkey          INTEGER NOT NULL,
  lo_linenumber        INTEGER NOT NULL,
  lo_custkey           INTEGER NOT NULL,
  lo_partkey           INTEGER NOT NULL,
  lo_suppkey           INTEGER NOT NULL,
  lo_orderdate         INTEGER NOT NULL,
  lo_orderpriority     VARCHAR(15) NOT NULL,
  lo_shippriority      VARCHAR(1) NOT NULL,
  lo_quantity          INTEGER NOT NULL,
  lo_extendedprice     INTEGER NOT NULL,
  lo_ordertotalprice   INTEGER NOT NULL,
  lo_discount          INTEGER NOT NULL,
  lo_revenue           INTEGER NOT NULL,
  lo_supplycost        INTEGER NOT NULL,
  lo_tax               INTEGER NOT NULL,
  lo_commitdate        INTEGER NOT NULL,
  lo_shipmode          VARCHAR(10) NOT NULL
);

For ClickHouse, the table definition looks like this:

CREATE TABLE lineorderfull (
        LO_ORDERKEY             UInt32,
        LO_LINENUMBER           UInt8,
        LO_CUSTKEY              UInt32,
        LO_PARTKEY              UInt32,
        LO_SUPPKEY              UInt32,
        LO_ORDERDATE            Date,
        LO_ORDERPRIORITY        String,
        LO_SHIPPRIORITY         UInt8,
        LO_QUANTITY             UInt8,
        LO_EXTENDEDPRICE        UInt32,
        LO_ORDTOTALPRICE        UInt32,
        LO_DISCOUNT             UInt8,
        LO_REVENUE              UInt32,
        LO_SUPPLYCOST           UInt32,
        LO_TAX                  UInt8,
        LO_COMMITDATE           Date,
        LO_SHIPMODE             String
)Engine=MergeTree(LO_ORDERDATE,(LO_ORDERKEY,LO_LINENUMBER),8192);

From this we can see that we need to use datatypes like UInt8 and UInt32, which are somewhat unusual compared to standard database datatypes.

The second table (RedShift definition):

CREATE TABLE customer
(
  c_custkey      INTEGER NOT NULL,
  c_name         VARCHAR(25) NOT NULL,
  c_address      VARCHAR(25) NOT NULL,
  c_city         VARCHAR(10) NOT NULL,
  c_nation       VARCHAR(15) NOT NULL,
  c_region       VARCHAR(12) NOT NULL,
  c_phone        VARCHAR(15) NOT NULL,
  c_mktsegment   VARCHAR(10) NOT NULL
);

For ClickHouse, I defined it as:

CREATE TABLE customerfull (
        C_CUSTKEY       UInt32,
        C_NAME          String,
        C_ADDRESS       String,
        C_CITY          String,
        C_NATION        String,
        C_REGION        String,
        C_PHONE         String,
        C_MKTSEGMENT    String,
        C_FAKEDATE      Date
)Engine=MergeTree(C_FAKEDATE,(C_CUSTKEY),8192);

For reference, the full schema for the benchmark is here: https://github.com/vadimtk/ssb-clickhouse/blob/master/create.sql.

For this table, we need to define an artificial column C_FAKEDATE Date in order to use ClickHouse’s most advanced engine (MergeTree), which requires a date column. I was told by the ClickHouse team that they plan to remove this limitation in the future.

To generate data acceptable by ClickHouse, I made modifications to ssb-dbgen. You can find my version here: https://github.com/vadimtk/ssb-dbgen. The most notable change is that ClickHouse can’t accept dates in CSV files formatted as “19971125”. It has to be “1997-11-25”. This is something to keep in mind when loading data into ClickHouse.

It is possible to do some preformatting during the load, but I don’t have experience with that. A common approach is to create a staging table with datatypes that match the loaded data, and then convert them using SQL functions when inserting into the main table.
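
A rough sketch of that approach (the staging and target table names, and the reduced column list, are purely illustrative):

-- Staging table keeps the raw 'YYYYMMDD' date as a String
CREATE TABLE lineorder_staging (
        LO_ORDERKEY     UInt32,
        LO_ORDERDATE    String,
        LO_REVENUE      UInt32
) ENGINE = Log;

-- Convert '19971125' into a proper Date while copying into the target table
INSERT INTO lineorder_target
SELECT
    LO_ORDERKEY,
    toDate(concat(substring(LO_ORDERDATE, 1, 4), '-',
                  substring(LO_ORDERDATE, 5, 2), '-',
                  substring(LO_ORDERDATE, 7, 2))) AS LO_ORDERDATE,
    LO_REVENUE
FROM lineorder_staging;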

Hardware Setup

One of the goals of this benchmark is to see how ClickHouse scales on multiple nodes. I used a setup of one node, and then compared it to a setup of three nodes. Each node has 24 cores of “Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz” CPUs, and the data is located on very fast PCIe Flash storage.

For the SSB benchmark I use a scale factor of 2500, which provides (in raw data):

  • Table lineorder: 15 billion rows, raw size 1.7TB
  • Table customer: 75 million rows

When loaded into ClickHouse, the table lineorder takes 464GB, which corresponds to a 3.7x compression ratio.

We compare a one-node (table names lineorderfull, customerfull) setup vs. a three-node (table names lineorderd, customerd) setup.

Single Table Operations

Query:

SELECT
    toYear(LO_ORDERDATE) AS yod,
    sum(LO_REVENUE)
FROM lineorderfull
GROUP BY yod

One node:

7 rows in set. Elapsed: 9.741 sec. Processed 15.00 billion rows, 90.00 GB (1.54 billion rows/s., 9.24 GB/s.)

Three nodes:

7 rows in set. Elapsed: 3.258 sec. Processed 15.00 billion rows, 90.00 GB (4.60 billion rows/s., 27.63 GB/s.)

We see a speedup of practically three times. Handling 4.6 billion rows/s is blazingly fast!

One Table with Filtering

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorderfull
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)

One node:

1 rows in set. Elapsed: 3.175 sec. Processed 2.28 billion rows, 18.20 GB (716.60 million rows/s., 5.73 GB/s.)

Three nodes:

1 rows in set. Elapsed: 1.295 sec. Processed 2.28 billion rows, 18.20 GB (1.76 billion rows/s., 14.06 GB/s.)

It’s worth mentioning that during the execution of this query, ClickHouse was able to use ALL 24 cores on each box. This confirms that ClickHouse is a massively parallel processing system.

Two Tables (Independent Subquery)

In this case, I want to show how ClickHouse handles independent subqueries:

SELECT sum(LO_REVENUE)
FROM lineorderfull
WHERE LO_CUSTKEY IN
(
    SELECT C_CUSTKEY AS LO_CUSTKEY
    FROM customerfull
    WHERE C_REGION = 'ASIA'
)

One node:

1 rows in set. Elapsed: 28.934 sec. Processed 15.00 billion rows, 120.00 GB (518.43 million rows/s., 4.15 GB/s.)

Three nodes:

1 rows in set. Elapsed: 14.189 sec. Processed 15.12 billion rows, 121.67 GB (1.07 billion rows/s., 8.57 GB/s.)

We do not see, however, a speedup close to 3x on three nodes, because of the data transfer required to match LO_CUSTKEY with C_CUSTKEY.

Two Tables JOIN

With a subquery that uses columns to return results, or for GROUP BY, things get more complicated. In this case we want to GROUP BY a column from the second table.

First, ClickHouse doesn’t support traditional subquery syntax, so we need to use JOIN. For JOINs, ClickHouse also strictly prescribes how they must be written (a limitation that will also be lifted in the future). Our JOIN should look like:

SELECT
    C_REGION,
    sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
FROM lineorderfull
ANY INNER JOIN
(
    SELECT
        C_REGION,
        C_CUSTKEY AS LO_CUSTKEY
    FROM customerfull
) USING (LO_CUSTKEY)
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
GROUP BY C_REGION

One node:

5 rows in set. Elapsed: 31.443 sec. Processed 2.35 billion rows, 28.79 GB (74.75 million rows/s., 915.65 MB/s.)

Three nodes:

5 rows in set. Elapsed: 25.160 sec. Processed 2.58 billion rows, 33.25 GB (102.36 million rows/s., 1.32 GB/s.)

In this case the speedup is not even two times. This corresponds to the random data distribution used for the tables lineorderd and customerd. Both tables were defined as:

CREATE TABLE lineorderd AS lineorder ENGINE = Distributed(3shards, default, lineorder, rand());
CREATE TABLE customerd AS customer ENGINE = Distributed(3shards, default, customer, rand());

Where rand() defines that records are distributed randomly across the three nodes. When we perform a JOIN by LO_CUSTKEY=C_CUSTKEY, matching records might be located on different nodes. One way to deal with this is to define the data distribution so that matching rows end up on the same node (data locality). For example:

CREATE TABLE lineorderLD AS lineorderL ENGINE = Distributed(3shards, default, lineorderL, LO_CUSTKEY);
CREATE TABLE customerLD AS customerL ENGINE = Distributed(3shards, default, customerL, C_CUSTKEY);

Three Tables JOIN

This is where it becomes very complicated. Let’s consider the query that you would normally write:

SELECT sum(LO_REVENUE),P_MFGR, toYear(LO_ORDERDATE) yod FROM lineorderfull ,customerfull,partfull WHERE C_REGION = 'ASIA' and
LO_CUSTKEY=C_CUSTKEY and P_PARTKEY=LO_PARTKEY GROUP BY P_MFGR,yod ORDER BY P_MFGR,yod;

With ClickHouse’s limitations on JOIN syntax, the query becomes:

SELECT
    sum(LO_REVENUE),
    P_MFGR,
    toYear(LO_ORDERDATE) AS yod
FROM
(
    SELECT
        LO_PARTKEY,
        LO_ORDERDATE,
        LO_REVENUE
    FROM lineorderfull
    ALL INNER JOIN
    (
        SELECT
            C_REGION,
            C_CUSTKEY AS LO_CUSTKEY
        FROM customerfull
    ) USING (LO_CUSTKEY)
    WHERE C_REGION = 'ASIA'
)
ALL INNER JOIN
(
    SELECT
        P_MFGR,
        P_PARTKEY AS LO_PARTKEY
    FROM partfull
) USING (LO_PARTKEY)
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC

By writing queries this way, we force ClickHouse to use the prescribed JOIN order — at this moment there is no optimizer in ClickHouse and it is totally unaware of data distribution.

There is also not much speedup when we compare one node vs. three nodes:

One node execution time:

35 rows in set. Elapsed: 697.806 sec. Processed 15.08 billion rows, 211.53 GB (21.61 million rows/s., 303.14 MB/s.)

Three nodes execution time:

35 rows in set. Elapsed: 622.536 sec. Processed 15.12 billion rows, 211.71 GB (24.29 million rows/s., 340.08 MB/s.)

There is a way to make the query faster for this 3-way JOIN, however. (Thanks to Alexander Zaytsev from https://www.altinity.com/ for help!)

Optimized query:

SELECT
    sum(revenue),
    P_MFGR,
    yod
FROM
(
    SELECT
        LO_PARTKEY AS P_PARTKEY,
        toYear(LO_ORDERDATE) AS yod,
        SUM(LO_REVENUE) AS revenue
    FROM lineorderfull
    WHERE LO_CUSTKEY IN
    (
        SELECT C_CUSTKEY
        FROM customerfull
        WHERE C_REGION = 'ASIA'
    )
    GROUP BY
        P_PARTKEY,
        yod
)
ANY INNER JOIN partfull USING (P_PARTKEY)
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC

Optimized query time:

One node:

35 rows in set. Elapsed: 106.732 sec. Processed 15.00 billion rows, 210.05 GB (140.56 million rows/s., 1.97 GB/s.)

Three nodes:

35 rows in set. Elapsed: 75.854 sec. Processed 15.12 billion rows, 211.71 GB (199.36 million rows/s., 2.79 GB/s.)

That’s an improvement of about 6.5 times compared to the original query. This shows the importance of understanding data distribution, and writing the optimal query to process the data.

Another option for dealing with JOIN complexity, and to improve performance, is to use ClickHouse’s dictionaries. These dictionaries are described here: https://www.altinity.com/blog/2017/4/12/dictionaries-explained.
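
As a hedged sketch of what that looks like on the query side (it assumes a dictionary named 'customer', keyed on C_CUSTKEY, has already been declared in the server’s dictionary configuration), attributes are fetched with dictGet* functions instead of a JOIN:

SELECT
    dictGetString('customer', 'C_REGION', toUInt64(LO_CUSTKEY)) AS C_REGION,
    sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
FROM lineorderfull
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
GROUP BY C_REGION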

I will review dictionary performance in future posts.

Another traditional way to deal with JOIN complexity in an analytics workload is to use denormalization. We can move some columns (for example, P_MFGR from the last query) to the facts table (lineorder).
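
As a sketch, if P_MFGR and C_REGION were loaded into a hypothetical denormalized fact table (call it lineorder_denorm), the three-table query above collapses into a single-table scan. The trade-off is extra storage and a more involved load process.

SELECT
    sum(LO_REVENUE),
    P_MFGR,
    toYear(LO_ORDERDATE) AS yod
FROM lineorder_denorm
WHERE C_REGION = 'ASIA'
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC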

Observations

  • ClickHouse can handle general analytical queries (it requires special schema design and considerations, however)
  • Linear speedup is possible, but it depends on query design and requires advanced planning — proper speedup depends on data locality
  • ClickHouse is blazingly fast (beyond what I’ve seen before) because it can use all available CPU cores for a query, as shown above using 24 cores for a single server and 72 cores for three nodes
  • Multi-table JOINs are cumbersome and require manual work to achieve better performance, so consider using dictionaries or denormalization

Jun 21, 2017

Percona Monitoring and Management 1.1.5 is Now Available


Percona announces the release of Percona Monitoring and Management 1.1.5 on June 21, 2017.

For installation instructions, see the Deployment Guide.


Changes in PMM Server

  • PMM-667: Fixed the Latency graph in the ProxySQL Overview dashboard to plot microsecond values instead of milliseconds.

  • PMM-800: Fixed the InnoDB Page Splits graph in the MySQL InnoDB Metrics Advanced dashboard to show correct page merge success ratio.

  • PMM-1007: Added links to Query Analytics from MySQL Overview and MongoDB Overview dashboards. The links also pass selected host and time period values.

    NOTE: These links currently open QAN2, which is still considered experimental.

Changes in PMM Client

  • PMM-931: Fixed pmm-admin script when adding MongoDB metrics monitoring for secondary in a replica set.

About Percona Monitoring and Management

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

Jun 20, 2017

Webinar Thursday June 22, 2017: Deploying MySQL in Production


Join Percona’s Senior Operations Engineer, Daniel Kowalewski as he presents Deploying MySQL in Production on Thursday, June 22, 2017 at 11:00 am PDT / 2:00 pm EDT (UTC-7).

MySQL is famous for being something you can install and get going with in less than five minutes for development purposes. But normally you want to run MySQL in production, and at scale. This requires some planning and knowledge. So why not learn the best practices around installation, configuration, deployment and backup?

This webinar is a soup-to-nuts talk that will have you going from zero to hero in no time. It includes discussion of the best practices for installation, configuration, taking backups, monitoring, etc.

Register for the webinar here.

Daniel Kowalewski, Senior Technical Operations Engineer

Daniel has been designing and deploying solutions around MySQL for over ten years. He lives for those magic moments where response time drops by 90%, and loves adding more “nines” to everything.

Jun 20, 2017

The MySQL High Availability Landscape in 2017 (The Elders)


In this blog, we’ll look at different MySQL high availability options.

The dynamic MySQL ecosystem is rapidly evolving, with many technologies built around MySQL. This is especially true for the technologies involved with the high availability (HA) aspects of MySQL. When I joined Percona back in 2009, some of these HA technologies were very popular – but have since been almost forgotten. During the same interval, new technologies have emerged. In order to give some perspective to the reader, and hopefully help you make better choices, I’ll review the MySQL HA landscape as it is in 2017. This review will be in three parts. The first part (this post) will cover the technologies that have been around for a long time: the elders. The second part will focus on the technologies that are very popular today: the adults. Finally, the last part will try to extrapolate which technologies could become popular in the upcoming years: the babies.

Quick disclaimer: I am reporting on the technologies I see the most. There are likely many other solutions not covered here, but I can’t talk about technologies I have barely or never used. Apart from the RDS-related technologies, all the technologies covered are open source. The target audience for this post is people relatively new to MySQL.

The Elders

Let’s define the technologies in the elders group. These are technologies that anyone involved with MySQL for the last ten years is sure to be aware of. I could have called this group the “classics”. I include the following technologies in this group:

  • Replication
  • Shared storage
  • NDB cluster

Let’s review these technologies in the following sections.

Replication

Simple replication topology

 

MySQL replication is very well known. It is one of the main features behind the wide adoption of MySQL. Replication gets used almost everywhere. The reasons for that are numerous:

  • Replication is simple to set up. There are tons of how-to guides and scripts available to add a slave to a MySQL server. With Amazon RDS, adding a slave is just a few clicks.
  • Slaves allow you to easily scale reads. The slaves are accessible and can be used for reads. This is the most common way of scaling up a MySQL database.
  • Slaves have little impact on the master. Apart from the added network traffic, the presence of slaves does not impact the master performance significantly.
  • It is well known. No surprises here.
  • Used for failover. Your master died, promote a slave and use it as your new master.
  • Used for backups. You don’t want to overload your master with the backups, run them off a slave.

Of course, replication also has some issues:

  • Replication can lag. Replication used to be single-threaded, which meant a master with a concurrent load could easily outpace a slave. MySQL 5.6 and MariaDB 10.0 introduced some parallelism on the slave, and newer versions have improved it further, to the point where today’s slaves are many times faster than they were.
  • Slaves can diverge. When you modify data on the master, the slave must perform the exact same update. That seems easy, but there are many ways an update can be non-deterministic with statement-based replication. Many of these issues have been fixed over the years, and the introduction of row-based replication was another big step forward. Still, if you write directly to a slave you are asking for trouble. There is a read_only setting, but a MySQL user with the “SUPER” privilege simply bypasses it. That’s why there is now the “super_read_only” setting (a minimal sketch follows this list). Tools like pt-table-checksum and pt-table-sync from the Percona Toolkit exist to solve this problem.
  • Replication can impact the master. I wrote above that the presence of slaves does not affect the master, but logging the changes is more problematic. The most common issue is the InnoDB table-level locking for auto_increment values with statement-based replication: only one thread can insert new rows at a time. You can avoid this issue with row-based replication and proper configuration settings.
  • Data gets lost. Replication is asynchronous. That means the master will reply “done” after a commit statement even though the slaves have not received updates yet. Some transactions can get lost if the master crashes.
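
As a minimal sketch of the read-only protection mentioned above (super_read_only requires MySQL 5.7.8+ or Percona Server; run on the slave):

-- Reject writes from regular users
SET GLOBAL read_only = ON;

-- Also reject writes from users holding the SUPER privilege
SET GLOBAL super_read_only = ON;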

Although an old technology, a lot of work has been done on replication. It is miles away from the replication implementation of 5.0.x. Here’s a list, likely incomplete, of the evolution of replication:

  • Row based replication (since 5.1). The binary internal representation of the rows is sent instead of the SQL statements. This makes replication more robust against slave divergence.
  • Global transaction ID (since 5.6). Transactions are uniquely identified. Replication can be set up without knowing the binlog file and offset (see the sketch after this list).
  • Checksum (since 5.6). Binlog events have checksum values to validate their integrity.
  • Semi-sync replication (since 5.5). An addition to the replication protocol to make the master aware of the reception of events by the slaves. This helps to avoid losing data when a master crashes.
  • Multi-source replication (since 5.7). Allows a slave to have more than one master.
  • Multi-threaded replication (since 5.6). Allows a slave to use multiple threads. This helps to limit the slave lag.
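
For example, a minimal sketch of pointing a slave at a master using GTID auto-positioning (MySQL 5.6+, assuming gtid_mode=ON on both servers; the host and credentials are placeholders):

CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'repl_password',
    MASTER_AUTO_POSITION = 1;
START SLAVE;

-- No binlog file/offset needed; GTIDs tell the slave where to resume
SHOW SLAVE STATUS\G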

Managing replication is a tedious job. The community has written many tools to manage replication:

  • MMM. An old Perl tool that used to be quite popular, but had many issues. Now rarely used.
  • MHA. The most popular tool to manage replication. It excels at reconfiguring replication without losing data, and does a decent job of handling failover. It is also simple. No wonder it is popular.
  • PRM. A Pacemaker-based solution developed to replace MMM. It’s quite good at failover, but not as good as MHA at reconfiguring replication. It’s also quite complex, thanks to Pacemaker. Not used much.
  • Orchestrator. The new cool tool. It can manage complex topologies and has a nice web-based interface to monitor and control the topology.

 

Shared Storage

Simple shared storage topology

 

Back when I was working for MySQL ten years ago, shared storage HA setups were very common. A shared storage HA cluster shares one copy of the database files between two servers. One server is active, the other one is passive. In order to be shared, the database files reside on a device that can be mounted by both servers. The device can be physical (like a SAN) or logical (like a Linux DRBD device). On top of that, you need a cluster manager (like Pacemaker) to handle the resources and failovers. This solution is very popular because it allows for failover without losing any transactions.

The main drawback of this setup is the need for an idle standby server. The standby server cannot have any other assigned duties since it must always be ready to take over the MySQL server. A shared storage solution is also obviously not resilient to file-level corruption (but that situation is exceptional). Finally, it doesn’t play well with a cloud-based environment.

Today, newly-deployed shared storage HA setups are rare. The only ones I encountered over the last year were either old implementations needing support, or new setups that deployed because of existing corporate technology stacks. That should tell you about the technology’s loss of popularity.

NDB Cluster

A simple NDB Cluster topology

 

An NDB Cluster is a distributed clustering solution that has been around for a long time. I personally started working with this technology back in 2008. An NDB Cluster has three types of nodes: SQL, management and data. A full HA cluster requires a minimum of four nodes.

An NDB Cluster is not a general purpose database due to its distributed nature. For suitable workloads, it is extraordinarily good. For unsuitable workloads, it is miserable. A suitable workload for an NDB Cluster has high concurrency, with a high rate of small, primary key oriented transactions. Reaching one million trx/s on an NDB Cluster is nothing exceptional.

At the other end of the spectrum, a poor workload for an NDB Cluster is a single-threaded report query on a star-like schema. I have seen some extreme cases where just the network time of a reporting query amounted to more than 20 minutes.

Although NDB Clusters have improved, and are still improving, their usage has been pushed toward niche-type applications. Overall, the technology is losing ground and is now mostly used for telco and online gaming applications.

Jun 19, 2017

Upcoming HA Webinar Wed 6/21: Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication

Join Percona’s MySQL Practice Manager Kenny Gryp and QA Engineer Ramesh Sivaraman as they present a high availability webinar around Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication on Wednesday, June 21, 2017 at 10:00 am PDT / 1:00 pm EDT (UTC-7).

What are the implementation differences between Percona XtraDB Cluster 5.7, Galera Cluster 5.7 and MySQL Group Replication?

  • How do they work?
  • How do they behave differently?
  • Do these methods have any major issues?

This webinar will describe the differences and shed some light on how QA is done for each of the different technologies.

Register for the webinar here.

Ramesh Sivaraman, QA Engineer

Ramesh joined the Percona QA Team in March 2014. He has almost six years of experience in database administration and, before joining Percona, provided MySQL database support to various service- and product-based internet companies. Ramesh’s professional interests include writing shell/Perl scripts to automate routine tasks, and new technology. Ramesh lives in Kerala, the southern part of India, close to his family.

Kenny Gryp, MySQL Practice Manager

Kenny is currently MySQL Practice Manager at Percona.

Jun 19, 2017

MariaDB Server 10.2 GA Release Overview

This blog post looks at the recent MariaDB Server 10.2 GA release.

Congratulations to the MariaDB Foundation for releasing a generally available (GA) stable version of MariaDB Server 10.2! We’ll definitely spend the next few weeks talking about MariaDB Server 10.2, but here’s a quick overview in the meantime. Keep in mind that when thinking about compatibility, this is meant to be the equivalent of MySQL 5.7 (GA: October 21, 2015, with Percona Server for MySQL 5.7 GA available February 23, 2016).

Some of the highlights include:

  • Window functions – this is the first release in the MySQL ecosystem that includes Window functions and Recursive Common Table Expressions. At the time of this writing, MariaDB hasn’t completed the documentation. It is worth noting that the implementation of Window functions in MariaDB Server 10.2 differs from what you see in MariaDB ColumnStore.
  • JSON functions – many JSON functions to query, update, index and validate JSON. It’s worth noting that MariaDB Server 10.2 does not include a JSON data type (as compared to MySQL 5.7). This means you can’t do CREATE TABLE t1 (jdoc JSON) – instead you need to use a VARCHAR or TEXT column (see the sketch after this list). There are also other differences that produce different result sets, and seemingly no column path operator.
  • There is also support for GeoJSON functionality, but when we tried ST_AsGeoJSON (yes, documentation needs work), we noticed that the output could vary from MySQL 5.7.
  • MyRocks – MariaDB added the hot new storage engine MyRocks as an alpha. You will have to install the MyRocks engine package separately. It isn’t fully merged yet. Watch the umbrella task MDEV-9658.
  • SHOW CREATE USER – A new SHOW CREATE USER statement allows you to look at user limitation, as you can now limit users to a maximum number of queries, updates and connections per hour, as well as a maximum number of connections permitted by the user (see setting account resource limits for the MySQL 5.7 equivalent). You’ll want to read the updated documentation around CREATE USER. Don’t be surprised when you see something like “ERROR 1226 (42000): User ‘foo’ has exceeded the ‘max_user_connections’ resource (current value: 1)”. This also is an extension to ALTER USER.
  • Flashback – binary log based rollback, aka flashback, can roll back tables and databases to an older snapshot. This should help when the DBA or a user makes an error. This tool works well as long as it’s a DML statement. This feature came from Alibaba’s AliSQL tree.
  • Time delayed replication – new in MariaDB Server 10.2.
  • OpenSSL 1.1 – there is now support for OpenSSL 1.1 and LibreSSL.
  • MariaDB Connector/C – most importantly, MariaDB Connector/C replaces libmysql (see: MDEV-9055). This should be API and ABI compatible, but naturally there are some teething problems (see: MDEV-12950).
  • Amazon Key Management plugin – from a key management standpoint, the Amazon Key Management plugin is now available to use for encryption. It’s compiled and available as a package. Previously, you had to compile it yourself.
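
As a minimal sketch of the JSON point above (a hypothetical table; it relies on MariaDB 10.2’s JSON functions and CHECK constraint support), a document is stored in a plain TEXT column and validated with a constraint:

-- No JSON column type in MariaDB Server 10.2, so use TEXT
CREATE TABLE t1 (
    id   INT PRIMARY KEY,
    jdoc TEXT,
    CHECK (JSON_VALID(jdoc))
);

INSERT INTO t1 VALUES (1, '{"name": "percona", "type": "blog"}');

-- Query with JSON functions rather than a column path operator
SELECT JSON_EXTRACT(jdoc, '$.name') FROM t1;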

Some of the important things to take note of are:

  • As of this release, MariaDB now ships with InnoDB as the default storage engine (as opposed to Percona XtraDB). This means that from here on out, the improvements and fixes to Percona XtraDB won’t necessarily be available in MariaDB. This also means that Percona XtraDB parameters might get ignored (as reported in MDEV-12472).
  • In MySQL 5.6+, you can use SHA-256 pluggable authentication. However, this feature is still not implemented in MariaDB Server 10.2 (see: MDEV-9804). You can, however, use the ed25519 authentication plugin as a replacement.
  • When it comes to replication, MySQL 5.7 defaults to row-based replication. MariaDB Server 10.2 defaults to mixed-mode replication (see the discussion around this at MDEV-7635).
  • It is worth noting that in order to make MariaDB Server more “Oracle compatible,” DECIMAL now goes up to 38 decimals instead of 30 decimals. MDEV-10138 tells you what happens when you migrate from a long decimal to a default decimal type install (i.e., if you’re moving to another variant in the MySQL ecosystem).
  • If you’re familiar with how MySQL 5.7 manages passwords and a new install, the MariaDB Server 10.2 method hasn’t changed.

All in all, this release took a little over a year to make (Alpha was 18 April 2016, GA was 23 May 2017). It is extremely important to read the release notes and the changelogs of each and every release, as MariaDB Server diverges from MySQL quite a bit. At Percona, we will monitor Jira closely to ensure that you always stay informed of the latest changes.

Jun 16, 2017

Peter Zaitsev’s Speaking Schedule: Percona University Belgium / PG Day / Meetups


This blog shows Peter Zaitsev’s speaking schedule for this summer.

Summer 2017 Speaking Engagements

This week I spoke at the DB Tech Showcase OSS conference in Japan and am now heading to Europe. I have a busy schedule in June and early July, but there are events and places where we can cross paths and have a quick conversation. Let’s meet at these events if you need anything from Percona (or me personally). 

Below is a full list of places I’ll be at this summer:

Amsterdam, Netherlands

On June 20 I am speaking at the In-Memory Computing Summit 2017 with Denis Magda (Product Manager, Gridgain Systems). Our talk “Accelerate MySQL® for Demanding OLAP and OLTP Use Cases with Apache® Ignite™” starts at 2:35 pm.

On the same day in Amsterdam, Denis and I will speak at the local MySQL User Group meetup. I will share some how-tos for MySQL monitoring with Percona Monitoring and Management (PMM), along with a PMM demo.

Ghent, Belgium

On June 22 we are organizing a Percona University event in Ghent, Belgium, which is a widely known tech hub in the region. I will give several talks there on MySQL, MongoDB and PMM monitoring. Dimitri Vanoverbeke from Percona will discuss MySQL in the Cloud. We have also invited guest speakers: Frederic Descamps from Oracle, and Julien Pivotto from Inuits.

Percona University technical events are 100% free to attend, and so far we are getting very positive attendee feedback on them. To check the full agenda for the Belgium edition, and to register, please use this link.

St. Petersburg, Russia

Percona is sponsoring PG Day’17 Russia, the PostgreSQL conference. This year they added a track on open source databases (and I was happy to be their Committee member for the OSDB track). The conference starts on July 5, and on that day I will give a tutorial on InnoDB Architecture and Performance Optimization. Sveta Smirnova will also present a tutorial on MySQL Performance Troubleshooting.

On July 6-7, you can expect four more talks given by Perconians at PG Day. We invite you to stop by our booth (“Percona”) and ask us any tough questions you might have.

Moscow, Russia

On July 11 I will speak at a Moscow MySQL User Group meetup at the Mail.Ru Group office. While we’re still locking down the agenda, we always have a great selection of speakers at the MMUG meetups. Make sure you don’t miss this gathering!

Thank you, and I hope to see many of you at these events.

Jun 15, 2017

Three Methods of Installing Percona Monitoring and Management

In this blog post, we’ll look at three different methods for installing Percona Monitoring and Management (PMM).

Percona offers multiple methods of installing Percona Monitoring and Management, depending on your environment and scale. I’ll also share comments on which installation methods we’ve decided to forego for now. Let’s begin by reviewing the three supported methods:

  1. Virtual Appliance
  2. Amazon Machine Image
  3. Docker

Virtual Appliance

We ship an OVF/OVA method to make installation as simple as possible, with the least amount of effort required and at the lowest cost to you. You can leverage the investment in your virtualization deployment platform. OVF is an open standard for packaging and distributing virtual appliances, designed to be run in virtual machines.

Using OVA with VirtualBox as a first step is common in order to quickly play with a working PMM system, and get right to adding clients and observing activity within your own environment against your MySQL and MongoDB instances. But you can also use the OVA file for enterprise deployments. It is a flexible file format that can be imported into other popular hypervisor systems such as VMware, Red Hat Virtualization, XenServer, Microsoft System Centre Virtual Machine Manager and others.

We’d love to hear your feedback on this installation method!

AWS AMI

We also have an AWS AMI in order to provide easy scaling of PMM Server in AWS, so that you can deploy onto any instance size required for your monitoring instance. Depending on the AWS region you’re in, you’ll need to choose from the appropriate AMI Instance ID. Soon we’ll be moving to the AWS Marketplace for even easier deployment. When this is implemented, you will no longer need to clone an existing AMI ID.

Docker

Docker is our most common production deployment method. It is easy (three commands) and scalable (tuning passed on the command line to Docker run). While we recognize that Docker is still a relatively new deployment system for many users, it is dramatically gaining adoption. It is also where Percona is investing the bulk of our development efforts. We deploy PMM Server as two Docker containers: one for storing the data that persists across restarts/upgrades, and the other for running the actual PMM Server binaries (Grafana, Prometheus, consul, Orchestrator, QAN, etc.).

Where are the RPM/DEB/tar.gz packages?!

A common question I hear is: why doesn’t Percona support binary-based installation?

We hear you: RPM/DEB/tar.gz methods are commonly used today for many of your own applications. Percona is striving for simplicity in our deployment of PMM Server, and we spend considerable development and QA effort validating the specific versions of Grafana/Prometheus/QAN/consul/Orchestrator all work seamlessly together.

Percona wants to ensure OS compatibility and long-term support of PMM, and to do binary distribution “right” means it can quickly get expensive to build and QA across all the popular Linux distributions available today. We’re in no way against binary distributions. For example, see our list of the nine supported platforms for which we provide bug fix support.

Percona decided to focus our development efforts on stability and features, and less on the number of supported platforms. Hence the hyper-focus on Docker. We don’t have any current plans to move to a binary deployment method for PMM, but we are always open to hearing your feedback. If there is considerable interest, then please let me know via the comments below. We’ll take these thoughts into consideration for PMM planning in the second half of 2017.

Which other methods of installing Percona Monitoring and Management would you like to see?

Jun 14, 2017

MySQL Triggers and Updatable Views

In this post we’ll review how MySQL triggers can affect queries.

Contrary to what the documentation states, we can activate triggers even while operating on views:

https://dev.mysql.com/doc/refman/5.7/en/triggers.html

Important: MySQL triggers activate only for changes made to tables by SQL statements. They do not activate for changes in views, nor by changes to tables made by APIs that do not transmit SQL statements to the MySQL server.

Be on the lookout if you use and depend on triggers, since it’s not the case for updatable views! We have reported a documentation bug for this, but figured it wouldn’t hurt to mention it in a short blog post, too. :) The link to the bug in question is here:

https://bugs.mysql.com/bug.php?id=86575

Now, we’ll go through the steps we took to test this, and their outputs. These are for the latest MySQL version (5.7.18), but the same results were seen in 5.5.54, 5.6.35, and MariaDB 10.2.5.

First, we create the schema, tables and view needed:

mysql> CREATE SCHEMA view_test;
Query OK, 1 row affected (0.00 sec)
mysql> USE view_test;
Database changed
mysql> CREATE TABLE `main_table` (
   ->   `id` int(11) NOT NULL AUTO_INCREMENT,
   ->   `letters` varchar(64) DEFAULT NULL,
   ->   `numbers` int(11) NOT NULL,
   ->   `time` time NOT NULL,
   ->   PRIMARY KEY (`id`),
   ->   INDEX col_b (`letters`),
   ->   INDEX cols_c_d (`numbers`,`letters`)
   -> ) ENGINE=InnoDB  DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.31 sec)
mysql> CREATE TABLE `table_trigger_control` (
   ->   `id` int(11),
   ->   `description` varchar(255)
   -> ) ENGINE=InnoDB  DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.25 sec)
mysql> CREATE VIEW view_main_table AS SELECT * FROM main_table;
Query OK, 0 rows affected (0.02 sec)

Indexes are not really needed to prove the point, but were initially added to the tests for completeness. They make no difference in the results.

Then, we create the triggers for all possible combinations of [BEFORE|AFTER] and [INSERT|UPDATE|DELETE]. We will use the control table to have the triggers insert rows, so we can check if they were actually called by our queries.

mysql> CREATE TRIGGER trigger_before_insert BEFORE INSERT ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "BEFORE INSERT");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_after_insert AFTER INSERT ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "AFTER INSERT");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_before_update BEFORE UPDATE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "BEFORE UPDATE");
Query OK, 0 rows affected (0.19 sec)
mysql> CREATE TRIGGER trigger_after_update AFTER UPDATE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "AFTER UPDATE");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_before_delete BEFORE DELETE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (OLD.id, "BEFORE DELETE");
Query OK, 0 rows affected (0.18 sec)
mysql> CREATE TRIGGER trigger_after_delete AFTER DELETE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (OLD.id, "AFTER DELETE");
Query OK, 0 rows affected (0.05 sec)

As you can see, they will insert the ID of the row in question, and the combination of time/action appropriate for each one. Next, we will proceed in the following manner:

  1. INSERT three rows in the main table
  2. UPDATE the second
  3. DELETE the third

The reasoning behind doing it against the base table is to check that the triggers are working correctly, and doing what we expect them to do.

mysql> INSERT INTO main_table VALUES (1, 'A', 10, time(NOW()));
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO main_table VALUES (2, 'B', 20, time(NOW()));
Query OK, 1 row affected (0.14 sec)
mysql> INSERT INTO main_table VALUES (3, 'C', 30, time(NOW()));
Query OK, 1 row affected (0.17 sec)
mysql> UPDATE main_table SET letters = 'MOD' WHERE id = 2;
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> DELETE FROM main_table WHERE id = 3;
Query OK, 1 row affected (0.10 sec)

And we check our results:

mysql> SELECT * FROM main_table;
+----+---------+---------+----------+
| id | letters | numbers | time     |
+----+---------+---------+----------+
|  1 | A       |      10 | 15:19:14 |
|  2 | MOD     |      20 | 15:19:14 |
+----+---------+---------+----------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM table_trigger_control;
+------+---------------+
| id   | description   |
+------+---------------+
|    1 | BEFORE INSERT |
|    1 | AFTER INSERT  |
|    2 | BEFORE INSERT |
|    2 | AFTER INSERT  |
|    3 | BEFORE INSERT |
|    3 | AFTER INSERT  |
|    2 | BEFORE UPDATE |
|    2 | AFTER UPDATE  |
|    3 | BEFORE DELETE |
|    3 | AFTER DELETE  |
+------+---------------+
10 rows in set (0.00 sec)

Everything is working as it should, so let’s move on with the tests that we really care about. We will again take the three steps mentioned above, but this time directly on the view.

mysql> INSERT INTO view_main_table VALUES (4, 'VIEW_D', 40, time(NOW()));
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO view_main_table VALUES (5, 'VIEW_E', 50, time(NOW()));
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO view_main_table VALUES (6, 'VIEW_F', 60, time(NOW()));
Query OK, 1 row affected (0.11 sec)
mysql> UPDATE view_main_table SET letters = 'VIEW_MOD' WHERE id = 5;
Query OK, 1 row affected (0.04 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> DELETE FROM view_main_table WHERE id = 6;
Query OK, 1 row affected (0.01 sec)

And we check our tables:

mysql> SELECT * FROM main_table;
+----+----------+---------+----------+
| id | letters  | numbers | time     |
+----+----------+---------+----------+
|  1 | A        |      10 | 15:19:14 |
|  2 | MOD      |      20 | 15:19:14 |
|  4 | VIEW_D   |      40 | 15:19:34 |
|  5 | VIEW_MOD |      50 | 15:19:34 |
+----+----------+---------+----------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM view_main_table;
+----+----------+---------+----------+
| id | letters  | numbers | time     |
+----+----------+---------+----------+
|  1 | A        |      10 | 15:19:14 |
|  2 | MOD      |      20 | 15:19:14 |
|  4 | VIEW_D   |      40 | 15:19:34 |
|  5 | VIEW_MOD |      50 | 15:19:34 |
+----+----------+---------+----------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM table_trigger_control;
+------+---------------+
| id   | description   |
+------+---------------+
|    1 | BEFORE INSERT |
|    1 | AFTER INSERT  |
|    2 | BEFORE INSERT |
|    2 | AFTER INSERT  |
|    3 | BEFORE INSERT |
|    3 | AFTER INSERT  |
|    2 | BEFORE UPDATE |
|    2 | AFTER UPDATE  |
|    3 | BEFORE DELETE |
|    3 | AFTER DELETE  |
|    4 | BEFORE INSERT |
|    4 | AFTER INSERT  |
|    5 | BEFORE INSERT |
|    5 | AFTER INSERT  |
|    6 | BEFORE INSERT |
|    6 | AFTER INSERT  |
|    5 | BEFORE UPDATE |
|    5 | AFTER UPDATE  |
|    6 | BEFORE DELETE |
|    6 | AFTER DELETE  |
+------+---------------+
20 rows in set (0.00 sec)

As seen in the results, all triggers were executed, even when the queries were run against the view. Since this was an updatable view, it worked. On the contrary, if we try on a non-updatable view it fails (we can force ALGORITHM = TEMPTABLE to test it).

mysql> CREATE ALGORITHM=TEMPTABLE VIEW view_main_table_temp AS SELECT * FROM main_table;
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO view_main_table_temp VALUES (7, 'VIEW_H', 70, time(NOW()));
ERROR 1471 (HY000): The target table view_main_table_temp of the INSERT is not insertable-into
mysql> UPDATE view_main_table_temp SET letters = 'VIEW_MOD' WHERE id = 5;
ERROR 1288 (HY000): The target table view_main_table_temp of the UPDATE is not updatable
mysql> DELETE FROM view_main_table_temp WHERE id = 5;
ERROR 1288 (HY000): The target table view_main_table_temp of the DELETE is not updatable

As mentioned before, MariaDB shows the same behavior. The difference, however, is that the documentation is correct in mentioning the limitations, since it only shows the following:

https://mariadb.com/kb/en/mariadb/trigger-limitations/

Triggers cannot operate on any tables in the mysql, information_schema or performance_schema database.

Corollary to the Discussion

It’s always good to thoroughly check the documentation, but it’s also necessary to test things and prove the documentation is showing the real case (bugs can be found everywhere, not just in the code :)).
