Jun 19, 2017

Upcoming HA Webinar Wed 6/21: Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication

High Availability

Join Percona’s MySQL Practice Manager Kenny Gryp and QA Engineer Ramesh Sivaraman as they present a high availability webinar on Percona XtraDB Cluster, Galera Cluster and MySQL Group Replication on Wednesday, June 21, 2017 at 10:00 am PDT / 1:00 pm EDT (UTC-7).

What are the implementation differences between Percona XtraDB Cluster 5.7, Galera Cluster 5.7 and MySQL Group Replication?

  • How do they work?
  • How do they behave differently?
  • Do these methods have any major issues?

This webinar will describe the differences and shed some light on how QA is done for each of the different technologies.

Register for the webinar here.

Ramesh Sivaraman, QA Engineer

Ramesh joined the Percona QA Team in March 2014. He has almost six years of experience in database administration and, before joining Percona, provided MySQL database support to various service- and product-based internet companies. Ramesh’s professional interests include writing shell/Perl scripts to automate routine tasks and exploring new technologies. Ramesh lives in Kerala, in the southern part of India, close to his family.

Kenny Gryp, MySQL Practice Manager

Kenny is currently MySQL Practice Manager at Percona.

Sep 17, 2015

Clarification on “Call me Maybe: MariaDB Galera Cluster”

Recently Aphyr (Kyle Kingsbury) published https://aphyr.com/posts/327-call-me-maybe-mariadb-galera-cluster

The article is technically valid, and I am not going to dispute the conclusion Aphyr made. But it is also quite technically involved, so users who jump straight to the conclusion may get the wrong impression and be left with more questions than answers.

So, let me state the real conclusion of the article:
“Galera Cluster does not support the SNAPSHOT ISOLATION level, in contrast to what is stated in the documentation.”
From that conclusion it follows that using Galera Cluster may result in “corrupted” data.

I do not quite like the use of the word “corrupted” here. To me, the more accurate word is “inconsistent”.

So with this clarification, Aphyr’s conclusion that Galera Cluster (and this affects both the MariaDB Galera Cluster and Percona XtraDB Cluster products) does not support SNAPSHOT ISOLATION and may leave data in an inconsistent state is valid.
But I need to add one quite IMPORTANT qualification: it may leave data in an inconsistent state
only if you use the SPECIAL TYPE of transactions, under the default isolation level, that Aphyr uses in his test.
Moreover, if we run the same workload against a single-instance InnoDB server, we will get the same result.

Before getting too scared of “inconsistent data”, let’s review what kind of transactions are used and what the practical implications are.

Aphyr uses the following logic. Assume we have this table:

CREATE TABLE `accounts` (
  `account_id` int(11) NOT NULL,
  `balance` int(11) NOT NULL,
  PRIMARY KEY (`account_id`)
) ENGINE=InnoDB

We have N rows in the accounts table, each populated with an initial balance of 100.
That gives the invariant SELECT SUM(balance) FROM accounts == 100*N.
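
Concretely, the setup and the invariant check might look like this (a minimal sketch; N = 4 is an arbitrary choice for brevity):

-- populate N accounts, each with an initial balance of 100
INSERT INTO accounts (account_id, balance) VALUES
  (1, 100), (2, 100), (3, 100), (4, 100);

-- the invariant the concurrent workload must preserve
SELECT SUM(balance) FROM accounts;  -- expect 100 * N = 400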

Application logic: execute the following transactions concurrently:

BEGIN (start transaction)
// in application: generate random $fromAccount in the range [1;N]
SELECT balance FROM accounts WHERE account_id=$fromAccount
// in application: read balance into the $from_balance variable
// in application: generate random $toAccount in the range [1;N], but != $fromAccount
SELECT balance FROM accounts WHERE account_id=$toAccount
// in application: read balance into the $to_balance variable
// in application: generate a random amount to move from one account to another, as $moveAmt
// in application: calculate the new balance for account 1: $newBalance1=$from_balance-$moveAmt
// in application: calculate the new balance for account 2: $newBalance2=$to_balance+$moveAmt
// execute queries:
UPDATE accounts SET balance=$newBalance1 WHERE account_id=$fromAccount
UPDATE accounts SET balance=$newBalance2 WHERE account_id=$toAccount
COMMIT;

As you can see, it includes some application logic, so on the database side the transaction looks like this
(assuming we move 25 from account 5 to account 8):

BEGIN
SELECT balance FROM accounts WHERE account_id=5
SELECT balance FROM accounts WHERE account_id=8
UPDATE accounts SET balance=75 WHERE account_id=5
UPDATE accounts SET balance=125 WHERE account_id=8
COMMIT;

Aphyr proves that these transactions, executed concurrently, should keep the balances consistent (that is, SUM(balance) == N*100) if the database supports the SNAPSHOT ISOLATION or SERIALIZABLE isolation levels.
In his test he shows that running these transactions concurrently on a Galera cluster results in inconsistent balances; therefore, Galera Cluster does not support the SNAPSHOT ISOLATION level.

This, however, is totally expected.
Moreover, if you try this test on a single server against InnoDB in REPEATABLE-READ (the default) mode,
you will also end up in an inconsistent state (you can find my code here: https://github.com/vadimtk/go-trx-iso-test).

This is because of how InnoDB handles REPEATABLE-READ mode (one may argue that InnoDB’s REPEATABLE-READ is weaker
than the standard-defined REPEATABLE-READ, and that it is closer to READ-COMMITTED; this is a good opportunity for Aphyr to start another FUD “Call Me Maybe: InnoDB”). In simplified terms, InnoDB executes plain reads in repeatable-read mode, but writes and locking reads in read-committed mode.
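
To make this concrete, here is a minimal step-by-step sketch (my own illustration, not Aphyr’s test code) of how two concurrent sessions can lose an update on a single InnoDB instance in REPEATABLE-READ mode:

-- session 1:
BEGIN;
SELECT balance FROM accounts WHERE account_id=5;  -- reads 100

-- session 2:
BEGIN;
SELECT balance FROM accounts WHERE account_id=5;  -- also reads 100

-- session 1 computes 100-25=75 in the application, then:
UPDATE accounts SET balance=75 WHERE account_id=5;
COMMIT;

-- session 2 also computed 100-25=75; its UPDATE sees the latest
-- committed row (read-committed behavior) and silently overwrites it:
UPDATE accounts SET balance=75 WHERE account_id=5;
COMMIT;

-- the final balance is 75, not 50: one debit is lost and
-- SUM(balance) no longer equals 100*N. Under true SNAPSHOT ISOLATION
-- the second COMMIT would be rejected (first-committer-wins).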

What does this mean from a practical standpoint?
In my opinion, these transactions are a little bit artificial (although totally valid).

If you use this in real life, the more obvious way to write these transactions is:

BEGIN
SELECT balance FROM accounts WHERE account_id=5
SELECT balance FROM accounts WHERE account_id=8
UPDATE accounts SET balance=balance-25 WHERE account_id=5
UPDATE accounts SET balance=balance+25 WHERE account_id=8
COMMIT;

If you do this, it will NOT produce an inconsistent state, because each UPDATE reads and modifies the current committed value of the row atomically, under a row lock.

Another way to handle this (for a single-server InnoDB) is to use the SERIALIZABLE isolation level (with an expected performance penalty).
Unfortunately, Galera Cluster does not support the SERIALIZABLE isolation level, as it does not pass read-sets between nodes,
and node communication happens only at the COMMIT stage.
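
On a single InnoDB instance, that approach would look like the following sketch (under SERIALIZABLE, InnoDB implicitly turns plain SELECTs into locking reads, which serializes the conflicting transactions):

SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
SELECT balance FROM accounts WHERE account_id=5;  -- implicitly a locking read
SELECT balance FROM accounts WHERE account_id=8;
UPDATE accounts SET balance=75 WHERE account_id=5;
UPDATE accounts SET balance=125 WHERE account_id=8;
COMMIT;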

A third way: MySQL also provides an extension, SELECT ... FOR UPDATE, to handle cases exactly like this.
So if you want to keep REPEATABLE-READ and the original transactions, you need to rewrite them as:

BEGIN
SELECT balance FROM accounts WHERE account_id=5 FOR UPDATE;
SELECT balance FROM accounts WHERE account_id=8 FOR UPDATE;
UPDATE accounts SET balance=75 WHERE account_id=5;
UPDATE accounts SET balance=125 WHERE account_id=8;
COMMIT;

This will result in a consistent state for the accounts table, and it works for both single-instance InnoDB and multi-node Percona XtraDB Cluster deployments.

One thing to remember: with Percona XtraDB Cluster you may get a DEADLOCK error when executing the COMMIT statement, so your application should be ready to handle this error, roll back, and retry the transaction if needed.
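
Schematically, the failure and the expected application reaction look like this (a sketch; the error text is the standard MySQL deadlock message, returned here at COMMIT time rather than at a locking statement):

BEGIN;
SELECT balance FROM accounts WHERE account_id=5 FOR UPDATE;
UPDATE accounts SET balance=balance-25 WHERE account_id=5;
COMMIT;
-- may fail with:
--   ERROR 1213 (40001): Deadlock found when trying to get lock;
--   try restarting transaction
-- on this error, the application should ROLLBACK and re-run the
-- whole transaction from BEGIN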

So, in conclusion:

Using the transactions described in https://aphyr.com/posts/327-call-me-maybe-mariadb-galera-cluster may result in an inconsistent state (not in a corrupted-data state!), for both Galera Cluster and single-instance InnoDB. But this is because these transactions do not properly use InnoDB’s REPEATABLE-READ isolation level. To meet InnoDB’s requirements we need to use “SELECT ... FOR UPDATE” or rewrite the transactions in the way described above.

UPDATE 18-Sep-2015:
Based on Twitter comments and comments on https://news.ycombinator.com/item?id=10238690, I would like to add the following.


Jul 31, 2014

MariaDB: Selective binary logs events

In the first post in a series on MariaDB features we find interesting, we begin with selectively skipping replication of binlog events. This feature is available in MariaDB 5.5 and 10.

By default, when using MySQL’s standard replication, all events are logged in the binary log and those binary log events are replicated to all slaves (it’s possible to filter out some schemas). With this feature, it’s also possible to prevent some events from being replicated to the slave(s) even though they are written to the binary log. Having those events in the binary log is still useful for point-in-time recovery.

Indeed, usually when we need to not replicate an event, we set sql_log_bin = 0 and the event is skipped entirely: neither written to the binlog nor replicated to the slave(s).

So with this new feature, it’s possible to set a session variable to tag events that will be written to the binary log, but skipped on demand on some slaves.

And it’s really easy to use. On the master you run:

set skip_replication=1;

and on the slave(s) where replicate_events_marked_for_skip='FILTER_ON_MASTER' or 'FILTER_ON_SLAVE' is set, the events marked for skipping on the master won’t be applied.

The valid values for replicate_events_marked_for_skip are:

  • REPLICATE (default): marked events are still replicated to the slave
  • FILTER_ON_SLAVE: marked events are received by the slave but skipped and not applied
  • FILTER_ON_MASTER: the filtering is done on the master, so the slave won’t even receive the events, which saves network bandwidth
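
Putting it together, a minimal end-to-end sketch (db1.history is a hypothetical table used for illustration; note that the slave must be stopped before changing the variable):

-- on the slave: drop marked events as they arrive
STOP SLAVE;
SET GLOBAL replicate_events_marked_for_skip = 'FILTER_ON_SLAVE';
START SLAVE;

-- on the master: mark the purge so the slave keeps the rows, while the
-- DELETE still lands in the binary log for point-in-time recovery
SET SESSION skip_replication = 1;
DELETE FROM db1.history WHERE created < NOW() - INTERVAL 1 YEAR;
SET SESSION skip_replication = 0;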

That’s a cool feature, but when can it be really useful?

Use case:

For archiving this can be very interesting. Indeed, most of the time when people archive data, they use something like pt-archiver, which deletes the data and copies the removed rows to an archive server.

Thanks to this feature, instead of having an archiving server where we copy the deleted data, it’s possible to have a slave on which we simply don’t delete the data. This is much faster (smarter?) and gives us an archiving server that is always up to date. Of course, in this case sql_log_bin = 0 would also have worked (if we ignore point-in-time recovery).
But what about a Galera Cluster? That’s where this feature is really cool: if we had used sql_log_bin = 0 on a Galera Cluster node, all the other nodes would have ignored the delete and the result would be inconsistency between the nodes.

So if you use an asynchronous slave as an archiving server for a Galera Cluster, this feature is practically mandatory.

As illustrated below, you can have a MariaDB Galera Cluster node joining a Percona XtraDB Cluster that will be used to delete historical data using pt-archiver:

[Figure: archiving topology]

pt-archiver is started with --set-vars "skip_replication=1".


Jul 20, 2014

A schema change inconsistency with Galera Cluster for MySQL

I recently worked on a case where one node of a Galera cluster had its schema desynchronized from the other nodes, even though the Total Order Isolation method was in effect to perform the schema changes. Let’s see what happened.

Background

For those of you who are not familiar with how Galera can perform schema changes, here is a short recap:

  • Two methods are available depending on the value of the wsrep_OSU_method setting. Both have benefits and drawbacks, but that is not the main topic of this post.
  • With TOI (Total Order Isolation), a DDL statement is performed at the same point in the replication flow on all nodes, giving strong guarantees that the schema is always identical on all nodes.
  • With RSU (Rolling Schema Upgrade), a DDL statement is not replicated to the other nodes. Until the DDL statement has been executed on all nodes, the schema is not consistent everywhere (so you should be careful not to break replication).

You can look at the official documentation here.

If you read the section on TOI carefully, you will see that “[…] TOI transactions will never fail certification and are guaranteed to be executed.” But also that “The system replicates the TOI query before execution and there is no way to know whether it succeeds or fails. Thus, error checking on TOI queries is switched off.”

Confusing? Not really. It simply means that with TOI, a DDL statement will always pass certification. But if for some reason the DDL statement fails on one of the nodes, it will not be rolled back on the other nodes. This opens the door to schema inconsistencies between nodes.

A test case

Let’s create a table on a 3-node Percona XtraDB Cluster 5.6 cluster and insert a few rows:

pxc1> create table t (id int not null auto_increment primary key, c varchar(10));
pxc1> insert into t (c) values ('aaaa'),('bbbb');

Then on node 3, let’s introduce a schema change on t that can make other schema changes fail:

pxc3> set global wsrep_OSU_method=RSU;
pxc3> alter table t add d int;
pxc3> set global wsrep_OSU_method=TOI;

As the schema change was done on node 3 with RSU, it is not replicated to the other nodes.

Now let’s try another schema change on node 1:

pxc1> alter table t add d varchar(10);
Query OK, 0 rows affected (0,14 sec)
Records: 0  Duplicates: 0  Warnings: 0

Apparently everything went well, and indeed on nodes 1 and 2 we have the new schema:

pxc2> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `c` varchar(10) DEFAULT NULL,
  `d` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=latin1

But on node 3 the statement failed, so the schema has not been changed:

pxc3> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `c` varchar(10) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=latin1

The error is visible in the error log of node 3:

2014-07-18 10:37:14 9649 [ERROR] Slave SQL: Error 'Duplicate column name 'd'' on query. Default database: 'repl_test'. Query: 'alter table t add d varchar(10)', Error_code: 1060
2014-07-18 10:37:14 9649 [Warning] WSREP: RBR event 1 Query apply warning: 1, 200
2014-07-18 10:37:14 9649 [Warning] WSREP: Ignoring error for TO isolated action: source: 577ffd51-0e52-11e4-a30e-4bde3a7ad3f2 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 3 trx_id: -1 seqnos (l: 17, g: 200, s: 199, d: 199, ts: 7722177758966)

But of course it is easy to miss. And then a simple INSERT can trigger a shutdown of node 3:

pxc2> insert into t (c,d) values ('cccc','dddd');
Query OK, 1 row affected (0,00 sec)

will trigger this on node 3:

2014-07-18 10:42:27 9649 [ERROR] Slave SQL: Column 2 of table 'repl_test.t' cannot be converted from type 'varchar(10)' to type 'int(11)', Error_code: 1677
2014-07-18 10:42:27 9649 [Warning] WSREP: RBR event 3 Write_rows apply warning: 3, 201
2014-07-18 10:42:27 9649 [Warning] WSREP: Failed to apply app buffer: seqno: 201, status: 1
	 at galera/src/trx_handle.cpp:apply():340
[...]
2014-07-18 10:42:27 9649 [Note] WSREP: Received NON-PRIMARY.
2014-07-18 10:42:27 9649 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 201)
2014-07-18 10:42:27 9649 [Note] WSREP: Received self-leave message.
2014-07-18 10:42:27 9649 [Note] WSREP: Flow-control interval: [0, 0]
2014-07-18 10:42:27 9649 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2014-07-18 10:42:27 9649 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 201)
2014-07-18 10:42:27 9649 [Note] WSREP: RECV thread exiting 0: Success
2014-07-18 10:42:27 9649 [Note] WSREP: recv_thread() joined.
2014-07-18 10:42:27 9649 [Note] WSREP: Closing replication queue.
2014-07-18 10:42:27 9649 [Note] WSREP: Closing slave action queue.
2014-07-18 10:42:27 9649 [Note] WSREP: bin/mysqld: Terminated.

Conclusion

As with regular MySQL, schema changes are challenging with Galera. Some subtleties can create a lot of trouble if you are not aware of them. So before running a DDL statement, make sure you fully understand how the TOI and RSU methods work.
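
One simple safeguard (my suggestion, not something from the original case) is to compare the schema across nodes after any DDL, for example by running the same query on every node and diffing the output:

-- run on each node and diff the results to spot schema drift
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'repl_test'
ORDER BY TABLE_NAME, ORDINAL_POSITION;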

