Jan 31, 2013

Feature in details: Incremental state transfer after a node crash in Percona XtraDB Cluster

With our newest release of Percona XtraDB Cluster, I would like to highlight a very nice feature: the ability to recover a node and bring it back into the cluster with an incremental state transfer after a crash.
This feature was available in previous releases as well, but now I want to give some details.

So, MySQL crashes from time to time, and this is a fact of life. An HA solution is needed precisely to deal with a single-node failure while allowing the whole cluster to keep working.

The idea is simple: after a crashed node recovers, we transfer to it only the changes that happened in the cluster while the node was down. It sounds easy in words, but proves hard when it comes to implementation. It all comes down to one question: if mysqld crashes, how do we know which transaction was the last one executed? For a single InnoDB instance this is easy: there is always the LSN, which is used for recovery. But in a cluster, all nodes have their own individual LSNs. Instead, the cluster uses a Global Transaction ID (GTID), in the form
50176f05-69b5-11e2-0800-930817fe924a:8549230.
So, how can we store the GTID so that it is available after a crash? Of course, there is always the good old way of storing it in a separate file, but that would require an additional fsync call for each transaction, and fsync is a known performance killer.

Instead, we store the GTID in the InnoDB system area, which is updated on every transaction. So even if the system crashes, we can access the information about the last committed transaction.
In XtraDB Cluster you can retrieve this information by calling mysqld with the option:
mysqld --wsrep-recover
and, having this information, we can force the node to start from that GTID, i.e.:
mysqld --wsrep_start_position=50176f05-69b5-11e2-0800-930817fe924a:8549230

In fact, the same methodology can be used to restore nodes from a backup.
We can start all nodes from an identical position, so they will all assume they are starting on identical data. Well, you can do it even on non-identical data, but then you know you do not have a consistent cluster.
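As a rough sketch of that restore flow: if the backup recorded the GTID at backup time (for example, Percona XtraBackup with the --galera-info option writes a uuid:seqno pair into xtrabackup_galera_info in the backup directory), every node can be pointed at the same start position. The paths below are placeholders, and the echo stands in for the file coming from a real backup:

```shell
#!/bin/sh
# Sketch: start all restored nodes from one identical position.
# Assumption: the backup recorded the GTID (XtraBackup --galera-info
# writes uuid:seqno into xtrabackup_galera_info).

datadir=$(mktemp -d)   # stands in for the restored datadir on each node

# On a real restore this file comes from the backup itself:
echo "50176f05-69b5-11e2-0800-930817fe924a:8549230" > "$datadir/xtrabackup_galera_info"

pos=$(cat "$datadir/xtrabackup_galera_info")

# Run the same command on every node so they all agree on the start point:
#   mysqld --wsrep_start_position="$pos"
echo "$pos"
```

Since every node reads the identical GTID from the identical backup, they all report the same position when joining.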

As all this may sound complicated, the --wsrep-recover / --wsrep_start_position= logic is implemented in the mysqld_safe script, so you get it out of the box.
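For illustration, the flow that mysqld_safe automates can be sketched roughly like this. The paths are hypothetical, and the echo line stands in for a real mysqld --wsrep-recover run, which writes the "Recovered position" line into the error log:

```shell
#!/bin/sh
# Sketch of the wsrep recovery flow (paths hypothetical; the echo below
# stands in for a real `mysqld --wsrep-recover` run).

log=$(mktemp)

# On a real node this log line is produced by:
#   mysqld --wsrep-recover --log-error="$log"
echo "130129 23:01:45 mysqld_safe WSREP: Recovered position 50176f05-69b5-11e2-0800-930817fe924a:8549230" > "$log"

# Extract the GTID (uuid:seqno) from the last "Recovered position" line
pos=$(grep 'Recovered position' "$log" | tail -n 1 | sed 's/.*Recovered position[[:space:]]*//')

# The node is then started from that position:
#   mysqld --wsrep_start_position="$pos"
echo "$pos"
```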

Here is what the startup process looks like in the error log:

130129 23:01:22 mysqld_safe Starting mysqld daemon with databases from /mnt/data/mysql
130129 23:01:22 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.l1zf2BIDja
130129 23:01:45 mysqld_safe WSREP: Recovered position 50176f05-69b5-11e2-0800-930817fe924a:8549230
130129 23:01:45 mysqld_safe WSREP: Recovered position 50176f05-69b5-11e2-0800-930817fe924a:8549230
130129 23:01:45 [Note] WSREP: wsrep_start_position var submitted: '50176f05-69b5-11e2-0800-930817fe924a:8549230'
130129 23:01:45 [Note] WSREP: Read nil XID from storage engines, skipping position init
130129 23:01:45 [Note] WSREP: wsrep_load(): loading provider library '/usr/local/mysql/lib/libgalera_smm.so'
130129 23:01:45 [Note] WSREP: wsrep_load(): Galera 2.3(r143) by Codership Oy <info@codership.com> loaded succesfully.
...
130129 23:01:45 [Note] WSREP: Setting initial position to 50176f05-69b5-11e2-0800-930817fe924a:8549230
130129 23:01:56 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8549242)
130129 23:01:56 [Note] WSREP: State transfer required: 
        Group state: 50176f05-69b5-11e2-0800-930817fe924a:8549242
        Local state: 50176f05-69b5-11e2-0800-930817fe924a:8549230
....
130129 23:01:58 [Note] WSREP: Prepared IST receiver, listening at: tcp://208.88.225.242:4568
130129 23:01:58 [Note] WSREP: Node 5 (node3) requested state transfer from '*any*'. Selected 0 (node1)(SYNCED) as donor.
130129 23:01:58 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8549306)
130129 23:01:58 [Note] WSREP: Requesting state transfer: success, donor: 0
130129 23:01:58 [Note] WSREP: SST complete, seqno: 8549230
...
130129 23:02:01 [Note] WSREP: Receiving IST: 12 writesets, seqnos 8549230-8549242
130129 23:02:01 [Note] /usr/local/mysql/bin/mysqld: ready for connections.

So what happens here? Basically, at startup the node detects that its last transaction had GTID
50176f05-69b5-11e2-0800-930817fe924a:8549230, but on joining the cluster it figures out
that the cluster is already at position
50176f05-69b5-11e2-0800-930817fe924a:8549242,
and to catch up, the node has to receive 12 events, which successfully happens here:
130129 23:02:01 [Note] WSREP: Receiving IST: 12 writesets, seqnos 8549230-8549242

After applying these 12 events locally, the node is ready and successfully joins the cluster.
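The size of the incremental transfer is just the gap between the two sequence numbers in the GTIDs above, which a quick check confirms:

```shell
#!/bin/sh
# IST size = group seqno minus the node's recovered seqno
# (numbers taken from the log excerpt above).
local_seqno=8549230
group_seqno=8549242
gap=$((group_seqno - local_seqno))
echo "$gap"   # number of writesets to transfer via IST
```

This matches the "Receiving IST: 12 writesets" line in the log.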

You can try it with the latest release, Percona XtraDB Cluster 5.5.29.

The post Feature in details: Incremental state transfer after a node crash in Percona XtraDB Cluster appeared first on MySQL Performance Blog.
