Enabling crash-safe slaves with MySQL 5.6

Being able to configure slaves to be crash-safe is one of the major improvements of MySQL 5.6 with regards to replication. However we noticed confusion on how to enable this feature correctly, so let’s clarify how it should be done.

In short

1. Stop MySQL on slave
2. Add relay_log_info_repository = TABLE and relay_log_recovery = ON in my.cnf
3. Restart MySQL and relax

The gory details

To fully understand why you should change the above settings if you want crash-safe slaves, let’s first look at the reasons why replication can break when a slave crashes.

On a slave, replication involves 2 threads: the IO thread which copies the binary log of the master to a local copy called the relay log and the SQL thread which then executes the queries written in the relay log. The current position of each thread is stored in a file: for the IO thread and for the SQL thread.

So far, so good. The first problem is that these files are not synced to disk each time they are written to: whenever there is a crash, positions that are stored are likely to be incorrect. MySQL 5.5 has a fix for this: you can set sync_master_info = 1 and sync_relay_log_info = 1 to make sure both files are written and synced to disk after each transaction. Syncing is not free of course but if you have write-back cache, these settings can be valuable.

But wait, even with sync_master_info = 1 and sync_relay_info = 1, bad things can happen. The reason is that replication information is written after the transaction is committed. So if a crash occurs after the transaction is committed and before the replication information is updated, replication information will be wrong when the server restarts and a transaction could executed twice. The effect will depend on the transaction: replication may still run fine or it may be broken or inconsistencies can even be silently created.

MySQL 5.6 tackles this problem by letting us store replication information in tables instead of files (mysql.slave_relay_log_info table is created when relay_log_info_repository = TABLE and mysql.slave_master_info table is created with master_info_repository = TABLE). The idea is simple: we can include the update of the replication information inside the transaction, making sure it is always in sync with the data.

In pseudo-code, instead of having:

-- Statement 1
-- ...
-- Statement N
-- Update replication info files

The server now behaves as if we had:

-- Statement 1
-- ...
-- Statement N
-- Update replication info

Unfortunately, this is not as straightforward as it may appear. For the SQL thread, it works well because the server can update the slave_relay_info_info table at the same time it commits a transaction. However for the IO thread, the update of the table is not related to any transaction execution, so how can the server know when to update the table?

The answer is: it is controlled by sync_master_info. The default is 10,000 meaning that the IO thread position is only updated every 10,000 transactions. This is obviously not good to make the slave crash-safe. One solution is to set sync_master_info = 1, but as mentioned, it may have a performance impact (this is why 1 is not the default setting).

However there is a more elegant solution by using relay_log_recovery = ON, which will require a MySQL restart. This setting makes sure that when the server starts up, position for the IO thread is recovered from the slave_relay_log_info table, which is always up-to-date. Thus you do not even need to store IO thread information in a table for the slave to be crash-safe. In other words, setting master_info_repository = TABLE is not necessary.

As a final sidenote, once relay_log_info_repository = TABLE, the sync_relay_log_info setting becomes irrelevant as the table will always be updated at each transaction commit whatever the value of the setting. So you can safely remove it from your configuration file.

I hope this post will help you benefit from this great feature!

The post Enabling crash-safe slaves with MySQL 5.6 appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by