In our previous post, we introduced the MySQL Fabric utility and said we would dig deeper into it. This post is the first part of our test of MySQL Fabric’s High Availability (HA) functionality.
Today, we’ll review MySQL Fabric’s HA concepts, and then walk you through the setup of a 3-node cluster with one Primary and two Secondaries, doing a few basic tests with it. In a second post, we will spend more time generating failure scenarios and documenting how Fabric handles them. (MySQL Fabric is an extensible framework to manage large farms of MySQL servers, with support for high-availability and sharding.)
Before we begin, we recommend you read this post by Oracle’s Mats Kindahl, which, among other things, addresses the issues we raised in our first post. Mats leads the MySQL Fabric team.
Our lab
All our tests use our Vagrant-based test environment (https://github.com/martinarrieta/vagrant-fabric).
If you want to play with MySQL Fabric, you can have these VMs running on your desktop by following the instructions in the README file. If you don’t want full VMs, our colleague Jervin Real created a set of wrapper scripts that let you test MySQL Fabric using sandboxes.
Here is a basic representation of our environment.
Set up
To set up MySQL Fabric without using our Vagrant environment, you can follow the official documentation, or check the ansible playbooks in our lab repo. If you follow the manual, the only caveat is that when creating the user, you should either disable binary logging for your session, or use a GRANT statement instead of CREATE USER. You can read here for more info on why this is the case.
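As a minimal sketch of the first workaround (the user name, host, password and grants below are placeholders; check the official documentation for the exact privileges MySQL Fabric needs), the session could look like this:
mysql> SET SQL_LOG_BIN = 0;  -- keep this session out of the binary log
mysql> CREATE USER 'fabric'@'%' IDENTIFIED BY 'secret';
mysql> GRANT ALL ON fabric.* TO 'fabric'@'%';
mysql> SET SQL_LOG_BIN = 1;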
A description of all the options in the configuration file can be found here. For HA tests, the one thing to mention is that, in our experience, the failure detector will only trigger an automatic failover if the value for failover_interval in the [failure_tracking] section is greater than 0. Otherwise, failures will be detected and written to the log, but no action will be taken.
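For illustration, this is roughly what that looks like in fabric.cfg (the value is just an example; we understand it to be an interval in seconds, with 0 leaving automatic failover disabled):
[failure_tracking]
failover_interval = 300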
MySQL configuration
In order to manage a mysqld instance with MySQL Fabric, the following options need to be set in the [mysqld] section of its my.cnf file:
log_bin
gtid-mode=ON
enforce-gtid-consistency
log_slave_updates
Additionally, as in any replication setup, you must make sure that all servers have a distinct server_id.
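Putting it all together, the relevant part of a node’s my.cnf could look like the following sketch (the server_id value is just an example and must be different on each node):
[mysqld]
server_id = 1
log_bin
gtid-mode=ON
enforce-gtid-consistency
log_slave_updates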
When everything is in place, you can set up and start MySQL Fabric with the following commands:
[vagrant@store ~]$ mysqlfabric manage setup
[vagrant@store ~]$ mysqlfabric manage start --daemon
The setup command creates the database schema used by MySQL Fabric to store information about managed servers, and the start one, well, starts the daemon. The --daemon option makes Fabric run as a daemon, logging to a file instead of to standard output. Depending on the port and file name you configured in fabric.cfg, this may need to be run as root.
While testing, you can make MySQL Fabric reset its state at any time (though it won’t change existing node configurations such as replication) by running:
[vagrant@store ~]$ mysqlfabric manage teardown
[vagrant@store ~]$ mysqlfabric manage setup
If you’re using our Vagrant environment, you can run the reinit_cluster.sh script from your host OS (from the root of the vagrant-fabric repo) to do this for you, and also initialise the datadir of the three instances.
Creating a High Availability Cluster
A High Availability cluster is a set of servers using standard asynchronous MySQL replication with GTIDs enabled.
Creating a group
The first step is to create the group by running mysqlfabric with this syntax:
$ mysqlfabric group create <group_name>
In our example, to create the cluster “mycluster” you can run:
[vagrant@store ~]$ mysqlfabric group create mycluster
Procedure :
{ uuid = 605b02fb-a6a1-4a00-8e24-619cad8ec4c7,
finished = True,
success = True,
return = True,
activities =
}
Add the servers to the group
The second step is to add the servers to the group. The syntax to add a server to a group is:
$ mysqlfabric group add <group_name> <host_name or IP>[:port]
The port number is optional and only required if it differs from 3306. It is important to mention that the clients that will use this cluster must be able to resolve this host name or IP, because clients connect directly both to MySQL Fabric’s XML-RPC server and to the managed mysqld servers. Let’s add the nodes to our group.
[vagrant@store ~]$ for i in 1 2 3; do mysqlfabric group add mycluster node$i; done
Procedure :
{ uuid = 9d65c81c-e28a-437f-b5de-1d47e746a318,
finished = True,
success = True,
return = True,
activities =
}
Procedure :
{ uuid = 235a7c34-52a6-40ad-8e30-418dcee28f1e,
finished = True,
success = True,
return = True,
activities =
}
Procedure :
{ uuid = 4da3b1c3-87cc-461f-9705-28a59a2a4f67,
finished = True,
success = True,
return = True,
activities =
}
Promote a node to master
Now that we have all our nodes in the group, we have to promote one of them. You can promote one specific node, or you can let MySQL Fabric choose one for you.
The syntax to promote a specific node is:
$ mysqlfabric group promote <group_name> --slave_uuid='<node_uuid>'
or to let MySQL Fabric pick one:
$ mysqlfabric group promote <group_name>
Let’s do that:
[vagrant@store ~]$ mysqlfabric group promote mycluster
Procedure :
{ uuid = c4afd2e7-3864-4b53-84e9-04a40f403ba9,
finished = True,
success = True,
return = True,
activities =
}
You can then check the health of the group like this:
[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success = True
return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
activities =
}
One current limitation of the ‘health’ command is that it only identifies servers by their UUID. To get a list of the servers in a group, along with a quick status summary and their host names, use lookup_servers instead:
[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster
Command :
{ success = True
return = [{'status': 'SECONDARY', 'server_uuid': 'e245ec83-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node1'}, {'status': 'SECONDARY', 'server_uuid': 'e826d4ab-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node2'}, {'status': 'PRIMARY', 'server_uuid': 'edf2c45b-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node3'}]
activities =
}
We have submitted a merge request to return a JSON string instead of the printed representation of the object in the “return” field of the XML-RPC response, so that this information can be used to display the results in a friendlier way. In the same merge request, we also added the servers’ addresses to the output of the health command.
Failure detection
Now we have the three lab machines set up in a replication topology of one master (the PRIMARY server) and two slaves (the SECONDARY ones). To make MySQL Fabric start monitoring the group for problems, you need to activate it:
[vagrant@store ~]$ mysqlfabric group activate mycluster
Procedure :
{ uuid = 230835fc-6ec4-4b35-b0a9-97944c18e21f,
finished = True,
success = True,
return = True,
activities =
}
Now MySQL Fabric will monitor the group’s servers and, depending on the configuration (remember the failover_interval we mentioned before), it may trigger an automatic failover. But let’s start with a simpler case, by stopping mysqld on one of the secondary nodes:
[vagrant@node2 ~]$ sudo service mysqld stop
Stopping mysqld: [ OK ]
And checking how MySQL Fabric reports the group’s health after this:
[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success = True
return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'FAULTY', 'is_alive': False, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
activities =
}
We can see that MySQL Fabric successfully marks the server as faulty. In our next post we’ll show an example of using one of the supported connectors to handle failures in a group, but for now, let’s stay on the DBA/sysadmin side of things and try to bring the server back online:
[vagrant@node2 ~]$ sudo service mysqld start
Starting mysqld: [ OK ]
[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success = True
return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'FAULTY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
activities =
}
So the server is back online, but Fabric still considers it faulty. To add the server back into rotation, we need to look at the server commands:
[vagrant@store ~]$ mysqlfabric help server
Commands available in group 'server' are:
server set_weight uuid weight [--synchronous]
server lookup_uuid address
server set_mode uuid mode [--synchronous]
server set_status uuid status [--update_only] [--synchronous]
The specific command we need is set_status, and in order to add the server back to the group, we need to change its status twice: first to SPARE and then back to SECONDARY. You can see what happens if we try to set it to SECONDARY directly:
[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SECONDARY
Procedure :
{ uuid = 9a6f2273-d206-4fa8-80fb-6bce1e5262c8,
finished = True,
success = False,
return = ServerError: Cannot change server's (e826d4ab-d889-11e3-86df-0800274fb806) status from (FAULTY) to (SECONDARY).,
activities =
}
So let’s try it the right way:
[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SPARE
Procedure :
{ uuid = c3a1c244-ea8f-4270-93ed-3f9dfbe879ea,
finished = True,
success = True,
return = True,
activities =
}
[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SECONDARY
Procedure :
{ uuid = 556f59ec-5556-4225-93c9-b9b29b577061,
finished = True,
success = True,
return = True,
activities =
}
And check the group’s health again:
[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success = True
return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
activities =
}
In our next post, when we discuss how to use the Fabric-aware connectors, we’ll also test other failure scenarios like hard VM shutdowns and network errors, but for now, let’s try the same thing on the PRIMARY node instead:
[vagrant@node3 ~]$ sudo service mysqld stop
Stopping mysqld: [ OK ]
And let’s check the servers again:
[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster
Command :
{ success = True
return = [{'status': 'SECONDARY', 'server_uuid': 'e245ec83-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node1'}, {'status': 'PRIMARY', 'server_uuid': 'e826d4ab-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node2'}, {'status': 'FAULTY', 'server_uuid': 'edf2c45b-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node3'}]
activities =
}
We can see that MySQL Fabric successfully marked node3 as FAULTY and promoted node2 to PRIMARY to resolve this. Once we start mysqld again on node3, we can add it back as SECONDARY using the same process of setting its status to SPARE first, as we did for node2 above.
Remember that unless failover_interval is greater than 0, MySQL Fabric will detect problems in an active group, but it won’t take any automatic action. We think it’s a good thing that the value for this variable in the documentation is 0, so that automatic failover is not enabled by default (if people follow the manual, of course), as even in mature HA solutions like Pacemaker, automatic failover is something that’s tricky to get right. But even without this, we believe the main benefit of using MySQL Fabric for promotion is that it takes care of reconfiguring replication for you, which should reduce the risk of error in this process, especially once the project becomes GA.
What’s next
In this post we’ve presented a basic replication setup managed by MySQL Fabric and reviewed a couple of failure scenarios, but many questions are left unanswered, among them:
- What happens to clients connected with a Fabric-aware driver when there is a status change in the cluster?
- What happens when the XML-RPC server goes down?
- How can we improve its availability?
We’ll try to answer these and other questions in our next post. If you have some questions of your own, please leave them in the comments section and we’ll address them in the next or other posts, depending on the topic.