Aug 04, 2014
--

Q&A: Putting MySQL Fabric to use

Percona MySQL webinar followup: Q&A

Martin Arrieta and I gave an online presentation last week on “Putting MySQL Fabric To Use.” If you missed it, you can find a recording and the slides here, and the Vagrant environment we used, plus a transcript of the commands we ran, here (be sure to check out the ‘sharding’ branch, as that’s what we used during the webinar).

Thank you all for attending and asking interesting questions. We were unable to answer all of them in the scheduled time, so here are our replies to all the questions.

What is GTID? And how does it relate to MySQL Fabric?
GTID stands for Global Transaction Identifier, and MySQL Fabric requires MySQL replication to be configured to use GTIDs in order to work. This means that it won’t work with MySQL versions prior to 5.6. You can find more information in the manual, and in the upcoming Percona webinar, “Using MySQL Global Transaction IDs in Production,” which our colleague Stephane Combaudon will present on August 27.
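If you’re setting this up, a minimal my.cnf fragment for GTID-based replication on MySQL 5.6 looks roughly like this (option names per the MySQL 5.6 manual; server-id must be unique per server):

```ini
[mysqld]
# GTID-based replication, required by MySQL Fabric (MySQL 5.6+)
log-bin                  = mysql-bin
server-id                = 1      # must be unique on each server
gtid-mode                = ON
enforce-gtid-consistency = ON
log-slave-updates        = ON
```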

During any issue, does MySQL Fabric internally do a switch over or failover?
MySQL Fabric will only detect failures and initiate failovers for a group that has been activated. Otherwise, any changes must be made manually through the mysqlfabric command.

For an alter table, how do you avoid replication lag?
We think using pt-online-schema-change for the ALTER TABLE is a good way to avoid or minimize this problem.

Are there benchmarks available for MySQL Fabric?
We’re not aware of any, but once we wrap up our current series of blog posts on the topic, we plan to do some benchmarks ourselves. Our interest lies in the overhead the connectors can introduce compared with a standard connector accessing a single MySQL server, but if you have other ideas, let us know in the comments and we’ll consider adding them to our list.

Can MySQL Fabric be used to handle Big Data?
We think the HA features are usable regardless of data set size, with the caveat that MySQL Fabric does not (currently) handle provisioning the data of the managed instances. It just configures and points replication as required, but you’re responsible for making sure all servers have the same set of data.

We feel that the sharding features, however, may not be a good fit for very large data sets yet, especially because of the use of mysqldump/mysql in the shard split/move operations. However, since sharding usually goes hand in hand with big data (a data set too large for one server is one of the reasons to shard, after all), we’re sure this will improve in the future. Alternatively, someone with some Python skills could adjust the scripts to use a more efficient mechanism for moving data from one server to another, such as Percona XtraBackup.

Does sharding require many changes to application coding (database code on shards etc) at the MySQL Fabric level? Should we sit with the developers/architects to help them understand tables, their usage, and to plan sharding?
Sharding with MySQL Fabric is not transparent (and we’d say sharding, in general, is typically not transparent), so some application coding and DBA/developer interaction will be needed. But you should be working closely with your developers and architects (or DBAs and sysadmins, depending on which side of the counter you stand) anyway :)

Would you say that MySQL Fabric is a good HA replacement removing the need for VIPs?
We don’t see them as mutually exclusive. Simplifying, the HA features of MySQL Fabric are just another way to manage replication between a set of MySQL servers. It doesn’t force any way to make writable and readable nodes available to clients. With some coding, you can use MySQL Fabric together with VIPs, or dynamic DNS entries, or a load balancer like HA Proxy.

Does global group mean fabric node?
If you did not attend the webinar, we’d recommend you take a look at slide 7 to get the required context for this question.

Global group means a group that has the schema for all tables, all the data for global tables, and no data for sharded tables. Since it’s a group, it can be one or more nodes, but at any given time, there will only be one PRIMARY node in the group, and the PRIMARY nodes on all shard groups will be replication slaves of this node.

Is the config file at the application side or on the MySQL Fabric node?
There is some configuration at both ends. The MySQL Fabric node (the ‘store’ server in our Vagrant setup) has some configuration, but applications need their own too. The separation is rather clean though. Typically, the application only needs to know how to reach the MySQL Fabric server (host, port, credentials), while the MySQL Fabric node needs to know how to connect to the nodes it manages, how long to wait before starting a failover, etc. The configuration on MySQL Fabric’s side is done by the fabric.cfg file, while on the application side, it depends on the connector used.

When setting a faulty to secondary, does it also automatically come back in replication as a slave?
Yes, though remember you’ll first have to set it to SPARE :)

When we set primary and secondary and it sets up replication, it never copies data before building replication? It only starts replication from that point, and we will lose any data that is not available on the secondary?
Yes to the first question. As to the second, we reiterate the fact that MySQL Fabric does not handle provisioning data for instances. When you’re using it to manage nodes, it (currently) assumes the nodes all have a copy of the data. It is your responsibility to make sure that is the case, though this may change in the future. To give you a specific example, suppose you run the following commands:


mysqlfabric group create ha
mysqlfabric group add ha node1
mysqlfabric group add ha node2

Then, on node1’s MySQL CLI:

create database test;

and finally run:

mysqlfabric group promote ha --slave_id=<uuid_for_node2>

You’ll end up with node2 being PRIMARY, node1 being SECONDARY, without the ‘test’ database on the PRIMARY node.

Are there any limitations around adding shards? Is it possible to re-configure the sharding keys to handle the additional shard(s)?
We don’t know of any limitations, and yes, it is possible to add more shards and reconfigure the mapping. We did not have enough time to do this during the webinar, but the transcript in the repo includes examples of splitting and moving shards. What you want in this case is to add a new group (it can be a single node) and then split a shard (optionally providing a pivot value), keeping one part on the original group and the rest on the new one. Fernando ran into one problem when doing this on a busy server; the bug is not yet verified, so it may have been user error on his side, but we’d recommend running similar tests yourself before attempting this in production.

Is the addition of Connectors at the MySQL Fabric level or even at application side? Is an application code change a huge effort?
It’s at the application side. It’s difficult to estimate the effort for the code changes, but if your database access is reasonably modularized/abstracted from your application logic, it shouldn’t be huge. In fact, for HA, if you don’t want to split reads and writes, you can migrate an application to a MySQL Fabric-aware connector with just a couple of lines changed (changing the connection string and then always requesting a MODE_READWRITE connection).

What are the minimum server requirements to set up mysqlfabric, and does it use any multithreading if I have more than one core on each node? Can we get the transcript with examples?
We don’t know the official minimum requirements, but it is a Python script that in our experience has been lightweight. Remember it does *not* act as a middleman/proxy between your app and the servers; it is only involved when a client establishes a new connection or when a server’s status changes. As for multithreading, we know it’s concurrent (i.e., multiple clients can establish connections at the same time), but we don’t know if it’s parallel.

The transcript with examples can be found here.

How does MySQL Fabric handle latency issues? Does latency increase with mysql traffic?
MySQL Fabric uses regular MySQL connections, so latency will affect it the same as any other app using MySQL, and yes, latency will increase with traffic (but not because of MySQL Fabric).

What if MySQL Fabric is hung? Will connections going to the primary stop? And how can we deduce if there is an issue at the fabric side?
If MySQL Fabric is hung, the following scenarios will be problematic:

  • New connections come in. Clients will try to connect to MySQL Fabric before trying to connect to any hosts (they won’t know which hosts to connect to otherwise!), so if MySQL Fabric is down, no new connections can be established.
  • A failure happens in the managed servers. MySQL Fabric is responsible for monitoring for failures and taking the appropriate actions (including promoting another server, repointing replicas, etc.), so if something fails while MySQL Fabric is down, there will be nobody to take action.

Are there any scenarios where a data loss can happen when promoting/demoting nodes (or nodes failure happens) in a production environment?
Given the use of enforce-gtid-consistency and the fact that MySQL Fabric won’t promote a node to PRIMARY until it has caught up with any pending changes, we feel this is unlikely, but we’re planning a blog post specifically focused on evaluating the potential for data loss during failover.
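As a rough illustration of the “caught up” requirement (a simplified sketch, not MySQL Fabric’s actual promotion code; all function and field names here are made up), a failover routine can compare each candidate’s executed GTID set and only promote the one that has applied the most transactions:

```python
def gtid_txn_count(gtid_set):
    """Count transactions in a GTID set like 'uuid:1-90' or 'uuid:1-5:7-9'.

    Simplified parser for illustration; assumes well-formed input.
    """
    total = 0
    for part in (p.strip() for p in gtid_set.split(",") if p.strip()):
        _uuid, intervals = part.split(":", 1)
        for interval in intervals.split(":"):
            lo, _, hi = interval.partition("-")
            total += (int(hi) if hi else int(lo)) - int(lo) + 1
    return total

def pick_promotion_candidate(secondaries):
    """Hypothetical helper: prefer the secondary with the most executed GTIDs."""
    return max(secondaries, key=lambda s: gtid_txn_count(s["executed_gtid_set"]))

secondaries = [
    {"name": "node2", "executed_gtid_set": "e34ab4cd-00b9-11e4-8ced-0800274fb806:1-90"},
    {"name": "node3", "executed_gtid_set": "e34ab4cd-00b9-11e4-8ced-0800274fb806:1-100"},
]
print(pick_promotion_candidate(secondaries)["name"])  # node3: most caught up
```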

How to configure MySQL Fabric to use synchronous / semi-synchronous mechanisms?
We have not attempted this, but based on this manual page, we think only asynchronous replication is currently supported. We would expect semi-sync to be on the roadmap, though. If this is something you’d like to see, another section on the same page has some suggestions on how to make it happen.

Thanks again for attending the webinar, and feel free to post any further questions in the comments section.

The post Q&A: Putting MySQL Fabric to use appeared first on MySQL Performance Blog.

Jul 11, 2014
--

Managing shards of MySQL databases with MySQL Fabric

This is the fourth post in our MySQL Fabric series. In case you’re joining us now, we started with an introductory post, and then discussed High Availability (HA) using MySQL Fabric here (Part 1) and here (Part 2). Today we will talk about how MySQL Fabric can help you scale out MySQL databases with sharding.

Introduction

At the time of writing, MySQL Fabric includes support for range- and hash-based sharding. As with HA, the functionality is split between the client, through a MySQL Fabric-aware connector, and the server, through the mysqlfabric utility and the XML-RPC server we’ve talked about before.
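To give an intuition for the difference (an illustrative sketch only, not MySQL Fabric’s actual hash algorithm, which is documented in the manual): range-based sharding routes a key by comparing it against boundaries, while hash-based sharding feeds the key through a hash function so rows spread across groups without predefined ranges:

```python
import hashlib

def hash_shard(key, groups):
    """Map a sharding key to a group via MD5 (illustrative, not Fabric's algorithm)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return groups[int(digest, 16) % len(groups)]

groups = ["salaries-1", "salaries-2"]
# The mapping is deterministic: the same key always routes to the same group,
# but consecutive keys scatter instead of clustering in contiguous ranges.
assert hash_shard(10012, groups) == hash_shard(10012, groups)
print(hash_shard(10012, groups), hash_shard(10013, groups))
```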

In this post, we’ll go through the process of setting up a sharded table for use with MySQL Fabric, and then go through some usage examples, again using the Python connector.

In our next post, we’ll talk about shard management operations, and go into more detail about how we can combine the Sharding and HA features of MySQL Fabric.

The architecture

For our examples, we’ll be using a sharding branch from our vagrant-fabric repository. If you have been following previous posts and already have a local copy of the repo, you can get this one just by running the following command:

git checkout sharding

from the root of your copy. Bear in mind that the node names are the same in the Vagrantfile, so while in theory just running vagrant provision should be enough, you may have to run vagrant destroy and vagrant up again if you hit unexpected behavior.

The only difference between this branch and the original one is that you’ll have two mysqld instances per node: one on port 3306 and one on port 13306. This will let us achieve high availability for our shard groups. But don’t worry about that for now, it’s something we’ll discuss more in depth in our next post.

In today’s examples, we’ll be using the three group architecture described by this diagram:

Fabric Sharding Structure

The blue boxes represent shard groups and the green box represents the global group. The red arrows indicate the flow of replication, and the violet arrows represent client connections.

Setting up sharding

The official documentation about sharding with MySQL Fabric can be found here. We’ll be using the same example employees database and sharding the salaries table.

As we said, to keep things simple for the introduction, we’ll create all the groups but only add one instance to each one of them. In our next post, we’ll use two instances per group to evaluate how MySQL Fabric can make our shards highly available, and how it can rearrange replication topologies automatically after a failure.

To start, let’s create three groups:

[vagrant@store ~]$ mysqlfabric group create salaries-global
Procedure :
{ uuid = 390aa6c0-acda-40e2-ad52-8c0869613635,
finished = True,
success = True,
return = True,
activities =
}
[vagrant@store ~]$ for i in 1 2; do mysqlfabric group create salaries-$i; done
Procedure :
{ uuid = 274742a2-5e84-49b8-8446-5a8fc55f1899,
finished = True,
success = True,
return = True,
activities =
}
Procedure :
{ uuid = 408cfd6a-ff3a-493e-b39b-a3241d83fda6,
finished = True,
success = True,
return = True,
activities =
}


The global group will be used to propagate schema changes and to store unpartitioned data. Think of configuration tables that don’t need to be sharded, for example.

The other two groups will host shards, that is, tables that will have the same structure across all the nodes, but not the same data (and that will be empty in the global group’s nodes).

Now, let’s add one instance to each group:

[vagrant@store ~]$ mysqlfabric group add salaries-global node1:3306
Procedure :
{ uuid = 0d0f657c-9304-4e3f-bf5b-a63a5e2e4390,
finished = True,
success = True,
return = True,
activities =
}
[vagrant@store ~]$ mysqlfabric group add salaries-1 node2:3306
Procedure :
{ uuid = b0ee9a52-49a2-416e-bdfd-eda9a384f308,
finished = True,
success = True,
return = True,
activities =
}
[vagrant@store ~]$ mysqlfabric group add salaries-2 node3:3306
Procedure :
{ uuid = ea5d8fc5-d4f9-48b1-b349-49520aa74e41,
finished = True,
success = True,
return = True,
activities =
}

We also need to promote the groups. Even though each group has a single node, MySQL Fabric sets up that node as SECONDARY, which means it can’t take writes.

[vagrant@store ~]$ mysqlfabric group promote salaries-global
Procedure :
{ uuid = 5e764b97-281a-49f0-b486-25088a96d96b,
finished = True,
success = True,
return = True,
activities =
}
[vagrant@store ~]$ for i in 1 2; do mysqlfabric group promote salaries-$i; done
Procedure :
{ uuid = 7814e96f-71d7-4865-a278-cb6ed32a2d11,
finished = True,
success = True,
return = True,
activities =
}
Procedure :
{ uuid = cd30e9a9-b9ea-4b2d-a8ae-5e70f22363d6,
finished = True,
success = True,
return = True,
activities =
}

Finally, we are ready to create a shard definition and associate ranges to groups:

[vagrant@store ~]$ mysqlfabric sharding create_definition RANGE salaries-global
Procedure :
{ uuid = fffcbb5f-24c6-47a2-9348-f1d810c8ef2f,
finished = True,
success = True,
return = 1,
activities =
}
[vagrant@store ~]$ mysqlfabric sharding add_table 1 employees.salaries emp_no
Procedure :
{ uuid = 8d0a3c51-d543-49a6-b47a-36a4ab499ab4,
finished = True,
success = True,
return = True,
activities =
}
[vagrant@store ~]$ mysqlfabric sharding add_shard 1 "salaries-1/1, salaries-2/25000" --state=ENABLED
Procedure :
{ uuid = 2585a5ea-a097-44a4-89fa-a948298d0595,
finished = True,
success = True,
return = True,
activities =
}

The integer after each shard group is the lower bound for emp_no values found on that shard.
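That is, a key belongs to the group with the greatest lower bound not exceeding it. The routing logic for the mapping above can be sketched like this (illustrative only; the connector does this lookup for you):

```python
import bisect

# Lower bounds and groups as defined in the add_shard command above.
bounds = [1, 25000]
groups = ["salaries-1", "salaries-2"]

def route(emp_no):
    """Return the shard group whose range contains emp_no."""
    idx = bisect.bisect_right(bounds, emp_no) - 1
    if idx < 0:
        raise ValueError("key %d is below the lowest shard bound" % emp_no)
    return groups[idx]

print(route(2045))    # salaries-1 (1 <= emp_no < 25000)
print(route(142045))  # salaries-2 (emp_no >= 25000)
```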

After the last command, the shard groups should be replicating off the global one. We can verify that this is the case by checking node3 (the server in salaries-2):

[vagrant@node3 ~]$ mysql -uroot -e 'show slave status\G'
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: node1
Master_User: fabric
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 151
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 361
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 151
Relay_Log_Space: 566
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 870101
Master_UUID: e34ab4cd-00b9-11e4-8ced-0800274fb806
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 1

Looks good. Let’s go ahead and create the database schema. To avoid being too verbose, we’re only including the create statement for the salaries table in this example. Notice we run this on the PRIMARY node for the global group:

[vagrant@node1 ~]$ mysql -uroot
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.6.19-log MySQL Community Server (GPL)
Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> CREATE DATABASE IF NOT EXISTS employees;
Query OK, 1 row affected (0.01 sec)
mysql> USE employees;
Database changed
mysql> CREATE TABLE salaries (
-> emp_no INT NOT NULL,
-> salary INT NOT NULL,
-> from_date DATE NOT NULL,
-> to_date DATE NOT NULL,
-> KEY (emp_no));
Query OK, 0 rows affected (0.06 sec)

And again, check that it made it to the shard groups:

[vagrant@node2 ~]$ mysql -uroot -e 'show databases'
+--------------------+
| Database |
+--------------------+
| information_schema |
| employees |
| mysql |
| performance_schema |
+--------------------+

Good. We’re now ready to use the Python connector and load some data into this table. We’ll be using the following script:

import mysql.connector
from mysql.connector import fabric
from mysql.connector import errors
import time
import random
import datetime
config = {
    'fabric': {
        'host': 'store',
        'port': 8080,
        'username': 'admin',
        'password': 'admin',
        'report_errors': True
    },
    'user': 'fabric',
    'password': 'f4bric',
    'database': 'employees',
    'autocommit': 'true'
}
from_min = datetime.datetime(1980,1,1,00,00,00)
to_max = datetime.datetime(2014,1,1,00,00,00)
fcnx = None
print "starting loop"
while 1:
    if fcnx == None:
        print "connecting"
        fcnx = mysql.connector.connect(**config)
        fcnx.reset_cache()
    try:
        print "will run query"
        emp_no = random.randint(1,50000)
        salary = random.randint(1,200000)
        from_date = from_min + datetime.timedelta(seconds=random.randint(0, int((to_max - from_min).total_seconds())))
        to_date = from_min + datetime.timedelta(seconds=random.randint(0, int((to_max - from_min).total_seconds())))
        fcnx.set_property(tables=["employees.salaries"], key=emp_no, mode=fabric.MODE_READWRITE)
        cur = fcnx.cursor()
        cur.execute("insert into employees.salaries (emp_no,salary,from_date,to_date) values (%s, %s, %s, %s)",(emp_no,salary,from_date,to_date))
        print "inserted", emp_no, ", will now sleep 1 second"
        time.sleep(1)
    except (errors.DatabaseError, errors.InterfaceError):
        print "sleeping 1 second and reconnecting"
        time.sleep(1)
        del fcnx
        fcnx = None

This is similar to the script we used in our HA post. It inserts rows with random data in an endless loop. The sleep on every iteration is there just to make it easier to cancel the script, and to keep the row insert rate under control.

If you leave this running for a while, you should then be able to check the global server and individual shards, and confirm they have different data:

[vagrant@store ~]$ for i in 1 2 3; do mysql -ufabric -pf4bric -hnode$i -e "select count(emp_no),max(emp_no) from employees.salaries"; done
Warning: Using a password on the command line interface can be insecure.
+---------------+-------------+
| count(emp_no) | max(emp_no) |
+---------------+-------------+
|             0 |        NULL |
+---------------+-------------+
Warning: Using a password on the command line interface can be insecure.
+---------------+-------------+
| count(emp_no) | max(emp_no) |
+---------------+-------------+
|            36 |       24982 |
+---------------+-------------+
Warning: Using a password on the command line interface can be insecure.
+---------------+-------------+
| count(emp_no) | max(emp_no) |
+---------------+-------------+
|            43 |       49423 |
+---------------+-------------+

As you can see, the global group’s server has no data for this table, and each shard’s server has data within the defined boundaries.

Querying data is done similarly (though with a READ_ONLY connection), and we can also look up the group a row belongs to using the mysqlfabric utility directly:

[vagrant@store ~]$ mysqlfabric sharding lookup_servers employees.salaries 2045
Command :
{ success     = True
  return      = [['ecab7dd2-00b9-11e4-8cee-0800274fb806', 'node2:3306', True]]
  activities  =
}
[vagrant@store ~]$ mysqlfabric sharding lookup_servers employees.salaries 142045
Command :
{ success     = True
  return      = [['f8a90096-00b9-11e4-8cee-0800274fb806', 'node3:3306', True]]
  activities  =
}

Bear in mind that these lookups only use the Fabric store, which means they can tell you which servers a given row may be on, but can’t confirm whether the row exists. You need to actually query the given servers for that. If you use the connector, both steps are done for you when you issue the query.

The following code snippets illustrate the point:

>>> fcnx = mysql.connector.connect(**config)
>>> emp_no = random.randint(1,50000)
>>> fcnx.set_property(tables=["employees.salaries"], key=emp_no, mode=fabric.MODE_READONLY)
>>>
>>> cur = fcnx.cursor()
>>> cur.execute("select count(*) as cnt from employees.salaries where emp_no = %s", (emp_no,))
>>>
>>> for row in cur:
...     print row
...
(0,)
>>> fcnx.set_property(tables=["employees.salaries"], key=20734, mode=fabric.MODE_READONLY)
>>> cur = fcnx.cursor()
>>> cur.execute("select count(*) as cnt from employees.salaries where emp_no = 20734")
>>> for row in cur:
...     print row
...
...
(1,)
>>>

In our examples, we connected directly to the PRIMARY node of the global group in order to execute DDL statements, but the same can be done by requesting a global connection from MySQL Fabric, like so:

fcnx.set_property(group="salaries-global",scope=fabric.SCOPE_GLOBAL,mode=fabric.MODE_READWRITE)
cur = fcnx.cursor()
cur.execute("USE employees")
cur.execute("CREATE TABLE test (id int not null primary key)")

We can see that the table gets replicated as expected:

[vagrant@node3 ~]$ mysql -uroot -e 'show tables from employees'
+---------------------+
| Tables_in_employees |
+---------------------+
| salaries            |
| test                |
+---------------------+

Note that we’re explicitly indicating we want to connect to the global group here. When establishing a MySQL Fabric connection, we need to specify either a group name or a key and table pair (as in the insert example).

Summary

Today we’ve presented the basics of how MySQL Fabric can help you scale out by sharding, but we’ve intentionally left a few things out of the picture to keep this example simple.

In our next post, we’ll see how we can combine MySQL Fabric’s HA and sharding features, what support we have for shard operations and how HASH sharding works in MySQL Fabric.

The post Managing shards of MySQL databases with MySQL Fabric appeared first on MySQL Performance Blog.

May 29, 2014
--

High Availability with MySQL Fabric: Part II

This is the third post in our MySQL Fabric series. If you missed the previous two, we started with an overall introduction, and then a discussion of MySQL Fabric’s high-availability (HA) features. MySQL Fabric was RC when we started this series, but it went GA recently. You can read the press release here, and see this blog post from Oracle’s Mats Kindahl for more details.

In our previous post, we showed a simple HA setup managed with MySQL Fabric, including some basic failure scenarios. Today, we’ll present a similar scenario from an application developer’s point of view, using the Python connector for the examples.

If you’re following the examples in these posts, you’ll notice that the UUIDs for servers change between them. That’s because we rebuild the environment between runs. Symbolic names stay the same, though. That said, here’s our usual 3-node setup:

[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster
Command :
{ success     = True
  return      = [{'status': 'SECONDARY', 'server_uuid': '3084fcf2-df86-11e3-b46c-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': '192.168.70.101'}, {'status': 'SECONDARY', 'server_uuid': '35cc3529-df86-11e3-b46c-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': '192.168.70.102'}, {'status': 'PRIMARY', 'server_uuid': '3d3f6cda-df86-11e3-b46c-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': '192.168.70.103'}]
  activities  =
}

For our tests, we will be using this simple script:

import mysql.connector
from mysql.connector import fabric
from mysql.connector import errors
import time
config = {
    'fabric': {
        'host': '192.168.70.100',
        'port': 8080,
        'username': 'admin',
        'password': 'admin',
        'report_errors': True
    },
    'user': 'fabric',
    'password': 'f4bric',
    'database': 'test',
    'autocommit': 'true'
}
fcnx = None
print "starting loop"
while 1:
    if fcnx == None:
        print "connecting"
        fcnx = mysql.connector.connect(**config)
        fcnx.set_property(group='mycluster', mode=fabric.MODE_READWRITE)
    try:
        print "will run query"
        cur = fcnx.cursor()
        cur.execute("select id, sleep(0.2) from test.test limit 1")
        for (id) in cur:
            print id
        print "will sleep 1 second"
        time.sleep(1)
    except errors.DatabaseError:
        print "sleeping 1 second and reconnecting"
        time.sleep(1)
        del fcnx
        fcnx = mysql.connector.connect(**config)
        fcnx.set_property(group='mycluster', mode=fabric.MODE_READWRITE)
        fcnx.reset_cache()
        try:
            cur = fcnx.cursor()
            cur.execute("select 1")
        except errors.InterfaceError:
            fcnx = mysql.connector.connect(**config)
            fcnx.set_property(group='mycluster', mode=fabric.MODE_READWRITE)
            fcnx.reset_cache()

This simple script requests a MODE_READWRITE connection and then issues SELECTs in a loop. The reason it requests a RW connection is that it makes it easier for us to provoke a failure, as we have two SECONDARY nodes that could be used for queries if we requested a MODE_READONLY connection. The SELECT includes a short sleep to make it easier to catch in SHOW PROCESSLIST.

In order to work, this script needs the test.test table to exist in the mycluster group. Running the following statements on the PRIMARY node will create it:

mysql> create database if not exists test;
mysql> create table if not exists test.test (id int unsigned not null auto_increment primary key) engine = innodb;
mysql> insert into test.test values (null);

Dealing with failure

With everything set up, we can start the script and then cause a PRIMARY failure. In this case, we’ll simulate a failure by shutting down mysqld on it:

mysql> select @@hostname;
+-------------+
| @@hostname  |
+-------------+
| node3.local |
+-------------+
1 row in set (0.00 sec)
mysql> show processlist;
+----+--------+--------------------+------+------------------+------+-----------------------------------------------------------------------+----------------------------------------------+
| Id | User   | Host               | db   | Command          | Time | State                                                                 | Info                                         |
+----+--------+--------------------+------+------------------+------+-----------------------------------------------------------------------+----------------------------------------------+
|  5 | fabric | store:39929        | NULL | Sleep            |  217 |                                                                       | NULL                                         |
|  6 | fabric | node1:37999        | NULL | Binlog Dump GTID |  217 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL                                         |
|  7 | fabric | node2:49750        | NULL | Binlog Dump GTID |  216 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL                                         |
| 16 | root   | localhost          | NULL | Query            |    0 | init                                                                  | show processlist                             |
| 20 | fabric | 192.168.70.1:55889 | test | Query            |    0 | User sleep                                                            | select id, sleep(0.2) from test.test limit 1 |
+----+--------+--------------------+------+------------------+------+-----------------------------------------------------------------------+----------------------------------------------+
5 rows in set (0.00 sec)
[vagrant@node3 ~]$ sudo service mysqld stop
Stopping mysqld:                                           [  OK  ]

While this happens, here’s the output from the script:

will sleep 1 second
will run query
(1, 0)
will sleep 1 second
will run query
(1, 0)
will sleep 1 second
will run query
(1, 0)
will sleep 1 second
will run query
sleeping 1 second and reconnecting
will run query
(1, 0)
will sleep 1 second
will run query
(1, 0)
will sleep 1 second
will run query
(1, 0)

The ‘sleeping 1 second and reconnecting’ line means the script got an exception while running a query (when the PRIMARY node was stopped), waited one second and then reconnected. The next lines confirm that everything went back to normal after the reconnection. The relevant piece of code that handles the reconnection is this:

fcnx = mysql.connector.connect(**config)
fcnx.set_property(group='mycluster', mode=fabric.MODE_READWRITE)
fcnx.reset_cache()

If fcnx.reset_cache() is not invoked, the driver won’t contact the XML-RPC server again, but will use its local cache of the group’s status instead. As the PRIMARY node is offline, this will cause the reconnect attempt to fail. By resetting the cache, we force the driver to connect to the XML-RPC server and fetch up-to-date group status information. If more failures happen and there is no PRIMARY (or candidate for promotion) node in the group, the following error is received:

will run query
(1, 0)
will sleep 1 second
will run query
sleeping 1 second and reconnecting
will run query
Traceback (most recent call last):
  File "./reader_test.py", line 34, in
    cur = fcnx.cursor()
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 1062, in cursor
    self._connect()
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 1012, in _connect
    exc))
mysql.connector.errors.InterfaceError: Error getting connection: No MySQL server available for group 'mycluster'
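The retry pattern the script uses can be expressed generically. Below is a minimal, self-contained sketch of it; connect and run_query are hypothetical stand-ins for the connector calls shown above (including the reset_cache() step, which must happen on every reconnection):

```python
import time

def query_with_retry(connect, run_query, max_attempts=3, delay=1):
    """Run a query, reconnecting after a failure the way the test
    script above does. `connect` must return a fresh connection with
    the connector's cache reset; `run_query` raises on server failure."""
    cnx = connect()
    for attempt in range(max_attempts):
        try:
            return run_query(cnx)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up: no PRIMARY came back in time
            print("sleeping %d second and reconnecting" % delay)
            time.sleep(delay)
            cnx = connect()  # fresh connection, cache reset

# Stub demonstration: the first query fails (PRIMARY stopped), the
# retry succeeds against the newly promoted PRIMARY.
state = {"failed_once": False}

def fake_connect():
    return "connection"

def fake_query(cnx):
    if not state["failed_once"]:
        state["failed_once"] = True
        raise RuntimeError("MySQL server has gone away")
    return (1, 0)

result = query_with_retry(fake_connect, fake_query, delay=0)
print(result)  # (1, 0)
```

If every attempt fails, the last exception propagates, which matches the InterfaceError traceback shown above.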

Running without MySQL Fabric

As we have discussed in previous posts, the XML-RPC server can become a single point of failure under certain circumstances. Specifically, there are at least two problem scenarios when this server is down:

  • When a node goes down
  • When new connection attempts are made

The first case is obvious enough: if MySQL Fabric is not running and a node fails, no action will be taken, and clients will get an error whenever they send a query to the failed node. This is worse if the PRIMARY fails, as no failover will happen and the cluster will be unavailable for writes. The second case means that while MySQL Fabric is not running, no new connections to the group can be established. This is because when connecting to a group, MySQL Fabric-aware clients first connect to the XML-RPC server to get a list of nodes and roles, and only then use their local cache for decisions. One way to mitigate this is to use connection pooling, which reduces the need to establish new connections, and therefore minimises the chance of failure due to MySQL Fabric being down. This, of course, assumes that something is monitoring MySQL Fabric, ensuring some host provides the XML-RPC service. If that is not the case, failure will be delayed, but it will eventually happen anyway. Here is an example of what happens when MySQL Fabric is down and the PRIMARY node goes down:

Traceback (most recent call last):
  File "./reader_test.py", line 35, in
    cur.execute("select id, sleep(0.2) from test.test limit 1")
  File "/Library/Python/2.7/site-packages/mysql/connector/cursor.py", line 491, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 1144, in cmd_query
    self.handle_mysql_error(exc)
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 1099, in handle_mysql_error
    self.reset_cache()
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 832, in reset_cache
    self._fabric.reset_cache(group=group)
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 369, in reset_cache
    self.get_group_servers(group, use_cache=False)
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 478, in get_group_servers
    inst = self.get_instance()
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 390, in get_instance
    if not inst.is_connected:
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 772, in is_connected
    self._proxy._some_nonexisting_method()  # pylint: disable=W0212
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 272, in request
    raise InterfaceError("Connection with Fabric failed: " + msg)
mysql.connector.errors.InterfaceError: Connection with Fabric failed:

This happens when a new connection attempt is made after resetting the local cache.
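To make the pooling mitigation concrete, here is a toy, self-contained pool sketch (the factory argument is a hypothetical stand-in for a Fabric-aware connect call): a warm pool keeps handing out established connections, so the XML-RPC server is only needed when the pool runs dry.

```python
import queue

class ConnectionPool:
    """Toy connection pool: reuse pooled connections and call the
    factory (which needs MySQL Fabric) only when the pool is empty."""

    def __init__(self, factory, size=5):
        self._factory = factory
        self._pool = queue.Queue(maxsize=size)

    def get(self):
        try:
            return self._pool.get_nowait()  # no Fabric lookup needed
        except queue.Empty:
            return self._factory()          # new connection: asks Fabric

    def put(self, cnx):
        try:
            self._pool.put_nowait(cnx)
        except queue.Full:
            pass  # a real pool would close the surplus connection

# The factory is only invoked once; the second get() reuses c1, so it
# would succeed even if MySQL Fabric were down at that moment.
factory_calls = []
pool = ConnectionPool(lambda: factory_calls.append(1) or "cnx")
c1 = pool.get()
pool.put(c1)
c2 = pool.get()
```

A real pool (such as the one built into Connector/Python) adds health checks and sizing, but the availability property is the same: established connections keep working without consulting Fabric.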

Making sure MySQL Fabric stays up

As of this writing, it is the user’s responsibility to make sure MySQL Fabric is up and running. This means you can use whatever you feel comfortable with in terms of HA, like Pacemaker. While this does add some complexity to the setup, the XML-RPC server is very simple to manage, so a simple resource manager should work. For the backend, MySQL Fabric is storage engine agnostic, so an easy way to resolve this could be to use a small MySQL Cluster setup to ensure the backend is available. The MySQL team blogged about such a setup here. We think the ndb approach is probably the simplest for providing HA at the MySQL Fabric store level, but believe that MySQL Fabric itself should provide, or make it easy to achieve, HA at the XML-RPC server level. If ndb is used as the store, any node can take a write, which in turn means multiple XML-RPC instances should be able to write to the store concurrently. In theory, then, improving this could be as easy as allowing Fabric-aware drivers to get a list of Fabric servers instead of a single IP and port to connect to.
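To illustrate the limitation: a Fabric-aware Connector/Python connection is configured with a single Fabric endpoint, as in the hypothetical configuration below (the host name comes from our lab, 32274 is MySQL Fabric’s default XML-RPC port, and the credentials are made up):

```python
# One 'fabric' endpoint only: there is currently no way to pass a
# list of Fabric servers, so this host is a single point of failure.
FABRIC_CONFIG = {
    "fabric": {"host": "store", "port": 32274},
    "user": "fabric_user",      # hypothetical credentials
    "password": "secret",
    "database": "test",
}
```

If the drivers accepted a list of endpoints here, an idle XML-RPC instance could be tried whenever the first one is unreachable.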

What’s next

In the past two posts, we’ve presented MySQL Fabric’s HA features, seen how it handles failures at the node level, how to use MySQL databases with a MySQL Fabric-aware driver, and what remains unresolved for now. In our next post, we’ll review MySQL Fabric’s Sharding features.

The post High Availability with MySQL Fabric: Part II appeared first on MySQL Performance Blog.

May
15
2014
--

High Availability with MySQL Fabric: Part I

In our previous post, we introduced the MySQL Fabric utility and said we would dig deeper into it. This post is the first part of our test of MySQL Fabric’s High Availability (HA) functionality.

Today, we’ll review MySQL Fabric’s HA concepts, and then walk you through the setup of a 3-node cluster with one Primary and two Secondaries, doing a few basic tests with it. In a second post, we will spend more time generating failure scenarios and documenting how Fabric handles them. (MySQL Fabric is an extensible framework to manage large farms of MySQL servers, with support for high-availability and sharding.)

Before we begin, we recommend you read this post by Oracle’s Mats Kindahl, which, among other things, addresses the issues we raised on our first post. Mats leads the MySQL Fabric team.

Our lab

All our tests will use our test environment with Vagrant (https://github.com/martinarrieta/vagrant-fabric).

If you want to play with MySQL Fabric, you can have these VMs running in your desktop following the instructions in the README file. If you don’t want full VMs, our colleague Jervin Real created a set of wrapper scripts that let you test MySQL Fabric using sandboxes.

Here is a basic representation of our environment.

Fabric Lab

Set up

To set up MySQL Fabric without using our Vagrant environment, you can follow the official documentation, or check the ansible playbooks in our lab repo. If you follow the manual, the only caveat is that when creating the user, you should either disable binary logging for your session, or use a GRANT statement instead of CREATE USER. You can read here for more info on why this is the case.

A description of all the options in the configuration file can be found here. For HA tests, the one thing to mention is that, in our experience, the failure detector will only trigger an automatic failover if the value for failover_interval in the [failure_tracking] section is greater than 0. Otherwise, failures will be detected and written to the log, but no action will be taken.
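For reference, this is the fragment of fabric.cfg involved; 30 is just an example value (any value greater than 0 enables automatic failover, and 0 disables it):

```ini
[failure_tracking]
failover_interval = 30
```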

MySQL configuration

In order to manage a mysqld instance with MySQL Fabric, the following options need to be set in the [mysqld] section of its my.cnf file:

log_bin
gtid-mode=ON
enforce-gtid-consistency
log_slave_updates

Additionally, as in any replication setup, you must make sure that all servers have a distinct server_id.
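Putting these together, the [mysqld] section for one of the nodes could look like this (the server_id value shown is just an example; it must be different on every server):

```ini
[mysqld]
server_id = 1
log_bin
gtid-mode = ON
enforce-gtid-consistency
log_slave_updates
```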

When everything is in place, you can setup and start MySQL Fabric with the following commands:

[vagrant@store ~]$ mysqlfabric manage setup
[vagrant@store ~]$ mysqlfabric manage start --daemon

The setup command creates the database schema used by MySQL Fabric to store information about managed servers, and the start command starts the daemon. The --daemon option makes Fabric run as a daemon, logging to a file instead of to standard output. Depending on the port and file name you configured in fabric.cfg, this may need to be run as root.

While testing, you can make MySQL Fabric reset its state at any time (though it won’t change existing node configurations such as replication) by running:

[vagrant@store ~]$ mysqlfabric manage teardown
[vagrant@store ~]$ mysqlfabric manage setup

If you’re using our Vagrant environment, you can run the reinit_cluster.sh script from your host OS (from the root of the vagrant-fabric repo) to do this for you, and also initialise the datadir of the three instances.

Creating a High Availability Cluster:

A High Availability Cluster is a set of servers using standard asynchronous MySQL replication with GTID.

Creating a group

The first step is to create the group by running mysqlfabric with this syntax:

$ mysqlfabric group create <group_name>

In our example, to create the cluster “mycluster” you can run:

[vagrant@store ~]$ mysqlfabric group create mycluster
Procedure :
{ uuid        = 605b02fb-a6a1-4a00-8e24-619cad8ec4c7,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}

Add the servers to the group

The second step is to add the servers to the group. The syntax to add a server to a group is:

$ mysqlfabric group add <group_name> <host_name or IP>[:port]

The port number is optional and only required if distinct from 3306. It is important to mention that the clients that will use this cluster must be able to resolve this host or IP, because clients connect directly both to MySQL Fabric’s XML-RPC server and to the managed mysqld servers. Let’s add the nodes to our group.

[vagrant@store ~]$ for i in 1 2 3; do mysqlfabric group add mycluster node$i; done
Procedure :
{ uuid        = 9d65c81c-e28a-437f-b5de-1d47e746a318,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
Procedure :
{ uuid        = 235a7c34-52a6-40ad-8e30-418dcee28f1e,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
Procedure :
{ uuid        = 4da3b1c3-87cc-461f-9705-28a59a2a4f67,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}

Promote a node as a master

Now that we have all our nodes in the group, we have to promote one of them. You can promote one specific node, or you can let MySQL Fabric choose one for you.

The syntax to promote a specific node is:

$ mysqlfabric group promote <group_name> --slave_uuid='<node_uuid>'

or to let MySQL Fabric pick one:

$ mysqlfabric group promote <group_name>

Let’s do that:

[vagrant@store ~]$ mysqlfabric group promote mycluster
Procedure :
{ uuid        = c4afd2e7-3864-4b53-84e9-04a40f403ba9,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}

You can then check the health of the group like this:

[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success     = True
  return      = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities  =
}

One current limitation of the ‘health’ command is that it only identifies servers by their UUID. To get a list of the servers in a group, along with a quick status summary and their host names, use lookup_servers instead:

[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster
Command :
{ success     = True
  return      = [{'status': 'SECONDARY', 'server_uuid': 'e245ec83-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node1'}, {'status': 'SECONDARY', 'server_uuid': 'e826d4ab-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node2'}, {'status': 'PRIMARY', 'server_uuid': 'edf2c45b-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node3'}]
  activities  =
}

We sent a merge request to use a JSON string instead of the printed representation of the object in the “return” field of the XML-RPC response, so that this information can be used to display the results in a friendly way. In the same merge, we also added the address of the servers to the health command.

Failure detection

Now we have the three lab machines set up in a replication topology of one master (the PRIMARY server) and two slaves (the SECONDARY ones). To make MySQL Fabric start monitoring the group for problems, you need to activate it:

[vagrant@store ~]$ mysqlfabric group activate mycluster
Procedure :
{ uuid        = 230835fc-6ec4-4b35-b0a9-97944c18e21f,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}

Now MySQL Fabric will monitor the group’s servers, and depending on the configuration (remember the failover_interval we mentioned before) it may trigger an automatic failover. But let’s start testing a simpler case, by stopping mysql on one of the secondary nodes:

[vagrant@node2 ~]$ sudo service mysqld stop
Stopping mysqld:                                           [  OK  ]

And checking how MySQL Fabric reports the group’s health after this:

[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success     = True
  return      = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'FAULTY', 'is_alive': False, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities  =
}

We can see that MySQL Fabric successfully marks the server as faulty. In our next post we’ll show an example of this by using one of the supported connectors to handle failures in a group, but for now, let’s keep on the DBA/sysadmin side of things, and try to bring the server back online:

[vagrant@node2 ~]$ sudo service mysqld start
Starting mysqld:                                           [  OK  ]
[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success     = True
  return      = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'FAULTY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities  =
}

So the server is back online, but Fabric still considers it faulty. To add the server back into rotation, we need to look at the server commands:

[vagrant@store ~]$ mysqlfabric help server
Commands available in group 'server' are:
    server set_weight uuid weight  [--synchronous]
    server lookup_uuid address
    server set_mode uuid mode  [--synchronous]
    server set_status uuid status  [--update_only] [--synchronous]

The specific command we need is set_status, and in order to add the server back to the group, we need to change its status twice: first to SPARE and then back to SECONDARY. You can see what happens if we try to set it to SECONDARY directly:

[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SECONDARY
Procedure :
{ uuid        = 9a6f2273-d206-4fa8-80fb-6bce1e5262c8,
  finished    = True,
  success     = False,
  return      = ServerError: Cannot change server's (e826d4ab-d889-11e3-86df-0800274fb806) status from (FAULTY) to (SECONDARY).,
  activities  =
}

So let’s try it the right way:

[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SPARE
Procedure :
{ uuid        = c3a1c244-ea8f-4270-93ed-3f9dfbe879ea,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SECONDARY
Procedure :
{ uuid        = 556f59ec-5556-4225-93c9-b9b29b577061,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}

And check the group’s health again:

[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success     = True
  return      = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities  =
}

In our next post, when we discuss how to use the Fabric-aware connectors, we’ll also test other failure scenarios like hard VM shutdowns and network errors, but for now, let’s try the same thing on the PRIMARY node instead:

[vagrant@node3 ~]$ sudo service mysqld stop
Stopping mysqld:                                           [  OK  ]

And let’s check the servers again:

[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster
Command :
{ success     = True
  return      = [{'status': 'SECONDARY', 'server_uuid': 'e245ec83-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node1'}, {'status': 'PRIMARY', 'server_uuid': 'e826d4ab-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node2'}, {'status': 'FAULTY', 'server_uuid': 'edf2c45b-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node3'}]
  activities  =
}

We can see that MySQL Fabric successfully marked node3 as FAULTY and promoted node2 to PRIMARY to resolve this. Once we start mysqld again on node3, we can add it back as SECONDARY using the same process of setting its status to SPARE first, as we did for node2 above.

Remember that unless failover_interval is greater than 0, MySQL Fabric will detect problems in an active group, but it won’t take any automatic action. We think it’s a good thing that the value for this variable in the documentation is 0, so that automatic failover is not enabled by default (if people follow the manual, of course), as even in mature HA solutions like Pacemaker, automatic failover is tricky to get right. But even without it, we believe the main benefit of using MySQL Fabric for promotion is that it takes care of reconfiguring replication for you, which should reduce the risk of error in this process, especially once the project becomes GA.

What’s next

In this post we’ve presented a basic replication setup managed by MySQL Fabric and reviewed a couple of failure scenarios, but many questions are left unanswered, among them:

  • What happens to clients connected with a Fabric-aware driver when there is a status change in the cluster?
  • What happens when the XML-RPC server goes down?
  • How can we improve its availability?

We’ll try to answer these and other questions in our next post. If you have some questions of your own, please leave them in the comments section and we’ll address them in the next or other posts, depending on the topic.

The post High Availability with MySQL Fabric: Part I appeared first on MySQL Performance Blog.

Apr
24
2014
--

Managing farms of MySQL servers with MySQL Fabric

While built-in replication has been a major cause for MySQL’s wide adoption, official tools to help DBAs manage replication topologies have typically been missing from the picture. The community has produced many good products to fill in this gap, but recently, Oracle has been filling it too with the addition of MySQL Utilities to the mix.

One part of the Utilities that has been generating interest recently is MySQL Fabric, and we will be discussing this project in an upcoming series of blog posts.

According to the official documentation, MySQL Fabric is a system for managing farms of MySQL Servers. At the current stage, the system focuses on two areas of MySQL operations: High Availability and Sharding, relying on GTID based replication (available only on MySQL >= 5.6.5) for the former. Its development has been spearheaded by Mats Kindahl, senior principal software developer in MySQL at Oracle, who explains what it is in this post and again in this short video interview at Percona Live earlier this month.

We will start this series of posts by providing a broad overview of the project, and then we’ll dig deeper on each area on subsequent posts.

What it does

Currently, MySQL Fabric has two areas of server management in which it can help DBAs: High Availability and Sharding.

In terms of High Availability, MySQL Fabric will let you pool a group of MySQL servers and treat them as a single logical unit, with a Primary server that can take reads and writes, and Secondary servers that can take reads (and be used to scale those) as well as take over the Primary role in the event of a failure.

As we’ll see, MySQL Fabric relies on GTID based replication to work. By default, the tool won’t do any automatic failover, but it can be configured to do that, and it does work. In future posts of this series, we’ll spend some time explaining how to set up an HA group of MySQL servers using MySQL Fabric, and then we’ll set to try and break them in many ways. Stay tuned if you’re interested in the results :)

When it comes to Sharding, MySQL Fabric relies on special connectors to work. Without the connectors, you can still use the mysqlfabric command line tool to manage shards (and even migrate them from one server to another), but you will have to modify your application to include logic to decide which shard should be used for any given query. When using the connectors, the MySQL connection will actually be established with MySQL Fabric (via an XML-RPC service we’ll talk about later). The connectors cache the needed information (this is a simplification; we’ll go into more detail in the next post) so that the client can make the right routing decision when reading or writing data to a sharded table.
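Conceptually, the routing decision a connector makes from its cached information is straightforward. Here is a self-contained sketch of a RANGE-based shard lookup (the bounds and group names are made up for illustration; a real connector fetches this mapping from MySQL Fabric’s XML-RPC server):

```python
import bisect

# Cached shard map for a RANGE-sharded table: each shard covers keys
# from its lower bound up to the next shard's lower bound.
SHARD_LOWER_BOUNDS = [1, 10000, 20000]          # hypothetical bounds
SHARD_GROUPS = ["shard1", "shard2", "shard3"]   # hypothetical groups

def group_for_key(key):
    """Return the HA group whose shard holds `key`: the shard with
    the greatest lower bound that is <= key."""
    idx = bisect.bisect_right(SHARD_LOWER_BOUNDS, key) - 1
    if idx < 0:
        raise ValueError("key is below the first shard's lower bound")
    return SHARD_GROUPS[idx]

print(group_for_key(5))      # shard1
print(group_for_key(15000))  # shard2
```

Once the right group is known, the connector then picks a server within it based on the requested mode (read-write goes to the Primary, read-only can go to a Secondary).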

We feel that the HA features are more mature now, but the Sharding ones look promising. Also, MySQL Fabric is extensible, and we think this is one area where it can grow a lot (for example, by using Shard Query with MySQL Fabric to support parallel distributed queries).

General Concepts

In order to understand MySQL Fabric we first need to present some terminology used by the project. We’ll start by listing basic definitions, and then we’ll go into more details when needed.

  • Group. A collection of mysqld servers.
  • Global group. Special groups that store updates that must be propagated to all shards.
  • Node. A running instance of MySQL Fabric.
  • Shard. A horizontal partition of data in a table.
  • Primary. A group member that has been designated master.
  • Secondary. A group member that is read only.

Group

A server here is actually a mysqld instance, though ideally, all instances of a group should be on different machines. However, while testing, you can create multiple instances on the same host, since you don’t really need HA.

A given server can only be part of a single group. This may seem confusing at first, but when you realize MySQL Fabric relies on replication (using GTID) for most of its work, it becomes clearer. A given MySQL server can only have one master and therefore it makes no sense for it to belong to multiple groups.

Groups have identifiers, which are just symbolic names that need to comply with some basic rules.

Global group

Global groups are special groups involved in Sharding. They are used to propagate changes that are global to all shards (like schema changes, though not only those).

Node

Note that a node is *not* a MySQL server that’s part of a group. It is a python program that, among other things, provides the XML-RPC server that is used by special connectors and by the ‘mysqlfabric’ command line client. A node will, however, need a mysqld instance. This instance is called the backend store and will be used by MySQL Fabric to save all the information it needs to manage servers.

Shard

We said a shard is a horizontal partition of data in a table, but this partitioning happens at the MySQL Fabric level. MySQL itself is completely unaware of it: at the mysqld instance level, a shard is just a table. We’ll talk more about this soon, as it has some consequences.

Primary

The primary server is the only writable server in a group. This applies to HA, not to sharding, though you could define a group (and therefore a Primary) per shard and therefore use MySQL Fabric both for sharding, and to provide HA for each shard.

Secondary

A Secondary server is a member of a group that is available to replace the Primary server on failover, and that is read only. It can also be used to scale out reads.

Highlights 

As with anything, MySQL Fabric has its good and bad points. Fortunately, the bulk of the bad points we identified are due to the project still being early in its lifecycle. Considering the latest release is an RC, we’re sure those will go away in the future.

On the good side:

  • It is developed by Oracle.

We think this is important, because MySQL did not have a standard tool to manage farms of servers until now. People will still be able to use tools from other providers (or cook their own), but we think it’s good to have a standard offering that’s part of the MySQL packages.

  • It is extensible.

MySQL Fabric feels more like a framework than a closed product. In fact, even some official presentations introduce it as a framework. As a framework, it is implemented in python, a widely available and friendly interpreted language. We believe this means MySQL Fabric should be adaptable to specific needs with little hassle.

  • It is focused on MySQL versions 5.6.10 and newer.

By not worrying about backwards compatibility, the implementation can be simpler. Take Secondary server promotion after the Primary goes down as an example: GTID makes it much easier.

On the bad side:

  • It is a bit rough around the edges.

This is expected for a new product, and to be honest, most problems we faced turned out being a documentation issue and not an actual problem with the software. The docs are still a bit green, but source code comments are good and plenty, so if you want to get your hands dirty and really understand how it works, that is the path we suggest.

  • Some things require help from MySQL to be properly implemented, and MySQL does not provide it (yet?).

An example: it is currently impossible to make sure that writes can never go to the wrong shard. As we said earlier, at the individual mysqld server, a shard is just a table, period. Ideally, some future MySQL version should extend the @read_only variable functionality and allow you to selectively mark parts of the data as read only. If we could mark a specific table as read only, or as updatable only when a given condition is met (i.e. WHERE id between <lower bound> and <upper bound>), it would increase the safety of sharding a lot.

  • It is focused on MySQL versions 5.6.10 and newer.

Yes, we said that’s good, but the obvious downside is you can’t consider using MySQL Fabric if you’re on an earlier MySQL version, and a lot of people still are.

  • In HA setups, MySQL Fabric itself can become a single point of failure

MySQL Fabric (the XML-RPC ‘Node’) and its data store (the mysqld instance that stores MySQL Fabric’s data) are single points of failure that need to be addressed. In practical terms, the impact of MySQL Fabric going down will vary with your use case. If you’re only using the mysqlfabric utility to manage servers, nothing will happen as long as all servers in a Group continue to work. If, however, you’re using one of the special connectors to access the Group, then your application will be down. This is resolvable, and we will discuss some approaches in the HA posts, but we think the best solution going forward is for MySQL Fabric to address this by letting you set up multiple Fabric Nodes and have them monitor each other, promoting a new active one if needed.

What comes next

Over the next few days, we will blog more about this, describing how to set up HA and Sharded clusters using MySQL Fabric, walking you through the process, and also trying to see how and when it can fail. We had a lot of fun doing this and we hope you have a good time reading it and experimenting yourself too!

The post Managing farms of MySQL servers with MySQL Fabric appeared first on MySQL Performance Blog.
