Sep
16
2016
--

Consul, ProxySQL and MySQL HA

ProxySQL

When it comes to “decision time” about which type of MySQL HA (high-availability) solution to implement, and how to architect the solution, many questions come to mind. The most important questions are:

  • “What are the best tools to provide HA and Load Balancing?”
  • “Should I be deploying this proxy tool on my application servers or on a standalone server?”

Ultimately, the best tool really depends on the needs of your application and your environment. You might already be using specific tools such as Consul or MHA, or you might be looking to implement tools that provide richer features. The dilemma of deploying a proxy instance per application host versus a standalone proxy instance is usually a trade-off between “a less effective load balancing algorithm” and “a single point of failure.” Neither is desirable, but there are ways to implement a solution that balances all aspects.

In this article, we’ll go through a solution that is suitable for an application that has not been coded to split reads and writes over separate MySQL instances. An application like this relies on a proxy or third-party tool to split reads/writes, and preferably on a solution that offers high availability at the proxy layer. The solution described here is comprised of ProxySQL, Consul and Master High Availability (MHA). Within this article, we’ll focus on the configuration required for ProxySQL and Consul, since there are many articles that cover MHA configuration (such as Miguel’s recent MHA Quick Start Guide blog post).

When deploying Consul in production, a minimum of three instances is recommended – in this example, the Consul agents run on the application server (appserver) as well as on the two “ProxySQL servers” mysql1 and mysql2 (which act as the HA proxy pair). This is not a hard requirement, and these instances can easily run on other hosts or in Docker containers. MySQL is deployed locally on mysql1 and mysql2, however this could just as well be 1..n separate standalone DB server instances:

Consul ProxySQL

So let’s move on to the actual configuration of this HA solution, starting with Consul.

Installation of Consul:

Firstly, we’ll need to install the required packages, download the Consul archive and perform the initial configuration. We’ll need to perform the same installation on each of the nodes (i.e., appserver, mysql1 and mysql2).

### Install pre-requisite packages:
sudo yum -y install wget unzip bind-utils dnsmasq
### Install Consul:
sudo useradd consul
sudo mkdir -p /opt/consul /etc/consul.d
sudo touch /var/log/consul.log /etc/consul.d/proxysql.json
cd /opt/consul
sudo wget https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_linux_amd64.zip
sudo unzip consul_0.6.4_linux_amd64.zip
sudo ln -s /opt/consul/consul /usr/bin/consul
sudo chown consul:consul -R /etc/consul* /opt/consul* /var/log/consul.log
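
Before moving on, it is worth confirming on each node that the binary is reachable and matches the downloaded release (a quick sanity check):

### Verify the Consul binary
consul version
### The reported version should match the archive downloaded above (0.6.4)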

Configuration of Consul on Application Server (used as ‘bootstrap’ node):

Now that we’re done with the installation on each of the hosts, let’s continue with the configuration. In this example we’ll bootstrap the Consul cluster using “appserver”:

### Edit configuration files
$ sudo vi /etc/consul.conf
{
  "datacenter": "dc1",
  "data_dir": "/opt/consul/",
  "log_level": "INFO",
  "node_name": "agent1",
  "server": true,
  "ui": true,
  "bootstrap": true,
  "client_addr": "0.0.0.0",
  "advertise_addr": "192.168.1.119"  ## Add server IP here
}
######
$ sudo vi /etc/consul.d/proxysql.json
{"services": [
  {
   "id": "proxy1",
   "name": "proxysql",
   "address": "192.168.1.120",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.120 --port=6033 --user=root --password=123",
     "interval": "3s"}
   },
  {
   "id": "proxy2",
   "name": "proxysql",
   "address": "192.168.1.121",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.121 --port=6033 --user=root --password=123",
     "interval": "3s"}
   }
 ]
}
######
### Start Consul agent
$ sudo su - consul -c 'consul agent -config-file=/etc/consul.conf -config-dir=/etc/consul.d > /var/log/consul.log &'
### Setup DNSMASQ (as root)
echo "server=/consul/127.0.0.1#8600" > /etc/dnsmasq.d/10-consul
service dnsmasq restart
### Remember to add the localhost as a DNS server (this step can vary
### depending on how your DNS servers are managed... here I'm just
### adding the following line to resolv.conf:
sudo vi /etc/resolv.conf
#... snippet ...#
nameserver 127.0.0.1
#... snippet ...#
### Restart dnsmasq
sudo service dnsmasq restart

The service should now be started, and you can verify this in the logs in “/var/log/consul.log”.
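
You can also confirm that Consul’s own DNS interface and the dnsmasq forwarding both answer locally (a quick check; port 8600 is Consul’s DNS port, port 53 is dnsmasq):

### Query Consul's DNS interface directly
dig @127.0.0.1 -p 8600 consul.service.consul +short
### Query through dnsmasq - it should return the same records
dig @127.0.0.1 -p 53 consul.service.consul +short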

Configuration of Consul on Proxy Servers:

The next item is to configure each of the proxy Consul agents. Note that the “agent name” and the “IP address” need to be updated for each host (values for both must be unique):

### Edit configuration files
$ sudo vi /etc/consul.conf
{
  "datacenter": "dc1",
  "data_dir": "/opt/consul/",
  "log_level": "INFO",
  "node_name": "agent2",  ### Agent node name must be unique
  "server": true,
  "ui": true,
  "bootstrap": false,   ### Disable bootstrap on joiner nodes
  "client_addr": "0.0.0.0",
  "advertise_addr": "192.168.1.xxx",  ### Set to local instance IP
  "dns_config": {
    "only_passing": true
  }
}
######
$ sudo vi /etc/consul.d/proxysql.json
{"services": [
  {
   "id": "proxy1",
   "name": "proxysql",
   "address": "192.168.1.120",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.120 --port=6033 --user=root --password=123",
     "interval": "3s"}
   },
  {
   "id": "proxy2",
   "name": "proxysql",
   "address": "192.168.1.121",
   "tags": ["mysql"],
   "port": 6033,
   "check": {
     "script": "mysqladmin ping --host=192.168.1.121 --port=6033 --user=root password=123",
     "interval": "3s"}
   }
 ]
}
######
### Start Consul agent:
$ sudo su - consul -c 'consul agent -config-file=/etc/consul.conf -config-dir=/etc/consul.d > /var/log/consul.log &'
### Join Consul cluster specifying 1st node IP e.g.
$ consul join 192.168.1.119
### Verify logs and look out for the following messages:
$ cat /var/log/consul.log
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'agent2'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 192.168.1.120 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas:
==> Log data will now stream in as it occurs:
# ... snippet ...
    2016/09/05 19:48:04 [INFO] agent: Synced service 'consul'
    2016/09/05 19:48:04 [INFO] agent: Synced check 'service:proxysql1'
    2016/09/05 19:48:04 [INFO] agent: Synced check 'service:proxysql2'
# ... snippet ...
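
Once the joiner nodes are up, cluster membership can be verified from any of the three hosts (a minimal check):

### All three agents should be listed with status "alive"
consul members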

At this point, we have Consul installed, configured and running on each of our hosts: appserver, mysql1 and mysql2. Now it’s time to install and configure ProxySQL on mysql1 and mysql2.

Installation & Configuration of ProxySQL:

The same procedure should be run on both mysql1 and mysql2 hosts:

### Install ProxySQL packages and initialise ProxySQL DB
sudo yum -y install https://github.com/sysown/proxysql/releases/download/v1.2.2/proxysql-1.2.2-1-centos7.x86_64.rpm
sudo service proxysql initial
sudo service proxysql stop
### Edit the ProxySQL configuration file to update username / password
vi /etc/proxysql.cnf
###
admin_variables=
{
    admin_credentials="admin:admin"
    mysql_ifaces="127.0.0.1:6032;/tmp/proxysql_admin.sock"
}
###
### Start ProxySQL
sudo service proxysql start
### Connect to ProxySQL and configure
mysql -P6032 -h127.0.0.1 -uadmin -padmin
### First we create a replication hostgroup:
mysql> INSERT INTO mysql_replication_hostgroups VALUES (10,11,'Standard Replication Groups');
### Add both nodes to the hostgroup 11 (ProxySQL will automatically put the writer node in hostgroup 10)
mysql> INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.120',11,3306,1000);
mysql> INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.121',11,3306,1000);
### Save server configuration
mysql> LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
### Add query rules for RW split
mysql> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, cache_ttl, apply) VALUES (1, '^SELECT .* FOR UPDATE', 10, NULL, 1);
mysql> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, cache_ttl, apply) VALUES (1, '^SELECT .*', 11, NULL, 1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
### Finally configure ProxySQL user and save configuration
mysql> INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('root','123',1,10,'test');
mysql> LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;
mysql> EXIT;
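
Before moving on, it is worth verifying that the runtime configuration looks as expected. A minimal check against the admin interface (runtime_mysql_servers is a standard ProxySQL admin table): once the monitor user from the next section is in place and ProxySQL can read each backend’s read_only flag, the current master should appear in hostgroup 10 and the reader(s) in hostgroup 11:

mysql -P6032 -h127.0.0.1 -uadmin -padmin -e "SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers ORDER BY hostgroup_id;"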

MySQL Configuration:

We also need to perform one configuration step on the MySQL servers in order to create a user for ProxySQL to monitor the instances:

### ProxySQL's monitor user on the master MySQL server (default username and password is monitor/monitor)
mysql -h192.168.1.120 -P3306 -uroot -p123 -e"GRANT USAGE ON *.* TO monitor@'%' IDENTIFIED BY 'monitor';"

We can view the configuration of the monitor user on the ProxySQL host by checking the global variables on the admin interface:

mysql> SHOW VARIABLES LIKE 'mysql-monitor%';
+----------------------------------------+---------+
| Variable_name                          | Value   |
+----------------------------------------+---------+
| mysql-monitor_enabled                  | true    |
| mysql-monitor_connect_timeout          | 200     |
| mysql-monitor_ping_max_failures        | 3       |
| mysql-monitor_ping_timeout             | 100     |
| mysql-monitor_replication_lag_interval | 10000   |
| mysql-monitor_replication_lag_timeout  | 1000    |
| mysql-monitor_username                 | monitor |
| mysql-monitor_password                 | monitor |
| mysql-monitor_query_interval           | 60000   |
| mysql-monitor_query_timeout            | 100     |
| mysql-monitor_slave_lag_when_null      | 60      |
| mysql-monitor_writer_is_also_reader    | true    |
| mysql-monitor_history                  | 600000  |
| mysql-monitor_connect_interval         | 60000   |
| mysql-monitor_ping_interval            | 10000   |
| mysql-monitor_read_only_interval       | 1500    |
| mysql-monitor_read_only_timeout        | 500     |
+----------------------------------------+---------+
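
To confirm that the monitor user can actually reach both backends, ProxySQL’s monitor schema can be queried from the same admin interface. A hedged check – the exact column set varies slightly between ProxySQL versions, but an empty error column means the ping succeeded:

mysql> SELECT * FROM monitor.mysql_server_ping_log LIMIT 5;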

Testing Consul:

Now that Consul and ProxySQL are configured we can do some tests from the “appserver”. First, we’ll verify that the hosts we’ve added are both reporting [OK] on our DNS requests:

$ dig @127.0.0.1 -p 53 proxysql.service.consul
; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @127.0.0.1 -p 53 proxysql.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9975
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;proxysql.service.consul.	IN	A
;; ANSWER SECTION:
proxysql.service.consul. 0	IN	A	192.168.1.121
proxysql.service.consul. 0	IN	A	192.168.1.120
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Sep 05 19:32:12 UTC 2016
;; MSG SIZE  rcvd: 158

As you can see from the output above, DNS is reporting both 192.168.1.120 and 192.168.1.121 as available for the ProxySQL service. As soon as a ProxySQL check fails, the corresponding node will no longer appear in the output above.
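
To see the health check in action, stop ProxySQL on one of the two nodes and repeat the DNS query – the stopped node should disappear from the answer section within a few check intervals (a quick sketch using mysql2 in this topology):

### On mysql2: stop the proxy to simulate a failure
sudo service proxysql stop
### On appserver: only the surviving proxy should now be returned
dig @127.0.0.1 -p 53 proxysql.service.consul +short
### On mysql2: bring the proxy back afterwards
sudo service proxysql start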

We can also view the status of our cluster and agents through the Consul Web GUI which runs on port 8500 of all the Consul servers in this configuration (e.g. http://192.168.1.120:8500/):

Consul GUI

Testing ProxySQL:

So now that we have this configured we can also do some basic tests to see that ProxySQL is load balancing our connections:

[percona@appserver consul.d]$ mysql -hproxysql.service.consul -e"select @@hostname"
+--------------------+
| @@hostname         |
+--------------------+
| mysql1.localdomain |
+--------------------+
[percona@appserver consul.d]$ mysql -hproxysql.service.consul -e"select @@hostname"
+--------------------+
| @@hostname         |
+--------------------+
| mysql2.localdomain |
+--------------------+

Perfect! We’re ready to use the hostname “proxysql.service.consul” to connect to our MySQL instances through a round-robin load balanced and highly available proxy layer. If one of the two ProxySQL instances fails, we’ll continue communicating with the database through the other. Of course, this configuration is not limited to just two hosts, so feel free to add as many as you need. Be aware that in this example the two hosts’ replication hierarchy is managed by MHA in order to allow for master/slave promotion. By performing an automatic or manual failover using MHA, ProxySQL automatically detects the change in replication topology and redirects writes to the newly promoted master instance.

To make this configuration more durable, it is encouraged to create a more intelligent Consul check – i.e., a check that verifies more than just the availability of the MySQL service (an example would be to select some data from a table). It is also recommended to fine-tune the interval of the check to suit the requirements of your application.
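
As a rough illustration, the “script” value in /etc/consul.d/proxysql.json could run a real query through the proxy rather than a bare ping – for example a SELECT against a known table. The command below is only a sketch: the test.heartbeat table, credentials and timeout are hypothetical and must be adapted to your schema (for Consul script checks, exit code 0 means passing, 1 means warning and anything else means critical):

### Example check command: fails if a query cannot be served through ProxySQL
mysql --host=192.168.1.120 --port=6033 --user=root --password=123 --connect-timeout=2 -NBe "SELECT 1 FROM test.heartbeat LIMIT 1" || exit 2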

Sep
13
2016
--

ProxySQL and MHA Integration

MHA

This blog post discusses ProxySQL and MHA integration, and how they work together.

MHA (Master High Availability Manager and tools for MySQL) is almost fully integrated with the ProxySQL process. This means you can count on MHA’s standard features to manage the failover, and on ProxySQL to manage the traffic and shift it from one server to another.

This is one of the main differences between using MHA with a VIP and using MHA with ProxySQL: with MHA/ProxySQL, there is no need to move IPs around or re-define DNS entries.

The following is an example of an MHA configuration file for use with ProxySQL:

[server default]
    user=mha
    password=mha
    ssh_user=root
    repl_password=replica
    manager_log=/tmp/mha.log
    manager_workdir=/tmp
    remote_workdir=/tmp
    master_binlog_dir=/opt/mysql_instances/mha1/logs
    client_bindir=/opt/mysql_templates/mysql-57/bin
    client_libdir=/opt/mysql_templates/mysql-57/lib
    master_ip_failover_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_failover
    master_ip_online_change_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_online_change
    log_level=debug
    [server1]
    hostname=mha1r
    ip=192.168.1.104
    candidate_master=1
    [server2]
    hostname=mha2r
    ip=192.168.1.107
    candidate_master=1
    [server3]
    hostname=mha3r
    ip=192.168.1.111
    candidate_master=1
    [server4]
    hostname=mha4r
    ip=192.168.1.109
    no_master=1

NOTE: Be sure to comment out the “FIX ME” lines in the sample scripts.

After that, just install MHA as you normally would.

In ProxySQL, be sure to have all the MHA users and servers set up.

When using ProxySQL with standard replication, it’s important to set additional privileges for the ProxySQL monitor user: it must also have “Replication Client” granted, or it will fail to check the slave lag. The servers MUST have a defined value for the max_replication_lag attribute, or the check will be ignored.
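
As a minimal example, the extra privilege can be granted on the current master and will replicate to the slaves (assuming the default monitor/monitor credentials used earlier; adjust to your own monitor user):

### Allow ProxySQL's monitor user to run SHOW SLAVE STATUS for the lag check
mysql -h192.168.1.104 -P3306 -uroot -p -e "GRANT REPLICATION CLIENT ON *.* TO 'monitor'@'%' IDENTIFIED BY 'monitor';"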

As a reminder:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',600,3306,1000,0);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.107',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.111',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.109',601,3306,1000,10);
INSERT INTO mysql_replication_hostgroups VALUES (600,601);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('mha_W',600,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('mha_R',601,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) VALUES ('mha_RW',600,1,3,'^SELECT.*FOR UPDATE');
INSERT INTO mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) VALUES ('mha_RW',601,1,3,'^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_W','test',1,600,'test_mha');
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_R','test',1,601,'test_mha');
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_RW','test',1,600,'test_mha');
LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;
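
ProxySQL decides which server sits in the writer hostgroup (600) by polling each backend’s read_only flag (every mysql-monitor_read_only_interval milliseconds), which is what makes an MHA switch transparent to the application. A quick way to watch that mechanism from the admin interface – hedged, since the monitor table layout can differ slightly between ProxySQL versions:

SELECT * FROM monitor.mysql_server_read_only_log LIMIT 10;
SELECT hostgroup_id, hostname, status FROM runtime_mysql_servers ORDER BY hostgroup_id;

The backend reporting read_only=0 should be the only member of hostgroup 600; all others stay in 601.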

OK, now that all is ready,  let’s rock’n’roll!

Controlled fail-over

First of all, the masterha_manager should not be running or you will get an error.

Now let’s start some traffic:

Write
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all  run
Read only
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10  --mysql-ignore-errors=all run

Let it run for a bit, then check:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.104 | 3306     | ONLINE | 10       | 0        | 20     | 0       | 551256  | 44307633        | 0               | 285        | <--- current Master
| 601       | 192.168.1.111 | 3306     | ONLINE | 5        | 3        | 11     | 0       | 1053685 | 52798199        | 4245883580      | 1133       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 3        | 5        | 10     | 0       | 1006880 | 50473746        | 4052079567      | 369        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1040524 | 52102581        | 4178965796      | 604        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 7        | 1        | 16     | 0       | 987548  | 49458526        | 3954722258      | 285        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Now perform the failover. To do this, instruct MHA to do a switch, and to set the OLD master as a new slave:

masterha_master_switch --master_state=alive --conf=/etc/mha.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=0

Check what happened:

[ 160s] threads: 10, tps: 354.50, reads: 3191.10, writes: 1418.50, response time: 48.96ms (95%), errors: 0.00, reconnects:  0.00
[ 170s] threads: 10, tps: 322.50, reads: 2901.98, writes: 1289.89, response time: 55.45ms (95%), errors: 0.00, reconnects:  0.00
[ 180s] threads: 10, tps: 304.60, reads: 2743.12, writes: 1219.91, response time: 58.09ms (95%), errors: 0.10, reconnects:  0.00 <--- moment of the switch
[ 190s] threads: 10, tps: 330.40, reads: 2973.40, writes: 1321.00, response time: 50.52ms (95%), errors: 0.00, reconnects:  0.00
[ 200s] threads: 10, tps: 304.20, reads: 2745.60, writes: 1217.60, response time: 58.40ms (95%), errors: 0.00, reconnects:  1.00
[ 210s] threads: 10, tps: 353.80, reads: 3183.80, writes: 1414.40, response time: 48.15ms (95%), errors: 0.00, reconnects:  0.00

Check ProxySQL:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.107 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 123457  | 9922280         | 0               | 658        | <--- new master
| 601       | 192.168.1.111 | 3306     | ONLINE | 2        | 6        | 14     | 0       | 1848302 | 91513537        | 7590137770      | 1044       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 5        | 3        | 12     | 0       | 1688789 | 83717258        | 6927354689      | 220        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1834415 | 90789405        | 7524861792      | 658        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 6        | 2        | 24     | 0       | 1667252 | 82509124        | 6789724589      | 265        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

In this case, the slaves weren’t behind the master, and the switch happened quite fast.

We can see that the WRITE operations, which normally are an issue given the need to move a VIP around or change name resolution, had only a limited hiccup.

Read operations were not affected at all. Nice, eh?

Do you know how long it takes to do a switch under these conditions? real 0m2.710s – yes, 2.7 seconds.

This is more evidence that most of the time in an MHA-based switch is spent redirecting traffic from A to B over the network.

Crash fail-over

What happens if, instead of an easy switch, we have to cover a real failover?

First of all, let’s start masterha_manager:

nohup masterha_manager --conf=/etc/mha.cnf --wait_on_monitor_error=60 --wait_on_failover_error=60 >> /tmp/mha.log 2>&1

Then let’s start the load again. Finally, go to the MySQL node that hosts the master (xxx.xxx.xxx.107):

ps aux|grep mysql
mysql    18755  0.0  0.0 113248  1608 pts/0    S    Aug28   0:00 /bin/sh /opt/mysql_templates/mysql-57/bin/mysqld_safe --defaults-file=/opt/mysql_instances/mha1/my.cnf
mysql    21975  3.2 30.4 4398248 941748 pts/0  Sl   Aug28  93:21 /opt/mysql_templates/mysql-57/bin/mysqld --defaults-file=/opt/mysql_instances/mha1/my.cnf --basedir=/opt/mysql_templates/mysql-57/ --datadir=/opt/mysql_instances/mha1/data --plugin-dir=/opt/mysql_templates/mysql-57//lib/plugin --log-error=/opt/mysql_instances/mha1/mysql-3306.err --open-files-limit=65536 --pid-file=/opt/mysql_instances/mha1/mysql.pid --socket=/opt/mysql_instances/mha1/mysql.sock --port=3306

And kill the MySQL process:

kill -9 21975 18755

As before, check what happened on the application side:

[  80s] threads: 4, tps: 213.20, reads: 1919.10, writes: 853.20, response time: 28.74ms (95%), errors: 0.00, reconnects:  0.00
[  90s] threads: 4, tps: 211.30, reads: 1901.80, writes: 844.70, response time: 28.63ms (95%), errors: 0.00, reconnects:  0.00
[ 100s] threads: 4, tps: 211.90, reads: 1906.40, writes: 847.90, response time: 28.60ms (95%), errors: 0.00, reconnects:  0.00
[ 110s] threads: 4, tps: 211.10, reads: 1903.10, writes: 845.30, response time: 29.27ms (95%), errors: 0.30, reconnects:  0.00 <-- issue starts
[ 120s] threads: 4, tps: 198.30, reads: 1785.10, writes: 792.40, response time: 28.43ms (95%), errors: 0.00, reconnects:  0.00
[ 130s] threads: 4, tps: 0.00, reads: 0.60, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.40         <-- total stop in write
[ 140s] threads: 4, tps: 173.80, reads: 1567.80, writes: 696.30, response time: 34.89ms (95%), errors: 0.40, reconnects:  0.00 <-- writes restart
[ 150s] threads: 4, tps: 195.20, reads: 1755.10, writes: 780.50, response time: 33.98ms (95%), errors: 0.00, reconnects:  0.00
[ 160s] threads: 4, tps: 196.90, reads: 1771.30, writes: 786.80, response time: 33.49ms (95%), errors: 0.00, reconnects:  0.00
[ 170s] threads: 4, tps: 193.70, reads: 1745.40, writes: 775.40, response time: 34.39ms (95%), errors: 0.00, reconnects:  0.00
[ 180s] threads: 4, tps: 191.60, reads: 1723.70, writes: 766.20, response time: 35.82ms (95%), errors: 0.00, reconnects:  0.00

So it takes ~10 seconds to perform failover.

To understand this better, let’s see what happened in MHA-land:

Tue Aug 30 09:33:33 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Aug 30 09:33:33 2016 - [info] Reading application default configuration from /etc/mha.cnf..
... Read conf and start
Tue Aug 30 09:33:47 2016 - [debug] Trying to get advisory lock..
Tue Aug 30 09:33:47 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
... Wait for errors
Tue Aug 30 09:34:47 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) <--- Error time
Tue Aug 30 09:34:56 2016 - [warning] Connection failed 4 time(s)..                                     <--- Finally MHA decide to do something
Tue Aug 30 09:34:56 2016 - [warning] Master is not reachable from health checker!
Tue Aug 30 09:34:56 2016 - [warning] Master mha2r(192.168.1.107:3306) is not reachable!
Tue Aug 30 09:34:56 2016 - [warning] SSH is reachable.
Tue Aug 30 09:34:58 2016 - [info] Master failover to mha1r(192.168.1.104:3306) completed successfully. <--- end of the failover

MHA sees the server failing at xx:47, but because of the retry and checks validation, it actually fully acknowledges the downtime at xx:56 (~8 seconds after).

To perform the whole failover, it only takes ~2 seconds (again). Because no movable IPs or DNS entries were involved, the operations were fast. This is true when the slaves already have the binary log events, but it’s a different story if MHA also has to retrieve and push data from the master’s binary log to MySQL.

As you can see, ProxySQL can also help reduce the timing for this scenario, totally skipping the network-related operations. These operations are the ones causing the most trouble in these cases.

Sep
02
2016
--

MHA Quick Start Guide

MHA

MHA (Master High Availability Manager and tools for MySQL) is one of the most important pieces of our managed services. When properly set up, it can check replication health, move writer and reader virtual IPs, perform failovers, and have its output constantly monitored by Nagios. It is easy to deploy and follows the KISS (Keep It Simple, Stupid) philosophy that I love so much.

This blog post is a quick start guide to try it out and play with it in your own testing environment. I assume that you already know how to install software, deal with SSH keys and setup replication in MySQL. The post just covers MHA configuration.

Testing environment

Taken from /etc/hosts

192.168.1.116	mysql-server1
192.168.1.117   mysql-server2
192.168.1.118   mysql-server3
192.168.1.119   mha-manager

mysql-server1: Our master MySQL server with 5.6
mysql-server2: Slave server
mysql-server3: Slave server
mha-manager: The server that monitors replication and from which we manage MHA. It also needs some Perl dependencies installed.

We just introduced some new concepts, the MHA Node and MHA Manager:

MHA Node

It is installed and runs on each MySQL server. This is the piece of software that is invoked by the manager every time we want to do something, for example a failover or a check.

MHA Manager

As explained before, this is our operations center. The manager monitors the service and replication, and includes several administrative command-line tools.

Pre-requisites

  • Replication must already be running. MHA manages replication and monitors it, but it is not a tool to deploy it. So MySQL and replication need to be running already.
  • All hosts should be able to connect to each other using public SSH keys.
  • All nodes need to be able to connect to each other’s MySQL servers.
  • All nodes should have the same replication user and password.
  • In the case of multi-master setups, only one writable node is allowed. All others need to be configured with read_only.
  • MySQL version has to be 5.0 or later.
  • Candidates for master failover should have binary log enabled. The replication user must exist there too.
  • Binary log filtering variables should be the same on all servers (replicate-wild%, binlog-do-db…).
  • Disable automatic relay-log purge and do it regularly from a cron task. You can use an MHA-included script called “purge_relay_logs”.
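
A typical way to schedule that cleanup is a nightly cron entry on each MySQL server (a sketch only – path, credentials and working directory are placeholders to adapt):

### crontab entry: purge relay logs at 04:00, keeping automatic purging disabled
0 4 * * * /usr/local/bin/purge_relay_logs --user=root --password=PASSWORD --disable_relay_log_purge --workdir=/var/tmp >> /var/log/purge_relay_logs.log 2>&1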

While that is a long list of prerequisites, I think they are pretty standard and logical.

MHA installation

As explained before, the MHA Node needs to be installed on all the nodes. You can download it from this Google Drive link.

This post shows you how to install it using the source code, but there are RPM packages available. Deb too, but only for older versions. Use the installation method you prefer. This is how to compile it:

tar -xzf mha4mysql-node-0.57.tar.gz
cd mha4mysql-node-0.57
perl Makefile.PL
make
make install

The commands included in the node package are save_binary_logs, filter_mysqlbinlog, purge_relay_logs and apply_diff_relay_logs. These are mostly tools that the manager needs to call in order to perform a failover while trying to minimize or avoid any data loss.

On the manager server, you need to install MHA Node plus MHA Manager. This is because MHA Manager depends on a Perl library that comes with MHA Node. The installation process is just the same.

Configuration

We only need one configuration file on the Manager node. The example below is a good starting point:

# cat /etc/app1.cnf
[server default]
# mysql user and password
user=root
password=supersecure
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
[server1]
hostname=mysql-server1
candidate_master=1
[server2]
hostname=mysql-server2
candidate_master=1
[server3]
hostname=mysql-server3
no_master=1

Pretty straightforward: it specifies that there are three servers, two that can become master and one that can’t be promoted to master.

Let’s check if we meet some of the pre-requisites. We are going to test if replication is working, can be monitored, and also if SSH connectivity works.

# masterha_check_ssh --conf=/etc/app1.cnf
[...]
[info] All SSH connection tests passed successfully.

It works. Now let’s check MySQL:

# masterha_check_repl --conf=/etc/app1.cnf
[...]
MySQL Replication Health is OK.

Start the manager and operations

Everything is setup, we meet the pre-requisites. We can start our manager:

# masterha_manager --remove_dead_master_conf --conf=/etc/app1.cnf
[...]
[info] Starting ping health check on mysql-server1(192.168.1.116:3306)..
[info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

The manager found our master and it is now actively monitoring it using a SELECT command. --remove_dead_master_conf tells the manager that if the master goes down, it must edit the config file and remove the master’s configuration from it after a successful failover. This avoids the “there is a dead slave” error when you restart the manager. All servers listed in the conf should be part of the replication and in good health, or the manager will refuse to work.
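
While the manager is running, you can query its state at any time with masterha_check_status, which ships with MHA Manager (when monitoring is healthy it reports PING_OK along with the current master):

# masterha_check_status --conf=/etc/app1.cnf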

Automatic and manual failover

Good, everything is running as expected. What happens if the MySQL master dies!?!

[...]
[warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
[info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/log/mysql, up to mysql-bin.000002
[info] HealthCheck: SSH to mha-server1 is reachable.
[...]

First, it tries to connect by SSH to read the binary log and save it. MHA can apply the missing binary log events to the remaining slaves so they are up to date with all the before-failover info. Nice!

These are the phases that follow:

* Phase 1: Configuration Check Phase..
* Phase 2: Dead Master Shutdown Phase..
* Phase 3: Master Recovery Phase..
* Phase 3.1: Getting Latest Slaves Phase..
* Phase 3.2: Saving Dead Master's Binlog Phase..
* Phase 3.3: Determining New Master Phase..
[info] Finding the latest slave that has all relay logs for recovering other slaves..
[info] All slaves received relay logs to the same position. No need to resync each other.
[info] Starting master failover..
[info]
From:
mysql-server1(192.168.1.116:3306) (current master)
 +--mysql-server2(192.168.1.117:3306)
 +--mysql-server3(192.168.1.118:3306)
To:
mysql-server2(192.168.1.117:3306) (new master)
 +--mysql-server3(192.168.1.118:3306)
* Phase 3.3: New Master Diff Log Generation Phase..
* Phase 3.4: Master Log Apply Phase..
* Phase 4: Slaves Recovery Phase..
* Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
* Phase 4.2: Starting Parallel Slave Log Apply Phase..
* Phase 5: New master cleanup phase..

The phases are pretty self-explanatory. MHA tries to get all the data possible from the master’s binary log and from the most advanced slave’s relay log, to avoid losing any data or promoting a slave that was far behind the master. In other words, it tries to promote a slave with data that is as current as possible. We see that server2 has been promoted to master, because in our configuration we specified that server3 shouldn’t be promoted.

After the failover, the manager service stops itself. If we check the config file, the failed server is not there anymore. Now the recovery is up to you. You need to get the old master back in the replication chain, then add it again to the config file and start the manager.

It is also possible to perform a manual failover (if, for example, you need to do some maintenance on the master server). To do that you need to:

  • Stop masterha_manager.
  • Run masterha_master_switch --master_state=alive --conf=/etc/app1.cnf. The line says that you want to switch the master, but the actual master is still alive, so no need to mark it as dead or remove it from the conf file.

And that’s it. Here is part of the output. It shows the tool making the decision on the new topology and asking the user for confirmation:

[info]
From:
mysql-server1(192.168.1.116:3306) (current master)
 +--mysql-server2(192.168.1.117:3306)
 +--mysql-server3(192.168.1.118:3306)
To:
mysql-server2(192.168.1.117:3306) (new master)
 +--mysql-server3(192.168.1.118:3306)
Starting master switch from mha-server1(192.168.1.116:3306) to mha-server2(192.168.1.117:3306)? (yes/NO): yes
[...]
[info] Switching master to mha-server2(192.168.1.117:3306) completed successfully.

You can also employ some extra parameters that are really useful in some cases:

--orig_master_is_new_slave: if you want to make the old master a slave of the new one.

--running_updates_limit: if the current master executes write queries that take longer than this parameter’s value, or if any of the MySQL slaves lag behind the master by more than this value, the master switch aborts. By default, it’s 1 (1 second). All these checks are for safety reasons.

--interactive=0: if you want to skip all the confirmation requests and questions masterha_master_switch could ask.

Check this link in case you use GTID and want to avoid problems with errant transactions during the failover:

https://www.percona.com/blog/2015/12/02/gtid-failover-with-mysqlslavetrx-fix-errant-transactions/

Custom scripts

Since this is a quick guide to start playing around with MHA, I won’t cover advanced topics in detail. But I will mention a few:

    • Custom scripts. MHA can move IPs around, shut down a server and send you a report in case something happens, but it needs custom scripts for that. MHA comes with some example scripts, but you will need to write ones that fit your environment. The directives are master_ip_failover_script, shutdown_script and report_script. With them configured, MHA will send you an email or a message to your mobile device in the case of a failover, shut down the server and move IPs between servers. Pretty nice!

Hope you found this quickstart guide useful for your own tests. Remember one of the most important things: don’t overdo automation! These tools are good for checking health and performing the initial failover, but you must still investigate what happened and why, fix it, and work to keep it from happening again. In high availability (HA) environments, automating everything can end up making them stop being HA.

Have fun!

Aug
19
2014
--

Measuring failover time for ScaleArc load balancer

ScaleArc hired Percona to benchmark failover times for the ScaleArc database traffic management software in different scenarios. We tested failover times for various clustered setups, where ScaleArc itself was the load balancer for the cluster. These tests complement other performance tests on the ScaleArc software – sysbench testing for latency and testing for WordPress acceleration.

We tested failover times for Percona XtraDB Cluster (PXC) and MHA (any traditional MySQL replication-based solution works pretty much the same way).

In each case, we tested failover with a rate limited sysbench benchmark. In this mode, sysbench generates roughly 10 transactions each second, even if not all 10 were completed in the previous second. In that case, the newly generated transactions are queued.

The sysbench command we used for testing is the following.

# while true ; do sysbench --test=sysbench/tests/db/oltp.lua \
                           --mysql-host=172.31.10.231 \
                           --mysql-user=root \
                           --mysql-password=admin \
                           --mysql-db=sbtest \
                           --oltp-table-size=10000 \
                           --oltp-tables-count=4 \
                           --num-threads=4 \
                           --max-time=0 \
                           --max-requests=0 \
                           --report-interval=1 \
                           --tx-rate=10 \
                           run ; sleep 1; done

The command is run in a loop, because typically at the time of failover, the application receives some kind of error while the virtual IP is moved, or while the current node is declared dead. Well-behaving applications are reconnecting and retrying in this case. Sysbench is not a well-behaving application from this perspective – after failover it has to be restarted.

This is good for testing the duration of errors during a failover procedure – but not the number of errors. In a simple failover scenario (ScaleArc is just used as a regular load balancer), the number of errors won’t be any higher than usual with ScaleArc’s queueing mechanism.

ScaleArc+MHA

In this test, we used MHA as a replication manager. The test result would be similar regardless of how its asynchronous replication is managed – only the ScaleArc level checks would be different. In the case of MHA, we tested graceful and non-graceful failover. In the graceful case, we stopped the manager and performed a manual master switchover, after which we informed the ScaleArc software via an API call for failover.

We ran two tests:

  • A manual switchover with the manager stopped, switching over manually, and informing the load balancer about the change.
  • An automatic failover where the master was killed, and we let MHA and the ScaleArc software discover it.

ScaleArc+MHA manual switchover

The manual switchover was performed with the following command.

# time (
          masterha_master_switch --conf=/etc/cluster_5.cnf \
          --master_state=alive \
          --new_master_host=10.11.0.107 \
          --orig_master_is_new_slave \
          --interactive=0 \
          &&
          curl -k -X PUT https://172.31.10.30/api/cluster/5/manual_failover \
          -d '{"apikey": "0c8aa9384ddf2a5756299a9e7650742a87bbb550"}' )
{"success":true,"message":"4214   Failover status updated successfully.","timestamp":1404465709,"data":{"apikey":"0c8aa9384ddf2a5756299a9e7650742a87bbb550"}}
real    0m8.698s
user    0m0.302s
sys     0m0.220s

The curl command calls ScaleArc’s API to inform the ScaleArc software about the master switchover.

During this time, sysbench output was the following.

[  21s] threads: 4, tps: 13.00, reads/s: 139.00, writes/s: 36.00, response time: 304.07ms (95%)
[  21s] queue length: 0, concurrency: 1
[  22s] threads: 4, tps: 1.00, reads/s: 57.00, writes/s: 20.00, response time: 570.13ms (95%)
[  22s] queue length: 8, concurrency: 4
[  23s] threads: 4, tps: 19.00, reads/s: 237.99, writes/s: 68.00, response time: 976.61ms (95%)
[  23s] queue length: 0, concurrency: 2
[  24s] threads: 4, tps: 9.00, reads/s: 140.00, writes/s: 40.00, response time: 477.55ms (95%)
[  24s] queue length: 0, concurrency: 3
[  25s] threads: 4, tps: 10.00, reads/s: 105.01, writes/s: 28.00, response time: 586.23ms (95%)
[  25s] queue length: 0, concurrency: 1

Only a slight hiccup is visible at 22 seconds. In this second, only 1 transaction was done, and 8 others were queued. These results show a sub-second failover time. The reason no errors were received is that ScaleArc itself queued the transactions during the failover process. If the transaction in question were done from an interactive client, the queuing itself would be visible as increased response time – for example a START TRANSACTION or an INSERT command is taking longer than usual, but no errors result. This is as good as it gets for graceful failover. ScaleArc knows about the failover (and in the case of a switchover initiated by a DBA, notifying the ScaleArc software can be part of the failover process). The queueing mechanism is quite configurable. Administrators can set up the timeout for the queue – we set it to 60 seconds, so if the failover doesn’t complete in that timeframe transactions start to fail.

ScaleArc+MHA non-graceful failover

In the case of the non-graceful failover MHA and the ScaleArc software have to figure out that the node died.

[  14s] threads: 4, tps: 11.00, reads/s: 154.00, writes/s: 44.00, response time: 1210.04ms (95%)
[  14s] queue length: 4, concurrency: 4
( sysbench restarted )
[   1s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[   1s] queue length: 13, concurrency: 4
[   2s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[   2s] queue length: 23, concurrency: 4
[   3s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[   3s] queue length: 38, concurrency: 4
[   4s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[   4s] queue length: 46, concurrency: 4
[   5s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[   5s] queue length: 59, concurrency: 4
[   6s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[   6s] queue length: 69, concurrency: 4
[   7s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[   7s] queue length: 82, concurrency: 4
[   8s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[   8s] queue length: 92, concurrency: 4
[   9s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[   9s] queue length: 99, concurrency: 4
[  10s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[  10s] queue length: 108, concurrency: 4
[  11s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[  11s] queue length: 116, concurrency: 4
[  12s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[  12s] queue length: 126, concurrency: 4
[  13s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[  13s] queue length: 134, concurrency: 4
[  14s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[  14s] queue length: 144, concurrency: 4
[  15s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[  15s] queue length: 153, concurrency: 4
[  16s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[  16s] queue length: 167, concurrency: 4
[  17s] threads: 4, tps: 5.00, reads/s: 123.00, writes/s: 32.00, response time: 16807.85ms (95%)
[  17s] queue length: 170, concurrency: 4
[  18s] threads: 4, tps: 18.00, reads/s: 255.00, writes/s: 76.00, response time: 16888.55ms (95%)
[  18s] queue length: 161, concurrency: 4

The failover time in this case was 18 seconds. We looked at the results and found that a check the ScaleArc software does (which involves opening an SSH connection to all nodes in MHA) takes 5 seconds, and ScaleArc declares the node dead after 3 consecutive failed checks (this parameter is configurable). Hence the high failover time. A much lower one can be achieved with more frequent checks and a different check method – for example checking the read-only flag, or making MHA store its state in the database.

ScaleArc+Percona XtraDB Cluster tests

Percona XtraDB Cluster is special when it comes to high availability testing because of the many ways one can use it. Many people write only to one node to avoid rollback on conflicts, but we have also seen lots of folks using all the nodes for writes.

Also, graceful failover can be interpreted differently.

  • Failover can be graceful from both Percona XtraDB Cluster’s and from the ScaleArc software’s perspective. First the traffic is switched to another node, or removed from the node in question, and then MySQL is stopped.
  • Failover can be graceful from Percona XtraDB Cluster’s perspective, but not from the ScaleArc software’s perspective. In this case the database is simply stopped with service mysql stop, and the load balancer figures out what happened. I will refer to this approach as semi-graceful from now on.
  • Failover can be completely non-graceful if a node is killed, where neither Percona XtraDB Cluster, nor ScaleArc knows about its departure.

We did all the tests using one node at a time, and using all three nodes. What makes these test sets even more complex is that when only one node at a time is used, some tests (the semi-graceful and the non-graceful ones) don’t have the same result depending on whether the removed node is the used one or an unused one. This process involves a lot of tests, so for the sake of brevity I omit the actual sysbench output here – it looks like either the graceful MHA or the non-graceful MHA case – and only present the results in a tabular format. In the case of active/active setups, to remove a node gracefully we first have to lower the maximum number of connections on that node to 0.

+---------------+------------------------+-----------------------+------------------------+
| Failover type | 1 node (active)        | 1 node (passive)      | all nodes              |
+---------------+------------------------+-----------------------+------------------------+
| Graceful      | sub-second (no errors) | no effect at all      | sub-second (no errors) |
| Semi-graceful | 4 seconds (errors)     | no effect at all      | 3 seconds              |
| Non-graceful  | 4 seconds (errors)     | 6 seconds (no errors) | 7 seconds (errors)     |
+---------------+------------------------+-----------------------+------------------------+

In the previous table, active means that the failed node did receive sysbench transactions and passive means that it didn’t.

All the graceful cases are similar to MHA’s graceful case.

If only one node is used and a passive node is removed from the cluster, by stopping the database itself gracefully with the

# service mysql stop

command, it doesn’t have an effect on the cluster. For the subsequent graceful failover cases, switching over on ScaleArc will enable queuing, similar to MHA’s case. In the semi-graceful case, if the passive node (which has no traffic) departs, it has no effect. Otherwise, the application will get errors (because of the unexpected mysql stop), and the failover time is around 3-4 seconds both when only 1 node is active and when all 3 are active. This makes sense, because ScaleArc was configured to do checks (using clustercheck) every second and to declare a node dead after three consecutive failed checks. After the ScaleArc software determines that it should fail over, and does so, the case is practically the same as the passive node’s removal from that point on (because ScaleArc removes traffic, in practice making that node passive).

The non-graceful case is tied to the suspect timeout. In this case, XtraDB Cluster doesn’t know that the node left the cluster, and the originating nodes keep trying to send write sets to it. Those write sets will never be certified because the node is gone, so the writes will be stalled. The exception here is the case when the active node failed: after ScaleArc figures out that the node died (three consecutive checks at one-second intervals), a new node is chosen, but because only the failed node was doing transactions, no write sets are waiting for certification in the remaining two nodes’ queues, so there is no need to wait for the suspect timeout here.

Conclusion

ScaleArc does have reasonably good failover time, especially in the case when it doesn’t have to interrupt transactions. Its promise of zero-downtime maintenance is fulfilled both in the case of MHA and XtraDB Cluster.


Oct
23
2013
--

High-availability options for MySQL, October 2013 update

The technologies for building highly available (HA) MySQL solutions are in constant evolution, and they cover very different needs and use cases. To help people choose the best HA solution for their needs, Jay Janssen and I decided to publish, on a regular basis (hopefully this is the first of many), an update on the most common technologies and their state, with a focus on what type of workloads suit them best. We restricted ourselves to open source solutions that provide automatic failover. Of course, don’t simply count the Positive/Negative items – they don’t all carry the same weight. Should you pick any of these technologies, heavy testing is mandatory: HA never extends beyond the scenarios that have been tested.

Percona XtraDB Cluster (PXC)

Percona XtraDB Cluster (PXC) is a version of Percona Server implementing the Galera replication protocol from Codership.

Positive points
  • Almost synchronous replication, very small lag if any
  • Automatic failover
  • At its best with small transactions
  • All nodes are writable
  • Very small read-after-write lag, usually no need to care about it
  • Scales reads very well and, to some extent, writes
  • New nodes are provisioned automatically through State Snapshot Transfer (SST)
  • Multi-threaded apply, greater write capacity than regular replication
  • Can do geographical disaster recovery (Geo DR)
  • More resilient to unresponsive nodes (swapping)
  • Can resolve split-brain situations by itself

Negative points
  • Still under development, some rough edges
  • Large transactions like multi-statement transactions or large write operations cause issues and are usually not a good fit
  • For quorum reasons, 3 nodes are needed, but one can be a lightweight arbitrator
  • SST can be heavy over a WAN
  • Commits are affected by the network latency, which especially impacts Geo DR
  • To achieve HA, a load balancer, like haproxy, is needed
  • Failover time is determined by the load balancer check frequency
  • Performance is affected by the weakest/busiest node
  • Foreign keys are potential issues
  • MyISAM should be avoided
  • Can be mixed with regular async replication as master or slave, but slaves are not easy to reconfigure after an SST on their master
  • Requires careful setup of the host, swapping can lead to node expulsion from the cluster
  • No manual failover mode
  • Debugging some Galera protocol issues isn’t trivial

 

Percona replication manager (PRM)

Percona replication manager (PRM) uses the Linux-HA Pacemaker resource manager to manage MySQL and replication and provide high availability. Information about PRM can be found here; the official page on the Percona website is in the making.

Positive points
  • Nothing specific regarding the workload
  • Unlimited number of slaves
  • Slaves can have different roles
  • Typically VIP-based access, typically 1 writer VIP and many reader VIPs
  • Also works without VIP (see the fake_mysql_novip agent)
  • Detects if a slave lags too much and removes its reader VIP
  • All nodes are monitored
  • The best slave is picked as master after failover
  • Geographical disaster recovery possible with the lightweight booth protocol
  • Can be operated in manual failover mode
  • Graceful failover is quick, under 2s in normal conditions
  • Ungraceful failover under 30s
  • Distributed operation with Pacemaker, no single point of failure
  • Built-in Pacemaker logic, stonith, etc. Very rich and flexible.

Negative points
  • Still under development, some rough edges
  • Transactions may be lost if the master crashes (async replication)
  • For quorum reasons, 3 nodes are needed, but one can be a lightweight arbitrator
  • Only one node is writable
  • Read after write may not be consistent (replication lag)
  • Only scales reads
  • Careful setup of the host required, swapping can lead to node expulsion from the cluster
  • Data inconsistency can happen if the master crashes (fix coming)
  • Pacemaker is complex, logs are difficult to read and understand

 

MySQL master HA (MHA)

Like PRM above, MySQL master HA (MHA) provides high availability through replication. The approach is different: instead of relying on an HA framework like Pacemaker, it uses Perl scripts. Information about MHA can be found here.

Positive points
  • Mature
  • Nothing specific regarding the workload
  • No latency effects on writes
  • Can have many slaves, and slaves can have different roles
  • Very good binlog/relay-log handling
  • Works pretty hard to minimise data loss
  • Can be operated in manual failover mode
  • Graceful failover is quick, under 5s in normal conditions
  • If the master crashes, slaves will be consistent
  • The logic is fairly easy to understand

Negative points
  • Transactions may be lost if the master crashes (async replication)
  • Only one node is writable
  • Read after write may not be consistent (replication lag)
  • Only scales reads
  • Monitoring and logic are centralized, a single point of failure, and a network partition can cause a split-brain
  • Custom fencing devices, custom VIP scripts, no reuse of other projects’ tools
  • Most of the deployments are using manual failover (at least at Percona)
  • Requires privileged ssh access to read relay logs, which can be a security concern
  • No monitoring of the slaves to invalidate them if they lag too much or if replication is broken, which needs to be done by an external tool like HAProxy
  • Careful setup of the host required, swapping can lead to node expulsion from the cluster

 

NDB Cluster

NDB Cluster is the most high-end form of high-availability configuration for MySQL. It is a complete shared-nothing architecture where the storage engine is distributed over multiple servers (data nodes). Probably the best starting point with NDB is the official documentation, here.

Positive points
  • Mature
  • Synchronous replication
  • Very good at small transactions
  • Very good at high concurrency (many client threads)
  • Huge transaction capacity, more than 1M trx/s is not uncommon
  • Failover can be ~1s
  • No single point of failure
  • Geographical disaster recovery capacity built in
  • Strong at async replication, applying by batches gives multi-threaded apply at the data node level
  • Can scale reads and writes, the framework implements sharding by hashes

Negative points
  • Not a drop-in replacement for InnoDB, you need to tune the schema and the queries
  • Not a general purpose database, some loads like reporting are just bad
  • Only the Read-committed isolation level is available
  • Hardware heavy, needs 4 servers minimum for full HA
  • Memory (RAM) hungry, even with disk-based tables
  • Complex to operate, lots of parameters to adjust
  • Needs a load balancer for failover
  • Very new foreign key support, field reports on it are scarce

 

Shared storage/DRBD

Achieving high availability using a shared storage medium is an old and well-known method. It is used by nearly all the major databases. The shared storage can be a DAS connected to two servers, a LUN on a SAN accessible from two servers, or a DRBD partition replicated synchronously over the network. DRBD is by far the most common shared storage device used in the MySQL world.

Positive points
  • Mature
  • Synchronous replication (DRBD)
  • Automatic failover is easy to implement
  • VIP-based access

Negative points
  • Write capacity is impacted by network latency for DRBD
  • SANs are expensive
  • Only for InnoDB
  • Standby node, a big server doing nothing
  • Needs a warmup period after failover to be fully operational
  • Disk corruption can spread

 


May
28
2013
--

Choosing a MySQL HA Solution – MySQL Webinar: June 5

Selecting the most appropriate solution for a MySQL HA infrastructure is as much a business and philosophical decision as it is a technical one, but often the choice is made without adequately considering all three perspectives. When too much attention is paid to one of these aspects at the cost of the others, the resulting system may be over-engineered, poorly-performing, and/or various other flavors of suboptimal.

On Wednesday, June 5, at 10 a.m. PDT (1700 UTC), I will be presenting a webinar entitled, Choosing a MySQL HA Solution, in which we’ll explore the topic of MySQL HA from each of these perspectives.  The goal will be to motivate your thinking about HA in a holistic fashion and help guide you towards asking the right questions when considering a new or upgraded HA deployment.

This webinar will be both technical and non-technical in nature, beginning with a discussion of some general HA principles and some common misconceptions.  We will then explore some of the more well-known MySQL HA tools and technologies available today (largely grouped into those which use traditional MySQL replication, those which use some other MySQL-level replication, and those which replicate at some other layer of the system stack) and then conclude with some typical use cases where a given approach may be well-suited or particularly contraindicated.

If this topic interests you, then register today to reserve your spot.  I look forward to speaking with all of you next week.

