Jan
11
2021
--

Full Read Consistency Within Percona Kubernetes Operator for Percona XtraDB Cluster

Full Read Consistency Within Percona Kubernetes Operator

Full Read Consistency Within Percona Kubernetes OperatorThe aim of Percona Kubernetes Operator for Percona XtraDB Cluster is to be a special type of controller introduced to simplify complex deployments. The Operator extends the Kubernetes API with custom resources. The Operator solution is using Percona XtraDB Cluster (PXC) behind the hood to provide a highly available, resilient, and scalable MySQL service in the Kubernetes space. 

This solution comes with all the advantages/disadvantages provided by Kubernetes, plus some advantages of its own like the capacity to scale reads on the nodes that are not Primary.

Of course, there are some limitations like the way PXC handles DDLs, which may impact the service, but there is always a cost to pay to get something, expecting to have it all for free is unreasonable.     

In this context, we need to talk and cover what is full read consistency in this solution and why it is important to understand the role it plays.  

Stale Reads

When using Kubernetes we should talk about the service and not about the technology/product used to deliver such service. 

In our case, the Percona Operator is there to deliver a MySQL service. We should then see that as a whole, as a single object. To be more clear what we must consider is NOT the fact we have a cluster behind the service but that we have a service that to be resilient and highly available, use a cluster. 

We should not care if a node/pod goes down unless the service is discontinued.

What we have as a plus in the Percona Operator solution is a certain level of READ scalability. This achieved optimizing the use of the non PRIMARY nodes, and instead of having them sitting there applying only replicated data, the Percona Operator provides access to them to scale the reads.  

But… there is always a BUT ? 

Let us start with an image:

 

By design, the apply and commit finalize in Galera (PXC) may have (and has) a delay between nodes. This means that, if using defaults, applications may have inconsistent reads if trying to access the data from different nodes than the Primary. 

It provides access using two different solutions:

  • Using HAProxy (default)
  • Using ProxySQL

 

 

When using HAProxy you will have 2 entry points:

  • cluster1-haproxy, which will point to the Primary ONLY, for reads and writes. This is the default entry point for the applications to the MySQL database.
  • cluster1-haproxy-replicas, which will point to all three nodes and is supposed to be used for READS only. This is the PLUS you can use if your application has READ/WRITE separation.

Please note that at the moment there is nothing preventing an application to use the cluster1-haproxy-replicas also for write, but that is dangerous and wrong because will generate a lot of certification conflicts and BF abort given it will distribute writes all over the cluster impacting on performance as well (and not giving you any write scaling):

 

[marcotusa@instance-1 ~]$ for i in `seq 1 100`; do mysql -h cluster1-haproxy-replicas -e "insert into test.iamwritingto values(null,@@hostname)";done
+----------------+-------------+
| host           | count(host) |
+----------------+-------------+
| cluster1-pxc-1 |          34 |
| cluster1-pxc-2 |          33 |
| cluster1-pxc-0 |          33 |
+----------------+-------------+

When using ProxySQL the entry point is a single one, but you may define query rules to automatically split the R/W requests coming from the application. This is the preferred method when an application has no way to separate the READS from the writes.

Here I have done a comparison of the two methods, HAProxy and ProxySQL.

Now, as mentioned above, by default, PXC (any Galera base solution) comes with some relaxed settings, for performance purposes. This is normally fine in many standard cases, but if you use the Percona Operator and use the PLUS of scaling reads using the second access point with HAproxy or Query Rules with Proxysql, you should NOT have stale reads, given the service must provide consistent data, as if you are acting on a single node. 

To achieve that you can change the defaults and change the parameter in PXC wsrep_sync_wait. 

When changing the parameter wsrep_sync_wait as explained in the documentation, the node initiates a causality check, blocking incoming queries while it catches up with the cluster. 

Once all data on the node receiving the READ request is commit_finalized, the node performs the read.

But this has a performance impact, as said before.

What Is The Impact?

To test the performance impact I had used a cluster deployed in GKE, with these characteristics:

  • 3 Main nodes n2-standard-8 (8 vCPUs, 32 GB memory)
  • 1 App node n2-standard-8 (8 vCPUs, 32 GB memory)
  • PXC pods using:
    •  25GB of the 32 available 
    • 6 CPU of the 8 available
  • HAProxy:
    • 600m CPU
    • 1GB RAM
  • PMM agent
    • 500m CPU
    • 500 MB Ram

In the application node, I used sysbench running two instances, one in r/w mode the other only reads. Finally, to test the stale read, I used the stale read test from my test suite.

Given I was looking for results with a moderate load, I just used 68/96/128 threads per sysbench instance. 

Results

Marco, did we have or not have stale reads? Yes, we did:

I had from 0 (with very light load) up to 37% stale reads with a MODERATED load, where moderated was the 128 threads sysbench running. 

Setting wsrep_sync_wait=3 of course I had full consistency.  But I had performance loss:

As you can see, I had an average loss of 11% in case of READS:

While for writes the average loss was 16%. 

Conclusions

At this point, we need to stop and think about what is worth doing. If my application is READs heavy and READs scaling, it is probably worth enabling the full synchronicity given scaling on the additional node allows me to have 2x or more READs. 

If instead my application is write critical, probably losing also ~16% performance is not good.

Finally if my application is stale reads tolerant, I will just go with the defaults and get all the benefits without penalties.

Also keep in mind that Percona Kubernetes Operator for Percona XtraDB Cluster is designed to offer a MySQL service so the state of the single node is not as critical as if you are using a default PXC installation, PODs are by nature ephemeral objects while service is resilient.

References

Percona Kubernetes Operator for Percona XtraDB Cluster

https://github.com/Tusamarco/testsuite

https://en.wikipedia.org/wiki/Isolation_(database_systems)#Dirty_reads

https://galeracluster.com/library/documentation/mysql-wsrep-options.html#wsrep-sync-wait

https://www.slideshare.net/lefred.descamps/galera-replication-demystified-how-does-it-work

Oct
15
2015
--

Proxy Protocol and Percona XtraDB Cluster: A Quick Guide

On September 21st, we released Percona XtraDB Cluster 5.6.25. This is the first PXC release supporting proxy-protocol that has been included in Percona Server since 5.6.25-73.0.

With this blog post, I want to promote a new feature that you may have ignored.

Let’s start with a description of the use case:

Architecture Overview:

cluster3

HAProxy is configured like this:

listen 3307-active-passive-writes 0.0.0.0:3307
    mode tcp
    balance leastconn
    option  httpchk
    server pxc1 pxc1:3306 check port 8000 inter 1000 rise 3 fall 3
    server pxc2 pxc2:3306 check port 8000 inter 1000 rise 3 fall 3   backup
    server pxc3 pxc3:3306 check port 8000 inter 1000 rise 3 fall 3   backup

So until now, when we connected from any machine to HA Proxy in MySQL, the connection was made by HAProxy’s host.

[root@app1 ~]# mysql -h haproxy -P 3307 -utest -ptest -e "select @@wsrep_node_name, sleep(20)";
[root@app2 ~]# mysql -h haproxy -P 3307 -utest -ptest -e "select @@wsrep_node_name, sleep(20)";

In the processlist we could see it like this:

pxc1 mysql>  SELECT PROCESSLIST_ID AS id, PROCESSLIST_USER AS user,
             PROCESSLIST_HOST AS host, PROCESSLIST_INFO
             FROM performance_schema.threads
             WHERE PROCESSLIST_INFO LIKE 'select @% sleep%';
+------+------+---------+-------------------------------------+
| id   | user | host    | PROCESSLIST_INFO                    |
+------+------+---------+-------------------------------------+
|  961 | test | haproxy | select @@wsrep_node_name, sleep(20) |
|  963 | test | haproxy | select @@wsrep_node_name, sleep(20) |
+------+------+---00----+-------------------------------------+

Shown like this, it’s impossible to know which connection is made from app1 and which from app2, as they all come from the proxy. Therefore this is also the host that’s required in GRANTS.

And it is particularly for this reason that we decided to support proxy protocol.

Let’s add the required setting in my.cnf:

proxy_protocol_networks = *

Then we need to also modify HAProxy’s config to be like this:

listen 3307-active-passive-writes 0.0.0.0:3307
    mode tcp
    balance leastconn
    option  httpchk
    server pxc1 pxc1:3306 send-proxy check port 8000 inter 1000 rise 3 fall 3
    server pxc2 pxc2:3306 send-proxy check port 8000 inter 1000 rise 3 fall 3   backup
    server pxc3 pxc3:3306 send-proxy check port 8000 inter 1000 rise 3 fall 3   backup

send-proxy-v2 is also supported.


From HA Proxy’s Manual:

Version 1 senders MAY only produce the human-readable header format. Version 2
senders MAY only produce the binary header format. Version 1 receivers MUST at
least implement the human-readable header format. Version 2 receivers MUST at
least implement the binary header format, and it is recommended that they also
implement the human-readable header format for better interoperability and ease
of upgrade when facing version 1 senders.

So now, when MySQL and HAProxy are restarted, we can see in the processlist the origin of the connection:

+------+------+------+-------------------------------------+
| id   | user | host | PROCESSLIST_INFO                    |
+------+------+------+-------------------------------------+
|  283 | test | app1 | select @@wsrep_node_name, sleep(20) |
|  284 | test | app2 | select @@wsrep_node_name, sleep(20) |
+------+------+------+-------------------------------------+

Is everything perfect then? Not really…

One disadvantage of this proxy_protocol_network setting is that now you are not able to connect to MySQL if you don’t send the proxy headers:

[root@app1 ~]# mysql -h pxc1 -P 3306 -utest -ptest -e "select @@wsrep_node_name, sleep(20)";

This connection seems stalled… no answer, no error…

In the MySQL processlist we can see:

pxc1 mysql> show full processlist;
+-----+----------------------+-----------------+------+---------+------+--------------------+-----------------------+-----------+---------------+
| Id  | User                 | Host            | db   | Command | Time | State              | Info                  | Rows_sent | Rows_examined |
+-----+----------------------+-----------------+------+---------+------+--------------------+-----------------------+-----------+---------------+
|   1 | system user          |                 | NULL | Sleep   | 1922 | NULL               | NULL                  |         0 |             0 |
|   2 | system user          |                 | NULL | Sleep   | 1922 | wsrep aborter idle | NULL                  |         0 |             0 |
|   9 | root                 | localhost       | NULL | Query   |    0 | init               | show full processlist |         0 |             0 |
| 360 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL                  |         0 |             0 |
+-----+----------------------+-----------------+------+---------+------+--------------------+-----------------------+-----------+---------------+

This means you can connect only via HAProxy or the socket… Of course this limits you a lot, not only for DBAs or developers but also for things like replication slaves, for example.

And you can also imagine how fast you could reach the max connections as every tentative will be stalled in that state:

pxc1 mysql> show full processlist;
+-----+----------------------+-----------------+------+---------+------+--------------------+-----------------------+-----------+---------------+
| Id  | User                 | Host            | db   | Command | Time | State              | Info                  | Rows_sent | Rows_examined |
+-----+----------------------+-----------------+------+---------+------+--------------------+-----------------------+-----------+---------------+
|   1 | system user          |                 | NULL | Sleep   | 1969 | NULL               | NULL                  |         0 |             0 |
|   2 | system user          |                 | NULL | Sleep   | 1969 | wsrep aborter idle | NULL                  |         0 |             0 |
|   9 | root                 | localhost       | NULL | Query   |    0 | init               | show full processlist |         0 |             0 |
| 990 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL                  |         0 |             0 |
| 992 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL                  |         0 |             0 |
| 993 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL                  |         0 |             0 |
| 994 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL                  |         0 |             0 |
| 996 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL                  |         0 |             0 |
| 997 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL                  |         0 |             0 |
+-----+----------------------+-----------------+------+---------+------+--------------------+-----------------------+-----------+---------------+

The usual MySQL variable connect_timeout doesn’t work.

To avoid this and also to connect directly to MySQL, you need to specify from which network MySQL expects those proxy headers. Therefore proxy_protocol_network (ppn) should be set to the proxy address (or network range if your proxies use a different dedicated one):

proxy_protocol_networks = 192.168.56.5

Even if this is the recommended solution, it won’t work because it’s mandatory to bind MySQL to listen to the IPv4 address:

bind_address=192.168.56.2 (on pxc1)

This is the error you would face if you don’t bind MySQL:

ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 0

When this is implemented in the configuration and MySQL is restarted, it’s now possible to connect to MySQL from the proxy or directly:

[root@app1 ~]# mysql -h haproxy -P 3307 -utest -ptest -e "select @@wsrep_node_name, sleep(20)"
+-------------------+-----------+
| @@wsrep_node_name | sleep(20) |
+-------------------+-----------+
| pxc1              |         0 |
+-------------------+-----------+
[root@app1 ~]# mysql -h pxc1 -P 3306 -utest -ptest -e "select @@wsrep_node_name, sleep(20)"
+-------------------+-----------+
| @@wsrep_node_name | sleep(20) |
+-------------------+-----------+
| pxc1              |         0 |
+-------------------+-----------+

I tried to completely disable IPv6 instead of binding to a specific IP in MySQL (something I’ve heard of?) but it didn’t help:

[root@pxc2 ~]# mysql -h haproxy -P 3307 -utest -ptest -e "select @@wsrep_node_name, sleep(20)"
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 0

To summarize, if you want to enable the resolution of the real client (origin) instead of HAProxy’s IP in MySQL’s processlist and GRANTS (for authentication), you need:

  • add proxy-protocol in HAProxy
  • add proxy_protocol_networks to my.cnf and set it to the proxy’s IP
  • bind mysql to its IPv4 IP

This is also a summary of settings I’ve tried and their results:

MySQL bind proxy_protocol_networks HA Proxy protocol option Result
n/a n/a n/a We see HA Proxy IP/host in MySQL as source for the connection
n/a n/a send-proxy ERROR 2013 (HY000): Lost connection to MySQL server at ‘reading authorization packet’, system error: 0
n/a ppn=* send-proxy We see APP as source of connection when connecting using the proxy
We can’t connect directly to MySQL
n/a ppn=ha proxy’s IP send-proxy We can connect directly
We can’t conection using the proxy: ERROR 2013 (HY000): Lost connection to MySQL server at ‘reading authorization packet’, system error: 0
IP ppn=ha proxy’s IP send-proxy We see APP as source of the connection when using the proxy
We can connect direclty
IP ppn=MySQL’s IP send-proxy Cannot connect using the proxy: ERROR 2013 (HY000): Lost connection to MySQL server at ‘reading authorization packet’, system error: 0
IP ppn=network range (192.168.56.0/24) send-proxy We can connect using the Proxy and see APP as source
We cannot connect directly

These two bugs were reported while writing this blog post:

The post Proxy Protocol and Percona XtraDB Cluster: A Quick Guide appeared first on MySQL Performance Blog.

Jun
19
2015
--

Q&A: High availability when using MySQL in the cloud

Percona MySQL webinar followup: Q&ALast week I hosted a webinar on using MySQL in the cloud for High Availability (HA) alongside 451 Research analyst Jason Stamper. You can watch the recording and also download the slides (free) here. Just click the “Register” button at the end of that page.

We had several excellent questions and we didn’t have time to get to several of them in the allotted time. I’m posting them here along with the answers. Feel free to ask follow-up questions in the comments below.


Q: Can the TokuDB engine be used in a PXC environment?

A: No, TokuDB cannot currently be used in a PXC environment, the only supported engine in Percona XtraDB Cluster 5.6 is InnoDB.

Q: With Galera replication (PXC), is balancing the load on each node?

A: No, you need to implement your own load balancing and HA layer between your clients and the Percona XtraDB Cluster server.  Examples mentioned in the webinar include HAProxy and F5 BigIP.

Q: What’s the best version of Percona XtraDB Cluster regarding InnoDB performance?

A: In general for best performance you should be using the latest release of Percona XtraDB Cluster 5.6, which is currently 5.6.24, released on June 3rd, 2015.

Q: Can I redirect my writes in Percona XtraDB Cluster to multiple nodes using the HAProxy? While trying with SysBench I can see write-only goes to first nodes in PXC while reads does goes to multiple nodes.

A: Yes you can configure HAProxy to distribute both reads and writes across all of your nodes in a Percona XtraDB Cluster environment. Perhaps SysBench created only one database connection for all writes, and so haproxy kept those confined to only one host. You may want to experiment with parallel_prepare.lua.

Q: What’s the optimal HA for small datasets (db is less than 10gb)?

A: The optimal HA deployment for small datasets would be dependent on your level of recovery required (tolerance for loss of transactions) and time that you can be in an unavailable state (seconds, minutes, hours?).  Unfortunately there isn’t a single answer to your question, however, if you are interested in further discussion on this point Percona would be happy to coordinate a time to speak.  Please feel free to contact me directly and we can continue the conversation at michael.coburn@percona.com.

 Q: Is there a concept of local master vs. remote master with PXC?

A: No there is no concept of local vs remote master.  All nodes in a Percona XtraDB Cluster can now be classified as Master, regardless of their proximity to the clients.

Q: Are there any concerns when considering AWS RDS or AURORA DB for MySQL HA in the Cloud?

A: Regarding AWS RDS, yes this a good option for MySQL HA in the Cloud.  I unfortunately haven’t worked with Aurora DB that much yet so I don’t have an opinion on it’s suitability for HA in the Cloud.

Q: We tried out PXC awhile back and it used to lock everything whenever any ddl was done. Has that changed?

A: We would have to look at the specifics of your environment, however, there have been numerous improvements in the 1½ years of development since Percona XtraDB Cluster went Generally Available (GA) on January 30th, 2014 in version 5.6.15.

Q: Is using the arbitrator a must?

A: No the arbitrator role via the garbd daemon is generally only used when operating in a minimal environment of two nodes that contain the data and you need a third node for quorum but don’t want to store the data a third time.

Q: Can we do a cluster across different zones?

A: Yes you can. However be aware that the latency incurred for all cluster certification operations will be impacted by the round trip time between nodes.

Q: Does PXC also support the MyISAM database?

A: No, Percona XtraDB Cluster does not support any storage engine other than InnoDB as of PXC 5.6.

Q: How do load balancers affect the throughput in a Galera-based setup given that the write would be limited by the slowest node?

A: Load balancers will introduce some measure of additional latency in the form of CPU time in the load balancer layer as it evaluates its own ruleset, and also in network time due to additional hop via load balancer.  Otherwise there should be no perceptible difference in the write throughput of a Percona XtraDB Cluster with and without a load balancer as it relates to the “slowest node” factor.

Q: Have you used MaxScale yet? If so, what are your thoughts?

A: Unfortunately I haven’t used MaxScale however Yves Trudeau, Percona Principal Architect, has recently written about MaxScale in this blog post.

Q: How do you configure timeout and maintain persistent connection to HAProxy?

A: I would encourage you to refer to the HAProxy Documentation.

The post Q&A: High availability when using MySQL in the cloud appeared first on MySQL Performance Blog.

Dec
18
2014
--

Making HAProxy 1.5 replication lag aware in MySQL

HAProxy is frequently used as a software load balancer in the MySQL world. Peter Boros, in a past post, explained how to set it up with Percona XtraDB Cluster (PXC) so that it only sends queries to available nodes. The same approach can be used in a regular master-slaves setup to spread the read load across multiple slaves. However with MySQL replication, another factor comes into play: replication lag. In this case the approach mentioned for Percona XtraDB Cluster does not work that well as the check we presented only returns ‘up’ or ‘down’. We would like to be able to tune the weight of a replica inside HAProxy depending on its replication lag. This is what we will do in this post using HAProxy 1.5.

Agent checks in HAProxy

Making HAProxy 1.5 replication lag aware in MySQLHAProxy 1.5 allows us to run an agent check, which is a check that can be added to a regular health check. The benefit of agent checks is that the return value can be ‘up’ or ‘down’, but also a weight.

What is an agent? It is simply a program that can be accessed from a TCP connection on a given port. So if we want to run an agent on a MySQL server that will:

  • Mark the server as down in HAProxy if replication is not working
  • Set the weight to 100% if the replication lag is < 10s
  • Set the weight to 50% if the replication lag is >= 10s and < 60s
  • Set the weight to 5% in all other situations

We can use a script like this:

$ less agent.php
= 10 && $lag < 60){
		return "up 50%";
	}
	else
		return "up 5%";
}
set_time_limit(0);
$socket = stream_socket_server("tcp://127.0.0.1:$port", $errno, $errstr);
if (!$socket) {
	echo "$errstr ($errno)
n";
} else {
	while ($conn = stream_socket_accept($socket,9999999999999)) {
		$cmd = "$mysql -h127.0.0.1 -u$user -p$password -P$mysql_port -Ee "$query" | grep Seconds_Behind_Master | cut -d ':' -f2 | tr -d ' '";
		exec("$cmd",$lag);
		$weight = set_weight($lag[0]);
		unset($lag);
		fputs ($conn, $weight);
		fclose ($conn);
	}
	fclose($socket);
}
?>

If you want the script to be accessible from port 6789 and connect to a MySQL instance running on port 3306, run:

$ php agent.php 6789 3306

You will also need a dedicated MySQL user:

mysql> GRANT REPLICATION CLIENT ON *.* TO 'haproxy'@'127.0.0.1' IDENTIFIED BY 'haproxy_pwd';

When the agent is started, you can check that it is working properly:

# telnet 127.0.0.1 6789
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
up 100%
Connection closed by foreign host.

Assuming it is run locally on the app server, that 2 replicas are available (192.168.10.2 and 192.168.10.3) and that the application will send all reads on port 3307, you will define a frontend and a backend in your HAProxy configuration like this:

frontend read_only-front
bind *:3307
mode tcp
option tcplog
log global
default_backend read_only-back
backend read_only-back
mode tcp
balance leastconn
server slave1 192.168.10.2 weight 100 check agent-check agent-port 6789 inter 1000  rise 1  fall 1 on-marked-down shutdown-sessions
server slave2 192.168.10.3 weight 100 check agent-check agent-port 6789 inter 1000  rise 1  fall 1 on-marked-down shutdown-sessions

Demo

Now that everything is set up, let’s see how HAProxy can dynamically change the weight of the servers depending on the replication lag.

No lag

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# HAProxy
$ echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,200

Slave1 lagging

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 25
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,50
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,150

Slave2 down

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: NULL
# echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,DOWN (agent),100
read_only-back,BACKEND,UP,100

Conclusion

Agent checks are a nice addition in HAProxy 1.5. The setup presented above is a bit simplistic though: for instance, if HAProxy fails to connect to the agent, it will not mark the corresponding as down. It is then recommended to keep a regular health check along with the agent check.

Astute readers will also notice that in this configuration, if replication is broken on all nodes, HAProxy will stop sending reads. This may not be the best solution. Possible options are: stop the agent and mark the servers as UP using the stats socket or add the master as a backup server.

And as a final note, you can edit the code of the agent so that replication lag is measured with Percona Toolkit’s pt-heartbeat instead of Seconds_Behind_Master.

The post Making HAProxy 1.5 replication lag aware in MySQL appeared first on MySQL Performance Blog.

Oct
03
2014
--

HAProxy: Give me some logs on CentOS 6.5!

HAProxy is frequently used as a load-balancer in front of a Galera cluster. While diagnosing an issue with HAProxy configuration, I realized that logging doesn’t work out of the box on CentOS 6.5. Here is a simple recipe to fix the issue.

If you look at the top of /etc/haproxy/haproxy.cfg, you will see something like:

global
    log         127.0.0.1 local2
[...]

This means that HAProxy will send its messages to rsyslog on 127.0.0.1. But by default, rsyslog doesn’t listen on any address, hence the issue.

Let’s edit /etc/rsyslog.conf and uncomment these lines:

$ModLoad imudp
$UDPServerRun 514

This will make rsyslog listen on UDP port 514 for all IP addresses. Optionally you can limit to 127.0.0.1 by adding:

$UDPServerAddress 127.0.0.1

Now create a /etc/rsyslog.d/haproxy.conf file containing:

local2.*    /var/log/haproxy.log

You can of course be more specific and create separate log files according to the level of messages:

local2.=info     /var/log/haproxy-info.log
local2.notice    /var/log/haproxy-allbutinfo.log

Then restart rsyslog and see that log files are created:

# service rsyslog restart
Shutting down system logger:                               [  OK  ]
Starting system logger:                                    [  OK  ]
# ls -l /var/log | grep haproxy
-rw-------. 1 root   root      131  3 oct.  10:43 haproxy-allbutinfo.log
-rw-------. 1 root   root      106  3 oct.  10:42 haproxy-info.log

Now you can start your debugging session!

The post HAProxy: Give me some logs on CentOS 6.5! appeared first on MySQL Performance Blog.

Oct
15
2013
--

Using keepalived for HA on top of Percona XtraDB Cluster

Percona XtraDB Cluster (PXC) itself manages quorum and node failure.  Minorities of nodes in a network partition situation will move themselves into a Non-primary state and not allow any DB activity.  Nodes in such a state will be easily detectable via SHOW GLOBAL STATUS variables.

It’s common to use HAproxy with PXC for load balancing purposes, but what if you are planning to just send traffic to a single node?  We would standardly use keepalived to HA HAproxy, and keepalived supports track_scripts that can monitor whatever we want, so why not just monitor PXC directly?

If we have clustercheck working on all hosts:

mysql> GRANT USAGE ON *.* TO 'clustercheck'@'localhost' IDENTIFIED BY PASSWORD '*2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19';
[root@node1 ~]# /usr/bin/clustercheck clustercheck password 0; echo $?
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40
Percona XtraDB Cluster Node is synced.
0

Then we can just install keepalived and the this config on all nodes:

vrrp_script chk_pxc {
        script "/usr/bin/clustercheck clustercheck password 0"
        interval 1
}
vrrp_instance PXC {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 100
    nopreempt
    virtual_ipaddress {
        192.168.70.100
    }
    track_script {
        chk_pxc
    }
    notify_master "/bin/echo 'now master' > /tmp/keepalived.state"
    notify_backup "/bin/echo 'now backup' > /tmp/keepalived.state"
    notify_fault "/bin/echo 'now fault' > /tmp/keepalived.state"
}

And start the keepalived service. The virtual IP above will be brought up on an active node in the cluster and moved around if clustercheck fails.

[root@node1 ~]# cat /tmp/keepalived.state
now backup
[root@node2 ~]# cat /tmp/keepalived.state
now master
[root@node3 ~]# cat /tmp/keepalived.state
now backup
[root@node2 ~]# ip a l | grep 192.168.70.100
    inet 192.168.70.100/32 scope global eth1
[root@node3 ~]# mysql -h 192.168.70.100 -u test -ptest test -e "show global variables like 'wsrep_node_name'"
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| wsrep_node_name | node2 |
+-----------------+-------+

If I shutdown PXC on node2:

[root@node2 keepalived]# service mysql stop
Shutting down MySQL (Percona XtraDB Cluster)....... SUCCESS!
[root@node2 ~]# /usr/bin/clustercheck clustercheck password 0; echo $?
HTTP/1.1 503 Service Unavailable
Content-Type: text/plain
Connection: close
Content-Length: 44
Percona XtraDB Cluster Node is not synced.
1
[root@node1 ~]# cat /tmp/keepalived.state
now master
[root@node2 ~]# cat /tmp/keepalived.state
now fault
[root@node3 ~]# cat /tmp/keepalived.state
now backup
[root@node1 ~]# ip a l | grep 192.168.70.100
    inet 192.168.70.100/32 scope global eth1
[root@node2 ~]# ip a l | grep 192.168.70.100
[root@node2 ~]#
[root@node3 ~]# ip a l | grep 192.168.70.100
[root@node3 ~]#

We can see node2 moves to a FAULT state and the VIP moves to node1 instead.  This provides us with a very simple way to do Application to PXC high availability.

A few additional notes:

  • You can disqualify donors (i.e., make clustercheck fail on Donor/Desynced nodes) by setting the 3rd argument to clustercheck to 0.  Setting this to 1 means Donors can retain the VIP.
  • Each keepalived instance monitors its own state only, hence the @localhost GRANT.  This is much cleaner than exposing clustercheck as a web port via xinetd.
  • It’s possible to more complex things with keepalived like multiple vips, node weighting, etc.
  • Keepalived can track over multiple network interfaces (in this example, just eth1) for better reliability.

The post Using keepalived for HA on top of Percona XtraDB Cluster appeared first on MySQL Performance Blog.

Jun
17
2013
--

Percona XtraDB Cluster (PXC) in the real world: Share your use cases!

Percona XtraDB ClusterThe aim of this post is to enumerate real-world usage of Percona XtraDB Cluster (PXC), and also to solicit use cases from the readers. One of the prominent usages in the production environment that we have come across (and our Percona consultants have assisted) is that of HP Cloud. There is a post about it here by Patrick Galbraith of HP. The post focuses on their deployment of PXC for HP Cloud DNS. The post focuses on the key aspects of synchronous replication setup with high-availability guarantees like split-brain immunity.

Nobody likes to debug async replication while its broken or do the master-master/master-slave switchover when master is dying/dead. Yes, there are wrappers/scripts around this to make life easier, however, wouldn’t it be nice if this was built into the system itself? PXC based on Galera strives to provide that. Scaling makes sense only when addition/removal of hosts from a cluster or a HA setup is simple and uncomplicated.

Their post focuses on following aspects:

  • Initial setup
  • Setup of other nodes with SST (Xtrabackup SST)
  • Integration of chef with PXC
  • Finally, integration of HAProxy as a loadbalancer.

To elucidate, their initial setup goes into bootstrapping the first node. Note that in the cloud environment other nodes are not known until they are brought up, hence bootstrapping with an empty gcomm:// is done for the first node by the chef. The second node is then added which SSTs with node1 (based on gcomm://node1 of node2) through Xtrabackup SST (state snapshot transfer). Node3 subsequently joins the cluster with node1 and node2 in its gcomm:// (since by this time node1, node2 are up). After this, a subsequent run of chef-client is done to update the cnf files with IP address of members (excluding itself). The rationale behind this is that when a node is restarted (and there are others when it comes up) it joins the cluster seamlessly. I would like to note here that we are adding a bootstrap parameter to PXC so that any latter modifications like these to cnf files are not required and preset it during cluster startup itself. The only caveat is that the node information – IP address or hostname – should be known in advance (the node itself needn’t be up), which may not be feasible in a cloud environment.

Next, the SST. Xtrabackup SST is used there. SST matters a lot because not only is it used during initial node setup but also it is required when a node has been down for a while and IST (incremental state transfer) is not feasible. It also helps when node data integrity is compromised. So, naturally duration of SST is paramount. We recommend Xtrabackup SST for its reduced locking period from its use (which means the donor is blocked for a shorter while). By using Xtrabackup for SST, you also get its benefits like compression, parallel streaming, encryption, compact backups which can be used for SST (Note, the wsrep_sst_xtrabackup in 5.5.30 can’t do those except parallel, the one in 5.5.31 will handle them all, also XB 2.1 is required for most).

Finally, the HAProxy. HAProxy is one of the loadbalancers recommended for use with PXC. The other one is glb. HAProxy is used with xinetd on the node along with a script which checks PXC for its sync status. As referenced in that post, you can refer a post by Peter Boros (“Percona XtraDB Cluster reference architecture with HaProxy“) for details. In their setup they have automated this with a HAProxy in each AZ (Availability Zone) for the API server. To add, we are looking at reducing the overhead here, through steps like replacing xinetd and clustercheck with a single serving process (we are adding one in 5.5.31), looking for optimizations with HAProxy to account for high connection rates, and using pacemaker with PXC. The goal is to reduce the overhead of status checks, mainly on the node. You can also look this PLMCE talk for HAProxy deployment strategies with PXC.

To conclude, it is interesting to note that they have been able to manage this with a small team. That strongly implies scalability of resources – you scale more with less, and that is how it should be. We would like to hear from you about your architectural setup around PXC – any challenges you faced (and horror stories if any), any special deployment methodologies you employed (Puppet, Chef, Salt, Ansible etc. ), and finally any suggestions.

The post Percona XtraDB Cluster (PXC) in the real world: Share your use cases! appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com