Dec
02
2014
--

Tips from the trenches for over-extended MySQL DBAs

This post is a follow-up to my November 19 webinar, “Tips from the Trenches: A Guide to Preventing Downtime for the Over-Extended DBA,” during which I described some of the most common reasons DBAs experience avoidable downtime. The session was aimed at the “over-extended DBA,” defined as the MySQL DBA who is short on time, or an engineer from another discipline without deep knowledge of MySQL. The over-extended DBA may be prone to fundamental mistakes that cause downtime: poor response times, operations that block important data, or administrative mishaps stemming from the lack of best-practice monitoring and alerting. (You can download my slides and view the recorded webinar here.)

Monitor the things
One of the keys to keeping a system up and running is ensuring that your finger is on the pulse of the environment. Here on the Percona Managed Services team, we leverage the Percona Monitoring Plugins (open source plugins for Nagios, Cacti and Zabbix) to ensure we have visibility into our clients’ operations. Having a handle on basics such as disk space, memory usage and MySQL operational metrics ensures that we avoid trivial downtime that would affect the client’s uptime or, worse, their bottom line.
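
For example, two of the Nagios checks from the Percona Monitoring Plugins cover a lot of ground. This is a minimal sketch: the connection options are omitted and the thresholds are illustrative only, not recommendations:

# warn at 5 minutes of replication lag, go critical at 10
pmp-check-mysql-replication-delay -w 300 -c 600
# warn when connections exceed 80% of max_connections, critical at 95%
pmp-check-mysql-status -x Threads_connected -o / -y max_connections -T pct -w 80 -c 95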

Road Blocks
One of the most common reasons an application is unable to serve data to its end users is that access to a table is blocked by another ongoing operation. This can be blamed on a variety of sources: backups, schema changes, poor configuration and long-running transactions can all lend themselves to costly blocking. Understanding the impact of your actions on a MySQL server can be the difference between a happy end user and a frustrated one.

During the webinar I made reference to some resources and techniques that can help the over-extended DBA avoid downtime. Here are some highlights.

Monitoring and Alerting
It’s important that you have some indication that something is reaching its capacity, be it disk space, connections to MySQL or the auto_increment limit on a heavily used table. There is quite a landscape to cover, but here are a handful of helpful tools (a sample auto_increment headroom check follows the list):
* Percona Monitoring Plugins
* Monyog
* New Relic
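
As an example of the kind of check that pays off, the query below (a rough sketch, run through the mysql client as elsewhere in this post) lists the tables with the highest auto_increment counters so you can compare them against the maximum value of the column type:

mysql -e "SELECT table_schema, table_name, auto_increment
          FROM information_schema.tables
          WHERE auto_increment IS NOT NULL
          ORDER BY auto_increment DESC
          LIMIT 10;"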

Query Tuning
Poorly performing SQL can indicate that the configuration is incorrect, that an index is missing or that your development team needs a quick lesson on MySQL anti-patterns. Arm yourself with proof that the SQL statements are substandard using these resources, and work with the source to make things more efficient (a sample pt-query-digest run follows the list):
* Percona Cloud Tools
* pt-query-digest, explain, indexes
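
For instance, a first pass with pt-query-digest over the slow query log (the log path here is an assumption) ranks the worst offenders, and an EXPLAIN of the top query usually shows whether an index is missing:

# summarize the worst queries from the slow log
pt-query-digest /var/lib/mysql/slow.log > /tmp/digest.txt
# then inspect the execution plan of the top offender from the digest
mysql -e "EXPLAIN SELECT ...;"   # replace with the actual statement reported by the digest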

High Availability
If you need to ensure that your application survives hiccups such as hardware failure or network impairment, a well deployed HA solution will give you the peace of mind that you can quickly mitigate bumps in the road.
* MHA
Percona XtraDB Cluster, Galera
* Percona Replication Manager
* LinuxHA/Corosync/DRBD

Backups
A wise man once said: “A backup today saves you tomorrow.” Covering all bases can be the difference between recovering from a catastrophic failure and job hunting. Mixing logical, physical and incremental backups, along with some offsite copies, provides a safety net for the day a small mistake like a dropped table happens or, worse, all working copies of data and backups are lost in a SAN failure. It happens, so be prepared. Some useful tools (a minimal example follows the list):
* Percona XtraBackup
* mydumper
* mysqldump
* mysqlbinlog (5.6)
* mylvmbackup
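
As a minimal illustration (the paths and credentials are placeholders), a nightly physical copy plus a logical dump might look like this:

# physical backup with Percona XtraBackup
innobackupex --user=backup --password=secret /data/backups/
# logical backup, consistent for InnoDB thanks to --single-transaction
mysqldump --single-transaction --routines --triggers --all-databases | gzip > /data/backups/full-$(date +%F).sql.gz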

We had some great questions from the attendees and regrettably were unable to answer them all, so here are some of them along with my responses.

Q: I use MySQL on Amazon RDS. Aren’t many of the operations automated, or do these tips still apply?
A: It’s not completely automated. There are still challenges to address and configuration opportunities, but understanding the limitations of RDS is key. For example, the location and size of the tmpdir is something you are unable to customise on RDS. You would typically review this config in a production environment if your workload required it. Any costly queries that perform operations requiring tmp area to sort (think OLAP) might not be a good fit on RDS due to this limitation. Getting to know the limitations around hosted or DBaaS services is time well spent to avoid explaining what keeps taking the application down in peak hours.
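
A quick way to see whether your workload leans on the tmp area, on RDS or anywhere else, is to check the tmpdir setting and the temporary table counters:

# a high Created_tmp_disk_tables relative to Created_tmp_tables suggests heavy on-disk sorting
mysql -e "SHOW GLOBAL VARIABLES LIKE 'tmpdir'; SHOW GLOBAL STATUS LIKE 'Created_tmp%';"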

Q: What other parts of Percona Toolkit do you recommend for MySQL operations?
A: Percona Toolkit is a well-evolved suite of tools that all MySQL DBAs should familiarize themselves with. In particular, many of these tools fit into my weekly workflow:

Operations

  • pt-online-schema-change
  • pt-table-checksum
  • pt-table-sync

Troubleshooting

  • pt-stalk
  • pt-pmp
  • pt-config-diff

Knowledge Gathering

  • pt-summary
  • pt-mysql-summary
  • pt-duplicate-key-checker

The key with Percona Toolkit is that many common tasks or problems that could cause you to reinvent the wheel are covered by mature, production-ready tools. As with any tool, you should always read the label, or in this case the documentation, so you’re well aware of what the tools can do, the risks, and the features you can make use of.

Q: HA – are there any solutions that you would stay away from?
A: Choosing any particular HA solution is going to be an R&D exercise. You will need to understand the tradeoffs and configuration options, and compare products. Some might have a higher TCO or lack functionality. Once the chosen solution is implemented, it’s important that the engineers understand the technology well enough to troubleshoot it and to use its functionality when a failover needs to be instigated. I prefer HA solutions that are fast to fail over to; some entail starting MySQL from cold, which slows things down.

Q: You mentioned having tested backups. How do you perform this?
A: Percona’s method is to use a dedicated host with access to the backup files. Then, with a combination of mysqlsandbox and pt-table-checksum, we can determine whether we can trust the files we capture for disaster recovery. Many people underestimate the importance of this task.
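
This is not Percona’s exact internal procedure, just a rough sketch of the idea; the paths, port and table name are placeholders:

# apply the redo log to the latest XtraBackup copy and start a throwaway instance on it
innobackupex --apply-log /data/backups/latest/
mysqld_safe --no-defaults --datadir=/data/backups/latest --port=3307 --socket=/tmp/verify.sock &
# run basic sanity checks before trusting the copy for disaster recovery
mysql -S /tmp/verify.sock -e "CHECK TABLE mydb.orders; SELECT COUNT(*) FROM mydb.orders;"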

Q: Percona Cloud Tools – how much does it cost?
A: Right now it’s a free service. Visit cloud.percona.com for more information, but in a nutshell Percona Cloud Tools is a hosted service providing access to query performance insights for all MySQL users.

Q: Is there API access to Percona Cloud Tools for application integration?
A: There is currently no public API available. It is on the roadmap, though. We’d be interested to hear more about your use case, so please sign up for the service and try it out. After signing in, all pages include a Feedback link to share your thoughts and ideas, such as how you’d like to use a public API.

Q: Can you use MHA with Percona XtraDB Cluster?
A: MHA is not something that can be used with Percona XtraDB Cluster (PXC). It’s common to partner PXC with HAProxy to make sure your writes go to the appropriate node.
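
A commonly used pattern is an HAProxy listener that sends writes to a single node and only fails over when that node leaves the cluster. This is a rough sketch; the IPs and the clustercheck port (9200) are assumptions:

listen pxc-writes
    bind 0.0.0.0:3306
    mode tcp
    balance leastconn
    option httpchk              # clustercheck answers HTTP 200 only when the node is Synced
    server pxc1 10.0.0.1:3306 check port 9200
    server pxc2 10.0.0.2:3306 check port 9200 backup
    server pxc3 10.0.0.3:3306 check port 9200 backup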

Q: Can MHA make automatic failover? If MHA has automatic failover, what do you recommend? Configure it for automatic failover?
A: MHA can perform automatic failover. Personally, I prefer managed failover. When working with automated failover it’s important that failback is manual to avoid “flapping.” “Splitbrain” is an ailment that you don’t want to suffer from either, and auto failover removes the human judgment from the decision to relocate operations from a failed node onto a standby node. If you are going to opt for automatic failover, it is advisable to test all potential failure scenarios and to employ a STONITH method to really ensure that the unresponsive node is not serving read/write traffic.

Q: What is the best way to detect database blocking from DML statements? Is there a tool that will show blocking after the fact so you don’t have to catch it real-time?
A: Once again, Percona has a tool for this: pt-deadlock-logger can detect and log deadlocks. Detecting blocking can be achieved using “SHOW ENGINE INNODB STATUS” or the information_schema.innodb_locks and innodb_lock_waits tables. Some engineering might be required to get this logged, but the resources exist (a sample query is sketched below).
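
For MySQL 5.5/5.6 with InnoDB, a query along these lines (standard information_schema tables, nothing Percona-specific) shows who is blocking whom in real time; logging its output on a schedule gives you the after-the-fact view:

mysql -e "SELECT r.trx_mysql_thread_id AS waiting_thread, r.trx_query AS waiting_query,
                 b.trx_mysql_thread_id AS blocking_thread, b.trx_query AS blocking_query
          FROM information_schema.innodb_lock_waits w
          JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
          JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;"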

Q: Since you mentioned tinkering with ELK I was wondering if you had any tips on good Kibana dashboards to build to monitor MySQL databases/clusters?
A: ELK is something that I’m looking to publish some information on soon so watch this space!

Thanks again everyone for the great questions! And as a reminder, you can download my slides and view the recorded webinar here.

The post Tips from the trenches for over-extended MySQL DBAs appeared first on MySQL Performance Blog.

Aug
05
2014
--

New in Percona Replication Manager: Slave resync, Async stop

Percona Replication Manager (PRM) continues to evolve and improve, and I want to introduce two new features: Slave resync and Async stop.

Slave resync

This behavior is for regular, non-GTID replication. When a master crashes and is no longer available, how do we make sure all the slaves are in a consistent state? It is easy to find the most up-to-date slave based on the master execution position and promote it to be the new master; the PRM agent already does that. But how do we apply the missing transactions to the other slaves?

In order to solve that problem, I modified a tool originally written by Yelp that outputs the MD5 sums of the payloads (XID boundaries) and the commit positions of a binlog file. It produces output like:

root@yves-desktop:/home/yves/src/PRM/percona-pacemaker-agents/tools/prm_binlog_parser# ./prm_binlog_parser.static.x86_64 /opt/prm/binlog.000382 | tail -n 5
53844618,E5A35A971445FC8E77D730932869F2
53846198,E37AA6704BE572727B342ABE6EFA935
53847779,B17816AC37424BB87B3CD55226D5EB17
53848966,A7CFF846D281DB363FE4192C38DD7743
53850351,A9C5B7FC24A1BA3B97C332CC362633CE

The first field is the commit position and the second field is the MD5 sum of the payload. The only type of transaction that is not supported is “LOAD DATA INTO”. Of course, since it relies on XID values, this only works with InnoDB. It also requires “log_slave_updates” to be enabled. The sources and some static binaries can be found in the tools directory of the PRM GitHub repository; the link is at the bottom.

So, if the agent detects a master crash and the prm_binlog_parser tool is available (set via the prm_binlog_parser_path primitive parameter), then upon promotion the new master will look at its binary logs and publish the last 3000 transactions to the Pacemaker CIB. The 3000-transaction limit comes from bash: the command line must be below 64KB. With some work this limit could be raised, but I believe, maybe wrongly, that it covers most cases. The published data in the CIB attribute also contains the corresponding binlog file name.

The next step happens on the slaves, when they get the post-promotion notification. They look for the MD5 sum of their last transaction in the relay log, again using the prm_binlog_parser tool, find the matching file and position in the last 3000 transactions the new master published to the CIB and reconfigure replication using the corrected file and position.
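
A simplified sketch of that matching step, done by hand outside of the agent (the relay log name and the dumped attribute file are placeholders; the real logic lives in the PRM resource agent):

# MD5 sum of the last transaction the slave applied, taken from its most recent relay log
LAST_MD5=$(./prm_binlog_parser.static.x86_64 /var/lib/mysql/relay-bin.000123 | tail -n 1 | cut -d, -f2)
# look it up in the list the new master published to the CIB (dumped to a file here for clarity)
grep ",${LAST_MD5}" /tmp/master_last_3000_transactions.txt | cut -d, -f1
# the position returned, together with the published binlog file, is what goes into CHANGE MASTER TO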

The result is a much more resilient solution that helps keep the slaves in a consistent state. My next goal regarding this feature will be to look at the crashed master and attempt to salvage any transactions from the old master binlog in the case where MySQL crashed but Pacemaker is still up and running.

Async_stop

The async_stop feature, enabled by the “async_stop” primitive parameter, allows for faster failover. Without this feature, when stopping mysqld, PRM will wait until it is confirmed stopped before completing a failover. When there are many InnoDB dirty pages, we all know that stopping MySQL can take many minutes. Jervin Real, a colleague at Percona, suggested that we should fake to Pacemaker that MySQL is already stopped in order to proceed faster with the failover. After adding some safeguards, it proved to be a very good idea. I’ll spare you the implementation details, but now, if the setting is enabled, the failover completes as soon as mysqld is effectively stopping. If Pacemaker tries to start a MySQL instance that is still stopping, a very unusual situation, an error will be generated.

PRM agents, tools and documentation can be found here: https://github.com/percona/percona-pacemaker-agents

The post New in Percona Replication Manager: Slave resync, Async stop appeared first on MySQL Performance Blog.

Jan
20
2014
--

Percona Replication Manager (PRM) now supporting 5.6 GTID

Over the last few days, I integrated the MySQL 5.6 GTID version of the Percona Replication Manager (PRM), the work of Frédéric Descamps, a colleague at Percona. The agent supports the GTID replication mode of MySQL 5.6 and, if the master suffers a hard crash, it picks the slave that has applied the highest transaction ID from the dead master. Given the nature of GTID-based replication, all the other slaves then resync appropriately to their new master, which is pretty cool and has yet to be matched by the regular PRM agent.

For now, it is part of a separate agent, mysql_prm56, which may be integrated with the regular agent in the future. To use it, download the agent with the link above; the Pacemaker configuration is similar to that of the regular PRM agent. If you start from scratch, have a look here and, of course, replace “mysql_prm” with “mysql_prm56”. Keep in mind that although it successfully ran many tests, this is the first release and there’s no field experience yet. I invite you to report any issue or successful usage to PRM-discuss.

As a side note, dealing with GTID-based replication is slightly different from regular replication. I invite you to consult these posts for more details:

Replication in MySQL 5.6: GTIDs benefits and limitations – Part 1

How to create/restore a slave using GTID replication in MySQL 5.6
How to create a new (or repair a broken) GTID based slave with Percona XtraBackup
Repair MySQL 5.6 GTID replication by injecting empty transactions

The post Percona Replication Manager (PRM) now supporting 5.6 GTID appeared first on MySQL Performance Blog.

Jan
10
2014
--

The use of Iptables ClusterIP target as a load balancer for PXC, PRM, MHA and NDB

Most technologies achieving high availability for MySQL need a load balancer to spread the client connections over the valid database hosts; even the Tungsten special connector can be seen as a sophisticated load balancer. People often use a hardware load balancer or a software solution like haproxy. In both cases, in order to avoid having a single point of failure, multiple load balancers must be used. Load balancers have two drawbacks: they increase network latency and/or they add a validation-check load on the database servers. The increased network latency is obvious in the case of standalone load balancers, where you must first connect to the load balancer, which then completes the request by connecting to one of the database servers. Some workloads, like reporting/ad-hoc queries, are not affected by a small increase in latency, but other workloads, like OLTP processing and real-time logging, are. Each load balancer must also check regularly whether the database servers are in a sane state, so adding more load balancers increases the idle chatter over the network. In order to reduce these impacts, a very different type of load balancer is needed; let me introduce the iptables ClusterIP target.

Normally, as stated by RFC 1812, Requirements for IP Version 4 Routers, an IP address must be unique on a network and each host must respond only for IPs it owns. In order to achieve a load-balancing behavior, the iptables ClusterIP target doesn’t strictly respect the RFC. The principle is simple: each computer in the cluster shares an IP address and MAC address with the other members, but it answers requests only for a given subset of the traffic, based on the modulo of a hash of network values, sourceIP-sourcePort by default. The behavior is controlled by an iptables rule and by the content of the kernel file /proc/net/ipt_CLUSTERIP/VIP_ADDRESS. The /proc file simply tells the kernel which portion of the traffic it should answer. I don’t want to go too deep into the details here since all those things are handled by the Pacemaker resource agent, IPaddr2.
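
For reference, this is roughly what the underlying rule looks like when created by hand on one node of a 3-node cluster (the VIP, interface and MAC are taken from the example later in this post); the resource agent does all of this for you:

iptables -I INPUT -d 172.30.212.100 -i eth1 -j CLUSTERIP --new \
    --hashmode sourceip-sourceport --clustermac 01:00:5E:91:18:86 \
    --total-nodes 3 --local-node 1
# hash buckets can be claimed or released at runtime through the proc interface
echo "+2" > /proc/net/ipt_CLUSTERIP/172.30.212.100
echo "-2" > /proc/net/ipt_CLUSTERIP/172.30.212.100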

The IPaddr2 Pacemaker resource agent is commonly used for VIPs, but what is less known is its behavior when defined as part of a clone set. When part of a clone set, the resource agent defines a VIP which uses the iptables ClusterIP target; the iptables rules and the handling of the proc file are all done automatically. That seems very nice in theory but, until recently, I never succeeded in having a suitable distribution behavior. When starting the clone set on, let’s say, three nodes, it distributes correctly, one instance on each, but if 2 nodes fail and then recover, the clone instances all go to the 3rd node and stay there even after the first two nodes recover. That bugged me for quite a while, but I finally modified the resource agent and found a way to have it work correctly. It also now correctly sets the MAC address, if none is provided, to the multicast MAC address range which starts with “01:00:5E”. The new agent, IPaddr3, is available here. Now, let’s show what we can achieve with it.

We’ll start from the setup described in my previous post and we’ll modify it. First, download and install the IPaddr3 agent.

root@pacemaker-1:~# wget -O /usr/lib/ocf/resource.d/percona/IPaddr3 https://github.com/percona/percona-pacemaker-agents/raw/master/agents/IPaddr3
root@pacemaker-1:~# chmod u+x /usr/lib/ocf/resource.d/percona/IPaddr3

Repeat these steps on all 3 nodes. Then, we’ll modify the pacemaker configuration like this (I’ll explain below):

node pacemaker-1 \
        attributes standby="off"
node pacemaker-2 \
        attributes standby="off"
node pacemaker-3 \
        attributes standby="off"
primitive p_cluster_vip ocf:percona:IPaddr3 \
        params ip="172.30.212.100" nic="eth1" \
        meta resource-stickiness="0" \
        op monitor interval="10s"
primitive p_mysql_monit ocf:percona:mysql_monitor \
        params reader_attribute="readable_monit" writer_attribute="writable_monit" user="repl_user" password="WhatAPassword" pid="/var/lib/mysql/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" max_slave_lag="5" cluster_type="pxc" \
        op monitor interval="1s" timeout="30s" OCF_CHECK_LEVEL="1"
clone cl_cluster_vip p_cluster_vip \
        meta clone-max="3" clone-node-max="3" globally-unique="true"
clone cl_mysql_monitor p_mysql_monit \
        meta clone-max="3" clone-node-max="1"
location loc-distrib-cluster-vip cl_cluster_vip \
        rule $id="loc-distrib-cluster-vip-rule" -1: p_cluster_vip_clone_count gt 1
location loc-enable-cluster-vip cl_cluster_vip \
        rule $id="loc-enable-cluster-vip-rule" 2: writable_monit eq 1
location loc-no-cluster-vip cl_cluster_vip \
        rule $id="loc-no-cluster-vip-rule" -inf: writable_monit eq 0
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="3" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1384275025" \
        maintenance-mode="off"

First, the VIP primitive is modified to use the new agent, IPaddr3, and we set resource-stickiness="0". Next, we define the cl_cluster_vip clone set using: clone-max="3" to have three instances, clone-node-max="3" to allow up to three instances on the same node and globally-unique="true" to tell Pacemaker it has to allocate an instance on a node even if there’s already one. Finally, there are three location rules needed to get the behavior we want: one using the p_cluster_vip_clone_count attribute and the other two around the writable_monit attribute. Enabling all that gives:

root@pacemaker-1:~# crm_mon -A1
============
Last updated: Tue Jan  7 10:51:38 2014
Last change: Tue Jan  7 10:50:38 2014 via cibadmin on pacemaker-1
Stack: openais
Current DC: pacemaker-2 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
6 Resources configured.
============
Online: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
 Clone Set: cl_cluster_vip [p_cluster_vip] (unique)
     p_cluster_vip:0    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:1    (ocf::percona:IPaddr3): Started pacemaker-1
     p_cluster_vip:2    (ocf::percona:IPaddr3): Started pacemaker-2
 Clone Set: cl_mysql_monitor [p_mysql_monit]
     Started: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
Node Attributes:
* Node pacemaker-1:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-2:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-3:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1

and the network configuration is:

root@pacemaker-1:~# iptables -L INPUT -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
CLUSTERIP  all  --  0.0.0.0/0            172.30.212.100       CLUSTERIP hashmode=sourceip-sourceport clustermac=01:00:5E:91:18:86 total_nodes=3 local_node=1 hash_init=0
root@pacemaker-1:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
2
root@pacemaker-2:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
3
root@pacemaker-3:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
1

In order to test the access, you need to query the VIP from a fourth node:

root@pacemaker-4:~# while [ 1 ]; do mysql -h 172.30.212.100 -u repl_user -pWhatAPassword -BN -e "select variable_value from information_schema.global_variables where variable_name like 'hostname';"; sleep 1; done
pacemaker-1
pacemaker-1
pacemaker-2
pacemaker-2
pacemaker-2
pacemaker-3
pacemaker-2
^C

So, all good… Let’s now desync pacemaker-1 and pacemaker-2.

root@pacemaker-1:~# mysql -e 'set global wsrep_desync=1;'
root@pacemaker-1:~#
root@pacemaker-2:~# mysql -e 'set global wsrep_desync=1;'
root@pacemaker-2:~#
root@pacemaker-3:~# crm_mon -A1
============
Last updated: Tue Jan  7 10:53:51 2014
Last change: Tue Jan  7 10:50:38 2014 via cibadmin on pacemaker-1
Stack: openais
Current DC: pacemaker-2 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
6 Resources configured.
============
Online: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
 Clone Set: cl_cluster_vip [p_cluster_vip] (unique)
     p_cluster_vip:0    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:1    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:2    (ocf::percona:IPaddr3): Started pacemaker-3
 Clone Set: cl_mysql_monitor [p_mysql_monit]
     Started: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
Node Attributes:
* Node pacemaker-1:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 0
    + writable_monit                    : 0
* Node pacemaker-2:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 0
    + writable_monit                    : 0
* Node pacemaker-3:
    + p_cluster_vip_clone_count         : 3
    + readable_monit                    : 1
    + writable_monit                    : 1
root@pacemaker-3:~# cat /proc/net/ipt_CLUSTERIP/172.30.212.100
1,2,3
root@pacemaker-4:~# while [ 1 ]; do mysql -h 172.30.212.100 -u repl_user -pWhatAPassword -BN -e "select variable_value from information_schema.global_variables where variable_name like 'hostname';"; sleep 1; done
pacemaker-3
pacemaker-3
pacemaker-3
pacemaker-3
pacemaker-3
pacemaker-3

Now, if pacemaker-1 and pacemaker-2 are back in sync, we have the desired distribution:

root@pacemaker-1:~# mysql -e 'set global wsrep_desync=0;'
root@pacemaker-1:~#
root@pacemaker-2:~# mysql -e 'set global wsrep_desync=0;'
root@pacemaker-2:~#
root@pacemaker-3:~# crm_mon -A1
============
Last updated: Tue Jan  7 10:58:40 2014
Last change: Tue Jan  7 10:50:38 2014 via cibadmin on pacemaker-1
Stack: openais
Current DC: pacemaker-2 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
6 Resources configured.
============
Online: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
 Clone Set: cl_cluster_vip [p_cluster_vip] (unique)
     p_cluster_vip:0    (ocf::percona:IPaddr3): Started pacemaker-3
     p_cluster_vip:1    (ocf::percona:IPaddr3): Started pacemaker-1
     p_cluster_vip:2    (ocf::percona:IPaddr3): Started pacemaker-2
 Clone Set: cl_mysql_monitor [p_mysql_monit]
     Started: [ pacemaker-1 pacemaker-2 pacemaker-3 ]
Node Attributes:
* Node pacemaker-1:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-2:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1
* Node pacemaker-3:
    + p_cluster_vip_clone_count         : 1
    + readable_monit                    : 1
    + writable_monit                    : 1

All the clone instances redistributed on all nodes as we wanted.

As a conclusion, Pacemaker with a clone set of IPaddr3 is a very interesting kind of load balancer, especially if you already have Pacemaker deployed. It introduces almost no latency, it doesn’t need any other hardware, it doesn’t increase the database validation load and it is as highly available as your database is. The only drawback I can see is the case where the inbound traffic is very heavy: all nodes receive all the traffic and are equally saturated. With database and web-type traffic, the inbound traffic is usually small. This solution also doesn’t redistribute the connections based on the server load like a load balancer can do, but that would be fairly easy to implement with something like a server_load attribute and an agent similar to mysql_monitor that checks the server load instead of the database status (a rough sketch follows). In such a case, I suggest using much more than 1 VIP clone instance per node to have better granularity in the load distribution. Finally, the ClusterIP target, although still fully supported, has been deprecated in favor of the Cluster-match target. It is basically the same principle and I plan to adapt the IPaddr3 agent to Cluster-match in the near future.
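
A rough sketch of what such a server_load attribute updater could look like; this is not an existing PRM agent, just an illustration of the idea:

# push the 1-minute load average, multiplied by 100, as a transient node attribute
LOAD=$(awk '{ printf "%d", $1 * 100 }' /proc/loadavg)
crm_attribute --node $(hostname) --name server_load --update $LOAD --lifetime reboot
# location rules could then prefer the nodes with the lowest server_load values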

The post The use of Iptables ClusterIP target as a load balancer for PXC, PRM, MHA and NDB appeared first on MySQL Performance Blog.

Dec
10
2013
--

Q&A: Geographical disaster recovery with Percona Replication Manager

My December 4 webinar, “Geographical disaster recovery with Percona Replication Manager (PRM),” gave rise to a few questions. The recording of the webinar and the slides are available here, and I’ve answered the questions I didn’t have time to address below.

Q1: Hi, I was wondering if Corosync will work in a cloud environment. As far as I know it is hard to implement because of the lack of unicast or multicast support.

A1: Corosync has supported the udpu transport since somewhere in the 1.3.0 branch. udpu stands for UDP unicast and it works in AWS, for instance. Most recent distributions are using 1.4.x, so it is easy to find. A minimal totem configuration is sketched below.
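
This is a rough sketch of a corosync.conf totem section using udpu (corosync 1.4.x syntax; the addresses are placeholders and every cluster member must be listed):

totem {
    version: 2
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastport: 5405
        member {
            memberaddr: 10.0.0.11
        }
        member {
            memberaddr: 10.0.0.12
        }
        member {
            memberaddr: 10.0.0.13
        }
    }
}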

Q2: For the token, wouldn’t it make sense to have a value higher than 3000 to account for any packet loss and the default retry of 3 seconds for TCP communication?

A2: Corosync uses UDP, not TCP, so the argument is somewhat irrelevant.

Q3: Is PRM supported with a Percona support contract?

A3: Yes, PRM is now supported with a Percona Gold or Platinum Support contract. It is not available with Silver support.

Q4: Is PRM working with GTID’s in 5.6?

A4: There’s a version in the testing phase, adapted by Frédéric Descamps, that works with 5.6/GTID. As soon as it is tested properly, I’ll release it. So far, it is very clean in terms of logic.

Q5: Does Percona Replication Manager do anything different with replication, compared to the built-in MySQL replication, to combat its single-threaded nature?

A5: No

Q6: We agree that fencing always has to be configured (even with no shared resources, in case of a MySQL stop failure for example). What is the behavior of MySQL and Percona’s PRM resource agent when fencing, and what are your recommendations concerning fencing? There was no fencing in the demo and there is no fencing configured in the Pacemaker crm snippet examples provided in Percona’s GitHub repo.

A6: Fencing is independent of PRM and is well covered elsewhere on the web. I’ll see about adding an example configuration with STONITH devices. On real hardware, the most common STONITH devices are IPMI and iLO. These technologies come with nearly all server brands.

Q7: Is there any other MySQL HA setup supported by Percona’s mysql_prm Pacemaker resource agent than MySQL master-slave replication, like multi-master setups? If yes, will you release some crm configuration snippet examples for other MySQL HA setups?

A7: No, only a single master is supported. The main argument here is that multiple masters don’t scale writes and are a big source of conflicts.

Q8: Why use Percona Replication Manager over XtraDB Cluster (omitting ease of geo-DR) from a write performance perspective, also considering HA and cost?

A8: Percona XtraDB Cluster (PXC) is more advanced and capable, but some workloads, like large transactions, do not work well with PXC. Regular replication is well known, and many customers are not willing to try a newer technology like PXC, preferring to rely on regular replication. Also, PRM can be mixed with PXC without problem; an example of such a configuration will be published soon. In terms of cost, both are free; support is available from Percona (Gold and Platinum), but PXC support carries a premium.

The post Q&A: Geographical disaster recovery with Percona Replication Manager appeared first on MySQL Performance Blog.

Nov
26
2013
--

Geographical disaster recovery with PRM: Register for Dec. 4 Webinar

Downtime caused by a disaster is probable during your application’s lifetime. Whether caused by a large-scale geographical disaster, a cyber attack, spiked consumer demand, or a relatively less catastrophic event, downtime equates to lost business. Setting up geographical disaster recovery (geo-DR) has always been challenging, but Percona Replication Manager (PRM) with booth provides a solution for an efficient geo-DR deployment.

Join me on Wednesday, December 4th at 10 a.m. PST for a presentation of the geo-DR capabilities of PRM, followed by a step-by-step setup of a full geo-DR solution. The title of the webinar is “MySQL High Availability and Geographical Disaster Recovery with Percona Replication Manager” and you can register here.

Feel free to ask questions in advance here in the comments section. The webinar will be recorded and available for replay here shortly afterward.

I hope to see you on Wednesday December 4th!

The post Geographical disaster recovery with PRM: Register for Dec. 4 Webinar appeared first on MySQL Performance Blog.

Oct
23
2013
--

High-availability options for MySQL, October 2013 update

The technologies for building highly-available (HA) MySQL solutions are in constant evolution, and they cover very different needs and use cases. In order to help people choose the best HA solution for their needs, Jay Janssen and I decided to publish, on a regular basis (hopefully, this is the first), an update on the most common technologies and their state, with a focus on what type of workloads suit them best. We restricted ourselves to the open source solutions that provide automatic failover. Of course, don’t simply count the positive/negative items; they don’t all carry the same weight. Should you pick any of these technologies, heavy testing is mandatory; HA is never better than the scenarios that have been tested.

Percona XtraDB Cluster (PXC)

Percona XtraDB Cluster (PXC) is a version of Percona Server implementing the Galera replication protocol from Codership.

Positive points
  • Almost synchronous replication, very small lag if any
  • Automatic failover
  • At best with small transactions
  • All nodes are writable
  • Very small read after write lag, usually no need to care about
  • Scale reads very well and to some extent, writes
  • New nodes are provisioned automatically through State Snapshot Transfer (SST)
  • Multi-threaded apply, greater write capacity than regular replication
  • Can do geographical disaster recovery (Geo DR)
  • More resilient to unresponsive nodes (swapping)
  • Can resolve split-brain situations by itself
Negative points
  • Still under development, some rough edges
  • Large transactions like multi-statement transactions or large write operations cause issues and are usually not a good fit
  • For quorum reasons, 3 nodes are needed but one can be a lightweight arbitrator
  • SST can be heavy over a WAN
  • Commits are affected by network latency; this especially impacts Geo DR
  • To achieve HA, a load balancer, like haproxy, is needed
  • Failover time is determined by the load balancer check frequency
  • Performance is affected by the weakest/busiest node
  • Foreign Keys are potential issues
  • MyISAM should be avoided
  • Can be mixed with regular async replication as master or slave, but slaves are not easy to reconfigure after an SST on their master
  • Require careful setup of the host, swapping can lead to node expulsion from the cluster
  • No manual failover mode
  • Debugging some Galera protocol issues isn’t trivial

 

Percona replication manager (PRM)

Percona Replication Manager (PRM) uses the Linux-HA Pacemaker resource manager to manage MySQL and replication and to provide high availability. Information about PRM can be found here; the official page on the Percona website is in the making.

Positive points
  • Nothing specific regarding the workload
  • Unlimited number of slaves
  • Slaves can have different roles
  • Typically VIP based access, typically 1 writer VIP and many reader VIPs
  • Also works without VIP (see the fake_mysql_novip agent)
  • Detects if a slave lags too much and removes reader VIPs from it
  • All nodes are monitored
  • The best slave is picked as master after failover
  • Geographical disaster recovery possible with the lightweight booth protocol
  • Can be operated in manual failover mode
  • Graceful failover is quick, under 2s in normal conditions
  • Ungraceful failover under 30s
  • Distributed operation with Pacemaker, no single point of failure
  • Builtin pacemaker logic, stonith, etc. Very rich and flexible.
Negative points
  • Still under development, some rough edges
  • Transactions may be lost if the master crashes (async replication)
  • For quorum reasons, 3 nodes are needed but one can be a lightweight arbitrator
  • Only one node is writable
  • Read after write may not be consistent (replication lag)
  • Only scales reads
  • Careful setup for the host, swapping can lead to node expulsion from the cluster
  • Data inconsistency can happen if the master crashes (fix coming)
  • Pacemaker is complex, logs are difficult to read and understand

 

MySQL master HA (MHA)

Like PRM above, MySQL Master HA (MHA) provides high availability through replication. The approach is different: instead of relying on an HA framework like Pacemaker, it uses Perl scripts. Information about MHA can be found here.

Positive points
  • Mature
  • Nothing specific regarding the workload
  • No latency effects on writes
  • Can have many slaves and slaves can have different roles
  • Very good binlog/relaylog handling
  • Work pretty hard to minimise data loss
  • Can be operated in manual failover mode
  • Graceful failover is quick, under 5s in normal conditions
  • If the master crashes, slaves will be consistent
  • The logic is fairly easy to understand
Negative points
  • Transactions may be lost if the master crashes (async replication)
  • Only one node is writable
  • Read after write may not be consistent (replication lag)
  • Only scales reads
  • Monitoring and logic are centralized (a single point of failure); a network partition can cause a split-brain
  • Custom fencing devices, custom VIP scripts, no reuse of other projects tools
  • Most of the deployments are using manual failover (at least at Percona)
  • Requires privileged SSH access to read relay logs, which can be a security concern
  • No monitoring of the slaves to invalidate one if it lags too much or if replication is broken; this needs to be done by an external tool like HAProxy
  • Careful setup for the host, swapping can lead to node expulsion from the cluster

 

NDB Cluster

NDB Cluster is the most high-end form of high-availability configuration for MySQL. It is a complete shared-nothing architecture where the storage engine is distributed over multiple servers (data nodes). Probably the best starting point with NDB is the official documentation, here.

Positive points
  • Mature
  • Synchronous replication
  • Very good at small transactions
  • Very good at high concurrency (many client threads)
  • Huge transaction capacity, more than 1M trx/s are not uncommon
  • Failover can be ~1s
  • No single point of failure
  • Geographical disaster recovery capacity built-in
  • Strong at async replication, applying by batches gives multithreaded apply at the data node level
  • Can scale reads and writes, the framework implements sharding by hashes
Negative points
  • Not a drop-in replacement for InnoDB, you need to tune the schema and the queries
  • Not a general purpose database, some loads like reporting are just bad
  • Only the Read-committed isolation level is available
  • Hardware heavy, needs 4 servers minimum for full HA
  • Memory (RAM) hungry, even with disk-based tables
  • Complex to operate, lots of parameters to adjust
  • Need a load balancer for failover
  • Very new foreign key support, field reports scarce on it

 

Shared storage/DRBD

Achieving high availability using a shared storage medium is an old and well-known method. It is used by nearly all the major databases. The shared storage can be a DAS connected to two servers, a LUN on a SAN accessible from 2 servers, or a DRBD partition replicated synchronously over the network. DRBD is by far the most common shared storage device used in the MySQL world.

Positive points
  • Mature
  • Synchronous replication (DRBD)
  • Automatic failover is easy to implement
  • VIP based access
Negative points
  • Write capacity is impacted by network latency for DRBD
  • SANs are expensive
  • Only for InnoDB
  • Standby node, a big server doing nothing
  • Need a warmup period after failover to be fully operational
  • Disk corruption can spread

 

The post High-availability options for MySQL, October 2013 update appeared first on MySQL Performance Blog.

May
29
2013
--

How to fix your PRM cluster when upgrading to RHEL/CentOS 6.4

If you are using Percona Replication Manager (PRM) with RHEL/CentOS prior to 6.4, upgrading your distribution to 6.4 may break your cluster. In this post I will explain how to fix your cluster in case it breaks after a distribution upgrade that implies an update of Pacemaker from 1.1.7 to 1.1.8. You can also follow the official documentation here.

The version of Pacemaker (always considered a Technology Preview by RedHat) provided with 6.4 is 1.1.8-x, which is not 100% compatible with 1.1.7-x; see this report.

So if you want to upgrade, you cannot apply a rolling upgrade process. As with the Pacemaker 0.6.x to 1.0.x upgrade, you again need to update all nodes at once. As noted in RHBA-2013-0375, RedHat encourages people to use Pacemaker in combination with the CMAN manager (it may become mandatory with the next release).

CMAN v3 is a Corosync plugin that monitors the names and number of active cluster nodes in order to deliver membership and quorum information to clients (such as the Pacemaker daemons), and it is part of the RedHat cluster stack. If you were using the Puppet recipes published previously here, you are not yet using CMAN.

Let’s have a look at what happens with a cluster of 3 nodes (CentOS 6.3) using PRM as an OCF resource:

[root@percona1 percona]# crm_mon -1
============
Last updated: Thu May 23 08:04:30 2013
Last change: Thu May 23 08:03:41 2013 via crm_attribute on percona2
Stack: openais
Current DC: percona1 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, 3 expected votes
7 Resources configured.
============

Online: [ percona1 percona2 percona3 ]

reader_vip_1 (ocf::heartbeat:IPaddr2): Started percona3
reader_vip_2 (ocf::heartbeat:IPaddr2): Started percona2
reader_vip_3 (ocf::heartbeat:IPaddr2): Started percona1
writer_vip (ocf::heartbeat:IPaddr2): Started percona1
Master/Slave Set: ms_MySQL [p_mysql]
Masters: [ percona2 ]
Slaves: [ percona3 percona1 ]

[root@percona1 ~]# cat /etc/redhat-release
CentOS release 6.3 (Final)
[root@percona1 ~]# rpm -q pacemaker
pacemaker-1.1.7-6.el6.x86_64
[root@percona1 ~]# rpm -q corosync
corosync-1.4.1-7.el6_3.1.x86_64

Everything is working :-)
Let’s update our system to 6.4 on one server…

NOTE: In production you should put the cluster in maintenance mode before the update; see below how to perform this action.

[root@percona1 percona]# yum update -y

[root@percona1 percona]# cat /etc/redhat-release
CentOS release 6.4 (Final)

[root@percona1 ~]# rpm -q pacemaker
pacemaker-1.1.8-7.el6.x86_64
[root@percona1 ~]# rpm -q corosync
corosync-1.4.1-15.el6_4.1.x86_64

Let’s reboot it…

[root@percona1 percona]# reboot

If we check the cluster from another node, we see that percona1 is now offline:

============
Last updated: Thu May 23 08:29:36 2013
Last change: Thu May 23 08:03:41 2013 via crm_attribute on percona2
Stack: openais
Current DC: percona3 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, 3 expected votes
7 Resources configured.
============

Online: [ percona2 percona3 ]
OFFLINE: [ percona1 ]

reader_vip_1 (ocf::heartbeat:IPaddr2): Started percona2
reader_vip_2 (ocf::heartbeat:IPaddr2): Started percona3
reader_vip_3 (ocf::heartbeat:IPaddr2): Started percona2
writer_vip (ocf::heartbeat:IPaddr2): Started percona3
Master/Slave Set: ms_MySQL [p_mysql]
Masters: [ percona2 ]
Slaves: [ percona3 ]
Stopped: [ p_mysql:2 ]

After the update, and after fixing some small issues like the one below, you are able to start Corosync and Pacemaker, but the node doesn’t join the cluster :(

May 23 08:34:12 percona1 corosync[1535]: [MAIN ] parse error in config: Can't open logfile '/var/log/corosync.log' for reason: Permission denied (13).#012.

So now you need to update all nodes to Pacemaker 1.1.8, but to avoid issues again with the next distribution update, I prefer to use CMAN as recommended.

First, as we have 2 of 3 nodes running, we should try not to stop all our servers. Let’s put the cluster in maintenance mode (don’t forget, you should have done this even before updating the first node, but I wanted to simulate the problem):

[root@percona3 percona]# crm configure property maintenance-mode=true

We can see that the resources are unmanaged:

============
Last updated: Thu May 23 08:43:49 2013
Last change: Thu May 23 08:43:49 2013 via cibadmin on percona3
Stack: openais
Current DC: percona3 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, 3 expected votes
7 Resources configured.
============

Online: [ percona2 percona3 ]
OFFLINE: [ percona1 ]

reader_vip_1 (ocf::heartbeat:IPaddr2): Started percona2 (unmanaged)
reader_vip_2 (ocf::heartbeat:IPaddr2): Started percona3 (unmanaged)
reader_vip_3 (ocf::heartbeat:IPaddr2): Started percona2 (unmanaged)
writer_vip (ocf::heartbeat:IPaddr2): Started percona3 (unmanaged)
Master/Slave Set: ms_MySQL [p_mysql] (unmanaged)
p_mysql:0 (ocf::percona:mysql): Master percona2 (unmanaged)
p_mysql:1 (ocf::percona:mysql): Started percona3 (unmanaged)
Stopped: [ p_mysql:2 ]

Now we can upgrade all servers to 6.4

[root@percona2 percona]# yum -y update
[root@percona3 percona]# yum -y update

Meanwhile, we can already prepare the first node to use CMAN:

[root@percona1 ~]# yum -y install cman ccs

Back on the two nodes that were updating, they are now updated to 6.4:

[root@percona3 percona]# cat /etc/redhat-release
CentOS release 6.4 (Final)

And let’s check the cluster status:

[root@percona3 percona]# crm_mon -1
Could not establish cib_ro connection: Connection refused (111)

Connection to cluster failed: Transport endpoint is not connected…

…but MySQL is still running:

[root@percona2 percona]# mysqladmin ping
mysqld is alive

[root@percona3 percona]# mysqladmin ping
mysqld is alive

Let’s install CMAN on percona2 and percona3 too:

[root@percona2 percona]# yum -y install cman ccs
[root@percona3 percona]# yum -y install cman ccs

Then on ALL nodes, stop Pacemaker and Corosync

[root@percona1 ~]# /etc/init.d/pacemaker stop
[root@percona1 ~]# /etc/init.d/corosync stop
[root@percona2 ~]# /etc/init.d/pacemaker stop
[root@percona2 ~]# /etc/init.d/corosync stop
[root@percona3 ~]# /etc/init.d/pacemaker stop
[root@percona3 ~]# /etc/init.d/corosync stop

Remove Corosync from the startup services:

[root@percona1 ~]# chkconfig corosync off
[root@percona2 ~]# chkconfig corosync off
[root@percona3 ~]# chkconfig corosync off

Let’s specify that the cluster can start without quorum:

[root@percona1 ~]# sed -i.sed "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" /etc/sysconfig/cman
[root@percona2 ~]# sed -i.sed "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" /etc/sysconfig/cman
[root@percona3 ~]# sed -i.sed "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" /etc/sysconfig/cman

And create the cluster, perform the following command on one server only:

[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --createcluster lefred_prm

Now add the nodes to the cluster:

[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --addnode percona1
Node percona1 added.
[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --addnode percona2
Node percona2 added.
[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --addnode percona3
Node percona3 added.

We then need to delegate the fencing to Pacemaker (adding a fence device, fence methods for each node, and the fence instances):

[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk

[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect percona1
Method pcmk-redirect added to percona1.
[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect percona2
Method pcmk-redirect added to percona2.
[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect percona3
Method pcmk-redirect added to percona3.

[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk percona1 pcmk-redirect port=percona1
[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk percona2 pcmk-redirect port=percona2
[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk percona3 pcmk-redirect port=percona3

Encrypt the cluster:

[root@percona1 ~]# ccs -f /etc/cluster/cluster.conf --setcman keyfile="/etc/corosync/authkey" transport="udpu"

Let’s check if the configuration file is OK:

[root@percona1 ~]# ccs_config_validate -f /etc/cluster/cluster.conf
Configuration validates

We can now copy the configuration file on all nodes:

[root@percona1 ~]# scp /etc/cluster/cluster.conf percona2:/etc/cluster/
[root@percona1 ~]# scp /etc/cluster/cluster.conf percona3:/etc/cluster/

Enable CMAN at startup on all nodes:

[root@percona1 ~]# chkconfig cman on
[root@percona2 ~]# chkconfig cman on
[root@percona3 ~]# chkconfig cman on

And start the services on all nodes:

[root@percona1 ~]# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot… [ OK ]
Checking Network Manager… [ OK ]
Global setup… [ OK ]
Loading kernel modules… [ OK ]
Mounting configfs… [ OK ]
Starting cman… [ OK ]
Waiting for quorum… [ OK ]
Starting fenced… [ OK ]
Starting dlm_controld… [ OK ]
Tuning DLM kernel config… [ OK ]
Starting gfs_controld… [ OK ]
Unfencing self… [ OK ]
Joining fence domain… [ OK ]
[root@percona1 ~]# /etc/init.d/pacemaker start
Starting Pacemaker Cluster Manager: [ OK ]

[root@percona2 ~]# /etc/init.d/cman start
[root@percona2 ~]# /etc/init.d/pacemaker start
[root@percona3 ~]# /etc/init.d/cman start
[root@percona3 ~]# /etc/init.d/pacemaker start

We can now connect crm_mon to the cluster and check its status:

[root@percona2 percona]# crm_mon -1
Last updated: Thu May 23 09:18:58 2013
Last change: Thu May 23 09:16:31 2013 via crm_attribute on percona1
Stack: cman
Current DC: percona1 - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, 3 expected votes
7 Resources configured.

Online: [ percona1 percona2 percona3 ]

reader_vip_1 (ocf::heartbeat:IPaddr2): Started percona3
reader_vip_2 (ocf::heartbeat:IPaddr2): Started percona2
reader_vip_3 (ocf::heartbeat:IPaddr2): Started percona1
writer_vip (ocf::heartbeat:IPaddr2): Started percona1
Master/Slave Set: ms_MySQL [p_mysql]
Masters: [ percona1 ]
Slaves: [ percona2 percona3 ]

We can see that some resources changed; this is because we didn’t put the cluster in maintenance mode on node1 before the update to 6.4.

If we had put everything in maintenance mode, as it should have been before the upgrade to 6.4, it would now be time to stop the maintenance mode… but the crm command is not present anymore ;)

It’s still possible to install crmsh (the crm shell, from another repository) or just install pcs (Pacemaker Configuration System):

[root@percona2 percona]# yum -y install pcs
[root@percona2 percona]# pcs status
Last updated: Thu May 23 09:24:37 2013
Last change: Thu May 23 09:16:31 2013 via crm_attribute on percona1
Stack: cman
Current DC: percona1 - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, 3 expected votes
7 Resources configured.

Online: [ percona1 percona2 percona3 ]

Full list of resources:

reader_vip_1 (ocf::heartbeat:IPaddr2): Started percona3
reader_vip_2 (ocf::heartbeat:IPaddr2): Started percona2
reader_vip_3 (ocf::heartbeat:IPaddr2): Started percona1
writer_vip (ocf::heartbeat:IPaddr2): Started percona1
Master/Slave Set: ms_MySQL [p_mysql]
Masters: [ percona1 ]
Slaves: [ percona2 percona3 ]

So if you were in maintenance mode, you should have:

[root@percona2 percona]# pcs status
Last updated: Thu May 23 09:26:56 2013
Last change: Thu May 23 09:26:50 2013 via cibadmin on percona2
Stack: cman
Current DC: percona1 - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, 3 expected votes
7 Resources configured.

Online: [ percona1 percona2 percona3 ]

Full list of resources:

reader_vip_1 (ocf::heartbeat:IPaddr2): Started percona3 (unmanaged)
reader_vip_2 (ocf::heartbeat:IPaddr2): Started percona2 (unmanaged)
reader_vip_3 (ocf::heartbeat:IPaddr2): Started percona1 (unmanaged)
writer_vip (ocf::heartbeat:IPaddr2): Started percona1 (unmanaged)
Master/Slave Set: ms_MySQL [p_mysql] (unmanaged)
p_mysql:0 (ocf::percona:mysql): Master percona1 (unmanaged)
p_mysql:1 (ocf::percona:mysql): Slave percona2 (unmanaged)
p_mysql:2 (ocf::percona:mysql): Slave percona3 (unmanaged)

And now you are able to stop maintenance mode:

[root@percona2 percona]# pcs property set maintenance-mode=false

You can also check your cluster using cman_tool or clustat (if you have installed rgmanager):

[root@percona3 ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 64 2013-05-23 09:52:03 percona1
2 M 64 2013-05-23 09:52:03 percona2
3 M 64 2013-05-23 09:52:03 percona3

[root@percona3 ~]# clustat
Cluster Status for lefred_prm @ Thu May 23 10:20:36 2013
Member Status: Quorate

Member Name ID Status
------ ---- ---- ------
percona1 1 Online
percona2 2 Online
percona3 3 Online, Local

Now the cluster is fixed, everything works again as expected, and you should be ready for the next distro upgrade!

INFO: If you have the file /etc/corosync/service.d/pcmk you need to delete it before installing CMAN

The post How to fix your PRM cluster when upgrading to RHEL/CentOS 6.4 appeared first on MySQL Performance Blog.
