Aug 02, 2018

Amazon RDS Multi-AZ Deployments and Read Replicas

RDS Multi-AZ

Amazon RDS is a managed relational database service that makes it easier to set up, operate, and scale a relational database in the cloud. One of the common questions we get is: “What is Multi-AZ, how is it different from a Read Replica, and do I need both?” I try to answer that question in this blog post, and the answer depends on your application’s needs: are you looking for high availability (HA), read scalability, or both?

Before we go into detail, let me explain two common terms used with AWS.

Region – an AWS Region is a separate geographical area, such as US East (N. Virginia), Asia Pacific (Mumbai), or EU (London). Each AWS Region has multiple, isolated locations known as Availability Zones.

Availability Zone (AZ) – an AZ consists of one or more data centers, each with redundant power, networking, and connectivity, housed in separate facilities. The data centers are geographically isolated from each other while remaining within the same Region.

What is Multi-AZ?

Amazon RDS provides high availability and failover support for DB instances using Multi-AZ deployments.

In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica of the primary DB instance in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to the standby replica to provide data redundancy, support failover, and minimize latency during system backups. In the event of planned database maintenance, a DB instance failure, or an Availability Zone failure, Amazon RDS automatically fails over to the standby so that database operations can resume quickly without administrative intervention.
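
You don’t need the console to enable this. As a minimal sketch (assuming an existing instance with the placeholder identifier mydb), the AWS CLI can convert an instance to Multi-AZ:

# Convert an existing RDS instance to a Multi-AZ deployment.
# A standby replica is provisioned in another AZ of the same region.
aws rds modify-db-instance \
    --db-instance-identifier mydb \
    --multi-az \
    --apply-immediately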

You can check in the AWS Management Console whether a database instance is configured as Multi-AZ: select the RDS service, click on the DB instance, and review the details section.

AWS management console showing that instance is Multi-AZ

This screenshot from the AWS Management Console (above) shows that the database is a Multi-AZ deployment and that the standby replica is deployed in the us-east-1a AZ.
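
If you prefer the CLI over the console, the same information can be pulled with describe-db-instances; mydb is again a placeholder identifier:

# Show the Multi-AZ flag plus the current and secondary AZs.
aws rds describe-db-instances \
    --db-instance-identifier mydb \
    --query 'DBInstances[0].[MultiAZ,AvailabilityZone,SecondaryAvailabilityZone]'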

Benefits of Multi-AZ deployment:

  • Replication to the standby replica is synchronous, which makes it highly durable.
  • When a problem is detected on the primary instance, RDS automatically fails over to the standby under the following conditions:
    • The primary DB instance fails.
    • An Availability Zone outage occurs.
    • The DB instance server type is changed.
    • The operating system of the DB instance is undergoing software patching.
    • A manual failover of the DB instance was initiated using Reboot with failover (see the CLI sketch after this list).
  • The endpoint of the DB instance remains the same after a failover, so the application can resume database operations without manual intervention.
  • If a failure occurs, your availability impact is limited to the time the automatic failover takes to complete. This helps achieve increased availability.
  • It reduces the impact of maintenance. RDS performs maintenance on the standby first, promotes the standby to primary, and then performs maintenance on the old primary, which is now the standby replica.
  • To prevent the backup process from affecting performance, Amazon RDS creates backups from the standby replica.
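
As referenced in the failover conditions above, you can trigger a manual Multi-AZ failover yourself with Reboot with failover. A minimal AWS CLI sketch, where mydb is a placeholder instance identifier:

# Reboot the instance and force a failover to the standby replica.
aws rds reboot-db-instance \
    --db-instance-identifier mydb \
    --force-failover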

Amazon RDS does not fail over automatically in response to database operations such as long-running queries, deadlocks, or database corruption errors. Also, Multi-AZ deployments are limited to a single region; cross-region Multi-AZ is not currently supported.

Can I use an RDS standby replica for read scaling?

Multi-AZ deployments are not a read scaling solution: you cannot use the standby replica to serve read traffic. Multi-AZ maintains the standby replica for HA/failover, and it is available for use only when RDS promotes it to primary. To serve read-only traffic, you should use a Read Replica instead.

What is a Read Replica?

Read replicas allow you to have a read-only copy of your database.

When you create a Read Replica, you first specify an existing DB instance as the source. Amazon RDS then takes a snapshot of the source instance, creates a read-only instance from that snapshot, and uses MySQL’s native asynchronous replication to keep the Read Replica up to date with changes on the source. The source DB instance must have automatic backups enabled before you can set up a Read Replica.
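
For illustration, here is a minimal AWS CLI sketch of creating a Read Replica; both identifiers are placeholders:

# Create a Read Replica from an existing source instance.
aws rds create-db-instance-read-replica \
    --db-instance-identifier mydb-replica-1 \
    --source-db-instance-identifier mydb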

Benefits of Read Replica

  • A Read Replica helps decrease the load on the primary DB instance by serving read-only traffic.
  • A Read Replica can be manually promoted to a standalone database instance (see the sketch after this list).
  • You can create Read Replicas within the same AZ, across AZs, or across Regions.
  • You can have up to five Read Replicas per master, each with its own DNS endpoint. Unlike a Multi-AZ standby replica, you can connect to each Read Replica and use it for read scaling.
  • You can have Read Replicas of Read Replicas.
  • Read Replicas can be Multi-AZ enabled.
  • You can use Read Replicas to take logical backups (mysqldump/mydumper) if you want to store the backups externally to RDS (also shown in the sketch after this list).
  • A Read Replica helps you maintain a copy of your databases in a different region for disaster recovery.
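
As a quick sketch of two of the items above (promotion and logical backups), assuming placeholder identifiers, endpoints, and credentials:

# Promote an existing Read Replica to a standalone instance.
aws rds promote-read-replica \
    --db-instance-identifier mydb-replica-1

# Take a logical backup from the replica endpoint rather than the primary.
mysqldump -h mydb-replica-1.abc123xyz.us-east-1.rds.amazonaws.com \
    -u admin -p --single-transaction --routines --triggers \
    mydatabase > mydatabase.sql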

At AWS re:Invent 2017, AWS announced the preview of Amazon Aurora Multi-Master, which allows users to create multiple Aurora writer nodes and helps scale reads and writes across multiple AZs. You can sign up for the preview here.

Conclusion

While both Multi-AZ deployments and Read Replicas maintain a copy of the database, they are different in nature. Use Multi-AZ deployments for high availability and Read Replicas for read scalability. You can additionally set up a cross-region Read Replica for disaster recovery.


Feb 25, 2016

High availability with asynchronous replication… and transparent R/W split

In this post, the first of a MaxScale series, I describe how to use MariaDB’s MaxScale and the MySQL Utilities with MySQL asynchronous replication.

When we talk about high availability with asynchronous replication, we always think about MHA or PRM. But if we want to transparently use the slave(s) for READs, what can we use?

Description:

  • Three MySQL servers, but one has very limited resources and will never be able to handle the production load. In fact this node is used for backup and some back-office queries.
  • We would like to use one of the nodes as a master and the other two as slaves, but only one will be addressed by the application for the READs. If needed, that same node will become the master.
  • The application doesn’t distinguish between READ and WRITE connections, and it’s impossible to change it.

To achieve our goals, we will use MaxScale and its R/W split filter. When using MaxScale and asynchronous replication with MariaDB, it’s possible to use MariaDB’s replication manager, which is a wonderful tool written in Go. Unfortunately, that tool doesn’t support standard MySQL, so I used Oracle’s MySQL Utilities instead.

Our three nodes are:

  • percona1 (master)
  • percona2 (powerful slave)
  • percona3 (weak slave)

It’s mandatory in this solution to use GTID-based replication, as it’s the only method supported by the mysql-utilities we are using.
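
As a reminder of what that implies, every MySQL node must have GTID replication enabled. A minimal my.cnf sketch of the relevant settings (the exact options and accepted values vary slightly between MySQL versions):

[mysqld]
server_id                = 1          # must be unique on each node
log_bin                  = mysql-bin
log_slave_updates        = ON
gtid_mode                = ON
enforce_gtid_consistency = ON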

This is the MaxScale configuration:

[maxscale]
threads=4
[Splitter Service]
type=service
router=readwritesplit
servers=percona1, percona2
user=maxscale
passwd=264D375EC77998F13F4D0EC739AABAD4
[Splitter Listener]
type=listener
service=Splitter Service
protocol=MySQLClient
port=3306
socket=/tmp/ClusterMaster
[percona1]
type=server
address=192.168.90.2
port=3306
protocol=MySQLBackend
[percona2]
type=server
address=192.168.90.3
port=3306
protocol=MySQLBackend
[percona3]
type=server
address=192.168.90.4
port=3306
protocol=MySQLBackend
[Replication Monitor]
type=monitor
module=mysqlmon
servers=percona1, percona2, percona3
user=maxscale
passwd=264D375EC77998F13F4D0EC739AABAD4
monitor_interval=1000
script=/usr/local/bin/failover.sh
events=master_down
[CLI]
type=service
router=cli
[CLI Listener]
type=listener
service=CLI
protocol=maxscaled
address=localhost
port=6603

As you can see, the Splitter Service contains only the two nodes able to handle the load.

And to perform the failover, in the Replication Monitor section, we define a script to use when the master is down.

That script calls mysqlrpladmin from the mysql-utilities.

In the script, we also define the following line to make sure the weak slave will never become a master:

never_master=192.168.90.4

When everything is set up and running, you should see something like this:

# maxadmin -pmariadb list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
percona1           | 192.168.90.2    |  3306 |          15 | Master, Running
percona2           | 192.168.90.3    |  3306 |        1025 | Slave, Running
percona3           | 192.168.90.4    |  3306 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

As you can see, MaxScale discovers on its own which server is the master; this doesn’t need to be specified in the configuration.

You can also use the mysqlrpladmin utility to verify the cluster’s health:

# /usr/bin/mysqlrpladmin --rpl-user=repl:replpercona --master=manager:percona@192.168.90.2:3306 --slaves=manager:percona@192.168.90.3:3306,manager:percona@192.168.90.4:3306  health
# Checking privileges.
#
# Replication Topology Health:
+---------------+-------+---------+--------+------------+---------+
| host          | port  | role    | state  | gtid_mode  | health  |
+---------------+-------+---------+--------+------------+---------+
| 192.168.90.2  | 3306  | MASTER  | UP     | ON         | OK      |
| 192.168.90.3  | 3306  | SLAVE   | UP     | ON         | OK      |
| 192.168.90.4  | 3306  | SLAVE   | UP     | ON         | OK      |
+---------------+-------+---------+--------+------------+---------+

Try it with --verbose!

When we test with sysbench and stop the master, we can see some errors due to disconnects. Also, during the promotion of the new master, sysbench can’t reconnect:

[  20s] queue length: 0, concurrency: 0
[  21s] threads: 8, tps: 2.00, reads: 28.00, writes: 8.00, response time: 107.61ms (95%), errors: 0.00, reconnects:  0.00
[  21s] queue length: 0, concurrency: 0
[  22s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  22s] queue length: 0, concurrency: 0
[  23s] threads: 8, tps: 1.00, reads: 14.00, writes: 4.00, response time: 100.85ms (95%), errors: 0.00, reconnects:  0.00
[  23s] queue length: 0, concurrency: 0
[  24s] threads: 8, tps: 0.00, reads: 11.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  24s] queue length: 0, concurrency: 1
[  25s] threads: 8, tps: 1.00, reads: 3.00, writes: 4.00, response time: 235.41ms (95%), errors: 0.00, reconnects:  0.00
[  25s] queue length: 0, concurrency: 0
[  26s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  26s] queue length: 0, concurrency: 0
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
[  27s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  27s] queue length: 0, concurrency: 3
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
[  28s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  28s] queue length: 0, concurrency: 4
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
[  29s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  29s] queue length: 0, concurrency: 5
[  30s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  30s] queue length: 0, concurrency: 5
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
[  31s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[  31s] queue length: 0, concurrency: 7
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
WARNING: Both max-requests and max-time are 0, running endless test
sysbench 0.5:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 8
Target transaction rate: 1/sec
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored
Threads started!
FATAL: unable to connect to MySQL server, aborting...
FATAL: error 1045: failed to create new session
PANIC: unprotected error in call to Lua API (Failed to connect to the database)
WARNING: Both max-requests and max-time are 0, running endless test
sysbench 0.5:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 8
Target transaction rate: 1/sec
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored
Threads started!
[   1s] threads: 8, tps: 1.99, reads: 27.93, writes: 7.98, response time: 211.49ms (95%), errors: 0.00, reconnects:  0.00
[   1s] queue length: 0, concurrency: 0
[   2s] threads: 8, tps: 1.00, reads: 14.00, writes: 4.00, response time: 51.01ms (95%), errors: 0.00, reconnects:  0.00
[   2s] queue length: 0, concurrency: 0
[   3s] threads: 8, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[   3s] queue length: 0, concurrency: 0
[   4s] threads: 8, tps: 1.00, reads: 13.99, writes: 4.00, response time: 80.28ms (95%), errors: 0.00, reconnects:  0.00

It took 8 seconds to fail over automatically.

Then we can see the status of the servers:

# maxadmin -pmariadb list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
percona1           | 192.168.90.2    |  3306 |          17 | Down
percona2           | 192.168.90.3    |  3306 |        1025 | Master, Running
percona3           | 192.168.90.4    |  3306 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

If we restart percona1, we now see:

# maxadmin -pmariadb list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
percona1           | 192.168.90.2    |  3306 |          17 | Running
percona2           | 192.168.90.3    |  3306 |        1025 | Master, Running
percona3           | 192.168.90.4    |  3306 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

To add the node back into the asynchronous replication topology as a slave, we need to use another MySQL utility, mysqlreplicate:

# mysqlreplicate --master=manager:percona@192.168.90.3 --slave=manager:percona@192.168.90.2 --rpl-user=repl:replpercona
# master on 192.168.90.3: ... connected.
# slave on 192.168.90.2: ... connected.
# Checking for binary logging on master...
# Setting up replication...
# ...done.

This is the source of failover.sh:

#!/bin/bash
# failover.sh
# wrapper script to mysqlrpladmin
# user:password pair, must have administrative privileges.
user=manager:percona
# user:password pair, must have REPLICATION SLAVE privileges.
repluser=repl:replpercona
never_master=192.168.90.4
ARGS=$(getopt -o '' --long 'event:,initiator:,nodelist:' -- "$@")
eval set -- "$ARGS"
while true; do
    case "$1" in
        --event)
            shift;
            event=$1
            shift;
        ;;
        --initiator)
            shift;
            initiator=$1
            shift;
        ;;
        --nodelist)
            shift;
            nodelist=$1
            shift;
        ;;
        --)
            shift;
            break;
        ;;
    esac
done
# find the candidates
for i in $(echo $nodelist | sed 's/,/\n/g')
do
  if [[ "$i" =~ "$never_master" ]]
  then
     # do nothing
     echo nothing >/dev/null
  else
     if [[ "$i" =~ "$initiator" ]]
     then
	# do nothing
        echo nothing >/dev/null
     else
        candidates="$candidates,${user}@${i}"
     fi
  fi
  if [[ "$i" =~ "$initiator" ]]
  then
     # do nothing
     echo nothing >/dev/null
  else
     slaves="$slaves,${user}@${i}"
  fi
done
cmd="/usr/bin/mysqlrpladmin --rpl-user=$repluser --slaves=${slaves#?} --candidates=${candidates#?} failover"
# uncomment following line for debug
#echo $cmd >> /tmp/fred
eval $cmd
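
For reference, here is how the wrapper could be exercised by hand with the options it parses. I’m assuming the node list is passed in the host:port form used by the mysqlrpladmin calls above, with percona1 as the failed initiator:

# Simulate what the Replication Monitor would run on a master_down event.
/usr/local/bin/failover.sh \
    --event master_down \
    --initiator 192.168.90.2:3306 \
    --nodelist 192.168.90.2:3306,192.168.90.3:3306,192.168.90.4:3306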

In the next post, we will focus on the monitoring module used in this configuration.
