Oct
10
2018
--

MongoDB Replica set Scenarios and Internals

MongoDB replica sets replication internals r

MongoDB replica sets replication internals rThe MongoDB® replica set is a group of nodes with one set as the primary node, and all other nodes set as secondary nodes. Only the primary node accepts “write” operations, while other nodes can only serve “read” operations according to the read preferences defined. In this blog post, we’ll focus on some MongoDB replica set scenarios, and take a look at the internals.

Example configuration

We will refer to a three node replica set that includes one primary node and two secondary nodes running as:

"members" : [
{
"_id" : 0,
"name" : "192.168.103.100:25001",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 3533,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"electionTime" : Timestamp(1537797392, 2),
"electionDate" : ISODate("2018-09-24T13:56:32Z"),
"configVersion" : 3,
"self" : true
},
{
"_id" : 1,
"name" : "192.168.103.100:25002",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 3063,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"optimeDurableDate" : ISODate("2018-09-24T14:49:44Z"),
"lastHeartbeat" : ISODate("2018-09-24T14:49:45.539Z"),
"lastHeartbeatRecv" : ISODate("2018-09-24T14:49:44.664Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "192.168.103.100:25001",
"configVersion" : 3
},
{
"_id" : 2,
"name" : "192.168.103.100:25003",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2979,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"optimeDurableDate" : ISODate("2018-09-24T14:49:44Z"),
"lastHeartbeat" : ISODate("2018-09-24T14:49:45.539Z"),
"lastHeartbeatRecv" : ISODate("2018-09-24T14:49:44.989Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "192.168.103.100:25002",
"configVersion" : 3
}

Here, the primary is running on port 25001, and the two secondaries are running on ports 25002 and 25003 on the same host.

Secondary nodes can only sync from Primary?

No, it’s not mandatory. Each secondary can replicate data from the primary or any other secondary to the node that is syncing. This term is also known as chaining, and by default, this is enabled.

In the above replica set, you can see that secondary node

"_id":2 

  is syncing from another secondary node

"_id":1

   as

"syncingTo" : "192.168.103.100:25002" 

This can also be found in the logs as here the parameter

chainingAllowed :true

   is the default setting.

settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 60000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5ba8ed10d4fddccfedeb7492') } }

Chaining?

That means that a secondary member node is able to replicate from another secondary member node instead of from the primary node. This helps to reduce the load from the primary. If the replication lag is not tolerable, then chaining could be disabled.

For more details about chaining and the steps to disable it please refer to my earlier blog post here.

Ok, then how does the secondary node select the source to sync from?

If Chaining is False

When chaining is explicitly set to be false, then the secondary node will sync from the primary node only or could be overridden temporarily.

If Chaining is True

  • Before choosing any sync node, TopologyCoordinator performs validations like:
    • Whether chaining is set to true or false.
    • If that particular node is part of the current replica set configurations.
    • Identify the node ahead with oplog with the lowest ping time.
    • The source code that includes validation is here.
  • Once the validation is done, SyncSourceSelector relies on SyncSourceResolver which contains the result and details for the new sync source
  • To get the details and response, SyncSourceResolver coordinates with ReplicationCoordinator
  • This ReplicationCoordinator is responsible for the replication, and co-ordinates with TopologyCoordinator
  • The TopologyCoordinator is responsible for topology of the cluster. It finds the primary oplog time and checks for the maxSyncSourceLagSecs
  • It will reject the source to sync from if the maxSyncSourceLagSecs  is greater than the newest oplog entry. The code for this can be found here
  • If the criteria for the source selection is not fulfilled, then BackgroundSync thread waits and restarts the whole process again to get the sync source.

Example for “unable to find a member to sync from” then, in the next attempt, finding a candidate to sync from

This can be found in the log like this. On receiving the message from rsBackgroundSync thread

could not find member to sync from

, the whole internal process restarts and finds a member to sync from i.e.

sync source candidate: 192.168.103.100:25001

, which means it is now syncing from node 192.168.103.100 running on port 25001.

2018-09-24T13:58:43.197+0000 I REPL     [rsSync] transition to RECOVERING
2018-09-24T13:58:43.198+0000 I REPL     [rsBackgroundSync] could not find member to sync from
2018-09-24T13:58:43.201+0000 I REPL     [rsSync] transition to SECONDARY
2018-09-24T13:58:59.208+0000 I REPL     [rsBackgroundSync] sync source candidate: 192.168.103.100:25001

  • Once the sync source node is selected, SyncSourceResolver probes the sync source to confirm that it is able to fetch the oplogs.
  • RollbackID is also fetched i.e. rbid  after the first batch is returned by oplogfetcher.
  • If all eligible sync sources are too fresh, such as during initial sync, then the syncSourceStatus Oplog start is missing and earliestOpTimeSeen will set a new minValid.
  • This minValid is also set in the case of rollback and abrupt shutdown.
  • If the node has a minValid entry then this is checked for the eligible sync source node.

Example showing the selection of a new sync source when the existing source is found to be invalid

Here, as the logs show, during sync the node chooses a new sync source. This is because it found the original sync source is not ahead, so not does not contain recent oplogs from which to sync.

2018-09-25T15:20:55.424+0000 I REPL     [replication-1] Choosing new sync source because our current sync source, 192.168.103.100:25001, has an OpTime ({ ts: Timestamp 1537879296000|1, t: 4 }) which is not ahead of ours ({ ts: Timestamp 1537879296000|1, t: 4 }), it does not have a sync source, and it's not the primary (sync source does not know the primary)

2018-09-25T15:20:55.425+0000 W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: sync source 192.168.103.100:25001 (config version: 3; last applied optime: { ts: Timestamp 1537879296000|1, t: 4 }; sync source index: -1; primary index: -1) is no longer valid

  • If the secondary node is too far behind the eligible sync source node, then the node will enter maintenance node and then resync needs to be call manually.
  • Once the sync source is chosen, BackgroundSync starts oplogFetcher.

Example for oplogFetcher

Here is an example of fetching oplog from the “oplog.rs” collection, and checking for the greater than required timestamp.

2018-09-26T10:35:07.372+0000 I COMMAND  [conn113] command local.oplog.rs command: getMore { getMore: 20830044306, collection: "oplog.rs", maxTimeMS: 5000, term: 7, lastKnownCommittedOpTime: { ts: Timestamp 1537955038000|1, t: 7 } } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1537903865000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 7, readConcern: { afterOpTime: { ts: Timestamp 1537903865000|1, t: 6 } } } planSummary: COLLSCAN cursorid:20830044306 keysExamined:0 docsExamined:0 numYields:1 nreturned:0 reslen:451 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 3 } } } protocol:op_command 3063398ms

When and what details replica set nodes communicate with each other?

At a regular interval, all the nodes communicate with each other to check the status of the primary node, check the status of the sync source, to get the oplogs and so on.

ReplicationCoordinator has ReplicaSetConfig that has a list of all the replica set nodes, and each node has a copy of it. This makes nodes aware of other nodes under same replica set.

This is how nodes communicate in more detail:

Heartbeats: This checks the status of other nodes i.e. alive or die

heartbeatInterval: Every node, at an interval of two seconds, sends the other nodes a heartbeat to make them aware that “yes I am alive!”

heartbeatTimeoutSecs: This is a timeout, and means that if the heartbeat is not returned in 10 seconds then that node is marked as inaccessible or simply die.

Every heartbeat is identified by these replica set details:

  • replica set config version
  • replica set name
  • Sender host address
  • id from the replicasetconfig

The source code could be referred to from here.

When the remote node receives the heartbeat, it processes this data and validates if the details are correct. It then prepares a ReplSetHeartbeatResponse, that includes:

  • Name of the replica set, config version, and optime details
  • Details about primary node as per the receiving node.
  • Sync source details and state of receiving node

This heartbeat data is processed, and if primary details are found then the election gets postponed.

TopologyCoordinator checks for the heartbeat data and confirms if the node is OK or NOT. If the node is OK then no action is taken. Otherwise it needs to be reconfigured or else initiate a priority takeover based on the config.

Response from oplog fetcher

To get the oplogs from the sync source, nodes communicate with each other. This oplog fetcher fetches oplogs through “find” and “getMore”. This will only affect the downstream node that gets metadata from its sync source to update its view from the replica set.

OplogQueryMetadata only comes with OplogFetcher responses

OplogQueryMetadata comes with OplogFetcher response and ReplSetMetadata comes with all the replica set details including configversion and replication commands.

Communicate to update Position commands:

This is to get an update for replication progress. ReplicationCoordinatorExternalState creates SyncSourceFeedback sends replSetUpdatePosition commands.

It includes Oplog details, Replicaset config version, and replica set metadata.

If a new node is added to the existing replica set, how will that node get the data?

If a new node is added to the existing replica set then the “initial sync” process takes place. This initial sync can be done in two ways:

  1. Just add the new node to the replicaset and let initial sync threads restore the data. Then it syncs from the oplogs until it reaches the secondary state.
  2. Copy the data from the recent data directory to the node, and restart this new node. Then it will also sync from the oplogs until it reaches the secondary state.

This is how it works internally

When “initial sync” or “rsync” is called by ReplicationCoordinator  then the node goes to “STARTUP2” state, and this initial sync is done in DataReplicator

  • A sync source is selected to get the data from, then it drops all the databases except the local database, and oplogs are recreated.
  • DatabasesCloner asks syncsource for a list of the databases, and for each database it creates DatabaseCloner.
  • For each DatabaseCloner it creates CollectionCloner to clone the collections
  • This CollectionCloner calls ListIndexes on the syncsource and creates a CollectionBulkLoader for parallel index creation while data cloning
  • The node also checks for the sync source rollback id. If rollback occurred, then it restarts the initial sync. Otherwise, datareplicator is done with its work and then replicationCoordinator assumes the role for ongoing replication.

Example for the “initial sync” :

Here node enters  

"STARTUP2"- "transition to STARTUP2"

Then sync source gets selected and drops all the databases except the local database.  Next, replication oplog is created and CollectionCloner is called.

Local database not dropped: because every node has its own “local” database with its own and other nodes’ information, based on itself, this database is not replicated to other nodes.

2018-09-26T17:57:09.571+0000 I REPL     [ReplicationExecutor] transition to STARTUP2
2018-09-26T17:57:14.589+0000 I REPL     [replication-1] sync source candidate: 192.168.103.100:25003
2018-09-26T17:57:14.590+0000 I STORAGE  [replication-1] dropAllDatabasesExceptLocal 1
2018-09-26T17:57:14.592+0000 I REPL     [replication-1] creating replication oplog of size: 990MB... 2018-09-26T17:57:14.633+0000 I REPL     [replication-0] CollectionCloner::start called, on ns:admin.system.version

Finished fetching all the oplogs, and finishing up initial sync.

2018-09-26T17:57:15.685+0000 I REPL     [replication-0] Finished fetching oplog during initial sync: CallbackCanceled: Callback canceled. Last fetched optime and hash: { ts: Timestamp 1537984626000|1, t: 9 }[-1139925876765058240]
2018-09-26T17:57:15.685+0000 I REPL     [replication-0] Initial sync attempt finishing up.

What are oplogs and where do these reside?

oplogs stands for “operation logs”. We have used this term so many times in this blog post as these are the mandatory logs for the replica set. These operations are in the capped collection called “oplog.rs”  that resides in “local” database.

Below, this is how oplogs are stored in the collection “oplog.rs” that includes details for timestamp, operations, namespace, output.

rplint:PRIMARY> use local
rplint:PRIMARY> show collections
oplog.rs
rplint:PRIMARY> db.oplog.rs.findOne()
{
 "ts" : Timestamp(1537797392, 1),
 "h" : NumberLong("-169301588285533642"),
 "v" : 2,
 "op" : "n",
 "ns" : "",
 "o" : {
 "msg" : "initiating set"
 }
}

It consists of rolling update operations coming to the database. Then these oplogs replicate to the secondary node(s) to maintain the high availability of the data in case of failover.

When the replica MongoDB instance starts, it creates an oplog ocdefault size. For Wired tiger, the default size is 5% of disk space, with a lower bound size of 990MB. So here in the example it creates 990MB of data. If you’d like to learn more about oplog size then please refer here

2018-09-26T17:57:14.592+0000 I REPL     [replication-1] creating replication oplog of size: 990MB...

What if the same oplog is applied multiple times, will that not lead to inconsistent data?

Fortunately, oplogs are Idempotent that means the value will remain unchanged, or will provide the same output, even when applied multiple times.

Let’s check an example:

For the $inc operator that will increment the value by 1 for the filed “item”, if this oplog is applied multiple times then the result might lead to an inconsistent record if this is not Idempotent. However, rather than increasing the item value multiple times, it is actually applied only once.

rplint:PRIMARY> use db1
//inserting one document
rplint:PRIMARY> db.col1.insert({item:1, name:"abc"})
//updating document by incrementing item value with 1
rplint:PRIMARY> db.col1.update({name:"abc"},{$inc:{item:1}})
//updated value is now item:2
rplint:PRIMARY> db.col1.find()
{ "_id" : ObjectId("5babd57cce2ef78096ac8e16"), "item" : 2, "name" : "abc" }

This is how these operations are stored in oplog, here this $inc value is stored in oplog as $set

rplint:PRIMARY> db.oplog.rs.find({ns:"db1.col1"})
//insert operation
{ "ts" : Timestamp(1537987964, 2), "t" : NumberLong(9), "h" : NumberLong("8083740413874479202"), "v" : 2, "op" : "i", "ns" : "db1.col1", "o" : { "_id" : ObjectId("5babd57cce2ef78096ac8e16"), "item" : 1, "name" : "abc" } }
//$inc operation is changed as ""$set" : { "item" : 2"
{ "ts" : Timestamp(1537988022, 1), "t" : NumberLong(9), "h" : NumberLong("-1432987813358665721"), "v" : 2, "op" : "u", "ns" : "db1.col1", "o2" : { "_id" : ObjectId("5babd57cce2ef78096ac8e16") }, "o" : { "$set" : { "item" : 2 } } }

That means that however many  times it is applied, it will generate the same results, so no inconsistent data!

I hope this blog post helps you to understand multiple scenarios for MongoDB replica sets, and how data replicates to the nodes.

Oct
02
2018
--

CRITICAL UPDATE for Percona XtraDB Cluster users: 5.7.23-31.31.2 Is Now Available

Percona XtraDB Cluster 5.7

High AvailabilityTo resolve a critical regression, Percona announces the release of Percona XtraDB Cluster 5.7.23-31.31.2 on October 2, 2018 Binaries are available from the downloads section or from our software repositories.

This release resolves a critical regression in the upstream wsrep library and supersedes 5.7.23-31.31

Percona XtraDB Cluster 5.7.23-31.31.2 is now the current release, based on the following:

All Percona software is open-source and free.

Fixed Bugs

  • #2254: A cluster conflict could cause a crash in Percona XtraDB Cluster 5.7.23 if autocommit=off.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

The post CRITICAL UPDATE for Percona XtraDB Cluster users: 5.7.23-31.31.2 Is Now Available appeared first on Percona Database Performance Blog.

Sep
28
2018
--

High Availability for Enterprise-Grade PostgreSQL environments

High Availability PostgreSQL

PostgreSQL logoHigh availability (HA) and database replication is a major topic of discussion for database technologists. There are a number of informed choices to be made to optimize PostgreSQL replication so that you achieve HA. In this post we introduce an overview of the topic, and cover some options available to achieve high availability in PostgreSQL. We’ll then focus in on just one way to implement HA for postgres, using Patroni.  

In our previous blog posts, we have discussed the features available to build a secured PostgreSQL environment and the tools available to help you set up a reliable backup strategy. The series of articles is designed to provide a flavor of how you might go about building an enterprise-grade PostgreSQL environment using open source tools. If you’d like to see this implemented, then please do check our our webinar presentation of October 10 – we think you might find it both useful and intriguing! 

Replication in PostgreSQL

The first step towards achieving high availability is making sure you don’t rely on a single database server: your data should be replicated to at least one standby replica/slave. Database replication can be done using the two options available with PostgreSQL community software:

  1. Streaming replication
  2. Logical replication & logical decoding

When we setup streaming replication, a standby replica connects to the master (primary) and streams WAL records from it. Streaming replication is considered to be one of the safest and fastest methods for replication in PostgreSQL. A standby server becomes an exact replica of the primary with potentially minimal lag between primary and standby even on very busy transactional servers. PostgreSQL allows you to build synchronous and asynchronous replication while in streaming replication. Synchronous replication ensures that a client is given a success message only when the change is not only committed to the master but also successfully replicated on the standby server as well. As standby servers can accept read requests from clients, we can make more efficient use of our PostgreSQL setup by sparing the master from serving read requests and redirecting these to the replicas instead. You can read more about Streaming Replication in this blog post

Logical replication in PostgreSQL allows users to perform a selective replication of a subset of the tables found in the master. While streaming replication is implemented in PostgreSQL at block level—where every database in the master gets replicated to the replica, which remains read-only—logical replication suits such unique scenarios where you need to perform replication of a selection of tables in a database and (optionally) allow direct writes to your slave. A slave configured with logical replication can also be configured to replicate from multiple masters. One situation where this is helpful is when you need to replicate data from several PostgreSQL databases to a single PostgreSQL server for reporting and data warehousing tasks.

While it’s technically possible to employ standby servers configured with logical replication in an HA environment, this doesn’t fare well as a best practice. For such usage, a standby server should be able to take the place of another server “transparently” – the more they resemble the master the better. Logical replication opens the door for different data to be replicated to different servers, and this may break things

Here is a list of built-in features available in PostgreSQL that are designed to help achieve high availability:

  • Streaming replication
  • Cascaded replication
  • Asynchronous standby
  • Synchronous standby
  • Warm standby
  • Hot standby
  • pg_rewind and pg_basebackup 

Automatic failover and an always-on strategy

There are many more open source solutions that can help us achieve high availability with PostgreSQL, especially during critical moments, when a master (primary server) becomes unavailable. The following is a list of a few such open source solutions built for PostgreSQL: 

  1. Patroni
  2. Stolon
  3. repmgr
  4. PostgreSQL Automatic Failover (PAF)
  5. pglookout
  6. pgPool-II

However, the HA solutions available for PostgreSQL are not just limited to the list above. We would be interested to hear what you have implemented as an HA solution in the comments section below.

In our upcoming webinar we are going to show you a PostgreSQL replication cluster built using Patroni and how it provides a seamless failover that is transparent to the application. 

Patroni

Patroni is a PostgreSQL cluster management template/framework which stores and talks to a distributed consensus key-value store and decides on the state of the cluster. It started as a fork of Governor project. A Patroni PostgreSQL cluster is composed of many individual PostgreSQL instances running on bare metal, containers or virtual machines. In our setup we’ve used etcd for consensus management, which handles leader elections and decides the leader among a cluster of servers that are partitioned by network. This distributed consensus management can also be achieved by using other technologies, such as ZooKeeper and Consul. In the event of a failover, Patroni promotes the slave that has been assigned as a leader by etcd-like consensus key-value stores. Note that when we setup asynchronous replication, you have an option to specify maximum_lag_on_failover to restrict it from promoting a slave that is lagging by more than this value.

Here’s an architecture diagram of Patroni :  

Key Benefits of Patroni:

  1. Continuous monitoring and automatic failover
  2. Manual/scheduled switchover with a single command
  3. Built-in automation for bringing back a failed node to cluster again.
  4. REST APIs for entire cluster configuration and further tooling.
  5. Provides infrastructure for transparent application failover
  6. Distributed consensus for every action and configuration.
  7. Integration with Linux watchdog for avoiding split-brain syndrome.
If you found this post useful…

You are sure to enjoy our webinar of October 10, where we demonstrate live how to build an enterprise-grade PostgreSQL environment with open source tools. If you make it to the live presentation, you will also have the chance to ask questions of the team.

In the next blog post of this series we’ll be covering the scalability of our solution and how to accommodate an increase in traffic while maintaining the quality of the service. We’re moving ever closer to an enterprise-grade environment with open source tools!

The post High Availability for Enterprise-Grade PostgreSQL environments appeared first on Percona Database Performance Blog.

Sep
26
2018
--

Percona XtraDB Cluster 5.7.23-31.31 Is Now Available

Percona XtraDB Cluster 5.7

Percona XtraDB Cluster 5.6Percona is glad to announce the release of Percona XtraDB Cluster 5.7.23-31.31 on September 26, 2018. Binaries are available from the downloads section or from our software repositories.

Percona XtraDB Cluster 5.7.23-31.31 is now the current release, based on the following:

Deprecated

The following variables are deprecated starting from this release:

This variable, which defines whether locking sessions should be converted to transactions, is deprecated in Percona XtraDB Cluster 5.7.23-31.31 because it is rarely used in practice.

Fixed Bugs

  • PXC-1017: Memcached access to InnoDB was not replicated by Galera.
  • PXC-2164: The SST script prevented SELinux from being enabled.
  • PXC-2155wsrep_sst_xtrabackup-v2 did not delete all folders on cleanup.
  • PXC-2160: In some cases, the MySQL version was not detected correctly with the Xtrabackup-v2 method of SST (State Snapshot Transfer).
  • PXC-2199: When the DROP TRIGGER IF EXISTS statement was run for a not existing trigger, the node GTID was incremented instead of the cluster GTID.
  • PXC-2209: The compression dictionary was not replicated in PXC.
  • PXC-2202: In some cases, a disconnected cluster node was not shut down.
  • PXC-2165: SST could fail if either wsrep_node_address or wsrep_sst_receive_address were not specified.
  • PXC-2213: NULL/VOID DDL transactions could commit in a wrong order.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

 

The post Percona XtraDB Cluster 5.7.23-31.31 Is Now Available appeared first on Percona Database Performance Blog.

Sep
18
2018
--

Percona XtraDB Cluster 5.6.41-28.28 Is Now Available

Percona XtraDB Cluster 5.7

Percona XtraDB Cluster 5.6Percona announces the release of Percona XtraDB Cluster 5.6.41-28.28 (PXC) on September 18, 2018. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.41-28.28 is now the current release, based on the following:

Fixed Bugs

  • PXC-1017: Memcached API is now disabled if node is acting as a cluster node, because InnoDB Memcached access is not replicated by Galera.
  • PXC-2164: SST script compatibility with SELinux was improved by forcing it to look for port associated with the said process only.
  • PXC-2155: Temporary folders created during SST execution are now deleted on cleanup.
  • PXC-2199: TOI replication protocol was fixed to prevent unexpected GTID generation caused by the  DROP TRIGGER IF EXISTS statement logged by MySQL as a successful one due to its IF EXISTS clause.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

The post Percona XtraDB Cluster 5.6.41-28.28 Is Now Available appeared first on Percona Database Performance Blog.

Sep
07
2018
--

Setting up Streaming Replication in PostgreSQL

postgres streaming replication

Configuring replication between two databases is considered to be a best strategy towards achieving high availability during disasters and provides fault tolerance against unexpected failures. PostgreSQL satisfies this requirement through streaming replication. We shall talk about another option called logical replication and logical decoding in our future blog post.

Streaming replication works on log shipping. Every transaction in postgres is written to a transaction log called WAL (write-ahead log) to achieve durability. A slave uses these WAL segments to continuously replicate changes from its master.

There exists three mandatory processes –

wal sender

 ,

wal receiver

 and

startup

 process, these play a major role in achieving streaming replication in postgres.

A

wal sender

 process runs on a master, whereas the

wal receiver

 and

startup

 processes runs on its slave. When you start the replication, a

wal receiver

 process sends the LSN (Log Sequence Number) up until when the WAL data has been replayed on a slave, to the master. And then the

wal sender

 process on master sends the WAL data until the latest LSN starting from the LSN sent by the

wal receiver

, to the slave.

Wal receiver

 writes the WAL data sent by

wal sender

 to WAL segments. It is the

startup

 process on slave that replays the data written to WAL segment. And then the streaming replication begins.

Note: Log Sequence Number, or LSN, is a pointer to a location in the WAL.

Steps to setup streaming replication between a master and one slave

Step 1:

Create the user in master using whichever slave should connect for streaming the WALs. This user must have REPLICATION ROLE.

CREATE USER replicator
WITH REPLICATION
ENCRYPTED PASSWORD 'replicator';

Step 2:

The following parameters on the master are considered as mandatory when setting up streaming replication.

  • archive_mode : Must be set to ON to enable archiving of WALs.
  • wal_level : Must be at least set to hot_standby  until version 9.5 or replica  in the later versions.
  • max_wal_senders : Must be set to 3 if you are starting with one slave. For every slave, you may add 2 wal senders.
  • wal_keep_segments : Set the WAL retention in pg_xlog (until PostgreSQL 9.x) and pg_wal (from PostgreSQL 10). Every WAL requires 16MB of space unless you have explicitly modified the WAL segment size. You may start with 100 or more depending on the space and the amount of WAL that could be generated during a backup.
  • archive_command : This parameter takes a shell command or external programs. It can be a simple copy command to copy the WAL segments to another location or a script that has the logic to archive the WALs to S3 or a remote backup server.
  • listen_addresses : Set it to * or the range of IP Addresses that need to be whitelisted to connect to your master PostgreSQL server. Your slave IP should be whitelisted too, else, the slave cannot connect to the master to replicate/replay WALs.
  • hot_standby : Must be set to ON on standby/replica and has no effect on the master. However, when you setup your replication, parameters set on the master are automatically copied. This parameter is important to enable READS on slave. Otherwise, you cannot run your SELECT queries against slave.

The above parameters can be set on the master using these commands followed by a restart:

ALTER SYSTEM SET wal_level TO 'hot_standby';
ALTER SYSTEM SET archive_mode TO 'ON';
ALTER SYSTEM SET max_wal_senders TO '5';
ALTER SYSTEM SET wal_keep_segments TO '10';
ALTER SYSTEM SET listen_addresses TO '*';
ALTER SYSTEM SET hot_standby TO 'ON';
ALTER SYSTEM SET archive_command TO 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f';

$ pg_ctl -D $PGDATA restart -mf

Step 3:

Add an entry to

pg_hba.conf

 of the master to allow replication connections from the slave. The default location of

pg_hba.conf

 is the data directory. However, you may modify the location of this file in the file 

postgresql.conf

. In Ubuntu/Debian, pg_hba.conf may be located in the same directory as the postgresql.conf file by default. You can get the location of postgresql.conf in Ubuntu/Debian by calling an OS command =>

pg_lsclusters

.

host replication replicator 192.168.0.28/32 md5

The IP address mentioned in this line must match the IP address of your slave server. Please change the IP accordingly.

In order to get the changes into effect, issue a SIGHUP:

$ pg_ctl -D $PGDATA reload
Or
$ psql -U postgres -p 5432 -c "select pg_reload_conf()"

Step 4:

pg_basebackup

 helps us to stream the data through the 

wal sender

 process from the master to a slave to set up replication. You can also take a tar format backup from master and copy that to the slave server. You can read more about tar format pg_basebackup here

The following step can be used to stream data directory from master to slave. This step can be performed on a slave.

$ pg_basebackup -h 192.168.0.28 -U replicator -p 5432 -D $PGDATA -P -Xs -R

Please replace the IP address with your master’s IP address.

In the above command, you see an optional argument -R. When you pass -R, it automatically creates a

recovery.conf

  file that contains the role of the DB instance and the details of its master. It is mandatory to create the recovery.conf file on the slave in order to set up a streaming replication. If you are not using the backup type mentioned above, and choose to take a tar format backup on master that can be copied to slave, you must create this recovery.conf file manually. Here are the contents of the recovery.conf file:

$ cat $PGDATA/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.0.28 port=5432 user=replicator password=replicator'
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'

In the above file, the role of the server is defined by standby_mode.

standby_mode

  must be set to ON for slaves in postgres.
And to stream WAL data, details of the master server are configured using the parameter

primary_conninfo

 .

The two parameters

standby_mode

  and

primary_conninfo

 are automatically created when you use the optional argument -R while taking a pg_basebackup. This recovery.conf file must exist in the data directory($PGDATA) of Slave.

Step 5:

Start your slave once the backup and restore are completed.

If you have configured a backup (remotely) using the streaming method mentioned in Step 4, it just copies all the files and directories to the data directory of the slave. Which means it is both a back up of the master data directory and also provides for restore in a single step.

If you have taken a tar back up from the master and shipped it to the slave, you must unzip/untar the back up to the slave data directory, followed by creating a recovery.conf as mentioned in the previous step. Once done, you may proceed to start your PostgreSQL instance on the slave using the following command.

$ pg_ctl -D $PGDATA start

Step 6:

In a production environment, it is always advisable to have the parameter

restore_command

set appropriately. This parameter takes a shell command (or a script) that can be used to fetch the WAL needed by a slave, if the WAL is not available on the master.

For example:

If a network issue has caused a slave to lag behind the master for a substantial time, it is less likely to have those WALs required by the slave available on the master’s

pg_xlog

 or

pg_wal

 location. Hence, it is sensible to archive the WALs to a safe location, and to have the commands that are needed to restore the WAL set to restore_command parameter in the recovery.conf file of your slave. To achieve that, you have to add a line similar to the next example to your recovery.conf file in slave. You may substitute the cp command with a shell command/script or a copy command that helps the slave get the appropriate WALs from the archive location.

restore_command = 'cp /mnt/server/archivedir/%f "%p"'

Setting the above parameter requires a restart and cannot be done online.

Final step: validate that replication is setup

As discussed earlier, a

wal sender

  and a

wal receiver

  process are started on the master and the slave after setting up replication. Check for these processes on both master and slave using the following commands.

On Master
==========
$ ps -eaf | grep sender
On Slave
==========
$ ps -eaf | grep receiver
$ ps -eaf | grep startup

You must see those all three processes running on master and slave as you see in the following example log.

On Master
=========
$ ps -eaf | grep sender
postgres  1287  1268  0 10:40 ?        00:00:00 postgres: wal sender process replicator 192.168.0.28(36924) streaming 0/50000D68
On Slave
=========
$ ps -eaf | egrep "receiver|startup"
postgres  1251  1249  0 10:40 ?        00:00:00 postgres: startup process   recovering 000000010000000000000050
postgres  1255  1249  0 10:40 ?        00:00:04 postgres: wal receiver process   streaming 0/50000D68

You can see more details by querying the master’s pg_stat_replication view.

$ psql
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 1287
usesysid         | 24615
usename          | replicator
application_name | walreceiver
client_addr      | 192.168.0.28
client_hostname  |
client_port      | 36924
backend_start    | 2018-09-07 10:40:48.074496-04
backend_xmin     |
state            | streaming
sent_lsn         | 0/50000D68
write_lsn        | 0/50000D68
flush_lsn        | 0/50000D68
replay_lsn       | 0/50000D68
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

Reference : https://www.postgresql.org/docs/10/static/warm-standby.html#STANDBY-SERVER-SETUP

If you found this post interesting…

Did you know that Percona now provides PostgreSQL support services? If you’d like to read more about this, here’s some more information. We’re here to help.

The post Setting up Streaming Replication in PostgreSQL appeared first on Percona Database Performance Blog.

Aug
17
2018
--

Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon

postgres mysql replication using pg_chameleon

postgres mysql replication using pg_chameleonReplication is one of the well-known features that allows us to build an identical copy of a database. It is supported in almost every RDBMS. The advantages of replication may be huge, especially HA (High Availability) and load balancing. But what if we need to build replication between 2 heterogeneous databases like MySQL and PostgreSQL? Can we continuously replicate changes from a MySQL database to a PostgreSQL database? The answer to this question is pg_chameleon.

For replicating continuous changes, pg_chameleon uses the mysql-replication library to pull the row images from MySQL, which are transformed into a jsonb object. A pl/pgsql function in postgres decodes the jsonb and replays the changes into the postgres database. In order to setup this type of replication, your mysql binlog_format must be “ROW”.

A few points you should know before setting up this tool :

  1. Tables that need to be replicated must have a primary key.
  2. Works for PostgreSQL versions > 9.5 and MySQL > 5.5
  3. binlog_format must be ROW in order to setup this replication.
  4. Python version must be > 3.3

When you initialize the replication, pg_chameleon pulls the data from MySQL using the CSV format in slices, to prevent memory overload. This data is flushed to postgres using the COPY command. If COPY fails, it tries INSERT, which may be slow. If INSERT fails, then the row is discarded.

To replicate changes from mysql, pg_chameleon mimics the behavior of a mysql slave. It creates the schema in postgres, performs the initial data load, connects to MySQL replication protocol, stores the row images into a table in postgres. Now, the respective functions in postgres decode those rows and apply the changes. This is similar to storing relay logs in postgres tables and applying them to a postgres schema. You do not have to create a postgres schema using any DDLs. This tool automatically does that for the tables configured for replication. If you need to specifically convert any types, you can specify this in the configuration file.

The following is just an exercise that you can experiment with and implement if it completely satisfies your requirement. We performed these tests on CentOS Linux release 7.4.

Prepare the environment

Set up Percona Server for MySQL

InstallMySQL 5.7 and add appropriate parameters for replication.

In this exercise, I have installed Percona Server for MySQL 5.7 using YUM repo.

yum install http://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm
yum install Percona-Server-server-57
echo "mysql ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
usermod -s /bin/bash mysql
sudo su - mysql

pg_chameleon requires the following the parameters to be set in your my.cnf file (parameter file of your MySQL server). You may add the following parameters to /etc/my.cnf

binlog_format= ROW
binlog_row_image=FULL
log-bin = mysql-bin
server-id = 1

Now start your MySQL server after adding the above parameters to your my.cnf file.

$ service mysql start

Fetch the temporary root password from mysqld.log, and reset the root password using mysqladmin

$ grep "temporary" /var/log/mysqld.log
$ mysqladmin -u root -p password 'Secret123!'

Now, connect to your MySQL instance and create sample schema/tables. I have also created an emp table for validation.

$ wget http://downloads.mysql.com/docs/sakila-db.tar.gz
$ tar -xzf sakila-db.tar.gz
$ mysql -uroot -pSecret123! < sakila-db/sakila-schema.sql
$ mysql -uroot -pSecret123! < sakila-db/sakila-data.sql
$ mysql -uroot -pSecret123! sakila -e "create table emp (id int PRIMARY KEY, first_name varchar(20), last_name varchar(20))"

Create a user for configuring replication using pg_chameleon and give appropriate privileges to the user using the following steps.

$ mysql -uroot -p
create user 'usr_replica'@'%' identified by 'Secret123!';
GRANT ALL ON sakila.* TO 'usr_replica'@'%';
GRANT RELOAD, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'usr_replica'@'%';
FLUSH PRIVILEGES;

While creating the user in your mysql server (‘usr_replica’@’%’), you may wish to replace % with the appropriate IP or hostname of the server on which pg_chameleon is running.

Set up PostgreSQL

Install PostgreSQL and start the database instance.

You may use the following steps to install PostgreSQL 10.x

yum install https://yum.postgresql.org/10/redhat/rhel-7.4-x86_64/pgdg-centos10-10-2.noarch.rpm
yum install postgresql10*
su - postgres
$/usr/pgsql-10/bin/initdb
$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start

As seen in the following logs, create a user in PostgreSQL using which pg_chameleon can write changed data to PostgreSQL. Also create the target database.

postgres=# CREATE USER usr_replica WITH ENCRYPTED PASSWORD 'secret';
CREATE ROLE
postgres=# CREATE DATABASE db_replica WITH OWNER usr_replica;
CREATE DATABASE

Steps to install and setup replication using pg_chameleon

Step 1: In this exercise, I installed Python 3.6 and pg_chameleon 2.0.8 using the following steps. You may skip the python install steps if you already have the desired python release. We can create a virtual environment if the OS does not include Python 3.x by default.

yum install gcc openssl-devel bzip2-devel wget
cd /usr/src
wget https://www.python.org/ftp/python/3.6.6/Python-3.6.6.tgz
tar xzf Python-3.6.6.tgz
cd Python-3.6.6
./configure --enable-optimizations
make altinstall
python3.6 -m venv venv
source venv/bin/activate
pip install pip --upgrade
pip install pg_chameleon

Step 2: This tool requires a configuration file to store the source/target server details, and a directory to store the logs. Use the following command to let pg_chameleon create the configuration file template and the respective directories for you.

$ chameleon set_configuration_files

The above command would produce the following output, which shows that it created some directories and a file in the location where you ran the command.

creating directory /var/lib/pgsql/.pg_chameleon
creating directory /var/lib/pgsql/.pg_chameleon/configuration/
creating directory /var/lib/pgsql/.pg_chameleon/logs/
creating directory /var/lib/pgsql/.pg_chameleon/pid/
copying configuration example in /var/lib/pgsql/.pg_chameleon/configuration//config-example.yml

Copy the sample configuration file to another file, lets say, default.yml

$ cd .pg_chameleon/configuration/
$ cp config-example.yml default.yml

Here is how my default.yml file looks after adding all the required parameters. In this file, we can optionally specify the data type conversions, tables to skipped from replication and the DML events those need to skipped for selected list of tables.

---
#global settings
pid_dir: '~/.pg_chameleon/pid/'
log_dir: '~/.pg_chameleon/logs/'
log_dest: file
log_level: info
log_days_keep: 10
rollbar_key: ''
rollbar_env: ''
# type_override allows the user to override the default type conversion into a different one.
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"
#postgres  destination connection
pg_conn:
  host: "localhost"
  port: "5432"
  user: "usr_replica"
  password: "secret"
  database: "db_replica"
  charset: "utf8"
sources:
  mysql:
    db_conn:
      host: "localhost"
      port: "3306"
      user: "usr_replica"
      password: "Secret123!"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      sakila: sch_sakila
    limit_tables:
#      - delphis_mediterranea.foo
    skip_tables:
#      - delphis_mediterranea.bar
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "disabled"
    gtid_enable: No
    type: mysql
    skip_events:
      insert:
#        - delphis_mediterranea.foo #skips inserts on the table delphis_mediterranea.foo
      delete:
#        - delphis_mediterranea #skips deletes on schema delphis_mediterranea
      update:

Step 3: Initialize the replica using this command:

$ chameleon create_replica_schema --debug

The above command creates a schema and nine tables in the PostgreSQL database that you specified in the .pg_chameleon/configuration/default.yml file. These tables are needed to manage replication from source to destination. The same can be observed in the following log.

db_replica=# \dn
List of schemas
Name | Owner
---------------+-------------
public | postgres
sch_chameleon | target_user
(2 rows)
db_replica=# \dt sch_chameleon.t_*
List of relations
Schema | Name | Type | Owner
---------------+------------------+-------+-------------
sch_chameleon | t_batch_events | table | target_user
sch_chameleon | t_discarded_rows | table | target_user
sch_chameleon | t_error_log | table | target_user
sch_chameleon | t_last_received | table | target_user
sch_chameleon | t_last_replayed | table | target_user
sch_chameleon | t_log_replica | table | target_user
sch_chameleon | t_replica_batch | table | target_user
sch_chameleon | t_replica_tables | table | target_user
sch_chameleon | t_sources | table | target_user
(9 rows)

Step 4: Add the source details to pg_chameleon using the following command. Provide the name of the source as specified in the configuration file. In this example, the source name is mysql and the target is postgres database defined under pg_conn.

$ chameleon add_source --config default --source mysql --debug

Once you run the above command, you should see that the source details are added to the t_sources table.

db_replica=# select * from sch_chameleon.t_sources;
-[ RECORD 1 ]-------+----------------------------------------------
i_id_source | 1
t_source | mysql
jsb_schema_mappings | {"sakila": "sch_sakila"}
enm_status | ready
t_binlog_name |
i_binlog_position |
b_consistent | t
b_paused | f
b_maintenance | f
ts_last_maintenance |
enm_source_type | mysql
v_log_table | {t_log_replica_mysql_1,t_log_replica_mysql_2}
$ chameleon show_status --config default
Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------
1 mysql mysql ready Yes N/A N/A

Step 5: Initialize the replica/slave using the following command. Specify the source from which you are replicating the changes to the PostgreSQL database.

$ chameleon init_replica --config default --source mysql --debug

Initialization involves the following tasks on the MySQL server (source).

1. Flush the tables with read lock
2. Get the master’s coordinates
3. Copy the data
4. Release the locks

The above command creates the target schema in your postgres database automatically.
In the default.yml file, we mentioned the following schema_mappings.

schema_mappings:
sakila: sch_sakila

So, now it created the new schema scott in the target database db_replica.

db_replica=# \dn
List of schemas
Name | Owner
---------------+-------------
public | postgres
sch_chameleon | usr_replica
sch_sakila | usr_replica
(3 rows)

Step 6: Now, start replication using the following command.

$ chameleon start_replica --config default --source mysql

Step 7: Check replication status and any errors using the following commands.

$ chameleon show_status --config default
$ chameleon show_errors

This is how the status looks:

$ chameleon show_status --source mysql
Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------
1 mysql mysql running No N/A N/A
== Schema mappings ==
Origin schema Destination schema
--------------- --------------------
sakila sch_sakila
== Replica status ==
--------------------- ---
Tables not replicated 0
Tables replicated 17
All tables 17
Last maintenance N/A
Next maintenance N/A
Replayed rows
Replayed DDL
Skipped rows

Now, you should see that the changes are continuously getting replicated from MySQL to PostgreSQL.

Step 8:  To validate, you may insert a record into the table in MySQL that we created for the purpose of validation and check that it is replicated to postgres.

$ mysql -u root -pSecret123! -e "INSERT INTO sakila.emp VALUES (1,'avinash','vallarapu')"
mysql: [Warning] Using a password on the command line interface can be insecure.
$ psql -d db_replica -c "select * from sch_sakila.emp"
 id | first_name | last_name
----+------------+-----------
  1 | avinash    | vallarapu
(1 row)

In the above log, we see that the record that was inserted to the MySQL table was replicated to the PostgreSQL table.

You may also add multiple sources for replication to PostgreSQL (target).

Reference : http://www.pgchameleon.org/documents/

Please refer to the above documentation to find out about the many more options that are available with pg_chameleon

The post Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon appeared first on Percona Database Performance Blog.

Aug
10
2018
--

This Week in Data with Colin Charles 48: Coinbase Powered by MongoDB and Prometheus Graduates in the CNCF

Colin Charles

Colin CharlesJoin Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

The call for submitting a talk to Percona Live Europe 2018 is closing today, and while there may be a short extension, have you already got your talk submitted? I suggest doing so ASAP!

I’m sure many of you have heard of cryptocurrencies, the blockchain, and so on. But how many of you realiize that Coinbase, an application that handles cryptocurrency trades, matching book orders, and more, is powered by MongoDB? With the hype and growth in interest in late 2017, Coinbase has had to scale. They gave an excellent talk at MongoDB World, titled MongoDB & Crypto Mania (the video is worth a watch), and they’ve also written a blog post, How we’re scaling our platform for spikes in customer demand. They even went driver hacking (the Ruby driver for MongoDB)!

It is great to see there be a weekly review of happenings in the Vitess world.

PingCap and TiDB have been to many Percona Live events to present, and recently hired Morgan Tocker. Morgan has migrated his blog from MySQL to TiDB. Read more about his experience in, This blog, now Powered by WordPress + TiDB. Reminds me of the early days of Galera Cluster and showing how Drupal could be powered by it!

Releases

Link List

  • Sys Schema MySQL 5.7+ – blogger from Wipro, focusing on an introduction to the sys schema on MySQL (note: still not available in the MariaDB Server fork).
  • Prometheus Graduates in the CNCF, so is considered a mature project. Criteria for graduation is such that “projects must demonstrate thriving adoption, a documented, structured governance process, and a strong commitment to community sustainability and inclusivity.” Percona benefits from Prometheus in Percona Monitoring & Management (PMM), so we should celebrate this milestone!
  • Replicating from MySQL 8.0 to MySQL 5.7
  • A while ago in this column, we linked to Shlomi Noach’s excellent post on MySQL High Availability at GitHub. We were also introduced to GitHub Load Balancer (GLB), which they ran on top of HAProxy. However back then, GLB wasn’t open; now you can get GLB Director: GLB: GitHub’s open source load balancer. The project describes GLB Director as: “… a Layer 4 load balancer which scales a single IP address across a large number of physical machines while attempting to minimise connection disruption during any change in servers. GLB Director does not replace services like haproxy and nginx, but rather is a layer in front of these services (or any TCP service) that allows them to scale across multiple physical machines without requiring each machine to have unique IP addresses.”
  • F1 Query: Declarative Querying at Scale – a well-written paper.

Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 

The post This Week in Data with Colin Charles 48: Coinbase Powered by MongoDB and Prometheus Graduates in the CNCF appeared first on Percona Database Performance Blog.

Aug
02
2018
--

Amazon RDS Multi-AZ Deployments and Read Replicas

RDS Multi-AZ

Amazon RDS is a managed relational database service that makes it easier to set up, operate, and scale a relational database in the cloud. One of the common questions that we get is “What is Multi-AZ and how it’s different from Read Replica, do I need both?”.  I have tried to answer this question in this blog post and it depends on your application needs. Are you looking for High Availability (HA), read scalability … or both?

Before we go to into detail, let me explain two common terms used with Amazon AWS.

Region – an AWS region is a separate geographical area like US East (N. Virginia), Asia Pacific (Mumbai), EU (London) etc. Each AWS Region has multiple, isolated locations known as Availability Zones.

Availability Zone (AZ) – AZ is simply one or more data centers, each with redundant power, networking and connectivity, housed in separate facilities. Data centers are geographically isolated within the same region.

What is Multi-AZ?

Amazon RDS provides high availability and failover support for DB instances using Multi-AZ deployments.

In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica of the master DB in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to the standby replica to provide data redundancy, failover support and to minimize latency during system backups. In the event of planned database maintenance, DB instance failure, or an AZ failure of your primary DB instance, Amazon RDS automatically performs a failover to the standby so that database operations can resume quickly without administrative intervention.

You can check in the AWS management console if a database instance is configured as Multi-AZ. Select the RDS service, click on the DB instance and review the details section.

AWS management console showing that instance is Multi-AZ

This screenshot from AWS management console (above) shows that the database is hosted as Multi-AZ deployment and the standby replica is deployed in us-east-1a AZ.

Benefits of Multi-AZ deployment:

  • Replication to a standby replica is synchronous which is highly durable.
  • When a problem is detected on the primary instance, it will automatically failover to the standby in the following conditions:
    • The primary DB instance fails
    • An Availability Zone outage
    • The DB instance server type is changed
    • The operating system of the DB instance is undergoing software patching.
    • A manual failover of the DB instance was initiated using Reboot with failover.
  • The endpoint of the DB instance remains the same after a failover, the application can resume database operations without manual intervention.
  • If a failure occurs, your availability impact is limited to the time that the automatic failover takes to complete. This helps to achieve increased availability.
  • It reduces the impact of maintenance. RDS performs maintenance on the standby first, promotes the standby to primary master, and then performs maintenance on the old master which is now a standby replica.
  • To prevent any negative impact of the backup process on performance, Amazon RDS creates a backup from the standby replica.

Amazon RDS does not failover automatically in response to database operations such as long-running queries, deadlocks or database corruption errors. Also, the Multi-AZ deployments are limited to a single region only, cross-region Multi-AZ is not currently supported.

Can I use an RDS standby replica for read scaling?

The Multi-AZ deployments are not a read scaling solution, you cannot use a standby replica to serve read traffic. Multi-AZ maintains a standby replica for HA/failover. It is available for use only when RDS promotes the standby instance as the primary. To service read-only traffic, you should use a Read Replica instead.

What is Read Replica?

Read replicas allow you to have a read-only copy of your database.

When you create a Read Replica, you first specify an existing DB instance as the source. Then Amazon RDS takes a snapshot of the source instance and creates a read-only instance from the snapshot. You can use MySQL native asynchronous replication to keep Read Replica up-to-date with the changes. The source DB must have automatic backups enabled for setting up read replica.

Benefits of Read Replica

  • Read Replica helps in decreasing load on the primary DB by serving read-only traffic.
  • A Read Replica can be manually promoted as a standalone database instance.
  • You can create Read Replicas within AZ, Cross-AZ or Cross-Region.
  • You can have up to five Read Replicas per master, each with own DNS endpoint. Unlike a Multi-AZ standby replica, you can connect to each Read Replica and use them for read scaling.
  • You can have Read Replicas of Read Replicas.
  • Read Replicas can be Multi-AZ enabled.
  • You can use Read Replicas to take logical backups (mysqldump/mydumper) if you want to store the backups externally to RDS.
  • Read Replica helps to maintain a copy of databases in a different region for disaster recovery.

At AWS re:Invent 2017, AWS announced the preview for Amazon Aurora Multi-Master, this will allow users to create multiple Aurora writer nodes and helps in scaling reads/writes across multiple AZs. You can sign up for preview here.

Conclusion

While both (Multi-AZ and Read replica) maintain a copy of database but they are different in nature. Use Multi-AZ deployments for High Availability and Read Replica for read scalability. You can further set up a cross-region read replica for disaster recovery.

The post Amazon RDS Multi-AZ Deployments and Read Replicas appeared first on Percona Database Performance Blog.

Jul
20
2018
--

InnoDB Cluster in a Nutshell Part 3: MySQL Shell

MySQL InnoDB Cluster MySQL Shell

MySQL InnoDB Cluster MySQL ShellWelcome to the third part of this series. I’m glad you’re still reading, as hopefully this means you find this subject interesting at least. Previously we presented the first two components of MySQL InnoDB Cluster: Group Replication and MySQL Router and now we will discuss the last component, MySQL Shell.

MySQL Shell

This is the last component in the cluster and I love it. Oracle have created this tool to centralize cluster management, providing a friendly, command-line based user interface.

The tool can be defined as an advanced MySQL shell, which is much more powerful than the well known MySQL client. With the capacity to work with both relational and document (JSON) data, the tool provides an extended capability to interact with the database from a single place.

MySQL Shell is also able to understand different languages:

  • JavaScript (default) which includes several built-in functions to administer the cluster—create, destroy, restart, etc.—in a very easy way.
  • Python it provides an easy way to write Python code to interact with the database. This is particularly useful for developers who don’t need to have SQL skills or run applications to test code.
  • SQL to work in classic mode to query database as we used to do with the old MySQL client.

A very interesting feature provided with MySQL Shell is the ability to establish different connections to different servers/clusters from within the same shell. There is no need to exit to connect to a different server, just issuing the command \connect will make this happen. As DBA, I find this pretty useful when handling multiple clusters/servers.

Some of the features present in this tool:

  • Capacity to use both Classic and X protocols.
  • Online switch mode to change languages (JavaScript, Python and SQL)
  • Auto-completion of commands using tab, a super expected feature in MySQL client.
  • Colored formatting output that also supports different formats like Table, Tab-separated and Json formats.
  • Batch mode that processes batches of commands allowing also an interactive mode to print output according each line is processed.

Some sample commands

Samples of new tool and execution modes:

#switch modes
\sql
\js
\py
#connect to instance
\connect user@host:[port]
#create a cluster (better to handle through variables)
var cluster=dba.createCluster('percona')
#add instances to cluster
cluster.addInstance(‘root@192.168.70.2:3306’)
#check cluster status
cluster.status()
#using another variable
var cluster2=dba.getCluster(‘percona’)
cluster.status()
#get cluster structure
cluster.describe()
#rejoin instance to cluster - needs to be executed locally to the instance
cluster.rejoinInstance()
#rejoin instance to cluster - needs to be executed locally to the instance
cluster.rejoinInstance()
#recover from lost quorum
cluster.forceQuorumUsingPartitionOf(‘root@localhost:3306’)
#recover from lost quorum
cluster.rebootClusterFromCompleteOutage()
#destroy cluster
cluster.dissolve({force:true});

Personally, I think this tool is a very good replacement for the classic MySQL client. Sadly, mysql-server installations do not include MySQL shell by default, but it is worth getting used to. I recommend you try it.

Conclusion

We finally reached the end of this series. I hope you have enjoyed this short introduction to what seems to be Oracle’s bid to have a built-in High Availability solution based on InnoDB. It may become a good competitor to Galera-based solutions. Still, there is a long way to go, as the tool was only just released as GA (April 2018). There are a bunch of things that need to be addressed before it becomes consistent enough to be production-ready. In my personal opinion, it is not—yet. Nevertheless, I think it is a great tool that will eventually be a serious player in the HA field as it’s an excellent, flexible and easy to deploy solution.

The post InnoDB Cluster in a Nutshell Part 3: MySQL Shell appeared first on Percona Database Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com