Nov
05
2018
--

How to Quickly Add a Node to an InnoDB Cluster or Group Replication

Quickly Add a Node in InnoDB Cluster or Group Replication

Quickly Add a Node to an InnoDB Cluster or Group Replication

Quickly Add a Node to an InnoDB Cluster or Group Replication (Shutterstock)

In this blog, we’ll look at how to quickly add a node to an InnoDB Cluster or Group Replication using Percona XtraBackup.

Adding nodes to a Group Replication cluster can be easy (documented here), but it only works if the existing nodes have retained all the binary logs since the creation of the cluster. Obviously, this is possible if you create a new cluster from scratch. The nodes rotate old logs after some time, however. Technically, if the

gtid_purged

 set is non-empty, it means you will need another method to add a new node to a cluster. You also need a different method if data becomes inconsistent across cluster nodes for any reason. For example, you might hit something similar to this bug, or fall prey to human error.

Hot Backup to the Rescue

The quick and simple method I’ll present here requires the Percona XtraBackup tool to be installed, as well as some additional small tools for convenience. I tested my example on Centos 7, but it works similarly on other Linux distributions. First of all, you will need the Percona repository installed:

# yum install http://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm -y -q

Then, install Percona XtraBackup and the additional tools. You might need to enable the EPEL repo for the additional tools and the experimental Percona repo for XtraBackup 8.0 that works with MySQL 8.0. (Note: XtraBackup 8.0 is still not GA when writing this article, and we do NOT recommend or advise that you install XtraBackup 8.0 into a production environment until it is GA). For MySQL 5.7, Xtrabackup 2.4 from the regular repo is what you are looking for:

# grep -A3 percona-experimental-\$basearch /etc/yum.repos.d/percona-release.repo
[percona-experimental-$basearch]
name = Percona-Experimental YUM repository - $basearch
baseurl = http://repo.percona.com/experimental/$releasever/RPMS/$basearch
enabled = 1

# yum install pv pigz nmap-ncat percona-xtrabackup-80 -q
==============================================================================================================================================
 Package                             Arch                 Version                             Repository                                 Size
==============================================================================================================================================
Installing:
 nmap-ncat                           x86_64               2:6.40-13.el7                       base                                      205 k
 percona-xtrabackup-80               x86_64               8.0.1-2.alpha2.el7                  percona-experimental-x86_64                13 M
 pigz                                x86_64               2.3.4-1.el7                         epel                                       81 k
 pv                                  x86_64               1.4.6-1.el7                         epel                                       47 k
Installing for dependencies:
 perl-DBD-MySQL                      x86_64               4.023-6.el7                         base                                      140 k
Transaction Summary
==============================================================================================================================================
Install  4 Packages (+1 Dependent package)
Is this ok [y/d/N]: y
#

You need to do it on both the source and destination nodes. Now, my existing cluster node (I will call it a donor) – gr01 looks like this:

gr01 > select * from performance_schema.replication_group_members\G
*************************** 1. row ***************************
  CHANNEL_NAME: group_replication_applier
     MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
   MEMBER_HOST: gr01
   MEMBER_PORT: 3306
  MEMBER_STATE: ONLINE
   MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.13
1 row in set (0.00 sec)
gr01 > show global variables like 'gtid%';
+----------------------------------+-----------------------------------------------+
| Variable_name                    | Value                                         |
+----------------------------------+-----------------------------------------------+
| gtid_executed                    | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662 |
| gtid_executed_compression_period | 1000                                          |
| gtid_mode                        | ON                                            |
| gtid_owned                       |                                               |
| gtid_purged                      | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-295538 |
+----------------------------------+-----------------------------------------------+
5 rows in set (0.01 sec)

The new node candidate (I will call it a joiner) – gr02, has no data but the same MySQL version installed. It also has the required settings in place, like the existing node address in group_replication_group_seeds, etc. The next step is to stop the MySQL service on the joiner (if already running), and wipe out it’s datadir:

[root@gr02 ~]# rm -fr /var/lib/mysql/*

and start the “listener” process, that waits to receive the data snapshot (remember to open the TCP port if you have a firewall):

[root@gr02 ~]# nc -l -p 4444 |pv| unpigz -c | xbstream -x -C /var/lib/mysql

Then, start the backup job on the donor:

[root@gr01 ~]# xtrabackup --user=root --password=*** --backup --parallel=4 --stream=xbstream --target-dir=./ 2> backup.log |pv|pigz -c --fast| nc -w 2 192.168.56.98 4444
240MiB 0:00:02 [81.4MiB/s] [ <=>

On the joiner side, we will see:

[root@gr02 ~]# nc -l -p 4444 |pv| unpigz -c | xbstream -x -C /var/lib/mysql
21.2MiB 0:03:30 [ 103kiB/s] [ <=> ]
[root@gr02 ~]# du -hs /var/lib/mysql
241M /var/lib/mysql

BTW, if you noticed the difference in transfer rate between the two, please note that on the donor side I put

|pv|

 before the compressor while in the joiner before decompressor. This way, I can monitor the compression ratio at the same time!

The next step will be to prepare the backup on joiner:

[root@gr02 ~]# xtrabackup --use-memory=1G --prepare --target-dir=/var/lib/mysql 2>prepare.log
[root@gr02 ~]# tail -1 prepare.log
181019 19:18:56 completed OK!

and fix the files ownership:

[root@gr02 ~]# chown -R mysql:mysql /var/lib/mysql

Now we should verify the GTID position information and restart the joiner (I have the

group_replication_start_on_boot=off

 in my.cnf):

[root@gr02 ~]# cat /var/lib/mysql/xtrabackup_binlog_info
binlog.000023 893 aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662
[root@gr02 ~]# systemctl restart mysqld

Now, let’s check if the position reported by the node is consistent with the above:

gr02 > select @@GLOBAL.gtid_executed;
+-----------------------------------------------+
| @@GLOBAL.gtid_executed                        |
+-----------------------------------------------+
| aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302660 |
+-----------------------------------------------+
1 row in set (0.00 sec)

No, it is not. We have to correct it:

gr02 > reset master; set global gtid_purged="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662";
Query OK, 0 rows affected (0.05 sec)
Query OK, 0 rows affected (0.00 sec)

Finally, start the replication:

gr02 > START GROUP_REPLICATION;
Query OK, 0 rows affected (3.91 sec)

Let’s check the cluster status again:

gr01 > select * from performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
MEMBER_HOST: gr01
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.13
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: a60a4124-d3d4-11e8-8ef2-525400cae48b
MEMBER_HOST: gr02
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.13
2 rows in set (0.00 sec)
gr01 > select * from performance_schema.replication_group_member_stats\G
*************************** 1. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 15399708149765074:4
                                 MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
               COUNT_TRANSACTIONS_IN_QUEUE: 0
                COUNT_TRANSACTIONS_CHECKED: 3
                  COUNT_CONFLICTS_DETECTED: 0
        COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302666
            LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:302665
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
         COUNT_TRANSACTIONS_REMOTE_APPLIED: 2
         COUNT_TRANSACTIONS_LOCAL_PROPOSED: 3
         COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 2. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 15399708149765074:4
                                 MEMBER_ID: a60a4124-d3d4-11e8-8ef2-525400cae48b
               COUNT_TRANSACTIONS_IN_QUEUE: 0
                COUNT_TRANSACTIONS_CHECKED: 0
                  COUNT_CONFLICTS_DETECTED: 0
        COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302666
            LAST_CONFLICT_FREE_TRANSACTION:
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
         COUNT_TRANSACTIONS_REMOTE_APPLIED: 0
         COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
         COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
2 rows in set (0.00 sec)

OK, our cluster is consistent! The new node joined successfully as secondary. We can proceed to add more nodes!

Oct
17
2018
--

Can MySQL Parallel Replication Help My Slave?

InnoDB Row Operations per Hour graph from Percona Monitoring and Management performance monitoring tool

Parallel replication has been around for a few years now but is still not that commonly used. I had a customer where the master had a very large write workload. The slave could not keep up so I recommended to use parallel slave threads. But how can I measure if it really helps and is working?

At my customer the

slave_parallel_workers

  was 0. But how big should I increase it, maybe to 1? Maybe to 10? There is a blog post about how can we see how many threads are actually used, which is a great help.

We changed the following variables on the slave:

slave_parallel_type = LOGICAL_CLOCK;
slave_parallel_workers = 40;
slave_preserve_commit_order = ON;

40 threads sounds a lot, right? Of course, this is workload specific: if the transactions are independent it might be useful.

Let’s have a look, how many threads are working:

mysql> SELECT performance_schema.events_transactions_summary_by_thread_by_event_name.THREAD_ID AS THREAD_ID
, performance_schema.events_transactions_summary_by_thread_by_event_name.COUNT_STAR AS COUNT_STAR
FROM performance_schema.events_transactions_summary_by_thread_by_event_name
WHERE performance_schema.events_transactions_summary_by_thread_by_event_name.THREAD_ID IN
     (SELECT performance_schema.replication_applier_status_by_worker.THREAD_ID
      FROM performance_schema.replication_applier_status_by_worker);
+-----------+------------+
| THREAD_ID | COUNT_STAR |
+-----------+------------+
| 25882 | 442481 |
| 25883 | 433200 |
| 25884 | 426460 |
| 25885 | 419772 |
| 25886 | 413751 |
| 25887 | 407511 |
| 25888 | 401592 |
| 25889 | 395169 |
| 25890 | 388861 |
| 25891 | 380657 |
| 25892 | 371923 |
| 25893 | 362482 |
| 25894 | 351601 |
| 25895 | 339282 |
| 25896 | 325148 |
| 25897 | 310051 |
| 25898 | 292187 |
| 25899 | 272990 |
| 25900 | 252843 |
| 25901 | 232424 |
+-----------+------------+

You can see all the threads are working. Which is great.

But did this really speed up the replication? Could the slave write more in the same period of time?

Let’s see the replication lag:

MySQL Replication Delay graph from PMM

As we can see, lag goes down quite quickly. Is this because the increased thread numbers? Or because the job which generated the many inserts finished and there are no more writes coming? (The replication delay did not go to 0 because this slave is deliberately delayed by an hour.)

Luckily in PMM we have other graphs as well that can help us. Like this one showing InnoDB row operations:

InnoDB Row Operations graph from PMM

 

That looks promising: the slave now inserts many more rows than usual. But how much rows were inserted, actually? Let’s create a new graph to see how many rows were inserted per hour. In PMM we already have all this information, we just have to create a new graph using the following query:

increase(mysql_global_status_innodb_row_ops_total{instance="$host",operation!="read"}[1h])

And this is the result:

InnoDB Row Operations per Hour graph from Percona Monitoring and Management performance monitoring tool

We can see a huge jump in the number of inserted rows per hour, it went from ~50Mil to 200-400Mil per hours. We can say that increasing the number of 

slave_parallel_workers

  really helped.

Conclusion

In this case, parallel replication was extremely useful and we could confirm that using PMM and Performance Schema. If you tune the

slave_parallel_workers

  check the graphs. You can show the impact to your boss. ?

 

 

Oct
10
2018
--

MongoDB Replica set Scenarios and Internals

MongoDB replica sets replication internals r

MongoDB replica sets replication internals rThe MongoDB® replica set is a group of nodes with one set as the primary node, and all other nodes set as secondary nodes. Only the primary node accepts “write” operations, while other nodes can only serve “read” operations according to the read preferences defined. In this blog post, we’ll focus on some MongoDB replica set scenarios, and take a look at the internals.

Example configuration

We will refer to a three node replica set that includes one primary node and two secondary nodes running as:

"members" : [
{
"_id" : 0,
"name" : "192.168.103.100:25001",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 3533,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"electionTime" : Timestamp(1537797392, 2),
"electionDate" : ISODate("2018-09-24T13:56:32Z"),
"configVersion" : 3,
"self" : true
},
{
"_id" : 1,
"name" : "192.168.103.100:25002",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 3063,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"optimeDurableDate" : ISODate("2018-09-24T14:49:44Z"),
"lastHeartbeat" : ISODate("2018-09-24T14:49:45.539Z"),
"lastHeartbeatRecv" : ISODate("2018-09-24T14:49:44.664Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "192.168.103.100:25001",
"configVersion" : 3
},
{
"_id" : 2,
"name" : "192.168.103.100:25003",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2979,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"optimeDurableDate" : ISODate("2018-09-24T14:49:44Z"),
"lastHeartbeat" : ISODate("2018-09-24T14:49:45.539Z"),
"lastHeartbeatRecv" : ISODate("2018-09-24T14:49:44.989Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "192.168.103.100:25002",
"configVersion" : 3
}

Here, the primary is running on port 25001, and the two secondaries are running on ports 25002 and 25003 on the same host.

Secondary nodes can only sync from Primary?

No, it’s not mandatory. Each secondary can replicate data from the primary or any other secondary to the node that is syncing. This term is also known as chaining, and by default, this is enabled.

In the above replica set, you can see that secondary node

"_id":2 

  is syncing from another secondary node

"_id":1

   as

"syncingTo" : "192.168.103.100:25002" 

This can also be found in the logs as here the parameter

chainingAllowed :true

   is the default setting.

settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 60000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5ba8ed10d4fddccfedeb7492') } }

Chaining?

That means that a secondary member node is able to replicate from another secondary member node instead of from the primary node. This helps to reduce the load from the primary. If the replication lag is not tolerable, then chaining could be disabled.

For more details about chaining and the steps to disable it please refer to my earlier blog post here.

Ok, then how does the secondary node select the source to sync from?

If Chaining is False

When chaining is explicitly set to be false, then the secondary node will sync from the primary node only or could be overridden temporarily.

If Chaining is True

  • Before choosing any sync node, TopologyCoordinator performs validations like:
    • Whether chaining is set to true or false.
    • If that particular node is part of the current replica set configurations.
    • Identify the node ahead with oplog with the lowest ping time.
    • The source code that includes validation is here.
  • Once the validation is done, SyncSourceSelector relies on SyncSourceResolver which contains the result and details for the new sync source
  • To get the details and response, SyncSourceResolver coordinates with ReplicationCoordinator
  • This ReplicationCoordinator is responsible for the replication, and co-ordinates with TopologyCoordinator
  • The TopologyCoordinator is responsible for topology of the cluster. It finds the primary oplog time and checks for the maxSyncSourceLagSecs
  • It will reject the source to sync from if the maxSyncSourceLagSecs  is greater than the newest oplog entry. The code for this can be found here
  • If the criteria for the source selection is not fulfilled, then BackgroundSync thread waits and restarts the whole process again to get the sync source.

Example for “unable to find a member to sync from” then, in the next attempt, finding a candidate to sync from

This can be found in the log like this. On receiving the message from rsBackgroundSync thread

could not find member to sync from

, the whole internal process restarts and finds a member to sync from i.e.

sync source candidate: 192.168.103.100:25001

, which means it is now syncing from node 192.168.103.100 running on port 25001.

2018-09-24T13:58:43.197+0000 I REPL     [rsSync] transition to RECOVERING
2018-09-24T13:58:43.198+0000 I REPL     [rsBackgroundSync] could not find member to sync from
2018-09-24T13:58:43.201+0000 I REPL     [rsSync] transition to SECONDARY
2018-09-24T13:58:59.208+0000 I REPL     [rsBackgroundSync] sync source candidate: 192.168.103.100:25001

  • Once the sync source node is selected, SyncSourceResolver probes the sync source to confirm that it is able to fetch the oplogs.
  • RollbackID is also fetched i.e. rbid  after the first batch is returned by oplogfetcher.
  • If all eligible sync sources are too fresh, such as during initial sync, then the syncSourceStatus Oplog start is missing and earliestOpTimeSeen will set a new minValid.
  • This minValid is also set in the case of rollback and abrupt shutdown.
  • If the node has a minValid entry then this is checked for the eligible sync source node.

Example showing the selection of a new sync source when the existing source is found to be invalid

Here, as the logs show, during sync the node chooses a new sync source. This is because it found the original sync source is not ahead, so not does not contain recent oplogs from which to sync.

2018-09-25T15:20:55.424+0000 I REPL     [replication-1] Choosing new sync source because our current sync source, 192.168.103.100:25001, has an OpTime ({ ts: Timestamp 1537879296000|1, t: 4 }) which is not ahead of ours ({ ts: Timestamp 1537879296000|1, t: 4 }), it does not have a sync source, and it's not the primary (sync source does not know the primary)

2018-09-25T15:20:55.425+0000 W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: sync source 192.168.103.100:25001 (config version: 3; last applied optime: { ts: Timestamp 1537879296000|1, t: 4 }; sync source index: -1; primary index: -1) is no longer valid

  • If the secondary node is too far behind the eligible sync source node, then the node will enter maintenance node and then resync needs to be call manually.
  • Once the sync source is chosen, BackgroundSync starts oplogFetcher.

Example for oplogFetcher

Here is an example of fetching oplog from the “oplog.rs” collection, and checking for the greater than required timestamp.

2018-09-26T10:35:07.372+0000 I COMMAND  [conn113] command local.oplog.rs command: getMore { getMore: 20830044306, collection: "oplog.rs", maxTimeMS: 5000, term: 7, lastKnownCommittedOpTime: { ts: Timestamp 1537955038000|1, t: 7 } } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1537903865000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 7, readConcern: { afterOpTime: { ts: Timestamp 1537903865000|1, t: 6 } } } planSummary: COLLSCAN cursorid:20830044306 keysExamined:0 docsExamined:0 numYields:1 nreturned:0 reslen:451 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 3 } } } protocol:op_command 3063398ms

When and what details replica set nodes communicate with each other?

At a regular interval, all the nodes communicate with each other to check the status of the primary node, check the status of the sync source, to get the oplogs and so on.

ReplicationCoordinator has ReplicaSetConfig that has a list of all the replica set nodes, and each node has a copy of it. This makes nodes aware of other nodes under same replica set.

This is how nodes communicate in more detail:

Heartbeats: This checks the status of other nodes i.e. alive or die

heartbeatInterval: Every node, at an interval of two seconds, sends the other nodes a heartbeat to make them aware that “yes I am alive!”

heartbeatTimeoutSecs: This is a timeout, and means that if the heartbeat is not returned in 10 seconds then that node is marked as inaccessible or simply die.

Every heartbeat is identified by these replica set details:

  • replica set config version
  • replica set name
  • Sender host address
  • id from the replicasetconfig

The source code could be referred to from here.

When the remote node receives the heartbeat, it processes this data and validates if the details are correct. It then prepares a ReplSetHeartbeatResponse, that includes:

  • Name of the replica set, config version, and optime details
  • Details about primary node as per the receiving node.
  • Sync source details and state of receiving node

This heartbeat data is processed, and if primary details are found then the election gets postponed.

TopologyCoordinator checks for the heartbeat data and confirms if the node is OK or NOT. If the node is OK then no action is taken. Otherwise it needs to be reconfigured or else initiate a priority takeover based on the config.

Response from oplog fetcher

To get the oplogs from the sync source, nodes communicate with each other. This oplog fetcher fetches oplogs through “find” and “getMore”. This will only affect the downstream node that gets metadata from its sync source to update its view from the replica set.

OplogQueryMetadata only comes with OplogFetcher responses

OplogQueryMetadata comes with OplogFetcher response and ReplSetMetadata comes with all the replica set details including configversion and replication commands.

Communicate to update Position commands:

This is to get an update for replication progress. ReplicationCoordinatorExternalState creates SyncSourceFeedback sends replSetUpdatePosition commands.

It includes Oplog details, Replicaset config version, and replica set metadata.

If a new node is added to the existing replica set, how will that node get the data?

If a new node is added to the existing replica set then the “initial sync” process takes place. This initial sync can be done in two ways:

  1. Just add the new node to the replicaset and let initial sync threads restore the data. Then it syncs from the oplogs until it reaches the secondary state.
  2. Copy the data from the recent data directory to the node, and restart this new node. Then it will also sync from the oplogs until it reaches the secondary state.

This is how it works internally

When “initial sync” or “rsync” is called by ReplicationCoordinator  then the node goes to “STARTUP2” state, and this initial sync is done in DataReplicator

  • A sync source is selected to get the data from, then it drops all the databases except the local database, and oplogs are recreated.
  • DatabasesCloner asks syncsource for a list of the databases, and for each database it creates DatabaseCloner.
  • For each DatabaseCloner it creates CollectionCloner to clone the collections
  • This CollectionCloner calls ListIndexes on the syncsource and creates a CollectionBulkLoader for parallel index creation while data cloning
  • The node also checks for the sync source rollback id. If rollback occurred, then it restarts the initial sync. Otherwise, datareplicator is done with its work and then replicationCoordinator assumes the role for ongoing replication.

Example for the “initial sync” :

Here node enters  

"STARTUP2"- "transition to STARTUP2"

Then sync source gets selected and drops all the databases except the local database.  Next, replication oplog is created and CollectionCloner is called.

Local database not dropped: because every node has its own “local” database with its own and other nodes’ information, based on itself, this database is not replicated to other nodes.

2018-09-26T17:57:09.571+0000 I REPL     [ReplicationExecutor] transition to STARTUP2
2018-09-26T17:57:14.589+0000 I REPL     [replication-1] sync source candidate: 192.168.103.100:25003
2018-09-26T17:57:14.590+0000 I STORAGE  [replication-1] dropAllDatabasesExceptLocal 1
2018-09-26T17:57:14.592+0000 I REPL     [replication-1] creating replication oplog of size: 990MB... 2018-09-26T17:57:14.633+0000 I REPL     [replication-0] CollectionCloner::start called, on ns:admin.system.version

Finished fetching all the oplogs, and finishing up initial sync.

2018-09-26T17:57:15.685+0000 I REPL     [replication-0] Finished fetching oplog during initial sync: CallbackCanceled: Callback canceled. Last fetched optime and hash: { ts: Timestamp 1537984626000|1, t: 9 }[-1139925876765058240]
2018-09-26T17:57:15.685+0000 I REPL     [replication-0] Initial sync attempt finishing up.

What are oplogs and where do these reside?

oplogs stands for “operation logs”. We have used this term so many times in this blog post as these are the mandatory logs for the replica set. These operations are in the capped collection called “oplog.rs”  that resides in “local” database.

Below, this is how oplogs are stored in the collection “oplog.rs” that includes details for timestamp, operations, namespace, output.

rplint:PRIMARY> use local
rplint:PRIMARY> show collections
oplog.rs
rplint:PRIMARY> db.oplog.rs.findOne()
{
 "ts" : Timestamp(1537797392, 1),
 "h" : NumberLong("-169301588285533642"),
 "v" : 2,
 "op" : "n",
 "ns" : "",
 "o" : {
 "msg" : "initiating set"
 }
}

It consists of rolling update operations coming to the database. Then these oplogs replicate to the secondary node(s) to maintain the high availability of the data in case of failover.

When the replica MongoDB instance starts, it creates an oplog ocdefault size. For Wired tiger, the default size is 5% of disk space, with a lower bound size of 990MB. So here in the example it creates 990MB of data. If you’d like to learn more about oplog size then please refer here

2018-09-26T17:57:14.592+0000 I REPL     [replication-1] creating replication oplog of size: 990MB...

What if the same oplog is applied multiple times, will that not lead to inconsistent data?

Fortunately, oplogs are Idempotent that means the value will remain unchanged, or will provide the same output, even when applied multiple times.

Let’s check an example:

For the $inc operator that will increment the value by 1 for the filed “item”, if this oplog is applied multiple times then the result might lead to an inconsistent record if this is not Idempotent. However, rather than increasing the item value multiple times, it is actually applied only once.

rplint:PRIMARY> use db1
//inserting one document
rplint:PRIMARY> db.col1.insert({item:1, name:"abc"})
//updating document by incrementing item value with 1
rplint:PRIMARY> db.col1.update({name:"abc"},{$inc:{item:1}})
//updated value is now item:2
rplint:PRIMARY> db.col1.find()
{ "_id" : ObjectId("5babd57cce2ef78096ac8e16"), "item" : 2, "name" : "abc" }

This is how these operations are stored in oplog, here this $inc value is stored in oplog as $set

rplint:PRIMARY> db.oplog.rs.find({ns:"db1.col1"})
//insert operation
{ "ts" : Timestamp(1537987964, 2), "t" : NumberLong(9), "h" : NumberLong("8083740413874479202"), "v" : 2, "op" : "i", "ns" : "db1.col1", "o" : { "_id" : ObjectId("5babd57cce2ef78096ac8e16"), "item" : 1, "name" : "abc" } }
//$inc operation is changed as ""$set" : { "item" : 2"
{ "ts" : Timestamp(1537988022, 1), "t" : NumberLong(9), "h" : NumberLong("-1432987813358665721"), "v" : 2, "op" : "u", "ns" : "db1.col1", "o2" : { "_id" : ObjectId("5babd57cce2ef78096ac8e16") }, "o" : { "$set" : { "item" : 2 } } }

That means that however many  times it is applied, it will generate the same results, so no inconsistent data!

I hope this blog post helps you to understand multiple scenarios for MongoDB replica sets, and how data replicates to the nodes.

Oct
05
2018
--

MongoDB-Disable Chained Replication

MongoDB chained replication

MongoDB chained replicationIn this blog post, we will learn what MongoDB chained replication is, why you might choose to disable it, and the steps you need to take to do so.

What is chain replication?

Chain Replication in MongoDB, as the name suggests, means that a secondary member is able to replicate from another secondary member instead of a primary.

Default settings

By default, chained replication is enabled in MongoDB. It helps to reduce the load from the primary but it may lead to a replication lag. When enabled, the secondary node selects its target using the ping time for the closest node.

Reasons to disable chained replication

The main reason to disable chain replication is replication lag. In other words, the length of the delay between MongoDB writing an operation on the primary and replicating the same operation to the secondary.

In either case—chained replication enabled or disabled—replication works in the same way when the primary node fails: the secondary server will promote to the primary. Therefore, writing and reading of data from the application is not affected.

Steps to disable chained replication

1) Check the current status of chained replication in replica set configuration for “settings” like this:

PRIMARY> rs.config().settings
{
"chainingAllowed" : true,
}

2) Disable chained replication, set “chainingAllowed” to false and then reconfig to implement changes.

PRIMARY> cg = rs.config()
PRIMARY> cg.settings.chainingAllowed = false
false
PRIMARY> rs.reconfig(cg)

3) Check again for the current status of chained replication and its done.

PRIMARY> rs.config().settings
{
	"chainingAllowed" : false,
}

Can I override sync source target even after disabling chaining?

Yes, even after you have disabled chained replication, you can still override sync target, though only temporarily. That means it will be overridden until:

  • mognod instance restarts
  • established connection between sync source and secondary node.
  • Additional: if chaining is enabled and sync source falls more than 30 seconds behind another member then the SyncSourceResolver will choose other member having recent oplogs to sync from.

Override sync source

Parameter “replSetSyncFrom” could be used, for example, the secondary node is syncing from host

192.168.103.100:27001

  and we would like to sync it from

192.168.103.100:27003

1) Check for the current host it is syncing from:

PRIMARY> rs.status()
{
			"_id" : 1,
			"name" : "192.168.103.100:27002",
			"syncingTo" : "192.168.103.100:27001",
			"syncSourceHost" : "192.168.103.100:27001",
		},

2) Login to that mongod, and execute:

SECONDARY> db.adminCommand( { replSetSyncFrom: "192.168.103.100:27003" })

3) Check replica set status again

SECONDARY> rs.status()
{
			"_id" : 1,
			"name" : "192.168.103.100:27002",
			"syncingTo" : "192.168.103.100:27003",
			"syncSourceHost" : "192.168.103.100:27003",
		},

This is how we can override the sync source in case of testing, maintenance or while the replica is not syncing from the required host.

I hope this blog helps you to understand how to disable chained replication or override the sync source for the specific purpose or reason. The preferred setting of the chainingAllowed parameter is true as it reduces the load from the primary node and also a default setting.

Oct
01
2018
--

How To Fix MySQL Replication After an Incompatible DDL Command

fix MySQL replication after incompatible DDL

fix MySQL replication after incompatible DDLMySQL supports replicating to a slave that is one release higher. This allows us to easily upgrade our MySQL setup to a new version, by promoting the slave and pointing the application to it. However, though unsupported, there are times when the MySQL version of slave deployed is one release lower. In this scenario, if your application has been performing much better on an older version of MySQL, you would like to have a convenient option to downgrade. You can simply promote the slave to get the old performance back.

The MySQL manual says that ROW based replication can be used to replicate to a lower version, provided that no DDLs replicated are incompatible with the slave. One such incompatible command is ALTER USER which is a new feature in MySQL 5.7 and not available on 5.6. :

ALTER USER 'testuser'@'localhost' IDENTIFIED BY 'testuser';

Executing that command would break replication. Here is an example of a broken slave in non-GTID replication:

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: repl
                  Master_Port: 5723
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 36915649
               Relay_Log_File: mysql_sandbox5641-relay-bin.000006
                Relay_Log_Pos: 36174552
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
*** redacted ***
                   Last_Errno: 1064
                   Last_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IDENTIFIED WITH 'mysql_native_password' AS '*3A2EB9C80F7239A4DE3933AE266DB76A784' at line 1' on query. Default database: ''. Query: 'ALTER USER 'testuser'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*3A2EB9C80F7239A4DE3933AE266DB76A7846BCB8''
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 36174373
              Relay_Log_Space: 36916179
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
*** redacted ***
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1064
               Last_SQL_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IDENTIFIED WITH 'mysql_native_password' AS '*3A2EB9C80F7239A4DE3933AE266DB76A784' at line 1' on query. Default database: ''. Query: 'ALTER USER 'testuser'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*3A2EB9C80F7239A4DE3933AE266DB76A7846BCB8''
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 00005723-0000-0000-0000-000000005723
*** redacted ***
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 180918 22:03:40
*** redacted ***
                Auto_Position: 0
1 row in set (0.00 sec)

Skipping the statement does not resume replication:

mysql> STOP SLAVE;
Query OK, 0 rows affected (0.02 sec)
mysql> SET GLOBAL sql_slave_skip_counter=1;
Query OK, 0 rows affected (0.00 sec)
mysql> START SLAVE;
Query OK, 0 rows affected (0.01 sec)
mysql> SHOW SLAVE STATUS\G

Fixing non-GTID replication

When you check slave status, replication still isn’t fixed. To fix it, you must manually skip to the next binary log position. The current binary log (Relay_Master_Log_File) and position (Exec_Master_Log_Pos) executed are mysql-bin.000002 and 36174373 respectively. We can use mysqlbinlog on the master to determine the next position:

mysqlbinlog -v --base64-output=DECODE-ROWS --start-position=36174373 /ssd/sandboxes/msb_5_7_23/data/mysql-bin.000002 | head -n 30
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 36174373
#180918 22:03:40 server id 1  end_log_pos 36174438 CRC32 0xc7e1e553 	Anonymous_GTID	last_committed=19273	sequence_number=19277	rbr_only=no
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 36174438
#180918 22:03:40 server id 1  end_log_pos 36174621 CRC32 0x2e5bb235 	Query	thread_id=563	exec_time=0	error_code=0
SET TIMESTAMP=1537279420/*!*/;
SET @@session.pseudo_thread_id=563/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
ALTER USER 'testuser'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*3A2EB9C80F7239A4DE3933AE266DB76A7846BCB8'
/*!*/;
# at 36174621
#180918 22:03:40 server id 1  end_log_pos 36174686 CRC32 0x86756b3f 	Anonymous_GTID	last_committed=19275	sequence_number=19278	rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 36174686
#180918 22:03:40 server id 1  end_log_pos 36174760 CRC32 0x30e663f9 	Query	thread_id=529	exec_time=0	error_code=0
SET TIMESTAMP=1537279420/*!*/;
BEGIN
/*!*/;
# at 36174760
#180918 22:03:40 server id 1  end_log_pos 36174819 CRC32 0x48054daf 	Table_map: `sbtest`.`sbtest1` mapped to number 226

Based on the output above, the next binary log position is 36174621. To fix the slave, run:

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=36174621;
START SLAVE;

Verify if the slave threads are now running by executing SHOW SLAVE STATUS\G

Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: repl
                  Master_Port: 5723
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 306841423
               Relay_Log_File: mysql_sandbox5641-relay-bin.000002
                Relay_Log_Pos: 190785290
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
*** redacted ***
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 226959625
              Relay_Log_Space: 270667273
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
*** redacted ***
        Seconds_Behind_Master: 383
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 00005723-0000-0000-0000-000000005723
             Master_Info_File: /ssd/sandboxes/msb_5_6_41/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Opening tables
           Master_Retry_Count: 86400
*** redacted ***
                Auto_Position: 0

To make the slave consistent with the master, execute the compatible query on the slave.

SET SESSION sql_log_bin = 0;
GRANT USAGE ON *.* TO 'testuser'@'localhost' IDENTIFIED BY 'testuser';

Done.

GTID replication

For GTID replication, in addition to injecting an empty transaction for the offending statement, you’ll need skip it by using the non-GTID solution provided above. Once running, flip it back to GTID.

Here’s an example of a broken GTID slave:

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: repl
                  Master_Port: 5723
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 14364967
               Relay_Log_File: mysql_sandbox5641-relay-bin.000002
                Relay_Log_Pos: 8630318
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
*** redacted ***
                   Last_Errno: 1064
                   Last_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IDENTIFIED WITH 'mysql_native_password' AS '*3A2EB9C80F7239A4DE3933AE266DB76A784' at line 1' on query. Default database: ''. Query: 'ALTER USER 'testuser'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*3A2EB9C80F7239A4DE3933AE266DB76A7846BCB8''
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 12468343
              Relay_Log_Space: 10527158
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
*** redacted ***
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1064
               Last_SQL_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IDENTIFIED WITH 'mysql_native_password' AS '*3A2EB9C80F7239A4DE3933AE266DB76A784' at line 1' on query. Default database: ''. Query: 'ALTER USER 'testuser'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*3A2EB9C80F7239A4DE3933AE266DB76A7846BCB8''
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 00005723-0000-0000-0000-000000005723
             Master_Info_File: /ssd/sandboxes/msb_5_6_41/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 180918 22:32:28
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00005723-0000-0000-0000-000000005723:2280-8530
            Executed_Gtid_Set: 00005723-0000-0000-0000-000000005723:1-7403
                Auto_Position: 1
1 row in set (0.00 sec)
mysql> SHOW GLOBAL VARIABLES LIKE 'gtid_executed';
+---------------+---------------------------------------------+
| Variable_name | Value                                       |
+---------------+---------------------------------------------+
| gtid_executed | 00005723-0000-0000-0000-000000005723:1-7403 |
+---------------+---------------------------------------------+
1 row in set (0.00 sec)

Since the last position executed is 7403, so you’ll need to create an empty transaction for the offending sequence 7404.

STOP SLAVE;
SET GTID_NEXT='00005723-0000-0000-0000-000000005723:7404';
BEGIN;
COMMIT;
SET GTID_NEXT=AUTOMATIC;
START SLAVE;

Note: If you have MTS enabled, you can also get the offending GTID coordinates from Last_SQL_Error of SHOW SLAVE STATUS\G

The next step is to find the next binary log position. The current binary log(Relay_Master_Log_File) and position(Exec_Master_Log_Pos) executed are mysql-bin.000003 and 12468343 respectively. We can again use

mysqlbinlog

  on the master to determine the next position:

mysqlbinlog -v --base64-output=DECODE-ROWS --start-position=12468343 /ssd/sandboxes/msb_5_7_23/data/mysql-bin.000003 | head -n 30
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 12468343
#180918 22:32:19 server id 1  end_log_pos 12468408 CRC32 0x259ee085 	GTID	last_committed=7400	sequence_number=7404	rbr_only=no
SET @@SESSION.GTID_NEXT= '00005723-0000-0000-0000-000000005723:7404'/*!*/;
# at 12468408
#180918 22:32:19 server id 1  end_log_pos 12468591 CRC32 0xb349ad80 	Query	thread_id=142	exec_time=0	error_code=0
SET TIMESTAMP=1537281139/*!*/;
SET @@session.pseudo_thread_id=142/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
ALTER USER 'testuser'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*3A2EB9C80F7239A4DE3933AE266DB76A7846BCB8'
/*!*/;
# at 12468591
#180918 22:32:19 server id 1  end_log_pos 12468656 CRC32 0xb2019f3f 	GTID	last_committed=7400	sequence_number=7405	rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= '00005723-0000-0000-0000-000000005723:7405'/*!*/;
# at 12468656
#180918 22:32:19 server id 1  end_log_pos 12468730 CRC32 0x76b5ea6c 	Query	thread_id=97	exec_time=0	error_code=0
SET TIMESTAMP=1537281139/*!*/;
BEGIN
/*!*/;
# at 12468730
#180918 22:32:19 server id 1  end_log_pos 12468789 CRC32 0x48f0ba6d 	Table_map: `sbtest`.`sbtest8` mapped to number 115

The next binary log position is 36174621. To fix the slave, run:

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=12468591, MASTER_AUTO_POSITION=0;
START SLAVE;

Notice that I added MASTER_AUTO_POSITION=0 above to disable GTID replication for now. You can run SHOW SLAVE STATUS\G to determine that MySQL is running fine:

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: repl
                  Master_Port: 5723
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 446194575
               Relay_Log_File: mysql_sandbox5641-relay-bin.000002
                Relay_Log_Pos: 12704248
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
*** redacted ***
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 25172522
              Relay_Log_Space: 433726939
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
*** redacted ***
        Seconds_Behind_Master: 2018
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 00005723-0000-0000-0000-000000005723
             Master_Info_File: /ssd/sandboxes/msb_5_6_41/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Reading event from the relay log
           Master_Retry_Count: 86400
*** redacted ***
           Retrieved_Gtid_Set: 00005723-0000-0000-0000-000000005723:7405-264930
            Executed_Gtid_Set: 00005723-0000-0000-0000-000000005723:1-14947
                Auto_Position: 0

Since it’s running fine you can now revert back to GTID replication:

STOP SLAVE;
CHANGE MASTER TO MASTER_AUTO_POSITION=1;
START SLAVE;

Finally, to make the slave consistent with the master, execute the compatible query on the slave.

SET SESSION sql_log_bin = 0;
GRANT USAGE ON *.* TO 'testuser'@'localhost' IDENTIFIED BY 'testuser';

Summary

In this article, I’ve shared how to fix replication when it breaks due to an incompatible command being replicated to the slave. In fact, I’ve only identified ALTER USER as an incompatible command for 5.6. If there are other incompatible commands, please share them in the comment section. Thanks in advance.

Update: 
I filed a bug at https://bugs.mysql.com/bug.php?id=92629 to verify if the errors I’ve encountered here is a bug or undocumented behavior.

The post How To Fix MySQL Replication After an Incompatible DDL Command appeared first on Percona Database Performance Blog.

Sep
07
2018
--

Setting up Streaming Replication in PostgreSQL

postgres streaming replication

Configuring replication between two databases is considered to be a best strategy towards achieving high availability during disasters and provides fault tolerance against unexpected failures. PostgreSQL satisfies this requirement through streaming replication. We shall talk about another option called logical replication and logical decoding in our future blog post.

Streaming replication works on log shipping. Every transaction in postgres is written to a transaction log called WAL (write-ahead log) to achieve durability. A slave uses these WAL segments to continuously replicate changes from its master.

There exists three mandatory processes –

wal sender

 ,

wal receiver

 and

startup

 process, these play a major role in achieving streaming replication in postgres.

A

wal sender

 process runs on a master, whereas the

wal receiver

 and

startup

 processes runs on its slave. When you start the replication, a

wal receiver

 process sends the LSN (Log Sequence Number) up until when the WAL data has been replayed on a slave, to the master. And then the

wal sender

 process on master sends the WAL data until the latest LSN starting from the LSN sent by the

wal receiver

, to the slave.

Wal receiver

 writes the WAL data sent by

wal sender

 to WAL segments. It is the

startup

 process on slave that replays the data written to WAL segment. And then the streaming replication begins.

Note: Log Sequence Number, or LSN, is a pointer to a location in the WAL.

Steps to setup streaming replication between a master and one slave

Step 1:

Create the user in master using whichever slave should connect for streaming the WALs. This user must have REPLICATION ROLE.

CREATE USER replicator
WITH REPLICATION
ENCRYPTED PASSWORD 'replicator';

Step 2:

The following parameters on the master are considered as mandatory when setting up streaming replication.

  • archive_mode : Must be set to ON to enable archiving of WALs.
  • wal_level : Must be at least set to hot_standby  until version 9.5 or replica  in the later versions.
  • max_wal_senders : Must be set to 3 if you are starting with one slave. For every slave, you may add 2 wal senders.
  • wal_keep_segments : Set the WAL retention in pg_xlog (until PostgreSQL 9.x) and pg_wal (from PostgreSQL 10). Every WAL requires 16MB of space unless you have explicitly modified the WAL segment size. You may start with 100 or more depending on the space and the amount of WAL that could be generated during a backup.
  • archive_command : This parameter takes a shell command or external programs. It can be a simple copy command to copy the WAL segments to another location or a script that has the logic to archive the WALs to S3 or a remote backup server.
  • listen_addresses : Set it to * or the range of IP Addresses that need to be whitelisted to connect to your master PostgreSQL server. Your slave IP should be whitelisted too, else, the slave cannot connect to the master to replicate/replay WALs.
  • hot_standby : Must be set to ON on standby/replica and has no effect on the master. However, when you setup your replication, parameters set on the master are automatically copied. This parameter is important to enable READS on slave. Otherwise, you cannot run your SELECT queries against slave.

The above parameters can be set on the master using these commands followed by a restart:

ALTER SYSTEM SET wal_level TO 'hot_standby';
ALTER SYSTEM SET archive_mode TO 'ON';
ALTER SYSTEM SET max_wal_senders TO '5';
ALTER SYSTEM SET wal_keep_segments TO '10';
ALTER SYSTEM SET listen_addresses TO '*';
ALTER SYSTEM SET hot_standby TO 'ON';
ALTER SYSTEM SET archive_command TO 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f';

$ pg_ctl -D $PGDATA restart -mf

Step 3:

Add an entry to

pg_hba.conf

 of the master to allow replication connections from the slave. The default location of

pg_hba.conf

 is the data directory. However, you may modify the location of this file in the file 

postgresql.conf

. In Ubuntu/Debian, pg_hba.conf may be located in the same directory as the postgresql.conf file by default. You can get the location of postgresql.conf in Ubuntu/Debian by calling an OS command =>

pg_lsclusters

.

host replication replicator 192.168.0.28/32 md5

The IP address mentioned in this line must match the IP address of your slave server. Please change the IP accordingly.

In order to get the changes into effect, issue a SIGHUP:

$ pg_ctl -D $PGDATA reload
Or
$ psql -U postgres -p 5432 -c "select pg_reload_conf()"

Step 4:

pg_basebackup

 helps us to stream the data through the 

wal sender

 process from the master to a slave to set up replication. You can also take a tar format backup from master and copy that to the slave server. You can read more about tar format pg_basebackup here

The following step can be used to stream data directory from master to slave. This step can be performed on a slave.

$ pg_basebackup -h 192.168.0.28 -U replicator -p 5432 -D $PGDATA -P -Xs -R

Please replace the IP address with your master’s IP address.

In the above command, you see an optional argument -R. When you pass -R, it automatically creates a

recovery.conf

  file that contains the role of the DB instance and the details of its master. It is mandatory to create the recovery.conf file on the slave in order to set up a streaming replication. If you are not using the backup type mentioned above, and choose to take a tar format backup on master that can be copied to slave, you must create this recovery.conf file manually. Here are the contents of the recovery.conf file:

$ cat $PGDATA/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.0.28 port=5432 user=replicator password=replicator'
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'

In the above file, the role of the server is defined by standby_mode.

standby_mode

  must be set to ON for slaves in postgres.
And to stream WAL data, details of the master server are configured using the parameter

primary_conninfo

 .

The two parameters

standby_mode

  and

primary_conninfo

 are automatically created when you use the optional argument -R while taking a pg_basebackup. This recovery.conf file must exist in the data directory($PGDATA) of Slave.

Step 5:

Start your slave once the backup and restore are completed.

If you have configured a backup (remotely) using the streaming method mentioned in Step 4, it just copies all the files and directories to the data directory of the slave. Which means it is both a back up of the master data directory and also provides for restore in a single step.

If you have taken a tar back up from the master and shipped it to the slave, you must unzip/untar the back up to the slave data directory, followed by creating a recovery.conf as mentioned in the previous step. Once done, you may proceed to start your PostgreSQL instance on the slave using the following command.

$ pg_ctl -D $PGDATA start

Step 6:

In a production environment, it is always advisable to have the parameter

restore_command

set appropriately. This parameter takes a shell command (or a script) that can be used to fetch the WAL needed by a slave, if the WAL is not available on the master.

For example:

If a network issue has caused a slave to lag behind the master for a substantial time, it is less likely to have those WALs required by the slave available on the master’s

pg_xlog

 or

pg_wal

 location. Hence, it is sensible to archive the WALs to a safe location, and to have the commands that are needed to restore the WAL set to restore_command parameter in the recovery.conf file of your slave. To achieve that, you have to add a line similar to the next example to your recovery.conf file in slave. You may substitute the cp command with a shell command/script or a copy command that helps the slave get the appropriate WALs from the archive location.

restore_command = 'cp /mnt/server/archivedir/%f "%p"'

Setting the above parameter requires a restart and cannot be done online.

Final step: validate that replication is setup

As discussed earlier, a

wal sender

  and a

wal receiver

  process are started on the master and the slave after setting up replication. Check for these processes on both master and slave using the following commands.

On Master
==========
$ ps -eaf | grep sender
On Slave
==========
$ ps -eaf | grep receiver
$ ps -eaf | grep startup

You must see those all three processes running on master and slave as you see in the following example log.

On Master
=========
$ ps -eaf | grep sender
postgres  1287  1268  0 10:40 ?        00:00:00 postgres: wal sender process replicator 192.168.0.28(36924) streaming 0/50000D68
On Slave
=========
$ ps -eaf | egrep "receiver|startup"
postgres  1251  1249  0 10:40 ?        00:00:00 postgres: startup process   recovering 000000010000000000000050
postgres  1255  1249  0 10:40 ?        00:00:04 postgres: wal receiver process   streaming 0/50000D68

You can see more details by querying the master’s pg_stat_replication view.

$ psql
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 1287
usesysid         | 24615
usename          | replicator
application_name | walreceiver
client_addr      | 192.168.0.28
client_hostname  |
client_port      | 36924
backend_start    | 2018-09-07 10:40:48.074496-04
backend_xmin     |
state            | streaming
sent_lsn         | 0/50000D68
write_lsn        | 0/50000D68
flush_lsn        | 0/50000D68
replay_lsn       | 0/50000D68
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

Reference : https://www.postgresql.org/docs/10/static/warm-standby.html#STANDBY-SERVER-SETUP

If you found this post interesting…

Did you know that Percona now provides PostgreSQL support services? If you’d like to read more about this, here’s some more information. We’re here to help.

The post Setting up Streaming Replication in PostgreSQL appeared first on Percona Database Performance Blog.

Aug
17
2018
--

Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon

postgres mysql replication using pg_chameleon

postgres mysql replication using pg_chameleonReplication is one of the well-known features that allows us to build an identical copy of a database. It is supported in almost every RDBMS. The advantages of replication may be huge, especially HA (High Availability) and load balancing. But what if we need to build replication between 2 heterogeneous databases like MySQL and PostgreSQL? Can we continuously replicate changes from a MySQL database to a PostgreSQL database? The answer to this question is pg_chameleon.

For replicating continuous changes, pg_chameleon uses the mysql-replication library to pull the row images from MySQL, which are transformed into a jsonb object. A pl/pgsql function in postgres decodes the jsonb and replays the changes into the postgres database. In order to setup this type of replication, your mysql binlog_format must be “ROW”.

A few points you should know before setting up this tool :

  1. Tables that need to be replicated must have a primary key.
  2. Works for PostgreSQL versions > 9.5 and MySQL > 5.5
  3. binlog_format must be ROW in order to setup this replication.
  4. Python version must be > 3.3

When you initialize the replication, pg_chameleon pulls the data from MySQL using the CSV format in slices, to prevent memory overload. This data is flushed to postgres using the COPY command. If COPY fails, it tries INSERT, which may be slow. If INSERT fails, then the row is discarded.

To replicate changes from mysql, pg_chameleon mimics the behavior of a mysql slave. It creates the schema in postgres, performs the initial data load, connects to MySQL replication protocol, stores the row images into a table in postgres. Now, the respective functions in postgres decode those rows and apply the changes. This is similar to storing relay logs in postgres tables and applying them to a postgres schema. You do not have to create a postgres schema using any DDLs. This tool automatically does that for the tables configured for replication. If you need to specifically convert any types, you can specify this in the configuration file.

The following is just an exercise that you can experiment with and implement if it completely satisfies your requirement. We performed these tests on CentOS Linux release 7.4.

Prepare the environment

Set up Percona Server for MySQL

InstallMySQL 5.7 and add appropriate parameters for replication.

In this exercise, I have installed Percona Server for MySQL 5.7 using YUM repo.

yum install http://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm
yum install Percona-Server-server-57
echo "mysql ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
usermod -s /bin/bash mysql
sudo su - mysql

pg_chameleon requires the following the parameters to be set in your my.cnf file (parameter file of your MySQL server). You may add the following parameters to /etc/my.cnf

binlog_format= ROW
binlog_row_image=FULL
log-bin = mysql-bin
server-id = 1

Now start your MySQL server after adding the above parameters to your my.cnf file.

$ service mysql start

Fetch the temporary root password from mysqld.log, and reset the root password using mysqladmin

$ grep "temporary" /var/log/mysqld.log
$ mysqladmin -u root -p password 'Secret123!'

Now, connect to your MySQL instance and create sample schema/tables. I have also created an emp table for validation.

$ wget http://downloads.mysql.com/docs/sakila-db.tar.gz
$ tar -xzf sakila-db.tar.gz
$ mysql -uroot -pSecret123! < sakila-db/sakila-schema.sql
$ mysql -uroot -pSecret123! < sakila-db/sakila-data.sql
$ mysql -uroot -pSecret123! sakila -e "create table emp (id int PRIMARY KEY, first_name varchar(20), last_name varchar(20))"

Create a user for configuring replication using pg_chameleon and give appropriate privileges to the user using the following steps.

$ mysql -uroot -p
create user 'usr_replica'@'%' identified by 'Secret123!';
GRANT ALL ON sakila.* TO 'usr_replica'@'%';
GRANT RELOAD, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'usr_replica'@'%';
FLUSH PRIVILEGES;

While creating the user in your mysql server (‘usr_replica’@’%’), you may wish to replace % with the appropriate IP or hostname of the server on which pg_chameleon is running.

Set up PostgreSQL

Install PostgreSQL and start the database instance.

You may use the following steps to install PostgreSQL 10.x

yum install https://yum.postgresql.org/10/redhat/rhel-7.4-x86_64/pgdg-centos10-10-2.noarch.rpm
yum install postgresql10*
su - postgres
$/usr/pgsql-10/bin/initdb
$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start

As seen in the following logs, create a user in PostgreSQL using which pg_chameleon can write changed data to PostgreSQL. Also create the target database.

postgres=# CREATE USER usr_replica WITH ENCRYPTED PASSWORD 'secret';
CREATE ROLE
postgres=# CREATE DATABASE db_replica WITH OWNER usr_replica;
CREATE DATABASE

Steps to install and setup replication using pg_chameleon

Step 1: In this exercise, I installed Python 3.6 and pg_chameleon 2.0.8 using the following steps. You may skip the python install steps if you already have the desired python release. We can create a virtual environment if the OS does not include Python 3.x by default.

yum install gcc openssl-devel bzip2-devel wget
cd /usr/src
wget https://www.python.org/ftp/python/3.6.6/Python-3.6.6.tgz
tar xzf Python-3.6.6.tgz
cd Python-3.6.6
./configure --enable-optimizations
make altinstall
python3.6 -m venv venv
source venv/bin/activate
pip install pip --upgrade
pip install pg_chameleon

Step 2: This tool requires a configuration file to store the source/target server details, and a directory to store the logs. Use the following command to let pg_chameleon create the configuration file template and the respective directories for you.

$ chameleon set_configuration_files

The above command would produce the following output, which shows that it created some directories and a file in the location where you ran the command.

creating directory /var/lib/pgsql/.pg_chameleon
creating directory /var/lib/pgsql/.pg_chameleon/configuration/
creating directory /var/lib/pgsql/.pg_chameleon/logs/
creating directory /var/lib/pgsql/.pg_chameleon/pid/
copying configuration example in /var/lib/pgsql/.pg_chameleon/configuration//config-example.yml

Copy the sample configuration file to another file, lets say, default.yml

$ cd .pg_chameleon/configuration/
$ cp config-example.yml default.yml

Here is how my default.yml file looks after adding all the required parameters. In this file, we can optionally specify the data type conversions, tables to skipped from replication and the DML events those need to skipped for selected list of tables.

---
#global settings
pid_dir: '~/.pg_chameleon/pid/'
log_dir: '~/.pg_chameleon/logs/'
log_dest: file
log_level: info
log_days_keep: 10
rollbar_key: ''
rollbar_env: ''
# type_override allows the user to override the default type conversion into a different one.
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"
#postgres  destination connection
pg_conn:
  host: "localhost"
  port: "5432"
  user: "usr_replica"
  password: "secret"
  database: "db_replica"
  charset: "utf8"
sources:
  mysql:
    db_conn:
      host: "localhost"
      port: "3306"
      user: "usr_replica"
      password: "Secret123!"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      sakila: sch_sakila
    limit_tables:
#      - delphis_mediterranea.foo
    skip_tables:
#      - delphis_mediterranea.bar
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "disabled"
    gtid_enable: No
    type: mysql
    skip_events:
      insert:
#        - delphis_mediterranea.foo #skips inserts on the table delphis_mediterranea.foo
      delete:
#        - delphis_mediterranea #skips deletes on schema delphis_mediterranea
      update:

Step 3: Initialize the replica using this command:

$ chameleon create_replica_schema --debug

The above command creates a schema and nine tables in the PostgreSQL database that you specified in the .pg_chameleon/configuration/default.yml file. These tables are needed to manage replication from source to destination. The same can be observed in the following log.

db_replica=# \dn
List of schemas
Name | Owner
---------------+-------------
public | postgres
sch_chameleon | target_user
(2 rows)
db_replica=# \dt sch_chameleon.t_*
List of relations
Schema | Name | Type | Owner
---------------+------------------+-------+-------------
sch_chameleon | t_batch_events | table | target_user
sch_chameleon | t_discarded_rows | table | target_user
sch_chameleon | t_error_log | table | target_user
sch_chameleon | t_last_received | table | target_user
sch_chameleon | t_last_replayed | table | target_user
sch_chameleon | t_log_replica | table | target_user
sch_chameleon | t_replica_batch | table | target_user
sch_chameleon | t_replica_tables | table | target_user
sch_chameleon | t_sources | table | target_user
(9 rows)

Step 4: Add the source details to pg_chameleon using the following command. Provide the name of the source as specified in the configuration file. In this example, the source name is mysql and the target is postgres database defined under pg_conn.

$ chameleon add_source --config default --source mysql --debug

Once you run the above command, you should see that the source details are added to the t_sources table.

db_replica=# select * from sch_chameleon.t_sources;
-[ RECORD 1 ]-------+----------------------------------------------
i_id_source | 1
t_source | mysql
jsb_schema_mappings | {"sakila": "sch_sakila"}
enm_status | ready
t_binlog_name |
i_binlog_position |
b_consistent | t
b_paused | f
b_maintenance | f
ts_last_maintenance |
enm_source_type | mysql
v_log_table | {t_log_replica_mysql_1,t_log_replica_mysql_2}
$ chameleon show_status --config default
Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------
1 mysql mysql ready Yes N/A N/A

Step 5: Initialize the replica/slave using the following command. Specify the source from which you are replicating the changes to the PostgreSQL database.

$ chameleon init_replica --config default --source mysql --debug

Initialization involves the following tasks on the MySQL server (source).

1. Flush the tables with read lock
2. Get the master’s coordinates
3. Copy the data
4. Release the locks

The above command creates the target schema in your postgres database automatically.
In the default.yml file, we mentioned the following schema_mappings.

schema_mappings:
sakila: sch_sakila

So, now it created the new schema scott in the target database db_replica.

db_replica=# \dn
List of schemas
Name | Owner
---------------+-------------
public | postgres
sch_chameleon | usr_replica
sch_sakila | usr_replica
(3 rows)

Step 6: Now, start replication using the following command.

$ chameleon start_replica --config default --source mysql

Step 7: Check replication status and any errors using the following commands.

$ chameleon show_status --config default
$ chameleon show_errors

This is how the status looks:

$ chameleon show_status --source mysql
Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------
1 mysql mysql running No N/A N/A
== Schema mappings ==
Origin schema Destination schema
--------------- --------------------
sakila sch_sakila
== Replica status ==
--------------------- ---
Tables not replicated 0
Tables replicated 17
All tables 17
Last maintenance N/A
Next maintenance N/A
Replayed rows
Replayed DDL
Skipped rows

Now, you should see that the changes are continuously getting replicated from MySQL to PostgreSQL.

Step 8:  To validate, you may insert a record into the table in MySQL that we created for the purpose of validation and check that it is replicated to postgres.

$ mysql -u root -pSecret123! -e "INSERT INTO sakila.emp VALUES (1,'avinash','vallarapu')"
mysql: [Warning] Using a password on the command line interface can be insecure.
$ psql -d db_replica -c "select * from sch_sakila.emp"
 id | first_name | last_name
----+------------+-----------
  1 | avinash    | vallarapu
(1 row)

In the above log, we see that the record that was inserted to the MySQL table was replicated to the PostgreSQL table.

You may also add multiple sources for replication to PostgreSQL (target).

Reference : http://www.pgchameleon.org/documents/

Please refer to the above documentation to find out about the many more options that are available with pg_chameleon

The post Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon appeared first on Percona Database Performance Blog.

Aug
07
2018
--

Replicating from MySQL 8.0 to MySQL 5.7

replicate from MySQL 8 to MySQL 5.7

In this blog post, we’ll discuss how to set a replication from MySQL 8.0 to MySQL 5.7. There are some situations that having this configuration might help. For example, in the case of a MySQL upgrade, it can be useful to have a master that is using a newer version of MySQL to an older version slave as a rollback plan. Another example is in the case of upgrading a master x master replication topology.

Officially, replication is only supported between consecutive major MySQL versions, and only from a lower version master to a higher version slave. Here is an example of a supported scenario:

5.7 master –> 8.0 slave

while the opposite is not supported:

8.0 master –> 5.7 slave

In this blog post, I’ll walk through how to overcome the initial problems to set a replication working in this scenario. I’ll also show some errors that can halt the replication if a new feature from MySQL 8 is used.

Here is the initial set up that will be used to build the topology:

slave > select @@version;
+---------------+
| @@version     |
+---------------+
| 5.7.17-log |
+---------------+
1 row in set (0.00 sec)
master > select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.12    |
+-----------+
1 row in set (0.00 sec)

First, before executing the CHANGE MASTER command, you need to modify the collation on the master server. Otherwise the replication will run into this error:

slave > show slave status\G
                   Last_Errno: 22
                   Last_Error: Error 'Character set '#255' is not a compiled character set and is not specified in the '/opt/percona_server/5.7.17/share/charsets/Index.xml' file' on query. Default database: 'mysql8_1'. Query: 'create database mysql8_1'

This is because the default character_set and the collation has changed on MySQL 8. According to the documentation:

The default value of the character_set_server and character_set_database system variables has changed from latin1 to utf8mb4.

The default value of the collation_server and collation_database system variables has changed from latin1_swedish_ci to utf8mb4_0900_ai_ci.

Let’s change the collation and the character set to utf8 on MySQL 8 (it is possible to use any option that exists in both versions):

# master my.cnf
[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8
collation-server=utf8_unicode_ci

You need to restart MySQL 8 to apply the changes. Next, after the restart, you have to create a replication user using mysql_native_password.This is because MySQL 8 changed the default Authentication Plugin to caching_sha2_password which is not supported by MySQL 5.7. If you try to execute the CHANGE MASTER command with a user using caching_sha2_password plugin, you will receive the error message below:

Last_IO_Errno: 2059
Last_IO_Error: error connecting to master 'root@127.0.0.1:19025' - retry-time: 60 retries: 1

To create a user using mysql_native_password :

master> CREATE USER 'replica_user'@'%' IDENTIFIED WITH mysql_native_password BY 'repli$cat';
master> GRANT REPLICATION SLAVE ON *.* TO 'replica_user'@'%';

Finally, we can proceed as usual to build the replication:

master > show master status\G
*************************** 1. row ***************************
File: mysql-bin.000007
Position: 155
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
slave > CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_USER='replica_user', MASTER_PASSWORD='repli$cat',MASTER_PORT=19025, MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=155; start slave;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
# This procedure works with GTIDs too
slave > CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_USER='replica_user', MASTER_PASSWORD='repli$cat',MASTER_PORT=19025,MASTER_AUTO_POSITION = 1 ; start slave;

Checking the replication status:

master > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: replica_user
Master_Port: 19025
Connect_Retry: 60
Master_Log_File: mysql-bin.000007
Read_Master_Log_Pos: 155
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 321
Relay_Master_Log_File: mysql-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 155
Relay_Log_Space: 524
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 100
Master_UUID: 00019025-1111-1111-1111-111111111111
Master_Info_File: /home/vinicius.grippa/sandboxes/rsandbox_5_7_17/master/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.01 sec)

Executing a quick test to check if the replication is working:

master > create database vinnie;
Query OK, 1 row affected (0.06 sec)

slave > show databases like 'vinnie';
+-------------------+
| Database (vinnie) |
+-------------------+
| vinnie |
+-------------------+
1 row in set (0.00 sec)

Caveats

Any tentative attempts to use a new feature from MySQL 8 like roles, invisible indexes or caching_sha2_password will make the replication stop with an error:

master > alter user replica_user identified with caching_sha2_password by 'sekret';
Query OK, 0 rows affected (0.01 sec)

slave > show slave status\G
               Last_SQL_Errno: 1396
               Last_SQL_Error: Error 'Operation ALTER USER failed for 'replica_user'@'%'' on query. Default database: ''. Query: 'ALTER USER 'replica_user'@'%' IDENTIFIED WITH 'caching_sha2_password' AS '$A$005$H	MEDi\"gQ
                        wR{/I/VjlgBIUB08h1jIk4fBzV8kU1J2RTqeqMq8Q2aox0''

Summary

Replicating from MySQL 8 to MySQL 5.7 is possible. In some scenarios (especially upgrades), this might be helpful, but it is not advisable to have a heterogeneous topology because it will be prone to errors and incompatibilities under some cases.

You might also like:

 

The post Replicating from MySQL 8.0 to MySQL 5.7 appeared first on Percona Database Performance Blog.

Aug
02
2018
--

Amazon RDS Multi-AZ Deployments and Read Replicas

RDS Multi-AZ

Amazon RDS is a managed relational database service that makes it easier to set up, operate, and scale a relational database in the cloud. One of the common questions that we get is “What is Multi-AZ and how it’s different from Read Replica, do I need both?”.  I have tried to answer this question in this blog post and it depends on your application needs. Are you looking for High Availability (HA), read scalability … or both?

Before we go to into detail, let me explain two common terms used with Amazon AWS.

Region – an AWS region is a separate geographical area like US East (N. Virginia), Asia Pacific (Mumbai), EU (London) etc. Each AWS Region has multiple, isolated locations known as Availability Zones.

Availability Zone (AZ) – AZ is simply one or more data centers, each with redundant power, networking and connectivity, housed in separate facilities. Data centers are geographically isolated within the same region.

What is Multi-AZ?

Amazon RDS provides high availability and failover support for DB instances using Multi-AZ deployments.

In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica of the master DB in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to the standby replica to provide data redundancy, failover support and to minimize latency during system backups. In the event of planned database maintenance, DB instance failure, or an AZ failure of your primary DB instance, Amazon RDS automatically performs a failover to the standby so that database operations can resume quickly without administrative intervention.

You can check in the AWS management console if a database instance is configured as Multi-AZ. Select the RDS service, click on the DB instance and review the details section.

AWS management console showing that instance is Multi-AZ

This screenshot from AWS management console (above) shows that the database is hosted as Multi-AZ deployment and the standby replica is deployed in us-east-1a AZ.

Benefits of Multi-AZ deployment:

  • Replication to a standby replica is synchronous which is highly durable.
  • When a problem is detected on the primary instance, it will automatically failover to the standby in the following conditions:
    • The primary DB instance fails
    • An Availability Zone outage
    • The DB instance server type is changed
    • The operating system of the DB instance is undergoing software patching.
    • A manual failover of the DB instance was initiated using Reboot with failover.
  • The endpoint of the DB instance remains the same after a failover, the application can resume database operations without manual intervention.
  • If a failure occurs, your availability impact is limited to the time that the automatic failover takes to complete. This helps to achieve increased availability.
  • It reduces the impact of maintenance. RDS performs maintenance on the standby first, promotes the standby to primary master, and then performs maintenance on the old master which is now a standby replica.
  • To prevent any negative impact of the backup process on performance, Amazon RDS creates a backup from the standby replica.

Amazon RDS does not failover automatically in response to database operations such as long-running queries, deadlocks or database corruption errors. Also, the Multi-AZ deployments are limited to a single region only, cross-region Multi-AZ is not currently supported.

Can I use an RDS standby replica for read scaling?

The Multi-AZ deployments are not a read scaling solution, you cannot use a standby replica to serve read traffic. Multi-AZ maintains a standby replica for HA/failover. It is available for use only when RDS promotes the standby instance as the primary. To service read-only traffic, you should use a Read Replica instead.

What is Read Replica?

Read replicas allow you to have a read-only copy of your database.

When you create a Read Replica, you first specify an existing DB instance as the source. Then Amazon RDS takes a snapshot of the source instance and creates a read-only instance from the snapshot. You can use MySQL native asynchronous replication to keep Read Replica up-to-date with the changes. The source DB must have automatic backups enabled for setting up read replica.

Benefits of Read Replica

  • Read Replica helps in decreasing load on the primary DB by serving read-only traffic.
  • A Read Replica can be manually promoted as a standalone database instance.
  • You can create Read Replicas within AZ, Cross-AZ or Cross-Region.
  • You can have up to five Read Replicas per master, each with own DNS endpoint. Unlike a Multi-AZ standby replica, you can connect to each Read Replica and use them for read scaling.
  • You can have Read Replicas of Read Replicas.
  • Read Replicas can be Multi-AZ enabled.
  • You can use Read Replicas to take logical backups (mysqldump/mydumper) if you want to store the backups externally to RDS.
  • Read Replica helps to maintain a copy of databases in a different region for disaster recovery.

At AWS re:Invent 2017, AWS announced the preview for Amazon Aurora Multi-Master, this will allow users to create multiple Aurora writer nodes and helps in scaling reads/writes across multiple AZs. You can sign up for preview here.

Conclusion

While both (Multi-AZ and Read replica) maintain a copy of database but they are different in nature. Use Multi-AZ deployments for High Availability and Read Replica for read scalability. You can further set up a cross-region read replica for disaster recovery.

The post Amazon RDS Multi-AZ Deployments and Read Replicas appeared first on Percona Database Performance Blog.

Jul
04
2018
--

How to Set Up Replication Between AWS Aurora and an External MySQL Instance

Amazon RDS Aurora replication to external server

Amazon RDS Aurora replication to external serverAmazon RDS Aurora (MySQL) provides its own low latency replication. Nevertheless, there are cases where it can be beneficial to set up replication from Aurora to an external MySQL server, as Amazon RDS Aurora is based on MySQL and supports native MySQL replication. Here are some examples of when replicating from Amazon RDS Aurora to an external MySQL server can make good sense:

  • Replicating to another cloud or datacenter (for added redundancy)
  • Need to use an independent reporting slave
  • Need to have an additional physical backup
  • Need to use another MySQL flavor or fork
  • Need to failover to another cloud and back

In this blog post I will share simple step by step instructions on how to do it.

Steps to setup MySQL replication from AWS RDS Aurora to MySQL server

  1. Enable binary logs in the option group in Aurora (Binlog format = mixed). This will require a restart.
  2. Create a snapshot and restore it (create a new instance from a snapshot). This is only needed to make a consistent copy with mysqldump. As Aurora does not allow “super” privileges, running
    mysqldump --master-data

      is not possible. The snapshot is the only way to get a consistent backup with the specific binary log position.

  3. Get the binary log information from the snapshot. In the console, look for the “Alarms and Recent Events” for the restored snapshot instance. We should see something like:
    Binlog position from crash recovery is mysql-bin-changelog.000708 31278857
  4. Install MySQL 5.6 (i.e. Percona Server 5.6) on a separate EC2 instance (for Aurora 5.6 – note that you should use MySQL 5.7 for Aurora 5.7). After MySQL is up and running, import the timezones:
    # mysql_tzinfo_to_sql /usr/share/zoneinfo/|mysql

    Sample config:

    [mysqld]
    log-bin=log-bin
    log-slave-updates
    binlog-format=MIXED
    server-id=1000
    relay-log=relay-bin
    innodb_log_file_size=1G
    innodb_buffer_pool_size=2G
    innodb_flush_method=O_DIRECT
    innodb_flush_log_at_trx_commit=0 # as this is replication slave
  5. From now on we will make all backups from the restored snapshot. First get all users and import those to the new instance:
    pt-show-grants -h myhost...amazonaws.com -u percona > grants.sql

    # check that grants are valid and upload to MySQL

    mysql -f < grants.sql

    Make a backup of all schemas except for the “mysql” system tables as Aurora using different format of those (make sure we connect to the snapshot):

    host="my-snapshot...amazonaws.com"
    mysqldump --single-transaction -h $host -u percona
    --triggers --routines
    --databases `mysql -u percona -h $host -NBe
    "select group_concat(schema_name separator ' ') from information_schema.schemata where schema_name not in ('mysql', 'information_schema', 'performance_schema')"` > all.sql
  6. Restore to the local database:
    mysql -h localhost < all.sql
  7. Restore users again (some users may fail to create where there are missing databases):
    mysql -f < grants.sql
  8. Download the RDS/Aurora SSL certificate:
    # cd /etc/ssl
    # wget 'https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem'
    # chown mysql.mysql rds-combined-ca-bundle.pem
  9. Configure MySQL replication. Take the values for the binary log name and position from #3 above. Please note: now we connect to the actual instance, not a snapshot:
    # mysql -h localhost
    ...
    mysql> CHANGE MASTER TO
    MASTER_HOST='dev01-aws-1...',
    MASTER_USER='awsreplication',
    MASTER_PASSWORD='<pass>',
    MASTER_LOG_FILE = 'mysql-bin-changelog.000708',
    MASTER_LOG_POS = 31278857,
    MASTER_SSL_CA = '/etc/ssl/rds-combined-ca-bundle.pem',
    MASTER_SSL_CAPATH = '',
    MASTER_SSL_VERIFY_SERVER_CERT=1;
    mysql> start slave;
  10. Verify that the slave is working. Optionally add the SQL_Delay option to the CHANGE MASTER TO (or anytime) and specify the slave delay in seconds.

I hope those steps will be helpful for setting up an external MySQL replica.

The post How to Set Up Replication Between AWS Aurora and an External MySQL Instance appeared first on Percona Database Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com