Setting up Streaming Replication in PostgreSQL

postgres streaming replication

Configuring replication between two databases is considered to be a best strategy towards achieving high availability during disasters and provides fault tolerance against unexpected failures. PostgreSQL satisfies this requirement through streaming replication. We shall talk about another option called logical replication and logical decoding in our future blog post.

Streaming replication works on log shipping. Every transaction in postgres is written to a transaction log called WAL (write-ahead log) to achieve durability. A slave uses these WAL segments to continuously replicate changes from its master.

There exists three mandatory processes –

wal sender


wal receiver



 process, these play a major role in achieving streaming replication in postgres.


wal sender

 process runs on a master, whereas the

wal receiver



 processes runs on its slave. When you start the replication, a

wal receiver

 process sends the LSN (Log Sequence Number) up until when the WAL data has been replayed on a slave, to the master. And then the

wal sender

 process on master sends the WAL data until the latest LSN starting from the LSN sent by the

wal receiver

, to the slave.

Wal receiver

 writes the WAL data sent by

wal sender

 to WAL segments. It is the


 process on slave that replays the data written to WAL segment. And then the streaming replication begins.

Note: Log Sequence Number, or LSN, is a pointer to a location in the WAL.

Steps to setup streaming replication between a master and one slave

Step 1:

Create the user in master using whichever slave should connect for streaming the WALs. This user must have REPLICATION ROLE.

CREATE USER replicator

Step 2:

The following parameters on the master are considered as mandatory when setting up streaming replication.

  • archive_mode : Must be set to ON to enable archiving of WALs.
  • wal_level : Must be at least set to hot_standby  until version 9.5 or replica  in the later versions.
  • max_wal_senders : Must be set to 3 if you are starting with one slave. For every slave, you may add 2 wal senders.
  • wal_keep_segments : Set the WAL retention in pg_xlog (until PostgreSQL 9.x) and pg_wal (from PostgreSQL 10). Every WAL requires 16MB of space unless you have explicitly modified the WAL segment size. You may start with 100 or more depending on the space and the amount of WAL that could be generated during a backup.
  • archive_command : This parameter takes a shell command or external programs. It can be a simple copy command to copy the WAL segments to another location or a script that has the logic to archive the WALs to S3 or a remote backup server.
  • listen_addresses : Set it to * or the range of IP Addresses that need to be whitelisted to connect to your master PostgreSQL server. Your slave IP should be whitelisted too, else, the slave cannot connect to the master to replicate/replay WALs.
  • hot_standby : Must be set to ON on standby/replica and has no effect on the master. However, when you setup your replication, parameters set on the master are automatically copied. This parameter is important to enable READS on slave. Otherwise, you cannot run your SELECT queries against slave.

The above parameters can be set on the master using these commands followed by a restart:

ALTER SYSTEM SET wal_level TO 'hot_standby';
ALTER SYSTEM SET archive_mode TO 'ON';
ALTER SYSTEM SET max_wal_senders TO '5';
ALTER SYSTEM SET wal_keep_segments TO '10';
ALTER SYSTEM SET listen_addresses TO '*';
ALTER SYSTEM SET hot_standby TO 'ON';
ALTER SYSTEM SET archive_command TO 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f';

$ pg_ctl -D $PGDATA restart -mf

Step 3:

Add an entry to


 of the master to allow replication connections from the slave. The default location of


 is the data directory. However, you may modify the location of this file in the file 


. In Ubuntu/Debian, pg_hba.conf may be located in the same directory as the postgresql.conf file by default. You can get the location of postgresql.conf in Ubuntu/Debian by calling an OS command =>



host replication replicator md5

The IP address mentioned in this line must match the IP address of your slave server. Please change the IP accordingly.

In order to get the changes into effect, issue a SIGHUP:

$ pg_ctl -D $PGDATA reload
$ psql -U postgres -p 5432 -c "select pg_reload_conf()"

Step 4:


 helps us to stream the data through the 

wal sender

 process from the master to a slave to set up replication. You can also take a tar format backup from master and copy that to the slave server. You can read more about tar format pg_basebackup here

The following step can be used to stream data directory from master to slave. This step can be performed on a slave.

$ pg_basebackup -h -U replicator -p 5432 -D $PGDATA -P -Xs -R

Please replace the IP address with your master’s IP address.

In the above command, you see an optional argument -R. When you pass -R, it automatically creates a


  file that contains the role of the DB instance and the details of its master. It is mandatory to create the recovery.conf file on the slave in order to set up a streaming replication. If you are not using the backup type mentioned above, and choose to take a tar format backup on master that can be copied to slave, you must create this recovery.conf file manually. Here are the contents of the recovery.conf file:

$ cat $PGDATA/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host= port=5432 user=replicator password=replicator'
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'

In the above file, the role of the server is defined by standby_mode.


  must be set to ON for slaves in postgres.
And to stream WAL data, details of the master server are configured using the parameter



The two parameters




 are automatically created when you use the optional argument -R while taking a pg_basebackup. This recovery.conf file must exist in the data directory($PGDATA) of Slave.

Step 5:

Start your slave once the backup and restore are completed.

If you have configured a backup (remotely) using the streaming method mentioned in Step 4, it just copies all the files and directories to the data directory of the slave. Which means it is both a back up of the master data directory and also provides for restore in a single step.

If you have taken a tar back up from the master and shipped it to the slave, you must unzip/untar the back up to the slave data directory, followed by creating a recovery.conf as mentioned in the previous step. Once done, you may proceed to start your PostgreSQL instance on the slave using the following command.

$ pg_ctl -D $PGDATA start

Step 6:

In a production environment, it is always advisable to have the parameter


set appropriately. This parameter takes a shell command (or a script) that can be used to fetch the WAL needed by a slave, if the WAL is not available on the master.

For example:

If a network issue has caused a slave to lag behind the master for a substantial time, it is less likely to have those WALs required by the slave available on the master’s




 location. Hence, it is sensible to archive the WALs to a safe location, and to have the commands that are needed to restore the WAL set to restore_command parameter in the recovery.conf file of your slave. To achieve that, you have to add a line similar to the next example to your recovery.conf file in slave. You may substitute the cp command with a shell command/script or a copy command that helps the slave get the appropriate WALs from the archive location.

restore_command = 'cp /mnt/server/archivedir/%f "%p"'

Setting the above parameter requires a restart and cannot be done online.

Final step: validate that replication is setup

As discussed earlier, a

wal sender

  and a

wal receiver

  process are started on the master and the slave after setting up replication. Check for these processes on both master and slave using the following commands.

On Master
$ ps -eaf | grep sender
On Slave
$ ps -eaf | grep receiver
$ ps -eaf | grep startup

You must see those all three processes running on master and slave as you see in the following example log.

On Master
$ ps -eaf | grep sender
postgres  1287  1268  0 10:40 ?        00:00:00 postgres: wal sender process replicator streaming 0/50000D68
On Slave
$ ps -eaf | egrep "receiver|startup"
postgres  1251  1249  0 10:40 ?        00:00:00 postgres: startup process   recovering 000000010000000000000050
postgres  1255  1249  0 10:40 ?        00:00:04 postgres: wal receiver process   streaming 0/50000D68

You can see more details by querying the master’s pg_stat_replication view.

$ psql
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 1287
usesysid         | 24615
usename          | replicator
application_name | walreceiver
client_addr      |
client_hostname  |
client_port      | 36924
backend_start    | 2018-09-07 10:40:48.074496-04
backend_xmin     |
state            | streaming
sent_lsn         | 0/50000D68
write_lsn        | 0/50000D68
flush_lsn        | 0/50000D68
replay_lsn       | 0/50000D68
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

Reference :

If you found this post interesting…

Did you know that Percona now provides PostgreSQL support services? If you’d like to read more about this, here’s some more information. We’re here to help.

The post Setting up Streaming Replication in PostgreSQL appeared first on Percona Database Performance Blog.


Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon

postgres mysql replication using pg_chameleon

postgres mysql replication using pg_chameleonReplication is one of the well-known features that allows us to build an identical copy of a database. It is supported in almost every RDBMS. The advantages of replication may be huge, especially HA (High Availability) and load balancing. But what if we need to build replication between 2 heterogeneous databases like MySQL and PostgreSQL? Can we continuously replicate changes from a MySQL database to a PostgreSQL database? The answer to this question is pg_chameleon.

For replicating continuous changes, pg_chameleon uses the mysql-replication library to pull the row images from MySQL, which are transformed into a jsonb object. A pl/pgsql function in postgres decodes the jsonb and replays the changes into the postgres database. In order to setup this type of replication, your mysql binlog_format must be “ROW”.

A few points you should know before setting up this tool :

  1. Tables that need to be replicated must have a primary key.
  2. Works for PostgreSQL versions > 9.5 and MySQL > 5.5
  3. binlog_format must be ROW in order to setup this replication.
  4. Python version must be > 3.3

When you initialize the replication, pg_chameleon pulls the data from MySQL using the CSV format in slices, to prevent memory overload. This data is flushed to postgres using the COPY command. If COPY fails, it tries INSERT, which may be slow. If INSERT fails, then the row is discarded.

To replicate changes from mysql, pg_chameleon mimics the behavior of a mysql slave. It creates the schema in postgres, performs the initial data load, connects to MySQL replication protocol, stores the row images into a table in postgres. Now, the respective functions in postgres decode those rows and apply the changes. This is similar to storing relay logs in postgres tables and applying them to a postgres schema. You do not have to create a postgres schema using any DDLs. This tool automatically does that for the tables configured for replication. If you need to specifically convert any types, you can specify this in the configuration file.

The following is just an exercise that you can experiment with and implement if it completely satisfies your requirement. We performed these tests on CentOS Linux release 7.4.

Prepare the environment

Set up Percona Server for MySQL

InstallMySQL 5.7 and add appropriate parameters for replication.

In this exercise, I have installed Percona Server for MySQL 5.7 using YUM repo.

yum install
yum install Percona-Server-server-57
echo "mysql ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
usermod -s /bin/bash mysql
sudo su - mysql

pg_chameleon requires the following the parameters to be set in your my.cnf file (parameter file of your MySQL server). You may add the following parameters to /etc/my.cnf

binlog_format= ROW
log-bin = mysql-bin
server-id = 1

Now start your MySQL server after adding the above parameters to your my.cnf file.

$ service mysql start

Fetch the temporary root password from mysqld.log, and reset the root password using mysqladmin

$ grep "temporary" /var/log/mysqld.log
$ mysqladmin -u root -p password 'Secret123!'

Now, connect to your MySQL instance and create sample schema/tables. I have also created an emp table for validation.

$ wget
$ tar -xzf sakila-db.tar.gz
$ mysql -uroot -pSecret123! < sakila-db/sakila-schema.sql
$ mysql -uroot -pSecret123! < sakila-db/sakila-data.sql
$ mysql -uroot -pSecret123! sakila -e "create table emp (id int PRIMARY KEY, first_name varchar(20), last_name varchar(20))"

Create a user for configuring replication using pg_chameleon and give appropriate privileges to the user using the following steps.

$ mysql -uroot -p
create user 'usr_replica'@'%' identified by 'Secret123!';
GRANT ALL ON sakila.* TO 'usr_replica'@'%';

While creating the user in your mysql server (‘usr_replica’@’%’), you may wish to replace % with the appropriate IP or hostname of the server on which pg_chameleon is running.

Set up PostgreSQL

Install PostgreSQL and start the database instance.

You may use the following steps to install PostgreSQL 10.x

yum install
yum install postgresql10*
su - postgres
$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start

As seen in the following logs, create a user in PostgreSQL using which pg_chameleon can write changed data to PostgreSQL. Also create the target database.

postgres=# CREATE USER usr_replica WITH ENCRYPTED PASSWORD 'secret';
postgres=# CREATE DATABASE db_replica WITH OWNER usr_replica;

Steps to install and setup replication using pg_chameleon

Step 1: In this exercise, I installed Python 3.6 and pg_chameleon 2.0.8 using the following steps. You may skip the python install steps if you already have the desired python release. We can create a virtual environment if the OS does not include Python 3.x by default.

yum install gcc openssl-devel bzip2-devel wget
cd /usr/src
tar xzf Python-3.6.6.tgz
cd Python-3.6.6
./configure --enable-optimizations
make altinstall
python3.6 -m venv venv
source venv/bin/activate
pip install pip --upgrade
pip install pg_chameleon

Step 2: This tool requires a configuration file to store the source/target server details, and a directory to store the logs. Use the following command to let pg_chameleon create the configuration file template and the respective directories for you.

$ chameleon set_configuration_files

The above command would produce the following output, which shows that it created some directories and a file in the location where you ran the command.

creating directory /var/lib/pgsql/.pg_chameleon
creating directory /var/lib/pgsql/.pg_chameleon/configuration/
creating directory /var/lib/pgsql/.pg_chameleon/logs/
creating directory /var/lib/pgsql/.pg_chameleon/pid/
copying configuration example in /var/lib/pgsql/.pg_chameleon/configuration//config-example.yml

Copy the sample configuration file to another file, lets say, default.yml

$ cd .pg_chameleon/configuration/
$ cp config-example.yml default.yml

Here is how my default.yml file looks after adding all the required parameters. In this file, we can optionally specify the data type conversions, tables to skipped from replication and the DML events those need to skipped for selected list of tables.

#global settings
pid_dir: '~/.pg_chameleon/pid/'
log_dir: '~/.pg_chameleon/logs/'
log_dest: file
log_level: info
log_days_keep: 10
rollbar_key: ''
rollbar_env: ''
# type_override allows the user to override the default type conversion into a different one.
    override_to: boolean
      - "*"
#postgres  destination connection
  host: "localhost"
  port: "5432"
  user: "usr_replica"
  password: "secret"
  database: "db_replica"
  charset: "utf8"
      host: "localhost"
      port: "3306"
      user: "usr_replica"
      password: "Secret123!"
      charset: 'utf8'
      connect_timeout: 10
      sakila: sch_sakila
#      -
#      -
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "disabled"
    gtid_enable: No
    type: mysql
#        - #skips inserts on the table
#        - delphis_mediterranea #skips deletes on schema delphis_mediterranea

Step 3: Initialize the replica using this command:

$ chameleon create_replica_schema --debug

The above command creates a schema and nine tables in the PostgreSQL database that you specified in the .pg_chameleon/configuration/default.yml file. These tables are needed to manage replication from source to destination. The same can be observed in the following log.

db_replica=# \dn
List of schemas
Name | Owner
public | postgres
sch_chameleon | target_user
(2 rows)
db_replica=# \dt sch_chameleon.t_*
List of relations
Schema | Name | Type | Owner
sch_chameleon | t_batch_events | table | target_user
sch_chameleon | t_discarded_rows | table | target_user
sch_chameleon | t_error_log | table | target_user
sch_chameleon | t_last_received | table | target_user
sch_chameleon | t_last_replayed | table | target_user
sch_chameleon | t_log_replica | table | target_user
sch_chameleon | t_replica_batch | table | target_user
sch_chameleon | t_replica_tables | table | target_user
sch_chameleon | t_sources | table | target_user
(9 rows)

Step 4: Add the source details to pg_chameleon using the following command. Provide the name of the source as specified in the configuration file. In this example, the source name is mysql and the target is postgres database defined under pg_conn.

$ chameleon add_source --config default --source mysql --debug

Once you run the above command, you should see that the source details are added to the t_sources table.

db_replica=# select * from sch_chameleon.t_sources;
-[ RECORD 1 ]-------+----------------------------------------------
i_id_source | 1
t_source | mysql
jsb_schema_mappings | {"sakila": "sch_sakila"}
enm_status | ready
t_binlog_name |
i_binlog_position |
b_consistent | t
b_paused | f
b_maintenance | f
ts_last_maintenance |
enm_source_type | mysql
v_log_table | {t_log_replica_mysql_1,t_log_replica_mysql_2}
$ chameleon show_status --config default
Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------
1 mysql mysql ready Yes N/A N/A

Step 5: Initialize the replica/slave using the following command. Specify the source from which you are replicating the changes to the PostgreSQL database.

$ chameleon init_replica --config default --source mysql --debug

Initialization involves the following tasks on the MySQL server (source).

1. Flush the tables with read lock
2. Get the master’s coordinates
3. Copy the data
4. Release the locks

The above command creates the target schema in your postgres database automatically.
In the default.yml file, we mentioned the following schema_mappings.

sakila: sch_sakila

So, now it created the new schema scott in the target database db_replica.

db_replica=# \dn
List of schemas
Name | Owner
public | postgres
sch_chameleon | usr_replica
sch_sakila | usr_replica
(3 rows)

Step 6: Now, start replication using the following command.

$ chameleon start_replica --config default --source mysql

Step 7: Check replication status and any errors using the following commands.

$ chameleon show_status --config default
$ chameleon show_errors

This is how the status looks:

$ chameleon show_status --source mysql
Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------
1 mysql mysql running No N/A N/A
== Schema mappings ==
Origin schema Destination schema
--------------- --------------------
sakila sch_sakila
== Replica status ==
--------------------- ---
Tables not replicated 0
Tables replicated 17
All tables 17
Last maintenance N/A
Next maintenance N/A
Replayed rows
Replayed DDL
Skipped rows

Now, you should see that the changes are continuously getting replicated from MySQL to PostgreSQL.

Step 8:  To validate, you may insert a record into the table in MySQL that we created for the purpose of validation and check that it is replicated to postgres.

$ mysql -u root -pSecret123! -e "INSERT INTO sakila.emp VALUES (1,'avinash','vallarapu')"
mysql: [Warning] Using a password on the command line interface can be insecure.
$ psql -d db_replica -c "select * from sch_sakila.emp"
 id | first_name | last_name
  1 | avinash    | vallarapu
(1 row)

In the above log, we see that the record that was inserted to the MySQL table was replicated to the PostgreSQL table.

You may also add multiple sources for replication to PostgreSQL (target).

Reference :

Please refer to the above documentation to find out about the many more options that are available with pg_chameleon

The post Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon appeared first on Percona Database Performance Blog.


Replicating from MySQL 8.0 to MySQL 5.7

replicate from MySQL 8 to MySQL 5.7

In this blog post, we’ll discuss how to set a replication from MySQL 8.0 to MySQL 5.7. There are some situations that having this configuration might help. For example, in the case of a MySQL upgrade, it can be useful to have a master that is using a newer version of MySQL to an older version slave as a rollback plan. Another example is in the case of upgrading a master x master replication topology.

Officially, replication is only supported between consecutive major MySQL versions, and only from a lower version master to a higher version slave. Here is an example of a supported scenario:

5.7 master –> 8.0 slave

while the opposite is not supported:

8.0 master –> 5.7 slave

In this blog post, I’ll walk through how to overcome the initial problems to set a replication working in this scenario. I’ll also show some errors that can halt the replication if a new feature from MySQL 8 is used.

Here is the initial set up that will be used to build the topology:

slave > select @@version;
| @@version     |
| 5.7.17-log |
1 row in set (0.00 sec)
master > select @@version;
| @@version |
| 8.0.12    |
1 row in set (0.00 sec)

First, before executing the CHANGE MASTER command, you need to modify the collation on the master server. Otherwise the replication will run into this error:

slave > show slave status\G
                   Last_Errno: 22
                   Last_Error: Error 'Character set '#255' is not a compiled character set and is not specified in the '/opt/percona_server/5.7.17/share/charsets/Index.xml' file' on query. Default database: 'mysql8_1'. Query: 'create database mysql8_1'

This is because the default character_set and the collation has changed on MySQL 8. According to the documentation:

The default value of the character_set_server and character_set_database system variables has changed from latin1 to utf8mb4.

The default value of the collation_server and collation_database system variables has changed from latin1_swedish_ci to utf8mb4_0900_ai_ci.

Let’s change the collation and the character set to utf8 on MySQL 8 (it is possible to use any option that exists in both versions):

# master my.cnf

You need to restart MySQL 8 to apply the changes. Next, after the restart, you have to create a replication user using mysql_native_password.This is because MySQL 8 changed the default Authentication Plugin to caching_sha2_password which is not supported by MySQL 5.7. If you try to execute the CHANGE MASTER command with a user using caching_sha2_password plugin, you will receive the error message below:

Last_IO_Errno: 2059
Last_IO_Error: error connecting to master 'root@' - retry-time: 60 retries: 1

To create a user using mysql_native_password :

master> CREATE USER 'replica_user'@'%' IDENTIFIED WITH mysql_native_password BY 'repli$cat';
master> GRANT REPLICATION SLAVE ON *.* TO 'replica_user'@'%';

Finally, we can proceed as usual to build the replication:

master > show master status\G
*************************** 1. row ***************************
File: mysql-bin.000007
Position: 155
1 row in set (0.00 sec)
slave > CHANGE MASTER TO MASTER_HOST='', MASTER_USER='replica_user', MASTER_PASSWORD='repli$cat',MASTER_PORT=19025, MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=155; start slave;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
# This procedure works with GTIDs too
slave > CHANGE MASTER TO MASTER_HOST='', MASTER_USER='replica_user', MASTER_PASSWORD='repli$cat',MASTER_PORT=19025,MASTER_AUTO_POSITION = 1 ; start slave;

Checking the replication status:

master > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_User: replica_user
Master_Port: 19025
Connect_Retry: 60
Master_Log_File: mysql-bin.000007
Read_Master_Log_Pos: 155
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 321
Relay_Master_Log_File: mysql-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Skip_Counter: 0
Exec_Master_Log_Pos: 155
Relay_Log_Space: 524
Until_Condition: None
Until_Log_Pos: 0
Master_SSL_Allowed: No
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_SQL_Errno: 0
Master_Server_Id: 100
Master_UUID: 00019025-1111-1111-1111-111111111111
Master_Info_File: /home/vinicius.grippa/sandboxes/rsandbox_5_7_17/master/data/
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Auto_Position: 0
1 row in set (0.01 sec)

Executing a quick test to check if the replication is working:

master > create database vinnie;
Query OK, 1 row affected (0.06 sec)

slave > show databases like 'vinnie';
| Database (vinnie) |
| vinnie |
1 row in set (0.00 sec)


Any tentative attempts to use a new feature from MySQL 8 like roles, invisible indexes or caching_sha2_password will make the replication stop with an error:

master > alter user replica_user identified with caching_sha2_password by 'sekret';
Query OK, 0 rows affected (0.01 sec)

slave > show slave status\G
               Last_SQL_Errno: 1396
               Last_SQL_Error: Error 'Operation ALTER USER failed for 'replica_user'@'%'' on query. Default database: ''. Query: 'ALTER USER 'replica_user'@'%' IDENTIFIED WITH 'caching_sha2_password' AS '$A$005$H	MEDi\"gQ


Replicating from MySQL 8 to MySQL 5.7 is possible. In some scenarios (especially upgrades), this might be helpful, but it is not advisable to have a heterogeneous topology because it will be prone to errors and incompatibilities under some cases.

You might also like:


The post Replicating from MySQL 8.0 to MySQL 5.7 appeared first on Percona Database Performance Blog.


Amazon RDS Multi-AZ Deployments and Read Replicas

RDS Multi-AZ

Amazon RDS is a managed relational database service that makes it easier to set up, operate, and scale a relational database in the cloud. One of the common questions that we get is “What is Multi-AZ and how it’s different from Read Replica, do I need both?”.  I have tried to answer this question in this blog post and it depends on your application needs. Are you looking for High Availability (HA), read scalability … or both?

Before we go to into detail, let me explain two common terms used with Amazon AWS.

Region – an AWS region is a separate geographical area like US East (N. Virginia), Asia Pacific (Mumbai), EU (London) etc. Each AWS Region has multiple, isolated locations known as Availability Zones.

Availability Zone (AZ) – AZ is simply one or more data centers, each with redundant power, networking and connectivity, housed in separate facilities. Data centers are geographically isolated within the same region.

What is Multi-AZ?

Amazon RDS provides high availability and failover support for DB instances using Multi-AZ deployments.

In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica of the master DB in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to the standby replica to provide data redundancy, failover support and to minimize latency during system backups. In the event of planned database maintenance, DB instance failure, or an AZ failure of your primary DB instance, Amazon RDS automatically performs a failover to the standby so that database operations can resume quickly without administrative intervention.

You can check in the AWS management console if a database instance is configured as Multi-AZ. Select the RDS service, click on the DB instance and review the details section.

AWS management console showing that instance is Multi-AZ

This screenshot from AWS management console (above) shows that the database is hosted as Multi-AZ deployment and the standby replica is deployed in us-east-1a AZ.

Benefits of Multi-AZ deployment:

  • Replication to a standby replica is synchronous which is highly durable.
  • When a problem is detected on the primary instance, it will automatically failover to the standby in the following conditions:
    • The primary DB instance fails
    • An Availability Zone outage
    • The DB instance server type is changed
    • The operating system of the DB instance is undergoing software patching.
    • A manual failover of the DB instance was initiated using Reboot with failover.
  • The endpoint of the DB instance remains the same after a failover, the application can resume database operations without manual intervention.
  • If a failure occurs, your availability impact is limited to the time that the automatic failover takes to complete. This helps to achieve increased availability.
  • It reduces the impact of maintenance. RDS performs maintenance on the standby first, promotes the standby to primary master, and then performs maintenance on the old master which is now a standby replica.
  • To prevent any negative impact of the backup process on performance, Amazon RDS creates a backup from the standby replica.

Amazon RDS does not failover automatically in response to database operations such as long-running queries, deadlocks or database corruption errors. Also, the Multi-AZ deployments are limited to a single region only, cross-region Multi-AZ is not currently supported.

Can I use an RDS standby replica for read scaling?

The Multi-AZ deployments are not a read scaling solution, you cannot use a standby replica to serve read traffic. Multi-AZ maintains a standby replica for HA/failover. It is available for use only when RDS promotes the standby instance as the primary. To service read-only traffic, you should use a Read Replica instead.

What is Read Replica?

Read replicas allow you to have a read-only copy of your database.

When you create a Read Replica, you first specify an existing DB instance as the source. Then Amazon RDS takes a snapshot of the source instance and creates a read-only instance from the snapshot. You can use MySQL native asynchronous replication to keep Read Replica up-to-date with the changes. The source DB must have automatic backups enabled for setting up read replica.

Benefits of Read Replica

  • Read Replica helps in decreasing load on the primary DB by serving read-only traffic.
  • A Read Replica can be manually promoted as a standalone database instance.
  • You can create Read Replicas within AZ, Cross-AZ or Cross-Region.
  • You can have up to five Read Replicas per master, each with own DNS endpoint. Unlike a Multi-AZ standby replica, you can connect to each Read Replica and use them for read scaling.
  • You can have Read Replicas of Read Replicas.
  • Read Replicas can be Multi-AZ enabled.
  • You can use Read Replicas to take logical backups (mysqldump/mydumper) if you want to store the backups externally to RDS.
  • Read Replica helps to maintain a copy of databases in a different region for disaster recovery.

At AWS re:Invent 2017, AWS announced the preview for Amazon Aurora Multi-Master, this will allow users to create multiple Aurora writer nodes and helps in scaling reads/writes across multiple AZs. You can sign up for preview here.


While both (Multi-AZ and Read replica) maintain a copy of database but they are different in nature. Use Multi-AZ deployments for High Availability and Read Replica for read scalability. You can further set up a cross-region read replica for disaster recovery.

The post Amazon RDS Multi-AZ Deployments and Read Replicas appeared first on Percona Database Performance Blog.


How to Set Up Replication Between AWS Aurora and an External MySQL Instance

Amazon RDS Aurora replication to external server

Amazon RDS Aurora replication to external serverAmazon RDS Aurora (MySQL) provides its own low latency replication. Nevertheless, there are cases where it can be beneficial to set up replication from Aurora to an external MySQL server, as Amazon RDS Aurora is based on MySQL and supports native MySQL replication. Here are some examples of when replicating from Amazon RDS Aurora to an external MySQL server can make good sense:

  • Replicating to another cloud or datacenter (for added redundancy)
  • Need to use an independent reporting slave
  • Need to have an additional physical backup
  • Need to use another MySQL flavor or fork
  • Need to failover to another cloud and back

In this blog post I will share simple step by step instructions on how to do it.

Steps to setup MySQL replication from AWS RDS Aurora to MySQL server

  1. Enable binary logs in the option group in Aurora (Binlog format = mixed). This will require a restart.
  2. Create a snapshot and restore it (create a new instance from a snapshot). This is only needed to make a consistent copy with mysqldump. As Aurora does not allow “super” privileges, running
    mysqldump --master-data

      is not possible. The snapshot is the only way to get a consistent backup with the specific binary log position.

  3. Get the binary log information from the snapshot. In the console, look for the “Alarms and Recent Events” for the restored snapshot instance. We should see something like:
    Binlog position from crash recovery is mysql-bin-changelog.000708 31278857
  4. Install MySQL 5.6 (i.e. Percona Server 5.6) on a separate EC2 instance (for Aurora 5.6 – note that you should use MySQL 5.7 for Aurora 5.7). After MySQL is up and running, import the timezones:
    # mysql_tzinfo_to_sql /usr/share/zoneinfo/|mysql

    Sample config:

    innodb_flush_log_at_trx_commit=0 # as this is replication slave
  5. From now on we will make all backups from the restored snapshot. First get all users and import those to the new instance:
    pt-show-grants -h -u percona > grants.sql

    # check that grants are valid and upload to MySQL

    mysql -f < grants.sql

    Make a backup of all schemas except for the “mysql” system tables as Aurora using different format of those (make sure we connect to the snapshot):

    mysqldump --single-transaction -h $host -u percona
    --triggers --routines
    --databases `mysql -u percona -h $host -NBe
    "select group_concat(schema_name separator ' ') from information_schema.schemata where schema_name not in ('mysql', 'information_schema', 'performance_schema')"` > all.sql
  6. Restore to the local database:
    mysql -h localhost < all.sql
  7. Restore users again (some users may fail to create where there are missing databases):
    mysql -f < grants.sql
  8. Download the RDS/Aurora SSL certificate:
    # cd /etc/ssl
    # wget ''
    # chown mysql.mysql rds-combined-ca-bundle.pem
  9. Configure MySQL replication. Take the values for the binary log name and position from #3 above. Please note: now we connect to the actual instance, not a snapshot:
    # mysql -h localhost
    MASTER_LOG_FILE = 'mysql-bin-changelog.000708',
    MASTER_LOG_POS = 31278857,
    MASTER_SSL_CA = '/etc/ssl/rds-combined-ca-bundle.pem',
    mysql> start slave;
  10. Verify that the slave is working. Optionally add the SQL_Delay option to the CHANGE MASTER TO (or anytime) and specify the slave delay in seconds.

I hope those steps will be helpful for setting up an external MySQL replica.

The post How to Set Up Replication Between AWS Aurora and an External MySQL Instance appeared first on Percona Database Performance Blog.


Fixing ER_MASTER_HAS_PURGED_REQUIRED_GTIDS when pointing a slave to a different master

gtid auto position

gtid auto positionGTID replication has made it convenient to setup and maintain MySQL replication. You need not worry about binary log file and position thanks to GTID and auto-positioning. However, things can go wrong when pointing a slave to a different master. Consider a situation where the new master has executed transactions that haven’t been executed on the old master. If the corresponding binary logs have been purged already, how do you point the slave to the new master?

The scenario

Based on technical requirements and architectural change, there is a need to point the slave to a different master by

  1. Pointing it to another node in a PXC cluster
  2. Pointing it to another master in master/master replication
  3. Pointing it to another slave of a master
  4. Pointing it to the slave of a slave of the master … and so on and so forth.

Theoretically, pointing to a new master with GTID replication is easy. All you have to do is run:


Alas, in some cases, replication breaks due to missing binary logs:

*************************** 1. row ***************************
Master_Host: pxc_57_5
Master_User: repl
Master_Port: 3306
Slave_IO_Running: No
Slave_SQL_Running: Yes
** redacted **
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
** redacted **
Master_Server_Id: 1
Master_UUID: 4998aaaa-6ed5-11e8-948c-0242ac120007
Master_Info_File: /var/lib/mysql/
** redacted **
Last_IO_Error_Timestamp: 180613 08:08:20
** redacted **
Executed_Gtid_Set: 1904cf31-912b-ee17-4906-7dae335b4bfc:1-3
Auto_Position: 1

The strange issue here is that if you point the slave back to the old master, replication works just fine. The error says that there are missing binary logs in the new master that the slave needs. If there’s no problem with replication performance and the slave can easily catch up, then it looks like there are transactions executed in the new master that have not been executed in the old master but are recorded in the missing binary logs. The binary logs are most likely lost due to manually purging with PURGE BINARY LOGS or automatic purging if expire_logs_days is set.

At this point, it would be prudent to check and sync old master and new master with tools such as pt-table-checksum and pt-table-sync. However, if a consistency check has been performed and no differences have been found, or there’s confidence that the new master is a good copy—such as another node in the PXC cluster—you can follow the steps below to resolve the problem.


To solve the problem, the slave needs to execute the missing transactions. But since these transactions have been purged, the steps below provide the workaround.

Step 1 Find the GTID sequences that are purged from the new master that is needed by the slave

To identify which GTID sequences are missing, run SHOW GLOBAL VARIABLES LIKE 'gtid_purged'; and SHOW MASTER STATUS; on the new master and SHOW GLOBAL VARIABLES LIKE 'gtid_executed'; on the slave:

New Master:

mysql> SHOW GLOBAL VARIABLES LIKE 'gtid_purged';
| Variable_name | Value |
| gtid_purged | 1904cf31-912b-ee17-4906-7dae335b4bfc:1-2,
4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 |
1 row in set (0.00 sec)
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
| mysql-bin.000004 | 741 | | | 1904cf31-912b-ee17-4906-7dae335b4bfc:1-6,
4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 |
1 row in set (0.00 sec)


mysql> SHOW GLOBAL VARIABLES LIKE 'gtid_executed';
| Variable_name | Value |
| gtid_executed | 1904cf31-912b-ee17-4906-7dae335b4bfc:1-3 |
1 row in set (0.00 sec)

Take note that 1904cf31-912b-ee17-4906-7dae335b4bfc and 1904cf31-912b-ee17-4906-7dae335b4bfc are UUIDs and refer to the MySQL instance where the transaction originated from.

Based on the output:

  • The slave has executed 1904cf31-912b-ee17-4906-7dae335b4bfc:1-3
  • The new master has executed 1904cf31-912b-ee17-4906-7dae335b4bfc:1-6 and 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11
  • The new master has purged 1904cf31-912b-ee17-4906-7dae335b4bfc:1-2 and 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11

This means that the slave has no issue with 1904cf31-912b-ee17-4906-7dae335b4bfc it requires sequences 4-6 and sequences 3-6 are still available in the master. However, the slave cannot fetch sequences 1-11 from 4998aaaa-6ed5-11e8-948c-0242ac120007 because these has been purged from the master.

To summarize, the missing GTID sequences are 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11.

Step 2: Identify where the purged GTID sequences came from

From the SHOW SLAVE STATUS output in the introduction section, it says that the Master_UUID is 4998aaaa-6ed5-11e8-948c-0242ac120007, which means the new master is the source of the missing transactions. You can also verify the new Master’s UUID by running SHOW GLOBAL VARIABLES LIKE 'server_uuid';

mysql> SHOW GLOBAL VARIABLES LIKE 'server_uuid';
| Variable_name | Value |
| server_uuid | 4998aaaa-6ed5-11e8-948c-0242ac120007 |
1 row in set (0.00 sec)

If the new master’s UUID does not match the missing GTID, it is most likely that this missing sequence came from its old master, another master higher up the chain or from another PXC node. If that other master still exists, you can run the same query on those masters to check.

The missing sequences are small such as 1-11. Typically, commands executed locally are due to performing maintenance on this server directly. For example, creating users, fixing privileges or updating passwords. However, you have no guarantee that this is the reason, since the binary logs have already been purged. If you still want to point the slave to the new master, proceed to step 3 or step 4.

Step 3. Injecting the missing transactions on the slave with empty transactions

The workaround is to pretend that those missing GTID sequences have been executed on the slave by injecting 11 empty transactions as instructed here by running:


It looks tedious, but a simple script can automate this:

while [ "$first_sequence_no" -le "$last_sequence_no" ]
echo "SET GTID_NEXT='$uuid:$first_sequence_no';"
first_sequence_no=`expr $first_sequence_no + 1`
bash 4998aaaa-6ed5-11e8-948c-0242ac120007 1 11
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:1';
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:2';
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:3';
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:4';
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:5';
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:6';
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:7';
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:8';
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:9';
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:10';
SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:11';

Before executing the generated output on the slave, stop replication first:

mysql> STOP SLAVE;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:1';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:2';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:3';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:4';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:5';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:6';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:7';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:8';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:9';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:10';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> SET GTID_NEXT='4998aaaa-6ed5-11e8-948c-0242ac120007:11';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

There’s also an even easier solution of injecting empty transactions by using mysqlslavetrx from MySQL utilities. By stopping the slave first and running
mysqlslavetrx --gtid-set=4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 --slaves=root:password@:3306 you will achieve the same result as above.

By running SHOW GLOBAL VARIABLES LIKE 'gtid_executed'; on the slave you can see that sequences 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 have been executed already:

mysql> SHOW GLOBAL VARIABLES LIKE 'gtid_executed';
| Variable_name | Value |
| gtid_executed | 1904cf31-912b-ee17-4906-7dae335b4bfc:1-3,
4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 |
1 row in set (0.01 sec)

Resume replication and check if replication is healthy by running START SLAVE; and SHOW SLAVE STATUS\G

Query OK, 0 rows affected (0.01 sec)
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: pxc_57_5
Master_User: repl
Master_Port: 3306
** redacted **
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
** redacted **
Seconds_Behind_Master: 0
** redacted **
Master_Server_Id: 1
Master_UUID: 4998aaaa-6ed5-11e8-948c-0242ac120007
** redacted **
Retrieved_Gtid_Set: 1904cf31-912b-ee17-4906-7dae335b4bfc:4-6
Executed_Gtid_Set: 1904cf31-912b-ee17-4906-7dae335b4bfc:1-6,
Auto_Position: 1
** redacted **
1 row in set (0.00 sec)

At this point, we have already solved the problem. However, there’s another way to restore the slave much faster but at the cost of erasing all the existing binary logs on the slave as mentioned in this article. If you want to do this, proceed to step 4.

Step 4. Add the missing sequences to GTID_EXECUTED by modifying GTID_PURGED.

If you followed the steps in Step 3, you do not need to perform Step 4!

To add the missing transactions, you’ll need to stop the slave, reset the master, place the original value of gtid_executed and the missing sequences in gtid_purged variable. A word of caution on using this method: this will purge the existing binary logs of the slave.

mysql> STOP SLAVE;
Query OK, 0 rows affected (0.02 sec)
Query OK, 0 rows affected (0.02 sec)
mysql> SET GLOBAL gtid_purged="1904cf31-912b-ee17-4906-7dae335b4bfc:1-3,4998aaaa-6ed5-11e8-948c-0242ac120007:1-11";
Query OK, 0 rows affected (0.02 sec)

Similar to Step 3, running SHOW GLOBAL VARIABLES LIKE 'gtid_executed'; on the slave shows that sequence 4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 has been executed already:

mysql> SHOW GLOBAL VARIABLES LIKE 'gtid_executed';
| Variable_name | Value |
| gtid_executed | 1904cf31-912b-ee17-4906-7dae335b4bfc:1-3,
4998aaaa-6ed5-11e8-948c-0242ac120007:1-11 |
1 row in set (0.01 sec)

Run START SLAVE; and SHOW SLAVE STATUS\G to resume replication and check if replication is healthy:

Query OK, 0 rows affected (0.01 sec)
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: pxc_57_5
Master_User: repl
Master_Port: 3306
** redacted **
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
** redacted **
Seconds_Behind_Master: 0
** redacted **
Master_Server_Id: 1
Master_UUID: 4998aaaa-6ed5-11e8-948c-0242ac120007
** redacted **
Retrieved_Gtid_Set: 1904cf31-912b-ee17-4906-7dae335b4bfc:4-6
Executed_Gtid_Set: 1904cf31-912b-ee17-4906-7dae335b4bfc:1-6,
Auto_Position: 1
** redacted **
1 row in set (0.00 sec)

Step 5. Done


In this article, I demonstrated how to point the slave to a new master even if it’s missing some binary logs that need to be executed. Although, it is possible to do so with the workarounds shared above, it is prudent to check the consistency of the old and new master first before switching the slave to the new master.

The post Fixing ER_MASTER_HAS_PURGED_REQUIRED_GTIDS when pointing a slave to a different master appeared first on Percona Database Performance Blog.


Webinar 6/28: Securing Database Servers From External Attacks

securing database servers

securing database serversPlease join Percona’s Chief Evangelist Colin Charles on Thursday, June 28th, 2018, as he presents Securing Database Servers From External attacks at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).


A critical piece of your infrastructure is the database tier, yet people don’t pay enough attention to it judging by how many are bitten via poorly chosen defaults, or just a lack understanding of running a secure database tier. In this talk, I’ll focus on MySQL/MariaDB, PostgreSQL, and MongoDB, and cover external authentication, auditing, encryption, SSL, firewalls, replication, and more gems from over a decade of consulting in this space from Percona’s 4,000+ customers.

Register Now


Colin Charles

Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team of MariaDB Server in 2009, and had worked at MySQL since 2005, and been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and projects. He’s well known within open source communities in APAC, and has spoken at many conferences. Experienced technologist, well known in the open source world for work that spans nearly two decades within the community. Pays attention to emerging technologies from an integration standpoint. Prolific speaker at many industry-wide conferences delivering talks and tutorials with ease. Interests: application development, systems administration, database development, migration, Web-based technologies. Considered expert in Linux and Mac OS X usage/administration/roll-out’s. Specialties: MariaDB, MySQL, Linux, Open Source, Community, speaking & writing to technical audiences as well as business stakeholders.

The post Webinar 6/28: Securing Database Servers From External Attacks appeared first on Percona Database Performance Blog.


Webinar Thurs 6/14: MongoDB Backup and Recovery Field Guide

mongodb backup and recovery field guide

mongodb backup and recovery field guidePlease join Percona’s Sr. Technical Operations Architect, Tim Vaillancourt as he presents MongoDB Backup and Recovery Field Guide on Thursday, June 14, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

This talk will cover backup and recovery solutions for MongoDB replica sets and clusters, focusing on online and low-impact solutions for production systems.

Register for the webinar

Tim Vaillancourt

Senior Technical Operations Architect

With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim is based in Amsterdam, NL and enjoys traveling, coding and music. Prior to Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS.

Prior to moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

The post Webinar Thurs 6/14: MongoDB Backup and Recovery Field Guide appeared first on Percona Database Performance Blog.


Causes and Workarounds for Slave Performance Too Slow with Row-Based Events

Slave Performance Too Slow

Slave Performance Too SlowRecently I worked on one customer issue that I would describe as “slave performance too slow”. During a quick analysis, I’ve found that the replication slave SQL thread cannot keep up while processing row-based events from the master’s binary log.

For example:

*************************** 1. row ***************************
              Master_Log_File: binlog.0000185
          Read_Master_Log_Pos: 86698585
        Relay_Master_Log_File: binlog.0000185
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 380
              Relay_Log_Space: 85699128
                  Master_UUID: 98974e7f-2fbc-18e9-72cd-07003817585c
           Retrieved_Gtid_Set: 98974e7f-2fbc-18e9-72cd-07003817585c:1055-1057
            Executed_Gtid_Set: 7f42e2c5-3fbc-16e7-7fb8-05003715789a:1-2,

The processlist state for the SQL thread can be one of the following: Reading event from the relay log, or System lock, or potentially some other state. In my case:

| Id | User            | Host            | db   | Command | Time | State                            | Info             |
|  4 | system user     |                 | NULL | Connect |  268 | Reading event from the relay log | NULL             |

What causes that?

Let’s take a look what could potentially cause such behavior and what we need to pay attention to. When the SQL thread applies the change from a row-based event, it has to locate the exact row that was updated. With a primary key, this is trivial as only one row can possibly have the same value for the primary key.

However, if there is no primary key on the table on the replication slave side, the SQL thread must search the entire table to locate the row to update or delete. It repeats the search for each updated row. This search is both very resource usage intensive (CPU usage can be up to 100%) and slow causing the slave to fall behind.

For InnoDB tables, the “hidden” key used for the clustered index for tables without a primary key cannot be used to avoid searching the entire table for the rows to update or delete. We need to keep in mind that the “hidden” key is unique only to each MySQL instance, so the replication master and replication slave generally don’t have the same values for the “hidden” key for the same row.

What can we do to solve that?

The best solution is to ensure that all tables have a primary key. This not only ensures the SQL thread can easily locate rows to update or delete, but it is also considered as a best practice since it ensures all rows are unique.

If there is no way to logically add a natural primary key for the table, a potential solution is to add an auto-increment unsigned integer column as the primary key.

The query below helps you to locate tables without a primary key:

SELECT tables.table_schema, tables.table_name, tables.table_rows
      FROM information_schema.tables
      LEFT JOIN (
        SELECT table_schema, table_name
        FROM information_schema.statistics
        GROUP BY table_schema, table_name, index_name
            CASE WHEN non_unique = 0 AND nullable != 'YES' THEN 1 ELSE 0 END
          ) = COUNT(*)
      ) puks
      ON tables.table_schema = puks.table_schema AND tables.table_name = puks.table_name
      WHERE puks.table_name IS NULL
        AND tables.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
        AND tables.table_type = 'BASE TABLE' AND engine='InnoDB';

Please note that for InnoDB, there must always be a unique NOT NULL key for all tables. It is required for the clustered index. So adding an explicit “dummy” column as suggested above will not add to the overall storage requirements as it will merely replace the hidden key.

It’s not always possible to add a primary key to the table immediately if, for example, there are many relations on the application side/legacy system, lack of resources, unknown application behavior after the change which required testing, etc.

In this case, a short-term solution is to change the search algorithm used by the replication slave to locate the rows changed by row-based events.

The search algorithm is set using the slave_rows_search_algorithms option which is available in MySQL 5.6 and later. The default value is to use an index scan if possible, otherwise a table scan.

However, for tables without a primary key using a hash scan, which causes the SQL thread to temporarily cache hashes to reduce the overhead of searching the whole table. The value of slave_rows_search_algorithms can be changed dynamically using:

mysql> SET GLOBAL slave_rows_search_algorithms = 'INDEX_SCAN,HASH_SCAN';

Just to note INDEX_SCAN,HASH_SCAN is the default value in MySQL 8.0.

One thing to be aware of when using hash scans that the hashes are only reused within one row-based event. (Each row-based event may have changes to several rows in the same table originating from the same SQL statement).

The binlog_row_event_max_size option on the replication master controls the maximum size of a row-based event. The default max event size is 8kB. This means that switching to hash scans only improves the performance of the SQL thread when:

  1. Several rows fit into one row based event. It may help to increase the value of binlog_row_event_max_size on the replication master, if you perform updates or deletes on large rows (e.g., with blob or text data). You can only set the binlog_row_event_max_size in the MySQL configuration file, and resetting this value requires a restart.
  2. One statement changes several rows.


Even if enabling hash scans improves the performance enough for the replication slave to keep up, the permanent solution is to add an explicit primary key to each table. This should be the general rule of thumb in the schema design in order avoid and/or minimize many issues like slave performance too slow (as described in this post).

Next, I am going to investigate how we can find out the exact thread state using Performance Schema in order to make issue identification less of a guessing game.

The post Causes and Workarounds for Slave Performance Too Slow with Row-Based Events appeared first on Percona Database Performance Blog.


MongoDB Rollback in replicaset

MongoDB Rollback

MongoDB RollbackIn this blog post, we’ll look at how MongoDB rollback works during replicaset failovers.

In recent versions, MongoDB has provided lots of features related to replicaset and automatic failover. When it comes to failover, the next question that arises is “How does MongoDB ROLLBACK work during replicaset failover?”

If a PRIMARY member (say node A) stepped down with some data writes that were executed but not replicated to the SECONDARY members yet, then a ROLLBACK occurs on the former PRIMARY A when it rejoins the replicaset. I’ll explain below how the ROLLBACK works!

ROLLBACK Scenario:

ROLLBACK is rare in a replicaset as MongoDB tries to avoid it by replicating the operations from PRIMARY to SECONDARY without delay, under normal conditions. Most of the time ROLLBACK occurs in the event of network partitioning, or if SECONDARY members can’t keep up with the throughput of operations on the former PRIMARY.


We will see the process with a test. I have used Docker for this test with the MongoDB 3.2 Jessie version to setup a replicaset with members mongo1 – A, mongo2 – B, mongo3 – C and set Priority 10 to A. Now A is PRIMARY as expected in the replicaset. We need to write some data into A and create a network partition scenario with B and C at the same time. For that, I inserted 25000 documents into A and made it out of network at the same time.

Terminal 1 (A’s mongo prompt):

my-mongo-set:PRIMARY> for (var i = 1; i <= 25000; i++) {
...    db.testData.insert( { x : i } )
... }
WriteResult({ "nInserted" : 1 })
my-mongo-set:PRIMARY> db.testD2018-03-30T17:34:51.455+0530 I NETWORK  [thread1] trying reconnect to ( failed
2018-03-30T17:34:51.464+0530 I NETWORK  [thread1] reconnect ( ok
my-mongo-set:SECONDARY> rs.slaveOk()
my-mongo-set:SECONDARY> db.testData.count()


Vinodhs-MBP:~ vinodhkrish$ docker ps
CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS              PORTS                           NAMES
b27d82ac2439        mongo:3.2.19-jessie         "docker-entrypoint.s…"   2 days ago          Up 1 days >27017/tcp        mongo3
2b39f9e41973        mongo:3.2.19-jessie         "docker-entrypoint.s…"   2 days ago          Up 1 days >27017/tcp        mongo2
105b6df757d7        mongo:3.2.19-jessie         "docker-entrypoint.s…"   2 days ago          Up 1 days >27017/tcp        mongo1
Vinodhs-MBP:~ vinodhkrish$ docker network disconnect my-mongo-cluster mongo1

The member A has now become as SECONDARY, because it couldn’t reach other members in the replicaset. On the other side, B and C members see that A is not reachable and then B is elected as PRIMARY. We could see that some inserts from former A replicated to B before the network split happens.

(B node)

my-mongo-set:PRIMARY> db.testData.count()

Now do some write operations in current PRIMARY – B and then let node A join the network back by joining the container back to the bridge network. You can observe below that the node A’s member states are changing in the mongo prompt. (I just connected to A and pressed ENTER/RETURN button many times to see the member states, or you can see them in the log file):

(A node)

Vinodhs-MacBook-Pro:mongodb-osx-x86_64-3.2.19 vinodhkrish$ ./bin/mongo
MongoDB shell version: 3.2.19
connecting to:


From MongoDB point of view, we will see the replicaset process to understand what happened above. Normally the SECONDARY member syncs the oplog entries from its syncSource (the member from where the data is replicated) by using oplogFetcher. The OplogFetcher first sends a find() command to the syncSource’s oplog, and then follows with a series of getMores on the cursor. When node A rejoins the replicaset, node A’s oplogFetcher first sends find() command to syncSource node B and check it has a greater than or equal predicate on the timestamp of the last oplog entry it has fetched. Usually the find() command should at least return one doc due to the greater than or equal predicate. If not, it means that the syncSource is behind and so it will not replicate from it and look for other syncSource.

In this case, A’s oplogFetcher sees that the first document returned from node B does not match the last entry in its oplog. That means node A’s oplog has diverged from node B’s and it should go into ROLLBACK.

Node A first finds the common point between its oplog and its syncSource B’s oplog. It then goes through all of the operations in its oplog back to the common point and figures out how to undo them. Here, 9997 inserts are missed from B and C nodes, and so these documents will be recovered from A’s oplog.

2018-03-30T12:08:37.160+0000 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: our last op time fetched: (term: 4, timestamp: Mar 30 12:03:52:139). source's GTE: (term: 5, timestamp: Mar 30 12:05:37:1) hashes: (3789163619674410187/3226093795606474294)
2018-03-30T12:08:37.160+0000 I REPL     [rsBackgroundSync] rollback 0
2018-03-30T12:08:37.160+0000 I REPL     [ReplicationExecutor] transition to ROLLBACK
2018-03-30T12:08:37.163+0000 I REPL     [rsBackgroundSync] beginning rollback
2018-03-30T12:08:37.163+0000 I REPL     [rsBackgroundSync] rollback 1
2018-03-30T12:08:37.164+0000 I REPL     [rsBackgroundSync] rollback 2 FindCommonPoint
2018-03-30T12:08:37.166+0000 I REPL     [rsBackgroundSync] rollback our last optime:   Mar 30 12:03:52:139
2018-03-30T12:08:37.166+0000 I REPL     [rsBackgroundSync] rollback their last optime: Mar 30 12:08:17:1c5
2018-03-30T12:08:37.166+0000 I REPL     [rsBackgroundSync] rollback diff in end of log times: -265 seconds
2018-03-30T12:08:37.269+0000 I REPL     [rsBackgroundSync] rollback common point is (term: 4, timestamp: Mar 30 12:03:46:d2)
2018-03-30T12:08:37.269+0000 I REPL     [rsBackgroundSync] rollback 3 fixup
2018-03-30T12:08:38.240+0000 I REPL     [rsBackgroundSync] rollback 3.5
2018-03-30T12:08:38.240+0000 I REPL     [rsBackgroundSync] Setting minvalid to (term: 5, timestamp: Mar 30 12:08:17:1c5)
2018-03-30T12:08:38.241+0000 I REPL     [rsBackgroundSync] rollback 4 n:1
2018-03-30T12:08:38.241+0000 I REPL     [rsBackgroundSync] rollback 4.6
2018-03-30T12:08:38.241+0000 I REPL     [rsBackgroundSync] rollback 4.7
2018-03-30T12:08:38.391+0000 I REPL     [rsBackgroundSync] rollback 5 d:9997 u:0
2018-03-30T12:08:38.391+0000 I REPL     [rsBackgroundSync] rollback 6
2018-03-30T12:08:38.394+0000 I REPL     [rsBackgroundSync] rollback done
2018-03-30T12:08:38.396+0000 I REPL     [rsBackgroundSync] rollback finished


Where would these 9997 recovered documents go? MongoDB writes these ROLLBACK documents under the rollback directory in the dbpath. These recovered collections are named with namespace as the prefix and the date time as the suffix in their names. These are in BSON format, and we need to convert into JSON to analyze them so the plan for the next course of action can be done. In our case, the testData collection’s rollback data are as follows:

root@105b6df757d7:/# cd /data/db
root@105b6df757d7:/data/db# ls -l rollback/
total 324K
-rw-r--r-- 1 mongodb mongodb 323K Mar 30 12:08 admin.testData.2018-03-30T12-08-38.0.bson

root@105b6df757d7:/data/db/rollback# bsondump admin.testData.2018-03-30T12-08-38.0.bson > rollback.json
2018-03-30T12:13:00.033+0000 9997 objects found
root@105b6df757d7:/data/db/rollback# head rollback.json

That’s it? Now check the counts of the testData collection in node A:

my-mongo-set:PRIMARY> db.testData.count()

So the records 9997 which were rollbacked into the rollback directory would also be dropped from the collection. This ensures the data consistency throughout the replicaset.

How to avoid ROLLBACK – writeConcern

The default writeConcern in the replicaSet is w:1., i.e., When a client writes into a replicaSet, then it receives an acknowledgment from the PRIMARY alone and won’t wait for SECONDARY members’ acknowledgment. If you want to avoid the ROLLBACK scenario in your environment, then you have to use the {w:majority} or {w:n}, where 1 > n <=  (no. of members in your replica set). This ensures that the writes are propagated to so many members of the replica set before sending the acknowledgment to the client. This solves the problem of ROLLBACK.

But please be careful that you are not giving higher value to writeConcern, because it also affects the write performance. The acknowledgment needs to be received from the number of members mentioned in the value. The value {w:majority} provides the acknowledgement that write operations have propagated to the majority of voting nodes, including the primary and is suitable for most of the environments.

ROLLBACK – Limitation

The main thing to note here is that mongod will not rollback more than 300MB data. In such cases, we need to manually check the instance to recover the data. You can see the below message in mongod.log in such cases:

[replica set sync] replSet syncThread: 13410 replSet too much data to roll back

Understanding this simple ROLLBACK background helps us to decide what needs to be done with the rollbacked data. It also helps us avoid such scenarios, because data is data and is very important!

The post MongoDB Rollback in replicaset appeared first on Percona Database Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by