Sep 07, 2018

Setting up Streaming Replication in PostgreSQL


Configuring replication between two database servers is a proven strategy for achieving high availability during disasters, and it provides fault tolerance against unexpected failures. PostgreSQL satisfies this requirement through streaming replication. We shall talk about another option, logical replication and logical decoding, in a future blog post.

Streaming replication works by shipping the transaction log. Every transaction in postgres is written to a transaction log called the WAL (write-ahead log) to achieve durability. A slave uses these WAL segments to continuously replicate changes from its master.

Three processes play a major role in achieving streaming replication in postgres: the wal sender, the wal receiver, and the startup process.

A wal sender process runs on the master, whereas the wal receiver and startup processes run on its slave. When you start replication, the wal receiver process sends the master the LSN (Log Sequence Number) up to which WAL data has been replayed on the slave. The wal sender process on the master then sends WAL data to the slave, starting from that LSN and continuing up to the latest LSN. The wal receiver writes the WAL data sent by the wal sender to WAL segments, and it is the startup process on the slave that replays the data written to those segments. At that point, streaming replication is underway.

Note: Log Sequence Number, or LSN, is a pointer to a location in the WAL.
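
To see the LSN a server is currently at, you can query it from psql. Note that the relevant function was renamed in PostgreSQL 10; here is a quick check, assuming a local connection:

$ psql -c "SELECT pg_current_wal_lsn();"       # PostgreSQL 10 and later
$ psql -c "SELECT pg_current_xlog_location();" # PostgreSQL 9.x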

Steps to set up streaming replication between a master and one slave

Step 1:

On the master, create the user that the slave will use to connect and stream WALs. This user must have the REPLICATION role.

CREATE USER replicator
WITH REPLICATION
ENCRYPTED PASSWORD 'replicator';
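
You can verify that the role was created with the replication attribute; a quick sanity check on the master, assuming a local superuser connection:

$ psql -c "SELECT rolname, rolreplication FROM pg_roles WHERE rolname = 'replicator';"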

Step 2:

The following parameters on the master are considered mandatory when setting up streaming replication.

  • archive_mode : Must be set to ON to enable archiving of WALs.
  • wal_level : Must be set to at least hot_standby up to version 9.5, or replica in later versions.
  • max_wal_senders : Must be set to 3 if you are starting with one slave. For every additional slave, you may add 2 more wal senders.
  • wal_keep_segments : Sets the number of WAL segments retained in pg_xlog (up to PostgreSQL 9.x) or pg_wal (from PostgreSQL 10). Every WAL segment requires 16MB of space unless you have explicitly modified the WAL segment size. You may start with 100 or more, depending on the space available and the amount of WAL that could be generated during a backup.
  • archive_command : This parameter takes a shell command or external program. It can be a simple copy command that copies the WAL segments to another location, or a script that archives the WALs to S3 or a remote backup server.
  • listen_addresses : Set it to * or to the range of IP addresses that should be allowed to connect to your master PostgreSQL server. Your slave's IP must be whitelisted too, else the slave cannot connect to the master to replicate/replay WALs.
  • hot_standby : Must be set to ON on the standby/replica and has no effect on the master. However, when you set up replication, parameters set on the master are automatically copied. This parameter is important because it enables READS on the slave; otherwise, you cannot run SELECT queries against the slave.

The above parameters can be set on the master using these commands followed by a restart:

ALTER SYSTEM SET wal_level TO 'hot_standby';
ALTER SYSTEM SET archive_mode TO 'ON';
ALTER SYSTEM SET max_wal_senders TO '5';
ALTER SYSTEM SET wal_keep_segments TO '10';
ALTER SYSTEM SET listen_addresses TO '*';
ALTER SYSTEM SET hot_standby TO 'ON';
ALTER SYSTEM SET archive_command TO 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f';

$ pg_ctl -D $PGDATA restart -mf
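
After the restart, you can confirm that the settings have taken effect; a quick check from psql:

$ psql -c "SELECT name, setting FROM pg_settings WHERE name IN ('wal_level', 'archive_mode', 'max_wal_senders', 'wal_keep_segments', 'listen_addresses', 'hot_standby');"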

Step 3:

Add an entry to the pg_hba.conf of the master to allow replication connections from the slave. The default location of pg_hba.conf is the data directory. However, you may modify the location of this file in postgresql.conf. On Ubuntu/Debian, pg_hba.conf may be located in the same directory as the postgresql.conf file by default; you can get the location of postgresql.conf on Ubuntu/Debian with the OS command pg_lsclusters.

host replication replicator 192.168.0.28/32 md5

The IP address mentioned in this line must match the IP address of your slave server. Please change the IP accordingly.

To put the changes into effect, issue a SIGHUP:

$ pg_ctl -D $PGDATA reload
Or
$ psql -U postgres -p 5432 -c "select pg_reload_conf()"

Step 4:

pg_basebackup helps us stream the data through the wal sender process from the master to a slave to set up replication. You can also take a tar format backup from the master and copy it to the slave server. You can read more about tar format pg_basebackup here.
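
For instance, a tar format backup taken on the master might look like the following sketch; -Ft requests tar format, -z compresses the output, and the target directory is illustrative:

$ pg_basebackup -h 192.168.0.28 -U replicator -p 5432 -D /backups/base -Ft -z -P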

The following command streams the data directory from the master to the slave; it should be run on the slave.

$ pg_basebackup -h 192.168.0.28 -U replicator -p 5432 -D $PGDATA -P -Xs -R

Please replace the IP address with your master’s IP address.

In the above command, you see an optional argument -R. When you pass -R, pg_basebackup automatically creates a recovery.conf file that contains the role of the DB instance and the connection details of its master. It is mandatory to create the recovery.conf file on the slave in order to set up streaming replication. If you are not using the backup type mentioned above, and instead take a tar format backup on the master and copy it to the slave, you must create this recovery.conf file manually. Here are the contents of the recovery.conf file:

$ cat $PGDATA/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.0.28 port=5432 user=replicator password=replicator'
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'

In the above file, the role of the server is defined by standby_mode, which must be set to ON for slaves in postgres. To stream WAL data, the connection details of the master server are configured in the parameter primary_conninfo.

These two parameters are written automatically when you use the optional argument -R while taking the pg_basebackup. The recovery.conf file must exist in the data directory ($PGDATA) of the slave.

Step 5:

Start your slave once the backup and restore are completed.

If you have configured the backup (remotely) using the streaming method mentioned in Step 4, it copies all the files and directories directly into the data directory of the slave. This means the backup of the master's data directory and the restore on the slave happen in a single step.

If you have taken a tar backup from the master and shipped it to the slave, you must untar the backup into the slave's data directory, and then create a recovery.conf as mentioned in the previous step. Once done, you may proceed to start your PostgreSQL instance on the slave using the following command.

$ pg_ctl -D $PGDATA start

Step 6:

In a production environment, it is always advisable to have the parameter restore_command set appropriately. This parameter takes a shell command (or a script) that can be used to fetch the WAL needed by a slave when the WAL is not available on the master.

For example:

If a network issue has caused a slave to lag behind the master for a substantial time, the WALs the slave needs are less likely to still be available in the master's pg_xlog or pg_wal location. Hence, it is sensible to archive the WALs to a safe location, and to set the commands needed to restore them in the restore_command parameter of the recovery.conf file on your slave. To achieve that, add a line similar to the next example to the recovery.conf file on the slave. You may substitute the cp command with a shell command/script or a copy command that helps the slave fetch the appropriate WALs from the archive location.

restore_command = 'cp /mnt/server/archivedir/%f "%p"'

Setting the above parameter requires a restart and cannot be done online.

Final step: validate that replication is set up

As discussed earlier, a wal sender and a wal receiver process are started on the master and the slave once replication is set up. Check for these processes on both master and slave using the following commands.

On Master
==========
$ ps -eaf | grep sender
On Slave
==========
$ ps -eaf | grep receiver
$ ps -eaf | grep startup

You should see all three processes running across the master and slave, as in the following example output.

On Master
=========
$ ps -eaf | grep sender
postgres  1287  1268  0 10:40 ?        00:00:00 postgres: wal sender process replicator 192.168.0.28(36924) streaming 0/50000D68
On Slave
=========
$ ps -eaf | egrep "receiver|startup"
postgres  1251  1249  0 10:40 ?        00:00:00 postgres: startup process   recovering 000000010000000000000050
postgres  1255  1249  0 10:40 ?        00:00:04 postgres: wal receiver process   streaming 0/50000D68

You can see more details by querying the master’s pg_stat_replication view.

$ psql
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 1287
usesysid         | 24615
usename          | replicator
application_name | walreceiver
client_addr      | 192.168.0.28
client_hostname  |
client_port      | 36924
backend_start    | 2018-09-07 10:40:48.074496-04
backend_xmin     |
state            | streaming
sent_lsn         | 0/50000D68
write_lsn        | 0/50000D68
flush_lsn        | 0/50000D68
replay_lsn       | 0/50000D68
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
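
To quantify how far a slave has fallen behind, you can also compute the replay lag in bytes on the master; a quick check, assuming PostgreSQL 10:

$ psql -c "SELECT client_addr, state, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes FROM pg_stat_replication;"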

Reference : https://www.postgresql.org/docs/10/static/warm-standby.html#STANDBY-SERVER-SETUP

If you found this post interesting…

Did you know that Percona now provides PostgreSQL support services? If you’d like to read more about this, here’s some more information. We’re here to help.


Jun 28, 2018

Faster Point In Time Recovery (PITR) in PostgreSQL Using a Delayed Standby


The need to recover a database back to a certain point in time can be a nerve-racking task for DBAs and for businesses. Can this be simplified? Could it be made to work faster? Can we recover to a given point in time with zero loss of transactions/records? Fortunately, the answer to these questions is yes. PostgreSQL Point in Time Recovery (PITR) is an important facility. It offers DBAs the ability to restore a PostgreSQL database simply, quickly and without the loss of transactions or data.

In this post, we’ll help you to understand how this can be achieved, and reduce the potential for pain in the event of panic situations where you need to perform a PITR.

Before proceeding further, let us understand what could force us to perform a PITR.

  1. Someone has accidentally dropped or truncated a table.
  2. A failed deployment has made changes to the database that are difficult to reverse.
  3. You accidentally deleted or modified a lot of data, and as a consequence you cannot run your applications.

In such scenarios, you would immediately look for the latest full backup and the relevant transaction logs (aka WALs in PostgreSQL) to recover up to a known point in the past, before the error occurred. But what if your backup is corrupt and not valid?

Well, it is very important to validate backup and recovery procedures to ensure that backups are always recoverable; we will address this in a future post. But if the backup you are looking at is corrupt, that can be a nightmare. GitLab suffered one such unlucky incident, when a backup restoration failure caused a major outage followed by data loss after recovery.

https://about.gitlab.com/2017/02/01/gitlab-dot-com-database-incident/

Even the best of plans can be hard to realize in practice.

It may be that our backups are intact and recoverable. Can we afford to wait until we copy/download the backup and recover it to another disk or server? What if the database size is several hundreds of GBs or several TBs like GitLab’s?

The solution to the problem is: add another standby that is always delayed by a few hours or a day.

This is one of the great features available in PostgreSQL. If you have migrated from Oracle RDBMS to PostgreSQL, you can think of it as an equivalent to FLASHBACK DATABASE in Oracle. Flashback database helps you rewind data back in time. However, the technique does not work if you have dropped a data file; in fact, this is the case for both Oracle RDBMS and PostgreSQL PITR.

https://docs.oracle.com/cd/E11882_01/backup.112/e10642/flashdb.htm#BRADV71000

Adding a Delayed Standby in PostgreSQL

It is important to use features like streaming replication to achieve high availability in PostgreSQL. Most environments have one master with one or more slaves (standbys), either in the same data centre or geographically distributed. To save the time needed for PITR, you can add another slave that is always delayed by a certain amount of time; this could be hours or days.

For example, if I know that my deployment is determined to be successful when no issues are observed in the first 12 hours, then I might delay one of the standbys by 12 hours.

To delay a standby, once you have set up streaming replication between your PostgreSQL master and slave, add the following parameter to the recovery.conf file of the slave and restart it.

recovery_min_apply_delay = '12h' # or '1min' or '1d'
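
Once the delayed standby is running, you can confirm how far behind it is applying WAL by checking the last replayed transaction timestamp on the standby; a quick check from psql:

$ psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS apply_delay;"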

Now, let’s consider an example where you inserted 10000 records at 10:27:34 AM and accidentally deleted 5000 records at 10:28:43 AM. Let’s say that you have a standby that is delayed by 1 hour. The steps to use the delayed standby to perform PITR up to 10:27:34 AM look like this:

Steps to perform PostgreSQL Point in Time Recovery using a delayed standby


Step 1

Stop the slave (delayed standby) immediately, as soon as you notice that an accidental change has happened. If you know that the change has already been applied on the slave, then you cannot perform the point in time recovery using this method.

$ pg_ctl -D $PGDATA stop -mf

Step 2

Rename the recovery.conf file in your standby to another name.

$ mv $PGDATA/recovery.conf $PGDATA/recovery.conf.old

Step 3

Create a new recovery.conf file with the required parameters for PITR.

# recovery.conf file always exists in the Data Directory of Slave
recovery_target_time = '2018-06-07 10:27:34 EDT'
restore_command = 'sh /var/lib/pgsql/scripts/restore_command_script.sh %p %f'
recovery_target_action = 'pause'
recovery_target_inclusive = 'false'

recovery_target_time

Specifies the timestamp up to which you wish to recover your database.

restore_command

Shell command that can be used by PostgreSQL to fetch the required Transaction Logs (WALs) for recovery.
PostgreSQL passes the arguments %p (path to the WAL file) and %f (WAL file name) to this shell command. These arguments can be used in the script you use to copy your WALs.

Here is an example script for your reference. This example relies on rsync. The script connects to the backup server to fetch the WALs requested by PostgreSQL. (We’ll cover the procedure to archive these WALs in another blog post soon: this could be a good time to subscribe to the Percona blog mailing list!)

$ cat /var/lib/pgsql/scripts/restore_command_script.sh
#!/bin/bash
# Requires passwordless ssh to the backup server.
# $1 is %p, substituted by postgres with the path to the WAL file.
# $2 is %f, substituted by postgres with the WAL file name.
LOG=/var/lib/pgsql/scripts/restore_command.log
Backup_Server=192.168.0.12
ArchiveDir='/archives'

wal=$2
wal_with_path=$1
rsync --no-motd -ave ssh ${Backup_Server}:${ArchiveDir}/${wal} ${wal_with_path} >>$LOG 2>&1
if [ "$?" -ne "0" ]; then
    echo "Restore failed for WAL: $wal" >> $LOG
    exit 1
fi

recovery_target_action

This is the action that needs to be performed after recovering the instance up to the recovery_target_time. Setting this to pause would let you modify the recovery_target_time after recovery, if you need to. You can then replay the transactions at a slow pace until your desired recovery target is reached. For example, you can recover until 2018-06-07 10:26:34 EDT and then modify recovery_target_time to 2018-06-07 10:27:34 EDT when using pause.

When you know that all the data you are looking for has been recovered, you can issue the following command to stop the recovery process, change the timeline and open the database for writes.

select pg_wal_replay_resume();

Other possible settings for this parameter are promote and shutdown. These do not allow you to replay a few more future transactions after the recovery, as you can with pause.

recovery_target_inclusive

Specifies whether to stop recovery just after the specified recovery_target_time (true) or just before it (false).

Step 4

Start PostgreSQL using pg_ctl. It will read the parameters in recovery.conf and recover up to the time you set in recovery_target_time.

$ pg_ctl -D $PGDATA start

Step 5

Here is how the log appears. It shows that the server has performed point-in-time recovery and has reached a consistent state, as requested.

2018-06-07 10:43:22.303 EDT [1423] LOG: starting point-in-time recovery to 2018-06-07 10:27:34-04
2018-06-07 10:43:22.607 EDT [1423] LOG: redo starts at 0/40005B8
2018-06-07 10:43:22.607 EDT [1423] LOG: consistent recovery state reached at 0/40156B0
2018-06-07 10:43:22.608 EDT [1421] LOG: database system is ready to accept read only connections
2018-06-07 10:43:22.626 EDT [1423] LOG: recovery stopping before commit of transaction 570, time 2018-06-07 10:28:59.645685-04
2018-06-07 10:43:22.626 EDT [1423] LOG: recovery has paused
2018-06-07 10:43:22.626 EDT [1423] HINT: Execute pg_wal_replay_resume() to continue.

Step 6

You can now stop recovery and open the database for writes after PITR.

Before executing the next command, you may want to verify that you have all the desired data by connecting to the database and running some SQL. You can still perform reads before you stop recovery. If you notice that you need another few minutes (or hours) of transactions, modify the parameter recovery_target_time and go back to Step 4. Otherwise, you can stop the recovery by running the following command.

$ psql
select pg_wal_replay_resume();
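
Before resuming, you can confirm that the server is still in recovery and that WAL replay is paused; both functions exist in PostgreSQL 10:

$ psql -c "SELECT pg_is_in_recovery(), pg_is_wal_replay_paused();"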

Summing up

PostgreSQL Point in Time Recovery using a delayed standby is a simple procedure: in a database emergency, it requires no effort to locate the latest backups and transaction logs, or to find the space or server to restore to. These things happen! It can also save a lot of time, because replaying WALs is much faster than rebuilding an entire instance from backups, especially when you have a huge database.

Important postscript: I tested and recorded these steps using PostgreSQL 10.4. It is possible with PostgreSQL 9.x versions; however, the parameters could differ slightly, and you should refer to the PostgreSQL documentation for the correct syntax.


Apr 16, 2013

Is your MySQL buffer pool warm? Make it sweat!

Today’s blog post diving into the waters of the MySQL buffer pool is a cross-post from Groupon’s engineering blog, and is Part 1 of 2. Thank you to Kyle Oppenheim at Groupon for contributing to this project and post. We’ll be posting Part 2 on Thursday. I’ll be at the Percona Live MySQL Conference and Expo next week in Santa Clara, California so look for me there – I’d love to connect and talk more about MySQL buffer pools or anything else that’s on your mind!

There are numerous solutions for MySQL high availability. Many rely on MySQL’s asynchronous replication to maintain a warm standby server which is flipped into service if the active master server has an issue. At Groupon, our standard MySQL database configuration follows this active/passive pattern. There is a virtual IP that points to the active server of a pair. The passive server has mysqld running and replicating from the active master. In theory, failing over is the simple matter of moving the virtual IP. In practice, it’s slightly more complicated. To manage this process, we use a tool co-developed with Percona.

“Warm standby server”? Did you catch that? What does that mean? In most of these high-availability solutions it means that mysqld is running on the standby server and that replication is not lagging. Unfortunately, that’s often not sufficient for failover during peak traffic. The standby server does not process query traffic except for replication. The buffer pool and adaptive hash index on the standby server will not have recently accessed pages. When it starts handling queries after failover, the lower cache hit rates can lead to outages. Specifically, at Groupon, our servers would be heavily I/O bound after failover as the buffer pool pages were loaded from disk.

Replaying Queries

Working with Groupon, we have developed a solution to keep the standby server’s caches hot. (See my Fosdem 2013 slides for information about discarded designs and benchmarks.)

First, we set long_query_time to 0 in order to log every query. (See part two for handling massive slow log volume.) The slow logs are served, via HTTP, by mysql_slowlogd. This daemon is similar to running tail -f slow.log, except that it knows how to follow the log stream across log rotation events. On the standby server, the logs are replayed with Percona Playback by streaming the slow log from the active server.
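
For reference, enabling full query logging on the active server might look like the following sketch; slow_query_log and long_query_time are standard MySQL variables, and a global long_query_time of 0 applies to new connections. The replay command, streaming the slow log from the active server, then follows:

$ mysql -e "SET GLOBAL slow_query_log = ON; SET GLOBAL long_query_time = 0;"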

wget -q -O - http://master_server:3307/slow | percona-playback --mysql-host 127.0.0.1 --mysql-username playback --mysql-password PaSSwOrd --mysql-schema schema_name --query-log-stdin --dispatcher-plugin thread-pool --thread-pool-threads-count 100 --session-init-query "set innodb_fake_changes=1" > /var/log/playback.log 2>&1 &

Our awesome development team added a few features to Percona Playback to make it work better for this use case. You will need version 0.6 or later to get these features. Be aware that playback output is really verbose; in production you will most likely want to redirect it to /dev/null and keep only a log file for debugging purposes.

  1. Streaming logs from stdin: Percona Playback now supports the --query-log-stdin command-line option for accepting a never-ending stream of queries to play back.
  2. Read-only playback: using the --session-init-query command-line option, we set the option innodb_fake_changes to prevent INSERTs, UPDATEs, and DELETEs from corrupting the data on the standby server. You will need Percona Server in order to use innodb_fake_changes.
  3. Thread pool: Percona Playback added a connection pool option via --dispatcher-plugin thread-pool that allows connection reuse. This is necessary when running a large stream of queries.

Benchmarks

We benchmarked with slow query logs captured from our production systems. We restored a production database backup to our test database so that our test database was consistent before applying the captured query traffic. This is an important step because update statements that match no rows or insert statements that have duplicate key errors may be faster than an actual database write.

The slow logs were split into chunks, each containing roughly 1M queries. We warmed the cold database with the first chunk and replayed the second chunk after the warmup.

[Figure: disk read I/O for chunk 1 followed by chunk 2]

The y-axis is logarithmic, so the difference in I/O usage is two orders of magnitude. All graphs looked like this (we took 39 measurements); the next graph shows chunk 4’s workload warmed up with chunk 3.

[Figure: disk read I/O for chunk 3 followed by chunk 4]

The result is similar for every single graph: each chunk warmed up the buffer pool for the next one.

[Figure: disk read I/O for chunk 1 replayed twice]

As an additional experiment we tried replaying the same chunk again. We expected everything to be cached if we warmed the cache with the exact same data. All the graphs from such self-warming experiments look like this one. The green part of the graph lines up with the blue part.

Check back Thursday for Part 2!

