Rebuild Patroni Replica Using pgBackRest
Patroni is one of the most widely used high availability (HA) solutions for PostgreSQL. It uses a Distributed Configuration Store (DCS) to keep the cluster configuration in a centralized location available to all nodes, making it one of the easiest-to-use and most reliable HA solutions on the market today.
On the other hand, pgBackRest is a backup solution that supports not only full backups but also incremental and differential backups. It is one of the most widely used backup tools for PostgreSQL databases.
In previous blogs, we discussed how to set up PostgreSQL HA with Patroni and how to configure pgBackRest. Both of these open source tools have been saviors for DBAs, helping them achieve high availability and perform restores in many ways. In this blog, we will integrate the two tools and see how they can work together to reduce the load on the primary server.
Scenarios:
- Reinitializing the Patroni cluster using pgBackRest (instead of pg_basebackup).
- Creating the Patroni replica using pgBackRest.
Pre-configured setup:
- Patroni configuration setup for two or more databases.
- pgBackRest configured on a dedicated backup host.
For the purpose of testing these scenarios, the below configurations will be used throughout this blog:
Patroni Nodes:
+ Cluster: prod (7171021941707843784) ----+----+-----------+
| Member | Host  | Role    | State   | TL | Lag in MB |
+--------+-------+---------+---------+----+-----------+
| node1  | node1 | Leader  | running |  1 |           |
| node2  | node2 | Replica | running |  1 |         0 |
+--------+-------+---------+---------+----+-----------+
patronictl edit-config shows the following:
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  create_replica_methods:
  - pgbackrest
  - basebackup
  parameters:
    archive_command: pgbackrest --stanza=patroni archive-push %p
    archive_mode: 'on'
    archive_timeout: 120s
    hot_standby: 'on'
    listen_addresses: '*'
    logging_collector: 'on'
    max_replication_slots: 10
    max_wal_senders: 10
    pg_hba:
    - host all all 0.0.0.0/0 md5
    - host replication all 0.0.0.0/0 md5
    - local all postgres peer
    wal_level: replica
    wal_log_hints: 'on'
  pgbackrest:
    command: /usr/bin/pgbackrest --stanza=patroni --log-level-file=detail --delta restore
    keep_data: true
    no_params: true
  recovery_conf:
    restore_command: pgbackrest --stanza=patroni archive-get %f %p
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 30
Note: Pay particular attention to the create_replica_methods, pgbackrest, and recovery_conf sections above; they are the parts specific to rebuilding nodes from a pgBackRest backup.
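If you prefer to keep these settings in each node's YAML file rather than applying them through patronictl edit-config, Patroni has to be told to re-read its configuration afterwards. A minimal sketch, assuming the cluster name prod and the configuration path used later in this post:

patronictl -c /etc/patroni/node1.yml reload prod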
Apart from the Patroni nodes, we need a backup repository host where pgBackRest has been configured. It can live on a dedicated server or on one of the DB hosts; however, a dedicated server is recommended so that, if a database host goes down, the backups remain available on a separate machine, making the DBA's life easier.
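For reference, here is a rough sketch of what the pgBackRest configuration could look like in such a setup, using the stanza name, repository path, and repository host that appear in the restore logs later in this post (the retention setting is an assumption; adjust everything to your environment):

# /etc/pgbackrest/pgbackrest.conf on the dedicated repository host
[global]
repo1-path=/pgrdbackups
repo1-retention-full=2    # assumed retention policy, pick your own
process-max=2
log-level-file=detail

[patroni]
pg1-host=node1
pg1-path=/var/lib/postgresql/14/main
pg2-host=node2
pg2-path=/var/lib/postgresql/14/main

# /etc/pgbackrest/pgbackrest.conf on each database node
[global]
repo1-host=192.168.0.5
repo1-host-user=postgres
repo1-path=/pgrdbackups

[patroni]
pg1-path=/var/lib/postgresql/14/main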
Let’s test the scenarios one by one:
Reinitializing the Patroni cluster using pgBackRest (instead of pg_basebackup)
The main advantage of using pgBackRest instead of pg_basebackup when reinitializing a node is that it takes the load off the leader node. This makes little difference when the database is small, but it is very useful when the database is huge and building the node takes hours or days. The resource utilization is diverted to the dedicated backup host instead of the primary server, which is already busy serving the majority of the requests coming to the database.
Let us try to understand how we can rebuild the node using backup.
Many times, we are unable to start a secondary node after a failover or switchover. To handle this situation, Patroni allows us to reinitialize the database cluster, which creates/rebuilds the node by wiping the data directory. In the background, it copies the contents of the data directory (by default, from the primary server) and re-creates the desired node.
Please make the changes in the Patroni configuration (YAML) file and reload the configuration, as shown previously. To reinitialize a Patroni replica node, the reinit command is used as below:
ubuntu@192.168.0.1:~$ patronictl -c /etc/patroni/node1.yml reinit prod
+ Cluster: prod (7171021941707843784) ----+----+-----------+
| Member | Host  | Role    | State   | TL | Lag in MB |
+--------+-------+---------+---------+----+-----------+
| node1  | node1 | Leader  | running |  1 |           |
| node2  | node2 | Replica | running |  1 |         0 |
+--------+-------+---------+---------+----+-----------+
Which member do you want to reinitialize [node1, node2]? []: node2
Are you sure you want to reinitialize members node2? [y/N]: y
Success: reinitialize for member node2
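While the reinit runs, the member's State column in patronictl list changes (typically to something like creating replica), so a simple watch loop from any node is enough to follow the progress; a sketch, using the same configuration path:

watch -n 2 patronictl -c /etc/patroni/node1.yml list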
On the replica node, the top command shows that the node is being rebuilt from the pgBackRest backup and not with pg_basebackup. By default, Patroni uses pg_basebackup when create_replica_methods is not set, which increases the load on the leader node.
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  12939 postgres  20   0  306440 268140      4 S  34.9  27.1   8871:14 GkwP468a
 791850 postgres  20   0  218692  29544  26820 S   8.9   3.0   0:00.28 /usr/lib/postgresql/14/bin/postgres -D /var/lib/postgresql/14/main --config-file=/var/lib/postgresql/14/main/postgresql.conf --listen_ad+
 791881 postgres  20   0   60980  12696  10628 S   3.2   1.3   0:00.10 pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --stanza=patroni archive-get 00000013.history pg_wal/RECOVERYHISTORY
 791874 postgres  20   0  218692   8032   5260 S   2.2   0.8   0:00.07 postgres: prod: startup
 791827 postgres  20   0    7760   3516   3212 R   1.3   0.4   0:00.04 bash
 784973 postgres  20   0   23316    212      0 S   0.3   0.0   0:18.42 tracepath
When pgBackRest is being used, the restore is also recorded in the pgBackRest log file, showing the pgBackRest restore command as below:
2022-12-15 15:41:29.070 P00   INFO: restore command begin 2.41: --config=/etc/pgbackrest/pgbackrest.conf --delta --exec-id=791815-82f4ea68 --log-level-console=info --log-level-file=detail --pg1-path=/var/lib/postgresql/14/main --process-max=2 --repo1-host=192.168.0.5 --repo1-host-user=postgres --repo1-path=/pgrdbackups --stanza=patroni
2022-12-15 15:41:30.800 P00   INFO: repo1: restore backup set 20221213-154604F, recovery will start at 2022-12-13 15:46:04
2022-12-15 15:41:30.877 P00 DETAIL: check '/var/lib/postgresql/14/main' exists
2022-12-15 15:41:30.877 P00 DETAIL: remove 'global/pg_control' so cluster will not start if restore does not complete
2022-12-15 15:41:30.918 P00   INFO: remove invalid files/links/paths from '/var/lib/postgresql/14/main'
2022-12-15 15:41:30.919 P00 DETAIL: remove invalid file '/var/lib/postgresql/14/main/backup_label.old'
2022-12-15 15:41:31.841 P00 DETAIL: remove invalid file '/var/lib/postgresql/14/main/base/13761/pg_internal.init'
2022-12-15 15:41:31.920 P00 DETAIL: remove invalid file '/var/lib/postgresql/14/main/global/pg_internal.init'
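To cross-check which backup set a restore like the one above picked up, or simply to confirm that the repository holds a usable backup for the stanza before triggering a reinit, pgBackRest's info command can be run from any node; a quick check, assuming the stanza name patroni from this setup:

sudo -u postgres pgbackrest --stanza=patroni info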
Creating the Patroni replica using pgBackRest:
If the bootstrap section contains a method that rebuilds the node using pgBackRest, then when a new node is added to an existing Patroni cluster, its initial build will use the pgBackRest backup instead of pg_basebackup. Point-in-time recovery can also be performed through the bootstrap section. This not only reduces the load on the leader node, but also lets the backup host restore the data with comparatively less resource utilization.
To configure this, add the following to the Patroni configuration file:
bootstrap:
  method: <custom_bootstrap_method_name>
  <custom_bootstrap_method_name>:
    command: <path_to_custom_bootstrap_script> [param1 [, ...]]
    keep_existing_recovery_conf: True/False
    no_params: True/False
    recovery_conf:
      recovery_target_action: promote
      recovery_target_timeline: <PITR_Time>
      restore_command: <method_specific_restore_command>
In this example, the below section has been added to the Patroni config, which will build the node by performing point-in-time recovery up to the timestamp mentioned.
bootstrap:
  method: pitr_restore_by_pgbackrest
  pitr_restore_by_pgbackrest:
    command: 'pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --stanza=patroni --log-level-file=detail --link-all --type=time --target="2022-12-13 15:46:04" restore'
    keep_existing_recovery_conf: True
    no_params: True
    recovery_conf:
      recovery_target_action: "promote"
      recovery_target_time: "2022-12-13 15:46:04"
      restore_command: 'pgbackrest --config=/etc/rdba/pgbackrest/pgbackrest.conf --stanza=patroni --log-level-file=detail archive-get %f "%p"'
      recovery_target_inclusive: true
While the node is being built, the top processes show that pgBackRest is being used instead of a base backup, and the Patroni service status looks as follows:
● patroni.service - PostgreSQL high-availability manager
     Loaded: loaded (/lib/systemd/system/patroni.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/patroni.service.d
             └─override.conf
     Active: active (running) since Mon 2022-12-19 19:06:16 UTC; 3s ago
   Main PID: 2094 (patroni)
      Tasks: 11 (limit: 1143)
     Memory: 114.6M
        CPU: 941ms
     CGroup: /system.slice/patroni.service
             ├─2094 /usr/bin/python3 /usr/bin/patroni /etc/patroni/db2.yml
             ├─2100 /usr/bin/pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --stanza=patroni --log-level-file=detail --delta restore
             ├─2102 /usr/bin/pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --exec-id=2100-9618fafd --log-level-console=off --log-level-file=off --log-level-stderr=error --process=1 --remote-type=repo --stanza=p>
             ├─2103 /usr/bin/pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --exec-id=2100-9618fafd --log-level-console=off --log-level-file=off --log-level-stderr=error --process=2 --remote-type=repo --stanza=p>
             ├─2104 ssh -o LogLevel=error -o Compression=no -o PasswordAuthentication=no postgres@192.168.0.3 "/usr/bin/pgbackrest --exec-id=2100-9618fafd --log-level-console=off --log-level-file=off --log-level-st>
             └─2105 ssh -o LogLevel=error -o Compression=no -o PasswordAuthentication=no postgres@192.168.0.3 "/usr/bin/pgbackrest --exec-id=2100-9618fafd --log-level-console=off --log-level-file=off --log-level-st>

Dec 19 19:06:16 ip-192-168-0-2 systemd[1]: Started PostgreSQL high-availability manager.
Dec 19 19:06:16 ip-192-168-0-2 patroni[2094]: 2022-12-19 19:06:16,415 INFO: Selected new etcd server http://192.168.0.1:2379
Dec 19 19:06:16 ip-192-168-0-2 patroni[2094]: 2022-12-19 19:06:16,434 INFO: No PostgreSQL configuration items changed, nothing to reload.
Dec 19 19:06:16 ip-192-168-0-2 patroni[2094]: 2022-12-19 19:06:16,551 INFO: Lock owner: node3; I am db2
Dec 19 19:06:16 ip-192-168-0-2 patroni[2094]: 2022-12-19 19:06:16,633 INFO: trying to bootstrap from leader 'node3'
Dec 19 19:06:16 ip-192-168-0-2 patroni[2100]: 2022-12-19 19:06:16.645 P00   INFO: restore command begin 2.41: --config=/etc/pgbackrest/pgbackrest.conf --delta --exec-id=2100-9618fafd --log-level-console=info --log-l>
Dec 19 19:06:16 ip-192-168-0-2 patroni[2100]: WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/var/lib/postgresql/14/main' to confirm that this is a valid $PGDATA director
Dec 19 19:06:17 ip-192-168-0-2 patroni[2100]: 2022-12-19 19:06:17.361 P00   INFO: repo1: restore backup set 20221213-154604F, recovery will start at 2022-12-13 15:46:04
Notice that it is performing a delta restore, which means pgBackRest automatically identifies which files need to be restored and restores only those, making the whole process faster.
Also, the log file (by default – /var/log/pgbackrest) will contain the restore date and time as below:
-------------------PROCESS START-------------------
2022-12-19 19:06:16.645 P00   INFO: restore command begin 2.41: --config=/etc/pgbackrest/pgbackrest.conf --delta --exec-id=2100-9618fafd --log-level-console=info --log-level-file=detail --pg1-path=/var/lib/postgresql/14/main --process-max=2 --repo1-host=192.168.0.3 --repo1-host-user=postgres --repo1-path=/pgrdbackups --stanza=patroni
2022-12-19 19:06:16.646 P00   WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/var/lib/postgresql/14/main' to confirm that this is a valid $PGDATA directory. --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
2022-12-19 19:06:17.361 P00   INFO: repo1: restore backup set 20221213-154604F, recovery will start at 2022-12-13 15:46:04
2022-12-19 19:06:17.361 P00 DETAIL: check '/var/lib/postgresql/14/main' exists
2022-12-19 19:06:17.362 P00 DETAIL: create path '/var/lib/postgresql/14/main/base'
2022-12-19 19:06:17.362 P00 DETAIL: create path '/var/lib/postgresql/14/main/base/1'
2022-12-19 19:06:17.362 P00 DETAIL: create path '/var/lib/postgresql/14/main/base/13760'
2022-12-19 19:06:17.362 P00 DETAIL: create path '/var/lib/postgresql/14/main/base/13761'
This blog mainly focuses on integrating pgBackRest with Patroni; however, other backup tools such as WAL-E or Barman can also be used to rebuild the nodes. More information on such configurations can be found in the Patroni documentation.
Conclusion
Patroni and pgBackRest work best when integrated, which helps reduce the load on the primary DB server. This integration, where a node is reinitialized using pgBackRest, makes optimum use of the dedicated backup server. If a Patroni setup is already in place and a dedicated backup host is available, a few configuration changes in patroni.yml can work wonders.
PostgreSQL HA with Patroni: Your Turn to Test Failure Scenarios
A couple of weeks ago, Jobin and I did a short presentation during Percona Live Online bearing a similar title to the one for this post: “PostgreSQL HA With Patroni: Looking at Failure Scenarios and How the Cluster Recovers From Them”. We deployed a three-node PostgreSQL environment with some recycled hardware we had lying around and set about “breaking” it in different ways: unplugging network and power cables, killing main processes, attempting to saturate processors. All of this while continuously writing and reading data from PostgreSQL. The idea was to see how Patroni would handle the failures and manage the cluster to continue delivering service. It was a fun demo!
We promised a follow-up post explaining how we set up the environment, so you could give it a try yourselves, and this is it. We hope you also have fun attempting to reproduce our small experiment, but mostly that you use it as an opportunity to learn how a PostgreSQL HA environment managed by Patroni works in practice: there is nothing like a hands-on lab for this!
Initial Setup
We recycled three 10-year-old Intel Atom mini-computers for our experiment, but you could use virtual machines instead: even though you will miss the excitement of unplugging real cables, this can still be simulated with a VM. We installed the server version of Ubuntu 20.04 and configured the machines to know “each other” by hostname; here's what the hosts file of the first node looked like:
$ cat /etc/hosts
127.0.0.1 localhost node1
192.168.1.11 node1
192.168.1.12 node2
192.168.1.13 node3
etcd
Patroni supports a myriad of systems for its Distributed Configuration Store, but etcd remains a popular choice. We installed the version available from the Ubuntu repository on all three nodes:
sudo apt-get install etcd
It is necessary to initialize the etcd cluster from one of the nodes and we did that from node1 using the following configuration file:
$ cat /etc/default/etcd
ETCD_NAME=node1
ETCD_INITIAL_CLUSTER="node1=http://192.168.1.11:2380"
ETCD_INITIAL_CLUSTER_TOKEN="devops_token"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.11:2380"
ETCD_DATA_DIR="/var/lib/etcd/postgresql"
ETCD_LISTEN_PEER_URLS="http://192.168.1.11:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.11:2379,http://localhost:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.11:2379"
Note how ETCD_INITIAL_CLUSTER_STATE is defined with “new”.
We then restarted the service:
sudo systemctl restart etcd
We can then move on to install etcd on node2. The configuration file follows the same structure as that of node1, except that we are adding node2 to an existing cluster so we should indicate the other node(s):
ETCD_NAME=node2
ETCD_INITIAL_CLUSTER="node1=http://192.168.1.11:2380,node2=http://192.168.1.12:2380"
ETCD_INITIAL_CLUSTER_TOKEN="devops_token"
ETCD_INITIAL_CLUSTER_STATE="existing"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.12:2380"
ETCD_DATA_DIR="/var/lib/etcd/postgresql"
ETCD_LISTEN_PEER_URLS="http://192.168.1.12:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.12:2379,http://localhost:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.12:2379"
Before we restart the service, we need to formally add node2 to the etcd cluster by running the following command on node1:
sudo etcdctl member add node2 http://192.168.1.12:2380
We can then restart the etcd service on node2:
sudo systemctl restart etcd
The configuration file for node3 looks like this:
ETCD_NAME=node3
ETCD_INITIAL_CLUSTER="node1=http://192.168.1.11:2380,node2=http://192.168.1.12:2380,node3=http://192.168.1.13:2380"
ETCD_INITIAL_CLUSTER_TOKEN="devops_token"
ETCD_INITIAL_CLUSTER_STATE="existing"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.13:2380"
ETCD_DATA_DIR="/var/lib/etcd/postgresql"
ETCD_LISTEN_PEER_URLS="http://192.168.1.13:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.13:2379,http://localhost:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.13:2379"
Remember we need to add node3 to the cluster by running the following command on node1:
sudo etcdctl member add node3 http://192.168.1.13:2380
before we can restart the service on node3:
sudo systemctl restart etcd
We can verify the cluster state to confirm it has been deployed successfully by running the following command from any of the nodes:
$ sudo etcdctl member list
2ed43136d81039b4: name=node3 peerURLs=http://192.168.1.13:2380 clientURLs=http://192.168.1.13:2379 isLeader=false
d571a1ada5a5afcf: name=node1 peerURLs=http://192.168.1.11:2380 clientURLs=http://192.168.1.11:2379 isLeader=true
ecec6c549ebb23bc: name=node2 peerURLs=http://192.168.1.12:2380 clientURLs=http://192.168.1.12:2379 isLeader=false
As we can see above, node1 is the leader at this point, which is expected since the etcd cluster has been bootstrapped from it. If you get a different result, check for etcd entries logged to /var/log/syslog on each node.
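Besides listing the members, the etcd v2 tooling used here can also report overall cluster health, which is a handy cross-check before moving on; a sketch, runnable from any node:

sudo etcdctl cluster-health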
Watchdog
Quoting Patroni’s manual:
Watchdog devices are software or hardware mechanisms that will reset the whole system when they do not get a keepalive heartbeat within a specified timeframe. This adds an additional layer of fail safe in case usual Patroni split-brain protection mechanisms fail.
While the use of a watchdog mechanism with Patroni is optional, you shouldn’t really consider deploying a PostgreSQL HA environment in production without it.
For our tests, we used the standard software implementation for watchdog that is shipped with Ubuntu 20.04, a module called softdog. Here’s the procedure we used in all three nodes to configure the module to load:
sudo sh -c 'echo "softdog" >> /etc/modules'
Patroni will be the component interacting with the watchdog device. Since Patroni is run by the postgres user, we need to either set the permissions of the watchdog device open enough so the postgres user can write to it or make the device owned by postgres itself, which we consider a safer approach (as it is more restrictive):
sudo sh -c 'echo "KERNEL==\"watchdog\", OWNER=\"postgres\", GROUP=\"postgres\"" >> /etc/udev/rules.d/61-watchdog.rules'
These two steps looked like all that would be required for watchdog to work, but to our surprise, the softdog module wasn't loaded after restarting the servers. After spending quite some time digging around, we figured out that the module was blacklisted by default and there was a stray file with such a directive still lingering around:
$ grep blacklist /lib/modprobe.d/* /etc/modprobe.d/* | grep softdog
/lib/modprobe.d/blacklist_linux_5.4.0-72-generic.conf:blacklist softdog
Editing that file in each of the nodes to remove the line above and restarting the servers did the trick:
$ lsmod | grep softdog
softdog                16384  0
$ ls -l /dev/watchdog*
crw-rw---- 1 postgres postgres  10, 130 May 21 21:30 /dev/watchdog
crw------- 1 root     root     245,   0 May 21 21:30 /dev/watchdog0
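As a side note, if you would rather not reboot to apply the module and udev changes, loading the module and re-triggering the udev rules by hand should get you to the same state; a sketch (in our tests we simply restarted the servers):

sudo modprobe softdog
sudo udevadm trigger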
PostgreSQL
Percona Distribution for PostgreSQL can be installed from the Percona Repository in a few easy steps:
sudo apt-get update -y; sudo apt-get install -y wget gnupg2 lsb-release curl
wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
sudo dpkg -i percona-release_latest.generic_all.deb
sudo apt-get update
sudo percona-release setup ppg-12
sudo apt-get install percona-postgresql-12
An important concept to understand in a PostgreSQL HA environment like this one is that PostgreSQL should not be started automatically by systemd during the server initialization: we should leave it to Patroni to fully manage it, including the process of starting and stopping the server. Thus, we should disable the service:
sudo systemctl disable postgresql
For our tests, we want to start with a fresh new PostgreSQL setup and let Patroni bootstrap the cluster, so we stop the server and remove the data directory that has been created as part of the PostgreSQL installation:
sudo systemctl stop postgresql
sudo rm -fr /var/lib/postgresql/12/main
These steps should be repeated in nodes 2 and 3 as well.
Patroni
The Percona Repository also includes a package for Patroni, so with the repository already configured on the nodes, we can install Patroni with a simple:
sudo apt-get install percona-patroni
Here’s the configuration file we have used for node1:
$ cat /etc/patroni/config.yml
scope: stampede
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: node1:8008

etcd:
  host: node1:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
#    master_start_timeout: 300
#    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        logging_collector: 'on'
        max_wal_senders: 5
        max_replication_slots: 5
        wal_log_hints: "on"
#        archive_mode: "on"
#        archive_timeout: 600
#        archive_command: "cp -f %p /home/postgres/archived/%f"
#      recovery_conf:
#        restore_command: cp /home/postgres/archived/%f %p

  # some desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 192.168.1.1/24 md5
  - host replication replicator 127.0.0.1/32 trust
  - host all all 192.168.1.1/24 md5
  - host all all 0.0.0.0/0 md5
#  - hostssl all all 0.0.0.0/0 md5

  # Additional script to be launched after initial cluster creation (will be passed the connection URL as parameter)
#  post_init: /usr/local/bin/setup_cluster.sh

  # Some additional users which need to be created after initializing new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: "/var/lib/postgresql/12/main"
  bin_dir: "/usr/lib/postgresql/12/bin"
#  config_dir:
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: vagrant
    superuser:
      username: postgres
      password: vagrant
  parameters:
    unix_socket_directories: '/var/run/postgresql'

watchdog:
  mode: required  # Allowed values: off, automatic, required
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
With the configuration file in place, and now that we already have the etcd cluster up, all that is required is to restart the Patroni service:
sudo systemctl restart patroni
When Patroni starts, it will take care of initializing PostgreSQL (because the service is not currently running and the data directory is empty) following the directives in the bootstrap section of Patroni’s configuration file. If everything went according to the plan, you should be able to connect to PostgreSQL using the credentials in the configuration file (password is vagrant):
$ psql -U postgres
psql (12.6 (Ubuntu 2:12.6-2.focal))
Type "help" for help.

postgres=#
Repeat the operation to install Patroni on nodes 2 and 3: the only difference is that you will need to replace the references to node1 in the configuration file (there are four of them: the name setting, the restapi and postgresql connect_address settings, and the etcd host setting) with the respective node name.
You can also check the state of the Patroni cluster we just created with:
$ sudo patronictl -c /etc/patroni/config.yml list
+----------+--------+-------+--------+---------+----+-----------+
| Cluster  | Member | Host  | Role   | State   | TL | Lag in MB |
+----------+--------+-------+--------+---------+----+-----------+
| stampede | node1  | node1 | Leader | running |  2 |           |
| stampede | node2  | node2 |        | running |  2 |         0 |
| stampede | node3  | node3 |        | running |  2 |         0 |
+----------+--------+-------+--------+---------+----+-----------+
node1 started the Patroni cluster so it was automatically made the leader – and thus the primary/master PostgreSQL server. Nodes 2 and 3 are configured as read replicas (as the hot_standby option was enabled in Patroni’s configuration file).
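You can also confirm the streaming replication from PostgreSQL's own point of view by querying the primary; a quick check, to be run on node1 (the current leader):

sudo -u postgres psql -c "select client_addr, state, sync_state from pg_stat_replication;"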
HAProxy
A common implementation of high availability in a PostgreSQL environment makes use of a proxy: instead of connecting directly to the database server, the application connects to the proxy, which forwards the request to PostgreSQL. When HAproxy is used for this, it is also possible to route read requests to one or more replicas for load balancing. However, this is not a transparent process: the application needs to be aware of it and split read-only from read-write traffic itself. With HAproxy, this is done by providing two different ports for the application to connect to. We opted for the following setup:
- Writes → 5000
- Reads → 5001
HAproxy can be installed as an independent server (and you can have as many as you want) but it can also be installed on the application server or the database server itself – it is a light enough service. For our tests, we planned on using our own Linux workstations (which also run Ubuntu 20.04) to simulate application traffic so we installed HAproxy on them:
sudo apt-get install haproxy
With the software installed, we modified the main configuration file as follows:
$ cat /etc/haproxy/haproxy.cfg
global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen primary
    bind *:5000
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 node1:5432 maxconn 100 check port 8008
    server node2 node2:5432 maxconn 100 check port 8008
    server node3 node3:5432 maxconn 100 check port 8008

listen standbys
    balance roundrobin
    bind *:5001
    option httpchk OPTIONS /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 node1:5432 maxconn 100 check port 8008
    server node2 node2:5432 maxconn 100 check port 8008
    server node3 node3:5432 maxconn 100 check port 8008
Note there are two sections: primary, using port 5000, and standbys, using port 5001. All three nodes are included in both sections because they are all potential candidates to be either primary or secondary. For HAproxy to know which role each node currently has, it sends an HTTP request to port 8008 of the node, and Patroni answers. Patroni provides built-in REST API support for health check monitoring that integrates perfectly with HAproxy for this:
$ curl -s http://node1:8008
{"state": "running", "postmaster_start_time": "2021-05-24 14:50:11.707 UTC", "role": "master", "server_version": 120006, "cluster_unlocked": false, "xlog": {"location": 25615248}, "timeline": 1, "database_system_identifier": "6965869170583425899", "patroni": {"version": "1.6.4", "scope": "stampede"}}
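HAproxy only looks at the HTTP status code, not the JSON body: the role-specific endpoints return 200 when the node currently holds that role and an error status otherwise. You can mimic the checks yourself; a sketch, assuming node1 is still the leader (the /master request should return 200 and the /replica request should not):

curl -s -o /dev/null -w '%{http_code}\n' http://node1:8008/master
curl -s -o /dev/null -w '%{http_code}\n' http://node1:8008/replica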
We configured the standbys group to balance read requests in a round-robin fashion, so each connection request (or reconnection) will alternate between the available replicas. We can test this in practice; let's first save the postgres user password in a file to facilitate the process:
echo "localhost:5000:postgres:postgres:vagrant" > ~/.pgpass echo "localhost:5001:postgres:postgres:vagrant" >> ~/.pgpass chmod 0600 ~/.pgpass
We can then execute two read-requests to verify the round-robin mechanism is working as intended:
$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
 192.168.1.13

$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
 192.168.1.12
as well as test the writer access:
$ psql -Upostgres -hlocalhost -p5000 -t -c "select inet_server_addr()"
 192.168.1.11
You can also check the state of HAproxy by visiting http://localhost:7000/ on your browser.
Workload
To best simulate a production environment for testing our failure scenarios, we wanted continuous reads and writes hitting the database. We could have used a benchmark tool such as Sysbench or pgbench, but we were more interested in observing how the source server switches upon a failure than in the load itself. Jobin wrote a simple Python script that is perfect for this, HAtester. As was the case with HAproxy, we run the script from our Linux workstation. Since it is a Python script, you need to have a PostgreSQL driver for Python installed to execute it:
sudo apt-get install python3-psycopg2
curl -LO https://raw.githubusercontent.com/jobinau/pgscripts/main/patroni/HAtester.py
chmod +x HAtester.py
Edit the script with the credentials to access the PostgreSQL servers (through HAproxy) if you are using different settings from ours. The only requirement for it to work is to have the target table created beforehand, so first connect to the postgres database (unless you are using a different target) in the Primary and run:
CREATE TABLE HATEST (TM TIMESTAMP);
You can then start two different sessions:
- One for writes:
./HAtester.py 5000
- One for reads:
./HAtester.py 5001
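Once both sessions are running, a quick sanity check from a third psql session through the writer port confirms that rows are landing in the HATEST table created earlier; a sketch:

psql -Upostgres -hlocalhost -p5000 -c "select count(*), max(tm) from hatest;"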
The idea is to observe what happens with database traffic when the environment experiences a failure; that is, how HAproxy will route reads and writes as Patroni adjusts the PostgreSQL cluster. You can continuously monitor Patroni from the point of view of the nodes by opening a session in each of them and running the following command:
sudo -u postgres watch patronictl -c /etc/patroni/config.yml list
To facilitate observability and better follow the changes in real-time, we used the terminal multiplexer Tmux to visualize all 5 sessions on the same screen:
- On the left side, we have one session open for each of the 3 nodes, continuously running:
sudo -u postgres watch patronictl -c /etc/patroni/config.yml list
It’s better to have the Patroni view for each node independently because when you start the failure tests you will lose connection to a part of the cluster.
- On the right side, we are executing the HAtester.py script from our workstation:
- Sending writes through port 5000:
./HAtester.py 5000
- and reads through port 5001:
./HAtester.py 5001
A couple of notes on the execution of the HAtester.py script:
- Pressing Ctrl+C will break the connection, but the script will reconnect, this time to a different replica (in the case of reads), since the standbys group on HAproxy is configured with round-robin balancing.
- When a switchover or failover takes place and the nodes are rearranged in the cluster, you may temporarily see writes sent to a node that used to be a replica and was just promoted to primary, and reads sent to a node that used to be the primary and was just demoted to replica. That is a limitation of the HAtester.py script, but it is “by design”: we favored faster reconnections and minimal checks on the node's role for demonstration purposes. In a production application, this part ought to be implemented differently.
Testing Failure Scenarios
The fun part starts now! We leave it to you to test and play around to see what happens with the PostgreSQL cluster in practice following a failure. As suggestions, here are the tests we did in our presentation. For each failure scenario, observe how the cluster re-adjusts itself and the impact on read and write traffic.
1) Loss of Network Communication
- Unplug the network cable from one of the nodes (or simulate this condition in your VM):
- First from a replica
- Then from the primary
- Unplug the network cable from one replica and the primary at the same time:
- Does Patroni experience a split-brain situation?
2) Power Outage
- Unplug the power cable from the primary
- Wait until the cluster has re-adjusted, then plug the power cable back in and start the node
3) SEGFAULT
Simulate an OOM/crash by killing the postmaster process in one of the nodes with kill -9.
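One way to do this, assuming the data directory used in this setup (the first line of postmaster.pid holds the postmaster's PID):

sudo kill -9 $(sudo head -1 /var/lib/postgresql/12/main/postmaster.pid)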
4) Killing Patroni
Remember that Patroni is managing PostgreSQL. What happens if the Patroni process (and not PostgreSQL) is killed?
5) CPU Saturation
Simulate CPU saturation with a benchmark tool such as Sysbench, for example:
sysbench cpu --threads=10 --time=0 run
This one is a bit tricky, as the reads and writes are each single-threaded operations. You may need to decrease the priority of the HAtester.py processes with renice, and possibly increase that of Sysbench's.
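A rough sketch of what that could look like, run on whichever machine hosts each process (the nice values are just examples):

# lower the priority of the HAtester.py sessions (higher nice value)
sudo renice -n 10 -p $(pgrep -f HAtester.py)
# raise the priority of the Sysbench workers (lower nice value)
sudo renice -n -5 -p $(pgrep -f "sysbench cpu")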
6) Manual Switchover
Patroni facilitates changes in the PostgreSQL hierarchy. Switchover operations can be scheduled; the command below is interactive and will prompt you with options:
sudo -u postgres patronictl -c /etc/patroni/config.yml switchover
Alternatively, you can be specific and tell Patroni exactly what to do:
sudo -u postgres patronictl -c /etc/patroni/config.yml switchover --master node1 --candidate node2 --force
We hope you had fun with this hands-on lab! If you have questions or comments, leave us a note in the comments section below!