Jan 11, 2022

Creating a Standby Cluster With the Percona Distribution for PostgreSQL Operator

A customer recently asked if our Percona Distribution for PostgreSQL Operator supports the deployment of a standby cluster, which they need as part of their Disaster Recovery (DR) strategy. The answer is yes – as long as you are making use of an object storage system for backups, such as AWS S3 or GCP Cloud Storage buckets, that can be accessed by the standby cluster. In a nutshell, it works like this:

  • The primary cluster is configured with pgBackRest to take backups and store them alongside archived WAL files in a remote repository;
  • The standby cluster is built from one of these backups and it is kept in sync with the primary cluster by consuming the WAL files that are copied from the remote repository.

Note that the primary node in the standby cluster is not a streaming replica of any of the nodes in the primary cluster; it relies on archived WAL files to replicate events. For this reason, this approach cannot be used as a High Availability (HA) solution. Even though the primary use of a standby cluster in this context is DR, it can also be employed for migrations.

So, how can we create a standby cluster using the Percona operator? We will show you next. But first, let’s create a primary cluster for our example.

Creating a Primary PostgreSQL Cluster Using the Percona Operator

You will find a detailed procedure on how to deploy a PostgreSQL cluster using the Percona operator in our online documentation. Here we want to highlight the main steps involved, particularly the configuration of object storage, which is a crucial requirement and is best done during the initial deployment of the cluster. In the following example, we will deploy our clusters using the Google Kubernetes Engine (GKE), but you can find similar instructions for other environments in the previous link.

Assuming you have a Google account configured, as well as the gcloud (from the Google Cloud SDK suite) and kubectl command-line tools installed, authenticate yourself with gcloud auth login, and off we go!
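
If your gcloud environment is not yet pre-configured, a minimal sketch of that initial setup could look like this (the project ID below is hypothetical; replace it with your own):

gcloud auth login
gcloud config set project my-gcp-project
gcloud config set compute/zone us-central1-a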

Creating a GKE Cluster and Basic Configuration

The following command will create a default cluster named “cluster-1” and composed of three nodes. We are creating it in the us-central1-a zone using e2-standard-4 VMs but you may choose different options. In fact, you may also need to indicate the project name and other main settings if you do not have your gcloud environment pre-configured with them:

gcloud container clusters create cluster-1 --preemptible --machine-type e2-standard-4 --num-nodes=3 --zone us-central1-a

Once the cluster is created, use your IAM identity to control access to this new cluster:

kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value core/account)

Finally, create the pgo namespace:

kubectl create namespace pgo

and set the current context to refer to this new namespace:

kubectl config set-context $(kubectl config current-context) --namespace=pgo

Creating a Cloud Storage Bucket

Remember that for this setup we need a Google Cloud Storage bucket as well as a Service Account with the necessary privileges/roles to access it. The procedures to create these vary according to how your environment is configured, so we won’t be covering them here. Please refer to the Google Cloud Storage documentation for the exact steps. The bucket we created for the example in this post was named cluster1-backups-and-wals.
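
For reference, a bucket like ours could be created with a one-liner along these lines (a sketch, assuming the gsutil tool from the Google Cloud SDK is configured):

gsutil mb -l us-central1 gs://cluster1-backups-and-wals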

Likewise, please refer to the Creating and managing service account keys documentation to learn how to create a Service Account and download the corresponding key in JSON format – we will need to provide it to the operator so our PostgreSQL clusters can access the storage bucket.
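
As a hedged sketch only (the account name, project ID, and role below are illustrative; your security requirements may call for narrower permissions), the gcloud commands involved look roughly like this:

gcloud iam service-accounts create cluster1-pgbackrest
gsutil iam ch serviceAccount:cluster1-pgbackrest@my-gcp-project.iam.gserviceaccount.com:roles/storage.objectAdmin gs://cluster1-backups-and-wals
gcloud iam service-accounts keys create your-service-account-key-file.json --iam-account=cluster1-pgbackrest@my-gcp-project.iam.gserviceaccount.com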

Creating the Kubernetes Secrets File to Access the Storage Bucket

Create a file named my-gcs-account-secret.yaml with the following structure:

apiVersion: v1
kind: Secret
metadata:
  name: cluster1-backrest-repo-config
type: Opaque
data:
  gcs-key: <VALUE>

replacing the <VALUE> placeholder with the output of the following command, according to the OS you are using:

Linux:

base64 --wrap=0 your-service-account-key-file.json

macOS:

base64 your-service-account-key-file.json
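
Alternatively, if you prefer to skip the manual base64 step, kubectl can build an equivalent secret directly from the key file (a sketch using the same names as above; kubectl encodes the file contents for you):

kubectl create secret generic cluster1-backrest-repo-config --from-file=gcs-key=your-service-account-key-file.json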

Installing and Deploying the Operator

The most practical way to install our operator is by cloning the Git repository, and then moving inside its directory:

git clone -b v1.1.0 https://github.com/percona/percona-postgresql-operator
cd percona-postgresql-operator

The following command will deploy the operator:

kubectl apply -f deploy/operator.yaml

We have already prepared the secrets file to access the storage bucket so we can apply it now:

kubectl apply -f my-gcs-account-secret.yaml

Now, all that is left is to customize the storages options in the deploy/cr.yaml file to indicate the use of the GCS bucket as follows:

    storages:
      my-gcs:
        type: gcs
        bucket: cluster1-backups-and-wals

We can now deploy the primary PostgreSQL cluster (cluster1):

kubectl apply -f deploy/cr.yaml

Once the operator has been deployed, you can run the following command to do some housekeeping:

kubectl delete -f deploy/operator.yaml

Creating a Standby PostgreSQL Cluster Using the Percona Operator

After this long preamble, let’s look at what brought you here: how to deploy a standby cluster, which we will refer to as cluster2, that will replicate from the primary cluster.

Copying the Secrets Over

Considering you have probably customized the passwords used in your primary cluster, and that they differ from the default values found in the operator’s Git repository, we need to make a copy of the secrets files, adjusted to the standby cluster’s name. The following procedure facilitates this task, saving the secrets files under /tmp/cluster1-cluster2-secrets (you can choose a different target directory):

NOTE: make sure you have the yq tool installed on your system.

mkdir -p /tmp/cluster1-cluster2-secrets/
export primary_cluster_name=cluster1
export standby_cluster_name=cluster2
export secrets="${primary_cluster_name}-users"
kubectl get secret/$secrets -o yaml \
| yq eval 'del(.metadata.creationTimestamp)' - \
| yq eval 'del(.metadata.uid)' - \
| yq eval 'del(.metadata.selfLink)' - \
| yq eval 'del(.metadata.resourceVersion)' - \
| yq eval 'del(.metadata.namespace)' - \
| yq eval 'del(.metadata.annotations."kubectl.kubernetes.io/last-applied-configuration")' - \
| yq eval '.metadata.name = "'"${secrets/$primary_cluster_name/$standby_cluster_name}"'"' - \
| yq eval '.metadata.labels.pg-cluster = "'"${standby_cluster_name}"'"' - \
>/tmp/cluster1-cluster2-secrets/${secrets/$primary_cluster_name/$standby_cluster_name}
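
Before moving on, you can sanity-check the rewritten secret with the same yq tool (an illustrative check; it should print the adjusted name and pg-cluster label):

yq eval '.metadata.name, .metadata.labels.pg-cluster' /tmp/cluster1-cluster2-secrets/cluster2-users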

Deploying the Standby Cluster: Fast Mode

Since we have already covered the procedure used to create the primary cluster in detail in a previous section, we will present the essential steps to create the standby cluster below, providing additional comments only when necessary.

NOTE: the commands below are issued from inside the percona-postgresql-operator directory hosting the git repository for our operator.

Deploying a New GKE Cluster Named cluster-2

This time we are using the us-west1-b zone:

gcloud container clusters create cluster-2 --preemptible --machine-type e2-standard-4 --num-nodes=3 --zone us-west1-b
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value core/account)
kubectl create namespace pgo
kubectl config set-context $(kubectl config current-context) --namespace=pgo
kubectl apply -f deploy/operator.yaml

Apply the Adjusted Kubernetes Secrets:

export standby_cluster_name=cluster2
export secrets="${standby_cluster_name}-users"
kubectl create -f /tmp/cluster1-cluster2-secrets/$secrets

The procedure above does not include the GCS secret file; the key contents remain the same, but the secret name used by the backrest-repo pod needs to be adjusted. Make a copy of that file:

cp my-gcs-account-secret.yaml my-gcs-account-secret-2.yaml

then edit the copy to indicate “cluster2-” instead of “cluster1-”:

name: cluster2-backrest-repo-config

You can apply it now:

kubectl apply -f my-gcs-account-secret-2.yaml

The cr.yaml File of the Standby Cluster

Let’s make a copy of the cr.yaml file we customized for the primary cluster:

cp deploy/cr.yaml deploy/cr-2.yaml

and edit the copy as follows:

1) Change all references (that are not commented) from cluster1 to cluster2 – including current-primary but excluding the bucket reference, which in our example is prefixed with “cluster1-”; the storages section must remain unchanged. (We know it is not very practical to replace so many references by hand; we still need to improve this part of the routine. A scripted sketch is shown after this list.)

2) Enable the standby option:

standby: true

3) Provide a repoPath that points to the GCS bucket used by the primary cluster (just below the storages section, which should remain the same as in the primary cluster’s cr.yaml file):

repoPath: "/backrestrepo/cluster1-backrest-shared-repo"
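
If you want to script step 1, a rough sed-based sketch like the one below handles the bulk of the renaming; it assumes a cr.yaml customized like ours, so review the result carefully (in particular the bucket name, which must keep its original “cluster1-” prefix) before applying it:

sed 's/cluster1/cluster2/g' deploy/cr.yaml > deploy/cr-2.yaml
sed -i 's/bucket: cluster2-backups-and-wals/bucket: cluster1-backups-and-wals/' deploy/cr-2.yaml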

And that’s it! All that is left now is to deploy the standby cluster:

kubectl apply -f deploy/cr-2.yaml

With everything working on the standby cluster, do some housekeeping:

kubectl delete -f deploy/operator.yaml

Verifying It All Works as Expected

Remember that the standby cluster is created from a backup and relies on archived WAL files to be kept in sync with the primary cluster. If you make a change in the primary cluster, such as adding a row to a table, that change won’t reach the standby cluster until the WAL file it was recorded in is archived and consumed by the standby cluster.

When checking if all is working with the new setup, you can force the rotation of the WAL file (and the subsequent archiving of the previous one) on the primary node of the primary cluster to accelerate the sync process by issuing:

psql> SELECT pg_switch_wal();
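
As a rough end-to-end check, you could insert a row on the primary, force the WAL switch, and confirm the row eventually shows up on the standby (the pod names below are placeholders; adjust them to your deployment):

kubectl exec -it <cluster1-primary-pod> -- psql -U postgres -c "CREATE TABLE dr_check (ts timestamp); INSERT INTO dr_check VALUES (now()); SELECT pg_switch_wal();"

kubectl exec -it <cluster2-primary-pod> -- psql -U postgres -c "SELECT * FROM dr_check;"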

The Percona Kubernetes Operators automate the creation, alteration, or deletion of members in your Percona Distribution for MySQL, MongoDB, or PostgreSQL environment.

Learn More About Percona Kubernetes Operators

Dec 20, 2021

PMM Now Supports Monitoring of PostgreSQL Instances Connecting With Any Database (Name)

The recent release of Percona Monitoring and Management 2.25.0 (PMM) includes a fix for bug PMM-6937: before that, PMM expected all monitoring connections to PostgreSQL servers to be made using the default postgres database. This worked well for most deployments; however, some DBaaS providers, like Heroku and DigitalOcean, do not provide direct access to the postgres database (at least not by default) and instead create custom databases for each of their instances. Starting with this new release of PMM, it is possible to specify the name of the database the Postgres exporter should connect to, both through the pmm-admin command-line tool as well as when configuring the monitoring of a remote PostgreSQL instance directly in the PMM web interface.

Secure Connections

Most DBaaS providers enforce, or even restrict, access to their instances to SSL (well, in fact, TLS) client connections – and that’s a good thing! Just pay attention to this detail when you configure the monitoring of these instances in PMM. If, when you configure your instance, the system returns a red-box alert saying:

Connection check failed: pq: no pg_hba.conf entry for host "123.123.123.123", user "pmm", database "mydatabase", SSL off.

and your firewall allows external connections to your database, chances are the problem lies in the last part: “SSL off”.
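
One quick way to rule out problems other than PMM’s own settings is to test a TLS connection to the instance yourself with psql (a sketch reusing the connection details from the example further below):

psql "host=my.dbaas.domain.com port=12345 user=pmmuser dbname=customdb sslmode=require"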

Web Interface

When configuring the monitoring of a remote PostgreSQL instance in the PMM web interface, make sure you tick the box for using TLS connections:

But note that checking this box is not enough; you have to complement this setting with one of two options:

a) Provide the TLS certificates and key, using the forms that appear once you click on the “Use TLS” check-box:

  • TLS CA
  • TLS certificate key
  • TLS certificate

The PMM documentation explains this further and more details about server and client certificates for PostgreSQL connection can be found in the SSL Support chapter of its online manual.

b) Opt to not provide TLS certificates, leaving the forms empty and checking the “Skip TLS” check-box:

What should rule this choice is the SSL mode the PostgreSQL server requires for client connections. When DBaaS providers enforce secure connections, this usually means sslmode=require, and option b above is enough. Only when the PostgreSQL server employs the more restrictive verify-ca or verify-full modes will you need to go with option a and provide the TLS certificates and key.

Command-Line Tool

When configuring the monitoring of a PostgreSQL server using the pmm-admin command-line tool, the same options are available and the PMM documentation covers them as well. Here’s a quick example to illustrate configuring the monitoring for a remote PostgreSQL server when sslmode=require and providing a custom database name:

$ pmm-admin add postgresql --server-url=https://admin:admin@localhost:443 --server-insecure-tls --host=my.dbaas.domain.com --port=12345 --username=pmmuser --password=mYpassword --database=customdb --tls --tls-skip-verify

Note that option server-insecure-tls applies to the connection with the PMM server itself; the options prefixed with tls are the ones that apply to the connection to the PostgreSQL database.

Tracking Stats

You may have also noticed in the release notes that PMM 2.25.0 provides support for the release candidate of our own pg_stat_monitor. It is unlikely, however, that this extension will be made available by DBaaS providers anytime soon, so for now you need to stick with pg_stat_statements:

Make sure this extension is loaded for your instance’s custom database; otherwise, PMM won’t be able to track many important stats. That is already the case for all non-hobby Heroku Postgres plans, but for other providers, like DigitalOcean, it needs to be created beforehand:

defaultdb=> select count(*) from pg_stat_statements;
ERROR:  relation "pg_stat_statements" does not exist
LINE 1: select count(*) from pg_stat_statements;
                             ^
defaultdb=> CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION

defaultdb=> select count(*) from pg_stat_statements;
 count 
-------
    60
(1 row)

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Jun 11, 2021

PostgreSQL HA with Patroni: Your Turn to Test Failure Scenarios

A couple of weeks ago, Jobin and I did a short presentation during Percona Live Online bearing a similar title to the one of this post: “PostgreSQL HA With Patroni: Looking at Failure Scenarios and How the Cluster Recovers From Them”. We deployed a 3-node PostgreSQL environment with some recycled hardware we had lying around and set ourselves to “breaking” it in different ways: by unplugging network and power cables, killing main processes, and attempting to saturate the processors. All of this while continuously writing and reading data from PostgreSQL. The idea was to see how Patroni would handle the failures and manage the cluster to continue delivering service. It was a fun demo!

We promised a follow-up post explaining how we set up the environment, so you could give it a try yourselves, and this is it. We hope you also have fun attempting to reproduce our small experiment, but mostly that you use it as an opportunity to learn how a PostgreSQL HA environment managed by Patroni works in practice: there is nothing like a hands-on lab for this!

Initial Setup

We recycled three 10-year-old Intel Atom mini-computers for our experiment, but you could use virtual machines instead: even though you will miss the excitement of unplugging real cables, this can still be simulated with a VM. We installed the server version of Ubuntu 20.04 and configured the machines to know “each other” by hostname; here’s what the hosts file of the first node looked like:

$ cat /etc/hosts
127.0.0.1 localhost node1
192.168.1.11 node1
192.168.1.12 node2
192.168.1.13 node3

etcd

Patroni supports a myriad of systems for its Distributed Configuration Store (DCS), but etcd remains a popular choice. We installed the version available from the Ubuntu repository on all three nodes:

sudo apt-get install etcd

It is necessary to initialize the etcd cluster from one of the nodes and we did that from node1 using the following configuration file:

$ cat /etc/default/etcd
ETCD_NAME=node1
ETCD_INITIAL_CLUSTER="node1=http://192.168.1.11:2380"
ETCD_INITIAL_CLUSTER_TOKEN="devops_token"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.11:2380"
ETCD_DATA_DIR="/var/lib/etcd/postgresql"
ETCD_LISTEN_PEER_URLS="http://192.168.1.11:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.11:2379,http://localhost:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.11:2379"

Note how ETCD_INITIAL_CLUSTER_STATE is defined with “new”.

We then restarted the service:

sudo systemctl restart etcd

We can then move on to install etcd on node2. The configuration file follows the same structure as that of node1, except that we are adding node2 to an existing cluster so we should indicate the other node(s):

ETCD_NAME=node2
ETCD_INITIAL_CLUSTER="node1=http://192.168.1.11:2380,node2=http://192.168.1.12:2380"
ETCD_INITIAL_CLUSTER_TOKEN="devops_token"
ETCD_INITIAL_CLUSTER_STATE="existing"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.12:2380"
ETCD_DATA_DIR="/var/lib/etcd/postgresql"
ETCD_LISTEN_PEER_URLS="http://192.168.1.12:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.12:2379,http://localhost:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.12:2379"

Before we restart the service, we need to formally add node2 to the etcd cluster by running the following command on node1:

sudo etcdctl member add node2 http://192.168.1.12:2380

We can then restart the etcd service on node2:

sudo systemctl restart etcd

The configuration file for node3 looks like this:

ETCD_NAME=node3
ETCD_INITIAL_CLUSTER="node1=http://192.168.1.11:2380,node2=http://192.168.1.12:2380,node3=http://192.168.1.13:2380"
ETCD_INITIAL_CLUSTER_TOKEN="devops_token"
ETCD_INITIAL_CLUSTER_STATE="existing"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.13:2380"
ETCD_DATA_DIR="/var/lib/etcd/postgresql"
ETCD_LISTEN_PEER_URLS="http://192.168.1.13:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.13:2379,http://localhost:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.13:2379"

Remember we need to add node3 to the cluster by running the following command on node1:

sudo etcdctl member add node3 http://192.168.1.13:2380

before we can restart the service on node3:

sudo systemctl restart etcd

We can verify the cluster state to confirm it has been deployed successfully by running the following command from any of the nodes:

$ sudo etcdctl member list
2ed43136d81039b4: name=node3 peerURLs=http://192.168.1.13:2380 clientURLs=http://192.168.1.13:2379 isLeader=false
d571a1ada5a5afcf: name=node1 peerURLs=http://192.168.1.11:2380 clientURLs=http://192.168.1.11:2379 isLeader=true
ecec6c549ebb23bc: name=node2 peerURLs=http://192.168.1.12:2380 clientURLs=http://192.168.1.12:2379 isLeader=false

As we can see above, node1 is the leader at this point, which is expected since the etcd cluster has been bootstrapped from it. If you get a different result, check for etcd entries logged to /var/log/syslog on each node.
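
As an additional check, this version of etcdctl (speaking the v2 API) can also report on overall cluster health; it should list each member as healthy and finish with a “cluster is healthy” summary:

sudo etcdctl cluster-health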

Watchdog

Quoting Patroni’s manual:

Watchdog devices are software or hardware mechanisms that will reset the whole system when they do not get a keepalive heartbeat within a specified timeframe. This adds an additional layer of fail safe in case usual Patroni split-brain protection mechanisms fail.

While the use of a watchdog mechanism with Patroni is optional, you shouldn’t really consider deploying a PostgreSQL HA environment in production without it.

For our tests, we used the standard software implementation of watchdog that is shipped with Ubuntu 20.04, a kernel module called softdog. Here’s the procedure we used on all three nodes to configure the module to load:

sudo sh -c 'echo "softdog" >> /etc/modules'

Patroni will be the component interacting with the watchdog device. Since Patroni is run by the postgres user, we need to either set the permissions of the watchdog device open enough that the postgres user can write to it, or make the device owned by postgres itself, which we consider the safer approach (as it is more restrictive):

sudo sh -c 'echo "KERNEL==\"watchdog\", OWNER=\"postgres\", GROUP=\"postgres\"" >> /etc/udev/rules.d/61-watchdog.rules'

These two steps looked like all that would be required for the watchdog to work, but to our surprise, the softdog module wasn’t loaded after restarting the servers. After spending quite some time digging around, we figured out the module was blacklisted by default and there was a stray file with such a directive still lingering around:

$ grep blacklist /lib/modprobe.d/* /etc/modprobe.d/* |grep softdog
/lib/modprobe.d/blacklist_linux_5.4.0-72-generic.conf:blacklist softdog

Editing that file in each of the nodes to remove the line above and restarting the servers did the trick:

$ lsmod | grep softdog
softdog                16384  0

$ ls -l /dev/watchdog*
crw-rw---- 1 postgres postgres  10, 130 May 21 21:30 /dev/watchdog
crw------- 1 root     root     245,   0 May 21 21:30 /dev/watchdog0

PostgreSQL

Percona Distribution for PostgreSQL can be installed from the Percona Repository in a few simple steps:

sudo apt-get update -y; sudo apt-get install -y wget gnupg2 lsb-release curl
wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
sudo dpkg -i percona-release_latest.generic_all.deb
sudo apt-get update
sudo percona-release setup ppg-12
sudo apt-get install percona-postgresql-12

An important concept to understand in a PostgreSQL HA environment like this one is that PostgreSQL should not be started automatically by systemd during the server initialization: we should leave it to Patroni to fully manage it, including the process of starting and stopping the server. Thus, we should disable the service:

sudo systemctl disable postgresql

For our tests, we want to start with a fresh new PostgreSQL setup and let Patroni bootstrap the cluster, so we stop the server and remove the data directory that has been created as part of the PostgreSQL installation:

sudo systemctl stop postgresql
sudo rm -fr /var/lib/postgresql/12/main

These steps should be repeated in nodes 2 and 3 as well.

Patroni

The Percona Repository also includes a package for Patroni, so with the repository already configured on the nodes, we can install Patroni with a simple:

sudo apt-get install percona-patroni

Here’s the configuration file we have used for node1:

$ cat /etc/patroni/config.yml
scope: stampede
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: node1:8008

etcd:
  host: node1:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
#    master_start_timeout: 300
#    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        logging_collector: 'on'
        max_wal_senders: 5
        max_replication_slots: 5
        wal_log_hints: "on"
        #archive_mode: "on"
        #archive_timeout: 600
        #archive_command: "cp -f %p /home/postgres/archived/%f"
        #recovery_conf:
        #restore_command: cp /home/postgres/archived/%f %p

  # some desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 192.168.1.1/24 md5
  - host replication replicator 127.0.0.1/32 trust
  - host all all 192.168.1.1/24 md5
  - host all all 0.0.0.0/0 md5
#  - hostssl all all 0.0.0.0/0 md5

  # Additional script to be launched after initial cluster creation (will be passed the connection URL as parameter)
# post_init: /usr/local/bin/setup_cluster.sh
  # Some additional users which need to be created after initializing the new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: "/var/lib/postgresql/12/main"
  bin_dir: "/usr/lib/postgresql/12/bin"
#  config_dir:
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: vagrant
    superuser:
      username: postgres
      password: vagrant
  parameters:
    unix_socket_directories: '/var/run/postgresql'

watchdog:
  mode: required # Allowed values: off, automatic, required
  device: /dev/watchdog
  safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

With the configuration file in place, and now that we already have the etcd cluster up, all that is required is to restart the Patroni service:

sudo systemctl restart patroni

When Patroni starts, it will take care of initializing PostgreSQL (because the service is not currently running and the data directory is empty), following the directives in the bootstrap section of Patroni’s configuration file. If everything went according to plan, you should be able to connect to PostgreSQL using the credentials in the configuration file (the password is vagrant):

$ psql -U postgres
psql (12.6 (Ubuntu 2:12.6-2.focal))
Type "help" for help.

postgres=#

Repeat the operation for installing Patroni on nodes 2 and 3: the only difference is that you will need to replace the references to node1 in the configuration file (there are four of them) with the respective node name.

You can also check the state of the Patroni cluster we just created with:

$ sudo patronictl -c /etc/patroni/config.yml list
+----------+--------+-------+--------+---------+----+-----------+
| Cluster  | Member |  Host |  Role  |  State  | TL | Lag in MB |
+----------+--------+-------+--------+---------+----+-----------+
| stampede | node1  | node1 | Leader | running |  2 |           |
| stampede | node2  | node2 |        | running |  2 |         0 |
| stampede | node3  | node3 |        | running |  2 |         0 |
+----------+--------+-------+--------+---------+----+-----------+

node1 started the Patroni cluster, so it was automatically made the leader – and thus the primary/master PostgreSQL server. Nodes 2 and 3 are configured as read replicas (as the hot_standby option was enabled in Patroni’s configuration file).

HAProxy

A common implementation of high availability in a PostgreSQL environment makes use of a proxy: instead of connecting directly to the database server, the application connects to the proxy, which forwards the request to PostgreSQL. When HAproxy is used for this, it is also possible to route read requests to one or more replicas for load balancing. However, this is not a transparent process: the application needs to be aware of it and split read-only from read-write traffic itself. With HAproxy, this is done by providing two different ports for the application to connect to. We opted for the following setup:

  • Writes → 5000
  • Reads → 5001

HAproxy can be installed as an independent server (and you can have as many as you want) but it can also be installed on the application server or the database server itself – it is a light enough service. For our tests, we planned on using our own Linux workstations (which also run Ubuntu 20.04) to simulate application traffic so we installed HAproxy on them:

sudo apt-get install haproxy

With the software installed, we modified the main configuration file as follows:

$ cat /etc/haproxy/haproxy.cfg
global
    maxconn 100

defaults
    log    global
    mode    tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen primary
    bind *:5000
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 node1:5432 maxconn 100 check port 8008
    server node2 node2:5432 maxconn 100 check port 8008
    server node3 node3:5432 maxconn 100 check port 8008

listen standbys
    balance roundrobin
    bind *:5001
    option httpchk OPTIONS /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 node1:5432 maxconn 100 check port 8008
    server node2 node2:5432 maxconn 100 check port 8008
    server node3 node3:5432 maxconn 100 check port 8008

Note there are two sections: primary, using port 5000, and standbys, using port 5001. All three nodes are included in both sections: that’s because they are all potential candidates to be either primary or secondary. For HAproxy to know which role each node currently has, it will send an HTTP request to port 8008 of the node: Patroni will answer. Patroni provides built-in REST API support for health check monitoring that integrates perfectly with HAproxy for this:

$ curl -s http://node1:8008
{"state": "running", "postmaster_start_time": "2021-05-24 14:50:11.707 UTC", "role": "master", "server_version": 120006, "cluster_unlocked": false, "xlog": {"location": 25615248}, "timeline": 1, "database_system_identifier": "6965869170583425899", "patroni": {"version": "1.6.4", "scope": "stampede"}}

We configured the standbys group to balance read requests in a round-robin fashion, so each connection request (or reconnection) will alternate between the available replicas. We can test this in practice; let’s save the postgres user password in a file to facilitate the process:

echo "localhost:5000:postgres:postgres:vagrant" > ~/.pgpass
echo "localhost:5001:postgres:postgres:vagrant" >> ~/.pgpass
chmod 0600 ~/.pgpass

We can then execute two read requests to verify the round-robin mechanism is working as intended:

$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
 192.168.1.13

$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
 192.168.1.12

as well as test write access:

$ psql -Upostgres -hlocalhost -p5000 -t -c "select inet_server_addr()"
 192.168.1.11

You can also check the state of HAproxy by visiting http://localhost:7000/ on your browser.

Workload

To best simulate a production environment for testing our failure scenarios, we wanted to have continuous reads and writes to the database. We could have used a benchmark tool such as Sysbench or Pgbench, but we were more interested in observing the switch of source server upon a failure than in the load itself. Jobin wrote a simple Python script that is perfect for this, HAtester. As was the case with HAproxy, we run the script from our Linux workstation. Since it is a Python script, you need to have a PostgreSQL driver for Python installed to execute it:

sudo apt-get install python3-psycopg2
curl -LO https://raw.githubusercontent.com/jobinau/pgscripts/main/patroni/HAtester.py
chmod +x HAtester.py

Edit the script with the credentials to access the PostgreSQL servers (through HAproxy) if you are using settings different from ours. The only requirement for it to work is to have the target table created beforehand, so first connect to the postgres database (unless you are using a different target) on the primary and run:

CREATE TABLE HATEST (TM TIMESTAMP);

You can then start two different sessions:

  1. One for writes:

    ./HAtester.py 5000

  2. One for reads:

    ./HAtester.py 5001

The idea is to observe what happens with database traffic when the environment experiences a failure; that is, how HAproxy will route reads and writes as Patroni adjusts the PostgreSQL cluster. You can continuously monitor Patroni from the point of view of the nodes by opening a session in each of them and running the following command:

sudo -u postgres watch patronictl -c /etc/patroni/config.yml list

To facilitate observability and better follow the changes in real time, we used the terminal multiplexer Tmux to visualize all five sessions on the same screen:

  • On the left side, we have one session open for each of the 3 nodes, continuously running:

    sudo -u postgres watch patronictl -c /etc/patroni/config.yml list

    It’s better to have the Patroni view for each node independently because when you start the failure tests you will lose connection to a part of the cluster.

  • On the right side, we are executing the HAtester.py script from our workstation:
    • Sending writes through port 5000:

      ./HAtester.py 5000
    • and reads through port 5001:

      ./HAtester.py 5001

A couple of notes on the execution of the HAtester.py script:

  • Pressing Ctrl+C will break the connection, but the script will reconnect, this time to a different replica (in the case of reads), since the standbys group on HAproxy is configured with round-robin balancing.
  • When a switchover or failover takes place and the nodes are rearranged in the cluster, you may temporarily see writes sent to a node that used to be a replica and was just promoted to primary, and reads sent to a node that used to be the primary and was demoted to secondary: that’s a limitation of the HAtester.py script, but “by design”; we favored faster reconnections and minimal checks on the node’s role for demonstration purposes. In a production application, this part ought to be implemented differently.

Testing Failure Scenarios

The fun part starts now! We leave it to you to test and play around to see what happens with the PostgreSQL cluster in practice following a failure. As suggestions, here are the tests we did in our presentation. For each failure scenario, observe how the cluster readjusts itself and the impact on read and write traffic.

1) Loss of Network Communication

  • Unplug the network cable from one of the nodes (or simulate this condition in your VM):
    • First from a replica
    • Then from the primary
  • Unplug the network cable from one replica and the primary at the same time:
    • Does Patroni experience a split-brain situation?

2) Power Outage

  • Unplug the power cable from the primary
  • Wait until the cluster is re-adjusted then plug the power cable back and start the node

3) SEGFAULT

Simulate an OOM/crash by killing the postmaster process in one of the nodes with kill -9.

4) Killing Patroni

Remember that Patroni is managing PostgreSQL. What happens if the Patroni process (and not PostgreSQL) is killed?

5) CPU Saturation

Simulate CPU saturation with a benchmark tool such as Sysbench, for example:

sysbench cpu --threads=10 --time=0 run

This one is a bit tricky, as the reads and writes are each handled by a single thread. You may need to decrease the priority of the HAtester.py processes with renice, and possibly increase that of Sysbench’s.
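
A possible way to do that with renice, on whichever host each process runs (illustrative values; a negative niceness requires root):

renice -n 10 -p $(pgrep -f HAtester.py)
sudo renice -n -10 -p $(pgrep -f "sysbench cpu")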

6) Manual Switchover

Patroni facilitates changes in the PostgreSQL hierarchy. Switchover operations can be scheduled; the command below is interactive and will prompt you with options:

sudo -u postgres patronictl -c /etc/patroni/config.yml switchover

Alternatively, you can be specific and tell Patroni exactly what to do:

sudo -u postgres patronictl -c /etc/patroni/config.yml switchover --master node1 --candidate node2 --force


We hope you had fun with this hands-on lab! If you have questions or comments, leave us a note in the comments section below!
