Dec 02, 2022

New WAL Archive Module/Library in PostgreSQL 15


PostgreSQL traditionally uses shell commands for continuous WAL archiving, which is essential for backups and stable standby replication. In the past, we blogged about the inefficiency of that design and how some backup tools, like pgBackRest, solve the problem. It is a well-known issue in the PostgreSQL community, and there have been many discussions about it over the years.

It is also unfortunate that PostgreSQL documentation gives an unsafe command string, employing a combination of the test and cp commands as an example:

archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

Even though the documentation presents this only as an example of how archive_command executes shell scripts/commands, we keep seeing users copy this approach, fall into this documentation trap, and run into archiving issues.
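A more robust approach, which we have advocated before, is to hand archiving off to a purpose-built tool instead of test and cp. For example, with pgBackRest the setting looks like this (illustrative; the stanza name "demo" is an assumption):

```
archive_command = 'pgbackrest --stanza=demo archive-push %p'
```

Such a tool can durably sync the file, verify checksums, and retry sensibly, none of which the cp-based example does.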

The good news is that PostgreSQL 15 came up with a set of features to address the problems associated with WAL archiving. PostgreSQL can now do WAL archiving using loadable modules. Please refer to commit 5ef1eefd7 (https://git.postgresql.org/gitweb/?p=postgresql.git;h=5ef1eefd7) for the implementation details. The ability to use an archive module/library is a revolutionary step in this area.

Specifying the archive_library

Specifying the archive_library can be done while the system is up and running. Just send a SIGHUP afterward to get the configuration reloaded:

postgres=# alter system set archive_library='<Library>';
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)

It is important to keep an eye on PostgreSQL logs because anything going wrong will not be displayed on the psql session, but it will be there in the PostgreSQL logs:

2022-11-24 05:07:27.618 UTC [788] LOG:  received SIGHUP, reloading configuration files
2022-11-24 05:07:27.619 UTC [788] LOG:  parameter "archive_library" changed to "shell"
2022-11-24 05:07:27.620 UTC [1274] FATAL:  could not access file "shell": No such file or directory

We can remove the library specification if we want to use shell commands, as in PostgreSQL 14 and older:

ALTER SYSTEM SET archive_library='';
OR
ALTER SYSTEM RESET archive_library;

And make sure that no archive_library specification is in effect:

postgres=# show archive_library ;
archive_library
-----------------

(1 row)

At this stage, PostgreSQL will use the plain old archive_command for WAL archiving, which you can set the usual way:

alter system set archive_command = '<shell command>';

So, in summary, the archive_command will be considered only if the archive_library spec is empty.
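Putting the rule together, a minimal postgresql.conf fragment for shell-based archiving might look like this (illustrative; the archiving script path is a placeholder, not from this post):

```
archive_mode = on
archive_library = ''                                     # empty, so archive_command is used
archive_command = '/usr/local/bin/archive_wal.sh %p %f'  # placeholder script
```

If archive_library were set to a module name instead, the archive_command line would simply be ignored.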

basic_archive: The sample archive library

PostgreSQL 15 provides a simple sample archive library as a reference implementation, which is part of the contrib modules. If the contrib modules are already present, we can specify the archive_library like:

postgres=# ALTER SYSTEM SET archive_library = 'basic_archive';

But this library needs further input about where to place the archived files, which can be specified using the module-specific parameter basic_archive.archive_directory.

Otherwise, we may start seeing messages as follows in PostgreSQL logs:

WARNING:  archive_mode enabled, yet archiving is not configured

You may want to refer to the official documentation also.

However, this archive library will be loaded only by the archiver process, not by the backend process of regular sessions. So any attempt to set or show the parameter can give you errors:

postgres=# ALTER SYSTEM SET basic_archive.archive_directory='/home/postgres/arch2';
ERROR:  unrecognized configuration parameter "basic_archive.archive_directory"

postgres=# show basic_archive.archive_directory;
ERROR:  unrecognized configuration parameter "basic_archive.archive_directory"

But regular sessions can LOAD the library if they want.

postgres=# LOAD 'basic_archive';
LOAD

postgres=# ALTER SYSTEM SET basic_archive.archive_directory='/home/postgres/arch2';                                                                                                                                                                             
ALTER SYSTEM

postgres=# show basic_archive.archive_directory;                                                                                                                                                                                             
basic_archive.archive_directory
---------------------------------

(1 row)

postgres=# SELECT pg_reload_conf();                                                                                                                                                                                                          
pg_reload_conf
----------------
t
(1 row)

postgres=# show basic_archive.archive_directory;
basic_archive.archive_directory
---------------------------------
/home/postgres/arch2
(1 row)

postgres=#

Here are a couple of important observations:

1) Instead of loading the library into a specific session, one might use other methods like shared_preload_libraries, but that is not advisable.

2) The module-specific settings may not show up in pg_settings unless the module is loaded in the session.

If there are multiple archive libraries available, we can switch from one library to another when the system is live. But the archiver process will restart behind the scenes, and the new library specified will be loaded by the new archiver process. A message might appear in the PostgreSQL log like:

LOG:  restarting archiver process because value of "archive_library" was changed

If the attempt to archive the WAL by basic_archive fails, there will be corresponding ERROR entries in the PostgreSQL log:

2022-11-24 11:24:31.330 UTC [2323] ERROR:  could not create file "/home/postgres/arch2/archtemp.00000001000000000000007B.2323.1669289071330": No such file or directory
2022-11-24 11:24:31.330 UTC [2323] WARNING:  archiving write-ahead log file "00000001000000000000007B" failed too many times, will try again later

Anyone closely observing the above error message might wonder about the file it mentions: archtemp.00000001000000000000007B.2323.1669289071330. The name doesn’t look like a regular WAL segment file. This happens because the basic_archive module first creates a temporary file with a name like that in the destination, then fsyncs the file, and then renames it to its final name. This ensures better protection against crashes.
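The durable-copy pattern described above can be sketched in shell (an illustrative sketch, not the module's actual C code; the function name and the coarse sync call are assumptions):

```shell
# Sketch of basic_archive's copy-then-rename pattern: copy the WAL segment to
# a uniquely named temp file, flush it to stable storage, then atomically
# rename it so a crash can never leave a partial file under the final name.
archive_one() {
    src=$1
    destdir=$2
    fname=$(basename "$src")
    tmp="$destdir/archtemp.$fname.$$"   # unique temp name, like the module's
    cp "$src" "$tmp" &&
    sync &&                             # coarse stand-in for the module's fsync
    mv "$tmp" "$destdir/$fname"         # rename is atomic within a filesystem
}
```

A reader of the archive directory therefore either sees a complete WAL file or an obviously temporary archtemp.* file, never a half-written segment under its final name.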

When the archiver is terminated by a signal (other than SIGTERM, which is used as part of a server shutdown) or by a shell error with an exit status greater than 125 (such as command not found), the archiver process aborts and gets restarted by the postmaster. In such cases, the failures will be recorded in the pg_stat_archiver view. Errors in archive library execution can also be monitored using pg_stat_archiver.

postgres=# select * from pg_stat_archiver ;
-[ RECORD 1 ]------+------------------------------
archived_count     | 160
last_archived_wal  | 0000000100000000000000A1
last_archived_time | 2022-11-24 14:58:43.33668+00
failed_count       | 680
last_failed_wal    | 000000010000000000000087
last_failed_time   | 2022-11-24 14:49:51.716773+00
stats_reset        | 2022-11-15 08:58:34.939925+00
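Based on the columns above, a simple health check (an illustrative query, not from the original post) flags an archiver that is currently failing:

```sql
-- Returns a row only when the most recent failure is newer than the most
-- recent success, i.e., the archiver is likely stuck right now.
SELECT failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver
WHERE last_failed_time > last_archived_time;
```

Wiring a query like this into a monitoring system gives an early warning before pg_wal fills up with unarchived segments.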

Advantage of basic_archive

The basic_archive module does pretty much the same thing as the shell command:

'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

But in a much better way. We already discussed one of the advantages:

It creates a temporary file, fsyncs it, and durably renames it to the final archive file name in the destination. This durability is something cp cannot provide. It drastically reduces the chance that a broken file in the archive destination causes archive failures, or even affects database recoverability.

Another important functional advantage of basic_archive is that it has a built-in feature to compare the file in the source (pg_wal) and the archive destination when the same file exists. It compares the content of the files, verifies that they are exactly the same, and reports back a “success” to the archiver so that the archiver process can continue to the next WAL segment. This also reduces the chance of archive failures: if the file is archived by the module but the server crashes before recording that fact, PostgreSQL will try to archive the same WAL segment again. The new sample basic_archive module silently succeeds on the second attempt if the file is already copied and has the same content.

Additionally, the basic_archive module has a custom exception handler. This allows the archiver to treat an ERROR as a normal failure and avoid restarting the archiver again.

Potential risks/problems of using archive library

Since archive modules are loaded into PostgreSQL, unlike shell commands executed by archive_command, they have access to server resources. So please pay attention and avoid modules from unknown or insufficiently trustworthy sources, as they pose a risk.

Module authors must also remember that the archiving callback runs in its own transient memory context. If the module needs to palloc() something that needs to stick around for a while, it must do so in a different memory context. One may refer to the basic developer documentation and the basic_archive source code.

The sample module basic_archive may leave archtemp.* files behind in the archive directory after crashes or other unusual failure cases. If that happens, users may have to clean them up before starting PostgreSQL again.
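A cleanup along these lines could be scripted as below (a sketch; the directory path is an assumption, and it should run only while PostgreSQL is stopped):

```shell
# Remove leftover basic_archive temp files from an interrupted copy.
# ARCHIVE_DIR is an assumed path -- adjust it to your archive destination.
ARCHIVE_DIR=${ARCHIVE_DIR:-/home/postgres/arch2}
if [ -d "$ARCHIVE_DIR" ]; then
    # Only the archtemp.* temporaries are removed; finished WAL files stay.
    find "$ARCHIVE_DIR" -maxdepth 1 -type f -name 'archtemp.*' -delete
fi
```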

What can be expected in future

We can see a bright future with all these improvements, as the default archive_command handling can become yet another archive module in future versions of PostgreSQL.

I wish that basic_archive or new modules will come into existence with more features and configuration options like compression, because Compression of PostgreSQL WAL Archives is Becoming More Important.

I would expect PostgreSQL support vendors to develop many more powerful libraries and add to the PostgreSQL ecosystem.

Database as a Service (DBaaS) vendors will be one of the biggest beneficiaries as they will have a very specific place and method for WAL archiving, which a module can handle in a much more efficient way.

I would expect backup tools and methods to evolve. Now a backup tool can remain in daemon mode, waiting for messages from the PostgreSQL archiver. walg_archive is an example in this direction.

I think it’s reasonable to expect restore_library support in future PostgreSQL versions, too.

Thanks to Community

Special thanks to Nathan Bossart, who took the major role in improving WAL archiving in PostgreSQL 15, and to others like Robert Haas for active participation from the beginning to the final commit.

Thanks to many others who were involved in the discussions and reviews, namely Julien Rouhaud, Andrey Borodin, Robert Haas, Stephen Frost, Magnus Hagander, David Steele, Fujii Masao, Michael Paquier, Alvaro Herrera, Tom Lane, Benoit Lobréau, Peter Eisentraut, Bharath Rupireddy, Kyotaro Horiguchi, Ian Lawrence Barwick, and many others.

Great to see that PostgreSQL 15 has started addressing many areas of inefficiency, from the removal of the stats collector to progress on asynchronous I/O.

Nov 16, 2022

Using MERGE To Make Your PostgreSQL More Powerful


MERGE was introduced to PostgreSQL in version 15 and it will have a major impact on the way you write queries. This will definitely change the way you can handle transaction logs and other similar updates.

Imagine that you have an existing table filled with useful data, say an inventory of goods your company sells, and from time to time you get data sets with either new products that need to be entered or inventory lists that may, or may not, have a new quantity of goods on hand. This is normally the type of work you would have to do in an application. The problem with using an application to do this is that the data has to be shifted back and forth, maybe several times, from the database to the application and that is a lot of overhead (pronounced, in this case, as ‘time’). So now all this work can be done with one command on the database server.

Both SQL Server and Oracle have this operator; it has been part of the SQL standard for a while, and PostgreSQL recently added it with the release of version 15. You get the power to INSERT, UPDATE, and DELETE all in one statement. While the syntax looks a little complex, it is way beyond a simple ‘upsert’, as you will soon see.

Terminology

We will have two tables to process with MERGE. The table with the updates is called the source table, and the main repository of the data is called the target table. The contents of the source table are compared to the target table, rows are considered to match or not match between the two tables, and actions are taken accordingly on the target table.
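The general shape of the statement, abridged from the PostgreSQL syntax (the table and column names here are placeholders), is:

```sql
MERGE INTO target_table t
USING source_table s
ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET ...              -- act on rows present in both tables
WHEN NOT MATCHED THEN
    INSERT (...) VALUES (...);  -- act on source rows missing from the target
```

Each WHEN clause may also carry an extra AND condition, which is what makes MERGE more expressive than a plain upsert.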

Example of how it works

Let’s start with two tables and one row of data.

test=# create table a (id int, x int, status char(10));
CREATE TABLE
test=# create table b (id int, x int, status char(10));
CREATE TABLE
test=# insert into a (id,x,status) values (1,1,'From a');
INSERT 0 1
test=# select * from a;select * from b;
id | x | status
----+---+------------
1 | 1 | From a
(1 row)

id | x | status
----+---+--------
(0 rows)

Now we can use MERGE in a query. Remember as you look at this query below that table a has one row of data and table b has no rows of data. In this query, we INSERT records that are not in the target table and UPDATE those that are.

MERGE into b using a on a.id = b.id
when matched then 
     update set x = b.x + 1
when not matched then 
     insert (id,x,status) values (a.id,a.x,a.status);

The query instructs the server to match the two tables on the id columns. If there are rows that match then column x in table b is incremented by one. But if there are no matches (there will not be a match as table b is empty) then the contents of the row from table a are copied to table b.

If we look at the contents of both tables explicitly, we see that the two tables are now identical.

test=# select * from a; select * from b;
 id | x |   status
----+---+------------
  1 | 1 | From a
(1 row)


 id | x |   status
----+---+------------
  1 | 1 | From a
(1 row)

If we use EXPLAIN on the query we get the following:

test=# explain merge into b using a on a.id = b.id
test-# when matched then
test-# update set x = b.x + 1
test-# when not matched then insert (id,x,status) values (a.id,a.x,a.status);
                           QUERY PLAN
----------------------------------------------------------------
 Merge on b  (cost=0.00..1.02 rows=0 width=0)
   ->  Nested Loop Left Join  (cost=0.00..1.02 rows=1 width=25)
         Join Filter: (a.id = b.id)
         ->  Seq Scan on a  (cost=0.00..1.01 rows=1 width=19)
         ->  Seq Scan on b  (cost=0.00..0.00 rows=1 width=10)
(5 rows)


test=#

And note the update line needs to be update set x = b.x + 1, not update set b.x = b.x + 1, as the latter will trigger a syntax error. Also, you cannot update table a – only the target table.

Run MERGE again

Now run the MERGE query a second time and look at the contents of the two tables. This time the matched logic was executed and column x had its value increased from 1 to 2.

test=# select * from a; select * from b;
id | x | status
----+---+------------
1 | 1 | From a
(1 row)


id | x | status
----+---+------------
1 | 2 | From a
(1 row)


test=#

And yes, every time we rerun the MERGE query column x will be incremented.

DELETEs too!

DELETE will remove rows from the target table b if it finds a row that matches. Let’s change the query so that instead of bumping the value of a column we just delete the matching row in the target table.

test=# merge into b using a on a.id = b.id
test-# when matched then delete
test-# when not matched then insert (id,x,status) values (a.id,a.x,a.status);
MERGE 1
test=# select * from b;
 id | x | status
----+---+--------
(0 rows)

DOING nothing

One of our options is to DO NOTHING. Yup, we find that the record is already in the target table and just skip over it. Currently, table b has nothing in it after the DELETE example, so run the following query to populate the table, then run it again.

test=# MERGE into b using a on a.id = b.id
test-# when matched then
test-#    DO NOTHING
test-# when not matched then
test-#    insert (id,x,status) values (a.id,a.x,a.status);
MERGE 1
test=#

COMPLEXITY too!

You can make MERGE as complex as you need. In the following, you will see that we can act on different values of columns in the target table. We start by truncating the target table and then run the MERGE query to first populate table b. Then we run it a second time and note how the x and status columns change. When the MERGE query is run a third time, the columns are updated again.

test=# truncate b;
TRUNCATE TABLE
test=# select * from b;
 id | x | status
----+---+--------
(0 rows)
merge into b
using a
on b.id = a.id
when matched AND b.x > 2 THEN
UPDATE SET x = b.x + a.x, status='updated+'
when matched and b.x = 1 THEN
UPDATE SET status = 'updated', x = 3
when not matched then
insert (id,x,status) values (a.id,a.x,a.status);

Now run the MERGE query once.

test=# select * from b;
 id | x |   status
----+---+------------
  1 | 1 | From a
(1 row)

Now run it again.

test=# select * from b;
 id | x |   status
----+---+------------
  1 | 3 | updated
(1 row)

And now a third time!

test=# select * from b;
 id | x |   status
----+---+------------
  1 | 4 | updated+
(1 row)

The logic can be as complex as you can stand.

Conclusion

MERGE is extremely powerful, and even these very simple examples provide a glimpse of its use. Many years ago, I was working on a project that had transaction logs from cash registers around the world that had to be coalesced, and MERGE would have been heavily used had it been available. Read through the documentation and this blog, and I am sure you will start thinking about some of your more common queries and how they can benefit from the MERGE operator.

As your tables grow larger be sure to double-check that you are using indexes on the proper columns and use EXPLAIN to double-check.

Nov 15, 2022

pgBackRest Restoration Scenarios


In one of our previous posts—pgBackRest: A Great Backup Solution and a Wonderful Year of Growth—we discussed the installation of pgBackRest and the backup and quick restore commands. It is ideal to have restoration activity scheduled as per business needs so that backups are validated and provide critical metrics on recovery time. In this blog post, we will check out some restoration scenarios with the pgBackRest backup tool.

Scenarios

  • Restoring the backup on a different host
  • Restoring a single database from the Postgres cluster
  • Restoring the backup to a specific point in time
  • Restoring the backup on a different host to start as a streaming standby
  • Restoring only the delta changes

To proceed with testing the above-mentioned restoration scenarios, we need the following:

  • pgBackRest of the same version installed on source and target machines
  • Repository information, i.e., the place where we store our backups; this could be S3, GCS, Azure buckets, or dedicated storage
  • The target server should have access to the repository host via TLS or SSH

For testing the above-mentioned scenarios, I have a PostgreSQL primary server, two replica servers, a repository server, and a spare server.

My repository information: a FULL and an INCR backup are available in the repository.

postgres@pgBackRest:~$ pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --log-level-console=detail --stanza=demo info
stanza: demo
    status: ok
    cipher: none

    db (current)
        wal archive min/max (11): 00000001000000000000001E/00000001000000000000002A

        full backup: 20221104-102722F
            timestamp start/stop: 2022-11-04 10:27:22 / 2022-11-04 10:27:32
            wal start/stop: 000000010000000000000028 / 000000010000000000000028
            database size: 37.6MB, database backup size: 37.6MB
            repo1: backup set size: 4.5MB, backup size: 4.5MB

         incr backup: 20221104-102722F_20221104-102807I
            timestamp start/stop: 2022-11-04 10:28:07 / 2022-11-04 10:28:12
            wal start/stop: 00000001000000000000002A / 00000001000000000000002A
            database size: 37.6MB, database backup size: 8.3KB
            repo1: backup set size: 4.5MB, backup size: 402B
            backup reference list: 20221104-102722F

Restoring the backup on a different host

In this scenario, we will test the backup by restoring it to the spare server. My spare server’s pgBackRest conf has the information about the repository host, repository path, repository host user, and required PostgreSQL version installed and access to the repository.

pgBackRest can be driven entirely by command-line parameters, but having a configuration file is more convenient. Below is my spare server’s pgBackRest configuration file.

postgres@spareserver:~$ cat /etc/pgbackrest/pgbackrest.conf
[global]
# repo details
repo1-path=/pgrdbackups # repository storage details
repo1-host=192.168.64.18 # repository host address
repo1-host-user=postgres # user details to access the repo
[demo] 
pg1-path=/var/lib/postgresql/11/main  # data_dir path to restore the backup

I will go ahead and restore the backup available in my repository onto the spare server. By default, it restores the latest backup set available in the repository.

postgres@spareserver:~$ pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --log-level-console=info --stanza=demo  restore

2022-11-04 10:39:42.175 P00   INFO: restore command begin 2.41: --config=/etc/pgBackRest/pgBackRest.conf --exec-id=16557-1a840ffa --log-level-console=info --pg1-path=/var/lib/postgresql/11/main --repo1-host=192.168.64.18 --repo1-host-user=postgres --repo1-path=/pgrdbackups --stanza=demo
2022-11-04 10:39:42.959 P00   INFO: repo1: restore backup set 20221104-102722F_20221104-102807I, recovery will start at 2022-11-04 10:28:07
2022-11-04 10:39:50.104 P00   INFO: write /var/lib/postgresql/11/main/recovery.conf
2022-11-04 10:39:50.116 P00   INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2022-11-04 10:39:50.124 P00   INFO: restore size = 37.6MB, file total = 1535
2022-11-04 10:39:50.125 P00   INFO: restore command end: completed successfully (7954ms)

<change archive_command on the spare server> and start the database
pg_ctl -D <data_dir> start

We need to make sure the archive_command is altered to a different path or disabled to prevent Postgres from writing archives to the existing repo upon starting on the spare server.
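For example, before starting the restored cluster, one might drop a fragment like this into the spare server’s configuration (illustrative; /bin/true simply discards the archive request):

```
# On the spare server only -- do not let it push WAL into the shared repo.
archive_command = '/bin/true'
# or disable archiving entirely (requires a restart to change later):
# archive_mode = off
```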

Restoring a single database from the Postgres cluster

In this scenario, we will restore only a single database onto the spare server. I have the below-mentioned databases on my primary server.

Primary server:

postgres=# \l
                              List of databases
   Name    |  Owner   | Encoding | Collate |  Ctype  |   Access privileges
-----------+----------+----------+---------+---------+-----------------------
 mydb      | postgres | UTF8     | C.UTF-8 | C.UTF-8 |
 percona   | postgres | UTF8     | C.UTF-8 | C.UTF-8 |
 postgres  | postgres | UTF8     | C.UTF-8 | C.UTF-8 |
 template0 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
 template1 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres

postgres=# \l+ percona
                                             List of databases
  Name   |  Owner   | Encoding | Collate |  Ctype  | Access privileges |  Size  | Tablespace | Description
---------+----------+----------+---------+---------+-------------------+--------+------------+-------------
 percona | postgres | UTF8     | C.UTF-8 | C.UTF-8 |                   | 157 MB | pg_default

We will restore only the Percona database to my spare server.

postgres@spareserver:~$ pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --log-level-console=info --stanza=demo  --db-include=percona restore
2022-11-04 10:58:05.869 P00   INFO: restore command begin 2.41: --config=/etc/pgBackRest/pgBackRest.conf --db-include=percona --exec-id=16647-4f3aa57d --log-level-console=info --pg1-path=/var/lib/postgresql/11/main --repo1-host=192.168.64.18 --repo1-host-user=postgres --repo1-path=/pgrdbackups --stanza=demo
2022-11-04 10:58:07.088 P00   INFO: repo1: restore backup set 20221104-105706F, recovery will start at 2022-11-04 10:57:06
2022-11-04 10:58:15.791 P00   INFO: write /var/lib/postgresql/11/main/recovery.conf
2022-11-04 10:58:15.805 P00   INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2022-11-04 10:58:15.816 P00   INFO: restore size = 187.4MB, file total = 1548
2022-11-04 10:58:15.817 P00   INFO: restore command end: completed successfully

Start the Postgres cluster on the spare server and check the Percona database.

postgres@spareserver:~$ pg_ctlcluster 11 main start

postgres@spareserver:~$ psql postgres
psql (14.5 (Ubuntu 2:14.5-3.jammy), server 11.17 (Ubuntu 2:11.17-3.jammy))
Type "help" for help.

postgres=# \l
                              List of databases
   Name    |  Owner   | Encoding | Collate |  Ctype  |   Access privileges
-----------+----------+----------+---------+---------+-----------------------
 mydb      | postgres | UTF8     | C.UTF-8 | C.UTF-8 |
 percona   | postgres | UTF8     | C.UTF-8 | C.UTF-8 |
 postgres  | postgres | UTF8     | C.UTF-8 | C.UTF-8 |
 template0 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
 template1 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres

postgres=# \l+ percona
                                             List of databases
  Name   |  Owner   | Encoding | Collate |  Ctype  | Access privileges |  Size  | Tablespace | Description
---------+----------+----------+---------+---------+-------------------+--------+------------+-------------
 percona | postgres | UTF8     | C.UTF-8 | C.UTF-8 |                   | 157 MB | pg_default |

After restoring the specific database, you will still see the other user databases listed. However, when a user tries to connect to any of them (i.e., apart from the database specified with the –db-include option), the connection will fail with a FATAL error like this: “FATAL:  relation mapping file “base/xx/pg_filenode.map” contains invalid data”.

Restoring the backup to a specific point in time

Point-in-time recovery is possible with pgBackRest. Consider a table/database that has been dropped and needs to be restored. In this situation, we need the timestamp of the drop event, a pgBackRest backup, and the archives.

On my primary node, I have a table named pitr in the Percona database. This table was dropped at timestamp 2022-11-04 14:24:32.231309+05:30.

percona=# \dt+ pitr
                             List of relations
 Schema | Name | Type  |  Owner   | Persistence |    Size    | Description
--------+------+-------+----------+-------------+------------+-------------
 public | pitr | table | postgres | permanent   | 8192 bytes |
(1 row)

percona=# select now();drop table pitr;
               now
----------------------------------
 2022-11-04 14:24:32.231309+05:30
(1 row)

DROP TABLE
percona=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/7B0162A0
(1 row)

By using the pgBackRest restore option with the recovery type setting, we can achieve point-in-time recovery. By default, this type restores the archive to the end of the stream. In this scenario, we will specify the exact timestamp at which to stop; an xid or lsn can also be provided to the type option.
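For reference, the equivalent invocations with an xid or lsn target might look like the following (the target values here are placeholders, not from this demo):

```
pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --stanza=demo --type=xid --target=987654 restore
pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --stanza=demo --type=lsn --target=0/7B0162A0 restore
```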

pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --log-level-console=info --stanza=demo  --db-include=percona --type=time "--target=2022-11-04 14:24:31" restore
2022-11-04 14:35:35.389 P00   INFO: restore command begin 2.41: --config=/etc/pgBackRest/pgBackRest.conf --db-include=percona --exec-id=17804-ca30e9a3 --log-level-console=info --pg1-path=/var/lib/postgresql/11/main --repo1-host=192.168.64.18 --repo1-host-user=postgres --repo1-path=/pgrdbackups --stanza=demo --target="2022-11-04 14:24:31" --type=time
2022-11-04 14:35:36.279 P00   INFO: repo1: restore backup set 20221104-105706F, recovery will start at 2022-11-04 10:57:06
2022-11-04 14:35:44.054 P00   INFO: write /var/lib/postgresql/11/main/recovery.conf
2022-11-04 14:35:44.069 P00   INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2022-11-04 14:35:44.084 P00   INFO: restore size = 187.4MB, file total = 1548
2022-11-04 14:35:44.086 P00   INFO: restore command end: completed successfully

Start the Postgres cluster and check the pitr table.

postgres@spareserver:~$ pg_ctlcluster 11 main start
percona=# \dt pitr
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | pitr | table | postgres

Restoring the backup on a different host to start as a streaming standby

This approach will be useful in case we need to rebuild a crashed standby node or add a new standby to the existing Postgres cluster. 

Primary server:

Two replica servers are connected to the primary server.

postgres=# select usename,client_addr,sync_state,state  from pg_catalog.pg_stat_replication
;
 usename |  client_addr  | sync_state |   state
---------+---------------+------------+-----------
 repl    | 192.168.64.16 | async      | streaming
 repl    | 192.168.64.17 | async      | streaming

Let’s go ahead and add another replica using pgBackRest.

postgres@spareserver:~/11/main$ pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --log-level-console=info --stanza=demo --type=standby --recovery-option=primary_conninfo='user=repl password=<redacted> host=192.168.64.10 application_name=spare'  restore
2022-11-04 15:02:49.142 P00   INFO: restore command begin 2.41: --config=/etc/pgBackRest/pgBackRest.conf --exec-id=18057-b13cb8f4 --log-level-console=info --pg1-path=/var/lib/postgresql/11/main --recovery-option="primary_conninfo=user=repl password=test host=192.168.64.10 application_name=spare" --repo1-host=192.168.64.18 --repo1-host-user=postgres --repo1-path=/pgrdbackups --stanza=demo --type=standby
2022-11-04 15:02:49.642 P00   INFO: repo1: restore backup set 20221104-105706F, recovery will start at 2022-11-04 10:57:06
2022-11-04 15:02:57.767 P00   INFO: write /var/lib/postgresql/11/main/recovery.conf
2022-11-04 15:02:57.778 P00   INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2022-11-04 15:02:57.791 P00   INFO: restore size = 187.4MB, file total = 1548
2022-11-04 15:02:57.792 P00   INFO: restore command end: completed successfully

The replica server is up and running. Check pg_stat_replication on the primary node.

postgres=# select usename,client_addr,sync_state,state  from pg_catalog.pg_stat_replication
;
 usename |  client_addr  | sync_state |   state
---------+---------------+------------+-----------
 repl    | 192.168.64.16 | async      | streaming
 repl    | 192.168.64.17 | async      | streaming
 repl    | 192.168.64.19 | async      | streaming
(3 rows)

Restoring only the delta changes

The delta option allows pgBackRest to automatically determine which files in the database cluster directory can be preserved and which ones need to be restored from the backup. Consider the situation where we already have a backup restored on a spare server; now we will perform a delta restore on the spare server to pick up only the changes backed up since the last restore.

postgres@spareserver:~$ pg_ctlcluster 11 main stop
postgres@spareserver:~$ pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --log-level-console=info --stanza=demo --delta  restore
2022-11-04 15:36:57.101 P00   INFO: restore command begin 2.41: --config=/etc/pgBackRest/pgBackRest.conf --delta --exec-id=18175-ac9d25dc --log-level-console=info --pg1-path=/var/lib/postgresql/11/main --repo1-host=192.168.64.18 --repo1-host-user=postgres --repo1-path=/pgrdbackups --stanza=demo
2022-11-04 15:36:58.130 P00   INFO: repo1: restore backup set 20221104-153559F, recovery will start at 2022-11-04 15:35:59
2022-11-04 15:36:58.136 P00   INFO: remove invalid files/links/paths from '/var/lib/postgresql/11/main'
2022-11-04 15:37:04.155 P00   INFO: write /var/lib/postgresql/11/main/recovery.conf
2022-11-04 15:37:04.168 P00   INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2022-11-04 15:37:04.181 P00   INFO: restore size = 202.2MB, file total = 2136

Overall, pgBackRest is an incredibly popular open source backup tool for PostgreSQL. We also encourage you to try our product Percona Distribution for PostgreSQL, as the pgBackRest tool is included in our distribution for backups.

Nov
09
2022
--

Securing Patroni REST API End Points Part 2: Using SSL Certificates

Securing Patroni REST API End Points SSL

In the first part of the article, we discussed how to achieve minimum security by adding a username and password to Patroni REST API and switching from HTTP traffic to HTTPS. In this article, we will be covering how to secure REST APIs using certificate authentication.

Certificate authentication is the only option if we want to prevent everyone else from accessing the “Safe” endpoints, which otherwise don’t require any credentials.

Enabling the client certificate verification

If we plan to use certificate authentication, the “password authentication:” section mentioned in part one is no longer required. Instead, we can just specify:

verify_client: required

The overall restapi: section of the Patroni configuration will look like this:

…
restapi:
  listen: 0.0.0.0:8008
  connect_address: pg1:8008
  certfile: /etc/patroni/pg1.crt
  keyfile: /etc/patroni/pg1.key
  cafile: /etc/patroni/ca.crt
  verify_client: required
…

Now the client certificate will be verified by Patroni's REST API server, so patronictl must make its REST requests with an appropriate certificate that the server can verify.

The information required for patronictl needs to be specified under the ctl: section. For example:

ctl:
  insecure: false
  certfile: /etc/patroni/pg1.crt
  keyfile: /etc/patroni/pg1.key
  cacert: /etc/patroni/ca.crt

Once the client certificate verification is enabled, every client should make an HTTPS request with a valid certificate to get a proper response from Patroni REST API:

curl -k https://pg1:8008/cluster  --cacert ~/ca.crt --cert ~/app.crt --key ~/app.key  | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100   302    0   302    0     0   2202      0 --:--:-- --:--:-- --:--:--  2220
{
  "members": [
    {
      "name": "pg0",
      "role": "leader",
      "state": "running",
      "api_url": "https://pg0:8008/patroni",
      "host": "pg0",
      "port": 5432,
      "timeline": 46
    },
    {
      "name": "pg1",
      "role": "replica",
      "state": "running",
      "api_url": "https://pg1:8008/patroni",
      "host": "pg1",
      "port": 5432,
      "timeline": 46,
      "lag": 0
    }
  ]
}

HAProxy with certificates

HAProxy is probably the most widely used connection router with Patroni.

Unlike password authentication, certificate authentication is not transparent to HAProxy. Every REST API communication expects valid certificates, so HAProxy (or any other router) needs proper certificates for its health checks.

As per the HAProxy documentation, it should use a combined certificate that includes both the certificate and its key. A combined certificate can be prepared by concatenating the files:

cat app.crt app.key > combined.crt

Now we can specify this certificate and the CA certificate for the checks in the HAProxy configuration file (haproxy.cfg). The same needs to be specified for each of the candidate Patroni nodes as follows:

…
   server pg0 pg0:5432 maxconn 100 check check-ssl verify none port 8008 crt /etc/haproxy/combined.crt ca-file /etc/haproxy/ca.pem
   server pg1 pg1:5432 maxconn 100 check check-ssl verify none port 8008 crt /etc/haproxy/combined.crt ca-file /etc/haproxy/ca.pem
…

A complete sample HAProxy configuration file is available on GitHub for easy reference.

Important points to remember

1. Safe API without certificate verification

We have the option to allow “Safe” API without certificate verification. This can be done by changing the “verify_client” to “optional” instead of “required”.

…
restapi:
  …
  verify_client: optional
  …

2. Why is certificate authentication needed at all?

Generally, it is a bit of overkill because Patroni's API cannot leak any data from the database, and protecting it with a username and password could be sufficient. One reason certificate authentication may still be considered is that the "Safe" endpoints (refer to the previous post) are otherwise available without authentication, which can reveal the current topology of the cluster.

Moreover, some organizations insist on blanket “certificate authentication” everywhere.

3. How to change the certificates?

SSL certificates have an expiry, and every organization using certificates needs a fool-proof mechanism to renew them periodically before the ones in use expire. In older versions of Patroni, a certificate change required a Patroni service restart, which could be performed after a switchover. However, a switchover affects the existing connections to the primary.

New versions of Patroni (Patroni version 2.1.0 or later) can reload the certificates without restarting the Patroni service. Reference PR: https://github.com/zalando/patroni/pull/1887

With this improvement, we just need to signal the Patroni service with SIGHUP.

It is a good idea to add a reload handler to the Patroni service (systemd) so that any certificate change on the node can be applied by a service reload, which sends SIGHUP to Patroni. A sample file is available on GitHub.
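For systemd-managed installations, that reload hook can be a simple drop-in; this is a minimal sketch (the unit name patroni and the drop-in path are assumptions, adjust to your setup):

```ini
# /etc/systemd/system/patroni.service.d/reload.conf
[Service]
# "systemctl reload patroni" now sends SIGHUP to the main Patroni process,
# which makes Patroni re-read its configuration and reload the certificates.
ExecReload=/bin/kill -s HUP $MAINPID
```

After running systemctl daemon-reload once, replacing the certificate files on disk and issuing systemctl reload patroni picks up the new certificates without a restart.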

4. A word of caution

The more complexity, the more problems. Since Patroni is a high-availability framework, additional complexity can potentially lead to unavailability. For example, a certificate expiry can lead to failure in the REST API communication that Patroni and patronictl themselves use internally. So I would recommend assessing the risk and the additional complexity before implementing it. The simpler, the better.

Unless your organization has a good system and procedures for managing the certificates and keys, staying away from complete certificate authentication can be beneficial.

Nov
04
2022
--

PostgreSQL: Are All NULLs the Same?

PostgreSQL Are All NULLs the Same

I was recently reading the PostgreSQL 15 release notes and stopped at “Allow unique constraints and indexes to treat NULL values as not distinct”. At first glance, it seems to be a minor change, but this feature may change many designs and make many lives a lot easier.

The first thing here is to understand what NULL is and how it works. Many authors describe NULL as the way missing or unknown values are represented, and some dedicate a good amount of time to explaining and describing NULL, like Date and Darwen's "A Guide to the SQL Standard". Also, checking Wikipedia, we see that "In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database".

Okay, NULL indicates that there is an unknown value in a column, but how does it work? Can we operate with NULLs? We can, but we'll probably get an unexpected result. For example, arithmetic operations involving NULL will return... NULL! If a SQL operation adds 10 to NULL, the result is NULL. If we do "NULL - NULL", the result is NULL, not zero! Also, a boolean comparison between two values involving NULL returns neither TRUE nor FALSE but NULL. What about comparisons: are some NULLs larger than others? Are some NULLs equal to others? See how tricky this NULL thing can get?
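A quick psql session illustrates these rules (casts are added so that PostgreSQL can resolve the operators on bare NULL literals):

```sql
-- Arithmetic with NULL yields NULL:
SELECT NULL::int + 10;               -- NULL
SELECT NULL::int - NULL::int;        -- NULL, not zero

-- An equality comparison with NULL yields NULL, neither TRUE nor FALSE:
SELECT NULL = NULL;                  -- NULL

-- IS DISTINCT FROM, however, treats two NULLs as "not distinct":
SELECT NULL IS DISTINCT FROM NULL;   -- false
```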

This is why that change caught my attention. If we compare NULLs, they will in essence all be different, because we don't know what they are. If I compare an unknown value with another unknown value, even though the result is unknown, in the database world they will be different, making NULLs unique by nature. This is how most databases operate (if not all of them). This brings a problem to many applications that need to operate on NULLs, especially the ones that try to have UNIQUE constraints on nullable columns. Because ALL NULLs are different, there will never be a collision, and the UNIQUE constraint will be bogus. Many application designers have used the ZERO or EMPTY workaround, putting a zero or an empty value '' to represent the absence of a value. Remember, empty isn't NULL!

This feature changes the behavior here. It now allows NULLs to be treated equally for the sake of distinct comparison using “UNIQUE NULLS NOT DISTINCT”. No more workarounds are needed! But be careful as it’s still an unknown value and the operations are kept the same. For a much better explanation of how NULLs work, check the book mentioned above. You may see databases from another perspective after reading it!
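On PostgreSQL 15, the new clause can be tried with a small sketch like this (table and column names are made up for the example):

```sql
-- Classic behavior: every NULL is distinct, so the constraint never fires.
CREATE TABLE t_classic (code text UNIQUE);
INSERT INTO t_classic VALUES (NULL), (NULL);      -- both rows accepted

-- PostgreSQL 15: NULLs are treated as equal for uniqueness.
CREATE TABLE t_pg15 (code text UNIQUE NULLS NOT DISTINCT);
INSERT INTO t_pg15 VALUES (NULL);
INSERT INTO t_pg15 VALUES (NULL);
-- ERROR:  duplicate key value violates unique constraint "t_pg15_code_key"
```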

Nov
02
2022
--

First PostgreSQL Conference Ever: What You’ve Missed at PGConf.EU 2022

PostgreSQL PGConf.EU 2022

I just became part of the PostgreSQL ecosystem and was really curious to get in touch with the community. What are the best ways to interact with a community? Mailing lists for sure, forums, chat channels, or if possible an in-person conference. So I checked the events site of PostgreSQL and found an event nearby — PGConf.EU in Berlin. After reaching out internally, I was told this is one of the biggest or even the biggest PostgreSQL conferences in the world. So it seems I’ve made a good decision.

I was worried and also a little scared about going to a conference after almost three years of not attending any conferences and not seeing too many people. It took me almost six hours including a delay of over two hours (thank you for traveling with Deutsche Bahn!) to make it to Berlin. I was really impressed and surprised when I finally made it to the Berlin Marriott Hotel. What a nice venue for a conference.

View inside the hotel from the 9th floor down to the first floor

The check-in already had a great surprise waiting for me. “Sorry, Mr. Wagner but we’re fully booked and instead of your regular room we can only give you an executive suite”. What a bummer, but I took it nevertheless. I went to the room and almost immediately fell asleep, excited for the next days to come.

Let the show begin

The first day of the conference started with way too much food due to a really great breakfast. After that, I was waiting for the first keynote and opening speaker to come. The room was really crowded and I was wondering what “sold-out” actually meant for a PostgreSQL conference. Magnus Hagander and Vik Fearing did a great job in starting, organizing, and hosting the event. They quickly unveiled that 604+ people had shown up for the conference — what an impressive number.

Picture showing the number 604 attendees

The welcome and opening took about 45 minutes and directly afterward we had the first conference talk of the week: “Efficient Graph Analysis with SQL/PGQ”. It was a super interesting but also a hard topic for me, as this presentation was super technical. I had a hard time following parts of the presentation as I completely missed terms or background information. I definitely plan to look at the slides again and get my feet wet with this topic.

During the first coffee break of the day, I was able to wander around and started to explore the conference venue/booths and I was also able to spot my hungry colleague Umair. Can you spot him in the picture below?

Picture of the lunch break area

The second half of the morning started for me with "Changelog vs Two-dimensional time – what works better?" from Hettie Dombrovskaya. It was a very inspiring presentation, as you could feel Hettie's passion and deep knowledge in every sentence. She basically tries to implement "time travel", as she said. She explained bitemporal functions and the difference between effective time and asserted time, which finally led to time regions — a small travel in time.

From there I went to "PostgreSQL at GitLab.com". Given that I'm a huge fan of GitLab's communication and transparency policies, I had very high expectations about this presentation, and I was not disappointed. Alexander Sosna and Jose Finotto presented GitLab's journey with PostgreSQL, some limitations they've hit, and how they overcame them. It was interesting to see that a simple use-case split between CI and the rest of the data actually led to such dramatically improved statistics and usability.

Leaving the lunch break behind, I had the first chance to see Hans-Jürgen Schönig in action during his presentation about “Performance tips you have never seen before”. He’s the CEO of CYBERTEC and a very passionate person on- and off-stage. He really engaged with the crowd and was able to get a lot of people smiling whenever he gave some real-world examples while presenting his tips.

Jan Birk presented "Administering large scale PostgreSQL installations" and explained how the Danish Ministry of Education handles 400 clusters divided into 100 products. He went into detail on how they use different tools and extensions such as pgbouncer, pgBadger, barman, patroni, and many more.

As Microsoft had a booth at the conference and was always talking about Citus, I was curious about what they do and joined "Distributed Postgres: How to build a multi-tenant SaaS app with Citus". It was interesting to see how Charles Feddersen presented the capabilities of Citus to scale out as needed and how resharding works along those lines. Starting small and scaling up whenever you need it is a very common use case nowadays. I'm happy that Citus was open-sourced and can now also be developed by the community.

I basically stayed in the sponsors' room, as I was also interested to learn what EDB has to offer when they talk about "EDB Postgres Distributed – Evolution of Replication and High Availability". Stacy Scoggins presented EDB's solution as an answer to HA for PostgreSQL. They've developed a very nice failover solution for PostgreSQL, which makes it almost transparent. The only downside that I'm seeing is vendor lock-in, as the code isn't open.

Social event

The first day closed with a social event that took place at the "Alte Münze", basically a former mint in Berlin-Mitte on the banks of the Spree. The venue was nice and a lot of people made it to the event. Accompanied by music, finger food, and drinks, you could engage with all parts of the community, and I was able to make new friends inside the PostgreSQL community. After a very long day, I finally made it back to the hotel after midnight, looking forward to the next two days.

The day after the social event

The second day of the conference started a bit slower. You could tell by looking at people's faces during breakfast and the first few sessions of the day that, for some, the social event had lasted a little longer — so basically it was an overall success.

Gregory Stark talked about “The Future of Observability in Postgres” and how he envisions standardizing those. He presented his ideas that are needed for the PostgreSQL core to make that happen. Having an open standard, that could easily be used by modern tools across the PostgreSQL ecosystem, sounds like a really great idea to me. 

Being new in the community, I had to join the talk from Melih Mutlu about “Takeaways from the First 6 Months of Hacking on Postgres”. He explained his journey and obstacles getting his first patch landed in the community. He also touched on the current development practices and tools the community is using – I personally chimed in here by asking the GitHub vs Mailing List question, which directly led to some controversial discussion. I’ll follow up on that topic with the community via the mailing list. My personal mantra here is “To improve is to change; to be perfect is to change often” by Winston Churchill.

Photo of the lunch and coffee break area

After the break and some coffee, it was time to learn more about "Neon, cloud-native storage backend for PostgreSQL" by Heikki Linnakangas. He explained the architecture and functionality of the serverless open source alternative to AWS Aurora for Postgres. It separates storage and compute and distributes the data across a cluster of nodes. Coming from a strong storage background, including distributed storage, I found this a really interesting talk and will definitely follow their ongoing work.

I mentioned in the first part that I’m not yet used to all the terms and functionality of PostgreSQL, so I took the chance to listen to Karen Jex on “Everything You Wanted to Know About Databases as a Developer but Were Too Afraid to Ask Your DBA”. This session was really helpful for everyone who is new to databases or PostgreSQL in general. Karen did a great job in going through the various important basic methodologies and functions. I personally took a lot out of this presentation and the slide deck might be helpful to more beginners out there.

Encryption is important and as I just saw a mail thread on TDE (Transparent Data Encryption) on the hackers mailing list, I was curious to learn what Stephen Frost had to say about “On the Road to TDE”. Stephen presented the various different ways to encrypt your data as well as what would work best for Postgres — spoiler: TDE. This is an ongoing effort for over three years already but the broad community is interested in encryption, so this definitely needs to get some hands working on it, to finally build it into Postgres. Let’s see if and how Percona can support this effort, as data encryption is nowadays super important to almost everyone.

Robert Haas presented “How PostgreSQL Resists Automated Management (And How to Fix This)”. He showed some ways external software would like to manage PostgreSQL but fails and what’s the causing factor for it. He also went into his future vision of how this scenario could be improved and implemented into PostgreSQL.

Keynotes

Now Berlin was united again (the room was split up into two meeting rooms during regular sessions) and we were able to listen to the keynotes from the main sponsors of the event. Hans-Jürgen Schönig, Stephen Frost, Marc Linster, and Chris Engelbert did a great job in representing their companies but also the community.

The presentation that stuck out the most to me was the one from Marc about "Accelerating Postgres' success in the modern enterprise". He prepared slides underlining the overall success of PostgreSQL and where it is coming from, along with the undeniable facts from the Stack Overflow survey, where Postgres finally reached the top of the mountain as the most important database to developers.

The day concluded with Umair and me stepping out for dinner — ok, just burgers in the mall next to the hotel — and spending some time getting to know each other better and discussing what we'd seen and heard at the conference. We both agreed that the conference was going great and that the community is highly engaged and growing further.

Also, some sightseeing couldn’t be missed overall before calling it a day.

Umair and Kai in front of the Berlin Gate

Time flies by — The last day of the conference

The last day of the conference started with Robert Haas again, talking about "Moving pg_basebackup Forward". You could feel his passion for this tool in every sentence. He showed us the newest features that made it into PostgreSQL 15 and, as such, the significant improvements to pg_basebackup, two of which are server-side compression and backup targets. We also talked about parallel backup — and why this isn't a good idea most of the time — and incremental backup. There is still a lot of work ahead but the agenda looks very promising.

You might have noticed by now that I was quite interested in the new Neon project, so I couldn't resist joining "Why we build Neon", presented by CEO Nikita Shamgunov. He very openly showed how they've built up the current company and what has happened within the last year. Pitch decks to investors as well as the funding numbers were presented openly to the audience. Why does this all matter and why is serverless important? Nikita was able to give good insights into this new startup.

One of the gold nuggets of the conference was this talk from Claire Giordano on "How to make your Postgres blog posts reach a ton more people". She basically presented, in 45 minutes, a condensed version of her training session in an interactive and interesting way to the audience. If there were only one thing for you to remember from the talk, it's "Empathy", Claire said.

Slide showing empathy for your readers

If you want to know how to write great blog posts and reach more people, you have to take a look at her slides

During the social event, I had already talked to Harald about high availability, and I didn't know he had a talk on that exact topic on Friday. So this talk was a must-see after our conversation on Tuesday. Harald talked about "reasonable availability" and why HA isn't always a good idea and sometimes even leads to more downtime and outages compared to a simple single-node system.

Before we reached the end of the conference, it was time for the lightning talks. 

Flip chart showing the list of lightning talks

I was only able to be present for the first half of the session, as I had to catch my train back home, but those lightning talks are usually known to be really good. Given the fact that people have to condense their message into five minutes, they leave out all the blur and filler words in between, so you directly learn about the important things people want to talk about.

Simon Riggs started by announcing that he has retired from actively contributing code to the community moving forward but he still wants to support the community wherever and whenever he can. He received very long applause and a standing ovation, and you could feel the thankfulness from the crowd for everything he’s done over the years for the project and the community.

I’m sad that I missed the talk from Umair about open source licenses but I learned it was a distilled version of these slides. So if you want to learn more about this topic, take a look at the slides or reach out directly to @pg_umair on Twitter. 

Conclusion

PGConf Europe in Berlin was a great conference, with the chance to learn a lot, meet new people, and make friends. If you ever consider joining a PostgreSQL conference, PGConf Europe should be put on the top of your list. I personally saw that the community is welcoming and open and everyone who is interested is encouraged to participate. As this was my first PostgreSQL conference — ever — this is just the beginning of my journey. I’ll be back…

Nov
01
2022
--

Securing Patroni REST API End Points – Part 1

Securing Patroni REST API End Points

In recent years, Patroni has emerged as the number one HA framework for PostgreSQL, currently with 5K stars on its GitHub repository. We have blogged about some of the extraordinary capabilities of Patroni for solving problems like logical replication slot failover.

One outstanding feature of Patroni is its powerful set of REST APIs. These APIs can be used for further automation, monitoring/healthchecks, and for integration with other systems. At the very least, the same APIs are useful for HAProxy integration for automatic connection routing, which is widely used.

Since the Patroni APIs don't expose any data inside the PostgreSQL database, and inbound traffic to database hosts is generally secured in most organizations, a large percentage of Patroni users are happy with the default REST APIs over HTTP without any additional authentication. However, that may not be the case if the database hosts are potentially exposed to a broader network. A few organizations have blanket rules that treat any HTTP traffic as a red flag. Sometimes, additional certificate verification also becomes mandatory per organization policy.

Safe and unsafe APIs

Patroni APIs can be broadly classified as Safe and Unsafe APIs.

The APIs used for understanding the current state and topology of the Patroni cluster are widely used for monitoring and for integration with other tools like load balancers or connection routers. They are basically GET requests and are considered "safe" APIs.

There is another set of APIs that can be used for administering Patroni, such as changing some of the PostgreSQL parameters or triggering a switchover. They are PUT, POST, PATCH, and DELETE requests, which can affect the state of the nodes. Those are considered unsafe APIs.
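As an illustration (the host name and port match the examples later in this post; /cluster and /switchover are standard Patroni endpoints):

```shell
# Safe: a read-only GET request describing the cluster topology
curl http://pg1:8008/cluster

# Unsafe: a POST request that changes cluster state by triggering a switchover
curl -X POST http://pg1:8008/switchover -d '{"leader": "pg1"}'
```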

Protecting unsafe APIs

In this article, I am going to use the REST API /reload, which reloads the configuration, as an example of an unsafe API.

This is the bare minimum to be achieved if traffic to the database/patroni host machine is not restricted by other means. Patroni provides a simple username and password mechanism by which all unsafe APIs can be protected.

Overall, the restapi: section of the Patroni configuration (YAML file) can have an entry like:

...
restapi:
  listen: 0.0.0.0:8008
  connect_address: pg1:8008
  authentication:
    username: patroniAdmin
    password: adminSecretPswd
...

An external unsafe request will result in an error:

$ curl -X POST http://pg1:8008/reload
no auth header received

But a Patroni admin can make a request by specifying the username and password:

$ curl -u patroniAdmin:adminSecretPswd -X POST http://pg1:8008/reload
reload scheduled

A request with the wrong credentials will be rejected:

$ curl -u patroniAdmin:adminSecretPswd0 -X POST http://pg0:8008/reload
not authenticated,

Unlike the curl command, patronictl from the database/patroni host will continue to function as normal (without prompting for passwords) because it has access to the password from the patroni configuration.
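For example, a reload triggered through patronictl needs no explicit credentials (the configuration path and cluster name here are assumptions; --force skips the confirmation prompt):

```shell
# patronictl reads the REST API password from the Patroni configuration file
patronictl -c /etc/patroni/patroni.yml reload my-cluster --force
```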

Caveats:

All nodes should use either HTTP or HTTPS. Mixing protocols can result in patronictl failures like:

Reload request received for member pg0 and will be processed within 10 seconds
Failed: reload for member pg1, status code=401, (no auth header received)

There is no requirement that each node use the same username and password. However, if the passwords differ, patronictl from other nodes will fail with the message:

Reload request received for member pg0 and will be processed within 10 seconds
Failed: reload for member pg1, status code=401, (not authenticated)

So it is safe to use the same credentials across all nodes.

Switch to HTTPS

Patroni automatically switches to HTTPS if a certificate file is specified. Please see the sample configuration below.

...
restapi:
  listen: 0.0.0.0:8008
  connect_address: pg0:8008
  authentication:
    username: patroniAdmin
    password: adminSecretPswd
  certfile: /etc/patroni/pg0.crt
  keyfile: /etc/patroni/pg0.key
  cafile: /etc/patroni/ca.crt
...

The meaning of each of these parameters is covered in the official Patroni documentation:

  • certfile: (optional): Specifies the file with the certificate in the PEM format. If the certfile is not specified or is left empty, the API server will work without SSL.
  • keyfile: (optional): Specifies the file with the secret key in the PEM format.
  • keyfile_password: (optional): Specifies a password for decrypting the keyfile.
  • cafile: (optional): Specifies the file with the CA_BUNDLE with certificates of trusted CAs to use while verifying client certs.

This change requires a restart of the Patroni service. Once restarted, Patroni comes up listening for HTTPS on port 8008 instead of HTTP.
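Assuming Patroni runs under systemd (the unit name patroni is an assumption), the switch could look like:

```shell
sudo systemctl restart patroni
# Verify that the REST API now answers over TLS (-k skips server verification):
curl -k https://pg0:8008/health
```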

Obviously, the cafile (CA Certificate) is required for the client to verify the server certificate. The client, including the patronictl, can choose not to verify the server certificates.

This patronictl behavior can be specified using another section, ctl:, in the Patroni configuration:

…
ctl:
  insecure: true
…

However, please expect warning messages from the Python libraries about skipping the certificate verification of the Patroni server, as follows:

Reload request received for member pg0 and will be processed within 10 seconds
2022-10-18 14:40:31,755 - WARNING - /usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py:1052: InsecureRequestWarning: Unverified HTTPS request is being made to host 'pg1'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,

Reload request received for member pg1 and will be processed within 10 seconds

So, it is always good to have the server certificate verification on the client side.

Please remember that only the "unsafe" APIs are protected by password authentication. Safe APIs will continue to function as they do now, so the impact of this change on other tools like HAProxy, which use those safe APIs, is almost nil.

$ curl -k https://pg0:8008/cluster | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   302    0   302    0     0   2659      0 --:--:-- --:--:-- --:--:--  2672
{
  "members": [
    {
      "name": "pg0",
      "role": "replica",
      "state": "running",
      "api_url": "https://pg0:8008/patroni",
      "host": "pg0",
      "port": 5432,
      "timeline": 39,
      "lag": 0
    },
    {
      "name": "pg1",
      "role": "leader",
      "state": "running",
      "api_url": "https://pg1:8008/patroni",
      "host": "pg1",
      "port": 5432,
      "timeline": 39
    }
  ]
}

In this blog post, we discussed setting up password authentication and switching the REST API communication to HTTPS, which should safeguard the REST APIs and satisfy many organizations' requirements. In the next part, we shall discuss how to set up certificate authentication instead of password authentication, along with a few other aspects of security.

Oct
13
2022
--

Run PostgreSQL in Kubernetes: Solutions, Pros and Cons

Run PostgreSQL in Kubernetes

PostgreSQL's initial release was in 1996, when cloud-native was not even a term. Right now it is the second most popular relational open source database according to DB-Engines. With its popularity growth and the rising trend of Kubernetes, it is no surprise that there are multiple solutions to run PostgreSQL on K8s.

In this blog post, we are going to compare these solutions and review the pros and cons of each of them. The solutions under our microscope are:

  1. Crunchy Data PostgreSQL Operator (PGO)
  2. CloudNativePG from EnterpriseDB
  3. StackGres from OnGres
  4. Zalando Postgres Operator
  5. Percona Operator for PostgreSQL

The summary and comparison table can be found in our documentation.

Crunchy Data PGO

Crunchy Data is a company well-known in the PostgreSQL community. They provide a wide range of services and software solutions for PG. Their PostgreSQL Operator (PGO) is fully open source (Apache 2.0 license), but at the same time, the container images used by the operator are shipped under the Crunchy Data Developer Program. This means that you cannot use the Operator with these images in production without a contract with Crunchy Data. Read more in the Terms of Use.

Deployment

According to the documentation, the latest version of the operator is 5.2.0, but the latest tag on GitHub is 4.7.7. I was not able to find which version is ready for production, but I will use the quickstart installation from the GitHub page, which installs 5.2.0. The quick start is not that quick. First, you need to fork the repository with examples.

Executing these commands failed for me:

YOUR_GITHUB_UN="<your GitHub username>"
git clone --depth 1 "git@github.com:${YOUR_GITHUB_UN}/postgres-operator-examples.git"
cd postgres-operator-examples

Cloning into 'postgres-operator-examples'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

I just ended up cloning the repo with:

git clone --depth 1 https://github.com/spron-in/postgres-operator-examples

Running the kustomize script failed as well:

$ kubectl apply -k kustomize/install
error: unable to find one of 'kustomization.yaml', 'kustomization.yml' or 'Kustomization' in directory '/home/percona/postgres-operator-examples/kustomize/install'

The instructions on the documentation page have other commands, so I used them instead. As a person who loves open source, I sent a PR to fix the doc on GitHub.

kubectl apply -k kustomize/install/namespace
kubectl apply --server-side -k kustomize/install/default

Now Operator is installed. Install the cluster:

kubectl apply -k kustomize/postgres/

Features

PGO operator is used in production by various companies, comes with management capabilities, and allows users to fine-tune PostgreSQL clusters.

There is no need to go through the regular day-two operations, like backups and scaling. The following features are more interesting:

  • Extension Management. PostgreSQL extensions expand the capabilities of the database. With PGO, you can easily add extensions for your cluster and configure them during bootstrap. I like the simplicity of this approach.
  • User / database management. Create users and databases during cluster initialization. This is very handy for CI/CD pipelines and various automation.
  • Backup with annotations. Usually, Operators come with a separate Custom Resource Definition for backups and restores. In the case of PGO, backups and restores are managed through annotations. This is an antipattern but still follows the declarative form.
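To illustrate the bootstrap-time user/database management, here is a minimal PostgresCluster sketch; the cluster, user, and database names are hypothetical, and the fields follow my reading of the v5 API:

```yaml
# Hypothetical minimal PGO v5 cluster: creates user "app" and database "appdb" at bootstrap
apiVersion: postgres-operator.crunchydata.com/v5
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 14
  users:
    - name: app
      databases:
        - appdb
  instances:
    - replicas: 1
      dataVolumeClaimSpec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 1Gi
```

Per the PGO documentation, a one-off backup can then be triggered by annotating the cluster, along the lines of kubectl annotate postgrescluster hippo postgres-operator.crunchydata.com/pgbackrest-backup="$(date)".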

CloudNative PG

This operator was maturing in EnterpriseDB (EDB) to be finally open-sourced recently. It is Apache-licensed and fully open source, and there is an EDB Postgres operator, which is a fork based on CloudNative PG. The Enterprise version has some additional features, for example, support for Red Hat OpenShift.

Deployment

Using quickstart, here is how to install the Operator:

kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.17/releases/cnpg-1.17.0.yaml

It automatically creates the cnpg-system namespace and deploys necessary CRDs, service accounts, and more.

Once done, you can deploy the PostgreSQL cluster. There are multiple example YAMLs:

kubectl apply -f https://cloudnative-pg.io/documentation/1.17/samples/cluster-example.yaml

There is also a helm chart available that can simplify the installation even more.

Features

CloudNative PG comes with a wide range of regular operational capabilities: backups, scaling, and upgrades. The architecture of the Operator is quite interesting:

  • No StatefulSets. Normally, you would see StatefulSets used for stateful workloads in Kubernetes. Here the PostgreSQL cluster is deployed with standalone Pods which are fully controlled by the Operator.
  • No Patroni. Patroni is a de-facto standard in the PostgreSQL community to build highly available clusters. Instead, they use Postgres instance manager.
  • Barman for backups. Not a usual choice as well, but can be explained by the fact that pgBarman, a backup tool for PostgreSQL, was developed by the 2nd Quadrant team which was acquired by EDB.

Apart from architecture decisions, there are some things that I found quite refreshing:

  • Documentation. As a product manager, I’m honestly fascinated by their documentation. It is very detailed, goes deep where needed, and is full of examples covering a wide variety of use cases.
  • The custom resource which is used to create the cluster is called “Cluster”. It is a bit weird, but running something like kubectl get cluster is kinda cool.
  • You can bootstrap a new cluster from an existing backup object, or use streaming replication from an existing PostgreSQL cluster, even from outside Kubernetes. Useful for CI/CD and migrations.
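As an example of that last point, bootstrapping a cluster by streaming from an external source might look roughly like this; the host, secret, and cluster names are hypothetical, and the fields follow my reading of the CloudNative PG Cluster API:

```yaml
# Hypothetical CNPG cluster bootstrapped via streaming replication
# from a PostgreSQL server running outside Kubernetes
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-clone
spec:
  instances: 3
  storage:
    size: 1Gi
  bootstrap:
    pg_basebackup:
      source: source-db        # must match an entry in externalClusters
  externalClusters:
    - name: source-db
      connectionParameters:
        host: legacy-pg.example.com
        user: streaming_replica
      password:
        name: source-db-creds  # pre-created Secret holding the password
        key: password
```

Once the clone catches up, the application can be switched over to the in-cluster endpoint, which makes this a handy migration path.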

Stackgres

OnGres is a company providing support, professional services, and managed services for PostgreSQL. The operator – Stackgres – is licensed under AGPL v3.

Deployment

Installation is super simple and described on the website. It boils down to a single command:

kubectl apply -f 'https://sgres.io/install'

This will deploy the web user interface and the operator. The recommended way to deploy and manage clusters is through the UI. Get the login and password:

kubectl get secret -n stackgres stackgres-restapi --template '{{ printf "username = %s\n" (.data.k8sUsername | base64decode) }}'
kubectl get secret -n stackgres stackgres-restapi --template '{{ printf "password = %s\n" (.data.clearPassword | base64decode) }}'

Connect to the UI. You can either expose the UI through a LoadBalancer or with Kubernetes port forwarding:

POD_NAME=$(kubectl get pods --namespace stackgres -l "app=stackgres-restapi" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward ${POD_NAME} --address 0.0.0.0 8443:9443 --namespace stackgres

Deployment of the cluster in the UI is quite straightforward and I will not cover it here.

Features

The UI allows users to scale, back up, restore, clone, and perform various other tasks with the clusters. I found it a bit hard to debug issues; it is recommended to set up a log server and debug issues there, but I have not tried it. The UI itself, though, is mature, flexible, and just nice!

Interesting ones:

  • Experimental Babelfish support that enables the migration from MSSQL to save on license costs.
  • Extension management system, where users can choose the extension and its version to expand PG cluster capabilities.

  • To perform upgrades, vacuums, and other database activities, the Operator provides a Database Operations capability. It also has built-in benchmarking, which is cool!
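A Database Operation is itself a custom resource. A hedged sketch of a pgbench benchmark run against a cluster, with hypothetical names and fields per my reading of the SGDbOps API:

```yaml
# Hypothetical Stackgres SGDbOps resource: run a pgbench benchmark
# against the cluster "my-cluster" for two minutes
apiVersion: stackgres.io/v1
kind: SGDbOps
metadata:
  name: pgbench-run
spec:
  sgCluster: my-cluster
  op: benchmark
  benchmark:
    type: pgbench
    pgbench:
      databaseSize: 1Gi
      duration: PT2M   # ISO 8601 duration
```

The same SGDbOps mechanism is what drives vacuums, restarts, and version upgrades, so one CRD covers the whole family of day-two operations.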

Zalando Postgres Operator

Zalando is an online retailer of shoes, fashion, and beauty. It is the only company in this blog post that is not database-focused. They open-sourced the Operator that they use internally to run and manage PostgreSQL databases, and it is quite widely adopted. It is worth mentioning that the Zalando team also developed and open-sourced Patroni.

Deployment

You can deploy Zalando Operator through a helm chart or with kubectl. Same as with Stackgres, this Operator has a built-in web UI.

Helm chart installation is the quickest and easiest way to get everything up and running:

# add repo for postgres-operator
helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator

# install the postgres-operator
helm install postgres-operator postgres-operator-charts/postgres-operator

# add repo for postgres-operator-ui
helm repo add postgres-operator-ui-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator-ui

# install the postgres-operator-ui
helm install postgres-operator-ui postgres-operator-ui-charts/postgres-operator-ui

Expose the UI:

kubectl port-forward svc/postgres-operator-ui 8081:80

Connect to the UI and create the cluster. 

Features

This is one of the oldest PostgreSQL Operators, over time its functionality was expanding. It supports backups and restores, major version upgrades, and much more. Also, it has a web-based user interface to ease onboarding.

  • The operator heavily relies on Spilo – a Docker image that provides PostgreSQL and Patroni bundled together. It was developed at Zalando as well. This is the centerpiece of the HA architecture.
  • As Zalando is using AWS for its infrastructure, the operator is heavily tested and can be integrated with AWS. You can see it in some features – like live volume resize for AWS EBS or gp2 to gp3 migration.
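Creating a cluster does not require the UI. A minimal manifest, close to the example shipped with the operator (the team and resource names here are illustrative), looks like this:

```yaml
# Minimal Zalando postgresql manifest: 2-node HA cluster
# with one application user and its database
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "acid"          # cluster name must be prefixed with the teamId
  numberOfInstances: 2
  volume:
    size: 1Gi
  postgresql:
    version: "14"
  users:
    app_user: []          # list of role flags, e.g. [superuser]
  databases:
    appdb: app_user       # database name -> owner role
```

Apply it with kubectl and the operator spins up the Spilo pods and wires Patroni for failover.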

Percona Operator for PostgreSQL

Percona is committed to providing software and services for databases anywhere. Kubernetes is a de-facto standard for cloud-native workloads that helps with this commitment.

What are the most important things about our Operator:

  • Fully open source
  • Supported by the community and Percona team. If you have a contract with Percona, you are fully covered with our exceptional services.
  • It is based on the Crunchy Data PGO v4.7 with enhancements for monitoring, upgradability, and flexibility

Deployment

We have quick-start installation guides through helm and regular YAML manifests. The installation through helm is as follows:

Install the Operator:

helm repo add percona https://percona.github.io/percona-helm-charts/
helm install my-operator percona/pg-operator --version 1.3.0

Deploy PostgreSQL cluster:

helm install my-db percona/pg-db --version 1.3.0

Features

Most of the features are inherited from Crunchy Data – backups, scaling, multi-cluster replication, and many more. 

    • Open Source. Compared to Crunchy Data PGO, we do not impose any limitations on container images, so it is fully open source and can be used without any restrictions in production. 
    • Percona Monitoring and Management (PMM) is an open source database monitoring, observability, and management tool. Percona Operators come with an integration with PMM, so that users get full visibility into the health of their databases. 
    • Automated Smart Upgrades. Our Operator not only allows users to upgrade the database but also does it automatically and in a safe, zero-downtime way.
    • One-stop shop. Today’s enterprise environment is multi-database by default. Percona can help companies run PostgreSQL, MySQL, and MongoDB databases workloads over Kubernetes in a comprehensive manner.
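For illustration, an abridged sketch of how the PMM integration appears in the v1.x custom resource; the exact fields may differ between releases, and the image tag and server host below are placeholders:

```yaml
# Hypothetical abridged PerconaPGCluster resource with PMM client enabled
apiVersion: pg.percona.com/v1
kind: PerconaPGCluster
metadata:
  name: cluster1
spec:
  crVersion: 1.3.0
  pgPrimary:
    volumeSpec:
      size: 1G
  pmm:
    enabled: true
    image: percona/pmm-client:2.32.0
    serverHost: monitoring-service   # PMM Server address inside the cluster
```

With this enabled, a PMM client sidecar ships metrics from every database pod to the PMM Server.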

To keep you excited, we are working on version two of the operator. It will have an improved architecture, remove existing limitations for backups and restores, enable automated scaling for storage and resources, and more. We plan to release a beta version this quarter, so keep an eye on our releases.

Conclusion

PostgreSQL in Kubernetes is not a necessary evil, but an evolutionary step for companies who chose k8s as their platform. Choosing a vendor and a solution is an important technical decision, which might impact various business metrics in the future. Still confused by the various choices? Please start a discussion on the forum or contact our team directly.

The Percona Kubernetes Operators automate the creation, alteration, or deletion of members in your Percona Distribution for MySQL, MongoDB, or PostgreSQL environment.

Learn More About Percona Kubernetes Operators

Oct
11
2022
--

Don’t Forget to Tune Stats Collector for PostgreSQL 14 and Older

Tune Stats Collector for PostgreSQL 14 and Older

Recently I blogged about a significant improvement in PostgreSQL 15: PostgreSQL 15: Stats Collector Gone? What’s New? While there is great cheer for this upcoming improvement, we could see a few comments about “inefficiencies” in previous versions.

That brought me to the realization that even though the option to tune the stats collector is part of the official documentation and recommendations, and there were many blog posts about it in the past, I rarely see someone trying to tune it in practice. So I feel it is worth a reminder again.

Option for PostgreSQL 14 and older

Simply moving the directory in which the stats files are generated (stats_temp_directory) to a location on ramfs or tmpfs, which are RAM-based filesystems, can save the IO overhead while making the stats collector much more efficient and accurate.

PostgreSQL documentation also officially advises doing so to reduce the impact:

For better performance, stats_temp_directory can be pointed at a RAM-based file system, decreasing physical I/O requirements. When the server shuts down cleanly, a permanent copy of the statistics data is stored in the pg_stat subdirectory so that statistics can be retained across server restarts.

Reference: PostgreSQL Documentation

How much space is required?

The current location where stats files are generated can be checked by inspecting the value of stats_temp_directory.

On Red Hat clones, the default location will be within the data directory.

postgres=# show stats_temp_directory ;
 stats_temp_directory 
----------------------
 pg_stat_tmp

and on Debian/Ubuntu, it will be in /var/run/postgresql, for example:

postgres=# show stats_temp_directory ;
          stats_temp_directory           
-----------------------------------------
 /var/run/postgresql/14-main.pg_stat_tmp

Once the location is identified, it is just a matter of checking the size of that directory using du -sh.

Generally, it won’t be higher than a few hundred MBs. The size depends on the number of databases and objects (tables and indexes) within those databases and, more importantly, on which statistics are collected, which is controlled by parameters like track_activities, track_activity_query_size, track_commit_timestamp, track_counts, track_functions, and track_io_timing, as mentioned in the PostgreSQL documentation.

Ramfs or tmpfs?

There are two main RAM-based filesystems: ramfs and tmpfs.

A ramfs can be mounted using an /etc/fstab entry like:

ramfs /var/lib/pgsql_stats_ram ramfs size=1G,uid=postgres,gid=postgres 0 0

However, there are a couple of disadvantages. Even if we specify the uid and gid, as shown above, the ramfs will be mounted as root :( and we need a script or method to change ownership or grant permission to the “postgres” account under which PostgreSQL will be running.

Another problem with ramfs is that we won’t be able to see the size using the df command. But there is an advantage also: ramfs is dynamically resizable and can grow as needed, which eliminates the problem of estimating the size required. However, ramfs cannot use swap if required, so there is a slight risk of a system hang, especially on systems with high memory constraints.

Considering the risks and demerits, ramfs is less popular, and tmpfs is the one that is generally used.

Here is a sample /etc/fstab entry for tmpfs:

tmpfs /var/lib/pgsql_stats_tmp tmpfs size=1G,uid=postgres,gid=postgres 0 0

Once this filesystem is mounted, it will appear as a regular filesystem, unlike ramfs:

$ df -h
Filesystem Size Used Avail Use% Mounted on
...
tmpfs 1.0G 0 1.0G 0% /var/lib/pgsql_stats_tmp

And it is now a matter of instructing PostgreSQL to use this location as the stats_temp_directory. Please remember that this is a dynamic parameter and doesn’t require a PostgreSQL restart.

ALTER SYSTEM SET stats_temp_directory = '/var/lib/pgsql_stats_tmp';

We just need to signal (SIGHUP) PostgreSQL to reload the new configuration:

select pg_reload_conf();

Another option is to make use of /dev/shm. However, it is recommended to maintain a subdirectory that is owned by the Postgres user account with strict permissions.
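Since /dev/shm is emptied on every reboot, such a subdirectory has to be recreated before PostgreSQL starts. One way to do that on systemd-based distributions is a tmpfiles.d entry; the path below is illustrative:

```ini
# /etc/tmpfiles.d/pgsql-stats.conf
# Recreate a postgres-owned stats directory under /dev/shm at boot
# type  path                  mode  uid       gid       age
d       /dev/shm/pgsql_stats  0700  postgres  postgres  -
```

systemd-tmpfiles processes this at boot (and on demand via systemd-tmpfiles --create), so the directory exists with the right ownership before the PostgreSQL service unit starts.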

What are the options for DBaaS?

Restrictions imposed by Database as a Service (DBaaS) offerings generally prevent users from even starting to estimate the RAM-based filesystem required. Some cloud vendors like AWS give instructions on how to set up a ramdisk for their DBaaS solution (RDS). Similar documentation exists for the Aurora offering as well. However, I couldn’t see any doc on size estimation.

Many other cloud vendors who offer DBaaS solutions are entirely silent about the stats_temp_directory.

Additional note

Taking the stats_temp_directory outside of the data directory needs additional care if we plan to run multiple instances on the same host machine. Each PostgreSQL instance/cluster needs to have its own stats_temp_directory.

Oct
06
2022
--

PostgreSQL 15 – New Features to Be Excited About

PostgreSQL 15 - New Features to be Excited About

After four beta releases and one Release Candidate, the RC2 for PostgreSQL 15 was announced earlier today. If all goes well – fingers crossed! – this release will be promoted to General Availability (or GA) on October 13.

PostgreSQL Global Development Group (PGDG) has been really good about releasing one major version and four minor versions each year. As always, this year’s major version, i.e. v15, has loads of new features and enhancements to offer. I describe below some of the features that I find especially exciting.

MERGE

Part of the SQL Standard and focusing on making the developer’s job easier, the MERGE command is a very useful enhancement, making it simple to conditionally insert, update, or delete data. The graphic below is a simple representation of the statement.

MERGE command PostgreSQL 15

The business logic that would have otherwise required many Lines of Code (LOC) is made simple with this conditional statement. Reducing the LOC count also reduces the maintenance overhead over the long term. MERGE has been around in Oracle and SQL Server for a while, and an interesting benefit that the implementation in PostgreSQL brings is that it makes it easier for people to move their SQL code from Oracle to PostgreSQL.
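As a sketch of the syntax (the table and column names are hypothetical), here is an upsert-with-delete that would otherwise take several statements and explicit existence checks:

```sql
-- Apply a batch of transactions to account balances:
-- update existing accounts, delete emptied ones, insert new ones
MERGE INTO customer_account ca
USING recent_transactions t
   ON t.customer_id = ca.customer_id
WHEN MATCHED AND ca.balance + t.transaction_value <= 0 THEN
    DELETE
WHEN MATCHED THEN
    UPDATE SET balance = ca.balance + t.transaction_value
WHEN NOT MATCHED THEN
    INSERT (customer_id, balance)
    VALUES (t.customer_id, t.transaction_value);
```

Each source row matches at most one WHEN clause, evaluated in order, which keeps the conditional logic in a single, readable statement.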

Row and column filters in logical replication

Logical Replication was added to core PostgreSQL in version 10. Since then, it has taken strides and added lots of enhancements and functionality in-core. Before v10, logical replication could only be achieved with the help of the pglogical extension.

With v15, logical replication adds the much-awaited feature of row and column-level filters.

Row and column filters in logical replication PostgreSQL 15

Row and column level filters help you selectively replicate data from your primary database to your standby. This selective replication is not just helpful for performance gains (in case not all data needs to be replicated). It can also be used for use cases like security and geo-fencing of data for regulatory compliance, implementation of multi-tenancy, and stripping off personally identifiable information for reporting purposes.
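For instance, stripping a sensitive column and geo-fencing rows can both be expressed on the publication itself; the table and column names here are hypothetical:

```sql
-- Publish only EU rows, and only the non-sensitive columns of "orders"
CREATE PUBLICATION eu_orders
    FOR TABLE orders (order_id, amount, created_at)
    WHERE (region = 'EU');
```

Subscribers to eu_orders then receive just that slice of the data, with no filtering logic needed on the subscriber side.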

Server-side compression and client decompression in pg_basebackup

pg_basebackup is a powerful utility that takes a physical backup of an entire PostgreSQL cluster, thus making point-in-time recovery as well as starting off a new standby cluster simpler and faster. With PostgreSQL 15, pg_basebackup now supports server-side compression as well as client-side decompression, enabling efficient storage of data. Supported compression algorithms include lz4, which is perhaps the fastest lossless compression, thereby adding performance efficiency to the process.
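A command-line sketch of the two modes, assuming the server allows replication connections and the paths below are placeholders:

```shell
# Server compresses with lz4; the client stores the compressed tar archive
pg_basebackup -D /backups/base --format=tar --compress=server-lz4

# Server compresses with lz4 over the wire; the client decompresses
# on the fly into a ready-to-use plain-format data directory
pg_basebackup -D /restore/data --format=plain --compress=server-lz4
```

The first form saves storage on the backup host; the second saves network bandwidth while still producing an uncompressed data directory.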

Logging format – jsonlog

The PostgreSQL 15 release adds the option to format your database logs in JSON, thereby making the logs compatible with the most popular structure among technologists. This “structured log” can also be used by other utilities for storage and analysis.
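A minimal sketch of enabling it in postgresql.conf; note that jsonlog, like csvlog, requires the logging collector to be on:

```ini
# postgresql.conf – emit structured JSON logs (PostgreSQL 15+)
logging_collector = on
log_destination = 'jsonlog'
# Log files under log_directory get a .json extension
```

Each log line then becomes a JSON object with keys like timestamp, error_severity, and message, which log shippers and analysis tools can ingest without custom parsing.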

Performance enhancements

There has been a string of performance improvements over the past few years in PostgreSQL targeting both local and distributed workloads. While some of these improvements require an understanding of the new features and leveraging them explicitly, many ‘just work’ behind the scenes to make your database perform better.

Improved sorting algorithms

Improvements to on-disk and in-memory sorting algorithms in PostgreSQL 15 have resulted in benchmark speedups of 25% – 400%. What’s even better is that you don’t have to make any changes to your database or to your application to get these performance gains … they just work!

Support for parallelism

Leveraging the power offered by multi-core CPUs, parallelism has been increasing with each new PostgreSQL release. v15 offers parallel SELECT DISTINCT and parallel commits in postgres_fdw. These are some of those features that you need to specifically leverage in order to gain performance using them – they are not the default behavior and neither do they ‘just work’ behind the scenes.
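For example, the postgres_fdw parallel commit has to be opted into per foreign server; the server name here is hypothetical:

```sql
-- Allow transactions touching this foreign server to be
-- committed in parallel with commits on other remote servers
ALTER SERVER remote_pg OPTIONS (ADD parallel_commit 'true');
```

Parallel SELECT DISTINCT, by contrast, is picked up by the planner automatically once parallel workers are available, so only the postgres_fdw side needs explicit configuration.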

Server stats in shared memory

Starting PostgreSQL 15, server-level statistics – which were previously part of a separate statistics collector process – are stored in shared memory. This eliminates the separate statistics collector process. PostgreSQL tracks all activities of each process to have cumulative stats and all the information is available through different pg_stat_* views. Jobin Augustine has written a fairly detailed article on this topic: PostgreSQL 15: Stats Collector Gone? What’s New?

Summary

PostgreSQL is taking massive strides in improving its performance and offering new features in each new release. v15 is no different. No surprise that PostgreSQL is the most loved as well as the most wanted database in Stack Overflow’s latest developer survey, along with being one of the fastest growing databases in popularity at DB-Engines. What is important, however, is to understand what features can be leveraged for your use case to make your PostgreSQL perform at its optimum.
