Streamline the SQL Code: Guide to pgFormatter

In database management, well-formatted SQL code is easier to read and is vital to maintaining clean, efficient, and error-free databases. PostgreSQL developers and database administrators often deal with complex SQL queries, making code readability a critical factor for productivity. A powerful tool designed to beautify and standardize SQL code is pgFormatter. This guide explores pgFormatter's features, its installation, and how it can revolutionize the SQL coding experience.
What is pgFormatter?
pgFormatter is an open source utility that takes SQL code as input and outputs beautifully formatted, indented, and aligned SQL code following a set of predefined rules. It not only enhances code readability but also enforces best practices and SQL style conventions. As a result, developers can focus more on writing logic and less on code presentation.
Key features of pgFormatter
SQL code beautification: pgFormatter transforms complex and unformatted SQL code into well-structured, visually appealing statements, making it easier to comprehend and maintain.
Indentation and alignment: The tool ensures consistent indentation and alignment of SQL keywords, clauses, and expressions, providing a clear visual code hierarchy.
Standard SQL style: pgFormatter follows standard SQL style guidelines, promoting code consistency across your projects and team members.
Configurable rules: Users can customize formatting rules according to their preferences, adapting pgFormatter to specific project requirements.
Installation and setup of pgFormatter
Don't forget to update the package repositories before starting the installation:
sudo yum update        # RHEL/CentOS
sudo apt-get update    # Debian/Ubuntu
The Perl CGI module needs to be installed as a prerequisite for pgFormatter:
sudo yum install perl-CGI             # RHEL/CentOS
sudo apt-get install libcgi-pm-perl   # Debian/Ubuntu
The tool itself can then be installed with:
sudo apt-get install pgformatter
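Once installed, a quick sanity check confirms that the binary is available (the exact version string will vary with the installed release):
which pg_format
pg_format --version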
Using pgFormatter
Basic usage: Once pg_format is installed, any of the following commands will produce beautified SQL output:
cat samples/ex1.sql | /usr/bin/pg_format -
/usr/bin/pg_format -n samples/ex1.sql
/usr/bin/pg_format -o result.sql samples/ex1.sql
These are a few of the most commonly used invocations; run pg_format --help to see the full list of options.
For example:
We have the below unformatted query, which is difficult to read:
ubuntu@ip-172-31-95-64:/tmp$ cat Query.sql
SELECT c.customer_id, c.first_name, c.last_name, COUNT(o.order_id) AS total_orders
FROM customers c JOIN orders o ON c.customer_id = o.customer_id
WHERE c.country IN ('USA', 'Canada','India', 'UK')
AND o.order_date BETWEEN '2023-01-01' AND '2023-05-31'
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(o.order_id) > 5
ORDER BY total_orders DESC, c.last_name ASC;
Let's use pgFormatter to beautify the query and make it easier to read:
ubuntu@ip-172-31-95-64:/tmp$ /usr/bin/pg_format Query.sql
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    COUNT(o.order_id) AS total_orders
FROM
    customers c
    JOIN orders o ON c.customer_id = o.customer_id
WHERE
    c.country IN ('USA', 'Canada', 'India', 'UK')
    AND o.order_date BETWEEN '2023-01-01' AND '2023-05-31'
GROUP BY
    c.customer_id,
    c.first_name,
    c.last_name
HAVING
    COUNT(o.order_id) > 5
ORDER BY
    total_orders DESC,
    c.last_name ASC;
As mentioned earlier, pg_format accepts many options that can be combined to match individual requirements.
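For instance, assuming a recent pgFormatter release (check pg_format --help, as the available flags vary by version), the indentation width and keyword case can be tuned like this:
# 2-space indent, uppercase keywords, result written to result.sql
/usr/bin/pg_format --spaces 2 --keyword-case 2 -o result.sql Query.sql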
pgFormatter web interface: The web-based version of pgFormatter allows you to format SQL code conveniently from the browser at https://sqlformat.darold.net/.
Paste the unformatted query into the text area, then click the "Format my code" button at the bottom left corner; the formatted query is displayed in place.
Note: Please refrain from using the web interface for sensitive queries, as doing so might expose critical data on the internet.
Best practices and benefits
Debugging and troubleshooting: When developers need to inspect SQL code during debugging or troubleshooting manually, well-formatted code is easier to navigate and understand. pgFormatter’s output presents code in a structured manner, helping developers quickly locate potential issues.
Collaborative projects: In a team of developers working on a collaborative PostgreSQL project, each team member may have their own coding style and formatting preferences. Without a tool like pgFormatter, the SQL codebase can become inconsistent and challenging to read. By integrating pgFormatter into their code editors or using it from the command line, team members can ensure that all SQL code follows a standardized format, promoting better collaboration and code reviews.
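As one possible integration, a minimal Git pre-commit hook (a sketch; adapt the file selection and staging behavior to your repository) can run pg_format over staged .sql files before every commit:
#!/bin/sh
# .git/hooks/pre-commit: format staged .sql files with pg_format
for f in $(git diff --cached --name-only --diff-filter=ACM | grep '\.sql$'); do
    tmp=$(mktemp)
    pg_format "$f" > "$tmp" && mv "$tmp" "$f"   # rewrite the file with formatted SQL
    git add "$f"                                # re-stage the formatted version
done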
SQL code reviews: During code reviews, it's essential to focus on the logic and functionality of the SQL code rather than its presentation. However, poorly formatted code can distract and make it harder to spot actual issues. By running pgFormatter before submitting SQL code for review, developers can present their queries in a neat, organized form, making it easier for reviewers to focus on the code's correctness.
Query optimization: In complex SQL queries, code organization can significantly impact database performance. pgFormatter ensures that queries are properly indented and aligned, making it easier for developers to identify areas for optimization. Formatted code enables them to focus on the logical structure and improve query performance efficiently.
Migration and deployment: When deploying a PostgreSQL database schema or migrating it to a new environment, having properly formatted SQL scripts is crucial. pgFormatter helps ensure that schema scripts are consistently formatted, reducing the risk of errors during migrations and deployments.
Adherence to coding standards: Many organizations have their own SQL coding standards and best practices. pgFormatter allows developers to configure formatting rules to match their specific coding guidelines, ensuring adherence to the established standards across the entire codebase.
Useful links
- Download: http://sourceforge.net/projects/pgformatter/
- Source code and documentation: https://github.com/darold/pgFormatter
Conclusion
In conclusion, pgFormatter is a valuable tool for PostgreSQL developers and administrators seeking to enhance their SQL coding experience. By incorporating pgFormatter into your workflow, you can achieve consistently formatted, clean, and readable SQL code, improving productivity and reducing the risk of errors. Whether you’re working on complex database queries or maintaining a large codebase, pgFormatter is the ideal companion for elevating your SQL coding journey. So, dive into the world of beautifully formatted SQL with pgFormatter, and elevate your PostgreSQL development to new heights.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community in a single distribution, designed and tested to work together.
Unleashing the Power of PostgreSQL Event-Based Triggers

PostgreSQL provides a powerful mechanism for implementing event-driven actions using triggers. Triggers on Data Definition Language (DDL) events are a powerful feature of PostgreSQL that allows you to perform additional actions in response to changes to the database schema. DDL events include operations such as CREATE, ALTER, and DROP statements on tables, indexes, and other database objects. In this blog post, we will explore using triggers on DDL events in PostgreSQL to implement custom logic and automate database management tasks.
Creating event-based triggers
To create an event-based trigger in PostgreSQL, first create a trigger function that defines the logic to be executed when the trigger fires. The trigger function can be written in PL/pgSQL, PL/Python, or any other language supported by PostgreSQL.
A trigger function is created in the same way as any user-defined function, except that it returns the event_trigger pseudo-type instead of a normal data type:
CREATE OR REPLACE FUNCTION function_name()
RETURNS event_trigger AS
$$
DECLARE
    -- Declare variables if needed
BEGIN
    -- Function body
    -- Perform desired actions when the trigger fires
END;
$$
LANGUAGE plpgsql;
Once the trigger function exists, an event trigger associated with a specific event can be created. Unlike regular triggers (which fire on INSERT, UPDATE, and DELETE DML operations) that are created on specific tables, event-based triggers are created for DDL events and not on a particular table.
CREATE EVENT TRIGGER trigger_name
ON event_trigger_event
[ WHEN filter_condition ]
EXECUTE FUNCTION trigger_function_name();
In this syntax, event_trigger_event can be any of the following events, which are described in more detail in the PostgreSQL documentation:
- ddl_command_start
- ddl_command_end
- sql_drop
- table_rewrite
This syntax will become clearer with the examples in the next sections.
Using triggers on DDL events
Triggers on DDL events can be used for a wide range of purposes and database management tasks. Here are some examples of how to use DDL triggers:
- Log schema changes: One can use DDL triggers to log all schema changes, providing an audit trail of who made the changes and when.
- Automate database management tasks: One can use DDL triggers to automate routine database management tasks, such as creating indexes or updating views.
- Enforce naming conventions: One could use a DDL trigger to enforce naming conventions for tables and columns, ensuring that all objects are named consistently.
Let’s create a few triggers that help us understand all the above usages of event triggers.
Before creating the trigger, let’s create the table which will log all the DDL statements:
CREATE TABLE ddl_log (
    id integer PRIMARY KEY,
    username TEXT,
    object_tag TEXT,
    ddl_command TEXT,
    timestamp TIMESTAMP
);
CREATE SEQUENCE ddl_log_seq;
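As a side note, on PostgreSQL 10 and later, the separate sequence can be avoided by declaring id as an identity column; a minimal alternative definition (the trigger function's INSERT would then simply omit the id column):
CREATE TABLE ddl_log (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    username TEXT,
    object_tag TEXT,
    ddl_command TEXT,
    timestamp TIMESTAMP
);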
Let’s create the event trigger function, which will insert the data into the above table:
CREATE OR REPLACE FUNCTION log_ddl_changes()
RETURNS event_trigger AS $$
BEGIN
    INSERT INTO ddl_log
    (
        id,
        username,
        object_tag,
        ddl_command,
        timestamp
    )
    VALUES
    (
        nextval('ddl_log_seq'),
        current_user,
        tg_tag,
        current_query(),
        current_timestamp
    );
END;
$$ LANGUAGE plpgsql;
Let’s finally create the trigger, which will call the trigger function created above:
CREATE EVENT TRIGGER log_ddl_trigger
ON ddl_command_end
EXECUTE FUNCTION log_ddl_changes();
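To confirm the event trigger was created, list event triggers with the \dy meta-command in psql, or query the pg_event_trigger catalog directly:
SELECT evtname, evtevent, evtenabled
FROM pg_event_trigger;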
Let’s create a test table and check if we get the entry in the ddl_log table or not:
demo=# create table test (t1 numeric primary key);
CREATE TABLE
demo=# select * from ddl_log;
id | username | object_tag | ddl_command | timestamp
----+----------+--------------+---------------------------------------------+----------------------------
1 | postgres | CREATE TABLE | create table test (t1 numeric primary key); | 2023-06-02 15:24:54.067929
(1 row)
demo=# drop table test;
DROP TABLE
demo=#
demo=# select * from ddl_log;
id | username | object_tag | ddl_command | timestamp
----+----------+--------------+---------------------------------------------+----------------------------
1 | postgres | CREATE TABLE | create table test (t1 numeric primary key); | 2023-06-02 15:24:54.067929
2 | postgres | DROP TABLE | drop table test; | 2023-06-02 15:25:14.590444
(2 rows)
In this way, schema changes can be logged using the above event trigger code.
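If logging every DDL command is too noisy, the trigger can be restricted to specific command tags with a WHEN clause; for example (the trigger name here is illustrative):
CREATE EVENT TRIGGER log_table_ddl_trigger
ON ddl_command_end
WHEN TAG IN ('CREATE TABLE', 'ALTER TABLE', 'DROP TABLE')
EXECUTE FUNCTION log_ddl_changes();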
Even though it is not a rule of thumb, in my experience, foreign key columns are most often the ones used in join conditions, so indexes on them usually pay off; the next trigger automates index creation on foreign key columns. Many applications also enforce naming conventions for table names, so the same example throws an error if a table name does not start with 'tbl_'. Similar code can be developed for any object type.
Below is a common trigger function covering both the naming convention check and the index creation; this code was custom developed from my experience working with PostgreSQL and researching online:
CREATE OR REPLACE FUNCTION chk_tblnm_crt_indx()
RETURNS event_trigger AS
$$
DECLARE
    obj record;
    col record;
    table_name text;
    column_name text;
BEGIN
    FOR obj IN
        SELECT * FROM pg_event_trigger_ddl_commands() WHERE command_tag = 'CREATE TABLE'
    LOOP
        -- Check if the table name starts with tbl_
        table_name := obj.objid::regclass;
        IF table_name NOT LIKE 'tbl_%' THEN
            RAISE EXCEPTION 'Table name must start with tbl_';
        END IF;
        -- Check if there is any foreign key, then create an index
        FOR col IN
            SELECT a.attname AS column_name
            FROM pg_constraint AS c
            CROSS JOIN LATERAL unnest(c.conkey) WITH ORDINALITY AS k(attnum, n)
            JOIN pg_attribute AS a
                ON k.attnum = a.attnum AND c.conrelid = a.attrelid
            WHERE c.contype = 'f' AND c.conrelid = obj.objid::regclass
        LOOP
            EXECUTE format('CREATE INDEX idx_%s_%s ON %s (%s)', table_name, col.column_name, table_name, col.column_name);
            RAISE NOTICE 'INDEX idx_%_% ON % (%) has been created on foreign key column', table_name, col.column_name, table_name, col.column_name;
        END LOOP;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
Let’s finally create the trigger which will call this event trigger function:
CREATE EVENT TRIGGER chk_tblnm_crt_indx_trigger
ON ddl_command_end
EXECUTE FUNCTION chk_tblnm_crt_indx();
Let’s create a table that does not start with ‘tbl_’ and check how it gives an error:
demo=# create table dept (dept_id numeric primary key, dept_name varchar);
ERROR: Table name must start with tbl_
CONTEXT: PL/pgSQL function chk_tblnm_crt_indx() line 21 at RAISE
demo=#
demo=# create table tbl_dept (dept_id numeric primary key, dept_name varchar);
CREATE TABLE
Now, let’s create another table that references the tbl_dept table to check if an index is created automatically for a foreign key column or not.
demo=# create table tbl_emp(emp_id numeric primary key, emp_name varchar, dept_id numeric references tbl_dept(dept_id));
NOTICE: INDEX idx_tbl_emp_dept_id ON tbl_emp (dept_id) has been created on foreign key column
CREATE TABLE
demo=#
demo=# \di idx_tbl_emp_dept_id
List of relations
Schema | Name | Type | Owner | Table
--------+---------------------+-------+----------+---------
public | idx_tbl_emp_dept_id | index | postgres | tbl_emp
(1 row)
As per the output of \di, we can see that the index has been created automatically on the foreign key column.
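Since event triggers fire on every matching DDL statement, it is worth knowing how to switch them off when needed; the standard commands are:
-- Temporarily disable the trigger (e.g., during bulk schema changes)
ALTER EVENT TRIGGER chk_tblnm_crt_indx_trigger DISABLE;
-- Re-enable it afterward
ALTER EVENT TRIGGER chk_tblnm_crt_indx_trigger ENABLE;
-- Or remove it entirely
DROP EVENT TRIGGER chk_tblnm_crt_indx_trigger;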
Conclusion
Event-based triggers are a powerful feature of PostgreSQL that allows the implementation of complex business logic and helps automate database operations. By creating triggers associated with specific events, custom logic runs automatically when the event occurs, enabling additional actions and enforcing business rules. With event-based triggers, one can build more robust and automated database systems and improve operational efficiency.
On the other hand, event triggers are best suited to non-production environments, as they can become an overhead in production if the logic is too complex. In my personal opinion, if a table is populated by the application, triggers should not be created on it; such constraints should be implemented on the application side to reduce database load. At the same time, event triggers can be a boon in development (or other non-production) environments for enforcing best practices and recommendations: who made which changes, whether proper naming conventions are used, and similar industry standards. In my experience, I have used them extensively for audit purposes in development environments to track the changes made by teams of hundreds of people.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.
What if the Backup Server Is Down and a Backup Is Needed? (Multi-repo Functionality of PgBackRest)

Our previous blogs discussed configuring and setting up backups using the pgBackRest solution. To briefly recap, pgBackRest is an open source backup tool that takes full, incremental, and differential backups of PostgreSQL databases.
A repository is the location/path, on a server or in the cloud, where the actual copy of the backup resides. In this blog, we will specifically discuss one of the important features of pgBackRest called Multiple Repositories (or, in short, multi-repo). This feature makes it possible to keep redundant copies of the databases at multiple locations, remotely on different servers or locally on the same server.
Let’s discuss a few scenarios one by one.
Scenarios:
1. Behavior of pgBackRest with single repo
2. Behavior of pgBackRest with multiple repos
2.1. Configuring archives redundantly (async=y)
2.2. Taking backup locally with multiple repos
2.3. Taking backup locally and remotely on the cloud
2.4. Take backup locally and in multiple clouds
Pre-configured setup:
- PostgreSQL installed and configured on the database host.
- pgBackRest installed and configured on a dedicated backup host and on the database host.
Scenario – 1: Behavior of pgBackRest with single repo
By default, pgBackRest takes backups into a single repository or location, which can reside locally on the same server, on a remote server, or in the cloud.
Below is the configuration file normally used for taking backups with a single repo.
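A minimal sketch of the backup host's /etc/pgbackrest.conf for this scenario, reconstructed from the options visible in the backup command output below (adapt the paths and the DB host address to your environment):
[global]
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
log-level-console=info
log-level-file=debug
start-fast=y

[pgstanza]
pg1-host=18.210.15.186
pg1-path=/var/lib/postgresql/15/main
On the DB host, the [global] section would instead point repo1-host at the backup server's address (with repo1-host-user=postgres), and [pgstanza] would carry the local pg1-path.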
The configuration file's global section holds common parameters such as repository details and log settings. Even with a single repository, parameter names are prefixed with "repo1" so that additional repositories can be added later. Here, the backup is stored on the local backup host under /var/lib/pgbackrest_repo1, a path created beforehand with the proper owner (in this case, postgres) and permissions.
[pgstanza] is the name of the stanza being backed up. For the sake of simplicity, we consider backups for one database server only.
Let’s take the backup using the pgbackrest command:
On the Backup Host:
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup
2023-03-23 04:19:14.059 P00 INFO: backup command begin 2.44: --exec-id=157866-5813ef0e --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo1-path=/var/lib/pgbackrest_repo1 --repo1-retention-full=2 --stanza=pgstanza --start-fast --type=full
2023-03-23 04:19:15.315 P00 INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
2023-03-23 04:19:15.822 P00 INFO: backup start archive = 000000010000000000000039, lsn = 0/39000028
2023-03-23 04:19:15.822 P00 INFO: check archive for prior segment 000000010000000000000038
2023-03-23 04:19:23.184 P00 INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
2023-03-23 04:19:23.386 P00 INFO: backup stop archive = 000000010000000000000039, lsn = 0/39000138
2023-03-23 04:19:23.394 P00 INFO: check archive for segment(s) 000000010000000000000039:000000010000000000000039
2023-03-23 04:19:23.711 P00 INFO: new backup label = 20230323-041915F
2023-03-23 04:19:23.790 P00 INFO: full backup size = 22.0MB, file total = 961
2023-03-23 04:19:23.790 P00 INFO: backup command end: completed successfully (9733ms)
2023-03-23 04:19:23.791 P00 INFO: expire command begin 2.44: --exec-id=157866-5813ef0e --log-level-console=info --log-level-file=debug --repo1-path=/var/lib/pgbackrest_repo1 --repo1-retention-full=2 --stanza=pgstanza
2023-03-23 04:19:23.792 P00 INFO: repo1: expire full backup 20230323-040330F
2023-03-23 04:19:23.806 P00 INFO: repo1: remove expired backup 20230323-040330F
2023-03-23 04:19:23.829 P00 INFO: repo1: 15-1 remove archive, start = 000000010000000000000035, stop = 000000010000000000000036
2023-03-23 04:19:23.830 P00 INFO: expire command end: completed successfully (39ms)
As shown below, the backup goes into the local repository directory, which contains two subdirectories:
- backup – contains the FULL/INCR/DIFF backups
- archive – contains the WAL archives that enable PITR
postgres@ip-172-31-54-194:~$ cd /var/lib/pgbackrest_repo1
postgres@ip-172-31-54-194:/var/lib/pgbackrest_repo1$ ls -ltr
total 8
drwxr-x--- 3 postgres postgres 4096 Feb 15 13:16 archive
drwxr-x--- 3 postgres postgres 4096 Feb 15 13:16 backup
postgres@ip-172-31-54-194:/var/lib/pgbackrest_repo1$
Scenario – 2: Behavior of pgBackRest with multiple repos
The multiple repository (or multi-repo) functionality of pgBackRest supports different combinations of storing redundant backup copies. This section discusses a few of the most useful combinations.
2.1 Configuring asynchronous archiving (archive-async=y)
WAL file redundancy across two different repos is possible by using asynchronous archiving.
The archive-async=y parameter allows the archive-push and archive-get commands to work asynchronously.
When this parameter is enabled, pgBackRest copies the WAL files into both repos. The example below shows pgBackRest archiving the WAL files into two different repos:
--repo1-path=/var/lib/pgbackrest_repo1
--repo2-path=/var/lib/pgbackrest_repo2
If the first repo is unavailable and the second repo is available, pgBackRest will continue copying/archiving the WAL files to the second repo; the WAL files that could not be archived to the first repo will accumulate in the pg_wal directory.
A spool path (spool-path) parameter is necessary when asynchronous archiving is enabled; the current WAL archiving status is stored there.
We can track the activities of the asynchronous process in the [stanza]-archive-push-async.log file.
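For reference, a sketch of the settings involved, placed in the [global] section of pgbackrest.conf on the host that runs archive_command (the spool directory here is illustrative and must already exist with the proper ownership and permissions):
[global]
archive-async=y
spool-path=/var/spool/pgbackrest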
On the Backup Host:
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info check
2023-03-23 04:35:59.074 P00 INFO: check command begin 2.44: --exec-id=158656-d4a8f71e --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --stanza=pgstanza
2023-03-23 04:35:59.920 P00 INFO: check repo1 configuration (primary)
2023-03-23 04:35:59.921 P00 INFO: check repo2 configuration (primary)
2023-03-23 04:36:00.124 P00 INFO: check repo1 archive for WAL (primary)
2023-03-23 04:36:01.327 P00 INFO: WAL segment 00000001000000000000003C successfully archived to '/var/lib/pgbackrest_repo1/archive/pgstanza/15-1/0000000100000000/00000001000000000000003C-6aa2de4dca50db51592d139010bdfb7a8c2c45ce.gz' on repo1
2023-03-23 04:36:01.328 P00 INFO: check repo2 archive for WAL (primary)
2023-03-23 04:36:01.328 P00 INFO: WAL segment 00000001000000000000003C successfully archived to '/var/lib/pgbackrest_repo2/archive/pgstanza/15-1/0000000100000000/00000001000000000000003C-6aa2de4dca50db51592d139010bdfb7a8c2c45ce.gz' on repo2
2023-03-23 04:36:01.430 P00 INFO: check command end: completed successfully (2358ms)
postgres@ip-172-31-54-194:~$
2.2 Taking backup locally with multiple repos:
In this example, we create two repositories on the local backup server itself, namely pgbackrest_repo1 and pgbackrest_repo2. The two repositories can be placed on different storage; if one storage device is unavailable, the other will still hold a backup, which can be helpful.
Please find the example of pgbackrest.conf in the backup host with two local repositories:
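A minimal sketch, reconstructed from the options visible in the backup command output below (both repositories are local paths on the backup host):
[global]
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo2-path=/var/lib/pgbackrest_repo2
repo2-retention-full=2
log-level-console=info
log-level-file=debug
start-fast=y

[pgstanza]
pg1-host=18.210.15.186
pg1-path=/var/lib/postgresql/15/main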
Let’s rename pgbackrest_repo1 so that it becomes inaccessible, and then let’s try to take the backup:
On the Backup Host:
ubuntu@ip-172-31-54-194:~$ sudo mv /var/lib/pgbackrest_repo1 /var/lib/pgbackrest_repo1_bkp
ubuntu@ip-172-31-54-194:~$ sudo su - postgres
postgres@ip-172-31-54-194:~$
postgres@ip-172-31-54-194:~$
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup
2023-02-10 12:40:27.160 P00 INFO: backup command begin 2.44: --exec-id=23422-c65cc1d9 --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --repo1-retention-full=2 --repo2-retention-full=2 --stanza=pgstanza --start-fast --type=full
2023-02-10 12:40:27.161 P00 INFO: repo option not specified, defaulting to repo1
ERROR: [055]: unable to load info file '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info' or '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info.copy':
FileMissingError: unable to open missing file '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info' for read
FileMissingError: unable to open missing file '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info.copy' for read
HINT: backup.info cannot be opened and is required to perform a backup.
HINT: has a stanza-create been performed?
2023-02-10 12:40:27.162 P00 INFO: backup command end: aborted with exception [055]
postgres@ip-172-31-54-194:~$
As shown above, pgBackRest cannot take the backup and throws the error, which is expected because the default repository (repo1) is inaccessible.
Let's try to take a backup in repo2 and check whether it allows us to do so.
On the Backup Host:
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup --repo=2
2023-02-10 12:40:34.605 P00 INFO: backup command begin 2.44: --exec-id=23423-e840ad8d --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo=2 --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --repo1-retention-full=2 --repo2-retention-full=2 --stanza=pgstanza --start-fast --type=full
2023-02-10 12:40:35.949 P00 INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
2023-02-10 12:40:36.456 P00 INFO: backup start archive = 000000010000000000000028, lsn = 0/28000028
2023-02-10 12:40:36.456 P00 INFO: check archive for prior segment 000000010000000000000027
2023-02-10 12:40:43.993 P00 INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
2023-02-10 12:40:44.195 P00 INFO: backup stop archive = 000000010000000000000028, lsn = 0/28000138
2023-02-10 12:40:44.201 P00 INFO: check archive for segment(s) 000000010000000000000028:000000010000000000000028
2023-02-10 12:40:45.521 P00 INFO: new backup label = 20230210-124035F
2023-02-10 12:40:45.579 P00 INFO: full backup size = 22.0MB, file total = 961
2023-02-10 12:40:45.580 P00 INFO: backup command end: completed successfully (10978ms)
2023-02-10 12:40:45.580 P00 INFO: expire command begin 2.44: --exec-id=23423-e840ad8d --log-level-console=info --log-level-file=debug --repo=2 --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --repo1-retention-full=2 --repo2-retention-full=2 --stanza=pgstanza
2023-02-10 12:40:45.592 P00 INFO: repo2: 15-1 remove archive, start = 000000010000000000000020, stop = 000000010000000000000025
2023-02-10 12:40:45.592 P00 INFO: expire command end: completed successfully (12ms)
postgres@ip-172-31-54-194:~$
Excellent… the backup was successful for repo2. Now, let’s check the info and see what it says for repo1.
On the Backup Host:
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info info
stanza: pgstanza
    status: mixed
        repo1: error (missing stanza path)
        repo2: ok
    cipher: none

    db (current)
        wal archive min/max (15): 000000010000000000000026/000000010000000000000028

        full backup: 20230210-123819F
            timestamp start/stop: 2023-02-10 12:38:19 / 2023-02-10 12:38:28
            wal start/stop: 000000010000000000000026 / 000000010000000000000026
            database size: 22.0MB, database backup size: 22.0MB
            repo2: backup set size: 2.9MB, backup size: 2.9MB

        full backup: 20230210-124035F
            timestamp start/stop: 2023-02-10 12:40:35 / 2023-02-10 12:40:44
            wal start/stop: 000000010000000000000028 / 000000010000000000000028
            database size: 22.0MB, database backup size: 22.0MB
            repo2: backup set size: 2.9MB, backup size: 2.9MB
postgres@ip-172-31-54-194:~$
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info info --repo=1
stanza: pgstanza
    status: error (missing stanza path)
postgres@ip-172-31-54-194:~$
2.3 Taking backup locally and remotely on the cloud:
Let's consider a scenario where one repo is available locally on the dedicated backup server and another repo is available in the cloud. The advantage is that if either the local repo or the cloud repo is unavailable, the backup can still be served from the other. This combination lets us take advantage of both the cloud and on-prem local machines.
Let’s check the main configuration needed in pgbackrest.conf:
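A minimal sketch for the backup host, mirroring the repo1 and repo2 settings from the four-repo example in the next section (the bucket name, keys, and region are placeholders):
[global]
## Repo1: Local
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
## Repo2: AWS S3
repo2-type=s3
repo2-path=/pgbackrest_repo2
repo2-retention-full=2
repo2-s3-bucket=s3bucket
repo2-s3-endpoint=s3.us-east-1.amazonaws.com
repo2-s3-key=accessKey2
repo2-s3-key-secret=verySecretKey2
repo2-s3-region=us-east-1

[pgstanza]
pg1-host=18.210.15.186
pg1-path=/var/lib/postgresql/15/main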
As we can see, the repo1-related options store the backup in the local repository on the dedicated backup host at /var/lib/pgbackrest_repo1. A few important options for repo2, which lives in AWS S3:
- repo2-type is s3, indicating AWS S3; it would be azure for Azure or gcs for Google Cloud.
- repo2-s3-bucket, repo2-s3-endpoint, repo2-s3-key-secret, and repo2-s3-region vary from cloud to cloud.
A bucket (or the required repo) with the proper user and permissions must be created before configuring pgBackRest backups. More information can be found in the pgBackRest User Guide.
2.4 Take backup locally and in multiple clouds:
Another very useful scenario is creating repositories on multiple clouds plus one locally on the dedicated backup host. Even if one cloud provider is unavailable, a backup is still available from another cloud or from the local repository. The configuration simply continues the same pattern: repo1-type, repo2-type, repo3-type, and so on.
In this setup, four repositories are created: one available locally and the others on different clouds, viz. AWS S3, Azure, and Google Cloud, respectively. The configuration on the backup host will be as follows.
On the Backup Host:
postgres@ip-172-31-54-194:~$ cat /etc/pgbackrest.conf
[global]
## Repo1: Local
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user = postgres
## Repo2: AWS S3
repo2-type=s3
repo2-path=/pgbackrest_repo2
repo2-retention-full=2
repo2-host-user = postgres
repo2-s3-bucket=s3bucket
repo2-s3-endpoint=s3.us-east-1.amazonaws.com
repo2-s3-key=accessKey2
repo2-s3-key-secret=verySecretKey2
repo2-s3-region=us-east-1
## Repo3: Azure
repo3-type=azure
repo3-path=/pgbackrest_repo3
repo3-retention-full=2
repo3-azure-account=pgbackrest
repo3-azure-container=pgbackrest-container
repo3-azure-key=accessKey3
## Repo4: Google Cloud
repo4-type=gcs
repo4-path=/pgbackrest_repo4
repo4-retention-full=2
repo4-gcs-bucket=pgbackrest-bucket
repo4-gcs-key=/etc/pgbackrest/gcs-key.json
archive-async=y
log-level-console=info
log-level-file=debug
start-fast=y
[pgstanza]
pg1-path=/var/lib/postgresql/15/main
pg1-host=18.210.15.186
DB host configuration is set up the same way as in the earlier sections, along with the repository details of the multiple clouds.
Conclusion
To conclude, the major advantage of pgBackRest's multi-repo functionality is that redundant backup copies can be kept. With archive-async=y, archives are pushed to multiple repositories, and if the default repo is unavailable, the second repository automatically keeps receiving the archive files.
The main limitation of this feature is that, by default, a backup will not go to repo2 even though it is configured; the repository number must be specified explicitly to take a backup in repo2, and the backup must be run once per repository to populate repositories other than the default one. For example: pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup --repo=2.
Despite these limitations, multi-repo functionality makes it possible to take backups to a secondary repository even if the default backup repository is unavailable.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.
Rebuild Patroni Replica Using pgBackRest

Patroni is one of the most used high availability (HA) solutions with the PostgreSQL database. It uses a Distributed Configuration Store (DCS) to keep the configuration in a centralized location available to all nodes, making it an easy-to-use and reliable HA solution available in the market today.
pgBackRest, on the other hand, is a backup solution that takes not only full backups but also incremental and differential backups. It is one of the most widely used backup tools for PostgreSQL databases.
In previous blogs, we discussed how to set up PostgreSQL HA with Patroni and how to configure pgBackRest. Both of these open source tools have been saviors for DBAs, delivering high availability and flexible restoration. In this blog, we will integrate the two and understand how they can work together to reduce the server load.
Scenarios:
- Reinitializing the Patroni cluster using pgBackRest (instead of pgBaseBackup).
- Creating the Patroni replica using pgBackRest.
Pre-configured setup:
- Patroni configuration setup for two or more databases.
- pgBackRest configured on a dedicated backup host.
For the purpose of testing these scenarios, the below configurations will be used throughout this blog:
Patroni Nodes:
+ Cluster: prod (7171021941707843784) ----+-----------+
| Member | Host  | Role    | State   | TL | Lag in MB |
+--------+-------+---------+---------+----+-----------+
| node1  | node1 | Leader  | running |  1 |           |
| node2  | node2 | Replica | running |  1 |         0 |
+--------+-------+---------+---------+----+-----------+
patronictl edit-config shows the following:
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  create_replica_methods:
  - pgbackrest
  - basebackup
  parameters:
    archive_command: pgbackrest --stanza=patroni archive-push %p
    archive_mode: 'on'
    archive_timeout: 120s
    hot_standby: 'on'
    listen_addresses: '*'
    logging_collector: 'on'
    max_replication_slots: 10
    max_wal_senders: 10
    pg_hba:
    - host all all 0.0.0.0/0 md5
    - host replication all 0.0.0.0/0 md5
    - local all postgres peer
    wal_level: replica
    wal_log_hints: 'on'
  pgbackrest:
    command: /usr/bin/pgbackrest --stanza=patroni --log-level-file=detail --delta restore
    keep_data: true
    no_params: true
  recovery_conf:
    restore_command: pgbackrest --stanza=patroni archive-get %f %p
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 30
Note: Pay particular attention to the create_replica_methods, pgbackrest, and recovery_conf sections, which are the parts specific to rebuilding nodes using a pgBackRest backup.
Apart from Patroni, we will need a backup repo host where pgBackRest has been configured. It can be a dedicated server or one of the DB hosts. However, a dedicated server is recommended: if the database server goes down, the backups still live on a separate machine, which makes the DBA's life easier.
Let’s test the scenarios one by one:
Reinitializing the Patroni cluster using pgBackRest (instead of pgBaseBackup)
The main advantage of using pgBackRest instead of pgBaseBackup while reinitializing a node is that it reduces the load on the leader node. This makes little difference if the DB size is small, but it is very useful when the DB is huge and building the node takes hours or days. The resource utilization is diverted to the dedicated backup host instead of the primary server, which is already busy serving the majority of the requests coming to the database.
Let us try to understand how we can rebuild the node using backup.
Many times, we are unable to start the secondary nodes after failover or switchover. To handle this situation, Patroni allows us to reinitialize the database cluster, which will create/rebuild the node by wiping the data directory. In the background, it will copy all the contents of the data directory from the Primary Server and re-create the desired node.
Please make the changes in the Patroni configuration/YAML file and reload the configuration, as shown previously. To reinitialize a Patroni replica node, use the reinit command as below:
ubuntu@192.168.0.1:~$ patronictl -c /etc/patroni/node1.yml reinit prod
+ Cluster: prod (7171021941707843784) ----+-----------+
| Member | Host  | Role    | State   | TL | Lag in MB |
+--------+-------+---------+---------+----+-----------+
| node1  | node1 | Leader  | running |  1 |           |
| node2  | node2 | Replica | running |  1 |         0 |
+--------+-------+---------+---------+----+-----------+
Which member do you want to reinitialize [node1, node2]? []: node2
Are you sure you want to reinitialize members node2? [y/N]: y
Success: reinitialize for member node2
On the replica node, we can see in the top output that the node is being rebuilt using the pgBackRest backup and not pgBaseBackup. By default, Patroni uses pgBaseBackup when create_replica_methods is not set, which increases the load on the leader node.
   PID USER     PR NI   VIRT    RES   SHR S %CPU %MEM    TIME+ COMMAND
 12939 postgres 20  0 306440 268140     4 S 34.9 27.1  8871:14 GkwP468a
791850 postgres 20  0 218692  29544 26820 S  8.9  3.0   0:00.28 /usr/lib/postgresql/14/bin/postgres -D /var/lib/postgresql/14/main --config-file=/var/lib/postgresql/14/main/postgresql.conf --listen_ad+
791881 postgres 20  0  60980  12696 10628 S  3.2  1.3   0:00.10 pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --stanza=patroni archive-get 00000013.history pg_wal/RECOVERYHISTORY
791874 postgres 20  0 218692   8032  5260 S  2.2  0.8   0:00.07 postgres: prod: startup
791827 postgres 20  0   7760   3516  3212 R  1.3  0.4   0:00.04 bash
784973 postgres 20  0  23316    212     0 S  0.3  0.0   0:18.42 tracepath
When pgBackRest is used, the restore log will mention the pgBackRest command, as below:
2022-12-15 15:41:29.070 P00 INFO: restore command begin 2.41: --config=/etc/pgbackrest/pgbackrest.conf --delta --exec-id=791815-82f4ea68 --log-level-console=info --log-level-file=detail --pg1-path=/var/lib/postgresql/14/main --process-max=2 --repo1-host=192.168.0.5 --repo1-host-user=postgres --repo1-path=/pgrdbackups --stanza=patroni
2022-12-15 15:41:30.800 P00 INFO: repo1: restore backup set 20221213-154604F, recovery will start at 2022-12-13 15:46:04
2022-12-15 15:41:30.877 P00 DETAIL: check '/var/lib/postgresql/14/main' exists
2022-12-15 15:41:30.877 P00 DETAIL: remove 'global/pg_control' so cluster will not start if restore does not complete
2022-12-15 15:41:30.918 P00 INFO: remove invalid files/links/paths from '/var/lib/postgresql/14/main'
2022-12-15 15:41:30.919 P00 DETAIL: remove invalid file '/var/lib/postgresql/14/main/backup_label.old'
2022-12-15 15:41:31.841 P00 DETAIL: remove invalid file '/var/lib/postgresql/14/main/base/13761/pg_internal.init'
2022-12-15 15:41:31.920 P00 DETAIL: remove invalid file '/var/lib/postgresql/14/main/global/pg_internal.init'
Creating the Patroni replica using pgBackRest:
If the bootstrap section contains the code to rebuild the node using pgBackRest, then when a new node is added to an existing Patroni cluster, the first-time build of the new node will use the pgBackRest backup instead of pgBaseBackup. Point-in-time recovery can also be performed through the bootstrap section. This not only reduces the load on the leader node, but the backup node also helps restore the data with comparatively lower resource utilization.
To configure the same, please use the below in the Patroni configuration file:
bootstrap:
  method: <custom_bootstrap_method_name>
  <custom_bootstrap_method_name>:
    command: <path_to_custom_bootstrap_script> [param1 [, ...]]
    keep_existing_recovery_conf: True/False
    no_params: True/False
    recovery_conf:
      recovery_target_action: promote
      recovery_target_timeline: <PITR_Time>
      restore_command: <method_specific_restore_command>
In this example, the below section has been added to the Patroni config; it will build the node by performing point-in-time recovery up to the mentioned timestamp.
bootstrap:
  method: pitr_restore_by_pgbackrest
  pitr_restore_by_pgbackrest:
    command: 'pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --stanza=patroni --log-level-file=detail --link-all --type=time --target="2022-12-13 15:46:04" restore'
    keep_existing_recovery_conf: True
    no_params: True
    recovery_conf:
      recovery_target_action: "promote"
      recovery_target_time: "2022-12-13 15:46:04"
      restore_command: 'pgbackrest --config=/etc/rdba/pgbackrest/pgbackrest.conf --stanza=patroni --log-level-file=detail archive-get %f "%p"'
      recovery_target_inclusive: true
While the node is being built, the top processes and the Patroni service status show that pgBackRest is being used instead of a base backup:
patroni.service - PostgreSQL high-availability manager
     Loaded: loaded (/lib/systemd/system/patroni.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/patroni.service.d
             └─override.conf
     Active: active (running) since Mon 2022-12-19 19:06:16 UTC; 3s ago
   Main PID: 2094 (patroni)
      Tasks: 11 (limit: 1143)
     Memory: 114.6M
        CPU: 941ms
     CGroup: /system.slice/patroni.service
             ├─2094 /usr/bin/python3 /usr/bin/patroni /etc/patroni/db2.yml
             ├─2100 /usr/bin/pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --stanza=patroni --log-level-file=detail --delta restore
             ├─2102 /usr/bin/pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --exec-id=2100-9618fafd --log-level-console=off --log-level-file=off --log-level-stderr=error --process=1 --remote-type=repo --stanza=p>
             ├─2103 /usr/bin/pgbackrest --config=/etc/pgbackrest/pgbackrest.conf --exec-id=2100-9618fafd --log-level-console=off --log-level-file=off --log-level-stderr=error --process=2 --remote-type=repo --stanza=p>
             ├─2104 ssh -o LogLevel=error -o Compression=no -o PasswordAuthentication=no postgres@192.168.0.3 "/usr/bin/pgbackrest --exec-id=2100-9618fafd --log-level-console=off --log-level-file=off --log-level-st>
             └─2105 ssh -o LogLevel=error -o Compression=no -o PasswordAuthentication=no postgres@192.168.0.3 "/usr/bin/pgbackrest --exec-id=2100-9618fafd --log-level-console=off --log-level-file=off --log-level-st>

Dec 19 19:06:16 ip-192-168-0-2 systemd[1]: Started PostgreSQL high-availability manager.
Dec 19 19:06:16 ip-192-168-0-2 patroni[2094]: 2022-12-19 19:06:16,415 INFO: Selected new etcd server http://192.168.0.1:2379
Dec 19 19:06:16 ip-192-168-0-2 patroni[2094]: 2022-12-19 19:06:16,434 INFO: No PostgreSQL configuration items changed, nothing to reload.
Dec 19 19:06:16 ip-192-168-0-2 patroni[2094]: 2022-12-19 19:06:16,551 INFO: Lock owner: node3; I am db2
Dec 19 19:06:16 ip-192-168-0-2 patroni[2094]: 2022-12-19 19:06:16,633 INFO: trying to bootstrap from leader 'node3'
Dec 19 19:06:16 ip-192-168-0-2 patroni[2100]: 2022-12-19 19:06:16.645 P00 INFO: restore command begin 2.41: --config=/etc/pgbackrest/pgbackrest.conf --delta --exec-id=2100-9618fafd --log-level-console=info --log-l>
Dec 19 19:06:16 ip-192-168-0-2 patroni[2100]: WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/var/lib/postgresql/14/main' to confirm that this is a valid $PGDATA director
Dec 19 19:06:17 ip-192-168-0-2 patroni[2100]: 2022-12-19 19:06:17.361 P00 INFO: repo1: restore backup set 20221213-154604F, recovery will start at 2022-12-13 15:46:04
Notice that it uses a delta restore, which means pgBackRest automatically identifies which files need to be restored and restores only those, making the whole process faster.
Also, the log file (by default, under /var/log/pgbackrest) will contain the restore date and time, as below:
-------------------PROCESS START-------------------
2022-12-19 19:06:16.645 P00 INFO: restore command begin 2.41: --config=/etc/pgbackrest/pgbackrest.conf --delta --exec-id=2100-9618fafd --log-level-console=info --log-level-file=detail --pg1-path=/var/lib/postgresql/14/main --process-max=2 --repo1-host=192.168.0.3 --repo1-host-user=postgres --repo1-path=/pgrdbackups --stanza=patroni
2022-12-19 19:06:16.646 P00 WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/var/lib/postgresql/14/main' to confirm that this is a valid $PGDATA directory. --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
2022-12-19 19:06:17.361 P00 INFO: repo1: restore backup set 20221213-154604F, recovery will start at 2022-12-13 15:46:04
2022-12-19 19:06:17.361 P00 DETAIL: check '/var/lib/postgresql/14/main' exists
2022-12-19 19:06:17.362 P00 DETAIL: create path '/var/lib/postgresql/14/main/base'
2022-12-19 19:06:17.362 P00 DETAIL: create path '/var/lib/postgresql/14/main/base/1'
2022-12-19 19:06:17.362 P00 DETAIL: create path '/var/lib/postgresql/14/main/base/13760'
2022-12-19 19:06:17.362 P00 DETAIL: create path '/var/lib/postgresql/14/main/base/13761'
This blog focuses mainly on integrating pgBackRest and Patroni; however, one can use other backup tools, such as WAL-E or Barman, to rebuild the nodes. More information on such configurations can be found in the Patroni documentation.
Conclusion
Patroni and pgBackRest work best when integrated, which helps reduce the load on the primary DB server. This integration, where the node is reinitialized using pgBackRest, makes optimum use of the dedicated backup server. If a Patroni setup is already configured and a dedicated backup host is available, a few configuration changes in patroni.yml can work wonders.