Percona Universities in South America Next Week

Percona University April 2019

Percona University April 2019There’s just one week to go before the first of this year’s Percona University events in South America. We’re really pleased with the lineup for all three events. We’re also incredibly happy with the response that we have had from the community. While we realize that a free event is… well… free… you are still giving up your time and travel. We place great value on that and we’re making sure that you’ll be hearing quality technical talks from Percona and our guest speakers. Most of the talks will be presented in Spanish – Portuguese in Brazil – although slides will be made available in English.

In fact, the events have been so popular that it’s quite possible that by the time you read this we’ll be operating a wait list for Montevideo (Tuesday April 23), Buenos Aires (Thursday, April 25), and São Paulo (Saturday, April 27).

A request…

So that others don’t miss out on this (rare!) opportunity, if you have reserved a place and can no longer attend, please can you cancel your booking? You can do that after logging into your eventbrite (or MeetUp) account.

After all… we want to make sure that these events are so successful that our CEO, Peter Zaitsev, will find the idea of returning to South America to present at a future series of events completely irresistible.

To wrap up this blog post, let me mention that for the Montevideo event, we’ll have some guest speakers from other companies, to make the content even better; and remember there is a raffle at the end, in which you can get a signed copy of the High Performance MySQL book signed by Peter himself (and other goodies to be discussed ;)).

Looking forward to seeing you next Tuesday!


Fast Upgrade of Legacy PostgreSQL with Minimum Downtime Using pg_upgrade

pg_upgrade to upgrade postgresql

PostgreSQL logoWhen you need to upgrade your PostgreSQL databases, there are a number of options available to you. In this post we’ll take a look at how you can upgrade PostgreSQL versions using pg_upgrade, a built-in tool that allows in-place upgrade of your software. Using pg_upgrade allows you, potentially, to minimize your downtime, an essential consideration for many organizations. It also allows you to perform a postgres upgrade with very minimal effort.

In our previous posts, we discussed various methods and tools that can help us perform a PostgreSQL upgrade – (1) pg_dumpall, (2) pg_dump and pg_restore with pg_dumpall, (3) logical replication and pglogical, and (4) slony. Methods 1 and 2 can involve additional downtime compared to the approaches taken in 3 and 4. Whilst performing an upgrade using logical replication or slony may be time consuming and require a lot of monitoring, it can be worth it if you can minimize downtime. If you have large databases that are busy with a lot of transactions, you may be better served using logical replication or slony.

This post is the fifth of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series. These posts lead up to a live webinar, where we’ll be exploring different methods available to upgrade your PostgreSQL databases. If it’s beyond the live webinar date when you read this, you’ll find the recording at that same link.


pg_upgrade (formerly pg_migrator – until PostgreSQL 8.4) is a built-in tool that helps in upgrading a legacy PostgreSQL server to a newer version without the need of a dump and restore. The oldest version from when you can upgrade your PostgreSQL using pg_upgrade is 8.4.x. It is capable of performing faster upgrades by taking into consideration that system tables are the ones that undergo the most change between two major versions. The internal data storage format is less often affected.

In fact, in one of our tests we were able to perform an upgrade of a 2 TB database server from PostgreSQL 9.6.5 to 11.1 in less than 10 seconds. Now that is fast!

Overview of the process

To understand how it works, consider a PostgreSQL server running on 9.3.3 that needs to upgrade to PostgreSQL 11. You should install the latest binaries for your new PostgreSQL version on the server – let’s say PostgreSQL 11.2 – before you begin the upgrade process.

Preparation and consistency checks

Once you have installed the new PostgreSQL version, initialize a new data directory using the new binaries and start it on another port i.e. a different port to the one used by PostgreSQL 9.3.3, in our example. Use pg_upgrade to perform consistency checks between the two servers – PG 9.3.3 and PG 11.2 – running on two different ports. If you get any errors, such as a missing extension, you need to to fix these before you proceeding to the upgrade. Once the consistency check has been passed, you can proceed.

Here is how the log looks if you should get an error while performing consistency checks.

$ /usr/pgsql-11/bin/pg_upgrade -b /usr/pgsql-9.3/bin -B /usr/pgsql-11/bin -d /var/lib/pgsql/9.3/data -D /var/lib/pgsql/11/data_new -c
Performing Consistency Checks on Old Live Server
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for invalid "unknown" user columns                 ok
Checking for hash indexes                                   ok
Checking for roles starting with "pg_"                      ok
Checking for incompatible "line" data type                  ok
Checking for presence of required libraries                 fatal
Your installation references loadable libraries that are missing from the
new installation.  You can add these libraries to the new installation,
or remove the functions using them from the old installation.  A list of
problem libraries is in the file:
Failure, exiting
$ cat loadable_libraries.txt
could not load library "$libdir/pg_repack": ERROR:  could not access file "$libdir/pg_repack": No such file or directory

To proceed beyond the error, in this example you’d need to install this missing extension pg_repack for the new PostgreSQL version, and rerun the check to make sure that you receive no errors and all the checks are passed.

Carrying out the upgrade

Once passed, you can proceed in one of two ways. One option is to let pg_upgrade copy the datafiles of the old data directory to the new data directory initialized by the new PostgreSQL version. The second option is to let pg_upgrade use hard links instead of copying data files. Copying a database of several terabytes may be time consuming. Using the hard links method makes the process really quick as it does not involve copying files.

To use hard links with pg_upgrade, you pass an additional argument -k as you can see in the following command.

$ /usr/pgsql-11/bin/pg_upgrade -b /usr/pgsql-9.3/bin -B /usr/pgsql-11/bin -d /var/lib/pgsql/9.3/data -D /var/lib/pgsql/11/data_new -k

In the Unix file system, a file or a directory is a link to an inode (index node) that stores metadata (disk block location, attributes, etc) of the data stored in them. Each inode is identified by an integer or an inode number. When you use pg_upgrade with hard links, it internally creates another file/directory in the new data directory that links to the same inode as it was in the older data directory for that file/directory. So, it skips the physical copy of the objects, but creates each object and links them to the same inode.

This reduces the disk IO and avoids the need for additional space in the server. An important point to note is that this option works only when you are upgrading your PostgreSQL on the same file system. This means, for example, if you want to upgrade to a new or a faster disk during the database upgrade, the hard link option does not work. In that case, you would need to use the file copy method.

So far, we have seen a high level overview of how pg_upgrade with hard links help you to perform an upgrade with lowest possible downtime. Come see more in action during our Webinar.  And don’t forget at Percona Live in Austin, May 28-30 2019, we’ll have two days of PostgreSQL content in a postgres dedicated track.

Elephant image based on photo from Pexels


Continuous Replication From a Legacy PostgreSQL Version to a Newer Version Using Slony

Native streaming replication in PostgreSQL works only between servers running the same major version. We discussed about logical replication in our previous blog post. In that post, we saw how logical replication could help us set up migration between two different PostgreSQL versions. However, logical replication works only for the currently supported versions of PostgreSQL, for example between PostgreSQL 9.4 and PostgreSQL 11. So what about the legacy versions that are older than 9.4? Slony-I could help us meet this replication requirement.

Replication between different PostgreSQL versions with Slony-I is useful for migration from legacy database installations to the latest available version. So what is Slony and how does it work?

This post is the fourth of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series where we’ll be exploring different methods available to upgrade your PostgreSQL databases.


Slony is an application-level logical replication implementation for PostgreSQL. Rather, we could say that it is an external replication tool that requires a separate installation and configuration. Slony has been around for a long time. The latest version supports PostgreSQL versions from 8.4 and up to 11.

PostgreSQL logoThe main goal for replication is to ship changes from one database server to another. To better understand the architecture, you should know the terms such as Slon, Events and Slonik in Slony-I.

An aside: Slony means elephants in Russian, and elephants are indeed reputed to have a great memory. A slightly angry, but nevertheless pretty elephant, “Slonik”, looks at you from the PostgreSQL logo image.


Slon is a daemon that runs on each PostgreSQL node in Slony-I replication. These daemons are used for processing configuration and replication events for each PostgreSQL server. Each PostgreSQL server is called a “node”. All nodes together form a Slony “cluster”.

The “publisher node” is a source for replicated changes. While “subscriber” nodes receive and apply changes from the publisher node.

In order to setup replication, we should specify all replicated tables or “set”. Actual subscription works within a specific set. Changes to the tables being replicated are grouped together into SYNCs. These group of transactions are applied together to the subscriber nodes.


Changes are transferred from the publisher in the form of “events”. When an event is processed by the slon daemon on a remote node, it generates a “confirmation”. Events are also used to notify nodes about configuration changes like adding/removing new nodes, new subscriptions or DDL changes.

Each event has a unique origin identifier, sequence number, transaction id for the snapshot on provider node for this event, multiple arguments, and timestamp with timezone.

Triggers written in PL/pgSQL register all changes in replicated tables. Unfortunately, there is no reliable way yet to handle changes to large objects (BLOBS), DDLs, or changes to users and roles.


Slonik means a little elephant. It is a command line utility with parser and interpreter and it accepts “slonik scripts” – a simple declarative scripting language. It is designed to overcome the limitations of procedural language. You use slonik commands to set up or modify slony replication and they can be embedded in shell scripts. It can accept commands from standard input or from files. The following example shows how a slonik script being fed to the slonik utility, and then embedded in shell scripts.

The script to create the initial configuration for the simple master-slave setup of our pgbench database looks like this:

slonik <<_EOF_
 cluster name = percona_pg;
 node 1 admin conninfo = 'dbname=pg93 host=pg93_host user=percona_pg93_user';
 node 2 admin conninfo = 'dbname=pg11 host=pg11_host user=percona_pg11_user';
 # Creates a _$(clustername), this example, _percona_pg schema
 init cluster ( id=1, comment = 'Legacy PG Node');
 # Add a list of tables being replicated to a set.
create set (id=1, origin=1, comment='pgbench');
 set add table (set id=1, origin=1, id=1, fully qualified name = 'public.pgbench_accounts', comment='accounts');
 set add table (set id=1, origin=1, id=2, fully qualified name = 'public.pgbench_branches', comment='branches');
 set add table (set id=1, origin=1, id=3, fully qualified name = 'public.pgbench_tellers', comment='tellers');
 set add table (set id=1, origin=1, id=4, fully qualified name = 'public.pgbench_history', comment='history');
 # Create the second node (the slave) tell the 2 nodes how to connect to
 # each other and how they should listen for events.
 store node (id=2, comment = 'Target node', event node=1);
 store path (server = 1, client = 2, conninfo='dbname=pg93 host=pg93_host user=percona_pg93_user');
 store path (server = 2, client = 1, conninfo='dbname=pg11 host=pg11_host user=percona_pg11_user');

Why Slony is useful for migrations?

Despite the benefits of internal logical replication, this external solution is better for migrations between different versions that are older than PostgreSQL 9.4. The trigger-based approach depends on the database API – both older and newer versions should be compatible for PL/pgSQL and SQL syntax.

How to adapt your database for usage with Slony?

  • Your tables have to have primary keys. Add a serial field to all tables without primary key
  • OID blobs will not have their changes replicated. If you have columns with small-length values, you could convert these to BYTEA. For a really large objects like images, it’s better to store data externally e.g. use S3 in Amazon cloud. If it’s too hard to change your application, you could apply blob changes at the last stage of migration.
  • ALTER TABLE and other DDL operations. Slony can’t detect table structure changes. Instead you should use an EXECUTE SCRIPT slonik command to apply a SQL file with DDL or SQL strings to the whole replication cluster.

Online migration from legacy PostgreSQL

  1. Create replication user with superuser privileges. It’s possible to use fine-grained privileges, but they are significantly harder to setup.
  2. Create a database on the destination, setup access by TCP/IP
  3. Copy table definitions from master to slaves
  4. Install Slony-I. On servers with an old OS distribution you might find it simpler to install Slony-I from the source code.
  5. Define cluster, set of tables, and connection information to nodes as a list of slonik commands
  6. Start the slon daemon on each postgresql server. Check standard output or log files for any potential communication errors.
  7. Execute subscription slonik commands to start sync
  8. Test your read-only queries with a newer version of postgres
  9. Once all the data has been replicated and is in sync, stop your applications and repoint them to the new postgres server.
  10. Use “uninstall node” on the newer version of PostgreSQL to remove all traces of Slony replication

Downgrade steps

To downgrade, follow the same procedure as upgrade. Slony allows you to replicate from and to any versions of PosgreSQL supported by your slony version. The minimum supported version is 8.4.


So far, we have seen a high level overview of how Slony can be helpful to perform an upgrade with the least possible downtime. Come and see more of this in action during our Webinar.  And don’t forget at Percona Live in Austin, May 28-30 2019, we’ll have two days of PostgreSQL content in a postgres dedicated track.

Image derived from photo by Chen Hu on Unsplash


Upcoming Webinar Wed 4/10: Extending and Customizing Percona Monitoring and Management

Percona Monitoring and Management 1.17.0

Percona Monitoring and Management 1.17.0Please join Percona’s Product Manager, Michael Coburn, as he presents his talk Extending and Customizing Percona Monitoring and Management on April 10th, 2019 at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

Register Now

Do you already run stock PMM in your environment and want to learn how you extend the PMM platform? If so, come learn about:

1. Dashboard Customizations
* How to create a custom dashboard from existing graphs, or build Cross Server Dashboards
2. External Exporters – Monitor any service, anywhere!
* Adding an exporter, view the data in data exploration, to deploying a working Dashboard
3. Working with custom queries (MySQL and PostgreSQL)
* Execute SELECT statements against your database and store in Prometheus
* Build Dashboards relevant to your environment
4. Customizing Exporter Options
* Enable deactivated functionality that applies to your environment
5. Using Grafana Alerting
* Moreover, how to set up channels (SMTP, Slack, etc)
* What’s more, how to configure thresholds and alerts
6. Using MySQL/PostgreSQL Data Source
* Also, execute SELECT statements against your database and plot your application metrics

In order to learn more, register for Extending and Customizing Percona Monitoring and Management.


Adding PostgreSQL Tuple Statistics Dashboard to the PMM Plugin

test installation PostgreSQL tuples

While the PMM Engineering team is working on the polish for release of PMM 2, I wanted to share with you some of the dashboard improvements we’re making around PostgreSQL, specifically how you can plot Tuple Statistics using PMM. PMM (Percona Monitoring and Management) is a free and open-source platform for managing and monitoring MySQL®, MongoDB®, and PostgreSQL performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL, MongoDB, and PostgreSQL servers to ensure that your data works as efficiently as possible.

This new dashboard is based on data collected from the pg_stat_user_tables view. However, by default, postgres_exporter doesn’t collect this information. Instead we will leverage the exporter’s ability to run custom SELECT queries from a file (queries.yaml in our example). These generate metric series in PMM that we can then visualize. Collecting tuple information using pg_stat_user_users view requires that databases be identified in queries.yaml, so we’ve written a helper script that takes multiple database names and formats the queries.yaml file accordingly.

Setting up the dashboard

First of all we have to enable the extension dblink. It’s a PostgreSQL contrib extension that allows users to connect to other databases and to run arbitrary queries in them.

postgres=# CREATE EXTENSION dblink;
postgres=# SELECT dblink_connect('dbname=sbtest1');
(1 row)

The next step is to create a query for collecting data from tables. We propose to use our bash script that generates a query and forms other fields for a custom query file. You have to specify the database name, or names, that will be monitored. You can store the result of the script in a separate file or can extend an existing file with queries.

[root@ip-10-178-1-82 ec2-user]# ./postgres_query_generator.sh sbtest1 sbtest2 sbtest3 >> queries.yaml

Now lets run the exporter with this file /home/ec2-user/queries.yaml. Please use flag extend.query-path as in the example below. If you want to change exporter options, the procedure is to remove then add back the exporter with the new option identified.

[root@ip-10-178-1-82 ec2-user]# pmm-admin rm postgresql:metrics PS_NODE-1
OK, removed PostgreSQL metrics PS_NODE-1 from monitoring.
[root@ip-10-178-1-82 ec2-user]# pmm-admin add postgresql:metrics --user postgres PS_NODE-1 -- --extend.query-path /home/ec2-user/queries.yaml
OK, now monitoring PostgreSQL metrics using DSN postgresql:***@/postgres

The PostgreSQL Tuple Statistics Dashboard is available on GrafanaLab for download under the perconalab organization.

Here are some screenshots from our test installation.


Writing PostgreSQL Extensions is Fun – C Language

postgresql extensions

PostgreSQL is a powerful open source relational database management system. It extends the SQL language with additional features. A DBMS is not only defined by its performance and out of the box features, but also its ability to support bespoke/additional user-specific functionality. Some of these functionalities may be in the form of database constructs or modules, like stored procedures or functions, but their scope is generally limited to the functionality being exposed by the DBMS. For instance, how will you write a custom query-analyzing application that resides within your DBMS?

To support such options, PostgreSQL provides a pluggable architecture that allows you to install extensions. Extensions may consist of a configuration (control) file, a combination of SQL files, and dynamically loadable libraries.

This means you can write your own code as per the defined guidelines of an extension and plug it in a PostgreSQL instance without changing the actual PostgreSQL code tree. An extension by very definition extends what PostgreSQL can do, but more than that, it gives you the ability to interact with external entities. These external entities can be other database management systems like ClickHouse, Mongo or HDFs (normally these are called foreign data wrappers), or other interpreters or compilers (thus allowing us to write database functions in another language like Java, Python, Perl or TCL, etc.). Another potential use case of an extension can be for code obfuscation which allows you to protect your super secret code from prying eyes.

Build your own

To build your own extension, you don’t need a complete PostgreSQL code base. You can build and install an extension using installed PostgreSQL (it may require you to install a devel RPM or Debian package). Details about extensions can be found in PostgreSQL’s official documentation[1]. There many extensions available for different features in the contrib directory of PostgreSQL source. Other than the contrib directory, people are also writing extensions readily available on the internet but currently not part of the PostgreSQL source tree. The pg_stat_statements, PL/pgSQL, and PostGIS are examples of the best known or most widely used extensions.

Generally available PostgreSQL extensions may be classified into four main categories:

  • Add support of a new language extension (PL/pgSQL, PL/Python, and PL/Java)
  • Data type extensions where you can introduce new ((Hstore, cube, and hstore)
  • Miscellaneous extensions (contrib folder has many miscellaneous extensions)
  • Foreign Data Wrapper extension (postgres_fdw, mysqldb_fdw, clickhousedb_fdw)

There are four basic file types that are required for building an extension:

  • Makefile: Which uses PGXS PostgreSQL’s build infrastructure for extensions.
  • Control File: Carries information about the extension.
  • SQL File(s): If the extension has any SQL code, it may reside in form SQL files (optional)
  • C Code: The shared object that we want to build (optional).

Extension Makefile

To compile the C code, we need a makefile. It’s a very simple makefile with the exception of “PGXS”, which is PostgreSQL’s infrastructural makefile for creating extensions. The inclusion of “PGXS” is done by invoking pg_config binary with “–pgxs” flag. The detail of that file can be found at GitHub[2].

This is a sample makefile, which can be used to compile the C code.

MODULE_big = log
DATA = log--0.0.1.sql
OBJS = log.o
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

Extension Control File

This file must be named as [EXTENSION NAME]. control file. The control file can contain many options which can be found at official documentation [3]. But in this example, I have used some basic options.

comments: Comments about the extension.

default_version: This is the extension SQL version. The name of the SQL file contains this information in the file name.

relocatable:  Tells PostgreSQL if it is possible to move contained objects into a different schema.

module_pathname:  This information is replaced with the actual lib file path.

comment = 'PostgreSQL Utility Command Logger
'default_version = '0.0.1'
relocatable = true
module_pathname = '$libdir/log'

The contents of the file can be seen using the psql command \dx psql.

postgres=# \dx log      
List of installed extensions Name   | Version | Schema |       Description
log                                 | 0.0.1   | public | PostgreSQL Utility Command Logger

Extension SQL File

This is a mapping file, which I used to map the PostgreSQL function with the corresponding C function. Whenever you call the SQL function then the corresponding C function is called. The name of the file must be [EXTENSION NAME]–[default-version].sql. This is the same default_version as defined in the control file.

CREATE FUNCTION pg_all_queries(OUT query TEXT, pid OUT TEXT)

Extension C Code

There are three kinds of functions you can write in c code.

The first is where you call your c code function using SQL function written in SQL file of the extension.

The second type of function is callbacks. You register that callback by assigning the pointer of the function. There is no need for an SQL function here. You call this function automatically when a specific event occurs.

The third type of function is called automatically, even without registering. These functions are called on events such as extension load/unload time etc.

This is the C file containing the definition of the C code. There is no restriction for the name, or of the number of C files.

#include "postgres.h"
/* OS Includes */
/* PostgreSQL Includes */
void _PG_init(void);
void _PG_fini(void);
Datum pg_all_queries(PG_FUNCTION_ARGS);
static void process_utility(PlannedStmt *pstmt, const char *queryString,ProcessUtilityContext context,ParamListInfo params,QueryEnvironment *queryEnv,DestReceiver *dest,       char *completionTag);

You need to include postgres.h for the extension. You can include other PostgreSQL as needed. PG_MODULE_MAGIC is macro which you need to include in the C file for the extension. _PG_init and _PG_fini are functions that are called when the extension is loaded or unloaded, respectively.

Here is an example of the load and unload functions of an extension.

void _PG_init(void)
    /* ... C code here at time of extension loading ... */
    ProcessUtility_hook = process_utility;
Void _PG_fini(void)
    /* ... C code here at time of extension unloading ... */

Here is an example of a callback function that you can invoke whenever you call a utility statement e.g. any DDL statement. The “queryString” variable contains the actual query text.

static void process_utility(PlannedStmt *pstmt,
                           const char *queryString,
                           ProcessUtilityContext context,
                           ParamListInfo params,
                           QueryEnvironment *queryEnv,DestReceiver *dest,
                           char *completionTag)
    /* ... C code here ... */
    standard_ProcessUtility(pstmt,  &nbsp;
                            queryString,  &nbsp;
                            context,  &nbsp;
                            params,  &nbsp;
                            queryEnv,  &nbsp;
    /* ... C code here ... */

Finally, here’s an example of a C function that is invoked through a user-defined SQL function. This internally calls the C function contained in our shared object.

Datum pg_all_queries(PG_FUNCTION_ARGS)
    /* ... C code here ... */
    tupstore = tuplestore_begin_heap(true, false, work_mem);
    /* ... C code here ... */     
    values[0] = CStringGetTextDatum(query);
    values[1] = CStringGetTextDatum(pid);
    /* ... C code here ... */
    return (Datum) 0;

Compile and Install

Before compilation, you need to set the PATH for PostgreSQL’s bin directory if there is no pg_config available in the system paths.

export PATH=/usr/local/pgsql/bin:$PATH

make USE_PGXS=1

make USE_PGXS=1 install


We can now use our extension through a simple SQL query. Following is the output that is coming straight out of extension written in C programming language.

postgres=# select * from pg_all_queries();          
query                      | pid
create table foo(a int);  +| 8196
create table bar(a int);  +| 8196
drop table foo;           +| 8196
(3 rows)

I hope this example can serve as a starting point for you to create more useful extensions that will not only help you and your company, but will also give you an opportunity to share and help the PostgreSQL community grow.

The complete example can be found at Github[4].

[1]: https://www.postgresql.org/docs/current/external-extensions.html

[2]: https://github.com/postgres/postgres/blob/master/src/makefiles/pgxs.mk

[3]: https://www.postgresql.org/docs/9.1/extend-extensions.html

[4]: https://github.com/ibrarahmad/Blog-Examples/tree/master/log

Photo from Pexels


Replication Between PostgreSQL Versions Using Logical Replication

postgres read replica from primary

When you need to upgrade PostgreSQL, there are a variety of approaches that you can use. To avoid application downtime, then not all of the options to upgrade postgres are suitable. When avoiding downtime is essential, then you might consider using replication as a means of upgrade, and depending on your scenario, you can choose to approach this task using either logical replication or physical (streaming) replication. Here, we take a look at the difference between logical and physical replication in PostgreSQL. Then we explore how to accomplish an upgrade using logical replication in more detail, and by doing so, avoid application downtime. In a subsequent article, we’ll investigate physical replication.

We have already discussed about a few methods available to perform PostgreSQL upgrades in our previous blog posts – PostgreSQL Upgrade Using pg_dumpall and PostgreSQL Upgrade Using pg_dump/pg_restore – as part of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series. However, both of these methods involve downtime to application.

Types of logical replication

Here we’ll cover two types of replication you could implement:

  1. Replication between PostgreSQL 10 and 11 versions using built-in logical replication.
  2. Replication between PostgreSQL 9.4 or (<  PG 11) to PostgreSQL 11 using an extension named pglogical .

We might opt to implement replication as a method of upgrade to minimize downtime for the application. Once all the data to date has been replicated to another PostgreSQL server on the latest version, you can simply switch your application to the new server with a minimal downtime… though of course this does depends on the complexity of your application stack.

Logical replication in PostgreSQL allows users to perform a selective replication of a few tables and open a standby for writes. Whereas physical replication in PostgreSQL is a block level replication. In this case, each database in the master is replicated to a standby, and the standby is not open for writes. Going forward, we’ll refer to physical replication as streaming replication.

With logical replication, a standby can have replication enabled from multiple masters. This could be helpful in situations where you need to replicate data from several PostgreSQL databases (OLTP) to a single PostgreSQL server for reporting and data warehousing.

One of the biggest advantages of logical over streaming replication is that logical replication allows us to replicate changes from an older version PostgreSQL to a later version. Streaming replication works only when both the master and standby are of the same major version. We’d recommend they have same minor version too for best practice.

Replication between PostgreSQL 10 and 11 versions

Starting from PostgreSQL 10, logical replication is available with the PostgreSQL source by default. So, you could easily replicate a PostgreSQL 10 database to PostgreSQL 11. Logical Replication uses a publish and subscribe model. The node that sends the changes becomes a publisher. And the node that subscribes to those changes becomes a subscriber. You may have one or more subscriptions to a publication.


Publication is a set of changes generated from a group of tables. It is referred to as a change set or replication set. Publications can only contains tables and cannot contain any other objects. DMLs on these tables can be replicated but not DDLs.

In a publication, you can choose what type of DML to replicate: INSERT or DELETE or UPDATE or ALL. By default, it is ALL. You must have a replica identity set on the table being published to replicate UPDATES and DELETES to a subscriber. A replica identity set helps in identifying the rows to be updated or deleted.

The primary key of a table is its default replica identity. You can also make a unique index with NO NULL values as a replica identity. If there is no primary key or a unique index with NO NULLs, then you can set the replica_identity to FULL. When a replica identity is set to FULL, postgres uses the entire row as a key. Of course, this may be inefficient.

You might see ERRORS if a table with no primary key and a non-default replica identity has been added to a publication after an UPDATE or a DELETE operation.


A subscriber can subscribe to one or more publications. Before adding the subscription, you must ensure that the tables being replicated have been created in the subscriber node. In order to achieve that, you can perform a schema-only dump from publisher to subscriber node.

An example of logical replication

The following example steps work for logical replication between PostgreSQL 10 and 11 versions only.

On the publishing node, create a publication. You can either add all tables or choose to add selected tables to the publication.

-- For adding ALL Tables in Database
-- For adding Selected Tables in Database
CREATE PUBLICATION percpub FOR TABLE scott.employee scott.departments;

On the subscriber node, create a subscription that refers to the publication on the publisher node. Perform a DDL dump of the tables to the subscriber node before creating the subscription, as mentioned above,

$ pg_dump -h publisher_server_ip -p 5432 -d percona -Fc -s -U postgres | pg_restore -d percona -h subscriber_node_ip -p 5432 -U postgres
CREATE SUBSCRIPTION percsub CONNECTION 'host=publisher_server_ip dbname=percona user=postgres password=secret port=5432' PUBLICATION percpub;

The above command also copies the pre-existing data from the tables. If you want to disable the copy of the pre-existing data, you can use the following syntax. It will then only start copying the changes to the publisher after you run this command.

CREATE SUBSCRIPTION percsub CONNECTION 'host=publisher_server_ip dbname=percona user=postgres password=oracle port=5432' PUBLICATION percpub WITH (copy_data = false);

Monitor the replication using the following command on the publishing node.

$ psql
select * from pg_stat_replication;

Replication between PostgreSQL 9.4 and PostgreSQL 11

Now, what about the versions that are older than PostgreSQL 10? For this purpose, there is an extension named


that works for versions from 9.4 until 11. Using pglogical, you can easily replicate PostgreSQL 9.4 to PostgreSQL 11.

The following sequence of steps demonstrates a high-level procedure to setup replication between PG 9.4 and PG 11 using pglogical extension.

Step 1 : Consider pgserver_94 to be the source server with a database : percona_94 running on PostgreSQL 9.4. Create the following extensions.

[pgserver_94:] $psql -d percona_94 -c "CREATE EXTENSION pglogical_origin"
[pgserver_94:] $psql -d percona_94 -c "CREATE EXTENSION pglogical"

Step 2 : Now, you can go ahead and add either selected tables or all the tables in a schema or multiple schemas for replication. In the following example, you can see an error when there is no primary key on one of the tables.

[pgserver_94:] $psql -d percona_94
psql (9.4.21)
Type "help" for help.
percona_94=# SELECT pglogical.create_node(node_name := 'provider1',dsn := 'host= port=5432 dbname=percona_94');
(1 row)
percona_94=# SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
ERROR: table pgbench_history cannot be added to replication set default
DETAIL: table does not have PRIMARY KEY and given replication set is configured to replicate UPDATEs and/or DELETEs
HINT: Add a PRIMARY KEY to the table
percona_94=# ALTER TABLE pgbench_history ADD PRIMARY KEY (tid,aid,delta);
percona_94=# SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
(1 row)

Step 3

On the subscriber node, which is our PostgreSQL 11 database, you can run similar commands as follows.

[pgserver_11:] $psql -d percona_11
psql (11.2)
Type "help" for help.
percona_11=# SELECT pglogical.create_node(node_name := 'subscriber1',dsn := 'host= port=5432 dbname=percona_11 password=secret');
(1 row)
percona_11=# SELECT pglogical.create_subscription(subscription_name := 'subscription1',provider_dsn := 'host= port=5432 dbname=percona_94 password=secret');
(1 row)

Step 4 You can then validate the replication status by querying a few tables pglogical always updates:

percona_11=# select * from pglogical.local_sync_status;
 sync_kind | sync_subid | sync_nspname |   sync_relname   | sync_status | sync_statuslsn
 f         | 1763399739 | public       | pgbench_accounts | r           | 0/2EB7D48
 f         | 1763399739 | public       | pgbench_history  | r           | 0/2EB7D48
 f         | 1763399739 | public       | pgbench_tellers  | r           | 0/2EB7D48
 f         | 1763399739 | public       | pgbench_branches | r           | 0/2EB7D48
 d         | 1763399739 |              |                  | r           | 0/0
(5 rows)
percona_11=# select * from pglogical.subscription;
   sub_id   |   sub_name    | sub_origin | sub_target | sub_origin_if | sub_target_if | sub_enabled |             sub_slot_name              |         sub_rep
lication_sets          | sub_forward_origins | sub_apply_delay
 1763399739 | subscription1 | 2976894835 |  330520249 |    2402836775 |    2049915666 | t           | pgl_percona_11_provider1_subscription1 | {default,defaul
t_insert_only,ddl_sql} | {all}               | 00:00:00
(1 row)

Primary key selection

At step 2, you saw how all the tables of schema : public got added to a replication set by creating a primary key on the table that doesn’t currently have one. The primary key I chose may not be the right one for that table as it is just for demonstration. However, when you choose a primary key, make sure that you are selecting the right one. It needs to be always unique and use column(s) that don’t normally contain NULLs. If you don’t research primary key selection thoroughly, you could cause downtime to your application. Here’s an example error that you could encounter:

[pgserver_94:] $pgbench -c 10 -T 300 -n percona_94
Client 7 aborted in state 12: ERROR: duplicate key value violates unique constraint "pgbench_history_pkey"
DETAIL: Key (tid, aid, delta)=(7, 63268, 2491) already exists.

So far we have seen how you can use pglogical to create replication between an older version to a newer version PostgreSQL. After you have set up replication, you can easily switch your applications to the latest version with a lower downtime.

To see more in action, why not subscribe to our webinar? And don’t forget at Percona Live in Austin, May 28-30 2019, we have two days of PostgreSQL content in a postgres dedicated track.


PostgreSQL: Access ClickHouse, One of the Fastest Column DBMSs, With clickhousedb_fdw

Database management systems are meant to house data but, occasionally, they may need to talk with another DBMS. For example, to access an external server which may be hosting a different DBMS. With heterogeneous environments becoming more and more common, a bridge between the servers is established. We call this bridge a “Foreign Data Wrapper” (FDW). PostgreSQL completed its support of SQL/MED (SQL Management of External Data) with release 9.3 in 2013. A foreign data wrapper is a shared library that is loaded by a PostgreSQL server. It enables the creation of foreign tables in PostgreSQL that act as proxies for another data source.

When you query a foreign table, Postgres passes the request to the associated foreign data wrapper. The FDW creates the connection and retrieves or updates the data in the external data store. Since PostgreSQL planner is involved in all of this process as well, it may perform certain operations like aggregate or joins on the data when retrieved from the data source. I cover some of these later in this post.

ClickHouse Database

ClickHouse is an open source column based database management system which claims to be 100–1,000x faster than traditional approaches, capable of processing of more than a billion rows in less than a second.


clickhousedb_fdw is an open source project – GPLv2 licensed – from Percona. Here’s the link for GitHub project repository:


It is an FDW for ClickHouse that allows you to SELECT from, and INSERT INTO, a ClickHouse database from within a PostgreSQL v11 server.

The FDW supports advanced features like aggregate pushdown and joins pushdown. These significantly improve performance by utilizing the remote server’s resources for these resource intensive operations.

If you would like to follow this post and try the FDW between Postgres and ClickHouse, you can download and set up the ontime dataset for ClickHouse.  After following the instructions, the test that you have the desired data. The ClickHouse client is a client CLI for the ClickHouse Database.

Prepare Data for ClickHouse

Now the data is ready in ClickHouse, the next step is to set up PostgreSQL. We need to create a ClickHouse foreign server, user mapping, and foreign tables.

Install the clickhousedb_fdw extension

There are manual ways to install the clickhousedb_fdw, but clickhousedb_fdw uses PostgreSQL’s coolest extension install feature. By just entering a SQL command you can use the extension:

CREATE EXTENSION clickhousedb_fdw;

CREATE SERVER clickhouse_svr FOREIGN DATA WRAPPER clickhousedb_fdw
OPTIONS(dbname 'test_database', driver '/use/lib/libclickhouseodbc.so');


CREATE FOREIGN TABLE clickhouse_tbl_ontime (  "Year" Int,  "Quarter" Int8,  "Month" Int8,  "DayofMonth" Int8,  "DayOfWeek" Int8,  "FlightDate" Date,  "UniqueCarrier" Varchar(7),  "AirlineID" Int,  "Carrier" Varchar(2),  "TailNum" text,  "FlightNum" text,  "OriginAirportID" Int,  "OriginAirportSeqID" Int,  "OriginCityMarketID" Int,  "Origin" Varchar(5),  "OriginCityName" text,  "OriginState" Varchar(2),  "OriginStateFips" text,  "OriginStateName" text,  "OriginWac" Int,  "DestAirportID" Int,  "DestAirportSeqID" Int,  "DestCityMarketID" Int,  "Dest" Varchar(5),  "DestCityName" text,  "DestState" Varchar(2),  "DestStateFips" text,  "DestStateName" text,  "DestWac" Int,  "CRSDepTime" Int,  "DepTime" Int,  "DepDelay" Int,  "DepDelayMinutes" Int,  "DepDel15" Int,  "DepartureDelayGroups" text,  "DepTimeBlk" text,  "TaxiOut" Int,  "WheelsOff" Int,  "WheelsOn" Int,  "TaxiIn" Int,  "CRSArrTime" Int,  "ArrTime" Int,  "ArrDelay" Int,  "ArrDelayMinutes" Int,  "ArrDel15" Int,  "ArrivalDelayGroups" Int,  "ArrTimeBlk" text,  "Cancelled" Int8,  "CancellationCode" Varchar(1),  "Diverted" Int8,  "CRSElapsedTime" Int,  "ActualElapsedTime" Int,  "AirTime" Int,  "Flights" Int,  "Distance" Int,  "DistanceGroup" Int8,  "CarrierDelay" Int,  "WeatherDelay" Int,  "NASDelay" Int,  "SecurityDelay" Int,  "LateAircraftDelay" Int,  "FirstDepTime" text,  "TotalAddGTime" text,  "LongestAddGTime" text,  "DivAirportLandings" text,  "DivReachedDest" text,  "DivActualElapsedTime" text,  "DivArrDelay" text,  "DivDistance" text,  "Div1Airport" text,  "Div1AirportID" Int,  "Div1AirportSeqID" Int,  "Div1WheelsOn" text,  "Div1TotalGTime" text,  "Div1LongestGTime" text,  "Div1WheelsOff" text,  "Div1TailNum" text,  "Div2Airport" text,  "Div2AirportID" Int,  "Div2AirportSeqID" Int,  "Div2WheelsOn" text,  "Div2TotalGTime" text,  "Div2LongestGTime" text,"Div2WheelsOff" text,  "Div2TailNum" text,  "Div3Airport" text,  "Div3AirportID" Int,  "Div3AirportSeqID" Int,  "Div3WheelsOn" text,  "Div3TotalGTime" text,  "Div3LongestGTime" text,  "Div3WheelsOff" text,  "Div3TailNum" text,  "Div4Airport" text,  "Div4AirportID" Int,  "Div4AirportSeqID" Int,  "Div4WheelsOn" text,  "Div4TotalGTime" text,  "Div4LongestGTime" text,  "Div4WheelsOff" text,  "Div4TailNum" text,  "Div5Airport" text,  "Div5AirportID" Int,  "Div5AirportSeqID" Int,  "Div5WheelsOn" text,  "Div5TotalGTime" text,  "Div5LongestGTime" text,  "Div5WheelsOff" text,  "Div5TailNum" text) server clickhouse_svr options(table_name 'ontime');

postgres=# SELECT a."Year", c1/c2 as Value FROM ( select "Year", count(*)*1000 as c1          
           FROM clickhouse_tbl_ontime          
           WHERE "DepDelay">10 GROUP BY "Year") a                        
           INNER JOIN (select "Year", count(*) as c2 from clickhouse_tbl_ontime          
           GROUP BY "Year" ) b on a."Year"=b."Year" LIMIT 3;
Year |   value    
1987 |        199
1988 | 5202096000
1989 | 5041199000
(3 rows)

Performance Features

PostgreSQL has improved foreign data wrapper processing by added the pushdown feature. Push down improves performance significantly, as the processing of data takes place earlier in the processing chain. Push down abilities include:

  • Operator and function Pushdown
  • Predicate Pushdown
  • Aggregate Pushdown
  • Join Pushdown

Operator and function Pushdown

The function and operators send to Clickhouse instead of calculating and filtering at the PostgreSQL end.

postgres=# EXPLAIN VERBOSE SELECT avg("DepDelay") FROM clickhouse_tbl_ontime WHERE "DepDelay" <10; 
           Foreign Scan  (cost=1.00..-1.00 rows=1000 width=32) Output: (avg("DepDelay"))  
           Relations: Aggregate on (clickhouse_tbl_ontime)  
           Remote SQL: SELECT avg("DepDelay") FROM "default".ontime WHERE (("DepDelay" < 10))(4 rows)

Predicate Pushdown

Instead of filtering the data at PostgreSQL, clickhousedb_fdw send the predicate to Clikhouse Database.

postgres=# EXPLAIN VERBOSE SELECT "Year" FROM clickhouse_tbl_ontime WHERE "Year"=1989;                                  
           Foreign Scan on public.clickhouse_tbl_ontime  Output: "Year"  
           Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("Year" = 1989)

Aggregate Pushdown

Aggregate push down is a new feature of PostgreSQL FDW. There are currently very few foreign data wrappers that support aggregate push down – clickhousedb_fdw is one of them. Planner decides which aggregates are pushed down and which aren’t. Here is an example for both cases.

postgres=# EXPLAIN VERBOSE SELECT count(*) FROM clickhouse_tbl_ontime;
          Foreign Scan (cost=1.00..-1.00 rows=1000 width=8)
          Output: (count(*)) Relations: Aggregate on (clickhouse_tbl_ontime)
          Remote SQL: SELECT count(*) FROM "default".ontime

Join Pushdown

Again, this is a new feature in PostgreSQL FDW, and our clickhousedb_fdw also supports join push down. Here’s an example of that.

postgres=# EXPLAIN VERBOSE SELECT a."Year"
                           FROM clickhouse_tbl_ontime a
                           LEFT JOIN clickhouse_tbl_ontime b ON a."Year" = b."Year";
        Foreign Scan (cost=1.00..-1.00 rows=1000 width=50);
        Output: a."Year" Relations: (clickhouse_tbl_ontime a) LEFT JOIN (clickhouse_tbl_ontime b)
        Remote SQL: SELECT r1."Year" FROM&nbsp; "default".ontime r1 ALL LEFT JOIN "default".ontime r2 ON (((r1."Year" = r2."Year")))

Percona’s support for PostgreSQL

As part of our commitment to being unbiased champions of the open source database eco-system, Percona offers support for PostgreSQL – you can read more about that here. And as you can see, as part of our support commitment, we’re now developing our own open source PostgreSQL projects such as the clickhousedb_fdw. Subscribe to the blog to be amongst the first to know of PostgreSQL and other open source projects from Percona.

As an author of the new clickhousdb_fdw – as well as other  FDWs – I’d be really happy to hear of your use cases and your experience of using this feature.

Photo by Hidde Rensink on Unsplash


PostgreSQL Upgrade Using pg_dump/pg_restore

pg-dump upgrade postgres

PostgreSQL logoIn this blog post, we will explore 




, one of the most commonly used options for performing a PostgreSQL upgrade. It is important to understand the scenarios under which




utilities will be helpful.

This post is the second of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series where we’ll be exploring different methods available to upgrade your PostgreSQL databases.

About pg_dump


is a utility to perform a backup of single database. You cannot backup multiple databases unless you do so using separate commands in parallel. If your upgrade plan needs global objects to be copied over,


need to be supplemented by


 . To know more about


 , you may refer to our previous blog post.

pg_dump formats


can produce dumps in multiple formats – plain text and custom format – each with own advantages. When you use


with custom format


, you must use


to restore the dump.

If the dump is taken using a plain-text format, pg_dump generates a script file of multiple SQL commands. It can be restored using psql.

A custom format dump, however, is compressed and is not human-readable.

A dump taken in plain text format may be slightly larger in size when compared to a custom format dump.

At times, you may wish to perform schema changes in your target PostgreSQL database before restore, for example, table partitioning. Or you may wish to restore only a selected list of objects from a dump file.

In such cases, you cannot restore a selected list of tables from a plain format dump of a database. If you take the database dump in custom format,  you can use pg_restore, which will help you choose a specific set of tables for restoration.

Steps involved in upgrade

The most important point to remember is that both dump and restore should be performed using the latest binaries. For example, if we need to migrate from version 9.3 to version 11, we should be using the pg_dump binary of PostgreSQL 11 to connect to 9.3 .

When a server is equipped with two different versions of binaries, it is good practice to specify the full path of the pg_dump from the latest version as follows :

/usr/lib/postgresql/11/bin/pg_dump <connection_info_of_source_system> <options>

Getting the global dumps

In PostgreSQL, users/roles are global to the database cluster, and the same user can have privileges on objects in different databases. These are called “Globals” because they are applicable for all the databases within the instance. Creation of globals in the target system at the earliest opportunity is very important, because rest of the DDLs may contain GRANTs to these users/roles. It is good practice to dump the globals into a file, and to examine the file, before importing into destination system. This can be achieved using the following command :

/usr/lib/postgresql/11/bin/pg_dumpall -g -p 5432 > /tmp/globals_only.sql

Since this produces a plain SQL dump file, it can be fed to


connected to the destination server. If there are no modifications required, the globals can be directly piped to the destination server using the command in the next example. Since this is a plain SQL dump file, it can be fed to psql for restore.

/usr/lib/postgresql/11/bin/pg_dumpall -g <source_connection_info> | psql -p <destination_connection_info>

The above command would work for an upgrade in a local server. You can add an additional argument




in the


if you are performing an upgrade to a remote database server.

Schema Only Dumps

The next stage of the migration involves the creation of schema objects. At this point, you might want to move different database objects to different tablespaces, and partition a few of the tables. If such schema modifications are part of the plan, then we should extract the schema definition to a plain text file. Here’s an example command that can be used to achieve this :

/usr/lib/postgresql/11/bin/pg_dump -s -d databasename -p 5432 > /tmp/schema_only.sql

In general, the majority of the database objects won’t need any modifications. In such cases, it is good practice to dump the schema objects as such into the destination database using a


, using a similar command to this:

/usr/lib/postgresql/11/bin/pg_dump -s -d databasename <source_connection> | psql -d database <destination_connection>

Once all the schema objects are created, we should be able to drop only those objects which need modification. We can then recreate them with their modified definition.

Copying data

This is the stage when the majority of the data transfers between the database servers. If there is good bandwidth between source and destination, we should look to achieve maximum parallelism at this stage. In many situations, we could analyze the foreign key dependency hierarchy and import data in parallel batches for a group of tables. Data-only copying is possible using




  flag of



Copying the data of individual tables

You might have to incorporate schema changes as part of an upgrade. In this case, you can copy the data of a few tables individually. We provide an example here:

/usr/lib/postgresql/11/bin/pg_dump <sourcedb_connection_info> -d <database> -a -t schema.tablename | psql <destinationdb_connection_info> <databasename>

There could be special situations where you need to append only a partial selection of the data. This happens especially on time-series data. In such cases, you can use copy commands with a WHERE clause cto extract and import specific data. You can see this in the following example :

/usr/lib/postgresql/11/bin/psql <sourcedb_connection_info> -c "COPY (select * from <table> where <filter condition>)” > /tmp/selected_table_data.sql


pg_dump/pg_restore may be useful if you need to perform a faster upgrade of PostgreSQL server with a modified schema and bloat-free relations. To see more about this method in action, please subscribe to our webinar here.

image based on photos by Skitterphoto and Magda Ehlers from Pexels


Percona University Travels to South America

Percona University April 2019

Percona University April 2019We started hosting Percona University a few years back with the aim of sharing knowledge with the open source database community. The events are held in cities across the world. The next Percona University days will visiting Uruguay, Argentina, and Brazil, in a lightning tour at the end of April.

  • Montevideo, Tuesday, April 23 2019 from 8.30am to 6.30pm
  • Buenos Aires, Thursday, April 25 2019 from 1.30pm to 10.3pm
  • São Paulo, Saturday, April 27 2019 from 9.30am to 7.30pm

What is Percona University?

It is a technical educational event. We encourage people to join us at any point during these talks – we understand that not everyone can take off a half a day from their work or studies. As long as you register – that’s essential.

What is on the agenda for each of the events?

Full agendas and registration forms for the Montevideo, Buenos Aires, and São Paulo events can be accessed at their respective links.

Does the word “University” mean that we won’t cover any in-depth topics, and these events would only interest college/university students?

No, it doesn’t. We designed Percona University presentations for all kinds of “students,” including professionals with years of database industry experience. The word “University” means that this event series is about educating attendees on technical topics (it’s not a sales-oriented event, it’s about sharing knowledge with the community).

Does Percona University cover only Percona technology?

We will definitely mention Percona technology, but we will also focus on real-world technical issues and recommend solutions that work (regardless of whether Percona developed them).

Are there other Percona University events coming up besides these in South America?

We will hold more Percona University events in different locations in the future. Our newsletter is a good source of information about when and where they will occur. If you’d like to partner with Percona in organizing a Percona University event, contact Tom Basil who leads our community team… or Lorraine Pocklington our Community Manager. You can also check our list of technical webinars to get further educational insights.

These events are free and low-key! They aren’t meant to look like a full conference (like our Percona Live series). Percona University has a different format. We make it informal, so it’s designed to be perfect for learning and networking. This is an in-person database community gathering, so feel free to come with interesting cases and tricky questions!

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com