Apr 09, 2019

Continuous Replication From a Legacy PostgreSQL Version to a Newer Version Using Slony

Native streaming replication in PostgreSQL works only between servers running the same major version. We discussed logical replication in our previous blog post. In that post, we saw how logical replication can help us set up migration between two different PostgreSQL versions. However, logical replication works only for the currently supported versions of PostgreSQL, for example between PostgreSQL 9.4 and PostgreSQL 11. So what about legacy versions older than 9.4? Slony-I can help us meet this replication requirement.

Replication between different PostgreSQL versions with Slony-I is useful for migration from legacy database installations to the latest available version. So what is Slony and how does it work?

This post is the fourth of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series where we’ll be exploring different methods available to upgrade your PostgreSQL databases.

Slony

Slony is an application-level logical replication implementation for PostgreSQL. In other words, it is an external replication tool that requires separate installation and configuration. Slony has been around for a long time, and the latest version supports PostgreSQL versions from 8.4 up to 11.

The main goal of replication is to ship changes from one database server to another. To better understand the architecture, you should know the Slony-I terms: Slon, Events, and Slonik.

An aside: Slony means elephants in Russian, and elephants are indeed reputed to have a great memory. A slightly angry, but nevertheless pretty elephant, “Slonik”, looks at you from the PostgreSQL logo image.

Slon

Slon is a daemon that runs on each PostgreSQL node in Slony-I replication. These daemons are used for processing configuration and replication events for each PostgreSQL server. Each PostgreSQL server is called a “node”. All nodes together form a Slony “cluster”.

The “publisher” node is the source of replicated changes, while “subscriber” nodes receive and apply changes from the publisher node.

In order to set up replication, we group all of the replicated tables into a “set”; the actual subscription works on a specific set. Changes to the replicated tables are grouped together into SYNCs, and these groups of transactions are applied together to the subscriber nodes.

Events

Changes are transferred from the publisher in the form of “events”. When an event is processed by the slon daemon on a remote node, it generates a “confirmation”. Events are also used to notify nodes about configuration changes like adding/removing new nodes, new subscriptions or DDL changes.

Each event has a unique origin identifier, a sequence number, the transaction id of the snapshot on the provider node for that event, several arguments, and a timestamp with timezone.
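For illustration, and assuming the cluster name percona_pg used in the slonik script below, the most recent events on a node can be inspected in Slony's cluster schema. This is only a sketch: the sl_event table lives in the _<clustername> schema, and the column names should be verified against your Slony-I version.

-- Peek at the latest replication events recorded on this node
SELECT ev_origin, ev_seqno, ev_timestamp, ev_type
FROM _percona_pg.sl_event
ORDER BY ev_seqno DESC
LIMIT 5;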

Triggers written in PL/pgSQL register all changes in the replicated tables. Unfortunately, there is no reliable way yet to handle changes to large objects (BLOBs), DDL, or changes to users and roles.

Slonik

Slonik means a little elephant. It is a command line utility with a parser and interpreter, and it accepts “slonik scripts” – a simple declarative scripting language designed to overcome the limitations of a procedural language. You use slonik commands to set up or modify Slony replication, and they can be embedded in shell scripts. Slonik can accept commands from standard input or from files. The following example shows a slonik script being fed to the slonik utility, embedded in a shell script.

The script to create the initial configuration for the simple master-slave setup of our pgbench database looks like this:

#!/bin/sh
slonik <<_EOF_
 cluster name = percona_pg;
 node 1 admin conninfo = 'dbname=pg93 host=pg93_host user=percona_pg93_user';
 node 2 admin conninfo = 'dbname=pg11 host=pg11_host user=percona_pg11_user';
 #--
 # Initialize the cluster; this creates the _$(clustername) schema, in this example _percona_pg
 #--
 init cluster ( id=1, comment = 'Legacy PG Node');
 #--
 # Add the tables being replicated to a set.
 #--
 create set (id=1, origin=1, comment='pgbench');
 set add table (set id=1, origin=1, id=1, fully qualified name = 'public.pgbench_accounts', comment='accounts');
 set add table (set id=1, origin=1, id=2, fully qualified name = 'public.pgbench_branches', comment='branches');
 set add table (set id=1, origin=1, id=3, fully qualified name = 'public.pgbench_tellers', comment='tellers');
 set add table (set id=1, origin=1, id=4, fully qualified name = 'public.pgbench_history', comment='history');
 #--
 # Create the second node (the slave), tell the two nodes how to connect to
 # each other and how they should listen for events.
 #--
 store node (id=2, comment = 'Target node', event node=1);
 store path (server = 1, client = 2, conninfo='dbname=pg93 host=pg93_host user=percona_pg93_user');
 store path (server = 2, client = 1, conninfo='dbname=pg11 host=pg11_host user=percona_pg11_user');
_EOF_

Why is Slony useful for migrations?

Despite the benefits of built-in logical replication, this external solution is the better choice for migrations involving versions older than PostgreSQL 9.4. The trigger-based approach depends only on the database API: the older and newer versions simply need to be compatible at the PL/pgSQL and SQL syntax level.

How to adapt your database for use with Slony?

  • Your tables must have primary keys. Add a serial field to any table without a primary key (see the sketch after this list).
  • OID blobs will not have their changes replicated. If you have columns with small-length values, you could convert them to BYTEA. For really large objects such as images, it’s better to store the data externally, e.g. use S3 in the Amazon cloud. If it’s too hard to change your application, you could apply blob changes at the last stage of the migration.
  • ALTER TABLE and other DDL operations. Slony can’t detect table structure changes; instead, you should use the EXECUTE SCRIPT slonik command to apply an SQL file with DDL or SQL strings to the whole replication cluster (also shown in the sketch after this list).
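As a minimal sketch of both adaptations, assuming a hypothetical table public.some_log_table without a primary key and a DDL file saved to /tmp/ddl_changes.sql (the table name, file path, and set id are illustrative, not part of the original setup):

-- Add a surrogate primary key to a table that lacks one (run on the origin before replication starts)
ALTER TABLE public.some_log_table ADD COLUMN slony_id serial PRIMARY KEY;

#!/bin/sh
# Propagate a DDL file to every node in the cluster with EXECUTE SCRIPT
slonik <<_EOF_
 cluster name = percona_pg;
 node 1 admin conninfo = 'dbname=pg93 host=pg93_host user=percona_pg93_user';
 node 2 admin conninfo = 'dbname=pg11 host=pg11_host user=percona_pg11_user';
 execute script (set id = 1, filename = '/tmp/ddl_changes.sql', event node = 1);
_EOF_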

Online migration from legacy PostgreSQL

  1. Create a replication user with superuser privileges. It’s possible to use fine-grained privileges, but they are significantly harder to set up.
  2. Create a database on the destination and set up access over TCP/IP
  3. Copy table definitions from master to slaves
  4. Install Slony-I. On servers with an old OS distribution you might find it simpler to install Slony-I from source code.
  5. Define the cluster, the set of tables, and the connection information to the nodes as a list of slonik commands
  6. Start the slon daemon on each PostgreSQL server. Check standard output or log files for any potential communication errors.
  7. Execute the subscription slonik commands to start the sync (see the sketch after this list)
  8. Test your read-only queries against the newer version of PostgreSQL
  9. Once all the data has been replicated and is in sync, stop your applications and repoint them to the new PostgreSQL server.
  10. Use “uninstall node” on the newer version of PostgreSQL to remove all traces of Slony replication
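A minimal sketch for step 7, continuing the cluster and set definitions from the earlier script (adjust the node ids and the forward option to your topology):

#!/bin/sh
# Subscribe node 2 (the target) to set 1, with node 1 (the legacy server) as the provider
slonik <<_EOF_
 cluster name = percona_pg;
 node 1 admin conninfo = 'dbname=pg93 host=pg93_host user=percona_pg93_user';
 node 2 admin conninfo = 'dbname=pg11 host=pg11_host user=percona_pg11_user';
 subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
_EOF_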

Downgrade steps

To downgrade, follow the same procedure as for the upgrade. Slony allows you to replicate from and to any version of PostgreSQL supported by your Slony version; the minimum supported version is 8.4.

Summary

So far, we have seen a high-level overview of how Slony can help to perform an upgrade with the least possible downtime. Come and see more of this in action during our webinar. And don’t forget that at Percona Live in Austin, May 28-30 2019, we’ll have two days of PostgreSQL content in a dedicated PostgreSQL track.


Image derived from photo by Chen Hu on Unsplash

Mar 27, 2019

PostgreSQL Upgrade Using pg_dump/pg_restore


In this blog post, we will explore pg_dump/pg_restore, one of the most commonly used options for performing a PostgreSQL upgrade. It is important to understand the scenarios under which the pg_dump and pg_restore utilities will be helpful.

This post is the second of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series where we’ll be exploring different methods available to upgrade your PostgreSQL databases.

About pg_dump

pg_dump is a utility to perform a backup of a single database. You cannot back up multiple databases unless you run separate commands in parallel. If your upgrade plan needs global objects to be copied over, pg_dump needs to be supplemented by pg_dumpall. To know more about pg_dumpall, you may refer to our previous blog post.

pg_dump formats

pg_dump can produce dumps in multiple formats – plain text and custom format – each with its own advantages. When you use pg_dump with the custom format (-Fc), you must use pg_restore to restore the dump.

If the dump is taken using a plain-text format, pg_dump generates a script file of multiple SQL commands. It can be restored using psql.

A custom format dump, however, is compressed and is not human-readable.

A dump taken in plain text format may be slightly larger in size when compared to a custom format dump.

At times, you may wish to perform schema changes in your target PostgreSQL database before restore, for example, table partitioning. Or you may wish to restore only a selected list of objects from a dump file.

In such cases, you cannot restore a selected list of tables from a plain-format dump of a database. If you take the database dump in custom format, you can use pg_restore, which lets you choose a specific set of tables for restoration.
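As a minimal sketch (the table names are illustrative), a custom-format dump can be taken with the newer binaries and only the selected tables restored from it:

-- Take a custom-format dump of the source database
/usr/lib/postgresql/11/bin/pg_dump -Fc -d databasename <source_connection_info> > /tmp/databasename.dump
-- Restore only the listed tables into the destination database
/usr/lib/postgresql/11/bin/pg_restore -d databasename <destination_connection_info> -t pgbench_accounts -t pgbench_history /tmp/databasename.dump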

Steps involved in upgrade

The most important point to remember is that both the dump and the restore should be performed using the latest binaries. For example, if we need to migrate from version 9.3 to version 11, we should use the pg_dump binary of PostgreSQL 11 to connect to 9.3.

When a server is equipped with two different versions of the binaries, it is good practice to specify the full path of the pg_dump from the latest version, as follows:

/usr/lib/postgresql/11/bin/pg_dump <connection_info_of_source_system> <options>

Getting the global dumps

In PostgreSQL, users/roles are global to the database cluster, and the same user can have privileges on objects in different databases. These are called “globals” because they apply to all the databases within the instance. Creating the globals on the target system at the earliest opportunity is important, because the rest of the DDL may contain GRANTs to these users/roles. It is good practice to dump the globals into a file and examine that file before importing it into the destination system. This can be achieved using the following command:

/usr/lib/postgresql/11/bin/pg_dumpall -g -p 5432 > /tmp/globals_only.sql

Since this produces a plain SQL dump file, it can be fed to psql connected to the destination server. If no modifications are required, the globals can be piped directly to the destination server, as in the next example.

/usr/lib/postgresql/11/bin/pg_dumpall -g <source_connection_info> | psql <destination_connection_info>

The above command works for an upgrade on a local server. You can add the additional argument -h for the hostname in the <destination_connection_info> if you are performing an upgrade to a remote database server.

Schema Only Dumps

The next stage of the migration involves the creation of schema objects. At this point, you might want to move different database objects to different tablespaces, and partition a few of the tables. If such schema modifications are part of the plan, you should extract the schema definition to a plain text file. Here’s an example command that can be used to achieve this:

/usr/lib/postgresql/11/bin/pg_dump -s -d databasename -p 5432 > /tmp/schema_only.sql

In general, the majority of the database objects won’t need any modifications. In such cases, it is good practice to dump the schema objects as-is into the destination database using a pipe, with a command similar to this:

/usr/lib/postgresql/11/bin/pg_dump -s -d databasename <source_connection> | psql -d database <destination_connection>

Once all the schema objects are created, we can drop just those objects that need modification and recreate them with their modified definitions.
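For instance, as a minimal sketch with a hypothetical table, a plain table created from the schema dump could be dropped and recreated as a partitioned table on the PostgreSQL 11 side before any data is loaded:

-- Hypothetical example: recreate one table as a partitioned table on the destination
DROP TABLE public.measurements;
CREATE TABLE public.measurements (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    value      numeric
) PARTITION BY RANGE (created_at);
CREATE TABLE public.measurements_2019 PARTITION OF public.measurements
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');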

Copying data

This is the stage when the majority of the data is transferred between the database servers. If there is good bandwidth between the source and destination, we should aim for maximum parallelism at this stage. In many situations, we can analyze the foreign key dependency hierarchy and import data in parallel batches for groups of tables. Data-only copying is possible using the -a or --data-only flag of pg_dump.
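As a minimal sketch (the pgbench table names are illustrative and assumed to have no dependency on each other), two tables can be copied in parallel from a shell by backgrounding each pipe:

-- Run two data-only copies in parallel, then wait for both to finish
/usr/lib/postgresql/11/bin/pg_dump <sourcedb_connection_info> -d <database> -a -t public.pgbench_accounts | psql <destinationdb_connection_info> <databasename> &
/usr/lib/postgresql/11/bin/pg_dump <sourcedb_connection_info> -d <database> -a -t public.pgbench_branches | psql <destinationdb_connection_info> <databasename> &
wait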

Copying the data of individual tables

You might have to incorporate schema changes as part of an upgrade. In this case, you can copy the data of a few tables individually. We provide an example here:

/usr/lib/postgresql/11/bin/pg_dump <sourcedb_connection_info> -d <database> -a -t schema.tablename | psql <destinationdb_connection_info> <databasename>

There could be special situations where you need to append only a partial selection of the data; this happens especially with time-series data. In such cases, you can use COPY commands with a WHERE clause to extract and import specific data, as in the following example:

/usr/lib/postgresql/11/bin/psql <sourcedb_connection_info> -c "COPY (SELECT * FROM <table> WHERE <filter condition>) TO STDOUT" > /tmp/selected_table_data.sql

Summary

pg_dump/pg_restore may be useful if you need to perform a faster upgrade of a PostgreSQL server with a modified schema and bloat-free relations. To see more about this method in action, please subscribe to our webinar here.


image based on photos by Skitterphoto and Magda Ehlers from Pexels

Mar 18, 2019

PostgreSQL Upgrade Using pg_dumpall


There are several approaches to assess when you need to upgrade PostgreSQL. In this blog post, we look at the option of upgrading a postgres database using pg_dumpall. As this tool can also be used to back up PostgreSQL clusters, it is a valid option for upgrading a cluster too. We consider the advantages and disadvantages of this approach, and show you the steps needed to achieve the upgrade.

This is the first of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series where we’ll be exploring different paths to accomplish postgres upgrade or migration. The series will culminate with a practical webinar to be aired April 17th (you can register here).

We begin this journey by providing you the most straightforward way to carry on with a PostgreSQL upgrade or migration: by rebuilding the entire database from a logical backup.

Defining the scope

Let’s define what we mean by upgrading or migrating PostgreSQL using pg_dumpall.

If you need to perform a PostgreSQL upgrade within the same database server, we’d call that an in-place upgrade, or just an upgrade. A procedure that involves migrating your PostgreSQL server from one server to another, combined with an upgrade from an older version (let’s say 9.3) to a newer PostgreSQL version (say 11.2), can be considered a migration.

There are two ways to achieve this requirement using logical backups:

  1. Using pg_dumpall
  2. Using pg_dumpall + pg_dump + pg_restore

We’ll be discussing the first option (pg_dumpall) here, and will leave the discussion of the second option for our next post.

pg_dumpall

pg_dumpall can be used to obtain a text-format dump of the whole database cluster, including all databases in the cluster. This is the only method that can be used to back up globals such as users and roles in PostgreSQL.

There are, of course, advantages and disadvantages in employing this approach to upgrading PostgreSQL by rebuilding the database cluster using pg_dumpall.

Advantages of using pg_dumpall for upgrading a PostgreSQL server:

  1. Works well for a tiny database cluster.
  2. Upgrade can be completed using just a few commands.
  3. Removes bloat from all the tables and shrinks the tables to their absolute sizes.

Disadvantages of using pg_dumpall for upgrading a PostgreSQL server:

  1. Not the best option for huge databases (several GBs or TBs), as it might involve more downtime.
  2. Cannot use parallel mode. Backup/restore can use just one process.
  3. Requires double the space on disk as it involves temporarily creating a copy of the database cluster for an in-place upgrade.

Let’s look at the steps involved in performing an upgrade using pg_dumpall:

  1. Install new PostgreSQL binaries in the target server (which could be the same one as the source database server if it is an in-place upgrade).

    -- For a RedHat family OS
    # yum install postgresql11*
    Or
    -- In an Ubuntu/Debian OS
    # apt install postgresql11
  2. Shut down all writes to the database server to avoid data loss or mismatch between the old and new versions after the upgrade.
  3. If you are doing an upgrade within the same server, create a cluster using the new binaries on a new data directory and start it using a different port to the source. For example, if the older version of PostgreSQL is running on port 5432, start the new cluster on port 5433. If you are upgrading and migrating the database to a different server, create a new cluster using the new binaries on the target server – the cluster need not run on a port other than the default, unless that’s your preference.

    $ /usr/pgsql-11/bin/initdb -D new_data_directory
    $ cd new_data_directory
    $ echo "port = 5433" >> postgresql.auto.conf
    $ /usr/pgsql-11/bin/pg_ctl -D new_data_directory start
  4. You might have a few extensions installed in the old version of the PostgreSQL cluster. Get the list of all the extensions created in the source database server and install them in the new version. You can exclude those that come with the contrib module by default. To see the list of extensions created and installed in your database server, you can run the following command.

    $ psql -d dbname -c "\dx"

    Please make sure to check all the databases in the cluster, as the extensions you see in one database may not match those created in another database (a sketch for iterating over every database follows this list).

  5. Prepare a postgresql.conf file for the new cluster. Carefully prepare it by looking at the existing configuration file of the older version of the PostgreSQL server.
  6. Use pg_dumpall to take a cluster backup and restore it to the new cluster.

    -- Command to dump the whole cluster to a file.
    $ /usr/pgsql-11/bin/pg_dumpall > /tmp/dumpall.sql
    -- Command to restore the dump file to the new cluster (assuming it is running on port 5433 of the same server).
    $ /usr/pgsql-11/bin/psql -p 5433 -f /tmp/dumpall.sql

    Note that I have used pg_dumpall from the new binaries to take the backup.
    Another, easier way is to use a pipe, which avoids the time involved in creating a dump file. Just add a hostname if you are performing an upgrade and migration.

    $ pg_dumpall -p 5432 | psql -p 5433
    Or
    $ pg_dumpall -p 5432 -h source_server | psql -p 5433 -h target_server
  7. Run ANALYZE to update the statistics of each database on the new server (a vacuumdb example follows this list).
  8. Restart the database server using the same port as the source.

Our next post in this series provides a similar way of upgrading your PostgreSQL server while at the same time providing some flexibility to carry on with changes like the ones described above. Stay tuned!


Image based on photo by Sergio Ortega on Unsplash
