Aug
05
2022
--

An Illustration of PostgreSQL Bloat


I have been working with Postgres now for many years, most recently as a consultant at Percona for companies with new implementations of Postgres as a result of migrating from Oracle or some other legacy database engine. In some cases, these are Fortune 100 companies with many talented people working for them. However, not all databases work the same, and one of the most common observations I make when reviewing a Postgres environment for a client is the amount of table and index bloat, along with a lack of understanding of its impact on performance and how to address it.

I wrote a blog about this topic a few years ago and never gave it much thought after its publication. However, with the large number of companies moving to Postgres for obvious reasons, and the lack of true Postgres database management skills needed to support fairly large databases, I thought I would rewrite this blog and bring it back to life, with some added clarity, to help readers understand bloat and why it happens.

What causes bloat?

In PostgreSQL, the culprit is Multi-Version Concurrency Control, commonly referred to as MVCC.

MVCC ensures that a transaction against a database will return only data that’s been committed, in a snapshot, even if other processes are trying to modify that data.

Imagine a database with millions of rows in a table. Anytime you update or delete a row, Postgres has to keep track of that row based on a transaction ID. For example, you may be running a long query with transaction ID 100 while someone named John just updated the same table using transaction ID 101. At this point, since you are still inside transaction 100, which is older than 101, the changes made by John in transaction 101 are not relevant or visible to your query. You are in your own personal bubble of data from before the data changed. Any new queries from you or anyone else with a transaction ID greater than 101 will see the changes made by John in transaction 101. Once your query in transaction 100 finishes and no other transactions with IDs lower than 101 are still in play, the old row version you saw in transaction 100 is no longer needed by the database. It is considered dead, but not gone. Hence, bloat!

At a high level, vacuuming is used to free up dead rows in a table so they can be reused. It also helps you avoid transaction ID wraparound.
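
If you are wondering how to spot this in a live system, the pg_stat_user_tables view tracks live and dead tuple counts per table. This is only a hedged sketch for orientation (it assumes the statistics collector has updated its counters), not part of the original walkthrough:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 10;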

Let’s go through a few steps to illustrate how all this takes place.

In order for Postgres to know which transaction data should be in the result set of your query, the snapshot makes note of transaction information.

Essentially, if your transaction ID is 100, you will only see data from all transaction IDs leading up to 100. As stated above, you will not see data from transaction ID 101 or greater.
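
If you want to see the transaction ID your own session is operating under, you can ask Postgres directly. A small sketch (txid_current() assigns and returns a transaction ID for the current session; on PostgreSQL 13 and later, pg_current_xact_id() is the newer equivalent):

SELECT txid_current();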

Setting up an example

Let’s start by creating a simple table for our example, called percona:

percona=# CREATE TABLE percona ( col1 int );
CREATE TABLE


percona=# INSERT INTO percona values (1);
INSERT 0 1


percona=# INSERT INTO percona values (2);
INSERT 0 1


percona=# INSERT INTO percona values (3);
INSERT 0 1


percona=# INSERT INTO percona values (4);
INSERT 0 1


percona=# INSERT INTO percona values (5);
INSERT 0 1

You can wrap multiple inserts into a single transaction with BEGIN and COMMIT:

percona=# BEGIN;
BEGIN


percona=*# INSERT INTO percona SELECT generate_series(6,10);
INSERT 0 5

percona=*# COMMIT;
COMMIT

Here we can see the 10 rows we inserted into the table, along with some hidden system columns:

percona=# SELECT xmin, xmax, * FROM percona;
   xmin   | xmax | col1
----------+------+------
 69099597 |    0 |    1
 69099609 |    0 |    2
 69099627 |    0 |    3
 69099655 |    0 |    4
 69099662 |    0 |    5
 69099778 |    0 |    6
 69099778 |    0 |    7
 69099778 |    0 |    8
 69099778 |    0 |    9
 69099778 |    0 |   10
(10 rows)

As you can see, values one through five (in the col1 column) have unique transaction IDs (represented in the xmin column): they were the result of individual INSERT statements, made one after the other. The rows with values six through 10 share the same transaction ID of 69099778; they were all part of the one transaction we created with the BEGIN and COMMIT statements.

At this point, you may be asking yourself what this has to do with vacuum or autovacuum. We will get there. First, you need to understand the transaction ID logic and see it in action, as shown above.

How does the table bloat?

In Postgres, the heap is the file containing a table’s rows as a list of variable-sized records, in no particular order. (A Postgres page is 8kB in size.) The pointer that identifies a row’s location within a page is called the CTID.
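
As a quick aside (a sketch, not part of the original walkthrough), you can look at the CTID of each row without installing anything, simply by selecting the hidden ctid column alongside the data:

SELECT ctid, col1 FROM percona;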

To view the heap without having to read the raw data from the file, we need to create the following extension inside of our database:

CREATE EXTENSION pageinspect;

Now we can inspect the heap for our newly created table and rows:

percona=# SELECT * FROM heap_page_items(get_raw_page('percona',0));


 lp | lp_off | lp_flags | lp_len |  t_xmin  | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid |   t_data
----+--------+----------+--------+----------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
  1 |   8160 |        1 |     28 | 69099597 |      0 |        0 | (0,1)  |           1 |       2304 |     24 |        |       | \x01000000
  2 |   8128 |        1 |     28 | 69099609 |      0 |        0 | (0,2)  |           1 |       2304 |     24 |        |       | \x02000000
  3 |   8096 |        1 |     28 | 69099627 |      0 |        0 | (0,3)  |           1 |       2304 |     24 |        |       | \x03000000
  4 |   8064 |        1 |     28 | 69099655 |      0 |        0 | (0,4)  |           1 |       2304 |     24 |        |       | \x04000000
  5 |   8032 |        1 |     28 | 69099662 |      0 |        0 | (0,5)  |           1 |       2304 |     24 |        |       | \x05000000
  6 |   8000 |        1 |     28 | 69099778 |      0 |        0 | (0,6)  |           1 |       2304 |     24 |        |       | \x06000000
  7 |   7968 |        1 |     28 | 69099778 |      0 |        0 | (0,7)  |           1 |       2304 |     24 |        |       | \x07000000
  8 |   7936 |        1 |     28 | 69099778 |      0 |        0 | (0,8)  |           1 |       2304 |     24 |        |       | \x08000000
  9 |   7904 |        1 |     28 | 69099778 |      0 |        0 | (0,9)  |           1 |       2304 |     24 |        |       | \x09000000
 10 |   7872 |        1 |     28 | 69099778 |      0 |        0 | (0,10) |           1 |       2304 |     24 |        |       | \x0a000000
(10 rows)

The table above shows 10 entries with a few columns:

  • lp is the ID of the row/tuple
  • t_xmin is the transaction ID
  • t_ctid is the pointer
  • t_data is the actual data

Currently, the pointer for each row is pointing to itself as determined by the form (page,tupleid). Pretty straightforward.

Now, let’s perform a few updates on a specific row. Let’s change the value of five to 20, then to 30, and finally back to five.

percona=# UPDATE percona SET col1 = 20 WHERE col1 = 5;
UPDATE 1


percona=# UPDATE percona SET col1 = 30 WHERE col1 = 20;
UPDATE 1


percona=# UPDATE percona SET col1 = 5 WHERE col1 = 30;
UPDATE 1

These three changes took place under three different transactions.

What does this mean?  We changed the values for a column three times but never added or deleted any rows. So we should still have 10 rows, right?

percona=# SELECT COUNT(*) FROM percona;
 count
-------
    10
(1 row)

Looks as expected. But wait! Let’s look at the heap now. The real data on disk.

percona=# SELECT * FROM heap_page_items(get_raw_page('percona',0));


 lp | lp_off | lp_flags | lp_len |  t_xmin  |  t_xmax  | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid |   t_data
----+--------+----------+--------+----------+----------+----------+--------+-------------+------------+--------+--------+-------+------------
  1 |   8160 |        1 |     28 | 69099597 |        0 |        0 | (0,1)  |           1 |       2304 |     24 |        |       | \x01000000
  2 |   8128 |        1 |     28 | 69099609 |        0 |        0 | (0,2)  |           1 |       2304 |     24 |        |       | \x02000000
  3 |   8096 |        1 |     28 | 69099627 |        0 |        0 | (0,3)  |           1 |       2304 |     24 |        |       | \x03000000
  4 |   8064 |        1 |     28 | 69099655 |        0 |        0 | (0,4)  |           1 |       2304 |     24 |        |       | \x04000000
  5 |   8032 |        1 |     28 | 69099662 | 69103876 |        0 | (0,11) |       16385 |       1280 |     24 |        |       | \x05000000
  6 |   8000 |        1 |     28 | 69099778 |        0 |        0 | (0,6)  |           1 |       2304 |     24 |        |       | \x06000000
  7 |   7968 |        1 |     28 | 69099778 |        0 |        0 | (0,7)  |           1 |       2304 |     24 |        |       | \x07000000
  8 |   7936 |        1 |     28 | 69099778 |        0 |        0 | (0,8)  |           1 |       2304 |     24 |        |       | \x08000000
  9 |   7904 |        1 |     28 | 69099778 |        0 |        0 | (0,9)  |           1 |       2304 |     24 |        |       | \x09000000
 10 |   7872 |        1 |     28 | 69099778 |        0 |        0 | (0,10) |           1 |       2304 |     24 |        |       | \x0a000000
 11 |   7840 |        1 |     28 | 69103876 | 69103916 |        0 | (0,12) |       49153 |       9472 |     24 |        |       | \x14000000
 12 |   7808 |        1 |     28 | 69103916 | 69103962 |        0 | (0,13) |       49153 |       9472 |     24 |        |       | \x1e000000
 13 |   7776 |        1 |     28 | 69103962 |        0 |        0 | (0,13) |       32769 |      10496 |     24 |        |       | \x05000000
(13 rows)

We have 13 heap entries, not 10. What the heck just happened?

Let’s examine our three separate update transactions (69103876, 69103916, 69103962) to see what’s happening with the heap:

t_xmin (69103876)

  • UPDATE percona SET col1 = 20 WHERE col1 = 5;
  • Logically DELETE tuple ID 5
  • Physically INSERT tuple ID 11
  • UPDATE tuple ID 5 pointer (t_ctid) to point to tuple ID 11

Tuple ID 5 becomes a dead row when its t_xmax gets set to the ID of the new transaction, 69103876.

t_xmin (69103916)

  • UPDATE percona SET col1 = 30 WHERE col1 = 20;
  • Logically DELETE tuple ID 11
  • Physically INSERT tuple ID 12
  • UPDATE tuple ID 11 pointer (t_ctid) to point to tuple ID 12

Once again, tuple ID 11 becomes a dead row when its t_xmax is set to the ID of the new transaction, 69103916.

t_xmin (69103962)

  • UPDATE percona SET col1 = 5 WHERE col1 = 30;
  • Logically DELETE tuple ID 12
  • Physically INSERT tuple ID 13
  • UPDATE tuple ID 12 pointer (t_ctid) to point to tuple ID 13

Tuple ID 13 is live and visible to other transactions. It has no t_xmax and the t_ctid (0,13) points to itself.

The key takeaway from this is that we have not added or deleted rows in our table. We still see a count of 10, but our heap has grown to 13 tuples because three update transactions were executed.

At a very high level, this is how PostgreSQL implements MVCC and why we have table bloat in our heap. In essence, every change to data results in a new row reflecting the latest state of the data. The old rows need to be cleaned up or reused for efficiency.
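
If you want to quantify this kind of bloat on a real table rather than eyeballing the heap, the pgstattuple extension reports dead tuple counts and percentages. A minimal sketch, assuming the extension is available in your installation:

CREATE EXTENSION pgstattuple;

SELECT tuple_count, dead_tuple_count, dead_tuple_percent
  FROM pgstattuple('percona');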

Vacuuming the table

The way to deal with the table bloat is to vacuum the table:

percona=# vacuum percona;
VACUUM

Now, let’s examine the heap again:

percona=# SELECT * FROM heap_page_items(get_raw_page('percona',0));

lp | lp_off | lp_flags | lp_len |  t_xmin  | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid |   t_data
----+--------+----------+--------+----------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
  1 |   8160 |        1 |     28 | 69099597 |      0 |        0 | (0,1)  |           1 |       2304 |     24 |        |       | \x01000000
  2 |   8128 |        1 |     28 | 69099609 |      0 |        0 | (0,2)  |           1 |       2304 |     24 |        |       | \x02000000
  3 |   8096 |        1 |     28 | 69099627 |      0 |        0 | (0,3)  |           1 |       2304 |     24 |        |       | \x03000000
  4 |   8064 |        1 |     28 | 69099655 |      0 |        0 | (0,4)  |           1 |       2304 |     24 |        |       | \x04000000
  5 |     13 |        2 |      0 |          |        |          |        |             |            |        |        |       |
  6 |   8032 |        1 |     28 | 69099778 |      0 |        0 | (0,6)  |           1 |       2304 |     24 |        |       | \x06000000
  7 |   8000 |        1 |     28 | 69099778 |      0 |        0 | (0,7)  |           1 |       2304 |     24 |        |       | \x07000000
  8 |   7968 |        1 |     28 | 69099778 |      0 |        0 | (0,8)  |           1 |       2304 |     24 |        |       | \x08000000
  9 |   7936 |        1 |     28 | 69099778 |      0 |        0 | (0,9)  |           1 |       2304 |     24 |        |       | \x09000000
 10 |   7904 |        1 |     28 | 69099778 |      0 |        0 | (0,10) |           1 |       2304 |     24 |        |       | \x0a000000
 11 |      0 |        0 |      0 |          |        |          |        |             |            |        |        |       |
 12 |      0 |        0 |      0 |          |        |          |        |             |            |        |        |       |
 13 |   7872 |        1 |     28 | 69103962 |      0 |        0 | (0,13) |       32769 |      10496 |     24 |        |       | \x05000000
(13 rows)

After vacuuming the table, rows five, 11, and 12 are now free to be used again.

So let’s insert another row, with the value of 11 and see what happens:

percona=# INSERT INTO percona values (11);
INSERT 0 1

Let’s examine the heap once more:

percona=# SELECT * FROM heap_page_items(get_raw_page('percona',0));

 lp | lp_off | lp_flags | lp_len |  t_xmin  | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid |   t_data
----+--------+----------+--------+----------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
  1 |   8160 |        1 |     28 | 69099597 |      0 |        0 | (0,1)  |           1 |       2304 |     24 |        |       | \x01000000
  2 |   8128 |        1 |     28 | 69099609 |      0 |        0 | (0,2)  |           1 |       2304 |     24 |        |       | \x02000000
  3 |   8096 |        1 |     28 | 69099627 |      0 |        0 | (0,3)  |           1 |       2304 |     24 |        |       | \x03000000
  4 |   8064 |        1 |     28 | 69099655 |      0 |        0 | (0,4)  |           1 |       2304 |     24 |        |       | \x04000000
  5 |     13 |        2 |      0 |          |        |          |        |             |            |        |        |       |
  6 |   8032 |        1 |     28 | 69099778 |      0 |        0 | (0,6)  |           1 |       2304 |     24 |        |       | \x06000000
  7 |   8000 |        1 |     28 | 69099778 |      0 |        0 | (0,7)  |           1 |       2304 |     24 |        |       | \x07000000
  8 |   7968 |        1 |     28 | 69099778 |      0 |        0 | (0,8)  |           1 |       2304 |     24 |        |       | \x08000000
  9 |   7936 |        1 |     28 | 69099778 |      0 |        0 | (0,9)  |           1 |       2304 |     24 |        |       | \x09000000
 10 |   7904 |        1 |     28 | 69099778 |      0 |        0 | (0,10) |           1 |       2304 |     24 |        |       | \x0a000000
 11 |   7840 |        1 |     28 | 69750201 |      0 |        0 | (0,11) |           1 |       2048 |     24 |        |       | \x0b000000
 12 |      0 |        0 |      0 |          |        |          |        |             |            |        |        |       |
 13 |   7872 |        1 |     28 | 69103962 |      0 |        0 | (0,13) |       32769 |      10496 |     24 |        |       | \x05000000
(13 rows)

Our new tuple (with transaction ID 69750201) reused tuple 11, and now the tuple 11 pointer (0,11) is pointing to itself.

As you can see, the heap did not grow to accommodate the new row. It reused an open slot for the new row, one that was made available when we vacuumed the table and freed up dead rows (rows that will no longer be visible to any transaction).
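
One hedged aside: a plain VACUUM marks space for reuse but generally does not shrink the underlying file. You can confirm that by checking the relation size before and after; actually returning space to the operating system requires VACUUM FULL, which rewrites the table and takes an exclusive lock:

SELECT pg_size_pretty(pg_relation_size('percona'));

-- Shrinks the file, but locks the table while rewriting it:
-- VACUUM FULL percona;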

And there you have it. A step-by-step illustration of how bloat occurs in PostgreSQL!

Mar
11
2022
--

PostgreSQL 101 for Non-Postgres DBAs (Simple Backup and Restore)


It’s no surprise that PostgreSQL is becoming the de facto go-to database for many. Just a few of the many reasons include advanced technology, scalability, and ways to save money. With that said, we see many experienced DBAs being tasked with migrating existing databases from Oracle, MySQL, SQL Server, and others to Postgres. Although, fundamentally speaking, a good DBA should have a conceptual knowledge and understanding of database fundamentals, the way you perform your daily tasks differs from one technology to the other. With that in mind, this blog is addressed to those experienced DBAs that have a well-known and proven set of routines in their old technology and want to know how to perform them in Postgres.

Postgres offers several utilities for performing both physical and logical backups and restores. We will talk about these and how to use them here.

For the purpose of this mini-tutorial, we are assuming all tasks will be performed by the user “postgres”, which has superuser privileges on the database unless otherwise noted.

I Want To …..

 

Logical Backups

Logical backups are processed with native tools such as pg_dump and pg_dumpall. These tools should be included in the default bin directory of your Postgres installation, such as /usr/pgsql-11/bin. If that bin directory is not in your path, you may want to add it.

There are many options that can be used when running these tools to customize your data dumps. So, we will cover a few scenarios in this blog.

 

Physical Backups

Physical backups are processed with native tools such as pg_basebackup. Again, these tools should be included in the default bin directory of your Postgres installation, such as /usr/pgsql-11/bin. If that bin directory is not in your path, you may want to add it.

You can also use system tools for physical backups such as tar or other archiving tools at your disposal.

 

Prerequisite for Remote Backups

The source database server has to allow a remote connection for the user performing the task. Remember, we are assuming for our examples that the user is postgres. 

 

  1. Create an entry in the pg_hba.conf file similar to the following under the IPv4 connections section.

host    all        postgres        0.0.0.0/0               md5

  2. Edit your postgresql.conf file (or whatever file you load for runtime configs) and change the listen_addresses parameter to the following:

listen_addresses = '*'

Once the above changes are made, reload the Postgres configuration to pick up the pg_hba.conf change; note that changing listen_addresses requires a full restart of Postgres.
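
A sketch of both operations follows; the postgresql-11 service name is only an example and varies by platform and version:

psql -c "SELECT pg_reload_conf();"     # picks up pg_hba.conf changes
sudo systemctl restart postgresql-11   # required for listen_addresses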

The above examples are pretty open. For security, you will most likely restrict the entry in the pg_hba.conf file to a more specific IP address or subnet.

In our example, we are allowing postgres to connect from anywhere with password authentication, hence the 0.0.0.0/0 and md5. You could change the 0.0.0.0/0 to the address of the other database server, such as 192.168.1.2/32. We also specify the user postgres with the -U option since it is the user we opened up in the pg_hba.conf file.

If the user running the commands has different credentials on the source/target servers, you will need to save the password to .pgpass or set the environment variable PGPASSWORD so you are not prompted for the password whenever it is needed.
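
The .pgpass file lives in the home directory of the user running the client tools, must be readable only by that user, and takes one colon-separated line per server in the format hostname:port:database:username:password. A sketch with placeholder values:

echo "192.168.1.2:5432:*:postgres:mysecretpassword" >> ~/.pgpass
chmod 600 ~/.pgpass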

I want to dump my entire database, including users and credentials to a file.

This is quite a simple task to perform if you have the correct privileges and configuration settings along with the storage needed depending on your database size.

Performing the Data Dump Locally

If you have only one instance of postgres running on your server and have minimal / default configuration for the pg_hba.conf file and your path includes the postgres bin directory, all you need to do as user postgres is ….

pg_dumpall > savedfile.sql

The above works well for small databases where you have space on the local server and just want a quick and simple dump of your database.

If you are running multiple instances on the local server and want to dump a specific instance all you do is …

pg_dumpall -p port > savedfile.sql

Replace the port above with the port number the instance you wish to dump is running on.

Performing the Data Dump Remotely

Although this is pretty much the same thing as on a local server, there are a few things you need to have configured in order to execute this data dump remotely. Plus, your prerequisites need to be addressed.

Now from our remote client or server, we can run the following commands as long as the postgres tools are installed.

pg_dumpall -h host -p port -U postgres > savedfile.sql

Replace the host above with the address of the source DB and port with the port number it is running on.

There are other flags and options you can use. Have a look here for the usage options.

I want to dump a specific database only.

Performing the Data Dump Locally

Similar to the other commands with a slight variation

pg_dump -d dbname > savedfile.sql

Like in other scenarios, the above works well for small databases where you have space on the local server and just want a quick and simple dump of your database.

If you are running multiple instances on the local server and want to dump from a specific instance all you do is …

pg_dump -p port -d dbname > savedfile.sql

I want to dump a specific database and specific table or tables only.

On a local server

Similar to the other commands with a slight variation

pg_dump -d dbname -t tablename > savedfile.sql

Like in other scenarios, the above works well for small databases where you have space on the local server and just want a quick and simple dump of your database.

If you are running multiple instances on the local server and want to dump from a specific instance all you do is …

pg_dump -p port -d dbname -t tablename > savedfile.sql

If you want more than one table, list their names or patterns like so …

pg_dump -d dbname -t table1 -t table2 -t table3 > savedfile.sql
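
The -t switch also accepts wildcard patterns (the same rules psql uses for \d), so if your tables share a common prefix you can match them all at once. The sales_ prefix below is purely an example:

pg_dump -d dbname -t 'sales_*' > savedfile.sql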

From a remote server

Just like in previous examples, specify the connection options with -h host -p port

I only want to dump the users and credentials to restore them somewhere else.

This is just as simple as the above data dumps. However, keep in mind that this will not get you what you need if your instance is an RDS instance. Amazon really locks down what you can do as a privileged user on an RDS instance, even as the postgres user.

From a local server

pg_dumpall -g > users.sql

From a remote server or client (saves the file locally)

pg_dumpall -g -h host -p port -U postgres > users.sql

You can edit the above dump file and remove any user you do not wish to apply when you restore the file to a different server.

Restoring a Logical Dump

Restoring the newly created backup is a simple task. There are several ways to accomplish this and we will go over a few of them just to get you going. Keep in mind there is a pg_restore utility as well, which we will not be addressing in depth in this blog; pg_restore lets you get more creative with your dumps and imports.
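
That said, here is a minimal, hedged sketch of the custom-format dump and restore pair, just so you can recognize it; the dump file name is arbitrary:

pg_dump -Fc -d dbname > savedfile.dump

pg_restore -d dbname savedfile.dump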

Again, we assume all actions here are executed as user postgres.

Restoring a pg_dumpall to a local server from a saved file.

psql postgres -f savedfile.sql

Restoring a pg_dumpall to a remote server from a saved file.

psql -h host -p port postgres -f savedfile.sql

Restoring a pg_dumpall to a remote server from the source server.

pg_dumpall | psql -h host -p port postgres

Restoring from a pg_dumpall from a remote server to a remote server.

pg_dumpall -h src_host -p src_port | psql -h target_host -p target_port postgres

Restoring a pg_dump of a specific database from a saved file.

psql dbname -f savedfile.sql

Restoring a pg_dump of a specific database to a remote server from a saved file.

psql -h host -p port dbname -f savedfile.sql

Restoring a pg_dump with a different owner on the target.

Sometimes you don’t have access to the users and credentials on a source database or want them to be different on your target/restored database. Follow these steps to achieve this.

  1. Perform your pg_dump command as noted previously but add the --no-owner option.
  2. Perform the restore as noted above but run the commands as the new owner. 

pg_dump -d database --no-owner > savedfile.sql

psql -U newowner dbname -f savedfile.sql

Remember, for remote servers, as noted in the other examples, use the -h host -p port and any other connection string options needed.

If the user’s credentials are different and you are prompted for passwords, read the prerequisites section of this blog.

Let’s Get Physical with pg_basebackup

A common way of performing physical backups in Postgres is with the use of pg_basebackup. This tool allows us to generate a physical backup with the necessary WAL files needed to restore or stand up a stand-alone instance.

There are many flags and options for this tool, including compression, but for the sake of this blog, we will focus on the basic use of pg_basebackup with minimal options.

For the purpose of this document, we will cover physical backups using the native pg_basebackup tool.

NOTE: Typically, one specifies the destination path for the physical backup. This is noted with the -D option of pg_basebackup.

Saving the backup to destination path

pg_basebackup -D /destination/path -Pv --checkpoint=fast

Sending the backup as tar files to the directory path specified

pg_basebackup -D /destination/path -Pv --checkpoint=fast -F t

The above will generate two tar files: a base.tar and a pg_wal.tar.

Create a Physical Backup From a Remote Instance

Make sure you have set up the prerequisites as explained here

The only difference between remote and local execution is that for remote, we specify a source server with -h remote_host and the port Postgres is running on with -p remote_port.

pg_basebackup -h host -p port -D /destination/path -Pv --checkpoint=fast

If the user executing pg_basebackup is not trusted directly from the server executing the pg_basebackup, add the additional option of -U username. For example …

pg_basebackup -U postgres -h host -p port -D /destination/path -Pv --checkpoint=fast

Stand up a Local Instance of Postgres using pg_basebackup

Tar file method

If you execute pg_basebackup with the tar file option, it will generate two tar files: a base.tar and a pg_wal.tar.

Extract the base.tar. If you do not have other WAL files to restore, extract the pg_wal.tar and place the WAL segment files in the pg_wal directory.
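
A short sketch of those extraction steps (the destination path is a placeholder):

mkdir -p /destination/path
tar -xf base.tar -C /destination/path
tar -xf pg_wal.tar -C /destination/path/pg_wal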

Directory method

Make sure the directory where the new cluster will be located exists with the proper permissions and storage capacity. Remember, this will consume the same amount of space as the source database.

Define where the target database will reside.

  • mkdir -p /destination/path
  • chmod 700 /destination/path
  • chown postgres:postgres  /destination/path

As user postgres, run the following command assuming pg_basebackup is in your path.

Source database is local

pg_basebackup -D /destination/path -Pv --checkpoint=fast -X stream

Source database is on a remote server

pg_basebackup -h host -p port -D /destination/path -Pv --checkpoint=fast -X stream

What does the above do?

  1. Assumes Postgres is running on the localhost using the default port of 5432 and that the user executing it has the necessary privileges to do so.
  2. Initiates a pg_basebackup of the current, running instance of Postgres.
  3. Saves the copy to the path specified after the -D option.
  4. Optionally, the -Pv flags show the progress and verbose output of the process.
  5. Performs a fast checkpoint rather than spreading it out, which makes the backup start sooner.
  6. Streams the WAL changes that are happening on the running cluster and saves them in the new cluster. This allows the new cluster to be started without additional WALs.

The above applies whether the database is remote or not.

Starting the separate instance of postgres

When the pg_basebackup completes, to start up the new local instance, go into the new data directory /destination/path and modify the postgresql.conf file (or whatever file you defined your previous port in), as sketched after the list below:

  • Set the port to a number not in use such as 5433. I.e  port = 5433
  • Modify any memory parameters necessary
  • Make sure, if archiving is enabled, it archives to a different location than the original cluster.
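
A minimal sketch of those postgresql.conf edits (the archive path is a placeholder; memory settings are omitted and should be sized for your server):

port = 5433
archive_command = 'cp %p /destination/archive/%f'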

You can then proceed to start the new instance of postgres as follows:

pg_ctl -D /destination/path -o "-p 5433" start

You should now be able to connect to the new cluster with the exact same credentials as the source cluster:

psql -p 5433

Stand up a remote cluster

This process is pretty much identical to the local cluster process above. The only difference is you will specify a host and credentials.

From the remote target host 

pg_basebackup -h source_server -p port -U username -D /destination/path -Pv --checkpoint=fast -X stream

As you can see, we are simply adding a connection string to the original command we ran for the local copy. This will generate the backup on the remote host and save it to the local destination path.

Once the copy is placed on the target host, if necessary, change your port and archive location if archiving is enabled as mentioned above.

Last words

The above examples are meant to get you started with basic backups and restores. They do not cover more advanced options such as archiving of WAL files, point-in-time recovery, etc. These will be addressed in a future blog or can be found by searching online. Furthermore, using backups to stand up replicas will also be addressed in future blog postings.

May
05
2021
--

Timescale grabs $40M Series B as it goes all in on cloud version of time series database

Timescale, makers of the open-source TimescaleDB time series database, announced a $40 million Series B financing round today. The investment comes just over two years after it got a $15 million Series A.

Redpoint Ventures led today’s round, with help from existing investors Benchmark, New Enterprise Associates, Icon Ventures and Two Sigma Ventures. The company reports it has now raised approximately $70 million.

TimescaleDB lets users measure data across a time dimension, meaning anything that changes over time. “What we found is we need a purpose-built database for it to handle scalability, reliability and performance, and we like to think of ourselves as the category-defining relational database for time series,” CEO and co-founder Ajay Kulkarni explained.

He says that the choice to build their database on top of Postgres when it launched four years ago was a key decision. “There are a few different databases that are designed for time series, but we’re the only one where developers get the purpose-built time series database plus a complete Postgres database all in one,” he said.

While the company has an open-source version, last year it decided rather than selling an enterprise version (as it had been), it was going to include all of that functionality in the free version of the product and place a bet entirely on the cloud for revenue.

“We decided that we’re going to make a bold bet on the cloud. We think cloud is where the future of database adoption is, and so in the last year […] we made all of our enterprise features free. If you want to test it yourself, you get the whole thing, but if you want a managed service, then we’re available to run it for you,” he said.

The community approach is working to attract users, with over 2 million monthly active databases, some of which the company is betting will convert to the cloud service over time. Timescale is based in New York City, but it’s a truly remote organization, with 60 employees spread across 20 countries and every continent except Antarctica.

He says that as a global company, it creates new dimensions of diversity and different ways of thinking about it. “I think one thing that is actually kind of an interesting challenge for us is what does D&I mean in a totally global org. A lot of people focus on diversity and inclusion within the U.S., but we think we’re doing better than most tech companies in terms of racial diversity, gender diversity,” he said.

And being remote-first isn’t going to change even when we get past the pandemic. “I think it may not work for every business, but I think being remote first has been a really good thing for us,” he said.

 

Nov
09
2020
--

Deep Dive Into PostgreSQL Indexes – Free Course at Percona University Online


Percona University Online has released its second free course, “A Deep Dive Into PostgreSQL Indexes” by Ibrar Ahmed, Senior Software Engineer at Percona.

Indexes are a basic feature of relational databases. PostgreSQL offers a rich collection of index options for developers and designers. But users need to understand the basic concept of indexes, to be able to compare the different index types and how they apply to different application scenarios. Only then can you make the best decisions about index strategy and design. 

This course consists of 13 short videos. Pass a brief quiz afterward and receive a Certificate of Completion from Percona. Begin the course on Google Classroom here. If you’re prompted for a class identifier, enter code xk5k6fz. The lesson list:

  • Lesson 1: Overview
  • Lesson 2: Heap vs Index
  • Lesson 3: How to Create Indexes?
  • Lesson 4: How to create an Expression Index?
  • Lesson 5: Advantages of a Partial Index
  • Lesson 6: Index Types: B-Tree
  • Lesson 7: Index Types: What is the HASH Index?
  • Lesson 8: Index Types: What is the BRIN Index?
  • Lesson 9: Index Types: GIN and GIST
  • Lesson 10: How to use different types of Indexes?
  • Lesson 11: Index Only Scans
  • Lesson 12: How to Find Duplicate Indexes?
  • Lesson 13: Unused Indexes
  • Course Quiz & Certificate of Completion

You can view the lessons directly on YouTube but you’re eligible for the Certificate of Completion only by attending via Google Classroom. For questions or suggestions, visit the Percona Forum Training category.

In case you missed it, see our initial Percona University Online course How to Upgrade to MySQL 8.0.


Mar
11
2019
--

Switch your PostgreSQL Primary for a Read Replica, Without Downtime


In my ongoing research to identify solutions and similarities between MySQL and PostgreSQL, I recently faced a simple issue. I needed to perform a slave shift from one IP to another, and I did not want to have to restart the slave that is serving the reads. In MySQL, I can repoint the replication online with the CHANGE MASTER TO command, so I was looking for a similar solution in Postgres. In my case, I could also afford some stale reads, so a few seconds of delay would have been OK, but I couldn’t take down the server.

After a bit of research, I noticed that there is no solution that allows you to do that without restarting the PostgreSQL server instance.
I was a bit disappointed, because I was just trying to move all the traffic from one subnet to another, so not really changing the Master, just the pointer.

At this point I raised my question to my colleagues who are experts in PG. Initially they confirmed to me that there is no real dynamic solution/command for that. However, while discussing this, one of them (Jobin Augustine) suggested a not “officially supported” way, that might work.

In brief, given that the WAL Receiver uses its own process, killing it would trigger an internal refresh operation, and that could result in having the replication restart from the new desired configuration.

This was an intriguing suggestion, but I wondered if it might have some negative side effects. In any case, I decided to try it and see what would happen.

This article describes the process I followed to test the approach. To be clear: this is not an “official” solution, and it is not recommended as best practice.

From now on in this article I will drop the standard MySQL terms and instead use Primary for Master and Replica for Slave.

Scenarios

I carried out two main tests:

  1. No load in writing
  2. Writing happening

For each of these, I took the following steps:

a) move Replica to same Primary (different ip)
b) move Replica to different Primary/Replica, creating a chain, so from:

                          +--------+
                          | Primary|
                          +----+---+
                               |
                +--------+     |    +--------+
                |Replica1+<----+--->+Replica2|
                +--------+          +--------+

To:

                          +-------+
                          |Primary|
                          +---+---+
                              |
                              v
                          +---+----+
                          |Replica2|
                          +---+----+
                              |
                              v
                          +---+----+
                          |Replica1|
                          +--------+

The other thing was to try to be as non-invasive as possible. Given that, I used kill with SIGQUIT (3) instead of the more brutal SIGKILL.

SIGQUIT: “The SIGQUIT signal is sent to a process by its controlling terminal when the user requests that the process quit and perform a core dump.”

Note that I did try this with SIGTERM (15), which is the nicest approach, but it did not in fact force the process to perform the shift as desired.

In general, in all the following tests, what I execute is:

ps aux|grep 'wal receiver'
kill -3 <pid>

These are the current IPs for each node:

Node1 (Primary):

NIC1 = 192.168.1.81
NIC2 = 192.168.4.81
NIC3 = 10.0.0.81

Node2 (replica1):

NIC1 = 192.168.1.82
NIC2 = 192.168.4.82
NIC3 = 10.0.0.82

Node3 (replica2):

NIC1 = 192.168.1.83
NIC2 = 192.168.4.83
NIC3 = 10.0.0.83

The starting position is:

select pid,usesysid,usename,application_name,client_addr,client_port,backend_start,state,sent_lsn,write_lsn,flush_lsn,sync_state from pg_stat_replication;
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
 22495 |    24601 | replica | node2            | 192.168.4.82 |       49518 | 2019-02-06 11:07:46.507511-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async

And now let’s roll the ball and see what happens.

Experiment 1 – moving to same Primary no load

I will move Node2 to point to 192.168.1.81

In my recovery.conf
primary_conninfo = 'application_name=node2 user=replica password=replica connect_timeout=10 host=192.168.4.81 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'

change to:

primary_conninfo = 'application_name=node2 user=replica password=replica connect_timeout=10 host=192.168.1.81 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'

[root@pg1h3p82 data]# ps aux|grep 'wal receiver'
postgres 8343 0.0 0.0 667164 2180 ? Ss Feb06 16:27 postgres: wal receiver process streaming 10/FD6C60E8

Checking the replication status:

[root@pg1h3p82 data]# ps aux|grep 'wal receiver'
postgres  8343  0.0  0.0 667164  2180 ?        Ss   Feb06  16:27 postgres: wal receiver process   streaming 10/FD6C60E8
                                                                  Tue 19 Feb 2019 12:10:22 PM EST (every 1s)
 pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
 23748 |    24601 | replica | node2            | 192.168.4.82 |       49522 | 2019-02-19 12:09:31.054915-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async
(2 rows)
                                                                  Tue 19 Feb 2019 12:10:23 PM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async
(1 row)
                                                                  Tue 19 Feb 2019 12:10:26 PM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
 23756 |    24601 | replica | node2            | 192.168.1.82 |       37866 | 2019-02-19 12:10:26.904766-05 | catchup   | 10/FD460000 | 10/FD3A0000 | 10/FD6C60E8 | async
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async
(2 rows)
                                                                  Tue 19 Feb 2019 12:10:28 PM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
 23756 |    24601 | replica | node2            | 192.168.1.82 |       37866 | 2019-02-19 12:10:26.904766-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async
(2 rows)

It takes six seconds to kill the process, shift to a new IP, and perform the catch up.

Experiment 2 – moving to Different Primary (as a chain of replicas) No load

I will move Node2 to point to 192.168.4.83

In my recovery.conf
primary_conninfo = 'application_name=node2 user=replica password=replica connect_timeout=10 host=192.168.1.81 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
change to:
primary_conninfo = 'application_name=node2 user=replica password=replica connect_timeout=10 host=192.168.4.83 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'

[root@pg1h3p82 data]# ps aux|grep 'wal receiver'
postgres 25859 0.0 0.0 667164 3484 ? Ss Feb19 1:53 postgres: wal receiver process

On Node1

Thu 21 Feb 2019 04:23:26 AM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async
 31241 |    24601 | replica | node2            | 192.168.1.82 |       38232 | 2019-02-21 04:17:24.535662-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async
(2 rows)
                                                                  Thu 21 Feb 2019 04:23:27 AM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async

On Node3

pid | usesysid | usename | application_name | client_addr | client_port | backend_start | state | sent_lsn | write_lsn | flush_lsn | sync_state
-----+----------+---------+------------------+-------------+-------------+---------------+-------+----------+-----------+-----------+------------
(0 rows)
                                                                  Thu 21 Feb 2019 04:23:30 AM EST (every 1s)
 pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
 1435 |    24601 | replica | node2            | 192.168.4.82 |       58116 | 2019-02-21 04:23:29.846798-05 | streaming | 10/FD6C60E8 | 10/FD6C60E8 | 10/FD6C60E8 | async

In this case, shifting to a new primary took four seconds.

Now all this is great, but I was working with NO load. What would happen if we have reads/writes taking place?

Experiment 3 – moving to same Primary WITH Load

I will move Node2 to point to 192.168.4.81

In my recovery.conf
primary_conninfo = 'application_name=node2 user=replica password=replica connect_timeout=10 host=192.168.1.81 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
change to:
primary_conninfo = 'application_name=node2 user=replica password=replica connect_timeout=10 host=192.168.4.81 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'

[root@pg1h3p82 data]# ps aux|grep 'wal receiver'
postgres 20765 0.2 0.0 667196 3712 ? Ss 06:23 0:00 postgres: wal receiver process streaming 11/E33F760

Thu 21 Feb 2019 06:23:03 AM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn  | write_lsn  | flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+------------+------------+------------+------------
 31649 |    24601 | replica | node2            | 192.168.1.82 |       38236 | 2019-02-21 06:21:23.539493-05 | streaming | 11/8FEC000 | 11/8FEC000 | 11/8FEC000 | async
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 11/8FEC000 | 11/8FEC000 | 11/8FEC000 | async
                                                                 Thu 21 Feb 2019 06:23:04 AM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn  | write_lsn  | flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+------------+------------+------------+------------
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 11/904DCC0 | 11/904C000 | 11/904C000 | async
                                                                 Thu 21 Feb 2019 06:23:08 AM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn  | write_lsn  | flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+------------+------------+------------+------------
 31778 |    24601 | replica | node2            | 192.168.4.82 |       49896 | 2019-02-21 06:23:08.978179-05 | catchup   | 11/9020000 |            |            | async
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 11/9178000 | 11/9178000 | 11/9178000 | async
                                                                 Thu 21 Feb 2019 06:23:09 AM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn  | write_lsn  | flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+------------+------------+------------+------------
 31778 |    24601 | replica | node2            | 192.168.4.82 |       49896 | 2019-02-21 06:23:08.978179-05 | streaming | 11/91F7860 | 11/91F7860 | 11/91F7860 | async
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 11/91F7860 | 11/91F7860 | 11/91F7860 | async

In this case shifting to a new primary takes six seconds.

Experiment 4 – moving to Different Primary (as a chain of replicas) WITH Load

I move Node2 to point to 192.168.4.83
In my recovery.conf
primary_conninfo = 'application_name=node2 user=replica password=replica connect_timeout=10 host=192.168.4.81 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'

change to:
primary_conninfo = 'application_name=node2 user=replica password=replica connect_timeout=10 host=192.168.4.83 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'

[root@pg1h3p82 data]# ps aux|grep 'wal receiver'
postgres 21158 6.3 0.0 667196 3704 ? Ds 06:30 0:09 postgres: wal receiver process streaming 11/4F000000

Node1

Thu 21 Feb 2019 06:30:56 AM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
 31778 |    24601 | replica | node2            | 192.168.4.82 |       49896 | 2019-02-21 06:23:08.978179-05 | streaming | 11/177F8000 | 11/177F8000 | 11/177F8000 | async
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 11/177F8000 | 11/177F8000 | 11/177F8000 | async
(2 rows)
                                                                  Thu 21 Feb 2019 06:30:57 AM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 11/17DAA000 | 11/17DAA000 | 11/17DAA000 | async
(1 row)

Node3

Thu 21 Feb 2019 06:31:01 AM EST (every 1s)
 pid | usesysid | usename | application_name | client_addr | client_port | backend_start | state | sent_lsn | write_lsn | flush_lsn | sync_state
-----+----------+---------+------------------+-------------+-------------+---------------+-------+----------+-----------+-----------+------------
(0 rows)
                                                                 Thu 21 Feb 2019 06:31:02 AM EST (every 1s)
 pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |  state  |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
------+----------+---------+------------------+--------------+-------------+-------------------------------+---------+-------------+-------------+-------------+------------
 1568 |    24601 | replica | node2            | 192.168.4.82 |       58122 | 2019-02-21 06:31:01.937957-05 | catchup | 11/17960000 | 11/17800000 | 11/177F8CC0 | async
(1 row)
                                                                  Thu 21 Feb 2019 06:31:03 AM EST (every 1s)
 pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
 1568 |    24601 | replica | node2            | 192.168.4.82 |       58122 | 2019-02-21 06:31:01.937957-05 | streaming | 11/1A1D3D08 | 11/1A1D3D08 | 11/1A1D3D08 | async
(1 row)

In this case shifting to a new primary took seven seconds.

Finally, I did another test. I was wondering, can I move the server Node2 back under the main Primary Node1 while writes are happening?

Well, here’s what happened:

In my recovery.conf
primary_conninfo = 'application_name=node2 user=replica password=replica connect_timeout=10 host=192.168.4.83 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
change to:
primary_conninfo = 'application_name=node2 user=replica password=replica connect_timeout=10 host=192.168.4.81 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'

After I killed the process as I did in the previous examples, Node2 rejoined the Primary Node1, but …

Thu 21 Feb 2019 06:33:58 AM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
  1901 |    24601 | replica | node2            | 192.168.4.82 |       49900 | 2019-02-21 06:33:57.81308-05  | catchup   | 11/52E40000 | 11/52C00000 | 11/52BDFFE8 | async
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 11/5D3F9EC8 | 11/5D3F9EC8 | 11/5D3F9EC8 | async

… Node2 was not really able to catch up quickly, or at least not while the load on the primary was high. As soon as I reduced the application pressure:

Thu 21 Feb 2019 06:35:29 AM EST (every 1s)
  pid  | usesysid | usename | application_name | client_addr  | client_port |         backend_start         |   state   |  sent_lsn   |  write_lsn  |  flush_lsn  | sync_state
-------+----------+---------+------------------+--------------+-------------+-------------------------------+-----------+-------------+-------------+-------------+------------
  1901 |    24601 | replica | node2            | 192.168.4.82 |       49900 | 2019-02-21 06:33:57.81308-05  | streaming | 11/70AE8000 | 11/70000000 | 11/70000000 | async
 22449 |    24601 | replica | node3            | 192.168.4.83 |       43648 | 2019-02-06 10:56:32.612439-05 | streaming | 11/70AE8000 | 11/70AE8000 | 11/70AE8000 | async

Node2 was able to catch up and align itself.

Conclusions

In all tests, the Replica was able to rejoin the Primary or the new primary, with obviously different times.

From the tests I carried out so far, it seems that modifying the replication source, and then killing the “WAL receiver” thread, is a procedure that allows us to shift the replication source without the need for a service restart.

This is even more efficient than the MySQL solution, given the time taken for the recovery and the flexibility it offers.

What I am still wondering is whether this might cause data inconsistency issues. I asked some of the PostgreSQL experts inside the company, and it seems the process should be relatively safe, but I would appreciate any feedback or comments in case you know this may not be a safe operation.

Good PostgreSQL to everybody!



Mar
07
2018
--

Percona Live 2018 Featured Talk: Securing Your Data on PostgreSQL with Payal Singh


Welcome to another interview blog for the rapidly-approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Payal Singh, DBA at OmniTI Computer Consulting Inc. Her talk is titled Securing Your Data on PostgreSQL. There is often a lack of understanding about how best to manage minimum basic application security features – especially with major security features being released with every major version of PostgreSQL. In our conversation, we discussed how Payal works to improve application security using Postgres:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Payal: I’m primarily a data addict. I fell in love with databases when it was first taught to me in high school. The declarative SQL syntax was intuitive to me, and efficient compared to other languages I had used (C and C++). I realized that if given the opportunity, I’d choose to become a database administrator. I joined OmniTI in summer of 2012 as a web engineer intern during my Masters, but grabbed the chance to work on an internal database migration project. Working with the DBA team gave me a lot of new insight and exposure, especially into open source databases. The more I learned, the more I loved my job. Right after completing my Masters I joined OmniTI as a full-time database administrator, and never looked back!

Percona: Your talk is titled “Securing Your Data on PostgreSQL”. Why do you think that security (or the lack of it) is such an issue?

Payal: Securing your data is critical. In my experience, the one reason people using commercial databases are apprehensive of switching to open source alternatives is a lack of exposure to security features. If you look at open source databases today, specifically PostgreSQL, it has the most advanced security features: data encryption, auditing, and row-level security, to name a few. People don’t know about them, though. As a FOSS project, we don’t have a centralized marketing team to advertise these features to our potential user base, which makes it necessary to spread information through other channels. Speaking about it at a popular conference like Percona Live is one of them!

In addition to public awareness, Postgres is advancing at a lightning pace. With each new major version released every year comes a bunch of new security features and major improvements to existing ones. So much so that it becomes challenging to keep up with all these features, even for existing Postgres users. My talk on Postgres security aims to inform current as well as prospective Postgres users about the advanced security features that exist and their use cases, useful tips for using them, the gotchas, what’s lacking, and what’s currently under development.

Percona: Is PostgreSQL better or worse with security and security options than either MySQL or MongoDB? Why?

Payal: I may be a little biased, but I think Postgres is the best database from a security point of view. MySQL is pretty close though! There are quite a few reasons why I consider Postgres to be the best, but I’d like to save that discussion for my talk at Percona Live! For starters though, I think that Postgres’s authentication and role architecture is significantly clearer and more straightforward than MySQL’s implementation. Focusing strictly on security, I’d also say that access control and management is more granular and customizable in Postgres than it is in MySQL – although here I’d have to say MySQL’s ACL is easier and more intuitive to manage.

Percona: What is the biggest challenge for database security we are facing?

Payal: For all the databases? I’d say with the rapid growth of IoT, encrypted data processing is a huge requirement that none of the well-known databases currently provide. Even encryption of data at rest outside of the IoT context requires more attention. It is one of the few things that a DBMS can do as a last-ditch effort to protect its data in SQL injection attacks, if all other layers of security (network, application layer, etc.) have failed (which very often is the case).

Percona: Why should people attend your talk? What do you hope people will take away from it? 

Payal: My talk is a run-through of all current and future Postgres security features, from the basic to the very advanced and niche. It is not an isolated talk that assumes Postgres is the only database in the world. I often compare and contrast other database implementations of similar security features as well. Not only is it a decent one-hour primer for people new and interested in Postgres, but also a good way to weigh the pros and cons among databases from a security viewpoint.

Percona: What are you looking forward to at Percona Live (besides your talk)?

Payal: I’m looking forward to all the great talks! I got a lot of information out of the talks at Percona Live last year. The tutorials on new MySQL features were especially great!

Want to find out more about this Percona Live 2018 featured talk, and Payal and PostgreSQL security? Register for Percona Live 2018, and see her talk Securing Your Data on PostgreSQL. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Feb
09
2018
--

Collect PostgreSQL Metrics with Percona Monitoring and Management (PMM)

Collecting PostgreSQL Information using Percona Monitoring and Management

In this article, we’ll describe how to collect PostgreSQL metrics with Percona Monitoring and Management (PMM).

We designed Percona Monitoring and Management (PMM) to be the best tool for MySQL and MongoDB performance investigation. At the same time, it’s built on mature open source components: the Prometheus time series database and Grafana. Starting from PMM 1.4.0, it’s possible to add monitoring for any service supported by Prometheus.

Demo

# install docker and docker-compose.
git clone https://github.com/ihanick/pmm-postgresql-demo.git
cd pmm-postgresql-demo
docker-compose build
docker-compose up

At this point, we are running the exporter, PostgreSQL, and the PMM server, but the pmm-client on the PostgreSQL server isn’t configured.

docker-compose exec pg sh /root/initpmm.sh

Now we have configured the pmm-client and added the external exporter.

Let’s assume that you have executed the commands above on localhost. At this point we have several URLs available locally, including the PMM server web interface (port 8080) and the postgres_exporter metrics endpoint (port 9187).

We also need to create graphs for our new exporter. This can be done manually (by importing JSON), or you can import the existing Postgres_exporter dashboard published in the Grafana gallery, referenced by its catalog number:

  1. Go to your PMM server web interface and press the Grafana icon at the top left corner, then Dashboards, then Import.
  2. Copy and paste the dashboard ID from the Grafana site into the “Grafana.com Dashboard” field, and press Load.
  3. In the next dialog, choose Prometheus as a data source and continue.

PostgreSQL performance graphs can be seen at: http://localhost:8080/graph/dashboard/db/postgres_exporter?orgId=1

PMM PostgreSQL postgres_exporter template

 

PMM-PostgreSQL Demo Under the Hood

To move this configuration to production, we need to understand how this demo works.

PMM Server

First of all, you need an existing PMM Server. You can find details on new server configuration at Deploying Percona Monitoring and Management.

In my demo I’m starting PMM without volumes, so all metrics are dropped after running the docker-compose down command. Also, there is no need to use port 8080 for PMM in production; set it up with SSL support and a password instead.

PostgreSQL Setup

I’m modifying the latest default PostgreSQL image to set up the monitoring user, schema, and views described below.

Of course, you can use a dedicated PostgreSQL server instead of one running inside a docker-compose sandbox. The only requirement is that the PMM server should be able to connect to this server.

User creation and permissions:

CREATE DATABASE postgres_exporter;
CREATE USER postgres_exporter PASSWORD 'password';
ALTER USER postgres_exporter SET SEARCH_PATH TO postgres_exporter,pg_catalog;
-- If deploying as non-superuser (for example in AWS RDS)
-- GRANT postgres_exporter TO :MASTER_USER;
CREATE SCHEMA postgres_exporter AUTHORIZATION postgres_exporter;
CREATE VIEW postgres_exporter.pg_stat_activity
AS
  SELECT * from pg_catalog.pg_stat_activity;
GRANT SELECT ON postgres_exporter.pg_stat_activity TO postgres_exporter;
CREATE VIEW postgres_exporter.pg_stat_replication AS
  SELECT * from pg_catalog.pg_stat_replication;
GRANT SELECT ON postgres_exporter.pg_stat_replication TO postgres_exporter;

To simplify setup, you can use a superuser account and access pg_catalog directly. To improve security, allow this user to connect only from the exporter host.
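
As a middle ground between a dedicated schema of views and a full superuser, on PostgreSQL 10 and later you could also grant the built-in pg_monitor role to the exporter user. A sketch of that alternative, not part of the original demo:

-- PostgreSQL 10+: pg_monitor grants read access to the standard monitoring
-- views and statistics functions without making the user a superuser.
GRANT pg_monitor TO postgres_exporter;
-- Still restrict where postgres_exporter may connect from (via pg_hba.conf),
-- ideally only the host running the exporter.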

PMM Client Setup on PostgreSQL Host

You can obtain database-only statistics with just the external exporter, and you can use any host with pmm-client installed. Fortunately, you can also export Linux metrics from the database host.

After installing the pmm-client package, you still need to configure the system. We should point it to the PMM server and register the external exporter (and optionally add the linux:metrics exporter).

#!/bin/sh
pmm-admin config --client-name pg1 --server pmm-server
pmm-admin add external:metrics postgresql pgexporter:9187
# optional
pmm-admin add linux:metrics
# other postgresql instances
pmm-admin add external:instances postgresql 172.18.0.3:9187

It’s important to keep the external exporter job name as “postgresql”, since all existing templates check it. There is a bit of inconsistency here: the first postgresql server is added as external:metrics, but all further servers should be added as external:instances.

The reason is that the first command creates the Prometheus job and the first instance; further servers can then be added without creating the job.

PMM 1.7.0 external:service

Starting from PMM 1.7.0, the setup is simplified if the exporter is located on the same host as the pmm-client:

pmm-admin config --client-name pg1 --server pmm-server
pmm-admin add external:service --service-port=9187 postgresql

pmm-admin add external:metrics or pmm-admin add external:instances are not required if you are running the exporter on the same host as the pmm-client.

Exporter Setup

The exporter is a simple HTTP/HTTPS server that returns a single page. The format is:

curl -si http://172.17.0.4:9187/metrics|grep pg_static
# HELP pg_static Version string as reported by postgres
# TYPE pg_static untyped
pg_static{short_version="10.1.0",version="PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18) 6.3.0 20170516, 64-bit"} 1

As you can see, it’s a self-describing set of counters and string values. The Prometheus time series database built into PMM connects to this web server and stores the results on disk. There are multiple exporters available for PostgreSQL; postgres_exporter is listed as a third-party exporter on the official Prometheus website.

You can compile the exporter yourself, or run it inside a Docker container. This and many other exporters are written in Go and compiled as static binaries, so you can simply copy the executable from a host with the same CPU architecture. For production setups, you will probably run the exporter directly on the database host and start the service with systemd.

In order to check for network configuration issues, log in to the pmm-server and use the curl command from above. Do not forget to replace 172.17.0.4:9187 with the appropriate host:port (use the same IP address or DNS name as in the pmm-admin add command).

You configure postgres_exporter with a single environment variable:

DATA_SOURCE_NAME=postgresql://postgres_exporter:password@pg:5432/postgres_exporter?sslmode=disable

Make sure that you provide the correct credentials, including the database name.
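
A quick way to validate those credentials before starting the exporter is to connect as the exporter user and read one of the views it relies on. A minimal sketch, assuming the user, schema, and search_path created earlier:

-- Run while connected as postgres_exporter (for example: psql "$DATA_SOURCE_NAME").
-- With search_path set to postgres_exporter,pg_catalog this resolves to the view
-- created above, so a successful row count confirms both credentials and grants.
SELECT count(*) FROM pg_stat_activity;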

Run external exporter directly on database server

In order to simplify the production setup, you can run the exporter directly on the same server that runs PostgreSQL.
This method allows you to use the pmm-admin add external:service command recently added to PMM.

# Copy exporter binary from docker container to the local directory to skip build from sources
docker cp pmmpostgres_pgexporter_1:/postgres_exporter ./
# copy exporter binary to database host, use scp instead for existing database server.
docker cp postgres_exporter pmmpostgres_pg_1:/root/
# login to database server shell
docker exec -it pmmpostgres_pg_1 bash
# start exporter
DATA_SOURCE_NAME='postgresql://postgres_exporter:password@127.0.0.1:5432/postgres_exporter?sslmode=disable' ./postgres_exporter

Grafana Setup

In the demo, I’ve used the Postgres_exporter dashboard. Look for other PostgreSQL dashboards on the same site if you need more. The exporter provides many metrics, and not all of them are visualized in this dashboard.

For huge installations, you may find that filtering servers by “instance name” is not convenient. Write your own JSON for the dashboard, or try the one from the demo repository. It’s the same as dashboard 3742, but uses the hostname for filtering and the Prometheus job name in the legends.

All entries of instance=~"$instance" get replaced with instance=~"$host:.*".

The modification allows you to switch between PostgreSQL servers with host instead of “instance”, and see CPU and disk details for the current database server instead of the previously selected host.

Notice

This blog post on how to collect PostgreSQL metrics with Percona Monitoring and Management does not describe an official integration of PostgreSQL and PMM. I’ve tried to describe a more involved external exporter setup. Instead of PostgreSQL, you can use any other service and exporter with a similar setup, or even create your own exporter and instrument your application. It’s a great thing to see correlations between application activity and other system components like databases, web servers, etc.

Sep
07
2017
--

Always Verify Examples When Comparing DB Products (PostgreSQL and MySQL)

PostgreSQL and MySQL

In this blog post, I’ll look at a comparison of PostgreSQL and MySQL.

I came across a post from Hans-Juergen Schoenig, a Postgres consultant at Cybertec. In it, he dismissed MySQL and presented Postgres as better. While his post ignores most of the reasons why MySQL is better, I will focus on where his post is less than accurate. Testing for MySQL was done with Percona Server 5.7 using default settings.

Mr. Schoenig complains that MySQL changes data types automatically. He claims inserting 1234.5678 into a numeric(4, 2) column on Postgres produces an error, and that MySQL just rounds the number to fit. In my testing I found this to be a false claim:

mysql> CREATE TABLE data (
    -> id    integer NOT NULL,
    -> data  numeric(4, 2));
Query OK, 0 rows affected (0.07 sec)
mysql> INSERT INTO data VALUES (1, 1234.5678);
ERROR 1264 (22003): Out of range value for column 'data' at row 1
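
For completeness, here is roughly what the same test looks like on the Postgres side, which is the behavior Mr. Schoenig described as correct. This is a sketch for comparison, not taken from his post, and the exact error text may vary slightly between versions:

CREATE TABLE data (
    id    integer NOT NULL,
    data  numeric(4, 2));
INSERT INTO data VALUES (1, 1234.5678);
-- Expected result:
-- ERROR:  numeric field overflow
-- DETAIL:  A field with precision 4, scale 2 must round to an absolute value less than 10^2.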

His next claim is that MySQL allows updating a key column to NULL and silently changes it to 0. This is also false:

mysql> INSERT INTO data VALUES (1, 12);
Query OK, 1 row affected (0.00 sec)
mysql> UPDATE data SET id = NULL WHERE id = 1;
ERROR 1048 (23000): Column 'id' cannot be null

In the original post, we never see the warnings and so don’t have the full details of his environment. Since he didn’t specify which version he was testing on, I will point out that MySQL 5.7 does a far better job out-of-the-box handling your data than 5.6 does, and SQL Mode has existed in MySQL for ages. Any user could set it to

STRICT_ALL_TABLES (or STRICT_TRANS_TABLES)

 and get the behavior that is now default in 5.7.
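
A minimal sketch of how that looks on an older server, with the caveat that setting sql_mode replaces the whole mode list, so merge it with any other modes you already rely on:

-- Enable strict mode for the current session, or globally for new connections:
SET SESSION sql_mode = 'STRICT_ALL_TABLES';
SET GLOBAL  sql_mode = 'STRICT_ALL_TABLES';
-- To persist it across restarts, add sql_mode=STRICT_ALL_TABLES to the [mysqld]
-- section of my.cnf.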

The author is also focusing on a narrow issue, using it to say Postgres is better. I feel this is misleading. I could point out factors in MySQL that are better than in Postgres as well.

This is another case of “don’t necessarily take our word for it”. A simple test of what you see on a blog can help you understand how things work in your environment and why.

Jun
13
2017
--

Webinar Thursday, June 15, 2017: Demystifying Postgres Logical Replication

Postgres Logical Replication

Join Percona’s Senior Technical Services Engineer Emanuel Calvo as he presents Demystifying Postgres Logical Replication on Thursday, June 15, 2017 at 7 am PDT / 10 am EDT (UTC-7).

The Postgres logical decoding feature was added in version 9.4, and thankfully it is continuously improving due to the vibrant open source community. In this webinar, we are going to walk through its concepts, usage and some of the new things coming up in future releases.

Logical decoding is one of the features underlying the BDR implementation, allowing bidirectional streams of data between Postgres instances. It also allows you to stream data out of Postgres into many other data systems.
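
If you want a quick taste before the webinar, the built-in test_decoding plugin lets you peek at the decoded change stream. A minimal sketch, assuming wal_level is set to logical and some table of your own to write to (my_table below is hypothetical):

-- Create a logical replication slot using the test_decoding output plugin.
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');
-- Make any change, for example: INSERT INTO my_table VALUES (1);  (hypothetical table)
-- Consume the changes decoded since the slot was created.
SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);
-- Drop the slot so it does not retain WAL indefinitely.
SELECT pg_drop_replication_slot('demo_slot');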

Register for the webinar here.

Emanuel Calvo, Percona Sr. Technical Services Engineer

Emanuel has worked with MySQL for more than eight years. He is originally from Argentina, but has also lived and worked in Spain and other Latin American countries. He lectures and presents at universities and local events. Emanuel currently works as a Sr. Technical Services Engineer at Percona, focusing primarily on MySQL. His professional background includes experience at telecommunication companies, educational institutions and data warehousing solutions. In his career, he has worked as a developer, SysAdmin and DBA in companies like Pythian, Blackbird.io/PalominoDB, Siemens IT Solutions, Correo Argentino (Argentinian Postal Services), Globant-EA, SIU – Government Educational Institution and Aedgency among others. As a community member he has lectured and given talks in Argentina, Brazil, United States, Paraguay, Spain and Belgium as well as written several technical papers.
