Feb
02
2023
--

PostgreSQL Database Security: External Server-Based Authentication


Database security is crucial to protecting sensitive information stored in a database. A critical aspect is authentication, which refers to verifying the identity of a user trying to access the database. Authentication is typically accomplished through usernames and passwords and may include other factors such as security tokens and one-time passwords. Database Administrators (DBAs) must implement robust authentication mechanisms to ensure that only authorized users can access the database and its data. Additionally, it is best practice to regularly monitor and audit database authentication logs to detect and prevent unauthorized access attempts.

PostgreSQL server authentication refers to verifying the identity of a user trying to connect to the PostgreSQL server. This can be done using various methods, including password-based authentication, client certificates, and external authentication methods such as GSSAPI, LDAP, and RADIUS. The pg_hba.conf file is used to configure the authentication methods the PostgreSQL server uses and can specify different methods for different types of connections. By default, a freshly initialized PostgreSQL cluster uses trust authentication for connections from localhost, which does not require a password. However, it is recommended to use password-based or external authentication methods for remote connections and production environments to ensure secure access to the data.

When talking about security, authentication is the first line of defense. PostgreSQL provides various methods of authentication, which fall into three categories.

 

PostgreSQL internal authentication and OS-based authentication have already been discussed in previous blogs. Now it’s time to discuss external authentication methods.

PostgreSQL external authentication

PostgreSQL supports a variety of external authentication methods, including GSSAPI, LDAP, and RADIUS. GSSAPI, or Generic Security Service Application Program Interface, uses Kerberos to authenticate users. LDAP, or Lightweight Directory Access Protocol, authenticates users against an LDAP server. RADIUS, or Remote Authentication Dial-In User Service, authenticates users against a RADIUS server. These external authentication methods can secure a PostgreSQL database by providing a centralized way of managing user access and eliminating the need to store passwords in the database. All three methods are available in every currently supported PostgreSQL version, provided the server was built with the corresponding support (for example, --with-gssapi for GSSAPI).

LDAP (Lightweight Directory Access Protocol)

LDAP is a popular authentication method for enterprise environments, and it can also be used with PostgreSQL. In this section, we will go over how to set up LDAP authentication for PostgreSQL and provide an example of how it can be configured. Before you begin, you will need access to an LDAP server and a basic understanding of how it works. You will also need to have PostgreSQL installed and running on your server.

Step 1: Install the OpenLDAP server and client packages

sudo apt-get update 
sudo apt-get install slapd ldap-utils

Step 2: Configure the OpenLDAP server

sudo dpkg-reconfigure slapd

  • Select “No” when asked to configure the database with dbconfig-common.
  • Set the domain name for your LDAP server, for example, “example.com”.
  • Set the organization name for your LDAP server, for example, “Example Inc”.
  • Set the administrator password for your LDAP server.

Step 3: Create an LDAP directory structure

sudo ldapadd -x -D cn=admin,dc=example,dc=com -w <admin password> -f base.ldif

where “base.ldif” is a file containing the directory structure you want to create.
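
For reference, here is a minimal sketch of what base.ldif might contain for the example.com domain used above; the organizational unit names are purely illustrative:

dn: ou=people,dc=example,dc=com
objectClass: organizationalUnit
ou: people

dn: ou=groups,dc=example,dc=com
objectClass: organizationalUnit
ou: groups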

Step 4: Test the LDAP server

ldapsearch -x -b dc=example,dc=com -D cn=admin,dc=example,dc=com -w <admin password>

This command should return information about your LDAP directory.

Step 5: Add users and groups to the LDAP directory

ldapadd -x -D cn=admin,dc=example,dc=com -w <admin password> -f users.ldif

where “users.ldif” is a file containing the users and groups you want to add to the directory.
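
As an illustration, a users.ldif entry for a single user might look like the following sketch. The user is placed directly under dc=example,dc=com to match the DN form used in the later steps of this example; attributes and password handling will depend on your directory layout and policies:

dn: cn=myldapuser,dc=example,dc=com
objectClass: inetOrgPerson
cn: myldapuser
sn: User
userPassword: secret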

Step 6: Test the user authentication

ldapwhoami -x -D cn=<user>,dc=example,dc=com -w <user password>

This command should return the DN (Distinguished Name) of the user, indicating that the user's credentials were accepted by the LDAP server.

Step 7: Configure pg_hba.conf

The pg_hba.conf file controls the authentication methods used by PostgreSQL. To enable LDAP authentication, you’ll need to add a line to the pg_hba.conf file that specifies the LDAP server and method to be used.

For example:

host all all 0.0.0.0/0 ldap ldapserver=ldap.example.com ldapprefix="cn=" ldapsuffix=",dc=example,dc=com"

This line tells PostgreSQL to authenticate connections against the LDAP server at ldap.example.com using simple bind: the DN used to bind is built by concatenating ldapprefix, the database user name, and ldapsuffix.

Step 8: Restart PostgreSQL

After making changes to the pg_hba.conf file, you will need to restart the PostgreSQL service for the changes to take effect.
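
The exact way to restart depends on how PostgreSQL was installed; on a systemd-based Ubuntu system, for example, something like the following should work (a reload is also sufficient for pg_hba.conf changes):

sudo systemctl restart postgresql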

Step 9: Test the LDAP connection

You can test the connection to the LDAP server by running the following command:

ldapwhoami -h ldap.example.com -D "cn=admin,dc=example,dc=com" -w "adminpassword"

This command will bind to the LDAP server as the “cn=admin,dc=example,dc=com” user with the password “adminpassword”. If the connection is successful, you should see a message indicating the user you are currently bound as.

Step 10: Test LDAP authentication

To test LDAP authentication with PostgreSQL, you can connect to the database using the psql command-line interface. For example,

psql -U myldapuser -h localhost

If the LDAP authentication is set up correctly, you should be able to connect to the database using myldapuser's LDAP credentials.

GSSAPI (Generic Security Service Application Program Interface)

GSSAPI is an authentication method that allows users to authenticate to PostgreSQL using Kerberos. In this section, we will go over how to set up GSSAPI authentication for PostgreSQL and provide an example of how it can be configured.

Before you begin, you will need to have a Kerberos infrastructure and a basic understanding of how it works. You will also need to have PostgreSQL installed and running on your server and the Kerberos libraries (gssapi and gssapi_krb5) installed on the server where PostgreSQL is running.

Step 1: Configure pg_hba.conf

The pg_hba.conf file controls the authentication methods used by PostgreSQL. To enable GSSAPI authentication, you’ll need to add a line to the pg_hba.conf file that specifies the GSSAPI method to be used. For example:

hostgssenc postgres postgres 192.168.0.102/32 gss include_realm=0

This line tells PostgreSQL to accept GSSAPI-encrypted connections (hostgssenc) from that host, to use GSSAPI for authentication, and not to include the Kerberos realm in the username.

Step 2: Restart PostgreSQL

After making changes to the pg_hba.conf file, you will need to restart the PostgreSQL service for the changes to take effect. If you see the following error in the server log, it means your server was not built with the --with-gssapi option.

2023-01-31 19:25:18.585 PKT [42302] LOG: hostgssenc record cannot match because GSSAPI is not supported by this build

Step 3: Kerberos configuration

You will need to configure Kerberos to work with PostgreSQL. You will need to create a service principal for the PostgreSQL server and add it to the Kerberos keytab file.
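
As a sketch, assuming an MIT Kerberos KDC, the service principal and keytab could be created with kadmin along the following lines (the realm, hostname, and keytab path are placeholders), and the keytab path is then pointed to by the krb_server_keyfile parameter in postgresql.conf:

kadmin -q "addprinc -randkey postgres/your.postgres.host@YOURREALM.COM"
kadmin -q "ktadd -k /etc/postgresql/postgres.keytab postgres/your.postgres.host@YOURREALM.COM"

krb_server_keyfile = '/etc/postgresql/postgres.keytab'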

Step 4: Test the GSSAPI connection

You can test the GSSAPI connection by running the following command:

kinit -k -t /path/to/keytab postgres/your.postgres.host@YOURREALM.COM

This command will acquire a Kerberos ticket for the service principal you created in step 3.

Step 5: Test GSSAPI authentication

To test GSSAPI authentication with PostgreSQL, you can connect to the database using the psql command-line interface. For example:

psql "dbname=mydb host=myhost user=myuser"

If the GSSAPI authentication is set up correctly, you should be able to connect to the database using your Kerberos credentials.

SSPI (Security Support Provider Interface)

SSPI is an authentication method that allows users to authenticate to PostgreSQL using Windows’ built-in security features. It is similar to GSSAPI but specific to the Windows operating system.

The PostgreSQL server uses the SSPI library to negotiate authentication with the client using the Kerberos or NTLM protocols. To set up SSPI authentication in PostgreSQL, you must configure the pg_hba.conf file to use the “sspi” method and restart the PostgreSQL service.

For example, to enable SSPI authentication for all connections from the local host, you would add the following line to the pg_hba.conf file:

host all all 127.0.0.1/32 sspi

This line tells PostgreSQL to use SSPI for authentication for all connections coming from the IP address 127.0.0.1, which corresponds to the localhost. Once the changes are made, you will need to restart the PostgreSQL service for the changes to take effect. To test the SSPI authentication, you can connect to the database using the psql command-line interface. SSPI is a Windows technology and is typically used when both the client and the server run on Windows.

RADIUS (Remote Authentication Dial-In User Service)

RADIUS is a widely used protocol for authenticating remote users. It can also be used to authenticate users in a PostgreSQL database. In this blog post, we will go over how to set up RADIUS authentication for PostgreSQL and provide an example of how it can be configured.

Before you begin, you will need access to a RADIUS server and a basic understanding of how it works.

Step 1: Install FreeRADIUS  

Here are the steps to install and configure a RADIUS server on Ubuntu:

sudo apt-get update
sudo apt-get install freeradius freeradius-utils

Step 2: Configure the RADIUS server

  • The main configuration file for FreeRADIUS is located at /etc/freeradius/radiusd.conf.
  • The users file is located at /etc/freeradius/users.

Step 3: Add users to the RADIUS server

sudo nano /etc/freeradius/users

Add a new entry for each user in the following format:

username Cleartext-Password := "password"

Step 4: Restart the RADIUS server

sudo service freeradius restart

Step 5: Test the RADIUS server

sudo radtest username password 127.0.0.1 0 testing123

If the test is successful, you should receive an “Access-Accept” response.

Step 6: Configure pg_hba.conf

The pg_hba.conf file controls the authentication methods used by PostgreSQL. To enable RADIUS authentication, you’ll need to add a line to the pg_hba.conf file that specifies the RADIUS server and method to be used. For example:

host all all 0.0.0.0/0 radius radiusservers=192.168.1.1 radiussecrets=password radiusports=1812

This line tells PostgreSQL to use RADIUS authentication for all connections, contacting the RADIUS server at 192.168.1.1 on port 1812 with the shared secret "password".

Step 7: Restart PostgreSQL

After making changes to the pg_hba.conf file and the FreeRADIUS configuration, you will need to restart the PostgreSQL service for the changes to take effect.

Step 8: Test RADIUS authentication

To test RADIUS authentication with PostgreSQL, you can try to connect to the database using the psql command-line interface. For example,

psql -U username -h localhost

If the RADIUS authentication is set up correctly, you should be able to connect to the database using your RADIUS credentials.

Conclusion

In conclusion, external authentication methods such as GSSAPI, LDAP, and RADIUS can enhance the security of a PostgreSQL database by providing a centralized way of managing user access and eliminating the need to store passwords in the database. GSSAPI uses Kerberos to authenticate users, LDAP authenticates users against an LDAP server, and RADIUS authenticates users against a RADIUS server.

It's important to note that these external methods are available in all currently supported PostgreSQL versions, provided the server build includes the corresponding support. In addition to external authentication methods, the PostgreSQL server also supports password-based and client certificate-based authentication methods. It's recommended to use password-based or external authentication methods for remote connections and production environments to ensure secure access to the data.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open source community in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Feb
01
2023
--

Unlocking the Secrets of TOAST: How To Optimize Large Column Storage in PostgreSQL for Top Performance and Scalability


PostgreSQL is a fantastic database, but if you’re storing images, video, audio files, or other large data objects, you need to “toast” them to get optimal performance. This post will look at using The Oversized-Attribute Storage Technique (TOAST) to improve performance and scalability. 

PostgreSQL uses fixed-sized pages, which makes it challenging to store huge data values directly. To solve this problem, large data values are compressed and broken into multiple smaller chunks. This process is done automatically and does not significantly impact how the database is used. This technique, called TOAST, improves how large data values are stored and used in the database.

Therefore, TOAST is a storage technique used in PostgreSQL to handle large data objects such as images, videos, and audio files. The TOAST technique allows for the efficient storage of large data objects by breaking them into smaller chunks and storing them separately from the main table. This can improve the performance of queries and indexing and reduce the amount of disk space required to store the data.

TOAST tables are created automatically by PostgreSQL when a table contains a column of a variable-length (TOAST-able) data type, such as text, bytea, or jsonb. The TOAST table is then used to store the large data values out of line, while the main table stores a reference to the TOAST table.

Here’s an example of using the TOAST technique in PostgreSQL:

  • Create a table with a large data column:
CREATE TABLE images ( id SERIAL PRIMARY KEY, data BYTEA );

  • Insert a large image into the table:
INSERT INTO images (data) VALUES (E'\\x...');

  • Query the table to see that the large data object is stored in a TOAST table:
SELECT relname, relkind FROM pg_class WHERE relname LIKE 'pg_toast%';

In the example, the images table contains a column named data of type bytea, which can hold large binary values. When a large image is inserted into the table, PostgreSQL stores the image data in the table's associated TOAST table, separate from the main table. The pg_class system catalog table is then queried to show the TOAST tables that exist in the database.
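
To find the specific TOAST table backing a given table, you can follow the reltoastrelid link in pg_class; a quick sketch:

SELECT c.relname AS table_name, t.relname AS toast_table
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'images';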

It’s important to note that while TOAST tables help store large data objects, they can add complexity to the database and should be used with care. Also, in some cases, the query performance will decrease when the data is spread among different tables, depending on the query conditions.

If you have a lot of large data that you don't need to query or index, another option to consider is storing it outside the database in the file system and keeping a reference to it in the database, similar to how a TOAST table works.

In PostgreSQL, you can use the different TOAST storage strategies by setting the “storage” attribute on a column.

CREATE TABLE mytable ( id serial primary key, large_column bytea);
postgres=# \d+ mytable
Table "public.mytable"
Column       | Type    | Collation | Nullable | Default                             | Storage 
-------------+---------+-----------+----------+-------------------------------------+----------
id           | integer |           | not null | nextval('mytable_id_seq'::regclass) | plain 
large_column | bytea   |           |          |                                     | extended 
Indexes:
"mytable_pkey" PRIMARY KEY, btree (id)
Access method: heap

postgres=# ALTER TABLE mytable ALTER COLUMN large_column SET STORAGE PLAIN;
ALTER TABLE
postgres=# \d+ mytable
Table "public.mytable"
Column       | Type    | Collation | Nullable | Default                             | Storage 
-------------+---------+-----------+----------+-------------------------------------+----------
id           | integer |           | not null | nextval('mytable_id_seq'::regclass) | plain 
large_column | bytea   |           |          |                                     | plain 
Indexes:
"mytable_pkey" PRIMARY KEY, btree (id)
Access method: heap

postgres=# ALTER TABLE mytable ALTER COLUMN large_column SET STORAGE MAIN; 
ALTER TABLE
postgres=# \d+ mytable
Table "public.mytable"
Column       | Type    | Collation | Nullable | Default                             | Storage 
-------------+---------+-----------+----------+-------------------------------------+----------
id           | integer |           | not null | nextval('mytable_id_seq'::regclass) | plain 
large_column | bytea   |           |          |                                     | main 
Indexes:
"mytable_pkey" PRIMARY KEY, btree (id)
Access method: heap

The general form is ALTER TABLE table_name ALTER COLUMN column_name SET STORAGE strategy, where the column has a TOAST-able data type (e.g., text, bytea) and "strategy" is one of the four TOAST storage strategies (PLAIN, EXTENDED, EXTERNAL, MAIN).

You can also use the pg_attribute table in the system catalog to check which strategy a column uses.

SELECT attname, attstorage FROM pg_attribute WHERE attrelid = 'tablename'::regclass AND attnum > 0;

postgres=# SELECT attname, attstorage FROM pg_attribute WHERE attrelid = 'mytable'::regclass AND attnum > 0;
attname       | attstorage 
--------------+------------
id            | p
large_column  | m
(2 rows)

It’s worth noting that the default storage strategy for most columns is ‘EXTENDED’ (compression and out-of-line storage) and that you can change the storage strategy of a table column at any time. However, it’s important to note that changing a column’s storage strategy may affect your queries’ performance and table size. Thus, testing your specific use case with different storage strategies is advisable to determine which provides the best performance.

  1. PLAIN strategy: This strategy disables both compression and out-of-line storage and also disables using single-byte headers for varlena types. This is the only strategy available for non-TOAST-able data types, such as integers or booleans. Example: a column of integers is always stored inline with PLAIN storage, since fixed-length types cannot be moved out of line.
  2. EXTENDED strategy: This strategy allows both compression and out-of-line storage. This is the default strategy for most TOAST-able data types, such as text or bytea. The system will first attempt to compress the data; if the row is too large, it will store it out-of-line. Example: You have a table with a large column of text and want to reduce its size on the disk; this strategy will try to compress it first. If it doesn’t fit, the row will be stored out of line.
  3. EXTERNAL strategy: This strategy allows out-of-line storage but disables compression. This strategy is useful for text and bytea columns frequently accessed with substring operations. Accessing these columns will be faster because the system only needs to fetch the required parts of the out-of-line value. Example: You have a table with a large column of text and want to improve performance when substring operations are needed; this strategy will store it out of line and avoid compression.
  4. MAIN strategy: This strategy allows compression but disables out-of-line storage. Out-of-line storage will still be performed, but only as a last resort when there is no other way to make the row small enough to fit on a page. Example: You have a table with a large column of data not accessed frequently, and you want to compress it to save disk space; this strategy will compress it but will avoid storing it out of line.

Navigating the pitfalls of using TOAST in PostgreSQL

While the TOAST technique can be useful to handle large data objects in PostgreSQL, you might hit a few problems. Here are a few common issues and ways to address them:

  1. Increased storage space: Because TOAST tables store large data objects separately from the main table, they can increase the amount of disk space required to store the data. This can be especially problematic if the table contains many large data objects. To address this issue, consider compressing the data before storing it in the TOAST table or using a storage solution optimized for handling large data objects such as file systems or object storage.
  2. Query performance: Queries involving large data objects stored in TOAST tables can be slower than those with smaller data objects. This is because the database needs to fetch the data from the TOAST table before it can be used in the query. To address this issue, try creating indexes on the TOAST table or consider using a caching layer to reduce the number of times data needs to be fetched from the TOAST table.
  3. Vacuum performance: PostgreSQL runs a process called “vacuum,” which reclaims disk space from deleted or updated rows to maintain the database’s performance. The vacuum process could be slow when there are a lot of large data objects stored in TOAST tables. To address this issue, try running the vacuum process during periods of low database activity, or consider using a storage solution optimized for handling large data objects such as file systems or object storage.
  4. Limited data types: TOAST applies only to variable-length data types such as text, varchar, bytea, and jsonb. Fixed-length types such as integer, boolean, or timestamp cannot be compressed or stored out of line, so wide rows made up only of such columns cannot benefit from TOAST.

Tackling the TOAST table growth issue in PostgreSQL: Strategies and solutions

A common issue with the TOAST system in PostgreSQL is that the size of the TOAST table can grow out of control. This can happen when large amounts of data are inserted into the table, causing the table to become larger than the available disk space. There are several ways to address this issue:

  1. Increasing disk space: The simplest solution is to increase the amount of disk space available to the PostgreSQL instance. This will allow the TOAST table to continue growing and should be considered a temporary solution.
  2. VACUUM and ANALYZE: Running the VACUUM and ANALYZE commands can help to reclaim space in the TOAST table that is no longer needed. Vacuum reclaims space occupied by dead rows, and ANALYZE will help the query planner to make more accurate decisions.
  3. Tune the TOAST threshold: There is no hard size cap you can place on a TOAST table, but the per-table toast_tuple_target storage parameter controls how large a row must be before PostgreSQL starts compressing and moving values out of line, which influences how much data ends up in the TOAST table.
  4. Choose a more appropriate storage strategy: As explained previously, choosing a more appropriate storage strategy for your data type and the access pattern can help to avoid unnecessary growth of the TOAST table.
  5. Archiving old data: Removing old or rarely accessed data from your table will help reduce the table’s size. Also, consider archiving old data to a different storage location, such as tape or cloud storage.
  6. Compressing the data: If you are using plain or external storage, you can also consider compressing your data in the application before storing it in the table, so that it uses less disk space.
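
To see which tables' TOAST tables are taking the most space, a catalog query along these lines can help (a sketch; it simply follows reltoastrelid and sorts by size):

SELECT c.relname AS table_name,
       pg_size_pretty(pg_relation_size(c.reltoastrelid::regclass)) AS toast_size
FROM pg_class c
WHERE c.reltoastrelid <> 0
ORDER BY pg_relation_size(c.reltoastrelid::regclass) DESC
LIMIT 10;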

Conclusion

In conclusion, the TOAST system in PostgreSQL is a powerful feature that allows the database to handle large column values that would not fit in a single database block. The system uses a variety of strategies for storing these columns, including PLAIN, EXTENDED, EXTERNAL, and MAIN. Each strategy has its advantages and use cases, and the appropriate strategy will depend on the specific requirements of your application.

For example, you may use the EXTERNAL strategy if you have a table with a large column of text and want to improve performance when substring operations are needed. When designing a table, consider the size and type of data stored in the columns and choose an appropriate storage strategy that will meet your application’s performance and space requirements. It’s also possible to change the storage strategy of a column at any time, although this may affect the performance of your queries and the size of your table. Thus, testing different strategies before settling on the best one is highly recommended.

Ultimately, understanding the TOAST system and how to use it effectively can significantly improve the performance and scalability of your PostgreSQL applications.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Jan
11
2023
--

Upgrading PostgreSQL Extensions


PostgreSQL is a powerful and flexible open-source database management system that allows users to install and use extensions to add additional functionality to their databases. In this tutorial, we will cover the process of installing and upgrading PostgreSQL extensions using the example of the pg_stat_monitor extension.

pg_stat_monitor is a PostgreSQL extension that provides real-time monitoring and analysis of database activity. It allows users to monitor and track various performance metrics, such as query execution times, call counts, and buffer hits, and helps them optimize their database's performance in real time.

To install the pg_stat_monitor extension, you must have the necessary privileges to create extensions in your database. This can typically be done using the CREATE EXTENSION command.

For example:

CREATE EXTENSION pg_stat_monitor;

This command installs the default version of the pg_stat_monitor extension specified in the control file (pg_stat_monitor.control).

You can use the CREATE EXTENSION command with the VERSION option if you want to install a specific extension version. For example:

CREATE EXTENSION pg_stat_monitor VERSION '2.0';

To upgrade a PostgreSQL extension, you can use the ALTER EXTENSION command with the UPDATE TO option. For example:

ALTER EXTENSION pg_stat_monitor UPDATE TO '2.0';

Before upgrading an extension, you should check the current version of the installed extension. To do this, you can use the pg_extension view:

SELECT * FROM pg_extension WHERE extname = 'pg_stat_monitor';

This will return the row for the pg_stat_monitor extension, including the extversion column showing the version currently installed in your database.
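
To see which versions of the extension are available to install or upgrade to on this server, you can also query the pg_available_extension_versions view; for example:

SELECT name, version, installed
FROM pg_available_extension_versions
WHERE name = 'pg_stat_monitor';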

Along with the library files, a PostgreSQL extension also includes a set of SQL files that define the objects and functions of the extension and a control file to manage the extension. When you upgrade the pg_stat_monitor extension from version 1.0 to version 2.0, the database server executes these SQL files to update the extension’s objects and functionality. The pg_stat_monitor--1.0.sql file is executed first to create the initial version of the extension, and then the pg_stat_monitor--1.0--2.0.sql file is executed to apply any changes or additions made in version 2.0.

The control file is an essential part of a PostgreSQL extension. It provides the database server with the information it needs to manage the extension and make its objects and functionality available to the database. It includes details such as the extension’s name, version, dependencies, and SQL scripts. In the case of the pg_stat_monitor extension, the control file might look something like this:

comment = 'Real-time monitoring and analysis of database activity'
default_version = '2.0'
module_pathname = 'pg_stat_monitor.so'
relocatable = true

In this control file, the comment field provides a brief description of the extension, the default_version field specifies the default version of the extension (2.0), and the module_pathname field specifies the path to a shared object file that provides additional functionality for the extension. The relocatable field indicates that the extension can be installed in a schema other than the public schema.

Conclusion

PostgreSQL extensions are a powerful way to extend the capabilities of your database and build flexible and powerful applications. You can easily install and upgrade PostgreSQL extensions by using the CREATE EXTENSION and ALTER EXTENSION commands. You can use the control file to manage the extension and make its objects and functionality available to your database.

Jan
05
2023
--

PostgreSQL Vacuuming Command to Optimize Database Performance


This blog post was originally published in January 2023 and was updated in June 2023.

In PostgreSQL, the vacuum command is a maintenance task that helps to optimize database performance and reclaim disk space. Vacuuming removes deleted or outdated row versions from tables and indexes and updates the statistics used by the query planner. This process is necessary to prevent the accumulation of unnecessary data, known as “dead tuples” or “dead rows,” which can take up significant space and slow down queries.

 

Get database support for PostgreSQL

 

Multi-version concurrency control (MVCC)

To maintain consistency and prevent data loss due to concurrent updates, PostgreSQL employs multi-version concurrency control (MVCC). MVCC ensures consistent reads and prevents data loss from concurrent updates by storing multiple versions of each row within the database, allowing transactions to access a consistent data snapshot.

In PostgreSQL databases, each row within a single table is assigned a transaction ID referred to as an “xmin”. This ID signifies the transaction that inserted the row. When a row is updated or deleted, it is not immediately removed from the table. Instead, a new version of the row is inserted with a new transaction ID, while the old version is marked as “dead” with a transaction ID called an “xmax”.

When a transaction reads a row, it utilizes the xmin and xmax values to determine whether the row is visible to the transaction. If the xmin value is greater than the transaction’s “snapshot” (a record of the transaction IDs that were in progress when the transaction began), the row is not visible to the transaction. If the xmax value is equal to the transaction’s ID, the row has been deleted by the transaction and is also not visible. In all other cases, the row is visible to the transaction.

This allows transactions to access a consistent data snapshot, as they can only see rows committed when the transaction began. It also prevents data loss due to concurrent updates, as conflicting updates result in inserting new row versions rather than overwriting the existing data.

Although MVCC incurs some overhead in terms of storage and performance due to the need to maintain multiple versions of each row, it is a crucial feature of PostgreSQL and other database systems that support concurrent updates.

This allows multiple versions of each row to be stored, enabling transactions to access a consistent data snapshot. However, this can result in the accumulation of dead tuples as rows are updated or deleted.

PostgreSQL vacuuming

The vacuum process in PostgreSQL helps maintain the database’s performance and disk space efficiency by removing rows that are no longer needed.

These rows accumulate because PostgreSQL uses MVCC to allow multiple transactions to access the same data simultaneously without conflicts: multiple versions of each row are stored, so dead tuples are not immediately removed when a row is updated or deleted. During the vacuum process, tables and indexes are scanned, and these dead tuples are removed, helping to reclaim space and improve query performance. It is essential to run vacuum periodically to keep the database running smoothly.

PostgreSQL vacuum configuration parameters

Vacuum Full

The VACUUM FULL command in PostgreSQL is a tool used to reclaim disk space occupied by deleted or obsolete data. Unlike the regular VACUUM command, it performs a more thorough cleanup by physically rearranging the data on the disk. This ensures that all dead tuples are completely removed, resulting in a more compact storage structure and improved query performance.

VACUUM FULL requires exclusive access to the table, so while the command runs, other transactions cannot read from or write to the table, potentially impacting database availability. It’s important to know this is a resource-intensive operation that may cause database disruptions and longer execution times.

Verbose

The term “Verbose” is used when referring to Postgres log output. Normally, Postgres logs only the barest of information when it’s first set up, i.e., STARTUP, SHUTDOWN, ERROR, and FATAL messages. Log output verbosity increases substantially when the logging parameters are set to record additional information relating to client access details and Postgres background processes such as CHECKPOINTs and VACUUM. The most “verbose” setting is when each and every SQL statement is logged, which can increase the Postgres log size, generating gigabytes of messages in a matter of minutes.

PostgreSQL Vacuum and Analyze

The vacuum process removes dead tuples, while up-to-date statistics let the query planner estimate the number of rows returned by a query more accurately and choose the most efficient execution plan. PostgreSQL provides two related commands: VACUUM, which removes dead tuples, and ANALYZE, which updates the planner statistics. It is generally recommended to run both together as VACUUM ANALYZE.
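
For example, to vacuum and analyze a single table in one pass and see what the operation did (the table name is only illustrative, reusing the pgbench table that appears in the statistics output later in this post):

VACUUM (VERBOSE, ANALYZE) pgbench_accounts;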

The vacuum command can be initiated manually using SQL commands or automated using the autovacuum background process, which runs based on configurable thresholds such as the number of dead tuples or live rows in a table. In PostgreSQL 15, the vacuum process has been optimized to make it more efficient and faster to vacuum large tables. It includes improvements such as the ability to vacuum multiple partitions of the same table in parallel, vacuum indexes concurrently, and skip vacuuming indexes unaffected by an update.

From a technical perspective, the vacuum process in PostgreSQL 15 involves several components. The vacuum daemon (autovacuum) initiates vacuum operations based on configurable thresholds. The vacuum worker executes the actual vacuum operation, scanning the table or index and removing dead tuples while updating statistics.

Autovacuum

Autovacuum, which automates routine vacuum maintenance, is enabled by default in PostgreSQL and can be configured using several parameters in the postgresql.conf file. The following settings control how the autovacuum process runs:

  1. autovacuum: This setting enables or disables the autovacuum background process. By default, autovacuum is enabled.
  2. autovacuum_vacuum_threshold: This setting determines the minimum number of dead rows that must be present in a table before it is vacuumed. The default value is 50.
  3. autovacuum_analyze_threshold: This setting determines the minimum number of inserted, updated, or deleted rows needed before a table is analyzed. The default value is 50.
  4. autovacuum_vacuum_scale_factor: This setting is a fraction of the table size that is added to autovacuum_vacuum_threshold when deciding whether to trigger a vacuum. The default value is 0.2.
  5. autovacuum_analyze_scale_factor: This setting is a fraction of the table size that is added to autovacuum_analyze_threshold when deciding whether to trigger an analyze. The default value is 0.1.
  6. autovacuum_vacuum_cost_delay: This setting determines the time (in milliseconds) that autovacuum sleeps each time the cost limit is reached during a vacuum operation. The default value is 2 (20 in versions before PostgreSQL 12).
  7. autovacuum_vacuum_cost_limit: This setting determines the accumulated I/O cost that causes autovacuum to pause for the cost delay. The default value is -1, meaning the regular vacuum_cost_limit (200 by default) is used.

Here is an example of configuring some of the vacuum parameters in the postgresql.conf file:

autovacuum = on
autovacuum_vacuum_threshold = 100
autovacuum_analyze_threshold = 100
autovacuum_vacuum_scale_factor = 0.5
autovacuum_analyze_scale_factor = 0.2
autovacuum_vacuum_cost_delay = 50
autovacuum_vacuum_cost_limit = 500

In this example, autovacuum is enabled, and the thresholds for vacuum and analyze are set to 100. The scale factors for vacuum and analyze are set to 0.5 and 0.2, respectively, which means that a vacuum will be triggered when the number of dead rows exceeds 100 plus 50% of the table's estimated row count, and an analyze will be triggered when the number of changed rows exceeds 100 plus 20% of the table's estimated row count. The vacuum cost delay is set to 50 milliseconds, and the vacuum cost limit is set to 500, which means autovacuum will pause for 50 milliseconds each time it has accumulated 500 cost units of I/O work.

It is important to configure these settings so that vacuum and analyze run effectively without putting too much load on the database. It is also a good idea to monitor the activity of the autovacuum and manually vacuum tables that are not adequately maintained by the autovacuum.

Autovacuum can also be configured on a per-table basis using the autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit parameters in the table's storage parameters. These parameters control how aggressively autovacuum vacuums the table, with a lower cost delay making the vacuum pause less (and therefore finish faster) and a higher cost limit allowing more work to be done between pauses.
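
As a sketch, per-table autovacuum settings can be applied with ALTER TABLE ... SET; the table name and values here are purely illustrative:

ALTER TABLE pgbench_accounts SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_cost_limit = 1000
);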

Parallel vacuuming

Parallel vacuum is a feature in PostgreSQL that allows the vacuum process to be run concurrently on multiple cores or processors for the same table, improving the performance of the vacuum operation. This can be especially useful for vacuuming large tables, as it allows the vacuum process to use multiple CPUs to scan and process the table in parallel.

Parallel vacuuming of indexes was introduced in PostgreSQL 13. The number of workers a vacuum can use is limited by the "max_parallel_maintenance_workers" parameter in the postgresql.conf configuration file. To control the degree of parallelism explicitly, specify the "PARALLEL" option with a worker count when running a vacuum command.

For example, to run a parallel vacuum on a table named “foo_table”, you can use the following command:

VACUUM (PARALLEL 4, ANALYZE) foo_table;

You can also run a parallel vacuum on the entire database by omitting the table name:

VACUUM (PARALLEL 4, ANALYZE);

Note: Keep in mind that parallel vacuum can increase the load on the database server, as it requires multiple CPU cores to be used simultaneously. You should carefully monitor the performance of your PostgreSQL database when using a parallel vacuum and adjust the "max_parallel_maintenance_workers" parameter as needed to find the optimal setting for your workload.

 

Discover why Percona and PostgreSQL work better together

 

Transaction wraparound

Transaction wraparound is a phenomenon that can occur in PostgreSQL when the 32-bit transaction ID (XID) counter wraps around and old transaction IDs are reused. This can cause problems if there are still unfrozen rows in the database from before the wraparound: such rows appear to have been created by transactions "in the future" and become invisible, which looks like sudden data loss.

To understand how transaction wraparound works, it is essential to understand how PostgreSQL manages transaction IDs. PostgreSQL transactions are assigned a unique transaction ID called an "xid." The xid is a 32-bit integer, meaning it has a maximum value of 2^32-1, or 4,294,967,295. When this maximum value is reached, the counter wraps around and old xids are reused.

To prevent transaction wraparound from causing problems, it is important to run the vacuum process regularly so that old rows are frozen. You can check which tables are closest to wraparound with a query like the one below.
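
This sketch uses the age() function on pg_class.relfrozenxid; the higher the age, the closer the table is to the wraparound limit:

SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;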

Vacuum statistics

To view the vacuum history for all tables in the current schema:

This query retrieves vacuum statistics for all tables in the "public" schema of the current database. It can help monitor the status of the vacuum process and identify tables that may need to be vacuumed or analyzed. For example, if a table has many dead rows or has not been vacuumed or analyzed recently, it may be worth running a manual VACUUM or ANALYZE to improve the performance of the database. Download our free e-book, “The 6 Common Causes of Poor Database Performance,” to see the most common database performance issues and get expert insights on how to fix them.

SELECT
    n.nspname as schema_name,
    c.relname as table_name,
    c.reltuples as row_count,
    c.relpages as page_count,
    s.n_dead_tup as dead_row_count,
    s.last_vacuum,
    s.last_autovacuum,
    s.last_analyze,
    s.last_autoanalyze
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE c.relkind = 'r' AND n.nspname = 'public';

-[ RECORD 1 ]----+------------------------------
schema_name      | public
table_name       | pgbench_accounts
row_count        | 9.999965e+06
page_count       | 163935
dead_row_count   | 41705
last_vacuum      | 2022-12-25 16:00:36.231734+00
last_autovacuum  | 
last_analyze     | 2022-12-25 16:00:18.90299+00
last_autoanalyze | 
-[ RECORD 2 ]----+------------------------------
schema_name      | public
table_name       | pgbench_branches
row_count        | 100
page_count       | 1
dead_row_count   | 41
last_vacuum      | 2022-12-25 16:00:44.722317+00
last_autovacuum  | 
last_analyze     | 2022-12-25 16:00:16.254529+00
last_autoanalyze | 2022-12-25 16:01:45.957663+00

To view the list of tables that have been modified since the last vacuum:

SELECT
    n.nspname as schema_name,
    c.relname as table_or_index_name,
    c.relkind as table_or_index,
    c.reltuples as row_count,
    s.last_vacuum,
    s.last_autovacuum,
    s.last_analyze,
    s.last_autoanalyze
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE (c.relkind = 'r' or c.relkind = 'i')
AND (s.last_vacuum < s.last_autovacuum OR s.last_vacuum < s.last_analyze);

To view the list of tables and indexes that have a high number of dead rows:

SELECT
    n.nspname as schema_name,
    c.relname as table_name,
    c.reltuples as row_count,
    s.n_dead_tup as dead_row_count
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE c.relkind = 'r' AND s.n_dead_tup > 0;




SELECT
    n.nspname as schema_name,
    c.relname as index_name,
    s.n_dead_tup as table_dead_row_count
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_index i ON i.indexrelid = c.oid
LEFT JOIN pg_stat_user_tables s ON s.relid = i.indrelid
WHERE c.relkind = 'i' AND s.n_dead_tup > 0;

You can also run the above queries against other databases using dblink (one call per database). For example:

SELECT *
FROM dblink('host=<host> port=<port> dbname=<database> user=<username> password=<password>',
    'SELECT
        n.nspname as schema_name,
        c.relname as table_or_index_name,
        c.relkind as table_or_index,
        c.reltuples as row_count,
        s.last_vacuum,
        s.last_autovacuum,
        s.last_analyze,
        s.last_autoanalyze
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    LEFT JOIN pg_stat_user_tables s ON s.relid = c.oid
    WHERE (c.relkind = ''r'' or c.relkind = ''i'')
    AND (s.last_vacuum < s.last_autovacuum OR s.last_vacuum < s.last_analyze)') AS t(
        schema_name text,
        table_or_index_name text,
        table_or_index char(1),
        row_count bigint,
        last_vacuum timestamp,
        last_autovacuum timestamp,
        last_analyze timestamp,
        last_autoanalyze timestamp
    );

Note: You need to replace the placeholders <host>, <port>, <database>, <username>, and <password> with the actual values for your server.

Conclusion

It is important to properly configure the autovacuum to avoid overloading the database with too many vacuums. It is also a good idea to monitor the activity of the autovacuum and manually vacuum tables that are not adequately maintained by the autovacuum.

In summary, PostgreSQL vacuuming is an essential maintenance task that helps to reclaim space and improve performance. It is generally recommended to run vacuum regularly, either manually using the VACUUM SQL command or automatically using the autovacuum background process. This helps to ensure that the database remains efficient and free of dead rows and reduces the risk of transaction wraparound. Autovacuum is a convenient way to automate this process, but it is important to configure and monitor it to ensure it runs effectively and adequately.

Percona Distribution for PostgreSQL provides a complete package for PostgreSQL in a single installation. The components are tested to work together and continuously updated. This means you no longer need to find solutions for common requirements, like high availability and backup, on your own. Percona database experts are the best in the industry when it comes to tuning your complex PostgreSQL database environment for scaling and performance. Join the growing list of thousands of companies and millions of developers that use Percona software every day and learn more about Percona Distribution for PostgreSQL.

 

Discover why Percona and PostgreSQL work better together

 

FAQs

What is vacuum in PostgreSQL?

The term “vacuum” refers to a database maintenance procedure that reclaims storage space and optimizes database performance. Whenever data is inserted, updated, or deleted in a PostgreSQL database, it can generate “dead tuples” – rows that have become obsolete or are inaccessible.

The VACUUM operation in PostgreSQL identifies and eliminates these dead tuples, freeing up disk space to utilize for future operations.

What is the difference between vacuum full and vacuum freeze in PostgreSQL?

The VACUUM FULL and VACUUM FREEZE operations are both used in PostgreSQL for maintenance purposes, but they serve different functions.

The VACUUM FULL operation reclaims storage space by physically rewriting the entire table, relocating live tuples to fresh data pages, and discarding obsolete tuples. This can be useful when a significant amount of data has been updated and/or deleted, which results in fragmented space within the table. This operation effectively recaptures the space, but it comes with a cost: longer execution time and an exclusive lock on the table, impacting concurrency.

VACUUM FREEZE addresses the issue of transaction ID wraparound. In PostgreSQL, every transaction is associated with a unique transaction ID, represented by a 32-bit number. As transactions progress, the transaction ID incrementally increases, and when it reaches its maximum value, it wraps around, causing issues with transaction visibility and data integrity. The FREEZE operation marks all tuples in a table as frozen, meaning they are treated as having been inserted in the distant past and are visible to all current and future transactions, which protects them from wraparound problems.

What is the vacuum threshold in PostgreSQL?

The “autovacuum_vacuum_threshold” parameter in PostgreSQL determines the minimum number of updated or deleted tuples required in a table before the autovacuum process is triggered. The default is 50 tuples, meaning that if 50 or more tuples are modified in a table, autovacuum will be triggered. However, you can adjust this value in the PostgreSQL configuration file (postgresql.conf) or by changing table storage parameters.

What is the difference between vacuum and vacuum full?

VACUUM: The VACUUM command analyzes and reclaims storage space in a PostgreSQL database. When rows in a table are deleted or updated, the space they once occupied is not immediately freed up. Rather, it is marked as available for future use. This command does not need exclusive access to the table, enabling other transactions to read from and write to the table concurrently while it runs.

VACUUM FULL: This is a more forceful form of vacuuming, reclaiming space by physically relocating the data within the table, ensuring the removal of all defunct rows, and compacting the storage space. It can take longer to complete than regular vacuuming and requires exclusive access to the table, making it inaccessible for reading or writing by other transactions while the command is in progress.

The primary distinction between VACUUM and VACUUM FULL lies in the level of aggressiveness used to free up storage space. VACUUM is a lightweight and faster operation, marking the space as available for reuse, while VACUUM FULL physically rearranges the data and compresses the storage, resulting in a more thorough cleanup.

Nov
29
2021
--

PostgreSQL 14 Database Monitoring and Logging Enhancements


PostgreSQL-14 was released in September 2021, and it contained many performance improvements and feature enhancements, including several from a monitoring perspective. As we know, monitoring is a key element of any database management system, and PostgreSQL keeps enhancing its monitoring capabilities. Here are some of the key monitoring features in PostgreSQL-14.

Query Identifier

The query identifier is used to identify a query and can be cross-referenced between extensions. Prior to PostgreSQL-14, each extension calculated the query_id itself. Usually the same algorithm was used, but any extension could implement its own. PostgreSQL-14 can now optionally compute the query_id in the core, and monitoring extensions and utilities like pg_stat_activity, EXPLAIN, and pg_stat_statements use this query_id instead of calculating their own. This query_id can also be seen in csvlog after specifying it in the log_line_prefix. From a user perspective, there are two benefits of this feature.

  • All the utilities/extensions use the same query_id calculated by the core, which makes it easy to cross-reference the query_id. Previously, all the utilities/extensions needed to implement the same algorithm in their code to achieve this.
  • Extensions and utilities can use the query_id already calculated by the core and don't need to calculate it again, which is a performance benefit.

PostgreSQL-14 introduces a new GUC configuration parameter, compute_query_id, to enable or disable this feature. The default is auto; it can be turned on or off in the postgresql.conf file or using the SET command.

  • pg_stat_activity

SET compute_query_id = off;

SELECT datname, query, query_id FROM pg_stat_activity;
 datname  |                                 query                                 | query_id 
----------+-----------------------------------------------------------------------+----------
 postgres | select datname, query, query_id from pg_stat_activity;                |         
 postgres | UPDATE pgbench_branches SET bbalance = bbalance + 2361 WHERE bid = 1; |

SET compute_query_id = on;

SELECT datname, query, query_id FROM pg_stat_activity;
 datname  |                                 query                                 |      query_id       
----------+-----------------------------------------------------------------------+---------------------
 postgres | select datname, query, query_id from pg_stat_activity;                |  846165942585941982
 postgres | UPDATE pgbench_tellers SET tbalance = tbalance + 3001 WHERE tid = 44; | 3354982309855590749

  • Log

In previous versions, there was no mechanism to compute the query_id in the server core. The query_id is especially useful in the log files. To enable it there, we need to configure the log_line_prefix configuration parameter. The "%Q" option is added to show the query_id; here is an example.

log_line_prefix = 'query_id = [%Q] -> '

query_id = [0] -> LOG:  statement: CREATE PROCEDURE ptestx(OUT a int) LANGUAGE SQL AS $$ INSERT INTO cp_test VALUES (1, 'a') $$;
query_id = [-6788509697256188685] -> ERROR:  return type mismatch in function declared to return record
query_id = [-6788509697256188685] -> DETAIL:  Function's final statement must be SELECT or INSERT/UPDATE/DELETE RETURNING.
query_id = [-6788509697256188685] -> CONTEXT:  SQL function "ptestx"
query_id = [-6788509697256188685] -> STATEMENT:  CREATE PROCEDURE ptestx(OUT a int) LANGUAGE SQL AS $$ INSERT INTO cp_test VALUES (1, 'a') $$;

  • Explain

EXPLAIN VERBOSE will show the query_id if compute_query_id is enabled.

SET compute_query_id = off;

EXPLAIN VERBOSE SELECT * FROM foo;
                          QUERY PLAN                          
--------------------------------------------------------------
 Seq Scan on public.foo  (cost=0.00..15.01 rows=1001 width=4)
   Output: a
(2 rows)

SET compute_query_id = on;

EXPLAIN VERBOSE SELECT * FROM foo;
                          QUERY PLAN                          
--------------------------------------------------------------
 Seq Scan on public.foo  (cost=0.00..15.01 rows=1001 width=4)
   Output: a
 Query Identifier: 3480779799680626233
(3 rows)

autovacuum and auto-analyze Logging Enhancements

PostgreSQL-14 improves the logging of autovacuum and auto-analyze. Now we can see the I/O timings in the log, showing how much time was spent reading and writing.

automatic vacuum of table "postgres.pg_catalog.pg_depend": index scans: 1
pages: 0 removed, 67 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 89 removed, 8873 remain, 0 are dead but not yet removable, oldest xmin: 210871
index scan needed: 2 pages from table (2.99% of total) had 341 dead item identifiers removed
index "pg_depend_depender_index": pages: 39 in total, 0 newly deleted, 0 currently deleted, 0 reusable
index "pg_depend_reference_index": pages: 41 in total, 0 newly deleted, 0 currently deleted, 0 reusable
I/O timings: read: 44.254 ms, write: 0.531 ms
avg read rate: 13.191 MB/s, avg write rate: 8.794 MB/s
buffer usage: 167 hits, 126 misses, 84 dirtied
WAL usage: 85 records, 15 full page images, 78064 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.07 s

The I/O timing lines in these logs are only available if track_io_timing is enabled.

Connection Logging

PostgreSQL already logs connections and disconnections if log_connections/log_disconnections is on. PostgreSQL-14 now also logs the actual username supplied by the user. When external authentication is used and a mapping is defined in pg_ident.conf, it can be hard to identify the actual user name; before PostgreSQL-14, you only saw the mapped user instead of the actual user.

pg_ident.conf

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME

pg              vagrant                 postgres

pg_hba.conf

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     peer map=pg

Before PostgreSQL-14

LOG:  database system was shut down at 2021-11-19 11:24:30 UTC
LOG:  database system is ready to accept connections
LOG:  connection received: host=[local]
LOG:  connection authorized: user=postgres database=postgres application_name=psql

PostgreSQL-14

LOG:  database system is ready to accept connections
LOG:  connection received: host=[local]
LOG:  connection authenticated: identity="vagrant" method=peer (/usr/local/pgsql.14/bin/data/pg_hba.conf:89)
LOG:  connection authorized: user=postgres database=postgres application_name=psql

Conclusion

Every major PostgreSQL release carries significant enhancements, and PostgreSQL-14 was no different.

Monitoring is a key feature of any DBMS, and PostgreSQL keeps upgrading its logging and monitoring capabilities. With these newly added features, you have more insight into connections, can easily track queries and observe performance, and can identify how much time the vacuum process spends in read/write operations. This can significantly help you configure vacuum parameters better.


As more companies look at migrating away from Oracle or implementing new databases alongside their applications, PostgreSQL is often the best option for those who want to run on open source databases.

Read Our New White Paper:

Why Customers Choose Percona for PostgreSQL

Aug
23
2021
--

PostgreSQL Database Security: OS – Authentication

PostgreSQL Database Security OS - Authentication

Security is everybody’s concern when talking about data and information, and therefore it becomes the main foundation of every database. Security means protecting your data from unauthorized access: only authorized users can log in to a system (authentication), a user can only do what they are authorized to do (authorization), and user activity is logged (accounting). I have explained these in my main security post, PostgreSQL Database Security: What You Need To Know.

When we are talking about security, authentication is the first line of defense. PostgreSQL provides various methods of authentication, which fall into three categories.

In most cases, PostgreSQL is configured to use internal authentication, so I covered the internal authentication methods in the previous blog post mentioned above. In this blog, we will discuss the operating system-based authentication methods for PostgreSQL. There are three ways to do OS-based authentication.

Ident

Ident authentication only supports TCP/IP connections. An ident server provides a mechanism to map the client’s operating system username onto a database username. It also supports username mapping.

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            ident
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the

$ psql postgres -h 127.0.0.1 -U postgres
psql: error: connection to server at "127.0.0.1", port 5432 failed: FATAL:  Ident authentication failed for user "postgres"

If no ident server is installed, you will need to install ident2 on your Ubuntu box or oidentd on CentOS 7. Once you have installed and configured the ident server, it is time to configure PostgreSQL. Start by creating a user map in the “pg_ident.conf” file.

# Put your actual configuration here
# ----------------------------------
# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
PG_USER         vagrant                 postgres

Here we have mapped our system user “vagrant” onto PostgreSQL’s “postgres” user. Time to log in as the user vagrant.

$ psql postgres -h 127.0.0.1 -U postgres
psql (15devel)
Type "help" for help.

postgres=#

Note: The Identification Protocol is not intended as an authorization or access control protocol.

PAM (Pluggable Authentication Modules)

PAM (Pluggable Authentication Modules) authentication works similarly to password authentication, except that PAM validates the password. You have to create a PAM service file to enable PAM-based authentication; the service name defaults to “postgresql” and can be overridden with the pamservice option.

Once the service is created, PAM can now validate user name/password pairs and optionally the connected remote hostname or IP address. The user must already exist in the database for PAM authentication to work.
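For illustration, creating such a role might look like the sketch below; the role name is a hypothetical example, and no password is stored in PostgreSQL because PAM validates it:

-- The role must exist in PostgreSQL even though the password is checked by PAM.
CREATE ROLE app_user LOGIN;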

$ psql postgres -h 127.0.0.1 -U postgres
Password for user postgres: 
2021-08-11 13:16:38.332 UTC [13828] LOG:  pam_authenticate failed: Authentication failure
2021-08-11 13:16:38.332 UTC [13828] FATAL:  PAM authentication failed for user "postgres"
2021-08-11 13:16:38.332 UTC [13828] DETAIL:  Connection matched pg_hba.conf line 91: "host    all             all             127.0.0.1/32            pam"
psql: error: connection to server at "127.0.0.1", port 5432 failed: FATAL:  PAM authentication failed for user "postgres"

Ensure that the PostgreSQL server supports PAM authentication. It is a compile-time option that must have been set when the server binaries were built. You can check whether your PostgreSQL server supports PAM authentication with the following command.

$ pg_config | grep with-pam

CONFIGURE =  '--enable-tap-tests' '--enable-cassert' '--prefix=/usr/local/pgsql/' '--with-pam'

In case there is no PAM service file for PostgreSQL under /etc/pam.d, you’d have to create it manually. You may choose any name for the file; however, I prefer to name it “postgresql.”

$ cat /etc/pam.d/postgresql

@include common-auth
@include common-account
@include common-session
@include common-password

Since the PostgreSQL server process cannot read the password files, install sssd (SSSD – System Security Services Daemon) to work around this limitation.

sudo apt-get install sssd

Add postgresql to the “ad_gpo_map_remote_interactive” option in “/etc/sssd/sssd.conf”:

$ cat /etc/sssd/sssd.conf
ad_gpo_map_remote_interactive = +postgresql

Start the sssd service and check its status to make sure it has started properly.

$ sudo systemctl start sssd

$ sudo systemctl status sssd

sssd.service - System Security Services Daemon
     Loaded: loaded (/lib/systemd/system/sssd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-08-11 16:18:41 UTC; 12min ago
   Main PID: 1393 (sssd)
      Tasks: 2 (limit: 1071)
     Memory: 5.7M
     CGroup: /system.slice/sssd.service
             ├─1393 /usr/sbin/sssd -i --logger=files
             └─1394 /usr/libexec/sssd/sssd_be --domain shadowutils --uid 0 --gid 0 --logger=files

Now it is time to configure pg_hba.conf to use PAM authentication. We need to specify the PAM service name (pamservice) as part of the authentication options; it should match the name of the file you created in the /etc/pam.d folder, which in my case is postgresql.

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            pam pamservice=postgresql
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the

We must now reload (or restart) the PostgreSQL server. After that, you can try to log into the PostgreSQL server.
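Because pg_hba.conf is re-read on reload, a sketch from an existing superuser session is enough (pg_ctl reload or the service manager work equally well):

SELECT pg_reload_conf();  -- re-read pg_hba.conf and postgresql.conf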

vagrant@ubuntu-focal:~$ psql postgres -h 127.0.0.1 -U postgres
psql (15devel)
Type "help" for help.
postgres=>

Note

If PAM is set up to read /etc/shadow, authentication will fail because the PostgreSQL server is started by a non-root user. However, this is not an issue when PAM is configured to use LDAP or other authentication methods.

Peer

Peer authentication is “ident”ical, i.e., very much like ident authentication. The only subtle differences are that no ident server is involved and that this method works on local (Unix-domain socket) connections rather than over TCP/IP.

Peer authentication provides a mechanism to map the client’s operating system username onto a database username, and it also supports username mapping. The configuration is very similar to ident authentication, except that the authentication method is specified as “peer” instead of “ident.”

$ cat $PGDATA/pg_hba.conf

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                   peer map=PG_USER
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust

$ psql postgres -U postgres
2021-08-12 10:51:11.855 UTC [1976] LOG:  no match in usermap "PG_USER" for user "postgres" authenticated as "vagrant"
2021-08-12 10:51:11.855 UTC [1976] FATAL:  Peer authentication failed for user "postgres"
2021-08-12 10:51:11.855 UTC [1976] DETAIL:  Connection matched pg_hba.conf line 89: "local   all             all                                     peer map=PG_USER"
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:  Peer authentication failed for user "postgres"

Once the mapping is added, the working $PGDATA/pg_hba.conf configuration will look something like this:

$ cat $PGDATA/pg_hba.conf
# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     peer map=PG_USER
# IPv4 local connections:

$PGDATA/pg_ident.conf

# Put your actual configuration here
# ----------------------------------
# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
PG_USER         vagrant                postgres

vagrant@ubuntu-focal:~$ psql postgres -U postgres
psql (15devel)
Type "help" for help.
postgres=>

Conclusion

We’ve covered several different authentication methods in this blog. These basic authentication methods involve the PostgreSQL server, the operating system kernel, and an ident server; they are available natively without any major external dependencies. It is, however, important that the database is secured properly to prevent unauthorized access to the data.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Jul
30
2021
--

Improve PostgreSQL Query Performance Insights with pg_stat_monitor

Improve PostgreSQL Query Performance pg_stat_monitor

Understanding query performance patterns is essentially the foundation of query performance tuning. It, in many ways, dictates how a database cluster evolves. And there are obviously direct and indirect cost implications as well.

PostgreSQL provides very detailed statistics through a number of catalog views, and extensions can easily be added to provide more detailed query statistics. With each view focused on a particular aspect, the picture almost always needs to be stitched together by combining different datasets. That requires effort, and even then the whole picture might not be complete.

The pg_stat_monitor extension attempts to provide a more holistic picture by providing much-needed query performance insights in a single view. The extension has been evolving over the past year and is now nearing the GA release.

Some Useful Extensions

Currently, you may be relying on a number of extensions to understand how a query behaves: the time taken in the planning and execution phases, min/max/mean time values, index hits, the query plan, and client application details. Here are some extensions that you might already be very familiar with.

pg_stat_activity

This view is available by default with PostgreSQL. It provides one row per server process along with current activity and query text.

In case you’d like to learn more about it, hop over to the official PostgreSQL documentation here.
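As a quick illustration, a query such as the following lists the non-idle backends and what each one is currently executing (the column names come from the standard view):

SELECT pid, usename, application_name, state, query
FROM pg_stat_activity
WHERE state <> 'idle';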

pg_stat_statements

This extension is part of the contrib packages provided with the PostgreSQL server; however, you have to create the extension manually. It provides a per-query aggregation of statistical data, with min/max/mean/standard deviation for execution and planning times, along with other useful information and the query text.

You can read more about pg_stat_statements at the official PostgreSQL documentation site.
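A minimal sketch of enabling it and finding the slowest statements might look like this (it assumes pg_stat_statements is listed in shared_preload_libraries; the mean_exec_time column applies to PostgreSQL 13 and later, older versions use mean_time):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 5;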

auto_explain

Another useful extension provided by the PostgreSQL server, auto_explain dumps query plans into the server log for any query exceeding a time threshold specified by a GUC (Grand Unified Configuration) parameter.

You can find more about auto_explain here.
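As an example, a session-level sketch might look like this (the threshold is an arbitrary value chosen for illustration; log_analyze adds overhead, so use it carefully in production):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';  -- log plans for statements slower than 250 ms
SET auto_explain.log_analyze = on;            -- include actual row counts and timings in the logged plan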

pg_stat_monitor

Whilst all the previously mentioned views/extensions are great in their own right, one needs to manually combine client/connection information from pg_stat_activity, statistical data from pg_stat_statements, and query plans from auto_explain to get the complete dataset needed to understand query performance patterns.

And that’s precisely the pain that pg_stat_monitor alleviates.

The feature set has been growing over the past year, with it providing, in a single view, all performance-related information that you may need to debug a low performant query. For more information about the extension see our GitHub repository, or for user-specific documentation, see our user guide.
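A minimal sketch of getting started might look like this (it assumes the extension is installed and listed in shared_preload_libraries; the columns shown are among those discussed below):

CREATE EXTENSION IF NOT EXISTS pg_stat_monitor;

SELECT bucket, client_ip, application_name, substr(query, 0, 50) AS query, calls
FROM pg_stat_monitor
ORDER BY calls DESC
LIMIT 5;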

Feature Set

Some features that were part of earlier releases have already been discussed in this blog; however, for completeness, I’m going to discuss them here as well.

  • Time Interval Grouping: Instead of supplying one set of ever-increasing counts, pg_stat_monitor computes stats for a configured number of time intervals; time buckets. This allows for much better data accuracy, especially in the case of high resolution or unreliable networks.
  • Multi-Dimensional Grouping:  While pg_stat_statements groups counters by (userid, dbid, queryid),  pg_stat_monitor uses a more detailed group for higher precision:
    • Bucket ID (bucket),
    • User ID (userid),
    • Database ID (dbid),
    • Query ID (queryid),
    • Client IP Address (client_ip),
    • Plan ID (planid),
    • Application Name (application_name).

This allows you to drill down into the performance of queries coming from particular client addresses and applications, which we at Percona have found to be very valuable in a number of cases.

  • Capture Actual Parameters in the Queries: pg_stat_monitor allows you to choose if you want to see queries with placeholders for parameters or actual query examples.
  • Query Plan: Each SQL is now accompanied by its actual plan that was constructed for its execution. Also, we found having query parameter values is very helpful, as you can run EXPLAIN on it, or easily play with modifying the query to make it run better, as well as making communication about the query clearer when discussing with other DBAs and application developers.
  • Tables Access Statistics for a Statement: This allows us to easily identify all queries that accessed a given table. This set is on par with the information provided by pg_stat_statements.
  • Histogram: A visual representation is very helpful for identifying issues. With the help of the histogram function, you can now view a timing/calling data histogram in response to a SQL query. And yes, it even works in psql.
SELECT * FROM histogram(0, 'F44CD1B4B33A47AF') AS a(range TEXT, freq INT, bar TEXT);
       range        | freq |              bar
--------------------+------+--------------------------------
  (0 - 3)}          |    2 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  (3 - 10)}         |    0 |
  (10 - 31)}        |    1 | ■■■■■■■■■■■■■■■
  (31 - 100)}       |    0 |
  (100 - 316)}      |    0 |
  (316 - 1000)}     |    0 |
  (1000 - 3162)}    |    0 |
  (3162 - 10000)}   |    0 |
  (10000 - 31622)}  |    0 |
  (31622 - 100000)} |    0 |
(10 rows)

  • Functions: It may come as a surprise, but we do understand that functions may internally execute statements! To help ease tracking and analysis, pg_stat_monitor now provides a column that keeps track of the top query for a statement so that you can backtrack to the originating function.
  • Relation Names: Relations used in a query are available in the “relations” column of the pg_stat_monitor view. This reduces work at your end and makes analysis simpler and quicker.
  • Query Types: With query classification as SELECT, INSERT, UPDATE, or DELETE, analysis becomes simpler. It’s another effort reduced at your end, and another simplification by pg_stat_monitor.
SELECT bucket, substr(query,0, 50) AS query, cmd_type FROM pg_stat_monitor WHERE elevel = 0;
 bucket |                       query                       | cmd_type 
--------+---------------------------------------------------+----------
      4 | END                                               | 
      4 | SELECT abalance FROM pgbench_accounts WHERE aid = | SELECT
      4 | vacuum pgbench_branches                           | 
      4 | select count(*) from pgbench_branches             | SELECT
      4 | UPDATE pgbench_accounts SET abalance = abalance + | UPDATE
      4 | truncate pgbench_history                          | 
      4 | INSERT INTO pgbench_history (tid, bid, aid, delta | INSERT

  • Query Metadata: Google’s Sqlcommenter is a useful tool that, in a way, bridges the gap between ORM libraries and understanding database performance. And we support it. So, you can now put any key-value data in comments in /* … */ syntax (see the Sqlcommenter documentation for details) in your SQL statements, and the information will be parsed by pg_stat_monitor and made available in the comments column of the pg_stat_monitor view.
CREATE EXTENSION hstore;
CREATE FUNCTION text_to_hstore(s text) RETURNS hstore AS $$
BEGIN
    RETURN hstore(s::text[]);
EXCEPTION WHEN OTHERS THEN
    RETURN NULL;
END; $$ LANGUAGE plpgsql STRICT;


SELECT 1 AS num /* { "application", java_app, "real_ip", 192.168.1.1} */;
 num 
-----
   1
(1 row)

SELECT query, text_to_hstore(comments)->'real_ip' AS real_ip from pg_stat_monitor;
query                                                                       |  real_ip 
----------------------------------------------------------------------------+-------------
 SELECT $1 AS num /* { "application", psql_app, "real_ip", 192.168.1.3) */  | 192.168.1.1

  • Logging Error and Warning: As seen in different monitoring/statistics collector tools, most tools/extensions only monitor successful queries. But in many cases, monitoring ERROR, WARNING, and LOG messages gives meaningful information for debugging an issue. pg_stat_monitor not only monitors ERROR/WARNING/LOG messages but also collects statistics about these queries. In PostgreSQL, a query that raises an ERROR or WARNING carries an error level (elevel), SQL code (sqlcode), and an error message; pg_stat_monitor collects all of this information along with its aggregates.
SELECT substr(query,0,50) AS query, decode_error_level(elevel) AS elevel,sqlcode, calls, substr(message,0,50) message 
FROM pg_stat_monitor;
                       query                       | elevel | sqlcode | calls |                      message                      
---------------------------------------------------+--------+---------+-------+---------------------------------------------------
 select substr(query,$1,$2) as query, decode_error |        |       0 |     1 | 
 select bucket,substr(query,$1,$2),decode_error_le |        |       0 |     3 | 
 select 1/0;                                       | ERROR  |     130 |     1 | division by zero

We’ve Come a Long Way

What started as a concept is now nearing its final form. The pg_stat_monitor extension has evolved and become very feature-rich. We have no doubt about its usefulness for DBAs, performance engineers, application developers, and anyone who needs to look at query performance. We believe it can save many hours and help identify unexpected query behaviors.

pg_stat_monitor is available on GitHub. We are releasing it to get feedback from the community on what we’re doing right and what we should do differently before we release pg_stat_monitor as a generally available version to be supported for years to come. Please check it out, drop us a note, file an issue, or make a pull request!

Try Percona Distribution for PostgreSQL today; it’s free to download and use! 
