Nov
29
2021
--

PostgreSQL 14 Database Monitoring and Logging Enhancements

PostgreSQL-14 Database Monitoring and Logging Enhancements

PostgreSQL-14 was released in September 2021, and it contained many performance improvements and feature enhancements, including some features from a monitoring perspective. As we know, monitoring is the key element of any database management system, and PostgreSQL keeps updating and enhancing the monitoring capabilities. Here are some key ones in PostgreSQL-14.

Query Identifier

Query identifier is used to identify the query, which can be cross-referenced between extensions. Prior to PostgreSQL-14, extensions used an algorithm to calculate the query_id. Usually, the same algorithm is used to calculate the query_id, but any extension can use its own algorithm. Now, PostgreSQL-14 optionally provides a query_id to be computed in the core. Now PostgreSQL-14’s monitoring extensions and utilities like pg_stat_activity, explain, and in pg_stat_statments use this query_id instead of calculating its own. This query_id can be seen in csvlog, after specifying in the log_line_prefix. From a user perspective, there are two benefits of this feature.

  • All the utilities/extensions will use the same query_id calculated by core, which provides an ease to cross-reference this query_id. Previously, all the utilities/extensions needed to use the same algorithm in their code to achieve this capability.
  • The second benefit is extension/utilities can use calculated query_id and don’t need to again, which is a performance benefit.

PostgreSQL introduces a new GUC configuration parameter compute_query_id to enable/disable this feature. The default is auto; this can be turned on/off in postgresql.conf file, or using the SET command.

  • pg_stat_activity

SET compute_query_id = off;

SELECT datname, query, query_id FROM pg_stat_activity;
 datname  |                                 query                                 | query_id 
----------+-----------------------------------------------------------------------+----------
 postgres | select datname, query, query_id from pg_stat_activity;                |         
 postgres | UPDATE pgbench_branches SET bbalance = bbalance + 2361 WHERE bid = 1; |

SET compute_query_id = on;

SELECT datname, query, query_id FROM pg_stat_activity;
 datname  |                                 query                                 |      query_id       
----------+-----------------------------------------------------------------------+---------------------
 postgres | select datname, query, query_id from pg_stat_activity;                |  846165942585941982
 postgres | UPDATE pgbench_tellers SET tbalance = tbalance + 3001 WHERE tid = 44; | 3354982309855590749

  • Log

In the previous versions, there was no mechanism to compute the query_id in the server core. The query_id is especially useful in the log files. To enable that, we need to configure the log_line_prefix configuration parameter. The “%Q” option is added to show the query_id; here is the example.

log_line_prefix = 'query_id = [%Q] -> '

query_id = [0] -> LOG:  statement: CREATE PROCEDURE ptestx(OUT a int) LANGUAGE SQL AS $$ INSERT INTO cp_test VALUES (1, 'a') $$;
query_id = [-6788509697256188685] -> ERROR:  return type mismatch in function declared to return record
query_id = [-6788509697256188685] -> DETAIL:  Function's final statement must be SELECT or INSERT/UPDATE/DELETE RETURNING.
query_id = [-6788509697256188685] -> CONTEXT:  SQL function "ptestx"
query_id = [-6788509697256188685] -> STATEMENT:  CREATE PROCEDURE ptestx(OUT a int) LANGUAGE SQL AS $$ INSERT INTO cp_test VALUES (1, 'a') $$;

  • Explain

The EXPLAIN VERBOSE will show the query_id if compute_query_id is true.

SET compute_query_id = off;

EXPLAIN VERBOSE SELECT * FROM foo;
                          QUERY PLAN                          
--------------------------------------------------------------

 Seq Scan on public.foo  (cost=0.00..15.01 rows=1001 width=4)
   Output: a
(2 rows)

SET compute_query_id = on;

EXPLAIN VERBOSE SELECT * FROM foo;
                          QUERY PLAN                          
--------------------------------------------------------------
 Seq Scan on public.foo  (cost=0.00..15.01 rows=1001 width=4)
   Output: a
 Query Identifier: 3480779799680626233
(3 rows)

autovacuum and auto-analyze Logging Enhancements

PostgreSQL-14 improves the logging of auto-vacuum and auto-analyze. Now we can see the I/O timings in the log, showing how much has been spent reading and writing.

automatic vacuum of table "postgres.pg_catalog.pg_depend": index scans: 1
pages: 0 removed, 67 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 89 removed, 8873 remain, 0 are dead but not yet removable, oldest xmin: 210871
index scan needed: 2 pages from table (2.99% of total) had 341 dead item identifiers removed
index "pg_depend_depender_index": pages: 39 in total, 0 newly deleted, 0 currently deleted, 0 reusable
index "pg_depend_reference_index": pages: 41 in total, 0 newly deleted, 0 currently deleted, 0 reusable

I/O timings: read: 44.254 ms, write: 0.531 ms

avg read rate: 13.191 MB/s, avg write rate: 8.794 MB/s
buffer usage: 167 hits, 126 misses, 84 dirtied
WAL usage: 85 records, 15 full page images, 78064 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.07 s

These logs are only available if track_io_timing is enabled.

Connecting Logging

PostgreSQL already logs the connection/disconnection if log_connections/log_disconnections is on. Therefore, PostgreSQL-14 now also logs the actual username supplied by the user. In case some external authentication is used, and mapping is defined in pg_ident.conf, it will become hard to identify the actual user name. Before PostgreSQL-14, you only see the mapped user instead of the actual user.

pg_ident.conf

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME

pg              vagrant                 postgres

pg_hba.conf

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     peer map=pg

Before PostgreSQL-14

LOG:  database system was shut down at 2021-11-19 11:24:30 UTC
LOG:  database system is ready to accept connections
LOG:  connection received: host=[local]
LOG:  connection authorized: user=postgres database=postgres application_name=psql

PostgreSQL-14

LOG:  database system is ready to accept connections
LOG:  connection received: host=[local]
LOG:  connection authenticated: identity="vagrant" method=peer (/usr/local/pgsql.14/bin/data/pg_hba.conf:89)
LOG:  connection authorized: user=postgres database=postgres application_name=psql

Conclusion

Every major PostgreSQL release carries significant enhancements, and PostgreSQL-14 was no different.

Monitoring is a key feature of any DBMS system, and PostgreSQL keeps upgrading its capabilities to improve its logging and monitoring capabilities. With these newly added features, you have more insights into connections; one can easily track queries and observe performance, and identify how much time is being spent by the vacuum process in read/write operations. This can significantly benefit you in configuring vacuum parameters better.


As more companies look at migrating away from Oracle or implementing new databases alongside their applications, PostgreSQL is often the best option for those who want to run on open source databases.

Read Our New White Paper:

Why Customers Choose Percona for PostgreSQL

Aug
23
2021
--

PostgreSQL Database Security: OS – Authentication

PostgreSQL Database Security OS - Authentication

Security is everybody’s concern when talking about data and information, and therefore it becomes the main foundation of every database. Security means protecting your data from unauthorized access. That means only authorized users can log in to a system called authentication; a user can only do what they are authorized to do (authorization) and log the user activity (accounting). I have explained these in my main security post, PostgreSQL Database Security: What You Need To Know.

When we are talking about security, authentication is the first line of defense. PostgreSQL provides various methods of authentication, which are categorized into three categories.

In most cases, PostgreSQL is configured to be used with internal authentication. Therefore I have discussed all internal authentication in the previous blog post I mentioned above. In this blog, we will discuss the operating system-based authentication methods for PostgreSQL. There are three methods to do OS-Based authentication.

Ident

Ident authentication only supports TCP/IP connections. Its ident server provides a mechanism to map the client’s operating system username onto the database username. It also has the option for username mapping.

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            ident
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the

$ psql postgres -h 127.0.0.1 -U postgres
psql: error: connection to server at "127.0.0.1", port 5432 failed: FATAL:  Ident authentication failed for user "postgres"

If no ident server is installed, you will need to install the ident2 on your ubuntu box or oidentd on CentOS 7. Once you have downloaded and configured the ident server, it is now time to configure PostgreSQL. It starts with creating a user map in “pg_ident.conf” file.

# Put your actual configuration here
# ----------------------------------
# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
PG_USER         vagrant                 postgres

Here we have mapped our system user “vagrant” user with PostgreSQL’s “postgres.” Time to login using the user vagrant.

$ psql postgres -h 127.0.0.1 -U postgres
psql (15devel)
Type "help" for help.

postgres=#

Note: The Identification Protocol is not intended as an authorization or access control protocol.

PAM (Pluggable Authentication Modules)

PAM (Pluggable Authentication Modules) authentication works similarly to “passwords.” You’d have to create a PAM service file that should enable PAM-based authentication. The service name should be set to “PostgreSQL.”

Once the service is created, PAM can now validate user name/password pairs and optionally the connected remote hostname or IP address. The user must already exist in the database for PAM authentication to work.

$ psql postgres -h 127.0.0.1 -U postgres
Password for user postgres: 
2021-08-11 13:16:38.332 UTC [13828] LOG:  pam_authenticate failed: Authentication failure
2021-08-11 13:16:38.332 UTC [13828] FATAL:  PAM authentication failed for user "postgres"
2021-08-11 13:16:38.332 UTC [13828] DETAIL:  Connection matched pg_hba.conf line 91: "host    all             all             127.0.0.1/32            pam"
psql: error: connection to server at "127.0.0.1", port 5432 failed: FATAL:  PAM authentication failed for user "postgres"

Ensure that the PostgreSQL server supports PAM authentication. It is a compile-time option that must be set when the server binaries were built. You can check if your PostgreSQL server supports PAM authentication using the following command.

$ pg_config | grep with-pam

CONFIGURE =  '--enable-tap-tests' '--enable-cassert' '--prefix=/usr/local/pgsql/' '--with-pam'

In case there is no PAM server file for PostgreSQL under /etc/pam.d, you’d have to create it manually. You may choose any name for the file; however, I prefer to name it “postgresql.”

$ /etc/pam.d/PostgreSQL

@include common-auth
@include common-account
@include common-session
@include common-password

Since the PostgreSQL user cannot read the password files, install sssd (SSSD – System Security Services Daemon) to bypass this limitation.

sudo apt-get install sssd

Add postgresql to the “ad_gpo_map_remote_interactive” to the “/etc/sssd/sssd.conf”

$ cat /etc/sssd/sssd.conf
ad_gpo_map_remote_interactive = +postgresql

Start sssd service, and check the status that it has properly started.

$ sudo systemctl start sssd

$ sudo systemctl status sssd

sssd.service - System Security Services Daemon

     Loaded: loaded (/lib/systemd/system/sssd.service; enabled; vendor preset: enabled)

     Active: active (running) since Wed 2021-08-11 16:18:41 UTC; 12min ago

   Main PID: 1393 (sssd)

      Tasks: 2 (limit: 1071)

     Memory: 5.7M

     CGroup: /system.slice/sssd.service

             ??1393 /usr/sbin/sssd -i --logger=files

             ??1394 /usr/libexec/sssd/sssd_be --domain shadowutils --uid 0 --gid 0 --logger=files

Time now to configure pg_hba.conf to use the PAM authentication. We need to specify the PAM service name (pamservice) as part of authentication options. This should be the same as the file you have created in the /etc/pam.d folder, which in my case is postgresql.

# "local" is for Unix domain socket connections only

local   all             all                                     trust

# IPv4 local connections:

host    all             all             127.0.0.1/32            pam pamservice=postgresql

# IPv6 local connections:

host    all             all             ::1/128                 trust

# Allow replication connections from localhost, by a user with the

We must now reload (or restart) the PostgreSQL server. After this, you can try to login into the PostgreSQL server.

vagrant@ubuntu-focal:~$ psql postgres -h 127.0.0.1 -U postgres
psql (15devel)
Type "help" for help.
postgres=>

Note

If PAM is set up to read /etc/shadow, authentication will fail because the PostgreSQL server is started by a non-root user. However, this is not an issue when PAM is configured to use LDAP or other authentication methods.

Peer

Peer authentication is “ident”ical; i.e., Very much like the ident authentication! The only subtle differences are there are no ident servers, and this method works on local connections rather than over TCP/IP.

The peer authentication provides a mechanism to map the client’s operating system username onto the database username. It also has the option for username mapping.  The configuration is very similar to how we configured for ident authentication except that the authentication method is specified as “peer” instead of “ident.”

$ cat $PGDATA/pg_hba.conf

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                   peer map=PG_USER
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust

$ psql postgres -U postgres

2021-08-12 10:51:11.855 UTC [1976] LOG:  no match in usermap "PG_USER" for user "postgres" authenticated as "vagrant"

2021-08-12 10:51:11.855 UTC [1976] FATAL:  Peer authentication failed for user "postgres"

2021-08-12 10:51:11.855 UTC [1976] DETAIL:  Connection matched pg_hba.conf line 89: "local   all             all                                     peer map=PG_USER"

psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:  Peer authentication failed for user "postgres"

$PGDATA/pg_hba.conf configuration will look something like this:

$ cat $PGDATA/pg_hba.conf
# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only

local   all             all                                     peer map=PG_USER
# IPv4 local connections:

$PGDATA/pg_ident.conf

# Put your actual configuration here
# ----------------------------------
# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
PG_USER         vagrant                postgres

vagrant@ubuntu-focal:~$ psql postgres -U postgres
psql (15devel)
Type "help" for help.
postgres=>

Conclusion

We’ve covered several different authentication methods in this blog. These basic authentication methods involve the PostgreSQL server, kernel, and the ident server; options are available natively without any major external dependencies. It is, however, important that the database is secured properly to prevent unauthorized access to the data.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Jul
30
2021
--

Improve PostgreSQL Query Performance Insights with pg_stat_monitor

Improve PostgreSQL Query Performance pg_stat_monitor

Understanding query performance patterns is essentially the foundation for query performance tuning. It, in many ways, dictates how a database cluster evolves. And then there are obviously direct and indirect cost connotations as well.

PostgreSQL provides very detailed statistics through a number of catalog views and extensions that can be easily added to provide more detailed query statistics. With each view focused on a particular aspect, the picture almost always needs to be stitched together by combining different datasets. That requires effort and still, the whole picture might not be complete.

The pg_stat_monitor extension attempts to provide a more holistic picture by providing much-needed query performance insights in a single view. The extension has been evolving over the past year and is now nearing the GA release.

Some Useful Extensions

Currently, you may be relying on a number of extensions to understand how a query behaves, the time taken in planning and execution phases, min/max/meantime values, index hits, query plan, and client application details. Here are some extensions that you might already be very familiar with.

pg_stat_activity

This view is available by default with PostgreSQL. It provides one row per server process along with current activity and query text.

In case you’d like to learn more about it, hop over to the official PostgreSQL documentation here.

pg_stat_statements

This extension is part of the contrib packages provided with the PostgreSQL server. However, you’d have to create the extension manually. It’s a query-wise aggregation of statistical data with min/max/mean/standard deviation for execution and planning times and various useful information and query text.

You can read more about pg_stat_statements at the official PostgreSQL documentation site.

auto_explain

Another useful extension is provided by the PostgreSQL server. It dumps query plans in the server log for any query exceeding a time threshold specified by a GUC

(Grand Unified Configuration).

You can find more about auto_explain here.

pg_stat_monitor

Whilst all previously mentioned views/extensions are great in their own right, one needs to manually combine client/connection information from pg_stat_activity, statistical data from pg_stat_statements, and query plan from auto_analyze to complete the dataset to understand query performance patterns

And that’s precisely the pain that pg_stat_monitor alleviates.

The feature set has been growing over the past year, with it providing, in a single view, all performance-related information that you may need to debug a low performant query. For more information about the extension see our GitHub repository, or for user-specific documentation, see our user guide.

Feature Set

Some features that were part of earlier releases are already discussed in this blog, however, for completeness, I’m going to discuss those here as well.

  • Time Interval Grouping: Instead of supplying one set of ever-increasing counts, pg_stat_monitor computes stats for a configured number of time intervals; time buckets. This allows for much better data accuracy, especially in the case of high resolution or unreliable networks.
  • Multi-Dimensional Grouping:  While pg_stat_statements groups counters by (userid, dbid, queryid),  pg_stat_monitor uses a more detailed group for higher precision:
    • Bucket ID (bucket),
    • User ID (userid),
    • Database ID (dbid),
    • Query ID (queryid),
    • Client IP Address (client_ip),
    • Plan ID (planid),
    • Application Name (application_name).

This allows you to drill down into the performance of queries coming from particular client addresses and applications, which we at Percona have found to be very valuable in a number of cases.

  • Capture Actual Parameters in the Queries: pg_stat_monitor allows you to choose if you want to see queries with placeholders for parameters or actual query examples.
  • Query Plan: Each SQL is now accompanied by its actual plan that was constructed for its execution. Also, we found having query parameter values is very helpful, as you can run EXPLAIN on it, or easily play with modifying the query to make it run better, as well as making communication about the query clearer when discussing with other DBAs and application developers.
  • Tables Access Statistics for a Statement: This allows us to easily identify all queries that accessed a given table. This set is at par with the information provided by the pg_stat_statements.
  • Histogram: Visual representation is very helpful when it can help identify issues. With the help of the histogram function, you can now view a timing/calling data histogram in response to a SQL query. And yes, it even works in psql.
SELECT * FROM histogram(0, 'F44CD1B4B33A47AF') AS a(range TEXT, freq INT, bar TEXT);
       range        | freq |              bar
--------------------+------+--------------------------------
  (0 - 3)}          |    2 | ??????????????????????????????
  (3 - 10)}         |    0 |
  (10 - 31)}        |    1 | ???????????????
  (31 - 100)}       |    0 |
  (100 - 316)}      |    0 |
  (316 - 1000)}     |    0 |
  (1000 - 3162)}    |    0 |
  (3162 - 10000)}   |    0 |
  (10000 - 31622)}  |    0 |
  (31622 - 100000)} |    0 |
(10 rows)

  • Functions: It may come as a surprise, but we do understand that functions may internally execute statements!!! To help ease the tracking and analysis, pg_stat_monitor now provides a column that specifically helps keep track of the top query for a statement so that you can backtrack to the originating function.
  • Relation Names: Relations used in a query are available in the “relations” column in the pg_stat_monitor view. This reduces work at your and makes analysis simpler and quicker.
  • Query Types: With query classification as SELECT, INSERT, UPDATE or DELETE, analysis becomes simpler. It’s another effort reduced at your end, and another simplification by pg_stat_monitor.
SELECT bucket, substr(query,0, 50) AS query, cmd_type FROM pg_stat_monitor WHERE elevel = 0;
 bucket |                       query                       | cmd_type 
--------+---------------------------------------------------+----------
      4 | END                                               | 
      4 | SELECT abalance FROM pgbench_accounts WHERE aid = | SELECT
      4 | vacuum pgbench_branches                           | 
      4 | select count(*) from pgbench_branches             | SELECT
      4 | UPDATE pgbench_accounts SET abalance = abalance + | UPDATE
      4 | truncate pgbench_history                          | 
      4 | INSERT INTO pgbench_history (tid, bid, aid, delta | INSERT

  • Query Metadata: Google’s Sqlcommenter is a useful tool that in a way bridges that gap between ORM libraries and understanding database performance. And we support it. So, you can now put any key value data in the comments in /* … */ syntax (see Sqlcommenter documentation for details) in your SQL statements, and the information will be parsed by pg_stat_monitor and made available in the comments column in pg_stat_monitor view.
CREATE EXTENSION hstore;
CREATE FUNCTION text_to_hstore(s text) RETURNS hstore AS $$
BEGIN
    RETURN hstore(s::text[]);
EXCEPTION WHEN OTHERS THEN
    RETURN NULL;
END; $$ LANGUAGE plpgsql STRICT;


SELECT 1 AS num /* { "application", java_app, "real_ip", 192.168.1.1} */;
 num 
-----
   1
(1 row)

SELECT query, text_to_hstore(comments)->'real_ip' AS real_ip from pg_stat_monitor;
query                                                                       |  real_ip 
----------------------------------------------------------------------------+-------------
 SELECT $1 AS num /* { "application", psql_app, "real_ip", 192.168.1.3) */  | 192.168.1.1

  • Logging Error and Warning: As seen in different monitoring/statics collector tools, most of the tools/extensions only monitor the successful queries. But in many cases, monitoring ERROR, WARNING, and LOG give meaningful information to debug the issue. pg_stat_monitor not only monitors the ERROR/WARNINGS/LOG but also collects the statistics about these queries. In PostgreSQL queries with ERROR/WARNING there is an error level (elevel), SQL Code (sqlcode), and an error message is attached. Pg_stat_monitor collects all this information along with its aggregates.
SELECT substr(query,0,50) AS query, decode_error_level(elevel) AS elevel,sqlcode, calls, substr(message,0,50) message 
FROM pg_stat_monitor;
                       query                       | elevel | sqlcode | calls |                      message                      
---------------------------------------------------+--------+---------+-------+---------------------------------------------------
 select substr(query,$1,$2) as query, decode_error |        |       0 |     1 | 
 select bucket,substr(query,$1,$2),decode_error_le |        |       0 |     3 | 
 select 1/0;                                       | ERROR  |     130 |     1 | division by zero

We’ve Come a Long Way

What started as a concept is now nearing its final approach. The pg_stat_monitor extension has evolved and has become very feature-rich. We have no doubt about its usefulness for DBAs, performance engineers, application developers, and anyone who needs to look at query performance. We believe it can help save many hours and help identify unexpected query behaviors. 

pg_stat_monitor is available on Github. We are releasing it to get feedback from the community on what we’re doing right and what we should do differently before we release pg_stat_monitor as a generally available version to be supported for years to come. Please check it out,  drop us a note, file an issue, or make a pull request!

Try Percona Distribution for PostgreSQL today; it’s free to download and use! 

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com