Oct
09
2023
--

PostgreSQL Tuple-Level Statistics With pgstattuple

PostgreSQL Tuple-Level Statistics With pgstattuple

Since Postgres table bloat degrades database performance, we can improve its performance by removing the table bloat. We can use the pgstattuple extension to identify the bloated tables.

This extension provides several functions for obtaining tuple-level statistics. Because the pgstattuple functions produce extensive page-level information, access to them is, by default, limited. Only the pg_stat_scan_tables role has the ability to EXECUTE the pgstattuple functions by default. The pgstattuple functions are not restricted to the superuser.

Using the pgstattuple function, we can list the tables with a high percentage of dead tuples and run a manual VACUUM to reclaim the space occupied by the dead tuples.

In this blog, we will discuss the pgstattuple extension, which provides various functions to obtain tuple-level statistics.

First, create the pgstattuple extension. The latest version of pgstattuple is 1.5.

postgres@ip-172-31-46-212:~$ psql
psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1))
Type "help" for help.
postgres=# select * from pg_available_extensions where name='pgstattuple';
    name     | default_version | installed_version |           comment
-------------+-----------------+-------------------+-----------------------------
 pgstattuple | 1.5             |                   | show tuple-level statistics
(1 row)
postgres=#
postgres=# create extension pgstattuple;
CREATE EXTENSION
postgres=#
postgres=# select * from pg_available_extensions where name='pgstattuple';
    name     | default_version | installed_version |           comment
-------------+-----------------+-------------------+-----------------------------
 pgstattuple | 1.5             | 1.5               | show tuple-level statistics
(1 row)
postgres=#

By default, only the superuser has access to the pgstattuple functions; however, you can grant non-superuser access by granting the pg_stat_scan_tables role to the non-superuser.

postgres=# grant pg_stat_scan_tables to <nonsuperuser>;
GRANT ROLE

postgres=> du <nonsuperuser>
                 List of roles
 Role name | Attributes |       Member of
———–+————+———————–
 <nonsuperuser>        |            | {pg_stat_scan_tables}
postgres=>

Next, before using the pgstattuple functions, let’s create a table and index for demonstration.

postgres=# create table workshop (jobno int);
CREATE TABLE

postgres=# insert into workshop values (generate_series(1,80000));
INSERT 0 80000

postgres=# create index workshop_index on workshop (jobno);
CREATE INDEX
postgres=#

pgstattuple functions

pgstattuple(regclass): This function returns a relation’s physical length, percentage of “dead” tuples, and other info. This may help users to determine whether a vacuum is necessary or not. The argument is the target relation’s name (optionally schema-qualified) or OID. 

For example:

SELECT * FROM pgstattuple('pg_catalog.pg_proc');

SELECT * FROM pgstattuple('workshop');

pgstattuple only obtains a read lock on the relation. As a result, the pgstattuple output does not represent an instantaneous snapshot. The concurrent updates will change the output of the pgstattuple.

The example below shows the tuple statistics of the workshop table.

postgres=# SELECT * FROM pgstattuple('workshop');

-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 80000
tuple_len          | 2240000
tuple_percent      | 77.24
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 10056
free_percent       | 0.35

Below are the pgstattuple output columns and their description.

Column Type Description
table_len bigint Physical relation length in bytes
tuple_count bigint Number of live tuples
tuple_len bigint Total length of live tuples in bytes
tuple_percent float8 Percentage of live tuples
dead_tuple_count bigint Number of dead tuples
dead_tuple_len bigint Total length of dead tuples in bytes
dead_tuple_percent float8 Percentage of dead tuples
free_space bigint Total free space in bytes
free_percent float8 Percentage of free space

 

Let’s check the tuple statistics when we delete or update the rows. Below are the current tuple statistics for the reference.

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 80000
tuple_len          | 2240000
tuple_percent      | 77.24
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 10056
free_percent       | 0.35

Delete the few ROWS from the table.

postgres=# DELETE FROM workshop WHERE jobno % 8 = 0;
DELETE 10000
postgres=#

In the below output, the dead_tuple_count shows that Postgres marked the rows as deleted but did not remove them from the table, as the length of the table is the same after the deletion of the rows.

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 70000
tuple_len          | 1960000
tuple_percent      | 67.59
dead_tuple_count   | 10000
dead_tuple_len     | 280000
dead_tuple_percent | 9.66
free_space         | 10056
free_percent       | 0.35

Now execute the VACUUM on the table; after running the plain VACUUM, we see that the:

  1. free_space has increased. Vacuum has reclaimed the space and made it available for re-use at the database level.
  2. Dead rows have been removed. 
  3. The table_len is the same as O.S. level space is not reclaimed.
postgres=# vacuum workshop;
VACUUM

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 70000
tuple_len          | 1960000
tuple_percent      | 67.59
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 330412
free_percent       | 11.39

After running the VACUUM FULL, we can see that table_len has decreased. It shows that O.S. level space is reclaimed after running the VACUUM FULL.

postgres=# vacuum full workshop;
VACUUM

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2539520
tuple_count        | 70000
tuple_len          | 1960000
tuple_percent      | 77.18
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 10840
free_percent       | 0.43
postgres=#

pgstattuple queries to check table bloat

We can list the tables where the dead tuple percentage is high by using the query below.

postgres=#select relname,(pgstattuple(oid)).dead_tuple_percent from pg_class where relkind = 'r' order by dead_tuple_percent desc;
 relname                  | dead_tuple_percent
--------------------------+--------------------
 pg_init_privs            |               2.56
 pg_class                 |               1.69
 pg_statistic             |               1.45
 pg_authid                |               1.37
 pg_extension             |               1.28
 pg_type                  |               1.17
 pg_namespace             |               1.17

The below query will show you the tuple statistics of the tables in detail where the dead tuple percentage is high.

SELECT relname,oid,relowner,(pgstattuple(oid)).dead_tuple_percent ,(pgstattuple(oid)).* FROM pg_class WHERE relkind = 'r' order by dead_tuple_percent desc;

-[ RECORD 1 ]------+-------------------------
relname            | pg_init_privs
oid                | 3394
relowner           | 10
dead_tuple_percent | 2.56
table_len          | 24576
tuple_count        | 222
tuple_len          | 17316
tuple_percent      | 70.46
dead_tuple_count   | 9
dead_tuple_len     | 630
dead_tuple_percent | 2.56
free_space         | 4568
free_percent       | 18.59

-[ RECORD 2 ]------+-------------------------
relname            | pg_class
oid                | 1259
relowner           | 10
dead_tuple_percent | 1.69
table_len          | 114688
tuple_count        | 415
tuple_len          | 80473
tuple_percent      | 70.17
dead_tuple_count   | 11
dead_tuple_len     | 1937
dead_tuple_percent | 1.69
free_space         | 26976
free_percent       | 23.52

pgstatindex(regclass): This function returns a record showing information about a B-tree index. 

For example:

postgres=# SELECT * FROM pgstatindex ('workshop_index');

-[ RECORD 1 ]------+--------
version            | 4
tree_level         | 1
index_size         | 1589248
root_block_no      | 3
internal_pages     | 1
leaf_pages         | 192
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 89.74
leaf_fragmentation | 0

The output columns and their description.

Column Type Description
version integer B-tree version number
tree_level integer Tree level of the root page
index_size bigint Total index size in bytes
root_block_no bigint Location of root page (zero if none)
internal_pages bigint Number of “internal” (upper-level) pages
leaf_pages bigint Number of leaf pages
empty_pages bigint Number of empty pages
deleted_pages bigint Number of deleted pages

 

pgstatginindex(regclass): This function returns a record showing information about a GIN index.

For example:

SELECT * FROM pgstatginindex('table_gin_index');

pgstathashindex(regclass): This function returns a record showing information about a HASH index. 

For example:

select * from pgstathashindex('table_hash_index');

In this blog, we have seen how the pgstattuple can be used for tuple-level analysis and to find bloated tables or vacuuming candidates. https://www.postgresql.org/docs/current/pgstattuple.html has some additional information on the extension.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Apr
05
2019
--

Writing PostgreSQL Extensions is Fun – C Language

postgresql extensions

PostgreSQL is a powerful open source relational database management system. It extends the SQL language with additional features. A DBMS is not only defined by its performance and out of the box features, but also its ability to support bespoke/additional user-specific functionality. Some of these functionalities may be in the form of database constructs or modules, like stored procedures or functions, but their scope is generally limited to the functionality being exposed by the DBMS. For instance, how will you write a custom query-analyzing application that resides within your DBMS?

To support such options, PostgreSQL provides a pluggable architecture that allows you to install extensions. Extensions may consist of a configuration (control) file, a combination of SQL files, and dynamically loadable libraries.

This means you can write your own code as per the defined guidelines of an extension and plug it in a PostgreSQL instance without changing the actual PostgreSQL code tree. An extension by very definition extends what PostgreSQL can do, but more than that, it gives you the ability to interact with external entities. These external entities can be other database management systems like ClickHouse, Mongo or HDFs (normally these are called foreign data wrappers), or other interpreters or compilers (thus allowing us to write database functions in another language like Java, Python, Perl or TCL, etc.). Another potential use case of an extension can be for code obfuscation which allows you to protect your super secret code from prying eyes.

Build your own

To build your own extension, you don’t need a complete PostgreSQL code base. You can build and install an extension using installed PostgreSQL (it may require you to install a devel RPM or Debian package). Details about extensions can be found in PostgreSQL’s official documentation[1]. There many extensions available for different features in the contrib directory of PostgreSQL source. Other than the contrib directory, people are also writing extensions readily available on the internet but currently not part of the PostgreSQL source tree. The pg_stat_statements, PL/pgSQL, and PostGIS are examples of the best known or most widely used extensions.

Generally available PostgreSQL extensions may be classified into four main categories:

  • Add support of a new language extension (PL/pgSQL, PL/Python, and PL/Java)
  • Data type extensions where you can introduce new ((Hstore, cube, and hstore)
  • Miscellaneous extensions (contrib folder has many miscellaneous extensions)
  • Foreign Data Wrapper extension (postgres_fdw, mysqldb_fdw, clickhousedb_fdw)

There are four basic file types that are required for building an extension:

  • Makefile: Which uses PGXS PostgreSQL’s build infrastructure for extensions.
  • Control File: Carries information about the extension.
  • SQL File(s): If the extension has any SQL code, it may reside in form SQL files (optional)
  • C Code: The shared object that we want to build (optional).

Extension Makefile

To compile the C code, we need a makefile. It’s a very simple makefile with the exception of “PGXS”, which is PostgreSQL’s infrastructural makefile for creating extensions. The inclusion of “PGXS” is done by invoking pg_config binary with “–pgxs” flag. The detail of that file can be found at GitHub[2].

This is a sample makefile, which can be used to compile the C code.

EXTENSION = log
MODULE_big = log
DATA = log--0.0.1.sql
OBJS = log.o
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

Extension Control File

This file must be named as [EXTENSION NAME]. control file. The control file can contain many options which can be found at official documentation [3]. But in this example, I have used some basic options.

comments: Comments about the extension.

default_version: This is the extension SQL version. The name of the SQL file contains this information in the file name.

relocatable:  Tells PostgreSQL if it is possible to move contained objects into a different schema.

module_pathname:  This information is replaced with the actual lib file path.

comment = 'PostgreSQL Utility Command Logger
'default_version = '0.0.1'
relocatable = true
module_pathname = '$libdir/log'

The contents of the file can be seen using the psql command \dx psql.

postgres=# \dx log      
List of installed extensions Name   | Version | Schema |       Description
------------------------------------+---------+--------+----------------------------------
log                                 | 0.0.1   | public | PostgreSQL Utility Command Logger

Extension SQL File

This is a mapping file, which I used to map the PostgreSQL function with the corresponding C function. Whenever you call the SQL function then the corresponding C function is called. The name of the file must be [EXTENSION NAME]–[default-version].sql. This is the same default_version as defined in the control file.

CREATE FUNCTION pg_all_queries(OUT query TEXT, pid OUT TEXT)
RETURNS SETOF RECORDAS 'MODULE_PATHNAME',
'pg_all_queries'
LANGUAGE C STRICT VOLATILE;

Extension C Code

There are three kinds of functions you can write in c code.

The first is where you call your c code function using SQL function written in SQL file of the extension.

The second type of function is callbacks. You register that callback by assigning the pointer of the function. There is no need for an SQL function here. You call this function automatically when a specific event occurs.

The third type of function is called automatically, even without registering. These functions are called on events such as extension load/unload time etc.

This is the C file containing the definition of the C code. There is no restriction for the name, or of the number of C files.

#include "postgres.h"
/* OS Includes */
/* PostgreSQL Includes */
PG_MODULE_MAGIC;
void _PG_init(void);
void _PG_fini(void);
Datum pg_all_queries(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(pg_all_queries);
static void process_utility(PlannedStmt *pstmt, const char *queryString,ProcessUtilityContext context,ParamListInfo params,QueryEnvironment *queryEnv,DestReceiver *dest,       char *completionTag);

You need to include postgres.h for the extension. You can include other PostgreSQL as needed. PG_MODULE_MAGIC is macro which you need to include in the C file for the extension. _PG_init and _PG_fini are functions that are called when the extension is loaded or unloaded, respectively.

Here is an example of the load and unload functions of an extension.

void _PG_init(void)
{
    /* ... C code here at time of extension loading ... */
    ProcessUtility_hook = process_utility;
}
Void _PG_fini(void)
{
    /* ... C code here at time of extension unloading ... */
}

Here is an example of a callback function that you can invoke whenever you call a utility statement e.g. any DDL statement. The “queryString” variable contains the actual query text.

static void process_utility(PlannedStmt *pstmt,
                           const char *queryString,
                           ProcessUtilityContext context,
                           ParamListInfo params,
                           QueryEnvironment *queryEnv,DestReceiver *dest,
                           char *completionTag)
{
    /* ... C code here ... */
    standard_ProcessUtility(pstmt,  &nbsp;
                            queryString,  &nbsp;
                            context,  &nbsp;
                            params,  &nbsp;
                            queryEnv,  &nbsp;
                            dest,
                            completionTag);
    /* ... C code here ... */
}

Finally, here’s an example of a C function that is invoked through a user-defined SQL function. This internally calls the C function contained in our shared object.

Datum pg_all_queries(PG_FUNCTION_ARGS)
{
    /* ... C code here ... */
    tupstore = tuplestore_begin_heap(true, false, work_mem);
    /* ... C code here ... */     
    values[0] = CStringGetTextDatum(query);
    values[1] = CStringGetTextDatum(pid);
    /* ... C code here ... */
    tuplestore_donestoring(tupstore);
    return (Datum) 0;
}

Compile and Install

Before compilation, you need to set the PATH for PostgreSQL’s bin directory if there is no pg_config available in the system paths.

export PATH=/usr/local/pgsql/bin:$PATH

make USE_PGXS=1

make USE_PGXS=1 install

Output

We can now use our extension through a simple SQL query. Following is the output that is coming straight out of extension written in C programming language.

postgres=# select * from pg_all_queries();          
query                      | pid
--------------------------+|------
create table foo(a int);  +| 8196
create table bar(a int);  +| 8196
drop table foo;           +| 8196
(3 rows)

I hope this example can serve as a starting point for you to create more useful extensions that will not only help you and your company, but will also give you an opportunity to share and help the PostgreSQL community grow.

The complete example can be found at Github[4].

[1]: https://www.postgresql.org/docs/current/external-extensions.html

[2]: https://github.com/postgres/postgres/blob/master/src/makefiles/pgxs.mk

[3]: https://www.postgresql.org/docs/9.1/extend-extensions.html

[4]: https://github.com/ibrarahmad/Blog-Examples/tree/master/log


Photo from Pexels

Oct
05
2018
--

PostgreSQL Extensions for an Enterprise-Grade System

PostgreSQL extensions for logging

PostgreSQL® logoIn this current series of blog posts we have been discussing various relevant aspects when building an enterprise-grade PostgreSQL setup, such as security, back up strategy, high availability, and different methods to scale PostgreSQL. In this blog post, we’ll get to review some of the most popular open source extensions for PostgreSQL, used to expand its capabilities and address specific needs. We’ll cover some of them during a demo in our upcoming webinar on October 10.

Expanding with PostgreSQL Extensions

PostgreSQL is one of the world’s most feature-rich and advanced open source RDBMSs. Its features are not just limited to those released by the community through major/minor releases. There are hundreds of additional features developed using the extensions capabilities in PostgreSQL, which can cater to needs of specific users. Some of these extensions are very popular and useful to build an enterprise-grade PostgreSQL environment. We previously blogged about a couple of FDW extensions (mysql_fdw and postgres_fdw ) which will allow PostgreSQL databases to talk to remote homogeneous/heterogeneous databases like PostgreSQL and MySQL, MongoDB, etc. We will now cover a few other additional extensions that can expand your PostgreSQL server capabilities.

pg_stat_statements

The pg_stat_statements module provides a means for tracking execution statistics of all SQL statements executed by a server. The statistics gathered by the module are made available via a view named pg_stat_statements. This extension must be installed in each of the databases you want to track, and like many of the extensions in this list, it is available in the contrib package from the PostgreSQL PGDG repository.

pg_repack

Tables in PostgreSQL may end up with fragmentation and bloat due to the specific MVCC implementation in PostgreSQL, or simply due to a high number of rows being naturally removed. This could lead to not only unused space being held inside the table but also to sub-optimal execution of SQL statements. pg_repack is the most popular way to address this problem by reorganizing and repacking the table. It can reorganize the table’s content without placing an exclusive lock on it during the process. DMLs and queries can continue while repacking is happening.  Version 1.2 of pg_repack introduces further new features of parallel index builds, and the ability to rebuild just the indexes. Please refer to the official documentation for more details.

pgaudit

PostgreSQL has a basic statement logging feature. It can be implemented using the standard logging facility with

log_statement = all

 . But this is not sufficient for many audit requirements. One of the essential features for enterprise deployments is the capability for fine-grained auditing the user interactions/statements issued to the database. This is a major compliance requirement for many security standards. The pgaudit extension caters to these requirements.

The PostgreSQL Audit Extension (pgaudit) provides detailed session and/or object audit logging via the standard PostgreSQL logging facility. Please refer to the settings section of its official documentation for more details.

pldebugger

This is a must-have extension for developers who work on stored functions written in PL/pgSQL. This extension is well integrated with GUI tools like pgadmin, which allows developers to step through their code and debug it. Packages for pldebugger are also available in the PGDG repository and installation is straightforward.Once it is set up, we can step though and debug the code remotely.

The official git repo is available here

plprofiler

This is a wonderful extension for finding out where the code is slowing down. This is very helpful, particularly during complex migrations from proprietary databases, like from Oracle to PostgreSQL, which affect application performance. This extension can prepare a report on the overall execution time and tables representation, including flamegraphs, with clear information about each line of code. This extension is not, however, available from the PGDG repo: you will need to build it from source. Details on building and installing plprofiler will be covered in a future blog post. Meanwhile, the official repository and documentation is available here

PostGIS

PostGIS is arguably the most versatile implementation of the specifications of the Open Geospatial Consortium. We can see a large list of features in PostGIS that are rarely available in any other RDBMSs.

There are many users who have primarily opted to use PostgreSQL because of the features supported by PostGIS. In fact, all these features are not implemented as a single extension, but are instead delivered by a collection of extensions. This makes PostGIS one of the most complex extensions to build from source. Luckily, everything is available from the PGDG repository:

$ sudo yum install postgis24_10.x86_64

Once the postgis package is installed, we are able to create the extensions on our target database:

postgres=# CREATE EXTENSION postgis;
CREATE EXTENSION
postgres=# CREATE EXTENSION postgis_topology;
CREATE EXTENSION
postgres=# CREATE EXTENSION postgis_sfcgal;
CREATE EXTENSION
postgres=# CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION
postgres=# CREATE EXTENSION postgis_tiger_geocoder;
CREATE EXTENSION
postgres=# CREATE EXTENSION address_standardizer;
CREATE EXTENSION

Language Extensions : PL/Python, PL/Perl, PL/V8,PL/R etc.

Another powerful feature of PostgreSQL is its programming languages support. You can code database functions/procedures in pretty much every popular language.

Thanks to the enormous number of libraries available, which includes machine learning ones, and its vibrant community, Python has claimed the third spot amongst the most popular languages of choice according to the TIOBE Programming index. Your team’s skills and libraries remain valid for PostgreSQL server coding too! Teams that regularly code in JavaScript for Node.js or Angular can easily write PostgreSQL server code in PL/V8. All of the packages required are readily available from the PGDG repository.

cstore_fdw

cstore_fdw is an open source columnar store extension for PostgreSQL. Columnar stores provide notable benefits for analytics use cases where data is loaded in batches. Cstore_fdw’s columnar nature delivers performance by only reading relevant data from disk. It may compress data by 6 to 10 times to reduce space requirements for data archive. The official repository and documentation is available here

HypoPG

HypoPG is an extension for adding support for hypothetical indexes – that is, without actually adding the index. This helps us to answer questions such as “how will the execution plan change if there is an index on column X?”. Installation and setup instructions are part of its official documentation

mongo_fdw

Mongo_fdw presents collections from mongodb as tables in PostgreSQL. This is a case where the NoSQL world meets the SQL world and features combine. We will be covering this extension in a future blog post. The official repository is available here

tds_fdw

Another important FDW (foreign data wrapper) extension in the PostgreSQL world is tds_fdw. Both Microsoft SQL Server and Sybase uses TDS (Tabular Data Stream) format. This fdw allows PostgreSQL to use tables stored in remote SQL Server or Sybase database as local tables. This FDW make use of FreeTDS libraries.

orafce

As previously mentioned, there are lot of migrations underway from Oracle to PostgreSQL. Incompatible functions in PostgreSQL are often painful for those who are migrating server code. The “orafce” project implements some of the functions from the Oracle database. The functionality was verified on Oracle 10g and the module is useful for production work. Please refer to the list in its official documentation about the Oracle functions implemented in PostgreSQL

TimescaleDB

In this new world of IOT and connected devices, there is a growing need of time-series data. Timescale can convert PostgreSQL into a scalable time-series data store. The official site is available here with all relevant links.

pg_bulkload

Is loading a large volume of data into database in a very efficient and faster way a challenge for you? If so pg_bulkload may help you solve that problem. Official documentation is available here

pg_partman

PostgreSQL 10 introduced declarative partitions. But creating new partitions and maintaining existing ones, including purging unwanted partitions, requires a good dose of manual effort. If you are looking to automate part of this maintenance you should have a look at what pg_partman offers. The repository with documentation is available here.

wal2json

PostgreSQL has feature related to logical replication built-in. Extra information is recorded in WALs which will facilitate logical decoding. wal2json is a popular output plugin for logical decoding. This can be utilized for different purposes including change data capture. In addition to wal2json, there are other output plugins: a concise list is available in the PostgreSQL wiki.

There are many more extensions that help us build an enterprise-grade PostgreSQL set up using open source solutions. Please feel free to comment and ask us if we know about one that satisfies your particular needs. Or, if there’s still time, sign up for our October webinar and ask us in person!

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com