Oct
09
2023
--

PostgreSQL Tuple-Level Statistics With pgstattuple

PostgreSQL Tuple-Level Statistics With pgstattuple

Since Postgres table bloat degrades database performance, we can improve its performance by removing the table bloat. We can use the pgstattuple extension to identify the bloated tables.

This extension provides several functions for obtaining tuple-level statistics. Because the pgstattuple functions produce extensive page-level information, access to them is, by default, limited. Only the pg_stat_scan_tables role has the ability to EXECUTE the pgstattuple functions by default. The pgstattuple functions are not restricted to the superuser.

Using the pgstattuple function, we can list the tables with a high percentage of dead tuples and run a manual VACUUM to reclaim the space occupied by the dead tuples.

In this blog, we will discuss the pgstattuple extension, which provides various functions to obtain tuple-level statistics.

First, create the pgstattuple extension. The latest version of pgstattuple is 1.5.

postgres@ip-172-31-46-212:~$ psql
psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1))
Type "help" for help.
postgres=# select * from pg_available_extensions where name='pgstattuple';
    name     | default_version | installed_version |           comment
-------------+-----------------+-------------------+-----------------------------
 pgstattuple | 1.5             |                   | show tuple-level statistics
(1 row)
postgres=#
postgres=# create extension pgstattuple;
CREATE EXTENSION
postgres=#
postgres=# select * from pg_available_extensions where name='pgstattuple';
    name     | default_version | installed_version |           comment
-------------+-----------------+-------------------+-----------------------------
 pgstattuple | 1.5             | 1.5               | show tuple-level statistics
(1 row)
postgres=#

By default, only the superuser has access to the pgstattuple functions; however, you can grant non-superuser access by granting the pg_stat_scan_tables role to the non-superuser.

postgres=# grant pg_stat_scan_tables to <nonsuperuser>;
GRANT ROLE

postgres=> du <nonsuperuser>
                 List of roles
 Role name | Attributes |       Member of
———–+————+———————–
 <nonsuperuser>        |            | {pg_stat_scan_tables}
postgres=>

Next, before using the pgstattuple functions, let’s create a table and index for demonstration.

postgres=# create table workshop (jobno int);
CREATE TABLE

postgres=# insert into workshop values (generate_series(1,80000));
INSERT 0 80000

postgres=# create index workshop_index on workshop (jobno);
CREATE INDEX
postgres=#

pgstattuple functions

pgstattuple(regclass): This function returns a relation’s physical length, percentage of “dead” tuples, and other info. This may help users to determine whether a vacuum is necessary or not. The argument is the target relation’s name (optionally schema-qualified) or OID. 

For example:

SELECT * FROM pgstattuple('pg_catalog.pg_proc');

SELECT * FROM pgstattuple('workshop');

pgstattuple only obtains a read lock on the relation. As a result, the pgstattuple output does not represent an instantaneous snapshot. The concurrent updates will change the output of the pgstattuple.

The example below shows the tuple statistics of the workshop table.

postgres=# SELECT * FROM pgstattuple('workshop');

-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 80000
tuple_len          | 2240000
tuple_percent      | 77.24
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 10056
free_percent       | 0.35

Below are the pgstattuple output columns and their description.

Column Type Description
table_len bigint Physical relation length in bytes
tuple_count bigint Number of live tuples
tuple_len bigint Total length of live tuples in bytes
tuple_percent float8 Percentage of live tuples
dead_tuple_count bigint Number of dead tuples
dead_tuple_len bigint Total length of dead tuples in bytes
dead_tuple_percent float8 Percentage of dead tuples
free_space bigint Total free space in bytes
free_percent float8 Percentage of free space

 

Let’s check the tuple statistics when we delete or update the rows. Below are the current tuple statistics for the reference.

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 80000
tuple_len          | 2240000
tuple_percent      | 77.24
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 10056
free_percent       | 0.35

Delete the few ROWS from the table.

postgres=# DELETE FROM workshop WHERE jobno % 8 = 0;
DELETE 10000
postgres=#

In the below output, the dead_tuple_count shows that Postgres marked the rows as deleted but did not remove them from the table, as the length of the table is the same after the deletion of the rows.

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 70000
tuple_len          | 1960000
tuple_percent      | 67.59
dead_tuple_count   | 10000
dead_tuple_len     | 280000
dead_tuple_percent | 9.66
free_space         | 10056
free_percent       | 0.35

Now execute the VACUUM on the table; after running the plain VACUUM, we see that the:

  1. free_space has increased. Vacuum has reclaimed the space and made it available for re-use at the database level.
  2. Dead rows have been removed. 
  3. The table_len is the same as O.S. level space is not reclaimed.
postgres=# vacuum workshop;
VACUUM

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 70000
tuple_len          | 1960000
tuple_percent      | 67.59
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 330412
free_percent       | 11.39

After running the VACUUM FULL, we can see that table_len has decreased. It shows that O.S. level space is reclaimed after running the VACUUM FULL.

postgres=# vacuum full workshop;
VACUUM

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2539520
tuple_count        | 70000
tuple_len          | 1960000
tuple_percent      | 77.18
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 10840
free_percent       | 0.43
postgres=#

pgstattuple queries to check table bloat

We can list the tables where the dead tuple percentage is high by using the query below.

postgres=#select relname,(pgstattuple(oid)).dead_tuple_percent from pg_class where relkind = 'r' order by dead_tuple_percent desc;
 relname                  | dead_tuple_percent
--------------------------+--------------------
 pg_init_privs            |               2.56
 pg_class                 |               1.69
 pg_statistic             |               1.45
 pg_authid                |               1.37
 pg_extension             |               1.28
 pg_type                  |               1.17
 pg_namespace             |               1.17

The below query will show you the tuple statistics of the tables in detail where the dead tuple percentage is high.

SELECT relname,oid,relowner,(pgstattuple(oid)).dead_tuple_percent ,(pgstattuple(oid)).* FROM pg_class WHERE relkind = 'r' order by dead_tuple_percent desc;

-[ RECORD 1 ]------+-------------------------
relname            | pg_init_privs
oid                | 3394
relowner           | 10
dead_tuple_percent | 2.56
table_len          | 24576
tuple_count        | 222
tuple_len          | 17316
tuple_percent      | 70.46
dead_tuple_count   | 9
dead_tuple_len     | 630
dead_tuple_percent | 2.56
free_space         | 4568
free_percent       | 18.59

-[ RECORD 2 ]------+-------------------------
relname            | pg_class
oid                | 1259
relowner           | 10
dead_tuple_percent | 1.69
table_len          | 114688
tuple_count        | 415
tuple_len          | 80473
tuple_percent      | 70.17
dead_tuple_count   | 11
dead_tuple_len     | 1937
dead_tuple_percent | 1.69
free_space         | 26976
free_percent       | 23.52

pgstatindex(regclass): This function returns a record showing information about a B-tree index. 

For example:

postgres=# SELECT * FROM pgstatindex ('workshop_index');

-[ RECORD 1 ]------+--------
version            | 4
tree_level         | 1
index_size         | 1589248
root_block_no      | 3
internal_pages     | 1
leaf_pages         | 192
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 89.74
leaf_fragmentation | 0

The output columns and their description.

Column Type Description
version integer B-tree version number
tree_level integer Tree level of the root page
index_size bigint Total index size in bytes
root_block_no bigint Location of root page (zero if none)
internal_pages bigint Number of “internal” (upper-level) pages
leaf_pages bigint Number of leaf pages
empty_pages bigint Number of empty pages
deleted_pages bigint Number of deleted pages

 

pgstatginindex(regclass): This function returns a record showing information about a GIN index.

For example:

SELECT * FROM pgstatginindex('table_gin_index');

pgstathashindex(regclass): This function returns a record showing information about a HASH index. 

For example:

select * from pgstathashindex('table_hash_index');

In this blog, we have seen how the pgstattuple can be used for tuple-level analysis and to find bloated tables or vacuuming candidates. https://www.postgresql.org/docs/current/pgstattuple.html has some additional information on the extension.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com