Since Postgres table bloat degrades database performance, we can improve its performance by removing the table bloat. We can use the pgstattuple extension to identify the bloated tables.
This extension provides several functions for obtaining tuple-level statistics. Because the pgstattuple functions produce extensive page-level information, access to them is, by default, limited. Only the pg_stat_scan_tables role has the ability to EXECUTE the pgstattuple functions by default. The pgstattuple functions are not restricted to the superuser.
Using the pgstattuple function, we can list the tables with a high percentage of dead tuples and run a manual VACUUM to reclaim the space occupied by the dead tuples.
In this blog, we will discuss the pgstattuple extension, which provides various functions to obtain tuple-level statistics.
First, create the pgstattuple extension. The latest version of pgstattuple is 1.5.
postgres@ip-172-31-46-212:~$ psql psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1)) Type "help" for help. postgres=# select * from pg_available_extensions where name='pgstattuple'; name | default_version | installed_version | comment -------------+-----------------+-------------------+----------------------------- pgstattuple | 1.5 | | show tuple-level statistics (1 row) postgres=# postgres=# create extension pgstattuple; CREATE EXTENSION postgres=# postgres=# select * from pg_available_extensions where name='pgstattuple'; name | default_version | installed_version | comment -------------+-----------------+-------------------+----------------------------- pgstattuple | 1.5 | 1.5 | show tuple-level statistics (1 row) postgres=#
By default, only the superuser has access to the pgstattuple functions; however, you can grant non-superuser access by granting the pg_stat_scan_tables role to the non-superuser.
postgres=# grant pg_stat_scan_tables to <nonsuperuser>; GRANT ROLE postgres=> du <nonsuperuser> List of roles Role name | Attributes | Member of ———–+————+———————– <nonsuperuser> | | {pg_stat_scan_tables} postgres=>
Next, before using the pgstattuple functions, let’s create a table and index for demonstration.
postgres=# create table workshop (jobno int); CREATE TABLE postgres=# insert into workshop values (generate_series(1,80000)); INSERT 0 80000 postgres=# create index workshop_index on workshop (jobno); CREATE INDEX postgres=#
pgstattuple functions
pgstattuple(regclass): This function returns a relation’s physical length, percentage of “dead” tuples, and other info. This may help users to determine whether a vacuum is necessary or not. The argument is the target relation’s name (optionally schema-qualified) or OID.
For example:
SELECT * FROM pgstattuple('pg_catalog.pg_proc'); SELECT * FROM pgstattuple('workshop');
pgstattuple only obtains a read lock on the relation. As a result, the pgstattuple output does not represent an instantaneous snapshot. The concurrent updates will change the output of the pgstattuple.
The example below shows the tuple statistics of the workshop table.
postgres=# SELECT * FROM pgstattuple('workshop'); -[ RECORD 1 ]------+-------- table_len | 2899968 tuple_count | 80000 tuple_len | 2240000 tuple_percent | 77.24 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 10056 free_percent | 0.35
Below are the pgstattuple output columns and their description.
Column | Type | Description |
table_len | bigint | Physical relation length in bytes |
tuple_count | bigint | Number of live tuples |
tuple_len | bigint | Total length of live tuples in bytes |
tuple_percent | float8 | Percentage of live tuples |
dead_tuple_count | bigint | Number of dead tuples |
dead_tuple_len | bigint | Total length of dead tuples in bytes |
dead_tuple_percent | float8 | Percentage of dead tuples |
free_space | bigint | Total free space in bytes |
free_percent | float8 | Percentage of free space |
Let’s check the tuple statistics when we delete or update the rows. Below are the current tuple statistics for the reference.
postgres=# SELECT * FROM pgstattuple('workshop'); -[ RECORD 1 ]------+-------- table_len | 2899968 tuple_count | 80000 tuple_len | 2240000 tuple_percent | 77.24 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 10056 free_percent | 0.35
Delete the few ROWS from the table.
postgres=# DELETE FROM workshop WHERE jobno % 8 = 0; DELETE 10000 postgres=#
In the below output, the dead_tuple_count shows that Postgres marked the rows as deleted but did not remove them from the table, as the length of the table is the same after the deletion of the rows.
postgres=# SELECT * FROM pgstattuple('workshop'); -[ RECORD 1 ]------+-------- table_len | 2899968 tuple_count | 70000 tuple_len | 1960000 tuple_percent | 67.59 dead_tuple_count | 10000 dead_tuple_len | 280000 dead_tuple_percent | 9.66 free_space | 10056 free_percent | 0.35
Now execute the VACUUM on the table; after running the plain VACUUM, we see that the:
- free_space has increased. Vacuum has reclaimed the space and made it available for re-use at the database level.
- Dead rows have been removed.
- The table_len is the same as O.S. level space is not reclaimed.
postgres=# vacuum workshop; VACUUM postgres=# SELECT * FROM pgstattuple('workshop'); -[ RECORD 1 ]------+-------- table_len | 2899968 tuple_count | 70000 tuple_len | 1960000 tuple_percent | 67.59 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 330412 free_percent | 11.39
After running the VACUUM FULL, we can see that table_len has decreased. It shows that O.S. level space is reclaimed after running the VACUUM FULL.
postgres=# vacuum full workshop; VACUUM postgres=# SELECT * FROM pgstattuple('workshop'); -[ RECORD 1 ]------+-------- table_len | 2539520 tuple_count | 70000 tuple_len | 1960000 tuple_percent | 77.18 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 10840 free_percent | 0.43 postgres=#
pgstattuple queries to check table bloat
We can list the tables where the dead tuple percentage is high by using the query below.
postgres=#select relname,(pgstattuple(oid)).dead_tuple_percent from pg_class where relkind = 'r' order by dead_tuple_percent desc; relname | dead_tuple_percent --------------------------+-------------------- pg_init_privs | 2.56 pg_class | 1.69 pg_statistic | 1.45 pg_authid | 1.37 pg_extension | 1.28 pg_type | 1.17 pg_namespace | 1.17
The below query will show you the tuple statistics of the tables in detail where the dead tuple percentage is high.
SELECT relname,oid,relowner,(pgstattuple(oid)).dead_tuple_percent ,(pgstattuple(oid)).* FROM pg_class WHERE relkind = 'r' order by dead_tuple_percent desc; -[ RECORD 1 ]------+------------------------- relname | pg_init_privs oid | 3394 relowner | 10 dead_tuple_percent | 2.56 table_len | 24576 tuple_count | 222 tuple_len | 17316 tuple_percent | 70.46 dead_tuple_count | 9 dead_tuple_len | 630 dead_tuple_percent | 2.56 free_space | 4568 free_percent | 18.59 -[ RECORD 2 ]------+------------------------- relname | pg_class oid | 1259 relowner | 10 dead_tuple_percent | 1.69 table_len | 114688 tuple_count | 415 tuple_len | 80473 tuple_percent | 70.17 dead_tuple_count | 11 dead_tuple_len | 1937 dead_tuple_percent | 1.69 free_space | 26976 free_percent | 23.52
pgstatindex(regclass): This function returns a record showing information about a B-tree index.
For example:
postgres=# SELECT * FROM pgstatindex ('workshop_index'); -[ RECORD 1 ]------+-------- version | 4 tree_level | 1 index_size | 1589248 root_block_no | 3 internal_pages | 1 leaf_pages | 192 empty_pages | 0 deleted_pages | 0 avg_leaf_density | 89.74 leaf_fragmentation | 0
The output columns and their description.
Column | Type | Description |
version | integer | B-tree version number |
tree_level | integer | Tree level of the root page |
index_size | bigint | Total index size in bytes |
root_block_no | bigint | Location of root page (zero if none) |
internal_pages | bigint | Number of “internal” (upper-level) pages |
leaf_pages | bigint | Number of leaf pages |
empty_pages | bigint | Number of empty pages |
deleted_pages | bigint | Number of deleted pages |
pgstatginindex(regclass): This function returns a record showing information about a GIN index.
For example:
SELECT * FROM pgstatginindex('table_gin_index');
pgstathashindex(regclass): This function returns a record showing information about a HASH index.
For example:
select * from pgstathashindex('table_hash_index');
In this blog, we have seen how the pgstattuple can be used for tuple-level analysis and to find bloated tables or vacuuming candidates. https://www.postgresql.org/docs/current/pgstattuple.html has some additional information on the extension.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.