
PostgreSQL Partition Manager — ‘pg_partman’ — is an open source extension freely available, widely supported, and actively maintained by the PostgreSQL community.
pg_partman creates and manages both time-based and serial-based table partition sets. Users use it to automate and create partitions by breaking large tables into smaller partitions for better performance.
In this post, initially, we will discuss the creation of the pg_partman extension following partitioning for newly created tables. First, we will need to install the OS with the necessary control files for the appropriate PostgreSQL version for installing pg_partman on the database.
Installation of pg_partman on the database
I recommend creating the schema before creating the extension to install the extension’s objects. This step is not mandatory, but after installation, you cannot change it. By default, the extension creates objects in the ‘public’ schema.
Establishing a dedicated schema for an extension serves the purpose of storing its objects, including tables, functions, procedures, and more. This approach facilitates smoother management and simplifies tasks like handling backups, upgrades, and migrations.
Below, I present the steps and failures encountered when attempting to change the schema after creating the extension.
Step one: As an example, let us try to create an extension without Qualifying SCHEMA.
CREATE EXTENSION pg_partman;
postgres=# dx
List of installed extensions
Name | Version | Schema | Description
------------+---------+------------+------------------------------------------------------
pg_partman | 4.6.0 | public | Extension to manage partitioned tables by time or ID
Step two: Next, we demonstrate a failure to set up the schema for an already created extension.
CREATE schema testpartman;
ALTER EXTENSION pg_partman set schema testpartman ;
ERROR: extension "pg_partman" does not support SET SCHEMA
Eventually, here are the steps for Installation of pg_partman along with schema.
Step 1: Create schema for installation of the extension
partman=# CREATE SCHEMA partman;
CREATE SCHEMA
Step 2: Create extension pg_partman in schema partman
partman=# CREATE EXTENSION pg_partman SCHEMA partman;
CREATE EXTENSION
partman=# dx
List of installed extensions
Name | Version | Schema | Description
------------+---------+------------+--------------------------------------------------
pg_partman | 4.6.0 | partman | Extension to manage partitioned tables by time or ID
Implementation of native partitioning
Native partitioning is also known as declarative partitioning. PostgreSQL allows you to declare that a table is divided into partitions with a list of columns or expressions to be used as the partition key.
However, you can only implement range partitioning using pg_partman, either with time-based or serial-based methods. This is because you can only predict when a new partition needs to be created in the case of range partitioning on dates or timestamps.
Time-based partitioning by range with template
Unique constraints on the partitioned table must include all partitioning columns. Extension pg_partman helps to manage this by using a template table to manage properties that currently are not supported by native partitioning.
Properties like INDEX and unique constraint creations on the template table are created on all partitions.
partman=# CREATE TABLE employees (id INT PRIMARY KEY, fname VARCHAR(20) , lname VARCHAR (20) , dob DATE NOT NULL ,joined DATE) PARTITION BY RANGE (joined);
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: PRIMARY KEY constraint on table "employees" lacks column "joined" which is part of the partition key.
1. First, create a table with native partitioning type by range using DATE typed column
<span style="font-family: 'Courier 10 Pitch', Courier, monospace; font-size: 11.2px; background-color: #cce4f5; color: #666666;">partman=# CREATE TABLE employees (id INT, fname VARCHAR(20) , lname VARCHAR (20) , dob DATE NOT NULL ,joined DATE NOT NULL) PARTITION BY RANGE (joined);
</span>CREATE TABLE
2. Next, create a template table
partman=# CREATE TABLE partman.employees_template (LIKE public.employees);
CREATE TABLE
3. Add constraints to the template table needed for all partitioned tables
partman=# ALTER TABLE partman.employees_template ADD PRIMARY KEY (ID);
ALTER TABLE
4. Create parent to create initial child partitioned tables
partman=# SELECT partman.create_parent('public.employees', p_control := 'joined',p_type := 'native',p_interval := 'yearly',p_template_table := 'partman.employees_template',p_premake := 2, p_start_partition := (CURRENT_TIMESTAMP + '1 hour'::interval)::text);
create_parent
---------------
t
(1 row)
Additionally, I have listed below the various options that can be used with the ‘create_parent’ function and their respective values:
Argument |
Data Type |
Definition |
Values |
p_parent_table |
text |
Parent Partitioned table ( Original table to have existed before creation & MUST be schema-qualified even if it is a public schema.) |
schema.tablename |
p_control |
text |
You should partition the column, either integer or time-based. |
Column name |
p_type |
text |
You can use the Partman type based on your performance improvement and flexibility requirements. |
native,partman |
p_interval |
text |
Integer range or time interval for each partition. (The generic intervals of “yearly” to “quarter-hour”) (yearly, quarterly, monthly, weekly, daily, hourly, half-hour, quarter-hour ) |
Any other interval that validates time.For ID based integer ranges. |
p_template_table |
text |
Template table name, if not provided, will create one in the schema the extension is installed. |
schema.tablename |
p_premake |
integer |
Additional partitions to stay ahead of the current partition. |
We create a default of 4 (resulting in the creation of 5 partitions, including the current one). |
p_start_partition |
text |
This allows specifying the first partition of a set instead of it being automatically determined. It must be a valid timestamp (for time-based) or a positive integer (for id-based) value. |
CURRENT_TIMESTAMP /(CURRENT_TIMESTAMP + ‘1 hour’::interval)::text) |
p_automatic_maintenance |
text |
The parameter determines whether maintenance is automatically managed when run_maintenance() is called without a table parameter or by a background worker process. |
DEFAULT ‘on’::text |
Because we specified ‘2’ for premake in the create_parent function, you can observe that two partitions ahead have been created, as shown in the table definition below.
partman=# d+ employees
Partitioned table "public.employees"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain |
fname | character varying(20) | | | | extended |
lname | character varying(20) | | | | extended |
dob | date | | not null | | plain |
joined | date | | not null | | plain |
Partition key: RANGE (joined)
Partitions: employees_p2023 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01'),
employees_p2024 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'),
employees_p2025 FOR VALUES FROM ('2025-01-01') TO ('2026-01-01'),
employees_default DEFAULT
To illustrate, we insert around 10,000 rows to see the data flush to its appropriate partitions if it exists or moves to the default partition.
partman=# INSERT INTO employees (id ,fname,lname,dob ,joined) VALUES ( generate_series(1,10000) ,(array['Oswald', 'Henry', 'Bob', 'Vennie'])[floor(random() * 4 + 1)], (array['Leo', 'Jack', 'Den', 'Daisy' ,'Woody'])[floor(random() * 5 + 1)], '1995-01-01'::date + trunc(random() * 366 * 3)::int,
'2023-01-01'::date + trunc(random() * 366 * 3)::int);
INSERT 0 10000
partman=# SELECT employees_p2023 , employees_p2024 ,employees_p2025,employees_default,employees_totalcnt from
( SELECT COUNT(*) FROM employees_p2023 ) AS employees_p2023,( SELECT COUNT(*) FROM employees_p2024 ) AS employees_p2024,
( SELECT COUNT(*) FROM employees_p2025) AS employees_p2025 ,( SELECT COUNT(*) FROM employees_default) AS employees_default ,
( SELECT COUNT(*) FROM employees ) AS employees_totalcnt ;
employees_p2023 | employees_p2024 | employees_p2025 | employees_default | employees_totalcnt
-----------------+-----------------+-----------------+-------------------+------------
(3294) | (3397) | (3293) | (16) | (10000)
(1 row)
By utilizing functions, we can seamlessly create child partitions and efficiently transfer data from the default table to these child tables.
partman=# CALL partman.run_maintenance_proc();
ERROR: updated partition constraint for default partition "employees_default" would be violated by some row
CONTEXT: SQL statement "ALTER TABLE public.employees ATTACH PARTITION public.employees_p2026 FOR VALUES FROM ('2026-01-01 00:00:00+00') TO ('2027-01-01 00:00:00+00')"
As an illustration, the usage of partman.partition_data_proc shown below, which can move data batch-wise to partitioned tables.
partman=# CALL partman.partition_data_proc ('public.employees');
NOTICE: Batch: 1, Rows moved: 16
NOTICE: Total rows moved: 16
NOTICE: Ensure to VACUUM ANALYZE the parent (and source table if used) after partitioning data
CALL
partman=# VACUUM ANALYZE employees;
VACUUM
As a result, we can see below the newly created table structure along with the new child partition.
partman=# d+ employees
Partitioned table "public.employees"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | |
fname | character varying(20) | | | | extended | |
lname | character varying(20) | | | | extended | |
dob | date | | not null | | plain | |
joined | date | | not null | | plain | |
Partition key: RANGE (joined)
Partitions: employees_p2023 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01'),
employees_p2024 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'),
employees_p2025 FOR VALUES FROM ('2025-01-01') TO ('2026-01-01'),
employees_p2026 FOR VALUES FROM ('2026-01-01') TO ('2027-01-01'),
employees_default DEFAULT
Consequently, after using the function, you will notice that the data from the default table has been moved to the newly created partitioned table.
partman=# SELECT employees_p2023 , employees_p2024 ,employees_p2025, employees_p2026, employees_default,employees_totalcnt from
( SELECT COUNT(*) FROM employees_p2023 ) AS employees_p2023,( SELECT COUNT(*) FROM employees_p2024 ) AS employees_p2024,
( SELECT COUNT(*) FROM employees_p2025) AS employees_p2025,( SELECT COUNT(*) FROM employees_p2026) AS employees_p2026,
( SELECT COUNT(*) FROM employees_default) AS employees_default , ( SELECT COUNT(*) FROM employees ) AS employees_totalcnt ;
employees_p2023 | employees_p2024 | employees_p2025 | employees_p2026 | employees_default | employees_totalcnt
-----------------+-----------------+-----------------+-----------------+-------------------+--------------------
(3294) | (3397) | (3293) | (16) | (0) | (10000)
(1 row)
Creating the partitioned child tables ahead
Subsequently, we can modify the table partman.part_config settings to make changes by modifying the premake, which controls the partitions ahead.
partman=# update partman.part_config set premake = '4' where parent_table ='public.employees';
UPDATE 1
partman=# CALL partman.run_maintenance_proc();
CALL
partman=# d+ employees
Partitioned table "public.employees"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------
id | integer | | | | plain |
fname | character varying(20) | | | | extended |
lname | character varying(20) | | | | extended |
dob | date | | not null | | plain |
joined | date | | not null | | plain |
Partition key: RANGE (joined)
Partitions: employees_p2023 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01'),
employees_p2024 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'),
employees_p2025 FOR VALUES FROM ('2025-01-01') TO ('2026-01-01'),
employees_p2026 FOR VALUES FROM ('2026-01-01') TO ('2027-01-01'),
employees_p2027 FOR VALUES FROM ('2027-01-01') TO ('2028-01-01'),
employees_p2028 FOR VALUES FROM ('2028-01-01') TO ('2029-01-01'),
employees_p2029 FOR VALUES FROM ('2029-01-01') TO ('2030-01-01'),
employees_p2030 FOR VALUES FROM ('2030-01-01') TO ('2031-01-01'),
employees_default DEFAULT
Automatic maintenance of partitions
You can plan to create partitions ahead of the INSERTS to prevent data from moving into the default_partition or the main table. You can achieve this by using functions to create newly partitioned tables and move the data.
For instance, we can use the function run_maintenance() to create new partitions in advance, and there are two ways to do it.
1. Using background worker
2. Cron Scheduler
1. Setting up background worker
To start with, we can set “shared_preload_libraries” for partition maintenance operations to run automatically. You can set the parameter either by using ALTER SYSTEM or by manually modifying the config file.
partman=# ALTER SYSTEM set shared_preload_libraries = 'pg_partman_bgw';
ALTER SYSTEM
Afterward, perform a restart of the cluster either using systemctl or pg_ctl.
ubuntu@ip-172-31-47-134:~$ sudo systemctl restart postgresql@14-main.service
shared_preload_libraries
--------------------------
pg_partman_bgw
(1 row)
Configuration options background worker
1. pg_partman_bgw.interval: Number of seconds between calls to run_maintenance(). The default is 3600 (1 hour).
– For time-based partitions, intervals can be chosen based on the partition interval at least daily or twice daily.
– For serial-based partitions, you can choose the partition interval based on the data inserted on a daily basis.
2. pg_partman_bgw.dbname: Required. The database(s) that run_maintenance() will run on. If multiple, use a comma-separated list. If not set, BGW will do nothing.
3. pg_partman_bgw.role: The role that run_maintenance() will run as. The default is “postgres”. You are allowed to use only a single role name.
4. pg_partman_bgw.analyze: Optional argument, by default whenever a new child table is created, an analyze is run on the parent table of the partition set to ensure constraint exclusion works. This analyze can be skipped by setting this to false and help increase the speed of moving large amounts of data. If you set this to false, we strongly recommend that you perform a manual analyze of the partition upon completion to ensure the statistics are updated properly.
2. Cron scheduler
00 01,12 * * * psql -c “SELECT pg_partman.run_maintenance()”
Advantages of pg_partman
- Using pg_partman, you can define data retention to automatically move data into archived partitions or drop the partition.
- Query Performance will improve when dealing with partitioned data specifically.
- Vacuum efficiency will improve as data resides in smaller partitions, reducing the time required for table maintenance compared to dealing with large sets of data.
- You can attach or detach partitions without downtime concurrently, and it will not impact ongoing database operations.
Limitations of pg_partman
- pg_partman only supports partitioning of data types that are integer or date/timestamp.
- Generally, when opting for native partitioning, the parent table should already be created as range partitioned but not list partitioned.
- Unique constraints on partitioned tables must include all partitioning columns.
Conclusion
To summarize, we discussed creating and managing partitions effectively using PostgreSQL Partition Manager(pg_partman) along with the creation of the partition ahead of Inserting data using time-based. I will provide a more detailed usage of pg_partman in upcoming blog posts.
Please review the blogs below for reference:
Performing ETL Using Inheritance in PostgreSQL
PostgreSQL Partitioning Using Traditional Methods
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.
Download Percona Distribution for PostgreSQL Today!