In PostgreSQL, an “orphan large object” refers to a large object (LOB) that exists in the pg_largeobject system table but is not associated with any specific row in a user-defined table. This situation can occur due to various reasons, such as when the large object was created, but its reference in the user table was […]
15
2023
Exploring the Improved CREATEROLE Attribute for User Management in PostgreSQL 16
A database role can have a number of attributes that define its privileges and interact with the client authentication system.One such attribute is the CREATEROLE attribute, which is important to PostgreSQL database management of users and roles. In this blog post, we will discuss the improvement to this attribute that has been done for PostgreSQL 16.
In previous releases of PostgreSQL, a superuser can grant a non-superuser the CREATEROLE attribute. This is important because it gives the ability to someone other than the superuser to add, drop, and modify users to the database. Unfortunately in previous PostgreSQL versions, when a user is granted the CREATEROLE attribute, it gets the ability to drop users as well that it did not create.
With PostgreSQL 16, the CREATEROLE attribute still allows a non-superuser the ability to provision new users, however they can only drop users that they themselves created. The system will generate error codes if they try to delete users which they did not create.
In PostgreSQL 15 and older versions
postgres=#CREATE ROLE Alice CREATEROLE LOGIN; CREATE ROLE postgres=# CREATE role bob; CREATE ROLE postgres=# du bob List of roles Role name | Attributes | Member of -----------+--------------+----------- bob | Cannot login | {} postgres=# c postgres alice You are now connected to database "postgres" as user "alice". postgres=drop role bob; DROP ROLE
In PostgreSQL 16
postgres=# CREATE ROLE Alice CREATEROLE LOGIN; CREATE ROLE postgres=# CREATE role bob; CREATE ROLE postgres=# du bob List of roles Role name | Attributes -----------+-------------- bob | Cannot login postgres=#c postgres alice You are now connected to database "postgres" as user "alice". postgres=#drop role bob; ERROR: permission denied to drop role DETAIL: Only roles with the CREATEROLE attribute and the ADMIN option on role "bob" may drop this role.
In previous releases, a user with the CREATEROLE attribute has access to all predefined system roles, including highly privileged roles like pg_execute_server_program and pg_write_server_files. CREATEROLE also allows them to grant privileges to themself or other users that they haven’t been granted themself.
When providing these roles pg_read_server_files, pg_write_server_files, and pg_execute_server_program to users, extreme caution should be taken because they have access to every file on the server file system, bypass all database-level permission checks when accessing files directly and might be exploited to obtain superuser-level access
So it seems we can expose our systems to risk by granting a user the CREATEROLE privilege. Now the concern is how to deal with the problem mentioned above.
In PostgreSQL 16, users with the CREATEROLE attribute do not have the ability to grant membership of a role to anyone; they can only grant membership to roles for which they have the ADMIN OPTION. The WITH ADMIN OPTION clause gives the user the authority to grant membership of a role to other users, to revoke membership of the role from other members of the role.
Let me try to explain this with the help of an example.
In PostgreSQL 15 and older versions:
postgres=#conninfo You are connected to database "postgres" as user "postgres" via socket in "/var/run/postgresql" at port "5432". postgres=# CREATE ROLE dev_admin CREATEROLE LOGIN; CREATE ROLE postgres=#c postgres dev_admin You are now connected to database "postgres" as user "dev_admin". postgres=> CREATE ROLE QA_Admin LOGIN; CREATE ROLE postgres=grant pg_execute_server_program,pg_write_server_files,pg_monitor, pg_read_all_settings,pg_read_all_stats to QA_Admin; GRANT ROLE postgres=ALTER ROLE QA_Admin with encrypted password '***'; ALTER ROLE
As per the above example, dev_admin can login from QA_admin whenever he wants and get almost all the privileges to view internal configurations as well as reading and writing the whole database which is not expected at all. This poses a security threat to the whole organisation.
In PostgreSQL 16:
postgres=# conninfo You are connected to database "postgres" as user "postgres" via socket in "/var/run/postgresql" at port "5432". postgres=# CREATE ROLE dev_admin CREATEROLE LOGIN; CREATE ROLE postgres=# c postgres dev_admin You are now connected to database "postgres" as user "dev_admin". postgres= CREATE ROLE QA_Admin LOGIN; CREATE ROLE postgres=grant pg_execute_server_program,pg_monitor to QA_Admin; ERROR: permission denied to grant role "pg_execute_server_program" DETAIL: Only roles with the ADMIN option on role "pg_execute_server_program" may grant this role.
From above it seems that, if pg_execute_server_program, pg_monitor role is assigned to dev_admin user by a super user then only it would be possible to grant that role to QA_admin like below.
postgres=# conninfo You are connected to database "postgres" as user "postgres" via socket in "/var/run/postgresql" at port "5432". postgres=> c postgres postgres You are now connected to database "postgres" as user "postgres". postgres=# GRANT pg_monitor to dev_admin with ADMIN OPTION; GRANT ROLE postgres=# GRANT pg_execute_server_program to dev_admin with ADMIN OPTION; GRANT ROLE postgres=# c postgres dev_admin You are now connected to database "postgres" as user "dev_admin". postgres=grant pg_monitor to QA_Admin; GRANT ROLE postgres=grant pg_execute_server_program to QA_Admin; GRANT ROLE
The change in CREATEROLE attribute in PostgreSQL 16 is an important improvement to user management because it allows some users the ability to manage all aspects of their team, but not beyond the rights and not outside their own area of responsibility. This gives them just the right amount of control without allowing them to overstep what is required to execute their job.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.
03
2023
PostgreSQL Job Scheduler: pg_timetable
In database management, job scheduling has always played an important role. And for PostgreSQL, PgAgent and pg_cron are examples of job schedulers that are already popular. However, there is another job scheduler called pg_timetable, which is completely database-driven and provides a couple of advanced concepts. In this blog, we are going to highlight some of the main features, installation, and some use cases of pg_timetable.
Main features of pg_timetable
- Fully database-driven configuration
- Cron-style scheduling at the PostgreSQL server time zone
- Built-in tasks such as sending emails, etc.
- Full support for database-driven logging
- Tasks can be arranged in chains
- A chain can consist of built-in commands, SQL, and executables
- Parameters can be passed to chains/tasks
Installation
There are currently two options on how you can install and run pg_timetable.
- Container installation. For this, please refer to this page.
Note: The image mentioned on the above page is based on Alpine and not from Percona. - Local installation, i.e., build from sources. Here, we will discuss this approach.
2.1. Download and install GO on your system.
2.2. Clone pg_timetable repo
$ git clone https://github.com/cybertec-postgresql/pg_timetable.git $ cd pg_timetable
2.3. Run pg_timetable
$ go run main.go --dbname=dbname --clientname=worker001 --user=scheduler --password=strongpassword
2.4. Alternatively, build a binary and run it:
$ go build $ ./pg_timetable --dbname=dbname --clientname=worker001 --user=scheduler --password=strongpassword
Demonstration of job scheduling with pg_timetable
As a use case here, it will be shown how to use pg_timetable as a scheduler to schedule a job, which will refresh the materialized view every day at 12 midnight.
1. Download pg_timetable executable (Follow step 2 mentioned above in the installation section).
2. Make sure the PostgreSQL server is up and running and has a role with CREATE privilege for a target database, e.g.:
postgres=# CREATE ROLE scheduler PASSWORD '***********'; postgres=# GRANT CREATE ON DATABASE postgres TO scheduler; postgres=# CREATE TABLE t_demo (grp int, data numeric); CREATE TABLE postgres=# INSERT INTO t_demo SELECT 1, random() FROM generate_series(1, 5000000); INSERT 0 5000000 postgres=#CREATE MATERIALIZED VIEW mat_view AS SELECT grp, avg(data), count(*) FROM t_demo GROUP BY 1; SELECT 1 postgres=# ALTER MATERIALIZED VIEW mat_view OWNER TO scheduler; postgres=# GRANT SELECT ON mat_view TO scheduler; -bash-4.2$ psql psql (12.16) Type "help" for help. postgres=# SELECT * FROM mat_view; grp | avg | count -----+--------------------------+--------- 1 | 0.5001807958659610956005 | 5000000 (1 row) postgres=# INSERT INTO t_demo SELECT 2, random() postgres-# FROM generate_series(1, 5000000); INSERT 0 5000000
3. Create a new job to refresh the materialized view each night at 12:00 Postgres server time zone.
postgres=# SELECT timetable.add_job('refresh-matview', '0 12 * * *', 'REFRESH MATERIALIZED VIEW public.mat_view'); add_job --------- 1 (1 row)
4. Run the pg_timetable
[centos@ip-172-31-32-10 pg_timetable]$ ./pg_timetable --dbname=postgres --clientname=worker001 --user=scheduler --password=******** 2023-09-09 11:59:20.929 [INFO] [sid:697146069] Starting new session... 2023-09-09 11:59:20.941 [INFO] Database connection established 2023-09-10 12:00:00.961 [INFO] Accepting asynchronous chains execution requests... 2023-09-10 12:00:00.970 [INFO] [count:0] Retrieve scheduled chains to run @reboot 2023-09-10 12:00:00.991 [INFO] [count:3] Retrieve scheduled chains to run 2023-09-10 12:00:00.994 [INFO] [count:0] Retrieve interval chains to run 2023-09-10 12:00:00.019 [INFO] [chain:1] Starting chain 2023-09-10 12:00:00.722 [INFO] [chain:1] [task:1] [txid:2613] Starting task 2023-09-10 12:00:00.074 [INFO] [chain:1] [task:1] [txid:2613] Starting task 2023-09-10 12:00:00.141 [INFO] [chain:1] [task:1] [txid:2613] Closing remote session 2023-09-10 12:00:00.141 [INFO] [chain:1] [task:1] [txid:2613] Task executed successfully 2023-09-10 12:00:00.185 [INFO] [chain:1] [txid:2613] Chain executed successfully
During the first start of pg_timetable, the necessary schema timetable gets created. For reference, below is the catalog structure.
postgres=# dn List of schemas Name | Owner -----------+----------- public | postgres timetable | scheduler (2 rows) postgres=# set search_path to timetable ; SET postgres=# dt List of relations Schema | Name | Type | Owner -----------+----------------+-------+----------- timetable | active_chain | table | scheduler timetable | active_session | table | scheduler timetable | chain | table | scheduler timetable | execution_log | table | scheduler timetable | log | table | scheduler timetable | migration | table | scheduler timetable | parameter | table | scheduler timetable | task | table | scheduler (8 rows)
5. From database logs, it can be observed that MATERIALIZED VIEW gets refreshed as per the schedule.
Output from DB logs:
2023-09-10 12:00:00 UTC [14334] LOG: statement: REFRESH MATERIALIZED VIEW public.mat_view
From psql prompt:
-bash-4.2$ psql psql (12.16) Type "help" for help. postgres=# SELECT * FROM mat_view; grp | avg | count -----+---------------------------+--------- 1 | 0.5001807958659610956005 | 5000000 2 | 0.50000009110202547559215 | 5000000 (2 rows)
Below is the output from the pg_timetable catalog tables.
postgres=# select * from active_session ; client_pid | server_pid | client_name | started_at ------------+------------+-------------+------------------------------ 697146069 | 20137 | worker001 | 2023-09-10 11:59:205672+00 (1 row)
postgres=# select * from chain; -[ RECORD 1 ]-------+-------------------- chain_id | 1 chain_name | refresh-matview run_at | 0 12 * * * max_instances | timeout | 0 live | t self_destruct | f exclusive_execution | f client_name | on_error |
postgres=# select * from execution_log where chain_id=1; -[ RECORD 01 ]------------------------------------------------------------------- chain_id | 1 task_id | 1 txid | 2613 last_run | 2023-09-10 12:00:00.137404+00 finished | 2023-09-10 12:00:00.586543+00 pid | 697146069 returncode | 0 kind | SQL command | REFRESH MATERIALIZED VIEW public.mat_view output | REFRESH MATERIALIZED VIEW client_name | worker001
In conclusion, we can say that pg_timetable is open source and can be used freely by everyone. The main advantages include that pg_timetable is an independent process written in GO, which connects to PostgreSQL just like any other client program. So, if the scheduler crashes, it will not harm your server. Pg_timetable provides a variety of built-in tasks that help you to flexibly combine those operations in an easy way. Further, pg_timetable has been implemented in GO and, therefore, comes as ONE executable that can be started directly. So, there is no need to worry about libraries or dependencies during the installation.
Please refer to the links below to learn more about pg_timetable.
https://github.com/cybertec-postgresql/pg_timetable
https://www.cybertec-postgresql.com/en/products/pg_timetable/
https://pg-timetable.readthedocs.io/en/master/README.html
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.
26
2023
What To Do When a Data File in PostgreSQL Goes Missing
We have faced different levels of corruption related to databases in PostgreSQL. Our colleague has written multiple blogs on the subject; please refer to the below links for more:
- The Ultimate Guide to Database Corruption: Part 1 – An Overview
- The Ultimate Guide to Database Corruption: Part 2 – B-Tree Index Corruption
In this blog, we will be discussing the scenario where a data file related to a table goes missing, maybe due to OS (hardware problem) or due to human interruption, which causes the deletion of some data file unintentionally at the OS level. Though it is not at all recommended to touch the
/data/base/ directory and go through files under this
/var/lib/postgresql/14/main/base/, however, sometimes it happens.
Our current database was running fine with the below structure:
List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description -----------+----------+----------+---------+---------+-----------------------+---------+------------+-------------------------------------------- percona | postgres | UTF8 | C.UTF-8 | C.UTF-8 | | 9561 kB | pg_default | postgres | postgres | UTF8 | C.UTF-8 | C.UTF-8 | | 8553 kB | pg_default | default administrative connection database template0 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | =c/postgres +| 8401 kB | pg_default | unmodifiable empty database | | | | | postgres=CTc/postgres | | | template1 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | =c/postgres +| 8553 kB | pg_default | default template for new databases | | | | | postgres=CTc/postgres | | | (4 rows)
Somehow we are getting the below error message in PostgreSQL logs:
2023-06-14 09:58:06.408 UTC [4056] LOG: listening on IPv4 address "127.0.0.1", port 5432 2023-06-14 09:58:06.412 UTC [4056] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2023-06-14 09:58:06.423 UTC [4057] LOG: database system was shut down at 2023-06-14 09:58:04 UTC 2023-06-14 09:58:06.432 UTC [4056] LOG: database system is ready to accept connections 2023-06-16 10:00:58.130 UTC [35062] postgres@percona ERROR: could not open file "base/16384/16391": No such file or directory 2023-06-16 10:00:58.130 UTC [35062] postgres@percona STATEMENT: select * from test limit 1; 2023-06-16 10:01:59.191 UTC [35224] postgres@percona ERROR: could not open file "base/16384/16391": No such file or directory 2023-06-16 10:01:59.191 UTC [35224] postgres@percona STATEMENT: select * from test limit 1;
Upon checking, we found it was due to one file (
base/16384/16391) being removed. So we need to check whether this
base/16384/16391 file is available in
/base location.
postgres@ip-172-xx-xx-xx:~/14/main$ ls -l base/16384/16391 ls: cannot access 'base/16384/16391': No such file or directory
Also, we can check at the DB level by using this SQL Query:
percona=# SELECT relid, relname FROM pg_catalog.pg_statio_user_tables WHERE relid = '16391'; relid | relname -------+--------- 16391 | test (1 row)
From the above, we have identified that the file for the table “test” with relid 16391 got deleted. We need to identify whether it was deleted manually by mistake or was due to hardware failure.
In case of hardware failure, first, we need to fix the hardware issue or migrate our database to new hardware and then perform a restore, as mentioned below.
To restore, we can follow either of below approaches:
- If the standby server is there and the issue does not persist on that, then we can fix the issue by taking logical backup from the standby server, dropping the table on primary, and recreating it using the backup taken.
- Perform PITR if we have backup configured with proper archive backup.
- Restore database “percona” if we have proper backup configured using pgBackRest backup.
- If we have a table-level logical backup, then we can restore the “test” table backup, and our database will be in good shape.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community in a single distribution, designed and tested to work together.