
There is always a need for profiling tools in databases, whether for admins or developers. While it is easy to see where an SQL statement spends most of its time using EXPLAIN or EXPLAIN ANALYZE in PostgreSQL, the same does not work for functions. Recently, Jobin published a blog post demonstrating how plprofiler can be useful in profiling functions. plprofiler builds call graphs and creates flame graphs, which make its reports very easy to understand. There is another interesting project called plpgsql_check that can be used for a similar purpose as plprofiler, while it also inspects code and points out compilation errors. Let us see all of that in action in this blog post.
Installing plpgsql_check
You can use yum on RedHat/CentOS to install this extension from the PGDG repository. Steps for a source installation on Ubuntu/Debian are shown in the following logs.
On RedHat/CentOS
$ sudo yum install plpgsql_check_11
On Ubuntu/Debian
$ sudo apt-get install postgresql-server-dev-11 libicu-dev gcc make
$ git clone https://github.com/okbob/plpgsql_check.git
$ cd plpgsql_check/
$ make && make install
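After installation, you can confirm that the extension is visible to the server before creating it. This is a quick sanity check run from psql:

```sql
-- Lists the plpgsql_check extension if the installation succeeded
SELECT name, default_version
FROM pg_available_extensions
WHERE name = 'plpgsql_check';
```

If no row comes back, the library was likely installed against a different PostgreSQL version than the one you are connected to.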
Creating and enabling this extension
There are three advantages to using plpgsql_check:
- Checking for compilation errors in function code
- Finding dependencies in functions
- Profiling functions
When using plpgsql_check for the first two tasks, you do not need to add any entry to shared_preload_libraries. However, if you need to use it for profiling functions (3), then you should add the appropriate entries to shared_preload_libraries so that it loads both plpgsql and plpgsql_check. Due to dependencies, plpgsql must appear before plpgsql_check in the shared_preload_libraries config, as in the following example:
shared_preload_libraries = 'plpgsql, plpgsql_check'
Any change to shared_preload_libraries requires a restart. You may see the following error when plpgsql is not listed before plpgsql_check in the shared_preload_libraries config.
$ grep "shared_preload_libraries" $PGDATA/postgresql.auto.conf
shared_preload_libraries = 'pg_qualstats, pg_stat_statements, plpgsql_check'
$ pg_ctl -D /var/lib/pgsql/11/data restart -mf
waiting for server to shut down.... done
server stopped
waiting for server to start....2019-07-07 02:07:10.104 EDT [6423]
FATAL: could not load library "/usr/pgsql-11/lib/plpgsql_check.so": /usr/pgsql-11/lib/plpgsql_check.so: undefined symbol: plpgsql_parser_setup
2019-07-07 02:07:10.104 EDT [6423] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
After correcting the order, the server starts fine:
$ grep "shared_preload_libraries" $PGDATA/postgresql.auto.conf
shared_preload_libraries = 'pg_qualstats, pg_stat_statements, plpgsql, plpgsql_check'
$ pg_ctl -D /var/lib/pgsql/11/data start
.....
server started
Once the PostgreSQL instance has started, create the extension in each database where you need to perform any of the three tasks discussed above.
$ psql -d percona -c "CREATE EXTENSION plpgsql_check"
CREATE EXTENSION
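For a quick smoke test, you can create a deliberately broken function and check it. This is a minimal sketch: the table missing_table is hypothetical and does not need to exist, which is exactly what the checker should report.

```sql
CREATE OR REPLACE FUNCTION broken_fn() RETURNS void AS $$
BEGIN
  -- References a table that does not exist; creation still succeeds
  INSERT INTO missing_table VALUES (1);
END;
$$ LANGUAGE plpgsql;

-- plpgsql_check also ships a text-returning variant of the checker
SELECT * FROM plpgsql_check_function('broken_fn()');
```

The function is created without complaint, and only the checker reveals the missing relation, which is the point the next section demonstrates on a real function.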
Finding Compilation Errors
As discussed earlier, this extension can help developers and admins find compilation errors. But why is it needed? Consider the following example, where the function is created without any reported errors. I have taken this example from my previous blog post about automatic index recommendations using hypopg and pg_qualstats. You might want to read that post to understand the logic of the following function.
percona=# CREATE OR REPLACE FUNCTION find_usable_indexes()
percona-# RETURNS VOID AS
percona-# $$
percona$# DECLARE
percona$# l_queries record;
percona$# l_querytext text;
percona$# l_idx_def text;
percona$# l_bef_exp text;
percona$# l_after_exp text;
percona$# hypo_idx record;
percona$# l_attr record;
percona$# /* l_err int; */
percona$# BEGIN
percona$# CREATE TABLE IF NOT EXISTS public.idx_recommendations (queryid bigint,
percona$# query text, current_plan jsonb, recmnded_index text, hypo_plan jsonb);
percona$# FOR l_queries IN
percona$# SELECT t.relid, t.relname, t.queryid, t.attnames, t.attnums,
percona$# pg_qualstats_example_query(t.queryid) as query
percona$# FROM
percona$# (
percona$# SELECT qs.relid::regclass AS relname, qs.relid AS relid, qs.queryid,
percona$# string_agg(DISTINCT attnames.attnames,',') AS attnames, qs.attnums
percona$# FROM pg_qualstats_all qs
percona$# JOIN pg_qualstats q ON q.queryid = qs.queryid
percona$# JOIN pg_stat_statements ps ON q.queryid = ps.queryid
percona$# JOIN pg_amop amop ON amop.amopopr = qs.opno
percona$# JOIN pg_am ON amop.amopmethod = pg_am.oid,
percona$# LATERAL
percona$# (
percona$# SELECT pg_attribute.attname AS attnames
percona$# FROM pg_attribute
percona$# JOIN unnest(qs.attnums) a(a) ON a.a = pg_attribute.attnum
percona$# AND pg_attribute.attrelid = qs.relid
percona$# ORDER BY pg_attribute.attnum) attnames,
percona$# LATERAL unnest(qs.attnums) attnum(attnum)
percona$# WHERE NOT
percona$# (
percona$# EXISTS
percona$# (
percona$# SELECT 1
percona$# FROM pg_index i
percona$# WHERE i.indrelid = qs.relid AND
percona$# (arraycontains((i.indkey::integer[])[0:array_length(qs.attnums, 1) - 1],
percona$# qs.attnums::integer[]) OR arraycontains(qs.attnums::integer[],
percona$# (i.indkey::integer[])[0:array_length(i.indkey, 1) + 1]) AND i.indisunique)))
percona$# GROUP BY qs.relid, qs.queryid, qs.qualnodeid, qs.attnums) t
percona$# GROUP BY t.relid, t.relname, t.queryid, t.attnames, t.attnums
percona$# LOOP
percona$# /* RAISE NOTICE '% : is queryid',l_queries.queryid; */
percona$# execute 'explain (FORMAT JSON) '||l_queries.query INTO l_bef_exp;
percona$# execute 'select hypopg_reset()';
percona$# execute 'SELECT indexrelid,indexname FROM hypopg_create_index(''CREATE INDEX on '||l_queries.relname||'('||l_queries.attnames||')'')' INTO hypo_idx;
percona$# execute 'explain (FORMAT JSON) '||l_queries.query INTO l_after_exp;
percona$# execute 'select hypopg_get_indexdef('||hypo_idx.indexrelid||')' INTO l_idx_def;
percona$# INSERT INTO public.idx_recommendations (queryid,query,current_plan,recmnded_index,hypo_plan)
percona$# VALUES (l_queries.queryid,l_querytext,l_bef_exp::jsonb,l_idx_def,l_after_exp::jsonb);
percona$# END LOOP;
percona$# execute 'select hypopg_reset()';
percona$# END;
percona$# $$ LANGUAGE plpgsql;
CREATE FUNCTION
As the above log shows, the function was created with no errors. Normally, we would not know whether it contains compilation errors until we call it. Surprisingly, with this extension, we can use the plpgsql_check_function_tb() function to learn whether there are any errors, without actually calling it.
percona=# SELECT functionid, lineno, statement, sqlstate, message, detail, hint, level, position,
left (query,60),context FROM plpgsql_check_function_tb('find_usable_indexes()');
-[ RECORD 1 ]------------------------------------------------------------
functionid | find_usable_indexes
lineno | 14
statement | FOR over SELECT rows
sqlstate | 42P01
message | relation "pg_qualstats_all" does not exist
detail |
hint |
level | error
position | 306
left | SELECT t.relid, t.relname, t.queryid, t.attnames, t.attnums,
context |
From the above log, it is clear that a relation used in the function does not exist. However, if the function assembles dynamic SQL at runtime, false positives are possible, as the following example shows. So you may want to exclude functions that use dynamic SQL, or comment out the code containing those statements, before checking for other compilation errors.
percona=# select * from plpgsql_check_function_tb('find_usable_indexes()');
-[ RECORD 1 ]------------------------------------------------------------------------------
functionid | find_usable_indexes
lineno | 50
statement | EXECUTE
sqlstate | 00000
message | cannot determinate a result of dynamic SQL
detail | There is a risk of related false alarms.
hint | Don't use dynamic SQL and record type together, when you would check function.
level | warning
position |
query |
context |
-[ RECORD 2 ]------------------------------------------------------------------------------
functionid | find_usable_indexes
lineno | 52
statement | EXECUTE
sqlstate | 55000
message | record "hypo_idx" is not assigned yet
detail | The tuple structure of a not-yet-assigned record is indeterminate.
hint |
level | error
position |
query |
context | SQL statement "SELECT 'select hypopg_get_indexdef('||hypo_idx.indexrelid||')'"
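Rather than checking functions one by one, you can run the checker across every plpgsql function in user schemas. The following is a sketch of that pattern (adapted from the usage documented in the plpgsql_check project; trigger functions may need special handling, and you can adjust the schema filter to taste):

```sql
-- Report every warning/error across all plpgsql functions
-- outside the system schemas
SELECT p.oid::regprocedure AS function, cf.level, cf.message
FROM pg_proc p
JOIN pg_language l ON p.prolang = l.oid
JOIN pg_namespace n ON p.pronamespace = n.oid,
LATERAL plpgsql_check_function_tb(p.oid::regprocedure) cf
WHERE l.lanname = 'plpgsql'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');
```

This is handy as a pre-deployment gate: an empty result set means none of your functions tripped the checker.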
Finding Dependencies
This extension can also be used to find the objects a function depends on. This makes it easy for administrators to understand which objects a function uses without spending a huge amount of time reading the code. The trick is simply to pass your function as a parameter to plpgsql_show_dependency_tb(), as in the following example.
percona=# select * from plpgsql_show_dependency_tb('find_usable_indexes()');
type | oid | schema | name | params
----------+-------+------------+----------------------------+-----------
FUNCTION | 16547 | public | pg_qualstats | ()
FUNCTION | 16545 | public | pg_qualstats_example_query | (bigint)
FUNCTION | 16588 | public | pg_stat_statements | (boolean)
RELATION | 2601 | pg_catalog | pg_am |
RELATION | 2602 | pg_catalog | pg_amop |
RELATION | 1249 | pg_catalog | pg_attribute |
RELATION | 1262 | pg_catalog | pg_database |
RELATION | 2610 | pg_catalog | pg_index |
RELATION | 16480 | public | idx_recommendations |
RELATION | 16549 | public | pg_qualstats |
RELATION | 16559 | public | pg_qualstats_all |
RELATION | 16589 | public | pg_stat_statements |
(12 rows)
Profiling Functions
This is one of the most interesting features. Once you have added the appropriate entries to shared_preload_libraries as discussed earlier, you can easily enable or disable profiling through a GUC: plpgsql_check.profiler. This parameter can be set either globally or for just your session. Here is an example to understand it better.
percona=# ALTER SYSTEM SET plpgsql_check.profiler TO 'ON';
ALTER SYSTEM
percona=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
When set globally, all functions running in the database are automatically profiled and the results stored. This may be fine in a development or testing environment, but you can also restrict profiling to functions called in a single session through a session-level setting, as in the following example.
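The salary_update() function used below is not shown in full in this post. A definition consistent with the profiler's source column might look like the following (a hypothetical reconstruction; the employee table, its columns, and the sal_raise parameter name are assumptions based on the profiled source lines):

```sql
CREATE OR REPLACE FUNCTION salary_update(sal_raise INT)
RETURNS void AS $$
DECLARE
l_abc record;
new_sal INT;
BEGIN
FOR l_abc IN EXECUTE 'SELECT emp_id, salary FROM employee where emp_id between 1 and 10000 and dept_id = 2'
LOOP
    new_sal := l_abc.salary + sal_raise;
    UPDATE employee SET salary = new_sal WHERE emp_id = l_abc.emp_id;
END LOOP;
END;
$$ LANGUAGE plpgsql;
```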
percona=# BEGIN;
BEGIN
percona=# SET plpgsql_check.profiler TO 'ON';
SET
percona=# select salary_update(1000);
salary_update
---------------
(1 row)
percona=# select lineno, avg_time, source from plpgsql_profiler_function_tb('salary_update(int)');
lineno | avg_time | source
--------+----------+---------------------------------------------------------------------------------------------------------------
1 | |
2 | | DECLARE
3 | | l_abc record;
4 | | new_sal INT;
5 | 0.005 | BEGIN
6 | 144.96 | FOR l_abc IN EXECUTE 'SELECT emp_id, salary FROM employee where emp_id between 1 and 10000 and dept_id = 2'
7 | | LOOP
8 | NaN | new_sal := l_abc.salary + sal_raise;
9 | NaN | UPDATE employee SET salary = new_sal WHERE emp_id = l_abc.emp_id;
10 | | END LOOP;
11 | | END;
(11 rows)
--- Create an Index and check if it improves the execution time of FOR loop.
percona=# CREATE INDEX idx_1 ON employee (emp_id, dept_id);
CREATE INDEX
percona=# select salary_update(1000);
salary_update
---------------
(1 row)
percona=# select lineno, avg_time, source from plpgsql_profiler_function_tb('salary_update(int)');
lineno | avg_time | source
--------+----------+---------------------------------------------------------------------------------------------------------------
1 | |
2 | | DECLARE
3 | | l_abc record;
4 | | new_sal INT;
5 | 0.007 | BEGIN
6 | 73.074 | FOR l_abc IN EXECUTE 'SELECT emp_id, salary FROM employee where emp_id between 1 and 10000 and dept_id = 2'
7 | | LOOP
8 | NaN | new_sal := l_abc.salary + sal_raise;
9 | NaN | UPDATE employee SET salary = new_sal WHERE emp_id = l_abc.emp_id;
10 | | END LOOP;
11 | | END;
(11 rows)
percona=# END;
COMMIT
percona=# show plpgsql_check.profiler;
plpgsql_check.profiler
------------------------
on
(1 row)
In the above log, I opened a transaction block and enabled plpgsql_check.profiler only for that block, so every function executed within the transaction was profiled, and the results can be extracted using plpgsql_profiler_function_tb(). Once we identify where the execution time is high, the immediate action is to tune that piece of code. After creating the index, I called the function again: the average time for the FOR loop dropped from roughly 145 to 73, about twice as fast as before.
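Between runs, you may want to clear previously collected statistics so that timings from earlier executions do not skew the averages. The plpgsql_check project documents reset functions for this purpose (function names as documented upstream; verify them against your installed version):

```sql
-- Clear collected profile data for a single function
SELECT plpgsql_profiler_reset('salary_update(int)'::regprocedure);

-- Or clear everything collected so far
SELECT plpgsql_profiler_reset_all();
```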
Conclusion
Special thanks to Pavel Stehule, who authored this extension, and to the contributors who have brought it to its current, usable state. This is one of the simplest extensions for checking compilation errors and finding dependencies. While it is also a handy profiling tool, a developer may find both plprofiler and plpgsql_check helpful for profiling.