Apr
05
2019
--

Writing PostgreSQL Extensions is Fun – C Language

postgresql extensions

PostgreSQL is a powerful open source relational database management system. It extends the SQL language with additional features. A DBMS is not only defined by its performance and out of the box features, but also its ability to support bespoke/additional user-specific functionality. Some of these functionalities may be in the form of database constructs or modules, like stored procedures or functions, but their scope is generally limited to the functionality being exposed by the DBMS. For instance, how will you write a custom query-analyzing application that resides within your DBMS?

To support such options, PostgreSQL provides a pluggable architecture that allows you to install extensions. Extensions may consist of a configuration (control) file, a combination of SQL files, and dynamically loadable libraries.

This means you can write your own code as per the defined guidelines of an extension and plug it in a PostgreSQL instance without changing the actual PostgreSQL code tree. An extension by very definition extends what PostgreSQL can do, but more than that, it gives you the ability to interact with external entities. These external entities can be other database management systems like ClickHouse, Mongo or HDFs (normally these are called foreign data wrappers), or other interpreters or compilers (thus allowing us to write database functions in another language like Java, Python, Perl or TCL, etc.). Another potential use case of an extension can be for code obfuscation which allows you to protect your super secret code from prying eyes.

Build your own

To build your own extension, you don’t need a complete PostgreSQL code base. You can build and install an extension using installed PostgreSQL (it may require you to install a devel RPM or Debian package). Details about extensions can be found in PostgreSQL’s official documentation[1]. There many extensions available for different features in the contrib directory of PostgreSQL source. Other than the contrib directory, people are also writing extensions readily available on the internet but currently not part of the PostgreSQL source tree. The pg_stat_statements, PL/pgSQL, and PostGIS are examples of the best known or most widely used extensions.

Generally available PostgreSQL extensions may be classified into four main categories:

  • Add support of a new language extension (PL/pgSQL, PL/Python, and PL/Java)
  • Data type extensions where you can introduce new ((Hstore, cube, and hstore)
  • Miscellaneous extensions (contrib folder has many miscellaneous extensions)
  • Foreign Data Wrapper extension (postgres_fdw, mysqldb_fdw, clickhousedb_fdw)

There are four basic file types that are required for building an extension:

  • Makefile: Which uses PGXS PostgreSQL’s build infrastructure for extensions.
  • Control File: Carries information about the extension.
  • SQL File(s): If the extension has any SQL code, it may reside in form SQL files (optional)
  • C Code: The shared object that we want to build (optional).

Extension Makefile

To compile the C code, we need a makefile. It’s a very simple makefile with the exception of “PGXS”, which is PostgreSQL’s infrastructural makefile for creating extensions. The inclusion of “PGXS” is done by invoking pg_config binary with “–pgxs” flag. The detail of that file can be found at GitHub[2].

This is a sample makefile, which can be used to compile the C code.

EXTENSION = log
MODULE_big = log
DATA = log--0.0.1.sql
OBJS = log.o
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

Extension Control File

This file must be named as [EXTENSION NAME]. control file. The control file can contain many options which can be found at official documentation [3]. But in this example, I have used some basic options.

comments: Comments about the extension.

default_version: This is the extension SQL version. The name of the SQL file contains this information in the file name.

relocatable:  Tells PostgreSQL if it is possible to move contained objects into a different schema.

module_pathname:  This information is replaced with the actual lib file path.

comment = 'PostgreSQL Utility Command Logger
'default_version = '0.0.1'
relocatable = true
module_pathname = '$libdir/log'

The contents of the file can be seen using the psql command \dx psql.

postgres=# \dx log      
List of installed extensions Name   | Version | Schema |       Description
------------------------------------+---------+--------+----------------------------------
log                                 | 0.0.1   | public | PostgreSQL Utility Command Logger

Extension SQL File

This is a mapping file, which I used to map the PostgreSQL function with the corresponding C function. Whenever you call the SQL function then the corresponding C function is called. The name of the file must be [EXTENSION NAME]–[default-version].sql. This is the same default_version as defined in the control file.

CREATE FUNCTION pg_all_queries(OUT query TEXT, pid OUT TEXT)
RETURNS SETOF RECORDAS 'MODULE_PATHNAME',
'pg_all_queries'
LANGUAGE C STRICT VOLATILE;

Extension C Code

There are three kinds of functions you can write in c code.

The first is where you call your c code function using SQL function written in SQL file of the extension.

The second type of function is callbacks. You register that callback by assigning the pointer of the function. There is no need for an SQL function here. You call this function automatically when a specific event occurs.

The third type of function is called automatically, even without registering. These functions are called on events such as extension load/unload time etc.

This is the C file containing the definition of the C code. There is no restriction for the name, or of the number of C files.

#include "postgres.h"
/* OS Includes */
/* PostgreSQL Includes */
PG_MODULE_MAGIC;
void _PG_init(void);
void _PG_fini(void);
Datum pg_all_queries(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(pg_all_queries);
static void process_utility(PlannedStmt *pstmt, const char *queryString,ProcessUtilityContext context,ParamListInfo params,QueryEnvironment *queryEnv,DestReceiver *dest,       char *completionTag);

You need to include postgres.h for the extension. You can include other PostgreSQL as needed. PG_MODULE_MAGIC is macro which you need to include in the C file for the extension. _PG_init and _PG_fini are functions that are called when the extension is loaded or unloaded, respectively.

Here is an example of the load and unload functions of an extension.

void _PG_init(void)
{
    /* ... C code here at time of extension loading ... */
    ProcessUtility_hook = process_utility;
}
Void _PG_fini(void)
{
    /* ... C code here at time of extension unloading ... */
}

Here is an example of a callback function that you can invoke whenever you call a utility statement e.g. any DDL statement. The “queryString” variable contains the actual query text.

static void process_utility(PlannedStmt *pstmt,
                           const char *queryString,
                           ProcessUtilityContext context,
                           ParamListInfo params,
                           QueryEnvironment *queryEnv,DestReceiver *dest,
                           char *completionTag)
{
    /* ... C code here ... */
    standard_ProcessUtility(pstmt,   
                            queryString,   
                            context,   
                            params,   
                            queryEnv,   
                            dest,
                            completionTag);
    /* ... C code here ... */
}

Finally, here’s an example of a C function that is invoked through a user-defined SQL function. This internally calls the C function contained in our shared object.

Datum pg_all_queries(PG_FUNCTION_ARGS)
{
    /* ... C code here ... */
    tupstore = tuplestore_begin_heap(true, false, work_mem);
    /* ... C code here ... */     
    values[0] = CStringGetTextDatum(query);
    values[1] = CStringGetTextDatum(pid);
    /* ... C code here ... */
    tuplestore_donestoring(tupstore);
    return (Datum) 0;
}

Compile and Install

Before compilation, you need to set the PATH for PostgreSQL’s bin directory if there is no pg_config available in the system paths.

export PATH=/usr/local/pgsql/bin:$PATH

make USE_PGXS=1

make USE_PGXS=1 install

Output

We can now use our extension through a simple SQL query. Following is the output that is coming straight out of extension written in C programming language.

postgres=# select * from pg_all_queries();          
query                      | pid
--------------------------+|------
create table foo(a int);  +| 8196
create table bar(a int);  +| 8196
drop table foo;           +| 8196
(3 rows)

I hope this example can serve as a starting point for you to create more useful extensions that will not only help you and your company, but will also give you an opportunity to share and help the PostgreSQL community grow.

The complete example can be found at Github[4].

[1]: https://www.postgresql.org/docs/current/external-extensions.html

[2]: https://github.com/postgres/postgres/blob/master/src/makefiles/pgxs.mk

[3]: https://www.postgresql.org/docs/9.1/extend-extensions.html

[4]: https://github.com/ibrarahmad/Blog-Examples/tree/master/log


Photo from Pexels

Oct
05
2018
--

PostgreSQL Extensions for an Enterprise-Grade System

PostgreSQL extensions for logging

PostgreSQL® logoIn this current series of blog posts we have been discussing various relevant aspects when building an enterprise-grade PostgreSQL setup, such as security, back up strategy, high availability, and different methods to scale PostgreSQL. In this blog post, we’ll get to review some of the most popular open source extensions for PostgreSQL, used to expand its capabilities and address specific needs. We’ll cover some of them during a demo in our upcoming webinar on October 10.

Expanding with PostgreSQL Extensions

PostgreSQL is one of the world’s most feature-rich and advanced open source RDBMSs. Its features are not just limited to those released by the community through major/minor releases. There are hundreds of additional features developed using the extensions capabilities in PostgreSQL, which can cater to needs of specific users. Some of these extensions are very popular and useful to build an enterprise-grade PostgreSQL environment. We previously blogged about a couple of FDW extensions (mysql_fdw and postgres_fdw ) which will allow PostgreSQL databases to talk to remote homogeneous/heterogeneous databases like PostgreSQL and MySQL, MongoDB, etc. We will now cover a few other additional extensions that can expand your PostgreSQL server capabilities.

pg_stat_statements

The pg_stat_statements module provides a means for tracking execution statistics of all SQL statements executed by a server. The statistics gathered by the module are made available via a view named pg_stat_statements. This extension must be installed in each of the databases you want to track, and like many of the extensions in this list, it is available in the contrib package from the PostgreSQL PGDG repository.

pg_repack

Tables in PostgreSQL may end up with fragmentation and bloat due to the specific MVCC implementation in PostgreSQL, or simply due to a high number of rows being naturally removed. This could lead to not only unused space being held inside the table but also to sub-optimal execution of SQL statements. pg_repack is the most popular way to address this problem by reorganizing and repacking the table. It can reorganize the table’s content without placing an exclusive lock on it during the process. DMLs and queries can continue while repacking is happening.  Version 1.2 of pg_repack introduces further new features of parallel index builds, and the ability to rebuild just the indexes. Please refer to the official documentation for more details.

pgaudit

PostgreSQL has a basic statement logging feature. It can be implemented using the standard logging facility with

log_statement = all

 . But this is not sufficient for many audit requirements. One of the essential features for enterprise deployments is the capability for fine-grained auditing the user interactions/statements issued to the database. This is a major compliance requirement for many security standards. The pgaudit extension caters to these requirements.

The PostgreSQL Audit Extension (pgaudit) provides detailed session and/or object audit logging via the standard PostgreSQL logging facility. Please refer to the settings section of its official documentation for more details.

pldebugger

This is a must-have extension for developers who work on stored functions written in PL/pgSQL. This extension is well integrated with GUI tools like pgadmin, which allows developers to step through their code and debug it. Packages for pldebugger are also available in the PGDG repository and installation is straightforward.Once it is set up, we can step though and debug the code remotely.

The official git repo is available here

plprofiler

This is a wonderful extension for finding out where the code is slowing down. This is very helpful, particularly during complex migrations from proprietary databases, like from Oracle to PostgreSQL, which affect application performance. This extension can prepare a report on the overall execution time and tables representation, including flamegraphs, with clear information about each line of code. This extension is not, however, available from the PGDG repo: you will need to build it from source. Details on building and installing plprofiler will be covered in a future blog post. Meanwhile, the official repository and documentation is available here

PostGIS

PostGIS is arguably the most versatile implementation of the specifications of the Open Geospatial Consortium. We can see a large list of features in PostGIS that are rarely available in any other RDBMSs.

There are many users who have primarily opted to use PostgreSQL because of the features supported by PostGIS. In fact, all these features are not implemented as a single extension, but are instead delivered by a collection of extensions. This makes PostGIS one of the most complex extensions to build from source. Luckily, everything is available from the PGDG repository:

$ sudo yum install postgis24_10.x86_64

Once the postgis package is installed, we are able to create the extensions on our target database:

postgres=# CREATE EXTENSION postgis;
CREATE EXTENSION
postgres=# CREATE EXTENSION postgis_topology;
CREATE EXTENSION
postgres=# CREATE EXTENSION postgis_sfcgal;
CREATE EXTENSION
postgres=# CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION
postgres=# CREATE EXTENSION postgis_tiger_geocoder;
CREATE EXTENSION
postgres=# CREATE EXTENSION address_standardizer;
CREATE EXTENSION

Language Extensions : PL/Python, PL/Perl, PL/V8,PL/R etc.

Another powerful feature of PostgreSQL is its programming languages support. You can code database functions/procedures in pretty much every popular language.

Thanks to the enormous number of libraries available, which includes machine learning ones, and its vibrant community, Python has claimed the third spot amongst the most popular languages of choice according to the TIOBE Programming index. Your team’s skills and libraries remain valid for PostgreSQL server coding too! Teams that regularly code in JavaScript for Node.js or Angular can easily write PostgreSQL server code in PL/V8. All of the packages required are readily available from the PGDG repository.

cstore_fdw

cstore_fdw is an open source columnar store extension for PostgreSQL. Columnar stores provide notable benefits for analytics use cases where data is loaded in batches. Cstore_fdw’s columnar nature delivers performance by only reading relevant data from disk. It may compress data by 6 to 10 times to reduce space requirements for data archive. The official repository and documentation is available here

HypoPG

HypoPG is an extension for adding support for hypothetical indexes – that is, without actually adding the index. This helps us to answer questions such as “how will the execution plan change if there is an index on column X?”. Installation and setup instructions are part of its official documentation

mongo_fdw

Mongo_fdw presents collections from mongodb as tables in PostgreSQL. This is a case where the NoSQL world meets the SQL world and features combine. We will be covering this extension in a future blog post. The official repository is available here

tds_fdw

Another important FDW (foreign data wrapper) extension in the PostgreSQL world is tds_fdw. Both Microsoft SQL Server and Sybase uses TDS (Tabular Data Stream) format. This fdw allows PostgreSQL to use tables stored in remote SQL Server or Sybase database as local tables. This FDW make use of FreeTDS libraries.

orafce

As previously mentioned, there are lot of migrations underway from Oracle to PostgreSQL. Incompatible functions in PostgreSQL are often painful for those who are migrating server code. The “orafce” project implements some of the functions from the Oracle database. The functionality was verified on Oracle 10g and the module is useful for production work. Please refer to the list in its official documentation about the Oracle functions implemented in PostgreSQL

TimescaleDB

In this new world of IOT and connected devices, there is a growing need of time-series data. Timescale can convert PostgreSQL into a scalable time-series data store. The official site is available here with all relevant links.

pg_bulkload

Is loading a large volume of data into database in a very efficient and faster way a challenge for you? If so pg_bulkload may help you solve that problem. Official documentation is available here

pg_partman

PostgreSQL 10 introduced declarative partitions. But creating new partitions and maintaining existing ones, including purging unwanted partitions, requires a good dose of manual effort. If you are looking to automate part of this maintenance you should have a look at what pg_partman offers. The repository with documentation is available here.

wal2json

PostgreSQL has feature related to logical replication built-in. Extra information is recorded in WALs which will facilitate logical decoding. wal2json is a popular output plugin for logical decoding. This can be utilized for different purposes including change data capture. In addition to wal2json, there are other output plugins: a concise list is available in the PostgreSQL wiki.

There are many more extensions that help us build an enterprise-grade PostgreSQL set up using open source solutions. Please feel free to comment and ask us if we know about one that satisfies your particular needs. Or, if there’s still time, sign up for our October webinar and ask us in person!

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com