Today saw the launch of PostgreSQL14. For me, the important area to focus on is what the community has done to improve performance for heavy transactional workloads and additional support for distributed data.
The amount of data that enterprises have to process continues to grow exponentially. Being able to scale up and out to support these workloads on PostgreSQL 14 through better handling of concurrent connections and additional features for query parallelism makes a lot of sense for performance, while the expansion of logical replication will also help.
Performance Enhancements
Server Side Enhancements
Reducing B-tree index bloat
Frequently updated indexes tend to have dead tuples that cause index bloat. Typically, these tuples are removed only when a vacuum is run. Between vacuums, as the page gets filled up, an update or insert will cause a page split – something that is not reversible. This split would occur even though dead tuples inside the existing page could have been removed, making room for additional tuples.
PostgreSQL 14 implements the improvement where dead tuples are detected and removed even between vacuums, allowing for a reduced number of page splits thereby reducing index bloat.
Eagerly delete B-tree pages
Reducing overhead from B-trees, the vacuum system has been enhanced to eagerly remove deleted pages. Previously, it took 2 vacuum cycles to do so, with the first one marking the page as deleted and the second one actually freeing up that space.
Enhancements for Specialized Use Cases
Pipeline mode for libpq
High latency connections with frequent write operations can slow down client performance as libpq waits for each transaction to be successful before sending the next one. With PostgreSQL 14, ‘pipeline mode’ has been introduced to libpq allowing the client to send multiple transactions at the same time, potentially giving a tremendous boost to performance. What’s more – because this is a client-side feature, PostgreSQL 14’s libpq can even be used with older versions of the PostgreSQL server.
Replicating in-progress transactions
Logical replication has been expanded to allow streaming in-progress transactions to subscribers. Large transactions were previously written to disk till the transaction was completed before replicating to the subscriber. By allowing in-progress transactions to be streamed, users gain significant performance benefits along with more confidence in their distributed workloads.
Add LZ4 compression to TOAST
PostgreSQL 14 adds support for LZ4 compression for TOAST, a system used to efficiently store large data. LZ4 is a lossless compression algorithm that focuses on the speed of compression and decompression. LZ4 can be configured at the column as well as the system level. Previously, the only option was pglz compression – which is fast but has a small compression ratio.
Enhancements for Distributed Workloads
The PostgreSQL foreign data wrapper – postgres_fdw – has been instrumental in easing the burdens of handling distributed workloads. With PostgreSQL 14, there are 2 major improvements in the FDW to improve performance for such transactions: Query parallelism can now be leveraged for parallel table scans on foreign tables, and Bulk insert of data is now allowed on foreign tables.
Both these improvements further cement the increasing ability of PostgreSQL to scale horizontally and natively handle distributed databases.
Improvement to SQL parallelism
PostgreSQL’s support for query parallelism allows the system to engage multiple CPU cores in multiple threads to execute queries in parallel, thereby drastically improving performance. PostgreSQL 14 brings more refinement to this system by adding support for RETURN QUERY and REFRESH MATERIALIZED VIEW to execute queries in parallel. Improvements have also been rolled out to the performance of parallel sequential scans and nested loop joins.
Security
SCRAM as default authentication
SCRAM-SHA-256 authentication was introduced in PostgreSQL 10 and has now been made the default in PostgreSQL 14. The previous default MD5 authentication has had some weaknesses that have been exploited in the past. SCRAM is much more powerful, and it allows for easier regulatory compliance for data security.
Predefined roles
Two predefined roles have been added in PostgreSQL 14 – pg_read_all_data and pg_write_all_data. The former makes it convenient to grant read-only access for a user to all tables, views, and schemas in the database. This role will have read access by default to any new tables that are created. The latter makes it convenient to create super-user-styled privileges, which means that one needs to be quite careful when using it!
Convenience for Application Developers
Access JSON using subscripts
From an application development perspective, I have always found support for JSON in PostgreSQL very interesting. PostgreSQL has supported this unstructured data form since version 9.2, but it has had a unique syntax for retrieving data. In version 14, support for subscripts has been added, making it easier for developers to retrieve JSON data using a commonly recognized syntax.
Multirange types
PostgreSQL has had Range types since version 9.2. PostgreSQL 14 now introduces ‘multirange’ support which allows for non-contiguous ranges, helping developers write simpler queries for complex sequences. A simple example of practical use would be specifying the ranges of time a meeting room is booked through the day.
OUT parameters in stored procedures
Stored procedures were added in PostgreSQL 11, giving developers transactional control in a block of code. PostgreSQL 14 implements the OUT parameter, allowing developers to return data using multiple parameters in their stored procedures. This feature will be familiar to Oracle developers and welcome addition for folks trying to migrate from Oracle to PostgreSQL.
Observability
Observability is one of the biggest buzzwords of 2021, as developers want more insight into how their applications are performing over time. PostgreSQL 14 adds more features to help with monitoring, with one of the biggest changes being the move of query hash system from pg_stat_statement to the core database. This allows for monitoring a query using a single ID across several PostgreSQL systems and logging functions. This version also adds new features to track the progress of COPY, WAL activity, and replication slots statistics.
The above is a small subset of the more than 200 features and enhancements that make up the PostgreSQL 14 release. Taken altogether, this version update will help PostgreSQL continue to lead the open source database sector.
As more companies look at migrating away from Oracle or implementing new databases alongside their applications, PostgreSQL is often the best option for those who want to run on open source databases.
Read Our New White Paper: