This blog was originally published in August 2018 and was updated in May 2023.
Out of the box, the default PostgreSQL configuration is not tuned for any particular workload. The default values are deliberately conservative, chosen so that PostgreSQL runs everywhere, consumes as few resources as possible, and does not expose the system to unnecessary risk. It is primarily the responsibility of the database administrator or developer to tune PostgreSQL according to their system’s workload. In this blog post, we will delve into the world of PostgreSQL performance tuning and establish some basic guidelines for setting database parameters to enhance the performance of your PostgreSQL database, based on the workload.
What is PostgreSQL performance tuning?
PostgreSQL performance optimization aims to improve the efficiency of a PostgreSQL database system by adjusting configurations and implementing best practices to identify and resolve bottlenecks, improve query speed, and maximize database throughput and responsiveness.
Key areas include:
- Configuration parameter tuning: This tuning involves altering variables such as memory allocation, disk I/O settings, and concurrent connections based on specific hardware and requirements.
- Query optimization: Analyzing query execution plans, identifying slow queries, and optimizing them through appropriate indexing techniques, query rewriting, or utilizing advanced features like partial indexes or materialized views.
- Hardware optimization: You need to ensure that the CPU, memory, and storage components meet the performance requirements of the database workload.
- Statistics and monitoring: By enabling and analyzing performance statistics and utilizing monitoring tools such as Percona Monitoring and Management you can identify bottlenecks and track query performance over time.
- Index tuning: PostgreSQL offers various types of indexes, including B-tree, hash, GIN (generalized inverted index), and GiST (generalized search tree). Selecting the appropriate index type, creating composite indexes, and regularly analyzing and reindexing the database can substantially improve query speed.
- Schema design: Evaluating the database schema design and making adjustments such as partitioning large tables, eliminating redundant data, and denormalizing tables for frequently accessed information can improve performance.
- Connection pooling: Implementing connection pooling and caching strategies to minimize connection overhead and improve response times for frequently accessed data.
- Load balancing and replication: Scaling the database system horizontally by distributing the workload across multiple servers using techniques like connection pooling, read replicas, or implementing a primary-replica replication setup.
- Memory management: Since PostgreSQL’s performance relies heavily on efficient memory usage, monitoring and adjusting key memory-related configuration parameters like shared_buffers, work_mem, and effective_cache_size can have a significant impact on performance.
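A quick way to see where a given installation currently stands is to query the pg_settings catalog view. The following is a minimal sketch run from psql; the parameter list is just a sample of the memory-related settings discussed later in this post:
testdb=# SELECT name, setting, unit, context
testdb-# FROM pg_settings
testdb-# WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size', 'maintenance_work_mem');
The context column indicates how a change takes effect: postmaster means a server restart is required, sighup means a configuration reload is enough, and user settings can be changed per session.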
PostgreSQL performance optimization is an ongoing process involving monitoring, benchmarking, and adjustments to maintain high-performing PostgreSQL databases. In addition, staying up-to-date on PostgreSQL releases and improvements can also help.
Why is PostgreSQL performance tuning important?
The performance of a PostgreSQL database has a significant impact on the overall effectiveness of an application. Here’s why it’s crucial for businesses to possess a high-performing database:
Responsiveness: Imagine waiting forever for the data you need immediately to show up. Nobody has time for that! Slow queries and data retrieval can lead to frustrating delays that impact the user experience. To ensure seamless interactions, a well-performing database is essential, delivering quick response times and satisfying user expectations.
Throughput: The throughput of an application is directly influenced by the speed at which the database can process and serve queries. A high-performance database significantly decreases query execution time, empowering the application to handle more concurrent requests and deliver data faster.
Scalability: As an application expands and needs to handle more data and user loads, the database must scale accordingly. A well-performing PostgreSQL database can effectively manage increased workloads, ensuring the application remains responsive and performs well even when under heavy usage.
Efficient Resource Utilization: Enabling the effective utilization of system resources like CPU, memory, and disk I/O can optimize your PostgreSQL database while maintaining functionality. This not only results in cost savings by minimizing hardware requirements but also has the potential to decrease cloud expenses.
Data Integrity: This is crucial for ensuring a reliable and efficient database. By incorporating features like ACID compliance, transaction management, and strong error handling, the database can safeguard against data corruption or loss. This is particularly important for businesses that rely on precise and consistent data for their decision-making and operational needs.
Competitive Advantage: User retention rates and customer satisfaction can make or break any business. A high-performing database that consistently ensures excellent application performance can give businesses a competitive advantage.
Watch the PostgreSQL high-performance tuning and optimization webinar sessions
Understanding the impact of queries on PostgreSQL performance
Bear in mind that while optimizing PostgreSQL server configuration improves performance, a database developer must also be diligent when writing queries for the application. If queries perform full table scans where an index could be used or perform heavy joins or expensive aggregate operations, then the system can still perform poorly even if the database parameters are tuned. It is important to pay attention to performance when writing database queries.
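As a minimal illustration (the orders table and customer_id column here are hypothetical), EXPLAIN shows whether a query is forced into a full table scan and whether an index changes the plan:
testdb=# EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
testdb=# CREATE INDEX orders_customer_id_idx ON orders (customer_id);
testdb=# EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
If the first plan shows a Seq Scan with a filter on customer_id, the second plan will typically switch to an Index Scan or Bitmap Heap Scan once the index exists, assuming the table is large enough and its statistics are up to date.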
Nevertheless, database parameters are very important, too, so let’s take a look at the eight that have the greatest potential to improve performance.
Check out this blog to learn how to improve PostgreSQL query performance insights with pg_stat_monitor.
Tuneable PostgreSQL parameters
Below are some PostgreSQL parameters that can be adjusted for improved performance based on your system and specific workload.
shared_buffers
PostgreSQL uses its own buffer and also uses kernel-buffered IO. That means data is stored in memory twice, first in the PostgreSQL buffer and then in the kernel buffer, because, unlike some other databases, PostgreSQL does not use direct IO. This is called double buffering. The PostgreSQL buffer is controlled by shared_buffers, which is the most effective tunable parameter on most operating systems. This parameter sets how much dedicated memory PostgreSQL uses for its cache.
The default value of shared_buffers is set very low, and you will not benefit much from it. It is kept low because certain machines and operating systems do not support higher values, but on most modern machines you need to increase it for optimal performance.
The recommended value is 25% of your total machine RAM, but you should try lower and higher values as well, because in some cases good performance is achieved with a setting above 25%. The right configuration really depends on your machine and the working data set. If your working set of data fits easily in RAM, you might increase shared_buffers enough to hold the entire database so that the whole working set resides in cache. That said, you obviously do not want to reserve all RAM for PostgreSQL.
In production environments, a large value for shared_buffers has been observed to give really good performance, though you should always benchmark to find the right balance.
testdb=# SHOW shared_buffers;
shared_buffers
----------------
128MB
(1 row)
Note: Be careful, as some kernels do not allow a bigger value; on Windows specifically, there is no benefit to a higher value.
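As an illustration, on a machine with 16GB of RAM the 25% guideline works out to roughly 4GB; the exact value is an assumption you should adjust and benchmark for your own hardware. Because shared_buffers can only change at server start, the new value takes effect after a restart, and ALTER SYSTEM records it in postgresql.auto.conf:
testdb=# ALTER SYSTEM SET shared_buffers = '4GB';
ALTER SYSTEM
After restarting the server (for example, with pg_ctl restart, or systemctl restart postgresql on many Linux systems), SHOW shared_buffers; should report the new value.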
wal_buffers
PostgreSQL writes its WAL (write-ahead log) records into buffers, and these buffers are then flushed to disk. The default setting of wal_buffers is -1, which sizes the buffer automatically at 1/32 of shared_buffers, up to the size of one WAL segment (typically 16MB). If you have a lot of concurrent connections, a higher value can give better performance.
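Like shared_buffers, wal_buffers can only be changed at server start. A minimal sketch, where '64MB' is just an illustrative value for a write-heavy workload with many concurrent sessions:
testdb=# ALTER SYSTEM SET wal_buffers = '64MB';
ALTER SYSTEM
The change takes effect after the next server restart.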
effective_cache_size
The effective_cache_size provides an estimate of the memory available for disk caching. It is just a guideline, not the exact allocated memory or cache size. It does not allocate actual memory; it only tells the optimizer how much cache is likely to be available in the kernel and in shared_buffers. If this value is set too low, the query planner may decide not to use some indexes, even when they would be helpful. A generously sized value (a common guideline is 50% to 75% of total RAM) therefore generally helps the planner.
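Because effective_cache_size is only a planner hint, changing it is cheap and takes effect without a restart. A minimal sketch, assuming a machine with 16GB of RAM that is mostly dedicated to PostgreSQL (the '12GB' value is illustrative):
testdb=# ALTER SYSTEM SET effective_cache_size = '12GB';
ALTER SYSTEM
testdb=# SELECT pg_reload_conf();
It can also be set per session with SET effective_cache_size = '12GB'; when testing how a specific query plan reacts.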
work_mem
This parameter sets the amount of memory available to each sort or hash operation. If your queries do complex sorting, increasing work_mem helps, because in-memory sorts are much faster than sorts that spill to disk. Be careful, though: a very high value can cause a memory bottleneck, because the limit applies per sort operation rather than per connection. If many users run sorting queries concurrently, the system may end up allocating roughly
work_mem * total concurrent sort operations
across all sessions. Setting a large value globally can therefore lead to very high memory usage, so it is highly recommended to raise work_mem at the session level instead.
testdb=# SET work_mem TO '2MB';
testdb=# EXPLAIN SELECT * FROM bar ORDER BY bar.b;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Gather Merge  (cost=509181.84..1706542.14 rows=10000116 width=24)
   Workers Planned: 4
   ->  Sort  (cost=508181.79..514431.86 rows=2500029 width=24)
         Sort Key: b
         ->  Parallel Seq Scan on bar  (cost=0.00..88695.29 rows=2500029 width=24)
(5 rows)
The initial query’s sort node has an estimated cost of 514431.86. A cost is an arbitrary unit of computation. For the above query, we have a work_mem of only 2MB. For testing purposes, let’s increase this to 256MB and see if there is any impact on cost.
testdb=# SET work_mem TO '256MB';
testdb=# EXPLAIN SELECT * FROM bar ORDER BY bar.b;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Gather Merge  (cost=355367.34..1552727.64 rows=10000116 width=24)
   Workers Planned: 4
   ->  Sort  (cost=354367.29..360617.36 rows=2500029 width=24)
         Sort Key: b
         ->  Parallel Seq Scan on bar  (cost=0.00..88695.29 rows=2500029 width=24)
(5 rows)
The query cost is reduced to 360617.36 from 514431.86 — a 30% reduction.
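Plan costs are only estimates. To confirm whether a sort actually spilled to disk, run the query with EXPLAIN (ANALYZE) and look at the Sort Method line in the output:
testdb=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM bar ORDER BY bar.b;
With work_mem at 2MB, the Sort node will typically report Sort Method: external merge along with the disk space used, while at 256MB the same sort can complete in memory and report Sort Method: quicksort with the memory used.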
maintenance_work_mem
maintenance_work_mem sets the amount of memory available for maintenance tasks. The default value is 64MB. Setting a larger value helps operations such as VACUUM, CREATE INDEX, ALTER TABLE ... ADD FOREIGN KEY, and restoring from a dump.
postgres=# CHECKPOINT;
postgres=# SET maintenance_work_mem to '10MB';
postgres=# CREATE INDEX foo_idx ON foo (c);
CREATE INDEX
Time: 170091.371 ms (02:50.091)
postgres=# CHECKPOINT;
postgres=# DROP INDEX foo_idx;
postgres=# SET maintenance_work_mem to '256MB';
postgres=# CREATE INDEX foo_idx ON foo (c);
CREATE INDEX
Time: 111274.903 ms (01:51.275)
The index creation time is 170091.371 ms when maintenance_work_mem is set to only 10MB, but it is reduced to 111274.903 ms when the maintenance_work_mem setting is increased to 256MB.
synchronous_commit
This parameter enforces that a commit waits for its WAL record to be written to disk before a success status is returned to the client. It is a trade-off between performance and reliability. If your application is designed such that performance is more important than reliability, turn off synchronous_commit. With it off, there is a time gap between the success status returned to the client and the moment the transaction is guaranteed to be safely on disk, so in the case of a server crash, recently committed data might be lost even though the client received a success message. Transactions commit very quickly because they do not wait for the WAL to be flushed, but durability is reduced.
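Because synchronous_commit can be changed per session or even per transaction, a common compromise is to relax it only for work that can tolerate losing the most recent commits, such as bulk loads of data that can be replayed. A minimal sketch (the staging table names are hypothetical):
testdb=# BEGIN;
testdb=# SET LOCAL synchronous_commit TO off;
testdb=# INSERT INTO events_staging SELECT * FROM events_incoming;
testdb=# COMMIT;
Note that turning synchronous_commit off risks losing the most recently committed transactions after a crash, but it does not risk data corruption; the database remains consistent.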
checkpoint_timeout, checkpoint_completion_target
PostgreSQL first writes changes into the WAL; the checkpoint process then flushes that data into the data files. This happens when a CHECKPOINT occurs, and it is an expensive operation that can cause a huge amount of IO. Users can issue CHECKPOINT manually whenever it seems necessary, or let PostgreSQL schedule it automatically through the checkpoint_timeout and checkpoint_completion_target parameters.
The checkpoint_timeout parameter sets the maximum time between automatic WAL checkpoints. Setting it low shortens crash recovery time, since less WAL has to be replayed after a crash, but it hurts performance because checkpoints happen more often and each one consumes valuable system resources. The checkpoint_completion_target is the fraction of the time between checkpoints over which the checkpoint writes are spread. Keeping it high (0.9 is the default as of PostgreSQL 14) gives smooth checkpointing: dirty pages are flushed gradually throughout the interval instead of being accumulated and written out in one big burst at the end.
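Both parameters can be changed with a configuration reload; no restart is needed. The values below are illustrative only and should be chosen based on your write volume and how long a crash recovery you can tolerate:
testdb=# ALTER SYSTEM SET checkpoint_timeout = '15min';
ALTER SYSTEM
testdb=# ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM
testdb=# SELECT pg_reload_conf();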
Improve PostgreSQL performance tuning with Percona
There are more parameters that can be tuned to gain better performance, but they have less impact than the ones highlighted here. In the end, we must always keep in mind that not all parameters are relevant for all application types: some applications perform better after tuning a parameter, and some do not. Tuning PostgreSQL parameters must be done for the specific needs of an application and the OS it runs on.
Get Percona Distribution for PostgreSQL
Resources for PostgreSQL performance tuning
If your PostgreSQL database is running on a Linux OS, be sure to read my post about tuning Linux parameters to optimize PostgreSQL database performance.
After tuning your PostgreSQL database to improve its performance, the next step is to put your optimized database to the test. With sysbench, you can quickly evaluate your database’s performance. This process is especially important if you plan to run your database under an intensive workload. Our blog article, Tuning PostgreSQL for sysbench-tpcc, can guide you through the benchmarking process.
Optimizing your database is one way to course-correct poor database performance. However, the problem is not always the database itself! Several causes can contribute to your database’s lackluster performance. Our solution brief details the Top 5 Causes of Poor Database Performance. With this crucial information, you can better implement a quick and efficient resolution.
Discover why Percona and PostgreSQL work better together