If you’re using PostgreSQL in the cloud, there’s a good chance you’re spending more than you need to get the results your business requires.
Let’s take a look at how to get the benefits you need while spending less, based on the recommendations presented by Dani Guzmán Burgos, our Percona Monitoring and Management (PMM) Tech Lead, in this webinar (now available on demand), hosted in November last year.
Usage reduction: What to look for to reduce PostgreSQL cloud costs
The first step in cost reduction is to use what you need and not more. Don’t pay for capacity you don’t use or need.
Usage reduction is a continuous process. Identifying which resources you can trim to reduce your monthly bill can be difficult, but looking at the right metrics will help you understand the application’s actual requirements.
In the Home Dashboard of PMM, low CPU utilization on any of the monitored database services could mean that the server is inactive or over-provisioned. Marked in red in Figure 1 is a server with less than 30% CPU usage. PMM can also show you historical data that helps you identify how long a service has been in a given state. The configuration of the CPU metrics can be changed in the dashboard, and these color-coded states on panels are available in PMM 2.32.0 and later.
From the Amazon Web Services (AWS) documentation, an instance is considered over-provisioned when at least one specification of your instance, such as CPU, memory, or network, can be sized down while still meeting the performance requirements of your workload and no specification is under-provisioned. Over-provisioned instances may lead to unnecessary infrastructure costs.
Using resources efficiently while staying within your cloud computing budget is not a one-time fix but a continuous cycle of picking properly sized resources and eliminating over-provisioning.
Usage reduction at scale requires a cultural shift: engineers must treat cost, like memory or bandwidth, as just another deployment KPI.
Think of a gaming company whose new game becomes popular: the resources needed to support the growing user base increase considerably. If the game later loses popularity, the servers become over-provisioned, and the allocated resources must be re-sized to better fit the application’s actual needs.
Re-sizing to save costs
There are three approaches to usage reduction:
- Reducing waste
- Re-architecting
- Re-sizing to what you need
Regardless of the method you’re using to deploy your PostgreSQL instance, these are the metrics that indicate when re-sizing is needed (a query sketch for checking some of them from inside PostgreSQL follows the list):
- CPU utilization
- Memory usage
- Network throughput
- Storage usage
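Cloud provider consoles and PMM expose these metrics directly. As a minimal sketch, assuming you can connect to the instance with psql, two of them can also be approximated from inside PostgreSQL itself: per-database storage usage and the buffer cache hit ratio, a rough proxy for memory pressure.

```sql
-- Approximate storage usage and buffer cache hit ratio per database.
-- A persistently low hit ratio can point at memory pressure; the sizes
-- show how storage grows over time.
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS db_size,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM   pg_stat_database
WHERE  datname NOT IN ('template0', 'template1')
ORDER  BY pg_database_size(datname) DESC;
```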
Remember that optimizing your infrastructure is about more than cost savings. You have to ensure that operations are not impacted when you make decisions based on these metrics; the primary goal is to ensure that the services never run out of the operating capacity they require.
PostgreSQL in the cloud
CPU
With AWS as your cloud platform of choice, the way you configure your infrastructure will influence both the performance of your application and your monthly costs. For example, an Amazon Elastic Compute Cloud (EC2) instance with Graviton2 processors is often a better choice than non-ARM options: it’s cheaper, and you get real, faster cores, meaning each CPU core is physical rather than a hyper-thread. To save costs in the long run, consider pairing Graviton2 instances with Reserved Instances.
Benefits of Graviton2 Processors
- Best price performance for a broad range of workloads
- Extensive software support
- Enhanced security for cloud applications
- Available with managed AWS services
- Best performance per watt of energy used in Amazon EC2
Storage
Continuing with the AWS example, choosing the right storage option will be key to performance. Amazon Elastic Block Store (EBS) is your go-to option for disk space.
From AWS documentation, Amazon EBS is an easy-to-use, scalable, high-performance block-storage service designed for Amazon EC2.
Figure 2: Amazon Elastic Block Storage
Running relational or NoSQL databases is one of the use cases for which EBS is recommended. You can deploy and scale your databases, including SAP HANA, Oracle, Microsoft SQL Server, PostgreSQL, MySQL, Cassandra, and MongoDB.
With EBS, you can configure HDD-based volumes optimized for large streaming workloads or SSD-based volumes (recommended for database workloads) optimized for transactional workloads.
An SSD volume can be any of the following types:
- io1
- io2
- io2 Block Express
- gp2
- gp3
Which one is the better choice for storage? It depends on your workload’s requirements, including disk space, Input/Output Operations per Second (IOPS), and throughput (MB/s), and your configuration must be cost-optimized as well.
Avi Drabkin’s blog post is recommended reading on this matter, as he analyzes the configuration required for each volume type to satisfy the requirements of a particular use case. For more information on EBS volume types, check the Amazon EBS Volume Types page.
Multi-AZ deployments vs. read replicas
Multi-AZ deployment
In an Amazon RDS Multi-AZ deployment, Amazon RDS automatically creates a primary database (DB) instance and synchronously replicates the data to an instance in a different AZ. Amazon RDS automatically fails over to a standby instance without manual intervention when it detects a failure.
Figure 3: Amazon RDS Multi-AZ Deployment
Read replica
Amazon RDS creates a second DB instance using a snapshot of the source DB instance. It then uses the engine’s native asynchronous replication to update the read replica whenever there is a change to the source DB instance. The read replica operates as a DB instance that allows only read-only connections; applications can connect to a read replica just as they would to any DB instance. Amazon RDS replicates all databases in the source DB instance.
Which option is better?
Multi-AZ deployments offer clear advantages, especially for high availability (HA) and disaster recovery. The trade-off is that Multi-AZ deployments are expensive.
A better option would be to deploy reader instances and combine them with a reverse proxy, like Pgpool-II or PgBouncer. Reader instances also cost more than a standard setup, but you can use them in production to handle everyday database traffic.
Pgpool-II not only reduces connection usage, which helps lower CPU and memory consumption, but can also do load balancing. With load balancing, you can redistribute traffic, automatically sending read requests to your read replicas and write requests to your primary database instance.
Regarding read replicas, in AWS you cannot promote an RDS PostgreSQL read replica to take over as the primary instance of its cluster. Whenever you try to do this, the read replica detaches from the primary instance and becomes its own primary, and you end up with two different clusters.
One solution is using the pglogical extension to create replicas outside the RDS replication path. When you combine pglogical replication with a reverse proxy, you still get the benefits of a managed database, including backups, minor upgrade maintenance, and recovery support, without being tied to the Multi-AZ failover mechanism, which translates to full control over planned failovers.
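As a rough, hedged illustration of that idea (not the exact procedure from the webinar), a pglogical setup generally involves creating a provider node on the source, adding tables to a replication set, and subscribing from the replica. The node names, database name (appdb), and connection strings below are assumptions, and on RDS the extension must first be enabled through the parameter group (e.g., shared_preload_libraries and rds.logical_replication).

```sql
-- On the provider (source) instance; dsn values are hypothetical.
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(
         node_name := 'provider',
         dsn       := 'host=source.example.com dbname=appdb user=replicator');
-- Replicate every table in the public schema.
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

-- On the subscriber (the replica living outside the RDS replication path).
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(
         node_name := 'subscriber',
         dsn       := 'host=replica.example.com dbname=appdb user=replicator');
SELECT pglogical.create_subscription(
         subscription_name := 'sub_appdb',
         provider_dsn      := 'host=source.example.com dbname=appdb user=replicator');
```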
Promoting a replica to primary is also a better approach to upgrades. For example, upgrading a database with a large amount of data can take hours, and your instance won’t be available during that time. With this configuration, you can upgrade a replica first and later convert that replica into the primary instance without interrupting operations.
Check this blog post for more information on how to use pglogical for upgrading your database instances.
Vacuum
As explained in this blog post, database bloat is created when tables or indexes are updated; an update is essentially a delete-and-insert operation. The disk space freed by the delete is available for reuse but is not reclaimed, creating the bloat.
How do you remove bloat? That’s what the vacuum process is for, with the help of autovacuum and vacuumdb.
Autovacuum is a daemon that automates the execution of VACUUM and ANALYZE (to gather statistics) commands. Autovacuum checks for bloated tables in the database and reclaims the space for reuse.
vacuumdb is a utility for cleaning a PostgreSQL database. vacuumdb will also generate internal statistics used by the PostgreSQL query optimizer. vacuumdb is a wrapper around the SQL command VACUUM.
The vacuum process is best scheduled at a time when your database has low traffic, usually at night. So, disable autovacuum during the day and run vacuumdb at night at full power, as sketched below. This way, you guarantee that resources are available during the day, when most operations occur.
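Below is a minimal sketch of that approach for a single, hypothetical busy table named orders; vacuumdb itself is a command-line wrapper (for example, vacuumdb --analyze --jobs=4) around the same VACUUM command.

```sql
-- Daytime: keep autovacuum away from a hypothetical hot table so it
-- doesn't compete for resources with production traffic.
ALTER TABLE orders SET (autovacuum_enabled = false);

-- Night: vacuum at full power and refresh planner statistics.
-- vacuumdb --analyze --jobs=4 <dbname> issues equivalent commands per table.
VACUUM (ANALYZE, VERBOSE) orders;

-- Restore the default so the table is not left without autovacuum.
ALTER TABLE orders RESET (autovacuum_enabled);
```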
Monitoring your database for dead tuples (bloat) is also recommended. For this, you can use the experimental PostgreSQL Vacuum Monitoring dashboard. This experimental dashboard is not part of PMM, but you can try it and provide feedback.
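You can also keep an eye on dead tuples directly from psql. The following query against the standard pg_stat_user_tables view lists the tables accumulating the most dead tuples and when they were last vacuumed:

```sql
-- Tables with the most dead tuples (potential bloat) and last vacuum times.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC
LIMIT  10;
```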
What about serverless?
With serverless, you truly pay only for what you actively use, and unused resources are not typically left lying around, but the move to serverless isn’t without cost. The complexity of building any migration plan to serverless resides in execution and has very little to do with cost savings. There’s an entirely different lens through which you can evaluate serverless: total cost of ownership (TCO).
The TCO refers to the cost of engineering teams that are required to build a solution and the impact of time to market on the success and profitability of a service. Serverless allows you to delegate a lot of responsibility to the cloud provider.
Duties that DevOps engineers would typically perform (server management, scaling, provisioning, patching, etc.) become the responsibility of AWS, GCP, or Azure, which leaves dev teams free to focus on shipping differentiated features faster.
When looking at TCO, consider that people costs may cancel out any infrastructure savings when moving from a monolithic application to a serverless one.
Weighing benefits against effort, you should compare the overall cost of redesigning services for serverless with the potential cost savings.
Conclusion
Knowing your project’s database requirements will be essential when choosing the services and hardware configuration you will pay your cloud service provider for. The configuration you make must guarantee the proper functioning of your application and will determine the monthly costs.
The number of resources required may vary over time, and metrics like CPU usage, memory usage, and disk storage will help you determine when re-sizing your infrastructure is needed.
For example, if the number of database transactions decreases after a certain period of time, you will have more resources than you need, and it will be important to adjust your infrastructure configuration so it matches the new requirements and you pay only for what is really being used.
By following the recommendations presented in this webinar, you can design a cost-optimized infrastructure without affecting your database’s performance.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.