Talking Drupal #388 – Valhalla Content Hub

Today we are talking about Valhalla Content Hub with Shane Thomas.

For show notes visit: www.talkingDrupal.com/388


  • Joining Netlify
  • Changes at Gatsby
  • What is a content hub
  • How does that differ from a content repo
  • What is Valhalla
  • How does it work
  • Data stitching with GraphQL
  • Can you massage / normalize data
  • Benefits
  • Privacy
  • Production examples
  • How is it structured
  • Do you have to use Gatsby
  • Integrations with Drupal
  • Timing
  • Cost
  • How to sign up



Shane Thomas – www.codekarate.com/ @smthomas3


Nic Laflin – www.nLighteneddevelopment.com @nicxvan
John Picozzi – www.epam.com @johnpicozzi
Jacob Rockowitz – www.jrockowitz.com @jrockowitz

MOTW Correspondent

Martin Anderson-Clutz – @mandclu

Module of the Week: Entity Share – You configure one site as the Server that provides the entities, then choose which content types or bundles will be available, and in which languages.


Run Percona Server for MongoDB on Your ARM-based MacBooks and More


The transition to Apple Silicon (the M1, M2 ARM-based CPUs) has been great for macOS fans in many ways, but it’s added some friction when you want to run containers locally for testing or development. The good news is we’ve got you covered with ARM64 container images for Percona Server for MongoDB 6.

While it’s possible to run x86_64 images on an ARM-based Mac, emulation makes that neither efficient nor performant. Unfortunately, until now it has been the only option.

With the release of Percona Server for MongoDB 6.0.4, we’ve added native Docker container images for ARM64. These run without needing Rosetta 2 translation or emulation on your ARM-based Macs.

Running the ARM64 MongoDB container image on your Mac

If you have Docker or Docker Desktop installed on your Mac, you can run Percona Server for MongoDB with a one-liner as described in our documentation:

$ docker run -d --name psmdb --restart always \
    percona/percona-server-mongodb:6.0.4-3-arm64

Note that you’ll want to replace 6.0.4-3-arm64 with the appropriate image tag. You can check Docker Hub to find the current ARM64 version.

Prefer working with Podman? Just replace docker with podman:

$ podman run -d --name psmdb --restart always \
    percona/percona-server-mongodb:6.0.4-3-arm64

If you’re using Podman Desktop, you can just drop into the terminal to start testing your container with mongosh:


Figure 1: Podman Desktop running ARM64 MongoDB containers

This gives you a quick way to access the MongoDB instance inside the container. Or you can connect by running mongosh externally and connecting to the MongoDB port exposed by the container. You’ll find more documentation on docs.percona.com, “Run Percona Server for MongoDB in a Docker container.”
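To connect from the host as described above, the container’s MongoDB port needs to be published. A minimal sketch (the port mapping and connection string are illustrative; adjust them to your setup):

```shell
# Hypothetical example: publish the container's MongoDB port to the host so an
# externally installed mongosh can reach it. The image tag is the ARM64 tag
# mentioned above; check Docker Hub for the current one.
docker run -d --name psmdb -p 27017:27017 \
  percona/percona-server-mongodb:6.0.4-3-arm64

# Then connect from the host (assumes mongosh is installed locally):
mongosh "mongodb://localhost:27017"
```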


Figure 2: Using mongosh inside the Percona ARM64 container

Please put Percona Server for MongoDB ARM container images through their paces! We’d love to know if you encounter any problems or if you have any tips to share!

Graviton and other ARM platforms

You should also be able to run the Percona Server for MongoDB ARM container images on AWS Graviton instances and other ARM64 platforms, such as Raspberry Pi. We haven’t done extensive testing on those platforms, but we expect the images to work unmodified. If you encounter any issues, please let us know in the forums or file a bug in Jira.

What’s next for ARM64 builds and Percona Server for MongoDB?

This is the first step we are taking to address ARM support needs. We have started with Docker images for Percona Server for MongoDB, but we plan to expand ARM64 support to the rest of our MongoDB software and packages.

We aim to incrementally provide full support for the ARM64 architecture for the full Percona Distribution for MongoDB. We are also looking into Arm needs for other Percona products!

Percona Server for MongoDB is a source-available, enterprise-grade drop-in replacement for MongoDB Community Edition.

Learn more about Percona Server for MongoDB


Percona Monitoring and Management 2.35, Percona Backup for MongoDB 2.0.4: Release Roundup February 27, 2023


It’s time for the release roundup!

Percona is a leading provider of unbiased open source database solutions that allow organizations to easily, securely, and affordably maintain business agility, minimize risks, and stay competitive.

Our Release Roundups showcase the latest Percona software updates, tools, and features to help you manage and deploy our software. They offer highlights, critical information, links to the full release notes, and direct links to download the software or service itself.

Today’s post includes those releases and updates that have come out since February 13, 2023. Take a look!

Percona Monitoring and Management 2.35

Percona Monitoring and Management 2.35 (PMM) was released on February 24, 2023. It’s an open source database monitoring, management, and observability solution for MySQL, PostgreSQL, and MongoDB. Among the many highlights in the release notes, we are pleased to announce that PMM 2.35.0 marks the first step towards introducing PMM Access Control. Label-based access control in PMM allows you to manage who has access to metrics based on labels. By creating roles, you can specify which data can be queried based on specific label criteria, for instance, allowing the QA team to view data related to test environments. Using this feature, you can associate multiple labels with a role, ensuring only data from series that match your defined labels is returned.

Download Percona Monitoring and Management 2.35

Update to the release of Percona Distribution for MySQL (Percona Server for MySQL-based variant) 8.0.31

An update to the release of Percona Distribution for MySQL (PS-based variant) 8.0.31 was released on February 15, 2023. This update includes a new version of ProxySQL component 2.4.7.

Download the Update to the release of Percona Distribution for MySQL (PS-based variant) 8.0.31

Percona Backup for MongoDB 2.0.4

On February 21, 2023, Percona Backup for MongoDB 2.0.4 was released. It is a distributed, low-impact solution for creating consistent backups across MongoDB sharded clusters (or non-sharded replica sets) and for restoring those backups to a specific point in time. Along with bug fixes, a release highlight is the ability to specify a custom path to mongod binaries, which simplifies the physical restore process.

Download Percona Backup for MongoDB 2.0.4

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MongoDB, PostgreSQL, MariaDB, and other open source databases in on-premises and cloud environments.


PMM Access Control: A Comprehensive Guide With Use Cases and Examples

Why you might need granular Access Controls

As companies grow from startups into larger organizations, they establish a hierarchical structure of roles and responsibilities that changes from year to year. It becomes increasingly important to protect confidential information against data leaks while still giving each team easy access to the data sources and tools relevant to its work, without forcing anyone to sift through unrelated databases or environments. This is particularly important for large enterprises, which are likely required to implement least-privilege access for all software used within the organization. To achieve this securely and efficiently, robust access management is a key requirement for any database monitoring and management software.

Label-based access control is a security mechanism that allows companies to control who can access specific metrics based on their labels. Monitored data may contain sensitive information that needs to be protected, and label-based access control can be used to ensure that only authorized individuals can access it.

In a nutshell, here are a few reasons why companies may need label-based access control for their monitoring solution:

  • To reduce the risk of insider threats: Insider threats are a significant concern for many companies. Label-based access control can help reduce the risk of insider threats by limiting access to sensitive data only to employees who need it.
  • To simplify access management: Label-based access control can simplify access management by allowing companies to assign labels to metrics. This makes managing access permissions easier and ensures that only authorized individuals can access specific data.

  • To ensure compliance: Many industries have regulatory requirements that mandate specific data handling and protection practices. Access control can help ensure that companies comply with these regulations by limiting access to data to only those employees who are authorized to view it.

Sample use cases for label-based Access Control in Percona Monitoring and Management (PMM)

Let’s say you are a member of a company selling a subscription-based streaming service that provides a wide variety of TV shows, movies, and other video content to its subscribers. Your company operates a massive distributed architecture that spans multiple data centers and cloud providers, and your databases are a crucial part of this infrastructure. You have dozens of teams with hundreds of engineers who are responsible for ensuring high availability and reliability, including employing fault-tolerant systems and implementing rigorous testing and monitoring procedures. Your organization may have several teams:

  • Database Administration, employing DBAs
  • Development, employing developers
  • Site Reliability Engineering, employing SREs
  • Quality Assurance, employing QA engineers
  • Monitoring

Let’s take a look at an example of why you might need and how you could use label-based access control (LBAC) in your authorization flow.

Once data has been ingested into a database and then into PMM’s dashboards, the organization has a massive amount of data from different sources waiting for the DBAs, developers, SREs, etc., to monitor and diagnose. 50+ applications, 1,000+ nodes, and three environments (qa, dev, production) are all being monitored by PMM, and it is becoming hard to diagnose, isolate, and analyze any one application or environment for which a specific team is responsible.

PMM monitoring a large-scale environment

Vanessa is the database administrator in the company, and she would like to set up label-based access control inside PMM to make data access more secure and easy for her organization. The idea is to limit each team’s access to the data and environments that are relevant to their specific responsibilities. For example, the QA team would only have access to data from the QA environment rather than having access to the production environment.

This approach, known as least privilege access, ensures that each team only has access to the data they need to perform their duties, while limiting the risk of unauthorized access to sensitive information. By implementing this solution, she can protect the organization’s confidential data and make it easier for teams to find the information they need without being bogged down by irrelevant data.

Structuring the authorization goals

Vanessa has an Admin role in PMM with all privileges and can create users and roles and grant them privileges. 

As a first task, Vanessa plans an access control roll-out strategy. She starts with answering these questions:

  • Why do I need granular access control instead of basic roles with viewer, editor, and admin access?
  • What access control options are currently available in PMM?
  • How do I structure permissions to make them easy to manage?
  • What are the needs of the specific teams who need to access metrics?
  • Which approach should I use when assigning roles: the Grafana UI, provisioning, or the API?

Considering these needs, Vanessa decides to use both basic roles and Access Roles, which provide label-based permissions. She then creates the table below as an Access Control implementation schema.

Note: You can take advantage of your current authentication provider to manage user and team permissions for basic roles. Access Roles can’t yet be managed through an authentication provider; it’s on our roadmap, so please stay tuned for upcoming releases of PMM and keep an eye on the release notes.

Team/User | Role name | Description | Labels* | Privilege
DBA Team-1 (MySQL) | role_dba_mysql | Read privilege to MySQL database metrics of all apps on the prod and dev environments | environment=~"(dev|prod)", service="mysql" | MySQL; prod and dev; all apps
DBA Team-2 (MongoDB) | role_dba_mongodb | Read privilege to MongoDB database metrics of all apps on the prod and dev environments | environment=~"(dev|prod)", service="mongodb" | MongoDB; prod and dev; all apps
Dev Team-1 (App A) | role_dev_appA | Read privilege to database metrics of App A on the prod and dev environments | environment=~"(dev|prod)", app="A" | prod and dev; App A
Dev Team-2 (App B) | role_dev_appB | Read privilege to database metrics of App B on the prod and dev environments | environment=~"(dev|prod)", app="B" | prod and dev; App B
QA Team | role_qa | Read privilege to database metrics of the qa environment | environment="qa" | qa; all apps
Monitoring Team | role_monitoring | Read privilege to database metrics of the prod environment | | prod; all apps

* Labels must be in key="value" format; both the key and the value must begin with a letter and can only contain letters, numbers, and hyphens. Please check the labels for access control page in the product documentation.
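As a quick sanity check, the documented format can be expressed as a regular expression. The helper below is our own sketch, not part of PMM:

```shell
# Validate a label against the documented key="value" format: key and value
# must start with a letter and may contain only letters, numbers, and hyphens.
is_valid_label() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9-]*="[A-Za-z][A-Za-z0-9-]*"$'
}

is_valid_label 'environment="qa"' && echo 'environment="qa" is valid'
is_valid_label '1env="qa"'        || echo '1env="qa" is rejected'
```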

Note: Team-level role assignment is currently part of our roadmap, so please keep an eye out for future releases of PMM. If you require the team-level role assignment feature, please don’t hesitate to reach out to us on the forum and let us know.

Before label implementation

A privilege comes with a label and READ permission assigned to a role; that is, a role can be granted READ permission on the metrics that match its labels.

Enable Access Control

Although Vanessa, a DBA, prefers to enable and configure Access Control via the UI, it is also possible to enable it when deploying PMM Server via Docker; read more about this configuration on our product documentation page. To configure Access Control from the PMM UI, she does the following:

  • Navigate to Configuration / Settings in the main menu
  • Click Advanced Settings and scroll down to the end of the page
  • Click the toggle to enable Access Control
  • Click the “Apply changes” button
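For the deploy-time alternative, a minimal sketch, assuming the ENABLE_RBAC environment variable described in the PMM documentation (verify the exact variable name, port mapping, and tag for your PMM version):

```shell
# Sketch: start PMM Server with Access Control enabled from the beginning,
# instead of toggling it later in the UI.
docker run -d --name pmm-server -p 443:443 \
  -e ENABLE_RBAC=1 \
  percona/pmm-server:2.35.0
```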

enable access control

After Vanessa enables Access Control, the Access Roles page automatically appears in the main menu.

Enabled Access Roles in PMM

Note that when Access Control is enabled, Full Access is automatically assigned as the default role to all users, granting them access to all metrics. When users log in to PMM for the first time without a designated role, they are automatically assigned the default role. For administrators, the default role provides a convenient way to configure default permissions for new users.

Configuration / Access Roles

It’s possible to change the default role from the UI or with pmm-admin. Visit the feature documentation to see how you can change the default role.

Create an access role

Note: To create roles, you must have admin privileges. For more information, see this related page. 

Follow these steps:

  • Navigate to Access Control in the main menu
  • Click the “Create” button
  • Enter a role name and description
  • Add the service labels to which this role will have read access
  • Click the “Create Role” button in the top-right corner to save the role, or click the “Cancel” button to exit without saving
Create a role in Access Control

The following label matching operators exist:

  • =: Select labels that are exactly equal to the provided string.
  • !=: Select labels that are not equal to the provided string.
  • =~: Select labels that regex-match the provided string.
  • !~: Select labels that do not regex-match the provided string.

Note: The filter uses PromQL, the Prometheus query language. Find more in the Prometheus documentation.
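For example, the metric filter for the MySQL DBA role from the schema above could be written as a PromQL-style label selector (label names follow the earlier table):

```promql
{environment=~"(dev|prod)", service="mysql"}
```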

Assign a role

To assign a role to users:

  • Click the “Users” tab or go to the Configuration / Users page to see the user list
  • Click the “Assign role” dropdown and select one or more roles to assign
  • The change is saved automatically
Assign a role

Role refers to the basic roles, which are Viewer, Editor, and Admin:

  • Viewer – an authenticated user with read-only access. They can only view information and can’t add services or change the system.
  • Editor – in addition to Viewer permissions, allows users to edit Grafana dashboards.
  • Admin – an authenticated user with full control.
    • Example: PMM Admin who can add servers and perform updates.

Access roles allow Admins to create new user roles based on labels, which are designed to limit access to metrics. These roles are only available when Access Control is enabled.

Basic roles and Access roles complement each other and are independent of each other. Access roles are used to filter data based on the filters assigned to the role.


Update a role

To edit an access role, follow these steps:

  • From the main menu, navigate to Configuration → Access Roles. The Access Roles tab opens.
  • On the role you want to edit, click the ellipsis (three vertical dots) > Edit role in the Options column. The Edit role page opens.
  • Edit the role and click “Save Changes” in the top-right corner.

For more information, visit the Manage Access Control documentation.

Update the role

Delete a role

Note: You can only remove a role that is not assigned to any user. To remove such a role, first unassign all users from it.

To delete an access role, follow these steps:

  • From the main menu, navigate to Configuration → Access Roles. The Access Roles tab opens.
  • On the role you want to delete, click the ellipsis (three vertical dots) > Delete in the Options column.
  • Click “Delete”.

Labels FAQ

Can I add a role with full access?

Yes, you can. Simply follow the same steps as you would for role creation, but make sure to leave the metric filter field blank before saving the role.

Role with full access

Where can I create labels?

You can add custom or predefined labels in PMM while adding a service for monitoring. Predefined labels can be added using both the API and the UI; to add a label using the API, see the API documentation. You can also create your own custom labels and filter by them.

Where can I assign labels?

An Admin can assign labels to services using the PMM UI or pmm-admin. Metrics inherit labels from the service. Please check this page for more.

Where can labels be used?

Once a label has been created and/or used to filter metrics within the PMM instance, an Admin can use it to restrict which metrics are displayed on dashboards and to organize metric data on dashboards by label.

Can I delete a label?

No, labels cannot be deleted.

Can I rename a label?

Yes, labels can be renamed by editing a service. Please check the feature page for more.

How does assigning a user permissions based on labels work?

Labels are additive, so you can only further restrict a user’s permissions by adding more labels. If a user has access to everything labeled environment:prod, we assume no restrictions on any other category of label. This user has less restricted permissions than another user who has access to everything with environment:prod AND app:A.

For example, suppose the monitored metrics carry these labels:

Environments: prod, dev
Applications: App A, App B

Role | Labels
Role-1 | environment:prod, app:A
Role-2 | environment:prod
Role-3 | environment:dev, app:B

Then users assigned these roles will only have access to the following metrics:

User | Role(s) | Access to metrics
Matej | Role-1 | environment:prod, app:A
Michal | Role-2 | environment:prod, any application
Maxim | Role-2, Role-3 | environment:prod (any application) and environment:dev, app:B


Can I limit access to PMM features like QAN, Alerting, and Backup?

Unfortunately, label-based access control is only the initial phase of PMM Access Control and does not currently encompass role-based access control. However, we have plans to incorporate it in 2023.

We’d like to hear your needs and feedback regarding Access Control. Feel free to book a 30-min call with our product team to share feedback.

Book a 30-min call with Percona Monitoring and Management team


PMM V2.35: Label-based Access Control, General Availability of Helm Chart, and More!


Today, we are excited to announce the release of Percona Monitoring and Management (PMM) V2.35, including a tech preview of label-based access control, the general availability of Helm Chart, and a range of enhancements to our Database as a Service (DBaaS) offerings, among other improvements and features. Check out all the updates on release notes.

Follow update instructions to update your PMM instance to V2.35, or you can get started using PMM in minutes with our PMM Quickstart guide.

Key highlights of PMM 2.35:

Access Control (Tech Preview): Limit access to metrics

Disclaimer: Please note that PMM Access Control is currently in technical preview and is subject to change. Therefore, we recommend using it for testing purposes only.

PMM Dashboards offer valuable insights for monitoring database metrics and troubleshooting. However, in certain scenarios, such as private or team-specific environments, projects, or applications, access to metrics must sometimes be restricted to team members only.

Previously, you could either share all dashboard metrics with every user in the organization or restrict permissions based on the inherited permissions of the dashboard folder and assign basic roles such as viewer, editor, and admin. As a result, all users could view all metrics on the dashboards.

PMM V2.35 introduces access control, which allows you to manage metric access based on labels. You can manage which data can be viewed based on label(s) by creating access roles.

For instance, you can grant developers access to data related to the specific applications they’re responsible for in the production environment. By associating multiple labels with a role, only data from series that match your defined labels is returned. This feature ensures that you have complete control over who can access specific metrics. For more information, see Access Control documentation.

PMM Access Control

PMM Access Control

In addition to label-based access control, we plan to expand into role-based access control (RBAC), including access limitations for features like Alerting and Backup with read/edit/delete permissions.

We would appreciate it if you could share your feedback on access control. Kindly leave your comments and feedback on the forum.


An upcoming blog post will provide comprehensive information on use cases, examples and upcoming features relating to Access Control. Stay tuned!

GA: Deployment of PMM in Kubernetes with Helm Chart

Helm is a package manager for Kubernetes, somewhat like other package managers (YUM, APT, npm, pip, Gem), but it works at the application level, allowing you to deploy multiple manifests together.

We are happy to announce the General Availability of PMM deployment in Kubernetes with the Helm chart. PMM deployment via Helm chart in Kubernetes has been available as Tech Preview since PMM v2.29, but now we’re delighted to offer it to users as a fully supported feature. For more information, see Percona Helm Chart documentation. Percona Helm charts can be found in the percona-helm-charts repository on Github.
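Installation follows the usual Helm flow. This sketch is based on the percona-helm-charts repository; the chart and release names here are illustrative, so check the chart’s README for current values:

```shell
# Add the Percona Helm repository and install PMM Server into the current
# Kubernetes cluster under the release name "pmm".
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update
helm install pmm percona/pmm
```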

What benefits does Helm offer for deploying PMM in Kubernetes?

  • Helm provides the ability to group multiple Kubernetes manifests into a single entity known as a Helm Chart, which can be easily managed for deployments, rollbacks, and storage.
  • Additionally, Helm has built-in templating for Kubernetes manifests, eliminating the need for custom template systems to replace values within manifests, such as Docker tags.
  • Helm also supports creating Charts of Charts, which contain both templates and default values, allowing for easy deployment of an entire application and its dependencies.
  • Helm enables the creation of application catalogs or Helm repositories, similar to traditional package repositories, like npm registry, CPAN, Apache Maven Central, or Ruby Gems.

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today


Reducing PostgreSQL Costs in the Cloud


If you’re using PostgreSQL in the cloud, there’s a good chance you’re spending more than necessary to get the results your business needs.

Let’s take a look at how to get the benefits you need while spending less, based on the recommendations presented by Dani Guzmán Burgos, our Percona Monitoring and Management (PMM) Tech Lead, in this webinar (now available on demand), hosted last November.

Usage reduction: What to look for to reduce PostgreSQL cloud costs

The first step in cost reduction is to use what you need and not more. Don’t pay for capacity you don’t use or need. 

Usage reduction is a continuous process. Identifying which resources you can trim to reduce your monthly bill can be difficult, but looking at the right metrics will help you understand the application’s actual requirements.

In the Home Dashboard of PMM, low CPU utilization on any of the monitored database services could mean that the server is inactive or over-provisioned. Marked in red in Figure 1 is a server with less than 30% CPU usage. PMM can also show you historical data that can help you identify how long a service has been in a given state. The CPU metric thresholds can be configured in the dashboard. These color-coded states on panels are available in PMM 2.32.0 and later.


Figure 1: PMM Home Dashboard

From the Amazon Web Services (AWS) documentation, an instance is considered over-provisioned when at least one specification of your instance, such as CPU, memory, or network, can be sized down while still meeting the performance requirements of your workload and no specification is under-provisioned. Over-provisioned instances may lead to unnecessary infrastructure costs.

Using resources efficiently, while ensuring that cloud computing stays within budget, is not a one-time fix but a continuous cycle of picking properly sized resources and eliminating over-provisioning.

Usage reduction at scale requires a cultural shift: engineers must consider cost alongside memory or bandwidth as just another deployment KPI.

Think of a gaming company whose game is getting popular: the number of resources needed to support more users increases considerably. But if the game loses popularity, the servers become over-provisioned, and the allocated resources must be re-sized to better fit the application’s needs.

Re-sizing to save costs

There are three approaches to usage reduction.

Regardless of the method you’re using to deploy your PostgreSQL instance, here are some metrics that would determine when re-sizing is needed:

  • CPU utilization
  • Memory usage
  • Network throughput
  • Storage usage

Remember that optimizing your infrastructure is intended for more than cost savings. You have to ensure that the operation is not impacted when you’re making decisions based on the metrics. The primary goal is to ensure that the services themselves do not run out of the required operating capacity.

PostgreSQL in the cloud


With AWS as your cloud platform of choice, the configuration of your infrastructure will influence both the performance of your application and your monthly costs. For example, an Amazon Elastic Compute Cloud (EC2) instance with Graviton2 processors can be a better choice than non-ARM options: it’s cheaper, and you get real, faster cores, meaning the CPU cores are physical rather than hyper-threads. To save costs in the long run, consider combining Graviton2-based instances with Reserved Instances.

Benefits of Graviton2 Processors

  • Best price performance for a broad range of workloads
  • Extensive software support
  • Enhanced security for cloud applications
  • Available with managed AWS services
  • Best performance per watt of energy used in Amazon EC2


Continuing with the AWS example, choosing the right storage option is key to performance. Amazon Elastic Block Store (EBS) is your go-to option for disk space.

From AWS documentation, Amazon EBS is an easy-to-use, scalable, high-performance block-storage service designed for Amazon EC2.


Figure 2: Amazon Elastic Block Storage

Running relational or NoSQL databases is one of the use cases for which EBS is recommended. You can deploy and scale your databases, including SAP HANA, Oracle, Microsoft SQL Server, PostgreSQL, MySQL, Cassandra, and MongoDB.

With EBS, you can configure HDD-based volumes optimized for large streaming workloads or SSD-based volumes (recommended for database workloads) optimized for transactional workloads.

An SSD volume can be any of the following types:

  • io1
  • io2
  • io2 Block Express
  • gp2
  • gp3

Which one is the better choice for storage? It depends on the requirements of your workload, including disk space, Input/Output Operations per Second (IOPS), and throughput (MB/s), and your configuration must be cost-optimized as well.

Avi Drabkin’s blog post is recommended reading on this matter, as he analyzes the configuration required for each volume type to satisfy the requirements of a particular use case. For more information on EBS volume types, check the Amazon EBS Volume Types page.
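As a rough first filter when choosing between gp3 and provisioned-IOPS volumes, you can compare your workload against gp3’s published baseline (3,000 IOPS and 125 MiB/s included in the base price at the time of writing). The helper below is our own sketch, not an AWS tool:

```shell
# Returns success when the workload fits within gp3's included baseline,
# meaning a provisioned-IOPS volume (io1/io2) may be unnecessary.
fits_gp3_baseline() {
  iops="$1"; throughput_mib="$2"
  [ "$iops" -le 3000 ] && [ "$throughput_mib" -le 125 ]
}

if fits_gp3_baseline 2500 100; then
  echo "gp3 baseline is enough"
else
  echo "consider provisioned IOPS or extra gp3 capacity"
fi
```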

Multi-AZ deployments vs. read replicas

Multi-AZ deployment

In an Amazon RDS Multi-AZ deployment, Amazon RDS automatically creates a primary database (DB) instance and synchronously replicates the data to an instance in a different AZ. Amazon RDS automatically fails over to a standby instance without manual intervention when it detects a failure.


Figure 3: Amazon RDS Multi-AZ Deployment

Read replica

Amazon RDS creates a second DB instance using a snapshot of the source DB instance. It then uses the engine’s native asynchronous replication to update the read replica whenever there is a change to the source DB instance. The read replica operates as a DB instance that allows only read-only connections; applications can connect to a read replica just as they would to any DB instance. Amazon RDS replicates all databases in the source DB instance.


Figure 4: Amazon RDS Read Replicas

Which option is better?

Multi-AZ deployments offer advantages, especially for HA and disaster recovery. The trade-off is that multi-AZ deployments are expensive.

A better option would be to deploy reader instances and combine them with the use of a reverse proxy, like pgpool-II or pgbouncer. The reader instances also cost more than a standard setup, but you can use them for production to handle everyday database traffic.

Pgpool-II is useful not only for connection pooling, which helps reduce CPU and memory usage, but also for load balancing. With load balancing, you can redistribute traffic, automatically sending read requests to your read replicas and write requests to your primary database instance.
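A minimal sketch of the relevant pgpool.conf settings for this setup; the hostnames and weights are hypothetical, and you should consult the pgpool-II documentation for your version:

```
# Route read-only queries to replicas, send writes to the primary
load_balance_mode = on

# Backend 0: primary instance (receives writes)
backend_hostname0 = 'primary.example.internal'
backend_port0 = 5432
backend_weight0 = 1

# Backend 1: read replica (higher weight draws more read traffic)
backend_hostname1 = 'replica1.example.internal'
backend_port1 = 5432
backend_weight1 = 2
```

The weights only influence how read-only statements are distributed; statements that write are always routed to the primary.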

Regarding read replicas, in AWS you cannot promote an RDS PostgreSQL read replica into the primary role of its existing cluster. Whenever you try to do this, the read replica detaches from the primary instance and becomes its own primary, and you end up with two separate clusters.

One solution is using the pglogical extension to create replicas outside the RDS path. When combining pglogical replication with a reverse proxy, you still get the benefits of a managed database, including backups, minor upgrade maintenance, recovery support, and the Multi-AZ configuration, which translates to full control over planned failovers.
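Here is a hedged sketch of what the pglogical setup looks like; the connection strings, node names, and credentials are placeholders, and the extension must be enabled first (on RDS, via the `shared_preload_libraries` parameter in a parameter group):

```shell
# On the RDS source (provider): register the node and publish all tables
psql "$PROVIDER_DSN" <<'SQL'
CREATE EXTENSION IF NOT EXISTS pglogical;
SELECT pglogical.create_node(
  node_name := 'provider',
  dsn := 'host=source.example.internal dbname=app user=replicator');
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
SQL

# On the external replica (subscriber): subscribe to the provider
psql "$SUBSCRIBER_DSN" <<'SQL'
CREATE EXTENSION IF NOT EXISTS pglogical;
SELECT pglogical.create_node(
  node_name := 'subscriber',
  dsn := 'host=replica.example.internal dbname=app user=replicator');
SELECT pglogical.create_subscription(
  subscription_name := 'sub_app',
  provider_dsn := 'host=source.example.internal dbname=app user=replicator');
SQL
```

Once the subscription reaches a replicating state, the external replica stays in sync asynchronously and can later be promoted on your own schedule.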

Also, converting a replica to the primary instance would be a better upgrade approach. For example, if you need to upgrade a database with a large amount of data, this process could take hours, and your instance won’t be available during this time. So, with this configuration, you can upgrade a replica and later convert that replica to the primary instance without interrupting operations.

Check this blog post for more information on how to use pglogical for upgrading your database instances.


As explained in this blog post, bloat is created in the database when tables or indexes are updated; an update is essentially a delete-and-insert operation. The disk space freed by the delete is available for reuse but not reclaimed, creating the bloat.

How to remove bloat? That’s what the vacuum process is intended for with the help of autovacuum and vacuumdb.

Autovacuum is a daemon that automates the execution of VACUUM and ANALYZE (to gather statistics) commands. Autovacuum checks for bloated tables in the database and reclaims the space for reuse.

vacuumdb is a utility for cleaning a PostgreSQL database. vacuumdb will also generate internal statistics used by the PostgreSQL query optimizer. vacuumdb is a wrapper around the SQL command VACUUM.
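For example, a hedged vacuumdb invocation that cleans and analyzes all databases with several parallel jobs; the job count is a placeholder to tune against your instance's CPU headroom:

```shell
# Vacuum and analyze every database, running 4 table jobs in parallel,
# echoing each command as it is sent to the server
vacuumdb --all --analyze --jobs=4 --echo
```

Scheduling this from cron during your low-traffic window keeps the heavy work away from peak hours.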

It is recommended to schedule the vacuum process at a time when your database has low traffic, usually at night. You can disable autovacuum during the day and run vacuumdb at full power at night. This way, you guarantee that resources are available during the day, when most operations occur.

Monitoring your database for dead tuples (bloat) is also recommended. For this matter, you can use the Experimental PostgreSQL Vacuum Monitoring. This experimental dashboard is not part of PMM, but you can try it and provide feedback.
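As one simple way to watch for bloat, PostgreSQL's statistics views expose dead tuple counts per table. A sketch via psql, assuming your connection settings are already configured:

```shell
# List the 10 tables with the most dead tuples and when autovacuum last ran
psql -c "SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
         FROM pg_stat_user_tables
         ORDER BY n_dead_tup DESC
         LIMIT 10;"
```

A table whose `n_dead_tup` keeps growing while `last_autovacuum` stays stale is a candidate for more aggressive vacuum settings.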

What about serverless?

With serverless, you truly pay only for what you're actively using, and unused resources typically aren't left running. But the move to serverless isn't without cost: the complexity of any migration plan to serverless resides in execution and has very little to do with cost savings. There's an entirely different lens through which you can evaluate serverless: total cost of ownership (TCO).

The TCO refers to the cost of engineering teams that are required to build a solution and the impact of time to market on the success and profitability of a service. Serverless allows you to delegate a lot of responsibility to the cloud provider.

Duties that DevOps engineers would typically perform (server management, scaling, provisioning, patching, etc.) become the responsibility of AWS, GCP, or Azure, leaving dev teams free to focus on shipping differentiated features faster.

With TCO, you must consider that people’s costs may cancel out any infrastructure savings when considering moving from a monolithic application to a serverless one.

Returning to the benefits versus effort, you should consider the overall cost of redesigning services for serverless against the potential for reducing costs.


Knowing your project’s database requirements will be essential when choosing the services and hardware configuration you will pay your cloud service provider for. The configuration you make must guarantee the proper functioning of your application and will determine the monthly costs.

The number of resources required may vary over time, and metrics like CPU usage, memory usage, and disk storage will help you determine when re-sizing your infrastructure is needed.

For example, if the number of database transactions decreases after a certain period of time, you will have more resources than you need, and it will be important to change the configuration of your infrastructure to guarantee the new requirements and pay what is really being used.

Following the recommendations presented in this webinar, you can design a cost-optimized infrastructure without affecting your database's performance.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL today!


Percona Server for MongoDB 6 Adds Support for Red Hat Enterprise Linux 9

Percona Server for MongoDB 6 Adds Support for Red Hat Enterprise Linux 9

We’re pleased to announce that Percona Server for MongoDB 6.0.4 adds support for Red Hat Enterprise Linux (RHEL) 9 and compatible Linux distributions.

With the 6.0.4 release you can now run Percona Server for MongoDB 6 on RHEL 7, 8, and 9, as well as Debian 10 and 11, and Ubuntu 18.04, 20.04, and 22.04 LTS.

We aim to support our software on the most widely used Linux distributions, coupled with support across several versions. As you’re planning your operating system and application strategies, we want to ensure that Percona software is available where you need it.

You can also, of course, run Percona Server for MongoDB in a Docker container. Have questions, comments, observations, or need help? Let's chat in the Percona Forums!

Percona Server for MongoDB is a source-available, enterprise-grade drop-in replacement for MongoDB Community Edition.

Learn more about Percona Server for MongoDB


Run For Your Favorite Database at Percona Live 2023

Percona Live 2023 Run

Sure, you ensure your database runs, but have you ever thought about running for your favorite database? You can do just that in Denver, Colorado, on Sunday, May 21, the day before Percona Live 2023 kicks off!

Run for database Percona Live

We’re partnering with the Denver Marathon to attract relay teams of five runners each who will complete legs of 3.9 to 6.5 miles (6 to 10 kilometers). We are recruiting runners for Team MySQL, Team MariaDB, Team PostgreSQL, and Team MongoDB.

Want to work with us to organize another team? Let us know!

In addition to putting it on the line for their beloved databases, Denver Marathon Relay participants get a T-shirt and special recognition during Percona Live 2023.

Denver Marathon Database Relay participation is free for Percona Live attendees. Space is limited, so apply as soon as possible by filling out this form.


Streaming MongoDB Backups Directly to S3

streaming MongoDB backups

If you ever had to make a quick ad-hoc backup of your MongoDB databases, but there was not enough disk space on the local disk to do so, this blog post may provide some handy tips to save you from headaches.

It is a common practice that before a backup can be stored in the cloud or on a dedicated backup server, it has to be prepared first locally and later copied to the destination.

Fortunately, there are ways to skip the local storage entirely and stream MongoDB backups directly to the destination. At the same time, the common goal is to save both the network bandwidth and storage space (cost savings!) while not overloading the CPU capacity on the production database. Therefore, applying on-the-fly compression is essential.

In this article, I will show some simple examples to help you quickly do the job.

Prerequisites for streaming MongoDB backups

You will need an account for one of the providers offering object storage compatible with Amazon S3. I used Wasabi in my tests as it offers very easy registration for a trial and takes just a few minutes to get started if you want to test the service.

The second thing you need is a tool for managing the data from the Linux command line. The two most popular ones — s3cmd and the AWS CLI — are sufficient, and I will show examples using both.

Installation and setup will depend on your OS and the S3 provider specifics. Please refer to the documentation below to proceed, as I will not cover the installation details here.

* https://s3tools.org/s3cmd
* https://docs.aws.amazon.com/cli/index.html

Backup tools

Two main tools are provided with the MongoDB packages — mongodump and mongoexport — and both perform logical backups.

Compression tool

We all know gzip or bzip2 are installed by default on almost every Linux distro. However, I find zstd way more efficient, so I’ll use it in the examples.
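As a quick illustration of why zstd is attractive in a backup pipeline, it compresses a stream on the fly and can use all CPU cores via the `-T0` flag; this assumes zstd is installed on your system:

```shell
# Compress a stream using all cores at the default level, then decompress it
printf 'hello mongodb' | zstd -T0 | zstd -d
```

The same `| zstd |` stage drops straight into the mongodump pipelines shown below.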


I believe real-case examples are best if you wish to test something similar, so here they are.

Mongodump & s3cmd – Single database backup

  • Let’s create a bucket dedicated to MongoDB data backups:
$ s3cmd mb s3://mbackups
Bucket 's3://mbackups/' created

  • Now, do a simple dump of one example database using the --archive option, which changes the behavior from storing collection data in separate files on disk to streaming the whole backup to standard output (STDOUT) in a common archive format. At the same time, the stream gets compressed on the fly and sent to the S3 destination.
  • Note that the command below does not create a backup consistent with regard to ongoing writes, as it does not contain the oplog.
$ mongodump --db=db2 --archive| zstd | s3cmd put - s3://mbackups/$(date +%Y-%m-%d.%H-%M)/db2.zst
2023-02-07T19:33:58.138+0100 writing db2.products to archive on stdout
2023-02-07T19:33:58.140+0100 writing db2.people to archive on stdout
2023-02-07T19:33:59.364+0100 done dumping db2.people (50474 documents)
2023-02-07T19:33:59.977+0100 done dumping db2.products (516784 documents)
upload: '<stdin>' -> 's3://mbackups/2023-02-07.19-33/db2.zst' [part 1 of -, 15MB] [1 of 1]
15728640 of 15728640 100% in 1s 8.72 MB/s done
upload: '<stdin>' -> 's3://mbackups/2023-02-07.19-33/db2.zst' [part 2 of -, 1491KB] [1 of 1]
1527495 of 1527495 100% in 0s 4.63 MB/s done

  • After the backup is done, let’s verify its presence in S3:
$ s3cmd ls -H s3://mbackups/2023-02-07.19-33/
2023-02-07 18:34 16M s3://mbackups/2023-02-07.19-33/db2.zst

Mongorestore & s3cmd – Database restore directly from S3

The mongorestore command below uses the --archive option as well, which allows us to stream the backup directly to it:

$ s3cmd get --no-progress s3://mbackups/2023-02-07.20-14/db2.zst - |zstd -d | mongorestore --archive --drop
2023-02-08T00:42:41.434+0100 preparing collections to restore from
2023-02-08T00:42:41.480+0100 reading metadata for db2.people from archive on stdin
2023-02-08T00:42:41.480+0100 reading metadata for db2.products from archive on stdin
2023-02-08T00:42:41.481+0100 dropping collection db2.people before restoring
2023-02-08T00:42:41.502+0100 restoring db2.people from archive on stdin
2023-02-08T00:42:42.130+0100 dropping collection db2.products before restoring
2023-02-08T00:42:42.151+0100 restoring db2.products from archive on stdin
2023-02-08T00:42:43.217+0100 db2.people 16.0MB
2023-02-08T00:42:43.217+0100 db2.products 12.1MB
2023-02-08T00:42:43.654+0100 db2.people 18.7MB
2023-02-08T00:42:43.654+0100 finished restoring db2.people (50474 documents, 0 failures)
2023-02-08T00:42:46.218+0100 db2.products 46.3MB
2023-02-08T00:42:48.758+0100 db2.products 76.0MB
2023-02-08T00:42:48.758+0100 finished restoring db2.products (516784 documents, 0 failures)
2023-02-08T00:42:48.758+0100 no indexes to restore for collection db2.products
2023-02-08T00:42:48.758+0100 no indexes to restore for collection db2.people
2023-02-08T00:42:48.758+0100 567258 document(s) restored successfully. 0 document(s) failed to restore.

Mongodump & s3cmd – Full backup

The command below provides a consistent point-in-time snapshot thanks to the --oplog option:

$ mongodump --port 3502 --oplog --archive | zstd | s3cmd put - s3://mbackups/$(date +%Y-%m-%d.%H-%M)/full_dump.zst
2023-02-13T00:05:54.080+0100 writing admin.system.users to archive on stdout
2023-02-13T00:05:54.083+0100 done dumping admin.system.users (1 document)
2023-02-13T00:05:54.084+0100 writing admin.system.version to archive on stdout
2023-02-13T00:05:54.085+0100 done dumping admin.system.version (2 documents)
2023-02-13T00:05:54.087+0100 writing db1.products to archive on stdout
2023-02-13T00:05:54.087+0100 writing db2.products to archive on stdout
2023-02-13T00:05:55.260+0100 done dumping db2.products (284000 documents)
upload: '<stdin>' -> 's3://mbackups/2023-02-13.00-05/full_dump.zst' [part 1 of -, 15MB] [1 of 1]
2023-02-13T00:05:57.068+0100 [####################....] db1.products 435644/516784 (84.3%)
15728640 of 15728640 100% in 1s 9.63 MB/s done
2023-02-13T00:05:57.711+0100 [########################] db1.products 516784/516784 (100.0%)
2023-02-13T00:05:57.722+0100 done dumping db1.products (516784 documents)
2023-02-13T00:05:57.723+0100 writing captured oplog to
2023-02-13T00:05:58.416+0100 dumped 136001 oplog entries
upload: '<stdin>' -> 's3://mbackups/2023-02-13.00-05/full_dump.zst' [part 2 of -, 8MB] [1 of 1]
8433337 of 8433337 100% in 0s 10.80 MB/s done

$ s3cmd ls -H s3://mbackups/2023-02-13.00-05/full_dump.zst
2023-02-12 23:05 23M s3://mbackups/2023-02-13.00-05/full_dump.zst

Mongodump & s3cmd – Full backup restore

By analogy, mongorestore uses the --oplogReplay option to apply the oplog contained in the archived stream:

$ s3cmd get --no-progress s3://mbackups/2023-02-13.00-05/full_dump.zst - | zstd -d | mongorestore --port 3502 --archive --oplogReplay
2023-02-13T00:07:25.977+0100 preparing collections to restore from
2023-02-13T00:07:25.977+0100 don't know what to do with subdirectory "db1", skipping...
2023-02-13T00:07:25.977+0100 don't know what to do with subdirectory "db2", skipping...
2023-02-13T00:07:25.977+0100 don't know what to do with subdirectory "", skipping...
2023-02-13T00:07:25.977+0100 don't know what to do with subdirectory "admin", skipping...
2023-02-13T00:07:25.988+0100 reading metadata for db1.products from archive on stdin
2023-02-13T00:07:25.988+0100 reading metadata for db2.products from archive on stdin
2023-02-13T00:07:26.006+0100 restoring db2.products from archive on stdin
2023-02-13T00:07:27.651+0100 db2.products 11.0MB
2023-02-13T00:07:28.429+0100 restoring db1.products from archive on stdin
2023-02-13T00:07:30.651+0100 db2.products 16.0MB
2023-02-13T00:07:30.652+0100 db1.products 14.4MB
2023-02-13T00:07:33.652+0100 db2.products 32.0MB
2023-02-13T00:07:33.652+0100 db1.products 18.0MB
2023-02-13T00:07:36.651+0100 db2.products 37.8MB
2023-02-13T00:07:36.652+0100 db1.products 32.0MB
2023-02-13T00:07:37.168+0100 db2.products 41.5MB
2023-02-13T00:07:37.168+0100 finished restoring db2.products (284000 documents, 0 failures)
2023-02-13T00:07:39.651+0100 db1.products 49.3MB
2023-02-13T00:07:42.651+0100 db1.products 68.8MB
2023-02-13T00:07:43.870+0100 db1.products 76.0MB
2023-02-13T00:07:43.870+0100 finished restoring db1.products (516784 documents, 0 failures)
2023-02-13T00:07:43.871+0100 restoring users from archive on stdin
2023-02-13T00:07:43.913+0100 replaying oplog
2023-02-13T00:07:45.651+0100 oplog 2.14MB
2023-02-13T00:07:48.651+0100 oplog 5.68MB
2023-02-13T00:07:51.651+0100 oplog 9.34MB
2023-02-13T00:07:54.651+0100 oplog 13.0MB
2023-02-13T00:07:57.651+0100 oplog 16.7MB
2023-02-13T00:08:00.651+0100 oplog 19.7MB
2023-02-13T00:08:03.651+0100 oplog 22.7MB
2023-02-13T00:08:06.651+0100 oplog 25.3MB
2023-02-13T00:08:09.651+0100 oplog 28.1MB
2023-02-13T00:08:12.651+0100 oplog 30.8MB
2023-02-13T00:08:15.651+0100 oplog 33.6MB
2023-02-13T00:08:18.651+0100 oplog 36.4MB
2023-02-13T00:08:21.651+0100 oplog 39.1MB
2023-02-13T00:08:24.651+0100 oplog 41.9MB
2023-02-13T00:08:27.651+0100 oplog 44.7MB
2023-02-13T00:08:30.651+0100 oplog 47.5MB
2023-02-13T00:08:33.651+0100 oplog 50.2MB
2023-02-13T00:08:36.651+0100 oplog 53.0MB
2023-02-13T00:08:38.026+0100 applied 136001 oplog entries
2023-02-13T00:08:38.026+0100 oplog 54.2MB
2023-02-13T00:08:38.026+0100 no indexes to restore for collection db1.products
2023-02-13T00:08:38.026+0100 no indexes to restore for collection db2.products
2023-02-13T00:08:38.026+0100 800784 document(s) restored successfully. 0 document(s) failed to restore.

Mongoexport – Export all collections from a given database, compress, and save directly to S3

Another example uses the tool to create regular JSON dumps; note that this is also not a consistent backup if writes are ongoing.

$ ts=$(date +%Y-%m-%d.%H-%M)
$ mydb="db2"
$ mycolls=$(mongo --quiet $mydb --eval "db.getCollectionNames().join('\n')")

$ for i in $mycolls; do mongoexport -d $mydb -c $i |zstd| s3cmd put - s3://mbackups/$ts/$mydb/$i.json.zst; done
2023-02-07T19:30:37.163+0100 connected to: mongodb://localhost/
2023-02-07T19:30:38.164+0100 [#######.................] db2.people 16000/50474 (31.7%)
2023-02-07T19:30:39.164+0100 [######################..] db2.people 48000/50474 (95.1%)
2023-02-07T19:30:39.166+0100 [########################] db2.people 50474/50474 (100.0%)
2023-02-07T19:30:39.166+0100 exported 50474 records
upload: '<stdin>' -> 's3://mbackups/2023-02-07.19-30/db2/people.json.zst' [part 1 of -, 4MB] [1 of 1]
4264922 of 4264922 100% in 0s 5.71 MB/s done
2023-02-07T19:30:40.015+0100 connected to: mongodb://localhost/
2023-02-07T19:30:41.016+0100 [##......................] db2.products 48000/516784 (9.3%)
2023-02-07T19:30:42.016+0100 [######..................] db2.products 136000/516784 (26.3%)
2023-02-07T19:30:43.016+0100 [##########..............] db2.products 224000/516784 (43.3%)
2023-02-07T19:30:44.016+0100 [##############..........] db2.products 312000/516784 (60.4%)
2023-02-07T19:30:45.016+0100 [##################......] db2.products 408000/516784 (78.9%)
2023-02-07T19:30:46.016+0100 [#######################.] db2.products 496000/516784 (96.0%)
2023-02-07T19:30:46.202+0100 [########################] db2.products 516784/516784 (100.0%)
2023-02-07T19:30:46.202+0100 exported 516784 records
upload: '<stdin>' -> 's3://mbackups/2023-02-07.19-30/db2/products.json.zst' [part 1 of -, 11MB] [1 of 1]
12162655 of 12162655 100% in 1s 10.53 MB/s done

$ s3cmd ls -H s3://mbackups/$ts/$mydb/
2023-02-07 18:30 4M s3://mbackups/2023-02-07.19-30/db2/people.json.zst
2023-02-07 18:30 11M s3://mbackups/2023-02-07.19-30/db2/products.json.zst

Mongoimport & s3cmd – Import single collection under a different name

$ s3cmd get --no-progress s3://mbackups/2023-02-08.00-49/db2/people.json.zst - | zstd -d | mongoimport -d db2 -c people_copy
2023-02-08T00:53:48.355+0100 connected to: mongodb://localhost/
2023-02-08T00:53:50.446+0100 50474 document(s) imported successfully. 0 document(s) failed to import.

Mongodump & AWS S3 – Backup database

$ mongodump --db=db2 --archive | zstd | aws s3 cp - s3://mbackups/backup1/db2.zst
2023-02-08T11:34:46.834+0100 writing db2.people to archive on stdout
2023-02-08T11:34:46.837+0100 writing db2.products to archive on stdout
2023-02-08T11:34:47.379+0100 done dumping db2.people (50474 documents)
2023-02-08T11:34:47.911+0100 done dumping db2.products (516784 documents)

$ aws s3 ls --human-readable mbackups/backup1/
2023-02-08 11:34:50 16.5 MiB db2.zst

Mongorestore & AWS S3 – Restore database

$ aws s3 cp s3://mbackups/backup1/db2.zst - | zstd -d | mongorestore --archive --drop
2023-02-08T11:37:08.358+0100 preparing collections to restore from
2023-02-08T11:37:08.364+0100 reading metadata for db2.people from archive on stdin
2023-02-08T11:37:08.364+0100 reading metadata for db2.products from archive on stdin
2023-02-08T11:37:08.365+0100 dropping collection db2.people before restoring
2023-02-08T11:37:08.462+0100 restoring db2.people from archive on stdin
2023-02-08T11:37:09.100+0100 dropping collection db2.products before restoring
2023-02-08T11:37:09.122+0100 restoring db2.products from archive on stdin
2023-02-08T11:37:10.288+0100 db2.people 16.0MB
2023-02-08T11:37:10.288+0100 db2.products 13.8MB
2023-02-08T11:37:10.607+0100 db2.people 18.7MB
2023-02-08T11:37:10.607+0100 finished restoring db2.people (50474 documents, 0 failures)
2023-02-08T11:37:13.288+0100 db2.products 47.8MB
2023-02-08T11:37:15.666+0100 db2.products 76.0MB
2023-02-08T11:37:15.666+0100 finished restoring db2.products (516784 documents, 0 failures)
2023-02-08T11:37:15.666+0100 no indexes to restore for collection db2.products
2023-02-08T11:37:15.666+0100 no indexes to restore for collection db2.people
2023-02-08T11:37:15.666+0100 567258 document(s) restored successfully. 0 document(s) failed to restore.

In the above examples, I used both the mongodump/mongorestore and mongoexport/mongoimport tools to back up and recover MongoDB data directly to and from S3 object storage, streaming and compressing on the fly. These methods are simple, fast, and resource-friendly. I hope they will be useful when you are looking for options to use in your backup scripts or ad-hoc backup tasks.

Additional tools

Here, I would like to mention that there are other free and open source backup solutions you may try, including Percona Backup for MongoDB (PBM), which now offers both logical and physical backups:

With the Percona Server for MongoDB variant, you may also stream hot physical backups directly to S3 storage:


It is as easy as this:

mongo > db.runCommand({createBackup: 1, s3: {bucket: "mbackups", path: "my_physical_dump1", endpoint: "s3.eu-central-2.wasabisys.com"}})
{ "ok" : 1 }

$ s3cmd du -H s3://mbackups/my_physical_dump1/
138M 26 objects s3://mbackups/my_physical_dump1/

For a sharded cluster, you should rather use PBM for consistent backups.

By the way, don't forget to check out the MongoDB best backup practices!

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

Download Percona Distribution for MongoDB today!


Talking Drupal #387 – ChatGPT

Today we are talking about ChatGPT with Ezequiel Lanza.

For show notes visit: www.talkingDrupal.com/387


  • What is ChatGPT?
  • What is AI?
  • What is Machine Learning?
  • Common misconceptions
  • How does it work?
  • Accuracy
  • Programmer bias
  • Use cases
  • Impressiveness
  • Drupal
  • Significance of Open Source


Hey GitHub – Coding with your voice ChatGPT Wolfram Alpha


Ezequiel Lanza – github.com/ezelanza @eze_lanza


Nic Laflin – www.nLighteneddevelopment.com @nicxvan John Picozzi – www.epam.com @johnpicozzi Katherine Druckman – katherinedruckman.com @katherined

MOTW Correspondent

Martin Anderson-Clutz – @mandclu Search API Solr Boost By User Term Allows your site to boost search results that share taxonomy term references with your users.
