Understanding What Kubernetes Is Used For: The Key to Cloud-Native Efficiency

If you have any experience working with database software, you have undoubtedly heard the term Kubernetes a lot. You may already be using it daily and find it makes running applications in the cloud much more manageable. But for those who are not so familiar, in this post, we will discuss how Kubernetes has emerged as the unsung hero in an industry where agility and scalability are critical success factors.

At its core, Kubernetes (often abbreviated as K8s) is an open source tool that automates the deployment, scaling, and management of containerized applications. It simplifies infrastructure management and is the driving force behind many cloud-native applications and services.

For some background, Kubernetes was created by Google and is currently maintained by the Cloud Native Computing Foundation (CNCF). It has become the industry standard for cloud-native container orchestration.

To get you started with Kubernetes, this post offers a high-level introduction to its core features, highlighting its built-in advantages and showcasing some practical, real-world uses.  It’s important to note that this overview is not intended to be an exhaustive guide to every facet of Kubernetes. Kubernetes can be complex, which is why we offer comprehensive training that equips you and your team with the expertise and skills to manage database configurations, implement industry best practices, and carry out efficient backup and recovery procedures.

Making container deployment simpler

One of Kubernetes’ primary use cases is deploying containerized apps. Applications are packaged into a single, lightweight container with their dependencies, typically including the application’s code, customizations, libraries, and runtime environment. These containers encapsulate all the components necessary to run the application independently, making them portable and consistent across various environments. Kubernetes manages and orchestrates these containers, handling tasks such as deployment, scaling, load balancing, and networking.

With Kubernetes, you can define and manage your application deployments declaratively, meaning you can tell it how your apps should operate, and Kubernetes takes care of the rest.
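To make the declarative idea a bit more concrete, here is a minimal Python sketch of a single reconciliation step: it compares the desired number of replicas with what is currently running and returns the actions needed to converge. The function and the pod names are hypothetical, and this only illustrates the idea; it is not how Kubernetes controllers are actually implemented.

```python
def reconcile(desired_replicas, running_pods):
    """Return the actions needed to move from the observed state to the desired one.

    Illustrative only: `running_pods` is a plain list of hypothetical pod names.
    """
    actions = []
    missing = desired_replicas - len(running_pods)
    if missing > 0:
        # Too few pods: ask for the missing ones to be created.
        actions += [("create", f"pod-{len(running_pods) + i}") for i in range(missing)]
    elif missing < 0:
        # Too many pods: remove the surplus at the end of the list.
        actions += [("delete", name) for name in running_pods[desired_replicas:]]
    return actions


# The desired state says 3 replicas, but only one pod is running.
print(reconcile(3, ["pod-0"]))
```

You describe the target state once; a loop like this runs repeatedly and keeps closing the gap, which is why no manual step is needed when a pod disappears.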

Maintaining high availability

Kubernetes also makes it easier for applications to scale in response to changing workloads, which helps maintain high availability. Applications can be scaled horizontally by adding or removing containers based on resource usage and incoming traffic. Kubernetes distributes the load among containers and nodes automatically, so your application can absorb spikes in traffic without manual intervention from IT staff.
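As a rough illustration of horizontal scaling, the Horizontal Pod Autoscaler documentation describes its core calculation as the current replica count multiplied by the ratio of the observed metric to the target metric, rounded up. The Python sketch below applies that ratio; the function name and the minimum and maximum bounds are assumptions for the example, and it leaves out details such as tolerances and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Replica count suggested by the ratio of observed load to target load."""
    wanted = math.ceil(current_replicas * current_metric / target_metric)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# 4 pods averaging 90% CPU against a 60% target -> scale out to 6.
print(desired_replicas(4, current_metric=90, target_metric=60))
```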

In addition, Kubernetes provides features like continuous monitoring, self-healing capabilities (automatically replacing containers if they happen to fail), and rolling updates (gradual updates of your applications), ensuring that your applications are always available — even in the face of failures or updates. And, if an issue ever does arise from an update, Kubernetes enables quick rollbacks to a previous version, preventing extended outages.

Multi-cloud deployments

Kubernetes is cloud-agnostic, freeing your applications from being tied to a single cloud provider. Your workloads, encapsulated in containers, can be deployed freely across different clouds or your own hardware. This portability ensures that your applications remain consistent and operational, regardless of where they are deployed.

Because of this flexibility, businesses may choose the infrastructure that best meets their needs. Instead of depending on the services of a single cloud vendor, Kubernetes provides the flexibility to distribute workloads over various cloud providers or on-premises infrastructure, reducing the risk of becoming trapped in a single ecosystem. This makes it much easier for organizations to transition to open source solutions, reduce costs, and avoid vendor lock-in.

Controlled access and security

Kubernetes offers a wide range of security capabilities intended to safeguard the integrity of your clusters and containerized applications. These include, but are not limited to:

  • Network policies: Establishing communication rules between pods provides additional protection at the network level. Policies define which ingress and egress traffic is allowed for a set of pods, which makes it possible to segment and isolate workloads. The goal is to make it harder for someone to compromise other pods, even if they manage to gain access to one.
  • Role-Based Access Control (RBAC): RBAC manages permissions to ensure that only individuals, programs, or processes with the proper authorization can utilize particular resources. In essence, it establishes permissions within a Kubernetes cluster.
  • Secrets management: Kubernetes also offers Secrets, a dedicated way to store sensitive data such as passwords, tokens, and API keys, so that your applications can access the credentials they require without hard-coding them.
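One caveat worth keeping in mind with Secrets: the values in a Secret object are base64-encoded, and encoding by itself is not encryption. Protection comes from restricting who can read the object (RBAC) and, where configured, from encryption at rest. The short Python sketch below shows how easily an encoded value is reversed; the credential is made up for the example.

```python
import base64

# Hypothetical credential, used only for illustration.
password = "s3cr3t-value"

# A Secret manifest carries values base64-encoded under `data`.
encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")

# Anyone who can read the object can decode it again: encoding is not encryption.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == password)
```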

Cost and resource efficiency

One of Kubernetes’ main advantages is its efficient use of resources. By packing several containers onto a single node, it improves utilization without over-provisioning, and it provides requests and limits to control how much CPU and memory each container gets. Combined, these features help ensure that your apps are allotted an appropriate amount of CPU and memory, saving you money and improving performance.
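To give a feel for why resource requests matter for cost, here is a simplified Python sketch that packs pods onto as few nodes as possible based on their CPU requests, using a first-fit-decreasing heuristic. This is an illustration only: the real scheduler weighs many more factors, and the pod names, request values, and node size are assumptions.

```python
def pack_pods(pod_requests, node_capacity):
    """Place pods onto as few nodes as possible using first-fit decreasing.

    pod_requests: dict of hypothetical pod name -> CPU request in millicores.
    node_capacity: allocatable CPU per node in millicores.
    """
    nodes = []  # each node is a dict: {"free": int, "pods": [names]}
    for name, request in sorted(pod_requests.items(), key=lambda kv: -kv[1]):
        if request > node_capacity:
            raise ValueError(f"{name} requests more CPU than a node offers")
        for node in nodes:
            if node["free"] >= request:
                break  # first node with enough room wins
        else:
            # No existing node fits: bring up a new one.
            node = {"free": node_capacity, "pods": []}
            nodes.append(node)
        node["free"] -= request
        node["pods"].append(name)
    return nodes


requests = {"api": 500, "worker": 1500, "cache": 250, "web": 750}
placement = pack_pods(requests, node_capacity=2000)
print(len(placement), [n["pods"] for n in placement])
```

With accurate requests, four workloads fit on two nodes here; inflated requests would push the same workloads onto more nodes than they really need.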

Continuous Integration and Deployment (CI/CD)

Kubernetes is a dependable base for continuous integration and delivery pipelines and not just a platform for orchestrating containers. It can be easily combined with continuous integration/continuous delivery (CI/CD) tools to automate the building, testing, and deployment of applications, facilitating the release of new versions/updates and code across numerous servers — with no downtime — enhancing development workflows and providing for quicker time-to-market for your applications.

Efficient microservices

The technique of creating applications using microservices architecture makes it possible to divide an application into smaller, independently deployable services, which simplifies development and maintenance. These services, known as “microservices,” can interact with one another through APIs and are each intended to carry out a particular task or feature of the application.

Management of stateful applications

Although Kubernetes is frequently associated with stateless applications, this flexible platform can also effectively handle stateful workloads, such as databases. For companies looking to streamline stateful application management while maintaining high availability and data integrity, Kubernetes is an excellent option because of its automation, scalability, and dependability.

Real-world Kubernetes use cases

Numerous industries, including finance, healthcare, e-commerce, education, etc., have adopted Kubernetes. It is an invaluable tool for resolving complicated issues and streamlining processes due to its flexibility and scalability. Here are a few examples of actual Kubernetes use cases across various industries:

E-commerce: Traffic to e-commerce platforms varies, particularly during sales, promotions, and holidays. Kubernetes helps ensure a seamless shopping experience by automatically scaling web applications to handle the increased load.

Entertainment: To guarantee that viewers have continuous access to films and TV series, film studios and television networks rely on Kubernetes for content delivery and streaming. Just consider the sheer number of people who stream Netflix every night!

Finance: Financial institutions utilize Kubernetes for its security and compliance features. It facilitates simple scalability and effective disaster recovery and enables them to deploy and administer applications in a highly regulated environment.

Education: Universities often use Kubernetes for projects that require substantial resources because it scales computing clusters for research, data analysis, and simulations.

Healthcare: Hospitals and healthcare providers use Kubernetes for processing and analyzing vast amounts of health data so that medical records, lab results, and patient information are protected and available when needed.

Telecommunications: By guaranteeing low-latency communication, Kubernetes assists the telecom sector in quickly deploying 5G and edge computing applications.


Kubernetes’ versatility empowers organizations to streamline and simplify their containerized workloads while enhancing scalability, ensuring high availability, and improving resource efficiency. It is the go-to choice for businesses looking to move to the cloud, embrace microservices, and deploy applications across various environments, all while avoiding the high cost of vendor lock-in.

Are you interested in learning more about Kubernetes or need assistance with your cloud-native strategy? With Percona Kubernetes Operators, you can manage database workloads on any supported Kubernetes cluster running in private, public, hybrid, or multi-cloud environments. They are 100% open source, free from vendor lock-in, usage restrictions, and expensive contracts, and include enterprise-ready features by default.



Discover how Percona Operator for MongoDB transformed one of our valued customers’ experience with Kubernetes. With Percona as their trusted partner, they’ve experienced reliable solutions and outstanding support for their database needs.



What are some common use cases for Kubernetes?

Kubernetes is widely used for managing containerized applications, making it suitable for various scenarios, including web services, microservices, and data processing applications.

Is Kubernetes suitable for small businesses, or is it primarily for large enterprises?

Businesses of all sizes can benefit from Kubernetes. Large corporations frequently use it for complex workloads, but small businesses can also take advantage of features like automation and resource optimization.

Can Kubernetes work with different cloud providers?

Yes. Kubernetes is cloud-agnostic and compatible with multiple cloud providers, which gives your applications flexibility and portability.


Help! I Am Out of Disk Space!

How can we fix a nasty out-of-space issue leveraging the flexibility of Percona Operator for MySQL?

When planning a database deployment, one of the most challenging factors to consider is the amount of space we need to dedicate to data on disk.

This is even more cumbersome when working on bare metal, where adding space is more difficult than it is in the cloud.

When using cloud storage like EBS or similar, it is normally easy (or at least easier) to extend volumes, which gives us the luxury of planning the space to allocate for data in a more relaxed way.

Is this also true when using a solution based on Kubernetes like Percona Operator for MySQL? Well, it depends on where you run it. However, if the platform you choose supports the option to extend volumes, K8s per se gives you the possibility to do so as well.

Nonetheless, if it can go wrong it will, and ending up with a fully filled device with MySQL is not a fun experience. 

As you know, on normal deployments, when MySQL has no space left on the device, it simply stops working. That means a production-down event, which is something we want to avoid at any cost.

This blog is the story of what happened, what was supposed to happen, and why. 

The story

The case was on AWS using EKS.

Given all the above, I was quite surprised when we had a case in which a deployed solution based on Percona Operator for MySQL went out of space. However, we started to dig in and review what was going on and why.

The first thing we did was to quickly investigate what was really taking space, and that could have been an easy win if most of the space was taken by some log, but unfortunately, this was not the case, as data was really taking all the available space. 

The next step was to check what storage class (SC) was used for the PersistentVolumeClaim (PVC):

kubectl get pvc
datadir-mt-cluster-1-pxc-0   pvc-<snip>   233Gi      RWO            io1
datadir-mt-cluster-1-pxc-1   pvc-<snip>   233Gi      RWO            io1
datadir-mt-cluster-1-pxc-2   pvc-<snip>   233Gi      RWO            io1

OK, we use the io1 SC. Now it is time to check whether the SC supports volume expansion:

kubectl describe sc io1
Name:            io1
IsDefaultClass:  No
Parameters:            fsType=ext4,iopsPerGB=12,type=io1
AllowVolumeExpansion:  <unset> <------------
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>

No, it is not enabled, so we cannot simply expand the volume; we must change the storage class settings first. In our case, that meant deleting the storage class and recreating it with volume expansion enabled.

Unfortunately, we were unsuccessful in doing that operation, because the storage class kept showing AllowVolumeExpansion as <unset>.

As said, this was a production-down event, so we could not invest too much time in digging into why the setting was not changing correctly. We had to act quickly.

The only option we had to fix it was to:

  • Expand the io1 volumes from the AWS console (or AWS client)
  • Resize the file system
  • Patch the K8s objects so that K8s correctly sees the new volume size

Expanding EBS volumes from the console is trivial: go to Volumes, select the volume you want to modify, choose Modify, change its size to the desired one, and you are done.

Once that is done, identify the node hosting the pod that has the volume mounted, so you can connect to it:

kubectl get pods -o wide | grep pxc-0
NAME                                        READY     STATUS    RESTARTS   AGE    IP            NODE             
cluster-1-pxc-0                               2/2     Running   1          11d     <mynode>.eu-central-1.compute.internal

Then we need to get the name of the volume bound to the PVC to identify it on the node:

kubectl get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS
datadir-cluster-1-pxc-0   Bound    pvc-1678c7ee-3e50-4329-a5d8-25cd188dc0df   233Gi      RWO            io1

One note, when doing this kind of recovery with a Percona XtraDB Cluster-based solution, always recover node-0 first, then the others.  

So we connect to <mynode> and identify the volume: 

lsblk | grep pvc-1678c7ee-3e50-4329-a5d8-25cd188dc0df
nvme1n1      259:4    0  350G  0 disk /var/lib/kubelet/pods/9724a0f6-fb79-4e6b-be8d-b797062bf716/volumes/ <-----

At this point we can resize it:

root@ip-<snip>:/# resize2fs  /dev/nvme1n1
resize2fs 1.45.5 (07-Jan-2020)
Filesystem at /dev/nvme1n1 is mounted on /var/lib/kubelet/plugins/; on-line resizing required
old_desc_blocks = 30, new_desc_blocks = 44
The filesystem on /dev/nvme1n1 is now 91750400 (4k) blocks long.

The good thing is that as soon as you do that, the MySQL daemon sees the space and restarts. However, this happens only on the current pod, and K8s will still see the old size:

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                            STORAGECLASS   REASON
pvc-1678c7ee-3e50-4329-a5d8-25cd188dc0df   233Gi      RWO            Delete           Bound    pxc/datadir-cluster-1-pxc-0   io1

To align K8s with the real size, we must patch the stored information. The command is the following (note the unit suffix in the storage value; a bare number would be interpreted as bytes):

kubectl patch pvc <pvc-name>  -n <pvc-namespace> -p '{ "spec": { "resources": { "requests": { "storage": "NEW STORAGE VALUE" }}}}'
kubectl patch pvc datadir-cluster-1-pxc-0 -n pxc -p '{ "spec": { "resources": { "requests": { "storage": "350Gi" }}}}'

Remember to use as <pvc-name> the NAME coming from:

kubectl get pvc

Once this is done, K8s will see the new volume size correctly.
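The storage value passed in the patch above is a Kubernetes quantity, and the suffix changes its meaning: a bare number is plain bytes, G means 10^9 bytes, and Gi means 2^30 bytes. The Python sketch below converts a few quantities to bytes to show the difference. The helper is hypothetical and covers only a subset of the suffixes Kubernetes accepts.

```python
# Suffix multipliers for the most common Kubernetes storage quantities.
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def quantity_to_bytes(quantity):
    """Convert a quantity such as '350Gi', '350G' or '350' to bytes."""
    for suffix, factor in {**BINARY, **DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix means plain bytes


for value in ("350", "350G", "350Gi"):
    print(value, "->", quantity_to_bytes(value))
```

In this case, 350Gi is the value that matches the 91750400 blocks of 4k reported by resize2fs earlier.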

Just repeat the process for node-1 and node-2 and… done, the cluster is up again.

Finally, do not forget to modify your custom resource file (cr.yaml) to match the new volume size. For example:

        storageClassName: "io1"
        resources:
          requests:
            storage: 350Gi

The whole process took just a few minutes. It was now time to investigate why the incident happened and why the storage class was not allowing extension in the first place.

Why it happened

Well, first and foremost the platform was not correctly monitored. As such there was a lack of visibility about space utilization and no alert about disk space. 

This was easy to solve by enabling the Percona Monitoring and Management (PMM) feature in the cluster CR and setting the alert in PMM once the nodes join it (see the PMM documentation for details on how to do it).
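The logic behind a disk-space alert is simple, and a minimal Python sketch of it looks like the following. This is not how PMM implements alerting; the function names, the thresholds, and the path checked are assumptions for illustration, and on a database node you would point it at the filesystem holding the MySQL data directory.

```python
import shutil


def disk_usage_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def alert_level(percent_used, warning=80.0, critical=90.0):
    """Map a usage percentage to an alert level (thresholds are assumptions)."""
    if percent_used >= critical:
        return "critical"
    if percent_used >= warning:
        return "warning"
    return "ok"


used = disk_usage_percent("/")
print(f"{used:.1f}% used -> {alert_level(used)}")
```

The important part is the warning threshold: it has to fire early enough to leave time for a planned volume expansion.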

The second issue was the problem with the storage class. Once we had the time to carefully review the configuration files, we identified an extra level of indentation in the StorageClass definition, which was causing K8s to ignore the directive.

It was supposed to be:

kind: StorageClass
metadata:
  name: io1
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
parameters:
  type: io1
  iopsPerGB: "12"
  fsType: ext4
allowVolumeExpansion: true   # <----------

It was:

kind: StorageClass
metadata:
  name: io1
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
parameters:
  type: io1
  iopsPerGB: "12"
  fsType: ext4
  allowVolumeExpansion: true   # <---------

What was concerning was the lack of error returned by the Kubernetes API, so in theory the configuration was accepted but not really validated. 
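Since the API did not complain, a small check of our own before applying a manifest can catch this kind of slip. The Python sketch below looks for fields that belong at the top level of a StorageClass but ended up nested one level too deep. It assumes the manifest has already been parsed into a dictionary (for example by a YAML library) and that the extra indentation placed the directive under parameters; the helper name is hypothetical.

```python
# Fields that belong at the top level of a StorageClass manifest.
TOP_LEVEL_FIELDS = {
    "allowVolumeExpansion", "mountOptions", "parameters", "provisioner",
    "reclaimPolicy", "volumeBindingMode",
}


def misplaced_fields(manifest):
    """Return (parent, field) pairs for top-level fields nested one level too deep."""
    found = []
    for parent, value in manifest.items():
        if isinstance(value, dict):
            found += [(parent, key) for key in value if key in TOP_LEVEL_FIELDS]
    return found


# The manifest as a parsed dictionary, with the directive indented under `parameters`.
broken = {
    "kind": "StorageClass",
    "metadata": {"name": "io1"},
    "parameters": {"type": "io1", "iopsPerGB": "12", "fsType": "ext4",
                   "allowVolumeExpansion": True},
}
print(misplaced_fields(broken))
```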

In any case, once we had fixed the typo and recreated the SC, the setting for volume expansion was correctly accepted:

kubectl describe sc io1
Name:            io1
IsDefaultClass:  No
Parameters:            fsType=ext4,iopsPerGB=12,type=io1
AllowVolumeExpansion:  True    
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>

What should have happened instead?

If proper monitoring and alerting had been in place, the administrators would have had the time to act and extend the volumes without downtime.

The procedure for extending volumes on K8s is not complex, but it is also not as straightforward as you may think. My colleague Natalia Marukovich wrote a blog post, Percona Operator Volume Expansion Without Downtime, that gives you step-by-step instructions on how to extend the volumes without downtime.


Using the cloud, containers, automation, or more complex orchestrators like Kubernetes does not solve everything, does not prevent mistakes from happening, and, more importantly, does not make the right decisions for you.

You must set up a proper architecture that includes backup, monitoring, and alerting. You must set the right alerts and act on them in time.

Finally, automation is cool; however, the devil is in the details, and typos are his day-to-day joy. Be careful and check what you put online; do not rush it. Validate, validate, validate…

Great stateful MySQL to all.

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.



Run PostgreSQL in Kubernetes: Solutions, Pros and Cons

PostgreSQL’s initial release was in 1996 when cloud-native was not even a term. Right now it is the second most popular relational open source database according to DB-engines. With its popularity growth and the rising trend of Kubernetes, it is not a surprise that there are multiple solutions to run PostgreSQL on K8s.

In this blog post, we are going to compare these solutions and review the pros and cons of each of them. The solutions under our microscope are:

  1. Crunchy Data PostgreSQL Operator (PGO)
  2. CloudNative PG from EnterpriseDB
  3. Stackgres from OnGres
  4. Zalando Postgres Operator
  5. Percona Operator for PostgreSQL

The summary and comparison table can be found in our documentation.

Crunchy Data PGO

Crunchy Data is a company well-known in the PostgreSQL community. They provide a wide range of services and software solutions for PG. Their PostgreSQL Operator (PGO) is fully open source (Apache 2.0 license), but at the same time, the container images used by the operator are shipped under the Crunchy Data Developer Program. This means that you cannot use the Operator with these images in production without a contract with Crunchy Data. Read more in the Terms of Use.


According to the documentation, the latest version of the operator is 5.2.0, but the latest tag on GitHub is 4.7.7. I was not able to find which version is ready for production, so I will use the quickstart installation from the GitHub page, which installs 5.2.0. The quick start is not that quick. First, you need to fork the repository with examples.

Executing these commands failed for me:

YOUR_GITHUB_UN="<your GitHub username>"
git clone --depth 1 "${YOUR_GITHUB_UN}/postgres-operator-examples.git"
cd postgres-operator-examples

Cloning into 'postgres-operator-examples'... Permission denied (publickey).
fatal: Could not read from remote repository.

I ended up just cloning the repo with:

git clone --depth 1

Then I ran the kustomize command, which failed as well:

$ kubectl apply -k kustomize/install
error: unable to find one of 'kustomization.yaml', 'kustomization.yml' or 'Kustomization' in directory '/home/percona/postgres-operator-examples/kustomize/install'

The instructions on the documentation page have other commands, so I used them instead. As a person who loves open source, I sent a PR to fix the doc on GitHub.

kubectl apply -k kustomize/install/namespace
kubectl apply --server-side -k kustomize/install/default

Now the Operator is installed. Next, install the cluster:

kubectl apply -k kustomize/postgres/


The PGO operator is used in production by various companies, comes with management capabilities, and allows users to fine-tune PostgreSQL clusters.

There is no need to go through the regular day-two operations here, like backups and scaling. The following features are quite interesting:

  • Extension Management. PostgreSQL extensions expand the capabilities of the database. With PGO, you can easily add extensions for your cluster and configure them during bootstrap. I like the simplicity of this approach.
  • User / database management. Create users and databases during cluster initialization. This is very handy for CI/CD pipelines and various automations.
  • Backup with annotations. Usually, Operators come with a separate Custom Resource Definition for backups and restores. In the case of PGO, backups and restores are managed through annotations. This is an antipattern but still follows the declarative form.

CloudNative PG

This operator matured inside EnterpriseDB (EDB) and was open-sourced only recently. It is Apache-licensed and fully open source. There is also an EDB Postgres operator, which is a fork based on CloudNative PG. The Enterprise version has some additional features, for example, support for Red Hat OpenShift.


Using quickstart, here is how to install the Operator:

kubectl apply -f \

It automatically creates the namespace and deploys the necessary CRDs, service accounts, and more.

Once done, you can deploy the PostgreSQL cluster. There are multiple example YAMLs.

kubectl apply -f

There is also a helm chart available that can simplify the installation even more.


CloudNative PG comes with a wide range of regular operational capabilities: backups, scaling, and upgrades. The architecture of the Operator is quite interesting:

  • No StatefulSets. Normally, you would see StatefulSets used for stateful workloads in Kubernetes. Here, the PostgreSQL cluster is deployed with standalone Pods, which are fully controlled by the Operator.
  • No Patroni. Patroni is a de-facto standard in the PostgreSQL community to build highly available clusters. Instead, the Operator uses its own Postgres instance manager.
  • Barman for backups. Not a usual choice either, but it can be explained by the fact that Barman, a backup tool for PostgreSQL, was developed by the 2ndQuadrant team, which was acquired by EDB.

Apart from architecture decisions, there are some things that I found quite refreshing:

  • Documentation. As a product manager, I’m honestly fascinated by their documentation. It is very detailed and full of examples covering a wide variety of use cases.
  • The custom resource which is used to create the cluster is called “Cluster”. It is a bit weird, but running something like kubectl get cluster is kinda cool.
  • You can bootstrap a new cluster from an existing backup object and use streaming replication from an existing PostgreSQL cluster, even one outside Kubernetes. This is useful for CI/CD and migrations.


Stackgres

OnGres is a company providing support, professional, and managed services for PostgreSQL. Its operator, Stackgres, is licensed under AGPL v3.


Installation is super simple and described on the website. It boils down to a single command:

kubectl apply -f ''

This will deploy the web user interface and the operator. The recommended way to deploy and manage clusters is through the UI. Get the login and password:

kubectl get secret -n stackgres stackgres-restapi --template '{{ printf "username = %s\n" (.data.k8sUsername | base64decode) }}'
kubectl get secret -n stackgres stackgres-restapi --template '{{ printf "password = %s\n" (.data.clearPassword | base64decode) }}'

Connect to the UI. You can either expose the UI through a LoadBalancer or with Kubernetes port forwarding:

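# Look up the name of the REST API pod (the jsonpath must select .metadata.name)
POD_NAME=$(kubectl get pods --namespace stackgres -l "app=stackgres-restapi" -o jsonpath="{.items[0].metadata.name}")
# Forward local port 8443 to port 9443 of the pod, listening on all interfaces
kubectl port-forward ${POD_NAME} --address 0.0.0.0 8443:9443 --namespace stackgres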

Deployment of the cluster in the UI is quite straightforward and I will not cover it here.


The UI allows users to scale, back up, restore, clone, and perform various other tasks with the clusters. I found it a bit hard to debug issues. It is recommended to set up a log server and debug issues on it, but I have not tried it. Still, the UI itself is mature, flexible, and just nice!

Some interesting features:

  • Experimental Babelfish support that enables the migration from MSSQL to save on license costs.
  • Extension management system, where users can choose the extension and its version to expand PG cluster capabilities.
  • To perform upgrades, vacuum, and other database activities, the Operator provides a Database Operation capability. It also has built-in benchmarking, which is cool!

Zalando Postgres Operator

Zalando is an online retailer of shoes, fashion, and beauty. It is the only company in this blog post that is not database-focused. They open-sourced the Operator that they use internally to run and manage PostgreSQL databases, and it is quite widely adopted. It is worth mentioning that the Zalando team also developed and open-sourced Patroni, which is widely used.


You can deploy Zalando Operator through a helm chart or with kubectl. Same as with Stackgres, this Operator has a built-in web UI.

Helm chart installation is the quickest and easiest way to get everything up and running:

# add repo for postgres-operator
helm repo add postgres-operator-charts

# install the postgres-operator
helm install postgres-operator postgres-operator-charts/postgres-operator

# add repo for postgres-operator-ui
helm repo add postgres-operator-ui-charts

# install the postgres-operator-ui
helm install postgres-operator-ui postgres-operator-ui-charts/postgres-operator-ui

Expose the UI:

kubectl port-forward svc/postgres-operator-ui 8081:80

Connect to the UI and create the cluster. 


This is one of the oldest PostgreSQL Operators, and its functionality has expanded over time. It supports backups and restores, major version upgrades, and much more. Also, it has a web-based user interface to ease onboarding.

  • The operator heavily relies on Spilo, a Docker image that bundles PostgreSQL and Patroni together. It was developed at Zalando as well. This is the centerpiece of the HA architecture.
  • As Zalando is using AWS for its infrastructure, the operator is heavily tested on AWS and can be integrated with it. You can see it in some features, like live volume resize for AWS EBS or gp2 to gp3 migration.

Percona Operator for PostgreSQL

Percona is committed to providing software and services for databases anywhere. Kubernetes is a de-facto standard for cloud-native workloads that helps with this commitment.

The most important things about our Operator:

  • Fully open source
  • Supported by the community and the Percona team. If you have a contract with Percona, you are fully covered with our exceptional services.
  • It is based on the Crunchy Data PGO v4.7 with enhancements for monitoring, upgradability, and flexibility


We have quick-start installation guides through helm and regular YAML manifests. The installation through helm is as follows:

Install the Operator:

helm repo add percona
helm install my-operator percona/pg-operator --version 1.3.0

Deploy PostgreSQL cluster:

helm install my-db percona/pg-db --version 1.3.0


Most of the features are inherited from Crunchy Data – backups, scaling, multi-cluster replication, and many more. 

    • Open Source. Compared to Crunchy Data PGO, we do not impose any limitations on container images, so it is fully open source and can be used without any restrictions in production. 
    • Percona Monitoring and Management (PMM) is an open source database monitoring, observability, and management tool. Percona Operators come with an integration with PMM, so that users get full visibility into the health of their databases. 
    • Automated Smart Upgrades. Our Operator not only allows users to upgrade the database but also does it automatically and in a safe, zero-downtime way.
    • One-stop shop. Today’s enterprise environment is multi-database by default. Percona can help companies run PostgreSQL, MySQL, and MongoDB database workloads on Kubernetes in a comprehensive manner.

To keep you excited, we are working on version two of the operator. It will have an improved architecture, remove existing limitations for backups and restores, enable automated scaling for storage and resources, and more. This quarter we plan to release a beta version, so keep an eye on our releases.


PostgreSQL in Kubernetes is not a necessary evil, but an evolutionary step for companies that chose K8s as their platform. Choosing a vendor and a solution is an important technical decision, which might impact various business metrics in the future. Still confused by the various choices? Please start a discussion on the forum or contact our team directly.

The Percona Kubernetes Operators automate the creation, alteration, or deletion of members in your Percona Distribution for MySQL, MongoDB, or PostgreSQL environment.



MySQL in Microservices Environments

The microservice architecture is not a new pattern but has become very popular lately, mainly for two reasons: cloud computing and containers. That combo helped increase adoption by tackling the two main concerns of every infrastructure: cost reduction and infrastructure management.

However, all that beauty hides a dark truth:

The hardest part of microservices is the data layer

And that is especially true when it comes to classic relational databases like MySQL. Let’s figure out why that is.

MySQL and the microservice

Following the same two pillars of microservices (cloud computing and containers), what can one do with that in the MySQL space? What do cloud computing and containers bring to the table?

Cloud computing

The magic of the cloud is that it allows you to be cost savvy by letting you easily SCALE UP/SCALE DOWN the size of your instances. No more wasted money on big servers that are underutilized most of the time. What’s the catch? It’s gotta be fast. Quick scale up to be ready to deal with traffic and quick scale down to cut costs when traffic is low. 


Containers

The magic of containers is that one can slice hardware to fit the resource requirements. The catch here is that containers were traditionally used for stateless applications: disposable containers.

Relational databases in general, and MySQL in particular, are stateful and not fast to scale. However, they can be adapted to the cloud and used for the data layer in microservices.

The Scale Cube

The book “The Art of Scalability” by Abbott and Fisher describes a really useful three-dimensional scalability model: the Scale Cube. In this model, scaling an application by running clones behind a load balancer is known as X-axis scaling. The other two kinds of scaling are Y-axis scaling and Z-axis scaling. The microservice architecture is an application of Y-axis scaling: it defines an architecture that structures the application as a set of loosely coupled, collaborating services.

  • X-Axis: Horizontal Duplication and Cloning of services and data (READ REPLICAS)
  • Y-Axis: Functional Decomposition and Segmentation – (MULTI-TENANT)
  • Z-Axis: Service and Data Partitioning along Customer Boundaries – (SHARDING)

In microservices, each service has its own database in order to be decoupled from other services. In other words: a service’s transactions only involve its database; data is private and accessible only via the microservice API.

It’s natural that the first approach to dividing the data is the multi-tenant pattern.

Actually, before trying multi-tenant, one can use a tables-per-service model where each service owns a set of tables that must only be accessed by that service. But with such a “soft” division, the temptation to skip the API and directly access other services’ tables is huge.

Schema-per-service, where each service has a database schema that is private to that service is appealing since it makes ownership clearer. It is easy to create a user per database, with specific grants to limit database access.

This “shared database” pattern, however, comes with some drawbacks:

  • Single hardware: a failure in your database will hurt all the microservices
  • Resource-intensive tasks related to a particular database will impact the other databases (think of DDLs)
  • Shared resources: disk latency, IOPS, and bandwidth need to be shared, as well as other resources like CPU, network bandwidth, etc.

The alternative is to go “Database per service”

Database per service

Share nothing. Cleaner logical separation. Isolated issues. Services are loosely coupled. In fact, this opens the door for microservices to use a database that best suits their needs, like a graph db, a document-oriented database, etc. But as with everything, this also comes with drawbacks:

  • The most obvious: cost. More instances to deploy
  • The most critical: distributed transactions. As we mentioned before, microservices collaborate with each other, which means that transactions span several services.

The simplistic approach is to use a two-phase commit implementation. But that solution is just an open invitation to a huge amount of locking issues. It just doesn’t scale. So what are the alternatives?

  • Implementing transactions that span services: The Saga pattern
  • Implementing queries that span services: API composition or Command Query Responsibility Segregation (CQRS)

A saga is a sequence of local transactions. Each local transaction updates the database and publishes messages or events that trigger the next local transaction in the saga. If a local transaction fails for whatever reason, then the saga executes a series of compensating transactions that undo the changes made by the previous transactions.
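To make the compensation idea concrete, here is a minimal in-process sketch in Python. It is only an illustration under simplifying assumptions: the step names (create_order, reserve_stock, charge_payment) and the run_saga helper are hypothetical, and a real saga would exchange messages or events between services instead of calling local functions.

```python
from typing import Callable, List, Tuple

# Each saga step pairs a local transaction with the compensating
# transaction that undoes it. All names here are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _compensate = step
        try:
            action()
        except Exception:
            log.append(f"{name}: failed")
            # Undo the already committed local transactions, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return log
        completed.append(step)
        log.append(f"{name}: committed")
    return log


def fail_payment() -> None:
    raise RuntimeError("card declined")


# Toy "databases" owned by two different services.
orders = []
stock = {"widget": 1}

saga_log = run_saga([
    ("create_order",
     lambda: orders.append("order-1"),
     lambda: orders.remove("order-1")),
    ("reserve_stock",
     lambda: stock.update(widget=stock["widget"] - 1),
     lambda: stock.update(widget=stock["widget"] + 1)),
    ("charge_payment", fail_payment, lambda: None),
])
print(saga_log)
```

Because the last step fails, the two earlier steps are compensated in reverse order, and both toy databases end up in their initial state.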

An API composition is just a composer that invokes queries on each microservice and then performs an in-memory join of the results.
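A rough Python sketch of such a composer follows. The two fetch functions stand in for calls to hypothetical customer and order services (in practice these would be HTTP or gRPC requests), and the field names are made up for the example.

```python
from typing import Dict, List


# Stand-ins for two microservice APIs; service names and fields are hypothetical.
def fetch_customers() -> List[Dict]:
    return [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Linus"}]


def fetch_orders() -> List[Dict]:
    return [
        {"order_id": 10, "customer_id": 1, "total": 25.0},
        {"order_id": 11, "customer_id": 1, "total": 40.0},
        {"order_id": 12, "customer_id": 2, "total": 15.0},
    ]


def compose_orders_with_customers() -> List[Dict]:
    """Query each service, then join the results in memory (a simple hash join)."""
    customers_by_id = {c["customer_id"]: c for c in fetch_customers()}
    joined = []
    for order in fetch_orders():
        customer = customers_by_id.get(order["customer_id"])
        if customer is not None:
            joined.append({**order, "customer_name": customer["name"]})
    return joined


composed = compose_orders_with_customers()
print(composed)
```

Note that the join cost moves from the database to the composer, which is exactly the kind of work that becomes the developer's responsibility.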

CQRS means keeping one or more materialized views that contain data from multiple services. This avoids the need to do joins on the query side.
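As a small illustration, the Python sketch below maintains a read-side view from events. The event types (CustomerCreated, OrderPlaced) and the CustomerSpendView class are hypothetical; a real system would consume these events from a message broker and persist the view.

```python
from collections import defaultdict
from typing import Dict


# Read-side materialized view: total spent per customer, built from events
# published by two hypothetical services (customers and orders).
class CustomerSpendView:
    def __init__(self) -> None:
        self.names: Dict[int, str] = {}
        self.totals: Dict[int, float] = defaultdict(float)

    def apply(self, event: Dict) -> None:
        """Update the view from one event; unknown event types are ignored."""
        if event["type"] == "CustomerCreated":
            self.names[event["customer_id"]] = event["name"]
        elif event["type"] == "OrderPlaced":
            self.totals[event["customer_id"]] += event["total"]

    def query(self) -> Dict[str, float]:
        """Answer the read without joining across services at query time."""
        return {name: self.totals[cid] for cid, name in self.names.items()}


view = CustomerSpendView()
for ev in [
    {"type": "CustomerCreated", "customer_id": 1, "name": "Ada"},
    {"type": "OrderPlaced", "customer_id": 1, "total": 25.0},
    {"type": "OrderPlaced", "customer_id": 1, "total": 40.0},
    {"type": "CustomerCreated", "customer_id": 2, "name": "Linus"},
]:
    view.apply(ev)
print(view.query())
```

The view is updated when data changes, so the query itself is a plain lookup.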

What do all these alternatives have in common? They are all taken care of at the API level: it becomes the responsibility of the developer to implement and maintain them. The data layer continues to be data, not information.

Make it cloud

There are means for your MySQL to be cloud-native: Easy to scale up and down fast; running on containers, a lot of containers; orchestrated with Kubernetes; with all the pros of Kubernetes (health checks, I’m looking at you).

Percona Operator for MySQL based on Percona XtraDB Cluster

A Kubernetes Operator is a special type of controller introduced to simplify complex deployments. Operators provide full application lifecycle automation and make use of the Kubernetes primitives above to build and manage your application. 

Our blog post “Introduction to Percona Operator for MySQL Based on Percona XtraDB Cluster” gives an overview of the operator and its benefits. However, it’s worth mentioning what makes it cloud native:

  • It takes advantage of cloud computing, scaling up and down
  • Runs on containers
  • Is orchestrated by the cloud orchestrator itself: Kubernetes

Under the hood is a Percona XtraDB Cluster running on Pods. It is easy to scale out (increase the number of nodes) and can be scaled up by giving more resources to the Pod definition (without downtime).

Give it a try and unleash the power of the cloud on MySQL.


Percona Operator for MongoDB Goes Cluster-Wide, Adds AKS Support, and More!

Percona Operator for MongoDB version 1.13 was recently released, and it comes with various ravishing features. In this blog post, we are going to look under the hood and see what the practical use cases for these improvements are.

Cluster-wide deployment

There are two modes that Percona Operators support:

  1. Namespace scope
  2. Cluster-wide

Namespace scope limits the Operator operations to a single namespace, whereas in cluster-wide mode the Operator can deploy and manage databases in multiple namespaces of a Kubernetes cluster. Our Operators for PostgreSQL and MySQL already support cluster-wide mode. With the 1.13 release, we are closing the gap for Percona Operator for MongoDB.

Multi-tenant clusters are the most common use case for cluster-wide mode. You, as a cluster administrator, manage a single deployment of the Operator and equip your teams with a way to deploy and manage MongoDB in their isolated namespaces. Read more about multi-tenancy and best practices in “Multi-Tenant Kubernetes Cluster with Percona Operators”.

How does it work?

To deploy in cluster-wide mode we introduce dedicated manifests. The quickest way would be to use deploy/cw-bundle.yaml, which deploys the following:

  • Custom Resource Definition
  • Service Account and Cluster Role that allows Operator to create and manage Kubernetes objects in various namespaces
  • Operator Deployment itself

By default, the Operator monitors all the namespaces in the cluster. The WATCH_NAMESPACE environment variable in the Operator Deployment limits the scope. It can be a comma-separated list that instructs the Operator on which namespaces to monitor for Custom Resource objects:

            - name: WATCH_NAMESPACE
              value: "app-dev1,app-dev2"

This is useful if you want to limit the blast radius, but run multiple Operators monitoring various namespaces. For example, you can run an Operator per environment – development, staging, production.
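The scoping semantics can be sketched in a few lines of Python. This is not the Operator's actual code (the Operator is not written in Python); the function names are hypothetical, and treating an unset or empty value as "all namespaces" is an assumption based on the default behavior described above.

```python
from typing import Optional, Set


def parse_watch_namespaces(value: Optional[str]) -> Optional[Set[str]]:
    """Return the set of namespaces to watch, or None meaning "all namespaces"."""
    if value is None:
        return None
    names = {part.strip() for part in value.split(",") if part.strip()}
    # Assumption: an empty list behaves like an unset variable.
    return names or None


def should_reconcile(namespace: str, watched: Optional[Set[str]]) -> bool:
    """Decide whether a Custom Resource in this namespace is in scope."""
    return watched is None or namespace in watched


watched = parse_watch_namespaces("app-dev1,app-dev2")
print(should_reconcile("app-dev1", watched))
print(should_reconcile("app-prod", watched))
```

With the value from the example, a Custom Resource in app-dev1 is in scope while one in app-prod is ignored.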

Deploy the bundle:

kubectl apply -f deploy/cw-bundle.yaml -n psmdb-operator

Now you can start deploying databases in the namespaces you need:

kubectl apply -f deploy/cr.yaml -n app-dev1
kubectl apply -f deploy/cr.yaml -n app-dev2

See the demo below where I deploy two clusters in different namespaces with a single Operator.

Hashicorp Vault integration for encryption-at-rest

We take security seriously at Percona. Data-at-rest encryption prevents data visibility in the event of unauthorized access or theft. It is supported by all our Operators. With this release, we introduce the support for integration with Hashicorp Vault, where the user can keep the keys in the Vault and instruct Percona Operator for MongoDB to use those. This feature is in a technical preview stage.

There is a good blog post that describes how Percona Server for MongoDB works with the Vault. In Operator, we implement the same functionality and follow the structure of the same parameters.

How does it work?

We are going to assume that you already have Hashicorp Vault installed – you either use Cloud Platform or a self-hosted version. We will focus on the configuration of the Operator.

To instruct the Operator to use Vault you need to specify two things in the Custom Resource:

  1. secrets.vault – Secret resource with a Vault token in it
  2. Custom configuration for mongod for a replica set and config servers

Example of cr.yaml:

spec:
  secrets:
    vault: my-vault-secret

The secret object itself should contain the token that has access to create, read, update and delete the secrets in the desired path in the Vault. Please refer to the Vault documentation to understand policies better.

Example of a Secret:

apiVersion: v1
data:
  token: aHZzLnhrVVRPTEVOM2dLQmZuV0I5WTF0RmtOaA==
kind: Secret
metadata:
  name: my-vault-secret
  namespace: default
type: Opaque
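Keep in mind that values under data in a Kubernetes Secret are base64-encoded, not encrypted. The short Python sketch below shows the round trip for the example token; the helper names are hypothetical and only illustrate how such a value is produced and read back.

```python
import base64


def encode_secret_value(plain: str) -> str:
    """Encode a value the way it must appear under `data` in a Secret."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a value read from the `data` section of a Secret."""
    return base64.b64decode(encoded).decode("utf-8")


# The token value from the example Secret above.
encoded_token = "aHZzLnhrVVRPTEVOM2dLQmZuV0I5WTF0RmtOaA=="
token = decode_secret_value(encoded_token)
print(token.startswith("hvs."))
```

Anyone who can read the Secret can recover the token this way, so access to the Secret object itself should be restricted.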

Custom configuration

The operator allows users to fine-tune mongod and mongos configurations. For encryption to work, you must specify vault configuration for replica sets – both data and config servers.

Example of cr.yaml:

spec:
  replsets:
  - name: rs0
    size: 3
    configuration: |
      security:
        enableEncryption: true
        vault:
          serverName: vault
          port: 8200
          tokenFile: /etc/mongodb-vault/token
          secret: secret/data/dc/cluster1/rs0
          disableTLSForTesting: true
  sharding:
    enabled: true
    configsvrReplSet:
      size: 3
      configuration: |
        security:
          enableEncryption: true
          vault:
            serverName: vault
            port: 8200
            tokenFile: /etc/mongodb-vault/token
            secret: secret/data/dc/cluster1/cfg
            disableTLSForTesting: true

What to note here:

  • tokenFile: /etc/mongodb-vault/token
    • It is where the Operator is going to mount the Secret with the Vault token you created before. This is the default path and in most cases should not be changed.
  • secret: secret/data/dc/cluster1/rs0
    • It is the path where keys are going to be stored in the Vault. 

You can read more about Percona Server for MongoDB and Hashicorp Vault parameters in our documentation.

Once you are done with the configuration, apply the Custom Resource as usual. If everything is set up correctly you will see the following message in mongod log:

$ kubectl logs cluster1-rs0-0 -c mongod 
{"t":{"$date":"2022-09-13T19:40:20.342+00:00"},"s":"I",  "c":"STORAGE",  "id":29039,   "ctx":"initandlisten","msg":"Encryption keys DB is initialized successfully"}

Azure Kubernetes Service support

All Percona Operators are going through rigorous QA testing throughout the development lifecycle. Hours of QA engineers’ work are put into automating the test suites for specific Kubernetes flavors. 

AKS, or Azure Kubernetes Service, is the second most popular managed Kubernetes offering according to Flexera 2022 State of the Cloud report. After adding the support for Azure Blob Storage in version 1.11.0, it was just a matter of time before we started supporting AKS in full.

Starting with the 1.13.0 release, Percona Operator for MongoDB supports AKS in Technical Preview. You can see more details in our documentation.

The installation process of the Operator is no different from any other Kubernetes flavor. You can use a helm chart or apply YAML manifests with kubectl. I ran the cluster-wide demo above with AKS.

Admin user

This is a minor change, but frankly, it is my favorite as it impacts the user experience in a great way. Our Operator comes with system users that are used to manage and track the health of the database. Also, there are userAdmin and clusterAdmin for users to control the database, create users, and so on.

The problem here is neither userAdmin nor clusterAdmin allows you to start playing with the database right away. At first, you need to create the user that has permission to create databases and collections, and only after that, you can start using your fresh MongoDB cluster. 

With the release 1.13, we say no more to this. We add the databaseAdmin user that acts like a database admin, enabling users to start innovating right away.

databaseAdmin credentials are added to the same Secret object where other users are:

$ kubectl get secret my-cluster-name-secrets -o yaml | grep MONGODB_DATABASE_ADMIN_

Get your password like this:

$ kubectl get secret my-cluster-name-secrets -o jsonpath='{.data.MONGODB_DATABASE_ADMIN_PASSWORD}' | base64 --decode

Connect to the database as usual and start innovating:

mongo "mongodb://databaseAdmin:FoMRrwP2Ck0yqoVYO@"

What’s next

Percona is committed to running databases anywhere. Kubernetes adoption grows year over year, turning from a container orchestrator into a cloud operating system. Our Operators are supporting the community and our customers’ journey in infrastructure transformations by automating the deployment and management of the databases in Kubernetes.

The following links are going to help you to get familiar with Percona Operator for MongoDB:

  1. Quickstart guides
  2. Free Kubernetes cluster for easier and quicker testing
  3. Percona Operator for MongoDB community forum if you have general questions or need assistance

Read about Percona Monitoring and Management DBaaS, an open source solution that simplifies the deployment and management of MongoDB even more.


Percona Operator for MySQL Supports Group Replication

There are two Operators at Percona to deploy MySQL on Kubernetes: one based on Percona XtraDB Cluster and one based on Percona Server for MySQL.

We wrote a blog post in the past explaining the thought process and reasoning behind creating the new Operator for MySQL. The goal for us is to provide production-grade solutions to run MySQL in Kubernetes and support various replication configurations:

  • Synchronous replication
    • with Percona XtraDB Cluster
    • with Group Replication
  • Asynchronous replication

With the latest 0.2.0 release of Percona Operator for MySQL (based on Percona Server for MySQL), we have added Group Replication support. In this blog post, we will briefly review the design of our implementation and see how to set it up. 


This is a high-level design of running MySQL cluster with Group Replication:

MySQL cluster with Group Replication

MySQL Router acts as an entry point for all requests and routes the traffic to the nodes. 

This is a deeper look at how the Operator deploys these components in Kubernetes:

[Figure: Kubernetes deployment]

Going from right to left:

  1. StatefulSet to deploy a cluster of MySQL nodes with Group Replication configured. Each node has its storage attached to it.
  2. Deployment object for stateless MySQL Router. 
  3. Deployment is exposed with a Service. We use various TCP ports here:
    1. MySQL Protocol ports
      1. 6446 – read/write, routing traffic to Primary node
      2. 6447 – read-only, load-balancing the traffic across Replicas 
    2. MySQL X Protocol – can be useful for CRUD operations, ex. asynchronous calls. Ports follow the same logic:
      1. 6448 – read/write
      2. 6449 – read-only 
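For applications that choose the endpoint programmatically, the port layout above can be captured in a small lookup. The Python sketch below is only an illustration; the router_port helper and its parameters are hypothetical names.

```python
# Port layout taken from the list above.
ROUTER_PORTS = {
    ("classic", "rw"): 6446,
    ("classic", "ro"): 6447,
    ("x", "rw"): 6448,
    ("x", "ro"): 6449,
}


def router_port(read_only: bool, x_protocol: bool = False) -> int:
    """Pick the MySQL Router port for the desired protocol and access mode."""
    protocol = "x" if x_protocol else "classic"
    mode = "ro" if read_only else "rw"
    return ROUTER_PORTS[(protocol, mode)]


print(router_port(read_only=False))
print(router_port(read_only=True, x_protocol=True))
```

Writes always go through a read/write port so that MySQL Router can send them to the Primary node.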


Prerequisites: you need a Kubernetes cluster. Minikube would do.

The files used in this blog post can be found in this Github repo.

Deploy the Operator

kubectl apply --server-side -f



Note the --server-side flag; without it you will get the error:

The CustomResourceDefinition "" is invalid: metadata.annotations: Too long: must have at most 262144 bytes

Our Operator follows OpenAPIv3 schema to have proper validation. This unfortunately increases the size of our Custom Resource Definition manifest and, as a result, requires us to use the --server-side flag.



Deploy the Cluster

We are ready to deploy the cluster now:

kubectl apply -f

I created this Custom Resource manifest specifically for this demo. Important variables to note:

  1. Line 10: clusterType: group-replication – instructs the Operator that this is going to be a cluster with Group Replication.
  2. Lines 31-47: are all about MySQL Router. Once Group Replication is enabled, the Operator will automatically deploy the router.

Get the status

The best way to see if the cluster is ready is to check the Custom Resource state:

$ kubectl get ps
my-cluster   group-replication   ready   18m

As you can see, it is ready. Other states indicate that the cluster is still not ready or that something went wrong.

Here you can also see the endpoint you can connect to. In our case, it is the public IP address of the load balancer. As described in the design section above, there are multiple ports exposed:

$ kubectl get service my-cluster-router
NAME                TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)                                                       AGE
my-cluster-router   LoadBalancer   6446:30852/TCP,6447:31694/TCP,6448:31515/TCP,6449:31686/TCP   18h

Connect to the Cluster

To connect we will need the user first. By default, there is a root user with a randomly generated password. The password is stored in the Secret object. You can always fetch the password with the following command:

$ kubectl get secrets my-cluster-secrets -ojson | jq -r .data.root | base64 -d

I’m going to use port 6446, which would grant me read/write access and lead me directly to the Primary node through MySQL Router:

mysql -u root -p -h --port 6446
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 156329
Server version: 8.0.28-19 Percona Server (GPL), Release 19, Revision 31e88966cd3

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

Group Replication in action

Let’s see if Group Replication really works. 

List the members of the cluster by running the following command:

$ mysql -u root -p -P 6446 -h -e 'SELECT member_host, member_state, member_role FROM performance_schema.replication_group_members;'
| member_host                               | member_state | member_role |
| | ONLINE       | PRIMARY     |
| | ONLINE       | SECONDARY   |
| | ONLINE       | SECONDARY   |

Now we will delete one Pod (MySQL node), which also happens to have a Primary role, and see what happens:

$ kubectl delete pod my-cluster-mysql-0
pod "my-cluster-mysql-0" deleted

$ mysql -u root -p -P 6446 -h -e 'SELECT member_host, member_state, member_role FROM performance_schema.replication_group_members;'
| member_host                               | member_state | member_role |
| | ONLINE       | PRIMARY     |
| | ONLINE       | SECONDARY   |

One node is gone as expected, and another node got promoted to the Primary role. I’m still using port 6446 and the same host to connect to the database, which indicates that MySQL Router is doing its job.

After some time Kubernetes will recreate the Pod and the node will join the cluster again automatically:

$ mysql -u root -p -P 6446 -h -e 'SELECT member_host, member_state, member_role FROM performance_schema.replication_group_members;'
| member_host                               | member_state | member_role |
| | ONLINE       | PRIMARY     |
| | ONLINE       | SECONDARY   |

The recovery phase might take some time, depending on the data size and amount of the changes, but eventually, it will come back ONLINE:

$ mysql -u root -p -P 6446 -h -e 'SELECT member_host, member_state, member_role FROM performance_schema.replication_group_members;'
| member_host                               | member_state | member_role |
| | ONLINE       | SECONDARY   |
| | ONLINE       | PRIMARY     |
| | ONLINE       | SECONDARY   |

What’s coming up next?

Some exciting capabilities and features that we are going to ship pretty soon:

  • Backup and restore support for clusters with Group Replication
    • We have backups and restores in the Operator, but they currently do not work with Group Replication
  • Monitoring of MySQL Router in Percona Monitoring and Management (PMM)
    • Even though the Operator integrates nicely with PMM, it is possible to monitor MySQL nodes only, but not MySQL Router.
  • Automated Upgrades of MySQL and database components in the Operator
    • We have it in all other Operators and it is just logical to add it here

Percona is an open source company and we value our community and contributors. You are greatly encouraged to contribute to Percona Software. Please read our Contributions guide and visit our community webpage.


Running Rocket.Chat with Percona Server for MongoDB on Kubernetes

Our goal is to have a Rocket.Chat deployment that uses a highly available Percona Server for MongoDB cluster as the backend database, all running on Kubernetes. To get there, we will do the following:

  • Start a Google Kubernetes Engine (GKE) cluster across multiple availability zones. It can be any other Kubernetes flavor or service, but I rely on multi-AZ capability in this blog post.
  • Deploy Percona Operator for MongoDB and database cluster with it
  • Deploy Rocket.Chat with specific affinity rules
    • Rocket.Chat will be exposed via a load balancer


Percona Operator for MongoDB, compared to other solutions, is not only the most feature-rich but also comes with various management capabilities for your MongoDB clusters – backups, scaling (including sharding), zero-downtime upgrades, and many more. There are no hidden costs and it is truly open source.

This blog post is a walkthrough of running a production-grade deployment of Rocket.Chat with Percona Operator for MongoDB.


All YAML manifests that I use in this blog post can be found in this repository.

Deploy Kubernetes Cluster

The following command deploys a GKE cluster named percona-rocket in 3 availability zones:

gcloud container clusters create --zone us-central1-a --node-locations us-central1-a,us-central1-b,us-central1-c percona-rocket --cluster-version 1.21 --machine-type n1-standard-4 --preemptible --num-nodes=3

Read more about this in the documentation.

Deploy MongoDB

I’m going to use helm to deploy the Operator and the cluster.

Add helm repository:

helm repo add percona

Install the Operator into the percona namespace:

helm install psmdb-operator percona/psmdb-operator --create-namespace --namespace percona

Deploy the cluster of Percona Server for MongoDB nodes:

helm install my-db percona/psmdb-db -f psmdb-values.yaml -n percona

Replica set nodes are going to be distributed across availability zones. To get there, I altered the affinity keys in the corresponding sections of psmdb-values.yaml:

antiAffinityTopologyKey: ""

Prepare MongoDB

For Rocket.Chat to connect to our database cluster, we need to create the users. By default, clusters provisioned with our Operator have the userAdmin user with a preset password.

For production-grade systems, do not forget to change this password or create dedicated secrets to provision those. Read more about user management in our documentation.

Spin up a client Pod to connect to the database:

kubectl run -i --rm --tty percona-client1 --image=percona/percona-server-mongodb:4.4.10-11 --restart=Never -- bash -il

Connect to the database with the userAdmin user:

[mongodb@percona-client1 /]$ mongo "mongodb://userAdmin:userAdmin123456@my-db-psmdb-db-rs0-0.percona/admin?replicaSet=rs0"

We are going to create the following:

  • rocketchat database
  • rocketChat user to store data and connect to the database
  • oplogger user to provide access to oplog for Rocket.Chat
    • Rocket.Chat uses Meteor Oplog tailing to improve performance. It is optional.

use rocketchat
db.createUser({
  user: "rocketChat",
  pwd: passwordPrompt(),
  roles: [
    { role: "readWrite", db: "rocketchat" }
  ]
})

use admin
db.createUser({
  user: "oplogger",
  pwd: passwordPrompt(),
  roles: [
    { role: "read", db: "local" }
  ]
})
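With these two users in place, Rocket.Chat needs one connection string for its data and, optionally, one for the oplog. The Python sketch below shows how such strings could be assembled; it is an illustration under assumptions, not part of the original setup. The mongo_url helper and the example passwords are hypothetical, the host is the one used in the connection example above, and authSource=admin is used for oplogger because that user was created in the admin database.

```python
from urllib.parse import quote_plus


def mongo_url(user: str, password: str, host: str, database: str,
              replica_set: str, auth_source: str = "") -> str:
    """Build a MongoDB connection string, percent-escaping the credentials."""
    url = (f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
           f"@{host}/{database}?replicaSet={replica_set}")
    if auth_source:
        url += f"&authSource={auth_source}"
    return url


host = "my-db-psmdb-db-rs0-0.percona"  # host from the connection example above

# Hypothetical passwords; special characters must be escaped in the URI.
app_url = mongo_url("rocketChat", "p@ss/word", host, "rocketchat", "rs0")
oplog_url = mongo_url("oplogger", "secret", host, "local", "rs0", auth_source="admin")
print(app_url)
print(oplog_url)
```

Escaping matters because characters such as @ or / in a password would otherwise break the URI.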

Deploy Rocket.Chat

I will use helm here to maintain the same approach. 

helm install -f rocket-values.yaml my-rocketchat rocketchat/rocketchat --version 3.0.0

You can find rocket-values.yaml in the same repository. Please make sure you set the correct passwords in the corresponding YAML fields.

As you can see, I also do the following:

  • Line 11: expose Rocket.Chat through the LoadBalancer service type
  • Lines 13-14: set the number of replicas of Rocket.Chat Pods. We want three, one per availability zone.
  • Lines 16-23: set affinity to distribute Pods across availability zones

Load Balancer will be created with a public IP address:

$ kubectl get service my-rocketchat-rocketchat
NAME                       TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)        AGE
my-rocketchat-rocketchat   LoadBalancer   80:32548/TCP   12m

You should now be able to connect to the external IP address of the load balancer and enjoy your highly available Rocket.Chat installation.

Clean Up

Uninstall all helm charts to remove MongoDB cluster, the Operator, and Rocket.Chat:

helm uninstall my-rocketchat
helm uninstall my-db -n percona
helm uninstall psmdb-operator -n percona

Things to Consider


Instead of exposing Rocket.Chat through a load balancer, you may also try ingress. By doing so, you can integrate it with cert-manager and have a valid TLS certificate for your chat server.


It is also possible to run a sharded MongoDB cluster with Percona Operator. If you do so, Rocket.Chat will connect to mongos Service, instead of the replica set nodes. But you will still need to connect to the replica set directly to get oplogs.


We encourage you to try out Percona Operator for MongoDB with Rocket.Chat and let us know your results on our community forum.

There is always room for improvement and a time to find a better way. Please let us know if you face any issues with contributing your ideas to Percona products. You can do that on the Community Forum or JIRA. Read more about contribution guidelines for Percona Operator for MongoDB in

Percona Operator for MongoDB contains everything you need to quickly and consistently deploy and scale Percona Server for MongoDB instances into a Kubernetes cluster on-premises or in the cloud. The Operator enables you to improve time to market with the ability to quickly deploy standardized and repeatable database environments. Deploy your database with a consistent and idempotent result no matter where they are used.


Percona Operator for MongoDB and Kubernetes MCS: The Story of One Improvement

Percona Operator for MongoDB has supported multi-cluster or cross-site replication deployments since version 1.10. This functionality is extremely useful if you want to have a disaster recovery deployment or perform a migration from or to a MongoDB cluster running in Kubernetes. In a nutshell, it allows you to use Operators deployed in different Kubernetes clusters to manage and expand replica sets.

For example, you have two Kubernetes clusters: one in Region A, another in Region B.

  • In Region A you deploy your MongoDB cluster with Percona Operator. 
  • In Region B you deploy unmanaged MongoDB nodes with another installation of Percona Operator.
  • You configure both Operators, so that nodes in Region B are added to the replica set in Region A.

In case of failure of Region A, you can switch your traffic to Region B.

Migrating MongoDB to Kubernetes describes the migration process using this functionality of the Operator.

This feature was released in tech preview, and we received lots of positive feedback from our users. But one of our customers raised an internal ticket pointing out that the cross-site replication functionality does not work with Multi-Cluster Services. This started the investigation and the creation of this ticket: K8SPSMDB-625.

This blog post will go into the deep roots of this story and how it is solved in the latest release of Percona Operator for MongoDB version 1.12.

The Problem

Multi-Cluster Services, or MCS, allows you to expand network boundaries for the Kubernetes cluster and share Service objects across these boundaries. Some call it another take on Kubernetes Federation. This feature is already available on some managed Kubernetes offerings, including Google Kubernetes Engine (GKE) and AWS Elastic Kubernetes Service (EKS). Submariner uses the same logic and primitives under the hood.

MCS Basics

To understand the problem, we need to understand how multi-cluster services work. Let’s take a look at the picture below:

multi-cluster services

  • We have two Pods in different Kubernetes clusters
  • We add these two clusters into our MCS domain
  • Each Pod has a service and IP-address which is unique to the Kubernetes cluster
  • MCS introduces new Custom Resources – ServiceImport and ServiceExport.
    • Once you create a ServiceExport object in one cluster, a ServiceImport object appears in all clusters in your MCS domain.
    • This ServiceImport object is in the svc.clusterset.local domain and, with the network magic introduced by MCS, can be accessed from any cluster in the MCS domain

The above means that if I have an application in the Kubernetes cluster in Region A, I can connect to the Pod in the Kubernetes cluster in Region B through a domain name in the svc.clusterset.local domain. And it works from another cluster as well.

MCS and Replica Set

Here is how cross-site replication works with Percona Operator if you use load balancer:

All replica set nodes have a dedicated service and a load balancer. A replica set in the MongoDB cluster is formed using these public IP addresses. The external node is added using its public IP address as well:

  replsets:
  - name: rs0
    size: 3
    externalNodes:
    - host:

All nodes can reach each other, which is required to form a healthy replica set.

Here is how it looks when you have clusters connected through multi-cluster service:

Instead of load balancers, replica set nodes are exposed through Cluster IPs. We have ServiceExport and ServiceImport resources. All looks good on the networking level and it should work, but it does not.

The problem is in the way the Operator builds the MongoDB replica set in Region A. To register an external node from Region B to a replica set, we will use the MCS domain name in the corresponding section:

  replsets:
  - name: rs0
    size: 3
    externalNodes:
    - host: rs0-4.mongo.svc.clusterset.local

Now our rs.status() will look like this:

"name" : "my-cluster-rs0-0.mongo.svc.cluster.local:27017"
"role" : "PRIMARY"
"name" : "my-cluster-rs0-1.mongo.svc.cluster.local:27017"
"role" : "SECONDARY"
"name" : "my-cluster-rs0-2.mongo.svc.cluster.local:27017"
"role" : "SECONDARY"
"name" : "rs0-4.mongo.svc.clusterset.local:27017"
"role" : "UNKNOWN"

As you can see, the Operator formed a replica set out of three nodes using the svc.cluster.local domain, as that is how it should be done when you expose nodes with the ClusterIP Service type. In this case, a node in Region B cannot reach any node in Region A, as it tries to connect to the domain that is local to the cluster in Region A.

In the picture below, you can easily see where the problem is:

The Solution

Luckily we have a Special Interest Group (SIG), a Kubernetes Enhancement Proposal (KEP) and multiple implementations for enabling Multi-Cluster Services. Having a KEP is great since we can be sure the implementations from different providers (e.g., GCP, AWS) will follow the same standard more or less.

There are two fields in the Custom Resource that control MCS in the Operator:

  multiCluster:
    enabled: true
    DNSSuffix: svc.clusterset.local

Let’s see what is happening in the background with these flags set.

ServiceImport and ServiceExport Objects

Once you enable MCS by patching the CR with spec.multiCluster.enabled: true, the Operator creates a ServiceExport object for each service. These ServiceExports will be detected by the MCS controller in the cluster, and eventually a ServiceImport for each ServiceExport will be created in the same namespace in each cluster that has MCS enabled.

As you see, we made a decision and empowered the Operator to create ServiceExport objects. There are two main reasons for doing that:

  • If any infrastructure-as-code tool is used, it would require additional logic and complexity to automate the creation of the required MCS objects. If the Operator takes care of it, no additional work is needed.
  • Our Operators take care of the infrastructure for the database, including Service objects. It just felt logical to expand the reach of this functionality to MCS.

Replica Set and Transport Encryption

The root cause of the problem that we are trying to solve here lies in the networking field, where external replica set nodes try to connect to the wrong domain names. Now, when you enable multi-cluster and set DNSSuffix (it defaults to svc.clusterset.local), the Operator does the following:

  • The replica set is formed using the MCS domain set in DNSSuffix
  • The Operator generates TLS certificates as usual, but adds svc.clusterset.local domains into the picture

With this approach, the traffic between nodes flows as expected and is encrypted by default.
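
As a rough illustration of what changes, the Python sketch below builds member host names for a given DNS suffix and a combined list of names a certificate would need to cover. The naming pattern is copied from the rs.status() output earlier in this post; the function names are hypothetical, and the exact SAN list the Operator generates may differ (for example, it may use wildcards).

```python
# Illustrative sketch only, not Operator code.
def member_hosts(cluster, replset, size, namespace, dns_suffix):
    """Host names of replica set members under the given DNS suffix."""
    return [
        f"{cluster}-{replset}-{i}.{namespace}.{dns_suffix}"
        for i in range(size)
    ]


def tls_sans(cluster, replset, size, namespace, suffixes):
    """Names a certificate must cover so members are valid under every suffix."""
    sans = []
    for suffix in suffixes:
        for host in member_hosts(cluster, replset, size, namespace, suffix):
            if host not in sans:  # keep order, drop duplicates
                sans.append(host)
    return sans


sans = tls_sans(
    "my-cluster", "rs0", 3, "mongo",
    ["svc.cluster.local", "svc.clusterset.local"],
)
for name in sans:
    print(name)
```

With both suffixes covered, a node can present the same certificate whether a peer reaches it through the cluster-local or the MCS name.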

Things to Consider


Please note that the operator won’t install MCS APIs and controllers to your Kubernetes cluster. You need to install them by following your provider’s instructions prior to enabling MCS for your PSMDB clusters. See our docs for links to different providers.

The Operator detects whether MCS is installed in the cluster by checking API resources. The detection happens before the controllers are started in the Operator. If you installed MCS APIs while the Operator was running, you need to restart it. Otherwise, you’ll see an error like this:

  "level": "error",
  "ts": 1652083068.5910048,
  "logger": "controller.psmdb-controller",
  "msg": "Reconciler error",
  "name": "cluster1",
  "namespace": "psmdb",
  "error": "wrong psmdb options: MCS is not available on this cluster",
  "errorVerbose": "...",
  "stacktrace": "..."

ServiceImport Provisioning Time

It might take some time for ServiceImport objects to be created in the Kubernetes cluster. You can see the following messages in the logs while creation is in progress:

  "level": "info",
  "ts": 1652083323.483056,
  "logger": "controller_psmdb",
  "msg": "waiting for service import",
  "replset": "rs0",
  "serviceExport": "cluster1-rs0"

During testing, we saw wait times of up to 10-15 minutes. If your cluster is stuck in the initializing state while waiting for service imports, it’s a good idea to check the usage and quotas for your environment.
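
If you want to see which exports the Operator is still waiting for, a small script can pull that out of the logs. The following Python sketch assumes each log entry is a single JSON object per line with the fields shown above; the function name is hypothetical.

```python
import json


def pending_service_exports(log_lines):
    """Collect ServiceExport names from 'waiting for service import' entries."""
    pending = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") == "waiting for service import":
            name = entry.get("serviceExport")
            if name and name not in pending:
                pending.append(name)
    return pending


sample_logs = [
    '{"level": "info", "ts": 1652083323.483056, "logger": "controller_psmdb", '
    '"msg": "waiting for service import", "replset": "rs0", '
    '"serviceExport": "cluster1-rs0"}',
    "plain text line that is not JSON",
]
print(pending_service_exports(sample_logs))
```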


We also made a decision to automatically generate TLS certificates for the Percona Server for MongoDB cluster with the svc.clusterset.local domain, even if MCS is not enabled. This approach simplifies the process of enabling MCS for a running MongoDB cluster. It does not make much sense to change the DNSSuffix field unless you have hard requirements from your service provider, but we still allow such a change.

If you want to enable MCS with a cluster deployed with an operator version below 1.12, you need to update your TLS certificates to include svc.clusterset.local SANs. See the docs for instructions.


Businesses rely on applications, and the infrastructure that serves them, more than ever. Disaster recovery protocols and various failsafe mechanisms are routine for reliability engineers, not an annoying task in the backlog.

With multi-cluster deployment functionality in Percona Operator for MongoDB, we want to equip users to build highly available and secured database clusters with minimal effort.

Percona Operator for MongoDB is truly open source and provides users with a way to deploy and manage their enterprise-grade MongoDB clusters on Kubernetes. We encourage you to try this new Multi-Cluster Services integration and let us know your results on our community forum. You can find some scripts that would help you provision your first MCS clusters on GKE or EKS here.

There is always room for improvement and a time to find a better way. Please let us know if you face any issues with contributing your ideas to Percona products. You can do that on the Community Forum or JIRA. Read more in the contribution guidelines for Percona Operator for MongoDB.


Expose Databases on Kubernetes with Ingress

Ingress is a resource that is commonly used to expose HTTP(s) services outside of Kubernetes. To have ingress support, you will need an Ingress Controller, which in a nutshell is a proxy. SREs and DevOps love ingress as it provides developers with a self-service to expose their applications. Developers love it as it is simple to use, but at the same time quite flexible.

High-level ingress design looks like this: 

High-level ingress design

  1. Users connect through a single Load Balancer or other Kubernetes service
  2. Traffic is routed through Ingress Pod (or Pods for high availability)
    • There are multiple flavors of Ingress Controllers. Some use nginx, some envoy, or other proxies. See a curated list of Ingress Controllers here.
  3. Based on HTTP headers, traffic is routed to the corresponding Pods which run websites. For HTTPS traffic, Server Name Indication (SNI) is used, which is a TLS extension supported by most browsers. Usually, the ingress controller integrates nicely with cert-manager, which provides you with full TLS lifecycle support (yeah, no need to worry about renewals anymore).

The beauty and value of such a design is that you have a single Load Balancer serving all your websites. In Public Clouds, it leads to cost savings, and in private clouds, it simplifies your networking or reduces the number of IPv4 addresses (if you are not on IPv6 yet). 

TCP and UDP with Ingress

Quite interestingly, some ingress controllers also support TCP and UDP proxying. I have been asked on our forum and Kubernetes slack multiple times if it is possible to use ingress with Percona Operators. Well, it is. Usually, you need a load balancer per database cluster: 

TCP and UDP with Ingress

The design with ingress is going to be a bit more complicated but still allows you to utilize the single load balancer for multiple databases. In cases where you run hundreds of clusters, it leads to significant cost savings. 

  1. Each TCP port represents the database cluster
  2. Ingress Controller makes a decision about where to route the traffic based on the port

The obvious downside of this design is non-default TCP ports for databases. There might be weird cases where it can turn into a blocker, but usually, it should not.


My goal is to have the following:

All configuration files I used for this blog post can be found in this GitHub repository.

Deploy Percona XtraDB Clusters (PXC)

The following commands are going to deploy the Operator and three clusters:

kubectl apply -f

kubectl apply -f
kubectl apply -f
kubectl apply -f

Deploy Ingress

helm upgrade --install ingress-nginx ingress-nginx   --repo   --namespace ingress-nginx --create-namespace  \
--set controller.replicaCount=2 \
--set tcp.3306="default/minimal-cluster-haproxy:3306"  \
--set tcp.3307="default/minimal-cluster2-haproxy:3306" \
--set tcp.3308="default/minimal-cluster3-haproxy:3306"

This is going to deploy a highly available ingress-nginx controller. 

  • controller.replicaCount=2 – defines that we want to have at least two Pods of the ingress controller. This is to provide a highly available setup.
  • tcp flags do two things:
    • expose ports 3306-3308 on the ingress’s load balancer
    • instruct the ingress controller to forward traffic to the corresponding services, which were created by Percona Operator for PXC clusters. For example, port 3307 is the one to use to connect to minimal-cluster2. Read more about this configuration in the ingress documentation.
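
The port-to-service mapping expressed by the tcp flags can be pictured as a simple lookup table. Here is a minimal Python sketch of that idea; the Backend type and the function names are made up for illustration, and this is not how ingress-nginx is implemented internally.

```python
from typing import NamedTuple


class Backend(NamedTuple):
    namespace: str
    service: str
    port: int


def parse_backend(value: str) -> Backend:
    """Parse '<namespace>/<service>:<port>' as used in the tcp flags."""
    target, port = value.rsplit(":", 1)
    namespace, service = target.split("/", 1)
    return Backend(namespace, service, int(port))


# Same mapping as in the helm command above
tcp_services = {
    3306: "default/minimal-cluster-haproxy:3306",
    3307: "default/minimal-cluster2-haproxy:3306",
    3308: "default/minimal-cluster3-haproxy:3306",
}


def route(listen_port: int) -> Backend:
    """Pick the backend service based only on the port the client used."""
    return parse_backend(tcp_services[listen_port])


print(route(3307))
```

The only routing key is the listening port, which is why each database cluster needs its own port on the load balancer.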

Here is the load balancer resource that was created:

$ kubectl -n ingress-nginx get service
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                                                                   AGE
ingress-nginx-controller             LoadBalancer   80:30261/TCP,443:32112/TCP,3306:31583/TCP,3307:30786/TCP,3308:31827/TCP   4m13s

As you see, ports 3306-3308 are exposed.

Check the Connection

This is it. Database clusters should be exposed and reachable. Let’s check the connection. 

Get the root password for minimal-cluster2:

$ kubectl get secrets minimal-cluster2-secrets -o yaml | grep root | awk '{print $2}' | base64 --decode && echo

Connect to the database:

$ mysql -u root -h --port 3307  -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5722

It works! Notice how I use port 3307, which corresponds to minimal-cluster2.

Adding More Clusters

What if you add more clusters into the picture, how do you expose those?


If you use helm, the easiest way is to just add one more flag into the command:

helm upgrade --install ingress-nginx ingress-nginx   --repo   --namespace ingress-nginx --create-namespace  \
--set controller.replicaCount=2 \
--set tcp.3306="default/minimal-cluster-haproxy:3306"  \
--set tcp.3307="default/minimal-cluster2-haproxy:3306" \
--set tcp.3308="default/minimal-cluster3-haproxy:3306" \
--set tcp.3309="default/minimal-cluster4-haproxy:3306"
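
When the number of clusters grows, typing these flags by hand gets error-prone. As a sketch, the flags could be generated with a few lines of Python. The function name is hypothetical, and it assumes services follow the <cluster>-haproxy naming seen above and that ports are assigned sequentially starting from 3306.

```python
def tcp_flags(clusters, first_port=3306, namespace="default", backend_port=3306):
    """Build one '--set tcp.<port>=...' helm flag per database cluster."""
    flags = []
    for offset, cluster in enumerate(clusters):
        listen_port = first_port + offset
        flags.append(
            f'--set tcp.{listen_port}="{namespace}/{cluster}-haproxy:{backend_port}"'
        )
    return flags


flags = tcp_flags(
    ["minimal-cluster", "minimal-cluster2", "minimal-cluster3", "minimal-cluster4"]
)
print(" \\\n".join(flags))
```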

No helm

Without Helm, it is a two-step process:

First, edit the ConfigMap which configures TCP services exposure. By default it is called ingress-nginx-tcp:

kubectl -n ingress-nginx edit cm ingress-nginx-tcp
apiVersion: v1
data:
  "3306": default/minimal-cluster-haproxy:3306
  "3307": default/minimal-cluster2-haproxy:3306
  "3308": default/minimal-cluster3-haproxy:3306
  "3309": default/minimal-cluster4-haproxy:3306

A change in the ConfigMap will trigger the reconfiguration of nginx in the ingress pods. But as a second step, it is also necessary to expose this port on the load balancer. To do so, edit the corresponding service:

kubectl -n ingress-nginx edit services ingress-nginx-controller
  - name: 3309-tcp
    port: 3309
    protocol: TCP
    targetPort: 3309-tcp

The new cluster is exposed on port 3309 now.
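
Both manual edits follow a fixed pattern, so they are easy to derive from a port number and a target service. The Python sketch below only builds the two fragments as dictionaries and does not talk to the Kubernetes API; the function name is an assumption made for this example.

```python
def tcp_exposure(port, namespace, service, backend_port=3306):
    """Return the ConfigMap data entry and the Service port entry for one port."""
    configmap_entry = {str(port): f"{namespace}/{service}:{backend_port}"}
    service_port = {
        "name": f"{port}-tcp",
        "port": port,
        "protocol": "TCP",
        "targetPort": f"{port}-tcp",
    }
    return configmap_entry, service_port


cm_entry, svc_port = tcp_exposure(3309, "default", "minimal-cluster4-haproxy")
print(cm_entry)
print(svc_port)
```

The first dictionary corresponds to the line added to the ConfigMap, the second to the entry added to the service ports.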

Limitations and considerations

Ports per Load Balancer

Cloud providers usually limit the number of ports that you can expose through a single load balancer:

  • AWS has 50 listeners per NLB, GCP 100 ports per service.

If you hit the limit, just create another load balancer pointing to the ingress controller.
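
A quick back-of-the-envelope calculation shows how many load balancers a fleet would need. This Python sketch uses the limits quoted above and assumes, as in the service output earlier, that ports 80 and 443 stay reserved on every load balancer; check your provider's current quotas before relying on these numbers.

```python
import math


def load_balancers_needed(database_clusters, ports_per_lb, reserved_ports=2):
    """Number of load balancers required when each cluster takes one port."""
    usable = ports_per_lb - reserved_ports
    if usable <= 0:
        raise ValueError("no ports left for databases on a load balancer")
    return math.ceil(database_clusters / usable)


# 120 database clusters behind AWS NLBs (50 listeners) or GCP services (100 ports)
print(load_balancers_needed(120, 50))
print(load_balancers_needed(120, 100))
```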


Cost-saving is a good thing, but with Kubernetes capabilities, users expect to avoid manual tasks, not add more. Integrating the change of the ingress ConfigMap and load balancer ports into CI/CD is not a hard task, but maintaining the logic of adding new load balancers to add more ports is harder. I have not found any projects that implement the logic of reusing load balancer ports or automating it in any way. If you know of any, please leave a comment under this blog post.


Run PostgreSQL on Kubernetes with Percona Operator & Pulumi

Avoid vendor lock-in, provide a private Database-as-a-Service for internal teams, quickly deploy-test-destroy databases with CI/CD pipeline – these are some of the most common use cases for running databases on Kubernetes with operators. Percona Distribution for PostgreSQL Operator enables users to do exactly that and more.

Pulumi is an infrastructure-as-code tool, which enables developers to write code in their favorite language (Python, Golang, JavaScript, etc.) to deploy infrastructure and applications easily to public clouds and platforms such as Kubernetes.

This blog post is a step-by-step guide on how to deploy a highly-available PostgreSQL cluster on Kubernetes with our Percona Operator and Pulumi.

Desired State

We are going to provision the following resources with Pulumi:

  • Google Kubernetes Engine cluster with three nodes. It can be any Kubernetes flavor.
  • Percona Operator for PostgreSQL
  • Highly available PostgreSQL cluster with one primary and two hot standby nodes
  • Highly available pgBouncer deployment with the Load Balancer in front of it
  • pgBackRest for local backups

Pulumi code can be found in this git repository.


I will use an Ubuntu box to run Pulumi, but almost the same steps would work on macOS.

Pre-install Packages

gcloud and kubectl

echo "deb [signed-by=/usr/share/keyrings/] cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl | sudo apt-key --keyring /usr/share/keyrings/ add -
sudo apt-get update
sudo apt-get install -y google-cloud-sdk kubectl jq unzip


Pulumi allows developers to use the language of their choice to describe infrastructure and applications. I’m going to use Python. We will also need pip (the Python package-management system) and venv (the virtual environment module).

sudo apt-get install python3 python3-pip python3-venv


Install Pulumi:

curl -sSL | sh

On macOS, Pulumi can be installed via Homebrew with

brew install pulumi


You will need to add .pulumi/bin to the $PATH:

export PATH=$PATH:$HOME/.pulumi/bin



You will need to provide access to Google Cloud to provision Google Kubernetes Engine.

gcloud config set project your-project
gcloud auth application-default login
gcloud auth login


Generate a Pulumi access token. You will need it later to init the Pulumi stack.


This repo has the following files:

  • Pulumi.yaml – identifies that it is a folder with a Pulumi project
  • the main Python file – Python code used by Pulumi to provision everything we need
  • requirements.txt – to install required Python packages

Clone the repo and go to the pg-k8s-pulumi folder:

git clone
cd blog-data/pg-k8s-pulumi

Init the stack with:

pulumi stack init pg

You will need the access token generated earlier here.

The Python code that Pulumi is going to process is in the project's main Python file.

Lines 1-6: importing python packages

Lines 8-31: configuration parameters for this Pulumi stack. It consists of two parts:

  • Kubernetes cluster configuration. For example, the number of nodes.
  • Operator and PostgreSQL cluster configuration – namespace to be deployed to, service type to expose pgBouncer, etc.

Lines 33-80: deploy GKE cluster and export its configuration

Lines 82-88: create the namespace for Operator and PostgreSQL cluster

Lines 91-426: deploy the Operator. In reality, it just mirrors the operator.yaml from our Operator.

Lines 429-444: create the secret object that allows you to set the password for pguser to connect to the database

Lines 445-557: deploy PostgreSQL cluster. It is a JSON version of cr.yaml from our Operator repository

Line 560: exports Kubernetes configuration so that it can be reused later 


First, we will set the configuration for this stack. Execute the following commands:

pulumi config set gcp:project YOUR_PROJECT
pulumi config set gcp:zone us-central1-a
pulumi config set node_count 3
pulumi config set master_version 1.21

pulumi config set namespace percona-pg
pulumi config set pg_cluster_name pulumi-pg
pulumi config set service_type LoadBalancer
pulumi config set pg_user_password mySuperPass

These commands set the following:

  • GCP project where GKE is going to be deployed
  • GCP zone 
  • Number of nodes in a GKE cluster
  • Kubernetes version
  • Namespace to run PostgreSQL cluster
  • The name of the cluster
  • Service type used to expose pgBouncer (LoadBalancer)
  • Password for pguser

Deploy with the following command:

$ pulumi up
Previewing update (pg)

View Live:

     Type                                                           Name                                Plan       Info
 +   pulumi:pulumi:Stack                                            percona-pg-k8s-pg                   create     1 message
 +   ├─ random:index:RandomPassword                                 pguser_password                     create
 +   ├─ random:index:RandomPassword                                 password                            create
 +   ├─ gcp:container:Cluster                                       gke-cluster                         create
 +   ├─ pulumi:providers:kubernetes                                 gke_k8s                             create
 +   ├─ kubernetes:core/v1:ServiceAccount                           pgoPgo_deployer_saServiceAccount    create
 +   ├─ kubernetes:core/v1:Namespace                                pgNamespace                         create
 +   ├─ kubernetes:batch/v1:Job                                     pgoPgo_deployJob                    create
 +   ├─ kubernetes:core/v1:ConfigMap                                pgoPgo_deployer_cmConfigMap         create
 +   ├─ kubernetes:core/v1:Secret                                   percona_pguser_secretSecret         create
 +   ├─  pgo_deployer_crbClusterRoleBinding  create
 +   ├─         pgo_deployer_crClusterRole          create
 +   └─               my_cluster_name                     create

  pulumi:pulumi:Stack (percona-pg-k8s-pg):
    E0225 14:19:49.739366105   53802]           Fork support is only compatible with the epoll1 and poll polling strategies

Do you want to perform this update? yes

Updating (pg)
View Live:
     Type                                                           Name                                Status      Info
 +   pulumi:pulumi:Stack                                            percona-pg-k8s-pg                   created     1 message
 +   ├─ random:index:RandomPassword                                 pguser_password                     created
 +   ├─ random:index:RandomPassword                                 password                            created
 +   ├─ gcp:container:Cluster                                       gke-cluster                         created
 +   ├─ pulumi:providers:kubernetes                                 gke_k8s                             created
 +   ├─ kubernetes:core/v1:ServiceAccount                           pgoPgo_deployer_saServiceAccount    created
 +   ├─ kubernetes:core/v1:Namespace                                pgNamespace                         created
 +   ├─ kubernetes:core/v1:ConfigMap                                pgoPgo_deployer_cmConfigMap         created
 +   ├─ kubernetes:batch/v1:Job                                     pgoPgo_deployJob                    created
 +   ├─ kubernetes:core/v1:Secret                                   percona_pguser_secretSecret         created
 +   ├─         pgo_deployer_crClusterRole          created
 +   ├─  pgo_deployer_crbClusterRoleBinding  created
 +   └─               my_cluster_name                     created

  pulumi:pulumi:Stack (percona-pg-k8s-pg):
    E0225 14:20:00.211695433   53839]           Fork support is only compatible with the epoll1 and poll polling strategies

    kubeconfig: "[secret]"

    + 13 created

Duration: 5m30s


Get kubeconfig first:

pulumi stack output kubeconfig --show-secrets > ~/.kube/config

Check if Pods of your PG cluster are up and running:

$ kubectl -n percona-pg get pods
NAME                                             READY   STATUS      RESTARTS   AGE
backrest-backup-pulumi-pg-dbgsp                  0/1     Completed   0          64s
pgo-deploy-8h86n                                 0/1     Completed   0          4m9s
postgres-operator-5966f884d4-zknbx               4/4     Running     1          3m27s
pulumi-pg-787fdbd8d9-d4nvv                       1/1     Running     0          2m12s
pulumi-pg-backrest-shared-repo-f58bc7657-2swvn   1/1     Running     0          2m38s
pulumi-pg-pgbouncer-6b6dc4564b-bh56z             1/1     Running     0          81s
pulumi-pg-pgbouncer-6b6dc4564b-vpppx             1/1     Running     0          81s
pulumi-pg-pgbouncer-6b6dc4564b-zkdwj             1/1     Running     0          81s
pulumi-pg-repl1-58d578cf49-czm54                 0/1     Running     0          46s
pulumi-pg-repl2-7888fbfd47-h98f4                 0/1     Running     0          46s
pulumi-pg-repl3-cdd958bd9-tf87k                  1/1     Running     0          46s

Get the IP address of the pgBouncer LoadBalancer:

$ kubectl -n percona-pg get services
NAME                             TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
pulumi-pg-pgbouncer              LoadBalancer   5432:32042/TCP               3m17s

You can connect to your PostgreSQL cluster through this IP address. Use the pguser password that was set earlier with pulumi config set pg_user_password:

psql -h -p 5432 -U pguser pgdb

Clean up

To delete everything it is enough to run the following commands:

pulumi destroy
pulumi stack rm

Tricks and Quirks

Pulumi Converter

kube2pulumi is a huge help if you already have YAML manifests. You don’t need to rewrite the whole code, but just convert YAMLs to Pulumi code. This is what I did for operator.yaml.


There are two ways for Custom Resource management in Pulumi:

crd2pulumi generates libraries/classes out of Custom Resource Definitions and allows you to create custom resources later using these. I found it a bit complicated and it also lacks documentation.

apiextensions.CustomResource, on the other hand, allows you to create Custom Resources by specifying them as JSON. It is much easier and requires less manipulation. See lines 446-557 in my Pulumi code.

True/False in JSON

I have the following in my Custom Resource definition in Pulumi code:

perconapg = kubernetes.apiextensions.CustomResource(
    spec= {
    "disableAutofail": False,
    "tlsOnly": False,
    "standby": False,
    "pause": False,
    "keepData": True,

Be sure to use the native boolean type of your language of choice and not the "true"/"false" strings. For me, using the strings turned into a failure, as the Operator was expecting booleans, not strings.
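
The difference is easy to see when the spec is serialized. This small Python sketch uses only the standard json module; the check function is a hypothetical helper, not part of Pulumi.

```python
import json

# Native booleans serialize to JSON true/false
spec_ok = {"pause": False, "keepData": True}
# Strings stay strings, which a field expecting a boolean will reject
spec_bad = {"pause": "false", "keepData": "true"}

print(json.dumps(spec_ok))
print(json.dumps(spec_bad))


def stringly_booleans(spec):
    """Return keys whose values are the strings 'true'/'false' instead of booleans."""
    return [
        key for key, value in spec.items()
        if isinstance(value, str) and value.lower() in ("true", "false")
    ]


print(stringly_booleans(spec_bad))
```

Running a check like this over the spec before deploying catches the mistake early.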

Depends On…

Pulumi makes its own decisions on the ordering of provisioning resources. You can enforce the order by specifying dependencies.

For example, I’m ensuring that Operator and Secret are created before the Custom Resource:
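
In Pulumi's Python SDK, a dependency is declared by passing opts=pulumi.ResourceOptions(depends_on=[...]) to a resource. The snippet below is not the code from my repository and does not call Pulumi at all. It is a toy illustration, using Python's standard graphlib module and hypothetical resource names, of what such declared dependencies mean for the creation order.

```python
from graphlib import TopologicalSorter

# resource -> resources it depends on (hypothetical names for illustration)
dependencies = {
    "operator": {"namespace"},
    "pguser-secret": {"namespace"},
    "postgres-cluster": {"operator", "pguser-secret"},
}

# static_order() yields every resource after all of its dependencies
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Whatever order the Operator and the Secret end up in relative to each other, the Custom Resource always comes after both.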

