Percona Monitoring and Management DBaaS Overview and Technical Details

Percona Monitoring and Management DBaaS Overview

Percona Monitoring and Management DBaaS OverviewDatabase-as-a-Service (DBaaS) is a managed database that doesn’t need to be installed and maintained but is instead provided as a service to the user. The Percona Monitoring and Management (PMM) DBaaS component allows users to CRUD (Create, Read, Update, Delete) Percona XtraDB Cluster (PXC) and Percona Server for MongoDB (PSMDB) managed databases in Kubernetes clusters.

PXC and PSMDB implement DBaaS on top of Kubernetes (k8s), and PMM DBaaS provides a nice interface and API to manage them.

Deploy Playground with minikube

The easiest way to play with and test PMM DBaaS is to use minikube. Please follow the minikube installation guideline. It is possible that your OS distribution provides native packages for it, so check that with your package manager as well.

In the examples below, Linux is used with kvm2 driver, so additionally kvm and libvirt should be installed. Other OS and drivers could be used as well. Install the kubectl tool as well, it would be more convenient to use it and minikube will configure kubeconfig so k8s cluster could be accessed from the host easily.

Let’s create a k8s cluster and adjust resources as needed. The minimum requirements can be found in the documentation.

  • Start minikube cluster
$ minikube start --cpus 12 --memory 32G --driver=kvm2

  • Download PMM Server deployment for minikube and deploy it in k8s cluster
$ curl -sSf -m 30 \
| kubectl apply -f -

  • For the first time, it could take a while for the PMM Server to init the volume, but it will eventually start
  • Here’s how to check that PMM Server deployment is running:
$ kubectl get deployment
pmm-deployment   1/1     1            1           3m40s

$ kubectl get pods
NAME                             READY   STATUS    RESTARTS   AGE
pmm-deployment-d688fb846-mtc62   1/1     Running   0          3m42s

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM              STORAGECLASS   REASON   AGE
pmm-data                                   10Gi       RWO            Retain           Available                                              3m44s
pvc-cb3a0a18-b6dd-4b2e-92a5-dfc0bc79d880   10Gi       RWO            Delete           Bound       default/pmm-data   standard                3m44s

$ kubectl get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pmm-data   Bound    pvc-cb3a0a18-b6dd-4b2e-92a5-dfc0bc79d880   10Gi       RWO            standard       3m45s

$ kubectl get service
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
kubernetes   ClusterIP        <none>        443/TCP                      6m10s
pmm          NodePort   <none>        80:30080/TCP,443:30443/TCP   3m5

  • Expose PMM Server ports on the host, as this also opens links to the PMM UI as well as to the API endpoint in the default browser.
$ minikube service pmm


To not go into too much detail: PV (kubectl get pv) and PVC(kubectl get pvc) are essentially the storage for PMM data (/srv directory). Service is a network for PMM and how to access it.

Attention: this PMM Server deployment is not supposed to be used in production, but just as a sandbox for testing and playing around, as it always starts with the latest version of PMM and k8s is not yet a supported environment for it.

Configure PMM DBaaS

Now PMM DBaaS Dashboard can be used and a k8s cluster could be added, DB added, as well as configured.

DBaaS Dashboard


To enable the PMM DBaaS feature you need to either pass a special environment (ENABLE_DBAAS=1) to the container or enable it in the settings (next screenshot).

To allow PMM managing k8s cluster – it needs to be configured. Check the documentation, but here are short steps:

  • set Public Address address to pmm on Configuration -> Settings -> Advanced Settings page

PMM Advanced settings

  • Get k8s config (kubeconfig) and copy it for registration:
kubectl config view --flatten --minify

  • Register configuration that was copied on DBaaS Kubernetes Cluster dashboard:

DBaaS Register k8s Cluster


Let’s get into details on what that all means.

The Public Address is propagated to pmm-client containers that are run as part of PXC and PSMDB deployments to monitor DB services pmm-client containers run pmm-agent, which would need to connect to the PMM server. It uses Public Address. DNS name pmm is set by Service in pmm-server-minikube.yaml file for our PMM server deployment.

So far, PMM DBaaS uses kubeconfig to get access to k8s API to be able to manage PXC and PSMDB operators. The kubeconfig file and k8s cluster information is stored securely in PMM Server internal DB.

PMM DBaaS couldn’t deploy operators into the k8s cluster for now, but that feature will be implemented very soon. And that is why Operator status on the Kubernetes Cluster dashboard shows hints on how to install them.

What are the operators and why are they needed? This is defined very well in the documentation. Long story short, they are the heart of DBaaS that deploy and configure DBs inside of k8s cluster.

Operators themselves are complex pieces of software that need to be correctly started and configured to deploy DBs. That is where PMM DBaaS comes in handy, to configure a lot for the end-user and provide a UI to choose what DB needs to be created, configured, or deleted.

Deploy PSMDB with DBaaS

Let’s deploy the PSMDB operator and DBs step by step and check them in detail.

  • Deploy PSMDB operator
curl -sSf -m 30 \
| kubectl apply -f -

  • Here’s how it could be checked that operator was created:
$ kubectl get deployment
NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
percona-server-mongodb-operator   1/1     1            1           46h
pmm-deployment                    1/1     1            1           24h

$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
percona-server-mongodb-operator-586b769b44-hr7mg   1/1     Running   2          46h
pmm-deployment-7fcb579576-hwf76                    1/1     Running   1          24h

Now it is seen on the PMM DBaaS Kubernetes Cluster Dashboard that the MongoDB operator is installed.

Cluster with PSMDB


All REST APIs could be discovered via Swagger; it is exposed on both ports (30080 and 30443 in case of minikube) and could be accessed by appending /swagger to the PMM server address. It is recommended to use https (30443 port), and for example, the URL could look like this:

As DBaaS is a feature under active development, replace /swagger.json to /swagger-dev.json and push the Explore button.

Swagger API

Now all APIs can be seen and even executed.

Let’s try it out. First Authorize and then find /v1/management/DBaaS/Kubernetes/List and push Try it out and Execute. There will be an example of curl as well as response to the REST API POST request. The curl example could be used from the command line as well:

$ curl -kX POST "" -H  "accept: application/json" -H  "authorization: Basic YWRtaW46YWRtaW4=" -H  "Content-Type: application/json" -d "{}"
  "kubernetes_clusters": [
      "kubernetes_cluster_name": "minikube",
      "operators": {
        "xtradb": {
        "psmdb": {
          "status": "OPERATORS_STATUS_OK"

PMM Swagger API Example

Create DB and Deep Dive

PMM Server consists of different components, and for the DBaaS feature, here are the main ones:

  • Grafana UI with DBaaS dashboards talk to pmm-managed through REST API to show the current state and provides a user interface
  • pmm-managed acts as REST gateway and holds kubeconfig and talks to dbaas-controller through gRPC
  • dbaas-controller implements DBaaS features, talks to k8s, and exposes gRPC interface for pmm-managed

The Grafana UI is what users see, and now when operators are installed, the user could create the MongoDB instance. Let’s do this.

  • Go to DBaaS -> DB Cluster page and push Create DB Cluster link
  • Choose your options and push Create Cluster button

Create MongoDB cluster

It has more advanced options to configure resources allocated for the cluster:

Advanced settings for cluster creation

As seen, the cluster was created and could be manipulated. Now let’s see in detail what has happened underneath.

PSMDB Cluster created

When the user pushes the Create Cluster button, Grafana UI POSTS /v1/management/DBaaS/PSMDBCluster/Create request to pmm-managed. pmm-managed handles the request and sends it via gRPC to the dbaas-controller together with kubeconfig.

dbaas-controller handles requests, and with knowledge of operator structure (Custom Resources/CRD), it prepares CR with all needed parameters to create a MongoDB cluster. After filling all needed structures, dbaas-controller converts CR to yaml file and applies it with kubectl apply -f command. kubectl gets pre-configured with kubeconf file (that was passed by pmm-managed from its DB) to talk to the correct cluster, and the kubeconf file is temporarily created and deleted immediately after the request.

The same happens when some parameters change or dbaas-controller gets some parameters from the k8s cluster.

Essentially, the dbaas-controller automates all stages to fill CRs with correct parameters, check that everything works correctly, and returns details about clusters created. The kubectl interface is used for simplicity but it is subject to change before GA, most probably to k8s Go API.


All together, PMM Server DBaaS provides a seamless experience for the user to deploy DB clusters on top of Kubernetes with simple and nice UI without the need to know operators’ internals. Deploying PXC and PSMDB clusters it also configures PMM Agents and exporters, thus all monitoring data is present in PMM Server right away.

PMM PSMDB overview

Go to PMM Dashboard -> MongoDB -> MongoDB Overview and see MongoDB monitoring data, explorer nodes, and service monitoring too, which comes pre-configured with the help of the DBaaS feature.

Give it a try, submit feedback, and chat with us, we would be happy to hear from you!


Don’t forget to stop and/or delete your minikube cluster if it is not used:

  • Stop minikube cluster, to not use resources (could be started with start again)
$ minikube stop

  • If a cluster is not needed anymore, delete minikube cluster
$ minikube delete



Percona Live ONLINE: Percona Previews Open Source Database as a Service

percona open source dbaas

percona open source dbaasPercona Live ONLINE 2021 starts today!  

Representing dozens of projects, communities, and tech companies, and featuring more than 150 expert speakers across 200 sessions, there’s still time to register and attend. 

Register and Attend

Percona latest product announcements focus on Percona’s open source DBaaS preview, and new Percona Kubernetes Operators features.

During Percona Live ONLINE 2021, our experts will be discussing the preview of Percona’s 100% open source Database as a Service (DBaaS), which eliminates vendor lock-in and enables users to maintain control of their data. 

As an alternative to public cloud and large enterprise database vendor DBaaS offerings, this on-demand self-service option provides users with a convenient and simple way to deploy databases quickly. Using Percona Kubernetes Operators means it is possible to configure a database once and deploy it anywhere.

“The future of databases is in the cloud, an approach confirmed by the market and validated by our own customer research,” said Peter Zaitsev, co-founder and CEO of Percona. “We’re taking this one step further by enabling open source databases to be deployed wherever the customer wants them to run – on-premises, in the cloud, or in a hybrid environment. Companies want the flexibility of DBaaS, but they don’t want to be tied to their original decision for all time – as they grow or circumstances change, they want to be able to migrate without lock-in or huge additional expenses.”

The DBaaS supports Percona open source versions of MySQL, MongoDB, and PostgreSQL. 

Critical database management operations such as backup, recovery, and patching will be managed through the Percona Monitoring and Management (PMM) component of Percona DBaaS. 

PMM is completely open source and provides enhanced automation with monitoring and alerting to find, eliminate, and prevent outages, security issues, and slowdowns in performance across MySQL, MongoDB, PostgreSQL, and MariaDB databases.

Customer trials of Percona DBaaS will start this summer. Businesses interested in being part of this trial can register here.

Easy Deployment and Management with Kubernetes Operators from Percona

The Kubernetes Operator for Percona Distribution of PostgreSQL is now available in technical preview, making it easier than ever to deploy. This Operator streamlines the process of creating a database so that developers can gain access to resources faster, as well as then ongoing lifecycle management.

There are also new capabilities available in the Kubernetes Operator for Percona Server for MongoDB, which support enterprise mission-critical deployments with features for advanced data recovery. It now includes support for multiple shards, which provides horizontal database scaling, and allows for distribution of data across multiple MongoDB Pods. This is useful for large data sets when a single machine’s overall processing speed or storage capacity is insufficient. 

This Operator also allows Point-in-Time Recovery, which enables users to roll back the cluster to a specific transaction and time, or even skip a specific transaction. This is important when data needs to be restored to reverse a problem transaction or ransomware attack.

The new Percona product announcements will be discussed in more detail at our annual Percona Live ONLINE Open Source Database Conference 2021 starting today.

We hope you’ll join us! Register today to attend Percona Live ONLINE for free.

Register and Attend


Webinar March 18: Moving Your Database to the Cloud – Top 3 Things to Consider

Moving Your Database to the Cloud

Moving Your Database to the CloudJoin Rick Vasquez, Percona Technical Expert, as he discusses the pros and cons of moving your database to the cloud.

Flexibility, performance, and cost management are three things that make cloud database environments an easy choice for many businesses. If you are thinking of moving your database to the cloud you need to know the issues you might encounter, and how you can make the most of your DBaaS cloud configuration.

Our latest webinar looks into a number of key issues and questions, including:

* The real Total Cost of Ownership (TCO) when moving your database to the cloud.

* If performance is a critical factor in your application, how do you achieve the same or better performance in your chosen cloud?

* The “hidden costs” of running your database in the cloud.

* Why do companies choose open source and cloud software? The pros and cons of this decision.

* The case for cloud support (why “fully managed” isn’t always as fully managed as claimed).

* Should you consider a multi-cloud strategy?

* Business continuity planning in the cloud – what if your provider has an outage?

Please join Rick Vasquez, Percona Technical Expert, on Thursday, March 18, 2021, at 1:00 PM EST for his webinar “Moving Your Database to the Cloud – Top 3 Things to Consider“.

Register for Webinar

If you can’t attend, sign up anyway, and we’ll send you the slides and recording afterward.


Point-In-Time Recovery in Kubernetes Operator for Percona XtraDB Cluster – Architecture Decisions

Point-In-Time Recovery in Kubernetes Operator

Point-In-Time Recovery in Kubernetes OperatorPoint-In-Time Recovery (PITR) for MySQL databases is an important feature that is essential and covers common use cases, like a recovery to the latest possible transaction or roll-back the database to a specific date before some bad query was executed. Percona Kubernetes Operator for Percona XtraDB Cluster (PXC) added support for PITR in version 1.7, and in this blog post we are going to look into the technical details and decisions we made to implement this feature.

Architecture Decisions

Store Binary Logs on Object Storage

MySQL uses binary logs to perform point-in-time recovery. Usually, they are stored locally along with the data, but it is not an option for us:

  • We run the cluster and we cannot rely on a single node’s local storage.
  • The cloud-native world lives in an ephemeral dimension, where nodes and pods can be terminated and S3-compatible storage is a de facto standard to store data.
  • We should be able to recover the data to another Kubernetes cluster in case of a disaster.

We have decided to add a new Binlog Uploader Pod, which connects to the available PXC member and uploads binary logs to S3. Under the hood, it relies on the mysqlbinlog utility.

Use Global Transaction ID

Binary logs on the clustered nodes are not synced and can have different names and contents. This becomes a problem for the Uploader, as it can connect to different PXC nodes for various reasons.

To solve this problem, we decided to rely on Global Transaction ID (GTID). It is a unique transaction identifier, but it is unique not only to the server on which it originated, but is unique across all servers in a given replication topology.  With the GTID captured in binary logs, we can identify any transaction not depending on the filename or its contents. This allows us to continue streaming binlogs from any PXC member at any moment.

User-Defined Functions

We have a unique identifier for every transaction, but the mysqlbinlog utility still doesn’t have the functionality to determine which binary log file contains which GTID. We decided to extend MySQL with few User Defined Functions and added them to Percona Server for MySQL and Percona XtraDB Cluster versions 8.0.21


This function returns all GTIDs that are stored inside the given binlog file. We put the GTID setlist to a new file next to the binary log on S3.


This function takes GTID set as an input and returns a binlog filename which is stored locally. We use it to figure out which GTIDs are already uploaded and which binlog to upload next. 

binlog uploader pod

Have open source expertise you want to share? Submit your talk for Percona Live ONLINE 2021!

Find the node with the oldest binary log

Our quality assurance team caught a bug before the release which can happen in the cluster only:

  • Add a new node to the Percona XtraDB Cluster (for example scale up from 3 to 5 nodes).
  • Binlog Uploader Pod tries to execute get_binlog_by_gtid_set on the new node but gets the error.
2021/01/19 11:23:19 ERROR: collect binlog files: get last uploaded binlog name by gtid set: scan binlog: sql: Scan error on column index 0, name "get_binlog_by_gtid_set('a8e657ab-5a47-11eb-bea2-f3554c9e5a8d:15')": converting NULL to string is unsupported

The error is valid, as this node is new and there are no binary log files that have the GTID set that Uploader got from S3. If you look into this pull request, the quick patch is to always pick the oldest node in the array or in other words the node, which most likely would have the binary logs we need. In the next release of the Operator, we add more sophisticated logic, to discover the node which has the oldest binary logs for sure.

Storageless binlog uploader

The size of binlogs depends on the cluster usage patterns, so it is hard to predict the size of the storage or memory required for them. We decided to take this complexity away by making our Binary Log Uploader Pod completely storageless. Mysqlbinlog can store remote binlog only into files, but we need to put them to S3. To get there we decided to use a named pipe or FIFO special file. Now mysqlbinlog utility loads the binary log file to a named pipe, our Uploader reads it and streams the data directly to S3.

Also, storageless design means that we never store any state between Uploader restarts. Basically, state is not needed, we only need to know which GTIDs are already uploaded and we have this data on a remote S3 bucket. Such design enables the continuous upload flow of binlogs.

Binlog upload delay

S3 protocol expects that the file is completely uploaded. If the file upload is interrupted (let’s say Uploader Pod is evicted), the file will not be accessible/visible on S3. Potentially we can lose many hours of binary logs because of such interruptions. That’s why we need to split the binlog stream into files and upload them separately.

One of the options that users can configure when enabling point-in-time recovery in Percona XtraDB Cluster Operator is timeBetweenUploads. It sets the number of seconds between uploads for Binlog Uploader Pod. By default, we set it to 60 seconds, but it can go down to one second. We do not recommend setting it too low, as every invocation of the Uploader leads to FLUSH BINARY LOGS command execution on the PXC node. We need to flush the logs to close the binary log file to upload it to external storage, but doing it frequently may negatively affect IO and as a result database performance.


It is all about recovery and it has two steps:

  1. Recover the cluster from a full backup
  2. Apply binary logs

We already have the functionality to restore from a full backup (see here), so let’s get to applying the binary logs.

First, we need to figure out from which GTID set we should start applying binary logs – in other words: where do we start?. As we rely on the Percona XtraBackup utility to take full MySQL backups, what we need to do is read the xtrabackup_info file which has lots of useful metadata. We already have this file on S3 near the full backup.

Second, find the binlog which has the GTID set we need. As you remember, we store a file with binlog’s GTID sets on S3 already, so it boils down to reading these files.

Third, download binary logs and apply them. Here we rely on mysqlbinlog as well, which has the flags we need, like –stop-datetime – which stops recovery when the event with a specific timestamp is caught in the log.

point in time recovery


MySQL is more than 25 years old and has a great tooling ecosystem established around it, but as we saw in this blog post, not all these tools are cloud-native ready. Percona engineering teams are committed to providing users the same features across various environments, whether it is a bare-metal installation in the data center or cutting edge Kubernetes in the cloud.


Tame Kubernetes Costs with Percona Monitoring and Management and Prometheus Operator

Kubernetes Costs Percona Monitoring and Management

Kubernetes Costs Percona Monitoring and ManagementMore and more companies are adopting Kubernetes, but after some time they see an unexpected growth around cloud costs. Engineering teams did their part in setting up auto-scalers, but the cloud bill is still growing. Today we are going to see how Percona Monitoring and Management (PMM) can help with monitoring Kubernetes and reducing the costs of the infrastructure.

Get the Metrics


Prometheus Operator is a great tool to monitor Kubernetes as it deploys a full monitoring stack (prometheus, grafana, alertmanager, node exporters) and works out of the box. But if you have multiple k8s clusters, then it would be great to have a single pane of glass from which to monitor them all.

To get there I will have Prometheus Operator running on each cluster and pushing metrics to my PMM server. Metrics will be stored in VictoriaMetrics time-series DB, which PMM uses by default since the December 2020 release of version 2.12.

Prometheus Operator


I followed this manual to the letter to install my PMM server with docker. Don’t forget to open the HTTPS port on your firewall, so that you can reach the UI from your browser, and so that the k8s clusters can push their metrics to VictoriaMetrics through NGINX.

Prometheus Operator

On each Kubernetes cluster, I will now install Prometheus Operator to scrape the metrics and send them to PMM. Bear in mind that Helm charts are stored in prometheus-community repo.

Add helm repository

helm repo add prometheus-community

Prepare the configuration before installing the operator

$ cat values.yaml
    create: false

  enabled: false

    enabled: false

extraScrapeConfigs: |
    - url: https://{PMM_USER}:{PMM_PASS}@{YOUR_PMM_HOST}/victoriametrics/api/v1/write

      kubernetes_cluster_name: {UNIQUE_K8S_LABEL}

  • Disable alertmanager, as I will rely on PMM
  • Add remote_write section to write metrics to PMM’s VictoriaMetrics storage
    • Use your PMM user and password to authenticate. The default username and password are admin/admin. It is highly recommended to change defaults, see how here.
    • /victoriametrics

      endpoint is exposed through NGINX on PMM server

    • If you use https and a self-signed certificate you may need to disable TLS verification:
   insecure_skip_verify: true

  • external_labels

    section is important – it labels all the metrics sent from Prometheus. Each cluster must have a unique


    label to distinguish metrics once they are merged in VictoriaMetrics.

Create namespace and deploy

kubectl create namespace prometheus
helm install prometheus prometheus-community/prometheus -f values.yaml  --namespace=prometheus


Have open source expertise you want to share? Submit your talk for Percona Live ONLINE 2021!


  • PMM Server is up – check
  • Prometheus Operators run on Kubernetes Clusters – check

Now let’s check if metrics are getting to PMM:

  • Go to PMM Server UI
  • On the left pick Explore

PMM Server UI

  • Run the query

It should return the information about the Nodes running on the cluster with UNIQUE_K8S_LABEL. If it does – all good, metrics are there.

Monitor the Costs

The main reasons for the growth of the cloud bill are computing and storage. Kubernetes can scale up adding more and more nodes, skyrocketing compute costs. 

We are going to add two dashboards to the PMM Server which would equip us with a detailed understanding of how resources are used and what should be tuned to reduce the number of nodes in the cluster or change instance types accordingly:

  1. Cluster overview dashboard
  2. Namespace and Pods dashboard

Import these dashboards in PMM:

dashboards in PMM

Dashboard #1 – Cluster Overview

The goal of this dashboard is to provide a quick overview of the cluster and its workloads.

Cluster Overview

The cluster on the screenshot has some room for improvement in utilization. It has a capacity of 1.6 thousand CPU cores but utilizes only 146 cores (~9%). Memory utilization is better – ~62%, but can be improved as well.

Quick take:

  • It is possible to reduce # of nodes and get utilization to at least 80%
  • Looks like workloads in this cluster are mostly memory bound, so it would be wiser to run nodes with more memory and less CPU.

Graphs in the CPU/Mem Request/Limit/Capacity section gives a detailed view of resource usage over time:

CPU/Mem Request/Limit/Capacity section

Another two interesting graphs would show us the top 20 namespaces that are wasting resources. It is calculated as the difference between requests and real utilization for CPU and Memory. The values on this graph can be negative if requests for the containers are not set.

This dashboard also has a graph showing persistent volume claims and their states. It can potentially help to reduce the number of volumes spun up on the cloud.

Dashboard #2 – Namespace and Pod

Now that we have an overview, it is time to dive deeper into the details. At the top, this dashboard allows the user to choose the Cluster, the Namespace, and the Pod.

At first, the user sees Namespace details: Quotas (might be empty if Resource Quotas are not set for the namespace), requests, limits, and real usage for CPU, Memory, Pods, and Persistent Volume Claims.

Namespace and Pod

The Namespace on the screenshot utilizes almost zero CPU cores but requests 20+ cores. If requests are tuned properly, then the capacity required to run the workloads would drop and the number of nodes can be reduced.

The next valuable insight that the user can pick from this dashboard is real Pod utilization – CPU, Memory, Network, and disks (only local storage).

Pod CPU Usage

In the case above you can see CPU and Memory container-level utilization for Prometheus Pod, which is shipping the metrics on one of my Kubernetes clusters.


This blog post equips you with the design to collect multiple Kubernetes clusters metrics in a single time-series database and expose them on the Percona Monitoring and Management UI through dashboards to analyze and gain insights. These insights help you drive your infrastructure costs down and highlight issues on the clusters.

Also, look to PMM on Kubernetes for monitoring of your databases – see our demo here and contact Percona if you are interested in learning more about how to become a Percona Customer, we are here to help!

The call for papers for Percona Live is open. We’d love to receive submissions on topics related to open-source databases such as MySQL, MongoDB, MariaDB, and PostgreSQL. To find out more visit


DBaaS on Kubernetes: Under the Hood

DBaaS on Kubernetes

DBaaS on KubernetesRunning Database-as-a-Service (DBaaS) in the cloud is the norm for users today. It provides a ready-to-use database instance for users in a few seconds, which can be easily scaled and usually comes with a pay-as-you-go model.

We at Percona see that more and more cloud vendors or enterprises either want to or are already running their DBaaS workloads on Kubernetes (K8S). Today we are going to shed some light on how Kubernetes and Operators can be a good tool to run DBaaS and what the most common patterns are.

DBaaS on Kubernetes Offerings

Users usually do not care much how it works internally, but common expectations are:

  • The Database is provisioned and the user gets the endpoint information and a password to connect to it.
  • The Database is highly available and meets certain SLA. Usually, it starts with 99.9% uptime.
  • The Database can be monitored and backed up easily (remember, it is a managed service)


The first step for cloud providers or for big enterprises leaning towards DBaaS on Kubernetes is to choose the topology. There are various ways how it can be structured.

Kubernetes Cluster per DB

Kubernetes Cluster per DB

The advantage of this solution is that it is simple and provides good isolation of workloads by design.

The biggest disadvantage is that each Kubernetes cluster comes with its own masters, which run etcd and control-plane software. On a big scale, this adds a huge overhead.

Shared Kubernetes with Node Isolation

Shared Kubernetes with Node Isolation

Pros: Good isolation, no overhead

Cons: It might be hard to maintain a good utilization of all the nodes in the cluster, and, as a result, lots of computational power will be wasted.

Fully Shared Kubernetes

Fully Shared Kubernetes

We are talking about namespace isolation. Either each customer or each database gets its own namespace.

Pros: Good use of computational resources, no waste

Cons: Resources are isolated via cgroups and it might be tricky to deal with security and “noisy tenants”.

Shared Kubernetes with VM-Like Isolation

Shared Kubernetes with VM-Like Isolation

VM-like isolation is usually delivered by a special container runtime that hardens the boundary between the application and the host kernel.

It can be gVisor, Kata containers, even Firecracker. It slightly increases the resources overhead but improves security greatly.

High Availability

As you can see on the diagrams above, database nodes are running on different Kubernetes nodes. Such a design guarantees zero downtime in case of a single VM or server failure. Affinity or Pod Topology Thread Constraints (promoted to stable in k8s 1.19) are used in Kubernetes to enforce Pod scheduling on separate nodes. In Percona Kubernetes Operators we have anti-affinity enabled by default to ensure a smooth experience for the end-user.


The database cannot go without data and storage. And as always, there are various ways how Operators can store data in Kubernetes.


When somebody talks storage in Kubernetes usually it starts with StatefulSets, the primitive which “manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods”. In other words, this object ensures that Pods with data have unique names and start in a predictable order, which sometimes is crucial for the databases.

Persistent Volume Claims

Persistent Volume Claims

PVCs over network storage is the obvious choice in Kubernetes world, but there is no magic and network storage must be in place. Most cloud providers already have it, e.g. AWS EBS, Google Storage Engine, etc. There are solutions for private clouds as well – Ceph, Gluster, Portworx, etc. Some of them are even cloud-native – like, which is Ceph on k8s.

The performance of PVCs heavily depends on network performance and the underlying disks used for the storage.

Local Storage

Kubernetes provides HostPath and EmptyDir primitives to use the storage of the node where the Pod runs. Local storage might be a good choice to boost performance and in cases where network storage is not available or introduces complexity (e.g. in private clouds)

Local Storage


It is ephemeral storage at its best – a Pod or node restart will wipe the data completely from the node. With good anti-affinity, it is possible to use EmptyDir for production workloads, but it’s not recommended. It is worth mentioning that EmptyDir can be used to store data in memory for extra performance:

- name: file-storage 
    medium: Memory


Pod stores data on a locally mounted disk of the Node. In this case, a Pod or Node restart does not cause data loss on the DB node. It is much easier to manage HostPath through PVCs with the OpenEBS Container Storage Interface. Read this great blog post from Percona’s CTO Vadim Tkachenko on how Percona Operators can be deployed with OpenEBS.

Pitfall #1 – Node Failure Causes Data Loss

Using local storage is tricky because in case of a full node crash the data for one Pod will be lost completely and once the Pod starts back on a new node, it must be recovered somehow or synced from other nodes. It depends on the technology. For example, Percona XtraDB Cluster will do the State Snapshot Transfer with Galera to sync the data to an empty node.

Node Failure Causes Data Loss

Pitfall #2 – Data Limitation is Hard

DBaaS should provide users with the database instance with limited resources, including limits on the storage. For example, 100 GBs per instance. With local storage, there is no way to limit the growth of the database. There are a couple of ways on how to address this:

  1. Run a sidecar container that constantly monitors the data consumption (e.g. with du)
  2. Pick a topology where the database node runs on a dedicated Kubernetes node (see topologies above). In such a case you can attach only the amount of storage required by the tier.


Now DBaaS needs to give the user the endpoint to connect to the database. Remember, if we want our database service to be highly available we run multiple DB nodes.

In Percona Operator for Percona XtraDB Cluster users can choose HAProxy or ProxySQL for traffic balancing across the nodes. The proxy will be deployed along with the DB nodes to balance the traffic between them and monitor the DB node’s health. Without these proxies either the application needs to figure out where to route the traffic itself or use some other external proxy.

The next step is to expose the proxy outside of Kubernetes using regular primitives: LoadBalancer, Ingress with TCP or NodePort, and return the endpoint information to the user.

DBaas Network

Day 2 Operations

Once the platform design is ready and users are happy with Day 1 operations – getting the database up and running – comes Day 2, where DBaaS automates the day-to-day routines.


Kubernetes by design provides scaling capabilities. You can read more about it in my previous blog post Kubernetes Scaling Capabilities with Percona XtraDB Cluster.

DBaaS scaling boils down to increasing the resources of the Kubernetes cluster and then the Pods. The steps may differ depending on the chosen topology, but general capacity rules still apply and Pods will not be scheduled if there is not enough capacity.


Preferably DBaaS should provide automated database upgrades at least for minor versions, otherwise, every security vulnerability or bug in the DB engine would force users to perform manual upgrades.

Percona Operators have a Smart Update feature that asks for the latest version of the component, fetches the latest docker image, and safely rolls it out relying on Kubernetes automation.


Every DBA at least once in their life dropped the production database. This is where backups come in handy.  DBaaS must also automate backup management and lifecycle.

Usually, DB engines already come with the tools to take and restore backups (mysqldump, mongodump, XtraBackup, etc.). The goal here is to embed these tools into Kubernetes world and give users a clear interface to interact with backups.

Percona Operators provide Custom Resources for backups and rely on a Cronjob object to execute a sequence of commands on schedule. DBaaS in this case can be easily integrated and trigger backups and restores by simply pushing the manifests through the K8S control plane.

Point-in-time recovery

Recovering from backup which is a few hours old might not be an acceptable option for critical data. Rolling back the transaction or recovering the data to the latest possible state are the most common use cases for point-in-time recovery (PITR). Most databases have the capability to perform PITR, but they are not designed for the cloud-native world. The recently released version 1.7.0 of Percona XtraDB Cluster Operator provides PITR support out of the box. In future blog posts, we will tell you more about the challenges and technical design of the solution.


Kubernetes vigorously embraces the ephemeral nature of the containers and takes it to the next level by making nodes ephemeral as well. This complicates the regular monitoring approach, where the IP-addresses and ports of the services are well known and probed by the monitoring server. Luckily, the cloud-native landscape is full of monitoring tools that can solve this problem. Take a look at Prometheus Operator, for example.

In Percona we have our own dashing Percona Monitoring and Management (PMM) tool, which integrates with our Operators automatically. To solve the challenge of ephemeral IP-addresses and ports we switched our time-series database from Prometheus to VictoriaMetrics, which allows pushing the metrics, instead of pulling.


It is definitely fun to build DBaaS on Kubernetes and there are multiple ways on how to achieve this goal. The easiest way to do it is by leveraging Operators, as they automate out most of the manual burden. Delivering DBaaS requires rigorous planning and a deep understanding of the user requirements.

In January we kicked off the Preview of Database as a Service in Percona Monitoring and Management. It is fully open source and runs on our Operators in the background to provide the database. We are still looking for users to test and provide feedback during this year-long program, and we would be grateful for your participation and feedback!


Learn more about Percona Kubernetes Operators for Percona XtraDB Cluster or Percona Server for MongoDB


Webinar February 10: Kubernetes – The Path to Open Source DBaaS

Kubernetes The Path to Open Source DBaaS

Kubernetes The Path to Open Source DBaaSDon’t miss out! Join Peter Zaitsev, Percona CEO, as he discusses DBaaS and Kubernetes.

DBaaS is the fastest-growing way to deploy databases. It is fast and convenient yet it is typically done using proprietary software and tightly coupled to the cloud vendor. Percona believes Kubernetes finally enables a fully Open Source DBaaS Solution capable of being deployed anywhere Kubernetes runs – on the Public Cloud or in your private data center.

In this presentation, he will describe the most important user requirements and typical problems you would encounter building a DBaaS Solution and explain how you can solve them using the Kubernetes Operator framework.

Please join Peter Zaitsev, Percona CEO, on Wednesday, February 10, 2021, at 11:00 AM EST for his webinar on DBaaS and Kubernetes.

Register for Webinar

If you can’t attend, sign up anyway, and we’ll send you the slides and recording afterward.


The Preview of Database as a Service (DBaaS) in Percona Monitoring and Management is Now Live!

DBaaS Percona Monitoring and Management

This week we officially kick-off the Preview of Database as a Service (DBaaS) in Percona Monitoring and Management. We are still looking for users to test and provide feedback during this year-long program, and we would love you to participate! 

Preview of Database as a Service in Percona Monitoring and Management


Our vision is to deliver a truly open source solution that won’t lock you in. A single pane of glass to easily manage your open source database infrastructure, and a self-service experience enabling fast and consistent open source database deployment. 

Our goal is to deliver the enterprise benefits our customers are looking for, including:

  • A single interface to deploy and manage your open source databases on-premises, in the cloud, or across hybrid and multi-cloud environments.
  • The ability to configure a database once and deploy it anywhere. 
  • Critical database management operations, such as backup, recovery, and patching.
  • Enhanced automation and advisory services allow you to find, eliminate, and prevent outages, security issues, and slowdowns. 
  • A viable alternative to public cloud and large enterprise database vendor DBaaS offerings, allowing you to eliminate vendor lock-in.

Percona applies a user-driven product development process. So, we hope our user community will get involved in the Preview of Database as a Service (DBaaS) in Percona Monitoring and Management and help inform the design and development of this new software functionality.

The Preview is a year-long program consisting of four phases. Each three-month phase will focus on a different area of participant feedback. Preview participants can be involved in as many phases as they like.

Preview of Database as a Service (DBaaS) in Percona Monitoring and Management

Phase One Details for Interested Participants

Phase one will focus on:

  1. Gathering feedback allows us to understand the applicable user personas and the goals and objectives required in their day-to-day roles.
  2. Gathering feedback on the user experience, specifically involving creating, editing, and deleting database clusters and the databases within those clusters. We will also gather feedback on the management of those clusters and the monitoring of added database servers and nodes.

We are starting with a focus on database deployment and management features, as they help users improve their productivity. 

Other details to note…

  • Phase one of the Preview will run until April 2021
  • Phase one requires around 10 hours of self-paced activities, facilitated through the Percona Forum
    • All Community Preview participant feedback will be captured within the Percona Forum
    • Community Preview participant questions will be facilitated through the Percona Forum.

So make sure to sign up to participate in the Preview of Database as a Service (DBaaS) in Percona Monitoring and Management and become a crucial participant in this initiative, helping shape future users’ experience as we develop and test this new software functionality! 

Register Now!


Is Serverless Just a New Word for Cloud Based?

serverless architecture

serverless architectureServerless is a new buzzword in the database industry. Even though it gets tossed around often, there is some confusion about what it really means and how it really works. Serverless architectures rely on third-party Backend as a Service (BaaS) services. They can also include custom code that is run in managed, ephemeral containers on a Functions as a Service (FaaS) platform. In comparison to traditional Platform as a Service (PaaS) server architecture, where you pay a predetermined sum for your instances, serverless applications benefit from reduced costs of operations and lower complexity. They are also considered to be more agile, allowing for reduced engineering efforts.

In reality, there are still servers in a serverless architecture: they are just being used, managed, and maintained outside of the application. But isn’t that a lot like what cloud providers, such as Amazon RDS, Google Cloud, and Microsoft Azure, are already offering? Well, yes, but with several caveats.

When you use any of the aforementioned platforms, you still need to provision the types of instances that you plan to use and define how those platforms will act. For example, will it run MySQL, MongoDB, PostgreSQL, or some other tool? With serverless, these decisions are no longer needed. Instead, you simply consume resources from a shared resource pool, using whatever application suits your needs at that time. In addition, in a serverless world, you are only charged for the time that you use the server instead of being charged whether you use it a lot or a little (or not at all).

Remember When You Joined That Gym?

How many of us have purchased a gym membership at some point in our life? Oftentimes, you walk in with the best of intentions and happily enroll in a monthly plan. “For only $29.95 per month, you can use all of the resources of the gym as much as you want.” But, many of us have purchased such a membership and found that our visits to the gym dwindle over time, leaving us paying the same monthly fee for less usage.

Traditional Database as a Service (DBaaS) offerings are similar to your gym membership: you sign up, select your service options, and start using them right away. There are certainly cases of companies using those services consistently, just like there are gym members who show up faithfully month after month. But there are also companies who spin up database instances for a specific purpose, use the database instance for some amount of time, and then slowly find that they are accessing that instance less and less. However, the fees for the instance, much like the fees for your gym membership, keep getting charged.

What if we had a “pay as you go” gym plan? Well, some of those certainly exist. Serverless architecture is somewhat like this plan: you only pay for the resources when you use them, and you only pay for your specific usage. This would be like charging $5 for access to the weight room and $3 for access to the swimming pool, each time you use one or the other. The one big difference with serverless architecture for databases is that you still need to have your data stored somewhere in the environment and made available to you as needed. This would be like renting a gym locker to store your workout gear so that didn’t have to bring it back and forth each time you visited.

Obviously, you will pay for that storage, whether it is your data or your workout gear, but the storage fees are going to be less than your standard membership. The big advantage is that you have what you need when you need it, and you can access the necessary resources to use whatever you are storing.

With a serverless architecture, you store your data securely on low cost storage devices and access as needed. The resources required to process that data are available on an on demand basis. So, your charges are likely to be lower since you are paying a low fee for data storage and a usage fee on resources. This can work great for companies that do not need 24x7x365 access to their data since they are only paying for the services when they are using them. It’s also ideal for developers, who may find that they spend far more time working on their application code than testing it against the database. Instead of paying for the database resources while the data is just sitting there doing nothing, you now pay to store the data and incur the database associated fees at use time.

Benefits and Risks of Going Serverless

One of the biggest possible benefits of going with a serverless architecture is that you save money and hassle. Money can be saved since you only pay for the resources when you use them. Hassle is reduced since you don’t need to worry about the hardware on which your application runs. These can be big wins for a company, but you need to be aware of some pitfalls.

First, serverless can save you money, but there is no guarantee that it will save you money.

Consider 2 different people who have the exact same cell phone – maybe it’s your dad and your teenage daughter. These 2 users probably have very different patterns of usage: your dad uses the phone sporadically (if at all!) and your teenage daughter seems to have her phone physically attached to her. These 2 people would benefit from different service plans with their provider. For your dad, a basic plan that allows some usage (similar to the base cost of storage in our serverless database) with charges for usage above that cap would probably suffice. However, such a plan for your teenage daughter would probably spiral out of control and incur very high usage fees. For her, an unlimited plan makes sense. What is a great fit for one user is a poor fit for another, and the same is true when comparing serverless and DBaaS options.

The good news is that serverless architectures and DBaaS options, like Amazon RDS, Microsoft Azure, and Google Cloud, reduce a lot of the hassle of owning and managing servers. You no longer need to be concerned about Mean Time Between Failures, power and cooling issues, or many of the other headaches that come with maintaining your hardware. However, this can also have a negative consequence.

The challenge of enforced updates

About the only thing that is consistent about software in today’s world is that it is constantly changing. New versions are released with new features that may or may not be important to you. When a serverless provider decides to implement a new version or patch of their backend, there may be some downstream issues for you to manage. It is always important to test any new updates, but now some of the decisions about how and when to upgrade may be out of your control. Proper notification from the provider gives you a window of time for testing, but they are probably going to flip the switch regardless of whether or not you have completed all of your test cycles. This is true of both serverless and DBaaS options.

A risk of vendor lock-in

A common mantra in the software world is that we want to avoid vendor lock-in. Of course, from the provider’s side, they want to avoid customer churn, so we often find ourselves on opposite sides of the same issue. Moving to a new platform or provider becomes more complex as you cede more aspects of server management to the host. This means that serverless can cause deep lock-in since your application is designed to work with the environment as your provider has configured it. If you choose to move to a different provider, you need to extract your application and your data from the current provider and probably need to rework it to fit the requirements of the new provider.

The challenge of client-side optimization

Another consideration is that optimizations of server-side configurations must necessarily be more generic compared to those you might make to self-hosted servers. Optimization can no longer be done at the server level for your specific application and use; instead, you now rely on a smarter client to perform your necessary optimizations. This requires a skill set that may not exist with some developers: the ability to tune applications client-side.


Serverless is not going away. In fact, it is likely to grow as people come to a better understanding and comfort level with it. You need to be able to make an informed decision regarding whether serverless is right for you. Careful consideration of the pros and cons is imperative for making a solid determination. Understanding your usage patterns, user expectations, development capabilities, and a lot more will help to guide that decision.

In a future post, I’ll review the architectural differences between on-premises, PaaS, DBaaS and serverless database environments.


The post Is Serverless Just a New Word for Cloud Based? appeared first on Percona Database Performance Blog.


Percona Support with Amazon RDS

Amazon RDS

This blog post will give a brief overview of Amazon RDS capabilities and limitations, and how Percona Support can help you succeed in your Amazon RDS deployments.

One of the common questions that we get from customers and prospective customers is about Percona Support with Amazon RDS. As many companies have shifted to the cloud, or are considering how to do so, it’s natural to try to understand the limitations inherent in different deployment strategies.

Why Use Amazon RDS?

As more companies move to using the cloud, we’ve seen a shift towards work models in technical teams that require software developers to take on more operational duties than they have traditionally. This makes it essential to abstract infrastructure so it can be interacted with as code, whether through automation or APIs. Amazon RDS presents a compelling DBaaS product with significant flexibility while maintaining ease of deployment.

Use Cases Where RDS Isn’t a Fit

There are a number of use cases where the inherent limitations of RDS make it not a good fit. With RDS, you are trading off the flexibility to deploy complex environment topologies for the ease of deploying with the push of a button, or a simple API call. RDS eliminates most of the operational overhead of running a database in your environment by abstracting away the physical or virtual hardware and the operating system, networking and replication configuration. This, however, means that you can’t get too fancy with replication, networking or the underlying operating system or hardware.

When Using RDS, Which Engine is Right For Me?

Amazon’s RDS has numerous database engines available, each suited to a specific use case. The three RDS database engines we’ll be discussing briefly here are MySQL, MariaDB and Aurora.

Use MySQL when you have an application tuned for MySQL, you need to use MySQL plug-ins or you wish to maintain compatibility to support external replicas in EC2. MySQL with RDS has support for Memcached, including plug-in support and 5.7 compatible query optimizer improvements. Unfortunately, thread pooling and similar features that are available in Percona Server for MySQL are not currently available in the MySQL engine on RDS.

Use MariaDB when you have an application that requires features available for this engine but not in others. Currently, MariaDB engines in RDS support thread pooling, table elimination, user roles and virtual columns. MySQL or Aurora don’t support these. MariaDB engines in RDS support global transaction IDs (GTIDs), but they are based on the MariaDB implementation. They are not compatible with MySQL GTIDs. This can affect replication or migrations in the future.

Use Aurora when you want a simple-to-setup solution with strong availability guarantees and minimal configuration. This RDS database engine is cloud-native, built with elasticity and the vagaries of running in a distributed infrastructure in mind. While it does limit your configuration and optimization capabilities more than other RDS database engines, it handles a lot of things for you – including ensuring availability. Aurora automatically detects database crashes and restarts without the need for crash recovery or to rebuild the database cache. If the entire instance fails, Aurora automatically fails over to one of up to 15 read replicas.

So If RDS Handles Operations, Why Do I Need Support?

Generally speaking, properly using a database implies four quadrants of tasks. RDS only covers one of these four quadrants: the operational piece. Your existing staff (or another provider such as Percona) must cover each of the remaining quadrants.

Amazon RDS
Amazon RDS

The areas where people run into trouble are slow queries, database performance not meeting expectations or other such issues. In these cases they often can contact Amazon’s support line. The AWS Support Engineers are trained and focused on addressing issues specific to the AWS environment, however. They’re not DBAs and do not have the database expertise necessary to fully troubleshoot your database issues in depth. Often, when an RDS user encounters a performance issue, the first instinct is to increase the size of their AWS deployment because it’s a simple solution. A better path would be investigating performance tuning. More hardware is not necessarily the best solution. You often end up spending far more on your monthly cloud hosting bill than necessary by ignoring unoptimized configurations and queries.

As noted above, when using MariaDB or MySQL RDS database engines you can make use of plug-ins and inject additional configuration options that aren’t available in Aurora. This includes the ability to replicate to external instances, such as in an EC2 environment. This provides more configuration flexibility for performance optimization – but does require expertise to make use of it.

Outside support vendors (like Percona) can still help you even when you eliminate the operational elements by lending the expertise to your technical teams and educating them on tuning and optimization strategies.

Powered by WordPress | Theme: Aeros 2.0 by