Mar 22, 2023

Deploy Percona Monitoring and Management on Amazon EKS With eksctl and Helm

Deploy Percona Monitoring and Management on Amazon EKS

One of the installation methods we support for our database software is Helm. We maintain a collection of Helm charts for Percona software in a dedicated repository.

Through this blog post, you will learn how to install Percona Monitoring and Management in a Kubernetes cluster on Amazon EKS using Helm and eksctl.

Requirements

For the installation of PMM with Helm, you will need:

  • Kubernetes 1.22+
  • Helm 3.2.0+
  • PV (Persistent Volume) provisioner support in the underlying infrastructure

If you want to install PMM in a Kubernetes cluster on Amazon EKS, you can use eksctl to create the required cluster. I published the blog post Creating a Kubernetes cluster on Amazon EKS with eksctl in the Percona Community blog, where I explain how to use this tool.

For an easy way to deploy the Kubernetes cluster, check Percona My Database as a Service (MyDBaaS), a Percona Labs project. I also recommend checking this article Percona Labs Presents: Infrastructure Generator for Percona Database as a Service (DBaaS) on our blog, where the process of creating the cluster is described.

MyDBaaS will help you with cluster creation. It will generate the configuration file needed for eksctl, or it can deploy the cluster to AWS.

To use eksctl you must:

  • Install kubectl
  • Create a user with minimum IAM policies
  • Create an access key ID and secret access key for the user previously created
  • Install the AWS CLI and configure authentication (run aws configure from the command line)
  • Install eksctl

Create a Kubernetes cluster

To create the cluster, you need to generate the configuration file for eksctl. Go to https://mydbaas.labs.percona.com/ and fill out the details of your cluster.

MyDBaaS - Kubernetes Cluster Configuration

Figure 1: MyDBaaS – Kubernetes Cluster Configuration

  1. Give your cluster a name. MyDbaas is the default value.
  2. Select the number of nodes. The default value is three, but you can create a cluster of up to five nodes.
  3. Enter the instance type.
  4. Select the region. The default is us-west-2.

If you don’t know what instance type to use, go to the Instance Type Selector and select:

  • Number of CPUs
  • Memory size
  • Region
Instance Type Selector

Figure 2: MyDBaaS – Instance Type Selector

As stated on the website, by default this tool only returns the configuration file needed for eksctl. Optionally, you can provide your AWS credentials and the tool will deploy the EKS cluster for you.

After filling out the details, click on Submit and you will get the configuration file that will look like this:

addons:
- name: aws-ebs-csi-driver
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: MyDbaas
  region: us-east-1
nodeGroups:
- desiredCapacity: 2
  iam:
    withAddonPolicies:
      ebs: true
      efs: true
      fsx: true
  instanceType: m5.large
  maxSize: 5
  minSize: 1
  name: ng-1
  preBootstrapCommands:
  - echo 'OPTIONS="--default-ulimit nofile=1048576:1048576"' >> /etc/sysconfig/docker
  - systemctl restart docker
  volumeSize: 100
  volumeType: gp2

Then, save the configuration to a file (for example, cluster.yaml) and create the cluster by running the following command:

eksctl create cluster -f cluster.yaml

While running, eksctl will create the cluster and all the necessary resources. It will take a few minutes to complete.

eksctl Running

Figure 3: eksctl Running

Cluster credentials can be found in ~/.kube/config. Run kubectl get nodes to verify that this file is valid, as suggested by eksctl.

Install PMM with Helm

Once the cluster has been created and configured, you can install PMM using Helm.

  1. Add the repository:
helm repo add percona https://percona.github.io/percona-helm-charts/

  2. Install PMM:
helm install pmm --set service.type="LoadBalancer" percona/pmm
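If you need to customize the deployment beyond the service type, a common Helm pattern is to export the chart's default values, adjust them, and pass the file back during installation. This is only a generic sketch; the exact value names (for example, for storage size or the admin password) are defined by the chart, so check them in the exported file or in the chart documentation:

helm show values percona/pmm > values.yaml
# edit values.yaml as needed, then install (or upgrade) with the customized values
helm install pmm -f values.yaml --set service.type="LoadBalancer" percona/pmm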

Once the PMM server is running, you need its external address. Run this command to get it:

kubectl get services monitoring-service

You will get an output similar to the following.

NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP                                                              PORT(S)                      AGE
monitoring-service   LoadBalancer   10.100.17.121   a57d50410ca2f4c9d9b029da8f44f73f-254463297.us-east-1.elb.amazonaws.com   443:30252/TCP,80:31201/TCP   100s

a57d50410ca2f4c9d9b029da8f44f73f-254463297.us-east-1.elb.amazonaws.com is the external address (the load balancer hostname) and the one you need to access PMM from the browser.

Before accessing the dashboard of PMM, get the password.

export ADMIN_PASS=$(kubectl get secret pmm-secret --namespace default -o jsonpath='{.data.PMM_ADMIN_PASSWORD}' | base64 --decode)
echo $ADMIN_PASS

The value of $ADMIN_PASS is the password you need to log into the dashboard. The default user is admin.

Go to the browser and paste the external address in the address bar.

PMM running on Kubernetes

Figure 4: PMM running on Kubernetes

Now you have PMM running in the cloud on Amazon EKS.

I recommend you check this article Percona Monitoring and Management in Kubernetes is now in Tech Preview on our blog for more information about PMM in Kubernetes using Helm.

Conclusion

Through this blog post, you learned how to create a Kubernetes cluster with eksctl and deploy PMM using Helm with the help of Percona MyDBaaS. The process would be the same for any of the Percona software in the collection of Helm charts.

Learn more about Percona MyDBaaS!

Mar 20, 2023

Comparisons of Proxies for MySQL

mysql proxy

With a special focus on Percona Operator for MySQL

Overview

HAProxy, ProxySQL, MySQL Router (AKA MySQL Proxy): in the last few years, I have had to answer multiple times which proxy to use and in which scenario. When designing an architecture, many components need to be considered before deciding on the best solution.

When deciding what to pick, there are many things to consider, like where the proxy needs to sit, whether it "just" needs to redirect connections or also provide features like caching and filtering, and whether it needs to be integrated with some MySQL embedded automation.

Given that, there has never been a single straight answer. Instead, an analysis needs to be done. Only after a better understanding of the environment, the needs, and the evolution the platform must achieve is it possible to decide what the better choice will be.

However, recently we have seen an increase in the usage of MySQL on Kubernetes, especially with the adoption of Percona Operator for MySQL. In this case, we have a quite well-defined scenario that can resemble the image below:

MySQL on Kubernetes

In this scenario, the proxies must sit inside Pods, balancing the incoming traffic from the Service LoadBalancer connecting with the active data nodes.

Their role is merely to be sure that any incoming connection is redirected to nodes that can serve them, which includes having a separation between Read/Write and Read Only traffic, a separation that can be achieved, at the service level, with automatic recognition or with two separate entry points.

In this scenario, it is also crucial to be efficient in resource utilization and to scale with frugality. In this context, features like filtering, firewalling, or caching are redundant and may consume resources that could be allocated to scaling. They are also features that work better outside the K8s/Operator cluster: the closer to the application they are located, the better they serve.

About that, we must always remember the concept that each K8s/Operator cluster needs to be seen as a single service, not as a real cluster. In short, each cluster is, in reality, a single database with high availability and other functionalities built in.

Anyhow, we are here to talk about proxies. Now that we have that clear mandate in mind, we need to identify which product allows our K8s/Operator solution to:

  • Scale the number of incoming connections to the maximum
  • Serve the requests with the highest efficiency
  • Consume as few resources as possible

The environment

To identify the above points, I have simulated a possible K8s/Operator environment, creating:

  • One powerful application node, where I run sysbench read-only tests, scaling from two to 4,096 threads (type c5.4xlarge)
  • Three mid-size data nodes, each with several gigabytes of data, running MySQL with Group Replication (type m5.xlarge)
  • One proxy node running on a resource-limited box (type t2.micro)

The tests

We will have very simple test cases. The first one is meant to define the baseline, identifying the moment when we hit the first level of saturation due to the number of connections. In this case, we will increase the number of connections while keeping a low number of operations.

The second test will define how well the increasing load is served inside the previously identified range. 

For documentation, the sysbench commands are:

Test1

sysbench ./src/lua/windmills/oltp_read.lua --db-driver=mysql --tables=200 --table_size=1000000 \
 --rand-type=zipfian --rand-zipfian-exp=0 --skip_trx=true --report-interval=1 --mysql-ignore-errors=all \
--mysql_storage_engine=innodb --auto_inc=off --histogram --stats_format=csv --db-ps-mode=disable --point-selects=50 \
--reconnect=10 --range-selects=true --rate=100 --threads=<#Threads from 2 to 4096> --time=1200 run

Test2

sysbench ./src/lua/windmills/oltp_read.lua --mysql-host=<host> --mysql-port=<port> --mysql-user=<user> \
--mysql-password=<pw> --mysql-db=<schema> --db-driver=mysql --tables=200 --table_size=1000000 --rand-type=zipfian \
--rand-zipfian-exp=0 --skip_trx=true --report-interval=1 --mysql-ignore-errors=all --mysql_storage_engine=innodb \
--auto_inc=off --histogram --table_name=<tablename> --stats_format=csv --db-ps-mode=disable --point-selects=50 \
--reconnect=10 --range-selects=true --threads=<#Threads from 2 to 4096> --time=1200 run

Results

Test 1

As indicated above, I was looking to identify when the first proxy would reach a point where it could no longer manage the load. The load here is all about creating and serving the connections, while the number of operations is capped at 100.

As you can see, and as I was expecting, the three Proxies were behaving more or less the same, serving the same number of operations (they were capped, so why not) until they weren’t.

MySQL Router, after 2,048 connections, could not serve anything more.

NOTE: MySQL Router actually stopped working at 1,024 threads, but using version 8.0.32, I enabled the connection_sharing feature, which allows it to go a bit further.
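For reference, connection sharing is configured per routing section in mysqlrouter.conf. The fragment below is only a sketch based on my reading of the 8.0.32 documentation (the destinations URI and port are placeholders), so verify the option names against your Router version:

[routing:primary]
bind_address=0.0.0.0
bind_port=6446
destinations=metadata-cache://myCluster/?role=PRIMARY
routing_strategy=first-available
# introduced in MySQL Router 8.0.32: reuse server connections across client connections
connection_sharing=1
connection_sharing_delay=1.0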

Let us also take a look at the latency:

latency threads

Here the situation starts to get a little more complicated. MySQL Router is the one with the highest latency no matter what. However, HAProxy and ProxySQL show interesting behavior: HAProxy performs better with a low number of connections, while ProxySQL performs better when a high number of connections is in place.

This is due to the multiplexing and the very efficient way ProxySQL uses to deal with high load.

Everything has a cost:

HAProxy is definitely using fewer user CPU resources than ProxySQL or MySQL Router …

HAProxy

... we can also notice that HAProxy barely reaches, on average, a 1.5 CPU load, while ProxySQL is at 2.5 and MySQL Router is at around 2.

To be honest, I was expecting something like this, given ProxySQL's need to handle the connections and the additional basic routing. What was instead a surprise was MySQL Router: why does it have a higher load?

Brief summary

This test highlights that HAProxy and ProxySQL can reach a higher number of connections than the slowest runner in the game (MySQL Router). It is also clear that traffic under a high number of connections is better served by ProxySQL, but it requires more resources.

Test 2

When the going gets tough, the tough get going

Let’s remove the –rate limitation and see what will happen. 

mysql events

The scenario with load changes drastically. We can see how HAProxy can serve the connection and allow the execution of more operations for the whole test. ProxySQL is immediately after it and behaves quite well, up to 128 threads, then it just collapses. 

MySQL Router never takes off; it always stays below the 1k reads/second, while HAProxy served 8.2k and ProxySQL 6.6k.

mysql latency

Looking at the latency, we can see that HAProxy's latency gradually increased, as expected, while ProxySQL's and MySQL Router's just shot up from 256 threads on.

Note that both ProxySQL and MySQL Router could not complete the tests with 4,096 threads.

ProxySQL and MySQL Router

Why? HAProxy always stays below 50% CPU, no matter the increasing number of threads/connections, scaling the load very efficiently. MySQL Router almost immediately reaches its saturation point, being affected by both the number of threads/connections and the number of operations. That was unexpected, given we do not have a level 7 capability in MySQL Router.

Finally, ProxySQL, which was working fine up to a certain limit, reached its saturation point and could not serve the load. I say load because ProxySQL is a level 7 proxy and is aware of the content of the load. Given that, on top of multiplexing, additional resource consumption was expected.

proxysql usage

Here we just have a clear confirmation of what was already said above, with 100% CPU utilization reached by MySQL Router with just 16 threads, and ProxySQL way after at 256 threads.

Brief summary

HAProxy comes up as the champion in this test; there is no doubt that it could scale the increasing load in connection without being affected significantly by the load generated by the requests. The lower consumption in resources also indicates the possible space for even more scaling.

ProxySQL was penalized by the limited resources, but that was the game: we had to get the most out of the little available. This test indicates that it is not optimal to use ProxySQL inside the Operator; it is the wrong choice if low resource consumption and scalability are a must.

MySQL Router was never in the game. Unless it undergoes a serious refactoring, MySQL Router is designed for very limited scalability; as such, the only way to adopt it is to have many of them at the application node level. Utilizing it close to the data nodes, in a centralized position, is a mistake.

Conclusions

I started showing an image of how the MySQL service is organized and want to close by showing the variation that, for me, is the one to be considered the default approach:

MySQL service is organized

This highlights that we must always choose the right tool for the job. 

The Proxy in architectures involving MySQL/Percona Server for MySQL/Percona XtraDB Cluster is a crucial element for the scalability of the cluster, no matter if using K8s or not. Choosing the one that serves us better is important, which can sometimes be ProxySQL over HAProxy. 

However, when talking about K8s and Operators, we must recognize the need to optimize the resources usage for the specific service. In that context, there is no discussion about it, HAProxy is the best solution and the one we should go to. 

My final observation is about MySQL Router (aka MySQL Proxy). 

Unless there is a significant refactoring of the product, at the moment it is not even close to what the other two can do. From the tests done so far, it requires a complete reshaping, starting with identifying why it is so affected by the load coming from the queries more than by the load coming from the connections.

Great MySQL to everyone. 

Mar 16, 2023

Percona Labs Presents: Infrastructure Generator for Percona Database as a Service (DBaaS)

Percona MyDBaaS

Let’s look at how you can run Percona databases on Kubernetes, the easy way.

Chances are that if you are using the latest Percona Monitoring and Management (PMM) version, you have seen the availability of the new Percona Database as a Service (DBaaS). If not, go and get a glimpse of the fantastic feature with our docs on DBaaS – Percona Monitoring and Management.

Now, if you like it and wanna give it a try (yay!) but don't wanna deal with Kubernetes (nay), no worries; we have a tool for you. Introducing the Percona DBaaS Infrastructure Creator, or Percona My Database as a Service (MyDBaaS).

My Database as a Service

First, a clarification: this tool is focused on running your DBaaS on Amazon Web Services (AWS).

percona My Database as a Service

At the time of writing this article, the PMM DBaaS supported a limited number of cloud vendors, AWS being one of them. This tool creates the infrastructure in an AWS account.

The usage is pretty simple. You have three features:

  • An instance selector: In case you don’t know what instance type to use, but you do know how many CPUs and how much memory your DBs require.
  • A Cluster Creator: This is where you decide on some minimal and basic properties of the cluster and deploy it.
  • Deploy a Percona Operator: You can choose from our three Operators for Kubernetes to be deployed on the same infrastructure you are creating.

Instance selector

Currently, AWS has 270 different instance types available. Which one to use? The instance selector will help you with that. Just pass the number of vCPUs, amount of memory, and the region you intend to use, and it will show a list of EKS-suitable EC2 instances.

percona my database Instance selector

Why ask for the region? Costs! Instance types have different costs depending on the region they are in, and cost is something the tool will show you:

Cluster creator

A very basic interface: you only need to pass the name of the cluster, the desired number of nodes, the instance type, and the region in which you would like to run the cluster.

percona my database cluster

If you pass your AWS key/secret, the tool will take care of deploying the cluster, and once it is done, it will return the contents of the Kubeconfig file for you to use in the DBaaS of PMM.

A note on how we handle your AWS key/secret

Percona will never store this information. In the particular case of this tool, we do set the following environment variables:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_DEFAULT_REGION

After the creation of the cluster, the environment variables are unset.

Now, why is there an option to not pass the AWS credentials? Well, for security, of course. 

Under the hood, the cluster is created using eksctl, a “CloudFormation stack creator.”

If you are proficient enough with eksctl, you can just copy/paste the created YAML and run the eksctl command on your own server without ever sharing your credentials. Now, for DBaaS, you still have to provide them, but that’s another topic.
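For example, assuming you saved the generated YAML as cluster.yaml, the workflow on your own machine would look something like this (the --dry-run flag is available in recent eksctl versions and lets you review the final configuration before anything is created):

# review the fully expanded cluster configuration without creating anything
eksctl create cluster -f cluster.yaml --dry-run

# create the EKS cluster using your locally configured AWS credentials
eksctl create cluster -f cluster.yaml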

using EKSCTL

If you choose to pass the AWS credentials, an EKS cluster will be deployed. The outcome will be the contents of the kubeconfig file.

kubeconfig file Percona

With that in place, one can now go to PMM and register the Kubernetes cluster as described in https://docs.percona.com/percona-monitoring-and-management/dbaas/get-started.html#add-a-kubernetes-cluster to deploy the DBaaS in there.

But there’s more:

Percona Operators

Now, there’s an additional option: deploying a Percona Operator. Currently, PMM DBaaS focuses on the Percona Operator for MySQL based on Percona XtraDB Cluster. With the MyDBaaS tool, you can choose from three operators available.

To get the full deployment, you need to both provide your AWS credentials and select a Percona Operator to deploy.

deploy a Percona Operator

At the end of the installation, you will have the contents of the Kubeconfig file, plus:

  • The password of the root user 
  • Instructions on how to connect to it

installing percona operator for MySQL

After this, you can use a Percona database running on Kubernetes. Have fun!

About Percona Labs

Percona Labs is a place for the community to have a voice on how we approach solutions, curated by Percona’s CTO, Vadim Tkachenko. Feedback is always welcome:

Community Forum

Mar 15, 2023

Automating Physical Backups of MongoDB on Kubernetes

MongoDB

We at Percona talk a lot about how Kubernetes Operators automate the deployment and management of databases. Operators seamlessly handle lots of Kubernetes primitives and database configuration bits and pieces, all to remove toil from operation teams and provide a self-service experience for developers.

Today we want to take you backstage and show what is really happening under the hood. We will review technical decisions and obstacles we faced when implementing physical backup in Percona Operator for MongoDB version 1.14.0. The feature is now in technical preview.

The why

Percona Server for MongoDB can handle petabytes of data. The users of our Operators want to host huge datasets in Kubernetes too. However, using a logical restore recovery method takes a lot of time and can lead to SLA breaches.

Our Operator uses Percona Backup for MongoDB (PBM) as a tool to backup and restore databases. We have been leveraging logical backups for quite some time. Physical backups in PBM were introduced a few months ago, and, with those, we saw significant improvement in recovery time (read more in this blog post, Physical Backup Support in Percona Backup for MongoDB):

physical backup mongodb

So if we want to reduce the Recovery Time Objective (RTO) for big data sets, we must provide physical backups and restores in the Operator.

The how

Basics

When you enable backups in your cluster, the Operator adds a sidecar container to each replset pod (including the Config Server pods if sharding is enabled) to run pbm-agent. These agents listen for PBM commands and perform backups and restores on the pods. The Operator opens connections to the database to send PBM commands to its collections or to track the status of PBM operations.

kubernetes backup mongodb

Backups

Enabling physical backups was fairly straightforward. The Operator already knew how to start a logical backup, so we just passed a single field to the PBM command if the requested backup was physical.

Unlike logical backups, physical backups need direct access to mongod’s data directory, so we mounted the persistent volume claim (PVC) into the PBM sidecar containers. With these two simple changes, the Operator could perform physical backups. But whether you know it from theory or from experience (ouch!), a backup is useless if you cannot restore it.
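For illustration, in version 1.14.0 a physical backup is requested through the same PerconaServerMongoDBBackup resource used for logical backups, with the backup type set explicitly. The sketch below uses placeholder cluster and storage names; check the Operator documentation for the exact fields in your version:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: backup1
spec:
  clusterName: my-cluster-name
  storageName: s3-us-west
  type: physical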

Restores

Enabling physical restores wasn’t easy. PBM had two major limitations when it came to physical restores:

1. PBM stops the mongod process during the restore.

2. It runs a temporary mongod process after the backup files are copied to the data directory.

What makes physical restores a complex feature is dealing with the implications of these two constraints.

PBM kills the mongod process

PBM kills the mongod process in the container, which means the operator can’t query PBM collections from the database during the restore to track their status. This forced us to use the PBM CLI tool to control the PBM operations during the physical restore instead of opening a connection to the database.

Note: We are now considering using the same approach for every PBM operation in the Operator. Let us know what you think.

Another side effect of PBM killing mongod is that the mongod container restarts: the mongod process runs as PID 1 in the container, and when you kill it, the container restarts. But the PBM agent needs to keep working after mongod is killed to complete the physical restore. For this, we created a special entrypoint to be used during the restore. This special entrypoint starts mongod and just sleeps after mongod has finished.

Temporary mongod started by PBM

When you start a physical restore, PBM does approximately the following:

1. Kills the mongod process and wipes the data directory.

2. Copies the backup files to the data directory.

3. Starts a temporary mongod process listening on a random port to perform operations on oplog.

4. Kills the mongod and PBM agent processes.

For a complete walkthrough, see Physical Backup Support in Percona Backup for MongoDB.

From the Operator’s point of view, the problem with this temporary mongod process is its version. We support MongoDB 4.4, MongoDB 5.0, and now MongoDB 6.0 in the Operator, and the temporary mongod needs to be the same version as the MongoDB running in the cluster. This means we either include mongod binary in PBM docker images and create a separate image for each version combination or find another solution.

Well… we found it: we copy the PBM binaries into the mongod container before the restore. This way, the PBM agent can kill and start the mongod process as often as it wants, and it’s guaranteed to be the same version as the cluster.

Tying it all together

By now, it may seem like I have thrown random solutions to each problem. PBM, CLI, special entrypoint, copying PBM binaries… How does it all work?

When you start a physical restore, the Operator patches the StatefulSet of each replset to:

1. Switch the mongod container entrypoint to our new special entrypoint: It’ll start pbm-agent in the background and then start the mongod process. The script itself will run as PID 1, and after mongod is killed, it’ll sleep forever.

2. Inject a new init container: This init container will copy the PBM and pbm-agent binaries into the mongod container.

3. Remove the PBM sidecar: We must ensure that only one pbm-agent is running in each pod.

Until all StatefulSets (except mongos) are updated, the PerconaServerMongoDBRestore object is in a “waiting” state. Once they’re all updated, the operator starts the restore, and the restore object goes into the requested state, then the running state, and finally, the ready or error state.
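To tie this back to what the user actually runs, the restore itself is requested with the same PerconaServerMongoDBRestore resource as a logical restore; a minimal sketch with placeholder names could look like this:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore1
spec:
  clusterName: my-cluster-name
  backupName: backup1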

Conclusion

Operators simplify the deployment and management of applications on Kubernetes. Physical backups and restores are quite a popular feature for MongoDB clusters. Restoring from physical backup with Percona Backup for MongoDB has multiple manual steps. This article shows what kind of complexity is hidden behind this seemingly simple step.

 

Try Percona Operator for MongoDB

Mar 01, 2023

Reduce Your Cloud Costs With Percona Kubernetes Operators

cut costs percona operators

Public cloud spending is slowing down. Quarter-over-quarter growth is no longer hitting 30% gains for AWS, Google, and Microsoft. This is businesses’ response to tough and uncertain macroeconomic conditions, where organizations scrutinize their public cloud spending to optimize and adjust.

In this blog post, we will see how running databases on Kubernetes with Percona Operators can reduce your cloud bill when compared to using AWS RDS.

Inputs

These are the instances that we will start with:

  • AWS
  • RDS for MySQL in us-east-1
  • 10 x db.r5.4xlarge
  • 200 GB storage each

The cost of RDS consists mostly of two things – compute and storage. We will not consider data transfer or backup costs in this article.

  • db.r5.4xlarge – $1.92/hour or $46.08/day or $16,819.20/year
  • For ten instances, it will be $168,192 per year
  • Default gp2 storage is $0.115 per GB-month, for 200 GB, it is $22.50/month or $270 per year
  • For ten instances, it will be $2,700 per year

Our total estimated annual cost is $168,192 + $2,700 = $170,892.

Let’s see how much we can save.

Cutting costs

Move to Kubernetes

If we take AWS EKS, the following components are included in the cost:

  1. Control plane
  2. Compute – EC2, where we will have a dedicated Kubernetes node per database node
  3. Storage – EBS

The control plane costs $0.10/hour per cluster, or $876 per year.

For EC2, we will use the same r5.4xlarge instance, which is $1.008/hour or $8,830.08 per year.

EBS gp2 storage is $0.10 per GB-month, for 200GB, it is $240 per year.

For ten instances, our annual cost is:

  • Compute – $88,300.80
  • Storage – $2,880
  • EKS control plane – $876

Total: $92,056.80

By just moving from RDS to Kubernetes and Operators, you will save $78,835.20.

Why Operators?

It seems this and the further techniques can be easily executed without Kubernetes and Operators by just leveraging EC2 instances. “Why do I need Operators?” would be the right question to ask.

The usual block on a way to cost savings is management complexity. Managed Database Services like RDS remove a lot of toil from operation teams and provide self-service for developers. Percona Operators do exactly the same thing on Kubernetes, enabling your teams to innovate faster but without an extremely high price tag.

Kubernetes itself brings new challenges and complexity. Managed Kubernetes services like EKS or GKE do a good job of removing part of these, but not all. Keep this in mind when you decide to cut costs with Operators.

Spot instances

Spot instances (or Preemptible VMs at GCP) use the spare compute capacity of the cloud. They come with no availability guarantee but are available at a much lower price – up to an 80% discount.

The discount is great, but what about availability? You can successfully leverage spot instances with Kubernetes and databases, but you must have a good strategy behind it. For example:

  1. Your production database should be running in a clustered mode. For the Percona XtraDB Cluster, we recommend having at least three nodes.
  2. Run in multiple Availability Zones to minimize the risk of massive spot instance interruptions.
  3. In AWS, you can keep some percentage of your Kubernetes nodes on-demand to guarantee the availability of your database clusters and applications.

You can move the levers between your availability requirements and cost savings. Services like spot.io and tools like aws-node-termination-handler can enhance your availability greatly with spot instances.
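As an illustration of the last point in the list above, eksctl lets you mix on-demand and spot capacity inside a single node group. The fragment below is only a sketch to be added to a ClusterConfig (instance types and percentages are arbitrary; check the eksctl documentation for the exact schema in your version):

nodeGroups:
- name: ng-mixed
  minSize: 3
  maxSize: 9
  instancesDistribution:
    instanceTypes: ["r5.xlarge", "r5a.xlarge", "r4.xlarge"]
    onDemandBaseCapacity: 1
    onDemandPercentageAboveBaseCapacity: 25
    spotInstancePools: 3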

Leverage container density

Some of your databases are always utilized, and some are not. With Kubernetes, you can leverage various autoscaling techniques to pack containers densely and reduce the number of instances.

  1. Vertical Pod Autoscaler (VPA) or Kubernetes Event-driven Autoscaling (KEDA) can help to reduce the number of requested resources for your containers based on real utilization.
  2. Cluster Autoscaler will remove the nodes that are not utilized. 

Figure 1 shows an example of this tandem, where Cluster Autoscaler terminates one EC2 instance after rightsizing the container resources.

Figure 1: Animated example showing the Kubernetes autoscaler terminating an unused EC2 instance
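For reference, a minimal VerticalPodAutoscaler object targeting a database StatefulSet could look like the sketch below. It assumes the VPA components are already installed in the cluster, and the StatefulSet name is a hypothetical placeholder:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: db-cluster-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: my-db-cluster   # placeholder: the StatefulSet managed by the Operator
  updatePolicy:
    updateMode: "Auto"    # VPA evicts Pods to apply the new resource requests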

Local storage

Databases tend to consume storage, and usually, it grows over time. It gets even trickier when your performance requirements demand expensive storage tiers like io2, where significant spending lands on provisioned IOPS. Some instances at cloud providers come with local NVMe SSDs. It is possible to store data on these local disks instead of EBS volumes, thus reducing your storage costs.

For example, r5d.4xlarge has 2 x 300 GB of built-in NVMe SSDs. That can be enough to host a database.

The use of local storage comes with caveats:

  1. Instances with local SSDs are approximately 10% more expensive
  2. Like spot nodes, you have two levers – cost and data safety. Clustering and proper backups are essential.

Solutions like OpenEBS or Ondat leverage local storage and, at the same time, simplify operations. We have written many posts about Percona Operators running with OpenEBS.

Conclusion

Cloud spending grows with your business. Simple actions like rightsizing or changing to new instance generations bring short-lived savings. The strategies described in this article can significantly cut your cloud bill, while Operators and Kubernetes keep the complexity away.

Try out Percona Operators today!

Feb 10, 2023

Storage Autoscaling With Percona Operator for MongoDB

Previously, deploying and maintaining a database usually meant many burdensome chores and repetitive tasks to ensure proper functioning. In the cloud era, however, developers and operation engineers started fully embracing automation tools making their job significantly easier. Of those tools, Kubernetes operators are certainly one of the most prominent, as they are oftentimes used as a building block for DBaaS.

ViewBlock, a blockchain explorer, uses the Percona Operator for MongoDB to store critical data. Today along with their team, we will see how pvc-autoresizer can automate storage scaling for MongoDB clusters on Kubernetes.

Reasoning and goal

Nobody enjoys waking up in the middle of the night from disk usage alerts or, worse, a downed cluster due to a lack of free space on a volume that wasn’t properly set up to send proper warnings to its stakeholders.

Our goal is to automate storage scaling when our disk reaches a certain threshold of use and simultaneously reduce the amount of alert noise related to that. Specifically for our Operator, the volumes used by replicaset nodes are defined by their Persistent Volume Claims (PVCs), which are our targets to enable autoscaling.

Percona Kubernetes Operator

Let’s go

Prerequisites

pvc-autoresizer currently supports Kubernetes versions 1.23 to 1.25. In addition to pvc-autoresizer, which you can install through its Helm chart, Prometheus also needs to be running in your cluster to provide the resizer with the metrics it needs to determine whether a PVC must be scaled.

Your CSI driver needs to support Volume Expansion and NodeGetVolumeStats, and the Storage Class used by your PVCs also requires the Volume Expansion feature to be active.

In our lab we will use AWS EKS with a standard storage class.

In action

You first need to add the following annotation to your Storage Class:

resize.topolvm.io/enabled=true
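For example, assuming your Storage Class is called standard (as in our lab), you can add the annotation with a single command:

kubectl annotate storageclass standard resize.topolvm.io/enabled=true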

From that point on, your only requirement is to properly annotate the PVCs created by the Operator.

Assuming a simple 3-node Percona Server for MongoDB cluster without sharding and nothing else in your current namespace, you should be able to run the following commands to allow for automatic storage resizing of all the PVCs.

kubectl annotate pvc --all resize.topolvm.io/storage_limit="100Gi"
kubectl annotate pvc --all resize.topolvm.io/increase="20Gi"
kubectl annotate pvc --all resize.topolvm.io/threshold="20%"
kubectl annotate pvc --all resize.topolvm.io/enabled="true"

To describe that particular configuration: it tells pvc-autoresizer that your PVCs should increase by 20Gi when only 20% of space is left on them, up to a maximum size of 100Gi, beyond which they will not grow automatically.

Note: If you use version 1.14.0 of the Operator that’s scheduled for release this month, it will be possible to annotate the PVCs directly from your CR configuration file, for example:

spec:
  replsets:
  - name: rs0
    volumeSpec:
      persistentVolumeClaim:
        annotations:
          resize.topolvm.io/storage_limit: 100Gi
          resize.topolvm.io/increase: 20Gi
          resize.topolvm.io/threshold: 20%
          resize.topolvm.io/enabled: "true"

Limitations

Manual downscaling

It is possible to increase the storage automatically, but decreasing it is a manual process requiring replacing the volumes entirely. For example, AWS EBS volumes cannot be downsized and you’d need to delete the volumes before creating new ones with a smaller size. In Change Storage Class on Kubernetes on the Fly, we described how to change the storage class on the fly — a similar process will apply to decrease the size of the volumes.

Percentage threshold

As our increase thresholds are specified in percentages, the space available upon resize will inherently grow along with the size of the volume. When dealing with “big” disks, say 1TB, it is advised to set a low threshold to avoid triggering a rescale when a lot of space is still available.

Scaling quotas

Cloud providers, for example, AWS, have scaling quotas for the volumes. For EBS, you can resize a volume once in six hours. If your data ingestion is bigger than the increase amount you set and happens in less than that time, the resizing will fail. It is essential to consider your ingest rate and disk growth so that you can set the appropriate autoresizer configurations that suit your needs. Failure to do so might result in unwanted alerts and the need to transfer data to new PVCs, which you can learn about in Percona Operator Volume Expansion Without Downtime.

Statefulset and custom resource synchronization

When you provision new nodes or recreate one, their PVCs will be bootstrapped with the volume request from the immutable StatefulSet created at the initialization of your cluster, not with the latest size. You will need to set the appropriate annotations for those new PVCs to ensure that pvc-autoresizer scales them enough to allow the replication to have enough space to proceed, and that it doesn’t need to scale more than what your cloud provider permits. It is generally recommended to make sure the storage specs in StatefulSets and Custom Resources are in sync with the real volumes.

Conclusion

And there you have it, a Percona Operator for MongoDB configuration that automatically scales its storage based on your needs!

It is still advised to have, at the very least, a minimal alerting setup for when you’re close to hitting the storage limit specified in the annotations, since pvc-autoresizer requires that limit to be set.

ViewBlock is a blockchain-agnostic explorer allowing anyone to inspect blocks, transactions, address history, advanced statistics & much more.

Percona Operator for MongoDB automates deployment and management of replica sets and sharded clusters on Kubernetes.

Jan 24, 2023

Backup Databases on Kubernetes With VolumeSnapshots

Backup Databases on Kubernetes With VolumeSnapshots

Databases on Kubernetes continue their rising trend. We see the growing adoption of our Percona Kubernetes Operators and the demand to migrate workloads to the cloud-native platform. Our Operators provide built-in backup and restore capabilities, but some users are still looking for old-fashioned ways, like storage-level snapshots (i.e., AWS EBS Snapshots).

In this blog post, you will learn:

  1. How to back up and restore from storage snapshots using Percona Operators
  2. What the risks and limitations are of such backups

Overview

Volume Snapshots went GA in Kubernetes 1.20. Both your storage and Container Storage Interface (CSI) must support snapshots. All major cloud providers support them but might require some steps to enable them. For example, for GKE, you must create a VolumeSnapshotClass resource first.
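For example, on GKE a minimal VolumeSnapshotClass for the Compute Engine persistent disk CSI driver could look like the sketch below (the class name is arbitrary, and you should check your provider's documentation for the correct driver name):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: gke-snapshotclass
driver: pd.csi.storage.gke.io
deletionPolicy: Delete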

At a high level, snapshotting on Kubernetes works as follows: as the PersistentVolume is backed by the real storage volume, the VolumeSnapshot is the Kubernetes resource for the volume snapshot in the cloud.

Getting ready for backups

First, we need to be sure that VolumeSnapshots are supported. For the major clouds, check the provider documentation for CSI driver and snapshot support.

Once you have CSI configured and Volume Snapshot Class is in place, proceed to create a backup.

Take the backup

Identify the PersistentVolumeClaims (PVC) that you want to snapshot. For example, for my MongoDB cluster, I have six PVCs: three x replica set nodes and three x config server nodes.

$ kubectl get pvc
NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongod-data-my-cluster-name-cfg-0   Bound    pvc-c9fb5afa-1fc9-41f9-88f3-4ed457f88e58   3Gi        RWO            standard-rwo   78m
mongod-data-my-cluster-name-cfg-1   Bound    pvc-b9253264-f79f-4fd0-8496-1d88105d84e5   3Gi        RWO            standard-rwo   77m
mongod-data-my-cluster-name-cfg-2   Bound    pvc-5d462005-4015-47ad-9269-c205b7a3dfcb   3Gi        RWO            standard-rwo   76m
mongod-data-my-cluster-name-rs0-0   Bound    pvc-410acf85-36ad-4bfc-a838-f311f9dfd40b   3Gi        RWO            standard-rwo   78m
mongod-data-my-cluster-name-rs0-1   Bound    pvc-a621dd8a-a671-4a35-bb3b-3f386550c101   3Gi        RWO            standard-rwo   77m
mongod-data-my-cluster-name-rs0-2   Bound    pvc-484bb835-0e2d-4a40-b5a3-1ba340ec0567   3Gi        RWO            standard-rwo   76m

Each PVC will have its own VolumeSnapshot. Example for mongod-data-my-cluster-name-cfg-0:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mongod-data-my-cluster-name-cfg-0-snap
spec:
  volumeSnapshotClassName: gke-snapshotclass
  source:
    persistentVolumeClaimName: mongod-data-my-cluster-name-cfg-0

I have listed all my VolumeSnapshots objects in one YAML manifest here.

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/volume-snapshots/mongo-volumesnapshots.yaml
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-cfg-0-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-cfg-1-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-cfg-2-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-rs0-0-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-rs0-1-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-rs0-2-snap created

A VolumeSnapshotContent object is created and bound to every VolumeSnapshot resource. Its status can tell you the name of the snapshot in the cloud and whether the snapshot is ready:

$ kubectl get volumesnapshotcontent snapcontent-0e67c3b5-551f-495b-b775-09d026ea3c8f -o yaml
…
status:
  creationTime: 1673260161919000000
  readyToUse: true
  restoreSize: 3221225472
  snapshotHandle: projects/percona-project/global/snapshots/snapshot-0e67c3b5-551f-495b-b775-09d026ea3c8f

  • snapshot-0e67c3b5-551f-495b-b775-09d026ea3c8f is the snapshot I have in GCP for the volume.
  • readyToUse: true – indicates that the snapshot is ready
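You can also list the VolumeSnapshot objects themselves to check all snapshots at once; the READYTOUSE column reflects the same readiness flag:

$ kubectl get volumesnapshot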

Restore

The restoration process, in a nutshell, looks as follows:

  1. Create persistent volumes using the snapshots. The names of the volumes must match the naming convention that the Operator uses.
  2. Provision the cluster

As with any other restore, the cluster must have its Secrets in place: TLS and users.

You can use this restoration process to clone existing clusters as well; just make sure you change the cluster, PVC, and Secret names.

Create the persistent volumes from the snapshots. It is the same as creating a regular PersistentVolumeClaim, but with a dataSource section that points to the snapshot:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongod-data-my-cluster-name-rs0-0
spec:
  dataSource:
    name: mongod-data-my-cluster-name-rs0-0-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: standard-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/volume-snapshots/mongo-pvc-restore.yaml
persistentvolumeclaim/mongod-data-my-cluster-name-cfg-0 created
persistentvolumeclaim/mongod-data-my-cluster-name-cfg-1 created
persistentvolumeclaim/mongod-data-my-cluster-name-cfg-2 created
persistentvolumeclaim/mongod-data-my-cluster-name-rs0-0 created
persistentvolumeclaim/mongod-data-my-cluster-name-rs0-1 created
persistentvolumeclaim/mongod-data-my-cluster-name-rs0-2 created

Once done, spin up the cluster as usual. The volumes you created earlier will be used automatically. Restoration is done.

Risks and limitations

Storage support

Both storage and the storage plugin in Kubernetes must support volume snapshots. This limits the choices. Apart from public clouds, there are open source solutions like Ceph (rook.io for k8s) that can provide snapshotting capabilities.

Point-in-time recovery

Point-in-time recovery (PITR) allows you to reduce your Recovery Point Objective (RPO) by restoring or rolling back the database to a specific transaction or time.

Volume snapshots in the clouds store data in increments. The first snapshot holds all the data, and the following ones only store the changes. This significantly reduces your cloud bill. But snapshots cannot provide you with the same RPO as native database mechanisms.

Data consistency and corruption

Snapshots are not data-aware. When a snapshot is taken, numerous transactions and data modifications can happen. For example, heavy write activity and simultaneous compound index creation in MongoDB might lead to snapshot corruption. The biggest problem is that you will only learn about the data corruption during restoration.

Locking or freezing a filesystem before the snapshot would help to avoid such issues. Solutions like Velero or Veeam make the first steps towards data awareness and can create consistent snapshots by automating file system freezes or stopping replication.

Percona Services teams use various tools to automate the snapshot creation safely. Please contact us here to ensure data safety.

Cost

Public clouds store snapshots on cheap object storage but charge you extra for the convenience. For example, AWS EBS snapshots are priced at $0.05 per GB-month, whereas S3 is only $0.023 per GB-month. That is roughly a 2x difference, which for giant data sets might significantly increase your bill.

Time to recover

It is not a risk or limitation but a common misconception I often see: recovery from snapshots takes only a few seconds. It does not. When you create an EBS volume from the snapshot, it takes a few seconds. But in reality, the volume you just created does not have any data. You can read more about the internals of EBS snapshots in this nice blog post.

Conclusion

Volume Snapshots on Kubernetes can be used for databases but come with certain limitations and risks. Data safety and consistency are the most important factors when choosing a backup solution. For Percona Operators, we strongly recommend using built-in solutions which guarantee data consistency and minimize your recovery time and point objectives.

Learn More About Percona Kubernetes Operators

Dec 15, 2022

Least Privilege for Kubernetes Resources and Percona Operators

Operators hide the complexity of the application and Kubernetes. Instead of dealing with Pods, StatefulSets, tons of YAML manifests, and various configuration files, the user talks to the Kubernetes API to provision a ready-to-use application. An Operator automatically provisions all the required resources and exposes the application. However, there is always a risk that the user will want to do something manually that can negatively affect the application and the Operator logic.

In this blog post, we will explain how to limit access scope for the user to avoid manual changes for database clusters deployed with Percona Operators. To do so, we will rely on Kubernetes Role-based Access Control (RBAC).

The goal

We are going to have two roles: Administrator and Developer. The Administrator will deploy the Operator and create the necessary roles and service accounts. Developers will be able to:

  1. Create, modify, and delete Custom Resources that Percona Operators use
  2. List all the resources – users might want to debug the issues

Developers will not be able to:

  1. Create, modify, or delete any other resource

Least Privilege for Kubernetes

As a result, the Developer will be able to deploy and manage database clusters through a Custom Resource, but will not be able to modify any operator-controlled resources, like Pods, Services, Persistent Volume Claims, etc.

Action

We will provide an example for Percona Operator for MySQL based on Percona XtraDB Cluster (PXC), which just had version 1.12.0 released.

Administrator

Create a dedicated namespace

We will allow users to manage clusters in a single namespace called prod-dbs:

$ kubectl create namespace prod-dbs

Deploy the operator

Use any of the ways described in our documentation to deploy the Operator into the namespace. My personal favorite is a simple kubectl command:

$ kubectl apply -f https://raw.githubusercontent.com/percona/percona-xtradb-cluster-operator/v1.12.0/deploy/bundle.yaml

Create ClusterRole

The ClusterRole resource defines the permissions the user will have for a specific resource in Kubernetes. You can find the YAML in this GitHub repository.

    - apiGroups: ["pxc.percona.com"]
      resources: ["*"]
      verbs: ["*"]
    - apiGroups: [""]
      resources:
      - pods
      - pods/exec
      - pods/log
      - configmaps
      - services
      - persistentvolumeclaims
      - secrets
      verbs:
      - get
      - list
      - watch

As you can see, we allow any operations for pxc.percona.com resources but restrict others to get, list, and watch.

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/rbac-operators/clusterrole.yaml

Create ServiceAccount

We are going to generate a kubeconfig for this service account. This is what the user is going to use to connect to the Kubernetes API.

apiVersion: v1
kind: ServiceAccount
metadata:
    name: database-manager
    namespace: prod-dbs

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/rbac-operators/serviceaccount.yaml

Create ClusterRoleBinding

We need to assign the ClusterRole to the ServiceAccount. A ClusterRoleBinding acts as the relation between these two.

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/rbac-operators/clusterrolebinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
    name: percona-database-manager-bind
    namespace: prod-dbs
roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: percona-pxc-rbac
subjects:
    - kind: ServiceAccount
      name: database-manager
      namespace: prod-dbs

Developer

Verify

The kubectl auth can-i command allows you to verify whether the service account can really do what we want it to do. Let’s try:

$ kubectl auth can-i create perconaxtradbclusters.pxc.percona.com --as=system:serviceaccount:prod-dbs:database-manager
yes

$ kubectl auth can-i delete service --as=system:serviceaccount:prod-dbs:database-manager
no

All good. I can create Percona XtraDB Clusters, but I can’t delete Services. Please note that sometimes it might be useful to allow Developers to delete Pods to force cluster recovery. If you feel that it is needed, please modify the ClusterRole.

Apply

There are multiple ways to use this service account. You can read more about it in Kubernetes documentation. For a quick demonstration, we are going to generate a kubeconfig that we can share with our user. 

Run this script to generate the config. What it does:

  1. Gets the service account secret resource name
  2. Extracts Certificate Authority (ca.crt) contents from the secret
  3. Extracts the token from the secret
  4. Gets the endpoint of the Kubernetes API
  5. Generates the kubeconfig using all of the above
$ bash generate-kubeconfig.sh > tmp-kube.config
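For reference, a minimal sketch of what such a script might do is shown below. This is not the exact script from the repository: it assumes the ServiceAccount still has an auto-created token Secret (the behavior before Kubernetes 1.24) and that your current kubeconfig already points to the right cluster.

#!/bin/bash
# Sketch of generate-kubeconfig.sh for the database-manager ServiceAccount
NAMESPACE=prod-dbs
SA=database-manager

# 1. Get the service account secret resource name
SECRET=$(kubectl -n "$NAMESPACE" get serviceaccount "$SA" -o jsonpath='{.secrets[0].name}')

# 2. Extract the Certificate Authority (already base64-encoded in the Secret)
CA=$(kubectl -n "$NAMESPACE" get secret "$SECRET" -o jsonpath='{.data.ca\.crt}')

# 3. Extract and decode the token
TOKEN=$(kubectl -n "$NAMESPACE" get secret "$SECRET" -o jsonpath='{.data.token}' | base64 --decode)

# 4. Get the endpoint of the Kubernetes API from the current kubeconfig
SERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')

# 5. Generate the kubeconfig using all of the above
cat <<EOF
apiVersion: v1
kind: Config
clusters:
- name: cluster
  cluster:
    certificate-authority-data: ${CA}
    server: ${SERVER}
contexts:
- name: ${SA}@cluster
  context:
    cluster: cluster
    namespace: ${NAMESPACE}
    user: ${SA}
current-context: ${SA}@cluster
users:
- name: ${SA}
  user:
    token: ${TOKEN}
EOF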

Now let’s see if it works as expected:

$ KUBECONFIG=tmp-kube.config kubectl -n prod-dbs apply -f https://raw.githubusercontent.com/percona/percona-xtradb-cluster-operator/v1.12.0/deploy/cr.yaml
perconaxtradbcluster.pxc.percona.com/cluster1 created

$ KUBECONFIG=tmp-kube.config kubectl -n prod-dbs get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   0          15m
cluster1-haproxy-1                                 2/2     Running   0          14m
cluster1-haproxy-2                                 2/2     Running   0          14m
cluster1-pxc-0                                     3/3     Running   0          15m
cluster1-pxc-1                                     3/3     Running   0          14m
cluster1-pxc-2                                     3/3     Running   0          13m
percona-xtradb-cluster-operator-77bf8b9df5-qglsg   1/1     Running   0          52m

I was able to create the Custom Resource and a cluster. Let’s try to delete the Pod:

$ KUBECONFIG=tmp-kube.config kubectl -n prod-dbs delete pods cluster1-haproxy-0
Error from server (Forbidden): pods "cluster1-haproxy-0" is forbidden: User "system:serviceaccount:prod-dbs:database-manager" cannot delete resource "pods" in API group "" in the namespace "prod-dbs"

What about the Custom Resource?

$ KUBECONFIG=tmp-kube.config kubectl -n prod-dbs delete pxc cluster1
perconaxtradbcluster.pxc.percona.com "cluster1" deleted

Conclusion

Percona Operators automate the deployment and management of databases in Kubernetes. The least privilege principle should be applied to minimize the ability of inexperienced users to affect availability and data integrity. Kubernetes comes with sophisticated Role-Based Access Control capabilities, which allow you to do just that without the need to reinvent it in your platform or application.

Try Percona Operators

Nov 25, 2022

Percona Operator for MongoDB Backup and Restore on S3-Compatible Storage – Backblaze

Percona Operator for MongoDB Backblaze

One of the main features that I like about the Percona Operator for MongoDB is the integration with the Percona Backup for MongoDB (PBM) tool and the ability to backup/restore the database without manual intervention. The Operator allows backing up the DB to S3-compatible cloud storage, and so you can use AWS, Azure, etc.

One of our customers asked about the integration between Backblaze and the Operator for backup and restore purposes. I checked it out and found that it is S3-compatible and provides a free account with 10 GB of cloud storage, so I jumped into testing it with our Operator. I also saw in our forum that a few users are using Backblaze cloud storage, so I’m writing this blog post for everyone who wants to test or use Backblaze S3-compatible cloud storage with our Operator and PBM.

S3-compatible storage configuration

The Operator supports backups to S3-compatible storage. The steps for backing up to AWS or Azure Blob Storage are given here, so you can try that as well. In this blog, let me focus on configuring B2 cloud storage as the backup location and restoring from it to another deployment.

Let’s configure the Percona Server for MongoDB (PSMDB) Sharded Cluster using the Operator (with minimal config as explained here). I have used PSMDB operator v1.12.0 and PBM 1.8.1 for the test below. You can sign up for the free account here – https://www.backblaze.com/b2/cloud-storage-b.html. Then log in to your account. You can first create a key pair to access the storage from your operator as follows in the “App Keys” tab:

Add Application Key GUI

 

Then create a bucket with your desired name and note down the S3-compatible storage details, like the bucket name (shown in the picture below) and the endpointUrl to which the backup files will be sent. The endpointUrl details can be obtained from the provider, and the region is specified in the prefix of the endpointUrl value.

Create Bucket

 

Deploy the cluster

Now let’s download the Operator from GitHub (I used v1.12.0) and configure the files for deploying the MongoDB sharded cluster. Here, I am using cr-minimal.yaml for deploying a very minimal setup of single member replicaset for a shard, config db, and a mongos.

#using an alias for the kubectl command
$ alias "k=kubectl"
$ cd percona-server-mongodb-operator

# Add a backup section in the cr file as shown below. Use the appropriate values from your setup
$ cat deploy/cr-minimal.yaml
apiVersion: psmdb.percona.com/v1-12-0
kind: PerconaServerMongoDB
metadata:
  name: minimal-cluster
spec:
  crVersion: 1.12.0
  image: percona/percona-server-mongodb:5.0.7-6
  allowUnsafeConfigurations: true
  upgradeOptions:
    apply: 5.0-recommended
    schedule: "0 2 * * *"
  secrets:
    users: minimal-cluster
  replsets:
  - name: rs0
    size: 1
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 3Gi
  sharding:
    enabled: true
    configsvrReplSet:
      size: 1
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi
    mongos:
      size: 1

  backup:
    enabled: true
    image: percona/percona-backup-mongodb:1.8.1
    serviceAccountName: percona-server-mongodb-operator
    pitr:
      enabled: false
      compressionType: gzip
      compressionLevel: 6
    storages:
      s3-us-west:
        type: s3
        s3:
          bucket: psmdbbackupBlaze
          credentialsSecret: my-cluster-name-backup-s3
          region: us-west-004
          endpointUrl: https://s3.us-west-004.backblazeb2.com/
#          prefix: ""
#          uploadPartSize: 10485760
#          maxUploadParts: 10000
#          storageClass: STANDARD
#          insecureSkipTLSVerify: false

 

The backup-s3.yaml file contains the key details to access the B2 cloud storage. Encode the Key ID and access key (retrieved from Backblaze as mentioned above) as follows to use them inside the backup-s3.yaml file. The Secret name my-cluster-name-backup-s3 should be unique; it is the name used to refer to this Secret from the other YAML files:

# First use base64 to encode your keyID and application key
# (use -n so the trailing newline is not encoded into the value):
$ echo -n "key-sample" | base64 --wrap=0
XXXX==
$ echo -n "access-key-sample" | base64 --wrap=0
XXXXYYZZ==

$ cat deploy/backup-s3.yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-name-backup-s3
type: Opaque
data:
 AWS_ACCESS_KEY_ID: XXXX==
 AWS_SECRET_ACCESS_KEY: XXXXYYZZ==
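Alternatively, kubectl can create an equivalent secret directly and handle the base64 encoding for you:

$ kubectl create secret generic my-cluster-name-backup-s3 \
    --from-literal=AWS_ACCESS_KEY_ID='<keyID>' \
    --from-literal=AWS_SECRET_ACCESS_KEY='<applicationKey>'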

 

Then deploy the Operator and the cluster as shown below, and apply backup-s3.yaml as well.

$ k apply -f ./deploy/bundle.yaml
customresourcedefinition.apiextensions.k8s.io/perconaservermongodbs.psmdb.percona.com created
customresourcedefinition.apiextensions.k8s.io/perconaservermongodbbackups.psmdb.percona.com created
customresourcedefinition.apiextensions.k8s.io/perconaservermongodbrestores.psmdb.percona.com created
role.rbac.authorization.k8s.io/percona-server-mongodb-operator created
serviceaccount/percona-server-mongodb-operator created
rolebinding.rbac.authorization.k8s.io/service-account-percona-server-mongodb-operator created
deployment.apps/percona-server-mongodb-operator created

$ k apply -f ./deploy/cr-minimal.yaml
perconaservermongodb.psmdb.percona.com/minimal-cluster created

$ k apply -f ./deploy/backup-s3.yaml 
secret/my-cluster-name-backup-s3 created

After starting the Operator and applying the yaml files, the setup looks like this:

$ k get pods
NAME                                               READY   STATUS    RESTARTS   AGE
minimal-cluster-cfg-0                              2/2     Running   0          39m
minimal-cluster-mongos-0                           1/1     Running   0          70m
minimal-cluster-rs0-0                              2/2     Running   0          38m
percona-server-mongodb-operator-665cd69f9b-44tq5   1/1     Running   0          74m

$ k get svc
NAME                     TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)     AGE
kubernetes               ClusterIP   10.96.0.1     <none>        443/TCP     76m
minimal-cluster-cfg      ClusterIP   None          <none>        27017/TCP   72m
minimal-cluster-mongos   ClusterIP   10.100.7.70   <none>        27017/TCP   72m
minimal-cluster-rs0      ClusterIP   None          <none>        27017/TCP   72m

 

Backup

After deploying the cluster, the database is ready for backup at any time. Besides scheduled backups, you can create a backup-custom.yaml file to take a backup whenever you need it (provide a unique backup name each time, otherwise the new backup will not start). Our backup yaml file looks like the one below:

$ cat deploy/backup/backup-custom.yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  finalizers:
  - delete-backup
  name: backup1
spec:
  clusterName: minimal-cluster
  storageName: s3-us-west
#  compressionType: gzip
#  compressionLevel: 6

Now load some data into the database and then start the backup.
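For example, a test document can be inserted through the mongos service with the same throwaway client that is used later in this post for verification (the root credentials come from the cluster users secret; adjust them if yours differ):

$ kubectl run -i --rm --tty mongo-client --image=mongo:5.0.7 --restart=Never -- bash -c "mongosh --host=minimal-cluster-mongos --username=root --password=password --authenticationDatabase=admin --eval \"db.getSiblingDB('vinodh').testData.insert({id:1})\" --quiet "

With the data in place, start the backup: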

$ k apply -f deploy/backup/backup-custom.yaml 
perconaservermongodbbackup.psmdb.percona.com/backup1 configured

The backup progress looks like this:

$ k get perconaservermongodbbackup.psmdb.percona.com
NAME      CLUSTER           STORAGE      DESTINATION            STATUS      COMPLETED   AGE
backup1   minimal-cluster   s3-us-west   2022-09-08T03:21:58Z   requested               43s
$ k get perconaservermongodbbackup.psmdb.percona.com
NAME      CLUSTER           STORAGE      DESTINATION            STATUS      COMPLETED   AGE
backup1   minimal-cluster   s3-us-west   2022-09-08T03:22:19Z   requested               46s
$ k get perconaservermongodbbackup.psmdb.percona.com
NAME      CLUSTER           STORAGE      DESTINATION            STATUS    COMPLETED   AGE
backup1   minimal-cluster   s3-us-west   2022-09-08T03:22:19Z   running               49s

Here, if you have any issues with the backup, you can view the backup logs from the backup-agent sidecar as follows:

$ k logs pod/minimal-cluster-rs0-0 -c backup-agent
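The pbm CLI inside the same sidecar can also report the overall backup state. This is a quick check, assuming the Operator-provided backup-agent container where the PBM connection URI is already configured:

$ k exec minimal-cluster-rs0-0 -c backup-agent -- pbm status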

To start another backup, edit backup-custom.yaml and change the backup name (here, name: backup2), then apply it:

$ k apply -f deploy/backup/backup-custom.yaml 
perconaservermongodbbackup.psmdb.percona.com/backup2 configured 

Monitor the backup process (you can use the -w option to watch the progress continuously). The status should eventually show ready:

$ k get perconaservermongodbbackup.psmdb.percona.com -w
NAME      CLUSTER           STORAGE      DESTINATION            STATUS   COMPLETED   AGE
backup1   minimal-cluster   s3-us-west   2022-09-08T03:22:19Z   ready    12m         14m
backup2   minimal-cluster   s3-us-west                                               8s
backup2   minimal-cluster   s3-us-west   2022-09-08T03:35:56Z   requested               21s
backup2   minimal-cluster   s3-us-west   2022-09-08T03:35:56Z   running                 26s
backup2   minimal-cluster   s3-us-west   2022-09-08T03:35:56Z   ready       0s          41s

In the bucket on Backblaze, the uploaded backup files are now listed:

Backup Files in B2 cloud Storage

 

Restore

You can restore the cluster from a backup into another, similar deployment or into the same cluster. List the backups and restore one of them as follows. The restore-custom.yaml configuration holds the information about which backup to restore. If you are restoring into another deployment, you can also include the backupSource section (commented out below for reference), from which the restore process finds the source of the backup. In that case, make sure you create the my-cluster-name-backup-s3 secret on that cluster as well, so it can access the backup.

$ cat deploy/backup/restore-custom.yaml 
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore2
spec:
  clusterName: minimal-cluster
  backupName: backup2
#  pitr:
#    type: date
#    date: YYYY-MM-DD HH:MM:SS
#  backupSource:
#    destination: s3://S3-BACKUP-BUCKET-NAME-HERE/BACKUP-DESTINATION
#    s3:
#      credentialsSecret: my-cluster-name-backup-s3
#      region: us-west-004
#      bucket: S3-BACKUP-BUCKET-NAME-HERE
#      endpointUrl: https://s3.us-west-004.backblazeb2.com/
#      prefix: ""
#    azure:
#      credentialsSecret: SECRET-NAME
#      prefix: PREFIX-NAME
#      container: CONTAINER-NAME

Listing the backup:

$ k get psmdb-backup
NAME      CLUSTER           STORAGE      DESTINATION            STATUS   COMPLETED   AGE
backup1   minimal-cluster   s3-us-west   2022-09-08T03:22:19Z   ready    3h5m        3h6m
backup2   minimal-cluster   s3-us-west   2022-09-08T03:35:56Z   ready    171m        172m
backup3   minimal-cluster   s3-us-west   2022-09-08T04:16:39Z   ready    130m        131m
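If you restore into a separate deployment using the backupSource section, the destination can be read from the backup object on the source cluster; for example, for backup2 (the value matches the DESTINATION column above and is combined with the bucket name in backupSource.destination):

$ k get psmdb-backup backup2 -o jsonpath='{.status.destination}'
2022-09-08T03:35:56Z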

 

To verify the restore process, I write some data into the collection vinodh.testData after the backup and before the restore. The newly inserted document should therefore be gone after the restore:

# Using mongosh from the mongo container to see the data
# Listing data from collection vinodh.testData
$ kubectl run -i --rm --tty mongo-client --image=mongo:5.0.7 --restart=Never -- bash -c "mongosh --host=10.96.30.92 --username=root --password=password --authenticationDatabase=admin --eval \"db.getSiblingDB('vinodh').testData.find()\" --quiet "
If you don't see a command prompt, try pressing enter.
[ { _id: ObjectId("631956cc70e60e9ed3ecf76d"), id: 1 } ]
pod "mongo-client" deleted

Inserting a document into it:

$ kubectl run -i --rm --tty mongo-client --image=mongo:5.0.7 --restart=Never -- bash -c "mongosh --host=10.96.30.92 --username=root --password=password --authenticationDatabase=admin --eval \"db.getSiblingDB('vinodh').testData.insert({id:2})\" --quiet "
If you don't see a command prompt, try pressing enter.
DeprecationWarning: Collection.insert() is deprecated. Use insertOne, insertMany, or bulkWrite.
{
  acknowledged: true,
  insertedIds: { '0': ObjectId("631980fe07180f860bd22534") }
}
pod "mongo-client" delete

Listing it again to verify:

$ kubectl run -i --rm --tty mongo-client --image=mongo:5.0.7 --restart=Never -- bash -c "mongosh --host=10.96.30.92 --username=root --password=password --authenticationDatabase=admin --eval \"db.getSiblingDB('vinodh').testData.find()\" --quiet "
If you don't see a command prompt, try pressing enter.
[
  { _id: ObjectId("631956cc70e60e9ed3ecf76d"), id: 1 },
  { _id: ObjectId("631980fe07180f860bd22534"), id: 2 }
]
pod "mongo-client" deleted

Running restore as follows:

$ k apply -f deploy/backup/restore-custom.yaml
perconaservermongodbrestore.psmdb.percona.com/restore2 created
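Like backups, the restore object exposes a status that you can watch until it turns ready (psmdb-restore is the short name for perconaservermongodbrestores):

$ k get psmdb-restore -w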

Now check the data in the vinodh.testData collection again to verify that the restore completed properly. The output below confirms that the collection was restored from the backup, as only the record that existed at backup time is listed:

$ kubectl run -i --rm --tty mongo-client --image=mongo:5.0.7 --restart=Never -- bash -c "mongosh --host=minimal-cluster-mongos --username=root --password=password --authenticationDatabase=admin --eval \"db.getSiblingDB('vinodh').testData.find()\" --quiet "
If you don't see a command prompt, try pressing enter.
[ { _id: ObjectId("631956cc70e60e9ed3ecf76d"), id: 1 } ]

 

Hope this helps you! You can now try the same on your end and use Backblaze in production if it suits your requirements. I haven't tested the network performance yet. If you have used Backblaze or similar S3-compatible storage for backups, please share your experience with us in the comments.


Nov
22
2022
--

Troubleshooting Percona Operators With troubleshoot.sh

Percona loves and embraces Kubernetes. Percona Operators have found their way into the hearts and minds of developers and operations teams. And with growing adoption, we get a lot of valuable feedback from the community and our support organization. Two of the most common questions that we hear are:

  1. What are the best practices to run Operators?
  2. Where do I look if there is a problem? In other words, how do I troubleshoot the issues?

In this blog post, we will experiment with troubleshoot.sh, an open source tool developed by Replicated, and see if it can help with our problems. 

Prepare

Installation

Troubleshoot.sh is a collection of kubectl plugins. Installation is dead simple and described in the documentation.
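If you already use the krew plugin manager, both plugins are one command away (the troubleshoot.sh documentation also offers standalone install scripts):

kubectl krew install preflight
kubectl krew install support-bundle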

Concepts

There are two basic use cases:

  1. Preflight – check if your cluster is ready and compliant to run an application, so it should cover the best practices question
  2. Support bundle – check if the application is running and behaving appropriately, it clearly helps with troubleshooting

Three functional concepts in the tool that you should know about:

  1. Collectors – where you can define what information you want to collect. A lot of basic information is collected about the cluster by default. 
  2. Redact – if there is sensitive information, you can redact it from the bundle; for example, AWS credentials that end up in the logs of your application.
  3. Analyze – act on collected information and provide actionable insights. 

 

Preflight

Checks are defined through YAML files. I have created an example to check if the Kubernetes cluster is ready to run Percona Operator for MySQL (based on Percona XtraDB Cluster). This checks the following:

  1. Kubernetes version and flavor – as we only test our releases on specific versions and providers
  2. Default storage class – it is possible to use local storage, but by default, our Operator uses Persistent Volume Claims
  3. Node size – we check if nodes have at least 1 GB of RAM, otherwise, Percona XtraDB Cluster might fail to start

Run the check with the following command:

kubectl preflight https://raw.githubusercontent.com/spron-in/blog-data/master/percona-troubleshoot/pxc-op-1.11-preflight.yaml

It is just an example. Preflight checks are very flexible and you can read more in the documentation.
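To give a feel for the format, here is a trimmed-down sketch limited to the version and node-memory checks; the analyzer names come from the troubleshoot.sh documentation, while the thresholds and messages here are only illustrative (the linked file has the full set of checks):

apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: pxc-operator-preflight
spec:
  analyzers:
    - clusterVersion:
        outcomes:
          - fail:
              when: "< 1.21.0"
              message: This Kubernetes version is older than the ones the Operator is tested on.
          - pass:
              message: Your Kubernetes version is recent enough.
    - nodeResources:
        checkName: Node memory
        outcomes:
          - fail:
              when: "min(memoryCapacity) < 1Gi"
              message: Every node must have at least 1 GB of RAM.
          - pass:
              message: All nodes have at least 1 GB of RAM.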

Support bundle

Support bundles can be used by support organizations or by users to troubleshoot issues with applications on Kubernetes. Once you run the command, it also creates a tar.gz archive with various data points that can be analyzed later or shared with experts.

Similarly to Preflight, support bundle checks are defined in YAML files. See the example that I created here and run it with the following command:

kubectl support-bundle https://raw.githubusercontent.com/spron-in/blog-data/master/percona-troubleshoot/pxc-op-1.11-support.yaml

Check cluster state

First, we are going to check the presence of Custom Resource Definition and the health of the cluster. In my YAML the name of the cluster is hard coded to minimal-cluster (it can be deployed from our repo).

We are using Custom Resource fields to verify the statuses of Percona XtraDB Cluster, HAProxy, and ProxySQL. This is the example checking HAProxy:

 - yamlCompare:
        checkName: Percona Operator for MySQL - HAProxy
        fileName: cluster-resources/custom-resources/perconaxtradbclusters.pxc.percona.com/default.yaml
        path: "[0].status.haproxy.status"
        value: ready
        outcomes:
          - fail:
              when: "false"
              message: HAProxy is not ready
          - pass:
              when: "true"
              message: HAProxy is ready

perconaxtradbclusters.pxc.percona.com/default.yaml has all pxc Custom Resources in the default namespace. I know I have only one there, so I just parse the whole YAML with the yamlCompare method.

Verify Pods statuses

If you see that previous checks failed, it is time to check the Pods. In the current implementation, you can get the statuses of the Pods in various namespaces. 

- clusterPodStatuses:
        name: unhealthy
        namespaces:
          - default
        outcomes:
…
          - fail:
              when: "== Pending"
              message: Pod {{ .Namespace }}/{{ .Name }} is in a Pending state with status of {{ .Status.Reason }}.

Parse logs for known issues

Parsing logs of the application can be tedious. But what if you can predefine all known errors and quickly learn about the issue without going into your central logging system? Troubleshoot.sh can do it as well.

First, we define the collectors. We will get the logs from the Operator and Percona XtraDB Cluster itself:

collectors:
    - logs:
        selector:
          - app.kubernetes.io/instance=minimal-cluster
          - app.kubernetes.io/component=pxc
        namespace: default
        name: pxc/container/logs
        limits:
          maxAge: 24h
    - logs:
        selector:
          - app.kubernetes.io/instance=percona-xtradb-cluster-operator
          - app.kubernetes.io/component=operator
        namespace: default
        name: pxc-operator/container/logs
        limits:
          maxAge: 24h

Now the logs are in our support bundle. We can analyze them manually or catch some known messages with a regular expression analyzer:

    - textAnalyze:
        checkName: Failed to update the password
        fileName: pxc-operator/container/logs/percona-xtradb-cluster-operator-.*.log
        ignoreIfNoFiles: true
        regex: 'Error 1396: Operation ALTER USER failed for'
        outcomes:
          - pass:
              when: "false"
              message: "No failures on password change"
          - fail:
              when: "true"
              message: "There was a failure to change the system user password. For more details: https://docs.percona.com/percona-operator-for-mysql/pxc/users.html"

In this example, we are checking for a message in the Operator log that indicates the system user password change failed.

Check for security best practices

Besides dealing with operational issues, troubleshoot.sh can help with compliance checks. You should have security guardrails in place before you deploy to Kubernetes, but if something slips through, you will have a second line of defense.

In our example, we check for weak passwords and whether the StatefulSet matches high-availability best practices.

Collecting and sharing passwords is not recommended, but it can still be a valuable addition to your security policies. To capture secrets, you need to explicitly define a collector:

spec:
  collectors:
    - secret:
        namespace: default
        name: internal-minimal-cluster
        includeValue: true
        key: root

Replay support bundle with sbctl

sbctl is an awesome addition to troubleshoot.sh. It allows users and support engineers to examine support bundles as if they were interacting with a live Kubernetes cluster.
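A minimal session sketch, assuming the support bundle archive is on disk (the flag name may differ between sbctl versions, so check sbctl --help):

sbctl shell --support-bundle-location=support-bundle.tar.gz
# inside the shell, kubectl reads from the bundle instead of a live cluster
kubectl get pods -n default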

What’s next

The tool is new and there are certain limitations. To list a few:

  1. For now, all checks are hard coded. This calls for a templating mechanism or a simple generator of YAML files.
  2. It is not possible to add sophisticated logic to analyzers, for example, to link one analyzer to another.
  3. It would be great to apply targeted filtering, for example, gathering information for only one Custom Resource instead of all of them. This can be useful for targeted troubleshooting.

We are going to work with the community and see how Percona can contribute to this tool, as it can help our users to standardize the troubleshooting of our products. Please let us know about your use cases or stories around debugging Percona products on Kubernetes in the comments.

