Jan 24, 2022

Percona Distribution for MongoDB Operator with Local Storage and OpenEBS


Automating the deployment and management of MongoDB on Kubernetes is an easy journey with Percona Operator. By default, MongoDB is deployed using persistent volume claims (PVC). In the cases where you seek exceptional performance or you don’t have any external block storage, it is also possible to use local storage. Usually, it makes sense to use local NVMe SSD for better performance (for example Amazon’s i3 and i4i instance families come with local SSDs).

With PVCs, migrating the container from one Kubernetes node to another is straightforward and does not require any manual steps, whereas local storage comes with certain caveats. OpenEBS allows you to simplify local storage management on Kubernetes. In this blog post, we will show you how to deploy MongoDB with Percona Operator and leverage OpenEBS for local storage.


Set-Up

Install OpenEBS

We are going to deploy OpenEBS with a helm chart. Refer to OpenEBS documentation for more details.

helm repo add openebs https://openebs.github.io/charts
helm repo update
helm install openebs --namespace openebs openebs/openebs --create-namespace
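
Before moving on, you can check that the OpenEBS control plane Pods are up and running (Pod names vary with the chart version):

kubectl get pods -n openebs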

This is going to install OpenEBS along with the openebs-hostpath storageClass:

kubectl get sc
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
…
openebs-hostpath     openebs.io/local        Delete          WaitForFirstConsumer   false                  71s

Deploy MongoDB Cluster

We will use a helm chart for it as well, following this document:

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

Install the Operator:

helm install my-op percona/psmdb-operator

Now deploy the database using local storage. For simplicity, we will disable sharding in this demo:

helm install mytest percona/psmdb-db --set sharding.enabled=false \
--set "replsets[0].volumeSpec.pvc.storageClassName=openebs-hostpath" \
--set "replsets[0].volumeSpec.pvc.resources.requests.storage=3Gi" \
--set "replsets[0].name=rs0" --set "replsets[0].size=3"

As a result, we should have a replica set with three nodes using the openebs-hostpath storageClass:

$ kubectl get pods
NAME                                    READY   STATUS    RESTARTS   AGE
my-op-psmdb-operator-58c74cbd44-stxqq   1/1     Running   0          5m56s
mytest-psmdb-db-rs0-0                   2/2     Running   0          3m58s
mytest-psmdb-db-rs0-1                   2/2     Running   0          3m32s
mytest-psmdb-db-rs0-2                   2/2     Running   0          3m1s

$ kubectl get pvc
NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
mongod-data-mytest-psmdb-db-rs0-0   Bound    pvc-63d3b722-4b31-42ab-b4c3-17d8734c92a3   3Gi        RWO            openebs-hostpath   4m2s
mongod-data-mytest-psmdb-db-rs0-1   Bound    pvc-2bf6d908-b3c0-424c-9ccd-c3be3295da3a   3Gi        RWO            openebs-hostpath   3m36s
mongod-data-mytest-psmdb-db-rs0-2   Bound    pvc-9fa3e21e-bfe2-48de-8bba-0dae83b6921f   3Gi        RWO            openebs-hostpath   3m5s
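
If you are curious where the data actually lives, the OpenEBS hostpath provisioner creates a directory per volume on the node, under /var/openebs/local by default (the base path is configurable in the storage class). Assuming the Local PV hostpath flavor shown above, the path and the node pinning are recorded in the PersistentVolume itself (PV name taken from the output above):

kubectl get pv pvc-63d3b722-4b31-42ab-b4c3-17d8734c92a3 \
  -o jsonpath='{.spec.local.path}{"\n"}{.spec.nodeAffinity}{"\n"}'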

Local Storage Caveats

Local storage lives on the node itself, which means that anything that happens to the node also affects the data. We will review several common situations and how they impact Percona Server for MongoDB on Kubernetes with local storage.

Node Restart

Suppose something happens to a Kubernetes node: a server reboot, a virtual machine crash, etc. The node is not lost, just restarted. Let’s see what happens in this case with our MongoDB cluster.

I will restart one of my Kubernetes nodes. As a result, the Pod will go into a Pending state:

$ kubectl get pods
NAME                                    READY   STATUS    RESTARTS   AGE
my-op-psmdb-operator-58c74cbd44-stxqq   1/1     Running   0          58m
mytest-psmdb-db-rs0-0                   2/2     Running   0          56m
mytest-psmdb-db-rs0-1                   0/2     Pending   0          67s
mytest-psmdb-db-rs0-2                   2/2     Running   2          55m

In normal circumstances, the Pod would be rescheduled to another node, but that is not happening here. The reason is local storage and the resulting volume node affinity rules. If you run kubectl describe pod mytest-psmdb-db-rs0-1, you will see something like this:

Events:
  Type     Reason             Age                From                Message
  ----     ------             ----               ----                -------
  Warning  FailedScheduling   72s (x2 over 73s)  default-scheduler   0/3 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict.
  Normal   NotTriggerScaleUp  70s                cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) had volume node affinity conflict

As you can see, the cluster is not scaled up, because this Pod needs the specific node that holds its storage. We can see this binding as an annotation on the PVC itself:

$ kubectl describe pvc mongod-data-mytest-psmdb-db-rs0-1
Name:          mongod-data-mytest-psmdb-db-rs0-1
Namespace:     default
StorageClass:  openebs-hostpath
…
Annotations:   pv.kubernetes.io/bind-completed: yes
…
               volume.kubernetes.io/selected-node: gke-sergey-235-default-pool-9f5f2e2b-4jv3
…
Used By:       mytest-psmdb-db-rs0-1

In other words, this Pod will wait for the node to come back. Until it does, your MongoDB cluster will run in a degraded state with two replica set members out of three. Keep this in mind when you perform maintenance or experience a Kubernetes node crash. With network-attached PVCs, this MongoDB Pod would have been rescheduled to a new node right away.
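
You can also watch the degraded state at the cluster level through the operator's custom resource (assuming the psmdb short name registered by the Percona operator) and keep an eye on the Pod once the node returns:

# Cluster status as reported by the operator
kubectl get psmdb
# Watch the Pending Pod start once the node is back
kubectl get pods -w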

Graceful Migration to Another Node

Let’s see what is the best way to migrate one MongoDB replica set Pod from one node to another when local storage is used. There can be multiple reasons – node maintenance, migration to another rack, datacenter, or newer hardware. We want to perform such a migration with no downtime and minimal performance impact on the database.

Firstly, we will add more nodes to the replica set by scaling up the cluster. We will use helm again and change the size from three to five:

helm upgrade mytest percona/psmdb-db --set sharding.enabled=false \
--set "replsets[0].volumeSpec.pvc.storageClassName=openebs-hostpath" \
--set "replsets[0].volumeSpec.pvc.resources.requests.storage=3Gi" \
--set "replsets[0].name=rs0" --set "replsets[0].size=5"

This will create two more Pods in the replica set, both of them using openebs-hostpath storage as well. By default, our affinity rules require replica set members to run on different Kubernetes nodes, so either enable auto-scaling or make sure you have enough nodes in your cluster. We add the extra members first to avoid a performance impact on the database.

Once all five replica set members are up, we will drain the Kubernetes node we want to free up. This gracefully evicts all the Pods from it.

kubectl drain gke-sergey-235-default-pool-9f5f2e2b-rtcz --ignore-daemonsets

As with the node restart described in the previous section, the replica set Pod will be stuck in Pending status, waiting for its local storage.

kubectl get pods
NAME                                    READY   STATUS    RESTARTS   AGE
…
mytest-psmdb-db-rs0-2                   0/2     Pending   0          65s

This time the storage is not coming back, because we are moving the Pod off this node for good. To resolve this, we need to delete the PVC and then delete the Pod:

kubectl delete pvc mongod-data-mytest-psmdb-db-rs0-2
persistentvolumeclaim "mongod-data-mytest-psmdb-db-rs0-2" deleted


kubectl delete pod mytest-psmdb-db-rs0-2
pod "mytest-psmdb-db-rs0-2" deleted

This will trigger the creation of a new PVC and a Pod on another node:

NAME                                    READY   STATUS    RESTARTS   AGE
…
mytest-psmdb-db-rs0-2                   2/2     Running   2          1m

Again, all five replica set Pods are up and running. You can now perform the maintenance on your Kubernetes node.
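
Once the maintenance is complete, do not forget to make the node schedulable again (node name taken from the drain command above):

kubectl uncordon gke-sergey-235-default-pool-9f5f2e2b-rtcz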

What is left is to scale the replica set back down to three members:

helm upgrade mytest percona/psmdb-db --set sharding.enabled=false \
--set "replsets[0].volumeSpec.pvc.storageClassName=openebs-hostpath" \
--set "replsets[0].volumeSpec.pvc.resources.requests.storage=3Gi" \
--set "replsets[0].name=rs0" --set "replsets[0].size=3"

Node Loss

When the Kubernetes node is dead and there is no chance for it to recover, we face the same situation as with the graceful migration: the Pod is stuck in Pending status, waiting for a node that is never coming back. The recovery path is the same (a command sketch follows the list):

  1. Delete the Persistent Volume Claim (PVC)
  2. Delete the Pod
  3. The Pod will start on another node and sync the data to a new local volume
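
A minimal sketch of that recovery path, reusing the object names from the graceful migration example above (adjust them to whichever replica set member is affected):

# Remove the PVC that is pinned to the dead node, then the stuck Pod
kubectl delete pvc mongod-data-mytest-psmdb-db-rs0-2
kubectl delete pod mytest-psmdb-db-rs0-2
# Watch the new Pod come up on a healthy node and perform its initial sync
kubectl get pods -w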

 

Conclusion

Local storage can boost your database performance and remove the need for external block storage completely. This can also lower your public cloud provider bill. In this blog post, we saw that these benefits come with a higher maintenance cost, although that cost can also be automated.

We encourage you to try out Percona Distribution for MongoDB Operator with local storage and share your results on our community forum.

There is always room for improvement and a time to find a better way. Please let us know if you face any issues with contributing your ideas to Percona products. You can do that on the Community Forum or JIRA. Read more about contribution guidelines for Percona Distribution for MongoDB Operator in CONTRIBUTING.md.

Sep 26, 2018

Scaling IO-Bound Workloads for MySQL in the Cloud – part 2


This post is a follow-up to my previous article https://www.percona.com/blog/2018/08/29/scaling-io-bound-workloads-mysql-cloud/

In this instance, I want to show the data in different dimensions, primarily to answer questions around how throughput scales with increasing IOPS.

A recap: for the test I used Amazon instances with Amazon gp2 and io1 volumes. In addition to the original post, I also tested two gp2 volumes combined in software RAID0. I did this for the following reason: Amazon caps single gp2 volume throughput at 160MB/sec, and as we will see from the charts, this limits InnoDB performance.
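
For reference, here is a minimal sketch of combining two gp2 volumes into a software RAID0 array with mdadm; the device names are placeholders for the attached EBS volumes, not necessarily the exact setup used for these tests:

# /dev/nvme1n1 and /dev/nvme2n1 are hypothetical names of the two attached gp2 volumes
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
sudo mkfs.xfs /dev/md0
sudo mount /dev/md0 /var/lib/mysql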

Also, a reminder from the previous post: we can increase gp2 IOPS by increasing the volume size (up to the limit of 10,000 IOPS), while for io1 we can increase IOPS by paying for additional provisioned IOPS.
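
As a quick worked example of the gp2 sizing rule (at the time, gp2 provided a baseline of 3 IOPS per GB, capped at 10,000 IOPS):

1000GB * 3 IOPS/GB = 3,000 IOPS
3334GB * 3 IOPS/GB ≈ 10,000 IOPS (the cap; larger volumes add capacity, but no extra IOPS)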

Scaling with InnoDB

So for the first result, let’s see how InnoDB scales with increasing IOPS.

There are a few interesting observations here: InnoDB scales linearly with additional IOPS, but it faces a throughput limit that Amazon applies to volumes.

So besides considering IOPS, we should also take into account the maximal throughput of the volumes.

In the second chart we compare InnoDB performance vs the cost of volumes:

It’s interesting to see here the slope for gp2 volumes is steeper than for io1 volumes. This means we can get a bigger increase in InnoDB performance per dollar using gp2 volumes, but only until we reach the IOPS and throughput limits that are applied to gp2 volumes.

Scaling with MyRocks

And here is a similar chart, but for MyRocks:

Here we can also see that MyRocks scales linearly, showing identical results on gp2 and io1 volumes. This means that running on gp2 will be cheaper. Also, there is no plateau in throughput, as we saw for InnoDB, which means that MyRocks uses less IO throughput.

And the chart for the cost of running MyRocks:

This chart also shows that it is cheaper to run on a gp2 volume, but only while it provides enough IOPS. I assume that using two gp2 volumes would allow me to double the throughput. (I did not run the two-volume test for MyRocks.)

Conclusions

  • Both MyRocks and InnoDB can scale (linearly) with additional IOPS on gp2 and io1 Amazon volumes.
  • Take into account that IOPS is not the only factor to consider: there is also a throughput limit, which affects the InnoDB results, so for further scaling you might need to use multiple volumes.


Aug 01, 2018

Saving With MyRocks in The Cloud

The main focus of a previous blog post was the performance of MyRocks when using fast SSD devices. However, I figured that MyRocks would be beneficial for use in cloud workloads, where storage is either slow or expensive.

In that earlier post, we demonstrated the benefits of MyRocks, especially for heavy IO workloads. Meanwhile, Mark wrote in his blog that the CPU overhead in MyRocks might be significant for CPU-bound workloads, but this should not be an issue for IO-bound workloads.

In the cloud the cost of resources is a major consideration. Let’s review the annual cost for the processing and storage resources.

Resource                  Cost, $/year   IO cost, $/year   Total, $/year
c5.9xlarge                7881           -                 7881
1TB io1, 5000 IOPS        1500           3900              5400
1TB io1, 10000 IOPS       1500           7800              9300
1TB io1, 15000 IOPS       1500           11700             13200
1TB io1, 20000 IOPS       1500           15600             17100
1TB io1, 30000 IOPS       1500           23400             24900
3.4TB gp2 (10000 IOPS)    4800           -                 4800

The scenario

The server version is Percona Server 5.7.22.

For the instances, I used c5.9xlarge. The reason for choosing c5 is that it provides high-performance Nitro virtualization: Brendan Gregg describes this in his blog post. The rationale for the 9xlarge size was to be able to utilize io1 volumes with 30000 IOPS throughput – smaller instances cap io1 throughput at a lower level.

I also used a huge gp2 volume of 3400GB, since a volume of this size provides a guaranteed 10000 IOPS even without io1. This is a cheaper alternative to io1 volumes for achieving 10000 IOPS.

For the workload, I used sysbench-tpcc 5000W (50 tables * 100W), which for InnoDB resulted in about 471GB of used storage space.
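
For reference, a dataset like this can be prepared and run with the sysbench-tpcc scripts from Percona-Lab roughly as follows; the connection parameters, thread count, and run time are placeholders, and the exact flags may differ between script versions:

# scripts from https://github.com/Percona-Lab/sysbench-tpcc
./tpcc.lua --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
  --mysql-db=sbtest --db-driver=mysql --tables=50 --scale=100 prepare
./tpcc.lua --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
  --mysql-db=sbtest --db-driver=mysql --tables=50 --scale=100 \
  --threads=64 --time=3600 --report-interval=1 run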

For the cache, I used 27GB and 54GB buffer sizes, so the workload is IO-heavy.
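
A minimal sketch of the corresponding cache settings (the 27GB variant is shown; for InnoDB this is the buffer pool, while for MyRocks the closest equivalent is the RocksDB block cache). These are assumptions for illustration, not the exact my.cnf used in the benchmark:

[mysqld]
# InnoDB runs:
innodb_buffer_pool_size = 27G
# MyRocks runs:
rocksdb_block_cache_size = 27G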

I wanted to compare how InnoDB and RocksDB performed under this scenario.

If you are curious I prepared my terraform+ansible deployment files here: https://github.com/vadimtk/terraform-ansible-percona

Before jumping to the results, I should note that for MyRocks I used LZ4 compression for all levels, which brought the final data size to 91GB. That is about five times smaller than the InnoDB size. This alone provides operational benefits: for example, copying the InnoDB files (471GB) from a backup volume takes more than an hour, while for MyRocks it is about five times faster.
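
One way to request LZ4 on all levels in MyRocks is through the default column family options; this is a hedged sketch rather than the exact option string used for these runs:

[mysqld]
# Use LZ4 for every level, including the bottommost one
rocksdb_default_cf_options = compression=kLZ4Compression;bottommost_compression=kLZ4Compression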

The benchmark results

So let’s review the results.

[Chart: InnoDB versus MyRocks throughput in the cloud]

Or presenting average throughput in a tabular form:

Cache size, GB   IOPS         Engine    Avg TPS
27               5000         innodb    132.66
27               5000         rocksdb   481.03
27               10000        innodb    285.93
27               10000        rocksdb   1224.14
27               10000 (gp2)  innodb    227.19
27               10000 (gp2)  rocksdb   1268.89
27               15000        innodb    436.04
27               15000        rocksdb   1839.66
27               20000        innodb    584.69
27               20000        rocksdb   2336.94
27               30000        innodb    753.86
27               30000        rocksdb   2508.97
54               5000         innodb    197.51
54               5000         rocksdb   667.63
54               10000        innodb    433.99
54               10000        rocksdb   1600.01
54               10000 (gp2)  innodb    326.12
54               10000 (gp2)  rocksdb   1559.98
54               15000        innodb    661.34
54               15000        rocksdb   2176.83
54               20000        innodb    888.74
54               20000        rocksdb   2506.22
54               30000        innodb    1097.31
54               30000        rocksdb   2690.91

We can see that MyRocks outperformed InnoDB in every single combination, but it is also important to note the following:

MyRocks on io1 5000 IOPS showed the performance that InnoDB showed on io1 15000 IOPS.

That means InnoDB requires roughly three times more storage throughput to match. Looking at the storage cost table above, that corresponds to roughly three times more expensive IO ($11,700/year versus $3,900/year for the provisioned IOPS alone). And given that MyRocks also requires less storage space, it is possible to save even more on storage capacity.

On the most economical storage (3400GB gp2, which provides 10000 IOPS), MyRocks showed 4.7 times better throughput.

For the 30000 IOPS storage, MyRocks was still better by 2.45 times.

However, it is worth noting that MyRocks showed a greater variance in throughput during the runs. Let’s review the charts with 1-second resolution for gp2 and io1 30000 IOPS storage:

[Chart: throughput at 1-second resolution for gp2 and io1 30000 IOPS storage, MyRocks versus InnoDB]

Such variance might be problematic for workloads that require stable throughput and where periodical slowdowns are unacceptable.

Conclusion

MyRocks is suitable and beneficial not only for fast SSDs but also for cloud deployments. By requiring fewer IOPS, MyRocks can provide better performance and save on storage costs.

However, before evaluating MyRocks, make sure that your workload is IO-bound, i.e., the working set is much bigger than the available memory. For CPU-intensive workloads (where the working set fits into memory), MyRocks will be less beneficial or may even perform worse than InnoDB (as described in the blog post A Look at MyRocks Performance).
