Automating the deployment and management of MongoDB on Kubernetes is an easy journey with Percona Operator. By default, MongoDB is deployed using Persistent Volume Claims (PVCs) backed by your cluster’s default storage class, which in the cloud is usually network-attached block storage. In cases where you seek exceptional performance or you don’t have any external block storage, it is also possible to use local storage. Usually it makes sense to use local NVMe SSDs for better performance (for example, Amazon’s i3 and i4i instance families come with local SSDs).
With network-attached volumes, migrating a container from one Kubernetes node to another is straightforward and does not require any manual steps, whereas local storage comes with certain caveats. OpenEBS allows you to simplify local storage management on Kubernetes. In this blog post, we will show you how to deploy MongoDB with Percona Operator and leverage OpenEBS for local storage.
Set-Up
Install OpenEBS
We are going to deploy OpenEBS with a helm chart. Refer to the OpenEBS documentation for more details.
helm repo add openebs https://openebs.github.io/charts
helm repo update
helm install openebs --namespace openebs openebs/openebs --create-namespace
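Before moving on, it is worth checking that the OpenEBS control plane Pods came up (the exact Pod names depend on the chart version):
kubectl get pods -n openebs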
Along with the OpenEBS components, the chart installs the openebs-hostpath storageClass:
kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
…
openebs-hostpath openebs.io/local Delete WaitForFirstConsumer false 71s
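By default, the openebs-hostpath storageClass stores data under /var/openebs/local on each node. If your NVMe drives are mounted elsewhere, you can create your own hostpath storageClass pointing at that mount. The manifest below is a sketch based on the OpenEBS documentation; the name openebs-nvme and the path /mnt/nvme are illustrative only:
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-nvme
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
      - name: BasePath
        value: /mnt/nvme
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
EOF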
Deploy MongoDB Cluster
We will use a helm chart for it as well and follow this document.
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update
Install the Operator:
helm install my-op percona/psmdb-operator
Deploy the database using local storage. We will disable sharding for this demo for simplicity:
helm install mytest percona/psmdb-db --set sharding.enabled=false \
--set "replsets[0].volumeSpec.pvc.storageClassName=openebs-hostpath" \
--set "replsets[0].volumeSpec.pvc.resources.requests.storage=3Gi" \
--set "replsets[0].name=rs0" --set "replsets[0].size=3"
As a result, we should have a replica set with three nodes using the openebs-hostpath storageClass.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
my-op-psmdb-operator-58c74cbd44-stxqq 1/1 Running 0 5m56s
mytest-psmdb-db-rs0-0 2/2 Running 0 3m58s
mytest-psmdb-db-rs0-1 2/2 Running 0 3m32s
mytest-psmdb-db-rs0-2 2/2 Running 0 3m1s
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mongod-data-mytest-psmdb-db-rs0-0 Bound pvc-63d3b722-4b31-42ab-b4c3-17d8734c92a3 3Gi RWO openebs-hostpath 4m2s
mongod-data-mytest-psmdb-db-rs0-1 Bound pvc-2bf6d908-b3c0-424c-9ccd-c3be3295da3a 3Gi RWO openebs-hostpath 3m36s
mongod-data-mytest-psmdb-db-rs0-2 Bound pvc-9fa3e21e-bfe2-48de-8bba-0dae83b6921f 3Gi RWO openebs-hostpath 3m5s
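If you are curious where the data actually lives, describe one of the Persistent Volumes created above. For openebs-hostpath you should see a Source Path under the OpenEBS base directory on the node and a Node Affinity entry pinning the volume to a single Kubernetes node:
$ kubectl describe pv pvc-63d3b722-4b31-42ab-b4c3-17d8734c92a3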
Local Storage Caveats
Local storage is the node’s own storage: if something happens to the node, it also affects the data. We will review several common situations and how they impact Percona Server for MongoDB on Kubernetes with local storage.
Node Restart
Something happened to the Kubernetes node – a server reboot, a virtual machine crash, etc. The node is not lost, just restarted. Let’s see what would happen in this case with our MongoDB cluster.
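One way to simulate such a restart in this demo environment (GKE) is to reset the underlying compute instance. The node name below is taken from this demo cluster, and the zone is a placeholder you would replace with your own:
gcloud compute instances reset gke-sergey-235-default-pool-9f5f2e2b-4jv3 --zone <zone>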
I will restart one of my Kubernetes nodes. As a result, the Pod will go into a Pending state:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
my-op-psmdb-operator-58c74cbd44-stxqq 1/1 Running 0 58m
mytest-psmdb-db-rs0-0 2/2 Running 0 56m
mytest-psmdb-db-rs0-1 0/2 Pending 0 67s
mytest-psmdb-db-rs0-2 2/2 Running 2 55m
In normal circumstances, the Pod should be rescheduled to another node, but that is not happening here. The reason is local storage and the affinity rules that come with it. If you run kubectl describe pod mytest-psmdb-db-rs0-1, you would see something like this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 72s (x2 over 73s) default-scheduler 0/3 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict.
Normal NotTriggerScaleUp 70s cluster-autoscaler pod didn't trigger scale-up: 1 node(s) had volume node affinity conflict
As you can see, the cluster is not scaled up, as this Pod needs the specific node that holds its storage. We can see this in an annotation on the PVC itself:
$ kubectl describe pvc mongod-data-mytest-psmdb-db-rs0-1
Name: mongod-data-mytest-psmdb-db-rs0-1
Namespace: default
StorageClass: openebs-hostpath
…
Annotations: pv.kubernetes.io/bind-completed: yes
…
volume.kubernetes.io/selected-node: gke-sergey-235-default-pool-9f5f2e2b-4jv3
…
Used By: mytest-psmdb-db-rs0-1
In other words, this Pod will wait for the node to come back. Until it does, your MongoDB cluster will be in a degraded state, running two nodes out of three. Keep this in mind when you perform maintenance or experience a Kubernetes node crash. With network-attached PVCs, this MongoDB Pod would be rescheduled to a new node right away.
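To quickly see which Kubernetes node every volume in the cluster is pinned to, you can print that annotation for all PVCs at once. A minimal sketch using custom columns:
kubectl get pvc -o custom-columns='NAME:.metadata.name,NODE:.metadata.annotations.volume\.kubernetes\.io/selected-node'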
Graceful Migration to Another Node
Let’s look at the best way to migrate a MongoDB replica set Pod from one node to another when local storage is used. There can be multiple reasons – node maintenance, migration to another rack, data center, or newer hardware. We want to perform such a migration with no downtime and minimal performance impact on the database.
First, we will add more members to the replica set by scaling up the cluster. We will use helm again and change the size from three to five:
helm upgrade mytest percona/psmdb-db --set sharding.enabled=false \
--set "replsets[0].volumeSpec.pvc.storageClassName=openebs-hostpath" \
--set "replsets[0].volumeSpec.pvc.resources.requests.storage=3Gi" \
--set "replsets[0].name=rs0" --set "replsets[0].size=5"
This will create two more Pods in the replica set. Both Pods will use openebs-hostpath storage as well. By default, our affinity rules require replica set members to run on different Kubernetes nodes, so either enable auto-scaling or ensure you have enough nodes in your cluster. We are adding more members first to avoid a performance impact during the migration.
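Before draining anything, it is a good idea to confirm that the new Pods actually landed on separate Kubernetes nodes; the wide output includes a NODE column:
kubectl get pods -o wide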
Once all five replica set members are up, we will drain the Kubernetes node we want to take out of service. This will remove all the Pods from it gracefully.
kubectl drain gke-sergey-235-default-pool-9f5f2e2b-rtcz --ignore-daemonsets
As with the node restart described in the previous section, the replica set Pod will be stuck in Pending status waiting for its local storage.
kubectl get pods
NAME READY STATUS RESTARTS AGE
…
mytest-psmdb-db-rs0-2 0/2 Pending 0 65s
The storage is pinned to the drained node and will not come back. To solve this, we need to remove the PVC and delete the Pod:
kubectl delete pvc mongod-data-mytest-psmdb-db-rs0-2
persistentvolumeclaim "mongod-data-mytest-psmdb-db-rs0-2" deleted
kubectl delete pod mytest-psmdb-db-rs0-2
pod "mytest-psmdb-db-rs0-2" deleted
This will trigger the creation of a new PVC and a Pod on another node:
NAME READY STATUS RESTARTS AGE
…
mytest-psmdb-db-rs0-2 2/2 Running 2 1m
Again, all five replica set Pods are up and running. You can now perform the maintenance on your Kubernetes node.
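Once the maintenance is done, remember to make the node schedulable again, since draining leaves it cordoned:
kubectl uncordon gke-sergey-235-default-pool-9f5f2e2b-rtcz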
What is left is to scale the replica set back down to three members:
helm upgrade mytest percona/psmdb-db --set sharding.enabled=false \
--set "replsets[0].volumeSpec.pvc.storageClassName=openebs-hostpath" \
--set "replsets[0].volumeSpec.pvc.resources.requests.storage=3Gi" \
--set "replsets[0].name=rs0" --set "replsets[0].size=3"
Node Loss
When the Kubernetes node is dead and there is no chance for it to recover, we face the same situation as with graceful migration: the Pod will be stuck in Pending status waiting for the node to come back. The recovery path is the same (see the example after the list below):
- Delete the Persistent Volume Claim
- Delete the Pod
- The Pod will start on another node and sync the data to a new local PVC
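Put together, and assuming the stuck Pod is mytest-psmdb-db-rs0-1 as in the earlier example, the recovery could look roughly like this. The --force flag may be needed because the kubelet on the dead node can no longer confirm that the Pod has terminated, and deleting the Node object is an optional cleanup step once you are sure the machine is gone for good:
kubectl delete pvc mongod-data-mytest-psmdb-db-rs0-1
kubectl delete pod mytest-psmdb-db-rs0-1 --force --grace-period=0
kubectl delete node <dead-node-name>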
Conclusion
Local storage can boost your database performance and remove the need for network-attached cloud storage completely. This can also lower your public cloud provider bill. In this blog post, we saw that these benefits come with a higher maintenance cost, which can also be automated.
We encourage you to try out Percona Distribution for MongoDB Operator with local storage and share your results on our community forum.
There is always room for improvement and a time to find a better way. Please let us know if you face any issues or have ideas to contribute to Percona products. You can do that on the Community Forum or JIRA. Read more about contribution guidelines for Percona Distribution for MongoDB Operator in CONTRIBUTING.md.