Percona Operator for MongoDB has supported multi-cluster or cross-site replication deployments since version 1.10. This functionality is extremely useful if you want to have a disaster recovery deployment or perform a migration from or to a MongoDB cluster running in Kubernetes. In a nutshell, it allows you to use Operators deployed in different Kubernetes clusters to manage and expand replica sets.
For example, you have two Kubernetes clusters: one in Region A, another in Region B.
- In Region A you deploy your MongoDB cluster with Percona Operator.
- In Region B you deploy unmanaged MongoDB nodes with another installation of Percona Operator.
- You configure both Operators so that nodes in Region B are added to the replica set in Region A.
In case of failure of Region A, you can switch your traffic to Region B.
Migrating MongoDB to Kubernetes describes the migration process using this functionality of the Operator.
This feature was released as a tech preview, and we received lots of positive feedback from our users. But one of our customers raised an internal ticket pointing out that the cross-site replication functionality does not work with Multi-Cluster Services. This started the investigation and the creation of this ticket – K8SPSMDB-625.
This blog post will go into the deep roots of this story and how it is solved in the latest release of Percona Operator for MongoDB version 1.12.
The Problem
Multi-Cluster Services, or MCS, allows you to expand network boundaries for the Kubernetes cluster and share Service objects across these boundaries. Some call it another take on Kubernetes Federation. This feature is already available on some managed Kubernetes offerings, such as Google Kubernetes Engine (GKE) and AWS Elastic Kubernetes Service (EKS). Submariner uses the same logic and primitives under the hood.
MCS Basics
To understand the problem, we need to understand how multi-cluster services work. Let’s take a look at the picture below:
- We have two Pods in different Kubernetes clusters
- We add these two clusters into our MCS domain
- Each Pod has a Service and an IP address that are unique to its Kubernetes cluster
- MCS introduces new Custom Resources – ServiceImport and ServiceExport.
- Once you create a ServiceExport object in one cluster, a ServiceImport object appears in all clusters in your MCS domain.
- This ServiceImport object is in the svc.clusterset.local domain and, with the network magic introduced by MCS, can be accessed from any cluster in the MCS domain.
The above means that if I have an application in the Kubernetes cluster in Region A, I can connect to the Pod in the Kubernetes cluster in Region B through a domain name like my-pod.<namespace>.svc.clusterset.local. And it works from the other cluster as well.
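For illustration, a ServiceExport is just a marker object that points at an existing Service by name; the MCS controller then creates the matching ServiceImport objects for you. Here is a minimal sketch (the multicluster.x-k8s.io/v1alpha1 API group comes from the MCS KEP; the Service name my-pod and namespace mongo are hypothetical):

apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  # Must match the name and namespace of the Service you want to share
  name: my-pod
  namespace: mongo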
MCS and Replica Set
Here is how cross-site replication works with Percona Operator if you use load balancers:
All replica set nodes have a dedicated Service and a load balancer. A replica set in the MongoDB cluster is formed using these public IP addresses. The external node is added using its public IP address as well:
replsets:
- name: rs0
  size: 3
  externalNodes:
  - host: 123.45.67.89
All nodes can reach each other, which is required to form a healthy replica set.
Here is how it looks when you have clusters connected through multi-cluster service:
Instead of load balancers, replica set nodes are exposed through ClusterIPs, and we have ServiceExport and ServiceImport resources. Everything looks good on the networking level and it should work, but it does not.
The problem is in the way the Operator builds the MongoDB replica set in Region A. To register an external node from Region B in the replica set, we use the MCS domain name in the corresponding section:
replsets:
- name: rs0
  size: 3
  externalNodes:
  - host: rs0-4.mongo.svc.clusterset.local
Now our rs.status() will look like this:
"name" : "my-cluster-rs0-0.mongo.svc.cluster.local:27017"
"role" : "PRIMARY"
...
"name" : "my-cluster-rs0-1.mongo.svc.cluster.local:27017"
"role" : "SECONDARY"
...
"name" : "my-cluster-rs0-2.mongo.svc.cluster.local:27017"
"role" : "SECONDARY"
...
"name" : "rs0-4.mongo.svc.clusterset.local:27017"
"role" : "UNKNOWN"
As you can see, the Operator formed the replica set out of three nodes using the svc.cluster.local domain, which is how it should be done when you expose nodes with the ClusterIP Service type. As a result, a node in Region B cannot reach any node in Region A, because it tries to connect to a domain that is local to the cluster in Region A.
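You can see the mismatch by resolving both names from a Pod in Region B (a sketch using the host names from the rs.status() output above; it assumes a debug Pod with nslookup available and the Services exported via MCS):

nslookup my-cluster-rs0-0.mongo.svc.cluster.local    # NXDOMAIN in Region B: this name is local to the Region A cluster
nslookup my-cluster-rs0-0.mongo.svc.clusterset.local # resolves from any cluster in the MCS domain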
In the picture below, you can easily see where the problem is:
The Solution
Luckily, we have a Special Interest Group (SIG), a Kubernetes Enhancement Proposal (KEP), and multiple implementations for enabling Multi-Cluster Services. Having a KEP is great since we can be sure the implementations from different providers (e.g., GCP, AWS) will follow more or less the same standard.
There are two fields in the Custom Resource that control MCS in the Operator:
spec:
  multiCluster:
    enabled: true
    DNSSuffix: svc.clusterset.local
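If the cluster is already running, the same fields can be set with a patch (a minimal sketch; the cluster name my-cluster and namespace mongo are assumptions):

kubectl -n mongo patch psmdb my-cluster --type=merge \
  -p '{"spec":{"multiCluster":{"enabled":true,"DNSSuffix":"svc.clusterset.local"}}}'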
Let’s see what is happening in the background with these flags set.
ServiceImport and ServiceExport Objects
Once you enable MCS by patching the CR with spec.multiCluster.enabled: true, the Operator creates a ServiceExport object for each Service. These ServiceExports will be detected by the MCS controller in the cluster, and eventually a ServiceImport for each ServiceExport will be created in the same namespace in each cluster that has MCS enabled.
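You can verify that the objects were created in each cluster (assuming the mongo namespace from the earlier examples):

kubectl -n mongo get serviceexport
kubectl -n mongo get serviceimport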
As you can see, we made a decision to empower the Operator to create ServiceExport objects. There are two main reasons for doing that:
- If any infrastructure-as-code tool is used, it would require additional logic and complexity to automate the creation of the required MCS objects. If the Operator takes care of it, no additional work is needed.
- Our Operators take care of the infrastructure for the database, including Service objects. It just felt logical to expand the reach of this functionality to MCS.
Replica Set and Transport Encryption
The root cause of the problem that we are trying to solve here lies in the networking field, where external replica set nodes try to connect to the wrong domain names. Now, when you enable multi-cluster and set DNSSuffix (it defaults to svc.clusterset.local), the Operator does the following:
- The replica set is formed using the MCS domain set in DNSSuffix
- The Operator generates TLS certificates as usual, but adds the DNSSuffix domains into the picture
With this approach, the traffic between nodes flows as expected and is encrypted by default.
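To double-check that the certificates cover both domains, you can inspect the SANs in the generated secret (a sketch; the secret name my-cluster-ssl follows the Operator's <cluster-name>-ssl convention and is an assumption here):

kubectl -n mongo get secret my-cluster-ssl -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -text | grep DNS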
Things to Consider
MCS APIs
Please note that the Operator won't install MCS APIs and controllers into your Kubernetes cluster. You need to install them by following your provider's instructions before enabling MCS for your PSMDB clusters. See our docs for links to different providers.
The Operator detects whether MCS is installed in the cluster by checking the available API resources. The detection happens before the controllers are started in the Operator, so if you installed the MCS APIs while the Operator is running, you need to restart it. Otherwise, you'll see an error like this:
{
  "level": "error",
  "ts": 1652083068.5910048,
  "logger": "controller.psmdb-controller",
  "msg": "Reconciler error",
  "name": "cluster1",
  "namespace": "psmdb",
  "error": "wrong psmdb options: MCS is not available on this cluster",
  "errorVerbose": "...",
  "stacktrace": "..."
}
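To restart the Operator after installing the MCS APIs, you can restart its Deployment (assuming the default deployment name percona-server-mongodb-operator):

kubectl -n psmdb rollout restart deployment percona-server-mongodb-operator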
ServiceImport Provisioning Time
It might take some time for ServiceImport objects to be created in the Kubernetes cluster. You can see the following messages in the logs while creation is in progress:
{
  "level": "info",
  "ts": 1652083323.483056,
  "logger": "controller_psmdb",
  "msg": "waiting for service import",
  "replset": "rs0",
  "serviceExport": "cluster1-rs0"
}
During testing, we saw wait times of up to 10-15 minutes. If your cluster is stuck in the initializing state waiting for service imports, it is a good idea to check the usage and quotas for your environment.
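While waiting, you can watch for the objects to appear (the psmdb namespace matches the log above):

kubectl -n psmdb get serviceimport --watch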
DNSSuffix
We also made a decision to automatically generate TLS certificates for the Percona Server for MongoDB cluster with *.clusterset.local domains, even if MCS is not enabled. This approach simplifies the process of enabling MCS for a running MongoDB cluster. It does not make much sense to change the DNSSuffix field unless you have hard requirements from your service provider, but we still allow such a change.
If you want to enable MCS for a cluster deployed with an Operator version below 1.12, you need to update your TLS certificates to include *.clusterset.local SANs. See the docs for instructions.
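The exact procedure is in the docs; as an illustration, if you manage certificates with cert-manager, a Certificate covering both domains might look like this (all names here are hypothetical):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-cluster-ssl
  namespace: mongo
spec:
  # Secret the TLS keypair is written to (name is an assumption)
  secretName: my-cluster-ssl
  dnsNames:
  - "*.mongo.svc.cluster.local"
  - "*.mongo.svc.clusterset.local"
  issuerRef:
    name: my-issuer
    kind: Issuer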
Conclusion
Businesses rely on applications, and on the infrastructure that serves them, more than ever nowadays. Disaster recovery protocols and various failsafe mechanisms are routine for reliability engineers, not an annoying task in the backlog.
With the multi-cluster deployment functionality in Percona Operator for MongoDB, we want to equip users to build highly available and secure database clusters with minimal effort.
Percona Operator for MongoDB is truly open source and provides users with a way to deploy and manage their enterprise-grade MongoDB clusters on Kubernetes. We encourage you to try this new Multi-Cluster Services integration and let us know your results on our community forum. You can find some scripts that would help you provision your first MCS clusters on GKE or EKS here.
There is always room for improvement and a time to find a better way. Please let us know if you face any issues or want to contribute your ideas to Percona products. You can do that on the Community Forum or in JIRA. Read more about contribution guidelines for Percona Operator for MongoDB in CONTRIBUTING.md.