Namespaces in Kubernetes provide a way to isolate groups of resources within a single cluster. They are useful in multi-tenant environments with a few to tens of users and teams. They also allow dividing cluster resources between these users through quotas.
Percona Operators support both cluster-wide and namespace-scoped deployments. In a multi-namespace Kubernetes cluster, that raises the question of which mode to use.
It is worth mentioning that there is also a middle ground: cluster-wide with namespace limitations. In this mode, an Operator is deployed cluster-wide but controls only a few namespaces, not all of them. Multiple Operators can run in this mode in the same cluster.
In this blog post, we will dive deeper into the pros and cons of these solutions and help you navigate this uncertainty.
Deeper dive
We will review the differences between cluster-wide and namespace-scoped deployments through the prism of security, availability, and performance. These are the most important aspects of any infrastructure.
Availability
Operators are responsible for deploying new database clusters and managing existing ones. Existing database clusters will continue working if the Operator goes down, but certain operations will not be available:
- Deploying new clusters and deleting existing ones
- Backups and restores
- Scaling, upgrades, and other management tasks
When you choose a deployment method, consider the blast radius: the number of database clusters affected if a single Operator fails. The more clusters a single Operator controls, the bigger the radius. We recommend keeping it as small as possible.
As the GIF shows, a namespace-scoped deployment can also have a huge blast radius, for example, when many database clusters share one namespace. In this case, it would be wise to redistribute database clusters across multiple namespaces.
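To make the blast radius idea concrete, here is a minimal Python sketch. The cluster names, Operator names, and the `blast_radius` helper are hypothetical and exist only for illustration. The sketch counts how many database clusters would lose management features if a given Operator went down.

```python
from collections import Counter


def blast_radius(assignments):
    """Count database clusters managed by each Operator.

    `assignments` maps a database cluster name to the name of the
    Operator that manages it (hypothetical names, for illustration).
    """
    return Counter(assignments.values())


# Hypothetical layouts: one cluster-wide Operator vs. one Operator per namespace.
cluster_wide = {
    "db-1": "op-global",
    "db-2": "op-global",
    "db-3": "op-global",
    "db-4": "op-global",
}
namespaced = {
    "db-1": "op-team-a",
    "db-2": "op-team-a",
    "db-3": "op-team-b",
    "db-4": "op-team-c",
}

# The worst case is the Operator that manages the most clusters.
worst_cluster_wide = max(blast_radius(cluster_wide).values())
worst_namespaced = max(blast_radius(namespaced).values())
print(worst_cluster_wide, worst_namespaced)
```

In this made-up layout, spreading four clusters across three Operators halves the worst-case radius. Note that `op-team-a` still manages two clusters, which is the kind of imbalance that redistributing clusters across namespaces would address.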
Performance
Operators do not impact the performance of the databases but can slow down operational tasks. This is quite similar to the availability problem. The issue is that, by default, Operator SDK processes custom resources one by one with no concurrency. The more Custom Resources (CRs) you create and update with a single Operator, the slower the processing gets. It is important to remember that Operators come with various CRs. For example, Percona Operator for PostgreSQL has three:
- PerconaPGCluster
- PerconaPGBackup
- PerconaPGRestore
This means that not only database clusters themselves but also backups and restores are Custom Resources and can impact the processing. For instance, if you have 100 clusters managed by an Operator and create 100 backups at the same time (on schedule), these resources will be processed consecutively.
Another aspect here is blocking operations: Smart Upgrade, full cluster crash recovery, etc. While the Operator is executing one of these operations on even a single cluster, all other operations queue up behind it.
The recommended strategy here is similar to the Availability part: reduce the blast radius.
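The queuing effect described above can be sketched in a few lines of Python. This is a simplified model, not the Operator's actual scheduling logic: it assumes a single worker that runs operations strictly in order, and the operation names and durations are invented for illustration.

```python
from itertools import accumulate


def start_times(durations):
    """Start time of each queued operation when one worker runs them in order."""
    if not durations:
        return []
    # Each operation starts once all earlier operations have finished.
    return [0.0] + list(accumulate(durations))[:-1]


# Assumed durations in seconds, for illustration only.
queue = [("smart-upgrade", 600.0), ("backup-1", 5.0), ("backup-2", 5.0)]
starts = start_times([duration for _, duration in queue])

for (name, _), start in zip(queue, starts):
    print(f"{name} starts at {start:.0f}s")
```

Under these assumptions, a single long-running operation at the head of the queue delays every short backup behind it, even though the backups belong to unrelated clusters.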
Maximum Concurrent Reconciles
It is possible to increase the concurrency of the Operator through the MaxConcurrentReconciles setting, but this might introduce other problems, such as race conditions. We are researching this option to relax performance constraints.
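To illustrate why more concurrent reconciles can help, here is a rough Python simulation of a queue drained by a fixed number of workers. It is a toy model under stated assumptions (100 backups, 2 seconds each, no contention between workers) and does not use or reproduce the Operator SDK. The `makespan` function and all numbers are hypothetical.

```python
import heapq


def makespan(durations, workers):
    """Simulate a FIFO queue drained by `workers` parallel workers.

    Returns the time at which the last item finishes.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Each heap entry is the time at which a worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


durations = [2.0] * 100  # assumed: 100 backups, 2 seconds each
sequential = makespan(durations, workers=1)
concurrent = makespan(durations, workers=4)
print(sequential, concurrent)
```

The model deliberately ignores the hard part: it treats every item as independent. Real reconcile loops may touch shared state, which is where the race conditions mentioned above can come from.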
Security
When you deploy Percona Operators, the following resources are created in the cluster:
- Custom Resource Definitions (CRDs) – To extend Kubernetes APIs. CRDs are cluster-level resources regardless of which deployment method is used.
- Service accounts and roles – To give the Operator access to create, modify, and remove resources.
- Deployment – Runs the Operator itself.
From a security perspective, the resources of interest are service accounts and roles. In cluster-wide mode, the Operator needs access to multiple namespaces (if not all) in the cluster, because it creates StatefulSets, Services, Volumes, and more in each of them. To grant this access, we create a ClusterRole instead of a Role for the Operator. For example, for Percona Operator for MySQL, you can see the difference between rbac.yaml and cw-rbac.yaml (cluster-wide).
A service account token leak is rare, but be aware that if it does happen, the token of a cluster-wide Operator holds all the keys to the kingdom.
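The structural difference between the two RBAC objects can be sketched as follows. This Python snippet builds illustrative manifests as dictionaries. The `operator_rbac` helper, the `example-operator` name, the `team-a` namespace, and the rule list are assumptions made for this example; they are not the contents of the real rbac.yaml or cw-rbac.yaml files.

```python
import json


def operator_rbac(name, cluster_wide, namespace=None):
    """Build an illustrative RBAC manifest for an Operator.

    The rules are a hypothetical subset, not the real Percona manifests.
    """
    rules = [
        {
            "apiGroups": ["apps"],
            "resources": ["statefulsets"],
            "verbs": ["get", "list", "watch", "create", "update", "delete"],
        },
        {
            "apiGroups": [""],  # core API group
            "resources": ["services", "persistentvolumeclaims"],
            "verbs": ["get", "list", "watch", "create", "update", "delete"],
        },
    ]
    metadata = {"name": name}
    if cluster_wide:
        kind = "ClusterRole"  # not namespaced, so no metadata.namespace
    else:
        if namespace is None:
            raise ValueError("a namespace-scoped Role needs a namespace")
        kind = "Role"
        metadata["namespace"] = namespace
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": kind,
        "metadata": metadata,
        "rules": rules,
    }


namespaced = operator_rbac("example-operator", cluster_wide=False, namespace="team-a")
cluster_wide = operator_rbac("example-operator", cluster_wide=True)
print(json.dumps(cluster_wide, indent=2))
```

The rules are identical in both cases; what changes is the scope. A Role only ever applies inside its own namespace, while a ClusterRole bound through a ClusterRoleBinding applies in every namespace.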
Resource consumption
Every container consumes resources, and the Operator, which is deployed as a Pod, is no exception. In a namespace-scoped deployment or a cluster-wide deployment with namespace limitations, you might have multiple Operator Pods, each consuming compute resources: CPU and memory. Percona Operators do not consume much: ~50 millicores and ~30 MB of RAM. So this might become a problem only if you have hundreds or thousands of Operator installations.
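As a back-of-the-envelope check, the following Python sketch multiplies the approximate per-Operator figures quoted above by the number of installations. The `operator_overhead` helper is hypothetical, and the result is a rough estimate that ignores any variation between Operators or workloads.

```python
CPU_MILLICORES_PER_OPERATOR = 50  # approximate figure from the text
MEMORY_MB_PER_OPERATOR = 30       # approximate figure from the text


def operator_overhead(installations):
    """Return (CPU cores, memory in MB) used by N Operator Pods."""
    cpu_cores = installations * CPU_MILLICORES_PER_OPERATOR / 1000
    memory_mb = installations * MEMORY_MB_PER_OPERATOR
    return cpu_cores, memory_mb


for n in (1, 10, 100, 1000):
    cpu, mem = operator_overhead(n)
    print(f"{n:>5} Operators: ~{cpu:g} cores, ~{mem} MB")
```

With these figures, 100 installations add up to roughly 5 cores and 3 GB of RAM, and 1,000 installations to roughly 50 cores and 30 GB.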
Conclusion
Operators are tools that simplify application deployment and management on Kubernetes. Users should keep certain availability, security, and performance considerations in mind when using them. The general suggestion is to keep the blast radius small: have a single Operator manage a reasonable number of Custom Resources.