Nov 01, 2023

Percona Operators Deployment Explained: Delving Into Cluster-Wide vs. Namespace-Scoped


Namespaces in Kubernetes provide a way to isolate groups of resources within a single cluster. They are useful in multi-tenant environments with a few to tens of users and teams. They also allow dividing cluster resources between these users through quotas.


Percona Operators support both cluster-wide and namespace-scoped deployments. If you run a multi-namespace Kubernetes cluster, the question becomes which mode to use.

It is worth mentioning that there is also a middle ground: cluster-wide with namespace limitations. In this mode, an Operator is deployed cluster-wide but controls only a few namespaces, not all of them. There can be multiple Operators running in this mode.
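
With Percona Operators, this mode is typically configured through the WATCH_NAMESPACE environment variable on the Operator Deployment. A minimal sketch (namespace names are hypothetical):

env:
- name: WATCH_NAMESPACE
  value: "namespace-a,namespace-b"   # the Operator reconciles only these namespaces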

In this blog post, we will dive deeper into the pros and cons of these solutions and help you navigate this uncertainty.

Deeper dive

We will review the differences between cluster-wide and namespace-scoped through the prism of security, availability, and performance. These are the most important aspects of any infrastructure.

Availability

Operators are responsible for deploying new and managing existing database clusters. Existing database clusters will continue working if the Operator goes down, but certain features will not be available:

  1. Deploying new and deleting existing clusters
  2. Backups and restores
  3. Scaling, upgrades, and other management tasks

When you choose a deployment method, consider the blast radius: the more clusters a single Operator controls, the bigger the radius. Our recommendation is to keep the blast radius as small as possible.

Note that a namespace-scoped deployment can also have a huge blast radius if many database clusters live in a single namespace. In that case, it would be wise to redistribute the database clusters across multiple namespaces.

Performance

Operators do not impact the performance of the databases themselves, but they can hurt the operational side. This is quite similar to the availability problem. The issue is that, by default, the Operator SDK processes custom resources one by one, with no concurrency. The more Custom Resources (CRs) you create and update with a single Operator, the slower the processing gets. It is important to remember that Operators come with various CRs. For example, Percona Operator for PostgreSQL has three:

  • PerconaPGCluster
  • PerconaPGBackup
  • PerconaPGRestore

This means that not only database clusters themselves but also backups and restores are Custom Resources and can impact the processing. For instance, if you have 100 clusters managed by an Operator and create 100 backups at the same time (on schedule), these resources will be processed consecutively.
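
For example, with the v2 API of Percona Operator for PostgreSQL, an on-demand backup is itself a Custom Resource that the same Operator must reconcile. A minimal sketch (names are illustrative):

apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
metadata:
  name: backup1
spec:
  pgCluster: cluster1   # the PerconaPGCluster to back up
  repoName: repo1       # pgBackRest repository defined in the cluster CR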

Another aspect here is blocking operations: Smart Upgrade, full cluster crash recovery, and so on. If the Operator is executing one of these operations on at least one cluster, other operations queue up behind it.

The recommended strategy here is similar to the Availability part: reduce the blast radius.

Maximum Concurrent Reconciles

It is possible to increase the concurrency of the Operator through MaxConcurrentReconciles, but it might introduce other problems, for example, race conditions. We are researching this option to relax performance constraints.

Security

When you deploy Percona Operators, the following resources are created in the cluster:

  • Custom Resource Definitions (CRDs) – To extend the Kubernetes API. CRDs are cluster-level resources, so they are created the same way regardless of the deployment method.
  • Service accounts and roles – To give access to the Operator to create, modify, and remove resources.
  • Deployment – It is the Operator itself.

From a security perspective, we are interested in Service accounts and cluster roles. For cluster-wide, the Operator should have access to multiple namespaces (if not all) in the cluster. The Operator creates Stateful Sets, Services, Volumes, and more. To grant this access in cluster-wide mode, we create ClusterRole instead of Role for the Operator. For example, for Percona Operator for MySQL, you can see the difference between rbac.yaml and cw-rbac.yaml (cluster-wide).
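
The difference comes down to the kind of RBAC object used. A trimmed sketch (the real manifests carry a much longer rule list):

# rbac.yaml: a Role grants permissions only inside the Operator's own namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: percona-xtradb-cluster-operator
rules:
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

# cw-rbac.yaml: a ClusterRole grants the same access across every namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: percona-xtradb-cluster-operator
# rules omitted for brevity; same shape as above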

A service account token leak is a rare event, but be aware that with a cluster-wide Operator, that token holds all the keys to the kingdom.

Resource consumption

Every container consumes resources, and the Operator, deployed as a Pod, is no exception. In a namespace-scoped deployment, or a cluster-wide one with namespace limitations, you might have multiple Operator Pods, each consuming CPU and memory. Percona Operators do not consume much – roughly 50 millicores and 30MB of RAM each – so this only becomes a concern at scale: a hundred installations add up to about five CPU cores and 3GB of memory.

Conclusion

Operators are a tool that simplifies application deployment and management on Kubernetes. There are certain availability, security, and performance considerations that users should keep in mind when using such tools. The general suggestion is to keep the blast radius small — manage a reasonable number of Custom Resources by a single Operator.

 

Percona Kubernetes Operators

You can get early access to new product features, invite-only “ask me anything” sessions with Percona Kubernetes experts, and monthly swag raffles. Interested? Fill in the form at percona.com/k8s.

Nov 23, 2021

Multi-Tenant Kubernetes Cluster with Percona Operators


There are cases where multiple teams, customers, or applications run in the same Kubernetes cluster. Such an environment is called multi-tenant, and it requires some preparation and management. A multi-tenant Kubernetes deployment allows you to benefit from economies of scale on various levels:

  • Smaller compute footprint – one control plane, dense container deployments
  • Ease of management – one cluster, not hundreds

In this blog post, we are going to review multi-tenancy best practices and recommendations, and see how Percona Kubernetes Operators can be deployed and managed in such Kubernetes clusters.

Multi-Tenancy

Generic

Multi-tenancy usually means a lot of Pods and workloads in a single cluster. You should always remember that there are certain limits when designing your infrastructure. For vanilla Kubernetes, these limits are quite high and hard to reach:

  • 5,000 nodes
  • 10,000 namespaces
  • 150,000 pods

Managed Kubernetes services have their own limits that you should keep in mind. For example, GKE allows a maximum of 110 Pods per node on a standard cluster and only 32 on GKE Autopilot nodes.

The older AWS EKS CNI plugin limited the number of Pods per node to the number of IP addresses the EC2 instance could have. Even with prefix assignment enabled in the CNI, you are still going to hit the limit of 110 Pods per node.
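
You can check the effective Pod capacity of your nodes directly, since the allocatable Pod count is part of the node status:

kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods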

Namespaces

Kubernetes Namespaces provide a mechanism for isolating groups of resources within a single cluster. Kubernetes objects are either cluster-scoped or namespace-scoped. Objects that are accessible across all namespaces, like ClusterRole, are cluster-scoped, while those accessible only within a single namespace, like Deployments, are namespace-scoped.

kubernetes namespaces

Deploying a database with Percona Operators creates pods that are namespace scoped. This provides interesting opportunities to run workloads on different namespaces for different teams, projects, and potentially, customers too. 

Example: the Percona Distribution for MongoDB Operator and Percona Server for MongoDB can run in two different namespaces by setting the namespace metadata field. Snippets are as follows:

# Team 1 DB running in team1-db namespace
apiVersion: psmdb.percona.com/v1-11-0
kind: PerconaServerMongoDB
metadata:
  name: team1-server
  namespace: team1-db

# Team 1 Operator deployment running in team1-db namespace
apiVersion: apps/v1
kind: Deployment
metadata:
  name: percona-server-mongodb-operator-team1
  namespace: team1-db

# Team 2 DB running in team2-db namespace
apiVersion: psmdb.percona.com/v1-11-0
kind: PerconaServerMongoDB
metadata:
  name: team2-server
  namespace: team2-db

# Team 2 Operator deployment running in team2-db namespace
apiVersion: apps/v1
kind: Deployment
metadata:
  name: percona-server-mongodb-operator-team2
  namespace: team2-db

Suggestions:

  1. Avoid using the standard namespaces like kube-system or default.

  2. It’s always better to run independent workloads in different namespaces unless there is a specific requirement to run them in a shared namespace.

Namespaces can be used per team, per application environment, or any other logical structure that fits the use case.

Resources

The biggest problem in any multi-tenant environment is this – how can we ensure that a single bad apple doesn’t spoil the whole bunch of apples?

ResourceQuotas

Thanks to ResourceQuotas, we can restrict the resource utilization of namespaces. ResourceQuotas also allow you to restrict the number of k8s objects that can be created in a namespace.

Example of the YAML manifest with resource quotas:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team1-quota         
  namespace: team1-db    # Namespace where operator is deployed
spec:
  hard:
    requests.cpu: "10"     # Cumulative CPU requests of all k8s objects in the namespace cannot exceed 10vcpu
    limits.cpu: "20"       # Cumulative CPU limits of all k8s objects in the namespace cannot exceed 20 vcpu
    requests.memory: 10Gi  # Cumulative memory requests of all k8s objects in the namespace cannot exceed 10Gi
    limits.memory: 20Gi    # Cumulative memory limits of all k8s objects in the namespace cannot exceed 20Gi
    requests.ephemeral-storage: 100Gi  # Cumulative ephemeral storage request of all k8s objects in the namespace cannot exceed 100Gi
    limits.ephemeral-storage: 200Gi    # Cumulative ephemeral storage limits of all k8s objects in the namespace cannot exceed 200Gi
    requests.storage: 300Gi            # Cumulative storage requests of all PVC in the namespace cannot exceed 300Gi
    persistentvolumeclaims: 5          # Maximum number of PVC in the namespace is 5
    count/statefulsets.apps: 2         # Maximum number of statefulsets in the namespace is 2
    # count/psmdb: 2                   # Maximum number of PSMDB objects in the namespace is 2, replace the name with proper Custom Resource
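
After applying the quota, you can track consumption against the limits at any time (the file name here is hypothetical):

kubectl apply -f team1-quota.yaml
kubectl describe resourcequota team1-quota -n team1-db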

Please refer to the Resource Quotas documentation and apply quotas that are required for your use case.

If resource quotas are applied to a namespace, you are required to set requests and limits for the containers; otherwise, you are going to get an error similar to the following:

Error creating: pods "my-cluster-name-rs0-0" is forbidden: failed quota: my-cpu-memory-quota: must specify limits.cpu,requests.cpu

All Percona Operators provide the capability to fine-tune the requests and limits. The following example sets CPU and memory requests for Percona XtraDB Cluster containers:

spec:
  pxc:
    resources:
      requests:
        memory: 4G
        cpu: 2

LimitRange

With ResourceQuotas we can control the cumulative resources in a namespace, but if we want to enforce constraints on individual Kubernetes objects, LimitRange is a useful option.

For example, if Teams 1, 2, and 3 are each given a namespace to run workloads, ResourceQuota will ensure that none of the teams can exceed its allocated quota and over-utilize the cluster… but what if a badly configured workload (say, an operator run by team 1 with a higher priority class) consumes all the resources allocated to its team?

LimitRange can be used to enforce constraints on individual objects: compute, memory, ephemeral storage, and PVC storage. The example below highlights some of the possibilities.

apiVersion: v1
kind: LimitRange
metadata:
  name: lr-team1
  namespace: team1-db
spec:
  limits:
  - type: Pod                      
    max:                            # Maximum resource limit of all containers combined. Consider setting default limits
      ephemeral-storage: 100Gi      # Maximum ephemeral storage cannot exceed 100Gi
      cpu: "800m"                   # Maximum CPU limit of the Pod is 800 millicores
      memory: 4Gi                   # Maximum memory limit of the Pod is 4Gi
    min:                            # Minimum resource request of all containers combined. Consider setting default requests
      ephemeral-storage: 50Gi       # Minimum ephemeral storage request is 50Gi
      cpu: "200m"                   # Minimum CPU request is 200 millicores
      memory: 2Gi                   # Minimum memory request is 2Gi
  - type: PersistentVolumeClaim
    max:
      storage: 2Gi                  # Maximum PVC storage limit
    min:
      storage: 1Gi                  # Minimum PVC storage request

Suggestions:

  1. When it’s feasible, apply ResourceQuotas and LimitRanges to the namespaces where the Percona Operator is running. This ensures that tenants are not over-utilizing the cluster.

  2. Set alerts to monitor objects and resource usage in namespaces. Automating ResourceQuota changes may also be useful in some scenarios.

  3. It is advisable to leave a buffer above the maximum expected utilization when setting the ResourceQuotas.

  4. Set LimitRanges to ensure workloads are not over-utilizing resources in individual namespaces.

Roles and Security

Kubernetes provides several modes to authorize an API request. Role-based access control (RBAC) is a popular way of authorization. There are four important objects that provide access:

  • ClusterRole – represents a set of permissions across the cluster (cluster-scoped)
  • Role – represents a set of permissions within a namespace (namespace-scoped)
  • ClusterRoleBinding – grants permissions to subjects across the cluster (cluster-scoped)
  • RoleBinding – grants permissions to subjects within a namespace (namespace-scoped)

Subjects in the RoleBinding/ClusterRoleBinding can be users, groups, or service accounts. Every Pod running in the cluster has an identity and a service account attached (the “default” service account of the same namespace is attached if one is not explicitly specified). The permissions granted to the service account with a RoleBinding/ClusterRoleBinding dictate the access the Pods will have.

Following the principle of least privilege, it’s always advisable to use Roles with the smallest set of permissions needed and bind them to a service account with a RoleBinding. This service account can be used to run the Operator and its Custom Resources, ensuring proper access while restricting the blast radius.

Avoid granting cluster-level access unless there is a strong use case to do it.

Example: the RBAC setup in the MongoDB Operator uses a Role and a RoleBinding, restricting the service account’s access to a single namespace. The same service account is used for both the Custom Resource and the Operator.
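
A minimal sketch of that pattern (names and the rule list are illustrative, not the full manifests from the repository):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: percona-server-mongodb-operator
  namespace: team1-db
rules:
- apiGroups: ["psmdb.percona.com"]
  resources: ["perconaservermongodbs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: percona-server-mongodb-operator
  namespace: team1-db
subjects:
- kind: ServiceAccount
  name: percona-server-mongodb-operator
  namespace: team1-db
roleRef:
  kind: Role
  name: percona-server-mongodb-operator
  apiGroup: rbac.authorization.k8s.io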

Network Policies

Network isolation provides additional security to applications and customers in a multi-tenant environment. Network policies are Kubernetes resources that allow you to control the traffic between Pods, CIDR blocks, and network endpoints, but the most common approach is to control the traffic between namespaces.


Most Container Network Interface (CNI) plugins support the implementation of network policies; however, if a plugin doesn’t and a NetworkPolicy is created, the resource is silently ignored. For example, the AWS CNI does not support network policies, but AWS EKS can run the Calico CNI, which does.

It is good practice to follow the least-privilege approach, whereby traffic is denied by default and access is granted granularly:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: app1-db
spec:
  podSelector: {}
  policyTypes:
  - Ingress

Allow traffic from Pods in namespace app1 to namespace app1-db:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-app1    # must differ from the deny-all policy above, or applying this would overwrite it
  namespace: app1-db
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: app1
  policyTypes:
  - Ingress
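
Note that namespaceSelector matches namespace labels, not namespace names, so the policy above only works if the source namespace actually carries the expected label:

kubectl label namespace app1 name=app1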

Policy Enforcement

In a multi-tenant environment, policy enforcement plays a key role. Policy enforcement ensures that k8s objects pass the required quality gates set by administrators/teams. Some examples of policy enforcement could be:

  1. All the workloads have proper labels 
  2. Proper network policies are set for DB
  3. Unsafe configurations are not allowed (Example)
  4. Backups are always enabled (Example)

The K8s ecosystem offers a wide range of options to achieve this. Some of them are listed below:

  1. Open Policy Agent (OPA) is a CNCF-graduated project which provides a high-level declarative language to author and enforce policies across k8s objects. (Examples from Google and the OPA repo can be helpful.)

  2. Mutating Webhooks can be used to modify API calls before they reach the API server. This can be used to set required properties for k8s objects. (Example: a mutating webhook to add a NetworkPolicy for Pods created in production namespaces.)

  3. Validating Webhooks can be used to check whether a k8s API call follows the required policy; any call that doesn’t follow the policy will be rejected. (Example: a validating webhook to ensure 1GB huge pages are not used in the Pod.)

Cluster-Wide

Percona Distribution for MySQL Operator and Percona Distribution for PostgreSQL Operator both support cluster-wide mode, which allows a single Operator to deploy and manage databases across multiple namespaces (support for cluster-wide mode in Percona Operator for MongoDB is on the roadmap). It is also possible to have an Operator per namespace.


For example, a single deployment of Percona Distribution for MySQL Operator can watch multiple namespaces in cluster-wide mode. The user can specify them in the WATCH_NAMESPACE environment variable in the cw-bundle.yaml file:

    spec:
      containers:
      - command:
        - percona-xtradb-cluster-operator
        env:
        - name: WATCH_NAMESPACE
          value: "namespace-a, namespace-b"

In a multi-tenant environment, the choice depends on how much freedom you want to give your tenants. When tenants are highly trusted (for instance, internal teams), it is usually fine to choose a namespace-scoped deployment, where each team can deploy and manage its own Operator.

Conclusion

It is important to remember that Kubernetes is not a multi-tenant system out of the box. This blog post described various levels of isolation that will help you run your applications and databases securely and ensure operational stability.

We encourage you to try out our Operators.

CONTRIBUTING.md in every repository is there for those of you who want to contribute your ideas, code, and docs.

For general questions please raise the topic in the community forum.

Oct 03, 2016

MySQL 8.0 General Tablespaces: File per Database (and no FRM files)


In this blog post, we’ll look at MySQL 8.0 general tablespaces.

Introduction

MySQL 8.0 (the DMR version is available now) has two great features (among others):

  1. The new data dictionary completely removed *.frm files, which is great
  2. The ability to create a tablespace and assign a group of tables to it (originally introduced in 5.7).

With those two options, we can use MySQL for creating multi-tenant environments with a “schema per customer” approach.

Schema per Customer with MySQL 8.0

Using schema per customer with older MySQL versions presents issues, namely the sheer number of files. (I’ve described the schema-per-customer approach in MySQL in an older blog post.) Let’s say you are hosting a Drupal-based site for your customers and create a new database (AKA “schema”) for each customer. You do not want to create one schema for all of them, because each customer wants to extend Drupal and use plugins that create their own unique set of tables. With a tablespace per table plus an FRM file, 10K customers will end up with:

  • 65 tables per schema,
  • Two files per table, and
  • 10K schemas

. . . or a grand total of 1.3 million files!

With MySQL 8.0, we can create a tablespace file per schema and place those tablespace files in a specific set of directories. For example, if we have demo, test, and production accounts, we can create a set of directories (outside of the MySQL datadir) and place the tablespaces inside them. With no FRM files, we will only have about 10 thousand files, evenly split across multiple locations.

Example:

mysql>  create database if not exists drupal_customer_name;
Query OK, 1 row affected (0.00 sec)
mysql> CREATE TABLESPACE drupal_customer_name
ADD DATAFILE '/var/lib/mysql_datafiles/drupal_demo/drupal_customer_name.ibd'
Engine=InnoDB;
Query OK, 0 rows affected (0.16 sec)
mysql> use drupal_customer_name;
Database changed
mysql> create table t(i int) ENGINE=InnoDB
TABLESPACE drupal_customer_name;
Query OK, 0 rows affected (0.13 sec)
mysql> create table t1(i int) ENGINE=InnoDB
TABLESPACE drupal_customer_name;
Query OK, 0 rows affected (0.00 sec)

Now let’s look at the directory:

ls -lah /var/lib/mysql_datafiles/drupal_demo/
-rw-r----- 1 mysql mysql 144K Sep 26 00:58 drupal_customer_name.ibd
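
You can also confirm the tablespace and its data file from inside the server via information_schema:

mysql> SELECT FILE_NAME, TABLESPACE_NAME FROM information_schema.FILES
    -> WHERE TABLESPACE_NAME = 'drupal_customer_name';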

The downside of this approach is that every CREATE TABLE statement has to include the tablespace name. I’ve created a sample “deploy” script to create a new schema for a customer:

# Create the database and a dedicated tablespace for the customer,
# then load the Drupal schema with the TABLESPACE clause appended
# to every CREATE TABLE
customer_name="my_drupal_customer"
mysql -f -vvv -e "create database if not exists $customer_name;
CREATE TABLESPACE $customer_name ADD DATAFILE '/var/lib/mysql_datafiles/drupal_demo/${customer_name}.ibd'
engine=InnoDB;"
cat drupal.sql | sed -e "s/ENGINE=InnoDB/ENGINE=InnoDB TABLESPACE $customer_name/g" | mysql $customer_name

Size and Timing

In the next post, I plan to benchmark the performance of millions of tables with MySQL 8.0 and tablespace file per database. Here, I’ve compared the create table performance between MySQL 5.7 (with FRM and file per table), MySQL 8.0 with a file per table (FRMs are gone) and MySQL 8.0 with a file per database. Time to create 1000 databases for Drupal (no data), 65 tables in each database:

  • MySQL 5.7, file per table: 3m21.819s
  • MySQL 8.0,  file per table: 2m54.358s
  • MySQL 8.0,  file per database: 1m55.133s

What about the size on disk? It did not change much. Actually, the size on disk for blank tables (no data) is larger in MySQL 8.0:

  • 8.0: 10M (10485760 bytes) per 65 blank tables (Drupal)
  • 5.7: 9.2M (9280821 bytes) per 65 blank tables, including FRM files

With 10K schemas, that is 100G just to store the tablespaces (schema overhead). At the same time, it is not 100% overhead: InnoDB pre-allocates 112K+ for each tablespace file right away (the exact size depends on the table structure), and when data is loaded it will use this reserved space.

Tablespaces support compression as well:

CREATE TABLESPACE ... ADD DATAFILE '...' FILE_BLOCK_SIZE = 8192 Engine=InnoDB;
CREATE TABLE ... ENGINE=InnoDB TABLESPACE ... ROW_FORMAT=COMPRESSED;

New Data Dictionary

MySQL 8.0 uses a new transactional data dictionary.

MySQL Server 8.0 now incorporates a global data dictionary containing information about database objects in transactional tables. In previous MySQL releases, dictionary data was stored in metadata files and nontransactional system tables.

That also means that all the old metadata files are gone: no .frm, .par, .trn, or .trg files. In addition, the tables inside the mysql database no longer use MyISAM. A new installation has not a single MyISAM table, although the MyISAM engine is still supported:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2299
Server version: 8.0.0-dmr MySQL Community Server (GPL)
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> select count(*) from information_schema.tables where engine = 'MyISAM';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.03 sec)
mysql> select count(*), engine from information_schema.tables where table_schema = 'mysql' group by engine;
+----------+--------+
| count(*) | ENGINE |
+----------+--------+
|        2 | CSV    |
|       30 | InnoDB |
+----------+--------+
2 rows in set (0.00 sec)

Conclusion

An FRM-free installation looks great (performance testing of it is my next step). I would also love to see some additional features in MySQL 8.0:

  • Easier tablespace-level manipulations, i.e., an “optimize tablespace” to reclaim space in the general tablespace file, and “if exists / if not exists” for create/drop tablespace
  • A much smaller “reserved” space: if the data dictionary is stored elsewhere, we could create a one-page file (16K) plus the table structure.

Please note: MySQL 8.0 is not production ready and only available for preview.
