Apr
20
2021
--

Change Storage Class on Kubernetes on the Fly


Percona Kubernetes Operators support various options for storage: Persistent Volumes (PV), hostPath, ephemeral storage, etc. In most cases, PVs are used, and they are provisioned by the Operator through Storage Classes and Persistent Volume Claims.

Storage Classes define the underlying volume type that should be used (e.g., AWS EBS gp2 or io1), the file system (xfs, ext4), and permissions. In various cases, cluster administrators want to change the Storage Class for already existing volumes:

  • The DB cluster is underutilized, and switching from io1 to gp2 is a good cost saving
  • The other way around: the DB cluster is saturated on IO and the volumes need to be upsized
  • Switching the file system for better performance (MongoDB performs much better on xfs)

In this blog post, we will show the best way to change the Storage Class with Percona Operators without introducing downtime to the database. We will cover changing the Storage Class, but not migrating from PVCs to other storage types, like hostPath.

Changing Storage Class on Kubernetes

Prerequisites:

Goal:

  • Change the storage from pd-standard to pd-ssd without downtime for PXC.

Planning

The steps we are going to take are the following:

  1. Create a new Storage class for pd-ssd volumes
  2. Change the storageClassName in the Custom Resource (CR)
  3. Change the storageClassName in the StatefulSet
  4. Scale up the cluster (optional, to avoid performance degradation)
  5. Reprovision the Pods one by one to change the storage
  6. Scale down the cluster

 


Execution

Create the Storage Class

By default, the standard Storage Class is already present in GKE. We need to create a new StorageClass for SSD volumes:

$ cat pd-ssd.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
volumeBindingMode: Immediate
reclaimPolicy: Delete

$ kubectl apply -f pd-ssd.yaml

The new Storage Class will be called ssd and will provision volumes of type: pd-ssd.
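To double-check that the new class is registered before moving on, you can list it with kubectl:

$ kubectl get storageclass ssd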

Change the storageClassName in Custom Resource

We need to change the configuration for Persistent Volume Claims in our Custom Resource. The variable we are looking for is storageClassName, and it is located under spec.pxc.volumeSpec.persistentVolumeClaim.

spec:
  pxc:
    volumeSpec:
      persistentVolumeClaim:
-       storageClassName: standard
+       storageClassName: ssd

Now apply new cr.yaml:

$ kubectl apply -f deploy/cr.yaml

Change the storageClassName in the StatefulSet

StatefulSets are almost immutable and when you try to edit the object you get the warning:

# * spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

We will rely on the fact that the Operator controls the StatefulSet. If we delete it, the Operator recreates it with the last applied configuration, which for us means with the new storage class. However, deleting a StatefulSet normally terminates its Pods, and our goal is zero downtime. To get there, we will delete the StatefulSet but keep the Pods running. This can be done with the --cascade flag:

$ kubectl delete sts cluster1-pxc --cascade=orphan

As a result, the Pods are up and the Operator recreated the StatefulSet with new storageClass:

$ kubectl get sts | grep cluster1-pxc
cluster1-pxc       3/3      8s
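You can also verify that the recreated StatefulSet references the new class; the jsonpath below assumes the default single volumeClaimTemplate layout and should print ssd:

$ kubectl get sts cluster1-pxc -o jsonpath='{.spec.volumeClaimTemplates[0].spec.storageClassName}'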


Scale up the Cluster (Optional)

Changing the storage type requires us to terminate the Pods, which decreases the computational power of the cluster and might cause performance issues. To improve performance during the operation, we are going to change the size of the cluster from 3 to 5 nodes:

spec:
  pxc:
-   size: 3
+   size: 5

$ kubectl apply -f deploy/cr.yaml

Since we have already changed the StatefulSet, new PXC Pods will be provisioned with volumes backed by the new StorageClass:

$ kubectl get pvc
datadir-cluster1-pxc-0   Bound    pvc-6476a94c-fa1b-45fe-b87e-c884f47bd328   6Gi        RWO            standard       78m
datadir-cluster1-pxc-1   Bound    pvc-fcfdeb71-2f75-4c36-9d86-8c68e508da75   6Gi        RWO            standard       76m
datadir-cluster1-pxc-2   Bound    pvc-08b12c30-a32d-46a8-abf1-59f2903c2a9e   6Gi        RWO            standard       64m
datadir-cluster1-pxc-3   Bound    pvc-b96f786e-35d6-46fb-8786-e07f3097da02   6Gi        RWO            ssd            69m
datadir-cluster1-pxc-4   Bound    pvc-84b55c3f-a038-4a38-98de-061868fd207d   6Gi        RWO            ssd           68m

Reprovision the Pods One by One to Change the Storage

This is the step where the underlying storage is going to be changed for the database Pods.

Delete the PVC of the Pod that you are going to reprovision. For example, for Pod cluster1-pxc-2 the PVC is called datadir-cluster1-pxc-2:

$ kubectl delete pvc datadir-cluster1-pxc-2

The PVC will not be deleted right away as there is a Pod using it. To proceed, delete the Pod:

$ kubectl delete pod cluster1-pxc-2

The Pod will be deleted along with its PVC. The StatefulSet controller will notice that the Pod is gone and will recreate it along with a new PVC of the new Storage Class:

$ kubectl get pods
...
cluster1-pxc-2                                     0/3     Init:0/1   0          7s

$ kubectl get pvc datadir-cluster1-pxc-2
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
…
datadir-cluster1-pxc-2   Bound    pvc-08b12c30-a32d-46a8-abf1-59f2903c2a9e   6Gi        RWO            ssd      10s

The STORAGECLASS column indicates that this PVC is of type ssd.

You might face the situation that the Pod is stuck in a pending state with the following error:

  Warning  FailedScheduling  63s (x3 over 71s)  default-scheduler  persistentvolumeclaim "datadir-cluster1-pxc-2" not found

This can happen due to a race condition: the Pod was created while the old PVC was still terminating, and by the time the Pod was ready to start, the PVC was already gone. Just delete the Pod again so that the PVC is recreated.

Once the Pod is up, the State Snapshot Transfer kicks in and the data is synced from the other nodes. It might take a while if your cluster holds lots of data and is heavily utilized. Please wait until the node is fully up and running and the sync is finished, and only then proceed to the next Pod.
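If you have many Pods to cycle through, the per-Pod routine can be scripted. The loop below is only a sketch of the order of operations described above (it assumes the cluster and PVC names used in this post); in practice, also confirm that each Galera member reports Synced before moving on to the next one:

#!/bin/bash
# Reprovision PXC Pods one by one so their volumes land on the new Storage Class
for i in 0 1 2; do
  kubectl delete pvc "datadir-cluster1-pxc-$i" --wait=false   # PVC stays Terminating while the Pod still uses it
  kubectl delete pod "cluster1-pxc-$i"                        # Pod goes away together with the old PVC
  # Wait until the StatefulSet controller recreates the Pod and it passes readiness (SST finished)
  kubectl wait --for=condition=Ready "pod/cluster1-pxc-$i" --timeout=30m
done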

Scale Down the Cluster

Once all the Pods are running on the new storage it is time to scale down the cluster (if it was scaled up):

spec:
  pxc:
-   size: 5
+   size: 3


$ kubectl apply -f deploy/cr.yaml

Do not forget to clean up the PVCs for nodes 4 and 5. And done!

Conclusion

Changing the storage type is a simple task on the public clouds, but its simplicity is not yet matched by Kubernetes capabilities. The Kubernetes and Container Native landscape is evolving, and we hope to see this functionality soon. The way of changing the Storage Class described in this blog post can be applied to both the Percona Operator for PXC and the Percona Operator for MongoDB. If you have ideas on how to automate this and are willing to collaborate with us on the implementation, please submit an Issue to our public roadmap on GitHub.

Apr
07
2021
--

Percona Kubernetes Operators and Azure Blob Storage


Percona Kubernetes Operators allow users to simplify deployment and management of MongoDB and MySQL databases on Kubernetes. Both operators allow users to store backups on S3-compatible storage and leverage Percona XtraBackup and Percona Backup for MongoDB to deliver backup and restore functionality. Neither backup tool works with Azure Blob Storage, which is not compatible with the S3 protocol.

This blog post explains how to run Percona Kubernetes Operators along with MinIO Gateway on Azure Kubernetes Service (AKS) and store backups on Azure Blob Storage.


Setup

Prerequisites:

  • Azure account
  • Azure Blob Storage account and container (the Bucket in AWS terms)
  • Cluster deployed with Azure Kubernetes Service (AKS)

Deploy MinIO Gateway

I have prepared the manifests to deploy the MinIO Gateway to Kubernetes; you can find them in the GitHub repo here.

First, create a separate namespace:

kubectl create namespace minio-gw

Create the secret which contains credentials for Azure Blob Storage:

$ cat minio-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: minio-secret
stringData:
  AZURE_ACCOUNT_NAME: Azure_account_name
  AZURE_ACCOUNT_KEY: Azure_account_key

$ kubectl -n minio-gw apply -f minio-secret.yaml

Apply minio-gateway.yaml from the repository. This manifest does two things:

  1. Creates a MinIO Pod backed by a Deployment object
  2. Exposes this Pod on port 9000 as a ClusterIP through a Service object

$ kubectl -n minio-gw apply -f blog-data/operators-azure-blob/minio-gateway.yaml
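For reference, below is a minimal sketch of what such a manifest can contain. Treat it as an illustration only (the image and labels are placeholders; MinIO gateway mode reuses the Azure account name and key as its access credentials) and use the manifest from the repository for the actual deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio-gateway
  template:
    metadata:
      labels:
        app: minio-gateway
    spec:
      containers:
      - name: minio
        image: minio/minio            # pin a specific release in practice
        args: ["gateway", "azure"]    # run MinIO in Azure Blob gateway mode
        ports:
        - containerPort: 9000
        env:
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: minio-secret
              key: AZURE_ACCOUNT_NAME
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: minio-secret
              key: AZURE_ACCOUNT_KEY
---
apiVersion: v1
kind: Service
metadata:
  name: minio-gateway-svc
spec:
  type: ClusterIP
  selector:
    app: minio-gateway
  ports:
  - port: 9000
    targetPort: 9000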

It is also possible to use Helm Charts and deploy the Gateway with the MinIO Operator. You can read more about it here. Running a MinIO Operator might be a good choice, but it is overkill for this blog post.

Deploy PXC

Get the code from Github:

git clone -b v1.7.0 https://github.com/percona/percona-xtradb-cluster-operator

Deploy the bundle with Custom Resource Definitions:

cd percona-xtradb-cluster-operator 
kubectl apply -f deploy/bundle.yaml

Create the Secret object for backups. Use the same Azure Account Name and Key that you used to set up MinIO:

$ cat deploy/backup-s3.yaml
apiVersion: v1
kind: Secret
metadata:
  name: azure-backup
type: Opaque
data:
  AWS_ACCESS_KEY_ID: BASE64_ENCODED_AZURE_ACCOUNT_NAME
  AWS_SECRET_ACCESS_KEY: BASE64_ENCODED_AZURE_ACCOUNT_KEY
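The values in the data section must be base64-encoded; for example (the account name and key below are placeholders):

$ echo -n 'my_azure_account_name' | base64
$ echo -n 'my_azure_account_key' | base64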

Add the storage configuration into cr.yaml under spec.backup.storages:

storages:
  azure-minio:
    type: s3
    s3:
      bucket: test-azure-container
      credentialsSecret: azure-backup
      endpointUrl: http://minio-gateway-svc.minio-gw:9000

  • bucket is the container created on Azure Blob Storage.
  • endpointUrl must point to the MinIO Gateway Service that was created in the previous section.

Deploy the database cluster:

$ kubectl apply -f deploy/cr.yaml

Read more about the installation of the Percona XtraDB Cluster Operator in our documentation.

Take Backups and Restore

To take the backup or restore, follow the regular approach by creating corresponding

pxc-backup

or

pxc-restore

Custom Resources in Kubernetes. For example, to take the backup I use the following manifest:

$ cat deploy/backup/backup.yaml
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  name: backup1
spec:
  pxcCluster: cluster1
  storageName: azure-minio
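
To follow the backup progress, apply the manifest and watch the backup object (pxc-backup is the short name of the backup Custom Resource):

$ kubectl apply -f deploy/backup/backup.yaml
$ kubectl get pxc-backup backup1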

This creates the pxc-backup Custom Resource object, and the Operator uploads the backup to the container in my Storage account.

Read more about backup and restore functionality in the Percona Kubernetes Operator for Percona XtraDB Cluster documentation.

Conclusion

Even though Azure Blob Storage is not S3-compatible, the Cloud Native landscape provides production-ready tools for seamless integration. MinIO Gateway works with both Percona Kubernetes Operators for MySQL and MongoDB, enabling S3-like backup and restore functionality.

The Percona team is committed to delivering smooth integration of its software products with all major clouds. Adding support for Azure Blob Storage is on the roadmap of Percona XtraBackup and Percona Backup for MongoDB, as is the certification of both operators on Azure Kubernetes Service.

Apr
01
2021
--

Infinitely Scalable Storage with High Compression Feature


It is no secret that compute and storage costs are the main drivers of cloud bills. Migration of data from the legacy data center to the cloud looks appealing at first as it significantly reduces capital expense (CapEx) and keeps operational expenses (OpEx) under control. But once you see the bill, the lift-and-shift project does not look that promising anymore. See Percona’s recent open source survey, which shows that many organizations saw unexpected growth around cloud and data.

Storage growth is an organic process for the expanding business: more customers store more data, and more data needs more backups and disaster recovery storage for low RTO.

Today, the Percona Innovation Team, which is part of the Engineering organization, is proud to announce a new feature – High Compression. With this feature enabled, your MySQL databases will have infinite storage at zero cost.

The Problem

Our research team was digging into the problem of storage growth. They found that the storage growth of a successful business inevitably leads to an increase in the cloud bill. After two years of research we got the data we needed and the problem is now clear, as you can see on the chart below:

The correlation is clearly visible – the more data you have, the more you pay.

The Solution

Once our Innovation Team received the data, we started working day and night on the solution. The goal was to change the trend and break the correlation. That is how after two years, we are proud to share with the community the High Compression feature. You can see the comparison of the storage costs with and without this new feature below:

Option                                          Annual run rate
100 TB AWS EBS                                  $120,000
100 TB AWS S3 for backups                       $25,200
100 TB AWS EBS + High Compression               $1.2
100 TB AWS S3 for backups + High Compression    < $1

As you see it is a 100,000x difference! What is more interesting, the cost of the storage with the High Compression feature enabled always stays flat and the chart now looks like this:

Theory

Not many people know, but data on disks is stored as bits, which are 0s and 1s. They form the binary sequences which are translated into normal data.

After thorough research, we came to the conclusion that we can replace the 1s with 0s easily. The formula is simple:

f(1) = 0

So instead of storing all these 1s, our High Compression feature stores zeroes only:

 

Implementation

The component which does the conversion is called the Nullifier, and every bit of data goes through it. We are first implementing this feature in Percona Operator for Percona XtraDB Cluster and below is the technical view of how it is implemented in Kubernetes:

As you can see, all the data written by the user (all INSERT or UPDATE statements) goes through the Nullifier first, and only then is it stored on the Persistent Volume Claim (PVC). With the High Compression feature enabled, the size of the PVC can always be 1 GB.

Percona is an open source company and we are thrilled to share our code with everyone. You can see the Pull Request for the High Compression feature here. As you see in the PR, our feature provides the Nullifier through the underestimated and very powerful Blackhole engine.

  if [ "$HIGH_COMPRESSION" == 'yes' ]; then
      sed -r "s|^[#]?default_storage_engine=.*$|default_storage_engine=BLACKHOLE|" ${CFG} 1<>${CFG}
      grep -E -q "^[#]?pxc_strict_mode" "$CFG" || sed '/^\[mysqld\]/a pxc_strict_mode=PERMISSIVE\n' ${CFG} 1<>${CFG}
  fi

The High Compression feature will be enabled by default starting from PXC Operator version 1.8.0, but we have added the spec.pxc.highCompression flag into cr.yaml to disable this feature if needed.
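A sketch of how this might look in cr.yaml (assumed layout, derived from the flag name above):

spec:
  pxc:
    highCompression: false   # opt out of the feature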

Backups with the High Compression feature are blazing fast and take seconds with any amount of data. The challenge our Engineering team is working on now is recovery. The Nullifier does the job, but recovering the data is hard. We are confident that the De-Nullifier will be released in 1.8.0 as well.

Conclusion

Percona is spearheading innovation in the database technology field. The High Compression feature solves the storage growth problem and as a result, reduces the cloud bill significantly. The release of the Percona Kubernetes Operator for Percona XtraDB Cluster 1.8.0 is planned for mid-April, but this feature is already available in Tech Preview.

As a quick peek at our roadmap, we are glad to share that the Innovation Team has already started working on the High-Density feature, which will drastically reduce the compute footprint required to run MySQL databases.

Mar
22
2021
--

Storing Kubernetes Operator for Percona Server for MongoDB Secrets in Github


More and more companies are adopting GitOps as the way of implementing Continuous Deployment. Its elegant approach, built upon a well-known tool, wins the hearts of engineers. But even if your git repository is private, it is strongly discouraged to store keys and passwords in unencrypted form.

This blog post will show how easy it is to use GitOps and keep Kubernetes secrets for Percona Kubernetes Operator for Percona Server for MongoDB securely in the repository with Sealed Secrets or Vault Secrets Operator.

Sealed Secrets

Prerequisites:

  • Kubernetes cluster up and running
  • Github repository (optional)

Install Sealed Secrets Controller

Sealed Secrets relies on asymmetric cryptography (which is also used in TLS), where the private key (which in our case is stored in Kubernetes) can decrypt a message encrypted with the public key (which can safely be stored in a public git repository). To make this task easier, Sealed Secrets provides the kubeseal tool, which helps with the encryption of the secrets.

Install the Sealed Secrets controller into your Kubernetes cluster:

kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.15.0/controller.yaml

It will install the controller into the kube-system namespace and provide the Custom Resource Definition sealedsecrets.bitnami.com. All resources in Kubernetes with kind: SealedSecret will be handled by this Operator.

Download the kubeseal binary:

wget https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.15.0/kubeseal-linux-amd64 -O kubeseal
sudo install -m 755 kubeseal /usr/local/bin/kubeseal

Encrypt the Keys

In this example, I intend to store important secrets of the Percona Kubernetes Operator for Percona Server for MongoDB in git along with my manifests that are used to deploy the database.

First, I will seal the secret file with system users, which is used by the MongoDB Operator to manage the database. Normally it is stored in deploy/secrets.yaml.

kubeseal --format yaml < secrets.yaml  > blog-data/sealed-secrets/mongod-secrets.yaml

This command creates a file with encrypted contents; you can see it in the blog-data/sealed-secrets repository here. It is safe to store it publicly, as it can only be decrypted with the private key.
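The sealed file is itself a regular Kubernetes manifest; roughly, it looks like the sketch below (the encrypted values and namespace are placeholders):

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: my-secure-secret
  namespace: default
spec:
  encryptedData:
    MONGODB_USER_ADMIN_USER: AgBy3i4OJSWK...        # ciphertext, safe to commit
    MONGODB_USER_ADMIN_PASSWORD: AgCtr7Fg9mPq...    # ciphertext, safe to commit
  template:
    metadata:
      name: my-secure-secret
    type: Opaque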

Executing kubectl apply -f blog-data/sealed-secrets/mongod-secrets.yaml does the following:

  1. A sealedsecrets Custom Resource (CR) is created. You can see it by executing kubectl get sealedsecrets.
  2. The Sealed Secrets Operator receives the event that a new sealedsecrets CR is there and decrypts it with the private key.
  3. Once decrypted, a regular Secrets object is created which can be used as usual.

$ kubectl get sealedsecrets
NAME               AGE
my-secure-secret   20m

$ kubectl get secrets my-secure-secret
NAME               TYPE     DATA   AGE
my-secure-secret   Opaque   10     20m

Next, I will also seal the keys for my S3 bucket that I plan to use to store backups of my MongoDB database:

kubeseal --format yaml < backup-s3.yaml  > blog-data/sealed-secrets/s3-secrets.yaml
kubectl apply -f blog-data/sealed-secrets/s3-secrets.yaml

Vault Secrets Operator

Sealed Secrets is the simplest approach, but it is possible to achieve the same result with HashiCorp Vault and Vault Secrets Operator. It is a more advanced, mature, and feature-rich approach.

Prerequisites:

Vault Secrets Operator also relies on a Custom Resource, but all the keys are stored in HashiCorp Vault.

Preparation

Create a policy on the Vault for the Operator:

cat <<EOF | vault policy write vault-secrets-operator -
path "kvv2/data/*" {
  capabilities = ["read"]
}
EOF

The policy might look a bit different, depending on where your secrets are stored.

Create and fetch the token for the policy:

$ vault token create -period=24h -policy=vault-secrets-operator

Key                  Value                                                                                                                                                                                        
---                  -----                                                                                               
token                s.0yJZfCsjFq75GiVyKiZgYVOm
...

Write down the token, as you will need it in the next step.

Create the Kubernetes Secret so that the Operator can authenticate with the Vault:

export VAULT_TOKEN=s.0yJZfCsjFq75GiVyKiZgYVOm
export VAULT_TOKEN_LEASE_DURATION=86400

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: vault-secrets-operator
type: Opaque
data:
  VAULT_TOKEN: $(echo -n "$VAULT_TOKEN" | base64)
  VAULT_TOKEN_LEASE_DURATION: $(echo -n "$VAULT_TOKEN_LEASE_DURATION" | base64)
EOF

Deploy Vault Secrets Operator

It is recommended to deploy the Operator with Helm, but first we need to create the values.yaml file to configure the Operator.

environmentVars:
  - name: VAULT_TOKEN
    valueFrom:
      secretKeyRef:
        name: vault-secrets-operator
        key: VAULT_TOKEN
  - name: VAULT_TOKEN_LEASE_DURATION
    valueFrom:
      secretKeyRef:
        name: vault-secrets-operator
        key: VAULT_TOKEN_LEASE_DURATION
vault:
  address: "http://vault.vault.svc:8200"

The environment variables point to the Secret that was created in the previous section to authenticate with Vault. We also need to provide the Vault address for the Operator to retrieve the secrets.

Now we can deploy the Vault Secrets Operator:

helm repo add ricoberger https://ricoberger.github.io/helm-charts
helm repo update

helm upgrade --install vault-secrets-operator ricoberger/vault-secrets-operator -f blog-data/sealed-secrets/values.yaml

Give me the Secret

I have a key created in my HashiCorp Vault:

$ vault kv get kvv2/mongod-secret
…
Key                                 Value
---                                 -----                                                                                                                                                                         
MONGODB_BACKUP_PASSWORD             <>
MONGODB_CLUSTER_ADMIN_PASSWORD      <>
MONGODB_CLUSTER_ADMIN_USER          <>
MONGODB_CLUSTER_MONITOR_PASSWORD    <>
MONGODB_CLUSTER_MONITOR_USER        <>                                                                                                                                                               
MONGODB_USER_ADMIN_PASSWORD         <>
MONGODB_USER_ADMIN_USER             <>
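
For reference, a secret like this can be written with the Vault KV CLI; the values below are placeholders:

$ vault kv put kvv2/mongod-secret \
    MONGODB_BACKUP_PASSWORD='changeme' \
    MONGODB_CLUSTER_ADMIN_USER='clusterAdmin' \
    MONGODB_CLUSTER_ADMIN_PASSWORD='changeme' \
    MONGODB_CLUSTER_MONITOR_USER='clusterMonitor' \
    MONGODB_CLUSTER_MONITOR_PASSWORD='changeme' \
    MONGODB_USER_ADMIN_USER='userAdmin' \
    MONGODB_USER_ADMIN_PASSWORD='changeme'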

It is time to create the secret out of it. First, we will create a Custom Resource object of kind: VaultSecret.

$ cat blog-data/sealed-secrets/vs.yaml
apiVersion: ricoberger.de/v1alpha1
kind: VaultSecret
metadata:
  name: my-secure-secret
spec:
  path: kvv2/mongod-secret
  type: Opaque

$ kubectl apply -f blog-data/sealed-secrets/vs.yaml

The Operator will connect to HashiCorp Vault and create a regular Secret object automatically:

$ kubectl get vaultsecret
NAME               SUCCEEDED   REASON    MESSAGE              LAST TRANSITION   AGE
my-secure-secret   True        Created   Secret was created   47m               47m

$ kubectl get secret  my-secure-secret
NAME               TYPE     DATA   AGE
my-secure-secret   Opaque   7      47m

Deploy MongoDB Cluster

Now that the secrets are in place, it is time to deploy the Operator and the DB cluster:

kubectl apply -f blog-data/sealed-secrets/bundle.yaml
kubectl apply -f blog-data/sealed-secrets/cr.yaml

The cluster will be up in a minute or two and use secrets we deployed.

By the way, my cr.yaml deploys a MongoDB cluster with two shards. Multiple shards support was added in version 1.7.0 of the Operator; I encourage you to try it out. Learn more about it here: Percona Server for MongoDB Sharding.

Mar
16
2021
--

Docker nabs $23M Series B as new developer focus takes shape

It was easy to wonder what would become of Docker after it sold its enterprise business in 2019, but it regrouped last year as a cloud native container company focused on developers, and the new approach appears to be bearing fruit. Today, the company announced a $23 million Series B investment.

Tribe Capital led the round with participation from existing investors Benchmark and Insight Partners. Docker has now raised a total of $58 million including the $35 million investment it landed the same day it announced the deal with Mirantis.

To be sure, the company had a tempestuous 2019 when it changed CEOs twice, sold the enterprise division and looked to reestablish itself with a new strategy. While the pandemic made 2020 a trying time for everyone, Docker CEO Scott Johnston says that in spite of that, the strategy has begun to take shape.

“The results we think speak volumes. Not only was the strategy strong, but the execution of that strategy was strong as well,” Johnston told me. He indicated that the company added 1.7 million new developer registrations for the free version of the product for a total of more than 7.3 million registered users on the community edition.

As with any open-source project, the goal is to popularize the community project and turn a small percentage of those users into paying customers, but Docker’s problem prior to 2019 had been finding ways to do that. While he didn’t share specific numbers, Johnston indicated that annual recurring revenue (ARR) grew 170% last year, suggesting that they are beginning to convert more successfully.

Johnston says that's because they have found a way to convert a certain class of developer into paying customers in spite of a free version being available. "Yes, there's a lot of upstream open-source technologies, and there are users that want to hammer together their own solutions. But we are also seeing these eight to 10 person 'two-pizza teams' who want to focus on building applications, and so they're willing to pay for a service," he said.

That open-source model tends to get the attention of investors because it comes with that built-in action at the top of the sales funnel. Tribe’s Arjun Sethi, whose firm led the investment, says his company actually was a Docker customer before investing in the company and sees a lot more growth potential.

“Tribe focuses on identifying N-of-1 companies — top-decile private tech firms that are exhibiting inflection points in their growth, with the potential to scale toward outsized outcomes with long-term venture capital. Docker fits squarely into this investment thesis [ … ],” Sethi said in a statement.

Johnston says as they look ahead post-pandemic, he’s learned a lot since his team moved out of the office last year. After surveying employees, they were surprised to learn that most have been happier working at home, having more time to spend with family, while taking away a grueling commute. As a result, he sees going virtual first, even after it’s safe to reopen offices.

That said, he is planning to offer a way to get teams together for in-person gatherings and a full company get-together once a year.

“We’ll be virtual first, but then with the savings of the real estate that we’re no longer paying for, we’re going to bring people together and make sure we have that social glue,” he said.



Mar
10
2021
--

A Peek at Percona Kubernetes Operator for Percona Server for MongoDB New Features


The latest 1.7.0 release of Percona Kubernetes Operator for Percona Server for MongoDB came out just recently and enables users to run multi-shard clusters, add custom sidecar containers, and automatically clean up Persistent Volume Claims after cluster deletion.

Today we will look into these new features, the use cases, and highlight some architectural and technical decisions we made when implementing them.

Sharding

The 1.6.0 release of our Operator introduced single shard support, which we highlighted in this blog post and explained why it makes sense. But horizontal scaling is not possible without support for multiple shards.

Adding a Shard

A new shard is just a new ReplicaSet which can be added under spec.replsets in cr.yaml:

spec:
  ...
  replsets:
  - name: rs0
    size: 3
  ....
  - name: rs1
    size: 3
  ...

Read more on how to configure sharding.

In the Kubernetes world, a MongoDB ReplicaSet is a StatefulSet with the number of pods specified in the spec.replsets.[].size variable.

Once pods are up and running, the Operator does the following:

  • Initiates ReplicaSet by connecting to newly created pods running mongod
  • Connects to mongos and adds a shard with sh.addShard() command


Then the output of db.adminCommand({ listShards:1 }) will look like this:

        "shards" : [
                {
                        "_id" : "replicaset-1",
                        "host" : "replicaset-1/percona-cluster-replicaset-1-0.percona-cluster-replicaset-1.default.svc.cluster.local:27017,percona-cluster-replicaset-1-1.percona-cluster-replicaset-1.default.svc.cluster.local:27017,percona-cluster-replicaset-1-2.percona-cluster-replicaset-1.default.svc.cluster.local:27017",
                        "state" : 1
                },
                {
                        "_id" : "replicaset-2",
                        "host" : "replicaset-2/percona-cluster-replicaset-2-0.percona-cluster-replicaset-2.default.svc.cluster.local:27017,percona-cluster-replicaset-2-1.percona-cluster-replicaset-2.default.svc.cluster.local:27017,percona-cluster-replicaset-2-2.percona-cluster-replicaset-2.default.svc.cluster.local:27017",
                        "state" : 1
                }
        ],


Deleting a Shard

Percona Operators are built to simplify the deployment and management of the databases on Kubernetes. Our goal is to provide resilient infrastructure, but the operator does not manage the data itself. Deleting a shard requires moving the data to another shard before removal, but there are a couple of caveats:

  • Sometimes data is not moved automatically by MongoDB – unsharded collections or jumbo chunks
  • We hit the storage problem – what if another shard does not have enough disk space to hold the data?

shard does not have enough disk space to hold the data

There are a few choices:

  1. Do not touch the data. The user needs to move the data manually and then the operator removes the empty shard.
  2. The operator decides where to move the data and deals with storage issues by upscaling if necessary.
    • Upscaling the storage can be tricky, as it requires certain capabilities from the Container Storage Interface (CSI) and the underlying storage infrastructure.

For now, we decided to pick option #1 and won’t touch the data, but in future releases, we would like to work with the community to introduce fully-automated shard removal.

When the user wants to remove the shard now, we first check if there are any non-system databases present on the ReplicaSet. If there are none, the shard can be removed:

func (r *ReconcilePerconaServerMongoDB) checkIfPossibleToRemove(cr *api.PerconaServerMongoDB, usersSecret *corev1.Secret, rsName string) error {
  systemDBs := map[string]struct{}{
    "local": {},
    "admin": {},
    "config":  {},
  }


Custom Sidecars

The sidecar container pattern allows users to extend the application without changing the main container image. They leverage the fact that all containers in the pod share storage and network resources.

Percona Operators have built-in support for Percona Monitoring and Management to gain monitoring insights for the databases on Kubernetes, but sometimes users may want to expose metrics to other monitoring systems. Let's see how mongodb_exporter can expose metrics while running as a sidecar along with ReplicaSet containers.

1. Create the monitoring user that the exporter will use to connect to MongoDB. Connect to mongod in the container and create the user:

> db.getSiblingDB("admin").createUser({
    user: "mongodb_exporter",
    pwd: "mysupErpassword!123",
    roles: [
      { role: "clusterMonitor", db: "admin" },
      { role: "read", db: "local" }
    ]
  })

2. Create the Kubernetes secret with this login and password. Encode both the username and password with base64:

$ echo -n mongodb_exporter | base64
bW9uZ29kYl9leHBvcnRlcg==
$ echo -n 'mysupErpassword!123' | base64
bXlzdXBFcnBhc3N3b3JkITEyMw==

Put these into the secret and apply:

$ cat mongoexp_secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: mongoexp-secret
data:
  username: bW9uZ29kYl9leHBvcnRlcg==
  password: bXlzdXBFcnBhc3N3b3JkITEyMw==

$ kubectl apply -f mongoexp_secret.yaml
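
Alternatively, the same Secret can be created directly from the command line, which handles the base64 encoding for you:

$ kubectl create secret generic mongoexp-secret \
    --from-literal=username=mongodb_exporter \
    --from-literal=password='mysupErpassword!123'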

3. Add a sidecar for mongodb_exporter into cr.yaml and apply:

replsets:
- name: rs0
  ...
  sidecars:
  - image: bitnami/mongodb-exporter:latest
    name: mongodb-exporter
    env:
    - name: EXPORTER_USER
      valueFrom:
        secretKeyRef:
          name: mongoexp-secret
          key: username
    - name: EXPORTER_PASS
      valueFrom:
        secretKeyRef:
          name: mongoexp-secret
          key: password
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: MONGODB_URI
      value: "mongodb://$(EXPORTER_USER):$(EXPORTER_PASS)@$(POD_IP):27017"
    args: ["--web.listen-address=$(POD_IP):9216"]

$ kubectl apply -f deploy/cr.yaml

All it takes now is to configure the monitoring system to fetch the metrics for each mongod Pod. For example, prometheus-operator will start fetching metrics once annotations are added to ReplicaSet pods:

replsets:
- name: rs0
  ...
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9216'

PVCs Clean Up

Running CI/CD pipelines that deploy MongoDB clusters on Kubernetes is a common thing. Once these clusters are terminated, the Persistent Volume Claims (PVCs) are not. We have now added automation that removes PVCs after cluster deletion. We rely on Kubernetes Finalizers, which are asynchronous pre-delete hooks. In our case, we hook the finalizer to the Custom Resource (CR) object which is created for the MongoDB cluster.


A user can enable the finalizer through cr.yaml in the metadata section:

metadata:
  name: my-cluster-name
  finalizers:
     - delete-psmdb-pvc
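
With the finalizer in place, deleting the cluster's Custom Resource also removes its PVCs; for example (psmdb is the short name of the cluster CR):

$ kubectl delete psmdb my-cluster-name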

Conclusion

Percona is committed to providing production-grade database deployments on Kubernetes. Our Percona Kubernetes Operator for Percona Server for MongoDB is a feature-rich tool to deploy and manage your MongoDB clusters with ease. Our Operator is free and open source. Try it out by following the documentation here, or help us make it better by contributing your code and ideas to our GitHub repository.

Feb
24
2021
--

Google Cloud puts its Kubernetes Engine on autopilot

Google Cloud today announced a new operating mode for its Kubernetes Engine (GKE) that turns over the management of much of the day-to-day operations of a container cluster to Google’s own engineers and automated tools. With Autopilot, as the new mode is called, Google manages all of the Day 2 operations of managing these clusters and their nodes, all while implementing best practices for operating and securing them.

This new mode augments the existing GKE experience, which already managed most of the infrastructure of standing up a cluster. This ‘standard’ experience, as Google Cloud now calls it, is still available and allows users to customize their configurations to their heart’s content and manually provision and manage their node infrastructure.

Drew Bradstock, the Group Product Manager for GKE, told me that the idea behind Autopilot was to bring together all of the tools that Google already had for GKE and bring them together with its SRE teams who know how to run these clusters in production — and have long done so inside of the company.

“Autopilot stitches together auto-scaling, auto-upgrades, maintenance, Day 2 operations and — just as importantly — does it in a hardened fashion,” Bradstock noted. “[…] What this has allowed our initial customers to do is very quickly offer a better environment for developers or dev and test, as well as production, because they can go from Day Zero and the end of that five-minute cluster creation time, and actually have Day 2 done as well.”


From a developer’s perspective, nothing really changes here, but this new mode does free up teams to focus on the actual workloads and less on managing Kubernetes clusters. With Autopilot, businesses still get the benefits of Kubernetes, but without all of the routine management and maintenance work that comes with that. And that’s definitely a trend we’ve been seeing as the Kubernetes ecosystem has evolved. Few companies, after all, see their ability to effectively manage Kubernetes as their real competitive differentiator.

All of that comes at a price, of course, in addition to the standard GKE flat fee of $0.10 per hour and cluster (there’s also a free GKE tier that provides $74.40 in billing credits), plus additional fees for resources that your clusters and pods consume. Google offers a 99.95% SLA for the control plane of its Autopilot clusters and a 99.9% SLA for Autopilot pods in multiple zones.


Autopilot for GKE joins a set of container-centric products in the Google Cloud portfolio that also include Anthos for running in multi-cloud environments and Cloud Run, Google’s serverless offering. “[Autopilot] is really [about] bringing the automation aspects in GKE we have for running on Google Cloud, and bringing it all together in an easy-to-use package, so that if you’re newer to Kubernetes, or you’ve got a very large fleet, it drastically reduces the amount of time, operations and even compute you need to use,” Bradstock explained.

And while GKE is a key part of Anthos, that service is more about bringing Google’s config management, service mesh and other tools to an enterprise’s own data center. Autopilot for GKE is, at least for now, only available on Google Cloud.

“On the serverless side, Cloud Run is really, really great for an opinionated development experience,” Bradstock added. “So you can get going really fast if you want an app to be able to go from zero to 1000 and back to zero — and not worry about anything at all and have it managed entirely by Google. That’s highly valuable and ideal for a lot of development. Autopilot is more about simplifying the entire platform people work on when they want to leverage the Kubernetes ecosystem, be a lot more in control and have a whole bunch of apps running within one environment.”

 

Feb
24
2021
--

Point-In-Time Recovery in Kubernetes Operator for Percona XtraDB Cluster – Architecture Decisions


Point-In-Time Recovery (PITR) for MySQL databases is an essential feature that covers common use cases, like recovering to the latest possible transaction or rolling the database back to a specific date before some bad query was executed. Percona Kubernetes Operator for Percona XtraDB Cluster (PXC) added support for PITR in version 1.7, and in this blog post we are going to look into the technical details and decisions we made to implement this feature.

Architecture Decisions

Store Binary Logs on Object Storage

MySQL uses binary logs to perform point-in-time recovery. Usually, they are stored locally along with the data, but it is not an option for us:

  • We run the cluster and we cannot rely on a single node’s local storage.
  • The cloud-native world lives in an ephemeral dimension, where nodes and pods can be terminated and S3-compatible storage is a de facto standard to store data.
  • We should be able to recover the data to another Kubernetes cluster in case of a disaster.

We have decided to add a new Binlog Uploader Pod, which connects to the available PXC member and uploads binary logs to S3. Under the hood, it relies on the mysqlbinlog utility.

Use Global Transaction ID

Binary logs on the clustered nodes are not synced and can have different names and contents. This becomes a problem for the Uploader, as it can connect to different PXC nodes for various reasons.

To solve this problem, we decided to rely on the Global Transaction ID (GTID). It is a unique transaction identifier; it is unique not only to the server on which it originated, but across all servers in a given replication topology. With the GTID captured in binary logs, we can identify any transaction without depending on the filename or its contents. This allows us to continue streaming binlogs from any PXC member at any moment.

User-Defined Functions

We have a unique identifier for every transaction, but the mysqlbinlog utility still doesn’t have the functionality to determine which binary log file contains which GTID. We decided to extend MySQL with a few User-Defined Functions and added them to Percona Server for MySQL and Percona XtraDB Cluster version 8.0.21.

get_gtid_set_by_binlog()

This function returns all GTIDs that are stored inside the given binlog file. We put the GTID set list into a new file next to the binary log on S3.

get_binlog_by_gtid_set()

This function takes GTID set as an input and returns a binlog filename which is stored locally. We use it to figure out which GTIDs are already uploaded and which binlog to upload next. 
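A quick example of how these functions can be called from a MySQL client (the file name and GTID set below are placeholders):

-- which GTIDs does a given binary log file contain?
SELECT get_gtid_set_by_binlog('binlog.000042');

-- which local binary log file contains a given GTID set?
SELECT get_binlog_by_gtid_set('a8e657ab-5a47-11eb-bea2-f3554c9e5a8d:15');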



Find the node with the oldest binary log

Our quality assurance team caught a bug before the release which can happen in the cluster only:

  • Add a new node to the Percona XtraDB Cluster (for example scale up from 3 to 5 nodes).
  • Binlog Uploader Pod tries to execute get_binlog_by_gtid_set on the new node but gets an error:
2021/01/19 11:23:19 ERROR: collect binlog files: get last uploaded binlog name by gtid set: scan binlog: sql: Scan error on column index 0, name "get_binlog_by_gtid_set('a8e657ab-5a47-11eb-bea2-f3554c9e5a8d:15')": converting NULL to string is unsupported

The error is valid, as this node is new and there are no binary log files that have the GTID set that the Uploader got from S3. If you look into this pull request, the quick patch is to always pick the oldest node in the array, in other words the node which most likely has the binary logs we need. In the next release of the Operator, we will add more sophisticated logic to reliably discover the node which has the oldest binary logs.

Storageless binlog uploader

The size of binlogs depends on the cluster usage patterns, so it is hard to predict the size of the storage or memory required for them. We decided to take this complexity away by making our Binary Log Uploader Pod completely storageless. mysqlbinlog can only store a remote binlog into files, but we need to put it on S3. To get there, we decided to use a named pipe, or FIFO special file. The mysqlbinlog utility writes the binary log file to a named pipe, and our Uploader reads from it and streams the data directly to S3.
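Conceptually the flow looks like the sketch below, where nothing is written to local disk. This is an illustration only: the real Uploader streams the raw binlog through the FIFO with its own S3 client, while here mysqlbinlog emits its decoded output and a generic S3 CLI reads the pipe; all host, user, and bucket names are placeholders:

# create the FIFO
mkfifo /tmp/binlog.fifo
# reader: stream whatever arrives in the pipe straight to S3 (never stored on disk)
aws s3 cp - s3://my-backup-bucket/binlogs/binlog.000042 < /tmp/binlog.fifo &
# writer: pull the binary log from a PXC member and push it into the pipe
mysqlbinlog --read-from-remote-server --host=cluster1-pxc-0 \
    --user=uploader --password="$MYSQL_PWD" binlog.000042 > /tmp/binlog.fifo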

Also, the storageless design means that we never store any state between Uploader restarts. Basically, state is not needed: we only need to know which GTIDs are already uploaded, and we have this data in the remote S3 bucket. Such a design enables a continuous upload flow of binlogs.

Binlog upload delay

S3 protocol expects that the file is completely uploaded. If the file upload is interrupted (let’s say Uploader Pod is evicted), the file will not be accessible/visible on S3. Potentially we can lose many hours of binary logs because of such interruptions. That’s why we need to split the binlog stream into files and upload them separately.

One of the options that users can configure when enabling point-in-time recovery in Percona XtraDB Cluster Operator is timeBetweenUploads. It sets the number of seconds between uploads for the Binlog Uploader Pod. By default, we set it to 60 seconds, but it can go down to one second. We do not recommend setting it too low, as every invocation of the Uploader leads to the FLUSH BINARY LOGS command being executed on the PXC node. We need to flush the logs to close the binary log file so we can upload it to external storage, but doing it frequently may negatively affect IO and, as a result, database performance.
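In cr.yaml this roughly corresponds to the following section (the storage name is a placeholder and must match one of the storages configured under spec.backup.storages):

spec:
  backup:
    pitr:
      enabled: true
      storageName: s3-us-west     # S3-compatible storage for binlogs
      timeBetweenUploads: 60      # seconds between Uploader invocations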

Recovery

It is all about recovery and it has two steps:

  1. Recover the cluster from a full backup
  2. Apply binary logs

We already have the functionality to restore from a full backup (see here), so let’s get to applying the binary logs.

First, we need to figure out from which GTID set we should start applying binary logs; in other words, where do we start? As we rely on the Percona XtraBackup utility to take full MySQL backups, what we need to do is read the xtrabackup_info file, which has lots of useful metadata. We already have this file on S3 next to the full backup.

Second, find the binlog which has the GTID set we need. As you remember, we store a file with binlog’s GTID sets on S3 already, so it boils down to reading these files.

Third, download the binary logs and apply them. Here we rely on mysqlbinlog as well, which has the flags we need, like --stop-datetime, which stops recovery when the event with a specific timestamp is caught in the log.
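For illustration, a point-in-time restore request might look roughly like the sketch below; the exact field names can differ between Operator versions, so treat this as an assumption and check the restore documentation:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
spec:
  pxcCluster: cluster1
  backupName: backup1            # full backup to start from
  pitr:
    type: date                   # or "latest"
    date: "2021-02-20 10:00:00"  # recover up to this point in time
    backupSource:
      storageName: s3-us-west    # where the binlogs are stored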


Conclusion

MySQL is more than 25 years old and has a great tooling ecosystem established around it, but as we saw in this blog post, not all these tools are cloud-native ready. Percona engineering teams are committed to providing users the same features across various environments, whether it is a bare-metal installation in the data center or cutting edge Kubernetes in the cloud.

Feb
12
2021
--

Tame Kubernetes Costs with Percona Monitoring and Management and Prometheus Operator


More and more companies are adopting Kubernetes, but after some time they see unexpected growth in cloud costs. Engineering teams did their part in setting up auto-scalers, but the cloud bill is still growing. Today we are going to see how Percona Monitoring and Management (PMM) can help with monitoring Kubernetes and reducing the costs of the infrastructure.

Get the Metrics

Overview

Prometheus Operator is a great tool to monitor Kubernetes, as it deploys a full monitoring stack (Prometheus, Grafana, Alertmanager, node exporters) and works out of the box. But if you have multiple Kubernetes clusters, it would be great to have a single pane of glass from which to monitor them all.

To get there I will have Prometheus Operator running on each cluster and pushing metrics to my PMM server. Metrics will be stored in VictoriaMetrics time-series DB, which PMM uses by default since the December 2020 release of version 2.12.


PMM Server

I followed this manual to the letter to install my PMM Server with Docker. Don’t forget to open the HTTPS port on your firewall so that you can reach the UI from your browser, and so that the Kubernetes clusters can push their metrics to VictoriaMetrics through NGINX.

Prometheus Operator

On each Kubernetes cluster, I will now install Prometheus Operator to scrape the metrics and send them to PMM. Bear in mind that the Helm charts are stored in the prometheus-community repo.

Add the Helm repository:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

Prepare the configuration before installing the operator:

$ cat values.yaml
serviceAccounts:
  alertmanager:
    create: false

alertmanager:
  enabled: false

configmapReload:
  alertmanager:
    enabled: false

extraScrapeConfigs: |
  remote_write:
    - url: https://{PMM_USER}:{PMM_PASS}@{YOUR_PMM_HOST}/victoriametrics/api/v1/write

server:
  global:
    external_labels:
      kubernetes_cluster_name: {UNIQUE_K8S_LABEL}

  • Disable Alertmanager, as I will rely on PMM
  • Add a remote_write section to write metrics to PMM’s VictoriaMetrics storage
    • Use your PMM user and password to authenticate. The default username and password are admin/admin. It is highly recommended to change the defaults, see how here.
    • The /victoriametrics endpoint is exposed through NGINX on the PMM server
    • If you use https and a self-signed certificate you may need to disable TLS verification:
      tls_config:
        insecure_skip_verify: true
  • The external_labels section is important: it labels all the metrics sent from Prometheus. Each cluster must have a unique kubernetes_cluster_name label to distinguish metrics once they are merged in VictoriaMetrics.

Create namespace and deploy

kubectl create namespace prometheus
helm install prometheus prometheus-community/prometheus -f values.yaml  --namespace=prometheus
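
After a minute or so the Prometheus Pods should be up; a quick check (Pod names vary with the release name):

$ kubectl -n prometheus get pods
$ helm -n prometheus status prometheus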

 


Check

  • PMM Server is up – check
  • Prometheus Operators run on Kubernetes Clusters – check

Now let’s check if metrics are getting to PMM:

  • Go to PMM Server UI
  • On the left pick Explore


  • Run the query kube_node_info{kubernetes_cluster_name="UNIQUE_K8S_LABEL"}

It should return information about the Nodes running on the cluster with UNIQUE_K8S_LABEL. If it does, all good: the metrics are there.

Monitor the Costs

The main reasons for the growth of the cloud bill are computing and storage. Kubernetes can scale up by adding more and more nodes, skyrocketing compute costs.

We are going to add two dashboards to the PMM Server which would equip us with a detailed understanding of how resources are used and what should be tuned to reduce the number of nodes in the cluster or change instance types accordingly:

  1. Cluster overview dashboard
  2. Namespace and Pods dashboard

Import these dashboards in PMM:


Dashboard #1 – Cluster Overview

The goal of this dashboard is to provide a quick overview of the cluster and its workloads.


The cluster in the screenshot has some room for improvement in utilization. It has a capacity of 1.6 thousand CPU cores but utilizes only 146 cores (~9%). Memory utilization is better, at ~62%, but can be improved as well.

Quick take:

  • It is possible to reduce the number of nodes and get utilization to at least 80%
  • Workloads in this cluster look mostly memory-bound, so it would be wiser to run nodes with more memory and fewer CPUs.

The graphs in the CPU/Mem Request/Limit/Capacity section give a detailed view of resource usage over time:

Another two interesting graphs show us the top 20 namespaces that are wasting resources. This is calculated as the difference between requests and real utilization for CPU and Memory. The values on this graph can be negative if requests for the containers are not set.

This dashboard also has a graph showing persistent volume claims and their states. It can potentially help to reduce the number of volumes spun up on the cloud.

Dashboard #2 – Namespace and Pod

Now that we have an overview, it is time to dive deeper into the details. At the top, this dashboard allows the user to choose the Cluster, the Namespace, and the Pod.

At first, the user sees Namespace details: Quotas (might be empty if Resource Quotas are not set for the namespace), requests, limits, and real usage for CPU, Memory, Pods, and Persistent Volume Claims.


The Namespace on the screenshot utilizes almost zero CPU cores but requests 20+ cores. If requests are tuned properly, then the capacity required to run the workloads would drop and the number of nodes can be reduced.

The next valuable insight that the user can pick from this dashboard is real Pod utilization – CPU, Memory, Network, and disks (only local storage).


In the case above you can see CPU and Memory container-level utilization for Prometheus Pod, which is shipping the metrics on one of my Kubernetes clusters.

Summary

This blog post equips you with a design for collecting metrics from multiple Kubernetes clusters in a single time-series database and exposing them on the Percona Monitoring and Management UI through dashboards to analyze and gain insights. These insights help you drive your infrastructure costs down and highlight issues on the clusters.

Also, consider PMM on Kubernetes for monitoring your databases: see our demo here, and contact Percona if you are interested in learning more about how to become a Percona customer. We are here to help!



Feb
08
2021
--

Container security acquisitions increase as companies accelerate shift to cloud

Last week, another container security startup came off the board when Rapid7 bought Alcide for $50 million. The purchase is part of a broader trend in which larger companies are buying up cloud-native security startups at a rapid clip. But why is there so much M&A action in this space now?

Palo Alto Networks was first to the punch, grabbing Twistlock for $410 million in May 2019. VMware struck a year later, snaring Octarine. Cisco followed with PortShift in October and Red Hat snagged StackRox last month before the Rapid7 response last week.

This is partly because many companies chose to become cloud-native more quickly during the pandemic. This has created a sharper focus on security, but it would be a mistake to attribute the acquisition wave strictly to COVID-19, as companies were shifting in this direction pre-pandemic.

It’s also important to note that security startups that cover a niche like container security often reach market saturation faster than companies with broader coverage because customers often want to consolidate on a single platform, rather than dealing with a fragmented set of vendors and figuring out how to make them all work together.

Containers provide a way to deliver software by breaking down a large application into discrete pieces known as microservices. These are packaged and delivered in containers. Kubernetes provides the orchestration layer, determining when to deliver the container and when to shut it down.

This level of automation presents a security challenge, making sure the containers are configured correctly and not vulnerable to hackers. With myriad switches this isn’t easy, and it’s made even more challenging by the ephemeral nature of the containers themselves.

Yoav Leitersdorf, managing partner at YL Ventures, an Israeli investment firm specializing in security startups, says these challenges are driving interest in container startups from large companies. “The acquisitions we are seeing now are filling gaps in the portfolio of security capabilities offered by the larger companies,” he said.
