Jun
23
2021
--

MySQL on Kubernetes with GitOps

MySQL on Kubernetes with GitOps

GitOps workflow was introduced by WeaveWorks as a way to implement Continuous Deployment for cloud-native applications. This technique quickly found its way into devops and developer’s hearts as it greatly simplifies the application delivery pipeline: the change in the manifests in the git repository is reflected in Kubernetes right away. With GitOps there is no need to provide access to the cluster for the developer as all the actions are executed by the Operator.

This blog post is a guide on how to deploy Percona Distribution for MySQL on Kubernetes with Flux – GitOps Operator that keeps your cluster state in sync with the Git repository.

Percona Distribution for MySQL on Kubernetes with Flux

In a nutshell, the flow is the following:

  1. Developer triggers the change in the GitHub repository
  2. Flux Operator:
    1. detects the change
    2. deploys Percona Distribution for MySQL Operator
    3. creates the Custom Resource, which triggers the creation of Percona XtraDB Cluster and HAProxy pods

The result is a fully working MySQL service deployed without talking to Kubernetes API directly.

Preparation

Prerequisites:

  • Kubernetes cluster
  • Github user and account
    • For this blog post, I used the manifests from this repository 

It is a good practice to create a separate namespace for Flux:

$ kubectl create namespace gitops

Installing and managing Flux is easier with

fluxctl

. In Ubuntu, I use snap to install tools, for other operating systems please refer to the manual here.

$ sudo snap install fluxctl --classic

Install Flux operator to your Kubernetes cluster:

$ fluxctl install --git-email=your@email.com --git-url=git@github.com:spron-in/blog-data.git --git-path=gitops-mysql --manifest-generation=true --git-branch=master --namespace=gitops | kubectl apply -f -

GitHub Sync

As per configuration, Flux will monitor the changes in the spron-in/blog-data repository continuously and sync the state. It is required to grant access to Flux to the repo.

Get the public key that was generated during the installation:

$ fluxctl identity --k8s-fwd-ns gitops

Copy the key, add it as Deploy key with write access in GitHub. Go to Settings -> Deploy keys -> Add deploy key:

Action

All set. Flux reconcile loops check the state for changes every five minutes. To trigger synchronization right away run:

$ fluxctl sync --k8s-fwd-ns gitops

In my case I have two YAMLs in the repo:

  • bundle.yaml

    – installs the Operator, creates the Custom Resource Definitions (CRDs)

  • cr.yaml

    – deploys PXC and HAProxy pods

Flux is going to deploy them both.

$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   0          26m
cluster1-haproxy-1                                 2/2     Running   0          25m
cluster1-pxc-0                                     1/1     Running   0          26m
cluster1-pxc-1                                     1/1     Running   0          25m
cluster1-pxc-2                                     1/1     Running   0          23m
percona-xtradb-cluster-operator-79966668bd-95plv   1/1     Running   0          26m

Now let’s add one more HAProxy Pod by changing

spec.haproxy.size

from 2 to 3 in

cr.yaml

. After that commit and push the changes. In a production-grade scenario, the Pull Request will go through a thorough review, in my case I push directly to the main branch.

$ git commit cr.yaml -m 'increase haproxy size from 2 to 3'
$ git push
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 2 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 385 bytes | 385.00 KiB/s, done.
Total 4 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/spron-in/blog-data
   e1a27b8..d555c77  master -> master

Either trigger the sync with

fluxctl sync

command or wait for approximately 5 minutes for Flux reconcile loop to detect the changes. In the logs of the Flux Operator you will see the event:

ts=2021-06-15T12:59:08.267469963Z caller=loop.go:134 component=sync-loop event=refreshed url=ssh://git@github.com/spron-in/blog-data.git branch=master HEAD=d555c77c19ea9d1685392680186e1491905401cc
ts=2021-06-15T12:59:08.270678093Z caller=sync.go:61 component=daemon info="trying to sync git changes to the cluster" old=e1a27b8a81e640d3bee9bc2e2c31f9c4189e898a new=d555c77c19ea9d1685392680186e1491905401cc
ts=2021-06-15T12:59:08.844068322Z caller=sync.go:540 method=Sync cmd=apply args= count=9
ts=2021-06-15T12:59:09.097835721Z caller=sync.go:606 method=Sync cmd="kubectl apply -f -" took=253.684342ms err=null output="serviceaccount/percona-xtradb-cluster-operator unchanged\nrole.rbac.authorization.k8s.io/percona-xtradb-cluster-operator unchanged\ncustomresourcedefinition.apiextensions.k8s.io/perconaxtradbbackups.pxc.percona.com configured\ncustomresourcedefinition.apiextensions.k8s.io/perconaxtradbclusterbackups.pxc.percona.com unchanged\ncustomresourcedefinition.apiextensions.k8s.io/perconaxtradbclusterrestores.pxc.percona.com unchanged\ncustomresourcedefinition.apiextensions.k8s.io/perconaxtradbclusters.pxc.percona.com unchanged\nrolebinding.rbac.authorization.k8s.io/service-account-percona-xtradb-cluster-operator unchanged\ndeployment.apps/percona-xtradb-cluster-operator unchanged\nperconaxtradbcluster.pxc.percona.com/cluster1 configured"
ts=2021-06-15T12:59:09.099258988Z caller=daemon.go:701 component=daemon event="Sync: d555c77, default:perconaxtradbcluster/cluster1" logupstream=false
ts=2021-06-15T12:59:11.387525662Z caller=loop.go:236 component=sync-loop state="tag flux" old=e1a27b8a81e640d3bee9bc2e2c31f9c4189e898a new=d555c77c19ea9d1685392680186e1491905401cc
ts=2021-06-15T12:59:12.122386802Z caller=loop.go:134 component=sync-loop event=refreshed url=ssh://git@github.com/spron-in/blog-data.git branch=master HEAD=d555c77c19ea9d1685392680186e1491905401cc

The log indicates that the main CR was configured:

perconaxtradbcluster.pxc.percona.com/cluster1 configured

 

Now we have three HAProxy Pods:

$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   1          50m
cluster1-haproxy-1                                 2/2     Running   0          48m
cluster1-haproxy-2                                 2/2     Running   0          4m45s

It is important to note that GitOps maintains the sync between Kubernetes and GitHub. It means that if the user manually changes the object on Kubernetes, Flux, or any other GitOps Operator will revert the changes and sync them with GitHub.

GitOps also comes in handy when users want to take the backup or perform the restoration. To do that the user just creates YAML manifests in the GitHub repo and Flux creates corresponding Kubernetes objects. The Database Operator does the rest.

Conclusion

GitOps is a simple approach to deploy and manage applications on Kubernetes:

  • Change Management is provided by git version-control and code reviews
  • Direct access to Kubernetes API is limited which increases security
  • Infrastructure-as-a-Code is here, there is no need to integrate Terraform, Ansible, or any other tool

All Percona Operators can be deployed and managed with GitOps. As a result, you will get production-grade MySQL, MongoDB, or PostgreSQL cluster which just works.

Apr
28
2021
--

Resolving the MySQL Active-Active Replication Dilemma

MySQL Active-Active Replication Dilemma

MySQL Active-Active Replication DilemmaMulti-writer replication has been a challenge in the MySQL ecosystem for years before truly dedicated solutions were introduced – first Galera (and so Percona XtradDB Cluster (PXC)) replication (around 2011), and then Group Replication (first GA in 2016).

Now, with both multi-writer technologies available, do we still need traditional asynchronous replication, set up in active-active topology? Apparently yes, there are still valid use cases. And you may need it not only when for some reason Galera/PXC or GR are not suitable, but also when you actually use them. Of course, the most typical case is to have a second cluster in a different geographic location, as Disaster Recovery. If you still wonder why you would need it, just read how a whole data center can disappear in the news a few weeks ago, about the OVH incident.

So, a DR site needs to replicate online from the primary cluster and be able to take over the workload very fast if needed. But also it is expected to be able to switch back effortlessly, hence very often the async replication channels are set up in both directions.

A very good writeup about this can be found here: How Not to do MySQL High Availability: Geographic Node Distribution with Galera-Based Replication Misuse.

Now, after seeing repeating problems with active-active setups falling over and over for years, I thought there are still too few warnings out there about the risks, so I decided to add one more little stone to the stack.

Before I continue, I have to mention this great webinar made last year by my colleague Sveta. You should definitely watch if you are interested in the subject: How Safe is Asynchronous Master-Master Setup in MySQL?.

 

Register for Percona Live ONLINE
A Virtual Event about Open Source Databases

 

Failure Test

So, let me demonstrate a simple test case, which may be an eye-opener to some.

First, let’s use the great dbdeployer tool to launch two MySQL instances with active-active replication in just one command:

$ dbdeployer deploy replication --topology=all-masters --nodes=2 --concurrent 8.0.23
all-masters directory installed in $HOME/sandboxes/all_masters_msb_8_0_23
run 'dbdeployer usage multiple' for basic instructions'
$HOME/sandboxes/all_masters_msb_8_0_23/initialize_ms_nodes
# server: 1 
# server: 2

Now, create a very simple table with one example data row:

node1 (test) > create table t1 (id int primary key auto_increment, a int);
Query OK, 0 rows affected (0.04 sec)

node1 (test) > insert into t1 set a=500;
Query OK, 1 row affected (0.01 sec)

So, at this point both nodes have the same data:

node1 (test) > select * from test.t1;
+----+------+
| id | a    |
+----+------+
|  1 | 500  |
+----+------+
1 row in set (0.00 sec)

node2 (test) > select * from test.t1;
+----+------+
| id | a    |
+----+------+
|  1 | 500  |
+----+------+
1 row in set (0.00 sec)

In the next step, let’s simulate some little replication lag by introducing a delay of one second, as it will allow reproducing the problem at will:

node1 (test) > stop replica sql_thread; CHANGE REPLICATION SOURCE TO SOURCE_DELAY=1 FOR CHANNEL "node2"; start replica sql_thread;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

node2 (test) > stop replica sql_thread; CHANGE REPLICATION SOURCE TO SOURCE_DELAY=1 FOR CHANNEL "node1"; start replica sql_thread;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

OK, so what if we send an update, very close in time, to both nodes, where the same row gets a different value:

$ all_masters_msb_8_0_23/use_all -e "update test.t1 set a=@@server_id where id=1"
# server: 1 
# server: 2

As a result, both nodes have different column values!

node1 (test) > select * from test.t1;
+----+------+
| id | a    |
+----+------+
|  1 | 200  |
+----+------+
1 row in set (0.00 sec)

node2 (test) > select * from test.t1;
+----+------+
| id | a    |
+----+------+
|  1 | 100  |
+----+------+
1 row in set (0.00 sec)

Is Replication Broken?

You may think replication is now broken and some error will alert you about the situation? Not at all!

node1 (test) > show replica status\G
*************************** 1. row ***************************
...
             Replica_IO_Running: Yes
            Replica_SQL_Running: Yes
...
        Seconds_Behind_Source: 0
Source_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
...
            Executed_Gtid_Set: 00023824-1111-1111-1111-111111111111:1-3,
00023825-2222-2222-2222-222222222222:1
...

node2 (test) > show replica status\G
*************************** 1. row ***************************
...
             Replica_IO_Running: Yes
            Replica_SQL_Running: Yes
...
        Seconds_Behind_Source: 0
Source_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
...
            Executed_Gtid_Set: 00023824-1111-1111-1111-111111111111:1-3,
00023825-2222-2222-2222-222222222222:1

Or, did you hope enabling the GTID feature would prevent inconsistency from happening? Well, see the Executed_Gtid_Set on both nodes – it is identical, yet the nodes have different data.

What happens next? Well, it depends, maybe replication will eventually fail someday with an error if the same row is modified again, but also it is possible the inconsistency will spread further without you even notice it!

Lesson learned?

I hope this simple example emphasized the need for extra care when dealing with multi-primary replication topologies. The remedy though is usually quite simple:

1 – Make sure to allow only one node at a time for writes (super_read_only is here to help you).

2 – Check the data consistency regularly with appropriate tools (MySQL replication primer with pt-table-checksum and pt-table-sync).

Apr
23
2021
--

Yearly Reminder: DDLs That Fail to Propagate May Cause Percona XtraDB Cluster Inconsistencies

DDLs Percona XtraDB Cluster Inconsistencies

DDLs Percona XtraDB Cluster InconsistenciesApologies for the silly title, but the issue is a real one, even though it is not a new thing. Schema upgrades are not an ordinary operation in Galera. For the subject at hand, the bottom line is: under the default Total Order Isolation (TOI) method, “the cluster replicates the schema change query as a statement before its execution.” What this means in practice is that a DDL issued in one node is replicated to other nodes in the cluster before it is even executed in the source node, let alone completed successfully.

As a result of this, it may fail in one node and be successful in another, and this without raising loud alerts or stopping nodes to protect against data inconsistency. This is not a bug in itself but rather a compromise of design. With new changes in MySQL and the recent support for atomic DDLs added in version 8.0, we may see improvements in this area over time in Galera as well. For now, the warning should be maintained: DDLs are not atomic in Galera, even in 8.0.

How Big of an Issue Is This?

To be fair, it’s rare – we do not see this happening often.

How Does the Problem Manifest Itself in Practice? Give Us an Example!

We recently investigated a case where a DBA issued a TRUNCATE TABLE statement in one node, which was taking some time to complete. His reaction at the time was to hit Ctrl+C on the session where the DDL was issued. That session seemed to hang, so he ended up opening a new session and issued a full DELETE on that same table. It also took time to complete, however this time, he was more patient. The problem was: once the statement was completed, he noticed that the two other nodes disappeared from the cluster. Upon closer inspection of the logs, both nodes reported the following error before evicting the cluster:

[ERROR] Slave SQL: Could not execute Delete_rows event on table mydb.target; Can't find record in 'target’, Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1073, Error_code: 1032

The DELETE was successful on the source node but failed to complete on the other two nodes. Why? Because the target table was already empty: the previous TRUNCATE TABLE was effectively executed in those nodes.

Wasn’t the TRUNCATE TABLE also successful on the source node? It turned out it wasn’t. In fact, the TRUNCATE was recorded in the slow query log of the source node like this:

# Time: (...)
# User (...)
# Schema: building Last_errno: 1317 Killed: 0
# Query_time: 145.753247 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0 Rows_affected: 0
# Bytes_sent: 44 Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0
# QC_Hit: No Full_scan: No Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0
# No InnoDB statistics available for this query
SET timestamp=(...)
truncate target;

Note the Last_errno indicated above:

$ perror 1317
MySQL error code 1317 (ER_QUERY_INTERRUPTED): Query execution was interrupted

Our colleague Przemek reproduced a similar problem, which he described in bug PXC-3536: Modify processing to not allow threads/queries to be killed if the thread is in TOI. The Percona Engineering team has since fixed the abortion issue, leaving the following note on the bug tracker:

Fixed in 5.7[.33-31.49] and 8.0[.22-13.1].  Any attempt to kill the query or connection while in TOI will fail.  Note that due to timing, the “kill query” may sometimes return success, but the operation will still have succeeded.

In the case we investigated, the MySQL error log of the source node had the answer; there was some other higher-priority transaction already in place on that node that prevented it from executing the TRUNCATE:

[Note] WSREP: MDL conflict db=mydb table=target ticket=MDL_SHARED_WRITE solved by abort

To clarify, this cluster was receiving writes from multiple nodes, and the higher-priority transaction was applied by a replication thread, which usually has precedence over local events when a conflict is detected.

As you may have suspected, the TRUNCATE was not found in the binary log of the source node; since the statement did not complete, it was not logged. Only the subsequent DELETE statement was there.

And as for the binary logs of the other two nodes, they included both the TRUNCATE as well as the DELETEs covering all rows in the target table. Rows these other nodes no longer had in place and thus were unable to remove caused the data inconsistency that followed.

Apr
20
2021
--

Change Storage Class on Kubernetes on the Fly

Change Storage Class Kubernetes

Change Storage Class KubernetesPercona Kubernetes Operators support various options for storage: Persistent Volume (PV), hostPath, ephemeral storage, etc. In most of the cases, PVs are used, which are provisioned by the Operator through Storage Classes and Persistent Volume Claims.

Storage Classes define the underlying volume type that should be used (ex. AWS, EBS, gp2, or io1), file system (xfs, ext4), and permissions. In various cases, cluster administrators want to change the Storage Class for already existing volumes:

  • DB cluster is underutilized and it is a good cost-saving when switching from io1 to gp2
  • The other way – DB cluster is saturated on IO and it is required to upsize the volumes
  • Switch the file system for better performance (MongoDB is much better with xfs)

In this blog post, we will show what the best way is to change the Storage Class with Percona Operators and not introduce downtime to the database. We will cover the change in the Storage Class, but not the migration from PVC to other storage types, like hostPath.

Changing Storage Class on Kubernetes

Prerequisites:

Goal:

  • Change the storage from pd-standard to pd-ssd without downtime for PXC.

Planning

The steps we are going to take are the following:

  1. Create a new Storage class for pd-ssd volumes
  2. Change the
    storageClassName

    in Custom Resource (CR) 

  3. Change the
    storageClassName

    in the StatefulSet

  4. Scale up the cluster (optional, to avoid performance degradation)
  5. Reprovision the Pods one by one to change the storage
  6. Scale down the cluster

 

Register for Percona Live ONLINE
A Virtual Event about Open Source Databases

 

Execution

Create the Storage Class

By default, standard Storage Class is already present in GKE, we need to create the new

StorageClass

 for ssd:

$ cat pd-ssd.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
volumeBindingMode: Immediate
reclaimPolicy: Delete

$ kubectl apply -f pd-ssd.yaml

The new Storage Class will be called

ssd

and will provision the volumes of

type: pd-ssd

.

Change the storageClassName in Custom Resource

We need to change the configuration for Persistent Volume Claims in our Custom Resource. The variable we look for is

storageClassName

and it is located under

spec.pxc.volumeSpec.persistentVolumeClaim

.

spec:
  pxc:
    volumeSpec:
      persistentVolumeClaim:
-       storageClassName: standard
+       storageClassName: ssd

Now apply new cr.yaml:

$ kubectl apply -f deploy/cr.yaml

Change the storageClassName in the StatefulSet

StatefulSets are almost immutable and when you try to edit the object you get the warning:

# * spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

We will rely on the fact that the operator controls the StatefulSet. If we delete it, the Operator is going to recreate it with the last applied configuration. For us, it means – with the new storage class. But the deletion of the StatefulSet leads to Pods termination, but our goal is 0 downtime. To get there, we will delete the set, but keep the Pods running. It can be done with –cascade flag:

$ kubectl delete sts cluster1-pxc --cascade=orphan

As a result, the Pods are up and the Operator recreated the StatefulSet with new storageClass:

$ kubectl get sts | grep cluster1-pxc
cluster1-pxc       3/3      8s

Change Storage Class on Kubernetes

Scale up the Cluster (Optional)

Changing the storage type would require us to terminate the Pods, which decreases the computational power of the cluster and might cause performance issues. To improve performance during the operation we are going to changing the size of the cluster from 3 to 5 nodes:

spec:
  pxc:
-   size: 3
+   size: 5

$ kubectl apply -f deploy/cr.yaml

As long as we have changed the StatefulSet already, new PXC Pods will be provisioned with the volumes backed by the new

StorageClass

:

$ kubectl get pvc
datadir-cluster1-pxc-0   Bound    pvc-6476a94c-fa1b-45fe-b87e-c884f47bd328   6Gi        RWO            standard       78m
datadir-cluster1-pxc-1   Bound    pvc-fcfdeb71-2f75-4c36-9d86-8c68e508da75   6Gi        RWO            standard       76m
datadir-cluster1-pxc-2   Bound    pvc-08b12c30-a32d-46a8-abf1-59f2903c2a9e   6Gi        RWO            standard       64m
datadir-cluster1-pxc-3   Bound    pvc-b96f786e-35d6-46fb-8786-e07f3097da02   6Gi        RWO            ssd            69m
datadir-cluster1-pxc-4   Bound    pvc-84b55c3f-a038-4a38-98de-061868fd207d   6Gi        RWO            ssd           68m

Reprovision the Pods One by One to Change the Storage

This is the step where underlying storage is going to be changed for the database Pods.

Delete the PVC of the Pod that you are going to reprovision. Like for Pod

cluster1-pxc-2

the PVC is called

datadir-cluster1-pxc-2

:

$ kubectl delete pvc datadir-cluster1-pxc-2

The PVC will not be deleted right away as there is a Pod using it. To proceed, delete the Pod:

$ kubectl delete pod cluster1-pxc-2

The Pod will be deleted along with the PVCs. The StatefulSet controller will notice that the pod is gone and will recreate it along with the new PVC of a new Storage Class:

$ kubectl get pods
...
cluster1-pxc-2                                     0/3     Init:0/1   0          7s

$ kubectl get pvc datadir-cluster1-pxc-2
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
…
datadir-cluster1-pxc-2   Bound    pvc-08b12c30-a32d-46a8-abf1-59f2903c2a9e   6Gi        RWO            ssd      10s

The STORAGECLASS column indicates that this PVC is of type ssd.

You might face the situation that the Pod is stuck in a pending state with the following error:

  Warning  FailedScheduling  63s (x3 over 71s)  default-scheduler  persistentvolumeclaim "datadir-cluster1-pxc-2" not found

It can happen due to the race condition: Pod was created when old PVC was terminating, and when the Pod is ready to start the PVC is already gone. Just delete the Pod again, so that the PVC is recreated.

Once the Pod is up, the State Snapshot Transfer kicks in and the data is synced from other nodes. It might take a while if your cluster holds lots of data and is heavily utilized. Please wait till the node is fully up and running, sync is finished, and only then proceed to the next Pod.

Scale Down the Cluster

Once all the Pods are running on the new storage it is time to scale down the cluster (if it was scaled up):

spec:
  pxc:
-   size: 5
+   size: 3


$ kubectl apply -f deploy/cr.yaml

Do not forget to clean up the PVCs for nodes 4 and 5. And done!

Conclusion

Changing the storage type is a simple task on the public clouds, but its simplicity is not synchronized yet with Kubernetes capabilities. Kubernetes and Container Native landscape is evolving and we hope to see this functionality soon. The way of changing Storage Class described in this blog post can be applied to both the Percona Operator for PXC and Percona Operator for MongoDB. If you have ideas on how to automate this and are willing to collaborate with us on the implementation, please submit the Issue to our public roadmap on Github.

Apr
01
2021
--

Infinitely Scalable Storage with High Compression Feature

Infinitely Scalable Storage with High Compression Feature

Infinitely Scalable Storage with High Compression FeatureIt is no secret that compute and storage costs are the main drivers of cloud bills. Migration of data from the legacy data center to the cloud looks appealing at first as it significantly reduces capital expense (CapEx) and keeps operational expenses (OpEx) under control. But once you see the bill, the lift and shift project does not look that promising anymore. See Percona’s recent open source survey which shows that many organizations saw an unexpected growth around cloud and data.

Storage growth is an organic process for the expanding business: more customers store more data, and more data needs more backups and disaster recovery storage for low RTO.

Today, the Percona Innovation Team, which is part of the Engineering organization, is proud to announce a new feature – High Compression. With this feature enabled, your MySQL databases will have infinite storage at zero cost.

The Problem

Our research team was digging into the problem of storage growth. They have found that the storage growth of a successful business inevitably leads to the increase of the cloud bill. After two years of research we got the data we need and the problem is now clear, and you can see it on the chart below:

The correlation is clearly visible – the more data you have, the more you pay.

The Solution

Once our Innovation Team received the data, we started working day and night on the solution. The goal was to change the trend and break the correlation. That is how after two years, we are proud to share with the community the High Compression feature. You can see the comparison of the storage costs with and without this new feature below:

Option 100 TB AWS EBS 100 TB AWS S3 for backups 100 TB AWS EBS + High compression 100 TB AWS S3 for backups + High Compression
Annual run rate

$120,000

$25,200

$1.2

< $1

As you see it is a 100,000x difference! What is more interesting, the cost of the storage with the High Compression feature enabled always stays flat and the chart now looks like this:

Theory

Not many people know, but data on disks is stored as bits, which are 0s and 1s. They form the binary sequences which are translated into normal data.

After thorough research, we came to the conclusion that we can replace the 1s with 0s easily. The formula is simple:

f(1) = 0

So instead of storing all these 1s, our High Compression feature stores zeroes only:

 

Implementation

The component which does the conversion is called the Nullifier, and every bit of data goes through it. We are first implementing this feature in Percona Operator for Percona XtraDB Cluster and below is the technical view of how it is implemented in Kubernetes:

As you see, all the data written by the user (all Insert or Update statements) goes through the Nullifier first, and only then are stored on the Persistent Volume Claim (PVC). With the High Compression feature enabled, the size of the PVC can be always 1 GB.

Percona is an open source company and we are thrilled to share our code with everyone. You can see the Pull Request for the High Compression feature here. As you see in the PR, our feature provides the Nullifier through the underestimated and very powerful Blackhole engine.

  if [ "$HIGH_COMPRESSION" == 'yes' ]; then
        sed -r "s|^[#]?default_storage_engine=.*$|default_storage_engine=BLACKHOLE|" ${CFG} 1<>${CFG}
        grep -E -q "^[#]?pxc_strict_mode" "$CFG" || sed '/^\[mysqld\]/a pxc_strict_mode=PERMISSIVE\n' ${CFG} 1<>${CFG}
    fi

The High Compression feature will be enabled by default starting from PXC Operator version 1.8.0, but we have added the flag into

cr.yaml

to disable this feature if needed:

spec.pxc.highCompression: true

.

Backups and with the High Compression feature are blazing fast and take seconds with any amount of data. The challenge our Engineering team is working on now is recovery. The Nullifier does the job, but recovering the data is hard. We are confident that De-Nullifier will be released in 1.8.0 as well.

Conclusion

Percona is spearheading innovation in the database technology field. The High Compression feature solves the storage growth problem and as a result, reduces the cloud bill significantly. The release of the Percona Kubernetes Operator for Percona XtraDB Cluster 1.8.0 is planned for mid-April, but this feature is already available in Tech Preview.

As a quick peek at our roadmap, we are glad to share that the Innovation Team has already started working on the High-Density feature, which will drastically reduce the compute footprint required to run MySQL databases.

Aug
11
2020
--

Smart Update Strategy in Percona Kubernetes Operator for Percona XtraDB Cluster

smart update strategy percona kubernetes opeerator

smart update strategy percona kubernetes opeeratorIn Percona Kubernetes Operator for Percona XtraDB Cluster (PXC) versions prior to 1.5.0, there were two methods for upgrading PXC clusters, and both of these use built-in StatefulSet update strategies. The first one is manual (OnDelete update strategy) and the second one is semi-automatic (RollingUpdate strategy). Since the Kubernetes operator is about automating the database management, and there are use cases to always keep the database up to date, a new smart update strategy was implemented.

Smart Update Strategy

The smart update strategy can be used to enable automatic context-aware upgrades of PXC clusters between minor versions. One of the use cases for automatic upgrades is if you want to get security updates as soon as they get released.

This strategy will upgrade reader PXC Pods at first and the last one upgraded will be the writer Pod, and it will also wait for the upgraded Pod to show up as online in ProxySQL before the next Pod is upgraded. This is needed to minimize the number of failovers during the upgrade and to make the upgrade as smooth as possible.

To make this work we implemented a version-unaware entrypoint and a Version Service to be queried for the up-to-date versions information.

The non-version specific entrypoint script is included in the operator docker image and is used to start different PXC versions. This makes the operator version not tightly coupled with a specific PXC docker image like it was done before, so one version of the operator will be able to run multiple versions of PXC cluster.

Version Service, which runs at https://check.percona.com/ by default, provides database version and alert information for various open source products. Version Service is open source and it can be run inside your own infrastructure, but that will be covered in some other blog posts.

How Does it Work?

When smart update is enabled and a new cluster has started, the values for docker images in the cr.yaml file will be ignored since the intention is to get them from the Version Service.

If smart update is enabled for an existing cluster, then at the scheduled time a version of the currently running PXC cluster, Kubernetes operator version, and the desired upgrade path will be provided to the Version Service. Version Service will return the JSON object with a set of docker images that should be used in the current environment. After that, the operator will update the CR with the new image paths and continue with deleting and redeploying the Pods in optimal order to minimize failovers.

The upgrade will not be done if the backup operation is in progress during the check for updates since the backup has a higher priority, but instead, it will be done next time the Version Service is checked.

With smart update functionality, you can also lock your database to a specific version, basically disabling automatic upgrades, but when needed use the smart update to trigger context-aware upgrades or just changes to resources.

This is how the upgrade might look in the operator logs (some parts stripped for brevity):

{"level":"info","ts":..,"logger":"..","msg":"update PXC version to 5.7.29-32-57 (fetched from db)"}
{"level":"info","ts":..,"logger":"..","msg":"add new job: * * * * *"}
{"level":"info","ts":..,"logger":"..","msg":"update PXC version from 5.7.29-32-57 to 5.7.30-31.43"}
{"level":"info","ts":..,"logger":"..","msg":"statefullSet was changed, run smart update"}
{"level":"info","ts":..,"logger":"..","msg":"primary pod is cluster1-pxc-0.cluster1-pxc.pxc-test.svc.cluster.local"}
{"level":"info","ts":..,"logger":"..","msg":"apply changes to secondary pod cluster1-pxc-2"}
{"level":"info","ts":..,"logger":"..","msg":"pod cluster1-pxc-2 is running"}
{"level":"info","ts":..,"logger":"..","msg":"pod cluster1-pxc-2 is online"}
{"level":"info","ts":..,"logger":"..","msg":"apply changes to secondary pod cluster1-pxc-1"}
{"level":"info","ts":..,"logger":"..","msg":"pod cluster1-pxc-1 is running"}
{"level":"info","ts":..,"logger":"..","msg":"pod cluster1-pxc-1 is online"}
{"level":"info","ts":..,"logger":"..","msg":"apply changes to primary pod cluster1-pxc-0"}
{"level":"info","ts":..,"logger":"..","msg":"pod cluster1-pxc-0 is running"}
{"level":"info","ts":..,"logger":"..","msg":"pod cluster1-pxc-0 is online"}
{"level":"info","ts":..,"logger":"..","msg":"smart update finished"}

As you can see, the initial PXC version deployed is 5.7.29, after which the smart update was enabled with the schedule to check for updates every minute (this is done for the test only). After that, smart update contacted the Version Service and started the upgrade process to version 5.7.30. The Primary Pod identified was PXC Pod 0, so firstly Pods 2 and 1 (readers) were upgraded, and only after that Pod 0 (writer), and at the end, the message was logged that the upgrade process finished.

Configuration Options Inside cr.yaml File

spec:
  updateStrategy: SmartUpdate
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com/versions
    apply: recommended
    schedule: "0 4 * * *"

As already mentioned, updateStrategy can be OnDelete or RollingUpdate in previous versions, but to use automatic upgrades it should be set to SmartUpdate.

Value of the upgradeOptions.versionServiceEndpoint option can be changed from the default if you have your own Version Service running (e.g. if your cluster doesn’t have a connection to the Internet and you have your own custom docker image repositories).

The most important setting is the upgradeOptions.apply option, which can have several values:

  • Never or Disabled – automatic upgrades are disabled and smart update is only utilized for other types of full cluster changes such as resource alterations or ConfigMap updates.
  • Recommended – automatic upgrades will choose the most recent version of software flagged as Recommended.
  • Latest – automatic upgrades will choose the most recent version of the software available.
    If you are starting a cluster from scratch and you have selected Recommended or Latest as desired versions, the current 8.0 major version will be selected. If you are already running a cluster, in that case, Version Service should always return the upgrade path inside your major version. Basically, if you want to start with a 5.7 major version from scratch, you should explicitly specify some 5.7 version (see below). Then, after the cluster is deployed, if you wish to enable automatic upgrades, you need to change the value of upgradeOptions.apply to “Recommended” or “Latest”.
  • Version Number – when a version number is supplied, this will start an upgrade if the running version doesn’t match the explicit version and then all future upgrades are no-ops. This essentially locks your database cluster to a specific database version. Example values for this can be “ 5.7.30-31.43” or “8.0.19-10.1”.

upgradeOptions.schedule is a classic cron schedule option and by default, it is set to check for new versions every day at 4 AM.

Limitations

A smart update strategy can only be used to upgrade PXC clusters and not the operator itself. It will only do the minor version upgrade automatically and it cannot be used for downgrades (since, as you may know, the downgrades in MySQL from version 8 might be problematic even between minor versions).

Conclusion

If there is a need to always keep the PXC cluster upgraded to the latest/recommended version, or if you just want to facilitate some benefits of the new smart update strategy even without automatic upgrades functionality, you can use version 1.5.0 of the Percona Kubernetes Operator for Percona XtraDB Cluster for this.

Oct
04
2019
--

Percona XtraDB Cluster 8.0 (experimental release) : SST Improvements

xtradb sst improvements

xtradb sst improvementsStarting with the experimental release of Percona XtraDB Cluster 8.0, we have made changes to the SST process to make the process more robust and easier to use.

  • mysqldump and rsync are no longer supported SST methods.

    Support for mysqldump was deprecated starting with PXC 5.7 and has now been completely removed.

    MySQL 8.0 introduced a new Redo Log format that limited the use of rsync while upgrading from PXC 5.7 to 8.0. In addition, the new Galera-4 also introduced changes that further limits the use of rsync.

    The only supported SST method is xtrabackup-v2.

  • A separate Percona XtraBackup installation is no longer required.

    The required Percona XtraBackup (PXB) binaries are now shipped as part of PXC 8.0, they are not installed for general use. So if you want to use PXB outside of an SST, you will have to install PXB separately.

  • SST logging now uses MySQL error logging

    Previously, the SST script would write directly to the error log file. Now, the SST script uses MySQL error logging. A side effect of this change is that the SST logs are not immediately visible. This is due to the logging subsystem being initialized after the SST has completed.

  • The wsrep_sst_auth variable has been removed.

    PXC 8.0 now creates an internal user (mysql.pxc.sst.user) with a random password for use by PXB to take the backup. The cleartext of the password is not saved and the user is deleted after the SST has completed.

    (This feature is still in development and may change before PXC 8.0 GA)

  • PXC SST auto-upgrade

    When PXC 8.0 detects that the SST came from a lower version, mysql_upgrade is automatically invoked. Also “RESET SLAVE ALL” is run on the new node if needed. This is invoked when receiving an SST from PXC 5.7 and PXC 8.0.

    (This feature is still in development and may change before PXC 8.0 GA)

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Oct
03
2019
--

Percona XtraDB Cluster 8.0 New Feature: wsrep_sst_auth Removal

Experimental Binary XtraDB 8.0

Experimental Binary XtraDB 8.0The problem

In PXC 5.6 and 5.7, when using xtrabackup-v2 as the SST method, the DBA must create a user with the appropriate privileges for use by Percona XtraBackup (PXB). The username and password of this backup user are specified in the wsrep_sst_auth variable.

This is a problem because this username and password was being stored in plaintext and required that the configuration file be secured.

The PXC 8.0 solution

(This feature is still under development and may change before PXC 8.0 GA)

Because the wsrep_sst_auth is only needed on the donor side to take a backup, PXC 8.0 uses an internal user (created specifically for use by PXC) with a randomly generated password. Since this user is only needed on the donor, the plaintext password is not needed on the joiner node.

This password consists of 32 characters generated at random. A new password is generated for each SST request. The plaintext of the password is never saved and never leaves the node. The username/password is sent to the SST script via unnamed pipes (stdin).

New PXC internal user accounts

mysql.pxc.internal.session

The mysql.pxc.internal.session user account provides the appropriate security context to create and set up the other PXC accounts. This account has a limited set of privileges, enough needed to create the mysql.pxc.sst.user??.

This account is locked and cannot be used to login (the password field will not allow login).

mysql.pxc.sst.user

The mysql.pxc.sst.user is used by XtraBackup to perform the backup. This account has the full set of privileges needed by XtraBackup.?? This account is created for an SST and is dropped at the end of an SST and also when the PXC node is shutdown. The creation/provisioning of this user account is not written to the binlog and is not replicated to other nodes. However, this account is sent with the backup to the joiner node. So the joiner node also has to drop this user after the SST has finished.

mysql.pxc.sst.role

The mysql.pxc.sst.role is the MySQL role that provides the privileges needed for XtraBackup. This allows for easy addition/removal of privileges needed for an SST.

The experimental release of PXC is based on MySQL 8.0.15, and we have not implemented the role-based support due to issues found with MySQL 8.0.15. This will be revisited in future versions of PXC 8.0.

Program flow

  1. DONOR node receives SST request from the JOINER
  2. DONOR node generates a random password and creates the internal SST user
    SET SESSION sql_log_bin = OFF;
    DROP USER IF EXISTS 'mysql.pxc.sst.user'@localhost;
    CREATE USER 'mysql.pxc.sst.user'@localhost IDENTIFIED WITH 'mysql_native_password' BY 'XXXXXXXX' ACCOUNT LOCK;
    GRANT 'mysql.pxc.sst.role'@localhost TO 'mysql.pxc.sst.user'@localhost;
    SET DEFAULT ROLE 'mysql.pxc.sst.role'@localhost to 'mysql.pxc.sst.user'@localhost;
    ALTER USER 'mysql.pxc.sst.user'@localhost ACCOUNT UNLOCK;

    The code that uses role is not being used in the current release due to issues with MySQL 8.0.15. Currently, we create the user with all the permissions needed explicitly.

  3. Launch the SST script (passing the username/password via stdin)
  4. SST uses the username/password to perform the backup
  5. SST script exits
  6. The DONOR node drops the user.
  7. The JOINER node receives the backup and drops the user. Note that the JOINER node also contains the internal SST user!

As a precaution, the user is also dropped when the server is shutdown.

Oct
01
2019
--

Experimental Binary of Percona XtraDB Cluster 8.0

Experimental Binary XtraDB 8.0

Experimental Binary XtraDB 8.0Percona is happy to announce the first experimental binary of Percona XtraDB Cluster 8.0 on October 1, 2019. This is a major step for tuning Percona XtraDB Cluster to be more cloud- and user-friendly. This release combines the updated and feature-rich Galera 4, with substantial improvements made by our development team.

Improvements and New Features

Galera 4, included in Percona XtraDB Cluster 8.0, has many new features. Here is a list of the most essential improvements:

  • Streaming replication supports large transactions
  • The synchronization functions allow action coordination (wsrep_last_seen_gtid, wsrep_last_written_gtid, wsrep_sync_wait_upto_gtid)
  • More granular and improved error logging. wsrep_debug is now a multi-valued variable to assist in controlling the logging, and logging messages have been significantly improved.
  • Some DML and DDL errors on a replicating node can either be ignored or suppressed. Use the wsrep_ignore_apply_errors variable to configure.
  • Multiple system tables help find out more about the state of the cluster state.
  • The wsrep infrastructure of Galera 4 is more robust than that of Galera 3. It features a faster execution of code with better state handling, improved predictability, and error handling.

Percona XtraDB Cluster 8.0 has been reworked in order to improve security and reliability as well as to provide more information about your cluster:

  • There is no need to create a backup user or maintain the credentials in plain text (a security flaw). An internal SST user is created, with a random password for making a backup, and this user is discarded immediately once the backup is done.
  • Percona XtraDB Cluster 8.0 now automatically launches the upgrade as needed (even for minor releases). This avoids manual intervention and simplifies the operation in the cloud.
  • SST (State Snapshot Transfer) rolls back or fixes an unwanted action. It is no more “a copy only block” but a smart operation to make the best use of the copy-phase.
  • Additional visibility statistics are introduced in order to obtain more information about Galera internal objects. This enables easy tracking of the state of execution and flow control.

Installation

You can only install this release from a tarball and it, therefore, cannot be installed through a package management system, such as apt or yum. Note that this release is not ready for use in any production environment.

Percona XtraDB Cluster 8.0 is based on the following:

Please be aware that this release will not be supported in the future, and as such, neither the upgrade to this release nor the downgrade from higher versions is supported.

This release is also packaged with Percona XtraBackup 8.0.5. All Percona software is open-source and free.

In order to experiment with Percona XtraDB Cluster 8.0 in your environment, download and unpack the tarball for your platform.

Note

Be sure to check your system and make sure that the packages are installed which Percona XtraDB Cluster 8.0 depends on.

For Debian or Ubuntu:

$ sudo apt-get install -y \
socat libdbd-mysql-perl \
rsync libaio1 libc6 libcurl3 libev4 libgcc1 libgcrypt20 \
libgpg-error0 libssl1.1 libstdc++6 zlib1g libatomic1

For Red Hat Enterprise Linux or CentOS:

$ sudo yum install -y openssl socat  \
procps-ng chkconfig procps-ng coreutils shadow-utils \
grep libaio libev libcurl perl-DBD-MySQL perl-Digest-MD5 \
libgcc rsync libstdc++ libgcrypt libgpg-error zlib glibc openssl-libs

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Sep
18
2019
--

Percona XtraDB Cluster 5.7.27-31.39 Is Now Available

Percona XtraDB Cluster

Percona XtraDB ClusterPercona is happy to announce the release of Percona XtraDB Cluster 5.7.27-31.39 on September 18, 2019. Binaries are available from the downloads section or from our software repositories.

Percona XtraDB Cluster 5.7.27-31.39 is now the current release, based on the following:

All Percona software is open-source and free.

Bugs Fixed

  • PXC-2432: PXC was not updating the information_schema user/client statistics properly.
  • PXC-2555: SST initialization delay: fixed a bug where the SST process took too long to detect if a child process was running.
  • PXC-2557: Fixed a crash when a node goes NON-PRIMARY and SHOW STATUS is executed.
  • PXC-2592: PXC restarting automatically on data inconsistency.
  • PXC-2605: PXC could crash when log_slow_verbosity included InnoDB.  Fixed upstream PS-5820.
  • PXC-2639: Fixed an issue where a SQL admin command (like OPTIMIZE) could cause a deadlock.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com