Nov
23
2020
--

Uncommon Sense MySQL – When EXPLAIN Can Trash Your Database

If I ask you whether running EXPLAIN on a query can change your database, you will probably tell me NO; it is common sense. EXPLAIN should show us how the query is executed, not execute the query, hence it can’t change any data.

Unfortunately, this is a case where common sense does not apply to MySQL (at the time of this writing, MySQL 8.0.21 and previous versions) – there are edge cases where EXPLAIN can actually change your database, as this bug report illustrates:

DELIMITER $$
CREATE FUNCTION `cleanup`() RETURNS char(50) CHARSET latin1
    DETERMINISTIC
BEGIN 
delete from test.t1;
RETURN 'OK'; 
END $$

Query OK, 0 rows affected (0.01 sec)

DELIMITER ;

mysql> create table t1(i int);
mysql> insert into t1 values(1); 
Query OK, 1 row affected (0.00 sec)

mysql> select * from t1; 
+------+
| i    |
+------+
|    1 |
+------+
1 row in set (0.00 sec)


mysql> explain select * from (select cleanup()) as t1clean; 
+----+-------------+------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table      | partitions | type   | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
|  1 | PRIMARY     | <derived2> | NULL       | system | NULL          | NULL | NULL    | NULL |    1 |   100.00 | NULL           |
|  2 | DERIVED     | NULL       | NULL       | NULL   | NULL          | NULL | NULL    | NULL | NULL |     NULL | No tables used |
+----+-------------+------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
2 rows in set, 1 warning (0.00 sec)


mysql> select * from t1;
Empty set (0.00 sec)

The problem is that EXPLAIN executes the cleanup() stored function… which is permitted to modify data. This is different from the saner PostgreSQL behavior, which will NOT execute stored functions while running EXPLAIN (it will if you run EXPLAIN ANALYZE).

This decision in the MySQL case comes from trying to do the right thing and provide the most reliable EXPLAIN (the query execution plan may well depend on what the stored function returns), but it looks like this security tradeoff was not considered.

While this consequence of the current MySQL EXPLAIN design is one of the most severe, you also have the problem that EXPLAIN – which a rational user would expect to be a fast way to check the performance of a query – can take unbounded time to complete, for example:

mysql> explain select * from (select sleep(5000) as a) b;

This will run for more than an hour, creating an additional accidental (or not) Denial of Service attack vector.

Going Deeper Down the Rabbit Hole

While this behavior is unfortunate, it will happen only if you have unrestricted privileges.  If you have a more complicated setup, the behavior may vary.

If the user lacks EXECUTE privilege, the EXPLAIN statement will fail.

mysql> explain select * from (select cleanup()) as t1clean;
ERROR 1370 (42000): execute command denied to user 'msandbox_ro'@'localhost' for routine 'test.cleanup'

If the user has EXECUTE privilege but the user executing the stored function lacks DELETE privilege, it will fail too:

mysql> explain select * from (select cleanup()) as t1clean;
ERROR 1142 (42000): DELETE command denied to user 'msandbox_ro'@'localhost' for table 't2'

Note: I’m saying the user executing the stored function, rather than the current user, because depending on the SQL SECURITY clause in the stored function definition, it may run as either the definer or the invoker.

So what can you do if you want to improve EXPLAIN safety, for example, if you’re developing a tool like Percona Monitoring and Management which, among other features, allows users to run EXPLAIN on their queries?

  • Advise users to set up privileges for monitoring correctly. This should be the first line of defense against this (and many other) issues; however, it is hard to rely on. Many users will choose the path of simplicity and use a “root” user with full privileges for monitoring.
  • Wrap your EXPLAIN statement in BEGIN … ROLLBACK, which will undo any damage EXPLAIN may have caused (see the sketch after this list). The downside, of course, is that the “work” of deleting the data, and then undoing it, is still performed. (Note: this only works for transactional tables; if you still run MyISAM… well, in that case you have worse problems to worry about.)
  • Use SET TRANSACTION READ ONLY to signal you’re not expecting any writes. An EXPLAIN that tries to write data will fail in this case without doing any work.
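
A minimal sketch of the last two workarounds in a mysql client session, reusing the cleanup() function and table from the example above (illustrative, not verbatim output):

-- Wrap EXPLAIN in a transaction you roll back (only protects transactional engines such as InnoDB)
mysql> begin;
mysql> explain select * from (select cleanup()) as t1clean;
mysql> rollback;   -- undoes the delete performed by the stored function

-- Or declare the next transaction read-only so the write fails before doing any work
mysql> set transaction read only;
mysql> explain select * from (select cleanup()) as t1clean;
-- the DELETE inside cleanup() is now rejected instead of removing rows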

While these workarounds can make tools run EXPLAIN more safely, they do not help users running EXPLAIN directly, and I really hope this issue will be fixed by redesigning EXPLAIN so that it does not try to run stored functions, as PostgreSQL already does.

For those who want to know EXACTLY how the query is executed, there is now EXPLAIN ANALYZE.

Nov
23
2020
--

Recover Percona XtraDB Cluster in Kubernetes From Wrong MySQL Config

Kubernetes operators are meant to simplify the deployment and management of applications. Our Percona Kubernetes Operator for Percona XtraDB Cluster serves this purpose, but it also gives users the flexibility to fine-tune their MySQL and proxy services configuration.

The document Changing MySQL Options describes how to provide a custom my.cnf configuration to the operator. But what would happen if you made a mistake and specified the wrong parameter in the configuration?

Apply Configuration

I already deployed my Percona XtraDB Cluster and deliberately submitted the wrong my.cnf configuration in cr.yaml:

spec:
...
  pxc:
    configuration: |
      [mysqld]
      wrong_param=123
…

Apply the configuration:

$ kubectl apply -f deploy/cr.yaml

Once you do this, the Operator will apply a new MySQL configuration to one of the Pods. In a few minutes you will see that the Pod is stuck in CrashLoopBackOff status:

$ kubectl get pods
NAME                                               READY   STATUS             RESTARTS   AGE
percona-xtradb-cluster-operator-79d786dcfb-lzv4b   1/1     Running            0          5h
test-haproxy-0                                     2/2     Running            0          5m27s
test-haproxy-1                                     2/2     Running            0          4m40s
test-haproxy-2                                     2/2     Running            0          4m24s
test-pxc-0                                         1/1     Running            0          5m27s
test-pxc-1                                         1/1     Running            0          4m41s
test-pxc-2                                         0/1     CrashLoopBackOff   1          59s

In the logs, it is clearly stated that this parameter is not supported and the mysqld process cannot start:

       2020-11-19T13:30:30.141829Z 0 [ERROR] [MY-000067] [Server] unknown variable 'wrong_param=123'.
        2020-11-19T13:30:30.142355Z 0 [ERROR] [MY-010119] [Server] Aborting
        2020-11-19T13:30:31.835199Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.20-11.1)  Percona XtraDB Cluster (GPL), Release rel11, Revision 683b26a, WSREP version 26.4.3.

It is worth noting that your Percona XtraDB Cluster is still operational and serving requests.

Recovery

Let’s try to comment out the configuration section and reapply cr.yaml:

spec:
...
  pxc:
#    configuration: |
#      [mysqld]
#      wrong_param=123
…


$ kubectl apply -f deploy/cr.yaml

And it won’t work (in v1.6). The Pod is still in the CrashLoopBackOff state, as the operator does not apply any changes when not all Pods are up and running. We do this to ensure data safety.

Fortunately, there is an easy way to recover from such a mistake: you can either delete or modify the corresponding ConfigMap resource in Kubernetes. Usually its name is {your_cluster_name}-pxc:

$ kubectl delete configmap test-pxc

And delete the Pod which is failing:

$ kubectl delete pod test-pxc-2

Kubernetes will restart all Percona XtraDB Cluster pods one by one after some time:

test-pxc-0                                         1/1     Running   0          2m28s
test-pxc-1                                         1/1     Running   0          3m23s
test-pxc-2                                         1/1     Running   0          4m36s

You can now apply the correct MySQL configuration again through the ConfigMap or cr.yaml. We are also assessing other recovery options for such cases, as well as config validation, so stay tuned for upcoming releases.
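
For example, a corrected cr.yaml section might look like the following; max_connections is just a stand-in for whichever valid option you actually intended to set:

spec:
...
  pxc:
    configuration: |
      [mysqld]
      max_connections=250
…

$ kubectl apply -f deploy/cr.yaml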

Nov
19
2020
--

A Blog Shamelessly Bribing You to Review Percona Monitoring and Management!

We would love you to help us spread the word about Percona Monitoring and Management (PMM) to make sure even more people are aware of it and adopting it. And we are not afraid to offer (modest) bribes!

  • If you already use PMM please write an independent review of its pros and cons on the AWS and/or Azure product page.
  • If you don’t use PMM, please install and try this software to see how it can help you improve the monitoring of your database environment.

For those of you new to Percona Monitoring and Management, it is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Percona Monitoring and Management can be used to monitor a wide range of open source database environments:

  • Amazon RDS MySQL
  • Amazon Aurora MySQL
  • MySQL
  • MongoDB
  • Percona XtraDB Cluster
  • PostgreSQL
  • ProxySQL

Percona Monitoring and Management is now available for fast installation on two marketplaces – AWS and Azure. We are keen to increase the number of PMM reviews on those pages so that potential users can get an independent view of how it will benefit their business.

We will send you special Percona Swag for every verified review you post before December 20, 2020. 

Just send us a link to your testimonial or a screenshot, and we will send you the latest in Percona gear – 100% free, and shipped to you anywhere in the world!

You can choose from several gift options, including a sweatshirt or hoodie.

Any meaningful review (i.e., not just a star rating) earns swag, whether it is positive, negative, or mixed. We believe in open source and learning from our users, so please write honestly about your experience using Percona Monitoring and Management.

To claim your swag, email the Percona community team and include:

  1. The screenshot or link to your review
  2. Your postal address
  3. Your phone number (for delivery use only, never for marketing)
  4. If you have chosen a sweatshirt or hoodie, please also let us know which color (grey, black, or blue) and your size.

Note that we only accept feedback from PMM users who are using the AWS and Azure marketplaces, and that reviews must be submitted before December 20, 2020!

It’s that simple!

So, please visit the AWS and Azure Percona Monitoring and Management download pages to add your review today!

AWS

Azure

Nov
17
2020
--

Wondering How to Run Percona XtraDB Cluster on Kubernetes? Try Our Operator!

Kubernetes has been a big trend for a while now, and it is particularly well-suited for microservices. Running your main databases on Kubernetes is probably NOT what you are looking for. However, there’s a niche market for it. My colleague Stephen Thorn did a great job explaining this in The Criticality of a Kubernetes Operator for Databases. If you are considering running your database on Kubernetes, have a look at it first. And, if after reading it you start wondering how the Operator works, Stephen also wrote an Introduction to Percona Kubernetes Operator for Percona XtraDB Cluster (PXC), which presents the Kubernetes architecture and how the Percona Operator simplifies the deployment of a full HA PXC cluster in this environment, proxies included!

Now, if you are curious about how it actually works in practice but are afraid the entry barrier is too high, I can help you with that. In fact, this technology is widespread now, with most cloud providers offering a dedicated Kubernetes engine. In this blog post, I’ll walk you through the steps to deploy a Percona XtraDB Cluster (PXC) using the Percona Operator for Kubernetes on Google Cloud Platform (GCP).

Creating a Virtual Environment to Run Kubernetes on GCP

Google Cloud Platform includes among its products the Google Kubernetes Engine (GKE). We can take advantage of their trial offer to create our test cluster there: https://cloud.google.com.

After you sign up, you can access all the bells and whistles in their web interface. Note the Kubernetes Engine API is not enabled by default; you need to enable it by visiting the Kubernetes Engine section in the left menu, under COMPUTE.

For the purpose of deploying our environment, we should install their SDK and work from the command line: see https://cloud.google.com/sdk/docs/install and follow the respective installation instructions for your OS (you will probably want to install the SDK on your personal computer).

With the SDK installed, we can initialize our environment, which requires authenticating to the Google Cloud account:

gcloud init

You will be prompted to choose a cloud project to use: there’s one created by default when the account is activated, named “My First Project”. It will receive a unique id, which you can verify in the Google Cloud interface, but usually, it is displayed as the first option presented in the prompt.

Alternatively, you can use gcloud config set to configure your default project and zone, among other settings.
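
For example (the project ID and zone below are placeholders; use the ones for your account):

gcloud config set project my-first-project-123456
gcloud config set compute/zone us-central1-b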

For this exercise, we will be creating a 3-node cluster named k8-test-cluster with n1-standard-4 instances in the us-central1-b zone:

gcloud container clusters create --machine-type n1-standard-4 --num-nodes 3 --zone us-central1-b --cluster-version latest k8-test-cluster

If the command above was successful, you should see your newly created cluster in the list returned by:

gcloud container clusters list

Getting Ready to Work with Kubernetes

Besides the Google Cloud SDK that is used to manage the cloud instances, we also need the Kubernetes command-line tool, kubectl, to manage the Kubernetes cluster. One way to install it is through gcloud itself:

gcloud components install kubectl

This method won’t work for everyone though, as the Cloud SDK component manager is disabled for certain kinds of installation, such as through apt or yum in Linux. I find myself in this group, using Ubuntu, but the failed attempt to install kubectl through gcloud suggested another approach that worked for me:

sudo apt install kubectl

Deploying a PXC Cluster Using the Percona Kubernetes Operator

The Percona operators are available on GitHub. The most straightforward way to obtain a copy is by cloning the operator’s repository. The latest version of the PXC operator is 1.6.0, and we can clone it with the following command:

git clone -b v1.6.0 https://github.com/percona/percona-xtradb-cluster-operator

Move inside the created directory:

cd percona-xtradb-cluster-operator

and run the following sequence of commands:

  1. Define the Custom Resource Definitions for PXC:

    kubectl apply -f deploy/crd.yaml
  2. Create a namespace on Kubernetes and associate it to your current context:

    kubectl create namespace pxc
    kubectl config set-context $(kubectl config current-context) --namespace=pxc
  3. Define Role-Based Access Control (RBAC) for PXC:

    kubectl apply -f deploy/rbac.yaml
  4. Start the operator within Kubernetes:

    kubectl apply -f deploy/operator.yaml
  5. Configure PXC users and their credentials:

    kubectl apply -f deploy/secrets.yaml
  6. Finally, deploy the cluster:

    kubectl apply -f deploy/cr.yaml

You can find a more detailed explanation of each of these steps, as well as how to customize your installation, in the Percona Kubernetes Operator for Percona XtraDB Cluster online documentation, which includes a quickstart guide for GKE.

Now, it is a matter of waiting for the deployment to complete, which you can monitor with:

kubectl get pods

A successful deployment will show output for the above command similar to:

NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   0          4m21s
cluster1-haproxy-1                                 2/2     Running   0          2m47s
cluster1-haproxy-2                                 2/2     Running   0          2m21s
cluster1-pxc-0                                     1/1     Running   0          4m22s
cluster1-pxc-1                                     1/1     Running   0          2m52s
cluster1-pxc-2                                     1/1     Running   0          111s
percona-xtradb-cluster-operator-79d786dcfb-9lthw   1/1     Running   0          4m37s

As you can see above, the operator will deploy seven pods with the default settings, and those are distributed across the three GKE n1-standard-4 machines we created at first:

kubectl get nodes
NAME                                             STATUS   ROLES    AGE    VERSION
gke-k8-test-cluster-default-pool-02c370e1-gvfg   Ready    <none>   152m   v1.17.13-gke.1400
gke-k8-test-cluster-default-pool-02c370e1-lvh7   Ready    <none>   152m   v1.17.13-gke.1400
gke-k8-test-cluster-default-pool-02c370e1-qn3p   Ready    <none>   152m   v1.17.13-gke.1400

Accessing the Cluster

One way to access the cluster is by creating an interactive shell in the Kubernetes cluster:

kubectl run -i --rm --tty percona-client --image=percona:8.0 --restart=Never -- bash -il

From there, we can access MySQL through the cluster’s HAproxy writer node:

mysql -h cluster1-haproxy -uroot -proot_password

Note the hostname used above is an alias, the connection being routed to one of the HAproxy servers available in the cluster. It is also possible to connect to a specific node by modifying the host option -h with the node’s name:

mysql -h cluster1-pxc-0 -uroot -proot_password

This is where all the fun and experimentation starts: you can test and break things without worrying too much as you can easily and quickly start again from scratch.

Destroying the Cluster and Deleting the Test Environment

Once you are done playing with your Kubernetes cluster, you can destroy it with:

gcloud container clusters delete --zone=us-central1-b k8-test-cluster

It’s important to note the command above will not discard the persistent disk volumes that were created and used by the nodes, which you can check with the command:

gcloud compute disks list

A final purging command is required to remove those as well:

gcloud compute disks delete <disk_name_1> <disk_name_2> <disk_name_3> --zone=us-central1-b

If you are feeling overzealous, you can double-check that all has been deleted:

gcloud container clusters list

gcloud compute disks list

Learn More About Percona Kubernetes Operator for Percona XtraDB Cluster

Interested In Hands-On Learning?

Be sure to get in touch with Percona’s Training Department to schedule your PXC Kubernetes training engagement. Our expert trainers will guide your team first through the basics, cover all the configuration noted above (and then some), and then dive deeper into how the operator functions, along with high-availability exercises, disaster recovery scenarios, backups, restores, and much more.

Nov
17
2020
--

Tame Black Friday Gremlins — Optimize Your Database for High Traffic Events

It’s that time of year! The Halloween decorations have come down, the leaves have started to change, and the Black Friday/Cyber Monday buying season is upon us!

For consumers, it can be a magical time of year, but for those of us that have worked in e-commerce or retail, it usually brings up…different emotions. It’s much like the Gremlins — cute and cuddly unless you break the RULES:

  1. Don’t expose them to sunlight,
  2. Don’t let them come in contact with water,
  3. NEVER feed them after midnight!

I love this analogy and how it parallels the difficulties that we experience in the database industry — especially this time of year. When things go well, it’s a great feeling. When things go wrong, they can spiral out of control in destructive and lasting ways.

Let’s put these fun examples to work and optimize your database!

Don’t Expose Your Database to “Sunlight”

One sure-fire way to make sure that your persistent data storage cannot do its job, and effectively kill it, is to let it run out of storage. Before entering the high-traffic holiday selling season, make sure that you have ample storage space to make it all the way to the other side. This may sound basic, but so is not putting a cute, fuzzy pet in the sunlight — it’s much harder than you think!

Here are some great ways to ensure the storage needs for your database are met (most obvious to least obvious):

  1. If you are on a DBaaS such as Amazon RDS, leverage something like Amazon RDS Storage Auto Scaling
  2. In a cloud or elastic infrastructure:
    1. make sure network-attached storage is extensible on the fly, or
    2. properly tune the database mount point to leverage logical volume management or software RAID so you can add additional volumes (capacity) on the fly (see the sketch after this list).
  3. In an on-premise or pre-purchased infrastructure, make sure you are overprovisioned — even by end of season estimates — by ~25%.
  4. Put your logs somewhere else than the main drive. The database may not be happy about running out of log space, but logs can be deleted easily — data files cannot!
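
As a rough sketch of the LVM approach from point 2 above, assuming the data directory sits on an ext4 filesystem on a hypothetical volume group vg_data with logical volume lv_mysql, and that /dev/sdc is a newly attached disk:

pvcreate /dev/sdc                             # prepare the new disk for LVM
vgextend vg_data /dev/sdc                     # add it to the volume group
lvextend -l +100%FREE /dev/vg_data/lv_mysql   # grow the logical volume
resize2fs /dev/vg_data/lv_mysql               # grow the ext4 filesystem online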

Don’t Let Your Database Come in “Contact With Water”

We don’t want to feed or allow simple issues to multiply. Actions we take to get out of a bind in the near term can cause problems that require more attention in the future — just like when you put water on a Gremlin, it will multiply!

What are some of these scenarios?

  1. Not having a documented plan of action can cause confusion and chaos if something doesn’t go quite right. Having a plan documented and distributed will keep things from getting overly complicated when issues occur.
  2. Throwing hardware at a problem. Unless you know how it will actually fix an issue, it could be like throwing gasoline on a fire, throwing your stack into disarray with blocked and unblocked queries. It also requires database tuning to be effective.
  3. Understanding (or misunderstanding) how users behave when or if the database slows down:
    1. Do users click to retry five times in five seconds causing even more load?
    2. Is there a way to divert attention to retry later?
    3. Can your application(s) ignore retries within a certain time frame?
  4. Not having just a few sources of truth, with as much availability as possible:
    1. Have at least one failover candidate
    2. Have off-server transaction storage (can you rebuild in a disaster?)
    3. If you have the two above, then delayed replicas are your friend (see the sketch after this list)!
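
A minimal sketch of configuring a delayed replica on MySQL 8.0, assuming asynchronous replication to the replica is already set up; the one-hour delay is an arbitrary example:

-- on the replica
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_DELAY = 3600;  -- apply events one hour behind the source
mysql> START SLAVE;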

Never “Feed” Your Database After “Midnight”

What’s the one thing that can ensure that all heck breaks loose on Black Friday? CHANGE is the food here, and typically, BLACK FRIDAY is the midnight.

Have you ever felt like there is just one thing that you missed and want to get off your backlog? It could be a schema change, a data type change, or an application change from an adjacent team. The ‘no feeding’ rule is parallel to CODE FREEZE in production.

Most companies see this freeze start at the beginning of November, when the most stable production environment is the one that is already out there, not the one you would have to stabilize after a new release:

  1. Change Management is your friend; change that needs to happen should still have a way to happen.
  2. Observability is also your friend; know in absolute terms what is happening to your database and stack so you don’t throw a wrench in it (Percona Monitoring and Management can help).
  3. Educate business stakeholders on the release or change process BEFORE the event, not DURING the event.
  4. Don’t be afraid to “turn it off” when absolute chaos is happening. Small downtime is better than an unusable site over a longer period of time.

Conclusion

Black Friday, Cyber Monday, and the Holidays can be the most wonderful time of the year — and now that we’ve covered the rules, some of the “Gremlins” can stay small and fuzzy and your business won’t get wrecked by pesky database issues or outages.

How Percona Can Help

Percona experts optimize your database performance with open source database support, highly-rated training, managed services, and professional services.

Contact Us to Tame Your Database Gremlins!

Nov
13
2020
--

Kubernetes Scaling Capabilities with Percona XtraDB Cluster

Our recent survey showed that many organizations saw unexpected growth around cloud and data. Unexpected bills can become a big problem, especially in such uncertain times. This blog post talks about how Kubernetes scaling capabilities work with Percona Kubernetes Operator for Percona XtraDB Cluster (PXC Operator) and can help you to control the bill.

Resources

Kubernetes is a container orchestrator, and on top of that, it has great scaling capabilities. Scaling can help you utilize your cluster better and not waste money on excess capacity. But before scaling, we need to understand what capacity is and how Kubernetes manages CPU and memory resources.

There are two resource concepts that you should be aware of: requests and limits. Requests is the amount of CPU or memory that a container is guaranteed to get on the node. Kubernetes uses requests during scheduling decisions, and it will not schedule a container to a node that does not have enough capacity. Limits is the maximum amount of resources that a container can get on the node; there is no guarantee, though. In the Linux world, limits are just cgroup maximums for processes.

Each node in a cluster has its own capacity. Part of this capacity is reserved for the operating system and kubelet, and what is left can be utilized by containers (allocatable).
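
A minimal illustration of requests and limits on a single container (the names and numbers are arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "500m"     # guaranteed; used by the scheduler to place the pod
        memory: 1Gi
      limits:
        cpu: "1"        # ceiling; enforced as a cgroup maximum
        memory: 2Gi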


Okay, now we know a thing or two about resource allocation in Kubernetes. Let’s dive into the problem space.

Problem #1: Requested Too Much

If you request resources for containers but do not utilize them well enough, you end up wasting resources. This is where Vertical Pod Autoscaler (VPA) comes in handy. It can automatically scale up or down container requests based on its historical real usage.


VPA has 3 modes:

  1. Recommender – it only provides recommendations for containers’ requests. We suggest starting with this mode.
  2. Initial – webhook applies changes to the container during its creation
  3. Auto/Recreate – webhook applies changes to the container during its creation and can also dynamically change the requests for the container

Configure VPA

As a starting point, deploy Percona Kubernetes Operator for Percona XtraDB Cluster and the database by following the guide. VPA is deployed via a single command (see the guide here). VPA requires a metrics-server to get real usage for containers.

We need to create a VPA resource that will monitor our PXC cluster and provide recommendations for requests tuning. For recommender mode, set updateMode to “Off”:

$ cat vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: pxc-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       StatefulSet
    name:       <name of the STS>
    namespace:  <your namespace>
  updatePolicy:
    updateMode: "Off"

Run the following command to get the name of the StatefulSet:

$ kubectl get sts
NAME           READY   AGE
...
cluster1-pxc   3/3     3h47m

The one with -pxc has the PXC cluster. Apply the VPA object:

$ kubectl apply -f vpa.yaml

After a few minutes you should be able to fetch recommendations from the VPA object:

$ kubectl get vpa pxc-vpa -o yaml
...
  recommendation:
    containerRecommendations:
    - containerName: pxc
      lowerBound:
        cpu: 25m
        memory: "503457402"
      target:
        cpu: 25m
        memory: "548861636"
      uncappedTarget:
        cpu: 25m
        memory: "548861636"
      upperBound:
        cpu: 212m
        memory: "5063059194"

Resources in the target section are the ones that VPA recommends and applies if Auto or Initial modes are configured. Read more here to understand other recommendation sections.

VPA will apply the recommendations once it is running in Auto mode and will persist the recommended configuration even after the pod is restarted. To enable Auto mode, patch the VPA object:

$ kubectl patch vpa pxc-vpa --type='json' -p '[{"op": "replace", "path": "/spec/updatePolicy/updateMode", "value": "Auto"}]'

After a few minutes, VPA will restart PXC pods and apply recommended requests.

$ kubectl describe pod cluster1-pxc-0
...
Requests:
      cpu: "25m"
      memory: "548861636"

Delete the VPA object to stop autoscaling:

$ kubectl delete vpa pxc-vpa

Please remember a few things about VPA and Auto mode:

  1. It changes container requests, but does not change Deployments or StatefulSet resources.
  2. It is not application-aware. For PXC, for example, it does not change innodb_buffer_pool_size, which the operator configures to take 75% of RAM. To change it, set the corresponding requests configuration in cr.yaml and apply it.
  3. It respects PodDisruptionBudget to protect your application. In our default cr.yaml, the PDB is configured to lose one pod at a time, which means VPA will change requests and restart one pod at a time:

    podDisruptionBudget:
      maxUnavailable: 1

Problem #2: Spiky Usage

The utilization of the application might change over time. It can happen gradually, but what if it is daily spikes of usage or completely unpredictable patterns? Constantly running additional containers is an option, but it leads to resource waste and increases in infrastructure costs. This is where Horizontal Pod Autoscaler (HPA) can help. It monitors container resources or even application metrics to automatically increase or decrease the number of containers serving the application.


Looks nice, but unfortunately, the current version of the PXC Operator will not work with HPA. HPA tries to scale the StatefulSet, which in our case is strictly controlled by the operator, and the operator will overwrite any scaling attempts from the horizontal scaler. We are researching opportunities to enable this support in the PXC Operator.

Problem #3: My Cluster is Too Big

You have tuned resource requests and they are close to real usage, but the cloud bill is still not going down. It might be that your Kubernetes cluster is overprovisioned and should be scaled with Cluster Autoscaler (CA). CA adds and removes nodes in your Kubernetes cluster based on their requests usage. When nodes are removed, pods are rescheduled to other nodes automatically.


Configure CA

On Google Kubernetes Engine, Cluster Autoscaler can be enabled through the gcloud utility. On AWS, you need to install the autoscaler manually and add the corresponding auto-scaling groups to its configuration.
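
On GKE, for example, autoscaling can be enabled on an existing node pool roughly like this (cluster name, zone, node pool, and node counts are illustrative):

gcloud container clusters update my-cluster \
    --enable-autoscaling --min-nodes 3 --max-nodes 6 \
    --zone us-central1-b --node-pool default-pool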

In general, CA monitors if there are any pods that are in Pending status (waiting to be scheduled, read more on pod statuses here) and adds more nodes to the cluster to meet the demand. It removes nodes if it sees the possibility to pack pods densely on other nodes. To add and remove nodes it relies on the cloud primitives: node groups in GCP, auto-scaling groups in AWS, virtual machine scale set on Azure, and so on. The installation of CA differs from cloud to cloud, but here are some interesting tricks.

Overprovision the Cluster

If your workloads are scaling up, CA needs to provision new nodes. Sometimes this might take a few minutes. If there is a requirement to scale faster, it is possible to overprovision the cluster. Detailed instructions are here. The idea is to always run pause pods with low priority; real workloads with higher priority push them out from the nodes when needed.
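
A rough sketch of that pattern, following the Cluster Autoscaler documentation: a negative-priority PriorityClass plus a Deployment of pause pods that reserve headroom and are evicted as soon as real workloads need the space (names, sizes, and replica count are placeholders):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                      # lower than any real workload
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2                  # how much headroom to keep
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: reserve
        image: k8s.gcr.io/pause
        resources:
          requests:
            cpu: "1"
            memory: 2Gi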

Expanders

Expanders control how to scale up the cluster, i.e. which nodes to add. Configure expanders and multiple node groups to fine-tune the scaling. My preference is the priority expander, as it allows us to cherry-pick nodes by customizable priorities; it is especially useful for a rapidly changing spot market.

Safety

Pay extremely close attention to scaling down. First of all, you can disable it completely by setting scale-down-enabled to false (not recommended). For clusters with big nodes running lots of pods, be careful with scale-down-utilization-threshold: do not set it to more than 50%, as it might impact other nodes and overutilize them. For clusters with a dynamic workload and lots of nodes, do not set scale-down-delay-after-delete and scale-down-unneeded-time too low, as it will lead to non-stop cluster scaling with absolutely no value.

Cluster Autoscaler also respects PodDisruptionBudget. When you run it along with the PXC Operator, please make sure PDBs are correctly configured so that the PXC cluster does not crash when Kubernetes scales down.

Conclusion

In cloud environments, day two operations must include cost management. Overprovisioning Kubernetes clusters is a common theme that can quickly become visible in the bills. When running Percona XtraDB Cluster on Kubernetes you can leverage Vertical Pod Autoscaler to tune requests and apply Cluster Autoscaler to reduce the number of instances to minimize your cloud spend. It will be possible to use Horizontal Pod Autoscaler in future releases as well to dynamically adjust your cluster to demand.

Nov
06
2020
--

Various Backup Compression Methods Using Mysqlpump

Mysqlpump is a client program that was released with MySQL 5.7.8 and is used to perform logical backups in a better way. Mysqlpump supports parallelism and has the capability of creating compressed output. Pablo already wrote a blog about this utility (The mysqlpump Utility), and in this blog, I am going to explore the compression techniques available in the Mysqlpump utility.

Overview

Mysqlpump has three options related to compressed backups.

--compress: Used to compress all information sent between the client and the server.

--compression-algorithms: Added in MySQL 8.0.18; defines the permitted compression algorithms for the connection to the server (available options: zlib, zstd, uncompressed).

--compress-output: Defines the compression algorithm for the backup file (available options: lz4, zlib).

Here, --compress-output is the option that defines the compression algorithm for the backup file, which supports two algorithms:

  • Lz4
  • Zlib

Lz4: LZ4 is a lossless data compression algorithm that is focused on compression and decompression speed.

Zlib: zlib is a software library used for data compression. zlib compressed data are typically written with a gzip or a zlib wrapper. 
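
Since mysqlpump also supports parallelism, the compression options can be combined with it; a hedged example (the thread count and schema name are illustrative):

mysqlpump --default-parallelism=4 --compress-output=lz4 percona_test > percona_test.lz4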

Lab Setup

To experiment with both compression techniques, I have installed the MySQL (8.0.22) server in my local environment. I also created the table percona_test.mp_test, which is 11GB in size.

[root@localhost]# mysql -e "select @@version, @@version_comment\G"
*************************** 1. row ***************************
        @@version: 8.0.22
@@version_comment: MySQL Community Server - GPL

[root@localhost]# mysql -e "select count(*) from percona_test.mp_test\G"
*************************** 1. row ***************************
count(*): 70698024

[root@localhost percona_test]# ls -lrth
total 11G
-rw-r-----. 1 mysql mysql 11G Oct 23 11:20 mp_test.ibd

Now, I am going to experiment with both compression algorithms.

Compression with Lz4

I am going to take the backup (table: mp_test) using the lz4 compression algorithm.

[root@localhost]# time mysqlpump --set-gtid-purged=off --compress --compress-output=lz4 percona_test mp_test > percona_test.mp_test.lz4
Dump progress: 0/1 tables, 250/70131715 rows
Dump progress: 0/1 tables, 133000/70131715 rows
Dump progress: 0/1 tables, 278500/70131715 rows
...
...
Dump progress: 0/1 tables, 70624000/70131715 rows
Dump completed in 540824
real 9m0.857s

It took about nine minutes to complete, and the file size is 1.1 GB, which looks like roughly 10x compression.

[root@dc1 percona_test]# ls -lrth | grep lz4

-rw-r--r--. 1 root  root  1.1G Oct 23 12:47 percona_test.mp_test.lz4

Compression with Zlib

Now, I am going to start the backup with “zlib” algorithm.

[root@dc1]# time mysqlpump --set-gtid-purged=off --compress --compress-output=zlib percona_test mp_test > percona_test.mp_test.zlib
Dump progress: 0/1 tables, 250/70131715 rows
Dump progress: 0/1 tables, 133250/70131715 rows
Dump progress: 0/1 tables, 280250/70131715 rows
Dump progress: 0/1 tables, 428750/70131715 rows
...
...
Dump progress: 0/1 tables, 70627000/70131715 rows
Dump completed in 546249
real 10m6.436s

It took about ten minutes to complete the process, and the file size is the same 1.1 GB (10x compression).

[root@dc1]# ls -lrth | grep -i zlib

-rw-r--r--. 1 root  root  1.1G Oct 23 13:06 percona_test.mp_test.zlib


How to Decompress the Backup

The MySQL Community distribution provides two utilities to decompress these backups.

  • zlib_decompress ( for zlib compression files )
  • lz4_decompress ( for lz4 compression files )

lz4_decompress

[root@dc1]# time lz4_decompress percona_test.mp_test.lz4 percona_test.mp_test.sql
real 0m45.287s
user 0m1.114s
sys 0m6.568s
[root@dc1]# ls -lrth | grep percona_test.mp_test.sql
-rw-r--r--. 1 root  root  9.1G Oct 23 13:30 percona_test.mp_test.sql

lz4 took 45 seconds to decompress the backup file.

zlib_decompress

[root@dc1]# time zlib_decompress percona_test.mp_test.zlib percona_test.mp_test.sql
real 0m35.553s
user 0m6.642s
sys 0m7.105s
[root@dc1]# ls -lrth | grep percona_test.mp_test.sql
-rw-r--r--. 1 root  root  9.1G Oct 23 13:49 percona_test.mp_test.sql

zlib took 36 seconds to decompress the backup file.

This is the procedure to compress and decompress backups with Mysqlpump. Both algorithms provide roughly 10x compression, and there is not much difference in execution time either, though the difference could become more significant with a larger dataset.

Nov
05
2020
--

ChaosMesh to Create Chaos in Kubernetes

In my talk at Percona Live (download the presentation), I spoke about how we can use Percona Kubernetes Operators to deploy our own Database-as-a-Service, based on fully open source components and independent from any particular cloud provider.

Today I want to mention an important tool that I use to test our Operators: ChaosMesh, which is a CNCF project and recently reached GA with version 1.0.

ChaosMesh deploys chaos engineering experiments in Kubernetes, which allows you to test how resilient a deployment is against different kinds of failures.

Obviously, this tool is important for Kubernetes database deployments, and I believe it can also be very useful for testing your application deployment to understand how the application will perform and handle different failures.

ChaosMesh allows you to emulate:

  • Pod Failure: kill pod or error on pod
  • Network Failure: network partitioning, network delays, network corruptions
  • IO Failure: IO delays and IO errors
  • Stress emulation: stress memory and CPU usage
  • Kernel Failure: return errors on system calls
  • Time skew: Emulate time drift on pods

For our Percona Kubernetes Operators, I found Network Failure especially interesting, as clusters that rely on network communication should provide enough resiliency against network issues.

Let’s review an example of how we can emulate a network failure on one of the pods. Assume we have cluster2 running:

kubectl get pods              
NAME                                                     READY   STATUS                       RESTARTS   AGE
cluster2-haproxy-0                                       2/2     Running                      1          12d
cluster2-haproxy-1                                       2/2     Running                      2          12d
cluster2-haproxy-2                                       2/2     Running                      2          12d
cluster2-pxc-0                                           1/1     Running                      0          12d
cluster2-pxc-1                                           1/1     Running                      0          12d
cluster2-pxc-2                                           1/1     Running                      0          12d

And we will isolate cluster2-pxc-1 from the rest of the cluster, by using the following Chaos Experiment:

apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: pxc-network-delay
spec:
  action: partition # the specific chaos action to inject
  mode: one # the mode to run chaos action; supported modes are one/all/fixed/fixed-percent/random-max-percent
  selector: # pods where to inject chaos actions
    pods:
      pxc: # namespace of the target pods
        - cluster2-pxc-1
  direction: to
  target:
    selector:
      pods:
        pxc: # namespace of the target pods
          - cluster2-pxc-0
    mode: one
  duration: "3s"
  scheduler: # scheduler rules for the running time of the chaos experiments about pods.
    cron: "@every 1000s"
---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: pxc-network-delay2
spec:
  action: partition # the specific chaos action to inject
  mode: one # the mode to run chaos action; supported modes are one/all/fixed/fixed-percent/random-max-percent
  selector: # pods where to inject chaos actions
    pods:
      pxc: # namespace of the target pods
        - cluster2-pxc-1
  direction: to
  target:
    selector:
      pods:
        pxc: # namespace of the target pods
          - cluster2-pxc-2
    mode: one
  duration: "3s"
  scheduler: # scheduler rules for the running time of the chaos experiments about pods.
    cron: "@every 1000s"
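
Assuming both NetworkChaos objects above are saved in one manifest (the file name pxc-network-partition.yaml is arbitrary), the experiment is applied the usual way:

kubectl apply -f pxc-network-partition.yaml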

This will isolate the pod  cluster2-pxc-1 for three seconds. Let’s see what happens with the workload which we directed on cluster2-pxc-0 node (the output is from sysbench-tpcc benchmark):

1041,56,1232.46,36566.42,16717.16,17383.33,2465.93,90.78,4.99,0.00
1042,56,1305.42,35841.03,16295.74,16934.44,2610.84,71.83,6.01,0.00
1043,56,1084.73,30647.99,14056.49,14422.06,2169.45,68.05,5.99,0.00
1044,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
1045,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
1046,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
1047,56,129.00,4219.97,1926.99,2034.98,258.00,4683.57,0.00,0.00
1048,56,939.41,25800.68,11706.55,12215.31,1878.82,960.30,2.00,0.00
1049,56,1182.09,34390.72,15708.49,16318.05,2364.18,66.84,4.00,0.00

And the log from cluster2-pxc-1 pod:

2020-11-05T17:36:27.962719Z 0 [Warning] WSREP: Failed to report last committed 133737, -110 (Connection timed out)
2020-11-05T17:36:29.962975Z 0 [Warning] WSREP: Failed to report last committed 133888, -110 (Connection timed out)
2020-11-05T17:36:30.243902Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://192.168.66.9:4567 ssl://192.168.71.201:4567
2020-11-05T17:36:31.161485Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.66.9:34760 local endpoint ssl://192.168.61.137:4567 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:36:31.162325Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 0008bac8 ssl://192.168.66.9:4567
2020-11-05T17:36:31.162694Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') reconnecting to 448e265d (ssl://192.168.71.201:4567), attempt 0
2020-11-05T17:36:31.174019Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.71.201:4567 local endpoint ssl://192.168.61.137:47252 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:36:31.176521Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.71.201:56892 local endpoint ssl://192.168.61.137:4567 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:36:31.177086Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 448e265d ssl://192.168.71.201:4567
2020-11-05T17:36:31.177289Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 448e265d ssl://192.168.71.201:4567
2020-11-05T17:36:34.244970Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') turning message relay requesting off

We can see that the node lost communication for three seconds and then recovered.

There is a variable evs.suspect_timeout, with a default of five seconds, which defines how long the nodes will wait before forming a new quorum without the affected node. So let’s see what happens if we isolate cluster2-pxc-1 for nine seconds:

369,56,1326.66,38898.39,17789.62,18462.43,2646.33,77.19,5.99,0.00
370,56,1341.82,38812.61,17741.30,18382.65,2688.65,74.46,5.01,0.00
371,56,364.33,11058.76,5070.72,5256.38,731.66,68.05,0.00,0.00
372,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
373,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
374,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
375,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
376,56,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
377,56,613.56,17233.62,7862.12,8139.38,1232.12,6360.91,5.00,0.00
378,56,1474.66,43070.96,19684.16,20439.47,2947.33,75.82,4.00,0.00

The workload was stalled for five seconds but continued after that. And we can see from the log what happened with node cluster2-pxc-1. The log is quite verbose, but to summarize what happened:

  1. After five seconds, the node declared that it had lost the connection to the other nodes
  2. It figured out it was in the minority and could not form a quorum, so it declared itself NON-PRIMARY
  3. After the network was restored, the node reconnected with the cluster
  4. The node caught up with the other nodes using the IST (incremental state transfer) method
  5. The cluster became a three-node cluster again

2020-11-05T17:39:18.282832Z 0 [Warning] WSREP: Failed to report last committed 334386, -110 (Connection timed out)
2020-11-05T17:39:19.283066Z 0 [Warning] WSREP: Failed to report last committed 334516, -110 (Connection timed out)
2020-11-05T17:39:20.768879Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://192.168.66.9:4567 ssl://192.168.71.201:4567 
2020-11-05T17:39:21.769154Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') reconnecting to 0008bac8 (ssl://192.168.66.9:4567), attempt 0
2020-11-05T17:39:21.769544Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') reconnecting to 448e265d (ssl://192.168.71.201:4567), attempt 0
2020-11-05T17:39:24.769604Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection to peer 00000000 with addr ssl://192.168.66.9:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout), socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2949502432 cwnd: 1 last_queued_since: 2949803921272502 last_delivered_since: 2949803921272502 send_queue_length: 0 send_queue_bytes: 0
2020-11-05T17:39:25.269672Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection to peer 00000000 with addr ssl://192.168.71.201:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout), socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2949502932 cwnd: 1 last_queued_since: 2949804421325209 last_delivered_since: 2949804421325209 send_queue_length: 0 send_queue_bytes: 0
2020-11-05T17:39:25.879338Z 0 [Note] WSREP: declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2020-11-05T17:39:25.879373Z 0 [Note] WSREP: declaring node with index 2 suspected, timeout PT5S (evs.suspect_timeout)
2020-11-05T17:39:25.879399Z 0 [Note] WSREP: evs::proto(11fdd640, OPERATIONAL, view_id(REG,0008bac8,3)) suspecting node: 0008bac8
2020-11-05T17:39:25.879414Z 0 [Note] WSREP: evs::proto(11fdd640, OPERATIONAL, view_id(REG,0008bac8,3)) suspected node without join message, declaring inactive
2020-11-05T17:39:25.879431Z 0 [Note] WSREP: evs::proto(11fdd640, OPERATIONAL, view_id(REG,0008bac8,3)) suspecting node: 448e265d
2020-11-05T17:39:25.879445Z 0 [Note] WSREP: evs::proto(11fdd640, OPERATIONAL, view_id(REG,0008bac8,3)) suspected node without join message, declaring inactive
2020-11-05T17:39:26.379920Z 0 [Note] WSREP: declaring node with index 0 inactive (evs.inactive_timeout) 
2020-11-05T17:39:26.379956Z 0 [Note] WSREP: declaring node with index 2 inactive (evs.inactive_timeout) 
2020-11-05T17:39:26.791118Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.66.9:4567 local endpoint ssl://192.168.61.137:51672 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:39:26.791958Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 0008bac8 ssl://192.168.66.9:4567
2020-11-05T17:39:26.879766Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,0008bac8,3)
memb {
        11fdd640,0
        }
joined {
        }
left {
        }
partitioned {
        0008bac8,0
        448e265d,0
        }
)
2020-11-05T17:39:26.879962Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-11-05T17:39:26.879975Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,11fdd640,4)
memb {
        11fdd640,0
        }
joined {
        }
left {
        }
partitioned {
        0008bac8,0
        448e265d,0
        }
)
2020-11-05T17:39:26.880029Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2020-11-05T17:39:26.880066Z 0 [Note] WSREP: Received NON-PRIMARY.
2020-11-05T17:39:26.880076Z 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 334573)
2020-11-05T17:39:26.880095Z 0 [Warning] WSREP: FLOW message from member 139968689209344 in non-primary configuration. Ignored.
2020-11-05T17:39:26.880121Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-11-05T17:39:26.880134Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2020-11-05T17:39:26.880140Z 0 [Note] WSREP: Received NON-PRIMARY.
2020-11-05T17:39:26.880255Z 2 [Note] WSREP: New cluster view: global state: f2d3cb29-1578-11eb-857b-624f681f446d:334573, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2020-11-05T17:39:26.880287Z 2 [Note] WSREP: Setting wsrep_ready to false
2020-11-05T17:39:26.880310Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-11-05T17:39:26.880428Z 2 [Note] WSREP: New cluster view: global state: f2d3cb29-1578-11eb-857b-624f681f446d:334573, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2020-11-05T17:39:26.880438Z 2 [Note] WSREP: Setting wsrep_ready to false
2020-11-05T17:39:26.880445Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-11-05T17:39:27.193945Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.71.201:57892 local endpoint ssl://192.168.61.137:4567 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:39:27.194926Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 448e265d ssl://192.168.71.201:4567
2020-11-05T17:39:27.305150Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.71.201:4567 local endpoint ssl://192.168.61.137:48990 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2020-11-05T17:39:27.306328Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') connection established to 448e265d ssl://192.168.71.201:4567
2020-11-05T17:39:27.882743Z 0 [Note] WSREP: declaring 0008bac8 at ssl://192.168.66.9:4567 stable
2020-11-05T17:39:27.882774Z 0 [Note] WSREP: declaring 448e265d at ssl://192.168.71.201:4567 stable
2020-11-05T17:39:27.883565Z 0 [Note] WSREP: Node 0008bac8 state primary
2020-11-05T17:39:27.884475Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,0008bac8,5)
memb {
        0008bac8,0
        11fdd640,0
        448e265d,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2020-11-05T17:39:27.884499Z 0 [Note] WSREP: Save the discovered primary-component to disk
2020-11-05T17:39:27.885430Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2020-11-05T17:39:27.885465Z 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2020-11-05T17:39:27.886654Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: da55f2d8-1f8d-11eb-80cf-075e56823087
2020-11-05T17:39:27.887174Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: da55f2d8-1f8d-11eb-80cf-075e56823087 from 0 (cluster2-pxc-0)
2020-11-05T17:39:27.887194Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: da55f2d8-1f8d-11eb-80cf-075e56823087 from 1 (cluster2-pxc-1)
2020-11-05T17:39:27.887208Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: da55f2d8-1f8d-11eb-80cf-075e56823087 from 2 (cluster2-pxc-2)
2020-11-05T17:39:27.887225Z 0 [Note] WSREP: Quorum results:
        version    = 6,
        component  = PRIMARY,
        conf_id    = 4,
        members    = 2/3 (primary/total),
        act_id     = 338632,
        last_appl. = 334327,
        protocols  = 0/9/3 (gcs/repl/appl),
        group UUID = f2d3cb29-1578-11eb-857b-624f681f446d
2020-11-05T17:39:27.887244Z 0 [Note] WSREP: Flow-control interval: [173, 173]
2020-11-05T17:39:27.887252Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 338632)
2020-11-05T17:39:27.887439Z 2 [Note] WSREP: State transfer required: 
        Group state: f2d3cb29-1578-11eb-857b-624f681f446d:338632
        Local state: f2d3cb29-1578-11eb-857b-624f681f446d:334573
2020-11-05T17:39:27.887476Z 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2020-11-05T17:39:27.887486Z 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2020-11-05T17:39:27.887504Z 2 [Note] WSREP: New cluster view: global state: f2d3cb29-1578-11eb-857b-624f681f446d:338632, view# 5: Primary, number of nodes: 3, my index: 1, protocol version 3
2020-11-05T17:39:27.887516Z 2 [Note] WSREP: Setting wsrep_ready to true
2020-11-05T17:39:27.887524Z 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2020-11-05T17:39:27.887530Z 2 [Note] WSREP: Setting wsrep_ready to false
2020-11-05T17:39:27.887540Z 2 [Note] WSREP: You have configured 'xtrabackup-v2' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.
2020-11-05T17:39:27.887556Z 2 [Note] WSREP: Auto Increment Offset/Increment re-align with cluster membership change (Offset: 2 -> 2) (Increment: 3 -> 3)
2020-11-05T17:39:27.887563Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-11-05T17:39:27.907686Z 2 [Note] WSREP: Assign initial position for certification: 338632, protocol version: 4
2020-11-05T17:39:27.908853Z 0 [Note] WSREP: Service thread queue flushed.
2020-11-05T17:39:27.909023Z 2 [Note] WSREP: Check if state gap can be serviced using IST
2020-11-05T17:39:27.909165Z 2 [Note] WSREP: IST receiver addr using ssl://192.168.61.137:4568
2020-11-05T17:39:27.909236Z 2 [Note] WSREP: IST receiver using ssl
2020-11-05T17:39:27.910176Z 2 [Note] WSREP: Prepared IST receiver, listening at: ssl://192.168.61.137:4568
2020-11-05T17:39:27.910195Z 2 [Note] WSREP: State gap can be likely serviced using IST. SST request though present would be void.
2020-11-05T17:39:27.922651Z 0 [Note] WSREP: Member 1.0 (cluster2-pxc-1) requested state transfer from '*any*'. Selected 0.0 (cluster2-pxc-0)(SYNCED) as donor.
2020-11-05T17:39:27.922679Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 338687)
2020-11-05T17:39:27.922744Z 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2020-11-05T17:39:27.922791Z 2 [Note] WSREP: GCache history reset: f2d3cb29-1578-11eb-857b-624f681f446d:334573 -> f2d3cb29-1578-11eb-857b-624f681f446d:338632
2020-11-05T17:39:27.956992Z 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): discarded 133734664 bytes
2020-11-05T17:39:27.957016Z 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): found 1/56 locked buffers
2020-11-05T17:39:27.958791Z 2 [Note] WSREP: Receiving IST: 4059 writesets, seqnos 334573-338632
2020-11-05T17:39:27.958908Z 0 [Note] WSREP: 0.0 (cluster2-pxc-0): State transfer to 1.0 (cluster2-pxc-1) complete.
2020-11-05T17:39:27.958929Z 0 [Note] WSREP: Member 0.0 (cluster2-pxc-0) synced with group.
2020-11-05T17:39:27.958946Z 0 [Note] WSREP: Receiving IST...  0.0% (   0/4059 events) complete.
2020-11-05T17:39:30.770542Z 0 [Note] WSREP: (11fdd640, 'ssl://0.0.0.0:4567') turning message relay requesting off
2020-11-05T17:39:31.851914Z 0 [Note] WSREP: Receiving IST...100.0% (4059/4059 events) complete.
2020-11-05T17:39:31.853178Z 2 [Note] WSREP: IST received: f2d3cb29-1578-11eb-857b-624f681f446d:338632
2020-11-05T17:39:31.854358Z 0 [Note] WSREP: 1.0 (cluster2-pxc-1): State transfer from 0.0 (cluster2-pxc-0) complete.
2020-11-05T17:39:31.854396Z 0 [Note] WSREP: SST leaving flow control
2020-11-05T17:39:31.854406Z 0 [Note] WSREP: Shifting JOINER -> JOINED (TO: 344195)
2020-11-05T17:40:17.927370Z 0 [Warning] WSREP: Trying to continue unpaused monitor
2020-11-05T17:40:26.972878Z 0 [Note] WSREP: Member 1.0 (cluster2-pxc-1) synced with group.
2020-11-05T17:40:26.972913Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 407602)
2020-11-05T17:40:27.062892Z 4 [Note] WSREP: Synchronized with group, ready for connections
2020-11-05T17:40:27.062911Z 4 [Note] WSREP: Setting wsrep_ready to true
2020-11-05T17:40:27.062922Z 4 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Conclusion

ChaosMesh is a great tool for testing the resiliency of a deployment. In my opinion, it is useful not only for database clusters but also for general applications, to make sure they can sustain different failure scenarios.

Nov
05
2020
--

Deploying Percona Monitoring and Management 2 Without Access to the Internet


Normally it is quite easy to deploy Percona Monitoring and Management (PMM) Server as a Docker container as per the official documentation. However, in very restrictive environments, the server may not have access to the public Internet, so pulling the image from Docker Hub is not an option. Fortunately, there are a few workarounds to get past this problem.

As previously described by Agustin for PMM 1, one way is to run docker pull and docker save on another machine and then transfer the saved image. Here I will show you another way to do it that doesn’t require a separate server running Docker, and also provide updated instructions for PMM 2.
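
For reference, a minimal sketch of that first approach (run on any machine with Docker and Internet access; the image tag is just an example) looks like this:

docker pull percona/pmm-server:2
docker save -o pmm-server-2.docker percona/pmm-server:2

The resulting .docker file can then be copied to the isolated server just like in step 2 below.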

1. Download the PMM Server image directly from the Percona website. Select the desired version and choose 'Server – Docker Image' from the drop-down box.


2. Copy the downloaded .docker file to the PMM server, for example via SCP:

scp -i my_private_key pmm-server-2.11.1.docker my_user@my_secure_server:

3. Load the image to the local Docker repository on your PMM server

sudo docker load < pmm-server-2.11.1.docker

4. Create the persistent data container. Normally we would use percona/pmm-server:2 as the image tag, but since we loaded a specific version we need to specify it as follows:

sudo docker create \
-v /srv \
--name pmm-data \
percona/pmm-server:2.11.1 \
/bin/true

5. If this is a production deployment, it is a good idea to keep the data container on a dedicated volume, as sketched below.
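
One way to do that (a sketch only; /mnt/pmm-data is a hypothetical mount point for a dedicated disk or volume) is to bind-mount the data directory when creating the data container instead of relying on the anonymous volume from step 4:

sudo docker create \
-v /mnt/pmm-data:/srv \
--name pmm-data \
percona/pmm-server:2.11.1 \
/bin/true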

6. Create the server container (again, specifying the image version we have loaded before):

sudo docker run \
--detach \
--restart always \
--publish 80:80 \
--publish 443:443 \
--volumes-from pmm-data \
--name pmm-server \
percona/pmm-server:2.11.1

7. Verify PMM Server installation by visiting server_hostname:80 or server_hostname:443 and reset the admin password. The default user/password is admin/admin.
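
If the web interface is not reachable, a few standard Docker commands run on the server itself (nothing PMM-specific is assumed here) help confirm the container is up and the web ports respond:

sudo docker ps --filter name=pmm-server
sudo docker logs --tail 20 pmm-server
curl -kI https://localhost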

All that is left now is to install the clients and start using your brand new Percona Monitoring and Management instance. If you have questions or run into trouble, feel free to reach out to us at the Percona Forums.
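
On each client host this typically means installing the pmm2-client package and registering it against the server; a minimal sketch (the address and password placeholders are yours to fill in) could be:

pmm-admin config --server-insecure-tls --server-url=https://admin:<admin_password>@<pmm_server_address>:443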

Nov
04
2020
--

Running Percona Kubernetes Operator for Percona XtraDB Cluster with Kata Containers


Kata containers are containers that use hardware virtualization technologies for workload isolation almost without performance penalties. Top use cases are untrusted workloads and tenant isolation (for example in a shared Kubernetes cluster). This blog post describes how to run Percona Kubernetes Operator for Percona XtraDB Cluster (PXC Operator) using Kata containers.

Prepare Your Kubernetes Cluster

Setting up Kata containers and Kubernetes is well documented in the official github repo (cri-o, containerd, Kubernetes DaemonSet). We will just cover the most important steps and pitfalls.

Virtualization Support

First of all, remember that Kata containers require hardware virtualization support from the CPU on the nodes. To check whether your Linux system supports it, run the following on the node:

$ egrep '(vmx|svm)' /proc/cpuinfo

VMX (Virtual Machine Extensions) and SVM (Secure Virtual Machine) are the Intel and AMD features that add instructions to allow running a guest OS with full privileges while still keeping the host OS protected.

For example, on AWS only i3.metal and r5.metal instances provide VMX capability.

Containerd

Kata containers are OCI (Open Container Initiative) compliant, which means they work well with CRI (Container Runtime Interface) and hence are well supported by Kubernetes. To use Kata containers, make sure your Kubernetes nodes run the CRI-O or containerd runtime.
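
A quick way to check which runtime your nodes are using is kubectl get nodes -o wide, which prints a CONTAINER-RUNTIME column (output below is trimmed and illustrative only):

$ kubectl get nodes -o wide
NAME      STATUS   ...   CONTAINER-RUNTIME
node-1    Ready    ...   containerd://1.4.1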

The image below describes pretty well how Kubernetes works with Kata.

Kubernetes works with Kata

Hint: GKE and kops allow you to start your cluster with containerd out of the box and skip the manual steps.
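
For example (illustrative only; verify the exact flags for the tool versions you run), kops accepts a container runtime flag at cluster creation time, and GKE exposes a containerd-based node image type:

$ kops create cluster --container-runtime=containerd ...
$ gcloud container clusters create my-cluster --image-type=COS_CONTAINERD ...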

Setting Up Nodes

To run Kata containers, Kubernetes nodes need kata-runtime installed and the runtime configured properly. The easiest way is to use a DaemonSet that installs the required packages on every node and reconfigures containerd. As a first step, apply the following YAMLs to create the DaemonSet:

$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/packaging/master/kata-deploy/kata-rbac/base/kata-rbac.yaml
$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/packaging/master/kata-deploy/kata-deploy/base/kata-deploy.yaml

The DaemonSet reconfigures containerd to support multiple runtimes by changing /etc/containerd/config.toml. Please note that some tools (e.g., kops) keep the containerd configuration in a separate file, config-kops.toml. In that case, you need to copy the configuration created by the DaemonSet into the corresponding file and restart containerd, as sketched below.
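
A rough sketch of that step on a kops-managed node (assumptions: the DaemonSet has already written the Kata runtime entries into /etc/containerd/config.toml, and no kops-specific settings are lost by the copy; merge the two files manually if in doubt):

# cp /etc/containerd/config.toml /etc/containerd/config-kops.toml
# systemctl restart containerd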

Next, create a RuntimeClass for Kata. RuntimeClass is a feature that allows you to pick the runtime for a container at creation time; it has been available since Kubernetes 1.14 as Beta.

$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/packaging/master/kata-deploy/k8s-1.14/kata-qemu-runtimeClass.yaml
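
The manifest applied above essentially boils down to a RuntimeClass object that maps a class name to the containerd handler; a minimal equivalent sketch (using the Beta API available since Kubernetes 1.14) looks roughly like this:

apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata-qemu
handler: kata-qemu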

Everything is set. Deploy a test nginx pod and set the runtime:

$ cat nginx-kata.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-kata
spec:
  runtimeClassName: kata-qemu
  containers:
    - name: nginx
      image: nginx

$ kubectl apply -f nginx-kata.yaml
$ kubectl describe pod nginx-kata | grep "Container ID"
    Container ID:   containerd://3ba8d62be5ee8cd57a35081359a0c08059cf08d8a53bedef3384d18699d13111

On the node, verify that Kata is used for this container with the ctr tool:

# ctr --namespace k8s.io containers list | grep 3ba8d62be5ee8cd57a35081359a0c08059cf08d8a53bedef3384d18699d13111
3ba8d62be5ee8cd57a35081359a0c08059cf08d8a53bedef3384d18699d13111    sha256:f35646e83998b844c3f067e5a2cff84cdf0967627031aeda3042d78996b68d35 io.containerd.kata-qemu.v2

The runtime shown is kata-qemu.v2, as requested.
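
Another simple sanity check (an assumption worth verifying in your own environment) is that a Kata pod runs its own guest kernel, so the kernel version seen inside the pod usually differs from the node's:

$ kubectl exec nginx-kata -- uname -r

and on the node itself, for comparison:

# uname -r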

The current latest stable PXC Operator version (1.6) does not support runtimeClassName. It is still possible to run Kata containers by specifying the io.kubernetes.cri.untrusted-workload annotation. To ensure containerd supports this annotation, add the following to the configuration toml file on the node:

# cat <<EOF >> /etc/containerd/config.toml
[plugins.cri.containerd.untrusted_workload_runtime]
  runtime_type = "io.containerd.kata-qemu.v2"
EOF

# systemctl restart containerd

Install the Operator

We will install the operator with the regular runtime but will run the PXC cluster in Kata containers.

Create the namespace and switch the context:

$ kubectl create namespace pxc-operator
$ kubectl config set-context $(kubectl config current-context) --namespace=pxc-operator

Get the operator from GitHub:

$ git clone -b v1.6.0 https://github.com/percona/percona-xtradb-cluster-operator

Deploy the operator into your Kubernetes cluster:

$ cd percona-xtradb-cluster-operator
$ kubectl apply -f deploy/bundle.yaml

Now let's deploy the cluster, but before that we need to explicitly add an annotation to the PXC pods, marking them as untrusted to force Kubernetes to use the Kata containers runtime. Edit deploy/cr.yaml:

pxc:
  size: 3
  image: percona/percona-xtradb-cluster:8.0.20-11.1
  …
  annotations:
    io.kubernetes.cri.untrusted-workload: "true"

Now, let’s deploy the PXC cluster:

$ kubectl apply -f deploy/cr.yaml

The cluster is up and running (using one node for the sake of the experiment):

$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
pxc-kata-haproxy-0                                 2/2     Running   0          5m32s
pxc-kata-pxc-0                                     1/1     Running   0          8m16s
percona-xtradb-cluster-operator-749b86b678-zcnsp   1/1     Running   0          44m

In the ctr output, you should see the percona-xtradb-cluster container running with the Kata runtime:

# ctr --namespace k8s.io containers list | grep percona-xtradb-cluster | grep kata
448a985c82ae45effd678515f6cf8e11a6dfca159c9abf05a906c7090d297cba    docker.io/percona/percona-xtradb-cluster:8.0.20-11.2 io.containerd.kata-qemu.v2

We are working on adding support for the runtimeClassName option to our operators. This feature will enable users to freely choose any container runtime.
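
Once that lands, the untrusted-workload annotation should no longer be necessary; hypothetically, selecting the runtime in deploy/cr.yaml could look something like the sketch below (the field name and placement are assumptions, not a released API):

pxc:
  …
  runtimeClassName: kata-qemu   # hypothetical sketch, not supported in operator 1.6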

Conclusions

Running databases in containers is an ongoing trend, and keeping data safe is always a top priority for a business. Kata containers provide security isolation through mature and extensively tested QEMU virtualization with little to no change to the existing environment.

Deploy Percona XtraDB Cluster with ease in your Kubernetes cluster with our Operator and Kata containers for better isolation without performance penalties.

 
