Mar
25
2019
--

How to Perform Compatible Schema Changes in Percona XtraDB Cluster (Advanced Alternative)?

PXC schema changes options

If you are using Galera replication, you know that schema changes can be a serious problem. With the current implementation, there is no way even a simple ALTER can be fully unobtrusive for live production traffic. With the default TOI alter method, Percona XtraDB Cluster (PXC) suspends writes in order to execute the ALTER in the same order on all nodes.

For actual data structure changes, we have to adapt to these limitations and either plan for a maintenance window, or use pt-online-schema-change, where interruptions should be very short. I suggest you be extra careful here, as normally you cannot kill an ongoing ALTER query in a Galera cluster.
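
For illustration, and assuming the db1.sbtest1 table used later in this post (the index name is made up for the example), a pt-online-schema-change run typically looks something like this; a --dry-run pass before --execute is a good habit:

# pt-online-schema-change --alter "ADD INDEX idx_k_example (k)" D=db1,t=sbtest1 --dry-run
# pt-online-schema-change --alter "ADD INDEX idx_k_example (k)" D=db1,t=sbtest1 --execute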

For schema-compatible changes, that is, ones that cannot break ROW replication when the writer node and applier nodes have different metadata, we can consider using the Rolling Schema Update (RSU) method. An example of 100% replication-safe DDL is OPTIMIZE TABLE (aka noop-ALTER). The following are safe to consider too (illustrated right after the list):

  • adding or removing a secondary index,
  • renaming an index,
  • changing the ROW_FORMAT (for example, enabling/disabling table compression),
  • changing the KEY_BLOCK_SIZE (compression property).
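
A few hedged examples of such compatible changes (the table follows the sysbench schema used later in this post; the index names are illustrative only):

ALTER TABLE db1.sbtest1 ADD INDEX idx_c_example (c);                  -- add a secondary index
ALTER TABLE db1.sbtest1 RENAME INDEX idx_c_example TO idx_c_renamed; -- rename an index (5.7+)
ALTER TABLE db1.sbtest1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;      -- enable table compression
OPTIMIZE TABLE db1.sbtest1;                                          -- noop-ALTER table rebuild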

However, a lesser known fact is that even using the RSU method or pt-online-schema-change for the above may not save us from some unwanted disruptions.

RSU and Concurrent Queries

Let’s take a closer look at a very simple scenario with noop ALTER. We will set wsrep_OSU_method to RSU to avoid a cluster-wide stall. In fact, this mode turns off replication for the following DDL (and only for DDL), so you have to remember to repeat the same ALTER on every cluster member later.

For simplicity, let’s assume there is only one node used for writes. In the first client session, we change the method accordingly to prepare for DDL:

node1 > set wsrep_OSU_method=RSU;
Query OK, 0 rows affected (0.00 sec)
node1 > select @@wsrep_OSU_method,@@wsrep_on,@@wsrep_desync;
+--------------------+------------+----------------+
| @@wsrep_OSU_method | @@wsrep_on | @@wsrep_desync |
+--------------------+------------+----------------+
| RSU                |          1 |              0 |
+--------------------+------------+----------------+
1 row in set (0.00 sec)

(By the way, as seen above, the desync mode is not enabled yet, as it will be automatically enabled around the DDL query only, and disabled right after it finishes).

In a second client session, we start a long enough SELECT query:

node1 > select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000;
...

And while it’s ongoing, let’s rebuild the table:

node1 > alter table db1.sbtest1 engine=innodb;
Query OK, 0 rows affected (0.98 sec)
Records: 0 Duplicates: 0 Warnings: 0

Surprisingly, the client in the second session immediately receives a failure for its SELECT:

ERROR 1213 (40001): WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction

So, even a simple SELECT is aborted if it conflicts with the local, concurrent ALTER (RSU)… We can see more details in the error log:

2018-12-04T21:39:17.285108Z 0 [Note] WSREP: Member 0.0 (node1) desyncs itself from group
2018-12-04T21:39:17.285124Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 471796)
2018-12-04T21:39:17.305018Z 12 [Note] WSREP: Provider paused at 7bf59bb4-996d-11e8-b3b6-8ed02cd38513:471796 (30)
2018-12-04T21:39:17.324509Z 12 [Note] WSREP: --------- CONFLICT DETECTED --------
2018-12-04T21:39:17.324532Z 12 [Note] WSREP: cluster conflict due to high priority abort for threads:
2018-12-04T21:39:17.324535Z 12 [Note] WSREP: Winning thread:
THD: 12, mode: total order, state: executing, conflict: no conflict, seqno: -1
SQL: alter table db1.sbtest1 engine=innodb
2018-12-04T21:39:17.324537Z 12 [Note] WSREP: Victim thread:
THD: 11, mode: local, state: executing, conflict: no conflict, seqno: -1
SQL: select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000
2018-12-04T21:39:17.324542Z 12 [Note] WSREP: MDL conflict db=db1 table=sbtest1 ticket=MDL_SHARED_READ solved by abort
2018-12-04T21:39:17.324544Z 12 [Note] WSREP: --------- CONFLICT DETECTED --------
2018-12-04T21:39:17.324545Z 12 [Note] WSREP: cluster conflict due to high priority abort for threads:
2018-12-04T21:39:17.324547Z 12 [Note] WSREP: Winning thread:
THD: 12, mode: total order, state: executing, conflict: no conflict, seqno: -1
SQL: alter table db1.sbtest1 engine=innodb
2018-12-04T21:39:17.324548Z 12 [Note] WSREP: Victim thread:
THD: 11, mode: local, state: executing, conflict: must abort, seqno: -1
SQL: select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000
2018-12-04T21:39:18.517457Z 12 [Note] WSREP: resuming provider at 30
2018-12-04T21:39:18.517482Z 12 [Note] WSREP: Provider resumed.
2018-12-04T21:39:18.518310Z 0 [Note] WSREP: Member 0.0 (node1) resyncs itself to group
2018-12-04T21:39:18.518342Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 471796)
2018-12-04T21:39:18.519077Z 0 [Note] WSREP: Member 0.0 (node1) synced with group.
2018-12-04T21:39:18.519099Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 471796)
2018-12-04T21:39:18.519119Z 2 [Note] WSREP: Synchronized with group, ready for connections
2018-12-04T21:39:18.519126Z 2 [Note] WSREP: Setting wsrep_ready to true

Another example: a simple sysbench read-only test, during which I ran a noop ALTER in RSU mode:

# sysbench /usr/share/sysbench/oltp_read_only.lua --table-size=1000 --tables=8 --mysql-db=db1 --mysql-user=root --threads=8 --time=200 --report-interval=1 --events=0 --db-driver=mysql run
sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 8
Report intermediate results every 1 second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
[ 1s ] thds: 8 tps: 558.37 qps: 9004.30 (r/w/o: 7880.62/0.00/1123.68) lat (ms,95%): 18.28 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 8 tps: 579.01 qps: 9290.22 (r/w/o: 8130.20/0.00/1160.02) lat (ms,95%): 17.01 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 8 tps: 597.36 qps: 9528.89 (r/w/o: 8335.17/0.00/1193.72) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00
FATAL: mysql_stmt_store_result() returned error 1317 (Query execution was interrupted)
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:432: SQL error, errno = 1317, state = '70100': Query execution was interrupted

So, SELECT queries are aborted in order to resolve the MDL lock request that a DDL executed in RSU needs immediately. This of course applies to INSERT, UPDATE, and DELETE as well. That’s quite an intrusive way to accomplish the goal…

“Manual RSU”

Let’s try a “manual RSU” workaround instead. In fact, we can achieve the same isolated DDL execution as in RSU, by putting a node in desync mode (to avoid flow control) and disabling replication for our session. That way, the ALTER will only be executed in that particular node.

Session 1:

node1 > set wsrep_OSU_method=TOI; set global wsrep_desync=1; set wsrep_on=0;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
node1 > select @@wsrep_OSU_method,@@wsrep_on,@@wsrep_desync;
+--------------------+------------+----------------+
| @@wsrep_OSU_method | @@wsrep_on | @@wsrep_desync |
+--------------------+------------+----------------+
| TOI                |          0 |              1 |
+--------------------+------------+----------------+
1 row in set (0.00 sec)

Session 2:

node1 > select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000;
+-----------+
| count(*)  |
+-----------+
| 423680000 |
+-----------+
1 row in set (14.07 sec)

Session 1:

node1 > alter table db1.sbtest1 engine=innodb;
Query OK, 0 rows affected (13.52 sec)
Records: 0 Duplicates: 0 Warnings: 0

Session 3:

node1 > select id,command,time,state,info from information_schema.processlist where user="root";
+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+
| id | command | time | state                           | info |
+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+
| 11 | Query   | 9    | Sending data                    | select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000 |
| 12 | Query   | 7    | Waiting for table metadata lock | alter table db1.sbtest1 engine=innodb |
| 17 | Query   | 0    | executing                       | select id,command,time,state,info from information_schema.processlist where user="root" |
+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
node1 > select id,command,time,state,info from information_schema.processlist where user="root";
+----+---------+------+----------------+-----------------------------------------------------------------------------------------+
| id | command | time | state          | info |
+----+---------+------+----------------+-----------------------------------------------------------------------------------------+
| 11 | Sleep   | 14   |                | NULL |
| 12 | Query   | 13   | altering table | alter table db1.sbtest1 engine=innodb |
| 17 | Query   | 0    | executing      | select id,command,time,state,info from information_schema.processlist where user="root" |
+----+---------+------+----------------+-----------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

In this case, there was no interruption: the ALTER waited gracefully for its MDL lock request to succeed and did its job when it became possible.

Remember, you have to execute the same commands on the rest of the nodes to make them consistent – even for noop-alter, it’s important to make the nodes consistent in terms of table size on disk.
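
A minimal sketch of repeating the same isolated ALTER on the remaining nodes (hostnames and credentials are placeholders); since wsrep_on is a session variable, all statements must run in the same session, which a single mysql -e invocation provides:

for host in node2 node3; do
  mysql -h "$host" -u root -p -e "
    SET GLOBAL wsrep_desync=1;
    SET wsrep_on=0;
    ALTER TABLE db1.sbtest1 ENGINE=InnoDB;
    SET wsrep_on=1;
    SET GLOBAL wsrep_desync=0;"
done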

Kill Problem

Another fact is that you cannot cancel or kill a DDL query executed using the RSU or TOI method:

node1 > kill query 12;
ERROR 1095 (HY000): You are not owner of thread 12

This may be an annoying problem when you need to unblock a node urgently. Fortunately, the workaround with wsrep_on=0 also allows you to kill an ALTER without that restriction:

Session 1:

node1 > kill query 22;
Query OK, 0 rows affected (0.00 sec)

Session 2:

node1 > alter table db1.sbtest1 engine=innodb;
ERROR 1317 (70100): Query execution was interrupted

Summary

The RSU method may be more intrusive than you’d expect. For schema-compatible changes, it is worth considering the “manual RSU” approach with

set global wsrep_desync=1; set wsrep_on=0;

When using it, though, please remember that wsrep_on applies to all types of writes, both DDL and DML, so be extra careful to set it back to 1 after the ALTER is done. The procedure will look like this:

SET GLOBAL wsrep_desync=1;
SET wsrep_on=0;
ALTER ...  /* compatible schema change only! */
SET wsrep_on=1;
SET GLOBAL wsrep_desync=0;
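
A quick sanity check afterwards doesn’t hurt; the session should report replication back on and desync disabled:

node1 > select @@wsrep_OSU_method,@@wsrep_on,@@wsrep_desync;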

Incidentally, as in my opinion the current RSU behavior is unnecessarily intrusive, I have filed this change suggestion: https://jira.percona.com/browse/PXC-2293


Photo by Pierre Bamin on Unsplash

Mar
19
2019
--

How To Test and Deploy Kubernetes Operator for MySQL(PXC) in OSX/macOS?

kubernetes on mac osx

In this blog post, I’m going to show you how to test Kubernetes locally on OSX/macOS. Testing Kubernetes without access to a cloud provider, in a local lab, is not as easy as it sounds, and I’d like to share some of my experiences in this adventure. For those who already have experience with the VirtualBox & Vagrant combination, I can tell you that it doesn’t work here. Since Kubernetes requires virtualization, setting up another virtual environment inside a VirtualBox VM causes several issues. After trying to bring up a cluster for a day or two, I gave up on my traditional lab and found that Kubernetes has an alternative solution called minikube.

Installation

If your OSX/macOS doesn’t have brew, I strongly recommend installing it. My macOS version at the time of this post was 10.14.3 (18D109).

$ brew update && brew install kubectl && brew cask install docker minikube virtualbox

Once minikube is installed, we’ll need to start the virtual environment that is required to run our operator.

I’m starting my minikube environment with 4GB of memory, since our Percona XtraDB Cluster (PXC) will have 3 MySQL nodes + 1 ProxySQL pod.

$ minikube start --memory 4096
?  minikube v0.35.0 on darwin (amd64)
?  Creating virtualbox VM (CPUs=2, Memory=4096MB, Disk=20000MB) ...
?  "minikube" IP address is 192.168.99.100
?  Configuring Docker as the container runtime ...
?  Preparing Kubernetes environment ...
?  Pulling images required by Kubernetes v1.13.4 ...
?  Launching Kubernetes v1.13.4 using kubeadm ...
?  Waiting for pods: apiserver proxy etcd scheduler controller addon-manager dns
?  Configuring cluster permissions ...
?  Verifying component health .....
?  kubectl is now configured to use "minikube"
?  Done! Thank you for using minikube!
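
Before moving on, a quick sanity check doesn’t hurt (output omitted here; it will vary by setup):

$ minikube status
$ kubectl get nodes
$ kubectl cluster-info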

We’re now ready to install Percona XtraDB Cluster on Kubernetes.

Setup

Clone the Kubernetes Operator for MySQL repository.

$ git clone -b release-0.2.0 https://github.com/percona/percona-xtradb-cluster-operator
Cloning into 'percona-xtradb-cluster-operator'...
remote: Enumerating objects: 191, done.
remote: Counting objects: 100% (191/191), done.
remote: Compressing objects: 100% (114/114), done.
remote: Total 10321 (delta 73), reused 138 (delta 67), pack-reused 10130
Receiving objects: 100% (10321/10321), 17.04 MiB | 3.03 MiB/s, done.
Resolving deltas: 100% (3526/3526), done.
Checking out files: 100% (5159/5159), done.
$ cd percona-xtradb-cluster-operator

Here we have to make the following modifications for this operator to work on OSX/macOS.

  1. Reduce memory allocation for each pod.
  2. Reduce CPU usage for each pod.
  3. Change the topology type (because we want to run all PXC instances on one node).
$ sed -i.bak 's/1G/500m/g' deploy/cr.yaml
$ grep "memory" deploy/cr.yaml
        memory: 500m
      #   memory: 500m
        memory: 500m
      #   memory: 500m
$ sed -i.bak 's/600m/200m/g' deploy/cr.yaml
$ grep "cpu" deploy/cr.yaml
        cpu: 200m
      #   cpu: "1"
        cpu: 200m
      #   cpu: 700m
$ grep "topology" deploy/cr.yaml
      topologyKey: "kubernetes.io/hostname"
    #   topologyKey: "failure-domain.beta.kubernetes.io/zone"
$ sed -i.bak 's/kubernetes\.io\/hostname/none/g' deploy/cr.yaml
$ grep "topology" deploy/cr.yaml
      topologyKey: "none"
    #   topologyKey: "failure-domain.beta.kubernetes.io/zone"

We’re now ready to deploy our PXC via the operator.

$ kubectl apply -f deploy/crd.yaml
customresourcedefinition.apiextensions.k8s.io/perconaxtradbclusters.pxc.percona.com created
customresourcedefinition.apiextensions.k8s.io/perconaxtradbbackups.pxc.percona.com created
$ kubectl create namespace pxc
namespace/pxc created
$ kubectl config set-context $(kubectl config current-context) --namespace=pxc
Context "minikube" modified.
$ kubectl apply -f deploy/rbac.yaml
role.rbac.authorization.k8s.io/percona-xtradb-cluster-operator created
rolebinding.rbac.authorization.k8s.io/default-account-percona-xtradb-cluster-operator created
$ kubectl apply -f deploy/operator.yaml
deployment.apps/percona-xtradb-cluster-operator created
$ kubectl apply -f deploy/secrets.yaml
secret/my-cluster-secrets created
$ kubectl apply -f deploy/configmap.yaml
configmap/pxc created
$ kubectl apply -f deploy/cr.yaml
perconaxtradbcluster.pxc.percona.com/cluster1 created

Here we’re ready to monitor the progress of our deployment.

$ kubectl get pods
NAME                                               READY   STATUS              RESTARTS   AGE
cluster1-pxc-node-0                                0/1     ContainerCreating   0          86s
cluster1-pxc-proxysql-0                            1/1     Running             0          86s
percona-xtradb-cluster-operator-5857dfcb6c-g7bbg   1/1     Running             0          109s

If any of the pods has difficulty reaching the Running state, you can inspect it with kubectl describe:

$ kubectl describe pod cluster1-pxc-node-0
Name:               cluster1-pxc-node-0
Namespace:          pxc
Priority:           0
.
..
...
Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  3m47s (x14 over 3m51s)  default-scheduler  pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled         3m47s                   default-scheduler  Successfully assigned pxc/cluster1-pxc-node-0 to minikube
  Normal   Pulling           3m45s                   kubelet, minikube  pulling image "perconalab/pxc-openshift:0.2.0"
  Normal   Pulled            118s                    kubelet, minikube  Successfully pulled image "perconalab/pxc-openshift:0.2.0"
  Normal   Created           117s                    kubelet, minikube  Created container
  Normal   Started           117s                    kubelet, minikube  Started container
  Warning  Unhealthy         89s                     kubelet, minikube  Readiness probe failed:

At this stage, we’re ready to verify our cluster as soon as we see the following output (READY 1/1):

$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-pxc-node-0                                1/1     Running   0          7m38s
cluster1-pxc-node-1                                1/1     Running   0          4m46s
cluster1-pxc-node-2                                1/1     Running   0          2m25s
cluster1-pxc-proxysql-0                            1/1     Running   0          7m38s
percona-xtradb-cluster-operator-5857dfcb6c-g7bbg   1/1     Running   0          8m1s

In order to connect to this cluster, we’ll need to deploy a client pod for shell access.

$ kubectl run -i --rm --tty percona-client --image=percona:5.7 --restart=Never -- bash -il
If you don't see a command prompt, try pressing enter.
bash-4.2$ mysql -h cluster1-pxc-proxysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3617
Server version: 5.5.30 (ProxySQL)
Copyright (c) 2009-2019 Percona LLC and/or its affiliates
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.7.25-28, for Linux (x86_64) using  6.2
Connection id:		3617
Current database:	information_schema
Current user:		root@cluster1-pxc-proxysql-0.cluster1-pxc-proxysql.pxc.svc.cluste
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.5.30 (ProxySQL)
Protocol version:	10
Connection:		cluster1-pxc-proxysql via TCP/IP
Server characterset:	latin1
Db     characterset:	utf8
Client characterset:	latin1
Conn.  characterset:	latin1
TCP port:		3306
Uptime:			14 min 1 sec
Threads: 1  Questions: 3  Slow queries: 0
--------------

A few things to remember:

  • Secrets for this setup are defined in deploy/secrets.yaml; you can decode a value via
$ echo -n '{secret}' |base64 -D
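
For example, assuming the default deploy/secrets.yaml layout, where the root password is stored under the root key of the my-cluster-secrets secret created above, you could extract it with something like:

$ kubectl get secret my-cluster-secrets -o yaml
$ kubectl get secret my-cluster-secrets -o jsonpath='{.data.root}' | base64 -D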

  • To reconnect to the client shell:
$ kubectl run -i --tty percona-client --image=percona:5.7 -- sh

  • To redeploy the pod, delete it first and then repeat the above steps without configuration changes:
$ kubectl delete -f deploy/cr.yaml

  • To stop and delete the minikube virtual environment:
$ minikube stop

$ minikube delete



Photo by frank mckenna on Unsplash

Feb
28
2019
--

Percona XtraDB Cluster 5.6.43-28.32 Is Now Available

Percona XtraDB Cluster 5.7

Percona XtraDB Cluster 5.7

Percona is glad to announce the release of Percona XtraDB Cluster 5.6.43-28.32 on February 28, 2019. Binaries are available from the downloads section or from our software repositories.

This release of Percona XtraDB Cluster includes support for Ubuntu 18.10 (Cosmic Cuttlefish). Percona XtraDB Cluster 5.6.43-28.32 is now the current release, based on the following:

All Percona software is open-source and free.

Bugs Fixed

  • PXC-2388: In some cases, DROP FUNCTION function_name was not replicated.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Feb
28
2019
--

Percona XtraDB Cluster 5.7.25-31.35 Is Now Available

Percona XtraDB Cluster 5.7

Percona is glad to announce the release of Percona XtraDB Cluster 5.7.25-31.35 on February 28, 2019. Binaries are available from the downloads section or from our software repositories.

This release of Percona XtraDB Cluster includes support for Ubuntu 18.10 (Cosmic Cuttlefish). Percona XtraDB Cluster 5.7.25-31.35 is now the current release, based on the following:

All Percona software is open-source and free.

Bugs Fixed

  • PXC-2346: mysqld could crash when executing mysqldump --single-transaction while the binary log is disabled. This problem was also reported in PXC-1711, PXC-2371, and PXC-2419.
  • PXC-2388: In some cases, DROP FUNCTION function_name was not replicated.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Feb
20
2019
--

ProxySQL Native Support for Percona XtraDB Cluster (PXC)

galera proxy content image

ProxySQL in its versions up to 1.x did not natively support Percona XtraDB Cluster (PXC). Instead, it relied on the flexibility offered by the scheduler. This approach allowed users to implement their own preferred way to manage the ProxySQL behaviour in relation to the Galera events.

From version 2.0, we can use native ProxySQL support for PXC. The mechanism to activate native support is very similar to the one already in place for group replication.

In brief, it is based on the table [runtime_]mysql_galera_hostgroups, and the information needed is mostly the same:

  • writer_hostgroup: the hostgroup ID that refers to the WRITER
  • backup_writer_hostgroup: the hostgroup ID of the hostgroup that will contain the candidate writer servers
  • reader_hostgroup: the reader hostgroup ID, containing the list of servers that need to be taken into consideration
  • offline_hostgroup: The Hostgroup ID that will eventually contain the writer that will be put OFFLINE
  • active: True[1]/False[0] if this configuration needs to be used or not
  • max_writers: This will contain the MAX number of writers you want to have at the same time. In a sane setup this should be always 1, but if you want to have multiple writers, you can define it up to the number of nodes.
  • writer_is_also_reader: If true [1] the Writer will NOT be removed from the reader HG
  • max_transactions_behind: the value of wsrep_local_recv_queue after which the node will be set OFFLINE. This must be set carefully, observing the node behaviour (see the status one-liner right after this list).
  • comment: I suggest putting some meaningful notes here to identify what is what.
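
As a hedged aid for calibrating max_transactions_behind, you can sample the receive queue status counters on each node while it is under its normal load:

SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue%';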

Given the above, let us see what we need to do in order to have a working Galera native solution.
I will have three servers:

192.168.1.205 (Node1)
  192.168.1.21  (Node2)
  192.168.1.231 (node3)

As a set of hostgroups, I will have:

Writer  HG-> 100
Reader  HG-> 101
BackupW HG-> 102
offHG   HG-> 9101

To set it up

Servers first:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.205',101,3306,1000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.21',101,3306,1000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.231',101,3306,1000);

Then the galera settings:

insert into mysql_galera_hostgroups (writer_hostgroup,backup_writer_hostgroup,reader_hostgroup, offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind) values (100,102,101,9101,0,1,1,16);

As usual if we want to have R/W split we need to define the rules for it:

insert into mysql_query_rules (rule_id,proxy_port,schemaname,username,destination_hostgroup,active,retries,match_digest,apply) values(1040,6033,'windmills','app_test',100,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,schemaname,username,destination_hostgroup,active,retries,match_digest,apply) values(1041,6033,'windmills','app_test',101,1,3,'^SELECT.*@@',1);
save mysql query rules to disk;
load mysql query rules to run;

Then there is another important variable: the server version. Please do yourself a favor and NEVER use the default.

update global_variables set variable_value='5.7.0' where variable_name='mysql-server_version';
LOAD MYSQL VARIABLES TO RUNTIME;SAVE MYSQL VARIABLES TO DISK;

Finally activate the whole thing:

save mysql servers to disk;
load mysql servers to runtime;

One thing to note before we go ahead. In the list of servers I had:

  1. Filled only the READER HG
  2. Used the same weight

This is because of the election mechanism ProxySQL will use to identify the writer, and the (many) problems that may come with it.

For now let us go ahead and see what happens when I load this information to runtime.

Before running the above commands:

+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

After:

+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 1000   | 100       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 501        |
| 1000   | 101       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 501        |
| 1000   | 101       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 546        |
| 1000   | 101       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 467        |
| 1000   | 102       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 546        |
| 1000   | 102       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 467        |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
mysql> select * from runtime_mysql_galera_hostgroups \G
*************************** 1. row ***************************
       writer_hostgroup: 100
backup_writer_hostgroup: 102
       reader_hostgroup: 101
      offline_hostgroup: 9101
                active: 0  <----------- note this
            max_writers: 1
  writer_is_also_reader: 1
max_transactions_behind: 16
                comment: NULL
1 row in set (0.01 sec)

As we can see, ProxySQL has taken the nodes from my READER group and distributed them, electing one node as the writer and assigning the others to the backup_writer hostgroup.

But – there is a but – wasn’t my configuration set with active=0? Indeed it was, and I assume this is a bug (Issue #1902).

The other thing we should note is that ProxySQL elected node 3 (192.168.1.231) as the writer.
As I said before, what should we do if we want a specific node as the preferred writer?

We need to modify its weight. So, say we want node 1 (192.168.1.205) as the writer; we will need something like this:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.205',101,3306,10000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.21',101,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.231',101,3306,100);

Doing that will give us :

+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 10000  | 100       | 192.168.1.205 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 2209       |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 546        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 508        |
| 10000  | 101       | 192.168.1.205 | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 2209       |
| 100    | 102       | 192.168.1.231 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 546        |
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 508        |
+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

If you noticed, given that we set the WEIGHT of node 1 higher, this node will also become the most utilized for reads.
We probably do not want that, so let us modify the reader weight.

update mysql_servers set weight=10 where hostgroup_id=101 and hostname='192.168.1.205';
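
A hedged reminder, following the same admin workflow used earlier in this post: changes to mysql_servers only take effect after being loaded to runtime (and saved to disk if you want them to persist):

load mysql servers to runtime;
save mysql servers to disk;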

At this point, if we trigger a failover with set global wsrep_reject_queries=all; on node 1, ProxySQL will take action and elect another node as the writer:

+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 100    | 100       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 562        |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 562        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0	      | 0           | 0	      | 0                 | 0               | 0               | 588        |
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 588        |
| 10000  | 9101      | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 468        |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

Node 3 (192.168.1.231) is the new writer, and node 1 is in the special OFFLINE group.
Let’s see now what happens if we put node 1 back.

+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 10000  | 100       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 449        |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 532        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 569        |
| 10000  | 101       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 449        |
| 100    | 102       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 532        |
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 569        |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

Oops, the reader entry has come back with the HIGHEST weight and, as such, it will once more be the most used node. To fix it, we need to re-run the update as before.

But is there a way to avoid this? In short, the answer is NO!
This, in my opinion, is BAD and is worth a feature request, because this can really bring a node to its knees.

Now, this is not the only problem. There is another point that is probably worth discussing, which is the fact that ProxySQL currently does both FAILOVER and FAILBACK.

Failover is obviously something we want to have, but failback is another discussion. The point is, once the failover is complete and the cluster has redistributed the incoming requests, doing a failback is an impacting operation that can be a disruptive one too.

If all nodes are treated as equal, there is no real way to prevent it. But if YOU set a node to be the main writer, something can be done; let us see what and how.
Say we have:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.205',101,3306,1000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.21',101,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.231',101,3306,100);
+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 1000   | 100       | 192.168.1.205 | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 470        |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 558        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 613        |
| 10     | 101       | 192.168.1.205 | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 470        |
| 100    | 102       | 192.168.1.231 | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 558        |
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 613        |
+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

Let us put the node down:

set global wsrep_reject_queries=all;

And check:

+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 100    | 100       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0	      | 0           | 0	      | 0                 | 0               | 0               | 519        |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0	      | 0           | 0	      | 0                 | 0               | 0               | 519        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0	      | 0           | 0	      | 0                 | 0               | 0               | 506        |
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 506        |
| 1000   | 9101      | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 527        |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

We can now manipulate the weight in the special OFFLINE group and see what happens:

update mysql_servers set weight=10 where hostgroup_id=9101 and hostname='192.168.1.205'

Then I put the node up again:
set global wsrep_reject_queries=none;

+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 100    | 100       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 537        |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 537        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 573        |
| 10     | 101       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0	   | 0           | 0	   | 0                 | 0               | 0               | 458	|
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0	   | 0           | 0	   | 0                 | 0               | 0               | 573	|
| 10     | 102       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 458        |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

That’s it, the node is back but with no service interruption.

At this point, we can decide whether to make this node a reader like the others, or to wait and plan a proper time of day when we can put it back as the writer; in the meantime, it gets a bit of load to warm its buffer pool.

The other point – and an important piece of information – is: what is ProxySQL currently checking on Galera? From reading the code, ProxySQL will trap the following:

  • read_only
  • wsrep_local_recv_queue
  • wsrep_desync
  • wsrep_reject_queries
  • wsrep_sst_donor_rejects_queries
  • primary_partition

Plus the standard sanity checks on the node.

Finally to monitor the whole situation we can use this:

mysql> select * from mysql_server_galera_log order by time_start_us desc limit 10;
+---------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+
| hostname      | port | time_start_us    | success_time_us | primary_partition | read_only | wsrep_local_recv_queue | wsrep_local_state | wsrep_desync | wsrep_reject_queries | wsrep_sst_donor_rejects_queries | error |
+---------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+
| 192.168.1.231 | 3306 | 1549982591661779 | 2884            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.21  | 3306 | 1549982591659644 | 2778            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.205 | 3306 | 1549982591658728 | 2794            | YES               | NO        | 0                      | 4                 | NO           | YES                  | NO                              | NULL  |
| 192.168.1.231 | 3306 | 1549982586669233 | 2827            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.21  | 3306 | 1549982586663458 | 5100            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.205 | 3306 | 1549982586658973 | 4132            | YES               | NO        | 0                      | 4                 | NO           | YES                  | NO                              | NULL  |
| 192.168.1.231 | 3306 | 1549982581665317 | 3084            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.21  | 3306 | 1549982581661261 | 3129            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.205 | 3306 | 1549982581658242 | 2786            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.231 | 3306 | 1549982576661349 | 2982            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
+---------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+

As you can see above, the log table keeps track of what has changed. In this case, it reports that node 1 has wsrep_reject_queries activated, and it will continue to log this until we set wsrep_reject_queries=none.

Conclusions

ProxySQL’s native Galera integration is a useful feature for managing any Galera implementation, no matter whether it’s Percona XtraDB Cluster (PXC), MariaDB Cluster, or MySQL/Galera.

The generic approach is obviously a good thing; still, it may miss some specific extensions, like the performance_schema pxc_cluster_view table we have in PXC.

I’ve already objected about the failover/failback, and I am here again to remind you: whenever you do a controlled failover REMEMBER to change the weight to prevent an immediate failback.

This is obviously not possible in the case of a real failover where, for instance, a simple temporary eviction will cause two downtimes instead of only one. Some environments are fine with that, others not so much.

Personally I think there should be a FLAG in the configuration, such that we can decide if failback should be executed or not.

 

Feb
08
2019
--

ProxySQL 1.4.14 and Updated proxysql-admin Tool

ProxySQL 1.4.14

ProxySQL 1.4.14

ProxySQL 1.4.14, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and database servers in the MySQL ecosystem (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.14 source and binary packages available from the Percona download page for ProxySQL include ProxySQL Admin – a tool developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.14 are available as well. You can download the original ProxySQL from GitHub. GitHub hosts the documentation in the wiki format.

This release introduces an improvement in how proxysql-admin works with the --max-connections option. In previous releases, this option was always equal to 1000. Now, proxysql_galera_checker uses the value of the --max-connections option set by the user, either on the command line (proxysql-admin --max-connections) or in the configuration file.

If the user doesn’t set this option, it defaults to 1000.

Improvements

  • PSQLADM-130: Every time a node is removed and then added back, the proxysql_galera_checker script restores the custom value of the --max-connections option set using proxysql-admin --max-connections.
  • The --syncusers option of proxysql-admin now supports MariaDB. Thanks to Jonas Kint (@jonaskint) for this contribution.

ProxySQL is available under Open Source license GPLv3.

Feb
05
2019
--

Upcoming Webinar Wed 2/6: Percona Software News and Roadmap Update

Percona Software News and Roadmap Update Webinar

Join Percona CEO Peter Zaitsev as he presents Percona Software News and Roadmap Update on Wednesday, February 6, 2019, at 11:00 AM PST (UTC-8) / 2:00 PM EST (UTC-5).

Register Now

Come and listen to Percona CEO Peter Zaitsev discuss what’s new in Percona open source software. Topics include Percona Server for MySQL and MongoDB, Percona XtraBackup, Percona Toolkit, Percona XtraDB Cluster and Percona Monitoring and Management.

During this webinar, Peter will talk about newly released features in Percona software. He will also show a few quick demos and share with you highlights from the Percona open source software roadmap.

Peter will also talk about new developments in Percona commercial services and finish with a Q&A.

Register today to join Peter for his Percona Software News and Roadmap Update.

Jan
28
2019
--

Upcoming Webinar Wed 1/30: Percona XtraDB Cluster: Failure Scenarios and their Recovery

Percona XtraDB Cluster: Failure Scenarios and their Recovery

Please join Percona’s Senior Technical Manager, Alkin Tezuysal, and Percona XtraDB Cluster Lead, Krunal Bauskar, as they present their talk, Percona XtraDB Cluster: Failure Scenarios and their Recovery, on Wednesday, January 30, 2019, at 8:00 AM PST (UTC-8) / 11:00 AM EST (UTC-5).

Register Now

Percona XtraDB Cluster (a.k.a. PXC) is an open source, multi-master, high availability MySQL clustering solution. PXC works with your MySQL / Percona Server-created databases. Given the multi-master aspect, there are multiple guards to protect the cluster from entering an inconsistent state. Most of these guards are configurable based on the user environment. However, if they are not configured properly, they could cause the cluster to stall, fail, or error out.

In this session, we’ll discuss failure scenarios, including a MySQL cluster entering a non-primary state due to network partitioning. We’ll also discuss a cluster stall due to flow control, data inconsistency causing the shutdown of a node, and common problems during the initial catch-up – a.k.a. State Snapshot Transfer (SST). Other issues include delays in the purging of transactions, a blocking DDL causing the entire cluster to stall, and a misconfigured cluster.

We will also go over how to solve some of these problems and how to safely recover from these failures.

To learn more, register for Percona XtraDB Cluster: Failure Scenarios and their Recovery.

Jan
18
2019
--

Percona XtraDB Cluster Operator 0.2.0 Early Access Release Is Now Available

Percona XtraDB Cluster Operator

 

Percona announces the release of Percona XtraDB Cluster Operator  0.2.0 early access.

The Percona XtraDB Cluster Operator simplifies the deployment and management of Percona XtraDB Cluster in a Kubernetes or OpenShift environment. It extends the Kubernetes API with a new custom resource for deploying, configuring, and managing the application through the whole life cycle.

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help save your time and effort.

Percona software builds located in the Percona-Lab and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements.

You can install the Percona XtraDB Cluster Operator on Kubernetes or OpenShift. While the operator does not support all the Percona XtraDB Cluster features in this early access release, instructions on how to install and configure it are already available along with the operator source code, hosted in our Github repository.

The Percona XtraDB Cluster Operator on Percona-Lab is an early access release. Percona doesn’t recommend it for production environments. 

New features

  • Advanced node assignment, implemented in this version, allows running containers with Percona XtraDB Cluster nodes on different hosts, availability zones, etc., to achieve higher availability and fault tolerance.
  • Cluster backups are now supported, and can be performed on a schedule or on demand.
  • Percona XtraDB Cluster Operator now supports private container registries like those in OpenShift so that Internet access is not required to deploy the operator.

Improvements

  • CLOUD-69: Annotations and labels are now passed from the deploy/cr.yaml configuration file to a StatefulSet for both Percona XtraDB Cluster and ProxySQL Pods
  • CLOUD-55: Now setting a password for the ProxySQL admin user is supported.
  • CLOUD-48: Migration to operator SDK 0.2.1

Fixed Bugs

  • CLOUD-82: Pods were stopped in random order during cluster removal, which could cause problems when recreating a cluster with the same name.
  • CLOUD-79: Setting a long cluster name in the deploy/cr.yaml file made Percona XtraDB Cluster unable to start.
  • CLOUD-54: The clustercheck tool used the monitor user instead of its own clustercheck user for liveness and readiness probes.

Percona XtraDB Cluster is an open source, cost-effective, and robust clustering solution for businesses. It integrates Percona Server for MySQL with the Galera replication library to produce a highly available and scalable MySQL® cluster, complete with synchronous multi-master replication, zero data loss, and automatic node provisioning using Percona XtraBackup.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

Jan
18
2019
--

Replication Manager Works with MariaDB

Complex multi-cluster replication topology

Some time ago I wrote a script to manage asynchronous replication links between Percona XtraDB clusters. The original post can be found here. The script worked well with Percona XtraDB Cluster, but it wasn’t working well with MariaDB®. Finally, the replication manager now works with MariaDB.

First, let’s review the purpose of the script. Managing replication links between Galera based clusters is a tedious task. There are many potential slaves and many potential masters. Furthermore, each replication link must have only a single slave. Just try to imagine how you would maintain the following replication topology:

A complex replication topology

The above topology consists of five clusters and four master-to-master links. The replication manager can easily handle this topology. Of course, it is not a fix for the limitations of asynchronous replication; you must make sure your writes are replication safe. You might want, for example, to replicate a global user list or to centralize some access logs. Just to refresh memories, here are some of the script’s highlights:

  • Uses the Galera cluster for Quorum
  • Configurable, arbitrarily complex topologies
  • The script stores the topology in database tables
  • Elects slaves automatically
  • Monitors replication links
  • Slaves can connect to a list of potential masters

As you probably know, MariaDB has a different GTID implementation and syntax for the multi-source replication commands. I took some time to investigate why the script was failing and fixed it. Now, provided you are using MariaDB 10.1.4+ with GTIDs, the replication manager works fine.
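
For illustration, here is a hedged sketch of what a GTID-based multi-source replication link looks like in MariaDB syntax (the connection name, host, and credentials are placeholders):

CHANGE MASTER 'cluster1_link' TO
  MASTER_HOST='10.0.0.1',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_USE_GTID=slave_pos;
START SLAVE 'cluster1_link';
SHOW SLAVE 'cluster1_link' STATUS\G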

You can find the script here. Be aware that although I work for Percona, the script is not officially supported by Percona.
