Jun 28, 2023

Consequences of Forcing Bootstrap on Percona XtraDB Cluster


Recently, I was working on a very unfortunate case that revolved around diverging clusters, data loss, missing important log errors, and forcing commands on Percona XtraDB Cluster (PXC). Even though PXC tries its best to explain what happens in the error log, I can vouch that it can be missed or overlooked when you do not know what to expect.

This blog post is a cautionary tale, an invitation to try it yourself and break stuff (not in production, right?).

TL;DR:
Do you know right away what happened when you see this log?

2023-06-22T08:23:29.003334Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_group.cpp:group_post_state_exchange():433: Reversing history: 171 -> 44, this member has applied 127 more events than the primary component.Data loss is possible. Must abort.

Demonstration

Using the great https://github.com/datacharmer/dbdeployer:

$ dbdeployer deploy replication --topology=pxc --sandbox-binary=~/opt/pxc  8.0.31

Let’s write some data

$ ./sandboxes/pxc_msb_8_0_31/sysbench oltp_read_write --tables=2 --table-size=1000 prepare

Then let’s suppose someone wants to restart node 1. For some reason, they read somewhere in your internal documentation that they should bootstrap in that situation. With dbdeployer, this will translate to:

$ ./sandboxes/pxc_msb_8_0_31/node1/stop
stop /home/yoann-lc/sandboxes/pxc_msb_8_0_31/node1

$ ./sandboxes/pxc_msb_8_0_31/node1/start --wsrep-new-cluster
......................................................................................................^C

It fails, as it should.

In reality, those bootstrap mistakes happen in homemade start scripts, Puppet or Ansible modules, or even internal procedures applied in the wrong situation.

Why did it fail? First error to notice:

2023-06-22T08:00:48.322148Z 0 [ERROR] [MY-000000] [Galera] It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .

Reminder: Bootstrap should only be used when every node has been double-checked to be down; it’s a manual operation. It fails here because it was not forced and because this node was not the last to be stopped in the cluster.

Good reflex: Connect to the other mysql nodes and check the ‘wsrep_cluster_size’ and ‘wsrep_cluster_status’ statuses before doing anything.

mysql> show global status where variable_name IN ('wsrep_local_state','wsrep_local_state_comment','wsrep_local_commits','wsrep_received','wsrep_cluster_size','wsrep_cluster_status','wsrep_connected');

Do not: Apply blindly what this log is telling you to do.

But we are here to “fix” around and find out, so let’s bootstrap.

$ sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' ./sandboxes/pxc_msb_8_0_31/node1/data/grastate.dat
$ ./sandboxes/pxc_msb_8_0_31/node1/start --wsrep-new-cluster
.. sandbox server started

At this point, notice that from node1, you have:

$ ./sandboxes/pxc_msb_8_0_31/node1/use -e "show global status where variable_name in ('wsrep_cluster_status', 'wsrep_cluster_size')"
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_size   | 1       |
| wsrep_cluster_status | Primary |
+----------------------+---------+

But from node2 and node3 you will have:

$ ./sandboxes/pxc_msb_8_0_31/node2/use -e "show global status where variable_name in ('wsrep_cluster_status', 'wsrep_cluster_size')"
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_size   | 2       |
| wsrep_cluster_status | Primary |
+----------------------+---------+

Looks fishy. But does your monitoring really alert you to this?

Let’s write some more data, obviously on node1, because why not? It looks healthy.

$ ./sandboxes/pxc_msb_8_0_31/node1/sysbench oltp_delete --tables=2 --table-size=1000 --events=127  run

127 will be useful later on.

Nightmare ensues

Fast forward a few days. You are still writing to your node. A new reason to restart node1 comes up; maybe you want to apply a parameter change.

$ ./sandboxes/pxc_msb_8_0_31/node1/restart
.............................................................................................................................................................^C

It fails?

Reviewing logs, you would find:

$ less sandboxes/pxc_msb_8_0_31/node1/data/msandbox.err
...
2023-06-22T08:23:29.003334Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_group.cpp:group_post_state_exchange():433: Reversing history: 171 -> 44, this member has applied 127 more events than the primary component.Data loss is possible. Must abort.
...

Voilà, we find our “127” again.

Good reflex: It depends. It would need a post of its own, but that’s a serious problem.

Do not: Force SST on this node, because it will work, and all data inserted on node1 will be lost.

What does it mean?

When forcing bootstrap, a node will always start. It will never try to connect to the other nodes, even if they are healthy. The other nodes won’t try to connect to it either; from their point of view, it just never joined, so it’s not part of the cluster.

When restarting the previously bootstrapped node1 in non-bootstrapped mode, that’s the first time they all see each other in a while.

Each time a transaction is committed, it is replicated along with a sequence number (seqno). The seqno is an ever-growing number. It is used by nodes to determine if incremental state transfer is possible, or if a node state is coherent with others.
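
A quick way to see where each node stands is to compare the committed seqno across nodes. A minimal sketch using the sandbox paths from above (wsrep_last_committed is the running counter; grastate.dat only records a definitive seqno on a clean shutdown):

$ ./sandboxes/pxc_msb_8_0_31/node1/use -e "show global status like 'wsrep_last_committed'"
$ ./sandboxes/pxc_msb_8_0_31/node2/use -e "show global status like 'wsrep_last_committed'"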

Now that node1 is no longer in bootstrap mode, it connects to the other members and shares its state (last primary members, seqno). The other nodes correctly pick up that this seqno looks suspicious: it is higher than their own, meaning the joining node could have applied more transactions. It could also mean it came from some other cluster.

Because nodes are in doubt, nothing will happen. Node1 is denied joining and will not do anything. It won’t try to resynchronize automatically, and it won’t touch its data. Node2 and node3 are not impacted; they will be kept as is too.

How to proceed from there depends on the situation; there are no general guidelines. Ideally, a source of truth should be found. If both clusters applied writes, that’s the toughest situation to be in: it is a split brain.

Note: seqno are just numbers. Having equal seqno does not actually guarantee that the underlying transactions applied are identical, but it’s still useful as a simple sanity check. If we were to mess around even more and apply 127 transactions on node2, or even modify seqno manually in grastate.dat, we could have “interesting” results. Try it out (not in production, mind you)!
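
For reference, the fields you would be touching live in grastate.dat. On this sandbox it looks roughly like the following (the uuid will differ; seqno stays at -1 while the node is running and is only written out on a clean shutdown):

$ cat ./sandboxes/pxc_msb_8_0_31/node1/data/grastate.dat
# GALERA saved state
version: 2.1
uuid:    <cluster state UUID>
seqno:   -1
safe_to_bootstrap: 0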

Note: If you are unaware of bootstrapping and how to properly recover, check out the documentation.

Conclusion

Bootstrap is a last-resort procedure; don’t force it lightly. Do not force SST right away if a node refuses to join either. Always check the error log first.

Fortunately, PXC does not blindly let any node join without some sanity checks.


Dec 09, 2021

A Look Into Percona XtraDB Cluster Non-Blocking Operation for Online Schema Upgrade


Percona XtraDB Cluster 8.0.25 (PXC) has introduced a new option to perform online schema modifications: NBO (Non-Blocking Operation).

When using PXC, the cluster relies on the wsrep_OSU_method parameter to define the Online Schema Upgrade (OSU) method the node uses to replicate DDL statements. 

Until now, we normally had three options:

  • Use Total Order Isolation (TOI, the default)
  • Use Rolling Schema Upgrade (RSU)
  • Use Percona’s online schema change tool (TOI + PTOSC)

Each method has some positive and negative aspects. TOI will lock the whole cluster from being able to accept data modifications for the entire time it takes to perform the DDL operation. RSU will misalign the schema definition between the nodes, and in any case, the node performing the DDL operation is still locked. Finally, TOI+PTOSC will rely on creating triggers and copying data, so in some cases, this can be very impactful. 

The new Non-Blocking Operation (NBO) method is meant to reduce the impact on the cluster and make it easier to perform some DDL operations.

At the moment, we only support a limited set of operations with NBO, such as:

  • ALTER INDEX
  • CREATE INDEX
  • DROP INDEX

Any other command will result in an error message ER_NOT_SUPPORTED_YET.
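
As an illustration, here is a sketch of what is and is not accepted once a session switches to NBO, using the same table and ALTERs as the tests below (the unsupported statement is only a hypothetical example and is expected to fail with ER_NOT_SUPPORTED_YET):

SET SESSION wsrep_OSU_method=NBO;

-- supported: index DDL
ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=shared;
ALTER TABLE windmills_test DROP INDEX idx_1, ALGORITHM=INPLACE;

-- anything else, e.g. adding a column, is expected to be rejected under NBO:
-- ALTER TABLE windmills_test ADD COLUMN note char(3);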

But let us see how it works and what the impact is, while also comparing it with the default method, TOI.

What we will do is work with five connections:

1 – to perform the DDL
2 – to insert data into the table being altered
3 – to insert data into a different table
4-5 – to check the operations on the other two nodes

PXC must be at least Version 8.0.25-15.1.
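
A quick way to confirm what you are running (the version comment reports the PXC release and WSREP version):

select @@version, @@version_comment\G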

The table we will modify is:

DC1-1(root@localhost) [windmills_s]>show create table windmills_test\G
*************************** 1. row ***************************
       Table: windmills_test
Create Table: CREATE TABLE `windmills_test` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `uuid` char(36) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `millid` smallint NOT NULL,
  `kwatts_s` int NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `active` tinyint NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`,`active`),
  KEY `IDX_active` (`id`,`active`),
  KEY `kuuid_x` (`uuid`),
  KEY `millid_x` (`millid`),
  KEY `active_x` (`active`)
) ENGINE=InnoDB AUTO_INCREMENT=8199260 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC
1 row in set (0.00 sec)

And contains ~five million rows.

DC1-1(root@localhost) [windmills_s]>select count(*) from windmills_test;
+----------+
| count(*) |
+----------+
|  5002909 |
+----------+
1 row in set (0.44 sec)

The Commands

Connection 1:

ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE;
  ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;

 

Connection 2:

while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills7 limit 1;" -e "select count(*) from windmills_s.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

 

Connection 3:

while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "insert into windmills8  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills7 limit 1;" -e "select count(*) from windmills_s.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

 

Connections 4-5:

while [ 1 = 1 ];do echo "$(date +'%T.%3N')";/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_s)"|grep -i -v localhost;sleep 1;done

Operations

  • Start inserts from connections 2 and 3
  • Start commands in connections 4 – 5 on the other nodes
  • Execute: 
    • For TOI 
      • DC1-1(root@localhost) [windmills_s]>SET SESSION wsrep_OSU_method=TOI;
    • For NBO
      • DC1-1(root@localhost) [windmills_s]>SET SESSION wsrep_OSU_method=NBO;
    • For both
      • DC1-1(root@localhost) [windmills_s]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=shared;

Let’s Run It

Altering a Table with TOI

DC1-1(root@localhost) [windmills_s]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE;
Query OK, 0 rows affected (1 min 4.74 sec)
Records: 0  Duplicates: 0  Warnings: 0

 

Inserts in the altering table (connection 2):

.450
.492
64.993 <--- Alter blocks all inserts on the table we are altering
.788
.609

 

Inserts on the other table (connection 3):

.455
.461
64.161 <--- Alter blocks all inserts on all the other tables as well
.641
.483

 

On the other nodes at the same time of the ALTER we can see:

Id  User             db         Command Time  State             Info                                                                            Time_ms Rows_sent Rows_examined 
15	system user		windmills_s	Query	102	  altering table	ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE	102238	0	       0    <--- time from start

In short, we have the whole cluster locked for ~64 seconds. During this period of time, all the operations to modify data or structure were on hold. 

Let’s Now Try With NBO

Inserts in the altering table:

.437
.487
120.758 <---- Execution time increase
.617
.510

 

Inserts on the other table:

.468
.485
25.061 <---- still a metalock, but not locking the other tables for the whole duration 
.494
.471

 

On the other nodes at the same time of the ALTER we can see:

Id      User         db             Command Time  State             Info                                                                            Time_ms Rows_sent Rows_examined 
110068	system user	 windmills_s	Connect	86	  altering table	ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE	 120420	 0	          0

 

In this case, what is also interesting to note is that:

  1. We have a moment of metalock:
    1. 110174 pmm 127.0.0.1:42728 NULL Query 2 Waiting for table metadata lock  SELECT x FROM information_schema.tables  WHERE TABLE_SCHEMA = 'windmills_s' 1486 10    0
    2. 110068  system user connecting host windmills_s Connect 111 closing tables ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE 111000 0 0
  2. The execution time is longer

Summarizing:

                                               TOI          NBO
Time on hold for inserts on the altered table  ~64 sec      ~120 sec
Time on hold for inserts on another table      ~64 sec      ~25 sec
Metadata lock                                  whole time   only at the end

What is Happening? What are the Differences and Why Does it Take Longer with NBO?

Let’s see at a very high level how the two work:

  • TOI: when you issue a DDL like ADD INDEX, a metadata lock is taken on the table and it will be released only at the end of the operation. During this time, you cannot: 
    • Perform DMLs on any cluster node
    • Alter another table in the cluster
  • NBO: the metadata lock is taken at the start and at the end for a very brief period of time. The ADD INDEX operation will then work on each node independently. The lock taken at the end is to have all the nodes agree on the operation and commit or rollback (using cluster error voting). This final phase costs a bit more in time and is what adds a few seconds to the operation execution. But during the operation:
    • You can alter another table (using NBO)
    • You can continue to insert data, except in the table(s) you are altering.
    • On node crash, the operation will continue on the other nodes, and if successful it will persist.  

In short, the cluster’s behavior changes significantly when using NBO, offering significant flexibility compared to TOI. The cost in time should not increase linearly with the size of the table, but rather depends on how efficiently the single node performs the ALTER operation.

Conclusion

NBO can be significantly helpful to reduce the impact of DDL on the cluster, for now, limited to the widely used creation/modification/drop of an index. But in the future … we may expand it. 

The feature is still a technology preview, so do not trust it in production, but test it and let us know what you think. 

Final comment: another distribution has introduced NBO, but only if you buy the enterprise version.

Percona, which is truly open source with facts not just words, has implemented NBO in standard PXC, and the code is fully open source. This is not the first one, but just another of the many features Percona is offering for free while others ask you to buy the enterprise version.

Enjoy the product and let us have your feedback! Great MySQL to all! 

May 04, 2021

Inconsistent Voting in Percona XtraDB Cluster


AKA Cluster Error Voting…

What is Cluster Error Voting (CEV)?

“Cluster Error Voting is a new feature implemented by Alexey Yurchenko, and it is a protocol for nodes to decide how the cluster will react to problems in replication. When one or several nodes have an issue applying an incoming transaction(s) (e.g., suspected inconsistency), this new feature helps. In a 5-node cluster, if 2-nodes fail to apply the transaction, they get removed, and a DBA can go in to fix what went wrong so that the nodes can rejoin the cluster. (Seppo Jaakola)”

This feature was ported to Percona XtraDB Cluster (PXC) in version 8.0.21. As indicated above, it is about increasing the resilience of the cluster, especially when TWO nodes fail to operate and may drop from the cluster abruptly. The protocol is activated in a cluster with any number of nodes.

Before CEV, if a node had a problem/error during a transaction, the node having the issue would report the error in its own log and exit the cluster:

2021-04-23T15:18:38.568903Z 11 [ERROR] [MY-010584] [Repl] Slave SQL: Could not execute Write_rows event on table test.test_voting; Duplicate entry '21' for key 'test_voting.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 0, Error_code: MY-001062
2021-04-23T15:18:38.568976Z 11 [Warning] [MY-000000] [WSREP] Event 3 Write_rows apply failed: 121, seqno 16
2021-04-23T15:18:38.569717Z 11 [Note] [MY-000000] [Galera] Failed to apply write set: gtid: 224fddf7-a43b-11eb-84d5-2ebf2df70610:16 server_id: d7ae67e4-a43c-11eb-861f-8fbcf4f1cbb8 client_id: 40 trx_id: 115 flags: 3
2021-04-23T15:18:38.575439Z 11 [Note] [MY-000000] [Galera] Closing send monitor...
2021-04-23T15:18:38.575578Z 11 [Note] [MY-000000] [Galera] Closed send monitor.
2021-04-23T15:18:38.575647Z 11 [Note] [MY-000000] [Galera] gcomm: terminating thread
2021-04-23T15:18:38.575737Z 11 [Note] [MY-000000] [Galera] gcomm: joining thread
2021-04-23T15:18:38.576132Z 11 [Note] [MY-000000] [Galera] gcomm: closing backend
2021-04-23T15:18:38.577954Z 11 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,3206d174,5)
memb {
	727c277a,1
	}
joined {
	}
left {
	}
partitioned {
	3206d174,1
	d7ae67e4,1
	}
)
2021-04-23T15:18:38.578109Z 11 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2021-04-23T15:18:38.578158Z 11 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2021-04-23T15:18:38.578640Z 11 [Note] [MY-000000] [Galera] gcomm: closed
2021-04-23T15:18:38.578747Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1

While the other nodes will “just” report the node as out of the view:

2021-04-23T15:18:38.561402Z 0 [Note] [MY-000000] [Galera] forgetting 727c277a (tcp://10.0.0.23:4567)
2021-04-23T15:18:38.562751Z 0 [Note] [MY-000000] [Galera] Node 3206d174 state primary
2021-04-23T15:18:38.570411Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,3206d174,6)
memb {
	3206d174,1
	d7ae67e4,1
	}
joined {
	}
left {
	}
partitioned {
	727c277a,1
	}
)
2021-04-23T15:18:38.570679Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2021-04-23T15:18:38.574592Z 0 [Note] [MY-000000] [Galera] forgetting 727c277a (tcp://10.0.0.23:4567)
2021-04-23T15:18:38.574716Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2

With CEV, we have a different process. Let us review it step by step.

Let us start with a cluster of three nodes, where only one node works as the Primary (writer).

Primary writes and, as expected, writesets are distributed on all nodes.
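
The original post does not show the table definition; a minimal definition consistent with the inserts and output below could look like this (an assumption, not the author's exact schema):

CREATE TABLE test_voting (
  id int NOT NULL AUTO_INCREMENT,
  what varchar(64) DEFAULT NULL,
  `when` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;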

insert into test_voting values(null,REVERSE(UUID()), NOW()); <-- Few times

DC1-1(root@localhost) [test]>select * from test_voting;
+----+--------------------------------------+---------------------+
| id | what                                 | when                |
+----+--------------------------------------+---------------------+
|  3 | 05de43720080-938a-be11-305a-6d135601 | 2021-04-24 14:43:34 |
|  6 | 05de43720080-938a-be11-305a-7eb60711 | 2021-04-24 14:43:36 |
|  9 | 05de43720080-938a-be11-305a-6861c221 | 2021-04-24 14:43:37 |
| 12 | 05de43720080-938a-be11-305a-d43f0031 | 2021-04-24 14:43:38 |
| 15 | 05de43720080-938a-be11-305a-53891c31 | 2021-04-24 14:43:39 |
+----+--------------------------------------+---------------------+
5 rows in set (0.00 sec)

Some inexperienced DBA does a manual operation on a secondary using the very unsafe feature wsrep_on…

And then, by mistake or because he did not understand what he was doing…

insert into test_voting values(17,REVERSE(UUID()), NOW()); <-- with few different ids
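
A rough sketch of the unsafe manual operation described above, assuming the DBA disables replication only for his own session on the secondary (the ids match the output below, but this is an illustration, not the author's exact commands):

-- on the secondary (DC1-2): writes from this session are no longer replicated
SET SESSION wsrep_on = OFF;
insert into test_voting values(16,REVERSE(UUID()), NOW());
insert into test_voting values(17,REVERSE(UUID()), NOW());
insert into test_voting values(18,REVERSE(UUID()), NOW());
insert into test_voting values(19,REVERSE(UUID()), NOW());
-- later, the node is "put back": replication is re-enabled for the session
SET SESSION wsrep_on = ON;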

At the end of the operation, the Secondary node will have:

DC1-2(root@localhost) [test]>select * from test_voting;
+----+--------------------------------------+---------------------+
| id | what                                 | when                |
+----+--------------------------------------+---------------------+
|  3 | 05de43720080-938a-be11-305a-6d135601 | 2021-04-24 14:43:34 |
|  6 | 05de43720080-938a-be11-305a-7eb60711 | 2021-04-24 14:43:36 |
|  9 | 05de43720080-938a-be11-305a-6861c221 | 2021-04-24 14:43:37 |
| 12 | 05de43720080-938a-be11-305a-d43f0031 | 2021-04-24 14:43:38 |
| 15 | 05de43720080-938a-be11-305a-53891c31 | 2021-04-24 14:43:39 |
| 16 | 05de43720080-a39a-be11-405a-82715600 | 2021-04-24 14:50:17 |
| 17 | 05de43720080-a39a-be11-405a-f9d62e22 | 2021-04-24 14:51:14 |
| 18 | 05de43720080-a39a-be11-405a-f5624662 | 2021-04-24 14:51:20 |
| 19 | 05de43720080-a39a-be11-405a-cd8cd640 | 2021-04-24 14:50:23 |
+----+--------------------------------------+---------------------+

This is not in line with the rest of the cluster, which still has the previous data. Then our guy puts the node back in (re-enabling wsrep_on):

At this point, the Primary does another insert in that table and:

Houston, we have a problem! 

The secondary node already has the entry with that ID and cannot perform the insert:

2021-04-24T13:52:51.930184Z 12 [ERROR] [MY-010584] [Repl] Slave SQL: Could not execute Write_rows event on table test.test_voting; Duplicate entry '18' for key 'test_voting.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 0, Error_code: MY-001062
2021-04-24T13:52:51.930295Z 12 [Warning] [MY-000000] [WSREP] Event 3 Write_rows apply failed: 121, seqno 4928120

But instead of exiting the cluster, it raises a verification through voting:

2021-04-24T13:52:51.932774Z 0 [Note] [MY-000000] [Galera] Member 0(node2) initiates vote on ab5deb8e-389d-11eb-b1c0-36eca47bacf0:4928120,878ded7898c83a72:  Duplicate entry '18' for key 'test_voting.PRIMARY', Error_code: 1062;
2021-04-24T13:52:51.932888Z 0 [Note] [MY-000000] [Galera] Votes over ab5deb8e-389d-11eb-b1c0-36eca47bacf0:4928120:
   878ded7898c83a72:   1/3
Waiting for more votes.
2021-04-24T13:52:51.936525Z 0 [Note] [MY-000000] [Galera] Member 1(node3) responds to vote on ab5deb8e-389d-11eb-b1c0-36eca47bacf0:4928120,0000000000000000: Success
2021-04-24T13:52:51.936626Z 0 [Note] [MY-000000] [Galera] Votes over ab5deb8e-389d-11eb-b1c0-36eca47bacf0:4928120:
   0000000000000000:   1/3
   878ded7898c83a72:   1/3
Waiting for more votes.
2021-04-24T13:52:52.003615Z 0 [Note] [MY-000000] [Galera] Member 2(node1) responds to vote on ab5deb8e-389d-11eb-b1c0-36eca47bacf0:4928120,0000000000000000: Success
2021-04-24T13:52:52.003722Z 0 [Note] [MY-000000] [Galera] Votes over ab5deb8e-389d-11eb-b1c0-36eca47bacf0:4928120:
   0000000000000000:   2/3
   878ded7898c83a72:   1/3
Winner: 0000000000000000

As you can see, each node informs the cluster about the success or failure of the operation, and the majority wins.

Once the majority has identified the operation as legitimate, the node that asked for the voting needs to leave the cluster:

2021-04-24T13:52:52.038510Z 12 [ERROR] [MY-000000] [Galera] Inconsistency detected: Inconsistent by consensus on ab5deb8e-389d-11eb-b1c0-36eca47bacf0:4928120
	 at galera/src/replicator_smm.cpp:process_apply_error():1433
2021-04-24T13:52:52.062666Z 12 [Note] [MY-000000] [Galera] Closing send monitor...
2021-04-24T13:52:52.062750Z 12 [Note] [MY-000000] [Galera] Closed send monitor.
2021-04-24T13:52:52.062796Z 12 [Note] [MY-000000] [Galera] gcomm: terminating thread
2021-04-24T13:52:52.062880Z 12 [Note] [MY-000000] [Galera] gcomm: joining thread
2021-04-24T13:52:52.063372Z 12 [Note] [MY-000000] [Galera] gcomm: closing backend
2021-04-24T13:52:52.085853Z 12 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,65a111c6-bb0f,23)
memb {
	65a111c6-bb0f,2
	}
joined {
	}
left {
	}
partitioned {
	aae38617-8dd5,2
	dc4eaa39-b39a,2
	}
)
2021-04-24T13:52:52.086241Z 12 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2021-04-24T13:52:52.086391Z 12 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2021-04-24T13:52:52.150106Z 12 [Note] [MY-000000] [Galera] gcomm: closed
2021-04-24T13:52:52.150340Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1

It is also nice to notice that now we have a decent level of information about what happened in the other nodes; the log below is from the Primary:

2021-04-24T13:52:51.932829Z 0 [Note] [MY-000000] [Galera] Member 0(node2) initiates vote on ab5deb8e-389d-11eb-b1c0-36eca47bacf0:4928120,878ded7898c83a72:  Duplicate entry '18' for key 'test_voting.PRIMARY', Error_code: 1062;
2021-04-24T13:52:51.978123Z 0 [Note] [MY-000000] [Galera] Votes over ab5deb8e-389d-11eb-b1c0-36eca47bacf0:4928120:
…<snip>
2021-04-24T13:52:51.981647Z 0 [Note] [MY-000000] [Galera] Votes over ab5deb8e-389d-11eb-b1c0-36eca47bacf0:4928120:
   0000000000000000:   2/3
   878ded7898c83a72:   1/3
Winner: 0000000000000000
2021-04-24T13:52:51.981887Z 11 [Note] [MY-000000] [Galera] Vote 0 (success) on ab5deb8e-389d-11eb-b1c0-36eca47bacf0:4928120 is consistent with group. Continue.
2021-04-24T13:52:52.064685Z 0 [Note] [MY-000000] [Galera] declaring aae38617-8dd5 at tcp://10.0.0.31:4567 stable
2021-04-24T13:52:52.064885Z 0 [Note] [MY-000000] [Galera] forgetting 65a111c6-bb0f (tcp://10.0.0.21:4567)
2021-04-24T13:52:52.066916Z 0 [Note] [MY-000000] [Galera] Node aae38617-8dd5 state primary
2021-04-24T13:52:52.071577Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,aae38617-8dd5,24)
memb {
	aae38617-8dd5,2
	dc4eaa39-b39a,2
	}
joined {
	}
left {
	}
partitioned {
	65a111c6-bb0f,2
	}
)
2021-04-24T13:52:52.071683Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2021-04-24T13:52:52.075293Z 0 [Note] [MY-000000] [Galera] forgetting 65a111c6-bb0f (tcp://10.0.0.21:4567)
2021-04-24T13:52:52.075419Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2

At this point, a DBA can start to investigate and manually fix the inconsistency and have the node rejoin the cluster. In the meantime, the rest of the cluster continues to operate:

+----+--------------------------------------+---------------------+
| id | what                                 | when                |
+----+--------------------------------------+---------------------+
|  3 | 05de43720080-938a-be11-305a-6d135601 | 2021-04-24 14:43:34 |
|  6 | 05de43720080-938a-be11-305a-7eb60711 | 2021-04-24 14:43:36 |
|  9 | 05de43720080-938a-be11-305a-6861c221 | 2021-04-24 14:43:37 |
| 12 | 05de43720080-938a-be11-305a-d43f0031 | 2021-04-24 14:43:38 |
| 15 | 05de43720080-938a-be11-305a-53891c31 | 2021-04-24 14:43:39 |
| 18 | 05de43720080-938a-be11-405a-d02c7bc5 | 2021-04-24 14:52:51 |
+----+--------------------------------------+---------------------+

Conclusion

Cluster Error Voting (CEV) is a nice feature to have. It helps us better understand what goes wrong, increases the stability of the cluster, and, with voting, provides a better way to manage node expulsion.

Another aspect is visibility; never underestimate the fact that the information is also available on other nodes. Having it available on multiple nodes may help investigations if the log on the failing node gets lost (for any reason).

We still do not have active tuple certification, but it is a good step, especially given the history of data drift we have seen in PXC/Galera in these 12 years of use.

My LAST comment is that while I agree WSREP_ON can be a very powerful tool in the hands of experts, as indicated in my colleague’s blog How to Perform Compatible Schema Changes in Percona XtraDB Cluster (Advanced Alternative), that option remains DANGEROUS, and you should never use it UNLESS your name is Przemysław Malkowski and you really know what you are doing.

Great MySQL to everybody!

References

https://www.percona.com/doc/percona-xtradb-cluster/8.0/release-notes/Percona-XtraDB-Cluster-8.0.21-12.1.html

Galera Clustering in MariaDB 10.5 and beyond – Seppo Jaakola – MariaDB Server Fest 2020

 


Oct 30, 2020

CVE-2020-15180 – Affects Percona XtraDB Cluster


Galera replication technology, a key component of Percona XtraDB Cluster, suffered from a remote code execution vulnerability. Percona has been working with the vendor since early September on this issue and has made releases available to address the problem.

Applicability

To exploit this vulnerability, which leads to remote code execution via the WSREP protocol, a malicious party needs access to the WSREP service port (4567/TCP) as well as prior knowledge of the configured Galera cluster name.

Fixes are available in Percona XtraDB Cluster versions:

>= 8.0.20-11.2

>= 5.7.31-31.45.2

>= 5.6.49-28.42.2

Credits

Percona would like to thank all the Percona staff involved in the resolution of this issue.

More Information

Release notes

Oct 06, 2020

Various Ways to Perform Schema Upgrades with Percona XtraDB Cluster


Schema changes are a big challenge in Galera replication, so it is recommended that everyone who uses Percona XtraDB Cluster (PXC)/Galera clusters understands how schema change operations work. In this blog, I am going to explain the operation and impact of the various schema change methods used in a PXC/Galera cluster.

  • Schema changes with “wsrep_OSU_method = TOI”
  • Schema changes with “wsrep_OSU_method = RSU”
  • Schema changes with “ONLINE ALGORITHMS”
  • Schema changes with “pt-osc”
  • Schema changes with “gh-ost”

For testing:

  • I have configured a 3-node Percona XtraDB Cluster (8.0.19).
  • I am executing a read/write load using sysbench.

mysql> select @@wsrep_cluster_address\G
*************************** 1. row ***************************
@@wsrep_cluster_address: gcomm://pxc81,pxc82,pxc83
1 row in set (0.00 sec)

mysql> select @@version, @@version_comment\G
*************************** 1. row ***************************
        @@version: 8.0.19-10
@@version_comment: Percona XtraDB Cluster (GPL), Release rel10, Revision 727f180, WSREP version 26.4.3
1 row in set (0.00 sec)

What is the Impact of Schema Changes in Clusters?

  • By default (TOI), all the nodes in the cluster will be paused during the ALTER process, because the ALTER needs to be replicated on all the nodes. If the ALTER is big, it will affect performance and could be the cause of downtime.
  • Rollback is not possible on schema upgrade. 
  • You can’t kill the ALTER query immediately during the operation, so your application may need to wait until the ALTER completes. 
mysql> pager grep alter
PAGER set to 'grep alter'

mysql> show processlist;
| 19 | root            | localhost | schema_changes | Query   |   18 | altering table           | alter table sbtest1 add index idx_c(c) |         0 |             0 |
7 rows in set (0.00 sec)

mysql> kill 19;
ERROR 1095 (HY000): You are not owner of thread 19

  • MDLs (metadata locks) are set only on one node, not across all the nodes in the cluster, so you need additional control over this. 

Schema Changes with “wsrep_OSU_method = TOI”

TOI: Total Order Isolation

  • TOI is the default method ( wsrep_OSU_method = TOI ) for schema changes.
  • DDL statements are processed in the same order with regard to other transactions in each node. 
  • The full cluster will be blocked/locked during the DDL operation. 
  • This guarantees data consistency.
mysql> select @@wsrep_OSU_method\G
*************************** 1. row ***************************
@@wsrep_OSU_method: TOI
1 row in set (0.00 sec)

Example:

I am going to run the below ALTER on “pxc81”.

alter table sbtest1 add index idx_c(c)

After initiating the ALTER on pxc81, my processlist shows that the COMMIT and UPDATE (from sysbench) statements are paused. Only the ALTER is in progress. The COMMIT and UPDATE will be resumed once the ALTER is completed.

| 17 | root            | localhost | schema_changes | Execute |      15 | closing tables                           | COMMIT                                 |         0 |             0 |
| 17 | root            | localhost | schema_changes | Execute |      15 | updating                                 | UPDATE sbtest1 SET c='91668836759-30934071579-18064439108-53267873872-79461377960-32104006456-143369 |         0 |             1 |
| 24 | root            | localhost | schema_changes | Query   |      15 | altering table                           | alter table sbtest1 add index idx_c(c) |         0 |             0 |

But the SELECT statement can still be run with “wsrep_sync_wait != 1”. With “wsrep_sync_wait = 1”, the SELECT needs causality checks from the other nodes, so it will fail:

SELECT with “wsrep_sync_wait=1”

|  1 | system user     |           | schema_changes | Query   |     0 | altering table         | alter table sbtest1 add index idx_c(c) |         0 |             0 |
| 15 | root            | localhost | schema_changes | Query   |    40 | starting               | select * from sbtest1 where id=1       |         0 |             0 |

mysql> select * from sbtest1 where id=1;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

TOI can be the right choice for quick operations.

  • CREATE STATEMENTS
  • RENAME INDEX
  • RENAME TABLE
  • DROP INDEX
  • ALGORITHM=INSTANT

Schema Changes with “wsrep_OSU_method = RSU”

RSU – Rolling Schema Upgrade

  • In this method, DDL statements are not replicated across the cluster nodes; you need to execute the DDL individually on each node.
  • The node which is executing the DDL will desync from the cluster group. The other nodes in the cluster are still operational and receive the application connections.
  • Once the node has executed the DDL, it will start to apply the missing writesets.
  • In this method, the important thing is that WRITEs should not be performed on that particular table until the schema upgrade completes on all the nodes. Users should be very clear on this because a failure will break the cluster and the data may be unrecoverable.
  • The Gcache should be large enough to store the writesets.

Example:

At pxc82, I am going to execute the ALTER.

Session 1: (setting up the value to RSU – session-level) 

mysql> set wsrep_OSU_method=RSU;
Query OK, 0 rows affected (0.09 sec)

Session 2: (checking the node status)

mysql> show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_recv_queue    | 0      |
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
2 rows in set (0.00 sec)

Session 1: (executing the ALTER )

mysql> alter table sbtest1 add index idx_c(c);

Session 2: (checking again the node status )

Here the node went to Donor/Desynced state once the ALTER started. You can see the queue also keeps increasing.

mysql> nopager;  show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';
PAGER set to stdout
+---------------------------+----------------+
| Variable_name             | Value          |
+---------------------------+----------------+
| wsrep_local_recv_queue    | 2053           |
| wsrep_local_state_comment | Donor/Desynced |
+---------------------------+----------------+
2 rows in set (0.21 sec)

Session 1: (ALTER completed)

mysql> alter table sbtest1 add index idx_c(c);
Query OK, 0 rows affected (2 min 6.52 sec)
Records: 0  Duplicates: 0  Warnings: 0

Session 2: (Node synced to cluster)

mysql> show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_recv_queue    | 0      |
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
2 rows in set (0.00 sec)

This step needs to be executed on pxc81 and pxc83 as well; see the sketch below. After completing it on all nodes, we are good to allow WRITEs for that table again.
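
A rough sketch of repeating the change on the remaining nodes, one at a time (credentials are placeholders; keep WRITEs to the table blocked until every node is done):

for node in pxc81 pxc83; do
  # session-level RSU, then the same ALTER, executed over one connection
  mysql -h "$node" -uroot -p -D schema_changes \
    -e "SET SESSION wsrep_OSU_method=RSU; ALTER TABLE sbtest1 ADD INDEX idx_c(c);"
  # wait until the node reports wsrep_local_state_comment = Synced before moving on
done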

The RSU method is not truly disruption-free, as there are a few bugs reported regarding RSU. Users should be very clear and careful about using RSU for schema updates:

https://jira.percona.com/browse/PXC-2620

https://jira.percona.com/browse/PXC-2293

https://jira.percona.com/browse/PXC-1980

Schema Changes with “ONLINE ALGORITHMS”

So far, we have three algorithms:

  • INPLACE
  • COPY
  • INSTANT

With TOI:

“ALGORITHM = INPLACE / COPY” still pauses the cluster during the operation. Galera doesn’t allow transactions when an ALTER TABLE statement is run. So if you are using TOI, any ALTER TABLE will block all transactions on all nodes.

| 17 | root            | localhost | schema_changes | Execute |      12 | closing tables                           | COMMIT                                                               |         0 |             0 |
| 18 | root            | localhost | schema_changes | Execute |      12 | closing tables                           | COMMIT                                                               |         0 |             0 |
| 32 | root            | localhost | schema_changes | Query   |      13 | altering table                           | alter table sbtest1 add index idx_c(c), algorithm=inplace,

“ALGORITHM=INSTANT” is supported and faster in TOI.

mysql> alter table sbtest1 add column test_Ins int , algorithm=instant;
Query OK, 0 rows affected (0.24 sec)
Records: 0  Duplicates: 0  Warnings: 0

With RSU:

“ALGORITHM = INPLACE/COPY” is still not beneficial on RSU. It pauses the Galera replication and takes the node to Desync. 

mysql> show processlist;
| 62 | root            | localhost | schema_changes | Query   |    51 | altering table                       | alter table sbtest1 add index idx_c(c), algorithm=inplace, lock=none |         0 |             0 |
5 rows in set (0.06 sec)

mysql> nopager;  show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';
PAGER set to stdout
+---------------------------+----------------+
| Variable_name             | Value          |
+---------------------------+----------------+
| wsrep_local_recv_queue    | 7335           |
| wsrep_local_state_comment | Donor/Desynced |
+---------------------------+----------------+
2 rows in set (0.03 sec)

“ALGORITHM=INSTANT” is supported and faster in RSU. But, still, you can use TOI to avoid the additional work.

mysql> alter table sbtest1 add column test_Inss int , algorithm=instant;
Query OK, 0 rows affected (0.19 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select @@wsrep_OSU_method;
+--------------------+
| @@wsrep_OSU_method |
+--------------------+
| RSU                |
+--------------------+
1 row in set (0.02 sec)

I would suggest using “ALGORITHM=INSTANT” with TOI wherever you can, but make sure you are on MySQL 8.x or later. Unfortunately, “ALGORITHM=INSTANT” currently only supports adding new columns. 

Schema Changes with “pt-osc”

pt-osc : Percona-online-schema-change

Personally, I like this approach very much and use this mostly in production environments. Pt-osc provides non-blocking schema upgrades on all nodes in one shot. This should be used with the TOI method. The action flow will be like this:

  • Create a new table “_tablename_new” with the required modification
  • Creates triggers to capture the modified rows (insert/update/delete)
  • Copy the records from the original table to the new table using chunk operation.
  • Once the copy is completed, it will swap the tables (original -> _old, _new -> original) and drop the triggers and the old table. Direct DDLs (RENAME TABLE, DROP TABLE) will be used for this operation (wsrep_OSU_method=TOI). 

For the below ALTER, 

alter table schema_changes.sbtest1 add index idx_test_Ins(test_Ins);

Pt-osc flow in SQL syntax:

Creating new table...
CREATE TABLE `schema_changes`.`_sbtest1_new` (
  `id` int NOT NULL AUTO_INCREMENT,
  `k` int NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  `test_Ins` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=20400335 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Created new table schema_changes._sbtest1_new OK.
Altering new table...
ALTER TABLE `schema_changes`.`_sbtest1_new` add index idx_test_Ins(test_Ins)
Altered `schema_changes`.`_sbtest1_new` OK.
Not creating triggers because this is a dry run.
Not copying rows because this is a dry run.
INSERT LOW_PRIORITY IGNORE INTO `schema_changes`.`_sbtest1_new` (`id`, `k`, `c`, `pad`, `test_ins`) SELECT `id`, `k`, `c`, `pad`, `test_ins` FROM `schema_changes`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 9253 copy nibble*/
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `schema_changes`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) ORDER BY `id` LIMIT ?, 2 /*next chunk boundary*/
Not swapping tables because this is a dry run.
Not dropping old table because this is a dry run.
Not dropping triggers because this is a dry run.
DROP TRIGGER IF EXISTS `schema_changes`.`pt_osc_schema_changes_sbtest1_del`
DROP TRIGGER IF EXISTS `schema_changes`.`pt_osc_schema_changes_sbtest1_upd`
DROP TRIGGER IF EXISTS `schema_changes`.`pt_osc_schema_changes_sbtest1_ins`
2020-09-30T08:31:17 Dropping new table...
DROP TABLE IF EXISTS `schema_changes`.`_sbtest1_new`;
2020-09-30T08:31:17 Dropped new table OK.

Pt-osc provides several options to perform the operation effectively. You can control the connections, active threads, load, chunk size, etc.

For Galera, we have the --max-flow-ctl option. It checks the average time the cluster spent paused for Flow Control (FC) and makes the tool pause if it goes over the percentage indicated in the option. By default, the tool will not check the FC.

[root@pxc81 log]# less /bin/pt-online-schema-change  | grep -i pausing
         print STDERR "Pausing because PXC Flow Control is active\n";
         print STDERR "Pausing because "

To make schema changes on tables with FOREIGN KEYs, I would suggest using “alter-foreign-keys-method = rebuild_constraints”. This helps to maintain the consistency of the schema and its relations. In this approach, before dropping the old table, it runs an ALTER on all the child tables to drop the existing FK and re-add new FK constraints that point to the columns of the new table. Again, adding and dropping the FOREIGN KEYs will be direct ALTERs using TOI.
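
Putting those options together, a possible invocation could look like the following (host and credentials are placeholders; run it with --dry-run first and switch to --execute once you are happy with the plan):

pt-online-schema-change \
  --alter "add index idx_test_Ins(test_Ins)" \
  D=schema_changes,t=sbtest1 \
  --host=pxc81 --user=root --ask-pass \
  --max-flow-ctl 50 \
  --alter-foreign-keys-method rebuild_constraints \
  --chunk-size 1000 \
  --dry-run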

Schema changes with “gh-ost”

Gh-ost takes a similar approach to pt-osc. It also helps to do non-blocking ALTERs on all cluster nodes in one shot. The main difference is that gh-ost is triggerless; it uses the binary log to track the changes. So you need the following variables and thresholds to perform the gh-ost operation.

log-bin=sakthi-bin
binlog-format=ROW
log-slave-updates=ON

The flow will be like,

  • Creates gh-ost table with the required modifications
  • Copy the records from the original table to the new table using chunk operation.
  • Apply the live changes by reading the DML events from binary logs.
  • Once the binary log events are applied, it will swap the tables ( original –> _old, gh-ost –> original ) and drop the old table.

Example:

[root@pxc81 schema_changes]# gh-ost --alter="add index idx_test_Inss(test_Ins)" --database=schema_changes --table=sbtest1 --user=root --password=Jesus@7sakthI --allow-on-master --execute
[2020/09/30 09:40:56] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql 127.0.0.1 3306 root    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/09/30 09:40:56] [info] binlogsyncer.go:354 begin to sync binlog from position (binlog.000027, 196850993)
[2020/09/30 09:40:56] [info] binlogsyncer.go:203 register slave for master server 127.0.0.1:3306
[2020/09/30 09:40:56] [info] binlogsyncer.go:723 rotate to (binlog.000027, 196850993)
# Migrating `schema_changes`.`sbtest1`; Ghost table is `schema_changes`.`_sbtest1_gho`
# Migrating pxc81:3306; inspecting pxc81:3306; executing on pxc81
# Migration started at Wed Sep 30 09:40:56 +0000 2020
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# Serving on unix socket: /tmp/gh-ost.schema_changes.sbtest1.sock
Copy: 0/6563240 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: binlog.000027:196853401; Lag: 0.02s, State: migrating; ETA: N/A
Copy: 0/6563240 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: binlog.000027:196858195; Lag: 0.01s, State: migrating; ETA: N/A
Copy: 22000/6563240 0.3%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 2s(copy); streamer: binlog.000027:201067135; Lag: 0.01s, State: migrating; ETA: 9m58s

.......

Copy: 5682000/6563240 86.6%; Applied: 0; Backlog: 0/1000; Time: 16m10s(total), 16m10s(copy); streamer: binlog.000028:213168607; Lag: 0.01s, State: migrating; ETA: 2m30s
Copy: 6563000/6563240 100.0%; Applied: 0; Backlog: 0/1000; Time: 20m20s(total), 20m20s(copy); streamer: binlog.000028:382677405; Lag: 0.01s, State: migrating; ETA: 0s

Gh-ost also provides several options to perform effective operations.  You can control the connections, active threads, load, chunk size, etc.

But unfortunately, the --max-flow-ctl option is not available in gh-ost.

Conclusion

So, finally, I would say,

  • Always use the direct ALTER with TOI for the metadata changes and INSTANT ALTERs.
  • Use pt-online-schema-change with TOI and use the optimal flow control thresholds for InnoDB tables.
  • Schedule pt-online-schema-change operation in off-peak hours for FOREIGN KEY tables.
  • If you use RSU, never forget that you need to execute the ALTER on all nodes individually, and you should block WRITEs for that particular table. Make sure your Gcache size is large enough to hold the writesets.
  • If you are concerned with triggers, you can use the gh-ost to make the ALTERs.

Oct 03, 2019

Percona XtraDB Cluster 8.0 New Feature: wsrep_sst_auth Removal


The problem

In PXC 5.6 and 5.7, when using xtrabackup-v2 as the SST method, the DBA must create a user with the appropriate privileges for use by Percona XtraBackup (PXB). The username and password of this backup user are specified in the wsrep_sst_auth variable.

This is a problem because the username and password were stored in plaintext, which required that the configuration file be secured.

The PXC 8.0 solution

(This feature is still under development and may change before PXC 8.0 GA)

Because the wsrep_sst_auth is only needed on the donor side to take a backup, PXC 8.0 uses an internal user (created specifically for use by PXC) with a randomly generated password. Since this user is only needed on the donor, the plaintext password is not needed on the joiner node.

This password consists of 32 characters generated at random. A new password is generated for each SST request. The plaintext of the password is never saved and never leaves the node. The username/password is sent to the SST script via unnamed pipes (stdin).

New PXC internal user accounts

mysql.pxc.internal.session

The mysql.pxc.internal.session user account provides the appropriate security context to create and set up the other PXC accounts. This account has a limited set of privileges, just enough to create the mysql.pxc.sst.user.

This account is locked and cannot be used to login (the password field will not allow login).

mysql.pxc.sst.user

The mysql.pxc.sst.user is used by XtraBackup to perform the backup. This account has the full set of privileges needed by XtraBackup. This account is created for an SST and is dropped at the end of the SST and also when the PXC node is shut down. The creation/provisioning of this user account is not written to the binlog and is not replicated to the other nodes. However, this account is sent with the backup to the joiner node, so the joiner node also has to drop this user after the SST has finished.

mysql.pxc.sst.role

The mysql.pxc.sst.role is the MySQL role that provides the privileges needed for XtraBackup. This allows for easy addition/removal of privileges needed for an SST.

The experimental release of PXC is based on MySQL 8.0.15, and we have not implemented the role-based support due to issues found with MySQL 8.0.15. This will be revisited in future versions of PXC 8.0.

Program flow

  1. DONOR node receives SST request from the JOINER
  2. DONOR node generates a random password and creates the internal SST user
    SET SESSION sql_log_bin = OFF;
    DROP USER IF EXISTS 'mysql.pxc.sst.user'@localhost;
    CREATE USER 'mysql.pxc.sst.user'@localhost IDENTIFIED WITH 'mysql_native_password' BY 'XXXXXXXX' ACCOUNT LOCK;
    GRANT 'mysql.pxc.sst.role'@localhost TO 'mysql.pxc.sst.user'@localhost;
    SET DEFAULT ROLE 'mysql.pxc.sst.role'@localhost to 'mysql.pxc.sst.user'@localhost;
    ALTER USER 'mysql.pxc.sst.user'@localhost ACCOUNT UNLOCK;

    The code that uses role is not being used in the current release due to issues with MySQL 8.0.15. Currently, we create the user with all the permissions needed explicitly.

  3. Launch the SST script (passing the username/password via stdin)
  4. SST uses the username/password to perform the backup
  5. SST script exits
  6. The DONOR node drops the user.
  7. The JOINER node receives the backup and drops the user. Note that the JOINER node also contains the internal SST user!

As a precaution, the user is also dropped when the server is shutdown.

Oct 01, 2019

Experimental Binary of Percona XtraDB Cluster 8.0


Percona is happy to announce the first experimental binary of Percona XtraDB Cluster 8.0 on October 1, 2019. This is a major step for tuning Percona XtraDB Cluster to be more cloud- and user-friendly. This release combines the updated and feature-rich Galera 4 with substantial improvements made by our development team.

Improvements and New Features

Galera 4, included in Percona XtraDB Cluster 8.0, has many new features. Here is a list of the most essential improvements:

  • Streaming replication supports large transactions
  • The synchronization functions allow action coordination (wsrep_last_seen_gtid, wsrep_last_written_gtid, wsrep_sync_wait_upto_gtid)
  • More granular and improved error logging. wsrep_debug is now a multi-valued variable to assist in controlling the logging, and logging messages have been significantly improved.
  • Some DML and DDL errors on a replicating node can either be ignored or suppressed. Use the wsrep_ignore_apply_errors variable to configure.
  • Multiple system tables help find out more about the state of the cluster.
  • The wsrep infrastructure of Galera 4 is more robust than that of Galera 3. It features a faster execution of code with better state handling, improved predictability, and error handling.

Percona XtraDB Cluster 8.0 has been reworked in order to improve security and reliability as well as to provide more information about your cluster:

  • There is no need to create a backup user or maintain the credentials in plain text (a security flaw). An internal SST user is created, with a random password for making a backup, and this user is discarded immediately once the backup is done.
  • Percona XtraDB Cluster 8.0 now automatically launches the upgrade as needed (even for minor releases). This avoids manual intervention and simplifies the operation in the cloud.
  • SST (State Snapshot Transfer) rolls back or fixes an unwanted action. It is no longer “a copy only block” but a smart operation that makes the best use of the copy phase.
  • Additional visibility statistics are introduced in order to obtain more information about Galera internal objects. This enables easy tracking of the state of execution and flow control.

Installation

You can only install this release from a tarball and it, therefore, cannot be installed through a package management system, such as apt or yum. Note that this release is not ready for use in any production environment.

Percona XtraDB Cluster 8.0 is based on the following:

Please be aware that this release will not be supported in the future, and as such, neither the upgrade to this release nor the downgrade from higher versions is supported.

This release is also packaged with Percona XtraBackup 8.0.5. All Percona software is open-source and free.

In order to experiment with Percona XtraDB Cluster 8.0 in your environment, download and unpack the tarball for your platform.

Note

Be sure to check your system and make sure that the packages are installed which Percona XtraDB Cluster 8.0 depends on.

For Debian or Ubuntu:

$ sudo apt-get install -y \
socat libdbd-mysql-perl \
rsync libaio1 libc6 libcurl3 libev4 libgcc1 libgcrypt20 \
libgpg-error0 libssl1.1 libstdc++6 zlib1g libatomic1

For Red Hat Enterprise Linux or CentOS:

$ sudo yum install -y openssl socat  \
procps-ng chkconfig procps-ng coreutils shadow-utils \
grep libaio libev libcurl perl-DBD-MySQL perl-Digest-MD5 \
libgcc rsync libstdc++ libgcrypt libgpg-error zlib glibc openssl-libs

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Sep 18, 2019

Percona XtraDB Cluster 5.7.27-31.39 Is Now Available


Percona is happy to announce the release of Percona XtraDB Cluster 5.7.27-31.39 on September 18, 2019. Binaries are available from the downloads section or from our software repositories.

Percona XtraDB Cluster 5.7.27-31.39 is now the current release, based on the following:

All Percona software is open-source and free.

Bugs Fixed

  • PXC-2432: PXC was not updating the information_schema user/client statistics properly.
  • PXC-2555: SST initialization delay: fixed a bug where the SST process took too long to detect if a child process was running.
  • PXC-2557: Fixed a crash when a node goes NON-PRIMARY and SHOW STATUS is executed.
  • PXC-2592: PXC restarting automatically on data inconsistency.
  • PXC-2605: PXC could crash when log_slow_verbosity included InnoDB.  Fixed upstream PS-5820.
  • PXC-2639: Fixed an issue where a SQL admin command (like OPTIMIZE) could cause a deadlock.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Sep 17, 2019

Percona XtraDB Cluster 5.6.45-28.36 Is Now Available


Percona is glad to announce the release of Percona XtraDB Cluster 5.6.45-28.36 on September 17, 2019. Binaries are available from the downloads section or from our software repositories.

Percona XtraDB Cluster 5.6.45-28.36 is now the current release, based on the following:

All Percona software is open-source and free.

Bugs Fixed

  • PXC-2432: PXC was not updating the information schema user/client statistics properly.
  • PXC-2555: SST initialization delay: fixed a bug where the SST process took too long to detect if a child process was running.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Jun 26, 2019

Percona XtraDB Cluster 5.7.26-31.37 Is Now Available


Percona is glad to announce the release of Percona XtraDB Cluster 5.7.26-31.37 on June 26, 2019. Binaries are available from the downloads section or from our software repositories.

Percona XtraDB Cluster 5.7.26-31.37 is now the current release, based on the following:

All Percona software is open-source and free.

Bugs Fixed

  • PXC-2480: In some cases, Percona XtraDB Cluster could not replicate CURRENT_USER() used in the ALTER statement. USER() and CURRENT_USER() are no longer allowed in any ALTER statement since they fail when replicated.
  • PXC-2487: The case when a DDL or DML action was in progress from one client and the provider was updated from another client could result in a race condition.
  • PXC-2490: Percona XtraDB Cluster could crash when binlog_space_limit was set to a value other than zero during wsrep_recover mode.
  • PXC-2491: SST could fail if the donor had encrypted undo logs.
  • PXC-2497: The user can set the preferred donor by setting the wsrep_sst_donor variable. An IP address is not valid as the value of this variable. If the user still used an IP address, an error message was produced that did not provide sufficient information. The error message has been improved to suggest that the user check the value of the wsrep_sst_donor for an IP address.
  • PXC-2537: Nodes could crash after an attempt to set a password using mysqladmin

Other bugs fixed: PXC-2276, PXC-2292, PXC-2476, PXC-2560.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!
