Apr
22
2022
--

Zero Impact on Index Creation with Amazon Aurora 3

Zero Impact on Index Creation with Aurora 3

Zero Impact on Index Creation with Aurora 3In the last quarter of 2021, AWS released Aurora version 3. This new version aligns Aurora with the latest MySQL 8 version, porting many of the advantages MySQL 8 has over previous versions.

While this brings a lot of new interesting features for Aurora, what we are going to cover here is to see how DDLs behave when using the ONLINE option. With a quick comparison with what happens in MySQL 8 standard and with Group Replication.

Tests

All tests were run on an Aurora instance r6g.large with a secondary availability zone. The test was composed of:

        Four connections

    • #1 to perform DDL
    • #2 to perform insert data in the table I am altering
    • #3 to perform insert data on a different table 
    • #4 checking the other node operations

In the Aurora instance, a sysbench schema with 10 tables and five million rows was created, just to get a bit of traffic. While the test table with 5ml rows as well was:

CREATE TABLE `windmills_test` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `uuid` char(36) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `millid` smallint NOT NULL,
  `kwatts_s` int NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `active` tinyint NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`,`active`),
  KEY `IDX_active` (`id`,`active`),
  KEY `kuuid_x` (`uuid`),
  KEY `millid_x` (`millid`),
  KEY `active_x` (`active`),
  KEY `idx_1` (`uuid`,`active`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC

The executed commands:

Connection 1:
    ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
    ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;
    
Connection 2:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connection 3:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills3  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connections 4:
     while [ 1 = 1 ];do echo "$(date +'%T.%3N')";mysql --defaults-file=./my.cnf -h <secondary aurora instance> -D windmills_large -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_large)"|grep -i -v localhost;sleep 1;done

Operations:
1) start inserts from connections
2) start commands in connections 4 – 5 on the other nodes
3) execute: DC1-1(root@localhost) [windmills_large]>ALTER TABLE windmills_test ADD INDEX idx_1 (uuid,active), ALGORITHM=INPLACE, LOCK=NONE;

With this, what I was looking to capture is the operation impact in doing a common action as creating an Index. My desired expectation is to have no impact when doing operations that are declared “ONLINE” such as creating an index, as well as data consistency between nodes.

Let us see what happened…

Results

While running the insert in the same table, performing the alter:

mysql>  ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
Query OK, 0 rows affected (16.51 sec)
Records: 0  Duplicates: 0  Warnings: 0

It is NOT stopping the operation in the same table or any other table in the Aurora instance.

We can only identify a minimal performance impact:

[root@ip-10-0-0-11 tmp]# while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done
.347
.283
.278
.297
.291
.317
.686  ? start
<Snip>
.512  ? end
.278
.284
.279

The secondary node is not affected at all, and this is because Aurora managed at the storage level the data replication. There is no such thing as Apply from Relaylog, as we have in standard MySQL asynchronous or data replicated with Group Replication.  

The result is that in Aurora 3, we can have zero impact index (or any other ONLINE/INSTANT) operation, with this I include the data replicated in the other instances for High Availability. 

If we compare this with Group replication (see blog):

GR         Aurora 3
Time on hold for insert for altering table   	~0.217 sec   ~0 sec
Time on hold for insert for another table   	~0.211 sec   ~0 sec

However, keep in mind that MySQL with Group Replication will still need to apply the data on the Secondaries. This means that if your alter was taking 10 hours to build the index, the Secondary nodes will be misaligned with the Source for approximately another 10 hours. 

With Aurora 3 or with Percona XtraDB Cluster (PXC), changes will be there when Source has completed the operation.    

What about PXC? Well, we have a different scenario:

PXC(NBO)     Aurora 3
Time on hold for insert for altering table   	~120 sec      ~0 sec
Time on hold for insert for another table   	~25  sec      ~0 sec

We will have a higher impact while doing the Alter operation, but the data will be on all nodes at the same time maintaining a high level of consistency in the cluster. 

Conclusion

Aurora is not for all uses, and not for all budgets. However, it has some very good aspects like the one we have just seen. The difference between standard MySQL and Aurora is not in the time of holding/locking (aka operation impact) but in the HA aspects. If I have my data/structure on all my Secondary at the same time as the Source, I will feel much more comfortable than having to wait an additional T time.

This is why PXC in that case is a better alternative if you can afford the locking time. If not, well, Aurora 3 is your solution, just do your math properly and be conservative with the instance resources.

Jan
11
2022
--

Online DDL With Group Replication in MySQL 8.0.27

Online DDL With Group Replication in MySQL 8.0.27

Online DDL With Group Replication in MySQL 8.0.27In April 2021, I wrote an article about Online DDL and Group Replication. At that time we were dealing with MySQL 8.0.23 and also opened a bug report which did not have the right answer to the case presented. 

Anyhow, in that article I have shown how an online DDL was de facto locking the whole cluster for a very long time even when using the consistency level set to EVENTUAL.

This article is to give justice to the work done by the MySQL/Oracle engineers to correct that annoying inconvenience. 

Before going ahead, let us remember how an Online DDL was propagated in a group replication cluster, and identify the differences with what happens now, all with the consistency level set to EVENTUAL (see).

In MySQL 8.0.23 we were having:

While in MySQL 8.0.27 we have:

As you can see from the images we have three different phases. Phase one is the same between version 8.0.23 and version 8.0.27. 

Phases two and three, instead, are quite different. In MySQL 8.0.23 after the DDL is applied on the Primary, it is propagated to the other nodes, but a metalock was also acquired and the control was NOT returned. The result was that not only the session executing the DDL was kept on hold, but also all the other sessions performing modifications. 

Only when the operation was over on all secondaries, the DDL was pushed to Binlog and disseminated for Asynchronous replication, lock raised and operation can restart.

Instead, in MySQL 8.0.27,  once the operation is over on the primary the DDL is pushed to binlog, disseminated to the secondaries and control returned. The result is that the write operations on primary have no interruption whatsoever and the DDL is distributed to secondary and Asynchronous replication at the same time. 

This is a fantastic improvement, available only with consistency level EVENTUAL, but still, fantastic.

Let’s See Some Numbers

To test the operation, I have used the same approach used in the previous tests in the article mentioned above.

Connection 1:
    ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
    ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;
    
Connection 2:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmill7 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connection 3:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "insert into windmill8  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmill7 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connections 4-5:
     while [ 1 = 1 ];do echo "$(date +'%T.%3N')";/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_large)"|grep -i -v localhost;sleep 1;done

Modifying a table with ~5 million rows:

node1-DC1 (root@localhost) [windmills_large]>select count(*) from  windmills_test;
+----------+
| count(*) |
+----------+
|  5002909 |
+----------+

The numbers below represent the time second/milliseconds taken by the operation to complete. While I was also catching the state of the ALTER on the other node I am not reporting it here given it is not relevant. 

EVENTUAL (on the primary only)
-------------------
Node 1 same table:
.184
.186 <--- no locking during alter on the same node
.184
<snip>
.184
.217 <--- moment of commit
.186
.186
.186
.185

Node 1 another table :
.189
.198 <--- no locking during alter on the same node
.188
<snip>
.191
.211  <--- moment of commit
.194

As you can see there is just a very small delay at the moment of commit, but other impacts.

Now if we compare this with the recent tests I have done for Percona XtraDB Cluster (PXC) Non-Blocking operation (see A Look Into Percona XtraDB Cluster Non-Blocking Operation for Online Schema Upgrade) with the same number of rows and same kind of table/data:

Action Group Replication PXC (NBO)
Time on hold for insert for altering table ~ 0.217 sec ~ 120 sec
Time on hold for insert for another table ~ 0.211 sec ~ 25 sec

However, yes there is a however, PXC was maintaining consistency between the different nodes during the DDL execution, while MySQL 8.0.27 with Group Replication was postponing consistency on the secondaries, thus Primary and Secondary were not in sync until full DDL finalization on the secondaries.

Conclusions

MySQL 8.0.27 comes with this nice fix that significantly reduces the impact of an online DDL operation on a busy server. But we can still observe a significant misalignment of the data between the nodes when a DDL is executing. 

On the other hand, PXC with NBO is a bit more “expensive” in time, but nodes remain aligned all the time.

In the end, is what is more important for you to choose one or the other solution, consistency vs. operational impact.

Great MySQL to all.

Dec
09
2021
--

A Look Into Percona XtraDB Cluster Non-Blocking Operation for Online Schema Upgrade

Percona XtraDB Cluster Non-Blocking Operation

Percona XtraDB Cluster Non-Blocking OperationPercona XtraDB Cluster 8.0.25 (PXC) has introduced a new option to perform online schema modifications: NBO (Non-Blocking Operation).

When using PXC, the cluster relies on the wsrep_OSU_method parameter to define the Online Schema Upgrade (OSU) method the node uses to replicate DDL statements. 

Until now, we normally have three options:

  • Use Total Isolation Order (TOI, the default)
  • Use Rolling Schema Upgrade (RSU)
  • Use Percona’s online schema change tool (TOI + PTOSC)

Each method has some positive and negative aspects. TOI will lock the whole cluster from being able to accept data modifications for the entire time it takes to perform the DDL operation. RSU will misalign the schema definition between the nodes, and in any case, the node performing the DDL operation is still locked. Finally, TOI+PTOSC will rely on creating triggers and copying data, so in some cases, this can be very impactful. 

The new Non-Blocking Operation (NBO) method is to help to reduce the impact on the cluster and make it easier to perform some DDL operations.

At the moment we only support a limited set of operations with NBO like:

  • ALTER INDEX
  • CREATE INDEX
  • DROP INDEX

Any other command will result in an error message ER_NOT_SUPPORTED_YET.

But let us see how it works and what the impact is while we will also compare it with the default method TOI.

What we will do is work with four connections:

1 – to perform ddl
2 – to perform insert data in the table being altered
3 – to perform insert data on a different table
4-5 – checking the other two nodes operations

PXC must be at least Version 8.0.25-15.1.

The table we will modify is :

DC1-1(root@localhost) [windmills_s]>show create table windmills_test\G
*************************** 1. row ***************************
       Table: windmills_test
Create Table: CREATE TABLE `windmills_test` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `uuid` char(36) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `millid` smallint NOT NULL,
  `kwatts_s` int NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `active` tinyint NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`,`active`),
  KEY `IDX_active` (`id`,`active`),
  KEY `kuuid_x` (`uuid`),
  KEY `millid_x` (`millid`),
  KEY `active_x` (`active`)
) ENGINE=InnoDB AUTO_INCREMENT=8199260 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC
1 row in set (0.00 sec)

And contains ~five million rows.

DC1-1(root@localhost) [windmills_s]>select count(*) from windmills_test;
+----------+
| count(*) |
+----------+
|  5002909 |
+----------+
1 row in set (0.44 sec)

The Commands

Connection 1:

ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE;
  ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;

 

Connection 2:

while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills7 limit 1;" -e "select count(*) from windmills_s.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

 

Connection 3:

while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "insert into windmills8  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills7 limit 1;" -e "select count(*) from windmills_s.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

 

Connections 4-5:

while [ 1 = 1 ];do echo "$(date +'%T.%3N')";/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_s)"|grep -i -v localhost;sleep 1;done

Operations

  • Start inserts from connections
  • Start commands in connections 4 – 5 on the other nodes
  • Execute: 
    • For TOI 
      • DC1-1(root@localhost) [windmills_s]>SET SESSION wsrep_OSU_method=TOI;
    • For NBO
      • DC1-1(root@localhost) [windmills_s]>SET SESSION wsrep_OSU_method=NBO;
    • For both
      • DC1-1(root@localhost) [windmills_s]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=shared;

Let’s Run It

Altering a Table with TOI

DC1-1(root@localhost) [windmills_s]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE;
Query OK, 0 rows affected (1 min 4.74 sec)
Records: 0  Duplicates: 0  Warnings: 0

 

Inserts in the altering table (connection 2):

.450
.492
64.993 <--- Alter blocks all inserts on the table we are altering
.788
.609

 

Inserts on the other table (connection 3):

.455
.461
64.161 <--- Alter blocks all inserts on all the other tables as well
.641
.483

 

On the other nodes at the same time of the ALTER we can see:

Id  User             db         Command Time  State             Info                                                                            Time_ms Rows_sent Rows_examined 
15	system user		windmills_s	Query	102	  altering table	ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE	102238	0	       0    <--- time from start

In short, we have the whole cluster locked for ~64 seconds. During this period of time, all the operations to modify data or structure were on hold. 

Let’s Now Try With NBO

Inserts in the altering table:

.437
.487
120.758 <---- Execution time increase
.617
.510

 

Inserts on the other table:

.468
.485
25.061 <---- still a metalock, but not locking the other tables for the whole duration 
.494
.471

 

On the other nodes at the same time of the ALTER we can see:

Id      User         db             Command Time  State             Info                                                                            Time_ms Rows_sent Rows_examined 
110068	system user	 windmills_s	Connect	86	  altering table	ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE	 120420	 0	          0

 

In this case, what is also interesting to note is that:

  1. We have a moment of metalock:
    1. 110174 pmm 127.0.0.1:42728 NULL Query 2 Waiting for table metadata lock  SELECT x FROM information_schema.tables  WHERE TABLE_SCHEMA = 'windmills_s' 1486 10    0
    2. 110068  system user connecting host windmills_s Connect 111 closing tables ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE 111000 0 0
  2. The execution time is longer

Summarizing:

TOI            NBO
Time on hold for insert for altering table   	~64 sec    	~120 sec
Time on hold for insert for another table   	~64 sec      ~25 sec 
metalock                       			whole time  	 only at the end

What is Happening? What are the Differences and Why Does it Take Longer with NBO?

Let’s see at a very high level how the two work:

  • TOI: when you issue a DDL like ADD INDEX a metadata lock is taken on the table and it will be released only at the end of the operation. During this time, you cannot:
    • Perform DMLs on any cluster node
    • Alter another table in the cluster
  • NBO: the metadata lock is taken at the start and at the end for a very brief period of time. The ADD INDEX operation will then work on each node independently. The lock taken at the end is to have all the nodes agree on the operation and commit or rollback (using cluster error voting). This final phase costs a bit more in time and is what adds a few seconds to the operation execution. But during the operation:
    • You can alter another table (using NBO)
    • You can continue to insert data, except in the table(s) you are altering.
    • On node crash, the operation will continue on the other nodes, and if successful it will persist.  

In short, the cluster server behavior changes significantly when using NBO, offering significant flexibility compared to TOI. The cost in time should not linearly increase with the dimension of the table, but more in relation to the single node efficiency in performing the ALTER operation.       

Conclusion

NBO can be significantly helpful to reduce the impact of DDL on the cluster, for now, limited to the widely used creation/modification/drop of an index. But in the future … we may expand it. 

The feature is still a technology preview, so do not trust it in production, but test it and let us know what you think. 

Final comment: another distribution has introduced NBO, but only if you buy the enterprise version.

Percona, which is truly open source with facts not just words, has implemented NBO in standard PXC, and the code is fully open source. This is not the first one, but just another of the many features Percona is offering for free while others ask you to buy the enterprise version.

Enjoy the product and let us have your feedback! Great MySQL to all! 

Jun
07
2019
--

How pt-online-schema-change Handles Foreign Keys

pt-online-schema-change

pt-online-schema-changeForeign key related issues are very common when dealing with DDL changes in MySQL using Percona toolkit. In this blog post, I will explain how the tool (pt-online-schema-change) handles foreign key constraints when executing a DDL change.

First of all, I would like to explain why foreign keys have to be handled at all before writing more about the “How”. Foreign key constraints are aware of table rename operations. In other words, if the parent table is renamed, the child table automatically knows it and changes the foreign key constraint accordingly. Please have a look at the below example, and you can see the table name is automatically updated in the child table after the rename operation on the parent table:

mysql> show create table prd_details \G
*************************** 1. row ***************************
       Table: prd_details
Create Table: CREATE TABLE `prd_details` (
  `product_id` int(11) NOT NULL AUTO_INCREMENT,
  `product_catalog_id` int(11) unsigned NOT NULL,
  ......
  CONSTRAINT `prd_details_ibfk_1` FOREIGN KEY (`product_catalog_id`) REFERENCES `product_catalog` (`catalog_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)i
mysql> RENAME TABLE product_catalog TO product_cat ;
Query OK, 0 rows affected (0.15 sec)
mysql> show create table prd_details \G
*************************** 1. row ***************************
       Table: prd_details
Create Table: CREATE TABLE `prd_details` (
  `product_id` int(11) NOT NULL AUTO_INCREMENT,
  `product_catalog_id` int(11) unsigned NOT NULL,
  ......
  CONSTRAINT `prd_details_ibfk_1` FOREIGN KEY (`product_catalog_id`) REFERENCES `product_cat` (`catalog_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Well, that is indeed very nice and to be expected. But please allow me to explain how this becomes a problem when dealing with DDL changes using pt-online-schema-change. The tool implements the DDL changes as mentioned below. Please keep in mind that these are just to give an idea of how the tool works, as there would be more internal things going on in reality.

Let’s take an example ALTER for this case:

Query:

ALTER TABLE T1 MODIFY COLUMN c1 BIGINT UNSIGNED NOT NULL ;

pt-online-schema-change steps for the above alter:

  1. Create a similar table _T1_new
  2. Modify the column c1 to BIGINT in the table _T1_new
  3. Define triggers on table T1 so that changes to data on the original table will be applied to _T1_new as well.
  4. Copy the data from table T1 to _T1_new.
  5. Swap the tables
  6. Drop triggers.

All looks good so far. Now let’s see why these steps create a problem, with a close look at Step #5 (Swap the tables).

Without foreign keys: Swapping of these tables is done as below, which looks nice.

  • Rename T1 —> T1_old
  • Rename _T1_new –> T1
  • If everything right, drop the table T1_old
  • Drop triggers on the new T1 table

With foreign keys: As I mentioned earlier, if there are any child tables with foreign keys to table T1, after renaming, they would automatically point to T1_old but not the new T1. Pt online schema change has to ensure the child table refers to a correct parent table by the end of this DDL change.

  • Rename T1 —> T1_old       =====? The child table refers to T1_old automatically.
  • Rename _T1_new —> T1

In this case, the foreign keys in the child table are still referring to the old table T1_old which don’t have the schema change in place. If you drop T1_old, child table CT1 ends up pointing to a table that doesn’t exist. That’s a very bad situation. Now let’s talk about how the tool handles this.

How does pt-online-schema-change handle this?

The tool comes up with an option named --alter-foreign-keys-method This option supports two values at a high level and below you can see what are those and how they will work.

alter-foreign-keys-method=drop_swap

With this value, it won’t swap as mentioned in the steps. Rather, it drops the old table and then renames the new table with the change in place.

  • Disable foreign key checks for the session (FOREIGN_KEY_CHECKS=0)
  • Drop the table T1_old
  • Rename the new table _T1_new –> T1

The good thing here is that it is quick, but the bad thing is that it’s not reliable. If something goes wrong with renaming, it ends up with the same problem of referring to an unexisting table.

alter-foreign-keys-method=rebuild_constraints

This is the preferred approach for the reason it maintains the consistency of the schema and its relations. In this approach, before dropping the old table, it runs ALTER on all the child tables to drop existing FK and re-add new FK constraints that points to the columns from the new table (with the schema change in place).  Below sequence of bullet points explains the same.

  • Rename T1 –> T1_old
  • Rename _T1_new –> T1
  • ALTER on child table to adjust the foreign key so that it points to T1 rather T1_old.
ALTER TABLE child_table DROP FOREIGN KEY `fk_name`, ADD CONSTRAINT `_fk_name` FOREIGN KEY  (`child_table_column`) REFERENCES _T1_new (`parent_table_column`)

  • Drop the table T1_old
  • Drop triggers from the new T1 table.

I would like to mention that the current implementation to rebuild the child table can be improved by making use of the INPLACE ALTER which I hope would probably be available in upcoming releases. You can see more information about this in the existing bug report here. I will discuss in brief about the two other options available, which are derived based on the above two. Let’s have a quick look.

auto:  If this value is used, it leaves the decision up to the tool itself to choose from the two (drop_swap/rebuild_constraints) options available. If the number of rows in the child table is small, it uses rebuild_constraints; otherwise, it goes with the drop_swap approach. For this reason, this option should always be chosen carefully as it can end up with unexpected results when choosing drop_swap. Below is an example log snippet which explains this behavior:

# pt-online-schema-change --user=root --password=xxxxxxx --alter-foreign-keys-method=auto  --alter "MODIFY COLUMN header_id BIGINT unsigned NOT NULL AUTO_INCREMENT" D=DB1,t=T1 --execute
...........
Copying `DB1`.`T1`:  75% 00:18 remain
2019-05-28T12:49:41 Copied rows OK.
2019-05-28T12:49:41 Max rows for the rebuild_constraints method: 5588
Determining the method to update foreign keys...
2019-05-28T12:49:41   `DB1`.`child_of_T1`: too many rows: 197076; must use drop_swap
2019-05-28T12:49:41 Drop-swapping tables...
...........

none: If this value is used, it is similar to drop_swap but without swapping. In other words, it just drops the original table and leaves the child tables in a state which they point to a table that doesn’t exist. In this case, DBA’s have need to fix the leftover job.

Photo by Silas Köhler on Unsplash

Mar
25
2019
--

How to Perform Compatible Schema Changes in Percona XtraDB Cluster (Advanced Alternative)?

PXC schema changes options

PXC schema changes optionsIf you are using Galera replication, you know that schema changes may be a serious problem. With its current implementation, there is no way even a simple ALTER will be unobtrusive for live production traffic. It is a fact that with the default TOI alter method, Percona XtraDB Cluster (PXC) cluster suspends writes in order to execute the ALTER in the same order on all nodes.

For factual data structure changes, we have to adapt to the limitations, and either plan for a maintenance window, or use pt-online-schema-change, where interruptions should be very short. I suggest you be extra careful here, as normally you cannot kill an ongoing ALTER query in Galera cluster.

For schema compatible changes, that is, ones that cannot break ROW replication when the writer node and applier nodes have different metadata, we can consider using the Rolling Schema Update (RSU) method. An example of 100% replication-safe DDL is OPTIMIZE TABLE (aka noop-ALTER). However, the following are safe to consider too:

  • adding and removing secondary index,
  • renaming an index,
  • changing the ROW_FORMAT (for example enabling/disabling table compression),
  • changing the KEY_BLOCK_SIZE(compression property).

However, a lesser known fact is that even using the RSU method or pt-online-schema-change for the above may not save us from some unwanted disruptions.

RSU and Concurrent Queries

Let’s take a closer look at a very simple scenario with noop ALTER. We will set wsrep_OSU_method to RSU to avoid a cluster-wide stall. In fact, this mode turns off replication for the following DDL (and only for DDL), so you have to remember to repeat the same ALTER on every cluster member later.

For simplicity, let’s assume there is only one node used for writes. In the first client session, we change the method accordingly to prepare for DDL:

node1 > set wsrep_OSU_method=RSU;
Query OK, 0 rows affected (0.00 sec)
node1 > select @@wsrep_OSU_method,@@wsrep_on,@@wsrep_desync;
+--------------------+------------+----------------+
| @@wsrep_OSU_method | @@wsrep_on | @@wsrep_desync |
+--------------------+------------+----------------+
| RSU                |          1 |              0 |
+--------------------+------------+----------------+
1 row in set (0.00 sec)

(By the way, as seen above, the desync mode is not enabled yet, as it will be automatically enabled around the DDL query only, and disabled right after it finishes).

In a second client session, we start a long enough SELECT query:

node1 > select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000;
...

And while it’s ongoing, let’s rebuild the table:

node1 > alter table db1.sbtest1 engine=innodb;
Query OK, 0 rows affected (0.98 sec)
Records: 0 Duplicates: 0 Warnings: 0

Surprisingly, immediately the client in the second session receives its SELECT failure:

ERROR 1213 (40001): WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction

So, even a simple SELECT is aborted if it conflicts with the local, concurrent ALTER (RSU)… We can see more details in the error log:

2018-12-04T21:39:17.285108Z 0 [Note] WSREP: Member 0.0 (node1) desyncs itself from group
2018-12-04T21:39:17.285124Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 471796)
2018-12-04T21:39:17.305018Z 12 [Note] WSREP: Provider paused at 7bf59bb4-996d-11e8-b3b6-8ed02cd38513:471796 (30)
2018-12-04T21:39:17.324509Z 12 [Note] WSREP: --------- CONFLICT DETECTED --------
2018-12-04T21:39:17.324532Z 12 [Note] WSREP: cluster conflict due to high priority abort for threads:
2018-12-04T21:39:17.324535Z 12 [Note] WSREP: Winning thread:
THD: 12, mode: total order, state: executing, conflict: no conflict, seqno: -1
SQL: alter table db1.sbtest1 engine=innodb
2018-12-04T21:39:17.324537Z 12 [Note] WSREP: Victim thread:
THD: 11, mode: local, state: executing, conflict: no conflict, seqno: -1
SQL: select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000
2018-12-04T21:39:17.324542Z 12 [Note] WSREP: MDL conflict db=db1 table=sbtest1 ticket=MDL_SHARED_READ solved by abort
2018-12-04T21:39:17.324544Z 12 [Note] WSREP: --------- CONFLICT DETECTED --------
2018-12-04T21:39:17.324545Z 12 [Note] WSREP: cluster conflict due to high priority abort for threads:
2018-12-04T21:39:17.324547Z 12 [Note] WSREP: Winning thread:
THD: 12, mode: total order, state: executing, conflict: no conflict, seqno: -1
SQL: alter table db1.sbtest1 engine=innodb
2018-12-04T21:39:17.324548Z 12 [Note] WSREP: Victim thread:
THD: 11, mode: local, state: executing, conflict: must abort, seqno: -1
SQL: select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000
2018-12-04T21:39:18.517457Z 12 [Note] WSREP: resuming provider at 30
2018-12-04T21:39:18.517482Z 12 [Note] WSREP: Provider resumed.
2018-12-04T21:39:18.518310Z 0 [Note] WSREP: Member 0.0 (node1) resyncs itself to group
2018-12-04T21:39:18.518342Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 471796)
2018-12-04T21:39:18.519077Z 0 [Note] WSREP: Member 0.0 (node1) synced with group.
2018-12-04T21:39:18.519099Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 471796)
2018-12-04T21:39:18.519119Z 2 [Note] WSREP: Synchronized with group, ready for connections
2018-12-04T21:39:18.519126Z 2 [Note] WSREP: Setting wsrep_ready to true

Another example – a simple sysbench test, during which I did noop ALTER in RSU mode:

# sysbench /usr/share/sysbench/oltp_read_only.lua --table-size=1000 --tables=8 --mysql-db=db1 --mysql-user=root --threads=8 --time=200 --report-interval=1 --events=0 --db-driver=mysql run
sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 8
Report intermediate results every 1 second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
[ 1s ] thds: 8 tps: 558.37 qps: 9004.30 (r/w/o: 7880.62/0.00/1123.68) lat (ms,95%): 18.28 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 8 tps: 579.01 qps: 9290.22 (r/w/o: 8130.20/0.00/1160.02) lat (ms,95%): 17.01 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 8 tps: 597.36 qps: 9528.89 (r/w/o: 8335.17/0.00/1193.72) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00
FATAL: mysql_stmt_store_result() returned error 1317 (Query execution was interrupted)
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:432: SQL error, errno = 1317, state = '70100': Query execution was interrupted

So, SELECT queries are aborted to resolve MDL lock request that a DDL in RSU needs immediately. This of course applies to INSERT, UPDATE and DELETE as well. That’s quite an intrusive way to accomplish the goal…

“Manual RSU”

Let’s try a “manual RSU” workaround instead. In fact, we can achieve the same isolated DDL execution as in RSU, by putting a node in desync mode (to avoid flow control) and disabling replication for our session. That way, the ALTER will only be executed in that particular node.

Session 1:

node1 > set wsrep_OSU_method=TOI; set global wsrep_desync=1; set wsrep_on=0;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
node1 > select @@wsrep_OSU_method,@@wsrep_on,@@wsrep_desync;
+--------------------+------------+----------------+
| @@wsrep_OSU_method | @@wsrep_on | @@wsrep_desync |
+--------------------+------------+----------------+
| TOI                |          0 |              1 |
+--------------------+------------+----------------+
1 row in set (0.00 sec)

Session 2:

node1 > select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000;
+-----------+
| count(*)  |
+-----------+
| 423680000 |
+-----------+
1 row in set (14.07 sec)

Session 1:

node1 > alter table db1.sbtest1 engine=innodb;
Query OK, 0 rows affected (13.52 sec)
Records: 0 Duplicates: 0 Warnings: 0

Session 3:

node1 > select id,command,time,state,info from information_schema.processlist where user="root";
+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+
| id | command | time | state                           | info |
+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+
| 11 | Query   | 9    | Sending data                    | select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000 |
| 12 | Query   | 7    | Waiting for table metadata lock | alter table db1.sbtest1 engine=innodb |
| 17 | Query   | 0    | executing                       | select id,command,time,state,info from information_schema.processlist where user="root" |
+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
node1 > select id,command,time,state,info from information_schema.processlist where user="root";
+----+---------+------+----------------+-----------------------------------------------------------------------------------------+
| id | command | time | state          | info |
+----+---------+------+----------------+-----------------------------------------------------------------------------------------+
| 11 | Sleep   | 14   |                | NULL |
| 12 | Query   | 13   | altering table | alter table db1.sbtest1 engine=innodb |
| 17 | Query   | 0    | executing      | select id,command,time,state,info from information_schema.processlist where user="root" |
+----+---------+------+----------------+-----------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

In this case, there was no interruption, the ALTER waited for it’s MDL lock request to succeed gracefully, and did it’s job when it became possible.

Remember, you have to execute the same commands on the rest of the nodes to make them consistent – even for noop-alter, it’s important to make the nodes consistent in terms of table size on disk.

Kill Problem

Another fact is that you cannot cancel or kill a DDL query executed in RSU or in TOI method:

node1 > kill query 12;
ERROR 1095 (HY000): You are not owner of thread 12

This may be an annoying problem when you need to unblock a node urgently. Fortunately, the workaround with wsrep_on=0 also allows to kill an ALTER without that restriction:

Session 1:

node1 > kill query 22;
Query OK, 0 rows affected (0.00 sec)

Session 2:

node1 > alter table db1.sbtest1 engine=innodb;
ERROR 1317 (70100): Query execution was interrupted

Summary

The RSU method may be more intrusive then you’d expect. For schema compatible changes, it is worth considering “manual RSU” with

set global wsrep_desync=1; set wsrep_on=0;

When using it though, please remember that wsrep_on applies to all types of writes, both DDL and DML, so be extra careful to set it back to 1 after the ALTER is done. So the procedure will look like this:

SET GLOBAL wsrep_desync=1;
SET wsrep_on=0;
ALTER ...  /* compatible schema change only! */
SET wsrep_on=1;
SET GLOBAL wsrep_desync=0;

Incidentally, as in my opinion the current RSU behavior is unnecessarily intrusive, I have filed this change suggestion: https://jira.percona.com/browse/PXC-2293


Photo by Pierre Bamin on Unsplash

Jul
05
2013
--

Schema changes – what’s new in MySQL 5.6?

MySQL 5.6Among many of the improvements you can enjoy in MySQL 5.6, there is one that addresses a huge operational problem that most DBAs and System Administrators encounter in their life: schema changes.

While it is usually not a problem for small tables or those in early stages of product life cycle, schema changes become a huge pain once your tables get a significant amount of data. Planning for maintenance is becoming more and more difficult, and your worldwide users want the service to be up and running 24/7, while on the other hand, your developers desire to introduce schema changes every week.

PITA

But what is the real problem here? Let me illustrate very typical case:

Session1> ALTER TABLE revision ADD COLUMN mycol tinyint;
Query OK, 1611193 rows affected (1 min 5.74 sec)
Records: 1611193  Duplicates: 0  Warnings: 0
Session2> INSERT INTO revision SET rev_page=3,rev_comment="test",rev_text_id=1;
Query OK, 1 row affected (48.30 sec)
Session3 > show processlist;
+----+------+-----------+------+---------+------+---------------------------------+----------------------------------------------------------------------+
| Id | User | Host      | db   | Command | Time | State                           | Info                                                                 |
+----+------+-----------+------+---------+------+---------------------------------+----------------------------------------------------------------------+
|  1 | root | localhost | test | Query   |   47 | copy to tmp table               | ALTER TABLE revision ADD COLUMN mycol tinyint                        |
|  2 | root | localhost | test | Query   |   30 | Waiting for table metadata lock | INSERT INTO revision SET rev_page=3,rev_comment="test",rev_text_id=1 |
|  3 | root | localhost | NULL | Query   |    0 | init                            | show processlist                                                     |
+----+------+-----------+------+---------+------+---------------------------------+----------------------------------------------------------------------+
3 rows in set (0.01 sec)

What you see above is how changing table’s structure works in MySQL in all pre-5.6 versions. Normally the example INSERT statement is done in matter of microseconds. But once a DBA runs this ALTER TABLE (session1), the application (session2) has to wait for the ALTER to complete before INSERT and other DML statements can succeed. As a result, application will stall on writes to this table until ALTER TABLE completes.

WORKAROUNDS

There are many less or more complicated approaches to the ALTER TABLE problem in MySQL, starting from just well planned downtime window, through master/slave switching techniques, ending on using advanced tools that let you do the change in less possible intrusive way. These tools are out there for years, yet it appears that still many MySQL users are not aware of them. Let me just name here pt-online-schema-change from Percona Toolkit, oak-online-alter-table from Openark Kit or Facebook OSC.

LONG STORY

Historically, MySQL had to perform full table copy for all DDL (Data Definition Language) operations, while you could not write to a table being altered during the process. This problem is even more painful when awaiting writes can also block following reads.

An important, though small step forward was made for InnoDB storage engine since new InnoDB version (aka InnoDB plugin) was introduced for MySQL 5.1. Since then, you can create and drop indexes without copying the whole table contents. So at least the operation that is practised very often for query optimizations, can be done much much quicker.

Next significant improvement (or perhaps I should say fix) in DDL operations area was introduced in MySQL 5.5 as Metadata Locking. Basically from now on schema changes are working properly and consistent with transactions. You will find this explained in details here.

ONLINE(!) DDL in MySQL 5.6

Fortunately, this is not where so much desired evolution in this area has stopped! With MySQL 5.6, this time a huge step forward was made: from now on most of the ALTER types won’t block writes to a table that is being changed!

Another improvement is that in addition to existing instant ALTER implementations (like change default value for a column), now you can perform also following operations without the need of full table copy:

  • Change auto-increment value for a column
  • Rename a column (if data type remains the same)*
  • Add/Drop a foreign key constraint

As said before though, the main improvement in MySQL 5.6 is that during the ALTER operation a table being changed can still be fully accessible by clients, so both reads and writes (with few exceptions) are allowed! Complete statement matrix can be found here.

Let’s see some examples in practice

Example 1 – reset auto-increment value for a column

It can happen that wrong explicit insert sets the value for an auto-increment column higher then necessary. It may be even close to the data type limit and we want to make the table using lower values back again. We realize that and delete that high value row, but…

Session1> SELECT t.TABLE_NAME,c.COLUMN_NAME,c.COLUMN_KEY,t.AUTO_INCREMENT FROM information_schema.tables t JOIN information_schema.columns c USING (TABLE_SCHEMA,TABLE_NAME) WHERE t.TABLE_SCHEMA="test" AND t.TABLE_NAME="revision" AND c.EXTRA="auto_increment";
+------------+-------------+------------+----------------+
| TABLE_NAME | COLUMN_NAME | COLUMN_KEY | AUTO_INCREMENT |
+------------+-------------+------------+----------------+
| revision   | rev_id      | PRI        |        2000002 |
+------------+-------------+------------+----------------+
1 row in set (0.03 sec)
Session1> select max(rev_id) from revision;
+-------------+
| max(rev_id) |
+-------------+
| 1700000     |
+-------------+
1 row in set (0.00 sec)
Session1> insert into revision set rev_page=3,rev_comment="test",rev_text_id=1;
Query OK, 1 row affected (0.02 sec)
Session1> select max(rev_id) from revision;
+-------------+
| max(rev_id) |
+-------------+
| 2000002     |
+-------------+
1 row in set (0.00 sec)

Not possible since the table has next AUTO_INCREMENT=2000002. So this alter is our only help:

Session1> ALTER TABLE revision AUTO_INCREMENT=1700001;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
Session1> insert into revision set rev_page=4,rev_comment="test",rev_text_id=1;
Query OK, 1 row affected (0.01 sec)
Session1> select max(rev_id) from revision;
+-------------+
| max(rev_id) |
+-------------+
|     1700001 |
+-------------+

Finally, this operation in MySQL 5.6 is instant! In previous MySQL versions such ALTER causes full table rewrite and blocks the table for writes for the whole process time. In version 5.5.31 the same ALTER on the same hardware looks like that:

Session1> ALTER TABLE revision AUTO_INCREMENT=1700001;
Query OK, 1611226 rows affected (1 min 3.42 sec)
Records: 1611226  Duplicates: 0  Warnings: 0

Example 2 – DROP COLUMN

Session1> ALTER TABLE revision DROP COLUMN rev_sha1;
Query OK, 0 rows affected (1 min 39.24 sec)
Records: 0 Duplicates: 0 Warnings: 0

During the ALTER is in progress:

Session2> insert into revision set rev_page=3,rev_comment="test",rev_text_id=1,rev_sha1=2;
Query OK, 1 row affected (0.01 sec)

After the ALTER finished:

Session2> insert into revision set rev_page=3,rev_comment="test",rev_text_id=1,rev_sha1=2;
ERROR 1054 (42S22): Unknown column 'rev_sha1' in 'field list'

Great! Drop table was non-blocking, we can use the table without interruption.

Example 3 – RENAME COLUMN

Original column definition was:
rev_deleted tinyint(1) unsigned NOT NULL DEFAULT '0',

Session1> ALTER TABLE revision CHANGE COLUMN rev_deleted rev_deleted1 tinyint(1) unsigned NOT NULL DEFAULT '0';
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

(The same column rename in MySQL 5.5 or earlier copies the whole table.)

But let’s try another column:
rev_timestamp char(14) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL DEFAULT '',

Session1> ALTER TABLE revision CHANGE COLUMN rev_timestamp rev_date char(14) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL DEFAULT '';
Query OK, 1611226 rows affected (1 min 43.34 sec)
Records: 1611226  Duplicates: 0  Warnings: 0

What the heck?

Hmm, let’s try another one:
rev_comment tinyblob NOT NULL,

Session1> ALTER TABLE revision CHANGE COLUMN rev_comment rev_comment1 tinyblob NOT NULL;
Query OK, 1611226 rows affected (2 min 7.91 sec)
Records: 1611226  Duplicates: 0  Warnings: 0

So, by the way of writing this blog post I identified a new bug and reported it here: http://bugs.mysql.com/bug.php?id=69580
In short, Online DDL does not work as expected when you rename a column of binary data type, but also for a char type with binary collation. The bug not only leads to full table copy but also blocks a table for writes.

Example 4 – NEW ALTER TABLE OPTIONS

In case you are not sure if an ALTER TABLE will copy and/or block the table, and you want to make sure your DDL statement won’t cause such problems, you can add new alter specifications to the statement: ALGORITHM and LOCK.

Session1> ALTER TABLE revision ENGINE=InnoDB, LOCK=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: COPY algorithm requires a lock. Try LOCK=SHARED.

As we can see full table rebuild still needs locking.

Session1> ALTER TABLE revision ADD COLUMN mycol3 tinyint, ALGORITHM=COPY, LOCK=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: COPY algorithm requires a lock. Try LOCK=SHARED.

So, ALGORITHM=COPY and LOCK=NONE are mutually exclusive.

Session1> ALTER TABLE revision MODIFY COLUMN rev_id bigint(8) unsigned NOT NULL AUTO_INCREMENT, ALGORITHM=INPLACE;
ERROR 1846 (0A000): ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY.

Data type change of a column is both locking and makes full table copy.

Session1> ALTER TABLE revision ADD COLUMN mycol3 tinyint, ALGORITHM=INPLACE, LOCK=NONE;
Query OK, 0 rows affected (1 min 38.84 sec)
Records: 0  Duplicates: 0  Warnings: 0

Above example is pretty interesting as the new method is used, so no full table copy takes place, however whole operation is still as much expensive as the old way. This is because significant data reorganization is needed inside the existing ibd tablespace.

ALGORITHM and LOCK options allow you also to force using old method with full table copy by adding ALGORITHM=COPY or protect a table from writes with LOCK=SHARED. The same effect can be achieved with SET old_alter_table=1 before the ALTER.

NEW DIAGNOSTICS

I would like to mention that MySQL 5.6 offers new monitoring tools that we can use to see more details also about ongoing ALTERs.

This is how it looks like during the ALTER with adding a column is in progress:

Session3> SELECT * FROM information_schema.innodb_metrics WHERE name LIKE 'ddl%'\G
*************************** 1. row ***************************
NAME: ddl_background_drop_indexes
SUBSYSTEM: ddl
COUNT: 0
MAX_COUNT: NULL
MIN_COUNT: NULL
AVG_COUNT: 0
COUNT_RESET: 0
MAX_COUNT_RESET: NULL
MIN_COUNT_RESET: NULL
AVG_COUNT_RESET: NULL
TIME_ENABLED: 2013-06-04 09:45:36
TIME_DISABLED: NULL
TIME_ELAPSED: 175015
TIME_RESET: NULL
STATUS: enabled
TYPE: counter
COMMENT: Number of indexes waiting to be dropped after failed index creation
*************************** 2. row ***************************
NAME: ddl_background_drop_tables
SUBSYSTEM: ddl
COUNT: 0
MAX_COUNT: NULL
MIN_COUNT: NULL
AVG_COUNT: 0
COUNT_RESET: 0
MAX_COUNT_RESET: NULL
MIN_COUNT_RESET: NULL
AVG_COUNT_RESET: NULL
TIME_ENABLED: 2013-06-04 09:45:36
TIME_DISABLED: NULL
TIME_ELAPSED: 175015
TIME_RESET: NULL
STATUS: enabled
TYPE: counter
COMMENT: Number of tables in background drop table list
*************************** 3. row ***************************
NAME: ddl_online_create_index
SUBSYSTEM: ddl
COUNT: 1
MAX_COUNT: NULL
MIN_COUNT: 0
AVG_COUNT: 0.000005713795960346256
COUNT_RESET: 1
MAX_COUNT_RESET: NULL
MIN_COUNT_RESET: 0
AVG_COUNT_RESET: NULL
TIME_ENABLED: 2013-06-04 09:45:36
TIME_DISABLED: NULL
TIME_ELAPSED: 175015
TIME_RESET: NULL
STATUS: enabled
TYPE: counter
COMMENT: Number of indexes being created online
*************************** 4. row ***************************
NAME: ddl_pending_alter_table
SUBSYSTEM: ddl
COUNT: 1
MAX_COUNT: NULL
MIN_COUNT: 0
AVG_COUNT: 0.000005713795960346256
COUNT_RESET: 1
MAX_COUNT_RESET: NULL
MIN_COUNT_RESET: 0
AVG_COUNT_RESET: NULL
TIME_ENABLED: 2013-06-04 09:45:36
TIME_DISABLED: NULL
TIME_ELAPSED: 175015
TIME_RESET: NULL
STATUS: enabled
TYPE: counter
COMMENT: Number of ALTER TABLE, CREATE INDEX, DROP INDEX in progress
4 rows in set (0.01 sec)

To enable those counters, simply do:

Session3> set global innodb_monitor_enable = module_ddl;
Query OK, 0 rows affected (0.00 sec)

IS ONLINE DDL GOOD ENOUGH IN MySQL 5.6?

I would not say that. Although schema changes are now a lot more friendly, but there seems to be still a room for improvement in terms of becoming even more “online”. And there is another aspect – we can’t really control the way online DDL is done in terms of server I/O load and replication. It is often the case when full table rewrite would generate enough high disk utilization to cause performance problems. Yet we cannot do anything about this in MySQL’s Online DDL – it will copy a table with full available speed no matter if it can saturate I/O throughput or not. Here the tools like pt-online-schema-change have this big advantage where it divides table copy into chunks and checks automatically a given status variable if there is not too high load on the server, and pauses data copy process if so.

When we speak about replication – with Online DDL each ALTER is a single operation, so after the master finishes it, it gets replicated to slaves. This will cause replication lag for the time slave has to apply long ALTER, as well as increased I/O activity on all slaves at the same time. Also in this regard pt-online-schema-change chunk copy model has huge advantage – schema change is replicated to the slaves and the tool will check how much slaves are lagged hence pause the copy process in case the lag is too big. Also the data chunks are automatically adjusted to fit within given time limit.

I was curious to compare time required for the same ALTER with ONLINE DDL versus pt-online-schema-change:

[root@centos6-2 ~]# time pt-online-schema-change D=test,t=test --alter "ADD COLUMN mycol4 bigint" --execute
Operation, tries, wait:
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `test`.`test`...
Creating new table...
Created new table test._test_new OK.
Altering new table...
Altered `test`.`_test_new` OK.
Creating triggers...
Created triggers OK.
Copying approximately 952180 rows...
Copied rows OK.
Swapping tables...
Swapped original and new tables OK.
Dropping old table...
Dropped old table `test`.`_test_old` OK.
Dropping triggers...
Dropped triggers OK.
Successfully altered `test`.`test`.
real    0m22.531s
user    0m0.342s
sys     0m0.095s

versus:

Session1> ALTER TABLE test ADD COLUMN mycol4 bigint, ALGORITHM=INPLACE, LOCK=NONE;
Query OK, 0 rows affected (30.64 sec)
Records: 0  Duplicates: 0  Warnings: 0

pt-online-schema-change wins here clearly, but it’s just one simple example out of many possible scenarios!

OVERHEAD

The fact that a table is ready for writes during Online DDL in progress, does not mean it will be performed at the same speed.

I tried a simple sysbench test to compare overall throughput of the MySQL server when the server is idle versus when ONLINE ALTER is in progress. Before the test I warmed up InnoDB buffers and ran sysbench test several times. All data for sysbench table fit in memory.

Then I started ALTER statement (for an unrelated table):

Session1> ALTER TABLE revision ADD COLUMN mycol4 tinyint, ALGORITHM=INPLACE, LOCK=SHARED;                                                                                                             Query OK, 0 rows affected (1 min 54.73 sec)
Records: 0  Duplicates: 0  Warnings: 0

And just after that, the same sysbench command:

[root@centos6-2 ~]# sysbench --num-threads=3 --max-requests=5000 --db-driver=mysql --test=oltp --mysql-table-engine=InnoDB --mysql-db=test --mysql-user=root run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 3
Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 pct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Maximum number of requests for OLTP test is limited to 5000
Threads started!
...

Below we can see concurrent threads:

Session3 > show processlist;
+----+------+-----------+------+---------+------+----------------+--------------------------------------------------------------------------------+
| Id | User | Host      | db   | Command | Time | State          | Info                                                                           |
+----+------+-----------+------+---------+------+----------------+--------------------------------------------------------------------------------+
|  1 | root | localhost | test | Sleep   |  282 |                | NULL                                                                           |
|  2 | root | localhost | test | Query   |   29 | altering table | ALTER TABLE revision ADD COLUMN mycol4 tinyint, ALGORITHM=INPLACE, LOCK=SHARED |
|  5 | root | localhost | NULL | Query   |    0 | init           | show processlist                                                               |
| 39 | root | localhost | test | Execute |    0 | Writing to net | SELECT c from sbtest where id=?                                                |
| 40 | root | localhost | test | Execute |    0 | Writing to net | DELETE from sbtest where id=?                                                  |
| 41 | root | localhost | test | Execute |    0 | Writing to net | SELECT c from sbtest where id=?                                                |
+----+------+-----------+------+---------+------+----------------+--------------------------------------------------------------------------------+
6 rows in set (0.00 sec)

To be clear, sysbench was started after ALTER, and finished before ALTER was done.

Result? On idle MySQL instance sysbench test score is around 270 transactions per second with minimal variation between many the same tests.
While Online DDL of unrelated table is in progress, the same test scored with average 110 transactions per second. Also minimal variation between many the same tests.

In next test, I altered the sbtest table – the same which is used by sysbench test, hence additional overhead of recording all the changes to that table.

Session1> ALTER TABLE sbtest ADD COLUMN mycol4 tinyint, ALGORITHM=INPLACE, LOCK=NONE;
Query OK, 0 rows affected (1 min 5.88 sec)
Records: 0  Duplicates: 0  Warnings: 0

Again, sysbench finished before the ALTER, and now the score was 90 transactions per second.
Note though this was only a very simple test of one example use case.

BUGS

At the end I’d like to mention bugs regarding online DDL that in my opinion are important to be aware of. Fortunately there are not many of them. Following were active as of version 5.6.12:

http://bugs.mysql.com/bug.php?id=69151 – in some circumstances statements needing table copy (like add column) create temporary table in tmpdir instead of the database directory

http://bugs.mysql.com/bug.php?id=69444 – Replication is not crash safe with DDL statements

http://bugs.mysql.com/bug.php?id=69580 – the bug around column rename, I just reported and mentioned above

CONCLUSION

I think schema changes in MySQL 5.6 are really less painful out of the box, plus we can still choose external tools if needed. Having said that, MySQL operators are in much better shape now. It is really good to see MySQL evolution is continued in right direction.

The post Schema changes – what’s new in MySQL 5.6? appeared first on MySQL Performance Blog.

Apr
08
2013
--

Understanding the maximum number of columns in a MySQL table

Understanding the maximum number of columns in a MySQL tableThis post was initially going to be two sets of polls: “What is the maximum number of columns in MySQL?” and “What is the minimum maximum number of columns in MySQL?”. Before you read on, ponder those questions and come up with your own answers… and see if you’re right or can prove me wrong!

Back in 2009, I finished what seemed an epic task in the Drizzle code base: banishing the FRM file. Why? We felt it was not a good idea to keep arbitrary and obscure limitations from the 1980s alive in the 21st century and instead wanted a modular system where the storage engines themselves owned their own metadata. This was a radical departure from the MySQL philosophy, and one that has truly paid off in the Drizzle code base. However… for those using MySQL, Percona Server, MariaDB or any other of the MySQL branches/forks, you get to have these odd limitations.

Why do I discuss the FRM file? If we look at the MAX_FIELDS define in MySQL, we’ll see that it’s defined as 4096. This, however, is not the true limit. To find out what the actual limit is, we must delve further into the FRM file.

What is an FRM file? It’s the FoRM file from UNIREG. It’s FRM rather than FORM as, well, you used to only get three characters after a dot in a file name. Back in 1979, Monty developed an in-house database named UNIREG. UNIREG was a text-based program for entering records into an ISAM-style database. It would have an 80×24 text UI for entering data and a separate UI for generating reports. This evolved into the SQL based MySQL, with MySQL 1.0 being released in 1995.

The FoRM file specified what fields where on what screen for entering, as well as any constraints as to what could be entered. For example, if you had more than a certain number of fields, you were going to need more than one 80×24 screen to enter in all the data! You could also have things like NEXT_NUMBER fields (what we know today as auto_increment), CASEUP and CASEDN fields which although not implemented in MySQL, the defines can still found in the source. Basically, it’s why we can’t have nice things (like default values other than NOW()).

It also has certain limits which by any modern standard are purely arbitrary. One of those is the limit that a certain part of the FRM file cannot be greater than 64kb. The bit of code in question that comes into play around the maximum number of columns is this:

/* Hack to avoid bugs with small static rows in MySQL */
reclength=max(file->min_record_length(table_options),reclength);
if (info_length+(ulong) create_fields.elements*FCOMP+288+
    n_length+int_length+com_length > 65535L || int_count > 255)
{
  my_message(ER_TOO_MANY_FIELDS, ER(ER_TOO_MANY_FIELDS), MYF(0));
  DBUG_RETURN(1);
}

Which is, of course, obvious! Various parts of this limit are:

  • info_length is roughly 2+strlen(field_name) for each field. Unless you have many columns, and then it’s something else (as in that case you don’t get your 80×24 terminal UI in your FRM file, you instead get some bunch of bytes per 19 columns).
  • create_fields.elements*FCOMP is just number of fields multiplied by 17
  • 288 is static and is always needed
  • int_length is the interval length. This isn’t the normal meaning of the word interval, we can only guess that it’s called this due to either something UNIREG specific or it’s just Monty’s English language skills in the 1980s. We’ll come back to this one.
  • com_length is the length of all the comments for each field (but not the table)

An interval in UNIREG speak is a set of strings that are the options for ENUM or SET columns. The tricky bit is that it’s unique intervals, not actual intervals, so two ENUM columns both having the options ‘Y’ and ‘N’ will use less space in the FRM than if you had one with ‘Y’ and ‘N’ and the other with ‘A’ and ‘B’.

If you noticed that if you have a long comment on each field you reduce the number of different ENUM elements you can have, you are correct. There is also a limit of 255 unique intervals, so while you can have many more ENUM(‘Y’,’N’) columns, you can only have 255 ENUM columns with unique values.

If you were looking for a very basic formula that is mostly accurate, I present this:

foreach field:  17+2*(strlen(field_name)+2) (bytes)
    + length of all comments (in bytes)
    + length of all intervals (for ENUM, SET) in bytes.

If you use that as a rule of thumb, with that not being able to exceed 64k, you’re roughly on the right track to working out the maximum number of columns in a MySQL table.

So what’s the maximum number of columns in a MySQL table? Well.. I tried a few things before I settled on the following (perl) program (accepts command line parameter of number of columns to create) to produce the CREATE TABLE sql statement:

sub cname ($) {
  my $c=shift;
  my $name="";
  while($c > 0)
  {
    my $n=$c%36;
    $name.=chr(ord('0')+$n) if $n < 10;     $name.=chr(ord('a')+($n-10)) if $n >= 10;
    $c= int $c/36;
  }
  return $name
}
my $sql= "CREATE TABLE t (";
 foreach(1..shift @ARGV) {
    my $n=cname($_);
    $sql.="`$n`";
    $sql.=" ENUM('Y','N','M','0','1','2')\n";
 }
 chop $sql;
 chop $sql;
 $sql.=");";
 print $sql;

This gets you a 46kb CREATE TABLE statement and a 76kb FRM file for a table with 2,829 columns. I believe this to be the maximum number of columns you can create.

If you try setting the storage engine to InnoDB however, you will get an error message! The exact error message you get is not very interesting and just says “Can’t create table ‘test.t’ (errno: 139)”. This is because InnoDB has a hard limit of 1,000 columns. This is the code from ha_innodb.cc that enforces that limit:

if (form->s->fields > 1000) {
  /* The limit probably should be REC_MAX_N_FIELDS - 3 = 1020,
  but we play safe here */
  DBUG_RETURN(HA_ERR_TO_BIG_ROW);
}

Which is curiously gone from MySQL 5.6, it’s replaced by the following set of defines

#define	DATA_N_SYS_COLS 3 /* data0type.h */
/* from rem0types.h */
#define REC_MAX_N_FIELDS        (1024 - 1)
#define REC_MAX_N_USER_FIELDS	(REC_MAX_N_FIELDS - DATA_N_SYS_COLS * 2)

Which adds up to (1024-1)-3*2=1017 which is exactly the same as what I can create. That’s right folks, in MySQL 5.6 you can create a table with a few more columns in InnoDB!

This led me on another idea… what is the minimum maximum number of columns you can create? You may think that it is 255 based on the limit of the number of intervals above, but can you get any lower? Why yes you can! With this bit of perl code I was able to hit a too many columns error with only 192 columns (i.e. 191 worked):

sub cname ($$) {
     my $c=shift;
     my $name="";
     while($c > 0)
     {
	 my $n=$c%36;
	 $name.=chr(ord('0')+$n) if $n < 10; 	 $name.=chr(ord('a')+($n-10)) if $n >= 10;
	 $c= int $c/36;
     }
     $name.='0' foreach(length $name .. shift);
     return $name
 }
 my $sql= "CREATE TABLE `".cname(16,63)."` (";
 foreach(1..shift @ARGV) {
     my $n=cname($_,63);
     $sql.="`$n`";
     $sql.=" ENUM('".cname(0,64)."') COMMENT '".cname($_,254)."',\n";
 }
 chop $sql;
 chop $sql;
 $sql.=");";
 print $sql;

So the maximum number of columns for a table in MySQL is somewhere between 191 and 2829, depending on a number of factors. I’d be interested to hear if you’ve been able to beat my minimum/maximums!

The post Understanding the maximum number of columns in a MySQL table appeared first on MySQL Performance Blog.

Feb
01
2013
--

Implications of Metadata Locking Changes in MySQL 5.5

While most of the talk recently has mostly been around the new changes in MySQL 5.6 (and that is understandable), I have had lately some very interesting cases to deal with, with respect to the Metadata Locking related changes that were introduced in MySQL 5.5.3. It appears that the implications of Metadata Locking have not been covered well, and since there are still a large number of MySQL 5.0 and 5.1 installations that would upgrade or are in the process of upgrading to MySQL 5.5, I thought it necessary to discuss what these implications exactly are.

To read what Metadata Locking exactly is please read this section here in the MySQL manual.

Let’s start off with having a look at the Meta Data Locking behavior prior to MySQL 5.5.3

Metadata Locking behavior prior to MySQL 5.5.3

Prior to MySQL 5.5.3 a statement that opened a table only held meta data locks till the end of the statement and not the end of the transaction. This meant that transaction was not really isolated, because the same query could return different results if executed twice and if a DDL was executed between the query invocations. Let me give you an example via a simple test case where I will add a new column to the table while a transaction in REPEATABLE-READ isolation mode is ACTIVE.

session1 > select @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+
1 row in set (0.00 sec)

session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)  

session1 > select * from test where id=1;
+----+------+
| id | x    |
+----+------+
|  1 | foo  |
+----+------+
1 row in set (0.00 sec)

session2 > ALTER TABLE test add column c char(32) default 'dummy_text';
Query OK, 2 rows affected (0.57 sec)
Records: 2  Duplicates: 0  Warnings: 0

session1 > select * from test where id=1;
Empty set (0.00 sec)

session1 > rollback;
Query OK, 0 rows affected (0.00 sec)  

session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)  

session1 > select * from test where id=1;
+----+------+------------+
| id | x    | c          |
+----+------+------------+
|  1 | foo  | dummy_text |
+----+------+------------+
1 row in set (0.00 sec)

And you can see how isolation is broken because the SELECT was not repeatable although transaction isolation level of REPEATABLE-READ was used. This behavior of versions prior to 5.5 also means that queries could be written in different order to the binary log breaking locking semantics and contrary to serialization concepts. For example take a look at the following excerpt from the binary log of a case when an UPDATE transaction is mixed with an ALTER:

# at 536
#130201 11:21:03 server id 1  end_log_pos 658   Query   thread_id=8     exec_time=0     error_code=0
SET TIMESTAMP=1359714063/*!*/;
ALTER TABLE test add column id_2 int(11) default 0 after id
/*!*/;
# at 658
#130201 11:21:39 server id 1  end_log_pos 726   Query   thread_id=7     exec_time=0     error_code=0
SET TIMESTAMP=1359714099/*!*/;
BEGIN
/*!*/;
# at 726
# at 773
#130201 11:21:35 server id 1  end_log_pos 773   Table_map: `test`.`test` mapped to number 17
#130201 11:21:35 server id 1  end_log_pos 829   Update_rows: table id 17 flags: STMT_END_F

BINLOG '
L5cLURMBAAAALwAAAAUDAAAAABEAAAAAAAEABHRlc3QABHRlc3QAAwMD/gL+CQY=
L5cLURgBAAAAOAAAAD0DAAAAABEAAAAAAAEAA///+AIAAAAAAAAAA2JhcvgCAAAAAAAAAANob3A=
'/*!*/;
### UPDATE test.test
### WHERE
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2=0 /* INT meta=0 nullable=1 is_null=0 */
###   @3='bar' /* STRING(9) meta=65033 nullable=1 is_null=0 */
### SET
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2=0 /* INT meta=0 nullable=1 is_null=0 */
###   @3='hop' /* STRING(9) meta=65033 nullable=1 is_null=0 */
# at 829
#130201 11:21:39 server id 1  end_log_pos 856   Xid = 85
COMMIT/*!*/;

Note how ALTER is logged before the UPDATE, because ALTER did not block waiting for the transaction to commit.

For the reasons described above the implementation of Metadata Locking was changed, starting MySQL 5.5.3. Let’s see how this works now.

Metadata Locking behavior starting MySQL 5.5.3

Starting with 5.5.3 DDL statements that modify the table metadata are executed in an isolated fashion consistent with transactional behavior. This means that any open transaction will hold metadata locks on the table it has accessed for as long as the transaction is open. Since an open transaction retains metadata locks on all tables that were opened by the transaction, hence any DDL operation cannot commence till all the transactions that accessed that table are open. Let’s see this in affect via a simple test case:

session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)

session1 > select * from test order by id;
+----+------+
| id | x    |
+----+------+
|  1 | foo  |
|  2 | bar  |
+----+------+
2 rows in set (0.00 sec)

session2 > ALTER TABLE test add column c char(32) default 'dummy_text';

session3 > show processlist;
+----+----------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------+
| Id | User     | Host      | db   | Command | Time | State                           | Info                                                        |
+----+----------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------+
|  1 | msandbox | localhost | test | Sleep   |  140 |                                 | NULL                                                        |
|  2 | msandbox | localhost | test | Query   |    3 | Waiting for table metadata lock | ALTER TABLE test add column c char(32) default 'dummy_text' |
|  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist                                            |
+----+----------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------+
3 rows in set (0.00 sec)

You can see how the ALTER blocks, because the transaction in session1 is still open and once the transaction in session1 is closed, the ALTER proceeds through successfully:

session1 > rollback;
Query OK, 0 rows affected (0.00 sec)

session2 > ALTER TABLE test add column c char(32) default 'dummy_text';
Query OK, 2 rows affected (46.77 sec)
Records: 2  Duplicates: 0  Warnings: 0

Let’s see where the ALTER spent most of its time:

session2 > show profiles;
+----------+-------------+-------------------------------------------------------------+
| Query_ID | Duration    | Query                                                       |
+----------+-------------+-------------------------------------------------------------+
|        1 | 46.78110075 | ALTER TABLE test add column c char(32) default 'dummy_text' |
+----------+-------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

session2 > show profile for query 1;
+------------------------------+-----------+
| Status                       | Duration  |
+------------------------------+-----------+
| starting                     |  0.000060 |
| checking permissions         |  0.000003 |
| checking permissions         |  0.000003 |
| init                         |  0.000005 |
| Opening tables               |  0.000045 |
| System lock                  |  0.000006 |
| setup                        |  0.000016 |
| creating table               |  0.168283 |
| After create                 |  0.000061 |
| copy to tmp table            |  0.165808 |
| rename result table          | 46.446738 |
| end                          |  0.000035 |
| Waiting for query cache lock |  0.000003 |
| end                          |  0.000006 |
| query end                    |  0.000003 |
| closing tables               |  0.000008 |
| freeing items                |  0.000016 |
| cleaning up                  |  0.000004 |
+------------------------------+-----------+
18 rows in set (0.00 sec)

So the ALTER waited on the meta data locks just after the table with the new structure had been created and populated with data but before the old table was swapped with the new one. Note that ALTER is a multi-step process, the old table is locked in shared mode and then something similar to the following steps are taken: a new table with the new structure is created and then INSERT INTO new_table SELECT * FROM old_table is done and then RENAME old_table to tmp_table, new_table to old_table and finally DROP tmp_table.
Let’s see another example, this time trying a RENAME:

session2 > RENAME TABLE test to test_2;

session3 > show processlist;
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
| Id | User     | Host      | db   | Command | Time | State                           | Info                        |
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
|  1 | msandbox | localhost | test | Sleep   |   49 |                                 | NULL                        |
|  2 | msandbox | localhost | test | Query   |   35 | Waiting for table metadata lock | RENAME TABLE test to test_2 |
|  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist            |
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
3 rows in set (0.00 sec)

And you can see that the RENAME is also blocked, because a transaction that accessed the table “test” is still open.

So we have an interesting conclusion here that the ALTER waits only at the last stages when its making changes to the table metadata, a table ALTER that alters a big table can keep executing without any hindrance, copying rows from the table with the old structure to the table with the new structure and will only wait at the last step when its about to make changes to table metadata.

Let’s see another interesting side-affect of metadata locking.

When can ALTER render the table inaccessible?

Now there is another interesting side-affect, and that is that when the ALTER comes at the state where it needs to wait for metadata locks, at that point the ALTER simply blocks any type of queries to the table, we know that writes would be blocked anyhow for the entire duration of the ALTER, but reads would be blocked as well at the time when the ALTER is waiting for metadata locks. Let’s see this in action via another test case:

session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)

session1 > select * from test_2 order by id;
+----+------+------------+
| id | x    | c          |
+----+------+------------+
|  1 | foo  | dummy_text |
|  2 | bar  | dummy_text |
+----+------+------------+
2 rows in set (0.00 sec)

session6 > ALTER TABLE test_2 DROP COLUMN c;

session7 > select * from test_2 order by id;
session8 > select * from test_2 order by id;
session9 > select * from test_2 order by id;
session10 > select * from test_2 order by id;

session3 > show processlist;
+----+----------+-----------+------+---------+------+---------------------------------+----------------------------------+
| Id | User     | Host      | db   | Command | Time | State                           | Info                             |
+----+----------+-----------+------+---------+------+---------------------------------+----------------------------------+
|  1 | msandbox | localhost | test | Sleep   |  403 |                                 | NULL                             |
|  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist                 |
|  6 | msandbox | localhost | test | Query   |  229 | Waiting for table metadata lock | ALTER TABLE test_2 DROP COLUMN c |
|  7 | msandbox | localhost | test | Query   |  195 | Waiting for table metadata lock | select * from test_2 order by id |
|  8 | msandbox | localhost | test | Query   |  180 | Waiting for table metadata lock | select * from test_2 order by id |
|  9 | msandbox | localhost | test | Query   |  169 | Waiting for table metadata lock | select * from test_2 order by id |
| 10 | msandbox | localhost | test | Query   |   55 | Waiting for table metadata lock | select * from test_2 order by id |
+----+----------+-----------+------+---------+------+---------------------------------+----------------------------------+
7 rows in set (0.00 sec)

And you can see that the table is blocked for any kind of operation. Let’s see the profiling information for one of the queries that was blocked to see where the query spent most of its time:

session10 > show profile for query 1;
+----------------------+------------+
| Status               | Duration   |
+----------------------+------------+
| starting             |   0.000058 |
| checking permissions |   0.000006 |
| Opening tables       | 213.028481 |
| System lock          |   0.000009 |
| init                 |   0.000014 |
| optimizing           |   0.000002 |
| statistics           |   0.000005 |
| preparing            |   0.000006 |
| executing            |   0.000001 |
| Sorting result       |   0.000002 |
| Sending data         |   0.000040 |
| end                  |   0.000003 |
| query end            |   0.000002 |
| closing tables       |   0.000003 |
| freeing items        |   0.000007 |
| logging slow query   |   0.000002 |
| cleaning up          |   0.000002 |
+----------------------+------------+
17 rows in set (0.00 sec)

And you can see how the query spent nearly all its time waiting in the “Opening tables” state. Now this behavior with respect to ALTER making the table inaccessible in some cases is not really documented and as such I have reported a bug: http://bugs.mysql.com/bug.php?id=67647

Metadata locking behaves differently for queries that are serviced from the Query Cache, let’s see what happens in that case.

Metadata Locking and Query Cache

How does metadata locking behave with query_cache? That is an important question. If Query Cache is enabled and the SELECT can be serviced from the Query Cache then the SELECT will not block on the ALTER even though the ALTER is waiting for meta data locks. Why? Because in such a case no table open operation has to be performed. Let’s see this scenario via a test case:

session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)

session1 > select * from test_2 order by id;
+----+------+
| id | x    |
+----+------+
|  1 | foo  |
|  2 | bar  |
+----+------+
2 rows in set (0.00 sec)

session6 > RENAME TABLE test_2 to test;

session10 > select * from test_2 order by id;
+----+------+
| id | x    |
+----+------+
|  1 | foo  |
|  2 | bar  |
+----+------+
2 rows in set (0.00 sec)

session3 > show processlist;
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
| Id | User     | Host      | db   | Command | Time | State                           | Info                        |
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
|  1 | msandbox | localhost | test | Sleep   |   22 |                                 | NULL                        |
|  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist            |
|  6 | msandbox | localhost | test | Query   |    3 | Waiting for table metadata lock | RENAME TABLE test_2 to test |
| 10 | msandbox | localhost | test | Sleep   |   37 |                                 | NULL                        |
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
4 rows in set (0.00 sec)

The query proceeds without being blocked on anything while the RENAME is still waiting for metadata locks. Let’s see the profiling information for this query:

session10 > show profile for query 11;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000018 |
| Waiting for query cache lock   | 0.000003 |
| checking query cache for query | 0.000007 |
| checking privileges on cached  | 0.000003 |
| checking permissions           | 0.000005 |
| sending cached result to clien | 0.000011 |
| logging slow query             | 0.000002 |
| cleaning up                    | 0.000002 |
+--------------------------------+----------+
8 rows in set (0.00 sec)

You can see that no table open operation was performed and hence no wait.

Does the fact that the table has already been opened and table object is in the table_cache change anything with respect to metadata locks.

Metadata Locking and Table Cache

No matter if a connection accesses a table that is already in the Table Cache, any query to a table that has a DDL operation waiting, will block. Why? Because MySQL sees that the old entries in the Table Cache have to be invalidated, and any query that accesses the table will have to reopen the modified table and there will be new entries in the Table Cache. Let’s see this phenomenon in action:

session6 > ALTER TABLE test add column c char(32) default 'dummy_text';
Query OK, 2 rows affected (59.80 sec)

session10 > show status like 'Open%tables';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Open_tables   | 30    |
| Opened_tables | 0     |
+---------------+-------+
2 rows in set (0.00 sec)

session10 > select * from test order by id;
+----+------+------------+
| id | x    | c          |
+----+------+------------+
|  1 | foo  | dummy_text |
|  2 | bar  | dummy_text |
+----+------+------------+
2 rows in set (53.78 sec)

session10 > show status like 'Open%tables';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Open_tables   | 30    |
| Opened_tables | 1     |
+---------------+-------+
2 rows in set (0.00 sec)

session10 > show profile for query 18;
+----------------------+-----------+
| Status               | Duration  |
+----------------------+-----------+
| starting             |  0.000059 |
| checking permissions |  0.000010 |
| Opening tables       | 53.786685 |
| System lock          |  0.000009 |
| init                 |  0.000012 |
| optimizing           |  0.000003 |
| statistics           |  0.000007 |
| preparing            |  0.000006 |
| executing            |  0.000001 |
| Sorting result       |  0.000004 |
| Sending data         |  0.000033 |
| end                  |  0.000003 |
| query end            |  0.000002 |
| closing tables       |  0.000004 |
| freeing items        |  0.000009 |
| logging slow query   |  0.000002 |
| cleaning up          |  0.000002 |
+----------------------+-----------+
17 rows in set (0.00 sec)

As you can see that the SELECT query still blocks, and the status counter Opened_tables is also incremented once the query finishes.

So much for the explanation, now let’s take a look at the consequences.

Consequences

The consequences of these changes in metadata locking is that, if you have some really hot tables, for example in web applications its typical to see a “sessions” table that is accessed on every request, then care should be taken when you have to ALTER the table otherwise it can easily cause a stall as many threads can get piled up waiting for table metadata lock bringing down the MySQL server or causing all the connections to get depleted.

There are some other interesting consequences as well for application that use MySQL versions prior to 5.5:

  • I remember a customer case where there is a reporting slave that daily runs a long running transaction, this transactions tends to run for hours. Now everyday one of the tables was renamed and swapped and that table was the one that is read from by the long running transaction. As the slave tried to execute the rename query it would simply block waiting for the long running transaction to finish, this would cause the slave to lag for hours waiting for the transaction to be completed, as you know that the slave is single-threaded so it cannot really apply any other event. This was never an issue when the application was using MySQL version < 5.5 as the datastore.
  • There was another interesting case this time with how Active MQ uses MySQL when in HA mode. In HA mode there are two Active MQ servers, both try to do something similar to the following sequence of events:
    session1 > CREATE TABLE t1(i int(11) not null auto_increment primary key) ENGINE=InnoDB;
    Query OK, 0 rows affected (0.09 sec)
    
    session1 > insert into t1 values(null);
    Query OK, 1 row affected (0.21 sec)
    
    session1 > start transaction;
    Query OK, 0 rows affected (0.00 sec)
    
    session1 > select * from t1 where i=1 for update;
    +---+
    | i |
    +---+
    | 1 |
    +---+
    1 row in set (0.00 sec)
    
    session2 > CREATE TABLE t1(i int(11) not null auto_increment primary key) ENGINE=InnoDB;
    

    When using MySQL 5.1 the second CREATE would just fail immediately with the error “ERROR 1050 (42S01): Table ‘t1′ already exists”, but because of how meta data locking works in 5.5 this is no longer the case, the second CREATE will simply block with unintended consequences. A workaround here would be to set lock_wait_timeout variable to a very low value and then execute the CREATE TABLE, this will make sure that the CREATE fails immediately (however due to a different reason):

    session2 > set session lock_wait_timeout=1;CREATE TABLE t1(i int(11) not null auto_increment primary key) ENGINE=InnoDB;
    

    However, I feel that the CREATE TABLE should fail in such a case when the table already exists and there is no other DDL like a DROP table waiting to run on the same table, and as such I have reported the bug: http://bugs.mysql.com/bug.php?id=67873

The post Implications of Metadata Locking Changes in MySQL 5.5 appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com