One thing I commonly hear when working with my clients is “I want to change my DDL strategy in order to avoid locking in my database! The last time I used the same old method I ended up in a metadata lock situation!”
I agree that metadata locks can be painful, but unfortunately, they are completely unavoidable, and changing from one tool to another won’t help you escape them. That said, it’s still worth examining how metadata locks work and what the impact is for each of the common tools and processes. In doing so, we will see that all of these tools require metadata locks, but knowing more about how they work and how they use locking can help us determine the right tool for your specific use case.
Any time you make a change to a table, a metadata lock is needed to ensure consistency between the table itself and MySQL’s data dictionary. In order for MySQL to establish this lock, it has to wait for any query against the table in question to finish; in turn, any query on the table that arrives after the lock request has to wait for both the long-running query and the lock to be processed before continuing. This will become a lot more apparent in our first example, where we’ll cover the online DDL method using the INSTANT algorithm. I will also illustrate how this works with other tools such as pt-online-schema-change and gh-ost.
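As a quick aside, you don’t have to infer these locks from the processlist alone. MySQL 8.0 exposes them directly through performance_schema, where the wait/lock/metadata/sql/mdl instrument is enabled by default. A minimal sketch, assuming the test.t1 table we’ll set up below:

-- List metadata locks held or requested on test.t1.
-- A lock_status of PENDING marks a session stuck waiting.
SELECT object_type, object_schema, object_name,
       lock_type, lock_status, owner_thread_id
FROM performance_schema.metadata_locks
WHERE object_schema = 'test'
  AND object_name = 't1';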
Lab setup
In order to demonstrate each technology, the first thing we have to do is set up the environment. I used my VM lab to create a single host running Percona Server for MySQL 8.0, created a test database and an example table, and then inserted three dummy records.
[root@centos7-1 ~]# mysql

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test;
Database changed

mysql> CREATE TABLE `t1` (
    -> `c1` int unsigned NOT NULL AUTO_INCREMENT,
    -> `c2` int,
    -> `c3` int,
    -> PRIMARY KEY (`c1`)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 (c1, c2, c3) values (1, 1, 1);
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 (c1, c2, c3) values (2, 2, 2);
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 (c1, c2, c3) values (3, 3, 3);
Query OK, 1 row affected (0.00 sec)
Online DDL, Algorithm=INSTANT
In order to demonstrate how metadata locks work, I have several terminals running against my host.
Terminal one
In the first terminal, I created a long-running query. While it runs, it holds a shared metadata lock on the table in question, which will prevent any DDL from obtaining the exclusive metadata lock it needs.
mysql> select * from t1 where 1=1 and sleep(100);
Terminal two
In terminal two, I attempted to run the direct ALTER against MySQL, which hung.
mysql> ALTER TABLE t1 ADD COLUMN c4 TINYINT NOT NULL DEFAULT '1', ALGORITHM=INSTANT; # HUNG
Terminal three
In terminal three, we can see the processlist, which shows the offending query and the ALTER that is seeking, but unable to obtain, the metadata lock.
mysql> show processlist;
+----+-----------------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------------------------+---------+-----------+---------------+
| Id | User            | Host      | db   | Command | Time | State                           | Info                                                                          | Time_ms | Rows_sent | Rows_examined |
+----+-----------------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------------------------+---------+-----------+---------------+
|  5 | event_scheduler | localhost | NULL | Daemon  |   90 | Waiting on empty queue          | NULL                                                                          |   89866 |         0 |             0 |
|  8 | root            | localhost | test | Query   |   30 | User sleep                      | select * from t1 where 1=1 and sleep(100)                                     |   30212 |         0 |             0 |
|  9 | root            | localhost | test | Query   |   12 | Waiting for table metadata lock | ALTER TABLE t1 ADD COLUMN c4 TINYINT NOT NULL DEFAULT '1', ALGORITHM=INSTANT  |   11630 |         0 |             0 |
| 10 | root            | localhost | NULL | Query   |    0 | init                            | show processlist                                                              |       0 |         0 |             0 |
+----+-----------------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------------------------+---------+-----------+---------------+
4 rows in set (0.00 sec)
Terminal four
And in terminal four, I issued another SELECT against the table, but it hangs because it’s queued behind the pending metadata lock request.
mysql> select * from t1; # HUNG
Terminal one
So now we have fully recreated the issue. To resolve it, I went back to terminal one and killed the offending long-running query.
mysql> select * from t1 where 1=1 and sleep(100);
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
Terminal two
In terminal two, we can see that the ALTER TABLE completed now that it was able to get the lock it needed.
mysql> ALTER TABLE t1 ADD COLUMN c4 TINYINT NOT NULL DEFAULT '1', ALGORITHM=INSTANT;
Query OK, 0 rows affected (36.13 sec)
Terminal three
In terminal three we can see that the processlist is now clear.
mysql> show processlist;
+----+-----------------+-----------+------+---------+------+------------------------+------------------+---------+-----------+---------------+
| Id | User            | Host      | db   | Command | Time | State                  | Info             | Time_ms | Rows_sent | Rows_examined |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+---------+-----------+---------------+
|  5 | event_scheduler | localhost | NULL | Daemon  |  120 | Waiting on empty queue | NULL             |  119876 |         0 |             0 |
|  8 | root            | localhost | test | Sleep   |   60 |                        | NULL             |   60222 |         0 |             1 |
|  9 | root            | localhost | test | Sleep   |   42 |                        | NULL             |   41640 |         0 |             0 |
| 10 | root            | localhost | NULL | Query   |    0 | init                   | show processlist |       0 |         0 |             0 |
| 11 | root            | localhost | test | Sleep   |   12 |                        | NULL             |   12092 |         3 |             3 |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+---------+-----------+---------------+
Terminal four
And in terminal four, we can see that the query that was queued behind the lock request was able to complete.
mysql> select * from t1;
+----+------+------+----+
| c1 | c2   | c3   | c4 |
+----+------+------+----+
|  1 |    1 |    1 |  1 |
|  2 |    2 |    2 |  1 |
|  3 |    3 |    3 |  1 |
+----+------+------+----+
3 rows in set (6.58 sec)
As such, we can conclude that online DDL, even with ALGORITHM=INSTANT, is impacted by long-running queries: the ALTER was unable to get the metadata lock it needed, and only once the long-running query was killed was it able to complete.
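One way to limit the collateral damage of this pile-up, an approach the pt-osc command later in this post also uses via --set-vars, is to run the ALTER with a short lock_wait_timeout so the DDL gives up quickly instead of leaving every new query queued behind it. A minimal sketch; the five-second value is an illustrative assumption, not a recommendation:

-- Abort the ALTER quickly if the metadata lock can't be acquired,
-- instead of blocking all new queries against the table.
SET SESSION lock_wait_timeout = 5;  -- illustrative value; tune for your workload
ALTER TABLE t1 ADD COLUMN c4 TINYINT NOT NULL DEFAULT '1', ALGORITHM=INSTANT;
-- On timeout this fails with ERROR 1205 (lock wait timeout exceeded);
-- find and handle the long-running query, then retry the ALTER.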
Before moving on to the next example, I’ll undo the changes in my lab so we have a fresh start.
Terminal one
mysql> ALTER TABLE t1 DROP COLUMN c4;
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table t1 \G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `c1` int unsigned NOT NULL AUTO_INCREMENT,
  `c2` int DEFAULT NULL,
  `c3` int DEFAULT NULL,
  PRIMARY KEY (`c1`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
pt-online-schema-change
Now that we know a little more about metadata locks, I’ll simplify the lab output in this example and in the gh-ost example that follows in order to keep things concise.
For pt-online-schema-change, there are at least four places where metadata locks come into play: one for the creation of each of the three associated triggers, and one for the table swap.
Just as before, in terminal one I created the long-running query against the table in question. In terminal two, I ran the following pt-online-schema-change command.
Terminal two
[root@centos7-1 ~]# pt-online-schema-change --print --progress time,10 \
> --no-swap-tables --no-drop-new-table --no-drop-triggers \
> --alter="ADD COLUMN c4 TINYINT NOT NULL DEFAULT '1'" \
> --tries create_triggers:10000:1,drop_triggers:10000:1,copy_rows:10000:1 \
> --set-vars tx_isolation=\'READ-COMMITTED\',lock_wait_timeout=30 \
> h=localhost,D=test,t=t1 \
> --chunk-time=1 \
> --critical-load threads_running=99999 --max-load Threads_running=50 \
> --execute
...
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10000, 1
  create_triggers, 10000, 1
  drop_triggers, 10000, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `test`.`t1`...
Creating new table...
CREATE TABLE `test`.`_t1_new` (
  `c1` int unsigned NOT NULL AUTO_INCREMENT,
  `c2` int DEFAULT NULL,
  `c3` int DEFAULT NULL,
  PRIMARY KEY (`c1`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Created new table test._t1_new OK.
Altering new table...
ALTER TABLE `test`.`_t1_new` ADD COLUMN c4 TINYINT NOT NULL DEFAULT '1'
Altered `test`.`_t1_new` OK.
2022-10-18T07:47:57 Creating triggers... # HUNG
Terminal three
As you can see, this process is hung on the creation of the triggers that will keep the original table in sync with the new table that pt-osc has created. This is because creating a trigger on a table changes the table definition, which impacts the data dictionary and thus requires a metadata lock. We can see this in the processlist, as shown below:
mysql> show processlist;
+----+-----------------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+---------+-----------+---------------+
| Id | User            | Host      | db   | Command | Time | State                           | Info                                                                                                 | Time_ms | Rows_sent | Rows_examined |
+----+-----------------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+---------+-----------+---------------+
|  5 | event_scheduler | localhost | NULL | Daemon  |  577 | Waiting on empty queue          | NULL                                                                                                 |  576835 |         0 |             0 |
| 10 | root            | localhost | NULL | Query   |    0 | init                            | show processlist                                                                                     |       0 |         0 |             0 |
| 11 | root            | localhost | test | Sleep   |  469 |                                 | NULL                                                                                                 |  469051 |         3 |             3 |
| 16 | root            | localhost | test | Query   |   44 | User sleep                      | select * from t1 where 1=1 and sleep(1000)                                                           |   44129 |         0 |             0 |
| 17 | root            | localhost | test | Query   |    7 | Waiting for table metadata lock | CREATE TRIGGER `pt_osc_test_t1_del` AFTER DELETE ON `test`.`t1` FOR EACH ROW BEGIN DECLARE CONTINUE  |    6567 |         0 |             0 |
| 18 | root            | localhost | test | Sleep   |   38 |                                 | NULL                                                                                                 |   38267 |         1 |             1 |
+----+-----------------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+---------+-----------+---------------+
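When the processlist alone doesn’t make the culprit obvious, MySQL 8.0’s sys schema can pair each waiting session with the one blocking it. A minimal sketch, again assuming our test.t1 table:

-- Pair each session waiting on a metadata lock with its blocker,
-- including a ready-made KILL statement for the blocking connection.
SELECT object_schema, object_name,
       waiting_pid, waiting_query,
       blocking_pid, sql_kill_blocking_connection
FROM sys.schema_table_lock_waits
WHERE object_schema = 'test'
  AND object_name = 't1';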
Terminal two
Once I killed the offending long-running query, the pt-osc process continued:
2022-10-18T07:47:57 Creating triggers...
-----------------------------------------------------------
Event : DELETE
Name  : pt_osc_test_t1_del
SQL   : CREATE TRIGGER `pt_osc_test_t1_del` AFTER DELETE ON `test`.`t1` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1146 begin end; DELETE IGNORE FROM `test`.`_t1_new` WHERE `test`.`_t1_new`.`c1` <=> OLD.`c1`; END
Suffix: del
Time  : AFTER
-----------------------------------------------------------
-----------------------------------------------------------
Event : UPDATE
Name  : pt_osc_test_t1_upd
SQL   : CREATE TRIGGER `pt_osc_test_t1_upd` AFTER UPDATE ON `test`.`t1` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1146 begin end; DELETE IGNORE FROM `test`.`_t1_new` WHERE !(OLD.`c1` <=> NEW.`c1`) AND `test`.`_t1_new`.`c1` <=> OLD.`c1`; REPLACE INTO `test`.`_t1_new` (`c1`, `c2`, `c3`) VALUES (NEW.`c1`, NEW.`c2`, NEW.`c3`); END
Suffix: upd
Time  : AFTER
-----------------------------------------------------------
-----------------------------------------------------------
Event : INSERT
Name  : pt_osc_test_t1_ins
SQL   : CREATE TRIGGER `pt_osc_test_t1_ins` AFTER INSERT ON `test`.`t1` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1146 begin end; REPLACE INTO `test`.`_t1_new` (`c1`, `c2`, `c3`) VALUES (NEW.`c1`, NEW.`c2`, NEW.`c3`);END
Suffix: ins
Time  : AFTER
-----------------------------------------------------------
2022-10-18T07:49:21 Created triggers OK.
2022-10-18T07:49:21 Copying approximately 3 rows...
INSERT LOW_PRIORITY IGNORE INTO `test`.`_t1_new` (`c1`, `c2`, `c3`) SELECT `c1`, `c2`, `c3` FROM `test`.`t1` LOCK IN SHARE MODE /*pt-online-schema-change 7797 copy table*/
2022-10-18T07:49:21 Copied rows OK.
Not dropping old table because --no-drop-triggers was specified.
Not dropping triggers because --no-drop-triggers was specified.
To drop the triggers, execute:
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_t1_del`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_t1_upd`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_t1_ins`
Not dropping the new table `test`.`_t1_new` because --no-drop-new-table was specified.
To drop the new table, execute:
DROP TABLE IF EXISTS `test`.`_t1_new`;
Successfully altered `test`.`t1`.
Terminal two
But you’ll notice that our command specified the --no-swap-tables and --no-drop-new-table options, meaning we deferred the table swap. So now we have to swap the tables ourselves, and this step is impacted by metadata locks as well. In terminal one, I created the long-running query again, and in terminal two I attempted to do the table swap.
mysql> RENAME TABLE test.t1 to test._t1_old,
    -> test._t1_new to test.t1; # HUNG
Terminal three
The swap hung waiting for a metadata lock, which we can see in the processlist:
mysql> show processlist;
+----+-----------------+-----------+------+---------+------+---------------------------------+----------------------------------------------------------------+---------+-----------+---------------+
| Id | User            | Host      | db   | Command | Time | State                           | Info                                                           | Time_ms | Rows_sent | Rows_examined |
+----+-----------------+-----------+------+---------+------+---------------------------------+----------------------------------------------------------------+---------+-----------+---------------+
|  5 | event_scheduler | localhost | NULL | Daemon  |  762 | Waiting on empty queue          | NULL                                                           |  762369 |         0 |             0 |
| 10 | root            | localhost | NULL | Query   |    0 | init                            | show processlist                                               |       0 |         0 |             0 |
| 11 | root            | localhost | test | Sleep   |  654 |                                 | NULL                                                           |  654585 |         3 |             3 |
| 16 | root            | localhost | test | Query   |   80 | User sleep                      | select * from t1 where 1=1 and sleep(1000)                     |   79853 |         0 |             0 |
| 20 | root            | localhost | test | Query   |   41 | Waiting for table metadata lock | RENAME TABLE test.t1 to test._t1_old, test._t1_new to test.t1  |   41043 |         0 |             0 |
+----+-----------------+-----------+------+---------+------+---------------------------------+----------------------------------------------------------------+---------+-----------+---------------+
Terminal three
Once the long-running query was killed, the table swap was able to continue. The remaining cleanup was not interrupted by long-running queries because, at this point, all of the work happens on the swapped-out _old table.
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| _t1_old        |
| t1             |
+----------------+
2 rows in set (0.00 sec)

mysql> drop trigger pt_osc_test_t1_ins;
Query OK, 0 rows affected (0.00 sec)

mysql> drop trigger pt_osc_test_t1_upd;
Query OK, 0 rows affected (0.01 sec)

mysql> drop trigger pt_osc_test_t1_del;
Query OK, 0 rows affected (0.00 sec)

mysql> drop table _t1_old;
Query OK, 0 rows affected (0.01 sec)
So, as you can see, pt-osc is impacted by metadata locks as well; in fact, it needs three more of them than a direct ALTER.
gh-ost
You may think that using gh-ost will help you get around metadata locks, but this isn’t the case. As you will see below, if there is a long-running query, gh-ost will hang at the cut-over stage, when it swaps tables in much the same way pt-osc does.
[root@centos7-1 ~]# gh-ost \
> --max-load=Threads_running=25 \
> --critical-load=Threads_running=1000 \
> --chunk-size=1000 \
> --user="root" \
> --password="password" \
> --host=localhost \
> --allow-on-master \
> --database="test" \
> --table="t1" \
> --verbose \
> --alter="ADD COLUMN c4 TINYINT NOT NULL DEFAULT '1'" \
> --switch-to-rbr \
> --allow-master-master \
> --cut-over=default \
> --exact-rowcount \
> --concurrent-rowcount \
> --default-retries=120 \
> --execute
2022-10-18 07:56:38 INFO starting gh-ost 1.1.5
2022-10-18 07:56:38 INFO Migrating `test`.`t1`
...
# Migrating `test`.`t1`; Ghost table is `test`.`_t1_gho`
# Migrating centos7-1.localdomain:3306; inspecting centos7-1.localdomain:3306; executing on centos7-1.localdomain
# Migration started at Tue Oct 18 07:56:38 -0400 2022
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# Serving on unix socket: /tmp/gh-ost.test.t1.sock
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), ...
2022-10-18 07:56:39 INFO Creating magic cut-over table `test`.`_t1_del`
2022-10-18 07:56:39 INFO Magic cut-over table created
2022-10-18 07:56:39 INFO Locking `test`.`t1`, `test`.`_t1_del`
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 1s(copy); streamer: centos7-1-bin.000003:20639; Lag: 0.04s, HeartbeatLag: 0.05s, State: migrating; ETA: due
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 3s(total), 1s(copy); streamer: centos7-1-bin.000003:25267; Lag: 0.04s, HeartbeatLag: 0.05s, State: migrating; ETA: due
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 4s(total), 1s(copy); streamer: centos7-1-bin.000003:29893; Lag: 0.04s, HeartbeatLag: 0.05s, State: migrating; ETA: due
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 5s(total), 1s(copy); streamer: centos7-1-bin.000003:34518; Lag: 0.04s, HeartbeatLag: 0.05s, State: migrating; ETA: due
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 6s(total), 1s(copy); streamer: centos7-1-bin.000003:39149; Lag: 0.04s, HeartbeatLag: 0.05s, State: migrating; ETA: due
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 7s(total), 1s(copy); streamer: centos7-1-bin.000003:43780; Lag: 0.04s, HeartbeatLag: 0.05s, State: migrating; ETA: due
2022-10-18 07:56:45 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2022-10-18 07:56:45 INFO Looking for magic cut-over table
2022-10-18 07:56:45 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
And once this lock cleared, the process was able to complete.
2022-10-18 07:56:52 INFO Dropping magic cut-over table
2022-10-18 07:56:52 INFO Dropping table `test`.`_t1_del`
2022-10-18 07:56:52 INFO Table dropped
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 15s(total), 1s(copy); streamer: centos7-1-bin.000003:81553; Lag: 0.04s, HeartbeatLag: 0.05s, State: migrating; ETA: due
...
# Migrating `test`.`t1`; Ghost table is `test`.`_t1_gho`
# Migrating centos7-1.localdomain:3306; inspecting centos7-1.localdomain:3306; executing on centos7-1.localdomain
# Migration started at Tue Oct 18 07:56:38 -0400 2022
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# Serving on unix socket: /tmp/gh-ost.test.t1.sock
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 20s(total), 1s(copy); streamer: centos7-1-bin.000003:106278; Lag: 0.04s, HeartbeatLag: 0.06s, State: migrating; ETA: due
2022-10-18 07:56:58 INFO Setting RENAME timeout as 3 seconds
2022-10-18 07:56:58 INFO Session renaming tables is 41
2022-10-18 07:56:58 INFO Issuing and expecting this to block:
rename /* gh-ost */ table `test`.`t1` to `test`.`_t1_del`, `test`.`_t1_gho` to `test`.`t1`
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 21s(total), 1s(copy); streamer: centos7-1-bin.000003:111137; Lag: 0.04s, HeartbeatLag: 0.05s, State: migrating; ETA: due
2022-10-18 07:56:59 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2022-10-18 07:56:59 INFO Checking session lock: gh-ost.40.lock
2022-10-18 07:56:59 INFO Connection holding lock on original table still exists
2022-10-18 07:56:59 INFO Will now proceed to drop magic table and unlock tables
2022-10-18 07:56:59 INFO Dropping magic cut-over table
2022-10-18 07:56:59 INFO Releasing lock from `test`.`t1`, `test`.`_t1_del`
2022-10-18 07:56:59 INFO Tables unlocked
2022-10-18 07:56:59 INFO Tables renamed
2022-10-18 07:56:59 INFO Lock & rename duration: 5.975493692s. During this time, queries on `t1` were blocked
[2022/10/18 07:56:59] [info] binlogsyncer.go:164 syncer is closing...
2022-10-18 07:57:00 INFO Closed streamer connection. err=<nil>
2022-10-18 07:57:00 INFO Dropping table `test`.`_t1_ghc`
[2022/10/18 07:57:00] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
[2022/10/18 07:57:00] [info] binlogsyncer.go:179 syncer is closed
2022-10-18 07:57:00 INFO Table dropped
2022-10-18 07:57:00 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2022-10-18 07:57:00 INFO -- drop table `test`.`_t1_del`
2022-10-18 07:57:00 INFO Done migrating `test`.`t1`
2022-10-18 07:57:00 INFO Removing socket file: /tmp/gh-ost.test.t1.sock
2022-10-18 07:57:00 INFO Tearing down inspector
2022-10-18 07:57:00 INFO Tearing down applier
2022-10-18 07:57:00 INFO Tearing down streamer
2022-10-18 07:57:00 INFO Tearing down throttler
# Done
Conclusion
As you can see above, all schema change tools have to work with metadata locks, some more than others, and this is something we always need to consider when executing changes. Even if you opt to defer the table swap to a later date/time, you still need to be aware that metadata locks can interrupt that process. The key is to monitor for long-running queries at the time the change occurs and, where possible, kill them if they get in the way.
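If you’d rather script that check than eyeball SHOW PROCESSLIST during the change window, something like the sketch below is a starting point. The 60-second threshold and the LIKE filter are assumptions to adapt to your environment, and tools such as Percona Toolkit’s pt-kill can automate the same policing:

-- Find statements against the target table running longer than 60 seconds.
-- Threshold and table filter are illustrative; tune them for your workload.
SELECT id, user, time, info
FROM information_schema.processlist
WHERE command = 'Query'
  AND time > 60
  AND info LIKE '%t1%';

-- Then, only if it is safe for your application:
-- KILL <id>;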
This brings me to the final point: selecting a tool for a change. Personally, I do not believe there is such a thing as a single best tool for schema changes. You pick the tool that will be the lightest weight and the least disruptive for the specific change you want to make.