Jan
19
2017
--

Open Source Databases on Big Machines: Does READ COMMITTED Scale on 144 Cores?


In the second post in my series on open source databases on big machines, we’ll look at whether READ COMMITTED scales with multiple cores.

The default transaction isolation level for InnoDB is REPEATABLE READ. A more permissive level is READ COMMITTED, which is known to work well. While the REPEATABLE READ level maintains the transaction history up to the start of the transaction, READ COMMITTED maintains the transaction history up to the start of the current statement. Peter Zaitsev described the differences between how these modes are handled in this blog post. Both can theoretically cause performance slowdowns, but READ COMMITTED is usually seen as the fast one – at least on a typical MySQL machine (owned by Percona):

[chart: READ COMMITTED results on the Percona machine]

The default transaction isolation mode for PostgreSQL is also READ COMMITTED. Originally I wanted to use this mode for the MySQL tests as well. But when I tested on a machine with 144 cores, I found that after 36 threads REPEATABLE READ continued to scale while READ COMMITTED slowed down. It then got stuck at about 3x slower results for the standard OLTP RW test.

[chart: READ COMMITTED results on the 144-core machine]

Dimitri Kravtchuk wrote about performance issues with READ COMMITTED in 2015, but at that time he tested with 40 cores. My tests show that there is a huge difference once you go beyond 40 cores.

I tested this originally with Percona Server 5.7.15 and recently re-tested with Oracle’s MySQL versions 5.6.35 and 5.7.17. I confirmed that the bug exists in these versions as well, and reported it. I hope the Oracle MySQL Team fixes it. The good news is that while 5.6 stopped scaling after 16 threads, 5.7 improves this to 36 threads.

Results for 5.6.35:

[chart: READ COMMITTED results for MySQL 5.6.35]

Results for 5.7.17:

Machine details:

PostgreSQL Professional and Freematiq machine (tests for MySQL 5.6.35, Percona Server 5.7.15 and MySQL 5.7.17 servers):

Processors: physical = 4, cores = 72, virtual = 144, hyperthreading = yes
Memory: 3.0T
Disk speed: about 3K IOPS
OS: CentOS 7.1.1503
File system: XFS

Percona machine (test for the Percona Server 5.7.15 server):

Processors: physical = 2, cores = 12, virtual = 24, hyperthreading = yes
Memory: 251.9G
Disk speed: about 33K IOPS
OS: Ubuntu 14.04.5 LTS
File system: EXT4

Test: SysBench OLTP RW test, converted to use prepared statements, as described in this post.

MySQL Options: described in this post.
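For reference, here is a minimal sketch of how the server can be switched between the two isolation levels for a test like this (an illustration only; the exact options used are described in the linked posts):

-- per session, before starting the benchmark connections
SET SESSION tx_isolation = 'READ-COMMITTED';
-- or server-wide for all new connections
SET GLOBAL tx_isolation = 'READ-COMMITTED';
-- the equivalent my.cnf setting is transaction-isolation = READ-COMMITTED
SELECT @@global.tx_isolation, @@session.tx_isolation;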


Aug
01
2016
--

Introduction into storage engine troubleshooting: Q & A

In this blog post, I will provide answers to the Q & A for the “Introduction into storage engine troubleshooting” webinar.

First, I want to thank everybody for attending the July 14 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: At which isolation level do pt-online-schema-change and pt-archiver copy data from a table?

A: Neither tool changes the server’s default transaction isolation level, so the copy runs under whatever level the server is configured with: either the default REPEATABLE READ or the level set in my.cnf.

Q: Can I create an index to optimize a query which has GROUP BY on column A and ORDER BY on column B, where the two columns are from different tables and column A belongs to the first table in a two-table join?

A: Do you mean a query like SELECT ... FROM a, b GROUP BY a.A ORDER BY b.B? Yes, this is possible:

mysql> explain select A, B, count(*) from a join b on(a.A=b.id) WHERE b.B < 4 GROUP BY a.A, b.B ORDER BY b.B ASC;
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-----------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref       | rows | Extra                                                     |
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | b     | range | PRIMARY,B     | B    | 5       | NULL      |   15 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | a     | ref   | A             | A    | 5       | test.b.id |    1 | Using index                                               |
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-----------------------------------------------------------+
2 rows in set (0.00 sec)
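The plan above assumes indexes on the join and filter columns. A hypothetical pair of table definitions that would produce such a plan (not the exact schema used in the answer) could look like this:

CREATE TABLE b (
  id int NOT NULL PRIMARY KEY,
  B int,
  KEY (B)
) ENGINE=InnoDB;

CREATE TABLE a (
  id int NOT NULL PRIMARY KEY,
  A int,
  KEY (A)
) ENGINE=InnoDB;

The index on b.B is used for the b.B < 4 range scan, and the index on a.A is used for the join lookup (ref: test.b.id).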

Q: Where can I find recommendations on what kind of engine to use for different application types or use cases?

A: Storage engines are always being actively developed, therefore I suggest that you don’t search for generic recommendations. These can be outdated just a few weeks after they are written. Study engines instead. For example, just a few years ago MyISAM was the only engine (among those officially supported) that could work with FULLTEXT indexes and SPATIAL columns. Now InnoDB supports both: FULLTEXT indexes since version 5.6 and GIS features in 5.7. Today I can recommend InnoDB as a general-purpose engine for all installations, and TokuDB for write-heavy workloads when you cannot use high-speed disks.

Alternative storage engines can help to realize specific business needs. For example, CONNECT brings data to your server from many sources, SphinxSE talks to the Sphinx daemon, etc.

Other alternative storage engines increase the speed of certain workloads. Memory, for example, can be a good fit for temporary tables.

Q: Can you please explain how we find the full text of the query when we query the view statements_with_full_table_scans?

A: Do you mean the view in the sys schema? Sys schema views take their information from the summary_* and digest tables in Performance Schema, therefore they do not contain full query texts (only digests). The full text of a query can be found in the events_statements_* tables in Performance Schema. Note that even the events_statements_history_long table can be overwritten very quickly, so you may want to save data from it periodically.
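As a hedged example (assuming statement instrumentation and the events_statements_history_long consumer are enabled), the full text behind a digest reported by a sys view can be looked up like this:

SELECT SQL_TEXT, DIGEST_TEXT
FROM performance_schema.events_statements_history_long
WHERE DIGEST = '<digest taken from the sys view>'
LIMIT 10;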

Q: Hi, is TokuDB supported for the new document protocol?

A: As Alex Rubin showed in his detailed blog post, the new document protocol just converts NoSQL queries into SQL, and is thus not limited to any particular storage engine. However, to use documents and collections, a storage engine must support generated columns (which TokuDB currently does not). So X Protocol support for TokuDB is limited to relational table access.
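For illustration, a collection created through the new protocol is backed by an InnoDB table roughly like the following (a sketch; the exact definition depends on the server version), which is why the storage engine must support STORED generated columns:

CREATE TABLE my_collection (
  doc JSON,
  _id VARCHAR(32) GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) STORED NOT NULL,
  PRIMARY KEY (_id)
) ENGINE=InnoDB;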

Q: Please comment on “read committed” versus “repeatable read.”
Q: Repeatable read holds the cursor on the result set for the client versus read committed where the cursor is updated after a transaction.

A: READ COMMITTED and REPEATABLE READ are transaction isolation levels, whose details are explained here.
I would not correlate the locks set on table rows in different transaction isolation modes with the result set. A transaction with isolation level REPEATABLE READ instead creates a snapshot of the rows that are accessed by the transaction. Let’s consider a table:

mysql> create table ti(id int not null primary key, f1 int) engine=innodb;
Query OK, 0 rows affected (0.56 sec)
mysql> insert into ti values(1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9);
Query OK, 9 rows affected (0.03 sec)
Records: 9  Duplicates: 0  Warnings: 0

Then start the transaction and select a few rows from this table:

mysql1> begin;
Query OK, 0 rows affected (0.00 sec)
mysql1> select * from ti where id < 5;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.04 sec)

Now let’s update another set of rows in another transaction:

mysql2> update ti set f1 = id*2 where id > 5;
Query OK, 4 rows affected (0.06 sec)
Rows matched: 4  Changed: 4  Warnings: 0
mysql2> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
|  6 |   12 |
|  7 |   14 |
|  8 |   16 |
|  9 |   18 |
+----+------+
9 rows in set (0.00 sec)

You see that the first four rows – which we accessed in the first transaction – were not modified, while the last four were. If InnoDB only saved the cursor (as someone suggested above), we would expect to see the same result when we run the SELECT * ... query in our old transaction, but it actually shows the whole table content as it was before the modification:

mysql1> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
|  6 |    6 |
|  7 |    7 |
|  8 |    8 |
|  9 |    9 |
+----+------+
9 rows in set (0.00 sec)

So “snapshot” is a better word than “cursor” for the result set. In the case of READ COMMITTED, the first transaction would see the modified rows:

mysql1> drop table ti;
Query OK, 0 rows affected (0.11 sec)
mysql1> create table ti(id int not null primary key, f1 int) engine=innodb;
Query OK, 0 rows affected (0.38 sec)
mysql1> insert into ti values(1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9);
Query OK, 9 rows affected (0.04 sec)
Records: 9  Duplicates: 0  Warnings: 0
mysql1> set transaction isolation level read committed;
Query OK, 0 rows affected (0.00 sec)
mysql1> begin;
Query OK, 0 rows affected (0.00 sec)
mysql1> select * from ti where id < 5;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.00 sec)

Let’s update all rows in the table this time:

mysql2> update ti set f1 = id*2;
Query OK, 9 rows affected (0.04 sec)
Rows matched: 9  Changed: 9  Warnings: 0

Now the first transaction sees not only the modified rows with id >= 5 (which were not in the initial result set), but also the modified rows with id < 5 (which were in the initial result set):

mysql1> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    2 |
|  2 |    4 |
|  3 |    6 |
|  4 |    8 |
|  5 |   10 |
|  6 |   12 |
|  7 |   14 |
|  8 |   16 |
|  9 |   18 |
+----+------+
9 rows in set (0.00 sec)

Jun
17
2016
--

InnoDB locks and transaction isolation level

What is the difference between InnoDB locks and transaction isolation levels? We’ll discuss it in this post.

Recently I received a question from a user about one of my earlier blog posts. Since it wasn’t sent as a comment, I will answer it here. The question:

> I am reading your article:
> https://www.percona.com/resources/technical-presentations/troubleshooting-locking-issues-percona-mysql-webinar

> Full table scan locks whole table.

> Some bad select (read) query can do full table scan on InnoDB, does it lock whole table please?

> My understanding was that SELECT (read) blocks another DML only in MyISAM.

To answer this question, we need to understand two different concepts: locking and transaction isolation level.

In REPEATABLE READ, the default transaction isolation mode for InnoDB, and in the lower transaction isolation levels, SELECT doesn’t block any DML unless it uses SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE. On slide 20 (31) of my presentation, I use the SELECT ... FOR UPDATE statement (this is why a lock is seen).
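For instance, a plain (non-locking) SELECT under REPEATABLE READ does not prevent another session from modifying the same rows. A sketch using the same employees/titles tables as in the example below:

mysql1> set transaction isolation level repeatable read;
mysql1> begin;
mysql1> select * from titles where emp_no=15504;
-- non-locking consistent read: no row locks are taken
mysql2> update titles set title='Engineer' where emp_no=15504;
-- proceeds immediately; it is not blocked by the SELECT in session 1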

However, if the transaction isolation mode is SERIALIZABLE, then SELECT can block updates. You can see this in the example below:

mysql1> set transaction isolation level serializable;
Query OK, 0 rows affected (0,00 sec)
mysql1> begin;
Query OK, 0 rows affected (0,00 sec)
mysql1> select * from employees join titles using(emp_no);
c936e6fc4c6cbaf77679ba5013339dff  -
443308 rows in set (29,69 sec)
mysql2> begin;
Query OK, 0 rows affected (0,00 sec)
mysql2> update titles set title='Engineer' where emp_no=15504;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Regarding the different levels of locks set by InnoDB, I recommend that you follow the links on slide 37 (67) of the presentation, especially “InnoDB Locking Explained With Stick Figures” by Bill Karwin. InnoDB table locks are also explained very well in this blog post by Peter Zaitsev.

Feb
01
2013
--

Implications of Metadata Locking Changes in MySQL 5.5

While most of the talk recently has been around the new changes in MySQL 5.6 (and that is understandable), I have lately had some very interesting cases to deal with involving the Metadata Locking changes that were introduced in MySQL 5.5.3. It appears that the implications of Metadata Locking have not been covered well, and since there is still a large number of MySQL 5.0 and 5.1 installations that will upgrade or are in the process of upgrading to MySQL 5.5, I thought it necessary to discuss what these implications exactly are.

To read what Metadata Locking exactly is please read this section here in the MySQL manual.

Let’s start off by having a look at the Metadata Locking behavior prior to MySQL 5.5.3.

Metadata Locking behavior prior to MySQL 5.5.3

Prior to MySQL 5.5.3, a statement that opened a table only held metadata locks until the end of the statement, not the end of the transaction. This meant that the transaction was not really isolated, because the same query could return different results if executed twice with a DDL statement executed between the invocations. Let me give you an example via a simple test case where I add a new column to the table while a transaction in REPEATABLE-READ isolation mode is active.

session1 > select @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+
1 row in set (0.00 sec)

session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)  

session1 > select * from test where id=1;
+----+------+
| id | x    |
+----+------+
|  1 | foo  |
+----+------+
1 row in set (0.00 sec)

session2 > ALTER TABLE test add column c char(32) default 'dummy_text';
Query OK, 2 rows affected (0.57 sec)
Records: 2  Duplicates: 0  Warnings: 0

session1 > select * from test where id=1;
Empty set (0.00 sec)

session1 > rollback;
Query OK, 0 rows affected (0.00 sec)  

session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)  

session1 > select * from test where id=1;
+----+------+------------+
| id | x    | c          |
+----+------+------------+
|  1 | foo  | dummy_text |
+----+------+------------+
1 row in set (0.00 sec)

And you can see how isolation is broken: the SELECT was not repeatable, although the REPEATABLE-READ transaction isolation level was used. This behavior of versions prior to 5.5 also means that queries could be written to the binary log in a different order, breaking locking semantics and contrary to serialization concepts. For example, take a look at the following excerpt from the binary log for a case where an UPDATE transaction is mixed with an ALTER:

# at 536
#130201 11:21:03 server id 1  end_log_pos 658   Query   thread_id=8     exec_time=0     error_code=0
SET TIMESTAMP=1359714063/*!*/;
ALTER TABLE test add column id_2 int(11) default 0 after id
/*!*/;
# at 658
#130201 11:21:39 server id 1  end_log_pos 726   Query   thread_id=7     exec_time=0     error_code=0
SET TIMESTAMP=1359714099/*!*/;
BEGIN
/*!*/;
# at 726
# at 773
#130201 11:21:35 server id 1  end_log_pos 773   Table_map: `test`.`test` mapped to number 17
#130201 11:21:35 server id 1  end_log_pos 829   Update_rows: table id 17 flags: STMT_END_F

BINLOG '
L5cLURMBAAAALwAAAAUDAAAAABEAAAAAAAEABHRlc3QABHRlc3QAAwMD/gL+CQY=
L5cLURgBAAAAOAAAAD0DAAAAABEAAAAAAAEAA///+AIAAAAAAAAAA2JhcvgCAAAAAAAAAANob3A=
'/*!*/;
### UPDATE test.test
### WHERE
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2=0 /* INT meta=0 nullable=1 is_null=0 */
###   @3='bar' /* STRING(9) meta=65033 nullable=1 is_null=0 */
### SET
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2=0 /* INT meta=0 nullable=1 is_null=0 */
###   @3='hop' /* STRING(9) meta=65033 nullable=1 is_null=0 */
# at 829
#130201 11:21:39 server id 1  end_log_pos 856   Xid = 85
COMMIT/*!*/;

Note how ALTER is logged before the UPDATE, because ALTER did not block waiting for the transaction to commit.

For the reasons described above, the implementation of Metadata Locking was changed starting with MySQL 5.5.3. Let’s see how this works now.

Metadata Locking behavior starting MySQL 5.5.3

Starting with 5.5.3, DDL statements that modify table metadata are executed in an isolated fashion consistent with transactional behavior. This means that an open transaction holds metadata locks on every table it has accessed for as long as the transaction remains open. Consequently, a DDL operation cannot commence until all transactions that accessed that table have been closed. Let’s see this in effect via a simple test case:

session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)

session1 > select * from test order by id;
+----+------+
| id | x    |
+----+------+
|  1 | foo  |
|  2 | bar  |
+----+------+
2 rows in set (0.00 sec)

session2 > ALTER TABLE test add column c char(32) default 'dummy_text';

session3 > show processlist;
+----+----------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------+
| Id | User     | Host      | db   | Command | Time | State                           | Info                                                        |
+----+----------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------+
|  1 | msandbox | localhost | test | Sleep   |  140 |                                 | NULL                                                        |
|  2 | msandbox | localhost | test | Query   |    3 | Waiting for table metadata lock | ALTER TABLE test add column c char(32) default 'dummy_text' |
|  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist                                            |
+----+----------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------+
3 rows in set (0.00 sec)

You can see how the ALTER blocks because the transaction in session1 is still open; once the transaction in session1 is closed, the ALTER proceeds successfully:

session1 > rollback;
Query OK, 0 rows affected (0.00 sec)

session2 > ALTER TABLE test add column c char(32) default 'dummy_text';
Query OK, 2 rows affected (46.77 sec)
Records: 2  Duplicates: 0  Warnings: 0

Let’s see where the ALTER spent most of its time:

session2 > show profiles;
+----------+-------------+-------------------------------------------------------------+
| Query_ID | Duration    | Query                                                       |
+----------+-------------+-------------------------------------------------------------+
|        1 | 46.78110075 | ALTER TABLE test add column c char(32) default 'dummy_text' |
+----------+-------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

session2 > show profile for query 1;
+------------------------------+-----------+
| Status                       | Duration  |
+------------------------------+-----------+
| starting                     |  0.000060 |
| checking permissions         |  0.000003 |
| checking permissions         |  0.000003 |
| init                         |  0.000005 |
| Opening tables               |  0.000045 |
| System lock                  |  0.000006 |
| setup                        |  0.000016 |
| creating table               |  0.168283 |
| After create                 |  0.000061 |
| copy to tmp table            |  0.165808 |
| rename result table          | 46.446738 |
| end                          |  0.000035 |
| Waiting for query cache lock |  0.000003 |
| end                          |  0.000006 |
| query end                    |  0.000003 |
| closing tables               |  0.000008 |
| freeing items                |  0.000016 |
| cleaning up                  |  0.000004 |
+------------------------------+-----------+
18 rows in set (0.00 sec)

So the ALTER waited on the metadata locks just after the table with the new structure had been created and populated with data, but before the old table was swapped with the new one. Note that ALTER is a multi-step process: the old table is locked in shared mode, and then roughly the following steps are taken: a new table with the new structure is created, then an INSERT INTO new_table SELECT * FROM old_table is done, then the old table is renamed to tmp_table and the new table to old_table, and finally tmp_table is dropped.
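Conceptually (a sketch, not the server’s literal internal code), the table-copy ALTER behaves like the following sequence, and it is the final RENAME step that needs the exclusive metadata lock:

CREATE TABLE new_table LIKE old_table;
ALTER TABLE new_table ADD COLUMN c CHAR(32) DEFAULT 'dummy_text';   -- new structure
INSERT INTO new_table SELECT * FROM old_table;                      -- row copy, no MDL conflict yet
RENAME TABLE old_table TO tmp_table, new_table TO old_table;        -- waits for metadata locks
DROP TABLE tmp_table;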
Let’s see another example, this time trying a RENAME:

session2 > RENAME TABLE test to test_2;

session3 > show processlist;
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
| Id | User     | Host      | db   | Command | Time | State                           | Info                        |
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
|  1 | msandbox | localhost | test | Sleep   |   49 |                                 | NULL                        |
|  2 | msandbox | localhost | test | Query   |   35 | Waiting for table metadata lock | RENAME TABLE test to test_2 |
|  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist            |
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
3 rows in set (0.00 sec)

And you can see that the RENAME is also blocked, because a transaction that accessed the table “test” is still open.

So we have an interesting conclusion here: the ALTER waits only at the last stage, when it is making changes to the table metadata. An ALTER on a big table can keep executing without any hindrance, copying rows from the table with the old structure to the table with the new structure, and will only wait at the last step, when it is about to change the table metadata.

Let’s see another interesting side effect of metadata locking.

When can ALTER render the table inaccessible?

Now there is another interesting side effect: when the ALTER reaches the stage where it needs to wait for metadata locks, it blocks every type of query against the table. We know that writes would be blocked anyhow for the entire duration of the ALTER, but reads are blocked as well while the ALTER is waiting for metadata locks. Let’s see this in action via another test case:

session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)

session1 > select * from test_2 order by id;
+----+------+------------+
| id | x    | c          |
+----+------+------------+
|  1 | foo  | dummy_text |
|  2 | bar  | dummy_text |
+----+------+------------+
2 rows in set (0.00 sec)

session6 > ALTER TABLE test_2 DROP COLUMN c;

session7 > select * from test_2 order by id;
session8 > select * from test_2 order by id;
session9 > select * from test_2 order by id;
session10 > select * from test_2 order by id;

session3 > show processlist;
+----+----------+-----------+------+---------+------+---------------------------------+----------------------------------+
| Id | User     | Host      | db   | Command | Time | State                           | Info                             |
+----+----------+-----------+------+---------+------+---------------------------------+----------------------------------+
|  1 | msandbox | localhost | test | Sleep   |  403 |                                 | NULL                             |
|  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist                 |
|  6 | msandbox | localhost | test | Query   |  229 | Waiting for table metadata lock | ALTER TABLE test_2 DROP COLUMN c |
|  7 | msandbox | localhost | test | Query   |  195 | Waiting for table metadata lock | select * from test_2 order by id |
|  8 | msandbox | localhost | test | Query   |  180 | Waiting for table metadata lock | select * from test_2 order by id |
|  9 | msandbox | localhost | test | Query   |  169 | Waiting for table metadata lock | select * from test_2 order by id |
| 10 | msandbox | localhost | test | Query   |   55 | Waiting for table metadata lock | select * from test_2 order by id |
+----+----------+-----------+------+---------+------+---------------------------------+----------------------------------+
7 rows in set (0.00 sec)

And you can see that the table is blocked for any kind of operation. Let’s see the profiling information for one of the queries that was blocked to see where the query spent most of its time:

session10 > show profile for query 1;
+----------------------+------------+
| Status               | Duration   |
+----------------------+------------+
| starting             |   0.000058 |
| checking permissions |   0.000006 |
| Opening tables       | 213.028481 |
| System lock          |   0.000009 |
| init                 |   0.000014 |
| optimizing           |   0.000002 |
| statistics           |   0.000005 |
| preparing            |   0.000006 |
| executing            |   0.000001 |
| Sorting result       |   0.000002 |
| Sending data         |   0.000040 |
| end                  |   0.000003 |
| query end            |   0.000002 |
| closing tables       |   0.000003 |
| freeing items        |   0.000007 |
| logging slow query   |   0.000002 |
| cleaning up          |   0.000002 |
+----------------------+------------+
17 rows in set (0.00 sec)

And you can see how the query spent nearly all its time waiting in the “Opening tables” state. Now this behavior with respect to ALTER making the table inaccessible in some cases is not really documented and as such I have reported a bug: http://bugs.mysql.com/bug.php?id=67647

Metadata locking behaves differently for queries that are serviced from the Query Cache, let’s see what happens in that case.

Metadata Locking and Query Cache

How does metadata locking behave with the query cache? That is an important question. If the Query Cache is enabled and the SELECT can be serviced from it, then the SELECT will not block on the ALTER even though the ALTER is waiting for metadata locks. Why? Because in such a case no table open operation has to be performed. Let’s see this scenario via a test case:

session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)

session1 > select * from test_2 order by id;
+----+------+
| id | x    |
+----+------+
|  1 | foo  |
|  2 | bar  |
+----+------+
2 rows in set (0.00 sec)

session6 > RENAME TABLE test_2 to test;

session10 > select * from test_2 order by id;
+----+------+
| id | x    |
+----+------+
|  1 | foo  |
|  2 | bar  |
+----+------+
2 rows in set (0.00 sec)

session3 > show processlist;
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
| Id | User     | Host      | db   | Command | Time | State                           | Info                        |
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
|  1 | msandbox | localhost | test | Sleep   |   22 |                                 | NULL                        |
|  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist            |
|  6 | msandbox | localhost | test | Query   |    3 | Waiting for table metadata lock | RENAME TABLE test_2 to test |
| 10 | msandbox | localhost | test | Sleep   |   37 |                                 | NULL                        |
+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+
4 rows in set (0.00 sec)

The query proceeds without being blocked on anything while the RENAME is still waiting for metadata locks. Let’s see the profiling information for this query:

session10 > show profile for query 11;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000018 |
| Waiting for query cache lock   | 0.000003 |
| checking query cache for query | 0.000007 |
| checking privileges on cached  | 0.000003 |
| checking permissions           | 0.000005 |
| sending cached result to clien | 0.000011 |
| logging slow query             | 0.000002 |
| cleaning up                    | 0.000002 |
+--------------------------------+----------+
8 rows in set (0.00 sec)

You can see that no table open operation was performed and hence no wait.

Does the fact that the table has already been opened and its table object is in the Table Cache change anything with respect to metadata locks?

Metadata Locking and Table Cache

Even if a connection accesses a table that is already in the Table Cache, any query to a table with a pending DDL operation will block. Why? Because MySQL sees that the old entries in the Table Cache have to be invalidated, and any query that accesses the table has to reopen the modified table, creating new entries in the Table Cache. Let’s see this phenomenon in action:

session6 > ALTER TABLE test add column c char(32) default 'dummy_text';
Query OK, 2 rows affected (59.80 sec)

session10 > show status like 'Open%tables';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Open_tables   | 30    |
| Opened_tables | 0     |
+---------------+-------+
2 rows in set (0.00 sec)

session10 > select * from test order by id;
+----+------+------------+
| id | x    | c          |
+----+------+------------+
|  1 | foo  | dummy_text |
|  2 | bar  | dummy_text |
+----+------+------------+
2 rows in set (53.78 sec)

session10 > show status like 'Open%tables';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Open_tables   | 30    |
| Opened_tables | 1     |
+---------------+-------+
2 rows in set (0.00 sec)

session10 > show profile for query 18;
+----------------------+-----------+
| Status               | Duration  |
+----------------------+-----------+
| starting             |  0.000059 |
| checking permissions |  0.000010 |
| Opening tables       | 53.786685 |
| System lock          |  0.000009 |
| init                 |  0.000012 |
| optimizing           |  0.000003 |
| statistics           |  0.000007 |
| preparing            |  0.000006 |
| executing            |  0.000001 |
| Sorting result       |  0.000004 |
| Sending data         |  0.000033 |
| end                  |  0.000003 |
| query end            |  0.000002 |
| closing tables       |  0.000004 |
| freeing items        |  0.000009 |
| logging slow query   |  0.000002 |
| cleaning up          |  0.000002 |
+----------------------+-----------+
17 rows in set (0.00 sec)

As you can see, the SELECT query still blocks, and the status counter Opened_tables is incremented once the query finishes.

So much for the explanation; now let’s take a look at the consequences.

Consequences

The consequence of these changes in metadata locking is that if you have some really hot tables (for example, in web applications it is typical to see a “sessions” table that is accessed on every request), then care should be taken when you have to ALTER the table. Otherwise it can easily cause a stall, as many threads pile up waiting for the table metadata lock, bringing down the MySQL server or depleting all available connections.
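One defensive pattern (a sketch, with a hypothetical hot table and column) is to make the ALTER fail fast instead of letting every other connection pile up behind the metadata lock:

SET SESSION lock_wait_timeout = 5;
ALTER TABLE sessions ADD COLUMN last_seen TIMESTAMP NULL;
-- if a long transaction holds the metadata lock, the ALTER fails with a lock wait
-- timeout after 5 seconds and can be retried later, instead of stalling the table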

There are some other interesting consequences as well for applications that were using MySQL versions prior to 5.5:

  • I remember a customer case with a reporting slave that runs a long transaction every day; these transactions tend to run for hours. Every day one of the tables was renamed and swapped, and that table was the one read by the long-running transaction. As the slave tried to execute the RENAME query it would simply block, waiting for the long-running transaction to finish. This caused the slave to lag for hours until the transaction completed: the slave is single-threaded, so it cannot apply any other event in the meantime. This was never an issue when the application was using MySQL versions < 5.5 as the datastore.
  • There was another interesting case, this time with how ActiveMQ uses MySQL in HA mode. In HA mode there are two ActiveMQ servers, and both try to do something similar to the following sequence of events:
    session1 > CREATE TABLE t1(i int(11) not null auto_increment primary key) ENGINE=InnoDB;
    Query OK, 0 rows affected (0.09 sec)
    
    session1 > insert into t1 values(null);
    Query OK, 1 row affected (0.21 sec)
    
    session1 > start transaction;
    Query OK, 0 rows affected (0.00 sec)
    
    session1 > select * from t1 where i=1 for update;
    +---+
    | i |
    +---+
    | 1 |
    +---+
    1 row in set (0.00 sec)
    
    session2 > CREATE TABLE t1(i int(11) not null auto_increment primary key) ENGINE=InnoDB;
    

    When using MySQL 5.1 the second CREATE would fail immediately with the error “ERROR 1050 (42S01): Table ‘t1′ already exists”, but because of how metadata locking works in 5.5 this is no longer the case: the second CREATE will simply block, with unintended consequences. A workaround here is to set the lock_wait_timeout variable to a very low value and then execute the CREATE TABLE; this makes sure that the CREATE fails immediately (although for a different reason):

    session2 > set session lock_wait_timeout=1;CREATE TABLE t1(i int(11) not null auto_increment primary key) ENGINE=InnoDB;
    

    However, I feel that the CREATE TABLE should fail in such a case, when the table already exists and there is no other DDL (like a DROP TABLE) waiting to run on the same table, and as such I have reported the bug: http://bugs.mysql.com/bug.php?id=67873


Aug
28
2012
--

Differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels

As an instructor with Percona I’m sometimes asked about the differences between the READ COMMITTED and REPEATABLE READ transaction isolation levels.  There are a few differences between READ-COMMITTED and REPEATABLE-READ, and they are all related to locking.

Extra locking (not gap locking)
It is important to remember that InnoDB actually locks index entries, not rows. During the execution of a statement InnoDB must lock every entry in the index that it traverses to find the rows it is modifying. It must do this to prevent deadlocks and maintain the isolation level.

If you run an UPDATE that is not well indexed you will lock many rows:

update employees set store_id = 0 where store_id = 1;
---TRANSACTION 1EAB04, ACTIVE 7 sec
633 lock struct(s), heap size 96696, 218786 row lock(s), undo log entries 1
MySQL thread id 4, OS thread handle 0x7f8dfc35d700, query id 47 localhost root
show engine innodb status

In the employees table, the column `store_id` is not indexed. Notice that the UPDATE has completed running (we are now running SHOW ENGINE …) but we are holding 218786 row locks and only one undo entry. This means that only one row was changed, but we are still holding extra locks.  The heap size represents the amount of memory that has been allocated for locks.

In REPEATABLE READ every lock acquired during a transaction is held for the duration of the transaction.

In READ COMMITTED the locks that did not match the scan are released after the STATEMENT completes.

Here is the UPDATE statement repeated under READ COMMITTED:

---TRANSACTION 1EAB06, ACTIVE 11 sec
631 lock struct(s), heap size 96696, 1 row lock(s), undo log entries 1
MySQL thread id 4, OS thread handle 0x7f8dfc35d700, query id 62 localhost root
show engine innodb status

You’ll notice that the heap size is the same, but we are now holding only one lock. In all transaction isolation levels InnoDB creates locks over every index entry scanned. The difference between the levels is that once the statement completes in READ COMMITTED mode, the locks are released for the entries that did not match the scan. Note that InnoDB does not immediately release the heap memory after releasing the locks, so the heap size is the same as that in REPEATABLE READ, but the number of locks held is lower (only one).

This means that in READ COMMITTED other transactions are free to update rows that they would not have been able to update (in REPEATABLE READ) once the UPDATE statement completes.
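In either isolation level, the extra locking in this example can be avoided by giving the UPDATE an index to work with. A sketch (idx_store_id is a hypothetical index name):

ALTER TABLE employees ADD INDEX idx_store_id (store_id);
UPDATE employees SET store_id = 0 WHERE store_id = 1;
-- with the index, InnoDB only scans (and locks) the matching index entries
-- instead of every row in the table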

 

Consistent read views
In REPEATABLE READ, a ‘read view’ ( trx_no does not see trx_id >= ABC, sees < ABB ) is created at the start of the transaction, and this read view (consistent snapshot in Oracle terms) is held open for the duration of the transaction. If you execute a SELECT statement at 5AM, and come back in an open transaction at 5PM, when you run the same SELECT you will see the exact same result set that you saw at 5AM. This is called MVCC (multi-version concurrency control) and it is accomplished using row versioning and UNDO information.
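A minimal sketch of that 5AM/5PM behavior (some_table is the same placeholder table name used below):

-- session 1
START TRANSACTION WITH CONSISTENT SNAPSHOT;   -- the read view is created here
SELECT COUNT(*) FROM some_table;              -- the "5AM" result
-- other sessions insert, update and commit during the day ...
SELECT COUNT(*) FROM some_table;              -- the "5PM" result: identical, same snapshot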

In REPEATABLE READ InnoDB also creates gap locks for range scans.

select * from some_table where id > 100 FOR UPDATE;

The above locking read will create a gap lock that prevents any rows with id > 100 from being inserted into the table until the transaction rolls back or commits.

In the same transaction, if the SELECT ... FOR UPDATE is run at 5AM and an UPDATE is run at 5PM (“UPDATE some_table ... WHERE id > 100”), then the UPDATE will change the same rows that the SELECT ... FOR UPDATE locked at 5AM. There is no possibility of changing additional rows, because the gap after 100 was previously locked.
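To make the gap lock concrete, here is a sketch of what another session experiences while the transaction above still holds its locks:

-- session 2
INSERT INTO some_table (id) VALUES (200);
-- blocks (and eventually times out) until the transaction holding the
-- SELECT ... FOR UPDATE commits or rolls back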

 

Non-repeatable reads (read committed)
In READ COMMITTED, a read view is created at the start of each statement.   This means that a SELECT made at 5AM may show different results from the same SELECT run at 5PM, even in the same transaction.   This is because in READ COMMITTED the read view for the transaction lasts only as long as each statement execution.   As a result, consecutive executions of the same statement may show different results.

When the repeated statement returns rows that were not there before, this is known as the ‘phantom row’ problem.

In addition, in READ COMMITTED gap locks are never created. Since there is no gap lock, the example SELECT ... FOR UPDATE above will not prevent insertions of new rows into the table by other transactions. Thus, locking rows with SELECT ... FOR UPDATE (i.e. WHERE id > 100) and subsequently updating rows with WHERE id > 100 (even in the same transaction) may result in more rows being updated than were locked earlier. This is because new rows may have been inserted into the table between the statements, since there was no gap lock created for the SELECT ... FOR UPDATE.
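For contrast, a sketch of the same sequence under READ COMMITTED (val is a hypothetical column; session 2 uses autocommit):

-- session 1
SET SESSION tx_isolation = 'READ-COMMITTED';
BEGIN;
SELECT * FROM some_table WHERE id > 100 FOR UPDATE;   -- record locks only, no gap lock

-- session 2 (autocommit): succeeds immediately because the gap is not locked
INSERT INTO some_table (id, val) VALUES (200, 0);

-- session 1: the update now also changes the newly inserted row 200
UPDATE some_table SET val = 1 WHERE id > 100;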

Useful links
