Nov
09
2021
--

What if … MySQL’s Repeatable Reads Cause You to Lose Money?

MySQL Repeatable Reads

MySQL Repeatable ReadsWell, let me say if that happens it’s because there is a logic mistake in your application. But you need to know and understand what happens in MySQL to be able to avoid the problem. 

In short, the WHY of this article is to inform you about possible pitfalls and how to prevent them from causing you damage.

Let us start by having a short introduction to what Repeatable reads are about. Given I am extremely lazy, I am going to use (a lot) existing documentation from the MySQL documentation.

Transaction isolation is one of the foundations of database processing. Isolation is the I in the acronym ACID; the isolation level is the setting that fine-tunes the balance between performance and reliability, consistency, and reproducibility of results when multiple transactions are making changes and performing queries at the same time.

InnoDB offers all four transaction isolation levels described by the SQL:1992 standard: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. The default isolation level for InnoDB is REPEATABLE READ.

  • REPEATABLE READ
    This is the default isolation level for InnoDB. Consistent reads within the same transaction read the snapshot established by the first read. This means that if you issue several plain (nonlocking) SELECT statements within the same transaction, these SELECT statements are consistent also with respect to each other.
  • READ COMMITTED
    Each consistent read, even within the same transaction, sets and reads its own fresh snapshot.

And about Consistent Non-Blocking reads:

A consistent read means that InnoDB uses multi-versioning to present to a query a snapshot of the database at a point in time. The query sees the changes made by transactions that committed before that point in time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query sees the changes made by earlier statements within the same transaction. This exception causes the following anomaly: If you update some rows in a table, a SELECT sees the latest version of the updated rows, but it might also see older versions of any rows. If other sessions simultaneously update the same table, the anomaly means that you might see the table in a state that never existed in the database.

Ok, but what does all this mean in practice? To understand, let us simulate this scenario:

I have a shop and I decide to grant a bonus discount to a selected number of customers that:

  • Have an active account to my shop
  • Match my personal criteria to access the bonus

My application is set to perform batch operations in a moment of less traffic and unattended. This includes reactivating dormant accounts that customers ask to reactivate.  

What we will do is to see what happens, by default, and then see what we can do to avoid pitfalls.

The Scenario

I will use three different connections to connect to the same MySQL 8.0.27 instance. The only relevant setting I have modified is the Innodb_timeout that I set to 50 seconds. 

  • Session 1 will simulate a process that should activate the bonus feature for the selected customers
  • Session 2 is an independent process that reactivates the given list of customers
  • Session 3 is used to collect lock information

For this simple test I will use the customer table in the sakila schema modified as below:

CREATE TABLE `customer` (
  `customer_id` smallint unsigned NOT NULL AUTO_INCREMENT,
  `store_id` tinyint unsigned NOT NULL,
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `email` varchar(50) DEFAULT NULL,
  `address_id` smallint unsigned NOT NULL,
  `active` tinyint(1) NOT NULL DEFAULT '1',
  `create_date` datetime NOT NULL,
  `last_update` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `bonus` int NOT NULL DEFAULT '0',
  `activate_bonus` varchar(45) NOT NULL DEFAULT '0',
  PRIMARY KEY (`customer_id`),
  KEY `idx_fk_store_id` (`store_id`),
  KEY `idx_fk_address_id` (`address_id`),
  KEY `idx_last_name` (`last_name`),
  KEY `idx_bonus` (`bonus`),
  CONSTRAINT `fk_customer_address` FOREIGN KEY (`address_id`) REFERENCES `address` (`address_id`) ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT `fk_customer_store` FOREIGN KEY (`store_id`) REFERENCES `store` (`store_id`) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=600 DEFAULT CHARSET=utf8mb4 COLLATE=ut

As you can see I have added the bonus and activate_bonus fields plus the idx_bonus index.

To be able to trace the locks, these are the threads ids by session:

session 1 17439
session 2 17430
session 3 17443

To collect the lock information:

SELECT 
    index_name, lock_type, lock_mode, lock_data, thread_id
FROM
    performance_schema.data_locks
WHERE
    object_schema = 'sakila'
        AND object_name = 'customer'
        AND lock_type = 'RECORD'
        AND thread_id IN (17439 , 17430)
ORDER BY index_name , lock_data DESC;

Ok, ready? Let’s start!

The Run…

While the following steps can be done in a more compressed way, I prefer to do it in a more verbose way, to make it more human-readable.

First, let us set the environment:

session1 >set transaction_isolation = 'REPEATABLE-READ';
Query OK, 0 rows affected (0.07 sec)
session1 >Start Transaction;
Query OK, 0 rows affected (0.07 sec)

Then let see the list of the customers we will modify:

session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|         383 |        1 | MARTIN     | BALES     |     1 | 0              |
|         539 |        1 | MATHEW     | BOLIN     |     1 | 0              |
|         441 |        1 | MARIO      | CHEATHAM  |     1 | 0              |
|         482 |        1 | MAURICE    | CRAWLEY   |     1 | 0              |
|         293 |        2 | MAE        | FLETCHER  |     1 | 0              |
|          38 |        1 | MARTHA     | GONZALEZ  |     1 | 0              |
|         444 |        2 | MARCUS     | HIDALGO   |     1 | 0              |
|         252 |        2 | MATTIE     | HOFFMAN   |     1 | 0              |
|         256 |        2 | MABEL      | HOLLAND   |     1 | 0              |
|         226 |        2 | MAUREEN    | LITTLE    |     1 | 0              |
|         588 |        1 | MARION     | OCAMPO    |     1 | 0              |
|         499 |        2 | MARC       | OUTLAW    |     1 | 0              |
|         553 |        1 | MAX        | PITT      |     1 | 0              |
|         312 |        2 | MARK       | RINEHART  |     1 | 0              |
|          80 |        1 | MARILYN    | ROSS      |     1 | 0              |
|         583 |        1 | MARSHALL   | THORN     |     1 | 0              |
|         128 |        1 | MARJORIE   | TUCKER    |     1 | 0              |
|          44 |        1 | MARIE      | TURNER    |     1 | 0              |
|         267 |        1 | MARGIE     | WADE      |     1 | 0              |
|         240 |        1 | MARLENE    | WELCH     |     1 | 0              |
|         413 |        2 | MARVIN     | YEE       |     1 | 0              |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)

As you can see we have 21 customers fitting our criteria. How much money is involved in this exercise?

session1 >SELECT
    ->     SUM(amount) income,
    ->     SUM(amount) * 0.90 income_with_bonus,
    ->    (SUM(amount) - (SUM(amount) * 0.90)) loss_because_bonus
    -> FROM
    ->     sakila.customer AS c
    ->         JOIN
    ->     sakila.payment AS p ON c.customer_id = p.customer_id
    ->     where active = 1 and bonus =1 ;
+---------+-------------------+--------------------+
| income  | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 2416.23 |         2174.6070 |           241.6230 |
+---------+-------------------+--------------------+

This exercise is going to cost me ~242 dollars. Keep this number in mind. What locks do I have at this point?

session3 >select index_name, lock_type, lock_mode,lock_data from performance_schema.data_locks where  object_schema = 'sakila' and object_name = 'customer' and lock_type = 'RECORD'  and
thread_id in (17439,17430) order by index_name, lock_data desc;
Empty set (0.00 sec)

The answer is none. 

Meanwhile, we have the other process that needs to reactivate the customers:

session2 >set transaction_isolation = 'REPEATABLE-READ';
Query OK, 0 rows affected (0.00 sec)

session2 >Start Transaction;
Query OK, 0 rows affected (0.00 sec)

session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 ;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|           1 |        1 | MARY       | SMITH     |     1 | 0              |
|           7 |        1 | MARIA      | MILLER    |     1 | 0              |
|           9 |        2 | MARGARET   | MOORE     |     1 | 0              |
|         178 |        2 | MARION     | SNYDER    |     1 | 0              |
|         236 |        1 | MARCIA     | DEAN      |     1 | 0              |
|         246 |        1 | MARIAN     | MENDOZA   |     1 | 0              |
|         254 |        2 | MAXINE     | SILVA     |     1 | 0              |
|         257 |        2 | MARSHA     | DOUGLAS   |     1 | 0              |
|         323 |        2 | MATTHEW    | MAHAN     |     1 | 0              |
|         408 |        1 | MANUEL     | MURRELL   |     1 | 0              |
+-------------+----------+------------+-----------+-------+----------------+
10 rows in set (0.00 sec)

In this case, the process needs to reactivate 10 users.

session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
Query OK, 10 rows affected (0.00 sec)
Rows matched: 10  Changed: 10  Warnings: 0
session2 >commit;
Query OK, 0 rows affected (0.00 sec)

All good, right? But before going ahead let’s double-check session 1:

session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|         383 |        1 | MARTIN     | BALES     |     1 | 0              |
|         539 |        1 | MATHEW     | BOLIN     |     1 | 0              |
|         441 |        1 | MARIO      | CHEATHAM  |     1 | 0              |
|         482 |        1 | MAURICE    | CRAWLEY   |     1 | 0              |
|         293 |        2 | MAE        | FLETCHER  |     1 | 0              |
|          38 |        1 | MARTHA     | GONZALEZ  |     1 | 0              |
|         444 |        2 | MARCUS     | HIDALGO   |     1 | 0              |
|         252 |        2 | MATTIE     | HOFFMAN   |     1 | 0              |
|         256 |        2 | MABEL      | HOLLAND   |     1 | 0              |
|         226 |        2 | MAUREEN    | LITTLE    |     1 | 0              |
|         588 |        1 | MARION     | OCAMPO    |     1 | 0              |
|         499 |        2 | MARC       | OUTLAW    |     1 | 0              |
|         553 |        1 | MAX        | PITT      |     1 | 0              |
|         312 |        2 | MARK       | RINEHART  |     1 | 0              |
|          80 |        1 | MARILYN    | ROSS      |     1 | 0              |
|         583 |        1 | MARSHALL   | THORN     |     1 | 0              |
|         128 |        1 | MARJORIE   | TUCKER    |     1 | 0              |
|          44 |        1 | MARIE      | TURNER    |     1 | 0              |
|         267 |        1 | MARGIE     | WADE      |     1 | 0              |
|         240 |        1 | MARLENE    | WELCH     |     1 | 0              |
|         413 |        2 | MARVIN     | YEE       |     1 | 0              |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)

Perfect! My repeatable read still sees the same snapshot. Let me apply the changes:

session1 >update sakila.customer set activate_bonus=1 where bonus = 1 and active =1 ;
Query OK, 31 rows affected (0.06 sec)
Rows matched: 31  Changed: 31  Warnings: 0

Wait, what? My list reports 21 entries, but I have modified 31! And if I check the cost:

session1 >SELECT
    ->     SUM(amount) income,
    ->     SUM(amount) * 0.90 income_with_bonus,
    ->    (SUM(amount) - (SUM(amount) * 0.90)) loss_because_bonus
    -> FROM
    ->     sakila.customer AS c
    ->         JOIN
    ->     sakila.payment AS p ON c.customer_id = p.customer_id
    ->     where active = 1 and bonus =1 ;
+---------+-------------------+--------------------+
| income  | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 3754.01 |         3378.6090 |           375.4010 |
+---------+-------------------+--------------------+

Well, now the cost of this operation is 375 dollars, not 242. In this case, we are talking about peanuts, but guess what could be if we do something similar on thousands of users. 

Anyhow, let us first:

session1 >rollback;
Query OK, 0 rows affected (0.08 sec)

And cancel the operation.

So what happened; is this a bug?

No, it is not! It is how repeatable reads work in MySQL. The snapshot is relevant to the read operation, but if another session is able to write given also the absence of lock at the moment of reads, the next update operation will touch any value in the table matching the conditions. 

As shown above, this can be very dangerous. But only if you don’t do the right things in the code. 

How Can I Prevent This From Happening?

When coding, hope for the best, plan for the worse, always! Especially when dealing with databases. That approach may save you from spending nights trying to fix the impossible. 

So how can this be prevented? You have two ways, both simple, but both with positive and negative consequences.

  1. Add this simple clausole to the select statement: for share
  2. Add the other simple clausole to the select statement: for update
  3. Use Read-committed

Solution one is easy and clean, with no other change in the code. BUT it creates locks, and if your application is lock sensitive this may be an issue for you.

Solution two is also easy, but a bit more encapsulated and locking. 

On the other hand, solution three does not add locks but requires modifications in the code and it still leaves some space for problems.

Let us see them in detail.

Solution One

Let us repeat the same steps:

session1 >set transaction_isolation = 'REPEATABLE-READ';
session1 >Start Transaction;
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name for share;

Now we check the locks (for brevity I have cut some entries and kept a few as a sample):

session3 >select index_name, lock_type, lock_mode,lock_data,thread_id from performance_schema.data_locks where
    -> object_schema = 'sakila'
    -> and object_name = 'customer'
    -> and lock_type = 'RECORD'
    -> and thread_id = 17439
    -> order by index_name, lock_data desc;
+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode     | lock_data              | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus  | RECORD    | S             | supremum pseudo-record |     17439 |
| idx_bonus  | RECORD    | S             | 1, 9                   |     17439 |
<snip>
| idx_bonus  | RECORD    | S             | 1, 128                 |     17439 |
| idx_bonus  | RECORD    | S             | 1, 1                   |     17439 |
| PRIMARY    | RECORD    | S,REC_NOT_GAP | 9                      |     17439 |
<snip>
| PRIMARY    | RECORD    | S,REC_NOT_GAP | 1                      |     17439 |
+------------+-----------+---------------+------------------------+-----------+
63 rows in set (0.00 sec)

This time we can see that the select has raised a few locks all in S (shared) mode. 

For brevity, I am skipping to report exactly the same results as in the first exercise.

If we go ahead and try with session two:

session2 >set transaction_isolation = 'READ-COMMITTED';
session2 >Start Transaction;
session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 ;
session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Here we go! Attempting to change the values in the customer table is now on hold waiting to acquire the lock. If the duration exceeds the Innodb_wait_timeout, then the execution is interrupted and the application must have a try-catch mechanism to retry the operation. This last one is a best practice that you should already have in place in your code. If not, well add it now!

At this point, session one can proceed and complete the operations. After that and before the final commit, we will be able to observe:

+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode     | lock_data              | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus  | RECORD    | S             | supremum pseudo-record |     17439 |
| idx_bonus  | RECORD    | X             | supremum pseudo-record |     17439 |
| idx_bonus  | RECORD    | S             | 1, 9                   |     17439 |
| idx_bonus  | RECORD    | X             | 1, 9                   |     17439 |
<snip>
| idx_bonus  | RECORD    | X             | 1, 1                   |     17439 |
| idx_bonus  | RECORD    | S             | 1, 1                   |     17439 |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 9                      |     17439 |
| PRIMARY    | RECORD    | S,REC_NOT_GAP | 9                      |     17439 |
<snip>|
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 1                      |     17439 |
| PRIMARY    | RECORD    | S,REC_NOT_GAP | 1                      |     17439 |
+------------+-----------+---------------+------------------------+-----------+
126 rows in set (0.00 sec)

As you can see we now have two different lock types, the Shared one from the select, and the exclusive lock (X) from the update.

Solution Two

In this case, we just change the kind of lock we set on the select. In solution one we opt for a shared lock, which allows other sessions to eventually acquire another shared lock. Here we go for an exclusive lock that will put all the other operations on hold.

session1 >set transaction_isolation = 'REPEATABLE-READ';
session1 >Start Transaction;
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name for UPDATE;

If now we check the locks (again, for brevity I have cut some entries and kept a few as a sample):

session3 >select index_name, lock_type, lock_mode,lock_data,thread_id from performance_schema.data_locks where
    -> object_schema = 'sakila'
    -> and object_name = 'customer'
    -> and lock_type = 'RECORD'
    -> and thread_id = 17439
    -> order by index_name, lock_data desc;
+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode     | lock_data              | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus  | RECORD    | X             | supremum pseudo-record |     17439 |
| idx_bonus  | RECORD    | X             | 1, 9                   |     17439 |
<snip>
| idx_bonus  | RECORD    | X             | 1, 128                 |     17439 |
| idx_bonus  | RECORD    | X             | 1, 1                   |     17439 |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 9                      |     17439 |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 80                     |     17439 |
<snip>
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 128                    |     17439 |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 1                      |     17439 |
+------------+-----------+---------------+------------------------+-----------+
63 rows in set (0.09 sec)

In this case, the kind of lock is X as exclusive, so the other operations when requesting a lock must wait for this transaction to complete.

session2 >#set transaction_isolation = 'READ-COMMITTED';
session2 >
session2 >Start Transaction;
Query OK, 0 rows affected (0.06 sec)

session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 for update;

And it will wait for N time where N is either the Innodb_lock_wait_timeout or the commit time from session one.

Note the lock request on hold:

+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode     | lock_data              | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus  | RECORD    | X             | supremum pseudo-record |     17439 |
<snip>
| idx_bonus  | RECORD    | X             | 1, 128                 |     17439 |
| idx_bonus  | RECORD    | X             | 1, 1                   |     17430 |<--
| idx_bonus  | RECORD    | X             | 1, 1                   |     17439 |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 9                      |     17439 |
<snip>
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 1                      |     17439 |
+------------+-----------+---------------+------------------------+-----------+
64 rows in set (0.09 sec)

Session two is trying to lock exclusively the idx_bonus but cannot proceed.

Once session one goes ahead, we have:

session1 >update sakila.customer set activate_bonus=1 where bonus = 1 and active =1 ;
Query OK, 0 rows affected (0.06 sec)
Rows matched: 21  Changed: 0  Warnings: 0

session1 >
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 ;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|          38 |        1 | MARTHA     | GONZALEZ  |     1 | 1              |
|          44 |        1 | MARIE      | TURNER    |     1 | 1              |
|          80 |        1 | MARILYN    | ROSS      |     1 | 1              |
|         128 |        1 | MARJORIE   | TUCKER    |     1 | 1              |
|         226 |        2 | MAUREEN    | LITTLE    |     1 | 1              |
|         240 |        1 | MARLENE    | WELCH     |     1 | 1              |
|         252 |        2 | MATTIE     | HOFFMAN   |     1 | 1              |
|         256 |        2 | MABEL      | HOLLAND   |     1 | 1              |
|         267 |        1 | MARGIE     | WADE      |     1 | 1              |
|         293 |        2 | MAE        | FLETCHER  |     1 | 1              |
|         312 |        2 | MARK       | RINEHART  |     1 | 1              |
|         383 |        1 | MARTIN     | BALES     |     1 | 1              |
|         413 |        2 | MARVIN     | YEE       |     1 | 1              |
|         441 |        1 | MARIO      | CHEATHAM  |     1 | 1              |
|         444 |        2 | MARCUS     | HIDALGO   |     1 | 1              |
|         482 |        1 | MAURICE    | CRAWLEY   |     1 | 1              |
|         499 |        2 | MARC       | OUTLAW    |     1 | 1              |
|         539 |        1 | MATHEW     | BOLIN     |     1 | 1              |
|         553 |        1 | MAX        | PITT      |     1 | 1              |
|         583 |        1 | MARSHALL   | THORN     |     1 | 1              |
|         588 |        1 | MARION     | OCAMPO    |     1 | 1              |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)

session1 >SELECT      SUM(amount) income,     SUM(amount) * 0.90 income_with_bonus,    (SUM(amount) - (SUM(amount) * 0.90)) loss_because_bonus FROM     sakila.customer AS c         JOIN     sakila.payment AS p ON c.customer_id = p.customer_id     where active = 1 and bonus =1;
+---------+-------------------+--------------------+
| income  | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 2416.23 |         2174.6070 |           241.6230 |
+---------+-------------------+--------------------+
1 row in set (0.06 sec)

Now my update matches the expectations and the other operations are on hold.

After session one commits:

session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 for update;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|           1 |        1 | MARY       | SMITH     |     1 | 1              |
|           7 |        1 | MARIA      | MILLER    |     1 | 1              |
|           9 |        2 | MARGARET   | MOORE     |     1 | 1              |
|         178 |        2 | MARION     | SNYDER    |     1 | 1              |
|         236 |        1 | MARCIA     | DEAN      |     1 | 1              |
|         246 |        1 | MARIAN     | MENDOZA   |     1 | 1              |
|         254 |        2 | MAXINE     | SILVA     |     1 | 1              |
|         257 |        2 | MARSHA     | DOUGLAS   |     1 | 1              |
|         323 |        2 | MATTHEW    | MAHAN     |     1 | 1              |
|         408 |        1 | MANUEL     | MURRELL   |     1 | 1              |
+-------------+----------+------------+-----------+-------+----------------+
10 rows in set (52.72 sec) <-- Note the time!

session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
Query OK, 10 rows affected (0.06 sec)
Rows matched: 10  Changed: 10  Warnings: 0

Session two is able to complete, BUT it was on hold for 52 seconds waiting for session one. As said, this solution is a good one if you can afford locks and waiting time.

Solution Three

In this case, we will use a different isolation model that will allow session one to see what session two has modified.

session1 >set transaction_isolation = 'READ-COMMITTED';
session1 >Start Transaction;
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|         383 |        1 | MARTIN     | BALES     |     1 | 0              |
|         539 |        1 | MATHEW     | BOLIN     |     1 | 0              |
|         441 |        1 | MARIO      | CHEATHAM  |     1 | 0              |
|         482 |        1 | MAURICE    | CRAWLEY   |     1 | 0              |
|         293 |        2 | MAE        | FLETCHER  |     1 | 0              |
|          38 |        1 | MARTHA     | GONZALEZ  |     1 | 0              |
|         444 |        2 | MARCUS     | HIDALGO   |     1 | 0              |
|         252 |        2 | MATTIE     | HOFFMAN   |     1 | 0              |
|         256 |        2 | MABEL      | HOLLAND   |     1 | 0              |
|         226 |        2 | MAUREEN    | LITTLE    |     1 | 0              |
|         588 |        1 | MARION     | OCAMPO    |     1 | 0              |
|         499 |        2 | MARC       | OUTLAW    |     1 | 0              |
|         553 |        1 | MAX        | PITT      |     1 | 0              |
|         312 |        2 | MARK       | RINEHART  |     1 | 0              |
|          80 |        1 | MARILYN    | ROSS      |     1 | 0              |
|         583 |        1 | MARSHALL   | THORN     |     1 | 0              |
|         128 |        1 | MARJORIE   | TUCKER    |     1 | 0              |
|          44 |        1 | MARIE      | TURNER    |     1 | 0              |
|         267 |        1 | MARGIE     | WADE      |     1 | 0              |
|         240 |        1 | MARLENE    | WELCH     |     1 | 0              |
|         413 |        2 | MARVIN     | YEE       |     1 | 0              |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)

Same result as before. Now let us execute commands in session two:

session2 >set transaction_isolation = 'READ-COMMITTED';
Query OK, 0 rows affected (0.00 sec)

session2 >
session2 >Start Transaction;
Query OK, 0 rows affected (0.00 sec)

session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 ;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|           1 |        1 | MARY       | SMITH     |     1 | 0              |
|           7 |        1 | MARIA      | MILLER    |     1 | 0              |
|           9 |        2 | MARGARET   | MOORE     |     1 | 0              |
|         178 |        2 | MARION     | SNYDER    |     1 | 0              |
|         236 |        1 | MARCIA     | DEAN      |     1 | 0              |
|         246 |        1 | MARIAN     | MENDOZA   |     1 | 0              |
|         254 |        2 | MAXINE     | SILVA     |     1 | 0              |
|         257 |        2 | MARSHA     | DOUGLAS   |     1 | 0              |
|         323 |        2 | MATTHEW    | MAHAN     |     1 | 0              |
|         408 |        1 | MANUEL     | MURRELL   |     1 | 0              |
+-------------+----------+------------+-----------+-------+----------------+
10 rows in set (0.00 sec)
session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
Query OK, 10 rows affected (0.00 sec)
Rows matched: 10  Changed: 10  Warnings: 0

session2 >commit;
Query OK, 0 rows affected (0.00 sec)

All seems the same, but if we check again in session one:

session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+------+----------------+
| customer_id | store_id | first_name | last_name |bonus | activate_bonus |
+-------------+----------+------------+-----------+------+----------------+
|         383 |        1 | MARTIN     | BALES     |    1 | 1              |
|         539 |        1 | MATHEW     | BOLIN     |    1 | 1              |
|         441 |        1 | MARIO      | CHEATHAM  |    1 | 1              |
|         482 |        1 | MAURICE    | CRAWLEY   |    1 | 1              |
|         236 |        1 | MARCIA     | DEAN      |    1 | 0              |
|         257 |        2 | MARSHA     | DOUGLAS   |    1 | 0              |
|         293 |        2 | MAE        | FLETCHER  |    1 | 1              |
|          38 |        1 | MARTHA     | GONZALEZ  |    1 | 1              |
|         444 |        2 | MARCUS     | HIDALGO   |    1 | 1              |
|         252 |        2 | MATTIE     | HOFFMAN   |    1 | 1              |
|         256 |        2 | MABEL      | HOLLAND   |    1 | 1              |
|         226 |        2 | MAUREEN    | LITTLE    |    1 | 1              |
|         323 |        2 | MATTHEW    | MAHAN     |    1 | 0              |
|         246 |        1 | MARIAN     | MENDOZA   |    1 | 0              |
|           7 |        1 | MARIA      | MILLER    |    1 | 0              |
|           9 |        2 | MARGARET   | MOORE     |    1 | 0              |
|         408 |        1 | MANUEL     | MURRELL   |    1 | 0              |
|         588 |        1 | MARION     | OCAMPO    |    1 | 1              |
|         499 |        2 | MARC       | OUTLAW    |    1 | 1              |
|         553 |        1 | MAX        | PITT      |    1 | 1              |
|         312 |        2 | MARK       | RINEHART  |    1 | 1              |
|          80 |        1 | MARILYN    | ROSS      |    1 | 1              |
|         254 |        2 | MAXINE     | SILVA     |    1 | 0              |
|           1 |        1 | MARY       | SMITH     |    1 | 0              |
|         178 |        2 | MARION     | SNYDER    |    1 | 0              |
|         583 |        1 | MARSHALL   | THORN     |    1 | 1              |
|         128 |        1 | MARJORIE   | TUCKER    |    1 | 1              |
|          44 |        1 | MARIE      | TURNER    |    1 | 1              |
|         267 |        1 | MARGIE     | WADE      |    1 | 1              |
|         240 |        1 | MARLENE    | WELCH     |    1 | 1              |
|         413 |        2 | MARVIN     | YEE       |    1 | 1              |
+-------------+----------+------------+-----------+------+----------------+
31 rows in set (0.09 sec)

We now can see all the 31 customers and also the value calculation:

+---------+-------------------+--------------------+
| income  | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 3754.01 |         3378.6090 |           375.4010 |
+---------+-------------------+--------------------+

It is reporting the right values.

At this point, we can program some logic in the process to check the possible tolerance and have the process either complete performing the update operations or stop the process and exit.

This as mentioned requires additional coding and more logic.

To be fully honest, this solution still leaves space for some possible interference, but at least allows the application to be informed about what is happening. Still, it should be used only if you cannot afford to have a higher level of locking, but this is another story/article.

Conclusion

When writing applications that interact with an RDBMS, you must be very careful in what you do and HOW you do it. While using data facilitation layers like Object Relational Mapping (ORM), seems to make your life easier, in reality, you may lose control of a few crucial aspects of the application’s interaction. So be very careful when opting for how to access your data.

About the case reported above we can summarize a few pitfalls:

  1. First and most important, never ever have processes that may interfere with the one running at the same time. Be very careful when you design them and even more careful when planning their executions.  
  2. Use options such as for share or for update in your select statements. Use them carefully to avoid unuseful locks, but use them. 
  3. In case you have a long process that requires modifying data, and you need to be able to check if other processes have altered the status of the data, but at the same time, cannot have a long wait time for locks. Then use a mix, setting READ-COMMITTED as isolation, to allow your application to check, but also add things like for share or for update in the select statement, immediately before the DML and the commit. That will allow you to prevent writes while you are finalizing, and also to significantly reduce the locking time. 
  4. Keep in mind that long-running processes and long-running transactions can be the source of a lot of pain, especially when using REPEATABLE-READ. Try whenever you can to split the operations into small chunks. 

Finally, when developing, remember that DBAs are friends, and are there to help you to do things at your best. It may seem they are giving you a hard time, but that is because their point of view is different, focusing on data consistency, availability, and durability. But they can help you to save a lot of time after the code is released, especially when you try to identify why something has gone wrong. 

So involve them soon in the design process, use them and let them be part of the process.

References

https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html

https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html

https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-model.html

 

Complete the 2021 Percona Open Source Data Management Software Survey

Have Your Say!

Jan
19
2015
--

Looking deeper into InnoDB’s problem with many row versions

A few days ago I wrote about MySQL performance implications of InnoDB isolation modes and I touched briefly upon the bizarre performance regression I found with InnoDB handling a large amount of versions for a single row. Today I wanted to look a bit deeper into the problem, which I also filed as a bug.

First I validated in which conditions the problem happens. It seems to happen only in REPEATABLE-READ isolation mode and only in case there is some hot rows which get many row versions during a benchmark run. For example the problem does NOT happen if I run sysbench with “uniform” distribution.

In terms of concurrent selects it also seems to require some very special conditions – you need to have the connection to let some history accumulate by having read snapshot open and then do it again with high history. The exact queries to do that seems not to be that important.

Contrary to what I expected this problem also does not require multiple connections – I can repeat it with only one read and one write connection, which means it can happen in a lot more workloads.

Here is the simplified case to repeat it:

sysbench  --num-threads=1 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=uniform --oltp-table-size=1 --mysql-user=root --mysql-password= --mysql-db=sbinnodb  --test=/usr/share/doc/sysbench/tests/db/update_index.lua run

It is repeatable on the table with just 1 row where this row is very hot!

Select query sequence:

begin;
select c  from sbtest1 limit 1;
select sleep(20);
commit;
begin;
select c  from sbtest1 limit 1;
select sleep(300);
commit;

It is interesting though, in this case it does not look like it is enough to open a consistent snapshot – it is only when I run the query on the same table as the update workload is running (I assume touches the same data) when the issue happens.

Let’s look at some graphs:

The transaction rate indeed suffers dramatically – in this case going down more than 100x from close to 15K to around 60 in the lowest point.

It is interesting that CPU consumption is reduced but not as much as the throughput.

This is something I find the most enlightening about what seems to happen. The number of buffer pool read requests increases dramatically where the number of transactions goes down. If we do the math we have 940K buffer pool reads per 14.5K updates in the normal case giving us 65 buffer reads per update, which goes up to 37000 per update when we have regression happening. Which is a dramatic change and it points to something that goes badly wrong.

History size is of interest. We can see it grows quickly first – this is when we have the first transaction and sleep(20) and when it grows very slowly when the second transaction is open as update rate is low. Finally it goes to zero very quickly once the transaction is committed. What is worth noting is the history length here peaks at just 280K which is really quite trivial size.

The InnoDB contention graph is interesting from 2 view points. First it shows that the contention picture changes at the time when the transaction is held open and also right afterward when history is being worked out.

If you look at this graph focused only on the os_wait metrics we can see that mutex_os_waits goes down during the problem while x-locks os waits increases significantly. This means there are some very long exclusive lock os-waits happening which is indicative of the long waits. We can’t actually lock times from status variables – we would need to get data from performance schema for that which unfortunately does not cover key mutexes, and which I unfortunately can’t graph easily yet.

Lets look at some more data.

From the great processlist from sys-schema we can get:

*************************** 4. row ***************************
                thd_id: 536
               conn_id: 516
                  user: root@localhost
                    db: sbinnodb
               command: Query
                 state: updating
                  time: 0
     current_statement: UPDATE sbtest1 SET k=k+1 WHERE id=1
          lock_latency: 36.00 us
         rows_examined: 1
             rows_sent: 0
         rows_affected: 0
            tmp_tables: 0
       tmp_disk_tables: 0
             full_scan: NO
        last_statement: NULL
last_statement_latency: NULL
             last_wait: wait/io/table/sql/handler
     last_wait_latency: Still Waiting
                source: handler.cc:7692

Which is unfortunately not very helpful as it does not show the actual lock which is being waited on – just saying that it is waiting somewhere inside storage engine call which is obvious.

We can get some good summary information in the other table:

mysql> select * from waits_global_by_latency;
+------------------------------------------------------+-----------+---------------+-------------+-------------+
| events                                               | total     | total_latency | avg_latency | max_latency |
+------------------------------------------------------+-----------+---------------+-------------+-------------+
| wait/synch/cond/sql/Item_func_sleep::cond            |       384 | 00:31:27.18   | 4.91 s      | 5.00 s      |
| wait/synch/rwlock/innodb/index_tree_rw_lock          | 239460870 | 00:12:13.82   | 3.06 us     | 495.27 ms   |
| wait/synch/mutex/sql/THD::LOCK_thd_data              | 601160083 | 36.48 s       | 60.41 ns    | 643.18 us   |
| wait/io/table/sql/handler                            |      1838 | 29.12 s       | 15.84 ms    | 211.21 ms   |
| wait/synch/mutex/mysys/THR_LOCK::mutex               | 240456674 | 16.57 s       | 68.86 ns    | 655.16 us   |
| wait/synch/cond/sql/MDL_context::COND_wait_status    |        15 | 14.11 s       | 940.37 ms   | 999.36 ms   |
| wait/synch/mutex/innodb/trx_mutex                    | 360685304 | 10.52 s       | 29.11 ns    | 4.01 ms     |
| wait/synch/mutex/innodb/trx_undo_mutex               | 120228229 | 3.87 s        | 31.93 ns    | 426.16 us   |
| wait/io/file/innodb/innodb_data_file                 |       399 | 1.46 s        | 3.66 ms     | 11.21 ms    |
| wait/synch/mutex/innodb/buf_pool_flush_state_mutex   |     32317 | 587.10 ms     | 18.17 us    | 11.27 ms    |
| wait/synch/rwlock/innodb/fil_space_latch             |   4154178 | 386.96 ms     | 92.96 ns    | 74.57 us    |
| wait/io/file/innodb/innodb_log_file                  |       987 | 271.76 ms     | 275.34 us   | 14.23 ms    |
| wait/synch/rwlock/innodb/hash_table_locks            |   8509138 | 255.76 ms     | 30.05 ns    | 53.41 us    |
| wait/synch/mutex/innodb/srv_threads_mutex            |  11938747 | 249.49 ms     | 20.66 ns    | 15.06 us    |
| wait/synch/rwlock/innodb/trx_purge_latch             |   8488041 | 181.01 ms     | 21.28 ns    | 23.51 us    |
| wait/synch/rwlock/innodb/dict_operation_lock         |       232 | 100.48 ms     | 433.09 us   | 61.88 ms    |
| wait/synch/rwlock/innodb/checkpoint_lock             |        10 | 32.34 ms      | 3.23 ms     | 8.38 ms     |
| wait/io/socket/sql/client_connection                 |      2171 | 30.87 ms      | 14.22 us    | 3.95 ms     |
| wait/synch/mutex/innodb/log_sys_mutex                |    260485 | 6.57 ms       | 25.04 ns    | 52.00 us    |
| wait/synch/rwlock/innodb/btr_search_latch            |      4982 | 1.71 ms       | 343.05 ns   | 17.83 us    |
| wait/io/file/myisam/kfile                            |       647 | 1.70 ms       | 2.63 us     | 39.28 us    |
| wait/synch/rwlock/innodb/dict_table_stats            |     46323 | 1.63 ms       | 35.06 ns    | 6.55 us     |
| wait/synch/mutex/innodb/os_mutex                     |     11410 | 904.63 us     | 79.19 ns    | 26.67 us    |
| wait/lock/table/sql/handler                          |      1838 | 794.21 us     | 431.94 ns   | 9.70 us     |
| wait/synch/rwlock/sql/MDL_lock::rwlock               |      7056 | 672.34 us     | 95.15 ns    | 53.29 us    |
| wait/io/file/myisam/dfile                            |       139 | 518.50 us     | 3.73 us     | 31.61 us    |
| wait/synch/mutex/sql/LOCK_global_system_variables    |     11692 | 462.58 us     | 39.44 ns    | 363.39 ns   |
| wait/synch/mutex/innodb/rseg_mutex                   |      7238 | 348.25 us     | 47.89 ns    | 10.38 us    |
| wait/synch/mutex/innodb/lock_mutex                   |      5747 | 222.13 us     | 38.50 ns    | 9.84 us     |
| wait/synch/mutex/innodb/trx_sys_mutex                |      4601 | 187.43 us     | 40.69 ns    | 326.15 ns   |
| wait/synch/mutex/sql/LOCK_table_cache                |      2173 | 185.14 us     | 85.14 ns    | 10.02 us    |
| wait/synch/mutex/innodb/fil_system_mutex             |      3481 | 144.60 us     | 41.32 ns    | 375.91 ns   |
| wait/synch/rwlock/sql/LOCK_grant                     |      1217 | 121.86 us     | 99.85 ns    | 12.07 us    |
| wait/synch/mutex/innodb/flush_list_mutex             |      3191 | 108.82 us     | 33.80 ns    | 8.81 us     |
| wait/synch/mutex/innodb/purge_sys_bh_mutex           |      3600 | 83.52 us      | 23.16 ns    | 123.95 ns   |
| wait/synch/mutex/sql/MDL_map::mutex                  |      1456 | 75.02 us      | 51.33 ns    | 8.78 us     |
| wait/synch/mutex/myisam/MYISAM_SHARE::intern_lock    |      1357 | 55.55 us      | 40.69 ns    | 320.51 ns   |
| wait/synch/mutex/innodb/log_flush_order_mutex        |       204 | 40.84 us      | 200.01 ns   | 9.42 us     |
| wait/io/file/sql/FRM                                 |        16 | 31.16 us      | 1.95 us     | 16.40 us    |
| wait/synch/mutex/mysys/BITMAP::mutex                 |       378 | 30.52 us      | 80.44 ns    | 9.16 us     |
| wait/synch/mutex/innodb/dict_sys_mutex               |       463 | 24.15 us      | 51.96 ns    | 146.17 ns   |
| wait/synch/mutex/innodb/buf_pool_LRU_list_mutex      |       293 | 23.37 us      | 79.50 ns    | 313.94 ns   |
| wait/synch/mutex/innodb/buf_dblwr_mutex              |       398 | 22.60 us      | 56.65 ns    | 380.61 ns   |
| wait/synch/rwlock/myisam/MYISAM_SHARE::key_root_lock |       388 | 14.86 us      | 38.19 ns    | 254.16 ns   |
| wait/synch/mutex/sql/LOCK_plugin                     |       250 | 14.30 us      | 56.97 ns    | 137.41 ns   |
| wait/io/socket/sql/server_unix_socket                |         2 | 9.35 us       | 4.67 us     | 4.72 us     |
| wait/synch/mutex/sql/THD::LOCK_temporary_tables      |       216 | 8.97 us       | 41.32 ns    | 465.74 ns   |
| wait/synch/mutex/sql/hash_filo::lock                 |       151 | 8.35 us       | 55.09 ns    | 150.87 ns   |
| wait/synch/mutex/mysys/THR_LOCK_open                 |       196 | 7.84 us       | 39.75 ns    | 110.80 ns   |
| wait/synch/mutex/sql/TABLE_SHARE::LOCK_ha_data       |       159 | 6.23 us       | 39.13 ns    | 108.92 ns   |
| wait/synch/mutex/innodb/lock_wait_mutex              |        29 | 5.38 us       | 185.30 ns   | 240.38 ns   |
| wait/synch/mutex/innodb/ibuf_mutex                   |        29 | 5.26 us       | 181.23 ns   | 249.46 ns   |
| wait/synch/mutex/sql/LOCK_status                     |        62 | 4.12 us       | 66.36 ns    | 127.39 ns   |
| wait/synch/mutex/sql/LOCK_open                       |       109 | 4.05 us       | 36.93 ns    | 105.79 ns   |
| wait/io/file/sql/dbopt                               |         3 | 3.34 us       | 1.11 us     | 1.99 us     |
| wait/synch/mutex/innodb/buf_pool_free_list_mutex     |        30 | 1.58 us       | 52.58 ns    | 106.73 ns   |
| wait/synch/mutex/innodb/srv_innodb_monitor_mutex     |        31 | 1.38 us       | 44.45 ns    | 89.52 ns    |
| wait/synch/mutex/sql/LOCK_thread_count               |        33 | 1.10 us       | 33.18 ns    | 50.71 ns    |
| wait/synch/mutex/sql/LOCK_prepared_stmt_count        |        32 | 1.10 us       | 34.12 ns    | 67.92 ns    |
| wait/synch/mutex/innodb/srv_sys_mutex                |        42 | 1.07 us       | 25.35 ns    | 49.14 ns    |
| wait/synch/mutex/mysys/KEY_CACHE::cache_lock         |        24 | 1.02 us       | 42.26 ns    | 103.60 ns   |
| wait/synch/mutex/sql/MDL_wait::LOCK_wait_status      |        18 | 544.62 ns     | 30.05 ns    | 42.57 ns    |
| wait/synch/rwlock/sql/CRYPTO_dynlock_value::lock     |         2 | 509.25 ns     | 254.47 ns   | 287.02 ns   |
| wait/synch/rwlock/sql/LOCK_dboptions                 |         6 | 262.92 ns     | 43.82 ns    | 84.82 ns    |
| wait/synch/rwlock/sql/MDL_context::LOCK_waiting_for  |         4 | 262.92 ns     | 65.73 ns    | 103.60 ns   |
| wait/synch/mutex/sql/LOCK_connection_count           |         4 | 238.51 ns     | 59.47 ns    | 123.01 ns   |
| wait/synch/mutex/sql/LOCK_sql_rand                   |         2 | 172.15 ns     | 86.08 ns    | 94.84 ns    |
| wait/synch/rwlock/sql/LOCK_system_variables_hash     |         2 | 141.16 ns     | 70.43 ns    | 78.25 ns    |
| wait/synch/rwlock/sql/gtid_commit_rollback           |         2 | 95.47 ns      | 47.58 ns    | 51.65 ns    |
| wait/synch/mutex/innodb/autoinc_mutex                |         2 | 90.14 ns      | 45.07 ns    | 45.07 ns    |
| wait/synch/mutex/sql/LOCK_thd_remove                 |         2 | 85.14 ns      | 42.57 ns    | 52.58 ns    |
| wait/synch/mutex/innodb/ibuf_bitmap_mutex            |         1 | 76.37 ns      | 76.37 ns    | 76.37 ns    |
| wait/synch/mutex/sql/LOCK_xid_cache                  |         2 | 62.91 ns      | 31.30 ns    | 34.43 ns    |
+------------------------------------------------------+-----------+---------------+-------------+-------------+
73 rows in set (0.00 sec)

Which shows a couple of hotspots – the well-known index_tree_rw_lock hotspot which should be gone in MySQL 5.7 as well as trx_mutex and trx_undo_mutexes. It is especially worth noting the almost 500ms which this index tree lw_lock was waited at max (this is the data for a couple of minutes only, so it it is unlikely to be a very rare outlier). This hints to me that in the case of many versions our index_tree_rw_lock might be held for a prolonged period of time.

What is interesting is that the reliable “SHOW ENGINE INNODB STATUS” remains quite handy:

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 58847289
--Thread 140386357991168 has waited at btr0cur.cc line 304 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fb0bea2ca40 '&block->lock'
a writer (thread id 140386454755072) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1082
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/row /row0row.cc line 815
--Thread 140386374776576 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr /btr0cur.cc line 577
--Thread 140386324420352 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr /btr0cur.cc line 577
--Thread 140386332813056 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr /btr0cur.cc line 577
--Thread 140386445960960 has waited at btr0cur.cc line 586 for 0.0000 seconds the semaphore:
S-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr /btr0cur.cc line 577
--Thread 140386366383872 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr /btr0cur.cc line 577
--Thread 140386341205760 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr /btr0cur.cc line 577
--Thread 140386349598464 has waited at btr0cur.cc line 577 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fade8077f98 '&new_index->lock'
a writer (thread id 140386357991168) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 586
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr /btr0cur.cc line 577

Another sample….

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 61753307
--Thread 140386332813056 has waited at row0row.cc line 815 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fb0bea2cd40 '&block->lock'
a writer (thread id 140386454755072) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1082
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/row/row0row.cc line 815
--Thread 140386366383872 has waited at row0row.cc line 815 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fb0bea2cd40 '&block->lock'
a writer (thread id 140386454755072) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1082
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/row/row0row.cc line 815
--Thread 140386341205760 has waited at row0row.cc line 815 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fb0bea2cd40 '&block->lock'
a writer (thread id 140386454755072) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1082
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/row/row0row.cc line 815
--Thread 140386374776576 has waited at row0row.cc line 815 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0x7fb0bea2cd40 '&block->lock'
a writer (thread id 140386454755072) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.cc line 1082
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/row/row0row.cc line 815
--Thread 140386445960960 has waited at btr0cur.cc line 257 for 0.0000 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7fb0bea2f2c0 '&block->lock'
a writer (thread id 140386445960960) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file btr0cur.cc line 257
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/btr/btr0cur.cc line 257
OS WAIT ARRAY INFO: signal count 45260762
Mutex spin waits 1524575466, rounds 5092199292, OS waits 43431355
RW-shared spins 14500456, rounds 258217222, OS waits 3721069
RW-excl spins 37764438, rounds 730059390, OS waits 12592324
Spin rounds per wait: 3.34 mutex, 17.81 RW-shared, 19.33 RW-excl

The first one shows mostly index lock contention which we have already seen. The second, though, shows a lot of contention on the block lock, which is something that is not covered by performance schema instrumentation in MySQL 5.6.

Now to tell the truth in this 2nd thread test case it is not your typical case of contention – unlike the case with 64 threads we do not see that drastic CPU drop and it still stays above 25%, the single thread fully saturated should show on this system. This is however where DBA level profiling gets tough with MySQL – as I mentioned the only indication of excessive activity going on is a high number of buffer pool requests – there is no instrumentation which I’m aware of that tells us anything around scanning many old row versions in undo space. Perhaps some more metrics around this syssystem need to be added to INNODB_METRICS?

A good tool to see CPU consumption which both developers and DBAs can use is oprofile, which is of low enough overhead to use in production. Note that to get any meaningful information about the CPU usage inside the program you need to have debug systems installed. For Percona Server 5.6 On Ubuntu you can do apt-get install percona-server-5.6-dbg

To profile just the MySQL Process you can do:

root@ts140i:~# operf --pid 8970
operf: Press Ctl-c or 'kill -SIGINT 9113' to stop profiling
operf: Profiler started
^C
Profiling done.

To get top symbols (functions) called in mysqld I can do:

root@ts140i:~# opreport -l | more
Using /root/oprofile_data/samples/ for samples directory.
warning: /no-vmlinux could not be found.
warning: [vdso] (tgid:8970 range:0x7fffba5fe000-0x7fffba5fffff) could not be found.
CPU: Intel Haswell microarchitecture, speed 3.201e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
212162   12.4324  mysqld                   buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*)
204230   11.9676  mysqld                   dict_table_copy_types(dtuple_t*, dict_table_t const*)
179783   10.5351  no-vmlinux               /no-vmlinux
144335    8.4579  mysqld                   trx_undo_update_rec_get_update(unsigned char*, dict_index_t*, unsigned long, unsigned long, unsigned long, unsigned long, trx_t*, mem_block_info_t*, upd_t
**)
95984     5.6245  mysqld                   trx_undo_prev_version_build(unsigned char const*, mtr_t*, unsigned char const*, dict_index_t*, unsigned long*, mem_block_info_t*, unsigned char**)
78785     4.6167  mysqld                   row_build(unsigned long, dict_index_t const*, unsigned char const*, unsigned long const*, dict_table_t const*, dtuple_t const*, unsigned long const*, row_
ext_t**, mem_block_info_t*)
76656     4.4919  libc-2.19.so             _int_free
63040     3.6941  mysqld                   rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**)
52794     3.0937  mysqld                   mtr_commit(mtr_t*)

Which points to some hot functions – quite as expected we see some functions working with undo space here. In some cases it is also helpful to look at the call graph to understand where functions are being called if it is not clear.

Now in many cases looking at oprofile report it is hard to figure out what is “wrong” without having the baseline. In this case we can look at the oprofile run for sysbench alone:

root@ts140i:~# opreport -l | more
Using /root/oprofile_data/samples/ for samples directory.
warning: /no-vmlinux could not be found.
warning: [vdso] (tgid:8970 range:0x7fffba5fe000-0x7fffba5fffff) could not be found.
CPU: Intel Haswell microarchitecture, speed 3.201e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
1331386  28.4544  no-vmlinux               /no-vmlinux
603873   12.9060  mysqld                   ut_delay(unsigned long)
158428    3.3859  mysqld                   buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*,
 unsigned long, mtr_t*)
88172     1.8844  mysqld                   my_timer_cycles
75948     1.6232  mysqld                   srv_worker_thread
74350     1.5890  mysqld                   MYSQLparse(THD*)
68075     1.4549  mysqld                   mutex_spin_wait(void*, bool, char const*, unsigned long)
59321     1.2678  mysqld                   que_run_threads(que_thr_t*)
59152     1.2642  mysqld                   start_mutex_wait_v1
57495     1.2288  mysqld                   rw_lock_x_lock_func(rw_lock_t*, unsigned long, char const*, unsigned long, bool, bool)
53133     1.1356  mysqld                   mtr_commit(mtr_t*)
44404     0.9490  mysqld                   end_mutex_wait_v1
40226     0.8597  libpthread-2.19.so       pthread_mutex_lock
38889     0.8311  mysqld                   log_block_calc_checksum_innodb(unsigned char const*)

As you see this report is very different in terms of top functions.

Final thoughts: I had 2 purposes with this blog post. First, I hope it will contain some helpful information for developers so this bug can be fixed at least in MySQL 5.7. It is quite a nasty one as it can come out of nowhere if the application unexpectedly develops hot rows. Second, I hope it will show some of the methods you can use to capture and analyze data about MySQL performance.

The post Looking deeper into InnoDB’s problem with many row versions appeared first on MySQL Performance Blog.

Jan
14
2015
--

MySQL performance implications of InnoDB isolation modes

Over the past few months I’ve written a couple of posts about dangerous debt of InnoDB Transactional History and about the fact MVCC can be the cause of severe MySQL performance issues. In this post I will cover a related topic – InnoDB Transaction Isolation Modes, their relationship with MVCC (multi-version concurrency control) and how they impact MySQL performance.

The MySQL Manual provides a decent description of transaction isolation modes supported by MySQL – I will not repeat it here but rather focus on performance implications.

SERIALIZABLE – This is the strongest isolation mode to the point it essentially defeats Multi-Versioning making all SELECTs locking causing significant overhead both in terms of lock management (setting locks is expensive) and the concurrency you can get. This mode is only used in very special circumstances among MySQL Applications.

REPEATABLE READ – This is default isolation level and generally it is quite nice and convenient for the application. It sees all the data at the time of the first read (assuming using standard non-locking reads). This however comes at high cost – InnoDB needs to maintain transaction history up to the point of transaction start, which can be very expensive. The worse-case scenario being applications with a high level of updates and hot rows – you really do not want InnoDB to deal with rows which have hundreds of thousands of versions.

In terms of performance impact, both reads and writes can be impacted. For select queries traversing multiple row versions is very expensive but so it is also for updates, especially as version control seems to cause severe contention issues in MySQL 5.6

Here is example: I ran sysbench for a completely in-memory data set and when start transaction and run full table scan query couple of times while keeping transaction open:

sysbench  --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=pareto --oltp-table-size=80000000 --mysql-user=root --mysql-password= --mysql-db=sbinnodb  --test=/usr/share/doc/sysbench/tests/db/update_index.lua run

As you can see the write throughput drops drastically and stays low at all times when transaction is open, and not only when the query is running. This is perhaps the worse-case scenario I could find which happens when you have select outside of transaction followed by a long transaction in repeatable-read isolation mode. Though you can see regression in other cases, too.

Here is the set of queries I used if someone wants to repeat the test:

select avg(length(c)) from sbtest1;
begin;
select avg(length(c)) from sbtest1;
select sleep(300);
commit;

Not only is Repeatable Read the default isolation level, it is also used for logical backups with InnoDB – think mydumper or mysqldump –single-transaction
These results show such backup methods not only can’t be used with large data sets due to long recovery time but also can’t be used with some high write environments due to their performance impact.

READ COMMITTED mode is similar to REPEATABLE READ with the essential difference being what versions are not kept to the start of first read in transaction, but instead to the start of the current statement. As such using this mode allows InnoDB to maintain a lot less versions, especially if you do not have very long running statements. If you have some long running selects – such as reporting queries performance impact can be still severe.

In general I think good practice is to use READ COMITTED isolation mode as default and change to REPEATABLE READ for those applications or transactions which require it.

READ UNCOMMITTED – I think this is the least understood isolation mode (not a surprise as it only has 2 lines of documentation about it) which only describe it from the logical point of view. If you’re using this isolation mode you will see all the changes done in the database as they happen, even those by transactions which have not been yet committed. One nice use case for this isolation mode is what you can “watch” as some large scale UPDATE statements are happening with dirty reads showing which rows have already been changes and which have not.

So this statement shows changes that have not been committed yet and might not ever be committed if the transaction doing them runs into some errors so – this mode should be used with extreme care. There are a number of cases though when we do not need 100% accurate data and in this case this mode becomes very handy.

So how does READ UNCOMMITTED behave from a performance point of view? In theory InnoDB could purge row versions even if they have been created after the start of the statement in READ UNCOMMITTED mode. In practice, due to a bug or some intricate implementation detail it does not do that – row versions still will be kept to the start of the statement. So if you run very long running SELECT in READ UNCOMMITTED statements you will get a large amount of row versions created, just as if you would use READ COMMITTED. No win here.

There is an important win from SELECT side – READ UNCOMMITTED isolation mode means InnoDB never needs to go and examine the older row versions – the last row version is always the correct one which can cause dramatic performance improvement especially if the undo space had spilled over to the disk so finding old row versions can cause a lot of IO.

Perhaps the best illustration I discovered with query select avg(k) from sbtest1; ran in parallel with the same update heavy workload stated above. In READ COMMITTED isolation mode it never completes – I assume because new index entries are inserted faster than they are scanned, in case of READ UNCOMMITTED isolation mode it completes in a minute or so.

Final Thoughts: Using InnoDB Isolation modes correctly can help your application to get the best possible performance. Your mileage may vary and in some cases you will see no difference; in others it will be absolutely dramatic. There also seems to be a lot of work to be done in relationship to InnoDB performance with long version history. I hope it will be tackled in the future MySQL version.

The post MySQL performance implications of InnoDB isolation modes appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com