Nov
09
2021
--

What if … MySQL’s Repeatable Reads Cause You to Lose Money?

MySQL Repeatable Reads

MySQL Repeatable ReadsWell, let me say if that happens it’s because there is a logic mistake in your application. But you need to know and understand what happens in MySQL to be able to avoid the problem. 

In short, the WHY of this article is to inform you about possible pitfalls and how to prevent them from causing you damage.

Let us start by having a short introduction to what Repeatable reads are about. Given I am extremely lazy, I am going to use (a lot) existing documentation from the MySQL documentation.

Transaction isolation is one of the foundations of database processing. Isolation is the I in the acronym ACID; the isolation level is the setting that fine-tunes the balance between performance and reliability, consistency, and reproducibility of results when multiple transactions are making changes and performing queries at the same time.

InnoDB offers all four transaction isolation levels described by the SQL:1992 standard: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. The default isolation level for InnoDB is REPEATABLE READ.

  • REPEATABLE READ
    This is the default isolation level for InnoDB. Consistent reads within the same transaction read the snapshot established by the first read. This means that if you issue several plain (nonlocking) SELECT statements within the same transaction, these SELECT statements are consistent also with respect to each other.
  • READ COMMITTED
    Each consistent read, even within the same transaction, sets and reads its own fresh snapshot.

And about Consistent Non-Blocking reads:

A consistent read means that InnoDB uses multi-versioning to present to a query a snapshot of the database at a point in time. The query sees the changes made by transactions that committed before that point in time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query sees the changes made by earlier statements within the same transaction. This exception causes the following anomaly: If you update some rows in a table, a SELECT sees the latest version of the updated rows, but it might also see older versions of any rows. If other sessions simultaneously update the same table, the anomaly means that you might see the table in a state that never existed in the database.

Ok, but what does all this mean in practice? To understand, let us simulate this scenario:

I have a shop and I decide to grant a bonus discount to a selected number of customers that:

  • Have an active account to my shop
  • Match my personal criteria to access the bonus

My application is set to perform batch operations in a moment of less traffic and unattended. This includes reactivating dormant accounts that customers ask to reactivate.  

What we will do is to see what happens, by default, and then see what we can do to avoid pitfalls.

The Scenario

I will use three different connections to connect to the same MySQL 8.0.27 instance. The only relevant setting I have modified is the Innodb_timeout that I set to 50 seconds. 

  • Session 1 will simulate a process that should activate the bonus feature for the selected customers
  • Session 2 is an independent process that reactivates the given list of customers
  • Session 3 is used to collect lock information

For this simple test I will use the customer table in the sakila schema modified as below:

CREATE TABLE `customer` (
  `customer_id` smallint unsigned NOT NULL AUTO_INCREMENT,
  `store_id` tinyint unsigned NOT NULL,
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `email` varchar(50) DEFAULT NULL,
  `address_id` smallint unsigned NOT NULL,
  `active` tinyint(1) NOT NULL DEFAULT '1',
  `create_date` datetime NOT NULL,
  `last_update` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `bonus` int NOT NULL DEFAULT '0',
  `activate_bonus` varchar(45) NOT NULL DEFAULT '0',
  PRIMARY KEY (`customer_id`),
  KEY `idx_fk_store_id` (`store_id`),
  KEY `idx_fk_address_id` (`address_id`),
  KEY `idx_last_name` (`last_name`),
  KEY `idx_bonus` (`bonus`),
  CONSTRAINT `fk_customer_address` FOREIGN KEY (`address_id`) REFERENCES `address` (`address_id`) ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT `fk_customer_store` FOREIGN KEY (`store_id`) REFERENCES `store` (`store_id`) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=600 DEFAULT CHARSET=utf8mb4 COLLATE=ut

As you can see I have added the bonus and activate_bonus fields plus the idx_bonus index.

To be able to trace the locks, these are the threads ids by session:

session 1 17439
session 2 17430
session 3 17443

To collect the lock information:

SELECT 
    index_name, lock_type, lock_mode, lock_data, thread_id
FROM
    performance_schema.data_locks
WHERE
    object_schema = 'sakila'
        AND object_name = 'customer'
        AND lock_type = 'RECORD'
        AND thread_id IN (17439 , 17430)
ORDER BY index_name , lock_data DESC;

Ok, ready? Let’s start!

The Run…

While the following steps can be done in a more compressed way, I prefer to do it in a more verbose way, to make it more human-readable.

First, let us set the environment:

session1 >set transaction_isolation = 'REPEATABLE-READ';
Query OK, 0 rows affected (0.07 sec)
session1 >Start Transaction;
Query OK, 0 rows affected (0.07 sec)

Then let see the list of the customers we will modify:

session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|         383 |        1 | MARTIN     | BALES     |     1 | 0              |
|         539 |        1 | MATHEW     | BOLIN     |     1 | 0              |
|         441 |        1 | MARIO      | CHEATHAM  |     1 | 0              |
|         482 |        1 | MAURICE    | CRAWLEY   |     1 | 0              |
|         293 |        2 | MAE        | FLETCHER  |     1 | 0              |
|          38 |        1 | MARTHA     | GONZALEZ  |     1 | 0              |
|         444 |        2 | MARCUS     | HIDALGO   |     1 | 0              |
|         252 |        2 | MATTIE     | HOFFMAN   |     1 | 0              |
|         256 |        2 | MABEL      | HOLLAND   |     1 | 0              |
|         226 |        2 | MAUREEN    | LITTLE    |     1 | 0              |
|         588 |        1 | MARION     | OCAMPO    |     1 | 0              |
|         499 |        2 | MARC       | OUTLAW    |     1 | 0              |
|         553 |        1 | MAX        | PITT      |     1 | 0              |
|         312 |        2 | MARK       | RINEHART  |     1 | 0              |
|          80 |        1 | MARILYN    | ROSS      |     1 | 0              |
|         583 |        1 | MARSHALL   | THORN     |     1 | 0              |
|         128 |        1 | MARJORIE   | TUCKER    |     1 | 0              |
|          44 |        1 | MARIE      | TURNER    |     1 | 0              |
|         267 |        1 | MARGIE     | WADE      |     1 | 0              |
|         240 |        1 | MARLENE    | WELCH     |     1 | 0              |
|         413 |        2 | MARVIN     | YEE       |     1 | 0              |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)

As you can see we have 21 customers fitting our criteria. How much money is involved in this exercise?

session1 >SELECT
    ->     SUM(amount) income,
    ->     SUM(amount) * 0.90 income_with_bonus,
    ->    (SUM(amount) - (SUM(amount) * 0.90)) loss_because_bonus
    -> FROM
    ->     sakila.customer AS c
    ->         JOIN
    ->     sakila.payment AS p ON c.customer_id = p.customer_id
    ->     where active = 1 and bonus =1 ;
+---------+-------------------+--------------------+
| income  | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 2416.23 |         2174.6070 |           241.6230 |
+---------+-------------------+--------------------+

This exercise is going to cost me ~242 dollars. Keep this number in mind. What locks do I have at this point?

session3 >select index_name, lock_type, lock_mode,lock_data from performance_schema.data_locks where  object_schema = 'sakila' and object_name = 'customer' and lock_type = 'RECORD'  and
thread_id in (17439,17430) order by index_name, lock_data desc;
Empty set (0.00 sec)

The answer is none. 

Meanwhile, we have the other process that needs to reactivate the customers:

session2 >set transaction_isolation = 'REPEATABLE-READ';
Query OK, 0 rows affected (0.00 sec)

session2 >Start Transaction;
Query OK, 0 rows affected (0.00 sec)

session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 ;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|           1 |        1 | MARY       | SMITH     |     1 | 0              |
|           7 |        1 | MARIA      | MILLER    |     1 | 0              |
|           9 |        2 | MARGARET   | MOORE     |     1 | 0              |
|         178 |        2 | MARION     | SNYDER    |     1 | 0              |
|         236 |        1 | MARCIA     | DEAN      |     1 | 0              |
|         246 |        1 | MARIAN     | MENDOZA   |     1 | 0              |
|         254 |        2 | MAXINE     | SILVA     |     1 | 0              |
|         257 |        2 | MARSHA     | DOUGLAS   |     1 | 0              |
|         323 |        2 | MATTHEW    | MAHAN     |     1 | 0              |
|         408 |        1 | MANUEL     | MURRELL   |     1 | 0              |
+-------------+----------+------------+-----------+-------+----------------+
10 rows in set (0.00 sec)

In this case, the process needs to reactivate 10 users.

session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
Query OK, 10 rows affected (0.00 sec)
Rows matched: 10  Changed: 10  Warnings: 0
session2 >commit;
Query OK, 0 rows affected (0.00 sec)

All good, right? But before going ahead let’s double-check session 1:

session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|         383 |        1 | MARTIN     | BALES     |     1 | 0              |
|         539 |        1 | MATHEW     | BOLIN     |     1 | 0              |
|         441 |        1 | MARIO      | CHEATHAM  |     1 | 0              |
|         482 |        1 | MAURICE    | CRAWLEY   |     1 | 0              |
|         293 |        2 | MAE        | FLETCHER  |     1 | 0              |
|          38 |        1 | MARTHA     | GONZALEZ  |     1 | 0              |
|         444 |        2 | MARCUS     | HIDALGO   |     1 | 0              |
|         252 |        2 | MATTIE     | HOFFMAN   |     1 | 0              |
|         256 |        2 | MABEL      | HOLLAND   |     1 | 0              |
|         226 |        2 | MAUREEN    | LITTLE    |     1 | 0              |
|         588 |        1 | MARION     | OCAMPO    |     1 | 0              |
|         499 |        2 | MARC       | OUTLAW    |     1 | 0              |
|         553 |        1 | MAX        | PITT      |     1 | 0              |
|         312 |        2 | MARK       | RINEHART  |     1 | 0              |
|          80 |        1 | MARILYN    | ROSS      |     1 | 0              |
|         583 |        1 | MARSHALL   | THORN     |     1 | 0              |
|         128 |        1 | MARJORIE   | TUCKER    |     1 | 0              |
|          44 |        1 | MARIE      | TURNER    |     1 | 0              |
|         267 |        1 | MARGIE     | WADE      |     1 | 0              |
|         240 |        1 | MARLENE    | WELCH     |     1 | 0              |
|         413 |        2 | MARVIN     | YEE       |     1 | 0              |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)

Perfect! My repeatable read still sees the same snapshot. Let me apply the changes:

session1 >update sakila.customer set activate_bonus=1 where bonus = 1 and active =1 ;
Query OK, 31 rows affected (0.06 sec)
Rows matched: 31  Changed: 31  Warnings: 0

Wait, what? My list reports 21 entries, but I have modified 31! And if I check the cost:

session1 >SELECT
    ->     SUM(amount) income,
    ->     SUM(amount) * 0.90 income_with_bonus,
    ->    (SUM(amount) - (SUM(amount) * 0.90)) loss_because_bonus
    -> FROM
    ->     sakila.customer AS c
    ->         JOIN
    ->     sakila.payment AS p ON c.customer_id = p.customer_id
    ->     where active = 1 and bonus =1 ;
+---------+-------------------+--------------------+
| income  | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 3754.01 |         3378.6090 |           375.4010 |
+---------+-------------------+--------------------+

Well, now the cost of this operation is 375 dollars, not 242. In this case, we are talking about peanuts, but guess what could be if we do something similar on thousands of users. 

Anyhow, let us first:

session1 >rollback;
Query OK, 0 rows affected (0.08 sec)

And cancel the operation.

So what happened; is this a bug?

No, it is not! It is how repeatable reads work in MySQL. The snapshot is relevant to the read operation, but if another session is able to write given also the absence of lock at the moment of reads, the next update operation will touch any value in the table matching the conditions. 

As shown above, this can be very dangerous. But only if you don’t do the right things in the code. 

How Can I Prevent This From Happening?

When coding, hope for the best, plan for the worse, always! Especially when dealing with databases. That approach may save you from spending nights trying to fix the impossible. 

So how can this be prevented? You have two ways, both simple, but both with positive and negative consequences.

  1. Add this simple clausole to the select statement: for share
  2. Add the other simple clausole to the select statement: for update
  3. Use Read-committed

Solution one is easy and clean, with no other change in the code. BUT it creates locks, and if your application is lock sensitive this may be an issue for you.

Solution two is also easy, but a bit more encapsulated and locking. 

On the other hand, solution three does not add locks but requires modifications in the code and it still leaves some space for problems.

Let us see them in detail.

Solution One

Let us repeat the same steps:

session1 >set transaction_isolation = 'REPEATABLE-READ';
session1 >Start Transaction;
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name for share;

Now we check the locks (for brevity I have cut some entries and kept a few as a sample):

session3 >select index_name, lock_type, lock_mode,lock_data,thread_id from performance_schema.data_locks where
    -> object_schema = 'sakila'
    -> and object_name = 'customer'
    -> and lock_type = 'RECORD'
    -> and thread_id = 17439
    -> order by index_name, lock_data desc;
+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode     | lock_data              | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus  | RECORD    | S             | supremum pseudo-record |     17439 |
| idx_bonus  | RECORD    | S             | 1, 9                   |     17439 |
<snip>
| idx_bonus  | RECORD    | S             | 1, 128                 |     17439 |
| idx_bonus  | RECORD    | S             | 1, 1                   |     17439 |
| PRIMARY    | RECORD    | S,REC_NOT_GAP | 9                      |     17439 |
<snip>
| PRIMARY    | RECORD    | S,REC_NOT_GAP | 1                      |     17439 |
+------------+-----------+---------------+------------------------+-----------+
63 rows in set (0.00 sec)

This time we can see that the select has raised a few locks all in S (shared) mode. 

For brevity, I am skipping to report exactly the same results as in the first exercise.

If we go ahead and try with session two:

session2 >set transaction_isolation = 'READ-COMMITTED';
session2 >Start Transaction;
session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 ;
session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Here we go! Attempting to change the values in the customer table is now on hold waiting to acquire the lock. If the duration exceeds the Innodb_wait_timeout, then the execution is interrupted and the application must have a try-catch mechanism to retry the operation. This last one is a best practice that you should already have in place in your code. If not, well add it now!

At this point, session one can proceed and complete the operations. After that and before the final commit, we will be able to observe:

+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode     | lock_data              | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus  | RECORD    | S             | supremum pseudo-record |     17439 |
| idx_bonus  | RECORD    | X             | supremum pseudo-record |     17439 |
| idx_bonus  | RECORD    | S             | 1, 9                   |     17439 |
| idx_bonus  | RECORD    | X             | 1, 9                   |     17439 |
<snip>
| idx_bonus  | RECORD    | X             | 1, 1                   |     17439 |
| idx_bonus  | RECORD    | S             | 1, 1                   |     17439 |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 9                      |     17439 |
| PRIMARY    | RECORD    | S,REC_NOT_GAP | 9                      |     17439 |
<snip>|
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 1                      |     17439 |
| PRIMARY    | RECORD    | S,REC_NOT_GAP | 1                      |     17439 |
+------------+-----------+---------------+------------------------+-----------+
126 rows in set (0.00 sec)

As you can see we now have two different lock types, the Shared one from the select, and the exclusive lock (X) from the update.

Solution Two

In this case, we just change the kind of lock we set on the select. In solution one we opt for a shared lock, which allows other sessions to eventually acquire another shared lock. Here we go for an exclusive lock that will put all the other operations on hold.

session1 >set transaction_isolation = 'REPEATABLE-READ';
session1 >Start Transaction;
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name for UPDATE;

If now we check the locks (again, for brevity I have cut some entries and kept a few as a sample):

session3 >select index_name, lock_type, lock_mode,lock_data,thread_id from performance_schema.data_locks where
    -> object_schema = 'sakila'
    -> and object_name = 'customer'
    -> and lock_type = 'RECORD'
    -> and thread_id = 17439
    -> order by index_name, lock_data desc;
+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode     | lock_data              | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus  | RECORD    | X             | supremum pseudo-record |     17439 |
| idx_bonus  | RECORD    | X             | 1, 9                   |     17439 |
<snip>
| idx_bonus  | RECORD    | X             | 1, 128                 |     17439 |
| idx_bonus  | RECORD    | X             | 1, 1                   |     17439 |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 9                      |     17439 |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 80                     |     17439 |
<snip>
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 128                    |     17439 |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 1                      |     17439 |
+------------+-----------+---------------+------------------------+-----------+
63 rows in set (0.09 sec)

In this case, the kind of lock is X as exclusive, so the other operations when requesting a lock must wait for this transaction to complete.

session2 >#set transaction_isolation = 'READ-COMMITTED';
session2 >
session2 >Start Transaction;
Query OK, 0 rows affected (0.06 sec)

session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 for update;

And it will wait for N time where N is either the Innodb_lock_wait_timeout or the commit time from session one.

Note the lock request on hold:

+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode     | lock_data              | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus  | RECORD    | X             | supremum pseudo-record |     17439 |
<snip>
| idx_bonus  | RECORD    | X             | 1, 128                 |     17439 |
| idx_bonus  | RECORD    | X             | 1, 1                   |     17430 |<--
| idx_bonus  | RECORD    | X             | 1, 1                   |     17439 |
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 9                      |     17439 |
<snip>
| PRIMARY    | RECORD    | X,REC_NOT_GAP | 1                      |     17439 |
+------------+-----------+---------------+------------------------+-----------+
64 rows in set (0.09 sec)

Session two is trying to lock exclusively the idx_bonus but cannot proceed.

Once session one goes ahead, we have:

session1 >update sakila.customer set activate_bonus=1 where bonus = 1 and active =1 ;
Query OK, 0 rows affected (0.06 sec)
Rows matched: 21  Changed: 0  Warnings: 0

session1 >
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 ;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|          38 |        1 | MARTHA     | GONZALEZ  |     1 | 1              |
|          44 |        1 | MARIE      | TURNER    |     1 | 1              |
|          80 |        1 | MARILYN    | ROSS      |     1 | 1              |
|         128 |        1 | MARJORIE   | TUCKER    |     1 | 1              |
|         226 |        2 | MAUREEN    | LITTLE    |     1 | 1              |
|         240 |        1 | MARLENE    | WELCH     |     1 | 1              |
|         252 |        2 | MATTIE     | HOFFMAN   |     1 | 1              |
|         256 |        2 | MABEL      | HOLLAND   |     1 | 1              |
|         267 |        1 | MARGIE     | WADE      |     1 | 1              |
|         293 |        2 | MAE        | FLETCHER  |     1 | 1              |
|         312 |        2 | MARK       | RINEHART  |     1 | 1              |
|         383 |        1 | MARTIN     | BALES     |     1 | 1              |
|         413 |        2 | MARVIN     | YEE       |     1 | 1              |
|         441 |        1 | MARIO      | CHEATHAM  |     1 | 1              |
|         444 |        2 | MARCUS     | HIDALGO   |     1 | 1              |
|         482 |        1 | MAURICE    | CRAWLEY   |     1 | 1              |
|         499 |        2 | MARC       | OUTLAW    |     1 | 1              |
|         539 |        1 | MATHEW     | BOLIN     |     1 | 1              |
|         553 |        1 | MAX        | PITT      |     1 | 1              |
|         583 |        1 | MARSHALL   | THORN     |     1 | 1              |
|         588 |        1 | MARION     | OCAMPO    |     1 | 1              |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)

session1 >SELECT      SUM(amount) income,     SUM(amount) * 0.90 income_with_bonus,    (SUM(amount) - (SUM(amount) * 0.90)) loss_because_bonus FROM     sakila.customer AS c         JOIN     sakila.payment AS p ON c.customer_id = p.customer_id     where active = 1 and bonus =1;
+---------+-------------------+--------------------+
| income  | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 2416.23 |         2174.6070 |           241.6230 |
+---------+-------------------+--------------------+
1 row in set (0.06 sec)

Now my update matches the expectations and the other operations are on hold.

After session one commits:

session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 for update;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|           1 |        1 | MARY       | SMITH     |     1 | 1              |
|           7 |        1 | MARIA      | MILLER    |     1 | 1              |
|           9 |        2 | MARGARET   | MOORE     |     1 | 1              |
|         178 |        2 | MARION     | SNYDER    |     1 | 1              |
|         236 |        1 | MARCIA     | DEAN      |     1 | 1              |
|         246 |        1 | MARIAN     | MENDOZA   |     1 | 1              |
|         254 |        2 | MAXINE     | SILVA     |     1 | 1              |
|         257 |        2 | MARSHA     | DOUGLAS   |     1 | 1              |
|         323 |        2 | MATTHEW    | MAHAN     |     1 | 1              |
|         408 |        1 | MANUEL     | MURRELL   |     1 | 1              |
+-------------+----------+------------+-----------+-------+----------------+
10 rows in set (52.72 sec) <-- Note the time!

session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
Query OK, 10 rows affected (0.06 sec)
Rows matched: 10  Changed: 10  Warnings: 0

Session two is able to complete, BUT it was on hold for 52 seconds waiting for session one. As said, this solution is a good one if you can afford locks and waiting time.

Solution Three

In this case, we will use a different isolation model that will allow session one to see what session two has modified.

session1 >set transaction_isolation = 'READ-COMMITTED';
session1 >Start Transaction;
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|         383 |        1 | MARTIN     | BALES     |     1 | 0              |
|         539 |        1 | MATHEW     | BOLIN     |     1 | 0              |
|         441 |        1 | MARIO      | CHEATHAM  |     1 | 0              |
|         482 |        1 | MAURICE    | CRAWLEY   |     1 | 0              |
|         293 |        2 | MAE        | FLETCHER  |     1 | 0              |
|          38 |        1 | MARTHA     | GONZALEZ  |     1 | 0              |
|         444 |        2 | MARCUS     | HIDALGO   |     1 | 0              |
|         252 |        2 | MATTIE     | HOFFMAN   |     1 | 0              |
|         256 |        2 | MABEL      | HOLLAND   |     1 | 0              |
|         226 |        2 | MAUREEN    | LITTLE    |     1 | 0              |
|         588 |        1 | MARION     | OCAMPO    |     1 | 0              |
|         499 |        2 | MARC       | OUTLAW    |     1 | 0              |
|         553 |        1 | MAX        | PITT      |     1 | 0              |
|         312 |        2 | MARK       | RINEHART  |     1 | 0              |
|          80 |        1 | MARILYN    | ROSS      |     1 | 0              |
|         583 |        1 | MARSHALL   | THORN     |     1 | 0              |
|         128 |        1 | MARJORIE   | TUCKER    |     1 | 0              |
|          44 |        1 | MARIE      | TURNER    |     1 | 0              |
|         267 |        1 | MARGIE     | WADE      |     1 | 0              |
|         240 |        1 | MARLENE    | WELCH     |     1 | 0              |
|         413 |        2 | MARVIN     | YEE       |     1 | 0              |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)

Same result as before. Now let us execute commands in session two:

session2 >set transaction_isolation = 'READ-COMMITTED';
Query OK, 0 rows affected (0.00 sec)

session2 >
session2 >Start Transaction;
Query OK, 0 rows affected (0.00 sec)

session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 ;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
|           1 |        1 | MARY       | SMITH     |     1 | 0              |
|           7 |        1 | MARIA      | MILLER    |     1 | 0              |
|           9 |        2 | MARGARET   | MOORE     |     1 | 0              |
|         178 |        2 | MARION     | SNYDER    |     1 | 0              |
|         236 |        1 | MARCIA     | DEAN      |     1 | 0              |
|         246 |        1 | MARIAN     | MENDOZA   |     1 | 0              |
|         254 |        2 | MAXINE     | SILVA     |     1 | 0              |
|         257 |        2 | MARSHA     | DOUGLAS   |     1 | 0              |
|         323 |        2 | MATTHEW    | MAHAN     |     1 | 0              |
|         408 |        1 | MANUEL     | MURRELL   |     1 | 0              |
+-------------+----------+------------+-----------+-------+----------------+
10 rows in set (0.00 sec)
session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
Query OK, 10 rows affected (0.00 sec)
Rows matched: 10  Changed: 10  Warnings: 0

session2 >commit;
Query OK, 0 rows affected (0.00 sec)

All seems the same, but if we check again in session one:

session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+------+----------------+
| customer_id | store_id | first_name | last_name |bonus | activate_bonus |
+-------------+----------+------------+-----------+------+----------------+
|         383 |        1 | MARTIN     | BALES     |    1 | 1              |
|         539 |        1 | MATHEW     | BOLIN     |    1 | 1              |
|         441 |        1 | MARIO      | CHEATHAM  |    1 | 1              |
|         482 |        1 | MAURICE    | CRAWLEY   |    1 | 1              |
|         236 |        1 | MARCIA     | DEAN      |    1 | 0              |
|         257 |        2 | MARSHA     | DOUGLAS   |    1 | 0              |
|         293 |        2 | MAE        | FLETCHER  |    1 | 1              |
|          38 |        1 | MARTHA     | GONZALEZ  |    1 | 1              |
|         444 |        2 | MARCUS     | HIDALGO   |    1 | 1              |
|         252 |        2 | MATTIE     | HOFFMAN   |    1 | 1              |
|         256 |        2 | MABEL      | HOLLAND   |    1 | 1              |
|         226 |        2 | MAUREEN    | LITTLE    |    1 | 1              |
|         323 |        2 | MATTHEW    | MAHAN     |    1 | 0              |
|         246 |        1 | MARIAN     | MENDOZA   |    1 | 0              |
|           7 |        1 | MARIA      | MILLER    |    1 | 0              |
|           9 |        2 | MARGARET   | MOORE     |    1 | 0              |
|         408 |        1 | MANUEL     | MURRELL   |    1 | 0              |
|         588 |        1 | MARION     | OCAMPO    |    1 | 1              |
|         499 |        2 | MARC       | OUTLAW    |    1 | 1              |
|         553 |        1 | MAX        | PITT      |    1 | 1              |
|         312 |        2 | MARK       | RINEHART  |    1 | 1              |
|          80 |        1 | MARILYN    | ROSS      |    1 | 1              |
|         254 |        2 | MAXINE     | SILVA     |    1 | 0              |
|           1 |        1 | MARY       | SMITH     |    1 | 0              |
|         178 |        2 | MARION     | SNYDER    |    1 | 0              |
|         583 |        1 | MARSHALL   | THORN     |    1 | 1              |
|         128 |        1 | MARJORIE   | TUCKER    |    1 | 1              |
|          44 |        1 | MARIE      | TURNER    |    1 | 1              |
|         267 |        1 | MARGIE     | WADE      |    1 | 1              |
|         240 |        1 | MARLENE    | WELCH     |    1 | 1              |
|         413 |        2 | MARVIN     | YEE       |    1 | 1              |
+-------------+----------+------------+-----------+------+----------------+
31 rows in set (0.09 sec)

We now can see all the 31 customers and also the value calculation:

+---------+-------------------+--------------------+
| income  | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 3754.01 |         3378.6090 |           375.4010 |
+---------+-------------------+--------------------+

It is reporting the right values.

At this point, we can program some logic in the process to check the possible tolerance and have the process either complete performing the update operations or stop the process and exit.

This as mentioned requires additional coding and more logic.

To be fully honest, this solution still leaves space for some possible interference, but at least allows the application to be informed about what is happening. Still, it should be used only if you cannot afford to have a higher level of locking, but this is another story/article.

Conclusion

When writing applications that interact with an RDBMS, you must be very careful in what you do and HOW you do it. While using data facilitation layers like Object Relational Mapping (ORM), seems to make your life easier, in reality, you may lose control of a few crucial aspects of the application’s interaction. So be very careful when opting for how to access your data.

About the case reported above we can summarize a few pitfalls:

  1. First and most important, never ever have processes that may interfere with the one running at the same time. Be very careful when you design them and even more careful when planning their executions.  
  2. Use options such as for share or for update in your select statements. Use them carefully to avoid unuseful locks, but use them. 
  3. In case you have a long process that requires modifying data, and you need to be able to check if other processes have altered the status of the data, but at the same time, cannot have a long wait time for locks. Then use a mix, setting READ-COMMITTED as isolation, to allow your application to check, but also add things like for share or for update in the select statement, immediately before the DML and the commit. That will allow you to prevent writes while you are finalizing, and also to significantly reduce the locking time. 
  4. Keep in mind that long-running processes and long-running transactions can be the source of a lot of pain, especially when using REPEATABLE-READ. Try whenever you can to split the operations into small chunks. 

Finally, when developing, remember that DBAs are friends, and are there to help you to do things at your best. It may seem they are giving you a hard time, but that is because their point of view is different, focusing on data consistency, availability, and durability. But they can help you to save a lot of time after the code is released, especially when you try to identify why something has gone wrong. 

So involve them soon in the design process, use them and let them be part of the process.

References

https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html

https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html

https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-model.html

 

Complete the 2021 Percona Open Source Data Management Software Survey

Have Your Say!

Oct
24
2018
--

PostgreSQL locking, part 2: heavyweight locks

Locking in PostgreSQL

Locking in PostgreSQLPostgreSQL locking visibility for application developers and DBAs is in most cases related to heavyweight locks. Complex database locking operations require full instrumentation using views from the system catalog. It should be clear which object is locked by a specific database “backend” process. An alternative name for any lock is “bottleneck”. In order to make database operations parallel we should split a single bottleneck into multiple operation specific tasks.

This is the second part of three blog posts related to table level locks. The previous post was about row-level locks, and a subsequent post reviews the latches protecting internal database structures.

Example environment

A simple table with several rows:

CREATE TABLE locktest (c INT);
INSERT INTO locktest VALUES (1), (2);

Helper view

In order to check different types of these locks, let’s create a helper view as suggested by Bruce Momjian in his presentation:

CREATE VIEW lockview AS SELECT pid, virtualtransaction AS vxid, locktype AS lock_type,
mode AS lock_mode, granted,
CASE
WHEN virtualxid IS NOT NULL AND transactionid IS NOT NULL
THEN virtualxid || ' ' || transactionid
WHEN virtualxid::text IS NOT NULL
THEN virtualxid
ELSE transactionid::text                                                                                                                                                                                           END AS xid_lock, relname,
page, tuple, classid, objid, objsubid
FROM pg_locks LEFT OUTER JOIN pg_class ON (pg_locks.relation = pg_class.oid)
WHERE -- do not show our view’s locks
pid != pg_backend_pid() AND
-- no need to show self-vxid locks                                                                                                                                                                                 virtualtransaction IS DISTINCT FROM virtualxid
-- granted is ordered earlier
ORDER BY 1, 2, 5 DESC, 6, 3, 4, 7;

RowShareLock (ROW SHARE)

Many applications use the read-modify-write paradigm. For instance, the application fetches single object fields from a table, modifies the data, and saves the changes back to the database. In a multi-user environment, different users could modify the same rows in the course of this transaction. We can get inconsistent data with just a plain select. In response to user demands, almost all SQL databases have SELECT … FOR SHARE locking. This feature prevents the application entity from making data modifications until the locker transaction commits or rolls back.

For example:

  1. There is a user with multiple bank accounts stored in an accounts table, with total_amount stored in a bank_clients table.
  2. In order to update the total_amount field, we should prevent modification of all rows related to the specific bank client.
  3. It would be better to use a single update statement to calculate total_amount and select it from the accounts table. If the update requires external data, or some action from the user, then several statements are required
START TRANSACTION;
SELECT * FROM accounts WHERE client_id = 55 FOR SHARE;
SELECT * FROM bank_clients WHERE client_id=55 FOR UPDATE;
UPDATE bank_clients SET total_amount=38984.33, client_status='gold' WHERE client_id=55;
COMMIT;

The SELECT FOR SHARE statement creates a “RowShareLock” lock on the relation locktest.

Here’s exactly the same lock created with an SQL statement:

BEGIN;
LOCK TABLE locktest IN ROW SHARE MODE;

A single heavyweight RowShareLock is required regardless of the number of rows locked by the query.

This is illustrated with an unfinished transaction in the following example. Start the unfinished transaction, and select from lockview in a second connection to the database:

BEGIN;
SELECT * FROM locktest FOR SHARE;
-- In second connection:
postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname FROM lockview;
  pid  | vxid |   lock_type   |   lock_mode   | granted | xid_lock | relname
-------+------+---------------+---------------+---------+----------+----------
 21144 | 3/13 | transactionid | ExclusiveLock | t       | 586      |
 21144 | 3/13 | relation      | RowShareLock  | t       |          | locktest

RowExclusiveLock (ROW EXCLUSIVE)

Real queries that modify rows also require heavyweight locks on tables, one per table.

The next example uses a DELETE query, but an UPDATE will have the same effect.

All commands that modify data in a table obtain a ROW EXCLUSIVE lock.

BEGIN;
DELETE FROM locktest;
-- second connection
postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname FROM lockview;
  pid  | vxid |   lock_type   |    lock_mode     | granted | xid_lock | relname
-------+------+---------------+------------------+---------+----------+----------
 10997 | 3/6  | transactionid | ExclusiveLock    | t       | 589      |
 10997 | 3/6  | relation      | RowExclusiveLock | t       |          | locktest

This new lock is incompatible with the previous FOR SHARE example.

SELECT * FROM locktest FOR SHARE

  waits for the delete transaction to finish or abort:

postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname,page,tuple FROM lockview;
  pid  | vxid |   lock_type   |    lock_mode     | granted | xid_lock | relname  | page | tuple
-------+------+---------------+------------------+---------+----------+----------+------+-------
 10997 | 3/6  | transactionid | ExclusiveLock    | t       | 589      |          |      |
 10997 | 3/6  | relation      | RowExclusiveLock | t       |          | locktest |      |
 11495 | 5/9  | relation      | RowShareLock     | t       |          | locktest |      |
 11495 | 5/9  | tuple         | RowShareLock     | t       |          | locktest |    0 |     1
 11495 | 5/9  | transactionid | ShareLock        | f       | 589      |          |      |

Queries modifying table content also lock all indexes, even if the index does not contain modified fields.

-- preparation
CREATE INDEX c_idx2 ON locktest (c);
ALTER TABLE locktest ADD COLUMN c2 INT;
CREATE INDEX c2_idx ON locktest(c2);
-- unfinished example transaction
BEGIN;
UPDATE locktest SET c=3 WHERE c=1;
-- second connection
postgres=# SELECT * FROM lockview;
 pid  |  vxid  | lock_type  |    lock_mode     | granted | xid_lock | relname  | page | tuple | classid | objid | objsubid
------+--------+------------+------------------+---------+----------+----------+------+-------+---------+-------+----------
 3998 | 3/7844 | virtualxid | ExclusiveLock    | t       | 3/7844   |          |      |       |         |       |
 3998 | 3/7844 | relation   | RowExclusiveLock | t       |          | c2_idx   |      |       |         |       |
 3998 | 3/7844 | relation   | RowExclusiveLock | t       |          | c_idx    |      |       |         |       |
 3998 | 3/7844 | relation   | RowExclusiveLock | t       |          | c_idx2   |      |       |         |       |
 3998 | 3/7844 | relation   | RowExclusiveLock | t       |          | locktest |      |       |         |       |

ShareLock (SHARE)

The non-concurrent version of CREATE INDEX prevents table updates, e.g. DROP TABLE or INSERT or DELETE, with ShareLock.

BEGIN;
CREATE INDEX c_idx ON locktest (c);
-- second connection
postgres=# SELECT * FROM lockview;
 pid  |  vxid  |   lock_type   |      lock_mode      | granted | xid_lock | relname  | page | tuple | classid | objid | objsubid
------+--------+---------------+---------------------+---------+----------+----------+------+-------+---------+-------+----------
 3998 | 3/7835 | virtualxid    | ExclusiveLock       | t       | 3/7835   |          |      |       |         |       |
 3998 | 3/7835 | transactionid | ExclusiveLock       | t       | 564      |          |      |       |         |       |
 3998 | 3/7835 | relation      | AccessExclusiveLock | t       |          |          |      |       |         |       |
 3998 | 3/7835 | relation      | ShareLock           | t       |          | locktest |      |       |         |       |

You can execute multiple CREATE INDEX queries in parallel unless the index name is exactly the same. The wait happens on the row lock (ShareLock with “transactionid” type) in the pg_class table.

Note that there is also AccessExclusiveLock lock with type “relation”, but it’s not a table level one.

ShareUpdateExclusiveLock (SHARE UPDATE EXCLUSIVE)

These database maintenance operations need to take a ShareUpdateExclusiveLock:

  • ANALYZE table
  • VACUUM (without full) runs.
  • CREATE INDEX CONCURRENTLY

The 

ANALYZE tablename;

  statement updates table statistics. The query planner/optimizer is able to provide the best plans for query execution only if the statistics are up to date.

BEGIN;
ANALYZE locktest;
-- in second connection
postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname FROM lockview;
  pid  | vxid |   lock_type   |        lock_mode         | granted | xid_lock | relname
-------+------+---------------+--------------------------+---------+----------+----------
 10997 | 3/7  | transactionid | ExclusiveLock            | t       | 591      |
 10997 | 3/7  | relation      | ShareUpdateExclusiveLock | t       |          | locktest

There is no conflict between RowExclusiveLock and ShareUpdateExclusiveLock. UPDATE/DELETE/INSERT could still modify rows during ANALYZE.

VACUUM and CREATE INDEX CONCURRENTLY can be executed only outside a transaction. To see the effects of these statements in lockview, execute a conflicting transaction first e.g. run ANALYZE in a transaction, or run VACUUM against a huge table.

CREATE INDEX CONCURRENTLY locking can be confusing. SHARE UPDATE EXCLUSIVE lock does not conflict with a ROW EXCLUSIVE lock that’s used for DELETES, INSERT and UPDATES. Unfortunately, CREATE INDEX CONCURRENTLY waits until active transactions are finished twice due to full table scans:

In a concurrent index build, the index is actually entered into the system catalogs in one transaction, then two table scans occur in two more transactions. Before each table scan, the index build must wait for existing transactions that have modified the table to terminate.” PostgreSQL Documentation

AccessExclusiveLock  (ACCESS EXCLUSIVE)

This lock conflicts with any other locks and is used by these statements:

  • CREATE RULE
  • DROP TABLE
  • DROP INDEX
  • TRUNCATE
  • VACUUM FULL
  • LOCK TABLE (default mode)
  • CLUSTER
  • REINDEX
  • REFRESH MATERIALIZED VIEW (without CONCURRENTLY)
BEGIN;
CREATE RULE r_locktest AS ON INSERT TO locktest DO INSTEAD NOTHING;
-- second connection
postgres=# select pid,vxid,lock_type,lock_mode,granted,xid_lock,relname from lockview;
  pid  | vxid |   lock_type   |      lock_mode      | granted | xid_lock | relname
-------+------+---------------+---------------------+---------+----------+----------
 10997 | 3/19 | transactionid | ExclusiveLock       | t       | 596      |
 10997 | 3/19 | relation      | AccessExclusiveLock | t       |          | locktest

More importantly, drop index requires access exclusive locks for both table and index:

BEGIN;
DROP INDEX c_idx;
-- second connection
postgres=# SELECT * FROM lockview;
 pid  |  vxid  |   lock_type   |      lock_mode      | granted | xid_lock | relname  | page | tuple | classid | objid | objsubid
------+--------+---------------+---------------------+---------+----------+----------+------+-------+---------+-------+----------
 3998 | 3/7839 | virtualxid    | ExclusiveLock       | t       | 3/7839   |          |      |       |         |       |
 3998 | 3/7839 | transactionid | ExclusiveLock       | t       | 569      |          |      |       |         |       |
 3998 | 3/7839 | relation      | AccessExclusiveLock | t       |          | c_idx    |      |       |         |       |
 3998 | 3/7839 | relation      | AccessExclusiveLock | t       |          | locktest |      |       |         |       |

Note: This is the most dangerous type of lock. Avoid running queries requiring access exclusive lock in production, or at least put the application in maintenance mode.

ExclusiveLock

Meanwhile, SQL commands don’t use ExclusiveLock, except for with the general LOCK TABLE statement. This lock prevents all requests except for a non-locking select (i.e. without FOR SHARE/UPDATE).

BEGIN;
LOCK TABLE locktest IN EXCLUSIVE MODE;
-- second connection
postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname FROM lockview;
  pid  | vxid | lock_type |   lock_mode   | granted | xid_lock | relname
-------+------+-----------+---------------+---------+----------+----------
 10997 | 3/21 | relation  | ExclusiveLock | t       |          | locktest

Savepoints

Savepoint produces an additional ExclusiveLock of transactionid type with new xid value.

BEGIN;
SELECT * FROM locktest FOR SHARE;
SAVEPOINT s1;
SELECT * FROM locktest FOR UPDATE;
-- second connection
postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname FROM lockview;
  pid  | vxid |   lock_type   |    lock_mode    | granted | xid_lock | relname
-------+------+---------------+-----------------+---------+----------+----------
 10997 | 3/37 | transactionid | ExclusiveLock   | t       | 602      |
 10997 | 3/37 | transactionid | ExclusiveLock   | t       | 603      |
 10997 | 3/37 | relation      | AccessShareLock | t       |          | c_idx
 10997 | 3/37 | relation      | RowShareLock    | t       |          | locktest

pg_advisory_lock

Sometimes application developers require synchronization between processes. In such systems, the application creates and removes locks frequently. Systems with a row-based locks implementation tend to cause table bloat.

There are many functions related to advisory locks:

  • per session or per transaction
  • wait if lock is not available or immediately return false
  • exclusive or shared
  • 64-bit or two 32-bit integers resource identifiers

Imagine that we have several cron jobs and that the application should prevent simultaneous runs of the same script. Next, each script can check if a lock is available in PostgreSQL for specific integer job identifier:

postgres=# SELECT pg_try_advisory_lock(10);
 pg_try_advisory_lock
----------------------
 t
-- second connection
postgres=# SELECT * FROM lockview;
 pid  | vxid | lock_type |   lock_mode   | granted | xid_lock | relname | page | tuple | classid | objid | objsubid
------+------+-----------+---------------+---------+----------+---------+------+-------+---------+-------+----------
 3998 | 3/0  | advisory  | ExclusiveLock | t       |          |         |      |       |       0 |    10 |        1
-- other connections
SELECT pg_try_advisory_lock(10);
 pg_try_advisory_lock
----------------------
 f

The query produces ExclusiveLock with type advisory.

Deadlocks

Any system with multiple locks tends to have a deadlock situation when queries will never finish. The only way to resolve such issues: kill one of blocked statements. More importantly, deadlock detection is an expensive procedure in PostgreSQL. A check for deadlock only happens when a transaction is locked for deadlock_timeout milliseconds—after one second by default.

Here is an illustration of a deadlock situation for two different connections A and B:

Any deadlock starting from lock wait.

A: BEGIN; SELECT c FROM locktest WHERE c=1 FOR UPDATE;
B: BEGIN; SELECT c FROM locktest WHERE c=2 FOR UPDATE; SELECT c FROM locktest WHERE c=1 FOR UPDATE;

You are not alone with the identification of deadlocks, as the pg_stat_activity system view helps you to find statements and transactions causing lock waits:

postgres=# SELECT pg_stat_activity.pid AS pid,
query, wait_event, vxid, lock_type,
lock_mode, granted, xid_lock
FROM lockview JOIN pg_stat_activity ON (lockview.pid = pg_stat_activity.pid);
  pid  |          query             |  wait_event   | vxid |   lock_type   |      lock_mode      | granted | xid_lock
-------+----------------------------+---------------+------+---------------+---------------------+---------+----------
 10997 | SELECT ... c=1 FOR UPDATE; | ClientRead    | 3/43 | transactionid | ExclusiveLock       | t       | 605
 10997 | SELECT ... c=1 FOR UPDATE; | ClientRead    | 3/43 | advisory      | ExclusiveLock       | t       |
 10997 | SELECT ... c=1 FOR UPDATE; | ClientRead    | 3/43 | relation      | AccessShareLock     | t       |
 10997 | SELECT ... c=1 FOR UPDATE; | ClientRead    | 3/43 | relation      | RowShareLock        | t       |
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | transactionid | ExclusiveLock       | t       | 606
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | advisory      | ExclusiveLock       | t       |
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | relation      | AccessShareLock     | t       |
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | relation      | RowShareLock        | t       |
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | tuple         | AccessExclusiveLock | t       |
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | transactionid | ShareLock           | f       | 605

The SELECT FOR UPDATE on c=2 row causes a deadlock:

SELECT c FROM locktest WHERE c=2 FOR UPDATE;

Afterwards, PostgreSQL reports in server log:

2018-08-02 08:46:07.793 UTC [10997] ERROR:  deadlock detected
2018-08-02 08:46:07.793 UTC [10997] DETAIL:  Process 10997 waits for ShareLock on transaction 606; blocked by process 11495.
Process 11495 waits for ShareLock on transaction 605; blocked by process 10997.
Process 10997: select c from locktest where c=2 for update;
Process 11495: select c from locktest where c=1 for update;
2018-08-02 08:46:07.793 UTC [10997] HINT:  See server log for query details.
2018-08-02 08:46:07.793 UTC [10997] CONTEXT:  while locking tuple (0,3) in relation "locktest"
2018-08-02 08:46:07.793 UTC [10997] STATEMENT:  SELECT c FROM locktest WHERE c=2 FOR UPDATE;
ERROR:  deadlock detected
DETAIL:  Process 10997 waits for ShareLock on transaction 606; blocked by process 11495.
Process 11495 waits for ShareLock on transaction 605; blocked by process 10997.
HINT:  See server log for query details.
CONTEXT:  while locking tuple (0,3) in relation "locktest"

As you can see, the database server aborts one blocked transaction automatically.

Multi-way deadlocks

Normally there are just two transactions creating deadlocks. However, in complex cases, an application could caused deadlocks with multiple transactions forming a dependency circle.

A: locks row1, B locks row2, C locks row3

Step2

A: trying to get row3, B: trying to get row1, C: trying to get row2

Summary

  • Do not put DDL statements in long transactions.
  • Please avoid DDL on during high load for frequently updated tables
  • CLUSTER command requires exclusive access to the table and all it’s indexes
  • Monitor postgresql log for deadlock-related messages

Photo by shy sol from Pexels

Oct
16
2018
--

PostgreSQL locking, Part 1: Row Locks

PostgreSQL row level locks

row signing with postgresqlAn understanding of PostgreSQL locking is important to build scalable applications and avoid downtime. Modern computers and servers have many CPU cores and it’s possible to execute multiple queries in parallel. Databases containing many consistent structures with changes made by queries or background processes running in parallel could crash a database or even corrupt data. Thus we need the ability to prevent access from concurrent processes, while changing shared memory structures or rows. One thread updates the structure while all others wait (exclusive lock), or multiple threads read the structure and all writes wait. The side effect of waits is a locking contention and server resources waste. Thus it’s important to understand why waits happen and what locks are involved. In this article, I review PostgreSQL row level locking.

In follow up posts, I will investigate table-level locks and latches protecting internal database structures.

Row locks – an overview

PostgreSQL has many locks at different abstraction levels. The most important locks for applications are related to MVCC implementation – row level locking. In second place – locks appearing during maintenance tasks (during backups/database migrations schema changes) – table level locking. It’s also possible—but rare—to see waits on low level PostgreSQL locks. More often there is a high CPU usage, with many concurrent queries running, but overall server performance reduced in comparison with normal number of queries running in parallel.

Example environment

To follow along, you need a PostgreSQL server with a single-column table containing several rows:

postgres=# CREATE TABLE locktest (c INT);
CREATE TABLE
postgres=# INSERT INTO locktest VALUES (1), (2);
INSERT 0 2

Row locks

Scenario: two concurrent transactions are trying to select a row for update.

PostgreSQL uses row-level locking in this case. Row level locking is tightly integrated with MVCC implementation, and uses hidden xmin and xmax fields.

xmin

 and

xmax

 store the transaction id. All statements requiring row-level locks modify the xmax field (even SELECT FOR UPDATE). The modification happens after the query returns its results, so in order to see xmax change we need to run SELECT FOR UPDATE twice. Usually, the xmax field is used to mark a row as expired—either removed by some transaction completely or in favor of updated row version—but it also used for row-level locking infrastructure.

If you need more details about the xmin and xmax hidden fields and MVCC implementation, please check our “Basic Understanding of Bloat and VACUUM in PostgreSQL” blog post.

postgres=# BEGIN;
postgres=# SELECT xmin,xmax, txid_current(), c FROM locktest WHERE c=1 FOR UPDATE;
BEGIN
 xmin | xmax | txid_current | c
------+------+--------------+---
  579 |  581 |          583 | 1
(1 row)
postgres=# SELECT xmin,xmax, txid_current(), c FROM locktest WHERE c=1 FOR UPDATE;
 xmin | xmax | txid_current | c
------+------+--------------+---
  579 |  583 |          583 | 1
(1 row)

If a statement is trying to to modify the same row, it checks the list of unfinished transactions. The statement has to wait for modification until the transaction with id=xmax is finished.

There is no infrastructure for waiting on a specific row, but a transaction can wait on transaction id.

-- second connection
SELECT xmin,xmax,txid_current() FROM locktest WHERE c=1 FOR UPDATE;

The SELECT FOR UPDATE query running in the second connection is unfinished, and waiting for the first transaction to complete.

pg_locks

Such waits and locks can be seen by querying pg_locks:

postgres=# SELECT locktype,transactionid,virtualtransaction,pid,mode,granted,fastpath
postgres-#  FROM pg_locks WHERE transactionid=583;
   locktype    | transactionid | virtualtransaction |  pid  |     mode      | granted | fastpath
---------------+---------------+--------------------+-------+---------------+---------+----------
 transactionid |           583 | 4/107              | 31369 | ShareLock     | f       | f
 transactionid |           583 | 3/11               | 21144 | ExclusiveLock | t       | f

You can see the writer transaction id for locktype=transactionid == 583. Let’s get the pid and backend id for the holding lock:

postgres=# SELECT id,pg_backend_pid() FROM pg_stat_get_backend_idset() AS t(id)
postgres-#  WHERE pg_stat_get_backend_pid(id) = pg_backend_pid();
 id | pg_backend_pid
----+----------------
  3 |          21144

This backend has its lock granted (t). Each backend has an OS process identifier (PID) and internal PostgreSQL identifier (backend id). PostgreSQL can process many transactions, but locking can happen only between backends, and each backend executes a single transaction. Internal bookkeeping requires just a virtual transaction identifier: a pair of backend ids and a sequence number inside the backend.

Regardless of the number of rows locked, PostgreSQL will have only a single related lock in the pg_locks table. Queries might modify billions of rows but PostgreSQL does not waste memory for redundant locking structures.

A writer thread sets ExclusiveLock on its transactionid. All row level lock waiters set ShareLock. The lock manager resumes all previously locked backend locks as soon as the writer releases the lock.

Lock release for transactionid occurs on commit or rollback.

pg_stat_activity

Another great method to get locking-related details is to select from the pg_stat_activity table:

postgres=# SELECT pid,backend_xid,wait_event_type,wait_event,state,query FROM pg_stat_activity WHERE pid IN (31369,21144);
-[ RECORD 1 ]---+---------------------------------------------------------------------------------------------------------------------------
pid             | 21144
backend_xid     | 583
wait_event_type | Client
wait_event      | ClientRead
state           | idle in transaction
query           | SELECT id,pg_backend_pid() FROM pg_stat_get_backend_idset() AS t(id) WHERE pg_stat_get_backend_pid(id) = pg_backend_pid();
-[ RECORD 2 ]---+---------------------------------------------------------------------------------------------------------------------------
pid             | 31369
backend_xid     | 585
wait_event_type | Lock
wait_event      | transactionid
state           | active
query           | SELECT xmin,xmax,txid_current() FROM locktest WHERE c=1 FOR UPDATE;

Source code-level investigation

Let’s check the stack trace for the waiter with gdb and the pt-pmp tool:

# pt-pmp -p 31369
Sat Jul 28 10:10:25 UTC 2018
30	../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
      1 epoll_wait,WaitEventSetWaitBlock,WaitEventSetWait,WaitLatchOrSocket,WaitLatch,ProcSleep,WaitOnLock,LockAcquireExtended,LockAcquire,XactLockTableWait,heap_lock_tuple,ExecLockRows,ExecProcNode,ExecutePlan,standard_ExecutorRun,PortalRunSelect,PortalRun,exec_simple_query,PostgresMain,BackendRun,BackendStartup,ServerLoop,PostmasterMain,main

The WaitOnLock function is causing the wait. The function is located in lock.c file (POSTGRES primary lock mechanism).

A lock table is a shared memory hash table. The conflicting process sleeps for the lock in storage/lmgr/proc.c. For the most part, this code should be invoked via lmgr.c or another lock-management module, not directly.

Next, locks listed in pg_stat_activity as “Lock” are also called heavyweight locks, and controlled by Lock Manager. HWLocks are also used for many high level actions.

By the way, a full description can be found here: https://www.postgresql.org/docs/current/static/explicit-locking.html

Summary

  • Avoid long running transactions modifying frequently updated rows or too many rows
  • Next, do not use hotspots (single row or multiple rows updated in parallel by many application client connections) with MVCC databases. This kind of workload is more suitable for in-memory databases and can usually be separated from the main business logic.
Sep
25
2017
--

Avoid Shared Locks from Subqueries When Possible

Shared Locks

Shared LocksIn this blog post, we’ll look at how to avoid shared locks from subqueries.

I’m pretty sure most of you have seen an UPDATE statement matching rows returned from a SELECT query:

update ibreg set k=1 where id in (select id from ibcmp where id > 90000);

This query, when executed with

autocommit=1

, is normally harmless. However, this can have bad effects when combined with other statements in the same transaction that result in holding the shared locks from the SELECT query. But first, let me explain why the SELECT query would hold locks in the first place.

Due to InnoDB’s ACID properties, to make sure that the outer UPDATE statement has a consistent view of the matching rows from the SELECT query the server has to acquire a shared lock on those rows. No other thread should modify those matching rows to maintain consistency within the transaction. To demonstrate, let’s take two transactions executed in specific order below:

mysql1> begin;
mysql1> update ibreg set k=1 where id in (select id from ibcmp where id > 90000);
mysql2> begin;
mysql2> delete from ibcmp where id > 90000;

By the time the second session executes, it will be in a LOCK WAIT state (as confirmed from INFORMATION_SCHEMA):

mysql1> select * from information_schema.innodb_trx G
*************************** 1. row ***************************
                    trx_id: 3932449
                 trx_state: LOCK WAIT
               trx_started: 2017-09-06 00:20:05
     trx_requested_lock_id: 3932449:13:1354:31
          trx_wait_started: 2017-09-06 00:20:05
                trx_weight: 2
       trx_mysql_thread_id: 9
                 trx_query: delete from test.ibcmp where id > 90000
       trx_operation_state: starting index read
...
mysql1> select * from information_schema.innodb_locks G
*************************** 1. row ***************************
    lock_id: 3932449:13:1354:31
lock_trx_id: 3932449
  lock_mode: X
  lock_type: RECORD
 lock_table: `test`.`ibcmp`
 lock_index: PRIMARY
 lock_space: 13
  lock_page: 1354
   lock_rec: 31
  lock_data: 90001
*************************** 2. row ***************************
    lock_id: 3932174:13:1354:31
lock_trx_id: 3932174
  lock_mode: S
  lock_type: RECORD
 lock_table: `test`.`ibcmp`
 lock_index: PRIMARY
 lock_space: 13
  lock_page: 1354
   lock_rec: 31
  lock_data: 90001

Information_Schema.INNODB_LOCKS confirms that our first transaction has held a shared lock on the rows that matched the SELECT queries from the first transaction. This can be bad for a number of reasons:

  1. As the number of rows that matches the SELECT grows, DEADLOCK and lock wait timeouts can become more frequent
  2. As a consequence of this, ROLLBACKs would also increase (and are expensive operations)
  3. Your users can become unhappy, especially if it is not handled gracefully from the application

If you really need the consistency of the view between the table being read from and the table getting updated, the lock is necessary and unavoidable. Avoiding the deadlocks and lock wait timeouts can be minimized, but not totally avoided.

On the other hand, if you’re not worried about view consistency, there are two ways you can avoid such problems: by using variables or making sure the SELECT becomes a transient read inside the transaction (i.e., by dumping the results into an OUTFILE).

mysql1> begin;
mysql1> select group_concat(id) into @ids from ibcmp where id > 90000;
mysql1> update ibreg set k=1 where id in (@ids);
mysql2> begin;
mysql2> delete from ibcmp where iid > 90000;

The first method is bound by the

group_concat_max_len

 variable. If you think you will only have a few resulting IDs that fit into

group_concat_max_len

, this is a good solution.

mysql1> begin;
mysql1> select id into outfile '/tmp/id.csv' from ibcmp where id > 90000;
mysql1> create temporary table t (id int unsigned not null) engine=innodb;
mysql1> load data infile '/tmp/id.csv' into table t;
mysql1> update ibreg inner join t on ibreg.id = t.id;
mysql2> begin;
mysql2> delete from ibcmp where id > 90000;

The second approach is only meant to overcome the limitation of the GROUP_CONCAT method.

Again, these two approaches only work if you do not care if the result of the SELECT queries changes on the other table between the BEGIN statement and UPDATE within the transaction.

May
11
2017
--

MyRocks and LOCK IN SHARE MODE

LOCK IN SHARE MODE

LOCK IN SHARE MODEIn this blog post, we’ll look at MyRocks and the

LOCK IN SHARE MODE

.

Those who attended the March 30th webinar “MyRocks Troubleshooting” might remember our discussion with Yoshinori on 

LOCK IN SHARE MODE

.

I did more tests, and I can confirm that his words are true:

LOCK IN SHARE MODE

 works in MyRocks.

This quick example demonstrates this. The initial setup:

CREATE TABLE t (
id int(11) NOT NULL,
f varchar(100) DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=ROCKSDB;
insert into t values(12345, 'value1'), (54321, 'value2');

In session 1:

session 1> begin;
Query OK, 0 rows affected (0.00 sec)
session 1> select * from t where id=12345 lock in share mode;
+-------+--------+
| id | f |
+-------+--------+
| 12345 | value1 |
+-------+--------+
1 row in set (0.01 sec)

In session 2:

session 2> begin;
Query OK, 0 rows affected (0.00 sec)
session 2> update t set f='value3' where id=12345;
ERROR HY000: Lock wait timeout exceeded; try restarting transaction

However, in the webinar I wanted to remind everyone about the differences between

LOCK IN SHARE MODE

  and

FOR UPDATE

. To do so, I added the former to my “session 2” test for the webinar. Once I did, it ignores the lock set in “session 1”. I can update a row and commit:

session 2> select * from t where id=12345 lock in share mode;
+-------+--------+
| id    | f      |
+-------+--------+
| 12345 | value1 |
+-------+--------+
1 row in set (0.00 sec)
session 2> update t set f='value3' where id=12345;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
session 2> commit;
Query OK, 0 rows affected (0.02 sec)

I reported this behavior here, and also at Percona Jira bugs database: MYR-107. In Facebook, this bug is already fixed.

This test clearly demonstrates that it is fixed in Facebook. In “session 1”:

session1> CREATE TABLE `t` (
    -> `id` int(11) NOT NULL,
    -> `f` varchar(100) DEFAULT NULL,
    -> PRIMARY KEY (`id`)
    -> ) ENGINE=ROCKSDB;
Query OK, 0 rows affected (0.00 sec)
session1> insert into t values(12345, 'value1'), (54321, 'value2');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0
session1> begin;
Query OK, 0 rows affected (0.00 sec)
session1>  select * from t where id=12345 lock in share mode;
+-------+--------+
| id    | f      |
+-------+--------+
| 12345 | value1 |
+-------+--------+
1 row in set (0.00 sec)

And now in another session:

session2> begin;
Query OK, 0 rows affected (0.00 sec)
session2> select * from t where id=12345 lock in share mode;
+-------+--------+
| id    | f      |
+-------+--------+
| 12345 | value1 |
+-------+--------+
1 row in set (0.00 sec)
session2> update t set f='value3' where id=12345;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction: Timeout on index: test.t.PRIMARY

If you want to test the fix with the Facebook MySQL build, you need to update submodules to download the patch:

git submodule update

.

Jun
17
2016
--

InnoDB locks and transaction isolation level

InnoDB locks and transaction isolation

InnoDB locks and transaction isolationWhat is the difference between InnoDB locks and transaction isolation level? We’ll discuss it in this post.

Recently I received a question from a user about one of my earlier blog posts. Since it wasn’t sent as a comment, I will answer it here. The question:

> I am reading your article:
> https://www.percona.com/resources/technical-presentations/troubleshooting-locking-issues-percona-mysql-webinar

> Full table scan locks whole table.

> Some bad select (read) query can do full table scan on InnoDB, does it lock whole table please?

> My understanding was that SELECT (read) blocks another DML only in MyISAM.

To answer this question, we to need understand two different concepts: locking and transaction isolation level.

In the default transaction isolation mode for InnoDB, 

REPEATABLE READ

 and the lower

TRANSACTION ISOLATION

levels,

SELECT

  doesn’t block any DML unless it uses 

SELECT ... FOR UPDATE

 or

SELECT ... LOCK IN SHARE MODE

. On slide 20 (31) of my presentation, I use the 

SELECT ... FOR UPDATE

 statement (this is why a lock is seen).

However, if transaction isolation mode is

SERIALIZABLE

, then 

SELECT

 can block updates. You can see this in the example below:

mysql1> set transaction isolation level serializable;
Query OK, 0 rows affected (0,00 sec)
mysql1> begin;
Query OK, 0 rows affected (0,00 sec)
mysql1> select * from employees join titles using(emp_no);
c936e6fc4c6cbaf77679ba5013339dff  -
443308 rows in set (29,69 sec)
mysql2> begin;
Query OK, 0 rows affected (0,00 sec)
mysql2> update titles set title='Engineer' where emp_no=15504;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Regarding the different level locks set by InnoDB, I recommend that you follow the links on slide 37 (67) of the presentation, especially “InnoDB Locking Explained With Stick Figures” by Bill Karwin. Innodb table locks are also explained very well in this blog post by Peter Zaitsev.

Sep
20
2010
--

Instrumentation and the cost of Foreign Keys

I occasionally get in to light arguments healthy discussions with students about whether or not to use Foreign Key constraints on InnoDB tables.  My standard response has always been: “it depends on how much of a tradeoff you are willing to make for performance. In some situations the cost can be considerable”.

.. that’s when they expect me to “come up with some real proof” to show them. I do not disagree with their logic or proof being on their list-of-demands.  I support the use of data to make decisions.

The problem is that MySQL has (traditionally) been lacking the instrumentation required to make these decisions easy.  This is getting better  – here is an example we recently added to our InnoDB course:

 CREATE TABLE parent (
 id INT NOT NULL AUTO_INCREMENT,
 bogus_column char(32),
 PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE child (
 id INT NOT NULL AUTO_INCREMENT,
 parent_id INT NOT NULL,
 bogus_column char(32),
 PRIMARY KEY (id),
 KEY (parent_id),
 CONSTRAINT child_ibfk_1 FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE=InnoDB;

INSERT INTO parent (bogus_column)
VALUES ('aaa'), ('bbb'), ('ccc'), ('ddd'), ('eee');

INSERT INTO child (parent_id,bogus_column) VALUES
(1, 'aaa'), (2,'bbb'), (3, 'ccc'),
(4, 'ddd'), (5, 'eee');

START TRANSACTION; # session1
START TRANSACTION; # session2

# session1
UPDATE child SET parent_id = 5? WHERE parent_id = 4;

#session2
UPDATE parent SET bogus_column = 'new!' WHERE id = 4;

#session2
UPDATE parent SET bogus_column = 'new!' WHERE id = 5;

In the last statement, session2 will block waiting on a lock. Want to know where that lock is? Check information_schema.innodb_locks:

mysql> SELECT * FROM information_schema.innodb_locks\G
*************************** 1. row ***************************
    lock_id: 87035:1300:3:6
lock_trx_id: 87035
  lock_mode: X
  lock_type: RECORD
 lock_table: `test`.`parent`
 lock_index: `PRIMARY`
 lock_space: 1300
  lock_page: 3
   lock_rec: 6
  lock_data: 5
*************************** 2. row ***************************
    lock_id: 87034:1300:3:6
lock_trx_id: 87034
  lock_mode: S
  lock_type: RECORD
 lock_table: `test`.`parent`
 lock_index: `PRIMARY`
 lock_space: 1300
  lock_page: 3
   lock_rec: 6
  lock_data: 5
2 rows in set (0.00 sec)

The same example without the foreign key constraints does not block on the last statement.  We also see the expected output change to:

mysql> SELECT * FROM information_schema.innodb_locks;
Empty set (0.00 sec)

This information_schema table is new to InnoDB plugin.  In earlier releases of MySQL you may be able to get the data, but it is not in such an easily digestible form.  Instrumentation is the most under talked about feature in all new releases of MySQL and Percona Server.  See BUG #53336 for more examples of pure awesomeness.


Entry posted by Morgan Tocker |
6 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com