Oct
30
2018
--

PostgreSQL locking, part 3: lightweight locks

LWLocks lightweight locks postgres

PostgreSQL logoPostgreSQL lightweight locks, or LWLocks, control memory access. PostgreSQL uses multi-process architecture and should allow only consistent reads and writes to shared memory structures. LWLocks have two levels of locking: shared and exclusive. It’s also possible to release all acquired LWLocks to simplify clean up. Other databases often call primitives similar to LWLocks “latches”. Because LWLocks is an implementation detail, application developers shouldn’t pay much attention to this kind of locking.

This is the third and final part of a series on PostgreSQL locking, related to latches protecting internal database structures. Here are the previous parts: Row-level locks and table-level locks.

Instrumentation

Starting from PostgreSQL 9.6, LWLocks activity can be investigated with the pg_stat_activity system view. It could be useful under high CPU utilization. There are system settings to help with contention on specific lightweight locks.

Before PostgreSQL 9.5, the LWLocks implementation used spin-locks.  It was a bottleneck. This was fixed in 9.5 with atomic state variable.

Potential heavy contention places

  • WALInsertLock: protects WAL buffers. You can increase the number of wal buffers to get a slight improvement. Incidentally, synchronous_commit=off increases pressure on the lock even more, but it’s not a bad thing. full_page_writes=off reduces contention, but it’s generally not recommended.
  • WALWriteLock: accrued by PostgreSQL processes while WAL records are flushed to disk or during a WAL segments switch. synchronous_commit=off removes the wait for disk flush, full_page_writes=off reduces the amount of data to flush.
  • LockMgrLock: appears in top waits during a read-only workload. It latches relations regardless of its size. It’s not a single lock, but at least 16 partitions. Thus it’s important to use multiple tables during benchmarks and avoid single table anti-pattern in production.
  • ProcArrayLock: Protects the ProcArray structure. Before PostgreSQL 9.0, every transaction acquired this lock exclusively before commit.
  • CLogControlLock: protects CLogControl structure, if it shows on the top of pg_stat_activity, you should check the location of $PGDATA/pg_clog—it should be on a buffered file system.
  • SInvalidReadLock: protects sinval array. Readers using shared lock. SICleanupQueue, and other array-wide updates, requires an exclusive lock. It shows at the top of the pg_stat_activity when the shared buffer pool is under stress. Using a higher number of shared_buffers helps to reduce contention.
  • BufMappingLocks: protects regions of buffers. Sets 128 regions (16 before 9.5) of buffers to handle the whole buffer cache.

Spinlocks

The lowest level for locking is spinlocks. Therefore, it’s implemented within CPU-specific instructions. PostgreSQL is trying to change an atomic variable value in a loop. If the value is changed from zero to one – the process obtained a spinlock. If it’s not possible to get a spinlock immediately, the process will increase its wait delay exponentially.  There is no monitoring on spinlocks and it’s not possible to release all accrued spinlocks at once. Due to the single state change, it’s also an exclusive lock. In order to simplify the porting of PostgreSQL to exotic CPU and OS variants, PostgreSQL uses OS semaphores for its spinlocks implementation. Of course, it’s significantly slower compared to native CPU instructions port.

Summary

  • Use pg_stat_activity to find which queries or LWLocks are causing lock waits
  • Use fresh branches of PostgreSQL, as developers have been working on performance improvements and trying to reduce locking contention on hot mutexes.

References

Locks Photo by Warren Sammut on Unsplash

Oct
24
2018
--

PostgreSQL locking, part 2: heavyweight locks

Locking in PostgreSQL

Locking in PostgreSQLPostgreSQL locking visibility for application developers and DBAs is in most cases related to heavyweight locks. Complex database locking operations require full instrumentation using views from the system catalog. It should be clear which object is locked by a specific database “backend” process. An alternative name for any lock is “bottleneck”. In order to make database operations parallel we should split a single bottleneck into multiple operation specific tasks.

This is the second part of three blog posts related to table level locks. The previous post was about row-level locks, and a subsequent post reviews the latches protecting internal database structures.

Example environment

A simple table with several rows:

CREATE TABLE locktest (c INT);
INSERT INTO locktest VALUES (1), (2);

Helper view

In order to check different types of these locks, let’s create a helper view as suggested by Bruce Momjian in his presentation:

CREATE VIEW lockview AS SELECT pid, virtualtransaction AS vxid, locktype AS lock_type,
mode AS lock_mode, granted,
CASE
WHEN virtualxid IS NOT NULL AND transactionid IS NOT NULL
THEN virtualxid || ' ' || transactionid
WHEN virtualxid::text IS NOT NULL
THEN virtualxid
ELSE transactionid::text                                                                                                                                                                                           END AS xid_lock, relname,
page, tuple, classid, objid, objsubid
FROM pg_locks LEFT OUTER JOIN pg_class ON (pg_locks.relation = pg_class.oid)
WHERE -- do not show our view’s locks
pid != pg_backend_pid() AND
-- no need to show self-vxid locks                                                                                                                                                                                 virtualtransaction IS DISTINCT FROM virtualxid
-- granted is ordered earlier
ORDER BY 1, 2, 5 DESC, 6, 3, 4, 7;

RowShareLock (ROW SHARE)

Many applications use the read-modify-write paradigm. For instance, the application fetches single object fields from a table, modifies the data, and saves the changes back to the database. In a multi-user environment, different users could modify the same rows in the course of this transaction. We can get inconsistent data with just a plain select. In response to user demands, almost all SQL databases have SELECT … FOR SHARE locking. This feature prevents the application entity from making data modifications until the locker transaction commits or rolls back.

For example:

  1. There is a user with multiple bank accounts stored in an accounts table, with total_amount stored in a bank_clients table.
  2. In order to update the total_amount field, we should prevent modification of all rows related to the specific bank client.
  3. It would be better to use a single update statement to calculate total_amount and select it from the accounts table. If the update requires external data, or some action from the user, then several statements are required
START TRANSACTION;
SELECT * FROM accounts WHERE client_id = 55 FOR SHARE;
SELECT * FROM bank_clients WHERE client_id=55 FOR UPDATE;
UPDATE bank_clients SET total_amount=38984.33, client_status='gold' WHERE client_id=55;
COMMIT;

The SELECT FOR SHARE statement creates a “RowShareLock” lock on the relation locktest.

Here’s exactly the same lock created with an SQL statement:

BEGIN;
LOCK TABLE locktest IN ROW SHARE MODE;

A single heavyweight RowShareLock is required regardless of the number of rows locked by the query.

This is illustrated with an unfinished transaction in the following example. Start the unfinished transaction, and select from lockview in a second connection to the database:

BEGIN;
SELECT * FROM locktest FOR SHARE;
-- In second connection:
postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname FROM lockview;
  pid  | vxid |   lock_type   |   lock_mode   | granted | xid_lock | relname
-------+------+---------------+---------------+---------+----------+----------
 21144 | 3/13 | transactionid | ExclusiveLock | t       | 586      |
 21144 | 3/13 | relation      | RowShareLock  | t       |          | locktest

RowExclusiveLock (ROW EXCLUSIVE)

Real queries that modify rows also require heavyweight locks on tables, one per table.

The next example uses a DELETE query, but an UPDATE will have the same effect.

All commands that modify data in a table obtain a ROW EXCLUSIVE lock.

BEGIN;
DELETE FROM locktest;
-- second connection
postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname FROM lockview;
  pid  | vxid |   lock_type   |    lock_mode     | granted | xid_lock | relname
-------+------+---------------+------------------+---------+----------+----------
 10997 | 3/6  | transactionid | ExclusiveLock    | t       | 589      |
 10997 | 3/6  | relation      | RowExclusiveLock | t       |          | locktest

This new lock is incompatible with the previous FOR SHARE example.

SELECT * FROM locktest FOR SHARE

  waits for the delete transaction to finish or abort:

postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname,page,tuple FROM lockview;
  pid  | vxid |   lock_type   |    lock_mode     | granted | xid_lock | relname  | page | tuple
-------+------+---------------+------------------+---------+----------+----------+------+-------
 10997 | 3/6  | transactionid | ExclusiveLock    | t       | 589      |          |      |
 10997 | 3/6  | relation      | RowExclusiveLock | t       |          | locktest |      |
 11495 | 5/9  | relation      | RowShareLock     | t       |          | locktest |      |
 11495 | 5/9  | tuple         | RowShareLock     | t       |          | locktest |    0 |     1
 11495 | 5/9  | transactionid | ShareLock        | f       | 589      |          |      |

Queries modifying table content also lock all indexes, even if the index does not contain modified fields.

-- preparation
CREATE INDEX c_idx2 ON locktest (c);
ALTER TABLE locktest ADD COLUMN c2 INT;
CREATE INDEX c2_idx ON locktest(c2);
-- unfinished example transaction
BEGIN;
UPDATE locktest SET c=3 WHERE c=1;
-- second connection
postgres=# SELECT * FROM lockview;
 pid  |  vxid  | lock_type  |    lock_mode     | granted | xid_lock | relname  | page | tuple | classid | objid | objsubid
------+--------+------------+------------------+---------+----------+----------+------+-------+---------+-------+----------
 3998 | 3/7844 | virtualxid | ExclusiveLock    | t       | 3/7844   |          |      |       |         |       |
 3998 | 3/7844 | relation   | RowExclusiveLock | t       |          | c2_idx   |      |       |         |       |
 3998 | 3/7844 | relation   | RowExclusiveLock | t       |          | c_idx    |      |       |         |       |
 3998 | 3/7844 | relation   | RowExclusiveLock | t       |          | c_idx2   |      |       |         |       |
 3998 | 3/7844 | relation   | RowExclusiveLock | t       |          | locktest |      |       |         |       |

ShareLock (SHARE)

The non-concurrent version of CREATE INDEX prevents table updates, e.g. DROP TABLE or INSERT or DELETE, with ShareLock.

BEGIN;
CREATE INDEX c_idx ON locktest (c);
-- second connection
postgres=# SELECT * FROM lockview;
 pid  |  vxid  |   lock_type   |      lock_mode      | granted | xid_lock | relname  | page | tuple | classid | objid | objsubid
------+--------+---------------+---------------------+---------+----------+----------+------+-------+---------+-------+----------
 3998 | 3/7835 | virtualxid    | ExclusiveLock       | t       | 3/7835   |          |      |       |         |       |
 3998 | 3/7835 | transactionid | ExclusiveLock       | t       | 564      |          |      |       |         |       |
 3998 | 3/7835 | relation      | AccessExclusiveLock | t       |          |          |      |       |         |       |
 3998 | 3/7835 | relation      | ShareLock           | t       |          | locktest |      |       |         |       |

You can execute multiple CREATE INDEX queries in parallel unless the index name is exactly the same. The wait happens on the row lock (ShareLock with “transactionid” type) in the pg_class table.

Note that there is also AccessExclusiveLock lock with type “relation”, but it’s not a table level one.

ShareUpdateExclusiveLock (SHARE UPDATE EXCLUSIVE)

These database maintenance operations need to take a ShareUpdateExclusiveLock:

  • ANALYZE table
  • VACUUM (without full) runs.
  • CREATE INDEX CONCURRENTLY

The 

ANALYZE tablename;

  statement updates table statistics. The query planner/optimizer is able to provide the best plans for query execution only if the statistics are up to date.

BEGIN;
ANALYZE locktest;
-- in second connection
postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname FROM lockview;
  pid  | vxid |   lock_type   |        lock_mode         | granted | xid_lock | relname
-------+------+---------------+--------------------------+---------+----------+----------
 10997 | 3/7  | transactionid | ExclusiveLock            | t       | 591      |
 10997 | 3/7  | relation      | ShareUpdateExclusiveLock | t       |          | locktest

There is no conflict between RowExclusiveLock and ShareUpdateExclusiveLock. UPDATE/DELETE/INSERT could still modify rows during ANALYZE.

VACUUM and CREATE INDEX CONCURRENTLY can be executed only outside a transaction. To see the effects of these statements in lockview, execute a conflicting transaction first e.g. run ANALYZE in a transaction, or run VACUUM against a huge table.

CREATE INDEX CONCURRENTLY locking can be confusing. SHARE UPDATE EXCLUSIVE lock does not conflict with a ROW EXCLUSIVE lock that’s used for DELETES, INSERT and UPDATES. Unfortunately, CREATE INDEX CONCURRENTLY waits until active transactions are finished twice due to full table scans:

In a concurrent index build, the index is actually entered into the system catalogs in one transaction, then two table scans occur in two more transactions. Before each table scan, the index build must wait for existing transactions that have modified the table to terminate.” PostgreSQL Documentation

AccessExclusiveLock  (ACCESS EXCLUSIVE)

This lock conflicts with any other locks and is used by these statements:

  • CREATE RULE
  • DROP TABLE
  • DROP INDEX
  • TRUNCATE
  • VACUUM FULL
  • LOCK TABLE (default mode)
  • CLUSTER
  • REINDEX
  • REFRESH MATERIALIZED VIEW (without CONCURRENTLY)
BEGIN;
CREATE RULE r_locktest AS ON INSERT TO locktest DO INSTEAD NOTHING;
-- second connection
postgres=# select pid,vxid,lock_type,lock_mode,granted,xid_lock,relname from lockview;
  pid  | vxid |   lock_type   |      lock_mode      | granted | xid_lock | relname
-------+------+---------------+---------------------+---------+----------+----------
 10997 | 3/19 | transactionid | ExclusiveLock       | t       | 596      |
 10997 | 3/19 | relation      | AccessExclusiveLock | t       |          | locktest

More importantly, drop index requires access exclusive locks for both table and index:

BEGIN;
DROP INDEX c_idx;
-- second connection
postgres=# SELECT * FROM lockview;
 pid  |  vxid  |   lock_type   |      lock_mode      | granted | xid_lock | relname  | page | tuple | classid | objid | objsubid
------+--------+---------------+---------------------+---------+----------+----------+------+-------+---------+-------+----------
 3998 | 3/7839 | virtualxid    | ExclusiveLock       | t       | 3/7839   |          |      |       |         |       |
 3998 | 3/7839 | transactionid | ExclusiveLock       | t       | 569      |          |      |       |         |       |
 3998 | 3/7839 | relation      | AccessExclusiveLock | t       |          | c_idx    |      |       |         |       |
 3998 | 3/7839 | relation      | AccessExclusiveLock | t       |          | locktest |      |       |         |       |

Note: This is the most dangerous type of lock. Avoid running queries requiring access exclusive lock in production, or at least put the application in maintenance mode.

ExclusiveLock

Meanwhile, SQL commands don’t use ExclusiveLock, except for with the general LOCK TABLE statement. This lock prevents all requests except for a non-locking select (i.e. without FOR SHARE/UPDATE).

BEGIN;
LOCK TABLE locktest IN EXCLUSIVE MODE;
-- second connection
postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname FROM lockview;
  pid  | vxid | lock_type |   lock_mode   | granted | xid_lock | relname
-------+------+-----------+---------------+---------+----------+----------
 10997 | 3/21 | relation  | ExclusiveLock | t       |          | locktest

Savepoints

Savepoint produces an additional ExclusiveLock of transactionid type with new xid value.

BEGIN;
SELECT * FROM locktest FOR SHARE;
SAVEPOINT s1;
SELECT * FROM locktest FOR UPDATE;
-- second connection
postgres=# SELECT pid,vxid,lock_type,lock_mode,granted,xid_lock,relname FROM lockview;
  pid  | vxid |   lock_type   |    lock_mode    | granted | xid_lock | relname
-------+------+---------------+-----------------+---------+----------+----------
 10997 | 3/37 | transactionid | ExclusiveLock   | t       | 602      |
 10997 | 3/37 | transactionid | ExclusiveLock   | t       | 603      |
 10997 | 3/37 | relation      | AccessShareLock | t       |          | c_idx
 10997 | 3/37 | relation      | RowShareLock    | t       |          | locktest

pg_advisory_lock

Sometimes application developers require synchronization between processes. In such systems, the application creates and removes locks frequently. Systems with a row-based locks implementation tend to cause table bloat.

There are many functions related to advisory locks:

  • per session or per transaction
  • wait if lock is not available or immediately return false
  • exclusive or shared
  • 64-bit or two 32-bit integers resource identifiers

Imagine that we have several cron jobs and that the application should prevent simultaneous runs of the same script. Next, each script can check if a lock is available in PostgreSQL for specific integer job identifier:

postgres=# SELECT pg_try_advisory_lock(10);
 pg_try_advisory_lock
----------------------
 t
-- second connection
postgres=# SELECT * FROM lockview;
 pid  | vxid | lock_type |   lock_mode   | granted | xid_lock | relname | page | tuple | classid | objid | objsubid
------+------+-----------+---------------+---------+----------+---------+------+-------+---------+-------+----------
 3998 | 3/0  | advisory  | ExclusiveLock | t       |          |         |      |       |       0 |    10 |        1
-- other connections
SELECT pg_try_advisory_lock(10);
 pg_try_advisory_lock
----------------------
 f

The query produces ExclusiveLock with type advisory.

Deadlocks

Any system with multiple locks tends to have a deadlock situation when queries will never finish. The only way to resolve such issues: kill one of blocked statements. More importantly, deadlock detection is an expensive procedure in PostgreSQL. A check for deadlock only happens when a transaction is locked for deadlock_timeout milliseconds—after one second by default.

Here is an illustration of a deadlock situation for two different connections A and B:

Any deadlock starting from lock wait.

A: BEGIN; SELECT c FROM locktest WHERE c=1 FOR UPDATE;
B: BEGIN; SELECT c FROM locktest WHERE c=2 FOR UPDATE; SELECT c FROM locktest WHERE c=1 FOR UPDATE;

You are not alone with the identification of deadlocks, as the pg_stat_activity system view helps you to find statements and transactions causing lock waits:

postgres=# SELECT pg_stat_activity.pid AS pid,
query, wait_event, vxid, lock_type,
lock_mode, granted, xid_lock
FROM lockview JOIN pg_stat_activity ON (lockview.pid = pg_stat_activity.pid);
  pid  |          query             |  wait_event   | vxid |   lock_type   |      lock_mode      | granted | xid_lock
-------+----------------------------+---------------+------+---------------+---------------------+---------+----------
 10997 | SELECT ... c=1 FOR UPDATE; | ClientRead    | 3/43 | transactionid | ExclusiveLock       | t       | 605
 10997 | SELECT ... c=1 FOR UPDATE; | ClientRead    | 3/43 | advisory      | ExclusiveLock       | t       |
 10997 | SELECT ... c=1 FOR UPDATE; | ClientRead    | 3/43 | relation      | AccessShareLock     | t       |
 10997 | SELECT ... c=1 FOR UPDATE; | ClientRead    | 3/43 | relation      | RowShareLock        | t       |
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | transactionid | ExclusiveLock       | t       | 606
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | advisory      | ExclusiveLock       | t       |
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | relation      | AccessShareLock     | t       |
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | relation      | RowShareLock        | t       |
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | tuple         | AccessExclusiveLock | t       |
 11495 | SELECT ... c=1 FOR UPDATE; | transactionid | 5/29 | transactionid | ShareLock           | f       | 605

The SELECT FOR UPDATE on c=2 row causes a deadlock:

SELECT c FROM locktest WHERE c=2 FOR UPDATE;

Afterwards, PostgreSQL reports in server log:

2018-08-02 08:46:07.793 UTC [10997] ERROR:  deadlock detected
2018-08-02 08:46:07.793 UTC [10997] DETAIL:  Process 10997 waits for ShareLock on transaction 606; blocked by process 11495.
Process 11495 waits for ShareLock on transaction 605; blocked by process 10997.
Process 10997: select c from locktest where c=2 for update;
Process 11495: select c from locktest where c=1 for update;
2018-08-02 08:46:07.793 UTC [10997] HINT:  See server log for query details.
2018-08-02 08:46:07.793 UTC [10997] CONTEXT:  while locking tuple (0,3) in relation "locktest"
2018-08-02 08:46:07.793 UTC [10997] STATEMENT:  SELECT c FROM locktest WHERE c=2 FOR UPDATE;
ERROR:  deadlock detected
DETAIL:  Process 10997 waits for ShareLock on transaction 606; blocked by process 11495.
Process 11495 waits for ShareLock on transaction 605; blocked by process 10997.
HINT:  See server log for query details.
CONTEXT:  while locking tuple (0,3) in relation "locktest"

As you can see, the database server aborts one blocked transaction automatically.

Multi-way deadlocks

Normally there are just two transactions creating deadlocks. However, in complex cases, an application could caused deadlocks with multiple transactions forming a dependency circle.

A: locks row1, B locks row2, C locks row3

Step2

A: trying to get row3, B: trying to get row1, C: trying to get row2

Summary

  • Do not put DDL statements in long transactions.
  • Please avoid DDL on during high load for frequently updated tables
  • CLUSTER command requires exclusive access to the table and all it’s indexes
  • Monitor postgresql log for deadlock-related messages

Photo by shy sol from Pexels

Oct
16
2018
--

PostgreSQL locking, Part 1: Row Locks

PostgreSQL row level locks

row signing with postgresqlAn understanding of PostgreSQL locking is important to build scalable applications and avoid downtime. Modern computers and servers have many CPU cores and it’s possible to execute multiple queries in parallel. Databases containing many consistent structures with changes made by queries or background processes running in parallel could crash a database or even corrupt data. Thus we need the ability to prevent access from concurrent processes, while changing shared memory structures or rows. One thread updates the structure while all others wait (exclusive lock), or multiple threads read the structure and all writes wait. The side effect of waits is a locking contention and server resources waste. Thus it’s important to understand why waits happen and what locks are involved. In this article, I review PostgreSQL row level locking.

In follow up posts, I will investigate table-level locks and latches protecting internal database structures.

Row locks – an overview

PostgreSQL has many locks at different abstraction levels. The most important locks for applications are related to MVCC implementation – row level locking. In second place – locks appearing during maintenance tasks (during backups/database migrations schema changes) – table level locking. It’s also possible—but rare—to see waits on low level PostgreSQL locks. More often there is a high CPU usage, with many concurrent queries running, but overall server performance reduced in comparison with normal number of queries running in parallel.

Example environment

To follow along, you need a PostgreSQL server with a single-column table containing several rows:

postgres=# CREATE TABLE locktest (c INT);
CREATE TABLE
postgres=# INSERT INTO locktest VALUES (1), (2);
INSERT 0 2

Row locks

Scenario: two concurrent transactions are trying to select a row for update.

PostgreSQL uses row-level locking in this case. Row level locking is tightly integrated with MVCC implementation, and uses hidden xmin and xmax fields.

xmin

 and

xmax

 store the transaction id. All statements requiring row-level locks modify the xmax field (even SELECT FOR UPDATE). The modification happens after the query returns its results, so in order to see xmax change we need to run SELECT FOR UPDATE twice. Usually, the xmax field is used to mark a row as expired—either removed by some transaction completely or in favor of updated row version—but it also used for row-level locking infrastructure.

If you need more details about the xmin and xmax hidden fields and MVCC implementation, please check our “Basic Understanding of Bloat and VACUUM in PostgreSQL” blog post.

postgres=# BEGIN;
postgres=# SELECT xmin,xmax, txid_current(), c FROM locktest WHERE c=1 FOR UPDATE;
BEGIN
 xmin | xmax | txid_current | c
------+------+--------------+---
  579 |  581 |          583 | 1
(1 row)
postgres=# SELECT xmin,xmax, txid_current(), c FROM locktest WHERE c=1 FOR UPDATE;
 xmin | xmax | txid_current | c
------+------+--------------+---
  579 |  583 |          583 | 1
(1 row)

If a statement is trying to to modify the same row, it checks the list of unfinished transactions. The statement has to wait for modification until the transaction with id=xmax is finished.

There is no infrastructure for waiting on a specific row, but a transaction can wait on transaction id.

-- second connection
SELECT xmin,xmax,txid_current() FROM locktest WHERE c=1 FOR UPDATE;

The SELECT FOR UPDATE query running in the second connection is unfinished, and waiting for the first transaction to complete.

pg_locks

Such waits and locks can be seen by querying pg_locks:

postgres=# SELECT locktype,transactionid,virtualtransaction,pid,mode,granted,fastpath
postgres-#  FROM pg_locks WHERE transactionid=583;
   locktype    | transactionid | virtualtransaction |  pid  |     mode      | granted | fastpath
---------------+---------------+--------------------+-------+---------------+---------+----------
 transactionid |           583 | 4/107              | 31369 | ShareLock     | f       | f
 transactionid |           583 | 3/11               | 21144 | ExclusiveLock | t       | f

You can see the writer transaction id for locktype=transactionid == 583. Let’s get the pid and backend id for the holding lock:

postgres=# SELECT id,pg_backend_pid() FROM pg_stat_get_backend_idset() AS t(id)
postgres-#  WHERE pg_stat_get_backend_pid(id) = pg_backend_pid();
 id | pg_backend_pid
----+----------------
  3 |          21144

This backend has its lock granted (t). Each backend has an OS process identifier (PID) and internal PostgreSQL identifier (backend id). PostgreSQL can process many transactions, but locking can happen only between backends, and each backend executes a single transaction. Internal bookkeeping requires just a virtual transaction identifier: a pair of backend ids and a sequence number inside the backend.

Regardless of the number of rows locked, PostgreSQL will have only a single related lock in the pg_locks table. Queries might modify billions of rows but PostgreSQL does not waste memory for redundant locking structures.

A writer thread sets ExclusiveLock on its transactionid. All row level lock waiters set ShareLock. The lock manager resumes all previously locked backend locks as soon as the writer releases the lock.

Lock release for transactionid occurs on commit or rollback.

pg_stat_activity

Another great method to get locking-related details is to select from the pg_stat_activity table:

postgres=# SELECT pid,backend_xid,wait_event_type,wait_event,state,query FROM pg_stat_activity WHERE pid IN (31369,21144);
-[ RECORD 1 ]---+---------------------------------------------------------------------------------------------------------------------------
pid             | 21144
backend_xid     | 583
wait_event_type | Client
wait_event      | ClientRead
state           | idle in transaction
query           | SELECT id,pg_backend_pid() FROM pg_stat_get_backend_idset() AS t(id) WHERE pg_stat_get_backend_pid(id) = pg_backend_pid();
-[ RECORD 2 ]---+---------------------------------------------------------------------------------------------------------------------------
pid             | 31369
backend_xid     | 585
wait_event_type | Lock
wait_event      | transactionid
state           | active
query           | SELECT xmin,xmax,txid_current() FROM locktest WHERE c=1 FOR UPDATE;

Source code-level investigation

Let’s check the stack trace for the waiter with gdb and the pt-pmp tool:

# pt-pmp -p 31369
Sat Jul 28 10:10:25 UTC 2018
30	../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
      1 epoll_wait,WaitEventSetWaitBlock,WaitEventSetWait,WaitLatchOrSocket,WaitLatch,ProcSleep,WaitOnLock,LockAcquireExtended,LockAcquire,XactLockTableWait,heap_lock_tuple,ExecLockRows,ExecProcNode,ExecutePlan,standard_ExecutorRun,PortalRunSelect,PortalRun,exec_simple_query,PostgresMain,BackendRun,BackendStartup,ServerLoop,PostmasterMain,main

The WaitOnLock function is causing the wait. The function is located in lock.c file (POSTGRES primary lock mechanism).

A lock table is a shared memory hash table. The conflicting process sleeps for the lock in storage/lmgr/proc.c. For the most part, this code should be invoked via lmgr.c or another lock-management module, not directly.

Next, locks listed in pg_stat_activity as “Lock” are also called heavyweight locks, and controlled by Lock Manager. HWLocks are also used for many high level actions.

By the way, a full description can be found here: https://www.postgresql.org/docs/current/static/explicit-locking.html

Summary

  • Avoid long running transactions modifying frequently updated rows or too many rows
  • Next, do not use hotspots (single row or multiple rows updated in parallel by many application client connections) with MVCC databases. This kind of workload is more suitable for in-memory databases and can usually be separated from the main business logic.
Jun
17
2016
--

InnoDB locks and transaction isolation level

InnoDB locks and transaction isolation

InnoDB locks and transaction isolationWhat is the difference between InnoDB locks and transaction isolation level? We’ll discuss it in this post.

Recently I received a question from a user about one of my earlier blog posts. Since it wasn’t sent as a comment, I will answer it here. The question:

> I am reading your article:
> https://www.percona.com/resources/technical-presentations/troubleshooting-locking-issues-percona-mysql-webinar

> Full table scan locks whole table.

> Some bad select (read) query can do full table scan on InnoDB, does it lock whole table please?

> My understanding was that SELECT (read) blocks another DML only in MyISAM.

To answer this question, we to need understand two different concepts: locking and transaction isolation level.

In the default transaction isolation mode for InnoDB, 

REPEATABLE READ

 and the lower

TRANSACTION ISOLATION

levels,

SELECT

  doesn’t block any DML unless it uses 

SELECT ... FOR UPDATE

 or

SELECT ... LOCK IN SHARE MODE

. On slide 20 (31) of my presentation, I use the 

SELECT ... FOR UPDATE

 statement (this is why a lock is seen).

However, if transaction isolation mode is

SERIALIZABLE

, then 

SELECT

 can block updates. You can see this in the example below:

mysql1> set transaction isolation level serializable;
Query OK, 0 rows affected (0,00 sec)
mysql1> begin;
Query OK, 0 rows affected (0,00 sec)
mysql1> select * from employees join titles using(emp_no);
c936e6fc4c6cbaf77679ba5013339dff  -
443308 rows in set (29,69 sec)
mysql2> begin;
Query OK, 0 rows affected (0,00 sec)
mysql2> update titles set title='Engineer' where emp_no=15504;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Regarding the different level locks set by InnoDB, I recommend that you follow the links on slide 37 (67) of the presentation, especially “InnoDB Locking Explained With Stick Figures” by Bill Karwin. Innodb table locks are also explained very well in this blog post by Peter Zaitsev.

Oct
17
2013
--

InnoDB scalability issues due to tables without primary keys

Each day there is probably work done to improve performance of the InnoDB storage engine and remove bottlenecks and scalability issues. Hence there was another one I wanted to highlight:

Scalability issues due to tables without primary keys

This scalability issue is caused by the usage of tables without primary keys. This issue typically shows itself as contention on the InnoDB dict_sys mutex. Now the dict_sys mutex controls access to the data dictionary. This mutex is used at various places. I will only mention a few of them:

  • During operations such as opening and closing table handles, or
  • When accessing I_S tables, or
  • During undo of a freshly inserted row, or
  • During other data dictionary modification operations such as CREATE TABLE, or
  • Within the “Persistent Stats” subsystem, among other things.

Of course this list is not exhaustive but should give you a good picture of how heavily it is used.

But the thing is when you are mainly debugging contention related to a data dictionary control structure, you start to look off at something that is directly related to data dictionary modifications. You look for execution of CREATE TABLE, DROP TABLE, TRUNCATE TABLE, etc. But what if none of that is actually causing the contention on the dict_sys mutex? Are you aware when generating “row-id” values, for tables without explicit primary keys, or without non-nullable unique keys, dict_sys mutex is acquired. So INSERTs to tables with implicit primary keys is a InnoDB system-wide contention point.

Let’s also take a look at the relevant source code.

Firstly, below is the function that does the row-id allocation which is defined in the file storage/innobase/row/row0ins.cc

3060 /***********************************************************//**
3061 Allocates a row id for row and inits the node->index field. */
3062 UNIV_INLINE
3063 void
3064 row_ins_alloc_row_id_step(
3065 /*======================*/
3066         ins_node_t*     node)   /*!< in: row insert node */ 3067 { 3068         row_id_t        row_id; 3069  3070         ut_ad(node->state == INS_NODE_ALLOC_ROW_ID);
3071
3072         if (dict_index_is_unique(dict_table_get_first_index(node->table))) {
3073
3074                 /* No row id is stored if the clustered index is unique */
3075
3076                 return;
3077         }
3078
3079         /* Fill in row id value to row */
3080
3081         row_id = dict_sys_get_new_row_id();
3082
3083         dict_sys_write_row_id(node->row_id_buf, row_id);
3084 }

Secondly, below is the function that actually generates the row-id which is defined in the file storage/innobase/include/dict0boot.ic

26 /**********************************************************************//**
 27 Returns a new row id.
 28 @return the new id */
 29 UNIV_INLINE
 30 row_id_t
 31 dict_sys_get_new_row_id(void)
 32 /*=========================*/
 33 {
 34         row_id_t        id;
 35
 36         mutex_enter(&(dict_sys->mutex));
 37
 38         id = dict_sys->row_id;
 39
 40         if (0 == (id % DICT_HDR_ROW_ID_WRITE_MARGIN)) {
 41
 42                 dict_hdr_flush_row_id();
 43         }
 44
 45         dict_sys->row_id++;
 46
 47         mutex_exit(&(dict_sys->mutex));
 48
 49         return(id);
 50 }

Finally, I would like to share results of a few benchmarks that I conducted in order to show you how this affects performance.

Benchmarking affects of non-presence of primary keys

First off all, let me share information about the host that was used in the benchmarks. I will also share the MySQL version and InnoDB configuration used.

Hardware

The host was a “hi1.4xlarge” Amazon EC2 instance. The instance comes with 16 vCPUs and 60.5GB of memory. The instance storage consists of 2×1024 SSD-backed storage volumes, and the instance is connected to a 10 Gigabit ethernet network. So the IO performance is very decent. I created a RAID 0 array from the 2 instance storage volumes and created XFS filesystem on the resultant software RAID 0 volume. This configuration would allows us to get the best possible IO performance out of the instance.

MySQL

The MySQL version used was 5.5.34 MySQL Community Server, and the InnoDB configuration looked as follows:

innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 512M
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 42G
innodb-buffer-pool-instances   = 8
innodb-io-capacity             = 10000
innodb_adaptive_hash_index     = 1

I conducted two different types of benchmarks, and both of them were done by using sysbench.

First one involved benchmarking the performance of single-row INSERTs for tables with and without explicit primary keys. That’s what I would be showing first.

Single-row INSERTs

The tables were generated as follows for the benchmark involving tables with primary keys:

sysbench --test=/root/sysbench/sysbench/tests/db/insert.lua --oltp-tables-count=64 --oltp-table-size=1000000 --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test prepare

This resulted in the following table being created:

CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB

While the tables without primary keys were generated as follows:

sysbench --test=/root/sysbench/sysbench/tests/db/insert.lua --oltp-tables-count=64 --oltp-table-size=1000000 --oltp-secondary --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test prepare

This resulted in the tables being created with the following structure:

CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  KEY `xid` (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB

The actual benchmark for the table with primary keys was run as follows:

sysbench --test=/root/sysbench/sysbench/tests/db/insert.lua --oltp-tables-count=64 --oltp-table-size=1000000 --oltp-dist-type=uniform --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test --max-time=300 --num-threads=16 --max-requests=0 --report-interval=1 run

While the actual benchmark for the table without primary keys was run as follows:

sysbench --test=/root/sysbench/sysbench/tests/db/insert.lua --oltp-tables-count=64 --oltp-table-size=1000000 --oltp-secondary --oltp-dist-type=uniform --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test --max-time=300 --num-threads=16 --max-requests=0 --report-interval=1 run

Note that the benchmarks were run with three variations in the number of concurrent threads used by sysbench: 16, 32 and 64.
Below are how the graphs look like for each of these benchmarks.

Writes per second 16 threads
Writes per second 32 threads
Writes per second 64 threads

Some key things to note from the graphs are that the throughput of the INSERTs to the tables without explicit primary keys never goes above 87% of the throughput of the INSERTs to the tables with primary keys defined. Furthermore, as we increase the concurrency downward spikes start appearing. These become more apparent when we move to a concurrency of 64 threads. This is expected, because the contention is supposed to increase as we increase the concurrency of operations that contend on the dict_sys mutex.

Now let’s take a look at how this impacts the bulk load performance.

Bulk Loads

The bulk loads to the tables with primary keys were performed as follows:

sysbench --test=/root/sysbench/sysbench/tests/db/parallel_prepare.lua --oltp-tables-count=64 --oltp-table-size=1000000 --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test --num-threads=16 run

While the bulk loads to the tables without primary keys were performed as follows:

sysbench --test=/root/sysbench/sysbench/tests/db/parallel_prepare.lua --oltp-tables-count=64 --oltp-table-size=1000000 --oltp-secondary --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test --num-threads=16 run

Note that the benchmarks were again run with three variations in the number of concurrent threads used by sysbench: 16, 32 and 64.
Below is what the picture is portrayed by the graph.

Parallel Bulk Loading of Tables

Here again, you can see how the bulk load time increases as we increase the number of concurrent threads. This against points to the increase in contention on the dict_sys mutex. With 16 threads the bulk load time for tables without primary keys is 107% more than the bulk load time for the tables with primary keys. This increases to 116% with 32 threads and finally 124% with 64 threads.

Conclusion

Tables without primary keys cause a wide range of contention because they rely on acquiring dict_sys mutex to generate row-id values. This mutex is used at critical places within InnoDB. Hence the affect of large amount of INSERTs to tables without primary keys is not only isolated to that table alone but can be seen very widely. There are a number of times I have seen tables without primary keys being used in many different scenarios that include simple INSERTs to these tables as well as multi-row INSERTs as a result of, for example, INSERT … SELECT into a table that is being temporarily created. The advice is always to have primary keys present in your tables. Hopefully I have been able to highlight the true impact non-presence of primary keys can have.

The post InnoDB scalability issues due to tables without primary keys appeared first on MySQL Performance Blog.

Aug
28
2012
--

Differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels

As an instructor with Percona I’m sometimes asked about the differences between the READ COMMITTED and REPEATABLE READ transaction isolation levels.  There are a few differences between READ-COMMITTED and REPEATABLE-READ, and they are all related to locking.

Extra locking (not gap locking)
It is important to remember that InnoDB actually locks index entries, not rows. During the execution of a statement InnoDB must lock every entry in the index that it traverses to find the rows it is modifying. It must do this to prevent deadlocks and maintain the isolation level.

If you run an UPDATE that is not well indexed you will lock many rows:

update employees set store_id = 0 where store_id = 1;
---TRANSACTION 1EAB04, ACTIVE 7 sec
633 lock struct(s), heap size 96696, 218786 row lock(s), undo log entries 1
MySQL thread id 4, OS thread handle 0x7f8dfc35d700, query id 47 localhost root
show engine innodb status

In the employees table, the column `store_id` is not indexed. Notice that the UPDATE has completed running (we are now running SHOW ENGINE …) but we are holding 218786 row locks and only one undo entry. This means that only one row was changed, but we are still holding extra locks.  The heap size represents the amount of memory that has been allocated for locks.

In REPEATABLE READ every lock acquired during a transaction is held for the duration of the transaction.

In READ COMMITTED the locks that did not match the scan are released after the STATEMENT completes.

Here is the UPDATE statement repeated under READ COMMITTED:

---TRANSACTION 1EAB06, ACTIVE 11 sec
631 lock struct(s), heap size 96696, 1 row lock(s), undo log entries 1
MySQL thread id 4, OS thread handle 0x7f8dfc35d700, query id 62 localhost root
show engine innodb status

You’ll notice that the heap size is the same, but we are now holding only one lock. In all transaction isolation levels InnoDB creates locks over every index entry scanned. The difference between the levels is that once the statement completes in READ COMMITTED mode, the locks are released for the entries that did not match the scan.  Note that InnoDB does not immediately release the heap memory back after releasing the locks, so the heap size is the same as as that in REPEATABLE READ, but the number of locks held is lower (only one).

This means that in READ COMMITTED other transactions are free to update rows that they would not have been able to update (in REPEATABLE READ) once the UPDATE statement completes.

 

Consistent read views
In REPEATBLE READ, a ‘read view’ ( trx_no does not see trx_id >= ABC, sees < ABB ) is created at the start of the transaction, and this read view (consistent snapshot in Oracle terms) is held open for the duration of the transaction. If you execute a SELECT statement at 5AM, and come back in an open transaction at 5PM, when you run the same SELECT, then you will see the exact same resultset that you saw at 5AM.  This is called MVCC (multiple version concurrency control) and it is accomplished using row versioning and UNDO information.

In REPEATABLE READ InnoDB also creates gap locks for range scans.

select * from some_table where id > 100 FOR UPDATE;

The above update will create a gap lock that will prevent any rows with id > 100 from being inserted into the table until the transaction rolls back or commits.

In the same transaction, if the SELECT … FOR UPDATE is run at 5AM, and an UPDATE is run at 5PM (“UPDATE some_table where id > 100″) then the UPDATE will change the same rows that SELECT FOR UPDATE locked at 5AM. There is no possibility of changing additional rows, because the gap after 100 was previously locked.

 

Non-repeatable reads (read committed)
In READ COMMITTED, a read view is created at the start of each statement.   This means that a SELECT made at 5AM may show different results from the same SELECT run at 5PM, even in the same transaction.   This is because in READ COMMITTED the read view for the transaction lasts only as long as each statement execution.   As a result, consecutive executions of the same statement may show different results.

This is called the ‘phantom row’ problem.

In addition, in READ COMMITTED gap locks are never created.   Since there is no gap lock, the example SELECT .. FOR UPDATE above will not prevent insertions of new rows into the table by other transactions.   Thus, locking rows with SELECT … FOR UPDATE (ie “where id> 100″)  and subsequently updating  rows with “where id> 100″ (even in the same transaction) may result in more rows being updated than were earlier locked. This is because new rows may have been inserted in the table between the statements since there was no gap lock created for the SELECT … FOR UPDATE.

Useful links

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com