Apr
07
2017
--

Non-Deterministic Order for SELECT with LIMIT

Non-Deterministic Order

Non-Deterministic OrderIn this blog, we’ll look at how queries in systems with parallel processing can return rows in a non-deterministic order (and how to fix it).

Short story:

Do not rely on the order of your rows if your query does not use

ORDER BY

. Even with

ORDER BY

, rows with the same values can be sorted differently. To fix this issue, always add

ORDER BY ... ID

 when you have

LIMIT N

.

Long story:

While playing with MariaDB ColumnStore and Yandex ClickHouse, I came across a very simple case. In MariaDB ColumnStore and Yandex ClickHouse, the simple query (which I used for testing)

select * from <table> where ... limit 10

  returns results in a non-deterministic order.

This is totally expected.

SELECT * from <table> WHERE ... LIMIT 10

 means “give me any ten rows, and as there is no order they can be anything that matches the

WHERE

 condition.” What we used to get in vanilla MySQL + InnoDB, however, is different:

SELECT * from <table> WHERE ... LIMIT 10

 gives us the rows sorted by primary key. Even with MyISAM in MySQL, if the data doesn’t change, the results are repeatable:

mysql> select * from City where CountryCode = 'USA' limit 10;
+------+--------------+-------------+--------------+------------+
| ID   | Name         | CountryCode | District     | Population |
+------+--------------+-------------+--------------+------------+
| 3793 | New York     | USA         | New York     |    8008278 |
| 3794 | Los Angeles  | USA         | California   |    3694820 |
| 3795 | Chicago      | USA         | Illinois     |    2896016 |
| 3796 | Houston      | USA         | Texas        |    1953631 |
| 3797 | Philadelphia | USA         | Pennsylvania |    1517550 |
| 3798 | Phoenix      | USA         | Arizona      |    1321045 |
| 3799 | San Diego    | USA         | California   |    1223400 |
| 3800 | Dallas       | USA         | Texas        |    1188580 |
| 3801 | San Antonio  | USA         | Texas        |    1144646 |
| 3802 | Detroit      | USA         | Michigan     |     951270 |
+------+--------------+-------------+--------------+------------+
10 rows in set (0.01 sec)
mysql> select * from City where CountryCode = 'USA' limit 10;
+------+--------------+-------------+--------------+------------+
| ID   | Name         | CountryCode | District     | Population |
+------+--------------+-------------+--------------+------------+
| 3793 | New York     | USA         | New York     |    8008278 |
| 3794 | Los Angeles  | USA         | California   |    3694820 |
| 3795 | Chicago      | USA         | Illinois     |    2896016 |
| 3796 | Houston      | USA         | Texas        |    1953631 |
| 3797 | Philadelphia | USA         | Pennsylvania |    1517550 |
| 3798 | Phoenix      | USA         | Arizona      |    1321045 |
| 3799 | San Diego    | USA         | California   |    1223400 |
| 3800 | Dallas       | USA         | Texas        |    1188580 |
| 3801 | San Antonio  | USA         | Texas        |    1144646 |
| 3802 | Detroit      | USA         | Michigan     |     951270 |
+------+--------------+-------------+--------------+------------+
10 rows in set (0.00 sec)

The results are ordered by ID here. In most cases, when the data doesn’t change and the query is the same, the order of results will be deterministic: open the file, read ten lines from the beginning, close the file. (When using indexes it can be different if different indexes are selected. For the same query, the database will probably select the same index if the data is static.)

But this is still not guaranteed. Here’s why: imagine we now introduce parallelism, split our table into ten pieces and run ten threads. Each will work on its own piece. Then, unless we specifically wait on each thread to finish and order the results, it will give us a random order of results. Let’s simulate this in a bash script:

for y in {2000..2010}
do
  sql="select YearD, count(*), sum(ArrDelayMinutes) from ontime where yeard=$y and carrier='DL' limit 1"
  mysql -Nb ontime -e "$sql" &
done
wait

The script’s purpose is to perform aggregation faster by taking advantage of multiple CPU cores on the server in parallel. It opens ten connections to MySQL and returns results as they arrive:

$ ./parallel_test.sh
2009    428007  5003632
2007    475889  5915443
2008    451931  5839658
2006    506086  6219275
2003    660617  5917398
2004    687638  8384465
2002    728758  7381821
2005    658302  8143431
2010    732973  9169167
2001    835236  8339276
2000    908029  11105058
$ ./parallel_test.sh
2009    428007  5003632
2008    451931  5839658
2007    475889  5915443
2006    506086  6219275
2005    658302  8143431
2003    660617  5917398
2004    687638  8384465
2002    728758  7381821
2010    732973  9169167
2001    835236  8339276
2000    908029  11105058

In this case, the faster queries arrive first and are on top, with the slower on the bottom. If the network was involved (think about different nodes in a cluster connected via a network), then the response time from each node can be much more random due to non-deterministic network latency.

In the case of MariaDB ColumnStore or Yandex Clickhouse, where scans are performed in parallel, the order of the results can also be non-deterministic. An example for ClickHouse:

:) select * from wikistat where project = 'en' limit 1;
SELECT *
FROM wikistat
WHERE project = 'en'
LIMIT 1
????????date??????????????????time???project???subproject???path???????hits????size??
? 2008-07-11 ? 2008-07-11 14:00:00 ? en      ?            ? Retainer ?   14 ? 96857 ?
?????????????????????????????????????????????????????????????????????????????????????
1 rows in set. Elapsed: 0.031 sec. Processed 2.03 million rows, 41.40 MB (65.44 million rows/s., 1.33 GB/s.)
:) select * from wikistat where project = 'en' limit 1;
SELECT *
FROM wikistat
WHERE project = 'en'
LIMIT 1
????????date??????????????????time???project???subproject???path???????????hits?????size??
? 2008-12-15 ? 2008-12-15 14:00:00 ? en      ?            ? Graeme_Obree ?   18 ? 354504 ?
??????????????????????????????????????????????????????????????????????????????????????????
1 rows in set. Elapsed: 0.023 sec. Processed 1.90 million rows, 68.19 MB (84.22 million rows/s., 3.02 GB/s.)

An example for ColumnStore:

MariaDB [wikistat]> select * from wikistat limit 1
date: 2008-01-18
time: 2008-01-18 06:00:00
project: en
subproject: NULL
path: Doctor_Who:_Original_Television_Soundtrack
hits: 2
size: 2
1 row in set (1.63 sec)
MariaDB [wikistat]> select * from wikistat limit 1
date: 2008-01-31
time: 2008-01-31 10:00:00
project: de
subproject: NULL
path: Haramaki
hits: 1
size: 1
1 row in set (1.58 sec)

In another case (bug#72076) we use 

ORDER BY

, but the rows being sorted are the same. MySQL 5.7 contains the “ORDER BY” + LIMIT optimization:

If multiple rows have identical values in the ORDER BY columns, the
server is free to return those rows in any order, and may do so
differently depending on the overall execution plan. In other words,
the sort order of those rows is nondeterministic with respect to
the nonordered columns.

Conclusion
In systems that involve parallel processing, queries like

select * from table where ... limit N

 can return rows in a random order (even if the data doesn’t change between the calls). This is due to the async nature of the parallel calls: whoever serves results faster wins. In MySQL, you run

select * from table limit 1

 three times and get the same data in the same order (especially if the table data doesn’t change), but the response time will be slightly different. In a massively parallel system, the difference in the response times can cause the rows to be ordered differently.

To fix: always add

ORDER BY ... ID

  when you have

LIMIT N

.

Apr
06
2015
--

More on (transactional) MySQL metadata locks

Two years ago Ovais Tariq had explained in detail what kinds of problems existed before MySQL introduced metadata locks in 5.5.3 and how these locks help to prevent them. Still, some implications of metadata locking in MySQL remain unclear for users – DBAs and even software developers that target recent MySQL versions. I’ve decided to include a slide or two into the presentation about InnoDB locks and deadlocks I plan to make (with my colleague Nilnandan Joshi) on April 16 at Percona Live 2015.

I decided to do this as recently I’ve got an issue to work on where it was claimed that the behavior of SELECT blocking TRUNCATE TABLE is wrong, just because transaction isolation level was set to READ COMMITTED and thus there should be no locks set by SELECT and transaction should not even start no matter what the value of autocommit is (it was explicitly set to 0 by smart software).

The MySQL manual clearly says:

“To ensure transaction serializability, the server must not permit one session to perform a data definition language (DDL) statement on a table that is used in an uncompleted explicitly or implicitly started transaction in another session. The server achieves this by acquiring metadata locks on tables used within a transaction and deferring release of those locks until the transaction ends. A metadata lock on a table prevents changes to the table’s structure. This locking approach has the implication that a table that is being used by a transaction within one session cannot be used in DDL statements by other sessions until the transaction ends.”

So, the real challenge was to show these metadata locks still set in a transaction that started implicitly, by SELECT immediately following SET autocommit=0 in a session. It was a good chance to check how metadata locks are exposed in MySQL 5.7 via Performance Schema, so I’ve set up a simple test.

First of all, I’ve enabled instrumentation for metadata locks:

[openxs@centos 5.7]$ bin/mysql --no-defaults -uroot -proot
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or g.
Your MySQL connection id is 4
Server version: 5.7.6-m16 MySQL Community Server (GPL)
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
mysql> UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME = 'global_instrumentation';
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1 Changed: 0 Warnings: 0
mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1 Changed: 0 Warnings: 0

Then I’ve set up a simple test based on the details from the issue (I’ve create the InnoDB table, t, and added a row to it before this):

mysql> set session autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> select @@autocommit, @@tx_isolation;
+--------------+----------------+
| @@autocommit | @@tx_isolation |
+--------------+----------------+
| 0 | READ-COMMITTED |
+--------------+----------------+
1 row in set (0.00 sec)
mysql> select * from t limit 1;
+----+------+
| id | val |
+----+------+
| 1 | 1 |
+----+------+
1 row in set (0.00 sec)

Now, from another session I tried to TRUNCATE the table before the fist session got a chance to do explicit or implicit COMMIT (In the issue I mentioned software used just had not cared to do this, assuming transaction had not started. It worked with MySQL 5.1 really well that way.)

mysql> truncate table t;

I was not surprised that TRUNCATE hung. Manual clearly says that until transaction is committed we do not release metadata locks. But let’s check them in Performance Schema (from the first session, where we executed SELECT):

mysql> select * from performance_schema.metadata_locksG
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: t
OBJECT_INSTANCE_BEGIN: 140450128308592
LOCK_TYPE: SHARED_READ
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_parse.cc:5585
OWNER_THREAD_ID: 27
OWNER_EVENT_ID: 17
*************************** 2. row ***************************
OBJECT_TYPE: GLOBAL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140450195436144
LOCK_TYPE: INTENTION_EXCLUSIVE
LOCK_DURATION: STATEMENT
LOCK_STATUS: GRANTED
SOURCE: sql_base.cc:5224
OWNER_THREAD_ID: 30
OWNER_EVENT_ID: 8
*************************** 3. row ***************************
OBJECT_TYPE: SCHEMA
OBJECT_SCHEMA: test
OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140450195434272
LOCK_TYPE: INTENTION_EXCLUSIVE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_base.cc:5209
OWNER_THREAD_ID: 30
OWNER_EVENT_ID: 8
*************************** 4. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: t
OBJECT_INSTANCE_BEGIN: 140450195434368
LOCK_TYPE: EXCLUSIVE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: PENDING
SOURCE: sql_parse.cc:5585
OWNER_THREAD_ID: 30
OWNER_EVENT_ID: 8
*************************** 5. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: performance_schema
OBJECT_NAME: metadata_locks
OBJECT_INSTANCE_BEGIN: 140450128262384
LOCK_TYPE: SHARED_READ
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_parse.cc:5585
OWNER_THREAD_ID: 27
OWNER_EVENT_ID: 18
5 rows in set (0.00 sec)

Note SHARED_READ lock set on table t and EXCLUSIVE lock is pending on the same table t above. TRUNCATE is blocked (as DDL).

Note also locks related  to out SELECT from the metadata_locks table in the output. Yes, access to Performance Schema is also protected with metadata locks!

We can get a nice view of all metadata locks from other sessions, excluding our current one, and check also all we could get about them before MySQL 5.7 (just a thread state in the SHOW PROCESSLIST output):

mysql> SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS, THREAD_ID, PROCESSLIST_ID, PROCESSLIST_INFO FROM performance_schema.metadata_locks INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID WHERE PROCESSLIST_ID <> CONNECTION_ID();
+-------------+---------------+-------------+---------------------+-------------+-----------+----------------+------------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | LOCK_TYPE | LOCK_STATUS | THREAD_ID | PROCESSLIST_ID | PROCESSLIST_INFO |
+-------------+---------------+-------------+---------------------+-------------+-----------+----------------+------------------+
| GLOBAL | NULL | NULL | INTENTION_EXCLUSIVE | GRANTED | 30 | 8 | truncate table t |
| SCHEMA | test | NULL | INTENTION_EXCLUSIVE | GRANTED | 30 | 8 | truncate table t |
| TABLE | test | t | EXCLUSIVE | PENDING | 30 | 8 | truncate table t |
+-------------+---------------+-------------+---------------------+-------------+-----------+----------------+------------------+
3 rows in set (0.00 sec)
mysql> show processlist;
+----+------+-----------+------+---------+------+---------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+------+---------+------+---------------------------------+------------------+
| 5 | root | localhost | test | Query | 0 | starting | show processlist |
| 8 | root | localhost | test | Query | 50 | Waiting for table metadata lock | truncate table t |
+----+------+-----------+------+---------+------+---------------------------------+------------------+
2 rows in set (0.00 sec)

As soon as I complete transaction where SELECT was executed, TRUNCATE completes and we see no pending metadata locks:

mysql> commit;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS, THREAD_ID, PROCESSLIST_ID, PROCESSLIST_INFO FROM performance_schema.metadata_locks INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID WHERE PROCESSLIST_ID <> CONNECTION_ID();
Empty set (0.01 sec)
mysql> select * from t;
Empty set (0.00 sec)

To summarize, MySQL 5.7 allows you to study all metadata locks in detail. They are set for both transactional and non-transactional tables, but remember that if you use autocommit=0 or start transaction explicitly they are released only when commit happens, implicit or explicit. If you want single statement SELECT to not block any DDL after it is completed, make sure to COMMIT immediately or use autocommit=1.

We can surely call the behavior of metadata locks for this case a “bug” or file a “feature request” to change it, but for now any software that is supposed to work with MySQL 5.5.3+ should just take all implications of metadata locks into account.

The post More on (transactional) MySQL metadata locks appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com