Jul
02
2019
--

ALTERS, Foreign Keys, and Metadata Locks, oh my!

I’m sure you’ve seen it – you kick off an ALTER and you get the dreaded “waiting on metadata lock”. In many cases, this is expected if you are actively working on the table. However, I recently had a case with a client where the table being altered was rarely updated and very small (<100 rows). The ALTER just sat for hours during a load test (more on this shortly) and never completed until the load test was stopped. Upon stopping the load test, the ALTER completed in less than a second as expected. So what was going on here?

Check Foreign Keys

My first instinct, whenever there is odd locking, is to check foreign keys. Naturally, this table had some FKs referencing a much busier table. However, this behavior still seemed rather strange. When running an ALTER against a table, there is a request for a SHARED_UPGRADABLE metadata lock against the table being altered (the child). There is also a SHARED_READ_ONLY metadata lock request against the parent, and this is where things can get messy.

Let’s take a look at how MDLs are acquired per the documentation (https://dev.mysql.com/doc/refman/en/metadata-locking.html):

If there are multiple waiters for a given lock, the highest-priority lock request is satisfied first, with an exception related to the max_write_lock_count system variable. Write lock requests have higher priority than read lock requests.

It is important to note that the lock order is serialized:

Statements acquire metadata locks one by one, not simultaneously, and perform deadlock detection in the process.

Normally, when thinking of a queue, we think of a FIFO process. If I issue the following three statements (in this order), you would expect them to complete in the same order:

  1. INSERT INTO parent…
  2. ALTER TABLE child…
  3. INSERT INTO parent…

However, as the child ALTER statement requests a read lock against parent, the two inserts will complete PRIOR to the ALTER despite the ordering.  Here is a sample scenario in which this can be demonstrated:

Table setup and initial population:

CREATE TABLE `parent` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `val` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `child` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `parent_id` int(11) DEFAULT NULL,
  `val` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_parent` (`parent_id`),
  CONSTRAINT `fk_parent` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB;
INSERT INTO `parent` VALUES (1, "one"), (2, "two"), (3, "three"), (4, "four");

Session 1:

start transaction;
update parent set val = "four-new" where id = 4;

Session 2:

alter table child add index `idx_new` (val);

Session 3:

start transaction;
update parent set val = "three-new" where id = 3;

At this point, session 1 has an open transaction and is in sleep with a write metadata lock granted on parent.  Session 2 has an upgradeable (write) lock granted on child and is waiting on a read lock on the parent. Finally, session 3 has a granted write lock against parent:

mysql> select * from performance_schema.metadata_locks;
+-------------+-------------+-------------------+---------------+-------------+
| OBJECT_TYPE | OBJECT_NAME | LOCK_TYPE         | LOCK_DURATION | LOCK_STATUS |
+-------------+-------------+-------------------+---------------+-------------+
| TABLE       | child       | SHARED_UPGRADABLE | TRANSACTION   | GRANTED     | <- ALTER (S2)
| TABLE       | parent      | SHARED_WRITE      | TRANSACTION   | GRANTED     | <- UPDATE (S1)
| TABLE       | parent      | SHARED_WRITE      | TRANSACTION   | GRANTED     | <- UPDATE (S3)
| TABLE       | parent      | SHARED_READ_ONLY  | STATEMENT     | PENDING     | <- ALTER (S2)
+-------------+-------------+-------------------+---------------+-------------+

Notice, the only session with a pending lock status is session 2 (the ALTER).  Session 1 and session 3 (issued before and after the ALTER respectively) both have been granted the write locks.  Where the ordering breaks down is when the commit happens on session 1. When thinking about an ordered queue, one would expect session 2 to acquire the lock and things would just move on.  However, due to the priority nature of the metadata lock system, session 2 still waits and now session 3 has the lock.

If another write session comes in and starts a new transaction and acquires a write lock against the parent table, then even when session 3 completes, the ALTER will still be blocked.  You can see where this is going…

As long as I keep an active transaction that has an MDL against the parent table open, the ALTER on the child table will never complete.  Making this worse, since the write lock on the child table was successful (but the full statement is waiting on acquiring the parent read lock), all incoming read requests against the child table will be blocked!!

Also, think about how you normally try to troubleshoot a statement that won’t complete. You look at transactions (both in the processlist and InnoDB status) that have been open for a long time. But since the blocking thread is now younger than the ALTER thread, the oldest transaction/thread you will see is the ALTER. Cue hair pulling!!
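
For what it’s worth, here is a sketch of a query that can surface the actual blocker in this situation by tying metadata locks back to processlist entries. It assumes the MDL instrument is enabled (see the note at the end of this post); the schema name is illustrative:

SELECT ml.OBJECT_SCHEMA, ml.OBJECT_NAME, ml.LOCK_TYPE, ml.LOCK_STATUS,
       t.PROCESSLIST_ID, t.PROCESSLIST_INFO
FROM performance_schema.metadata_locks ml
JOIN performance_schema.threads t ON t.THREAD_ID = ml.OWNER_THREAD_ID
WHERE ml.OBJECT_SCHEMA = 'test';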

This is exactly what was happening in this scenario.  In preparation for a release, our client was running their ALTER statements in conjunction with a load test (a very good practice!) to ensure a smooth release.  The problem was that the load test kept an active write transaction open against the parent table. That isn’t to say it just kept writing, but rather there were multiple threads and one was ALWAYS active.  This prevented the ALTER from ever completing and blocked ensuing read requests to the relatively static child table.

Fortunately, there is a solution to this problem (other than banishing FKs from the schema).  The variable max_write_lock_count (https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_max_write_lock_count) can be used to allow read locks to be granted ahead of write locks after a threshold of consecutive write locks.  By default, this variable is set to 18,446,744,073,709,551,615. So you just have to wait for 18 quintillion write requests to complete before the read would be granted.  To put that in perspective, if you were issuing 10,000 writes/sec against that table, your read would be blocked for 58 million years.  

To prevent this from happening, you can simply reduce max_write_lock_count to a small number (say 10?) and after every 10 write locks get acquired, the MDL subsystem will look for pending read locks, grant one, then move back to writes.  Problem solved!
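
For example, here is a minimal sketch of the runtime change (the value 10 is arbitrary; pick what fits your workload):

SET GLOBAL max_write_lock_count = 10;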

Being a dynamic variable, this can be adjusted at runtime to allow the waiting ALTER to complete.  In general, this is more of an edge case as there normally is some time in between writes to a table for read locks to get acquired.  However, if your use case keeps concurrent sessions running that ALWAYS have a transaction against a table that is referenced as an FK, you could see this situation crop up.  Fortunately, the fix is straightforward and can be done on the fly!

NOTE:  This troubleshooting was made possible through the performance schema and enabling the metadata_locks table as described here: https://dev.mysql.com/doc/refman/5.7/en/metadata-locks-table.html
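
For reference, a sketch of enabling the MDL instrument at runtime if it is not already on (it is enabled by default in MySQL 8.0, but not in 5.7):

UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME = 'wait/lock/metadata/sql/mdl';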

Mar
19
2019
--

Upcoming Webinar Thurs 3/21: MySQL Performance Schema in 1 hour

MySQL Performance Schema in 1 hour

Please join Percona’s Principal Support Engineer, Sveta Smirnova, as she presents MySQL Performance Schema in 1 hour on Thursday, March 21st, 2019, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4).

Register Now

MySQL 8.0 Performance Schema is a mature tool, used by humans and monitoring products. It was born in 2010 as “a feature for monitoring server execution at a low level.” The tool has grown over the years with performance fixes and DBA-facing features. In this webinar, I will give an overview of Performance Schema, focusing on its tuning, performance, and usability.

Performance Schema helps to troubleshoot query performance, complicated locking issues and memory leaks. It can also troubleshoot resource usage, problematic behavior caused by inappropriate settings and much more. Additionally, it comes with hundreds of options which allow for greater precision tuning.

Performance Schema is a potent and very complicated tool. What’s more, it does not affect performance in most cases. However, it collects a lot of data and sometimes this data is hard to read.

In this webinar, I will guide you through the main Performance Schema features, design, and configuration. You will learn how to get the best of it. I will cover its companion sys schema and graphical monitoring tools.

In order to learn more, register for MySQL Performance Schema in 1 hour today.

Oct
17
2018
--

Can MySQL Parallel Replication Help My Slave?

InnoDB Row Operations per Hour graph from Percona Monitoring and Management performance monitoring tool

Parallel replication has been around for a few years now but is still not that commonly used. I had a customer where the master had a very large write workload. The slave could not keep up, so I recommended using parallel slave threads. But how can I measure if it really helps and is working?

At my customer, slave_parallel_workers was 0. But how much should I increase it? Maybe to 1? Maybe to 10? There is a blog post about how we can see how many threads are actually used, which is a great help.

We changed the following variables on the slave:

slave_parallel_type = LOGICAL_CLOCK;
slave_parallel_workers = 40;
slave_preserve_commit_order = ON;

40 threads sounds like a lot, right? Of course, this is workload specific: if the transactions are independent, it might be useful.
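
For reference, a minimal sketch of applying these settings at runtime on the slave. The SQL thread must be stopped to change the parallel settings, and on 5.7 slave_preserve_commit_order also requires the binary log and log_slave_updates to be enabled on the slave:

STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 40;
SET GLOBAL slave_preserve_commit_order = ON;
START SLAVE SQL_THREAD;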

Let’s have a look at how many threads are working:

mysql> SELECT performance_schema.events_transactions_summary_by_thread_by_event_name.THREAD_ID AS THREAD_ID
, performance_schema.events_transactions_summary_by_thread_by_event_name.COUNT_STAR AS COUNT_STAR
FROM performance_schema.events_transactions_summary_by_thread_by_event_name
WHERE performance_schema.events_transactions_summary_by_thread_by_event_name.THREAD_ID IN
     (SELECT performance_schema.replication_applier_status_by_worker.THREAD_ID
      FROM performance_schema.replication_applier_status_by_worker);
+-----------+------------+
| THREAD_ID | COUNT_STAR |
+-----------+------------+
|     25882 |     442481 |
|     25883 |     433200 |
|     25884 |     426460 |
|     25885 |     419772 |
|     25886 |     413751 |
|     25887 |     407511 |
|     25888 |     401592 |
|     25889 |     395169 |
|     25890 |     388861 |
|     25891 |     380657 |
|     25892 |     371923 |
|     25893 |     362482 |
|     25894 |     351601 |
|     25895 |     339282 |
|     25896 |     325148 |
|     25897 |     310051 |
|     25898 |     292187 |
|     25899 |     272990 |
|     25900 |     252843 |
|     25901 |     232424 |
+-----------+------------+

You can see that all the threads are working, which is great.

But did this really speed up the replication? Could the slave write more in the same period of time?

Let’s see the replication lag:

MySQL Replication Delay graph from PMM

As we can see, the lag goes down quite quickly. Is this because of the increased thread count? Or is it because the job which generated the many inserts finished and there are no more writes coming? (The replication delay did not go to 0 because this slave is deliberately delayed by an hour.)
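
For reference, such a deliberate delay is configured on the slave itself; a minimal sketch (the one-hour value matches the setup described above):

STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;
START SLAVE;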

Luckily in PMM we have other graphs as well that can help us. Like this one showing InnoDB row operations:

InnoDB Row Operations graph from PMM

 

That looks promising: the slave now inserts many more rows than usual. But how many rows were actually inserted? Let’s create a new graph to see how many rows were inserted per hour. In PMM we already have all this information, we just have to create a new graph using the following query:

increase(mysql_global_status_innodb_row_ops_total{instance="$host",operation!="read"}[1h])

And this is the result:

InnoDB Row Operations per Hour graph from Percona Monitoring and Management performance monitoring tool

We can see a huge jump in the number of inserted rows per hour: it went from ~50 million to 200-400 million per hour. We can say that increasing slave_parallel_workers really helped.

Conclusion

In this case, parallel replication was extremely useful, and we could confirm that using PMM and Performance Schema. If you tune slave_parallel_workers, check the graphs. You can show the impact to your boss.

 

 

Oct
10
2018
--

Instrumenting Read Only Transactions in InnoDB

Instrumenting read only transactions MySQL

Probably not well known but quite an important optimization was introduced in MySQL 5.6 – reduced overhead for “read only transactions”. While usually by a “transaction” we mean a query or a group of queries that change data, with transaction engines like InnoDB, every data read or write operation is a transaction.

Now, as a non-locking read operation obviously has less impact on the data, it does not need all the instrumenting overhead a write transaction has. The main thing that can be avoided, as described in the documentation, is the transaction ID. So, since MySQL 5.6, a read only transaction does not have a transaction ID. Moreover, such a transaction is not visible in the SHOW ENGINE INNODB STATUS output, though I will not go deeper into what that really means under the hood in this article. The fact is that this optimization allows for better scaling of workloads with many RO threads. An example RO benchmark, where the 5.5 vs 5.6/5.7 difference is clearly visible, may be found here: https://www.percona.com/blog/2016/04/07/mysql-5-7-sysbench-oltp-read-results-really-faster/

To benefit from this optimization in MySQL 5.6, either a transaction has to start with the explicit START TRANSACTION READ ONLY clause or it must be an autocommit, non-locking SELECT statement. In version 5.7 and newer, it goes further, as a new transaction is treated as read-only until a locking read or write is executed, at which point it gets “upgraded” to a read-write one.

Information Schema Instrumentation

Let’s see how this looks (on MySQL 8.0.12) by examining the information_schema.innodb_trx and information_schema.innodb_metrics tables. The second of these has transaction counters disabled by default, so before the test we have to enable them with:

SET GLOBAL innodb_monitor_enable = 'trx%comm%';

or by adding a parameter to the

[mysqld]

 section of the configuration file and restarting the instance:

innodb_monitor_enable = "trx_%"
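
If you repeat these tests, the counters can also be zeroed between runs; a minimal sketch using the corresponding reset variable:

SET GLOBAL innodb_monitor_reset_all = 'trx%comm%';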

Now, let’s start a transaction which should be read only according to the rules:

mysql [localhost] {msandbox} (db1) > START TRANSACTION; SELECT count(*) FROM db1.t1;
Query OK, 0 rows affected (0.00 sec)
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT trx_id,trx_weight,trx_rows_locked,trx_rows_modified,trx_is_read_only,trx_autocommit_non_locking
FROM information_schema.innodb_trx\G
*************************** 1. row ***************************
                    trx_id: 421988493944672
                trx_weight: 0
           trx_rows_locked: 0
         trx_rows_modified: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)

The transaction started as above did not appear in SHOW ENGINE INNODB STATUS, and its trx_id looks strangely high. And the first surprise: for some reason, trx_is_read_only is 0. Now, what if we commit such a transaction? How do the counters change? (I reset them before the test):

mysql [localhost] {msandbox} (db1) > commit;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     0 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     1 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     0 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.01 sec)

OK, so clearly it was a read-only transaction overall, just the trx_is_read_only property wasn’t set as expected. I had to report this problem here: https://bugs.mysql.com/bug.php?id=92558

What about an explicit RO transaction:

mysql [localhost] {msandbox} (db1) > START TRANSACTION READ ONLY; SELECT count(*) FROM db1.t1;
Query OK, 0 rows affected (0.00 sec)
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT trx_id,trx_weight,trx_rows_locked,trx_rows_modified,trx_is_read_only,trx_autocommit_non_locking
FROM information_schema.innodb_trx\G
*************************** 1. row ***************************
                    trx_id: 421988493944672
                trx_weight: 0
           trx_rows_locked: 0
         trx_rows_modified: 0
          trx_is_read_only: 1
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > commit;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     0 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     2 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     0 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.01 sec)

OK, both transactions are counted as the same type. Moreover, the two transactions shared the same strange trx_id, which appears to be a fake one. For a simple read executed in autocommit mode, the counters increase as expected too:

mysql [localhost] {msandbox} (db1) > select @@autocommit; SELECT count(*) FROM db1.t1;
+--------------+
| @@autocommit |
+--------------+
|            1 |
+--------------+
1 row in set (0.00 sec)
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     0 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     2 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     1 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.00 sec)

Now, let’s test how a transaction looks when we upgrade it to RW later:

mysql [localhost] {msandbox} (db1) > START TRANSACTION; SELECT count(*) FROM db1.t1;
Query OK, 0 rows affected (0.00 sec)
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT trx_id,trx_weight,trx_rows_locked,trx_rows_modified,trx_is_read_only,trx_autocommit_non_locking
FROM information_schema.innodb_trx\G
*************************** 1. row ***************************
                    trx_id: 421988493944672
                trx_weight: 0
           trx_rows_locked: 0
         trx_rows_modified: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT count(*) FROM db1.t1 FOR UPDATE;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT trx_id,trx_weight,trx_rows_locked,trx_rows_modified,trx_is_read_only,trx_autocommit_non_locking
FROM information_schema.innodb_trx\G
*************************** 1. row ***************************
                    trx_id: 4106
                trx_weight: 2
           trx_rows_locked: 4
         trx_rows_modified: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > commit;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     1 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     2 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     1 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.00 sec)

OK, as seen above, after a locking read was done, our transaction has transformed: it got a real, unique trx_id assigned. Then, when committed, the RW counter increased.

Performance Schema Problem

Nowadays it may feel natural to use performance_schema for monitoring everything. And, indeed, we can monitor types of transactions with it as well. Let’s enable the needed consumers and instruments:

mysql [localhost] {msandbox} (db1) > UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%transactions%';
Query OK, 0 rows affected (0.00 sec)
Rows matched: 3  Changed: 0  Warnings: 0
mysql [localhost] {msandbox} (db1) > UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME = 'transaction';
Query OK, 0 rows affected (0.01 sec)
Rows matched: 1  Changed: 0  Warnings: 0
mysql [localhost] {msandbox} (db1) > SELECT * FROM performance_schema.setup_instruments WHERE NAME = 'transaction';
+-------------+---------+-------+------------+------------+---------------+
| NAME        | ENABLED | TIMED | PROPERTIES | VOLATILITY | DOCUMENTATION |
+-------------+---------+-------+------------+------------+---------------+
| transaction | YES     | YES   |            |          0 | NULL          |
+-------------+---------+-------+------------+------------+---------------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT * FROM performance_schema.setup_consumers WHERE NAME LIKE '%transactions%';
+----------------------------------+---------+
| NAME                             | ENABLED |
+----------------------------------+---------+
| events_transactions_current      | YES     |
| events_transactions_history      | YES     |
| events_transactions_history_long | YES     |
+----------------------------------+---------+
3 rows in set (0.01 sec)
mysql [localhost] {msandbox} (db1) > SELECT COUNT_STAR,COUNT_READ_WRITE,COUNT_READ_ONLY
FROM performance_schema.events_transactions_summary_global_by_event_name\G
*************************** 1. row ***************************
      COUNT_STAR: 0
COUNT_READ_WRITE: 0
 COUNT_READ_ONLY: 0
1 row in set (0.00 sec)

And let’s do some simple tests:

mysql [localhost] {msandbox} (db1) > START TRANSACTION; COMMIT;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT COUNT_STAR,COUNT_READ_WRITE,COUNT_READ_ONLY
FROM performance_schema.events_transactions_summary_global_by_event_name\G
*************************** 1. row ***************************
      COUNT_STAR: 1
COUNT_READ_WRITE: 1
 COUNT_READ_ONLY: 0
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     0 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     0 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     0 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.00 sec)

An empty transaction caused an increase to the RW counter in the Performance Schema view! Moreover, a simple autocommit SELECT increases it too:

mysql [localhost] {msandbox} (db1) > SELECT count(*) FROM db1.t1;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.01 sec)
mysql [localhost] {msandbox} (db1) > SELECT COUNT_STAR,COUNT_READ_WRITE,COUNT_READ_ONLY
FROM performance_schema.events_transactions_summary_global_by_event_name\G
*************************** 1. row ***************************
      COUNT_STAR: 2
COUNT_READ_WRITE: 2
 COUNT_READ_ONLY: 0
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > START TRANSACTION READ ONLY; COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT COUNT_STAR,COUNT_READ_WRITE,COUNT_READ_ONLY
FROM performance_schema.events_transactions_summary_global_by_event_name\G
*************************** 1. row ***************************
      COUNT_STAR: 3
COUNT_READ_WRITE: 2
 COUNT_READ_ONLY: 1
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     0 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     0 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     1 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.01 sec)

As seen above, with regard to monitoring transactions via Performance Schema, everything seems completely broken: empty transactions increase the counters, and the only way to increase the RO counter is to start a read-only transaction explicitly. But again, it should not count when no real read was done from a table. For this reason I filed another bug report: https://bugs.mysql.com/bug.php?id=92364

PMM Dashboard

We implemented a transactions information view in PMM, based on information_schema.innodb_metrics, which, as presented above, is reliable and shows the correct counters. Therefore, I encourage everyone to use the innodb_monitor_enable setting to enable it and have PMM graph it. It will look something like this:

Jun
26
2018
--

Webinar 6/27: MySQL Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance

performance troubleshooting MySQL monitoring tools

Please join Percona’s Principal Support Escalation Specialist Sveta Smirnova as she presents Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance on Wednesday, June 27th at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

 

During the MySQL Troubleshooting webinar series, I covered many monitoring and logging tools such as:

  • General, slow, audit, binary, error log files
  • Performance Schema
  • Information Schema
  • System variables
  • Linux utilities
  • InnoDB monitors
  • PMM

However, I did not spend much time on the impact these instruments have on overall MySQL performance. And they do have an impact.

And this is the conflict many people face. MySQL Server users try exploring these monitoring instruments, see that they slow down their installations, and turn them off. This is unfortunate. If the instrument that could help you resolve a problem is OFF, you won’t have the necessary information to understand when, how and why the issue occurred. In the best case, you’ll re-enable instrumentation and wait for the next occurrence of the disaster. In the worst case, you try various fix options without any real knowledge of whether they solve the problem or not.

This is why it is important to understand the impact monitoring tools have on your database, and therefore how to minimize it.

Understanding and controlling the impact of MySQL monitoring tools

In this webinar, I cover why certain monitoring tools affect performance, and how to minimize the impact without turning the instrument off. You will learn how to monitor safely and effectively.

Register Now

 

Sveta Smirnova

Principal Support Escalation Specialist

Sveta joined Percona in 2015. Her main professional interests are problem-solving, working with tricky issues and bugs, finding patterns that can quickly solve typical issues, and teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona, Sveta worked as a Support Engineer in the MySQL Bugs Analysis Support Group at MySQL AB-Sun-Oracle. She is the author of the book “MySQL Troubleshooting” and of the JSON UDF functions for MySQL.


Apr
04
2018
--

Performance Schema for MySQL Troubleshooting Webinar: Q & A

MySQL Troubleshooting

In this blog, I will provide answers to the Q & A for the Performance Schema for MySQL Troubleshooting webinar.

First, I want to thank everybody for attending my March 1, 2018, webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I was unable to answer fully during the webinar.

Q: Is Workbench able to take advantage of the enhancements to Perf schema?

A: MySQL Workbench is a graphical tool for database architects, administrators and developers. It uses Performance Schema for its Performance Schema Reports and Query Statistics dashboards for MySQL Servers of version 5.6 or greater. So the answer is: yes, it is able to take advantage of the enhancements to Performance Schema.

Q: Can we check the history data ?

A: Yes. To do it, you need to enable the history consumers. You will find instructions here. For all kinds of consumers, the history table names follow the same pattern:

  • *_history contains the last N events per thread. N is defined by the performance_schema_*_history_size variables. The default is -1 (autosized) in version 5.7 and 10 in version 5.6.
  • *_history_long contains the most recent M events. M is defined by the performance_schema_*_history_long_size variables. The default is -1 (autosized) in version 5.7 and 10000 in version 5.6.

For example, if you want to have historical data for statements, you need to enable the consumers events_statements_history and events_statements_history_long. If you want to limit the number of queries stored, set the variables performance_schema_events_statements_history_size and performance_schema_events_statements_history_long_size.
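
For example, a sketch of enabling the statement history consumers at runtime (the corresponding _size variables are not dynamic, so they have to be set in the configuration file):

UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME IN ('events_statements_history', 'events_statements_history_long');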

Q: Are there any guidelines regarding how much memory we should set aside for every X counters/statistics being enabled?

A: No, there is no such guideline that I am aware of. But you can use the table definitions in Performance Schema to calculate approximately how much memory one row could occupy, and make predictions from that. You can also use the memory instrumentation in Performance Schema and watch how memory usage changes under load, adjusting as needed.
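
As an illustration of the memory instrumentation approach, here is a sketch of watching current allocations per instrument (in 5.7 the memory/% instruments may need to be enabled first; in 8.0 most are on by default):

SELECT EVENT_NAME, CURRENT_NUMBER_OF_BYTES_USED
FROM performance_schema.memory_summary_global_by_event_name
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC
LIMIT 10;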

Q: How has the performance cost of performance schema changed from 5.6 to 5.7?

A: The worst situation for the performance cost of Performance Schema was in version 5.5. It was discussed in numerous places, but I recommend you read this 2010 post from Dimitri Kravtchuk, a MySQL Performance Architect at Oracle. The result of his post was a huge performance improvement in Performance Schema, reported in this post from 2011. There were more improvements, discussed in 2012. Since then, Performance Schema does not add significant performance overhead unless you enable waits instrumentation.

Version 5.7 added new instrumentation, and as my tests showed this instrumentation did not add any noticeable performance impact.

To summarize: version 5.6 made huge improvements to the performance of Performance Schema, and the new features in version 5.7 did not add any performance regressions.

Q: Will performance schema eat up my disk space? How long will it store all these logs and cause any issues?

A: Performance Schema does not store anything on disk; it uses memory. It stores data until the consumer tables reach their maximum size, at which point it removes the oldest data and replaces it with the newest statistics. Read the Performance Schema startup configuration guide on how to limit the maximum size of consumer tables.

Thanks for attending this webinar on Performance Schema for MySQL Troubleshooting. You can find the slides and a recording here.


Mar
14
2018
--

Basic Internal Troubleshooting Tools for MySQL Server Webinar: Q & A

Troubleshooting Tools for MySQL

In this blog, I will provide answers to the Q & A for the Basic Internal Troubleshooting Tools for MySQL Server webinar.

First, I want to thank everybody for attending my February 15, 2018, webinar on troubleshooting tools for MySQL. The recording and slides for the webinar are available here. Below is the list of your questions that I was unable to answer fully during the webinar.

Q: How do we prevent the schema prefix from appearing in the show create view. This is causing issue with restore on another server with a different DB. See the issue here and reproducible test case: https://gist.github.com/anonymous/30cb138b598fec46be762789397796b6

A: I shortened the example in order to fit it in this blog:

mysql> create table t1(f1 int);
Query OK, 0 rows affected (3.47 sec)
mysql> create view v1 as select * from t1;
Query OK, 0 rows affected (0.21 sec)
mysql> show create view v1\G
*************************** 1. row ***************************
                View: v1
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v1` AS select `t1`.`f1` AS `f1` from `t1`
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.00 sec)
mysql> select * from information_schema.views where table_schema='test'\G
*************************** 1. row ***************************
       TABLE_CATALOG: def
        TABLE_SCHEMA: test
          TABLE_NAME: v1
     VIEW_DEFINITION: select `test`.`t1`.`f1` AS `f1` from `test`.`t1`
        CHECK_OPTION: NONE
        IS_UPDATABLE: YES
             DEFINER: root@localhost
       SECURITY_TYPE: DEFINER
CHARACTER_SET_CLIENT: utf8
COLLATION_CONNECTION: utf8_general_ci
1 row in set (0.00 sec)

The issue you experienced happened because even if you created a view as SELECT foo FROM table1;, it is stored as SELECT foo FROM your_schema.table1;. You can see it if you query the *.frm file for the view:

sveta@Thinkie:~/build/ps-5.7/mysql-test$ cat var/mysqld.1/data/test/v1.frm
TYPE=VIEW
query=select `test`.`t1`.`f1` AS `f1` from `test`.`t1`
md5=5840f59d1287385629fcb9b948e53d96
updatable=1
algorithm=0
definer_user=root
definer_host=localhost
suid=2
with_check_option=0
timestamp=2018-02-24 10:27:45
create-version=1
source=select * from t1
client_cs_name=utf8
connection_cl_name=utf8_general_ci
view_body_utf8=select `test`.`t1`.`f1` AS `f1` from `test`.`t1`

You cannot prevent the schema prefix from being stored. If you restore the view on a different server with a different database name, you should edit the view definition manually. If you already restored a view that points to a non-existent schema, just recreate it. A VIEW is metadata only and does not hold any data, so this operation is non-blocking and will run momentarily.

Q: What is thread/sql/compress_gtid_table in performance_schema.threads?

A: thread/sql/compress_gtid_table is the name of the instrument. You can read this and other instrument names as follows:

  • thread/ is a group of instruments. In this case, it is the instruments that are visible in the THREADS table.
  • thread/sql/ is the group of instruments that are part of the server kernel code. If you are not familiar with the MySQL source tree, download the source code tarball and check its content. The main components are:

    • sql – server kernel
    • storage – where the storage engine code is located (storage/innobase is InnoDB code, storage/myisam is MyISAM code, and so on)
    • vio – input-output functions
    • mysys – code shared between all parts of the server
    • client – client library and utilities
    • strings – functions to work with strings

    This is not a full list. For more information, consult the MySQL Internals Manual.

  • thread/sql/compress_gtid_table is the name of the particular instrument.

Unfortunately, there is no link to the source code for instrumented threads in the THREADS table, but we can easily find them in the sql directory. The function compress_gtid_table is defined in sql/rpl_gtid_persist.cc, and we can check its comments to find out what it is doing:

/**
The main function of the compression thread.
- compress the gtid_executed table when get a compression signal.
@param p_thd Thread requesting to compress the table
@return
@retval 0 OK. always, the compression thread will swallow any error
for going to wait for next compression signal until
it is terminated.
*/
extern "C" {
static void *compress_gtid_table(void *p_thd)
{

You can also find the description of mysql.gtid_executed compression in the User Reference Manual.

You can follow the same actions to find out what other MySQL threads are doing.

Q: How does a novice on MySQL learn the core basics about MySQL. The documentation can be very vast which surpasses my understanding right now. Are there any good intro books you can recommend for a System Admin?

A: I learned MySQL a long time ago, and the book I can recommend was written for version 5.0: “MySQL 5.0 Certification Study Guide” by Paul DuBois, Stefan Hinz and Carsten Pedersen. The book is in two parts: one is devoted to SQL developers and explains how to run and tune queries. The second part is for DBAs and describes how to tune the MySQL server. I asked my colleagues to suggest more modern books for you, and this one is still on the list for many. It is in any case an awesome book for beginners; just note that MySQL has changed a lot since 5.0 and you need to deepen your knowledge after you finish reading it.

Another book that was recommended is “MySQL” by Paul DuBois. It is written for beginners and has plenty of content. Paul DuBois has been working on (and continues to work on) the official MySQL documentation for many years, and knows MySQL in great detail.

Another book is “Murach’s MySQL” by Joel Murach, which is used as a course book in many colleges for “Introduction into Databases” type classes.

For System Administrators, you can read “Systems Performance: Enterprise and the Cloud” by Brendan Gregg. This book talks about how to tune operating systems for performance. This is one of the consistent tasks we have to do when administering MySQL. I also recommend that you study Brendan Gregg’s website, which is a great source of information for everyone who is interested in operating system performance tuning.

After you finish the books for novices, you can check out “High Performance MySQL, 3rd Edition” by Peter Zaitsev, Vadim Tkachenko, Baron Schwartz and “MySQL Troubleshooting” by Sveta Smirnova (yours truly =) ). These two books require at least basic MySQL knowledge, however.

Q: Does database migration work the same way? Do these tools work for migration as well?

A: The tools I discussed in this webinar are available for any version of MySQL/Percona/MariaDB server. You may use them for migration. For example, it is always useful to compare configuration (SHOW GLOBAL VARIABLES) on both the “old” and “new” servers. It helps if you observe performance drops on the “new” server. Or you can check table definitions before and after migration. There are many more uses for these tools during the migration process.

Q: How can we take a backup of a single schema from a MySQL AWS instance without affecting the performance of applications? An AWS RDS instance, to be more clear. We cannot use mysqldump on the RDS instance in the current scenario.

A: You can connect to your RDS instance with mysqldump from your local machine, exactly like your MySQL clients connect to it. Then you can collect a dump of a single database or table, or even specify the option --where to limit the resulting set to only a portion of the table. Note that by default mysqldump is blocking, but if you back up solely transactional tables (InnoDB, TokuDB, MyRocks) you can run mysqldump with the option --single-transaction, which starts a transaction at the beginning of the backup job.
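
For illustration, a hedged sketch of such a dump of one schema from an RDS endpoint; the hostname, user and schema name here are placeholders:

mysqldump --single-transaction -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -u myuser -p mydatabase > mydatabase.sql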

Alternatively, you can use AWS Database Migration Service, which allows you to replicate your databases. Then you can take a backup of a single schema using whatever method you like.

Q: Why do some sites suggest turning off Information Schema and Performance Schema? Is it important to keep them on, or is it fine to turn them off?

A: You cannot turn off Information Schema. It is always available.

Performance Schema in earlier versions (before 5.6.14) was resource-consuming, even if it was idle while enabled. These limitations were fixed a long time ago, and you don’t need to keep it off, at least unless you hit some new bug.

Q: How do we handle storage level threshold if a data file size grows and reaches max threshold when unnoticed? Can you please help on this question?

A: Do you mean what will happen if the data file grows until the filesystem has no space left? In this case, clients receive the error "OS error code 28: No space left on device" until space is freed and mysqld can start functioning normally again. If it can write to the error log file (for example, if it is located on a different disk), you will see messages about error 28 in the error log file too.

Q: What are the performance bottlenecks when enabling performance_schema. Is there any benchmark we can have?

A: Just enabling Performance Schema in version 5.6 and up does not cause any performance issue. With version 5.7, it can also start with almost zero allocated memory, so it won’t affect your other buffers. Performance Schema causes impact when you enable particular instruments. Most of them are instruments that start with the name events_waits_*. I performed benchmarks on the effects of particular Performance Schema instruments and published them in this post.

Q: Can you suggest some tips about creating real-time dashboards, since we have a replication environment? It would be great if you could help us build business-level dashboards.

A: This is a topic for yet another webinar or, better still, a tutorial. For starters, I recommend checking out the “MySQL Replication” dashboard in PMM and extending it with the metrics that you need.

Thanks for attending the webinar on internal troubleshooting tools for MySQL.

Jun
22
2017
--

ClickHouse in a General Analytical Workload (Based on a Star Schema Benchmark)

ClickHouse

In this blog post, we’ll look at how ClickHouse performs in a general analytical workload using the star schema benchmark test.

We have mentioned ClickHouse in some recent posts (ClickHouse: New Open Source Columnar Database, Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs. Apache Spark), where it showed excellent results. ClickHouse by itself seems to be an event-oriented RDBMS, as its name suggests (clicks). Its primary use, in Yandex Metrica (a system similar to Google Analytics), also points to an event-based nature. We can also see there is a requirement for date-stamped columns.

It is possible, however, to use ClickHouse in a general analytical workload. This blog post shares my findings. For these tests, I used the Star Schema Benchmark, slightly modified to handle ClickHouse specifics.

First, let’s talk about schemas. We need to adjust to ClickHouse data types. For example, the biggest fact table in SSB is “lineorder”. Below is how it is defined for Amazon RedShift (as taken from https://docs.aws.amazon.com/redshift/latest/dg/tutorial-tuning-tables-create-test-data.html):

CREATE TABLE lineorder
(
  lo_orderkey          INTEGER NOT NULL,
  lo_linenumber        INTEGER NOT NULL,
  lo_custkey           INTEGER NOT NULL,
  lo_partkey           INTEGER NOT NULL,
  lo_suppkey           INTEGER NOT NULL,
  lo_orderdate         INTEGER NOT NULL,
  lo_orderpriority     VARCHAR(15) NOT NULL,
  lo_shippriority      VARCHAR(1) NOT NULL,
  lo_quantity          INTEGER NOT NULL,
  lo_extendedprice     INTEGER NOT NULL,
  lo_ordertotalprice   INTEGER NOT NULL,
  lo_discount          INTEGER NOT NULL,
  lo_revenue           INTEGER NOT NULL,
  lo_supplycost        INTEGER NOT NULL,
  lo_tax               INTEGER NOT NULL,
  lo_commitdate        INTEGER NOT NULL,
  lo_shipmode          VARCHAR(10) NOT NULL
);

For ClickHouse, the table definition looks like this:

CREATE TABLE lineorderfull (
        LO_ORDERKEY             UInt32,
        LO_LINENUMBER           UInt8,
        LO_CUSTKEY              UInt32,
        LO_PARTKEY              UInt32,
        LO_SUPPKEY              UInt32,
        LO_ORDERDATE            Date,
        LO_ORDERPRIORITY        String,
        LO_SHIPPRIORITY         UInt8,
        LO_QUANTITY             UInt8,
        LO_EXTENDEDPRICE        UInt32,
        LO_ORDTOTALPRICE        UInt32,
        LO_DISCOUNT             UInt8,
        LO_REVENUE              UInt32,
        LO_SUPPLYCOST           UInt32,
        LO_TAX                  UInt8,
        LO_COMMITDATE           Date,
        LO_SHIPMODE             String
)Engine=MergeTree(LO_ORDERDATE,(LO_ORDERKEY,LO_LINENUMBER),8192);

From this we can see that we need to use datatypes like UInt8 and UInt32, which are somewhat unusual in the database world.

The second table (RedShift definition):

CREATE TABLE customer
(
  c_custkey      INTEGER NOT NULL,
  c_name         VARCHAR(25) NOT NULL,
  c_address      VARCHAR(25) NOT NULL,
  c_city         VARCHAR(10) NOT NULL,
  c_nation       VARCHAR(15) NOT NULL,
  c_region       VARCHAR(12) NOT NULL,
  c_phone        VARCHAR(15) NOT NULL,
  c_mktsegment   VARCHAR(10) NOT NULL
);

For ClickHouse, I defined as:

CREATE TABLE customerfull (
        C_CUSTKEY       UInt32,
        C_NAME          String,
        C_ADDRESS       String,
        C_CITY          String,
        C_NATION        String,
        C_REGION        String,
        C_PHONE         String,
        C_MKTSEGMENT    String,
        C_FAKEDATE      Date
)Engine=MergeTree(C_FAKEDATE,(C_CUSTKEY),8192);

For reference, the full schema for the benchmark is here: https://github.com/vadimtk/ssb-clickhouse/blob/master/create.sql.

For this table, we need to define a rudimentary column C_FAKEDATE Date in order to use ClickHouse’s most advanced engine (MergeTree). I was told by the ClickHouse team that they plan to remove this limitation in the future.

To generate data acceptable by ClickHouse, I made modifications to ssb-dbgen. You can find my version here: https://github.com/vadimtk/ssb-dbgen. The most notable change is that ClickHouse can’t accept dates in CSV files formatted as “19971125”. It has to be “1997-11-25”. This is something to keep in mind when loading data into ClickHouse.

It is possible to do some preformatting during the load, but I don’t have experience with that. A common approach is to create a staging table with datatypes that match the loaded data, and then convert them using SQL functions when inserting into the main table.
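
As an illustration of that staging approach (a sketch only, with hypothetical table names, assuming the raw dates arrive as strings like “19971125”), the conversion can be done at insert time with ClickHouse string functions:

-- Hypothetical staging table: raw CSV values, date kept as a string
CREATE TABLE lineorder_staging (
        LO_ORDERKEY             UInt32,
        LO_ORDERDATE_RAW        String
) ENGINE = Log;

-- Convert '19971125' into a proper Date while copying into the target table
INSERT INTO lineorder_target
SELECT
    LO_ORDERKEY,
    toDate(concat(substring(LO_ORDERDATE_RAW, 1, 4), '-',
                  substring(LO_ORDERDATE_RAW, 5, 2), '-',
                  substring(LO_ORDERDATE_RAW, 7, 2))) AS LO_ORDERDATE
FROM lineorder_staging;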

Hardware Setup

One of the goals of this benchmark is to see how ClickHouse scales on multiple nodes. I used a setup of one node, and then compared it to a setup of three nodes. Each node has 24 cores of “Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz” CPUs, and the data is located on very fast PCIe Flash storage.

For the SSB benchmark I use a scale factor of 2500, which provides (in raw data):

  • Table lineorder – 15 bln rows, raw size 1.7TB
  • Table customer – 75 mln rows

When loaded into ClickHouse, the table lineorder takes 464GB, which corresponds to a 3.7x compression ratio.

We compare a one-node (table names lineorderfull, customerfull) setup vs. a three-node (table names lineorderd, customerd) setup.

Single Table Operations

Query:

SELECT
    toYear(LO_ORDERDATE) AS yod,
    sum(LO_REVENUE)
FROM lineorderfull
GROUP BY yod

One node:

7 rows in set. Elapsed: 9.741 sec. Processed 15.00 billion rows, 90.00 GB (1.54 billion rows/s., 9.24 GB/s.)

Three nodes:

7 rows in set. Elapsed: 3.258 sec. Processed 15.00 billion rows, 90.00 GB (4.60 billion rows/s., 27.63 GB/s.)

We see a speedup of practically three times. Handling 4.6 billion rows/s is blazingly fast!

One Table with Filtering

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorderfull
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)

One node:

1 rows in set. Elapsed: 3.175 sec. Processed 2.28 billion rows, 18.20 GB (716.60 million rows/s., 5.73 GB/s.)

Three nodes:

1 rows in set. Elapsed: 1.295 sec. Processed 2.28 billion rows, 18.20 GB (1.76 billion rows/s., 14.06 GB/s.)

It’s worth mentioning that during the execution of this query, ClickHouse was able to use ALL 24 cores on each box. This confirms that ClickHouse is a massively parallel processing system.

Two Tables (Independent Subquery)

In this case, I want to show how ClickHouse handles independent subqueries:

SELECT sum(LO_REVENUE)
FROM lineorderfull
WHERE LO_CUSTKEY IN
(
    SELECT C_CUSTKEY AS LO_CUSTKEY
    FROM customerfull
    WHERE C_REGION = 'ASIA'
)

One node:

1 rows in set. Elapsed: 28.934 sec. Processed 15.00 billion rows, 120.00 GB (518.43 million rows/s., 4.15 GB/s.)

Three nodes:

1 rows in set. Elapsed: 14.189 sec. Processed 15.12 billion rows, 121.67 GB (1.07 billion rows/s., 8.57 GB/s.)

We do not see, however, the close to 3x speedup on three nodes, because of the data transfer required to match LO_CUSTKEY with C_CUSTKEY.

Two Tables JOIN

With a subquery using columns to return results, or for GROUP BY, things get more complicated. In this case we want to GROUP BY the column from the second table.

First, ClickHouse doesn’t support traditional subquery syntax, so we need to use JOIN. For JOINs, ClickHouse also strictly prescribes how it must be written (a limitation that will also get changed in the future). Our JOIN should look like:

SELECT
    C_REGION,
    sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
FROM lineorderfull
ANY INNER JOIN
(
    SELECT
        C_REGION,
        C_CUSTKEY AS LO_CUSTKEY
    FROM customerfull
) USING (LO_CUSTKEY)
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
GROUP BY C_REGION

One node:

5 rows in set. Elapsed: 31.443 sec. Processed 2.35 billion rows, 28.79 GB (74.75 million rows/s., 915.65 MB/s.)

Three nodes:

5 rows in set. Elapsed: 25.160 sec. Processed 2.58 billion rows, 33.25 GB (102.36 million rows/s., 1.32 GB/s.)

In this case, the speedup is not even two times. This is due to the random data distribution for the tables lineorderd and customerd. Both tables were defined as:

CREATE TABLE lineorderd AS lineorder ENGINE = Distributed(3shards, default, lineorder, rand());
CREATE TABLE customerd AS customer ENGINE = Distributed(3shards, default, customer, rand());

Here, rand() defines that records are distributed randomly across the three nodes. When we perform a JOIN on LO_CUSTKEY=C_CUSTKEY, records might be located on different nodes. One way to deal with this is to distribute the data by the join key. For example:

CREATE TABLE lineorderLD AS lineorderL ENGINE = Distributed(3shards, default, lineorderL, LO_CUSTKEY);
CREATE TABLE customerLD AS customerL ENGINE = Distributed(3shards, default, customerL, C_CUSTKEY);

Three Tables JOIN

This is where it becomes very complicated. Let’s consider the query that you would normally write:

SELECT sum(LO_REVENUE),P_MFGR, toYear(LO_ORDERDATE) yod FROM lineorderfull ,customerfull,partfull WHERE C_REGION = 'ASIA' and
LO_CUSTKEY=C_CUSTKEY and P_PARTKEY=LO_PARTKEY GROUP BY P_MFGR,yod ORDER BY P_MFGR,yod;

With ClickHouse’s limitations on JOIN syntax, the query becomes:

SELECT
    sum(LO_REVENUE),
    P_MFGR,
    toYear(LO_ORDERDATE) AS yod
FROM
(
    SELECT
        LO_PARTKEY,
        LO_ORDERDATE,
        LO_REVENUE
    FROM lineorderfull
    ALL INNER JOIN
    (
        SELECT
            C_REGION,
            C_CUSTKEY AS LO_CUSTKEY
        FROM customerfull
    ) USING (LO_CUSTKEY)
    WHERE C_REGION = 'ASIA'
)
ALL INNER JOIN
(
    SELECT
        P_MFGR,
        P_PARTKEY AS LO_PARTKEY
    FROM partfull
) USING (LO_PARTKEY)
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC

By writing queries this way, we force ClickHouse to use the prescribed JOIN order — at this moment there is no optimizer in ClickHouse and it is totally unaware of data distribution.

There is also not much speedup when we compare one node vs. three nodes:

One node execution time:

35 rows in set. Elapsed: 697.806 sec. Processed 15.08 billion rows, 211.53 GB (21.61 million rows/s., 303.14 MB/s.)

Three nodes execution time:

35 rows in set. Elapsed: 622.536 sec. Processed 15.12 billion rows, 211.71 GB (24.29 million rows/s., 340.08 MB/s.)

There is a way to make the query faster for this 3-way JOIN, however. (Thanks to Alexander Zaytsev from https://www.altinity.com/ for help!)

Optimized query:

SELECT
    sum(revenue),
    P_MFGR,
    yod
FROM
(
    SELECT
        LO_PARTKEY AS P_PARTKEY,
        toYear(LO_ORDERDATE) AS yod,
        SUM(LO_REVENUE) AS revenue
    FROM lineorderfull
    WHERE LO_CUSTKEY IN
    (
        SELECT C_CUSTKEY
        FROM customerfull
        WHERE C_REGION = 'ASIA'
    )
    GROUP BY
        P_PARTKEY,
        yod
)
ANY INNER JOIN partfull USING (P_PARTKEY)
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC

Optimized query time:

One node:

35 rows in set. Elapsed: 106.732 sec. Processed 15.00 billion rows, 210.05 GB (140.56 million rows/s., 1.97 GB/s.)

Three nodes:

35 rows in set. Elapsed: 75.854 sec. Processed 15.12 billion rows, 211.71 GB (199.36 million rows/s., 2.79 GB/s.)

That’s an improvement of about 6.5 times compared to the original query. This shows the importance of understanding data distribution, and writing the optimal query to process the data.

Another option for dealing with JOIN complexity, and to improve performance, is to use ClickHouse’s dictionaries. These dictionaries are described here: https://www.altinity.com/blog/2017/4/12/dictionaries-explained.
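
For illustration only: once a customer dictionary is defined in the server configuration (the external dictionary definition itself is not shown here, and customer_dict is a hypothetical name), the region lookup can replace the JOIN entirely:

SELECT
    dictGetString('customer_dict', 'C_REGION', toUInt64(LO_CUSTKEY)) AS C_REGION,
    sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
FROM lineorderfull
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
GROUP BY C_REGION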

I will review dictionary performance in future posts.

Another traditional way to deal with JOIN complexity in an analytics workload is to use denormalization. We can move some columns (for example, P_MFGR from the last query) to the facts table (lineorder).
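
As a sketch of what the denormalized version might look like (assuming P_MFGR was copied into the lineorder table at load time), the three-table JOIN above collapses to a single-table query plus the customer filter:

SELECT
    sum(LO_REVENUE),
    P_MFGR,
    toYear(LO_ORDERDATE) AS yod
FROM lineorderfull
WHERE LO_CUSTKEY IN
(
    SELECT C_CUSTKEY
    FROM customerfull
    WHERE C_REGION = 'ASIA'
)
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC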

Observations

  • ClickHouse can handle general analytical queries (it requires special schema design and considerations, however)
  • Linear speedup is possible, but it depends on query design and requires advance planning; proper speedup depends on data locality
  • ClickHouse is blazingly fast (beyond what I’ve seen before) because it can use all available CPU cores for a query, as shown above with 24 cores for a single server and 72 cores across three nodes
  • Multi-table JOINs are cumbersome and require manual work to achieve better performance, so consider using dictionaries or denormalization
Jan
26
2017
--

Performance Schema Benchmarks: OLTP RW

In this blog post, we’ll look at Performance Schema benchmarks for OLTP Read/Write workloads.

I am in love with Performance Schema and talk a lot about it. Performance Schema is a revolutionary MySQL troubleshooting instrument, but earlier versions had performance issues. Many of these issues are fixed now, and the default options work quickly and reliably. However, there is no free lunch! It is expected that the more instruments you turn ON, the more overhead you’ll have.

The advice I give my customers is that when in doubt, only turn ON the instruments that are required to troubleshoot your issue. Many of them ask: what exactly are the overhead costs for one instrumentation or another? I believe the best answer is “test on your system!” No generic benchmark can exactly repeat a workload on your site. But while I was working on the “OpenSource Databases on Big Machines” project, I decided to test the performance of Performance Schema as well.
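
If you are not sure what is currently instrumented on your server, the setup tables show the active configuration. For example (these queries only read the current settings):

SELECT NAME, ENABLED, TIMED
FROM performance_schema.setup_instruments
WHERE ENABLED = 'YES'
LIMIT 10;

SELECT * FROM performance_schema.setup_consumers;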

I only tested a Read/Write workload. I used the same fast machine (144 CPU cores), the same MySQL options and the same SysBench commands that I described in this post. The option innodb_flush_method was changed to O_DIRECT, because that is more reasonable for real-life workloads. I also upgraded the MySQL Server version to Oracle’s MySQL 5.7.17. The reason for the upgrade was to test whether the issue described in this post is repeatable with the latest Oracle MySQL server version. But since I tested Performance Schema, the effect on Percona Server for MySQL should be the same.

I tested nine different scenarios:

  1. “All disabled”: Performance Schema is ON, but all instruments and consumers are disabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
  2. “All enabled”: Performance Schema is ON, and all instruments and consumers are enabled.
    update setup_instruments set enabled='yes';
    update setup_consumers set enabled='yes';
  3. “Default”: Performance Schema is ON, and only default instruments and consumers are enabled.
  4. “MDL only”: only Metadata Lock instrumentation is enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name='wait/lock/metadata/sql/mdl';
  5. “Memory only”: only Memory instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'memory%';
  6. “Stages and Statements”: only Stages and Statements instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'statement%';
    update setup_consumers set enabled='yes' where name like 'events_statements%';
    update setup_instruments set enabled='yes' where name like 'stage%';
    update setup_consumers set enabled='yes' where name like 'events_stages%';
  7. “Stages only”: only Stages instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'stage%';
    update setup_consumers set enabled='yes' where name like 'events_stages%';
  8. “Statements only”: only Statements instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'statement%';
    update setup_consumers set enabled='yes' where name like 'events_statements%';
  9. “Waits only”: only Waits instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'wait%' ;
    update setup_consumers set enabled='yes' where name like 'events_waits%';

Here are the overall results.

As you can see, some instrumentation only slightly affects performance, while other instrumentation affects it a lot. I created separate graphs to make the picture clearer.

As expected, enabling all instrumentation makes performance lower:

Does this mean that, to use Performance Schema, you need to start the server with it ON and then disable all instruments? No! The default options have very little effect on performance:

The same is true for Metadata Locks, Memory and Statements instrumentation:

Regarding statements, I should note that I used prepared statements (which are instrumented in version 5.7). But it makes sense to repeat the tests without prepared statements.

The Stages instrumentation starts affecting performance:

However, the slowdown is reasonable, and it happens only after we reach 32 concurrent threads. It still provides great insight into what is happening during query execution.

The real performance killer is Waits instrumentation:

It affects performance almost as much as having all instruments ON.

Conclusion

Using Performance Schema with the default options, Memory, Metadata Locks and Statements instrumentation doesn’t have a great impact on read-write workload performance. You might notice slowdowns with Stages instrumentation after reaching 32 actively running parallel connections. The real performance killer is Waits instrumentation. And even with it on, you will start to notice a performance drop only after 10,000 transactions per second.

Dec
28
2016
--

Quickly Troubleshoot Metadata Locks in MySQL 5.7

In a previous article, Ovais demonstrated how a DDL can render a table blocked from new queries. In another article, Valerii introduced performance_schema.metadata_locks, which is available in MySQL 5.7 and exposes metadata lock details. Given this information, here’s a quick way to troubleshoot metadata locks by creating a stored procedure that can:

  • Find out which thread(s) have the metadata lock
  • Determine which thread has been waiting for it the longest
  • Find other threads waiting for the metadata lock

Setting up instrumentation

First, you need to enable instrumentation for metadata locks:

UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';

Second, you need to add this stored procedure:

USE test;
DROP PROCEDURE IF EXISTS procShowMetadataLockSummary;
delimiter //
CREATE PROCEDURE procShowMetadataLockSummary()
BEGIN
	DECLARE table_schema VARCHAR(64);
    DECLARE table_name VARCHAR(64);
    DECLARE id bigint;
    DECLARE time bigint;
    DECLARE info longtext;
	DECLARE curMdlCount INT DEFAULT 0;
    DECLARE curMdlCtr INT DEFAULT 0;
	DECLARE curMdl CURSOR FOR SELECT * FROM tmp_blocked_metadata;
	DROP TEMPORARY TABLE IF EXISTS tmp_blocked_metadata;
	CREATE TEMPORARY TABLE IF NOT EXISTS tmp_blocked_metadata (
       table_schema varchar(64),
       table_name varchar(64),
       id bigint,
	   time bigint,
       info longtext,
       PRIMARY KEY(table_schema, table_name)
    );
    REPLACE tmp_blocked_metadata(table_schema,table_name,id,time,info) SELECT mdl.OBJECT_SCHEMA, mdl.OBJECT_NAME, t.PROCESSLIST_ID, t.PROCESSLIST_TIME, t.PROCESSLIST_INFO FROM performance_schema.metadata_locks mdl JOIN performance_schema.threads t ON mdl.OWNER_THREAD_ID = t.THREAD_ID WHERE mdl.LOCK_STATUS='PENDING' and mdl.LOCK_TYPE='EXCLUSIVE' ORDER BY mdl.OBJECT_SCHEMA,mdl.OBJECT_NAME,t.PROCESSLIST_TIME ASC;
    OPEN curMdl;
    SET curMdlCount = (SELECT FOUND_ROWS());
    WHILE (curMdlCtr < curMdlCount)
    DO
      FETCH curMdl INTO table_schema, table_name, id, time, info;
      SELECT CONCAT_WS(' ','PID',t.PROCESSLIST_ID,'has metadata lock on', CONCAT(mdl.OBJECT_SCHEMA,'.',mdl.OBJECT_NAME), 'with current state', CONCAT_WS('','[',t.PROCESSLIST_STATE,']'), 'for', t.PROCESSLIST_TIME, 'seconds and is currently running', CONCAT_WS('',"[",t.PROCESSLIST_INFO,"]")) AS 'Process(es) that have the metadata lock' FROM performance_schema.metadata_locks mdl JOIN performance_schema.threads t ON t.THREAD_ID = mdl.OWNER_THREAD_ID WHERE mdl.LOCK_STATUS='GRANTED' AND mdl.OBJECT_SCHEMA = table_schema and mdl.OBJECT_NAME = table_name AND mdl.OWNER_THREAD_ID NOT IN(SELECT mdl2.OWNER_THREAD_ID FROM performance_schema.metadata_locks mdl2 WHERE mdl2.LOCK_STATUS='PENDING' AND mdl.OBJECT_SCHEMA = mdl2.OBJECT_SCHEMA and mdl.OBJECT_NAME = mdl2.OBJECT_NAME);
      SELECT CONCAT_WS(' ','PID', id, 'has been waiting for metadata lock on',CONCAT(table_schema,'.', table_name),'for', time, 'seconds to execute', CONCAT_WS('','[',info,']')) AS 'Oldest process waiting for metadata lock';
      SET curMdlCtr = curMdlCtr + 1;
      SELECT CONCAT_WS(' ','PID', t.PROCESSLIST_ID, 'has been waiting for metadata lock on',CONCAT(table_schema,'.', table_name),'for', t.PROCESSLIST_TIME, 'seconds to execute', CONCAT_WS('','[',t.PROCESSLIST_INFO,']')) AS 'Other queries waiting for metadata lock' FROM performance_schema.metadata_locks mdl JOIN performance_schema.threads t ON t.THREAD_ID = mdl.OWNER_THREAD_ID WHERE mdl.LOCK_STATUS='PENDING' AND mdl.OBJECT_SCHEMA = table_schema and mdl.OBJECT_NAME = table_name AND t.PROCESSLIST_ID <> id;
	END WHILE;
    CLOSE curMdl;
END//
delimiter ;

Testing

Now, let’s call the procedure to see if there are threads waiting for metadata locks:

mysql> CALL test.procShowMetadataLockSummary();
+----------------------------------------------------------------------------------------------------------------+
| Process(es) that have the metadata lock                                                                        |
+----------------------------------------------------------------------------------------------------------------+
| PID 10 has metadata lock on sbtest.sbtest with current state [] since 274 seconds and is currently running []  |
| PID 403 has metadata lock on sbtest.sbtest with current state [] since 291 seconds and is currently running [] |
+----------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
+------------------------------------------------------------------------------------------------------------------------+
| Oldest process waiting for metadata lock                                                                               |
+------------------------------------------------------------------------------------------------------------------------+
| PID 1264 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [truncate table sbtest.sbtest] |
+------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
+---------------------------------------------------------------------------------------------------------------------------+
| Other queries waiting for metadata lock                                                                                   |
+---------------------------------------------------------------------------------------------------------------------------+
| PID 1269 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?] |
| PID 1270 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?] |
| PID 1271 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?] |
| PID 1272 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?] |
| PID 1273 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?] |
+---------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

So, as you can see above, you have several choices. You could (a) do nothing and wait for threads 10 and 403 to complete and then thread 1264 can get the lock.

If you can’t wait, you can (b) kill the threads that have the metadata lock so that the TRUNCATE TABLE in thread 1264 can get the lock. However, before you decide to kill threads 10 and 403, you should check SHOW ENGINE INNODB STATUS to see if the undo log entries for those threads are high. If they are, rolling back these transactions might take a long time.
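
You can also get a rough idea of how much work each transaction would have to roll back by querying information_schema.INNODB_TRX. This is just a sketch, using the thread IDs from the example above:

SELECT trx_mysql_thread_id, trx_started, trx_rows_modified
FROM information_schema.INNODB_TRX
WHERE trx_mysql_thread_id IN (10, 403);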

Lastly, you can instead (c) kill thread 1264 (the one running the DDL) to free up the other queries. You should then reschedule the DDL to run during off-peak hours.
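
For example, to abort only the waiting statement rather than the whole connection (a sketch, using the thread ID from this scenario):

KILL QUERY 1264;
-- or, to close the connection entirely:
-- KILL 1264;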

Happy metadata lock hunting!
