Oct
10
2018
--

Instrumenting Read Only Transactions in InnoDB

Instrumenting read only transactions MySQL

Instrumenting read only transactions MySQLProbably not well known but quite an important optimization was introduced in MySQL 5.6 – reduced overhead for “read only transactions”. While usually by a “transaction” we mean a query or a group of queries that change data, with transaction engines like InnoDB, every data read or write operation is a transaction.

Now, as a non-locking read operation obviously has less impact on the data, it does not need all the instrumenting overhead a write transaction has. The main thing that can be avoided, as described by documentation, is the transaction ID. So, since MySQL 5.6, a read only transaction does not have a transaction ID. Moreover, such a transaction is not visible in the SHOW ENGINE INNODB STATUS output, though I will not go deeper on what really that means under the hood in this article. The fact is that this optimization allows for better scaling of workloads with many RO threads. An example RO benchmark, where 5.5 vs 5.6/5.7 difference is well seen, may be found here: https://www.percona.com/blog/2016/04/07/mysql-5-7-sysbench-oltp-read-results-really-faster/

To benefit from this optimization in MySQL 5.6, either a transaction has to start with the explicit START TRANSACTION READ ONLY clause or it must be an autocommit, non-locking SELECT statement. In version 5.7 and newer, it goes further, as a new transaction is treated as read-only until a locking read or write is executed, at which point it gets “upgraded” to a read-write one.

Information Schema Instrumentation

Let’s see how it looks like (on MySQL 8.0.12) by looking at information_schema.innodb_trx and information_schema.innodb_metrics tables. The second of these, by default, has transaction counters disabled, so before the test we have to enable it with:

SET GLOBAL innodb_monitor_enable = 'trx%comm%';

or by adding a parameter to the

[mysqld]

 section of the configuration file and restarting the instance:

innodb_monitor_enable = "trx_%"

Now, let’s start a transaction which should be read only according to the rules:

mysql [localhost] {msandbox} (db1) > START TRANSACTION; SELECT count(*) FROM db1.t1;
Query OK, 0 rows affected (0.00 sec)
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec
mysql [localhost] {msandbox} (db1) > SELECT trx_id,trx_weight,trx_rows_locked,trx_rows_modified,trx_is_read_only,trx_autocommit_non_locking
FROM information_schema.innodb_trx\G
*************************** 1. row ***************************
                    trx_id: 421988493944672
                trx_weight: 0
           trx_rows_locked: 0
         trx_rows_modified: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)

Transaction started as above, did not appear in SHOW ENGINE INNODB STATUS, and its trx_id looks strangely high. And first surprise—for some reason, trx_is_read_only is 0. Now, what if we commit such a transaction—how do the counters change? (I reset them before the test):

mysql [localhost] {msandbox} (db1) > commit;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     0 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     1 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     0 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.01 sec)

OK, so clearly it was a read-only transaction overall, just the trx_is_read_only property wasn’t set as expected. I had to report this problem here: https://bugs.mysql.com/bug.php?id=92558

What about an explicit RO transaction:

mysql [localhost] {msandbox} (db1) > START TRANSACTION READ ONLY; SELECT count(*) FROM db1.t1;
Query OK, 0 rows affected (0.00 sec)
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec
mysql [localhost] {msandbox} (db1) > SELECT trx_id,trx_weight,trx_rows_locked,trx_rows_modified,trx_is_read_only,trx_autocommit_non_locking
FROM information_schema.innodb_trx\G
*************************** 1. row ***************************
                    trx_id: 421988493944672
                trx_weight: 0
           trx_rows_locked: 0
         trx_rows_modified: 0
          trx_is_read_only: 1
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > commit;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     0 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     2 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     0 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.01 sec)

OK, both transactions are counted as the same type. Moreover, the two transactions shared the same strange trx_id, which appears to be a fake one. For a simple read executed in autocommit mode, the counters increase as expected too:

mysql [localhost] {msandbox} (db1) > select @@autocommit; SELECT count(*) FROM db1.t1;
+--------------+
| @@autocommit |
+--------------+
|            1 |
+--------------+
1 row in set (0.00 sec)
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     0 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     2 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     1 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.00 sec)

Now, let’s test how a transaction looks when we upgrade it to RW later:

mysql [localhost] {msandbox} (db1) > START TRANSACTION; SELECT count(*) FROM db1.t1;
Query OK, 0 rows affected (0.00 sec)
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT trx_id,trx_weight,trx_rows_locked,trx_rows_modified,trx_is_read_only,trx_autocommit_non_locking
FROM information_schema.innodb_trx\G
*************************** 1. row ***************************
                    trx_id: 421988493944672
                trx_weight: 0
           trx_rows_locked: 0
         trx_rows_modified: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT count(*) FROM db1.t1 FOR UPDATE;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT trx_id,trx_weight,trx_rows_locked,trx_rows_modified,trx_is_read_only,trx_autocommit_non_locking
FROM information_schema.innodb_trx\G
*************************** 1. row ***************************
                    trx_id: 4106
                trx_weight: 2
           trx_rows_locked: 4
         trx_rows_modified: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > commit;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     1 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     2 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     1 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.00 sec)

OK, as seen above, after a locking read was done, our transaction has transformed: it got a real, unique trx_id assigned. Then, when committed, the RW counter increased.

Performance Schema Problem

Nowadays it may feel natural to use performance_schema for monitoring everything. And, indeed, we can monitor types of transactions with it as well. Let’s enable the needed consumers and instruments:

mysql [localhost] {msandbox} (db1) > UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%transactions%';
Query OK, 0 rows affected (0.00 sec)
Rows matched: 3  Changed: 0  Warnings: 0
mysql [localhost] {msandbox} (db1) > UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME = 'transaction';
Query OK, 0 rows affected (0.01 sec)
Rows matched: 1  Changed: 0  Warnings: 0
mysql [localhost] {msandbox} (db1) > SELECT * FROM performance_schema.setup_instruments WHERE NAME = 'transaction';
+-------------+---------+-------+------------+------------+---------------+
| NAME        | ENABLED | TIMED | PROPERTIES | VOLATILITY | DOCUMENTATION |
+-------------+---------+-------+------------+------------+---------------+
| transaction | YES     | YES   |            |          0 | NULL          |
+-------------+---------+-------+------------+------------+---------------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT * FROM performance_schema.setup_consumers WHERE NAME LIKE '%transactions%';
+----------------------------------+---------+
| NAME                             | ENABLED |
+----------------------------------+---------+
| events_transactions_current      | YES     |
| events_transactions_history      | YES     |
| events_transactions_history_long | YES     |
+----------------------------------+---------+
3 rows in set (0.01 sec)
mysql [localhost] {msandbox} (db1) > SELECT COUNT_STAR,COUNT_READ_WRITE,COUNT_READ_ONLY
FROM performance_schema.events_transactions_summary_global_by_event_name\G
*************************** 1. row ***************************
      COUNT_STAR: 0
COUNT_READ_WRITE: 0
 COUNT_READ_ONLY: 0
1 row in set (0.00 sec)

And let’s do some simple tests:

mysql [localhost] {msandbox} (db1) > START TRANSACTION; COMMIT;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT COUNT_STAR,COUNT_READ_WRITE,COUNT_READ_ONLY
FROM performance_schema.events_transactions_summary_global_by_event_name\G
*************************** 1. row ***************************
      COUNT_STAR: 1
COUNT_READ_WRITE: 1
 COUNT_READ_ONLY: 0
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     0 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     0 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     0 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.00 sec)

A void transaction caused an increase to this RW counter in Performance Schema view! Moreover, a simple autocommit select increases it too:

mysql [localhost] {msandbox} (db1) > SELECT count(*) FROM db1.t1;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.01 sec)
mysql [localhost] {msandbox} (db1) > SELECT COUNT_STAR,COUNT_READ_WRITE,COUNT_READ_ONLY
FROM performance_schema.events_transactions_summary_global_by_event_name\G
*************************** 1. row ***************************
      COUNT_STAR: 2
COUNT_READ_WRITE: 2
 COUNT_READ_ONLY: 0
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > START TRANSACTION READ ONLY; COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT COUNT_STAR,COUNT_READ_WRITE,COUNT_READ_ONLY
FROM performance_schema.events_transactions_summary_global_by_event_name\G
*************************** 1. row ***************************
      COUNT_STAR: 3
COUNT_READ_WRITE: 2
 COUNT_READ_ONLY: 1
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (db1) > SELECT name, comment, status, count
FROM information_schema.innodb_metrics   WHERE name like 'trx%comm%';
+---------------------------+--------------------------------------------------------------------+---------+-------+
| name                      | comment                                                            | status  | count |
+---------------------------+--------------------------------------------------------------------+---------+-------+
| trx_rw_commits            | Number of read-write transactions  committed                       | enabled |     0 |
| trx_ro_commits            | Number of read-only transactions committed                         | enabled |     0 |
| trx_nl_ro_commits         | Number of non-locking auto-commit read-only transactions committed | enabled |     1 |
| trx_commits_insert_update | Number of transactions committed with inserts and updates          | enabled |     0 |
+---------------------------+--------------------------------------------------------------------+---------+-------+
4 rows in set (0.01 sec)

As seen above, with regard to monitoring transactions via Performance Schema, everything seems completely broken, empty transactions increase counters, and the only way to increase RO counter is to call a read-only transaction explicitly, but again, it should not count when no real read was done from a table. For this reason I filed another bug report: https://bugs.mysql.com/bug.php?id=92364

PMM Dashboard

We implemented a transactions information view in PMM, based on Information_schema.innodb_metrics, which—as presented above—is reliable and shows the correct counters. Therefore, I encourage everyone to use the innodb_monitor_enable setting to enable it and have the PMM graph it. It will look something like this:

Dec
21
2015
--

Better high availability: MySQL and Percona XtraDB Cluster with good application design

high availabilityHigh Availability

Have you ever wondered if your application should be able to work in read-only mode? How important is that question?

MySQL seems to be the most popular database solution for web-based products. Most typical Internet application workloads consist of many reads, with usually few writes. There are exceptions of course – MMO games for instance – but often the number of reads is much bigger then writes. So when your database infrastructure looses its ability to accept writes, either because traditional MySQL replication topology lost its master or Galera cluster lost its quorum, why would you want to the application to declare total downtime? During this scenario, imagine all the users who are just browsing the application (not contributing content): they don’t care if the database cannot accept new data. People also prefer to have access to an application, even if it’s functionality is notably reduced, rather then see the 500 error pages. In some disaster scenarios it is a seriously time-consuming task to perform PITR or recover some valuable data: it is better to at least have the possibility of user read access to a recent backup.

My advice: design your application with the possible read-only partial outage in mind, and test how the application works in that mode during it’s development life cycle. I think it will pay off greatly and increase the perception of availability of your product. As an example, check out some of the big open source projects’ implementation of this concept, like MediaWiki or Drupal (and also some commercial products).

PXC

Having said that, I want to highlight a pretty new (and IMHO) important improvement in this regard, introduced in the Galera replication since PXC version 5.6.24. It was already mentioned by my colleague Stéphane in his blog post earlier this year.

Focus on data consistency

As you probably know, one of Galera’s key advantages is their great data consistency care and data-centric approach. No matter where you write in the cluster, all nodes must have the same data. This is important when you realize what happens when a data inconsistency is detected between the nodes. Inconsistent nodes, which cannot apply a writeset due to missing rows or duplicate unique key values for instance, will have to abort and perform an emergency shutdown. This happens in order to remove contaminated members from the cluster, and not spread the data “illness” further. If it does happen that the majority of nodes perform an emergency abort, the remaining minority may loose the cluster quorum and will stop serving further client’s requests. So the price for data consistency protection is availability.

Anti-split-brain

Sometimes a node or node cluster members loose connectivity to others, in a way that >50% of nodes can no longer communicate. Connectivity is lost all of a sudden, without a proper “goodbye” message from the “dissappeared” nodes. These nodes don’t know what the reason was for the lost connection – were the peers killed? or may be networks were split? In that situation, nodes declare a non-Primary cluster state and go into SQL-disabled mode. This is because a member of a cluster without a quorum (majority), hence not acting as Primary Component, is not trusted as it may have inconsistent or old data. Because of this state, it won’t allow the clients to access it.

This is for two reasons. First and unquestionably, we don’t want to allow writes when there is a risk of network split, where the other part of the cluster still forms the Primary Component and keeps operating. We may also want to disallow reads of a stall data, however, when there is a possibility that the other part of the infrastructure already has a lot of newer information.

In the standard MySQL replication process there are no such precautions – if replication is broken in master-master topology both masters can still accept writes, and they can also read anything from the slaves regardless of how much they may be lagging or if they are connected to their masters at all. In Galera though, even if too much lag in the applying queue is detected (similar to replication lag concept), the cluster will pause writes using a Flow Control mechanism. If replication is broken as described above, it will even stop the reads.

Dirty reads from Galera cluster

This behavior may seem too strict, especially if you just migrated from MySQL replication to PXC, and you just accept that database “slave” nodes can serve the read traffic even if they are separated from the “master.” Or if your application does not rely on writes but mostly on access to existing content. In that case, you can either enable the new wsrep_dirty_reads variable dynamically (per session only if needed), or setup your cluster to run this option by default by placing wsrep_dirty_reads = ON in the my.cnf (global values are acceptable in the config file is available since PXC 5.6.26).

The Galera topology below is something we very often see at customer sites, where WAN locations are configured to communicate via VPN:

WAN_PXCI think this failure scenario is a perfect usage case for wsrep_dirty_reads – where none of the cluster parts are able to work at full functionality alone, but could successfully keep serving read queries to the clients.

So let’s quickly see how the cluster member behaves with the wsrep_dirty_reads option disabled and enabled (for the test I blocked network communication on port 4567):

percona3 mysql> show status like 'wsrep_cluster_status';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
1 row in set (0.00 sec)
percona3 mysql> show variables like '%dirty_reads';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| wsrep_dirty_reads | OFF   |
+-------------------+-------+
1 row in set (0.01 sec)
percona3 mysql>  select * from test.g1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

And when enabled:

percona2 mysql> show status like 'wsrep_cluster_status';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
1 row in set (0.00 sec)
percona2 mysql> show variables like '%dirty_reads';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| wsrep_dirty_reads | ON    |
+-------------------+-------+
1 row in set (0.00 sec)
percona2 mysql> select * from test.g1;
+----+-------+
| id | a     |
+----+-------+
|  1 | dasda |
|  2 | dasda |
+----+-------+
2 rows in set (0.00 sec)
percona2 mysql> insert into test.g1 set a="bb";
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

MySQL

In traditional replication, you are probably using the slaves for reads anyway. So if the master crashes, and for some reason a failover toolkit like MHA or PRM is not configured or also fails, in order to keep the application working you should direct new connections meant for the master to one of the slaves. If you use a loadbalancer, maybe just have the slaves as backups for the master in the write pool. This may help to achieve a better user experience during the downtime, where everyone can at least use existing information. As noted above, however, the application must be prepared to work that way.

There are caveats to this implementation, as the “read_only” mode that is usually used on slaves is not 100% read-only. This is due to the exception for “super” users. In this case, the new super_read_only variable comes to the rescue (available in Percona Server 5.6) as well as stock MySQL 5.7. With this feature, there is no risk that after pointing database connections to one of the slaves, some special users will change the data.

If a disaster is severe enough, it may be necessary to recover data from a huge SQL dump, and it’s often hard to find enough spare servers to serve the traffic with an old binary snapshot. It’s worth noting that InnoDB has a special read-only mode, meant to be used in a read-only medium, that is lightweight compared to full InnoDB mode.

Useful links

If you are looking for more information about Galera/PXC availability problems and recovery tips, these earlier blog posts may be interesting:
Percona XtraDB Cluster (PXC): How many nodes do you need?
Percona XtraDB Cluster: Quorum and Availability of the cluster
Galera replication – how to recover a PXC cluster

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com