Jun
21
2017
--

Percona Monitoring and Management 1.1.5 is Now Available

Percona announces the release of Percona Monitoring and Management 1.1.5 on June 21, 2017.

For installation instructions, see the Deployment Guide.


Changes in PMM Server

  • PMM-667: Fixed the Latency graph in the ProxySQL Overview dashboard to plot microsecond values instead of milliseconds.

  • PMM-800: Fixed the InnoDB Page Splits graph in the MySQL InnoDB Metrics Advanced dashboard to show correct page merge success ratio.

  • PMM-1007: Added links to Query Analytics from MySQL Overview and MongoDB Overview dashboards. The links also pass selected host and time period values.

    NOTE: These links currently open QAN2, which is still considered experimental.

Changes in PMM Client

  • PMM-931: Fixed the pmm-admin script when adding MongoDB metrics monitoring for a secondary in a replica set.

About Percona Monitoring and Management

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

Jun
21
2017
--

Tracing MongoDB Queries to Code with Cursor Comments

In this short blog post, we will discuss a helpful feature for tracing MongoDB queries: Cursor Comments.

Cursor Comments

Much like other database systems, MongoDB supports the ability for application developers to set comment strings on their database queries using the Cursor Comment feature. This feature is very useful for both DBAs and developers for quickly and efficiently tying a MongoDB query found on the database server to a line of code in the application source.

Once Cursor Comments are set in application code, they can be seen in the following areas on the server:

  1. The db.currentOp() shell command. If Auth is enabled, this requires a role that has the ‘inprog’ privilege.

  2. Profiles in the system.profile collection (per-db) if profiling is enabled.
  3. The QUERY log component.

Note: the Cursor Comment string shows as the field “query.comment” in the Database Profiler output, and as the field originatingCommand.comment in the output of the db.currentOp() command.

This is fantastic, because it makes comments visible in the areas commonly used to find performance issues!

Often it is very easy to find a slow query on the database server, but it is difficult to target the exact area of a large application that triggers the slow query. This can all be changed with Cursor Comments!

Python Example

Below is a snippet of Python code implementing a cursor comment on a simple query to the collection “test.test”. (Most other languages and MongoDB drivers should work similarly if you do not use Python.)

My goal in this example is to get the MongoDB Profiler to log a custom comment, and then we will read it back manually from the server afterward to confirm it worked.

In this example, I include the following pieces of data in my comment:

  1. The Python class
  2. The Python method that executed the query
  3. The file Python was executing
  4. The line of the file Python was executing

Unfortunately, three of the four useful details above are not built-in variables in Python, so the “inspect” module is required to fetch those details. Using the “inspect” module and setting a cursor comment for every query in an application is a bit clunky, so it is best to create a method to do this. I made a class-method named “find_with_comment” in this example code to do this. This method performs a MongoDB query and sets the cursor comment automagically, finally returning a regular pymongo cursor object.

Below is the simple Python example script. It connects to a mongod on localhost:27017 and demonstrates everything for us. You can run this script yourself if you have the “pymongo” Python package installed.

Script:

from inspect import currentframe, getframeinfo
from pymongo import MongoClient
class TestClass:
    def find_with_comment(self, conn, query, db, coll):
        frame      = getframeinfo(currentframe().f_back)
        comment    = "%s:%s;%s:%i" % (self.__class__.__name__, frame.function, frame.filename, frame.lineno)
        collection = conn[db][coll]
        return collection.find(query).comment(comment)
    def run(self):
        uri   = "localhost:27017"
        conn  = MongoClient(uri)
        query = {'user.name': 'John Doe'}
        for doc in self.find_with_comment(conn, query, 'test', 'test'):
            print doc
        conn.close()
if __name__ == "__main__":
    t = TestClass()
    t.run()

There are a few things to explain in this code:

  1. The “find_with_comment” method runs a pymongo query and handles adding our special cursor comment string. It takes the connection, the query, and the database and collection names as arguments.
  2. The getframeinfo(currentframe().f_back) call uses the “inspect” module to read the caller’s Python “frame”, so we can fetch the file, line number and method that called the query.
  3. The “run” method makes a database connection, runs the “find_with_comment” method with a query, prints the results and closes the connection. This method is just boilerplate to run the example.
  4. The last lines instantiate TestClass and call the “run” method to run our test.

Trying It Out

Before running this script, enable database profiling mode “2” on the “test” database. This is the database the script will query. The profiling mode “2” causes MongoDB to profile all queries:

$ mongo --port=27017
> use test
switched to db test
> db.setProfilingLevel(2)
{ "was" : 1, "slowms" : 100, "ratelimit" : 1, "ok" : 1 }
> quit()

Now let’s run the script. There should be no output from the script; it is only going to do a find query to generate a Profile.

I saved the script as cursor-comment.py and ran it like this from my Linux terminal:

$ python cursor-comment.py
$

Now, let’s see if we can find any Profiles containing the “query.comment” field:

$ mongo --port=27017
> use test
> db.system.profile.find({ "query.comment": {$exists: true} }, { query: 1 }).pretty()
{
	"query" : {
		"find" : "test",
		"filter" : {
			"user.name" : "John Doe"
		},
		"comment" : "TestClass:run;cursor-comment.py:16"
	}
}

Now we know the exact class, method, file and line number that ran this profiled query! Great!

From this Profile we can conclude that the class-method “TestClass:run” initiated this MongoDB query from Line #16 of cursor-comment.py. Imagine this was a query that slowed down your production system and you need to know the source quickly. The usefulness of this feature/workflow becomes obvious, fast.

More on Python “inspect”

Instead of constructing a custom comment like the example above, you can also use Python “inspect” to collect the Python source code comment that precedes the code that is running. This might be useful for projects that have code comments that would be more useful than class/method/file/line number. As the comment is a string, the sky is the limit on what you can set!

Read about the .getcomments() method of “inspect” here: https://docs.python.org/2/library/inspect.html#inspect.getcomments
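
For illustration, here is a minimal sketch of that idea. The function name, query and ticket reference are hypothetical, and inspect.getcomments() only works when the code is loaded from a file (not typed interactively):

from inspect import getcomments
from pymongo import MongoClient
# Fetch documents for the user profile page (see ticket APP-123)
def find_user(collection, name):
    # getcomments() returns the '#' lines immediately above this function
    comment = (getcomments(find_user) or "").strip()
    return collection.find({'user.name': name}).comment(comment)
conn = MongoClient("localhost", 27017)
for doc in find_user(conn["test"]["test"], "John Doe"):
    print(doc)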

Aggregation Comments

MongoDB 3.5/3.6 added support for comments in aggregations. This is a great new feature, as aggregations are often heavy operations that would be useful to tie to a line of code as well!

This can be used by adding a “comment” field to your “aggregate” server command, like so:

db.runCommand({
  aggregate: "myCollection",
  pipeline: [
    { $match: { _id: "foo" } }
  ],
  comment: "fooMatch"
})

See more about this new feature in the following MongoDB tickets: SERVER-28128 and DOCS-10020.
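
If you use pymongo, one way to attach such a comment is to build the aggregate command yourself. This is only a hedged sketch: the database and collection names are placeholders, and it assumes a server version that accepts a comment field on the aggregate command (3.5/3.6+ per the tickets above):

from pymongo import MongoClient
conn = MongoClient("localhost", 27017)
db = conn["test"]
# Extra keyword arguments to db.command() are appended to the command document,
# so "comment" ends up next to "aggregate" and "pipeline" on the server.
result = db.command(
    "aggregate",
    "myCollection",
    pipeline=[{"$match": {"_id": "foo"}}],
    cursor={},
    comment="fooMatch",
)
print(result["cursor"]["firstBatch"])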

Conclusion

Hopefully this blog gives you some ideas on how this feature can be useful in your application. Start adding comments to your application today!

Jun
20
2017
--

Webinar Thursday June 22, 2017: Deploying MySQL in Production

Join Percona’s Senior Technical Operations Engineer Daniel Kowalewski as he presents Deploying MySQL in Production on Thursday, June 22, 2017, at 11:00 am PDT / 2:00 pm EDT (UTC-7).

MySQL is famous for being something you can install and get running for development in less than five minutes. But normally you want to run MySQL in production, and at scale, which requires some planning and knowledge. So why not learn the best practices around installation, configuration, deployment and backup?

This webinar is a soup-to-nuts talk that will have you going from zero to hero in no time. It includes discussion of the best practices for installation, configuration, taking backups, monitoring, etc.

Register for the webinar here.

Daniel Kowalewski, Senior Technical Operations Engineer

Daniel has been designing and deploying solutions around MySQL for over ten years. He lives for those magic moments where response time drops by 90%, and loves adding more “nines” to everything.

Jun
14
2017
--

MySQL Triggers and Updatable Views

In this post, we’ll review how MySQL triggers behave when we modify data through views.

Contrary to what the documentation states, we can activate triggers even while operating on views:

https://dev.mysql.com/doc/refman/5.7/en/triggers.html

Important: MySQL triggers activate only for changes made to tables by SQL statements. They do not activate for changes in views, nor by changes to tables made by APIs that do not transmit SQL statements to the MySQL server.

Be on the lookout if you use and depend on triggers, since this documented limitation does not hold for updatable views! We have reported a documentation bug for this, but figured it wouldn’t hurt to mention it in a short blog post, too. The link to the bug in question is here:

https://bugs.mysql.com/bug.php?id=86575

Now, we’ll go through the steps we took to test this, and their outputs. These are for the latest MySQL version (5.7.18), but the same results were seen in 5.5.54, 5.6.35, and MariaDB 10.2.5.

First, we create the schema, tables and view needed:

mysql> CREATE SCHEMA view_test;
Query OK, 1 row affected (0.00 sec)
mysql> USE view_test;
Database changed
mysql> CREATE TABLE `main_table` (
   ->   `id` int(11) NOT NULL AUTO_INCREMENT,
   ->   `letters` varchar(64) DEFAULT NULL,
   ->   `numbers` int(11) NOT NULL,
   ->   `time` time NOT NULL,
   ->   PRIMARY KEY (`id`),
   ->   INDEX col_b (`letters`),
   ->   INDEX cols_c_d (`numbers`,`letters`)
   -> ) ENGINE=InnoDB  DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.31 sec)
mysql> CREATE TABLE `table_trigger_control` (
   ->   `id` int(11),
   ->   `description` varchar(255)
   -> ) ENGINE=InnoDB  DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.25 sec)
mysql> CREATE VIEW view_main_table AS SELECT * FROM main_table;
Query OK, 0 rows affected (0.02 sec)

Indexes are not really needed to prove the point, but were initially added to the tests for completeness. They make no difference in the results.

Then, we create the triggers for all possible combinations of [BEFORE|AFTER] and [INSERT|UPDATE|DELETE]. We will use the control table to have the triggers insert rows, so we can check if they were actually called by our queries.

mysql> CREATE TRIGGER trigger_before_insert BEFORE INSERT ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "BEFORE INSERT");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_after_insert AFTER INSERT ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "AFTER INSERT");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_before_update BEFORE UPDATE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "BEFORE UPDATE");
Query OK, 0 rows affected (0.19 sec)
mysql> CREATE TRIGGER trigger_after_update AFTER UPDATE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (NEW.id, "AFTER UPDATE");
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TRIGGER trigger_before_delete BEFORE DELETE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (OLD.id, "BEFORE DELETE");
Query OK, 0 rows affected (0.18 sec)
mysql> CREATE TRIGGER trigger_after_delete AFTER DELETE ON main_table FOR EACH ROW
   -> INSERT INTO table_trigger_control VALUES (OLD.id, "AFTER DELETE");
Query OK, 0 rows affected (0.05 sec)

As you can see, they will insert the ID of the row in question, and the combination of time/action appropriate for each one. Next, we will proceed in the following manner:

  1. INSERT three rows in the main table
  2. UPDATE the second
  3. DELETE the third

The reasoning behind doing it against the base table is to check that the triggers are working correctly, and doing what we expect them to do.

mysql> INSERT INTO main_table VALUES (1, 'A', 10, time(NOW()));
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO main_table VALUES (2, 'B', 20, time(NOW()));
Query OK, 1 row affected (0.14 sec)
mysql> INSERT INTO main_table VALUES (3, 'C', 30, time(NOW()));
Query OK, 1 row affected (0.17 sec)
mysql> UPDATE main_table SET letters = 'MOD' WHERE id = 2;
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> DELETE FROM main_table WHERE id = 3;
Query OK, 1 row affected (0.10 sec)

And we check our results:

mysql> SELECT * FROM main_table;
+----+---------+---------+----------+
| id | letters | numbers | time     |
+----+---------+---------+----------+
|  1 | A       |      10 | 15:19:14 |
|  2 | MOD     |      20 | 15:19:14 |
+----+---------+---------+----------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM table_trigger_control;
+------+---------------+
| id   | description   |
+------+---------------+
|    1 | BEFORE INSERT |
|    1 | AFTER INSERT  |
|    2 | BEFORE INSERT |
|    2 | AFTER INSERT  |
|    3 | BEFORE INSERT |
|    3 | AFTER INSERT  |
|    2 | BEFORE UPDATE |
|    2 | AFTER UPDATE  |
|    3 | BEFORE DELETE |
|    3 | AFTER DELETE  |
+------+---------------+
10 rows in set (0.00 sec)

Everything is working as it should, so let’s move on with the tests that we really care about. We will again take the three steps mentioned above, but this time directly on the view.

mysql> INSERT INTO view_main_table VALUES (4, 'VIEW_D', 40, time(NOW()));
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO view_main_table VALUES (5, 'VIEW_E', 50, time(NOW()));
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO view_main_table VALUES (6, 'VIEW_F', 60, time(NOW()));
Query OK, 1 row affected (0.11 sec)
mysql> UPDATE view_main_table SET letters = 'VIEW_MOD' WHERE id = 5;
Query OK, 1 row affected (0.04 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> DELETE FROM view_main_table WHERE id = 6;
Query OK, 1 row affected (0.01 sec)

And we check our tables:

mysql> SELECT * FROM main_table;
+----+----------+---------+----------+
| id | letters  | numbers | time     |
+----+----------+---------+----------+
|  1 | A        |      10 | 15:19:14 |
|  2 | MOD      |      20 | 15:19:14 |
|  4 | VIEW_D   |      40 | 15:19:34 |
|  5 | VIEW_MOD |      50 | 15:19:34 |
+----+----------+---------+----------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM view_main_table;
+----+----------+---------+----------+
| id | letters  | numbers | time     |
+----+----------+---------+----------+
|  1 | A        |      10 | 15:19:14 |
|  2 | MOD      |      20 | 15:19:14 |
|  4 | VIEW_D   |      40 | 15:19:34 |
|  5 | VIEW_MOD |      50 | 15:19:34 |
+----+----------+---------+----------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM table_trigger_control;
+------+---------------+
| id   | description   |
+------+---------------+
|    1 | BEFORE INSERT |
|    1 | AFTER INSERT  |
|    2 | BEFORE INSERT |
|    2 | AFTER INSERT  |
|    3 | BEFORE INSERT |
|    3 | AFTER INSERT  |
|    2 | BEFORE UPDATE |
|    2 | AFTER UPDATE  |
|    3 | BEFORE DELETE |
|    3 | AFTER DELETE  |
|    4 | BEFORE INSERT |
|    4 | AFTER INSERT  |
|    5 | BEFORE INSERT |
|    5 | AFTER INSERT  |
|    6 | BEFORE INSERT |
|    6 | AFTER INSERT  |
|    5 | BEFORE UPDATE |
|    5 | AFTER UPDATE  |
|    6 | BEFORE DELETE |
|    6 | AFTER DELETE  |
+------+---------------+
20 rows in set (0.00 sec)

As seen in the results, all triggers were executed, even when the queries were run against the view. Since this was an updatable view, it worked. Conversely, if we try the same on a non-updatable view, it fails (we can force ALGORITHM = TEMPTABLE to test this).

mysql> CREATE ALGORITHM=TEMPTABLE VIEW view_main_table_temp AS SELECT * FROM main_table;
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO view_main_table_temp VALUES (7, 'VIEW_H', 70, time(NOW()));
ERROR 1471 (HY000): The target table view_main_table_temp of the INSERT is not insertable-into
mysql> UPDATE view_main_table_temp SET letters = 'VIEW_MOD' WHERE id = 5;
ERROR 1288 (HY000): The target table view_main_table_temp of the UPDATE is not updatable
mysql> DELETE FROM view_main_table_temp WHERE id = 5;
ERROR 1288 (HY000): The target table view_main_table_temp of the DELETE is not updatable

As mentioned before, MariaDB shows the same behavior. The difference, however, is that the MariaDB documentation is accurate about the limitations, since it only states the following:

https://mariadb.com/kb/en/mariadb/trigger-limitations/

Triggers cannot operate on any tables in the mysql, information_schema or performance_schema database.

Corollary to the Discussion

It’s always good to thoroughly check the documentation, but it’s also necessary to test things and verify that the documentation reflects the real behavior (bugs can be found everywhere, not just in the code :)).

May
22
2017
--

ICP Counters in information_schema.INNODB_METRICS

In this blog, we’ll look at ICP counters in the information_schema.INNODB_METRICS table. This is part two of the Index Condition Pushdown (ICP) counters blog post series.

As mentioned in the previous post, in this blog we will look at how to check ICP counters in MySQL and Percona Server for MySQL. This also applies to MariaDB, since the INNODB_METRICS table is available there as well (whereas the Handler_icp_% counters are MariaDB-specific). We will use the same table and data set as in the previous post.

For simplicity we’ll show the examples on MySQL 5.7.18, but they also apply to the latest Percona Server for MySQL (5.7.18) and MariaDB Server (10.2.5):

mysql [localhost] {msandbox} (test) > SELECT @@version, @@version_comment;
+-----------+------------------------------+
| @@version | @@version_comment            |
+-----------+------------------------------+
| 5.7.18    | MySQL Community Server (GPL) |
+-----------+------------------------------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SHOW CREATE TABLE t1\G
*************************** 1. row ***************************
      Table: t1
Create Table: CREATE TABLE `t1` (
 `f1` int(11) DEFAULT NULL,
 `f2` int(11) DEFAULT NULL,
 `f3` int(11) DEFAULT NULL,
 KEY `idx_f1_f2` (`f1`,`f2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
|  3999996 |
+----------+
1 row in set (3.98 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 LIMIT 12;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    1 |    2 |    1 |
|    1 |    3 |    1 |
|    1 |    4 |    1 |
|    2 |    1 |    1 |
|    2 |    2 |    1 |
|    2 |    3 |    1 |
|    2 |    4 |    1 |
|    3 |    1 |    1 |
|    3 |    2 |    1 |
|    3 |    3 |    1 |
|    3 |    4 |    1 |
+------+------+------+
12 rows in set (0.00 sec)

Before proceeding with the examples, let’s see what counters we have available and how to enable and query them. The documentation page is at the following link: https://dev.mysql.com/doc/refman/5.7/en/innodb-information-schema-metrics-table.html.

The first thing to notice is that we are advised to check the validity of the counters for each version where we want to use them. The counters represented in the INNODB_METRICS table are subject to change, so for the most up-to-date list it’s best to query the running MySQL server:

mysql [localhost] {msandbox} (test) > SELECT NAME, SUBSYSTEM, STATUS FROM information_schema.INNODB_METRICS WHERE NAME LIKE '%icp%';
+------------------+-----------+----------+
| NAME             | SUBSYSTEM | STATUS   |
+------------------+-----------+----------+
| icp_attempts     | icp       | disabled |
| icp_no_match     | icp       | disabled |
| icp_out_of_range | icp       | disabled |
| icp_match        | icp       | disabled |
+------------------+-----------+----------+
4 rows in set (0.00 sec)

Looking good! We have all the counters we expected, which are:

  • icp_attempts: the number of rows where ICP was evaluated
  • icp_no_match: the number of rows that did not completely match the pushed WHERE conditions
  • icp_out_of_range: the number of rows that were checked that were not in a valid scanning range
  • icp_match: the number of rows that completely matched the pushed WHERE conditions

This link to the code can be used for reference: https://github.com/mysql/mysql-server/blob/5.7/include/my_icp.h.

After checking which counters we have at our disposal, we need to enable them (they are not enabled by default). For this, we can use the “modules” provided by MySQL to group similar counters for ease of use. This is also explained in detail in the documentation link above, under the “Counter Modules” section. INNODB_METRICS counters are quite inexpensive to maintain, as you can see in this post by Peter Z.

mysql [localhost] {msandbox} (test) > SET GLOBAL innodb_monitor_enable = module_icp;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT NAME, SUBSYSTEM, STATUS FROM information_schema.INNODB_METRICS WHERE NAME LIKE '%icp%';
+------------------+-----------+---------+
| NAME             | SUBSYSTEM | STATUS  |
+------------------+-----------+---------+
| icp_attempts     | icp       | enabled |
| icp_no_match     | icp       | enabled |
| icp_out_of_range | icp       | enabled |
| icp_match        | icp       | enabled |
+------------------+-----------+---------+
4 rows in set (0.00 sec)

Perfect, we now know what counters we need, and how to enable them. We just need to know how to query them, and we can move on to the examples. However, before rushing into saying that a simple SELECT against the INNODB_METRICS table will do, let’s step back a bit and see what columns we have available that can be of use:

mysql [localhost] {msandbox} (test) > DESCRIBE information_schema.INNODB_METRICS;
+-----------------+--------------+------+-----+---------+-------+
| Field           | Type         | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| NAME            | varchar(193) | NO   |     |         |       |
| SUBSYSTEM       | varchar(193) | NO   |     |         |       |
| COUNT           | bigint(21)   | NO   |     | 0       |       |
| MAX_COUNT       | bigint(21)   | YES  |     | NULL    |       |
| MIN_COUNT       | bigint(21)   | YES  |     | NULL    |       |
| AVG_COUNT       | double       | YES  |     | NULL    |       |
| COUNT_RESET     | bigint(21)   | NO   |     | 0       |       |
| MAX_COUNT_RESET | bigint(21)   | YES  |     | NULL    |       |
| MIN_COUNT_RESET | bigint(21)   | YES  |     | NULL    |       |
| AVG_COUNT_RESET | double       | YES  |     | NULL    |       |
| TIME_ENABLED    | datetime     | YES  |     | NULL    |       |
| TIME_DISABLED   | datetime     | YES  |     | NULL    |       |
| TIME_ELAPSED    | bigint(21)   | YES  |     | NULL    |       |
| TIME_RESET      | datetime     | YES  |     | NULL    |       |
| STATUS          | varchar(193) | NO   |     |         |       |
| TYPE            | varchar(193) | NO   |     |         |       |
| COMMENT         | varchar(193) | NO   |     |         |       |
+-----------------+--------------+------+-----+---------+-------+
17 rows in set (0.00 sec)

There are two types: %COUNT and %COUNT_RESET. The former counts since the corresponding counters were enabled, and the latter since they were last reset (we have the TIME_% columns to check when any of these were done). This is why in our examples we are going to check the %COUNT_RESET counters, so we can reset them before running each query (as we did with FLUSH STATUS in the previous post).

Without further ado, let’s check how this all works together:

mysql [localhost] {msandbox} (test) > SET GLOBAL innodb_monitor_reset = module_icp;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE f1 < 3 AND (f2 % 4) = 1;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    2 |    1 |    1 |
+------+------+------+
2 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT NAME, COUNT_RESET FROM information_schema.INNODB_METRICS WHERE NAME LIKE 'icp%';
+------------------+-------------+
| NAME             | COUNT_RESET |
+------------------+-------------+
| icp_attempts     |           9 |
| icp_no_match     |           6 |
| icp_out_of_range |           1 |
| icp_match        |           2 |
+------------------+-------------+
4 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE f1 < 3 AND (f2 % 4) = 1;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | t1    | NULL       | range | idx_f1_f2     | idx_f1_f2 | 5       | NULL |    8 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

If you checked the GitHub link above, you might have noted that the header file only contains three of the counters. This is because icp_attempts is computed as the sum of the rest. As expected, icp_match equals the number of returned rows. icp_no_match should also make sense if we check the number of rows present without the WHERE condition on f2.

mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE f1 < 3;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    1 |    2 |    1 |
|    1 |    3 |    1 |
|    1 |    4 |    1 |
|    2 |    1 |    1 |
|    2 |    2 |    1 |
|    2 |    3 |    1 |
|    2 |    4 |    1 |
+------+------+------+
8 rows in set (0.00 sec)

So, 8 – 2 = 6, which is exactly icp_no_match‘s value. Finally, we are left with icp_out_of_range. For each end of range the ICP scan detects, this counter is incremented by one. We only scanned one range in the previous query, so let’s try something more interesting (scanning three ranges):

mysql [localhost] {msandbox} (test) > SET GLOBAL innodb_monitor_reset = module_icp;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE ((f1 < 2) OR (f1 > 4 AND f1 < 6) OR (f1 > 8 AND f1 < 12)) AND (f2 % 4) = 1;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    5 |    1 |    1 |
|    9 |    1 |    1 |
|   10 |    1 |    1 |
|   11 |    1 |    1 |
+------+------+------+
5 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT NAME, COUNT_RESET FROM information_schema.INNODB_METRICS WHERE NAME LIKE 'icp%';
+------------------+-------------+
| NAME             | COUNT_RESET |
+------------------+-------------+
| icp_attempts     |          23 |
| icp_no_match     |          15 |
| icp_out_of_range |           3 |
| icp_match        |           5 |
+------------------+-------------+
4 rows in set (0.01 sec)

We have now scanned three ranges on f1, namely: (f1 < 2), (4 < f1 < 6) and (8 < f1 < 12). This is correctly reflected in the corresponding counter. Remember that the MariaDB Handler_icp_attempts status counter we looked at in the previous post does not take into account the out-of-range counts. This means the two “attempts” counters will not be the same!

mysql [localhost] {msandbox} (test) > SET GLOBAL innodb_monitor_reset = module_icp; SET GLOBAL innodb_monitor_reset = dml_reads; FLUSH STATUS;
...
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE ((f1 < 2) OR (f1 > 4 AND f1 < 6) OR (f1 > 8 AND f1 < 12)) AND (f2 % 4) = 1;
...
5 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT NAME, COUNT_RESET FROM information_schema.INNODB_METRICS WHERE NAME LIKE 'icp_attempts';
+--------------+-------------+
| NAME         | COUNT_RESET |
+--------------+-------------+
| icp_attempts |          23 |
+--------------+-------------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SHOW STATUS LIKE 'Handler_icp_attempts';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Handler_icp_attempts | 20    |
+----------------------+-------+
1 row in set (0.00 sec)

It can be a bit confusing to have two counters that supposedly measure the same thing yielding different values, so watch out for this if you use MariaDB.

ICP Counters in PMM

Today you can find an ICP counters graph for MariaDB (Handler_icp_attempts) in PMM 1.1.3.

Additionally, in release 1.1.4 you’ll find graphs for ICP metrics from information_schema.INNODB_METRICS: just look for the INNODB_METRICS-based graph on the InnoDB Metrics dashboard!

I hope you found this blog post series useful! Let me know if you have any questions or comments below.

May
17
2017
--

MongoDB Authentication and Roles: Creating Your First Personalized Role

In this blog post, we’ll walk through the native MongoDB authentication and roles, and learn how to create personalized roles. It is a continuation of Securing MongoDB instances.

As mentioned before, MongoDB features a few authentication methods and built-in roles that offer great control over both who is connecting to the database and what they are allowed to do. However, some companies have their own security policies that are often not covered by the default roles. This blog post explains not only how to create personalized roles, but also how to grant minimum access to a user.

Authentication Methods

SCRAM-SHA-1 and MONGODB-CR are challenge-response protocols: all users and passwords are stored hashed in the MongoDB instance. Challenge-response authentication methods are widely used on the internet in many client-server applications. These methods do not send the password as plain text to the server when the client starts an authentication. Each new session uses a different hash/code, which stops people from getting the password by sniffing the network.

The MONGODB-CR method was deprecated in version 3.0.
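
From a driver's point of view, choosing a challenge-response mechanism is just a connection option. Here is a hedged pymongo sketch (the host name and credentials are placeholders); the mechanism and the database holding the user are selected in the connection URI:

from pymongo import MongoClient
# The password is exchanged via a SCRAM challenge-response, never as plain text.
client = MongoClient(
    "mongodb://app_user:app_password@mongo01.example.com:27017/"
    "?authSource=admin&authMechanism=SCRAM-SHA-1"
)
print(client.percona.foo.find_one())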

The x.509 authentication is an internal authentication method that allows instances and clients to communicate with each other. All certificates are signed by the same Certificate Authority and must be valid. All the network traffic is encrypted with a given key, and it is only possible to read the data with a valid certificate signed by that key.

MongoDB also offers external authentication methods such as LDAP and Kerberos. When using LDAP, users can log in to MongoDB with their centralized passwords. LDAP is commonly used to manage users and passwords across large networks. Kerberos is a service that allows users to log in only once and then generates access tickets, so that the users are allowed to access other services. Some configuration is necessary to use external authentication.

Built-in roles

  • read: collStats, dbHash, dbStats, find, killCursors, listIndexes, listCollections.
  • readWrite: all read privileges + convertToCapped, createCollection, dbStats, dropCollection, createIndex, dropIndex, emptycapped, insert, listIndexes, remove, renameCollectionSameDB, update.
  • readAnyDatabase: allows the user to perform read in any database except the local and the config databases.

And so on…
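
Rather than memorizing these lists, you can ask the server itself which actions a role grants by using the rolesInfo command with showPrivileges. A minimal pymongo sketch (connection and authentication details omitted for brevity):

from pymongo import MongoClient
client = MongoClient("localhost", 27017)
# Show the exact privileges the built-in "read" role grants on the "percona" database.
info = client["percona"].command("rolesInfo", "read", showPrivileges=True)
for role in info["roles"]:
    print("%s -> %s" % (role["role"], role.get("privileges", [])))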

In this tutorial, we are going to give specific privileges to a user who is only allowed to read the database, except for one specific collection where he can also write.

For this tutorial, we are using MongoDB 3.4 with previously configured authentication.

Steps:

  1. Create the database:
    mongo --authenticationDatabase admin -u superAdmin -p
    use percona
    db.foo.insert({x : 1})
    db.foo2.insert({x : 1})
  2. Create a new user:
    > db.createUser({user : 'client_read', pwd : '123', roles : ['read']})
    Successfully added user: { "user" : "client_read", "roles" : [ "read" ] }
  3. Log in with the user that has just been created and check the user access:
    ./mongo localhost/percona -u client_read -p
    MongoDB shell version v3.4.0-rc5
    Enter password:
    db.foo.find()
    { "_id" : ObjectId("586bc2e9cac0bbb93f325d11"), "x" : 1 }
    db.foo2.find().count()
    1
    // If the user tries to insert documents, they will receive an error:
    > db.foo.insert({x : 2})
    WriteResult({
                "writeError" : {
                "code" : 13,
                "errmsg" : "not authorized on percona to execute command
                     { insert: "foo", documents: [ { _id: ObjectId('586bc36e7b114fb2517462f3'), x: 2.0 } ], ordered: true }"
                }
    })
  4. Log out and log in again with the administrator user to create a new role for this user:
    mongo --authenticationDatabase admin -u superAdmin -p
    db.createRole({
    role : 'write_foo2_Collection',
    privileges : [ {resource : {db : "percona", collection : "foo2"}, actions : ["insert","remove"]}
    ],
    roles : ["read"]
    })
    db.updateUser('client_read', {roles : ['write_foo2_Collection']})
  5. Check the new access:
    ./mongo
    db.auth('client_read','123')
    1
    > show collections
    foo
    foo2
    > db.foo.find()
    { "_id" : ObjectId("586bc2e9cac0bbb93f325d11"), "x" : 1 }
    > db.foo2.insert({y : 2})
    WriteResult({ "nInserted" : 1 })
    > db.foo.insert({y : 2}) //does not have permission.
    WriteResult({
          "writeError" : {
                "code" : 13,
                "errmsg" : "not authorized on percona to execute command { insert: "foo", documents: [ { _id: ObjectId('586bc5e26f05b3a5db849359'), y: 2.0 } ], ordered: true }"
                         }
    })
  6. We can also add access to other database resources. Let's suppose we would like to grant this newly created user permission to execute the getLog command. This command is available in the clusterAdmin role, but we do not want to give him everything that role grants. See https://docs.mongodb.com/v3.0/reference/privilege-actions/#authr.getLog.

    There is a caveat here. If we want to grant cluster privileges to a user, we should create the role in the admin database. Otherwise, the command will fail:

    db.grantPrivilegesToRole(
         "write_foo2_Collection",
               [
                      {resource : {cluster : true}, actions : ["getLog"] }
               ]
    )
    Roles on the 'percona' database cannot be granted privileges that target other databases or the cluster :
  7. So we create the same role in the admin database instead. Note that this user only works properly if the admin database is present in a possible restore. Otherwise, the privileges fail:

    use admin
    db.createRole({
         role : 'write_foo2_Collection_getLogs',
         privileges : [
                           {resource : {db : "percona", collection : "foo2"}, actions : ["insert","remove"]},
                           {resource : {cluster : true}, actions : ["getLog"]}],  
         roles : [ {role : "read", db: "percona"}]
    })
    use percona
    db.updateUser( "client_read",
    {
         roles : [
              { role : "write_foo2_Collection_getLogs", db : "admin" }
                    ]
    }
    )
  8. Now the user has the same privileges as before, plus the getLog permission. We can test this user's new access with:

    mongo --authenticationDatabase percona -u client_read -p
    db.adminCommand({getLog : 'global'})
    {
              "totalLinesWritten" : 287,
              "log" : [....
    ….
    }

I hope you find this post useful. Please feel free to ping me on Twitter @AdamoTonete or @percona and let us know your thoughts.

May
10
2017
--

Percona-Lab/mongodb_consistent_backup: 1.0 Release Explained

In this blog post, I will cover the Percona-Lab/mongodb_consistent_backup tool and the improvements in its 1.0.1 release.

Percona-Lab/mongodb_consistent_backup

mongodb_consistent_backup is a tool for performing cluster-consistent backups on MongoDB clusters or single replica sets. This tool is open-source Python code, developed by Percona and published under our Percona-Lab GitHub repository. Percona-Lab is a place for code projects maintained and supported only on a best-effort basis by Percona.

By considering the entire MongoDB cluster’s shards and individual shard members, mongodb_consistent_backup can back up a cluster with one or many shards to a single point in time. Single-point-in-time consistency of cluster backups is critical to data integrity for any “sharded” database technology, and is a topic often overlooked in database deployments.

This topic is explained in detail by David Murphy in this Percona blog: https://www.percona.com/blog/2016/07/25/mongodb-consistent-backups/.

1.0 Release

mongodb_consistent_backup was originally a single replica set backup script internal to Percona, which morphed into a large multi-threaded/concurrent Python project. It was released to the public (Percona-Lab) with some rough edges.

This release focuses on improving the efficiency and reliability of the existing components, addressing many of the pain points in extending, deploying and troubleshooting the tool, and adding some small features.

New Features: Config File Overhaul

The tool was moved to use a structured, nested YAML config file instead of the messy config implemented in 0.x.

You can see a full example of this new format at this URL: https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/conf/mongodb-consistent-backup.example.conf

Here’s an example of a very basic config file that’s using 3 x replica-set config servers as “seed hosts” (a new feature in 1.0!), username+password and the optional Nagios NSCA notification method:

production:
  host: csReplSet/config01:27019,config02:27019,config03:27019
  username: mongodb_consistent_password
  password: "correct horse battery staple"
  authdb: admin
  log_dir: /var/log/mongodb_consistent_backup
  backup:
    method: mongodump
    name: production-eu
    location: /var/lib/mongodb_consistent_backup
  archive:
    method: tar
  notify:
    method: nsca
    nsca:
      check_host: mongodb-production-eu
      check_name: "mongodb_consistent_backup"
      server: nagios01.example.com:5667
  upload:
    method: none

New Features: Logging

Overall, much more is logged in this release, both in “regular” mode and “verbose” mode. A highlight of this release is live logging of the output of mongodump, something that was missing from the 0.x versions of the tool.

Now we can see the progress of the backup of each shard/replset in a cluster! Below we can see the backup of csReplSet (a config server replica set) dump many collections and complete its backup. Afterwards, we can see the replica sets “test1” and “test2” dumping “wikipedia.pages”.

...
[2017-05-05 20:11:05,366] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.settings (1 document)
[2017-05-05 20:11:05,367] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing config.version to
[2017-05-05 20:11:05,372] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.version (1 document)
[2017-05-05 20:11:05,373] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing config.locks to
[2017-05-05 20:11:05,377] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.locks (3 documents)
[2017-05-05 20:11:05,378] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing config.databases to
[2017-05-05 20:11:05,381] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.databases (1 document)
[2017-05-05 20:11:05,383] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing config.tags to
[2017-05-05 20:11:05,385] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.tags (0 documents)
[2017-05-05 20:11:05,387] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing config.changelog to
[2017-05-05 20:11:05,399] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	done dumping config.changelog (112 documents)
[2017-05-05 20:11:05,401] [INFO] [MongodumpThread-7] [MongodumpThread:wait:72] csReplSet/172.17.0.1:27019:	writing captured oplog to
[2017-05-05 20:11:05,578] [INFO] [MongodumpThread-7] [MongodumpThread:run:133] Backup csReplSet/172.17.0.1:27019 completed in 0.71 seconds, 0 oplog changes
[2017-05-05 20:11:08,042] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[........................]  wikipedia.pages  636/35080  (1.8%)
[2017-05-05 20:11:08,071] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[........................]  wikipedia.pages  878/35118  (2.5%)
[2017-05-05 20:11:11,041] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[#.......................]  wikipedia.pages  1853/35080  (5.3%)
[2017-05-05 20:11:11,068] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[#.......................]  wikipedia.pages  2063/35118  (5.9%)
[2017-05-05 20:11:14,043] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[##......................]  wikipedia.pages  2983/35080  (8.5%)
[2017-05-05 20:11:14,075] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[##......................]  wikipedia.pages  3357/35118  (9.6%)
[2017-05-05 20:11:17,040] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[##......................]  wikipedia.pages  4253/35080  (12.1%)
[2017-05-05 20:11:17,070] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[###.....................]  wikipedia.pages  4561/35118  (13.0%)
[2017-05-05 20:11:20,038] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[###.....................]  wikipedia.pages  5180/35080  (14.8%)
[2017-05-05 20:11:20,067] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[###.....................]  wikipedia.pages  5824/35118  (16.6%)
[2017-05-05 20:11:23,050] [INFO] [MongodumpThread-5] [MongodumpThread:wait:72] test1/172.17.0.1:27017:	[####....................]  wikipedia.pages  6216/35080  (17.7%)
[2017-05-05 20:11:23,072] [INFO] [MongodumpThread-6] [MongodumpThread:wait:72] test2/172.17.0.1:28017:	[####....................]  wikipedia.pages  6964/35118  (19.8%)
...

Also, while backup data is gathered, the status output from each oplog tailing thread is now logged every 30 seconds (by default):

...
[2017-05-05 20:12:09,648] [INFO] [TailThread-2] [TailThread:status:60] Oplog tailer test1/172.17.0.1:27017 status: 256 oplog changes, ts: Timestamp(1494020048, 6)
[2017-05-05 20:12:11,033] [INFO] [TailThread-3] [TailThread:status:60] Oplog tailer test2/172.17.0.1:28017 status: 1588 oplog changes, ts: Timestamp(1494020049, 50)
[2017-05-05 20:12:22,804] [INFO] [TailThread-4] [TailThread:status:60] Oplog tailer csReplSet/172.17.0.1:27019 status: 43 oplog changes, ts: Timestamp(1494020062, 1)
...

You can now write log files to disk by setting the 'log_dir' config variable or the '--log-dir' command-line flag. One log file per backup is written to this directory, with a symlink pointing to the latest log file. The previous backup's log file is automatically compressed with gzip.

New Features: ZBackup

ZBackup is an open-source de-duplication, compression and (optional) encryption tool for archive-like data (similar to backups). Files that are fed into ZBackup are organized at a block-level into pieces called “bundles”. When more files are fed into ZBackup, it can re-use the bundles when it notices the same blocks are being backed up. This approach provides significant savings on disk space (required for many database backups). To add to the savings, all data in ZBackup is compressed using LZMA compression, which generally compresses better than gzip/deflate or zip. ZBackup also supports an optional AES-128 encryption at rest. You enable it by providing a key file to ZBackup that allows it to encode/decode the data.

mongodb_consistent_backup 1.0.0 now supports ZBackup as a new archiving method!

Below is an example of ZBackup used on a small database (about 1GB) that is constantly growing.

This graph compares the size added on disk for seven backups taken 10 minutes apart, using two methods. The first method is mongodb_consistent_backup with mongodump's built-in gzip compression (available via the --gzip flag since 3.2) enabled. By default mongodump gzip is enabled by mongodb_consistent_backup (if it's available), so this is a good "baseline". The second method is mongodb_consistent_backup with mongodump gzip compression disabled and ZBackup used as the mongodb_consistent_backup archiving method, a post-backup stage in our tool. Notice that each backup in the graph after the first only adds 14-18MB to the disk usage, meaning ZBackup was able to recognize similarities in the data.

To try out ZBackup as an archive method, use one of these methods:

  1. Set the field “method” under the “archive” section of your mongodb_consistent_backup config file to “zbackup” (example):
    production:
      ...
      archive:
         method: zbackup
      ...
  2. Or, add the command-line flag "--archive.method=zbackup" to your command line.

This archive method causes mongodb_consistent_backup to create a subdirectory in the backup location named “mongodb-consistent-backup_zbackup” and import completed backups into ZBackup after the backup stage. This directory contains the ZBackup storage files that it needs to operate, and they should not be modified!

Of course, there are trade-offs. ZBackup adds some additional system resource usage and time to the overall backup AND restore process, since both importing and exporting data into ZBackup take some additional time.

By default ZBackup's restore uses a very small amount of RAM for its cache, so increasing the cache with the "--cache-size" flag may improve restore performance. ZBackup uses threading, so more CPUs can also improve the performance of backups and restores.

New Features: Docker Container

We now offer a Dockerfile for building mongodb_consistent_backup with all dependencies into a Docker container! The goal for the image is to be as “thin” as possible, and so the build merely downloads a prebuilt binary of the tool and installs dependencies. See: https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/Dockerfile

Some interesting use cases for a Docker-based deployment of the tool come to mind:

  • Running MongoDB backups using ephemeral containers on Apache Mesos or Kubernetes (with persistent volumes or remote upload)
  • Restricting system resources used by mongodb_consistent_backup via Docker/cgroup’s isolation features
  • Simplified deployment or isolated dependencies (e.g., Python, Mongodump, etc.)

Up-to-date images of mongodb_consistent_backup are available at this Dockerhub URL: https://hub.docker.com/r/timvaillancourt/mongodb_consistent_backup/. This image includes mongodb_consistent_backup, gzip-capable mongodump binaries and latest-stable ZBackup binaries.

To run the latest Dockerhub image:

$ docker run -i timvaillancourt/mongodb_consistent_backup:latest <mongodb_consistent_backup-flags here>

To just list the “help” page (all available options):

$ docker run -i timvaillancourt/mongodb_consistent_backup:latest --help
usage: mongodb-consistent-backup [-h] [-c CONFIGPATH]
                                 [-e {production,staging,development}] [-V]
                                 [-v] [-H HOST] [-P PORT] [-u USER]
                                 [-p PASSWORD] [-a AUTHDB] [-n BACKUP.NAME]
                                 [-l BACKUP.LOCATION] [-m {mongodump}]
                                 [-L LOG_DIR] [--lock-file LOCK_FILE]
                                 [--sharding.balancer.wait_secs SHARDING.BALANCER.WAIT_SECS]
                                 [--sharding.balancer.ping_secs SHARDING.BALANCER.PING_SECS]
                                 [--archive.method {tar,zbackup,none}]
                                 [--archive.tar.compression {gzip,none}]
                                 [--archive.tar.threads ARCHIVE.TAR.THREADS]
                                 [--archive.zbackup.binary ARCHIVE.ZBACKUP.BINARY]
                                 [--archive.zbackup.cache_mb ARCHIVE.ZBACKUP.CACHE_MB]
                                 [--archive.zbackup.compression {lzma}]
...
...

An example script for running the container with persistent Docker volumes is available here: https://github.com/Percona-Lab/mongodb_consistent_backup/tree/master/scripts

New Features: Multiple Seed Hosts + Config Servers

mongodb_consistent_backup 1.0 introduces the ability to define a list of multiple "seed" hosts, removing a potential single point of failure in your backups! If a host in the list is unavailable, it will be skipped.

Multiple hosts should be specified with this replica-set URL format, many hosts separated by commas:

    <replica-set>/<host/ip>:<port>,<host/ip>:<port>,…

Or you can specify a comma-separated list without the replica set name for non-replset nodes (e.g., mongos routers or non-replset config servers):

    <host/ip>:<port>,<host/ip>:<port>,…

Also, the functionality to use cluster Config Servers as seed hosts was added. Before version 1.0 a clustered backup needed to use a single mongos router as a seed host to find all shards and cluster members. Sometimes mongos routers can come and go as you scale, making this design brittle.

With this new functionality, mongodb_consistent_backup can use the cluster config servers to map out the cluster; these are usually three fairly static hosts in an infrastructure. This makes the deployment and operation of the tool a bit simpler and more reliable.

Overall Improvements

As mentioned, a focus in this release was improving the existing code. A major refactoring of the project's code structure was completed in 1.0, moving the major "phases" or "stages" of the tool into their own Python sub-modules (e.g., "Backup" and "Archive") that then auto-load their various "methods" like "mongodump" or "Zbackup".

The code was broken into these high-level stages:

  1. Backup. The stage that gathers the backup of data. During this stage, Oplog tailing and resolving also occur if the backup is for a cluster. More backup methods are coming soon!
  2. Archive. The stage that archives and optionally compresses the backup data. The new ZBackup method also adds de-duplication and encryption ability to this stage.
  3. Upload. The stage that uploads the resulting data to remote storage. Currently only AWS S3 is supported, with Google Cloud Storage and Rsync support being added as we speak.
  4. Notify. The stage that notifies external systems of the success/failure of the backup. Currently, our tool only supports Nagios NSCA, with plans for PagerDuty and others to be added.

Some interesting code enhancements include:

  • Reusing of database connections. This reduces the number of connections on seed hosts.
  • Replication heartbeat time (“operational lag”). This is now considered in replica set lag calculations.
  • Added thread safety for oplog tailing threads. This resolves some issues on extremely-overloaded hosts.

Another focus was efficiency and preventing race conditions. The tool should be much less susceptible to error as a result, although if you see any problems we’d like to hear about them on our GitHub “Issues” page.

Lastly, we encourage the open source community to contribute additional functionality to this tool via our GitHub!

Release Notes:

  • 1.0.0
    • Move to dynamic code “Submodules” and subclassing of repeated components
    • Restructuring of YAML config to nested config
    • Safe start/stopping of oplog tailer threads, additional checking on all thread states
    • File-based logging with gzip of old log
    • Oplog tailer ‘oplogReplay’ performance optimization
    • Fixes to oplog durability to-disk
    • Live mongodump output to stdout in realtime
    • Oplog tailer status logging
    • ZBackup archive method: supporting deduplication, compression and optional AES encryption
    • Support for list discovery/seed hosts
    • Support configdb servers as cluster seed hosts
    • Fewer (reused) database connections
    • Database connections to use strong write concern
    • Consider replication operational lag in secondary scoring
    • Backup metadata is written for future functionality and troubleshooting
    • mongodb_consistent_backup.Errors custom exceptions for proper exception handling
    • Python PyPi support added
    • Dockerfile support for running under containers
    • Additional log messages
    • Support for MongoDB 3.4 datatypes
    • Significant reworking of existing code for efficiency, reliability and readability

More about our releases can be seen here: https://github.com/Percona-Lab/mongodb_consistent_backup/releases.

May
09
2017
--

MariaDB Handler_icp_% Counters: What They Are, and How To Use Them

In this post, we'll see how MariaDB's Handler_icp_% status counters (Handler_icp_attempts and Handler_icp_match) measure the ICP-related work done by the server and storage engine layers, and how to use them to see whether our queries benefit from Index Condition Pushdown.

These counters (as seen in SHOW STATUS output) are MariaDB-specific. In a later post, we will see how we can get this information in MySQL and Percona Server. This investigation spun off from comments in Michael's post about the new MariaDB dashboard in PMM. Comments are very useful, so keep them coming!

We can start by checking the corresponding documentation pages:

https://mariadb.com/kb/en/mariadb/server-status-variables/#handler_icp_attempts

Description: Number of times pushed index condition was checked. The smaller the ratio of Handler_icp_attempts to Handler_icp_match the better the filtering. See Index Condition Pushdown.

https://mariadb.com/kb/en/mariadb/server-status-variables/#handler_icp_match

Description: Number of times pushed index condition was matched. The smaller the ratio of Handler_icp_attempts to Handler_icp_match the better the filtering. See Index Condition Pushdown.

As we'll see below, "attempts" counts the number of times the server layer sent a WHERE clause down to the storage engine layer to check if it could be filtered out. "Match", on the other hand, counts the times an attempt ended with the row being returned (i.e., the pushed WHERE clause was a complete match).

Now that we understand what they measure, let's check how to use them for reviewing our queries. Before moving forward with the examples, here are a couple of points to keep in mind:

  • Even if an attempt does not end in a match, that is not necessarily bad. In fact, a high (attempts - match) number is good in this context, since it measures the rows that were "saved" from having the complete row read from the storage engine and checked in the server layer. On the other hand, a low number is not bad either; it just means that most (or all) attempts ended up being a match.
  • The documentation links above state that "the smaller the ratio of Handler_icp_attempts to Handler_icp_match, the better the filtering", which I believe is actually the other way around.

Back to our examples, then. First, let’s review version, table structure and data set.

mysql [localhost] {msandbox} (test) > SELECT @@version, @@version_comment;
+-----------------+-------------------+
| @@version       | @@version_comment |
+-----------------+-------------------+
| 10.1.20-MariaDB | MariaDB Server    |
+-----------------+-------------------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SHOW CREATE TABLE t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `f1` int(11) DEFAULT NULL,
  `f2` int(11) DEFAULT NULL,
  `f3` int(11) DEFAULT NULL,
  KEY `idx_f1_f2` (`f1`,`f2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)
mysql [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
|  3999996 |
+----------+
1 row in set (1.75 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 LIMIT 12;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    1 |    2 |    1 |
|    1 |    3 |    1 |
|    1 |    4 |    1 |
|    2 |    1 |    1 |
|    2 |    2 |    1 |
|    2 |    3 |    1 |
|    2 |    4 |    1 |
|    3 |    1 |    1 |
|    3 |    2 |    1 |
|    3 |    3 |    1 |
|    3 |    4 |    1 |
+------+------+------+
12 rows in set (0.00 sec)

It’s trivial, but it will work well for what we intend to show:

mysql [localhost] {msandbox} (test) > FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE f1 < 3 and f2 < 4;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    1 |    2 |    1 |
|    1 |    3 |    1 |
|    2 |    1 |    1 |
|    2 |    2 |    1 |
|    2 |    3 |    1 |
+------+------+------+
6 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM information_schema.SESSION_STATUS WHERE VARIABLE_NAME LIKE '%icp%' OR VARIABLE_NAME='ROWS_READ' OR VARIABLE_NAME='ROWS_SENT';
+----------------------+----------------+
| VARIABLE_NAME        | VARIABLE_VALUE |
+----------------------+----------------+
| HANDLER_ICP_ATTEMPTS | 8              |
| HANDLER_ICP_MATCH    | 6              |
| ROWS_READ            | 6              |
| ROWS_SENT            | 6              |
+----------------------+----------------+
4 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE f1 < 3 AND f2 < 4;
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
|    1 | SIMPLE      | t1    | range | idx_f1_f2     | idx_f1_f2 | 5       | NULL |    7 | Using index condition |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
1 row in set (0.01 sec)

In this scenario, the server sent a request to the storage engine to check on eight rows, from which six were a complete match. This is the case where a low attempts - match number is seen. The server scanned the index on the f1 column to decide which rows needed a “request for further check”, then the storage engine checked the WHERE condition on the f2 column with the pushed down (f2 < 4) clause.

Now let’s change the condition on f2:

mysql [localhost] {msandbox} (test) > FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE f1 < 3 and (f2 % 4) = 1;
+------+------+------+
| f1   | f2   | f3   |
+------+------+------+
|    1 |    1 |    1 |
|    2 |    1 |    1 |
+------+------+------+
2 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > SELECT * FROM information_schema.SESSION_STATUS WHERE VARIABLE_NAME LIKE '%icp%' OR VARIABLE_NAME='ROWS_READ' OR VARIABLE_NAME='ROWS_SENT';
+----------------------+----------------+
| VARIABLE_NAME        | VARIABLE_VALUE |
+----------------------+----------------+
| HANDLER_ICP_ATTEMPTS | 8              |
| HANDLER_ICP_MATCH    | 2              |
| ROWS_READ            | 2              |
| ROWS_SENT            | 2              |
+----------------------+----------------+
4 rows in set (0.00 sec)
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE f1 < 3 and (f2 % 4) = 1;
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
|    1 | SIMPLE      | t1    | range | idx_f1_f2     | idx_f1_f2 | 5       | NULL |    7 | Using index condition |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-----------------------+
1 row in set (0.00 sec)

In this scenario, the server also sent a request for eight rows, of which only two ended up being a match, due to the changed condition on f2. This is the case where a “high” attempts - match number is seen.

Great, so now we understand how to see the number of rows checked between the server and storage engine layers. Let’s move forward with the “how can I make sense of these numbers?” part. We can use the other counters included in the outputs above that haven’t been mentioned yet (ROWS_READ and ROWS_SENT) and compare them when running the same queries with ICP disabled (which can be conveniently done with a simple SET):

mysql [localhost] {msandbox} (test) > SET optimizer_switch='index_condition_pushdown=off';
Query OK, 0 rows affected (0.00 sec)

Let’s run the queries again. For the first query:

mysql [localhost] {msandbox} (test) > FLUSH STATUS; SELECT * FROM t1 WHERE f1 < 3 and f2 < 4;
...
mysql [localhost] {msandbox} (test) > SELECT * FROM information_schema.SESSION_STATUS WHERE VARIABLE_NAME LIKE '%icp%' OR VARIABLE_NAME='ROWS_READ' OR VARIABLE_NAME='ROWS_SENT';
+----------------------+----------------+
| VARIABLE_NAME        | VARIABLE_VALUE |
+----------------------+----------------+
| HANDLER_ICP_ATTEMPTS | 0              |
| HANDLER_ICP_MATCH    | 0              |
| ROWS_READ            | 9              |
| ROWS_SENT            | 6              |
+----------------------+----------------+
4 rows in set (0.01 sec)
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE f1 < 3 AND f2 < 4;
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
|    1 | SIMPLE      | t1    | range | idx_f1_f2     | idx_f1_f2 | 5       | NULL |    7 | Using where |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
1 row in set (0.00 sec)

As the handler counters show, ICP is indeed not being used. The server now reads nine rows, as opposed to six when using ICP. Moreover, notice how we now see Using where in the “Extra” column of the EXPLAIN output. This means the filtering is done in the server layer, while only the first column of the composite index is used for the range scan (f1 < 3).

For the second query:

mysql [localhost] {msandbox} (test) > FLUSH STATUS; SELECT * FROM t1 WHERE f1 < 3 and (f2 % 4) = 1;
...
mysql [localhost] {msandbox} (test) > SELECT * FROM information_schema.SESSION_STATUS WHERE VARIABLE_NAME LIKE '%icp%' OR VARIABLE_NAME='ROWS_READ' OR VARIABLE_NAME='ROWS_SENT';
+----------------------+----------------+
| VARIABLE_NAME        | VARIABLE_VALUE |
+----------------------+----------------+
| HANDLER_ICP_ATTEMPTS | 0              |
| HANDLER_ICP_MATCH    | 0              |
| ROWS_READ            | 9              |
| ROWS_SENT            | 2              |
+----------------------+----------------+
4 rows in set (0.01 sec)
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE f1 < 3 and (f2 % 4) = 1;
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
|    1 | SIMPLE      | t1    | range | idx_f1_f2     | idx_f1_f2 | 5       | NULL |    7 | Using where |
+------+-------------+-------+-------+---------------+-----------+---------+------+------+-------------+
1 row in set (0.00 sec)

The server is also reading nine rows (because it again uses only column f1 from the composite index, and we have the same condition on it), whereas it read only two while using ICP. We could say this case benefited much more (and it did), but as is usually the case, the answer to the bigger “how much better?” question is “it depends”. As stated in the documentation, it depends on two factors:

  • How many records are filtered out
  • How expensive it is to read them

So it really depends on the size and kind of data set, and if it’s in memory or not. It’s better to benchmark the queries to have more information (like actual response times), but if it’s not possible we can get some valuable information by taking a look at these counters.

Lastly, I wanted to go back to the “attempts to match ratio” mentioned initially. As we saw in these examples, the first one had an 8:6 (or 4:3) ratio and the second an 8:2 (or 4:1) ratio, and the second one filtered more rows. Given this, I believe that the inverse holds true: “The bigger the ratio of Handler_icp_attempts to Handler_icp_match, the better the filtering.”
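
If you want a quick way to look at this ratio after running a query, something like the following does the arithmetic for you (a sketch against the same information_schema.SESSION_STATUS table used above; the column aliases are arbitrary, and remember that FLUSH STATUS resets the session counters):

-- run in the same session, right after the query of interest
SELECT MAX(CASE WHEN VARIABLE_NAME = 'HANDLER_ICP_ATTEMPTS' THEN VARIABLE_VALUE END) AS icp_attempts,
       MAX(CASE WHEN VARIABLE_NAME = 'HANDLER_ICP_MATCH'    THEN VARIABLE_VALUE END) AS icp_matches,
       MAX(CASE WHEN VARIABLE_NAME = 'HANDLER_ICP_ATTEMPTS' THEN VARIABLE_VALUE END) /
       NULLIF(MAX(CASE WHEN VARIABLE_NAME = 'HANDLER_ICP_MATCH' THEN VARIABLE_VALUE END), 0) AS attempts_to_match_ratio
FROM information_schema.SESSION_STATUS
WHERE VARIABLE_NAME IN ('HANDLER_ICP_ATTEMPTS', 'HANDLER_ICP_MATCH');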

Stay tuned for the next part, where we’ll see how to get this information from MySQL and Percona Server, too!

May
08
2017
--

Chasing a Hung MySQL Transaction: InnoDB History Length Strikes Back


In this blog post, I’ll review how a hung MySQL transaction can cause the InnoDB history length to grow and negatively affect MySQL performance.

Recently I was helping a customer discover why SELECT queries were running slower and slower until the server restarts (which got things back to normal). It took some time to get from that symptom to a final diagnosis. Please follow me on the journey of chasing this strange MySQL behavior!

Symptoms

Changes in the query response time can mean tons of things. We can check everything from the query plan to the disk performance. However, a problem that goes away after a restart is less common. After looking at “show engine innodb status”, I noticed some strange lines:

Trx read view will not see trx with id >= 41271309593, sees < 41268384363
---TRANSACTION 41271309586, ACTIVE 766132 sec
2 lock struct(s), heap size 376, 0 row lock(s), undo log entries 1
...

There was a total of 940 transactions like this.

Another insight was the InnoDB transaction history length graph from Percona Monitoring and Management (PMM):

Hung MySQL Transaction

History length of 6 million and growing clearly indicates a problem.
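
If you don’t have PMM capturing this metric, the same counter can be read directly from the server (this is the same INNODB_METRICS query used in the simulation section later in this post; SHOW ENGINE INNODB STATUS also reports it as “History list length”):

SELECT name, count FROM information_schema.INNODB_METRICS WHERE name = 'trx_rseg_history_len';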

Problem localized

There have been a number of blog posts describing a similar problem: Peter stated in a blog post: “InnoDB transaction history often hides dangerous ‘debt’“. As the InnoDB transaction history grows, SELECTs need to scan more and more previous versions of the rows, and performance suffers. That explains the issue: SELECT queries get slower and slower until restart. Peter also filed this bug: Major regression having many row versions.

But why does the InnoDB transaction history start growing? There are 940 transactions in this state: ACTIVE 766132 sec. MySQL’s process list shows those transactions in “Sleep” state. It turns out that those transactions were “lost” or “hung”. As we can also see, each of those transactions holds two lock structures and one undo record, so they are not committed and not rolled-back. They are sitting there doing nothing. In this case, with the default isolation level REPEATABLE-READ, InnoDB can’t purge the undo records (transaction history) for other transactions until these “hung” transactions are finished.
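
To spot these “hung” transactions without scanning the whole SHOW ENGINE INNODB STATUS output, a query along these lines lists open transactions whose connections are idle (a sketch for MySQL 5.6+; the one-hour threshold is arbitrary, adjust it to your workload):

-- long-open transactions whose connection is currently in "Sleep" state
SELECT trx.trx_mysql_thread_id AS connection_id,
       trx.trx_id, trx.trx_started, trx.trx_lock_structs, trx.trx_rows_locked,
       ps.user, ps.host, ps.time AS idle_seconds
FROM information_schema.innodb_trx trx
JOIN information_schema.processlist ps ON ps.id = trx.trx_mysql_thread_id
WHERE ps.command = 'Sleep'
  AND trx.trx_started < NOW() - INTERVAL 1 HOUR
ORDER BY trx.trx_started;
-- the same join can generate the KILL statements used in the next step
SELECT CONCAT('KILL ', trx.trx_mysql_thread_id, ';') AS kill_statement
FROM information_schema.innodb_trx trx
JOIN information_schema.processlist ps ON ps.id = trx.trx_mysql_thread_id
WHERE ps.command = 'Sleep'
  AND trx.trx_started < NOW() - INTERVAL 1 HOUR;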

The quick solution is simple: kill those connections and InnoDB will roll back those transactions and purge transaction history. After killing those 940 transactions, the graph looked like this:

Hung MySQL Transaction

However, several questions remain:

  1. What are the queries inside of this lost transaction? Where are they coming from? The problem is that neither MySQL’s process list nor InnoDB’s status shows the queries for this transaction, as it is not running those queries right now (the process list is a snapshot of what is happening inside MySQL right at this moment)
  2. Can we fix it so that the “hung” transactions don’t affect other SELECT queries and don’t cause the growth of transaction history?

Simulation

As it turns out, it is very easy to simulate this issue with sysbench.

Test preparation

To add some load, I’m using sysbench with 16 threads (you can use fewer, it does not really matter here) and a script for a “write-only” load (running for 120 seconds):

conn=" --db-driver=mysql --mysql-host=localhost --mysql-user=user --mysql-password=password --mysql-db=sbtest "
sysbench --test=/usr/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-table-engine=InnoDB --oltp-table-size=1000000 $conn prepare
sysbench --num-threads=16 --max-requests=0 --max-time=120 --test=/usr/share/sysbench/tests/include/oltp_legacy/oltp.lua --oltp-table-size=1000000 $conn \
--oltp-test-mode=complex --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \
--oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=1 --oltp-non-index-updates=0 run

Simulate a “hung” transaction

While the above sysbench is running, open another connection to MySQL:

use test;
CREATE TABLE `a` (
`i` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
insert into a values(1);
begin; insert into a values(1); select * from a;

Note: we will need to run the SELECT as a part of this transaction. Do not close the connection.

Watch the history

mysql> select name, count from information_schema.INNODB_METRICS where name like '%hist%';
+----------------------+-------+
| name                 | count |
+----------------------+-------+
| trx_rseg_history_len | 34324 |
+----------------------+-------+
1 row in set (0.00 sec)
mysql> select name, count from information_schema.INNODB_METRICS where name like '%hist%';
+----------------------+-------+
| name                 | count |
+----------------------+-------+
| trx_rseg_history_len | 36480 |
+----------------------+-------+
1 row in set (0.01 sec)

We can see it is growing. Now it is time to commit or rollback or even kill our original transaction:

mysql> rollback;
...
mysql> select name, count from information_schema.INNODB_METRICS where name like '%hist%';
+----------------------+-------+
| name                 | count |
+----------------------+-------+
| trx_rseg_history_len | 793   |
+----------------------+-------+
1 row in set (0.00 sec)

As we can see, it has purged the history length.

Finding the queries from the hung transactions

There are a number of options to find the queries from that “hung” transaction. In older MySQL versions, the only way is to enable the general log (or the slow query log). Starting with MySQL 5.6, we can use the Performance Schema. Here are the steps:

  1. Enable performance_schema if not enabled (it is disabled on RDS / Aurora by default).
  2. Enable events_statements_history:
    mysql> update performance_schema.setup_consumers set ENABLED = 'YES' where NAME='events_statements_history';
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
  3. Run a query to find all transactions that started more than 10 seconds ago (change the number of seconds to match your workload):
    SELECT ps.id as processlist_id,
                 trx_started, trx_isolation_level,
                 esh.EVENT_ID,
                 esh.TIMER_WAIT,
                 esh.event_name as EVENT_NAME,
                 esh.sql_text as SQL,
                 esh.RETURNED_SQLSTATE, esh.MYSQL_ERRNO, esh.MESSAGE_TEXT, esh.ERRORS, esh.WARNINGS
    FROM information_schema.innodb_trx trx
    JOIN information_schema.processlist ps ON trx.trx_mysql_thread_id = ps.id
    LEFT JOIN performance_schema.threads th ON th.processlist_id = trx.trx_mysql_thread_id
    LEFT JOIN performance_schema.events_statements_history esh ON esh.thread_id = th.thread_id
    WHERE trx.trx_started < CURRENT_TIME - INTERVAL 10 SECOND
      AND ps.USER != 'SYSTEM_USER'
    ORDER BY esh.EVENT_ID\G
    ...
             PROCESS ID: 1971
            trx_started: 2017-05-03 17:36:47
    trx_isolation_level: REPEATABLE READ
               EVENT_ID: 79
             TIMER_WAIT: 33767000
             EVENT NAME: statement/sql/begin
                    SQL: begin
      RETURNED_SQLSTATE: 00000
            MYSQL_ERRNO: 0
           MESSAGE_TEXT: NULL
                 ERRORS: 0
               WARNINGS: 0
    *************************** 9. row ***************************
             PROCESS ID: 1971
            trx_started: 2017-05-03 17:36:47
    trx_isolation_level: REPEATABLE READ
               EVENT_ID: 80
             TIMER_WAIT: 2643082000
             EVENT NAME: statement/sql/insert
                    SQL: insert into a values(1)
      RETURNED_SQLSTATE: 00000
            MYSQL_ERRNO: 0
           MESSAGE_TEXT: NULL
                 ERRORS: 0
               WARNINGS: 0
    *************************** 10. row ***************************
             PROCESS ID: 1971
            trx_started: 2017-05-03 17:36:47
    trx_isolation_level: REPEATABLE READ
               EVENT_ID: 81
             TIMER_WAIT: 140305000
             EVENT NAME: statement/sql/select
                    SQL: select * from a
      RETURNED_SQLSTATE: NULL
            MYSQL_ERRNO: 0
           MESSAGE_TEXT: NULL
                 ERRORS: 0
               WARNINGS: 0

    Now we can see the list of queries from the old transaction (the MySQL query used was taken with modifications from this blog post: Tracking MySQL query history in long running transactions).

At this point, we can chase this issue at the application level and find out why this transaction was not committed. The typical causes:

  • There is a heavy, non-database-related process inside the application code. For example, the application starts a transaction to get a list of images for analysis and then starts an external application to process those images (machine learning or similar), which can take a very long time.
  • The application got an uncaught exception and exited, but the connection to MySQL was not closed for some reason (i.e., returned to the connection pool).

We can also try to configure the timeouts on MySQL or the application so that the connections are closed after “N” minutes.
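
For example, a sketch of the MySQL-side option (600 seconds is an arbitrary value; when the server drops an idle connection, InnoDB rolls back its open transaction):

SET GLOBAL wait_timeout = 600;         -- idle timeout for non-interactive clients (applications)
SET GLOBAL interactive_timeout = 600;  -- idle timeout for interactive clients (e.g., the mysql CLI)
-- these globals only affect connections opened after the change; set them in my.cnf to persist across restarts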

Changing the transaction isolation level to fix the InnoDB transaction history issue

Now that we know which transaction is holding up the purge process of InnoDB history, we can find this transaction and make changes so it will not “hang”. One option is to change the transaction isolation level from REPEATABLE READ (the default) to READ COMMITTED. Under READ COMMITTED, each statement gets its own read view, so an idle transaction does not force InnoDB to keep old row versions from other committed transactions, and the history can be purged. (More details about different isolation methods and how they affect InnoDB transactions.) That works in MySQL 5.6 and later. However, this doesn’t work in Amazon Aurora (as of now): even with the READ COMMITTED isolation level, the history length still grows.
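
A minimal sketch of that change (the global variable is named tx_isolation in MySQL 5.6/5.7; newer releases also accept transaction_isolation):

-- for just the session that runs the long transaction:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- or server-wide, affecting new connections:
SET GLOBAL tx_isolation = 'READ-COMMITTED';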

Here is the list of MySQL versions and isolation levels, and whether the isolation level change fixes the issue:

MySQL Version   Transaction isolation   InnoDB history length
MySQL 5.6       REPEATABLE READ         not purged until the “hung” transaction finishes
MySQL 5.6       READ COMMITTED          purged (fixed)
Aurora          REPEATABLE READ         not purged until the “hung” transaction finishes
Aurora          READ COMMITTED          not purged until the “hung” transaction finishes


Summary

Hung transactions can cause the InnoDB history length to grow and (surprisingly, at first glance) affect the performance of other running SELECT queries. We can use Performance Schema to chase down the “hung” transaction. Changing the MySQL transaction isolation level can potentially help.

May
03
2017
--

Storing UUID and Generated Columns


A lot of things have been said about UUID, and storing UUID in an optimized way. Now that we have generated columns, we can store the decomposed information inside the UUID and merge it again with generated columns. This blog post demonstrates this process.

First, I used a simple table with one char field that I called uuid_char to establish a base case. I used this table with and without a primary key:

CREATE TABLE uuid_char (
uuid char(36) CHARACTER SET utf8 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE uuid_char_pk (
uuid char(36) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (uuid)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I performed the tests on a local VM over MySQL 5.7.17 for 30 seconds, with only two threads, because I wanted to just compare the executions:

sysbench \
--oltp-table-size=100000000 \
--test=/usr/share/doc/sysbench/tests/db/insert_uuid_generated_columns.uuid_char.lua \
--oltp-tables-count=4 \
--num-threads=2 \
--mysql-user=root \
--max-requests=5000000 \
--report-interval=5 \
--max-time=30 \
--mysql-db=generatedcolumn \
run

One pair of executions is with the UUID generated by sysbench, which simulates the UUID that comes from the app:

rs = db_query("INSERT INTO uuid_char (uuid) VALUES " .. string.format("('%s')",c_val))

An alternative execution is for when the UUID is generated by the MySQL function uuid():

rs = db_query("INSERT INTO uuid_char (uuid) VALUES (uuid())")

Below we can see the results: 

The inserts are faster without a PK (but only by 5%), and using the uuid() function doesn’t impact performance.

Now, let’s see the alternative method, which is decomposing the UUID. It has four main information sets:

  • Timestamp: this is a number with seven decimals.
  • MAC: the MAC address of the device that creates the UUID
  • Unique value: this value (the UUID clock sequence) avoids duplicates in corner-case scenarios
  • UUID version: this will always be “1”, as we are going to use version 1. If you are going to use another version, you will need to review the functions that I used.
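
To see which pieces we are talking about, here is a quick illustration (a sketch using MySQL string functions on the standard 8-4-4-4-12 text layout of a version-1 UUID; the positions match the ones used in the stored procedure below):

SET @u = uuid();  -- e.g. 'xxxxxxxx-xxxx-1xxx-xxxx-xxxxxxxxxxxx'
SELECT substring(@u, 1, 8)  AS time_low,
       substring(@u, 10, 4) AS time_mid,
       substring(@u, 15, 1) AS version,      -- always '1' here
       substring(@u, 16, 3) AS time_high,
       substring(@u, 20, 4) AS uniq_value,   -- the "unique value" (clock sequence)
       right(@u, 12)        AS mac;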

The structure of the table that we’ll use is:

CREATE TABLE `uuid_generated` (
`timestamp` decimal(18,7) unsigned NOT NULL,
`mac` bigint(20) unsigned NOT NULL,
`temp_uniq` binary(2) NOT NULL,
PRIMARY KEY (`timestamp`,`mac`,`temp_uniq`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

To understand how a UUID is unwrapped, I used this stored procedure (which receives a UUID and inserts it into the table):

delimiter ;;
CREATE PROCEDURE ins_generated_uuid (uuid char(38))
begin
set @hex_timestamp = concat(substring(uuid, 16, 3), substring(uuid, 10, 4), substring(uuid, 1, 8));
set @timestamp = concat(conv(@hex_timestamp,16,10) div 10000000 - (141427 * 24 * 60 * 60),'.',right(conv(@hex_timestamp,16,10),7));
set @mac = conv(right(uuid,12),16,10);
set @temp_uniq = unhex(substring(uuid,20,4));
insert into uuid_generated (timestamp,mac,temp_uniq) values (@timestamp,@mac,@temp_uniq);
end ;;
delimiter ;

Explanation:

  • @hex_timestamp is a temporary variable that collects the timestamp in hexadecimal format from the different sections of the UUID
  • @timestamp transforms the hexadecimal timestamp to a decimal number
  • @mac pulls the last number in the UUID (a MAC address) so we can store it as a bigint
  • @temp_uniq is a value to preserve uniqueness, which is why we store it as binary and place it at the end of the Primary Key
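
As a quick check (assuming the procedure above was created in the current schema), we can feed it a freshly generated version-1 UUID and look at the decomposed row:

CALL ins_generated_uuid(uuid());
SELECT * FROM uuid_generated ORDER BY `timestamp` DESC LIMIT 1;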

If I wanted to get the UUID again, I can use these two generated columns:

`hex_timestamp` char(40) GENERATED ALWAYS AS (conv(((`timestamp` * 10000000) + (((141427 * 24) * 60) * 600000000)),10,16)) VIRTUAL,
`uuid_generated` char(38) GENERATED ALWAYS AS (concat(right(`hex_timestamp`,8),'-',substr(`hex_timestamp`,4,4),'-1',left(`hex_timestamp`,3),'-',convert(hex(`temp_uniq`) using utf8),'-',lpad(conv(`mac`,10,16),12,'0'))) VIRTUAL,

We performed tests over five scenarios:

  • Without the generated columns, the insert used data generated dynamically
  • Same as before, but we added a char field that stores the UUID
  • With the char field, and adding the generated column
  • We used the stored procedure detailed before to insert the data into the table
  • We also tested the performance using triggers

The difference between the Base and the previous table structure with Primary Keys is very small. So, the new basic structure has no impact on performance.

We see that Base and +Char Field have the same performance. So leaving a char field has no performance impact (it just uses more disk space).

Using generated columns impacts performance. This is expected, as the columns have to be generated (and their types validated) before the row is inserted.

Finally, using triggers and the stored procedure has the same performance impact.

These are the three structures to the tables:

CREATE TABLE `uuid_generated` (
`timestamp` decimal(18,7) unsigned NOT NULL,
`mac` bigint(20) unsigned NOT NULL,
`temp_uniq` binary(2) NOT NULL,
PRIMARY KEY (`timestamp`,`mac`,`temp_uniq`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `uuid_generated_char` (
`timestamp` decimal(18,7) unsigned NOT NULL,
`mac` bigint(20) unsigned NOT NULL,
`temp_uniq` binary(2) NOT NULL,
`uuid` char(38) DEFAULT NULL,
PRIMARY KEY (`timestamp`,`mac`,`temp_uniq`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `uuid_generated_char_plus` (
`timestamp` decimal(18,7) unsigned NOT NULL,
`mac` bigint(20) unsigned NOT NULL,
`temp_uniq` binary(2) NOT NULL,
`uuid` char(38) DEFAULT NULL,
`hex_timestamp` char(40) GENERATED ALWAYS AS (conv(((`timestamp` * 10000000) + (((141427 * 24) * 60) * 600000000)),10,16)) VIRTUAL,
`uuid_generated` char(38) GENERATED ALWAYS AS (concat(right(`hex_timestamp`,8),'-',substr(`hex_timestamp`,4,4),'-1',left(`hex_timestamp`,3),'-',convert(hex(`temp_uniq`) using utf8),'-',lpad(conv(`mac`,10,16),12,'0'))) VIRTUAL,
PRIMARY KEY (`timestamp`,`mac`,`temp_uniq`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

And this is the trigger:

DROP TRIGGER IF EXISTS ins_uuid_generated;
delimiter ;;
CREATE TRIGGER ins_uuid_generated BEFORE INSERT ON uuid_generated_char
FOR EACH ROW
begin
set @hex_timestamp = concat(substring(NEW.uuid, 16, 3), substring(NEW.uuid, 10, 4), substring(NEW.uuid, 1, 8));
set NEW.timestamp = concat(conv(@hex_timestamp,16,10) div 10000000 - (141427 * 24 * 60 * 60),'.',right(conv(@hex_timestamp,16,10),7));
set NEW.mac = conv(right(NEW.uuid,12),16,10);
set NEW.temp_uniq = unhex(substring(NEW.uuid,20,4));
end ;;
delimiter ;

Conclusions

Decomposing the UUID is an alternative way of storing it in an ordered fashion, but it won’t speed up inserts. It does make it simpler to run queries over a range of dates and to look at the rows for a particular device, since you can filter on the MAC (it is recommended to add an index for it). Generated columns give you the possibility of rebuilding the UUID as a single string.
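
For instance, a sketch of the index mentioned above plus a per-device range query (the index name and the MAC value are hypothetical; the timestamp column holds Unix-epoch seconds as produced by the stored procedure):

ALTER TABLE uuid_generated ADD INDEX idx_mac (mac);
SELECT *
FROM uuid_generated
WHERE mac = CONV('0800200c9a66', 16, 10)           -- hypothetical device MAC
  AND `timestamp` >= UNIX_TIMESTAMP('2017-05-01')
  AND `timestamp` <  UNIX_TIMESTAMP('2017-05-04');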
