Jul
06
2016
--

Pipelining versus Parallel Query Execution with MySQL 5.7 X Plugin

Pipelining versus Parallel Query Execution

Pipelining versus Parallel Query ExecutionIn this blog post, we’ll look at pipelining versus parallel query execution when using X Plugin for MySQL 5.7.

In my previous blog post, I showed how to useX Plugin for MySQL 5.7 for parallel query execution. The tricks I used to make it work:

  • Partitioning by hash
  • Open N connections to MySQL, where N = number of CPU cores

I had to do it manually (as well as to sort the result at the end) as X Plugin only supports “pipelining” (which only saves the round trip time) and does not “multiplex” connections to MySQL (MySQL does not use multiple CPU cores for a single query).

TL:DR; version

In this (long) post I’m playing with MySQL 5.7 X Plugin / X Protocol and document store. Here is the summary:

  1. X Plugin does not “multiplex” connections/sessions to MySQL. Similar to the original protocol, one connection to X Plugin will result in one session open to MySQL
  2. An X Plugin query (if the library supports it) returns immediately and does not wait until the query is finished (async call). MySQL works like a queue.
  3. X Plugin does not have any additional server-level durability settings. Unless you check or wait for the acknowledgement (which is asynchronous) from the server, the data might or might not be written into MySQL (“fire and forget”).

At the same time, X Protocol can be helpful if:

  • We want to implement an asynchronous client (i.e., we do not want to block the network communication such as downloading or API calls) when the MySQL table is locked.
  • We want to use MySQL as a queue and save the round-trip time.
Benchmark results: “pipelining” versus “parallelizing” versus a single query

I’ve done a couple of tests comparing the results between “pipelining” versus “parallelizing” versus a single query. Here are the results:

      1. Parallel queries with NodeJS:
        $ time node async_wikistats.js
        ...
        All done! Total: 17753
        ...
        real    0m30.668s
        user    0m0.256s
        sys     0m0.028s
      2. Pipeline with NojeJS:
        $ time node async_wikistats_pipeline.js
        ...
        All done! Total: 17753
        ...
        real 5m39.666s
        user 0m0.212s
        sys 0m0.024s

        In the pipeline with NojeJS, I’m reusing the same connection (and do not open a new one for each thread).

      3. Direct query – partitioned table:
        mysql> select sum(tot_visits) from wikistats.wikistats_by_day_spark_part where url like ‘%postgresql%’;
        +-----------------+
        | sum(tot_visits) |
        +-----------------+
        | 17753           |
        +-----------------+
        1 row in set (5 min 31.44 sec)
      4. Direct query – non-partitioned table.
        mysql> select sum(tot_visits) from wikistats.wikistats_by_day_spark where url like ‘%postgresql%’;
        +-----------------+
        | sum(tot_visits) |
        +-----------------+
        | 17753           |
        +-----------------+
        1 row in set (4 min 38.16 sec)
Advantages of pipelines with X Plugin 

Although pipelining with X Plugin does not significantly increase query response time (it can reduce the total latency), it might be helpful in some cases. For example, let’s say we are downloading something from the Internet and need to save the progress of the download as well as the metadata for the document. In this example, I use youtube-dl to search and download the metadata about YouTube videos, then save the metadata JSON into MySQL 5.7 Document Store. Here is the code:

var mysqlx = require('mysqlx');
# This is the same as running $ youtube-dl -j -i ytsearch100:"mysql 5.7"
const spawn = require('child_process').spawn;
const yt = spawn('youtube-dl', ['-j', '-i', 'ytsearch100:"mysql 5.7"'], {maxBuffer: 1024 * 1024 * 128});
var mySession =
mysqlx.getSession({
    host: 'localhost',
    port: 33060,
    dbUser: 'root',
    dbPassword: '<your password>'
});
yt.stdout.on('data', (data) => {
        try {
                dataObj = JSON.parse(data);
                console.log(dataObj.fulltitle);
                mySession.then(session => {
                                                session.getSchema("yt").getCollection("youtube").add(  dataObj  )
                                                .execute(function (row) {
                                                }).catch(err => {
                                                        console.log(err);
                                                })
                                                .then( function (notices) { console.log("Wrote to MySQL: " + JSON.stringify(notices))  });
                                }).catch(function (err) {
                                              console.log(err);
                                              process.exit();
                                });
        } catch (e) {
                console.log(" --- Can't parse json" + e );
        }
});
yt.stderr.on('data', (data) => {
  console.log("Error receiving data");
});
yt.on('close', (code) => {
  console.log(`child process exited with code ${code}`);
  mySession.then(session => {session.close() } );
});

In the above example, I execute the youtube-dl binary (you need to have it installed first) to search for “MySQL 5.7” videos. Instead of downloading the videos, I only grab the video’s metadata in JSON format  (“-j” flag). Because it is JSON, I can save it into MySQL document store. The table has the following structure:

CREATE TABLE `youtube` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  UNIQUE KEY `_id` (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

Here is the execution example:

$ node yt.js
What's New in MySQL 5.7
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["3f312c3b-b2f3-55e8-0ee9-b706eddf"]}}
MySQL 5.7: MySQL JSON data type example
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["88223742-9875-59f1-f535-f1cfb936"]}}
MySQL Performance Tuning: Part 1. Configuration (Covers MySQL 5.7)
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["c377e051-37e6-8a63-bec7-1b81c6d6"]}}
Dave Stokes — MySQL 5.7 - New Features and Things That Will Break — php[world] 2014
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["96ae0dd8-9f7d-c08a-bbef-1a256b11"]}}
MySQL 5.7 & JSON: New Opportunities for Developers - Thomas Ulin - Forum PHP 2015
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["ccb5c53e-561c-2ed5-6deb-1b325739"]}}
Cara Instal MySQL 5.7.10 NoInstaller pada Windows Manual Part3
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["95efbd79-8d79-e7b6-a535-271640c8"]}}
MySQL 5.7 Install and Configuration on Ubuntu 14.04
Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["b8cfe132-aca4-1eba-c2ae-69e48db8"]}}

Now, here is what make this example interesting: as NodeJS + X Plugin = Asynchronous + Pipelining, the program execution will not stop if the table is locked. I’ve opened two sessions:

  • session 1: $ node yt.js > test_lock_table.log
  • session 2:
    mysql> lock table youtube read; select sleep(10); unlock tables;
    Query OK, 0 rows affected (0.00 sec)
    +-----------+
    | sleep(10) |
    +-----------+
    |         0 |
    +-----------+
    1 row in set (10.01 sec)
    Query OK, 0 rows affected (0.00 sec)

Results:

...
Upgrade MySQL Server from 5.5 to 5.7
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["d4d62a8a-fbfa-05ab-2110-2fd5cf6d"]}}
OSC15 - Georgi Kodinov - Secure Deployment Changes Coming in MySQL 5.7
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["8ac1cdb9-1499-544c-da2a-5db1ccf5"]}}
MySQL 5.7: Create JSON string using mysql
FreeBSD 10.3 - Instalación de MySQL 5.7 desde Código Fuente - Source Code
Webinar replay: How To Upgrade to MySQL 5.7 - The Best Practices - part 1
How to install MySQL Server on Mac OS X Yosemite - ltamTube
Webinar replay: How To Upgrade to MySQL 5.7 - The Best Practices - part 4
COMO INSTALAR MYSQL VERSION 5.7.13
MySQL and JSON
MySQL 5.7: Merge JSON data using MySQL
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["a11ff369-6f23-11e9-187b-e3713e6e"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["06143a61-4add-79da-0e1d-c2b52cf6"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["1eb94ef4-db63-cb75-767e-e1555549"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e25f15b5-8c19-9531-ed69-7b46807a"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["02b5a4c9-6a21-f263-90d5-cd761906"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e0bef958-10af-b181-81cd-5debaaa0"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["f48fa635-fa63-7481-0668-addabbac"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["557fa5c5-3c8a-fe01-c17c-549c557e"]}}
MySQL 5.7 Install and Configuration on Ubuntu 14.04
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["456b11d8-ba03-0aec-8e06-9517c6e1"]}}
MySQL WorkBench 6.3 installation on Ubuntu 14.04
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["0b651987-9b23-b5e0-f8f7-49b8ba5c"]}}
Going through era of IoT with MySQL 5.7 - FOSSASIA 2016
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e133746c-836c-a7e0-3893-292a7429"]}}
MySQL 5.7: MySQL JSON operator example
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["4d13830d-7b30-5b31-d068-c7305e0a"]}}

As we can see, the first two writes were immediate. Then I’ve locked the table, and no MySQL queries went through. At the same time the download process (which is the slowest part here) proceeded and was not blocked (we can see the titles above, which are not followed by lines “… => wrote to MySQL:”). When the table was unlocked, a pile of waiting queries succeeded.

This can be very helpful when running a “download” process, and the network is a bottleneck. In a traditional synchronous query execution, when we lock a table the application gets blocked (including the network communication). With NodeJS and X Plugin, the download part will proceed with MySQL acting as a queue.

Pipeline Durability

How “durable” this pipeline, you might ask. In other words, what will happen if I will kill the connection? To test it out, I have (once again) locked the table (but now before starting the nodejs), killed the connection and finally unlocked the table. Here are the results:

Session 1:
----------
mysql> truncate table youtube_new;
Query OK, 0 rows affected (0.25 sec)
mysql> lock table youtube_new read;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
Session 2:
----------
(when table is locked)
$ node yt1.js
11 03  MyISAM
Switching to InnoDB from MyISAM
tablas InnoDB a MyISAM
MongoDB vs MyISAM (MariaDB/MySQL)
MySQL Tutorial 35 - Foreign Key Constraints for the InnoDB Storage Engine
phpmyadmin foreign keys myisam innodb
Convert or change database manual from Myisam to Innodb
... >100 other results omited ...
^C
Session 1:
----------
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
     Id: 4916
   User: root
   Host: localhost:33221
     db: NULL
Command: Query
   Time: 28
  State: Waiting for table metadata lock
   Info: PLUGIN: INSERT INTO `iot`.`youtube_new` (doc) VALUES ('{"upload_date":"20140319","protocol":"
mysql> unlock table;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)
mysql>  select json_unquote(doc->'$.title') from youtube_new;
+---------------------------------+
| json_unquote(doc->'$.title')    |
+---------------------------------+
| 11 03  MyISAM                   |
| Switching to InnoDB from MyISAM |
+---------------------------------+
2 rows in set (0.00 sec)

Please note: in the above, there isn’t a single acknowledgement from the MySQL server. When code receives a response from MySQL it prints “Wrote to MySQL: {“_state”:{“rows_affected”:1,”doc_ids”:[“…”]}}“. Also, note that when the connection was killed the MySQL process is still there, waiting on the table lock.

What is interesting here is is that only two rows have been inserted into the document store. Is there a “history length” here or some other buffer that we can increase? I’ve asked Jan Kneschke, one of the authors of the X Protocol, and the answers were:

  • Q: Is there any history length or any buffer and can we tune it?
    • A: There is no “history” or “buffer” at all, it is all at the connector level.
  • Q: Then why is 2 rows were finally inserted?
    • To answer this question I’ve collected tcpdump to port 33060 (X Protocol), see below

This is very important information! Keep in mind that the asynchronous pipeline has no durability settings: if the application fails and there are some pending writes, those writes can be lost (or could be written).

To fully understand how the protocol works, I’ve captured tcpdump (Jan Kneschke helped me to analyze it):

tcpdump -i lo -s0 -w tests/node-js-pipelining.pcap "tcp port 33060"

This is what is happening:

  • When I hit CTRL+C, nodejs closes the connection. As the table is still locked, MySQL can’t write to it and will not send the result of the insert back.
  • When the table is unlocked, it starts the first statement despite the fact that the connection has been closed. It then acknowledges the first insert and starts the second one.
  • However, at this point the script (client) has already closed the connection and the final packet (write done, here is the id) gets denied. The X Plugin then finds out that the client closed the connection and stops executing the pipeline.

Actually, this is very similar to how the original MySQL protocol worked. If we kill the script/application, it doesn’t automatically kill the MySQL connection (unless you hit CTRL+C in the MySQL client, sends the kill signal) and the connection waits for the table to get unlocked. When the table is unlocked, it inserts the first statement from a file.

Session 1
---------
mysql> select * from t_sql;
Empty set (0.00 sec)
mysql> lock table t_sql read;
Query OK, 0 rows affected (0.00 sec)
Session 2:
----------
$ mysql iot < t.sql
$ kill -9 ...
[3]   Killed                  mysql iot < t.sql
Session 1:
----------
mysql> show processlist;
+------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+
| Id   | User | Host            | db   | Command | Time    | State                           | Info                                          |
+------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+
| 4913 | root | localhost       | iot  | Query   |      41 | Waiting for table metadata lock | insert into t_sql  values('{"test_field":0}') |
+------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+
4 rows in set (0.00 sec)
mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from t_sql;
+-------------------+
| doc               |
+-------------------+
| {"test_field": 0} |
+-------------------+
1 row in set (0.00 sec)

Enforcing unique checks

If I restart my script, it finds the same videos again. We will probably need to enforce the consistency of our data. By default the plugin generates the unique key (_id) for the document, so it prevents inserting the duplicates.

Another way to enforce the unique checks is to create a unique key for youtube id. Here is the updated table structure:

CREATE TABLE `youtube` (
  `doc` json DEFAULT NULL,
  `youtube_id` varchar(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.id'))) STORED NOT NULL,
  UNIQUE KEY `youtube_id` (`youtube_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

I’ve changed the default “_id” column to the YouTube’s unique ID. Now when I restart the script it shows:

MySQL 5.7: Merge JSON data using MySQL
{ [Error: Document contains a field value that is not unique but required to be]
  info:
   { severity: 0,
     code: 5116,
     msg: 'Document contains a field value that is not unique but required to be',
     sql_state: 'HY000' } }
... => wrote to MySQL: undefined

…as this document has already been loaded.

Conclusion

Although X Plugin pipelining does not necessarily significantly increase query response (it might save the roundtrip time) it can be helpful for some applications.We might not want to block the network communication (i.e., downloading or API calls) when the MySQL table is locked, for example. At the same time, unless you check/wait for the acknowledgement from the server, the data might or might not be written into MySQL.

Bonus: data analysis

Now we can see what we have downloaded. There are a number of interesting fields in the result:

"is_live": null,
	"license": "Standard YouTube License",
	"duration": 2965,
	"end_time": null,
	"playlist": ""mysql 5.7"",
	"protocol": "https",
	"uploader": "YUI Library",
	"_filename": "Douglas Crockford - The JSON Saga--C-JoyNuQJs.mp4",
	"age_limit": 0,
	"alt_title": null,
	"extractor": "youtube",
	"format_id": "18",
	"fulltitle": "Douglas Crockford: The JSON Saga",
	"n_entries": 571,
	"subtitles": {},
	"thumbnail": "https://i.ytimg.com/vi/-C-JoyNuQJs/hqdefault.jpg",
	"categories": ["Science & Technology"],
	"display_id": "-C-JoyNuQJs",
	"like_count": 251,
	"player_url": null,
	"resolution": "640x360",
	"start_time": null,
	"thumbnails": [{
		"id": "0",
		"url": "https://i.ytimg.com/vi/-C-JoyNuQJs/hqdefault.jpg"
	}],
	"view_count": 36538,
	"annotations": null,
	"description": "Yahoo! JavaScript architect Douglas Crockford tells the story of how JSON was discovered and how it became a major standard for describing data.",
	"format_note": "medium",
	"playlist_id": ""mysql 5.7"",
	"upload_date": "20110828",
	"uploader_id": "yuilibrary",
	"webpage_url": "https://www.youtube.com/watch?v=-C-JoyNuQJs",
	"uploader_url": "http://www.youtube.com/user/yuilibrary",
	"dislike_count": 5,
	"extractor_key": "Youtube",
	"average_rating": 4.921875,
	"playlist_index": 223,
	"playlist_title": null,
	"automatic_captions": {},
	"requested_subtitles": null,
	"webpage_url_basename": "-C-JoyNuQJs"

We can see the most popular videos. To do that I’ve added one more virtual field on view_count, and created an index on it:

CREATE TABLE `youtube` (
  `doc` json DEFAULT NULL,
  `youtube_id` varchar(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.id'))) STORED NOT NULL,
  `view_count` int(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.view_count'))) VIRTUAL,
  UNIQUE KEY `youtube_id` (`youtube_id`),
  KEY `view_count` (`view_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

We can run the queries like:

mysql> select json_unquote(doc->'$.title'),
    -> view_count,
    -> json_unquote(doc->'$.dislike_count') as dislikes
    -> from youtube
    -> order by view_count desc
    -> limit 10;
+----------------------------------------------------------------------------------------------------+------------+----------+
| json_unquote(doc->'$.title')                                                                       | view_count | dislikes |
+----------------------------------------------------------------------------------------------------+------------+----------+
| Beginners MYSQL Database Tutorial 1 # Download , Install MYSQL and first SQL query                 |     664153 | 106      |
| MySQL Tutorial                                                                                     |     533983 | 108      |
| PHP and MYSQL - Connecting to a Database and Adding Data                                           |     377006 | 50       |
| PHP MySQL Tutorial                                                                                 |     197984 | 41       |
| Installing MySQL (Windows 7)                                                                       |     196712 | 28       |
| Understanding PHP, MySQL, HTML and CSS and their Roles in Web Development - CodersCult Webinar 001 |     195464 | 24       |
| jQuery Ajax Tutorial #1 - Using AJAX & API's (jQuery Tutorial #7)                                  |     179198 | 25       |
| How To Root Lenovo A6000                                                                           |     165221 | 40       |
| MySQL Tutorial 1 - What is MySQL                                                                   |     165042 | 45       |
| How to Send Email in Blackboard Learn                                                              |     144948 | 28       |
+----------------------------------------------------------------------------------------------------+------------+----------+
10 rows in set (0.00 sec)

Or if we want to find out the most popular resolutions:

mysql> select count(*) as cnt,
    -> sum(view_count) as sum_views,
    -> json_unquote(doc->'$.resolution') as resolution
    -> from youtube
    -> group by resolution
    -> order by cnt desc, sum_views desc
    -> limit 10;
+-----+-----------+------------+
| cnt | sum_views | resolution |
+-----+-----------+------------+
| 273 |   3121447 | 1280x720   |
|  80 |   1195865 | 640x360    |
|  18 |     33958 | 1278x720   |
|  15 |     18560 | 1152x720   |
|  11 |     14800 | 960x720    |
|   5 |      6725 | 1276x720   |
|   4 |     18562 | 1280x682   |
|   4 |      1581 | 1280x616   |
|   4 |       348 | 1280x612   |
|   3 |      2024 | 1200x720   |
+-----+-----------+------------+
10 rows in set (0.02 sec)

Special thanks to Jan Kneschke and Morgan Tocker from Oracle for helping with the X Protocol internals.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com