Sep
16
2016
--

How X Plugin Works Under the Hood

X Plugin

X PluginIn this blog post, we’ll look at what MySQL does under the hood to transform NoSQL requests to SQL (and then store them in InnoDB transactional engine) when using the X Plugin.

X Plugin allows MySQL to function as a document store. We don’t need to define any schema or use SQL language while still being a fully ACID database. Sounds like magic – but we know the only thing that magic does is make planes fly! ?

Alexander already wrote a blog post exploring how the X Plugin works, with some examples. In this post, I am going to show some more query examples and how they are transformed.

I have enabled the slow query log to see what it is actually being executed when I run NoSQL queries.

Creating our first collection

We start the MySQL shell and create our first collection:

$ mysqlsh -u root --py
Creating an X Session to root@localhost:33060
No default schema selected.
[...]
Currently in Python mode. Use sql to switch to SQL mode and execute queries.
mysql-py> db.createCollection("people")

What is a collection in SQL terms? A table. Let’s check what MySQL does by reading the slow query log:

CREATE TABLE `people` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

As we correctly guessed, it creates a table with two columns. One is called “doc” and it stores a JSON document. A second column named “_id” and is created as a virtual column from data extracted from that JSON document. _id is used as a primary key, and if we don’t specify a value, MySQL will choose a random UUID every time we write a document.

So, the basics are clear.

  • It stores everything inside a JSON column.
  • Indexes are created on virtual columns that are generated by extracting data from that JSON. Every time we add a new index, a virtual column will be generated. That means that under the hood, an alter table will run adding the column and the corresponding index.

Let’s run a getCollections that would be similar to “SHOW TABLES” in the SQL world:

mysql-py> db.getCollections()
[
]

This is what MySQL actually runs:

SELECT C.table_name AS name, IF(ANY_VALUE(T.table_type)='VIEW', 'VIEW', IF(COUNT(*) = COUNT(CASE WHEN (column_name = 'doc' AND data_type = 'json') THEN 1 ELSE NULL END) + COUNT(CASE WHEN (column_name = '_id' AND generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))') THEN 1 ELSE NULL END) + COUNT(CASE WHEN (column_name != '_id' AND generation_expression RLIKE '^(json_unquote[[.(.]])?json_extract[[.(.]]`doc`,''[[.$.]]([[...]][^[:space:][...]]+)+''[[.).]]{1,2}$') THEN 1 ELSE NULL END), 'COLLECTION', 'TABLE')) AS type FROM information_schema.columns AS C LEFT JOIN information_schema.tables AS T USING (table_name)WHERE C.table_schema = 'test' GROUP BY C.table_name ORDER BY C.table_name;

This time, the query is a bit more complex. It runs a query on information_schema.tables joining it, with information_schema.columns searching for tables that have “doc” and “_id” columns.

Inserting and reading documents

I am going to start adding data to our collection. Let’s add our first document:

mysql-py> db.people.add(
      ...  {
      ...     "Name": "Miguel Angel",
      ...     "Country": "Spain",
      ...     "Age": 33
      ...   }
      ... )

In the background, MySQL inserts a JSON object and auto-assign a primary key value.

INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Age',33,'Country','Spain','Name','Miguel Angel','_id','a45c69cd2074e611f11f62bf9ac407d7'));

Ok, this is supposed to be schemaless. So let’s add someone else using different fields:

mysql-py> db.people.add(
      ...  {
      ...     "Name": "Thrall",
      ...     "Race": "Orc",
      ...     "Faction": "Horde"
      ...   }
      ... )

Same as before, MySQL just writes another JSON object (with different fields):

INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Faction','Horde','Name','Thrall','Race','Orc','_id','7092776c2174e611f11f62bf9ac407d7'));

Now we are going to read the data we have just inserted. First, we are going to find all documents stored in the collection:

mysql-py> db.people.find()

MySQL translates to a simple:

SELECT doc FROM `test`.`people`;

And this is how filters are transformed:

mysql-py> db.people.find("Name = 'Thrall'")

It uses a SELECT with the WHERE clause on data extracted from the JSON object.

SELECT doc FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');

Updating documents

Thrall decided that he doesn’t want to belong to the Horde anymore. He wants to join the Alliance. We need to update the document:

mysql-py> db.people.modify("Name = 'Thrall'").set("Faction", "Alliance")

MySQL runs an UPDATE, again using a WHERE clause on the data extracted from the JSON. Then, it updates the “Faction”:

UPDATE `test`.`people` SET doc=JSON_SET(doc,'$.Faction','Alliance') WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');

Now I want to remove my own document:

mysql-py> db.people.remove("Name = 'Miguel Angel'");

As you can already imagine, it runs a DELETE, searching for my name on the data extracted from the JSON object:

DELETE FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Miguel Angel');

Summary

The magic that makes our MySQL work like a document-store NoSQL database is:

  • Create a simple InnoDB table with a JSON column.
  • Auto-generate the primary key with UUID values and represent it as a virtual column.
  • All searches are done by extracting data JSON_EXTRACT, and passing that info to the WHERE clause.

I would define the solution as something really clever, simple and clean. Congrats to Oracle! ?

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com