In this blog post, we’ll look at what MySQL does under the hood to transform NoSQL requests into SQL (and then store the data in the InnoDB transactional engine) when using the X Plugin.
The X Plugin allows MySQL to function as a document store. We don’t need to define any schema or use the SQL language, while MySQL still remains a fully ACID database. Sounds like magic – but we know the only thing that magic does is make planes fly!
Alexander already wrote a blog post exploring how the X Plugin works, with some examples. In this post, I am going to show some more query examples and how they are transformed.
I have enabled the slow query log to see what is actually being executed when I run NoSQL queries.
Creating our first collection
We start the MySQL shell and create our first collection:
$ mysqlsh -u root --py
Creating an X Session to root@localhost:33060
No default schema selected.
[...]
Currently in Python mode. Use sql to switch to SQL mode and execute queries.
mysql-py> db.createCollection("people")
What is a collection in SQL terms? A table. Let’s check what MySQL does by reading the slow query log:
CREATE TABLE `people` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
As we correctly guessed, it creates a table with two columns. One is called “doc” and stores the JSON document. The second, named “_id”, is a generated column populated with data extracted from that JSON document (the DDL marks it STORED rather than VIRTUAL, since it serves as the primary key). If we don’t specify a value for “_id”, MySQL generates a random UUID-based value every time we write a document.
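We can verify that “_id” really is just an extraction from the document itself. This is a quick sketch (it only returns rows once we have added documents, as we do later in this post):

SELECT _id,
       JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id')) AS extracted_id
FROM test.people;
-- Both columns return the same value for every row.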
So, the basics are clear.
- It stores everything inside a JSON column.
- Indexes are created on generated columns whose values are extracted from that JSON. Every time we add a new index, a new generated column is created. That means that, under the hood, an ALTER TABLE runs to add the column and the corresponding index (see the sketch below).
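We won’t capture an index creation in the slow query log in this post, but based on the table definition above, creating an index on “Name” should run something along these lines. The column and index names here are placeholders; MySQL uses its own internal identifiers:

mysql-py> db.people.createIndex("idx_name").field("Name", "TEXT(50)", False).execute()

ALTER TABLE `test`.`people`
  ADD COLUMN `name_vcol` text GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$.Name'))) VIRTUAL,
  ADD INDEX `idx_name` (`name_vcol`(50));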
Let’s run getCollections, which is similar to “SHOW TABLES” in the SQL world:
mysql-py> db.getCollections()
[
    <Collection:people>
]
This is what MySQL actually runs:
SELECT C.table_name AS name,
       IF(ANY_VALUE(T.table_type)='VIEW', 'VIEW',
          IF(COUNT(*) = COUNT(CASE WHEN (column_name = 'doc' AND data_type = 'json') THEN 1 ELSE NULL END) +
                        COUNT(CASE WHEN (column_name = '_id' AND generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))') THEN 1 ELSE NULL END) +
                        COUNT(CASE WHEN (column_name != '_id' AND generation_expression RLIKE '^(json_unquote[[.(.]])?json_extract[[.(.]]`doc`,''[[.$.]]([[...]][^[:space:][...]]+)+''[[.).]]{1,2}$') THEN 1 ELSE NULL END),
             'COLLECTION', 'TABLE')) AS type
FROM information_schema.columns AS C
LEFT JOIN information_schema.tables AS T USING (table_name)
WHERE C.table_schema = 'test'
GROUP BY C.table_name
ORDER BY C.table_name;
This time, the query is a bit more complex. It queries information_schema.columns, joined with information_schema.tables, searching for tables that have the “doc” and “_id” columns that identify a collection.
Inserting and reading documents
I am going to start adding data to our collection. Let’s add our first document:
mysql-py> db.people.add(
      ...  {
      ...      "Name": "Miguel Angel",
      ...      "Country": "Spain",
      ...      "Age": 33
      ...  }
      ... )
In the background, MySQL inserts a JSON object and auto-assigns a primary key value:
INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Age',33,'Country','Spain','Name','Miguel Angel','_id','a45c69cd2074e611f11f62bf9ac407d7'));
Ok, this is supposed to be schemaless. So let’s add someone else using different fields:
mysql-py> db.people.add(
      ...  {
      ...      "Name": "Thrall",
      ...      "Race": "Orc",
      ...      "Faction": "Horde"
      ...  }
      ... )
Same as before, MySQL just writes another JSON object (with different fields):
INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Faction','Horde','Name','Thrall','Race','Orc','_id','7092776c2174e611f11f62bf9ac407d7'));
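As a side note (standard X DevAPI behavior, though not captured in the slow log here): if the document we add already contains an “_id” field, MySQL uses that value instead of generating one. Since “_id” is the primary key, the value must be unique. A sketch:

mysql-py> db.people.add({"_id": "thrall0001", "Name": "Rexxar", "Race": "Half-orc"})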
Now we are going to read the data we have just inserted. First, we are going to find all documents stored in the collection:
mysql-py> db.people.find()
MySQL translates it into a simple:
SELECT doc FROM `test`.`people`;
And this is how filters are transformed:
mysql-py> db.people.find("Name = 'Thrall'")
It uses a SELECT with a WHERE clause on data extracted from the JSON object:
SELECT doc FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');
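Filters can combine multiple conditions. The following example wasn’t taken from the slow query log, but based on the pattern above, the translation should look roughly like this:

mysql-py> db.people.find("Name = 'Thrall' AND Race = 'Orc'")

SELECT doc FROM `test`.`people` WHERE ((JSON_EXTRACT(doc,'$.Name') = 'Thrall') AND (JSON_EXTRACT(doc,'$.Race') = 'Orc'));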
Updating documents
Thrall decided that he doesn’t want to belong to the Horde anymore. He wants to join the Alliance. We need to update the document:
mysql-py> db.people.modify("Name = 'Thrall'").set("Faction", "Alliance")
MySQL runs an UPDATE, again using a WHERE clause on the data extracted from the JSON. It then updates the “Faction” field with JSON_SET:
UPDATE `test`.`people` SET doc=JSON_SET(doc,'$.Faction','Alliance') WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');
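It’s worth noting that JSON_SET updates the key if it already exists and adds it if it doesn’t, which is exactly the behavior a schemaless set() needs. A quick standalone check:

SELECT JSON_SET('{"Name": "Thrall"}', '$.Faction', 'Alliance') AS result;
-- Returns: {"Name": "Thrall", "Faction": "Alliance"}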
Now I want to remove my own document:
mysql-py> db.people.remove("Name = 'Miguel Angel'");
As you can already imagine, it runs a DELETE, searching for my name on the data extracted from the JSON object:
DELETE FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Miguel Angel');
Summary
The magic that makes our MySQL work like a document-store NoSQL database is:
- Create a simple InnoDB table with a JSON column.
- Auto-generate the primary key with UUID values and expose it as a generated column extracted from the JSON document.
- All searches are done by extracting data with JSON_EXTRACT and using the result in the WHERE clause.
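In fact, nothing stops us from reproducing the same pattern by hand in plain SQL. This is a minimal sketch (the table name and sample data are made up for illustration):

-- A hand-rolled version of what the X Plugin generates for a collection:
CREATE TABLE people_manual (
  doc json DEFAULT NULL,
  _id varchar(32) GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) STORED NOT NULL,
  PRIMARY KEY (_id)
) ENGINE=InnoDB;

-- REPLACE(UUID(), '-', '') produces a 32-character string that fits _id:
INSERT INTO people_manual (doc)
VALUES (JSON_OBJECT('_id', REPLACE(UUID(), '-', ''), 'Name', 'Jaina', 'Faction', 'Alliance'));

SELECT doc FROM people_manual WHERE JSON_EXTRACT(doc, '$.Name') = 'Jaina';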
I would describe the solution as really clever, simple, and clean. Congrats to Oracle!