Mar 07, 2016

JSON document fast lookup with MySQL 5.7

In this blog post, we’ll discuss JSON document fast lookup with MySQL 5.7.

Recently I attended Morgan Tocker’s talk on MySQL 5.7 and JSON at FOSDEM, and I found it awesome.

I learned a lot from the talk. Let me share one tip here: a very useful trick if you plan to store JSON documents in your MySQL database and want to retrieve them by the value of some attribute. So let’s look at how to do JSON document fast lookup with MySQL 5.7!

In this short example, I show you how we can speed up this type of search using JSON functions and virtual columns.

This is our test table:

Table: test_features
Create Table: CREATE TABLE `test_features` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `feature` json NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=206561 DEFAULT CHARSET=latin1
mysql> show table status like 'test_features'\G
*************************** 1. row ***************************
           Name: test_features
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 171828
 Avg_row_length: 1340
    Data_length: 230326272
Max_data_length: 0
   Index_length: 0
      Data_free: 3145728
 Auto_increment: 206561
    Create_time: 2016-03-01 15:22:34
    Update_time: 2016-03-01 15:23:20
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:

We can see the data length is almost 230M:

+--------------------+--------+-------+-------+-------+------------+---------+
| TABLE              | ENGINE | ROWS  | DATA  | IDX   | TOTAL SIZE | IDXFRAC |
+--------------------+--------+-------+-------+-------+------------+---------+
| json.test_features | InnoDB | 0.17M | 0.21G | 0.00G | 0.21G      |    0.00 |
+--------------------+--------+-------+-------+-------+------------+---------+
-rw-r----- 1 mysql mysql 228M Mar  1 15:23 /var/lib/mysql/json/test_features.ibd
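The summary above comes from a helper script; a rough equivalent based on information_schema (not the exact query used here) would be:

-- Approximate table/index size summary from information_schema
SELECT table_schema, table_name, engine,
       ROUND(table_rows/1000000, 2)          AS rows_m,
       ROUND(data_length/1024/1024/1024, 2)  AS data_gb,
       ROUND(index_length/1024/1024/1024, 2) AS idx_gb
FROM information_schema.TABLES
WHERE table_schema = 'json' AND table_name = 'test_features';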

As an example here is one record (the data is coming from https://github.com/zemirco/sf-city-lots-json):

{
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    -122.41983177253881,
                    37.80720512387136,
                    0
                ],
                ...
                [
                    -122.41983177253881,
                    37.80720512387136,
                    0
                ]
            ]
        ]
    },
    "properties": {
        "TO_ST": "600",
        "BLKLOT": "0010001",
        "STREET": "BEACH",
        "FROM_ST": "600",
        "LOT_NUM": "001",
        "ST_TYPE": "ST",
        "ODD_EVEN": "E",
        "BLOCK_NUM": "0010",
        "MAPBLKLOT": "0010001"
    }
}
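Before searching on that attribute, we can quickly check that the JSON path expression returns what we expect (a minimal sanity check, output not shown):

-- Peek at the STREET attribute of a single document
SELECT id, feature->"$.properties.STREET" AS street
FROM test_features
LIMIT 1;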

Now let’s try to find all records where the street is “BEACH”. STREET is an attribute inside the properties object.

mysql> SELECT count(*) FROM test_features WHERE feature->"$.properties.STREET" = 'BEACH';
+----------+
| count(*) |
+----------+
|      208 |
+----------+
1 row in set (0.21 sec)
mysql> explain SELECT count(*) FROM test_features WHERE feature->"$.properties.STREET" = 'BEACH'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_features
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 171828
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

As you can see, we perform a full table scan to achieve this.

With MySQL 5.7, we can use virtual generated columns. Let’s create one for the streets:

mysql> ALTER TABLE test_features ADD COLUMN street VARCHAR(30) GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.STREET'))) VIRTUAL;

I use json_unquote() to avoid storing the JSON string quotes in the column, and later in the index.
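As a side note, later MySQL 5.7 releases (5.7.13 and up) also provide the ->> inline path operator, which is shorthand for JSON_UNQUOTE(JSON_EXTRACT(...)). With it, an equivalent definition of the same column would look like this (just an alternative, not meant to be run in addition to the statement above):

-- Equivalent definition using the ->> unquoting operator (MySQL 5.7.13+)
ALTER TABLE test_features
  ADD COLUMN street VARCHAR(30)
  GENERATED ALWAYS AS (feature->>'$.properties.STREET') VIRTUAL;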

You can verify the size of the table on disk: it doesn’t increase, as this is a virtual column.

Even though we can now use the “street” column in the search, that alone won’t help: we still need to add an index on it:

mysql> ALTER TABLE test_features ADD KEY `street` (`street`);

And now we can see that the file is larger, because it includes the new index:

-rw-r----- 1 mysql mysql 232M Mar  1 15:48 /var/lib/mysql/json/test_features.ibd
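Note that the two steps (adding the virtual column and the index) could also have been combined; a sketch of the equivalent one-statement version:

-- Add the virtual column and its index in one ALTER
ALTER TABLE test_features
  ADD COLUMN street VARCHAR(30)
    GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.STREET'))) VIRTUAL,
  ADD KEY `street` (`street`);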

Now we can try to run the query like this:

mysql> SELECT count(*) FROM test_features WHERE street = 'BEACH';
+----------+
| count(*) |
+----------+
|      208 |
+----------+
1 row in set (0.00 sec)

Let’s have a look at the Query Execution Plan:

mysql> explain SELECT count(*) FROM test_features WHERE street = 'BEACH'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_features
   partitions: NULL
         type: ref
possible_keys: street
          key: street
      key_len: 33
          ref: const
         rows: 208
     filtered: 100.00
        Extra: Using index

And finally, we can verify the index usage in the statistics available in the sys schema:

mysql> select * from sys.schema_index_statistics where table_name='test_features'\G
*************************** 1. row ***************************
  table_schema: json
    table_name: test_features
    index_name: street
 rows_selected: 208
select_latency: 72.59 us
 rows_inserted: 0
insert_latency: 0 ps
  rows_updated: 0
update_latency: 0 ps
  rows_deleted: 0
delete_latency: 0 ps
*************************** 2. row ***************************
  table_schema: json
    table_name: test_features
    index_name: PRIMARY
 rows_selected: 0
select_latency: 0 ps
 rows_inserted: 0
insert_latency: 0 ps
  rows_updated: 0
update_latency: 0 ps
  rows_deleted: 0
delete_latency: 0 ps
2 rows in set (0.00 sec)

As you can see, this is very fast. If you already know how you want to retrieve data out of your JSON document, it’s very easy to add such indexes in MySQL.
