About 2 weeks ago Oracle published the MySQL 5.7.7-labs-json version which includes a very interesting feature called “Generated columns” (also know as Virtual or Computed columns). MariaDB has a similar feature as well: Virtual (Computed) Columns.
The idea is very simple: if we store a column
`FlightDate` date
in our table we may want to filter or group by year(FlightDate), month(FlightDate) or even dayofweek(FlightDate). The “brute-force” approach: use the above Date and Time MySQL functions in the query; however it will prevent MySQL from using an index (see below). Generated columns will allow you to declare a “Virtual”, non-stored column which is computed based on the existing field; you can then add index on that virtual column, so the query will use that index.
Here is the original example:
CREATE TABLE `ontime` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`FlightDate` date DEFAULT NULL,
`Carrier` char(2) DEFAULT NULL,
`OriginAirportID` int(11) DEFAULT NULL,
`OriginCityName` varchar(100) DEFAULT NULL,
`OriginState` char(2) DEFAULT NULL,
`DestAirportID` int(11) DEFAULT NULL,
`DestCityName` varchar(100) DEFAULT NULL,
`DestState` char(2) DEFAULT NULL,
`DepDelayMinutes` int(11) DEFAULT NULL,
`ArrDelayMinutes` int(11) DEFAULT NULL,
`Cancelled` tinyint(4) DEFAULT NULL,
`CancellationCode` char(1) DEFAULT NULL,
`Diverted` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `FlightDate` (`FlightDate`)
) ENGINE=InnoDB
Now I want to find all flights on Sundays (in 2013) and group by airline.
mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm
WHERE dayofweek(FlightDate) = 7 group by carrier
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ontime_sm
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 151253427
Extra: Using where; Using temporary; Using filesort
Results:
32 rows in set (1 min 57.93 sec)
The problem here is: MySQL will not be able to use index when you use a function which will “extract” something from the column. The standard approach is to “materialize” the column:
ALTER TABLE ontime_sm ADD Flight_dayofweek tinyint NOT NULL;
Then we will need to load data into that by running “UPDATE ontime_sm SET Flight_dayofweek = dayofweek(flight_date)”. After that we will also need to change the application to support that additional column or use a trigger to update the column. Here is the trigger example:
CREATE DEFINER = CURRENT_USER
TRIGGER ontime_insert
BEFORE INSERT ON ontime_sm_triggers
FOR EACH ROW
SET
NEW.Flight_dayofweek = dayofweek(NEW.FlightDate);
One problem with the trigger is that it is slow. In my simple example it took almost 2x slower to “copy” the table using “insert into ontime_sm_copy select * from ontime_sm” when the trigger was on.
The Generated Columns from MySQL 5.7.7-labs-json version (only this version supports it on the time of writing) solves this problem. Here is the example which demonstrate its use:
CREATE TABLE `ontime_sm_virtual` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`FlightDate` date DEFAULT NULL,
`Carrier` char(2) DEFAULT NULL,
`OriginAirportID` int(11) DEFAULT NULL,
`OriginCityName` varchar(100) DEFAULT NULL,
`OriginState` char(2) DEFAULT NULL,
`DestAirportID` int(11) DEFAULT NULL,
`DestCityName` varchar(100) DEFAULT NULL,
`DestState` char(2) DEFAULT NULL,
`DepDelayMinutes` int(11) DEFAULT NULL,
`ArrDelayMinutes` int(11) DEFAULT NULL,
`Cancelled` tinyint(4) DEFAULT NULL,
`CancellationCode` char(1) DEFAULT NULL,
`Diverted` tinyint(4) DEFAULT NULL,
`CRSElapsedTime` int(11) DEFAULT NULL,
`ActualElapsedTime` int(11) DEFAULT NULL,
`AirTime` int(11) DEFAULT NULL,
`Flights` int(11) DEFAULT NULL,
`Distance` int(11) DEFAULT NULL,
`Flight_dayofweek` tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL,
PRIMARY KEY (`id`),
KEY `Flight_dayofweek` (`Flight_dayofweek`),
) ENGINE=InnoDB
Here we add Flight_dayofweek
tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL column and index it.
Now MySQL can use this index:
mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by carrier
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ontime_sm_virtual
partitions: NULL
type: ref
possible_keys: Flight_dayofweek
key: Flight_dayofweek
key_len: 2
ref: const
rows: 165409
filtered: 100.00
Extra: Using where; Using temporary; Using filesort
To further increase performance of this query we want to add a combined index on (Flight_dayofweek, carrier) so MySQL will avoid creating temporary table. However it is not currently supported:
mysql> alter table ontime_sm_virtual
add key comb(Flight_dayofweek, carrier);
ERROR 3105 (HY000): 'Virtual generated column combines with other columns to be indexed together' is not supported for generated columns.
We can add an index on 2 generated columns thou, which is good. So a trick here will be to create a “dummy” virtual column on “carrier” and index 2 of those columns:
mysql> alter table ontime_sm_virtual add Carrier_virtual char(2) GENERATED ALWAYS AS (Carrier) VIRTUAL;
Query OK, 0 rows affected (0.43 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, Carrier_virtual);
Query OK, 999999 rows affected (36.79 sec)
Records: 999999 Duplicates: 0 Warnings: 0
mysql> EXPLAIN SELECT Carrier_virtual, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by Carrier_virtual
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ontime_sm_virtual
partitions: NULL
type: ref
possible_keys: Flight_dayofweek,comb
key: comb
key_len: 2
ref: const
rows: 141223
filtered: 100.00
Extra: Using where; Using index
Now MySQL will use an index and completely avoid the filesort.
The last, but not the least: loading data to the table with generated columns is significantly faster compared to loading it into the same table with triggers:
mysql> insert into ontime_sm_triggers (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName, OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;
Query OK, 999999 rows affected (27.86 sec)
Records: 999999 Duplicates: 0 Warnings: 0
mysql> insert into ontime_sm_virtual (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName, OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;
Query OK, 999999 rows affected (16.29 sec)
Records: 999999 Duplicates: 0 Warnings: 0
Now the big disappointment: all operations with generated columns are not online right now.
mysql> alter table ontime_sm_virtual add Flight_year year GENERATED ALWAYS AS (year(FlightDate)) VIRTUAL, add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.
mysql> alter table ontime_sm_virtual add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.
I hope it will be fixed in the future releases.
Conclusion
Generated columns feature is very useful. Imagine an ability to add a column + index for any “logical” piece of data without actually duplicating the data. And this can be any function: date/time/calendar, text (extract(), reverse(), metaphone()) or anything else. I hope this feature will be available in MySQL 5.7 GA. Finally, I wish adding a generated column and index can be online (it is not right now).
More information:
The post Generated (Virtual) Columns in MySQL 5.7 (labs) appeared first on MySQL Performance Blog.