Mar
29
2018
--

Using ProxySQL and VIRTUAL Columns to Solve ORM Issues

ProxySQL and VIRTUAL columns

ProxySQL and VIRTUAL columnsIn this blog post, we’ll look at using ProxySQL and VIRTUAL columns to solve ORM issues.

There are a lot of web frameworks all around. Programmers and web designers are using them to develop and deploy any website and web application. Just to cite some of the most famous names: Drupal, Ruby on Rails, Symfony, etc.

Web frameworks are very useful tools. But sometimes, as with many human artifacts, they have issues. Any framework has its own queries to manage its internal tables. While there is nothing wrong with that, but it often means these queries are not optimized.

Here is my case with Symfony 2 on MySQL 5.7, and how I solved it.

The sessions table issue

Symfony has a table to manage session data for users on the application. The table is defined as follow:

CREATE TABLE `sessions` (
 `sess_id` varchar(126) COLLATE utf8_bin NOT NULL,
 `sess_data` blob NOT NULL,
 `sess_time` int(10) unsigned NOT NULL,
 `sess_lifetime` mediumint(9) NOT NULL,
 PRIMARY KEY (`sess_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin

The expiration time of the user session is configurable. The developers decided to configure it to be one month.

Symfony was serving a high traffic website, and very soon that table became very big. After one month, I saw it had more than 14 million rows and was more than 3GB in size.

mysql> SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, TABLE_ROWS, DATA_LENGTH
    -> FROM information_schema.tables WHERE table_schema='symfony' AND table_name='sessions'\G
*************************** 1. row ***************************
  TABLE_SCHEMA: symfony
    TABLE_NAME: sessions
        ENGINE: InnoDB
    TABLE_ROWS: 14272158
   DATA_LENGTH: 3306140672

Developers noticed the web application sometimes stalling for a few seconds. First, I analyzed the slow queries on MySQL and I discovered that sometimes Symfony deletes inactive sessions. It issued the following query, which took several seconds to complete. This query was the cause of the stalls in the application:

DELETE FROM sessions WHERE sess_lifetime + sess_time < 1521025847

The query is not optimized. Let’s have a look at the EXPLAIN:

mysql> EXPLAIN DELETE FROM sessions WHERE sess_lifetime + sess_time < 1521025847\G
*************************** 1. row ***************************
           id: 1
  select_type: DELETE
        table: sessions
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 14272312
     filtered: 100.00
        Extra: Using where

Every DELETE query was a full table scan of more than 14 million rows. So, let’s try to improve it.

First workaround

Looking around on the web and discussing it with colleagues, we’ve found some workarounds. But none of them was the definitive solution:

  1. Reduce expiration time in Symfony configuration. Good idea. One month is probably too long for a high traffic website. But we kept the expiration time configured at one month because of an internal business policy. But even one week wouldn’t have solved the full table scan.
  2. Using a different database solution. Redis was proposed as an alternative to MySQL to manage session data. This might be a good solution, but it could involve a long deployment time. We planned a test, but the sysadmins suggested it was not a good solution to have another database system for such a simple task.
  3. Patching Symfony code. It was proposed to rewrite the query directly into the Symfony code. Discarded.
  4. Create indexes. It was proposed to create indexes on sess_time and sess_lifetime columns. The indexes wouldn’t get used because of the arithmetic addition on the where clause. This is the only condition we have on the query.

So, what do we do if everything must remain the same? Same configuration, same environment, same query issued and no indexes added?

Query optimization using a virtual column

I focused on how to optimize the query. Since I was using 5.7, I thought about a generated virtual column. I decided to add a virtual column in the sessions table, defined as sess_time+sess_lifetime (the same as the condition of the query):

mysql> ALTER TABLE sessions
ADD COLUMN `sess_delete` INT UNSIGNED GENERATED ALWAYS AS ((`sess_time` + `sess_lifetime`)) VIRTUAL;

Any virtual column can have an index on it. So, I created the index:

mysql> ALTER TABLE sessions ADD INDEX(sess_delete);

Note: I first checked that the INSERT queries were well written in Symfony (with an explicit list of the fields to insert), in make sure this modification wouldn’t cause more issues. Making a schema change on a table that is in use by any framework, where the queries against the table are generally outside of your control, can be a daunting task.

So, let’s EXPLAIN the query rewritten as follows, with the condition directly on the generated indexed column:

mysql> EXPLAIN DELETE FROM sessions WHERE sess_delete < 1521025847\G
*************************** 1. row ***************************
           id: 1
  select_type: DELETE
        table: sessions
         type: range
possible_keys: sess_delete
          key: sess_delete
      key_len: 5
          ref: const
         rows: 6435
     filtered: 100.00
        Extra: Using where

The query now can to use the index, and the number of rows selected are the exact number of the session that we have to delete.

So far, so good. But will Symfony execute that query if we don’t want to modify the source code?

Using ProxySQL to rewrite the query

Fortunately, we already had ProxySQL up and running in our environment. We were using it just to manage the master MySQL failover.

One of the very useful features of ProxySQL is the ability to rewrite any query it receives into another one based on rules you can define. You can create queries from very simple rules, like changing the name of a field, to very complex queries that use a chain of rules. It depends on how complex the translation is that you have to do. In our case, we just needed to translate sess_time + sess_lifetime into sess_delete. The rest of the query was the same. We needed to define a very simple rule.

Let’s see how to create the rewrite rules.

Connect to the proxy:

mysql -u admin -psecretpwd -h 127.0.0.1 -P6032 --prompt='Admin> '

Define the rewrite rule by inserting a record into the mysql_query_rules table:

Admin> INSERT INTO mysql_query_rules(rule_id,active,flagIN,match_pattern,negate_match_pattern,re_modifiers,replace_pattern,destination_hostgroup,apply)
 -> VALUES(
 -> 1,
 -> 1,
 -> 0,
 -> '^DELETE FROM sessions WHERE sess_lifetime + sess_time < (.*)',
 -> 0,
 -> 'CASELESS',
 -> 'DELETE FROM sessions WHERE sess_delete < \1',
 -> 0,
 -> 1);

The two fields I want to focus on are:

  • match_pattern: it defines the query to be matched using the regular expression notation. The + symbol must be escaped using because it’s a special character for regular expressions
  • replace_pattern: it defines how to rewrite the matched query. 1 is the value of the parameter matched by match_pattern into (.*)

For the meaning of the other fields, have a look at https://github.com/sysown/proxysql/wiki/ProxySQL-Configuration.

Once created, we have to save the rule to disk and put it on runtime to let it run effectively.

Admin> SAVE MYSQL QUERY RULES TO DISK;
Admin> LOAD MYSQL QUERY RULES TO RUNTIME;

After that, the proxy began to filter the query and rewrite it to have a better execution plan using the index on the virtual column.

Note: pay attention when you need to upgrade the framework. If it needs to rebuild the database tables, you will lose the virtual column you’ve created. Just remember to recreate it and check it after the upgrade.

Conclusion

Developers love using web frameworks because they are very powerful in simplifying development and deployment of complex web applications. But for DBAs, sometimes internal queries can cause a bit of a headache because it is not well optimized or because it was not supposed to run in your “huge” database. I solved my case using ProxySQL and VIRTUAL columns with a minimal impact on the architecture of the system we had and avoided any source code patching.

Take this post as a tip in case you face similar issues with your application framework.

The post Using ProxySQL and VIRTUAL Columns to Solve ORM Issues appeared first on Percona Database Performance Blog.

Apr
29
2015
--

Generated (Virtual) Columns in MySQL 5.7 (labs)

About 2 weeks ago Oracle published the MySQL 5.7.7-labs-json version which includes a very interesting feature called “Generated columns” (also know as Virtual or Computed columns). MariaDB has a similar feature as well: Virtual (Computed) Columns.

The idea is very simple: if we store a column

`FlightDate` date

in our table we may want to filter or group by year(FlightDate), month(FlightDate) or even dayofweek(FlightDate). The “brute-force” approach: use the above Date and Time MySQL functions in the query; however it will prevent MySQL from using an index (see below). Generated columns will allow you to declare a “Virtual”, non-stored column which is computed based on the existing field; you can then add index on that virtual column, so the query will use that index.

Here is the original example:

CREATE TABLE `ontime` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FlightDate` (`FlightDate`)
) ENGINE=InnoDB

Now I want to find all flights on Sundays (in 2013) and group by airline.

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm
	    WHERE dayofweek(FlightDate) = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 151253427
        Extra: Using where; Using temporary; Using filesort
Results:
32 rows in set (1 min 57.93 sec)

The problem here is: MySQL will not be able to use index when you use a function which will “extract” something from the column. The standard approach is to “materialize” the column:

ALTER TABLE ontime_sm ADD Flight_dayofweek tinyint NOT NULL;

Then we will need to load data into that by running “UPDATE ontime_sm SET Flight_dayofweek = dayofweek(flight_date)”. After that we will also need to change the application to support that additional column or use a trigger to update the column. Here is the trigger example:

CREATE DEFINER = CURRENT_USER
TRIGGER ontime_insert
BEFORE INSERT ON ontime_sm_triggers
FOR EACH ROW
SET
NEW.Flight_dayofweek = dayofweek(NEW.FlightDate);

One problem with the trigger is that it is slow. In my simple example it took almost 2x slower to “copy” the table using “insert into ontime_sm_copy select * from ontime_sm” when the trigger was on.

The Generated Columns from MySQL 5.7.7-labs-json version (only this version supports it on the time of writing) solves this problem. Here is the example which demonstrate its use:

CREATE TABLE `ontime_sm_virtual` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  `CRSElapsedTime` int(11) DEFAULT NULL,
  `ActualElapsedTime` int(11) DEFAULT NULL,
  `AirTime` int(11) DEFAULT NULL,
  `Flights` int(11) DEFAULT NULL,
  `Distance` int(11) DEFAULT NULL,
  `Flight_dayofweek` tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL,
  PRIMARY KEY (`id`),
  KEY `Flight_dayofweek` (`Flight_dayofweek`),
) ENGINE=InnoDB

Here we add Flight_dayofweek tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL column and index it.

Now MySQL can use this index:

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm_virtual  WHERE Flight_dayofweek = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek
          key: Flight_dayofweek
      key_len: 2
          ref: const
         rows: 165409
     filtered: 100.00
        Extra: Using where; Using temporary; Using filesort

To further increase performance of this query we want to add a combined index on (Flight_dayofweek, carrier) so MySQL will avoid creating temporary table. However it is not currently supported:

mysql> alter table ontime_sm_virtual
       add key comb(Flight_dayofweek, carrier);
ERROR 3105 (HY000): 'Virtual generated column combines with other columns to be indexed together' is not supported for generated columns.

We can add an index on 2 generated columns thou, which is good. So a trick here will be to create a “dummy” virtual column on “carrier” and index 2 of those columns:

mysql> alter table ontime_sm_virtual add Carrier_virtual char(2) GENERATED ALWAYS AS (Carrier) VIRTUAL;
Query OK, 0 rows affected (0.43 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, Carrier_virtual);
Query OK, 999999 rows affected (36.79 sec)
Records: 999999  Duplicates: 0  Warnings: 0
mysql> EXPLAIN SELECT Carrier_virtual, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by Carrier_virtual
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek,comb
          key: comb
      key_len: 2
          ref: const
         rows: 141223
     filtered: 100.00
        Extra: Using where; Using index

Now MySQL will use an index and completely avoid the filesort.

The last, but not the least: loading data to the table with generated columns is significantly faster compared to loading it into the same table with triggers:

mysql> insert into ontime_sm_triggers (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName,  OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;
Query OK, 999999 rows affected (27.86 sec)
Records: 999999  Duplicates: 0  Warnings: 0
mysql> insert into ontime_sm_virtual (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName,  OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;
Query OK, 999999 rows affected (16.29 sec)
Records: 999999  Duplicates: 0  Warnings: 0

Now the big disappointment: all operations with generated columns are not online right now.

mysql> alter table ontime_sm_virtual add Flight_year year GENERATED ALWAYS AS (year(FlightDate)) VIRTUAL, add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.
mysql> alter table ontime_sm_virtual add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.

I hope it will be fixed in the future releases.

Conclusion

Generated columns feature is very useful. Imagine an ability to add a column + index for any “logical” piece of data without actually duplicating the data. And this can be any function: date/time/calendar, text (extract(), reverse(), metaphone()) or anything else. I hope this feature will be available in MySQL 5.7 GA. Finally, I wish adding a generated column and index can be online (it is not right now).

More information:

The post Generated (Virtual) Columns in MySQL 5.7 (labs) appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com