Jun
01
2015
--

New PERFORMANCE_SCHEMA defaults in MySQL 5.7.7

I thought it was worth a moment to reiterate on the new Performance Schema related defaults that MySQL 5.7.7 brings to the table, for various reasons.

For one, most of you might have noticed that profiling was marked as deprecated in MySQL 5.6.7. So it is expected that you invest into learning more about Performance Schema (and Mark’s sys schema!).

Second, there are lots of virtual environments and appliances out there running Community Edition MySQL where Performance Schema can be a useful tool for analyzing performance. Thus, expect to see more articles about using PERFORMANCE_SCHEMA and SYS_SCHEMA from us!

Third, we have more and more junior readers who might benefit from light reads such as this. :)

The new defaults that I wanted to highlight are mentioned in the MySQL 5.7.7 release notes:
– The MySQL sys schema is now installed by default during data directory installation.
– The events_statements_history and events_transactions_history consumers now are enabled by default.

Note that if you are upgrading from an earlier version of MySQL to 5.7.7 to get these goodies you will need to run mysql_upgrade and restart the database for the above changes to take effect.

So what do these mean?

If you haven’t had a chance to dig into PERFORMANCE_SCHEMA, check out the quick start guide here. PERFORMANCE_SCHEMA is a nify tool (implemented as a union of a storage engine and a schema in MySQL) to monitor MySQL server execution at a low level, with a focus on performance metrics. It monitors for events that have been “instrumented”, such as function calls, OS wait times, synchronization calls, etc. With performance nomenclature “instruments” are essentially “probes”. The events that the instruments generate can be processed by consumers. Note that not all instruments or consumers are enabled by default.

Some would say that the structure of PERFORMANCE_SCHEMA may be complex and may not be very DBA-friendly. This is what led to the birth of SYS_SCHEMA. For those who are not familiar with Mark Leith’s SYS_SCHEMA and prefer TL;DR – it provides human friendly views, functions and procedures that can help you analyze database usage using PERFORMANCE_SCHEMA. If you haven’t had a chance to check it out yet you might want to read Miguel’s article on using the sys schema or Alexander Rubin’s article about using it in multitenant environments and give it a spin!

I welcome the fact that events_statements_history and events_transactions_history consumers are enabled by default in MySQL 5.7.7 as it means that we get some handy performance details available to us out of the box in vanilla MySQL. Note that these are per-thread tables and by default the history length (the length of the number of entries present; more on those variables here) is automatically sized, thus you may need to increase them.

What details do you get off the bat with them?

Consider the following example:

mysql> select * from performance_schema.events_statements_history where event_id=353G
*************************** 1. row ***************************
              THREAD_ID: 20
               EVENT_ID: 353
           END_EVENT_ID: 456
             EVENT_NAME: statement/sql/select
                 SOURCE: mysqld.cc:963
            TIMER_START: 1818042501405000
              TIMER_END: 1818043715449000
             TIMER_WAIT: 1214044000
              LOCK_TIME: 67000000
               SQL_TEXT: select * from imdb.title limit 100
                 DIGEST: ec93c38ab021107c2160259ddee31faa
            DIGEST_TEXT: SELECT * FROM `imdb` . `title` LIMIT ?
         CURRENT_SCHEMA: performance_schema
            OBJECT_TYPE: NULL
          OBJECT_SCHEMA: NULL
            OBJECT_NAME: NULL
  OBJECT_INSTANCE_BEGIN: NULL
            MYSQL_ERRNO: 0
      RETURNED_SQLSTATE: NULL
           MESSAGE_TEXT: NULL
                 ERRORS: 0
               WARNINGS: 0
          ROWS_AFFECTED: 0
              ROWS_SENT: 100
          ROWS_EXAMINED: 100
CREATED_TMP_DISK_TABLES: 0
     CREATED_TMP_TABLES: 0
       SELECT_FULL_JOIN: 0
 SELECT_FULL_RANGE_JOIN: 0
           SELECT_RANGE: 0
     SELECT_RANGE_CHECK: 0
            SELECT_SCAN: 1
      SORT_MERGE_PASSES: 0
             SORT_RANGE: 0
              SORT_ROWS: 0
              SORT_SCAN: 0
          NO_INDEX_USED: 1
     NO_GOOD_INDEX_USED: 0
       NESTING_EVENT_ID: NULL
     NESTING_EVENT_TYPE: NULL
1 row in set (0.00 sec)

As you can see from above you get similar data that you are used to seeing from EXPLAINs and slow query logs, such as query run time, locking time, rows sent/examined, etc. For instance, in above output my query obtained about a 100 rows (lines 26-27), avoided creating temp tables (lines 28-29) and didn’t have to sort (lines 36-38) and no index was used (line 39) and it ran for about 121 ms (TIMER_END-TIMER_START). The list of details provided is not as abundant as it could be but I imagine that with newer releases the list may grow.

If you want to read on and are curious about how to use Performance Schema for profiling check out Jervin’s great article here!

The post New PERFORMANCE_SCHEMA defaults in MySQL 5.7.7 appeared first on MySQL Performance Blog.

Apr
29
2015
--

Optimizer hints in MySQL 5.7.7 – The missed manual

In version MySQL 5.7.7 Oracle presented a new promising feature: optimizer hints. However it did not publish any documentation about the hints. The only note which I found in the user manual about the hints is:

  • It is now possible to provide hints to the optimizer by including /*+ … */ comments following the SELECT, INSERT, REPLACE, UPDATE, or DELETE keyword of SQL statements. Such statements can also be used with EXPLAIN. Examples:
    SELECT /*+ NO_RANGE_OPTIMIZATION(t3 PRIMARY, f2_idx) */ f1
    FROM t3 WHERE f1 > 30 AND f1 < 33;
    SELECT /*+ BKA(t1, t2) */ * FROM t1 INNER JOIN t2 WHERE ...;
    SELECT /*+ NO_ICP(t1) */ * FROM t1 WHERE ...;

There are also three worklogs: WL #3996, WL #8016 and WL #8017. But they describe the general concept and do not have much information about which optimizations can be used and how. More light on this provided by slide 59 from Øystein Grøvlen’s session at Percona Live. But that’s all: no “official” full list of possible optimizations, no use cases… nothing.

I tried to sort it out myself.

My first finding is the fact that slide #59 really lists six of seven possible index hints. Confirmation for this exists in one of two new files under sql directory of MySQL source tree, created for this new feature.

$cat sql/opt_hints.h
...
/**
  Hint types, MAX_HINT_ENUM should be always last.
  This enum should be synchronized with opt_hint_info
  array(see opt_hints.cc).
*/
enum opt_hints_enum
{
  BKA_HINT_ENUM= 0,
  BNL_HINT_ENUM,
  ICP_HINT_ENUM,
  MRR_HINT_ENUM,
  NO_RANGE_HINT_ENUM,
  MAX_EXEC_TIME_HINT_ENUM,
  QB_NAME_HINT_ENUM,
  MAX_HINT_ENUM
};

Looking into file sql/opt_hints.cc we can find out what these optimizations give not much choice: either enable or disable.

$cat sql/opt_hints.cc
...
struct st_opt_hint_info opt_hint_info[]=
{
  {"BKA", true, true},
  {"BNL", true, true},
  {"ICP", true, true},
  {"MRR", true, true},
  {"NO_RANGE_OPTIMIZATION", true, true},
  {"MAX_EXECUTION_TIME", false, false},
  {"QB_NAME", false, false},
  {0, 0, 0}
};

A choice for the way to include hints into SQL statements: inside comments with sign “+”

/*+ NO_RANGE_OPTIMIZATION(t3 PRIMARY, f2_idx) */

, – is compatible with style of optimizer hints which Oracle uses.

We actually had access to these hints before: they were accessible via variable optimizer_switch. At least such optimizations like BKA, BNL, ICP, MRR. But with new syntax we cannot only modify this access globally or per session, but can turn on or off particular optimization for a single table and column in the query. I can demonstrate it on this quite artificial but always accessible example:

mysql> use mysql
Database changed
mysql> explain select * from user where host in ('%', '127.0.0.1');
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | user  | NULL       | range | PRIMARY       | PRIMARY | 180     | NULL |    2 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.01 sec)
mysql> explain select /*+ NO_RANGE_OPTIMIZATION(user PRIMARY) */ * from user where host in ('%', '127.0.0.1');
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | user  | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL |    5 |    40.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

I used one more hint, which we could not turn on or off directly earlier: range optimization.

One more “intuitively” documented feature is the ability to turn on or off a particular optimization. This works only for BKA, BNL, ICP and MRR: you can specify NO_BKA(table[[, table]…]), NO_BNL(table[[, table]…]), NO_ICP(table indexes[[, table indexes]…]) and NO_MRR(table indexes[[, table indexes]…]) to avoid using these algorithms for particular table or index in the JOIN.

MAX_EXECUTION_TIME does not require any table or key name inside. Instead you need to specify maximum time in milliseconds which query should run:

mysql> select /*+ MAX_EXECUTION_TIME(1000) */  sleep(1) from user;
ERROR 3024 (HY000): Query execution was interrupted, max_statement_time exceeded
mysql> select /*+ MAX_EXECUTION_TIME(10000) */  sleep(1) from user;
+----------+
| sleep(1) |
+----------+
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
+----------+
5 rows in set (5.00 sec)

QB_NAME is more complicated. WL #8017 tells us this is custom context. But what is this? The answer is in the MySQL test suite! Tests for optimizer hints exist in file t/opt_hints.test For QB_NAME very first entry is query:

EXPLAIN SELECT /*+ NO_ICP(t3@qb1 f3_idx) */ f2 FROM
  (SELECT /*+ QB_NAME(QB1) */ f2, f3, f1 FROM t3 WHERE f1 > 2 AND f3 = 'poiu') AS TD
    WHERE TD.f1 > 2 AND TD.f3 = 'poiu';

So we can specify custom QB_NAME for any subquery and specify optimizer hint only for this context.

To conclude this quick overview I want to show a practical example of when query hints are really needed. Last week I worked on an issue where a customer upgraded from MySQL version 5.5 to 5.6 and found some of their queries started to work slower than before. I wrote an answer which could sound funny, but still remains correct: “One of the reasons for such behavior is optimizer  improvements. While they all are made for better performance, some queries – optimized for older versions – can start working slower than before.”

To demonstrate a public example of such a query I will use my favorite source of information: MySQL Community Bugs Database. In a search for Optimizer regression bugs that are still not fixed we can find bug #68919 demonstrating regression in case the MRR algorithm is used for queries with LIMIT. In run queries, shown in the bug report, we will see a huge difference:

mysql> SELECT * FROM t1 WHERE i1>=42 AND i2<=42 LIMIT 1;
+----+----+----+----+
| pk | i1 | i2 | i3 |
+----+----+----+----+
| 42 | 42 | 42 | 42 |
+----+----+----+----+
1 row in set (6.88 sec)
mysql> explain SELECT * FROM t1 WHERE i1>=42 AND i2<=42 LIMIT 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                            |
+----+-------------+-------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
|  1 | SIMPLE      | t1    | NULL       | range | idx           | idx  | 4       | NULL | 9999958 |    33.33 | Using index condition; Using MRR |
+----+-------------+-------+------------+-------+---------------+------+---------+------+---------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> SELECT /*+ NO_MRR(t1) */ *  FROM t1  WHERE i1>=42 AND i2<=42 LIMIT 1;
+----+----+----+----+
| pk | i1 | i2 | i3 |
+----+----+----+----+
| 42 | 42 | 42 | 42 |
+----+----+----+----+
1 row in set (0.00 sec)

With MRR query execution takes 6.88 seconds and 0 if MRR is not used! But the bug report itself suggests using

optimizer_switch="mrr=off";

as a workaround. And this will work perfectly well if you are OK to run

SET optimizer_switch="mrr=off";

every time you are running a query which will take advantage of having it OFF. With optimizer hints you can have one or another algorithm to be ON for particular table in the query and OFF for another one. I, again, took quite an artificial example, but it demonstrates the method:

mysql> explain select /*+ MRR(dept_emp) */ * from dept_emp where to_date in  (select /*+ NO_MRR(salaries)*/ to_date from salaries where salary >40000 and salary <45000) and emp_no >10100 and emp_no < 30200 and dept_no in ('d005', 'd006','d007');
+----+--------------+-------------+------------+--------+------------------------+------------+---------+----------------------------+---------+----------+-----------------------------------------------+
| id | select_type  | table       | partitions | type   | possible_keys          | key        | key_len | ref                        | rows    | filtered | Extra                                         |
+----+--------------+-------------+------------+--------+------------------------+------------+---------+----------------------------+---------+----------+-----------------------------------------------+
|  1 | SIMPLE       | dept_emp    | NULL       | range  | PRIMARY,emp_no,dept_no | dept_no    | 8       | NULL                       |   10578 |   100.00 | Using index condition; Using where; Using MRR |
|  1 | SIMPLE       | <subquery2> | NULL       | eq_ref | <auto_key>             | <auto_key> | 3       | employees.dept_emp.to_date |       1 |   100.00 | NULL                                          |
|  2 | MATERIALIZED | salaries    | NULL       | ALL    | salary                 | NULL       | NULL    | NULL                       | 2838533 |    17.88 | Using where                                   |
+----+--------------+-------------+------------+--------+------------------------+------------+---------+----------------------------+---------+----------+-----------------------------------------------+
3 rows in set, 1 warning (0.00 sec)

 

The post Optimizer hints in MySQL 5.7.7 – The missed manual appeared first on MySQL Performance Blog.

Apr
29
2015
--

Generated (Virtual) Columns in MySQL 5.7 (labs)

About 2 weeks ago Oracle published the MySQL 5.7.7-labs-json version which includes a very interesting feature called “Generated columns” (also know as Virtual or Computed columns). MariaDB has a similar feature as well: Virtual (Computed) Columns.

The idea is very simple: if we store a column

`FlightDate` date

in our table we may want to filter or group by year(FlightDate), month(FlightDate) or even dayofweek(FlightDate). The “brute-force” approach: use the above Date and Time MySQL functions in the query; however it will prevent MySQL from using an index (see below). Generated columns will allow you to declare a “Virtual”, non-stored column which is computed based on the existing field; you can then add index on that virtual column, so the query will use that index.

Here is the original example:

CREATE TABLE `ontime` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FlightDate` (`FlightDate`)
) ENGINE=InnoDB

Now I want to find all flights on Sundays (in 2013) and group by airline.

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm
	    WHERE dayofweek(FlightDate) = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 151253427
        Extra: Using where; Using temporary; Using filesort
Results:
32 rows in set (1 min 57.93 sec)

The problem here is: MySQL will not be able to use index when you use a function which will “extract” something from the column. The standard approach is to “materialize” the column:

ALTER TABLE ontime_sm ADD Flight_dayofweek tinyint NOT NULL;

Then we will need to load data into that by running “UPDATE ontime_sm SET Flight_dayofweek = dayofweek(flight_date)”. After that we will also need to change the application to support that additional column or use a trigger to update the column. Here is the trigger example:

CREATE DEFINER = CURRENT_USER
TRIGGER ontime_insert
BEFORE INSERT ON ontime_sm_triggers
FOR EACH ROW
SET
NEW.Flight_dayofweek = dayofweek(NEW.FlightDate);

One problem with the trigger is that it is slow. In my simple example it took almost 2x slower to “copy” the table using “insert into ontime_sm_copy select * from ontime_sm” when the trigger was on.

The Generated Columns from MySQL 5.7.7-labs-json version (only this version supports it on the time of writing) solves this problem. Here is the example which demonstrate its use:

CREATE TABLE `ontime_sm_virtual` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  `CRSElapsedTime` int(11) DEFAULT NULL,
  `ActualElapsedTime` int(11) DEFAULT NULL,
  `AirTime` int(11) DEFAULT NULL,
  `Flights` int(11) DEFAULT NULL,
  `Distance` int(11) DEFAULT NULL,
  `Flight_dayofweek` tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL,
  PRIMARY KEY (`id`),
  KEY `Flight_dayofweek` (`Flight_dayofweek`),
) ENGINE=InnoDB

Here we add Flight_dayofweek tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL column and index it.

Now MySQL can use this index:

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm_virtual  WHERE Flight_dayofweek = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek
          key: Flight_dayofweek
      key_len: 2
          ref: const
         rows: 165409
     filtered: 100.00
        Extra: Using where; Using temporary; Using filesort

To further increase performance of this query we want to add a combined index on (Flight_dayofweek, carrier) so MySQL will avoid creating temporary table. However it is not currently supported:

mysql> alter table ontime_sm_virtual
       add key comb(Flight_dayofweek, carrier);
ERROR 3105 (HY000): 'Virtual generated column combines with other columns to be indexed together' is not supported for generated columns.

We can add an index on 2 generated columns thou, which is good. So a trick here will be to create a “dummy” virtual column on “carrier” and index 2 of those columns:

mysql> alter table ontime_sm_virtual add Carrier_virtual char(2) GENERATED ALWAYS AS (Carrier) VIRTUAL;
Query OK, 0 rows affected (0.43 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, Carrier_virtual);
Query OK, 999999 rows affected (36.79 sec)
Records: 999999  Duplicates: 0  Warnings: 0
mysql> EXPLAIN SELECT Carrier_virtual, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by Carrier_virtual
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek,comb
          key: comb
      key_len: 2
          ref: const
         rows: 141223
     filtered: 100.00
        Extra: Using where; Using index

Now MySQL will use an index and completely avoid the filesort.

The last, but not the least: loading data to the table with generated columns is significantly faster compared to loading it into the same table with triggers:

mysql> insert into ontime_sm_triggers (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName,  OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;
Query OK, 999999 rows affected (27.86 sec)
Records: 999999  Duplicates: 0  Warnings: 0
mysql> insert into ontime_sm_virtual (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName,  OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;
Query OK, 999999 rows affected (16.29 sec)
Records: 999999  Duplicates: 0  Warnings: 0

Now the big disappointment: all operations with generated columns are not online right now.

mysql> alter table ontime_sm_virtual add Flight_year year GENERATED ALWAYS AS (year(FlightDate)) VIRTUAL, add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.
mysql> alter table ontime_sm_virtual add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.

I hope it will be fixed in the future releases.

Conclusion

Generated columns feature is very useful. Imagine an ability to add a column + index for any “logical” piece of data without actually duplicating the data. And this can be any function: date/time/calendar, text (extract(), reverse(), metaphone()) or anything else. I hope this feature will be available in MySQL 5.7 GA. Finally, I wish adding a generated column and index can be online (it is not right now).

More information:

The post Generated (Virtual) Columns in MySQL 5.7 (labs) appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com