May
13
2019
--

Solve Query Failures Regarding ONLY_FULL_GROUP_BY SQL Mode

Solve Query Failures SQL mode

“Hey, what’s going on with my applications? I installed a newer version of MySQL. I have queries that perfectly run with the older version and now I have a lot of errors.”

This is a question some customers have asked me after upgrading MySQL. In this article, we’ll see what one of the most frequent causes of this issue is, and how to solve it.

We are talking about this error:

ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause
and contains nonaggregated column 'test.web_log.user_id' which is not functionally
dependent on columns in GROUP BY clause; this is incompatible
with sql_mode=only_full_group_by

Have you ever seen it?

SQL_MODE

As the first thing let me introduce the concept of SQL_MODE.

MySQL can work using different SQL modes that affect the syntax of the queries and validation checks. Based on the configured value of the variable sql_mode, it means that a query can be valid and regularly executes or can receive a validation error and cannot be executed.

The oldest versions of MySQL got users accustomed to writing queries that were not semantically correct because it was designed to work in the “forgiving mode”. Users could write any kind of syntactically valid query regardless of SQL standard compliance or semantic rules. This was a bad habit that was corrected introducing the sql_mode to instruct MySQL to work in a more restrictive way for query validation.

Some users are not aware of this feature because the default value was not so restrictive. Starting from 5.7, the default value is more restrictive and this the reason why some users have problems with unexpected query failures after migration to 5.7 or 8.0.

The sql_mode variable can be set in the configuration file (/etc/my.cnf) or can be changed at runtime. The scope of the variable can be GLOBAL and SESSION, so it can change by the purpose of the mode for any single connection.

The sql_mode variable can have more values, separated by a comma, to control different behaviors. For example, you can instruct MySQL how to deal with dates with zeros as ‘0000-00-00’, to ensure the date be considered as valid or not. In the “forgiving mode” (or if sql_mode variable is empty) you can INSERT such a value without problems.

# set sql mode to "forgiving mode"
mysql> set session sql_mode='';
Query OK, 0 rows affected (0.00 sec)
mysql> create table t1( mydate date );
Query OK, 0 rows affected (0.05 sec)
mysql> insert into t1 values('0000-00-00');
Query OK, 1 row affected (0.00 sec)
mysql> select * from t1;
+------------+
| mydate     |
+------------+
| 0000-00-00 |
+------------+
1 row in set (0.00 sec)

But this is not the correct behavior as stated by the TRADITIONAL mode. As good programmers know, you have to validate dates into your source code in order to avoid to have incorrect data or incorrect results.

The following is how you can dynamically instruct MySQL to behave in the traditional mode to throw an error instead:

mysql> set session sql_mode='TRADITIONAL';
Query OK, 0 rows affected (0.00 sec)
mysql> insert into t1 values('0000-00-00');
ERROR 1292 (22007): Incorrect date value: '0000-00-00' for column 'mydate' at row 1

There are many other modes you can use. Covering all the modes is not the goal of the article, so please refer to the official documentation for more details and examples:

https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html

https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html

 

The ONLY_FULL_GROUP_BY issue

Let’s focus on the most frequent cause of errors when migrating to 5.7 or 8.0. As we said, 5.7 has a default SQL mode that is more restrictive than 5.6, and as such it’s for 8.0. This is true when you upgrade MySQL copying the old my.cnf file that doesn’t have a specific setting for the sql_mode variable. So, be aware.

Let’s create a sample table to store the clicks on the webpages of our site. We would like to log the page name and the id of the registered user.

mysql> create table web_log ( id int auto_increment primary key, page_url varchar(100), user_id int, ts timestamp);
Query OK, 0 rows affected (0.03 sec)
mysql> insert into web_log(page_url,user_id,ts) values('/index.html',1,'2019-04-17 12:21:32'),
    -> ('/index.html',2,'2019-04-17 12:21:35'),('/news.php',1,'2019-04-17 12:22:11'),('/store_offers.php',3,'2019-04-17 12:22:41'),
    -> ('/store_offers.php',2,'2019-04-17 12:23:04'),('/faq.html',1,'2019-04-17 12:23:22'),('/index.html',3,'2019-04-17 12:32:25'),
    -> ('/news.php',2,'2019-04-17 12:32:38');
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0
mysql> select * from web_log;
+----+--------------------+---------+---------------------+
| id | page_url           | user_id | ts                  |
+----+--------------------+---------+---------------------+
|  1 | /index.html        |       1 | 2019-04-17 12:21:32 |
|  2 | /index.html        |       2 | 2019-04-17 12:21:35 |
|  3 | /news.php          |       1 | 2019-04-17 12:22:11 |
|  4 | /store_offers.php  |       3 | 2019-04-17 12:22:41 |
|  5 | /store_offers.html |       2 | 2019-04-17 12:23:04 |
|  6 | /faq.html          |       1 | 2019-04-17 12:23:22 |
|  7 | /index.html        |       3 | 2019-04-17 12:32:25 |
|  8 | /news.php          |       2 | 2019-04-17 12:32:38 |
+----+--------------------+---------+---------------------+

 

Now we want to issue a query to calculate the most visited pages.

# let's turn the sql mode to "forgiving"
mysql> set session sql_mode='';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT page_url, user_id, COUNT(*) AS visits
    -> FROM web_log
    -> GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+---------+--------+
| page_url          | user_id | visits |
+-------------------+---------+--------+
| /index.html       |       1 |      3 |
| /news.php         |       1 |      2 |
| /store_offers.php |       3 |      2 |
| /faq.html         |       1 |      1 |
+-------------------+---------+--------+
4 rows in set (0.00 sec)

The query works, but it’s not really correct. It is easily understandable that page_url is the column of the grouping function, the value we are most interested in and we want to be unique for counting. Also, the visits column is good, as it’s the counter. But what about user_id? What does this column represent? We grouped on the page_url so the value returned for user_id is just one of the values in the group. In fact, it was not only user number1 to visit the index.html, but even users 2 and 3 visited the page. How can I consider that value? Is it the first visitor? Is it the last one?

We don’t know the right answer! We should consider the user_id column’s value as a random item of the group.

Anyway, the right answer is that the query is not semantically correct, because it has no meaning to return a value from a column that is not part of the grouping function. Then the query is expected to be invalid in the traditional sql.

Let’s test it.

mysql> SET SESSION sql_mode='ONLY_FULL_GROUP_BY';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT page_url, user_id, COUNT(*) AS visits
    -> FROM web_log
    -> GROUP BY page_url ORDER BY COUNT(*) DESC;
ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause
and contains nonaggregated column 'test.web_log.user_id' which is not functionally
dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

Now we have an error, as expected.

The SQL mode ONLY_FULL_GROUP_BY is part of the TRADITIONAL mode and it is enabled by default starting from 5.7.

A lot of customers had this kind of issue after migration to a recent version of MySQL.

Now we know what the cause of the issue is, but our applications are still not working. What possible solutions do we have to let the applications work again?

Solution 1 – rewrite the query

Since it’s not correct to select a column that is not part of the grouping, we can rewrite the query without those columns. Very simple.

mysql> SELECT page_url, COUNT(*) AS visits
    -> FROM web_log
    -> GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+--------+
| page_url          | visits |
+-------------------+--------+
| /index.html       |      3 |
| /news.php         |      2 |
| /store_offers.php |      2 |
| /faq.html         |      1 |
+-------------------+--------+

If you have a lot of queries affected by the problem, you have to potentially do a lot of work to retrieve and rewrite them. Or maybe the queries can be part of a legacy application you are not able or you don’t want to touch.

But this solution is the one that forces you to write correct queries and let your database configuration be restrictive in term of SQL validation.

 

Solution 2 – step back to the forgiving mode

You can change MySQL’s configuration and step back to the “forgiving” mode.

Or you can only drop the ONLY_FULL_GROUP_BY from the default. The default SQL mode in MySQL 5.7 includes these modes: ONLY_FULL_GROUP_BY, STRINCT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER

#set the complete "forgiving" mode
mysql> SET GLOBAL sql_mode='';
# alternatively you can set sql mode to the following
mysql> SET GLOBAL sql_mode='STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_ENGINE_SUBSTITUTION';

 

Solution 3 – use of aggregation functions

If your application absolutely needs to retrieve the user_id field for some valid reason, or it’s too complicated to change your source code,  you can rely on an aggregation function in order to avoid changing the sql mode configuration.

For example we can use MAX(), MIN() or even GROUP_CONCAT() aggregation functions.

mysql> SET SESSION sql_mode='ONLY_FULL_GROUP_BY';
mysql> SELECT page_url, MAX(user_id), COUNT(*) AS visits FROM web_log GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+--------------+--------+
| page_url          | MAX(user_id) | visits |
+-------------------+--------------+--------+
| /index.html       |            3 |      3 |
| /news.php         |            2 |      2 |
| /store_offers.php |            3 |      2 |
| /faq.html         |            1 |      1 |
+-------------------+--------------+--------+
mysql> SELECT page_url, GROUP_CONCAT(user_id), COUNT(*) AS visits FROM web_log GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+-----------------------+--------+
| page_url          | GROUP_CONCAT(user_id) | visits |
+-------------------+-----------------------+--------+
| /index.html       |                 1,2,3 |      3 |
| /news.php         |                   1,2 |      2 |
| /store_offers.php |                   3,2 |      2 |
| /faq.html         |                     1 |      1 |
+-------------------+-----------------------+--------+

 

MySQL provides even a specific function for solving the problem: ANY_VALUE().

mysql> SELECT page_url, ANY_VALUE(user_id), COUNT(*) AS visits FROM web_log GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+--------------------+--------+
| page_url          | ANY_VALUE(user_id) | visits |
+-------------------+--------------------+--------+
| /index.html       |                  1 |      3 |
| /news.php         |                  1 |      2 |
| /store_offers.php |                  3 |      2 |
| /faq.html         |                  1 |      1 |
+-------------------+--------------------+--------+

 

Conclusion

I personally prefer solution number 1 because it forces you to write SQL-92 compliant queries. Following the standards is often considered a best practice.

Solution 2 is good in case you cannot change your application code or if rewriting all the queries is really too complicated. The solution is very good to solve the issues in a matter of seconds, however, I strongly suggest to have a long term plan to rewrite the queries that are not SQL-92 compliant.

For more details: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html

Photo of sign by Ken Treloar on Unsplash

Apr
16
2018
--

Binlog and Replication Improvements in Percona Server for MySQL

Percona Server for MySQL

Percona Server for MySQLDue to continuous development and improvement, Percona Server for MySQL incorporates a number of improvements related to binary log handling and replication. This results in replication specifics, distinguishing it from MySQL Server.

Temporary tables and mixed logging format

Summary of the fix:

As soon as some statement involving temporary tables was met when using a mixed binlog format, MySQL switched to row-based logging for all statements until the end of the session (or until all temporary tables used in the session were dropped). This is inconvenient when you have long-lasting connections, including replication-related ones. Percona Server for MySQL fixes the situation by switching between statement-based and row-based logging when necessary.

Details:

The new mixed binary logging format, supported by Percona Server for MySQL, means that the server runs in statement-based logging by default, but switches to row-based logging when replication would be unpredictable. For example, in the case of a nondeterministic SQL statement that could cause data divergence if reproduced on a slave server. The switch is done when matching any condition from a long list, and one of these conditions is the use of temporary tables.

Temporary tables are never logged using row-based format, but any statement that touches a temporary table is logged in row mode. This way, we intercept all the side effects that temporary tables can produce on non-temporary ones.

There is no need to use the row logging format for any other statements, solely because of the temp table presence. However, MySQL undertook such an excessive precaution: once some statement with a temporary table had appeared and the row-based logging was used, MySQL was logging unconditionally put all subsequent statements in row format.

Percona Server for MySQL has implemented more accurate behavior. Instead of switching to row-based logging until the last temporary table is closed, the usual rules of row vs. statement format apply, and we don’t consider the presence of currently opened temporary tables. This change was introduced with the fix of bug #151 (upstream #72475).

Temporary table drops and binloging on GTID-enabled server

Summary of the fix:

MySQL logs DROP statements for all temporary tables regardless of the logging mode under which these tables were created. This produces binlog writes and errand GTIDs on slaves with row and mixed logging. Percona Server for MySQL fixes this by tracking the binlog format at temporary table create time and uses it to decide whether a DROP should be logged or not.

Details:

Even with read_only mode enabled, the server permits some operations, including ones with temporary tables. With the previous fix, temporary table operations are not binlogged in row- or mixed-mode. But MySQL server doesn’t track what the logging mode was when a temporary table was created, and therefore unconditionally logs DROP statements for all temporary tables. These DROP statements receive IF EXISTS addition, which is intended to make them harmless.

Percona Server for MySQL has fixed this with the bug fixes #964, upstream #83003, and upstream #85258. Moreover, with all the binlogging fixes discussed so far nothing involving temporary tables is logged to the binary log in row or mixed format. There is no need to consider CREATE/DROP TEMPORARY TABLE unsafe for use in stored functions, triggers and multi-statement transactions in row/mixed format. Therefore, we introduced an additional fix to mark the creation and drop of temporary tables as unsafe inside transactions in statement-based replication only (the fixed bug is #1816, while the correspondent upstream one is #89467 and it is still open).

Safety of statements with a LIMIT clause

Summary of the fix:

MySQL Server considers all UPDATE/DELETE/INSERT ... SELECT statements with the LIMIT clause unsafe, no matter if they are really producing non-deterministic results or not. Percona Server for MySQL is more accurate because it acknowledges such instructions as safe when they include ORDER BY PK or WHERE condition.

Details:

MySQL Server treats UPDATE/DELETE/INSERT ... SELECT statements with the LIMIT clause as unsafe, considering that they produce an unpredictable number of rows. But some such statements can still produce an absolutely predictable result. One such deterministic case takes place when a statement with the LIMIT clause has an ORDER BY PK or WHERE condition.

The patch, making updates and deletes with a limit to be supposed as safe if they have an ORDER BY pk_column clause, was initially provided on the upstream bug report and incorporated later into Percona Server for MySQL with additional improvements. Bug fixed #44 (upstream #42415).

Performance improvements

There are also two modifications in Percona Server related to multi-source replication that improve performance on slaves.

The first improvement is about relay log position, which was always updated in multi-source replications setups regardless of whether the committed transaction has already been executed or not. Percona Server omits relay log position updates for the already logged GTIDs.

These unconditional relay log position updates caused additional fsync operations in the case of relay-log-info-repository=TABLE. With the higher number of channels transmitting such duplicate (already executed) transactions, the situation became proportionally worse. The problem was solved in Percona Server 5.7.18-14.  Bug fixed  #1786 (upstream #85141).

The second improvement decreases the load on slave nodes configured to update the master status and connection information only on log file rotation. MySQL additionally updated this information in the case of multi-source replication when a slave had to skip the already executed GTID event. This behavior was the cause of substantially higher write loads on slaves and lower replication throughput.

The configuration with master_info_repository=TABLE and sync_master_info=0  makes the slave update the master status and connection information in this table on log file rotation and not after each sync_master_info event, but it didn’t work on multi-source replication setups. Heartbeats sent to the slave to skip GTID events that it had already executed previously were evaluated as relay log rotation events and reacted with mysql.slave_master_info table sync. This inaccuracy could produce a huge (up to five times on some setups) increase in write load on the slave, before this problem was fixed in Percona Server for MySQL 5.7.20-19. Bug fixed  #1812 (upstream #85158).

Current status of fixes

The three issues related to temporary tables that were fixed in Percona Server 5.5 and contributed upstream, and the final fixes of the bugs #72475, #83003, and #85258, have landed into MySQL Server 8.0.4.

The post Binlog and Replication Improvements in Percona Server for MySQL appeared first on Percona Database Performance Blog.

Oct
09
2017
--

MySQL and MariaDB Default Configuration Differences

MySQL and MariaDB Default Configuration

MySQL and MariaDB Default ConfigurationIn this blog post, I’ll discuss some of the MySQL and MariaDB default configuration differences, focusing on MySQL 5.7 and MariaDB 10.2.

MariaDB Server is a general purpose open source database, created by the founders of MySQL. MariaDB Server (referred to as MariaDB for brevity) has similar roots as Percona Server for MySQL, but is quickly diverging from MySQL compatibility and growing on its own. MariaDB has become the default installation for several operating systems (such as Red Hat Enterprise Linux/CentOS/Fedora). Changes in the default variables can make a large difference in the out-of-box performance of the database, so knowing what is different is important.

As MariaDB grows on its own and doesn’t remain 100% compatible with MySQL, the defaults configuration settings might not mean everything or behave the way they used to. It might use different variable names, or implement the same variables in new ways. You also need to take into account that MariaDB uses it’s own Aria storage engine that has many configuration options that do not exist in MySQL.

Note: In this blog, I am looking at variables common to both MySQL or MariaDB, but have different defaults, not variables that are specific to either MySQL or MariaDB (except for the different switches inside the optimizer_switch).

Binary Logs

Variable MariaDB Default MySQL Default
sync_binlog 0 1
binlog_format Mixed Row

 

MySQL has taken a more conservative stance when it comes to the binary log. In the newest versions of MySQL 5.7, they have updated two variables to help ensure all committed data remains intact and identical. Binlog_format was updated to row in MySQL in order to prevent non-deterministic statements from having different results on the slave. Row-based replication also helps when performing a lot of smaller updates. MariaDB defaults to the Mixed format. Mixed uses statement-based format unless certain criteria are met. It hat case, it uses the row format. You can see the detailed criteria for when the row format is used here: https://mariadb.com/kb/en/the-mariadb-library/binary-log-formats/.

The other difference that can cause a significant impact on performance is related to sync_binlog. Sync_binlog controls the number of commit groups to collect before synchronizing the binary log to disk. MySQL has changed this to 1, which means that every transaction is flushed to disk before it is committed. This guarantees that there can never be a committed transaction that is not recorded (even during a system failure). This can create a big impact to performance, as shown by a Roel Van de Paar in his blog: https://www.percona.com/blog/2016/06/03/binary-logs-make-mysql-5-7-slower-than-5-6/

MariaDB utilizes a value of 0 for sync_binlog, which allows the operating system to determine when the binlog needs to be flushed. This provides better performance, but adds the risk that if MariaDB crashes (or power is lost) that some data may be lost.

MyISAM

Variable MariaDB Default MySQL Default
myisam_recover_options BACKUP,QUICK OFF
key_buffer_size 134217728 8388608

 

InnoDB replaced MyISAM as the default storage engine for some time now, but it is still used for many system tables. MySQL has tuned down the MyISAM settings, since it is not heavily used.

When mysqld opens a table, it checks whether the table is marked as crashed, or was not closed properly, and runs a check on it based on the myisam_recover_options settings. MySQL disables this by default, preventing recovery. MariaDB has enabled the BACKUP and QUICK recovery options. BACKUP causes a table_name-datetime.bak file to be created whenever a data file is changed during recovery. QUICK causes mysqld to not check the rows in a table if there are no delete blocks, ensuring recovery can occur faster.

MariaDB 10.2 increased the key_buffer_size. This allows for more index blocks to be stored in memory. All threads use this buffer, so a small buffer can cause information to get moved in and out of it more quickly. MariaDB 10.2 uses a buffer 16 times the size of MySQL 5.7: 134217728 in MariaDB 10.2 vsx 8388608 in MySQL 5.7.

Innodb

Variable MariaDB Default MySQL Default
innodb_max_undo_log_size 10485760(10 MiB) 1073741824(1024 MiB)

 

InnoDB variables have remained primarily unchanged between MariaDB 10.2 and MySQL 5.7. MariaDB has reduced the innodb_max_undo_log_size starting in 10.2.6. This was reduced from MySQL’s default of 1073741824(1024 MiB) to 10485760(10 MiB). These sizes reflect the maximum size an undo tablespace can become before it is marked for truncation. The tablespace doesn’t get truncated unless innodb_undo_log_truncate is enabled, and it is disabled in MySQL 5.7 and MariaDB 10.2 by default.

Logging

Variable MariaDB Default MySQL Default
log_error /var/log/mysqld.log
log_slow_admin_statements ON OFF
log_slow_slave_statements ON OFF
lc_messages_dir /usr/share/mysql

 

Logs are extremely important for troubleshooting any issues so the different choices in logging for MySQL 5.7 and MariaDB 10.2 are very interesting.

The log_error variable allows you to control where errors get logged. MariaDB 10.2 leaves this variable blank, writing all errors to stderr. MySQL 5.7 uses an explicitly created file at: /var/log/mysqld.log.

MariaDB 10.2 has also enabled additional slow statement logging. Log_slow_admin_statements create a record for any administrative statements that are not typically written to the binlog. Log_slow_slave_statements log the replicated statements sent from the master, if they are slow to complete. MySQL 5.7 does not enable logging of these statements by default.

Lc_messages_dir is the directory that contains the error message files for various languages. The variable defaults might be a little misleading in MariaDB 10.2. Lc_messages_dir is left empty by default, although it still uses the same path as MySQL 5.7. The files are located in /usr/share/mysql by default for both databases.

Performance Schema

Variable MariaDB Default MySQL Default
performance_schema OFF ON
performance_schema_setup_actors_size 100 -1 (auto adjusted)
performance_schema_setup_objects_size 100 -1 (auto adjusted)

 

The performance schema is an instrumentation tool that is designed to help troubleshoot various performance concerns. MySQL 5.7 enables the performance schema, and many of its instruments, by default. MySQL even goes so far as to detect the appropriate value for many Performance Schema variables instead of setting a static default. The Performance Schema does come with some overhead, and there are many blogs regarding how much this can impact performance. I think Sveta Smirnova said it best in her blog  Performance Schema Benchmarks OLTP RW: “…test on your system! No generic benchmark can exactly repeat a workload on your site.

MariaDB has disabled the Performance Schema by default, as well as adjusted a couple of the dynamic variables. Note that if you wish to disable or enable the Performance Schema, it requires a restart of the server since these variables are not dynamic. Performance_schema_setup_actors_size and performance_schema_setup_objects_size have both been set to a static 100, instead of the dynamic -1 used in MySQL 5.7. These both limit the number of rows that can be stored in relative tables. This creates a hard limit to the size these tables can grow to, helping to reduce their data footprint.

SSL/TLS

Variable MariaDB Default MySQL Default
ssl_ca ca.pem
ssl_cert server-cert.pem
ssl_key server-key.pem

 

Secure Sockets Layer (SSL) and Transport Layer Security (TLS) are cryptographic protocols that allow for secure communication. SSL is actually the predecessor of TLS, although both are often referred to as SSL. MySQL 5.7 and MariaDB 10.2 support both yaSSL and OpenSSL. The default configurations for SSL/TLS differ only slightly between MySQL 5.7 and MariaDB 10.2. MySQL 5.7 sets a specific file name for ssl_ca, ssl_cert, and ssl_key. These files are created in the base directory, identified by the variable basedir. Each of these variables is left blank in MariaDB 10.2, so you need to set them before using secure connections. These variables are not dynamic, so be sure to set the values before starting your database.

Query Optimizer

MariaDB 10.2 MySQL 5.7 Optimization Meaning Switch
N/A OFF Batched Key Access Controls use of BKA join algorithm batched_key_access
N/A ON Block Nested-Loop Controls use of BNL join algorithm block_nested_loop
N/A ON Condition Filtering Controls use of condition filtering condition_fanout_filter
Deprecated ON Engine Condition Pushdown Controls engine condition pushdown engine_condition_pushdown
ON N/A Engine Condition Pushdown Controls ability to push conditions down into non-mergeable views and derived tables condition_pushdown_for_derived
ON N/A Exists Subquery Allows conversion of in statements to exists statements exists_to_in
ON N/A Exists Subquery Allows conversion of exists statements to in statements in_to_exists
N/A ON Index Extensions Controls use of index extensions use_index_extensions
OFF N/A Index Merge Allows index_merge for non-equality conditions index_merge_sort_intersection
ON N/A Join Algorithms Perform index lookups for a batch of records from the join buffer join_cache_bka
ON N/A Join Algorithms Controls use of BNLH and BKAH algorithms join_cache_hashed
ON N/A Join Algorithms Controls use of incremental algorithms join_cache_incremental
ON N/A Join Algorithms Controls use of block-based algorithms for outer joins outer_join_with_cache
ON N/A Join Algorithms Controls block-based algorithms for use with semi-join operations semijoin_with_cache
OFF N/A Join Buffer Creates the join buffer with an estimated size based on the estimated number of rows in the result optimize_join_buffer_size
ON N/A Materialized Temporary Tables Allows index creation on derived temporary tables derived_keys
ON N/A Materialized Temporary Tables Controls use of the rowid-merge strategy partial_match_rowid_merge
ON N/A Materialized Temporary Tables Controls use of the partial_match_table-scan strategy partial_match_table_scan
OFF ON Multi-Range Read Controls use of the multi-range read strategy mrr
OFF ON Multi-Range Read Controls use of cost-based MRR, if mrr=on mrr_cost_based
OFF N/A Multi-Range Read Enables key ordered scans if mrr=on mrr_sort_keys
ON N/A Order By Considers multiple equalities when ordering results ordery_uses_equalities
ON N/A Query Plan Allows the optimizer to use hidden components of InnoDB keys extended_keys
ON N/A Query Plan Controls the removal of irrelevant tables from the execution plan table_elimination
ON N/A Subquery Stores subquery results and correlation parameters for reuse subquery_cache
N/A ON Subquery Materialization Controls us of cost-based materialization ubquery_materialization_cost_based
N/A ON Subquery Materialization &

Semi-join

Controls the semi-join duplicate weedout strategy duplicateweedout

 

The query optimizer has several variances that not only affect query performance but also how you write SQL statements. The query optimizer is substantially different between MariaDB and MySQL, so even with identical configurations you are likely to see varying performance.

The sql_mode puts restrictions on how you can write queries. MySQL 5.7 has several additional restrictions compared to MariaDB 10.2. Only_full_group_by requires that all fields in any select…group by statement are either aggregated or inside the group by clause. The optimizer doesn’t assume anything regarding the grouping, so you must specify it explicitly.

No_zero_date, and no_zero_in_date both affect how the server interprets 0’s in dates. When no_zero_date is enabled, values of ‘0000-00-00’ are permitted but produce a warning. With strict mode enabled, then the value is not permitted and produces an error. No_zero_in_date is similar, except it applies to any section of the date(month, day, or year). With this disabled, dates with 0 parts, such as ‘2017-00-16’ are allowed as is. When enabled, the date is changed to ‘0000-00-00’ without warning. Strict mode prevents the date being inserted, unless ignore is provided as well. “INSERT IGNORE” and “UPDATE IGNORE” inserts the dates as ‘0000-00-00’. 5.7.4 changed this. No_zero_in_date was consolidated with strict mode, and the explicit option is deprecated.

The query_prealloc_size determines the size of the persistent buffer used for statement parsing and execution. If you regularly use complex queries, it can be useful to increase the size of this buffer, as it does not need to allocate additional memory during the query parsing. MySQL 5.7 has set this buffer to 8192, with a block size of 1024. MariaDB increased this value in 10.1.2 up to 24576.

Query_alloc_block_size dictates the size in bytes of any extra blocks allocated during query parsing. If memory fragmentation is a common problem, you might want to look at increasing this value. MySQL 5.7 uses 8192, while MariaDB 10.2 uses 16384 (twice that). Be careful when adjusting the block sizes: going too high consumes more than the needed amount of memory, and too low causes significant fragmentation.

The optimizer_switch variable contains many different switches that impact how the query optimizer plans and performs different queries. MariaDB 10.2 and MySQL 5.7 have many differences in their enabled options, and even the available options. You can see a brief breakdown of each of the options below. Any options with N/A is not supported in that server.

Miscellaneous

Variable MariaDB Default MySQL Default
default_tmp_storage_engine NULL InnoDB
group_concat_max_len 1048576(1M) 1024(1K)
Lock_wait_timeout 86400 (1 DAY) 31536000 (1 YEAR)
Max_allowed_packet (16777216) 16MB 4194304 (4MB)
Max_write_lock_count 4294967295 18446744073709551615
Old_passwords OFF 0
Open_files_limit 0 dependent on OS
pid_file /var/lib/mysql/ /var/run/mysqld/
secure_file_priv Varies by installation
sort_buffer_size 2097152 262144
table_definition_cache 400 autosized
table_open_cache_instances 8 16
thread_cache_size autosized autosized
thread_stack 292KB 192KB/256KB

 

There are many variables that do not fit well into a group. I will go over those here.

When creating temporary tables, if you do not specify a storage engine then a default is used. In MySQL 5.7 this is set to InnoDB, the same as the default_storage_engine. MariaDB 10.2 also uses InnoDB, but it is not explicitly set. MariaDB sets the default_tmp_storage_engine to NULL, which causes it to use the default_storage_engine. This is important to remember if you change your default storage engine, as it would also change the default for temporary tables. An Important note, in MariaDB this is only relevant to tables created with “CREATE TEMPORARY TABLE”. Internal in-memory temporary tables use the memory storage engine, and internal, on-disk temporary tables use the aria engine by default.

The Group_concat function can cause some very large results if left unchecked. You can restrict the maximum size of results from this function with group_concat_max_len. MySQL 5.7 limits this to 1024(1K). MariaDB increased the value in 10.2.4 up to 1048576(1M).

Lock_wait_timeout controls how long a thread waits as it attempts to acquire a metadata lock. Several statements require a metadata lock, including DDL and DML operations, Lock Tables, Flush Tables with Read Lock and Handler statements. MySQL 5.7 defaults to the maximum possible value (one year), while MariaDB 10.2 has toned this down to one day.

Max_allowed_packet sets a limit to the maximum size of a packet, or a generated/intermediate string. This value is intentionally kept small (4MB) on MySQL 5.7 in order to detect the larger, intentionally incorrect packets. MariaDB has increased this value to 16MB. If using any large BLOB fields, you need to adjust this value to the size of the largest BLOB, in multiples of 1024, or you risk running into errors transferring the results.

Max_write_lock_count controls the number of write locks that can be given before some read lock requests being processed. In extremely heavy write loads your reads can pile up while waiting for the writes to complete. Modifying the max_write_lock_count allows you to tune how many writes can occur before some reads are allowed against the table. MySQL 5.7 keeps this value at the maximum (18446744073709551615), while MariaDB 10.2 lowered this to 4294967295. One thing to note is that this is still the maximum value on MariaDB 10.2.

Old_passwords controls the hashing method used by the password function, create user and grant statements. This variable has undergone several changes in MySQL 5.7. As of 5.7.4 the valid options were MySQL 4.1 native hashing, Pre-4.1 (old) hashing, and SHA-256 hashing. Version 5.7.5 removed the “old” Pre-4.1 method, and in 5.7.6 the variable has been deprecated with the intent of removing it entirely. MariaDB 10.2 uses a simple boolean value for this variable instead of the enumerated one in MySQL 5.7, though the intent is the same. Both default the old_passwords to OFF, or 0, and allow you to enable the older method if necessary.

Open_files_limit restricts the number of file descriptors mysqld can reserve. If set to 0 (the default in MariaDB 10.2) then mysqld reserves max_connections * 5 or max_connections + table_open_cache * 2, whichever is larger. It should be noted that mysqld cannot use an amount larger than the hard limit imposed by the operating system. MySQL 5.7 is also restricted by the operating systems hard limit, but is set at runtime to the real value permitted by the system (not a calculated value).

The pid_file allows you to control where you store the process id file. This isn’t a file you typically need, but it is good to know where it is located in case some unusual errors occur. On MariaDB you can find this inside /var/lib/mysql/, while on MySQL 5.7 you will find it inside /var/run/mysqld/. You will also notice a difference in the actual name of the file. MariaDB 10.2 uses the hostname as the name of the pid, while MySQL 5.7 simply uses the process name (mysqld.pid).

Secure_file_priv is a security feature that allows you to restrict the location of files used in data import and export operations. When this variable is empty, which was the default in MySQL before 5.7.6, there is no restriction. If the value is set to NULL, import and export operations are not permitted. The only other valid value is the directory path where files can be imported from or exported to. MariaDB 10.2 defaults to empty. As of MySQL 5.7.6, the default will depend on the install_layout CMAKE option.

INSTALL_LAYOUT DEFAULT VALUE
STANDALONE,WIN NULL(>=MySQL 5.7.16_,empty(<MySQL 5.7.16)
DEB,RPM,SLES,SVR4 /var/lib/mysql-files
Other Mysql-files under the CMAKE_INSTALL_PREFIX value

 

Mysqld uses a sort buffer regardless of storage engine. Every session that must perform a sort allocates a buffer equal to the value of sort_buffer_size. This buffer should at minimum be large enough to contain 15 tuples. In MySQL 5.7, this defaults to 262144, while MariaDB 10.2 uses the larger value 2097152.

The table_definition_cache restricts the number of table definitions that can be cached. If you have a large number of tables, mysqld may have to read the .frm file to get this information. MySQL 5.7 auto detects the appropriate size to use, while MariaDB 10.2 defaults this value to 400. On my small test VM, MySQL 5.7 chose a value of 1400.

The table_open_cache_instances vary in implementation between MySQL and MariaDB. MySQL 5.7 creates multiple instances of the table_open_cache, each holding a portion of the tables. This helps reduce contention, as a session needs to lock only one instance of the cache for DML statements. In MySQL 5.7.7 the default was a single instance, but this was changed in MySQL 5.7.8 (increased to 16). MariaDB has a more dynamic approach to the table_open_cache. Initially there is only a single instance of the cache, and the table_open_cache_instances variable is the maximum number of instances that can be created. If contention is detected on the single cache, another instance is created and an error logged. MariaDB 10.2 suspects that the maximum eight instances it sets by default should support up to 100 CPU cores.

The thread_cache_size controls when a new thread is created. When a client disconnects the thread is stored in the cache, as long as the maximum number of threads do not exist. Although this is not typically noticeable, if your server sees hundreds of connections per second you should increase this value to so that new connections can use the cache. Thread_cache_size is an automatically detected variable in both MySQL 5.7 and MariaDB 10.2, but their methods to calculate the default vary significantly. MySQL uses a formula, with a maximum of 100: 8+ (max_connections / 100). MariaDB 10.2 uses the smaller value out of 256 or the max_connections size.

The thread_stack is the stack size for each thread. If the stack size is too small, it limits the complexity of SQL statements, the recursion depth of stored procedures and other memory-consuming actions. MySQL 5.7 defaults the stack size to 192KB on 32-bit platforms and 256KB on 64-bit systems. MariaDB 10.2 adjusted this value several times. MariaDB 10.2.0 used 290KB, 10.2.1 used 291KB and 10.2.5 used 292KB.

Conclusion

Hopefully, this helps you with the configurations options between MySQL and MariaDB. Use the comments for any questions.

Aug
11
2017
--

Learning MySQL 5.7: Q & A

MySQL 5.7

MySQL 5.7In this post I’ll answer questions I received in my Wednesday, July 19, 2017, webinar Learning MySQL 5.7!

First, thank you all who attended the webinar. The link to the slides and the webinar recording can be found here.

I received a number of interesting questions in the webinar that I’ve followed up with below.

Would there be a big difference on passing from 5.1 to 5.6 before going to 5.7 or, at this point, would it be roughly the same?

The biggest risk of jumping between versions, in this case 5.1 to 5.6, is reverting in case of problems. Rollbacks don’t happen often, but they do happen and you have to make sure you have the infrastructure in place whenever you decide to execute. These upgrade steps are not officially supported by Oracle nor even recommended here at Percona. Having said that, as long as your tests (checksums, pt-upgrade) and rollback plan works, this shouldn’t be a problem.

One unforgettable issue I have personally encountered is an upgrade from 5.1 via dump and reload to 5.6. The 5.6 version ran with ROW binlog format preventing replication back to 5.1 because of the limitation with the TIMESTAMP columns. Similarly, downgrading without replication means you have to deal with changes to the MySQL system schema, which obviously require some form of downtime.

Additionally, replication from 5.7 to 5.5 will not work because of the additional metadata information that 5.7 creates (i.e., GTID even when GTID is disabled).

After in-place upgrade a Percona XtraDB Cluster from 5.5 to 5.7 (through 5.6),

innodb_file_per_table

 is enabled by default and the database is now almost twice the size. It was a 40 GB DB now it’s 80 GB due to every table has its own file but ibdata1 is still 40 GB. Is there any solution for this (that doesn’t involve mysqldump and drop tables) and how can this be avoided in future upgrades?

The reason this might be the case is that after upgrading, a number (or possibly all) of tables were [re]created. This would obviously create separate tablespaces for each. One way I can think of reclaiming that disk space is through a familiar upgrade path:

  1. Detach one of the nodes and make is an async replica of the remaining nodes in the cluster
  2. Dump and reload data from this node, then resume replication
  3. Join the other nodes from the cluster as additional nodes of a new cluster using the async replica
  4. Once there is only one node remaining in the original cluster, you can switch to the new cluster for production
  5. Rejoin the last node from the original cluster into the new cluster to complete the process

Depending on the semantics of your switch, it may or may not involve a downtime. For example, if you use ProxySQL this should be a transparent operation.

One way to avoid this problem is by testing. Testing the upgrade process in a lab will expose this kind of information even before deploying the new version into production, allowing you to adjust your process accordingly.

What is a possible impact on upgrades going from the old table format to Barracuda?

So far I am not aware of any negative impact – except if you upgrade and need to downgrade but have since created indexes with prefixes larger than what was supported on the previous version (see large_index_prefix and Barracuda documentation).

Upgrading to Barracuda and one of the supported row formats specifically allows memory constrained systems to save a little more. With BLOB/TEXT column stored off the page, they will not fill the buffer pool unless they are needed.

How do you run mysql_upgrade in parallel?

Good question, I actually wrote about it here.

Can you elaborate on ALTER progress features, and is it also applicable to “Optimization ” query?

I was not able to get more details on the “Optimization” part of this question. I can only assume this too was meant to be table rebuild via OPTIMIZE TABLE. First I would like to point out that OPTIMIZE has been an online DDL operation from 5.6 (with few limitations). As such, there is almost no point in monitoring. Also, for the cases where the online DDL does not apply to OPTIMIZE, under the hood, this is ALTER TABLE .. FORCE – a full table rebuild.

Now, for the actual ALTER process doing a table copy/rebuild, MySQL 5.7 provides some form of progress indication as to how much work has been done. However, it does not necessarily provide an estimate of the actual time it would take to complete. Each ALTER process has different phases which can vary under different conditions. Alternatively, you can also employ other ways of monitoring progress as described in the post.

We are migrated from 5.7.11 to 5.7.17 Percona Server and facing “

Column 1 of table 'x.x' cannot be converted from type 'varchar(100)' to type 'varchar(100)'

”.

This is interesting – what we have seen so far are errors with different datatypes or sizes, which most likely means inconsistency from the table structures if the error is coming from replication. We will need more information on what steps were taken during the upgrade to tell what happened here. Our forums would be the best place to continue this conversation. To begin with, perhaps slave_type_conversions might help if the table structures in replication are the same.

Is the Boost Geometry almost on par with Postgres GIS functions?

I cannot answer this with authority or certainty. I’ve used GIS functions in MySQL, but not developed code for it. Although Boost::Geometry was chosen because of its well-designed API, rapid development and license compatibility, it does not necessarily mean it is more mature than PostGIS (which is widely adopted).

What is the best bulk insert method for MySQL 5.7?

The best option can be different in many situations, so we have to put context here. For this reason, let me give some example scenarios and what might work best:

  • On an upgrade process where you are doing a full dump and reload, parallelizing the process by using mydumper/myloader or mysqlpump will save a lot of time depending the hardware resource available.
  • Bulk INSERT from your application that happens at regular intervals – multi-row inserts are always ideal to reduce disk writes per insert. LOAD DATA INFILE is also a popular option if you can.

Again, thank you for attending the webinar – if you have additional questions head on out to the Percona Forums!

Jul
14
2017
--

A Little Trick Upgrading to MySQL 5.7

Upgrading to MySQL 5.7

Upgrading to MySQL 5.7In this blog post, I’ll look at a trick we use at Percona when upgrading to MySQL 5.7.

I’ll be covering this subject (and others) in my webinar Learning MySQL 5.7 on Wednesday, July 19, 2017.

We’ve been doing upgrades for quite a while here are Percona, and we try to optimize, standardize and improve this process to save time. When upgrading to MySQL 5.7, more often than not you need to run REPAIR or ALTER via mysql_upgrade to a number of MySQL tables. Sometimes a few hundred, sometimes hundreds of thousands.

One way to cut some time from testing or executing mysql_upgrade is to combine it with mysqlcheck. This identifies tables that need to be rebuilt or repaired. The first step is to capture the output of this process:

revin@acme:~$ mysqlcheck --check-upgrade --all-databases > mysql-check.log

This provides a lengthy output of what needs to be done to successfully upgrade our tables. On my test data, I get error reports like the ones below. I’ll need to take the specified action against them:

ads.agency
error    : Table upgrade required. Please do "REPAIR TABLE `agency`" or dump/reload to fix it!
store.categories
error    : Table rebuild required. Please do "ALTER TABLE `categories` FORCE" or dump/reload to fix it!

Before we run through this upgrade, let’s get an idea of how long it would take for a regular mysql_upgrade to complete on this dataset:

revin@acme:~$ time mysql_upgrade
Enter password:
Checking if update is needed.
Checking server version.
Running queries to upgrade MySQL server.
Checking system database.
mysql.columns_priv                                 OK
mysql.db                                           OK
...
mysql.user                                         OK
Upgrading the sys schema.
Checking databases.
ads.account_preference_assoc         OK
...
Repairing tables
...
ads.agency
Note     : TIME/TIMESTAMP/DATETIME columns of old format have been upgraded to the new format.
status   : OK
...
`store`.`categories`
Running  : ALTER TABLE `store`.`categories` FORCE
status   : OK
...
Upgrade process completed successfully.
Checking if update is needed.
real	25m57.482s
user	0m0.024s
sys	0m0.072s

On a cold server, my baseline above took about 25 minutes.

The second step on our time-saving process is to identify the tables that need some action (in this case, REPAIR and ALTER … FORCE). Generate the SQL statements to run them and put them into a single SQL file:

revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 REPAIR | egrep -v 'REPAIR|--');
	do echo "mysql -e 'REPAIR TABLE $t;'" >> upgrade.sql; done
revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 ALTER | egrep -v 'ALTER|--');
	do echo "mysql -e 'ALTER TABLE $t FORCE;'" >> upgrade.sql; done

My upgrade.sql file will have something like this:

mysql -e 'ALTER TABLE store.categories FORCE;'
mysql -e 'REPAIR TABLE ads.agency;'

Now we should be ready to run these commands in parallel as the third step in the process:

revin@acme:~$ time parallel -j 4 -- < upgrade.sql
...
real	17m31.448s
user	0m1.388s
sys	0m0.616s

Getting some parallelization is not bad, and the process improved by about 38%. If we are talking about multi-terabyte data sets, then it is already a big gain.

On the other hand, my dataset has a few tables that are bigger than the rest. Since mysqlcheck processes them in a specific order, one of the threads was processing most of them instead of spreading them out evenly to each thread by size. To fix this, we need to have an idea of the sizes of each table we will be processing. We can use a query from the INFORMATION_SCHEMA.TABLES for this purpose:

revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 ALTER | egrep -v 'ALTER|--');
	do d=$(echo $t|cut -d'.' -f1); tbl=$(echo $t|cut -d'.' -f2);
	s=$(mysql -BNe "select sum(index_length+data_length) from information_schema.tables where table_schema='$d' and table_name='$tbl';");
	echo "$s |mysql -e 'ALTER TABLE $t FORCE;'" >> table-sizes.sql; done
revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 REPAIR | egrep -v 'REPAIR|--');
	do d=$(echo $t|cut -d'.' -f1); tbl=$(echo $t|cut -d'.' -f2);
	s=$(mysql -BNe "select sum(index_length+data_length) from information_schema.tables where table_schema='$d' and table_name='$tbl';");
	echo "$s |mysql -e 'REPAIR TABLE $t;'" >> table-sizes.sql; done

Now my table-sizes.sql file will have contents like below, which I can sort and pass to the parallel command again and cut even more time!

32768 |mysql -e 'REPAIR TABLE ads.agency;'
81920 |mysql -e 'ALTER TABLE store.categories FORCE;'

revin@acme:~$ cat table-sizes.sql |sort -rn|cut -d'|' -f2 > upgrade.sql
revin@acme:~$ time parallel -j 4 -- < upgrade.sql
...
real	8m1.116s
user	0m1.260s
sys	0m0.624s

This go-around, my total execution time is 8 minutes – a good 65% improvement. To wrap it up, we will need to run mysql_upgrade one last time so that the system tables are also upgraded, the tables are checked again and then restart the MySQL server as instructed by the manual:

revin@acme:~$ time mysql_upgrade --force

The whole process should be easy to automate and script, depending on your preference. Lastly: YMMV. If you have one table that is more than half the size of your total data set, there might not be big gains.

If you want to learn more about upgrading to MySQL 5.7, come to my webinar on Wednesday, July 19: Learning MySQL 5.7. This process is only one of the phases in a multi-step upgrade process when moving to 5.7. I will discuss them in more detail next week. Register now from the link below, and I’ll talk to you soon!

Oct
18
2016
--

Upgrading to MySQL 5.7? Beware of the new STRICT mode

STRICT mode

STRICT modeThis blog post discusses the ramifications of STRICT mode in MySQL 5.7.

In short

By default, MySQL 5.7 is much “stricter” than older versions of MySQL. That can make your application fail. To temporarily fix this, change the

SQL_MODE

 to

NO_ENGINE_SUBSTITUTION

 (same as in MySQL 5.6):

mysql> set global SQL_MODE="NO_ENGINE_SUBSTITUTION";

MySQL 5.7, dates and default values

The default

SQL_MODE

 in MySQL 5.7 is:

ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

That makes MySQL operate in “strict” mode for transactional tables.

“Strict mode controls how MySQL handles invalid or missing values in data-change statements such as INSERT or UPDATE. A value can be invalid for several reasons. For example, it might have the wrong data type for the column, or it might be out of range. A value is missing when a new row to be inserted does not contain a value for a non-NULL column that has no explicit DEFAULT clause in its definition. (For a NULL column, NULL is inserted if the value is missing.) Strict mode also affects DDL statements such as CREATE TABLE.”
http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sql-mode-strict

That also brings up an interesting problem with the default value for the date/datetime column. Let’s say we have the following table in MySQL 5.7, and want to insert a row into it:

mysql> CREATE TABLE `events_t` (
-> `id` int(11) NOT NULL AUTO_INCREMENT,
-> `event_date` datetime NOT NULL,
-> `profile_id` int(11) DEFAULT NULL,
-> PRIMARY KEY (`id`),
-> KEY `event_date` (`event_date`),
-> KEY `profile_id` (`profile_id`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1
-> ;
Query OK, 0 rows affected (0.02 sec)
mysql> insert into events_t (profile_id) values (1);
ERROR 1364 (HY000): Field 'event_date' doesn't have a default value

The

event_date

 does not have a default value, and we are inserting a row without a value for

event_date

. That causes an error in MySQL 5.7. If we can’t use NULL, we will have to create a default value. In strict mod,e we can’t use “0000-00-00” either:

mysql> alter table events_t change event_date event_date datetime NOT NULL default '0000-00-00 00:00:00';
ERROR 1067 (42000): Invalid default value for 'event_date'
mysql> alter table events_t change event_date event_date datetime NOT NULL default '2000-00-00 00:00:00';
ERROR 1067 (42000): Invalid default value for 'event_date'

We have to use a real date:

mysql> alter table events_t change event_date event_date datetime NOT NULL default '2000-01-01 00:00:00';
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> insert into events_t (profile_id) values (1);
Query OK, 1 row affected (0.00 sec)

Or, a most likely much better approach is to change the application logic to:

  • allow NULLs, or
  • always insert the real dates (i.e. use NOW() function), or
  • change the table field to timestamp and update it automatically if no value has been assigned
Further reading

Read the Morgan Tocker’s article on how to transition to MySQL 5.7, and check the full sql_mode documentation

Oct
12
2016
--

MySQL 5.7 Performance Tuning Immediately After Installation

MySQL 5.7 Performance Tuning

This blog updates Stephane Combaudon’s blog on MySQL performance tuning, and covers MySQL 5.7 performance tuning immediately after installation.

A few years ago, Stephane Combaudon wrote a blog post on Ten MySQL performance tuning settings after installation that covers the (now) older versions of MySQL: 5.1, 5.5 and 5.6. In this post, I will look into what to tune in MySQL 5.7 (with a focus on InnoDB).

The good news is that MySQL 5.7 has significantly better default values. Morgan Tocker created a page with a complete list of features in MySQL 5.7, and is a great reference point. For example, the following variables are set by default:

In MySQL 5.7, there are only four really important variables that need to be changed. However, there are other InnoDB and global MySQL variables that might need to be tuned for a specific workload and hardware.

To start, add the following settings to my.cnf under the [mysqld] section. You will need to restart MySQL:

[mysqld]
# other variables here
innodb_buffer_pool_size = 1G # (adjust value here, 50%-70% of total RAM)
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 1 # may change to 2 or 0
innodb_flush_method = O_DIRECT

Description:

Variable Value
innodb_buffer_pool_size Start with 50% 70% of total RAM. Does not need to be larger than the database size
innodb_flush_log_at_trx_commit
  • 1   (Default)
  • 0/2 (more performance, less reliability)
innodb_log_file_size 128M – 2G (does not need to be larger than buffer pool)
innodb_flush_method O_DIRECT (avoid double buffering)

 

What is next?

Those are a good starting point for any new installation. There are a number of other variables that can increase MySQL performance for some workloads. Usually, I would setup a MySQL monitoring/graphing tool (for example, the Percona Monitoring and Management platform) and then check the MySQL dashboard to perform further tuning.

What can we tune further based on the graphs?

InnoDB buffer pool size. Look at the graphs:

MySQL 5.7 Performance Tuning

MySQL 5.7 Performance Tuning

As we can see, we can probably benefit from increasing the InnoDB buffer pool size a bit to ~10G, as we have RAM available and the number of free pages is small compared to the total buffer pool.

InnoDB log file size. Look at the graph:

MySQL 5.7 Performance Tuning

As we can see here, InnoDB usually writes 2.26 GB of data per hour, which exceeds the total size of the log files (2G). We can now increase the

innodb_log_file_size

 variable and restart MySQL. Alternatively, use “show engine InnoDB status” to calculate a good InnoDB log file size.

Other variables

There are a number of other InnoDB variables that can be further tuned:

innodb_autoinc_lock_mode

Setting innodb_autoinc_lock_mode =2 (interleaved mode) can remove the need for table-level AUTO-INC lock (and can increase performance when multi-row insert statements are used to insert values into tables with auto_increment primary key). This requires

binlog_format=ROW

  or

MIXED

  (and ROW is the default in MySQL 5.7).

innodb_io_capacity and innodb_io_capacity_max

This is a more advanced tuning, and only make sense when you are performing a lot of writes all the time (it does not apply to reads, i.e. SELECTs). If you really need to tune it, the best method is knowing how many IOPS the system can do. For example, if the server has one SSD drive, we can set

innodb_io_capacity_max=6000

 and

innodb_io_capacity=3000

 (50% of the max). It is a good idea to run the sysbench or any other benchmark tool to benchmark the disk throughput.

But do we need to worry about this setting? Look at the graph of buffer pool’s “dirty pages“:

screen-shot-2016-10-03-at-7-19-47-pm

In this case, the total amount of dirty pages is high, and it looks like InnoDB can’t keep up with flushing them. If we have a fast disk subsystem (i.e., SSD), we might benefit from increasing

innodb_io_capacity

 and innodb_io_capacity_max.

Conclusion or TL;DR version

The new MySQL 5.7 defaults are much better for general purpose workloads. At the same time, we still need to configure InnoDB variables to take advantages of the amount of RAM on the box. After installation, follow these steps:

  1. Add InnoDB variables to my.cnf (as described above) and restart MySQL
  2. Install a monitoring system, (e.g., Percona Monitoring and Management platform)
  3. Look at the graphs and determine if MySQL needs to be tuned further
Jul
19
2016
--

Upcoming Webinar Wednesday July 20, 11 am PDT: Practical MySQL Performance Optimization

MySQL Performance Optimization

Practical MySQL Performance OptimizationAre you looking to improve your MySQL performance? Application success is often limited by poor MySQL performance. Please join Percona CEO and Founder Peter Zaitsev for this exclusive webinar on Wednesday, July 20th, 2016 at 11:00 AM PDT (UTC – 7) as he presents “Practical MySQL Performance Optimization“.

Peter Zaitsev discusses how to get excellent MySQL performance while being practical. In other words, spending time on what gives you the best return. The webinar updates Peter’s ever-popular Practical MySQL Performance Optimization presentation. It covers the important points for improving MySQL performance. It also includes a discussion of the new tools and features in the latest MySQL 5.7 release, as well as their most important aspects – so you can employ them for maximum database performance.

Areas covered:

  • Hardware
  • MySQL Configuration
  • Schema and Queries
  • Application Architecture
  • MySQL 5.7 New Tools and Features

Peter will highlight practical approaches and techniques for optimizing your time. He will also focus on the queries that are most important for your application. At the end of this webinar, you will know how to optimize MySQL performance in the most practical way.

register-now

Practical MySQL Performance OptimizationPeter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 150 professionals in 20 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014 and 2015.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Data Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

Jun
07
2016
--

Severe performance regression in MySQL 5.7 crash recovery

MySQL 5.7 Crash Recovery

In this post, we’ll discuss some insight I’ve gained regarding severe performance regression in MySQL 5.7 crash recovery.

Working on different InnoDB log file sizes in my previous post:

What is a big innodb_log_file_size?

I tried to understand how we can make InnoDB crash recovery faster, but found a rather surprising 5.7 crash recovery regression.

Basically, crash recovery in MySQL 5.7 is two times slower, due to this issue: https://bugs.mysql.com/bug.php?id=80788. InnoDB now performs the log scan twice, compared to a single scan in MySQL 5.6 (no surprise that there is performance degradation).

Fortunately, there is a proposed patch for MySQL 5.7, so I hope it will be improved soon.

As for general crash recovery improvement, my opinion is that it would be much improved by making it multi-threaded. Right now it is significantly limited by the single thread that reads and processes log entries one-by-one. With the current hardware, consisting of tens of cores and fast SSD, we can improve crash recovery by utilizing all the resources we have.

One small improvement that can be made is to disable PERFORMANCE_SCHEMA during recovery (these stats are not needed anyway).

Jun
03
2016
--

MySQL 5.7 By Default 1/3rd Slower Than 5.6 When Using Binary Logs

binary logs make MySQL 5.7 slower

binary logs make MySQL 5.7 slower

Researching a performance issue, we came to a startling discovery:

MySQL 5.7 + binlogs is by default 37-45% slower than MySQL 5.6 + binlogs when otherwise using the default MySQL settings

Test server MySQL versions used:
i7, 8 threads, SSD, Centos 7.2.1511
mysql-5.6.30-linux-glibc2.5-x86_64
mysql-5.7.12-linux-glibc2.5-x86_64

mysqld –options:

--no-defaults --log-bin=mysql-bin --server-id=2

Run details:
Sysbench version 0.5, 4 threads, socket file connection

Sysbench Prepare: 

sysbench --test=/usr/share/doc/sysbench/tests/db/parallel_prepare.lua --oltp-auto-inc=off --mysql-engine-trx=yes --mysql-table-engine=innodb --oltp_table_size=1000000 --oltp_tables_count=1 --mysql-db=test --mysql-user=root --db-driver=mysql --mysql-socket=/path_to_socket_file/your_socket_file.sock prepare

Sysbench Run:

sysbench --report-interval=10 --oltp-auto-inc=off --max-time=50 --max-requests=0 --mysql-engine-trx=yes --test=/usr/share/doc/sysbench/tests/db/oltp.lua --init-rng=on --oltp_index_updates=10 --oltp_non_index_updates=10 --oltp_distinct_ranges=15 --oltp_order_ranges=15 --oltp_tables_count=1 --num-threads=4 --oltp_table_size=1000000 --mysql-db=test --mysql-user=root --db-driver=mysql --mysql-socket=/path_to_socket_file/your_socket_file.sock run

Results:

5.6.30: transactions: 7483 (149.60 per sec.)
5.7.12: transactions: 4689 (93.71 per sec.)  — That is a 37.36% decrease!

Note: on high-end systems with premium IO (think Fusion-IO, memory-only, high-end SSD with good caching throughput), the difference would be much smaller or negligible.

The reason?

A helpful comment from Shane Bester on a related bug report made me realize what was happening. Note the following in the MySQL Manual:

“Prior to MySQL 5.7.7, the default value of sync_binlog was 0, which configures no synchronizing to disk—in this case, the server relies on the operating system to flush the binary log’s contents from time to time as for any other file. MySQL 5.7.7 and later use a default value of 1, which is the safest choice, but as noted above can impact performance.” — https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html#sysvar_sync_binlog

The culprit is thus the

--sync_binlog=1

 change which was made in 5.7.7 (in 5.6 it is 0 by default). While this may indeed be “the safest choice,” one has to wonder why Oracle chose to implement this default change in 5.7.7. After all, there are many other options t aid crash safety.

A related blog post  from the MySQL HA team states;

“Indeed, [with sync_binlog=1,] it increases the total number of fsyncs called, but since MySQL 5.6, the server groups transactions and fsync’s them together, which minimizes greatly a potential performance hit.” — http://mysqlhighavailability.com/replication-defaults-in-mysql-5-7-7/ (ref item #4)

This seems incorrect given our findings, unless perhaps it requires tuning some other option.

This raises some actions points/questions for Oracle’s team: why change this now? Was 5.6 never crash-safe in terms of binary logging? How about other options that aid crash safety? Is anything [before 5.7.7] really ACID compliant by default?

In 2009 my colleague Peter Zaitsev had already posted on performance matters in connection with sync_binlog issues. More than seven years later, the questions asked in his post may still be valid today;

“May be opening binlog with O_DSYNC flag if sync_binlog=1 instead of using fsync will help? Or may be binlog pre-allocation would be good solution.” — PZ

Testing the same setup again, but this time with

sync_binlog=0

  and

sync_binlog=1

  synchronized/setup on both servers, we see;

Results for sync_binlog=0:

5.6.30: transactions: 7472 (149.38 per sec.)
5.7.12: transactions: 6594 (131.86 per sec.)  — A 11.73% decrease

Results for sync_binlog=1:

5.6.30: transactions: 3854 (77.03 per sec.)
5.7.12: transactions: 4597 (91.89 per sec.)  — A 19.29% increase

Note: the increase here is to some extent negated by the fact that enabling sync_binlog is overall still causes a significant (30% on 5.7 and 48% on 5.6) performance drop. Also interesting is that this could be the effect of “tuning the defaults” of/in 5.7, and it also makes one think about the possibility o further defaults tuning/optimization in this area.

Results for sync_binlog=100:

5.6.30: transactions: 7564 (151.12 per sec.)
5.7.12: transactions: 6515 (130.22 per sec.) — A 13.83% decrease

Thus, while 5.7.12 made some improvements when it comes to

--sync_binlog=1

, when

--sync_binlog

 is turned off or is set to 100, we still see a ~11% decrease in performance. This is the same when not using binary logging at all, as a test with only

--no-defaults

  (i.e. 100% vanilla out-of-the-box MySQL 5.6.30 versus MySQL 5.7.12) shows;

Results without binlogs enabled:

5.6.30: transactions: 7891 (157.77 per sec.)
5.7.12: transactions: 6963 (139.22 per sec.)  — A 11.76% decrease

This raises another question for Oracle’s team: with four threads, there is a ~11% decrease in performance for 5.7.12 versus 5.6.30 (both vanilla)?

Discussing this internally, we were interested to see whether the arbitrary low number of four threads skewed the results and perhaps only showed a less realistic use case. However, testing with more threads, the numbers became worse still:

Results with 100 threads:

5.6.30. transactions: 20216 (398.89 per sec.)
5.7.12. transactions: 11097 (218.43 per sec.) — A 45.24% decrease

Results with 150 threads:

5.6.30. transactions: 11852 (233.01 per sec.)
5.7.12. transactions: 6606 (129.80 per sec.) — A 44.29% decrease

The findings in this article were compiled from a group effort.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com