“Hey, what’s going on with my applications? I installed a newer version of MySQL. I have queries that ran perfectly with the older version, and now I get a lot of errors.”
This is a question some customers have asked me after upgrading MySQL. In this article, we’ll see what one of the most frequent causes of this issue is, and how to solve it.
We are talking about this error:
ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test.web_log.user_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Have you ever seen it?
SQL_MODE
First, let me introduce the concept of SQL_MODE.
MySQL can work using different SQL modes that affect query syntax and validation checks. Depending on the value of the sql_mode variable, the same query may be accepted and executed normally, or rejected with a validation error.
Older versions of MySQL got users used to writing queries that were not semantically correct, because MySQL was designed to work in a “forgiving mode”. Users could write any syntactically valid query regardless of SQL standard compliance or semantic rules. This bad habit was corrected by introducing sql_mode, which instructs MySQL to validate queries more strictly.
Some users are not aware of this feature because the default value used to be not very restrictive. Starting from 5.7, the default value is more restrictive, and this is the reason why some users run into unexpected query failures after migrating to 5.7 or 8.0.
The sql_mode variable can be set in the configuration file (/etc/my.cnf) or changed at runtime. It has both GLOBAL and SESSION scope, so each connection can use a different mode if needed.
The sql_mode variable can hold multiple values, separated by commas, each controlling a different behavior. For example, you can tell MySQL how to deal with zero dates such as ‘0000-00-00’, that is, whether such a date is considered valid or not. In the “forgiving mode” (that is, when sql_mode is empty) you can INSERT such a value without problems.
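A minimal sketch of the two scopes, using only the standard @@GLOBAL/@@SESSION variable syntax; TRADITIONAL is just an example value. Keep in mind that SET GLOBAL requires an administrative privilege (SUPER in 5.7, SYSTEM_VARIABLES_ADMIN or SUPER in 8.0), applies only to connections opened afterwards, and is lost at restart unless the same value is also written under the [mysqld] section of my.cnf.

```sql
-- Show the server-wide default and the value used by this connection
SELECT @@GLOBAL.sql_mode, @@SESSION.sql_mode;

-- Change the mode for the current connection only
SET SESSION sql_mode = 'TRADITIONAL';

-- Change the server-wide default: connections opened from now on pick it up,
-- while already open connections keep their current session value
SET GLOBAL sql_mode = 'TRADITIONAL';
```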
```
# set sql mode to "forgiving mode"
mysql> set session sql_mode='';
Query OK, 0 rows affected (0.00 sec)

mysql> create table t1( mydate date );
Query OK, 0 rows affected (0.05 sec)

mysql> insert into t1 values('0000-00-00');
Query OK, 1 row affected (0.00 sec)

mysql> select * from t1;
+------------+
| mydate     |
+------------+
| 0000-00-00 |
+------------+
1 row in set (0.00 sec)
```
But this is not the correct behavior according to the TRADITIONAL mode. As good programmers know, you have to validate dates in your source code to avoid storing incorrect data or getting incorrect results.
The following shows how to switch MySQL to the TRADITIONAL mode at runtime, so that it throws an error instead:
```
mysql> set session sql_mode='TRADITIONAL';
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 values('0000-00-00');
ERROR 1292 (22007): Incorrect date value: '0000-00-00' for column 'mydate' at row 1
```
There are many other modes you can use. Covering all the modes is not the goal of the article, so please refer to the official documentation for more details and examples:
https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html
https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html
The ONLY_FULL_GROUP_BY issue
Let’s focus on the most frequent cause of errors when migrating to 5.7 or 8.0. As we said, 5.7 has a default SQL mode that is more restrictive than 5.6, and the same is true for 8.0. This affects you when you upgrade MySQL and reuse an old my.cnf file that doesn’t set the sql_mode variable explicitly: the new, stricter default takes effect without any warning. So, be aware.
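After an upgrade, a quick way to confirm whether the stricter default is in effect is to look for ONLY_FULL_GROUP_BY in the global value. FIND_IN_SET() works here because sql_mode is returned as a comma-separated list without spaces. This is a sketch of a single check, not a complete upgrade checklist.

```sql
-- Server version, and 1 if ONLY_FULL_GROUP_BY is part of the global mode (0 otherwise)
SELECT VERSION() AS server_version,
       FIND_IN_SET('ONLY_FULL_GROUP_BY', @@GLOBAL.sql_mode) > 0 AS only_full_group_by_on;
```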
Let’s create a sample table to store the clicks on the webpages of our site. We would like to log the page name and the id of the registered user.
```
mysql> create table web_log ( id int auto_increment primary key, page_url varchar(100), user_id int, ts timestamp);
Query OK, 0 rows affected (0.03 sec)

mysql> insert into web_log(page_url,user_id,ts) values('/index.html',1,'2019-04-17 12:21:32'),
    -> ('/index.html',2,'2019-04-17 12:21:35'),('/news.php',1,'2019-04-17 12:22:11'),('/store_offers.php',3,'2019-04-17 12:22:41'),
    -> ('/store_offers.php',2,'2019-04-17 12:23:04'),('/faq.html',1,'2019-04-17 12:23:22'),('/index.html',3,'2019-04-17 12:32:25'),
    -> ('/news.php',2,'2019-04-17 12:32:38');
Query OK, 8 rows affected (0.01 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> select * from web_log;
+----+-------------------+---------+---------------------+
| id | page_url          | user_id | ts                  |
+----+-------------------+---------+---------------------+
|  1 | /index.html       |       1 | 2019-04-17 12:21:32 |
|  2 | /index.html       |       2 | 2019-04-17 12:21:35 |
|  3 | /news.php         |       1 | 2019-04-17 12:22:11 |
|  4 | /store_offers.php |       3 | 2019-04-17 12:22:41 |
|  5 | /store_offers.php |       2 | 2019-04-17 12:23:04 |
|  6 | /faq.html         |       1 | 2019-04-17 12:23:22 |
|  7 | /index.html       |       3 | 2019-04-17 12:32:25 |
|  8 | /news.php         |       2 | 2019-04-17 12:32:38 |
+----+-------------------+---------+---------------------+
```
Now we want to issue a query to calculate the most visited pages.
```
# let's turn the sql mode to "forgiving"
mysql> set session sql_mode='';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT page_url, user_id, COUNT(*) AS visits
    -> FROM web_log
    -> GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+---------+--------+
| page_url          | user_id | visits |
+-------------------+---------+--------+
| /index.html       |       1 |      3 |
| /news.php         |       1 |      2 |
| /store_offers.php |       3 |      2 |
| /faq.html         |       1 |      1 |
+-------------------+---------+--------+
4 rows in set (0.00 sec)
```
The query works, but it’s not really correct. page_url is clearly the grouping column: it’s the value we are most interested in, and we want it to be unique for counting. The visits column is fine too, since it’s the counter. But what about user_id? What does this column represent? We grouped by page_url, so the value returned for user_id is just one of the values in the group. In fact, user number 1 was not the only visitor of /index.html: users 2 and 3 visited the page too. How should I interpret that value? Is it the first visitor? Is it the last one?
We don’t know the right answer! We should treat the user_id value as an arbitrary item from the group.
In fact, the real answer is that the query is not semantically correct: it makes no sense to return a column that is neither part of the GROUP BY clause nor wrapped in an aggregate function. The query is therefore expected to be rejected under strict SQL validation.
Let’s test it.
```
mysql> SET SESSION sql_mode='ONLY_FULL_GROUP_BY';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT page_url, user_id, COUNT(*) AS visits
    -> FROM web_log
    -> GROUP BY page_url ORDER BY COUNT(*) DESC;
ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test.web_log.user_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
```
Now we have an error, as expected.
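The error message mentions functional dependence because, since MySQL 5.7.5, the check is not purely syntactic: a nonaggregated column is accepted when its value is uniquely determined by the GROUP BY columns. A minimal sketch on the same web_log table, grouping by the primary key id; the query only illustrates the rule and is not meant to be useful in itself.

```sql
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Accepted: id is the primary key, so each group holds exactly one row and
-- page_url and user_id are functionally dependent on the GROUP BY column
SELECT id, page_url, user_id, COUNT(*) AS visits
FROM web_log
GROUP BY id;
```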
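The ONLY_FULL_GROUP_BY SQL mode is enabled by default starting from MySQL 5.7 (5.7.5, to be precise). Note that it is not part of the TRADITIONAL combination mode, so it has to be listed explicitly when you build your own sql_mode value.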
A lot of customers had this kind of issue after migration to a recent version of MySQL.
Now we know what the cause of the issue is, but our applications are still not working. What possible solutions do we have to let the applications work again?
Solution 1 – rewrite the query
Since it’s not correct to select a column that is not part of the grouping, we can rewrite the query without those columns. Very simple.
```
mysql> SELECT page_url, COUNT(*) AS visits
    -> FROM web_log
    -> GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+--------+
| page_url          | visits |
+-------------------+--------+
| /index.html       |      3 |
| /news.php         |      2 |
| /store_offers.php |      2 |
| /faq.html         |      1 |
+-------------------+--------+
```
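If the application really needs information about users, the query can be rewritten so that every selected column is either grouped or aggregated. Below are two hedged variants on the same table: the first counts visits per page and user, the second counts distinct visitors per page. Which one fits depends on what the original user_id column was supposed to mean in your application.

```sql
-- Visits per (page, user): user_id is now part of the grouping,
-- so each output row carries exactly one well-defined user_id
SELECT page_url, user_id, COUNT(*) AS visits
FROM web_log
GROUP BY page_url, user_id
ORDER BY page_url, visits DESC;

-- Total visits and number of distinct visitors per page
SELECT page_url, COUNT(*) AS visits, COUNT(DISTINCT user_id) AS unique_visitors
FROM web_log
GROUP BY page_url
ORDER BY visits DESC;
```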
If you have a lot of queries affected by the problem, you may have a lot of work to do to find and rewrite them. Or the queries may be part of a legacy application that you can’t or don’t want to touch.
But this is the solution that forces you to write correct queries and lets you keep a restrictive configuration for SQL validation.
Solution 2 – step back to the forgiving mode
You can change MySQL’s configuration and step back to the “forgiving” mode.
Or you can drop only ONLY_FULL_GROUP_BY from the default. The default SQL mode in MySQL 5.7 includes these modes: ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER, NO_ENGINE_SUBSTITUTION. MySQL 8.0 uses the same list without NO_AUTO_CREATE_USER, which was removed in that version.
```
# set the complete "forgiving" mode
mysql> SET GLOBAL sql_mode='';

# alternatively, keep a restrictive mode that does not include ONLY_FULL_GROUP_BY
mysql> SET GLOBAL sql_mode='STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_ENGINE_SUBSTITUTION';
```
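Two caveats about SET GLOBAL: it only affects connections opened after it runs, and the value is lost at the next restart unless you also set sql_mode under [mysqld] in my.cnf or, on MySQL 8.0, use SET PERSIST. To remove just ONLY_FULL_GROUP_BY while keeping everything else, the sys schema helper sys.list_drop() can be used; the sketch below assumes your server’s sys schema provides that function (it does in 8.0 and in later 5.7 releases). The three statements are alternatives, not a script to run in sequence.

```sql
-- Option A (5.7 and 8.0): drop only ONLY_FULL_GROUP_BY from the global mode
SET GLOBAL sql_mode = sys.list_drop(@@GLOBAL.sql_mode, 'ONLY_FULL_GROUP_BY');

-- Option B (8.0 only): same change, also saved to mysqld-auto.cnf so it survives a restart
SET PERSIST sql_mode = sys.list_drop(@@GLOBAL.sql_mode, 'ONLY_FULL_GROUP_BY');

-- Option C: relax the check for the current connection only, e.g. issued by
-- the legacy application right after it connects
SET SESSION sql_mode = sys.list_drop(@@SESSION.sql_mode, 'ONLY_FULL_GROUP_BY');
```

Option C keeps the server strict for every other application, which limits the “forgiving” behavior to the code that actually needs it.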
Solution 3 – use of aggregation functions
If your application absolutely needs to retrieve the user_id field for some valid reason, or it’s too complicated to change your source code, you can rely on an aggregation function in order to avoid changing the sql mode configuration.
For example we can use MAX(), MIN() or even GROUP_CONCAT() aggregation functions.
```
mysql> SET SESSION sql_mode='ONLY_FULL_GROUP_BY';

mysql> SELECT page_url, MAX(user_id), COUNT(*) AS visits FROM web_log GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+--------------+--------+
| page_url          | MAX(user_id) | visits |
+-------------------+--------------+--------+
| /index.html       |            3 |      3 |
| /news.php         |            2 |      2 |
| /store_offers.php |            3 |      2 |
| /faq.html         |            1 |      1 |
+-------------------+--------------+--------+

mysql> SELECT page_url, GROUP_CONCAT(user_id), COUNT(*) AS visits FROM web_log GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+-----------------------+--------+
| page_url          | GROUP_CONCAT(user_id) | visits |
+-------------------+-----------------------+--------+
| /index.html       | 1,2,3                 |      3 |
| /news.php         | 1,2                   |      2 |
| /store_offers.php | 3,2                   |      2 |
| /faq.html         | 1                     |      1 |
+-------------------+-----------------------+--------+
```
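Without an ORDER BY inside the function, the order of the values concatenated by GROUP_CONCAT() is not guaranteed, and the result is truncated (with a warning) at group_concat_max_len bytes, 1024 by default. A sketch that makes the lists deterministic, ordering visitors by visit time and also producing a de-duplicated list; the 65536 limit is an arbitrary example value.

```sql
-- Raise the truncation limit for this connection if the lists can get long
SET SESSION group_concat_max_len = 65536;

-- Visitors in visit order, plus the distinct visitor IDs in ascending order
SELECT page_url,
       GROUP_CONCAT(user_id ORDER BY ts) AS visitors_in_order,
       GROUP_CONCAT(DISTINCT user_id ORDER BY user_id) AS distinct_visitors,
       COUNT(*) AS visits
FROM web_log
GROUP BY page_url
ORDER BY visits DESC;
```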
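MySQL even provides a function designed specifically for this case: ANY_VALUE(), available since 5.7.5. It disables the ONLY_FULL_GROUP_BY check for its argument and returns an arbitrary value from each group.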
```
mysql> SELECT page_url, ANY_VALUE(user_id), COUNT(*) AS visits FROM web_log GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+--------------------+--------+
| page_url          | ANY_VALUE(user_id) | visits |
+-------------------+--------------------+--------+
| /index.html       |                  1 |      3 |
| /news.php         |                  1 |      2 |
| /store_offers.php |                  3 |      2 |
| /faq.html         |                  1 |      1 |
+-------------------+--------------------+--------+
```
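ANY_VALUE() makes the query valid, but the returned user_id is still the same indeterminate pick as in the forgiving mode. If the application actually needs a specific user, for example the first visitor of each page, that intent can be expressed explicitly. A sketch that should work under ONLY_FULL_GROUP_BY on both 5.7 and 8.0, joining each page’s earliest timestamp back to web_log; first_visitor is just an illustrative column alias, and if two visits share the earliest ts the page appears more than once.

```sql
-- First visitor of each page, with the total number of visits
SELECT w.page_url, w.user_id AS first_visitor, v.visits
FROM web_log AS w
JOIN (
    -- One row per page: earliest visit time and visit count
    SELECT page_url, MIN(ts) AS first_ts, COUNT(*) AS visits
    FROM web_log
    GROUP BY page_url
) AS v
  ON v.page_url = w.page_url
 AND v.first_ts = w.ts
ORDER BY v.visits DESC;
```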
Conclusion
I personally prefer solution number 1 because it forces you to write SQL-92 compliant queries. Following the standards is often considered a best practice.
Solution 2 is a good fit when you cannot change your application code or when rewriting all the queries is really too complicated. It fixes the issue in a matter of seconds; however, I strongly suggest having a long-term plan to rewrite the queries that are not SQL-92 compliant.
For more details: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
Photo of sign by Ken Treloar on Unsplash