Nov
02
2022
--

Set Theory in MySQL 8.0: UNION and Now INTERSECT and EXCEPT

Set Theory in MySQL 8.0

Set Theory in MySQL 8.0Are you familiar with the UNION statement for your SQL queries? Most likely, you are. It has been supported for a long time.

In case you are not familiar with UNION, don’t worry, I’m going to show you how it works with simple examples.

Considering “Set Theory”, other than the UNION, starting from the newly deployed MySQL Community 8.0.31, a couple of new clauses are available: INTERSECT and EXCEPT.

From now on you can rely on more powerful options to write your queries and deploy some Set Theory logic.

In this article, I’m going to present how the three options work.

The traditional UNION

UNION combines the results from two or multiple SELECT statements into a single result set. Let’s suppose you have two queries, A and B, they return the same number of columns with the same data types and you need to merge all rows from both queries into a single result set. You can write the two SELECT statements joined with UNION clause, as follows:

SELECT ... FROM tableA WHERE ... UNION SELECT ... FROM tableB WHERE ....

Visualizing this using a Venn diagram you get the following:

 

Let’s see a simple example.

Christmas is coming and your company would like to send greeting cards by email to all your customers and employees. The company has to provide a single file with name, surname, and email address to an external delivery service. You have two distinct tables in your database: customers and employees. Well, you can run two queries and merge the data with a spreadsheet, right? That’s possible, but let’s try to do it with a single query instead. It’s more elegant.

Here are our tables.

mysql> select * from customers;
+----+-------------+----------+----------------------+
| id | name        | surname  | email                |
+----+-------------+----------+----------------------+
| 1 | Keanu        | Reeves   | keanu@domain1.com    |
| 2 | Laurence     | Fishburn | laurence@domain2.com |
| 3 | Carrie-Anne  | Moss     | carrie@domain3.com   |
| 4 | Hugo         | Weaving  | hugo@domain4.com     |
+----+-------------+----------+----------------------+


mysql> select * from employees;
+----+-------------+------------+--------------------+
| id | name        | surname    | email              |
+----+-------------+------------+--------------------+
| 1 | Gloria       | Foster     | gloria@domain1.com |
| 2 | Joe          | Pantoliano | joe@domain2.com    |
| 3 | Carrie-Anne  | Moss       | carrie@domain3.com |
| 4 | Marcus       | Chong      | marcus@domain4.com |
+----+-------------+------------+--------------------+

Please note that one of our employees (Carrie-Anne) is also one of our customers. It’s not an error, that’s on purpose.

The following single query merges the two queries by selecting all rows of the two tables using the UNION clause:

mysql> select name, surname, email
-> from employees
-> union
-> select name, surname, email
-> from customers;
+-------------+------------+----------------------+
| name        | surname    | email                |
+-------------+------------+----------------------+
| Gloria      | Foster     | gloria@domain1.com   |
| Joe         | Pantoliano | joe@domain2.com      |
| Carrie-Anne | Moss       | carrie@domain3.com   |
| Marcus      | Chong      | hugo@domain4.com     |
| Keanu       | Reeves     | keanu@domain1.com    |
| Laurence    | Fishburn   | laurence@domain2.com |
| Hugo        | Weaving    | hugo@domain4.com     |
+-------------+------------+----------------------+
7 rows in set (0.00 sec)

Note that Carrie-Ann appears only once on the list. Indeed the two tables have four rows each, but the final result set has only seven. The UNION clause eliminates by default all duplicates, only one occurrence is preserved.

If you would like to avoid dropping duplicates, you have to use UNION ALL instead. In such a case, eight rows would be returned, with Carrie-Ann appearing twice. There are several use cases where this makes sense. Actually, if you know that both sets don’t have duplicate items, using UNION ALL instead of UNION makes your query more efficient because the step of processing duplicate items is skipped.

It is possible to sort the result set by putting the ORDER BY clause at the end of the second query only.

mysql> select name, surname, email 
-> from employees 
-> union 
-> select name, surname, email 
-> from customers 
-> order by name, surname;
+-------------+------------+----------------------+
| name        | surname    | email                |
+-------------+------------+----------------------+
| Carrie-Anne | Moss       | carrie@domain3.com   |
| Gloria      | Foster     | gloria@domain1.com   |
| Hugo        | Weaving    | hugo@domain4.com     |
| Joe         | Pantoliano | joe@domain2.com      |
| Keanu       | Reeves     | keanu@domain1.com    |
| Laurence    | Fishburn   | laurence@domain2.com |
| Marcus      | Chong      | hugo@domain4.com     |
+-------------+------------+----------------------+

INTERSECT

While UNION has been available in MySQL for a long time, INTERSECT is the more recent novelty.

With the INTERSECT clause, you can limit the result set of multiple SELECT statements to those rows which are common to all.

Basically, you can get what is shown in the following Venn diagram:

Let’s run a query to find out who are customers that are also employees. We can do it by finding the intersection of the two sets.

mysql> select name, surname, email
-> from customers
-> intersect
-> select name, surname, email
-> from employees;
+-------------+---------+--------------------+
| name        | surname | email              |
+-------------+---------+--------------------+
| Carrie-Anne | Moss    | carrie@domain3.com |
+-------------+---------+--------------------+

As expected, Carrie-Anne is the one in both the customers and employees tables.

The eventual duplicate rows are dropped by default, the same happens with UNION.

EXCEPT

Even the EXCEPT clause is a novelty of MySQL 8.0.31. Let’s test it.

The EXCEPT clause permits finding the rows from the first query that don’t appear in the second query. The following Venn diagram shows what is the expected result.

Let’s find out all the customers that are not employees. The following query can achieve that.

mysql> select name, surname, email
-> from customers
-> except
-> select name, surname, email
-> from employees;
+----------+----------+----------------------+
| name     | surname  | email                |
+----------+----------+----------------------+
| Keanu    | Reeves   | keanu@domain1.com    |
| Laurence | Fishburn | laurence@domain2.com |
| Hugo     | Weaving  | hugo@domain4.com     |
+----------+----------+----------------------+

What about the opposite? If I would like to find out all the employees that are not also customers I can simply invert the two queries as follows:

mysql> select name, surname, email
-> from employees
-> except
-> select name, surname, email
-> from customers;
+--------+------------+--------------------+
| name   | surname    | email              |
+--------+------------+--------------------+
| Gloria | Foster     | gloria@domain1.com |
| Joe    | Pantoliano | joe@domain2.com    |
| Marcus | Chong      | marcus@domain4.com |
+--------+------------+--------------------+

What we’ve got is shown in the following diagram:

 

Combine clauses to cover another case

Let’s suppose we would like to find out the list of customers and employees to match the following diagram. So, customers and employees that don’t appear in both tables.

MySQL does not have a specific clause for that, but we can implement it by combining the available ones. We can find what we need by calculating the UNION of two EXCEPTs. Let’s look at the query:

mysql> (select name, surname, email from customers 
-> except 
-> select name, surname, email from employees) 
-> union 
-> (select name, surname, email from employees 
-> except 
-> select name, surname, email from customers) 
-> order by surname;
+----------+------------+----------------------+
| name     | surname    | email                |
+----------+------------+----------------------+
| Marcus   | Chong      | marcus@domain4.com   |
| Laurence | Fishburn   | laurence@domain2.com |
| Gloria   | Foster     | gloria@domain1.com   |
| Joe      | Pantoliano | joe@domain2.com      |
| Keanu    | Reeves     | keanu@domain1.com    |
| Hugo     | Weaving    | hugo@domain4.com     |
+----------+------------+----------------------+

As an alternative, we can calculate it with the EXCEPT between the UNION and the INTERSECT. It is basically the difference between the full merged data and the intersection. Easier to see than explain it:

mysql> (select name, surname, email from customers 
-> union 
-> select name, surname, email from employees) 
-> except 
-> (select name, surname, email from customers 
-> intersect 
-> select name, surname, email from employees) 
-> order by surname;
+----------+------------+----------------------+
| name     | surname    | email                |
+----------+------------+----------------------+
| Marcus   | Chong      | marcus@domain4.com   |
| Laurence | Fishburn   | laurence@domain2.com |
| Gloria   | Foster     | gloria@domain1.com   |
| Joe      | Pantoliano | joe@domain2.com      |
| Keanu    | Reeves     | keanu@domain1.com    |
| Hugo     | Weaving    | hugo@domain4.com     |
+----------+------------+----------------------+

The two queries return the same result.

Keep in mind an important thing when combining multiple operators. My suggestion is to use parentheses to split the logic for the set theory operation you have in mind, as you can see in the examples. Without parenthesized expressions, you have to consider that INTERSECT acts before UNION and EXCEPT. This may change the logic of your query. So, use parentheses to simplify your work.

Note:

These queries can be rewritten using Common Table Expressions (CTE). This may result in a faster query when dealing with a larger data set. An equivalent query is as follows:

mysql> with
-> cte_cust as (select name, surname, email from customers),
-> cte_emp as (select name, surname, email from employees)
-> (select * from cte_cust except select * from cte_emp)
-> union
-> (select * from cte_emp except select * from cte_cust)
-> order by surname;
+----------+------------+----------------------+
| name     | surname    | email                |
+----------+------------+----------------------+
| Marcus   | Chong      | marcus@domain4.com   |
| Laurence | Fishburn   | laurence@domain2.com |
| Gloria   | Foster     | gloria@domain1.com   |
| Joe      | Pantoliano | joe@domain2.com      |
| Keanu    | Reeves     | keanu@domain1.com    |
| Hugo     | Weaving    | hugo@domain4.com     |
+----------+------------+----------------------+

If you are not familiar with CTE, please have a look at the article Introduction to MySQL 8.0 Common Table Expressions that I wrote to get more details.

Conclusion

The new INTERSECT and EXCEPT clauses have been finally deployed in MySQL 8.0.31. Together with the well-known UNION, you can now implement your set theory logic reducing the number of queries. That’s awesome.

Use parenthesized expressions if you have to combine multiple clauses at once. This will help a lot.

Keep in mind that it is possible to write equivalent queries with JOINs instead of using UNION, INTERSECT, and EXCEPT. In some cases, a query using JOINs may be more efficient but it could be less readable. In other cases, it is possible that with UNION and other clauses the query could run faster (in particular if you have multiple OR conditions splitting them with multi UNIONs could lead to a faster query). The goal of this article was just to introduce the new features, not to test the performance. In the real world, remember to always rely on the EXPLAIN function to check the execution plan of your queries to find out the optimal one.

At the time of this post, Percona Server for MySQL 8.0.31 has not been deployed yet. It’s on the way. For now, you can test the new features on MySQL Community 8.0.31.

https://dev.mysql.com/doc/refman/8.0/en/union.html

https://dev.mysql.com/doc/refman/8.0/en/intersect.html

https://dev.mysql.com/doc/refman/8.0/en/except.html

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com