Dec
05
2017
--

MySQL 8.0 Window Functions: A Quick Taste

Window Functions

Window FunctionsIn this post, we’ll briefly look at window functions in MySQL 8.0.

One of the major features coming to MySQL 8.0 is the support of Window functions. The detailed documentation is already available here Window functions. I wanted to take a quick look at the cases where window functions help.

Probably one the most frequent limitations in MySQL SQL syntax was analyzing a dataset. I tried to find the answer to the following question: “Find the Top N entries for each group in a grouped result.”

To give an example, I will refer to this request on Stackoverflow. While there is a solution, it is hardly intuitive and portable.

This is a popular problem, so databases without window support try to solve it in different ways. For example, ClickHouse introduced a special extension for LIMIT. You can use LIMIT n BY m to find “m” entries per group.

This is a case where window functions come in handy.

As an example, I will take the IMDB database and find the TOP 10 movies per century (well, the previous 20th and the current 21st).To download the IMDB dataset, you need to have to have an AWS account and download data from S3 storage (the details are provided on IMDB page).

I will use the following query with MySQL 8.0.3:

SELECT primaryTitle,century*100,rating,genres,rn as `Rank` FROM (SELECT primaryTitle,startYear div 100 as century,rating,genres, RANK() OVER (PARTITION BY startYear div 100 ORDER BY rating desc) rn FROM title,ratings WHERE title.tconst=ratings.tconst AND titleType='movie' AND numVotes>100000) t1 WHERE rn<=10 ORDER BY century,rating desc

The main part of this query is RANK() OVER (PARTITION BY startYear div 100 ORDER BY rating desc), which is the mentioned window function. PARTITION BY divides rows into groups, ORDER BY specifies the order and RANK() calculates the rank using the order in the specific group.

The result is:

+---------------------------------------------------+-------------+--------+----------------------------+------+
| primaryTitle                                      | century*100 | rating | genres                     | Rank |
+---------------------------------------------------+-------------+--------+----------------------------+------+
| The Shawshank Redemption                          |        1900 |    9.3 | Crime,Drama                |    1 |
| The Godfather                                     |        1900 |    9.2 | Crime,Drama                |    2 |
| The Godfather: Part II                            |        1900 |      9 | Crime,Drama                |    3 |
| 12 Angry Men                                      |        1900 |    8.9 | Crime,Drama                |    4 |
| The Good, the Bad and the Ugly                    |        1900 |    8.9 | Western                    |    4 |
| Schindler's List                                  |        1900 |    8.9 | Biography,Drama,History    |    4 |
| Pulp Fiction                                      |        1900 |    8.9 | Crime,Drama                |    4 |
| Star Wars: Episode V - The Empire Strikes Back    |        1900 |    8.8 | Action,Adventure,Fantasy   |    8 |
| Forrest Gump                                      |        1900 |    8.8 | Comedy,Drama,Romance       |    8 |
| Fight Club                                        |        1900 |    8.8 | Drama                      |    8 |
| The Dark Knight                                   |        2000 |      9 | Action,Crime,Drama         |    1 |
| The Lord of the Rings: The Return of the King     |        2000 |    8.9 | Adventure,Drama,Fantasy    |    2 |
| The Lord of the Rings: The Fellowship of the Ring |        2000 |    8.8 | Adventure,Drama,Fantasy    |    3 |
| Inception                                         |        2000 |    8.8 | Action,Adventure,Sci-Fi    |    3 |
| The Lord of the Rings: The Two Towers             |        2000 |    8.7 | Action,Adventure,Drama     |    5 |
| City of God                                       |        2000 |    8.7 | Crime,Drama                |    5 |
| Spirited Away                                     |        2000 |    8.6 | Adventure,Animation,Family |    7 |
| Interstellar                                      |        2000 |    8.6 | Adventure,Drama,Sci-Fi     |    7 |
| The Intouchables                                  |        2000 |    8.6 | Biography,Comedy,Drama     |    7 |
| Gladiator                                         |        2000 |    8.5 | Action,Adventure,Drama     |   10 |
| Memento                                           |        2000 |    8.5 | Mystery,Thriller           |   10 |
| The Pianist                                       |        2000 |    8.5 | Biography,Drama,Music      |   10 |
| The Lives of Others                               |        2000 |    8.5 | Drama,Thriller             |   10 |
| The Departed                                      |        2000 |    8.5 | Crime,Drama,Thriller       |   10 |
| The Prestige                                      |        2000 |    8.5 | Drama,Mystery,Sci-Fi       |   10 |
| Like Stars on Earth                               |        2000 |    8.5 | Drama,Family               |   10 |
| Whiplash                                          |        2000 |    8.5 | Drama,Music                |   10 |
+---------------------------------------------------+-------------+--------+----------------------------+------+
27 rows in set (0.19 sec)

The previous century was dominated by “The Godfather” and the current one by “The Lord of the Rings”. While we may or may not agree with the results, this is what the IMDB rating tells us.
If we look at the result set, we can see that there are actually more than ten movies per century, but this is how function RANK() works. It gives the same RANK for rows with an identical rating. And if there are multiple rows with the same rating, all of them will be included in the result set.

I welcome the addition of window functions into MySQL 8.0. This definitely simplifies some complex analytical queries. Unfortunately, complex queries still will be single-threaded — this is a performance limiting factor. Hopefully, we can see multi-threaded query execution in future MySQL releases.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com