Oct
30
2019
--

Understanding Hash Joins in MySQL 8

In MySQL 8.0.18 there is a new feature called Hash Joins, and I wanted to see how it works and in which situations it can help us. Here you can find a nice detailed explanation of how it works under the hood.

The high-level basics are the following: if there is a join, it will create an in-memory hash table based on one of the tables and will read the other table row by row, calculate a hash, and do a lookup on the in-memory hash table.
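That build-and-probe flow can be sketched in a few lines of Python; the rows and column names here are made up for illustration, not MySQL's actual implementation:

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of row dicts using an in-memory hash table."""
    # Build phase: hash one input (ideally the smaller) on its join column.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: read the other input row by row and look up matches.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}

t1 = [{"id": 1, "c2": 10}, {"id": 2, "c2": 20}]
t2 = [{"t2_id": 5, "c2": 10}, {"t2_id": 6, "c2": 30}]
result = list(hash_join(t1, t2, "c2", "c2"))
# Only the c2=10 rows match, so result holds a single joined row.
```

Each probe-side row costs one hash lookup rather than an index or table scan, which is the whole appeal.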

Great, but does this give us any performance benefits?

First of all, hash joins are only used on fields that are not indexed, which means an immediate table scan, and we usually do not recommend joins without indexes because they are slow. This is where Hash Join in MySQL can help, because it uses an in-memory hash table instead of a Nested Loop.

Let’s do some tests and see. First I created the following tables:

CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;

CREATE TABLE `t2` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;

I have inserted 131072 random rows into both tables.

mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 131072   |
+----------+

First Test – Hash Joins

Run a join based on c2 which is not indexed.

mysql> explain format=tree select count(*) from t1 join t2 on t1.c2 = t2.c2\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Inner hash join (t2.c2 = t1.c2)  (cost=1728502115.04 rows=1728488704)
        -> Table scan on t2  (cost=0.01 rows=131472)
        -> Hash
            -> Table scan on t1  (cost=13219.45 rows=131472)

1 row in set (0.00 sec)

We have to use explain format=tree to see whether Hash Join will be used or not, as normal explain still says it is going to be a Nested Loop, which I think is very misleading. I have already filed a bug report because of this, and in the ticket you can see comments from the developers saying:

The solution is to stop using traditional EXPLAIN (it will eventually go away).

So this is not going to be fixed in traditional explain, and we should start using the new format.

Back to the query; we can see it is going to use Hash Join for this query, but how fast is it?

mysql> select count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (0.73 sec)

0.73 seconds for a join that produces more than 17 million rows. Looks promising.

Second Test – Non-Hash Joins

We can disable it with an optimizer switch or optimizer hint.

mysql> select /*+ NO_HASH_JOIN (t1,t2) */ count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (13 min 36.36 sec)

Now the same query takes more than 13 minutes. That is a huge difference and we can see Hash Join helps a lot here.

Third Test – Joins Based on Indexes

Let’s create indexes and see how fast a join based on indexes is.

create index idx_c2 on t1(c2);
create index idx_c2 on t2(c2);

mysql> select count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (2.63 sec)

2.6 seconds. Hash Join is even faster than the index-based join in this case.

However, I was able to force the optimizer to use Hash Joins even if an index is available by using ignore index:

mysql> explain format=tree select count(*) from t1 ignore index (idx_c2) join t2 ignore index (idx_c2) on t1.c2 = t2.c2 where t1.c2=t2.c2\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Inner hash join (t2.c2 = t1.c2)  (cost=1728502115.04 rows=17336898)
        -> Table scan on t2  (cost=0.00 rows=131472)
        -> Hash
            -> Table scan on t1  (cost=13219.45 rows=131472)

1 row in set (0.00 sec)

I still think it would be nice if I could tell the optimizer with a hint to use Hash Join even if an index is available, so we would not have to ignore indexes on all the tables. I have created a feature request for this.

However, if you read my first bug report carefully you can see a comment from a MySQL developer which indicates this might not be necessary:

BNL (Block Nested-Loop) will also go away entirely at some point, at which point this hint will be ignored.

That could mean they are planning to remove BNL joins in the future, perhaps replacing them with Hash Join.

Limitations

We can see Hash Join can be powerful, but there are limitations:

  • As I mentioned, it only works on columns that do not have indexes (or you have to ignore them).
  • It only works with equi-join conditions.
  • It does not work with LEFT or RIGHT JOIN.

I would also like to see a status counter to monitor how many times Hash Join was used; for this, I filed another feature request.

Conclusion

Hash Join seems a very powerful new join option, and we should keep an eye on it, because I would not be surprised if we get more related features in the future. In theory it could handle LEFT and RIGHT joins as well, and as we can see in the comments on the bug report, Oracle has plans for it in the future.

May
30
2012
--

A case for MariaDB’s Hash Joins

MariaDB 5.3/5.5 introduced a new join type, "Hash Join", which is an implementation of the classic block-based hash join algorithm. In this post we will see what the Hash Join is, how it works, and for which types of queries it is the right choice. I will show the results of benchmarks for different queries and explain them, so that you have a better understanding of when using the Hash Join is best and when it is not. Although Hash Joins have been available since MariaDB 5.3, I will be running my benchmarks on the newer MariaDB 5.5.

Overview

Hash Join is a new algorithm introduced in MariaDB 5.3/5.5 that can be used for joining tables that have equijoin conditions of the form tbl1.col1 = tbl2.col1, etc. As I mentioned above, what is actually implemented is the classic hash join, but it is known as Block Nested Loop Hash (BNLH) Join in MariaDB.
The classic hash join algorithm consists of two phases, a build phase and a probe phase. Consider the case of joining two tables on an equijoin condition. The first step is to designate the smaller of the two tables as the left operand and the bigger one as the right operand. When the algorithm begins, the first phase is the build phase, in which a hash table is created over the join attributes and rows of the left operand. Next comes the probe phase, in which the matching rows from the right operand are found by scanning the right operand and, for each row scanned, performing a lookup in the hash table using the values of the columns participating in the equijoin condition. The hash table is accessed by applying a hash function to the values of the join condition, and hence is quite efficient.

But what about the restriction on the size of the hash table? The size of the hash table is limited by the value of join_buffer_size, so if the left operand is big enough that the hash table built on it exceeds join_buffer_size, multiple hash tables are created. For example, if the left operand has "n" rows and its size is three times the value of join_buffer_size, then 3 hash tables are created, each built on n/3 rows. Both the build and probe phases are then done three times, which means the right operand is scanned three times.

Wikipedia has a nicely simplified version of the Classic Hash Join algorithm (http://en.wikipedia.org/wiki/Hash_join#Classic_hash_join) which I will quote below for better understanding:

  1. For each tuple r in the build input R
    1. Add to the in-memory hash table
    2. If the size of the hash table equals the maximum in-memory size:
      1. Scan the probe input S, and add matching join tuples to the output relation
      2. Reset the hash table
  2. Do a final scan of the probe input S and add the resulting join tuples to the output relation
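The steps above can be sketched with a deliberately tiny "join buffer", showing why a build input larger than the buffer forces repeated scans of the probe input (illustrative Python under those assumptions, not MariaDB's code):

```python
def block_hash_join(build_rows, probe_scan, key, max_buffer_rows):
    """Classic (block) hash join: if the build input exceeds the buffer,
    hash it chunk by chunk and re-scan the probe input once per chunk."""
    joined, scans = [], 0
    for start in range(0, len(build_rows), max_buffer_rows):
        # Build phase: hash one buffer-sized chunk of the left operand.
        table = {}
        for row in build_rows[start:start + max_buffer_rows]:
            table.setdefault(row[key], []).append(row)
        # Probe phase: scan the right operand, looking up each row's key.
        scans += 1
        for prow in probe_scan():
            for brow in table.get(prow[key], []):
                joined.append((brow, prow))
    return joined, scans

build = [{"k": i % 4} for i in range(9)]   # 9 build rows, buffer holds 3
probe = [{"k": 0}, {"k": 1}]
joined, n_scans = block_hash_join(build, lambda: iter(probe), "k", 3)
# A 3-row buffer against 9 build rows means 3 scans of the probe input.
```

This is exactly the n/3 example from the text: three buffer loads, three scans of the right operand.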

Now, after the explanation of the hash join, let's see how it performs for different test cases.

Benchmarks

For the purpose of the benchmarks I used the DBT3 dataset of scale factor 2, which means the total dataset size is 4.8G. Let me show the breakdown of dataset size by the tables that I have used in the benchmarks:

Table ‘lineitem’: 3.8G
Table ‘supplier’: 11M
Table ‘orders’: 468M

I have benchmarked two different kinds of workloads, IO bound and in-memory. The benchmark on the IO bound workload was performed with a buffer pool size of 1G, while the benchmark on the in-memory workload was performed with a buffer pool size of 6G. The benchmarks compare the Block Nested Loop (BNL) Join of MySQL 5.5.24, the Batched Key Access (BKA) Join of MySQL 5.6.5, and the Block Nested Loop Hash (BNLH) Join of MariaDB 5.5.20. The configurations used with the three variants of MySQL are listed below.

Configuration

Let’s first take a look at the configuration used with different MySQL flavors.

MySQL 5.5.24 Configuration
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_log_file_size=512M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT

query_cache_size=0
query_cache_type=0

MySQL 5.6.5 Configuration
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_log_file_size=512M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT

query_cache_size=0
query_cache_type=0

optimizer_switch='index_condition_pushdown=on'
optimizer_switch='mrr=on'
optimizer_switch='mrr_cost_based=off'
read_rnd_buffer_size=32M
optimizer_switch='batched_key_access=on'
join_buffer_size=32M

MariaDB 5.5.20 Configuration
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_log_file_size=512M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT

query_cache_size=0
query_cache_type=0

optimizer_switch='index_condition_pushdown=on'

optimizer_switch='mrr=on'
optimizer_switch='mrr_sort_keys=on'
optimizer_switch='mrr_cost_based=off'
mrr_buffer_size=32M

optimizer_switch='join_cache_incremental=on'
optimizer_switch='join_cache_hashed=on'
optimizer_switch='join_cache_bka=on'
join_cache_level=4
join_buffer_size=32M
join_buffer_space_limit=32M

Note that MariaDB includes a new variable, join_cache_level, which controls which join algorithms are allowed; a value of 4 means that the Nested Loop Join and Hash Join algorithms are allowed. As we know, join_buffer_size controls the size of the join buffer allocated for each join in a query; MariaDB introduces another variable, join_buffer_space_limit, which caps the total buffer size for the whole query. By default it has a value of 1024*128*10, which means that your effective join_buffer_size is never bigger than this value. Hence the reason I have set join_buffer_space_limit=32M.
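The default value mentioned above works out to about 1.25M, which is why it has to be raised alongside join_buffer_size:

```python
# Default join_buffer_space_limit as given above: 1024 * 128 * 10 bytes.
default_space_limit_bytes = 1024 * 128 * 10
default_space_limit_mb = default_space_limit_bytes / (1024 * 1024)
# 1310720 bytes, i.e. 1.25M -- far below the 32M join_buffer_size used in
# these benchmarks, so without raising the limit the 32M buffer is never used.
```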

Benchmark Machine Specs

The machine that I used for the benchmarks is a dual-core machine with the following CPU configuration: 2x Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz. The amount of memory installed is 8G, and the MySQL datadir is on a 4-disk software RAID5 volume; the disks are 5.4K RPM. The filesystem used is XFS, and the OS installed is CentOS 6.2.

Table Structure

Before moving on, let’s take a look at the structure of the tables involved in the benchmark tests.

CREATE TABLE `supplier` (
  `s_suppkey` int(11) NOT NULL DEFAULT '0',
  `s_name` char(25) DEFAULT NULL,
  `s_address` varchar(40) DEFAULT NULL,
  `s_nationkey` int(11) DEFAULT NULL,
  `s_phone` char(15) DEFAULT NULL,
  `s_acctbal` decimal(10,2) DEFAULT NULL,
  `s_comment` varchar(101) DEFAULT NULL,
  PRIMARY KEY (`s_suppkey`),
  KEY `i_s_nationkey` (`s_nationkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `lineitem` (
  `l_orderkey` int(11) NOT NULL DEFAULT '0',
  `l_partkey` int(11) DEFAULT NULL,
  `l_suppkey` int(11) DEFAULT NULL,
  `l_linenumber` int(11) NOT NULL DEFAULT '0',
  `l_quantity` decimal(10,2) DEFAULT NULL,
  `l_extendedprice` decimal(10,2) DEFAULT NULL,
  `l_discount` decimal(10,2) DEFAULT NULL,
  `l_tax` decimal(10,2) DEFAULT NULL,
  `l_returnflag` char(1) DEFAULT NULL,
  `l_linestatus` char(1) DEFAULT NULL,
  `l_shipDATE` date DEFAULT NULL,
  `l_commitDATE` date DEFAULT NULL,
  `l_receiptDATE` date DEFAULT NULL,
  `l_shipinstruct` char(25) DEFAULT NULL,
  `l_shipmode` char(10) DEFAULT NULL,
  `l_comment` varchar(44) DEFAULT NULL,
  PRIMARY KEY (`l_orderkey`,`l_linenumber`),
  KEY `i_l_shipdate` (`l_shipDATE`),
  KEY `i_l_suppkey_partkey` (`l_partkey`,`l_suppkey`),
  KEY `i_l_partkey` (`l_partkey`,`l_quantity`,`l_shipmode`,`l_shipinstruct`),
  KEY `i_l_suppkey` (`l_suppkey`),
  KEY `i_l_receiptdate` (`l_receiptDATE`),
  KEY `i_l_orderkey` (`l_orderkey`),
  KEY `i_l_orderkey_quantity` (`l_orderkey`,`l_quantity`),
  KEY `i_l_commitdate` (`l_commitDATE`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `orders` (
  `o_orderkey` int(11) NOT NULL DEFAULT '0',
  `o_custkey` int(11) DEFAULT NULL,
  `o_orderstatus` char(1) DEFAULT NULL,
  `o_totalprice` decimal(10,2) DEFAULT NULL,
  `o_orderDATE` date DEFAULT NULL,
  `o_orderpriority` char(15) DEFAULT NULL,
  `o_clerk` char(15) DEFAULT NULL,
  `o_shippriority` int(11) DEFAULT NULL,
  `o_comment` varchar(79) DEFAULT NULL,
  PRIMARY KEY (`o_orderkey`),
  KEY `i_o_orderdate` (`o_orderDATE`),
  KEY `i_o_custkey` (`o_custkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Test Cases

Now let’s see the test cases and then see how the joins perform for each test case.

Test Case A – Join a small table that fits in memory to a large table with no WHERE clause

The SQL used for this test together with its EXPLAIN output as returned by MySQL 5.5 is as follows:

SELECT s_nationkey, l_shipmode, count(*)
FROM supplier INNER JOIN lineitem ON s_suppkey = l_suppkey
GROUP BY s_nationkey, l_shipmode;

+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+----------------------------------------------+
| id | select_type | table    | type  | possible_keys | key           | key_len | ref                     | rows  | filtered | Extra                                        |
+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+----------------------------------------------+
|  1 | SIMPLE      | supplier | index | PRIMARY       | i_s_nationkey | 5       | NULL                    | 20266 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | lineitem | ref   | i_l_suppkey   | i_l_suppkey   | 5       | dbt3.supplier.s_suppkey |   250 |   100.00 | Using where                                  |
+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+----------------------------------------------+

And the results in seconds of time taken to complete the above query:

First thing to note is that I have scaled down the time taken by MySQL 5.5 to finish the query on the IO bound workload so that it could fit in the chart; in actuality the query took 32077 seconds to finish. We can clearly see from the chart that Hash Join comprehensively beats BKA and BNL; hash join is perfect in cases like this, where you are joining a small table with a very large table and there are no 'indexed where' conditions on the big table. BNLH takes half the time to complete the query for the in-memory workload, 6.6x less time as compared to BKA in MySQL 5.6, and 965x less time as compared to BNL in MySQL 5.5. So hash join gives us an improvement by a very large factor for both the IO bound and the in-memory workload.

Test Case B – Join a small table that fits in memory to a large table with a selective WHERE clause on an indexed column

The SQL used for this test together with its EXPLAIN output as returned by MySQL 5.5 is as follows:

SELECT s_nationkey, l_shipmode, count(*)
FROM supplier INNER JOIN lineitem ON s_suppkey = l_suppkey
WHERE s_nationkey='24'
GROUP BY s_nationkey, l_shipmode; 

+----+-------------+----------+------+-----------------------+---------------+---------+-------------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table    | type | possible_keys         | key           | key_len | ref                     | rows | filtered | Extra                                                     |
+----+-------------+----------+------+-----------------------+---------------+---------+-------------------------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | supplier | ref  | PRIMARY,i_s_nationkey | i_s_nationkey | 5       | const                   |  808 |   100.00 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | lineitem | ref  | i_l_suppkey           | i_l_suppkey   | 5       | dbt3.supplier.s_suppkey |  292 |   100.00 | Using where                                               |
+----+-------------+----------+------+-----------------------+---------------+---------+-------------------------+------+----------+-----------------------------------------------------------+

And the results in seconds of time taken to complete the above query:

First thing to note is that I have scaled down the time taken by MySQL 5.5 to finish the query on the IO bound workload so that it could fit in the chart; in actuality the query took 1280 seconds. In this test Hash Join is not ideal, because the highly selective WHERE clause reduces the size of the joining data set. Hash join actually performs badly here, taking 7x more time for the in-memory workload. For the IO bound workload, hash join takes 53x less time to execute the query as compared to MySQL 5.5, but slightly more time as compared to the BKA algorithm of MySQL 5.6.

Test Case C – Join a small table with a large table with a WHERE clause on a non-indexed column

The SQL used for this test together with its EXPLAIN output as returned by MySQL 5.5 is as follows:

SELECT s_nationkey, l_shipmode, count(*)
FROM supplier INNER JOIN lineitem ON s_suppkey = l_suppkey
WHERE l_shipmode='AIR'
GROUP BY s_nationkey, l_shipmode;

+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+-------------+
| id | select_type | table    | type  | possible_keys | key           | key_len | ref                     | rows  | filtered | Extra       |
+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+-------------+
|  1 | SIMPLE      | supplier | index | PRIMARY       | i_s_nationkey | 5       | NULL                    | 20174 |   100.00 | Using index |
|  1 | SIMPLE      | lineitem | ref   | i_l_suppkey   | i_l_suppkey   | 5       | dbt3.supplier.s_suppkey |   270 |   100.00 | Using where |
+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+-------------+

And the results in seconds of time taken to complete the above query:

First thing to note is that I have scaled down the time taken by MySQL 5.5 to finish the query on the IO bound workload so that it could fit in the chart; in actuality the query took 31654 seconds. Again hash join beats BKA and BNL comprehensively. Hash Join outperforms the other join types when you are joining a small table with a very large table with a WHERE clause on a non-indexed column. In this test we can clearly see that Hash Join gives a large reduction in query time: for the IO bound workload the reduction is 1266x compared to MySQL 5.5 and 9x compared to MySQL 5.6, while for the in-memory workload the reduction is 3.5x compared to both MySQL 5.5 and MySQL 5.6.

Another interesting thing to note is that for both Test B and Test C, Hash Join takes a similar amount of time for the IO bound workload and the in-memory workload. Why? Because Hash Join implies scanning the table lineitem (the right operand) in both test cases. Since in Test B we have a limited set of rows in the supplier table (left operand) to join to the lineitem table (right operand), scanning the lineitem table (BNLH) proves costly as compared to doing batched index lookups (BKA). However, in Test C the cost of the hash join remains the same while the cost of BKA increases, as many more random index lookups need to be performed because of the increase in the number of rows to be joined from the supplier table (left operand).

Test Case D – Join a large data set (>1M rows) from one table with a large table

The SQL used for this test together with its EXPLAIN output as returned by MySQL 5.5 is as follows:

SELECT o.*, count(*) as num_items
FROM orders AS o INNER JOIN lineitem AS l ON o_orderkey=l_orderkey
WHERE o_orderdate > '1996-05-01' GROUP BY o_orderkey
ORDER BY num_items DESC LIMIT 10;

+----+-------------+-------+-------+--------------------------------------------+---------+---------+-------------------+---------+----------+----------------------------------------------+
| id | select_type | table | type  | possible_keys                              | key     | key_len | ref               | rows    | filtered | Extra                                        |
+----+-------------+-------+-------+--------------------------------------------+---------+---------+-------------------+---------+----------+----------------------------------------------+
|  1 | SIMPLE      | o     | index | PRIMARY,i_o_orderdate                      | PRIMARY | 4       | NULL              | 2993459 |    50.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | l     | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4       | dbt3.o.o_orderkey |       1 |   100.00 | Using index                                  |
+----+-------------+-------+-------+--------------------------------------------+---------+---------+-------------------+---------+----------+----------------------------------------------+

And the results in seconds of time taken to complete the above query:

Here we can clearly see that MySQL 5.5 beats both BKA of MySQL 5.6 and Hash Join of MariaDB 5.5. In the IO bound test, MySQL 5.5 takes 2.5x less time to complete the query as compared to the MySQL 5.6 BKA algorithm, and 6.6x less time as compared to MariaDB 5.5 Hash Join; Hash Join also performs worse than BKA, taking 2.5x more time. For the in-memory workload test, MySQL 5.5 takes 5x less time as compared to MySQL 5.6 and 13x less time as compared to MariaDB 5.5 Hash Join.

First thing to note is that the above query reads 1/3 of the rows in the table orders (left operand), so MySQL 5.5 prefers to do a PRIMARY index scan of the table orders, resulting in sequential IO, while both MySQL 5.6 and MariaDB 5.5 prefer an index range scan on the secondary key o_orderdate, which results in random scans of the PK to fetch the columns that are not part of the secondary key. Even though MySQL 5.6 uses MRR to offset the effect of random access of the PK, it still proves costly. Also note that the table lineitem is joined by the column l_orderkey, which is the left-most PK column, so reading the table orders in PK order has the additional benefit of implying that the table lineitem is read in PK order as well. These benefits are why MySQL 5.5 wins.

But why does Hash Join take so much more time? The reason is that the rows to be read from the left operand, the table orders, are far greater than the size of the join buffer. The join buffer is 32M, while the size of the left operand is 186M, which means roughly 6 scans of the right operand, the table lineitem. Hence hash join is slow in this case: we have to refill the join buffer with rows from the orders table many times, and hash join is not good when many scans of the right operand (here, the table lineitem) are needed.
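The buffer-refill arithmetic from the paragraph above can be checked with the sizes quoted there:

```python
import math

# Sizes from the post: rows read from orders (left operand) vs. join buffer.
left_operand_mb = 186
join_buffer_mb = 32

# Each refill of the join buffer implies one full scan of lineitem.
right_operand_scans = math.ceil(left_operand_mb / join_buffer_mb)
# 186 / 32 = 5.8..., so roughly 6 scans of the right operand are needed.
```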

Another difference with the query in this test case is that, while with the queries in the previous test cases the joining key from the table supplier matched approximately 600 rows in the table lineitem for each distinct key value, in Test D the joining key from the table orders matches approximately 5 rows in the table lineitem for each distinct key value. Also, the joining key in Test D is the PK in one table and the left-most part of the PK in the other.

How does the optimizer work with the different join algorithms available?

Currently, the part of the optimizer that is responsible for choosing the join algorithm for a particular query and QEP is not advanced enough, and there is still work to be done. As I understand it, the MariaDB folks are working on a cost-based choice for all joins. It is not easy, because the current costing model is primitive and must be enhanced to support the existence of different join algorithms. So what does that mean for MariaDB/MySQL users right now? With the current state of the optimizer, you have to manually enable and disable the join algorithms it can choose from.
In MariaDB, every algorithm has a number given to it:
1 – flat BNL
2 – incremental BNL
3 – flat BNLH
4 – incremental BNLH
5 – flat BKA
6 – incremental BKA
7 – flat BKAH
8 – incremental BKAH

The variable join_cache_level controls which algorithms are enabled: if join_cache_level=4, all algorithms numbered 1 to 4 are enabled; if join_cache_level=8, all algorithms numbered 1 to 8 are enabled. The optimizer is naive in the sense that it always uses the highest-numbered enabled join algorithm: if join_cache_level=4 it always uses BNLH (hash join), and if join_cache_level=8 it always uses BKAH (a variant of BKA). The optimizer does not try to check which algorithm is the best one to use; it just assumes that the algorithm with the highest numeric value is the best.
So we can force the join algorithm by setting an appropriate value of "join_cache_level". For example, in my tests I forced the optimizer to use hash join by setting join_cache_level=4. We can establish rules for when certain join algorithms work best and then select the algorithm by making use of the variable "join_cache_level".
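The selection rule described above can be expressed as a small sketch; the numbering comes from the list above, and the function is illustrative of the described behavior, not MariaDB's code:

```python
# Join algorithm numbering in MariaDB, as listed in the text.
JOIN_ALGORITHMS = {
    1: "flat BNL",
    2: "incremental BNL",
    3: "flat BNLH",
    4: "incremental BNLH",
    5: "flat BKA",
    6: "incremental BKA",
    7: "flat BKAH",
    8: "incremental BKAH",
}

def chosen_algorithm(join_cache_level):
    """All algorithms numbered <= join_cache_level are enabled; the optimizer
    naively uses the highest-numbered one."""
    enabled = [n for n in JOIN_ALGORITHMS if n <= join_cache_level]
    return JOIN_ALGORITHMS[max(enabled)]
```

So join_cache_level=4 always yields BNLH (hash join) and join_cache_level=8 always yields BKAH, exactly as described.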

Conclusion

Based on the above information and the benchmark results for the different test cases, we can see where Hash Joins work best and where they do not. First of all, hash joins only work for equijoins. Hash joins work best when you are joining very big tables with no WHERE clause, or with a WHERE clause on a non-indexed column. They also provide a big improvement in query response time when you are joining tables with no indexes on the join condition (full join). The best performance with Hash Join is achieved when the left table fits completely in the join buffer, or when the fewest buffer refills are needed, as each buffer refill means a scan of the right-side table. However, hash joins do not outperform BNL or BKA when you are joining a really small subset of rows, as scanning the right-side table then becomes costly in comparison. Block Nested Loop Join performs better than Hash Join when you are joining two tables on a PK column such that both tables are read in PK order. One use case that I can think of for hash joins is data warehouse applications that need to run reporting queries joining lookup tables, which tend to be small. What use cases can you think of?

Apr
04
2012
--

Join Optimizations in MySQL 5.6 and MariaDB 5.5

This is the third blog post in the series leading up to the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. This post targets the join-related optimizations introduced in the optimizer. These optimizations are available in both MySQL 5.6 and MariaDB 5.5, and MariaDB 5.5 introduces some additional optimizations which we will also look at in this post.

Now let me briefly explain these optimizations.

Batched Key Access

Traditionally, MySQL has always used Nested Loop Join to join two or more tables. What this means is that the selected rows from the first table participating in the join are read, and then for each of these rows an index lookup is performed on the second table. This means many point queries: for example, if table1 yields 1000 rows, then 1000 index lookups are performed on table2. This can mean a lot of random lookups if the dataset does not fit into memory or the caches are not warm, and it also does not take into account locality of access, as you could be reading the same page multiple times. If you have a small buffer pool and a very large number of rows from table2 that do not fit into the buffer pool, then in the worst case you could be reading the same page into the buffer pool many times.
Considering this drawback, the Batched Key Access (BKA) optimization was introduced. When BKA is used, after the selected rows are read from table1, the values of the indexed columns that will be used to perform lookups on table2 are batched together and sent to the storage engine. The storage engine then uses the MRR interface (which I explained in my previous post) to look up the rows in table2. So we have traded many point index lookups for one or more index range lookups. This means MySQL can employ many other optimizations: for example, if columns other than the secondary key columns need to be fetched, the MRR interface can instead access the PK by sorting the secondary key values in PK-column order and then performing the lookup; there are other possibilities too, like InnoDB doing read-ahead after noticing the sequential access pattern.
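The batching idea can be sketched as follows; the index, rows, and helper names are hypothetical, and the MRR machinery is reduced to simply sorting the keys before lookup:

```python
def batched_key_access(outer_rows, key, index_lookup, batch_size):
    """Sketch of the BKA idea: instead of one index lookup per outer row,
    collect a batch of join keys, sort them so the engine touches the index
    in order, and look them up together."""
    results = []
    keys = [row[key] for row in outer_rows]
    for start in range(0, len(keys), batch_size):
        batch = sorted(keys[start:start + batch_size])  # key-ordered access
        for k in batch:
            results.extend(index_lookup(k))
    return results

# Hypothetical inner "table" indexed by the join key.
inner_index = {1: ["a"], 2: ["b"], 3: ["c", "d"]}
outer = [{"k": 3}, {"k": 1}, {"k": 2}]
rows = batched_key_access(outer, "k",
                          lambda k: inner_index.get(k, []), batch_size=3)
# Keys are looked up in sorted order: 1, 2, 3.
```

Sorting the batch is what turns scattered point lookups into a sweep over the index, which is where the sequential-access benefits come from.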
BKA is available in both MySQL 5.6 and MariaDB 5.5. You can read more about BKA in MySQL 5.6 here and BKA in MariaDB 5.5 here.

However, MariaDB 5.5 has one additional optimization that is used in conjunction with BKA and that is MRR Key-ordered Scan. Let’s see what this optimization actually is.

Key-ordered Scan for BKA

With this optimization the idea of MRR is further extended to improve join performance. As described above, when table t1 is joined to table t2, the selected rows from t1 are read and then, for each row, an index lookup is performed on t2. Here is where key-ordered scan comes in: with BKA turned on, the index lookups on t2 are batched together and sent to the MRR interface. When the MRR interface performs the lookup on t2 by a secondary key on the common join column, let's say col1, then with key-ordered scan the lookup keys are first sorted by col1, i.e. in index order, before the lookups are performed. So key-ordered scan is basically an extension of MRR to secondary keys. You can read more about Key-ordered Scan for BKA here.

Now there is one additional join optimization in MariaDB that I would like to quickly explain here, and that is Hash Joins.

Hash Join

As I have mentioned before, MySQL has supported only one join algorithm, Nested Loop Join. MariaDB introduces a new join algorithm, Hash Join, which only works with equi-joins. Let me briefly explain how the hash join algorithm works. Suppose you have two tables t1 and t2 that you wish to join in an equi-join. First a hash table is created from the columns of table t1 that are to be used to join rows with table t2; this step is called the build step. After the hash table has been created, rows from table t2 are read, a hash function is applied to the columns participating in the join condition, and a lookup is performed on the hash table we created earlier; this step is known as the probe step. This join algorithm works best for highly selective queries, and obviously when the first operand used in the build step is such that the hash table fits in memory. You can read more about the hash join algorithm here.

Now let’s move on to the benchmarks, to see the difference in numbers.

Benchmark results

For the purpose of this benchmark, I used TPC-H Query #3 and ran it on a TPC-H dataset (InnoDB tables) with a Scale Factor of 2 (InnoDB dataset size ~5G). Note that the query cache is disabled during these benchmark runs, and that the disks are four 5.4K RPM disks in software RAID 5.

Also note that the following changes were made on MySQL 5.6 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='batched_key_access=on'
join_buffer_size=6M
read_rnd_buffer_size=6M

Also note that the following changes were made on MariaDB 5.5 config:

optimizer_switch='index_condition_pushdown=off'
optimizer_switch='mrr=on'
optimizer_switch='mrr_sort_keys=on'
optimizer_switch='mrr_cost_based=off'
optimizer_switch='join_cache_incremental=on'
optimizer_switch='join_cache_hashed=on'
optimizer_switch='join_cache_bka=on'
join_cache_level=8
join_buffer_size=6M
mrr_buffer_size=6M

Note that I have turned off ICP because I would like to see the effect of BKA on the query times in isolation, but I also had to turn on MRR because BKA uses the MRR interface. Note also the additional optimizer switches and variables in the case of MariaDB 5.5. You can read more about these variables here.

The query used is:

select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
from
        customer,
        orders,
        lineitem FORCE INDEX (i_l_orderkey)
where
        c_mktsegment = 'AUTOMOBILE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < '1995-03-09'
        and l_shipdate > '1995-03-09'
group by
        l_orderkey,
        o_orderdate,
        o_shippriority
order by
        revenue desc,
        o_orderdate
LIMIT 10;

In-memory workload

Now let’s see how effective the join optimizations are when the workload fits entirely in memory. For the purpose of benchmarking the in-memory workload, the InnoDB buffer pool size is set to 6G and the buffer pool was warmed up, so that the relevant pages were already loaded into the buffer pool. Note that, as mentioned at the start of the benchmark results section, the InnoDB dataset size is ~5G. Now let’s take a look at the graph:

In the graph above, the labels are as follows:
MySQL 5.6 (2) is MySQL 5.6 w/ buffer size 6M
MariaDB 5.5 (2) is MariaDB 5.5 w/ buffer size 6M
MariaDB 5.5 (3) is MariaDB 5.5 w/ buffer size 6M (Hash Join and Key-ordered Scan disabled)

You can see that the lowest query time is for MySQL 5.6, which takes 0.16s less than MySQL 5.5. With join_buffer_size set to 6M and read_rnd_buffer_size set to 6M, the query time for MySQL 5.6 becomes approximately equal to that of MySQL 5.5. MariaDB 5.5 is quite slow compared to both MySQL 5.5 and MySQL 5.6. For MariaDB 5.5, the query time decreases when join_buffer_size is set to 6M and mrr_buffer_size is set to 6M, while the lowest query time for MariaDB 5.5 is achieved when both Hash Join and Key-ordered Scan are disabled.

IO bound workload

Now let’s see how effective the join optimizations are when the workload is IO bound. For the purpose of benchmarking the IO bound workload, the InnoDB buffer pool size is set to 1G and the buffer pool was not warmed up, so the relevant pages were not already loaded. Now let’s take a look at the graph:

The labels in the graph above are same as in the graph for in-memory workload.

BKA makes a huge difference when the workload is IO bound: the query time drops from 2534.41s down to under a minute. MySQL 5.6 has the smallest query time when join_buffer_size is set to 6M and read_rnd_buffer_size is set to 6M. MariaDB 5.5 is close, yet slower by ~10s. Another important point is that just increasing join_buffer_size is not going to provide the best possible query performance. As I mentioned at the start, BKA uses the MRR interface, so you should also increase the size of the buffer that MRR uses appropriately. In my tests, with only join_buffer_size increased to 6M the query time was ~300s, while with both join_buffer_size and read_rnd_buffer_size/mrr_buffer_size set to 6M the query time dropped to ~40s. So the maximum possible benefit comes when both buffers are sized appropriately. Also, in the chart above the bottom two bars correspond to MariaDB with Hash Join and Key-ordered Scan enabled and disabled respectively, and the difference in query time is only 48.78s vs 48.91s, so I don’t see Hash Join and Key-ordered Scan making much of a difference here.

Now let’s take a look at the status counters.

In the table below, "6M buffers" means join_buffer_size=6M and read_rnd_buffer_size=6M for MySQL 5.6, or join_buffer_size=6M and mrr_buffer_size=6M for MariaDB 5.5; the last column is MariaDB 5.5 with Hash Join disabled, join_buffer_size=4M and mrr_buffer_size=4M.

| Counter Name | MySQL 5.5 | MySQL 5.6 | MySQL 5.6, 6M buffers | MariaDB 5.5 | MariaDB 5.5, 6M buffers | MariaDB 5.5, Hash Join disabled, 4M buffers |
|---|---|---|---|---|---|---|
| Created_tmp_disk_tables | 0 | 0 | 0 | 0 | 0 | 0 |
| Created_tmp_tables | 1 | 1 | 1 | 1 | 1 | 1 |
| Handler_mrr_init | N/A | 112 | 3 | 1133 | 1 | 1 |
| Handler_mrr_key_refills | N/A | N/A | N/A | 0 | 0 | 0 |
| Handler_mrr_rowid_refills | N/A | N/A | N/A | 22 | 0 | 0 |
| Handler_read_key | 1829910 | 4440511 | 4372096 | 1801917 | 1790641 | 1805995 |
| Handler_read_next | 2651500 | 2628103 | 2586036 | 2628103 | 2592816 | 2615612 |
| Handler_read_rnd_next | 23078 | 22780 | 22757 | 22867 | 23049 | 23221 |
| Innodb_buffer_pool_read_ahead | 1152 | 23231 | 130919 | 23228 | 130731 | 131497 |
| Innodb_buffer_pool_read_requests | 12947073 | 10228611 | 7816154 | 10251697 | 9319281 | 9393396 |
| Innodb_buffer_pool_reads | 327963 | 332092 | 12889 | 335080 | 14067 | 13384 |
| Innodb_data_read | 5G | 5.4G | 2.2G | 5.4G | 2.2G | 2.2G |
| Innodb_data_reads | 329115 | 355323 | 143808 | 335526 | 16164 | 15506 |
| Innodb_pages_read | 329115 | 355323 | 143808 | 358308 | 144798 | 144881 |
| Innodb_rows_read | 4127318 | 6718351 | 6611675 | 6707882 | 5479445 | 5527245 |
| Select_scan | 1 | 1 | 1 | 1 | 1 | 1 |
| Sort_scan | 1 | 1 | 1 | 1 | 1 | 1 |

The first obvious improvement is the high Innodb_buffer_pool_read_ahead numbers when the buffers are sized appropriately, which shows that the IO access pattern has become sequential. The two other most important numbers are the values of Innodb_buffer_pool_reads and Innodb_data_read. With appropriately sized buffers, fewer Innodb_buffer_pool_reads are done and less data is read from disk; in fact, half the amount of data is read from disk, 2.2G vs 5G. However, there is one number in MariaDB 5.5 that is quite large compared to MySQL 5.6, and that is Handler_mrr_init. Could this be why the query on MariaDB 5.5 is slow compared to MySQL 5.6? The next interesting thing is the last two columns of the table above and the values of Handler_read_key, Handler_read_next, and Handler_read_rnd_next: MariaDB 5.5 with Hash Join enabled does fewer Handler reads than with Hash Join disabled, but that did not make much of a difference to the query times.

Conclusion

BKA improves the query time by a huge margin for the IO bound workload but does not make much of a difference for the in-memory workload. BKA relies on both join_buffer_size and read_rnd_buffer_size/mrr_buffer_size, and both of these buffers should be increased appropriately for the best possible performance gain. Neither the MariaDB manual nor the MySQL manual makes this entirely clear: read_rnd_buffer_size/mrr_buffer_size affects MRR performance, and since BKA uses the MRR interface, these buffers indirectly impact BKA performance as well. I did not find much of a performance improvement from Hash Join in MariaDB 5.5 or from Key-ordered Scan for TPC-H query #3; in fact, disabling both provided the best MariaDB 5.5 result for the in-memory workload.

I intend to run tests to see what specific types of queries would benefit from Hash Join as compared to Nested Loop Join, but for now it looks like Nested Loop Join is a much better general purpose join algorithm.
