May
30
2012
--

A case for MariaDB’s Hash Joins

MariaDB 5.3/5.5 has introduced a new join type “Hash Joins” which is an implementation of a Classic Block-based Hash Join Algorithm. In this post we will see what the Hash Join is, how it works and for what types of queries would it be the right choice. I will show the results of executing benchmarks for different queries and explain the results so that you have a better understanding of when using the Hash Join will be best and when not. Although Hash Joins are available since MariaDB 5.3, but I will be running my benchmarks on the newer MariaDB 5.5.

Overview

Hash Join is a new algorithm introduced in MariaDB 5.3/5.5 that can be used for joining tables that have a equijoin conditions of the form tbl1.col1 = tbl2.col1, etc. As I mentioned above that what is actually implemented is the Classic Hash Join. But its known as Block Nested Loop Hash (BNLH) Join in MariaDB.
The Classic Hash Join Algorithm consists of two phases, a build phase and a probe phase. Let’s consider the case of joining two tables on a equijoin condition. So the first thing would be to designate the smallest of the two tables as the left operand and the other table which is bigger, to be the right operand. Now when the algorithm begins, the first phase is the build phase, in which a hash table is created over the join attributes and rows of the left operand. Next comes the probe phase, which is where the matching rows from the right operand are found, by scanning the right operand and for each row scanned performing a lookup in the hash table by using values of the columns participating in the equijoin condition. The hash table is accessed by using a hash function on the values of the join condition, and hence is quite efficient. But what about the restriction on the size of the hash table. The size of the hash table is restricted by the value of join_buffer_size, and so if the left operand is big such that the size of the hash table built on it is greater than the join_buffer_size, then multiple hash tables would be created. For example if the left operand has “n” rows, and its size is three times the value of join_buffer_size, then 3 hash tables would need to be created each containing a hash table on n/3 rows. And so both the build and probe phase would be done three times, which means that the right operand will be scanned thrice.

Wikipedia has a nicely simplified version of the Classic Hash Join algorithm (http://en.wikipedia.org/wiki/Hash_join#Classic_hash_join) which I will quote below for better understanding:

  1. For each tuple r in the build input R
    1. Add to the in-memory hash table
    2. If the size of the hash table equals the maximum in-memory size:
      1. Scan the probe input S, and add matching join tuples to the output relation
      2. Reset the hash table
  2. Do a final scan of the probe input S and add the resulting join tuples to the output relation

Now after the explanation of the hash join lets see how it performs for different test cases.

Benchmarks

For the purpose of the benchmarks I used the DBT3 dataset of scale factor 2, which means the total dataset size is 4.8G. Let me show the breakdown of dataset size by the tables that I have used in the benchmarks:

Table ‘lineitem’: 3.8G
Table ‘supplier’: 11M
Table ‘orders’: 468M

I have benchmarked two different kinds of workloads, IO bound and in-memory. Benchmark on IO bound workload was performed with a buffer pool size of 1G, while benchmark on in-memory workload was performed with a buffer pool size of 6G. The benchmarks compare Block Nested Loop (BNL) Join of MySQL 5.5.24, Batched Key Access (BKA) Join of MySQL 5.6.5 and Block Nested Loop Hash (BNLH) Join of MariaDB 5.5.20. The configuration used with the three variants of MySQL are listed below.

Configuration

Let’s first take a look at the configuration used with different MySQL flavors.

MySQL 5.5.24 Configuration
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_log_file_size=512M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT

query_cache_size=0
query_cache_type=0

MySQL 5.6.5 Configuration
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_log_file_size=512M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT

query_cache_size=0
query_cache_type=0

optimizer_switch='index_condition_pushdown=on'
optimizer_switch='mrr=on'
optimizer_switch='mrr_cost_based=off'
read_rnd_buffer_size=32M
optimizer_switch='batched_key_access=on'
join_buffer_size=32M

MariaDB 5.5.20 Configuration
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_log_file_size=512M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT

query_cache_size=0
query_cache_type=0

optimizer_switch='index_condition_pushdown=on'

optimizer_switch='mrr=on'
optimizer_switch='mrr_sort_keys=on'
optimizer_switch='mrr_cost_based=off'
mrr_buffer_size=32M

optimizer_switch='join_cache_incremental=on'
optimizer_switch='join_cache_hashed=on'
optimizer_switch='join_cache_bka=on'
join_cache_level=4
join_buffer_size=32M
join_buffer_space_limit=32M

Note that MariaDB includes a new variable ‘join_cache_level‘, this variable controls which Join Algorithms are allowed to be used, a value of 4 here means that Nested Loop Join and Hash Join algorithms are allowed. Now as well know that ‘join_buffer_size‘ controls the size of the join buffer allocated for each join in a query, MariaDB introduces another variable to control the size of the buffer ‘join_buffer_space_limit‘. This variable controls the maximum allowed size of the buffer for the whole query. By default it has a value of 1024*128*10, which means that your effective join_buffer_size is not bigger than this value. Hence, the reason I have set join_buffer_space_limit=32M.

Benchmark Machine Specs

The machine that I used for the benchmarks, is a dual core machine with the following CPU configuration: 2xIntel(R) Core(TM)2 CPU 6600 @ 2.40GHz. The amount of memory installed is 8G and the MySQL datadir is on a 4-disk Software RAID5 volume, the disks are 5.4K RPM disks. The filesystem used is XFS, and the OS installed is Centos 6.2

Table Structure

Before moving on, let’s take a look at the structure of the tables involved in the benchmark tests.

CREATE TABLE `supplier` (
  `s_suppkey` int(11) NOT NULL DEFAULT '0',
  `s_name` char(25) DEFAULT NULL,
  `s_address` varchar(40) DEFAULT NULL,
  `s_nationkey` int(11) DEFAULT NULL,
  `s_phone` char(15) DEFAULT NULL,
  `s_acctbal` decimal(10,2) DEFAULT NULL,
  `s_comment` varchar(101) DEFAULT NULL,
  PRIMARY KEY (`s_suppkey`),
  KEY `i_s_nationkey` (`s_nationkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `lineitem` (
  `l_orderkey` int(11) NOT NULL DEFAULT '0',
  `l_partkey` int(11) DEFAULT NULL,
  `l_suppkey` int(11) DEFAULT NULL,
  `l_linenumber` int(11) NOT NULL DEFAULT '0',
  `l_quantity` decimal(10,2) DEFAULT NULL,
  `l_extendedprice` decimal(10,2) DEFAULT NULL,
  `l_discount` decimal(10,2) DEFAULT NULL,
  `l_tax` decimal(10,2) DEFAULT NULL,
  `l_returnflag` char(1) DEFAULT NULL,
  `l_linestatus` char(1) DEFAULT NULL,
  `l_shipDATE` date DEFAULT NULL,
  `l_commitDATE` date DEFAULT NULL,
  `l_receiptDATE` date DEFAULT NULL,
  `l_shipinstruct` char(25) DEFAULT NULL,
  `l_shipmode` char(10) DEFAULT NULL,
  `l_comment` varchar(44) DEFAULT NULL,
  PRIMARY KEY (`l_orderkey`,`l_linenumber`),
  KEY `i_l_shipdate` (`l_shipDATE`),
  KEY `i_l_suppkey_partkey` (`l_partkey`,`l_suppkey`),
  KEY `i_l_partkey` (`l_partkey`,`l_quantity`,`l_shipmode`,`l_shipinstruct`),
  KEY `i_l_suppkey` (`l_suppkey`),
  KEY `i_l_receiptdate` (`l_receiptDATE`),
  KEY `i_l_orderkey` (`l_orderkey`),
  KEY `i_l_orderkey_quantity` (`l_orderkey`,`l_quantity`),
  KEY `i_l_commitdate` (`l_commitDATE`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `orders` (
  `o_orderkey` int(11) NOT NULL DEFAULT '0',
  `o_custkey` int(11) DEFAULT NULL,
  `o_orderstatus` char(1) DEFAULT NULL,
  `o_totalprice` decimal(10,2) DEFAULT NULL,
  `o_orderDATE` date DEFAULT NULL,
  `o_orderpriority` char(15) DEFAULT NULL,
  `o_clerk` char(15) DEFAULT NULL,
  `o_shippriority` int(11) DEFAULT NULL,
  `o_comment` varchar(79) DEFAULT NULL,
  PRIMARY KEY (`o_orderkey`),
  KEY `i_o_orderdate` (`o_orderDATE`),
  KEY `i_o_custkey` (`o_custkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Test Cases

Now let’s see the test cases and then see how the joins perform for each test case.

Test Case A – Join a small table that fits in memory to a large table with no WHERE clause

The SQL used for this test together with its EXPLAIN output as returned by MySQL 5.5 is as follows:

SELECT s_nationkey, l_shipmode, count(*)
FROM supplier INNER JOIN lineitem ON s_suppkey = l_suppkey
GROUP BY s_nationkey, l_shipmode;

+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+----------------------------------------------+
| id | select_type | table    | type  | possible_keys | key           | key_len | ref                     | rows  | filtered | Extra                                        |
+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+----------------------------------------------+
|  1 | SIMPLE      | supplier | index | PRIMARY       | i_s_nationkey | 5       | NULL                    | 20266 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | lineitem | ref   | i_l_suppkey   | i_l_suppkey   | 5       | dbt3.supplier.s_suppkey |   250 |   100.00 | Using where                                  |
+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+----------------------------------------------+

And the results in seconds of time taken to complete the above query:

First thing to note is that I have scaled down the time taken by MySQL 5.5 to finish the query on IO bound workload so that it could fit well in the chart, in actuality the query took 32077 seconds to finish in the IO bound workload. Anyhow we can clearly see from the above chart that Hash join comprehensively beats BKA and BNL, hash join is perfect in these cases where you are joining a small table with a very large table with no ‘indexed where’ conditions on the big table. BNLH takes half the time to complete the query for in-memory workload and 6.6x less times as compared to BKA MySQL 5.6, and 965x less time as compared to BNL MySQL 5.5. So hash join gives us an improvement by a very large factor both for IO bound workload and in-memory workload.

Test Case B – Join a small table that fits in memory to a large table with a selective WHERE clause on an indexed column

The SQL used for this test together with its EXPLAIN output as returned by MySQL 5.5 is as follows:

SELECT s_nationkey, l_shipmode, count(*)
FROM supplier INNER JOIN lineitem ON s_suppkey = l_suppkey
WHERE s_nationkey='24'
GROUP BY s_nationkey, l_shipmode; 

+----+-------------+----------+------+-----------------------+---------------+---------+-------------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table    | type | possible_keys         | key           | key_len | ref                     | rows | filtered | Extra                                                     |
+----+-------------+----------+------+-----------------------+---------------+---------+-------------------------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | supplier | ref  | PRIMARY,i_s_nationkey | i_s_nationkey | 5       | const                   |  808 |   100.00 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | lineitem | ref  | i_l_suppkey           | i_l_suppkey   | 5       | dbt3.supplier.s_suppkey |  292 |   100.00 | Using where                                               |
+----+-------------+----------+------+-----------------------+---------------+---------+-------------------------+------+----------+-----------------------------------------------------------+

And the results in seconds of time taken to complete the above query:

First thing to note is that I have scaled down the time taken by MySQL 5.5 to finish the query on IO bound workload so that it could fit well in the chart, in actuality the query took 1280 seconds to finish in the IO bound workload. In this test Hash join is not ideal because you have a highly selective where clause that reduces the size of the joining data set. And hash join performs even badly and takes 7x more time for in-memory workload in this test case. While for IO bound workload, hash join takes 53x less time to execute the query as compared to MySQL 5.5 but takes slightly more time as compared to BKA algorithm of MySQL 5.6.

Test Case C – Join a small table with a large table with a WHERE clause on a non-indexed column

The SQL used for this test together with its EXPLAIN output as returned by MySQL 5.5 is as follows:

SELECT s_nationkey, l_shipmode, count(*)
FROM supplier INNER JOIN lineitem ON s_suppkey = l_suppkey
WHERE l_shipmode='AIR'
GROUP BY s_nationkey, l_shipmode;

+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+-------------+
| id | select_type | table    | type  | possible_keys | key           | key_len | ref                     | rows  | filtered | Extra       |
+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+-------------+
|  1 | SIMPLE      | supplier | index | PRIMARY       | i_s_nationkey | 5       | NULL                    | 20174 |   100.00 | Using index |
|  1 | SIMPLE      | lineitem | ref   | i_l_suppkey   | i_l_suppkey   | 5       | dbt3.supplier.s_suppkey |   270 |   100.00 | Using where |
+----+-------------+----------+-------+---------------+---------------+---------+-------------------------+-------+----------+-------------+

And the results in seconds of time taken to complete the above query:

First thing to note is that I have scaled down the time taken by MySQL 5.5 to finish the query on IO bound workload so that it could fit well in the chart, in actuality the query took 31654 seconds to finish in the IO bound workload. Again here hash join beats BKA and BNL comprehensively. Hash join outperforms the other join types when you are joining a small table with a very large table with a where clause on a ‘non-indexed’ column In this test we can clearly see that Hash Join gives a lot of reduction in query time. The reduction in query time for IO bound workload is 1266x times when compared to MySQL 5.5 and 9x times when compared to MySQL 5.6. While for in-memory workload the reduction is query time is 3.5x when compared to both MySQL 5.5 and MySQL 5.6.

Another interesting thing to note is that for both Test B and Test C, Hash Join takes similar amount of time both for IO bound workload and in-memory workload. Why, because Hash Join implies scanning the table lineitem (right operand) in both test cases. Since in Test B we have a limited set of rows in the supplier table (left operand) to join to the lineitem table (right operand) so scanning the lineitem table (BNLH) proves to be costly as compared to doing batched index lookups (BKA). However, in Test C the cost of hash join remains the same but the cost of BKA increases, as there are going to be a lot more random index lookups needed to be performed because of the increase in the number of rows needed to be joined from supplier table (left operand).

Test Case D – Join a large data set (>1M rows) from one table with a large table

The SQL used for this test together with its EXPLAIN output as returned by MySQL 5.5 is as follows:

SELECT o.*, count(*) as num_items
FROM orders AS o INNER JOIN lineitem AS l ON o_orderkey=l_orderkey
WHERE o_orderdate > '1996-05-01' GROUP BY o_orderkey
ORDER BY num_items DESC LIMIT 10;

+----+-------------+-------+-------+--------------------------------------------+---------+---------+-------------------+---------+----------+----------------------------------------------+
| id | select_type | table | type  | possible_keys                              | key     | key_len | ref               | rows    | filtered | Extra                                        |
+----+-------------+-------+-------+--------------------------------------------+---------+---------+-------------------+---------+----------+----------------------------------------------+
|  1 | SIMPLE      | o     | index | PRIMARY,i_o_orderdate                      | PRIMARY | 4       | NULL              | 2993459 |    50.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | l     | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4       | dbt3.o.o_orderkey |       1 |   100.00 | Using index                                  |
+----+-------------+-------+-------+--------------------------------------------+---------+---------+-------------------+---------+----------+----------------------------------------------+

And the results in seconds of time taken to complete the above query:

Here we can clearly see that MySQL 5.5 beats both BKA of MySQL 5.6 and Hash Join of MariaDB 5.5. In IO bound test MySQL 5.5 takes 2.5x less time to complete the query as compared to MySQL 5.6 BKA algorithm, and takes 6.6x less time as compared to MariaDB 5.5 Hash Join, Hash Join also performs worse as compared to BKA and takes 2.5x more time. While for in-memory workload test, MySQL 5.5 takes 5x less time as compared to MySQL 5.6 and 13x less time as compared to MariaDB 5.5 Hash Join. First thing to note is that the above query would be reading 1/3 the number of rows in the table orders (left operand), and so MySQL 5.5 prefers to do a PRIMARY index scan of the table orders resulting in sequential IO, while both MySQL 5.6 and MariaDB 5.5 prefer to do an index range scan on the secondary key o_orderdate which results in random scans of the PK to fetch the columns that are not part of the secondary key. Even though MySQL 5.6 uses MRR to offset the effect of random access of PK, even then it proves to be costly. Also note that the table lineitem, is joined by the column l_orderkey which is the left-most PK column, so reading the table orders in PK order has another benefit that it implies reading the table lineitem in PK order. Hence, these benefits mean MySQL 5.5 wins. But why does Hash Join take so much more time. The reason is that the rows needed to be read from the left operand which is the table orders are far greater than the size of the join buffer. The size of the join buffer is 32M, while the size of the left operand is 186M which means roughly 6 scans of the right operand which is the table lineitem. Hence the reason why hash join is slow in this case, because we have to refill the join buffer with rows from orders table many times, and hash join is not that good if you need many scans of the right operand (in this case the table lineitem).

Another difference with the query in this test case is that, while with the queries in previous test cases, the joining key from the table supplier would match approximately 600 rows from the table lineitem for each distinct key value, in this test case D, the joining key from the table orders would match approximately 5 rows from the table lineitem for each distinct key value. Also the joining key in this test case D is PK in one table and left-most part of the PK in the second table.

How does optimizer work with the different Join Algorithms available?

Currently, the part of the optimizer that is responsible for choosing the join algorithm for a particular query and QEP is not advanced enough and there is work to be done yet. As I understand it MariaDB folks are working on the cost-based choice for any joins. It’s not easy because the current costing model is primitive and must be enhanced to support the possibility of existence of different join algorithms. So what does that mean to MariaDB/MySQL users right now with the state of the current optimizer. Right now you would have to manually enable and disable the join algorithms for the optimizer to choose from.
In MariaDB, every algorithm has a number given to it:
1 – flat BNL
2 – incremental BNL
3 – flat BNLH
4 – incremental BNLH
5 – flat BKA
6 – incremental BKA
7 – flat BKAH
8 – incremental BKAH

The variable join_cache_level controls which algorithms are enabled. If join_cache_level=4 all algorithms numbered 1 to 4 are enabled, if join_cache_level=8, all algorithms numbered 1 to 8 are enabled. Optimizer is naive in the sense that it always uses the max values join algorithm. If join_cache_level=4 it always uses BNLH (hash join), if join_cache_level=8 it always uses BKAH (a variant of BKA). Optimizer does not try to check which algorithm is the best one to use, it just assumes that the algorithm with the highest numeric value is the best one.
So we can force the join algorithm used by setting appropriate values of “join_cache_level”. For example in my test I forced the optimizer to use hash join by setting join_cache_level=4. We can set certain rules for which certain join algorithms are best and then use that algorithm by making use of the variable “join_cache_level”.

Conclusion

Based on the above information and the benchmark results for different test cases, we can see where Hash Joins work best and where they don’t. First of all Hash joins only work for equijoins. Hash join work best when you are joining very big tables with no WHERE clause, or a WHERE clause on a non-indexed column. They also provide big improvement in query response time when you are joining tables with no indexes on the join condition (Full Join). The best performance with Hash Join can be achieved when the left table can fit completely in the join buffer, or when the least amount of buffer refills are needed, as each buffer refill means a scan of the right-side table. However, Hash joins do not outperform BNL or BKA when you are joining a really small subset of rows, as then scanning the right-side table becomes costly in comparison. Block Nested Loop Join would perform better than Hash Join when you are joining two tables on a PK column such that both tables are read in PK order. One use case that I can think of for hash joins is data warehouse applications that need to run reporting queries that need to join on lookup tables which tend to be small mostly. What use cases can you think of?

May
30
2012
--

Data compression in InnoDB for text and blob fields

Have you wanted to compress only certain types of columns in a table while leaving other columns uncompressed? While working on a customer case this week I saw an interesting problem where a table had many heavily utilized TEXT fields with some read queries exceeding 500MB (!!), and stored in a 100GB table. In this case we were not allowed to make any query or application logic changes so we chose to implement the Barracuda file format and utilize compressed rows as this appealed to me for this mostly-read application. One quick way you can see if your rows will benefit from compression would be to read Peter Zaitsev’s blog post and execute:

SELECT AVG(LENGTH((`colTextField`)) FROM `t1` WHERE `id` < 1000

compare this to:

SELECT AVG(LENGTH(COMPRESS(`colTextField`))) FROM `t1` WHERE `id` < 1000

In our case we saw about a 75% reduction when the TEXT field was compressed which we felt indicated there would be a benefit derived from table compression.

With the original InnoDB Antelope file format you have the choice of ROW_FORMAT=COMPACT and ROW_FORMAT=REDUNDANT where InnoDB stored the first 768 bytes of variable length columns (BLOB, VARCHAR, TEXT) in the index record, and the remainder stored in overflow pages.  COMPACT became the default after MySQL 5.0.3 and has a more compact representation for nulls and variable-length fields than REDUNDANT.

Using InnoDB’s new Barracuda file format (available since InnoDB plugin 1.1 or MySQL 5.5) you can now leverage table compression by specifying ROW_FORMAT=COMPRESSED.  In our case we only wanted MySQL to try to move the larger (greater than 16KB) TEXT fields  off-page so we utilized the KEY_BLOCK_SIZE=16 directive.  This means that each TEXT / BLOB field that exceeds 16KB it would be stored in it’s own page (less the 20 byte pointer stored in the index page).  Based on our analysis 75% of the blobs stored in the table were over 8KB, which were responsible for 90% of space usage hence compressing only externally stored blobs provided substaintial advantages.  Why did we choose a KEY_BLOCK_SIZE that is the same value of the InnoDB page size of 16KB?  As the fine MySQL manual states:

This setting may still be useful for tables with many long BLOBVARCHAR or TEXT columns, because such values often do compress well, and might therefore require fewer “overflow” pages.

I did not test with a smaller KEY_BLOCK_SIZE as we had minimal time to effect the table compression modification (given the long run-time of the ALTER TABLE), you may find your application benefits from a different KEY_BLOCK_SIZE value.  Also note that you need to enable the  dynamic variable innodb_file_format=BARRACUDA (don’t forget to set it in my.cnf!):

SET GLOBAL innodb_file_format=BARRACUDA;

One caveat: you must be running with innodb_file_per_table=1 as the InnoDB system tablespace cannot be compressed, see this page for further details on how to enable compression for a table.

To utilize Barracuda format tables you will need to create them new and migrate data, or affect existing tables with an ALTER TABLE statement. As table compression is table specific, ROW_FORMAT and KEY_BLOCK_SIZE directives are passed via CREATE TABLE or ALTER TABLE statements. In our case, we chose to re-build the table using ALTER TABLE via a null-operation like this:

ALTER TABLE `t1` ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;

In our case even though the customer had a 3GHz 24-core machine the ALTER TABLE was progressing slowly as it was bound to a single CPU while compressing the data. Just have patience. :) Keep in mind too that if you started with a 100GB table and assuming you know your approximate compression rate, you will be left with a considerably smaller on-disk footprint so ideally you will be able to postpone that purchase of additional disk capacity.

So what was the real-world outcome of this exercise?  We were able to show a 70% improvement in queries against this table when the TEXT fields were not part of the query request due to Barracuda  not storing 768 bytes of the blob on field, and reduce the table down to 30GB.  Happy customer :)

One parting idea: you may be able to leverage pt-online-schema-change from Percona Toolkit 2.1 in order to modify the table if you cannot sustain the blocking effects of a traditional ALTER TABLE statement.

I hope this helps you understand a use case where table compression can be beneficial when your workload is mostly-read. Thanks for reading my first mysqlperformanceblog.com blog post!

May
30
2012
--

Symfony 2.0.15 released

Symfony 2.0.15 has just been released.

This version mainly fixes the PEAR packages and some Composer configuration
issues. Doctrine has been updated to 2.1.7 and Twig to 1.8.2.

The CHANGELOG
has all the details about the changes and you can even have a look at the full
diff.

If you are starting a new project, you can get the Symfony Standard Edition
distribution on the download page.

If you already have a project based on the Symfony Standard Edition 2.0.x, you
can easily upgrade to 2.0.15 by getting the new
deps and
deps.lock
files.

Then, run the vendors script (it also clears your cache):

$ ./bin/vendors install

Remember that the Symfony2 Components are also available as standalone
libraries. You can get them via their dedicated read-only repositories on
Github (https://github.com/symfony/Finder for instance) or install them via
Composer.


Be trained by Symfony experts
2012-07-02 Paris
2012-07-06 Paris
2012-07-09 Paris

Written by in: Zend Developer |
May
30
2012
--

Percona at SELF

There is a great conference – SELF (South East Linux Fest) taking place in June 8-9 in Charlotte,NC
This conference has a great lineup for MySQL with 2 tracks on Friday dedicated to it. There are many great speakers both from Oracle and community on the list.
From Percona I will speak about Optimizing MySQL Configuration and Baron will talk about Forecasting MySQL Scalability from TCP Traffic.
See you there !

May
30
2012
--

Security Release: symfony 1.4.18 released

symfony 1.4.18 has just been released. Read the post carefully as this version
fixes a security vulnerability.

Dmitri Groutso contacted us a couple of days ago about a possible security
issue in the session code:

“The regenerate() method as implemented by database backed session classes do
not persist the current session data from request memory before regenerating
session ID, leaving shadow copy in the database as it was at the beginning of
the request (still authenticated in the “logout” case). Passing to
$destroy=true to regenerate mitigates the attack, by explicitly removing
shadow copy.”

His
patch
has been applied in the 1.4.18 release.

Here are all the changes from the
CHANGELOG:

  • [33466] fixed a possible DB session fixation attack (patch from Dmitri Groutso)
  • [33373] fixed test browser click function does not handle css selector without [ or ] (closes #9982, patch from mouette)

If you’ve checked out a copy of the tag from Subversion you can switch to the
latest version:

$ svn switch http://svn.symfony-project.com/tags/RELEASE_1_4_18

If you are using the PEAR package you can update using the pear command:

$ pear upgrade symfony/symfony-1.4.18

And as always, don’t forget to clear your cache after upgrading.


Be trained by Symfony experts
2012-06-25 Köln
2012-06-25 Köln
2012-06-27 Köln

Written by in: Zend Developer |
May
29
2012
--

A week of symfony #282 (21->27 May 2012)

This week Form component continued its huge refactorization with hundreds of fixes, tweaks and changes (some of them breaking BC). In addition, Symfony Live San Francisco 2012 was announced and SensioLabs launched in United Kingdom.

Development mailing list

Symfony2 development highlights

Master branch:

  • 53aaf76:
    [Routing] removed unused ‘defaults’ property of Router
  • 2996340:
    [Form] extracted FormConfig class to simplify Form constructor
  • c2a243f:
    [Form] made PropertyPath deterministic: “[prop]” always refers to indices (array or ArrayAccess), “prop” always refers to properties
  • 860dd1f:
    [Form] adapted Form to create a deterministic property path by default
  • 2301b15:
    [Form] tightened PropertyPath validation to reject any empty value (such as false)
  • 5e87dd8:
    [Form] added tests for the case when “property_path” is null or false
  • 306324e:
    [Form] greatly improved the error mapping done in DelegatingValidationListener
  • 7a4ba52:
    [EventDispatcher] added UnmodifiableEventDispatcher class
  • bbffd1b:
    [Form] fixed index checks in PropertyPath classes
  • 0c09a0e:
    [Form] made $name parameters optional in PropertyPathBuilder:replaceBy(Index|Property)
  • c8b61d5:
    [Form] renamed FormMapping to MappingRule and moved some logic there to make rules more extendable
  • 215b687:
    [Form] added capability to process “.” rules in “error_mapping”
  • 59d6b55:
    [Form] fixed error mapping aborts if reaching an unsynchronized form
  • ac69394:
    [Form] allowed native framework errors to be mapped as well
  • a6b3902:
    added the possibility to translate the placeholder and title attributes in the PHP form templates
  • 517ae43:
    [Console] added an exception when an option name or shortcut is invalid
  • 0935964:
    [HttpFoundation] modified example for mod_rewrite to not add Authorization header if it is not set in the request
  • d1864c7:
    [Form] fixed virtual forms are ignored when prepopulating a form
  • bad6d04:
    [Form] added accessor FormConfigInterface::getByReference() and let Form clone objects if not by reference
  • 37a3a29:
    [OptionsResolver] optimized validation
  • bcf8cf9:
    [Process] refactored the Windows handling so it is always executed
  • 0af5f06:
    [OptionsResolver] added method setFilters() for augmenting the final option values
  • 97de004:
    [OptionsResolver] added option type validation capabilities
  • 027259e:
    [Form] changed getDefaultOptions() to setDefaultOptions(OptionsResolver $resolver) in FormTypeInterface
  • 0ef4066:
    [Form] options are now passed to buildView() and buildViewBottomUp()
  • 2cd99e8:
    [Form] added FormBuilderInterface and FormViewInterface and cleaned up FormTypeInterface and FormTypeExtensionInterface
  • dc2fa9d:
    [OptionsResolver] added OptionsResolverInterface
  • 8cae328:
    [Form] setDefaultOptions() is now coded against OptionsResolverInterface
  • bec8015:
    [Form] renamed client and application format to view and model format
  • 33fecca:
    [Form] merged various form events and added class FormEvent
  • 877d8f7:
    [Form] reversed the order of $type and $name in FormFactory::createNamed[Builder
  • 98a7c0c:
    [Form] consolidated FormInterface, FormBuilderInterface and FormViewInterface
  • 2e6cdd1:
    [Form] inverted the logic of “single_control” and renamed it to “compound”. The opposite is now “simple”.
  • ee803cd:
    [Form] renamed setVars() to addVars() in FormViewInterface
  • 8c23d7f:
    [Form] fixed “error_mapping” is not an attribute anymore, but an option
  • c688166:
    [Form] fixed form type translation_domain inheritance
  • 8308aea:
    [Config] added EnumNode

2.0.x branch:

  • 8223632:
    [HttpFoundation] fixed the UploadedFilename name sanitization
  • 55faa54:
    [Form] fixed invalid ‘type’ option in ValidatorTypeGuesser for Date/TimeFields
  • 7a85b43:
    [TwigBundle] fixed the path to templates when using composer
  • 8c6c86c:
    [FrameworkBundle] added unit tests for AddCacheWarmerPass class

Repository summary: 4,763 watchers (#1 in PHP, #30 overall) and 1249 forks (#1 in PHP, #12 overall).

They talked about us


Be trained by Symfony experts
2012-06-25 Köln
2012-06-25 Köln
2012-06-27 Köln

Written by in: Zend Developer |
May
29
2012
--

Towards Symfony 2.1

People are getting nervous about the release of Symfony 2.1, and this post
tries to give all the information I have at my disposal as of today.

First, I want to apologize for the lack of communication during the last few
weeks, but I wanted to have a solid plan before taking any decision and before
announcing anything official on this blog.

The discussion about the 2.1 release schedule started some time ago when I
asked the community about the best plan for Symfony 2.1 on the developers
mailing-list.
I suggested two possible options back then:

  • Wait for the form component to stabilize before releasing Symfony 2.1;

  • Release 2.1 as soon as possible (by reverting some changes made on the form
    component — and reintroduce them after the release).

The form component being one of the most used Symfony features and one that is
not declared as stable yet, its state is very important to take into account
before each release of the framework.

The plan I’m describing in this post has been taken thanks to the numerous
comments (almost 80 as of now) on the mailing-list thread I’ve mentioned
above. Thanks to all developers who contributed to the discussion.

So, based on the feedback from the community (the vast majority wanted to wait
instead of releasing earlier), and after many discussions on the state of the
form component with
Bernhard and
Victor, we chose recently
that the only viable option was to wait for the form to stabilize for the
following reasons:

  • The work needed to stabilize the form component is almost finished (all
    major backward compatibility breaks should be finished before Symfony Live
    in Paris);

  • Many bundles have adjusted to the form changes already, so reverting would
    have been a nightmare for these bundles;

  • Having less releases with backward compatibility breaks is better for the
    community (we have tried hard to keep BC whenever possible and to pack all
    major BC breaks for the form component in 2.1);

  • Documentation update would also have been more challenging as we would have
    to revert some doc changes as well if we had reverted some form changes;

  • Many blogs have already talked about the changes that occurred in forms for
    2.1, so that would be confusing as well if we had reverted some of them.

  • Reverting all form changes to avoid some of the above issues was not even
    an option as some BC breaks were done to fix major bugs in the form
    component;

  • Composer is used everywhere in Symfony 2.1 and even if it has been much
    more stable for the last couple of weeks, waiting a bit more will help the
    transition a lot.

So, the release schedule for Symfony 2.1 reads as follows:

  • First 2.1 beta release after the hacking day at Symfony Live Paris (we will
    try to work hard during the hacking day to stabilize things and see how we
    can make the update easier — if you have some big apps, please come and
    try to upgrade them with the help of Bernhard and other core team members,
    that will help us a lot);

  • First release candidate mid-July;

  • Final release at the end of August (releasing at the end of August means
    that Symfony 2.1 will be out just a year after Symfony 2.0).

What’s next? After the Symfony 2.1 release, we would like to release more
often. Having shorter release cycle means that we need less BC breaks. And as
all the major BC breaks for the form components will have been made in 2.1, it
will be much more easier to upgrade. So, expect a Symfony 2.2 release before
the end of 2012.

Releasing new versions is great, but having a long term support release is
even better. For Symfony 2.1, we have marked some features as deprecated
because they have been replaced with something better. That makes Symfony 2.1
more compatible with applications developed for Symfony 2.0 (less hard BC
breaks) and at the same time, it gives developers plenty of time to upgrade.
As deprecated features will be removed in Symfony 2.3, the 2.3 version will be the first
LTS release for Symfony2.

Last, but not the least, I want to talk a bit about how the Symfony community
works. Like any other Open-Source project, Symfony is made by passionate
developers and volunteers all around the world (more than 460 at the time of writing) and sometimes, some of the major
contributors cannot dedicate as much time to the project as they would want. This has impacts on our the release cycle.

Of course, getting paid to contribute to an Open-Source project is probably
the best way to ensure the best involvement on the long run, but that’s pretty
rare… I’m one of the few lucky ones to have this great status!

Today, I’m really happy to announce that
SensioLabs has decided to give more free time to Victor Berchet so that he
can dedicate more time to his work on the Symfony core; and SensioLabs has been sponsoring Bernhard Schussek‘s work on the
form and validator components for the last couple of months and it will continue to do so for the foreseeable future.


Be trained by Symfony experts
2012-06-25 Köln
2012-06-25 Köln
2012-06-27 Köln

Written by in: Zend Developer |
May
27
2012
--

Secure passwords being insecure

If you follow the general advices to create secure password the following ones seem to be secure, right?

  • s11P$||!sh&2
  • pr0&!!ke0
  • 3kj39|!381
  • The answer to the question is, “it depends on how you use them:)

    Notice that these passwords all contain multiple exclamation points and ampersands which are normally special characters for your shell. The people tend to copy and paste them directly to the terminal but that can lead to some non-predictable behavior and therefore cause big problems depending on the character combination.

    Let’s execute the previous examples:

    Login to mysql:

    root@debian:~# mysql -uroot -ps11P$||!sh&2
    mysql -uroot -ps11P$||shutdown -r now&2
    [1] 1758
    -bash: 2: command not found
    root@debian:~# ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)

    Broadcast message from root@debian (pts/0) (Sun May 27 13:14:34 2012):

    The system is going down for reboot NOW!

    || is an OR condition and bash tries to execute the last part !sh that points to the “shutdown -r now” command from the history.

    Ok, login to mysql is not very safe. We’re going now to check the replication data:

    root@debian:~# pt-table-checksum u=root,p=pr0&!!&ke0
    pt-table-checksum h=localhost,u=root,p=pr0&shutdown -r now&ke0
    [1] 1736
    [2] 1737
    -bash: ke0: command not found
    root@debian:~#
    Broadcast message from root@debian (pts/0) (Sun May 27 13:46:08 2012):

    The system is going down for reboot NOW!

    In this case we use the combination !! so Bash tries to run the last command, “shutdown -r now“.

    Let’s try to run some backups :)

    root@debian:~# innobackupex --password=3kj39|!381
    innobackupex --password=3kj39|/etc/init.d/mysql restart

    Stopping MySQL database server: mysqldinnobackupex: Missing command line argument
    .
    Starting MySQL database server: mysqld.

    In this last example, !381 asks Bash to run the command with number 381 in the history, that is: “/etc/init.d/mysql restart

    These are only a small number of examples. Of course, I’m not telling you to use simple passwords. The lesson that we can learn here is “don’t paste your password on the shell“. First because they’re going to be logged to Bash’s history file and second because some combinations can cause big problems.

    In order to avoid these problems:

  • Don’t run commands from the root account. There are no small mistakes when you make them from root.
  • Use .my.cnf in your home directory to avoid typing a password. Protect the file with permission mode 0600.
  • Use the -p option with mysql and its utilities. This makes the programs prompt you for a password from the tty.
  • May
    24
    2012
    --

    Announcing Percona Server 5.1.63-13.4

    Percona is glad to announce the release of Percona Server 5.1.63-13.4 on May 24th, 2012 (Downloads are available from Percona Server 5.1.63-13.4 downloads and from the Percona Software Repositories).

    Based on MySQL 5.1.63, including all the bug fixes in it, Percona Server 5.1.63-13.4 is now the current stable release in the 5.1 series. All of Percona‘s software is open-source and free, all the details of the release can be found in the 5.1.63-13.4 milestone at Launchpad.

    Bugs fixed:

    • Building Percona Server with the Clang compiler resulted in a compiler error. Bug fixed#997496 (Alexey Kopytov).
    May
    24
    2012
    --

    Done! Or What?

    During the last few weeks, it has become a little silent here at Use The Index, Luke! Today, I’d like to tell you the news that caused the silence and give a short outlook about the future of Use The Index, Luke!

    The most important news is that I became father. It’s very important to me, at least. My son, Marcel, was born on 25th March 2012. So, we already had two month to settle down. Mommy is quite good at the night shift—I can sleep almost normally and have enough time and energy to work during daytime.

    Most of that time I’ve spent to complete the book. Yes, it is “done”. As done as an online IT book can get. That means, the originally planned content is—after two years—finalized. The German translation was completed at about the same time. Use The Index, Luke! is now fully available in two languages.

    During these two years of writing, I have always strived for an exciting text, because I know database tuning is not a popular topic for developers. I designed the structure so that one topic leads to the next in the hope it is read from cover to cover in one shot. The problem with the website is, however, that most people just see one or two pages. Those pages are often are often out of context and don’t make sense on their own. To cut a long story short: I believe the text only works when read entirely. So, I decided to publish the book as a real book.

    I took the title I already used for the e-Pub edition last year: SQL Performance Explained. The cover photo is a quickly running squirrel because the book is about fast SQL. So it comes that I sometimes introduce the book as “Squirrel Performance Explained”.

    At the moment, there is only a German edition available (to my own surprise). That is because I found some confusingly written paragraphs during translation. Definitely not good enough for printing. Those parts were not just translated, but rewritten from scratch. In two languages—at least at the beginning. Due to time constraints, I had to stop editing the English text after a while. Just writing in my native language was, of course, faster.

    You can buy the German edition directly via http://sql-performance-explained.de/ or—private consumers in Germany—also in my amazon.de-Shop. Delivery time is the same in both cases because I’m always using amazon’s delivery network for Germany—no matter which way you order.

    Finally, I’ll give you a small outlook what’s happening next. I’m currently completing the English text, of course. I’m looking for a lecturer, btw; A native speaker, preferably with German and IT knowledge (contact). Once the text is finalized, printing is no big issue. After that, I need to take care to earn some money again. From business perspective, this year was a disaster until now. Not only because I spent quite some time on the book, but also to prepare the arrival of my son. I finally took drivers education, for example. One would not believe how much time that takes.

    Earning money is for me mostly about doing workshops and Instant-Coaching. I’m constantly getting inquires for the workshop. The problem is, however, the travel time and expenses. That makes it too expensive very often. As workaround, I try to get three (or more) clients in the same area—in that case, the travel time and expenses is no problem anymore. So, if you would like to hire me for a workshop, just leave me a short note. For the planning, I just need a postal code. The book is part of the course material, of course. At the moment, I’m only doing on-site workshops in the D-A-CH region. I’m, however, getting inquires from all over the world (thank you!). But that is even harder to manage. For those of you living outside Austria, Germany or Switzerland, please consider using the Instant-Coaching model. It is a virtual workshop, using your system. I can explain the principles based on examples from your own application, if you like.

    If I find some time in between, I’d like to process my blog backlog. I’m having topics for about 20 articles in the pipeline—some of them for the miserably neglected myth directory. I can also imagine writing some opinion articles. I’ve always aimed to stay objective when writing for the book. Now, I might write possibly some…well…more emotional articles.

    I did not yet make up my mind regarding long-term planning for Use The Index, Luke! I will, of course, maintain and update the material. Let me know if you have any topic you would like to see on Use The Index, Luke! Your wish might come true.

    So long,

    -markus

    Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com