Feb
24
2014
--

The MySQL ARCHIVE storage engine – Alternatives

In my previous post I pointed out that the existing ARCHIVE storage engine in MySQL may not be the one that will satisfy your needs when it comes to effectively storing large and/or old data. But are there any good alternatives? As the primary purpose of this engine is to store rarely accessed data in disk space efficient way, I will focus here on data compression abilities rather then on performance.

The InnoDB engine provides compressed row format, but is it’s efficiency even close to the one from that available in archive engine? You can also compress MyISAM tables by using myisampack tool, but that also means a table will be read only after such operation.

Moreover, I don’t trust MyISAM nor Archive when it comes to data durability. Fortunately along came a quite new (open source since April 2013) player into this field – TokuDB! It seems to provide an excellent compression ratios, but also it’s fully ACID compliant, and does not have any of the limitations present in Archive, so it’s functionality is much more like InnoDB! This may allow you also to store production data on SSD drives, which disk space cost is still higher then on traditional disks, where otherwise it could be too expensive.

To better illustrate what choice do we have, I made some very simple disk savings comparison of all the mentioned variants.
I have used an example table with some scientific data fetched from here (no indexes):

CREATE TABLE `table1` (
  `snp_id` int(11) DEFAULT NULL,
  `contig_acc` varchar(32) DEFAULT NULL,
  `contig_ver` tinyint(4) DEFAULT NULL,
  `asn_from` int(11) DEFAULT NULL,
  `asn_to` int(11) DEFAULT NULL,
  `locus_id` int(11) DEFAULT NULL,
  `locus_symbol` varchar(128) DEFAULT NULL,
  `mrna_acc` varchar(128) DEFAULT NULL,
  `mrna_ver` int(11) DEFAULT NULL,
  `protein_acc` varchar(128) DEFAULT NULL,
  `protein_ver` int(11) DEFAULT NULL,
  `fxn_class` int(11) DEFAULT NULL,
  `reading_frame` int(11) DEFAULT NULL,
  `allele` text,
  `residue` text,
  `aa_position` int(11) DEFAULT NULL,
  `build_id` varchar(4) NOT NULL,
  `ctg_id` int(11) DEFAULT NULL,
  `mrna_start` int(11) DEFAULT NULL,
  `mrna_stop` int(11) DEFAULT NULL,
  `codon` text,
  `protRes` char(3) DEFAULT NULL,
  `contig_gi` int(11) DEFAULT NULL,
  `mrna_gi` int(11) DEFAULT NULL,
  `mrna_orien` tinyint(4) DEFAULT NULL,
  `cp_mrna_ver` int(11) DEFAULT NULL,
  `cp_mrna_gi` int(11) DEFAULT NULL,
  `verComp` varchar(7) NOT NULL
)

ARCHIVE storage engine

mysql >show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: ARCHIVE
        Version: 10
     Row_format: Compressed
           Rows: 19829016
 Avg_row_length: 11
    Data_length: 221158267
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: 2013-12-22 23:58:51
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.28 sec)
-rw-rw----. 1 przemek przemek 211M Dec 22 23:58 table1.ARZ

TokuDB engine, default compression

mysql >show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: TokuDB
        Version: 10
     Row_format: tokudb_zlib
           Rows: 19829016
 Avg_row_length: 127
    Data_length: 2518948412
Max_data_length: 9223372036854775807
   Index_length: 0
      Data_free: 6615040
 Auto_increment: NULL
    Create_time: 2013-12-23 00:03:47
    Update_time: 2013-12-23 00:12:14
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.13 sec)
-rwxrwx--x. 1 przemek przemek 284M Dec 23 00:12 _b_tokudb_table1_main_32_1_18_B_0.tokudb

TokuDB engine, highest compression

mysql [localhost] {msandbox} (b_tokudb) > show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: TokuDB
        Version: 10
     Row_format: tokudb_lzma
           Rows: 19829016
 Avg_row_length: 127
    Data_length: 2518948412
Max_data_length: 9223372036854775807
   Index_length: 0
      Data_free: 6950912
 Auto_increment: NULL
    Create_time: 2013-12-23 00:43:47
    Update_time: 2013-12-23 00:49:14
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: row_format=TOKUDB_LZMA
        Comment:
1 row in set (0.01 sec)
-rwxrwx--x. 1 przemek przemek 208M Dec 23 00:49 _b_tokudb_sql_980_2_main_1b92_2_18.tokudb

(btw, did you notice how the file name changed after altering with different compression?
It’s no longer reflecting the real table name, so quite confusing :( )

InnoDB engine, uncompressed

mysql > show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 19898159
 Avg_row_length: 117
    Data_length: 2343567360
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2014-01-01 16:47:03
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.42 sec)
-rw-rw----. 1 przemek przemek 2.3G Jan  1 16:37 table1.ibd

InnoDB engine, compressed with default page size (8kB)

mysql > show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 19737546
 Avg_row_length: 59
    Data_length: 1171783680
Max_data_length: 0
   Index_length: 0
      Data_free: 5767168
 Auto_increment: NULL
    Create_time: 2014-01-01 18:51:22
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED
        Comment:
1 row in set (0.31 sec)
-rw-rw----. 1 przemek przemek 1.2G Jan  1 18:51 table1.ibd

InnoDB engine, compressed with 4kB page size

mysql > show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 19724692
 Avg_row_length: 30
    Data_length: 592445440
Max_data_length: 0
   Index_length: 0
      Data_free: 3932160
 Auto_increment: NULL
    Create_time: 2014-01-01 19:41:12
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=4
        Comment:
1 row in set (0.03 sec)
-rw-rw----. 1 przemek przemek 584M Jan  1 19:41 table1.ibd

MyISAM engine, uncompressed

mysql > show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 19829016
 Avg_row_length: 95
    Data_length: 1898246492
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2013-12-23 11:02:28
    Update_time: 2013-12-23 11:03:45
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.01 sec)
-rw-rw----. 1 przemek przemek 1.8G Dec 23 11:03 table1.MYD

MyISAM engine, compressed (myisampack)

mysql > show table status like  'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: MyISAM
        Version: 10
     Row_format: Compressed
           Rows: 19829016
 Avg_row_length: 42
    Data_length: 848098828
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2013-12-23 11:02:28
    Update_time: 2013-12-23 11:03:45
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: 853535317
 Create_options:
        Comment:
1 row in set (0.00 sec)
-rw-rw----. 1 przemek przemek 809M Dec 23 11:03 table1.MYD

Compression summary table

Engine Compression Table size [MB]
InnoDB  none  2272
InnoDB  KEY_BLOCK_SIZE=8  1144
InnoDB  KEY_BLOCK_SIZE=4  584
MyISAM  none  1810
MyISAM  compressed with myisampack  809
Archive  default  211
TokuDB  ZLIB  284
TokuDB  LZMA  208

So the clear winner is TokuDB, leaving InnoDB far behind. But this is just one test – the results may be very different for your specific data.

To get even better idea, let’s compare several crucial features available in mentioned storage engines

Feature Archive MyISAM (compressed) InnoDB TokuDB
DML only INSERTs no yes yes
Transactions no no yes yes
ACID no no yes yes
Indexes no yes yes yes
Online DDL no no yes * yes **

* – since version 5.6, with some limitations
** – supports add/drop indexes, add/drop/rename columns and expand int, char, varchar and varbinary data types

Summary

TokuDB seems to be an excellent alternative when it comes to disk space usage efficiency, but this is not the only reason why you should try it perhaps.
You may want to check these articles too:

The post The MySQL ARCHIVE storage engine – Alternatives appeared first on MySQL Performance Blog.

Jan
06
2014
--

The ARCHIVE Storage Engine – does it do what you expect?

Sometimes there is a need for keeping large amounts of old, rarely used data without investing too much on expensive storage. Very often such data doesn’t need to be updated anymore, or the intent is to leave it untouched. I sometimes wonder what I should really suggest to our Support customers.

For this purpose, the archive storage engine, added in MySQL 4.1.3, seems perfect as it provides excellent compression and the only DML statement it does allow is INSERT. However, does it really work as you would expect?

First of all, it has some serious limitations. Apart from lack of support for DELETE, REPLACE and UPDATE statements (which may be acceptable for some needs), another one is that it does not allow you to have indexes, although you can have an auto_increment column being either a unique or non-unique index. So usually straightforward converting your tables to archive engine will not be possible. See the list of features for reference.

But unfortunately, it does not always work as the manual says, within it’s described limitations. See the following very simple examples.

Problem I

Does the archive storage engine really ensure uniqueness for a primary or unique key?

mysql> CREATE TABLE `b` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=ARCHIVE;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into b values (null),(null),(null),(null);
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0
mysql> select * from b;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
+----+
4 rows in set (0.01 sec)
mysql> repair table b;
+--------+--------+----------+----------+
| Table  | Op     | Msg_type | Msg_text |
+--------+--------+----------+----------+
| test.b | repair | status   | OK       |
+--------+--------+----------+----------+
1 row in set (0.00 sec)
mysql> insert into b values (null),(null);
Query OK, 2 row affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0
mysql> select * from b;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
|  1 |
|  2 |
+----+
6 rows in set (0.01 sec)
mysql> show indexes from b\G
*************************** 1. row ***************************
        Table: b
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: NULL
  Cardinality: NULL
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: NONE
      Comment:
Index_comment:
1 row in set (0.00 sec)

That is really bad – a column being a primary key effectively allows duplicates! And another case exposing the same problem:

mysql> CREATE TABLE `c` ( `id` int(11) NOT NULL AUTO_INCREMENT, UNIQUE KEY (`id`) ) ENGINE=ARCHIVE;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into c values (null),(null),(null);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0
mysql> select * from c;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.01 sec)
mysql> optimize table c;
+--------+----------+----------+----------+
| Table  | Op       | Msg_type | Msg_text |
+--------+----------+----------+----------+
| test.c | optimize | status   | OK       |
+--------+----------+----------+----------+
1 row in set (0.01 sec)
mysql> insert into c values (null);
Query OK, 1 row affected (0.00 sec)
mysql> select * from c;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  1 |
+----+
4 rows in set (0.01 sec)

So even a simple optimize table command does break it completely. After we realize that such operation made our data bad, we won’t be able to easily go back to different engine without sacrificing uniqueness first:

mysql> alter table c engine=innodb;
ERROR 1062 (23000): ALTER TABLE causes auto_increment resequencing, resulting in duplicate entry '1' for key 'id'
mysql> alter table c drop key id;
ERROR 1075 (42000): Incorrect table definition; there can be only one auto column and it must be defined as a key
mysql> alter table c drop key id, add key(id);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0
mysql> alter table c engine=innodb;
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

There were already bug reports related to auto_increment feature being broken, but I have filed a new, more specific bug report about this problem.
————–

Problem II

Are we always able to alter a table to use the archive storage engine, even if it is theoretically using supported table definition? Auto increment column issue again…

mysql> select * from c;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.01 sec)

We have the same c table using archive. We can change it’s engine to something different:

mysql> alter table c engine=innodb;
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0
mysql> select * from c;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)

But in some cases, we can’t set it back to archive!

mysql> alter table c engine=archive;
ERROR 1022 (23000): Can't write; duplicate key in table '#sql-1649_3'

There is an old bug report about that.
————–

Problem III

And yet another weirdness around auto_increment values. It seems normal that databases allow us to insert explicit values into auto_increment columns, even lower then last inserted maximum, and all other engines – MyISAM, Memory and InnoDB do that:

mysql> CREATE TABLE ai (a int auto_increment primary key) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into ai values (10);
Query OK, 1 row affected (0.00 sec)
mysql> insert into ai values (1);
Query OK, 1 row affected (0.00 sec)
mysql> select * from ai;
+----+
| a  |
+----+
|  1 |
| 10 |
+----+
2 rows in set (0.00 sec)

But it’s not the case for Archive engine:

mysql> CREATE TABLE aa (a int auto_increment primary key) ENGINE=Archive;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into aa values (10);
Query OK, 1 row affected (0.00 sec)
mysql> insert into aa values (1);
ERROR 1022 (23000): Can't write; duplicate key in table 'aa'

This undocumented behavior was reported here.

Summary

The archive storage engine provides a very good compression and is available in all MySQL variants out of the box. However it does have serious limitations as well as works unreliable and not as expected in some cases.

Related articles worth mentioning here:
http://www.mysqlperformanceblog.com/2006/11/12/trying-archive-storage-engine/
http://www.mysqlperformanceblog.com/2013/02/11/adventures-in-archiving/

In my next blog post, I am going to present simple research on (free) alternatives we do have that can replace the archive storage engine in terms of disk space effectiveness.

The post The ARCHIVE Storage Engine – does it do what you expect? appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com