This post is a continuation of my research of TokuDB’s storage engine to understand if it is suitable for timeseries workloads.
While inserting LOAD DATA INFILE into an empty table shows great results for TokuDB, what’s more interesting is seeing some realistic workloads.
So this time let’s take a look at the INSERT benchmark.
What I am going to do is to insert data in 16 parallel threads into the table from the previous post:
CREATE TABLE `sensordata` ( `ts` int(10) unsigned NOT NULL DEFAULT '0', `sensor_id` int(10) unsigned NOT NULL, `data1` double NOT NULL, `data2` double NOT NULL, `data3` double NOT NULL, `data4` double NOT NULL, `data5` double NOT NULL, `cnt` int(10) unsigned NOT NULL, PRIMARY KEY (`sensor_id`,`ts`) )
The INSERTS are bulk inserts with sequentially increasing ts
and with sensor_id
from 1 to 1000.
While the inserts are not fully sequential, because the primary key is (sensor_id, ts)
, it is enough to have in memory workload, so I do not expect performance degradation when data exceeds memory. This will play in favor for InnoDB, as it is known that TokuDB performs worse in CPU-bound benchmarks.
The benchmark executes 1mln events, each event inserts 1000 records in bulk. That is when finished we should about about 1 bln records in the table.
So let’s see how InnoDB (compressed 8K vs not compressed) performs.
Throughput (more is better):
Response time (log 10 scale on the axe Y) (less is better):
So InnoDB comes with following numbers:
- InnoDB no compression. Averages at 350 inserts/sec with response time 80-100ms per transaction. The final table size is 82GB
- InnoDB 8K compression. Throughput is 130 inserts/sec, response time 250ms. Table size is 60GB
Now, we have a quite bad compression rate, because I used uniform distribution for values of data1-data5 columns, and uniform may not be good for compression. And actually in the real case I expect much more repeating values, so I am going to re-test with pareto (zipfian) distribution.
For TokuDB (tested tokudb_fast and tokudb_small formats)
Response time (log 10 scale on the axe Y) (less is better):
TokuDB observations:
- After an initial warmup TokuDB shows quite inconsistent performance with both tokudb_fast and tokudb_small formats
- For tokudb_fast, the throughput is topping at ~150 inserts/sec, and 95% response time ~160 ms. However there are periodical stalls when throughput drops almost to 0 and response time jump to 10 sec !!! per transaction.
- For tokudb_small, the throughput even less stable jumping around 100 inserts/sec and response time starts from 300ms per transactions with stalls up to 30 sec per transaction
File sizes for TokuDB: tokudb_fast: 50GB, tokudb_small: 45GB. Again I correspond a bad compression rate to uniform distribution. If we switch to pareto, the file size for tokudb_fast is 21GB, and for tokudb_small is 13GB
If we zoom in to 900 sec timeframe we can see periodic behavior of TokuDB:
Now I consider these stalls in TokuDB as severe and I do not think I can recommend to use it in production under such workload conditions until the problem is fixed.
The scripts for the timeseries benchmark for sysbench v0.5 you can find there
https://github.com/percona/sysbench-scripts
Software versions, for InnoDB: Percona Server 5.6-RC3 , for TokuDB: mariadb-5.5.30-tokudb-7.0.4
UPDATE (5-Sep-2013):
By many requests I update the post with following information:
TokuDB throughput (tokudb_small row format) with Pareto distribution, for two cases:
- 1. PRIMARY KEY (
sensor_id
,ts
) (on graph:tokudb_small
) - 2. PRIMARY KEY (
ts
,sensor_id
), KEY (sensor_id
,ts
) (on graph:tokudb_small_key
)
We can see that top throughput for tokudb_small_key
is less then for tokudb_small
, but there is also less variance in throughput.
The my.cnf files.
For InnoDB
[mysqld] # gdb log-error=error.log innodb_file_per_table = true innodb_data_file_path = ibdata1:100M:autoextend innodb_flush_method = O_DIRECT innodb_log_buffer_size = 256M innodb_flush_log_at_trx_commit = 1 innodb_buffer_pool_size = 40G innodb_buffer_pool_instances=1 innodb_file_format = Barracuda innodb_checksum_algorithm = crc32 innodb_log_file_size = 4G innodb_log_files_in_group = 2 #innodb_log_block_size=4096 #####plugin options innodb_read_io_threads = 16 innodb_write_io_threads = 4 innodb_io_capacity = 4000 #not innodb options (fixed) port = 3306 back_log = 50 max_connections = 2000 max_prepared_stmt_count=500000 max_connect_errors = 10 table_open_cache = 2048 max_allowed_packet = 16M binlog_cache_size = 16M max_heap_table_size = 64M sort_buffer_size = 4M join_buffer_size = 4M thread_cache_size = 1000 query_cache_size = 0 query_cache_type = 0 thread_stack = 192K tmp_table_size = 64M server-id = 10 key_buffer_size = 8M read_buffer_size = 1M read_rnd_buffer_size = 4M bulk_insert_buffer_size = 8M myisam_sort_buffer_size = 8M myisam_max_sort_file_size = 10G myisam_repair_threads = 1 myisam_recover socket=/var/lib/mysql/mysql.sock user=root
for TokuDB (pretty much defaults)
[mysqld] gdb skip-innodb #not innodb options (fixed) port = 3306 back_log = 50 max_connections = 2000 max_prepared_stmt_count=500000 max_connect_errors = 10 table_open_cache = 2048 max_allowed_packet = 16M binlog_cache_size = 16M max_heap_table_size = 64M sort_buffer_size = 4M join_buffer_size = 4M thread_cache_size = 1000 query_cache_size = 0 query_cache_type = 0 ft_min_word_len = 4 #default_table_type = InnoDB thread_stack = 192K tmp_table_size = 64M server-id = 10 key_buffer_size = 8M read_buffer_size = 1M read_rnd_buffer_size = 4M bulk_insert_buffer_size = 8M myisam_sort_buffer_size = 8M myisam_max_sort_file_size = 10G #myisam_max_extra_sort_file_size = 10G myisam_repair_threads = 1 myisam_recover socket=/var/lib/mysql/mysql.sock user=root
The post TokuDB vs InnoDB in timeseries INSERT benchmark appeared first on MySQL Performance Blog.