Nov 05, 2009
--

InnoDB: look out for fragmentation

One problem puzzled me for a couple of hours, but it was really interesting to figure out what was going on.

So let me introduce the problem first. The table is:

CODE:

CREATE TABLE `c` (
  `tracker_id` int(10) unsigned NOT NULL,
  `username` char(20) character set latin1 collate latin1_bin NOT NULL,
  `time_id` date NOT NULL,
  `block_id` int(10) unsigned default NULL,
  PRIMARY KEY (`tracker_id`,`username`,`time_id`),
  KEY `block_id` (`block_id`)
) ENGINE=InnoDB

The table has 11,864,696 rows and its Data_length is 698,351,616 bytes on disk.

The problem is that after restoring the table from a mysqldump, a query that scans data by primary key was slow. How slow? Let me show you.

The query in question is (Q1):

SELECT count(distinct username) FROM tracker where TIME_ID >= '2009-07-20 00:00:00' AND TIME_ID <= '2009-10-21 00:00:00' AND (tracker_id=437)

On a cold buffer pool, it took:

CODE:

+--------------------------+
| count(distinct username) |
+--------------------------+
|                  5856156 |
+--------------------------+
1 row in set (4 min 13.61 sec)

However, the query (again on a cold buffer pool) (Q2):

SELECT count(distinct username) FROM tracker where TIME_ID >= '2009-07-20 00:00:00' AND TIME_ID <= '2009-10-21 00:00:00'

CODE:

+--------------------------+
| count(distinct username) |
+--------------------------+
|                  5903053 |
+--------------------------+
1 row in set (18.81 sec)

The difference is impressive: 4 min 13.61 sec vs. 18.81 sec.

If you want the EXPLAIN plans, here they are:

For Q1:

CODE:

+----+-------------+---------+------+---------------+---------+---------+-------+---------+--------------------------+
| id | select_type | table   | type | possible_keys | key     | key_len | ref   | rows    | Extra                    |
+----+-------------+---------+------+---------------+---------+---------+-------+---------+--------------------------+
|  1 | SIMPLE      | tracker | ref  | PRIMARY       | PRIMARY | 4       | const | 6880241 | Using where; Using index |
+----+-------------+---------+------+---------------+---------+---------+-------+---------+--------------------------+
1 row in set (0.02 sec)

For Q2:

CODE:

+----+-------------+---------+-------+---------------+----------+---------+------+----------+--------------------------+
| id | select_type | table   | type  | possible_keys | key      | key_len | ref  | rows     | Extra                    |
+----+-------------+---------+-------+---------------+----------+---------+------+----------+--------------------------+
|  1 | SIMPLE      | tracker | index | NULL          | block_id | 5       | NULL | 13760483 | Using where; Using index |
+----+-------------+---------+-------+---------------+----------+---------+------+----------+--------------------------+

Query Q1 is executed using the primary key, and query Q2 uses the block_id key.

To get more details, I ran both queries with our extended statistics in the slow log (available in 5.0-percona releases).

So for query Q1:

CODE:

# Query_time: 253.643162  Lock_time: 0.000137  Rows_sent: 1  Rows_examined: 11569733  Rows_affected: 0  Rows_read: 11569733
#   InnoDB_IO_r_ops: 73916  InnoDB_IO_r_bytes: 1211039744  InnoDB_IO_r_wait: 236.149003
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 54838

And for query Q2:

CODE:

# Query_time: 18.846855  Lock_time: 0.000123  Rows_sent: 1  Rows_examined: 11864696  Rows_affected: 0  Rows_read: 11864696
#   InnoDB_IO_r_ops: 27510  InnoDB_IO_r_bytes: 450723840  InnoDB_IO_r_wait: 0.165124
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 24687

As you can see, for Q1 the I/O reads took 236.149003 sec vs. 0.165124 sec for Q2. But Q1 is a scan by primary key, which is supposed to be sequential!
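The slow-log numbers above make the gap concrete. A quick back-of-the-envelope calculation (plain Python, using only the figures from the two slow-log reports):

```python
# Average wait per read request, computed from the slow-log stats above.
q1_wait_s, q1_ops = 236.149003, 73916   # Q1: primary-key scan on the fragmented table
q2_wait_s, q2_ops = 0.165124, 27510     # Q2: block_id index scan

q1_ms_per_read = q1_wait_s / q1_ops * 1000
q2_ms_per_read = q2_wait_s / q2_ops * 1000

print(f"Q1: {q1_ms_per_read:.3f} ms per read")   # a few ms each: disk-seek territory
print(f"Q2: {q2_ms_per_read:.4f} ms per read")   # near zero: sequential, read-ahead friendly
```

Roughly 3.2 ms per read for Q1 is typical random-seek latency for disks of that era, while Q2's reads are essentially free thanks to sequential access and read-ahead.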

Let's look at another statistic, available via the innodb_check_fragmentation patch:

for Q1:

CODE:

mysql> SHOW STATUS LIKE 'Innodb_scan_pages%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| Innodb_scan_pages_contiguous | 88    |
| Innodb_scan_pages_jumpy      | 73789 |
+------------------------------+-------+
2 rows in set (0.00 sec)

and for Q2:

CODE:

mysql> SHOW STATUS LIKE 'Innodb_scan_pages%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| Innodb_scan_pages_contiguous | 26959 |
| Innodb_scan_pages_jumpy      | 442   |
+------------------------------+-------+
2 rows in set (0.00 sec)

So you see that Q1 was not a sequential scan, even though it uses the primary key, while Q2 was sequential.
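A handy way to read those counters is as a "jumpy share": the fraction of page-to-page transitions that were not to the physically adjacent page. This tiny helper is my own naming, not part of the patch; the input values are the SHOW STATUS results above:

```python
def jumpy_share(contiguous: int, jumpy: int) -> float:
    """Fraction of page-to-page moves that jumped to a non-adjacent page."""
    return jumpy / (contiguous + jumpy)

# Values from the SHOW STATUS output above
print(f"Q1: {jumpy_share(88, 73789):.1%} jumpy")   # almost every step is a jump: badly fragmented
print(f"Q2: {jumpy_share(26959, 442):.1%} jumpy")  # only a tiny fraction: nearly sequential
```

Anything close to 100% means the scan is effectively random I/O, no matter which index the optimizer picked.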

So what's the answer? It's fragmentation of the primary key (and of the whole table, since in InnoDB the data is the primary key). But how could that happen to the primary key after a mysqldump? The answer appears if we look at:

EXPLAIN SELECT * FROM tracker;

CODE:

+----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+
| id | select_type | table   | type  | possible_keys | key      | key_len | ref  | rows     | Extra       |
+----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+
|  1 | SIMPLE      | tracker | index | NULL          | block_id | 5       | NULL | 13760483 | Using index |
+----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+
1 row in set (0.00 sec)

We see that the dump is taken in block_id key order, not in primary key order. Later, when we load this table, the inserts into the primary key happen in random order, and that produces the fragmentation we see here.
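To see why random-order inserts fragment a clustered index, here is a toy page-split model. This is my own simplification, not InnoDB's actual algorithm: real pages hold far more rows and the split heuristics are more elaborate, but the effect is the same.

```python
import random

PAGE_CAPACITY = 100  # rows per page in this toy model

def count_pages(keys):
    """Insert keys one by one into sorted 'pages', splitting full pages.
    Rightmost (ascending) inserts start a fresh page, mimicking InnoDB's
    sequential-insert optimization; inserts into the middle split 50/50,
    leaving both halves about half empty."""
    pages = [[]]
    for k in keys:
        # locate the page whose key range should contain k
        i = 0
        while i < len(pages) - 1 and k >= pages[i + 1][0]:
            i += 1
        page = pages[i]
        page.append(k)
        page.sort()
        if len(page) > PAGE_CAPACITY:
            if i == len(pages) - 1 and page[-1] == k:
                pages.append([page.pop()])                 # append at the right edge
            else:
                mid = len(page) // 2
                pages[i:i + 1] = [page[:mid], page[mid:]]  # 50/50 split
    return len(pages)

keys = list(range(10_000))
in_order = count_pages(keys)          # primary-key order, as with --order-by-primary
random.seed(42)
shuffled = list(keys)
random.shuffle(shuffled)
out_of_order = count_pages(shuffled)  # effectively secondary-key order, as in the bad dump
print(in_order, out_of_order)
```

In-order loading packs pages completely full; shuffled loading leaves pages only partially filled and needs noticeably more of them, which matches the larger InnoDB_pages_distinct we saw for the fragmented table.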

How do we fix it in our case? It's easy: ALTER TABLE tracker ENGINE=InnoDB will force InnoDB to rebuild the table in primary key order.

After that Q1:

CODE:

+--------------------------+
| count(distinct username) |
+--------------------------+
|                  5856156 |
+--------------------------+
1 row in set (17.72 sec)

mysql> SHOW STATUS LIKE 'Innodb_scan_pages%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| Innodb_scan_pages_contiguous | 37864 |
| Innodb_scan_pages_jumpy      | 574   |
+------------------------------+-------+
2 rows in set (0.00 sec)

And the extended stats:

# Query_time: 17.765369  Lock_time: 0.000137  Rows_sent: 1  Rows_examined: 11569733  Rows_affected: 0  Rows_read: 11569733
#   InnoDB_IO_r_ops: 38530  InnoDB_IO_r_bytes: 631275520  InnoDB_IO_r_wait: 0.204893
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 35584

You can see that the time returned to an appropriate 17.72 sec.

You may ask what happens to Q2 now. Yes, it gets slow, since the block_id key is now the one whose entries were inserted out of order.

CODE:

+--------------------------+
| count(distinct username) |
+--------------------------+
|                  5903053 |
+--------------------------+
1 row in set (2 min 8.92 sec)

mysql> SHOW STATUS LIKE 'Innodb_scan_pages%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| Innodb_scan_pages_contiguous | 45    |
| Innodb_scan_pages_jumpy      | 35904 |
+------------------------------+-------+
2 rows in set (0.00 sec)

As for mysqldump, you may use the --order-by-primary option to force the dump in primary key order.

So notes to highlight:

  • InnoDB fragmentation may hurt your queries significantly, especially when the data is not in the buffer pool and execution has to read from disk.
  • Fragmentation by secondary key is much more likely than by primary key, and you cannot really control it (though it is possible in XtraDB / InnoDB-plugin with fast index creation), so be careful with queries that scan many records by secondary key.
  • To check whether your query is affected by fragmentation, you can use the Innodb_scan_pages_contiguous and Innodb_scan_pages_jumpy statistics in 5.0-percona releases (coming to 5.1-XtraDB soon).

Entry posted by Vadim |
7 comments


Nov 02, 2009
--

Air traffic queries in InfiniDB: early alpha

As Calpont announced the availability of InfiniDB, I surely couldn't miss the chance to compare it with the previously tested databases in the same environment.
See my previous posts on this topic:
Analyzing air traffic performance with InfoBright and MonetDB
Air traffic queries in LucidDB

I could not run all queries against InfiniDB, and I met some hiccups during my experiment, so it was a less smooth experience than with the other databases.

So let's go through the same steps:

Load data

InfiniDB supports MySQL's LOAD DATA statement and its own colxml / cpimport utilities. As LOAD DATA is more familiar to me, I started with that; however, after issuing LOAD DATA on a 180MB file (year 1989, 1st month), it soon caused extensive swapping (my box has 4GB of RAM) and the statement failed with:
ERROR 1 (HY000) at line 1: CAL0001: Insert Failed: St9bad_alloc

Alright, colxml / cpimport was more successful; however, it is less flexible in syntax than LOAD DATA, so I had to transform the input files into a format that cpimport could understand.

Total load time was 9747 sec, or 2.7h (not counting the time spent on file transformation).

I put the summary data on load time, data size, and query time into a Google Spreadsheet so you can easily compare with the previous results. There are separate sheets for queries, data size, and load time.

Datasize

The size of the database after loading is another confusing point. The InfiniDB data directory has a complex structure like:

CODE:

./000.dir/000.dir/003.dir/233.dir
./000.dir/000.dir/003.dir/233.dir/000.dir
./000.dir/000.dir/003.dir/233.dir/000.dir/FILE000.cdf
./000.dir/000.dir/003.dir/241.dir
./000.dir/000.dir/003.dir/241.dir/000.dir
./000.dir/000.dir/003.dir/241.dir/000.dir/FILE000.cdf
./000.dir/000.dir/003.dir/238.dir
./000.dir/000.dir/003.dir/238.dir/000.dir
./000.dir/000.dir/003.dir/238.dir/000.dir/FILE000.cdf
./000.dir/000.dir/003.dir/235.dir
./000.dir/000.dir/003.dir/235.dir/000.dir
./000.dir/000.dir/003.dir/235.dir/000.dir/FILE000.cdf

so it's hard to say which files belong to which table. But after the load, the size of 000.dir is 114GB, which is twice as big as the original data files. SHOW TABLE STATUS does not really help there; it shows:

CODE:

           Name: ontime
         Engine: InfiniDB
        Version: 10
     Row_format: Dynamic
           Rows: 2000
 Avg_row_length: 0
    Data_length: 0
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:

with totally misleading information.

So I record 114GB as the data size after load, until someone points me to a way to get the real size and explains what takes so much space.

Queries

The first count(*) query, SELECT count(*) FROM ontime, took 2.67 sec, which shows that InfiniDB does not store a record counter; however, it calculates the count pretty fast.

Q0:
select avg(c1) from (select year,month,count(*) as c1 from ontime group by YEAR,month) t;

Another bummer: on this query InfiniDB complains:

ERROR 138 (HY000):
The query includes syntax that is not supported by InfiniDB. Use 'show warnings;' to get more information. Review the Calpont InfiniDB Syntax guide for additional information on supported distributed syntax or consider changing the InfiniDB Operating Mode (infinidb_vtable_mode).
mysql> show warnings;
+-------+------+------------------------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------------------------+
| Error | 9999 | Subselect in From clause is not supported in this release. |
+-------+------+------------------------------------------------------------+

OK, so InfiniDB does not support derived tables, which is a big limitation from my point of view.
As a workaround I tried to create a temporary table, but got another error:

CODE:

mysql> create temporary table tq2 as (select Year,Month,count(*) as c1 from ontime group by Year, Month);
ERROR 122 (HY000): Cannot open table handle for ontime.

As the warning suggests, I set infinidb_vtable_mode = 2, which is:

CODE:

2) auto-switch mode: InfiniDB will attempt to process the query internally; if it
cannot, it will automatically switch the query to run in row-by-row mode.

but the query took 667 sec, so I exclude queries Q5, Q6, and Q7, which are also based on derived tables and thus not supported by InfiniDB, from consideration.

Other queries (again, see the comparison with other engines in the Google Spreadsheet or in the summary table at the bottom):

Query Q1:
mysql> SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year BETWEEN 2000 AND 2008 GROUP BY DayOfWeek ORDER BY c DESC;
7 rows in set (6.79 sec)

Query Q2:
mysql> SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year BETWEEN 2000 AND 2008 GROUP BY DayOfWeek ORDER BY c DESC;

7 rows in set (4.59 sec)

Query Q3:
SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year BETWEEN 2000 AND 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;

4.96 sec

Query Q4:
mysql> SELECT Carrier, count(*) FROM ontime WHERE DepDelay > 10 AND YearD=2007 GROUP BY Carrier ORDER BY 2 DESC;

I had another surprise with this query: after 15 min it had not returned results. I checked the system and it was totally idle, but the query was stuck. I killed the query and restarted mysqld, but could not connect to mysqld anymore. In the process list I saw that InfiniDB starts a couple of external processes: ExeMgr, DDLProc, PrimProc, controllernode fg, workernode DBRM_Worker1 fg, which cooperate with each other using IPC shared memory and semaphores. To clean up the system I rebooted the server, and only after that was mysqld able to start.

After that, query Q4 took 0.75 sec.

Queries Q5-Q7 skipped.

Query Q8:

SELECT DestCityName, COUNT( DISTINCT OriginCityName) FROM ontime WHERE YearD BETWEEN 2008 and 2008 GROUP BY DestCityName ORDER BY 2 DESC LIMIT 10;

And times for InfiniDB:

1y: 8.13 sec
2y: 16.54 sec
3y: 24.46 sec
4y: 32.49 sec
10y: 1 min 10.35 sec
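The Q8 times above grow almost exactly linearly with the number of years scanned, at roughly 8 seconds per year of data. A quick check in Python on those numbers:

```python
# InfiniDB Q8 times (seconds) by number of years scanned, from above
times = {1: 8.13, 2: 16.54, 3: 24.46, 4: 32.49, 10: 70.35}
rates = {years: t / years for years, t in times.items()}
for years, rate in rates.items():
    print(f"{years}y: {rate:.2f} sec/year")
```

So the scan cost scales linearly with data volume, and at 10 years it is even slightly better than linear.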

Query Q9:

select Year ,count(*) as c1 from ontime group by Year;

Time: 9.54 sec

OK, here is the summary table with query times (in sec, less is better):

Query     MonetDB  InfoBright  LucidDB  InfiniDB
Q0        29.9     4.19        103.21   NA
Q1        7.9      12.13       49.17    6.79
Q2        0.9      6.73        27.13    4.59
Q3        1.7      7.29        27.66    4.96
Q4        0.27     0.99        2.34     0.75
Q5        0.5      2.92        7.35     NA
Q6        12.5     21.83       78.42    NA
Q7        27.9     8.59        106.37   NA
Q8 (1y)   0.55     1.74        6.76     8.13
Q8 (2y)   1.1      3.68        28.82    16.54
Q8 (3y)   1.69     5.44        35.37    24.46
Q8 (4y)   2.12     7.22        41.66    32.49
Q8 (10y)  29.14    17.42       72.67    70.35
Q9        6.3      0.31        76.12    9.54

Conclusions

  • The InfiniDB server version shows Server version: 5.1.39-community InfiniDB Community Edition 0.9.4.0-5-alpha (GPL), so I consider it an alpha release, and it is doing OK for an alpha. I will wait for a more stable release for further tests, as it took a good amount of time to deal with the various glitches.
  • InfiniDB shows really good times for the queries it can handle, quite often better than InfoBright.
  • The inability to handle derived tables is a significant drawback for me; I hope it will be fixed.

Entry posted by Vadim |
18 comments


Nov 01, 2009
--

Speaking at the LA MySQL Meetup – 18th November

[Photo: Morgan speaking at Highload.ru, a recent photo from Highload.ru]

I said in my last post that we're interested in speaking at MySQL meetups, and I'm happy to say that the Los Angeles MySQL Meetup has taken us up on the offer.

On November 18th, I'll be giving an introductory talk on InnoDB/XtraDB Performance Optimization. I will be the second speaker, with Carl Gelbart speaking first on Infobright.

What brings me to LA? On the same day (18th Nov) I'll be teaching a one-day class on Performance Optimization for MySQL with InnoDB and XtraDB. If you haven't signed up yet, spaces are still available.


Entry posted by Morgan Tocker |
No comment

