After my first post Analyzing air traffic performance with InfoBright and MonetDB where I was not able to finish task with LucidDB, John Sichi contacted me with help to setup. You can see instruction how to load data on LucidDB Wiki page
You can find the description of benchmark in original post, there I will show number I have for LucidDB vs previous systems.
Load time
To load data into LucidDB in single thread, it took for me 15273 sec or 4.24h. In difference with other systems LucidDB support multi-threaded load, with concurrency 2 (as I have only 2 cores on that box), the load time is 9955 sec or 2.76h. For comparison
for InforBright load time is 2.45h and for MonetDB it is 2.6h
DataSize
Another interesting metric is datasize after load. In LucidDB db file after load takes 9.3GB.
UPDATE 27-Oct-2009 From metadata table the actual size of data is 4.5GB, the 9.3GB is size of physical file db.dat, which probably was not truncated after several loads of data.
For InfoBright it is 1.6GB, and for MonetDB – 65GB. Obviously LucidDB uses some compression, but it is not so aggressive as in InfoBright case. As original dataset is 55GB, compression rate for LucidDB is somewhat 1:12
Queries time
Let me put list of queries and times for all systems.
– Lame query “count start”
LucidDB:
SELECT count(*) FROM otp."ontime";
1 row selected (55.165 seconds)
Both InfoBright and MonetDB returned result immediately.
It seems LucidDB has to scan whole table to get result.
– Q0:
select avg(c1) from (select "Year","Month",count(*) as c1 from otp."ontime" group by "Year","Month") t;
LucidDB: 103.205 seconds
InfoBright: 4.19 sec
MonetDB: 29.9 sec
– Q1:
SELECT “DayOfWeek”, count(*) AS c FROM OTP.”ontime” WHERE “Year” BETWEEN 2000 AND 2008 GROUP BY “DayOfWeek” ORDER BY c DESC;
LucidDB: 49.17 seconds
InfoBright: 12.13 sec
MonetDB: 7.9 sec
– Q2:
SELECT “DayOfWeek”, count(*) AS c FROM otp.”ontime” WHERE “DepDelay”>10 AND “Year” BETWEEN 2000 AND 2008 GROUP BY “DayOfWeek” ORDER BY c DESC;
LucidDB: 27.131 seconds
InfoBright: 6.37 sec
MonetDB: 0.9 sec
– Q3:
!set rowlimit 10
SELECT “Origin”, count(*) AS c FROM otp.”ontime” WHERE “DepDelay”>10 AND “Year” BETWEEN 2000 AND 2008 GROUP BY “Origin” ORDER BY c DESC;
LucidDB: 27.664 seconds
InfoBright: 7.29 sec
MonetDB: 1.7 sec
– Q4:
SELECT “Carrier”, count(*) FROM otp.”ontime” WHERE “DepDelay”>10 AND “Year”=2007 GROUP BY “Carrier” ORDER BY 2 DESC;
LucidDB: 2.338 seconds
InfoBright: 0.99 sec
MonetDB: 0.27 sec
– Q5:
SELECT t.”Carrier”, c, c2, c*1000/c2 as c3 FROM (SELECT “Carrier”, count(*) AS c FROM OTP.”ontime” WHERE “DepDelay”>10 AND “Year”=2007 GROUP BY “Carrier”) t JOIN (SELECT “Carrier”, count(*) AS c2 FROM OTP.”ontime” WHERE “Year”=2007 GROUP BY “Carrier”) t2 ON (t.”Carrier”=t2.”Carrier”) ORDER BY c3 DESC;
LucidDB: 7.351 seconds
InfoBright: 2.92 sec
MonetDB: 0.5 sec
– Q6:
SELECT t.”Carrier”, c, c2, c*1000/c2 as c3 FROM (SELECT “Carrier”, count(*) AS c FROM OTP.”ontime” WHERE “DepDelay”>10 AND “Year” BETWEEN 2000 AND 2008 GROUP BY “Carrier”) t JOIN (SELECT “Carrier”, count(*) AS c2 FROM OTP.”ontime” WHERE “Year” BETWEEN 2000 AND 2008 GROUP BY “Carrier”) t2 ON (t.”Carrier”=t2.”Carrier”) ORDER BY c3 DESC;
LucidDB: 78.423 seconds
InfoBright: 21.83 sec
MonetDB: 12.5 sec
– Q7:
SELECT t.”Year”, c1/c2 FROM (select “Year”, count(*)*1000 as c1 from OTP.”ontime” WHERE “DepDelay”>10 GROUP BY “Year”) t JOIN (select “Year”, count(*) as c2 from OTP.”ontime” GROUP BY “Year”) t2 ON (t.”Year”=t2.”Year”);
LucidDB: 106.374 seconds
InfoBright: 8.59 sec
MonetDB: 27.9 sec
– Q8:
SELECT “DestCityName”, COUNT( DISTINCT “OriginCityName”) FROM “ontime” WHERE “Year” BETWEEN 2008 and 2008 GROUP BY “DestCityName” ORDER BY 2 DESC;
Years, LucidDB, InfoBright, MonetDB
1y, 6.76s, 1.74s, 0.55s
2y, 28.82s, 3.68s, 1.10s
3y, 35.37s, 5.44s, 1.69s
4y, 41.66s, 7.22s, 2.12s
10y, 72.67s, 17.42s, 29.14s
– Q9:
select “Year” ,count(*) as c1 from “ontime” group by “Year”;
LucidDB: 76.121 seconds
InfoBright: 0.31 sec
MonetDB: 6.3 sec
As you see LucidDB is not showing best results. However on good side about LucidDB I can mention it is very reach featured, with full support of DML statement. ETL features is also very impressive, you can extract, filter, transform external data (there is even access to MySQL via JDBC driver) just in SQL queries (compare with single LOAD DATA statement in InfoBright ICE edition). Also I am not so much in Java, but as I understood LucidDB can be easily integrated with Java applications, which is important if your development is Java based.
Worth to mention that in LucidDB single query execution takes 100% of user time in single CPU, which may signal that there some low-hanging fruits for optimization. OProfile can show clear places to fix.
Entry posted by Vadim |
10 comments
Add to:
|
|
|
| 