Infobright and InnoDB AMI images are now available
There are now demonstration AMI images for Shard-Query. Each image comes pre-loaded with the data used in the previous Shard-Query blog post. The data in each image is split into 20 “shards”. This blog post will refer to an EC2 instance as a node from here on out. Shard-Query is very flexible in its configuration, so you can use this sample database to spread processing over up to 20 nodes.
The Infobright Community Edition (ICE) images are available in 32 and 64 bit varieties. Due to memory requirements, the InnoDB version is only available for 64 bit instances. MySQL will fail to start on a micro instance; simply decrease the values in the /etc/my.cnf file if you really want to try micro instances.
*EDIT*
The storage worker currently logs too much information. This can cause the disk to fill up with logs. You can fix this by modifying shard-query/run_worker to contain the following:
#!/bin/bash
while [ 1 ]
do
./worker >> /dev/null 2>&1 < /dev/null
done;
Where to find the images
Amazon ID     | Name                              | Arch          | Notes
ami-20b74949  | shard-query-infobright-demo-64bit | x86_64        | ICE 3.5.2pl1. Requires m1.large or larger.
ami-8eb648e7  | shard-query-innodb-demo-64bit     | x86_64        | Percona Server 5.5.11 with XtraDB. Requires m1.large or larger.
ami-f65ea19f  | shard-query-infobright-demo       | i686          | ICE 3.5.2pl1 32bit. Requires m1.small or larger.
snap-073b6e68 | shard-query-demo-data-flatfiles   | 30GB ext3 EBS | An ext3 EBS volume containing the flat files for the demos, in case you want to reload the data into your favorite storage engine or database.
About the cluster
For best performance, there should be an even data distribution in the system. To get an even distribution, the test data was hashed on the values in the date_id column. There will be another blog post about the usage and performance of the splitter. It is multi-threaded (actually multi-process) and is able to hash split up to 50GB/hour of input data on my i970 test machine. It is possible to distribute the splitting and/or loading among multiple nodes as well. Note that in the demonstration, for any configuration of more than one node, each node will contain redundant data that is never accessed. This would not be the case under normal circumstances, and the extra data will not impact performance because it is never read.
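The splitter itself will be covered in the next post; conceptually, each row is routed to a shard based on a hash of its date_id value, so rows for the same date land on the same shard and the 20 shards stay roughly the same size. Here is a toy version of the routing rule (illustrative only: the column position and the plain modulo below stand in for the real splitter's logic):
# split a CSV into 20 shard files based on date_id (assumed here to be the first field);
# a simple modulo stands in for the real hash function
awk -F, '{ shard = ($1 % 20) + 1; print > ("ontime" shard ".csv") }' ontime.csv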
Since both InnoDB and ICE versions of the data are available, it is worth examining the difference in size, which tells us something interesting about how Shard-Query will perform on each database. To do the size comparison, I used the du utility:
InnoDB file size on disk: 42GB (with indexes)
# du -sh *
203M ibdata1
128M ib_logfile0
128M ib_logfile1
988K mysql
2.1G ontime1
2.1G ontime10
2.1G ontime11
2.1G ontime12
2.1G ontime13
2.1G ontime14
2.1G ontime15
2.1G ontime16
2.1G ontime17
2.1G ontime18
2.1G ontime19
2.1G ontime2
2.1G ontime20
2.1G ontime3
2.1G ontime4
2.1G ontime5
2.1G ontime6
2.1G ontime7
2.1G ontime8
2.1G ontime9
212K performance_schema
0 test
ICE size on disk: 2.5GB
# du -sh *
8.0K bh.err
11M BH_RSI_Repository
4.0K brighthouse.ini
4.0K brighthouse.log
4.0K brighthouse.seq
964K mysql
123M ontime1
124M ontime10
123M ontime11
123M ontime12
123M ontime13
123M ontime14
123M ontime15
123M ontime16
123M ontime17
123M ontime18
124M ontime19
124M ontime2
124M ontime20
124M ontime3
123M ontime4
122M ontime5
122M ontime6
122M ontime7
123M ontime8
125M ontime9
The InnoDB data directory size is 42GB, which is twice the original size of the input data. The ICE schema was discussed in the comments of the last post. ICE does not have any indexes (not even primary keys).
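If you just want the totals rather than the per-schema listing, du can add them up for you (run from the MySQL datadir):
# print a grand total for the 20 shard schemas
du -ch ontime* | tail -n 1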
Here is the complete InnoDB schema from one shard. The schema is duplicated 20 times (but not the ontime_fact data):
DROP TABLE IF EXISTS `dim_airport`;
CREATE TABLE `dim_airport` (
`airport_id` int(11) NOT NULL DEFAULT '0',
`airport_code` char(3) DEFAULT NULL,
`CityName` varchar(100) DEFAULT NULL,
`State` char(2) DEFAULT NULL,
`StateFips` varchar(10) DEFAULT NULL,
`StateName` varchar(50) NOT NULL,
`Wac` int(11) DEFAULT NULL,
PRIMARY KEY (`airport_id`),
KEY `CityName` (`CityName`),
KEY `State` (`State`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Data from BTS ontime flight data. Data for Origin and Destination airport data.';
CREATE TABLE `dim_date` (
`Year` year(4) DEFAULT NULL,
`Quarter` tinyint(4) DEFAULT NULL,
`Month` tinyint(4) DEFAULT NULL,
`DayofMonth` tinyint(4) DEFAULT NULL,
`DayOfWeek` tinyint(4) DEFAULT NULL,
`FlightDate` date NOT NULL,
`date_id` smallint(6) NOT NULL,
PRIMARY KEY (`date_id`),
KEY `FlightDate` (`FlightDate`),
KEY `Year` (`Year`,`Quarter`,`Month`,`DayOfWeek`),
KEY `Quarter` (`Quarter`,`Month`,`DayOfWeek`),
KEY `Month` (`Month`,`DayOfWeek`),
KEY `DayOfWeek` (`DayOfWeek`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains the date information from the BTS ontime flight data. Note dates may not be in date_id order';
/*!40101 SET character_set_client = @saved_cs_client */;
CREATE TABLE `dim_flight` (
`UniqueCarrier` char(7) DEFAULT NULL,
`AirlineID` int(11) DEFAULT NULL,
`Carrier` char(2) DEFAULT NULL,
`FlightNum` varchar(10) DEFAULT NULL,
`flight_id` int(11) NOT NULL DEFAULT '0',
`AirlineName` varchar(100) DEFAULT NULL,
PRIMARY KEY (`flight_id`),
KEY `UniqueCarrier` (`UniqueCarrier`,`AirlineID`,`Carrier`),
KEY `AirlineID` (`AirlineID`,`Carrier`),
KEY `Carrier` (`Carrier`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains information on flights, and what airline offered those flights and the flight number of the flight. Some data hand updated.';
--
-- Table structure for table `ontime_fact`
--
CREATE TABLE `ontime_fact` (
`date_id` int(11) NOT NULL DEFAULT '0',
`origin_airport_id` int(11) NOT NULL DEFAULT '0',
`dest_airport_id` int(11) NOT NULL DEFAULT '0',
`flight_id` int(11) NOT NULL DEFAULT '0',
`TailNum` varchar(50) DEFAULT NULL,
`CRSDepTime` int(11) DEFAULT NULL,
`DepTime` int(11) DEFAULT NULL,
`DepDelay` int(11) DEFAULT NULL,
`DepDelayMinutes` int(11) DEFAULT NULL,
`DepDel15` int(11) DEFAULT NULL,
`DepartureDelayGroups` int(11) DEFAULT NULL,
`DepTimeBlk` varchar(20) DEFAULT NULL,
`TaxiOut` int(11) DEFAULT NULL,
`WheelsOff` int(11) DEFAULT NULL,
`WheelsOn` int(11) DEFAULT NULL,
`TaxiIn` int(11) DEFAULT NULL,
`CRSArrTime` int(11) DEFAULT NULL,
`ArrTime` int(11) DEFAULT NULL,
`ArrDelay` int(11) DEFAULT NULL,
`ArrDelayMinutes` int(11) DEFAULT NULL,
`ArrDel15` int(11) DEFAULT NULL,
`ArrivalDelayGroups` int(11) DEFAULT NULL,
`ArrTimeBlk` varchar(20) DEFAULT NULL,
`Cancelled` tinyint(4) DEFAULT NULL,
`CancellationCode` char(1) DEFAULT NULL,
`Diverted` tinyint(4) DEFAULT NULL,
`CRSElapsedTime` int(11) DEFAULT NULL,
`ActualElapsedTime` int(11) DEFAULT NULL,
`AirTime` int(11) DEFAULT NULL,
`Flights` int(11) DEFAULT NULL,
`Distance` int(11) DEFAULT NULL,
`DistanceGroup` tinyint(4) DEFAULT NULL,
`CarrierDelay` int(11) DEFAULT NULL,
`WeatherDelay` int(11) DEFAULT NULL,
`NASDelay` int(11) DEFAULT NULL,
`SecurityDelay` int(11) DEFAULT NULL,
`LateAircraftDelay` int(11) DEFAULT NULL,
`FirstDepTime` varchar(10) DEFAULT NULL,
`TotalAddGTime` varchar(10) DEFAULT NULL,
`LongestAddGTime` varchar(10) DEFAULT NULL,
`DivAirportLandings` varchar(10) DEFAULT NULL,
`DivReachedDest` varchar(10) DEFAULT NULL,
`DivActualElapsedTime` varchar(10) DEFAULT NULL,
`DivArrDelay` varchar(10) DEFAULT NULL,
`DivDistance` varchar(10) DEFAULT NULL,
`Div1Airport` varchar(10) DEFAULT NULL,
`Div1WheelsOn` varchar(10) DEFAULT NULL,
`Div1TotalGTime` varchar(10) DEFAULT NULL,
`Div1LongestGTime` varchar(10) DEFAULT NULL,
`Div1WheelsOff` varchar(10) DEFAULT NULL,
`Div1TailNum` varchar(10) DEFAULT NULL,
`Div2Airport` varchar(10) DEFAULT NULL,
`Div2WheelsOn` varchar(10) DEFAULT NULL,
`Div2TotalGTime` varchar(10) DEFAULT NULL,
`Div2LongestGTime` varchar(10) DEFAULT NULL,
`Div2WheelsOff` varchar(10) DEFAULT NULL,
`Div2TailNum` varchar(10) DEFAULT NULL,
`Div3Airport` varchar(10) DEFAULT NULL,
`Div3WheelsOn` varchar(10) DEFAULT NULL,
`Div3TotalGTime` varchar(10) DEFAULT NULL,
`Div3LongestGTime` varchar(10) DEFAULT NULL,
`Div3WheelsOff` varchar(10) DEFAULT NULL,
`Div3TailNum` varchar(10) DEFAULT NULL,
`Div4Airport` varchar(10) DEFAULT NULL,
`Div4WheelsOn` varchar(10) DEFAULT NULL,
`Div4TotalGTime` varchar(10) DEFAULT NULL,
`Div4LongestGTime` varchar(10) DEFAULT NULL,
`Div4WheelsOff` varchar(10) DEFAULT NULL,
`Div4TailNum` varchar(10) DEFAULT NULL,
`Div5Airport` varchar(10) DEFAULT NULL,
`Div5WheelsOn` varchar(10) DEFAULT NULL,
`Div5TotalGTime` varchar(10) DEFAULT NULL,
`Div5LongestGTime` varchar(10) DEFAULT NULL,
`Div5WheelsOff` varchar(10) DEFAULT NULL,
`Div5TailNum` varchar(10) DEFAULT NULL,
KEY `date_id` (`date_id`),
KEY `flight_id` (`flight_id`),
KEY `origin_airport_id` (`origin_airport_id`),
KEY `dest_airport_id` (`dest_airport_id`),
KEY `DepDelay` (`DepDelay`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains all available data from 1988 to 2010';
mysql> use ontime1;
Database changed
mysql> show table status like 'ontime_fact'\G
*************************** 1. row ***************************
Name: ontime_fact
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 6697533
Avg_row_length: 241
Data_length: 1616904192
Max_data_length: 0
Index_length: 539279360
Data_free: 4194304
Auto_increment: NULL
Create_time: 2011-05-10 04:26:14
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment: Contains all available data from 1988 to 2010
1 row in set (0.00 sec)
With ICE, after compression there is only 2.5GB of data, so ICE gets over a 16:1 compression ratio compared to InnoDB (and about 8:1 compared to the raw input data), which is quite nice. Each shard contains only 128MB of data!
Storage engine makes a big difference
In general, a column store performs about 8x-10x better than a row store for queries which access a significant amount of data. One big reason for this is the excellent compression that RLE techniques provide.
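As a toy illustration of why RLE works so well on columnar data (this is not what ICE actually does internally, just the underlying idea), a low-cardinality column stored column-wise collapses into a handful of (value, run length) pairs:
# run-length encode a 10-million-value, low-cardinality "column": only 21 (value, count) pairs come out
seq 1 10000000 | awk '{print int($1/500000)}' \
  | awk '$0==v {c++; next} {if (NR>1) print v, c; v=$0; c=1} END {print v, c}'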
I have not loaded InnoDB compressed tables yet, but since InnoDB compression is not RLE based, I doubt it will have the same impact.
For large datasets, effective compression means fewer nodes are needed to keep the data entirely in memory. This frees the disks for on-disk temporary storage for hash joins and other background operations, which has a direct impact on query response times and throughput.
Setting up a cluster using the AMI images
You can easily test Shard-Query for yourself. Spin up the desired number of EC2 instances using one of the AMI images. For best results, spin up a number of instances that divides evenly into 20. There is a helpful utility (included in the image) to help configure the cluster; it works from a copy of the text on the EC2 console instances page. To use it, ensure:
- That only the instances that you want to use are shown in the EC2 console.
- That the "private ip" field is selected in the list of columns to show (click show/hide to change the columns)
- That the "public dns" field is selected
SSH to the public DNS entry of one of the nodes; this node will become "shard1".
Now, in the EC2 console hit CTRL-A to select all text on the page, then CTRL-C to copy it. Paste this into a text file on shard1 called "/tmp/servers.txt" and run the following commands:
$ cat /tmp/servers.txt | grep "10\." | grep -v internal | tee /tmp/hosts.internal
[host list omitted]
Now you need to set up the hosts file:
sudo su -
# cat /tmp/hosts.internal | ~ec2-user/tools/mkhosts >> /etc/hosts
# ping shard20
PING shard20 (10.126.15.34) 56(84) bytes of data.
64 bytes from shard20 (10.126.15.34): icmp_seq=1 ttl=61 time=0.637 ms
...
Note: There is no need to put that hosts file on your other nodes unless you want to run workers on them.
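For reference, the end result in /etc/hosts is simply one entry per node mapping the shardN name to that node's private IP. If you ever need to add or verify an entry by hand, the lines look roughly like this (run as root; the IPs below are illustrative, substitute your own):
# example /etc/hosts entries, one per node
echo "10.126.15.33   shard19" >> /etc/hosts
echo "10.126.15.34   shard20" >> /etc/hosts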
Generate a cluster configuration
There is a script provided to generate the shards.ini file for testing a cluster of 1 to 20 nodes.
cd shard-query
#generate a config for 20 shards (adjust to your number of nodes)
php genconfig 20 > shards.ini
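At this point it is worth a quick sanity check that MySQL on every node is reachable from shard1. A minimal loop such as the following works (the -u/-p values are placeholders; use whatever MySQL credentials the image is configured with, or drop them entirely):
# verify that each shard host answers on MySQL's default port
for i in $(seq 1 20); do
  mysql -h shard$i -u someuser -psomepass -e "SELECT @@hostname" >/dev/null && echo "shard$i OK" || echo "shard$i FAILED"
done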
Running the test
For best performance, you should run the workers on only one or two nodes, starting two workers per core in the cluster.
First start gearmand:
gearmand -p 7000 -d
Then start the workers on node 1 (assuming a 20 node cluster):
cd shard-query
./start_workers 80
I normally start (2 * TOTAL_CLUSTER_CORES) workers. That is, if you have 20 machines, each with 2 cores, run 80 workers.
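If you want to confirm that gearmand sees all of the workers, you can query its text admin interface directly on the port it was started on (a quick check, assuming nc is installed on the node):
# "status" prints one line per registered function: name, queued jobs, running jobs, available workers
(echo status; sleep 1) | nc 127.0.0.1 7000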
Test the system. You should see the following row count (the first number is wall time, the second exec time, the third parse time).
$ echo "select count(*) from ontime_fact;" | ./run_query
Array
(
[count(*)] => 135125787
)
1 rows returned (0.084244966506958s, 0.078309059143066s, 0.0059359073638916s)
Execute the test:
As seen above, the run_query script will run one or more semicolon-terminated SQL statements. The queries for the benchmark are in ~ec2-user/shard-query/queries.sql.
I have also provided a convenient script, called pivot_results, which summarizes the output of the ./run_query command:
cd shard-query/
$ ./run_query < queries.sql | tee raw | ./pivot_results &
[1] 12359
$ tail -f ./raw
-- Q1
...
At the end, you will get a result output that is easy to graph in a spreadsheet:
$ cat raw | ./pivot_results
Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8.0,Q8.1,Q8.2,Q8.3,Q8.4,Q9,Q10,Q11
34.354,60.978,114.175,27.138,45.751,14.905,14.732,34.946,126.599,250.222,529.287,581.295,11.042,63.366,14.573
InnoDB my.cnf
[client]
port=3306
socket=/tmp/mysql-inno.sock
[mysqld]
socket=/tmp/mysql-inno.sock
default-storage-engine=INNODB
innodb-buffer-pool-instances=2
innodb-buffer-pool-size=5600M
innodb-file-format=barracuda
innodb-file-per-table
innodb-flush-log-at-trx-commit=1
innodb-flush-method=O_DIRECT
innodb-ibuf-active-contract=1
innodb-import-table-from-xtrabackup=1
innodb-io-capacity=1000
innodb-log-buffer-size=32M
innodb-log-file-size=128M
innodb-open-files=1000
innodb_fast_checksum
innodb-purge-threads=1
innodb-read-ahead=linear
innodb-read-ahead-threshold=8
innodb-read-io-threads=16
innodb-recovery-stats
innodb-recovery-update-relay-log
innodb-replication-delay=#
innodb-rollback-on-timeout
innodb-rollback-segments=16
innodb-stats-auto-update=0
innodb-stats-on-metadata=0
innodb-stats-sample-pages=256
innodb-stats-update-need-lock=0
innodb-status-file
innodb-strict-mode
innodb-thread-concurrency=0
innodb-thread-concurrency-timer-based
innodb-thread-sleep-delay=0
innodb-use-sys-stats-table
innodb-write-io-threads=4
join-buffer-size=16M
key-buffer-size=64M
local-infile=on
lock-wait-timeout=300
log-error=/var/log/mysqld-innodb.log
max-allowed-packet=1M
net-buffer-length=16K
#we value throughput over response time, get a good plan
optimizer-prune-level=0
partition=ON
port=3306
read-buffer-size=512K
read-rnd-buffer-size=1M
skip-host-cache
skip-name-resolve
sort-buffer-size=512K
sql-mode=STRICT_TRANS_TABLES
symbolic-links
table-definition-cache=16384
table-open-cache=128
thread-cache-size=32
thread-stack=256K
tmp-table-size=64M
transaction-isolation=READ-COMMITTED
user=mysql
wait-timeout=86400
To be continued
You can now set up a cluster of 1 to 20 nodes for testing. This way you can verify the numbers in my next blog post, in which I will compare the performance of various cluster sizes on both storage engines.