Sep
22
2014
--

How to scale big data applications using MySQL sharding frameworks

This Wednesday I’ll be discussing two common types of big data: machine-generated data and user-generated content. Both are amenable to sharding, a commonly used technique for spreading data over more than one database server.

I’ll be discussing this in depth during a live webinar at 10 a.m. Pacific time on Sept. 24. I’ll also talk about two major sharding frameworks: MySQL Fabric and Shard-Query, for OLTP and OLAP workloads, respectively. Following the webinar there will be a brief Q&A session.

See the webinar page, “How to Scale Big Data Applications Using MySQL Sharding Frameworks,” for more information, or register directly.

Find Shard-Query (part of Swanhart-Tools) on GitHub.
Find MySQL Fabric (part of MySQL Utilities) at the MySQL documentation page.


May
01
2014
--

Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node. Shard-Query can add parallelism to queries which use partitioned tables. Very large tables can often be partitioned fairly easily, and Shard-Query can leverage partitioning to add parallelism because each partition can be queried independently. Because MySQL 5.6 supports the partition hint, Shard-Query can add parallelism to any partitioning method (even subpartitioning) on 5.6, but it is limited to RANGE/LIST partitioning methods on earlier versions.
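
For reference, here is a minimal sketch of the MySQL 5.6 partition-selection syntax that this relies on, using the lineorder table defined below (the partition name is just an example):

-- Illustrative only: restrict a statement to a single named partition.
-- Shard-Query generates one such statement per partition, as shown later.
SELECT COUNT(*)
FROM lineorder PARTITION (p0);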

The Shard-Query output shown here is from the command-line client, but you can also use MySQL Proxy to communicate with Shard-Query.

In the examples I am going to use the schema from the Star Schema Benchmark.  I generated data for scale factor 10, which means about 6GB of data in the largest table. I am going to show a few different queries, and explain how Shard-Query executes them in parallel.

Here is the DDL for the lineorder table, which I will use for the demo queries:

CREATE TABLE IF NOT EXISTS lineorder
(
 LO_OrderKey bigint not null,
 LO_LineNumber tinyint not null,
 LO_CustKey int not null,
 LO_PartKey int not null,
 LO_SuppKey int not null,
 LO_OrderDateKey int not null,
 LO_OrderPriority varchar(15),
 LO_ShipPriority char(1),
 LO_Quantity tinyint,
 LO_ExtendedPrice decimal,
 LO_OrdTotalPrice decimal,
 LO_Discount decimal,
 LO_Revenue decimal,
 LO_SupplyCost decimal,
 LO_Tax tinyint,
 LO_CommitDateKey int not null,
 LO_ShipMode varchar(10),
 primary key(LO_OrderDateKey,LO_PartKey,LO_SuppKey,LO_Custkey,LO_OrderKey,LO_LineNumber)
) PARTITION BY HASH(LO_OrderDateKey) PARTITIONS 8;

Notice that the lineorder table is partitioned by HASH(LO_OrderDateKey) into 8 partitions.  I used 8 partitions and my test box has 4 cores. It does not hurt to have more partitions than cores. A number of partitions that is two or three times the number of cores generally works best because it keeps each partition small, and smaller partitions are faster to scan. If you have a very large table, a larger number of partitions may be acceptable. Shard-Query will submit a query to Gearman for each partition, and the number of Gearman workers controls the parallelism.
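
The level of parallelism is therefore capped by the worker count. As a minimal sketch, assuming the gearmand daemon and the start_workers helper that ship with Shard-Query (both appear again in the EC2 post further down), two workers per core on this 4-core box could be started like this:

# start the Gearman job server (the port here is an arbitrary choice)
gearmand -p 7000 -d
# 2 workers per core x 4 cores = 8 workers
cd shard-query
./start_workers 8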

The SQL for the first demo is:

SELECT COUNT(DISTINCT LO_OrderDateKey) FROM lineorder;

Here is the explain from regular MySQL:

mysql> explain select count(distinct LO_OrderDateKey) from lineorder\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 25
          ref: NULL
         rows: 58922188
        Extra: Using index
1 row in set (0.00 sec)

 

So it is basically a full table scan. It takes a long time:

mysql> select count(distinct LO_OrderDateKey) from lineorder;
+---------------------------------+
| count(distinct LO_OrderDateKey) |
+---------------------------------+
|                            2406 |
+---------------------------------+
1 row in set (4 min 48.63 sec)

 

Shard-Query executes this query differently from MySQL. It sends one query to each partition and runs them in parallel, like the following:

Array
(
    [0] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder  PARTITION(p0)  AS `lineorder`   WHERE 1=1  AND 1=1  GROUP BY LO_OrderDateKey
    [1] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder  PARTITION(p1)  AS `lineorder`   WHERE 1=1  AND 1=1  GROUP BY LO_OrderDateKey
    [2] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder  PARTITION(p2)  AS `lineorder`   WHERE 1=1  AND 1=1  GROUP BY LO_OrderDateKey
    [3] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder  PARTITION(p3)  AS `lineorder`   WHERE 1=1  AND 1=1  GROUP BY LO_OrderDateKey
    [4] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder  PARTITION(p4)  AS `lineorder`   WHERE 1=1  AND 1=1  GROUP BY LO_OrderDateKey
    [5] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder  PARTITION(p5)  AS `lineorder`   WHERE 1=1  AND 1=1  GROUP BY LO_OrderDateKey
    [6] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder  PARTITION(p6)  AS `lineorder`   WHERE 1=1  AND 1=1  GROUP BY LO_OrderDateKey
    [7] => SELECT LO_OrderDateKey AS expr_2839651562
FROM lineorder  PARTITION(p7)  AS `lineorder`   WHERE 1=1  AND 1=1  GROUP BY LO_OrderDateKey
)

You will notice that there is one query for each partition. Those queries are sent to Gearman and executed in parallel by as many Gearman workers as possible (in this case, 4). The output of the queries goes into a coordinator table, and then another query does a final aggregation. That query looks like this:

SELECT COUNT(distinct expr_2839651562) AS `count`
FROM `aggregation_tmp_73522490`

The Shard-Query time:

select count(distinct LO_OrderDateKey) from lineorder;
Array
(
    [count ] => 2406
)
1 rows returned
Exec time: 0.10923719406128

That isn’t a typo: it really is sub-second, compared to minutes in regular MySQL.

This is because Shard-Query uses GROUP BY to answer this query, and a loose index scan of the PRIMARY KEY is possible:

mysql> explain partitions SELECT LO_OrderDateKey AS expr_2839651562
    -> FROM lineorder  PARTITION(p7)  AS `lineorder`   WHERE 1=1  AND 1=1  GROUP BY LO_OrderDateKey
    -> \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
   partitions: p7
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 80108
        Extra: Using index for group-by
1 row in set (0.00 sec)

Next, another simple query will be tested, first on regular MySQL:

mysql> select count(*) from lineorder;
+----------+
| count(*) |
+----------+
| 59986052 |
+----------+
1 row in set (4 min 8.70 sec)

Again, the EXPLAIN shows a full table scan:

mysql> explain select count(*) from lineorder\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorder
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 25
          ref: NULL
         rows: 58922188
        Extra: Using index
1 row in set (0.00 sec)

Now, Shard-Query can’t do anything special to speed up this query, except to execute it in parallel, similar to the first query:

[0] => SELECT COUNT(*) AS expr_3190753946
FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 AND 1=1
[1] => SELECT COUNT(*) AS expr_3190753946
FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 AND 1=1
[2] => SELECT COUNT(*) AS expr_3190753946
FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 AND 1=1
[3] => SELECT COUNT(*) AS expr_3190753946
FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 AND 1=1
...

The aggregation SQL is similar, but this time the aggregate function is changed to SUM to combine the COUNT from each partition:

SELECT SUM(expr_3190753946) AS ` count `
FROM `aggregation_tmp_51969525`

And the query is quite a bit faster, at 140.24 seconds compared with MySQL’s 248.7-second result:

Array
(
[count ] => 59986052
)
1 rows returned
Exec time: 140.24419403076

Finally, I want to look at a more complex query that uses joins and aggregation.

mysql> explain select d_year, c_nation,  sum(lo_revenue - lo_supplycost) as profit  from lineorder
join dim_date  on lo_orderdatekey = d_datekey
join customer  on lo_custkey = c_customerkey
join supplier  on lo_suppkey = s_suppkey
join part  on lo_partkey = p_partkey
where  c_region = 'AMERICA'  and s_region = 'AMERICA'
and (p_mfgr = 'MFGR#1'  or p_mfgr = 'MFGR#2')
group by d_year, c_nation  order by d_year, c_nation;
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref                      | rows | Extra                           |
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
|  1 | SIMPLE      | dim_date  | ALL    | PRIMARY       | NULL    | NULL    | NULL                     |    5 | Using temporary; Using filesort |
|  1 | SIMPLE      | lineorder | ref    | PRIMARY       | PRIMARY | 4       | ssb.dim_date.D_DateKey   |   89 | NULL                            |
|  1 | SIMPLE      | supplier  | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_SuppKey |    1 | Using where                     |
|  1 | SIMPLE      | customer  | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_CustKey |    1 | Using where                     |
|  1 | SIMPLE      | part      | eq_ref | PRIMARY       | PRIMARY | 4       | ssb.lineorder.LO_PartKey |    1 | Using where                     |
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+------+---------------------------------+
5 rows in set (0.01 sec)

Here is the query on regular MySQL:

mysql> select d_year, c_nation,  sum(lo_revenue - lo_supplycost) as profit  from lineorder  join dim_date  on lo_orderdatekey = d_datekey  join customer  on lo_custkey = c_customerkey  join supplier  on lo_suppkey = s_suppkey  join part  on lo_partkey = p_partkey  where  c_region = 'AMERICA'  and s_region = 'AMERICA'  and (p_mfgr = 'MFGR#1'  or p_mfgr = 'MFGR#2')  group by d_year, c_nation  order by d_year, c_nation;
+--------+---------------+--------------+
| d_year | c_nation      | profit       |
+--------+---------------+--------------+
|   1992 | ARGENTINA     | 102741829748 |
...
|   1998 | UNITED STATES |  61345891337 |
+--------+---------------+--------------+
35 rows in set (11 min 56.79 sec)

Again, Shard-Query splits up the query to run over each partition (I won’t bore you with the details) and it executes the query faster than MySQL, in 343.3 seconds compared to ~720 seconds:

Array
(
    [d_year] => 1998
    [c_nation] => UNITED STATES
    [profit] => 61345891337
)
35 rows returned
Exec time: 343.29854893684

I hope you see how using Shard-Query can speed up queries without using sharding, on just a single server. All you really need to do is add partitioning.
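
If your table is not partitioned yet, here is a minimal sketch of adding HASH partitioning to an existing table, matching the lineorder DDL used above (note that every unique key, including the primary key, must contain the partitioning column, and the ALTER rebuilds the table):

-- Illustrative only: rebuild the table with 8 hash partitions so that
-- Shard-Query can issue one query per partition.
ALTER TABLE lineorder
  PARTITION BY HASH(LO_OrderDateKey)
  PARTITIONS 8;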

You can get Shard-Query from GitHub at http://github.com/greenlion/swanhart-tools

Please note: Configure and install Shard-Query as normal, but simply use one node and set the column option (the shard column) to “nocolumn” or false; a shard column is not required if you are not sharding.
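
For illustration, a hypothetical single-node shards.ini along those lines; apart from the column option mentioned above, every key name and value below is a placeholder, so check the examples bundled with Shard-Query for the exact format:

[default]
; placeholder connection settings for the single MySQL node
user=ssb
password=ssb
db=ssb
; no shard column is needed when there is only one node
column=nocolumn

[shard1]
host=127.0.0.1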


May
11
2011
--

Shard-Query EC2 images available

Infobright and InnoDB AMI images are now available

There are now demonstration AMI images for Shard-Query. Each image comes pre-loaded with the data used in the previous Shard-Query blog post. The data in each image is split into 20 “shards”. This blog post will refer to an EC2 instance as a node from here on out. Shard-Query is very flexible in its configuration, so you can use this sample database to spread processing over up to 20 nodes.

The Infobright Community Edition (ICE) images are available in 32-bit and 64-bit varieties. Due to memory requirements, the InnoDB versions are only available on 64-bit instances. MySQL will fail to start on a micro instance; simply decrease the memory-related values in /etc/my.cnf if you really want to try micro instances.
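
As a sketch of the kind of change that implies (the exact value depends on the instance; a t1.micro only has around 600MB of RAM), the buffer pool from the InnoDB my.cnf listed at the end of this post could be shrunk, for example:

[mysqld]
# the demo images ship with innodb-buffer-pool-size=5600M (see the full
# my.cnf below); a micro instance needs something far smaller
innodb-buffer-pool-size=256M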

*EDIT*
The storage worker currently logs too much information. This can cause the disk to fill up with logs. You can fix this by modifying shard-query/run_worker to contain the following:

#!/bin/bash
while [ 1 ]
do
./worker >> /dev/null 2>&1 < /dev/null
done;

Where to find the images

Amazon ID       Name                                Arch            Notes
ami-20b74949    shard-query-infobright-demo-64bit   x86_64          ICE 3.5.2pl1. Requires m1.large or larger.
ami-8eb648e7    shard-query-innodb-demo-64bit       x86_64          Percona Server 5.5.11 with XtraDB. Requires m1.large or larger.
ami-f65ea19f    shard-query-infobright-demo         i686            ICE 3.5.2pl1 32-bit. Requires m1.small or larger.
snap-073b6e68   shard-query-demo-data-flatfiles     30GB ext3 EBS   An ext3 volume containing the flat files for the demos, in case you want to reload on your favorite storage engine or database.

About the cluster

For best performance, there should be an even data distribution in the system. To get an even distribution, the test data was hashed over the values in the date_id column. There will be another blog post about the usage and performance of the splitter. It is multi-threaded (actually multi-process) and is able to hash-split up to 50GB/hour of input data on my i970 test machine. It is possible to distribute splitting and/or loading among multiple nodes as well. Note that in the demonstration, for any configuration of more than one node, each node contains redundant data that is never accessed; this would not be the case in normal circumstances. The extra data will not impact performance because it is never accessed.
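
As a sketch of the idea (the splitter’s actual hash function is not shown here, so treat the modulo below as an assumption for illustration), hashing on date_id simply means that every row with the same date_id lands on the same shard:

-- Hypothetical illustration of hash distribution over date_id for 20 shards.
SELECT date_id, MOD(date_id, 20) + 1 AS target_shard
FROM dim_date
LIMIT 5;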

Since both InnoDB and ICE versions of the data are available it is important to examine the differences in size. This will give us some interesting information about how Shard-Query will perform on each database. To do the size comparison, I used the du utility:

InnoDB file size on disk: 42GB (with indexes)

# du -sh *
203M    ibdata1
128M    ib_logfile0
128M    ib_logfile1
988K    mysql
2.1G    ontime1
2.1G    ontime10
2.1G    ontime11
2.1G    ontime12
2.1G    ontime13
2.1G    ontime14
2.1G    ontime15
2.1G    ontime16
2.1G    ontime17
2.1G    ontime18
2.1G    ontime19
2.1G    ontime2
2.1G    ontime20
2.1G    ontime3
2.1G    ontime4
2.1G    ontime5
2.1G    ontime6
2.1G    ontime7
2.1G    ontime8
2.1G    ontime9
212K    performance_schema
0       test

ICE size on disk: 2.5GB

# du -sh *
8.0K    bh.err
11M     BH_RSI_Repository
4.0K    brighthouse.ini
4.0K    brighthouse.log
4.0K    brighthouse.seq
964K    mysql
123M    ontime1
124M    ontime10
123M    ontime11
123M    ontime12
123M    ontime13
123M    ontime14
123M    ontime15
123M    ontime16
123M    ontime17
123M    ontime18
124M    ontime19
124M    ontime2
124M    ontime20
124M    ontime3
123M    ontime4
122M    ontime5
122M    ontime6
122M    ontime7
123M    ontime8
125M    ontime9

The InnoDB data directory size is 42GB, which is twice the original size of the input data. The ICE schema was discussed in the comments of the last post. ICE does not have any indexes (not even primary keys).

Here is the complete InnoDB schema from one shard. The schema is duplicated 20 times (but not the ontime_fact data):

DROP TABLE IF EXISTS `dim_airport`;
CREATE TABLE `dim_airport` (
  `airport_id` int(11) NOT NULL DEFAULT '0',
  `airport_code` char(3) DEFAULT NULL,
  `CityName` varchar(100) DEFAULT NULL,
  `State` char(2) DEFAULT NULL,
  `StateFips` varchar(10) DEFAULT NULL,
  `StateName` varchar(50) NOT NULL,
  `Wac` int(11) DEFAULT NULL,
  PRIMARY KEY (`airport_id`),
  KEY `CityName` (`CityName`),
  KEY `State` (`State`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Data from BTS ontime flight data.  Data for Origin and Destination airport data.';

CREATE TABLE `dim_date` (
  `Year` year(4) DEFAULT NULL,
  `Quarter` tinyint(4) DEFAULT NULL,
  `Month` tinyint(4) DEFAULT NULL,
  `DayofMonth` tinyint(4) DEFAULT NULL,
  `DayOfWeek` tinyint(4) DEFAULT NULL,
  `FlightDate` date NOT NULL,
  `date_id` smallint(6) NOT NULL,
  PRIMARY KEY (`date_id`),
  KEY `FlightDate` (`FlightDate`),
  KEY `Year` (`Year`,`Quarter`,`Month`,`DayOfWeek`),
  KEY `Quarter` (`Quarter`,`Month`,`DayOfWeek`),
  KEY `Month` (`Month`,`DayOfWeek`),
  KEY `DayOfWeek` (`DayOfWeek`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains the date information from the BTS ontime flight data.  Note dates may not be in date_id order';
/*!40101 SET character_set_client = @saved_cs_client */;

CREATE TABLE `dim_flight` (
  `UniqueCarrier` char(7) DEFAULT NULL,
  `AirlineID` int(11) DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `FlightNum` varchar(10) DEFAULT NULL,
  `flight_id` int(11) NOT NULL DEFAULT '0',
  `AirlineName` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`flight_id`),
  KEY `UniqueCarrier` (`UniqueCarrier`,`AirlineID`,`Carrier`),
  KEY `AirlineID` (`AirlineID`,`Carrier`),
  KEY `Carrier` (`Carrier`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains information on flights, and what airline offered those flights and the flight number of the flight.  Some data hand updated.';

--
-- Table structure for table `ontime_fact`
--

CREATE TABLE `ontime_fact` (
  `date_id` int(11) NOT NULL DEFAULT '0',
  `origin_airport_id` int(11) NOT NULL DEFAULT '0',
  `dest_airport_id` int(11) NOT NULL DEFAULT '0',
  `flight_id` int(11) NOT NULL DEFAULT '0',
  `TailNum` varchar(50) DEFAULT NULL,
  `CRSDepTime` int(11) DEFAULT NULL,
  `DepTime` int(11) DEFAULT NULL,
  `DepDelay` int(11) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `DepDel15` int(11) DEFAULT NULL,
  `DepartureDelayGroups` int(11) DEFAULT NULL,
  `DepTimeBlk` varchar(20) DEFAULT NULL,
  `TaxiOut` int(11) DEFAULT NULL,
  `WheelsOff` int(11) DEFAULT NULL,
  `WheelsOn` int(11) DEFAULT NULL,
  `TaxiIn` int(11) DEFAULT NULL,
  `CRSArrTime` int(11) DEFAULT NULL,
  `ArrTime` int(11) DEFAULT NULL,
  `ArrDelay` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `ArrDel15` int(11) DEFAULT NULL,
  `ArrivalDelayGroups` int(11) DEFAULT NULL,
  `ArrTimeBlk` varchar(20) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  `CRSElapsedTime` int(11) DEFAULT NULL,
  `ActualElapsedTime` int(11) DEFAULT NULL,
  `AirTime` int(11) DEFAULT NULL,
  `Flights` int(11) DEFAULT NULL,
  `Distance` int(11) DEFAULT NULL,
  `DistanceGroup` tinyint(4) DEFAULT NULL,
  `CarrierDelay` int(11) DEFAULT NULL,
  `WeatherDelay` int(11) DEFAULT NULL,
  `NASDelay` int(11) DEFAULT NULL,
  `SecurityDelay` int(11) DEFAULT NULL,
  `LateAircraftDelay` int(11) DEFAULT NULL,
  `FirstDepTime` varchar(10) DEFAULT NULL,
  `TotalAddGTime` varchar(10) DEFAULT NULL,
  `LongestAddGTime` varchar(10) DEFAULT NULL,
  `DivAirportLandings` varchar(10) DEFAULT NULL,
  `DivReachedDest` varchar(10) DEFAULT NULL,
  `DivActualElapsedTime` varchar(10) DEFAULT NULL,
  `DivArrDelay` varchar(10) DEFAULT NULL,
  `DivDistance` varchar(10) DEFAULT NULL,
  `Div1Airport` varchar(10) DEFAULT NULL,
  `Div1WheelsOn` varchar(10) DEFAULT NULL,
  `Div1TotalGTime` varchar(10) DEFAULT NULL,
  `Div1LongestGTime` varchar(10) DEFAULT NULL,
  `Div1WheelsOff` varchar(10) DEFAULT NULL,
  `Div1TailNum` varchar(10) DEFAULT NULL,
  `Div2Airport` varchar(10) DEFAULT NULL,
  `Div2WheelsOn` varchar(10) DEFAULT NULL,
  `Div2TotalGTime` varchar(10) DEFAULT NULL,
  `Div2LongestGTime` varchar(10) DEFAULT NULL,
  `Div2WheelsOff` varchar(10) DEFAULT NULL,
  `Div2TailNum` varchar(10) DEFAULT NULL,
  `Div3Airport` varchar(10) DEFAULT NULL,
  `Div3WheelsOn` varchar(10) DEFAULT NULL,
  `Div3TotalGTime` varchar(10) DEFAULT NULL,
  `Div3LongestGTime` varchar(10) DEFAULT NULL,
  `Div3WheelsOff` varchar(10) DEFAULT NULL,
  `Div3TailNum` varchar(10) DEFAULT NULL,
  `Div4Airport` varchar(10) DEFAULT NULL,
  `Div4WheelsOn` varchar(10) DEFAULT NULL,
  `Div4TotalGTime` varchar(10) DEFAULT NULL,
  `Div4LongestGTime` varchar(10) DEFAULT NULL,
  `Div4WheelsOff` varchar(10) DEFAULT NULL,
  `Div4TailNum` varchar(10) DEFAULT NULL,
  `Div5Airport` varchar(10) DEFAULT NULL,
  `Div5WheelsOn` varchar(10) DEFAULT NULL,
  `Div5TotalGTime` varchar(10) DEFAULT NULL,
  `Div5LongestGTime` varchar(10) DEFAULT NULL,
  `Div5WheelsOff` varchar(10) DEFAULT NULL,
  `Div5TailNum` varchar(10) DEFAULT NULL,
  KEY `date_id` (`date_id`),
  KEY `flight_id` (`flight_id`),
  KEY `origin_airport_id` (`origin_airport_id`),
  KEY `dest_airport_id` (`dest_airport_id`),
  KEY `DepDelay` (`DepDelay`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains all avaialble data from 1988 to 2010';

mysql> use ontime1;
Database changed

mysql> show table status like 'ontime_fact'\G
*************************** 1. row ***************************
           Name: ontime_fact
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 6697533
 Avg_row_length: 241
    Data_length: 1616904192
Max_data_length: 0
   Index_length: 539279360
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2011-05-10 04:26:14
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: Contains all avaialble data from 1988 to 2010
1 row in set (0.00 sec)

With ICE, after compression there is only 2.5GB of data, so ICE gets a compression ratio of over 16:1 compared to InnoDB (8:1 compared to the raw input data), which is quite nice. Each shard contains only about 128MB of data!

Storage engine makes a big difference

In general, a column store performs about 8x-10x better than a row store for queries which access a significant amount of data. One big reason for this is the excellent compression that RLE (run-length encoding) techniques provide.
I have not loaded InnoDB compressed tables yet, but since InnoDB compression is not RLE-based, I doubt it will have the same impact.

For large datasets, effective compression means fewer nodes are needed to keep the data entirely in memory. This frees the disks for on-disk temporary storage for hash joins and other background operations, which has a direct impact on query response times and throughput.

Setting up a cluster using the AMI images

You can easily test Shard-Query for yourself. Spin up the desired number of EC2 instances using one of the AMI images. For best results, the number of instances should divide evenly into 20. There is a helpful utility (included in the image) to help configure the cluster; it works from a copy of the text on the EC2 console page. To use it, ensure:

  1. That only the instances that you want to use are shown in the EC2 console.
  2. That the "private ip" field is selected in the list of columns to show (click show/hide to change the columns)
  3. That the "public dns" field is selected

SSH to the public DNS entry of one of the nodes in the list. This node will become "shard1".

Now, in the EC2 console hit CTRL-A to select all text on the page and then CTRL-C to copy it. Paste this into a text file on shard1 called "/tmp/servers.txt" and run the following commands:

$ cat servers.txt | grep "10\."| grep -v internal |tee hosts.internal
[host list omitted]

Now you need to set up the hosts file:

sudo su -
# cat hosts.internal | ~ec2-user/tools/mkhosts >> /etc/hosts

# ping shard20
PING shard20 (10.126.15.34) 56(84) bytes of data.
64 bytes from shard20 (10.126.15.34): icmp_seq=1 ttl=61 time=0.637 ms
...

Note: There is no need to put that hosts file on your other nodes unless you want to run workers on them.

Generate a cluster configuration

There is a script provided to generate the shards.ini file for testing a cluster of 1 to 20 nodes.

cd shard-query

#generate a config for 20 shards (adjust to your number of nodes)
php genconfig 20 > shards.ini

Running the test

For best performance, you should run the workers on one or two nodes. You should start two workers per core in the cluster.

First start gearmand:

gearmand -p 7000 -d

Then start the workers on node 1 (assuming a 20 node cluster):

cd shard-query
./start_workers 80

I normally start (2 * TOTAL_CLUSTER_CORES) workers. That is, if you have 20 machines, each with 2 cores, run 80 workers.

Test the system. You should see the following row count (the first number is wall time, the second exec time, the third parse time).

$ echo "select count(*) from ontime_fact;" | ./run_query

Array
(
    [count(*)] => 135125787
)
1 rows returned (0.084244966506958s, 0.078309059143066s, 0.0059359073638916s)

Execute the test:

As seen above, the run_query script will run one or more semicolon-terminated SQL statements. The queries for the benchmark are in ~ec2-user/shard-query/queries.sql.

I have also provided a convenient script, called pivot_results, which summarizes the output from the ./run_query command:

cd shard-query/
$ ./run_query < queries.sql | tee raw |./pivot_results &
[1] 12359
$ tail -f ./raw
-- Q1
...

At the end, you will get a result output that is easy to graph in a spreadsheet:

$ cat raw | ./pivot_results
Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8.0,Q8.1,Q8.2,Q8.3,Q8.4,Q9,Q10,Q11
34.354,60.978,114.175,27.138,45.751,14.905,14.732,34.946,126.599,250.222,529.287,581.295,11.042,63.366,14.573

InnoDB my.cnf

[client]
port=3306
socket=/tmp/mysql-inno.sock

[mysqld]
socket=/tmp/mysql-inno.sock
default-storage-engine=INNODB
innodb-buffer-pool-instances=2
innodb-buffer-pool-size=5600M
innodb-file-format=barracuda
innodb-file-per-table
innodb-flush-log-at-trx-commit=1
innodb-flush-method=O_DIRECT
innodb-ibuf-active-contract=1
innodb-import-table-from-xtrabackup=1
innodb-io-capacity=1000
innodb-log-buffer-size=32M
innodb-log-file-size=128M
innodb-open-files=1000
innodb_fast_checksum
innodb-purge-threads=1
innodb-read-ahead=linear
innodb-read-ahead-threshold=8
innodb-read-io-threads=16
innodb-recovery-stats
innodb-recovery-update-relay-log
innodb-replication-delay=#
innodb-rollback-on-timeout
innodb-rollback-segments=16
innodb-stats-auto-update=0
innodb-stats-on-metadata=0
innodb-stats-sample-pages=256
innodb-stats-update-need-lock=0
innodb-status-file
innodb-strict-mode
innodb-thread-concurrency=0
innodb-thread-concurrency-timer-based
innodb-thread-sleep-delay=0
innodb-use-sys-stats-table
innodb-write-io-threads=4
join-buffer-size=16M
key-buffer-size=64M
local-infile=on
lock-wait-timeout=300
log-error=/var/log/mysqld-innodb.log
max-allowed-packet=1M
net-buffer-length=16K
#we value throughput over response time, get a good plan
optimizer-prune-level=0
partition=ON
port=3306
read-buffer-size=512K
read-rnd-buffer-size=1M
skip-host-cache
skip-name-resolve
sort-buffer-size=512K
sql-mode=STRICT_TRANS_TABLES
symbolic-links
table-definition-cache=16384
table-open-cache=128
thread-cache-size=32
thread-stack=256K
tmp-table-size=64M
transaction-isolation=READ-COMMITTED
user=mysql
wait-timeout=86400

To be continued

You can now set up a cluster of 1 to 20 nodes for testing. This way you can verify the numbers in my next blog post, in which I will compare the performance of various cluster sizes on both storage engines.
