Aug
12
2014
--

Zend Delivers PHP Development Platform In AWS Marketplace

Close up artsy shot of php scripting book Zend, the commercial company behind the popular scripting language PHP, announced today, it was launching a PHP development service in the AWS marketplace, effectively making it a Platform as a Service inside the Amazon infrastructure cloud.
While PHP is an open source project, Zend was launched in 1999 by some early developers to create a commercial arm. Just last month, the company… Read More

Jul
16
2014
--

High Availability with mysqlnd_ms on Percona XtraDB Cluster

This is the second part of my series on High Availability with mysqlnd_ms. In my first post, “Simple MySQL Master HA with mysqlnd_ms,” I showed a simple HA solution using asynchronous MySQL replication. This time we will see how to leverage an all-primary cluster where you can write to all nodes. In this post I used Percona XtraDB Cluster, but you should also be able to do the same with MySQL NDB Cluster or Tungsten Replicator.

To start with, here is the mysqlnd_ms configuration I used:

mysqlnd_ms_mm.ini

.  All of these files are available from my Github repository. Below, I have three Percona XtraDB Cluster nodes, all defined as masters and no slaves. I’ve configured a roundrobin filter where all connections will happen on the first node, in this case

192.168.56.44

 . In case the first node fails, the second node will be used and so forth until no more nodes are available. Another interesting configuration option here is the loop_before_master strategy whereby if connection or a statement to the current server fails, it will be retried silently on the remaining nodes before returning an error to the user, more on this below.

{
   "primary": {
     "master": {
       "master_1": {
         "host": "192.168.56.44",
         "port": "3306"
       },
       "master_2": {
         "host": "192.168.56.43",
         "port": "3306"
       },
       "master_3": {
         "host": "192.168.56.42",
         "port": "3306"
       }
     },
     "slave": { },
     "filters": { "roundrobin": [ ] },
     "failover": { "strategy": "loop_before_master", "remember_failed": true }
   }
 }

Similar to my previous post, I also used a custom INI file for PHP to use, this time aptly named

master-master.ini

 :

mysqlnd_ms.enable = 1
mysqlnd_ms.disable_rw_split = 1
mysqlnd_ms.multi_master = 1
mysqlnd_ms.force_config_usage = 1
mysqlnd_ms.config_file = /home/revin/git/demo-me/phpugph201407/mysqlnd_ms_mm.ini

A new addition to this configuration is

mysqlnd_ms.multi_master

 , when enabled it would allow you to use all nodes or just one and treat the others as passive. The PHP script I used this time is called

master-master.php

 , it is largely similar to

master-slave-ng.php

with a few differences:

  1. There is no need for 
    /tmp/PRIMARY_HAS_FAILED

      sentinel as all nodes were writable.

  2. There is no need for 
    /*ms=master*/

      SQL hint when validating a connection from connect_mysql function since all nodes acts as master.

So here is a quick test, first with roundrobin filter, after 4 INSERTs, I shutdown 

192.168.56.44

  which sends my connection to the next server in the configuration,

192.168.56.43

 . When I started back 

192.168.56.44

  again, the script resumed connections there. Pretty cool right?

[revin@forge phpugph201407]$ php -c master-master.ini master-master.php
Last value 3564 from host 192.168.56.44 via TCP/IP and thread id 19
Last value 3565 from host 192.168.56.44 via TCP/IP and thread id 20
Last value 3566 from host 192.168.56.44 via TCP/IP and thread id 21
Last value 3567 from host 192.168.56.44 via TCP/IP and thread id 22
Warning: connect_mysql(): MySQL server has gone away in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
Warning: connect_mysql(): Error while reading greeting packet. PID=23464 in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
ERRROR: 192.168.56.43 via TCP/IP [2006] MySQL server has gone away on line 30
Last value 0 from host  and thread id 0
Last value 3568 from host 192.168.56.43 via TCP/IP and thread id 1552
Last value 3569 from host 192.168.56.43 via TCP/IP and thread id 1553
[...]
Last value 3584 from host 192.168.56.43 via TCP/IP and thread id 1568
Last value 3585 from host 192.168.56.44 via TCP/IP and thread id 18

Here’s another test using the random filter which allows you to write to all nodes, on my mysqlnd_ms_mm.ini above, I just changed 

roundrobin

  to

random

 . As you can see, all three nodes were being used, of course in random, at the same time you will also see when I shutdown 

192.168.56.44

  around where the 

connect_mysql

  errors and then the server was used again near the bottom after a started it back up. Still pretty cool right?

[revin@forge phpugph201407]$ php -c master-master.ini master-master.php
Last value 3590 from host 192.168.56.42 via TCP/IP and thread id 2060
Last value 3591 from host 192.168.56.43 via TCP/IP and thread id 1569
Last value 3592 from host 192.168.56.43 via TCP/IP and thread id 1570
Warning: connect_mysql(): MySQL server has gone away in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
Warning: connect_mysql(): Error while reading greeting packet. PID=23919 in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
ERRROR: 192.168.56.43 via TCP/IP [2006] MySQL server has gone away on line 30
Last value 0 from host  and thread id 0
Last value 3593 from host 192.168.56.42 via TCP/IP and thread id 2061
Last value 3594 from host 192.168.56.42 via TCP/IP and thread id 2062
Last value 3595 from host 192.168.56.42 via TCP/IP and thread id 2063
Last value 3596 from host 192.168.56.42 via TCP/IP and thread id 2064
Last value 3597 from host 192.168.56.43 via TCP/IP and thread id 1576
Last value 3598 from host 192.168.56.43 via TCP/IP and thread id 1577
Last value 3599 from host 192.168.56.43 via TCP/IP and thread id 1578
Last value 3600 from host 192.168.56.43 via TCP/IP and thread id 1579
Last value 3601 from host 192.168.56.42 via TCP/IP and thread id 2065
Last value 3602 from host 192.168.56.43 via TCP/IP and thread id 1581
Last value 3603 from host 192.168.56.43 via TCP/IP and thread id 1582
Last value 3604 from host 192.168.56.42 via TCP/IP and thread id 2066
Last value 3605 from host 192.168.56.44 via TCP/IP and thread id 19
Last value 3606 from host 192.168.56.43 via TCP/IP and thread id 1583
Last value 3607 from host 192.168.56.44 via TCP/IP and thread id 21

So here are some issues I’ve observed during these tests:

  1. remember_failed

      during failover does not work as advertised. Supposedly, a failed node should not be used again for every connection request but in my test, this is not the case. See more from this bug. This means that if you have 2 out of 3 failed nodes in this scenario the overhead would be too big when testing both connections. Perhaps some sort of in memory shared TTL can be used to overcome this? I’m not sure.

  2. If you look closely around line 7 on my last output above the error displayed is kind of misleading. In particular it says
    ERRROR: 192.168.56.43 via TCP/IP

     , whereby it was not 

    192.168.56.43

      that failed, it was

    192.168.56.43

     . This is because under the hood, immediately after failure the next node will be cycled to, this is especially true since we have loop_before_master configured. I sure do have a bug on the script that should capture the 

    host_info

      properly, but this is something to always keep in mind so you don’t keep scratching your head.

So we’ve seen these two forms of possibilities and they definitely have use cases and advantages. On the other hand because of the issues we have found so far (I’ve reported 4 bugs on the PHP bugs database during the course of these tests including one crashing), I recommend to make sure you test seriously before putting this on production.

The post High Availability with mysqlnd_ms on Percona XtraDB Cluster appeared first on MySQL Performance Blog.

Jul
14
2014
--

Simple MySQL Master HA with mysqlnd_ms

I had the pleasure of presenting to the PHP Users Group Philippines a few days ago about mysqlnd_ms. The mysqlnd plugin, MySQL Master Slave, is a transparent layer on top of mysqlnd extension. This allows you to do read-write splitting and slave reads load balancing without needing to change anything from your application. But do you know you can also achieve a form of high availability with this plugin? I shared 2 forms on my presentation, using async MySQL replication either in master-slave configuration or master-master configuration, while the second form is having an all primary cluster where you can write to all nodes.

This first part is to demonstrate how you can achieve a simple HA solution using the first form. First, all the sample code here can be found on my GitHub repository. So, to use the mysqlnd_ms plugin, it uses an additional external configuration file in JSON format. This configuration file, will define your master and slave nodes, failover properties and any filters (connection selection method) you want to dictate how the algorithm will provide you the connection.

Let’s start with the mysqlnd_ms configuration I used,

mysqlnd_ms_ms.ini

 :

{
  "primary": {
    "master": {
      "master_1": {
        "host": "127.0.0.1",
        "port": "33001"
      }
    },
    "slave": {
    }
  },
  "standby": {
    "master": {
      "master_1": {
        "host": "127.0.0.1",
        "port": "33002"
      }
    },
    "slave": {
    }
  }
}

Here, I have two applications defined, one called “primary” and another called “standby”, I have not defined any slaves for simplicity. The two MySQL instances running on port 33001 and 33002 are in master-master configuration.

mysqlnd_ms.enable = 1
mysqlnd_ms.disable_rw_split = 1
mysqlnd_ms.force_config_usage = 1
mysqlnd_ms.config_file = /home/revin/git/demo-me/phpugph201407/mysqlnd_ms_ms.ini

This is the custom INI file I used for the tests,

master-slave.ini

 . The first line simply enables the plugin for use. The second line, mysqlnd_ms.disable_rw_split instructs the plugin that I should only send all queries to the master because I only have masters for this test.

As for the PHP script, the full copy can be found here, as it is a bit lengthy I will just explain the logic on what it does.

  1. To start the test, it bootstraps the test table via DROP and then CREATE queries.
  2. It then enters a for loop where it will execute an INSERT followed by a SELECT to validate the newly inserted row and additional information like the current active server id and the connection id.
  3. For every iteration of the loop, a new mysqli object is created to simulate non-persistent connections to the database server.
  4. To create the new connection, a call to the function 
    connect_mysql

      is made which returns a mysqli object when successful. An important thing to remember here is that mysqlnd_ms uses lazy connections by default, this means that when the mysqli object is created, it is not really connected yet to the server. One has to issue a statement like 

    'SELECT 1'

      to start the connection manually or call

    mysqli::real_connect

     . Not even 

    mysqli::ping

      does not work without the former, I’ve opened this bug.

  5. After the mysqli object is returned, the INSERT statement will trigger mysqlnd_ms to actually establish the connection and then execute the statement. This is where the good part is, if the connection cannot be made, the query_write_mysql function will know and will re-request the connection from connect_mysql, this time within the connect_mysql function, connection to the primary will be retried at least 10 times if the type of error from the previous failure is something related to a connection like error numbers 
    2002

      and

    2003

     . If the connection cannot be established after 10 retries, the application creates a sentinel file as 

    /tmp/PRIMARY_HAS_FAILED

      and will retry the connection to the secondary (slave or passive-master).

Here is an example run, my primary has a server id or 101 while my standby is 102:

[revin@forge phpugph201407]$ php -c master-slave.ini master-slave-ng.php
Last value 0001 from server id 101 thread id 7
Last value 0003 from server id 101 thread id 8
37: [2002] Connection refused
Connection to host 'primary' failed: [0] Connection refused, retrying (1 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (2 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (3 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (4 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (5 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (6 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (7 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (8 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (9 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (10 of 10) in 3 seconds
The primary host 'primary' has failed after 30 seconds, failing over to standby!
52: [2002] Connection refused
Last value 0004 from server id 102 thread id 635
Last value 0006 from server id 102 thread id 636
Last value 0008 from server id 102 thread id 637
[...]

This is not the perfect setup and there are a number of limitations, however it tells us that if you have a simple HA requirement like if you’re not running a very critical application but still do not want to be waken up at night but rather deal with issues in the morning, this might just fit. So here are some more notes:

    • If you have a master-slave configuration, you just basically shot your primary (master) in the foot during the failover. You may need to rebuild its data in the morning.
    • If instead you have master-master, you might just be able to bring the primary master back online, get it caught up in replication and then delete 
      /tmp/PRIMARY_HAS_FAILED

        file to switch your application back to it.

    • The use of 
      /tmp/PRIMARY_HAS_FAILED

        sentinel file is rudimentary, its not the only way. You should consider sending notifications to yourself when failover happens because this method requires human intervention to put back the primary master back in rotation.

The same effect can be achieved with a little more coding, but you can already take advantage of the plugin with less.

I’ve also tested the plugin on the second form where you can write to multiple masters using Percona XtraDB Cluster. I’ve found a few interesting issues there so stay tuned.

The post Simple MySQL Master HA with mysqlnd_ms appeared first on MySQL Performance Blog.

Jul
08
2013
--

Poll: What programming languages and platforms do you use?

What Programming Languages and Platforms do you use?What programming languages and platforms do you use for large-scale projects in your organization?

If something is missing from the list please leave a comment and share your story. Thanks!

Note: There is a poll embedded within this post, please visit the site to participate in this post’s poll.

The post Poll: What programming languages and platforms do you use? appeared first on MySQL Performance Blog.

Mar
06
2013
--

Accessing Percona XtraDB Cluster nodes in parallel from PHP using MySQL asynchronous queries

Percona XtraDB Cluster

Accessing Percona XtraDB Cluster nodes in parallel with MySQL asynchronous calls

This post is followup to Peter’s recent post, “Investigating MySQL Replication Latency in Percona XtraDB Cluster,” in which a question was raised as to whether we can measure latency to all nodes at the same time. It is an interesting question: If we have N nodes, can we send queries to nodes to be executed in parallel?

To answer it, I decided to try a new asynchronous call to send a query to MySQL using a new MySQLnd driver for PHP. In this post I’ll only show how to make these calls, and in following posts how to measure latency to all nodes.


PHP does not provide a way for parallel execution, so this is where an asynchronous call helps. What does this call do? Basically we send a query to MySQL, and we do not wait for result but rather get response later.

The MySQLnd driver has been available since PHP 5.3, and in most part it mimics the standard MySQLi driver and functions it provides. But in addition to that, it also provides a function, mysqli_poll, which unfortunately is marked as “not documented,” however we still can use it — using an example from PHP docs.

So there is my example on how to access Percona XtraDB Cluster nodes in parallel:

$reader_hosts = array( "192.88.225.243", "192.88.225.242", "192.88.225.240", "192.88.225.160", "192.88.225.159" );
$all_links=array();
# Establish connections
foreach ($reader_hosts as $i) {
        $mysqli = new mysqli($i, 'root', '', 'test');
        if ($mysqli->connect_error) {
                echo 'Connect Error (' . $mysqli->connect_errno . ') '
                        . $mysqli->connect_error;
        }else{
                $all_links[]=$mysqli;
                $mysqli->query("SET wsrep_causal_reads=1");
        }
}
# Run queries in parallel:
foreach ($all_links as $linkid => $link) {
 $link->query("SELECT something FROM tableN WHERE ", MYSQLI_ASYNC);
}
$processed = 0;
do {
        $links = $errors = $reject = array();
        foreach ($all_links as $link) {
                $links[] = $errors[] = $reject[] = $link;
        }
    # loop to wait on results
    if (!mysqli_poll($links, $errors, $reject, 60)) {
        continue;
    }
    foreach ($links as $k=>$link) {
        if ($result = $link->reap_async_query()) {
            $res = $result->fetch_row();
            # Handle returned result
            mysqli_free_result($result);
        } else die(sprintf("MySQLi Error: %s", mysqli_error($link)));
        $processed++;
    }
} while ($processed < count($all_links));

As conclusion, we see that using PHP with MySQLnd drivers we can execute 5 MySQL asynchronous queries in parallel against 5 different nodes of Percona XtraDB Cluster.

The post Accessing Percona XtraDB Cluster nodes in parallel from PHP using MySQL asynchronous queries appeared first on MySQL Performance Blog.

Jan
23
2013
--

Read/Write Splitting with PHP Webinar Questions Followup

Today I gave a presentation on “Read/Write Splitting with PHP” for Percona Webinars.  If you missed it, you can still register to view the recording and my slides.

Thanks to everyone who attended, and especially to folks who asked the great questions.  I answered as many as I could during the session, but here are all the questions with my complete answers:

Q: I wasn’t able to start up the webinar until it was 20 minutes in progress. Is it possible to get a recording once it is over?

A: Yes, we will email a link to all webinar attendees with a recording of the webinar and a link to the slides within 24 hours.  Folks who did not attend the webinar can also visit the webinar registration page and get access to the recording and slides.

Q: Are there similar plugins or resources for Java based applications?

A: The recent versions of Connector/J include load-balancing and failover features, but these features don’t seem to measure slave lag.  They seem to assume that all slaves are ready to be queried all the time.

Another solution that could be used independently of application language would be MySQL Proxy with a RW Splitting plugin.  However, if the splitting is delegated to a middleware layer, we would need another method to choose a given query’s slave lag tolerance.  Perhaps embed the application’s choice in an SQL comment?

Q: Can the solution you proposed be used with Percona Xtradb Cluster. Or are there any other issues we should look for?

A: Percona XtraDB Cluster is a synchronous replication technology, therefore the slaves should never lag behind the master.  It’s not necessary to check the “seconds behind master” or the binlog execution position, because all nodes int the cluster are always caught up fully to the original writer.  The Doctrine class extensions I’m experimenting with are meant for environments that use traditional MySQL asynchronous replication.

edit: PXC is really a bit like semi-synchronous replication, in that changes are queued on all cluster nodes, but not executed synchronously.  Because of features like multi-threadeded apply, changes are executed more promptly than they would be in single-threaded MySQL replication.  Therefore all nodes stay in sync much more reliably.

Also, you can enable wsrep_causal_reads for a given session, so that SELECTs will block until all previously queued changes are executed.  Therefore you can have assurance that reads from a slave node will return results as current as reads from the original node where the change was made.

The bottom line is that PXC handles slave lag transparently, so the solution I described in this webinar isn’t necessary.

Q: For randomizing the slave requests, should we use some randomize function or HA proxy?

A: The Doctrine connection wrapper class randomly chooses one the slaves for you.  If the randomly-selected slave is not usable because it’s outside the application’s specified tolerance, then the wrapper class chooses another slave.  No slave is chosen more than once.  If no slaves pass the tolerance test, then the wrapper class executes the query against the master.

Q: What are advantages/disadvantages of filtering-out tables from replication in master side vs slave side?

A: You can configure filtering either on the master or the slave.  Master-side filtering records only a subset of changes to the binary log.  You could use this to reduce binary log growth, if you have some databases that don’t need scale-out or failover capability.  For example tables used for logging less critical information.  In this scenario, not all queries could scale out to slaves, because some of the data would exist only on the master.

Alternatively, you can filter on the slaves. Each slave downloads the full binlog containing all changes, but each slave only replays changes for a subset of the binlog, skipping the other binlog entries.  In this way, each slave could have an easier time keeping up with changes written by the master, and each slave would “specialize” by serving read queries for only specific databases.  In this scenario, you couldn’t pick a slave at random, your application would need to direct queries to the slave(s) that have a copy of the right database.

Q: Based on the example, I assume that PoC is based on tolerance type selection per connection. It would probably makes sense to improve it to query-level. As you mentioned, many applications require different tolerance levels per query.

A: Yes, that’s a good point.  I wrote the proof of concept code with a separate connection wrapper class for each tolerance mode, but another way to design it would be a single class that can switch between different modes on successive queries.

Q: Is RBR faster replicating vs SBR replication?  RBR is faster than statement based, correct?

A: Often that’s true, Row-Based Replication (RBR) can be faster than Statement-Based Replication (SBR).  But there are exception cases too, so we can’t make such a broad general statement.  It depends on the types of queries your application runs.

Q: Hello, was there any good experiences with MySQL Proxy (RW-splitting)?

A: Most solutions with MySQL Proxy I see have stern disclaimers against using it in production.  Even the official manual page for MySQL Proxy 0.8.2 still says: “MySQL Proxy is currently an Alpha release and should not be used within production environments.”  The plugin for RW Splitting also makes similar disclaimers.

I assume it’s practically impossible for a middleware product like MySQL Proxy to assure stability, given that its intended use encourages developers to write their own plugins in Lua.  They can’t be responsible for bugs in developers’ Lua plugins.  :-)

Q: Would it work with the ORM layer of Doctrine, considering it would only work for one unit of work?

A: Yes, if you layer an ORM on top of the DAL, this makes it inconvenient to choose the degree of slave lag tolerance as I have shown.  This is probably a good reason to redesign my prototype to use hints embedded in SQL comments, instead of instantiating different wrapper classes.

Q: Does storing the position in $_SESSION lead to local storage and limit scalabilty?

A: It depends how you store $_SESSION.  I have audited sites that stored $_SESSION in the MySQL database for persistence, but when we measured query load with pt-query-digest, we found that 30% of the database response time was solely due to updating the session table.  For best scalability, I’d recommend storing $_SESSION in an in-memory data store such as Memcached or APC.  Store in $_SESSION only values that are ephemeral and easily recreated.  Anything that needs to be persistent, store it more deliberately in a persistent database.

Q: Can you compare / contrast this approach with using a write-through cache?

A: The InnoDB buffer pool is effectively a type of write-through cache.  That is, when you change data on a given InnoDB page, the change is persisted immediately, but the page also resides in the buffer pool and it can serve subsequent queries.  This allow you speedy access to recently-changed data on a given database instance, as long as the page remains in the buffer pool.

But the read/write split solution is intended for cases where the query load is too high for a single database instance, and you need to scale out to multiple database instances.

Q: Isn’t opening multiple database connections to the master and then subsequently potentially a number of “unsafe” slaves going to be a high performance cost?

A: Connecting to a MySQL database isn’t a great performance cost, compared to the queries themselves.  If it is, then persistent connections can help with that.

By the way, we’re seeing reports that MySQL 5.6 non-persistent connections are much faster, so we have that to look forward to.

Q: What about write splitting (updates,inserts) on multiple MySQL servers? Or is it possible to split only read queries (select etc.)?

A: You can set up a pair of MySQL instances to replicate from each other, so you can write to either instance.  But you don’t gain much scale-out benefits from this, since both instances have to do approximately the same work to apply the changes.  In other words, you end up multiplying work instead of multiplying capacity.

Also, some care is necessary to work with this kind of dual-writer architecture, as my colleague Jay Janssen described in his recent webinar, “The Hazards of Multi-writing in a Dual-Master Setup.” I encourage you to watch the recording of Jay’s presentation.

Percona XtraDB Cluster reduces the hazards of multi-writing by keeping all the instances in sync continuously.

Q: If a slave is slow because of bandwidth issues, the seconds behind master will report zero seconds behind if the SQL thread is caught up and the IO thread is waiting.

A: Yes, the “seconds behind master” is really “seconds behind the latest binlog entry downloaded from the master,” which is a subtle distinction.  It means that the slave could have an inaccurate idea of how much it’s lagging, if there are more binary logs yet to download.  This can happen if the slave is offline for a while, or if the network between master and slave is slow or unreliable.

A solution to measure slave lag more accurately is to use pt-heartbeat, which inserts the system timestamp into a dummy table once per second on the master.  As these timestamp values replicate to a slave, you can compare the latest value inserted to the slave’s system clock and get a much more reliable measure of slave lag (assuming the system clocks on master and slave are in sync).  But I designed my proof of concept Doctrine classes not to assume use of pt-heartbeat.

Thanks again for attending my webinar!  Here are some more tips:

 

The post Read/Write Splitting with PHP Webinar Questions Followup appeared first on MySQL Performance Blog.

May
11
2011
--

Shard-Query EC2 images available

Infobright and InnoDB AMI images are now available

There are now demonstration AMI images for Shard-Query. Each image comes pre-loaded with the data used in the previous Shard-Query blog post. The data in the each image is split into 20 “shards”. This blog post will refer to an EC2 instances as a node from here on out. Shard-Query is very flexible in it’s configuration, so you can use this sample database to spread processing over up to 20 nodes.

The Infobright Community Edition (ICE) images are available in 32 and 64 bit varieties. Due to memory requirements, the InnoDB versions are only available on 64 bit instances. MySQL will fail to start on a micro instance, simply decrease the values in the /etc/my.cnf file if you really want to try micro instances.

*EDIT*
The storage worker currently logs too much information. This can cause the disk to fill up with logs. You can fix this by modifying shard-query/run_worker to contain the following:

#!/bin/bash
while [ 1 ]
do
./worker >> /dev/null 2>&1 < /dev/null
done;

Where to find the images

Amazon ID

Name

Arch

Notes
ami-20b74949

shard-query-infobright-demo-64bit

x86_64

ICE 3.5.2pl1. Requires m1.large or larger
ami-8eb648e7

shard-query-innodb-demo-64bit

x86_64

Percona Server 5.5.11 with XtraDB. Requires m1.large or larger.
ami-f65ea19f

shard-query-infobright-demo

i686 ICE 3.5.2pl1 32bit. Requires m1.small or greater.
snap-073b6e68

shard-query-demo-data-flatfiles

30GB ext3 EBS

This is an ext3 volume which contains the flat files for the demos, if you want to reload on your favorite storage engine or database

About the cluster

For best performance, there should be an even data distribution in the system. To get an even distribution, the test data was hashed over the values in the date_id column. There will be another blog post about the usage and performance of the splitter. It is multi-threaded(actually multi-process) and is able to hash split up to 50GB/hour of input data on my i970 test machine. It is possible to distribute splitting and/or loading among multiple nodes as well. Note that in the demonstration each node will contain redundant, but non-accessed data for all configurations of more than one node. This would not be the case in normal circumstances. The extra data will not impact performance because it will never be accessed.

Since both InnoDB and ICE versions of the data are available it is important to examine the differences in size. This will give us some interesting information about how Shard-Query will perform on each database. To do the size comparison, I used the du utility:

InnoDB file size on disk: 42GB (with indexes)

# du -sh *
203M    ibdata1
128M    ib_logfile0
128M    ib_logfile1
988K    mysql
2.1G    ontime1
2.1G    ontime10
2.1G    ontime11
2.1G    ontime12
2.1G    ontime13
2.1G    ontime14
2.1G    ontime15
2.1G    ontime16
2.1G    ontime17
2.1G    ontime18
2.1G    ontime19
2.1G    ontime2
2.1G    ontime20
2.1G    ontime3
2.1G    ontime4
2.1G    ontime5
2.1G    ontime6
2.1G    ontime7
2.1G    ontime8
2.1G    ontime9
212K    performance_schema
0       test

ICE size on disk: 2.5GB

# du -sh *
8.0K    bh.err
11M     BH_RSI_Repository
4.0K    brighthouse.ini
4.0K    brighthouse.log
4.0K    brighthouse.seq
964K    mysql
123M    ontime1
124M    ontime10
123M    ontime11
123M    ontime12
123M    ontime13
123M    ontime14
123M    ontime15
123M    ontime16
123M    ontime17
123M    ontime18
124M    ontime19
124M    ontime2
124M    ontime20
124M    ontime3
123M    ontime4
122M    ontime5
122M    ontime6
122M    ontime7
123M    ontime8
125M    ontime9

The InnoDB data directory size is 42GB, which is twice the original size of the input data. The ICE schema was discussed in the comments of the last post. ICE does not have any indexes (not even primary keys).

Here is the complete InnoDB schema from one shard. The schema is duplicated 20 times (but not the ontime_fact data):

DROP TABLE IF EXISTS `dim_airport`;
CREATE TABLE `dim_airport` (
  `airport_id` int(11) NOT NULL DEFAULT '0',
  `airport_code` char(3) DEFAULT NULL,
  `CityName` varchar(100) DEFAULT NULL,
  `State` char(2) DEFAULT NULL,
  `StateFips` varchar(10) DEFAULT NULL,
  `StateName` varchar(50) NOT NULL,
  `Wac` int(11) DEFAULT NULL,
  PRIMARY KEY (`airport_id`),
  KEY `CityName` (`CityName`),
  KEY `State` (`State`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Data from BTS ontime flight data.  Data for Origin and Destination airport data.';

CREATE TABLE `dim_date` (
  `Year` year(4) DEFAULT NULL,
  `Quarter` tinyint(4) DEFAULT NULL,
  `Month` tinyint(4) DEFAULT NULL,
  `DayofMonth` tinyint(4) DEFAULT NULL,
  `DayOfWeek` tinyint(4) DEFAULT NULL,
  `FlightDate` date NOT NULL,
  `date_id` smallint(6) NOT NULL,
  PRIMARY KEY (`date_id`),
  KEY `FlightDate` (`FlightDate`),
  KEY `Year` (`Year`,`Quarter`,`Month`,`DayOfWeek`),
  KEY `Quarter` (`Quarter`,`Month`,`DayOfWeek`),
  KEY `Month` (`Month`,`DayOfWeek`),
  KEY `DayOfWeek` (`DayOfWeek`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains the date information from the BTS ontime flight data.  Note dates may not be in date_id order';
/*!40101 SET character_set_client = @saved_cs_client */;

CREATE TABLE `dim_flight` (
  `UniqueCarrier` char(7) DEFAULT NULL,
  `AirlineID` int(11) DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `FlightNum` varchar(10) DEFAULT NULL,
  `flight_id` int(11) NOT NULL DEFAULT '0',
  `AirlineName` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`flight_id`),
  KEY `UniqueCarrier` (`UniqueCarrier`,`AirlineID`,`Carrier`),
  KEY `AirlineID` (`AirlineID`,`Carrier`),
  KEY `Carrier` (`Carrier`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains information on flights, and what airline offered those flights and the flight number of the flight.  Some data hand updated.';

--
-- Table structure for table `ontime_fact`
--

CREATE TABLE `ontime_fact` (
  `date_id` int(11) NOT NULL DEFAULT '0',
  `origin_airport_id` int(11) NOT NULL DEFAULT '0',
  `dest_airport_id` int(11) NOT NULL DEFAULT '0',
  `flight_id` int(11) NOT NULL DEFAULT '0',
  `TailNum` varchar(50) DEFAULT NULL,
  `CRSDepTime` int(11) DEFAULT NULL,
  `DepTime` int(11) DEFAULT NULL,
  `DepDelay` int(11) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `DepDel15` int(11) DEFAULT NULL,
  `DepartureDelayGroups` int(11) DEFAULT NULL,
  `DepTimeBlk` varchar(20) DEFAULT NULL,
  `TaxiOut` int(11) DEFAULT NULL,
  `WheelsOff` int(11) DEFAULT NULL,
  `WheelsOn` int(11) DEFAULT NULL,
  `TaxiIn` int(11) DEFAULT NULL,
  `CRSArrTime` int(11) DEFAULT NULL,
  `ArrTime` int(11) DEFAULT NULL,
  `ArrDelay` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `ArrDel15` int(11) DEFAULT NULL,
  `ArrivalDelayGroups` int(11) DEFAULT NULL,
  `ArrTimeBlk` varchar(20) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  `CRSElapsedTime` int(11) DEFAULT NULL,
  `ActualElapsedTime` int(11) DEFAULT NULL,
  `AirTime` int(11) DEFAULT NULL,
  `Flights` int(11) DEFAULT NULL,
  `Distance` int(11) DEFAULT NULL,
  `DistanceGroup` tinyint(4) DEFAULT NULL,
  `CarrierDelay` int(11) DEFAULT NULL,
  `WeatherDelay` int(11) DEFAULT NULL,
  `NASDelay` int(11) DEFAULT NULL,
  `SecurityDelay` int(11) DEFAULT NULL,
  `LateAircraftDelay` int(11) DEFAULT NULL,
  `FirstDepTime` varchar(10) DEFAULT NULL,
  `TotalAddGTime` varchar(10) DEFAULT NULL,
  `LongestAddGTime` varchar(10) DEFAULT NULL,
  `DivAirportLandings` varchar(10) DEFAULT NULL,
  `DivReachedDest` varchar(10) DEFAULT NULL,
  `DivActualElapsedTime` varchar(10) DEFAULT NULL,
  `DivArrDelay` varchar(10) DEFAULT NULL,
  `DivDistance` varchar(10) DEFAULT NULL,
  `Div1Airport` varchar(10) DEFAULT NULL,
  `Div1WheelsOn` varchar(10) DEFAULT NULL,
  `Div1TotalGTime` varchar(10) DEFAULT NULL,
  `Div1LongestGTime` varchar(10) DEFAULT NULL,
  `Div1WheelsOff` varchar(10) DEFAULT NULL,
  `Div1TailNum` varchar(10) DEFAULT NULL,
  `Div2Airport` varchar(10) DEFAULT NULL,
  `Div2WheelsOn` varchar(10) DEFAULT NULL,
  `Div2TotalGTime` varchar(10) DEFAULT NULL,
  `Div2LongestGTime` varchar(10) DEFAULT NULL,
  `Div2WheelsOff` varchar(10) DEFAULT NULL,
  `Div2TailNum` varchar(10) DEFAULT NULL,
  `Div3Airport` varchar(10) DEFAULT NULL,
  `Div3WheelsOn` varchar(10) DEFAULT NULL,
  `Div3TotalGTime` varchar(10) DEFAULT NULL,
  `Div3LongestGTime` varchar(10) DEFAULT NULL,
  `Div3WheelsOff` varchar(10) DEFAULT NULL,
  `Div3TailNum` varchar(10) DEFAULT NULL,
  `Div4Airport` varchar(10) DEFAULT NULL,
  `Div4WheelsOn` varchar(10) DEFAULT NULL,
  `Div4TotalGTime` varchar(10) DEFAULT NULL,
  `Div4LongestGTime` varchar(10) DEFAULT NULL,
  `Div4WheelsOff` varchar(10) DEFAULT NULL,
  `Div4TailNum` varchar(10) DEFAULT NULL,
  `Div5Airport` varchar(10) DEFAULT NULL,
  `Div5WheelsOn` varchar(10) DEFAULT NULL,
  `Div5TotalGTime` varchar(10) DEFAULT NULL,
  `Div5LongestGTime` varchar(10) DEFAULT NULL,
  `Div5WheelsOff` varchar(10) DEFAULT NULL,
  `Div5TailNum` varchar(10) DEFAULT NULL,
  KEY `date_id` (`date_id`),
  KEY `flight_id` (`flight_id`),
  KEY `origin_airport_id` (`origin_airport_id`),
  KEY `dest_airport_id` (`dest_airport_id`),
  KEY `DepDelay` (`DepDelay`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Contains all avaialble data from 1988 to 2010';

mysql> use ontime1;
Database changed

mysql> show table status like 'ontime_fact'\G
*************************** 1. row ***************************
           Name: ontime_fact
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 6697533
 Avg_row_length: 241
    Data_length: 1616904192
Max_data_length: 0
   Index_length: 539279360
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2011-05-10 04:26:14
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: Contains all avaialble data from 1988 to 2010
1 row in set (0.00 sec)

With ICE, after compression there is only 2.5GB of data, so ICE gets over 16:1 compression ratio(compared to Innodb, 8:1 compared to raw input data), which is quite nice. Each shard contains only 128MB of data!

Storage engine makes a big difference

In general, a column store performs about 8x-10x better than a row store for queries which access a significant amount of data. One big reason for this is the excellent compression that RLE techniques provide.
I have not loaded InnoDB compressed tables yet but since InnoDB compression is not RLE, I doubt it will have the same impact.

For large datasets effective compression results in the need for fewer nodes in order to keep data entirely in memory. This frees disk to use on-disk temporary storage for hash joins and other background operations. This will have a direct impact in our query response times and throughput.

Setting up a cluster using the AMI images

You can easily test Shard-Query for yourself. Spin up the desired number of EC2 instances using on of the the AMI images. You should spin a number of instances that evenly divides into 20 for best results. There is a helpful utility (included in the image) to help configure the cluster and it uses a copy of this text on this page. To use it, ensure:

  1. That only the instances that you want to use are shown in the EC2 console.
  2. That the "private ip" field is selected in the list of columns to show (click show/hide to change the columns)
  3. That the "public dns" field is selected

SSH to the public DNS entry of the node on the list of nodes. This node will become "shard1".

Now, in the EC2 console hit CTRL-A to select all text on the page and then CTRL-C to copy it. Paste this into a text file on shard1 called "/tmp/servers.txt" and run the following commands:

$ cat servers.txt | grep "10\."| grep -v internal |tee hosts.internal
[host list omitted]

Now you need to set up the hosts file:

sudo su -
# cat hosts.internal | ~ec2-user/tools/mkhosts >> /etc/hosts

# ping shard20
PING shard20 (10.126.15.34) 56(84) bytes of data.
64 bytes from shard20 (10.126.15.34): icmp_seq=1 ttl=61 time=0.637 ms
...

Note: There is no need to put that hosts file on your other nodes unless you want to run workers on them.

Generate a cluster configuration

There is a script provided to generate the shards.ini file for testing an cluster of 1 to 20 nodes.

cd shard-query

#generate a config for 20 shards (adjust to your number of nodes)
php genconfig 20 > shards.ini

Running the test

For best performance, you should run the workers on one or two nodes. You should start two workers per core in the cluster.

First start gearmand:

gearmand -p 7000 -d

Then start the workers on node 1 (assuming a 20 node cluster):

cd shard-query
./start_workers 80

I normally start (2 * TOTAL_CLUSTER_CORES) workers. That is, if you have 20 machines, each with 2 cores, run 80 workers.

Test the system. You should see the following row count (the first number is wall time, the second exec time, the third parse time).

$ echo "select count(*) from ontime_fact;" | ./run_query

Array
(
    [count(*)] => 135125787
)
1 rows returned (0.084244966506958s, 0.078309059143066s, 0.0059359073638916s)

Execute the test:

As seen above, the run_query script will run one more more semicolon terminated SQL statements. The queries for the benchmark are in ~ec2-user/shard-query/queries.sql.

I have also provided a convenient script which will summarize the output from the ./run_query command, called pivot_results

cd shard-query/
$ ./run_query < queries.sql | tee raw |./pivot_results &
[1] 12359
$ tail -f ./raw
-- Q1
...

At the end, you will get a result output that is easy to graph in a spreadsheet:

$ cat raw | ./pivot_results
Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8.0,Q8.1,Q8.2,Q8.3,Q8.4,Q9,Q10,Q11
34.354,60.978,114.175,27.138,45.751,14.905,14.732,34.946,126.599,250.222,529.287,581.295,11.042,63.366,14.573

InnoDB my.cnf

[client]
port=3306
socket=/tmp/mysql-inno.sock

[mysqld]
socket=/tmp/mysql-inno.sock
default-storage-engine=INNODB
innodb-buffer-pool-instances=2
innodb-buffer-pool-size=5600M
innodb-file-format=barracuda
innodb-file-per-table
innodb-flush-log-at-trx-commit=1
innodb-flush-method=O_DIRECT
innodb-ibuf-active-contract=1
innodb-import-table-from-xtrabackup=1
innodb-io-capacity=1000
innodb-log-buffer-size=32M
innodb-log-file-size=128M
innodb-open-files=1000
innodb_fast_checksum
innodb-purge-threads=1
innodb-read-ahead=linear
innodb-read-ahead-threshold=8
innodb-read-io-threads=16
innodb-recovery-stats
innodb-recovery-update-relay-log
innodb-replication-delay=#
innodb-rollback-on-timeout
innodb-rollback-segments=16
innodb-stats-auto-update=0
innodb-stats-on-metadata=0
innodb-stats-sample-pages=256
innodb-stats-update-need-lock=0
innodb-status-file
innodb-strict-mode
innodb-thread-concurrency=0
innodb-thread-concurrency-timer-based
innodb-thread-sleep-delay=0
innodb-use-sys-stats-table
innodb-write-io-threads=4
join-buffer-size=16M
key-buffer-size=64M
local-infile=on
lock-wait-timeout=300
log-error=/var/log/mysqld-innodb.log
max-allowed-packet=1M
net-buffer-length=16K
#we value throughput over response time, get a good plan
optimizer-prune-level=0
partition=ON
port=3306
read-buffer-size=512K
read-rnd-buffer-size=1M
skip-host-cache
skip-name-resolve
sort-buffer-size=512K
sql-mode=STRICT_TRANS_TABLES
symbolic-links
table-definition-cache=16384
table-open-cache=128
thread-cache-size=32
thread-stack=256K
tmp-table-size=64M
transaction-isolation=READ-COMMITTED
user=mysql
wait-timeout=86400

To be continued

You can now set up a cluster from 1 to 20 nodes for testing. This way you can verify the numbers in my next blog post. I will compare performance of various cluster sizes on both storage engines.

Jul
12
2009
0

PHP urlEncode urlDecode Tool

So while I was at it made this PHP urlEncode urlDecode Tool because I keep forgetting to bookmark other sites’ urlencode/urldecoders.  Also because the Mac version of Hackbar in Firefox doesn’t have the same options as the Window’s version.

Jul
12
2009
0

PHP TimeDiff Tool

I needed to send an email with human readable string of the difference between two timestamps, so I made this little tool:

PHP TimeDiff Tool

Jul
03
2009
0

Well-Intentioned Destruction

A friend at work forwarded me this article from 2007 titled Well-Intentioned Destruction.  Its a great article about maintaining legacy code and debugging a serious problem with seamingly random data deletes.

Written by in: Development,PHP,Web Development | Tags: ,

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com