Jan 18, 2019

Percona XtraDB Cluster Operator 0.2.0 Early Access Release Is Now Available

Percona announces the early access release of Percona XtraDB Cluster Operator 0.2.0.

The Percona XtraDB Cluster Operator simplifies the deployment and management of Percona XtraDB Cluster in a Kubernetes or OpenShift environment. It extends the Kubernetes API with a new custom resource for deploying, configuring, and managing the application through the whole life cycle.

Note: Percona-Lab and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help save you time and effort.

Percona software builds located in the Percona-Lab and Percona-QA repositories are not officially released software and are not covered by Percona support or services agreements.

You can install the Percona XtraDB Cluster Operator on Kubernetes or OpenShift. While the operator does not support all the Percona XtraDB Cluster features in this early access release, instructions on how to install and configure it are already available, along with the operator source code, hosted in our GitHub repository.

The Percona XtraDB Cluster Operator on Percona-Lab is an early access release, and Percona does not recommend it for production environments.

New features

  • Advanced node assignment, implemented in this version, allows you to run containers with Percona XtraDB Cluster nodes on different hosts, availability zones, etc., to achieve higher availability and fault tolerance.
  • Cluster backups are now supported, and can be performed on a schedule or on demand.
  • Percona XtraDB Cluster Operator now supports private container registries like those in OpenShift so that Internet access is not required to deploy the operator.

Improvements

  • CLOUD-69: Annotations and labels are now passed from the deploy/cr.yaml configuration file to the StatefulSet for both Percona XtraDB Cluster and ProxySQL Pods.
  • CLOUD-55: Setting a password for the ProxySQL admin user is now supported.
  • CLOUD-48: Migration to Operator SDK 0.2.1.

Fixed Bugs

  • CLOUD-82: Pods were stopped in random order during cluster removal, which could cause problems when re-creating a cluster with the same name.
  • CLOUD-79: Setting a long cluster name in the deploy/cr.yaml file made Percona XtraDB Cluster unable to start.
  • CLOUD-54: The clustercheck tool used the monitor user instead of its own clustercheck user for liveness and readiness probes.

Percona XtraDB Cluster is an open source, cost-effective, and robust clustering solution for businesses. It integrates Percona Server for MySQL with the Galera replication library to produce a highly available and scalable MySQL® cluster, complete with synchronous multi-master replication, zero data loss, and automatic node provisioning using Percona XtraBackup.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

Jan 18, 2019

Percona XtraBackup 2.4.13 Is Now Available

Percona is glad to announce the release of Percona XtraBackup 2.4.13 on January 18, 2019. You can download it from our download site and from our apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

New features and improvements:

  • PXB-1548: Percona XtraBackup now enables updating the ib_buffer_pool file with the latest pages present in the buffer pool, using the --dump-innodb-buffer-pool option (see the sketch after this list). Thanks to Marcelo Altmann for the contribution.
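
For context, the ib_buffer_pool file is the same buffer pool dump file the server itself produces (and can reload at startup to warm the buffer pool). As a minimal mysql-session sketch, shown for illustration only (these are server-side commands, not xtrabackup options), you can trigger and check such a dump manually:

mysql> -- ask InnoDB to write the buffer pool page list to the ib_buffer_pool file
mysql> SET GLOBAL innodb_buffer_pool_dump_now = ON;
mysql> -- check that the dump completed
mysql> SHOW STATUS LIKE 'Innodb_buffer_pool_dump_status';

What the new xtrabackup option adds is refreshing this file as part of the backup, so a restored server can warm its buffer pool with the most recent hot pages.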

Bugs fixed

  • xtrabackup did not delete missing tables from the partial backup, which led to error messages logged by the server on startup. Bug fixed PXB-1536.
  • The --history option did not work when autocommit was disabled. Bug fixed PXB-1569.
  • xtrabackup could fail to backup encrypted tablespace when it was recently created or altered. Bug fixed PXB-1648.
  • When the --throttle option was used, the applied value was different from the one specified by the user (off by one error). Bug fixed PXB-1668.
  • It was not allowed for MTS (multi-threaded slaves) without GTID to be backed up with --safe-slave-backup. Bug fixed PXB-1672.
  • Percona XtraBackup could crash when the ALTER TABLE … TRUNCATE PARTITION command was run during a backup without locking DDL. Bug fixed PXB-1679.
  • xbcrypt could display an assertion failure and generate a core file if required parameters were missing. Bug fixed PXB-1683.
  • Using --lock-ddl-per-table caused the server to scan all records of partitioned tables, which could lead to an “out of memory” error. Bugs fixed PXB-1691 and PXB-1698.
  • xtrabackup --prepare could hang while performing insert buffer merge. Bug fixed PXB-1704.
  • Incremental backups did not update xtrabackup_binlog_info with --binlog-info=lockless. Bug fixed PXB-1711.

Other bugs fixed: PXB-1570, PXB-1609, PXB-1632.

Release notes with all the improvements for version 2.4.13 are available in our online documentation. Please report any bugs to the issue tracker.

Jan 18, 2019

Replication Manager Works with MariaDB

Complex multi-cluster replication topology

Some time ago I wrote a script to manage asynchronous replication links between Percona XtraDB clusters. The original post can be found here. The script worked well with Percona XtraDB Cluster, but it didn’t work well with MariaDB®. Finally, the replication manager works with MariaDB.

First, let’s review the purpose of the script. Managing replication links between Galera based clusters is a tedious task. There are many potential slaves and many potential masters. Furthermore, each replication link must have only a single slave. Just try to imagine how you would maintain the following replication topology:

A complex replication topology

The above topology consists of five clusters and four master-to-master links. The replication manager can easily handle this topology. Of course, it does not remove the limitations of asynchronous replication: you must make sure your writes are replication safe. You might want, for example, to maintain a global user list or to centralize some access logs. Just to refresh memories, here are some of the script highlights:

  • Uses the Galera cluster for quorum
  • Configurable, arbitrarily complex topologies
  • The script stores the topology in database tables
  • Elects slaves automatically
  • Monitors replication links
  • Slaves can connect to a list of potential masters

As you probably know, MariaDB has a different GTID implementation and a different syntax for the multi-source replication commands. I took some time to investigate why the script was failing and fixed it. Now, provided you are using MariaDB 10.1.4+ with GTIDs, the replication manager works fine. A sketch of the syntax difference follows below.
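
To illustrate the difference the script has to handle, here is a minimal sketch of setting up one replication link on each side. Host, user, and connection names are placeholders for illustration, not values from the actual script:

-- MySQL / Percona Server: the replication channel is named with FOR CHANNEL
CHANGE MASTER TO
  MASTER_HOST = 'cluster1-node1',
  MASTER_USER = 'repl',
  MASTER_AUTO_POSITION = 1
  FOR CHANNEL 'cluster1';
START SLAVE FOR CHANNEL 'cluster1';

-- MariaDB: the connection name prefixes the command, and MariaDB's own
-- GTID implementation is selected with MASTER_USE_GTID
CHANGE MASTER 'cluster1' TO
  MASTER_HOST = 'cluster1-node1',
  MASTER_USER = 'repl',
  MASTER_USE_GTID = slave_pos;
START SLAVE 'cluster1';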

You can find the script here. Be aware that although I work for Percona, the script is not officially supported by Percona.

Jan 17, 2019

Percona Server for MySQL 8.0.13-4 Is Now Available

Percona announces the release of Percona Server for MySQL 8.0.13-4 on January 17, 2019 (downloads are available here and from the Percona Software Repositories). This release contains a fix for a critical bug that prevented Percona Server for MySQL 5.7.24-26 from being upgraded to version 8.0.13-3 if there are more than around 1000 tables, or if the maximum allocated InnoDB table ID is around 1000. Percona Server for MySQL 8.0.13-4 is now the current GA release in the 8.0 series.

All of Percona’s software is open-source and free.

Percona Server for MySQL 8.0 includes all the features available in MySQL 8.0 Community Edition in addition to enterprise-grade features developed by Percona. For a list of highlighted features from both MySQL 8.0 and Percona Server for MySQL 8.0, please see the GA release announcement.

Note: If you are upgrading from 5.7 to 8.0, please ensure that you read the upgrade guide and the document Changed in Percona Server for MySQL 8.0.

Bugs Fixed

  • It was not possible to upgrade from MySQL 5.7.24-26 to 8.0.13-3 if there were more than around 1000 tables, or if the maximum allocated InnoDB table ID was around 1000. Bug fixed #5245.
  • SHOW BINLOG EVENTS FROM <bad offset> was not diagnosed inside Format_description_log_event. Bug fixed #5126 (Upstream #93544).
  • There was a typo in mysqld_safe.sh: trottling was replaced with throttling. Bug fixed #240. Thanks to Michael Coburn for the patch.
  • Percona Server for MySQL 8.0 could crash with the “Assertion failure: dict0dict.cc:7451:space_id != SPACE_UNKNOWN” exception during an upgrade from Percona Server for MySQL 5.7.23 to Percona Server for MySQL 8.0.13-3 with --innodb_file_per_table=OFF. Bug fixed #5222.
  • On Debian or Ubuntu, a conflict was reported on the /usr/bin/innochecksum file when attempting to install Percona Server for MySQL 8 over MySQL 8. Bug fixed #5225.
  • An out-of-bound read exception could occur on debug builds in the compressed columns with dictionaries feature. Bug fixed #5311.
  • The innodb_data_pending_reads server status variable contained an incorrect value. Bug fixed #5264. Thanks to Fangxin Lou for the patch.
  • A memory leak and needless allocation in compression dictionaries could happen in mysqldump. Bug fixed #5307.
  • A compression-related memory leak could happen in mysqlbinlog. Bug fixed #5308.

Other bugs fixed: #4797, #5209, #5268, #5270, #5306, #5309.

Find the release notes for Percona Server for MySQL 8.0.13-4 in our online documentation. Report bugs in the Jira bug tracker.

Jan 17, 2019

Using Parallel Query with Amazon Aurora for MySQL

Parallel query execution is my favorite non-existent feature in MySQL. In all versions of MySQL, at least at the time of writing, a single query runs in one thread, effectively utilizing only one CPU core. Multiple queries run at the same time use different threads and utilize more than one CPU core.

On multi-core machines, which are the majority of the hardware nowadays and in the cloud, we have multiple cores available for use. With faster disks (i.e. SSDs), we can’t utilize the full potential of IOPS with just one thread.

AWS Aurora (based on MySQL 5.6) now has a version that supports parallelism for SELECT queries (utilizing the read capacity of storage nodes underneath the Aurora cluster). In this article, we will look at how this can improve reporting/analytical query performance in MySQL. I will compare AWS Aurora with MySQL (Percona Server) 5.6 running on an EC2 instance of the same class.

In Short

Aurora Parallel Query response time (for queries which cannot use indexes) can be 5x-10x better compared to non-parallel, fully cached operations. This is a significant improvement for such slow queries.

Test data and versions

For my test, I need to choose:

  1. Aurora instance type and comparison
  2. Dataset
  3. Queries

Aurora instance type and comparison

According to Jeff Barr’s excellent article (https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/) the following instance classes will support parallel query (PQ):

“The instance class determines the number of parallel queries that can be active at a given time:

  • db.r*.large – 1 concurrent parallel query session
  • db.r*.xlarge – 2 concurrent parallel query sessions
  • db.r*.2xlarge – 4 concurrent parallel query sessions
  • db.r*.4xlarge – 8 concurrent parallel query sessions
  • db.r*.8xlarge – 16 concurrent parallel query sessions
  • db.r4.16xlarge – 16 concurrent parallel query sessions”

As I want to maximize the concurrency of parallel query sessions, I have chosen db.r4.8xlarge. For the EC2 instance I will use the same class: r4.8xlarge.

Aurora:

mysql> show global variables like '%version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| aurora_version          | 1.18.0                       |
| innodb_version          | 1.2.10                       |
| protocol_version        | 10                           |
| version                 | 5.6.10                       |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Linux                        |
+-------------------------+------------------------------+

MySQL on EC2:

mysql> show global variables like '%version%';
+-------------------------+------------------------------------------------------+
| Variable_name           | Value                                                |
+-------------------------+------------------------------------------------------+
| innodb_version          | 5.6.41-84.1                                          |
| protocol_version        | 10                                                   |
| slave_type_conversions  |                                                      |
| tls_version             | TLSv1.1,TLSv1.2                                      |
| version                 | 5.6.41-84.1                                          |
| version_comment         | Percona Server (GPL), Release 84.1, Revision b308619 |
| version_compile_machine | x86_64                                               |
| version_compile_os      | debian-linux-gnu                                     |
| version_suffix          |                                                      |
+-------------------------+------------------------------------------------------+

Table

I’m using the “Airlines On-Time Performance” database from http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time  (You can find the scripts I used here: https://github.com/Percona-Lab/ontime-airline-performance).

mysql> show table status like 'ontime'\G
*************************** 1. row ***************************
          Name: ontime
        Engine: InnoDB
       Version: 10
    Row_format: Compact
          Rows: 173221661
Avg_row_length: 409
   Data_length: 70850183168
Max_data_length: 0
  Index_length: 0
     Data_free: 7340032
Auto_increment: NULL
   Create_time: 2018-09-26 02:03:28
   Update_time: NULL
    Check_time: NULL
     Collation: latin1_swedish_ci
      Checksum: NULL
Create_options:
       Comment:
1 row in set (0.00 sec)

The table is very wide, 84 columns.

Working with Aurora PQ (Parallel Query)

Documentation: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-mysql-parallel-query.html

Aurora PQ works by doing a full table scan (parallel reads are done on the storage level). The InnoDB buffer pool is not used when Parallel Query is utilized.

For the purposes of the test I turned PQ on and off (normally AWS Aurora uses its own heuristics to determine if the PQ will be helpful or not):

Turn on and force:

mysql> set session aurora_pq = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq_force = 1;
Query OK, 0 rows affected (0.00 sec)

Turn off:

mysql> set session aurora_pq = 0;
Query OK, 0 rows affected (0.00 sec)

The EXPLAIN plan will also show details about parallel query execution.

Queries

Here, I use “reporting” queries, running only one query at a time. The queries are similar to those I’ve used in older blog posts comparing MySQL and Apache Spark performance (https://www.percona.com/blog/2016/08/17/apache-spark-makes-slow-mysql-queries-10x-faster/).

Here is a summary of the queries:

  1. Simple queries:
    • select count(*) from ontime where flightdate > '2017-01-01'
    • select avg(DepDelay/ArrDelay+1) from ontime
  2. Complex filter, single table:
select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest
FROM ontime
WHERE
  DestState not in ('AK', 'HI', 'PR', 'VI')
  and OriginState not in ('AK', 'HI', 'PR', 'VI')
  and flightdate > '2015-01-01'
   and ArrDelay < 15
and cancelled = 0
and Diverted = 0
and DivAirportLandings = 0
  ORDER by DepDelay DESC
LIMIT 10;

3. Complex filter, join “reference” table

select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
FROM ontime_ind o
JOIN carriers c on o.carrier = c.carrier_code
WHERE
  (carrier_name like 'United%' or carrier_name like 'Delta%')
  and ArrDelay > 30
  ORDER by DepDelay DESC
LIMIT 10\G

4. Select one row only, no index (a hypothetical sketch follows below)
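
The fourth query is not shown in this excerpt. A hypothetical sketch of its shape, with the column and value as illustrative placeholders rather than the exact query from the benchmark:

select * from ontime where TailNum = 'N68821' limit 1;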

Query 1a: simple, count(*)

Let’s take a look at the simplest query: count(*). This variant of the “ontime” table has no secondary indexes.

select count(*) from ontime where flightdate > '2017-01-01';

Aurora, PQ (parallel query) disabled:

I disabled PQ first to compare:

mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (8 min 25.49 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (2 min 48.81 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (2 min 48.25 sec)

Please note: the first run was a “cold” run, where data was read from disk. The second and third runs used cached data.

Now let's enable and force Aurora PQ:

mysql> set session aurora_pq = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq_force = 1; 
Query OK, 0 rows affected (0.00 sec)
mysql> explain select count(*) from ontime where flightdate > '2017-01-01'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: ontime
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173706586
       Extra: Using where; Using parallel query (1 columns, 1 filters, 0 exprs; 0 extra)
1 row in set (0.00 sec)

From the EXPLAIN plan, we can see that parallel query is used.

Results:

mysql> select count(*) from ontime where flightdate > '2017-01-01';                                                                                                                          
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.53 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.56 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.36 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.56 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.36 sec)

As we can see, the results are very stable. PQ does not use any cache (i.e. the InnoDB buffer pool) either. The result is also interesting: utilizing multiple threads (up to 16) and reading data from disk (probably helped by the disk cache) can be ~10x faster compared to reading from memory in a single thread.

Result: ~10x performance gain, no index used

Query 1b: simple, avg

set aurora_pq = 1; set aurora_pq_force=1;
select avg(DepDelay) from ontime;
+---------------+
| avg(DepDelay) |
+---------------+
|        8.2666 |
+---------------+
1 row in set (1 min 48.17 sec)
set aurora_pq = 0; set aurora_pq_force=0;  
select avg(DepDelay) from ontime;
+---------------+
| avg(DepDelay) |
+---------------+
|        8.2666 |
+---------------+
1 row in set (2 min 49.95 sec)

Here we can see that PQ gives us a ~2x performance increase.

Summary of simple query performance

Here is what we learned comparing Aurora PQ performance to native MySQL query execution:

  1. select count(*), not using an index: 10x performance increase with Aurora PQ.
  2. select avg(…), not using an index: 2x performance increase with Aurora PQ.

Query 2: Complex filter, single table

The following query will always be slow in MySQL. This combination of the filters in the WHERE condition makes it extremely hard to prepare a good set of indexes to make this query faster.

select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest
FROM ontime
WHERE
  DestState not in ('AK', 'HI', 'PR', 'VI')
  and OriginState not in ('AK', 'HI', 'PR', 'VI')
  and flightdate > '2015-01-01'
  and ArrDelay < 15
and cancelled = 0
and Diverted = 0
and DivAirportLandings = '0'
ORDER by DepDelay DESC
LIMIT 10;

Let’s compare the query performance with and without PQ.

PQ disabled:

mysql> set aurora_pq_force = 0;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq = 0;                                                                                                                                                                  
Query OK, 0 rows affected (0.00 sec)
mysql> explain select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: ontime
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173706586
       Extra: Using where; Using filesort
1 row in set (0.00 sec)
mysql> select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10;
+------------+---------+-----------+--------+------+
| FlightDate | carrier | FlightNum | Origin | Dest |
+------------+---------+-----------+--------+------+
| 2017-10-09 | OO      | 5028      | SBP    | SFO  |
| 2015-11-03 | VX      | 969       | SAN    | SFO  |
| 2015-05-29 | VX      | 720       | TUL    | AUS  |
| 2016-03-11 | UA      | 380       | SFO    | BOS  |
| 2016-06-13 | DL      | 2066      | JFK    | SAN  |
| 2016-11-14 | UA      | 1600      | EWR    | LAX  |
| 2016-11-09 | WN      | 2318      | BDL    | LAS  |
| 2016-11-09 | UA      | 1652      | IAD    | LAX  |
| 2016-11-13 | AA      | 23        | JFK    | LAX  |
| 2016-11-12 | UA      | 800       | EWR    | SFO  |
+------------+---------+-----------+--------+------+

10 rows in set (3 min 42.47 sec)

/* another run */

10 rows in set (3 min 46.90 sec)

This query is 100% cached. Here is the graph from PMM showing the number of read requests:

  1. Read requests: logical requests from the buffer pool
  2. Disk reads: physical requests from disk

Buffer pool requests:

Buffer pool requests from PMM

Now let’s enable and force PQ:

PQ enabled:

mysql> set session aurora_pq = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq_force = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> explain select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: ontime
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173706586
       Extra: Using where; Using filesort; Using parallel query (12 columns, 4 filters, 3 exprs; 0 extra)
1 row in set (0.00 sec)
mysql> select SQL_CALC_FOUND_ROWS
   -> FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest
   -> FROM ontime
   -> WHERE
   ->  DestState not in ('AK', 'HI', 'PR', 'VI')
   ->  and OriginState not in ('AK', 'HI', 'PR', 'VI')
   ->  and flightdate > '2015-01-01'
   ->   and ArrDelay < 15
   -> and cancelled = 0
   -> and Diverted = 0
   -> and DivAirportLandings = 0
   ->  ORDER by DepDelay DESC
   -> LIMIT 10;
+------------+---------+-----------+--------+------+
| FlightDate | carrier | FlightNum | Origin | Dest |
+------------+---------+-----------+--------+------+
| 2017-10-09 | OO      | 5028      | SBP    | SFO  |
| 2015-11-03 | VX      | 969       | SAN    | SFO  |
| 2015-05-29 | VX      | 720       | TUL    | AUS  |
| 2016-03-11 | UA      | 380       | SFO    | BOS  |
| 2016-06-13 | DL      | 2066      | JFK    | SAN  |
| 2016-11-14 | UA      | 1600      | EWR    | LAX  |
| 2016-11-09 | WN      | 2318      | BDL    | LAS  |
| 2016-11-09 | UA      | 1652      | IAD    | LAX  |
| 2016-11-13 | AA      | 23        | JFK    | LAX  |
| 2016-11-12 | UA      | 800       | EWR    | SFO  |
+------------+---------+-----------+--------+------+
10 rows in set (41.88 sec)
/* run 2 */
10 rows in set (28.49 sec)
/* run 3 */
10 rows in set (29.60 sec)

Now let’s compare the requests:

InnoDB Buffer Pool Requests

As we can see, Aurora PQ is almost not utilizing the buffer pool: there is only a minor number of read requests. Compare the maximum of 4K requests per second with PQ to the constant 600K requests per second in the previous graph.

Result: ~8x performance gain

Query 3: Complex filter, join “reference” table

In this example I join two tables: the main “ontime” table and a reference table. If both tables had no indexes, the query would simply be too slow in MySQL. To make it better, I have created indexes for both tables, so MySQL will use them for the join:

CREATE TABLE `carriers` (
 `carrier_code` varchar(8) NOT NULL DEFAULT '',
 `carrier_name` varchar(200) DEFAULT NULL,
 PRIMARY KEY (`carrier_code`),
 KEY `carrier_name` (`carrier_name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> show create table ontime_ind\G
...
 PRIMARY KEY (`id`),
 KEY `comb1` (`Carrier`,`Year`,`ArrDelayMinutes`),
 KEY `FlightDate` (`FlightDate`)
) ENGINE=InnoDB AUTO_INCREMENT=178116912 DEFAULT CHARSET=latin1

Query:

select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
FROM ontime_ind o
JOIN carriers c on o.carrier = c.carrier_code
WHERE
  (carrier_name like 'United%' or carrier_name like 'Delta%')
  and ArrDelay > 30
  ORDER by DepDelay DESC
LIMIT 10\G

PQ disabled, explain plan:

mysql> set aurora_pq_force = 0;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq = 0;                                                                                                                                                                  
Query OK, 0 rows affected (0.00 sec)
mysql> explain
   -> select SQL_CALC_FOUND_ROWS
   -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
   -> FROM ontime_ind o
   -> JOIN carriers c on o.carrier = c.carrier_code
   -> WHERE
   ->  (carrier_name like 'United%' or carrier_name like 'Delta%')
   ->  and ArrDelay > 30
   ->  ORDER by DepDelay DESC
   -> LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: c
        type: range
possible_keys: PRIMARY,carrier_name
         key: carrier_name
     key_len: 203
         ref: NULL
        rows: 3
       Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
          id: 1
 select_type: SIMPLE
       table: o
        type: ref
possible_keys: comb1
         key: comb1
     key_len: 3
         ref: ontime.c.carrier_code
        rows: 2711597
       Extra: Using index condition; Using where
2 rows in set (0.01 sec)

As we can see, MySQL uses indexes for the join. Response times:

/* run 1 – cold run */

10 rows in set (29 min 17.39 sec)

/* run 2  – warm run */

10 rows in set (2 min 45.16 sec)

PQ enabled, explain plan:

mysql> explain
   -> select SQL_CALC_FOUND_ROWS
   -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
   -> FROM ontime_ind o
   -> JOIN carriers c on o.carrier = c.carrier_code
   -> WHERE
   ->  (carrier_name like 'United%' or carrier_name like 'Delta%')
   ->  and ArrDelay > 30
   ->  ORDER by DepDelay DESC
   -> LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: c
        type: ALL
possible_keys: PRIMARY,carrier_name
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 1650
       Extra: Using where; Using temporary; Using filesort; Using parallel query (2 columns, 0 filters, 1 exprs; 0 extra)
*************************** 2. row ***************************
          id: 1
 select_type: SIMPLE
       table: o
        type: ALL
possible_keys: comb1
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173542245
       Extra: Using where; Using join buffer (Hash Join Outer table o); Using parallel query (11 columns, 1 filters, 1 exprs; 0 extra)
2 rows in set (0.00 sec)

As we can see, Aurora does not use any indexes and uses a parallel scan instead.

Response time:

mysql> select SQL_CALC_FOUND_ROWS
   -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
   -> FROM ontime_ind o
   -> JOIN carriers c on o.carrier = c.carrier_code
   -> WHERE
   ->  (carrier_name like 'United%' or carrier_name like 'Delta%')
   ->  and ArrDelay > 30
   ->  ORDER by DepDelay DESC
   -> LIMIT 10\G
...
*************************** 4. row ***************************
   FlightDate: 2017-05-04
UniqueCarrier: UA
      TailNum: N68821
    FlightNum: 1205
       Origin: KOA
OriginCityName: Kona, HI
         Dest: LAX
 DestCityName: Los Angeles, CA
     DepDelay: 1457
     ArrDelay: 1459
*************************** 5. row ***************************
   FlightDate: 1991-03-12
UniqueCarrier: DL
      TailNum:
    FlightNum: 1118
       Origin: ATL
OriginCityName: Atlanta, GA
         Dest: STL
 DestCityName: St. Louis, MO
...
10 rows in set (28.78 sec)
mysql> select found_rows();
+--------------+
| found_rows() |
+--------------+
|      4180974 |
+--------------+
1 row in set (0.00 sec)

Result: ~5x performance gain

(this actually compares an indexed, cached read to a non-indexed PQ execution)

Summary

Aurora PQ can significantly improve the performance of reporting queries, as such queries may be extremely hard to optimize in MySQL even when using indexes. Aurora PQ response time can be 5x-10x better compared to non-parallel, fully cached operations, even when the non-parallel query uses indexes. Aurora PQ helps improve the performance of complex queries by performing parallel reads.

The following table summarizes the query response times:

+---------------------------------------------------------------------+----------------------+-----------------+
| Query                                                               | Time, no PQ (cached) | Time, PQ        |
+---------------------------------------------------------------------+----------------------+-----------------+
| select count(*) from ontime where flightdate > '2017-01-01'         | 2 min 48.81 sec      | 16.53 sec       |
| select avg(DepDelay) from ontime                                    | 2 min 49.95 sec      | 1 min 48.17 sec |
| Query 2: complex filter, single table (full query text above)       | 3 min 42.47 sec      | 28.49 sec       |
| Query 3: complex filter, join with carriers (full query text above) | 2 min 45.16 sec      | 28.78 sec       |
+---------------------------------------------------------------------+----------------------+-----------------+


Photo by Thomas Lipke on Unsplash

Jan 17, 2019

IBM and Vodafone form cloud, 5G and AI business venture and ink $550M service deal

IBM is one of the world’s biggest system integrators, but to get closer to where enterprises are actually doing their work, it has been inking partnerships with companies that build the devices and run the networks enterprises use for their IT. Today comes the latest development on that front.

IBM is announcing a new venture with mobile carrier Vodafone, in a deal that comes in two parts. First, IBM will supply Vodafone’s B2B unit, Vodafone Business, with managed services in the areas of cloud and hosting. And second, the two will work together on building and delivering solutions in areas like AI, cloud, 5G, IoT and software-defined networking to enterprise customers.

The latter part of the deal appears to be a classic JV that will see both sides bringing something to the table: employees from both companies will soon be moving into a separate office together, essentially “neutral” territory. The former part, meanwhile, will see Vodafone paying IBM some $550 million in an eight-year agreement.

That price tag alone is a strong indicator that this deal is a big one for both companies.

The agreement follows along the lines of what IBM inked with Apple several years ago, where the two would work together to develop enterprise solutions that would have been more challenging to do on their own.

Indeed, while IBM does provide systems integration services, it hasn’t moved as deeply into mobile-specific solutions for businesses, even as its other operational units, doing research and other work in AI, cloud, quantum computing and other areas, are making strong headway on specific projects, some of which involve mobile technology. Now that it’s nearly in full possession of Red Hat (which it is in the process of buying for $34 billion, a deal that has now received the approval of Red Hat’s shareholders), it will also have open source cloud computing to add to that.

What the Vodafone deal will tap into is taking more of those cutting-edge developments that IBM has built and worked on in specific projects, and productising them for a wider audience of businesses and other organisations, which might already be Vodafone customers.

“To deliver multi-cloud strategies in the real world, enterprises need to invest at many levels, ranging from cloud connectivity to cloud governance and management. This new venture between Vodafone and IBM addresses the ‘full stack’ of real-world multi-cloud concerns with a powerful combination of capabilities that should enable customers to deliver multi-cloud strategies in all layers of their organizations,” noted Carla Arend, senior program director for European software at IDC.

The Apple / IBM deal is more than instructive in this case; it will help fuel this new venture. From what I understand, several fruits of that labor will be making their way into the IBM / Vodafone deal, too, which makes sense, considering Vodafone’s position as a mobile carrier and the iPhone making some strong headway into the business market.

“IBM has built industry-leading hybrid cloud, AI and security capabilities underpinned by deep industry expertise,” said IBM Chairman, President and CEO Ginni Rometty in a statement. “Together, IBM and Vodafone will use the power of the hybrid cloud to securely integrate critical business applications, driving business innovation – from agriculture to next-generation retail.”

“Vodafone has successfully established its cloud business to help our customers succeed in a digital world,” said Vodafone CEO Nick Read, in the statement. “This strategic venture with IBM allows us to focus on our strengths in fixed and mobile technologies, whilst leveraging IBM’s expertise in multicloud, AI and services. Through this new venture we’ll accelerate our growth and deepen engagement with our customers while driving radical simplification and efficiency in our business.”

I’ve been told that the first joint “customer engagements” are already happening with an unnamed energy company. Thinking about what kinds of services Vodafone may be providing to end users today — they will cover mobile data and voice connectivity, mobile broadband, IoT and 5G services — this first deal will involve tapping all four, with an emphasis on 5G and IoT.

Jan 17, 2019

Former Facebook engineer picks up $15M for AI platform Spell

In 2016, Serkan Piantino packed up his desk at Facebook with hopes to move on to something new. The former director of Engineering for Facebook AI Research had every intention to keep working on AI, but quickly realized a huge issue.

Unless you’re under the umbrella of one of these big tech companies like Facebook, it can be very difficult and incredibly expensive to get your hands on the hardware necessary to run machine learning experiments.

So he built Spell, which today received $15 million in Series A funding led by Eclipse Ventures and Two Sigma Ventures.

Spell is a collaborative platform that lets anyone run machine learning experiments. The company connects clients with the best, newest hardware hosted by Google, AWS and Microsoft Azure and gives them the software interface they need to run, collaborate and build with AI.

“We spent decades getting to a laptop powerful enough to develop a mobile app or a website, but we’re struggling with things we develop in AI that we haven’t struggled with since the 70s,” said Piantino. “Before PCs existed, the computers filled the whole room at a university or NASA and people used terminals to log into a single mainframe. It’s why Unix was invented, and that’s kind of what AI needs right now.”

In a meeting with Piantino this week, TechCrunch got a peek at the product. First, Piantino pulled out his MacBook and opened up Terminal. He began to run his own code against MNIST, which is a database of handwritten digits commonly used to train image detection algorithms.

He started the program and then moved over to the Spell platform. While the original program was just getting started, Spell’s cloud computing platform had completed the test in less than a minute.

The advantage here is obvious. Engineers who want to work on AI, either on their own or for a company, have a huge task in front of them. They essentially have to build their own computer, complete with the high-powered GPUs necessary to run their tests.

With Spell, the newest GPUs from Nvidia and Google are virtually available for anyone to run their tests.

Individual users can get on for free, specify the type of GPU they need to compute their experiment and simply let it run. Corporate users, on the other hand, are able to view the runs taking place on Spell and compare experiments, allowing users to collaborate on their projects from within the platform.

Enterprise clients can set up their own cluster, and keep all of their programs private on the Spell platform, rather than running tests on the public cluster.

Spell also offers enterprise customers a “spell hyper” command that offers built-in support for hyperparameter optimization. Folks can track their models and results and deploy them to Kubernetes/Kubeflow in a single click.

But perhaps most importantly, Spell allows an organization to instantly transform their model into an API that can be used more broadly throughout the organization, or used directly within an app or website.

The implications here are huge. Small companies and startups looking to get into AI now have a much lower barrier to entry, whereas large traditional companies can build out their own proprietary machine learning algorithms for use within the organization without an outrageous upfront investment.

Individual users can get on the platform for free, whereas enterprise clients can get started for $99/month per host used over the course of a month. Piantino explains that Spell charges based on concurrent usage, so if a customer has 10 concurrent things running, the company considers that the “size” of the Spell cluster and charges based on that.

Piantino sees Spell’s model as the key to defensibility. Whereas many cloud platforms try to lock customers in to their entire suite of products, Spell works with any language framework and lets users plug and play on the platforms of their choice by simply commodifying the hardware. In fact, Spell doesn’t even share with clients which cloud cluster (Microsoft Azure, Google or AWS) they’re on.

So, on the one hand the speed of the tests themselves goes up based on access to new hardware, but, because Spell is an agnostic platform, there is also a huge advantage in how quickly one can get set up and start working.

The company plans to use the funding to further grow the team and the product, and Piantino says he has his eye out for top-tier engineering talent, as well as a designer.

Jan 17, 2019

On-demand workspace platform Breather taps new CEO

Breather’s new CEO Bryan Murphy / Breather Press Kit

Breather, the platform that provides on-demand private workspace, announced today that it has appointed Bryan Murphy as its new CEO.

Before joining Breather, Murphy was the founder and president of direct-to-consumer mattress startup, Tomorrow Sleep. Prior to Tomorrow Sleep, Murphy held posts as an advisor to investment firms and as an executive at eBay after the company acquired his previous company, WHI Solutions — an e-commerce platform for aftermarket auto parts — where Murphy was the co-founder and CEO.

Breather believes Murphy’s extensive background scaling e-commerce and SaaS platforms, as well as his experience working with incumbents across a number of traditional industries, can help it execute through its next stage of global growth.

Murphy is filling the vacancy left by co-founder and former CEO Julien Smith, who stepped down as chief executive this past September, just three months after the company completed its $45 million Series C round, which was led by Menlo Ventures and saw participation from RRE Ventures, Temasek Holdings, Ascendas-Singbridge and Caisse de Depot et Placement du Quebec.

In a past statement on his transition, Smith said: “As I reflect on my strengths and consider what it will take for the company to reach its full potential, I realize bringing on an executive with experience scaling a company through the next level of growth is the best thing for the business.”

Smith, who remains with the company as chairman of the board, believes Murphy more than fits the bill. “Bryan’s record of scaling brands in competitive markets makes him an ideal leader to support this momentum, and I’m excited to see where he takes us next,” Smith said.

In a conversation with TechCrunch, Murphy explained that Breather’s next growth phase will ultimately come down to its ability to continue the global expansion of its network of locations and partner landlords while striking the optimal balance between rental economics and employee utility, productivity and performance. With new spaces and ramped marketing efforts, Murphy and the company expect 2019 to be a big year for Breather — “I think this year, you’re going to start hearing a lot about Breather and it really being in a leadership role for the industry.”

Breather’s workspace at 900 Broadway in New York City is one of 500+ network locations accessible to users.

On Breather’s platform, users are currently able to access a network of more than 500 private workspaces across 10 major cities around the world, which can be booked as meeting space or short-term private office space.

Meeting spaces can be reserved for as little as two hours, while office space can be booked on a month-to-month basis, providing businesses with financial flexibility, private and more spacious alternatives to co-working options, and the ability to easily change offices as they grow. For landlords, Breather allows property owners to generate value from underutilized space by providing a turnkey digital booking system, as well as expertise in the short-term rental space.

Murphy explained to TechCrunch that part of what excited him most about his new role was his belief in Breather’s significant product-market fit and the immense addressable market that he sees for flexible workspaces longer-term. With limited penetration to date, Murphy feels the commercial office space industry is in just the third inning of significant transformation. 

Murphy believes that long-term growth for Breather and other flexible space providers will be driven by a heightened focus on employee flexibility and wellness, a growing number of currently underserved companies whose needs fall between co-working and traditional direct leasing, and the need for landlords to support a wider variety of office space options as workforce demographics and behaviors shift. 

Murphy believes that the ease, flexibility and unlocked value Breather provides puts the platform in a great position to win market share.

“Breather has built a remarkable commercial real estate e-commerce and services platform that offers one-click access to over 500 workspaces around the world,” said Murphy in a press release. “To our customers, having access to workspace that is turnkey, affordable, beautiful, productive and that can flex up and down based on needs is a total game changer.”

To date, Breather has served more than 500,000 customers and has raised more than $120 million in investment.

Jan 17, 2019

Alation announces $50M Series C investment as data catalog biz takes off

Alation, a startup that helps crawl a company’s databases in order to build a data search catalog, announced a $50 million Series C investment today.

The round was led by Sapphire Ventures and Salesforce Ventures. Existing investors Costanoa Ventures, DCVC (Data Collective), Harmony Partners and Icon Ventures also participated. Today’s investment brings the total raised to $82 million, according to Crunchbase data.

The participation of Sapphire Ventures, originally launched by SAP, and Salesforce Ventures, the venture arm of Salesforce, is particularly telling. One of the issues these enterprise software companies face when they go inside large enterprises is helping customers access and understand data wherever it lives. It’s one of the reasons that Salesforce bought MuleSoft for $6.5 billion last year.

This is a problem that employees face, as well. It’s simply inefficient to query multiple databases manually, or even to know what databases exist inside a large organization. Alation uses out-of-the-box connectors to connect to common data sources like Oracle, Redshift, Teradata, Spark and Tableau to create a centralized data catalog.

With that catalog in place, employees can search just as they would with any enterprise search engine, with the notable difference that this tool is focused strictly on structured data inside of supported data sources.

The company goes beyond pure matching to find the data an employee is searching for. Company CEO and co-founder Satyen Sangani says they also use a method to analyze usage to display the most likely result. “What differentiates us in particular is that we look at the logs of how people are using that information,” he explained. This is analogous to how Google uses the PageRank algorithm to measure the popularity of a page based on the number of times people link to a page.

Alation catalog page. Screenshot: Alation

It is certainly not alone in the space, with competitors like Alteryx and Informatica, but Alation’s approach seems to be resonating. Sangani reports triple-digit growth four years running. The company has soared from 89 employees at the end of last year to around 200 today. It boasts 100 large enterprise customers in production, including names like BMW, Hilton, American Express and Salesforce (whose investment arm, Salesforce Ventures, also helped lead today’s round).

As the company grows rapidly, Sangani says he wants the capital in place to help fuel the increasing interest. The size and scope of his customers means that he will need to hire not just engineers to keep developing the product and building new connectors, but customer support and sales and marketing. In all, he expects to add between 100 and 200 employees in the next year.

He also wants to continue building out partnerships. As an example, Teradata is an authorized reseller, and has helped sell the product in global markets where a startup like Alation might lack the resources to enter.

Based in Redwood City, Calif., the company launched in 2012 and released the first version of the product in 2014. Its most recent round prior to today was a $23 million Series B in 2017.

Jan 16, 2019

AWS launches Backup, a fully managed backup service for AWS

Amazon’s AWS cloud computing service today launched Backup, a new tool that makes it easier for developers on the platform to back up their data from various AWS services and their on-premises apps. Out of the box, the service, which is now available to all developers, lets you set up backup policies for services like Amazon EBS volumes, RDS databases, DynamoDB tables, EFS file systems and AWS Storage Gateway volumes. Support for more services is planned, too. To back up on-premises data, businesses can use the AWS Storage Gateway.

The service allows users to define their various backup policies and retention periods, including the ability to move backups to cold storage (for EFS data) or delete them completely after a certain time. By default, the data is stored in Amazon S3 buckets.

Most of the supported services, except for EFS file systems, already feature the ability to create snapshots. Backup essentially automates that process and creates rules around it, so it’s no surprise that pricing for Backup is the same as for using those snapshot features (with the exception of the file system backup, which will have a per-GB charge). It’s worth noting that you’ll also pay a per-GB fee for restoring data from EFS file systems and DynamoDB backups.

Currently, Backup’s scope is limited to a given AWS region, but the company says that it plans to offer cross-region functionality later this year.

“As the cloud has become the default choice for customers of all sizes, it has attracted two distinct types of builders,” writes Bill Vass, AWS’s VP of Storage, Automation, and Management Services. “Some are tinkerers who want to tweak and fine-tune the full range of AWS services into a desired architecture, and other builders are drawn to the same breadth and depth of functionality in AWS, but are willing to trade some of the service granularity to start at a higher abstraction layer, so they can build even faster. We designed AWS Backup for this second type of builder who has told us that they want one place to go for backups versus having to do it across multiple, individual services.”

Early adopters of AWS Backup are State Street Corporation, Smile Brands and Rackspace, though this is surely a service that will attract its fair share of users, as it makes the life of admins quite a bit easier. AWS does have quite a few backup and storage partners who may not be all that excited to see AWS jump into this market, though they often offer a wider range of functionality than AWS’s service, including cross-region and offsite backups.
