Sep
07
2018
--

Taboola and Percona Host a Meetup in Tel Aviv, Oct 3

Percona Taboola Meetup in Tel Aviv

Percona Taboola Meetup in Tel AvivFrom time to time, Percona hosts and co-hosts events around the world, often at no cost for attendees. The common focus for these events is, of course, open source databases. On October 3 2018, we team up with Taboola for an evening of talks suitable for all levels of expertise, addressing both the how and the why of choosing a database technology. The event is to be held at the Taboola offices in Tel Aviv. Here’s the schedule:

  • 6:00 pm Welcome with food & drink served
  • 6:45 pm How to pick database technologies that cover your requirements (Dimitri Vanoverbeke, Percona)
  • 7:20 pm Lessons from the field, extremely heavy loads on Percona Server for MySQL (Ariel Pisetzky, Taboola)
  • 7:40 pm Migrating to the cloud, the why and the how. (Dimitri Vanoverbeke, Percona)
  • 8:00 pm Event close

If you would like to join us, please sign up here so that we can be sure to cater for you.

Meanwhile, if you have not yet bought your tickets for Percona Live Europe from November 5 – 7 in Frankfurt, then hurry… I have extended Early Bird pricing one week. We’ll be publishing the exciting, high quality schedule soon so watch out for it! (But don’t miss the Early Bird…)

Finally, if you would like to be sure to not miss our future events, please subscribe to our newsletter–we don’t necessarily blog about every opportunity.

The post Taboola and Percona Host a Meetup in Tel Aviv, Oct 3 appeared first on Percona Database Performance Blog.

Aug
29
2018
--

Scaling IO-Bound Workloads for MySQL in the Cloud

InnoDB / MyRocks throughput on IO1

Is increasing GP2 volumes size or increasing IOPS for IO1 volumes a valid method for scaling IO-Bound workloads? In this post I’ll focus on one question: how much can we improve performance if we use faster cloud volumes? This post is a continuance of previous cloud research posts:

To recap, in Amazon EC2 we can use gp2 and io1 volumes. gp2 performance can be scaled with size, i.e for gp2 volume size of 500GB we get 1500 iops; size 1000GB – 3000 iops; and for 3334GB – 10000 iops (maximal possible value). For io1 volumes we can “buy” throughput up to 30000 iops.

So I wanted to check how both InnoDB and RocksDB storage engines perform on these volumes with different throughput.

Benchmark Scenario

I will use the same datasize that I used in Saving With MyRocks in The Cloud, that is sysbench-tpcc, 50 tables, 100W each, about 500GB datasize in InnoDB and 100GB in RocksDB (compressed with LZ4).

Volumes settings: gp2 volumes from 500GB (1000GB for InnoDB) to 3400GB with 100GB increments (so each increment increases throughput by 300 iops); io1 volumes: 1TB in size, iops from 1000 to 30000 with 1000 increments.

Let’s take look at the results. I will use a slightly different format than usual, but hopefully it represents the results better. You will see density throughout the plots—a higher and narrower chart represents less variance in the throughput. The plot represents the distribution of the throughput.

Results on GP2 volumes:

InnoDB/MyRocks throughput on gp2

It’s quite interesting to see how the result scales with better IO throughput. InnoDB does not improve its throughput after gp2 size 2600GB, while MyRocks continues to scale linearly. The problem with MyRocks is that there is a lot of variance in throughput (I will show a one second resolution chart).

Results on IO1 volumes

InnoDB / MyRocks throughput on IO1

Here MyRocks again shows an impressive growth as as we add more IO capacity, but also shows a lot of variance on high capacity volumes.

Let’s compare how engines perform with one second resolution. GP2 volume, 3400GB:

InnoDB/MyRocks throughput on gp2 3400GB

IO1 volume, 30000 iops:

InnoDB/MyRocks throughput on IO1 30000 IOPS

So for MyRocks there seems to be periodical background activity, which does not allow it to achieve a stable throughput.

Raw results, if you’d like to review them, can be found here: https://github.com/Percona-Lab-results/201808-rocksdb-cloudio

Conclusions

If you are looking to improve throughput in IO-bound workloads, either increasing GP2 volumes size or increasing IOPS for IO1 volumes is a valid method, especially for the MyRocks engine.

The post Scaling IO-Bound Workloads for MySQL in the Cloud appeared first on Percona Database Performance Blog.

Aug
28
2018
--

Webinar Wed 8/29: Databases in the Hosted Cloud

databases-in-the-cloud

databases-in-the-cloudPlease join Percona’s Chief Evangelist, Colin Charles on Wednesday, August 29th, 2018, as he presents Databases in the Hosted Cloud at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

 

Nearly everyone today uses some form of database in the hosted cloud. You can use hosted MySQL, MariaDB, Percona Server, and PostgreSQL in several cloud providers as a database as a service (DBaaS).

In this webinar, Colin Charles explores how to efficiently deploy a cloud database configured for optimal performance, with a particular focus on MySQL.

You’ll learn the differences between the various public cloud offerings for Amazon RDS including Aurora, Google Cloud SQL, Rackspace OpenStack DBaaS, Microsoft Azure, and Alibaba Cloud, as well as the access methods and the level of control you have. Hosting in the cloud can be a challenge but after today’s webinar, we’ll make sure you walk away with a better understanding of how you can leverage the cloud for your business needs.

Topics include:

  • Backup strategies
  • Planning multiple data centers for availability
  • Where to host your application
  • How to get the most performance out of the solution
  • Cost
  • Monitoring
  • Moving from one DBaaS to another
  • Moving from a DBaaS to your own hosted platform

Register Now.

The post Webinar Wed 8/29: Databases in the Hosted Cloud appeared first on Percona Database Performance Blog.

Aug
24
2018
--

This Week in Data with Colin Charles 50: Percona Live Europe Sessions, PostgreSQL in Google Cloud

Colin Charles

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Grading is underway for talks at Percona Live Europe 2018. I understand that by next week you will see the tutorial schedule released. As part of the program committee, I have enjoyed reviewing tutorials, and I reckon there is great competition for the schedule. I suggest you register now, and don’t forget to book your accommodation (need a discount?).

A video worth watching: How we Live Migrated Millions of BBM Users & its Infrastructure Across the Pacific (Cloud Next ’18). I was a big Blackberry Messenger (BBM) user in its heyday. They moved from their Canadian data center to Google Cloud. Blackberry used MySQL, migrated Oracle to PostgreSQL, as Oracle isn’t sanctioned to run on Google Cloud Platform. They also moved from MySQL to Google CloudSQL since the databases weren’t critical to running the application. Some cases write to BigQuery and have Tableau query them. They make use of Cassandra native replication. With PostgreSQL, there is master-slave replication: they got users onto the new master and then promoted the master. When they did the master promotion, there was minimal (5-10 minutes) downtime while this process was ongoing. This didn’t impact their key services, messages continued to work, and this was seen as acceptable.

Releases

  • MariaDB 10.3.9 – InnoDB from 5.7.23, a new variable innodb_log_optimize_ddl for avoiding delay due to page flushing and allowing concurrent backup, and many fixes and improvements around ALTER TABLE too.
  • Percona Server for MySQL 5.6.41-84.1 – full-text search index with InnoDB table bug fixed, as well as ensuring queries on a table with CHARSET=euckr COLLATE=euckr_bin always return the same results.
  • Percona Server for MySQL 5.5.61-38.13  bug fixes.

Link List

Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 50: Percona Live Europe Sessions, PostgreSQL in Google Cloud appeared first on Percona Database Performance Blog.

Aug
20
2018
--

Using AWS EC2 instance store vs EBS for MySQL: how to increase performance and decrease cost

AWS EC2 MySQL cost savings

AWS EC2 MySQL cost savingsIf you are using large EBS GP2 volumes for MySQL (i.e. 10TB+) on AWS EC2, you can increase performance and save a significant amount of money by moving to local SSD (NVMe) instance storage. Interested? Then read on for a more detailed examination of how to achieve cost-benefits and increase performance from this implementation.

EBS vs Local instance store

We have heard from customers that large EBS GP2 volumes can be affected by short term outages—IO “stalls” where no IO is going in or out for a couple of minutes. This can happen, especially, in the largest AWS region us-east-1. Statistically, with so many disks in disk arrays (which back EBS volumes) we can expect frequent disk failures. If we allocate a very large EBS GP2 volume, i.e. 10Tb+, hitting such failure events can be common.

In the case of MySQL/InnoDB, such an IO “stall” will be obvious, particularly with the highly loaded system where MySQL needs to do physical IO. During the stall, you will see all write queries are waiting, or “hang”.  Some of the writes may error out with “Error 1030” (MySQL error code 1030 (ER_GET_ERRNO): Got error %d from storage engine). There is nothing MySQL can do here – if the IO subsystem is not available, it will need to wait for it.

The good news is: many of the newer EC2 instances (i.e. i3, m5d, etc) have local SSD disks attached (NVMe). Those disks are local to the physical server and should not suffer from the EBS issues described above. Using local disks can be a very good solution:

  1. They are faster, as they are local to the server, and do not suffer from the EBS issues
  2. They are much cheaper compared to large EBS volumes.

Please note, however, that local storage does not guarantee persistence. More about this below.

Another potential option will be to use IO1 volumes with provisional IOPS. However, it will be significantly more expensive for the large volumes and high traffic.

A look at costs

To estimate the costs, I’ve used the AWS simple monthly calculator. Estimated costs are based on 1 year reserved instances. Let’s imagine we will need to use 14TB volume (to store ~10Tb of MySQL data including binary logs). The pricing estimates will look like this:

r4.4xlarge, 122GB RAM, 16 vCPUs + EBS, 14TB volume (this is what we are presumably using now)

Amazon EC2 Service (US East (N. Virginia)) $ 1890.56 / month
Compute: $ 490.56
EBS Volumes: $1400.00

Local storage price estimate:
i3.4xlarge, 122GB RAM, 16 vCPUs, 3800 GiB disk (2 x 1900 NVMe SSD)

Amazon EC2 Service (US East (N. Virginia)) $ 627.21 / month
Compute: $ 625.61

i3.8xlarge, 244GB RAM, 32 vCPUs, 7600 GiB disk (4 x 1900 NVMe SSD)

Amazon EC2 Service (US East (N. Virginia)) $1252.82 / month
Compute: $ 1251.22

As we can see, even if we switch to i3.8xlarge and get 2x more RAM and 2x more virtual CPUs, faster storage, 10 gigabit network we can still pay 1.5x less per box what we are presumably paying now. Include replication, then that’s paying 1.5x less per each of the replication servers.

But wait … there is a catch.

How to migrate to local storage from EBS

Well, we have some challenges here to migrate from EBS to local instance NVMe storage.

  1. Wait, we are storing ~10Tb and i3.8xlarge have 7600 GiB disk. The answer is simple: compression (see below)
  2. Wait, but the local storage is ephemeral, if we loose the box we will loose our data – that is unacceptable.  The answer is also simple: replication (see below)
  3. Wait, but we use EBS snapshots for backups. That answer is simple too: we can still use EBS (and use snapshots) on 1 of the replication slave (see below)

Compression

To fit i3.8xlarge we only need 2x compression. This can be done with InnoDB row compression (row_format=compressed) or InnoDB page compression, which requires sparse file and hole punching support. However, InnoDB compression may be slower and will only compress ibd files—it does not compress binary logs, frm files, etc.

ZFS

Another option: use the ZFS filesystem. ZFS will compress all files, including binary logs and frm. That can be very helpful if we use a “schema per customer” or “table per customer” approach and need to store 100K – 200K tables in a single MySQL instance. If the data is compressible, or new tables were provisioned without much data in those, ZFS can give a significant disk savings.

I’ve used ZFS (followed Yves blog post, Hands-On Look at ZFS with MySQL). Here are the results of data compression with ZFS (this is real data, not a generated data):

# du -sh --apparent-size /mysqldata/mysql/data
8.6T	/mysqldata/mysql/data
# du -sh /mysqldata/mysql/data
3.2T	/mysqldata/mysql/data

Compression ratio:

# zfs get all | grep -i compress
...
mysqldata/mysql/data  compressratio         2.42x                  -
mysqldata/mysql/data  compression           gzip                   inherited from mysqldata/mysql
mysqldata/mysql/data  refcompressratio      2.42x                  -
mysqldata/mysql/log   compressratio         3.75x                  -
mysqldata/mysql/log   compression           gzip                   inherited from mysqldata/mysql
mysqldata/mysql/log   refcompressratio      3.75x                  -

As we can see, the original 8.6Tb of data was compressed to 3.2Tb, the compression ratio for MySQL tables is 2.42x, for binary logs 3.75x. That will definitely fit i3.8xlarge.

(For another test, I’ve generated 40 million tables spread across multiple schemas (databases). I’ve added some data only to one schema, leaving others blank. For that test I achieved ~10x compression ratio.)

Conclusion: ZFS can provide you with very good compression ratio, will allow you to use different EC2 instances on AWS, and save you a substantial amount of money. Although compression is not free performance-wise, and ZFS can be slower for some workloads, using local NVMe storage can compensate.

You can find some performance testing for ZFS on linux in this blog post: About ZFS Performance. Some benchmarks comparing EBS and local NVMe SSD storage (i3 instances) can be found in this blog post: Percona XtraDB Cluster on Amazon GP2 Volumes

MyRocks

Another option for compression would be using the MyRocks storage engine in Percona Server for MySQL, which provides compression.

Replication and using local volumes

As the local instance storage is ephemeral we need redundancy: we can use MySQL replication or Percona XtraDB cluster (PXC). In addition, we can use one replication slave—or we can attach a replication slave to PXC—and have it use EBS storage.

Local storage is not durable. If you stop the instance and then start it again, the local storage will probably disappear. (Though reboot is an exception, you can reboot the instance and the local storage will be fine.) In addition if the local storage disappears we will have to recreate MySQL local storage partition (for ZFS, i.e. zpool create or for EXT4/XFS, i.e. mkfs)

For example, using MySQL replication:

master - local storage (AZ 1, i.e. 1a)
-> slave1 - local storage (AZ 2, i.e. 1b)
-> slave2 - ebs storage (AZ 3, i.e. 1c)
   (other replication slaves if needed with local storage - optional)

MySQL Master AZ 1a, Local storage

Then we can use slave2 for ebs snapshots (if needed). This slave will be more expensive (as it is using EBS) but it can also be used to either serve production traffic (i.e. we can place smaller amount of traffic) or for other purposes (for example analytical queries, etc).

For Percona XtraDB cluster (PXC) we can just use 3 nodes, 1 in each AZ. PXC uses auto-provisioning with SST if the new node comes back blank. For MySQL replication we need some additional things:

  1. Failover from master to a slave if the master will go down. This can be done with MHA or Orchestrator
  2. Ability to clone slave. This can be done with Xtrabackup or ZFS snapshots (if using ZFS)
  3. Ability to setup a new MySQL local storage partition (for ZFS, i.e. zpool create or for EXT4/XFS, i.e. mkfs)

Other options

Here are some totally different options we could consider:

  1. Use IO1 volumes (as discussed). That can be way more expensive.
  2. Use local storage and MyRocks storage engine. However, switching to another storage engine is another bigger project and requires lots of testing
  3. Switch to AWS Aurora. That can be even more expensive for this particular case; and switching to aurora can be another big project by itself.

Conclusions

  1. Using EC2 i3 instances with local NVMe storage can increase performance and save money. There are some limitations: local storage is ephemeral and will disappear if the node has stopped. Reboot is fine.
  2. ZFS filesystem with compression enabled can decrease the storage requirements so that a MySQL instance will fit into local storage. Another option for compression could be to use InnoDB compression (row_format=compressed).

That may not work for everyone as it requires additional changes to the existing server provisioning: failover from master to a slave, ability to clone replication slaves (or use PXC), ability to setup a new MySQL local storage partition, using compression.

The post Using AWS EC2 instance store vs EBS for MySQL: how to increase performance and decrease cost appeared first on Percona Database Performance Blog.

Aug
02
2018
--

Amazon RDS Multi-AZ Deployments and Read Replicas

RDS Multi-AZ

Amazon RDS is a managed relational database service that makes it easier to set up, operate, and scale a relational database in the cloud. One of the common questions that we get is “What is Multi-AZ and how it’s different from Read Replica, do I need both?”.  I have tried to answer this question in this blog post and it depends on your application needs. Are you looking for High Availability (HA), read scalability … or both?

Before we go to into detail, let me explain two common terms used with Amazon AWS.

Region – an AWS region is a separate geographical area like US East (N. Virginia), Asia Pacific (Mumbai), EU (London) etc. Each AWS Region has multiple, isolated locations known as Availability Zones.

Availability Zone (AZ) – AZ is simply one or more data centers, each with redundant power, networking and connectivity, housed in separate facilities. Data centers are geographically isolated within the same region.

What is Multi-AZ?

Amazon RDS provides high availability and failover support for DB instances using Multi-AZ deployments.

In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica of the master DB in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to the standby replica to provide data redundancy, failover support and to minimize latency during system backups. In the event of planned database maintenance, DB instance failure, or an AZ failure of your primary DB instance, Amazon RDS automatically performs a failover to the standby so that database operations can resume quickly without administrative intervention.

You can check in the AWS management console if a database instance is configured as Multi-AZ. Select the RDS service, click on the DB instance and review the details section.

AWS management console showing that instance is Multi-AZ

This screenshot from AWS management console (above) shows that the database is hosted as Multi-AZ deployment and the standby replica is deployed in us-east-1a AZ.

Benefits of Multi-AZ deployment:

  • Replication to a standby replica is synchronous which is highly durable.
  • When a problem is detected on the primary instance, it will automatically failover to the standby in the following conditions:
    • The primary DB instance fails
    • An Availability Zone outage
    • The DB instance server type is changed
    • The operating system of the DB instance is undergoing software patching.
    • A manual failover of the DB instance was initiated using Reboot with failover.
  • The endpoint of the DB instance remains the same after a failover, the application can resume database operations without manual intervention.
  • If a failure occurs, your availability impact is limited to the time that the automatic failover takes to complete. This helps to achieve increased availability.
  • It reduces the impact of maintenance. RDS performs maintenance on the standby first, promotes the standby to primary master, and then performs maintenance on the old master which is now a standby replica.
  • To prevent any negative impact of the backup process on performance, Amazon RDS creates a backup from the standby replica.

Amazon RDS does not failover automatically in response to database operations such as long-running queries, deadlocks or database corruption errors. Also, the Multi-AZ deployments are limited to a single region only, cross-region Multi-AZ is not currently supported.

Can I use an RDS standby replica for read scaling?

The Multi-AZ deployments are not a read scaling solution, you cannot use a standby replica to serve read traffic. Multi-AZ maintains a standby replica for HA/failover. It is available for use only when RDS promotes the standby instance as the primary. To service read-only traffic, you should use a Read Replica instead.

What is Read Replica?

Read replicas allow you to have a read-only copy of your database.

When you create a Read Replica, you first specify an existing DB instance as the source. Then Amazon RDS takes a snapshot of the source instance and creates a read-only instance from the snapshot. You can use MySQL native asynchronous replication to keep Read Replica up-to-date with the changes. The source DB must have automatic backups enabled for setting up read replica.

Benefits of Read Replica

  • Read Replica helps in decreasing load on the primary DB by serving read-only traffic.
  • A Read Replica can be manually promoted as a standalone database instance.
  • You can create Read Replicas within AZ, Cross-AZ or Cross-Region.
  • You can have up to five Read Replicas per master, each with own DNS endpoint. Unlike a Multi-AZ standby replica, you can connect to each Read Replica and use them for read scaling.
  • You can have Read Replicas of Read Replicas.
  • Read Replicas can be Multi-AZ enabled.
  • You can use Read Replicas to take logical backups (mysqldump/mydumper) if you want to store the backups externally to RDS.
  • Read Replica helps to maintain a copy of databases in a different region for disaster recovery.

At AWS re:Invent 2017, AWS announced the preview for Amazon Aurora Multi-Master, this will allow users to create multiple Aurora writer nodes and helps in scaling reads/writes across multiple AZs. You can sign up for preview here.

Conclusion

While both (Multi-AZ and Read replica) maintain a copy of database but they are different in nature. Use Multi-AZ deployments for High Availability and Read Replica for read scalability. You can further set up a cross-region read replica for disaster recovery.

The post Amazon RDS Multi-AZ Deployments and Read Replicas appeared first on Percona Database Performance Blog.

Jul
17
2018
--

When Should I Use Amazon Aurora and When Should I use RDS MySQL?

Now that Database-as-a-service (DBaaS) is in high demand, there is one question regarding AWS services that cannot always be answered easily : When should I use Aurora and when RDS MySQL?

DBaaS cloud services allow users to use databases without configuring physical hardware and infrastructure, and without installing software. I’m not sure if there is a straightforward answer, but when trying to find out which solution best fits an organization there are multiple factors that should be taken into consideration. These may be performance, high availability, operational cost, management, capacity planning, scalability, security, monitoring, etc.

There are also cases where although the workload and operational needs seem to best fit to one solution, there are other limiting factors which may be blockers (or at least need special handling).

In this blog post, I will try to provide some general rules of thumb but let’s first try to give a short description of these products.

What we should really compare is the MySQL and Aurora database engines provided by Amazon RDS.

An introduction to Amazon RDS

Amazon Relational Database Service (Amazon RDS) is a hosted database service which provides multiple database products to choose from, including Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server. We will focus on MySQL and Aurora.

With regards to systems administration, both solutions are time-saving. You get an environment ready to deploy your application and if there are no dedicated DBAs, RDS gives you great flexibility for operations like upgrades or backups. For both products, Amazon applies required updates and the latest patches without any downtime. You can define maintenance windows and automated patching (if enabled) will occur within them. Data is continuously backed up to S3 in real time, with no performance impact. This eliminates the need for backup windows and other, complex or not, scripted procedures. Although this sounds great, the risk of vendor lock-in and the challenges of enforced updates and client-side optimizations are still there.

So, Aurora or RDS MySQL?

Amazon Aurora is a relational, proprietary, closed-source database engine, with all that that implies.

RDS MySQL is 5.5, 5.6 and 5.7 compatible and offers the option to select among minor releases. While RDS MySQL supports multiple storage engines with varying capabilities, not all of them are optimized for crash recovery and data durability. Until recently, it was a limitation that Aurora was only compatible with MySQL 5.6 but it’s now compatible with both 5.6 and 5.7 too.

So, in most cases, no significant application changes are required for either product. Keep in mind that certain MySQL features like the MyISAM storage engine are not available with Amazon Aurora. Migration to RDS can be performed using Percona XtraBackup.

For RDS products shell access to the underlying operating system is disabled and access to MySQL user accounts with the “SUPER” privilege isn’t allowed. To configure MySQL variables or manage users, Amazon RDS provides specific parameter groups, APIs and other special system procedures which be used. If you need to enable remote access this article will help you do so https://www.percona.com/blog/2018/05/08/how-to-enable-amazon-rds-remote-access/

Performance considerations

Although Amazon RDS uses SSDs to achieve better IO throughput for all its database services, Amazon claims that the Aurora is able to achieve a 5x performance boost than standard MySQL and provides reliability out of the box. In general, Aurora seems to be faster, but not always.

For example, due to the need to disable the InnoDB change buffer for Aurora (this is one of the keys for the distributed storage engine), and that updates to secondary indexes must be write through, there is a big performance penalty in workloads where heavy writes that update secondary indexes are performed. This is because of the way MySQL relies on the change buffer to defer and merge secondary index updates. If your application performs a high rate of updates against tables with secondary indexes, Aurora performance may be poor. In any case, you should always keep in mind that performance depends on schema design. Before taking the decision to migrate, performance should be evaluated against an application specific workload. Doing extensive benchmarks will be the subject of a future blog post.

Capacity Planning

Talking about underlying storage, another important thing to take into consideration is that with Aurora there is no need for capacity planning. Aurora storage will automatically grow, from the minimum of 10 GB up to 64 TiB, in 10 GB increments, with no impact on database performance. The table size limit is only constrained by the size of the Aurora cluster volume, which has a maximum of 64 tebibytes (TiB). As a result, the maximum table size for a table in an Aurora database is 64 TiB. For RDS MySQL, the maximum provisioned storage limit constrains the size of a table to a maximum size of 16 TB when using InnoDB file-per-table tablespaces.

Replication

Replication is a really powerful feature of MySQL (like) products. With Aurora, you can provision up to fifteen replicas compared to just five in RDS MySQL. All Aurora replicas share the same underlying volume with the primary instance and this means that replication can be performed in milliseconds as updates made by the primary instance are instantly available to all Aurora replicas. Failover is automatic with no data loss on Amazon Aurora whereas the replicas failover priority can be set.

An explanatory description of Amazon Aurora’s architecture can be found in Vadim’s post written a couple of years ago https://www.percona.com/blog/2015/11/16/amazon-aurora-looking-deeper/

The architecture used and the way that replication works on both products shows a really significant difference between them. Aurora is a High Availablity (HA) solution where you only need to attach a reader and this automatically becomes Multi-AZ available. Aurora replicates data to six storage nodes in Multi-AZs to withstand the loss of an entire AZ (Availability Zone) or two storage nodes without any availability impact to the client’s applications.

On the other hand, RDS MySQL allows only up to five replicas and the replication process is slower than Aurora. Failover is a manual process and may result in last-minute data loss. RDS for MySQL is not an HA solution, so you have to mark the master as Multi-AZ and attach the endpoints.

Monitoring

Both products can be monitored with a variety of monitoring tools. You can enable automated monitoring and you can define the log types to publish to Amazon CloudWatch. Percona Monitoring and Management (PMM) can also be used to gather metrics.

Be aware that for Aurora there is a limitation for the T2 instances such that Performance Schema can cause the host to run out of memory if enabled.

Costs

Aurora instances will cost you ~20% more than RDS MySQL. If you create Aurora read replicas then the cost of your Aurora cluster will double. Aurora is only available on certain RDS instance sizes. Instances pricing details can be found here and here.

Storage pricing may be a bit tricky. Keep in mind that pricing for Aurora differs to that for RDS MySQL. For RDS MySQL you have to select the type and size for the EBS volume, and you have to be sure that provisioned EBS IOPs can be supported by your instance type as EBS IOPs are restricted by the instance type capabilities. Unless you watch for this, you may end up having EBS IOPs that cannot be really used by your instance.

For Aurora, IOPs are only limited by the instance type. This means that if you want to increase IOPs performance on Aurora you should proceed with an instance type upgrade. In any case, Amazon will charge you based on the dataset size and the requests per second.

That said, although for Aurora you pay only for the data you really use in 10GB increments if you want high performance you have to select the correct instance. For Aurora, regardless of the instance type, you get billed $0.10 per GB-month and $0.20 per 1 million requests so if you need high performance the cost maybe even more than RDS MySQL. For RDS MySQL storage costs are based on the EBS type and size.

Percona provides support for RDS services and you might be interested in these cases studies:

When a more fully customized solution is required, most of our customers usually prefer the use of AWS EC2 instances supported by our managed services offering.

TL;DR
  • If you are looking for a native HA solution then you should use Aurora
  • For a read-intensive workload within an HA environment, Aurora is a perfect match. Combined with ProxySQL for RDS you can get a high flexibility
  • Aurora performance is great but is not as much as expected for write-intensive workloads when secondary indexes exist. In any case, you should benchmark both RDS MySQL and Aurora before taking the decision to migrate.  Performance depends much on workload and schema design
  • By choosing Amazon Aurora you are fully dependent on Amazon for bug fixes or upgrades
  • If you need to use MySQL plugins you should use RDS MySQL
  • Aurora only supports InnoDB. If you need other engines i.e. MyISAM, RDS MySQL is the only option
  • With RDS MySQL you can use specific MySQL releases
  • Aurora is not included in the AWS free-tier and costs a bit more than RDS MySQL. If you only need a managed solution to deploy services in a less expensive way and out of the box availability is not your main concern, RDS MySQL is what you need
  • If for any reason Performance Schema must be ON, you should not enable this on Amazon Aurora MySQL T2 instances. With the Performance Schema enabled, the T2 instance may run out of memory
  • For both products, you should carefully examine the known issues and limitations listed here https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.KnownIssuesAndLimitations.html and here https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.AuroraMySQL.html

The post When Should I Use Amazon Aurora and When Should I use RDS MySQL? appeared first on Percona Database Performance Blog.

Jun
20
2018
--

Is Serverless Just a New Word for Cloud Based?

serverless architecture

serverless architectureServerless is a new buzzword in the database industry. Even though it gets tossed around often, there is some confusion about what it really means and how it really works. Serverless architectures rely on third-party Backend as a Service (BaaS) services. They can also include custom code that is run in managed, ephemeral containers on a Functions as a Service (FaaS) platform. In comparison to traditional Platform as a Service (PaaS) server architecture, where you pay a predetermined sum for your instances, serverless applications benefit from reduced costs of operations and lower complexity. They are also considered to be more agile, allowing for reduced engineering efforts.

In reality, there are still servers in a serverless architecture: they are just being used, managed, and maintained outside of the application. But isn’t that a lot like what cloud providers, such as Amazon RDS, Google Cloud, and Microsoft Azure, are already offering? Well, yes, but with several caveats.

When you use any of the aforementioned platforms, you still need to provision the types of instances that you plan to use and define how those platforms will act. For example, will it run MySQL, MongoDB, PostgreSQL, or some other tool? With serverless, these decisions are no longer needed. Instead, you simply consume resources from a shared resource pool, using whatever application suits your needs at that time. In addition, in a serverless world, you are only charged for the time that you use the server instead of being charged whether you use it a lot or a little (or not at all).

Remember When You Joined That Gym?

How many of us have purchased a gym membership at some point in our life? Oftentimes, you walk in with the best of intentions and happily enroll in a monthly plan. “For only $29.95 per month, you can use all of the resources of the gym as much as you want.” But, many of us have purchased such a membership and found that our visits to the gym dwindle over time, leaving us paying the same monthly fee for less usage.

Traditional Database as a Service (DBaaS) offerings are similar to your gym membership: you sign up, select your service options, and start using them right away. There are certainly cases of companies using those services consistently, just like there are gym members who show up faithfully month after month. But there are also companies who spin up database instances for a specific purpose, use the database instance for some amount of time, and then slowly find that they are accessing that instance less and less. However, the fees for the instance, much like the fees for your gym membership, keep getting charged.

What if we had a “pay as you go” gym plan? Well, some of those certainly exist. Serverless architecture is somewhat like this plan: you only pay for the resources when you use them, and you only pay for your specific usage. This would be like charging $5 for access to the weight room and $3 for access to the swimming pool, each time you use one or the other. The one big difference with serverless architecture for databases is that you still need to have your data stored somewhere in the environment and made available to you as needed. This would be like renting a gym locker to store your workout gear so that didn’t have to bring it back and forth each time you visited.

Obviously, you will pay for that storage, whether it is your data or your workout gear, but the storage fees are going to be less than your standard membership. The big advantage is that you have what you need when you need it, and you can access the necessary resources to use whatever you are storing.

With a serverless architecture, you store your data securely on low cost storage devices and access as needed. The resources required to process that data are available on an on demand basis. So, your charges are likely to be lower since you are paying a low fee for data storage and a usage fee on resources. This can work great for companies that do not need 24x7x365 access to their data since they are only paying for the services when they are using them. It’s also ideal for developers, who may find that they spend far more time working on their application code than testing it against the database. Instead of paying for the database resources while the data is just sitting there doing nothing, you now pay to store the data and incur the database associated fees at use time.

Benefits and Risks of Going Serverless

One of the biggest possible benefits of going with a serverless architecture is that you save money and hassle. Money can be saved since you only pay for the resources when you use them. Hassle is reduced since you don’t need to worry about the hardware on which your application runs. These can be big wins for a company, but you need to be aware of some pitfalls.

First, serverless can save you money, but there is no guarantee that it will save you money.

Consider 2 different people who have the exact same cell phone – maybe it’s your dad and your teenage daughter. These 2 users probably have very different patterns of usage: your dad uses the phone sporadically (if at all!) and your teenage daughter seems to have her phone physically attached to her. These 2 people would benefit from different service plans with their provider. For your dad, a basic plan that allows some usage (similar to the base cost of storage in our serverless database) with charges for usage above that cap would probably suffice. However, such a plan for your teenage daughter would probably spiral out of control and incur very high usage fees. For her, an unlimited plan makes sense. What is a great fit for one user is a poor fit for another, and the same is true when comparing serverless and DBaaS options.

The good news is that serverless architectures and DBaaS options, like Amazon RDS, Microsoft Azure, and Google Cloud, reduce a lot of the hassle of owning and managing servers. You no longer need to be concerned about Mean Time Between Failures, power and cooling issues, or many of the other headaches that come with maintaining your hardware. However, this can also have a negative consequence.

The challenge of enforced updates

About the only thing that is consistent about software in today’s world is that it is constantly changing. New versions are released with new features that may or may not be important to you. When a serverless provider decides to implement a new version or patch of their backend, there may be some downstream issues for you to manage. It is always important to test any new updates, but now some of the decisions about how and when to upgrade may be out of your control. Proper notification from the provider gives you a window of time for testing, but they are probably going to flip the switch regardless of whether or not you have completed all of your test cycles. This is true of both serverless and DBaaS options.

A risk of vendor lock-in

A common mantra in the software world is that we want to avoid vendor lock-in. Of course, from the provider’s side, they want to avoid customer churn, so we often find ourselves on opposite sides of the same issue. Moving to a new platform or provider becomes more complex as you cede more aspects of server management to the host. This means that serverless can cause deep lock-in since your application is designed to work with the environment as your provider has configured it. If you choose to move to a different provider, you need to extract your application and your data from the current provider and probably need to rework it to fit the requirements of the new provider.

The challenge of client-side optimization

Another consideration is that optimizations of server-side configurations must necessarily be more generic compared to those you might make to self-hosted servers. Optimization can no longer be done at the server level for your specific application and use; instead, you now rely on a smarter client to perform your necessary optimizations. This requires a skill set that may not exist with some developers: the ability to tune applications client-side.

Conclusion

Serverless is not going away. In fact, it is likely to grow as people come to a better understanding and comfort level with it. You need to be able to make an informed decision regarding whether serverless is right for you. Careful consideration of the pros and cons is imperative for making a solid determination. Understanding your usage patterns, user expectations, development capabilities, and a lot more will help to guide that decision.

In a future post, I’ll review the architectural differences between on-premises, PaaS, DBaaS and serverless database environments.

 

The post Is Serverless Just a New Word for Cloud Based? appeared first on Percona Database Performance Blog.

Apr
26
2018
--

The Evolution of the DBA in an “As-A-Service” World

DBA

The requirements for managing and running a database in a modern enterprise have evolved over the past ten years. Those in charge of running enterprise databases have seen their focus shift from ensuring access and availability, to architecture, design and scalability responsibilities. Web-first companies pioneered the change by charging site reliability engineers (SRE’s) or multi-faceted DBAs with the task of ensuring that the company’s main revenue engine not only stayed up, but could scale to wherever the business needed to go. This is a far cry from the classic enterprise DBA’s top responsibilities: keep it up, keep it backed up, and react to issues as they present themselves.

Today, enterprises look for new revenue models to keep up with a shifting technology paradigm driven by the cloud. The requirements and needs for managing their database environments are changing along with this shift. In the SaaS world, application outages mean lost revenue. Worse, it leads to customer churn and gives your competitors an opening. To keep revenue flowing, every part of a SaaS company’s critical infrastructure needs to be planned out: redundancy should be built-in, and a future-proof architecture should be built to accommodate scale.

The more issues you can design out before launch, the less chance of a catastrophic outage later on. This means as a SaaS provider you want your DBAs and database engineers architecting a database that avoids problems at scale, and you want them working with your developers to write better, more efficient database calls. The database infrastructure is designed and automated to work at scale, while taking into account efficient use of resources for meeting today’s requirements.

When companies move to the cloud, the cloud provider takes care of much of the operational automation and many of the mundane day-to-day tasks (for example, using database as a service (DBaaS) options such as Amazon RDS and Aurora). But this does not eliminate the need for database expertise: it moves the function closer to the design and development side of the application. Someone needs to not only design and tune the database to support the application, but also has to understand how to build the modular pieces available in the cloud into a cohesive scalable unit that meets the needs of the application and the company. This means there are much higher impacts and clearer ROIs realized from efficient database expertise.

Cloud DBA vs. Classic DBA

 

Over the years at Percona, we have seen this shift as well. Currently, more than 50% of the support tickets our customers open are related to application design issues, query performance or database infrastructure design. This is a far cry from five years ago when these represented less than 20% of our overall caseload. This makes sense, however, when you think about the maturity of our database products and the technological advances that impact the database. A more stable MySQL and MongoDB, coupled with advances in either homegrown automation or cloud-based infrastructure, reduce the likelihood of common crashing bugs and “Core Database Software” related bugs. Instead, outages and issues are increasingly caused by design decisions, bad code or unplanned-for edge cases. In order to keep up, DBAs need to evolve to move upstream to have the greatest impact.

At Percona, we recognize the changing requirements of modern database deployments. In fact, we have been providing database expertise since the beginning of the SaaS and cloud era. We recognize the needs of clients that choose to run on a DBaaS platform are slightly different than those managing their own full-stack database deployments.

That’s why we created a brand new tier of support focused on DBaaS platforms. These services allow you to rely on your cloud provider for operational break-fix support, while augmenting that with proven world-class expertise focused on the design, development, and tuning of the database itself (which cloud providers typically don’t address).

We also launched a DBaaS-focused version of our Percona DBA service. The Percona DBA service focuses on designing, setting up, and proactively improving your DBaaS cloud environment to ensure you get the most out of your investment. 

Contact us for more details on our new support and managed service options that can help optimize your cloud database environments, and make them run as efficiently as possible with the applications that drive your business.

The post The Evolution of the DBA in an “As-A-Service” World appeared first on Percona Database Performance Blog.

Apr
23
2018
--

Percona Live 2018 Featured Talk: Data Integrity at Scale with Alexis Guajardo

Alexis Google Percona Live 2018

Percona Live 2018 Featured TalkWelcome to another interview blog for the rapidly-approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Alexis Guajardo, Senior Software Engineer at Google.com. His session talk is titled Data Integrity at Scale. Keeping data safe is the top responsibility of anyone running a database. In this session, he dives into Cloud SQL’s storage architecture to demonstrate how they check data down to the disk level:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Alexis: I am a Software Engineer on the Cloud SQL team with Google Cloud. I got into databases by using FileMaker. However, the world of database technology has changed many times over since then.

Percona: Your session is titled “Data Integrity at Scale”. Has the importance of data integrity increased over time? Why?

Alexis Google Percona Live 2018Alexis: Data integrity has always been vital to databases and data in general. The most common method is using checksum validation to ensure data integrity. The challenge that we faced at Cloud SQL on Google Cloud was how to do this for two very popular open source database solutions, and how to do it at scale. The store for MySQL was a bit more straightforward, because of innochecksum.  PostgreSQL required our team to create a utility, which is open sourced. The complicated aspect of data corruption is that sometimes it is dormant and discovered at a most inopportune time. What we have instituted are frequent checks for corruption of the entire data set, so if there is a software bug or other issue, we can mitigate it as soon as possible.

Percona: How does scaling affect the ability to maintain data integrity?

AlexisThere is a benefit to working on a team that provides a public cloud. Since Google Cloud is not bounded by most restrictions that an individual or company would be, we can allocate resources to do data integrity verifications without restriction. If I were to implement a similar system at a smaller company, most likely there would be cost and resource restrictions. However, data integrity is a feature that Google Cloud provides.

Percona: What are three things a DBA should know about ensuring data integrity?

Alexis: I think that the three things can be simplified down to three words: verify your backups.

Even if someone does not use Cloud SQL, it is vital to take backups, maintain them and verify them. Having terabytes of backups, but without verification, leaves open the possibility that a software bug or hardware issue somehow corrupted a backup.

Percona: Why should people attend your talk? What do you hope people will take away from it? 

Alexis: I would say the main reason to attend my talk is to discover more about Cloud SQL. As a DBA or developer, having a managed database as a service solution takes away a lot of the minutia. But there are still the tasks of improving queries and creating applications.  However, having reliable and verified backups is vital. With the addition of high availability and the ability to scale up easily, Cloud SQL’s managed database solution makes life much easier.

Percona: What are you looking forward to at Percona Live (besides your talk)?

Alexis: The many talks about Vitesse look very interesting. It is also an open source Google technology, and to see its adoption by many companies and how they have benefited from its use will be interesting.

Want to find out more about this Percona Live 2018 featured talk, and data integrity at scale? Register for Percona Live 2018, and see Alexis session talk Data Integrity at Scale. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

The post Percona Live 2018 Featured Talk: Data Integrity at Scale with Alexis Guajardo appeared first on Percona Database Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com