Jul 13, 2016

Using Ceph with MySQL

Over the last year, the Ceph world drew me in. Partly because of my taste for distributed systems, but also because I think Ceph represents a great opportunity for MySQL specifically and databases in general. The shift from local storage to distributed storage is similar to the shift from bare-disk host configurations to LVM-managed disk configurations.

Most of the work I’ve done with Ceph was in collaboration with folks from RedHat (mainly Brent Compton and Kyle Bader). This work resulted in a number of talks presented at the Percona Live conference in April and the RedHat Summit San Francisco at the end of June. I could write a lot about using Ceph with databases, and I hope this post is the first in a long series on Ceph. Before starting with use cases, setup configurations and performance benchmarks, I think I should quickly review the architecture and principles behind Ceph.

Introduction to Ceph

Inktank created Ceph a few years ago as a spin-off of the hosting company DreamHost. RedHat acquired Inktank in 2014 and now offers it as a storage solution. Ceph is also the dominant storage backend for OpenStack. This post, however, focuses on a more general review and isn’t restricted to a virtual environment.

A simplistic way of describing Ceph is to say it is an object store, just like S3 or Swift. This is a true statement, but only up to a certain point. A Ceph cluster minimally has two types of nodes: monitors and object storage daemons (OSDs). The monitor nodes are responsible for maintaining a map of the cluster or, if you prefer, the Ceph cluster metadata. Without access to the information provided by the monitor nodes, the cluster is useless, so redundancy and quorum at the monitor level are important.

Any non-trivial Ceph setup has at least three monitors. The monitors are fairly lightweight processes and can be co-hosted on OSD nodes (the other node type needed in a minimal setup). The OSD nodes store the data on disk, and a single physical server can host many OSD nodes – though it would make little sense for it to host more than one monitor node. The OSD nodes are listed in the cluster metadata (the “crushmap”) in a hierarchy that can span data centers, racks, servers, etc. It is also possible to organize the OSDs by disk types to store some objects on SSD disks and other objects on rotating disks.
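A quick, read-only way to see this layout on a running cluster is shown below (output details vary between Ceph releases; the commands assume an admin keyring is available):

$ ceph -s          # overall health, monitor quorum and the number of OSDs that are in/up
$ ceph osd tree    # the crushmap hierarchy: root, hosts (or racks), OSDs and their weights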

With the information provided by the monitors’ crushmap, any client can access data based on a predetermined hash algorithm. There’s no need for a relaying proxy. This becomes a big scalability factor since these proxies can be performance bottlenecks. Architecture-wise, it is somewhat similar to the NDB API, where – given a cluster map provided by the NDB management node – clients can directly access the data on data nodes.
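You can even ask the cluster where a given object would be placed; the pool and object names below are made up for illustration:

$ ceph osd map mypool some-object   # prints the placement group and the set of OSDs CRUSH computes for this object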

Ceph stores data in a logical container called a pool. With the pool definition comes a number of placement groups. The placement groups are shards of data across the pool. For example, on a four-node Ceph cluster, if a pool is defined with 256 placement groups (pgs), then each OSD will have 64 pgs for that pool. You can view the pgs as a level of indirection that smooths out the data distribution across the nodes. At the pool level, you define the replication factor (“size” in Ceph terminology).

The recommended values are a replication factor of three for spinners and two for SSD/Flash. I often use a size of one for ephemeral test VM images. A replication factor greater than one associates each pg with one or more pgs on the other OSD nodes. As the data is modified, it is replicated synchronously to the associated pgs so that the data is still available if an OSD node crashes.
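As a sketch, here is how a pool like the one in the example above could be created and its replication factor set. The pool name and values are illustrative; size the placement group count for your own cluster:

$ ceph osd pool create mysql-pool 256        # create the pool with 256 placement groups
$ ceph osd pool set mysql-pool size 3        # replication factor of three (spinners)
$ ceph osd pool set mysql-pool min_size 2    # keep serving IO with two copies during a failure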

So far, I have just discussed the basics of an object store. But the ability to update objects atomically in place makes Ceph different and better (in my opinion) than other object stores. The underlying object access protocol, RADOS, can update an arbitrary number of bytes in an object at an arbitrary offset, exactly as if it were a regular file. That update capability allows for much fancier uses of the object store, such as the support of block devices (rbd devices) and even a network file system (CephFS).

When using MySQL on Ceph, the rbd disk block device feature is extremely interesting. A Ceph rbd disk is basically the concatenation of a series of objects (4MB objects by default) that are presented as a block device by the Linux kernel rbd module. Functionally it is pretty similar to an iSCSI device as it can be mounted on any host that has access to the storage network and it is dependent upon the performance of the network.
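To make this concrete, here is a minimal sketch of creating and mounting an rbd device for MySQL. The pool and image names are hypothetical, and it assumes the rbd kernel module and a client keyring are already in place:

$ rbd create mysql-pool/mysql-vol --size 102400   # a 100 GB image, stored as 4MB objects
$ rbd map mysql-pool/mysql-vol                    # exposed by the kernel as /dev/rbd0 (or similar)
$ mkfs.xfs /dev/rbd0
$ mount /dev/rbd0 /var/lib/mysql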

The benefits of using Ceph

Agility
In a world striving for virtualization and containers, Ceph makes it easy to move database resources between hosts.

IO scalability
On a single host, you have access only to the IO capabilities of that host. With Ceph, you basically put in parallel all the IO capabilities of all the hosts. If each host can do 1000 iops, a four-node cluster could reach up to 4000 iops.

High availability
Ceph replicates data at the storage level and provides resiliency to storage node crashes. It is a kind of DRBD on steroids.

Backups
Ceph rbd block devices support snapshots, which are quick to make and have no performance impact. Snapshots are an ideal way of performing MySQL backups.
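A minimal sketch of a snapshot-based backup, assuming a hypothetical mysql-pool/mysql-vol image mounted under /var/lib/mysql; in practice you would quiesce MySQL (for example with FLUSH TABLES WITH READ LOCK) and freeze the filesystem so the snapshot is consistent:

$ fsfreeze -f /var/lib/mysql                            # flush and freeze the filesystem
$ rbd snap create mysql-pool/mysql-vol@backup-20160713  # instantaneous, copy-on-write snapshot
$ fsfreeze -u /var/lib/mysql                            # unfreeze, then release the MySQL lock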

Thin provisioning
You can clone and mount Ceph snapshots as block devices. This is a useful feature to provision new database servers for replication, either with asynchronous replication or with Galera replication.
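Cloning such a snapshot to provision a new replica could look like the following sketch (again with hypothetical names; rbd clones require format 2 images and a protected snapshot):

$ rbd snap protect mysql-pool/mysql-vol@backup-20160713
$ rbd clone mysql-pool/mysql-vol@backup-20160713 mysql-pool/mysql-replica1
$ rbd map mysql-pool/mysql-replica1    # mount this on the new database host and start MySQL from it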

The caveats of using Ceph

Of course, nothing is free. Ceph use comes with some caveats.

Ceph reaction to a missing OSD
If an OSD goes down, the Ceph cluster starts copying data so that the number of copies returns to the specified level. Although this is good for high availability, the copying process significantly impacts performance. It also implies that you cannot run a Ceph cluster with nearly full storage: you must have enough free disk space to handle the loss of one node.

The “noout” OSD flag mitigates this: it prevents Ceph from reacting automatically to a failure (but you are then on your own). When using the “noout” flag, you must monitor the cluster, detect that you are running in degraded mode and take action. This resembles a failed disk in a RAID set. You can choose this behavior as the default with the mon_osd_auto_mark_auto_out_in setting.
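For example, before planned maintenance on an OSD host you can set the flag yourself and watch the cluster state with standard Ceph commands:

$ ceph osd set noout      # tell Ceph not to mark OSDs out and rebalance while the node is down
$ ceph health detail      # shows the degraded placement groups while you are in this state
$ ceph osd unset noout    # re-enable normal recovery once the node is back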

Scrubbing
Ceph runs scrub operations daily and deep-scrub operations weekly; although they are throttled, they can still impact performance. You can modify the interval and the hours that control the scrub action. Once per day and once per week are likely fine, but you should set osd_scrub_begin_hour and osd_scrub_end_hour to restrict the scrubbing to off hours. Also, scrubbing throttles itself so as not to put too much load on the nodes; the osd_scrub_load_threshold variable sets the threshold.
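A sketch of restricting scrubbing to off hours, either persistently in ceph.conf or injected at runtime (exact option syntax can vary slightly between releases, so treat the values as illustrative):

# in the [osd] section of ceph.conf:
#   osd_scrub_begin_hour = 1
#   osd_scrub_end_hour = 6
#   osd_scrub_load_threshold = 0.5
$ ceph tell osd.* injectargs '--osd_scrub_begin_hour 1 --osd_scrub_end_hour 6'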

Tuning
Ceph has many parameters, so tuning it can be complex and confusing. Since distributed systems push the hardware, properly tuning Ceph might require work such as distributing interrupt load among cores, pinning threads to cores and handling NUMA zones, especially if you use high-speed NVMe devices.

Conclusion

Hopefully, this post provided a good introduction to Ceph. I’ve discussed the architecture, the benefits and the caveats of Ceph. In future posts, I’ll present use cases with MySQL. These cases include performing Percona XtraDB Cluster SST operations using Ceph snapshots, provisioning async slaves and building HA setups. I also hope to provide guidelines on how to build and configure an efficient Ceph cluster.

Finally, a note for those who think cost and complexity put building a Ceph cluster out of reach. The picture below shows my home cluster (which I use quite heavily). The cluster comprises four ARM-based nodes (Odroid-XU4), each with a 2 TB portable USB 3.0 hard disk, a 16 GB eMMC flash disk and a gigabit Ethernet port.

I won’t claim record breaking performance (although it’s decent), but cost-wise it is pretty hard to beat (at around $600)!


Dec 29, 2015

2016 Percona Live Tutorials Schedule is UP!

We are excited to announce that the tutorial schedule for the Percona Live Data Performance Conference 2016 is up!

The schedule shows all the details for each of our informative and enlightening Percona Live tutorial sessions, including insights into InnoDB, MySQL 5.7, MongoDB 3.2 and RocksDB. These tutorials are a must for any data performance professional!

The Percona Live Data Performance Conference is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The sneak peek schedule for Percona Live 2016 has also been posted! The Conference will feature a variety of formal tracks and sessions related to MySQL, NoSQL and Data in the Cloud. With over 150 slots to fill, there will be no shortage of great content this year.

The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Just a reminder to everyone out there: our Super Saver discount rate for the Percona Live Data Performance and Expo 2016 is only available ‘til December 31 11:30pm PST! This rate gets you all the excellent and amazing opportunities that Percona Live offers, at the lowest price possible!

Become a conference sponsor! We have sponsorship opportunities available for this annual MySQL, NoSQL and Data in the Cloud event. Sponsors become a part of a dynamic and growing ecosystem and interact with more than 1,000 DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who attend the event.

Click through to the tutorial link right now, look them over, and pick which sessions you want to attend!

Nov 02, 2015

A first look at RDS Aurora

Recently, I happened to have an onsite engagement and the goal of the engagement was to move a database service to RDS Aurora. Like probably most of you, I knew the service by name but I couldn’t say much about it, so, I Googled, I listened to talks and I read about it. Now that my onsite engagement is over, here’s my first impression of Aurora.

First, let’s describe the service itself. It is part of RDS and, at first glance, very similar to a regular RDS instance. In order to set up an Aurora instance, you go to the RDS console and either launch a new instance choosing Aurora as the type, or create a snapshot of an RDS 5.6 instance and migrate it to Aurora. While with a regular MySQL RDS instance you can create slaves, with Aurora you add reader nodes to an existing cluster. An Aurora cluster minimally consists of a writer node, but you can add up to 15 reader nodes (only one writer though). It is at the storage level that things become interesting. Aurora doesn’t rely on filesystem-type storage, at least not from a database standpoint; it has its own special storage service that is replicated locally and to two other AZs automatically, for a total of six copies. Furthermore, you pay only for what you use, and the storage grows/shrinks automatically in increments of 10 GB, which is pretty cool. You can have up to 64 TB in an Aurora cluster.
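If you prefer the command line over the console, creating a small cluster with one writer and one reader looks roughly like the sketch below with the AWS CLI. The identifiers and the instance class are made up, and the flags should be checked against the current RDS documentation:

$ aws rds create-db-cluster --db-cluster-identifier my-aurora --engine aurora \
      --master-username admin --master-user-password secret123
$ aws rds create-db-instance --db-instance-identifier my-aurora-writer \
      --db-cluster-identifier my-aurora --engine aurora --db-instance-class db.r3.large
$ aws rds create-db-instance --db-instance-identifier my-aurora-reader1 \
      --db-cluster-identifier my-aurora --engine aurora --db-instance-class db.r3.large
# the first instance in the cluster becomes the writer; additional instances join as readers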

Now, all that is fine, but what are the benefits of using Aurora? I must say I barely used Aurora; one week is not field-proven experience. These are Amazon’s claims but, as we will discuss, there are some good arguments in favor of them.

The first claim is that the write capacity is increased by up to 4x. So, even if only a single instance is used as the writer in Aurora, you get up to 400% of the write capacity of a normal MySQL instance. That’s huge, but it basically means replication is asynchronous at the storage level, at least for the multi-AZ part, since the latency would otherwise be a performance killer. Locally, Aurora uses a quorum-based approach with the storage nodes. Given that the object store is a separate service with its own high availability configuration, that is a reasonable trade-off. By contrast, Galera-based clustering solutions like Percona XtraDB Cluster typically lower the write capacity, since all nodes must synchronize on commit. Other claims are that the readers’ performance is unaffected by the clustering and that the readers have almost no lag relative to the writer. Furthermore, readers can’t diverge from the master. Finally, since there’s no lag, any reader can replace the writer very quickly, so failover is fast.

That seems almost too good to be true; how can it be possible? I happen to be interested in object stores, Ceph especially, and I was toying with the idea of using Ceph to store InnoDB pages. It appears that the Amazon team did a great job at putting an object store under InnoDB, and they went way further than what I was thinking. Here I may be speculating a bit, and I would be happy to be proven wrong. The writer never writes dirty pages back to the store; it only writes fragments of the InnoDB log to the object store as objects, one per transaction, and notifies the readers of the set of pages that have been updated by this log fragment object. Just have a look at the SHOW GLOBAL STATUS output of an Aurora instance and you’ll see what I mean. Said otherwise, it is like having an infinitely large set of InnoDB log files; you can’t reach the max checkpoint age. Also, if the object store supports atomic operations, there’s no need for the doublewrite buffer, a high source of contention in MySQL. Those two aspects alone are enough, in my opinion, to explain the up-to-4x write capacity claim. On top of that, the log fragments are a kind of binary diff, which is usually much less data to write than whole pages.

Something is needed to remove the fragment log objects: over time, the accumulation of these log objects and the need to apply them would impact performance, a phenomenon called log amplification. With Aurora, that seems to be handled at the storage level, and the storage system is wise enough to know that a requested page is dirty and to apply the log fragments before sending it back to the reader. The shared object store also explains why the readers have almost no lag and why they can’t diverge. The only lag the readers can have is the notification time, which should be short within the same AZ.

So, how does Aurora compare to a technology like Galera?

Pros:

  • Higher write capacity, writer is unaffected by the other nodes
  • Simpler logic, no need for certification
  • No need for an SST to provision a new node
  • Can’t diverge
  • Scales IOPS tremendously
  • Fast failover
  • No need for quorum (handled by the object store)
  • Simple to deploy

Cons:

  • Likely asynchronous at the storage level
  • Only one node is writable
  • Not open source

Aurora is a mind shift in terms of databases and a jewel in the hands of Amazon. OpenStack currently has no database service that can offer similar features. I wonder how hard it would be to produce an equivalent solution using well-known open source components like Ceph for the object store and Corosync, ZooKeeper, ZeroMQ or something else for the communication layer. Also, would there be a use case?

The post A first look at RDS Aurora appeared first on MySQL Performance Blog.

May 20, 2015

Percona XtraBackup 2.3.1-beta1 is now available

Percona is glad to announce the release of Percona XtraBackup 2.3.1-beta1 on May 20th, 2015. Downloads are available from our download site. This beta release will be available in the Debian testing and CentOS testing repositories.

This is a beta-quality release and it is not intended for production. If you want a high-quality, generally available release, the current stable version should be used (currently 2.2.10 in the 2.2 series at the time of writing).

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

This release contains all of the features and bug fixes in Percona XtraBackup 2.2.10, plus the following:

New Features:

  • The innobackupex script has been rewritten in C and is now a symlink to xtrabackup. innobackupex still supports all the features and syntax of the 2.2 version, but it is now deprecated and will be removed in the next major release. Syntax for new features will be added only to xtrabackup, not to innobackupex. xtrabackup now also copies MyISAM tables and supports every feature of innobackupex. Syntax for features previously unique to innobackupex (option names and allowed values) remains the same for xtrabackup.
  • Percona XtraBackup can now read Swift parameters from an [xbcloud] section of the .my.cnf file in the user’s home directory, or alternatively from the global configuration file /etc/my.cnf. This makes it more convenient to use and avoids passing sensitive data, such as --swift-key, on the command line (see the example after this list).
  • Percona XtraBackup now supports different authentication options for Swift.
  • Percona XtraBackup now supports partial download of the cloud backup.
  • Options: --lock-wait-query-type, --lock-wait-threshold and --lock-wait-timeout have been renamed to --ftwrl-wait-query-type, --ftwrl-wait-threshold and --ftwrl-wait-timeout respectively.
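As an illustration, such an [xbcloud] section might look like the sketch below. The parameter names follow the xbcloud Swift options and should be checked against the 2.3 documentation; the values are placeholders:

[xbcloud]
storage=swift
swift-url=http://object-store.example.com:8080/auth/v1.0
swift-user=backup_user
swift-key=backup_password
swift-container=mysql_backups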

Bugs Fixed:

  • innobackupex didn’t work correctly when credentials were specified in .mylogin.cnf. Bug fixed #1388122.
  • Options --decrypt and --decompress didn’t work with xtrabackup binary. Bug fixed #1452307.
  • Percona XtraBackup now executes an extra FLUSH TABLES before executing FLUSH TABLES WITH READ LOCK to potentially lower the impact from FLUSH TABLES WITH READ LOCK. Bug fixed #1277403.
  • innobackupex didn’t read user,password options from ~/.my.cnf file. Bug fixed #1092235.
  • innobackupex was always reporting the original version of the innobackup script from InnoDB Hot Backup. Bug fixed #1092380.

Release notes with all the bugfixes for Percona XtraBackup 2.3.1-beta1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker. Percona XtraBackup is an open source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases.

The post Percona XtraBackup 2.3.1-beta1 is now available appeared first on MySQL Performance Blog.

Apr 08, 2015

More on OpenStack Live and our talks at OpenStack Summit Vancouver

In April and May, Percona will hold and participate in two OpenStack events: OpenStack Live and the OpenStack Summit. Join our talks at these events in Santa Clara and Vancouver for new insights into the MySQL operations of the core of OpenStack as well as the latest information on MySQL guest instances.

Next week (April 13-14), Percona will host OpenStack Live at the Santa Clara Convention Center. Conveniently located for those in the Bay Area, this two-day, user-focused event will cover many core aspects of OpenStack including Nova, Swift, Neutron, and Trove as well as related technologies like Ceph and Docker. (OpenStack Live passes include access to keynotes and the expo hall at the Percona Live MySQL Conference.)

OpenStack Live will be a packed two days with 6 hands-on tutorials, 18 sessions, and keynotes and panel discussions featuring speakers from Facebook, EMC, ObjectRocket, Percona, Yahoo, VMware, Deep Information Sciences, and special guest, Steve Wozniak. Yes, the real Steve Wozniak.

Of course I’m looking forward to my session, “An introduction to Database as a Service with an emphasis on OpenStack using Trove,” with Amrith Kumar, Founder and CTO of Tesora, on Tuesday from 11:30am – 12:20pm in Room 209. Some other topics you shouldn’t miss include:

Tutorial: Deploying, Configuring and Operating OpenStack Trove
Monday, April 13: 9:30am – 12:30pm in Room 203
Amrith Kumar (Tesora) and Sriram Kalyanasundaram (Tesora) will lead a three-hour hands-on tutorial on the Trove DBaaS. It’s rare to get such a focused tutorial on this key component of OpenStack, led by Tesora, major contributors to the Trove project. If you are even considering implementing Trove for your cloud, this is a must-attend class.

Tutorial: How to Get Your Groove on with OpenStack Swift Object Storage
Monday, April 13: 1:30pm – 4:30pm in Room 204
John Dickinson (OpenStack Swift), Manzoor Brar (SwiftStack), and Sergei Glushenko (Percona) will lead the audience on a journey into Swift, OpenStack’s object store project. John, the Swift “Project Team Lead” (PTL), and Manzoor will show you how to deploy a Swift cluster, use it with real applications, and, with the assistance of Sergei from Percona, how to properly backup MySQL databases to a Swift cluster.

Sessions which I’d recommend attending are:

Session: MySQL and OpenStack deep dive
Tuesday, April 14: 11:30am – 12:20pm in Room 204
Peter Boros (Percona)

Session: Deploying a OpenStack Cloud at Scale at Time Warner Cable
Tuesday, April 14: 1:20pm – 2:10pm in Room 203
Matthew Fischer (Time Warner Cable), Clayton O’Neill (Time Warner Cable)

Session: Designing a highly resilient Network Infrastructure for OpenStack Clouds
Tuesday, April 14: 3:50pm – 4:40pm in Room 203
Pere Monclus (PLUMgrid)

This is only a small sample of the tutorials and talks which you will hear at OpenStack Live. If you’re in the Bay Area next week and are interested in OpenStack or just want to learn more about some of the core components, register today to join us in Santa Clara from April 13-14. It will be well worth the investment.

As a reminder, getting the most out of all tutorials at OpenStack Live requires that you bring your own laptop and are ready to dig into the technologies. Review the session descriptions for any materials that should be downloaded or installed prior to the event.

Talks at OpenStack Summit Vancouver

Next month, Percona will join thousands in Vancouver for the next OpenStack Summit. In addition to helping those developing and running OpenStack to operate, optimize, and achieve high availability of the MySQL core, Percona MySQL experts will speak in two sessions.

Deep Dive into MySQL replication with OpenStack Trove, and Kilo
George Lorch (Percona) and Amrith Kumar (Tesora) will present an in depth exploration of MySQL replication with Trove and Kilo capabilities. This is a great way for anyone interested in OpenStack DBaaS to find out the latest information.

Core Services MySQL Database Backup and Recovery to Swift
Most OpenStack services rely heavily on MySQL, but how do you mitigate data loss in the event of a failed upgrade or unplanned outage? Kenny Gryp (Percona) and Chris Nelson (SwiftStack), experts in MySQL and storage, will discuss this challenge and provide advice and strategies for a fast, complete recovery from highly available storage sources.

The team from Percona will again be found in the exhibit hall at booth T37. We’re looking forward to seeing old friends and meeting new ones that rely on Percona software, such as the Galera-based Percona XtraDB Cluster and Percona XtraBackup, in their OpenStack clouds.

The post More on OpenStack Live and our talks at OpenStack Summit Vancouver appeared first on MySQL Performance Blog.

Sep 29, 2014

MySQL & OpenStack: How to overcome issues as your dataset grows

MySQL is the database of choice for most OpenStack components (Ceilometer is a notable exception). If you start with a small deployment, it will probably run like a charm. But as soon as the dataset grows, you will suddenly face several challenges. We will write a series of blog posts explaining the issues you may hit and how to overcome them.

Where is MySQL used in OpenStack?

Have a look at the logical diagram of OpenStack below (click the image for a larger view).

 

MySQL & OpenStack: A deployment primer

The diagram is a bit outdated: Neutron appears as Quantum and newer components like Heat are not pictured. But it shows that a database has to be used to store metadata or runtime information. And although many databases are supported, MySQL is the most common choice. Of course MySQL can also be used in instances running inside an OpenStack cloud.

What are the most common issues?

As with many applications, when you start small, the database runs well and maintenance operations are fast and easy to perform. But as the dataset grows, you will find that the following operations become increasingly difficult:

  1. Having good backups: mysqldump is the standard backup tool for small deployments. While backing up an instance with 100GB of data is still quite fast, restore is single-threaded and will take hours. You will probably need to use other tools such as Percona XtraBackup, but what are the tradeoffs?
  2. Changing the schema: whenever you have to add an index, change a datatype or add a column, it can trigger a table rebuild that prevents writes from proceeding on the table. While the rebuild is fast when the table has only a few hundred MBs of data, ALTER TABLE statements can easily take hours or days for very large tables. Using pt-online-schema-change from Percona Toolkit is a good workaround (see the example after this list), but it doesn’t mean that you can blindly run it without any precaution.
  3. Making the database highly available: whenever the database is down, the whole platform is down or runs in a degraded state. So you need to plan for a high availability solution. One option is to use Galera, but that can introduce subtle issues.
  4. Monitoring the health of your database instances: MySQL exposes hundreds of metrics, so how do you know which ones to look at to quickly identify potential issues?
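For item 2, a hedged sketch of what an online schema change can look like with pt-online-schema-change; the database, table and column names are only illustrative, and a --dry-run pass is always worth doing first:

$ pt-online-schema-change --alter "ADD COLUMN extra_info VARCHAR(255)" \
      D=nova,t=instances --dry-run
$ pt-online-schema-change --alter "ADD COLUMN extra_info VARCHAR(255)" \
      D=nova,t=instances --execute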

1. and 2. are not likely to be issues for the MySQL instance backing your OpenStack cloud as it will be very small, but they can be big hurdles for guest databases that can grow very large depending on the application.

3. and 4. are highly desirable no matter the size of the database.

Stay tuned for more related posts on MySQL & OpenStack – and feel free to give us your feedback! And remember that if MySQL is showing bad performance in your OpenStack deployment, Percona is here to help. Just give us a call anytime, 24/7. I also invite you and your team to attend the inaugural OpenStack Live 2015 conference, which runs April 13-14, 2015 in Santa Clara, Calif. It runs alongside the Percona Live MySQL Conference and Expo (April 13-16) at the Hyatt Regency Santa Clara and the Santa Clara Convention Center.

The post MySQL & OpenStack: How to overcome issues as your dataset grows appeared first on Percona Performance Blog.

Aug 28, 2014

OpenStack Trove Day 2014 Recap: MySQL and DBaaS


I just returned from a week in Cambridge, Massachusetts where I was attending the OpenStack Trove Day and the Trove mid-cycle meetup, both sponsored by the great folks at Tesora.

I am relatively new to the OpenStack and Trove arenas so this was a fantastic opportunity for me to learn more about the communities, the various components within OpenStack, and what part Trove plays. I found the entire event very worthwhile – I met a lot of key people in the community, learned more about Trove and its potential, and in general felt a great energy and excitement surrounding Trove and OpenStack as a whole.

There were more than 120 attendees at Trove Day. That is almost four times the initial estimate! I think I would call that a success. There were seven very high-quality talks that covered material ranging from new and upcoming features within Trove, to deep inspections of how it is currently used at several big-name companies, to an investor’s perspective of the OpenStack market. There were also two panel-style discussions that covered a lot of ground, with all participants being ‘guys on the ground’ actively working with OpenStack deployments, including one of my fellow Perconians, Mr. Tim Sharp.

One of the main takeaways for me from the entire day was the forward-looking adoption estimates for Trove. This came up over and over through the various talks and panels. There seems to be a tremendous amount of interest in Trove deployments for late 2014/2015, but very few actual live users today. There also seems to be a bit of a messaging issue and confusion amongst potential users as to what Trove really is and is not. Simply reading the Trove Mission Statement should quickly clarify:

The OpenStack Open Source Database as a Service Mission: To provide scalable and reliable Cloud Database as a Service provisioning functionality for both relational and non-relational database engines, and to continue to improve its fully-featured and extensible open source framework.

So allow me to expand on that a bit based on some specific comments or questions that I overheard:
– Trove is NOT a database abstraction layer nor any sort of database unification tool; all applications still communicate with their respective datastores directly through their native APIs.
– Trove is NOT a database monitoring, management or analysis tool; all of your favorite debugging and monitoring tools like Percona Toolkit will still work exactly as advertised, and yes, you do need a monitoring tool.
– Although Trove does have some useful backup scheduling options, Trove is NOT a complete backup and recovery tool that can accommodate every backup strategy; you may still use 3rd party options such as scripting your own around Percona XtraBackup or make your life a lot easier and sign up for the Percona Backup Service.
– Trove IS a very nice way to add resource provisioning for many disparate datastores and has some ‘smarts’ built in for each. This ensures a common user experience when provisioning and managing datastore instances.

To that final point, our friends at Tesora introduced their new Database Certification Program at Trove Day. This new program will ensure a high level of compatibility between the various participating database vendors and the Trove project. Of course, Percona Server has already been certified.

I see the future of Trove as being very bright with a huge potential for expansion into other areas, once it is stabilized. I am very excited to begin contributing to this project and watch it grow.

Until next time…

The post OpenStack Trove Day 2014 Recap: MySQL and DBaaS appeared first on MySQL Performance Blog.


Aug 25, 2014

OpenStack’s Trove: The benefits of this database as a service (DBaaS)

In a previous post, my colleague Dimitri Vanoverbeke discussed at a high level the concepts of database as a service (DBaaS), OpenStack and OpenStack’s implementation of a DBaaS, Trove. Today I’d like to delve a bit further into Trove and discuss where it fits in, and who benefits.

Just to recap, Trove is OpenStack’s implementation of a database as a service for its cloud infrastructure as a service (IaaS). And as the mission statement declares, the Trove project seeks to provide a scalable and reliable cloud database service providing functionality for both relational and non-relational database engines. With the current release of Icehouse, the technology has begun to show maturity providing both stability and a rich feature set.

In my opinion, there are two primary markets that will benefit from Trove: the first being service providers such as RackSpace who provide cloud-based services similar to Amazon’s AWS. These are companies that wish to expand beyond the basic cloud services of storage and networking and provide their customer base with a richer cloud experience by providing higher level services such as DBaaS functionality. The other players are those companies that wish to “cloudify” their own internal systems. The reasons for this decision are varied, ranging from the desire to maintain complete control over all the architecture and the cloud components to legal constraints limiting the use of public cloud infrastructures.

With Trove, much of the management of your database system is taken care of by automating a significant portion of the configuration and initial setup steps required when launching a new server. This includes deployment, configuration, patching, backups, restores, and monitoring, all of which can be administered from a CLI, RESTful APIs or OpenStack’s Horizon dashboard. At this point, what Trove doesn’t provide is failover, replication and clustering. This functionality is slated to be implemented in the Kilo release of OpenStack, due out in April 2015.

The process flow is relatively simple. The OpenStack administrator first configures the basic infrastructure by installing the database service. He or she then creates an image for each type of database to support, such as MySQL or MongoDB, imports the images and offers them to the tenants. From the end user’s perspective, only a few commands are necessary to get up and running: first the trove create command to create a database service instance, then trove list to get the ID of the instance, and finally trove show to get its IP address.

For example to create a database, you first start off by creating a database instance. This is an isolated database environment with compute and storage resources in a single tenant environment on a shared physical host machine. You can run a database instance with a variety of database engines such as MySQL or MongoDB.

From the Trove client I can issue the following command to create a database instance called PS_troveinstance, with a volume size of 2 GB, a user called PS_user, a password PS_password and the MySQL datastore (or database engine):

$ trove create --size 2 --users PS_user:PS_password --datastore MySQL PS_troveinstance

Next I issue the following command to get the ID of the database instance:

$ trove list

And finally, to create a database called PS_trovedb, I execute:

$ trove database-create PS_troveinstance PS_trovedb

Alternatively, I could have just combined the above commands as:

$ trove create --size 2 --databases PS_trovedb --users PS_user:PS_password --datastore MySQL PS_troveinstance

And thus we now have a MySQL database server containing a database called PS_trovedb.

In our next post on OpenStack/Trove, we’ll dig even further and discuss the software and hardware requirements, and how to actually set up Trove.

On a related note, Percona has several experts attending this week’s OpenStack Operations Summit in San Antonio, Texas. One of them is Matt Griffin, director of product management, who pointed out in a recent post that many OpenStack operators use Percona open source software, including the MySQL drop-in compatible Percona Server and the Galera-based Percona XtraDB Cluster, as well as tools such as Percona XtraBackup and Percona Toolkit. “We see a need in the community to understand how to improve MySQL performance in OpenStack. As a result, Percona submitted 16 presentations for November’s Paris OpenStack Summit,” Matt said. So stay tuned for related news from him, too, on that front.

The post OpenStack’s Trove: The benefits of this database as a service (DBaaS) appeared first on MySQL Performance Blog.

