Welcome to the next Percona Live featured talk with Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference, as well as discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live registration bonus!
In this Percona Live featured talk, we’ll meet Avi Kivity, CTO of ScyllaDB. His talk will be “Scylla, a Cassandra-compatible NoSQL database at 2 million requests per second.” Scylla is a new NoSQL database that applies systems programming techniques to a horizontally scalable NoSQL design to achieve extreme performance improvements. I had a chance to speak with Avi and learn a bit more about Scylla and its strengths:
Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.
Avi: Unlike perhaps many database developers, I approached databases up from the kernel and the filesystem layers. As the maintainer of the Linux Kernel-based Virtual Machine (KVM) project, I had extensive experience in kernel programming, especially scaling loads to many cores. Before that, at Exanet, I worked on a distributed filesystem (now Dell’s FluidFS), where I gained storage and distributed systems experience. Applying this low-level experience to a high-level application like a database has been very rewarding for me.
I work as ScyllaDB’s CTO in our Herzliyya, Israel headquarters, but our development team is scattered around twelve countries! Since ScyllaDB is a remote-work champion, I have the pleasure of working with the very best developers on the planet.
Percona: Your talk is going to be “Scylla, a Cassandra-compatible NoSQL database at 2 million requests per second.” What is it about Scylla that makes it an obvious choice for an adopter? Is there a specific workload or scenario that it handles well?
Avi: As Scylla is a drop-in replacement for Cassandra, existing Cassandra users are our obvious target. Cassandra compatibility means that the Cassandra file formats, drivers, query language, management tools, and even configuration files are all understood by Scylla. Your existing applications, data and Cassandra skills transfer with very little effort. However, on average you gain up to 10 times the throughput, with a sizable reduction in latency; at the higher percentiles, you gain even more! The throughput improvement can be translated to smaller clusters, higher application throughput, a bigger load safety margin, or a combination of all of these.
As a very high-throughput database, Scylla is a good fit for the Internet of Things (IoT) and web-scale data stores. Its low latency (no Garbage Collection pauses!) makes it a good fit for ad-tech applications. Even non-Cassandra users with high-throughput or strict low-latency requirements should take a good look at Scylla.
Percona: Where are you in the great NoSQL vs. MySQL debate? Why would somebody choose NoSQL (and specifically, Scylla) over MySQL?
Avi: Both SQL and NoSQL have their places. SQL offers great flexibility in your query choices, and excellent ACIDity. NoSQL trades off some of that flexibility and transactional behavior, but in return it gives you incredible scalability, geographical distribution and availability – and with Scylla, amazing throughput.
A great advantage of the Scylla architecture (which, to be fair, we inherited from Cassandra) is its symmetric structure. All nodes have the same role: there are no masters and slaves, metadata nodes or management nodes. A symmetric architecture means linear scaling as you add nodes, without a specific node becoming a bottleneck. This is pretty hard to achieve in a MySQL deployment.
Percona: What do you see as an issue that we, the open source database community, need to be on top of concerning NoSQL, Cassandra, or Scylla? What keeps you up at night?
Avi: The NoSQL movement placed great emphasis on scale-out, almost completely ignoring scale-up. Why bother with per-node performance if you can simply add more nodes? Operational costs and complexity, that’s why! With Scylla, we’re trying to bring the same kind of attention to per-node performance that traditional SQL databases have while still providing the NoSQL goodness.
When we investigated the performance bottlenecks in Cassandra, we saw that while non-blocking message-passing was used between nodes (as it should be), blocking locks were used for inter-core communications, and blocking I/O APIs were used for storage access. To fix this problem, we wrote Seastar (http://seastar-project.org), a server application framework that uses non-blocking message-passing for inter-core communications and storage access. Scylla builds on Seastar and uses it to achieve its performance goals.
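To make the contrast between shared locks and message passing concrete, here is a minimal Python sketch of the general idea. It is not Seastar's API (Seastar is a C++ framework), and the names `Shard` and `ShardedStore` are hypothetical. Each shard thread exclusively owns its slice of the data, and the only way to reach that data is to post a message to the shard's inbox and receive a future for the reply. One caveat: Python's `queue.Queue` uses a lock internally, so this only illustrates the ownership pattern, not a lock-free implementation.

```python
import queue
import threading
import zlib
from concurrent.futures import Future


class Shard(threading.Thread):
    """One worker that exclusively owns its slice of the data (hypothetical name)."""

    def __init__(self, shard_id):
        super().__init__(daemon=True)
        self.shard_id = shard_id
        self._data = {}              # touched only by this thread, so no data lock
        self._inbox = queue.Queue()  # the only way in from other threads

    def submit(self, op, key, value=None):
        """Post a message and return immediately with a Future for the reply."""
        reply = Future()
        self._inbox.put((op, key, value, reply))
        return reply

    def stop(self):
        # A None message tells the shard loop to exit.
        self._inbox.put(None)

    def run(self):
        while True:
            msg = self._inbox.get()
            if msg is None:
                break
            op, key, value, reply = msg
            if op == "set":
                self._data[key] = value
                reply.set_result(True)
            elif op == "get":
                reply.set_result(self._data.get(key))
            else:
                reply.set_exception(ValueError(f"unknown op: {op}"))


class ShardedStore:
    """Routes every key to the single shard that owns it (hypothetical name)."""

    def __init__(self, n_shards=4):
        self.shards = [Shard(i) for i in range(n_shards)]
        for shard in self.shards:
            shard.start()

    def _owner(self, key):
        # A stable hash, so a given key always maps to the same shard.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def set(self, key, value):
        return self._owner(key).submit("set", key, value)

    def get(self, key):
        return self._owner(key).submit("get", key)

    def close(self):
        for shard in self.shards:
            shard.stop()
        for shard in self.shards:
            shard.join()
```

A caller would write `store = ShardedStore()`, then `store.set("user:1", "avi")`, and read the value back with `store.get("user:1").result()`. Because a key always maps to the same shard and each inbox is processed in order, a read submitted after a write to the same key sees that write, without any lock around the data itself.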
Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?
Avi: This is my first Percona Live conference, so I’m excited! I’m looking forward to engaging with Percona Live attendees, seeing how Scylla can help them and understanding which features we need to prioritize on the Scylla roadmap.
You can read more of Avi’s thoughts on NoSQL, SQL and Scylla at the Scylla blog, or follow him on Twitter.
Want to find out more about Avi and Scylla? Register for Percona Live Data Performance Conference 2016, and see his talk “Scylla, a Cassandra-compatible NoSQL database at 2 million requests per second.” Use the code “FeaturedTalk” and receive $100 off the current registration price!
The Percona Live Data Performance Conference is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.
The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.