Jan
27
2021
--

Datastax acquires Kesque as it gets into data streaming

Datastax, the company best known for commercializing the open-source Apache Cassandra database, is moving beyond databases. As the company announced today, it has acquired Kesque, a cloud messaging service.

The Kesque team built its service on top of the Apache Pulsar messaging and streaming project. Datastax has now taken that team’s knowledge in this area and, combined with its own expertise, is launching its own Pulsar-based streaming platform by the name of Datastax Luna Streaming, which is now generally available.
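
Since Luna Streaming is a distribution of Apache Pulsar, applications should be able to talk to it through the standard Pulsar clients. As a minimal sketch (the service URL and topic below are placeholders, not Datastax-specific), producing and consuming a message with the Pulsar Python client looks roughly like this:

```python
import pulsar  # pip install pulsar-client

# Placeholder service URL and topic; a real deployment would use its own broker address.
client = pulsar.Client('pulsar://localhost:6650')

producer = client.create_producer('persistent://public/default/events')
producer.send(b'order-created:1234')

consumer = client.subscribe('persistent://public/default/events',
                            subscription_name='analytics')
msg = consumer.receive()
print(msg.data())          # b'order-created:1234'
consumer.acknowledge(msg)

client.close()
```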

This move comes right as Datastax is also now, for the first time, announcing that it is cash-flow positive and profitable, as the company’s chief product officer, Ed Anuff, told me. “We are at over $150 million in [annual recurring revenue]. We are cash-flow positive and we are profitable,” he said. This marks the first time the company is publicly announcing this data. In addition, the company also today revealed that about 20 percent of its annual contract value is now for DataStax Astra, its managed multi-cloud Cassandra service, and that the number of self-service Astra subscribers has more than doubled from Q3 to Q4.

The launch of Luna Streaming now gives the 10-year-old company a new area to expand into — and one that has some obvious adjacencies with its existing product portfolio.

“We looked at how a lot of developers are building on top of Cassandra,” Anuff, who joined Datastax after leaving Google Cloud last year, said. “What they’re doing is, they’re addressing what people call ‘data-in-motion’ use cases. They have huge amounts of data that are coming in, huge amounts of data that are going out — and they’re typically looking at doing something with streaming in conjunction with that. As we’ve gone in and asked, ‘What’s next for Datastax?’, streaming is going to be a big part of that.”

Given Datastax’s open-source roots, it’s no surprise the team decided to build its service on another open-source project and acquire an open-source company to help it do so. Anuff noted that while there has been a lot of hype around streaming and Apache Kafka, a cloud-native solution like Pulsar seemed like the better solution for the company. Pulsar was originally developed at Yahoo! (which, full disclosure, belongs to the same Verizon Media Group family as TechCrunch) and even before acquiring Kesque, Datastax already used Pulsar to build its Astra platform. Other Pulsar users include Yahoo, Tencent, Nutanix and Splunk.

“What we saw was that when you go and look at doing streaming in a scale-out way, Kafka isn’t the only approach. We looked at it, and we liked the Pulsar architecture, we like what’s going on, we like the community — and remember, we’re a company that grew up in the Apache open-source community — we said, ‘okay, we think that it’s got all the right underpinnings, let’s go and get involved in that,’” Anuff said. And in the process of doing so, the team came across Kesque founder Chris Bartholomew and eventually decided to acquire his company.

The new Luna Streaming offering will be what Datastax calls a “subscription to success with Apache Pulsar.” It will include a free, production-ready distribution of Pulsar and an optional, SLA-backed subscription tier with enterprise support.

Unsurprisingly, Datastax also plans to remain active in the Pulsar community. The team is already making code contributions, but Anuff also stressed that Datastax is helping out with scalability testing. “This is one of the things that we learned in our participation in the Apache Cassandra project,” Anuff said. “A lot of what these projects need is folks coming in doing testing, helping with deployments, supporting users. Our goal is to be a great participant in the community.”

Mar
03
2020
--

Datastax acquires The Last Pickle

Data management company Datastax, one of the largest contributors to the Apache Cassandra project, today announced that it has acquired The Last Pickle (and no, I don’t know what’s up with that name either), a New Zealand-based Cassandra consulting and services firm that’s behind a number of popular open-source tools for the distributed NoSQL database.

As Datastax Chief Strategy Officer Sam Ramji, who you may remember from his recent tenure at Apigee, the Cloud Foundry Foundation, Google and Autodesk, told me, The Last Pickle is one of the premier Apache Cassandra consulting and services companies. The team there has been building Cassandra-based open-source solutions for the likes of Spotify, T-Mobile and AT&T since it was founded back in 2012. And while The Last Pickle is based in New Zealand, the company has engineers all over the world who do the heavy lifting and help these companies successfully implement the Cassandra database technology.

It’s worth mentioning that Last Pickle CEO Aaron Morton first discovered Cassandra when he worked for WETA Digital on the special effects for Avatar, where the team used Cassandra to allow the VFX artists to store their data.

“There’s two parts to what they do,” Ramji explained. “One is the very visible consulting, which has led them to become world experts in the operation of Cassandra. So as we automate Cassandra and as we improve the operability of the project with enterprises, their embodied wisdom about how to operate and scale Apache Cassandra is as good as it gets — the best in the world.” And The Last Pickle’s experience in building systems with tens of thousands of nodes — and the challenges that its customers face — is something Datastax can then offer to its customers as well.

And Datastax, of course, also plans to productize The Last Pickle’s open-source tools like the automated repair tool Reaper and the Medusa backup and restore system.

As both Ramji and Datastax VP of Engineering Josh McKenzie stressed, Cassandra has seen a lot of commercial development in recent years, with the likes of AWS now offering a managed Cassandra service, for example, but there wasn’t all that much hype around the project anymore. But they argue that’s a good thing. Now that it is over ten years old, Cassandra has been battle-hardened. For the last ten years, Ramji argues, the industry tried to figure out what the de facto standard for scale-out computing should be. By 2019, it became clear that Kubernetes was the answer to that.

“This next decade is about what is the de facto standard for scale-out data? We think that’s got certain affordances, certain structural needs and we think that the decades that Cassandra has spent getting hardened put it in a position to be data for that wave.”

McKenzie also noted that Cassandra’s built-in features, like support for multiple data centers and geo-replication, rolling updates and live scaling, as well as wide support across programming languages, give it a number of advantages over competing databases.

“It’s easy to forget how much Cassandra gives you for free just based on its architecture,” he said. “Losing the power in an entire datacenter, upgrading the version of the database, hardware failing every day? No problem. The cluster is 100 percent always still up and available. The tooling and expertise of The Last Pickle really help bring all this distributed and resilient power into the hands of the masses.”

The two companies did not disclose the price of the acquisition.

Apr
27
2017
--

Percona Live 2017: Hawkular Metrics, An Overview

The place is still frantic here at Percona Live 2017 as everybody tries to squeeze in a few more talks before the end of the day. One such talk was given by Red Hat’s Stefan Negrea on Hawkular Metrics.

Hawkular Metrics is a scalable, long-term, high-performance storage engine for metric data. The session was an overview of the project, covering its history, the Hawkular ecosystem, technical details, developer features and APIs, and third-party integrations.

Hawkular Metrics is backed by Cassandra for scalability and is used and exposed by Hawkular Services. The API uses JSON to communicate with clients.
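
As a rough sketch of what that JSON API looks like (the endpoint path, tenant header and payload shape are recalled from the project’s REST documentation and may differ between versions), pushing a gauge data point from Python might look like this:

```python
import time
import requests

# Assumed endpoint, tenant, and payload shape; check the docs of the Hawkular
# Metrics version you run, as the REST paths have changed between releases.
BASE_URL = "http://localhost:8080/hawkular/metrics"
HEADERS = {"Hawkular-Tenant": "my-tenant"}

payload = [{
    "id": "cpu.usage",
    "data": [{"timestamp": int(time.time() * 1000), "value": 0.42}],
}]

# Push one raw data point for the gauge metric "cpu.usage".
resp = requests.post(f"{BASE_URL}/gauges/raw", json=payload, headers=HEADERS)
resp.raise_for_status()
```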

Users of Hawkular Metrics include:

  • IoT enthusiasts who need to collect metrics, and possibly trigger alerts
  • Operators who are looking for a solution to store metrics from statsD, collectd, syslog
  • Developers of solutions who need long-term time series database storage
  • Users of ManageIQ who are looking for Middleware management
  • Users of Kubernetes/Heapster who want to store Docker container metrics in a long-term time series database storage, thanks to the Heapster sink for Hawkular.

Stefan was kind enough to speak with me after the talk. Check it out below:

There are more talks today. Check out Thursday’s schedule here. Don’t forget to attend the Closing Remarks and prize giveaway at 4:00 pm.

Apr
11
2016
--

DataStax adds graph databases to enterprise Cassandra product set

When DataStax acquired Aurelius, a graph database startup, last year, it was clear it wanted to add graph database functionality to its DataStax Enterprise product, and today it achieved that goal when it announced the release of DataStax Enterprise Graph. The new enterprise graph product has been fully integrated into the DataStax Enterprise product set, giving customers an integrated… Read More

Mar
25
2015
--

Yelp IT! A talk with 3 Yelp MySQL DBAs on Percona Live & more

Founded in 2004 to help people find great local businesses, Yelp has some 135 million monthly unique visitors. With those traffic volumes, Yelp’s 300+ engineers are constantly working to keep things moving smoothly – and when you move that fast you learn many things.

Fortunately for the global MySQL community, three Yelp DBAs will be sharing what they’ve learned at the annual Percona Live MySQL Conference and Expo this April 13-16 in Santa Clara, California.

Say “hello” to Susanne Lehmann, Jenni Snyder and Josh Snyder! I chatted with them over email about their presentations, on how MySQL is used at Yelp, and about the shortage of women in MySQL.

***

Tom: Jenni, you and Josh will be co-presenting “Next generation monitoring: moving beyond Nagios” on April 14.

You mentioned that Yelp’s databases scale dynamically, and so does your monitoring of those databases. And to minimize human intervention, you’ve created a Puppet and Sensu monitoring ensemble… because “if it’s not monitored, it’s not in production.” Talk to me more about Yelp’s philosophy of “opt-out monitoring.” What does that entail? How does that help Yelp?

Jenni: Before we moved to Sensu, our Nagios dashboards were a sea of red, muted, acknowledged, or disabled service checks. In fact, we even had a cluster check to make sure that we never accidentally put a host into use that was muted or marked for downtime. It was possible for a well-meaning operator to acknowledge checks on a host and forget about it, and I certainly perpetrated a couple of instances of disks filling up after acknowledging a 3am “warning” page that I’d rather forget about. With Sensu, hosts and services come out of the downtime/acknowledgement state automatically after a number of days, ensuring that we’re kept honest and stay on top of issues that need to be addressed.

Also, monitoring is deployed with a node, not separate monitoring configuration. Outside of a grace period we employ when a host is first provisioned or rebooted, if a host is up, it’s being monitored and alerting. Also, alerting doesn’t always mean paging. We also use IRC and file tickets directly into our tracking system when we don’t need eyes on a problem right away.


Tom: Susanne, in your presentation, titled “insert cassandra into prod where use_case=?;” you’ll discuss the situations you’ve encountered where MySQL just wasn’t the right tool for the job.

What led up to that discovery, and how did you go about finding the right tools (and what were they) to run alongside and support MySQL?

Susanne: Our main force behind exploring other datastores alongside MySQL was that Yelp is growing outside the US market a lot. Therefore we wanted the data to be nearer to the customer and needed multi-master writes.

Also, we saw use cases where our application data was organized very key-value like and not relational, which made them a better fit for a NoSQL solution.

We decided to use Cassandra as a datastore and I plan to go more into detail why during my talk. Now we offer developers more choices on how to store our application data, but we also believe in the “right tool for the job” philosophy and might add more solutions to the mix in the future.


Tom: Jenni, you’ll also be presenting “Schema changes multiple times a day? OK!” I know that you and your fellow MySQL DBAs are always improving and also finding better ways of supporting new and existing features for Yelp users like me. Delivering on such a scale must entail some unique processes and tools. Does this involve a particular mindset among your fellow DBAs? Also, what are some of those key tools – and processes and how are they used?

Jenni: Yelp prizes the productivity of our developers and our ability to iterate and develop new features quickly. In order to do that, we need to be able to not only create new database tables, but also modify existing ones, many of which are larger than MySQL can alter without causing considerable replication delay. The first step is to foster a culture of automated testing, monitoring, code reviews, and partnership between developers and DBAs to ensure that we can quickly & safely roll out schema changes. In my talk, I’ll be describing tools that we’ve talked about before, like our Gross Query Checker, as well as the way the DBA team works with developers while still getting the rest of our work done. The second, easy part is using a tool like pt-online-schema-change to run schema changes online without causing replication delay or degrading performance :)
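
For readers unfamiliar with the tool Jenni mentions, pt-online-schema-change copies the table in the background and swaps it in, so an ALTER does not block writes or stall replication. A minimal sketch of driving it from a deploy script, with made-up database, table and column names:

```python
import subprocess

# Hypothetical example: the database (D=), table (t=), and ALTER fragment are
# placeholders; pt-online-schema-change makes no changes unless --execute is given.
subprocess.run(
    [
        "pt-online-schema-change",
        "--alter", "ADD COLUMN favorite_count INT NOT NULL DEFAULT 0",
        "D=reviews_db,t=reviews",
        "--execute",
    ],
    check=True,
)
```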


Tom: Josh, you’ll also be speaking on “Bootstrapping databases in a single command: elastic provisioning for the win.” What is “elastic provisioning” and how are you using it for Yelp’s tooling?

Josh: When I say that we use elastic provisioning, I mean that we can reliably and consistently build a database server from scratch, with minimal human involvement. The goal is to encompass every aspect of the provisioning task, including configuration, monitoring, and even load balancing, in a single thoroughly automated process. With this process in place, we’ve found ourselves able to quickly allocate and reallocate resources, both in our datacenters and in the cloud. Our tools for implementing the above goals give us greater confidence in our infrastructure, while avoiding single points of failure and achieving the maximum possible level of performance. We had a lot of fun building this system, and we think that many of the components involved are relevant to others in the field.


Tom: Susanne and Jenni, last year at Percona Live there was a BoF session titled “MySQL and Women (or where are all the women?).” The idea was to discuss why there are “just not enough women working on the technology side of tech.” In a nutshell, the conversation focused on why there are not more women in MySQL and why so relatively few attend MySQL conferences like Percona Live.

The relative scarcity of women in technical roles was also the subject of an article published in the August 2014 issue of Forbes, citing a recent industry report.

Why, in your (respective) views, do you (or don’t) think that there are so few women in MySQL? And how can this trend be reversed?

Susanne: I think there are few women in MySQL and the reasons are manifold. Of course there is the pipeline problem. Then there is the problem, widely discussed right now, that women who enter STEM jobs are less likely to stay in them. These are reasons not specific to MySQL jobs, but rather to STEM in general. What is more specific to database/MySQL jobs is, in my opinion, that oftentimes DBAs need to be on call, and they need to stay in the office if things go sideways. Database problems often tend to be problems that can’t wait till the next morning. That makes it more demanding when you have a family, for example (which is true for men as well, of course, but still seems to be more of a problem for women).

As for how to reverse the trend, I liked this Guardian article because it covers a lot of important points. There is no easy solution.

I like that more industry leaders and technology companies are discussing what they can do to improve diversity these days. In general, it really helps to have a great professional (female) support system. At Yelp, we have AWE, the Awesome Women in Engineering group, in which Jenni and I are both active. We participate in welcoming women to Yelp engineering, speaking at external events and workshops to help other women present their work, mentoring, and a book club.

Jenni: I’m sorry that I missed Percona Live and this BoF last year; I was out on maternity leave. I believe that tech/startup culture is a huge reason that fewer women are entering and staying these days, but a quick web search will lead you to any number of articles debating the subject. I run into quite a few women working with MySQL; its large, open community and generally collaborative and supportive nature are very welcoming. As the article you linked to suggests, MySQL has a broad audience. It’s easy to get started with and pull into any project, and as a result, most software professionals have worked with it at some time or another.

On another note, I’m happy to see that Percona Live has a Code of Conduct. I hope that Percona and/or MySQL will consider adopting a Community Code of Conduct like Python, Puppet, and Ubuntu. Doing so raises the bar for all participants, without hampering collaboration and creativity!

* * *

Thanks very much, Susanne, Jenni and Josh! I look forward to seeing you next month at the conference. And readers, if you’d like to attend Percona Live, use the promo code Yelp15 for 15% off! Just enter that during registration. If you’re already attending, be sure to tweet about your favorite sessions using the hashtag #PerconaLive. And if you need to find a great place to eat while attending Percona Live, click here for excellent Yelp recommendations. :)

The post Yelp IT! A talk with 3 Yelp MySQL DBAs on Percona Live & more appeared first on MySQL Performance Blog.

Feb
03
2015
--

DataStax Grabs Aurelius In Graph Database Acqui-Hire

DataStax scored a $106M funding payday last September, and today it announced it was using some of that money to acquire Aurelius, an open source graph database company, along with its engineering talent.
No terms were disclosed, but all eight Aurelius engineers will be joining DataStax immediately. The new team will begin working on what they are calling “a massively scalable graph… Read More

Jan
27
2015
--

Percona University: Back to school Feb. 12 in Raleigh, N.C.

Percona CEO Peter Zaitsev leads a track at the inaugural Percona University event in Raleigh, N.C. on Jan. 29, 2013.

About two years ago we held our first-ever Percona University event in Raleigh, N.C. It was a great success, with high attendance and very positive feedback, which led us to organize a number of similar educational events in different locations around the world.

And next month we’ll be back where it all started. On February 12, Percona University comes to Raleigh – and this time the full-day educational event will be much cooler. What have we changed? Take a look at the agenda.

First – this is no longer just a MySQL-focused event. While 10 years ago MySQL was the default, dominant choice for modern companies looking to store and process data effectively, this is no longer the case. As such, the event’s theme is “Smart Data.” In addition to MySQL, Percona and MariaDB technologies (which you would expect to be covered), we have talks about Hadoop, MongoDB, Cassandra, Redis, Kafka and SQLite.

However, the “core” data-store technologies are not the only thing successful data architects should know – one should also be well-versed in modern approaches to infrastructure and general data management. This is why we also have talks about Ansible and OpenStack, DBaaS and PaaS, as well as a number of talks about big-picture topics around architecture and technology management.

Second – this is our first multi-track Percona University event. We had so many great speakers interested in speaking that we could not fit them all into one track, so we now have two tracks with 25 sessions, which makes for quite an educational experience!

Third – while we’re committed to keeping these events very affordable, we decided to charge $10 per attendee. The reason for this is to encourage people who actually plan on attending to register – when hosting free events we found that far too many people registered and never showed up, which caused venues to rapidly fill past capacity and forced us to turn away those who could actually be there. It also caused us to order more food than needed, creating waste. We trust $10 will not prevent you from attending, but if it does cause hardship, just drop me a note and I’ll give you a free pass.

A few other things you need to know:

This is very much a technically focused event. I have encouraged all speakers to make it about technology rather than sales pitches or marketing presentations.

This is a low-key educational event. Do not expect it to be very fancy. If you’re looking for a full conference experience, consider attending the Percona Live MySQL Conference and Expo this April.

Although it’s a full-day event, you can come for just part of the day. We recognize many of you will not be able to take a full day away from work and may be able to attend only in the morning or the afternoon. That is totally fine. The morning registration hours are when most people will register; however, there will be someone at the desk to get you your pass throughout the day.

Thinking of attending? Take a look at the day’s sessions and then register, as space is limited. The event will be held at North Carolina State University’s McKimmon Conference & Training Center. I hope to see you there!

The post Percona University: Back to school Feb. 12 in Raleigh, N.C. appeared first on MySQL Performance Blog.

Sep
04
2014
--

DataStax Lands $106M In Series E Funding

DataStax, the commercial face of the open source Apache Cassandra database, announced $106M in Series E funding, bringing its total funding to date to over $190M. Today’s round was led by Kleiner Perkins Caufield & Byers with additional investors including ClearBridge, Cross Creek and Wasatch, PremjiInvest and Comcast Ventures. Existing investors including Lightspeed… Read More

Oct
06
2013
--

How can we bring query to the data?

Baron recently wrote about sending the query to the data, looking at distributed systems like Cassandra. I want to take a look at simpler systems like MySQL and see how we’re doing in this space.

It is obvious that getting computations as close to the data as possible is the most efficient approach, as we will likely have less data to work with at the higher level. Internally, MySQL has started to add optimizations that help in this regard, such as Index Condition Pushdown, which allows the storage engine to do the most rudimentary data filtering, improving efficiency.
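
To make the Index Condition Pushdown example concrete, here is a small sketch (table, index and credentials are invented, and it assumes the mysql-connector-python driver) that checks whether ICP kicks in for a query:

```python
import mysql.connector  # assumed driver; any MySQL client library would do

conn = mysql.connector.connect(host="localhost", user="app", password="secret",
                               database="test")
cur = conn.cursor()

# With a composite index on (last_name, first_name), the index finds the 'Smith'
# rows and the LIKE filter can be evaluated inside the storage engine (ICP).
cur.execute("""
    EXPLAIN
    SELECT * FROM people
    WHERE last_name = 'Smith' AND first_name LIKE '%ann%'
""")
for row in cur.fetchall():
    print(row)  # look for 'Using index condition' in the Extra column

cur.close()
conn.close()
```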

The more important case, though, is the application-database interaction. Modern applications often have quite complicated logic which might not map to SQL very well. The frameworks and practices developers follow can only add to this problem. As a result, the application may issue a lot of queries to the database, do the computation inside the application, and pay a lot in inefficiency and latency for transferring data back and forth. Latency is really the key here: accessing data over the network is thousands of times slower than accessing data in memory, so many simple data processing algorithms that access the data they need row by row simply do not work.
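
A tiny illustration of that round-trip cost, under the same assumed driver and invented schema as above: fetching row by row pays one network round trip per row, while pushing the aggregation into SQL pays exactly one.

```python
import mysql.connector  # same assumed driver and placeholder credentials as above

conn = mysql.connector.connect(host="localhost", user="app", password="secret",
                               database="test")
cur = conn.cursor()
user_ids = [1, 2, 3]  # illustrative

# Anti-pattern: one network round trip per user, with the summing done in the app.
total = 0
for user_id in user_ids:
    cur.execute("SELECT amount FROM payments WHERE user_id = %s", (user_id,))
    total += sum(amount for (amount,) in cur.fetchall())

# Pushing the computation to the server: a single round trip.
placeholders = ", ".join(["%s"] * len(user_ids))
cur.execute("SELECT COALESCE(SUM(amount), 0) FROM payments WHERE user_id IN (%s)"
            % placeholders, tuple(user_ids))
(total,) = cur.fetchone()

cur.close()
conn.close()
```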

For some tasks, simply learning SQL (including some voodoo practices with user variables, etc.) and using it correctly is good enough to do efficient computations in a single round trip; for others, the logic might not map to SQL very well.

There is a known solution to this problem, one that has existed in many database systems for decades: stored procedures. These allow you to store programs inside the database server so they can work with the data locally, accessing it with much lower latency, which lets you implement a much broader set of algorithms. Stored procedures also have other advantages, such as giving the DBA more security and more control over what the application does, but they have limitations too.
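
For illustration, calling a stored procedure from an application is a single round trip; the procedure name, arguments and credentials below are invented, and the snippet assumes the mysql-connector-python driver.

```python
import mysql.connector  # assumed driver, placeholder credentials

conn = mysql.connector.connect(host="localhost", user="app", password="secret",
                               database="billing")
cur = conn.cursor()

# One round trip: validation and bookkeeping run next to the data.
# 'make_payment' is a hypothetical procedure taking (account_id, amount).
cur.callproc("make_payment", (42, 19.99))
conn.commit()

cur.close()
conn.close()
```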

There are MySQL-specific limitations: stored procedures reduce transparency, are hard to debug, do not perform very well and need to be implemented in a programming language from the ’70s. All of this could be fixed in time, though. The deeper design limitation is that stored procedures do not support some modern development and operational practices very well.

If you look at a lot of modern applications with a database backend, they have the “code,” which lives in your version control system, changes quickly, and can be deployed to production many times a day – an approach that has proved successful for many modern web applications. This all works as long as no database change is needed… because the database does not like you to be agile. Changing the database schema, indexes, etc. takes significant effort and can’t be taken lightly. Such changes also require a different process: you can’t just “deploy the changed database structure” as you do with code; you have to have a migration process, which can range from trivial (such as adding a column) to rather complicated, such as a major database redesign. In fact, reducing the pain of database maintenance by having no schema is one of the major draws of NoSQL systems, which offer a lot more flexibility.
The code can also often be deployed in a rolling fashion to make deployments downtime-free; in that case more than one version of the code is allowed to run in production for a short or long period of time. For example, when deploying a performance-optimized version of the code, I can deploy it on only one web server for a few days to ensure there are no surprises before a full-scale deployment.

The problem with stored procedures is that they occupy a middle ground. They are really code that is part of the application, but they live in the database and are updated through a change to the database, which is global for all application servers. This makes things harder for developers, who can’t use the same editing tools to just edit the code, save it, and see how it runs in the test system without doing extra work. The second problem is solved by some people by implementing versioning in the database, so instead of CALL MAKE_PAYMENT(…) they will use CALL MAKE_PAYMENT_(…), which is, however, also quite a pain in the butt.

So what would be the alternative?

I would love to see MySQL extend the work started with INSERT ON DUPLICATE KEY UPDATE and multi-result sets by allowing simple programs to be submitted to run on the database server side as an alternative to queries (or even as an alternative to the SQL API). Using a language more friendly to modern developers, such as JavaScript, and allowing more than flat tables to be returned (for example, JSON objects) would be quite appreciated. Such an API would, for example, allow “dynamic join” problems to be solved efficiently – where the data that belongs to an object (and thus which tables need to be added to the join in SQL) depends on the object’s properties – as well as handle complex update logic which now often requires many round trips to the application.
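
To make the idea concrete, here is a purely hypothetical sketch of what such an API could look like from the application side; submit_program(), the server-side query() helper and the JavaScript dialect are all invented for illustration and exist in no MySQL version.

```python
# Purely hypothetical: neither submit_program() nor server-side JavaScript exist
# in MySQL today; 'connection' stands in for a handle from any driver.
program = """
    // Runs next to the data: decide which tables to join based on the object's
    // properties and return a nested JSON object instead of flat result sets.
    var user = query("SELECT * FROM users WHERE id = ?", [params.user_id]);
    if (user.has_payments) {
        user.payments = query("SELECT * FROM payments WHERE user_id = ?", [params.user_id]);
    }
    return user;
"""

result = connection.submit_program(program, params={"user_id": 42})  # hypothetical API
print(result)  # a JSON document rather than flat rows
```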

Implementing something like this, one would need to be extra careful, as it would allow application developers to kill database servers even more effectively than they do now – proper limits on resource usage (CPU, memory, etc.) would need to be enforced. We would also need to be extra careful with security, as the ability to change the “program” gives hackers a lot more ways to express their creativity than SQL injection currently does.

The downside of this approach compared to truly “stored” procedures, from a performance standpoint, is its dynamic nature – each program would need to be compiled for execution from scratch. I think this could be substantially optimized with modern technologies, and it is a small price to pay for the ability to avoid many round trips and get a lot more data-processing power local to the database.

In the end, I think something along those lines could be quite helpful in extending the usability of relational databases for modern applications. What are your thoughts?

The post How can we bring query to the data? appeared first on MySQL Performance Blog.

Jun
19
2013
--

What technologies are you running alongside MySQL?

In many environments, MySQL is not the only technology used to store in-process data.

Quite frequently, especially with large-scale or complicated applications, we use MySQL alongside other technologies for certain tasks such as reporting and caching, as well as the main data store for portions of the application.

What technologies for data storage and processing do you use alongside MySQL in your environment? Please feel free to elaborate in the comments about your use case and experiences!


The post What technologies are you running alongside MySQL? appeared first on MySQL Performance Blog.
