Jul
15
2021
--

The CockroachDB EC-1

Every application is a palimpsest of technologies, each layer forming a base that enables the next layer to function. Web front ends rely on JavaScript and browser DOM, which rely on back-end APIs, which themselves rely on databases.

As one goes deeper down the stack, engineering decisions become ever more conservative — changing the location of a button in a web app is an inconvenience; changing a database engine can radically upend an entire project.

It’s little surprise then that database technologies are among the longest-lasting engineering projects in the modern software developer toolkit. MySQL, which remains one of the most popular database engines in the world, was first released in the mid-1990s, and Oracle Database, launched more than four decades ago, is still widely used in high-performance corporate environments.

Database technology can change the world, but the world in these parts changes very, very slowly. That’s made building a startup in the sector a tough equation: Sales cycles can be painfully slow, even when new features can dramatically expand a developer’s capabilities. Competition is stiff and comes from some of the largest and most entrenched tech companies in the world. Exits have also been few and far between.

That challenge — and opportunity — is what makes studying Cockroach Labs so interesting. The company behind CockroachDB attempts to solve a long-standing problem in large-scale, distributed database architecture: How to make it so that data created in one place on the planet is always available for consumption by applications that are thousands of miles away, immediately and accurately. Making global data always available immediately and accurately might sound like a simple use case, but in reality it’s quite the herculean task. Cockroach Labs’ story is one of an uphill struggle, but one that saw it turn into a next-generation, $2-billion-valued database contender.

The lead writer of this EC-1 is Bob Reselman. Reselman has been writing about the enterprise software market for more than two decades, with a particular emphasis on teaching and educating engineers on technology. The lead editor for this package was Danny Crichton, the assistant editor was Ram Iyer, the copy editor was Richard Dal Porto, figures were designed by Bob Reselman and stylized by Bryce Durbin, and illustrations were drawn by Nigel Sussman.

CockroachDB had no say in the content of this analysis and did not get advance access to it. Reselman has no financial ties to CockroachDB or other conflicts of interest to disclose.

The CockroachDB EC-1 comprises four main articles numbering 9,100 words and a reading time of 37 minutes. Here’s what we’ll be crawling over:

We’re always iterating on the EC-1 format. If you have questions, comments or ideas, please send an email to TechCrunch Managing Editor Danny Crichton at danny@techcrunch.com.

Jul
15
2021
--

CockroachDB, the database that just won’t die

There is an art to engineering, and sometimes engineering can transform art. For Spencer Kimball and Peter Mattis, those two worlds collided when they created the widely successful open-source graphics program, GIMP, as college students at Berkeley.

That project was so successful that when the two joined Google in 2002, Sergey Brin and Larry Page personally stopped by to tell the new hires how much they liked it and explained how they used the program to create the first Google logo.

Cockroach Labs was started by developers and stays true to its roots to this day.

In terms of good fortune in the corporate hierarchy, when you get this type of recognition in a company such as Google, there’s only one way you can go — up. They went from rising stars to stars at Google, becoming the go-to guys on the Infrastructure Team. They could easily have looked forward to a lifetime of lucrative employment.

But Kimball, Mattis and another Google employee, Ben Darnell, wanted more — a company of their own. To realize their ambitions, they created Cockroach Labs, the business entity behind their ambitious open-source database CockroachDB. Can some of the smartest former engineers in Google’s arsenal upend the world of databases in a market spotted with the gravesites of storage dreams past? That’s what we are here to find out.

Berkeley software distribution

Mattis and Kimball were roommates at Berkeley majoring in computer science in the early-to-mid-1990s. In addition to their usual studies, they also became involved with the eXperimental Computing Facility (XCF), an organization of undergraduates who have a keen, almost obsessive interest in CS.

Jul
15
2021
--

How engineers fought the CAP theorem in the global war on latency

CockroachDB was intended to be a global database from the beginning. The founders of Cockroach Labs wanted to ensure that data written in one location would be viewable immediately in another location 10,000 miles away. The use case was simple, but the work needed to make it happen was herculean.

The company is betting the farm that it can solve one of the largest challenges for web-scale applications. The approach it’s taking is clever, but it’s a bit complicated, particularly for the non-technical reader. Given its history and engineering talent, the company is in the process of pulling it off and making a big impact on the database market, making it a technology well worth understanding. In short, there’s value in digging into the details.

Using CockroachDB’s multiregion feature to segment data according to geographic proximity fulfills Cockroach Labs’ primary directive: To get data as close to the user as possible.

In part 1 of this EC-1, I provided a general overview and a look at the origins of Cockroach Labs. In this installment, I’m going to cover the technical details of the technology with an eye to the non-technical reader. I’m going to describe the CockroachDB technology through three questions:

  1. What makes reading and writing data over a global geography so hard?
  2. How does CockroachDB address the problem?
  3. What does it all mean for those using CockroachDB?

What makes reading and writing data over a global geography so hard?

Spencer Kimball, CEO and co-founder of Cockroach Labs, describes the situation this way:

There’s lots of other stuff you need to consider when building global applications, particularly around data management. Take, for example, the question and answer website Quora. Let’s say you live in Australia. You have an account and you store the particulars of your Quora user identity on a database partition in Australia.

But when you post a question, you actually don’t want that data to just be posted in Australia. You want that data to be posted everywhere so that all the answers to all the questions are the same for everybody, anywhere. You don’t want to have a situation where you answer a question in Sydney and then you can see it in Hong Kong, but you can’t see it in the EU. When that’s the case, you end up getting different answers depending where you are. That’s a huge problem.

Reading and writing data over a global geography is challenging for pretty much the same reason that it’s faster to get a pizza delivered from across the street than from across the city. The essential constraints of time and space apply. Whether it’s digital data or a pepperoni pizza, the further away you are from the source, the longer stuff takes to get to you.
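
To put rough numbers on the pizza analogy, here is a back-of-the-envelope sketch of the floor that physics places on a network round trip. It assumes that a signal moves through optical fiber at about two-thirds of the speed of light in a vacuum and that the cable runs in a straight line. Real routes are longer and add switching delay, so actual latency will be higher. The function name is illustrative, and the 10,000-mile figure is borrowed from the founders' use case described earlier.

```python
# Best-case round-trip time over fiber, ignoring routing and processing delay.
# Assumption: signals in fiber travel at roughly 2/3 the vacuum speed of light.

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum, km/s
FIBER_FACTOR = 2 / 3               # assumed slowdown inside optical fiber
KM_PER_MILE = 1.609344


def min_round_trip_ms(distance_miles: float) -> float:
    """Return the lower bound on round-trip time in milliseconds."""
    distance_km = distance_miles * KM_PER_MILE
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_seconds * 1000


if __name__ == "__main__":
    for miles in (1, 100, 10_000):
        rtt = min_round_trip_ms(miles)
        print(f"{miles:>6} miles: at least {rtt:.3f} ms per round trip")
```

Under these assumptions, a request that has to cross 10,000 miles and come back cannot finish in much less than 160 milliseconds, no matter how fast the database itself is.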

Jul
15
2021
--

“Developers, as you know, do not like to pay for things”

In the previous part of this EC-1, we looked at the technical details of CockroachDB and how it provides accurate data instantaneously anywhere on the planet. In this installment, we’re going to take a look at the product side of Cockroach, with a particular focus on developer relations.

As a business, Cockroach Labs has many things going for it. The company’s approach to distributed database technology is novel. And, as more companies operate on a global level, CockroachDB has the potential to gain some significant market share internationally. The company is seven years into a typical 10-year maturity model for databases, has raised $355 million, and carries a $2 billion valuation. It’s considered a double unicorn. Few database companies can say this.

The company is now aggressively expanding into the database-as-a-service space, offering its own technology in a fully managed package, expanding the spectrum of clients who can take immediate advantage of its products.

But its growth depends upon securing the love of developers while also making its product easier to use for new customers. To that end, I’m going to analyze the company’s pivot to the cloud as well as its extensive outreach to developers as it works to set itself up for long-term, sustainable success.

Cockroach Labs looks to the cloud

These days, just about any company of consequence provides services via the internet, and a growing number of these services are powered by products and services from native cloud providers. Gartner forecasted in 2019 that cloud services are growing at an annual rate of 17.5%, and there’s no sign that the growth has abated at all.

Its founders’ history with Google back in the mid-2000s has meant that Cockroach Labs has always been aware of the impact of cloud services on the commercial web. Unsurprisingly, CockroachDB could run cloud native right from its first release, given that its architecture presupposes the cloud in its operation — as we saw in part 2 of this EC-1.

Jul
15
2021
--

Scaling CockroachDB in the red ocean of relational databases

Most database startups avoid building relational databases, since that market is dominated by a few goliaths. Oracle, MySQL and Microsoft SQL Server have embedded themselves into the technical fabric of large- and medium-size companies going back decades. These established companies have a lot of market share and a lot of money to quash the competition.

So rather than trying to compete in the relational database market, over the past decade, many database startups focused on alternative architectures such as document-centric databases (like MongoDB), key-value stores (like Redis) and graph databases (like Neo4j). But Cockroach Labs went against conventional wisdom with CockroachDB: It intentionally competed in the relational database market with its relational database product.

While it did face an uphill battle to penetrate the market, Cockroach Labs saw a surprising benefit: It didn’t have to invent a market. All it needed to do was grab a share of a market that also happened to be growing rapidly.

Cockroach Labs has a bright future, compelling technology, a lot of money in the bank and an experienced, technically astute executive team.

In previous parts of this EC-1, I looked at the origins of CockroachDB, presented an in-depth technical description of its product as well as an analysis of the company’s developer relations and cloud service, CockroachCloud. In this final installment, we’ll look at the future of the company, the competitive landscape within the relational database market, its ability to retain talent as it looks toward a potential IPO or acquisition, and the risks it faces.

CockroachDB’s success is not guaranteed. It has to overcome significant hurdles to secure a profitable place for itself among a set of well-established database technologies that are owned by companies with very deep pockets.

It’s not impossible, though. We’ll first look at MongoDB as an example of how a company can break through the barriers for database startups competing with incumbents.

When life gives you Mongos, make MongoDB

Dev Ittycheria, MongoDB CEO, rings the Nasdaq Stock Market Opening Bell. Image Credits: Nasdaq, Inc

MongoDB is a good example of the risks that come with trying to invent a new database market. The company started out as a purely document-centric database at a time when that approach was the exception rather than the rule.

Web developers like document-centric databases because they address a number of common use cases in their work. For example, a document-centric database works well for storing comments to a blog post or a customer’s entire order history and profile.

Jan
12
2021
--

Cockroach Labs scores $160M Series E on $2B valuation

Cockroach Labs, makers of CockroachDB, have been on a fundraising roll for the last couple of years. Today the company announced a $160 million Series E on a fat $2 billion valuation. The round comes just eight months after the startup raised an $86.6 million Series D.

The latest investment was led by Altimeter Capital, with participation from new investors Greenoaks and Lone Pine, along with existing investors Benchmark, Bond, FirstMark, GV, Index Ventures and Tiger Global. The round doubled the company’s previous valuation and increased the amount raised to $355 million.

Co-founder and CEO Spencer Kimball says the company’s revenue more than doubled in 2020 in spite of COVID, and that caught the attention of investors. He attributed this paradoxical rise to the rapid shift to the cloud brought on by the pandemic that many people in the industry have seen.

“People became more aggressive with what was already underway, a real move to embrace the cloud to build the next generation of applications and services, and that’s really fundamentally where we are,” Kimball told me.

As that happened, the company began a shift in thinking. While it has embraced an open-source version of CockroachDB along with a 30-day free trial on the company’s cloud service as ways to attract new customers to the top of the funnel, it wants to try a new approach.

In fact, it plans to replace the 30-day trial with a newer version later this year without any time limits. It believes this will attract more developers to the platform and enable them to see the full set of features without having to enter credit card information. What’s more, by taking this approach, it should end up costing the company less money to support the free tier.

“What we expect is that you can do all kinds of things on that free tier. You can do a hackathon, any kind of hobby project […] or even a startup that has ambitions to be the next DoorDash or Airbnb,” he said. As he points out, there’s a point where early-stage companies don’t have many users, and can remain in the free tier until they achieve product-market fit.

“That’s when they put a credit card down, and they can extend beyond the free tier threshold and pay for what they use,” he said. The newer free tier is still in the beta testing phase, but will be rolled out during this year.

Kimball says the company wasn’t necessarily looking to raise, although he knew that it would continue to need more cash on the balance sheet to run with giant competitors like Oracle, AWS and the other big cloud vendors, along with a slew of other database startups. As the company’s revenue grows, he certainly sees an IPO in its future, but he doesn’t see it happening this year.

The startup ended the year with 200 employees and Kimball expects to double that by the end of this year. He says growing a diverse group of employees takes good internal data and building a welcoming and inclusive culture.

“I think the starting point for anything you want to optimize in a business is to make sure that you have the metrics in front of you, and that you’re constantly looking at them […] in order to measure how you’re doing,” he explained.

He added, “The thing that we’re most focused on in terms of action is really building the culture of the company appropriately and that’s something we’ve been doing for all six years we’ve been around. To the extent that you have an inclusive environment where people actually really view the value of respect, that helps with diversity.”

Kimball says he sees a different approach to running the business when the pandemic ends, with some small percentage going into the office regularly and others coming for quarterly visits, but he doesn’t see a full return to the office post-pandemic.

Aug
06
2019
--

Cockroach Labs announces $55M Series C to battle industry giants

Cockroach Labs, makers of CockroachDB, sits in a tough position in the database market. On one side, it has traditional database vendors like Oracle, and on the other there’s AWS and its family of databases. It takes some good technology and serious dollars to compete with those companies. Cockroach took care of the latter with a $55 million Series C round today.

The round was led by Altimeter Capital and Tiger Global along with existing investor GV. Other existing investors, including Benchmark, Index Ventures, Redpoint Ventures, FirstMark Capital and Work-Bench, also participated. Today’s investment brings the total raised to more than $110 million, according to the company.

Spencer Kimball, co-founder and CEO, says the company is building a modern database to compete with these industry giants. “CockroachDB is architected from the ground up as a cloud native database. Fundamentally, what that means is that it’s distributed, not just across nodes in a single data center, which is really table stakes as the database gets bigger, but also across data centers to be resilient. It’s also distributed potentially across the planet in order to give a global customer base what feels like a local experience to keep the data near them,” Kimball explained.

At the same time, even while it has a cloud product hosted on AWS, it also competes with several AWS database products, including Amazon Aurora, Redshift and DynamoDB. Much like MongoDB, which changed its open-source licensing structure last year, Cockroach did as well, for many of the same reasons. They both believed bigger players were taking advantage of the open-source nature of their products to undermine their markets.

“If you’re trying to build a business around an open-source product, you have to be careful that a much bigger player doesn’t come along and extract too much of the value out of the open-source product that you’ve been building and maintaining,” Kimball explained.

As the company deals with all of these competitive pressures, it takes a fair bit of money to continue building a piece of technology to beat the competition, while going up against much deeper-pocketed rivals. So far the company has been doing well, with Q1 revenue this year doubling all of last year. Kimball indicated that Q2 could double Q1, but he wants to keep that going, and that takes money.

“We need to accelerate that sales momentum and that’s usually what the Series C is about. Fundamentally, we have, I think, the most advanced capabilities in the market right now. Certainly we do if you look at the differentiator around just global capability. We nevertheless are competing with Oracle on one side, and Amazon on the other side. So a lot of this money is going towards product development too,” he said.

Cockroach Labs was founded in 2015, and is based in New York City.

Oct
30
2018
--

Cockroach Labs launches CockroachDB as managed service

Cockroach Labs’ open source SQL database, CockroachDB, has been making inroads since it launched last year, but as any open source technology matures, it has to move beyond technical early adopters to a more generalized audience in order to move deeper into markets. To help achieve that, the company announced a new CockroachDB managed service today.

The service has been designed to be cloud-agnostic, and for starters it’s going to be available on Amazon Web Services and Google Cloud Platform. Cockroach, which launched in 2015, has always positioned itself as a modern cloud alternative to the likes of Oracle or even Amazon’s Aurora database.

As company co-founder and CEO Spencer Kimball told me in an interview in May, those companies involve too much vendor lock-in for his taste. His company launched as an open alternative to all of that. “You can migrate a Cockroach cluster from one cloud to another with no downtime,” Kimball told TechCrunch in May.

He believes having that kind of flexibility is a huge advantage over what other vendors are offering, and today’s announcement carries that a step further. Instead of doing all the heavy lifting of setting up and managing a database and the related infrastructure, Cockroach is now offering CockroachDB as a service to handle all of that for you.

Kimball certainly recognizes that by offering his company’s product in this format, it will help grow his market. “We’ve been seeing significant migration activity away from Oracle, AWS Aurora, and Cassandra, and we’re now able to get our customers to market faster with Managed CockroachDB,” Kimball said in a statement.

The database itself offers the advantage of being ultra-resilient, meaning it stays up and running under most circumstances, and that’s a huge value proposition for any database product. It achieves uptime through replication, so if one replica goes down, another can take over.

As an open source tool, it has been making money up until now by offering an enterprise version, which includes backup, support and other premium pieces. With today’s announcement, the company can get a more direct revenue stream from customers subscribing to the database service.

A year ago, the company announced version 1.0 of CockroachDB and $27 million in Series B financing, which was led by Redpoint with participation from Benchmark, GV, Index Ventures and FirstMark. They’ve obviously been putting that money to good use developing this new managed service.

Mar
27
2017
--

What’s Next for SQL Databases?

In this blog, I’ll go over my thoughts on what we can expect in the world of SQL databases.

After reading Baron’s prediction on databases, here:

https://www.xaprb.com/blog/defining-moments-in-database-history/

I want to provide my own view on what’s coming up next for SQL databases. I think we live in interesting times, when we can see the beginning of the next-generation of RDBMSs.

There are defining characteristics of such databases:

  1. Auto-scaling. The ability to add and use resources depending on the current load and database size. This is done transparently for users and DBAs.
  2. Auto-healing. The automatic handling of node failures.
  3. Multi-regional, cloud-agnostic, geo-distributed. The ability to support multiple data centers and multiple clouds, in different parts of the world.
  4. Transactional. All the above, with the ability to support multi-statement transactional workloads.
  5. Strong consistency. The full definition of strong consistency is pretty involved. For simplicity, let’s say it means that reads (in the absence of ongoing writes) will return the same data, regardless of which region or data center you are getting it from. A simple counter-example is the famous MySQL asynchronous replication, where (with the slave delay) reading the data on a slave can return very outdated data. I am focusing on reads, because in a distributed environment the consistent reads performance will be affected. This is where network latency (often limited by the speed of light) will define performance.
  6. SQL language. SQL, despite being old and widely criticized, is not going anywhere. This is a universal language for app developers to access data.
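
The asynchronous replication counter-example in item 5 can be sketched with a toy model. This is not how MySQL or any of the databases discussed here is implemented; the class and method names are made up for illustration. It only shows why a replica that receives writes after the acknowledgment can serve stale reads, while a replica that must apply the write before the acknowledgment cannot.

```python
# Toy model of asynchronous vs. synchronous replication (illustrative only).

class Replica:
    """A read-only copy of the data living in another data center."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class AsyncPrimary:
    """Acknowledges a write immediately and ships it to the replica later."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica
        self.backlog = []  # writes the replica has not applied yet

    def write(self, key, value):
        self.data[key] = value
        self.backlog.append((key, value))  # the replica now lags behind

    def flush(self):
        """Apply queued writes to the replica, ending the replication delay."""
        for key, value in self.backlog:
            self.replica.data[key] = value
        self.backlog.clear()


class SyncPrimary:
    """Acknowledges a write only after the replica has applied it too."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica

    def write(self, key, value):
        self.data[key] = value
        self.replica.data[key] = value  # stands in for a network round trip


if __name__ == "__main__":
    lagging = Replica()
    primary = AsyncPrimary(lagging)
    primary.write("answer", "posted in Sydney")
    print("async replica before flush:", lagging.read("answer"))
    primary.flush()
    print("async replica after flush: ", lagging.read("answer"))

    current = Replica()
    SyncPrimary(current).write("answer", "posted in Sydney")
    print("sync replica right away:   ", current.read("answer"))
```

The synchronous variant never returns outdated data, but in a real deployment every write would have to wait for the network round trip that the single assignment stands in for here. That trade-off is the reason latency ends up defining performance.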

With this, I see the following interesting projects:

  • Google Cloud Spanner (https://cloud.google.com/spanner/). Recently announced and still in the Beta stage. Definitely an interesting project, with the obvious limitation of running only in Google Cloud.
  • FaunaDB (https://fauna.com/). Also very recently announced, so it is hard to say how it performs. The major downside I see is that it does not provide SQL access, but uses a custom language.
  • Two open source projects:
    • CockroachDB (https://www.cockroachlabs.com/). This is still in the Beta stage, but definitely an interesting project to follow. Initially, the project planned to support only key-value access, but later they made a very smart decision to provide SQL access via a PostgreSQL-compatible protocol.
    • TiDB (https://github.com/pingcap/tidb). Right now in RC stages, and the target is to provide SQL access over a MySQL compatible protocol (and later PostgreSQL protocol).

Protocol compatibility is a wise approach, although not strictly necessary. It lowers the entry barrier for existing applications.

Both CockroachDB and TiDB, at the moment of this writing, still have rough edges and can’t be used in serious deployments (from my experience). I expect both projects will make big progress in 2017.

What shared characteristics can we expect from these systems?

As I mentioned above, we may see that the read performance is degraded (as latency increases), and often it will be defined more by network performance than anything else. Storage IO and CPU cycles will be secondary factors. There will be more work on how to understand and tune the network traffic.

We may need to get used to the fact that point or small range selects become much slower. Right now, we see very fast point selects for traditional RDBMSs (MySQL, PostgreSQL, etc.).

Heavy writes will be problematic. The problem is that all writes will need to go through the consistency protocol. Write-optimized storage engines will help (both CockroachDB and TiDB use RocksDB in the storage layer).

Long transactions (let’s say changing 100,000 or more rows) will also be problematic. There are just too many network round-trips and too much housekeeping work on each node, making long transactions an issue for distributed systems.

Another shared property (at least between CockroachDB and TiDB) is the active use of the Raft protocol to achieve consistency. So it will be important to understand how this protocol works to use it effectively. You can find a good overview of the Raft protocol here: http://container-solutions.com/raft-explained-part-1-the-consenus-problem/.
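
To make the cost of going through the consistency protocol more concrete, here is a simplified sketch of the majority rule that Raft relies on: a leader treats a write as committed once a majority of the replicas, itself included, have stored it. The function names and the round-trip figures are hypothetical, and the model ignores disk writes, retries and leader elections. It is an illustration of the quorum arithmetic, not CockroachDB's or TiDB's actual implementation.

```python
# Simplified quorum arithmetic for a Raft-style replication group.
from typing import Sequence


def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def failures_tolerated(replicas: int) -> int:
    """How many replicas can be lost while a majority still exists."""
    return replicas - quorum_size(replicas)


def commit_latency_ms(follower_rtts_ms: Sequence[float]) -> float:
    """Network wait before the leader can commit a write.

    The leader counts as one vote, so it needs quorum - 1 follower
    acknowledgments and waits only for the fastest followers that
    complete the majority.
    """
    replicas = len(follower_rtts_ms) + 1
    acks_needed = quorum_size(replicas) - 1
    if acks_needed == 0:
        return 0.0
    return sorted(follower_rtts_ms)[acks_needed - 1]


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"{n} replicas: quorum {quorum_size(n)}, "
              f"tolerates {failures_tolerated(n)} failures")

    # Hypothetical round trips from the leader to its followers, in ms.
    print("nearby + distant follower:", commit_latency_ms([2, 160]), "ms")
    print("two distant followers:    ", commit_latency_ms([70, 160]), "ms")
```

In this model, a three-replica group with one nearby follower commits after the short round trip, while a group whose followers are all far away pays a long round trip on every write. That is why heavy writes and long transactions are the hard cases for these systems.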

There probably are more NewSQL technologies than I have mentioned here, but I do not think any of them captured critical market- or mind-share. So we are at the beginning of interesting times . . .

What about MySQL? Can MySQL become the database that provides all these characteristics? It is possible, but I do not think it will happen anytime soon. MySQL would need to provide automatic sharding to do this, which will be very hard to implement given the current internal design. It may happen in the future, though it will require a lot of engineering effort to make it work properly.

Oct
05
2016
--

Percona Live Europe 2016: “Inside CockroachDB’s Survivability Model” with Marc Berhault

The Percona Live Europe 2016 conference is moving along quickly, and today we saw a lot of presentations on open source databases that AREN’T MySQL or MongoDB.

There are more than 100 open source databases in the world, each with its own design and use case sweet spots. At Percona Live Europe, we get a chance to learn about some of the most popular open source databases, their design use cases, user stories as well as how they work together with MySQL and MongoDB.

One such talk was from Marc Berhault, Engineer at Cockroach Labs. CockroachDB is a distributed SQL database built on a transactional and strongly-consistent key-value store.

This talk took a deep dive into CockroachDB, a database whose “survive and thrive” model aims to bring the best aspects of Google’s next generation database, Spanner, to the rest of the world via open source. Marc looked specifically at CockroachDB’s operations and deployment model, and explored how rebalancing, repair, and symmetric nodes combine to create both simple deployment and a strong recovery story. He then explored how you can both contribute to it and use it to build scalable, resilient applications that can be deployed to any cloud infrastructure.

Percona’s EMEA Field Marketing Manager Kamal Taibi was able to chat with Marc, and get better insight into his talk. Check it out below!
