Jul
15
2021
--

CockroachDB, the database that just won’t die

There is an art to engineering, and sometimes engineering can transform art. For Spencer Kimball and Peter Mattis, those two worlds collided when they created the widely successful open-source graphics program, GIMP, as college students at Berkeley.

That project was so successful that when the two joined Google in 2002, Sergey Brin and Larry Page personally stopped by to tell the new hires how much they liked it and explained how they used the program to create the first Google logo.

Cockroach Labs was started by developers and stays true to its roots to this day.

In terms of good fortune in the corporate hierarchy, when you get this type of recognition in a company such as Google, there’s only one way you can go — up. They went from rising stars to stars at Google, becoming the go-to guys on the Infrastructure Team. They could easily have looked forward to a lifetime of lucrative employment.

But Kimball, Mattis and another Google employee, Ben Darnell, wanted more — a company of their own. To realize their ambitions, they created Cockroach Labs, the business entity behind their ambitious open-source database CockroachDB. Can some of the smartest former engineers in Google’s arsenal upend the world of databases in a market spotted with the gravesites of storage dreams past? That’s what we are here to find out.

Berkeley software distribution

Mattis and Kimball were roommates at Berkeley majoring in computer science in the early-to-mid-1990s. In addition to their usual studies, they also became involved with the eXperimental Computing Facility (XCF), an organization of undergraduates who have a keen, almost obsessive interest in CS.

Jul
15
2021
--

How engineers fought the CAP theorem in the global war on latency

CockroachDB was intended to be a global database from the beginning. The founders of Cockroach Labs wanted to ensure that data written in one location would be viewable immediately in another location 10,000 miles away. The use case was simple, but the work needed to make it happen was herculean.

The company is betting the farm that it can solve one of the largest challenges for web-scale applications. The approach it’s taking is clever, but it’s a bit complicated, particularly for the non-technical reader. Given its history and engineering talent, the company is in the process of pulling it off and making a big impact on the database market, making it a technology well worth understanding. In short, there’s value in digging into the details.

Using CockroachDB’s multiregion feature to segment data according to geographic proximity fulfills Cockroach Labs’ primary directive: To get data as close to the user as possible.

In part 1 of this EC-1, I provided a general overview and a look at the origins of Cockroach Labs. In this installment, I’m going to cover the technical details of the technology with an eye to the non-technical reader. I’m going to describe the CockroachDB technology through three questions:

  1. What makes reading and writing data over a global geography so hard?
  2. How does CockroachDB address the problem?
  3. What does it all mean for those using CockroachDB?

What makes reading and writing data over a global geography so hard?

Spencer Kimball, CEO and co-founder of Cockroach Labs, describes the situation this way:

There’s lots of other stuff you need to consider when building global applications, particularly around data management. Take, for example, the question and answer website Quora. Let’s say you live in Australia. You have an account and you store the particulars of your Quora user identity on a database partition in Australia.

But when you post a question, you actually don’t want that data to just be posted in Australia. You want that data to be posted everywhere so that all the answers to all the questions are the same for everybody, anywhere. You don’t want to have a situation where you answer a question in Sydney and then you can see it in Hong Kong, but you can’t see it in the EU. When that’s the case, you end up getting different answers depending where you are. That’s a huge problem.

Reading and writing data over a global geography is challenging for pretty much the same reason that it’s faster to get a pizza delivered from across the street than from across the city. The essential constraints of time and space apply. Whether it’s digital data or a pepperoni pizza, the further away you are from the source, the longer stuff takes to get to you.

Jul
15
2021
--

“Developers, as you know, do not like to pay for things”

In the previous part of this EC-1, we looked at the technical details of CockroachDB and how it provides accurate data instantaneously anywhere on the planet. In this installment, we’re going to take a look at the product side of Cockroach, with a particular focus on developer relations.

As a business, Cockroach Labs has many things going for it. The company’s approach to distributed database technology is novel. And, as more companies operate on a global level, CockroachDB has the potential to gain some significant market share internationally. The company is seven years into a typical 10-year maturity model for databases, has raised $355 million, and holds a $2 billion market value. It’s considered a double unicorn. Few database companies can say this.

The company is now aggressively expanding into the database-as-a-service space, offering its own technology in a fully managed package, expanding the spectrum of clients who can take immediate advantage of its products.

But its growth depends upon securing the love of developers while also making its product easier to use for new customers. To that end, I’m going to analyze the company’s pivot to the cloud as well as its extensive outreach to developers as it works to set itself up for long-term, sustainable success.

Cockroach Labs looks to the cloud

These days, just about any company of consequence provides services via the internet, and a growing number of these services are powered by products and services from native cloud providers. Gartner forecasted in 2019 that cloud services are growing at an annual rate of 17.5%, and there’s no sign that the growth has abated at all.

Its founders’ history with Google back in the mid-2000s has meant that Cockroach Labs has always been aware of the impact of cloud services on the commercial web. Unsurprisingly, CockroachDB could run cloud native right from its first release, given that its architecture presupposes the cloud in its operation — as we saw in part 2 of this EC-1.

Jul
15
2021
--

Scaling CockroachDB in the red ocean of relational databases

Most database startups avoid building relational databases, since that market is dominated by a few goliaths. Oracle, MySQL and Microsoft SQL Server have embedded themselves into the technical fabric of large- and medium-size companies going back decades. These established companies have a lot of market share and a lot of money to quash the competition.

So rather than trying to compete in the relational database market, over the past decade, many database startups focused on alternative architectures such as document-centric databases (like MongoDB), key-value stores (like Redis) and graph databases (like Neo4J). But Cockroach Labs went against conventional wisdom with CockroachDB: It intentionally competed in the relational database market with its relational database product.

While it did face an uphill battle to penetrate the market, Cockroach Labs saw a surprising benefit: It didn’t have to invent a market. All it needed to do was grab a share of a market that also happened to be growing rapidly.

Cockroach Labs has a bright future, compelling technology, a lot of money in the bank and has an experienced, technically astute executive team.

In previous parts of this EC-1, I looked at the origins of CockroachDB, presented an in-depth technical description of its product as well as an analysis of the company’s developer relations and cloud service, CockroachCloud. In this final installment, we’ll look at the future of the company, the competitive landscape within the relational database market, its ability to retain talent as it looks toward a potential IPO or acquisition, and the risks it faces.

CockroachDB’s success is not guaranteed. It has to overcome significant hurdles to secure a profitable place for itself among a set of well-established database technologies that are owned by companies with very deep pockets.

It’s not impossible, though. We’ll first look at MongoDB as an example of how a company can break through the barriers for database startups competing with incumbents.

When life gives you Mongos, make MongoDB

Dev Ittycheria, MongoDB CEO, rings the Nasdaq Stock Market Opening Bell. Image Credits: Nasdaq, Inc

MongoDB is a good example of the risks that come with trying to invent a new database market. The company started out as a purely document-centric database at a time when that approach was the exception rather than the rule.

Web developers like document-centric databases because they address a number of common use cases in their work. For example, a document-centric database works well for storing comments to a blog post or a customer’s entire order history and profile.

Jun
02
2021
--

How Expensify hacked its way to a robust, scalable tech stack

Take a close look at any ambitious startup and you’ll find pugnacity nestled in its core. Stubbornness and a bullheaded belief in the worth of what a company wants to bring to fruition is often the biggest driver of its success, and the people at such companies also tend to share this quality.

So it wouldn’t be too far off the mark to say the people at Expensify are a stubborn lot — to the company’s ultimate benefit. This group of P2P pirates/hackers that set out to build an expense management app stuck to their gut, made their own rules. They asked questions few thought of, like: Why have lots of employees when you can find a way to get work done and reach impressive profitability with a few? Why work from an office in San Francisco when the internet lets you work from anywhere, even a sailboat in the Caribbean?

It makes sense in a way: If you’re a pirate, to hell with the rules, right? And even more so when nobody can explain the rules in the first place.

With that in mind, one could assume Expensify decided to ask itself: Why not build our own totally custom tech stack? Indeed, Expensify has made several tech decisions that were met with disbelief — from having an open-source frontend and cross-platform mobile development to hiring contractors to train its AI and recruiting open-source contributors — but its belief in its own choices has paid off over the years, and the company is ready to IPO any day now.

How much of a tech advantage Expensify enjoys owing to such choices is an open question, but one thing is clear: These choices are key to understanding Expensify and its roadmap. Let’s take a look.

Built on Bedrock

I think another question Expensify also decided to ask in its early days was something like: Why not have our database on top of a technology that’s built for small-scale application software?

It may sound incredible, but Expensify actually runs on a custom database built on top of SQLite. This is surprising, because despite being one of the most widely deployed database engines, SQLite is known for running on small, embedded systems like smartphones and web browsers, not powering enterprise-scale databases.

It may sound incredible, but Expensify actually runs on a custom database built on top of SQLite.

This custom database is called Bedrock, and its architecture is as unique as they come. Expensify explains it as an “RDBMS optimized for self-healing replication across relatively slow, relatively unreliable WAN (internet) connections, enabling extremely high availability/high performance multi-datacenter deployments without any single point of failure.” RDBMS means relational database management system, describing SQLite and other row-based databases where entries are interconnected with each other.

But why would Expensify build this instead of going for any number of widely available enterprise database solutions?

To answer that question, we need to go back to the early days of the company, which was originally a side project for its founder and CEO, David Barrett. His initial idea was to develop a prepaid card for the homeless, but this required putting a server on the Visa network, which brought several strict requirements and challenges. “I would say one of the most difficult [parts] was that I needed the ability to automatically replicate and failover,” Barrett told TechCrunch when we interviewed him a couple of months ago.

This was no easy feat in 2007, but Barrett was up for the challenge. “I just hit a moment where the technology available off the shelf just wasn’t that good. And I happened to be a peer-to-peer software developer who had tons of spare time and really wanted to build this thing to put on the Visa backend,” he said. The P2P aspect was important, as Barrett had the skills to make it work. His first hires for Expensify, P2P engineers he had worked with at Red Swoosh and Akamai, were also unusually suited for the job.

Jun
12
2019
--

Apollo raises $22M for its GraphQL platform

Apollo, a San Francisco-based startup that provides a number of developer and operator tools and services around the GraphQL query language, today announced that it has raised a $22 million growth funding round co-led by Andreessen Horowitz and Matrix Partners. Existing investors Trinity Ventures and Webb Investment Network also participated in this round.

Today, Apollo is probably the biggest player in the GraphQL ecosystem. At its core, the company’s services allow businesses to use the Facebook -incubated GraphQL technology to shield their developers from the patchwork of legacy APIs and databases as they look to modernize their technology stacks. The team argues that while REST APIs that talked directly to other services and databases still made sense a few years ago, it doesn’t anymore now that the number of API endpoints keeps increasing rapidly.

Apollo replaces this with what it calls the Data Graph. “There is basically a missing piece where we think about how people build apps today, which is the piece that connects the billions of devices out there,” Apollo co-founder and CEO Geoff Schmidt told me. “You probably don’t just have one app anymore, you probably have three, for the web, iOS and Android . Or maybe six. And if you’re a two-sided marketplace you’ve got one for buyers, one for sellers and another for your ops team.”

Managing the interfaces between all of these apps quickly becomes complicated and means you have to write a lot of custom code for every new feature. The promise of the Data Graph is that developers can use GraphQL to query the data in the graph and move on, all without having to write the boilerplate code that typically slows them down. At the same time, the ops teams can use the Graph to enforce access policies and implement other security features.

“If you think about it, there’s a lot of analogies to what happened with relational databases in the ’80s,” Schmidt said. “There is a need for a new layer in the stack. Previously, your query planner was a human being, not a piece of software, and a relational database is a piece of software that would just give you a database. And you needed a way to query that database, and that syntax was called SQL.”

Geoff Schmidt, Apollo CEO, and Matt DeBergalis, CTO

GraphQL itself, of course, is open source. Apollo is now building a lot of the proprietary tools around this idea of the Data Graph that make it useful for businesses. There’s a cloud-hosted graph manager, for example, that lets you track your schema, as well as a dashboard to track performance, as well as integrations with continuous integration services. “It’s basically a set of services that keep track of the metadata about your graph and help you manage the configuration of your graph and all the workflows and processes around it,” Schmidt said.

The development of Apollo didn’t come out of nowhere. The founders previously launched Meteor, a framework and set of hosted services that allowed developers to write their apps in JavaScript, both on the front-end and back-end. Meteor was tightly coupled to MongoDB, though, which worked well for some use cases but also held the platform back in the long run. With Apollo, the team decided to go in the opposite direction and instead build a platform that makes being database agnostic the core of its value proposition.

The company also recently launched Apollo Federation, which makes it easier for businesses to work with a distributed graph. Sometimes, after all, your data lives in lots of different places. Federation allows for a distributed architecture that combines all of the different data sources into a single schema that developers can then query.

Schmidt tells me the company started to get some serious traction last year and by December, it was getting calls from VCs that heard from their portfolio companies that they were using Apollo.

The company plans to use the new funding to build out its technology to scale its field team to support the enterprises that bet on its technology, including the open-source technologies that power both the services.

“I see the Data Graph as a core new layer of the stack, just like we as an industry invested in the relational database for decades, making it better and better,” Schmidt said. “We’re still finding new uses for SQL and that relational database model. I think the Data Graph is going to be the same way.”

Jan
24
2019
--

Microsoft acquires Citus Data

Microsoft today announced that it has acquired Citus Data, a company that focused on making PostgreSQL databases faster and more scalable. Citus’ open-source PostgreSQL extension essentially turns the application into a distributed database and, while there has been a lot of hype around the NoSQL movement and document stores, relational databases — and especially PostgreSQL — are still a growing market, in part because of tools from companies like Citus that overcome some of their earlier limitations.

Unsurprisingly, Microsoft plans to work with the Citus Data team to “accelerate the delivery of key, enterprise-ready features from Azure to PostgreSQL and enable critical PostgreSQL workloads to run on Azure with confidence.” The Citus co-founders echo this in their own statement, noting that “as part of Microsoft, we will stay focused on building an amazing database on top of PostgreSQL that gives our users the game-changing scale, performance, and resilience they need. We will continue to drive innovation in this space.”

PostgreSQL is obviously an open-source tool, and while the fact that Microsoft is now a major open-source contributor doesn’t come as a surprise anymore, it’s worth noting that the company stresses that it will continue to work with the PostgreSQL community. In an email, a Microsoft spokesperson also noted that “the acquisition is a proof point in the company’s commitment to open source and accelerating Azure PostgreSQL performance and scale.”

Current Citus customers include the likes of real-time analytics service Chartbeat, email security service Agari and PushOwl, though the company notes that it also counts a number of Fortune 100 companies among its users (they tend to stay anonymous). The company offers both a database as a service, an on-premises enterprise version and the free open-source edition. For the time being, it seems like that’s not changing, though over time I would suspect that Microsoft will transition users of the hosted service to Azure.

The price of the acquisition was not disclosed. Citus Data, which was founded in 2010 and graduated from the Y Combinator program, previously raised more than $13 million from the likes of Khosla Ventures, SV Angel and Data Collective.

Dec
19
2018
--

Google’s Cloud Spanner database adds new features and regions

Cloud Spanner, Google’s globally distributed relational database service, is getting a bit more distributed today with the launch of a new region and new ways to set up multi-region configurations. The service is also getting a new feature that gives developers deeper insights into their most resource-consuming queries.

With this update, Google is adding to the Cloud Spanner lineup Hong Kong (asia-east2), its newest data center location. With this, Cloud Spanner is now available in 14 out of 18 Google Cloud Platform (GCP) regions, including seven the company added this year alone. The plan is to bring Cloud Spanner to every new GCP region as they come online.

The other new region-related news is the launch of two new configurations for multi-region coverage. One, called eur3, focuses on the European Union, and is obviously meant for users there who mostly serve a local customer base. The other is called nam6 and focuses on North America, with coverage across both costs and the middle of the country, using data centers in Oregon, Los Angeles, South Carolina and Iowa. Previously, the service only offered a North American configuration with three regions and a global configuration with three data centers spread across North America, Europe and Asia.

While Cloud Spanner is obviously meant for global deployments, these new configurations are great for users who only need to serve certain markets.

As far as the new query features are concerned, Cloud Spanner is now making it easier for developers to view, inspect and debug queries. The idea here is to give developers better visibility into their most frequent and expensive queries (and maybe make them less expensive in the process).

In addition to the Cloud Spanner news, Google Cloud today announced that its Cloud Dataproc Hadoop and Spark service now supports the R language, in addition to Python 3.7 support on App Engine.

Aug
29
2017
--

Salesforce is using AI to democratize SQL so anyone can query databases in natural language

 SQL is about as easy as it gets in the world of programming, and yet its learning curve is still steep enough to prevent many people from interacting with relational databases. Salesforce’s AI research team took it upon itself to explore how machine learning might be able to open doors for those without knowledge of SQL. Read More

Apr
30
2015
--

Airtable Launches Its API And Embedded Databases

airtable Two months after launching its unique organizational tool, Airtable is rolling out two new features that’ll make it easier to share your personal spreadsheet/database hybrids with others in a way they’ll find useful. In addition to sharing direct read or edit access to the relational databases you create in its app, Airtable is today letting you draw from the underlying data using… Read More

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com