Apr 24, 2019

Databricks open-sources Delta Lake to make data lakes more reliable

Databricks, the company founded by the original developers of the Apache Spark big data analytics engine, today announced that it has open-sourced Delta Lake, a storage layer that makes it easier to ensure data integrity as new data flows into an enterprise’s data lake by bringing ACID transactions to these vast data repositories.

Delta Lake, which has long been a proprietary part of Databricks’ offering, is already in production use by companies like Viacom, Edmunds, Riot Games and McGraw Hill.

The tool provides the ability to enforce specific schemas (which can be changed as necessary), to create snapshots and to ingest streaming data or backfill the lake as a batch job. Delta Lake also uses the Spark engine to handle the metadata of the data lake (which by itself is often a big data problem). Over time, Databricks also plans to add an audit trail, among other things.
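To make those features concrete, here is a toy Python sketch of the general idea: an append-only commit log that validates every batch against a schema before committing it, and that can return a snapshot as of any earlier version. This is an illustration only, not Delta Lake's implementation or API; `ToyTable`, `SchemaMismatchError`, `append` and `snapshot` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


class SchemaMismatchError(ValueError):
    """Raised when a batch does not match the table schema."""


@dataclass
class ToyTable:
    # Hypothetical toy model; not Delta Lake's actual implementation or API.
    schema: dict                                  # column name -> Python type
    commits: list = field(default_factory=list)   # append-only log of committed batches

    def _validate(self, row):
        # Schema enforcement: same columns, and each value has the expected type.
        if set(row) != set(self.schema):
            raise SchemaMismatchError(f"columns {sorted(row)} != {sorted(self.schema)}")
        for name, expected in self.schema.items():
            if not isinstance(row[name], expected):
                raise SchemaMismatchError(f"{name!r} should be {expected.__name__}")

    def append(self, rows):
        """Commit a batch atomically: validate every row before anything is logged."""
        batch = [dict(r) for r in rows]
        for row in batch:
            self._validate(row)
        self.commits.append(batch)
        return len(self.commits)  # version number after this commit

    def snapshot(self, version=None):
        """Return all rows as of a given version (defaults to the latest)."""
        if version is None:
            version = len(self.commits)
        return [row for batch in self.commits[:version] for row in batch]


table = ToyTable(schema={"user_id": int, "event": str})
v1 = table.append([{"user_id": 1, "event": "login"}])
v2 = table.append([{"user_id": 2, "event": "click"}, {"user_id": 1, "event": "logout"}])

# A batch containing one bad row is rejected as a whole; nothing is committed.
try:
    table.append([{"user_id": 3, "event": "click"}, {"user_id": "oops", "event": "click"}])
    rejected = False
except SchemaMismatchError:
    rejected = True
```

Because the whole batch is validated before it is logged, a bad row never leaves the table half-written, and `table.snapshot(1)` still returns the table exactly as it stood after the first commit.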

“Today nearly every company has a data lake they are trying to gain insights from, but data lakes have proven to lack data reliability. Delta Lake has eliminated these challenges for hundreds of enterprises. By making Delta Lake open source, developers will be able to easily build reliable data lakes and turn them into ‘Delta Lakes’,” said Ali Ghodsi, co-founder and CEO at Databricks.

What’s important to note here is that Delta Lake runs on top of existing data lakes and is compatible with the Apache Spark APIs.

The company is still looking at how the project will be governed in the future. “We are still exploring different models of open source project governance, but the GitHub model is well understood and presents a good trade-off between the ability to accept contributions and governance overhead,” Ghodsi said. “One thing we know for sure is we want to foster a vibrant community, as we see this as a critical piece of technology for increasing data reliability on data lakes. This is why we chose to go with a permissive open source license model: Apache License v2, same license that Apache Spark uses.”

To invite this community, Databricks plans to take outside contributions, just like the Spark project.

“We want Delta Lake technology to be used everywhere on-prem and in the cloud by small and large enterprises,” said Ghodsi. “This approach is the fastest way to build something that can become a standard by having the community provide direction and contribute to the development efforts.” That’s also why the company decided against a Commons Clause license, which some open-source companies now use to prevent others (and especially large clouds) from using their open-source tools in their own commercial SaaS offerings. “We believe the Commons Clause license is restrictive and will discourage adoption. Our primary goal with Delta Lake is to drive adoption on-prem as well as in the cloud.”

Feb 5, 2019

Databricks raises $250M at a $2.75B valuation for its analytics platform

Databricks, the company founded by the original team behind the Apache Spark big data analytics engine, today announced that it has raised a $250 million Series E round led by Andreessen Horowitz. Coatue Management, Green Bay Ventures, Microsoft and NEA also participated in this round, which brings the company’s total funding to $498.5 million. Microsoft’s involvement here is probably a bit of a surprise, but it’s worth noting that it also worked with Databricks on the launch of Azure Databricks as a first-party service on the platform, something that’s still a rarity in the Azure cloud.

As Databricks also today announced, its annual recurring revenue now exceeds $100 million. The company didn’t share whether it’s cash flow-positive at this point, but Databricks CEO and co-founder Ali Ghodsi shared that the company’s valuation is now $2.75 billion.

Current customers, which the company says number around 2,000, include the likes of Nielsen, Hotels.com, Overstock, Bechtel, Shell and HP.

“What Ali and the Databricks team have built is truly phenomenal,” Green Bay Ventures co-founder Anthony Schiller told me. “Their success is a testament to product innovation at the highest level. Databricks is without question best-in-class and their impact on the industry proves it. We were thrilled to participate in this round.”

While Databricks is obviously known for its contributions to Apache Spark, the company itself monetizes that work by offering its Unified Analytics platform on top of it. This platform allows enterprises to build their data pipelines across data storage systems and prepare data sets for data scientists and engineers. To do this, Databricks offers shared notebooks and tools for building, managing and monitoring data pipelines; customers can then use that data to build machine learning models, for example. Indeed, training and deploying these models is one of the company’s focus areas these days, which makes sense, given that this is one of the main use cases for big data.

On top of that, Databricks also offers a fully managed service for hosting all of these tools.

“Databricks is the clear winner in the big data platform race,” said Ben Horowitz, co-founder and general partner at Andreessen Horowitz, in today’s announcement. “In addition, they have created a new category atop their world-beating Apache Spark platform called Unified Analytics that is growing even faster. As a result, we are thrilled to invest in this round.”

Ghodsi told me that Horowitz was also instrumental in getting the company to re-focus on growth. The company was already growing fast, of course, but Horowitz asked him why Databricks wasn’t growing faster. Unsurprisingly, given that it’s an enterprise company, that means aggressively hiring a larger sales force — and that’s costly. Hence the company’s need to raise at this point.

As Ghodsi told me, one of the areas the company wants to focus on is the Asia Pacific region, where overall cloud usage is growing fast. The other area the company is focusing on is support for more verticals like mass media and entertainment, federal agencies and fintech firms, which also comes with its own cost, given that the experts there don’t come cheap.

Ghodsi likes to call this “boring AI,” since it’s not as exciting as self-driving cars. In his view, though, the enterprise companies that don’t start using machine learning now will inevitably be left behind in the long run. “If you don’t get there, there’ll be no place for you in the next 20 years,” he said.

Engineering, of course, will also get a chunk of this new funding, with an emphasis on relatively new products like MLflow and Delta, two tools Databricks recently developed that make it easier to manage the life cycle of machine learning models and build the necessary data pipelines to feed them.

Nov 15, 2017

Microsoft makes Databricks a first-party service on Azure

Databricks has made a name for itself as one of the most popular commercial services around the Apache Spark data analytics platform (which, not coincidentally, was started by the founders of Databricks). Now it’s coming to Microsoft’s Azure platform in the form of a preview of the imaginatively named “Azure Databricks.”

Jun 6, 2017

Databricks releases serverless platform for Apache Spark along with new library supporting deep learning

Today, to kick off Spark Summit, Databricks announced a Serverless Platform for Apache Spark, welcome news for developers looking to reduce time spent on cluster management. The move to simplify developer experiences is set to be a major theme of the event overall. In addition to Serverless, the company also introduced Deep Learning Pipelines, a library that makes it easy to mix…

Feb 17, 2016

Databricks Launches Free Community Edition As Companion To Free Online Spark Courses

Databricks, the commercial company created from the open source Apache Spark project, announced the release of a free Community Edition today, aimed at teaching people how to use Spark and serving as an adjunct to the free online courses (MOOCs) it created last year. The free version is a limited edition without all of the advanced features you would find in the enterprise-pay…

Jun 15, 2015

IBM Pours Researchers And Resources Into Apache Spark Project

IBM today pledged it would devote 3,500 researchers to the open source big data project Apache Spark. It also announced that it was open sourcing its own IBM SystemML machine learning technology in a move designed to help push it to the forefront of big data and machine learning. These two technologies are part of the IBM transformation strategy that includes cloud, big data, analytics and…

Jun 30, 2014

Databricks Snags $33M In Series B And Debuts Cloud Platform For Processing Big Data

Databricks, the commercial entity created by the developers of the open source Apache Spark project, announced $33M in Series B funding today and the launch of a new cloud product, their first one as a company. There is little doubt that big data is a big deal these days and companies are popping up to help customers process the data. Databricks hopes to simplify the entire matter by moving it…
