Jul
29
2019
--

Microsoft acquires data privacy and governance service BlueTalon

Microsoft today announced that it has acquired BlueTalon, a data privacy and governance service that helps enterprises set policies for how their employees can access their data. The service then enforces those policies across most popular data environments and provides tools for auditing policies and access, too.

Neither Microsoft nor BlueTalon disclosed the financial details of the transaction. Ahead of today’s acquisition, BlueTalon had raised about $27.4 million, according to Crunchbase. Investors include Bloomberg Beta, Maverick Ventures, Signia Venture Partners and Stanford’s StartX fund.

BlueTalon Policy Engine How it works

“The IP and talent acquired through BlueTalon brings a unique expertise at the apex of big data, security and governance,” writes Rohan Kumar, Microsoft’s corporate VP for Azure Data. “This acquisition will enhance our ability to empower enterprises across industries to digitally transform while ensuring right use of data with centralized data governance at scale through Azure.”

Unsurprisingly, the BlueTalon team will become part of the Azure Data Governance group, where the team will work on enhancing Microsoft’s capabilities around data privacy and governance. Microsoft already offers access and governance control tools for Azure, of course. As virtually all businesses become more data-centric, though, the need for centralized access controls that work across systems is only going to increase and new data privacy laws aren’t making this process easier.

“As we began exploring partnership opportunities with various hyperscale cloud providers to better serve our customers, Microsoft deeply impressed us,” BlueTalon CEO Eric Tilenius, who has clearly read his share of “our incredible journey” blog posts, explains in today’s announcement. “The Azure Data team was uniquely thoughtful and visionary when it came to data governance. We found them to be the perfect fit for us in both mission and culture. So when Microsoft asked us to join forces, we jumped at the opportunity.”

Jun
18
2019
--

MongoDB gets a data lake, new security features and more

MongoDB is hosting its developer conference today and, unsurprisingly, the company has quite a few announcements to make. Some are straightforward, like the launch of MongoDB 4.2 with some important new security features, while others, like the launch of the company’s Atlas Data Lake, point the company beyond its core database product.

“Our new offerings radically expand the ways developers can use MongoDB to better work with data,” said Dev Ittycheria, the CEO and president of MongoDB. “We strive to help developers be more productive and remove infrastructure headaches — with additional features along with adjunct capabilities like full-text search and data lake. IDC predicts that by 2025 global data will reach 175 Zettabytes and 49% of it will reside in the public cloud. It’s our mission to give developers better ways to work with data wherever it resides, including in public and private clouds.”

The highlight of today’s set of announcements is probably the launch of MongoDB Atlas Data Lake. Atlas Data Lake allows users to query data, using the MongoDB Query Language, on AWS S3, no matter their format, including JSON, BSON, CSV, TSV, Parquet and Avro. To get started, users only need to point the service at their existing S3 buckets. They don’t have to manage servers or other infrastructure. Support for Data Lake on Google Cloud Storage and Azure Storage is in the works and will launch in the future.

Also new is Full-Text Search, which gives users access to advanced text search features based on the open-source Apache Lucene 8.

In addition, MongoDB is also now starting to bring together Realm, the mobile database product it acquired earlier this year, and the rest of its product lineup. Using the Realm brand, Mongo is merging its serverless platform, MongoDB Stitch, and Realm’s mobile database and synchronization platform. Realm’s synchronization protocol will now connect to MongoDB Atlas’ cloud database, while Realm Sync will allow developers to bring this data to their applications. 

“By combining Realm’s wildly popular mobile database and synchronization platform with the strengths of Stitch, we will eliminate a lot of work for developers by making it natural and easy to work with data at every layer of the stack, and to seamlessly move data between devices at the edge to the core backend,”  explained Eliot Horowitz, CTO and co-founder of MongoDB.

As for the latest release of MongoDB, the highlight of the release is a set of new security features. With this release, Mongo is implementing client-side Field Level Encryption. Traditionally, database security has always relied on server-side trust. This typically leaves the data accessible to administrators, even if they don’t have client access. If an attacker breaches the server, that’s almost automatically a catastrophic event.

With this new security model, Mongo is shifting access to the client and to the local drivers. It provides multiple encryption options; for developers to make use of this, they will use a new “encrypt” JSON scheme attribute.

This ensures that all application code can generally run unmodified, and even the admins won’t get access to the database or its logs and backups unless they get client access rights themselves. Because the logic resides in the drivers, the encryption is also handled totally separate from the actual database.

Other new features in MongoDB 4.2 include support for distributed transactions and the ability to manage MongoDB deployments from a single Kubernetes control plane.

Jun
12
2019
--

Apollo raises $22M for its GraphQL platform

Apollo, a San Francisco-based startup that provides a number of developer and operator tools and services around the GraphQL query language, today announced that it has raised a $22 million growth funding round co-led by Andreessen Horowitz and Matrix Partners. Existing investors Trinity Ventures and Webb Investment Network also participated in this round.

Today, Apollo is probably the biggest player in the GraphQL ecosystem. At its core, the company’s services allow businesses to use the Facebook -incubated GraphQL technology to shield their developers from the patchwork of legacy APIs and databases as they look to modernize their technology stacks. The team argues that while REST APIs that talked directly to other services and databases still made sense a few years ago, it doesn’t anymore now that the number of API endpoints keeps increasing rapidly.

Apollo replaces this with what it calls the Data Graph. “There is basically a missing piece where we think about how people build apps today, which is the piece that connects the billions of devices out there,” Apollo co-founder and CEO Geoff Schmidt told me. “You probably don’t just have one app anymore, you probably have three, for the web, iOS and Android . Or maybe six. And if you’re a two-sided marketplace you’ve got one for buyers, one for sellers and another for your ops team.”

Managing the interfaces between all of these apps quickly becomes complicated and means you have to write a lot of custom code for every new feature. The promise of the Data Graph is that developers can use GraphQL to query the data in the graph and move on, all without having to write the boilerplate code that typically slows them down. At the same time, the ops teams can use the Graph to enforce access policies and implement other security features.

“If you think about it, there’s a lot of analogies to what happened with relational databases in the ’80s,” Schmidt said. “There is a need for a new layer in the stack. Previously, your query planner was a human being, not a piece of software, and a relational database is a piece of software that would just give you a database. And you needed a way to query that database, and that syntax was called SQL.”

Geoff Schmidt, Apollo CEO, and Matt DeBergalis, CTO

GraphQL itself, of course, is open source. Apollo is now building a lot of the proprietary tools around this idea of the Data Graph that make it useful for businesses. There’s a cloud-hosted graph manager, for example, that lets you track your schema, as well as a dashboard to track performance, as well as integrations with continuous integration services. “It’s basically a set of services that keep track of the metadata about your graph and help you manage the configuration of your graph and all the workflows and processes around it,” Schmidt said.

The development of Apollo didn’t come out of nowhere. The founders previously launched Meteor, a framework and set of hosted services that allowed developers to write their apps in JavaScript, both on the front-end and back-end. Meteor was tightly coupled to MongoDB, though, which worked well for some use cases but also held the platform back in the long run. With Apollo, the team decided to go in the opposite direction and instead build a platform that makes being database agnostic the core of its value proposition.

The company also recently launched Apollo Federation, which makes it easier for businesses to work with a distributed graph. Sometimes, after all, your data lives in lots of different places. Federation allows for a distributed architecture that combines all of the different data sources into a single schema that developers can then query.

Schmidt tells me the company started to get some serious traction last year and by December, it was getting calls from VCs that heard from their portfolio companies that they were using Apollo.

The company plans to use the new funding to build out its technology to scale its field team to support the enterprises that bet on its technology, including the open-source technologies that power both the services.

“I see the Data Graph as a core new layer of the stack, just like we as an industry invested in the relational database for decades, making it better and better,” Schmidt said. “We’re still finding new uses for SQL and that relational database model. I think the Data Graph is going to be the same way.”

May
02
2019
--

Couchbase’s mobile database gets built-in ML and enhanced synchronization features

Couchbase, the company behind the eponymous NoSQL database, announced a major update to its mobile database today that brings some machine learning smarts, as well as improved synchronization features and enhanced stats and logging support, to the software.

“We’ve led the innovation and data management at the edge since the release of our mobile database five years ago,” Couchbase’s VP of Engineering Wayne Carter told me. “And we’re excited that others are doing that now. We feel that it’s very, very important for businesses to be able to utilize these emerging technologies that do sit on the edge to drive their businesses forward, and both making their employees more effective and their customer experience better.”

The latter part is what drove a lot of today’s updates, Carter noted. He also believes that the database is the right place to do some machine learning. So with this release, the company is adding predictive queries to its mobile database. This new API allows mobile apps to take pre-trained machine learning models and run predictive queries against the data that is stored locally. This would allow a retailer to create a tool that can use a phone’s camera to figure out what part a customer is looking for.

To support these predictive queries, Couchbase mobile is also getting support for predictive indexes. “Predictive indexes allow you to create an index on prediction, enabling correlation of real-time predictions with application data in milliseconds,” Carter said. In many ways, that’s also the unique value proposition for bringing machine learning into the database. “What you really need to do is you need to utilize the unique values of a database to be able to deliver the answer to those real-time questions within milliseconds,” explained Carter.

The other major new feature in this release is delta synchronization, which allows businesses to push far smaller updates to the databases on their employees’ mobile devices. That’s because they only have to receive the information that changed instead of a full updated database. Carter says this was a highly requested feature, but until now, the company always had to prioritize work on other components of Couchbase.

This is an especially useful feature for the company’s retail customers, a vertical where it has been quite successful. These users need to keep their catalogs up to data and quite a few of them supply their employees with mobile devices to help shoppers. Rumor has it that Apple, too, is a Couchbase user.

The update also includes a few new features that will be more of interest to operators, including advanced stats reporting and enhanced logging support.

Apr
24
2019
--

Databricks open-sources Delta Lake to make data lakes more reliable

Databricks, the company founded by the original developers of the Apache Spark big data analytics engine, today announced that it has open-sourced Delta Lake, a storage layer that makes it easier to ensure data integrity as new data flows into an enterprise’s data lake by bringing ACID transactions to these vast data repositories.

Delta Lake, which has long been a proprietary part of Databrick’s offering, is already in production use by companies like Viacom, Edmunds, Riot Games and McGraw Hill.

The tool provides the ability to enforce specific schemas (which can be changed as necessary), to create snapshots and to ingest streaming data or backfill the lake as a batch job. Delta Lake also uses the Spark engine to handle the metadata of the data lake (which by itself is often a big data problem). Over time, Databricks also plans to add an audit trail, among other things.

“Today nearly every company has a data lake they are trying to gain insights from, but data lakes have proven to lack data reliability. Delta Lake has eliminated these challenges for hundreds of enterprises. By making Delta Lake open source, developers will be able to easily build reliable data lakes and turn them into ‘Delta Lakes’,” said Ali Ghodsi, co-founder and CEO at Databricks.

What’s important to note here is that Delta lake runs on top of existing data lakes and is compatible with the Apache spark APIs.

The company is still looking at how the project will be governed in the future. “We are still exploring different models of open source project governance, but the GitHub model is well understood and presents a good trade-off between the ability to accept contributions and governance overhead,” Ghodsi said. “One thing we know for sure is we want to foster a vibrant community, as we see this as a critical piece of technology for increasing data reliability on data lakes. This is why we chose to go with a permissive open source license model: Apache License v2, same license that Apache Spark uses.”

To invite this community, Databricks plans to take outside contributions, just like the Spark project.

“We want Delta Lake technology to be used everywhere on-prem and in the cloud by small and large enterprises,” said Ghodsi. “This approach is the fastest way to build something that can become a standard by having the community provide direction and contribute to the development efforts.” That’s also why the company decided against a Commons Clause licenses that some open-source companies now use to prevent others (and especially large clouds) from using their open source tools in their own commercial SaaS offerings. “We believe the Commons Clause license is restrictive and will discourage adoption. Our primary goal with Delta Lake is to drive adoption on-prem as well as in the cloud.”

Apr
09
2019
--

Google Cloud challenges AWS with new open-source integrations

Google today announced that it has partnered with a number of top open-source data management and analytics companies to integrate their products into its Google Cloud Platform and offer them as managed services operated by its partners. The partners here are Confluent, DataStax, Elastic, InfluxData, MongoDB, Neo4j and Redis Labs.

The idea here, Google says, is to provide users with a seamless user experience and the ability to easily leverage these open-source technologies in Google’s cloud. But there is a lot more at play here, even though Google never quite says so. That’s because Google’s move here is clearly meant to contrast its approach to open-source ecosystems with Amazon’s. It’s no secret that Amazon’s AWS cloud computing platform has a reputation for taking some of the best open-source projects and then forking those and packaging them up under its own brand, often without giving back to the original project. There are some signs that this is changing, but a number of companies have recently taken action and changed their open-source licenses to explicitly prevent this from happening.

That’s where things get interesting, because those companies include Confluent, Elastic, MongoDB, Neo4j and Redis Labs — and those are all partnering with Google on this new project, though it’s worth noting that InfluxData is not taking this new licensing approach and that while DataStax uses lots of open-source technologies, its focus is very much on its enterprise edition.

“As you are aware, there has been a lot of debate in the industry about the best way of delivering these open-source technologies as services in the cloud,” Manvinder Singh, the head of infrastructure partnerships at Google Cloud, said in a press briefing. “Given Google’s DNA and the belief that we have in the open-source model, which is demonstrated by projects like Kubernetes, TensorFlow, Go and so forth, we believe the right way to solve this it to work closely together with companies that have invested their resources in developing these open-source technologies.”

So while AWS takes these projects and then makes them its own, Google has decided to partner with these companies. While Google and its partners declined to comment on the financial arrangements behind these deals, chances are we’re talking about some degree of profit-sharing here.

“Each of the major cloud players is trying to differentiate what it brings to the table for customers, and while we have a strong partnership with Microsoft and Amazon, it’s nice to see that Google has chosen to deepen its partnership with Atlas instead of launching an imitation service,” Sahir Azam, the senior VP of Cloud Products at MongoDB told me. “MongoDB and GCP have been working closely together for years, dating back to the development of Atlas on GCP in early 2017. Over the past two years running Atlas on GCP, our joint teams have developed a strong working relationship and support model for supporting our customers’ mission critical applications.”

As for the actual functionality, the core principle here is that Google will deeply integrate these services into its Cloud Console; for example, similar to what Microsoft did with Databricks on Azure. These will be managed services and Google Cloud will handle the invoicing and the billings will count toward a user’s Google Cloud spending commitments. Support will also run through Google, so users can use a single service to manage and log tickets across all of these services.

Redis Labs CEO and co-founder Ofer Bengal echoed this. “Through this partnership, Redis Labs and Google Cloud are bringing these innovations to enterprise customers, while giving them the choice of where to run their workloads in the cloud, he said. “Customers now have the flexibility to develop applications with Redis Enterprise using the fully integrated managed services on GCP. This will include the ability to manage Redis Enterprise from the GCP console, provisioning, billing, support, and other deep integrations with GCP.”

Apr
02
2019
--

How to handle dark data compliance risk at your company

Slack and other consumer-grade productivity tools have been taking off in workplaces large and small — and data governance hasn’t caught up.

Whether it’s litigation, compliance with regulations like GDPR or concerns about data breaches, legal teams need to account for new types of employee communication. And that’s hard when work is happening across the latest messaging apps and SaaS products, which make data searchability and accessibility more complex.

Here’s a quick look at the problem, followed by our suggestions for best practices at your company.

Problems

The increasing frequency of reported data breaches and expanding jurisdiction of new privacy laws are prompting conversations about dark data and risks at companies of all sizes, even small startups. Data risk discussions necessarily include the risk of a data breach, as well as preservation of data. Just two weeks ago it was reported that Jared Kushner used WhatsApp for official communications and screenshots of those messages for preservation, which commentators say complies with record keeping laws but raises questions about potential admissibility as evidence.

Mar
27
2019
--

Microsoft, Adobe and SAP prepare to expand their Open Data Initiative

At last year’s Microsoft Ignite conference, the CEOs of Microsoft, Adobe and SAP took the stage to announce the launch of the Open Data Initiative. The idea behind this effort was to make it easier for their customers to move data between each others’ services by standardizing on a common data format and helping them move their data out of their respective silos and into a single customer-chosen data lake. At this week’s Adobe Summit, the three companies today announced how they plan to expand this program as they look to bring in additional partners.

“The intent of the companies joining forces was really to solve a common customer problem that we hear time and time again, which is that there are high-value business data tends to be very siloed in a variety of different applications,” Alysa Taylor, Microsoft’s corporate vice president, Business Applications & Global Industry, told me. “Being able to extract that data, reason over that data, garner intelligence from that data, is very cost-prohibitive and it’s very manual and time-consuming.”

The core principle of the alliance is that the customers own their data and they should be able to get as much value out of it as they can. Ideally, having this common data schema means that the customer doesn’t have to figure out ways to transform the data from these vendors and can simply flow all of it into a single data lake that then in turn feeds the various analytics services, machine learning systems and other tools that these companies offer.

At the Adobe Summit today, the three companies showed their first customer use case based on how Unilever is making use of this common data standard. More importantly, though, they also stressed that the Open Data Initiative is indeed open to others. As a first step, the three companies today announced the formation of a partner advisory council.

“What this basically means is that we’ve extended it out to key participants in the ecosystem to come and join us as part of this ODI effort,” Adobe’s VP of Ecosystem Development Amit Ahuja told me. “What we’re starting with is really a focus around two big groups of partners. Number one is, who are the other really interesting ISVs who have a lot of this core data that we want to make sure we can bring into this kind of single unified view. And the second piece is who are the major players out there that are trying to help these customers around their enterprise architecture.”

The first 12 partners that are joining this new council include Accenture, Amadeus, Capgemini, Change Healthcare, Cognizant, EY, Finastra, Genesys, Hootsuite, Inmobi, Sprinklr and WPP . This is very much a first step, though. Over time, the group expects to expand far beyond this first set of partners and include a much larger group of stakeholders.

“We really want to make this really broad in a way that we can quickly make progress and demonstrate that what we’re talking about from a conceptual process has really hard customer benefits attached to it,” Abhay Kumar, SAP’s global vice president, Global Business Development & Ecosystem, noted. The use cases the alliance has identified focus on market intelligence, sales intelligence and services intelligence, he added.

Today, as enterprises often pull in data from dozens of disparate systems, making sense of all that information is hard enough, but to even get to this point, enterprises first have to transform it and make it usable. To do so, they then have to deploy another set of applications that massages the data. “I don’t want to go and buy another 15 or 20 applications to make that work,” Ahuja said. “I want to realize the investment and the ROI of the applications that I’ve already bought.”

All three stressed that this is very much a collaborative effort that spans the engineering, sales and product marketing groups.

Feb
20
2019
--

Why Daimler moved its big data platform to the cloud

Like virtually every big enterprise company, a few years ago, the German auto giant Daimler decided to invest in its own on-premises data centers. And while those aren’t going away anytime soon, the company today announced that it has successfully moved its on-premises big data platform to Microsoft’s Azure cloud. This new platform, which the company calls eXtollo, is Daimler’s first major service to run outside of its own data centers, though it’ll probably not be the last.

As Daimler’s head of its corporate center of excellence for advanced analytics and big data Guido Vetter told me, the company started getting interested in big data about five years ago. “We invested in technology — the classical way, on-premise — and got a couple of people on it. And we were investigating what we could do with data because data is transforming our whole business as well,” he said.

By 2016, the size of the organization had grown to the point where a more formal structure was needed to enable the company to handle its data at a global scale. At the time, the buzz phrase was “data lakes” and the company started building its own in order to build out its analytics capacities.

Electric lineup, Daimler AG

“Sooner or later, we hit the limits as it’s not our core business to run these big environments,” Vetter said. “Flexibility and scalability are what you need for AI and advanced analytics and our whole operations are not set up for that. Our backend operations are set up for keeping a plant running and keeping everything safe and secure.” But in this new world of enterprise IT, companies need to be able to be flexible and experiment — and, if necessary, throw out failed experiments quickly.

So about a year and a half ago, Vetter’s team started the eXtollo project to bring all the company’s activities around advanced analytics, big data and artificial intelligence into the Azure Cloud, and just over two weeks ago, the team shut down its last on-premises servers after slowly turning on its solutions in Microsoft’s data centers in Europe, the U.S. and Asia. All in all, the actual transition between the on-premises data centers and the Azure cloud took about nine months. That may not seem fast, but for an enterprise project like this, that’s about as fast as it gets (and for a while, it fed all new data into both its on-premises data lake and Azure).

If you work for a startup, then all of this probably doesn’t seem like a big deal, but for a more traditional enterprise like Daimler, even just giving up control over the physical hardware where your data resides was a major culture change and something that took quite a bit of convincing. In the end, the solution came down to encryption.

“We needed the means to secure the data in the Microsoft data center with our own means that ensure that only we have access to the raw data and work with the data,” explained Vetter. In the end, the company decided to use the Azure Key Vault to manage and rotate its encryption keys. Indeed, Vetter noted that knowing that the company had full control over its own data was what allowed this project to move forward.

Vetter tells me the company obviously looked at Microsoft’s competitors as well, but he noted that his team didn’t find a compelling offer from other vendors in terms of functionality and the security features that it needed.

Today, Daimler’s big data unit uses tools like HD Insights and Azure Databricks, which covers more than 90 percents of the company’s current use cases. In the future, Vetter also wants to make it easier for less experienced users to use self-service tools to launch AI and analytics services.

While cost is often a factor that counts against the cloud, because renting server capacity isn’t cheap, Vetter argues that this move will actually save the company money and that storage costs, especially, are going to be cheaper in the cloud than in its on-premises data center (and chances are that Daimler, given its size and prestige as a customer, isn’t exactly paying the same rack rate that others are paying for the Azure services).

As with so many big data AI projects, predictions are the focus of much of what Daimler is doing. That may mean looking at a car’s data and error code and helping the technician diagnose an issue or doing predictive maintenance on a commercial vehicle. Interestingly, the company isn’t currently bringing to the cloud any of its own IoT data from its plants. That’s all managed in the company’s on-premises data centers because it wants to avoid the risk of having to shut down a plant because its tools lost the connection to a data center, for example.

Feb
19
2019
--

Redis Labs raises a $60M Series E round

Redis Labs, a startup that offers commercial services around the Redis in-memory data store (and which counts Redis creator and lead developer Salvatore Sanfilippo among its employees), today announced that it has raised a $60 million Series E funding round led by private equity firm Francisco Partners.

The firm didn’t participate in any of Redis Labs’ previous rounds, but existing investors Goldman Sachs Private Capital Investing, Bain Capital Ventures, Viola Ventures and Dell Technologies Capital all participated in this round.

In total, Redis Labs has now raised $146 million and the company plans to use the new funding to accelerate its go-to-market strategy and continue to invest in the Redis community and product development.

Current Redis Labs users include the likes of American Express, Staples, Microsoft, Mastercard and Atlassian . In total, the company now has more than 8,500 customers. Because it’s pretty flexible, these customers use the service as a database, cache and message broker, depending on their needs. The company’s flagship product is Redis Enterprise, which extends the open-source Redis platform with additional tools and services for enterprises. The company offers managed cloud services, which give businesses the choice between hosting on public clouds like AWS, GCP and Azure, as well as their private clouds, in addition to traditional software downloads and licenses for self-managed installs.

Redis Labs CEO Ofer Bengal told me the company’s isn’t cash positive yet. He also noted that the company didn’t need to raise this round but that he decided to do so in order to accelerate growth. “In this competitive environment, you have to spend a lot and push hard on product development,” he said.

It’s worth noting that he stressed that Francisco Partners has a reputation for taking companies forward and the logical next step for Redis Labs would be an IPO. “We think that we have a very unique opportunity to build a very large company that deserves an IPO,” he said.

Part of this new competitive environment also involves competitors that use other companies’ open-source projects to build their own products without contributing back. Redis Labs was one of the first of a number of open-source companies that decided to offer its newest releases under a new license that still allows developers to modify the code but that forces competitors that want to essentially resell it to buy a commercial license. Ofer specifically notes AWS in this context. It’s worth noting that this isn’t about the Redis database itself but about the additional modules that Redis Labs built. Redis Enterprise itself is closed-source.

“When we came out with this new license, there were many different views,” he acknowledged. “Some people condemned that. But after the initial noise calmed down — and especially after some other companies came out with a similar concept — the community now understands that the original concept of open source has to be fixed because it isn’t suitable anymore to the modern era where cloud companies use their monopoly power to adopt any successful open source project without contributing anything to it.”

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com