May
02
2019
--

Couchbase’s mobile database gets built-in ML and enhanced synchronization features

Couchbase, the company behind the eponymous NoSQL database, announced a major update to its mobile database today that brings some machine learning smarts, as well as improved synchronization features and enhanced stats and logging support, to the software.

“We’ve led innovation in data management at the edge since the release of our mobile database five years ago,” Couchbase’s VP of Engineering Wayne Carter told me. “And we’re excited that others are doing that now. We feel that it’s very, very important for businesses to be able to utilize these emerging technologies that do sit on the edge to drive their businesses forward, and both making their employees more effective and their customer experience better.”

The latter part is what drove a lot of today’s updates, Carter noted. He also believes that the database is the right place to do some machine learning. So with this release, the company is adding predictive queries to its mobile database. This new API allows mobile apps to take pre-trained machine learning models and run predictive queries against the data that is stored locally. This would allow a retailer to create a tool that can use a phone’s camera to figure out what part a customer is looking for.

To support these predictive queries, Couchbase mobile is also getting support for predictive indexes. “Predictive indexes allow you to create an index on prediction, enabling correlation of real-time predictions with application data in milliseconds,” Carter said. In many ways, that’s also the unique value proposition for bringing machine learning into the database. “What you really need to do is you need to utilize the unique values of a database to be able to deliver the answer to those real-time questions within milliseconds,” explained Carter.
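The mechanics can be illustrated with a short, self-contained sketch. This is a toy in plain Python, not the actual Couchbase Lite API: the "model" is a stand-in function, and the dict standing in for a predictive index simply precomputes each document's prediction so the query only has to filter.

```python
# Toy sketch of a "predictive query": run a pre-trained model over locally
# stored documents and filter on the prediction, the way Couchbase describes
# correlating predictions with application data. All names here are
# illustrative, not the real Couchbase Lite API.

def part_classifier(doc):
    """Stand-in for a pre-trained ML model: predicts a part category."""
    return "fastener" if "screw" in doc["description"] else "other"

local_docs = [
    {"sku": "A1", "description": "wood screw 40mm"},
    {"sku": "B2", "description": "hinge, brass"},
]

# A predictive index would persist these precomputed predictions so the
# query below can answer in milliseconds; here we just build a dict.
index = {doc["sku"]: part_classifier(doc) for doc in local_docs}

matches = [doc for doc in local_docs if index[doc["sku"]] == "fastener"]
print(matches)  # [{'sku': 'A1', 'description': 'wood screw 40mm'}]
```

The point of the index is that the (potentially expensive) model runs once per document at write time, not once per query.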

The other major new feature in this release is delta synchronization, which allows businesses to push far smaller updates to the databases on their employees’ mobile devices. That’s because they only have to receive the information that changed instead of a full updated database. Carter says this was a highly requested feature, but until now, the company always had to prioritize work on other components of Couchbase.
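A minimal sketch shows why delta synchronization saves so much bandwidth: the server computes which fields changed since the client's last revision and ships only those, and the client patches its local copy. This is a conceptual illustration, not Couchbase's actual delta format.

```python
# Minimal sketch of delta synchronization: send only the fields that changed
# since the client's last revision, instead of the whole document.

def make_delta(old, new):
    """Fields added or changed, plus a list of removed keys."""
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "unset": removed}

def apply_delta(doc, delta):
    doc = {**doc, **delta["set"]}
    for k in delta["unset"]:
        doc.pop(k, None)
    return doc

server_doc = {"sku": "A1", "price": 19.99, "stock": 4}
client_doc = {"sku": "A1", "price": 17.99, "stock": 4}  # stale price

delta = make_delta(client_doc, server_doc)
print(delta)  # {'set': {'price': 19.99}, 'unset': []}
print(apply_delta(client_doc, delta) == server_doc)  # True
```

For a large catalog document where one price changed, the delta is a few bytes instead of the full document.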

This is an especially useful feature for the company’s retail customers, a vertical where it has been quite successful. These users need to keep their catalogs up to date, and quite a few of them supply their employees with mobile devices to help shoppers. Rumor has it that Apple, too, is a Couchbase user.

The update also includes a few new features that will be more of interest to operators, including advanced stats reporting and enhanced logging support.

Apr
24
2019
--

Databricks open-sources Delta Lake to make data lakes more reliable

Databricks, the company founded by the original developers of the Apache Spark big data analytics engine, today announced that it has open-sourced Delta Lake, a storage layer that makes it easier to ensure data integrity as new data flows into an enterprise’s data lake by bringing ACID transactions to these vast data repositories.

Delta Lake, which has long been a proprietary part of Databricks’ offering, is already in production use by companies like Viacom, Edmunds, Riot Games and McGraw Hill.

The tool provides the ability to enforce specific schemas (which can be changed as necessary), to create snapshots and to ingest streaming data or backfill the lake as a batch job. Delta Lake also uses the Spark engine to handle the metadata of the data lake (which by itself is often a big data problem). Over time, Databricks also plans to add an audit trail, among other things.
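The core trick behind this kind of storage layer can be sketched in a few lines: every write appends an all-or-nothing commit to an ordered log, readers reconstruct a consistent snapshot from the log, and the schema is checked before anything is committed. This toy class illustrates the concept only; it is not Delta Lake's actual protocol or format.

```python
# Toy illustration of the transaction-log idea: atomic commits, schema
# enforcement on write, and snapshots reconstructed from the log so a
# half-finished write is never visible to readers.

class ToyDeltaTable:
    def __init__(self, schema):
        self.schema = set(schema)
        self.log = []  # ordered list of committed batches

    def append(self, rows):
        for row in rows:
            if set(row) != self.schema:      # schema enforcement
                raise ValueError(f"schema mismatch: {row}")
        self.log.append(list(rows))          # atomic: all rows or none

    def snapshot(self, version=None):
        """Rows as of a given log version (older snapshots stay readable)."""
        commits = self.log if version is None else self.log[:version]
        return [row for batch in commits for row in batch]

table = ToyDeltaTable(schema={"id", "views"})
table.append([{"id": 1, "views": 10}])
table.append([{"id": 2, "views": 3}])
print(len(table.snapshot(version=1)))  # 1 -- the older snapshot
print(len(table.snapshot()))           # 2 -- current state
```

A rejected batch leaves the log untouched, which is the ACID guarantee in miniature: readers never see a partially ingested batch.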

“Today nearly every company has a data lake they are trying to gain insights from, but data lakes have proven to lack data reliability. Delta Lake has eliminated these challenges for hundreds of enterprises. By making Delta Lake open source, developers will be able to easily build reliable data lakes and turn them into ‘Delta Lakes’,” said Ali Ghodsi, co-founder and CEO at Databricks.

What’s important to note here is that Delta Lake runs on top of existing data lakes and is compatible with the Apache Spark APIs.

The company is still looking at how the project will be governed in the future. “We are still exploring different models of open source project governance, but the GitHub model is well understood and presents a good trade-off between the ability to accept contributions and governance overhead,” Ghodsi said. “One thing we know for sure is we want to foster a vibrant community, as we see this as a critical piece of technology for increasing data reliability on data lakes. This is why we chose to go with a permissive open source license model: Apache License v2, same license that Apache Spark uses.”

To invite this community, Databricks plans to take outside contributions, just like the Spark project.

“We want Delta Lake technology to be used everywhere on-prem and in the cloud by small and large enterprises,” said Ghodsi. “This approach is the fastest way to build something that can become a standard by having the community provide direction and contribute to the development efforts.” That’s also why the company decided against the Commons Clause license that some open-source companies now use to prevent others (and especially large clouds) from using their open source tools in their own commercial SaaS offerings. “We believe the Commons Clause license is restrictive and will discourage adoption. Our primary goal with Delta Lake is to drive adoption on-prem as well as in the cloud.”

Apr
09
2019
--

Google Cloud challenges AWS with new open-source integrations

Google today announced that it has partnered with a number of top open-source data management and analytics companies to integrate their products into its Google Cloud Platform and offer them as managed services operated by its partners. The partners here are Confluent, DataStax, Elastic, InfluxData, MongoDB, Neo4j and Redis Labs.

The idea here, Google says, is to provide users with a seamless user experience and the ability to easily leverage these open-source technologies in Google’s cloud. But there is a lot more at play here, even though Google never quite says so. That’s because Google’s move here is clearly meant to contrast its approach to open-source ecosystems with Amazon’s. It’s no secret that Amazon’s AWS cloud computing platform has a reputation for taking some of the best open-source projects and then forking those and packaging them up under its own brand, often without giving back to the original project. There are some signs that this is changing, but a number of companies have recently taken action and changed their open-source licenses to explicitly prevent this from happening.

That’s where things get interesting, because those companies include Confluent, Elastic, MongoDB, Neo4j and Redis Labs — and those are all partnering with Google on this new project, though it’s worth noting that InfluxData is not taking this new licensing approach and that while DataStax uses lots of open-source technologies, its focus is very much on its enterprise edition.

“As you are aware, there has been a lot of debate in the industry about the best way of delivering these open-source technologies as services in the cloud,” Manvinder Singh, the head of infrastructure partnerships at Google Cloud, said in a press briefing. “Given Google’s DNA and the belief that we have in the open-source model, which is demonstrated by projects like Kubernetes, TensorFlow, Go and so forth, we believe the right way to solve this is to work closely together with companies that have invested their resources in developing these open-source technologies.”

So while AWS takes these projects and then makes them its own, Google has decided to partner with these companies. While Google and its partners declined to comment on the financial arrangements behind these deals, chances are we’re talking about some degree of profit-sharing here.

“Each of the major cloud players is trying to differentiate what it brings to the table for customers, and while we have a strong partnership with Microsoft and Amazon, it’s nice to see that Google has chosen to deepen its partnership with Atlas instead of launching an imitation service,” Sahir Azam, the senior VP of Cloud Products at MongoDB told me. “MongoDB and GCP have been working closely together for years, dating back to the development of Atlas on GCP in early 2017. Over the past two years running Atlas on GCP, our joint teams have developed a strong working relationship and support model for supporting our customers’ mission critical applications.”

As for the actual functionality, the core principle here is that Google will deeply integrate these services into its Cloud Console, similar to what Microsoft did with Databricks on Azure. These will be managed services, Google Cloud will handle the invoicing, and the charges will count toward a user’s Google Cloud spending commitments. Support will also run through Google, so users can use a single service to manage and log tickets across all of these services.

Redis Labs CEO and co-founder Ofer Bengal echoed this. “Through this partnership, Redis Labs and Google Cloud are bringing these innovations to enterprise customers, while giving them the choice of where to run their workloads in the cloud,” he said. “Customers now have the flexibility to develop applications with Redis Enterprise using the fully integrated managed services on GCP. This will include the ability to manage Redis Enterprise from the GCP console, provisioning, billing, support, and other deep integrations with GCP.”

Apr
02
2019
--

How to handle dark data compliance risk at your company

Slack and other consumer-grade productivity tools have been taking off in workplaces large and small — and data governance hasn’t caught up.

Whether it’s litigation, compliance with regulations like GDPR or concerns about data breaches, legal teams need to account for new types of employee communication. And that’s hard when work is happening across the latest messaging apps and SaaS products, which make data searchability and accessibility more complex.

Here’s a quick look at the problem, followed by our suggestions for best practices at your company.

Problems

The increasing frequency of reported data breaches and the expanding jurisdiction of new privacy laws are prompting conversations about dark data and risk at companies of all sizes, even small startups. Data risk discussions necessarily include the risk of a data breach, as well as the preservation of data. Just two weeks ago, it was reported that Jared Kushner used WhatsApp for official communications and took screenshots of those messages for preservation, which commentators say complies with record-keeping laws but raises questions about potential admissibility as evidence.

Mar
27
2019
--

Microsoft, Adobe and SAP prepare to expand their Open Data Initiative

At last year’s Microsoft Ignite conference, the CEOs of Microsoft, Adobe and SAP took the stage to announce the launch of the Open Data Initiative. The idea behind this effort was to make it easier for their customers to move data between each other’s services by standardizing on a common data format and helping them move their data out of their respective silos and into a single customer-chosen data lake. At this week’s Adobe Summit, the three companies announced how they plan to expand this program as they look to bring in additional partners.

“The intent of the companies joining forces was really to solve a common customer problem that we hear time and time again, which is that there is high-value business data that tends to be very siloed in a variety of different applications,” Alysa Taylor, Microsoft’s corporate vice president, Business Applications & Global Industry, told me. “Being able to extract that data, reason over that data, garner intelligence from that data, is very cost-prohibitive and it’s very manual and time-consuming.”

The core principle of the alliance is that the customers own their data and they should be able to get as much value out of it as they can. Ideally, having this common data schema means that the customer doesn’t have to figure out ways to transform the data from these vendors and can simply flow all of it into a single data lake that then in turn feeds the various analytics services, machine learning systems and other tools that these companies offer.
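What a common schema buys you can be shown in a small sketch: each vendor's records are mapped once into one shared shape, and everything downstream reads only that shape. The field names and record formats below are invented for illustration; they are not the actual Common Data Model fields.

```python
# Sketch of a common data schema: per-vendor records are normalized once,
# then flow into a single data lake that downstream analytics can read
# without per-vendor transformation logic. Field names are hypothetical.

COMMON_FIELDS = {"customer_id", "email", "last_contact"}

def from_crm(rec):  # vendor A's shape
    return {"customer_id": rec["CustId"], "email": rec["EMail"],
            "last_contact": rec["LastTouch"]}

def from_marketing(rec):  # vendor B's shape
    return {"customer_id": rec["cid"], "email": rec["contact_email"],
            "last_contact": rec["last_seen"]}

lake = [
    from_crm({"CustId": 7, "EMail": "a@x.com", "LastTouch": "2019-03-01"}),
    from_marketing({"cid": 9, "contact_email": "b@x.com",
                    "last_seen": "2019-03-12"}),
]

# Every record in the lake now has the same shape:
assert all(set(r) == COMMON_FIELDS for r in lake)
print([r["customer_id"] for r in lake])  # [7, 9]
```

The mapping functions are written once per vendor; the analytics, ML and reporting tools only ever see `COMMON_FIELDS`.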

At the Adobe Summit today, the three companies showed their first customer use case based on how Unilever is making use of this common data standard. More importantly, though, they also stressed that the Open Data Initiative is indeed open to others. As a first step, the three companies today announced the formation of a partner advisory council.

“What this basically means is that we’ve extended it out to key participants in the ecosystem to come and join us as part of this ODI effort,” Adobe’s VP of Ecosystem Development Amit Ahuja told me. “What we’re starting with is really a focus around two big groups of partners. Number one is, who are the other really interesting ISVs who have a lot of this core data that we want to make sure we can bring into this kind of single unified view. And the second piece is who are the major players out there that are trying to help these customers around their enterprise architecture.”

The first 12 partners that are joining this new council include Accenture, Amadeus, Capgemini, Change Healthcare, Cognizant, EY, Finastra, Genesys, Hootsuite, Inmobi, Sprinklr and WPP. This is very much a first step, though. Over time, the group expects to expand far beyond this first set of partners and include a much larger group of stakeholders.

“We really want to make this really broad in a way that we can quickly make progress and demonstrate that what we’re talking about from a conceptual process has really hard customer benefits attached to it,” Abhay Kumar, SAP’s global vice president, Global Business Development & Ecosystem, noted. The use cases the alliance has identified focus on market intelligence, sales intelligence and services intelligence, he added.

Today, as enterprises often pull in data from dozens of disparate systems, making sense of all that information is hard enough, but to even get to this point, enterprises first have to transform it and make it usable. To do so, they then have to deploy another set of applications that massages the data. “I don’t want to go and buy another 15 or 20 applications to make that work,” Ahuja said. “I want to realize the investment and the ROI of the applications that I’ve already bought.”

All three stressed that this is very much a collaborative effort that spans the engineering, sales and product marketing groups.

Feb
20
2019
--

Why Daimler moved its big data platform to the cloud

Like virtually every big enterprise company, a few years ago, the German auto giant Daimler decided to invest in its own on-premises data centers. And while those aren’t going away anytime soon, the company today announced that it has successfully moved its on-premises big data platform to Microsoft’s Azure cloud. This new platform, which the company calls eXtollo, is Daimler’s first major service to run outside of its own data centers, though it’ll probably not be the last.

As Daimler’s head of its corporate center of excellence for advanced analytics and big data Guido Vetter told me, the company started getting interested in big data about five years ago. “We invested in technology — the classical way, on-premise — and got a couple of people on it. And we were investigating what we could do with data because data is transforming our whole business as well,” he said.

By 2016, the size of the organization had grown to the point where a more formal structure was needed to enable the company to handle its data at a global scale. At the time, the buzz phrase was “data lakes” and the company started building its own in order to build out its analytics capacities.

Electric lineup, Daimler AG

“Sooner or later, we hit the limits as it’s not our core business to run these big environments,” Vetter said. “Flexibility and scalability are what you need for AI and advanced analytics and our whole operations are not set up for that. Our backend operations are set up for keeping a plant running and keeping everything safe and secure.” But in this new world of enterprise IT, companies need to be able to be flexible and experiment — and, if necessary, throw out failed experiments quickly.

So about a year and a half ago, Vetter’s team started the eXtollo project to bring all the company’s activities around advanced analytics, big data and artificial intelligence into the Azure Cloud, and just over two weeks ago, the team shut down its last on-premises servers after slowly turning on its solutions in Microsoft’s data centers in Europe, the U.S. and Asia. All in all, the actual transition between the on-premises data centers and the Azure cloud took about nine months. That may not seem fast, but for an enterprise project like this, that’s about as fast as it gets (and for a while, it fed all new data into both its on-premises data lake and Azure).

If you work for a startup, then all of this probably doesn’t seem like a big deal, but for a more traditional enterprise like Daimler, even just giving up control over the physical hardware where your data resides was a major culture change and something that took quite a bit of convincing. In the end, the solution came down to encryption.

“We needed the means to secure the data in the Microsoft data center with our own means that ensure that only we have access to the raw data and work with the data,” explained Vetter. In the end, the company decided to use the Azure Key Vault to manage and rotate its encryption keys. Indeed, Vetter noted that knowing that the company had full control over its own data was what allowed this project to move forward.
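The reason key management, rather than bulk re-encryption, is the crux here can be sketched with envelope encryption: data is encrypted under per-object data keys, and only those small data keys are wrapped by the master key held in the vault. Rotating the master key then means re-wrapping a few keys, never touching the bulk data. The XOR "cipher" below is a deliberately toy stand-in for a real cipher, and the whole sketch is a conceptual illustration, not Azure Key Vault's API.

```python
# Toy envelope-encryption sketch: rotate the master key by re-wrapping the
# data key; the bulk ciphertext is never re-encrypted. XOR stands in for a
# real cipher purely for illustration.
import os

def xor(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

master_key = os.urandom(16)              # held in the key vault
data_key = os.urandom(16)                # per-object key
ciphertext = xor(b"vehicle telemetry", data_key)
wrapped_key = xor(data_key, master_key)  # stored alongside the data

# Rotation: unwrap with the old master key, re-wrap with the new one.
new_master = os.urandom(16)
wrapped_key = xor(xor(wrapped_key, master_key), new_master)

# The bulk ciphertext never changed, and decryption still works:
recovered = xor(ciphertext, xor(wrapped_key, new_master))
print(recovered)  # b'vehicle telemetry'
```

This split is also what keeps the cloud provider out of the raw data: without the vault-held master key, the stored wrapped keys and ciphertext are useless.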

Vetter tells me the company obviously looked at Microsoft’s competitors as well, but he noted that his team didn’t find a compelling offer from other vendors in terms of functionality and the security features that it needed.

Today, Daimler’s big data unit uses tools like HDInsight and Azure Databricks, which cover more than 90 percent of the company’s current use cases. In the future, Vetter also wants to make it easier for less experienced users to use self-service tools to launch AI and analytics services.

While cost is often a factor that counts against the cloud, because renting server capacity isn’t cheap, Vetter argues that this move will actually save the company money and that storage costs, especially, are going to be cheaper in the cloud than in its on-premises data center (and chances are that Daimler, given its size and prestige as a customer, isn’t exactly paying the same rack rate that others are paying for the Azure services).

As with so many big data AI projects, predictions are the focus of much of what Daimler is doing. That may mean looking at a car’s data and error code and helping the technician diagnose an issue or doing predictive maintenance on a commercial vehicle. Interestingly, the company isn’t currently bringing to the cloud any of its own IoT data from its plants. That’s all managed in the company’s on-premises data centers because it wants to avoid the risk of having to shut down a plant because its tools lost the connection to a data center, for example.

Feb
19
2019
--

Redis Labs raises a $60M Series E round

Redis Labs, a startup that offers commercial services around the Redis in-memory data store (and which counts Redis creator and lead developer Salvatore Sanfilippo among its employees), today announced that it has raised a $60 million Series E funding round led by private equity firm Francisco Partners.

The firm didn’t participate in any of Redis Labs’ previous rounds, but existing investors Goldman Sachs Private Capital Investing, Bain Capital Ventures, Viola Ventures and Dell Technologies Capital all participated in this round.

In total, Redis Labs has now raised $146 million and the company plans to use the new funding to accelerate its go-to-market strategy and continue to invest in the Redis community and product development.

Current Redis Labs users include the likes of American Express, Staples, Microsoft, Mastercard and Atlassian. In total, the company now has more than 8,500 customers. Because it’s pretty flexible, these customers use the service as a database, cache and message broker, depending on their needs. The company’s flagship product is Redis Enterprise, which extends the open-source Redis platform with additional tools and services for enterprises. The company offers managed cloud services, which give businesses the choice between hosting on public clouds like AWS, GCP and Azure, as well as their private clouds, in addition to traditional software downloads and licenses for self-managed installs.

Redis Labs CEO Ofer Bengal told me the company isn’t cash-flow positive yet. He also noted that the company didn’t need to raise this round but that he decided to do so in order to accelerate growth. “In this competitive environment, you have to spend a lot and push hard on product development,” he said.

He stressed that Francisco Partners has a reputation for taking companies forward, and that the logical next step for Redis Labs would be an IPO. “We think that we have a very unique opportunity to build a very large company that deserves an IPO,” he said.

Part of this new competitive environment also involves competitors that use other companies’ open-source projects to build their own products without contributing back. Redis Labs was one of the first of a number of open-source companies that decided to offer its newest releases under a new license that still allows developers to modify the code but that forces competitors that want to essentially resell it to buy a commercial license. Bengal specifically noted AWS in this context. It’s worth noting that this isn’t about the Redis database itself but about the additional modules that Redis Labs built. Redis Enterprise itself is closed-source.

“When we came out with this new license, there were many different views,” he acknowledged. “Some people condemned that. But after the initial noise calmed down — and especially after some other companies came out with a similar concept — the community now understands that the original concept of open source has to be fixed because it isn’t suitable anymore to the modern era where cloud companies use their monopoly power to adopt any successful open source project without contributing anything to it.”

Feb
07
2019
--

Microsoft Azure sets its sights on more analytics workloads

Enterprises now amass huge amounts of data, both from their own tools and applications, as well as from the SaaS applications they use. For a long time, that data was basically exhaust. Maybe it was stored for a while to fulfill some legal requirements, but then it was discarded. Now, data is what drives machine learning models, and the more data you have, the better. It’s maybe no surprise, then, that the big cloud vendors started investing in data warehouses and lakes early on. But that’s just a first step. After that, you also need the analytics tools to make all of this data useful.

Today, it’s Microsoft’s turn to shine the spotlight on its data analytics services. The actual news here is pretty straightforward. Two of these services are moving into general availability: the second generation of Azure Data Lake Storage for big data analytics workloads and Azure Data Explorer, a managed service that makes ad-hoc analysis of massive data volumes easier. Microsoft is also previewing a new feature in Azure Data Factory, its graphical no-code service for building data transformations. Data Factory now features the ability to map data flows.

Those individual news pieces are interesting if you are a user or are considering Azure for your big data workloads, but what’s maybe more important here is that Microsoft is trying to offer a comprehensive set of tools for managing and storing this data — and then using it for building analytics and AI services.


“AI is a top priority for every company around the globe,” Julia White, Microsoft’s corporate VP for Azure, told me. “And as we are working with our customers on AI, it becomes clear that their analytics often aren’t good enough for building an AI platform.” These companies are generating plenty of data, which then has to be pulled into analytics systems. She stressed that she couldn’t remember a customer conversation in recent months that didn’t focus on AI. “There is urgency to get to the AI dream,” White said, but the growth and variety of data presents a major challenge for many enterprises. “They thought this was a technology that was separate from their core systems. Now it’s expected for both customer-facing and line-of-business applications.”

Data Lake Storage helps with managing this variety of data since it can handle both structured and unstructured data (and is optimized for the Spark and Hadoop analytics engines). The service can ingest any kind of data — yet Microsoft still promises that it will be very fast. “The world of analytics tended to be defined by having to decide upfront and then building rigid structures around it to get the performance you wanted,” explained White. Data Lake Storage, on the other hand, wants to offer the best of both worlds.

Likewise, White argued that while many enterprises used to keep these services on their on-premises servers, many of those deployments are still appliance-based. But she believes the cloud has now reached the point where the price/performance calculations are in its favor. It took a while to get to this point, though, and to convince enterprises. White noted that for the longest time, enterprises thought of their analytics efforts as $300 million projects that took forever, tied up lots of people and were frankly a bit scary. “But also, what we had to offer in the cloud hasn’t been amazing until some of the recent work,” she said. “We’ve been on a journey — as well as the other cloud vendors — and the price performance is now compelling.” And it sure helps that if enterprises want to meet their AI goals, they’ll now have to tackle these workloads, too.

Jan
24
2019
--

Microsoft acquires Citus Data

Microsoft today announced that it has acquired Citus Data, a company that focused on making PostgreSQL databases faster and more scalable. Citus’ open-source PostgreSQL extension essentially turns the application into a distributed database and, while there has been a lot of hype around the NoSQL movement and document stores, relational databases — and especially PostgreSQL — are still a growing market, in part because of tools from companies like Citus that overcome some of their earlier limitations.
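The core idea behind an extension like this can be sketched in a few lines: pick a distribution column, hash it, and route each row to one of N shards so a single logical table spans many nodes and lookups by that column touch only one shard. This is an illustration of the concept, not Citus code or its actual placement algorithm.

```python
# Sketch of hash-based sharding, the idea that lets a relational table be
# spread across many nodes: each row is routed by hashing a chosen
# distribution column. Illustrative only, not how Citus itself is built.
import hashlib

NUM_SHARDS = 4

def shard_for(value, num_shards=NUM_SHARDS):
    """Deterministically map a distribution-column value to a shard."""
    digest = hashlib.sha256(str(value).encode()).hexdigest()
    return int(digest, 16) % num_shards

shards = {i: [] for i in range(NUM_SHARDS)}
rows = [{"tenant_id": t, "event": "click"} for t in (1, 2, 3, 42)]
for row in rows:
    shards[shard_for(row["tenant_id"])].append(row)

# Every row lands on exactly one shard, so a query filtered on tenant_id
# only needs to visit that shard's node.
total = sum(len(batch) for batch in shards.values())
print(total)  # 4
```

Because the mapping is deterministic, both the query router and the writer agree on where a given `tenant_id` lives without any coordination.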

Unsurprisingly, Microsoft plans to work with the Citus Data team to “accelerate the delivery of key, enterprise-ready features from Azure to PostgreSQL and enable critical PostgreSQL workloads to run on Azure with confidence.” The Citus co-founders echo this in their own statement, noting that “as part of Microsoft, we will stay focused on building an amazing database on top of PostgreSQL that gives our users the game-changing scale, performance, and resilience they need. We will continue to drive innovation in this space.”

PostgreSQL is obviously an open-source tool, and while the fact that Microsoft is now a major open-source contributor doesn’t come as a surprise anymore, it’s worth noting that the company stresses that it will continue to work with the PostgreSQL community. In an email, a Microsoft spokesperson also noted that “the acquisition is a proof point in the company’s commitment to open source and accelerating Azure PostgreSQL performance and scale.”

Current Citus customers include the likes of real-time analytics service Chartbeat, email security service Agari and PushOwl, though the company notes that it also counts a number of Fortune 100 companies among its users (they tend to stay anonymous). The company offers a database-as-a-service, an on-premises enterprise version and a free open-source edition. For the time being, it seems like that’s not changing, though over time I would suspect that Microsoft will transition users of the hosted service to Azure.

The price of the acquisition was not disclosed. Citus Data, which was founded in 2010 and graduated from the Y Combinator program, previously raised more than $13 million from the likes of Khosla Ventures, SV Angel and Data Collective.

Sep
20
2018
--

MariaDB acquires Clustrix

MariaDB, the company behind the eponymous MySQL drop-in replacement database, today announced that it has acquired Clustrix, which itself is a MySQL drop-in replacement database, but with a focus on scalability. MariaDB will integrate Clustrix’s technology into its own database, which will allow it to offer its users a more scalable database service in the long run.

That by itself would be an interesting development for the popular open source database company. But there’s another angle to this story, too. In addition to the acquisition, MariaDB also today announced that cloud computing company ServiceNow is investing in MariaDB, an investment that helped it get to today’s acquisition. ServiceNow doesn’t typically make investments, though it has made a few acquisitions. It is a very large MariaDB user, though, and it’s exactly the kind of customer that will benefit from the Clustrix acquisition.

MariaDB CEO Michael Howard tells me that ServiceNow currently supports about 80,000 instances of MariaDB. With this investment (which is actually an add-on to MariaDB’s 2017 Series C round), ServiceNow’s SVP of Development and Operations Pat Casey will join MariaDB’s board.

Why would MariaDB acquire a company like Clustrix, though? When I asked Howard about the motivation, he noted that he’s now seeing more companies like ServiceNow that are looking at a more scalable way to run MariaDB. Howard noted that it would take years to build a new database engine from the ground up.

“You can hire a lot of smart people individually, but not necessarily have that experience built into their profile,” he said. “So that was important, and then to have a jumpstart in relation to this market opportunity — this mandate from our market. It typically takes about nine years to get a brand new, thorough database technology off the ground. It’s not like a SaaS application where you can get a front-end going in about a year or so.”

Howard also stressed that the fact that the teams at Clustrix and MariaDB share the same vocabulary, given that they both work on similar problems and aim to be compatible with MySQL, made this a good fit.

While integrating the Clustrix database technology into MariaDB won’t be trivial, Howard stressed that the database was always built to accommodate external database storage engines. MariaDB will have to make some changes to its APIs to be ready for the clustering features of Clustrix. “It’s not going to be a 1-2-3 effort,” he said. “It’s going to be a heavy-duty effort for us to do this right. But everyone on the team wants to do it because it’s good for the company and our customers.”

MariaDB did not disclose the price of the acquisition. Clustrix, which was founded in 2006 and incubated by Y Combinator, had raised just under $72 million. MariaDB has raised just under $100 million so far, so it’s probably a fair guess that Clustrix didn’t sell for a large multiple of that.
