Sep
22
2020
--

Microsoft brings data services to its Arc multi-cloud management service

Microsoft today launched a major update to its Arc multi-cloud service that allows Azure customers to run and manage workloads across clouds — including those of Microsoft’s competitors — and their on-premises data centers. First announced at Microsoft Ignite in 2019, Arc was always meant to not just help users manage their servers but also allow them to run data services like Azure SQL and Azure Database for PostgreSQL, close to where their data sits.

Today, the company is making good on this promise with the preview launch of Azure Arc-enabled data services with support for, as expected, Azure SQL and Azure Database for PostgreSQL.

In addition, Microsoft is making the core feature of Arc, Arc-enabled servers, generally available. These tools let enterprises use the standard Azure Portal to manage and monitor their Windows and Linux servers across their multi-cloud and edge environments.

Image Credits: Microsoft

“We’ve always known that enterprises are looking to unlock the agility of the cloud (they love the app model, they love the business model) while balancing a need to maintain certain applications and workloads on premises,” Rohan Kumar, Microsoft’s corporate VP for Azure Data, said. “A lot of customers actually have a multi-cloud strategy. In some cases, they need to keep the data specifically for regulatory compliance. And in many cases, they want to maximize their existing investments. They’ve spent a lot of CapEx.”

As Kumar stressed, Microsoft wants to meet customers where they are, without forcing them to adopt a container architecture, for example, or replace their specialized engineered appliances to use Arc.

“Hybrid is really [about] providing that flexible choice to our customers, meeting them where they are, and not prescribing a solution,” he said.

He admitted that this approach makes engineering the solution more difficult, but the team decided the baseline should be a container endpoint and nothing more. And for the most part, Microsoft packaged up the tools its own engineers were already using to run Azure services on the company’s own infrastructure to manage these services in a multi-cloud environment.

“In hindsight, it was a little challenging at the beginning, because, you can imagine, when we initially built them, we didn’t imagine that we’ll be packaging them like this. But it’s a very modern design point,” Kumar said. But the result is that supporting customers is now relatively easy because it’s so similar to what the team does in Azure, too.

Kumar also noted that one selling point of these Azure data services is that the version of Azure SQL is essentially evergreen, so customers can stop worrying about SQL Server licensing and end-of-life support questions.

Sep
15
2020
--

Data virtualization service Varada raises $12M

Varada, a Tel Aviv-based startup that focuses on making it easier for businesses to query data across services, today announced that it has raised a $12 million Series A round led by Israeli early-stage fund MizMaa Ventures, with participation by Gefen Capital.

“If you look at the storage aspect for big data, there’s always innovation, but we can put a lot of data in one place,” Varada CEO and co-founder Eran Vanounou told me. “But translating data into insight? It’s so hard. It’s costly. It’s slow. It’s complicated.”

That’s a lesson he learned during his time as CTO of LivePerson, which he described as a classic big data company. And just like at LivePerson, where the team had to reinvent the wheel to solve its data problems, again and again, every company — and not just the large enterprises — now struggles with managing their data and getting insights out of it, Vanounou argued.

Image Credits: Varada

The rest of the founding team, David Krakov, Roman Vainbrand and Tal Ben-Moshe, already had a lot of experience in dealing with these problems, too, with Ben-Moshe having served as the chief software architect of Dell EMC’s XtremIO flash array unit, for example. They built the system for indexing big data that’s at the core of Varada’s platform (with the open-source Presto SQL query engine being one of the other cornerstones).

Essentially, Varada embraces the idea of data lakes and enriches that with its indexing capabilities. And those indexing capabilities are where Varada’s smarts can be found. As Vanounou explained, the company uses a machine learning system to understand when users tend to run certain workloads, and then caches the data ahead of time, making the system far faster than its competitors.

“If you think about big organizations and think about the workloads and the queries, what happens during the morning time is different from evening time. What happened yesterday is not what happened today. What happened on a rainy day is not what happened on a shiny day. […] We listen to what’s going on and we optimize. We leverage the indexing technology. We index what is needed when it is needed.”
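The idea of learning when workloads run and caching ahead of time can be sketched in a few lines of Python. This is only an illustration: Varada describes a machine learning system, while the sketch below swaps in simple frequency counting per hour of day, and the function names, the log format and the dataset names are all hypothetical.

```python
from collections import Counter, defaultdict

def build_hourly_profile(query_log):
    """Count how often each dataset is queried in each hour of the day.

    query_log is an iterable of (hour, dataset_name) pairs (assumed format).
    """
    profile = defaultdict(Counter)
    for hour, dataset in query_log:
        profile[hour][dataset] += 1
    return profile

def datasets_to_prewarm(profile, upcoming_hour, top_k=2):
    """Return the top_k datasets most often queried in the upcoming hour."""
    counts = profile.get(upcoming_hour, Counter())
    return [name for name, _ in counts.most_common(top_k)]

# Hypothetical query history: mornings are dominated by sales queries.
log = [
    (9, "sales"), (9, "sales"), (9, "inventory"), (9, "sales"),
    (9, "inventory"), (9, "hr"), (18, "billing"),
]
profile = build_hourly_profile(log)
morning = datasets_to_prewarm(profile, upcoming_hour=9)
evening = datasets_to_prewarm(profile, upcoming_hour=18)
```

A real system would presumably weigh recency, cost and cache size as well; the sketch only shows the "index what is needed when it is needed" idea in its simplest form.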

That helps speed up queries, but it also means less data has to be replicated, which also brings down the cost. As MizMaa’s Aaron Applbaum noted, since Varada is not a SaaS solution, the buyers still get all of the discounts from their cloud providers, too.

In addition, the system can allocate resources intelligently so that different users can tap into different amounts of bandwidth. You can tell it to give customers more bandwidth than your financial analysts, for example.

“Data is growing like crazy: in volume, in scale, in complexity, in who requires it and what the business intelligence uses are, what the API uses are,” Applbaum said when I asked him why he decided to invest. “And compute is getting slightly cheaper, but not really, and storage is getting cheaper. So if you can make the trade-off to store more stuff, and access things more intelligently, more quickly, more agile — that was the basis of our thesis, as long as you can do it without compromising performance.”

Varada, with its team of experienced executives, architects and engineers, ticked a lot of MizMaa’s boxes in this regard, but Applbaum also noted that unlike some other Israeli startups, the team understood that it had to listen to customers and understand their needs, too.

“In Israel, you have a history — and it’s become less and less the case — but historically, there’s a joke that it’s ‘ready, fire, aim.’ You build a technology, you’ve got this beautiful thing and you’re like, ‘alright, we did it,’ but without listening to the needs of the customer,” he explained.

The Varada team is not afraid to compare itself to Snowflake, which at least at first glance seems to make similar promises. Vanounou praised the company for opening up the data warehousing market and proving that people are willing to pay for good analytics. But he argues that Varada’s approach is fundamentally different.

“We embrace the data lake. So if you are Mr. Customer, your data is your data. We’re not going to take it, move it, copy it. This is your single source of truth,” he said. And in addition, the data can stay in the company’s virtual private cloud. He also argues that Varada isn’t so much focused on the business users but the technologists inside a company.

Jul
14
2020
--

Google Cloud’s new BigQuery Omni will let developers query data in GCP, AWS and Azure

At its virtual Cloud Next ’20 event, Google today announced a number of updates to its cloud portfolio, but the private alpha launch of BigQuery Omni is probably the highlight of this year’s event. Powered by Google Cloud’s Anthos hybrid-cloud platform, BigQuery Omni allows developers to use the BigQuery engine to analyze data that sits in multiple clouds, including those of Google Cloud competitors like AWS and Microsoft Azure — though for now, the service only supports AWS, with Azure support coming later.

Using a unified interface, developers can analyze this data locally without having to move data sets between platforms.

“Our customers store petabytes of information in BigQuery, with the knowledge that it is safe and that it’s protected,” said Debanjan Saha, the GM and VP of Engineering for Data Analytics at Google Cloud, in a press conference ahead of today’s announcement. “A lot of our customers do many different types of analytics in BigQuery. For example, they use the built-in machine learning capabilities to run real-time analytics and predictive analytics. […] A lot of our customers who are very excited about using BigQuery in GCP are also asking, ‘how can they extend the use of BigQuery to other clouds?’ ”

Image Credits: Google

Google has long said that it believes that multi-cloud is the future — something that most of its competitors would probably agree with, though they all would obviously like you to use their tools, even if the data sits in other clouds or is generated off-platform. It’s the tools and services that help businesses to make use of all of this data, after all, where the different vendors can differentiate themselves from each other. Maybe it’s no surprise then, given Google Cloud’s expertise in data analytics, that BigQuery is now joining the multi-cloud fray.

“With BigQuery Omni customers get what they wanted,” Saha said. “They wanted to analyze their data no matter where the data sits and they get it today with BigQuery Omni.”

He noted that Google Cloud believes that this will help enterprises break down their data silos and gain new insights into their data, all while allowing developers and analysts to use a standard SQL interface.

Today’s announcement is also a good example of how Google’s bet on Anthos is paying off by making it easier for the company to not just allow its customers to manage their multi-cloud deployments but also to extend the reach of its own products across clouds. This also explains why BigQuery Omni isn’t available for Azure yet, given that Anthos for Azure is still in preview, while AWS support became generally available in April.

May
19
2020
--

Microsoft launches Azure Synapse Link to help enterprises get faster insights from their data

At its Build developer conference, Microsoft today announced Azure Synapse Link, a new enterprise service that allows businesses to analyze their data faster and more efficiently, using an approach that’s generally called “hybrid transaction/analytical processing” (HTAP). That’s a mouthful; it essentially enables enterprises to run analytical and transactional workloads on a single database system. Traditionally, enterprises had to choose between building a single system for both, which was often highly over-provisioned, and maintaining separate systems for transactional and analytics workloads.

Last year, at its Ignite conference, Microsoft announced Azure Synapse Analytics, an analytics service that combines analytics and data warehousing to create what the company calls “the next evolution of Azure SQL Data Warehouse.” Synapse Analytics brings together data from Microsoft’s services and those from its partners and makes it easier to analyze.

“One of the key things, as we work with our customers on their digital transformation journey, there is an aspect of being data-driven, of being insights-driven as a culture, and a key part of that really is that once you decide there is some amount of information or insights that you need, how quickly are you able to get to that? For us, time to insight and a secondary element, which is the cost it takes, the effort it takes to build these pipelines and maintain them with an end-to-end analytics solution, was a key metric we have been observing for multiple years from our largest enterprise customers,” said Rohan Kumar, Microsoft’s corporate VP for Azure Data.

Synapse Link takes the work Microsoft did on Synapse Analytics a step further by removing the barriers between Azure’s operational databases and Synapse Analytics, so enterprises can immediately get value from the data in those databases without going through a data warehouse first.

“What we are announcing with Synapse Link is the next major step in the same vision that we had around reducing the time to insight,” explained Kumar. “And in this particular case, a long-standing barrier that exists today between operational databases and analytics systems is these complex ETL (extract, transform, load) pipelines that need to be set up just so you can do basic operational reporting or where, in a very transactionally consistent way, you need to move data from your operational system to the analytics system, because you don’t want to impact the performance of the operational system in any way because that’s typically dealing with, depending on the system, millions of transactions per second.”

ETL pipelines, Kumar argued, are typically expensive and hard to build and maintain, yet enterprises are now building new apps — and maybe even line of business mobile apps — where any action that consumers take and that is registered in the operational database is immediately available for predictive analytics, for example.

From the user perspective, enabling this only takes a single click to link the two, while it removes the need for managing additional data pipelines or database resources. That, Kumar said, was always the main goal for Synapse Link. “With a single click, you should be able to enable real-time analytics on your operational data in ways that don’t have any impact on your operational systems, so you’re not using the compute part of your operational system to do the query, you actually have to transform the data into a columnar format, which is more adaptable for analytics, and that’s really what we achieved with Synapse Link.”

Because traditional HTAP systems on-premises typically share their compute resources with the operational database, those systems never quite took off, Kumar argued. In the cloud, with Synapse Link, though, that impact doesn’t exist because you’re dealing with two separate systems. Now, once a transaction gets committed to the operational database, the Synapse Link system transforms the data into a columnar format that is more optimized for the analytics system — and it does so in real time.
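The row-to-columnar transformation Kumar describes can be illustrated with a minimal Python sketch. It is not Synapse Link’s implementation; the `to_columnar` helper and the order records are hypothetical, chosen only to show why a column layout suits analytical scans: an aggregate touches one contiguous list instead of every field of every record.

```python
from typing import Any

def to_columnar(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list per column.

    Fields missing from a record are filled with None so that every
    column has the same length as the input.
    """
    columns: dict[str, list[Any]] = {}
    for i, row in enumerate(rows):
        # Back-fill any column first seen in this row.
        for key in row:
            if key not in columns:
                columns[key] = [None] * i
        # Append this row's value (or None) to every known column.
        for key, values in columns.items():
            values.append(row.get(key))
    return columns

# Hypothetical operational records, as a transactional store might hold them.
orders = [
    {"order_id": 1, "customer": "a", "amount": 20.0},
    {"order_id": 2, "customer": "b", "amount": 35.5},
    {"order_id": 3, "customer": "a", "amount": 12.25},
]

columnar = to_columnar(orders)
# An analytical aggregate now reads a single column.
total = sum(columnar["amount"])
```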

For now, Synapse Link is only available in conjunction with Microsoft’s Cosmos DB database. As Kumar told me, that’s because that’s where the company saw the highest demand for this kind of service, but you can expect the company to add support for Azure SQL, Azure Database for PostgreSQL and Azure Database for MySQL in the future.

Apr
22
2020
--

Fishtown Analytics raises $12.9M Series A for its open-source analytics engineering tool

Philadelphia-based Fishtown Analytics, the company behind the popular open-source data engineering tool dbt, today announced that it has raised a $12.9 million Series A round led by Andreessen Horowitz, with the firm’s general partner Martin Casado joining the company’s board.

“I wrote this blog post in early 2016, essentially saying that analysts needed to work in a fundamentally different way,” Fishtown founder and CEO Tristan Handy told me, when I asked him about how the product came to be. “They needed to work in a way that much more closely mirrored the way the software engineers work and software engineers have been figuring this shit out for years and data analysts are still like sending each other Microsoft Excel docs over email.”

The dbt open-source project forms the basis of this. It allows anyone who can write SQL queries to transform data and then load it into their preferred analytics tools. As such, it sits in between data warehouses and the tools that load data into them on one end, and specialized analytics tools on the other.

As Casado noted when I talked to him about the investment, data warehouses have now made it affordable for businesses to store all of their data before it is transformed. So what was traditionally “extract, transform, load” (ETL) has now become “extract, load, transform” (ELT). Andreessen Horowitz is already invested in Fivetran, which helps businesses move their data into their warehouses, so it makes sense for the firm to also tackle the other side of this business.
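The shift from ETL to ELT can be sketched with Python’s built-in `sqlite3` module standing in for a cloud data warehouse. This is an illustration, not dbt itself: the `run_elt` helper, the table names and the columns are hypothetical. The raw rows are loaded untouched, and the transformation is then a plain SQL `SELECT` materialized inside the database, which is the kind of step a tool like dbt manages.

```python
import sqlite3

def run_elt(raw_events):
    """Load raw rows first, then transform them with SQL inside the database."""
    conn = sqlite3.connect(":memory:")

    # Extract + Load: store the data as-is, with no cleanup beforehand.
    conn.execute("CREATE TABLE raw_events (user_id TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?)", raw_events)

    # Transform: a SQL model materialized as a new table in the warehouse.
    conn.execute(
        """
        CREATE TABLE revenue_by_user AS
        SELECT user_id, SUM(amount_cents) / 100.0 AS revenue
        FROM raw_events
        GROUP BY user_id
        """
    )

    rows = conn.execute(
        "SELECT user_id, revenue FROM revenue_by_user ORDER BY user_id"
    ).fetchall()
    conn.close()
    return rows

result = run_elt([("u1", 1250), ("u2", 300), ("u1", 750)])
```

Because the raw table is kept, the transformation can be rewritten and rerun later without extracting the data again.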

“Dbt is, as far as we can tell, the leading community for transformation and it’s a company we’ve been tracking for at least a year,” Casado said. He also argued that data analysts — unlike data scientists — are not really catered to as a group.

Before this round, Fishtown hadn’t raised much money beyond a small SAFE round from Amplify, even though it has been around for a few years now.

But Handy argued that the company needed this time to prove that it was on to something and build a community. That community now consists of more than 1,700 companies that use the dbt project in some form and over 5,000 people in the dbt Slack community. Fishtown also now has over 250 dbt Cloud customers and the company signed up a number of big enterprise clients earlier this year. With that, the company needed to raise money to expand and also better service its current list of customers.

“We live in Philadelphia. The cost of living is low here and none of us really care to make a quadro-billion dollars, but we do want to answer the question of how do we best serve the community,” Handy said. “And for the first time, in the early part of the year, we were like, holy shit, we can’t keep up with all of the stuff that people need from us.”

The company plans to expand the team from 25 to 50 employees in 2020 and, with those new hires, to improve and expand the product, especially its IDE for data analysts, which Handy admitted could use a bit more polish.

Nov
26
2019
--

New Amazon capabilities put machine learning in reach of more developers

Today, Amazon announced a new approach that it says will put machine learning technology in reach of more developers and line-of-business users. Amazon has been making a flurry of announcements ahead of its re:Invent customer conference next week in Las Vegas.

While the company offers plenty of tools for data scientists to build machine learning models and to process, store and visualize data, it wants to put that capability directly in the hands of developers with the help of the popular database query language, SQL.

By taking advantage of tools like Amazon QuickSight, Aurora and Athena in combination with SQL queries, developers can have much more direct access to machine learning models and underlying data without any additional coding, says VP of artificial intelligence at AWS, Matt Wood.

“This announcement is all about making it easier for developers to add machine learning predictions to their products and their processes by integrating those predictions directly with their databases,” Wood told TechCrunch.

For starters, Wood says developers can take advantage of Aurora, the company’s MySQL (and Postgres)-compatible database to build a simple SQL query into an application, which will automatically pull the data into the application and run whatever machine learning model the developer associates with it.

The second piece involves Athena, the company’s serverless query service. As with Aurora, developers can write a SQL query — in this case, against any data store — and based on a machine learning model they choose, return a set of data for use in an application.

The final piece is QuickSight, which is Amazon’s data visualization tool. Using one of the other tools to return some set of data, developers can use that data to create visualizations based on it inside whatever application they are creating.

“By making sophisticated ML predictions more easily available through SQL queries and dashboards, the changes we’re announcing today help to make ML more usable and accessible to database developers and business analysts. Now anyone who can write SQL can make — and importantly use — predictions in their applications without any custom code,” Amazon’s Matt Asay wrote in a blog post announcing these new capabilities.

Asay added that this approach is far easier than what developers had to do in the past to achieve this. “There is often a large amount of fiddly, manual work required to take these predictions and make them part of a broader application, process or analytics dashboard,” he wrote.

As an example, Wood offers a lead-scoring model you might use to pick the most likely sales targets to convert. “Today, in order to do lead scoring you have to go off and wire up all these pieces together in order to be able to get the predictions into the application,” he said. With this new capability, you can get there much faster.

“Now, as a developer I can just say that I have this lead scoring model which is deployed in SageMaker, and all I have to do is write literally one SQL statement that I do all day long into Aurora, and I can start getting back that lead scoring information. And then I just display it in my application and away I go,” Wood explained.
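Wood’s “one SQL statement” idea can be approximated locally. The sketch below uses Python’s built-in `sqlite3` module and its `create_function` hook as a stand-in for Aurora and SageMaker; it does not use Aurora’s actual syntax, and the `score_lead` function, its scoring rule and the `leads` table are all invented for illustration.

```python
import sqlite3

def score_lead(visits, opened_emails):
    """Toy stand-in for a deployed model: returns a score from 0 to 100."""
    return min(100, 10 * visits + 20 * opened_emails)

conn = sqlite3.connect(":memory:")
# Expose the "model" to SQL under the name score_lead (takes 2 arguments).
conn.create_function("score_lead", 2, score_lead)

conn.execute("CREATE TABLE leads (name TEXT, visits INTEGER, opened_emails INTEGER)")
conn.executemany(
    "INSERT INTO leads VALUES (?, ?, ?)",
    [("Ada", 3, 1), ("Bob", 1, 0), ("Cy", 9, 4)],
)

# The application only issues one SQL statement to get predictions back.
ranked = conn.execute(
    "SELECT name, score_lead(visits, opened_emails) AS score "
    "FROM leads ORDER BY score DESC"
).fetchall()
conn.close()
```

The point of the pattern is that the prediction arrives as an ordinary query column, so the application needs no separate code path for calling the model.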

As for the machine learning models, these can come pre-built from Amazon, be developed by an in-house data science team or be purchased in a machine learning model marketplace on Amazon, says Wood.

Today’s announcements from Amazon are designed to simplify machine learning and data access, and reduce the amount of coding to get from query to answer faster.

Nov
04
2019
--

Microsoft’s Azure Synapse Analytics bridges the gap between data lakes and warehouses

At its annual Ignite conference in Orlando, Fla., Microsoft today announced a major new Azure service for enterprises: Azure Synapse Analytics, which Microsoft describes as “the next evolution of Azure SQL Data Warehouse.” Like SQL Data Warehouse, it aims to bridge the gap between data warehouses and data lakes, which are often completely separate. Synapse also taps into a wide variety of other Microsoft services, including Power BI and Azure Machine Learning, as well as a partner ecosystem that includes Databricks, Informatica, Accenture, Talend, Attunity, Pragmatic Works and Adatis. It’s also integrated with Apache Spark.

The idea here is that Synapse allows anybody working with data in those disparate places to manage and analyze it from within a single service. It can be used to analyze relational and unstructured data, using standard SQL.

Microsoft also highlights Synapse’s integration with Power BI, its easy-to-use business intelligence and reporting tool, as well as Azure Machine Learning for building models.

With the Azure Synapse studio, the service provides data professionals with a single workspace for prepping and managing their data, as well as for their big data and AI tasks. There’s also a code-free environment for managing data pipelines.

As Microsoft stresses, businesses that want to adopt Synapse can continue to use their existing workloads in production with Synapse and automatically get all of the benefits of the service. “Businesses can put their data to work much more quickly, productively, and securely, pulling together insights from all data sources, data warehouses, and big data analytics systems,” writes Microsoft CVP of Azure Data, Rohan Kumar.

In a demo at Ignite, Kumar also benchmarked Synapse against Google’s BigQuery. Synapse ran the same query over a petabyte of data in 75% less time. He also noted that Synapse can handle thousands of concurrent users — unlike some of Microsoft’s competitors.

Jun
12
2019
--

Apollo raises $22M for its GraphQL platform

Apollo, a San Francisco-based startup that provides a number of developer and operator tools and services around the GraphQL query language, today announced that it has raised a $22 million growth funding round co-led by Andreessen Horowitz and Matrix Partners. Existing investors Trinity Ventures and Webb Investment Network also participated in this round.

Today, Apollo is probably the biggest player in the GraphQL ecosystem. At its core, the company’s services allow businesses to use the Facebook-incubated GraphQL technology to shield their developers from the patchwork of legacy APIs and databases as they look to modernize their technology stacks. The team argues that while REST APIs that talked directly to other services and databases still made sense a few years ago, they no longer do now that the number of API endpoints keeps increasing rapidly.

Apollo replaces this with what it calls the Data Graph. “There is basically a missing piece where we think about how people build apps today, which is the piece that connects the billions of devices out there,” Apollo co-founder and CEO Geoff Schmidt told me. “You probably don’t just have one app anymore, you probably have three, for the web, iOS and Android. Or maybe six. And if you’re a two-sided marketplace you’ve got one for buyers, one for sellers and another for your ops team.”

Managing the interfaces between all of these apps quickly becomes complicated and means you have to write a lot of custom code for every new feature. The promise of the Data Graph is that developers can use GraphQL to query the data in the graph and move on, all without having to write the boilerplate code that typically slows them down. At the same time, the ops teams can use the Graph to enforce access policies and implement other security features.

“If you think about it, there’s a lot of analogies to what happened with relational databases in the ’80s,” Schmidt said. “There is a need for a new layer in the stack. Previously, your query planner was a human being, not a piece of software, and a relational database is a piece of software that would just give you a database. And you needed a way to query that database, and that syntax was called SQL.”

Geoff Schmidt, Apollo CEO, and Matt DeBergalis, CTO

GraphQL itself, of course, is open source. Apollo is now building a lot of the proprietary tools around this idea of the Data Graph that make it useful for businesses. There’s a cloud-hosted graph manager, for example, that lets you track your schema, as well as a dashboard to track performance, as well as integrations with continuous integration services. “It’s basically a set of services that keep track of the metadata about your graph and help you manage the configuration of your graph and all the workflows and processes around it,” Schmidt said.

The development of Apollo didn’t come out of nowhere. The founders previously launched Meteor, a framework and set of hosted services that allowed developers to write their apps in JavaScript, both on the front-end and back-end. Meteor was tightly coupled to MongoDB, though, which worked well for some use cases but also held the platform back in the long run. With Apollo, the team decided to go in the opposite direction and instead build a platform that makes being database agnostic the core of its value proposition.

The company also recently launched Apollo Federation, which makes it easier for businesses to work with a distributed graph. Sometimes, after all, your data lives in lots of different places. Federation allows for a distributed architecture that combines all of the different data sources into a single schema that developers can then query.
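The idea of combining several data sources behind a single queryable schema can be sketched as a toy gateway in Python. This is not GraphQL and not Apollo Federation’s API; the `ToyGateway` class, the service names and the field names are hypothetical, and the sketch only shows the routing idea: each service contributes fields, and a client asks one entry point for any mix of them.

```python
from typing import Any, Callable

class ToyGateway:
    """Merges field resolvers from several services into one lookup table."""

    def __init__(self) -> None:
        self._resolvers: dict[str, Callable[..., Any]] = {}
        self._owners: dict[str, str] = {}

    def register(self, service: str, resolvers: dict[str, Callable[..., Any]]) -> None:
        """Add a service's fields to the combined schema, rejecting clashes."""
        for field, resolver in resolvers.items():
            if field in self._resolvers:
                raise ValueError(f"{field!r} already provided by {self._owners[field]}")
            self._resolvers[field] = resolver
            self._owners[field] = service

    def query(self, fields: list[str], **args: Any) -> dict[str, Any]:
        """Resolve each requested field through whichever service owns it."""
        return {field: self._resolvers[field](**args) for field in fields}

# Hypothetical backing stores living in two different places.
USERS = {1: "Ada"}
ORDERS = {1: ["book", "lamp"]}

gateway = ToyGateway()
gateway.register("accounts", {"userName": lambda user_id: USERS[user_id]})
gateway.register("orders", {"recentOrders": lambda user_id: ORDERS[user_id]})

# One request spans both services without the client knowing where data lives.
answer = gateway.query(["userName", "recentOrders"], user_id=1)
```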

Schmidt tells me the company started to get some serious traction last year and by December, it was getting calls from VCs that heard from their portfolio companies that they were using Apollo.

The company plans to use the new funding to build out its technology and scale its field team to support the enterprises that bet on its technology, including the open-source technologies that power its services.

“I see the Data Graph as a core new layer of the stack, just like we as an industry invested in the relational database for decades, making it better and better,” Schmidt said. “We’re still finding new uses for SQL and that relational database model. I think the Data Graph is going to be the same way.”

Jun
10
2019
--

Qubole launches Quantum, its serverless database engine

Qubole, the data platform founded by Apache Hive creator and former head of Facebook’s Data Infrastructure team Ashish Thusoo, today announced the launch of Quantum, its first serverless offering.

Qubole may not necessarily be a household name, but its customers include the likes of Autodesk, Comcast, Lyft, Nextdoor and Zillow. For these users, Qubole has long offered a self-service platform that allowed their data scientists and engineers to build their AI, machine learning and analytics workflows on the public cloud of their choice. The platform sits on top of open-source technologies like Apache Spark, Presto and Kafka, for example.

Typically, enterprises have to provision a considerable amount of resources to give these platforms the resources they need. These resources often go unused and the infrastructure can quickly become complex.

Qubole already abstracts most of this away, offering what is essentially a serverless platform. With Quantum, however, it is going a step further by launching a high-performance serverless SQL engine that allows users to query petabytes of data with nothing else but ANSI-SQL, giving them the choice between using a Presto cluster or a serverless SQL engine to run their queries, for example.

The data can be stored on AWS and users won’t have to set up a second data lake or move their data to another platform to use the SQL engine. Quantum automatically scales up or down as needed, of course, and users can still work with the same metastore for their data, no matter whether they choose the clustered or serverless option. Indeed, Quantum is essentially just another SQL engine within Qubole’s overall suite of engines.

Typically, Qubole charges enterprises by compute minutes. When using Quantum, the company uses the same metric, but enterprises pay for the execution time of the query. “So instead of the Qubole compute units being associated with the number of minutes the cluster was up and running, it is associated with the Qubole compute units consumed by that particular query or that particular workload, which is even more fine-grained,” Thusoo explained. “This works really well when you have to do interactive workloads.”
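The difference between billing for cluster uptime and billing for query execution time comes down to simple arithmetic. The sketch below is a simplified model with made-up numbers; Qubole’s actual rates and unit definitions are not given in the article, so the function names, the units-per-minute figure and the price are all assumptions.

```python
def cluster_cost_cents(uptime_minutes, units_per_minute, cents_per_unit):
    """Cluster billing: pay for every minute the cluster is up, busy or idle."""
    return uptime_minutes * units_per_minute * cents_per_unit

def per_query_cost_cents(query_minutes, units_per_minute, cents_per_unit):
    """Per-query billing: pay only for the minutes queries actually ran."""
    return sum(query_minutes) * units_per_minute * cents_per_unit

# Hypothetical workday: the cluster stays up for 8 hours, but analysts
# run only four interactive queries totalling 20 minutes of execution.
UNITS_PER_MINUTE = 4   # assumed compute units consumed per minute
CENTS_PER_UNIT = 5     # assumed price per compute unit

cluster = cluster_cost_cents(8 * 60, UNITS_PER_MINUTE, CENTS_PER_UNIT)
serverless = per_query_cost_cents([2, 5, 3, 10], UNITS_PER_MINUTE, CENTS_PER_UNIT)
```

Under these assumed numbers the gap is large because the cluster sits idle most of the day, which is the interactive-workload case Thusoo points to.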

Thusoo notes that Quantum is targeted at analysts who often need to perform interactive queries on data stored in object stores. Qubole integrates with services like Tableau and Looker (which Google is now in the process of acquiring). “They suddenly get access to very elastic compute capacity, but they are able to come through a very familiar user interface,” Thusoo noted.

May
02
2019
--

Microsoft brings Azure SQL Database to the edge (and Arm)

Microsoft today announced an interesting update to its database lineup with the preview of Azure SQL Database Edge, a new tool that brings the same database engine that powers Azure SQL Database in the cloud to edge computing devices, including, for the first time, Arm-based machines.

Azure SQL Edge, Azure corporate vice president Julia White writes in today’s announcement, “brings to the edge the same performant, secure and easy to manage SQL engine that our customers love in Azure SQL Database and SQL Server.”

The new service, which will also run on x64-based devices and edge gateways, promises to bring low-latency analytics to edge devices as it allows users to work with streaming data and time-series data, combined with the built-in machine learning capabilities of Azure SQL Database. Like its larger brethren, Azure SQL Database Edge will also support graph data and comes with the same security and encryption features that can, for example, protect the data at rest and in motion, something that’s especially important for an edge device.

As White rightly notes, this also ensures that developers only have to write an application once and then deploy it to platforms that feature Azure SQL Database, good old SQL Server on premises and this new edge version.

SQL Database Edge can run in both connected and fully disconnected fashion, something that’s also important for many use cases where connectivity isn’t always a given, yet where users need the kind of data analytics capabilities to keep their businesses (or drilling platforms, or cruise ships) running.
