Dec 16, 2020

Hightouch raises $2.1M to help businesses get more value from their data warehouses

Hightouch, a SaaS service that helps businesses sync their customer data across sales and marketing tools, is coming out of stealth and announcing a $2.1 million seed round. The round was led by Afore Capital and Slack Fund, with a number of angel investors also participating.

At its core, Hightouch, which participated in Y Combinator’s Summer 2019 batch, aims to solve the customer data integration problems that many businesses today face.

During their time at Segment, Hightouch co-founders Tejas Manohar and Josh Curl witnessed the rise of data warehouses like Snowflake, Google’s BigQuery and Amazon Redshift — that’s where a lot of Segment data ends up, after all. Businesses that adopt data warehouses now have a central repository for all of their customer data. Typically, though, this information is then only used for analytics purposes. Together with former Bessemer Venture Partners investor Kashish Gupta, the team decided to see how they could innovate on top of this trend and help businesses activate all of this information.

Hightouch co-founders Kashish Gupta, Josh Curl and Tejas Manohar.

“What we found is that, with all the customer data inside of the data warehouse, it doesn’t make sense for it to just be used for analytics purposes — it also makes sense for these operational purposes like serving different business teams with the data they need to run things like marketing campaigns — or in product personalization,” Manohar told me. “That’s the angle that we’ve taken with Hightouch. It stems from us seeing the explosive growth of the data warehouse space, both in terms of technology advancements as well as like accessibility and adoption. […] Our goal is to be seen as the company that makes the warehouse not just for analytics but for these operational use cases.”

It helps that all of the big data warehousing platforms have standardized on SQL as their query language — and because the warehousing services have already solved the problem of ingesting all of this data, Hightouch doesn’t have to worry about this part of the tech stack either. And as Curl added, Snowflake and its competitors never quite went beyond serving the analytics use case.

As for the product itself, Hightouch lets users create SQL queries and then send that data to different destinations — maybe a CRM system like Salesforce or a marketing platform like Marketo — after transforming it to the format that the destination platform expects.
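The sync described above can be sketched in a few lines of Python. This is only an illustration of the general reverse-ETL pattern; every class, function and field name here is a hypothetical stand-in, not Hightouch’s actual API:

```python
# Sketch of the sync pattern: query the warehouse, reshape each row to the
# destination's schema, then push it to the destination's API.

def run_sync(warehouse, destination, query, field_mapping):
    """Send the result of a warehouse SQL query to a destination tool."""
    rows = warehouse.execute(query)  # e.g. list of dicts from Snowflake/BigQuery
    for row in rows:
        # Rename warehouse columns to the fields the destination expects.
        payload = {dest: row[src] for src, dest in field_mapping.items()}
        destination.upsert(payload)  # e.g. a Salesforce or Marketo API call

# Stand-ins for a real warehouse connection and a real CRM client:
class FakeWarehouse:
    def execute(self, query):
        return [{"user_email": "a@example.com", "plan": "pro"}]

class FakeCRM:
    def __init__(self):
        self.records = []
    def upsert(self, payload):
        self.records.append(payload)

crm = FakeCRM()
run_sync(FakeWarehouse(), crm,
         "SELECT user_email, plan FROM users",
         {"user_email": "Email", "plan": "Plan__c"})
print(crm.records)  # [{'Email': 'a@example.com', 'Plan__c': 'pro'}]
```

The essential point is the mapping step: the warehouse’s column names rarely match what a CRM or marketing platform expects, so the sync layer translates between the two schemas.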

Expert users can write their own SQL queries for this, but the team also built a graphical interface to help non-developers create their own queries. The core audience, though, is data teams — and they, too, will likely see value in the graphical user interface because it speeds up their workflows. “We want to empower the business user to access whatever models and aggregation the data user has done in the warehouse,” Gupta explained.

The company is agnostic to how and where its users want to operationalize their data, but the most common use cases right now focus on B2C companies, where marketing teams often use the data, as well as sales teams at B2B companies.

“It feels like there’s an emerging category here of tooling that’s being built on top of a data warehouse natively, rather than being a standard SaaS tool where it is its own data store and then you manage a secondary data store,” Curl said. “We have a class of things here that connect to a data warehouse and make use of that data for operational purposes. There’s no industry term for that yet, but we really believe that that’s the future of where data engineering is going. It’s about building off this centralized platform like Snowflake, BigQuery and things like that.”

“Warehouse-native,” Manohar suggested as a potential name here. We’ll see if it sticks.

Hightouch originally raised its round after its participation in the Y Combinator demo day but decided not to disclose it until it felt like it had found the right product/market fit. Current customers include the likes of Retool, Proof, Stream and Abacus, in addition to a number of significantly larger companies the team isn’t able to name publicly.

Dec 9, 2020

Firebolt raises $37M to take on Snowflake, Amazon and Google with a new approach to data warehousing

For many organizations, the shift to cloud computing has played out more realistically as a shift to hybrid architectures, where a company’s data is just as likely to reside in one of a number of clouds as it might in an on-premise deployment, in a data warehouse or in a data lake. Today, a startup that has built a more comprehensive way to assess, analyze and use that data is announcing funding as it looks to take on Snowflake, Amazon, Google and others in the area of enterprise data analytics.

Firebolt, which has redesigned the concept of a data warehouse to work more efficiently and at a lower cost, is today announcing that it has raised $37 million from Zeev Ventures, TLV Partners, Bessemer Venture Partners and Angular Ventures. It plans to use the funding to continue developing its product and bring on more customers.

The company is officially “launching” today but — as is the case with so many enterprise startups these days operating in stealth — it has been around for two years already building its platform and signing commercial deals. It now has some 12 large enterprise customers and is “really busy” with new business, said CEO Eldad Farkash in an interview.

The funding may sound like a large amount for a company that has not really been out in the open, but part of the reason is the track record of the founders. Farkash was one of the founders of Sisense, the successful business intelligence startup, and he co-founded Firebolt with two others from Sisense’s founding team, Saar Bitner as COO and Ariel Yaroshevich as CTO.

At Sisense, these three were coming up against an issue: when dealing with terabytes of data, cloud data warehouses strained to deliver the performance needed to power the company’s analytics and other tools, and the only way to mitigate that was to pile on more cloud capacity.

Farkash is something of a technical savant and said that he decided to move on and build Firebolt to see if he could tackle this, which he described as a new, difficult and “meaningful” problem. “The only thing I know how to do is build startups,” he joked.

In his opinion, while data warehousing has been a big breakthrough in how to handle the mass of data that companies now amass and want to use better, it has started to feel like a dated solution.

“Data warehouses are solving yesterday’s problem, which was, ‘How do I migrate to the cloud and deal with scale?’ ” he said, citing Google’s BigQuery, Amazon’s Redshift and Snowflake as fitting answers for that issue. “We see Firebolt as the new entrant in that space, with a new take on the design of the technology. We change the discussion from one of scale to one of speed and efficiency.”

The startup claims that its performance is up to 182 times faster than that of other data warehouses. It’s a SQL-based system built on principles that Farkash said came out of academic research that had yet to be applied anywhere: handling data in a lighter way, using new techniques in compression and parsing. Data lakes, in turn, can be connected with a wider data ecosystem, and the upshot is a much smaller requirement for cloud capacity.

This is not a problem unique to Sisense. With enterprise data continuing to grow exponentially, cloud analytics is growing with it; Firebolt estimates it will be a $65 billion market by 2025.

Still, Farkash said the Firebolt concept was initially a challenging sell even to the engineers it eventually hired to build out the business: It required building completely new warehouses from the ground up to run the platform. Five of these exist today, and more will be added on the back of this funding, he said.

And it should be pointed out that its competitors are not exactly sitting still either. Just yesterday, Dataform announced that it had been acquired by Google to help it build out BigQuery and improve its performance.

“Firebolt created a SaaS product that changes the analytics experience over big data sets,” Oren Zeev of Zeev Ventures said in a statement. “The pace of innovation in the big data space has lagged the explosion in data growth rendering most data warehousing solutions too slow, too expensive, or too complex to scale. Firebolt takes cloud data warehousing to the next level by offering the world’s most powerful analytical engine. This means companies can now analyze multi Terabyte / Petabyte data sets easily at significantly lower costs and provide a truly interactive user experience to their employees, customers or anyone who needs to access the data.”

Nov 12, 2020

Databricks launches SQL Analytics

AI and data analytics company Databricks today announced the launch of SQL Analytics, a new service that makes it easier for data analysts to run their standard SQL queries directly on data lakes. And with that, enterprises can now easily connect their business intelligence tools like Tableau and Microsoft’s Power BI to these data repositories as well.

SQL Analytics will be available in public preview on November 18.

In many ways, SQL Analytics is the product Databricks has long been looking to build, one that brings its concept of a “lake house” to life. It combines the performance of a data warehouse, where you store data after it has already been transformed and cleaned, with a data lake, where you store all of your data in its raw form. In a data lake — an approach Databricks co-founder and CEO Ali Ghodsi has long championed — data is typically only transformed when it gets used. That makes data lakes cheaper, but also a bit harder to handle for users.

“We’ve been saying Unified Data Analytics, which means unify the data with the analytics. So data processing and analytics, those two should be merged. But no one picked that up,” Ghodsi told me. But “lake house” caught on as a term.

“Databricks has always offered data science, machine learning. We’ve talked about that for years. And with Spark, we provide the data processing capability. You can do [extract, transform, load]. That has always been possible. SQL Analytics enables you to now do the data warehousing workloads directly, and concretely, the business intelligence and reporting workloads, directly on the data lake.”

The general idea here is that with just one copy of the data, you can enable both traditional data analyst use cases (think BI) and the data science workloads (think AI) Databricks was already known for. Ideally, that makes both use cases cheaper and simpler.

The service sits on top of an optimized version of Databricks’ open-source Delta Lake storage layer, which allows it to complete queries quickly. The service also provides auto-scaling endpoints to keep query latency consistent, even under high loads.

While data analysts can query these data sets directly, using standard SQL, the company also built a set of connectors to BI tools. Its BI partners include Tableau, Qlik, Looker and Thoughtspot, and it also works with ingest partners like Fivetran, Fishtown Analytics, Talend and Matillion.

“Now more than ever, organizations need a data strategy that enables speed and agility to be adaptable,” said Francois Ajenstat, chief product officer at Tableau. “As organizations are rapidly moving their data to the cloud, we’re seeing growing interest in doing analytics on the data lake. The introduction of SQL Analytics delivers an entirely new experience for customers to tap into insights from massive volumes of data with the performance, reliability and scale they need.”

In a demo, Ghodsi showed me what the new SQL Analytics workspace looks like. It’s essentially a stripped-down version of the standard code-heavy experience with which Databricks users are familiar. Unsurprisingly, SQL Analytics provides a more graphical experience that focuses on visualizations rather than Python code.

While there are already some data analysts on the Databricks platform, this obviously opens up a large new market for the company — something that would surely bolster its plans for an IPO next year.

May 27, 2020

RudderStack raises $5M seed round for its open-source Segment competitor

RudderStack, a startup that offers an open-source alternative to customer data management platforms like Segment, today announced that it has raised a $5 million seed round led by S28 Capital. Salil Deshpande of Uncorrelated Ventures and Mesosphere/D2iQ co-founder Florian Leibert (through 468 Capital) also participated in this round.

In addition, the company also today announced that it has acquired Blendo, an integration platform that helps businesses transform and move data from their data sources to databases.

Like its larger competitors, RudderStack helps businesses consolidate all of their customer data, which is now typically generated and managed in multiple places — and then extract value from this more holistic view. The company was founded by Soumyadeb Mitra, who has a Ph.D. in database systems and worked on similar problems previously when he was at 8×8 after his previous startup, MarianaIQ, was acquired by that company.

Mitra argues that RudderStack is different from its competitors thanks to its focus on developers, its privacy and security options and its focus on being a data warehouse first, without creating yet another data silo.

“Our competitors provide tools for analytics, audience segmentation, etc. on top of the data they keep,” he said. “That works well if you are a small startup, but larger enterprises have a ton of other data sources — at 8×8 we had our own internal billing system, for example — and you want to combine this internal data with the event stream data — that you collect via RudderStack or competitors — to create a 360-degree view of the customer and act on that. This becomes very difficult with the SaaS-hosted data model of our competitors — you won’t be sending all your internal data to these cloud vendors.”

Part of its appeal, of course, is the open-source nature of RudderStack, whose GitHub repository now has more than 1,700 stars for the main RudderStack server. Mitra credits getting on the front page of Hacker News for its first sale. On that day, it received over 500 GitHub stars, a few thousand clones and a lot of signups for its hosted app. “One of those signups turned out to be our first paid customer. They were already a competitor’s customer, but it wasn’t scaling up, so they were looking to build something in-house. That’s when they found us and started working with us,” he said.

Because it is open source, companies can run RudderStack any way they want, but like most similar open-source companies, RudderStack itself also offers multiple hosting options, including cloud hosting, starting at $2,000 per month with unlimited sources and destinations.

Current users include IFTTT, Mattermost, MarineTraffic, Torpedo and Wynn Las Vegas.

As for the Blendo acquisition, it’s worth noting that Blendo had only raised a small amount of money in its seed round. The two companies did not disclose the price of the acquisition.

“With Blendo, I had the opportunity to be part of a great team that executed on the vision of turning any company into a data-driven organization,” said Blendo founder Kostas Pardalis, who has joined RudderStack as head of Growth. “We’ve combined the talented Blendo and RudderStack teams together with the technology that both companies have created, at a time when the customer data market is ripe for the next wave of innovation. I’m excited to help drive RudderStack forward.”

Mitra tells me that RudderStack acquired Blendo instead of building its own version of this technology because “it is not a trivial technology to build — cloud sources are really complicated and have weird schemas and API challenges and it would have taken us a lot of time to figure it out. There are independent large companies doing the ETL piece.”

May 19, 2020

Microsoft launches Azure Synapse Link to help enterprises get faster insights from their data

At its Build developer conference, Microsoft today announced Azure Synapse Link, a new enterprise service that allows businesses to analyze their data faster and more efficiently, using an approach that’s generally called “hybrid transaction/analytical processing” (HTAP). That’s a mouthful, but it essentially means enterprises can use the same database system for both analytical and transactional workloads. Traditionally, enterprises had to make a trade-off between building a single system for both, which was often highly over-provisioned, and maintaining separate systems for transactional and analytics workloads.

Last year, at its Ignite conference, Microsoft announced Azure Synapse Analytics, an analytics service that combines analytics and data warehousing to create what the company calls “the next evolution of Azure SQL Data Warehouse.” Synapse Analytics brings together data from Microsoft’s services and those from its partners and makes it easier to analyze.

“One of the key things, as we work with our customers on their digital transformation journey, there is an aspect of being data-driven, of being insights-driven as a culture, and a key part of that really is that once you decide there is some amount of information or insights that you need, how quickly are you able to get to that? For us, time to insight and a secondary element, which is the cost it takes, the effort it takes to build these pipelines and maintain them with an end-to-end analytics solution, was a key metric we have been observing for multiple years from our largest enterprise customers,” said Rohan Kumar, Microsoft’s corporate VP for Azure Data.

Synapse Link takes the work Microsoft did on Synapse Analytics a step further by removing the barriers between Azure’s operational databases and Synapse Analytics, so enterprises can immediately get value from the data in those databases without going through a data warehouse first.

“What we are announcing with Synapse Link is the next major step in the same vision that we had around reducing the time to insight,” explained Kumar. “And in this particular case, a long-standing barrier that exists today between operational databases and analytics systems is these complex ETL (extract, transform, load) pipelines that need to be set up just so you can do basic operational reporting or where, in a very transactionally consistent way, you need to move data from your operational system to the analytics system, because you don’t want to impact the performance of the operational system in any way because that’s typically dealing with, depending on the system, millions of transactions per second.”

ETL pipelines, Kumar argued, are typically expensive and hard to build and maintain, yet enterprises are now building new apps — and maybe even line-of-business mobile apps — where any action a consumer takes that is registered in the operational database is immediately available for predictive analytics, for example.

From the user’s perspective, enabling this takes only a single click to link the two, and it removes the need to manage additional data pipelines or database resources. That, Kumar said, was always the main goal for Synapse Link. “With a single click, you should be able to enable real-time analytics on your operational data in ways that don’t have any impact on your operational systems, so you’re not using the compute part of your operational system to do the query, you actually have to transform the data into a columnar format, which is more adaptable for analytics, and that’s really what we achieved with Synapse Link.”

Because traditional HTAP systems on-premises typically share their compute resources with the operational database, those systems never quite took off, Kumar argued. In the cloud, with Synapse Link, though, that impact doesn’t exist because you’re dealing with two separate systems. Now, once a transaction gets committed to the operational database, the Synapse Link system transforms the data into a columnar format that is more optimized for the analytics system — and it does so in real time.
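The row-to-column conversion at the heart of that pipeline can be sketched in a few lines. This is a toy illustration of the general idea behind columnar layouts, not Synapse Link’s actual implementation:

```python
# Committed transactions arrive as rows; the analytics copy stores each column
# contiguously, which makes scans and aggregations over one column cheap.

def rows_to_columns(rows):
    """Convert row-oriented records into a column-oriented layout."""
    if not rows:
        return {}
    return {col: [row[col] for row in rows] for col in rows[0]}

transactions = [
    {"order_id": 1, "amount": 20.0, "region": "EU"},
    {"order_id": 2, "amount": 35.5, "region": "US"},
    {"order_id": 3, "amount": 12.0, "region": "EU"},
]
columnar = rows_to_columns(transactions)

# An analytical query now touches a single contiguous column:
total = sum(columnar["amount"])
print(total)  # 67.5
```

Real columnar stores add compression and indexing on top, but the basic win is the same: an aggregation reads one column instead of every field of every row.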

For now, Synapse Link is only available in conjunction with Microsoft’s Cosmos DB database. As Kumar told me, that’s because that’s where the company saw the highest demand for this kind of service, but you can expect the company to add support for Azure SQL, Azure Database for PostgreSQL and Azure Database for MySQL in the future.

Apr 22, 2020

Fishtown Analytics raises $12.9M Series A for its open-source analytics engineering tool

Philadelphia-based Fishtown Analytics, the company behind the popular open-source data engineering tool dbt, today announced that it has raised a $12.9 million Series A round led by Andreessen Horowitz, with the firm’s general partner Martin Casado joining the company’s board.

“I wrote this blog post in early 2016, essentially saying that analysts needed to work in a fundamentally different way,” Fishtown founder and CEO Tristan Handy told me, when I asked him about how the product came to be. “They needed to work in a way that much more closely mirrored the way the software engineers work and software engineers have been figuring this shit out for years and data analysts are still like sending each other Microsoft Excel docs over email.”

The dbt open-source project forms the basis of this. It allows anyone who can write SQL queries to transform data and then load it into their preferred analytics tools. As such, it sits between data warehouses and the tools that load data into them on one end, and specialized analytics tools on the other.

As Casado noted when I talked to him about the investment, data warehouses have now made it affordable for businesses to store all of their data before it is transformed. So what was traditionally “extract, transform, load” (ETL) has now become “extract, load, transform” (ELT). Andreessen Horowitz is already invested in Fivetran, which helps businesses move their data into their warehouses, so it makes sense for the firm to also tackle the other side of this business.
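The ELT pattern Casado describes can be illustrated with a toy example, using Python’s built-in SQLite as a stand-in for a cloud warehouse: raw data is loaded first, and the transformation happens afterwards, as SQL running inside the warehouse — the niche dbt-style tooling occupies:

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")

# Extract + Load: raw events land in the warehouse untransformed.
warehouse.execute("CREATE TABLE raw_events (user_id INTEGER, amount_cents INTEGER)")
warehouse.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [(1, 500), (1, 250), (2, 1000)],
)

# Transform: a dbt-style SQL model, materialized as a table in the warehouse.
warehouse.execute("""
    CREATE TABLE user_revenue AS
    SELECT user_id, SUM(amount_cents) / 100.0 AS revenue_dollars
    FROM raw_events
    GROUP BY user_id
""")

rows = warehouse.execute("SELECT * FROM user_revenue ORDER BY user_id").fetchall()
print(rows)  # [(1, 7.5), (2, 10.0)]
```

Because the raw events are kept, the transformation can be rewritten and rerun at any time without re-extracting anything — the key economic shift cheap warehouse storage enabled.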

“Dbt is, as far as we can tell, the leading community for transformation and it’s a company we’ve been tracking for at least a year,” Casado said. He also argued that data analysts — unlike data scientists — are not really catered to as a group.

Before this round, Fishtown hadn’t raised much money beyond a small SAFE round from Amplify, even though it has been around for a few years now.

But Handy argued that the company needed this time to prove that it was on to something and build a community. That community now consists of more than 1,700 companies that use the dbt project in some form and over 5,000 people in the dbt Slack community. Fishtown also now has over 250 dbt Cloud customers and the company signed up a number of big enterprise clients earlier this year. With that, the company needed to raise money to expand and also better service its current list of customers.

“We live in Philadelphia. The cost of living is low here and none of us really care to make a quadro-billion dollars, but we do want to answer the question of how do we best serve the community,” Handy said. “And for the first time, in the early part of the year, we were like, holy shit, we can’t keep up with all of the stuff that people need from us.”

The company plans to expand the team from 25 to 50 employees in 2020, and with those new hires, the team plans to improve and expand the product, especially its IDE for data analysts, which Handy admitted could use a bit more polish.

Feb 24, 2020

Databricks makes bringing data into its ‘lakehouse’ easier

Databricks today announced the launch of its new Data Ingestion Network of partners and the launch of its Databricks Ingest service. The idea here is to make it easier for businesses to combine the best of data warehouses and data lakes into a single platform — a concept Databricks likes to call “lakehouse.”

At the core of the company’s lakehouse is Delta Lake, Databricks’ Linux Foundation-managed open-source project that brings a new storage layer to data lakes that helps users manage the lifecycle of their data and ensures data quality through schema enforcement, log records and more. Databricks users can now work with the first five partners in the Ingestion Network — Fivetran, Qlik, Infoworks, StreamSets, Syncsort — to automatically load their data into Delta Lake. To ingest data from these partners, Databricks customers don’t have to set up any triggers or schedules — instead, data automatically flows into Delta Lake.

“Until now, companies have been forced to split up their data into traditional structured data and big data, and use them separately for BI and ML use cases. This results in siloed data in data lakes and data warehouses, slow processing and partial results that are too delayed or too incomplete to be effectively utilized,” says Ali Ghodsi, co-founder and CEO of Databricks. “This is one of the many drivers behind the shift to a Lakehouse paradigm, which aspires to combine the reliability of data warehouses with the scale of data lakes to support every kind of use case. In order for this architecture to work well, it needs to be easy for every type of data to be pulled in. Databricks Ingest is an important step in making that possible.”

Databricks VP of Product Marketing Bharath Gowda also tells me that this will make it easier for businesses to perform analytics on their most recent data and hence be more responsive when new information comes in. He also noted that users will be able to better leverage their structured and unstructured data for building better machine learning models, as well as to perform more traditional analytics on all of their data instead of just a small slice that’s available in their data warehouse.

Dec 3, 2019

AWS speeds up Redshift queries 10x with AQUA

At its re:Invent conference, AWS CEO Andy Jassy today announced the launch of AQUA (the Advanced Query Accelerator) for Amazon Redshift, the company’s data warehousing service. As Jassy noted in his keynote, it’s hard to scale data warehouses when you want to do analytics over that data. At some point, as your data warehouse or lake grows, the data starts overwhelming your network or available compute, even with today’s high-speed networks and chips. To handle this, AWS built AQUA, which is essentially a hardware-accelerated cache and promises up to 10x better query performance than competing cloud-based data warehouses.

“Think about how much data you have to move over the network to get to your compute,” Jassy said. And if that’s not a problem for a company today, he added, it will likely become one soon, given how much data most enterprises now generate.

With this, Jassy explained, you’re bringing the compute power you need directly to the storage layer. The cache sits on top of Amazon’s standard S3 service and can hence scale out across as many nodes as needed.

AWS designed its own analytics processors to power this service and accelerate the data compression and encryption on the fly.

Unsurprisingly, the service is also 100% compatible with the current version of Redshift.

In addition, AWS also today announced next-generation compute instances for Redshift, the RA3 instances, with 48 vCPUs, 384 GiB of memory and up to 64 TB of storage. You can build clusters of these with up to 128 instances.

Nov 4, 2019

Microsoft’s Azure Synapse Analytics bridges the gap between data lakes and warehouses

At its annual Ignite conference in Orlando, Fla., Microsoft today announced a major new Azure service for enterprises: Azure Synapse Analytics, which Microsoft describes as “the next evolution of Azure SQL Data Warehouse.” Like SQL Data Warehouse, it aims to bridge the gap between data warehouses and data lakes, which are often completely separate. Synapse also taps into a wide variety of other Microsoft services, including Power BI and Azure Machine Learning, as well as a partner ecosystem that includes Databricks, Informatica, Accenture, Talend, Attunity, Pragmatic Works and Adatis. It’s also integrated with Apache Spark.

The idea here is that Synapse allows anybody working with data in those disparate places to manage and analyze it from within a single service. It can be used to analyze relational and unstructured data, using standard SQL.

Microsoft also highlights Synapse’s integration with Power BI, its easy-to-use business intelligence and reporting tool, as well as Azure Machine Learning for building models.

With the Azure Synapse studio, the service provides data professionals with a single workspace for prepping and managing their data, as well as for their big data and AI tasks. There’s also a code-free environment for managing data pipelines.

As Microsoft stresses, businesses that want to adopt Synapse can continue to use their existing workloads in production with Synapse and automatically get all of the benefits of the service. “Businesses can put their data to work much more quickly, productively, and securely, pulling together insights from all data sources, data warehouses, and big data analytics systems,” writes Microsoft CVP of Azure Data, Rohan Kumar.

In a demo at Ignite, Kumar also benchmarked Synapse against Google’s BigQuery. Synapse ran the same query over a petabyte of data in 75% less time. He also noted that Synapse can handle thousands of concurrent users — unlike some of Microsoft’s competitors.

Aug 15, 2019

Incorta raises $30M Series C for ETL-free data processing solution

Incorta, a startup founded by former Oracle executives who want to change the way we process large amounts of data, announced a $30 million Series C today led by Sorenson Capital.

Other investors participating in the round included GV (formerly Google Ventures), Kleiner Perkins, M12 (formerly Microsoft Ventures), Telstra Ventures and Ron Wohl. Today’s investment brings the total raised to $75 million, according to the company.

Incorta CEO and co-founder Osama Elkady says he and his co-founders were compelled to start Incorta because they saw so many companies spending big bucks for data projects that were doomed to fail. “The reason that drove me and three other guys to leave Oracle and start Incorta is because we found out with all the investment that companies were making around data warehousing and implementing advanced projects, very few of these projects succeeded,” Elkady told TechCrunch.

A typical data project involves ETL (extract, transform, load). It’s a process that takes data out of one database, changes the data to make it compatible with the target database and adds it to the target database.
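As a toy illustration of those three steps (with Python’s built-in SQLite standing in for both the source and target databases):

```python
import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

# A source table whose schema doesn't match the target's.
source.execute("CREATE TABLE orders (id INTEGER, total_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1999), (2, 500)])

target.execute("CREATE TABLE orders (id INTEGER, total_dollars REAL)")

# Extract: pull rows out of the source database.
rows = source.execute("SELECT id, total_cents FROM orders").fetchall()
# Transform: convert cents to dollars so rows fit the target schema.
transformed = [(order_id, cents / 100.0) for order_id, cents in rows]
# Load: add the reshaped rows to the target database.
target.executemany("INSERT INTO orders VALUES (?, ?)", transformed)

result = target.execute("SELECT * FROM orders ORDER BY id").fetchall()
print(result)  # [(1, 19.99), (2, 5.0)]
```

Each step is simple in isolation; the cost Incorta is targeting comes from running pipelines like this continuously, at scale, across many sources.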

It takes time to do all of that, and Incorta is trying to make access to the data much faster by stripping out this step. Elkady says that this allows customers to make use of the data much more quickly, claiming they are reducing the process from one that took hours to one that takes just seconds. That kind of performance enhancement is garnering attention.

Rob Rueckert, managing director for lead investor Sorenson Capital, sees a company that’s innovating in a mature space. “Incorta is poised to upend the data warehousing market with innovative technology that will end 30 years of archaic and slow data warehouse infrastructure,” he said in a statement.

The company says revenue is growing by leaps and bounds, reporting 284% year-over-year growth (though it did not share specific numbers). Customers include Starbucks, Shutterfly and Broadcom.

The startup, which launched in 2013, currently has 250 employees, with developers in Egypt and main operations in San Mateo, Calif. It recently added offices in Chicago, Dubai and Bangalore.
