Sep
07
2021
--

Seqera Labs grabs $5.5M to help sequence COVID-19 variants and other complex data problems

Bringing order and understanding to unstructured information scattered across disparate silos has been one of the more significant breakthroughs of the big data era. Today, a European startup that has built a platform to tackle this challenge specifically in life sciences — one that, notably, has been used by labs to sequence and so far identify two major COVID-19 variants — is announcing funding to extend its tools to a wider set of use cases and to expand into North America.

Seqera Labs, a Barcelona-based data orchestration and workflow platform tailored to help scientists and engineers order and gain insights from cloud-based genomic data troves, as well as to tackle other life science applications that involve harnessing complex data from multiple locations, has raised $5.5 million in seed funding.

Talis Capital and Speedinvest co-led this round, with participation also from previous backer BoxOne Ventures and a grant from the Chan Zuckerberg Initiative, Mark Zuckerberg and Dr. Priscilla Chan’s effort to back open source software projects for science applications.

Seqera — a portmanteau of “sequence” and “era”, as in the age of sequencing data — had previously raised less than $1 million, and it is already quietly generating revenue, with five of the world’s biggest pharmaceutical companies among its customers, alongside biotech firms and other life sciences organizations.

Seqera was spun out of the Centre for Genomic Regulation (CRG), a biomedical research center based in Barcelona, and was built as the commercial application of Nextflow, open source workflow and data orchestration software originally created at the CRG by Seqera’s founders, Evan Floden and Paolo Di Tommaso.

Floden, Seqera’s CEO, told TechCrunch that he and Di Tommaso were motivated to create Seqera in 2018 after seeing Nextflow gain a lot of traction in the life science community, and subsequently getting a lot of repeat requests for further customization and features. Both Nextflow and Seqera have seen a lot of usage: the Nextflow runtime has been downloaded more than 2 million times, the company said, while Seqera’s commercial cloud offering has now processed more than 5 billion tasks.

The COVID-19 pandemic is a classic example of the acute challenge that Seqera (and by association Nextflow) aims to address in the scientific community. With COVID-19 outbreaks happening globally, each time a test for COVID-19 is processed in a lab, live genetic samples of the virus get collected. Taken together, these millions of tests represent a goldmine of information about the coronavirus and how it is mutating, and when and where it is doing so. For a new virus about which so little is understood and that is still persisting, that’s invaluable data.

So the problem is not whether the data for better insights exists (it does); it is that legacy tools make it nearly impossible to view that data as one holistic body. The data sits in too many places, there is too much of it, and it grows and changes every day, which means the traditional approach of porting it to a centralized location to run analytics on it would be inefficient and would cost a fortune to execute.

That is where Seqera comes in. The company’s technology treats each source of data across different clouds as a salient pipeline that can be merged and analyzed as a single body, without that data ever leaving the boundaries of the infrastructure where it already exists. With the platform customised to focus on genomic troves, scientists can then query that information for more insights. Seqera was central to the discovery of both the Alpha and Delta variants of the virus, and work is still ongoing as COVID-19 continues to hammer the globe.
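
To make that idea concrete, here is a minimal, hypothetical Python sketch of the pattern: each site computes a summary over data that stays where it is, and only the small summaries are merged centrally. It is purely illustrative and not Seqera’s or Nextflow’s actual implementation; the record format and lineage labels are invented.

```python
# Hypothetical sketch of federated-style analysis: each lab/cloud computes a
# summary over its own data locally, and only the small summaries are merged.
# The raw sequencing records never leave their own infrastructure.
from collections import Counter

def local_summary(records):
    """Runs inside each site's own environment; returns aggregate counts only."""
    return Counter(r["lineage"] for r in records)

def merge_summaries(summaries):
    """Runs centrally; combines the per-site aggregates into one view."""
    total = Counter()
    for s in summaries:
        total += s
    return total

# Two sites with their own (toy) sample records.
site_a = [{"lineage": "B.1.1.7"}, {"lineage": "B.1.617.2"}, {"lineage": "B.1.1.7"}]
site_b = [{"lineage": "B.1.617.2"}, {"lineage": "B.1.617.2"}]

combined = merge_summaries([local_summary(site_a), local_summary(site_b)])
print(combined)  # Counter({'B.1.617.2': 3, 'B.1.1.7': 2})
```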

Seqera is being used in other kinds of medical applications, such as in the realm of so-called “precision medicine.” This is emerging as a very big opportunity in complex fields like oncology: cancer mutates and behaves differently depending on many factors, including genetic differences of the patients themselves, which means that treatments are less effective if they are “one size fits all.”

Increasingly, we are seeing approaches that leverage machine learning and big data analytics to better understand individual cancers and how they develop for different populations, to subsequently create more personalized treatments, and Seqera comes into play as a way to sequence that kind of data.

This also highlights something else notable about the Seqera platform: it is used directly by the people who are analyzing the data — that is, the researchers and scientists themselves, without data specialists necessarily needing to get involved. This was a practical priority for the company, Floden told me, but nonetheless, it’s an interesting detail of how the platform is inadvertently part of that bigger trend of “no-code/low-code” software, designed to make highly technical processes usable by non-technical people.

Both the existing opportunity and the ways Seqera might be applied in the future to other kinds of cloud-based data make it an interesting company, and, it seems, an interesting investment, too.

“Advancements in machine learning, and the proliferation of volumes and types of data, are leading to increasingly more applications of computer science in life sciences and biology,” said Kirill Tasilov, principal at Talis Capital, in a statement. “While this is incredibly exciting from a humanity perspective, it’s also skyrocketing the cost of experiments to sometimes millions of dollars per project as they become computer-heavy and complex to run. Nextflow is already a ubiquitous solution in this space and Seqera is driving those capabilities at an enterprise level – and in doing so, is bringing the entire life sciences industry into the modern age. We’re thrilled to be a part of Seqera’s journey.”

“With the explosion of biological data from cheap, commercial DNA sequencing, there is a pressing need to analyse increasingly growing and complex quantities of data,” added Arnaud Bakker, principal at Speedinvest. “Seqera’s open and cloud-first framework provides an advanced tooling kit allowing organisations to scale complex deployments of data analysis and enable data-driven life sciences solutions.”

Although medicine and life sciences are perhaps Seqera’s most obvious and timely applications today, the framework originally designed for genetics and biology can be applied to any number of other areas: AI training, image analysis and astronomy are three early use cases, Floden said. Astronomy is perhaps very apt, since it seems that the sky is the limit.

“We think we are in the century of biology,” Floden said. “It’s the center of activity and it’s becoming data-centric, and we are here to build services around that.”

Seqera is not disclosing its valuation with this round.

Aug
19
2021
--

Companies betting on data must value people as much as AI

The Pareto principle, also known as the 80-20 rule, asserts that 80% of consequences come from 20% of causes, rendering the remainder way less impactful.

Those working with data may have heard a different rendition of the 80-20 rule: A data scientist spends 80% of their time at work cleaning up messy data as opposed to doing actual analysis or generating insights. Imagine a 30-minute drive expanded to two-and-a-half hours by traffic jams, and you’ll get the picture.

While most data scientists spend more than 20% of their time at work on actual analysis, they still have to waste countless hours turning a trove of messy data into a tidy dataset ready for analysis. This process can include removing duplicate data, making sure all entries are formatted correctly and doing other preparatory work.

On average, this workflow stage takes up about 45% of the total time, a recent Anaconda survey found. An earlier poll by CrowdFlower put the estimate at 60%, and many other surveys cite figures in this range.

None of this is to say data preparation is not important. “Garbage in, garbage out” is a well-known rule in computer science circles, and it applies to data science, too. In the best-case scenario, the script will just return an error, warning that it cannot calculate the average spending per client, because the entry for customer #1527 is formatted as text, not as a numeral. In the worst case, the company will act on insights that have little to do with reality.
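
As a toy illustration of that kind of cleanup (the dataset and column names are invented), a few lines of pandas can deduplicate rows and coerce the mis-typed entry into a number so the average becomes computable:

```python
# Small, hypothetical example of the cleanup described above, using pandas.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1525, 1526, 1527, 1527],
    "spend": [120.5, 89.0, "95,00", "95,00"],  # customer #1527 entered as text
})

# Drop exact duplicate rows.
df = df.drop_duplicates()

# Normalise the text entry ("95,00" -> "95.00") and coerce to numbers;
# anything that still cannot be parsed becomes NaN instead of crashing.
df["spend"] = pd.to_numeric(
    df["spend"].astype(str).str.replace(",", ".", regex=False),
    errors="coerce",
)

print(df["spend"].mean())  # average spend per client, now computable: 101.5
```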

The real question to ask here is whether re-formatting the data for customer #1527 is really the best way to use the time of a well-paid expert. The average data scientist is paid between $95,000 and $120,000 per year, according to various estimates. Having the employee on such pay focus on mind-numbing, non-expert tasks is a waste both of their time and the company’s money. Besides, real-world data has a lifespan, and if a dataset for a time-sensitive project takes too long to collect and process, it can be outdated before any analysis is done.

What’s more, companies’ quests for data often end up wasting the time of non-data-focused personnel, with employees asked to help fetch or produce data instead of working on their regular responsibilities. Often, more than half of the data companies collect is not used at all, suggesting that the time of everyone involved in collecting it has been wasted, producing nothing but operational delay and the associated losses.

The data that has been collected, on the other hand, is often only used by a designated data science team that is too overworked to go through everything that is available.

All for data, and data for all

The issues outlined here all play into the fact that, save for data pioneers like Google and Facebook, companies are still wrapping their heads around how to re-imagine themselves for the data-driven era. Data is pulled into huge databases, and data scientists are left with a lot of cleaning to do, while others, whose time was spent helping fetch the data, rarely benefit from it.

The truth is, we are still early in the data transformation. The success of tech giants that put data at the core of their business models has set off a movement that is only starting to take hold. And even though the results are mixed for now, this is a sign that companies have yet to master thinking with data.

Data holds much value, and businesses are very much aware of it, as shown by the appetite for AI experts at non-tech companies. Companies just have to go about it the right way, and one of the key tasks in this respect is to start focusing on people as much as on AI.

Data can enhance the operations of virtually any component within the organizational structure of any business. As tempting as it may be to think of a future where there is a machine learning model for every business process, we do not need to tread that far right now. The goal for any company looking to tap data today comes down to getting it from point A to point B. Point A is the part in the workflow where data is being collected, and point B is the person who needs this data for decision-making.

Importantly, point B does not have to be a data scientist. It could be a manager trying to figure out the optimal workflow design, an engineer looking for flaws in a manufacturing process or a UI designer doing A/B testing on a specific feature. All of these people must have the data they need at hand all the time, ready to be processed for insights.

People can thrive with data just as well as models, especially if the company invests in them and makes sure to equip them with basic analysis skills. In this approach, accessibility must be the name of the game.

Skeptics may claim that big data is nothing but an overused corporate buzzword, but advanced analytics capacities can enhance the bottom line for any company as long as it comes with a clear plan and appropriate expectations. The first step is to focus on making data accessible and easy to use and not on hauling in as much data as possible.

In other words, an all-around data culture is just as important for an enterprise as the data infrastructure.

Jul
26
2021
--

ActiveFence comes out of the shadows with $100M in funding and tech that detects online harm, now valued at $500M+

Online abuse, disinformation, fraud and other malicious content is growing and getting more complex to track. Today, a startup called ActiveFence is coming out of the shadows to announce significant funding on the back of a surge of large organizations using its services. ActiveFence has quietly built a tech platform to suss out threats as they are being formed and planned to make it easier for trust and safety teams to combat them on platforms.

The startup, co-headquartered in New York and Tel Aviv, has raised $100 million, funding that it will use to continue developing its tools and to continue expanding its customer base. To date, ActiveFence says that its customers include companies in social media, audio and video streaming, file sharing, gaming, marketplaces and other technologies — it has yet to disclose any specific names but says that its tools collectively cover “billions” of users. Governments and brands are two other categories that it is targeting as it continues to expand. It has been around since 2018 and is growing at around 100% annually.

The $100 million being announced today actually covers two rounds: Its most recent Series B led by CRV and Highland Europe, as well as a Series A it never announced led by Grove Ventures and Norwest Venture Partners. Vintage Investment Partners, Resolute Ventures and other unnamed backers also participated. It’s not disclosing valuation but I understand it’s over $500 million.

“We are very honored to be ActiveFence partners from the very earliest days of the company, and to be part of this important journey to make the internet a safer place and see their unprecedented success with the world’s leading internet platforms,” said Lotan Levkowitz, general partner at Grove Ventures, in a statement.

The increased presence of social media and online chatter on other platforms has put a strong spotlight on how those forums are used by bad actors to spread malicious content. ActiveFence’s particular approach is a set of algorithms that tap into innovations in AI, such as natural language processing, to map relationships between conversations. It crawls the obvious as well as the less obvious and harder-to-reach parts of the internet — some 3 million sources in all — to pick up on the chatter where a lot of malicious content and campaigns are typically born, before they become higher-profile issues. It’s built both on the concept of big data analytics and on the understanding that the long tail of content online has value if it can be tapped effectively.

“We take a fundamentally different approach to trust, safety and content moderation,” Noam Schwartz, the co-founder and CEO, said in an interview. “We are proactively searching the darkest corners of the web and looking for bad actors in order to understand the sources of malicious content. Our customers then know what’s coming. They don’t need to wait for the damage, or for internal research teams to identify the next scam or disinformation campaign. We work with some of the most important companies in the world, but even tiny, super niche platforms have risks.”

The insights that ActiveFence gathers are packaged up in an API whose output customers can feed into whatever other systems they use to track or mitigate traffic on their own platforms.

ActiveFence is not the only company building technology to help platform operators, governments and brands have a better picture of what is going on in the wider online world. Factmata has built algorithms to better understand and track sentiments online; Primer (which also recently raised a big round) also uses NLP to help its customers track online information, with its customers including government organizations that used its technology to track misinformation during election campaigns; Bolster (formerly called RedMarlin) is another.

Some of the bigger platforms have also gotten more proactive in bringing tracking technology and talent in-house: Facebook acquired Bloomsbury AI several years ago for this purpose; Twitter has acquired Fabula (and is working on bigger efforts like Birdwatch to build better tools), and earlier this year Discord picked up Sentropy, another online abuse tracker. In some cases, companies that more regularly compete against each other for eyeballs and dollars are even teaming up to collaborate on efforts.

Indeed, it may well be that ultimately there will exist multiple efforts and multiple companies doing good work in this area, not unlike other corners of the world of security, which might need more than one hammer thrown at problems to crack them. In this particular case, the growth of the startup to date, and its effectiveness in identifying early warning signs, is one reason investors have been interested in ActiveFence.

“We are pleased to support ActiveFence in this important mission,” commented Izhar Armony, a general partner at CRV, in a statement. “We believe they are ready for the next phase of growth and that they can maintain leadership in the dynamic and fast-growing trust and safety market.”

“ActiveFence has emerged as a clear leader in the developing online trust and safety category. This round will help the company to accelerate the growth momentum we witnessed in the past few years,” said Dror Nahumi, general partner at Norwest Venture Partners, in a statement.

Jul
12
2021
--

Quantexa raises $153M to build out AI-based big data tools to track risk and run investigations

As financial crime has become significantly more sophisticated, so too have the tools that are used to combat it. Now, Quantexa — one of the more interesting startups that has been building AI-based solutions to help detect and stop money laundering, fraud and other illicit activity — has raised a growth round of $153 million, both to continue expanding that business in financial services and to bring its tools into a wider context, so to speak: linking up the dots around all customer and other data.

“We’ve diversified outside of financial services and working with government, healthcare, telcos and insurance,” Vishal Marria, its founder and CEO, said in an interview. “That has been substantial. Given the whole journey that the market’s gone through in contextual decision intelligence as part of bigger digital transformation, [this] was inevitable.”

The Series D values the London-based startup between $800 million and $900 million on the heels of Quantexa growing its subscriptions revenues 108% in the last year.

Warburg Pincus led the round, with existing backers Dawn Capital, AlbionVC, Evolution Equity Partners (a specialist cybersecurity VC), HSBC, ABN AMRO Ventures and British Patient Capital also participating. The valuation is a significant hike up for Quantexa, which was valued between $200 million and $300 million in its Series C last July. It has now raised over $240 million to date.

Quantexa got its start out of a gap in the market that Marria identified when he was working as a director at Ernst & Young tasked with helping its clients with money laundering and other fraudulent activity. As he saw it, there were no truly useful systems in the market that efficiently tapped the world of data available to companies — matching up and parsing both their internal information as well as external, publicly available data — to get more meaningful insights into potential fraud, money laundering and other illegal activities quickly and accurately.

Quantexa’s machine learning system approaches that challenge as a classic big data problem — too much data for a human to parse on their own, but small work for AI algorithms processing huge amounts of that data for specific ends.

Its so-called “Contextual Decision Intelligence” models (the name Quantexa is meant to evoke “quantum” and “context”) were built initially specifically to address this for financial services, with AI tools for assessing risk and compliance and identifying financial criminal activity, leveraging relationships that Quantexa has with partners like Accenture, Deloitte, Microsoft and Google to help fill in more data gaps.

The company says its software — and this, not the data, is what is sold to companies to use over their own data sets — has handled up to 60 billion records in a single engagement. It then presents insights in the form of easily digestible graphs and other formats so that users can better understand the relationships between different entities and so on.
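
As a rough illustration of the general “single view of an entity” idea (and emphatically not Quantexa’s actual algorithms or API), records from different sources that share an identifier can be linked into a graph, with each connected component treated as one resolved entity. The records below are invented.

```python
# Hedged sketch: resolve records into entities by linking shared identifiers
# and treating each connected component of the graph as one "single view".
import networkx as nx

records = [
    {"id": "bank:001", "email": "j.doe@example.com", "phone": "+44 1111"},
    {"id": "kyc:784",  "email": "j.doe@example.com", "phone": None},
    {"id": "web:55",   "email": None,                "phone": "+44 1111"},
    {"id": "bank:002", "email": "a.n.other@example.com", "phone": "+44 2222"},
]

G = nx.Graph()
G.add_nodes_from(r["id"] for r in records)

# Connect any two records that share a non-empty identifier.
for i, a in enumerate(records):
    for b in records[i + 1:]:
        if any(a[k] and a[k] == b[k] for k in ("email", "phone")):
            G.add_edge(a["id"], b["id"])

# Each connected component is a candidate resolved entity.
for entity in nx.connected_components(G):
    print(sorted(entity))
# ['bank:001', 'kyc:784', 'web:55']
# ['bank:002']
```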

Today, financial services companies still make up about 60% of the company’s business, Marria said, with seven of the top 10 U.K. and Australian banks and six of the top 14 financial institutions in North America among its customers. (The list includes its strategic backer HSBC, as well as Standard Chartered Bank and Danske Bank.)

But alongside those — spurred by a huge shift in the market toward relying significantly more on wider data sets, by businesses updating their systems in recent years, and by the fact that, in the last year, online activity has in many cases become the “only” activity — Quantexa has expanded more significantly into other sectors.

“The Financial crisis [of 2007] was a tipping point in terms of how financial services companies became more proactive, and I’d say that the pandemic has been a turning point around other sectors like healthcare in how to become more proactive,” Marria said. “To do that you need more data and insights.”

So in the last year in particular, Quantexa has expanded to include other verticals facing financial crime, such as healthcare, insurance, government (for example in tax compliance) and telecoms/communications, but in addition to that, it has continued to diversify what it does to cover more use cases, such as building more complete customer profiles that can be used for KYC (know your customer) compliance or to serve them with more tailored products. Working with government, it’s also seeing its software getting applied to other areas of illicit activity, such as tracking and identifying human trafficking.

In all, Quantexa has “thousands” of customers in 70 markets. Quantexa cites figures from IDC that estimate the market for such services — both financial crime and more general KYC services — is worth about $114 billion annually, so there is still a lot more to play for.

“Quantexa’s proprietary technology enables clients to create single views of individuals and entities, visualized through graph network analytics and scaled with the most advanced AI technology,” said Adarsh Sarma, MD and co-head of Europe at Warburg Pincus, in a statement. “This capability has already revolutionized the way KYC, AML and fraud processes are run by some of the world’s largest financial institutions and governments, addressing a significant gap in an increasingly important part of the industry. The company’s impressive growth to date is a reflection of its invaluable value proposition in a massive total available market, as well as its continued expansion across new sectors and geographies.”

Interestingly, Marria admitted to me that the company has been approached by big tech companies and others that work with them as an acquisition target — no real surprises there — but longer term, he would like Quantexa to consider how it continues to grow on its own, with an independent future very much in his distant sights.

“Sure, an acquisition to the likes of a big tech company absolutely could happen, but I am gearing this up for an IPO,” he said.

Jun
25
2021
--

Edge Delta raises $15M Series A to take on Splunk

Seattle-based Edge Delta, a startup that is building a modern distributed monitoring stack that is competing directly with industry heavyweights like Splunk, New Relic and Datadog, today announced that it has raised a $15 million Series A funding round led by Menlo Ventures and Tim Tully, the former CTO of Splunk. Previous investors MaC Venture Capital and Amity Ventures also participated in this round, which brings the company’s total funding to date to $18 million.

“Our thesis is that there’s no way that enterprises today can continue to analyze all their data in real time,” said Edge Delta co-founder and CEO Ozan Unlu, who has worked in the observability space for about 15 years already (including at Microsoft and Sumo Logic). “The way that it was traditionally done with these primitive, centralized models — there’s just too much data. It worked 10 years ago, but gigabytes turned into terabytes and now terabytes are turning into petabytes. That whole model is breaking down.”

He acknowledges that traditional big data warehousing works quite well for business intelligence and analytics use cases. But that’s not real time, and it involves moving a lot of data from where it’s generated to a centralized warehouse. The promise of Edge Delta is that it can offer all of the capabilities of this centralized model by allowing enterprises to start analyzing their logs, metrics, traces and other telemetry right at the source. This, in turn, also gives them visibility into all of the data generated there, unlike many of today’s systems, which only provide insights into a small slice of this information.

While competing services tend to have agents that run on a customer’s machine but typically only compress the data, encrypt it and then send it on to its final destination, Edge Delta’s agent starts analyzing the data right at the local level. With that, if you want to, for example, graph error rates from your Kubernetes cluster, you wouldn’t have to gather all of this data and send it off to your data warehouse, where it has to be indexed before it can be analyzed and graphed.

With Edge Delta, you could instead have every single node draw its own graph, which Edge Delta can then combine later on. With this, Edge Delta argues, its agent is able to offer significant performance benefits, often by orders of magnitude. This also allows businesses to run their machine learning models at the edge.
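
A simplified Python sketch of that pattern (not Edge Delta’s actual agent) might look like this: each node summarizes its own logs locally, and only the small summaries are shipped and merged into a cluster-wide error rate. The log lines are made up.

```python
# Hedged sketch of edge-side aggregation: summarize locally, merge centrally,
# instead of forwarding every raw log line to a central warehouse.
def summarise_locally(log_lines):
    """Runs on each node; returns counts, not raw logs."""
    errors = sum(1 for line in log_lines if " ERROR " in line)
    return {"total": len(log_lines), "errors": errors}

def merge(node_summaries):
    """Runs centrally; combines per-node counts into a cluster-wide error rate."""
    total = sum(s["total"] for s in node_summaries)
    errors = sum(s["errors"] for s in node_summaries)
    return errors / total if total else 0.0

node_a = ["2021-06-25T10:00 INFO ok", "2021-06-25T10:01 ERROR boom"]
node_b = ["2021-06-25T10:00 INFO ok", "2021-06-25T10:02 INFO ok"]

print(merge([summarise_locally(node_a), summarise_locally(node_b)]))  # 0.25
```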

“What I saw before I was leaving Splunk was that people were sort of being choosy about where they put workloads for a variety of reasons, including cost control,” said Menlo Ventures’ Tim Tully, who joined the firm only a couple of months ago. “So this idea that you can move some of the compute down to the edge and lower latency and do machine learning at the edge in a distributed way was incredibly fascinating to me.”

Edge Delta is able to offer a significantly cheaper service, in large part because it doesn’t have to run a lot of compute and manage huge storage pools itself since a lot of that is handled at the edge. And while the customers obviously still incur some overhead to provision this compute power, it’s still significantly less than what they would be paying for a comparable service. The company argues that it typically sees about a 90 percent improvement in total cost of ownership compared to traditional centralized services.

Edge Delta charges based on volume, and it is not shy about comparing its prices with Splunk’s, doing so right on its pricing calculator. Indeed, in talking to Tully and Unlu, Splunk was clearly on everybody’s mind.

“There’s kind of this concept of unbundling of Splunk,” Unlu said. “You have Snowflake and the data warehouse solutions coming in from one side, and they’re saying, ‘hey, if you don’t care about real time, go use us.’ And then we’re the other half of the equation, which is: actually there’s a lot of real-time operational use cases and this model is actually better for those massive stream processing datasets that you required to analyze in real time.”

But despite this competition, Edge Delta can still integrate with Splunk and similar services. Users can still take their data, ingest it through Edge Delta and then pass it on to the likes of Sumo Logic, Splunk, AWS’s S3 and other solutions.

“If you follow the trajectory of Splunk, we had this whole idea of building this business around IoT and Splunk at the Edge — and we never really quite got there,” Tully said. “I think what we’re winding up seeing collectively is the edge actually means something a little bit different. […] The advances in distributed computing and sophistication of hardware at the edge allows these types of problems to be solved at a lower cost and lower latency.”

The Edge Delta team plans to use the new funding to expand its team and support all of the new customers that have shown interest in the product. For that, it is building out its go-to-market and marketing teams, as well as its customer success and support teams.

 

Jun
24
2021
--

Firebolt raises $127M more for its new approach to cheaper and more efficient Big Data analytics

Snowflake changed the conversation for many companies when it comes to the potential of data warehousing. Now one of the startups hoping to disrupt the disruptor is announcing a big round of funding to expand its own business.

Firebolt, which has built a new kind of cloud data warehouse that promises much more efficient, and cheaper, analytics around whatever is stored within it, is announcing a major Series B of $127 million on the heels of huge demand for its services.

The company, which only came out of stealth mode in December, is not disclosing its valuation with this round, which brings the total raised by the Israeli company to $164 million. New backers Dawn Capital and K5 Global are in this round, alongside previous backers Zeev Ventures, TLV Partners, Bessemer Venture Partners and Angular Ventures.

Nor is it disclosing many details about its customers at the moment. CEO and co-founder Eldad Farkash told me in an interview that most of them are U.S.-based, and that the numbers have grown from the dozen or so that were using Firebolt when it was still in stealth mode (it worked quietly for a couple of years building its product and onboarding customers before finally launching six months ago). They are all migrating from existing data warehousing solutions like Snowflake or BigQuery. In other words, its customers are already cloud-native, Big Data companies: it’s not trying to proselytize on the basic concept but work with those who are already in a specific place as a business.

“If you’re not using Snowflake or BigQuery already, we prefer you come back to us later,” he said. Judging by the size and quick succession of the round, that focus is paying off.

The challenge that Firebolt set out to tackle is that while data warehousing has become a key way for enterprises to analyze, update and manage their big data stores — after all, your data is only as good as the tools you have to parse it and keep it secure — typically data warehousing solutions are not efficient, and they can cost a lot of money to maintain.

The challenge was seen firsthand by Firebolt’s three founders, Farkash (CEO), Saar Bitner (COO) and Ariel Yaroshevich (CTO), at a previous company, the business intelligence powerhouse Sisense, where Farkash was one of the co-founders and Bitner and Yaroshevich were members of the founding team. At Sisense, the company continually came up against an issue: when dealing in terabytes of data, cloud data warehouses strained to deliver good performance to power its analytics and other tools, and the only way to mitigate that was to pile on more cloud capacity. And that started to become very expensive.

Firebolt set out to fix that by taking a different approach, rearchitecting the concept. As Farkash sees it, while data warehousing has indeed been a big breakthrough in Big Data, it has started to feel like a dated solution as data troves have grown.

“Data warehouses are solving yesterday’s problem, which was, ‘How do I migrate to the cloud and deal with scale?’” he told me back in December. Google’s BigQuery, Amazon’s RedShift and Snowflake are fitting answers for that issue, he believes, but “we see Firebolt as the new entrant in that space, with a new take on design on technology. We change the discussion from one of scale to one of speed and efficiency.”

The startup claims that its SQL-based system delivers performance up to 182 times faster than that of other data warehouses. It builds on academic research, not previously applied in practice, into handling data in a lighter way, with new techniques for compression and for how data is parsed. Data lakes, in turn, can be connected to a wider data ecosystem, and what this translates to is a much smaller requirement for cloud capacity, and lower costs.

Fast forward to today, and the company says the concept is gaining a lot of traction with engineers and developers in areas like business intelligence, customer-facing services that need to parse a lot of information to serve it to users in real time, and back-end data applications. That is proving out the shift investors suspected was coming even before the startup launched, stealthily or otherwise.

“I’ve been an investor at Firebolt since their Series A round and before they had any paying customers,” said Oren Zeev of Zeev Ventures. “What had me invest in Firebolt is mostly the team. A group of highly experienced executives mostly from the big data space who understand the market very well, and the pain organizations are experiencing. In addition, after speaking to a few of my portfolio companies and Firebolt’s initial design partners, it was clear that Firebolt is solving a major pain, so all in all, it was a fairly easy decision. The market in which Firebolt operates is huge if you consider the valuations of Snowflake and Databricks. Even more importantly, it is growing rapidly as the migration from on-premise data warehouse platforms to the cloud is gaining momentum, and as more and more companies rely on data for their operations and are building data applications.”

Jun
17
2021
--

Transform launches with $24.5M in funding for a tool to query and build metrics out of data troves

The biggest tech companies have put a lot of time and money into building tools and platforms for their data science teams and those who work with them to glean insights and metrics out of the masses of data that their companies produce: how a company is performing, how a new feature is working, when something is broken, or when something might be selling well (and why) are all things you can figure out if you know how to read the data.

Now, three alums that worked with data in the world of big tech have founded a startup that aims to build a “metrics store” so that the rest of the enterprise world — much of which lacks the resources to build tools like this from scratch — can easily use metrics to figure things out like this, too.

Transform, as the startup is called, is coming out of stealth today, and it’s doing so with an impressive amount of early backing — a sign not just of investor confidence in these particular founders, but also the recognition that there is a gap in the market for, as the company describes it, a “single source of truth for business data” that could be usefully filled.

The company is announcing that it has closed, while in stealth, a Series A of $20 million, and an earlier seed round of $4.5 million — both led by Index Ventures and Redpoint Ventures. The seed, the company said, also had dozens of angel investors, with the list including Elad Gil of Color Genomics, Lenny Rachitsky of Airbnb and Cristina Cordova of Notion.

The big breakthrough that Transform has made is that it has built a metrics engine that a company can apply to its structured data — a tool similar to what big tech companies have built for their own use, but that hasn’t really been available (at least until now) to companies outside that group.

Transform can work with vast troves of data from the warehouse, or data that is being tracked in real time, to generate insights and analytics about different actions around a company’s products. Transform can be used and queried by non-technical people who still have to deal with data, Handel said.

The impetus for building the product came to Nick Handel, James Mayfield and Paul Yang — respectively Transform’s CEO, COO and software engineer — when they all worked together at Airbnb (previously Mayfield and Yang were also at Facebook together) in a mix of roles that included product management and engineering.

There, they could see first-hand both the promise that data held for making decisions around a product, measuring how something is used or planning future features, and the demands of harnessing it to work and getting everyone on the same page to do so.

“There is a growing trend among tech companies to test every single feature, every single iteration of whatever. And so as a part of that, we built this tool [at Airbnb] that basically allowed you to define the various metrics that you wanted to track to understand your experiment,” Handel recalled in an interview. “But you also want to understand so many other things like, how many people are searching for listings in certain areas? How many people are instantly booking those listings? Are they contacting customer service, are they having trust and safety issues?” The tool Airbnb built was Minerva, optimised specifically for the kinds of questions Airbnb might typically have for its own data.

“By locking down all of the definitions for the metrics, you could basically have a data engineering team, a centralized data infrastructure team, do all the calculation for these metrics, and then serve those to the data scientists to then go in and do kind of deeper, more interesting work, because they weren’t bogged down in calculating those metrics over and over,” he continued. This platform evolved within Airbnb. “We were really inspired by some of the early work that we saw happen on this tool.”

The issue is that not every company is built to, well, build tools like these tailored to whatever their own business interests might be.

“There’s a handful of companies who do similar things in the metrics space,” Mayfield said, “really top flight companies like LinkedIn, Airbnb and Uber. They have really started to invest in metrics. But it’s only those companies that can devote teams of eight or 10 engineers, designers who can build those things in house. And I think that was probably, you know, a big part of the impetus for wanting to start this company was to say, not every organization is going to be able to devote eight or 10 engineers to building this metrics tool.”

And the other issue is that metrics have become an increasingly important — maybe the most important — lever for decision making in the world of product design and wider business strategy for a tech (and maybe by default, any) company.

We have moved away from “move fast and break things.” Instead, we now embrace — as Mayfield put it — “If you can’t measure it, you can’t move it.”

Transform is built around three basic priorities, Handel said.

The first of these has to do with collective ownership of metrics: by building a single framework for measuring these and identifying them, their theory is that it’s easier for a company to all get on the same page with using them. The second of these is to use Transform to simply make the work of the data team more efficient and easier, by turning the most repetitive parts of extracting insights into automated scripts that can be used and reused, giving the data team the ability to spend more time analyzing the data rather than just building datasets. And third of all, to provide customers with APIs that they can use to embed the metric-extracting tools into other applications, whether in business intelligence or elsewhere.

The three products it’s introducing today, called Metrics Framework, Metrics Catalog and Metrics API, follow from these principles.
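
To make the metrics-store idea concrete, here is a hedged, minimal Python sketch of “define a metric once, reuse it everywhere”; the metric and data are invented, and this is not Transform’s actual Metrics Framework or API.

```python
# Hedged sketch of a centrally defined metric that every team computes the
# same way, rather than re-deriving it ad hoc in each analysis.
from dataclasses import dataclass
from typing import Callable
import pandas as pd

@dataclass
class Metric:
    name: str
    description: str
    compute: Callable[[pd.DataFrame], float]

# The definition lives in one place, so the calculation is always consistent.
nights_per_booking = Metric(
    name="avg_nights_per_booking",
    description="Average nights booked per confirmed booking",
    compute=lambda df: df.loc[df["status"] == "confirmed", "nights"].mean(),
)

bookings = pd.DataFrame({
    "status": ["confirmed", "cancelled", "confirmed"],
    "nights": [3, 2, 5],
})

print(nights_per_booking.name, nights_per_booking.compute(bookings))  # 4.0
```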

Transform is only really launching publicly today, but Handel said that it’s already working with a small handful of customers (unnamed) in a small beta, enough to be confident that what it’s built works as it was intended. The funding will be used to continue building out the product as well as bring on more talent and hopefully onboard more businesses to using it.

“Hopefully” might be a more tenuous word than its investors would use; they are convinced that it’s filling a strong need in the market.

“Transform is filling a critical gap within the industry. Just as we invested in Looker early on for its innovative approach to business intelligence, Transform takes it one step further by providing a powerful yet streamlined single source of truth for metrics,” said Tomasz Tunguz, MD at Redpoint Ventures, in a statement.

“We’ve seen companies across the globe struggle to make sense of endless data sources or turn them into actionable, trusted metrics. We invested in Transform because they’ve developed an elegant solution to this problem that will change how companies think about their data,” added Shardul Shah, a partner at Index Ventures.

Jun
02
2021
--

With buyout, Cloudera hunts for relevance in a changing market

When Cloudera announced its sale to a pair of private equity firms yesterday for $5.3 billion, along with a couple of acquisitions of its own, the company detailed a new path that could help it drive back towards relevance in the big data market.

When the company launched in 2008, Hadoop was in its early days. The open source project developed at Yahoo three years earlier was built to deal with the large amounts of data that the Internet pioneer generated. It became increasingly clear over time that every company would have to deal with growing data stores, and it seemed that Cloudera was in the right market at the right time.

And for a while things went well. Cloudera rode the Hadoop startup wave, garnering a cool billion in funding along the way, including a stunning $740 million check from Intel Capital in 2014. It then went public in 2017 to much fanfare.

But the markets had already started to shift by the time of its public debut. Hadoop, a highly labor-intensive way to manage data, was being supplanted by cheaper and less complex cloud-based solutions.

“The excitement around the original promise of the Hadoop market has contracted significantly. It’s incredibly expensive and complex to get it working effectively in an enterprise context,” Casey Aylward, an investor at Costanoa Ventures, told TechCrunch.

The company likely saw that writing on the wall when it merged with another Hadoop-based company, Hortonworks, in 2019. That transaction valued the combined entity at $5.2 billion, almost the same amount it sold for yesterday, two years down the road. The decision to sell and go private may also have been spurred by Carl Icahn buying an 18% stake in the company that same year.

Looking to the future, Cloudera’s sale could give the enterprise unicorn some breathing room as it regroups.

Patrick Moorhead, founder and principal analyst at Moor Insights & Strategy, sees the deal as a positive step for the company. “I think this is good news for Cloudera because it now has the capital and flexibility to dive head first into SaaS. The company invented the entire concept of a data life cycle, implemented initially on premises, then extended to private and public clouds,” Moorhead said.

Adam Ronthal, Gartner Research VP, agrees that it at least gives Cloudera more room to make necessary adjustments to its market strategy, as long as it doesn’t get stifled by its private equity overlords. “It should give Cloudera an opportunity to focus on their future direction with increased flexibility — provided they are able to invest in that future and that this does not just focus on cost cutting and maximizing profits. Maintaining a culture of innovation will be key,” Ronthal said.

Which brings us to the two purchases Cloudera also announced as part of its news package.

If you want to change direction in a hurry, there are worse ways than via acquisitions. And grabbing Datacoral and Cazena should help Cloudera alter its course more quickly than it could have managed on its own.

“[The] two acquisitions will help Cloudera capture some of the value on top of the lake storage layer — perhaps moving into different data management features and/or expanding into the compute layer for analytics and AI/ML use cases, where there has been a lot of growth and excitement in recent years,” Aylward said.

Chandana Gopal, research director for the Future of Intelligence at IDC, agrees that the transactions give Cloudera some more modern options that could help speed up the data wrangling process. “Both the acquisitions are geared towards making the management of cloud infrastructure easier for end-users. Our research shows that data prep and integration takes 70%-80% of an analyst’s time versus the time spent in actual analysis. It seems like both these companies’ products will provide technology to improve the data integration/preparation experience,” she said.

The company couldn’t stay on the path it was on forever, certainly not with an activist investor breathing down its neck. Its recent efforts could give it the time away from public markets it needs to regroup. How successful Cloudera’s turnaround proves to be will depend on whether the private equity firms buying it can agree on a direction and strategy for the company while providing the resources needed to push it in a new direction. All of that and more will determine if these moves pay off in the end.

May
20
2021
--

How to ensure data quality in the era of Big Data

A little over a decade has passed since The Economist warned us that we would soon be drowning in data. The modern data stack has emerged as a proposed life-jacket for this data flood — spearheaded by Silicon Valley startups such as Snowflake, Databricks and Confluent.

Today, any entrepreneur can sign up for BigQuery or Snowflake and have a data solution that can scale with their business in a matter of hours. The emergence of cheap, flexible and scalable data storage solutions was largely a response to changing needs spurred by the massive explosion of data.

Currently, the world produces 2.5 quintillion bytes of data daily (there are 18 zeros in a quintillion). The explosion of data continues in the roaring ‘20s, both in terms of generation and storage — the amount of stored data is expected to continue to double at least every four years. However, one integral part of modern data infrastructure still lacks solutions suitable for the Big Data era and its challenges: Monitoring of data quality and data validation.
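
To give a flavor of what such monitoring involves, here is a minimal, hypothetical Python sketch of automated data-quality checks; the dataset, column names and thresholds are invented, and real validation tooling goes far beyond this.

```python
# Hedged sketch of basic data-quality validation: flag duplicates, excessive
# missing values and out-of-range values before the data is used downstream.
import pandas as pd

def validate(df):
    issues = []
    if df["order_id"].duplicated().any():
        issues.append("duplicate order_id values")
    if df["amount"].isna().mean() > 0.01:  # tolerate at most 1% missing amounts
        issues.append("too many missing amounts")
    if (df["amount"] < 0).any():
        issues.append("negative amounts found")
    return issues

orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount": [10.0, None, 25.0, -4.0],
})

print(validate(orders))
# ['duplicate order_id values', 'too many missing amounts', 'negative amounts found']
```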

Let me go through how we got here and the challenges ahead for data quality.

The value vs. volume dilemma of Big Data

In 2005, Tim O’Reilly published his groundbreaking article “What is Web 2.0?”, truly setting off the Big Data race. The same year, Roger Mougalas from O’Reilly introduced the term “Big Data” in its modern context — referring to a large set of data that is virtually impossible to manage and process using traditional BI tools.

Back in 2005, one of the biggest challenges with data was managing large volumes of it, as data infrastructure tooling was expensive and inflexible, and the cloud market was still in its infancy (AWS didn’t publicly launch until 2006). The other was speed: As Tristan Handy from Fishtown Analytics (the company behind dbt) notes, before Redshift launched in 2012, performing relatively straightforward analyses could be incredibly time-consuming even with medium-sized data sets. An entire data tooling ecosystem has since been created to mitigate these two problems.

The emergence of the modern data stack (example logos and categories). Image Credits: Validio

Scaling relational databases and data warehouse appliances used to be a real challenge. Only 10 years ago, a company that wanted to understand customer behavior had to buy and rack servers before its engineers and data scientists could work on generating insights. Data and its surrounding infrastructure was expensive, so only the biggest companies could afford large-scale data ingestion and storage.

The challenge before us is to ensure that the large volumes of Big Data are of sufficiently high quality before they’re used.

Then came a (Red)shift. In October 2012, AWS presented the first viable solution to the scale challenge with Redshift — a cloud-native, massively parallel processing (MPP) database that anyone could use for a monthly price of a pair of sneakers ($100) — about 1,000x cheaper than the previous “local-server” setup. With a price drop of this magnitude, the floodgates opened and every company, big or small, could now store and process massive amounts of data and unlock new opportunities.

As Jamin Ball from Altimeter Capital summarizes, Redshift was a big deal because it was the first cloud-native OLAP warehouse and reduced the cost of owning an OLAP database by orders of magnitude. The speed of processing analytical queries also increased dramatically. And later on (Snowflake pioneered this), they separated computing and storage, which, in overly simplified terms, meant customers could scale their storage and computing resources independently.

What did this all mean? An explosion of data collection and storage.

Apr
30
2021
--

Analytics as a service: Why more enterprises should consider outsourcing

With an increasing number of enterprise systems, growing teams, a proliferating web presence and multiple digital initiatives, companies of all sizes are creating loads of data every day. This data holds excellent business insights and immense opportunities, but its sheer volume has made it impossible for companies to derive actionable insights from it consistently.

According to Verified Market Research, the analytics-as-a-service (AaaS) market is expected to grow to $101.29 billion by 2026. Organizations that have not started on their analytics journey or are spending scarce data engineer resources to resolve issues with analytics implementations are not identifying actionable data insights. Through AaaS, managed services providers (MSPs) can help organizations get started on their analytics journey immediately without extravagant capital investment.

MSPs can take ownership of the company’s immediate data analytics needs, resolve ongoing challenges and integrate new data sources to manage dashboard visualizations, reporting and predictive modeling — enabling companies to make data-driven decisions every day.

AaaS could come bundled with multiple business-intelligence-related services. Primarily, the service includes (1) services for data warehouses; (2) services for visualizations and reports; and (3) services for predictive analytics, artificial intelligence (AI) and machine learning (ML). When a company partners with an MSP for analytics as a service, it is able to tap into business intelligence easily, instantly and at a lower cost of ownership than doing it in-house. This empowers the enterprise to focus on delivering better customer experiences, make decisions unencumbered and build data-driven strategies.

In today’s world, where customers value experiences over transactions, AaaS helps businesses dig deeper into the customer psyche and tap insights to build long-term winning strategies. It also enables enterprises to forecast and predict business trends by looking at their data and allows employees at every level to make informed decisions.
