Sep 16, 2021

Confluent CEO Jay Kreps is coming to TC Sessions: SaaS for a fireside chat

As companies process ever-increasing amounts of data, moving it in real time is a huge challenge for organizations. Confluent is a streaming data platform built on top of the open source Apache Kafka project that’s been designed to process massive numbers of events. To discuss this, and more, Confluent CEO and co-founder Jay Kreps will be joining us at TC Sessions: SaaS on Oct 27th for a fireside chat.

Data is a big part of the story we are telling at the SaaS event, as it has such a critical role in every business. Kreps has said in the past that data streams are at the core of every business, from sales to orders to customer experiences. As he wrote in a blog post announcing the company’s $250 million Series E in April 2020, Confluent is working to process all of this data in real time — and that was a big reason why investors were willing to pour so much money into the company.

“The reason is simple: though new data technologies come and go, event streaming is emerging as a major new category that is on a path to be as important and foundational in the architecture of a modern digital company as databases have been,” Kreps wrote at the time.

The company’s streaming data platform takes a multi-faceted approach to streaming and builds on the open source Kafka project. While anyone can download and use Kafka, as with many open source projects, companies may lack the resources or expertise to deal with the raw open source code. Many a startup has been built on open source to help simplify whatever the project does, and Confluent and Kafka are no different.
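
To make the core idea concrete, here is a minimal sketch of publishing an event with the confluent-kafka Python client, the same kind of event-streaming workload described above; the broker address, topic name and payload are illustrative assumptions, not Confluent specifics.

```python
from confluent_kafka import Producer

# Assumed local broker; a Confluent Cloud cluster would use its
# bootstrap endpoint plus API-key credentials instead.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Called once per message with the broker's acknowledgment or an error.
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()}[{msg.partition()}]")

# Publish an order event; the key controls partitioning, the value is the payload.
producer.produce("orders", key="order-1001", value='{"amount": 42.0}',
                 callback=on_delivery)
producer.flush()  # block until outstanding messages are acknowledged
```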

Kreps told us in 2017 that companies using Kafka as a core technology include Netflix, Uber, Cisco and Goldman Sachs. But those companies have the resources to manage complex software like this. Mere mortal companies can pay Confluent to access a managed cloud version, or they can manage it themselves, installing it on the cloud infrastructure provider of their choice.

The project was actually born at LinkedIn in 2011, when its engineers were tasked with building a tool to process the enormous number of events flowing through the platform. The company eventually open sourced the technology it had created, and Apache Kafka was born.

Confluent launched in 2014 and raised over $450 million along the way. In its last private round in April 2020, the company scored a $4.5 billion valuation on a $250 million investment. As of today, it has a market cap of over $17 billion.

In addition to our discussion with Kreps, the conference will also include Google’s Javier Soltero, Amplitude’s Olivia Rose, as well as investors Kobie Fuller and Casey Aylward, among others. We hope you’ll join us. It’s going to be a thought-provoking lineup.

Buy your pass now to save up to $100 when you book by October 1. We can’t wait to see you in October!


Sep 16, 2021

Tyk raises $35M for its open source, open-ended approach to enterprise API management

APIs are the grease turning the gears and wheels for many organizations’ IT systems today, but as APIs grow in number and use, tracking how they work (or don’t work) together can become complex and potentially critical if something goes awry. Now, a startup that has built an innovative way to help with this is announcing some funding after getting traction with big enterprises adopting its approach.

Tyk, which has built a way for users to access and manage multiple internal enterprise APIs through a universal interface by way of GraphQL, has picked up $35 million, an investment that it will be using both for hiring and to continue enhancing and expanding the tools that it provides to users. Tyk has coined a term describing its approach to managing APIs and the data they produce — “universal data graph” — and today its tools are being used to manage APIs by some 10,000 businesses, including large enterprises like Starbucks, Societe Generale and Domino’s.
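
As a sketch of what a single GraphQL entry point over several internal APIs can look like, the query below asks one hypothetical gateway for customer data that, behind the scenes, would be resolved by separate services; the URL, schema fields and auth header are all invented for illustration, not Tyk’s actual interface.

```python
import requests

# Hypothetical gateway endpoint exposing a unified GraphQL schema.
GATEWAY = "https://gateway.example.com/graphql"

# One query; each top-level field could be resolved by a different
# internal service stitched together behind the gateway.
query = """
{
  customer(id: "42") {   # e.g. resolved by a CRM service
    name
    orders {             # e.g. resolved by a separate orders API
      id
      total
    }
  }
}
"""

resp = requests.post(
    GATEWAY,
    json={"query": query},
    headers={"Authorization": "Bearer <api-key>"},  # placeholder token
)
resp.raise_for_status()
print(resp.json()["data"])
```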

Scottish Equity Partners led the round, with participation also from MMC Ventures — its sole previous investor from a round in 2019 after bootstrapping for its first five years. The startup is based out of London but works in a very distributed way — one of the co-founders currently lives in New Zealand — and it will be hiring and growing based on that principle, too. It has raised just over $40 million to date.

Tyk (pronounced like “tyke”, meaning small/lively child) got its start as an open source side project first for co-founder Martin Buhr, who is now the company’s CEO, while he was working elsewhere, as a “load testing thing,” in his words.

The shifts in IT toward service-oriented architectures, and building and using APIs to connect internal apps, led him to rethink the code and consider how it could be used to control APIs. Added to that was the fact that, as far as Buhr could see, the API management platforms on the market at the time — some of the big names today include Kong, Apigee (now a part of Google), 3scale (now a part of Red Hat and thus IBM) and MuleSoft (now a part of Salesforce) — were not flexible enough for his needs. “So I built my own,” he said.

It was built as an open source tool, and some engineers at other companies started to use it. As it got more attention, some of the bigger companies interested in using it started to ask why he wasn’t charging for anything — as sure a sign as any that there was probably a business to be built here, and more credibility to come if he charged for it.

“So we made the gateway open source, and the management part went into a licensing model,” he said. And Tyk was born as a startup, co-founded with James Hirst, now the company’s COO, who had worked with Buhr at a digital agency some years before.

The key motivation behind building Tyk has remained its unique selling point for customers working in increasingly complex environments.

“What sparked interest in Tyk was that companies were unhappy with API management as it exists today,” Buhr noted, citing architectures using multiple clouds and multiple containers, creating more complexity that needed better management. “It was just the right time when containerization, Kubernetes and microservices were on the rise… The way we approach the multi-data and multi-vendor cloud model is super flexible and resilient to partitions, in a way that others have not been able to do.”

“You engage developers and deliver real value and it’s up to them to make the choice,” added Hirst. “We are responding to a clear shift in the market.”

One of the next frontiers Tyk will tackle is what happens within the management layer, specifically when there are potential conflicts between APIs.

“When a team using a microservice makes a breaking change, we want to bring that up and report that to the system,” Buhr said. “The plan is to flag the issue and test against it, and be able to say that a schema won’t work, and to identify why.”
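
The mechanics of flagging a breaking change can be pictured with a toy sketch like the one below: compare the field sets of two versions of a service’s schema and report anything that disappeared. This is purely illustrative Python, not Tyk’s implementation.

```python
# Toy illustration: detect a breaking change by comparing the field
# sets of two versions of a service's schema.
old_schema = {"Customer": {"id", "name", "email"}}
new_schema = {"Customer": {"id", "name"}}  # "email" was removed

def breaking_changes(old, new):
    changes = []
    for type_name, old_fields in old.items():
        removed = old_fields - new.get(type_name, set())
        changes.extend(f"{type_name}.{f} removed" for f in sorted(removed))
    return changes

print(breaking_changes(old_schema, new_schema))
# ['Customer.email removed'] -> flag it and test against it before rollout
```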

Even before that is rolled out, though, Tyk’s customer list and its growth speak to a business on the cusp of a lot more.

“Martin and James have built a world-class team and the addition of this new capital will enable Tyk to accelerate the growth of its API management platform, particularly around the GraphQL focused Universal Data Graph product that launched earlier this year,” said Martin Brennan, a director at SEP, in a statement. “We are pleased to be supporting the team to achieve their global ambitions.”

Keith Davidson, a partner at SEP, is joining the Tyk board as a non-executive director with this round.

Sep 10, 2021

Amagi tunes into $100M for cloud-based video content creation, monetization

Media technology company Amagi announced Friday a $100 million raise to further develop its cloud-based SaaS technology for broadcast and connected televisions.

Accel, Avataar Ventures and Norwest Venture Partners joined existing investor Premji Invest in the funding round, which included buying out stakes held by Emerald Media and Mayfield Fund. Nadathur Holdings continues as an existing investor. The latest round brings Amagi’s total funding raised to date to $150 million, Baskar Subramanian, co-founder and CEO of Amagi, told TechCrunch.

Bangalore-based Amagi provides cloud broadcast and targeted advertising software so that customers can create and monetize content for distribution via broadcast TV and streaming TV platforms like The Roku Channel, Samsung TV Plus and Pluto TV. The company already supports more than 2,000 channels on its platform across over 40 countries.

“Video is a complex technology to manage — there are large files and a lot of computing,” Subramanian said. “What Amagi does is enable a content owner with zero technology knowledge to simplify that complex workflow and scalable infrastructure. We want to make it easy to plug in and start targeting and monetizing advertising.”

As a result, Amagi customers see average operational cost savings of up to 40% compared with traditional delivery models, while their ad impressions grow between five and 10 times.

The new funding comes at a time when the company is experiencing rapid growth. For example, Amagi grew 30 times in the United States alone over the past few years, Subramanian said. Amagi commands an audience of over 2 billion people, and the U.S. is its largest market. The company also sees growth potential in both Latin America and Europe.

In addition, in the last year revenue grew 136%, while new customer growth was 44% year over year, including NBCUniversal — Subramanian said the Tokyo Olympics were run on Amagi’s platform for NBC, USA Today and ABS-CBN.

As more video content shifts toward connected television experiences, which he said is a $50 billion market, the company plans to use the new funding for sales expansion, R&D investment in its product pipeline and potential M&A opportunities. The company has not made any acquisitions yet, Subramanian added.

In addition to the broadcast operations in New Delhi, Amagi also has an innovation center in Bangalore and offices in New York, Los Angeles and London.

“Consumer behavior and infrastructure needs have reached a critical mass and new companies are bringing in the next generation of media, and we are a large part of that growth,” Subramanian said. “Sports will come on quicker, while live news and events are going to be one of the biggest growth areas.”

Shekhar Kirani, partner at Accel, said Amagi is taking a unique approach to enterprise SaaS due to that $50 billion industry shift happening in video content, where he sees half of the spend moving to connected television platforms quickly.

Some of the legacy players, like Viacom and NBCUniversal, have created their own streaming platforms, a space where Netflix and Amazon have also been leading, but not many SaaS companies are enabling the transition, he said.

When Kirani met Subramanian five years ago, Amagi was already well funded, but Kirani was excited about the platform and wanted to help the company scale. He believes the company has a long tailwind because it is saving people time and enabling new content providers to move faster to get their content distributed.

“Amagi is creating a new category and will grow fast,” Kirani added. “They are already growing and doubling each year with phenomenal SaaS metrics because they are helping content providers to connect to any audience.”

 

Sep 08, 2021

Real-time database platform SingleStore raises $80M more, now at a $940M valuation

Organizations are swimming in data these days, and so solutions to help manage and use that data in more efficient ways will continue to see a lot of attention and business. In the latest development, SingleStore — which provides a platform to enterprises to help them integrate, monitor and query their data as a single entity, regardless of whether that data is stored in multiple repositories — is announcing another $80 million in funding, money that it will be using to continue investing in its platform, hiring more talent and overall business expansion. Sources close to the company tell us that the company’s valuation has grown to $940 million.

The round, a Series F, is being led by Insight Partners, with new investor Hewlett Packard Enterprise, and previous backers Khosla Ventures, Dell Technologies Capital, Rev IV, Glynn Capital and GV (formerly Google Ventures) also participating. The startup has to date raised $264 million, including most recently an $80 million Series E last December, just on the heels of rebranding from MemSQL.

The fact that there are three major strategic investors in this Series F — HPE, Dell and Google — may say something about the traction that SingleStore is seeing, but so too do its numbers: 300%+ increase in new customer acquisition for its cloud service and 150%+ year-over-year growth in cloud.

Raj Verma, SingleStore’s CEO, said in an interview that its cloud revenues have grown by 150% year over year and now account for some 40% of all revenues (up from 10% a year ago). New customer numbers, meanwhile, have grown by over 300%.

“The flywheel is now turning around,” Verma said. “We didn’t need this money. We’ve barely touched our Series E. But I think there has been a general sentiment among our board and management that we are now ready for the prime time. We think SingleStore is one of the best-kept secrets in the database market. Now we want to aggressively be an option for people looking for a platform for intensive data applications or if they want to consolidate databases to one from three, five or seven repositories. We are where the world is going: real-time insights.”
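
One practical detail behind that pitch: SingleStore is wire-compatible with MySQL, so a stock Python MySQL client can serve both the transactional and the analytical side from one database. The sketch below assumes an existing events table; the host, credentials and column names are invented.

```python
import pymysql  # SingleStore speaks the MySQL wire protocol

# Illustrative connection details; an `events` table is assumed to exist.
conn = pymysql.connect(host="singlestore.example.com", port=3306,
                       user="app", password="***", database="analytics")

with conn.cursor() as cur:
    # An operational write and an analytical read against the same database:
    # the "transactions plus analytics in one engine" idea in miniature.
    cur.execute("INSERT INTO events (user_id, amount) VALUES (%s, %s)",
                (1001, 42.0))
    cur.execute("""
        SELECT user_id, SUM(amount) AS total
        FROM events
        GROUP BY user_id
        ORDER BY total DESC
        LIMIT 10
    """)
    for row in cur.fetchall():
        print(row)

conn.commit()
```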

Database management, and the need for more efficient and cost-effective tools to do it, is an ever-growing priority — one that definitely got a fillip in the last 18 months, with COVID-19 pushing people into more remote working environments. That means SingleStore is not without competitors; others in the same space include Amazon, Microsoft, Snowflake, PostgreSQL, MySQL, Redis and more. Others, like Firebolt, are tackling the challenges of handling large, disparate data repositories from another angle. (Some of these, I should point out, are also partners: SingleStore works with data stored on AWS, Microsoft Azure, Google Cloud Platform and Red Hat, and Verma describes those who do compute work as “not database companies; they are using their database capabilities for consumption for cloud compute.”)

But the company has carved a place for itself with enterprises and has thousands now on its books, including GE, IEX Cloud, Go Guardian, Palo Alto Networks, EOG Resources and SiriusXM + Pandora.

“SingleStore’s first-of-a-kind cloud database is unmatched in speed, scale, and simplicity by anything in the market,” said Lonne Jaffe, managing director at Insight Partners, in a statement. “SingleStore’s differentiated technology allows customers to unify real-time transactions and analytics in a single database.” Vinod Khosla from Khosla Ventures added that “SingleStore is able to reduce data sprawl, run anywhere, and run faster with a single database, replacing legacy databases with the modern cloud.”

Sep 08, 2021

Google Workspace opens up spaces for all users

Employee location has become a bit more complicated as some return to the office, while others work remotely. To embrace those hybrid working conditions, Google is making more changes to its Google Workspace offering by going live with spaces in Google Chat for all users.

Spaces integrates with Workspace tools, like the calendar, Drive and documents, to provide a more hybrid work experience where users can see the full history, content and context of conversations, regardless of their location.

Google’s senior director of product management, Sanaz Ahari, wrote in a blog post Wednesday that customers wanted spaces to be more like a “central hub for collaboration, both in real time and asynchronously. Instead of starting an email chain or scheduling a video meeting, teams can come together directly in a space to move projects and topics along.”

Here are some new features users can see in spaces:

  • One interface for everything — inbox, chats, spaces and meetings.
  • Spaces, and content therein, can be made discoverable for people to find and join in the conversation.
  • Better search ability within a team’s knowledge base.
  • Ability to reply to any message within a space.
  • Enhanced security and admin tools to monitor communication.

Employees can now indicate in Calendar whether they will be virtual or in person on certain days, setting expectations for collaboration. As a complement, users can call colleagues on both mobile and desktop devices in Google Meet.

Calendar work location. Image Credits: Google

In November, all customers will be able to use Google Meet’s Companion Mode to join a meeting from a personal device while tapping into in-room audio and video. Also later this year, live-translated captions will be available from English to French, German, Portuguese and Spanish, with more languages being added in the future.

In addition, Google is also expanding its Google Meet hardware portfolio to include two new all-in-one video conferencing devices, third-party devices — Logitech’s video bar and Appcessori’s mobile device speaker dock — and interoperability with Webex by Cisco.

Google is tying everything together with a handbook for navigating hybrid work, which includes best practice blueprints for five common hybrid meetings.

 

Sep 07, 2021

Seqera Labs grabs $5.5M to help sequence COVID-19 variants and other complex data problems

Bringing order and understanding to unstructured information scattered across disparate silos has been one of the more significant breakthroughs of the big data era. Today, a European startup that has built a platform to tackle this challenge specifically in the area of life sciences — and that has, notably, been used by labs to sequence and so far identify two major COVID-19 variants — is announcing funding to continue building out its tools for a wider set of use cases, and to expand into North America.

Seqera Labs, a Barcelona-based data orchestration and workflow platform tailored to help scientists and engineers order and gain insights from cloud-based genomic data troves, as well as to tackle other life science applications that involve harnessing complex data from multiple locations, has raised $5.5 million in seed funding.

Talis Capital and Speedinvest co-led this round, with participation also from previous backer BoxOne Ventures and a grant from the Chan Zuckerberg Initiative, Mark Zuckerberg and Dr. Priscilla Chan’s effort to back open source software projects for science applications.

Seqera — a portmanteau of “sequence” and “era”, the age of sequencing data, basically — had previously raised less than $1 million and, quietly, it is already generating revenue, with five of the world’s biggest pharmaceutical companies part of its customer base, alongside biotech and other life sciences customers.

Seqera was spun out of the Centre for Genomic Regulation, a biomedical research center based out of Barcelona, where it was built as the commercial application of Nextflow, open source workflow and data orchestration software originally created by the founders of Seqera, Evan Floden and Paolo Di Tommaso, at the CGR.

Floden, Seqera’s CEO, told TechCrunch that he and Di Tommaso were motivated to create Seqera in 2018 after seeing Nextflow gain a lot of traction in the life science community, and subsequently getting a lot of repeat requests for further customization and features. Both Nextflow and Seqera have seen a lot of usage: the Nextflow runtime has been downloaded more than 2 million times, the company said, while Seqera’s commercial cloud offering has now processed more than 5 billion tasks.

The COVID-19 pandemic is a classic example of the acute challenge that Seqera (and by association Nextflow) aims to address in the scientific community. With COVID-19 outbreaks happening globally, each time a test for COVID-19 is processed in a lab, live genetic samples of the virus get collected. Taken together, these millions of tests represent a goldmine of information about the coronavirus and how it is mutating, and when and where it is doing so. For a new virus about which so little is understood and that is still persisting, that’s invaluable data.

So the problem is not whether the data exists for better insights (it does); it is that it’s nearly impossible to use legacy tools to view that data as a holistic body. It’s in too many places, there is too much of it, and it’s growing and changing every day — which means that the traditional approach of porting data to a centralized location to run analytics on it just wouldn’t be efficient, and would cost a fortune to execute.

That is where Seqera comes in. The company’s technology treats each source of data across different clouds as a salient pipeline that can be merged and analyzed as a single body, without that data ever leaving the boundaries of the infrastructure where it already exists. With the platform customised to focus on genomic troves, scientists can then query that information for more insights. Seqera was central to the discovery of both the Alpha and Delta variants of the virus, and work is still ongoing as COVID-19 continues to hammer the globe.
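
The pattern is easier to see in a toy form. The sketch below is purely conceptual Python, not Seqera’s or Nextflow’s API: each site analyses its own samples in place, and only the small per-site summaries are merged into one holistic view.

```python
# Conceptual sketch only: analyse data where it lives, merge summaries.

def analyse_in_place(site):
    # Imagine this running inside the site's own infrastructure,
    # e.g. counting variant calls over locally stored genomes.
    return {"site": site["name"], "variants": len(site["samples"])}

sites = [
    {"name": "lab-eu", "samples": ["s1", "s2", "s3"]},
    {"name": "lab-us", "samples": ["s4", "s5"]},
]

# Only the lightweight summaries travel; the raw samples never move.
merged = [analyse_in_place(s) for s in sites]
total = sum(r["variants"] for r in merged)
print(merged, "total:", total)
```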

Seqera is being used in other kinds of medical applications, such as in the realm of so-called “precision medicine.” This is emerging as a very big opportunity in complex fields like oncology: cancer mutates and behaves differently depending on many factors, including genetic differences of the patients themselves, which means that treatments are less effective if they are “one size fits all.”

Increasingly, we are seeing approaches that leverage machine learning and big data analytics to better understand individual cancers and how they develop for different populations, to subsequently create more personalized treatments, and Seqera comes into play as a way to sequence that kind of data.

This also highlights something else notable about the Seqera platform: it is used directly by the people who are analyzing the data — that is, the researchers and scientists themselves, without data specialists necessarily needing to get involved. This was a practical priority for the company, Floden told me, but nonetheless, it’s an interesting detail of how the platform is inadvertently part of that bigger trend of “no-code/low-code” software, designed to make highly technical processes usable by non-technical people.

It’s both the existing opportunity and how Seqera might be applied in the future, across other kinds of data that live in the cloud, that make it an interesting company, and it seems an interesting investment, too.

“Advancements in machine learning, and the proliferation of volumes and types of data, are leading to increasingly more applications of computer science in life sciences and biology,” said Kirill Tasilov, principal at Talis Capital, in a statement. “While this is incredibly exciting from a humanity perspective, it’s also skyrocketing the cost of experiments to sometimes millions of dollars per project as they become computer-heavy and complex to run. Nextflow is already a ubiquitous solution in this space and Seqera is driving those capabilities at an enterprise level – and in doing so, is bringing the entire life sciences industry into the modern age. We’re thrilled to be a part of Seqera’s journey.”

“With the explosion of biological data from cheap, commercial DNA sequencing, there is a pressing need to analyse increasingly growing and complex quantities of data,” added Arnaud Bakker, principal at Speedinvest. “Seqera’s open and cloud-first framework provides an advanced tooling kit allowing organisations to scale complex deployments of data analysis and enable data-driven life sciences solutions.”

Although medicine and life sciences are perhaps Seqera’s most obvious and timely applications today, the framework originally designed for genetics and biology can be applied to any number of other areas: AI training, image analysis and astronomy are three early use cases, Floden said. Astronomy is perhaps very apt, since it seems that the sky is the limit.

“We think we are in the century of biology,” Floden said. “It’s the center of activity and it’s becoming data-centric, and we are here to build services around that.”

Seqera is not disclosing its valuation with this round.

Sep 02, 2021

Box, Zoom chief product officers discuss how the changing workplace drove their latest collaboration

If the past 18 months is any indication, the nature of the workplace is changing. And while Box and Zoom already have integrations together, it makes sense for them to continue to work more closely.

Their newest collaboration is the Box app for Zoom, a new type of in-product integration that allows users to bring apps into a Zoom meeting to provide the full Box experience.

While in Zoom, users can securely and directly access Box to browse, preview and share files from Zoom — even if they are not taking part in an active meeting. This new feature follows a Zoom integration Box launched last year with its “Recommended Apps” section that enables access to Zoom from Box so that workflows aren’t disrupted.

The companies’ chief product officers, Diego Dugatkin with Box and Oded Gal with Zoom, discussed with TechCrunch why seamless partnerships like these are a solution for the changing workplace.

With digitization happening everywhere, an integration of “best-in-breed” products for collaboration is essential, Dugatkin said. Not only that, but people don’t want to move from app to app, preferring to stay in one environment.

“It’s access to content while never having to leave the Zoom platform,” he added.

It’s also access to content and contacts in different situations. When everyone was in an office, meeting at a moment’s notice internally was not a challenge. Now, more people are understanding the value of flexibility, and both Gal and Dugatkin expect that spending some time at home and some time in the office will not change anytime soon.

As a result, across the spectrum of a company, there is an increasing need for allowing and even empowering people to work from anywhere, Dugatkin said. That then leads to a conversation about sharing documents in a secure way for companies, which this collaboration enables.

The new Box and Zoom integration enables meetings in a hybrid workplace: chat, video and audio, on computers or mobile devices, with content accessible from all of those channels, Gal said.

“Companies need to be dynamic as people make the decision of how they want to work,” he added. “The digital world is providing that flexibility.”

This long-term partnership is just scratching the surface of the continuous improvement the companies have planned, Dugatkin said.

Dugatkin and Gal expect to continue offering seamless integration before, during and after meetings: utilizing Box’s cloud storage, while also offering the ability for offline communication between people so that they can keep the workflow going.

“As Diego said about digitization, we are seeing continuous collaboration enhanced with the communication aspect of meetings day in and day out,” Gal added. “Being able to connect between asynchronous and synchronous with Zoom is addressing the future of work and how it is shaping where we go in the future.”

Aug 25, 2021

Cribl raises $200M to help enterprises do more with their data

At a time when remote work, cybersecurity attacks and increased privacy and compliance requirements threaten a company’s data, more companies are collecting and storing their observability data, but they are being locked in by vendors or having difficulty accessing the data.

Enter Cribl. The San Francisco-based company is developing an “open ecosystem of data” for enterprises that utilizes unified data pipelines, called “observability pipelines,” to parse and route any type of data that flows through a corporate IT system. Users can then choose their own analytics tools and storage destinations like Splunk, Datadog and Exabeam, but without becoming dependent on a vendor.
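
A toy version of that parse-and-route idea, in illustrative Python rather than Cribl’s actual configuration format, might look like this: each event is matched against ordered rules and sent to the first destination whose predicate fires.

```python
# Conceptual sketch of an observability pipeline's routing stage.

ROUTES = [
    (lambda e: e.get("source") == "zoom",  "splunk"),   # security analytics
    (lambda e: e.get("severity") == "low", "s3_lake"),  # cheap long-term storage
    (lambda e: True,                       "datadog"),  # default route
]

def route(event):
    # First matching rule wins, mirroring ordered pipeline logic.
    for predicate, destination in ROUTES:
        if predicate(event):
            return destination

events = [
    {"source": "zoom", "user": "alice", "severity": "high"},
    {"source": "nginx", "severity": "low"},
]
for e in events:
    print(route(e), "<-", e)
```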

The company announced Wednesday a $200 million round of Series C funding to value Cribl at $1.5 billion, according to a source close to the company. Greylock and Redpoint Ventures co-led the round and were joined by new investor IVP, existing investors Sequoia and CRV and strategic investment from Citi Ventures and CrowdStrike. The new capital infusion gives Cribl a total of $254 million in funding since the company was started in 2017, Cribl co-founder and CEO Clint Sharp told TechCrunch.

Sharp did not discuss the valuation; however, he believes that the round is “validation that the observability pipeline category is legit.” Data is growing at a compound annual growth rate of 25%, and organizations are collecting five times more data today than they did 10 years ago, he explained.

“Ultimately, they want to ask and answer questions, especially for IT and security people,” Sharp added. “When Zoom sends data on who started a phone call, that might be data I need to know so I know who is on the call from a security perspective and who they are communicating with. Also, who is sending files to whom and what machines are communicating together in case there is a malicious actor. We can also find out who is having a bad experience with the system and what resources they can access to try and troubleshoot the problem.”

Cribl also enables users to choose how they want to store their data, which is different from competitors that often lock companies into using only their products. Instead, customers can buy the best products from different categories and they will all talk to each other through Cribl, Sharp said.

Though Cribl is developing a pipeline for data, Sharp sees it more as an “observability lake,” as more companies have differing data storage needs. He explains that the lake is where all the data that doesn’t need to go into an existing storage solution will go. The pipelines will send the data to specific tools, and whatever doesn’t fit will go back into the lake so companies can return to it later. That way, companies can keep data longer and more cost-effectively.

Cribl said it is seven times more efficient at processing event data and boasts a customer list that includes Whole Foods, Vodafone, FINRA, Fannie Mae and Cox Automotive.

Sharp went after additional funding after seeing huge traction in its existing customer base, saying that “when you see that kind of traction, you want to keep doubling down.” His aim is to have a presence in every North American city and in Europe, to continue launching new products and growing the engineering team.

Up next, the company is focusing on go-to-market and engineering growth. Its headcount is 150 currently, and Sharp expects to grow that to 250 by the end of the year.

Over the last fiscal year, Cribl grew its revenue 293%, and Sharp expects that same trajectory for this year. The company is now at a growth stage, and with the new investment, he believes Cribl is the “future leader in observability.”

“This is a great investment for us, and every dollar, we believe, is going to create an outsized return as we are the only commercial company in this space,” he added.

Scott Raney, managing director at Redpoint Ventures, said his firm is a big enterprise investor in software, particularly in companies that help organizations leverage data to protect themselves, a sweet spot that Cribl falls into.

He feels Sharp, having come from Splunk, is leading a team that has accomplished a lot, has a vision and a handle on the business, and knows the market well. Where Splunk captures machine data and uses its systems to extract it, Cribl is doing something similar in directing the data where it needs to go, while also enabling companies to use multiple vendors and build apps that sit on top of its infrastructure.

“Cribl is adding opportunity by enriching the data flowing through, and the benefits are going to be meaningful in cost reduction,” Raney said. “The attitude out there is to put data in cheaper places, and afford more flexibility to extract data. Step one is to make that transition, and step two is how to drive the data sitting there. Cribl is doing something that will go from being a big business to a legacy company 30 years from now.”

Aug 25, 2021

Bodo.ai secures $14M, aims to make Python better at handling large-scale data

Bodo.ai, a parallel compute platform for data workloads, is developing a compiler to make Python portable and efficient across multiple hardware platforms. It announced Wednesday a $14 million Series A funding round led by Dell Technologies Capital.

Python is one of the top programming languages used among artificial intelligence and machine learning developers and data scientists, but as Behzad Nasre, co-founder and CEO of Bodo.ai, points out, it is challenging to use when handling large-scale data.

Bodo.ai, headquartered in San Francisco, was founded in 2019 by Nasre and Ehsan Totoni, CTO, to make Python higher performing and production ready. Nasre, who had a long career at Intel before starting Bodo, met Totoni and learned about the project that he was working on to democratize machine learning and enable parallel learning for everyone. Parallelization is the only way to extend Moore’s Law, Nasre told TechCrunch.

Bodo does this via compiler technology that automates parallelization, so that data and ML developers don’t have to use new libraries or APIs, or rewrite Python into other programming languages or graphics processing unit code, to achieve scalability. Its technology is used to run data analytics in real time across industries like financial services, telecommunications, retail and manufacturing.
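
Bodo’s public examples follow a Numba-style decorator pattern, and the sketch below assumes that interface; the Parquet path and column name are invented for illustration.

```python
import bodo
import pandas as pd

# Assumed decorator interface: inside the jit-compiled function,
# ordinary pandas calls are parallelized across processes.
@bodo.jit
def mean_spend(path):
    df = pd.read_parquet(path)   # the read itself is distributed
    return df["spend"].mean()    # reduction across all workers

# Typically launched under MPI, e.g.: mpiexec -n 8 python this_script.py
print(mean_spend("s3://example-bucket/transactions.pq"))
```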

“For the AI revolution to happen, developers have to be able to write code in simple Python, and that high-performance capability will open new doors,” Totoni said. “Right now, they rely on specialists to rewrite them, and that is not efficient.”

Joining Dell in the round were Uncorrelated Ventures, Fusion Fund and Candou Ventures. Including the new funding, Bodo has raised $14 million in total. The company went after Series A dollars after its product had matured and there was good traction with customers, prompting Bodo to want to scale quicker, Nasre said.

Nasre feels Dell Technologies Capital was “uniquely positioned to help us in terms of reserves and the role they play in the enterprise at large, which is to have the most effective salesforce in enterprise.”

Though he was already familiar with Nasre, Daniel Docter, managing director at Dell Technologies, heard about Bodo from a data scientist friend who told Docter that Bodo’s preliminary results “were amazing.”

Much of Dell’s investing is in early-stage and deep tech founders who understand the problem. Docter puts Totoni and Nasre in that category.

“Ehsan fits this perfectly, he has super deep technology knowledge and went out specifically to solve the problem,” he added. “Behzad, being from Intel, saw and lived with the problem, especially seeing Hadoop fail and Spark take its place.”

Meanwhile, with the new funding, Nasre intends to triple the size of the team and invest in R&D to build and scale the company. It will also be developing a marketing and sales team.

The company is now shifting its focus from financing to customers and revenue as it aims to drive up adoption by the Python community.

“Our technology can translate simple code into the fast code that the experts will try,” Totoni said. “I joined Intel Labs to work on the problem, and we think we have the first solution that will democratize machine learning for developers and data scientists. Now, they have to hand over Python code to specialists who rewrite it for tools. Bodo is a new type of compiler technology that democratizes AI.”

 

Aug 19, 2021

Companies betting on data must value people as much as AI

The Pareto principle, also known as the 80-20 rule, asserts that 80% of consequences come from 20% of causes, rendering the remainder way less impactful.

Those working with data may have heard a different rendition of the 80-20 rule: A data scientist spends 80% of their time at work cleaning up messy data as opposed to doing actual analysis or generating insights. Imagine a 30-minute drive expanded to two-and-a-half hours by traffic jams, and you’ll get the picture.

While most data scientists spend more than 20% of their time at work on actual analysis, they still have to waste countless hours turning a trove of messy data into a tidy dataset ready for analysis. This process can include removing duplicate data, making sure all entries are formatted correctly and doing other preparatory work.

On average, this workflow stage takes up about 45% of the total time, a recent Anaconda survey found. An earlier poll by CrowdFlower put the estimate at 60%, and many other surveys cite figures in this range.

None of this is to say data preparation is not important. “Garbage in, garbage out” is a well-known rule in computer science circles, and it applies to data science, too. In the best-case scenario, the script will just return an error, warning that it cannot calculate the average spending per client, because the entry for customer #1527 is formatted as text, not as a numeral. In the worst case, the company will act on insights that have little to do with reality.
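
For illustration, here is a minimal pandas sketch of exactly that cleanup step; the column names and the customer #1527 entry are invented for the example.

```python
import pandas as pd

# A tiny stand-in for a messy dataset: one duplicated row, and one
# spending entry stored as text instead of a number.
df = pd.DataFrame({
    "customer_id": [1526, 1527, 1527],
    "spending": [250.0, "n/a", "n/a"],
})

df = df.drop_duplicates()
# Coerce text entries to numbers; unparseable values become NaN
# instead of crashing the average calculation downstream.
df["spending"] = pd.to_numeric(df["spending"], errors="coerce")

print(df["spending"].mean(skipna=True))  # average over the valid entries
```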

The real question to ask here is whether re-formatting the data for customer #1527 is really the best way to use the time of a well-paid expert. The average data scientist is paid between $95,000 and $120,000 per year, according to various estimates. Having the employee on such pay focus on mind-numbing, non-expert tasks is a waste both of their time and the company’s money. Besides, real-world data has a lifespan, and if a dataset for a time-sensitive project takes too long to collect and process, it can be outdated before any analysis is done.

What’s more, companies’ quests for data often include wasting the time of non-data-focused personnel, with employees asked to help fetch or produce data instead of working on their regular responsibilities. More than half of the data being collected by companies is often not used at all, suggesting that the time of everyone involved in the collection has been wasted to produce nothing but operational delay and the associated losses.

The data that has been collected, on the other hand, is often only used by a designated data science team that is too overworked to go through everything that is available.

All for data, and data for all

The issues outlined here all play into the fact that, save for data pioneers like Google and Facebook, companies are still wrapping their heads around how to re-imagine themselves for the data-driven era. Data is pulled into huge databases, and data scientists are left with a lot of cleaning to do, while others, whose time was wasted on helping fetch the data, do not benefit from it too often.

The truth is, we are still early when it comes to data transformation. The success of tech giants that put data at the core of their business models set off a spark that is only starting to catch. And even though the results are mixed for now, this is a sign that companies have yet to master thinking with data.

Data holds much value, and businesses are very much aware of it, as showcased by the appetite for AI experts in non-tech companies. Companies just have to do it right, and one of the key tasks in this respect is to start focusing on people as much as we do on AIs.

Data can enhance the operations of virtually any component within the organizational structure of any business. As tempting as it may be to think of a future where there is a machine learning model for every business process, we do not need to tread that far right now. The goal for any company looking to tap data today comes down to getting it from point A to point B. Point A is the part in the workflow where data is being collected, and point B is the person who needs this data for decision-making.

Importantly, point B does not have to be a data scientist. It could be a manager trying to figure out the optimal workflow design, an engineer looking for flaws in a manufacturing process or a UI designer doing A/B testing on a specific feature. All of these people must have the data they need at hand all the time, ready to be processed for insights.

People can thrive with data just as well as models, especially if the company invests in them and makes sure to equip them with basic analysis skills. In this approach, accessibility must be the name of the game.

Skeptics may claim that big data is nothing but an overused corporate buzzword, but advanced analytics capacities can enhance the bottom line for any company as long as it comes with a clear plan and appropriate expectations. The first step is to focus on making data accessible and easy to use and not on hauling in as much data as possible.

In other words, an all-around data culture is just as important for an enterprise as the data infrastructure.
