Jan
27
2021
--

Pinecone lands $10M seed for purpose-built machine learning database

Pinecone, a new startup from the folks who helped launch Amazon SageMaker, has built a vector database that stores and indexes data in a specialized format to help build machine learning applications faster, something that was previously only accessible to the largest organizations. Today the company came out of stealth with a new product and announced a $10 million seed investment led by Wing Venture Capital.

Company co-founder Edo Liberty says that he started the company because of this fundamental belief that the industry was being held back by the lack of wider access to this type of database. “The data that a machine learning model expects isn’t a JSON record, it’s a high dimensional vector that is either a list of features or what’s called an embedding that’s a numerical representation of the items or the objects in the world. This [format] is much more semantically rich and actionable for machine learning,” he explained.
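Liberty's distinction can be illustrated with a toy sketch (the item names and numbers below are made up for illustration): a conventional record describes an item for humans, while an embedding places it in a vector space where similarity becomes computable.

```python
import math

# A conventional database row: fields a person can read, but a model can't compare directly.
record = {"id": 42, "title": "running shoes", "category": "footwear"}

# The same item as an embedding: a dense numeric vector (4-d here; real models use hundreds).
shoe_vec = [0.9, 0.1, 0.4, 0.0]
boot_vec = [0.8, 0.2, 0.5, 0.1]   # a semantically similar item
phone_vec = [0.0, 0.9, 0.1, 0.8]  # an unrelated item

def cosine(a, b):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Semantically similar items end up close together in vector space.
assert cosine(shoe_vec, boot_vec) > cosine(shoe_vec, phone_vec)
```

That "closeness is meaning" property is what makes the vector format more semantically actionable than a plain record.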

He says that this is a concept that is widely understood by data scientists, and supported by research, but up until now only the biggest and most technically sophisticated companies, like Google or Pinterest, could take advantage of it. Liberty and his team created Pinecone to put that kind of technology within reach of any company.

The startup spent the last couple of years building the solution, which consists of three main components. The main piece is a vector engine to convert the data into this machine-learning ingestible format. Liberty says that this is the piece of technology that contains all the data structures and algorithms that allow them to index very large amounts of high dimensional vector data, and search through it in an efficient and accurate way.

The second is a cloud hosted system to apply all of that converted data to the machine learning model, while handling things like index lookups along with the pre- and post-processing — everything a data science team needs to run a machine learning project at scale with very large workloads and throughputs. Finally, there is a management layer to track all of this and manage data transfer between source locations.

One classic example Liberty uses is an e-commerce recommendation engine. While recommendations have been a standard part of online selling for years, he believes a vectorized data approach will produce much more accurate ones, and he says the research bears him out.

“It used to be that deploying [something like a recommendation engine] was actually incredibly complex, and […] if you have access to a production grade database, 90% of the difficulty and heavy lifting in creating those solutions goes away, and that’s why we’re building this. We believe it’s the new standard,” he said.

The company currently has 10 people including the founders, but the plan is to double or even triple that number, depending on how the year goes. As he builds his company as an immigrant founder — Liberty is from Israel — he says that diversity is top of mind. He adds that it’s something he worked hard on at his previous positions at Yahoo and Amazon as he was building his teams at those two organizations. One way he is doing that is in the recruitment process. “We have instructed our recruiters to be proactive [in finding more diverse applicants], making sure they don’t miss out on great candidates, and that they bring us a diverse set of candidates,” he said.

Looking ahead to post-pandemic, Liberty says he is a bit more traditional in terms of office versus home, and that he hopes to have more in-person interactions. “Maybe I’m old fashioned but I like offices and I like people and I like to see who I work with and hang out with them and laugh and enjoy each other’s company, and so I’m not jumping on the bandwagon of ‘let’s all be remote and work from home’.”

Nov
12
2020
--

Databricks launches SQL Analytics

AI and data analytics company Databricks today announced the launch of SQL Analytics, a new service that makes it easier for data analysts to run their standard SQL queries directly on data lakes. And with that, enterprises can now easily connect their business intelligence tools like Tableau and Microsoft’s Power BI to these data repositories as well.

SQL Analytics will be available in public preview on November 18.

In many ways, SQL Analytics is the product Databricks has long been looking to build, the one that brings its concept of a “lake house” to life. It combines the performance of a data warehouse, where you store data after it has already been transformed and cleaned, with the economics of a data lake, where you store all of your data in its raw form. Data in a data lake — an approach that Databricks co-founder and CEO Ali Ghodsi has long championed — is typically only transformed when it gets used. That makes data lakes cheaper, but also a bit harder to handle for users.
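The trade-off can be sketched in a few lines, assuming a hypothetical lake of raw JSON events: in a lake, the parsing and cleaning work is deferred until query time ("schema-on-read"), rather than paid up front as in a warehouse.

```python
import json

# Data lake: raw, untransformed events, cheap to store.
raw_lake = [
    '{"user": "a", "amount": "19.99", "ts": "2020-11-01"}',
    '{"user": "b", "amount": "5.00",  "ts": "2020-11-02"}',
]

def query_total(raw_rows):
    """Schema-on-read: parsing and type coercion happen at query time,
    not at load time — which is what makes lakes cheaper but harder to use."""
    return sum(float(json.loads(r)["amount"]) for r in raw_rows)

assert abs(query_total(raw_lake) - 24.99) < 1e-9
```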

Image Credits: Databricks

“We’ve been saying Unified Data Analytics, which means unify the data with the analytics. So data processing and analytics, those two should be merged. But no one picked that up,” Ghodsi told me. But “lake house” caught on as a term.

“Databricks has always offered data science, machine learning. We’ve talked about that for years. And with Spark, we provide the data processing capability. You can do [extract, transform, load]. That has always been possible. SQL Analytics enables you to now do the data warehousing workloads directly, and concretely, the business intelligence and reporting workloads, directly on the data lake.”

The general idea here is that with just one copy of the data, you can enable both traditional data analyst use cases (think BI) and the data science workloads (think AI) Databricks was already known for. Ideally, that makes both use cases cheaper and simpler.

The service sits on top of an optimized version of Databricks’ open-source Delta Lake storage layer, which lets it complete queries quickly. In addition, the service provides auto-scaling endpoints to keep query latency consistent, even under high loads.

While data analysts can query these data sets directly, using standard SQL, the company also built a set of connectors to BI tools. Its BI partners include Tableau, Qlik, Looker and Thoughtspot, as well as ingest partners like Fivetran, Fishtown Analytics, Talend and Matillion.

Image Credits: Databricks

“Now more than ever, organizations need a data strategy that enables speed and agility to be adaptable,” said Francois Ajenstat, chief product officer at Tableau. “As organizations are rapidly moving their data to the cloud, we’re seeing growing interest in doing analytics on the data lake. The introduction of SQL Analytics delivers an entirely new experience for customers to tap into insights from massive volumes of data with the performance, reliability and scale they need.”

In a demo, Ghodsi showed me what the new SQL Analytics workspace looks like. It’s essentially a stripped-down version of the standard code-heavy experience with which Databricks users are familiar. Unsurprisingly, SQL Analytics provides a more graphical experience, one that focuses on visualizations rather than Python code.

While there are already some data analysts on the Databricks platform, this obviously opens up a large new market for the company — something that would surely bolster its plans for an IPO next year.

Aug
06
2020
--

Mode raises $33M to supercharge its analytics platform for data scientists

Data science is the name of the game these days for companies that want to improve their decision making by tapping the information they are already amassing in their apps and other systems. And today, a startup called Mode Analytics, which has built a platform incorporating machine learning, business intelligence and big data analytics to help data scientists fulfill that task, is announcing $33 million in funding to continue making its platform ever more sophisticated.

Most recently, for example, the company has started to introduce tools (including SQL and Python tutorials) for less technical users, specifically those in product teams, so that they can structure queries that data scientists can subsequently execute faster and with more complete responses — important for the many follow-up questions that arise when a business intelligence process has been run. Mode claims that its tools can help produce answers to data queries in minutes.

This Series D is being led by SaaS specialist investor H.I.G. Growth Partners, with previous investors Valor Equity Partners, Foundation Capital, REV Venture Partners and Switch Ventures all participating. Valor led Mode’s Series C in February 2019, while Foundation and REV respectively led its A and B rounds.

Mode is not disclosing its valuation, but co-founder and CEO Derek Steer confirmed in an interview that it was “absolutely” an up-round.

For some context, PitchBook notes that last year its valuation was $106 million. The company now has a customer list that it says covers 52% of the Forbes 500, including Anheuser-Busch, Zillow, Lyft, Bloomberg, Capital One, VMware and Conde Nast. It says that to date it has processed 830 million query runs and 170 million notebook cell runs for 300,000 users. (Pricing is based on a freemium model, with a free “Studio” tier and Business and Enterprise tiers priced based on size and use.)

Mode has been around since 2013, when it was co-founded by Steer, Benn Stancil (Mode’s current president) and Josh Ferguson (initially the CTO and now chief architect).

Steer said the impetus for the startup came out of gaps in the market that the three had found through years of experience at other companies.

Specifically, when all three were working together at Yammer (they were early employees and stayed on after the Microsoft acquisition), they were part of a larger team building custom data analytics tools for Yammer. At the time, Steer said Yammer was paying $1 million per year to subscribe to Vertica (acquired by HP in 2011) to run it.

They saw an opportunity to build a platform that could provide similar kinds of tools — encompassing things like SQL Editors, Notebooks and reporting tools and dashboards — to a wider set of users.

“We and other companies like Facebook and Google were building analytics internally,” Steer recalled, “and we knew that the world wanted to work more like these tech companies. That’s why we started Mode.”

All the same, he added, “people were not clear exactly about what a data scientist even was.”

Indeed, Mode’s growth so far has mirrored the rise of data science overall, as the discipline has matured and more businesses have made the case for employing data scientists to figure out what is “going on” beyond the day-to-day, getting answers by tapping all the data amassed in the course of doing business. That means Mode’s addressable market has also been growing.

But even as the trove of potential buyers of Mode’s products has been growing, so has the competition. There has been a big swing in data science and big data analytics in the last several years, with a number of tech companies building tools to help those who are less technical “become data scientists” by introducing more intuitive interfaces like drag-and-drop features and natural language queries.

They include the likes of Sisense (which has been growing its analytics power with acquisitions like Periscope Data), Eigen (focusing on specific verticals like financial and legal queries), Looker (acquired by Google) and Tableau (acquired by Salesforce).

Mode’s approach up to now has been closer to that of another competitor, Alteryx, focusing on building tools that are still aimed primarily at helping data scientists themselves. You have any number of database tools on the market today, Steer noted, “Snowflake, Redshift, BigQuery, Databricks, take your pick.” The key now is in providing tools to those using those databases to do their work faster and better.

That pitch, and how well the company executes on it, is what has won Mode both customers and investors.

“Mode goes beyond traditional Business Intelligence by making data faster, more flexible and more customized,” said Scott Hilleboe, managing director, H.I.G. Growth Partners, in a statement. “The Mode data platform speeds up answers to complex business problems and makes the process more collaborative, so that everyone can build on the work of data analysts. We believe the company’s innovations in data analytics uniquely position it to take the lead in the Decision Science marketplace.”

Steer said that the fundraising, planned long before the coronavirus outbreak, was slated to start in February, which meant that it was timed as badly as it could have been. Mode still raised what it wanted to in a couple of months — “a good raise by any standard,” he noted — even if it’s likely that the valuation suffered a bit in the process. “Pitching while the stock market is tanking was terrifying and not something I would repeat,” he added.

Given how many acquisitions there have been in this space, Steer confirmed that Mode too has been approached a number of times, but it’s staying put for now. (And no, he wouldn’t tell me who has been knocking, except to say that it’s large companies for whom analytics is an “adjacency” to bigger businesses, which is to say, the very large tech companies have approached Mode.)

“The reason we haven’t considered any acquisition offers is because there is just so much room,” Steer said. “I feel like this market is just getting started, and I would only consider an exit if I felt like we were handicapped by being on our own. But I think we have a lot more growing to do.”

Aug
05
2020
--

Datafold is solving the chaos of data engineering

It seemed so simple. A small schema issue in a database was wrecking a feature in the app, increasing latency and degrading the user experience. The resident data engineer pops in a fix to amend the schema, and everything seems fine — for now. Unbeknownst to them, that small fix completely clobbered all the dashboards used by the company’s leadership. Finance is down, ops is pissed, and the CEO — well, they don’t even know whether the company is online.

For data engineers, it’s not just a recurring nightmare — it’s a day-to-day reality. A decade plus into that whole “data is the new oil” claptrap, and we’re still managing data piecemeal and without proper systems and controls. Data lakes have become data oceans and data warehouses have become … well, whatever the massive version of a warehouse is called (a waremansion I guess). Data engineers bridge the gap between the messy world of real life and the precise nature of code, and they need much better tools to do their jobs.

As TechCrunch’s unofficial data engineer, I’ve personally struggled with many of these same problems. And so that’s what drew me into Datafold.

Datafold is a brand-new platform for managing the quality assurance of data. Much in the way that a software platform has QA and continuous integration tools to ensure that code functions as expected, Datafold integrates across data sources to ensure that a change in the schema of one table doesn’t knock out functionality somewhere else.

Founder Gleb Mezhanskiy knows these problems firsthand. He’s informed by his time at Lyft, where he was a data scientist and data engineer before becoming a product manager “focused on the productivity of data professionals.” The idea was that as Lyft expanded, it needed much better pipelines and tooling around its data to remain competitive with Uber and others in its space.

His lessons from Lyft inform Datafold’s current focus. Mezhanskiy explained that the platform sits in the connections between all data sources and their outlets. There are two challenges to solve here. First, “data is changing, every day you get new data, and the shape of it can be very different either for business reasons or because your data sources can be broken.” And second, “the old code that is used by companies to transform this data is also changing very rapidly because companies are building new products, they are refactoring their features … a lot of errors can happen.”

In equation form: messy reality + chaos in data engineering = unhappy data end users.

With Datafold, changes data engineers make in their extractions and transformations can be checked for unintended effects. For instance, maybe a function that formerly returned an integer now returns a text string — a mistake accidentally introduced by the engineer. Rather than wait until BI tools flop and a bunch of alerts come in from managers, Datafold will indicate that there is likely some sort of problem, and identify what happened.

The key efficiency here is that Datafold aggregates changes in datasets — even datasets with billions of entries — into summaries, so that data engineers can spot even subtle flaws. The goal is that even if an error occurs in just 0.1% of cases, Datafold will be able to identify the issue and bring a summary of it to the data engineer for response.
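The idea can be sketched with a hypothetical diff over two versions of a small table — an illustration of the concept, not Datafold's actual engine:

```python
def diff_summary(old_rows, new_rows, key="id"):
    """Compare two versions of a dataset and summarize what changed,
    so a 0.1% regression surfaces without eyeballing every row."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    changed = {k for k in old.keys() & new.keys() if old[k] != new[k]}
    return {
        "added": len(new.keys() - old.keys()),
        "removed": len(old.keys() - new.keys()),
        "changed": len(changed),
        "changed_pct": 100.0 * len(changed) / max(len(old), 1),
    }

v1 = [{"id": i, "amount": i * 10} for i in range(1000)]
v2 = [dict(r) for r in v1]
v2[7]["amount"] = str(v2[7]["amount"])  # type drift: an integer silently became a string

summary = diff_summary(v1, v2)
assert summary["changed"] == 1 and summary["changed_pct"] == 0.1
```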

Datafold is entering a market that is, quite frankly, as chaotic as the data being processed. It sits in the key middle layer of the data stack — it’s not the data lake or data warehouse for storing data, and it isn’t the end user BI tools like a Looker, Tableau or many others. Instead, it’s part of a number of tools available for data engineers to manage and monitor their data flows to ensure consistency and quality.

The startup is targeting companies with at least 20 people on their data team — that’s the sweet spot where a data team has enough scale and resources that they are going to be concerned with data quality.

Today Datafold is three people, and will be debuting officially at YC’s Demo Day later this month. Its ultimate dream is a world where data engineers never again have to get an overnight page to fix a data quality issue. If you’ve been there, you know precisely why such a product is valuable.

Jun
24
2020
--

Databricks acquires Redash, a visualizations service for data scientists

Data and analytics service Databricks today announced that it has acquired Redash, a company that helps data scientists and analysts visualize their data and build dashboards around it.

Redash’s customers include the likes of Atlassian, Cloudflare, Mozilla and SoundCloud, and the company offers both an open-source, self-hosted version of its tools and paid hosted options.

The two companies did not disclose the financial details of the acquisition. According to Crunchbase, Tel Aviv-based Redash never raised any outside funding.

Databricks co-founder and CEO Ali Ghodsi told me that the two companies met because one of his customers was using the product. “Since then, we’ve been impressed with the entire team and their attention to quality,” he said. “The combination of Redash and Databricks is really the missing link in the equation — an amazing backend with Lakehouse and an amazing front end built-in visualization and dashboarding feature from Redash to make the magic happen.”

Image Credits: Databricks

For Databricks, this is also a clear signal that it wants its service to become the go-to platform for all data teams and offer them all of the capabilities they would need to extract value from their data in a single platform.

“Not only are our organizations aligned in our open source heritage, but we also share in the mission to democratize and simplify data and AI so that data teams and more broadly, business intelligence users, can innovate faster,” Ghodsi noted. “We are already seeing awesome results for our customers in the combined technologies and look forward to continuing to grow together.”

In addition to the Redash acquisition, Databricks also today announced the launch of its Delta Engine, a new high-performance query engine for use with the company’s Delta Lake transaction layer.

“Databricks’ new Delta Engine for Delta Lake enables fast query execution for data analytics and data science, without moving the data out of the data lake,” the company explains. “The high-performance query engine has been built from the ground up to take advantage of modern cloud hardware for accelerated query performance. With this improvement, Databricks customers are able to move to a unified data analytics platform that can support any data use case and result in meaningful operational efficiencies and cost savings.”

Jun
24
2020
--

Cape Privacy launches data science collaboration platform with $5.06M seed investment

Cape Privacy emerged from stealth today after spending two years building a platform for data scientists to privately share encrypted data. The startup also announced $2.95 million in new funding and $2.11 million in funding it got when the business launched in 2018, for a total of $5.06 million raised.

Boldstart Ventures and Version One led the round, with participation from Haystack, Radical Ventures and Faktory Ventures.

Company CEO Ché Wijesinghe says that data science teams often have to work with data sets that contain sensitive information, and to share that data internally or externally for collaboration. That creates a legal and regulatory privacy conundrum that Cape Privacy is trying to solve.

“Cape Privacy is a collaboration platform designed to help focus on data privacy for data scientists. So the biggest challenge that people have today from a business perspective is managing privacy policies for machine learning and data science,” Wijesinghe told TechCrunch.

The product breaks down that problem into a couple of key areas. First of all it can take language from lawyers and compliance teams and convert that into code that automatically generates policies about who can see the different types of data in a given data set. What’s more, it has machine learning underpinnings so it also learns about company rules and preferences over time.
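Conceptually, a policy compiled from compliance language reduces to rules applied when data is read; the sketch below is a made-up illustration of that idea, not Cape Privacy's actual policy format.

```python
# Hypothetical policy: compliance language ("analysts may not see PII")
# expressed as data, enforced as code. Column and role names are invented.
POLICY = {
    "analyst": {"deny": {"ssn", "email"}},
    "auditor": {"deny": set()},
}

def apply_policy(row, role):
    """Return the row with any columns the role may not see redacted."""
    denied = POLICY[role]["deny"]
    return {k: ("<redacted>" if k in denied else v) for k, v in row.items()}

row = {"name": "Ada", "ssn": "123-45-6789", "spend": 42}
assert apply_policy(row, "analyst")["ssn"] == "<redacted>"
assert apply_policy(row, "auditor") == row
```

The machine learning layer the company describes would then adjust such rules over time based on observed preferences, rather than leaving them fully hand-written.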

It also has a cryptographic privacy component. By wrapping the data in a cryptographic cipher, it lets teams share sensitive data safely without exposing it to people who shouldn’t see it for legal or regulatory compliance reasons.

“You can send something to a competitor as an example that’s encrypted, and they’re able to process that encrypted data without decrypting it, so they can train their model on encrypted data,” company co-founder and CTO Gavin Uhma explained.
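The underlying idea — computing on ciphertexts without ever decrypting the inputs — can be shown with a deliberately insecure toy scheme (a keyed additive mask; this is not what Cape Privacy actually uses, and it offers no real security):

```python
import random

# Toy additively homomorphic "encryption": Enc(m) = m + k (mod N) for a secret key k.
N = 10**9 + 7

def encrypt(m, key):
    return (m + key) % N

def add_encrypted(c1, c2):
    # Anyone can add ciphertexts without knowing the key or the plaintexts.
    return (c1 + c2) % N

def decrypt_sum(c, key, num_terms):
    # Each term carried one copy of the key, so subtract it num_terms times.
    return (c - num_terms * key) % N

key = random.randrange(N)
c = add_encrypted(encrypt(1200, key), encrypt(800, key))
assert decrypt_sum(c, key, 2) == 2000
```

Real schemes (and secure multi-party computation) achieve the same property with actual cryptographic guarantees, which is what lets a competitor train on data it can never read.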

The company closed the new round in April, which means they were raising in the middle of a pandemic, but it didn’t hurt that they had built the product already and were ready to go to market, and that Uhma and his co-founders had already built a successful startup, GoInstant, which was acquired by Salesforce in 2012. (It’s worth noting that GoInstant debuted at TechCrunch Disrupt in 2011.)

Uhma and his team brought Wijesinghe on board to build the sales and marketing team because, as a technical team, they wanted someone with go-to-market experience running the company so they could concentrate on building product.

The company has 14 employees and is already an all-remote team, so the team didn’t have to adjust at all when the pandemic hit. While it plans to keep hiring fairly limited for the foreseeable future, the company has had a diversity and inclusion plan from the start.

“You have to be intentional about seeking diversity, so it’s something that when we sit down and map out our hiring and work with recruiters in terms of our pipeline, we really make sure that diversity is one of our objectives. You just have it as a goal, as part of your culture, and it’s something that when we see the picture of the team, we want to see diversity,” he said.

Wijesinghe adds, “As a person of color myself, I’m very sensitive to making sure that we have a very diverse team, not just from a color perspective, but a gender perspective as well.”

The company is gearing up to sell the product and has paid pilots starting in the coming weeks.

May
06
2020
--

Run:AI brings virtualization to GPUs running Kubernetes workloads

In the early 2000s, VMware introduced the world to virtual servers that allowed IT to make more efficient use of idle server capacity. Today, Run:AI is introducing that same concept to GPUs running containerized machine learning projects on Kubernetes.

This should enable data science teams to have access to more resources than they would normally get were they simply allocated a certain number of available GPUs. Company CEO and co-founder Omri Geller says his company believes that part of the issue in getting AI projects to market is due to static resource allocation holding back data science teams.

“There are many times when those important and expensive compute resources are sitting idle, while at the same time, other users who need more compute power to run more experiments don’t have access to available resources because they are part of a static assignment,” Geller explained.

To solve that issue of static resource allocation, Run:AI came up with a solution to virtualize those GPU resources, whether on prem or in the cloud, and let IT define by policy how those resources should be divided.

“There is a need for a specific virtualization approach for AI, and for actively managed orchestration and scheduling of those GPU resources, while providing visibility and control over those compute resources to IT organizations and AI administrators,” he said.

Run:AI creates a resource pool, which allocates based on need. Image Credits: Run:AI
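A pooled, quota-aware scheduler of the kind described might be sketched like this (a hypothetical toy, not Run:AI's actual scheduler): teams draw from a shared pool instead of holding static assignments, while IT-set guarantees keep one team from starving another.

```python
class GPUPool:
    """Toy pooled GPU scheduler with per-team guaranteed quotas set by IT policy."""

    def __init__(self, total_gpus, guaranteed):
        self.free = total_gpus
        self.guaranteed = dict(guaranteed)
        self.used = {team: 0 for team in guaranteed}

    def request(self, team, n):
        # Grant only if enough GPUs are free, and the grant either leaves room
        # for other teams' unused guarantees or stays within this team's own.
        reserved_elsewhere = sum(
            max(q - self.used[t], 0)
            for t, q in self.guaranteed.items() if t != team
        )
        within_guarantee = self.used[team] + n <= self.guaranteed[team]
        if self.free >= n and (self.free - n >= reserved_elsewhere or within_guarantee):
            self.free -= n
            self.used[team] += n
            return True
        return False

pool = GPUPool(8, {"vision": 2, "nlp": 2})
assert pool.request("vision", 4)   # bursts past its quota while GPUs sit idle
assert pool.request("nlp", 3)
assert not pool.request("nlp", 2)  # pool exhausted; request denied
```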

Run:AI built a solution to bridge this gap between the resources IT is providing to data science teams and what they require to run a given job, while still giving IT some control over defining how that works.

“We really help companies get much more out of their infrastructure, and we do it by really abstracting the hardware from the data science, meaning you can simply run your experiment without thinking about the underlying hardware, and at any moment in time you can consume as much compute power as you need,” he said.

While the company is still in its early stages, and the current economic situation is hitting everyone hard, Geller sees a place for a solution like Run:AI because it gives customers the capacity to make the most out of existing resources, while making data science teams run more efficiently.

He also is taking a realistic long view when it comes to customer acquisition during this time. “These are challenging times for everyone,” he says. “We have plans for longer time partnerships with our customers that are not optimized for short term revenues.”

Run:AI was founded in 2018. It has raised $13 million, according to Geller. The company is based in Israel with offices in the United States. It currently has 25 employees and a few dozen customers.

Feb
27
2020
--

London-based Gyana raises $3.9M for a no-code approach to data science

Coding and other computer science expertise remain some of the more important skills that a person can have in the working world today, but in the last few years, we have also seen a big rise in a new generation of tools providing an alternative way of reaping the fruits of technology: “no-code” software, which lets anyone — technical or non-technical — build apps, games, AI-based chatbots, and other products that used to be the exclusive terrain of engineers and computer scientists.

Today, one of the newer startups in the category — London-based Gyana, which lets non-technical people run data science analytics on any structured dataset — is announcing a round of £3 million to fuel its next stage of growth.

The round was led by U.K. firm Fuel Ventures, with participation from Biz Stone of Twitter, Green Shores Capital and U+I, and it brings the total raised by the startup to $6.8 million since its founding in 2015.

Gyana (Sanskrit for “knowledge”) was co-founded by Joyeeta Das and David Kell, who were both pursuing post-graduate degrees at Oxford: Das, a former engineer, was getting an MBA, and Kell was doing a Ph.D. in physics.

Das said the idea for the tool came out of a big disconnect the pair could see emerging, not just in their studies but in the world at large — not so much a digital divide as a digital light year, in terms of the distance between those who do and those who don’t know how to work in the realm of data science.

“Everyone talks about using data to inform decision making, and the world becoming data-driven, but actually that proposition is available to less than one percent of the world,” she said.

Out of that, the pair decided to work on building a platform that Das describes as a way to empower “citizen data scientists,” by letting users upload any structured data set (for example, a .CSV file) and running a series of queries on it to be able to visualise trends and other insights more easily.
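The kind of query such a tool runs behind the scenes is simple to sketch (the dataset, store names and column names below are made up): a user uploads a structured file and the platform turns point-and-click choices into aggregations like this one.

```python
import csv
import io
from collections import defaultdict

# A tiny stand-in for an uploaded .CSV dataset (invented numbers).
upload = """store,month,footfall
Soho,Jan,1200
Soho,Feb,900
Camden,Jan,700
Camden,Feb,1100
"""

def totals_by(rows, group_col, value_col):
    """One of the canned queries a no-code tool runs for the user:
    group rows by a column and sum a numeric column."""
    out = defaultdict(int)
    for row in rows:
        out[row[group_col]] += int(row[value_col])
    return dict(out)

rows = list(csv.DictReader(io.StringIO(upload)))
assert totals_by(rows, "store", "footfall") == {"Soho": 2100, "Camden": 1800}
```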

While the longer term goal may be for any person to be able to produce an analytical insight out of a long list of numbers, the more practical and immediate application has been in enterprise services and building tools for non-technical knowledge workers to make better, data-driven decisions.

To prove out its software, the startup first built an app on the platform called Neera (Sanskrit for “water”), which specifically parses footfall and other “human movement” metrics, useful in retail, real estate and civic planning — for example, to determine how well certain retail locations are performing, measure footfall in popular areas, decide where to place or remove stores, or price a piece of property.

Although it started out aiming at mid-market and smaller companies — those most likely not to have in-house data scientists to meet their business needs — the startup has already picked up a series of customers that are quite a lot bigger than that. They include Vodafone, Barclays, EY, Pret a Manger, Knight Frank and the UK Ministry of Defence. It says it currently has some £1 million in contracts with these firms.

That, in turn, has served as the trigger to raise this latest round of funding and to launch Vayu (Sanskrit for “air”) — a more general purpose app that covers a wider set of parameters that can be applied to a dataset. So far, it has been adopted by academic researchers, financial services employees, and others that use analysis in their work, Das said.

With both Vayu and Neera, the aim — refreshingly — is to make the whole experience as privacy-friendly as possible, Das noted. Currently, you download an app if you want to use Gyana, and you keep your data local as you work on it. Gyana has no “anonymization” and no retention of data in its processes, except things like analytics around where your cursor hovers, so that Gyana knows how it can improve its product.

“There are always ways to reverse engineer these things,” Das said of anonymization. “We just wanted to make sure that we are not accidentally creating a situation where, despite learning from anonymised materials, you can reverse engineer what people are analysing. We are just not convinced.”

While there is something commendable about building and shipping a tool with a lot of potential to it, Gyana runs the risk of facing what I think of as the “water, water everywhere” problem. Sometimes if a person really has no experience or specific aim, it can be hard to think of how to get started when you can do anything. Das said they have also identified this, and so while currently Gyana already offers some tutorials and helper tools within the app to nudge the user along, the plan is to eventually bring in a large variety of datasets for people to get started with, and also to develop a more intuitive way to “read” the basics of the files in order to figure out what kinds of data inquiries a person is most likely to want to make.

The rise of “no-code” software has been a swift one in the world of tech spanning the proliferation of startups, big acquisitions, and large funding rounds. Companies like Airtable and DashDash are aimed at building analytics leaning on interfaces that follow the basic design of a spreadsheet; AppSheet, which is a no-code mobile app building platform, was recently acquired by Google; and Roblox (for building games without needing to code) and Uncorq (for app development) have both raised significant funding just this week. In the area of no-code data analytics and visualisation, there are biggies like Tableau, as well as Trifacta, RapidMiner and more.

Gartner predicts that by 2024, some 65% of all app development will be made on low- or no-code platforms, and Forrester estimates that the no- and low-code market will be worth some $10 billion this year, rising to $21.2 billion by 2024.

That represents a big business opportunity for the likes of Gyana, which has been unique in using the no-code approach specifically to tackle the area of data science.

However, in the spirit of citizen data scientists, the intention is to keep a consumer version of the apps free to use as it works on signing up enterprise users with more enhanced paid products, which will be priced on an annual license basis (currently clients are paying between $6,000 and $12,000 depending on usage, she said).

“We want to do free for as long as we can,” Das said, both in relation to the data tools and the datasets that it will offer to users. “The biggest value add is not about accessing premium data that is hard to get. We are not a data marketplace but we want to provide data that makes sense to access,” adding that even with business users, “we’d like you to do 90% of what you want to do without paying for anything.”

Dec
12
2019
--

DataRobot is acquiring Paxata to add data prep to machine learning platform

DataRobot, a company best known for creating automated machine learning models known as AutoML, announced today that it intends to acquire Paxata, a data prep platform startup. The companies did not reveal the purchase price.

Paxata raised a total of $90 million before today’s acquisition, according to the company.

Up until now, DataRobot has concentrated mostly on the machine learning and data science aspect of the workflow — building and testing the model, then putting it into production. The data prep was left to other vendors like Paxata, but DataRobot, which raised $206 million in September, saw an opportunity to fill in a gap in their platform with Paxata.

“We’ve identified, because we’ve been focused on machine learning for so long, a number of key data prep capabilities that are required for machine learning to be successful. And so we see an opportunity to really build out a unique and compelling data prep for machine learning offering that’s powered by the Paxata product, but takes the knowledge and understanding and the integration with the machine learning platform from DataRobot,” Phil Gurbacki, SVP of product development and customer experience at DataRobot, told TechCrunch.
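The data prep capabilities Gurbacki describes can be illustrated with a minimal sketch — imputing missing values and one-hot encoding a categorical column, two of the routine cleanup steps a platform like Paxata automates before data reaches a model. The column names and records below are hypothetical, not drawn from either company's product:

```python
# Hypothetical raw records with a missing numeric value and a
# categorical column, standing in for typical pre-model data.
raw_rows = [
    {"age": 34, "plan": "pro"},
    {"age": None, "plan": "free"},   # missing value to impute
    {"age": 51, "plan": "pro"},
]

def prep(rows):
    """Impute missing 'age' with the mean and one-hot encode 'plan'."""
    ages = [r["age"] for r in rows if r["age"] is not None]
    mean_age = sum(ages) / len(ages)
    categories = sorted({r["plan"] for r in rows})
    prepared = []
    for r in rows:
        row = {"age": r["age"] if r["age"] is not None else mean_age}
        for c in categories:
            # One column per category, 1 where it matches, else 0.
            row[f"plan_{c}"] = 1 if r["plan"] == c else 0
        prepared.append(row)
    return prepared

for row in prep(raw_rows):
    print(row)
```

The point of folding such steps into the machine learning platform itself, per Gurbacki, is that choices like how to impute interact with model quality — so they benefit from living next to the model rather than in a separate tool.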

Prakash Nanduri, CEO and co-founder at Paxata, says the two companies were a great fit and it made a lot of sense to come together. “DataRobot has got a significant number of customers, and every one of their customers have a data and information management problem. For us, the deal allows us to rapidly increase the number of customers that are able to go from data to value. By coming together, the value to the customer is increased at an exponential level,” he explained.

DataRobot is based in Boston, while Paxata is in Redwood City, Calif. The plan moving forward is to make Paxata a West Coast office, and all of the company’s almost 100 employees will become part of DataRobot when the deal closes.

While the two companies are working together to integrate Paxata more fully into the DataRobot platform, the companies also plan to let Paxata continue to exist as a standalone product.

DataRobot has raised more than $431 million, according to PitchBook data. It raised $206 million of that in its last round. At the time, the company indicated it would be looking for acquisition opportunities when it made sense.

This match-up seems particularly good, given how well the two companies’ capabilities complement one another, and how much customer overlap they have. The deal is expected to close before the end of the year.

Sep
20
2019
--

Vianai emerges with $50M seed and a mission to simplify machine learning tech

You don’t see a startup get a $50 million seed round all that often, but such was the case with Vianai, an early-stage startup launched by Vishal Sikka, former Infosys managing director and SAP executive. The company launched recently with a big check and a vision to transform machine learning.

Just this week, the startup had a coming out party at Oracle Open World, where Sikka delivered one of the keynotes and demoed the product for attendees. Over the last couple of years, since he left Infosys, Sikka has been thinking about the impact of AI and machine learning on society and the way it is being delivered today. He didn’t much like what he saw.

It’s worth noting that Sikka got his PhD from Stanford with a specialty in AI in 1996, so this isn’t something that’s new to him. What’s changed, as he points out, is the growing compute power and increasing amounts of data, all fueling the current AI push inside business. What he saw when he began exploring how companies are implementing AI and machine learning today was a lot of complex tooling, which, in his view, was far more complex than it needed to be.

He saw dense Jupyter notebooks filled with code. He said that if you looked at a typical machine learning model, and stripped away all of the code, what you found was a series of mathematical expressions underlying the model. He had a vision of making that model-building more about the math, while building a highly visual data science platform from the ground up.
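Sikka's observation — that a model stripped of its code is a series of mathematical expressions — can be seen in even the simplest case. A logistic-regression classifier, for instance, reduces to one expression, sigma(w · x + b). The sketch below is purely illustrative, with made-up weights and inputs; it is not Vianai's product:

```python
import math

def predict(weights, bias, features):
    # The entire "model" is one mathematical expression:
    # the sigmoid of a weighted sum of the input features.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights and features, for illustration only.
p = predict(weights=[0.8, -0.5], bias=0.1, features=[1.2, 0.4])
print(p)
```

Everything around that expression in a typical Jupyter notebook — data loading, plotting, glue code — is the complexity Sikka argues can be replaced by a more visual, math-centric interface.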

The company has been iterating on a solution over the last year with two core principles in mind: explorability and explainability, which involves interacting with the data and presenting it in a way that helps the user attain their goal faster than the current crop of model-building tools.

“It is about making the system reactive to what the user is doing, making it completely explorable, while making it possible for the developer to experiment with what’s happening in a way that is incredibly easy. To make it explainable means being able to go back and forth with the data and the model, using the model to understand the phenomenon that you’re trying to capture in the data,” Sikka told TechCrunch.

He says the tool isn’t just aimed at data scientists, it’s about business users and the data scientists sitting down together and iterating together to get the answers they are seeking, whether it’s finding a way to reduce user churn or discover fraud. These models do not live in a data science vacuum. They all have a business purpose, and he believes the only way to be successful with AI in the enterprise is to have both business users and data scientists sitting together at the same table working with the software to solve a specific problem, while taking advantage of one another’s expertise.

For Sikka, this means refining the actual problem you are trying to solve. “AI is about problem solving, but before you do the problem solving, there is also a [challenge around] finding and articulating a business problem that is relevant to businesses and that has a value to the organization,” he said.

He is very clear that he isn’t looking to replace humans, but instead wants to use AI to augment human intelligence to solve actual human problems. He points out that this product is not automated machine learning (AutoML), which he considers a deeply flawed idea. “We are not here to automate the jobs of data science practitioners. We are here to augment them,” he said.

As for that massive seed round, Sikka knew it would take a big investment to build a vision like this, and with his reputation and connections, he felt it would be better to get one big investment up front so he could concentrate on building the product and the company. He says he was fortunate to have investors who believe in the vision, even though, as he puts it, no early business plan survives the test of reality. He didn’t name specific investors, referring only to friends and to wealthy and famous people and institutions. A company spokesperson reiterated that they were not revealing a list of investors at this time.

For now, the company has a new product and plenty of money in the bank to get to profitability, which he states is his ultimate goal. Sikka could have taken a job running a large organization, but like many startup founders, he saw a problem, and he had an idea how to solve it. That was a challenge he couldn’t resist pursuing.
