Aug
19
2021
--

Companies betting on data must value people as much as AI

The Pareto principle, also known as the 80-20 rule, asserts that 80% of consequences come from 20% of causes, rendering the remainder way less impactful.

Those working with data may have heard a different rendition of the 80-20 rule: A data scientist spends 80% of their time at work cleaning up messy data as opposed to doing actual analysis or generating insights. Imagine a 30-minute drive expanded to two-and-a-half hours by traffic jams, and you’ll get the picture.

As tempting as it may be to think of a future where there is a machine learning model for every business process, we do not need to tread that far right now.

While most data scientists spend more than 20% of their time at work on actual analysis, they still have to waste countless hours turning a trove of messy data into a tidy dataset ready for analysis. This process can include removing duplicate data, making sure all entries are formatted correctly and doing other preparatory work.

On average, this workflow stage takes up about 45% of the total time, a recent Anaconda survey found. An earlier poll by CrowdFlower put the estimate at 60%, and many other surveys cite figures in this range.

None of this is to say data preparation is not important. “Garbage in, garbage out” is a well-known rule in computer science circles, and it applies to data science, too. In the best-case scenario, the script will just return an error, warning that it cannot calculate the average spending per client, because the entry for customer #1527 is formatted as text, not as a numeral. In the worst case, the company will act on insights that have little to do with reality.

The real question to ask here is whether re-formatting the data for customer #1527 is really the best way to use the time of a well-paid expert. The average data scientist is paid between $95,000 and $120,000 per year, according to various estimates. Having the employee on such pay focus on mind-numbing, non-expert tasks is a waste both of their time and the company’s money. Besides, real-world data has a lifespan, and if a dataset for a time-sensitive project takes too long to collect and process, it can be outdated before any analysis is done.

What’s more, companies’ quests for data often include wasting the time of non-data-focused personnel, with employees asked to help fetch or produce data instead of working on their regular responsibilities. More than half of the data being collected by companies is often not used at all, suggesting that the time of everyone involved in the collection has been wasted to produce nothing but operational delay and the associated losses.

The data that has been collected, on the other hand, is often only used by a designated data science team that is too overworked to go through everything that is available.

All for data, and data for all

The issues outlined here all play into the fact that save for the data pioneers like Google and Facebook, companies are still wrapping their heads around how to re-imagine themselves for the data-driven era. Data is pulled into huge databases and data scientists are left with a lot of cleaning to do, while others, whose time was wasted on helping fetch the data, do not benefit from it too often.

The truth is, we are still early when it comes to data transformation. The success of tech giants that put data at the core of their business models set off a spark that is only starting to take off. And even though the results are mixed for now, this is a sign that companies have yet to master thinking with data.

Data holds much value, and businesses are very much aware of it, as showcased by the appetite for AI experts in non-tech companies. Companies just have to do it right, and one of the key tasks in this respect is to start focusing on people as much as we do on AIs.

Data can enhance the operations of virtually any component within the organizational structure of any business. As tempting as it may be to think of a future where there is a machine learning model for every business process, we do not need to tread that far right now. The goal for any company looking to tap data today comes down to getting it from point A to point B. Point A is the part in the workflow where data is being collected, and point B is the person who needs this data for decision-making.

Importantly, point B does not have to be a data scientist. It could be a manager trying to figure out the optimal workflow design, an engineer looking for flaws in a manufacturing process or a UI designer doing A/B testing on a specific feature. All of these people must have the data they need at hand all the time, ready to be processed for insights.

People can thrive with data just as well as models, especially if the company invests in them and makes sure to equip them with basic analysis skills. In this approach, accessibility must be the name of the game.

Skeptics may claim that big data is nothing but an overused corporate buzzword, but advanced analytics capacities can enhance the bottom line for any company as long as it comes with a clear plan and appropriate expectations. The first step is to focus on making data accessible and easy to use and not on hauling in as much data as possible.

In other words, an all-around data culture is just as important for an enterprise as the data infrastructure.

Jan
13
2021
--

Stacklet raises $18M for its cloud governance platform

Stacklet, a startup that is commercializing the Cloud Custodian open-source cloud governance project, today announced that it has raised an $18 million Series A funding round. The round was led by Addition, with participation from Foundation Capital and new individual investor Liam Randall, who is joining the company as VP of business development. Addition and Foundation Capital also invested in Stacklet’s seed round, which the company announced last August. This new round brings the company’s total funding to $22 million.

Stacklet helps enterprises manage their data governance stance across different clouds, accounts, policies and regions, with a focus on security, cost optimization and regulatory compliance. The service offers its users a set of pre-defined policy packs that encode best practices for access to cloud resources, though users can obviously also specify their own rules. In addition, Stacklet offers a number of analytics functions around policy health and resource auditing, as well as a real-time inventory and change management logs for a company’s cloud assets.

The company was co-founded by Travis Stanfield (CEO) and Kapil Thangavelu (CTO). Both bring a lot of industry expertise to the table. Stanfield spent time as an engineer at Microsoft and leading DealerTrack Technologies, while Thangavelu worked at Canonical and most recently in Amazon’s AWSOpen team. Thangavelu is also one of the co-creators of the Cloud Custodian project, which was first incubated at Capital One, where the two co-founders met during their time there, and is now a sandbox project under the Cloud Native Computing Foundation’s umbrella.

“When I joined Capital One, they had made the executive decision to go all-in on cloud and close their data centers,” Thangavelu told me. “I got to join on the ground floor of that movement and Custodian was born as a side project, looking at some of the governance and security needs that large regulated enterprises have as they move into the cloud.”

As companies have sped up their move to the cloud during the pandemic, the need for products like Stacklets has also increased. The company isn’t naming most of its customers, but it has disclosed FICO a design partner. Stacklet isn’t purely focused on the enterprise, though. “Once the cloud infrastructure becomes — for a particular organization — large enough that it’s not knowable in a single person’s head, we can deliver value for you at that time and certainly, whether it’s through the open source or through Stacklet, we will have a story there.” The Cloud Custodian open-source project is already seeing serious use among large enterprises, though, and Stacklet obviously benefits from that as well.

“In just 8 months, Travis and Kapil have gone from an idea to a functioning team with 15 employees, signed early Fortune 2000 design partners and are well on their way to building the Stacklet commercial platform,” Foundation Capital’s Sid Trivedi said. “They’ve done all this while sheltered in place at home during a once-in-a-lifetime global pandemic. This is the type of velocity that investors look for from an early-stage company.”

Looking ahead, the team plans to use the new funding to continue to developed the product, which should be generally available later this year, expand both its engineering and its go-to-market teams and continue to grow the open-source community around Cloud Custodian.

Sep
23
2020
--

WhyLabs brings more transparancy to ML ops

WhyLabs, a new machine learning startup that was spun out of the Allen Institute, is coming out of stealth today. Founded by a group of former Amazon machine learning engineers, Alessya Visnjic, Sam Gracie and Andy Dang, together with Madrona Venture Group principal Maria Karaivanova, WhyLabs’ focus is on ML operations after models have been trained — not on building those models from the ground up.

The team also today announced that it has raised a $4 million seed funding round from Madrona Venture Group, Bezos Expeditions, Defy Partners and Ascend VC.

Visnjic, the company’s CEO, used to work on Amazon’s demand forecasting model.

“The team was all research scientists, and I was the only engineer who had kind of tier-one operating experience,” she told me. “So I thought, “Okay, how bad could it be? I carried the pager for the retail website before. But it was one of the first AI deployments that we’d done at Amazon at scale. The pager duty was extra fun because there were no real tools. So when things would go wrong — like we’d order way too many black socks out of the blue — it was a lot of manual effort to figure out why issues were happening.”

Image Credits: WhyLabs

But while large companies like Amazon have built their own internal tools to help their data scientists and AI practitioners operate their AI systems, most enterprises continue to struggle with this — and a lot of AI projects simply fail and never make it into production. “We believe that one of the big reasons that happens is because of the operating process that remains super manual,” Visnjic said. “So at WhyLabs, we’re building the tools to address that — specifically to monitor and track data quality and alert — you can think of it as Datadog for AI applications.”

The team has brought ambitions, but to get started, it is focusing on observability. The team is building — and open-sourcing — a new tool for continuously logging what’s happening in the AI system, using a low-overhead agent. That platform-agnostic system, dubbed WhyLogs, is meant to help practitioners understand the data that moves through the AI/ML pipeline.

For a lot of businesses, Visnjic noted, the amount of data that flows through these systems is so large that it doesn’t make sense for them to keep “lots of big haystacks with possibly some needles in there for some investigation to come in the future.” So what they do instead is just discard all of this. With its data logging solution, WhyLabs aims to give these companies the tools to investigate their data and find issues right at the start of the pipeline.

Image Credits: WhyLabs

According to Karaivanova, the company doesn’t have paying customers yet, but it is working on a number of proofs of concepts. Among those users is Zulily, which is also a design partner for the company. The company is going after mid-size enterprises for the time being, but as Karaivanova noted, to hit the sweet spot for the company, a customer needs to have an established data science team with 10 to 15 ML practitioners. While the team is still figuring out its pricing model, it’ll likely be a volume-based approach, Karaivanova said.

“We love to invest in great founding teams who have built solutions at scale inside cutting-edge companies, who can then bring products to the broader market at the right time. The WhyLabs team are practitioners building for practitioners. They have intimate, first-hand knowledge of the challenges facing AI builders from their years at Amazon and are putting that experience and insight to work for their customers,” said Tim Porter, managing director at Madrona. “We couldn’t be more excited to invest in WhyLabs and partner with them to bring cross-platform model reliability and observability to this exploding category of MLOps.”

Jan
21
2020
--

Canonical’s Anbox Cloud puts Android in the cloud

Canonical, the company behind the popular Ubuntu Linux distribution, today announced the launch of Anbox Cloud, a new platform that allows enterprises to run Android in the cloud.

On Anbox Cloud, Android becomes the guest operating system that runs containerized applications. This opens up a range of use cases, ranging from bespoke enterprise apps to cloud gaming solutions.

The result is similar to what Google does with Android apps on Chrome OS, though the implementation is quite different and is based on the LXD container manager, as well as a number of Canonical projects like Juju and MAAS for provisioning the containers and automating the deployment. “LXD containers are lightweight, resulting in at least twice the container density compared to Android emulation in virtual machines – depending on streaming quality and/or workload complexity,” the company points out in its announcements.

Anbox itself, it’s worth noting, is an open-source project that came out of Canonical and the wider Ubuntu ecosystem. Launched by Canonical engineer Simon Fels in 2017, Anbox runs the full Android system in a container, which in turn allows you to run Android application on any Linux-based platform.

What’s the point of all of this? Canonical argues that it allows enterprises to offload mobile workloads to the cloud and then stream those applications to their employees’ mobile devices. But Canonical is also betting on 5G to enable more use cases, less because of the available bandwidth but more because of the low latencies it enables.

“Driven by emerging 5G networks and edge computing, millions of users will benefit from access to ultra-rich, on-demand Android applications on a platform of their choice,” said Stephan Fabel, director of Product at Canonical, in today’s announcement. “Enterprises are now empowered to deliver high performance, high density computing to any device remotely, with reduced power consumption and in an economical manner.”

Outside of the enterprise, one of the use cases that Canonical seems to be focusing on is gaming and game streaming. A server in the cloud is generally more powerful than a smartphone, after all, though that gap is closing.

Canonical also cites app testing as another use case, given that the platform would allow developers to test apps on thousands of Android devices in parallel. Most developers, though, prefer to test their apps in real — not emulated — devices, given the fragmentation of the Android ecosystem.

Anbox Cloud can run in the public cloud, though Canonical is specifically partnering with edge computing specialist Packet to host it on the edge or on-premise. Silicon partners for the project are Ampere and Intel .

Jul
01
2019
--

We’ll talk even more Kubernetes at TC Sessions: Enterprise with Microsoft’s Brendan Burns and Google’s Tim Hockin

You can’t go to an enterprise conference these days without talking containers — and specifically the Kubernetes container management system. It’s no surprise then, that we’ll do the same at our inaugural TC Sessions: Enterprise event on September 5 in San Francisco. As we already announced last week, Kubernetes co-founder Craig McLuckie and Aparna Sinha, Google’s director of product management for Kubernetes, will join us to talk about the past, present and future of containers in the enterprise.

In addition, we can now announce that two other Kubernetes co-founders will join us: Google principal software engineer Tim Hockin, who currently works on Kubernetes and the Google Container Engine, and Microsoft distinguished engineer Brendan Burns, who was the lead engineer for Kubernetes during his time at Google.

With this, we’ll have three of the four Kubernetes co-founders onstage to talk about the five-year-old project.

Before joining the Kuberntes efforts, Hockin worked on internal Google projects like Borg and Omega, as well as the Linux kernel. On the Kubernetes project, he worked on core features and early design decisions involving networking, storage, node, multi-cluster, resource isolation and cluster sharing.

While his colleagues Craig McLuckie and Joe Beda decided to parlay their work on Kubernetes into a startup, Heptio, which they then successfully sold to VMware for about $550 million, Burns took a different route and joined the Microsoft Azure team three years ago.

I can’t think of a better group of experts to talk about the role that Kubernetes is playing in reshaping how enterprise build software.

If you want a bit of a preview, here is my conversation with McLuckie, Hockin and Microsoft’s Gabe Monroy about the history of the Kubernetes project.

Early-Bird tickets are now on sale for $249; students can grab a ticket for just $75. Book your tickets here before prices go up.

Apr
05
2019
--

On balance, the cloud has been a huge boon to startups

Today’s startups have a distinct advantage when it comes to launching a company because of the public cloud. You don’t have to build infrastructure or worry about what happens when you scale too quickly. The cloud vendors take care of all that for you.

But last month when Pinterest announced its IPO, the company’s cloud spend raised eyebrows. You see, the company is spending $750 million a year on cloud services, more specifically to AWS. When your business is primarily focused on photos and video, and needs to scale at a regular basis, that bill is going to be high.

That price tag prompted Erica Joy, a Microsoft engineer to publish this Tweet and start a little internal debate here at TechCrunch. Startups, after all, have a dog in this fight, and it’s worth exploring if the cloud is helping feed the startup ecosystem, or sending your bills soaring as they have with Pinterest.

For starters, it’s worth pointing out that Ms. Joy works for Microsoft, which just happens to be a primary competitor of Amazon’s in the cloud business. Regardless of her personal feelings on the matter, I’m sure Microsoft would be more than happy to take over that $750 million bill from Amazon. It’s a nice chunk of business, but all that aside, do startups benefit from having access to cloud vendors?

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com