Jan
17
2019
--

Former Facebook engineer picks up $15M for AI platform Spell

In 2016, Serkan Piantino packed up his desk at Facebook with hopes to move on to something new. The former director of Engineering for Facebook AI Research had every intention to keep working on AI, but quickly realized a huge issue.

Unless you’re under the umbrella of one of these big tech companies like Facebook, it can be very difficult and incredibly expensive to get your hands on the hardware necessary to run machine learning experiments.

So he built Spell, which today received $15 million in Series A funding led by Eclipse Ventures and Two Sigma Ventures.

Spell is a collaborative platform that lets anyone run machine learning experiments. The company connects clients with the best, newest hardware hosted by Google, AWS and Microsoft Azure and gives them the software interface they need to run, collaborate and build with AI.

“We spent decades getting to a laptop powerful enough to develop a mobile app or a website, but we’re struggling with things we develop in AI that we haven’t struggled with since the 70s,” said Piantino. “Before PCs existed, the computers filled the whole room at a university or NASA and people used terminals to log into a single main frame. It’s why Unix was invented, and that’s kind of what AI needs right now.”

In a meeting with Piantino this week, TechCrunch got a peek at the product. First, Piantino pulled out his MacBook and opened up Terminal. He began to run his own code against MNIST, which is a database of handwritten digits commonly used to train image detection algorithms.

He started the program and then moved over to the Spell platform. While the original program was just getting started, Spell’s cloud computing platform had completed the test in less than a minute.

The advantage here is obvious. Engineers who want to work on AI, either on their own or for a company, have a huge task in front of them. They essentially have to build their own computer, complete with the high-powered GPUs necessary to run their tests.

With Spell, the newest GPUs from Nvidia and Google are virtually available for anyone to run their tests.

Individual users can get on for free, specify the type of GPU they need to compute their experiment and simply let it run. Corporate users, on the other hand, are able to view the runs taking place on Spell and compare experiments, allowing users to collaborate on their projects from within the platform.

Enterprise clients can set up their own cluster, and keep all of their programs private on the Spell platform, rather than running tests on the public cluster.

Spell also offers enterprise customers a “spell hyper” command that offers built-in support for hyperparameter optimization. Folks can track their models and results and deploy them to Kubernetes/Kubeflow in a single click.

But perhaps most importantly, Spell allows an organization to instantly transform their model into an API that can be used more broadly throughout the organization, or used directly within an app or website.

The implications here are huge. Small companies and startups looking to get into AI now have a much lower barrier to entry, whereas large traditional companies can build out their own proprietary machine learning algorithms for use within the organization without an outrageous upfront investment.

Individual users can get on the platform for free, whereas enterprise clients can get started for $99/month per host you use over the course of a month. Piantino explains that Spell charges based on concurrent usage, so if the customer has 10 concurrent things running, the company considers that the “size” of the Spell cluster and charges based on that.

Piantino sees Spell’s model as the key to defensibility. Whereas many cloud platforms try to lock customers in to their entire suite of products, Spell works with any language framework and lets users plug and play on the platforms of their choice by simply commodifying the hardware. In fact, Spell doesn’t even share with clients which cloud cluster (Microsoft Azure, Google or AWS) they’re on.

So, on the one hand the speed of the tests themselves goes up based on access to new hardware, but, because Spell is an agnostic platform, there is also a huge advantage in how quickly one can get set up and start working.

The company plans to use the funding to further grow the team and the product, and Piantino says he has his eye out for top-tier engineering talent, as well as a designer.

Jan
16
2019
--

Nvidia’s T4 GPUs are now available in beta on Google Cloud

Google Cloud today announced that Nvidia’s Turing-based Tesla T4 data center GPUs are now available in beta in its data centers in Brazil, India, Netherlands, Singapore, Tokyo and the United States. Google first announced a private test of these cards in November, but that was a very limited alpha test. All developers can now take these new T4 GPUs for a spin through Google’s Compute Engine service.

The T4, which essentially uses the same processor architecture as Nvidia’s RTX cards for consumers, slots in-between the existing Nvidia V100 and P4 GPUs on the Google Cloud Platform . While the V100 is optimized for machine learning, though, the T4 (as its P4 predecessor) is more of a general-purpose GPU that also turns out to be great for training models and inferencing.

In terms of machine and deep learning performance, the 16GB T4 is significantly slower than the V100, though if you are mostly running inference on the cards, you may actually see a speed boost. Unsurprisingly, using the T4 is also cheaper than the V100, starting at $0.95 per hour compared to $2.48 per hour for the V100, with another discount for using preemptible VMs and Google’s usual sustained use discounts.

Google says that the card’s 16GB memory should easily handle large machine learning models and the ability to run multiple smaller models at the same time. The standard PCI Express 3.0 card also comes with support for Nvidia’s Tensor Cores to accelerate deep learning and Nvidia’s new RTX ray-tracing cores. Performance tops out at 260 TOPS and developers can connect up to four T4 GPUs to a virtual machine.

It’s worth stressing that this is also the first GPU in the Google Cloud lineup that supports Nvidia’s ray-tracing technology. There isn’t a lot of software on the market yet that actually makes use of this technique, which allows you to render more lifelike images in real time, but if you need a virtual workstation with a powerful next-generation graphics card, that’s now an option.

With today’s beta launch of the T4, Google Cloud now offers quite a variety of Nvidia GPUs, including the K80, P4, P100 and V100, all at different price points and with different performance characteristics.

Oct
10
2018
--

Nvidia launches Rapids to help bring GPU acceleration to data analytics

Nvidia, together with partners like IBM, HPE, Oracle, Databricks and others, is launching a new open-source platform for data science and machine learning today. Rapids, as the company is calling it, is all about making it easier for large businesses to use the power of GPUs to quickly analyze massive amounts of data and then use that to build machine learning models.

“Businesses are increasingly data-driven,” Nvidia’s VP of Accelerated Computing Ian Buck told me. “They sense the market and the environment and the behavior and operations of their business through the data they’ve collected. We’ve just come through a decade of big data and the output of that data is using analytics and AI. But most it is still using traditional machine learning to recognize complex patterns, detect changes and make predictions that directly impact their bottom line.”

The idea behind Rapids then is to work with the existing popular open-source libraries and platforms that data scientists use today and accelerate them using GPUs. Rapids integrates with these libraries to provide accelerated analytics, machine learning and — in the future — visualization.

Rapids is based on Python, Buck noted; it has interfaces that are similar to Pandas and Scikit, two very popular machine learning and data analysis libraries, and it’s based on Apache Arrow for in-memory database processing. It can scale from a single GPU to multiple notes and IBM notes that the platform can achieve improvements of up to 50x for some specific use cases when compared to running the same algorithms on CPUs (though that’s not all that surprising, given what we’ve seen from other GPU-accelerated workloads in the past).

Buck noted that Rapids is the result of a multi-year effort to develop a rich enough set of libraries and algorithms, get them running well on GPUs and build the relationships with the open-source projects involved.

“It’s designed to accelerate data science end-to-end,” Buck explained. “From the data prep to machine learning and for those who want to take the next step, deep learning. Through Arrow, Spark users can easily move data into the Rapids platform for acceleration.”

Indeed, Spark is surely going to be one of the major use cases here, so it’s no wonder that Databricks, the company founded by the team behind Spark, is one of the early partners.

“We have multiple ongoing projects to integrate Spark better with native accelerators, including Apache Arrow support and GPU scheduling with Project Hydrogen,” said Spark founder Matei Zaharia in today’s announcement. “We believe that RAPIDS is an exciting new opportunity to scale our customers’ data science and AI workloads.”

Nvidia is also working with Anaconda, BlazingDB, PyData, Quansight and scikit-learn, as well as Wes McKinney, the head of Ursa Labs and the creator of Apache Arrow and Pandas.

Another partner is IBM, which plans to bring Rapids support to many of its services and platforms, including its PowerAI tools for running data science and AI workloads on GPU-accelerated Power9 servers, IBM Watson Studio and Watson Machine Learning and the IBM Cloud with its GPU-enabled machines. “At IBM, we’re very interested in anything that enables higher performance, better business outcomes for data science and machine learning — and we think Nvidia has something very unique here,” Rob Thomas, the GM of IBM Analytics told me.

“The main benefit to the community is that through an entirely free and open-source set of libraries that are directly compatible with the existing algorithms and subroutines that their used to — they now get access to GPU-accelerated versions of them,” Buck said. He also stressed that Rapids isn’t trying to compete with existing machine learning solutions. “Part of the reason why Rapids is open source is so that you can easily incorporate those machine learning subroutines into their software and get the benefits of it.”

Sep
12
2018
--

Nvidia launches the Tesla T4, its fastest data center inferencing platform yet

Nvidia today announced its new GPU for machine learning and inferencing in the data center. The new Tesla T4 GPUs (where the ‘T’ stands for Nvidia’s new Turing architecture) are the successors to the current batch of P4 GPUs that virtually every major cloud computing provider now offers. Google, Nvidia said, will be among the first to bring the new T4 GPUs to its Cloud Platform.

Nvidia argues that the T4s are significantly faster than the P4s. For language inferencing, for example, the T4 is 34 times faster than using a CPU and more than 3.5 times faster than the P4. Peak performance for the P4 is 260 TOPS for 4-bit integer operations and 65 TOPS for floating point operations. The T4 sits on a standard low-profile 75 watt PCI-e card.

What’s most important, though, is that Nvidia designed these chips specifically for AI inferencing. “What makes Tesla T4 such an efficient GPU for inferencing is the new Turing tensor core,” said Ian Buck, Nvidia’s VP and GM of its Tesla data center business. “[Nvidia CEO] Jensen [Huang] already talked about the Tensor core and what it can do for gaming and rendering and for AI, but for inferencing — that’s what it’s designed for.” In total, the chip features 320 Turing Tensor cores and 2,560 CUDA cores.

In addition to the new chip, Nvidia is also launching a refresh of its TensorRT software for optimizing deep learning models. This new version also includes the TensorRT inference server, a fully containerized microservice for data center inferencing that plugs seamlessly into an existing Kubernetes infrastructure.

 

 

May
30
2018
--

Nvidia launches colossal HGX-2 cloud server to power HPC and AI

Nvidia launched a monster box yesterday called the HGX-2, and it’s the stuff that geek dreams are made of. It’s a cloud server that is purported to be so powerful it combines high performance computing with artificial intelligence requirements in one exceptionally compelling package.

You know you want to know the specs, so let’s get to it: It starts with 16x NVIDIA Tesla V100 GPUs. That’s good for 2 petaFLOPS for AI with low precision, 250 teraFLOPS
for medium precision and 125 teraFLOPS for those times when you need the highest precision. It comes standard with a 1/2 a terabyte of memory and 12 Nvidia NVSwitches, which enable GPU to GPU communications at 300 GB per second. They have doubled the capacity from the HGX-1 released last year.

Chart: Nvidia

Paresh Kharya, group product marketing manager for Nvidia’s Tesla data center products says this communication speed enables them to treat the GPUs essentially as a one giant, single GPU. “And what that allows [developers] to do is not just access that massive compute power, but also access that half a terabyte of GPU memory as a single memory block in their programs,” he explained.

Graphic: Nvidia

Unfortunately you won’t be able to buy one of these boxes. In fact, Nvidia is distributing them strictly to resellers, who will likely package these babies up and sell them to hyperscale datacenters and cloud providers. The beauty of this approach for cloud resellers is that when they buy it, they have the entire range of precision in a single box, Kharya said

“The benefit of the unified platform is as companies and cloud providers are building out their infrastructure, they can standardize on a single unified architecture that supports the entire range of high performance workloads. So whether it’s AI, or whether it’s high performance simulations the entire range of workloads is now possible in just a single platform,”Kharya explained.

He points out this is particularly important in large scale datacenters. “In hyperscale companies or cloud providers, the main benefit that they’re providing is the economies of scale. If they can standardize on the fewest possible architectures, they can really maximize the operational efficiency. And what HGX allows them to do is to standardize on that single unified platform,” he added.

As for developers, they can write programs that take advantage of the underlying technologies and program in the exact level of precision they require from a single box.

The HGX-2 powered servers will be available later this year from partner resellers including Lenovo, QCT, Supermicro and Wiwynn.

Mar
27
2018
--

Pure Storage teams with Nvidia on GPU-fueled Flash storage solution for AI

As companies gather increasing amounts of data, they face a choice over bottlenecks. They can have it in the storage component or the backend compute system. Some companies have attacked the problem by using GPUs to streamline the back end problem or Flash storage to speed up the storage problem. Pure Storage wants to give customers the best of both worlds.

Today it announced, Airi, a complete data storage solution for AI workloads in a box.

Under the hood Airi starts with a Pure Storage FlashBlade, a storage solution that Pure created specifically with AI and machine learning kind of processing in mind. NVidia contributes the pure power with four NVIDIA DGX-1 supercomputers, delivering four petaFLOPS of performance with NVIDIA ® Tesla ® V100 GPUs. Arista provides the networking hardware to make it all work together with Arista 100GbE switches. The software glue layer comes from the NVIDIA GPU Cloud deep learning stack and Pure Storage AIRI Scaling Toolkit.

Photo: Pure Storage

One interesting aspect of this deal is that the FlashBlade product operates as a separate product inside of the Pure Storage organization. They have put together a team of engineers with AI and data pipeline understanding with the focus inside the company on finding ways to move beyond the traditional storage market and find out where the market is going.

This approach certainly does that, but the question is do companies want to chase the on-prem hardware approach or take this kind of data to the cloud. Pure would argue that the data gravity of AI workloads would make this difficult to achieve with a cloud solution, but we are seeing increasingly large amounts of data moving to the cloud with the cloud vendors providing tools for data scientists to process that data.

If companies choose to go the hardware route over the cloud, each vendor in this equation — whether Nvidia, Pure Storage or Arista — should benefit from a multi-vendor sale. The idea ultimately is to provide customers with a one-stop solution they can install quickly inside a data center if that’s the approach they want to take.

Mar
15
2018
--

The red-hot AI hardware space gets even hotter with $56M for a startup called SambaNova Systems

Another massive financing round for an AI chip company is coming in today, this time for SambaNova Systems — a startup founded by a pair of Stanford professors and a longtime chip company executive — to build out the next generation of hardware to supercharge AI-centric operations.

SambaNova joins an already quite large class of startups looking to attack the problem of making AI operations much more efficient and faster by rethinking the actual substrate where the computations happen. The GPU has become increasingly popular among developers for its ability to handle the kinds of lightweight mathematics in very speedy fashion necessary for AI operations. Startups like SambaNova look to create a new platform from scratch, all the way down to the hardware, that is optimized exactly for those operations. The hope is that by doing that, it will be able to outclass a GPU in terms of speed, power usage, and even potentially the actual size of the chip. SambaNova today said it has raised a massive $56 million series A financing round was co-led by GV and Walden International, with participation from Redline Capital and Atlantic Bridge Ventures.

SambaNova is the product of technology from Kunle Olukotun and Chris Ré, two professors at Stanford, and led by former Oracle SVP of development Rodrigo Liang, who was also a VP at Sun for almost 8 years. When looking at the landscape, the team at SambaNova looked to work their way backwards, first identifying what operations need to happen more efficiently and then figuring out what kind of hardware needs to be in place in order to make that happen. That boils down to a lot of calculations stemming from a field of mathematics called linear algebra done very, very quickly, but it’s something that existing CPUs aren’t exactly tuned to do. And a common criticism from most of the founders in this space is that Nvidia GPUs, while much more powerful than CPUs when it comes to these operations, are still ripe for disruption.

“You’ve got these huge [computational] demands, but you have the slowing down of Moore’s law,” Olukotun said. “The question is, how do you meet these demands while Moore’s law slows. Fundamentally you have to develop computing that’s more efficient. If you look at the current approaches to improve these applications based on multiple big cores or many small, or even FPGA or GPU, we fundamentally don’t think you can get to the efficiencies you need. You need an approach that’s different in the algorithms you use and the underlying hardware that’s also required. You need a combination of the two in order to achieve the performance and flexibility levels you need in order to move forward.”

While a $56 million funding round for a series A might sound massive, it’s becoming a pretty standard number for startups looking to attack this space, which has an opportunity to beat massive chipmakers and create a new generation of hardware that will be omnipresent among any device that is built around artificial intelligence — whether that’s a chip sitting on an autonomous vehicle doing rapid image processing to potentially even a server within a healthcare organization training models for complex medical problems. Graphcore, another chip startup, got $50 million in funding from Sequoia Capital, while Cerebras Systems also received significant funding from Benchmark Capital.

Olukotun and Liang wouldn’t go into the specifics of the architecture, but they are looking to redo the operational hardware to optimize for the AI-centric frameworks that have become increasingly popular in fields like image and speech recognition. At its core, that involves a lot of rethinking of how interaction with memory occurs and what happens with heat dissipation for the hardware, among other complex problems. Apple, Google with its TPU, and reportedly Amazon have taken an intense interest in this space to design their own hardware that’s optimized for products like Siri or Alexa, which makes sense because dropping that latency to as close to zero as possible with as much accuracy as possible in the end improves the user experience. A great user experience leads to more lock-in for those platforms, and while the larger players may end up making their own hardware, GV’s Dave Munichiello — who is joining the company’s board — says this is basically a validation that everyone else is going to need the technology soon enough.

“Large companies see a need for specialized hardware and infrastructure,” he said. “AI and large-scale data analytics are so essential to providing services the largest companies provide that they’re willing to invest in their own infrastructure, and that tells us more investment is coming. What Amazon and Google and Microsoft and Apple are doing today will be what the rest of the Fortune 100 are investing in in 5 years. I think it just creates a really interesting market and an opportunity to sell a unique product. It just means the market is really large, if you believe in your company’s technical differentiation, you welcome competition.”

There is certainly going to be a lot of competition in this area, and not just from those startups. While SambaNova wants to create a true platform, there are a lot of different interpretations of where it should go — such as whether it should be two separate pieces of hardware that handle either inference or machine training. Intel, too, is betting on an array of products, as well as a technology called Field Programmable Gate Arrays (or FPGA), which would allow for a more modular approach in building hardware specified for AI and are designed to be flexible and change over time. Both Munichiello’s and Olukotun’s arguments are that these require developers who have a special expertise of FPGA, which is a sort of niche-within-a-niche that most organizations will probably not have readily available.

Nvidia has been a massive benefactor in the explosion of AI systems, but it clearly exposed a ton of interest in investing in a new breed of silicon. There’s certainly an argument for developer lock-in on Nvidia’s platforms like Cuda. But there are a lot of new frameworks, like TensorFlow, that are creating a layer of abstraction that are increasingly popular with developers. That, too represents an opportunity for both SambaNova and other startups, who can just work to plug into those popular frameworks, Olukotun said. Cerebras Systems CEO Andrew Feldman actually also addressed some of this on stage at the Goldman Sachs Technology and Internet Conference last month.

“Nvidia has spent a long time building an ecosystem around their GPUs, and for the most part, with the combination of TensorFlow, Google has killed most of its value,” Feldman said at the conference. “What TensorFlow does is, it says to researchers and AI professionals, you don’t have to get into the guts of the hardware. You can write at the upper layers and you can write in Python, you can use scripts, you don’t have to worry about what’s happening underneath. Then you can compile it very simply and directly to a CPU, TPU, GPU, to many different hardwares, including ours. If in order to do that work, you have to be the type of engineer that can do hand-tuned assembly or can live deep in the guts of hardware, there will be no adoption… We’ll just take in their TensorFlow, we don’t have to worry about anything else.”

(As an aside, I was once told that Cuda and those other lower-level platforms are really used by AI wonks like Yann LeCun building weird AI stuff in the corners of the Internet.)

There are, also, two big question marks for SambaNova: first, it’s very new, having started in just November while many of these efforts for both startups and larger companies have been years in the making. Munichiello’s answer to this is that the development for those technologies did, indeed, begin a while ago — and that’s not a terrible thing as SambaNova just gets started in the current generation of AI needs. And the second, among some in the valley, is that most of the industry just might not need hardware that’s does these operations in a blazing fast manner. The latter, you might argue, could just be alleviated by the fact that so many of these companies are getting so much funding, with some already reaching close to billion-dollar valuations.

But, in the end, you can now add SambaNova to the list of AI startups that have raised enormous rounds of funding — one that stretches out to include a myriad of companies around the world like Graphcore and Cerebras Systems, as well as a lot of reported activity out of China with companies like Cambricon Technology and Horizon Robotics. This effort does, indeed, require significant investment not only because it’s hardware at its base, but it has to actually convince customers to deploy that hardware and start tapping the platforms it creates, which supporting existing frameworks hopefully alleviates.

“The challenge you see is that the industry, over the last ten years, has underinvested in semiconductor design,” Liang said. “If you look at the innovations at the startup level all the way through big companies, we really haven’t pushed the envelope on semiconductor design. It was very expensive and the returns were not quite as good. Here we are, suddenly you have a need for semiconductor design, and to do low-power design requires a different skillset. If you look at this transition to intelligent software, it’s one of the biggest transitions we’ve seen in this industry in a long time. You’re not accelerating old software, you want to create that platform that’s flexible enough [to optimize these operations] — and you want to think about all the pieces. It’s not just about machine learning.”

Sep
21
2017
--

Google Cloud adds support for more powerful Nvidia GPUs

 Google Cloud Platform announced support for some powerful Nvidia GPUs on Google Compute Engine today. For starters, the company is making Nvidia K80 GPUs generally available. At the same time, it’s launching support for Nvidia P100 GPUs in Beta along with a new sustained pricing model. For companies working with machine learning workloads, having access to GPUs in the cloud provides… Read More

Aug
07
2017
--

IBM touts improved distributed training time for visual recognition models

 Two months ago, Facebook’s AI Research Lab (FAIR) published some impressive training times for massively distributed visual recognition models. Today IBM is firing back with some numbers of its own. IBM’s research groups says it was able to train ResNet-50 for 1k classes in 50 minutes across 256 GPUs — which is just the polite way of saying “my model trains faster than… Read More

Aug
02
2017
--

HP’s new Nvidia-powered backpack VR PC is designed for work, not play

 HP has a new entrant in that most curious PC niche – the backpack computer. A product of the virtual reality computing wave, the backpack PC provides all the power needed to drive high-quality VR headsets like Oculus Rift and HTC Vive, but with a form factor that allows the user to roam about untethered. The new HP Z VR Backpack is a bit different from the rest of the field, though,… Read More

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com