Nov 02, 2020

AWS launches its next-gen GPU instances

AWS today announced its newest GPU-equipped instances. Dubbed P4d, they arrive a decade after the company introduced its first set of Cluster GPU instances. The new generation is powered by Intel Cascade Lake processors and eight of Nvidia’s A100 Tensor Core GPUs. AWS promises up to 2.5x the deep learning performance of the previous generation, and training a comparable model should be about 60% cheaper on these new instances.


For now, only one size is available: the p4d.24xlarge, in AWS parlance. Its eight A100 GPUs are connected over Nvidia’s NVLink interconnect and also support the company’s GPUDirect interface.

With 320 GB of high-bandwidth GPU memory and 400 Gbps networking, this is obviously a very powerful machine. Add to that the 96 CPU cores, 1.1 TB of system memory and 8 TB of SSD storage, and it’s perhaps no surprise that the on-demand price is $32.77 per hour (though that drops to less than $20 per hour for one-year reserved instances and $11.57 for three-year reserved instances).
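If you want to try one of these machines, a minimal sketch of requesting a single p4d.24xlarge with boto3, AWS’s Python SDK, might look like the following; the region, AMI ID and key pair name are placeholders, and your account needs sufficient P4d quota in whatever region you choose.

```python
# Minimal sketch: launch one On-Demand p4d.24xlarge instance with boto3.
# The AMI ID and key pair below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: e.g. a Deep Learning AMI
    InstanceType="p4d.24xlarge",      # the single P4d size currently offered
    KeyName="my-key-pair",            # placeholder key pair name
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "p4d-training"}],
    }],
)

print(response["Instances"][0]["InstanceId"])
```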


On the extreme end, you can combine 4,000 or more GPUs into what AWS calls an EC2 UltraCluster, essentially a supercomputer-scale machine for high-performance computing workloads. Given the price, you’re not likely to spin up one of these clusters to train a model for your toy app anytime soon, but AWS has already been working with a number of enterprise customers to test these instances and clusters, including Toyota Research Institute, GE Healthcare and Aon.

“At [Toyota Research Institute], we’re working to build a future where everyone has the freedom to move,” said Mike Garrison, Technical Lead, Infrastructure Engineering at TRI. “The previous generation P3 instances helped us reduce our time to train machine learning models from days to hours and we are looking forward to utilizing P4d instances, as the additional GPU memory and more efficient float formats will allow our machine learning team to train with more complex models at an even faster speed.”

Mar 05, 2020

Nvidia acquires data storage and management platform SwiftStack

Nvidia today announced that it has acquired SwiftStack, a software-centric data storage and management platform that supports public cloud, on-premises and edge deployments.

SwiftStack’s recent launches focused on improving its support for AI, high-performance computing and accelerated computing workloads, which is surely what Nvidia is most interested in here.

“Building AI supercomputers is exciting to the entire SwiftStack team,” says the company’s co-founder and CPO Joe Arnold in today’s announcement. “We couldn’t be more thrilled to work with the talented folks at NVIDIA and look forward to contributing to its world-leading accelerated computing solutions.”

The two companies did not disclose the price of the acquisition, but SwiftStack had previously raised about $23.6 million in Series A and B rounds led by Mayfield Fund and OpenView Venture Partners. Other investors include Storm Ventures and UMC Capital.

SwiftStack, which was founded in 2011, placed an early bet on OpenStack, the massive open-source project that aimed to give enterprises an AWS-like management experience in their own data centers. The company was one of the largest contributors to OpenStack’s Swift object storage platform and offered a number of services around this, though it seems like in recent years, it has downplayed the OpenStack relationship as that platform’s popularity has fizzled in many verticals.

SwiftStack lists the likes of PayPal, Rogers, data center provider DC Blox, Snapfish and Verizon (TechCrunch’s parent company) on its customer page. Nvidia, too, is a customer.

SwiftStack notes that its team will continue to maintain the existing set of open-source tools, including Swift, ProxyFS, 1space and Controller.

“SwiftStack’s technology is already a key part of NVIDIA’s GPU-powered AI infrastructure, and this acquisition will strengthen what we do for you,” says Arnold.

Jan 16, 2019

Nvidia’s T4 GPUs are now available in beta on Google Cloud

Google Cloud today announced that Nvidia’s Turing-based Tesla T4 data center GPUs are now available in beta in its data centers in Brazil, India, the Netherlands, Singapore, Tokyo and the United States. Google first announced a private preview of these cards in November, but that was a very limited alpha test. All developers can now take these new T4 GPUs for a spin through Google’s Compute Engine service.

The T4, which essentially uses the same processor architecture as Nvidia’s consumer RTX cards, slots in between the existing Nvidia V100 and P4 GPUs on the Google Cloud Platform. While the V100 is optimized for machine learning, the T4 (like its P4 predecessor) is more of a general-purpose GPU that also turns out to be great for training models and inferencing.

In terms of machine and deep learning performance, the 16GB T4 is significantly slower than the V100, though if you are mostly running inference on the cards, you may actually see a speed boost. Unsurprisingly, using the T4 is also cheaper than the V100, starting at $0.95 per hour compared to $2.48 per hour for the V100, with another discount for using preemptible VMs and Google’s usual sustained use discounts.

Google says that the card’s 16 GB of memory should easily handle large machine learning models and also makes it possible to run multiple smaller models at the same time. The standard PCI Express 3.0 card also comes with support for Nvidia’s Tensor Cores to accelerate deep learning and Nvidia’s new RTX ray-tracing cores. Performance tops out at 260 TOPS, and developers can connect up to four T4 GPUs to a virtual machine.
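If you want to attach one of these cards to a VM yourself, a rough sketch using the google-cloud-compute Python client (one of several ways to do this) might look like the following; the project ID, zone, machine type and boot image are assumptions, not anything Google specified here.

```python
# Rough sketch: create a Compute Engine VM with a single T4 attached.
# Project, zone, machine type and boot image are assumed placeholders.
from google.cloud import compute_v1

project = "my-project"   # placeholder project ID
zone = "us-central1-a"   # placeholder zone with T4 capacity

instance = compute_v1.Instance()
instance.name = "t4-inference-vm"
instance.machine_type = f"zones/{zone}/machineTypes/n1-standard-8"

accel = compute_v1.AcceleratorConfig()
accel.accelerator_type = (
    f"projects/{project}/zones/{zone}/acceleratorTypes/nvidia-tesla-t4"
)
accel.accelerator_count = 1  # up to four T4s can be attached per VM
instance.guest_accelerators = [accel]

# GPU VMs generally must terminate (not live-migrate) on host maintenance.
instance.scheduling = compute_v1.Scheduling(on_host_maintenance="TERMINATE")

disk = compute_v1.AttachedDisk()
disk.boot = True
disk.auto_delete = True
disk.initialize_params = compute_v1.AttachedDiskInitializeParams(
    source_image="projects/debian-cloud/global/images/family/debian-11"
)
instance.disks = [disk]

nic = compute_v1.NetworkInterface()
nic.network = "global/networks/default"
instance.network_interfaces = [nic]

client = compute_v1.InstancesClient()
operation = client.insert(project=project, zone=zone, instance_resource=instance)
operation.result()  # block until the VM is created
```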

It’s worth stressing that this is also the first GPU in the Google Cloud lineup that supports Nvidia’s ray-tracing technology. There isn’t a lot of software on the market yet that actually makes use of this technique, which allows you to render more lifelike images in real time, but if you need a virtual workstation with a powerful next-generation graphics card, that’s now an option.

With today’s beta launch of the T4, Google Cloud now offers quite a variety of Nvidia GPUs, including the K80, P4, P100 and V100, all at different price points and with different performance characteristics.

Aug 07, 2017

IBM touts improved distributed training time for visual recognition models

Two months ago, Facebook’s AI Research Lab (FAIR) published some impressive training times for massively distributed visual recognition models. Today IBM is firing back with some numbers of its own. IBM’s research group says it was able to train ResNet-50 for 1k classes in 50 minutes across 256 GPUs — which is just the polite way of saying “my model trains faster than…”
