Apr 21, 2020

AWS and Facebook launch an open-source model server for PyTorch

AWS and Facebook today announced two new open-source projects around PyTorch, the popular open-source machine learning framework. The first of these is TorchServe, a model-serving framework for PyTorch that will make it easier for developers to put their models into production. The other is TorchElastic, a library that makes it easier for developers to build fault-tolerant training jobs on Kubernetes clusters, including on AWS's EC2 spot instances and its Elastic Kubernetes Service.
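To give a sense of the workflow TorchServe proposes, here's a rough sketch of a custom request handler, with the project's packaging and serving commands in comments. The file names, model name and payload format are illustrative placeholders, not anything AWS or Facebook prescribes.

# my_handler.py -- a minimal TorchServe custom handler (sketch).
# TorchServe also ships ready-made handlers (e.g. image_classifier);
# a custom one subclasses BaseHandler.
import torch
from ts.torch_handler.base_handler import BaseHandler

class MyHandler(BaseHandler):
    def preprocess(self, data):
        # Turn the raw request payloads into one batched input tensor.
        rows = [torch.tensor(row.get("body")) for row in data]
        return torch.stack(rows)

    def inference(self, inputs):
        # self.model is loaded by BaseHandler from the archived weights.
        with torch.no_grad():
            return self.model(inputs)

    def postprocess(self, outputs):
        # Return one JSON-serializable result per request in the batch.
        return outputs.tolist()

# Packaging and serving (shell, paths are placeholders):
#   torch-model-archiver --model-name mymodel --version 1.0 \
#       --serialized-file model.pt --handler my_handler.py --export-path model_store
#   torchserve --start --model-store model_store --models mymodel=mymodel.mar
# Predictions are then served over HTTP:
#   curl http://localhost:8080/predictions/mymodel -T sample.json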

In many ways, the two companies are taking what they have learned from running their own machine learning systems at scale and putting those lessons into these projects. For AWS, that's mostly SageMaker, the company's machine learning platform, but as Bratin Saha, AWS VP and GM for Machine Learning Services, told me, the work on PyTorch was mostly motivated by requests from the community. And while there are obviously other model servers available today, like TensorFlow Serving and the Multi Model Server, Saha argues that it would be hard to optimize those for PyTorch.

“If we tried to take some other model server, we would not be able to quote optimize it as much, as well as create it within the nuances of how PyTorch developers like to see this,” he said. AWS has lots of experience in running its own model servers for SageMaker that can handle multiple frameworks, but the community was asking for a model server that was tailored toward how they work. That also meant adapting the server’s API to what PyTorch developers expect from their framework of choice, for example.

As Saha told me, the server that AWS and Facebook are now launching as open source is similar to what AWS is using internally. “It’s quite close,” he said. “We actually started with what we had internally for one of our model servers and then put it out to the community, worked closely with Facebook, to iterate and get feedback — and then modified it so it’s quite close.”

Bill Jia, Facebook's VP of AI Infrastructure, also told me he's very happy about how his team and the community have pushed PyTorch forward in recent years. “If you look at the entire industry community — a large number of researchers and enterprise users are using AWS,” he said. “And then we figured out if we can collaborate with AWS and push PyTorch together, then Facebook and AWS can get a lot of benefits, but more so, all the users can get a lot of benefits from PyTorch. That’s our reason for why we wanted to collaborate with AWS.”

As for TorchElastic, the focus is on letting developers build training systems that run on large distributed Kubernetes clusters, where you might want to use cheaper spot instances. Those are preemptible, though, so your system has to be able to handle instances disappearing mid-run, while machine learning training frameworks traditionally expect the number of instances to stay the same throughout the process. That, too, is something AWS originally built for SageMaker, where it's fully managed, so developers never have to think about it. For developers who want more control over their dynamic training systems, or who want to stay very close to the metal, TorchElastic now allows them to recreate this experience on their own Kubernetes clusters.
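To illustrate what "elastic" means in practice, here's a rough sketch of a training script written for TorchElastic's launcher as it shipped at the time; the node range, etcd endpoint and job name below are placeholders.

# train.py -- sketch of an elastic-friendly training script.
# The TorchElastic launcher handles rendezvous and restarts; the script
# (re)initializes the process group from the environment the launcher sets up.
import torch.distributed as dist

def main():
    # Rank and world size come from the launcher and can change between
    # restarts as spot instances come and go, so checkpoint regularly
    # and resume from the latest checkpoint on startup.
    dist.init_process_group(backend="gloo")
    print(f"worker {dist.get_rank()} of {dist.get_world_size()} running")
    # ... load latest checkpoint, run training steps, save checkpoints ...

if __name__ == "__main__":
    main()

# Launched with an etcd-backed rendezvous; the cluster may scale anywhere
# between 1 and 4 nodes (all values below are placeholders):
#   python -m torchelastic.distributed.launch \
#       --nnodes=1:4 --nproc_per_node=2 \
#       --rdzv_backend=etcd --rdzv_endpoint=etcd-server:2379 \
#       --rdzv_id=my-job train.py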

AWS has a mixed reputation when it comes to engaging with the open-source community. In this case, though, it's nice to see AWS lead the way and bring some of its own work on building model servers to the PyTorch community. In the machine learning ecosystem, that kind of engagement is very much expected, and Saha stressed that AWS has long engaged with the community as one of the main contributors to MXNet and through its contributions to projects like Jupyter and TensorFlow and libraries like NumPy.

Dec 3, 2019

AWS AutoPilot gives you more visible AutoML in SageMaker Studio

Today at AWS re:Invent in Las Vegas, the company announced AutoPilot, a new tool that gives you greater visibility into automated machine learning model creation, known as AutoML. It's part of SageMaker Studio, which was also announced today.

As AWS CEO Andy Jassy pointed out onstage today, one of the problems with AutoML is that it’s basically a black box. If you want to improve a mediocre model, or just evolve it for your business, you have no idea how it was built.

The idea behind AutoPilot is to give you the ease of model creation you get from an AutoML-generated model, but also give you much deeper insight into how the system built the model. “AutoPilot is a way to create a model automatically, but give you full visibility and control,” Jassy said.

“Using a single API call, or a few clicks in Amazon SageMaker Studio, SageMaker Autopilot first inspects your data set, and runs a number of candidates to figure out the optimal combination of data preprocessing steps, machine learning algorithms and hyperparameters. Then, it uses this combination to train an Inference Pipeline, which you can easily deploy either on a real-time endpoint or for batch processing. As usual with Amazon SageMaker, all of this takes place on fully-managed infrastructure,” the company explained in a blog post announcing the new feature.
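That "single API call" corresponds to the CreateAutoMLJob action. Here's a rough boto3 sketch, in which the job name, S3 paths, target column and IAM role are all placeholders:

import boto3

sm = boto3.client("sagemaker")

# One call kicks off the whole Autopilot run: data inspection, candidate
# pipelines, algorithm selection and hyperparameter tuning.
sm.create_auto_ml_job(
    AutoMLJobName="my-autopilot-job",
    InputDataConfig=[{
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/train/",
        }},
        "TargetAttributeName": "label",  # the column Autopilot should predict
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/output/"},
    RoleArn="arn:aws:iam::111122223333:role/MySageMakerRole",
)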

You can look at each model's parameters, see the 50 automated models Autopilot generated and get a leaderboard of which models performed best. What's more, you can open a model's underlying notebook and see what trade-offs were made to produce that best model. For instance, it may be the most accurate, but sacrifice speed to get there.
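Outside of the Studio UI, the same leaderboard can be pulled programmatically. A rough sketch, reusing the client and the placeholder job name from above:

# List candidate models ranked by their final objective metric.
resp = sm.list_candidates_for_auto_ml_job(
    AutoMLJobName="my-autopilot-job",
    SortBy="FinalObjectiveMetricValue",
    SortOrder="Descending",
)
for cand in resp["Candidates"]:
    metric = cand["FinalAutoMLJobObjectiveMetric"]
    print(cand["CandidateName"], metric["MetricName"], metric["Value"])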

Your company may have its own unique requirements, and you can choose the best model based on whatever parameters you consider most important, even though it was generated in an automated fashion.

Once you have the model you like best, you can go into SageMaker Studio, select it and launch it with a single click. The tool is available now.

 
