Jun 04, 2018

How Yelp (mostly) shut down its own data centers and moved to AWS

Back in 2013, Yelp was a 9-year-old company built on a set of internal systems. It was coming to realize that running its own data centers might not be the most efficient way to operate a business that was continuing to scale rapidly. At the same time, the company understood that the tech world had changed dramatically since its 2004 launch, and that it needed to move its underlying technology to a more modern approach.

That’s a lot to take on in one bite, but it wasn’t something that happened willy-nilly or overnight, says Jason Fennell, SVP of engineering at Yelp. The vast majority of the company’s data was being processed by a massive Python codebase that was getting bigger all the time. The conversation about shifting to a microservices architecture began in 2012.

The company was also running the massive Yelp application inside its own data centers, and as it grew it was increasingly limited by the long lead times required to procure new hardware and get it online. It saw that this was unsustainable over the long term and began transforming from a huge monolithic application running on-premises to one built on microservices running in the cloud. It was quite a journey.

The data center conundrum

Fennell described the classic scenario of a company that could benefit from a shift to the cloud. Yelp had a small operations team dedicated to setting up new machines. When engineering anticipated a new resource requirement, it had to give that team enough lead time to order new servers and get them up and running. That was hardly the most efficient way to deal with a resource problem, and it was one the cloud could solve easily.

“We kept running into a bottleneck. I was running a chunk of the search team [at the time] and I had to project capacity out to 6-9 months. Then it would take a few months to order machines and another few months to set them up,” Fennell explained. He emphasized that the team charged with getting these machines going was working hard, but there were too few people and too many demands, and something had to give.

“We were on this cusp. We could have scaled up that team dramatically and gotten [better] at building data centers and buying servers and doing that really fast, but we were hearing a lot about AWS and the advantages there,” Fennell explained.

To the cloud!

They looked at the cloud landscape in 2013, and AWS was the clear technology leader. That meant moving some part of their operations to EC2. Unfortunately, that exposed a new problem: how to manage this new infrastructure in the cloud. This was before the notion of cloud-native computing even existed. There was no Kubernetes. Sure, Google was operating in a cloud-native fashion in-house, but that approach wasn’t really an option for companies without a huge team of engineers.

Yelp needed to explore new ways of managing operations in a hybrid cloud environment where some of the applications and data lived in the cloud and some lived in their data center. It was not an easy problem to solve in 2013 and Yelp had to be creative to make it work.

That meant keeping one foot in the public cloud and the other in a private data center. One tool that helped ease the transition was AWS Direct Connect, which had launched a couple of years earlier and enabled Yelp to connect its data centers directly to AWS.
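For a rough sense of what the AWS side of such a hybrid link looks like, the sketch below uses boto3 to list Direct Connect links and their state. It is illustrative only; the region name is an assumption, and nothing here reflects Yelp’s actual configuration.

    # Illustrative sketch only: enumerate AWS Direct Connect links and their state.
    # The region is an assumption, not a detail of Yelp's setup.
    import boto3

    dx = boto3.client("directconnect", region_name="us-west-2")

    # describe_connections() returns the dedicated links between an on-premises
    # router and AWS; a hybrid deployment would typically watch their state
    # before sending production traffic across them.
    for conn in dx.describe_connections()["connections"]:
        print(conn["connectionName"], conn["connectionState"], conn["bandwidth"])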

Laying the groundwork

Around this time, as Yelp was figuring out how AWS worked, another revolutionary technological change was under way: Docker emerged and began mainstreaming the notion of containerization. “That’s another thing that’s been revolutionary. We could suddenly decouple the context of the running program from the machine it’s running on. Docker gives you this container, and it’s much lighter weight than virtualization and running full operating systems on a machine,” Fennell explained.
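A minimal sketch of that decoupling, using the Docker SDK for Python: the same image runs unchanged whether the Docker daemon sits on a data-center host or an EC2 instance. The image and command here are placeholders, not anything from Yelp’s stack.

    # Minimal sketch of the decoupling described above: the application and its
    # dependencies travel inside the image, so the host only needs a container
    # runtime. Image and command are placeholders.
    import docker

    client = docker.from_env()  # talks to whichever Docker daemon is configured locally
    output = client.containers.run(
        "python:3.10-slim",
        ["python", "-c", "print('hello from a container')"],
        remove=True,  # clean up the container after it exits
    )
    print(output.decode())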

Another development was the emergence of Mesos, an open-source “data center operating system” that offered a way to treat a fleet of machines as a single pool of resources, a notion Yelp could apply wherever its data and applications lived. The Mesos ecosystem also included a container orchestration framework called Marathon, in the days before Kubernetes emerged as the popular way of dealing with this same issue.

“We liked Mesos as a resource allocation framework. It abstracted away the fleet of machines. Mesos abstracts many machines and controls programs across them. Marathon holds guarantees about what containers are running where. We could stitch it all together into this clear opinionated interface,” he said.
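As a hedged illustration of that division of labor, here is roughly what handing a container to Marathon looked like through its REST API: the caller declares what should run, and Mesos decides where it runs. The URL, app id and image are hypothetical, and Yelp’s own tooling (described below) wraps this kind of call rather than exposing it directly.

    # Hedged illustration: submit an app definition to Marathon's /v2/apps endpoint.
    # URL, app id and image are hypothetical.
    import requests

    app_definition = {
        "id": "/example/web",
        "cpus": 0.5,
        "mem": 256,        # MiB
        "instances": 3,    # Marathon keeps three copies running somewhere in the cluster
        "container": {
            "type": "DOCKER",
            "docker": {"image": "example/web:latest", "network": "BRIDGE"},
        },
    }

    resp = requests.post("http://marathon.example.com:8080/v2/apps", json=app_definition)
    resp.raise_for_status()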

Pulling it all together

While all this was happening, Yelp began exploring how to move to the cloud using a Platform as a Service approach to the software layer. The problem was that, at the time they started, there wasn’t really a viable off-the-shelf way to do this. In the buy-versus-build decision-making that goes on in large transformations like this one, they felt they had little choice but to build that platform layer themselves.

In late 2013 they began to pull together the idea of building this platform on top of Mesos and Docker, giving it the name PaaSTA, an internal joke that stood for Platform as a Service, Totally Awesome. It became simply known as Pasta.


The project had the ambitious goal of making Yelp’s infrastructure work as a single fabric, in a cloud-native fashion, before almost anyone outside of Google was using that term. Pasta developed slowly, with the first developer piece coming online in August 2014 and the first production service following that December. The company open-sourced the technology the following year.

“Pasta gave us the interface between the applications and development teams. Operations had to make sure Pasta is up and running, while Development was responsible for implementing containers that implemented the interface,” Fennell said.
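To make that contract concrete, the sketch below shows the kind of declarative service spec a development team might hand to a PaaSTA-like platform. The field names are hypothetical and do not reproduce PaaSTA’s real configuration schema; they only illustrate the shape of the interface, where developers declare the container and its resources and operations keep the platform that schedules it healthy.

    # Hypothetical service spec for a PaaSTA-like platform; field names are
    # illustrative, not PaaSTA's actual schema.
    service_spec = {
        "service": "example-search",
        "docker_image": "example-search:2014-08-01",
        "instances": {
            "canary": {"cpus": 0.25, "mem_mb": 256, "replicas": 1},
            "main": {"cpus": 0.5, "mem_mb": 512, "replicas": 6},
        },
        "healthcheck": {"path": "/status", "interval_s": 10},
    }

    def validate(spec: dict) -> None:
        """Platform-side sanity check: the spec, not a person, is the interface."""
        for name, instance in spec["instances"].items():
            assert instance["replicas"] >= 1, f"{name} must run at least one replica"

    validate(service_spec)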

Moving deeper into the public cloud

While Yelp was busy building these internal systems, AWS wasn’t sitting still. It was also improving its offerings with new instance types, new functionality and better APIs and tooling. Fennell reports this helped immensely as Yelp began a more complete move to the cloud.

He says there were a couple of tipping points as they moved more and more of the application to AWS, including, eventually, the master database. This all happened in more recent years, as they came to understand better how to use Pasta to control processes wherever they lived. What’s more, he said, adoption of other AWS services became possible thanks to tighter integration between the in-house data centers and AWS.


The first tipping point came around 2016, when all new services were configured for the cloud. He said they got much better at managing applications and infrastructure in AWS, and their thinking shifted from how to migrate to AWS to how to operate and manage it.

Perhaps the biggest step in this years-long transformation came last summer, when Yelp moved its master database from its own data center to AWS. “This was the last thing we needed to move over. Otherwise it’s cleanup. As of 2018, we are serving zero production traffic through physical data centers,” he said. While they still have two data centers, they are getting to the point where they have only the minimum hardware required to run the network backbone.

Fennell said that before all this was in place, getting a service up and running took anywhere from two weeks to a month; now it takes just a couple of minutes. He says any loss of control from moving to the cloud has been easily offset by the convenience of using cloud infrastructure. “We get to focus on the things where we add value,” he said, and that’s the goal of every company.

Dec 14, 2017

Instana raises $20 million for its microservice monitoring and management service

Instana, a company that helps enterprises monitor and manage their microservice deployments with the help of automation and artificial intelligence, today announced that it has raised a $20 million Series B round led by Accel, with participation from existing investor Target Partners. This brings Instana’s total funding to $26 million to date. Launched in 2015, Instana bills itself as…

Oct 12, 2017

Google, IBM and others launch an open-source API for keeping tabs on software supply chains

Thanks to containers and microservices, the way we are building software is changing. But you probably still want to know who built a given container and what’s running in it. To get a handle on this, Google, IBM and others today announced Grafeas, a new joint open-source project that provides users with a standardized way of auditing and governing their software supply chain.

Jan 23, 2017

Cloud Native Computing Foundation adds Linkerd as its fifth hosted project

The Cloud Native Computing Foundation (CNCF), the open source group that’s also the home of the popular Kubernetes container management system, today announced that it has added Linkerd as its fifth hosted project. Linkerd (pronounced linker-DEE), which was incubated at Buoyant, follows in the footsteps of other CNCF projects like Kubernetes, Prometheus, OpenTracing and Fluentd. The…

Sep 13, 2016

Microsoft’s Azure Service Fabric for running and managing microservices is coming to Linux

Microsoft’s CTO for Azure (and occasional novelist) Mark Russinovich is extremely bullish about microservices. In his view, the vast majority of apps — including enterprise apps — will soon be built using microservices. Microsoft, with its variety of cloud services and developer tools, obviously wants a piece of that market. With Service Fabric, the company offers a service…

May 25, 2016

Preparing for infrastructures of the future

There are many abstract notions about what the future of infrastructure will look like, but the truth is that the future is staring companies in the face. The adoption of hybrid cloud models, containers and microservices architectures is only the beginning stage of preparing for infrastructures of the future.
