Sep
20
2019
--

Chef CEO says he’ll continue to work with ICE in spite of protests

Yesterday, software development tool maker Chef found itself in the middle of a firestorm after a tweet called it out for doing business with DHS/ICE. Eventually, an influential open-source developer removed a couple of key pieces of his software from the repository, bringing down parts of Chef’s commercial business.

Chef intends to fulfill its contract with ICE, in spite of calls to cancel it. In a blog post published this morning, Chef CEO Barry Crist defended the decision. “I do not believe that it is appropriate, practical, or within our mission to examine specific government projects with the purpose of selecting which U.S. agencies we should or should not do business.”

He stood by the company’s decision this afternoon in an interview with TechCrunch, while acknowledging that it was a difficult and emotional decision for everyone involved. “For some portion of the community, and some portion of our company, this is a super, super-charged lightning rod, and this has been very difficult. It’s something that we spent a lot of time on, and I want to represent that there are portions of [our company] that do not agree with this, but I as a leader of the company, along with the executive team, made a decision that we would honor the contracts and those relationships that were formed and work with them over time,” he said.

He added, “I think our challenge as leadership right now is how do we collectively navigate through times like this, and through emotionally charged issues like the ICE contract.”

The deal with ICE, a $95,000-a-year contract for software development tools, dates back to the Obama administration, when the then-DHS CIO wanted to move the department toward more modern agile/DevOps development workflows, according to Crist.

For people who might think it’s a purely economic decision, Crist noted that the money represents a fraction of the company’s more than $50 million in annual revenue (according to Crunchbase data); what matters, he says, is a long-term business arrangement with the government that transcends individual administration policies. “It’s not about the $100,000, it’s about decisions we’ve made to engage the government. And I appreciate that not everyone in our world feels the same way or would make that same decision, but that’s the decision that we made as a leadership team,” Crist said.

Shortly after word of Chef’s ICE contract appeared on Twitter, according to a report in The Register, former Chef employee Seth Vargo removed a couple of key pieces of open source software from the repository, telling The Register that “software engineers have to operate by some kind of moral compass.” This move brought down part of Chef’s commercial software and it took them 24 hours to get those services fully restored, according to Chef CTO Corey Scobie.

Crist says he wants to be clear that his decision does not mean he supports current ICE policies. “I certainly don’t want to be viewed as I’m taking a strong stand in support of ICE. What we’re taking a strong stand on is our consistency with working with our customers, and again, our work with DHS started in the previous administration on things that we feel very good about,” he said.

Sep
16
2019
--

FOSSA scores $8.5 million Series A to help enterprises manage open-source licenses

As more enterprise developers make use of open source, it becomes increasingly important for companies to make sure that they are complying with licensing requirements. They also need to ensure the open-source bits are being updated over time for security purposes. That’s where FOSSA comes in, and today the company announced an $8.5 million Series A.

The round was led by Bain Capital Ventures, with help from Costanoa Ventures and Norwest Venture Partners. Today’s round brings the total raised to $11 million, according to the company.

Company founder and CEO Kevin Wang says that over the last 18 months, the startup has concentrated on building tools to help enterprises manage their growing use of open source in a safe and legal way. Overall, he says, this increasing use of open source is great news for developers and for these bigger companies in general: it enables them to take advantage of all the innovation going on in the open-source community, but they need to make sure they are in compliance.

“The enterprise is really early on this journey, and that’s where we come in. We provide a platform to help the enterprise manage open-source usage at scale,” Wang explained. That involves three main pieces. First it tracks all of the open-source and third-party code being used inside a company. Next, it enforces licensing and security policy, and, finally, it has a reporting component. “We automate the mass reporting and compliance for all of the housekeeping that comes from using open source at scale,” he said.
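For a sense of what that three-piece workflow can look like in practice, here is a rough sketch using FOSSA’s open-source CLI; the exact commands and flags vary by version and are shown purely as an illustration:

fossa init      # detect modules and build targets in the repository
fossa analyze   # build the dependency graph and upload it for license/security analysis
fossa test      # exit non-zero on license policy violations, e.g. as a CI gate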

The enterprise focus is relatively new for the company. It originally launched in 2017 as a tool for developers to track individual use of open source inside their programs. Wang saw a huge opportunity inside the enterprise to apply this same kind of capability inside larger organizations, which were hungry for tools to help them comply with the myriad open-source licenses out there.

“We found that there was no tooling out there that can manage the scale and breadth across all the different enterprise use cases and all the really complex mission-critical code bases,” he said. What’s more, he found that where there were existing tools, they were vastly underutilized or didn’t provide broad enough coverage.

The company announced a $2.2 million seed round in 2017, and since then has grown from 10 to 40 employees. With today’s funding, that should increase as the company is expanding quickly. Wang reports that the startup has been tripling its revenue numbers and customer accounts year over year. The new money should help accelerate that growth and expand the product and markets it can sell into.

Sep
11
2019
--

Kubernetes co-founder Craig McLuckie is as tired of talking about Kubernetes as you are

“I’m so tired of talking about Kubernetes. I want to talk about something else,” joked Kubernetes co-founder and VP of R&D at VMware Craig McLuckie during a keynote interview at this week’s Cloud Foundry Summit in The Hague. “I feel like that ’80s band that had like one hit song — Cherry Pie.”

He doesn’t quite mean it that way, of course (though it makes for a good headline, see above), but the underlying theme of the conversation he had with Cloud Foundry executive director Abby Kearns was that infrastructure should be boring and fade into the background, while enabling developers to do their best work.

“We still have a lot of work to do as an industry to make the infrastructure technology fade into the background and bring forwards the technologies that developers interface with, that enable them to develop the code that drives the business, etc. […] Let’s make that infrastructure technology really, really boring.”


What McLuckie wants to talk about is developer experience, and with VMware’s intent to acquire Pivotal, it’s placing a strong bet on Cloud Foundry as one of the premier development platforms for cloud-native applications. For the longest time, the Cloud Foundry and Kubernetes ecosystems, which share an organizational parent in the Linux Foundation, have been getting closer, but that move has accelerated in recent months as the Cloud Foundry ecosystem has finished work on some of its Kubernetes integrations.

McLuckie argues that the Cloud Native Computing Foundation, the home of Kubernetes and other cloud-native, open-source projects, was always meant to be a kind of open-ended organization that focuses on driving innovation. And that created a large set of technologies that vendors can choose from.

“But when you start to assemble that, I tend to think about you building up this cake which is your development stack, you discover that some of those layers of the cake, like Kubernetes, have a really good bake. They are done to perfection,” said McLuckie, who is clearly a fan of The Great British Baking Show. “And other layers, you look at it and you think, wow, that could use a little more bake, it’s not quite ready yet. […] And we haven’t done a great job of pulling it all together and providing a recipe that delivers an entirely consumable experience for everyday developers.”


He argues that Cloud Foundry, on the other hand, has always focused on building that highly opinionated, consistent developer experience. “Bringing those two communities together, I think, is going to have incredibly powerful results for both communities as we start to bring these technologies together,” he said.

With the Pivotal acquisition still in the works, McLuckie didn’t really comment on what exactly this means for the path forward for Cloud Foundry and Kubernetes (which he still talked about with a lot of energy, despite being tired of it). But it’s clear that he’s looking to Cloud Foundry to enable that developer experience on top of Kubernetes that abstracts all of the infrastructure away for developers and makes deploying an application a matter of a single CLI command.

Bonus: Cherry Pie.

Sep
11
2019
--

ScyllaDB takes on Amazon with new DynamoDB migration tool

There are a lot of open source databases out there, and ScyllaDB, a NoSQL variety, is looking to differentiate itself by attracting none other than Amazon users. Today, it announced a DynamoDB migration tool to help Amazon customers move to its product.

It’s a bold move, but Scylla, which has a free open source product along with paid versions, has always had a penchant for going after bigger players. It has had a tool to help move Cassandra users to ScyllaDB for some time.

CEO Dor Laor says DynamoDB customers can now also migrate existing code with little modification. “If you’re using DynamoDB today, you will still be using the same drivers and the same client code. In fact, you don’t need to modify your client code one bit. You just need to redirect access to a different IP address running Scylla,” Laor told TechCrunch.
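In practice, “redirecting access” usually means overriding the client’s endpoint. A hypothetical sketch with the AWS CLI, where scylla.example.com:8000 is only a placeholder for a Scylla cluster’s DynamoDB-compatible endpoint:

aws dynamodb list-tables \
    --endpoint-url http://scylla.example.com:8000 \
    --region us-east-1

The same endpoint override works in the DynamoDB SDKs, which is why, as Laor says, the client code itself stays untouched.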

He says that the reason customers would want to switch to Scylla is because it offers a faster and cheaper experience by utilizing the hardware more efficiently. That means companies can run the same workloads on fewer machines, and do it faster, which ultimately should translate to lower costs.

The company also announced a $25 million Series C extension led by Eight Roads Ventures. Existing investors Bessemer Venture Partners, Magma Venture Partners, Qualcomm Ventures and TLV Partners also participated. Scylla has raised a total of $60 million, according to the company.

The startup has been around for 6 years and customers include Comcast, GE, IBM and Samsung. Laor says that Comcast went from running Cassandra on 400 machines to running the same workloads with Scylla on just 60.

Laor is playing the long game in the database market, and it’s not about taking on Cassandra, DynamoDB or any other individual product. “Our main goal is to be the default NoSQL database where if someone has big data, real-time workloads, they’ll think about us first, and we will become the default.”

Sep
10
2019
--

HashiCorp announces fully managed service mesh on Azure

Service mesh is just beginning to take hold in the cloud-native world, and as it does, vendors are looking for ways to help customers understand it. One way to simplify the complexity of dealing with the growing number of service mesh products out there is to package it as a service. Today, HashiCorp announced a new service on Azure to address that need, building it into the Consul product.

HashiCorp co-founder and CTO Armon Dadgar says it’s a fully managed service. “We’ve partnered closely with Microsoft to offer a native Consul [service mesh] service. At the highest level, the goal here is, how do we make it basically push-button,” Dadgar told TechCrunch.

He adds that there is extremely tight integration in terms of billing and permissions, as well as other management functions, as you would expect with a managed service in the public cloud. Brendan Burns, one of the original Kubernetes developers, who is now a distinguished engineer at Microsoft, says the HashiCorp solution really strips away a lot of the complexity associated with running a service mesh.

“In this case, HashiCorp is using some integration into the Azure control plane to run Consul for you. So you just consume the service mesh. You don’t have to worry about the operations of the service mesh,” Burns said. He added, “This is really turning it into a service instead of a do-it-yourself exercise.”

Service meshes are tools used in conjunction with containers and Kubernetes in a dynamic cloud-native environment to help microservices communicate and interoperate with one another. A growing number of them, including Istio, Envoy and Linkerd, are jockeying for position right now.

Burns makes it clear that while Microsoft is working closely with HashiCorp on this project, it’s also working with other vendors. “Our goal with the service mesh interface specification was really to let a lot of partners be successful on the platform. You know, there’s a bunch of different service meshes. It’s a place where we feel like there’s a lot of evolution and experimentation happening, so we want to make sure that our customers can find the right solution for them,” Burns explained.

The HashiCorp Consul service is currently in private beta.

Sep
10
2019
--

Snyk grabs $70M more to detect security vulnerabilities in open-source code and containers

A growing number of IT breaches has led to security becoming a critical and central aspect of how computing systems are run and maintained. Today, a startup that focuses on one specific area — developing security tools aimed at developers and the work they do — has closed a major funding round that underscores the growth of that area.

Snyk — a London and Boston-based company that got its start identifying and developing security solutions for developers working on open-source code — is today announcing that it has raised $70 million, funding that it will be using to continue expanding its capabilities and overall business. For example, the company has more recently expanded to building security solutions to help developers identify and fix vulnerabilities around containers, an increasingly standard unit of software used to package up and run code across different computing environments.

Open source — Snyk works as an integration into existing developer workflows, compatible with the likes of GitHub, Bitbucket and GitLab, as well as CI/CD pipelines — was an easy target to hit. It’s used in 95% of all enterprises, with up to 77% of open-source components liable to have vulnerabilities, by Snyk’s estimates. Containers are a different issue.

“The security concerns around containers are almost more about ownership than technology,” Guy Podjarny, the president who co-founded the company with Assaf Hefetz and Danny Grander, explained in an interview. “They are in a twilight zone between infrastructure and code. They look like virtual machines and suffer many of same concerns such as being unpatched or having permissions that are too permissive.”

While containers are present in fewer than 30% of computing environments today, their growth is on the rise, according to Gartner, which forecasts that by 2022, more than 75% of global organizations will run containerized applications. Snyk estimates that a full 44% of Docker image scans (Docker being one of the major container vendors) have known vulnerabilities.
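For a flavor of how developers actually invoke this kind of tooling, here is a sketch using Snyk’s CLI (installation methods and container-scanning flags vary by CLI version; this is illustrative, not exhaustive):

npm install -g snyk   # one way to install the CLI
snyk auth             # link the CLI to a Snyk account
snyk test             # scan the project's dependencies for known vulnerabilities
snyk monitor          # snapshot the project so newly disclosed issues trigger alerts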

This latest round is being led by Accel with participation from existing investors GV and Boldstart Ventures. These three, along with a fourth investor (Heavybit) also put $22 million into the company as recently as September 2018. That round was made at a valuation of $100 million, and from what we understand from a source close to the startup, it’s now in the “range” of $500 million.

“Accel has a long history in the security market and we believe Snyk is bringing a truly unique, developer-first approach to security in the enterprise,” Matt Weigand of Accel said in a statement. “The strength of Snyk’s customer base, rapidly growing free user community, leadership team and innovative product development prove the company is ready for this next exciting phase of growth and execution.”

Indeed, the company has hit some big milestones in the last year that could explain that hike. It now has some 300,000 developers using it around the globe, with its customer base growing some 200% this year and including the likes of Google, Microsoft, Salesforce and ASOS (side note: if developers at developer-centric companies working at the vanguard of computing, like Google and Microsoft, are using your product, that is a good sign). Notably, that growth has largely come by word of mouth — inbound interest.

The company in July of this year took on a new CEO, Peter McKay, who replaced Podjarny. McKay was the company’s first investor and has a track record in helping to grow large enterprise security businesses, a sign of the trajectory that Snyk is hoping to follow.

“Today, every business, from manufacturing to retail and finance, is becoming a software business,” said McKay. “There is an immediate and fast growing need for software security solutions that scale at the same pace as software development. This investment helps us continue to bring Snyk’s product-led and developer-focused solutions to more companies across the globe, helping them stay secure as they embrace digital innovation – without slowing down.”

Sep
06
2019
--

APIs are the next big SaaS wave

While the software revolution started out slowly, over the past few years it has exploded, and the fastest-growing segment to date has been the shift toward software as a service, or SaaS.

SaaS has dramatically lowered the intrinsic total cost of ownership for adopting software, solved scaling challenges and taken away the burden of issues with local hardware. In short, it has allowed a business to focus primarily on just that — its business — while simultaneously reducing the burden of IT operations.

Today, SaaS adoption is increasingly ubiquitous. According to IDG’s 2018 Cloud Computing Survey, 73% of organizations have at least one application or a portion of their computing infrastructure already in the cloud. While this software explosion has created a whole range of downstream impacts, it has also caused software developers to become more and more valuable.

The increasing value of developers has meant that, like traditional SaaS buyers before them, they have come to better intuit the value of their time and increasingly prefer businesses that can alleviate the hassles of procurement, integration, management, and operations. The needs developers have around those hassles are specialized.

They are looking to deeply integrate products into their own applications and to do so, they need access to an Application Programming Interface, or API. Best practices for API onboarding include technical documentation, examples, and sandbox environments to test.

APIs tend to also offer metered billing upfront. For these and other reasons, APIs are a distinct subset of SaaS.

For fast-moving developers building at global scale, APIs are no longer a stop-gap to the future—they’re a critical part of their strategy. Why would you dedicate precious resources to recreating something in-house that’s done better elsewhere when you can instead focus your efforts on creating a differentiated product?

Thanks to this mindset shift, APIs are on track to create another SaaS-sized impact across all industries and at a much faster pace. By exposing often complex services as simplified code, API-first products are far more extensible, easier for customers to integrate into, and have the ability to foster a greater community around potential use cases.


Graphics courtesy of Accel

Billion-dollar businesses building APIs

Whether you realize it or not, chances are that your favorite consumer and enterprise apps—Uber, Airbnb, PayPal, and countless more—have a number of third-party APIs and developer services running in the background. Just like most modern enterprises have invested in SaaS technologies for all the above reasons, many of today’s multi-billion dollar companies have built their businesses on the backs of these scalable developer services that let them abstract everything from SMS and email to payments, location-based data, search and more.

Simultaneously, the entrepreneurs behind these API-first companies like Twilio, Segment, Scale and many others are building sustainable, independent—and big—businesses.

Valued today at over $22 billion, Stripe is the biggest independent API-first company. Stripe took off because of its initial laser focus on the developer experience of setting up and taking payments. It was even initially known as /dev/payments!

Stripe spent extra time building the right, idiomatic SDKs for each language platform and beautiful documentation. But it wasn’t just those things; the company rebuilt an entire business process around being API-first.

Companies using Stripe didn’t need to fill out a PDF and set up a separate merchant account before getting started. Once sign-up was complete, users could immediately test the API with a sandbox and integrate it directly into their application. Even pricing was different.

Stripe chose to simplify pricing dramatically by starting with a single, simple price for all cards and not breaking out cards by type even though the costs for AmEx cards versus Visa can differ. Stripe also did away with a monthly minimum fee that competitors had.

Many competitors used the monthly minimum to offset the high cost of support for new customers who weren’t necessarily processing payments yet. Stripe flipped that on its head. Developers integrate Stripe earlier than they integrated payments before, and while it costs Stripe a lot in setup and support costs, it pays off in brand and loyalty.
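That sandbox-first onboarding is easy to see in Stripe’s classic test-mode example, sketched below; sk_test_yourKeyHere is a placeholder for a test API key, and tok_visa is one of Stripe’s documented test tokens:

curl https://api.stripe.com/v1/charges \
  -u sk_test_yourKeyHere: \
  -d amount=2000 \
  -d currency=usd \
  -d source=tok_visa

A developer can run this within minutes of signing up, with no merchant account or paperwork in the way.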

Checkr is another excellent example of an API-first company vastly simplifying a massive yet slow-moving industry. Very little had changed over the last few decades in how businesses ran background checks on their employees and contractors: the process involved manual paperwork and the help of third-party services that spent days verifying an individual.

Checkr’s API gives companies immediate access to a variety of disparate verification sources and allows these companies to plug Checkr into their existing on-boarding and HR workflows. It’s used today by more than 10,000 businesses including Uber, Instacart, Zenefits and more.

Like Checkr and Stripe, Plaid provides a similar value prop to applications in need of banking data and connections, abstracting away banking relationships and complexities brought upon by a lack of tech in a category dominated by hundred-year-old banks. Plaid has shown an incredible ramp these past three years, from closing a $12 million Series A in 2015 to reaching a valuation over $2.5 billion this year.

Today the company is fueling an entire generation of financial applications, all on the back of their well-built API.


Graphics courtesy of Accel

Then and now

Accel’s first API investment was in Braintree, a mobile and web payment system for e-commerce companies, in 2011. Braintree eventually sold to, and became an integral part of, PayPal as it spun out from eBay and grew to be worth more than $100 billion. Unsurprisingly, it was shortly thereafter that our team decided it was time to go big on the category. By the end of 2014 we had led the Series As in Segment and Checkr and followed those investments with our first APX conference in 2015.

At the time, Plaid, Segment, Auth0, and Checkr had only raised seed or Series A financings! And today we are even more excited and bullish on the space. To convey just how much API-first businesses have grown in such a short period, we thought it would be useful to share some metrics from the past five years, which we’ve broken out in the two visuals included above in this article.

While SaaS may have pioneered the idea that the best way to do business isn’t to actually build everything in-house, today we’re seeing APIs amplify this theme. At Accel, we firmly believe that APIs are the next big SaaS wave — having as much impact as their predecessor, if not more, thanks to developers at today’s fastest-growing startups and their preference for API-first products. We’ve actively continued to invest in the space (in companies like Scale, mentioned above).

And much like how a robust ecosystem developed around SaaS, we believe that one will continue to develop around APIs. Given the amount of progress that has happened in just a few short years, Accel is hosting our second APX conference to once again bring together this remarkable community and continue to facilitate discussion and innovation.


Graphics courtesy of Accel

Aug
29
2019
--

Using linux-fincore to Check Linux Page Cache Usage


In this short blog post, we will check how to use linux-fincore to see which files are in the in-memory Linux page cache. For an introductory read about the Linux page cache, check here and here.

In summary, whenever you read from or write to a file (unless you are using Direct_IO to bypass the functionality), the result is cached in memory so that subsequent requests can be served from it rather than from the orders-of-magnitude-slower disk subsystem (the cache can also hold writes before they are flushed to disk). This happens as long as there is memory not being used by any process; whenever there is a shortage of otherwise-free memory, the kernel will choose to evict the page cache first.

This process is transparent to us as userland dwellers and is generally something that we shouldn’t mind. However, what if we wanted to have more information on it? Is it possible? How can we do it? Let’s find out!

Installing it

Unless you are on CentOS 6, there seem to be no packages available, so we need to download the source and compile it. The steps for this are simple enough:

git clone https://github.com/yazgoo/linux-ftools.git
cd linux-ftools/
./configure
make
sudo make install

After this, we will have the binaries ready to be used in /usr/local/bin/.

SHELL> linux-fincore --help
fincore version 1.3.0
fincore [options] files...

  -s --summarize          When comparing multiple files, print a summary report
  -p --pages              Print pages that are cached
  -o --only-cached        Only print stats for files that are actually in cache.
  -g --graph              Print a visual graph of each file's cached page distribution.
  -S --min-size           Require that each files size be larger than N bytes.
  -C --min-cached-size    Require that each files cached size be larger than N bytes.
  -P --min-perc-cached    Require percentage of a file that must be cached.
  -h --help               Print this message.
  -L --vertical           Print the output of this script vertically.

Using it

As seen in the previous output, we need to pass either a file or a list of files for it to work. This is kind of strange at first glance and raises the question: “What if I don’t provide some files that are indeed in the page cache?” The answer is simple – they won’t be listed, even if they are in cache! Let’s see it in action. First, let’s write two files, and check if they are in the cache (and how much space they are taking up).

SHELL> echo aoeu > test_file_1
SHELL> echo htns > test_file_2
SHELL> linux-fincore -L test_file_1 test_file_2
[...trimmed for brevity...]
test_file_1
size: 5
total_pages: 1
min_cached_page: 0
cached: 1
cached_size: 4,096
cached_perc: 100.00
test_file_2
size: 5
total_pages: 1
min_cached_page: 0
cached: 1
cached_size: 4,096
cached_perc: 100.00
---
total cached size: 8,192

The -L option shows us the output in vertical format, instead of the default columnar style. Now, if we leave out the third argument (which is the second file name):

SHELL> linux-fincore -L test_file_1
[...trimmed for brevity...]
test_file_1
size: 5
total_pages: 1
min_cached_page: 0
cached: 1
cached_size: 4,096
cached_perc: 100.00
---
total cached size: 4,096

We only see test_file_1, even though we know test_file_2 is also cached. This is something to keep in mind every time we use the tool.

A more interesting example is to check from, for instance, a running MySQL server, which files are cached. We can use a command like the following:

SHELL> linux-fincore --only-cached $(find /var/lib/mysql/ -type f)
filename                          size total_pages min_cached page cached_pages cached_size cached_perc
--------                          ---- ----------- --------------- ------------ ----------- -----------
...
/var/lib/mysql/ibdata1      12,582,912       3,072               0        3,072  12,582,912      100.00
/var/lib/mysql/ib_logfile1  50,331,648      12,288               0       12,288  50,331,648      100.00
/var/lib/mysql/ib_logfile0  50,331,648      12,288               0       12,288  50,331,648      100.00
...
---
total cached size: 115,634,176

The --only-cached flag makes the output less verbose by only showing stats for files that are in the cache.

Caveats and Limitations

The number one caveat, as mentioned before, is to provide an accurate list of files to check; otherwise, the results will obviously be wrong (if we are trying to measure total cache usage).

One limitation is that there is an upper bound on the number of files we can check (at least with one command), given the argument list is not limitless. For instance, one natural way of wanting to check for the whole cache is to use a command like the following:

SHELL> linux-fincore --only-cached $(find / -type f)
-bash: /usr/local/bin/linux-fincore: Argument list too long

The command fails, as we can see, because we exceeded the maximum size of the argument list (a kernel-imposed limit, ARG_MAX, reported here by bash).
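One workaround is to stream the file list through xargs, which splits it into batches that fit within the limit. A minimal sketch follows; note that linux-fincore then prints a separate report (and a separate “total cached size”) per batch, so the totals have to be summed manually:

SHELL> find / -type f -print0 2>/dev/null | xargs -0 linux-fincore --only-cached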

Cleaning the Page Cache

This topic is a bit out of scope for the blog post, but I thought that I could at least mention it. There are two ways of manually purging the page cache if needed:

1- directly writing to the /proc/sys/vm/drop_caches file

    sync && \
    echo 1 > /proc/sys/vm/drop_caches && \
    echo 1 > /proc/sys/vm/compact_memory
    # drop_caches=1 frees the page cache (3 would also drop dentries and inodes);
    # writing 1 to compact_memory then defragments free memory. Both require root.

2- using the sysctl configuration tool

    sync && \
    sysctl vm.drop_caches=1

More information about this here (search for the drop_caches section).

What can I use it for?

The tool can be used for checking which files are cached, and how much memory they are taking. For instance, if you see a spike in memory usage after certain operations, you could:

  • capture initial output with linux-fincore
  • flush the page cache (as shown above)
  • run the problematic operation
  • capture a second sample with linux-fincore

Then you could compare the outputs to see which files were used by the operation, and how much was needed for them.
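As a rough sketch of that workflow, reusing the /var/lib/mysql example from above (the cache flush requires root):

SHELL> linux-fincore --only-cached $(find /var/lib/mysql/ -type f) > before.txt
SHELL> sync && echo 1 > /proc/sys/vm/drop_caches
SHELL> # ... run the problematic operation here ...
SHELL> linux-fincore --only-cached $(find /var/lib/mysql/ -type f) > after.txt
SHELL> diff before.txt after.txt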

Further reading and similar tools

There is an extended blog post on this matter, which covers more tools and how to use them.

vmtouch – https://hoytech.com/vmtouch/

mincore – http://man7.org/linux/man-pages/man2/mincore.2.html

Aug
27
2019
--

Kubernetes – Introduction to Containers

Here at Percona’s Training and Education department, we are always working to keep our materials up-to-date and relevant to current technologies. We also keep an eye out for which topics are “hot” and generating a lot of buzz. Unless you’ve been living under a rock for the past year, Kubernetes is the current buzzword/technology. This is the first post in a blog series where we will dive into Kubernetes, and explore using it with Percona XtraDB Cluster.

Editor Disclaimer: This post is not intended to convince you to switch to a containerized environment. It simply aims to educate about a growing trend in the industry.

In The Beginning…

Let’s start at the beginning. First, there were hardware-based servers; we still have these today. Despite the prevalence of “the cloud”, many large companies still use private datacenters and run their databases and applications “the traditional way.” Stepping up from hardware, we venture into virtual machine territory. Popular solutions here are VMware (a commercial product), QEMU, KVM, and VirtualBox (the latter three are OSS).

Virtual Machines (VM) abstract away the hardware on which they are running using emulation software; this is generally referred to as the hypervisor. Such a methodology allows for an entirely separate operating system (OS) to run “on top of” the OS running natively against the hardware. For example, you may have CentOS 7 running inside a VM, which in turn, is running on your Mac laptop, or you could have Windows 10 running inside a VM, which is running on your Ubuntu 18 desktop PC.

There are several nice things about using VMs in your infrastructure. One of those is the ability to move a VM from one server to another. Likewise, deploying 10 more VMs just like the first one is usually just a couple of clicks or a few commands. Another is a better utilization of hardware resources. If your servers have 8 CPU cores and 128GB of memory, running one application on that server might not fully utilize all that hardware. Instead, you could run multiple VMs on that one server, thus taking advantage of all the resources. Additionally, using a VM allows you to bundle your application, and all associated libraries, and configs into one “unit” that can be deployed and copied very easily (We’ll come back to this concept in a minute).

One of the major downsides of the VM solution is the fact that you have an entire, separate, OS running.  It is a huge waste of resources to run a CentOS 7 VM on top of a CentOS 7 hardware-server.  If you have 5 CentOS 7 VMs running, you have five separate linux kernels (in addition to the host kernel), five separate copies of /usr, and five separate copies of the same libraries and application code taking up precious memory. Some might say that, in this setup, 90-99% of each VM is duplicating efforts already being made by the base OS. Additionally, each VM is dealing with its own CPU context switching management. Also, what happens when one VM needs a bit more CPU than another? There exists the concept of CPU “stealing” where one VM can take CPU from another VM. Imagine if that was your critical DB that had CPU stolen!

What if you could eliminate all that duplication but still give your application the ability to be bundled and deployed across tens or hundreds of servers with just a few commands? Enter: containers.

What is a Container?

Simply put, a container is a type of “lightweight” virtual machine without all the overhead of running an independent operating system on top of a hypervisor. A container can be as small as a single line script or as large as <insert super complicated application with lots of libraries, code, images, etc.>.

With containers, you essentially get all of the “pros” surrounding VMs with a lot less “cons,” mainly in the area of performance. A container is not an emulation, like VMs. On Linux, the kernel provides control groups (cgroups) functionality, which can limit and isolate CPU, memory, disk, and network. This action of isolation becomes the container; everything needed is ‘contained’ in one place/group. In more advanced setups, these cgroups can even limit/throttle the number of resources given to each container. With containers, there is no need for an additional OS install; no emulation, no hypervisor. Because of this, you can achieve “near bare-metal performance” while using containers.

Containers, and any applications running inside the containers, run inside their own, isolated, namespace, and thus, a container can only see its own process, disk contents, devices assigned, and granted network access.
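To make this concrete, here is a hypothetical illustration using Docker (introduced in the next section). The flags cap the container’s resources via cgroups, and the ps output demonstrates the isolated PID namespace:

docker run --rm --memory=512m --cpus=1 alpine ps
# inside the container, ps reports only PID 1 (the ps process itself),
# not the host's processes, because of the isolated PID namespace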

Much like VMs, containers can be started, stopped, and copied from machine to machine, making deployment of applications across hundreds of servers quite easy.

One potential downside to containers is that they are inherently stateless. This means that if you have a running container, and make a change to its configuration or update a library, etc., that state is not saved if the container is restarted. The initial state of the launched container will be restored on restart. That said, most container implementations provide a method for supplying external storage which can hold stateful information to survive a restart or redeployment.
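As a minimal sketch of such external storage, using a Docker named volume (“mydata” is just an illustrative name):

docker volume create mydata
docker run --rm -v mydata:/data alpine sh -c 'echo hello > /data/state.txt'
docker run --rm -v mydata:/data alpine cat /data/state.txt
# prints "hello": the data outlived the first container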

Dock the Ship, Capt’n

The concept of “containers” is generic. Many different implementations of containers exist, much like different distributions of Linux. The most popular implementation of containers in the Linux space is Docker. Docker is free, open-source software, but can also be sold with a commercial license for more advanced features. Docker is built upon one of the earlier linux container implementations, LXC.

Docker uses the concept of “images” to distribute containers. For example, you can download and install the Percona MySQL 8.0 image to your server, and from that image, create as many separate, independent containers of ‘Percona MySQL 8’ as your hardware can handle. Without getting too deep into the specifics, Docker images are the end result of multiple compressed layers which were used during the building process of this image. Herein lies another pro for Docker. If multiple images share common layers, there only needs to be one copy of that layer on any system, thus the potential for space savings can be huge.

Another nice feature of Docker containers is its built-in versioning capabilities. If you are familiar with the version control software Git, then you should know the concept of “tags”, which is extremely similar to Docker tags. These tags allow you to track and specify Docker images when creating and deploying images. For example, using the same image (percona/percona-server), you can deploy percona/percona-server:5.7 or percona/percona-server:8.0. In this example, “5.7” and “8.0” are the tags used.
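A quick illustration of pulling both tagged versions side by side:

docker pull percona/percona-server:5.7
docker pull percona/percona-server:8.0
docker images percona/percona-server
# both tags are now available locally as independent images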

Docker also has a huge community ecosystem for sharing images. Visit https://hub.docker.com/ and search for just about any popular OSS project and you will probably find out that someone has already created an image for you to utilize.

Too Many Ships at Port

Docker containers are extremely simple to maintain, deploy, and use within your infrastructure. But what happens when you have hundreds of servers, and you’re trying to manage thousands of docker containers? Sure, if one server goes down, all you need to do, technically speaking, is re-launch those downed containers on another server. Do you have another server in your sea of servers with enough free capacity to handle this additional load? What if you were using persistent storage and need that volume moved along with the container? Running and managing large installations of containers can quickly become overwhelming.

Now we will discuss the topic of container orchestration.

Kubernetes

Pronounced koo-ber-net-ies and often abbreviated K8S (there are 8 letters between the ‘K’ and the last ‘S’), Kubernetes is the current mainstream project for managing Docker containers at very large scale. The name is Greek for “governor”, “helmsman”, or “captain”. Rather than repeating it here, you can read more about the history of K8S on your own.

Terminology

K8S has many moving parts to itself, as you might have imagined. Take a look at this image.


Source: Wikipedia (modified)

Nodes

At the bottom of the image are two ‘Kubernetes Nodes’. These are usually your physical hardware servers, but could also be VMs at your cloud provider. These nodes will have a minimum OS installed, along with other necessary packages like K8S and Docker.

Kubelet is a management binary used by K8S to watch existing containers, run health checks, and launch new containers.

Kube-proxy is a simple port-mapping proxy service. Let’s say you had a simple webserver running as a container on 172.14.11.109:34223. Your end-users would access an external/public IP over 443 like normal, and kube-proxy would map/route that external access to the correct internal container.

Pods

The next concept to understand is a pod. This is typically the base term used when discussing what is running on a node. A pod can be a single container or can be an entire setup, running multiple containers, with mapped external storage, that should be grouped together as a single unit.
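As a hedged sketch (assuming a working cluster and kubectl; “web” and the nginx image are placeholders), launching a single-container pod and exposing it looks like this:

kubectl run web --image=nginx --restart=Never --port=80   # create a single-container pod
kubectl get pods                                          # see which node it was scheduled onto
kubectl expose pod web --port=80 --type=NodePort          # kube-proxy routes outside traffic to it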

Networking

Networking within K8S is a complex topic, and a deep discussion is not suitable for this post. At an entry-level, what you need to know is that K8S creates an “overlay” network, allowing a container in a pod on node1 to talk to a container in a pod on node322 which is in another datacenter. Should your company policy require it, you can create multiple overlay networks across your entire K8S and use them as a way to isolate traffic between pods (ie: think old-school VLANs).

Master Node

At the top of the diagram, you can see the developer/operator interacting with the K8S master node. This node runs several processes which are critical to K8S operation:

kube-apiserver: A simple server to handle/route/process API requests from other K8S tools and even your own custom tools

etcd: A highly available key-value store. This is where K8S stores all of its stateful data.

kube-scheduler: Manages inventory and decides where to launch new pods based on available node resources.

You might have noticed that there is only one master node in the above diagram. This is the default installation, but it is not highly available: should the master node go down, the entire K8S infrastructure will go down with it. In a true production environment, it is highly recommended that you run the master node in a highly available manner.

Wrap It Up

In this post, we discussed how traditional hardware-based setups have been slowly migrating, first towards virtual machine setups and now to containerized deployments. We brought up the issues surrounding managing large container environments and introduced the leading container orchestration platform, Kubernetes. Lastly, we dove into K8S’s various components and architecture.

Once you have your K8S cluster set up with a master node, and many worker nodes, you’ll finally be able to launch some pods and have services accessible. This is what we will do in part 2 of this blog series.

Aug
27
2019
--

Blog Poll: Which of These Issues Have You Experienced as Related to Your Database?


Our blog poll series is back again with a new question: Which of These Issues Have You Experienced as Related to Your Database? Unplanned downtime? Buggy code? Overworked staff? Last year, we asked you a few questions in a blog poll and received a great amount of feedback. We wanted to follow up on some of those same survey questions to see what may have changed over the last 12 months. We’d love to hear from you!


This poll will be up for one month and will be maintained over in the sidebar should you wish to come back at a later date and take part. We look forward to seeing your responses!

 
