Aug
30
2019
--

PMM for PostgreSQL: Quick Start Guide


As a Solutions Engineer at Percona, one of my responsibilities is to support our customer-facing roles such as the sales and customer success teams. This affords me the opportunity to speak with many current and new customers who partner with Percona. I often find that people are interested in Percona Monitoring and Management (PMM) as a free and open-source monitoring solution because of its robust monitoring capabilities compared with many SaaS-based monitoring solutions. They are interested in installing PMM for PostgreSQL for the first time and want a “quick start guide” with a brief overview to get their feet wet. I have included the commands to get started for both PMM 1 and PMM 2 (PMM 2 is still in beta).

For a brief overview of the PMM Architecture and how to install PMM Server, please see my previous post PMM for MongoDB: Quick Start Guide.


Demonstration Environment

When deploying PMM in this example, I am making the following assumptions about the environment:

  • PostgreSQL and the monitoring host are running on Debian-based operating systems. (For information on installing from an RPM instead, please read Deploying Percona Monitoring and Management.)
  • PostgreSQL is already installed and set up. The username and password for the PostgreSQL user are postgres:postgres.
  • The PMM Server Docker image has been installed and started on another host.

Installing PMM for PostgreSQL

Setting up DB permissions

  • Capturing read and write time statistics is possible only if PostgreSQL’s track_io_timing setting is enabled. This can be done either in the configuration file or with the following query executed on the running system:
ALTER SYSTEM SET track_io_timing=ON;
SELECT pg_reload_conf();
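To confirm the change took effect, you can check the current value from a client session (a quick sanity check using the postgres credentials assumed in this guide):

psql -U postgres -c "SHOW track_io_timing;"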

  • Percona recommends that a PostgreSQL user be configured for SUPERUSER level access, in order to gather the maximum amount of data with a minimum amount of complexity. This can be done with the following command for the standalone PostgreSQL installation:
CREATE USER postgres WITH SUPERUSER ENCRYPTED PASSWORD 'postgres';

If you are using Amazon RDS, SUPERUSER cannot be granted; the closest equivalent is the rds_superuser role, which is granted to an existing user:

CREATE USER postgres WITH ENCRYPTED PASSWORD 'postgres';
GRANT rds_superuser TO postgres;

Download the Percona repo package

We must first enable the Percona package repository on our PostgreSQL instance and install the PMM Client. Please refer to PMM for MongoDB: Quick Start Guide to accomplish the first three steps below (but come back here before doing any MongoDB-specific steps):

  • Download PMM-Client
  • Install PMM-Client
  • Configure PMM-Client for monitoring

Now we provide the PMM Client with the credentials necessary for monitoring the PostgreSQL database. Execute the following command to start monitoring and communicating with the PMM 1 server:

sudo pmm-admin add postgresql --user=postgres --password=postgres

To start monitoring and communicating with the PMM 2 Server:

pmm-admin add postgresql --username=postgres --password=postgres

If the command was successful, pmm-admin prints a confirmation that the PostgreSQL service was added.
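You can also verify at any point which services the client is monitoring:

pmm-admin list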

Great! We have successfully installed PMM for PostgreSQL and are ready to take a look at the dashboard.

Of Note: We’re launching PMM 2 Beta with just the PostgreSQL Overview dashboard, but we have others under development, so watch for new Dashboards to appear in subsequent releases!

PostgreSQL Overview Dashboard

Navigate to the IP address of your monitoring host: http://<pmm_server_ip>.

The PostgreSQL Overview Dashboard contains the following graphs:

  • PostgreSQL Connections
  • Active Connections
  • Tuples
  • Read Tuple Activity
  • Tuples Changed per (time resolution)
  • Transactions
  • Durations of Transactions
  • Number of Temp Files
  • Size of Temp Files
  • Conflicts/Deadlocks
  • Number of Locks
  • Operations with Blocks
  • Buffers
  • Canceled Queries
  • Cache Hit Ratio
  • Checkpoint Stats
  • PostgreSQL Settings
  • System Information

For more information about PMM 2 please read: Percona Monitoring and Management (PMM) 2 Beta Is Now Available

Aug
29
2019
--

Using linux-fincore to Check Linux Page Cache Usage


In this short blog post, we will check how to use linux-fincore to find out which files are in the in-memory Linux page cache. For an introductory read about the Linux page cache, check here and here.

In summary, whenever you read from or write to a file (unless you are using direct I/O to bypass this functionality), the result is cached in memory so that subsequent requests can be served from it instead of from the orders-of-magnitude-slower disk subsystem (the cache can also absorb writes before they are flushed to disk). This happens as long as there is memory not being used by any process; whenever there is a shortage of otherwise-free memory, the kernel will choose to evict the page cache first.

This process is transparent to us as userland dwellers, and it is generally something we shouldn’t need to worry about. However, what if we wanted more information on it? Is that possible? How can we do it? Let’s find out!

Installing it

Except for CentOS 6, there seem to be no packages available, so we need to download the source and compile it. The steps for this are simple enough:

git clone https://github.com/yazgoo/linux-ftools.git
cd linux-ftools/
./configure
make
sudo make install

After this, we will have the binaries ready to be used in /usr/local/bin/.

SHELL> linux-fincore --help
fincore version 1.3.0
fincore [options] files...

  -s --summarize          When comparing multiple files, print a summary report
  -p --pages              Print pages that are cached
  -o --only-cached        Only print stats for files that are actually in cache.
  -g --graph              Print a visual graph of each file's cached page distribution.
  -S --min-size           Require that each files size be larger than N bytes.
  -C --min-cached-size    Require that each files cached size be larger than N bytes.
  -P --min-perc-cached    Require percentage of a file that must be cached.
  -h --help               Print this message.
  -L --vertical           Print the output of this script vertically.

Using it

As seen in the previous output, we need to pass a file or a list of files for the tool to work. This seems strange at first glance and raises the question: what if I don’t list some files that are in fact in the page cache? The answer is simple: they won’t be shown, even if they are cached! Let’s see it in action. First, let’s write two files and check whether they are in the cache (and how much space they take up).

SHELL> echo aoeu > test_file_1
SHELL> echo htns > test_file_2
SHELL> linux-fincore -L test_file_1 test_file_2
[...trimmed for brevity...]
test_file_1
size: 5
total_pages: 1
min_cached_page: 0
cached: 1
cached_size: 4,096
cached_perc: 100.00
test_file_2
size: 5
total_pages: 1
min_cached_page: 0
cached: 1
cached_size: 4,096
cached_perc: 100.00
---
total cached size: 8,192

The -L option shows us the output in vertical format, instead of the default columnar style. Now, if we leave out the third argument (which is the second file name):

SHELL> linux-fincore -L test_file_1
[...trimmed for brevity...]
test_file_1
size: 5
total_pages: 1
min_cached_page: 0
cached: 1
cached_size: 4,096
cached_perc: 100.00
---
total cached size: 4,096

We only see test_file_1, even though we know test_file_2 is also cached. This is something to keep in mind every time we use the tool.

A more interesting example is to check which files of a running MySQL server are cached. We can use a command like the following:

SHELL> linux-fincore --only-cached $(find /var/lib/mysql/ -type f)
filename                          size total_pages min_cached page cached_pages cached_size cached_perc
--------                          ---- ----------- --------------- ------------ ----------- -----------
...
/var/lib/mysql/ibdata1      12,582,912 3,072         0 3,072 12,582,912 100.00
/var/lib/mysql/ib_logfile1  50,331,648 12,288         0 12,288 50,331,648 100.00
/var/lib/mysql/ib_logfile0  50,331,648 12,288         0 12,288 50,331,648 100.00
...
---
total cached size: 115,634,176

The --only-cached flag makes the output less verbose by showing only files that are in the cache.

Caveats and Limitations

The number one caveat, as mentioned before, is that we must provide an accurate list of files to check; otherwise, the results will be plain wrong if we are trying to measure total page cache usage.

One limitation is that there is an upper bound on the number of files we can check with a single command, given that the argument list is not limitless. For instance, a natural way of trying to check the whole cache is to use a command like the following:

SHELL> linux-fincore --only-cached $(find / -type f)
-bash: /usr/local/bin/linux-fincore: Argument list too long

The command fails, as we can see, because we exceeded the number of arguments allowed by bash.
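One common workaround is to let xargs split the file list across multiple invocations. A sketch of that approach (note that linux-fincore then prints a summary per batch, so totals have to be aggregated separately):

find / -type f -print0 2>/dev/null | xargs -0 linux-fincore --only-cached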

Cleaning the Page Cache

This topic is a bit out of scope for the blog post, but I thought that I could at least mention it. There are two ways of manually purging the page cache if needed:

1- directly writing to the /proc/sys/vm/drop_caches file

    sync && \
    echo 1 > /proc/sys/vm/drop_caches && \
    echo 1 > /proc/sys/vm/compact_memory

2- using the sysctl configuration tool

    sync && \
    sysctl vm.drop_caches=1

More information about this here (search for the drop_caches section).

What can I use it for?

The tool can be used for checking which files are cached, and how much memory they are taking. For instance, if you see a spike in memory usage after certain operations, you could:

  • capture initial output with linux-fincore
  • flush the page cache (as shown above)
  • run the problematic operation
  • capture a second sample with linux-fincore

Then you could compare the outputs to see which files were used by the operation, and how much was needed for them.
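A minimal sketch of that workflow, reusing the MySQL data directory from the earlier example:

linux-fincore -L $(find /var/lib/mysql/ -type f) > before.txt
sync && sysctl vm.drop_caches=1
# ... run the problematic operation here ...
linux-fincore -L $(find /var/lib/mysql/ -type f) > after.txt
diff before.txt after.txt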

Further reading and similar tools

There is an extended blog post on this matter, which has more tools and how to use them.

vmtouch – https://hoytech.com/vmtouch/

mincore – http://man7.org/linux/man-pages/man2/mincore.2.html

Aug
29
2019
--

Marc Benioff will discuss building a socially responsible and successful startup at TechCrunch Disrupt

Salesforce chairman, co-founder and CEO Marc Benioff took a lot of big chances when he launched the company 20 years ago. For starters, his was one of the earliest enterprise SaaS companies, but he wasn’t just developing a company on top of a new platform, he was building one from scratch with social responsibility built-in.

Fast-forward 20 years and that company is wildly successful. In its most recent earnings report, it announced a $4 billion quarter, putting it on a $16 billion run rate, and making it by far the most successful SaaS company ever.

But at the heart of the company’s DNA is a charitable streak, and it’s not something it bolted on after becoming successful. Even before the company had a working product, in the earliest planning documents, Salesforce wanted to be a different kind of company. Early on, it designed the 1-1-1 philanthropic model, which set aside 1% of Salesforce’s equity, 1% of its product and 1% of its employees’ time for the community. As the company has grown, that model has gained serious financial teeth, and other startups over the years have adopted the same approach, using Salesforce as a model.

In our coverage of Dreamforce, the company’s enormous annual customer conference, in 2016, Benioff outlined his personal philosophy around giving back:

You are at work, and you have great leadership skills. You can isolate yourselves and say I’m going to put those skills to use in a box at work, or you can say I’m going to have an integrated life. The way I look at the world, I’m going to put those skills to work to make the world a better place.

This year Benioff is coming to TechCrunch Disrupt in San Francisco to discuss with TechCrunch editors how to build a highly successful business, while giving back to the community and the society your business is part of. In fact, he has a book coming out in mid-October called Trailblazer: The Power of Business as the Greatest Platform for Change, in which he writes about how businesses can be a positive social force.

Benioff has received numerous awards over the years for his entrepreneurial and charitable spirit, including Innovator of the Decade from Forbes, one of the World’s 25 Greatest Leaders from Fortune, one of the 10 Best-Performing CEOs from Harvard Business Review, recognition from GLAAD and the Billie Jean King Leadership Initiative for his work on equality, and the Variety Magazine EmPOWerment Award.

It’s worth noting that in 2018, a group of 618 Salesforce employees presented Benioff with a petition protesting the company’s contract with U.S. Customs and Border Protection (CBP). Benioff stated in public comments that the tools were being used in recruitment and management, not to help separate families at the border. While Salesforce did not cancel the contract, co-CEO Keith Block stated at the time that the company would donate $1 million to organizations helping separated families, as well as match any internal employee contributions through its charitable arm, Salesforce.org.

Disrupt SF runs October 2 to October 4 at the Moscone Center in the heart of San Francisco. Tickets are available here.

Did you know Extra Crunch annual members get 20% off all TechCrunch event tickets? Head over here to get your annual pass, and then email extracrunch@techcrunch.com to get your 20% discount. Please note that it can take up to 24 hours to issue the discount code.


Aug
29
2019
--

Mews grabs $33M Series B to modernize hotel administration

If you think about the traditional hotel business, there hasn’t been a ton of innovation. You mostly still stand in a line to check in, and sometimes even to check out. You let the staff know about your desire for privacy with a sign on the door. Mews believes it’s time to rethink how hotels work in a more modern digital context, especially on the administrative side, and today it announced a $33 million Series B led by Battery Ventures.

When Mews founder Richard Valtr started his own hotel in Prague in 2012, he wanted to change how hotels have operated traditionally. “I really wanted to change the way that hotel systems are built to make sure that it’s more about the experience that the guest is actually having, rather than facilitating the kind of processes that hotels have built over the last hundred years,” Valtr told TechCrunch.

He said most of the innovation in this space has been in the B2C area, using Airbnb as a prime example. He wants to bring that kind of change to the way hotels operate. “That’s essentially what Mews is trying to do. [We want to shift the focus to] the fundamental things about why we love to travel and why people actually love to stay in hotels, experience hotels, and be cared for by professional staff. We are trying to do that in a way that that actually delivers a really meaningful experience and personalized experience to that one particular customer,” he explained.

For starters, Mews is a cloud-based system that automates many manual tasks, like the room assignments that staff at many hotels still have to handle as part of their jobs. Valtr believes that freeing the staff from these kinds of tedious activities enables them to concentrate more on the guests.

It also offers ways for guests and hotels to customize their stays to get the best experience possible. Valtr says this approach brings a new level of flexibility that allows hotels to create new revenue opportunities, while letting guests choose the kind of stay they want.

From a guest perspective, you can bypass the check-in process altogether by sharing all of your registration details ahead of time and getting a passcode sent to your phone to get into the room. The system integrates with third-party hotel booking sites like Booking.com and Expedia, as well as other services, through its open hospitality API, which offers lots of opportunities for properties to partner with local businesses.

The company is currently operating at 1,000 properties across 47 countries, but it lacks a presence in the U.S. and wants to use this round to open an office in NYC and expand into this market. “We really want to attack the U.S. market because that’s essentially where most of the decision makers for all of the major chains are. And we’re not going to change the industry if we don’t actually change the thinking of the biggest brands,” Valtr said.

Today, the company has 270 employees spread across 10 offices around the world. Headquarters are in Prague and London, but the company is in the process of opening that NYC office, and the number of employees will expand when that happens.

Aug
28
2019
--

ReadMe scores $9M Series A to help firms customize API docs

Software APIs help different tools communicate with one another, let developers access essential services without having to code it themselves and are critical components for driving a platform-driven strategy. Yet they require solid documentation to help make the best use of them. ReadMe, a startup that helps companies customize their API documentation, announced a $9 million Series A today led by Accel with help from Y Combinator. The company was part of the Y Combinator Winter 2015 cohort.

Prior to today’s funding announcement, the company had taken just a $1.2 million seed round in 2014. Today, it reports 3,000 paying customers and that it has been profitable for the last several years, an unusual position for a startup. In spite of this success, co-founder and CEO Gregory Koberger said as the company has taken on larger customers, they have more sophisticated requirements, and that prompted them to take this round of funding.

In addition, it has expanded the platform to use a company’s API logs to help create more dynamic documentation and improve customer support kinds of scenarios. But by taking on data from other companies, it needs to make sure the data is secure, and today’s funding will help in that regard.

“We’re going to still build the company traditionally by hiring more engineers, more support people, more designers, the obvious stuff, but the main impetus for doing this was that we started working with bigger companies with more secure data. So a lot of the money is going to help make sure that we handle that right,” Koberger explained.


Image: ReadMe

He says this ability to make use of the API logs has opened up all kinds of possibilities for the company, as the data provides a valuable window into how people use the APIs. “It’s amazing how much you get by just actually seeing what the server sees. When people are having problems with an API, they can debug it themselves because they can actually see the problems, the support team can see it as well,” Koberger said.

Accel’s Dan Levine, whose firm is leading the investment, believes that having good documentation is the difference between making and breaking an API. “APIs don’t just create technical integration, they create ecosystems around core services and underpin corporate partnerships that generate billions of dollars. ReadMe is as much a strategy as it is a service for businesses. Providing clean, interactive, data-driven API documentation to make developers love working with you can be the difference between 100 partnerships or 1,000 partnerships,” Levine said.

ReadMe was founded in 2014. It has 22 employees in their San Francisco office, a number that should increase with today’s funding.

Aug
28
2019
--

ThoughtSpot hauls in $248M Series E on $1.95B valuation

ThoughtSpot was started by a bunch of ex-Googlers looking to bring the power of search to data. Seven years later the company is growing fast, sporting a fat valuation of almost $2 billion and looking ahead to a possible IPO. Today it announced a hefty $248 million Series E round as it continues on its journey.

Investors include Silver Lake Waterman, Silver Lake’s late-stage growth capital fund, along with existing investors Lightspeed Venture Partners, Sapphire Ventures and Geodesic Capital. Today’s funding brings the total raised to $554 million, according to the company.

The company wants to help customers bring speed to data analysis by answering natural language questions about the data without having to understand how to formulate a SQL query. As a person enters questions, ThoughtSpot translates that question into SQL, then displays a chart with data related to the question, all almost instantly (at least in the demo).

It doesn’t stop there, though. It also uses artificial intelligence to understand intent and help come up with the exact correct answer. ThoughtSpot CEO Sudheesh Nair says that this artificial intelligence underpinning is key to the product. As he explained, if you are looking for the answer to a specific question, like “What is the profit margin of red shoes in Portland?,” there won’t be multiple answers. There is only one answer, and that’s where artificial intelligence really comes into play.

“The bar on delivering that kind of answer is very high and because of that, understanding intent is critical. We use AI for that. You could ask, ‘How did we do with red shoes in Portland?’ I could ask, ‘What is the profit margin of red shoes in Portland?’ The system needs to know that we both are asking the same question. So there’s a lot of AI that goes behind it to understand the intent,” Nair explained.


Image: ThoughtSpot

ThoughtSpot gets answers to queries by connecting to a variety of internal systems, like HR, CRM and ERP, and uses all of this data to answer the question, as best it can. So far, it appears to be working. The company has almost 250 large-company customers, and is on a run rate of close to $100 million.

Nair said the company didn’t necessarily need the money, with $100 million still in the bank, but he saw an opportunity, and he seized it. He says the money gives him a great deal of flexibility moving forward, including the possibility of acquiring companies to fill in missing pieces or to expand the platform’s capabilities. It also will allow him to accelerate growth. Plus, he sees the capital markets possibly tightening next year and he wanted to strike while the opportunity was in front of him.

Nair definitely sees the company going public at some point. “With these kind of resources behind us, it actually opens up an opportunity for us to do any sort of IPO that we want. I do think that a company like this will benefit from going public because Global 2000 kind of customers, where we have most of our business, appreciate the transparency and the stability represented by public companies,” he said.

He added, “And with $350 million in the bank, it’s totally [possible to] IPO, which means that a year and a half from now if we are ready to take the company public, we can actually have all options open, including a direct listing, potentially. I’m not saying we will do that, but I’m saying that with this kind of funding behind us, we have all those options open.”

Aug
27
2019
--

Talking Drupal #225 – Tome

In episode #225 we explore the Tome module with maintainer Sam Mortenson.

www.talkingdrupal.com/225

Topics

  • What is Tome
  • Tome origin story
  • How Tome works
  • Tome pieces: Static, Sync, Netlify, Lunr
  • Stephen’s experience with Tome
  • Tome vs other products

Resources

Tome Module

Tome Documentation

 

Tome Websites

  • Tome Documentation
  • Umami
  • OVH
  • jpoesen
  • Hike with Gravity
  • GNUSCHICHTEN
  • Badzilla
  • aikido essen karnap
  • Poop PDX
  • Switching to Linux

Guest

Sam Mortenson  @DrupalSAM  mortenson.coffee/blog

Hosts

Stephen Cross – www.stephencross.com @stephencross

John Picozzi – www.oomphinc.com @johnpicozzi

Nic Laflin – www.nLighteneddevelopment.com @nicxvan

 

 

Aug
27
2019
--

How to move from VP of Sales to CRO with leading exec recruiter David Ives

It wasn’t so long ago that sales meant just showing up with a deck and a smile. These days, it seems that sales leaders almost need a PhD in statistics just to get through the typical day managing a sales funnel. From SQLs and MQLs to NDRR and managing overall retention, the roles of VP of Sales and Chief Revenue Officers (CROs) are evolving rapidly in tandem with the best practices of SaaS startups.

Few people know this world better than David Ives, who is a partner at True Search, one of the top executive recruiting firms in the country where he co-leads the go-to-market practice. David has led countless CRO and VP of Sales searches, and in the process, has learned not just what CEOs and boards are looking for, but also the kinds of skills that candidates need to shine in these important career inflection points.

In our conversation, we talk about the evolving nature of the sales org, how leaders can best position themselves for future advancement, what companies are looking for today in new executive sales hires, and compensation changes in the industry.

This interview has been extensively edited and condensed for clarity.

Introduction and background

Danny: Why don’t we start with your background — how did you get into recruiting?

David: So my background was definitely unique. I started as an enterprise sales rep of the truest form selling subscription-based data analytics and systems into capital markets, so into investment banks, trading desks, hedge funds, asset managers, portfolio managers — you name it. Then I drifted purposely, intentionally away from capital markets and did about four different growth technology companies. I landed at NewsCred, and it was a neat time — it was really the birth of the startup landscape with the whole Flatiron district in New York.

Later, I was looking for my next CRO opportunity and was networking with some of the investor folks that I knew. I had a friend of mine who was a talent partner at a private equity firm who said to me, “I’ve always thought that you’d be really good at this and we’re starting to push for our search firms to have operators.” I went and met with Brad and Joe [founders of True], and three weeks later I was in the seat.

Danny: That’s great. And what do you do at True?

David: Well, we moved to a specialization model right when I got here. I don’t know if I was the test case or not, but I didn’t know search, so my skillset was that I knew the role. I run our go-to-market practice with another partner, and we have probably 40, 45 people in that group. We focus exclusively on sales, marketing, customer success, we’ll do biz dev. I probably skew more to CRO than anything else, but I do CMO and VP of marketing as well, and then I do a handful of business development, chief client officers, and VPs of customer success a year. That’s my mix basically.

What is the skillset of a modern CRO?

Danny: You’ve been in the sales leadership space for a long time, and you’ve been in the recruiting space for a couple of years. What are some of the changes that you’re seeing today in terms of candidates, skills, and experiences?

David: I think a big change has been from what I call a backend pipeline manager to what I would call a full funnel manager.

Aug
27
2019
--

SAP covers hot topics at TechCrunch’s Sept. 5 Enterprise show in SF

You can’t talk enterprise software without talking SAP, one of the giants in a $500 billion industry. And not only will SAP’s CEO Bill McDermott share insights at TC Sessions: Enterprise 2019 on September 5, but the company will also sponsor two breakout sessions.

The editors will sit down with McDermott and talk about SAP’s quick growth due, in part, to several $1 billion-plus acquisitions. We’re also curious to hear about his approach to acquisitions and his strategy for growing the company in a quickly changing market. No doubt he’ll weigh in on the state of enterprise software in general, too.

Now about those breakout sessions. They run in parallel to our Main Stage programming, and on September 5 you’ll enjoy three do-not-miss breakout sessions: two from SAP and one from Pricefx. You can check out the agenda for TC Sessions: Enterprise, but we want to shine a light on the sponsored sessions to give you a sense of the quality content you can expect:

  • Innovating for a Super-Human Future 
    Martin Wezowski (SAP)
    We talk about change, but what are the mechanics and the dynamics behind it? And how fast is it? The noted futurist will discuss how what it means to be an innovator is transforming faster than ever before, and how this transformation is deeply rooted in the challenges and promises between cutting-edge tech and humanism. The symbiosis of human creativity and empathy with machine intelligence opens new worlds for our imagination in a time when “now” has never been so temporary, and helps us answer the question: “What is human, and what is work, in a superhuman future?” (Sponsored by SAP)
  • Pricing from Day One
    Madhavan Ramanujam (Simon-Kucher & Partners), Gabriel Smith and Darius Jakubik (Pricefx)
    A key ingredient distinguishing top-performing companies is a clear focus on price. To maximize revenue and profits, pricing should be a C-level/boardroom consideration. To optimize pricing, you should think about price when determining which products and features to bring to market; put the people, processes and technology in place to optimize it; and maintain the flexibility to adjust strategy and tactics in response to changing markets. By doing so, companies unlock the single greatest profit lever that exists. (Sponsored by Pricefx)
  • Cracking the Code: From Startup to Scaleup in Enterprise Software 
    Ram Jambunathan (SAP.iO), Lonnie Rae Kurlander (Medal), Caitlin MacGregor (Plum) and Dimitri Sirota (BigID)
    The startup journey is hard. Data shows that 70% of upstart tech companies fail, while only 1% of these startups will go on to gain unicorn status. Success in enterprise software often requires deep industry experience, strong networks, brutally efficient execution and a bit of luck. This panel brings together three successful SAP.iO Fund-backed enterprise startups for an open discussion on lessons learned, challenges of scaling and why the right strategic investors or partners can be beneficial even at early stages. (Sponsored by SAP)

TC Sessions: Enterprise 2019 takes place in San Francisco on September 5. It’s a jam-packed day (agenda here) filled with interviews, panel discussions and breakouts — from some of the top minds in enterprise software. Buy your ticket today and remember: You receive a free Expo-only pass to TechCrunch Disrupt SF 2019 for every ticket you buy.

Aug
27
2019
--

Kubernetes – Introduction to Containers


Here at Percona’s Training and Education department, we are always working to keep our materials up-to-date and relevant to current technologies. In addition, we keep an eye out for which topics are “hot” and generating a lot of buzz. Unless you’ve been living under a rock for the past year, you know that Kubernetes is the current buzzword/technology. This is the first post in a blog series where we will dive into Kubernetes and explore using it with Percona XtraDB Cluster.

Editor’s Disclaimer: This post is not intended to convince you to switch to a containerized environment. It is simply aimed at educating you about a growing trend in the industry.

In The Beginning…

Let’s start at the beginning. First, there were hardware-based servers; we still have these today. Despite the prevalence of “the cloud,” many large companies still use private datacenters and run their databases and applications “the traditional way.” Stepping up from hardware, we venture into virtual machine territory. Popular solutions here are VMware (a commercial product), QEMU, KVM, and VirtualBox (the latter three are OSS).

Virtual Machines (VM) abstract away the hardware on which they are running using emulation software; this is generally referred to as the hypervisor. Such a methodology allows for an entirely separate operating system (OS) to run “on top of” the OS running natively against the hardware. For example, you may have CentOS 7 running inside a VM, which in turn, is running on your Mac laptop, or you could have Windows 10 running inside a VM, which is running on your Ubuntu 18 desktop PC.

There are several nice things about using VMs in your infrastructure. One of those is the ability to move a VM from one server to another. Likewise, deploying 10 more VMs just like the first one is usually just a couple of clicks or a few commands. Another is a better utilization of hardware resources. If your servers have 8 CPU cores and 128GB of memory, running one application on that server might not fully utilize all that hardware. Instead, you could run multiple VMs on that one server, thus taking advantage of all the resources. Additionally, using a VM allows you to bundle your application, and all associated libraries, and configs into one “unit” that can be deployed and copied very easily (We’ll come back to this concept in a minute).

One of the major downsides of the VM solution is the fact that you have an entire, separate OS running. It is a huge waste of resources to run a CentOS 7 VM on top of a CentOS 7 hardware server. If you have five CentOS 7 VMs running, you have five separate Linux kernels (in addition to the host kernel), five separate copies of /usr, and five separate copies of the same libraries and application code taking up precious memory. Some might say that, in this setup, 90-99% of each VM is duplicating effort already being made by the base OS. Additionally, each VM is dealing with its own CPU context-switching management. Also, what happens when one VM needs a bit more CPU than another? There exists the concept of CPU “stealing,” where one VM can take CPU from another VM. Imagine if that was your critical DB that had CPU stolen!

What if you could eliminate all that duplication but still give your application the ability to be bundled and deployed across tens or hundreds of servers with just a few commands? Enter: containers.

What is a Container?

Simply put, a container is a type of “lightweight” virtual machine without all the overhead of running an independent operating system on top of a hypervisor. A container can be as small as a single line script or as large as <insert super complicated application with lots of libraries, code, images, etc.>.

With containers, you essentially get all of the “pros” of VMs with far fewer “cons,” mainly in the area of performance. A container is not an emulation layer like a VM. On Linux, the kernel provides control groups (cgroups) functionality, which can limit and isolate CPU, memory, disk, and network. This act of isolation becomes the container; everything needed is ‘contained’ in one place/group. In more advanced setups, cgroups can even limit/throttle the amount of resources given to each container. With containers, there is no need for an additional OS install, no emulation, and no hypervisor. Because of this, you can achieve “near bare-metal performance” while using containers.
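To make the cgroups mechanism itself concrete, here is a rough sketch (assuming cgroup v1 with the memory controller mounted, which was typical at the time of writing; container runtimes do this plumbing for you):

# Create a cgroup capped at 512MB of RAM and move the current shell into it:
sudo mkdir /sys/fs/cgroup/memory/demo
echo $((512*1024*1024)) | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
echo $$ | sudo tee /sys/fs/cgroup/memory/demo/cgroup.procs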

Containers, and any applications running inside the containers, run inside their own, isolated, namespace, and thus, a container can only see its own process, disk contents, devices assigned, and granted network access.

Much like VMs, containers can be started, stopped, and copied from machine to machine, making deployment of applications across hundreds of servers quite easy.

One potential downside to containers is that they are inherently stateless. This means that if you have a running container, and make a change to its configuration or update a library, etc., that state is not saved if the container is restarted. The initial state of the launched container will be restored on restart. That said, most container implementations provide a method for supplying external storage which can hold stateful information to survive a restart or redeployment.
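With Docker, for example (introduced in the next section), this is typically done by mounting a named volume or host directory into the container. A minimal sketch, where the volume name, mount path, and image are just examples:

# Data written to /var/lib/mysql inside the container lands in the 'mysql-data'
# volume, so it survives container restarts and redeployments:
docker run -d --name mydb -v mysql-data:/var/lib/mysql \
    -e MYSQL_ROOT_PASSWORD=secret percona/percona-server:8.0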

Dock the Ship, Capt’n

The concept of “containers” is generic. Many different implementations of containers exist, much like different distributions of Linux. The most popular implementation of containers in the Linux space is Docker. Docker is free, open-source software, but it can also be sold with a commercial license for more advanced features. Docker is built upon one of the earlier Linux container implementations, LXC.

Docker uses the concept of “images” to distribute containers. For example, you can download and install the Percona MySQL 8.0 image to your server, and from that image, create as many separate, independent containers of ‘Percona MySQL 8’ as your hardware can handle. Without getting too deep into the specifics, Docker images are the end result of multiple compressed layers which were used during the building process of this image. Herein lies another pro for Docker. If multiple images share common layers, there only needs to be one copy of that layer on any system, thus the potential for space savings can be huge.

Another nice feature of Docker containers is their built-in versioning capabilities. If you are familiar with the version control software Git, then you should know the concept of “tags,” which is extremely similar to Docker tags. These tags allow you to track and specify Docker images when creating and deploying images. For example, using the same image (percona/percona-server), you can deploy percona/percona-server:5.7 or percona/percona-server:8.0. In this example, “5.7” and “8.0” are the tags used.
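As a quick sketch of what that looks like in practice (the container names and root password are arbitrary examples):

docker pull percona/percona-server:5.7
docker pull percona/percona-server:8.0
# Two independent containers, one from each tagged image:
docker run -d --name ps57 -e MYSQL_ROOT_PASSWORD=secret percona/percona-server:5.7
docker run -d --name ps80 -e MYSQL_ROOT_PASSWORD=secret percona/percona-server:8.0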

Docker also has a huge community ecosystem for sharing images. Visit https://hub.docker.com/ and search for just about any popular OSS project and you will probably find out that someone has already created an image for you to utilize.

Too Many Ships at Port

Docker containers are extremely simple to maintain, deploy, and use within your infrastructure. But what happens when you have hundreds of servers and are trying to manage thousands of Docker containers? Sure, if one server goes down, all you need to do, technically speaking, is re-launch those downed containers on another server. But do you have another server in your sea of servers with enough free capacity to handle this additional load? What if you were using persistent storage and need that volume moved along with the container? Running and managing large installations of containers can quickly become overwhelming.

Now we will discuss the topic of container orchestration.

Kubernetes

Pronounced koo-ber-net-ies and often abbreviated K8S (there are 8 letters between the ‘K’ and the last ‘S’), Kubernetes is the current mainstream project for managing Docker containers at very large scale. Kubernetes is Greek for “governor”, “helmsman”, or “captain”. Rather than repeating it here, you can read more about the history of K8S on your own.

Terminology

K8S has many moving parts, as you might have imagined. Take a look at this image.

Kubernetes

Source: Wikipedia (modified)

Nodes

At the bottom of the image are two ‘Kubernetes Nodes’. These are usually your physical hardware servers, but could also be VMs at your cloud provider. These nodes will have a minimum OS installed, along with other necessary packages like K8S and Docker.

Kubelet is a management binary used by K8S to watch existing containers, run health checks, and launch new containers.

Kube-proxy is a simple port-mapping proxy service. Let’s say you had a simple webserver running as a container on 172.14.11.109:34223. Your end-users would access an external/public IP over 443 like normal, and kube-proxy would map/route that external access to the correct internal container.

Pods

The next concept to understand is a pod. This is typically the base term used when discussing what is running on a node. A pod can be a single container or can be an entire setup, running multiple containers, with mapped external storage, that should be grouped together as a single unit.
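As a minimal sketch of the concept (assuming kubectl is already configured against a cluster; the pod name and image are arbitrary):

# Launch a bare, single-container pod and inspect it:
kubectl run web --image=nginx --restart=Never
kubectl get pods
kubectl describe pod web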

Networking

Networking within K8S is a complex topic, and a deep discussion is not suitable for this post. At an entry-level, what you need to know is that K8S creates an “overlay” network, allowing a container in a pod on node1 to talk to a container in a pod on node322 which is in another datacenter. Should your company policy require it, you can create multiple overlay networks across your entire K8S and use them as a way to isolate traffic between pods (ie: think old-school VLANs).

Master Node

At the top of the diagram, you can see the developer/operator interacting with the K8S master node. This node runs several processes which are critical to K8S operation:

kube-apiserver: A simple server to handle/route/process API requests from other K8S tools and even your own custom tools

etcd: A highly available key-value store. This is where K8S stores all of its stateful data.

kube-scheduler: Manages inventory and decides where to launch new pods based on available node resources.
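All of these components are typically reached indirectly through the API server. For example, assuming a working kubectl configuration, you can ask the API server for the health of the other master components:

kubectl get componentstatuses   # reports health of scheduler, controller-manager, etcd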

You might have noticed that there is only one master node in the above diagram. This is the default installation, but it is not highly available: should the master node go down, the entire K8S infrastructure will go down with it. In a true production environment, it is highly recommended that you run the master node in a highly available manner.

Wrap It Up

In this post, we discussed how traditional hardware-based setups have been slowly migrating, first towards virtual machine setups and now to containerized deployments. We brought up the issues surrounding managing large container environments and introduced the leading container orchestration platform, Kubernetes. Lastly, we dove into K8S’s various components and architecture.

Once you have your K8S cluster set up with a master node and many worker nodes, you’ll finally be able to launch some pods and make services accessible. That is what we will do in part 2 of this blog series.
