Jun
04
2019
--

How Kubernetes came to rule the world

Open source has become the de facto standard for building the software that underpins the complex infrastructure that runs everything from your favorite mobile apps to your company’s barely usable expense tool. Over the course of the last few years, a lot of new software has been deployed on top of Kubernetes, the tool for managing large server clusters running containers that Google open sourced five years ago.

Today, Kubernetes is one of the fastest-growing open-source projects, and earlier this month, the bi-annual KubeCon+CloudNativeCon conference attracted almost 8,000 developers to sunny Barcelona, Spain, making the event the largest open-source conference in Europe yet.

To talk about how Kubernetes came to be, I sat down with Craig McLuckie, one of the co-founders of Kubernetes at Google (who then went on to his own startup, Heptio, which he sold to VMware); Tim Hockin, another Googler who was an early member on the project and was also on Google’s Borg team; and Gabe Monroy, who co-founded Deis, one of the first successful Kubernetes startups, and then sold it to Microsoft, where he is now the lead PM for Azure Container Compute (and often the public face of Microsoft’s efforts in this area).

Google’s cloud and the rise of containers

To set the stage a bit, it’s worth remembering where Google Cloud and container management were five years ago.

Apr
29
2019
--

Canonical’s Mark Shuttleworth on dueling open-source foundations

At the Open Infrastructure Summit, which was previously known as the OpenStack Summit, Canonical founder Mark Shuttleworth used his keynote to talk about the state of open-source foundations — and what often feels like the increasing competition between them. “I know for a fact that nobody asked to replace dueling vendors with dueling foundations,” he said. “Nobody asked for that.”

He then put a finer point on this, saying: “What’s the difference between a vendor that only promotes the ideas that are in its own interest and a foundation that does the same thing? Or worse, a foundation that will only represent projects that it’s paid to represent.”

Somewhat uncharacteristically, Shuttleworth didn’t say which foundations he was talking about, but since there are really only two foundations that fit the bill here, it’s pretty clear that he was talking about the OpenStack Foundation and the Linux Foundation — and maybe more precisely the Cloud Native Computing Foundation, the home of the incredibly popular Kubernetes project.

It turns out, that’s only part of his misgivings about the current state of open-source foundations, though. I sat down with Shuttleworth after his keynote to discuss his comments, as well as Canonical’s announcements around open infrastructure.

One thing that’s worth noting at the outset is that the OpenStack Foundation is using this event to highlight the fact that it has now brought in more new open infrastructure projects outside of the core OpenStack software, with two of them graduating from their pilot phase. Shuttleworth, who has made big bets on OpenStack in the past and is seeing a lot of interest from customers, is not a fan. Canonical, it’s worth noting, is also a major sponsor of the OpenStack Foundation. He, however, believes the foundation should focus on the core OpenStack project.

“We’re busy deploying 27 OpenStack clouds — that’s more than double the run rate last year,” he said. “OpenStack is important. It’s very complicated and hard. And a lot of our focus has been on making it simpler and cleaner, despite the efforts of those around us in this community. But I believe in it. I think that if you need large-scale, multi-tenant virtualization infrastructure, it’s the best game in town. But it has problems. It needs focus. I’m super committed to that. And I worry about people losing their focus because something newer and shinier has shown up.”

To clarify that, I asked him if he essentially believes that the OpenStack Foundation is making a mistake by trying to be all things infrastructure. “Yes, absolutely,” he said. “At the end of the day, I think there are some projects that this community is famous for. They need focus, they need attention, right? It’s very hard to argue that they will get focus and attention when you’re launching a ton of other things that nobody’s ever heard of, right? Why are you launching those things? Who is behind those decisions? Is it a money question as well? Those are all fair questions to ask.”

He doesn’t believe all of the blame should fall on the Foundation leadership, though. “I think these guys are trying really hard. I think the common characterization that it was hapless isn’t helpful and isn’t accurate. We’re trying to figure stuff out.” Shuttleworth indeed doesn’t believe the leadership is hapless, something he stressed, but he clearly isn’t all that happy with the current path the OpenStack Foundation is on either.

The Foundation, of course, doesn’t agree. As OpenStack Foundation COO Mark Collier told me, the organization remains as committed to OpenStack as ever. “The Foundation, the board, the community, the staff — we’ve never been more committed to OpenStack,” he said. “If you look at the state of OpenStack, it’s one of the top-three most active open-source projects in the world right now […] There’s no wavering in our commitment to OpenStack.” He also noted that the other projects that are now part of the foundation are the kind of software that is helpful to OpenStack users. “These are efforts which are good for OpenStack,” he said. In addition, he stressed that the process of opening up the Foundation has been going on for more than two years, with the vast majority of the community (roughly 97 percent) voting in favor.

OpenStack board member Allison Randal echoed this. “Over the past few years, and a long series of strategic conversations, we realized that OpenStack doesn’t exist in a vacuum. OpenStack’s success depends on the success of a whole network of other open-source projects, including Linux distributions and dependencies like Python and hypervisors, but also on the success of other open infrastructure projects which our users are deploying together. The OpenStack community has learned a few things about successful open collaboration over the years, and we hope that sharing those lessons and offering a little support can help other open infrastructure projects succeed too. The rising tide of open source lifts all boats.”

As for open-source foundations in general, he also doesn’t believe that it’s a good thing to have numerous foundations compete over projects. He argues that we’re still trying to figure out the role of open-source foundations and that we’re currently in a slightly awkward position because we’re still trying to determine how to best organize these foundations. “Open source in society is really interesting. And how we organize that in society is really interesting,” he said. “How we lead that, how we organize that is really interesting and there will be steps forward and steps backward. Foundations tweeting angrily at each other is not very presidential.”

He also challenged the notion that if you just put a project into a foundation, “everything gets better.” That’s too simplistic, he argues, because so much depends on the leadership of the foundation and how they define being open. “When you see foundations as nonprofit entities effectively arguing over who controls the more important toys, I don’t think that’s serving users.”

When I asked him whether he thinks some foundations are doing a better job than others, he essentially declined to comment. But he did say that he thinks the Linux Foundation is doing a good job with Linux, in large part because it employs Linus Torvalds. “I think the technical leadership of a complex project that serves the needs of many organizations is best served that way and something that the OpenStack Foundation could learn from the Linux Foundation. I’d be much happier with my membership fees actually paying for thoughtful, independent leadership of the complexity of OpenStack rather than the sort of bizarre bun fights and stuffed ballots that we see today. For all the kumbaya, it flatly doesn’t work.” He believes that projects should have independent leaders who can make long-term plans. “Linus’ finger is a damn useful tool and it’s hard when everybody tries to get reelected. It’s easy to get outraged at Linus, but he’s doing a fucking good job, right?”

OpenStack, he believes, often lacks that kind of decisiveness because it tries to please everybody and attract more sponsors. “That’s perhaps the root cause,” he said, and it leads to too much “behind-the-scenes puppet mastering.”

In addition to our talk about foundations, Shuttleworth also noted that he believes the company is still on the path to an IPO. He’s obviously not committing to a time frame, but after a year of resetting in 2018, he argues that Canonical’s business is looking up. “We want to be north of $200 million in revenue and a decent growth rate and the right set of stories around the data center, around public cloud and IoT.” First, though, Canonical will do a growth equity round.

Apr
12
2019
--

OpenStack Stein launches with improved Kubernetes support

The OpenStack project, which powers more than 75 public and thousands of private clouds, launched the 19th version of its software this week. You’d think that after 19 updates to the open-source infrastructure platform, there really isn’t all that much new the various project teams could add, given that we’re talking about a rather stable code base here. There are actually a few new features in this release, though, as well as all the usual tweaks and feature improvements you’d expect.

While the hype around OpenStack has died down, we’re still talking about a very active open-source project. On average, there were 155 commits per day during the Stein development cycle. As far as development activity goes, that keeps OpenStack on the same level as the Linux kernel and Chromium.

Unsurprisingly, a lot of that development activity focused on Kubernetes and the tools to manage these container clusters. With this release, the team behind the OpenStack Kubernetes installer brought the launch time for a cluster down from about 10 minutes to five, regardless of the number of nodes. To further enhance Kubernetes support, OpenStack Stein also includes updates to Neutron, the project’s networking service, which now makes it easier to create virtual networking ports in bulk as containers are spun up, and Ironic, the bare-metal provisioning service.

All of that is no surprise, given that according to the project’s latest survey, 61 percent of OpenStack deployments now use both Kubernetes and OpenStack in tandem.

The update also includes a number of new networking features that are mostly targeted at the many telecom users. Indeed, over the course of the last few years, telcos have emerged as some of the most active OpenStack users as these companies are looking to modernize their infrastructure as part of their 5G rollouts.

Besides the expected updates, though, there are also a few new and improved projects here that are worth noting.

“The trend from the last couple of releases has been on scale and stability, which is really focused on operations,” OpenStack Foundation executive director Jonathan Bryce told me. “The new projects — and really most of the new projects from the last year — have all been pretty oriented around real-world use cases.”

The first of these is Placement. “As people build a cloud and start to grow it and it becomes more broadly adopted within the organization, a lot of times, there are other requirements that come into play,” Bryce explained. “One of these things that was pretty simplistic at the beginning was how a request for a resource was actually placed on the underlying infrastructure in the data center.” But as users get more sophisticated, they often want to run specific workloads on machines with certain hardware requirements. These days, that’s often a specific GPU for a machine learning workload, for example. With Placement, that’s a bit easier now.

It’s worth noting that OpenStack had some of this functionality before. The team, however, decided to uncouple it from the existing compute service and turn it into a more generic service that could then also be used more easily beyond the compute stack, turning it more into a kind of resource inventory and tracking tool.

Then, there is also Blazar, a reservation service that offers OpenStack users something akin to AWS Reserved Instances. In a private cloud, the use case for such a feature is a bit different, though. As some of these private clouds got bigger, some users found that they needed to be able to guarantee resources to run some of their regular, overnight batch jobs or data analytics workloads, for example.

As far as resource management goes, it’s also worth highlighting Sahara, which now makes it easier to provision Hadoop clusters on OpenStack.

In previous releases, one of the focus areas for the project was to improve the update experience. OpenStack is obviously a very complex system, so bringing it up to the latest version is also a bit of a complex undertaking. These improvements are now paying off. “Nobody even knows we are running Stein right now,” Vexxhost CEO Mohammed Nasar, who made an early bet on OpenStack for his service, told me. “And I think that’s a good thing. You want to be least impactful, especially when you’re in such a core infrastructure level. […] That’s something the projects are starting to become more and more aware of but it’s also part of the OpenStack software in general becoming much more stable.”

As usual, this release launched only a few weeks before the OpenStack Foundation hosts its bi-annual Summit in Denver. Since the OpenStack Foundation has expanded its scope beyond the OpenStack project, though, this event also focuses on a broader range of topics around open-source infrastructure. It’ll be interesting to see how this will change the dynamics at the event.

Mar
15
2019
--

Suse is once again an independent company

Open-source infrastructure and application delivery vendor Suse — the company behind one of the oldest Linux distributions — today announced that it is once again an independent company. The company today finalized its $2.5 billion acquisition by growth investor EQT from Micro Focus, which itself had acquired it back in 2014.

Few companies have changed hands as often as Suse and yet remained strong players in their business. Suse was first acquired by Novell in 2004. Novell was then acquired by Attachmate in 2010, which Micro Focus acquired in 2014. The company then turned Suse into an independent division, only to then announce its sale to EQT in the middle of 2018.

It took a while for Micro Focus and EQT to finalize the acquisition, but now, for the first time since 2004, Suse stands on its own.

Micro Focus says that when it acquired Attachmate Group for $2.35 billion, Suse generated just 20 percent of the group’s total revenues. Since then, Suse has generated quite a bit more business as it expanded its product portfolio well beyond its core Linux offerings and into the more lucrative open-source infrastructure and application delivery business by, among other things, offering products and support around massive open-source projects like Cloud Foundry, OpenStack and Kubernetes.

Suse CEO Nils Brauckmann will remain at the helm of the company, but the company is shaking up its executive ranks a bit. Enrica Angelone, for example, has been named to the new post of CFO at Suse, and Sander Huyts is now the company’s COO. Former Suse CTO Thomas Di Giacomo is now president of Engineering, Product and Innovation. All three report directly to Brauckmann.

“Our genuinely open, open source solutions, flexible business practices, lack of enforced vendor lock-in and exceptional service are more critical to customer and partner organizations, and our independence coincides with our single-minded focus on delivering what is best for them,” said Brauckmann in today’s announcement. “Our ability to consistently meet these market demands creates a cycle of success, momentum and growth that allows SUSE to continue to deliver the innovation customers need to achieve their digital transformation goals and realize the hybrid and multi-cloud workload management they require to power their own continuous innovation, competitiveness and growth.”

Since IBM recently bought Red Hat for $34 billion, though, it remains to be seen how long Suse’s independent future will last. The market for open source is only heating up, after all.

Dec
20
2018
--

Benchmark PostgreSQL With Linux HugePages

Benchmarking HugePages and PostgreSQL

The Linux kernel provides a wide range of configuration options that can affect performance. It’s all about getting the right configuration for your application and workload. Just like any other database, PostgreSQL relies on the Linux kernel to be optimally configured. Poorly configured parameters can result in poor performance. Therefore, it is important that you benchmark database performance after each tuning session to avoid performance degradation. In one of my previous posts, Tune Linux Kernel Parameters For PostgreSQL Optimization, I described some of the most useful Linux kernel parameters and how those may help you improve database performance. Now I am going to share my benchmark results after configuring Linux HugePages with different PostgreSQL workloads. I have performed a comprehensive set of benchmarks across many different PostgreSQL load sizes and different numbers of concurrent clients.

Benchmark Machine

  • Supermicro server:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic
  • PostgreSQL: version 11

Linux Kernel Settings

I have used the default kernel settings without any optimization/tuning, except for disabling Transparent HugePages. Transparent HugePages are enabled by default and allocate huge pages dynamically at runtime, which may not be suitable for database usage. Databases generally need fixed-size, pre-allocated HugePages, which Transparent HugePages do not provide. Hence, disabling this feature and defaulting to classic HugePages is always recommended.
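
As a reference, here is a minimal sketch of checking and disabling Transparent HugePages at runtime (the sysfs path below is the standard one on recent kernels; making the change survive reboots is usually done via a kernel boot parameter):

# Check the current Transparent HugePages mode; the value in brackets is active
cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
# Disable Transparent HugePages at runtime (as root)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# To persist across reboots, add transparent_hugepage=never to the kernel boot parameters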

PostgreSQL Settings

I have used consistent PostgreSQL settings for all the benchmarks in order to record different PostgreSQL workloads with different settings of Linux HugePages. Here are the PostgreSQL settings used for all benchmarks:

shared_buffers = '64GB'
work_mem = '1GB'
random_page_cost = '1'
maintenance_work_mem = '2GB'
synchronous_commit = 'on'
seq_page_cost = '1'
max_wal_size = '100GB'
checkpoint_timeout = '10min'
checkpoint_completion_target = '0.9'
autovacuum_vacuum_scale_factor = '0.4'
effective_cache_size = '200GB'
min_wal_size = '1GB'
wal_compression = 'ON'

Benchmark scheme

The benchmark scheme plays an important role. All benchmarks were run three times, with a thirty-minute duration for each run, and I took the median value from these three runs. The benchmarks were carried out using the PostgreSQL benchmarking tool pgbench, which works on a scale factor, with one scale factor being approximately 16MB of data.
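
As a rough sketch, a run along these lines can be reproduced with pgbench as follows (the scale factor and client count shown here are illustrative, not necessarily the exact values used in these benchmarks):

# Initialize a test database with a scale factor of 3000 (roughly 48GB of data)
pgbench -i -s 3000 pgbench_db
# Run a 30-minute benchmark with 32 client connections and 8 worker threads
pgbench -c 32 -j 8 -T 1800 pgbench_db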

HugePages

Linux uses 4K memory pages by default, along with support for HugePages. BSD has Super Pages, whereas Windows has Large Pages; PostgreSQL supports HugePages on Linux only. Where memory usage is high, smaller page sizes decrease performance. By setting up HugePages, you increase the memory dedicated to the application and therefore reduce the operational overhead incurred during allocation and swapping; i.e. you gain performance by using HugePages.

Here are the HugePages settings when using a HugePages size of 1GB. You can always get this information from /proc/meminfo.

AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     100
HugePages_Free:       97
HugePages_Rsvd:       63
HugePages_Surp:        0
Hugepagesize:    1048576 kB

For more detail about HugePages please read my previous blog post.

https://www.percona.com/blog/2018/08/29/tune-linux-kernel-parameters-for-postgresql-optimization/

Generally, HugePages come in 2MB and 1GB sizes, so it makes sense to use the 1GB size directly instead of the much smaller 2MB size.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge
https://kerneltalks.com/services/what-is-huge-pages-in-linux/
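
As a sketch of how HugePages can be reserved for PostgreSQL (the page counts below are illustrative and should be sized to cover shared_buffers; 1GB pages generally have to be reserved via kernel boot parameters, while 2MB pages can also be reserved at runtime):

# Reserve 100 x 1GB HugePages at boot by adding these kernel parameters to the GRUB config
default_hugepagesz=1G hugepagesz=1G hugepages=100
# Alternatively, reserve 2MB HugePages at runtime (33280 x 2MB is roughly 65GB,
# enough to cover the 64GB shared_buffers used here)
sysctl -w vm.nr_hugepages=33280
# Tell PostgreSQL to use HugePages for its shared memory segment (postgresql.conf)
huge_pages = on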

Benchmark Results

This benchmark shows the overall impact of different sizes of HugePages. The first set of benchmarks was created with the default Linux 4K page size without enabling HugePages. Note that Transparent Hugepages were also disabled, and remained disabled throughout these benchmarks.

Then the second set of benchmarks was performed with 2MB HugePages. Finally, the third set of benchmarks was performed with HugePages set to 1GB in size.

All these benchmarks were executed with PostgreSQL version 11. The sets include a combination of different database sizes and client counts. The graph below shows comparative performance results for these benchmarks, with TPS (transactions per second) on the y-axis, and database size and the number of clients per database size on the x-axis.

 

Clearly, from the graph above, you can see that the performance gain with HugePages increases as the number of clients and the database size increases, as long as the size remains within the pre-allocated shared buffer.

This benchmark shows TPS versus clients. In this case, the database size is set to 48GB. On the y-axis, we have TPS and on the x-axis, we have the number of connected clients. The database size is small enough to fit in the shared buffer, which is set to 64GB.

With HugePages set to 1GB, the higher the number of clients, the higher the comparative performance gain.

The next graph is the same as the one above except for a database size of 96GB. This exceeds the shared buffer size, which is set to 64GB.

 

The key observation here is that the performance with 1GB HugePages improves as the number of clients increases and it eventually gives better performance than 2MB HugePages or the standard 4KB page size.

This benchmark shows TPS versus database size. In this case, the number of connected clients is set to 32. On the y-axis, we have TPS and on the x-axis, we have database sizes.

As expected, when the database spills over the pre-allocated HugePages, the performance degrades significantly.

Summary

One of my key recommendations is that we must keep Transparent HugePages off. You will see the biggest performance gains when the database fits into the shared buffer with HugePages enabled. Deciding on the size of huge page to use requires a bit of trial and error, but this can potentially lead to a significant TPS gain where the database size is large but remains small enough to fit in the shared buffer.

Nov
15
2018
--

Canonical plans to raise its first outside funding as it looks to a future IPO

It’s been 14 years since Mark Shuttleworth first founded and funded Canonical and the Ubuntu project. At the time, it was mostly a Linux distribution. Today, it’s a major enterprise player that offers a variety of products and services. Throughout the years, Shuttleworth self-funded the project and never showed much interest in taking outside money. Now, however, that’s changing.

As Shuttleworth told me, he’s now looking for investors as he looks to get the company on track to an IPO. It’s no secret that the company’s recent re-focusing on the enterprise — and shutting down projects like the Ubuntu phone and the Unity desktop environment — was all about that, after all. Shuttleworth sees raising money as a step in this direction — and as a way of getting the company in shape for going public.

“The first step would be private equity,” he told me. “And really, that’s because having outside investors with outside members of the board essentially starts to get you to have to report and be part of that program. I’ve got a set of things that I think we need to get right. That’s what we’re working towards now. Then there’s a set of things that private investors are looking for and the next set of things is when you’re doing a public offering, there’s a different level of discipline required.”

It’s no secret that Shuttleworth, who sports an impressive beard these days, was previously resistant to this, and he acknowledged as much. “I think that’s a fair characterization,” he said. “I enjoy my independence and I enjoy being able to make long-term calls. I still feel like I’ll have the ability to do that, but I do appreciate keenly the responsibility of taking other people’s money. When it’s your money, it’s slightly different.”

Refocusing Canonical on the enterprise business seems to be paying off already. “The numbers are looking good. The business is looking healthy. It’s not a charity. It’s not philanthropy,” he said. “There are some key metrics that I’m watching, which are the gate for me to take the next step, which would be growth equity.” Those metrics, he told me, are the size of the business and how diversified it is.

Shuttleworth likens this program of getting the company ready to IPO to getting fit. “There’s no point in saying: I haven’t done any exercise in the last 10 years but I’m going to sign up for tomorrow’s marathon,” he said.

The move from being a private company to taking outside investment and going public — especially after all these years of being self-funded — is treacherous, though, and Shuttleworth admitted as much, especially in terms of being forced to set short-term goals to satisfy investors that aren’t necessarily in the best interest of the company in the long term. Shuttleworth thinks he can negotiate those issues, though.

Interestingly, he thinks the real danger is quite a different one. “I think the most dangerous thing in making that shift is the kind of shallowness of the unreasonably big impact that stock price has on people’s mood,” he said. “Today, at Canonical, it’s 600 people. You might have some that are having a really great day and some that are having a shitty day. And they have to figure out what’s real about both of those scenarios. But they can kind of support each other. […] But when you have a stock ticker, everybody thinks they’re having a great day, or everybody thinks they’re having a shitty day in a way that may be completely uncorrelated to how well they’re actually doing.”

Shuttleworth does not believe that IBM’s acquisition of its competitor Red Hat will have any immediate effect on Canonical’s business, though. What he does think, however, is that this move is making a lot of people rethink for the first time in years the investment they’ve been making in Red Hat and its enterprise Linux distribution. Canonical’s promise is that Ubuntu, as well as its OpenStack tools and services, are not just competitive but also more cost-effective in the long run, and offer enterprises an added degree of agility. And if more businesses start looking at Canonical and Ubuntu, that can only speed up Shuttleworth’s (and his bankers’) schedule for hitting Canonical’s metrics for raising money and going public.

Oct
28
2018
--

Forget Watson, the Red Hat acquisition may be the thing that saves IBM

With its latest $34 billion acquisition of Red Hat, IBM may have found something more elementary than “Watson” to save its flagging business.

Though the acquisition of Red Hat is by no means a guaranteed victory for the Armonk, N.Y.-based computing company that has had more downs than ups over the past five years, it seems to be a better bet for “Big Blue” than an artificial intelligence program that was always more hype than reality.

Indeed, commentators are already noting that this may be a case where IBM finally hangs up the Watson hat and returns to the enterprise software and services business that has always been its core competency (albeit one that has been weighted far more heavily on consulting services — to the detriment of the company’s business).

Watson, the business division focused on artificial intelligence whose public claims were always more marketing than actually market-driven, has not performed as well as IBM had hoped and investors were losing their patience.

Critics — including analysts at the investment bank Jefferies (as early as one year ago) — were skeptical of Watson’s ability to deliver IBM from its business woes.

As we wrote at the time:

Jefferies pulls from an audit of a partnership between IBM Watson and MD Anderson as a case study for IBM’s broader problems scaling Watson. MD Anderson cut its ties with IBM after wasting $60 million on a Watson project that was ultimately deemed, “not ready for human investigational or clinical use.”

The MD Anderson nightmare doesn’t stand on its own. I regularly hear from startup founders in the AI space that their own financial services and biotech clients have had similar experiences working with IBM.

The narrative isn’t the product of any single malfunction, but rather the result of overhyped marketing, deficiencies in operating with deep learning and GPUs and intensive data preparation demands.

That’s not the only trouble IBM has had with Watson’s healthcare results. Earlier this year, the online medical journal Stat reported that Watson was giving clinicians recommendations for cancer treatments that were “unsafe and incorrect” — based on the training data it had received from the company’s own engineers and doctors at Sloan-Kettering who were working with the technology.

All of these woes were reflected in the company’s latest earnings call, where it reported falling revenues primarily from the Cognitive Solutions business, which includes Watson’s artificial intelligence and supercomputing services. Though IBM’s chief financial officer pointed to “mid-to-high” single-digit growth from Watson’s health business in the quarter, the transaction processing software business fell by 8% and the company’s suite of hosted software services is basically an afterthought for businesses gravitating to Microsoft, Alphabet and Amazon for cloud services.

To be sure, Watson is only one of the segments that IBM had been hoping to tap for its future growth; and while it was a huge investment area for the company, the company always had its eyes partly fixed on the cloud computing environment as it looked for areas of growth.

It’s this area of cloud computing where IBM hopes that Red Hat can help it gain ground.

“The acquisition of Red Hat is a game-changer. It changes everything about the cloud market,” said Ginni Rometty, IBM Chairman, President and Chief Executive Officer, in a statement announcing the acquisition. “IBM will become the world’s number-one hybrid cloud provider, offering companies the only open cloud solution that will unlock the full value of the cloud for their businesses.”

The acquisition also puts an incredible amount of marketing power behind Red Hat’s various open source services business — giving all of those IBM project managers and consultants new projects to pitch and maybe juicing open source software adoption a bit more aggressively in the enterprise.

As Red Hat chief executive Jim Whitehurst told TheStreet in September, “The big secular driver of Linux is that big data workloads run on Linux. AI workloads run on Linux. DevOps and those platforms, almost exclusively Linux,” he said. “So much of the net new workloads that are being built have an affinity for Linux.”

Sep
05
2018
--

Modifying List of Collected Metrics on PMM Linux Exporter

Do you need to modify the metrics collected from Linux by Percona Monitoring and Management (PMM)? In this blog post we will see how to enable, disable, and update collected metrics on PMM’s linux:metrics exporter.

We will assume that the PMM client packages are installed, and they are configured already.

Using a custom list of metrics

Let’s now suppose we are not yet collecting any metrics on our desired client server, and we want to enable only the following: diskstats, meminfo, netdev and vmstat. We can use the following command:

pmm-admin add linux:metrics -- -collectors.enabled=diskstats,meminfo,netdev,vmstat

So, in order to enable or disable the functionality we want, we need to modify the collectors.enabled list accordingly. In the following online documentation page, we are able to find all collectors supported:

https://www.percona.com/doc/percona-monitoring-and-management/section.exporter.node.html

In this way, by adding or removing items from the collectors.enabled list, we can choose which functionality will be set in our PMM linux:metrics collectors.

Checking list of metrics used

We can use the ps aux command to check the collectors list that applies at any given time. Let’s see this in practical terms. After running the previous command, we should expect to see the following:

shell> ps aux | grep node_exporter
...
root     20450 3.2  0.0 1660248 15924 ?       Sl 13:48 0:13 /usr/local/percona/pmm-client/node_exporter -collectors.enabled=diskstats,filefd,filesystem,loadavg,meminfo,netdev,netstat,stat,time,uname,vmstat,meminfo_numa -web.listen-address=10.10.8.141:42000 -web.auth-file=/usr/local/percona/pmm-client/pmm.yml -web.ssl-cert-file=/usr/local/percona/pmm-client/server.crt -web.ssl-key-file=/usr/local/percona/pmm-client/server.key -collectors.enabled=diskstats,meminfo,netdev,vmstat

Note: there is currently a bug where you will see duplicate entries when using custom collectors. We are tracking this under issue PMM-2857. For now, if you see two different lists on ps aux outputs, the one that applies is the last one (as seen above).

Updating the list of metrics

There is currently no way to enable (or disable) metrics dynamically (however, we are working on it). We will need to stop the collector, manually get and edit the metrics list, and re-add it with the updated collectors in place. For this, we will need to know the current list of metrics used from the ps aux outputs.

Continuing with the same example, suppose we are not happy with what netdev metrics offer, and we wish to disable them to avoid unnecessary overhead. Knowing the metrics list was:

-collectors.enabled=diskstats,meminfo,netdev,vmstat

We will need to run the following commands to have it removed from the collectors list:

pmm-admin stop linux:metrics
pmm-admin rm linux:metrics
pmm-admin add linux:metrics -- -collectors.enabled=diskstats,meminfo,vmstat

Links for reference

  • Passing options to exporter: here and here
  • Collector options for linux:metrics exporter: here

Did you find this post useful?

You might also enjoy some of our other resources, such as the recent technical webinars on using PMM presented by my colleagues from Percona.


Jul
03
2018
--

Linux OS Tuning for MySQL Database Performance

In this post we will review the most important Linux settings to adjust for performance tuning and optimization of a MySQL database server. We’ll note how some of the Linux parameter settings used in OS tuning may vary according to different system types: physical, virtual or cloud. Other posts have addressed MySQL parameters, like Alexander’s blog MySQL 5.7 Performance Tuning Immediately After Installation. That post remains highly relevant for the latest versions of MySQL, 5.7 and 8.0. Here we will focus more on the Linux operating system parameters that can affect database performance.

Server and Operating System

Here are some Linux parameters that you should check and consider modifying if you need to improve database performance.

Kernel – vm.swappiness

This value represents the tendency of the kernel to swap out memory pages. On a database server with ample amounts of RAM, we should keep this value as low as possible, since the extra I/O from swapping can slow down the service or even render it unresponsive. A value of 0 disables swapping completely, while 1 causes the kernel to perform the minimum amount of swapping. In most cases the latter setting should be OK:

# Set the swappiness value as root
echo 1 > /proc/sys/vm/swappiness
# Alternatively, using sysctl
sysctl -w vm.swappiness=1
# Verify the change
cat /proc/sys/vm/swappiness
1
# Alternatively, using sysctl
sysctl vm.swappiness
vm.swappiness = 1

The change should be also persisted in /etc/sysctl.conf:

vm.swappiness = 1
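
To apply the persisted setting without a reboot, the file can be reloaded with sysctl (a small sketch; on some distributions the setting may instead live in a drop-in file under /etc/sysctl.d/):

# Reload kernel parameters from /etc/sysctl.conf
sudo sysctl -p
# Or reload all configured sysctl files, including /etc/sysctl.d/
sudo sysctl --system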

Filesystems – XFS/ext4/ZFS
XFS

XFS is a high-performance, journaling file system designed for high scalability. It provides near-native I/O performance even when the file system spans multiple storage devices. XFS has features that make it suitable for very large file systems, supporting files up to 8EiB in size: fast recovery, fast transactions, delayed allocation for reduced fragmentation and near-raw I/O performance with direct I/O.

The default options for mkfs.xfs are good for optimal speed, so the simple command:

# Use default mkfs options
mkfs.xfs /dev/target_volume

will provide the best performance while ensuring data safety. Regarding mount options, the defaults should fit most cases. On some filesystems you can see a performance increase by adding the noatime mount option to /etc/fstab. For XFS filesystems the default atime behaviour is relatime, which has almost no overhead compared to noatime and still maintains sane atime values. If you create an XFS file system on a LUN that has a battery-backed, non-volatile cache, you can further increase the performance of the filesystem by disabling the write barrier with the mount option nobarrier. This helps you to avoid flushing data more often than necessary. If a BBU (battery backup unit) is not present, however, or you are unsure about it, leave barriers on, otherwise you may jeopardize data consistency. With these options on, an /etc/fstab file should look like the one below:

/dev/sda2              /datastore              xfs     defaults,nobarrier
/dev/sdb2              /binlog                 xfs     defaults,nobarrier

ext4

ext4 has been developed as the successor to ext3 with added performance improvements. It is a solid option that will fit most workloads. We should note here that it supports files up to 16TB in size, a smaller limit than xfs. This is something you should consider if extreme table space size/growth is a requirement. Regarding mount options, the same considerations apply. We recommend the defaults for a robust filesystem without risks to data consistency. However, if an enterprise storage controller with a BBU cache is present, the following mount options will provide the best performance:

/dev/sda2              /datastore              ext4     noatime,data=writeback,barrier=0,nobh,errors=remount-ro
/dev/sdb2              /binlog                 ext4     noatime,data=writeback,barrier=0,nobh,errors=remount-ro

Note: The data=writeback option results in only metadata being journaled, not actual file data. This carries the risk of corrupting recently modified files in the event of a sudden power loss, a risk which is minimised by the presence of a BBU-enabled controller. nobh only works with the data=writeback option enabled.

ZFS

ZFS is a filesystem and volume manager combined into an enterprise storage solution with extended protection against data corruption. There are certainly cases where the rich feature set of ZFS makes it an essential option to consider, most notably when advanced volume management is a requirement. ZFS tuning for MySQL can be a complex topic and falls outside the scope of this blog; for further reference, there is a dedicated blog post on the subject by Yves Trudeau.

Disk Subsystem – I/O scheduler 

Most modern Linux distributions come with the noop or deadline I/O schedulers by default, both providing better performance than the cfq and anticipatory ones. However, it is always good practice to check the scheduler for each device, and if the value shown is different from noop or deadline, you can change the policy without rebooting the server:

# View the I/O scheduler setting. The value in square brackets shows the running scheduler
cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]
# Change the setting
echo noop | sudo tee /sys/block/sdb/queue/scheduler

To make the change persistent, you must modify the GRUB configuration file:

# Change the line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
# to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=noop"
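
After editing the file, the GRUB configuration has to be regenerated for the new kernel parameter to take effect on the next boot; the exact command depends on the distribution (shown here for the common Debian/Ubuntu and RHEL/CentOS layouts):

# Debian/Ubuntu
sudo update-grub
# RHEL/CentOS 7
sudo grub2-mkconfig -o /boot/grub2/grub.cfg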

AWS Note: There are cases where the I/O scheduler has a value of none, most notably in AWS VM instance types where EBS volumes are exposed as NVMe block devices. This is because the setting has no use in modern PCIe/NVMe devices: they have a very large internal queue and they bypass the I/O scheduler altogether. The setting in this case is none, and it is optimal for such disks.

Disk Subsystem – Volume optimization

Ideally different disk volumes should be used for the OS installation, binlog, data and the redo log, if this is possible. The separation of OS and data partitions, not just logically but physically, will improve database performance. The RAID level can also have an impact: RAID-5 should be avoided as the checksum needed to ensure integrity is costly. The best performance without making compromises to redundancy is achieved by the use of an advanced controller with a battery-backed cache unit and preferably RAID-10 volumes spanned across multiple disks.

AWS Note: For further information about EBS volumes and AWS storage optimisation, Amazon has documentation at the following links:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-optimized-instances.html

Database settings

System Architecture – NUMA settings

Non-uniform memory access (NUMA) is a memory design in which a processor in an SMP system can access its own local memory faster than non-local memory (memory local to another CPU). This may result in suboptimal database performance and potentially swapping. When the buffer pool memory allocation is larger than the amount of RAM available local to the node, and the default memory allocation policy is selected, swapping occurs. A NUMA-enabled server will report different distances between CPU nodes, while a uniform one will report a single distance:

# NUMA system
numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 65525 MB
node 0 free: 296 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 65536 MB
node 1 free: 9538 MB
node 2 cpus: 16 17 18 19 20 21 22 23
node 2 size: 65536 MB
node 2 free: 12701 MB
node 3 cpus: 24 25 26 27 28 29 30 31
node 3 size: 65535 MB
node 3 free: 7166 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10
# Uniformed system
numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 64509 MB
node 0 free: 4870 MB
node distances:
node   0
  0:  10

In the case of a NUMA system, where numactl shows different distances across nodes, the MySQL variable innodb_numa_interleave should be enabled to ensure memory interleaving. Percona Server provides improved NUMA support by introducing the flush_caches variable. When enabled, it will help with allocation fairness across nodes. To determine whether or not allocation is equal across nodes, you can examine numa_maps for the mysqld process with this script:

# The perl script numa_maps.pl will report memory allocation per CPU node:
# 3595 is the pid of the mysqld process
perl numa_maps.pl < /proc/3595/numa_maps
N0        :     16010293 ( 61.07 GB)
N1        :     10465257 ( 39.92 GB)
N2        :     13036896 ( 49.73 GB)
N3        :     14508505 ( 55.35 GB)
active    :          438 (  0.00 GB)
anon      :     54018275 (206.06 GB)
dirty     :     54018275 (206.06 GB)
kernelpagesize_kB:         4680 (  0.02 GB)
mapmax    :          787 (  0.00 GB)
mapped    :         2731 (  0.01 GB)
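
Enabling NUMA interleaving for the InnoDB buffer pool is then a one-line change in the MySQL configuration (a sketch; innodb_numa_interleave is available from MySQL 5.7 onwards and is not dynamic, so a server restart is required):

# /etc/my.cnf (or an included configuration file)
[mysqld]
innodb_numa_interleave = ON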

Conclusion

In this blog post we examined a few important OS related settings and explained how they can be tuned for better database performance.

While you are here …

You might also find value in this recorded webinar Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance

 


Jun
26
2018
--

Webinar 6/27: MySQL Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance

Please join Percona’s Principal Support Escalation Specialist Sveta Smirnova as she presents Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance on Wednesday, June 27th at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

 

During the MySQL Troubleshooting webinar series, I covered many monitoring and logging tools such as:

  • General, slow, audit, binary, error log files
  • Performance Schema
  • Information Schema
  • System variables
  • Linux utilities
  • InnoDB monitors
  • PMM

However, I did not spend much time on the impact these instruments have on overall MySQL performance. And they do have an impact.

And this is the conflict many people face. MySQL Server users try these monitoring instruments, see that they slow down their installations, and turn them off. This is unfortunate: if the instrument that can help you resolve a problem is off, you won’t have the information needed to understand when, how and why the issue occurred. In the best case, you’ll re-enable the instrumentation and wait for the next occurrence of the disaster. In the worst case, you’ll try various fix options without any real knowledge of whether they solve the problem.

This is why it is important to understand the impact monitoring tools have on your database, and therefore how to minimize it.

Understanding and controlling the impact of MySQL monitoring tools

In this webinar, I cover why certain monitoring tools affect performance, and how to minimize the impact without turning the instrument off. You will learn how to monitor safely and effectively.

Register Now

 

Sveta Smirnova

Principal Support Escalation Specialist

Sveta joined Percona in 2015. Her main professional interests are problem-solving, working with tricky issues and bugs, finding patterns that can quickly solve typical issues, and teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona, Sveta worked as a Support Engineer in the MySQL Bugs Analysis Support Group at MySQL AB, Sun and Oracle. She is the author of the book “MySQL Troubleshooting” and of the JSON UDF functions for MySQL.

