Nov
10
2020
--

With $29M in funding, Isovalent launches its cloud-native networking and security platform

Isovalent, a startup that aims to bring networking into the cloud-native era, today announced that it has raised a $29 million Series A round led by Andreessen Horowitz and Google. In addition, the company today officially launched its Cilium Enterprise platform (which was in stealth until now) to help enterprises connect, observe and secure their applications.

The open-source Cilium project is already seeing growing adoption, with Google choosing it for its new GKE dataplane, for example. Other users include Adobe, Capital One, Datadog and GitLab. Isovalent is following what is now the standard model for commercializing open-source projects by launching an enterprise version.

Image Credits: Cilium

The founding team of CEO Dan Wendlandt and CTO Thomas Graf has deep experience in working on the Linux kernel and building networking products. Graf spent 15 years working on the Linux kernel and created the Cilium open-source project, while Wendlandt worked on Open vSwitch at Nicira (and then VMware).

Image Credits: Isovalent

“We saw that first wave of network intelligence be moved into software, but I think we both shared the view that the first wave was about replicating the traditional network devices in software,” Wendlandt told me. “You had IPs, you still had ports, you created virtual routers, and this and that. We both had that shared vision that the next step was to go beyond what the hardware did in software — and now, in software, you can do so much more. Thomas, with his deep insight in the Linux kernel, really saw this eBPF technology as something that was just obviously going to be groundbreaking technology, in terms of where we could take Linux networking and security.”

As Graf told me, when Docker, Kubernetes and containers, in general, become popular, what he saw was that networking companies at first were simply trying to reapply what they had already done for virtualization. “Let’s just treat containers as many as miniature VMs. That was incredibly wrong,” he said. “So we looked around, and we saw eBPF and said: this is just out there and it is perfect, how can we shape it forward?”

And while Isovalent’s focus is on cloud-native networking, the added benefit of how it uses the eBPF Linux kernel technology is that it also gains deep insights into how data flows between services and hence allows it to add advanced security features as well.

As the team noted, though, users definitely don’t need to understand or program eBPF, which is essentially the next generation of Linux kernel modules, themselves.

Image Credits: Isovalent

“I have spent my entire career in this space, and the North Star has always been to go beyond IPs + ports and build networking visibility and security at a layer that is aligned with how developers, operations and security think about their applications and data,” said Martin Casado, partner at Andreesen Horowitz (and the founder of Nicira). “Until just recently, the technology did not exist. All of that changed with Kubernetes and eBPF.  Dan and Thomas have put together the best team in the industry and given the traction around Cilium, they are well on their way to upending the world of networking yet again.”

As more companies adopt Kubernetes, they are now reaching a stage where they have the basics down but are now facing the next set of problems that come with this transition. Those, almost by default, include figuring out how to isolate workloads and get visibility into their networks — all areas where Isovalent/Cilium can help.

The team tells me its focus, now that the product is out of stealth, is about building out its go-to-market efforts and, of course, continue to build out its platform.

Aug
29
2018
--

Tune Linux Kernel Parameters For PostgreSQL Optimization

Linux parameters for PostgreSQL performance tuning

For optimum performance, a PostgreSQL database depends on the operating system parameters being defined correctly. Poorly configured OS kernel parameters can cause degradation in database server performance. Therefore, it is imperative that these parameters are configured according to the database server and its workload. In this post, we will discuss some important kernel parameters that can affect database server performance and how these should be tuned.

SHMMAX / SHMALL

SHMMAX is a kernel parameter used to define the maximum size of a single shared memory segment a Linux process can allocate. Until version 9.2, PostgreSQL uses System V (SysV) that requires SHMMAX setting. After 9.2, PostgreSQL switched to POSIX shared memory. So now it requires fewer bytes of System V shared memory.

Prior to version 9.3 SHMMAX was the most important kernel parameter. The value of SHMMAX is in bytes.

Similarly SHMALL is another kernel parameter used to define system wide total amount of shared memory pages. To view the current values for SHMMAX, SHMALL or SHMMIN, use the ipcs command.

$ ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 1073741824
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

$ ipcs -M
IPC status from  as of Thu Aug 16 22:20:35 PKT 2018
shminfo:
	shmmax: 16777216	(max shared memory segment size)
	shmmin:       1	(min shared memory segment size)
	shmmni:      32	(max number of shared memory identifiers)
	shmseg:       8	(max shared memory segments per process)
	shmall:    1024	(max amount of shared memory in pages)

PostgreSQL uses System V IPC to allocate the shared memory. This parameter is one of the most important kernel parameters. Whenever you get following error messages, it means that you have an older version PostgreSQL and you have a very low SHMMAX value. Users are expected to adjust and increase the value according to the shared memory they are going to use.

Possible misconfiguration errors

If SHMMAX is misconfigured, you can get an error when trying to initialize a PostgreSQL cluster using the initdb command.

DETAIL: Failed system call was shmget(key=1, size=2072576, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. 
You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 2072576 bytes),
reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration. child process exited with exit code 1

Similarly, you can get an error when starting the PostgreSQL server using the pg_ctl command.

DETAIL: Failed system call was shmget(key=5432001, size=14385152, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.
You can either reduce the request size or reconfigure the kernel with larger SHMMAX.; To reduce the request size (currently 14385152 bytes),
reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.

Be aware of differing definitions

The definition of the SHMMAX/SHMALL parameters is slightly different between Linux and MacOS X. These are the definitions:

  • Linux: kernel.shmmax, kernel.shmall
  • MacOS X: kern.sysv.shmmax, kern.sysv.shmall

The sysctl command can be used to change the value temporarily. To permanently set the value, add an entry into /etc/sysctl.conf. The details are given below.

# Get the value of SHMMAX
sudo sysctl kern.sysv.shmmax
kern.sysv.shmmax: 4096
# Get the value of SHMALL
sudo sysctl kern.sysv.shmall
kern.sysv.shmall: 4096
# Set the value of SHMMAX
sudo sysctl -w kern.sysv.shmmax=16777216
kern.sysv.shmmax: 4096 -> 16777216<br>
# Set the value of SHMALL
sudo sysctl -w kern.sysv.shmall=16777216
kern.sysv.shmall: 4096 -> 16777216

# Get the value of SHMMAX
sudo sysctl kernel.shmmax
kernel.shmmax: 4096
# Get the value of SHMALL
sudo sysctl kernel.shmall
kernel.shmall: 4096
# Set the value of SHMMAX
sudo sysctl -w kernel.shmmax=16777216
kernel.shmmax: 4096 -> 16777216<br>
# Set the value of SHMALL
sudo sysctl -w kernel.shmall=16777216
kernel.shmall: 4096 -> 16777216

Remember: to make the change permanent add these values in

/etc/sysctl.conf

 

Huge Pages

Linux, by default uses 4K memory pages, BSD has Super Pages, whereas Windows has Large Pages. A page is a chunk of RAM that is allocated to a process. A process may own more than one page depending on its memory requirements. The more memory a process needs, the more pages that are allocated to it. The OS maintains a table of page allocation to processes. The smaller the page size, the bigger the table, the more time required to lookup a page in that page table. Therefore, huge pages make it possible to use large amount of memory with reduced overheads; fewer page look ups, fewer page faults, faster read/write operations through larger buffers. This results in improved performance.

PostgreSQL has support for bigger pages on Linux only. By default, Linux uses 4K of memory pages, so in cases where there are too many memory operations, there is a need to set bigger pages. Performance gains have been observed by using huge pages with sizes 2 MB and up to 1 GB. The size of Huge Page can be set boot time. You can easily check the huge page settings and utilization on your Linux box using cat /proc/meminfo | grep -i huge command.

Note: This is only for Linux, for other OS this operation is ignored$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

In this example, although huge page size is set at 2,048 (2 MB), the total number of huge pages has a value of 0. which signifies that huge pages are disabled.

Script to quantify Huge Pages

This is a simple script which returns the number of Huge Pages required. Execute the script on your Linux box while your PostgreSQL is running. Ensure that $PGDATA environment variable is set to PostgreSQL’s data directory.

#!/bin/bash
pid=`head -1 $PGDATA/postmaster.pid`
echo "Pid:            $pid"
peak=`grep ^VmPeak /proc/$pid/status | awk '{ print $2 }'`
echo "VmPeak:            $peak kB"
hps=`grep ^Hugepagesize /proc/meminfo | awk '{ print $2 }'`
echo "Hugepagesize:   $hps kB"
hp=$((peak/hps))
echo Set Huge Pages:     $hp

The output of the script looks like this:

Pid:            12737
VmPeak:         180932 kB
Hugepagesize:   2048 kB
Set Huge Pages: 88

The recommended huge pages are 88, therefore you should set the value to 88.

sysctl -w vm.nr_hugepages= 88

Check the huge pages now, you will see no huge page is in use (HugePages_Free = HugePages_Total).

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       88
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now set the parameter huge_pages “on” in $PGDATA/postgresql.conf and restart the server.

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       81
HugePages_Rsvd:       64
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now you can see that a very few of the huge pages are used. Let’s now try to add some data into the database.

postgres=# CREATE TABLE foo(a INTEGER);
CREATE TABLE
postgres=# INSERT INTO foo VALUES(generate_Series(1,10000000));
INSERT 0 10000000

Let’s see if we are now using more huge pages than before.

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       18
HugePages_Rsvd:        1
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now you can see that most of the huge pages are in use.

Note: The sample value for HugePages used here is very low, which is not a normal value for a big production machine. Please assess the required number of pages for your system and set those accordingly depending on your systems workload and resources.

vm.swappiness

vm.swappiness is another kernel parameter that can affect the performance of the database. This parameter is used to control the swappiness (swapping pages to and from swap memory into RAM) behaviour on a Linux system. The value ranges from 0 to 100. It controls how much memory will be swapped or paged out. Zero means disable swap and 100 means aggressive swapping.

You may get good performance by setting lower values.

Setting a value of 0 in newer kernels may cause the OOM Killer (out of memory killer process in Linux) to kill the process. Therefore, you can be on safe side and set the value to 1 if you want to minimize swapping. The default value on a Linux system is 60. A higher value causes the MMU (memory management unit) to utilize more swap space than RAM, whereas a lower value preserves more data/code in memory.

A smaller value is a good bet to improve performance in PostgreSQL.

vm.overcommit_memory / vm.overcommit_ratio

Applications acquire memory and free that memory when it is no longer needed. But in some cases an application acquires too much memory and does not release it.  This can invoke the OOM killer. Here are the possible values for vm.overcommit_memory parameter with a description for each:

    1. Heuristic overcommit, Do it intelligently (default); based kernel heuristics
    2. Allow overcommit anyway
    3. Don’t over commit beyond the overcommit ratio.

Reference: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

vm.overcommit_ratio is the percentage of RAM that is available for overcommitment. A value of 50% on a system with 2 GB of RAM may commit up to 3 GB of RAM.

A value of 2 for vm.overcommit_memory yields better performance for PostgreSQL. This value maximises RAM utilization by the server process without any significant risk of getting killed by the OOM killer process. An application will be able to overcommit, but only within the overcommit ratio, thus reducing the risk of having OOM killer kill the process. Hence a value to 2 gives better performance than the default 0 value. However, reliability can be improved by ensuring that memory beyond an allowable range is not overcommitted. It avoid the risk of the process being killed by OOM-killer.

On systems without swap, one may experience problem when vm.overcommit_memory is 2.

https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT

vm.dirty_background_ratio / vm.dirty_background_bytes

The vm.dirty_background_ratio is the percentage of memory filled with dirty pages that need to be flushed to disk. Flushing is done in the background. The value of this parameter ranges from 0 to 100; however a value lower than 5 may not be effective and some kernels do not internally support it. The default value is 10 on most Linux systems. You can gain performance for write intensive operations with a lower ratio, which means that Linux flushes dirty pages in the background.

You need to set a value of vm.dirty_background_bytes depending on your disk speed.

There are no “good” values for these two parameters since both depend on the hardware. However, setting vm.dirty_background_ratio to 5 and vm.dirty_background_bytes to 25% of your disk speed improves performance by up to ~25% in most cases.

vm.dirty_ratio / dirty_bytes

This is same as vm.dirty_background_ratio / dirty_background_bytes except that the flushing is done in the foreground, blocking the application. So vm.dirty_ratio should be higher than vm.dirty_background_ratio. This will ensure that background processes kick in before the foreground processes to avoid blocking the application, as much as possible. You can tune the difference between the two ratios depending on your disk IO load.

Summing up

You can tune other parameters for performance, but the improvement gains are likely to be minimal. We must keep in mind that not all parameters are relevant for all applications types. Some applications perform better by tuning some parameters and some applications don’t. You need to find a good balance between these parameter configurations for the expected application workload and type, and OS behaviour must also be kept in mind when making adjustments. Tuning kernel parameters is not as easy as tuning database parameters: it’s harder to be prescriptive.

In my next post, I’ll take a look at tuning PostgreSQL’s database parameters. You might also enjoy this post:

Tuning PostgreSQL for sysbench-tpcc

 

The post Tune Linux Kernel Parameters For PostgreSQL Optimization appeared first on Percona Database Performance Blog.

Apr
28
2014
--

OOM relation to vm.swappiness=0 in new kernel

I have recently been involved in diagnosing the reasons behind OOM invocation that would kill the MySQL server process. Of course these servers were primarily running MySQL. As such the MySQL server process was the one with the largest amount of memory allocated.

But the strange thing was that in all the cases, there was no swapping activity seen and there were enough pages in the page cache. Ironically all of these servers were CentOS 6.4 running kernel version 2.6.32-358. Another commonality was the fact that vm.swappiness was set to 0. This is a pretty much standard practice and one that is applied on nearly every server that runs MySQL.

Looking into this further I realized that there was a change introduced in kernel 3.5-rc1 that altered the swapping behavior when “vm.swappiness=0″.

Below is the description of the commit that changed “vm.swappiness=0″ behavior, together with the diff:

$ git show fe35004fbf9eaf67482b074a2e032abb9c89b1dd
commit fe35004fbf9eaf67482b074a2e032abb9c89b1dd
Author: Satoru Moriya <satoru.moriya@hds.com>
Date: Tue May 29 15:06:47 2012 -0700
mm: avoid swapping out with swappiness==0
Sometimes we'd like to avoid swapping out anonymous memory. In
particular, avoid swapping out pages of important process or process
groups while there is a reasonable amount of pagecache on RAM so that we
can satisfy our customers' requirements.
OTOH, we can control how aggressive the kernel will swap memory pages with
/proc/sys/vm/swappiness for global and
/sys/fs/cgroup/memory/memory.swappiness for each memcg.
But with current reclaim implementation, the kernel may swap out even if
we set swappiness=0 and there is pagecache in RAM.
This patch changes the behavior with swappiness==0. If we set
swappiness==0, the kernel does not swap out completely (for global reclaim
until the amount of free pages and filebacked pages in a zone has been
reduced to something very very small (nr_free + nr_filebacked < high
watermark)).
Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 67a4fd4..ee97530 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1761,10 +1761,10 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
* proportional to the fraction of recently scanned pages on
* each list that were recently referenced and in active use.
*/
- ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
+ ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
ap /= reclaim_stat->recent_rotated[0] + 1;
- fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
+ fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
fp /= reclaim_stat->recent_rotated[1] + 1;
spin_unlock_irq(&mz->zone->lru_lock);
@@ -1777,7 +1777,7 @@ out:
unsigned long scan;
 scan = zone_nr_lru_pages(mz, lru);
- if (priority || noswap) {
+ if (priority || noswap || !vmscan_swappiness(mz, sc)) {
scan >>= priority;
if (!scan && force_scan)
scan = SWAP_CLUSTER_MAX;

This change was merged into the RHEL kernel 2.6.32-303:

* Mon Aug 27 2012 Jarod Wilson <jarod@redhat.com> [2.6.32-303.el6]
...
- [mm] avoid swapping out with swappiness==0 (Satoru Moriya) [787885]

This obviously changed the way we think about “vm.swappiness=0″. Previously, setting this to 0 was thought to reduce the tendency to swap userland processes but not disable that completely. As such it was expected to see little swapping instead of OOM.

This applies to all RHEL/CentOS kernels > 2.6.32-303 and to other distributions that provide newer kernels such as Debian and Ubuntu. Or any other distribution where this change has been backported as in RHEL.

Let me share with you memory zones related statistics that were logged to the system log from one of the OOM event.

Mar 11 11:01:45 db01 kernel: Node 0 DMA: 4*4kB 2*8kB 2*16kB 0*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15680kB
Mar 11 11:01:45 db01 kernel: Node 0 DMA32: 6*4kB 22*8kB 444*16kB 374*32kB 129*64kB 26*128kB 15*256kB 17*512kB 2*1024kB 0*2048kB 0*4096kB = 45448kB
Mar 11 11:01:45 db01 kernel: Node 0 Normal: 825*4kB 1012*8kB 382*16kB 169*32kB 69*64kB 74*128kB 14*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 42436kB
Mar 11 11:01:45 db01 kernel: 452844 total pagecache pages
Mar 11 11:01:45 db01 kernel: 0 pages in swap cache
Mar 11 11:01:45 db01 kernel: Swap cache stats: add 0, delete 0, find 0/0
Mar 11 11:01:45 db01 kernel: Free swap  = 4128760kB
Mar 11 11:01:45 db01 kernel: Total swap = 4128760kB
...
Mar 11 11:01:45 db01 kernel: Node 0 DMA free:15680kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 11 11:01:45 db01 kernel: Node 0 DMA32 free:45448kB min:25140kB low:31424kB high:37708kB active_anon:1741812kB inactive_anon:520348kB active_file:4792kB inactive_file:462576kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:386328kB writeback:76268kB mapped:936kB shmem:0kB slab_reclaimable:20420kB slab_unreclaimable:6964kB kernel_stack:0kB pagetables:572kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:142592 all_unreclaimable? no
Mar 11 11:01:45 db01 kernel: Node 0 Normal free:42436kB min:42316kB low:52892kB high:63472kB active_anon:3041852kB inactive_anon:643624kB active_file:340156kB inactive_file:1003512kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:5171200kB mlocked:0kB dirty:979444kB writeback:22040kB mapped:15616kB shmem:180kB slab_reclaimable:41052kB slab_unreclaimable:35996kB kernel_stack:2720kB pagetables:19912kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:31552 all_unreclaimable? no

As can be seen the amount of free memory and the amount of memory in the page cache was greater than the high watermark, which prevented any swapping activity. Yet unnecessary memory pressure caused OOM to be invoked which killed the MySQL server process.

MySQL getting OOM’ed is bad for many reasons and can have an undesirable impact such as causing loss of uncommitted transactions or transactions not yet flushed to the log because of innodb_flush_log_at_trx_commit=0, or a much more heavy impact because of cold caches upon restart.

I prefer the old behavior of vm.swappiness and as such I now set it to a value of “1″. Setting vm.swappiness=0 would mean that you will now have to be much more accurate in how you configure the size of various global and session buffers.

The post OOM relation to vm.swappiness=0 in new kernel appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com