Jan
20
2021
--

Drain Kubernetes Nodes… Wisely

What is Node Draining?

Anyone who has ever worked with containers knows how ephemeral they are. In Kubernetes, not only can containers and pods be replaced, but the nodes as well. Nodes in Kubernetes are VMs, servers, and other entities with computational power where pods and containers run.

Node draining is the mechanism that allows users to gracefully move all the pods from one node to other nodes. There are multiple use cases:

  • Server maintenance
  • Autoscaling of the k8s cluster – nodes are added and removed dynamically
  • Preemptable or spot instances that can be terminated at any time

Why Drain?

Kubernetes can automatically detect node failure and reschedule the pods to other nodes. The only problem here is the time between the node going down and the pod being rescheduled. Here’s how it goes without draining:

  1. The node goes down – someone pressed the power button on the server.
  2. kube-controller-manager, the service which runs on the masters, cannot get the NodeStatus from the kubelet on that node. By default it tries to get the status every 5 seconds, which is controlled by the --node-monitor-period parameter of the controller.
  3. Another important parameter of kube-controller-manager is --node-monitor-grace-period, which defaults to 40s. It controls how fast the node will be marked as NotReady by the master.
  4. So after ~40 seconds kubectl get nodes shows one of the nodes as NotReady, but the pods are still there and shown as running. This leads us to --pod-eviction-timeout, which is 5 minutes by default (!). It means that after the node is marked as NotReady, Kubernetes waits another 5 minutes before it starts to evict the pods.
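
For reference, the three timeouts above are flags of kube-controller-manager. A rough sketch of the defaults described in this post (not a complete command line, the controller takes many more options):

# Defaults, as discussed above: poll NodeStatus every 5s, mark the node NotReady after 40s,
# and start evicting pods 5 minutes after the node becomes NotReady.
kube-controller-manager \
  --node-monitor-period=5s \
  --node-monitor-grace-period=40s \
  --pod-eviction-timeout=5m0s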


So if someone shuts down the server, only after almost six minutes (with default settings) does Kubernetes start to reschedule the pods to other nodes. This timing is also valid for managed k8s clusters, like GKE.

These defaults might seem too high, but they are chosen to prevent frequent pod flapping, which might impact your application and infrastructure in a far more negative way.

Okay, Draining How?

As mentioned before – draining is the graceful method to move the pods to another node. Let’s see how draining works and what pitfalls there are.

Basics

The kubectl drain {NODE_NAME} command most likely will not work as-is. There are at least two flags that need to be set explicitly:

  • --ignore-daemonsets – it is not possible to evict pods that run under a DaemonSet; this flag tells the command to ignore them.
  • --delete-emptydir-data – acknowledges that data kept in EmptyDir ephemeral storage will be gone once the pods are evicted.
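
Putting the flags together, a typical invocation (the node name is just a placeholder) looks roughly like this:

# cordon the node and evict everything except DaemonSet pods
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data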

Once the drain command is executed the following happens:

  1. The node is cordoned, which means that no new pods can be placed on it. In the Kubernetes world, this is a Taint node.kubernetes.io/unschedulable:NoSchedule placed on the node, which most pods do not tolerate.

  2. Pods, except the ones that belong to DaemonSets, are evicted and hopefully scheduled on another node.

Pods are evicted and now the server can be powered off. Wrong.

DaemonSets

If your application or service uses the DaemonSet primitive, its pods were not drained from the node. That means they can still perform their function and even receive traffic from the load balancer or the service.

The best way to ensure this does not happen is to delete the node from Kubernetes itself:

  1. Stop the kubelet on the node.
  2. Delete the node from the cluster with kubectl delete node {NODE_NAME}.

If the kubelet is not stopped, the node will register itself again after the deletion.
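
On a typical systemd-based node, the two steps above might look like the sketch below (the service name and node name are assumptions for illustration):

# 1. on the node itself: stop the kubelet so it cannot re-register the node
sudo systemctl stop kubelet

# 2. from a machine with cluster access: remove the node object
kubectl delete node node-1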

Pods are evicted, node is deleted, and now the server can be powered off. Wrong again.

Load Balancer

Here is quite a standard setup:


The external load balancer sends the traffic to all Kubernetes nodes. Kube-proxy and Container Network Interface internals are dealing with routing the traffic to the correct pod.

There are various ways to configure the load balancer, but as you can see, it might still be sending traffic to the node. Make sure that the node is removed from the load balancer before powering it off. For example, the AWS node termination handler does not remove the node from the load balancer, which causes a short packet loss in the event of node termination.

Conclusion

Microservices and Kubernetes shifted the paradigm of systems availability. SRE teams are focused on resilience more than on stability. Nodes, containers, and load balancers can fail, but the teams are ready for it. Kubernetes is an orchestration and automation tool that helps here a lot, but there are still pitfalls that must be taken care of to meet SLAs.

Jan
13
2021
--

Running Kubernetes on the Edge

What is Edge

Edge is a buzzword that, behind the curtain, means moving private or public clouds closer to the end devices. End devices, such as the Internet of Things (from a doorbell to a VoIP station), become more complex and require more computational power. There is constant growth in connected devices, and by the end of 2025 there will be 41.6 billion of them, generating 69.4 Zettabytes of data.

Latency, data-processing speed, or security concerns do not always allow computation to happen in the cloud. Businesses rely on edge computing or micro clouds, which can run closer to the end devices. All of this constitutes the Edge.

How Kubernetes Helps Here

Containers are portable and quickly becoming a de facto standard to ship software. Kubernetes is a container orchestrator with robust built-in scaling capabilities. This gives the perfect toolset for businesses to shape their Edge computing with ease and without changing existing processes.

The cloud-native landscape has various small Kubernetes distributions that were designed and built for the Edge: k3s, microk8s, minikube, k0s, and the newly released EKS Distro. They are lightweight, can be deployed with a few commands, and are fully conformant. Projects like KubeEdge bring even more simplicity and standardization into the Kubernetes ecosystem on the Edge.

Running Kubernetes on the Edge also poses the challenge of managing hundreds or thousands of clusters. Google Anthos, Azure Arc, and VMware Tanzu allow you to run your clusters anywhere and manage them through a single interface with ease.

Topologies

We are going to review various topologies that Kubernetes provides for the Edge to bring computation and software closer to the end devices.

The end device is a Kubernetes cluster


Some devices run complex software and require multiple components to operate – web servers, databases, built-in data-processing, etc. Using packages is an option, but compared to containers and automated orchestration, it is slow and sometimes turns the upgrade process into a nightmare. In such cases, it is possible to run a Kubernetes cluster on each end device and manage software and infrastructure components using well-known primitives.

The drawback of this solution is the overhead that comes from running etcd and the control plane (master) components on every device.

The end device is a node


In this case, you can manage each end device through a single Kubernetes control plane. Deploying software to support and run your phones, printers or any other devices can be done through standard Kubernetes primitives.

Micro-clouds


This topology is all about moving computational power closer to the end devices by creating micro-clouds on the Edge. A micro-cloud is formed by Kubernetes nodes running on a server farm on the customer’s premises. Running your AI/ML (like Kubeflow) or any other resource-heavy application in your own micro-cloud is done with Kubernetes and its primitives.

How Percona Addresses Edge Challenges

We at Percona continue to invest in the Kubernetes ecosystem and expand our partnership with the community. Our Kubernetes Operators for Percona XtraDB Cluster and MongoDB are open source and enable anyone to run production-ready MySQL and MongoDB databases on the Edge.

Check out how easy it is to deploy our operators on Minikube or EKS Distro (which is similar to microk8s). We are working on further simplifying Day 2 operations, and in future blog posts you will see how to deploy and manage databases on multiple Kubernetes clusters with KubeApps.

Jan
11
2021
--

Full Read Consistency Within Percona Kubernetes Operator for Percona XtraDB Cluster

The aim of Percona Kubernetes Operator for Percona XtraDB Cluster is to be a special type of controller introduced to simplify complex deployments. The Operator extends the Kubernetes API with custom resources. The Operator solution uses Percona XtraDB Cluster (PXC) under the hood to provide a highly available, resilient, and scalable MySQL service in the Kubernetes space.

This solution comes with all the advantages/disadvantages provided by Kubernetes, plus some advantages of its own like the capacity to scale reads on the nodes that are not Primary.

Of course, there are some limitations, like the way PXC handles DDLs, which may impact the service; but there is always a cost to pay to get something, and expecting to have it all for free is unreasonable.

In this context, we need to cover what full read consistency means in this solution and why it is important to understand the role it plays.

Stale Reads

When using Kubernetes we should talk about the service and not about the technology/product used to deliver such service. 

In our case, the Percona Operator is there to deliver a MySQL service. We should then see that as a whole, as a single object. To be clearer, what we must consider is NOT the fact that we have a cluster behind the service, but that we have a service that, in order to be resilient and highly available, uses a cluster.

We should not care if a node/pod goes down unless the service is discontinued.

What we have as a plus in the Percona Operator solution is a certain level of READ scalability. This is achieved by optimizing the use of the non-PRIMARY nodes: instead of having them sit there applying only replicated data, the Percona Operator provides access to them to scale the reads.

But… there is always a BUT.

Let us start with an image:

 

By design, apply and commit finalization in Galera (PXC) may have (and does have) a delay between nodes. This means that, if using the defaults, applications may get inconsistent reads when accessing the data from nodes other than the Primary.

The Operator provides access using two different solutions:

  • Using HAProxy (default)
  • Using ProxySQL

 

 

When using HAProxy you will have 2 entry points:

  • cluster1-haproxy, which will point to the Primary ONLY, for reads and writes. This is the default entry point for the applications to the MySQL database.
  • cluster1-haproxy-replicas, which will point to all three nodes and is supposed to be used for READS only. This is the PLUS you can use if your application has READ/WRITE separation.
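
For illustration, an application (or a quick manual check) would reach the two services like this; authentication options are omitted for brevity:

# writes and consistent reads go through the Primary-only entry point
mysql -h cluster1-haproxy -e "SELECT @@hostname"

# read-only traffic can be pointed at the replicas entry point
mysql -h cluster1-haproxy-replicas -e "SELECT @@hostname"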

Please note that at the moment there is nothing preventing an application from using the cluster1-haproxy-replicas entry point for writes as well, but that is dangerous and wrong: it will generate a lot of certification conflicts and BF aborts, because it distributes writes all over the cluster, impacting performance (and not giving you any write scaling). The quick test below shows the inserts landing on all three nodes:

[marcotusa@instance-1 ~]$ for i in `seq 1 100`; do mysql -h cluster1-haproxy-replicas -e "insert into test.iamwritingto values(null,@@hostname)";done
+----------------+-------------+
| host           | count(host) |
+----------------+-------------+
| cluster1-pxc-1 |          34 |
| cluster1-pxc-2 |          33 |
| cluster1-pxc-0 |          33 |
+----------------+-------------+
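
The per-host counts shown above come from aggregating the rows that the loop inserted; a query along these lines (a sketch, not necessarily the exact one used) produces that output:

# count how many of the inserted rows ended up on each PXC node
mysql -h cluster1-haproxy-replicas -e "SELECT host, COUNT(host) FROM test.iamwritingto GROUP BY host"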

When using ProxySQL the entry point is a single one, but you may define query rules to automatically split the R/W requests coming from the application. This is the preferred method when an application has no way to separate the READS from the writes.

Here I have done a comparison of the two methods, HAProxy and ProxySQL.

Now, as mentioned above, by default PXC (like any Galera-based solution) comes with some relaxed consistency settings for performance purposes. This is normally fine in many standard cases, but if you use the Percona Operator and rely on the PLUS of scaling reads through the second access point with HAProxy, or through query rules with ProxySQL, you should NOT have stale reads, given that the service must provide consistent data, as if you were acting on a single node.

To achieve that, you can change the default value of the PXC parameter wsrep_sync_wait.

When changing the parameter wsrep_sync_wait as explained in the documentation, the node initiates a causality check, blocking incoming queries while it catches up with the cluster. 

Once all data on the node receiving the READ request is commit_finalized, the node performs the read.
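
As a minimal sketch, the parameter can be changed at runtime as shown below; to be effective it has to be applied on every node that serves reads (or set through the PXC configuration managed by the Operator), and authentication options are omitted for brevity:

# enable causality checks for reads, updates, and deletes (the value used in the tests below)
mysql -h cluster1-haproxy-replicas -e "SET GLOBAL wsrep_sync_wait=3"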

But this has a performance impact, as said before.

What Is The Impact?

To test the performance impact I had used a cluster deployed in GKE, with these characteristics:

  • 3 Main nodes n2-standard-8 (8 vCPUs, 32 GB memory)
  • 1 App node n2-standard-8 (8 vCPUs, 32 GB memory)
  • PXC pods using:
    •  25GB of the 32 available 
    • 6 CPU of the 8 available
  • HAProxy:
    • 600m CPU
    • 1GB RAM
  • PMM agent
    • 500m CPU
    • 500 MB Ram

On the application node, I ran two sysbench instances, one in r/w mode and the other doing reads only. Finally, to test stale reads, I used the stale read test from my test suite.

Given I was looking for results with a moderate load, I just used 68/96/128 threads per sysbench instance. 

Results

Marco, did we have or not have stale reads? Yes, we did:

I had from 0 (with a very light load) up to 37% stale reads with a MODERATE load, where moderate means sysbench running with 128 threads.

Setting wsrep_sync_wait=3, I of course had full consistency. But I also had a performance loss:

As you can see, I had an average loss of 11% in case of READS:

While for writes the average loss was 16%. 

Conclusions

At this point, we need to stop and think about what is worth doing. If my application is READ heavy and needs READ scaling, it is probably worth enabling full synchronicity, given that scaling on the additional nodes allows me to have 2x or more READs.

If instead my application is write critical, losing ~16% performance on top is probably not acceptable.

Finally, if my application can tolerate stale reads, I will just go with the defaults and get all the benefits without the penalties.

Also keep in mind that Percona Kubernetes Operator for Percona XtraDB Cluster is designed to offer a MySQL service, so the state of a single node is not as critical as in a default PXC installation: pods are by nature ephemeral objects, while the service is resilient.

References

Percona Kubernetes Operator for Percona XtraDB Cluster

https://github.com/Tusamarco/testsuite

https://en.wikipedia.org/wiki/Isolation_(database_systems)#Dirty_reads

https://galeracluster.com/library/documentation/mysql-wsrep-options.html#wsrep-sync-wait

https://www.slideshare.net/lefred.descamps/galera-replication-demystified-how-does-it-work

Jan
11
2021
--

Percona Kubernetes Operator for Percona XtraDB Cluster: HAProxy or ProxySQL?

Percona Kubernetes Operator for Percona XtraDB Cluster comes with two different proxies, HAProxy and ProxySQL. While the initial version was based on ProxySQL, in time Percona opted to set HAProxy as the default proxy for the Operator, without removing ProxySQL.

While one of the main points was to guarantee users 1:1 compatibility with vanilla MySQL in the way the Operator handles connections, other factors were also involved in the decision to offer two proxies. In this article, I will scratch the surface of the why.

Operator Assumptions

When working with the Percona Operator, there are few things to keep in mind:

  • Each deployment has to be seen as a single MySQL service, as if it were a single MySQL instance
  • The technology used to provide the service may change in time
  • Pod resiliency is not guaranteed, service resiliency is
  • Resources to be allocated are not automatically calculated and must be identified at the moment of the deployment
  • In production, you cannot set more than 5 or fewer than 3 nodes when using PXC

There are two very important points in the list above.

The first one is that what you get IS NOT a Percona XtraDB Cluster (PXC), but a MySQL service. The fact that Percona at the moment uses PXC to cover the service is purely accidental and we may decide to change it anytime.

The other point is that the service is resilient while the pod is not. In short, you should expect to see pods stop working and get re-created. What should NOT happen is the service going down. Trying to debug each minor issue per node/pod is not what is expected when you use Kubernetes.

Given the above, review your expectations… and let us go ahead. 

The Plus in the Game (Read Scaling)

As said, what is offered with the Percona Operator is a MySQL service. Percona has added a proxy on top of the nodes/pods that helps the service meet the resiliency expectations. There are two possible deployments:

  • HAProxy
  • ProxySQL

Both allow optimizing one aspect of the Operator, which is read scaling. In fact, what we were thinking was: given we must use a (virtually synchronous) cluster, why not take advantage of that and allow reads to scale on the other nodes when available?

This approach helps everyone using the Operator to have the standard MySQL service, but with a plus.

But with it also come some possible issues, like READ/WRITE splitting and stale reads. See the stale reads article above on how to deal with the latter.

For R/W splitting, the approach is instead completely different depending on which kind of proxy we implement.

If using HAProxy, we offer a second entry point that can be used for READ operation. That entrypoint will balance the load on all the nodes available. 

Please note that at the moment there is nothing preventing an application from using the cluster1-haproxy-replicas entry point for writes as well, but that is dangerous and wrong: it will generate a lot of certification conflicts and BF aborts, because it distributes writes all over the cluster, impacting performance (and not giving you any write scaling). It is your responsibility to guarantee that only READs go through that entry point.

If instead ProxySQL is in use, it is possible to implement automatic R/W splitting. 
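
To give an idea of what that looks like, here is a minimal sketch of ProxySQL query rules for R/W splitting. The admin address, credentials, and hostgroup IDs are illustrative assumptions, not necessarily what the Operator configures:

# send SELECT ... FOR UPDATE to the writer hostgroup (10) and plain SELECTs to the readers (20)
mysql -h proxysql-admin -P6032 -uadmin -padmin -e "
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1),
       (2, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;"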

Global Difference and Comparison

At this point, it is useful to have a better understanding of the functional difference between the two proxies and what is the performance difference if any. 

As we know, HAProxy acts as a Layer 4 proxy when operating in TCP mode. It is also a forward proxy, which means each TCP connection is established between the client and the final target, and there is no interpretation of the data flow.

ProxySQL, on the other hand, is a Layer 7 proxy and a reverse proxy: the client establishes a connection to the proxy, which presents itself as the final backend. Data can be altered on the fly while in transit.

To be honest, it is more complicated than that, but allow me the simplification.

On top of that, there are additional functionalities that are present in one (ProxySQL) and not in the other. The point is whether they are relevant for use in this context or not. For a short list see below (the source is the ProxySQL blog, but the data was removed):

As you may have noticed, HAProxy lacks some of those functionalities, like R/W split, firewalling, and caching, which are proper to the Layer 7 implementation in ProxySQL.

The Test Environment

To test the performance impact I had used a cluster deployed in GKE, with these characteristics:

  • 3 Main nodes n2-standard-8 (8 vCPUs, 32 GB memory)
  • 1 App node n2-standard-8 (8 vCPUs, 32 GB memory)
  • PXC pods using:
    •  25GB of the 32 available 
    • 6 CPU of the 8 available
  • HAProxy:
    • 600m CPU
    • 1GB RAM
  • PMM agent
    • 500m CPU
    • 500 MB Ram
  • Tests were run using sysbench (https://github.com/Tusamarco/sysbench); see GitHub for the command details.

What I did was run several tests with two sysbench instances, one only executing reads while the other executed reads and writes.

In the case of ProxySQL, I had R/W splitting thanks to the Query rules, so both sysbench instances were pointing to the same address. While testing HAProxy I was using two entry points:

  • Cluster1-haproxy – for read and write
  • Cluster1-haproxy-replicas – for read only

Then I also compared what happens if all requests hit one node only. For that, I executed one sysbench instance in R/W mode against one entry point, with NO R/W split for ProxySQL.

Finally, the sysbench tests were executed with the --reconnect option to force the tests to establish new connections.

As usual, tests were executed multiple times, on different days of the week and moments of the day. Data reported is a consolidation of that, and images from Percona Monitoring and Management (PMM) are samples coming from the execution that was closest to the average values. 
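
For reference, the two sysbench instances were along these lines. The table count, table size, duration, and credentials are assumptions for illustration, while the entry points and the --reconnect option come from the description above:

# read/write instance against the primary entry point
sysbench oltp_read_write --mysql-host=cluster1-haproxy --mysql-user=sbtest --mysql-password="${SBTEST_PASSWORD}" \
  --tables=10 --table-size=1000000 --threads=128 --time=600 --reconnect=50 run

# read-only instance against the replicas entry point (or the same address when ProxySQL does the splitting)
sysbench oltp_read_only --mysql-host=cluster1-haproxy-replicas --mysql-user=sbtest --mysql-password="${SBTEST_PASSWORD}" \
  --tables=10 --table-size=1000000 --threads=128 --time=600 --reconnect=50 run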

Comparing Performance When Scaling Reads

These tests imply that one node is mainly serving writes while the others are serving reads. To not affect performance, and given I was not interested in maintaining full read consistency, the parameter wsrep_sync_wait was kept as default (0). 

(charts: requests served under increasing load, HAProxy vs. ProxySQL)

A first observation shows how ProxySQL seems to keep a more stable level of requests served. The increasing load penalizes HAProxy, reducing the number of operations served at 1024 threads.

(charts: write and read operations compared, HAProxy vs. ProxySQL)

Digging a bit more, we can see that HAProxy performs much better than ProxySQL for WRITE operations. The number of writes remains almost steady, with minimal fluctuations. ProxySQL, on the other hand, performs great when the write load is low, but then performance drops by 50%.

For reads, we have the opposite. ProxySQL is able to scale in a very efficient way, distributing the load across the nodes and able to maintain the level of service despite the load increase. 

If we start to take a look at the latency distribution statistics (sysbench histogram information), we can see that:

(latency histograms: HAProxy vs. ProxySQL)

In the case of low load and writes, both proxies stay on the left side of the graph with low values in ms. HAProxy is a bit more consistent, grouped around the 55ms value, while ProxySQL is a bit more sparse and spans between 190-293ms.

For reads we have similar behavior, with the large majority for both between 28-70ms. We have a different picture when the load increases:

ProxySQL has some occurrences where it performs better, but it spans a very large range, from ~2k ms to ~29k ms, while HAProxy is substantially grouped around 10-11k ms. As a result, in this context, HAProxy is able to serve writes under heavy load better than ProxySQL.

Again, a different picture in case of reads.

Here ProxySQL still spans a wide range, ~76ms – 1500ms, while HAProxy is more consistent but less efficient, grouping the majority of the service around 1200ms. This is consistent with the performance loss we saw for READs under high load with HAProxy.

Comparing When Using Only One Node

But let us now discover what happens when using only one node, i.e. using the service as it should be used, without the possible plus of read scaling.


The first thing I want to mention is a strange behavior that was consistently happening (no matter which proxy was used) at 128 threads. I am investigating it, but I do not yet have a good answer as to why the Operator had that significant drop in performance ONLY with 128 threads.

Aside from that, the results consistently showed HAProxy performing better in serving reads/writes. Keep in mind that HAProxy just establishes the connection point-to-point and does not do anything else, while ProxySQL is designed to eventually act on the incoming stream of data.

This becomes even more evident when reviewing the latency distribution. In this case, no matter what load we have, HAProxy performs better:

As you can notice, HAProxy is less grouped than when we have two entry points, but it is still able to serve more efficiently than ProxySQL.

Conclusions

As usual, my advice is to use the right tool for the job, and do not force yourself into something stupid. And as clearly stated at the beginning, Percona Kubernetes Operator for Percona XtraDB Cluster is designed to provide a MySQL SERVICE, not a PXC cluster, and all the configuration and utilization should converge on that.

ProxySQL can help you IF you want to scale a bit more on READs using the possible plus, but this is not guaranteed to work the way it does with standard PXC, and you need a very good understanding of Kubernetes and ProxySQL to avoid issues. With HAProxy you can scale reads as well, but you need to be sure you have R/W separation at the application level.

In any case, utilizing HAProxy for the service is the easier way to go. This is one of the reasons why Percona decided to shift to HAProxy. It is the solution that offers the proxy service more in line with the aim of the Kubernetes service concept. It is also the solution that remains closer to how a simple MySQL service should behave.

You need to set your expectations correctly to avoid being in trouble later.

References

Percona Kubernetes Operator for Percona XtraDB Cluster

 

Wondering How to Run Percona XtraDB Cluster on Kubernetes? Try Our Operator!

The Criticality of a Kubernetes Operator for Databases

 

Jan
07
2021
--

Red Hat is acquiring container security company StackRox

Red Hat today announced that it’s acquiring container security startup StackRox. The companies did not share the purchase price.

Red Hat, which is perhaps best known for its enterprise Linux products, has been making the shift to the cloud in recent years. IBM purchased the company in 2018 for a hefty $34 billion and has been leveraging that acquisition as part of a shift to a hybrid cloud strategy under CEO Arvind Krishna.

The acquisition fits nicely with Red Hat OpenShift, its container platform, but the company says it will continue to support StackRox usage on other platforms, including AWS, Azure and Google Cloud Platform. This approach is consistent with IBM’s strategy of supporting multicloud, hybrid environments.

In fact, Red Hat president and CEO Paul Cormier sees the two companies working together well. “Red Hat adds StackRox’s Kubernetes-native capabilities to OpenShift’s layered security approach, furthering our mission to bring product-ready open innovation to every organization across the open hybrid cloud across IT footprints,” he said in a statement.

CEO Kamal Shah, writing in a company blog post announcing the acquisition, explained that the company made a bet a couple of years ago on Kubernetes and it has paid off. “Over two and half years ago, we made a strategic decision to focus exclusively on Kubernetes and pivoted our entire product to be Kubernetes-native. While this seems obvious today; it wasn’t so then. Fast forward to 2020 and Kubernetes has emerged as the de facto operating system for cloud-native applications and hybrid cloud environments,” Shah wrote.

Shah sees the purchase as a way to expand the company and the road map more quickly using the resources of Red Hat (and IBM), a typical argument from CEOs of smaller acquired companies. But the trick is always finding a way to stay relevant inside such a large organization.

StackRox’s acquisition is part of some consolidation we have been seeing in the Kubernetes space in general and the security space more specifically. That includes Palo Alto Networks acquiring competitor TwistLock for $410 million in 2019. Another competitor, Aqua Security, which has raised $130 million, remains independent.

StackRox was founded in 2014 and raised over $65 million, according to Crunchbase data. Investors included Menlo Ventures, Redpoint and Sequoia Capital. The deal is expected to close this quarter subject to normal regulatory scrutiny.

Jan
06
2021
--

One Shard Support in Kubernetes Operator for Percona Server for MongoDB


So far, Percona Kubernetes Operator for Percona Server for MongoDB (PSMDB) has supported only managing replica sets, but from version 1.6.0 it is possible to start a sharding cluster, although at the moment only with one shard. This is a step toward supporting full sharding, with multiple shards being added in a future release.

The components that were added to make this work are config server replica set and mongos support, with everything that goes around that, like services, probes, statuses, etc. As well as starting a sharded cluster from scratch, it is also possible to migrate from a single replica set to a sharded setup – and back.

Configuration Options for Sharding

A new section called “sharding” was added to the cr.yaml configuration, where you can enable/disable sharding altogether. You can also change the number of running pods for the config server replica set and mongos, set antiAffinityTopologyKey, podDisruptionBudget, and resources, and define how the mongos service will be exposed.

Here’s what a simple config might look like:

sharding:
  enabled: true
  configsvrReplSet:
    size: 3
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 3Gi
  mongos:
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    expose:
      enabled: true
      exposeType: LoadBalancer

The default number of pods for the config server replica set and mongos is three, but you can use fewer if you enable the “allowUnsafeConfigurations” option.
There are more configuration options inside the cr.yaml, but some of them are commented out since they are a bit more specific to particular use cases or environments.

This is how the pod and service setup might look when you start the sharded cluster:

NAME                                               READY   STATUS    RESTARTS   AGE
my-cluster-name-cfg-0                              2/2     Running   0          2m38s
my-cluster-name-cfg-1                              2/2     Running   1          2m10s
my-cluster-name-cfg-2                              2/2     Running   1          103s
my-cluster-name-mongos-556bdd5b79-bkgd2            1/1     Running   0          2m36s
my-cluster-name-mongos-556bdd5b79-klkh6            1/1     Running   0          2m36s
my-cluster-name-mongos-556bdd5b79-nbgd9            1/1     Running   0          2m36s
my-cluster-name-rs0-0                              2/2     Running   0          2m40s
my-cluster-name-rs0-1                              2/2     Running   1          2m11s
my-cluster-name-rs0-2                              2/2     Running   1          104s
percona-server-mongodb-operator-587658ccc8-k6zpt   1/1     Running   0          3m14s

NAME                     TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)           AGE
my-cluster-name-cfg      ClusterIP      None          <none>        27017/TCP         2m58s
my-cluster-name-mongos   LoadBalancer   10.51.244.4   34.78.50.13   27017:31685/TCP   2m56s
my-cluster-name-rs0      ClusterIP      None          <none>        27017/TCP         3m

Here you can see that in this example we have the mongos service configured to be exposed with a LoadBalancer and available through an external IP. At the current moment, clients connect to the mongos instances through the load balancer service in a round-robin fashion, but in the future it is planned to support session affinity (the sticky method) so that the same client connects to the same mongos instance most of the time.
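
As a usage sketch, a client would reach the cluster through that external IP. The user, password, and TLS settings below are placeholders and depend on how the cluster was deployed:

# connect to the sharded cluster through the exposed mongos service
mongo "mongodb://clusterAdmin:clusterAdminPassword@34.78.50.13:27017/admin?ssl=false"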

Migrating From Replica Set to One Shard Setup (and Back)

MongoDB (in general) supports migrating from a replica set to a sharded setup and also from sharding back to a replica set, but it requires more or less manual steps depending on the complexity of the existing architecture. Our Kubernetes Operator, at the current moment, supports automatic migration from a replica set to one shard and back from one shard to a replica set.

These are the steps that PSMDB Kubernetes Operator does when we enable sharding but have an existing replica set:

  • restart existing replica set members with the “--shardsvr” option included
  • deploy config server replica set and mongos as they are defined in cr.yaml (default is three pods for each)
    • create stateful set for config replica set
    • setup Kubernetes service for mongos and config replica set
  • add existing replica set as a shard in sharding cluster

In this process, data is preserved, but there might be additional steps needed for application users, since they will become shard-local users and will not be available through mongos (so they need to be created from mongos).

When we migrate from the one shard setup back to a replica set, data is also preserved and the steps mentioned above are reverted, but in this case application users are lost, since they were stored in the config replica set which doesn’t exist anymore – so they will need to be recreated.

SmartUpdate Strategy for Sharding Cluster

As you may know, both Percona Kubernetes Operators (for Percona XtraDB Cluster and PSMDB) have a SmartUpdate strategy which tries to upgrade the clusters automatically and with as little interruption to the application as possible.

When we are talking about sharding, this is what the steps look like:

  • disable the balancer
  • upgrade config replica set (secondaries first, step down primary, upgrade primary)
  • upgrade data replica set (secondaries first, step down primary, upgrade primary)
  • upgrade mongos pods
  • enable balancer

This is how this process might look in the Operator logs when we upgrade the cluster from one PSMDB version to another (some parts stripped for brevity):

{"level":"info","msg":"update Mongo version to 4.2.7-7 (fetched from db)"}
{"level":"info","msg":"waiting for config RS update"}
{"level":"info","msg":"statefullSet was changed, start smart update","name":"my-cluster-name-cfg"}
{"level":"info","msg":"balancer disabled"}
{"level":"info","msg":"primary pod is my-cluster-name-cfg-0.my-cluster-name-cfg.psmdb-test.svc.cluster.local:27017"}
{"level":"info","msg":"apply changes to secondary pod my-cluster-name-cfg-2"}
{"level":"info","msg":"pod my-cluster-name-cfg-2 started"}
{"level":"info","msg":"apply changes to secondary pod my-cluster-name-cfg-1"}
{"level":"info","msg":"pod my-cluster-name-cfg-1 started"}
{"level":"info","msg":"doing step down..."}
{"level":"info","msg":"apply changes to primary pod my-cluster-name-cfg-0"}
{"level":"info","msg":"pod my-cluster-name-cfg-0 started"}
{"level":"info","msg":"smart update finished for statefulset","statefulset":"my-cluster-name-cfg"}
{"level":"info","msg":"statefullSet was changed, start smart update","name":"my-cluster-name-rs0"}
{"level":"info","msg":"primary pod is my-cluster-name-rs0-0.my-cluster-name-rs0.psmdb-test.svc.cluster.local:27017"}
{"level":"info","msg":"apply changes to secondary pod my-cluster-name-rs0-2"}
{"level":"info","msg":"pod my-cluster-name-rs0-2 started"}
{"level":"info","msg":"apply changes to secondary pod my-cluster-name-rs0-1"}
{"level":"info","msg":"pod my-cluster-name-rs0-1 started"}
{"level":"info","msg":"doing step down..."}
{"level":"info","msg":"apply changes to primary pod my-cluster-name-rs0-0"}
{"level":"info","msg":"pod my-cluster-name-rs0-0 started"}
{"level":"info","msg":"smart update finished for statefulset","statefulset":"my-cluster-name-rs0"}
{"level":"info","msg":"update Mongo version to 4.2.8-8 (fetched from db)"}
{"level":"info","msg":"waiting for mongos update"}
{"level":"info","msg":"balancer enabled"}

Conclusion

Although adding support for a one-shard cluster doesn’t sound too important, since it doesn’t allow sharding data across multiple shards, it is a big milestone and lays the foundation for what is needed to fully support sharding in the future. Apart from that, it might allow you to expose your data to applications in different ways through the mongos instances, so if you are interested please check the documentation and release notes for more details.

Dec
10
2020
--

New Relic acquires Kubernetes observability platform Pixie Labs

Two months ago, Kubernetes observability platform Pixie Labs launched into general availability and announced a $9.15 million Series A funding round led by Benchmark, with participation from GV. Today, the company is announcing its acquisition by New Relic, the publicly traded monitoring and observability platform.

The Pixie Labs brand and product will remain in place and allow New Relic to extend its platform to the edge. From the outset, the Pixie Labs team designed the service to focus on providing observability for cloud-native workloads running on Kubernetes clusters. And while most similar tools focus on operators and IT teams, Pixie set out to build a tool that developers would want to use. Using eBPF, a relatively new way to extend the Linux kernel, the Pixie platform can collect data right at the source and without the need for an agent.

At the core of the Pixie developer experience are what the company calls “Pixie scripts.” These allow developers to write their debugging workflows, though the company also provides its own set of these and anybody in the community can contribute and share them as well. The idea here is to capture a lot of the informal knowledge around how to best debug a given service.

“We’re super excited to bring these companies together because we share a mission to make observability ubiquitous through simplicity,” Bill Staples, New Relic’s chief product officer, told me. “[…] According to IDC, there are 28 million developers in the world. And yet only a fraction of them really practice observability today. We believe it should be easier for every developer to take a data-driven approach to building software and Kubernetes is really the heart of where developers are going to build software.”

It’s worth noting that New Relic already had a solution for monitoring Kubernetes clusters. Pixie, however, will allow it to go significantly deeper into this space. “Pixie goes much, much further in terms of offering on-the-edge, live debugging use cases, the ability to run those Pixie scripts. So it’s an extension on top of the cloud-based monitoring solution we offer today,” Staples said.

The plan is to build integrations into New Relic into Pixie’s platform and to integrate Pixie use cases with New Relic One as well.

Currently, about 300 teams use the Pixie platform. These range from small startups to large enterprises and, as Staples and Pixie co-founder Zain Asgar noted, there was already a substantial overlap between the two customer bases.

As for why he decided to sell, Asgar — a former Google engineer working on Google AI and adjunct professor at Stanford — told me that it was all about accelerating Pixie’s vision.

“We started Pixie to create this magical developer experience that really allows us to redefine how application developers monitor, secure and manage their applications,” Asgar said. “One of the cool things is when we actually met the team at New Relic and we got together with Bill and [New Relic founder and CEO] Lew [Cirne], we realized that there was almost a complete alignment around this vision […], and by joining forces with New Relic, we can actually accelerate this entire process.”

New Relic has recently done a lot of work on open-sourcing various parts of its platform, including its agents, data exporters and some of its tooling. Pixie, too, will now open-source its core tools. Open-sourcing the service was always on the company’s road map, but the acquisition now allows it to push this timeline forward.

“We’ll be taking Pixie and making it available to the community through open source, as well as continuing to build out the commercial enterprise-grade offering for it that extends the New Relic One platform,” Staples explained. Asgar added that it’ll take the company a little while to release the code, though.

“The same fundamental quality that got us so excited about Lew as an EIR in 2007, got us excited about Zain and Ishan in 2017 — absolutely brilliant engineers, who know how to build products developers love,” Benchmark Ventures General Partner Eric Vishria told me. “New Relic has always captured developer delight. For all its power, Kubernetes completely upends the monitoring paradigm we’ve lived with for decades. Pixie brings the same easy to use, quick time to value, no-nonsense approach to the Kubernetes world as New Relic brought to APM. It is a match made in heaven.”

Dec
07
2020
--

Running MongoDB on Amazon EKS Distro

Last year AWS was about to ban the “multi-cloud” term in co-branding guides for Partners, removed the ban after criticism from the community and partners, and now embraces a multi-cloud strategy.

One of the products that AWS announced during its last re:Invent was Amazon EKS Distro — a Kubernetes distribution based on and used by Amazon Elastic Kubernetes Service. It is interesting because it is the first step toward a new service — EKS Anywhere — which enables AWS customers to run EKS anywhere, even on bare metal or any other cloud, and later to seamlessly migrate from on-prem EKS directly to AWS.

In this blog post, we will show how easy it is to spin up Amazon EKS Distro (EKS-D) and set up MongoDB with Percona Kubernetes Operator for Percona Server for MongoDB.

Let the Show Begin

Give Me the Cluster

I just spun up a brand new Ubuntu 20.10 virtual machine. You can spin it up anywhere; I myself use Multipass — it gives you a command-line interface to launch Linux machines locally in seconds.

Installing EKS-D on Ubuntu is one command “effort”:

$ sudo snap install eks --classic --edge
Run configure hook of "eks" snap if present                                                                                                                                                                                                                                                     
eks (1.18/edge) v1.18.9 from Canonical✓ installed

EKS on Ubuntu gives the same look and feel as microk8s — it has its own command line (eks) and allows you to add/remove nodes easily if needed. Read more here.

Check if EKS is up and running:

# eks status
eks is running
high-availability: no
  datastore master nodes: 127.0.0.1:19001
  datastore standby nodes: none

eks kubectl gives you direct access to the regular Kubernetes API. Hint: you can get the configuration from eks and put it into the .kube folder to control EKS with kubectl (you may need to install it). I’m lazy and will continue using eks kubectl.

# mkdir ~/.kube/ ; eks config > ~/.kube/config
# eks kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-node-rrsbd                          1/1     Running   1          15m
calico-kube-controllers-555fc8cc5c-2ll8f   1/1     Running   0          15m
coredns-6788f546c9-x8q7l                   1/1     Running   0          15m
metrics-server-768748c8f4-qpxnp            1/1     Running   0          15m
hostpath-provisioner-66667bf7f-pfg8s       1/1     Running   0          15m

hostpath-provisioner is running, which means the host path based storage class needed for the database is already there.

Give Me the Database

As promised we will use Percona Kubernetes Operator for Percona Server for MongoDB to spin up the database. And it is the same process described in our minikube installation guide (as long as you run 1 node only).

Get the code from GitHub:

# git clone -b v1.5.0 https://github.com/percona/percona-server-mongodb-operator
# cd percona-server-mongodb-operator

Deploy the operator:

# eks kubectl apply -f deploy/bundle.yaml
customresourcedefinition.apiextensions.k8s.io/perconaservermongodbs.psmdb.percona.com created
customresourcedefinition.apiextensions.k8s.io/perconaservermongodbbackups.psmdb.percona.com created
customresourcedefinition.apiextensions.k8s.io/perconaservermongodbrestores.psmdb.percona.com created
role.rbac.authorization.k8s.io/percona-server-mongodb-operator created
serviceaccount/percona-server-mongodb-operator created
rolebinding.rbac.authorization.k8s.io/service-account-percona-server-mongodb-operator created
deployment.apps/percona-server-mongodb-operator created

I have one node in my fancy EKS cluster, so I will:

  • Change the number of nodes in the replica set to 1 (size: 1)
  • Remove the antiAffinity configuration
  • Set the allowUnsafeConfigurations flag to true in deploy/cr.yaml. Setting this flag to true allows users to run unsafe configurations (like a 1-node MongoDB cluster), which is useful for development or testing purposes, but of course not recommended for production.

spec:
...
  allowUnsafeConfigurations: true
  replsets:
  - name: rs0
    size: 1
#    affinity:
#      antiAffinityTopologyKey: "kubernetes.io/hostname"

Now, give me the database:

# eks kubectl apply -f deploy/cr.yaml
1 minute later…
# eks kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
percona-server-mongodb-operator-6b5dbccbd5-jh9x8   1/1     Running   0          7m35s
my-cluster-name-rs0-0                              2/2     Running   0          77s

Simple as that!
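
To actually connect to it, you first need the generated credentials. Assuming the default secret name from the operator’s deploy/ files (an assumption, check your deployment), a quick way to peek at them is:

# show the auto-generated MongoDB users and passwords (values are base64-encoded)
eks kubectl get secret my-cluster-name-secrets -o yaml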

Conclusion

Setting up Kubernetes locally can easily be done not only with EKS Distro, but also with minikube, microk8s, k3s, and other distributions. The true value of EKS-D will be shown once EKS Anywhere goes live in 2021 and unlocks multi-cloud Kubernetes. Percona has always been an open source company: we embrace, value, and heavily invest in multi-cloud ecosystems. Our Kubernetes Operators for Percona XtraDB Cluster and MongoDB enable businesses to run their data on Kubernetes on any public or private cloud without lock-in. We also provide full support for our operators and databases running on your Kubernetes cluster.

Dec
01
2020
--

AWS brings ECS, EKS services to the data center, open sources EKS

Today at AWS re:Invent, Andy Jassy talked a lot about how companies are making a big push to the cloud, but today’s container-focused announcements gave a big nod to the data center as the company announced ECS Anywhere and EKS Anywhere, both designed to let you run these services on premises as well as in the cloud.

These two services, ECS for generalized container orchestration and EKS for orchestration that’s focused on Kubernetes, will let customers use these popular AWS services on premises. Jassy said that some customers still want the same tools they use in the cloud on prem, and this is designed to give it to them.

Speaking of ECS he said,  “I still have a lot of my containers that I need to run on premises as I’m making this transition to the cloud, and [these] people really want it to have the same management and deployment mechanisms that they have in AWS also on premises and customers have asked us to work on this. And so I’m excited to announce two new things to you. The first is the launch, or the announcement of Amazon ECS Anywhere, which lets you run ECS and your own data center,” he told the re:Invent audience.


He said it gives you the same AWS APIs and cluster configuration management pieces. This will work the same for EKS, allowing this single management methodology regardless of where you are using the service.

While it was at it, the company also announced it was open sourcing EKS, its own managed Kubernetes service. The idea behind these moves is to give customers as much flexibility as possible and to recognize what Microsoft, IBM and Google have been saying: that we live in a multi-cloud and hybrid world and people aren’t moving everything to the cloud right away.

In fact, in his opening Jassy stated that right now in 2020, just 4% of worldwide IT spend is on the cloud. That means there’s money to be made selling services on premises, and that’s what these services will do.

Nov
25
2020
--

Cast.ai nabs $7.7M seed to remove barriers between public clouds

When you launch an application in the public cloud, you usually put everything on one provider, but what if you could choose the components based on cost and technology and have your database in one place and your storage in another?

That’s what Cast.ai says it can provide, and today it announced a healthy $7.7 million seed round from TA Ventures, DNX, Florida Funders and other unnamed angels to keep building on that idea. The round closed in June.

Company CEO and co-founder Yuri Frayman says that they started the company with the idea that developers should be able to get the best of each of the public clouds without being locked in. They do this by creating Kubernetes clusters that are able to span multiple clouds.

“Cast does not require you to do anything except for launching your application. You don’t need to know  […] what cloud you are using [at any given time]. You don’t need to know anything except to identify the application, identify which [public] cloud providers you would like to use, the percentage of each [cloud provider’s] use and launch the application,” Frayman explained.

This means that you could use Amazon’s RDS database and Google’s ML engine, and the solution decides how to make that work based on your requirements and price. You set the policies when you are ready to launch and Cast will take care of distributing it for you in the location and providers that you desire, or that makes most sense for your application.

The company takes advantage of cloud-native technologies, containerization and Kubernetes to break the proprietary barriers that exist between clouds, says company co-founder Laurent Gil. “We break these barriers of cloud providers so that an application does not need to sit in one place anymore. It can sit in several [providers] at the same time. And this is great for the Kubernetes application because they’re kind of designed with this [flexibility] in mind,” Gil said.

Developers use the policy engine to decide how much they want to control this process. They can simply set location and let Cast optimize the application across clouds automatically, or they can select at a granular level exactly the resources they want to use on which cloud. Regardless of how they do it, Cast will continually monitor the installation and optimize based on cost to give them the cheapest options available for their configuration.

The company currently has 25 employees with four new hires in the pipeline, and plans to double to 50 by the end of 2021. As they grow, the company is trying to keep diversity and inclusion front and center in its hiring approach; they currently have women in charge of HR, marketing and sales at the company.

“We have very robust processes on the continuous education inside of our organization on diversity training. And a lot of us came from organizations where this was very visible and we took a lot of those processes [and lessons] and brought them here,” Frayman said.

Frayman has been involved with multiple startups, including Cujo.ai, a consumer firewall startup that participated in TechCrunch Disrupt Battlefield in New York in 2016.
