Recently I listened to Lenny Rachitsky’s podcast, where he invited Shreyas Doshi for the second time. The session was titled “4 questions Shreyas Doshi wishes he’d asked himself sooner”. One of the questions Shreyas brought up was, “Do I actually have good taste?”. This is an interesting question to ask for an experienced product […]
Exposing PostgreSQL with NGINX Ingress Controller
I wrote a blog post in the past about a generic approach to exposing databases in Kubernetes with Ingress controllers. Today, we will dive deeper into PostgreSQL with ingress and review two different ways to expose the TCP service. The goal is to expose multiple PostgreSQL clusters through […]
Simplify User Management with Percona Operator for MongoDB
Managing database users within complex CI/CD pipelines and GitOps workflows has long been a challenge for MongoDB deployments. With Percona Operator for MongoDB 1.17, we introduce a new feature, currently in technical preview, that streamlines this process. Now, you can create the database users you need directly within the operator, eliminating the need to wait […]
Beyond The Horizon: Mastering Percona Server for MongoDB Exposure in Kubernetes – Part Two – Istio
This is the second part of the series of blog posts unmasking the complexity of MongoDB cluster exposure in Kubernetes with Percona Operator for MongoDB. In the first part, we focused heavily on split horizons and a single replica set. In this part, we will expose a sharded cluster and a single replica set with Istio, […]
Beyond the Horizon: Mastering Percona Server for MongoDB Exposure in Kubernetes – Part One
Running and managing MongoDB clusters in Kubernetes is made easy with the Percona Operator for MongoDB. Some aspects are just easy to grasp as they are well defined in the operator custom resources and documentation, but some are often considered to be a hidden craft. Network exposure in cases of sharded clusters is quite straightforward, […]
Troubleshooting PostgreSQL on Kubernetes With Coroot
Coroot, an open source observability tool powered by eBPF, went generally available with version 1.0 last week. As this tool is cloud-native, we were curious to know how it can help troubleshoot databases on Kubernetes. In this blog post, we will see how to quickly debug PostgreSQL with Coroot and Percona Operator for PostgreSQL. To prepare, install Coroot. The easiest […]
Create an AI Expert With Open Source Tools and pgvector
2023 was the year of Artificial Intelligence (AI). A lot of companies are thinking about how they can improve user experience with AI, and the most common first step is to use company data (internal docs, ticketing systems, etc.) to answer customer questions faster and/or automatically. In this blog post, we will explain the basic […]
Cloud Native Predictions for 2024
The evolution of cloud-native technology has been nothing short of revolutionary. As we step into 2024, the cornerstone of cloud-native technology, Kubernetes, will turn ten years old. It continues to solidify its position, and the Kubernetes market is anticipated to reach USD 5,575.67 million by 2028, with a forecasted Compound Annual Growth Rate (CAGR) of 18.51% in the coming years, as reported by Industry Research Biz.
The Cloud Native landscape continues to encompass both micro-trends and IT macro-trends, influencing and transforming the way businesses operate and deliver value to their customers.
As we at Percona wind down 2023 and look into what the next year holds, our attention is drawn to the cloud-native landscape and how it is maturing, growing, and evolving.
KubeCon NA 2023 recap
The theme for KubeCon NA was very clear — AI and Large Language Models (LLMs). Keynotes were focused on how Kubernetes and Cloud Native help businesses embrace the new AI era. This is understandable, as Kubernetes is slowly becoming what it was always intended to be: the Platform.
The field of Platform Engineering has witnessed significant advancements, as evidenced by the publication of the CNCF platform whitepaper and the introduction of a dedicated Platform Engineering day at the upcoming KubeCon event. At Percona, we observe a growing trend among companies utilizing Kubernetes as a means to offer services to their teams, fostering expedited software delivery and driving business growth.
Declarative GitOps management, with ArgoCD and Flux, is the community way of adding orchestration on top of orchestration. In our conversations with developers and engineers during the conference, we confirmed the CNCF GitOps Microsurvey data – 91% are already using GitOps.
According to the Dynatrace Kubernetes in the Wild 2023 report, a significant 71% (with 48% year-over-year growth!) of respondents are currently utilizing databases in Kubernetes (k8s). This finding aligns with the observations made at the Data on Kubernetes (DoK) day, where discussions surrounding this topic transitioned from niche, tech-oriented conversations a year ago to more widespread, enterprise-level interest in adopting diverse use cases. These indicators suggest that the adoption of databases on k8s is in its early stages and is likely to continue growing in the future.
Predictions
Multi-cloud is a requirement
While this wave has been building for years, in 2024, we expect it to peak. According to a 2023 Forrester survey commissioned by HashiCorp, 61% of respondents had implemented, were expanding, or were upgrading their multi-cloud strategy. We expect that number to rise in 2024.
Nearly every vendor at KubeCon and every person we spoke to had some form of a multi-cloud requirement or strategy. Sometimes, this comes from necessity through acquisition or mergers. Oftentimes, it is a pillar of modern infrastructure strategy to avoid cloud vendor lock-in. At this point, it is ubiquitous, and if it is not part of your strategy, you are falling behind.
The business value of adopting this strategy is multi-fold:
- Freedom from vendor lock-in, which leads to increased negotiating power
- Agility in capitalizing on cloud-vendor advancements to innovate faster
- Improved RPO and RTO for your application and database architecture
- Adhering to security and governance requirements of customers
Percona’s Operators for MySQL, MongoDB, and PostgreSQL are designed with this value in mind. We want adopters of our technology to be able to deploy their critical open source databases and applications across any public or private cloud environment. All of the database automation for running a highly available, resilient, and secure database is built into the operator to simplify the operation and management of your clusters.
Simplify and secure
Looking through various State of Kubernetes reports (VMware, RedHat, SpectroCloud), it becomes clear that complexity and security are the top concerns for platform engineering teams.
Simplification might come from different angles. Deployment is mostly solved already, whereas management and operations are still not. We expect to see various tooling and core patches to automate scaling, upgrades, migrations, troubleshooting, and more.
Operators are an integral part of solving the complexity problem, where they take away the need for learning k8s primitives and application configuration internals. They also remove toil and allow engineers to focus on application development vs platform engineering work. Not only will new operators appear, but existing operators will mature and provide capabilities that meet or exceed managed services that users can get on public clouds.
The latest report on Kubernetes adoption, security, and market trends in 2023 revealed that 67% reported delaying or slowing down deployment due to Kubernetes security concerns. Additionally, 37% of respondents experienced revenue or customer loss due to a container/Kubernetes security incident.
Considering that open source software vulnerabilities are among the top concerns, and given the rapid increase in supply chain attacks (the SolarWinds attack and vulnerabilities like Log4Shell and Spring4Shell), along with container and Kubernetes strategies, there’s a growing emphasis on cybersecurity and operational understanding in development.
Another significant issue within security concerns is the escalating complexity of modern systems, especially in platforms like Kubernetes, which highlights the need for unified threat models and scanning tools to address vulnerabilities. Standardization and collaboration are key to sharing common knowledge and patterns across teams and infrastructures, and creating repositories of memory-safe patterns for cloud systems would improve overall security.
A majority of RedHat’s security research respondents have a DevSecOps initiative underway. Most organizations are embracing DevSecOps, a term that covers processes and tooling enabling security to be integrated into the application development life cycle rather than treated as a separate process. However, 17% of organizations operate security separately from DevOps, lacking any DevSecOps initiatives. Consequently, they might miss out on the benefits of integrating security into the SDLC, such as enhanced efficiency, speed, and quality in software delivery.
AI and MLOps
Kubernetes has become a new web server for many production AI workloads, focusing on facilitating the development and deployment of AI applications, including model training. The newly formed Open Source AI Alliance, led by Meta and IBM, promises to support open source AI. It comprises numerous organizations from various sectors, including software, hardware, nonprofit, public, and academic. The goal is to collaboratively develop tools and programs facilitating open development, and to run scalable and distributed training jobs for popular frameworks such as PyTorch, TensorFlow, MPI, MXNet, PaddlePaddle, and XGBoost.
While integrating AI and machine learning into cloud-native architectures, there’s an increasing demand from users for AI to be open and collaborative. The emergence of trends stemming from ‘AI Advancements and Ethical Concerns’ cannot be ignored.
Addressing ethical concerns and biases will necessitate the implementation of transparent AI frameworks and ethical guidelines during application development. Customers will increasingly prioritize AI efficiency and education to tackle legal and ethical concerns. This marks the end of an era of chaos, paving the way for efficiency gains, quicker innovation, and standardized practices.
Conclusion
At Percona, we prioritize staying ahead of market trends by adhering to industry best practices and leveraging our team’s expertise.
We’ve always made sure to focus on security in our software development, and weaving multi-cloud deployment into our products has been a crucial part of our strategy. Our commitment to open source software drives us to take additional precautions, ensuring operational security through best practices and principles such as least privilege, security in layers, and separation of roles/responsibilities through policy and software controls. And with multi-cloud in mind, we consistently incorporate new sharding functionalities into our roadmaps, such as the upcoming Shard-per-location support in the Percona Operator for MongoDB.
At the same time, we are not hesitating to rock the cloud-native community by incorporating top-notch features to address new rising trends. You mentioned simplifying Kubernetes? Well, here we are: storage autoscaling for databases in Kubernetes, slated for release in Q1 2024 after a year of hard work. This fully automated scaling and tuning will enable a serverless-like experience in our Operators and Everest. Developers will receive the endpoint without needing to consider resources and tuning at all. It’s worry-free and doesn’t require human intervention.
Finally, the rising popularity of generative AI and services like OpenAI’s ChatGPT or Google’s Bard has prompted our team to bring vector-handling capabilities to Percona-powered database software by adding support for the pgvector extension.
Our team always focuses on innovation to accelerate progress for everyone, and we will continue to push the boundaries further for our community and the rest of the world.
Storage Strategies for PostgreSQL on Kubernetes
Deploying PostgreSQL on Kubernetes is not new and can be easily streamlined through various Operators, including Percona’s. There is a wealth of options for how you can approach storage configuration in Percona Operator for PostgreSQL, and in this blog post, we review various storage strategies — from basics to more sophisticated use cases.
The basics
Setting StorageClass
The StorageClass resource in Kubernetes allows users to set various parameters of the underlying storage. For example, you can choose the public cloud storage type (gp3, io2, etc.) or set the file system.
You can check existing storage classes by running the following command:
$ kubectl get sc
NAME                      PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
premium-rwo               pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   54m
regionalpd-storageclass   pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   false                  51m
standard                  kubernetes.io/gce-pd    Delete          Immediate              true                   54m
standard-rwo (default)    pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   54m
As you see, standard-rwo is the default StorageClass, meaning that if you don’t specify anything, the Operator will use it.
To instruct Percona Operator for PostgreSQL which storage class to use, set it in the spec.instances.[].dataVolumeClaimSpec section:
dataVolumeClaimSpec:
  accessModes:
    - ReadWriteOnce
  storageClassName: STORAGE_CLASS_NAME
  resources:
    requests:
      storage: 1Gi
Separate volume for Write-Ahead Logs
Write-Ahead Logs (WALs) record every transaction in your PostgreSQL deployment. They are useful for point-in-time recovery and minimizing your Recovery Point Objective (RPO). In Percona Operator, it is possible to have a separate volume for WALs to minimize the impact on performance and storage capacity. To set it, use the spec.instances.[].walVolumeClaimSpec section:
walVolumeClaimSpec:
  accessModes:
    - ReadWriteOnce
  storageClassName: STORAGE_CLASS_NAME
  resources:
    requests:
      storage: 1Gi
If you enable walVolumeClaimSpec, the Operator will create two volumes per replica Pod – one for data and one for WAL:
cluster1-instance1-8b2m-pgdata   Bound   pvc-2f919a49-d672-49cb-89bd-f86469241381   1Gi   RWO   standard-rwo   36s
cluster1-instance1-8b2m-pgwal    Bound   pvc-bf2c26d8-cf42-44cd-a053-ccb6abadd096   1Gi   RWO   standard-rwo   36s
cluster1-instance1-ncfq-pgdata   Bound   pvc-7ab7e59f-017a-4655-b617-ff17907ace3f   1Gi   RWO   standard-rwo   36s
cluster1-instance1-ncfq-pgwal    Bound   pvc-51baffcf-0edc-472f-9c95-7a0cea3e6507   1Gi   RWO   standard-rwo   36s
cluster1-instance1-w4d8-pgdata   Bound   pvc-c60282ed-3599-4033-afc7-e967871efa1b   1Gi   RWO   standard-rwo   36s
cluster1-instance1-w4d8-pgwal    Bound   pvc-ef530cb4-82fb-4661-ac76-ee7fda1f89ce   1Gi   RWO   standard-rwo   36s
Changing storage size
If your StorageClass and storage interface (CSI) support VolumeExpansion, you can just change the storage size in the Custom Resource manifest. The Operator will do the rest and expand the storage automatically. This is a zero-downtime operation and is limited by underlying storage capabilities only.
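For example, growing a volume is just a matter of bumping the storage request in the manifest (a minimal sketch; the instance group name and sizes are illustrative):

spec:
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 2Gi   # bumped from 1Gi; the Operator expands the PVCs in place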
Changing storage
It is also possible to change the storage capabilities, such as filesystem, IOPs, and type. Right now, it is possible by creating a new storage class and applying it to a new instance group.
spec:
  instances:
    - name: newGroup
      dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        storageClassName: NEW_STORAGE_CLASS
        resources:
          requests:
            storage: 2Gi
Creating a new instance group replicates the data to new replica nodes. This is done without downtime, but replication might introduce additional load on the primary node and the network.
There is work in progress under Kubernetes Enhancement Proposal (KEP) #3780. It will allow users to change various volume attributes on the fly rather than through the storage class.
Data persistence
Finalizers
By default, the Operator keeps the storage and secret resources if the cluster is deleted. We do this to protect users from human error and other situations. This way, the user can quickly start the cluster again, reusing the existing storage and secrets.
This default behavior can be changed by enabling a finalizer in the Custom Resource:
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: cluster1
  finalizers:
    - percona.com/delete-pvc
    - percona.com/delete-ssl
This is useful for non-production clusters where you don’t need to keep the data.
StorageClass data protection
There are extreme cases where human error is inevitable. For example, someone can delete the whole Kubernetes cluster or a namespace. The good thing is that the StorageClass resource comes with a reclaimPolicy option, which can instruct the Container Storage Interface to keep the underlying volumes. This option is not controlled by the Operator; you should set it for the StorageClass separately.
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  ...
  provisioner: pd.csi.storage.gke.io
- reclaimPolicy: Delete
+ reclaimPolicy: Retain
In this case, even if Kubernetes resources are deleted, the physical storage is still there.
Regional disks
Regional disks are available on Azure and Google Cloud but not yet on AWS. In a nutshell, a regional disk is a disk that is replicated across two availability zones (AZs).
To use regional disks, you need a storage class that specifies the AZs in which the disk will be available and replicated:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: regionalpd-storageclass
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.gke.io/zone
        values:
          - us-central1-a
          - us-central1-b
There are some scenarios where regional disks can help with cost reduction. Let’s review three PostgreSQL topologies:
- Single node with regular disks
- Single node with regional disks
- PostgreSQL Highly Available cluster with regular disks
If we apply availability zone failure to these topologies, we will get the following:
- Single node with regular disks is the cheapest one, but in case of AZ failure, recovery might take hours or even days – depending on the data.
- With single node and regional disks, you will not be spending a dime on compute for replicas, but at the same time, you will recover within minutes.
- PostgreSQL cluster provides the best availability, but also comes with high compute costs.
|                          | Single PostgreSQL node, regular disk | Single PostgreSQL node, regional disks | PostgreSQL HA, regular disks |
| Compute costs            | $                                    | $                                      | $$                           |
| Storage costs            | $                                    | $$                                     | $$                           |
| Network costs            | $0                                   | $0                                     | $                            |
| Recovery Time Objective  | Hours                                | Minutes                                | Seconds                      |
Local storage
One of the ways to reduce your total cost of ownership (TCO) for stateful workloads on Kubernetes and boost your performance is to use local storage as opposed to network disks. Public clouds provide instances with NVMe SSDs that can be utilized in k8s with tools like OpenEBS, Portworx, and more. The way it is consumed is through regular storage classes and deserves a separate blog post.
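For illustration, local volumes are typically consumed through a StorageClass like the following sketch, which assumes the Kubernetes local static provisioner (tools like OpenEBS and Portworx ship their own provisioners instead; the class name here is made up):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme                          # hypothetical name
provisioner: kubernetes.io/no-provisioner   # static provisioning, no CSI driver involved
volumeBindingMode: WaitForFirstConsumer     # bind only once the Pod lands on a node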
Conclusion
In this blog post, we discussed the basics of storage configuration and saw how to fine-tune various storage parameters. There are different topologies, needs, and corresponding strategies for running PostgreSQL on Kubernetes, and depending on your cost, performance, and availability needs, you have a wealth of options with Percona Operators.
Try out the Percona Operator for PostgreSQL by following the quickstart guide here.
Join the Percona Kubernetes Squad – a group of database professionals at the forefront of innovating database operations on Kubernetes within their organizations and beyond. The Squad is dedicated to providing its members with unwavering support as we all navigate the cloud-native landscape.
Percona Operators Custom Resource Monitoring With Kube-state-metrics
There are more than 300 Operators on OperatorHub, and the number is growing. Percona Operators allow users to easily manage complex database systems in a Kubernetes environment. With Percona Operators, users can deploy, monitor, and manage databases orchestrated by Kubernetes, making it easier and more efficient to run databases at scale.
Our Operators come with Custom Resources that have their own statuses and fields to ease monitoring and troubleshooting. For example, the PerconaServerMongoDBBackup resource has information about the backup, such as its success or failure. Obviously, there are ways to monitor the backup through storage monitoring or Pod status, but why bother if the Operator already provides this information?
In this article, we will see how someone can monitor Custom Resources that are created by the Operators with kube-state-metrics (KSM), a standard and widely adopted service that listens to the Kubernetes API server and generates metrics. These methods can be applied to any Custom Resources.
Please find the code and recipes from this blog post in this GitHub repository.
The problem
Kube-state-metrics talks to the Kubernetes API and captures information about various resources – Pods, Deployments, Services, etc. Once captured, the metrics are exposed. In the monitoring pipeline, a tool like Prometheus scrapes the exposed metrics.
The problem is that the Custom Resource manifest structure varies depending on the Operator. KSM does not know what to look for in the Kubernetes API. So, our goal is to explain which fields in the Custom Resource we want kube-state-metrics to capture and expose.
The solution
Kube-state-metrics is designed to be extendable for capturing custom resource metrics. It is possible to specify, through a custom configuration, which resources and fields to capture and expose.
Details
Install Kube-state-metrics
To start with, install kube-state-metrics if not done already. We observed issues scraping custom resource metrics with version 2.5.0; from version 2.8.2 onwards, we were able to scrape custom resource metrics without any issues.
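If you use Helm, one common installation route looks like this (the release name and namespace are just examples):

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm install kube-state-metrics prometheus-community/kube-state-metrics -n kube-system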
Identify the metrics you want to expose along with the path
Custom resources have a lot of fields. You need to choose the fields that need to be exposed.
For example, the Custom Resource “PerconaXtraDBCluster” has plenty of fields: “spec.crVersion” indicates the CR version, and “spec.pxc.size” shows the number of Percona XtraDB Cluster nodes set by the user (we will later look at a better way to monitor the number of nodes in a PXC cluster).
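To inspect candidate fields on a live cluster, you can query them directly; for example (the cluster name cluster1 is illustrative):

$ kubectl get pxc cluster1 -o jsonpath='{.spec.crVersion}{"\n"}'
$ kubectl get pxc cluster1 -o jsonpath='{.spec.pxc.size}{"\n"}'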
Metrics can also be captured from the status field of the Custom Resource, if present. For example, the following is the fetched status of the PerconaXtraDBCluster Custom Resource; status.state indicates the state of the Custom Resource, which is very handy information:
$ kubectl get pxc pxc-1 -oyaml | yq 'del(.status.conditions) | .status'
backup: {}
haproxy:
  …
  ready: 3
  size: 3
  status: ready
pxc:
  …
  ready: 3
  size: 3
  status: ready
version: 8.0.29-21.1
ready: 6
size: 6
state: ready
Decide the type of metrics for the fields identified
As of today, kube-state-metrics supports three metric types from the OpenMetrics specification:
- Gauge
- StateSet
- Info
Based on the fields selected, map each one to the metric type you want to expose it as. For example:
spec.crVersion remains constant throughout the lifecycle of the custom resource until it is upgraded. The metric type “Info” is a good fit here.
spec.pxc.size is a number that changes based on what the user desires and on operator configurations. Even though the number is pretty much constant in the later phases of the custom resource lifecycle, it can change. “Gauge” is a great fit for this type of metric.
status.state takes one of a fixed set of values (such as initializing, ready, paused, error, or unknown), so “StateSet” is the better fit for this type of metric.
Derive the configuration to capture custom resource metrics
As per the documentation, a configuration needs to be added to the kube-state-metrics deployment to define your custom resources and the fields to turn into metrics.
Configuration derived for the three metrics discussed above can be found here.
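For orientation, here is a sketch of what that configuration could look like, using the kube-state-metrics custom resource state format (reconstructed for this walkthrough; the linked file is authoritative, so details may differ):

kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: pxc.percona.com
        version: v1
        kind: PerconaXtraDBCluster
      metrics:
        # Info metric: constant labels taken from spec.crVersion
        - name: pxc_info
          help: Information of PXC cluster on k8s
          each:
            type: Info
            info:
              labelsFromPath:
                version: [spec, crVersion]
        # Gauge metric: numeric value read from spec.pxc.size
        - name: pxc_size
          help: Desired size for the PXC cluster
          each:
            type: Gauge
            gauge:
              path: [spec, pxc, size]
        # StateSet metric: one series per possible value of status.state
        - name: pxc_status_state
          help: State of PXC Cluster
          each:
            type: StateSet
            stateSet:
              path: [status, state]
              labelName: state
              list: [initializing, paused, ready, error, unknown]

With the default kube_customresource prefix, this yields the three metric families shown in the logs later in this post.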
Consume the configuration in kube-state-metrics deployment
As per the official documentation, there are two ways to apply custom configurations:
- Inline: by using --custom-resource-state-config "inline yaml"
- Referring to a file: by using --custom-resource-state-config-file /path/to/config.yaml
Inline is not handy if the configuration is big. Referring to a file is better and gives more flexibility.
It is important to note that the path to file is the path in the container file system of kube-state-metrics. There are several ways to get a file into the container file system, but one of the options is to mount the data of a ConfigMap to a container.
Steps:
1. Create a ConfigMap with the configuration derived above.
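For example (a sketch assuming the config file path used earlier and that kube-state-metrics runs in kube-system):

$ kubectl create configmap customresource-config-ksm -n kube-system --from-file=config.yaml=/path/to/config.yaml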
2. Add the ConfigMap as a volume to the kube-state-metrics Pod.
volumes:
  - configMap:
      name: customresource-config-ksm
    name: cr-config
3. Mount the volume to the container. As per the Dockerfile of the kube-state-metrics, path “/go/src/k8s.io/kube-state-metrics/” can be used to mount the file.
volumeMounts:
  - mountPath: /go/src/k8s.io/kube-state-metrics/
    name: cr-config
Provide permission to access the custom resources
By default, kube-state-metrics has permission to access standard resources only, as per its ClusterRole. If it is deployed without additional privileges, the required metrics won’t be scraped.
Add additional privileges based on the custom resources you want to monitor. In this example, we will add additional privileges to monitor PerconaXtraDBCluster, PerconaXtraDBClusterBackup, and PerconaXtraDBClusterRestore, as sketched below.
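A minimal sketch of the extra rule (reconstructed here; the full ClusterRole referenced at the end of this post is authoritative):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  # ...existing rules for standard resources...
  - apiGroups: ["pxc.percona.com"]
    resources:
      - perconaxtradbclusters
      - perconaxtradbclusterbackups
      - perconaxtradbclusterrestores
    verbs: ["list", "watch"]   # list/watch is all kube-state-metrics needs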
Apply the ClusterRole and check the logs to see if custom resource metrics are being captured.
Validate the metrics being captured
Check the logs of kube-state-metrics
$ kubectl logs -f deploy/kube-state-metrics
I0706 14:43:25.273822       1 wrapper.go:98] "Starting kube-state-metrics"
. . .
I0706 14:43:28.285613       1 discovery.go:274] "discovery finished, cache updated"
I0706 14:43:28.285652       1 metrics_handler.go:99] "Autosharding disabled"
I0706 14:43:28.288930       1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_customresource_pxc_info kube_customresource_pxc_size kube_customresource_pxc_status_state]
I0706 14:43:28.411540       1 builder.go:275] "Active resources" activeStoreNames="certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments,pxc.percona.com/v1, Resource=perconaxtradbclusters"
Check the kube-state-metrics service to list the metrics scraped.
Open a terminal and keep the port-forward command running:
$ kubectl port-forward svc/kube-state-metrics 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080
Handling connection for 8080
In a browser, check for the metrics captured using “127.0.0.1:8080” (remember to keep the terminal running where the port-forward command is running).
Observe the metrics kube_customresource_pxc_info, kube_customresource_pxc_status_state, and kube_customresource_pxc_size being captured:
# HELP kube_customresource_pxc_info Information of PXC cluster on k8s
# TYPE kube_customresource_pxc_info info
kube_customresource_pxc_info{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",version="1.9.0"} 1
# HELP kube_customresource_pxc_size Desired size for the PXC cluster
# TYPE kube_customresource_pxc_size gauge
kube_customresource_pxc_size{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1"} 3
# HELP kube_customresource_pxc_status_state State of PXC Cluster
# TYPE kube_customresource_pxc_status_state stateset
kube_customresource_pxc_status_state{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",state="error"} 1
kube_customresource_pxc_status_state{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",state="initializing"} 0
kube_customresource_pxc_status_state{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",state="paused"} 0
kube_customresource_pxc_status_state{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",state="ready"} 0
kube_customresource_pxc_status_state{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",state="unknown"} 0
Customize the metric name, add default labels
As seen above, the metrics captured had the prefix kube_customresource. What if we want to customize it?
There are some standard labels, like the name and namespace of the custom resource, which might need to be captured as labels on all the metrics related to a custom resource. It’s not practical to add these to every single metric by hand. Hence, the identifiers labelsFromPath and metricNamePrefix are used.
In the below snippet, all the metrics captured for group pxc.percona.com, version v1, kind PerconaXtraDBCluster will have the metric prefix kube_pxc, and all of them will carry the following labels:
- name: derived from the path metadata.name of the custom resource
- namespace: derived from the path metadata.namespace of the custom resource
spec:
  resources:
    - groupVersionKind:
        group: pxc.percona.com
        version: v1
        kind: PerconaXtraDBCluster
      labelsFromPath:
        name: [metadata, name]
        namespace: [metadata, namespace]
      metricNamePrefix: kube_pxc
Change the configuration present in the configmap and apply the new configmap.
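One way to do that in place (a sketch reusing the hypothetical file path and ConfigMap name from the steps above):

$ kubectl create configmap customresource-config-ksm -n kube-system --from-file=config.yaml=/path/to/config.yaml --dry-run=client -o yaml | kubectl apply -f -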
When the new configmap is applied, kube-state-metrics should automatically pick up the configuration changes; you can also do a “kubectl rollout restart deploy kube-state-metrics” to expedite the pod restart.
Once the changes are applied, check the metrics by port-forwarding to kube-state-metrics service.
$ kubectl port-forward svc/kube-state-metrics 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080
Handling connection for 8080
In a browser, check for the metrics captured using “127.0.0.1:8080” (remember to keep the terminal running where the port-forward command is running).
Observe the metrics:
# HELP kube_pxc_pxc_info Information of PXC cluster on k8s
# TYPE kube_pxc_pxc_info info
kube_pxc_pxc_info{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",name="cluster1",namespace="pxc",version="1.9.0"} 1
# HELP kube_pxc_pxc_size Desired size for the PXC cluster
# TYPE kube_pxc_pxc_size gauge
kube_pxc_pxc_size{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",name="cluster1",namespace="pxc"} 3
# HELP kube_pxc_pxc_status_state State of PXC Cluster
# TYPE kube_pxc_pxc_status_state stateset
kube_pxc_pxc_status_state{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",name="cluster1",namespace="pxc",state="error"} 1
kube_pxc_pxc_status_state{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",name="cluster1",namespace="pxc",state="initializing"} 0
kube_pxc_pxc_status_state{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",name="cluster1",namespace="pxc",state="paused"} 0
kube_pxc_pxc_status_state{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",name="cluster1",namespace="pxc",state="ready"} 0
kube_pxc_pxc_status_state{customresource_group="pxc.percona.com",customresource_kind="PerconaXtraDBCluster",customresource_version="v1",name="cluster1",namespace="pxc",state="unknown"} 0
Labels customization
By default, kube-state-metrics doesn’t capture all the labels of the resources. However, these can be handy in deriving correlations from custom resources to the k8s objects. To add additional labels, use the flag --metric-labels-allowlist as mentioned in the documentation.
To demonstrate, we made this change to the kube-state-metrics deployment and applied it, as sketched below.
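The change boils down to an extra container argument on the kube-state-metrics Deployment; here is a sketch whose allowlist value is an assumption inferred from the labels captured below:

args:
  - --metric-labels-allowlist=pods=[app.kubernetes.io/component,app.kubernetes.io/instance,app.kubernetes.io/managed-by,app.kubernetes.io/name,app.kubernetes.io/part-of]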
Check the metrics by doing a port-forward to the service as instructed earlier.
Check the labels captured for pod cluster1-pxc-0:
kube_pod_labels{namespace="pxc",pod="cluster1-pxc-0",uid="1083ac08-5c25-4ede-89ce-1837f2b66f3d",label_app_kubernetes_io_component="pxc",label_app_kubernetes_io_instance="cluster1",label_app_kubernetes_io_managed_by="percona-xtradb-cluster-operator",label_app_kubernetes_io_name="percona-xtradb-cluster",label_app_kubernetes_io_part_of="percona-xtradb-cluster"} 1
Labels of the pod can be checked in the cluster:
$ kubectl get po -n pxc cluster1-pxc-0 --show-labels
NAME             READY   STATUS    RESTARTS   AGE     LABELS
cluster1-pxc-0   3/3     Running   0          3h54m   app.kubernetes.io/component=pxc,app.kubernetes.io/instance=cluster1,app.kubernetes.io/managed-by=percona-xtradb-cluster-operator,app.kubernetes.io/name=percona-xtradb-cluster,app.kubernetes.io/part-of=percona-xtradb-cluster,controller-revision-hash=cluster1-pxc-6f4955bbc7,statefulset.kubernetes.io/pod-name=cluster1-pxc-0
Adhering to Prometheus conventions, the character . (dot) is replaced with _ (underscore). Only labels mentioned in --metric-labels-allowlist are captured in the labels info.
Checking another pod:
$ kubectl get po -n kube-system kube-state-metrics-7bd9c67f64-46ksw --show-labels
NAME                                  READY   STATUS    RESTARTS      AGE    LABELS
kube-state-metrics-7bd9c67f64-46ksw   1/1     Running   1 (40m ago)   120m   app.kubernetes.io/component=exporter,app.kubernetes.io/name=kube-state-metrics,app.kubernetes.io/version=2.9.2,pod-template-hash=7bd9c67f64
Following are the labels captured in the kube-state-metrics service:
kube_pod_labels{namespace="kube-system",pod="kube-state-metrics-7bd9c67f64-46ksw",uid="d4b30238-d29e-4251-a8e3-c2fad1bff724",label_app_kubernetes_io_component="exporter",label_app_kubernetes_io_name="kube-state-metrics"} 1
As can be seen above, the label app.kubernetes.io/version is not captured because it was not mentioned in the --metric-labels-allowlist flag of kube-state-metrics.
Conclusion
- Custom Resource metrics can be captured by modifying kube-state-metrics deployment. Metrics can be captured without writing any code.
- As an alternative to the above method, a custom exporter can be written to expose the metrics, which gives a lot of flexibility. However, this requires coding and maintenance.
- The metrics can be scraped by Prometheus and combined with other metrics to derive useful insights.
If you want to extend the same process to other custom resources related to Percona Operators, use the following ClusterRole to provide permission to read the relevant custom resources. Configurations for some of the important metrics related to the custom resources are captured in this Configmap for you to explore.