Aug
05
2024
--

Open Source AI Database Agent Part 1: Introduction

Open Source AI Database AgentGenerative AI is top of mind for many engineers. The questions of how it can be applied to solve business problems and boost productivity are still up in the air. Recently I wrote a blog post about the impact of AI on platform engineers, where I talked about various AI Agents and how they can […]

Jul
08
2024
--

Automated Major Version Upgrades in Percona Operator for PostgreSQL

Automated Major Version Upgrades in Percona Operator for PostgreSQLPostgreSQL major versions are released every year, with each release delivering better performance and new features. With such rapid innovation, it is inevitable that there will be a need to upgrade from one version to another. Upgrade procedures are usually very complex and require thorough planning. With the 2.4.0 release of Percona Operator for PostgreSQL, […]

May
08
2024
--

Troubleshooting PostgreSQL on Kubernetes With Coroot

PostgreSQL on Kubernetes With CorootCoroot, an open source observability tool powered by eBPF, went generally available with version 1.0 last week. As this tool is cloud-native, we were curious to know how it can help troubleshoot databases on Kubernetes.In this blog post, we will see how to quickly debug PostgreSQL with Coroot and Percona Operator for PostgreSQL.PrepareInstall CorootThe easiest […]

Jan
09
2024
--

Create an AI Expert With Open Source Tools and pgvector

2023 was the year of Artificial Intelligence (AI). A lot of companies are thinking about how they can improve user experience with AI, and the most usual first step is to use company data (internal docs, ticketing systems, etc.) to answer customer questions faster and (or) automatically.In this blog post, we will explain the basic […]

Dec
11
2023
--

Storage Strategies for PostgreSQL on Kubernetes

Storage Strategies for PostgreSQL on Kubernetes

Deploying PostgreSQL on Kubernetes is not new and can be easily streamlined through various Operators, including Percona’s. There are a wealth of options on how you can approach storage configuration in Percona Operator for PostgreSQL, and in this blog post, we review various storage strategies — from basics to more sophisticated use cases.

The basics

Setting StorageClass

StorageClass resource in Kubernetes allows users to set various parameters of the underlying storage. For example, you can choose the public cloud storage type – gp3, io2, etc, or set file system.

You can check existing storage classes by running the following command:

$ kubectl get sc
NAME                      PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
premium-rwo               pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   54m
regionalpd-storageclass   pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   false                  51m
standard                  kubernetes.io/gce-pd    Delete          Immediate              true                   54m
standard-rwo (default)    pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   54m

As you see
standardrwo
is a default StorageClass, meaning that if you don’t specify anything, the Operator will use it.

To instruct Percona Operator for PostgreSQL which storage class to use, set it the
spec.instances.[].dataVolumeClaimSpec section:

    dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      storageClassName: STORAGE_CLASS_NAME
      resources:
        requests:
          storage: 1Gi

Separate volume for Write-Ahead Logs

Write-Ahead Logs (WALs) keep the recording of every transaction in your PostgreSQL deployment. They are useful for point-in-time recovery and minimizing your Recovery Point Objective (RPO). In Percona Operator, it is possible to have a separate volume for WALs to minimize the impact on performance and storage capacity. To set it, use
spec.instances.[].walVolumeClaimSpec section:

    walVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      storageClassName: STORAGE_CLASS_NAME
      resources:
        requests:
          storage: 1Gi

If you enable
walVolumeClaimSpec, the Operator will create two volumes per replica Pod – one for data and one for WAL:

cluster1-instance1-8b2m-pgdata   Bound    pvc-2f919a49-d672-49cb-89bd-f86469241381   1Gi        RWO            standard-rwo   36s
cluster1-instance1-8b2m-pgwal    Bound    pvc-bf2c26d8-cf42-44cd-a053-ccb6abadd096   1Gi        RWO            standard-rwo   36s
cluster1-instance1-ncfq-pgdata   Bound    pvc-7ab7e59f-017a-4655-b617-ff17907ace3f   1Gi        RWO            standard-rwo   36s
cluster1-instance1-ncfq-pgwal    Bound    pvc-51baffcf-0edc-472f-9c95-7a0cea3e6507   1Gi        RWO            standard-rwo   36s
cluster1-instance1-w4d8-pgdata   Bound    pvc-c60282ed-3599-4033-afc7-e967871efa1b   1Gi        RWO            standard-rwo   36s
cluster1-instance1-w4d8-pgwal    Bound    pvc-ef530cb4-82fb-4661-ac76-ee7fda1f89ce   1Gi        RWO            standard-rwo   36s

Changing storage size

If your StorageClass and storage interface (CSI) supports VolumeExpansion, you can just change the storage size in the Custom Resource manifest. The operator will do the rest and expand the storage automatically. This is a zero-downtime operation and is limited by underlying storage capabilities only.

Changing storage

It is also possible to change the storage capabilities, such as filesystem, IOPs, and type. Right now, it is possible through creating a new storage class and applying it to the new instance group. 

spec:
  instances:
  - name: newGroup
    dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      storageClassName: NEW_STORAGE_CLASS
      resources:
        requests:
          storage: 2Gi

Creating a new instance group replicates the data to new replica nodes. This is done without downtime, but replication might introduce additional load on the primary node and the network. 

There is work in progress under Kubernetes Enhancement Proposal (KEP) #3780. It will allow users to change various volume attributes on the fly vs through the storage class. 

Data persistence

Finalizers

By default, the Operator keeps the storage and secret resources if the cluster is deleted. We do it to protect the users from human errors and other situations. This way, the user can quickly start the cluster, reusing the existing storage and secrets.

This default behavior can be changed by enabling a finalizer in the Custom Resource: 

apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: cluster1
  finalizers:
  - percona.com/delete-pvc
  - percona.com/delete-ssl

This is useful for non-production clusters where you don’t need to keep the data. 

StorageClass data protection

There are extreme cases where human error is inevitable. For example, someone can delete the whole Kubernetes cluster or a namespace. Good thing that StorageClass resource comes with reclaimPolicy option, which can instruct Container Storage Interface to keep the underlying volumes. This option is not controlled by the operator, and you should set it for the StorageClass separately. 

> apiVersion: storage.k8s.io/v1
> kind: StorageClass
> ...
> provisioner: pd.csi.storage.gke.io
- reclaimPolicy: Delete
+ reclaimPolicy: Retain

In this case, even if Kubernetes resources are deleted, the physical storage is still there.

Regional disks

Regional disks are available at Azure and Google Cloud but not yet at AWS. In a nutshell, it is a disk that is replicated across two availability zones (AZ).

Kubernetes Regional disks

To use regional disks, you need a storage class that specifies in which AZs will it be available and replicated to:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: regionalpd-storageclass
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-a
    - us-central1-b

There are some scenarios where regional disks can help with cost reduction. Let’s review three PostgreSQL topologies:

  1. Single node with regular disks
  2. Single node with regional disks
  3. PostgreSQL Highly Available cluster with regular disks

If we apply availability zone failure to these topologies, we will get the following:

  1. Single node with regular disks is the cheapest one, but in case of AZ failure, recovery might take hours or even days – depending on the data.
  2. With single node and regional disks, you will not be spending a dime on compute for replicas, but at the same time, you will recover within minutes. 
  3. PostgreSQL cluster provides the best availability, but also comes with high compute costs.
Single PostgreSQL node, regular disk Single PostgreSQL node, regional disks PostgreSQL HA, regular disks
Compute costs $ $ $$
Storage costs $ $$ $$
Network costs $0 $0 $
Recovery Time Objective Hours Minutes Seconds

Local storage

One of the ways to reduce your total cost of ownership (TCO) for stateful workloads on Kubernetes and boost your performance is to use local storage as opposed to network disks. Public clouds provide instances with NVMe SSDs that can be utilized in k8s with tools like OpenEBS, Portworx, and more. The way it is consumed is through regular storage classes and deserves a separate blog post.

Kubernetes Local storage

Conclusion

In this blog post, we discussed the basics of storage configuration and saw how to fine-tune various storage parameters. There are different topologies, needs, and corresponding strategies for running PostgreSQL on Kubernetes, and depending on your cost, performance, and availability needs, you have a wealth of options with Percona Operators. 

Try out the Percona Operator for PostgreSQL by following the quickstart guide here.

Join the Percona Kubernetes Squad – a group of database professionals at the forefront of innovating database operations on Kubernetes within their organizations and beyond. The Squad is dedicated to providing its members with unwavering support as we all navigate the cloud-native landscape.

Aug
29
2023
--

Bootstrap PostgreSQL on Kubernetes

Bootstrap PostgreSQL on Kubernetes

PostgreSQL has become increasingly popular in modern cloud-native environments. However, managing PostgreSQL clusters on Kubernetes can be a complex task. This is where the Percona Operator comes into play, offering a powerful solution to deploy and manage PostgreSQL clusters effortlessly. Developers often seek an easy way to bootstrap the clusters with data so that applications can start running immediately. It is especially important for CICD pipelines, where automation plays a crucial role.

In this blog post, we will explore the immense value of provisioning PostgreSQL clusters with Percona Operator by using bootstrap capabilities:

  1. Start the cluster with init SQL script
  2. Bootstrap the cluster from the existing cluster or backup

Bootstrap PostgreSQL on Kubernetes

Getting started

You need to have the Percona Operator for PostgreSQL deployed. Please follow our installation instructions and use your favorite way.

You can find all examples from this blog post in this GitHub repository. A single command to deploy the operator would be:

kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/bootstrap-postgresql-k8s/00-bundle.yaml --server-side

Init SQL script

Init SQL allows the creation of the database cluster with some initial data in it. Everything is created with postgres admin user. The way it works is the following:

  1. Create the ConfigMap resource with the SQL script
  2. Reference it in the
    PerconaPGCluster Custom Resource

The operator will apply the SQL during cluster creation. It is quite usual to combine this feature with the user creation.

Create the ConfigMap from 01-demo-init.yaml manifest:

The init.sql does the following:

  1. Connects to
    demodb database
  2. Creates schema
    media for user
    myuser
  3. Creates 2 tables –
    BLOG and
    AUTHORS in the schema

I’m combining bootstrapping with the user and database creation functionality that the Operator also provides. In my 02-deploy-cr.yaml manifest, I created the user
myuser and database
demodb:

  users:
    - name: myuser
      databases:
        - demo-db

Reference the
ConfigMap in the custom resource:

  databaseInitSQL:
    key: init.sql
    name: demo-cluster-init

Applying the manifest would do the trick:

kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/bootstrap-postgresql-k8s/02-deploy-cr.yaml

Troubleshooting

To verify if init SQL was executed successfully or to check if something went wrong, see the Operator’s log. Search for init SQL. For example, the following tells me that I had a syntax error in my SQL script for
democluster:

$ kubectl logs --tail=2000 percona-postgresql-operator-6f96ffd8d4-ddzth  | grep 'init SQL'
time="2023-08-14T09:37:37Z" level=debug msg="applied init SQL" PostgresCluster=default/demo-cluster controller=postgrescluster controllerKind=PostgresCluster key=init.sql name=demo-cluster-init namespace=default reconcileID=1d0cfdcc-0464-459a-be6e-b25eb46ed2c9 stderr="psql:<stdin>:11: ERROR:  syntax error at or near "KEYS"nLINE 2:    ID INT PRIMARY KEYS     NOT NULL,n                          ^n" stdout="You are now connected to database "demo-db" as user "postgres".nCREATE SCHEMAnCREATE TABLEn" version=

Bootstrap from cluster or backup

ConfigMaps cannot store more than one MB of data, which means that init SQL approach is good for some small data bootstraps. If you have a big dataset that you want to roll out along with cluster creation, then there are two ways to do that:

  1. From an existing cluster in Kubernetes
  2. From the backup

From the cluster

To use this, you must have a running cluster and pgBackrest configured repo for it. Now, you can create the second cluster.

03-deploy-cr2.yaml manifest will provision
democluster2. I have removed the
spec.databaseInitSQL section while keeping
spec.users. To instruct the Operator to restore from
democluster and its
repo1 I added the
dataSource  section:

  dataSource:
    postgresCluster:
      clusterName: demo-cluster
      repoName: repo1

The new cluster will be created once the manifest is applied:

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/bootstrap-postgresql-k8s/03-deploy-cr2.yaml
$ kubectl get pg
NAME             ENDPOINT                               STATUS   POSTGRES   PGBOUNCER   AGE
demo-cluster     demo-cluster-pgbouncer.default.svc     ready    1          1           14m
demo-cluster-2   demo-cluster-2-pgbouncer.default.svc   ready    1          1           13m

demo-cluster-2  will have the same data as
democluster. Keep in mind that even if data is the same, the user passwords would be different by default. You can change this; please see users documentation.

From the backup

Another common case is bootstrapping from an existing backup in case the database cluster is not running anymore, or it is isolated in another Kubernetes cluster. In this case, the backups should be stored on some object storage. Please use our documentation to configure backups.

For example,
my democluster configuration in 04-deploy-cr.yaml looks like this if I want to take the backups to Google Cloud Storage (GCS):

pgbackrest:
      global:
        - secret:
            name: demo-cluster-gcs
...
      repos:
      - name: repo1
        schedules:
          full: "0 0 * * 6"
        gcs:
          bucket: "my-demo-bucket"

Once you have backups stored in the object storage, you can delete the cluster and reference it in the manifest anytime for bootstrapping. For example, in 05-deploy-cr3.yaml,
dataSource section looks like this:

  dataSource:
    pgbackrest:
      stanza: db
      configuration:
      - secret:
          name: demo-cluster-gcs
      global:
        repo1-path: /pgbackrest/demo/repo1
      repo:
        name: repo1
        gcs:
          bucket: "my-demo-bucket"

The fields have the same structure and reference the same Secret resource where GCS configuration is stored.

Troubleshooting

When you bootstrap the cluster from pgBackrest backup, the Operator creates a
pgbackrestrestore pod. If it crashes and jumps into Error state, it indicates that something went wrong.

$ kubectl get pods
NAME                                           READY   STATUS     RESTARTS   AGE
demo-cluster-3-pgbackrest-restore-74dg5        0/1     Error      0          27s
$ kubectl logs demo-cluster-3-pgbackrest-restore-74dg5
Defaulted container "pgbackrest-restore" out of: pgbackrest-restore, nss-wrapper-init (init)
+ pgbackrest restore --stanza=db --pg1-path=/pgdata/pg15 --repo=1 --delta --link-map=pg_wal=/pgdata/pg15_wal
WARN: unable to open log file '/pgdata/pgbackrest/log/db-restore.log': No such file or directory
      NOTE: process will continue without log file.
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg15' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
WARN: repo1: [FileMissingError] unable to load info file '/pgbackrest/demo/repo1/backup/db/backup.info' or '/pgbackrest/demo/repo1/backup/db/backup.info.copy':
      FileMissingError: unable to open missing file '/pgbackrest/demo/repo1/backup/db/backup.info' for read
      FileMissingError: unable to open missing file '/pgbackrest/demo/repo1/backup/db/backup.info.copy' for read
      HINT: backup.info cannot be opened and is required to perform a backup.
      HINT: has a stanza-create been performed?
ERROR: [075]: no backup set found to restore

Conclusion

One of the key advantages of running PostgreSQL with Percona Operator is the speed of innovation it brings to the table. With the ability to automate database bootstrapping and management tasks, developers and administrators can focus on more important aspects of their applications. This leads to increased productivity and faster time-to-market for new features and enhancements.

Furthermore, the integration of bootstrapping PostgreSQL clusters on Kubernetes with CICD pipelines is vital. With Percona Operator, organizations can seamlessly incorporate their database deployments into their CI/CD processes. This not only ensures a rapid and efficient release cycle but also enables developers to automate database provisioning, updates, and rollbacks, thereby reducing the risk of errors and downtime.

Try out the Operator by following the quickstart guide here.

You can get early access to new product features, invite-only ”ask me anything” sessions with Percona Kubernetes experts, and monthly swag raffles. Interested? Fill in the form at percona.com/k8s.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Jul
19
2023
--

Deploy PostgreSQL on Kubernetes Using GitOps and ArgoCD

PostgreSQL on Kubernetes using GitOps and ArgoCD

In the world of modern DevOps, deployment automation tools have become essential for streamlining processes and ensuring consistent, reliable deployments. GitOps and ArgoCD are at the cutting edge of deployment automation, making it easy to deploy complex applications and reducing the risk of human error in the deployment process. In this blog post, we will explore how to deploy the Percona Operator for PostgreSQL v2 using GitOps and ArgoCD.

deploy the Percona Operator for PostgreSQL v2 using GitOps and ArgoCD

The setup we are looking for is the following:

  1. Teams or CICD roll out the manifests to Github
  2. ArgoCD reads the changes and compares the changes to what we have in Kubernetes
  3. ArgoCD creates/modifies Percona Operator and PostgreSQL custom resources
  4. Percona Operator takes care of day-1 and day-2 operations based on the changes pushed by ArgoCD to custom resources

Prerequisites:

  1. Kubernetes cluster
  2. GitHub repository. You can find my manifests here.

Start it up

Deploy and prepare ArgoCD

ArgoCD has quite detailed documentation explaining the installation process. I did the following:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Expose the ArgoCD server. You might want to use ingress or some other approach. I’m using a Load Balancer in a public cloud:

kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'

Get the Load Balancer endpoint; we will use it later:

kubectl -n argocd get svc argocd-server
NAME            TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)                      AGE
argocd-server   LoadBalancer   10.88.1.239   123.21.23.21   80:30480/TCP,443:32623/TCP   6h28m

I’m not a big fan of Web User Interfaces, so I took the path of using
argocd CLI. Install it by following the CLI installation documentation.

Retrieve the admin password to log in using the CLI:

argocd admin initial-password -n argocd

Login to the server. Use the Load Balancer endpoint from above:

argocd login 123.21.23.21

PostgreSQL

Github and manifests

Put YAML manifests into the github repository. I have two:

  1. bundle.yaml – deploys Custom Resource Definitions, Service Account, and the Deployment of the Operator
  2. argo-test.yaml – deploys the PostgreSQL Custom Resource manifest that Operator will process

There are some changes that you would need to make to ensure that ArgoCD works with Percona Operator.

Server Side sync

Percona relies on OpenAPI v3.0 validation for Custom Resources. When done properly, it increases the size of a Custom Resource Definition manifest (CRDS) in some cases. As a result, you might see the following error when you apply the bundle:

kubectl apply -f deploy/bundle.yaml
...
Error from server (Invalid): error when creating "deploy/bundle.yaml": CustomResourceDefinition.apiextensions.k8s.io "perconapgclusters.pg.percona.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes

To avoid it, use the
serverside flag. ArgoCD supports Server-Side apply. In the manifests, I added them through annotations to
CustomResourceDefinition objects:

kind: CustomResourceDefinition
metadata:
  annotations:
    ...
    argocd.argoproj.io/sync-options: ServerSideApply=true
  name: perconapgclusters.pgv2.percona.com

Phases and waves

ArgoCD comes with Phases and Waves that allow you to apply manifests and resources in them in a specific order. You should use Waves for two reasons:

  1. To deploy Operator before Percona PostgreSQL Custom Resource
  2. It is also important to delete the Custom Resource first (so perform operations in reverse order)

I added the waves through annotations to the resources.

  • All resources in bundle.yaml are assigned to wave “1”:
kind: CustomResourceDefinition
metadata:
  annotations:
    ...
    argocd.argoproj.io/sync-wave: "1"
  name: perconapgclusters.pgv2.percona.com

  • PostgreSQL Custom Resource in argo-test.yaml has wave “5”:
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "5"
  name: argo-test

The bigger the number in wave, the later the resource will be created. In our case, PerconaPGCluster resource will be created after the Custom Resource Definitions from bundle.yaml.

Deploy with ArgoCD

Create the application in ArgoCD:

argocd app create percona-pg --repo https://github.com/spron-in/blog-data.git --path gitops-argocd-postgresql --dest-server https://kubernetes.default.svc --dest-namespace default

The commands do the following:

  • Creates an application called percona-pg
  • Uses the GitHub repo and a folder in it as a source of YAML manifests
  • Uses local k8s API. It is possible to have multiple k8s clusters.
  • Deploys into default namespace

Now perform a manual sync. The command will roll out manifests:

argocd app sync percona-pg

Check for pg object in the default namespace:

kubectl get pg
NAME        ENDPOINT          STATUS   POSTGRES   PGBOUNCER   AGE
argo-test   33.22.11.44       ready    3          3           80s

Operational tasks

Now let’s say we want to change something. The change should be merged into git, and ArgoCD will detect it. The sync interval is 180 seconds by default. You can change it in argocd-cm ConfigMap if needed.

Even when ArgoCD detects the change, it marks it as out of sync. For example, I reduced the number of CPUs for pgBouncer. ArgoCD detected the change:

argocd app get percona-pg
...
2023-07-07T10:11:22+03:00  pgv2.percona.com      PerconaPGCluster             default             argo-test                                OutOfSync                       perconapgcluster.pgv2.percona.com/argo-test configured

Now I can manually sync the change again. To automate the whole flow, just set the sync policy:

argocd app set percona-pg --sync-policy automated

Now all the changes in git will be automatically synced with Kubernetes once ArgoCD detects them.

Conclusion

GitOps, combined with Kubernetes and the Percona Operator for PostgreSQL, provides a powerful toolset for rapidly deploying and managing complex database infrastructures. By leveraging automation and deployment best practices, teams can reduce the risk of human error, increase deployment velocity, and focus on delivering business value. Additionally, the ability to declaratively manage infrastructure and application state enables teams to quickly recover from outages and respond to changes with confidence.

Try out the Operator by following the quickstart guide here.   

You can get early access to new product features, invite-only ”ask me anything” sessions with Percona Kubernetes experts, and monthly swag raffles. Interested? Fill in the form at percona.com/k8s.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Jun
30
2023
--

Announcing the General Availability of Percona Operator for PostgreSQL Version 2

General Availability of Percona Operator for PostgreSQL Version 2

Percona, a leading provider of open-source database software and services, announced the general availability of Percona Operator for PostgreSQL version 2. The solution is 100% open source and automates the deployment and management of PostgreSQL clusters on Kubernetes. Percona is also offering 24/7 support and managed services for paying customers.

As more and more organizations move their workloads to Kubernetes, managing PostgreSQL clusters in this environment can be challenging. Kubernetes provides many benefits, such as automation and scalability, but it also introduces new complexities when it comes to managing databases. IT teams must ensure high availability, scalability, and security, all while ensuring that their PostgreSQL clusters perform optimally.

Percona Operator for PostgreSQL version 2 simplifies the deployment and management of PostgreSQL clusters on Kubernetes. The solution automates tasks such as scaling, backup and restore, upgrades, and more, and supports Patroni-based PostgreSQL clusters. Additionally, Percona Operator for PostgreSQL version 2 includes expanded options for customizing backup and restore operations, improved monitoring and alerting capabilities, and support for PostgreSQL 15.

For organizations that need additional support and managed services, Percona is offering 24/7 support and managed services for paying customers. This includes access to a dedicated team of PostgreSQL experts who can help with everything from installation and configuration to ongoing management and support.

Percona’s commitment to open source ensures that the Percona Operator for PostgreSQL version 2 remains a flexible and customizable solution that can be tailored to meet the unique needs of any organization.

Below you will find a short FAQ about the new operator and a comparison to version 1.x.

What is better in version 2 compared to version 1?

Architecture

Operator SDK is now used to build and package the Operator. It simplifies the development and brings more contribution friendliness to the code, resulting in better potential for growing the community. Users now have full control over Custom Resource Definitions that Operator relies on, which simplifies the deployment and management of the operator.

In version 1.x, we relied on Deployment resources to run PostgreSQL clusters, whereas in 2.0 Statefulsets are used, which are the de-facto standard for running stateful workloads in Kubernetes. This change improves the stability of the clusters and removes a lot of complexity from the Operator.

Backups

One of the biggest challenges in version 1.x is backups and restores. There are two main problems that our users faced:

  • Not possible to change the backup configuration for the existing cluster
  • Restoration from backup to the newly deployed cluster required workarounds

In this version, both these issues are fixed. In addition to that:

Operations

Deploying complex topologies in Kubernetes is not possible without affinity and anti-affinity rules. In version 1.x, there were various limitations and issues, whereas this version comes with substantial improvements that enable users to craft the topology of their choice.

Within the same cluster, users can deploy multiple instances. These instances are going to have the same data but can have different configurations and resources. This can be useful if you plan to migrate to new hardware or need to test the new topology.

Each PostgreSQL node can have sidecar containers now to provide integration with your existing tools or expand the capabilities of the cluster.

Will Percona still support v1?

Percona Operator for PostgreSQL version 1 moves to maintenance mode and will go End-of-Life after one year – June 2024. We are going to provide bug and security fixes but will not introduce new features and improvements.

Customers with a contract with Percona will still have operational support until Operator goes into EoL stage.

I’m running version 1 now; how can I upgrade to version 2?

We have prepared detailed guidelines for migrating from version 1 to version 2 with zero or minimal downtime. Please refer to our documentation.

The Percona Operator for PostgreSQL version 2 is available now, and we invite you to try it out for yourself. Whether you’re deploying PostgreSQL for the first time or looking for a more efficient way to manage your existing environment, Percona Operator for PostgreSQL has everything you need to get the job done. We look forward to seeing what you can do with it!

For more information, visit Percona Operator for PostgreSQL v2 documentation page. For commercial support, please visit our contact page.

 

Learn more about Percona Operator for PostgreSQL v2

Jun
20
2023
--

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

Deploy Django on Kubernetes

Developers need an efficient, reliable way to run their Django applications with a robust PostgreSQL. Percona Operator for PostgreSQL offers a powerful solution for managing and scaling PostgreSQL databases in a Kubernetes environment, making it an ideal choice for developer use cases. In this blog post, we’ll explore what it takes to run Django on Kubernetes with Percona Operator.

Set up PostgreSQL

Use your favorite way to deploy PostgreSQL Operator. I will use the regular kubectl approach:

$ kubectl apply --server-side -f deploy/bundle.yaml

Django application would require its own user and database. Alter cluster custom resource (cr.yaml) manifest spec.users section. The following example will create the cluster
appdb,  the user
djangoapp and database
pgtest. The user will have access to
pgtest database only:

metadata:
  name: appdb
spec:
  users:
    - name: djangoapp
      databases:
        - pgtest

Now you can apply the manifest to provision the cluster:

$ kubectl apply -f deploy/cr.yaml

The Operator will generate the secret for the user called
CLUSTERNAMEpguserdjangoapp
. Read on to learn how to get credentials and connection string from it.

PostgreSQL 15 public schema

PostgreSQL 15 removes the global write privilege from the public schema. As a result, you might see the following error when running migration in Django:

django.db.migrations.exceptions.MigrationSchemaMissing: Unable to create the django_migrations table (permission denied for schema public
LINE 1: CREATE TABLE "django_migrations" ("id" bigint NOT NULL PRIMA...
                     ^
)

To fix that, it is necessary to explicitly allow
djangoapp
permissions to public schema. To do that, you need to connect to the cluster with superuser and run grants:

pgtest=# GRANT ALL ON SCHEMA public TO djangoapp;
GRANT

Learn how to connect with a superuser in our documentation.

Django and PostgreSQL

psycopg2

Psycopg is a PostgreSQL database adapter for the Python programming language. Django uses it to connect to the database. You will see the following error if you don’t have it installed and trying to connect to PostgreSQL:

django.core.exceptions.ImproperlyConfigured: Error loading psycopg2 or psycopg module

Install
psycopg2 by following its documentation. If you are installing it with pip, be aware that it might look for
pg_config  to build:

$ pip3 install psycopg2
…
Error: pg_config executable not found.


      pg_config is required to build psycopg2 from source.  Please add the directory
      containing pg_config to the $PATH or specify the full executable path with the
      option:


          python setup.py build_ext --pg-config /path/to/pg_config build ...

The easiest way to fix it is to install the
psycopg2binary instead:

$ pip3 install psycopg2-binary

settings.py


settings.py  is the main configuration file for Django applications. It is where you should configure the database – provide credentials, set the correct engine, and more
. You can read more about it in Django’s documentation.

There is nothing super specific about configuring Django and Percona Operator. First, get the connection string and credentials from the Operator. For the cluster
appdb and the user
djangoapp they are stored in a secret
appdbpguserdjangoapp. The following commands will get you what is needed:

$ kubectl get secret appdb-pguser-djangoapp --template='{{.data.password | base64decode}}'
4<_R|8O/@:.2>PnO+DyEW1Kd

$ kubectl get secret appdb-pguser-djangoapp --template='{{index .data "pgbouncer-host" | base64decode}}'
appdb-pgbouncer.default.svc

Your settings.py DATABASES section will look the following way:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'pgtest',
        'USER': 'djangoapp',
        'PASSWORD': '4<_R|8O/@:.2>PnO+DyEW1Kd',
        'HOST': 'appdb-pgbouncer.default.svc’,
        'PORT': '5432',
    }
}

settings.py and Kubernetes

The recommended way to pass credentials to containers is through environment variables. In Kubernetes, it will be additionally wrapped into a Secret resource.

To make it work, we will put our database URI into an environment variable. You can get the database URI from the secret as well:

$ kubectl get secret appdb-pguser-djangoapp --template='{{index .data "pgbouncer-uri" | base64decode}}'
postgresql://djangoapp:UZihjfPtvfNTuIdVzhAUT%7B%5Bq@appdb-pgbouncer.default.svc:5432/pgtest

The recommended way for Django is to store environment variables in
.env  file:

DATABASE_URL=postgresql://djangoapp:UZihjfPtvfNTuIdVzhAUT%7B%5Bq@appdb-pgbouncer.default.svc:5432/pgtest

For using the database URL, use dj_database_url. Install it as usual with pip:

$ pip3 install dj_database_url

Now you can have something like this in your settings.py:

import dj_database_url
import os

if os.environ.get('DATABASE_URL'):
  DATABASES['default'] = dj_database_url.config(default=os.environ['DATABASE_URL'])

DATABASES['default']['ENGINE'] = 'django.db.backends.postgresql'

Note the
ENGINE, as
dj_database_url does not set it.

You can also avoid using
dj_database_url  and pass each variable separately through
os.environ.

Passing a variable in Kubernetes

In Kubernetes, you can pass the
DATABASE_URL to a container through a Secret. You can mount a separate Secret or reuse the one that the Operator manages. The recommended way is to have a separate Secret object, as your application might be in a separate namespace, and you might not have enough permissions to mount the one that is managed by the Operator. The Secret and Deployment might look as follows:

apiVersion: v1
kind: Secret
metadata:
  name: my-db-secret
stringData:
  pgbouncer-uri: postgresql://djangoapp:UZihjfPtvfNTuIdVzhAUT%7B%5Bq@appdb-pgbouncer.default.svc:5432/pgtest
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: django-deploy
spec:
  replicas: 3
    spec:
      containers:
      - name: mydjango
        image: mydjangoapp:1.2.3
        ports:
        - containerPort: 8000
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: my-db-secret
              key: pgbouncer-uri

Conclusion: Deploying Django with Percona Operator for PostgreSQL

Running Django on Kubernetes with Percona Operator for PostgreSQL offers developers an efficient and scalable solution for managing their database needs in a Kubernetes environment. While there are some caveats and potential issues to be aware of, the configuration examples and explanations provided in this blog post will help developers overcome any challenges they may encounter. With Percona Operators, developers can focus on building and delivering their applications with confidence and ease.

Learn more about Percona Operator for PostgreSQL in our documentation.

You can get early access to new product features, invite-only “ask me anything” sessions with Percona Kubernetes experts, and monthly swag raffles. Interested? Fill in the form at percona.com/k8s.

As more companies look at migrating away from Oracle or implementing new databases alongside their applications, PostgreSQL is often the best option for those who want to run on open source databases.

Read Our New White Paper:

Why Customers Choose Percona for PostgreSQL

May
29
2023
--

Disaster Recovery for PostgreSQL on Kubernetes

disaster recover for PostgreSQL on Kubernetes

Disaster recovery is not optional for businesses operating in the digital age. With the ever-increasing reliance on data, system outages or data loss can be catastrophic, causing significant business disruptions and financial losses.

With multi-cloud or multi-regional PostgreSQL deployments, the complexity of managing disaster recovery only amplifies. This is where the Percona Operators come in, providing a solution to streamline disaster recovery for PostgreSQL clusters running on Kubernetes. With the Percona Operators, businesses can manage multi-cloud or hybrid-cloud PostgreSQL deployments with ease, ensuring that critical data is always available and secure, no matter what happens.

In this article, you will learn how to set up disaster recovery with Percona Operator for PostgreSQL version 2.

Overview of the solution

Operators automate routine tasks and remove toil. For standby, Operator provides the following options:

  1. pgBackrest repo-based standby
  2. Streaming replication
  3. Combination of (1) and (2)

We will review the repo-based standby as the simplest one:

1. Two Kubernetes clusters in different regions, clouds, or running in hybrid mode (on-prem + cloud). One is Main, and the other is Disaster Recovery (DR).

2. In each cluster, there are the following components:

    1. Percona Operator
    2. PostgreSQL cluster
    3. pgBackrest
    4. pgBouncer

3. pgBackrest on the Main site streams backups and Write Ahead Logs (WALs) to the object storage.

4. pgBackrest on the DR site takes these backups and streams them to the standby cluster.

Configure main site

Use your favorite method to deploy the Operator from our documentation. Once installed, configure the Custom Resource manifest so that pgBackrest starts using the Object Storage of your choice. Skip this step if you already have it configured.

Configure the backups.pgbackrest.repos section by adding the necessary configuration. The below example is for Google Cloud Storage (GCS):

spec:
  backups:
    configuration:
      - secret:
          name: main-pgbackrest-secrets
    pgbackrest:
      repos:
      - name: repo1
        gcs:
          bucket: MY-BUCKET

main-pgbackrest-secrets

contains the keys for GCS; please read more about the configuration in the backup and restore tutorial.

Once configured, apply the custom resource:

$ kubectl apply -f deploy/cr.yaml
perconapgcluster.pg.percona.com/main created

The backups should appear in the object storage. By default, pgBackrest puts them into the pgbackrest folder.

Configure DR site

The configuration of the disaster recovery site is similar to the Main, with the only difference in standby settings.

The following manifest has standby.enabled set to true and points to the repoName where backups are (GCS in our case):

metadata:
  name: standby
spec: 
...
  backups:
    configuration:
      - secret:
          name: standby-pgbackrest-secrets
    pgbackrest:
      repos:
      - name: repo1
        gcs:
          bucket: MY-BUCKET
  standby:
    enabled: true
    repoName: repo1

Deploy the standby cluster by applying the manifest:

$ kubectl apply -f deploy/cr.yaml
perconapgcluster.pg.percona.com/standby created

Failover

In case of Main site failure or in other cases, you can promote the standby cluster. The promotion effectively allows writing to the cluster. This creates a net effect of pushing Write Ahead Logs (WALs) to the pgBackrest repository. It might create a split-brain situation where two primary instances attempt to write to the same repository. To avoid this, make sure the primary cluster is either deleted or shut down before trying to promote the standby cluster.

Once the primary is down or inactive, promote the standby by changing the corresponding section:

spec:
  standby:
    enabled: false

Now you can start writing to the cluster.

Split brain

There might be a case where your old primary comes up and starts writing to the repository. To recover from this situation, do the following:

  1. Keep only one primary with the latest data running
  2. Stop the writes on the other one
  3. Take the new full backup from the primary and upload it to the repo

Automating the failover

Automated failover consists of multiple steps and is outside of the Operator’s scope. There are a few steps that you can take to reduce the Recovery Time Objective (RTO). To detect the failover, we recommend having a third site for monitoring both DR and Main. In this case, you can be sure that Main really failed, and it is not a network split situation.

Another aspect of automation is to switch the traffic for the application from Main to Standby after promotion. It can be done through various Kubernetes configurations and heavily depends on how your networking and application are designed. The following options are quite common:

  1. Global Load Balancer – various clouds and vendors provide their solutions
  2. Multi-cluster Services or MCS – available on most of the public clouds
  3. Federation or other multi-cluster solutions

Conclusion

Percona Operator for PostgreSQL provides high availability for database clusters by design, making it a robust and production-ready solution for multi-AZ deployments. At the same time, business continuity protocols require disaster recovery plans in place where your vital processes and applications can survive regional outages. In this blog post, we saw how Kubernetes and Operators can simplify your DR design. Try it out yourself, and let us know your experience at the Community Forum.

For more information, visit Percona Operator for PostgreSQL v2 documentation page. For commercial support, please visit our contact page.

 

Try Percona Operator for PostgreSQL today!

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com