Jan
09
2024
--

Create an AI Expert With Open Source Tools and pgvector

2023 was the year of Artificial Intelligence (AI). A lot of companies are thinking about how they can improve user experience with AI, and the most common first step is to use company data (internal docs, ticketing systems, etc.) to answer customer questions faster and/or automatically. In this blog post, we will explain the basic […]

Dec
11
2023
--

Storage Strategies for PostgreSQL on Kubernetes

Deploying PostgreSQL on Kubernetes is not new and can be easily streamlined through various Operators, including Percona’s. There is a wealth of options for approaching storage configuration in Percona Operator for PostgreSQL, and in this blog post, we review various storage strategies, from the basics to more sophisticated use cases.

The basics

Setting StorageClass

The StorageClass resource in Kubernetes allows users to set various parameters of the underlying storage. For example, you can choose the public cloud storage type (gp3, io2, etc.) or set the file system.

You can check existing storage classes by running the following command:

$ kubectl get sc
NAME                      PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
premium-rwo               pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   54m
regionalpd-storageclass   pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   false                  51m
standard                  kubernetes.io/gce-pd    Delete          Immediate              true                   54m
standard-rwo (default)    pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   54m

As you can see, standard-rwo is the default StorageClass, meaning that if you don’t specify anything, the Operator will use it.

To instruct Percona Operator for PostgreSQL which storage class to use, set it in the spec.instances.[].dataVolumeClaimSpec section:

    dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      storageClassName: STORAGE_CLASS_NAME
      resources:
        requests:
          storage: 1Gi

Separate volume for Write-Ahead Logs

Write-Ahead Logs (WALs) keep a record of every transaction in your PostgreSQL deployment. They are useful for point-in-time recovery and for minimizing your Recovery Point Objective (RPO). In Percona Operator, it is possible to have a separate volume for WALs to minimize the impact on performance and storage capacity. To set it, use the spec.instances.[].walVolumeClaimSpec section:

    walVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      storageClassName: STORAGE_CLASS_NAME
      resources:
        requests:
          storage: 1Gi

If you enable walVolumeClaimSpec, the Operator will create two volumes per replica Pod, one for data and one for WAL:

cluster1-instance1-8b2m-pgdata   Bound    pvc-2f919a49-d672-49cb-89bd-f86469241381   1Gi        RWO            standard-rwo   36s
cluster1-instance1-8b2m-pgwal    Bound    pvc-bf2c26d8-cf42-44cd-a053-ccb6abadd096   1Gi        RWO            standard-rwo   36s
cluster1-instance1-ncfq-pgdata   Bound    pvc-7ab7e59f-017a-4655-b617-ff17907ace3f   1Gi        RWO            standard-rwo   36s
cluster1-instance1-ncfq-pgwal    Bound    pvc-51baffcf-0edc-472f-9c95-7a0cea3e6507   1Gi        RWO            standard-rwo   36s
cluster1-instance1-w4d8-pgdata   Bound    pvc-c60282ed-3599-4033-afc7-e967871efa1b   1Gi        RWO            standard-rwo   36s
cluster1-instance1-w4d8-pgwal    Bound    pvc-ef530cb4-82fb-4661-ac76-ee7fda1f89ce   1Gi        RWO            standard-rwo   36s

Changing storage size

If your StorageClass and storage interface (CSI) support VolumeExpansion, you can just change the storage size in the Custom Resource manifest. The Operator will do the rest and expand the storage automatically. This is a zero-downtime operation and is limited only by the capabilities of the underlying storage.
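
Kubernetes volume expansion only grows a volume; a request for a smaller size is rejected. As a minimal sketch, the check below compares the current and requested sizes before you edit the manifest. The helper names are hypothetical, and the parser assumes whole-number quantities with binary suffixes such as 1Gi; it is not a full implementation of Kubernetes quantity syntax.

```python
# Binary suffixes used by Kubernetes resource quantities (a subset, for illustration).
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a whole-number quantity such as '1Gi' or '500Mi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: a plain number of bytes


def is_valid_expansion(current: str, requested: str) -> bool:
    """Expansion can only grow a volume, never shrink it."""
    return quantity_to_bytes(requested) > quantity_to_bytes(current)


print(is_valid_expansion("1Gi", "2Gi"))    # True
print(is_valid_expansion("2Gi", "500Mi"))  # False
```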

Changing storage

It is also possible to change the storage capabilities, such as filesystem, IOPS, and type. Right now, this is done by creating a new storage class and applying it to a new instance group.

spec:
  instances:
  - name: newGroup
    dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      storageClassName: NEW_STORAGE_CLASS
      resources:
        requests:
          storage: 2Gi

Creating a new instance group replicates the data to new replica nodes. This is done without downtime, but replication might introduce additional load on the primary node and the network. 

There is work in progress under Kubernetes Enhancement Proposal (KEP) #3780. It will allow users to change various volume attributes on the fly rather than through the storage class.

Data persistence

Finalizers

By default, the Operator keeps the storage and secret resources if the cluster is deleted. We do it to protect the users from human errors and other situations. This way, the user can quickly start the cluster, reusing the existing storage and secrets.

This default behavior can be changed by enabling a finalizer in the Custom Resource: 

apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: cluster1
  finalizers:
  - percona.com/delete-pvc
  - percona.com/delete-ssl

This is useful for non-production clusters where you don’t need to keep the data. 

StorageClass data protection

There are extreme cases where human error is inevitable. For example, someone can delete the whole Kubernetes cluster or a namespace. Fortunately, the StorageClass resource comes with a reclaimPolicy option, which can instruct the Container Storage Interface (CSI) driver to keep the underlying volumes. This option is not controlled by the Operator, and you should set it for the StorageClass separately.

> apiVersion: storage.k8s.io/v1
> kind: StorageClass
> ...
> provisioner: pd.csi.storage.gke.io
- reclaimPolicy: Delete
+ reclaimPolicy: Retain

In this case, even if Kubernetes resources are deleted, the physical storage is still there.

Regional disks

Regional disks are available on Azure and Google Cloud but not yet on AWS. In a nutshell, a regional disk is a disk that is replicated across two availability zones (AZs).

To use regional disks, you need a storage class that specifies the AZs in which the disk will be available and to which it will be replicated:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: regionalpd-storageclass
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-a
    - us-central1-b

There are some scenarios where regional disks can help with cost reduction. Let’s review three PostgreSQL topologies:

  1. Single node with regular disks
  2. Single node with regional disks
  3. PostgreSQL Highly Available cluster with regular disks

If we apply availability zone failure to these topologies, we will get the following:

  1. Single node with regular disks is the cheapest one, but in case of AZ failure, recovery might take hours or even days – depending on the data.
  2. With single node and regional disks, you will not be spending a dime on compute for replicas, but at the same time, you will recover within minutes. 
  3. PostgreSQL cluster provides the best availability, but also comes with high compute costs.

                          Single PostgreSQL node, regular disk   Single PostgreSQL node, regional disks   PostgreSQL HA, regular disks
Compute costs             $                                      $                                        $$
Storage costs             $                                      $$                                       $$
Network costs             $0                                     $0                                       $
Recovery Time Objective   Hours                                  Minutes                                  Seconds

Local storage

One of the ways to reduce your total cost of ownership (TCO) for stateful workloads on Kubernetes and boost your performance is to use local storage as opposed to network disks. Public clouds provide instances with NVMe SSDs that can be utilized in Kubernetes with tools like OpenEBS, Portworx, and more. Local storage is consumed through regular storage classes, and the topic deserves a separate blog post.

Conclusion

In this blog post, we discussed the basics of storage configuration and saw how to fine-tune various storage parameters. There are different topologies, needs, and corresponding strategies for running PostgreSQL on Kubernetes, and depending on your cost, performance, and availability needs, you have a wealth of options with Percona Operators. 

Try out the Percona Operator for PostgreSQL by following the quickstart guide here.

Join the Percona Kubernetes Squad – a group of database professionals at the forefront of innovating database operations on Kubernetes within their organizations and beyond. The Squad is dedicated to providing its members with unwavering support as we all navigate the cloud-native landscape.

Aug
29
2023
--

Bootstrap PostgreSQL on Kubernetes

PostgreSQL has become increasingly popular in modern cloud-native environments. However, managing PostgreSQL clusters on Kubernetes can be a complex task. This is where the Percona Operator comes into play, offering a powerful solution to deploy and manage PostgreSQL clusters effortlessly. Developers often seek an easy way to bootstrap the clusters with data so that applications can start running immediately. It is especially important for CI/CD pipelines, where automation plays a crucial role.

In this blog post, we will explore the immense value of provisioning PostgreSQL clusters with Percona Operator by using bootstrap capabilities:

  1. Start the cluster with init SQL script
  2. Bootstrap the cluster from the existing cluster or backup

Getting started

You need to have the Percona Operator for PostgreSQL deployed. Please follow our installation instructions and use your favorite way.

You can find all examples from this blog post in this GitHub repository. A single command to deploy the operator would be:

kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/bootstrap-postgresql-k8s/00-bundle.yaml --server-side

Init SQL script

Init SQL allows the creation of the database cluster with some initial data in it. Everything is created by the postgres admin user. It works as follows:

  1. Create the ConfigMap resource with the SQL script
  2. Reference it in the PerconaPGCluster Custom Resource

The operator will apply the SQL during cluster creation. It is quite usual to combine this feature with the user creation.

Create the ConfigMap from the 01-demo-init.yaml manifest.

The init.sql does the following:

  1. Connects to the demo-db database
  2. Creates the schema media for the user myuser
  3. Creates two tables, BLOG and AUTHORS, in the schema

I’m combining bootstrapping with the user and database creation functionality that the Operator also provides. In my 02-deploy-cr.yaml manifest, I created the user myuser and the database demo-db:

  users:
    - name: myuser
      databases:
        - demo-db

Reference the ConfigMap in the custom resource:

  databaseInitSQL:
    key: init.sql
    name: demo-cluster-init

Applying the manifest would do the trick:

kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/bootstrap-postgresql-k8s/02-deploy-cr.yaml

Troubleshooting

To verify that the init SQL was executed successfully, or to check if something went wrong, see the Operator’s log and search for "init SQL". For example, the following tells me that I had a syntax error in my SQL script for demo-cluster:

$ kubectl logs --tail=2000 percona-postgresql-operator-6f96ffd8d4-ddzth  | grep 'init SQL'
time="2023-08-14T09:37:37Z" level=debug msg="applied init SQL" PostgresCluster=default/demo-cluster controller=postgrescluster controllerKind=PostgresCluster key=init.sql name=demo-cluster-init namespace=default reconcileID=1d0cfdcc-0464-459a-be6e-b25eb46ed2c9 stderr="psql:<stdin>:11: ERROR:  syntax error at or near \"KEYS\"\nLINE 2:    ID INT PRIMARY KEYS     NOT NULL,\n                          ^\n" stdout="You are now connected to database \"demo-db\" as user \"postgres\".\nCREATE SCHEMA\nCREATE TABLE\n" version=

Bootstrap from cluster or backup

ConfigMaps cannot store more than 1 MiB of data, which means that the init SQL approach is suitable only for small data bootstraps. If you have a big dataset that you want to roll out along with cluster creation, there are two ways to do that:

  1. From an existing cluster in Kubernetes
  2. From the backup
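
To decide whether a script is still small enough for the init SQL approach, you can measure it before creating the ConfigMap. The sketch below is an approximation under stated assumptions: it sums the byte length of keys and values and compares the total against a 1 MiB ceiling. The function name and the sample scripts are hypothetical.

```python
CONFIGMAP_LIMIT_BYTES = 1024 * 1024  # 1 MiB ceiling on ConfigMap data


def fits_in_configmap(data: dict[str, str]) -> bool:
    """Return True if the keys and values together stay within the limit."""
    total = sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in data.items())
    return total <= CONFIGMAP_LIMIT_BYTES


small_script = "CREATE SCHEMA media;\n"
large_script = "INSERT INTO media.blog VALUES (1);\n" * 40_000  # about 1.4 MB of SQL

print(fits_in_configmap({"init.sql": small_script}))  # True
print(fits_in_configmap({"init.sql": large_script}))  # False
```

If the check fails, bootstrapping from an existing cluster or from a backup is the better fit.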

From the cluster

To use this, you must have a running cluster with a pgBackRest repo configured for it. Then you can create the second cluster.

The 03-deploy-cr2.yaml manifest will provision demo-cluster-2. I have removed the spec.databaseInitSQL section while keeping spec.users. To instruct the Operator to restore from demo-cluster and its repo1, I added the dataSource section:

  dataSource:
    postgresCluster:
      clusterName: demo-cluster
      repoName: repo1

The new cluster will be created once the manifest is applied:

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/bootstrap-postgresql-k8s/03-deploy-cr2.yaml
$ kubectl get pg
NAME             ENDPOINT                               STATUS   POSTGRES   PGBOUNCER   AGE
demo-cluster     demo-cluster-pgbouncer.default.svc     ready    1          1           14m
demo-cluster-2   demo-cluster-2-pgbouncer.default.svc   ready    1          1           13m

demo-cluster-2 will have the same data as demo-cluster. Keep in mind that even if the data is the same, the user passwords will be different by default. You can change this; please see the users documentation.

From the backup

Another common case is bootstrapping from an existing backup in case the database cluster is not running anymore, or it is isolated in another Kubernetes cluster. In this case, the backups should be stored on some object storage. Please use our documentation to configure backups.

For example, my demo-cluster configuration in 04-deploy-cr.yaml looks like this if I want to take the backups to Google Cloud Storage (GCS):

pgbackrest:
      global:
        - secret:
            name: demo-cluster-gcs
...
      repos:
      - name: repo1
        schedules:
          full: "0 0 * * 6"
        gcs:
          bucket: "my-demo-bucket"

Once you have backups stored in the object storage, you can delete the cluster and reference the backup in a manifest at any time for bootstrapping. For example, in 05-deploy-cr3.yaml, the dataSource section looks like this:

  dataSource:
    pgbackrest:
      stanza: db
      configuration:
      - secret:
          name: demo-cluster-gcs
      global:
        repo1-path: /pgbackrest/demo/repo1
      repo:
        name: repo1
        gcs:
          bucket: "my-demo-bucket"

The fields have the same structure and reference the same Secret resource where GCS configuration is stored.

Troubleshooting

When you bootstrap the cluster from a pgBackRest backup, the Operator creates a pgbackrest-restore pod. If it crashes and goes into the Error state, something went wrong:

$ kubectl get pods
NAME                                           READY   STATUS     RESTARTS   AGE
demo-cluster-3-pgbackrest-restore-74dg5        0/1     Error      0          27s
$ kubectl logs demo-cluster-3-pgbackrest-restore-74dg5
Defaulted container "pgbackrest-restore" out of: pgbackrest-restore, nss-wrapper-init (init)
+ pgbackrest restore --stanza=db --pg1-path=/pgdata/pg15 --repo=1 --delta --link-map=pg_wal=/pgdata/pg15_wal
WARN: unable to open log file '/pgdata/pgbackrest/log/db-restore.log': No such file or directory
      NOTE: process will continue without log file.
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg15' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
WARN: repo1: [FileMissingError] unable to load info file '/pgbackrest/demo/repo1/backup/db/backup.info' or '/pgbackrest/demo/repo1/backup/db/backup.info.copy':
      FileMissingError: unable to open missing file '/pgbackrest/demo/repo1/backup/db/backup.info' for read
      FileMissingError: unable to open missing file '/pgbackrest/demo/repo1/backup/db/backup.info.copy' for read
      HINT: backup.info cannot be opened and is required to perform a backup.
      HINT: has a stanza-create been performed?
ERROR: [075]: no backup set found to restore

Conclusion

One of the key advantages of running PostgreSQL with Percona Operator is the speed of innovation it brings to the table. With the ability to automate database bootstrapping and management tasks, developers and administrators can focus on more important aspects of their applications. This leads to increased productivity and faster time-to-market for new features and enhancements.

Furthermore, the integration of bootstrapping PostgreSQL clusters on Kubernetes with CI/CD pipelines is vital. With Percona Operator, organizations can seamlessly incorporate their database deployments into their CI/CD processes. This not only ensures a rapid and efficient release cycle but also enables developers to automate database provisioning, updates, and rollbacks, thereby reducing the risk of errors and downtime.

Try out the Operator by following the quickstart guide here.

You can get early access to new product features, invite-only “ask me anything” sessions with Percona Kubernetes experts, and monthly swag raffles. Interested? Fill in the form at percona.com/k8s.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Jul
19
2023
--

Deploy PostgreSQL on Kubernetes Using GitOps and ArgoCD

In the world of modern DevOps, deployment automation tools have become essential for streamlining processes and ensuring consistent, reliable deployments. GitOps and ArgoCD are at the cutting edge of deployment automation, making it easy to deploy complex applications and reducing the risk of human error in the deployment process. In this blog post, we will explore how to deploy the Percona Operator for PostgreSQL v2 using GitOps and ArgoCD.

The setup we are looking for is the following:

  1. Teams or CI/CD roll out the manifests to GitHub
  2. ArgoCD reads the changes and compares them to what we have in Kubernetes
  3. ArgoCD creates/modifies Percona Operator and PostgreSQL custom resources
  4. Percona Operator takes care of day-1 and day-2 operations based on the changes pushed by ArgoCD to custom resources

Prerequisites:

  1. Kubernetes cluster
  2. GitHub repository. You can find my manifests here.

Start it up

Deploy and prepare ArgoCD

ArgoCD has quite detailed documentation explaining the installation process. I did the following:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Expose the ArgoCD server. You might want to use ingress or some other approach. I’m using a Load Balancer in a public cloud:

kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'

Get the Load Balancer endpoint; we will use it later:

kubectl -n argocd get svc argocd-server
NAME            TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)                      AGE
argocd-server   LoadBalancer   10.88.1.239   123.21.23.21   80:30480/TCP,443:32623/TCP   6h28m

I’m not a big fan of Web User Interfaces, so I took the path of using the argocd CLI. Install it by following the CLI installation documentation.

Retrieve the admin password to log in using the CLI:

argocd admin initial-password -n argocd

Log in to the server. Use the Load Balancer endpoint from above:

argocd login 123.21.23.21

PostgreSQL

GitHub and manifests

Put YAML manifests into the GitHub repository. I have two:

  1. bundle.yaml – deploys Custom Resource Definitions, Service Account, and the Deployment of the Operator
  2. argo-test.yaml – deploys the PostgreSQL Custom Resource manifest that Operator will process

There are some changes that you would need to make to ensure that ArgoCD works with Percona Operator.

Server Side sync

Percona relies on OpenAPI v3.0 validation for Custom Resources. When done properly, it increases the size of a Custom Resource Definition (CRD) manifest in some cases. As a result, you might see the following error when you apply the bundle:

kubectl apply -f deploy/bundle.yaml
...
Error from server (Invalid): error when creating "deploy/bundle.yaml": CustomResourceDefinition.apiextensions.k8s.io "perconapgclusters.pg.percona.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes

To avoid it, use the --server-side flag. ArgoCD supports Server-Side Apply as well. In the manifests, I enabled it through an annotation on the CustomResourceDefinition objects:

kind: CustomResourceDefinition
metadata:
  annotations:
    ...
    argocd.argoproj.io/sync-options: ServerSideApply=true
  name: perconapgclusters.pgv2.percona.com
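
The limit in the error above comes from the way client-side kubectl apply works: it stores a copy of the applied object in the kubectl.kubernetes.io/last-applied-configuration annotation, and the total size of annotations is capped at 262144 bytes. Server-side apply does not write that annotation. The sketch below estimates whether a manifest would fit; the function name and the toy manifests are hypothetical, and the real serialization done by kubectl may differ slightly in size.

```python
import json

ANNOTATION_LIMIT_BYTES = 262144  # limit quoted in the error message (256 KiB)
LAST_APPLIED_KEY = "kubectl.kubernetes.io/last-applied-configuration"


def client_side_apply_fits(manifest: dict) -> bool:
    """Estimate whether the last-applied annotation would stay within the limit."""
    serialized = json.dumps(manifest, separators=(",", ":"))
    annotation_size = len(LAST_APPLIED_KEY) + len(serialized.encode("utf-8"))
    return annotation_size <= ANNOTATION_LIMIT_BYTES


# Toy stand-ins for a small and a very large CRD (not real Percona manifests).
small_crd = {"kind": "CustomResourceDefinition", "spec": {"description": "x" * 1_000}}
large_crd = {"kind": "CustomResourceDefinition", "spec": {"description": "x" * 300_000}}

print(client_side_apply_fits(small_crd))  # True
print(client_side_apply_fits(large_crd))  # False
```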

Phases and waves

ArgoCD comes with Phases and Waves that allow you to apply manifests and resources in them in a specific order. You should use Waves for two reasons:

  1. To deploy Operator before Percona PostgreSQL Custom Resource
  2. It is also important to delete the Custom Resource first (so perform operations in reverse order)

I added the waves through annotations to the resources.

  • All resources in bundle.yaml are assigned to wave “1”:
kind: CustomResourceDefinition
metadata:
  annotations:
    ...
    argocd.argoproj.io/sync-wave: "1"
  name: perconapgclusters.pgv2.percona.com

  • PostgreSQL Custom Resource in argo-test.yaml has wave “5”:
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "5"
  name: argo-test

The bigger the number in wave, the later the resource will be created. In our case, PerconaPGCluster resource will be created after the Custom Resource Definitions from bundle.yaml.
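
The ordering rule can be illustrated with a few lines of Python. This is only a simplified model of the wave ordering described above, not ArgoCD's actual implementation, which also takes phases, kinds, and names into account. The function names are hypothetical; the annotation key and resource names come from the manifests in this post.

```python
WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def wave_of(resource: dict) -> int:
    """Read the sync wave from annotations; resources without one default to wave 0."""
    annotations = resource.get("metadata", {}).get("annotations", {})
    return int(annotations.get(WAVE_ANNOTATION, "0"))


def creation_order(resources: list[dict]) -> list[str]:
    """Names sorted by ascending wave (sorted() is stable, so ties keep input order)."""
    return [r["metadata"]["name"] for r in sorted(resources, key=wave_of)]


resources = [
    {"kind": "PerconaPGCluster",
     "metadata": {"name": "argo-test", "annotations": {WAVE_ANNOTATION: "5"}}},
    {"kind": "CustomResourceDefinition",
     "metadata": {"name": "perconapgclusters.pgv2.percona.com",
                  "annotations": {WAVE_ANNOTATION: "1"}}},
]

print(creation_order(resources))
# ['perconapgclusters.pgv2.percona.com', 'argo-test']

# Deletion runs in reverse, so the Custom Resource goes first.
print(list(reversed(creation_order(resources))))
# ['argo-test', 'perconapgclusters.pgv2.percona.com']
```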

Deploy with ArgoCD

Create the application in ArgoCD:

argocd app create percona-pg --repo https://github.com/spron-in/blog-data.git --path gitops-argocd-postgresql --dest-server https://kubernetes.default.svc --dest-namespace default

The command does the following:

  • Creates an application called percona-pg
  • Uses the GitHub repo and a folder in it as a source of YAML manifests
  • Uses local k8s API. It is possible to have multiple k8s clusters.
  • Deploys into default namespace

Now perform a manual sync. The command will roll out manifests:

argocd app sync percona-pg

Check for the pg object in the default namespace:

kubectl get pg
NAME        ENDPOINT          STATUS   POSTGRES   PGBOUNCER   AGE
argo-test   33.22.11.44       ready    3          3           80s

Operational tasks

Now let’s say we want to change something. The change should be merged into git, and ArgoCD will detect it. The sync interval is 180 seconds by default. You can change it in argocd-cm ConfigMap if needed.

When ArgoCD detects the change, it only marks the application as out of sync and does not apply anything yet. For example, I reduced the number of CPUs for pgBouncer. ArgoCD detected the change:

argocd app get percona-pg
...
2023-07-07T10:11:22+03:00  pgv2.percona.com      PerconaPGCluster             default             argo-test                                OutOfSync                       perconapgcluster.pgv2.percona.com/argo-test configured

Now I can manually sync the change again. To automate the whole flow, just set the sync policy:

argocd app set percona-pg --sync-policy automated

Now all the changes in git will be automatically synced with Kubernetes once ArgoCD detects them.

Conclusion

GitOps, combined with Kubernetes and the Percona Operator for PostgreSQL, provides a powerful toolset for rapidly deploying and managing complex database infrastructures. By leveraging automation and deployment best practices, teams can reduce the risk of human error, increase deployment velocity, and focus on delivering business value. Additionally, the ability to declaratively manage infrastructure and application state enables teams to quickly recover from outages and respond to changes with confidence.

Try out the Operator by following the quickstart guide here.   

Jun
30
2023
--

Announcing the General Availability of Percona Operator for PostgreSQL Version 2

Percona, a leading provider of open-source database software and services, announced the general availability of Percona Operator for PostgreSQL version 2. The solution is 100% open source and automates the deployment and management of PostgreSQL clusters on Kubernetes. Percona is also offering 24/7 support and managed services for paying customers.

As more and more organizations move their workloads to Kubernetes, managing PostgreSQL clusters in this environment can be challenging. Kubernetes provides many benefits, such as automation and scalability, but it also introduces new complexities when it comes to managing databases. IT teams must ensure high availability, scalability, and security, all while ensuring that their PostgreSQL clusters perform optimally.

Percona Operator for PostgreSQL version 2 simplifies the deployment and management of PostgreSQL clusters on Kubernetes. The solution automates tasks such as scaling, backup and restore, upgrades, and more, and supports Patroni-based PostgreSQL clusters. Additionally, Percona Operator for PostgreSQL version 2 includes expanded options for customizing backup and restore operations, improved monitoring and alerting capabilities, and support for PostgreSQL 15.

For organizations that need additional support and managed services, Percona is offering 24/7 support and managed services for paying customers. This includes access to a dedicated team of PostgreSQL experts who can help with everything from installation and configuration to ongoing management and support.

Percona’s commitment to open source ensures that the Percona Operator for PostgreSQL version 2 remains a flexible and customizable solution that can be tailored to meet the unique needs of any organization.

Below you will find a short FAQ about the new operator and a comparison to version 1.x.

What is better in version 2 compared to version 1?

Architecture

Operator SDK is now used to build and package the Operator. It simplifies the development and brings more contribution friendliness to the code, resulting in better potential for growing the community. Users now have full control over Custom Resource Definitions that Operator relies on, which simplifies the deployment and management of the operator.

In version 1.x, we relied on Deployment resources to run PostgreSQL clusters, whereas in 2.0 Statefulsets are used, which are the de-facto standard for running stateful workloads in Kubernetes. This change improves the stability of the clusters and removes a lot of complexity from the Operator.

Backups

One of the biggest challenges in version 1.x is backups and restores. There are two main problems that our users faced:

  • Not possible to change the backup configuration for the existing cluster
  • Restoration from backup to the newly deployed cluster required workarounds

In this version, both these issues are fixed. In addition to that:

Operations

Deploying complex topologies in Kubernetes is not possible without affinity and anti-affinity rules. In version 1.x, there were various limitations and issues, whereas this version comes with substantial improvements that enable users to craft the topology of their choice.

Within the same cluster, users can deploy multiple instances. These instances are going to have the same data but can have different configurations and resources. This can be useful if you plan to migrate to new hardware or need to test the new topology.

Each PostgreSQL node can have sidecar containers now to provide integration with your existing tools or expand the capabilities of the cluster.

Will Percona still support v1?

Percona Operator for PostgreSQL version 1 moves to maintenance mode and will go End-of-Life after one year – June 2024. We are going to provide bug and security fixes but will not introduce new features and improvements.

Customers with a Percona contract will still have operational support until the Operator reaches the EoL stage.

I’m running version 1 now; how can I upgrade to version 2?

We have prepared detailed guidelines for migrating from version 1 to version 2 with zero or minimal downtime. Please refer to our documentation.

The Percona Operator for PostgreSQL version 2 is available now, and we invite you to try it out for yourself. Whether you’re deploying PostgreSQL for the first time or looking for a more efficient way to manage your existing environment, Percona Operator for PostgreSQL has everything you need to get the job done. We look forward to seeing what you can do with it!

For more information, visit Percona Operator for PostgreSQL v2 documentation page. For commercial support, please visit our contact page.

 

Learn more about Percona Operator for PostgreSQL v2

Jun
20
2023
--

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

Developers need an efficient, reliable way to run their Django applications with a robust PostgreSQL. Percona Operator for PostgreSQL offers a powerful solution for managing and scaling PostgreSQL databases in a Kubernetes environment, making it an ideal choice for developer use cases. In this blog post, we’ll explore what it takes to run Django on Kubernetes with Percona Operator.

Set up PostgreSQL

Use your favorite way to deploy PostgreSQL Operator. I will use the regular kubectl approach:

$ kubectl apply --server-side -f deploy/bundle.yaml

The Django application requires its own user and database. Alter the spec.users section of the cluster custom resource (cr.yaml) manifest. The following example will create the cluster appdb, the user djangoapp, and the database pgtest. The user will have access to the pgtest database only:

metadata:
  name: appdb
spec:
  users:
    - name: djangoapp
      databases:
        - pgtest

Now you can apply the manifest to provision the cluster:

$ kubectl apply -f deploy/cr.yaml

The Operator will generate the secret for the user, called CLUSTERNAME-pguser-djangoapp. Read on to learn how to get credentials and the connection string from it.

PostgreSQL 15 public schema

PostgreSQL 15 removes the global write privilege from the public schema. As a result, you might see the following error when running migration in Django:

django.db.migrations.exceptions.MigrationSchemaMissing: Unable to create the django_migrations table (permission denied for schema public
LINE 1: CREATE TABLE "django_migrations" ("id" bigint NOT NULL PRIMA...
                     ^
)

To fix that, it is necessary to explicitly grant djangoapp permissions on the public schema. To do that, connect to the cluster as a superuser and run the grant:

pgtest=# GRANT ALL ON SCHEMA public TO djangoapp;
GRANT

Learn how to connect with a superuser in our documentation.

Django and PostgreSQL

psycopg2

Psycopg is a PostgreSQL database adapter for the Python programming language. Django uses it to connect to the database. You will see the following error if you don’t have it installed and try to connect to PostgreSQL:

django.core.exceptions.ImproperlyConfigured: Error loading psycopg2 or psycopg module

Install psycopg2 by following its documentation. If you are installing it with pip, be aware that it might look for pg_config to build:

$ pip3 install psycopg2
…
Error: pg_config executable not found.


      pg_config is required to build psycopg2 from source.  Please add the directory
      containing pg_config to the $PATH or specify the full executable path with the
      option:


          python setup.py build_ext --pg-config /path/to/pg_config build ...

The easiest way to fix it is to install psycopg2-binary instead:

$ pip3 install psycopg2-binary

settings.py


settings.py is the main configuration file for Django applications. It is where you should configure the database: provide credentials, set the correct engine, and more. You can read more about it in Django’s documentation.

There is nothing special about configuring Django to work with Percona Operator. First, get the connection string and credentials from the Operator. For the cluster appdb and the user djangoapp, they are stored in the secret appdb-pguser-djangoapp. The following commands will get you what is needed:

$ kubectl get secret appdb-pguser-djangoapp --template='{{.data.password | base64decode}}'
4<_R|8O/@:.2>PnO+DyEW1Kd

$ kubectl get secret appdb-pguser-djangoapp --template='{{index .data "pgbouncer-host" | base64decode}}'
appdb-pgbouncer.default.svc
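
If you prefer to read these values from a script, the same decoding can be sketched in Python. This is a minimal sketch that assumes you pass it the output of kubectl get secret appdb-pguser-djangoapp -o json as a string. The helper name decode_secret_data is hypothetical, and the sample password below is a placeholder, not a real credential.

```python
import base64
import json


def decode_secret_data(secret_json: str) -> dict:
    """Decode the base64-encoded `data` map of a Kubernetes Secret manifest."""
    secret = json.loads(secret_json)
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }


if __name__ == "__main__":
    # Placeholder manifest; in practice this would be the output of
    # `kubectl get secret appdb-pguser-djangoapp -o json`.
    sample = json.dumps({
        "kind": "Secret",
        "data": {
            "password": base64.b64encode(b"example-password").decode("ascii"),
            "pgbouncer-host": base64.b64encode(
                b"appdb-pgbouncer.default.svc"
            ).decode("ascii"),
        },
    })
    creds = decode_secret_data(sample)
    print(creds["pgbouncer-host"])
```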

Your settings.py DATABASES section will look the following way:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'pgtest',
        'USER': 'djangoapp',
        'PASSWORD': '4<_R|8O/@:.2>PnO+DyEW1Kd',
        'HOST': 'appdb-pgbouncer.default.svc',
        'PORT': '5432',
    }
}

settings.py and Kubernetes

The recommended way to pass credentials to containers is through environment variables. In Kubernetes, it will be additionally wrapped into a Secret resource.

To make it work, we will put our database URI into an environment variable. You can get the database URI from the secret as well:

$ kubectl get secret appdb-pguser-djangoapp --template='{{index .data "pgbouncer-uri" | base64decode}}'
postgresql://djangoapp:UZihjfPtvfNTuIdVzhAUT%7B%5Bq@appdb-pgbouncer.default.svc:5432/pgtest

The recommended way for Django is to store environment variables in a .env file:

DATABASE_URL=postgresql://djangoapp:UZihjfPtvfNTuIdVzhAUT%7B%5Bq@appdb-pgbouncer.default.svc:5432/pgtest

To use the database URL in Django, use dj_database_url. Install it as usual with pip:

$ pip3 install dj_database_url

Now you can have something like this in your settings.py:

import dj_database_url
import os

if os.environ.get('DATABASE_URL'):
  DATABASES['default'] = dj_database_url.config(default=os.environ['DATABASE_URL'])

DATABASES['default']['ENGINE'] = 'django.db.backends.postgresql'

Note the explicit ENGINE setting, as dj_database_url does not set it.
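
To make it clearer what a URL parser has to do with this connection string, here is a minimal sketch that uses only the Python standard library. It is not how dj_database_url is implemented, and the function name parse_database_url is hypothetical. The password in the URI is percent-encoded (%7B%5B stands for {[), so it has to be decoded before it is handed to the database driver.

```python
from urllib.parse import unquote, urlsplit


def parse_database_url(url: str) -> dict:
    """Split a postgresql:// URI into Django-style connection settings."""
    parts = urlsplit(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        # Credentials inside a URI are percent-encoded, so decode them.
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        # Fall back to the default PostgreSQL port when none is given.
        "PORT": str(parts.port or 5432),
    }


if __name__ == "__main__":
    settings = parse_database_url(
        "postgresql://djangoapp:UZihjfPtvfNTuIdVzhAUT%7B%5Bq"
        "@appdb-pgbouncer.default.svc:5432/pgtest"
    )
    print(settings["HOST"], settings["NAME"])
```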

You can also avoid using dj_database_url and pass each variable separately through os.environ.
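
Passing each variable separately could look roughly like the sketch below. The variable names (DB_NAME, DB_USER, and so on) and the helper database_from_env are assumptions made for illustration; use whatever names your container actually receives.

```python
import os


def database_from_env(environ=os.environ) -> dict:
    """Build a Django DATABASES['default'] entry from separate variables.

    The variable names used here are hypothetical.
    """
    return {
        "ENGINE": "django.db.backends.postgresql",
        # Required values: a missing variable raises KeyError at startup,
        # which is easier to debug than a failed connection later.
        "NAME": environ["DB_NAME"],
        "USER": environ["DB_USER"],
        "PASSWORD": environ["DB_PASSWORD"],
        "HOST": environ["DB_HOST"],
        # Optional value with the default PostgreSQL port as a fallback.
        "PORT": environ.get("DB_PORT", "5432"),
    }
```

In settings.py this would be used as DATABASES = {'default': database_from_env()}. Failing early on a missing variable is a design choice; you may prefer defaults for local development.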

Passing a variable in Kubernetes

In Kubernetes, you can pass the DATABASE_URL to a container through a Secret. You can mount a separate Secret or reuse the one that the Operator manages. The recommended way is to have a separate Secret object, as your application might be in a separate namespace, and you might not have enough permissions to mount the one that is managed by the Operator. The Secret and Deployment might look as follows:

apiVersion: v1
kind: Secret
metadata:
  name: my-db-secret
stringData:
  pgbouncer-uri: postgresql://djangoapp:UZihjfPtvfNTuIdVzhAUT%7B%5Bq@appdb-pgbouncer.default.svc:5432/pgtest
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: django-deploy
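spec:
  replicas: 3
  selector:
    matchLabels:
      app: mydjango   # illustrative label, must match the template labels
  template:
    metadata:
      labels:
        app: mydjango
    spec: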
      containers:
      - name: mydjango
        image: mydjangoapp:1.2.3
        ports:
        - containerPort: 8000
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: my-db-secret
              key: pgbouncer-uri

Conclusion: Deploying Django with Percona Operator for PostgreSQL

Running Django on Kubernetes with Percona Operator for PostgreSQL offers developers an efficient and scalable solution for managing their database needs in a Kubernetes environment. While there are some caveats and potential issues to be aware of, the configuration examples and explanations provided in this blog post will help developers overcome any challenges they may encounter. With Percona Operators, developers can focus on building and delivering their applications with confidence and ease.

Learn more about Percona Operator for PostgreSQL in our documentation.

You can get early access to new product features, invite-only “ask me anything” sessions with Percona Kubernetes experts, and monthly swag raffles. Interested? Fill in the form at percona.com/k8s.


May
29
2023
--

Disaster Recovery for PostgreSQL on Kubernetes

Disaster recovery is not optional for businesses operating in the digital age. With the ever-increasing reliance on data, system outages or data loss can be catastrophic, causing significant business disruptions and financial losses.

With multi-cloud or multi-regional PostgreSQL deployments, the complexity of managing disaster recovery only amplifies. This is where the Percona Operators come in, providing a solution to streamline disaster recovery for PostgreSQL clusters running on Kubernetes. With the Percona Operators, businesses can manage multi-cloud or hybrid-cloud PostgreSQL deployments with ease, ensuring that critical data is always available and secure, no matter what happens.

In this article, you will learn how to set up disaster recovery with Percona Operator for PostgreSQL version 2.

Overview of the solution

Operators automate routine tasks and remove toil. For standby, the Operator provides the following options:

  1. pgBackrest repo-based standby
  2. Streaming replication
  3. Combination of (1) and (2)

We will review the repo-based standby as the simplest one:

1. Two Kubernetes clusters in different regions, clouds, or running in hybrid mode (on-prem + cloud). One is Main, and the other is Disaster Recovery (DR).

2. In each cluster, there are the following components:

    1. Percona Operator
    2. PostgreSQL cluster
    3. pgBackrest
    4. pgBouncer

3. pgBackrest on the Main site streams backups and Write Ahead Logs (WALs) to the object storage.

4. pgBackrest on the DR site takes these backups and streams them to the standby cluster.

Configure main site

Use your favorite method to deploy the Operator from our documentation. Once installed, configure the Custom Resource manifest so that pgBackrest starts using the Object Storage of your choice. Skip this step if you already have it configured.

Configure the backups.pgbackrest.repos section by adding the necessary configuration. The below example is for Google Cloud Storage (GCS):

spec:
  backups:
    configuration:
      - secret:
          name: main-pgbackrest-secrets
    pgbackrest:
      repos:
      - name: repo1
        gcs:
          bucket: MY-BUCKET

The main-pgbackrest-secrets Secret contains the keys for GCS; please read more about the configuration in the backup and restore tutorial.

Once configured, apply the custom resource:

$ kubectl apply -f deploy/cr.yaml
perconapgcluster.pg.percona.com/main created

The backups should appear in the object storage. By default, pgBackrest puts them into the pgbackrest folder.

Configure DR site

The configuration of the disaster recovery site is similar to the Main one; the only difference is in the standby settings.

The following manifest has standby.enabled set to true and points to the repoName where backups are (GCS in our case):

metadata:
  name: standby
spec: 
...
  backups:
    configuration:
      - secret:
          name: standby-pgbackrest-secrets
    pgbackrest:
      repos:
      - name: repo1
        gcs:
          bucket: MY-BUCKET
  standby:
    enabled: true
    repoName: repo1

Deploy the standby cluster by applying the manifest:

$ kubectl apply -f deploy/cr.yaml
perconapgcluster.pg.percona.com/standby created

Failover

In case of Main site failure or in other cases, you can promote the standby cluster. Promotion makes the cluster writable, which means it starts pushing Write Ahead Logs (WALs) to the pgBackrest repository. This can create a split-brain situation where two primary instances attempt to write to the same repository. To avoid this, make sure the primary cluster is either deleted or shut down before promoting the standby cluster.

Once the primary is down or inactive, promote the standby by changing the corresponding section:

spec:
  standby:
    enabled: false

Now you can start writing to the cluster.
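
If you would rather script the promotion than edit the manifest by hand, the change can be expressed as a JSON merge patch. The sketch below only builds the kubectl command and does not run it. The resource kind perconapgcluster and the cluster name standby come from the output shown earlier, while the namespace default and both function names are assumptions.

```python
import json
import shlex


def promote_patch() -> str:
    """Return a JSON merge patch that sets spec.standby.enabled to false."""
    return json.dumps({"spec": {"standby": {"enabled": False}}})


def promote_command(cluster: str, namespace: str) -> list:
    """Build (but do not run) a kubectl command that applies the patch."""
    return [
        "kubectl", "patch", "perconapgcluster", cluster,
        "--namespace", namespace,
        "--type", "merge",
        "--patch", promote_patch(),
    ]


if __name__ == "__main__":
    # "default" is an assumed namespace; adjust it to your deployment.
    print(shlex.join(promote_command("standby", "default")))
```

Make sure the old primary is down before running a command like this, for the split-brain reasons described above.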

Split brain

There might be a case where your old primary comes up and starts writing to the repository. To recover from this situation, do the following:

  1. Keep only one primary with the latest data running
  2. Stop the writes on the other one
  3. Take the new full backup from the primary and upload it to the repo

Automating the failover

Automated failover consists of multiple steps and is outside of the Operator’s scope. There are a few steps that you can take to reduce the Recovery Time Objective (RTO). To detect the failure, we recommend having a third site that monitors both DR and Main. In this case, you can be sure that Main really failed, and it is not a network split situation.
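
The decision logic on the monitoring site can be sketched as a small pure function. This is only an illustration of the idea, not something the Operator provides: the function name, the shape of its inputs, and the threshold of three consecutive failures are all assumptions.

```python
def should_promote(main_checks, dr_reachable, required_failures=3):
    """Decide whether the monitoring site should promote the DR cluster.

    main_checks: recent health-check results for Main, oldest first
                 (True means Main was reachable from the monitoring site).
    dr_reachable: whether the DR site currently answers the monitor.
    required_failures: consecutive failed checks needed before acting
                       (an arbitrary illustrative threshold).
    """
    if not dr_reachable:
        # The monitor cannot see DR either, so the problem is more likely
        # on the monitor's side than a real Main outage.
        return False
    recent = list(main_checks)[-required_failures:]
    # Promote only when there are enough samples and all of them failed.
    return len(recent) == required_failures and not any(recent)
```

Requiring several consecutive failures trades a slightly longer RTO for fewer false promotions caused by short network glitches.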

Another aspect of automation is to switch the traffic for the application from Main to Standby after promotion. It can be done through various Kubernetes configurations and heavily depends on how your networking and application are designed. The following options are quite common:

  1. Global Load Balancer – various clouds and vendors provide their solutions
  2. Multi-cluster Services or MCS – available on most of the public clouds
  3. Federation or other multi-cluster solutions

Conclusion

Percona Operator for PostgreSQL provides high availability for database clusters by design, making it a robust and production-ready solution for multi-AZ deployments. At the same time, business continuity protocols require disaster recovery plans in place where your vital processes and applications can survive regional outages. In this blog post, we saw how Kubernetes and Operators can simplify your DR design. Try it out yourself, and let us know your experience at the Community Forum.

For more information, visit Percona Operator for PostgreSQL v2 documentation page. For commercial support, please visit our contact page.

 

Try Percona Operator for PostgreSQL today!

Apr
20
2023
--

Using Encryption-at-Rest for PostgreSQL in Kubernetes

Data-at-rest encryption is essential for compliance with regulations that require the protection of sensitive data. Encryption can help organizations comply with regulations and avoid legal consequences and fines. It is also critical for securing sensitive data and avoiding data breaches.

PostgreSQL does not natively support Transparent Data Encryption (TDE). TDE is a database encryption technique that encrypts data at the column or table level, as opposed to full-disk encryption (FDE), which encrypts the entire disk or volume that the database is stored on.

As for FDE, there are multiple options available for PostgreSQL. In this blog post, you will learn:

  • how to leverage FDE on Kubernetes with Percona Operator for PostgreSQL
  • how to start using encrypted storage for an already running cluster

Prepare

In most public clouds, block storage is not encrypted by default. To enable the encryption of the storage in Kubernetes, you need to modify the StorageClass resource. This will instruct Container Storage Interface (CSI) to provision encrypted storage volume on your block storage (AWS EBS, GCP Persistent Disk, Ceph, etc.).

The configuration of the storage class depends on your storage plugin. For example, in Google Kubernetes Engine (GKE), you need to create the key in Cloud Key Management Service (KMS) and set it in the StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-enc-sc
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-standard
  disk-encryption-kms-key: KMS_KEY_ID

Get KMS_KEY_ID by following the instructions in this document.

For AWS EBS, you just need to add an encrypted field; the key in AWS KMS will be generated automatically.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-enc-sc
provisioner: kubernetes.io/aws-ebs
parameters:
  encrypted: 'true'
  fsType: ext4
  type: gp2
volumeBindingMode: WaitForFirstConsumer

Read more about storage encryption in the documentation of your cloud provider or storage project of your choice.

Try it out

Once you have the StorageClass created, it is time to use it. I will use Percona Operator for PostgreSQL v2 (currently in tech preview) in my tests, but such an approach can be used with any Percona Operator.

Deploy the operator by following our installation instructions. I will use the regular kubectl way:

kubectl apply -f deploy/bundle.yaml --server-side

Create a new cluster with encrypted storage

To create the cluster with encrypted storage, you must set the correct storage class in the Custom Resource.

spec:
  ...
  instances:
  - name: instance1
  ...
    dataVolumeClaimSpec:
      storageClassName: my-enc-sc
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

Apply the custom resource:

kubectl apply -f deploy/cr.yaml

The cluster should be up and running, backed by encrypted storage.

Encrypt storage for existing cluster

This task boils down to switching from one StorageClass to another. With version two of the Operator, we have a notion of instance groups. They are absolutely fantastic for testing new configurations, including compute and storage.

  1. Start with a regular cluster with two nodes – Primary and Replica. Storage is not encrypted. (0-fde-pg.yaml)
  2. Add another instance group with two nodes, but this time with encrypted storage (1-fde-pg.yaml). To do that, we change the spec.instances section:

      - name: instance2
        replicas: 2
        dataVolumeClaimSpec:
          storageClassName: my-enc-sc
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi

  3. Wait for replication to complete and see the traffic hitting new nodes.
  4. Terminate nodes with unencrypted storage by removing the old instance group from the Custom Resource (2-fde-pg.yaml).

Now your cluster runs using encrypted storage.
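
The two manifest changes in this procedure can be sketched as plain dictionary operations on the spec section. This is an illustration under the assumption that the Custom Resource has already been loaded into a Python dict; the helper names are hypothetical and nothing here talks to Kubernetes.

```python
import copy


def add_encrypted_group(spec, name, storage_class, replicas=2, size="1Gi"):
    """Return a copy of `spec` with an extra instance group on `storage_class`."""
    new_spec = copy.deepcopy(spec)
    new_spec.setdefault("instances", []).append({
        "name": name,
        "replicas": replicas,
        "dataVolumeClaimSpec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    })
    return new_spec


def remove_group(spec, name):
    """Return a copy of `spec` without the instance group called `name`."""
    new_spec = copy.deepcopy(spec)
    new_spec["instances"] = [
        group for group in new_spec.get("instances", [])
        if group["name"] != name
    ]
    return new_spec


if __name__ == "__main__":
    start = {"instances": [{"name": "instance1", "replicas": 2}]}
    with_encrypted = add_encrypted_group(start, "instance2", "my-enc-sc")
    # Apply `with_encrypted`, wait for replication, then drop the old group.
    final = remove_group(with_encrypted, "instance1")
    print([group["name"] for group in final["instances"]])
```

Keeping the two changes as separate steps mirrors the procedure: the old group is removed only after the new replicas have caught up.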

Conclusion

It is quite interesting that PostgreSQL does not have built-in data-at-rest encryption. Peter Zaitsev wrote a blog post about why it is needed: Why PostgreSQL Needs Transparent Database Encryption (TDE).

Storage-level encryption allows you to keep your data safe, but it has its limitations. The top limitations are:

  1. You can’t encrypt database objects granularly, only the whole storage.
  2. As a consequence of (1), you cannot encrypt different data with different keys, which might be a blocker for compliance and regulations.
  3. Physical backups, when files are copied from the disk, are not encrypted.

Even with these limitations, encrypting the data is highly recommended. Try out our operator and let us know what you think.

  • Please use this forum for general discussions.
  • Submit a JIRA issue for bugs, improvements, or feature requests.
  • For commercial support, please use our contact page.

The Percona Kubernetes Operators automate the creation, alteration, or deletion of members in your Percona Distribution for MySQL, MongoDB, or PostgreSQL environment.

 

Learn More About Percona Kubernetes Operators

Oct
13
2022
--

Run PostgreSQL in Kubernetes: Solutions, Pros and Cons

PostgreSQL’s initial release was in 1996 when cloud-native was not even a term. Right now it is the second most popular relational open source database according to DB-engines. With its popularity growth and the rising trend of Kubernetes, it is not a surprise that there are multiple solutions to run PostgreSQL on K8s.

In this blog post, we are going to compare these solutions and review the pros and cons of each of them. The solutions under our microscope are:

  1. Crunchy Data PostgreSQL Operator (PGO)
  2. CloudNative PG from EnterpriseDB
  3. Stackgres from OnGres
  4. Zalando Postgres Operator
  5. Percona Operator for PostgreSQL

The summary and comparison table can be found in our documentation.

Crunchy Data PGO

Crunchy Data is a company well-known in the PostgreSQL community. They provide a wide range of services and software solutions for PG. Their PostgreSQL Operator (PGO) is fully open source (Apache 2.0 license), but at the same time container images used by the operator are shipped under Crunchy Data Developer Program. This means that you cannot use the Operator with these images in production without the contract with Crunchy Data. Read more in the Terms of Use.

Deployment

According to the documentation, the latest version of the operator is 5.2.0, but the latest tag on GitHub is 4.7.7. I was not able to find which version is ready for production, so I will use the quickstart installation from the GitHub page, which installs 5.2.0. The quick start is not that quick. First, you need to fork the repository with examples: link.

Executing these commands failed for me:

YOUR_GITHUB_UN="<your GitHub username>"
git clone --depth 1 "git@github.com:${YOUR_GITHUB_UN}/postgres-operator-examples.git"
cd postgres-operator-examples

Cloning into 'postgres-operator-examples'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

I just ended up cloning the repo with 

git clone --depth 1 https://github.com/spron-in/postgres-operator-examples

I then ran the kustomize command, which failed as well:

$ kubectl apply -k kustomize/install
error: unable to find one of 'kustomization.yaml', 'kustomization.yml' or 'Kustomization' in directory '/home/percona/postgres-operator-examples/kustomize/install'

The instructions on the documentation page have other commands, so I used them instead. As a person who loves open source, I sent a PR to fix the doc on GitHub.

kubectl apply -k kustomize/install/namespace
kubectl apply --server-side -k kustomize/install/default

Now the Operator is installed. Install the cluster:

kubectl apply -k kustomize/postgres/

Features

PGO is used in production by various companies, comes with management capabilities, and allows users to fine-tune PostgreSQL clusters.

No need to go through the regular day-two operations, like backups and scaling. The following features are quite interesting:

  • Extension Management. PostgreSQL extensions expand the capabilities of the database. With PGO, you can easily add extensions for your cluster and configure them during bootstrap. I like the simplicity of this approach.
  • User / database management. Create users and databases during cluster initialization. This is very handy for CI/CD pipelines and various automations.
  • Backup with annotations. Usually, Operators come with a separate Custom Resource Definition for backups and restores. In the case of PGO, backups and restores are managed through annotations. This is an antipattern but still follows the declarative form.

CloudNative PG

This operator matured inside EnterpriseDB (EDB) before finally being open-sourced recently. It is Apache-licensed and fully open source, and there is also an EDB Postgres operator, which is a fork based on CloudNative PG. The Enterprise version has some additional features, for example, support for Red Hat OpenShift.

Deployment

Using quickstart, here is how to install the Operator:

kubectl apply -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.17/releases/cnpg-1.17.0.yaml

It automatically creates the cnpg-system namespace and deploys the necessary CRDs, service accounts, and more.

Once done, you can deploy the PostgreSQL cluster. There are multiple example YAMLs.

kubectl apply -f https://cloudnative-pg.io/documentation/1.17/samples/cluster-example.yaml

There is also a helm chart available that can simplify the installation even more.

Features

CloudNative PG comes with a wide range of regular operational capabilities: backups, scaling, and upgrades. The architecture of the Operator is quite interesting:

  • No StatefulSets. Normally, you would see StatefulSets used for stateful workloads in Kubernetes. Here PostgreSQL cluster is deployed with standalone Pods which are fully controlled by the Operator.
  • No Patroni. Patroni is a de-facto standard in the PostgreSQL community to build highly available clusters. Instead, they use Postgres instance manager.
  • Barman for backups. Not a usual choice either, but it can be explained by the fact that Barman, a backup tool for PostgreSQL, was developed by the 2ndQuadrant team, which was acquired by EDB.

Apart from architecture decisions, there are some things that I found quite refreshing:

  • Documentation. As a product manager, I’m honestly fascinated by their documentation. It is very detailed and full of examples covering a wide variety of use cases.
  • The custom resource which is used to create the cluster is called “Cluster”. It is a bit weird, but running something like kubectl get cluster is kinda cool.
  • You can bootstrap the new cluster from an existing backup object and use streaming replication from an existing PostgreSQL cluster, even one outside Kubernetes. Useful for CI/CD and migrations.

Stackgres

OnGres is a company providing support, professional, and managed services for PostgreSQL. The operator – Stackgres – is licensed under AGPL v3.

Deployment

Installation is super simple and described on the website. It boils down to a single command:

kubectl apply -f 'https://sgres.io/install'

This will deploy the web user interface and the operator. The recommended way to deploy and manage clusters is through the UI. Get the login and password:

kubectl get secret -n stackgres stackgres-restapi --template '{{ printf "username = %s\n" (.data.k8sUsername | base64decode) }}'
kubectl get secret -n stackgres stackgres-restapi --template '{{ printf "password = %s\n" (.data.clearPassword | base64decode) }}'

Connect to the UI. You can either expose the UI through a LoadBalancer or with Kubernetes port forwarding:

POD_NAME=$(kubectl get pods --namespace stackgres -l "app=stackgres-restapi" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward ${POD_NAME} --address 0.0.0.0 8443:9443 --namespace stackgres

Deployment of the cluster in the UI is quite straightforward and I will not cover it here.

Features

UI allows users to scale, backup, restore, clone, and perform various other tasks with the clusters. I found it a bit hard to debug issues. It is recommended to set up a log server and debug issues on it, but I have not tried it. But the UI itself is mature, flexible, and just nice!

Some interesting features:

  • Experimental Babelfish support that enables the migration from MSSQL to save on license costs.
  • Extension management system, where users can choose the extension and its version to expand PG cluster capabilities.
  • To perform upgrades, Vacuum, and other database activities, the Operator provides Database Operation capability. It also has built-in benchmarking, which is cool!

Zalando Postgres Operator

Zalando is an online retailer of shoes, fashion, and beauty. It is the only company in this blog post that is not database-focused. They open-sourced the Operator that they use internally to run and manage PostgreSQL databases and it is quite widely adopted. It is worth mentioning that the Zalando team developed and open-sourced Patroni, which is widely adopted and used.

Deployment

You can deploy Zalando Operator through a helm chart or with kubectl. Same as with Stackgres, this Operator has a built-in web UI.

Helm chart installation is the quickest and easiest way to get everything up and running:

# add repo for postgres-operator
helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator

# install the postgres-operator
helm install postgres-operator postgres-operator-charts/postgres-operator

# add repo for postgres-operator-ui
helm repo add postgres-operator-ui-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator-ui

# install the postgres-operator-ui
helm install postgres-operator-ui postgres-operator-ui-charts/postgres-operator-ui

Expose the UI:

kubectl port-forward svc/postgres-operator-ui 8081:80

Connect to the UI and create the cluster. 

Features

This is one of the oldest PostgreSQL Operators, and its functionality has expanded over time. It supports backups and restores, major version upgrades, and much more. Also, it has a web-based user interface to ease onboarding.

  • The operator heavily relies on Spilo – docker image that provides PostgreSQL and Patroni bundled together. It was developed in Zalando as well. This is a centerpiece to build HA architecture.
  • As Zalando is using AWS for its infrastructure, the operator is heavily tested and can be integrated with AWS. You can see it in some features – like live volume resize for AWS EBS or gp2 to gp3 migration.

Percona Operator for PostgreSQL

Percona is committed to providing software and services for databases anywhere. Kubernetes is a de-facto standard for cloud-native workloads that helps with this commitment.

The most important things about our Operator:

  • Fully open source
  • Supported by the community and Percona team. If you have a contract with Percona, you are fully covered with our exceptional services.
  • It is based on the Crunchy Data PGO v 4.7 with enhancements for monitoring, upgradability, and flexibility

Deployment

We have quick-start installation guides through helm and regular YAML manifests. The installation through helm is as follows:

Install the Operator:

helm repo add percona https://percona.github.io/percona-helm-charts/
helm install my-operator percona/pg-operator --version 1.3.0

Deploy PostgreSQL cluster:

helm install my-db percona/pg-db --version 1.3.0

Features

Most of the features are inherited from Crunchy Data – backups, scaling, multi-cluster replication, and many more. 

  • Open Source. Compared to Crunchy Data PGO, we do not impose any limitations on container images, so it is fully open source and can be used without any restrictions in production.
  • Percona Monitoring and Management (PMM) is an open source database monitoring, observability, and management tool. Percona Operators come with an integration with PMM, so that users get full visibility into the health of their databases.
  • Automated Smart Upgrades. Our Operator not only allows users to upgrade the database but also does it automatically and in a safe, zero-downtime way.
  • One-stop shop. Today’s enterprise environment is multi-database by default. Percona can help companies run PostgreSQL, MySQL, and MongoDB database workloads over Kubernetes in a comprehensive manner.

To keep you excited, we are working on version two of the operator. It will have an improved architecture, remove existing limitations for backups and restores, enable automated scaling for storage and resources, and more. We plan to release a beta version this quarter, so keep an eye on our releases.

Conclusion

PostgreSQL in Kubernetes is not a necessary evil, but an evolutionary step for companies who chose k8s as their platform. Choosing a vendor and a solution is an important technical decision, which might impact various business metrics in the future. Still confused by the various choices? Please start a discussion on the forum or contact our team directly.

Mar
15
2022
--

Run PostgreSQL on Kubernetes with Percona Operator & Pulumi

Avoid vendor lock-in, provide a private Database-as-a-Service for internal teams, quickly deploy-test-destroy databases with CI/CD pipeline – these are some of the most common use cases for running databases on Kubernetes with operators. Percona Distribution for PostgreSQL Operator enables users to do exactly that and more.

Pulumi is an infrastructure-as-code tool, which enables developers to write code in their favorite language (Python, Golang, JavaScript, etc.) to deploy infrastructure and applications easily to public clouds and platforms such as Kubernetes.

This blog post is a step-by-step guide on how to deploy a highly-available PostgreSQL cluster on Kubernetes with our Percona Operator and Pulumi.

Desired State

We are going to provision the following resources with Pulumi:

  • Google Kubernetes Engine cluster with three nodes. It can be any Kubernetes flavor.
  • Percona Operator for PostgreSQL
  • Highly available PostgreSQL cluster with one primary and two hot standby nodes
  • Highly available pgBouncer deployment with the Load Balancer in front of it
  • pgBackRest for local backups

Pulumi code can be found in this git repository.

Prepare

I will use the Ubuntu box to run Pulumi, but almost the same steps would work on macOS.

Pre-install Packages

gcloud and kubectl

echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
sudo apt-get update
sudo apt-get install -y google-cloud-sdk docker.io kubectl jq unzip

python3

Pulumi allows developers to use the language of their choice to describe infrastructure and applications. I’m going to use Python. We will also need pip (the Python package-management system) and venv (the virtual environment module).

sudo apt-get install python3 python3-pip python3-venv

Pulumi

Install Pulumi:

curl -sSL https://get.pulumi.com | sh

On macOS, this can be installed via Homebrew with

brew install pulumi

 

You will need to add .pulumi/bin to the $PATH:

export PATH=$PATH:$HOME/.pulumi/bin

Authentication

gcloud

You will need to provide access to Google Cloud to provision Google Kubernetes Engine.

gcloud config set project your-project
gcloud auth application-default login
gcloud auth login

Pulumi

Generate a Pulumi token at app.pulumi.com. You will need it later to init the Pulumi stack.

Action

This repo has the following files:

  • Pulumi.yaml – identifies that it is a folder with Pulumi project
  • __main__.py – python code used by Pulumi to provision everything we need
  • requirements.txt – to install required python packages

Clone the repo and go to the pg-k8s-pulumi folder:

git clone https://github.com/spron-in/blog-data
cd blog-data/pg-k8s-pulumi

Init the stack with:

pulumi stack init pg

Here you will need the key that you generated earlier on app.pulumi.com.

__main__.py

Python code that Pulumi is going to process is in __main__.py file. 

Lines 1-6: importing python packages

Lines 8-31: configuration parameters for this Pulumi stack. It consists of two parts:

  • Kubernetes cluster configuration. For example, the number of nodes.
  • Operator and PostgreSQL cluster configuration – namespace to be deployed to, service type to expose pgBouncer, etc.

Lines 33-80: deploy GKE cluster and export its configuration

Lines 82-88: create the namespace for Operator and PostgreSQL cluster

Lines 91-426: deploy the Operator. In reality, it just mirrors the operator.yaml from our Operator.

Lines 429-444: create the secret object that allows you to set the password for pguser to connect to the database

Lines 445-557: deploy PostgreSQL cluster. It is a JSON version of cr.yaml from our Operator repository

Line 560: exports Kubernetes configuration so that it can be reused later 

Deploy

First, set the configuration for this stack by executing the following commands:

pulumi config set gcp:project YOUR_PROJECT
pulumi config set gcp:zone us-central1-a
pulumi config set node_count 3
pulumi config set master_version 1.21

pulumi config set namespace percona-pg
pulumi config set pg_cluster_name pulumi-pg
pulumi config set service_type LoadBalancer
pulumi config set pg_user_password mySuperPass

These commands set the following:

  • GCP project where GKE is going to be deployed
  • GCP zone 
  • Number of nodes in a GKE cluster
  • Kubernetes version
  • Namespace to run PostgreSQL cluster
  • The name of the cluster
  • Service type used to expose pgBouncer (here, a LoadBalancer object)
  • Password for pguser
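
It helps to know how these keys are stored. Keys that already carry a namespace, such as gcp:project, are kept as they are, while bare keys such as node_count are prefixed with the project name before they are written to the stack settings file (Pulumi.pg.yaml for a stack named pg). The sketch below only mimics that naming rule in plain Python. The project name percona-pg-k8s is taken from the pulumi up output in this post, and the helper qualify() is a hypothetical name used for illustration.

```python
# Sketch: how Pulumi namespaces the keys passed to `pulumi config set`.
# PROJECT is assumed from the `pulumi up` output (percona-pg-k8s-pg = project + stack).
PROJECT = "percona-pg-k8s"


def qualify(key: str, project: str = PROJECT) -> str:
    """Return the fully qualified config key for `key` (hypothetical helper)."""
    # Keys with an explicit namespace (e.g. gcp:project) are kept as-is;
    # bare keys are prefixed with the project name.
    return key if ":" in key else f"{project}:{key}"


# The same settings as in the commands above; config values are stored as strings.
settings = {
    "gcp:project": "YOUR_PROJECT",
    "gcp:zone": "us-central1-a",
    "node_count": "3",
    "master_version": "1.21",
    "namespace": "percona-pg",
    "pg_cluster_name": "pulumi-pg",
    "service_type": "LoadBalancer",
    "pg_user_password": "mySuperPass",
}

stack_config = {qualify(key): value for key, value in settings.items()}

for key, value in stack_config.items():
    print(f"{key}: {value}")
```

This is why the provider settings (gcp:*) and the settings read by __main__.py can live side by side in one stack without clashing.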

Deploy with the following command:

$ pulumi up
Previewing update (pg)

View Live: https://app.pulumi.com/spron-in/percona-pg-k8s/pg/previews/d335d117-b2ce-463b-867d-ad34cf456cb3

     Type                                                           Name                                Plan       Info
 +   pulumi:pulumi:Stack                                            percona-pg-k8s-pg                   create     1 message
 +   ├─ random:index:RandomPassword                                 pguser_password                     create
 +   ├─ random:index:RandomPassword                                 password                            create
 +   ├─ gcp:container:Cluster                                       gke-cluster                         create
 +   ├─ pulumi:providers:kubernetes                                 gke_k8s                             create
 +   ├─ kubernetes:core/v1:ServiceAccount                           pgoPgo_deployer_saServiceAccount    create
 +   ├─ kubernetes:core/v1:Namespace                                pgNamespace                         create
 +   ├─ kubernetes:batch/v1:Job                                     pgoPgo_deployJob                    create
 +   ├─ kubernetes:core/v1:ConfigMap                                pgoPgo_deployer_cmConfigMap         create
 +   ├─ kubernetes:core/v1:Secret                                   percona_pguser_secretSecret         create
 +   ├─ kubernetes:rbac.authorization.k8s.io/v1:ClusterRoleBinding  pgo_deployer_crbClusterRoleBinding  create
 +   ├─ kubernetes:rbac.authorization.k8s.io/v1:ClusterRole         pgo_deployer_crClusterRole          create
 +   └─ kubernetes:pg.percona.com/v1:PerconaPGCluster               my_cluster_name                     create

Diagnostics:
  pulumi:pulumi:Stack (percona-pg-k8s-pg):
    E0225 14:19:49.739366105   53802 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies

Do you want to perform this update? yes

Updating (pg)
View Live: https://app.pulumi.com/spron-in/percona-pg-k8s/pg/updates/5
     Type                                                           Name                                Status      Info
 +   pulumi:pulumi:Stack                                            percona-pg-k8s-pg                   created     1 message
 +   ├─ random:index:RandomPassword                                 pguser_password                     created
 +   ├─ random:index:RandomPassword                                 password                            created
 +   ├─ gcp:container:Cluster                                       gke-cluster                         created
 +   ├─ pulumi:providers:kubernetes                                 gke_k8s                             created
 +   ├─ kubernetes:core/v1:ServiceAccount                           pgoPgo_deployer_saServiceAccount    created
 +   ├─ kubernetes:core/v1:Namespace                                pgNamespace                         created
 +   ├─ kubernetes:core/v1:ConfigMap                                pgoPgo_deployer_cmConfigMap         created
 +   ├─ kubernetes:batch/v1:Job                                     pgoPgo_deployJob                    created
 +   ├─ kubernetes:core/v1:Secret                                   percona_pguser_secretSecret         created
 +   ├─ kubernetes:rbac.authorization.k8s.io/v1:ClusterRole         pgo_deployer_crClusterRole          created
 +   ├─ kubernetes:rbac.authorization.k8s.io/v1:ClusterRoleBinding  pgo_deployer_crbClusterRoleBinding  created
 +   └─ kubernetes:pg.percona.com/v1:PerconaPGCluster               my_cluster_name                     created

Diagnostics:
  pulumi:pulumi:Stack (percona-pg-k8s-pg):
    E0225 14:20:00.211695433   53839 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies

Outputs:
    kubeconfig: "[secret]"

Resources:
    + 13 created

Duration: 5m30s

Verify

Get kubeconfig first:

pulumi stack output kubeconfig --show-secrets > ~/.kube/config

Check if Pods of your PG cluster are up and running:

$ kubectl -n percona-pg get pods
NAME                                             READY   STATUS      RESTARTS   AGE
backrest-backup-pulumi-pg-dbgsp                  0/1     Completed   0          64s
pgo-deploy-8h86n                                 0/1     Completed   0          4m9s
postgres-operator-5966f884d4-zknbx               4/4     Running     1          3m27s
pulumi-pg-787fdbd8d9-d4nvv                       1/1     Running     0          2m12s
pulumi-pg-backrest-shared-repo-f58bc7657-2swvn   1/1     Running     0          2m38s
pulumi-pg-pgbouncer-6b6dc4564b-bh56z             1/1     Running     0          81s
pulumi-pg-pgbouncer-6b6dc4564b-vpppx             1/1     Running     0          81s
pulumi-pg-pgbouncer-6b6dc4564b-zkdwj             1/1     Running     0          81s
pulumi-pg-repl1-58d578cf49-czm54                 0/1     Running     0          46s
pulumi-pg-repl2-7888fbfd47-h98f4                 0/1     Running     0          46s
pulumi-pg-repl3-cdd958bd9-tf87k                  1/1     Running     0          46s
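
In the output above, the two Job pods are Completed and two replicas still show 0/1 in the READY column, so the cluster needs a little more time. If you want to script this check, one possible approach is to parse the output of kubectl -n percona-pg get pods -o json. The sketch below assumes that JSON has already been loaded into a dict. The function name not_ready_pods() and the trimmed sample data are hypothetical; only the field names (status.phase, status.containerStatuses[].ready) come from the Kubernetes Pod API.

```python
from typing import List


def not_ready_pods(pod_list: dict) -> List[str]:
    """Return names of pods that are neither completed nor fully ready."""
    pending = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Finished Job pods (backup, pgo-deploy) report phase "Succeeded"; skip them.
        if status.get("phase") == "Succeeded":
            continue
        containers = status.get("containerStatuses", [])
        # A pod counts as ready only if every container reports ready=True.
        if not containers or not all(c.get("ready") for c in containers):
            pending.append(pod["metadata"]["name"])
    return pending


# Hypothetical, heavily trimmed sample of `kubectl get pods -o json` output.
sample = {
    "items": [
        {"metadata": {"name": "pgo-deploy-8h86n"},
         "status": {"phase": "Succeeded",
                    "containerStatuses": [{"ready": False}]}},
        {"metadata": {"name": "pulumi-pg-repl1-58d578cf49-czm54"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"ready": False}]}},
        {"metadata": {"name": "pulumi-pg-repl3-cdd958bd9-tf87k"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"ready": True}]}},
    ]
}

print(not_ready_pods(sample))
```

An empty list means every long-running pod is ready.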

Get the IP address of the pgBouncer LoadBalancer:

$ kubectl -n percona-pg get services
NAME                             TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
…
pulumi-pg-pgbouncer              LoadBalancer   10.20.33.122   35.188.81.20   5432:32042/TCP               3m17s
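
The EXTERNAL-IP column can also be read programmatically. A minimal sketch, assuming the output of kubectl -n percona-pg get service pulumi-pg-pgbouncer -o json has been loaded into a dict: the address sits under status.loadBalancer.ingress. The helper name load_balancer_ip() and the sample dict are hypothetical.

```python
from typing import Optional


def load_balancer_ip(service: dict) -> Optional[str]:
    """Return the first external IP of a LoadBalancer Service, or None if not assigned yet."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        # Some clouds publish a hostname instead of an IP; this sketch looks at IPs only.
        if entry.get("ip"):
            return entry["ip"]
    return None


# Hypothetical, trimmed sample of `kubectl get service ... -o json` output.
sample_service = {
    "metadata": {"name": "pulumi-pg-pgbouncer"},
    "spec": {"type": "LoadBalancer"},
    "status": {"loadBalancer": {"ingress": [{"ip": "35.188.81.20"}]}},
}

print(load_balancer_ip(sample_service))
```

While the cloud provider is still provisioning the load balancer, the ingress list is empty and the function returns None.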

You can connect to your PostgreSQL cluster through this IP address. Use the pguser password that was set earlier with pulumi config set pg_user_password:

psql -h 35.188.81.20 -p 5432 -U pguser pgdb

Clean up

To delete everything, run the following commands:

pulumi destroy
pulumi stack rm

Tricks and Quirks

Pulumi Converter

kube2pulumi is a huge help if you already have YAML manifests. You don’t need to rewrite everything by hand; you can convert the YAML to Pulumi code instead. This is what I did for operator.yaml.

apiextensions.CustomResource

There are two ways to manage Custom Resources in Pulumi:

crd2pulumi generates libraries/classes out of Custom Resource Definitions and allows you to create custom resources later using these. I found it a bit complicated and it also lacks documentation.

apiextensions.CustomResource on the other hand allows you to create Custom Resources by specifying them as JSON. It is much easier and requires less manipulation. See lines 446-557 in my __main__.py.

True/False in JSON

I have the following in my Custom Resource definition in Pulumi code:

perconapg = kubernetes.apiextensions.CustomResource(
…
    spec= {
…
    "disableAutofail": False,
    "tlsOnly": False,
    "standby": False,
    "pause": False,
    "keepData": True,

Be sure to use the native boolean type of your language and not the “true”/“false” strings. When I used strings, the deployment failed because the Operator expected booleans, not strings.
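
The difference is easy to see once the spec is serialized. In the sketch below, json.dumps only stands in for whatever serialization happens on the way to the Kubernetes API, and string_booleans() is a hypothetical helper that reports where a string was used in place of a boolean.

```python
import json


def string_booleans(obj, path=""):
    """Return the paths of values that are the strings "true"/"false" instead of booleans."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            found += string_booleans(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            found += string_booleans(value, f"{path}[{index}]")
    elif isinstance(obj, str) and obj.lower() in ("true", "false"):
        found.append(path)
    return found


spec_ok = {"pause": False, "keepData": True}       # Python booleans
spec_bad = {"pause": "false", "keepData": "true"}  # strings that only look like booleans

print(json.dumps(spec_ok))   # {"pause": false, "keepData": true}
print(json.dumps(spec_bad))  # {"pause": "false", "keepData": "true"}
print(string_booleans(spec_bad))  # ['pause', 'keepData']
```

Python’s True and False become JSON true and false, while the strings stay quoted, which is what a schema expecting booleans rejects.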

Depends On…

Pulumi makes its own decisions on the order in which resources are provisioned. You can enforce the order by specifying dependencies.

For example, I’m ensuring that Operator and Secret are created before the Custom Resource:

    },opts=ResourceOptions(provider=k8s_provider,depends_on=[pgo_pgo_deploy_job,percona_pg_cluster1_pguser_secret_secret])
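
To illustrate what declaring dependencies does to the ordering, here is a toy model built on Python’s standard graphlib module. It is not how Pulumi’s engine is implemented, and the resource names are shortened, hypothetical stand-ins for the Job, Secret, and Custom Resource from this post.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each resource to the set of resources it depends on (hypothetical short names).
depends_on = {
    "operator_job": set(),
    "pguser_secret": set(),
    "pg_cluster": {"operator_job", "pguser_secret"},
}

# static_order() yields every resource after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

The Job and the Secret do not depend on each other, so they may appear in either order, but the cluster always comes last.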
