Jun
07
2022
--

Migration of MongoDB Enterprise/Community Edition to Percona Server for MongoDB


In this blog post, we will discuss how we can migrate from the enterprise/community edition of MongoDB to Percona Server for MongoDB. But before we begin, let’s take a second to explain why you should migrate to Percona Server for MongoDB.

Percona Distribution for MongoDB is a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together. Percona customers benefit from no lock-in and lower total cost of ownership, along with the freedom to run their MongoDB environment wherever they want to – in a public or private cloud, on-premises, or hybrid environment.

Percona Server for MongoDB offers the same, or equivalent, security features as MongoDB Enterprise without the price tag, and Percona experts are always available to help, bringing in-depth operational knowledge of MongoDB and open source tools so you can optimize database performance. If you’d like to learn more, please click here.

Anyway, let’s get back to the purpose of the blog: migrating from the enterprise/community edition of MongoDB to Percona Server for MongoDB. 

Before starting the migration process it’s recommended that you perform a full backup (if you don’t have one already). See this post for MongoDB backup best practices.

The migration procedure:

  1. Backup the config files of the Mongo process.
  2. Stop the Mongo process. If it’s a replica set, then do it in a rolling fashion.
  3. Remove the package of MongoDB community/enterprise edition. For the replica set, do it in a rolling fashion.
  4. Install the Percona Server for MongoDB (PSMDB). It can be downloaded from here. Do it in a rolling fashion for a replica set and start the Mongo service.

Detailed migration plan:

Migrate standalone MongoDB community/enterprise edition to Percona Server for MongoDB (Debian/RHEL):

   1. Backup the mongo config and service file:

sudo cp /etc/mongod.conf /etc/mongod.conf_bkp

Debian:

sudo cp /lib/systemd/system/mongod.service /lib/systemd/system/mongod.service_bkp

RHEL/CentOS:

sudo cp /usr/lib/systemd/system/mongod.service /usr/lib/systemd/system/mongod.service_bkp

   2. Stop mongo service first and then remove mongodb-community/enterprise packages and repo:

To stop the mongo service, connect to the admin database and shut down as below:

>use admin

>db.shutdownServer()

Remove the package:

Debian:

sudo apt-get remove mongodb-org mongodb-org-mongos mongodb-org-server mongodb-org-shell mongodb-org-tools
sudo rm /etc/apt/sources.list.d/mongodb-org-4.0.list

RHEL/CentOS:

sudo yum erase $(rpm -qa | grep mongodb-org)

If the deployment is managed by Ops Manager, then:

   a. Unmanage the project in the Ops Manager GUI.

   b. Make sure to uncheck enforce users in the Ops Manager GUI.

   c. Disable the automation agent service:

sudo systemctl disable mongodb-mms-automation-agent

   d. Remove the automation agent package:

Debian:

sudo apt remove mongodb-mms-automation-agent

RHEL/CentOS:

sudo yum erase mongodb-mms-automation-agent

   3. Configure percona repo and install Percona Server for MongoDB (PSMDB):

Debian:

sudo wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb

sudo dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb

Enable the repo:

sudo percona-release enable psmdb-44 release

sudo apt-get update

Install the package:

sudo apt-get install percona-server-mongodb

RHEL/CentOS:

sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm

Enable the repo:

sudo percona-release setup pdmdb-44

Install the package:

sudo yum install percona-server-mongodb

   4. Copy back the mongod config and service file:

sudo cp /etc/mongod.conf_bkp /etc/mongod.conf

Debian:

sudo cp /lib/systemd/system/mongod.service_bkp /lib/systemd/system/mongod.service

RHEL/CentOS:

sudo cp /usr/lib/systemd/system/mongod.service_bkp /usr/lib/systemd/system/mongod.service

NOTE: Check that the permissions and ownership of the data directory, keyfile, and log directory are correct for the mongod user.
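
For example, assuming default-style paths of /var/lib/mongo for dbPath, /var/log/mongodb for logs, and /etc/mongod-keyfile for the keyfile (adjust these to whatever your mongod.conf actually uses), ownership and permissions can be set like this:

sudo chown -R mongod:mongod /var/lib/mongo /var/log/mongodb
sudo chown mongod:mongod /etc/mongod-keyfile
sudo chmod 400 /etc/mongod-keyfile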

Also, if the SELinux policy is enabled, then set the necessary SELinux policy for dbPath, keyFile, and logs as below:

sudo semanage fcontext -a -t mongod_var_lib_t '/dbPath/mongod.*'
sudo chcon -Rv -u system_u -t mongod_var_lib_t '/dbPath/mongod'
sudo restorecon -R -v '/dbPath/mongod'
sudo semanage fcontext -a -t mongod_log_t '/logPath/log.*'
sudo chcon -Rv -u system_u -t mongod_log_t '/logPath/log'
sudo restorecon -R -v '/logPath/log'

   5. Enable and start mongod service:

sudo systemctl daemon-reload
sudo systemctl enable mongod
sudo systemctl start mongod
sudo systemctl status mongod

Migrate Replica set MongoDB Enterprise/Community edition to Percona Server for MongoDB (Debian/RHEL):

This migration process involves stopping the Mongo process in the hidden/secondary node first, removing the MongoDB community/enterprise edition packages, installing Percona Server for MongoDB, and starting it with the same data files. Then, step down the current primary node and repeat the same process.

   a. Make sure to check the current Primary and Secondary/hidden nodes.

db.isMaster().primary

   b. Start with the hidden node first (if there is no hidden node, then start with the secondary node that has the lowest priority).

   c. Repeat steps from 1 to 5 from the section Migrate standalone MongoDB community/enterprise edition to Percona Server for MongoDB (Debian/RHEL).

   d. Wait for each node to be synced with Primary. Verify it with 

rs.printSecondaryReplicationInfo()

   e. Once completed for all secondary nodes, step down the current primary with

rs.stepDown()

   f. Wait for the new node to be elected as a Primary node and repeat steps 1 to 5 from the section Migrate standalone MongoDB community/enterprise edition to Percona Server for MongoDB (Debian/RHEL) and wait for the former primary node to be synced with the newly elected Primary.

Migrate Sharded cluster MongoDB community/enterprise edition to Percona Server for MongoDB (Debian/RHEL):

   1. Stop the balancer first:

sh.stopBalancer()
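
Optionally, you can confirm that the balancer has actually stopped before proceeding (these are standard mongos shell helpers, not part of the original steps):

sh.getBalancerState()    // should return false once the balancer is disabled
sh.isBalancerRunning()   // should return false once the current balancing round has finished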

   2. Back up the configuration and service files for shards, CSRS, and Query router.

Backup the mongo config and service file for shards and CSRS:

sudo cp /etc/mongod.conf /etc/mongod.conf_bkp

Debian:

sudo cp /lib/systemd/system/mongod.service /lib/systemd/system/mongod.service_bkp

For router:

sudo cp /etc/mongos.conf /etc/mongos.conf_bkp

sudo cp /lib/systemd/system/mongos.service /lib/systemd/system/mongos.service_bkp

RHEL/CentOS:

sudo cp /usr/lib/systemd/system/mongod.service /usr/lib/systemd/system/mongod.service_bkp

For router:

sudo cp /etc/mongos.conf /etc/mongos.conf_bkp
sudo cp /usr/lib/systemd/system/mongos.service /usr/lib/systemd/system/mongos.service_backup

   3. Start with the hidden node of the CSRS first (if there is no hidden node then start with one of the secondary nodes).

Repeat steps a to f from the section Migrate Replica set MongoDB community/enterprise edition to Percona Server for MongoDB (Debian/RHEL)

Once the CSRS is migrated to Percona Server for MongoDB, move on to the shards. Repeat steps a to f from the section Migrate Replica set MongoDB community/enterprise edition to Percona Server for MongoDB (Debian/RHEL).

   4. After the migration of the CSRS and shards, start migrating the mongos. Connect to one router at a time, execute the steps below, and then repeat them on the remaining routers.

   5. Stop mongo service and then remove mongodb-community/enterprise packages and repo:

sudo systemctl stop mongos

Debian:

sudo apt-get remove mongodb-org mongodb-org-mongos mongodb-org-server mongodb-org-shell mongodb-org-tools
sudo rm /etc/apt/sources.list.d/mongodb-org-4.0.list

RHEL/CentOS:

sudo yum erase $(rpm -qa | grep mongodb-org)

   6. Configure percona repos and install Percona Server for MongoDB (PSMDB):

Debian:

sudo wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb

Enable the repo:

sudo percona-release enable psmdb-44 release
sudo apt-get update

Install the package:

sudo apt-get install percona-server-mongodb

RHEL/CentOS:

sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm

Enable the repo:

sudo percona-release setup pdmdb-44

Install the package:

sudo yum install percona-server-mongodb

Copy back the config and service file:

sudo cp /etc/mongos.conf_bkp /etc/mongos.conf

Debian:

sudo cp /lib/systemd/system/mongos.service_bkp /lib/systemd/system/mongos.service

RHEL/CentOS:

sudo cp /usr/lib/systemd/system/mongos.service_bkp /usr/lib/systemd/system/mongos.service

NOTE: Check that the permissions and ownership of the keyfile and log directory are correct for the mongod user.

   7. Enable and start mongos service:

sudo systemctl daemon-reload
sudo systemctl enable mongos
sudo systemctl start mongos
sudo systemctl status mongos

   8. Re-enable the balancer:

sh.startBalancer()

Conclusion:

To learn more about the enterprise-grade features available in the license-free Percona Server for MongoDB, we recommend going through our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered? 

We also encourage you to try our products for MongoDB like Percona Server for MongoDB, Percona Backup for MongoDB, or Percona Operator for MongoDB.

Jun
02
2022
--

How 3 Companies Used Percona to Boost Performance, Availability, and Market Value

Percona Maintains Optimal Customer Health

At Percona, we love collaborating with companies to achieve their goals – no matter how ambitious. With each new customer, we get the chance to apply our open source expertise and technology to solve complex and fascinating database challenges. We’re grateful to work with so many amazing companies and want to share some recent success stories.

Appsuite

Appsuite is a full-scale SaaS provider for hospitality businesses. The company serves as a customer relationship management solution, offering point-of-sale integrated CRM; loyalty, reward, and gift card programs; and online ordering applications. At the onset of COVID-19, Appsuite recognized that, in addition to helping their existing customers adjust to rapid changes resulting from the pandemic, there were new opportunities to create value as a SaaS provider. To expand their business and serve new segments, they needed to scale their operations to support customers with a maximum of 30 locations to those with 800 locations or more. 

The Percona Consulting team guided Appsuite through the design and implementation of a scalable, highly reliable open source database architecture. This included a move to Percona’s open source database software, Percona Server for MySQL. Appsuite continues to collaborate with Percona on critical support projects for their production database systems.

“Percona has the best engineers in the world. Percona provides us with immediate 24×7 access to world-class MySQL engineering support. Avoiding the need to directly hire a skilled DBA has saved us a considerable amount of money.” – Ross Hunton, Appsuite CTO

Otto Office

Otto Office is one of the largest business-to-business distance retailers for office supplies in Germany, specializing in mail-order sales, communications equipment, and office furniture to businesses and consumers. Their MySQL database cluster is the central point of their online sales system and requires a fast and reliable system architecture.

Otto Office was already using cluster functionality via Percona XtraDB Cluster to provide quick data synchronization over their different data centers, but they also wanted to improve database latency and architecture. Otto Office again partnered with Percona, this time to review their database environment and recommend improvements for making their cluster more stable, performant, and highly available, including tolerance of data center failure. Following this consulting engagement, Otto Office chose Percona to provide database support in case of an emergency.

“Percona consultants are database experts that focus on stable solutions. The expertise of the Percona consultants fitted perfectly with our team and led to good exchanges that found fast solutions.” – Sven Jacobsen, Director IT & eCommerce, Otto Office

BlaBlaCar

BlaBlaCar is the world’s leading community-based travel network. Headquartered in Paris, France, the company enables more than 100 million members to share rides across 22 countries via its website and mobile apps. To support customer demand for flexible commute scheduling, BlaBlaCar must maintain a fast and reliable system architecture.

Working closely with the Percona Support team, BlaBlaCar used MariaDB to implement performance improvements to increase the uptime of their mission-critical applications. BlaBlaCar chose Percona because of its transparency, unbiased nature, and dedication to the open source database world, believing that the DNA of Percona matches their philosophy of always “choosing the right tool for the job.”

“BlaBlaCar’s Database Reliability Engineering team and Percona share the same belief that unbiased open source database expertise provides great products and attractive communication. Percona Support offers us a guarantee to run our business safely and allows us to be focused on high-value areas.” – Maxime Fouilleul, Engineering Manager for BlaBlaCar’s DBRE Team

Get unbiased open source database expertise

In summary, Percona’s open source expertise enabled:

  • Appsuite to use MySQL to scale their business and expand their value in the market.
  • Otto Office to make their MySQL-based central sales system more stable, performant, and highly available.
  • BlaBlaCar to use MariaDB to improve their mission-critical application performance and uptime.

So what do you want to do? Our unbiased open source database experts are ready to roll up their sleeves and help you meet your goals too. Learn more about Percona support and services here.

 

May
27
2022
--

Physical Backup Support in Percona Backup for MongoDB


Percona Backup for MongoDB (PBM) is a backup utility custom-built by Percona to help solve the needs of customers and users who don’t want to pay for proprietary software like MongoDB Enterprise and Ops Manager but want a fully-supported open-source backup tool that can perform cluster-wide consistent backups in MongoDB.

Version 1.7.0 of PBM was released in April 2022 and comes with a technical preview of physical backup functionality. This functionality enables users to benefit from reduced recovery times.

With logical backups, you extract the data from the database and store it in some binary or text format, and for recovery, you write all this data back to the database. For huge data sets, this is a time-consuming operation and might take hours or even days. Physical backups copy the files that belong to the database directly from disk, and recovery is just putting these files back. Such a recovery is much faster as it does not depend on database performance at all.

In this blog post, you will learn about the architecture of the physical backups feature in PBM, see how fast it is compared to logical backups, and try it out yourself.

Tech Peek

Architecture Review

In general, a physical backup means a copy of a database’s physical files. In the case of Percona Server for MongoDB (PSMDB), these are the WiredTiger *.wt B-tree, config, metadata, and journal files you can usually find in /data/db. The trick is to copy all those files without stopping the cluster and interrupting running operations, and to be sure that the data in the files is consistent and that no data will be changed during copying. Another challenge is to achieve consistency in a sharded cluster: in other words, how to be sure that we will be able to restore data to the same cluster time across all shards.

PBM’s physical backups are based on the backupCursors feature of PSMDB. This implies that to use this feature, you should use Percona Server for MongoDB.

Backup

On each replica set, PBM uses $backupCursor to retrieve a list of files that need to be copied to make a backup. Having that list, the next step is to ensure cluster-wide consistency. For that, each replica set posts the cluster time of the latest observed operation. The backup leader picks the most recent one. This will be the common backup timestamp (recovery timestamp), saved as the last_write_ts in the backup metadata. After agreeing on the backup time, the pbm-agent on each replica set opens a $backupCursorExtend. The cursor will return its result only after the node reaches the given timestamp. Thus the returned list of logs (journals) will contain the “common backup timestamp”. At that point, we have a list of all files that have to be in the backup, so each node copies them to the storage, saves the metadata, closes the cursors, and calls it a backup. Here is a blog post explaining Backup Cursors in great detail.

Of course, PBM does a lot more behind the scenes starting from electing appropriate nodes for the backup, coordinating operations across the cluster, logging, error handling, and many more. But all these subjects are for other posts.

Backup’s Recovery Timestamp

When restoring any backup, PBM returns the cluster to some particular point in time. Here we’re talking about time not in terms of wall time but of MongoDB’s cluster-wide logical clock, so that point-in-time is consistent across all nodes and replica sets in a cluster. For both logical and physical backups, that time is reflected in the complete section of the pbm list or pbm status outputs. E.g.:

    2022-04-19T15:36:14Z 22.29GB <physical> [complete: 2022-04-19T15:36:16]
    2022-04-19T14:48:40Z 10.03GB <logical> [complete: 2022-04-19T14:58:38]

This time is not the time when a backup has finished, but the time at which the cluster state was captured (hence the time the cluster will be returned to after the restore). In PBM’s logical backups, the recovery timestamp tends to be closer to the backup finish. To define it, PBM has to wait until the snapshot on all replica sets finishes, and then it starts capturing the oplog from the backup start up to that time. With physical backups, PBM picks a recovery timestamp right after the backup starts. Holding the backup cursor open guarantees that the checkpoint data won’t change during the backup, so PBM can define the complete-time right away.

Restore

There are a few considerations for restoration.

First of all, files in the backup may contain operations beyond the target time (commonBackupTimestamp). To deal with that, PBM uses a special function of the replication subsystem’s startup process to set the limit of the oplog being restored. It’s done by setting the oplogTruncateAfterPoint value in the local DB’s replset.oplogTruncateAfterPoint collection.

Along with setting oplogTruncateAfterPoint, the database needs some other changes and clean-up before it can start. This requires a series of restarts of PSMDB in standalone mode.

This, in turn, brings some hassle to the PBM operation. To communicate and coordinate its work across all agents, PBM relies on PSMDB itself. But once a cluster is taken down, PBM has to switch to communication via storage. Also, during standalone runs, PBM is unable to store its logs in the database. Hence, at some point during a restore, pbm-agent logs are available only in the agents’ stderr, and pbm logs won’t have access to them. We’re planning to solve this problem by the physical backups GA.

Also, we had to decide on the restore strategy within a replica set. One way is to restore one node, then delete all data on the rest and let PSMDB replication do the job. Although it’s a bit easier, it means that until the initial sync finishes, the cluster will be of little use. Besides, logical replication at this stage largely negates the speed benefits (more on that later) that the physical restore brings to the table. So we went with restoring each node in a replica set, and making sure that after the cluster starts, no node spots any difference and triggers a resync.

As with PBM’s logical backups, physical ones can currently be restored only to a cluster with the same topology, meaning the replica set names in the backup and the target cluster should match. This won’t be an issue for logical backups starting from the next PBM version, and later this feature will be extended to physical backups as well. Along with that, the number of replica sets in the cluster can be greater than in the backup, but not vice versa, meaning all data in the backup should be restored.

Performance Review

We used the following setup:

  • Cluster: 3-node replica set. Each mongod+pbm-agent on Digital Ocean droplet: 16GB, 8vCPU (CPU optimized).
  • Storage: nyc3.digitaloceanspaces.com
  • Data: randomly generated, ~1MB documents

physical backup MongoDB

In general, a logical backup should be more beneficial on small databases (a few hundred megabytes), since on such a scale, the extra overhead that physical files bring on top of the data still makes a difference. Reading/writing only user data during a logical backup means less data needs to be transferred over the network. But as the database grows, the overhead of logical reads (select) and especially writes (insert) becomes a bottleneck for logical backups. As for physical backups, the speed is almost always bounded only by the network bandwidth to/from remote storage. In our tests, restoration time from physical backups has a linear dependency on the dataset size, whereas logical restoration time grows non-linearly. The more data you have, the longer it takes to replay all the data and rebuild indexes. For example, for a 600GB dataset, a physical restore took 5x less time compared to a logical one.

But at a small DB size, the difference is negligible – a couple of minutes. So the main benefit of logical backups lies beyond performance: it’s flexibility. Logical backups allow partial backup/restore of the database (on the roadmap for PBM); you can choose particular databases and/or collections to work with. As physical backups work directly with database storage-engine files, they operate in an all-or-nothing fashion.

Hands-on

PBM Configuration

In order to start using PBM with PSMDB or MongoDB, install all the necessary packages according to the installation instructions. Please note that starting from version 1.7.0, the user running the pbm-agent process should also have read/write access to the PSMDB data directory for the purpose of performing operations with datafiles during a physical backup or restore.

Considering the design, starting from 1.7.0 the default user for pbm-agent is changed from pbm to mongod. So unless PSMDB runs under a different user than mongod, no extra actions are required. Otherwise, please carefully re-check your configuration and provide the necessary permissions to ensure proper PBM functioning.

In addition, keep in mind that for using PBM physical backups, you should run Percona Server for MongoDB starting from versions 4.2.15-16 and 4.4.6-8 and higher – this is where hotBackups and backup cursors were introduced.
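
A quick way to confirm which server version you are running (not part of the original instructions) is db.version() from the mongo shell, or from the command line:

mongod --version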

Creating a Backup

With the new PBM version, you can specify what type of backup you wish to make: physical or logical. By default when no type is selected, PBM makes a logical backup.

> pbm backup
Starting backup '2022-04-20T11:12:53Z'....
Backup '2022-04-20T11:12:53Z' to remote store 's3://https://storage.googleapis.com/pbm-bucket' has started

> pbm backup -t physical
Starting backup '2022-04-20T12:34:06Z'....
Backup '2022-04-20T12:34:06Z' to remote store 's3://https://storage.googleapis.com/pbm-bucket' has started

> pbm status -s cluster -s backups
Cluster:
========
rs0:
  - rs0/mongo1.perconatest.com:27017: pbm-agent v1.7.0 OK
  - rs0/mongo2.perconatest.com:27017: pbm-agent v1.7.0 OK
  - rs0/mongo3.perconatest.com:27017: pbm-agent v1.7.0 OK
Backups:
========
S3 us-east-1 s3://https://storage.googleapis.com/pbm-bucket
  Snapshots:
    2022-04-20T12:34:06Z 797.38KB <physical> [complete: 2022-04-20T12:34:09]
    2022-04-20T11:12:53Z 13.66KB <logical> [complete: 2022-04-20T11:12:58]

Point-in-Time Recovery

Point-in-Time Recovery is currently supported only for logical backups. It means that a logical backup snapshot is required for pbm-agent to start periodically saving consecutive slices of the oplog. You can still make a physical backup while PITR is enabled, it won’t break or change the oplog saving process. 

The restoration process to the specific point in time will also use a respective logical backup snapshot and oplog slices which will be replayed on top of the backup.
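
For reference, PITR oplog slicing is toggled through the PBM configuration; a minimal sketch of enabling it (assuming a logical snapshot already exists and pbm-agents are running) looks like this:

pbm config --set pitr.enabled=true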

Checking the Logs

During a physical backup, PBM logs are available via the pbm logs command, just as for all other operations.

> pbm logs -e backup/2022-04-20T12:34:06Z
2022-04-20T12:34:07Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] backup started
2022-04-20T12:34:12Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] uploading files
2022-04-20T12:34:54Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] uploading done
2022-04-20T12:34:56Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] backup finished

As for restore, the pbm logs command doesn’t provide information about restores from a physical backup. This is caused by peculiarities of the restore procedure and will be improved in upcoming PBM versions. However, pbm-agent still saves logs locally, so it’s possible to check information about the restore process on each node:

> sudo journalctl -u pbm-agent.service | grep restore
pbm-agent[12560]: 2022-04-20T19:37:56.000+0000 I [restore/2022-04-20T12:34:06Z] restore started
.......
pbm-agent[12560]: 2022-04-20T19:38:22.000+0000 I [restore/2022-04-20T12:34:06Z] copying backup data
.......
pbm-agent[12560]: 2022-04-20T19:38:39.000+0000 I [restore/2022-04-20T12:34:06Z] preparing data
.......
pbm-agent[12560]: 2022-04-20T19:39:12.000+0000 I [restore/2022-04-20T12:34:06Z] restore finished <nil>
pbm-agent[12560]: 2022-04-20T19:39:12.000+0000 I [restore/2022-04-20T12:34:06Z] restore finished successfully

Restoring from a Backup

The restore process from a physical backup is similar to a logical one but requires several extra steps after the restore is finished by PBM.

> pbm restore 2022-04-20T12:34:06Z
Starting restore from '2022-04-20T12:34:06Z'.....Restore of the snapshot from '2022-04-20T12:34:06Z' has started. Leader: mongo1.perconatest.com:27017/rs0

After starting the restore process, the pbm CLI returns the leader node ID, so it’s possible to track the restore progress by checking the logs of the pbm-agent leader. In addition, the status is written to a metadata file created on the remote storage. The status file is created in the root of the storage path and has the format .pbm.restore/<restore_timestamp>.json. As an option, it’s also possible to pass the -w flag during restore, which will block the current shell session and wait for the restore to finish:

> pbm restore 2022-04-20T12:34:06Z -w
Starting restore from '2022-04-20T12:34:06Z'....Started physical restore. Leader: mongo2.perconatest.com:27017/rs0
Waiting to finish...........................Restore successfully finished!

After the restore is complete, it’s required to perform the following steps:

  • Restart all mongod (and mongos, if present) nodes
  • Restart all pbm-agents
  • Run the following command to resync the backup list with the storage:

    $ pbm config --force-resync

Conclusion

MongoDB allows users to store enormous amounts of data. Especially if we talk about sharded clusters, where users are not limited by a single storage volume size limit. Database administrators often have to implement various home-grown solutions to ensure timely backups and restores of such big clusters. The usual approach is a storage-level snapshot. Such solutions do not guarantee data consistency and provide false confidence that data is safe.

Percona Backup for MongoDB with physical backup and restore capabilities enables users to back up and restore data fast and, at the same time, comes with data-consistency guarantees.

Physical Backup functionality is in the Technical Preview stage. We encourage you to read more about it in our documentation and try it out. In case you face any issues, feel free to contact us on the forum or raise a JIRA issue.

May
26
2022
--

Looking for an Excellent MySQL Book for Beginners? The MySQL Workshop is a Great Choice

MySQL Book for Beginners


Last week at Percona Live, I was asked what book I recommend for novices seeking to learn MySQL.  For a long time, there has not been a good choice for modern versions of MySQL. Luckily I had just stumbled upon such a book.  Now I am happy to recommend The MySQL Workshop – A practical guide to working with data and managing databases with MySQL by Petit and Cosentino.

The first chapter introduces database architectures, data types, storage engines (including MyRocks), and data normalization. The following chapters cover in great detail how to create a database, using MySQL Workbench, backing up & restoring data, and creating indexes. Chapter four has a very good section on working with SQL, functions, and case statements. Then JOINs and stored procedures are covered.

In another book, that would probably be enough content, but later chapters plunge into using Node.JS, Access, and Excel with MySQL.

The last few chapters cover loading data from many file formats, JSON data usage, user permissions, and logical backups. While not a comprehensive MySQL DBA guide, this material is well done and complete.

That is a lot of content in seven hundred pages.  The writing style is clear and numerous examples are well done.  Bravo to the authors for their achievements in this book.

This may not be the reference text for a grizzled MySQL veteran but for real novices and intermediates, this book is a must-have.  This would be a great classroom text for an introduction to MySQL class or a handy hardcopy reference for someone on the initial climb of the learning curve.

May
16
2022
--

Running Rocket.Chat with Percona Server for MongoDB on Kubernetes


Our goal is to have a Rocket.Chat deployment which uses a highly available Percona Server for MongoDB cluster as the backend database, with everything running on Kubernetes. To get there, we will do the following:

  • Start a Google Kubernetes Engine (GKE) cluster across multiple availability zones. It can be any other Kubernetes flavor or service, but I rely on multi-AZ capability in this blog post.
  • Deploy Percona Operator for MongoDB and database cluster with it
  • Deploy Rocket.Chat with specific affinity rules
    • Rocket.Chat will be exposed via a load balancer


Percona Operator for MongoDB, compared to other solutions, is not only the most feature-rich but also comes with various management capabilities for your MongoDB clusters – backups, scaling (including sharding), zero-downtime upgrades, and many more. There are no hidden costs and it is truly open source.

This blog post is a walkthrough of running a production-grade deployment of Rocket.Chat with Percona Operator for MongoDB.

Rock’n’Roll

All YAML manifests that I use in this blog post can be found in this repository.

Deploy Kubernetes Cluster

The following command deploys GKE cluster named

percona-rocket

in 3 availability zones:

gcloud container clusters create --zone us-central1-a --node-locations us-central1-a,us-central1-b,us-central1-c percona-rocket --cluster-version 1.21 --machine-type n1-standard-4 --preemptible --num-nodes=3

Read more about this in the documentation.

Deploy MongoDB

I’m going to use helm to deploy the Operator and the cluster.

Add helm repository:

helm repo add percona https://percona.github.io/percona-helm-charts/

Install the Operator into the percona namespace:

helm install psmdb-operator percona/psmdb-operator --create-namespace --namespace percona

Deploy the cluster of Percona Server for MongoDB nodes:

helm install my-db percona/psmdb-db -f psmdb-values.yaml -n percona

Replica set nodes are going to be distributed across availability zones. To get there, I altered the affinity keys in the corresponding sections of psmdb-values.yaml:

antiAffinityTopologyKey: "topology.kubernetes.io/zone"

Prepare MongoDB

For Rocket.Chat to connect to our database cluster, we need to create the users. By default, clusters provisioned with our Operator have

userAdmin

user, its password is set in

psmdb-values.yaml

:

MONGODB_USER_ADMIN_PASSWORD: userAdmin123456

For production-grade systems, do not forget to change this password or create dedicated secrets to provision those. Read more about user management in our documentation.
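
As a rough illustration only (the secret name and namespace below are assumptions; check the Operator documentation and your Helm values for the actual secret your cluster uses), a dedicated secret could be created like this:

kubectl -n percona create secret generic my-db-psmdb-db-secrets \
  --from-literal=MONGODB_USER_ADMIN_PASSWORD='use-a-strong-password-here'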

Spin up a client Pod to connect to the database:

kubectl run -i --rm --tty percona-client1 --image=percona/percona-server-mongodb:4.4.10-11 --restart=Never -- bash -il

Connect to the database with

userAdmin

:

[mongodb@percona-client1 /]$ mongo "mongodb://userAdmin:userAdmin123456@my-db-psmdb-db-rs0-0.percona/admin?replicaSet=rs0"

We are going to create the following:

  • rocketchat – the database
  • rocketChat – a user to store data and connect to the database
  • oplogger – a user to provide access to the oplog for Rocket.Chat
    • Rocket.Chat uses Meteor Oplog tailing to improve performance. It is optional.

use rocketchat
db.createUser({
  user: "rocketChat",
  pwd: passwordPrompt(),
  roles: [
    { role: "readWrite", db: "rocketchat" }
  ]
})

use admin
db.createUser({
  user: "oplogger",
  pwd: passwordPrompt(),
  roles: [
    { role: "read", db: "local" }
  ]
})

Deploy Rocket.Chat

I will use helm here to maintain the same approach. 

helm install -f rocket-values.yaml my-rocketchat rocketchat/rocketchat --version 3.0.0

You can find rocket-values.yaml in the same repository. Please make sure you set the correct passwords in the corresponding YAML fields.

As you can see, I also do the following:

  • Line 11: expose Rocket.Chat through the LoadBalancer service type
  • Lines 13-14: set the number of replicas of Rocket.Chat Pods. We want three – one per availability zone.
  • Lines 16-23: set affinity to distribute Pods across availability zones (a rough sketch of this follows below)
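
For illustration, this is the standard Kubernetes podAntiAffinity shape that achieves “one Pod per zone”; the label selector and the exact placement inside the chart’s values are assumptions here, and the authoritative version is the rocket-values.yaml in the repository:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: rocketchat   # assumed Pod label
        topologyKey: topology.kubernetes.io/zone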

Load Balancer will be created with a public IP address:

$ kubectl get service my-rocketchat-rocketchat
NAME                       TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)        AGE
my-rocketchat-rocketchat   LoadBalancer   10.32.17.26   34.68.238.56   80:32548/TCP   12m

You should now be able to connect to

34.68.238.56

and enjoy your highly available Rocket.Chat installation.

Rocket.Chat installation

Clean Up

Uninstall all helm charts to remove MongoDB cluster, the Operator, and Rocket.Chat:

helm uninstall my-rocketchat
helm uninstall my-db -n percona
helm uninstall psmdb-operator -n percona

Things to Consider

Ingress

Instead of exposing Rocket.Chat through a load balancer, you may also try ingress. By doing so, you can integrate it with cert-manager and have a valid TLS certificate for your chat server.

Mongos

It is also possible to run a sharded MongoDB cluster with Percona Operator. If you do so, Rocket.Chat will connect to mongos Service, instead of the replica set nodes. But you will still need to connect to the replica set directly to get oplogs.

Conclusion

We encourage you to try out Percona Operator for MongoDB with Rocket.Chat and let us know on our community forum your results.

There is always room for improvement and a time to find a better way. Please let us know if you face any issues or want to contribute your ideas to Percona products. You can do that on the Community Forum or JIRA. Read more about contribution guidelines for Percona Operator for MongoDB in CONTRIBUTING.md.

Percona Operator for MongoDB contains everything you need to quickly and consistently deploy and scale Percona Server for MongoDB instances into a Kubernetes cluster on-premises or in the cloud. The Operator enables you to improve time to market with the ability to quickly deploy standardized and repeatable database environments. Deploy your database with a consistent and idempotent result no matter where they are used.

May
12
2022
--

Percona Operator for MongoDB and Kubernetes MCS: The Story of One Improvement

Percona Operator for MongoDB and Kubernetes MCS

Percona Operator for MongoDB supports multi-cluster or cross-site replication deployments since version 1.10. This functionality is extremely useful if you want to have a disaster recovery deployment or perform a migration from or to a MongoDB cluster running in Kubernetes. In a nutshell, it allows you to use Operators deployed in different Kubernetes clusters to manage and expand replica sets.

For example, you have two Kubernetes clusters: one in Region A, another in Region B.

  • In Region A you deploy your MongoDB cluster with Percona Operator. 
  • In Region B you deploy unmanaged MongoDB nodes with another installation of Percona Operator.
  • You configure both Operators, so that nodes in Region B are added to the replica set in Region A.

In case of failure of Region A, you can switch your traffic to Region B.

Migrating MongoDB to Kubernetes describes the migration process using this functionality of the Operator.

This feature was released in tech preview, and we received lots of positive feedback from our users. But one of our customers raised an internal ticket, which was pointing out that cross-site replication functionality does not work with Multi-Cluster Services. This started the investigation and the creation of this ticket – K8SPSMDB-625

This blog post will go into the deep roots of this story and how it is solved in the latest release of Percona Operator for MongoDB version 1.12.

The Problem

Multi-Cluster Services, or MCS, allows you to expand network boundaries for the Kubernetes cluster and share Service objects across these boundaries. Some call it another take on Kubernetes Federation. This feature is already available on some managed Kubernetes offerings, such as Google Kubernetes Engine (GKE) and AWS Elastic Kubernetes Service (EKS). Submariner uses the same logic and primitives under the hood.

MCS Basics

To understand the problem, we need to understand how multi-cluster services work. Let’s take a look at the picture below:

multi-cluster services

  • We have two Pods in different Kubernetes clusters
  • We add these two clusters into our MCS domain
  • Each Pod has a service and IP-address which is unique to the Kubernetes cluster
  • MCS introduces new Custom Resources – ServiceImport and ServiceExport.
    • Once you create a ServiceExport object in one cluster, a ServiceImport object appears in all clusters in your MCS domain.
    • This ServiceImport object is in the svc.clusterset.local domain and, with the network magic introduced by MCS, can be accessed from any cluster in the MCS domain.

The above means that if I have an application in the Kubernetes cluster in Region A, I can connect to the Pod in the Kubernetes cluster in Region B through a domain name like my-pod.<namespace>.svc.clusterset.local. And it works from the other cluster as well.

MCS and Replica Set

Here is how cross-site replication works with Percona Operator if you use load balancer:

MCS and Replica Set

All replica set nodes have a dedicated service and a load balancer. A replica set in the MongoDB cluster is formed using these public IP addresses. The external node is added using a public IP address as well:

  replsets:
  - name: rs0
    size: 3
    externalNodes:
    - host: 123.45.67.89

All nodes can reach each other, which is required to form a healthy replica set.

Here is how it looks when you have clusters connected through multi-cluster service:

Instead of load balancers, replica set nodes are exposed through Cluster IPs. We have ServiceExport and ServiceImport resources. All looks good on the networking level; it should work, but it does not.

The problem is in the way the Operator builds the MongoDB replica set in Region A. To register an external node from Region B in the replica set, we will use the MCS domain name in the corresponding section:

  replsets:
  - name: rs0
    size: 3
    externalNodes:
    - host: rs0-4.mongo.svc.clusterset.local

Now our rs.status() will look like this:

"name" : "my-cluster-rs0-0.mongo.svc.cluster.local:27017"
"role" : "PRIMARY"
...
"name" : "my-cluster-rs0-1.mongo.svc.cluster.local:27017"
"role" : "SECONDARY"
...
"name" : "my-cluster-rs0-2.mongo.svc.cluster.local:27017"
"role" : "SECONDARY"
...
"name" : "rs0-4.mongo.svc.clusterset.local:27017"
"role" : "UNKNOWN"

As you can see, the Operator formed a replica set out of three nodes using the svc.cluster.local domain, as that is how it should be done when you expose nodes with the ClusterIP Service type. In this case, a node in Region B cannot reach any node in Region A, as it tries to connect to a domain that is local to the cluster in Region A.

In the picture below, you can easily see where the problem is:

The Solution

Luckily, we have a Special Interest Group (SIG), a Kubernetes Enhancement Proposal (KEP), and multiple implementations for enabling Multi-Cluster Services. Having a KEP is great since we can be sure the implementations from different providers (e.g., GCP, AWS) will follow the same standard more or less.

There are two fields in the Custom Resource that control MCS in the Operator:

spec:
  multiCluster:
    enabled: true
    DNSSuffix: svc.clusterset.local

Let’s see what is happening in the background with these flags set.

ServiceImport and ServiceExport Objects

Once you enable MCS by patching the CR with spec.multiCluster.enabled: true, the Operator creates a ServiceExport object for each service. These ServiceExports will be detected by the MCS controller in the cluster, and eventually a ServiceImport for each ServiceExport will be created in the same namespace in each cluster that has MCS enabled.
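
For reference, a ServiceExport object itself is tiny. Roughly the following shape is what the MCS KEP defines; the apiVersion, name, and namespace below are assumptions and may differ depending on your provider’s MCS implementation:

apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: my-cluster-rs0     # must match the name of the Service being exported
  namespace: mongo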

As you see, we made a decision and empowered the Operator to create ServiceExport objects. There are two main reasons for doing that:

  • If any infrastructure-as-a-code tool is used, it would require additional logic and level of complexity to automate the creation of required MCS objects. If Operator takes care of it, no additional work is needed. 
  • Our Operators take care of the infrastructure for the database, including Service objects. It just felt logical to expand the reach of this functionality to MCS.

Replica Set and Transport Encryption

The root cause of the problem that we are trying to solve here lies in the networking field, where external replica set nodes try to connect to the wrong domain names. Now, when you enable multi-cluster and set DNSSuffix (it defaults to svc.clusterset.local), the Operator does the following:

  • The replica set is formed using the MCS domain set in DNSSuffix
  • The Operator generates TLS certificates as usual, but adds the DNSSuffix domains into the picture

With this approach, the traffic between nodes flows as expected and is encrypted by default.

Replica Set and Transport Encryption

Things to Consider

MCS APIs

Please note that the operator won’t install MCS APIs and controllers to your Kubernetes cluster. You need to install them by following your provider’s instructions prior to enabling MCS for your PSMDB clusters. See our docs for links to different providers.

The Operator detects if MCS is installed in the cluster by checking the available API resources. The detection happens before the controllers are started in the Operator. If you installed the MCS APIs while the Operator is running, you need to restart the Operator. Otherwise, you’ll see an error like this:

{
  "level": "error",
  "ts": 1652083068.5910048,
  "logger": "controller.psmdb-controller",
  "msg": "Reconciler error",
  "name": "cluster1",
  "namespace": "psmdb",
  "error": "wrong psmdb options: MCS is not available on this cluster",
  "errorVerbose": "...",
  "stacktrace": "..."
}

ServiceImport Provisioning Time

It might take some time for

ServiceImport

objects to be created in the Kubernetes cluster. You can see the following messages in the logs while creation is in progress: 

{
  "level": "info",
  "ts": 1652083323.483056,
  "logger": "controller_psmdb",
  "msg": "waiting for service import",
  "replset": "rs0",
  "serviceExport": "cluster1-rs0"
}

During testing, we saw wait times of up to 10-15 minutes. If you see your cluster stuck in the initializing state waiting for service imports, it’s a good idea to check the usage and quotas for your environment.
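
To see whether the imports have shown up yet, you can also list the MCS resources directly (this assumes the MCS CRDs are installed; replace the namespace with the one your cluster uses):

kubectl -n <namespace> get serviceexports,serviceimports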

DNSSuffix

We also made a decision to automatically generate TLS certificates for the Percona Server for MongoDB cluster with the *.clusterset.local domain, even if MCS is not enabled. This approach simplifies the process of enabling MCS for a running MongoDB cluster. It does not make much sense to change the DNSSuffix field unless you have hard requirements from your service provider, but we still allow such a change.

If you want to enable MCS with a cluster deployed with an operator version below 1.12, you need to update your TLS certificates to include *.clusterset.local SANs. See the docs for instructions.

Conclusion

Businesses rely on applications and the infrastructure that serves them more than ever nowadays. Disaster recovery protocols and various failsafe mechanisms are routine for reliability engineers, not an annoying task in the backlog.

With multi-cluster deployment functionality in Percona Operator for MongoDB, we want to equip users to build highly available and secured database clusters with minimal effort.

Percona Operator for MongoDB is truly open source and provides users with a way to deploy and manage their enterprise-grade MongoDB clusters on Kubernetes. We encourage you to try this new Multi-Cluster Services integration and let us know your results on our community forum. You can find some scripts that would help you provision your first MCS clusters on GKE or EKS here.

There is always room for improvement and a time to find a better way. Please let us know if you face any issues or want to contribute your ideas to Percona products. You can do that on the Community Forum or JIRA. Read more about contribution guidelines for Percona Operator for MongoDB in CONTRIBUTING.md.

May
10
2022
--

Spring Cleaning: Discontinuing RHEL 6/CentOS 6 (glibc 2.12) and 32-bit Binary Builds of Percona Software

Discontinuing RHEL 6/CentOS 6

As you are probably aware, Red Hat Enterprise Linux 6 (RHEL 6 or EL 6 in short) officially reached “End of Life” (EOL) on 2020-11-30 and is now in the so-called Extended Life Phase, which basically means that Red Hat will no longer provide bug fixes or security fixes.

Even though EL 6 and its compatible derivatives like CentOS 6 had reached EOL some time ago already, we continued providing binary builds for selected MySQL-related products for this platform.

However, this became increasingly difficult, as the MySQL code base continued to evolve and now depends on tools and functionality that are no longer provided by the operating system out of the box. This meant we already had to perform several modifications in order to prepare binary builds for this platform, e.g. installing custom compiler versions or newer versions of various system libraries.

As of MySQL 8.0.26, Oracle announced that they deprecated the TLSv1 and TLSv1.1 connection protocols and plan to remove these in a future MySQL version in favor of the more secure TLSv1.2 and TLSv1.3 protocols. TLSv1.3 requires that both the MySQL server and the client application be compiled with OpenSSL 1.1.1 or higher. This version of OpenSSL is not available in binary package format on EL 6 anymore, and manually rebuilding it turned out to be a “yak shaving exercise” due to the countless dependencies.

Our build & release team was able to update the build environments on all of our supported platforms (EL 7, EL 8, supported Debian and Ubuntu versions) for this new requirement. However, we have not been successful in getting all the required components and their dependencies to build on EL 6, as it would have required rebuilding quite a significant amount of core OS packages and libraries to achieve this.

Moreover, switching to this new OpenSSL version would have also required us to include some additional shared libraries in our packages to satisfy the runtime dependencies, adding more complexity and potential security issues.

In general, we believe that running a production system on an OS that is no longer actively supported by a vendor is not a recommended best practice from a security perspective, and we do not want to encourage such practices.

Because of these reasons and to simplify our build/release and QA processes, we decided to drop support for EL 6 for all products now. Percona Server for MySQL 8.0.27 was the last version for which we built binaries for EL 6 against the previous version of OpenSSL.

Going forward, the following products will no longer be built and released on this platform:

  • Percona Server for MySQL 5.7 and 8.0
  • Percona XtraDB Cluster 5.7
  • Percona XtraBackup 2.4 and 8.0
  • Percona Toolkit 3.2

This includes stopping both building RPM packages for EL 6 and providing binary tarballs that are linked against glibc 2.12.

Note that this OS platform was also the last one on which we still provided 32-bit binaries.

Most of the Enterprise Linux distributions have stopped providing 32-bit versions of their operating systems quite some time ago already. As an example, Red Hat Enterprise Linux 7 (released in June 2014) was the first release to no longer support installing directly on 32-bit Intel/AMD hardware (i686/x86). Already back in 2018, we had taken the decision that we will no longer be offering 32-bit binaries on new platforms or new major releases of our software.

Given today’s database workloads, we also think that 32-bit systems are simply not adequate anymore, and we already stopped building newer versions of our software for this architecture.

The demand for 32-bit downloads has also been declining steadily. A recent analysis of our download statistics revealed that only 2.3% of our total binary downloads are referring to i386 binaries. Looking at IP addresses, these downloads originated from 0.4% of the total range of addresses.

This change affects the following products:

  • Percona Server for MySQL 5.7
  • Percona XtraDB Cluster 5.7
  • Percona XtraBackup 2.4
  • Percona Toolkit

We’ve updated the Percona Release Lifecycle Overview web page accordingly to reflect this change. Previously released binaries for these platforms and architectures will of course remain accessible from our repositories.

If you’re still running EL 6 or a 32-bit database or OS, we strongly recommend upgrading to a more modern platform. Our Percona Services team would be happy to help you with that!

Apr
18
2022
--

Expose Databases on Kubernetes with Ingress


Ingress is a resource that is commonly used to expose HTTP(S) services outside of Kubernetes. To have ingress support, you will need an Ingress Controller, which in a nutshell is a proxy. SREs and DevOps love ingress as it provides developers with a self-service way to expose their applications. Developers love it as it is simple to use, but at the same time quite flexible.

High-level ingress design looks like this: 

High-level ingress design

  1. Users connect through a single Load Balancer or other Kubernetes service
  2. Traffic is routed through Ingress Pod (or Pods for high availability)
    • There are multiple flavors of Ingress Controllers. Some use nginx, some envoy, or other proxies. See a curated list of Ingress Controllers here.
  3. Based on HTTP headers, traffic is routed to the corresponding Pods which run websites. For HTTPS traffic, Server Name Indication (SNI) is used, which is an extension of TLS supported by most browsers. Usually, the ingress controller integrates nicely with cert-manager, which provides you with full TLS lifecycle support (yeah, no need to worry about renewals anymore).

The beauty and value of such a design is that you have a single Load Balancer serving all your websites. In Public Clouds, it leads to cost savings, and in private clouds, it simplifies your networking or reduces the number of IPv4 addresses (if you are not on IPv6 yet). 

TCP and UDP with Ingress

Quite interestingly, some ingress controllers also support TCP and UDP proxying. I have been asked on our forum and Kubernetes slack multiple times if it is possible to use ingress with Percona Operators. Well, it is. Usually, you need a load balancer per database cluster: 

TCP and UDP with Ingress

The design with ingress is going to be a bit more complicated but still allows you to utilize the single load balancer for multiple databases. In cases where you run hundreds of clusters, it leads to significant cost savings. 

  1. Each TCP port represents the database cluster
  2. Ingress Controller makes a decision about where to route the traffic based on the port

The obvious downside of this design is non-default TCP ports for databases. There might be weird cases where it can turn into a blocker, but usually, it should not.

Hands-on

My goal is to have the following:

All configuration files I used for this blog post can be found in this Github repository.

Deploy Percona XtraDB Clusters (PXC)

The following commands are going to deploy the Operator and three clusters:

kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/operators-and-ingress/bundle.yaml

kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/operators-and-ingress/cr-minimal.yaml
kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/operators-and-ingress/cr-minimal2.yaml
kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/operators-and-ingress/cr-minimal3.yaml

Deploy Ingress

helm upgrade --install ingress-nginx ingress-nginx   --repo https://kubernetes.github.io/ingress-nginx   --namespace ingress-nginx --create-namespace  \
--set controller.replicaCount=2 \
--set tcp.3306="default/minimal-cluster-haproxy:3306"  \
--set tcp.3307="default/minimal-cluster2-haproxy:3306" \
--set tcp.3308="default/minimal-cluster3-haproxy:3306"

This is going to deploy a highly available ingress-nginx controller. 

  • controller.replicaCount=2 – defines that we want to have at least two Pods of the ingress controller. This is to provide a highly available setup.
  • The tcp flags do two things:
    • expose ports 3306-3308 on the ingress’s load balancer
    • instruct the ingress controller to forward traffic to the corresponding services which were created by Percona Operator for PXC clusters. For example, port 3307 is the one to use to connect to minimal-cluster2. Read more about this configuration in the ingress documentation.

Here is the load balancer resource that was created:

$ kubectl -n ingress-nginx get service
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                                                                   AGE
…
ingress-nginx-controller             LoadBalancer   10.43.72.142    212.2.244.54   80:30261/TCP,443:32112/TCP,3306:31583/TCP,3307:30786/TCP,3308:31827/TCP   4m13s

As you see, ports 3306-3308 are exposed.

Check the Connection

This is it. Database clusters should be exposed and reachable. Let’s check the connection. 

Get the root password for minimal-cluster2:

$ kubectl get secrets minimal-cluster2-secrets | grep root | awk '{print $2}' | base64 --decode && echo
kSNzx3putYecHwSXEgf

Connect to the database:

$ mysql -u root -h 212.2.244.54 --port 3307  -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5722

It works! Notice how I use port 3307, which corresponds to minimal-cluster2.

Adding More Clusters

What if you add more clusters into the picture, how do you expose those?

Helm

If you use helm, the easiest way is to just add one more flag into the command:

helm upgrade --install ingress-nginx ingress-nginx   --repo https://kubernetes.github.io/ingress-nginx   --namespace ingress-nginx --create-namespace  \
--set controller.replicaCount=2 \
--set tcp.3306="default/minimal-cluster-haproxy:3306"  \
--set tcp.3307="default/minimal-cluster2-haproxy:3306" \
--set tcp.3308="default/minimal-cluster3-haproxy:3306" \
--set tcp.3309="default/minimal-cluster4-haproxy:3306"

No helm

Without Helm, it is a two-step process:

First, edit the configMap which configures TCP services exposure. By default, it is called ingress-nginx-tcp:

kubectl -n ingress-nginx edit cm ingress-nginx-tcp
…
apiVersion: v1
data:
  "3306": default/minimal-cluster-haproxy:3306
  "3307": default/minimal-cluster2-haproxy:3306
  "3308": default/minimal-cluster3-haproxy:3306
  "3309": default/minimal-cluster4-haproxy:3306

A change in the configMap triggers the reconfiguration of nginx in the ingress Pods. As a second step, it is also necessary to expose this port on the load balancer. To do so, edit the corresponding service:

kubectl -n ingress-nginx edit services ingress-nginx-controller
…
spec:
  ports:
    …
  - name: 3309-tcp
    port: 3309
    protocol: TCP
    targetPort: 3309-tcp

The new cluster is exposed on port 3309 now.
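
As a quick sanity check (assuming you deployed a fourth cluster named minimal-cluster4, as referenced in the configMap above), you can connect through the new port:

LB_IP=$(kubectl -n ingress-nginx get service ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
mysql -u root -h "${LB_IP}" --port 3309 -p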

Limitations and considerations

Ports per Load Balancer

Cloud providers usually limit the number of ports that you can expose through a single load balancer:

  • AWS: 50 listeners per Network Load Balancer
  • GCP: 100 ports per Service

If you hit the limit, just create another load balancer pointing to the ingress controller.
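
A minimal sketch of such an extra load balancer, assuming the default labels of the ingress-nginx Helm chart (adjust the Service name and selector to match your installation):

kubectl -n ingress-nginx apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller-extra
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
  ports:
  - name: 3309-tcp
    port: 3309
    protocol: TCP
    targetPort: 3309-tcp
EOF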

Automation

Cost-saving is a good thing, but with Kubernetes capabilities, users expect to avoid manual tasks, not add more. Integrating the change of the ingress configMap and load balancer ports into a CI/CD pipeline is not a hard task, but maintaining the logic of adding new load balancers to expose more ports is harder. I have not found any projects that implement the logic of reusing load balancer ports or automating it in any way. If you know of any, please leave a comment under this blog post.

Apr
13
2022
--

Overcoming VACUUM WRAPAROUND

Overcoming VACUUM WRAPAROUND

Transaction ID Wraparound occurs when the VACUUM process cannot keep up with database activity and the PostgreSQL service is forced to shut down.

In more technical parlance: Transaction ID Wraparound occurs when the semantics of Multi-Version Concurrency Control (MVCC) fail because the number of unique transaction IDs reaches its maximum of about two billion.

This situation builds up when the VACUUM process, whether managed by the autovacuum workers or run manually, does not keep up with the DML operations.

Transaction ID Wraparound can be caused by a combination of one or more of the following circumstances:

  1. Autovacuum is turned off
  2. Long-lived transactions
  3. Database logical dumps (on a REPLICA using streaming replication)
  4. Many session connections with locks extending across large swaths of the data cluster
  5. Intense DML operations forcing the cancellation of autovacuum worker processes

Transaction WRAPAROUND can cause a spontaneous shutdown of the Postgres database server in order to protect the integrity of the data.

PostgreSQL tracks transactions with a unique 32-bit ID. Periodically, once a database’s oldest transaction ID reaches a certain age (200 million transactions by default), autovacuum freezes old rows so that those IDs can be reused. But if the number of unique transaction IDs approaches its hard limit, known as TXID Wraparound, Postgres will force a shutdown in order to protect the data.
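
The 200 million default mentioned above is the autovacuum_freeze_max_age setting; you can check it on your instance with:

psql -U postgres -c "SHOW autovacuum_freeze_max_age;"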

Here’s how it works:

  • 4 billion transactions, 2^32, is the integer upper limit for the datatype used in Postgres.
  • 2 billion transactions, 2^31, is the upper limit that PostgreSQL permits before forcing a shutdown.
  • 10 million transactions before the upper limit is reached, WARNING messages consisting of a countdown will be logged.
  • 1 million transactions before the upper limit is reached, PostgreSQL goes to READ-ONLY mode.
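
Based on the limits above, a rough way to estimate how many transaction IDs remain before a forced shutdown (the 2^31 figure is taken from the list above) is:

psql -U postgres -c "
SELECT datname,
       age(datfrozenxid)              AS txid_age,
       2147483648 - age(datfrozenxid) AS approx_txids_remaining
FROM pg_database
ORDER BY 2 DESC;"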

Warning Signs

In the case of the autovacuum daemon falling behind across the entire data cluster, review your monitoring solution in order to identify the trend of these metrics:

  • IO wait increases
  • CPU load increases
  • SQL performance decreases

Mitigation steps include:

  • Reviewing the internal Postgres monitoring metrics and confirming tables are being vacuumed.
  • Reviewing the Postgres logs and looking for an overabundance of canceled autovacuum worker processes.
  • Reviewing the view “pg_stat_activity” and looking for autovacuum queries marked as running to prevent wraparound (see the query sketch below). By itself this is a normal message, but one should not see autovacuum running solely for the purpose of mitigating WRAPAROUND.
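
A sketch of such a check (the exact wording of the autovacuum query string can vary between PostgreSQL versions, so the filter below simply matches on “wraparound”):

psql -U postgres -c "
SELECT pid, datname, query
FROM pg_stat_activity
WHERE query ILIKE '%wraparound%';"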

Here are example error messages that you can find in the Postgres logs when threatened by a shutdown due to WRAPAROUND:

#
# When less than 10 million transactions remain before shutdown
#
WARNING: database "mydb" must be vacuumed within 177009986 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in "mydb".

Here’s a set of queries that will help you determine if WRAPAROUND is a risk:

--
-- Database query for transaction age per database
--   and as a percentage of maximum permitted transactions
--
SELECT datname,
       age(datfrozenxid),
       (age(datfrozenxid)::numeric/1000000000*100)::numeric(4,2) as "% WRAPAROUND RISK"
FROM pg_database ORDER BY 2 DESC;
--
-- Database query for transaction age per table
--
SELECT  
c.oid::regclass as table_name,
greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as "TXID age",
(greatest(age(c.relfrozenxid),age(t.relfrozenxid))::numeric/1000000000*100)::numeric(4,2) as "% WRAPAROUND RISK"
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm')
ORDER BY 2 DESC;
--
-- Other vacuum runtime parameters of interest
--  returning TXID age
--
SELECT  name, setting
FROM pg_settings
WHERE name ~ 'vacuum'
AND name ~'_age$'
ORDER BY 1 ASC;

Preventing Transaction ID Wraparound

First and foremost, make certain all tables are regularly vacuumed. A correctly configured autovacuum process takes care of this without it ever becoming an issue. Otherwise, you will need to consider a manual VACUUM strategy.

The following are merely suggestions since each situation is highly subjective.

If you have the time, run the following invocation of vacuumdb. The value for option ‘-j’ can vary from a couple to a value equal to the number of CPUs on the host. The option ‘-a’ will process each database in alphabetical order.

vacuumdb -F -a -z -j 10 -v

Consider a bash script targeting individual databases if you see one is more urgent than another:

vacuumdb -z -j 10 -v <mydatabase>

Immediate Action: Approaching Wraparound at < 10 Million Transactions

The following is the set of actions to take when TRANSACTION WRAPAROUND is imminent. Remember, you are in a race against time.

You must vacuum the entire data cluster before the number of remaining available transaction IDs drops to 1 million.

Action

  • The task is to vacuum the databases as quickly as possible.
  • The tool of choice is the CLI “vacuumdb”.
  • Use as many threads as reasonable.
  • Run VACUUM in verbose mode and log the output.
  • Monitor log output for anomalous messages, i.e., vacuum failures, etc.
  • Run “vacuumdb” against individual databases and, if necessary, individual tables (see the example after this list).
  • Avoid using the option ‘-a’.
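
For example, a single urgent table can be handled directly; the database and table names below are placeholders:

vacuumdb --verbose --table 'public.big_table' db01 > db01_big_table.log 2>&1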

Scripts

Here’s a pair of example scripts that you can use as a starting point when developing your own mitigation protocol.

Method

  1. Identify the database with the oldest TXID
  2. Generate a list of tables in order of the oldest TXID age to the youngest
  3. Feed this list of tables into a script that invokes vacuumdb and VACUUM one table per invocation

The secret sauce is xargs which enables one to utilize as many CPUs as reasonably possible. The following pair of bash scripts invoke vacuumdb against a series of tables. Of course, there’s more than one way to do this.

Script one generates a list of tables in a selected database and calls script two which executes the VACUUM on each of those tables individually.

SCRIPT ONE  (go1_highspeed_vacuum.sh)

#!/bin/bash
#
# INVOCATION
# EX: ./go1_highspeed_vacuum.sh
#

########################################################
# EDIT AS REQUIRED
export CPU=4
export PAGER=less PGUSER=postgres PGPASSWORD=mypassword PGDATABASE=db01 PGOPTIONS='-c statement_timeout=0'
########################################################

SQL1="
with a as (select &nbsp; c.oid::regclass as table_name,
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; greatest(age(c.relfrozenxid),age(t.relfrozenxid))
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; from pg_class c
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; left join pg_class t on c.reltoastrelid = t.oid
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; where c.relkind in ('r', 'm')
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; order by 2 desc)
select table_name from a
"

LIST="$(echo "$SQL1" | psql -t)"

# the '-P' flag sets how many vacuumdb invocations run simultaneously
xargs -t -n 1 -P $CPU ./go2_highspeed_vacuum.sh $PGDATABASE <<< "$LIST"

echo "$(date): DONE"

SCRIPT TWO (go2_highspeed_vacuum.sh)

#!/bin/bash
#
# INVOCATION (called by go1_highspeed_vacuum.sh)
# EX: ./go2_highspeed_vacuum.sh <database> <table>
#

########################################################
# EDIT AS REQUIRED
export PAGER=less PGUSER=postgres PGPASSWORD=mypassword PGOPTIONS='-c statement_timeout=0'
export DB=$1
export TABLE=$2
########################################################

# VACUUM a single table and append the output to the per-database log
vacuumdb --verbose --table "${TABLE}" "${DB}" >> "${DB}.log" 2>&1

Tips

  • Be prepared to execute vacuumdb against the databases in REVERSE alphabetical order to avoid clashing with the autovacuum worker processes, which vacuum in FORWARD alphabetical order.
  • Query the view “pg_stat_activity”.
  • Always monitor where the autovacuum processes are working.
  • Avoid working on the same table that the autovacuum workers are currently processing.
  • Use the autovacuum workers as an indicator of which databases remain to be processed.
  • Kill active autovacuum workers when they conflict with a manual vacuum in order to speed things up (a query sketch follows below).
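
A sketch of the last tip, assuming the conflicting worker runs in database db01 (autovacuum workers show up in pg_stat_activity with a query starting with “autovacuum:”):

psql -U postgres -c "
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%'
  AND datname = 'db01';"

Keep in mind that the autovacuum launcher will eventually restart a canceled anti-wraparound worker, so cancel it only to let your manual vacuum through.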

Immediate Action: When the PostgreSQL Service Has Shut Down Due to Transaction Wraparound

One recovers from a forced shutdown due to transaction id wraparound by performing a cluster-wide vacuum in single-user mode:

Log in to the host and, as the UNIX user “postgres”, execute an invocation similar to the following:

#
# it is understood that environment
# variable PGDATA points to the data cluster
#
postgres --single -D $PGDATA postgres <<< 'vacuum analyze'

I would suggest scripting the vacuum process because you’ll need to log in to each database to perform the VACUUM.

Generate and edit a list of all the databases:

postgres --single -D $PGDATA postgres <<< 'select datname from pg_database' \
| grep '"' | cut -d '"' -f 2 > list_db

Here is an example using the aforementioned list “list_db”:

#
# it is understood the database list has
# been edited before invoking this code snippet
#
for u in $(cat list_db)
do
    postgres --single -D $PGDATA $u <<< 'vacuum analyze'
done

TXID Wraparound is one of the scariest scenarios that can occur. Thankfully, this is an extremely rare incident and only occurs when systems are either under extremely heavy load or have been neglected.

Don’t get caught!

Remember: the best DBA is the one that’s never noticed.


Mar
15
2022
--

Run PostgreSQL on Kubernetes with Percona Operator & Pulumi

Run PostgreSQL on Kubernetes with Percona Operator and Pulumi

Avoid vendor lock-in, provide a private Database-as-a-Service for internal teams, quickly deploy-test-destroy databases with CI/CD pipeline – these are some of the most common use cases for running databases on Kubernetes with operators. Percona Distribution for PostgreSQL Operator enables users to do exactly that and more.

Pulumi is an infrastructure-as-code tool, which enables developers to write code in their favorite language (Python, Golang, JavaScript, etc.) to deploy infrastructure and applications easily to public clouds and platforms such as Kubernetes.

This blog post is a step-by-step guide on how to deploy a highly-available PostgreSQL cluster on Kubernetes with our Percona Operator and Pulumi.

Desired State

We are going to provision the following resources with Pulumi:

  • Google Kubernetes Engine cluster with three nodes. It can be any Kubernetes flavor.
  • Percona Operator for PostgreSQL
  • Highly available PostgreSQL cluster with one primary and two hot standby nodes
  • Highly available pgBouncer deployment with the Load Balancer in front of it
  • pgBackRest for local backups

Pulumi code can be found in this git repository.

Prepare

I will use an Ubuntu box to run Pulumi, but almost the same steps would work on macOS.

Pre-install Packages

gcloud and kubectl

echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
sudo apt-get update
sudo apt-get install -y google-cloud-sdk docker.io kubectl jq unzip

python3

Pulumi allows developers to use the language of their choice to describe infrastructure and applications. I’m going to use Python. We will also need pip (the Python package-management system) and venv (the virtual environment module).

sudo apt-get install python3 python3-pip python3-venv

Pulumi

Install Pulumi:

curl -sSL https://get.pulumi.com | sh

On macOS, this can be installed via Homebrew with:

brew install pulumi

You will need to add .pulumi/bin to the $PATH:

export PATH=$PATH:$HOME/.pulumi/bin

Authentication

gcloud

You will need to provide access to Google Cloud to provision Google Kubernetes Engine.

gcloud config set project your-project
gcloud auth application-default login
gcloud auth login

Pulumi

Generate a Pulumi token at app.pulumi.com. You will need it later to init the Pulumi stack.

Action

This repo has the following files:

  • Pulumi.yaml – identifies that it is a folder with a Pulumi project
  • __main__.py – python code used by Pulumi to provision everything we need
  • requirements.txt – to install required python packages

Clone the repo and go to the pg-k8s-pulumi folder:

git clone https://github.com/spron-in/blog-data
cd blog-data/pg-k8s-pulumi
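
Before initializing the stack, you may want to create a Python virtual environment and install the dependencies from requirements.txt (a minimal sketch; depending on the repo’s Pulumi.yaml, Pulumi may also manage the virtualenv for you):

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt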

Init the stack with:

pulumi stack init pg

You will need the token generated earlier on app.pulumi.com here.

__main__.py

The Python code that Pulumi is going to process is in the __main__.py file. 

Lines 1-6: importing python packages

Lines 8-31: configuration parameters for this Pulumi stack. It consists of two parts:

  • Kubernetes cluster configuration. For example, the number of nodes.
  • Operator and PostgreSQL cluster configuration – namespace to be deployed to, service type to expose pgBouncer, etc.

Lines 33-80: deploy GKE cluster and export its configuration

Lines 82-88: create the namespace for Operator and PostgreSQL cluster

Lines 91-426: deploy the Operator. In reality, it just mirrors the operator.yaml from our Operator.

Lines 429-444: create the secret object that allows you to set the password for pguser to connect to the database

Lines 445-557: deploy PostgreSQL cluster. It is a JSON version of cr.yaml from our Operator repository

Line 560: exports Kubernetes configuration so that it can be reused later 

Deploy

First, we will set the configuration for this stack. Execute the following commands:

pulumi config set gcp:project YOUR_PROJECT
pulumi config set gcp:zone us-central1-a
pulumi config set node_count 3
pulumi config set master_version 1.21

pulumi config set namespace percona-pg
pulumi config set pg_cluster_name pulumi-pg
pulumi config set service_type LoadBalancer
pulumi config set pg_user_password mySuperPass

These commands set the following:

  • GCP project where GKE is going to be deployed
  • GCP zone
  • Number of nodes in a GKE cluster
  • Kubernetes version
  • Namespace to run the PostgreSQL cluster
  • The name of the cluster
  • Expose pgBouncer with a LoadBalancer object
  • The password for the pguser database user
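
To double-check the values before deploying, you can list the stack configuration:

pulumi config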

Deploy with the following command:

$ pulumi up
Previewing update (pg)

View Live: https://app.pulumi.com/spron-in/percona-pg-k8s/pg/previews/d335d117-b2ce-463b-867d-ad34cf456cb3

     Type                                                           Name                                Plan       Info
 +   pulumi:pulumi:Stack                                            percona-pg-k8s-pg                   create     1 message
 +   ├─ random:index:RandomPassword                                 pguser_password                     create
 +   ├─ random:index:RandomPassword                                 password                            create
 +   ├─ gcp:container:Cluster                                       gke-cluster                         create
 +   ├─ pulumi:providers:kubernetes                                 gke_k8s                             create
 +   ├─ kubernetes:core/v1:ServiceAccount                           pgoPgo_deployer_saServiceAccount    create
 +   ├─ kubernetes:core/v1:Namespace                                pgNamespace                         create
 +   ├─ kubernetes:batch/v1:Job                                     pgoPgo_deployJob                    create
 +   ├─ kubernetes:core/v1:ConfigMap                                pgoPgo_deployer_cmConfigMap         create
 +   ├─ kubernetes:core/v1:Secret                                   percona_pguser_secretSecret         create
 +   ├─ kubernetes:rbac.authorization.k8s.io/v1:ClusterRoleBinding  pgo_deployer_crbClusterRoleBinding  create
 +   ├─ kubernetes:rbac.authorization.k8s.io/v1:ClusterRole         pgo_deployer_crClusterRole          create
 +   └─ kubernetes:pg.percona.com/v1:PerconaPGCluster               my_cluster_name                     create

Diagnostics:
  pulumi:pulumi:Stack (percona-pg-k8s-pg):
    E0225 14:19:49.739366105   53802 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies

Do you want to perform this update? yes

Updating (pg)
View Live: https://app.pulumi.com/spron-in/percona-pg-k8s/pg/updates/5
     Type                                                           Name                                Status      Info
 +   pulumi:pulumi:Stack                                            percona-pg-k8s-pg                   created     1 message
 +   ├─ random:index:RandomPassword                                 pguser_password                     created
 +   ├─ random:index:RandomPassword                                 password                            created
 +   ├─ gcp:container:Cluster                                       gke-cluster                         created
 +   ├─ pulumi:providers:kubernetes                                 gke_k8s                             created
 +   ├─ kubernetes:core/v1:ServiceAccount                           pgoPgo_deployer_saServiceAccount    created
 +   ├─ kubernetes:core/v1:Namespace                                pgNamespace                         created
 +   ├─ kubernetes:core/v1:ConfigMap                                pgoPgo_deployer_cmConfigMap         created
 +   ├─ kubernetes:batch/v1:Job                                     pgoPgo_deployJob                    created
 +   ├─ kubernetes:core/v1:Secret                                   percona_pguser_secretSecret         created
 +   ├─ kubernetes:rbac.authorization.k8s.io/v1:ClusterRole         pgo_deployer_crClusterRole          created
 +   ├─ kubernetes:rbac.authorization.k8s.io/v1:ClusterRoleBinding  pgo_deployer_crbClusterRoleBinding  created
 +   └─ kubernetes:pg.percona.com/v1:PerconaPGCluster               my_cluster_name                     created

Diagnostics:
  pulumi:pulumi:Stack (percona-pg-k8s-pg):
    E0225 14:20:00.211695433   53839 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies

Outputs:
    kubeconfig: "[secret]"

Resources:
    + 13 created

Duration: 5m30s

Verify

Get kubeconfig first:

pulumi stack output kubeconfig --show-secrets > ~/.kube/config

Check if Pods of your PG cluster are up and running:

$ kubectl -n percona-pg get pods
NAME                                             READY   STATUS      RESTARTS   AGE
backrest-backup-pulumi-pg-dbgsp                  0/1     Completed   0          64s
pgo-deploy-8h86n                                 0/1     Completed   0          4m9s
postgres-operator-5966f884d4-zknbx               4/4     Running     1          3m27s
pulumi-pg-787fdbd8d9-d4nvv                       1/1     Running     0          2m12s
pulumi-pg-backrest-shared-repo-f58bc7657-2swvn   1/1     Running     0          2m38s
pulumi-pg-pgbouncer-6b6dc4564b-bh56z             1/1     Running     0          81s
pulumi-pg-pgbouncer-6b6dc4564b-vpppx             1/1     Running     0          81s
pulumi-pg-pgbouncer-6b6dc4564b-zkdwj             1/1     Running     0          81s
pulumi-pg-repl1-58d578cf49-czm54                 0/1     Running     0          46s
pulumi-pg-repl2-7888fbfd47-h98f4                 0/1     Running     0          46s
pulumi-pg-repl3-cdd958bd9-tf87k                  1/1     Running     0          46s

Get the IP-address of pgBouncer LoadBalancer:

$ kubectl -n percona-pg get services
NAME                             TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
…
pulumi-pg-pgbouncer              LoadBalancer   10.20.33.122   35.188.81.20   5432:32042/TCP               3m17s

You can connect to your PostgreSQL cluster through this IP address. Use the pguser password that was set earlier with pulumi config set pg_user_password:

psql -h 35.188.81.20 -p 5432 -U pguser pgdb

Clean up

To delete everything it is enough to run the following commands:

pulumi destroy
pulumi stack rm

Tricks and Quirks

Pulumi Converter

kube2pulumi is a huge help if you already have YAML manifests. You don’t need to rewrite the whole code, but just convert YAMLs to Pulumi code. This is what I did for operator.yaml.

apiextensions.CustomResource

There are two ways for Custom Resource management in Pulumi:

crd2pulumi generates libraries/classes out of Custom Resource Definitions and allows you to create custom resources later using these. I found it a bit complicated and it also lacks documentation.

apiextensions.CustomResource on the other hand allows you to create Custom Resources by specifying them as JSON. It is much easier and requires less manipulation. See lines 446-557 in my __main__.py.

True/False in JSON

I have the following in my Custom Resource definition in Pulumi code:

perconapg = kubernetes.apiextensions.CustomResource(
…
    spec= {
…
    "disableAutofail": False,
    "tlsOnly": False,
    "standby": False,
    "pause": False,
    "keepData": True,

Be sure to use the boolean type of the language of your choice and not the “true”/“false” strings. In my case, using strings caused a failure because the Operator expects booleans, not strings.

Depends On…

Pulumi makes its own decisions on the ordering of provisioning resources. You can enforce the order by specifying dependencies explicitly.

For example, I’m ensuring that Operator and Secret are created before the Custom Resource:

    },opts=ResourceOptions(provider=k8s_provider,depends_on=[pgo_pgo_deploy_job,percona_pg_cluster1_pguser_secret_secret])
