Jun
03
2022
--

MyDumper’s Stream Implementation

MyDumper Stream Implementation

MyDumper Stream ImplementationAs you might know, mysqldump is single-threaded and STDOUT is its default output. As MyDumper is multithreaded, it has to write on different files. Since version 0.11.3 was released in Nov 2021, we have the possibility to stream our backup in MyDumper. We thought for several months until we decided what was the simplest way to implement it and we also had to add support for compression. So, after fixing several bugs, and we now consider it is stable enough, we can explain how it works.

How Can You Stream if MyDumper is Multithreaded?

Receiving a stream is not a problem for myloader, it receives a file at a time and sends it to a thread to process it. However, each worker thread in mydumper is connected to the database, and as soon as it reads data, it should be sent to the stream, which might cause collisions with other worker threads that are reading data from the database. In order to avoid this issue, we ended up with the simplest solution: mydumper is going to take a backup and store it in the local file system that you configured, and the filename will be enqueued to be processed by the Stream Thread which pops one file at a time and pipes to stdout. We study the alternative to send chunks of the file while it is being dumped, but the way that we implemented is simpler and improves the overall performance.

Implementation Details

Here is a high-level diagram of how we implemented it:

MyDumper


When a mydumper Worker Thread processes a job, it connects to the database and stores the output into a file. That didn’t change, but with stream, we are pushing the filename into the mydumper stream_queue.

The mydumper Stream Thread is popping filenames from the mydumper stream_queue, it is going to send the header of the file to stdout and then open the file and send its content.

Then, myloader Stream Thread is going to receive and detect the header, it will create the new file with the filename from the header and store the content in it.

After closing the file, it will enqueue the filename in the myloader stream_queue. A myloader Worker Thread is going to take that file and process it according to the kind of file it is.

By default, the files are deleted, but if you want to keep them, you can use the –no-delete option. 

The header is simply adding — to the filename so you can use myloader or mysql client to import your database. Here is an example:

-- sbtest-schema-create.sql
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `sbtest` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci */ /*!80016 DEFAULT ENCRYPTION='N' */;

-- sbtest.sbtest1-schema.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;

/*!40103 SET TIME_ZONE='+00:00' */;
CREATE TABLE `sbtest1` (
  `id` int NOT NULL AUTO_INCREMENT,
  `k` int NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  `pad2` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=100010 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

-- sbtest.sbtest1.00000.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
INSERT INTO `sbtest1` VALUES(1,49929,"83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330","67847967377-48000963322-62604785301-91415491898-96926520291","")
…

Simple Use Cases

A thread writes to a single file to avoid a collision, which improves the performance. However, having thousands of files for a backup of a couple of tables is not manageable. So, the simplest use case is to send everything to a single file:

mydumper -B <SCHEMA_NAME> -h <FROM> > filename.sql

Then you can just simply import it using:

myloader --stream -o -h <TO_SERVER> < filename.sql

Now that you can pipe from a mydumper process to myloader, this execution is possible:

mydumper -B <SCHEMA_NAME> -h <FROM> | myloader --stream -o -h <TO>

pipe from a mydumper process to myloader

Or you can send the stream through the network using nc: 

mydumper -B <SCHEMA_NAME> -h <FROM_SERVER> | nc <MYDUMPER_SERVER> <ANY_PORT>
nc -l <MYDUMPER_SERVER> <ANY_PORT> | myloader --stream -o -h <TO_SERVER>

 

stream through the network using nc

This implementation is using the backup directory on mydumper and myloader as Buffers, you must take this into account, as by default it is going to create a directory where you run it.

Another thing that you need to take into account is that mydumper and myloader will be writing on disk, the whole backup will be written on both File Systems while it is being processed, and use a file system with enough disk space.

Finally, you can keep myloader running and send several mydumper backups. First, you need to run:

nc -k -l <MYDUMPER_SERVER> <ANY_PORT> | myloader --stream -o -h <TO_SERVER>

And then execute:

mydumper -B <SCHEMA_NAME_1> -h <FROM_SERVER> | nc <MYDUMPER_SERVER> <ANY_PORT>
mydumper -B <SCHEMA_NAME_2> -h <FROM_SERVER> | nc <MYDUMPER_SERVER> <ANY_PORT>
mydumper -B <SCHEMA_NAME_3> -h <FROM_SERVER> | nc <MYDUMPER_SERVER> <ANY_PORT>
mydumper -B <SCHEMA_NAME_4> -h <FROM_SERVER> | nc -N <MYDUMPER_SERVER> <ANY_PORT>

Some versions of nc have these two options:

      -k      When a connection is completed, listen for another one.  Requires -l.

     -N      shutdown(2) the network socket after EOF on the input.  Some servers require this to finish their work.

This is very useful if you are refreshing some testing environment and you only need a couple of tables on different databases or if you are using a where clause that only applies to some tables.

Considerations

Usually, when you send data to STDOUT, you are not going to have trouble with disk space usage on the dumper server. That is NOT true if you are using MyDumper. Files will be stored on the mydumper server until they are transferred to the receiving server. For instance, if you have a 10TB database, with a very low network bandwidth compared to the disk bandwidth, you might end up filling up the disk where you keep the files temporarily.

Conclusion

We focus the implementation to speed up export and import processes. Opposite to other software or implementations, we use the file system as a buffer causing a higher disk utilization.

Jun
03
2022
--

Migration of a MongoDB Replica Set to a Sharded Cluster

Migration of a MongoDB Replica Set to a Sharded Cluster

Migration of a MongoDB Replica Set to a Sharded ClusterIn this blog post, we will discuss how can we migrate from a replica set to sharded cluster. 

Before moving to migration let me briefly explain Replication and Sharding and why do we need to shard a replica Set.

Replication: It creates additional copies of data and allows for automatic failover to another node in case Primary went down. It also helps to scale our reads if the application is fine to read data that may not be the latest.

Sharding: It allows horizontal scaling of data writes by allowing data partition in multiple servers by using a shard key. Here, we should understand that a shard key is very important to distribute the data evenly across multiple servers. 

Why Do We Need a Sharded Cluster?

We need sharding due to the below reasons:

  1. By adding shards, we can reduce the number of operations each shard manages. 
  2. It increases the Read/Write capacity by distributing the Reads/Writes across multiple servers. 
  3. It also gives high availability as we deploy the replicas for the shards, config servers, and multiple MongoS.

Sharded cluster will include two more components which are Config Servers and Query routers i.e. MongoS.

Config Servers: It keeps metadata for the sharded cluster. The metadata comprises a list of chunks on each shard and the ranges that define the chunks. The metadata indicates the state of all the data and its components within the cluster. 

Query Routers(MongoS): It caches metadata and uses it to route the read or write operations to the respective shards. It also updates the cache when there are any metadata changes for the sharded cluster like Splitting of chunks or shard addition etc. 

Note: Before starting the migration process it’s recommended that you perform a full backup (if you don’t have one already).

The Procedure of Migration:

  1. Initiate at least a three-member replica set for the Config Server ( another member can be included as a hidden node for the backup purpose).
  2. Perform necessary OS, H/W, and disk-level tuning as per the existing Replica set.
  3. Setup the appropriate clusterRole for the Config servers in the mongod config file.
  4. Create at least two more nodes for the Query routers ( MongoS )
  5. Set appropriate configDB parameters in the mongos config file.
  6. Repeat step 2 from above to tune as per the existing replica set.
  7. Apply proper SELinux policies on all the newly configured nodes of Config server and MongoS.
  8. Add clusterRole parameter into existing replica set nodes in a rolling fashion.
  9. Copy all the users from the replica set to any MongoS.
  10. Connect to any MongoS and add the existing replica set as Shard. 

Note: Do not enable sharding on any database until the shard key is finalized. If it’s finalized then we can enable the sharding.

Detailed Migration Plan:

Here, we are assuming that a Replica set has three nodes (1 primary, and 2 secondaries)

  1. Create three servers to initiate a 3-member replica set for the Config Servers.

Perform necessary OS, H/W, and disk-level tuning. To know more about it, please visit our blog on Tuning Linux for MongoDB.

  1. Install the same version of Percona Server for MongoDB as the existing replica set from here.
  2. In the config file of the config server mongod, add the parameter clusterRole: configsvr and port: 27019  to start it as config server on port 27019.
  3. If SELinux policy is enabled then set the necessary SELinux policy for dbPath, keyFile, and logs as below.
sudo semanage fcontext -a -t mongod_var_lib_t '/dbPath/mongod.*'

sudo chcon -Rv -u system_u -t mongod_var_lib_t '/dbPath/mongod'

sudo restorecon -R -v '/dbPath/mongod'

sudo semanage fcontext -a -t mongod_log_t '/logPath/log.*'

sudo chcon -Rv -u system_u -t mongod_log_t '/logPath/log'

sudo restorecon -R -v '/logPath/log'

sudo semanage port -a -t mongod_port_t -p tcp 27019

Start all the Config server mongod instances and connect to any one of them. Create a temporary user on it and initiate the replica set.

> use admin

> rs.initiate()

> db.createUser( { user: "tempUser", pwd: "<password>", roles:[{role: "root" , db:"admin"}]})

Create a role anyResource with action anyAction as well and assign it to “tempUser“.

>db.getSiblingDB("admin").createRole({ "role": "pbmAnyAction",

      "privileges": [

         { "resource": { "anyResource": true },

           "actions": [ "anyAction" ]

         }

      ],

      "roles": []

   });

> 

>db.grantRolesToUser( "tempUser", [{role: "pbmAnyAction", db: "admin"}]  )

> rs.add("config_host[2-3]:27019")

Now our Config server replica set is ready, let’s move to deploying Query routers i.e. MongoS.

  1. Create two instances for the MongoS and tune the OS, H/W, and disk. To do it follow our blog Tuning Linux for MongoDB or point 1 from the above Detailed migration.
  2. In mongos config file, adjust the configDB parameter and include only non-hidden nodes of Config servers ( In this blog post, we have not mentioned starting hidden config servers).
  3. Apply SELinux policies if it’s enabled, then follow step 4 and keep the same keyFile and start the MongoS on port 27017.
  4. Add the below parameter in mongod.conf on the Replica set nodes. Make sure the services are restarted in a rolling fashion i.e. start with the Secondaries then step down the existing Primary and restart it with port 27018.
clusterRole: shardsvr

Login to any MongoS and authenticate using “tempUser” and add the existing replica set as a shard.

> sh.addShard( "replicaSetName/<URI of the replica set>") //Provide URI of the replica set

Verify it with:

> sh.status() or db.getSiblingDB("config")['shards'].find()

Connect to the Primary of the replica set and copy all the users and roles. To authenticate/authorize mention the replica set user.

> var mongos = new Mongo("mongodb://put MongoS URI string here/admin?authSource=admin") //Provide the URI of the MongoS with tempUser for authentication/authorization.

>db.getSiblingDB("admin").system.roles.find().forEach(function(d) {

mongos.getDB('admin').getCollection('system.roles').insert(d)});

>db.getSiblingDB("admin").system.users.find().forEach(function(d) { mongos.getDB('admin').getCollection('system.users').insert(d)});

  1.  Connect to any MongoS and verify copied users on it. 
  2.  Shard the database if shardKey is finalized (In this post, we are not sharing this information as it’s related to migration of Replica set to Sharded cluster only).

Shard the database:

>sh.enableSharding("<db>")

Shard the collection with hash-based shard key:

>sh.shardCollection("<db>.<coll1>", { <shard key field> : "hashed" } )

Shard the collection with range based shard key:

sh.shardCollection("<db>.<coll1>", { <shard key field> : 1, ... } )

Conclusion

Migration of a MongoDB replica set to a sharded cluster is very important to scale horizontally, increase the read/write operations, and also reduce the operations each shard manages.

We encourage you to try our products like Percona Server for MongoDB, Percona Backup for MongoDB, or Percona Operator for MongoDB. You can also visit our site to know “Why MongoDB Runs Better with Percona”.

Jun
02
2022
--

How 3 Companies Used Percona to Boost Performance, Availability, and Market Value

Percona Maintains Optimal Customer Health

At Percona, we love collaborating with companies to achieve their goals – no matter how ambitious. With each new customer, we get the chance to apply our open source expertise and technology to solve complex and fascinating database challenges. We’re grateful to work with so many amazing companies and want to share some recent success stories.

Appsuite

Appsuite is a full-scale SaaS provider for hospitality businesses. The company serves as a customer relationship management solution, offering point-of-sale integrated CRM; loyalty, reward, and gift card programs; and online ordering applications. At the onset of COVID-19, Appsuite recognized that, in addition to helping their existing customers adjust to rapid changes resulting from the pandemic, there were new opportunities to create value as a SaaS provider. To expand their business and serve new segments, they needed to scale their operations to support customers with a maximum of 30 locations to those with 800 locations or more. 

The Percona Consulting team guided Appsuite through the design and implementation of a scalable, highly reliable open source database architecture. This included a move to Percona’s open source database software, Percona Server for MySQL. Appsuite continues to collaborate with Percona on critical support projects for their production database systems.

“Percona has the best engineers in the world. Percona provides us with immediate 24×7 access to world-class MySQL engineering support. Avoiding the need to directly hire a skilled DBA has saved us a considerable amount of money.”Ross Hunton, Appsuite CTO

Otto Office

Otto Office is one of the largest business-to-business distance retailers for office supplies in Germany, specializing in mail-order sales, communications equipment, and office furniture to businesses and consumers. Their MySQL database cluster is the central point of their online sales system and requires a fast and reliable system architecture.

Otto Office was already using cluster functionality via Percona XtraDB Cluster to provide quick data synchronization over their different data centers, but they also wanted to improve database latency and architecture. Otto Office again partnered with Percona, this time to review their database environment and recommend improvements for making their cluster more stable, performant, and highly available, including tolerance of data center failure. Following this consulting engagement, Otto Office chose Percona to provide database support in case of an emergency.

“Percona consultants are database experts that focus on stable solutions. The expertise of the Percona consultants fitted perfectly with our team and led to good exchanges that found fast solutions.” – Sven Jacobsen, Director IT & eCommerce, Otto Office

BlaBlaCar

BlaBlaCar is the world’s leading community-based travel network. Headquartered in Paris, France, The company enables more than 100 million members to share rides across 22 countries via its website and mobile apps. To support customer demand for flexible commute scheduling, BlaBlaCar must maintain a fast and reliable system architecture.

Working closely with the Percona Support team, BlaBlaCar used MariaDB to implement performance improvements to increase the uptime of their mission-critical applications. BlaBlaCar chose Percona because of its transparency, unbiased nature, and dedication to the open source database world, believing that the DNA of Percona matches their philosophy of always “choosing the right tool for the job.”

“BlaBlaCar’s Database Reliability Engineering team and Percona share the same belief that unbiased open source database expertise provides great products and attractive communication. Percona Support offers us a guarantee to run our business safely and allows us to be focused on high-value areas.” – Maxime Fouilleul Engineering Manager for BlaBlaCar’s DBRE Team

Get unbiased open source database expertise

In summary, Percona’s open source expertise enabled:

  • Appsuite to use MySQL to scale their business and expand their value in the market.
  • Otto Office to make their MySQL-based central sales system more stable, performant, and highly available.
  • BlaBlaCar to use MariaDB to improve their mission-critical application performance and uptime.

So what do you want to do? Our unbiased open source database experts are ready to roll up their sleeves and help you meet your goals too. Learn more about Percona support and services here

 

Jun
01
2022
--

Session Temporary Tablespaces and Disk Space Usage in MySQL

Session Temporary Tablespaces and Disk Space Usage in MySQL

Session Temporary Tablespaces and Disk Space Usage in MySQLTemporary Tables handling in MySQL has a somewhat checkered past. Some time ago, my colleague Przemek wrote a post about differences in the implementations between versions of MySQL. If you are looking for the differences in implementation, that’s a perfect place to check.

In this post, I’d like to focus on Session Temporary Tablespaces, InnoDB on-disk internal temporary tables that come to play when a query requires storing more data than tmp_table_size or TempTable engine allocated more than temptable_max_mmap of memory-mapped temporary files.

If that condition happens, a session executing the query needs to be allocated with a tablespace from a pool of temporary tablespaces. The pool initially contains 10 temporary tablespaces that are created when an instance is started. The size of the pool never shrinks, and tablespaces are added to the pool automatically if needed. The default size of the tablespace (IBT file) is five pages or 80 KB.

When a session disconnects, its temporary tablespaces are truncated and released to the pool with their initial size. The truncate operation happens ONLY when the session disconnects; this also means that as long as the session is connected, the tablespaces consume disk space. MySQL can reuse the area for future queries, but if one query requires a lot of temporary space, the IBT will remain huge for the whole lifetime of a session.

Let’s demonstrate that behavior. Firstly, let’s create a table and populate it with one million rows. As a side note, storing SHA1 checksums in a char(40) field isn’t perfect, but it makes the test cleaner.

mysql> CREATE TABLE table01 (id int unsigned primary key auto_increment, s char(40));
mysql> INSERT INTO table01 (s) SELECT sha1(FLOOR(RAND() * 100))  from sequence_table(1000000) t;

Then, the below query that’s using temporary tables is going to be executed. As the temporary table size is larger than a value of tmp_table_size, which is 16 MB by default, that query will have to use the Session Temporary Tablespace.

mysql> pager pt-visual-explain
mysql> explain SELECT * FROM ( SELECT * FROM table01 UNION SELECT * FROM table01 ORDER BY s DESC) t LIMIT 1;
Table scan
rows           1991860
+- DERIVED
   table          derived(temporary(union(table01,table01)))
   +- Table scan
      +- TEMPORARY
         table          temporary(union(table01,table01))
         +- Filesort
            +- Table scan
               +- UNION
                  table          union(table01,table01)
                  +- Table scan
                  |  rows           995930
                  |  +- Table
                  |     table          table01
                  +- Table scan
                     rows           995930
                     +- Table
                        table          table01

mysql> SELECT * FROM ( SELECT * FROM table01 UNION SELECT * FROM table01 ORDER BY s DESC) t LIMIT 1;
+--------+------------------------------------------+
| id     | s                                        |
+--------+------------------------------------------+
| 999145 | fe5dbbcea5ce7e2988b8c69bcfdfde8904aabc1f |
+--------+------------------------------------------+

Let’s check the temporary tablespaces attached to that session after executing the query:

mysql> SELECT PATH, format_bytes(SIZE), STATE, PURPOSE FROM INFORMATION_SCHEMA.INNODB_SESSION_TEMP_TABLESPACES WHERE id = CONNECTION_ID();
+----------------------------+--------------------+--------+-----------+
| PATH                       | format_bytes(SIZE) | STATE  | PURPOSE   |
+----------------------------+--------------------+--------+-----------+
| ./#innodb_temp/temp_10.ibt | 392.00 MiB         | ACTIVE | INTRINSIC |
+----------------------------+--------------------+--------+-----------+

# ls -la './#innodb_temp/temp_10.ibt'
-rw-r----- 1 mysql mysql 392M 05-29 14:10 temp_10.ibt

After closing the session the file size is back to its default value:

# ls -la './#innodb_temp/temp_10.ibt'
-rw-r----- 1 mysql mysql 80K 05-29 14:10 temp_10.ibt

The solution works as described in the MySQL manual, however, it also means that it can be problematic. Long-living sessions aren’t uncommon in the databases world, primarily if applications use connection pooling. Moreover, connection pooling was designed precisely for that purpose, to mitigate the overhead of creating a new connection each time an application needs it, as reusing already existing connections to a database is more efficient than opening a new connection.

For instance, if between an application and MySQL, ProxySQL is used as middleware, the application lacks most of the control of the backend connection. It means that balanced connections to the backend are going to be used, but also, the connections will most likely live forever. Because of that fact, MySQL will rarely reclaim space from Session Temporary Tablespace, and this, as a result, will increase the overall disk space utilization.

It’s not that hard to hit. One heavy OLAP query from time to time it’s enough to start the snowball.

ProxySQL has an option that forces backend connection to be re-initialized – mysql-connection_max_age_ms (disabled by default). However, this works only for ProxySQL, and there are plenty more Connection Poolers available, and some of them are implemented directly on the applications’ side, which in general increases the complexity of the whole problem.

In an ideal world, MySQL should handle this issue by triggering the truncate process more often. I’ve submitted a Feature Request with the goal of having the “surprise” factor removed, and ultimately making it more user-friendly.

May
31
2022
--

Percona Platform First Steps

Percona Platform

I am a big fan of Percona Monitoring and Management (PMM) and am happy to report that setting up Percona Platform is as easy to set up and offers a lot of value. Percona Platform reached GA status recently and I think you will find it a handy addition to your infrastructure.

What is Percona Platform?

Percona Platform brings together enterprise-level distributions of MySQL, PostgreSQL, and MongoDB plus it includes a range of open source tools for data backup, availability, and management. The core is PMM which provides database management, monitoring, and automated insights, making it easier to manage database deployments. The number of sites with more than 100 separate databases has grown rapidly in the past few years.  Being able to have command and control of that many instances from a CLI has become impossible. Businesses need to move faster in increasingly complex environments which puts ever-increasing pressure on database administrators, developers, and everyone involved in database operations. The spiraling levels of demand make it harder to support, manage, and correct issues in database environments.

What Percona Platform provides is a unified view of the health of your entire database environment to quickly visually identify and remediate issues.  Developers can now self-service many of their database demands quickly and efficiently so they can easily provision and manage databases on a self-service basis across test and production instances. So you spend fewer resources and time on the management of database complexity.

The two keys to Percona Platform are Query Analytics (QAN), which provides granular insights into database behavior and helps uncover new database performance patterns for in-depth troubleshooting and performance optimization, and Percona Advisors, which are automated insights, created by Percona Experts to identify important issues for remediation such as security vulnerabilities, misconfigurations, performance problems, policy compliance, and database design issues. Automated insights within Percona Monitoring and Management ensure your database performs at its best. The Advisors check for replication inconsistencies, durability issues, password-less users, insecure connections, unstable OS configuration, and search for available performance improvements among other functions. 

Percona Platform is a point of control for your database infrastructure and augments PMM to be even more intelligent when connected to the Percona Platform. By connecting PMM with the Percona Platform, you get more advanced Advisors, centralized user account management, access to support tickets, private Database as a Service, Percona Expertise with the fastest SLAs, and more.

Percona Platform architecture

So How Do I Install Percona Platform?

The first step is to install PMM by following the Quick Start Guide. You need version 2.2.7 or later.

Second, You will need a Percona Account.

Third, you will need to connect that account to PMM.

I will assume that you will already have PMM installed.  Did I mention that PMM is free, open source software?

Percona Platform account signup

The signup form allows you to create a new account or use an existing account.

Now you can create a name for your organization.

Form for creating your organization and connecting PMM

After creating your username and password, create your organization

Now login to your PMM dashboard and select the Settings / Percona Platform.  You will need to get your ‘Public Address’ which the browser can populate the value for you if need be.

The PMM Server ID is automatically generated by PMM.  You will need to provide a name for your server, and you will need a second browser window to login into Percona Platform to get the Percona Platform Access Token (this token has a thirty-minute lifetime, so be quick or regenerate another token).

Go back into PMM, paste the Access Token into the Percona Platform Access Token field, and click Connect.

On the Percona Platform page, you will see your PMM instances. Congratulations, you are using Percona Platform!

Advisor Checks

All checks are hosted on Percona Platform. PMM Server automatically downloads them from here when the Advisors and Telemetry options are enabled in PMM under Configuration > Settings > Advanced Settings. Both options are enabled by default.

Depending on the entitlements available for your Percona Account, the set of advisor checks that PMM can download from Percona Platform differ in terms of complexity and functionality.

If your PMM instance is not connected to Percona Platform, PMM can only download the basic set of Anonymous advisor checks. As soon as you connect your PMM instance to Percona Platform, has access to additional checks, available only for Registered PMM instances.

If you are a Percona customer with a Percona Customer Portal account, you also get access to Paid checks, which offer more advanced database health information. A list is provided below.

Check Name Description Tier
MongoDB Active vs Available Connections Checks the ratio between Active and Available connections. Registered, Paid
MongoDB Authentication Warns if MongoDB authentication is disabled. Anonymous, Registered, Paid
MongoDB Security AuthMech Warns if MongoDB is not using the default SHA-256 hashing as SCRAM authentication method. Paid
MongoDB IP Bindings Warns if MongoDB network binding is not set as recommended. Anonymous, Registered, Paid
MongoDB CVE Version Shows an error if MongoDB or Percona Server for MongoDB version is not the latest one with CVE fixes. Anonymous, Registered, Paid
MongoDB Journal Check Warns if journal is disabled. Registered, Paid
MongoDB Localhost Authentication Bypass is Enabled Warns if MongoDB localhost bypass is enabled. Anonymous, Registered, Paid
MongoDB Non-Default Log Level Warns if MongoDB is not using the default log level. Paid
MongoDB Profiling Level Warns when the MongoDB profile level is set to collect data for all operations. Registered, Paid
MongoDB Read Tickets Warns if MongoDB is using more than 128 read tickets. Paid
MongoDB Replica Set Topology Warns if the Replica Set cluster has less than three members. Registered, Paid
MongoDB Version Warns if MongoDB or Percona Server for MongoDB version is not the latest one. Anonymous, Registered, Paid
MongoDB Write Tickets Warns if MongoDB network is using more than 128 write tickets. Paid
Check if Binaries are 32-bits Notifies if version_compatible_machine equals i686. Anonymous, Registered, Paid
MySQL Automatic User Expired Password Notifies if version_compatible_machine equals i686. Registered, Paid
MySQL InnoDB flush method and File Format check Checks the following settings: innodb_file_formatinnodb_file_format_maxinnodb_flush_method and innodb_data_file_path Registered, Paid
MySQL Checks based on values of MySQL configuration variables Checks the following settings: innodb_file_format,innodb_file_format_max,innodb_flush_method and innodb_data_file_path. Paid
MySQL Binary Logs checks, Local infile and SQL Mode checks Warns about non-optimal settings for Binary Log, Local Infile and SQL mode. Registered, Paid
MySQL Configuration Check Warns if parameters are not following Percona best practices, for infile, replication threads, and replica checksum. Paid
MySQL Users With Granted Public Networks Access Notifies about MySQL accounts allowed to be connected from public networks. Registered, Paid
MySQL User Check Runs a high-level check on user setup Registered, Paid
MySQL Advanced User Check Runs a detailed check on user setup Paid
MySQL Security Check Runs a detailed check on user setup Paid
MySQL Test Database This check returns a notice if there is a database with name ‘test’ or ‘test_%’. Registered, Paid
MySQL Version Warns if MySQL, Percona Server for MySQL, or MariaDB version is not the latest one. Anonymous, Registered, Paid
PostgreSQL Archiver is Failing Verifies if the archiver has failed. Paid
PostgreSQL Cache Hit Ratio Checks database hit ratio and complains when this is too low. Paid
PostgreSQL Configuration Change Requires Restart/Reload Warns when a configuration was changed and requires a server restart/reload Registered, Paid
PostgreSQL fsync is Disabled Shows an error if the fsync configuration is disabled, as this can result in unrecoverable data corruption. Anonymous, Registered, Paid
PostgreSQL Autovacuum Logging Is Disabled Shows an error if the fsync configuration is disabled, as this can result in unrecoverable data corruption. Paid
PostgreSQL Checkpoints Logging Is Disabled Notifies if the log_checkpoints configuration option is not enabled. Paid
PostgreSQL Max_connections is too high Notifies if the max_connections setting is set above 300. Anonymous, Registered, Paid
PostgreSQL Stale Replication Slot Warns for stale replication slots since these can lead to WAL file accumulation and DB server outage. Paid
PostgreSQL Super Role Notifies if there are users with superuser privileges. Anonymous, Registered, Paid
PostgreSQL Version Check Warns if the PostgreSQL minor or major versions are not the latest, and shows an error if the major version is 9.4 or older. Anonymous, Registered, Paid

More Advisors will be added and Perona always welcomes contributions. Check out Percona Platform today!

 

May
31
2022
--

KMIP in Percona Distribution for MongoDB, Amazon KWS in Percona XtraBackup: Release Roundup May 31, 2022

Percona Release Roundup May 31 2022

It’s time for the release roundup!

Percona Release Roundup May 31 2022Percona is a leading provider of unbiased open source database solutions that allow organizations to easily, securely, and affordably maintain business agility, minimize risks, and stay competitive.

Our Release Roundups showcase the latest Percona software updates, tools, and features to help you manage and deploy our software. It offers highlights and critical information, as well as links to the full release notes and direct links to the software or service itself to download.

Today’s post includes those releases and updates that have come out since May 16, 2022. Take a look!

 

Percona Distribution for MongoDB 4.4.14-14

On May 25, 2022, we released Percona Distribution for MongoDB 4.4.14-14, a freely available MongoDB database alternative, giving you a single solution that combines enterprise components from the open source community, designed and tested to work together. The aim of Percona Distribution for MongoDB is to enable you to run and operate your MongoDB efficiently with the data being consistently backed up.

This release of Percona Distribution for MongoDB includes improvements and bug fixes, provided by MongoDB and included in Percona Server for MongoDB. The most notable of them are the following:

  • Support of the master key rotation for data encrypted using the Keys Management Interoperability Protocol (KMIP) protocol (tech preview feature). This improvement allows users to comply with regulatory standards for data security.
  • Fixed the issue where having a large number of split points causes the chunk splitter to not function correctly and huge chunks would not be split without manual intervention. This can be caused when having small shard key ranges and a very high number of documents and where more than 8192 split points would be needed.
  • Added the repairShardedCollectionChunksHistory command to restore history fields for some chunks. This aims to fix broken snapshot reads and distributed transactions.
  • Fixed incorrect logging of queryHash/planCacheKey for operations that share the same $lookup shape
  • Added a new startup parameter that skips verifying the table logging settings on restarting as a replica set node from the standalone mode during the restore. This speeds up the restore process.

Download Percona Distribution for MongoDB 4.4.14-14

 

Percona Distribution for MongoDB 4.2.20

Percona Distribution for MongoDB 4.2.20 was released on May 23, 2022. Bug fixes, provided by MongoDB and included in Percona Server for MongoDB 4.2.20-20 and Percona Distribution for MongoDB 4.2.20 are the following:

  • Support of the master key rotation for data encrypted using the Keys Management Interoperability Protocol (KMIP) protocol (tech preview feature). This improvement allows users to comply with regulatory standards for data security.
  • Abort the WiredTiger transaction after releasing Client lock to avoid deadlocks.
  • Check if the host has cgroups v2 enabled and read the memory limits according to that.
  • Return the shutdown error as the top-level error when batch writing command fails due to mongos shutdown.
  • Fixed the double free state for the DocumentSource::optimizeAt() pipeline by making sure all pointers are in place before disposing of the pipeline prefix.

Download Percona Distribution for MongoDB 4.2.20

 

Percona Server for MongoDB 4.4.14-14

On May 25, 2022, Percona Server for MongoDB 4.4.14-14  was released. It is an enhanced, source-available, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 4.4.14 Community Edition. It supports MongoDB 4.4.14 protocols and drivers. A release highlight is that it now supports the master key rotation for data encrypted using the Keys Management Interoperability Protocol (KMIP) protocol (tech preview feature). This improvement allows users to comply with regulatory standards for data security.

Download Percona Server for MongoDB 4.4.14-14

 

Percona Server for MongoDB 4.2.20-20

May 23, 2022, saw the release of Percona Server for MongoDB 4.2.20-20. Improvements and bug fixes introduced by MongoDB and included in Percona Server for MongoDB are the following:

  • SERVER-64983 – Abort the WiredTiger transaction after releasing Client lock to avoid deadlocks
  • SERVER-60412 – Check if the host has cgroups v2 enabled and read the memory limits according to that.
  • SERVER-64642 – Return the shutdown error as the top-level error when batch writing command fails due to mongos shutdown.
  • SERVER-59435 – Fixed the double free state for the DocumentSource::optimizeAt() pipeline by making sure all pointers are in place before disposing of the pipeline prefix

Download Percona Server for MongoDB 4.2.20-20

 

Percona XtraDB Cluster 5.7.37-31.57

On May 18, 2022, Percona XtraDB Cluster 5.7.37-31.57 was released. It supports critical business applications in your public, private, or hybrid cloud environment. Our free, open source, enterprise-grade solution includes the high availability and security features your business requires to meet your customer expectations and business goals.

The following lists a number of the notable updates and fixes for MySQL 5.7.37, provided by Oracle, and included in Percona Server for MySQL:

  • The performance on debug builds has been improved by optimizing the buf_validate() function in the InnoDB sources.
  • Fix for when a query using an index that differs from the primary key of the partitioned table results in excessive CPU load.
  • Enabling PAD_CHAR_TO_FULL_LENGTH SQL mode on a replica server added trailing spaces to a replication channel’s name in the replication metadata repository tables. Attempts to identify the channel using the padded name caused errors. The SQL mode is disabled when reading from those tables.

Find the complete list of bug fixes and changes in MySQL 5.7.37 Release Notes.

Download Percona XtraDB Cluster 5.7.37-31.57

 

Percona XtraBackup 8.0.28-21

Percona XtraBackup 8.0.28-21 was released on May 25, 2022. It enables MySQL backups without blocking user queries. Percona XtraBackup is ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups. Percona XtraBackup adds support for the Amazon Key Management Service (KMS) component.

Download Percona XtraBackup 8.0.28-21

 

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MongoDB, PostgreSQL, MariaDB, and other open source databases in on-premises and cloud environments.

May
30
2022
--

Talking Drupal #349 – Storage Entities

Today we are talking about Storage Entities with Martin Anderson-clutz.

www.talkingDrupal.com/349

Topics

  • Miller
  • TD site update
  • Storage Entities
  • Inspiration
  • Benefit of new entity type
  • Comparison to custom entity
  • Lightweight by design
  • Roadmap
  • Use cases
  • Revisions
  • Inline entity form
  • Core candidacy
  • Naming modules
  • Getting Started

Resources

Guests

Martin Anderson-Clutz – @mandclu

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan John Picozzi – www.epam.com @johnpicozzi Mike Herchel – herchel.com@mikeherchel

MOTW

Preview Link Provides a preview link that is tokenised, and therefore bypasses access control to allow any user to view the latest revision of an entity.

May
30
2022
--

Macrame Baby Swing – A Boho-Inspired Addition To Your Child’s Room

Are you looking for a fun, boho-inspired addition to your child’s room? Then macrame baby swing is a must-have. It’s perfect for those lazy summer days when you want to relax and watch your little one play.

Macrame baby swings are becoming increasingly popular, and it’s no wonder why. They add a touch of whimsy and charm to any room. Macrame baby swings are similar to hammocks, but they’re smaller and have a more delicate design.

Whether you want to buy a macrame baby swing or do it yourself, we’ve got you covered. This article will show you how to make your own macrame baby swing and where to buy one if you’re not up for the DIY challenge.

What Is Macrame?

Macrame is a form of textile-making that uses knotting techniques to create intricate patterns. It first became popular in the 1970s but has seen a resurgence in recent years due to the bohemian/boho-chic trend.

While macrame can be used to create a wide variety of items, It’s often used to make wall hangings, plant hangers, and other home decor items.

Is A Macrame Baby Swing Safe?

Yes, macrame baby swings are safe. They’re made out of lightweight materials and have sturdy construction. However, it’s important to make sure that the swing is hung securely and that there’s nothing nearby that your child could bump into while playing.

A lot of time macrame baby swings come with wooden elements like a bar or hoop. These can add extra stability and security to the swing.

Where To Put The Macrame Baby Swing?

Macrame baby swings can be hung indoors or outdoors. They look great in a child’s bedroom, playroom, or even living room. If you’re planning on using it outdoors, choose a shady spot out of the way of any potential hazards.

As with any piece of baby gear, there are some safety considerations to keep in mind when using a macrame baby swing.

  1. Always use the swing indoors or in a shady, well-ventilated area outdoors. The swing should never be left in direct sunlight.
  2. Make sure the swing is on a level surface before letting your child use it. The last thing you want is for the swing to tip over while your child is in it.
  3. Never leave your child unattended in the swing. Always stay within arms’ reach if your child needs help getting out or falls asleep in the swing.
  4. Inspect the swing regularly for any signs of wear and tear. If you see any fraying or damage to the rope, discontinue the use of the swing and replace the damaged parts.

By following these simple safety tips, you can ensure your child has a safe and enjoyable experience in their macrame baby swing.

Best Ready-To-Buy Macrame Baby Swings

If you’re not up for the challenge of making your own macrame baby swing, then there are plenty of options available to purchase. Here are some of our favorites!

S.N.HANDICRAFTS Handmade Macrame Baby Swing

This macrame baby swing is handmade in India using 100% cotton rope. It’s durable and perfect for indoor or outdoor use. Perfect for toddlers up to 50 lbs.

Mass Lumber Macrame Baby Swing Outdoor Seat with Belt

This macrame baby swing is made out of durable materials. It has a safety belt to keep your child secure and can hold up to 110 lbs. Perfect for use outdoors.

N\A Wooden Macrame Baby Swing

A high-quality, handmade cotton Macrame baby swing composed of solid wood and knitted by hand, offering excellent safety and quality. The baby swing includes a 39-inch chain and a non-slip children’s seat cushion, which is both comfortable and supportive for your little one.

The swing may be raised or lowered as required to ensure that the infant is more comfortable. The swing can bear up to 80 pounds, and it is designed for children aged 1-5.

Bean Sprout Baby Collection – Baby Hammock Swing Chair

The Bean Sprout Baby Hammock Swing Chair is the perfect place for your little one to relax. It is a premium quality macrame baby swing made of 100% cotton. It is soft, comfortable, and safe for your baby.

Choosing A Macrame Baby Swing

When choosing a macrame baby swing, there are a few things to keep in mind. First, consider the size of the swing. It should be big enough for your baby to comfortably sit or lie down in but not so large that it takes up too much space in the room.

Second, think about the design. There are many different macrame patterns to choose from. Some are more intricate than others. Consider the overall style of your home and choose a swing that fits in with the rest of your decor.

Last, think about functionality. Some swings come with additional features like a wooden bar or hoop. These can add stability and security to the swing. Others come with removable cushions for added comfort. Choose the features that are most important to you and your family.

How To Make Macrame Baby Swing Yourself

Making your own macrame baby swing is excellent if you’re feeling crafty and up for the challenge. You will save money, but you’ll also get to choose the perfect design and color scheme for your home.

The Internet is full of different tutorials on making a macrame baby swing. We recommend you check out this step-by-step guide:

What Do You Need To Make A Macrame Baby Swing?

  • Macrame cord
  • Wooden dowel or hoop
  • Wooden base
  • Scissors
  • Tape measure
  • Pencil or pen
  • Paper clips

Choosing the suitable macrame cord/rope is crucial in making your own macrame baby swing. The cord should be strong enough to support your child’s weight but also soft and comfortable to the touch.

We recommend using a 3/8-inch (9mm) cotton rope. This size is strong enough to support most babies and toddlers, but it’s also soft and gentle on the skin.

Other popular macrame cords are made of jute or hemp. These materials are also strong and durable, but they’re not as soft as cotton. If you choose to use one of these materials, add a cushion or padding to the swing, so your child is comfortable.

When searching for a macrame baby swing pattern or tutorial, double-check a list of supplies needed. Some designs call for special tools or equipment that you may not have around the house.

We also recommend reading through the entire pattern before starting. This will help you understand the steps involved and ensure you have everything you need.

How Much Macrame Cord Do I Need For A Swing?

The amount of cord you need will depend on the size of the swing and the design you choose.
Most macrame baby swing patterns call for between 50 and 100 feet (15 to 30 meters) of macrame cord.

We recommend purchasing at least 200 feet (60 meters) of cord to be safe. This will give you enough to make a baby swing and allow some mistakes along the way. You can always use the extra cord for another project or donate it to a local craft store.

What Size Rope Is Best For A Baby Swing?

The most popular macrame cord size for a baby swing is 3/8-inch (9mm). This size is strong enough to support most babies and toddlers, but it can vary depending on a particular project.

Why Make A Macrame Baby Swing?

A macrame baby swing is a beautiful and unique addition to your child’s nursery or playroom. Not only is it eye-catching, but it’s also sturdy and functional.

As your child grows, they’ll be able to enjoy the swing as a fun place to play or relax. And when they’re no longer using it, you can easily repurpose it into a wall hanging or other home decor item.

If you decide to make a macrame baby swing yourself, it’s a great way to add a personal touch to your child’s room. Plus, there is something very satisfying about creating something beautiful with your own two hands. Your baby will be able to enjoy the swing for years to come, and it will always hold sentimental value.

The post Macrame Baby Swing – A Boho-Inspired Addition To Your Child’s Room appeared first on Comfy Bummy.

May
27
2022
--

Physical Backup Support in Percona Backup for MongoDB

Physical Backup Support in Percona Backup for MongoDB

Percona Backup for MongoDB (PBM) is a backup utility custom-built by Percona to help solve the needs of customers and users who don’t want to pay for proprietary software like MongoDB Enterprise and Ops Manager but want a fully-supported open-source backup tool that can perform cluster-wide consistent backups in MongoDB.

Version 1.7.0 of PBM was released in April 2022 and comes with a technical preview of physical backup functionality. This functionality enables users to benefit from the reduced recovery time. 

With logical backups, you extract the data from the database and store it in some binary or text format, and for recovery, you write all this data back to the database. For huge data sets, it is a time-consuming operation and might take hours and days. Physical backups take all your files which belong to the database from the disk itself, and recovery is just putting these files back. Such recovery is much faster as it does not depend on the database performance at all.

In this blog post, you will learn about the architecture of the physical backups feature in PBM, see how fast it is compared to logical backups, and try it out yourself.

Tech Peek

Architecture Review

In general, physical backup means a copy of a database’s physical files. In the case of Percona Server for MongoDB (PSMDB) these are WiredTiger

*.wt

  Btree, config, metadata, and journal files you can usually find in

/data/db

.  The trick is to copy all those files without stopping the cluster and interrupting running operations. And to be sure that data in files is consistent and no data will be changed during copying. Another challenge is to achieve consistency in a sharded cluster. In other words, how to be sure that we are gonna be able to restore data to the same cluster time across all shards.

PBM’s physical backups are based on the backupCursors feature of PSMDB (PSMDB). This implies that to use this feature, you should use Percona Server for MongoDB.

Backup

On each replica set, PBM uses

$backupCursor

to retrieve a list of files that need to be copied to achieve backup. Having that list next step is to ensure cluster-wide consistency. For that, each replica set posts a cluster time of the latest observed operation. The backup leader picks the most recent one. This will be the common backup timestamp (recovery timestamp) saved as the

last_write_ts

in the backup metadata. After agreeing on the backup time, the

pbm-agent

on each cluster opens a

$backupCursorExtend

. The cursor will return its result only after the node reaches the given timestamp. Thus the returned list of logs (journals) will contain the “common backup timestamp”. At that point, we have a list of all files that have to be in the backup. So each node copies them to the storage, saves metadata, closes cursors, and calls it a backup. Here is a blog post explaining Backup Cursors in great detail.

Of course, PBM does a lot more behind the scenes starting from electing appropriate nodes for the backup, coordinating operations across the cluster, logging, error handling, and many more. But all these subjects are for other posts.

Backup’s Recovery Timestamp

Restoring any backup PBM returns the cluster to some particular point in time. Here we’re talking about the time, not in terms of wall time but MongoDB’s cluster-wide logical clock. So the point that point-in-time is consistent across all nodes and replica sets in a cluster. In the case of logical or physical backup, that time is reflected in the

complete

section of

pbm list

  of

pbm status

  outputs. E.g.:

    2022-04-19T15:36:14Z 22.29GB <physical> [complete: 2022-04-19T15:36:16]
    2022-04-19T14:48:40Z 10.03GB <logical> [complete: 2022-04-19T14:58:38]

This time is not the time when a backup has finished, but the time at which cluster state was captured (hence the time the cluster will be returned to after the restore). In PBM’s logical backups, the recovery timestamp tends to be closer to the backup finish. To define it, PBM has to wait until the snapshot on all replica sets finishes. And then it starts oplog capturing from the backup start up to that time. Doing physical backups, PBM would pick a recovery timestamp right after a backup start. Holding the backup cursor open guarantees the checkpoint data won’t change during the backup, and PBM can define complete-time right ahead.

Restore

There are a few considerations for restoration.

First of all, files in the backup may contain operations beyond the target time (

commonBackupTimestamp

). To deal with that, PBM uses a special function of the replication subsystem’s startup process to set the limit of the oplog being restored. It’s done by setting the

oplogTruncateAfterPoint

value in the local DB’s

replset.oplogTruncateAfterPoint

 collection.

Along with the

oplogTruncateAfterPoint

 database needs some other changes and clean-up before start. This requires a series of restarts of PSMDB in a standalone mode.

Which in turn brings some hassle to the PBM operation. To communicate and coordinate its work across all agents, PBM relies on PSMDB itself. But once a cluster is taken down, PBM has to switch to communication via storage. Also, during standalone runs, PBM is unable to store its logs in the database. Hence, at some point during restore,

pbm-agent

logs are being available only in agents’ stderr. And

pbm logs

 won’t have access to them. We’re planning to solve this problem by the physical backups GA.

Also, we had to decide on the restore strategy in a replica set. One way is to restore one node, then delete all data on the rest and let the PSMDB replication do the job. Although it’s a bit easier, it means until InitialSync finishes, the cluster will be of little use. Besides, logical replication at this stage almost neglects all the speed benefits (later on that) the physical restore brings to the table. So we went with the restoration of each node in a replica set. And making sure after the cluster starts, no node will spot any difference and won’t start ReSync.

As with the PBM’s logical backups, the physical once currently can be restored to the cluster with the same topology, meaning replica set names in the backup and the target cluster should match. Although it won’t be an issue for logical backups starting from the next PBM version. And later this feature will be extended to the physical backups as well. Along with that, the number of replica sets in the cluster could be more than those in the backup but not vice-versa. Meaning all data in the backup should be restored. 

Performance Review

We used the following setup:

  • Cluster: 3-node replica set. Each mongod+pbm-agent on Digital Ocean droplet: 16GB, 8vCPU (CPU optimized).
  • Storage: nyc3.digitaloceanspaces.com
  • Data: randomly generated, ~1MB documents

physical backup MongoDB

In general, a logical backup should be more beneficial on small databases (a few hundred megabytes). Since on such a scale, the extra overhead on top of data that physical files bring still makes a difference. Basically reading/writing only user data during logical backup means less data needs to be transferred over the network. But as the database grows, overhead on logical read(select) and mostly write(insert) became a bottleneck for the logical backups. As for the physical backup, the speed is almost always bounded only by the network bandwidth to/from remote storage. In our tests, restoration time from physical backups has linear dependency on the dataset size, whereas logical restoration time grows non-linearly. The more data you have, the longer it takes to replay all the data and rebuild indexes. For example, for a 600GB dataset physical restore took 5x less time compared to logical. 

But on a small DB size, the difference is neglectable – a couple of minutes. So the main benefit of logical backups lay beyond the performance. It’s flexibility. Logical backups allow partial backup/restore of the database (on the roadmap for PBM). You can choose particular databases and/or collections to work with.  As physical backups work directly with database storage-engine files, they operate in an all-or-nothing frame.

Hands-on

PBM Configuration

In order to start using PBM with PSMDB or MongoDB, install all the necessary packages according to the installation instructions. Please note that starting from version 1.7.0 the user running the

pbm-agent

 process should also have the read/write access to PSMDB data directory for the purpose of performing operations with datafiles during physical backup or restore. 

Considering the design, starting from 1.7.0 the default user for

pbm-agent

is changed from

pbm

to

mongod

. So unless PSMDB runs under a different user than

mongod

, no extra actions are required. Otherwise, please carefully re-check your configuration and provide the necessary permissions to ensure proper PBM functioning.

In addition, keep in mind that for using PBM physical backups, you should run Percona Server for MongoDB starting from versions 4.2.15-16 and 4.4.6-8 and higher – this is where hotBackups and backup cursors were introduced.

Creating a Backup

With the new PBM version, you can specify what type of backup you wish to make: physical or logical. By default when no type is selected, PBM makes a logical backup.

> pbm backup
Starting backup '2022-04-20T11:12:53Z'....
Backup '2022-04-20T11:12:53Z' to remote store 's3://https://storage.googleapis.com/pbm-bucket' has started

> pbm backup -t physical
Starting backup '2022-04-20T12:34:06Z'....
Backup '2022-04-20T12:34:06Z' to remote store 's3://https://storage.googleapis.com/pbm-bucket' has started

> pbm status -s cluster -s backups
Cluster:
========
rs0:
  - rs0/mongo1.perconatest.com:27017: pbm-agent v1.7.0 OK
  - rs0/mongo2.perconatest.com:27017: pbm-agent v1.7.0 OK
  - rs0/mongo3.perconatest.com:27017: pbm-agent v1.7.0 OK
Backups:
========
S3 us-east-1 s3://https://storage.googleapis.com/pbm-bucket
  Snapshots:
    2022-04-20T12:34:06Z 797.38KB <physical> [complete: 2022-04-20T12:34:09]
    2022-04-20T11:12:53Z 13.66KB <logical> [complete: 2022-04-20T11:12:58]

Point-in-Time Recovery

Point-in-Time Recovery is currently supported only for logical backups. It means that a logical backup snapshot is required for pbm-agent to start periodically saving consecutive slices of the oplog. You can still make a physical backup while PITR is enabled, it won’t break or change the oplog saving process. 

The restoration process to the specific point in time will also use a respective logical backup snapshot and oplog slices which will be replayed on top of the backup.

Checking the Logs

During physical backup, PBM logs are available via

pbm logs

command as well as for all other operations. 

> pbm logs -e backup/2022-04-20T12:34:06Z
2022-04-20T12:34:07Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] backup started
2022-04-20T12:34:12Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] uploading files
2022-04-20T12:34:54Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] uploading done
2022-04-20T12:34:56Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] backup finished

As for restore,

pbm logs

command doesn’t provide information about restore from a physical backup. It’s caused by peculiarities of the restore procedure and will be improved in the upcoming PBM versions. However,

pbm-agent

still saves log locally, so it’s possible to check information about restore process on each node: 

> sudo journalctl -u pbm-agent.service | grep restore
pbm-agent[12560]: 2022-04-20T19:37:56.000+0000 I [restore/2022-04-20T12:34:06Z] restore started
.......
pbm-agent[12560]: 2022-04-20T19:38:22.000+0000 I [restore/2022-04-20T12:34:06Z] copying backup data
.......
pbm-agent[12560]: 2022-04-20T19:38:39.000+0000 I [restore/2022-04-20T12:34:06Z] preparing data
.......
pbm-agent[12560]: 2022-04-20T19:39:12.000+0000 I [restore/2022-04-20T12:34:06Z] restore finished <nil>
pbm-agent[12560]: 2022-04-20T19:39:12.000+0000 I [restore/2022-04-20T12:34:06Z] restore finished successfully

Restoring from a Backup

The restore process from a physical backup is similar to a logical one but requires several extra steps after the restore is finished by PBM.

> pbm restore 2022-04-20T12:34:06Z
Starting restore from '2022-04-20T12:34:06Z'.....Restore of the snapshot from '2022-04-20T12:34:06Z' has started. Leader: mongo1.perconatest.com:27017/rs0

After starting the restore process, pbm CLI returns the leader node ID, so it’s possible to track the restore progress by checking logs of the pbm-agent leader. In addition, status is written to the metadata file created on the remote storage. The status file is created in the root of the storage path and has the format

.pbm.restore/<restore_timestamp>.json

. As an option it’s also possible to pass

-w

flag during restore which will block the current shell session and wait for the restore to finish:

> pbm restore 2022-04-20T12:34:06Z -w
Starting restore from '2022-04-20T12:34:06Z'....Started physical restore. Leader: mongo2.perconatest.com:27017/rs0
Waiting to finish...........................Restore successfully finished!

After the restore is complete, it’s required to perform the following steps:

  • Restart all
    mongod

    (and

    mongos

     if present) nodes

  • Restart all pbm-agents
  • Run the following command to resync the backup list with the storage:

    $ pbm config --force-resync

Conclusion

MongoDB allows users to store enormous amounts of data. Especially if we talk about sharded clusters, where users are not limited by a single storage volume size limit. Database administrators often have to implement various home-grown solutions to ensure timely backups and restores of such big clusters. The usual approach is a storage-level snapshot. Such solutions do not guarantee data consistency and provide false confidence that data is safe.

Percona Backup for MongoDB with physical backup and restore capabilities enable users to backup and restore data fast and at the same time comes with data-consistency guarantees. 

Physical Backup functionality is in the Technical Preview stage. We encourage you to read more about it in our documentation and try it out. In case you face any issues feel free to contact us on the forum or raise the JIRA issue.

May
26
2022
--

Is MySQL Statement-Based / Mixed Replication Really Safe?

MySQL Statement-Based

MySQL Statement-BasedThe binary logging format in MySQL has been ROW by default since MySQL 5.7, yet there are still many users sticking with STATEMENT or MIXED formats for various reasons. In some cases, there is just simple hesitation from changing something that has worked for years on legacy applications. But in others, there may be serious blockers, most typically missing primary keys in badly designed schemas, which would lead to serious performance issues on the replicas.

As a Support Engineer, I can still see quite a few customers using STATEMENT or MIXED formats, even if they are already on MySQL 8.0. In many cases this is OK, but recently I had to deal with a pretty nasty case, where not using ROW format was found to cause the replicas to silently lose data updates, without raising any replication errors! Was it some really rare edge use case? Not at all! Let me demonstrate a very simple test case below to illustrate how easy it is to end up in such a bad situation.

— source 

mysql> select @@binlog_format,@@system_time_zone;
+-----------------+--------------------+
| @@binlog_format | @@system_time_zone |
+-----------------+--------------------+
| STATEMENT       | BST                |
+-----------------+--------------------+
1 row in set (0.00 sec)

mysql> CREATE TABLE `test1` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `d` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    ->   `a` varchar(30) NOT NULL,
    ->   `name` varchar(25) DEFAULT NULL,
    ->   PRIMARY KEY (`a`),
    ->   UNIQUE KEY `id` (`id`)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected, 1 warning (0.02 sec)

mysql> insert into test1 values (null,now(),"test1",0);
Query OK, 1 row affected (0.00 sec)

mysql> insert into test1 values (null,now(),"test2",0);
Query OK, 1 row affected (0.01 sec)

mysql> insert into test1 values (null,now(),"test3",0);
Query OK, 1 row affected (0.01 sec)

mysql> select * from test1;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  1 | 2022-05-22 10:13:37 | test1 | 0    |
|  2 | 2022-05-22 10:13:37 | test2 | 0    |
|  3 | 2022-05-22 10:13:38 | test3 | 0    |
+----+---------------------+-------+------+
3 rows in set (0.00 sec)

— replica

mysql> select @@system_time_zone;
+--------------------+
| @@system_time_zone |
+--------------------+
| UTC                |
+--------------------+
1 row in set (0.00 sec)

mysql> select * from db1.test1;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  1 | 2022-05-22 09:13:37 | test1 | 0    |
|  2 | 2022-05-22 09:13:37 | test2 | 0    |
|  3 | 2022-05-22 09:13:38 | test3 | 0    |
+----+---------------------+-------+------+
3 rows in set (0.00 sec)

— source 

mysql> UPDATE test1 SET name = 'foobar', d = CURRENT_TIMESTAMP WHERE a = 'test1' AND d = '2022-05-22 10:13:37';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from test1;
+----+---------------------+-------+--------+
| id | d                   | a     | name   |
+----+---------------------+-------+--------+
|  1 | 2022-05-22 10:16:15 | test1 | foobar |
|  2 | 2022-05-22 10:13:37 | test2 | 0      |
|  3 | 2022-05-22 10:13:38 | test3 | 0      |
+----+---------------------+-------+--------+
3 rows in set (0.00 sec)

— replica

mysql> select * from db1.test1;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  1 | 2022-05-22 09:13:37 | test1 | 0    |
|  2 | 2022-05-22 09:13:37 | test2 | 0    |
|  3 | 2022-05-22 09:13:38 | test3 | 0    |
+----+---------------------+-------+------+
3 rows in set (0.00 sec)

mysql> pager egrep "Running|SQL_Error"
PAGER set to 'egrep "Running|SQL_Error"'

mysql > show replica status\G
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
               Last_SQL_Error: 
    Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
     Last_SQL_Error_Timestamp: 
1 row in set (0.00 sec)

 

Another test, using UTC_TIME() compared against a column that was populated using the CURRENT_TIMESTAMP function:

— source 

mysql> select * from test1 WHERE TIME(d) > DATE_SUB(UTC_TIME(), INTERVAL 11 HOUR) AND id=3;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  3 | 2022-05-22 10:13:38 | test3 | 0    |
+----+---------------------+-------+------+
1 row in set (0.00 sec)

— replica

mysql> select * from test1 WHERE TIME(d) > DATE_SUB(UTC_TIME(), INTERVAL 11 HOUR) AND id=3;
Empty set (0.00 sec)

Therefore, when a similar condition is used for update:

— source

mysql> update test1 set name="bar" WHERE TIME(d) > DATE_SUB(UTC_TIME(), INTERVAL 11 HOUR) AND id=3;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from test1 where id=3;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  3 | 2022-05-22 22:12:15 | test3 | bar  |
+----+---------------------+-------+------+
1 row in set (0.01 sec)

— replica

mysql> select * from test1 where id=3;
+----+---------------------+-------+------+
| id | d                   | a     | name |
+----+---------------------+-------+------+
|  3 | 2022-05-22 09:13:38 | test3 | 0    |
+----+---------------------+-------+------+
1 row in set (0.01 sec)

mysql > show replica status\G
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
               Last_SQL_Error: 
    Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
     Last_SQL_Error_Timestamp: 
1 row in set (0.00 sec)

Again replica ignored the update while there is no replication error reported. This particular scenario can be expected to happen really often in geographically-distributed database environments.

As the functions used here are not considered unsafe for replication, two usual safety actions are not performed:

  • no warning is printed in the error log when the STATEMENT format used
  • replication event is not logged in RBR format when the MIXED format is used but rather left the original query as it is

I find this potentially dangerous and hence reported it: https://bugs.mysql.com/bug.php?id=107293

Summary

ROW-based replication has become the standard in MySQL and is the most reliable one. It is also the only one permitted for virtually synchronous replication solutions like Percona XtraDB Cluster/Galera and MySQL Group Replication.

At the same time, STATEMENT or even MIXED format, may lead to data consistency issues, which can be undetected for a long time, making it very difficult to investigate when finally replication error happens as a result.

If there is anything that has prevented you from switching to ROW format yet, the sooner you deal with it the better.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com