May 15, 2023
--

MongoDB 4.2 EOL… And Its Implications


Enjoy it while it lasts, as everything has its end.

It may have sounded a bit more cryptic than planned, but I hope it gets the attention it needs, as it is important to know that MongoDB 4.2 reached its End of Life (EOL) in April, and more versions will soon be decommissioned as well.

What does that mean for me?

If you are a user of MongoDB 4.2, whether the MongoDB Inc. version or Percona Server for MongoDB one, your database will no longer receive bug fixes, patches, or minor releases.

As defined in our Lifecycle Policy:

We will provide Operational Support for customers running EOL software on active platforms. For EOLed platforms, we provide Community Support.

And as stated in our Lifecycle Overview:

For software that Percona bases on an upstream build, we match the upstream EOL dates.

Our MongoDB Server follows the upstream EOL dates. This means that bug fixes and software builds will no longer be produced for our release of MongoDB either.

Also, with the Percona Server for MongoDB 4.2 reaching its end of life, the implications are as follows:

  • Percona Distribution for MongoDB 4.2 will no longer receive updates and bug fixes
  • Percona Backup for MongoDB (PBM) will no longer support 4.2 either. That means that testing with 4.2 has ceased, and while PBM may still successfully perform backups and restores, we cannot guarantee it anymore.

That being said, rest assured, you will not be left alone. Those who have, or would like to sign up for, a Percona Support Subscription will continue to receive operational support and services. Operational support includes but is not limited to:

  • Query optimization
  • MongoDB tuning (replica sets and sharded clusters)
  • MongoDB configuration, including our enterprise features such as LDAP
  • Upgrade support (from EOL versions, e.g., 3.6 -> 4.0 -> 4.2 -> …)
  • Setup and configuration of MongoDB clusters and tools such as Percona Backup for MongoDB and Percona Monitoring and Management (respecting the tool limitation for the EOL-ed version).
  • In case of crashes, although we do not report bugs, we can still track down known bugs and provide recommendations.

Still have questions about the 4.2 EOL?

In her recent blog post, MongoDB V4.2 EOL Is Coming: How To Upgrade Now and Watch Out for the Gotchas!, our MongoDB Tech Lead, Kimberly Wilkins, has covered the ins and outs of a MongoDB upgrade.

She has also hosted a webinar on the MongoDB 4.2 EOL common questions and challenges.

If you are our customer, please create a ticket for more assistance. Remember also that our Percona Community Forum is always open for any users of our software, as we believe that community is very important in building our products!

What’s next

I do not want to be the bearer of bad news, but both MongoDB 4.2 and 4.4 have been hugely popular. If you are on 4.2 right now, it makes all the difference to move away from it ASAP, as this version has just become a possible threat to your security.

As you see, 4.4 was mentioned as well. That's right: this highly popular, last version in the 4.x family is scheduled to reach EOL in February 2024. That leaves less than one year to prepare for upgrading.


MongoDB EOL for the upcoming year or so.

While you are at it, note that 5.0 is planned to reach EOL in October 2024 as well, so next year it is worth considering an upgrade to 6.0, which gives you at least until 2025 before the next EOL.


MongoDB 6.0 still has two years of life now.

If such an upgrade seems challenging and you want some help or at least advice around it, consider some of our premium services from MongoDB experts that can help you with migration by:

  • Support – Answering any operational questions
  • Managed Services – Playing the role of the remote DBA that handles all maintenance (including upgrades) for you
  • Consulting – Professionals that can come in and advise or even do the upgrade for you at any time
  • Training – So that your team can feel more comfortable with running the upgrades

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

Download Percona Distribution for MongoDB Today!

May 15, 2023
--

Proof of Concept: Horizontal Write Scaling for MySQL With Kubernetes Operator


Historically, MySQL has been great at horizontal READ scaling. That scaling is provided by a varying number of Replica nodes, regardless of whether you use standard asynchronous replication or synchronous replication.

However, those solutions do not offer the same level of scaling for write operations.

Why? Because these solutions still rely on writing to one single node that acts as the Primary. And in the case of multi-Primary, the writes are only distributed by transaction. In both cases, when using virtually-synchronous replication, the process requires certification from each node and a local (per-node) write; as such, the number of writes is NOT distributed across multiple nodes but duplicated.

The main reason behind this is that MySQL is a relational database management system (RDBMS), and any data that is going to be written into it must respect the RDBMS rules. In short, any data that is written must be consistent with the data already present. To achieve that, the new data needs to be checked against the existing data through the defined relations and constraints. This check can affect very large datasets and be very expensive. Think about updating a table with millions of rows that refers to another table with another million rows.

An image may help:

data model for ecommerce

Every time I insert an order, I must be sure that all the related elements are in place and consistent.
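
To make this concrete, below is a minimal, hypothetical sketch of the kind of constraint involved (table and column names are invented for illustration): every order line must reference an existing order and an existing item, so the server has to validate both parent rows before accepting the write.

CREATE TABLE order_items (
  order_id BIGINT NOT NULL,
  item_id  BIGINT NOT NULL,
  qty      INT NOT NULL,
  PRIMARY KEY (order_id, item_id),
  -- both foreign keys are checked on every INSERT/UPDATE
  FOREIGN KEY (order_id) REFERENCES orders (id),
  FOREIGN KEY (item_id)  REFERENCES items (id)
) ENGINE=InnoDB;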

This operation is quite expensive, but our database can run it in a few milliseconds or less, thanks to several optimizations that allow the node to execute most of it in memory with little or no access to mass storage.

The key factor is that the whole data structure resides in the same location (node), facilitating the operations.

Once we have understood that, it also becomes clear why we cannot split relational data across multiple nodes and distribute writes by table. If I have one node that manages only the items, another the orders, and another the payments, my solution will need to handle distributed transactions, each of which needs to certify and verify the data on other nodes.

This level of distribution will seriously affect the efficiency of the operation and will increase the response time significantly. That is it. Nothing is impossible; however, performance will be so impacted that each operation may take seconds instead of milliseconds (or fractions of a millisecond), unless we lift some of the rules and break the relational model.

MySQL, like other RDBMSs, is designed to work while respecting that model and cannot scale by fragmenting and distributing a schema, so what can be done to scale?

The alternative is to split a consistent set of data into fragments. What is a consistent set of data? It all depends on the kind of information we are dealing with. Keeping in mind the example above, where we have an online shop serving multiple customers, we need to identify the most effective way to split the data.

For instance, if we try to split the data by product type (Books, CD/DVD, etc.), we will have a huge duplication of the data related to customers/orders/shipments and so on, and all this data is also quite dynamic, given that customers are constantly ordering things.

Why duplicate the data? Because if I do not duplicate that data, I will not know whether a customer has already bought that specific item or not, or I will have to ask again for the shipping address, and so on. This also means that whenever a customer buys something or puts something on the wish list, I have to reconcile the data across all my nodes/clusters.

On the other hand, if I choose to split my data by country of customer’s residence, the only data I will have to duplicate and keep in sync is the one related to the products, of which the most dynamic one will be the number of items in stock. This, of course, is unless I can organize my products by country as well, which is a bit unusual nowadays but not impossible.

Another possible case is if I am a health organization and I manage several hospitals. As in the example above, it will be easier to split my data by hospital, given that most of the data related to patients is bound to the hospital itself, as are treatments and any other element related to hospital management. In contrast, it would make no sense to split by the patient's country of residence.

This technique of splitting the data into smaller pieces is called sharding and is currently the only way we have to scale RDBMS horizontally. 

In the MySQL open source ecosystem, we have only two consolidated ways to perform sharding — Vitess and ProxySQL. The first one is a complete solution that takes ownership of your database and manages almost any aspect of its operations in a sharded environment and includes a lot of specific features for DBAs to deal with daily operations like table modifications, backup, and more.

While this may look great, it also has some strings attached, including the complexity and proprietary environment. That makes Vitess a good fit for “complex” sharding scenarios where other solutions may not be enough.

ProxySQL does not have a sharding mechanism “per se,” but given the way it works and the features it has, it allows us to build simple sharding solutions.

It is important to note that most DBA operations will still have to be executed by the DBA, with increased complexity given the sharded environment.

There is a third option which is application-aware sharding.

This solution sees the application aware of the need to split the data into smaller fragments and internally point the data to different “connectors” that are connected to multiple data sources.

In this case, the application is aware of a customer's country and will redirect all the operations related to that customer to the data source responsible for the specific fragment.

Normally, this solution requires a full code redesign and can be quite difficult to achieve when it is injected after the initial code architecture has been defined.

On the other hand, if done at design time, it is probably the best solution, because it allows the application to define the sharding rules and can also optimize the different data sources, using different technologies for different uses.

One example could be using an RDBMS for most of the online transaction processing (OLTP) data, sharded by country, and having the products in a distributed memory cache using a different technology. At the same time, all the data related to orders, payments, and customer history can be consolidated in a data warehouse used to generate reporting.

As said, the last one is probably the most powerful and scalable, but also the most difficult to design, and unfortunately, it probably represents less than 5% of the solutions currently deployed.

Likewise, very few cases really need a full system/solution to provide scalability with sharding.

In my experience, most needs for horizontal scaling fall into the simple scenario, where there is a need to achieve sharding and data separation, very often with a shared-nothing architecture. In shared-nothing, each shard can live in a totally separate logical schema instance / physical database server / data center / continent. There is no ongoing need to retain shared access (between shards) to the unpartitioned tables in other shards.

The POC

Why this POC?

Over the years, I have faced a lot of customers talking about scaling their database solution and looking at very complex sharding solutions such as Vitess as the first and only way to go.

This, without even considering whether their needs were really driving them there.

In my experience, and in talking with several colleagues, I am not alone in analyzing the real needs. After discussing with all the parties impacted, only a very small percentage of customers were in real need of complex solutions. Most of the others were just trying to avoid a project that would implement a simple shared-nothing solution. Why? Because, apparently, it is simpler to migrate your data to a platform that does it all for you than to accept a bit of additional work and challenge at the beginning while keeping a simple approach. Also, going for the latest shiny thing always has its magic.

On top of that, with the rise of Kubernetes and MySQL Operators, a lot of confusion started to circulate, most of it generated by a total lack of understanding that a database and a relational database are two separate things. That lack of understanding of the difference, and of the real problems attached to an RDBMS, has brought some to talk about horizontal scaling for databases with a concerning superficiality, without clarifying whether they were talking about an RDBMS or not. As such, some clarification is long overdue, as is putting the KISS principle back as the main focus.

Given that, I thought that refreshing how ProxySQL can help in building a simple sharding solution might help to clarify the issues, reset expectations, and show how we can do things in a simpler way. (See my old post, MySQL Sharding with ProxySQL.)

To do so, I built a simple POC that illustrates how you can use Percona Operator for MySQL (POM) and ProxySQL to build a sharded environment with a good level of automation for some standard operations like backup/restore, software upgrades, and resource scaling.

Why ProxySQL?

In the following example, we mimic a case where we need a simple sharding solution, which means we just need to redirect the data to different data containers, keeping the database maintenance operations on us. In this common case, we do not need to implement a full sharding system such as Vitess.  

As illustrated above, ProxySQL allows us to set up a common entry point for the application and then redirect the traffic on the base of identified sharding keys. It will also allow us to redirect read/write traffic to the primary and read-only traffic to all secondaries. 
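
As a reference, here is a minimal sketch of what registering those entry points could look like in the ProxySQL admin interface. The hostgroup IDs match the ones used later in this POC, while the hostnames are placeholders rather than the real service names:

INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES
  (100, 'shard1-rw.example.local', 3306),  -- shard1 read/write
  (101, 'shard1-ro.example.local', 3306),  -- shard1 read only
  (200, 'shard2-rw.example.local', 3306),  -- shard2 read/write
  (201, 'shard2-ro.example.local', 3306);  -- shard2 read only

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

The query rules described later then decide, per statement, whether traffic goes to the read/write hostgroups (100/200) or to the read-only ones (101/201).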

The other interesting thing is that we can have ProxySQL as part of the application pod, or as an independent service. Best practices indicate that having ProxySQL closer to the application will be more efficient, especially if we decide to activate the caching feature.  

Why POM?

Percona Operator for MySQL has three main solutions: Percona Operator for Percona XtraDB Cluster, Percona Operator for MySQL Group Replication, and Percona Operator for Percona Server for MySQL. The first two are based on virtually-synchronous replication and allow the cluster to keep the data state consistent across all pods, guaranteeing that the service will always offer consistent data. In the K8s context, we can see POM as a single service with native horizontal scalability for reads, while for writes, we will adopt the mentioned sharding approach. 

The other important aspect of using a POM-based solution is the automation it comes with. Deploying POM, you will be able to set automation for backups, software updates, monitoring (using Percona Monitoring and Management (PMM)), and last but not least, the possibility to scale UP or DOWN just by changing the needed resources. 

The elements used

kubernetes sharding

In our POC, I will use a modified version of sysbench (https://github.com/Tusamarco/sysbench) that has an additional field, continent, which I will use as the sharding key. At the moment, and for the purpose of this simple POC, I will only have two shards.

As the diagram above illustrates, we have a simple deployment, but it is good enough to illustrate the sharding approach.

We have:

  • The application(s) node(s) — it is really up to you whether you want to test with one application node or more. Nothing will change, and the same goes for the ProxySQL nodes; just keep in mind that if you use more ProxySQL nodes, it is better to activate the internal cluster support or use Consul to synchronize them.
  • Shard one is based on POM with PXC; it has the following:
    • Load balancer for service entry point
      • Entry point for r/w
      • Entry point for read only
    • Three Pods for Haproxy
      • Haproxy container
      • Pmm agent container
    • Three Pods with data nodes (PXC)
      • PXC cluster node container
      • Log streaming
      • Pmm container
    • Backup/restore service
  • Shard two is based on POM for Percona Server for MySQL and Group Replication (technical preview)
    • Load balancer for service entry point
      • Entry point for r/w
      • Entry point for read-only
    • Three Pods for MySQL Router (testing)
      • MySQL router container
    • Three Pods with data nodes (PS with GR)
      • PS -GR cluster node container
      • Log streaming
      • Pmm container 
    • Backup/restore on scheduler

Now, you may have noticed that the representation of the nodes differs in size; this is not a drawing mistake. It indicates that I allocated more resources (CPU and memory) to shard1 than to shard2. Why? Because I can, and because I am simulating a situation where shard2 gets less traffic, at least temporarily; as such, I do not want to give it the same resources as shard1. I will eventually increase them if I see the need.

The settings

Data layer

Let us start with the easy one, the data layer configuration. Configuring the environment correctly is the key, and to do so, I am using a tool that I wrote specifically to calculate the needed configuration in a K8s POM environment. You can find it here (https://github.com/Tusamarco/mysqloperatorcalculator). 

Once you have compiled it and run it, you can simply ask what “dimensions” are supported, or you can define a custom level of resources, but you will still need to indicate the expected load level. In any case, please refer to the README in the repository with all the instructions.

The full cr.yaml for PXC shard1 is here, while the one for PS-GR is here.

For Shard1, I asked for resources to cover traffic of type 2 (Light OLTP), configuration type 5 (2XLarge), and 1,000 connections.

For Shard2, I asked for resources to cover traffic of type 2 (Light OLTP), configuration type 2 (Small), and 100 connections.

Once you have the CRs defined, you can follow the official guidelines to set the environment up.

It is time now to see the ProxySQL settings.

ProxySQL and sharding rules

As mentioned before, we will test the load sharding by continent, and we know that ProxySQL will not provide additional help to automatically manage the sharded environment. 

Given that, one way to do it is to create a DBA account per shard, or to inject the shard information into the commands while executing them. I will use the less comfortable approach, separate DBA accounts, just to prove that it works.

We will have two shards: the sharding key is the continent field, and the continents will be grouped as follows:

  • Shard one:
    • Asia
    • Africa
    • Antarctica
    • Europe
    • North America
  • Shard two:
    • Oceania
    • South America

The DBAs users:

  • dba_g1
  • dba_g2

The application user:

  • app_test

The host groups will be:

  • Shard one
    • 100 Read and Write
    • 101 Read only
  • Shard two
    • 200 Read and Write
    • 201 Read only

Once that is defined, we need to identify which query rules will serve us and how. What we want is to redirect all the incoming queries for:

  • Asia, Africa, Antarctica, Europe, and North America to shard1.
  • Oceania and South America to shard2
  • Split the queries in R/W and Read only
  • Prevent the execution of any query that does not have a shard key
  • Backup data at regular intervals and store it in a safe place


Given the above, we first define the rules for the DBA accounts.

We set the hostgroup for each DBA, and then, if the query matches a sharding rule, we redirect it to the proper shard. Otherwise, the HG remains as set.

This allows us to execute queries like CREATE/DROP TABLE on our shard without a problem, while still allowing us to send the data where needed.
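
To make this concrete, here is a simplified sketch of how such rules could be defined through the ProxySQL admin interface. This is not the exact rule set used in the POC (the full command list is linked further below), and the assumption that DBA traffic is tagged by username is mine, but the rule IDs, flags, and hostgroups mirror the ones that show up in the stats output later on:

-- default destination per user (passwords are placeholders)
INSERT INTO mysql_users (username, password, default_hostgroup) VALUES
  ('dba_g1', 'xxx', 100),
  ('dba_g2', 'xxx', 200);

-- tag the traffic of each DBA account (apply=0, so rule evaluation continues)
INSERT INTO mysql_query_rules (rule_id, active, username, flagIN, flagOUT, destination_hostgroup, apply)
VALUES (20, 1, 'dba_g1', 0, 500, 100, 0),
       (21, 1, 'dba_g2', 0, 600, 200, 0);

-- if a sharding key is present, override the destination and stop (apply=1)
INSERT INTO mysql_query_rules (rule_id, active, flagIN, match_pattern, destination_hostgroup, apply)
VALUES (31, 1, 500, "\scontinent\s*(=|like)\s*'*(Asia|Africa|Antarctica|Europe|North America)'*", 100, 1),
       (32, 1, 500, "\scontinent\s*(=|like)\s*'*(Oceania|South America)'*", 200, 1);

LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;

The flagOUT set by the account rules becomes the flagIN of the sharding rules, which is how the two layers are chained.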

For instance, the one below is the output of the queries that sysbench will run.

Prepare:

INSERT INTO windmills_test1 /*  continent=Asia */ (uuid,millid,kwatts_s,date,location,continent,active,strrecordtype) VALUES(UUID(), 79, 3949999,NOW(),'mr18n2L9K88eMlGn7CcctT9RwKSB1FebW397','Asia',0,'quq')

In this case, I have the application simply injecting a comment into the INSERT SQL declaring the shard key; given I am using the account dba_g1 to create/prepare the schemas, rules 31/32 will be used, and given I have set apply=1, ProxySQL will exit the query rules parsing and send the command to the relevant hostgroup.

Run:

SELECT id, millid, date,continent,active,kwatts_s FROM windmills_test1 WHERE id BETWEEN ? AND ? AND continent='South America'

SELECT SUM(kwatts_s) FROM windmills_test1 WHERE id BETWEEN ? AND ?  and active=1  AND continent='Asia'
SELECT id, millid, date,continent,active,kwatts_s  FROM windmills_test1 WHERE id BETWEEN ? AND ?  AND continent='Oceania' ORDER BY millid

SELECT DISTINCT millid,continent,active,kwatts_s   FROM windmills_test1 WHERE id BETWEEN ? AND ? AND active =1  AND continent='Oceania' ORDER BY millid

UPDATE windmills_test1 SET active=? WHERE id=?  AND continent='Asia'
UPDATE windmills_test1 SET strrecordtype=? WHERE id=?  AND continent='North America'

DELETE FROM windmills_test1 WHERE id=?  AND continent='Antarctica'

INSERT INTO windmills_test1 /* continent=Antarctica */ (id,uuid,millid,kwatts_s,date,location,continent,active,strrecordtype) VALUES (?, UUID(), ?, ?, NOW(), ?, ?, ?,?) ON DUPLICATE KEY UPDATE kwatts_s=kwatts_s+1

The above are executed during the tests. In all of them, the sharding key is present, either in the WHERE clause or as a comment.

Of course, if I execute one of them without the sharding key, the firewall rule will stop the query execution, i.e.:

mysql> SELECT id, millid, date,continent,active,kwatts_s FROM windmills_test1 WHERE id BETWEEN ? AND ?;
ERROR 1148 (42000): It is impossible to redirect this command to a defined shard. Please be sure you Have the Continent definition in your query, or that you use a defined DBA account (dba_g{1/2})


Check here for the full command list.
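
For completeness, the blocking behavior can be expressed with a single catch-all rule carrying an error message. The sketch below is an approximation that reproduces the error above (and matches rule 2000 visible in the stats output further down); refer to the full command list for the exact definition:

-- any query still carrying flagIN=0 at this point was neither issued by a DBA
-- account nor contained a continent, so it is rejected with a custom error
INSERT INTO mysql_query_rules (rule_id, active, flagIN, match_digest, error_msg, apply)
VALUES (2000, 1, 0, '.',
        'It is impossible to redirect this command to a defined shard. Please be sure you Have the Continent definition in your query, or that you use a defined DBA account (dba_g{1/2})',
        1);

LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;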

Setting up the dataset

Once the rules are set, it is time to set up the schemas and the data using sysbench (https://github.com/Tusamarco/sysbench). Remember to use windmills_sharding tests.  

The first operation is to build the schema on SHARD2 without filling it with data. This is a DBA action; as such, we will execute it using the dba_g2 account:

sysbench ./src/lua/windmills_sharding/oltp_read.lua  --mysql-host=10.0.1.96  --mysql-port=6033 --mysql-user=dba_g2 --mysql-password=xxx --mysql-db=windmills_large --mysql_storage_engine=innodb --db-driver=mysql --tables=4 --table_size=0 --table_name=windmills --mysql-ignore-errors=all --threads=1  prepare

Setting table_size=0 and pointing to the ProxySQL IP/port is all it takes, and I will have the following:

mysql> select current_user(), @@hostname;
+----------------+-------------------+
| current_user() | @@hostname        |
+----------------+-------------------+
| dba_g2@%       | ps-mysql1-mysql-0 |
+----------------+-------------------+
1 row in set (0.01 sec)

mysql> use windmills_large;
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_windmills_large |
+---------------------------+
| windmills1                |
| windmills2                |
| windmills3                |
| windmills4                |
+---------------------------+
4 rows in set (0.01 sec)

mysql> select count(*) from windmills1;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.09 sec)

All set but empty.

Now let us do the same but with the other DBA user:

sysbench ./src/lua/windmills_sharding/oltp_read.lua  --mysql-host=10.0.1.96  --mysql-port=6033 --mysql-user=dba_g1 --mysql-password=xxx --mysql-db=windmills_large --mysql_storage_engine=innodb --db-driver=mysql --tables=4 --table_size=400 --table_name=windmills --mysql-ignore-errors=all --threads=1  prepare

If I now run the select above with user dba_g2:

mysql> select current_user(), @@hostname;select count(*) from windmills1;
+----------------+-------------------+
| current_user() | @@hostname        |
+----------------+-------------------+
| dba_g2@%       | ps-mysql1-mysql-0 |
+----------------+-------------------+
1 row in set (0.00 sec)

+----------+
| count(*) |
+----------+
|      113 |
+----------+
1 row in set (0.00 sec)

While if I reconnect and use dba_g1:

mysql> select current_user(), @@hostname;select count(*) from windmills1;
+----------------+--------------------+
| current_user() | @@hostname         |
+----------------+--------------------+
| dba_g1@%       | mt-cluster-1-pxc-0 |
+----------------+--------------------+
1 row in set (0.00 sec)

+----------+
| count(*) |
+----------+
|      287 |
+----------+
1 row in set (0.01 sec)

I can also check on ProxySQL to see which rules were utilized:

select active,hits,destination_hostgroup, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;

+------+-----------------------+---------+---------------------+----------------------------------------------------------------------------+-------+--------+---------+
| hits | destination_hostgroup | rule_id | match_digest        | match_pattern                                                              | apply | flagIN | flagOUT |
+------+-----------------------+---------+---------------------+----------------------------------------------------------------------------+-------+--------+---------+
| 3261 | 100                   | 20      | NULL                | NULL                                                                       | 0     | 0      | 500     |
| 51   | 200                   | 21      | NULL                | NULL                                                                       | 0     | 0      | 600     |
| 2320 | 100                   | 31      | NULL                | \scontinent\s*(=|like)\s*'*(Asia|Africa|Antarctica|Europe|North America)'* | 1     | 500    | 0       |
| 880  | 200                   | 32      | NULL                | \scontinent\s*(=|like)\s*'*(Oceania|South America)'*                       | 1     | 500    | 0       |
| 0    | 100                   | 34      | NULL                | \scontinent\s*(=|like)\s*'*(Asia|Africa|Antarctica|Europe|North America)'* | 1     | 600    | 0       |
| 0    | 200                   | 35      | NULL                | \scontinent\s*(=|like)\s*'*(Oceania|South America)'*                       | 1     | 600    | 0       |
| 2    | 100                   | 51      | NULL                | \scontinent\s*(=|like)\s*'*(Asia|Africa|Antarctica|Europe|North America)'* | 0     | 0      | 1001    |
| 0    | 200                   | 54      | NULL                | \scontinent\s*(=|like)\s*'*(Oceania|South America)'*                       | 0     | 0      | 1002    |
| 0    | 100                   | 60      | NULL                | NULL                                                                       | 0     | 50     | 1001    |
| 0    | 200                   | 62      | NULL                | NULL                                                                       | 0     | 60     | 1002    |
| 7    | NULL                  | 2000    | .                   | NULL                                                                       | 1     | 0      | NULL    |
| 0    | 100                   | 2040    | ^SELECT.*FOR UPDATE | NULL                                                                       | 1     | 1001   | NULL    |
| 2    | 101                   | 2041    | ^SELECT.*$          | NULL                                                                       | 1     | 1001   | NULL    |
| 0    | 200                   | 2050    | ^SELECT.*FOR UPDATE | NULL                                                                       | 1     | 1002   | NULL    |
| 0    | 201                   | 2051    | ^SELECT.*$          | NULL                                                                       | 1     | 1002   | NULL    |
+------+-----------------------+---------+---------------------+----------------------------------------------------------------------------+-------+--------+---------+

Running the application

Now that the data load test was successful, let us do the real load following the indications above, but using 80 tables and just a few more records, like 20,000; nothing huge.

Once the data is loaded, we will have the two shards with different numbers of records. If all went well, shard2 should have ¼ of the total and shard1 ¾.

When the load is over, I have, as expected:

mysql> select current_user(), @@hostname;select count(*) as shard1 from windmills_large.windmills80;select /* continent=shard2 */ count(*) as shard2 from windmills_large.windmills80;
+----------------+--------------------+
| current_user() | @@hostname         |
+----------------+--------------------+
| dba_g1@%       | mt-cluster-1-pxc-0 |
+----------------+--------------------+
1 row in set (0.00 sec)

+--------+
| shard1 |
+--------+
|  14272 | ← Table windmills80 in SHARD1
+--------+
+--------+
| shard2 |
+--------+
|   5728 | ← Table windmills80 in SHARD2
+--------+

As you may have already noticed, I used a trick to query the other shard using the dba_g1 user: I just passed the shard2 definition in the query as a comment. That is all we need.

Let us execute the run command for writes in sysbench and see what happens.

The first thing we can notice while doing writes is the query distribution:

+--------+-----------+----------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+
| weight | hostgroup | srv_host                                                                   | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries |
+--------+-----------+----------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+
| 10000  | 100       | ac966f7d46c04400fb92a3603f0e2634-193113472.eu-central-1.elb.amazonaws.com  | 3306     | ONLINE | 24	     | 0        | 138    | 66      | 25          | 1309353 |
| 100    | 101       | a5c8836b7c05b41928ca84f2beb48aee-1936458168.eu-central-1.elb.amazonaws.com | 3306     | ONLINE | 0	     | 0        | 0      | 0       | 0           |       0 |
| 10000  | 200       | a039ab70e9f564f5e879d5e1374d9ffa-1769267689.eu-central-1.elb.amazonaws.com | 3306     | ONLINE | 24	     | 1        | 129    | 66      | 25          |  516407 |
| 10000  | 201       | a039ab70e9f564f5e879d5e1374d9ffa-1769267689.eu-central-1.elb.amazonaws.com | 6447     | ONLINE | 0	     | 0        | 0      | 0       | 0           |       0 |
+--------+-----------+----------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+

Here we can notice that the connection load is evenly distributed, while the query load mainly goes to shard1, as we expected, given that we have unbalanced sharding by design.

At the MySQL level, we had:

MySQL Questions

MySQL Com Type

The final point is, what is the gain of using this sharding approach?

Well, we still need to consider the fact we are testing on a very small set of data. However, if we can already identify some benefits here, that will be an interesting result. 

Let’s see the write operations with 24 and 64 threads:

MySQL writes

MySQL latency

We get a gain of ~33% just by using sharding, while latency pays no penalty. On the contrary, even with a small load increase, we can see how the sharded solution performs better. Of course, we are still talking about a low number of rows and running threads, but the gain is there.

Backup

The backup and restore operation when using POM is completely managed by the operator (see instructions in the POM documentation https://docs.percona.com/percona-operator-for-mysql/pxc/backups.html and https://docs.percona.com/percona-operator-for-mysql/ps/backups.html). 

The interesting part is that we can have multiple kinds of backup solutions, like:

  • On-demand
  • Scheduled 
  • Full Point in time recovery with log streaming

Automation will allow us to set a schedule as simple as this:

schedule:
  - name: "sat-night-backup"
    schedule: "0 0 * * 6"
    keep: 3
    storageName: s3-eu-west
  - name: "daily-backup"
    schedule: "0 3 * * *"
    keep: 7
    storageName: s3-eu-west

Or, if you want to run the on-demand:

kubectl apply -f backup.yaml

Where the backup.yaml file has very simple information:

apiVersion: ps.percona.com/v1alpha1
kind: PerconaServerMySQLBackup
metadata:
  name: ps-gr-sharding-test-2nd-of-may
#  finalizers:
#    - delete-backup
spec:
  clusterName: ps-mysql1
  storageName: s3-ondemand

Using both methods, we will soon be able to have a good set of backups, like:

POM (PXC)

cron-mt-cluster-1-s3-eu-west-20234293010-3vsve   mt-cluster-1   s3-eu-west    s3://mt-bucket-backup-tl/scheduled/mt-cluster-1-2023-04-29-03:00:10-full   Succeeded   3d9h        3d9h
cron-mt-cluster-1-s3-eu-west-20234303010-3vsve   mt-cluster-1   s3-eu-west    s3://mt-bucket-backup-tl/scheduled/mt-cluster-1-2023-04-30-03:00:10-full   Succeeded   2d9h        2d9h
cron-mt-cluster-1-s3-eu-west-2023513010-3vsve    mt-cluster-1   s3-eu-west    s3://mt-bucket-backup-tl/scheduled/mt-cluster-1-2023-05-01-03:00:10-full   Succeeded   33h         33h
cron-mt-cluster-1-s3-eu-west-2023523010-3vsve    mt-cluster-1   s3-eu-west    s3://mt-bucket-backup-tl/scheduled/mt-cluster-1-2023-05-02-03:00:10-full   Succeeded   9h          9h

POM (PS) *

NAME                             STORAGE       DESTINATION                                                                     STATE       COMPLETED   AGE
ps-gr-sharding-test              s3-ondemand   s3://mt-bucket-backup-tl/ondemand/ondemand/ps-mysql1-2023-05-01-15:10:04-full   Succeeded   21h         21h
ps-gr-sharding-test-2nd-of-may   s3-ondemand   s3://mt-bucket-backup-tl/ondemand/ondemand/ps-mysql1-2023-05-02-12:22:24-full   Succeeded   27m         27m

Note that as DBA, we still need to validate the backups with a restore procedure. That part is not automated (yet). 

*Note that Backup for POM PS is available only on demand, given the solution is still in technical preview.

When will this solution fit in?

As mentioned multiple times, this solution can cover simple cases of sharding; it is better if you have a shared-nothing design.

It also requires work from the DBA side in case of DDL operations or resharding. 

You also need to be able to change some SQL code to be sure the sharding key/information is present in any SQL executed.
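
As a trivial, hypothetical example of the kind of change that means:

-- Before: no sharding key, so the catch-all rule would reject the statement
UPDATE windmills_test1 SET active=0 WHERE id=42;

-- After: the continent is added (in the WHERE clause or as a comment),
-- so ProxySQL can route the statement to the right shard
UPDATE windmills_test1 SET active=0 WHERE id=42 AND continent='Asia';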

When will this solution not fit in?

Several things could prevent you from using this solution. The most common ones are:

  • You need to query multiple shards at the same time. This is not possible with ProxySQL.
  • You do not have a DBA to perform administrative work and need to rely on an automated system.
  • Distributed transactions across shards.
  • No access to SQL code.

Conclusions

We do not have a Hamlet-like dilemma about sharding or not sharding.

When using an RDBMS like MySQL, if you need horizontal scalability, you need to shard. 

The point is that there is no magic wand or solution; moving to sharding is an expensive and impactful operation. If you choose it at the beginning, before doing any application development, the effort can be significantly less.

Doing it sooner will also allow you to test proper solutions, where proper means a KISS solution. Always go for the less complex option, because in two years you will be very happy about your decision.

If, instead, you must convert a current solution, then prepare for bloodshed, or at least for a long journey. 

In any case, we need to keep in mind a few key points:

  • Do not believe most of the articles on the internet that promise you infinite scalability for your database. If an article makes no distinction between a simple database and an RDBMS, run away.
  • Do not go for the latest shiny thing just because it shines. Test it and evaluate IF it makes sense for you. Better to spend a quarter now testing a few solutions than to fight for years with something that you do not fully comprehend.
  • Using containers/operators/Kubernetes does not scale per se; you must find a mechanism to make the solution scale. There is absolutely NO difference with on-premises deployments in that respect. What you may get is a good level of automation. However, that will come with a good level of complexity, and it is up to you to evaluate whether it makes sense or not.

As said at the beginning, for MySQL, the choice is limited. Vitess is the full complete solution, with a lot of coding to provide you with a complete platform to deal with your scaling needs.

However, do not be so fast to exclude ProxySQL as a possible solution. There are already many out there using it for sharding as well.

This small POC used a synthetic case, but it also shows that with just four rules, you can achieve a decent solution. A real scenario could be a bit more complex … or not. 

References

Vitess (https://vitess.io/docs/)

ProxySQL (https://proxysql.com/documentation/)

Firewalling with ProxySQL (https://www.tusacentral.com/joomla/index.php/mysql-blogs/197-proxysql-firewalling)

Sharding:

 

The Percona Kubernetes Operators automate the creation, alteration, or deletion of members in your Percona Distribution for MySQL, MongoDB, or PostgreSQL environment.

 

Learn More About Percona Kubernetes Operators

May 10, 2023
--

Understanding Linux IOWait


I have seen many Linux performance engineers looking at the “IOWait” portion of CPU usage as something that indicates whether the system is I/O-bound. In this blog post, I will explain why this approach is unreliable and what better indicators you can use.

Let’s start by running a little experiment – generating heavy I/O usage on the system:

sysbench  --threads=8 --time=0 --max-requests=0  fileio --file-num=1 --file-total-size=10G --file-io-mode=sync --file-extra-flags=direct --file-test-mode=rndrd run

 

CPU Usage in Percona Monitoring and Management (PMM):


root@iotest:~# vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  6      0 7137152  26452 762972    0    0 40500  1714 2519 4693  1  6 55 35  3
 2  8      0 7138100  26476 762964    0    0 344971    17 20059 37865  3 13  7 73  5
 0  8      0 7139160  26500 763016    0    0 347448    37 20599 37935  4 17  5 72  3
 2  7      0 7139736  26524 762968    0    0 334730    14 19190 36256  3 15  4 71  6
 4  4      0 7139484  26536 762900    0    0 253995     6 15230 27934  2 11  6 77  4
 0  7      0 7139484  26536 762900    0    0 350854     6 20777 38345  2 13  3 77  5

So far, so good, and we see that the I/O-intensive workload clearly corresponds to high IOWait (the “wa” column in vmstat).

Let’s continue running our I/O-bound workload and add a heavy CPU-bound load:

sysbench --threads=8 --time=0 cpu run

 

heavy CPU usage

root@iotest:~# vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
12  4      0 7121640  26832 763476    0    0 48034  1460 2895 5443  6  7 47 37  3
13  3      0 7120416  26856 763464    0    0 256464    14 12404 25937 69 15  0  0 16
 8  8      0 7121020  26880 763496    0    0 325789    16 15788 33383 85 15  0  0  0
10  6      0 7121464  26904 763460    0    0 322954    33 16025 33461 83 15  0  0  1
 9  7      0 7123592  26928 763524    0    0 336794    14 16772 34907 85 15  0  0  1
13  3      0 7124132  26940 763556    0    0 386384    10 17704 38679 84 16  0  0  0
 9  7      0 7128252  26964 763604    0    0 356198    13 16303 35275 84 15  0  0  0
 9  7      0 7128052  26988 763584    0    0 324723    14 13905 30898 80 15  0  0  5
10  6      0 7122020  27012 763584    0    0 380429    16 16770 37079 81 18  0  0  1

 

What happened?  IOWait is completely gone and now this system does not look I/O-bound at all!  

In reality, though, of course, nothing changed for our first workload — it continues to be I/O-bound; it just became invisible when we look at “IOWait”!

To understand what is happening, we really need to understand what “IOWait” is and how it is computed.

There is a good article that goes into more detail on the subject, but basically, “IOWait” is a kind of idle CPU time. If the CPU core goes idle because there is no work to do, the time is accounted as “idle.” If, however, it went idle because a process is waiting on disk I/O, the time is counted towards “IOWait.”

However, if a process is waiting on disk I/O but other processes on the system can use the CPU, the time will be counted towards their CPU usage as user/system time instead. 

Because of this accounting, other interesting behaviors are possible. Now, instead of running eight I/O-bound threads, let's just run one I/O-bound process on a four-core VM:

sysbench  --threads=1 --time=0 --max-requests=0  fileio --file-num=1 --file-total-size=10G --file-io-mode=sync --file-extra-flags=direct --file-test-mode=rndrd run

 

four core VM CPU usage

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  1      0 7130308  27704 763592    0    0 62000    12 4503 8577  3  5 69 20  3
 2  1      0 7127144  27728 763592    0    0 67098    14 4810 9253  2  5 70 20  2
 2  1      0 7128448  27752 763592    0    0 72760    15 5179 9946  2  5 72 20  1
 4  0      0 7133068  27776 763588    0    0 69566    29 4953 9562  2  5 72 21  1
 2  1      0 7131328  27800 763576    0    0 67501    15 4793 9276  2  5 72 20  1
 2  0      0 7128136  27824 763592    0    0 59461    15 4316 8272  2  5 71 20  3
 3  1      0 7129712  27848 763592    0    0 64139    13 4628 8854  2  5 70 20  3
 2  0      0 7128984  27872 763592    0    0 71027    18 5068 9718  2  6 71 20  1
 1  0      0 7128232  27884 763592    0    0 69779    12 4967 9549  2  5 71 20  1
 5  0      0 7128504  27908 763592    0    0 66419    18 4767 9139  2  5 71 20  1

 

Even though this process is completely I/O-bound, we can see IOWait (wa) is not particularly high, less than 25%. On larger systems with 32, 64, or more cores, such completely IO-bottlenecked processes will be all but invisible, generating single-digit IOWait percentages. 

As such, high IOWait shows that many processes in the system are waiting on disk I/O, but even with low IOWait, disk I/O may be the bottleneck for some processes on the system.

If IOWait is unreliable, what can you use instead to give you better visibility? 

First, look at application-specific observability.  The application, if it is well instrumented, tends to know best whenever it is bound by the disk and what particular tasks are I/O-bound. 

If you only have access to Linux metrics, look at the “b” column in vmstat, which corresponds to processes blocked on disk I/O. This will show such processes even when concurrent CPU-intensive loads mask IOWait:

CPU intensive load will mask IOWait

Finally, you can look at per-process statistics to see which processes are waiting for disk I/O. For Percona Monitoring and Management, you can install a plugin as described in the blog post Understanding Processes Running on Linux Host with Percona Monitoring and Management.

With this extension, we can clearly see which processes are runnable (running or blocked on CPU availability) and which are waiting on disk I/O!

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

 

Download Percona Monitoring and Management Today

May 9, 2023
--

Restoring a Snapshot of a Sharded MongoDB Cluster to a Kubernetes-Based MongoDB Environment


Many MongoDB clusters use storage-level snapshots to provide fast and reliable backups. In this blog post, you’ll learn how to restore such a snapshot from a traditional VM-based sharded MongoDB cluster to a freshly deployed Percona Operator for MongoDB cluster on Kubernetes.

Background story

I recently worked with a company running a large four-shard MongoDB Enterprise Server database cluster on VMs on premises that decided to migrate to the Google Cloud Platform. After careful consideration and pros-and-con evaluation, the customer chose to go with Percona Distribution for MongoDB run on Kubernetes – specifically using Percona Operator for MongoDB. There are four main factors that contributed to this decision:

  1. The total cost of ownership: K8s resources are significantly cheaper than running a popular DBaaS on the same cloud
  2. Ease of deployments: Percona Operator for MongoDB makes day-1 and day-2 operations a breeze
  3. Preference for open-source tools and cloud-native solutions: applications are already migrated to Kubernetes – running the entire solution that way simplifies operations. 
  4. This approach frees the deployment from cloud (and any other vendor) lock-in.

To start realistic compatibility and performance testing, we needed to restore the filesystem snapshot backup that was stored on NetApp volume in GCP into a Google Kubernetes Engine (GKE) cluster running Percona Operator for MongoDB. It’s not a trivial task due to the requirements for restoring a sharded cluster backup into a new cluster. I will show you why and how we did it.

Requirements

Let’s look at the overall requirements for the cluster where I want to restore the snapshot to:

  1. The same major (ideally minor, too) MongoDB version
  2. The same number of shards
  3. The same name of all shards
  4. The same name of the Config RS
  5. The same hostnames of all nodes (this is how Config RS connects to the specific shard)
  6. The same MongoDB configuration with regards to how files are stored on the disk

This seems to be straightforward; however, there are a couple of special considerations to make when it comes to the environment controlled by Percona Operator for MongoDB, specifically:

  • You can’t control the name of Config RS
  • You can’t change the hostnames and those will certainly be different within Kubernetes
  • Percona Operator needs specific users to be present in your cluster in order to control it – and those won’t be present in your backup.

All of the above makes it impossible to simply adjust the Operator configuration and copy all the files from your snapshot backup in the specific volumes.

Plan

The high-level plan consists of the following steps:

  1. Deploy the cluster on K8s
  2. Restore snapshot files
    1. Pause the cluster on K8s
    2. Mount storage volumes to a separate VM
    3. Copy snapshot files to respective volumes
  3. Prepare each replica set in the standalone mode (hostnames, sharding configuration, users)
  4. Start the cluster on K8s and initialize each replica set

The following approach is applicable regardless of what “flavor” of MongoDB your source environment uses. This can be MongoDB Enterprise Server, MongoDB Community Edition, or Percona Server for MongoDB.

Step one: Deploy the cluster on K8s

In order to deploy Percona Server for MongoDB (PSMDB) on the K8s cluster, follow the documentation. Before you execute the last step (deploying cr.yaml), however, make sure to adjust the following elements of the configuration. This will make the cluster “fit” the one that we took the backup from.

  1. Set spec.image to a specific image version. It needs to be the version that matches the major version of the source cluster, for example:

    spec:
      image: percona/percona-server-mongodb:5.0.11-10
  2. Create as many shard replica sets as in the source cluster. Copy the default replica set definition (entire section) in spec.replsets[]. For example, if your source cluster has two shards with the names “shard1” and “shard2” (names must match the ones from the source cluster):

    replsets:
     - name: shard1
       size: 3
       [...]
     - name: shard2
       size: 3
       [...]

    Unfortunately, you can’t set the name of Config RS. We’ll deal with that later.

  3. If your source cluster WiredTiger configuration is different than default, adjust the mongod configuration of each replica set to match it. Specifically, two MongoDB configuration items are critical: storage.directoryPerDB and storage.wiredTiger.engineConfig.directoryForIndexes. You can do it in the following way:

    replsets:
      - name: shard1
        size: 3
        configuration: |
          storage:
            directoryPerDB: true
            wiredTiger:
              engineConfig:
                directoryForIndexes: true
  4. Save changes and start the cluster using the modified cr.yaml file

Step two: Restore snapshot files

Your cluster should be started at this point and it is important that all PersistentVolumes required for it be created.  You can check your cluster state with kubectl get psmdb, see all deployed pods with kubectl get pods, or check PVs with kubectl get pv. In this step, you need to mount volumes of all your database nodes to an independent VM as we’ll make changes in MongoDB standalone mode. You need those VMs temporarily to perform the required operations.

  1. Pause the PSMDB cluster on K8s by setting spec.pause: true in your cr.yaml file. That will cause the entire cluster to stop and make it possible to mount volumes elsewhere.
  2. Check the zones where your PersistentVolumes were created. You can use kubectl describe pv pv_name or find it in the cloud console.
  3. Create a VM in each zone, then mount volumes corresponding to PersistentVolumes to a VM in its zone. It’s critical that you can easily identify the volume’s purpose (which replica set in your MongoDB cluster it refers to and which node – primary or secondary). As an example, this is how you can mount a volume to your VM using gcloud API:

    # attach disk to an instance
    gcloud compute instances attach-disk vm_name \
    --disk disk_name \
    --zone=us-central1-a
    
    # check volume on vm
    sudo lsblk
    
    # create a mountpoint
    sudo mkdir -p /dir_name (e.g. rs0-primary)
    
    # mount the volume
    sudo mount -o discard,defaults /dev/sdb /dir_name

     

  4. Delete all files from each volume mounted to your VMs – primaries, and secondaries.
  5. Copy files from the snapshot to the respective volume, directly to the main directory of the mounted volume. Do it just for volumes related to primary nodes (leave secondaries empty).
  6. Install Percona Server for MongoDB on each of your VMs. Check installation instructions and install the same version as your Mongo cluster on K8s. Don’t start the server yet!

Step three: Prepare each replica set in the standalone mode

Now, you need to start PSMDB for each volume with data (each primary of each replica set, including Config RS) separately. We will then log in to mongo shell and edit the cluster configuration manually so that when we bring volumes with data back to Kubernetes, it can start successfully.

Execute the steps below for each replica set, including Config RS:

  1. On the VM, when your Config RS primary volume is mounted, edit /etc/mongod.conf. Specifically, adjust storage.dbPath (to the directory where you mounted the volume), storage.directoryPerDB, and storage.wiredTiger.engineConfig.directoryForIndexes.
  2. Start PSMDB with sudo mongod --config /etc/mongod.conf
  3. Connect to PSMDB with mongo command (authentication is not required). Once you successfully log in, we can start making changes to the cluster configuration.
  4. Delete the local replica set configuration with the following commands:

    use local;
    db.dropDatabase();
  5. [This step applies only to Config RS]
    Replace shards configuration. Execute the following command for each shard separately (you can list them all with db.shards.find() ). Replace in the following string: shard_name, cluster_name, namespace_name with values specific to your cluster:

    use config;
    
    db.shards.updateOne(
      { "_id" : "shard_name" },
      { $set :
        { "host" : "shard_name/cluster_name-shard_name-0.cluster_name-shard_name.namespace_name.svc.cluster.local:27017,cluster_name-shard_name-1.cluster_name-shard_name.namespace_name.svc.cluster.local:27017,cluster_name-shard_name-2.cluster_name-shard_name.namespace_name.svc.cluster.local:27017"}
    });
  6. [This step applies to shard RS only] Clear shard metadata

    use admin
    db.system.version.deleteOne( { _id: "minOpTimeRecovery" } )
  7. [This step applies to shard RS only] Replace Config RS connection string in shardIdentity with the following command. Replace cluster_name, namespace_name with your values.

    use admin
    
    db.system.version.updateOne(
     { "_id" : "shardIdentity" },
     { $set :
       { "configsvrConnectionString" : "cfg/cluster_name-cfg-0.cluster_name-cfg.namespace_name.svc.cluster.local:27017,cluster_name-cfg-1.cluster_name-cfg.namespace_name.svc.cluster.local:27017,cluster_name-cfg-2.cluster_name-cfg.namespace_name.svc.cluster.local:27017"}
     });
  8. Percona MongoDB Operator requires system-level MongoDB users to be present in a database. We must create those users for the operator, as our backup doesn’t have them. If you haven’t changed default secrets.yaml during the deployment of the cluster, you can find default passwords either in the file or here in the documentation. To create required users and roles, use the Mongo commands below:

    use admin;
    //drop user in case they already exist in your backup
    db.dropUser("userAdmin");
    db.dropUser("clusterAdmin");
    db.dropUser("clusterMonitor");
    db.dropUser("backup");
    db.dropUser("databaseAdmin");
    
    //create missing role
    db.createRole({"role" : "explainRole",
                   "privileges" : [{"resource": {"db" : "","collection" : "system.profile"},
                    "actions" :   ["collStats","dbHash","dbStats","find","listCollections","listIndexes"]}],
                    roles: []});
    
    //create system users
    db.createUser({ user: "userAdmin",
                    pwd: "userAdmin123456",
                    roles: [ { role: "userAdminAnyDatabase", db: "admin" }]
                  });
    
    db.createUser({ user: "clusterAdmin",
                   pwd: "clusterAdmin123456",
                   roles: [ { role: "clusterAdmin", db: "admin" }]
                  });
    
    db.createUser({ user: "clusterMonitor",
                    pwd: "clusterMonitor123456",
                    roles: [{ role: "explainRole", db: "admin" },
                            { role: "read", db: "local" },
                            { role: "clusterMonitor", db: "admin" }]
                  });
    
    db.createUser({ user: "databaseAdmin",
                    pwd: "databaseAdmin123456",
                    roles: [{ role: "readWriteAnyDatabase", db: "admin" },
                            { role: "readAnyDatabase", db: "admin" },
                            { role: "dbAdminAnyDatabase", db: "admin" },
                            { role: "backup", db: "admin" },
                            { role: "restore", db: "admin" },
                            { role: "clusterMonitor", db: "admin" },]
                  });
    
    db.createUser({ user: "backup",
                    pwd: "backup123456",
                    roles: [{ role: "backup", db: "admin" },
                            { role: "restore", db: "admin" },
                            { role: "clusterMonitor", db: "admin" }]
                  });
  9. Shut down the server now using db.shutdownServer();
  10. The operator runs mongod process with “mongodb” user (not as a standard “mongod”!). Therefore, we need to fix permissions before we unmount the volume. Add the following line to your /etc/passwd file:

    mongodb:x:1001:0:Default Application User:/home/mongodb:/sbin/nologin
  11. Set permissions:

    cd /your_mountpoint
    sudo chown -R mongodb:1001 ./
  12. Unmount the volume and detach it from the VM. As an example, this is how you can unmount the volume and then detach the disk using the gcloud CLI:

    sudo umount /dir_name
    
    gcloud compute instances detach-disk vm_name \
    --disk disk_name \
    --zone=us-central1-a

     

Step four: Start the cluster on K8s and initialize each replica set

You’re ready to get back to K8s. Start the cluster (it will start with the previously used volumes). It will stay in a pending state because we intentionally broke the replica sets, so only one pod per replica set will start. You must initialize the replica sets one by one.

To unpause the PSMDB cluster, set spec.pause: false in your cr.yaml file and apply it with kubectl (a minimal example follows). Then repeat the steps below for every replica set, starting with the Config RS.
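As a minimal sketch, assuming your custom resource is named my-cluster-name and lives in the current namespace, you can either re-apply the edited cr.yaml or patch the resource in place:

# Option 1: re-apply cr.yaml after setting spec.pause: false
kubectl apply -f cr.yaml

# Option 2: patch the custom resource directly
kubectl patch psmdb my-cluster-name --type merge -p '{"spec":{"pause":false}}'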

  1. Log in to the shell of “pod 0” of the replica set with kubectl exec --stdin --tty cluster_name-cfg-0 -- /bin/bash (for the Config RS)
  2. Log in to PSMDB with the mongo command
  3. Authenticate as clusterAdmin using the following command (assuming you used default passwords):

    use admin;
    db.auth("clusterAdmin", "clusterAdmin123456");
  4. Initialize the replica set as below. Replace cluster_name, namespace_name, and shard_name with your own values.

    # version for Config RS
    
    rs.initiate(
       {
         _id: "cfg",
          members: [
             { _id: 0, host : "cluster_name-cfg-0.cluster_name-cfg.namespace_name.svc.cluster.local:27017" },
             { _id: 1, host : "cluster_name-cfg-1.cluster_name-cfg.namespace_name.svc.cluster.local:27017" },
             { _id: 2, host : "cluster_name-cfg-2.cluster_name-cfg.namespace_name.svc.cluster.local:27017" },
          ]
       }
    );
    
    # version for other RS (shards)
    rs.initiate(
       {
         _id: "shard_name",
          members: [
             { _id: 0, host : "cluster_name-shard_name-0.cluster_name-shard_name.namespace_name.svc.cluster.local:27017" },
             { _id: 1, host : "cluster_name-shard_name-1.cluster_name-shard_name.namespace_name.svc.cluster.local:27017" },
             { _id: 2, host : "cluster_name-shard_name-2.cluster_name-shard_name.namespace_name.svc.cluster.local:27017" },
          ]
       }
    );
  5. After a few seconds, your node will become PRIMARY. You can check the health of the replica set using the rs.status() command. Remember that if your dataset is large, the initial synchronization process may take a long time (as with any MongoDB deployment).

That’s it! You have successfully restored a snapshot backup into Percona Server for MongoDB deployed on K8s with the Percona Operator. To verify this, run kubectl get pods or kubectl get psmdb; the output should be similar to the one below.

$ kubectl get psmdb
NAME              ENDPOINT                                        STATUS   AGE
my-cluster-name   my-cluster-name-mongos.ns-0.svc.cluster.local   ready    156m

 

The Percona Kubernetes Operators automate the creation, alteration, or deletion of members in your Percona Distribution for MySQL, MongoDB, or PostgreSQL environment.

Learn More About Percona Kubernetes Operators

May
08
2023
--

Percona Monitoring and Management 2.37, Percona Distribution for MongoDB 5.0.17: Release Roundup May 8, 2023

Percona Releases

It’s time for the release roundup!

Percona is a leading provider of unbiased, performance-first, open source database solutions that allow organizations to easily, securely, and affordably maintain business agility, minimize risks, and stay competitive, free from vendor lock-in.

Our Release Roundups showcase the latest Percona software updates, tools, and features to help you manage and deploy our software. It offers highlights, critical information, links to the full release notes, and direct links to the software or service itself to download.

Today’s post includes those releases and updates that have come out since April 24, 2023. Take a look!

Percona Monitoring and Management 2.37

On May 2, 2023, Percona Monitoring and Management 2.37 (PMM) was released. It is an open source database monitoring, management, and observability solution for MySQL, PostgreSQL, and MongoDB. This release of PMM starts the series of enhancements that will improve how you work with Services and Nodes in PMM. With this first set of changes, we’re revamping the Inventory page to give you more context on your inventory objects and more actionable information for exploring and fixing possible issues. Also, this release introduces three new experimental dashboards for Kubernetes monitoring, along with other new features and improvements.

For more in-depth information, check out Roma Novikov’s blog covering all things Percona Monitoring and Management 2.37.

Download Percona Monitoring and Management 2.37

Percona Distribution for MongoDB 5.0.17

Percona Distribution for MongoDB 5.0.17 was released on May 4, 2023. A freely available MongoDB database alternative, it gives you a single solution that combines enterprise components from the open source community, designed and tested to work together. Release highlights include bug fixes and improvements provided by MongoDB and included in Percona Server for MongoDB 5.0.17-14, as well as improvements in Percona Backup for MongoDB 2.1.0, such as the general availability of incremental physical backups, selective backup and restore of sharded collections, the support of parallel download of data chunks from the S3 storage, and improved deletion of old backups and point-in-time recovery oplog chunks.

Download Percona Distribution for MongoDB 5.0.17

Percona Server for MongoDB 5.0.17-14

On May 4, 2023, Percona Server for MongoDB 5.0.17-14 was released. It is an enhanced, source-available, and highly-scalable database that is a fully-compatible drop-in replacement for MongoDB 5.0.16 Community Edition and MongoDB 5.0.17 Community Edition. It supports protocols and drivers of both MongoDB 5.0.16 and 5.0.17 and includes the improvements and bug fixes of MongoDB Community Edition 5.0.16 and 5.0.17.

Download Percona Server for MongoDB 5.0.17-14

Percona Operator for PostgreSQL 2.1.0 (Tech preview)

Percona Operator for PostgreSQL 2.1.0 (Tech preview) was released on May 4, 2023. It helps create and manage highly available, enterprise-ready PostgreSQL clusters on Kubernetes. It is 100% open source, free from vendor lock-in, usage restrictions, and expensive contracts, and includes enterprise-ready features: backup/restore, high availability, replication, logging, and more.

Release Highlights include:

  • PostgreSQL 15 is now officially supported by the Operator with the new exciting features it brings to developers.
  • UX improvements related to the Custom Resource have been added in this release, including the handy pg, pg-backup, and pg-restore short names, useful to quickly query the cluster state with the kubectl get command (see the example below), and additional information in the status fields, which now show name, endpoint, status, and age.
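For example (assuming the Operator registers these short names for its custom resources, as described in the release notes), you can query the cluster and its backup and restore objects directly:

kubectl get pg
kubectl get pg-backup
kubectl get pg-restore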

Download Percona Operator for PostgreSQL 2.1.0

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MongoDB, PostgreSQL, MariaDB, and other open source databases in on-premises and cloud environments, and is trusted by global brands to unify, monitor, manage, secure, and optimize their database environments.

May
08
2023
--

Percona Monitoring and Management 2.37: The Improved Inventory View, New Experimental Dashboards for Kubernetes Monitoring, and More!

Percona Monitoring and Management 2.37

We are excited to announce the release of Percona Monitoring and Management (PMM) V2.37, which includes a revamped Inventory view, new experimental dashboards for Kubernetes monitoring, as well as many other improvements and new features. See the full list of changes in the release notes.

To get started with PMM 2.37, check out the PMM Quickstart guide. Please consult the upgrade instructions if you’re upgrading from a previous version.

The improved Inventory view

This release of PMM starts the series of enhancements that will improve how you work with Services and Nodes in PMM. With this first set of changes, we’re revamping the Inventory page to give you more context on your inventory objects and more actionable information for exploring and fixing possible issues.

Let’s review all of these changes separately. 

Improved design

The new Inventory page now provides a much clearer design and emphasizes useful information while downplaying reference details like service labels and IDs.

PMM inventory
We also solved one of the most frequent questions from users: “On what Node is this service running?”

To answer this question, you would have had to search by ID in a few places and be familiar with PMM’s internal architecture. Now, we are showing this information in a dedicated Node Name column on this page.

Database and monitoring status

At the heart of our redesign thinking process lay one important question: “How can our users assess the health of their database by just looking at their list of Services?”

To answer this, we’ve added some monitoring capabilities into the mix. The new Status column on the Inventory page shows the status of your databases based on metrics coming directly from them.

We’ve also tackled a trust question: How can you make sure that the Service status we’re exposing is accurate?

This is why we’ve added the Monitoring column. This summarizes the status of all the Agents related to a Service and also offers quick access to the list of Agents.

Database and monitoring status

Quick access to other sections

To simplify access to related sections under Monitoring, we’ve also added links to the Service Overview Dashboard and Query Analytics. You will find them in the Action section. 

Detailed section

We’ve also moved all secondary information, like parameters and labels, to an expandable section available for each Service and Node. 

What’s next for the Inventory?

We will continue to improve the presentation of PMM’s Inventory view with Node-monitoring status and Node Agents lists. We also plan to add more connection points from the Inventory to other important PMM features like Advisors, Alerting, and backups.

In addition, the new Inventory view will be even more helpful with the upcoming structure based on database clusters.

Stay tuned! 

New Experimental dashboards for Kubernetes monitoring

Important

These experimental dashboards are subject to change. It is recommended to use these dashboards for testing purposes only.

We are pleased to announce the release of PMM V2.37, which introduces three new experimental dashboards:

  • Kubernetes Cluster Summary
  • Kubernetes Pods Status
  • Kubernetes Volumes

These dashboards are designed to provide valuable insights into the status and performance of your Kubernetes cluster, pods, and volumes, helping you to identify and troubleshoot issues quickly and easily.

We welcome your feedback as we continue to enhance PMM with these new dashboards. Please leave your comments/feedback on Percona Forum.

Experimental Kubernetes Cluster Summary dashboard

Experimental Kubernetes Cluster Summary dashboard

Kubernetes Cluster Summary provides a comprehensive overview of your Kubernetes cluster, including:

  • Components
  • Node
  • Pod
  • PVC status
  • CPU
  • Memory overview, and more.

This dashboard displays all workloads running in the cluster, enabling you to take action and optimize its performance.

Experimental Kubernetes Pods Status dashboard

Experimental Kubernetes Pods Status dashboard

Kubernetes Pods Status dashboard provides detailed information about the state and performance of your pods, including CPU, Memory, and Network metrics.

This dashboard can help you quickly pinpoint any issues affecting your pods and ensure they continue to operate smoothly.

Experimental Kubernetes Volume dashboard

Experimental Kubernetes Volume dashboard

The Kubernetes Volumes dashboard provides insights into your Kubernetes volumes, including capacity and usage, in real time. With this dashboard, you can easily monitor the performance and usage of your volumes and take proactive measures to ensure their performance.

To learn more about these new experimental dashboards and how to use them, refer to the documentation.

Here are the steps to create a new folder and move all experimental dashboards to the new folder for quick access and internal use:

Note

You should have at least an Editor role to create a new folder and move all experimental dashboards.

  1. Navigate to the Main menu and hover on the Dashboards icon.
  2. Click New folder.
  3. Provide a name for your folder, and then select Create.
  4. Navigate to Dashboards from the Main menu and click Browse.
  5. Select the dashboard that you want to move and click Move.
  6. On the Choose Dashboard dialogue box, from the dropdown under Folder option, choose the folder where you want to move the dashboard.
  7. To apply your changes, select Save Dashboard.

What is next?

Experimental dashboards will be enhanced through UX improvements and the introduction of new metrics.

Thanks to Community and Perconians

At Percona, we are grateful for our supportive community and dedicated team, who work together to shape the future of PMM. If you would like to be a part of this community, you can join us on our forums to request new features, share your feedback, and ask for support. We value the input of our community and welcome all members to participate in the ongoing development of PMM.

 

See PMM in action now!

May
02
2023
--

How To Use pt-secure-collect for Capturing Data in a Secure Way From the OS and Database System

How To Use pt-secure-collect

Sometimes crucial data sharing is avoided because of compliance rules, organizational policies, or numerous security concerns. The common use cases involve sharing pt-mysql-summary, pt-stalk, and other OS-related details to assist Support Engineers or any other third-party team troubleshoot database-related issues.

In this context, pt-secure-collect is a very important utility from Percona, which helps capture the required information securely and also provides aid in masking the existing information.

Pt-secure-collect helps in collecting, sanitizing, and encrypting data from various sources. By default, this utility collects the output with the help of pt-stalk, pt-summary, and pt-mysql-summary.

Let’s see how this tool works.

Installation

The tool can be installed via the Percona official repositories:

sudo yum install percona-toolkit

Another option is to download pt-secure-collect either via the full Percona Toolkit package (RPM) or by fetching just the standalone tool:

shell> sudo wget https://downloads.percona.com/downloads/percona-toolkit/3.5.2/binary/redhat/7/x86_64/percona-toolkit-3.5.2-2.el7.x86_64.rpm 
shell> sudo yum install percona-toolkit-3.5.2-2.el7.x86_64.rpm

OR

shell> sudo wget percona.com/get/pt-secure-collect 
shell> sudo chmod +x pt-secure-collect

Now, let’s run our first command to capture the OS- and database-related details with the tool.

shell> ./pt-secure-collect collect --bin-dir=/usr/bin/ --temp-dir=/home/vagrant/pt/ --mysql-host=localhost --mysql-port=3306 --mysql-user=root --mysql-password=Root@1234
Encryption password

Output:

INFO[2023-04-22 06:54:10] Temp directory is "/home/vagrant/pt/"
INFO[2023-04-22 06:54:10] Creating output file "/home/vagrant/pt/pt-stalk_2023-04-22_06_54_10.out"  
INFO[2023-04-22 06:54:10] Running pt-stalk --no-stalk --iterations=2 --sleep=30 --host=localhost --dest=/home/vagrant/pt/ --port=3306 --user=root --password=********  
INFO[2023-04-22 06:55:42] Creating output file "/home/vagrant/pt/pt-summary_2023-04-22_06_55_42.out"  
INFO[2023-04-22 06:55:42] Running pt-summary                            
INFO[2023-04-22 06:55:48] Creating output file "/home/vagrant/pt/pt-mysql-summary_2023-04-22_06_55_48.out"  
INFO[2023-04-22 06:55:48] Running pt-mysql-summary --host=localhost --port=3306 --user=root --password=********  
INFO[2023-04-22 06:56:01] Sanitizing output collected data              
INFO[2023-04-22 06:56:17] Creating tar file "/home/vagrant/pt/pt.tar.gz"  
INFO[2023-04-22 06:56:17] Encrypting "/home/vagrant/pt/pt.tar.gz" file into "/home/vagrant/pt/pt.tar.gz.aes"  
INFO[2023-04-22 06:56:17] Skipping encrypted file "pt.tar.gz.aes"   

So, here the above command collected the data from the “pt*” tools securely. By default, it encrypts the data and asks for an encryption password. However, we can skip that part by passing the --no-encrypt option, as shown below.
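For example, the same collection can be run without encryption (reusing the hypothetical connection parameters from the earlier command):

shell> ./pt-secure-collect collect --bin-dir=/usr/bin/ --temp-dir=/home/vagrant/pt/ --mysql-host=localhost --mysql-port=3306 --mysql-user=root --mysql-password=Root@1234 --no-encrypt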

Options:

--bin-dir => Directory containing the Percona Toolkit binaries (pt* tools).
--temp-dir => Temporary directory used for the data collection.

Note – In order to run the command successfully, all prerequisite binaries (pt-stalk, pt-summary, and pt-mysql-summary) must be present in the directory passed via --bin-dir.

Let’s decrypt the file and observe the captured details:

shell> ./pt-secure-collect decrypt /home/vagrant/pt/pt.tar.gz.aes  --outfile=/home/vagrant/pt/pt.tar.gz
Encryption password:
INFO[2023-04-22 07:01:55] Decrypting file "/home/vagrant/pt/pt.tar.gz.aes" into "/home/vagrant/pt/pt.tar.gz" 

Note – Here, we need to provide the password which we used at the time of encryption.

--outfile => Write the output to this file. If omitted, the output file name will be the same as the input file, without the .aes extension.

Now, inside the path, we can see the unencrypted file. Followed by this, we can uncompress the file to see the contents.

shell> ls -lh /home/vagrant/pt
-rw-------. 1 vagrant vagrant 500K Apr 22 07:01 pt.tar.gz

shell> tar -xzvf pt.tar.gz

Let’s look at a couple of examples where the sensitive data has been altered or masked.

  • With pt-secure-collect:
Hostname | hostname 
log_error | /var/log/hostname 
Config File | /etc/hostname 
pid-file        = /var/run/mysqld/hostname 
log-error     = /var/log/hostname 
socket        = /var/lib/mysql/hostname

  • Without pt-secure-collect:
Hostname | localhost.localdomain 
log_error | /var/log/mysqld.log 
Config File | /etc/my.cnf 
pid-file       = /var/run/mysqld/mysqld.pid 
log-error     = /var/log/mysqld.log 
socket        = /var/lib/mysql/mysql.sock

Note – We can clearly see some differences between the two types of output. With pt-secure-collect, the above information was simply replaced with a placeholder value (“hostname”).

Now, let’s see how we can sanitize an existing file, “pt-mysql-summary.out”, and mask the critical information; the resulting output is shown below.

shell> ./pt-secure-collect sanitize --input-file=/home/vagrant/pt-mysql-summary.out > /home/vagrant/pt-mysql-summary_sanitize.out

Output:

Hostname | hostname 
Pidfile | /var/run/mysqld/hostname (exists) 
log_error | /var/log/hostname 
Config File | /etc/hostname 
pid-file        = /var/run/mysqld/hostname 
log-error     = /var/log/hostname 
socket        = /var/lib/mysql/hostname 
log-error     = /var/log/mariadb/hostname
pid-file        = /var/run/mariadb/hostname

You may also control which information should be skipped from masking with the --no-sanitize-hostnames and --no-sanitize-queries options.

Here, we see one more example where the critical information, such as “hostname” details inside the OS log file (“/var/log/messages”), is masked/replaced by some other value.

shell> sudo ./pt-secure-collect sanitize --input-file=/var/log/messages > /home/vagrant/messages_sanitize.out

 

Output (without pt-secure-collect):

Apr 23 03:37:13 localhost pmm-agent: #033[31mERRO#033[0m[2023-04-23T03:37:13.547+00:00] time="2023-04-23T03:37:13Z" level=error msg="Error opening connection to ProxySQL: dial tcp 127.0.0.1:6032: connect: connection refused" source="exporter.go:169"  #033[31magentID#033[0m=/agent_id/04dd6ad8-5c2e-4c52-a624-eb3bc7357651 #033[31mcomponent#033[0m=agent-process #033[31mtype#033[0m=proxysql_exporter

Output (with pt-secure-collect):

Apr 23 03:37:13 localhost pmm-agent: #033[31mERRO#033[0m[2023-04-23T03:37:13.547+00:00] time="2023-04-23T03:37:13Z" level=error msg="Error opening connection to ProxySQL: dial tcp hostname:6032: connect: connection refused" source="hostname:169"  #033[31magentID#033[0m=/agent_id/04dd6ad8-5c2e-4c52-a624-eb3bc7357651 #033[31mcomponent#033[0m=agent-process #033[31mtype#033[0m=proxysql_exporte

 

Summary

With the help of this tool, both OS and database-level information/logs can be encrypted or masked with some different values to hide the sensitive data. This tool comes in handy while dealing with critical data troubleshooting with any third-party stakeholders and also maintains security/compliance-related practices.

Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL, MariaDB, MongoDB, and PostgreSQL server and system tasks that are too difficult or complex to perform manually.

 

Learn more about Percona Toolkit

Apr
28
2023
--

Add More Security to Your Percona Server for MongoDB With AWS IAM integration!

MongoDB With AWS IAM Integration

Did you notice that Percona Server for MongoDB 6.0.5-4 was released just a few days ago? This time around, it introduced improvements to the way we handle master key rotation for data at rest encryption as well as AWS IAM integration.

One key to rule them all — improvements to master key rotation

With the improvements introduced in Percona Server for MongoDB 6.0.5-4, one key path can be used for all servers in a clustered environment. This allows us to use one vault key namespace for all nodes in a deployment while at the same time preserving key versions and allowing each node to perform key rotation without impact to the other nodes.

Changes introduced with Percona Server for MongoDB 6.0.5-4 now allow using the same key for all the members of a replica set, if the user so chooses, without any impact on functionality.
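As a rough sketch only (the vault-related startup options below follow the Percona Server for MongoDB data-at-rest encryption documentation, but verify them against your version; the Vault server and secret path are placeholders), every node in the replica set can now point at the same Vault secret path:

# Hypothetical example: all nodes use one Vault secret path; key versions are
# preserved, so each node can rotate its master key without affecting the others.
mongod --config /etc/mongod.conf \
       --enableEncryption \
       --vaultServerName vault.internal.example.com \
       --vaultPort 8200 \
       --vaultTokenFile /etc/mongod/vault-token \
       --vaultSecret secret/data/psmdb/data-at-rest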

Why should you care about AWS IAM integration?

With all the systems users need to access daily, password management becomes a more pressing issue. The introduction of IAM systems to an enterprise has become somewhat of a security standard in large enterprises.

Our users approached us about integration with AWS IAM, commonly used in their organizations. It’s an integration missing from MongoDB Community Edition (CE) that is important for compliance with enterprise security policies of many companies. Integration with AWS IAM allows:

MongoDB AWS IAM integration

To set up this integration, follow the steps outlined in our documentation, and configure either the user or the role authentication. This will allow AWS Security Token Service (STS) to play the part of Identity Provider (IDP) in a SAML 2.0-based federation.
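Purely as an illustrative sketch (the server-side setup lives in the documentation; the hostname below is a placeholder, and this assumes the MONGODB-AWS authentication mechanism is enabled on the server), a client could then authenticate with its IAM credentials instead of a database password:

# Hypothetical client-side example: authenticate with AWS IAM credentials
# taken from the environment (access key and secret key).
mongosh "mongodb://psmdb0.example.com:27017/?authSource=%24external&authMechanism=MONGODB-AWS" \
        --username "$AWS_ACCESS_KEY_ID" \
        --password "$AWS_SECRET_ACCESS_KEY"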

Your feedback matters!

We take pride in being open to feedback in Percona. Please do not hesitate to contact us via the community forums or this contact form.

What’s next

We are looking into the problems affecting large size datastores that are a pain to our users. Please let us know if there are any particular issues you are struggling with in MongoDB; we are always open to suggestions!

Learn more about Percona Server for MongoDB

Apr
28
2023
--

Open Source vs. Proprietary Database Software: What To Choose?

open source

For starters, yes, Percona has chosen sides. We are a company of open source proponents. We’re also dedicated and active participants in the global open source community. 

But the intent here is to be informative, not to declare a winner between the two. Both open source and proprietary options have advantages. At the same time, it’s hard to ignore key differences and how they affect customers. We also know that enterprise-grade features matter, so please read on. In election parlance, there might be a surprise third-party candidate that will win your vote.

Let’s start with a simple introductory comparison: With proprietary (closed source) database software, the public does not have access to the source code; only the company that owns it and those given access can modify it. With open source database software, anyone in the general public can access the source code, read it, and modify it.

Dispelling a couple of myths

Before expanding on the comparison, let’s dispel the most common myths about open source software:

Myth #1: Open source is less secure.

Sure, without the right protections, open source software can be vulnerable, but those protections exist and can be implemented. People assume that because the code is public, attackers can have at it and easily wreak havoc. That’s not so. 

With the code open to all, it’s truly transparent (DevOps.com, April 2021); security is scrutinized, and vulnerabilities are addressed by experts globally. Those most involved with open source software know the reality about security. Since the early 2000s, research has repeatedly shown that open source software is no more vulnerable than proprietary software. Those on the front lines echo those findings. A 2021 Red Hat survey showed that 89% of IT leaders see enterprise open source software as equally or more secure than proprietary software.

In contrast to the abundant support of the open source community, with proprietary systems, the software vendor must address and fix the problem, which often means providing a patch. Further, the developers of proprietary software aren’t always transparent about vulnerabilities, so software users might not know about threats until it’s too late. 

And about attackers: They often use hacking programs instead of attacking the code directly. They don’t need access to examine the code for hacking purposes (TA Digital, July 2021). Attackers find and exploit vulnerabilities in proprietary software all the time.

Myth #2: Proprietary databases are better and therefore more suitable for large enterprises.

Again, in the case of proprietary software, the developers and DBAs come from within one company or a limited team. Conversely, with open source, a global community contributes. 

The transparency of the open source model makes input inclusive and creates an abundance of scrutiny and support (Forbes, January 2022). The multitude of checks and balances places a premium on how the code affects performance. 

Another part of this myth is that open source can’t be enterprise-grade. With the right expertise, you can have an enterprise-grade open source solution. Here’s a small sampling of leading corporations that use open source software: Amazon, Audi, BMW, Google, IBM, and Facebook (CodeSnail, August 2022). The list goes on and on.

Now, myths aside, let’s get down to the brass tacks of database comparisons. Each of our subjects has its pros and cons.

Proprietary database software

Proprietary software can be beneficial for addressing immediate and/or focused database concerns. Sometimes a vendor will have an innovative way of solving a problem when there aren’t alternatives available on the market. A company will enter a relationship with that vendor because the vendor’s solution addresses present business objectives. Additionally, a single-vendor relationship can eliminate complexity; in some cases, the vendor’s solution can simplify the environment and ensure that all components work together. 

The benefits described above often are attributed solely — and inaccurately —  to proprietary software only. For some, proprietary is synonymous with business/enterprise-grade. Consequently, some organizations might not even consider using open source software, or they’ll quickly dismiss it because of a perceived lack of expertise, support, etc.

Those same customers, and others, often aren’t aware that proprietary software can limit creative options and the ability to scale. Those limitations can increasingly draw from a customer’s tech budget. As business objectives change, along with industry standards and technological advances, a customer can be stuck with software and upgrades that make more sense for the vendor’s bottom line than for addressing the customer’s changing needs. For example, the vendor might push a cloud-based solution when the customer prefers to keep its infrastructure on-premises. 

Additionally, with proprietary software, there can be drawbacks related to certifications. When you deploy software in a proprietary arrangement, the vendor might certify it only against a specific database or set of databases. Your apps, therefore, must run on one particular server.

Being stuck with a single vendor and its software can result in vendor lock-in that makes you susceptible to price hikes, paying for bundled technology with components you don’t need, and an inability to change software and infrastructure to meet unique business needs.

Open source database software

Upstream open source projects are free to download and use. There are no licensing or purchasing fees for reusing, modifying, or distributing the software. Beyond the obvious cost-efficiency, many IT leaders consider the quality of open source software on par with that of proprietary software. In fact, 32% of IT leaders in a Red Hat survey consider open source enterprise software to be of higher quality. 

Free of licensing restrictions and escalating costs that can come with proprietary software, developers can download open source software and use it to create new applications. Those freedoms help companies optimize limited tech budgets. They can more easily scale infrastructure — up or down — to meet economic conditions and changing business objectives. 

And there is the aforementioned online open source community. Whereas proprietary products exist at the whim of a single vendor, a strong open source community can help ensure a project carries on even if challenges arise for some of the project’s supporting companies. Additionally, with open source, companies can deploy their databases anywhere — in cloud, on-premises, or hybrid environments — and move them at any time.

A lack of readily available support and expertise, however, can offset the potential savings of open source database software. There must be careful implementation of the right protection to avoid vulnerabilities. And to achieve database objectives across the enterprise, a company that uses open source software often must either bolster its on-staff expertise or turn to outside support. Either option can be costly.

The best of both worlds — enterprise-grade open source software

Undoubtedly, you liked some of the attributes from each side. So how do you choose?

You don’t have to. There’s a third-party candidate offering the best of both worlds — open source database software with enterprise-grade features.

This option couples the cost-efficiency and scalability of open source with the simplicity (task-focused), cohesiveness (components work together), and security of proprietary software. With the right extensions and add-ons to make it enterprise-grade, an open source solution can replicate the applications a company uses and can handle the performance requirements of the company’s most critical workloads. A flexible, open source enterprise setup enables deployment and operation on-premises, in the cloud, or in a hybrid environment.

It’s important, however, to emphasize these words of caution: The phrase “enterprise-grade” is used a lot, but few vendors provide open source software that meets the demanding mix of enterprise needs related to integration, productivity, scalability, and security. And even when those needs are met, they’re soon to evolve. Therefore, enterprise-grade software — like community versions — still requires support. When seeking such support, it’s important to find a vendor that provides multi-database support, technology-agnostic expertise, and a flexible contract.

The search can be challenging, but vendors who provide true enterprise-grade open source software do exist. We happen to know of one.

You can learn more about the differences between open source and proprietary database software in The Ultimate Guide to Open Source Databases.

When you’re choosing a database, consider Percona

Percona is dedicated to making databases and applications run better through a combination of expertise and open source software. Our enterprise-grade distributions include the following:

  • Percona Distribution for MySQL: This single solution delivers optimized performance, greater scalability and availability, and enhanced backups — for even the most demanding workloads.
  • Percona Distribution for PostgreSQL: Put the best and most critical enterprise components from the open source community to work for you — in a single distribution, designed and tested to work together.
  • Percona Distribution for MongoDB: Ensure data availability while improving security and simplifying the development of new applications — in the most demanding public, private, and hybrid cloud environments.

Percona backs its enterprise-grade distributions with varying levels of support. We’ll provide support that best fits the needs of your company or organization — without a restrictive contract.

 

Watch Webinar: Optimize Costs With Open Source Software and Support

 

Get Open Source Support Services from Percona

 

Learn more:

Apr
26
2023
--

Speeding Up Restores in Percona Backup for MongoDB

Speeding Up Restores in Percona Backup for MongoDB

When you do a database restore, you want to have it done as soon as possible. In the case of disaster recovery, the situation is stressful enough on its own. And the database is unusable until the restore is done. So every minute matters. That becomes especially crucial with big datasets.

Bringing physical backups to Percona Backup for MongoDB (PBM) was a big step toward faster restores. A physical restore is essentially copying data files to the target nodes and starting a database on top of that data catalog, while a logical restore means copying data and running insert operations on the database, which adds overhead from parsing data, building indexes, etc. Our tests showed physical database restores to be up to 5x faster than logical ones. But can we do better? Let’s try.

The speed of a physical restore comes down to how fast we can copy (download) data from the remote storage, so we decided to try parallel (concurrent) downloads. In physical backups, PBM stores WiredTiger files pretty much as they are in the data directory, just adding extra compression. So what if we download different files in parallel? That won’t exactly work: each MongoDB collection’s data is stored in a single file, so data doesn’t spread evenly across files, and big collections would become bottlenecks. The better approach is to download each individual file concurrently, in chunks.

PBM already downloads files in chunks, but that is done solely for retries: in case of a network failure, we only have to retry the most recent chunk rather than the whole file. The idea is to download these chunks concurrently. Here’s the problem: if we read chunks out of order, we cannot write them straight to the file (with a seek offset), because we have to decompress the data first (data in the backup is compressed by default). Hence, although we can read data out of order, we must write it sequentially. For that, we made a special memory buffer. Chunks can be put there concurrently and out of order, but consumers always read data in order.

The final design

The downloader starts a number of workers equal to the concurrency setting (the number of CPU cores by default). There is a preallocated arena in the arenas pool for each worker. An arena is basically a byte buffer with a bitmap of free slots. Each arena is split into spans, and the span size equals the download chunk size. When a worker wants to download a chunk, it first acquires a free span from its arena and downloads the data into it. When a consumer reads this data, the span is marked as free and can be reused for the next chunk. The worker doesn’t wait for the data to be read: once it has downloaded a chunk, it takes another chunk from the task queue, acquires the next free span, and downloads the data. To prevent uncontrolled memory consumption, the number of spans in each arena is limited, so if all spans are busy, the worker has to wait for a free span before downloading the next chunk.

On the other hand, for each file we keep track of what has already been given to the consumer (the offset of the last written byte). If a downloaded chunk is out of order, it is pushed onto a heap; otherwise, it is handed to the consumer straight away. On each following iteration (the next downloaded chunk), we check the top of the heap and pop chunks off, giving them to the consumer for as long as they are in order.

See the commit with changes for more details.

Config options

A few new options were added to the PBM config file to tweak concurrent downloads. 

numDownloadWorkers – sets concurrency. Default: number of CPUs

downloadChunkMb – the size of the chunks to download in Mb. Default: 32

maxDownloadBufferMb – the upper limit of the memory that can be used for download buffers (arenas), in Mb. Default: numDownloadWorkers * downloadChunkMb * 16. If set, the chunk size might be adjusted to fit within the limit. It doesn’t mean that all of this memory will be used and actually allocated in physical RAM.
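A minimal sketch of setting these options with the PBM CLI (assuming they live under the restore section of the configuration, as in recent PBM releases; adjust the values to your hardware):

pbm config --set restore.numDownloadWorkers=8
pbm config --set restore.downloadChunkMb=32
pbm config --set restore.maxDownloadBufferMb=4096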

Results

PBM supports different storage types, but for this implementation, we decided to start with the most widely used: S3-compatible storage. We aim to port it to Azure Blob and FileSystem storage types in subsequent releases.

Our tests on AWS S3 show up to 19x improvements in the restore speed:

Instances                       | Backup size | Concurrency | Span size | Restore time

Concurrent download
i3en.xlarge (4vCPU, 16Gb RAM)   | 500Gb       | 4           | 32Mb      | 45 min
i3en.xlarge (4vCPU, 16Gb RAM)   | 500Gb       | 8           | 32Mb      | 32 min
i3en.3xlarge (12vCPU, 96GB RAM) | 5Tb         | 12          | 32Mb      | 168 min
i3en.3xlarge (12vCPU, 96GB RAM) | 5Tb         | 24          | 32Mb      | 117 min

Release v2.0.3
i3en.xlarge (4vCPU, 16Gb RAM)   | 500Gb       | -           | -         | 227 min
i3en.3xlarge (12vCPU, 96GB RAM) | 5Tb         | -           | -         | ~2280 min

 

AWS S3 MongoDB backup

* Tests were made on AWS i3en instances with the S3 storage in the same region.

** We didn’t wait for 5Tb restore on v2.0.3 to finish and used the “time to uploaded Gb” ratio for results extrapolation.

Try Percona Backup for MongoDB for faster restores

This is a significant improvement that comes among the other features with the new Percona Backup for MongoDB (PBM) release. Give it a try, and leave your feedback!

 

Get Percona Backup for MongoDB
