Jan
18
2022
--

Attack No-PK Replication Lag with MySQL/Percona Server 8 Invisible Columns!

no primary key replication lag mysql

no primary key replication lag mysqlThe most common issue when using row-based replication (RBR) is replication lag due to the lack of Primary keys.

The problem is that any replicated DML will do a full table scan for each modified row on the replica. This bug report explains it more in-depth: https://bugs.mysql.com/bug.php?id=53375

For example, if a delete is executed on the following table definition:

CREATE TABLE `joinit` (
  `i` int NOT NULL,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1

 

With this amount of rows:

mysql> select count(*) from joinit;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+

 

The delete being:

mysql> flush status ;

mysql> delete from joinit where i > 5 and i < 150;
Query OK, 88 rows affected (0.04 sec)

mysql> show status like '%handler%';
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 2       |
| Handler_delete             | 1       |
…
| Handler_read_rnd_next      | 1048577 |
…

It can be seen that the delete on the Primary requires a full table scan (Handler_read_rnd_next matches row amount + 1) to delete 88 rows.

The additional problem is that each of the rows being deleted will be recorded in the binary log individually like this:

#220112 18:29:05 server id 1  end_log_pos 3248339 CRC32 0xdd9d1cb2 Delete_rows: table id 106 flags: STMT_END_F
### DELETE FROM `test2`.`joinit`
### WHERE
###   @1=6
###   @2='764d302b-73d5-11ec-afc8-00163ef3b519'
###   @3='18:28:39'
###   @4=27
### DELETE FROM `test2`.`joinit`
### WHERE
###   @1=7
###   @2='764d30bc-73d5-11ec-afc8-00163ef3b519'
###   @3='18:28:39'
###   @4=5
…
{88 items}

Which will result in 88 full table scans on the replica, and hence the performance degradation.

For these cases, the recommendation is to add a primary key to the table, but sometimes adding a PK might not be easy because:

  • There are no existing columns that could be considered a PK.
  • Or adding a new column (as the PK) is not possible as it might impact queries from a 3rd party tool that we have no control over (or too complex to fix with query rewrite plugin).

The solution is to use MySQL/Percona Server for MySQL 8 and add an invisible column

Adding a new column (named “newc”) invisible as a primary key can be done with the following line:

ALTER TABLE joinit ADD COLUMN newc INT UNSIGNED NOT NULL AUTO_INCREMENT INVISIBLE PRIMARY KEY FIRST;

Note, adding a PK is an expensive operation that requires a table rebuild as shown here.

After adding an invisible PK, the table will look like this:

CREATE TABLE `joinit` (
  `newc` int NOT NULL AUTO_INCREMENT /*!80023 INVISIBLE */,
  `i` int NOT NULL,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int NOT NULL,
  PRIMARY KEY (`newc`)
) ENGINE=InnoDB AUTO_INCREMENT=1048576 DEFAULT CHARSET=latin1

 

Deleting a row now will be recorded in the binary log like this:

### DELETE FROM `test`.`joinit`
### WHERE
###   @1=1048577
###   @2=1
###   @3='string'
###   @4='17:23:04'
###   @5=5
# at 430
#220112 17:24:56 server id 1  end_log_pos 461 CRC32 0x826f3af6 Xid = 71
COMMIT/*!*/;

Where @1 is the first column ( the PK in this case) which the replica can use to find the matching row without having to do a full table scan.

The operation executed on the replica would be similar to the following which requires only one scan to find the matching row:

mysql> flush status ;
Query OK, 0 rows affected (0.01 sec)

mysql> delete from joinit where newc = 1048578;
Query OK, 1 row affected (0.00 sec)

mysql> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 2     |
…
| Handler_read_key           | 1     |
…
| Handler_read_rnd_next      | 0     |
…

 

Also as the name suggests, an invisible column won’t show nor it needs to be referenced when doing operations over the table, i.e:

mysql> select * from joinit limit 2; 
+---+--------------------------------------+----------+----+
| i | s                                    | t        | g  |
+---+--------------------------------------+----------+----+
| 2 | ecc6cbed-73c9-11ec-afc8-00163ef3b519 | 17:06:03 | 58 |
| 3 | ecc7d9bb-73c9-11ec-afc8-00163ef3b519 | 17:06:03 | 56 |
+---+--------------------------------------+----------+----+
2 rows in set (0.00 sec)

mysql> insert into joinit values (4, "string", now(), 5);
Query OK, 1 row affected (0.01 sec)

 

But if needed, the new column (newc) can be fetched if explicitly queried:

mysql> select newc, i, s, t, g from joinit limit 2; 
+------+---+--------------------------------------+----------+----+
| newc | i | s                                    | t        | g  |
+------+---+--------------------------------------+----------+----+
|    1 | 2 | ecc6cbed-73c9-11ec-afc8-00163ef3b519 | 17:06:03 | 58 |
|    2 | 3 | ecc7d9bb-73c9-11ec-afc8-00163ef3b519 | 17:06:03 | 56 |
+------+---+--------------------------------------+----------+----+
2 rows in set (0.00 sec)

 

What If…?

What if MySQL automatically detects that the PK is missing for InnoDB tables and adds the invisible PK?

Taking into account that an internal six bytes PK is already added when the PK is missing, it might be a good idea to allow the possibility of making the PK visible if you need to. 

This means that when you execute this CREATE TABLE statement:

CREATE TABLE `joinit` (
  `i` int NOT NULL,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int NOT NULL,
  PRIMARY KEY (`newc`)
) ENGINE=InnoDB AUTO_INCREMENT=1048576 DEFAULT CHARSET=latin1

 

Will be automatically translated to:

CREATE TABLE `joinit` (
  `newc` int NOT NULL AUTO_INCREMENT /*!80023 INVISIBLE */,
  `i` int NOT NULL,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int NOT NULL,
  PRIMARY KEY (`newc`)
) ENGINE=InnoDB AUTO_INCREMENT=1048576 DEFAULT CHARSET=latin1

 

And then we can execute this command:

ALTER TABLE joint ALTER COLUMN newc SET VISIBLE;

To make it visible.

Conclusion

Missing primary keys is a problem for scaling databases, as replication will require a full table scan for each updated/delete row, and the more data the more lag.

Adding a PK might not be always possible because of 3rd party tools or restrictions, but adding an invisible primary key will do the trick and have the benefits of adding a PK without impacting syntax and operations from 3rd party clients/tools. What will be awesome is to make MySQL able to detect the missing PK, add it automatically, and change it to visible if you need to.

Jan
14
2022
--

DBaaS and the Enterprise

DBaaS and the Enterprise

DBaaS and the EnterpriseInstall a database server. Give the application team an endpoint. Set up backups and monitor in perpetuity. This is a pattern I hear about regularly from DBAs with most of my enterprise clients. Rarely do they get to troubleshoot or work with application teams to tune queries or design schemas. This is what triggers the interest in a DBaaS platform from database administrators.

What is DBaaS?

DBaaS stands for “Database as a Service”. When this acronym is thrown out, the first thought is generally a cloud offering such as RDS. While this is a very commonly used service, a DBaaS is really just a managed database platform that offloads much of the operational burden from the DBA team. Tasks handled by the platform include:

  • Installing the database software
  • Configuring the database
  • Setting up backups
  • Managing upgrades
  • Handling failover scenarios

A common misconception is that a DBaaS is limited to the public cloud. As many enterprises already have large data centers and heavy investments in hardware, an on-premise DBaaS can also be quite appealing. Keeping the database in-house is often favored when the hardware and resources are already available. In addition, there are extra compliance and security concerns when looking at a public cloud offering.

DBaaS also represents a difference in mindset. In conventional deployments, systems and architecture are often designed in very exotic ways making automation a challenge. With a DBaaS, automation, standardization, and best practices are the priority. While this can be seen as limiting flexibility, this approach can lead to larger and more robust infrastructures that are much easier to manage and maintain.

Why is DBaaS Appealing?

From a DBA perspective (and being a former DBA myself), I always enjoyed working on more challenging issues. Mundane operations like launching servers and setting up backups make for a less-than-exciting daily work experience. When managing large fleets, these operations make up the majority of the work.

As applications grow more complex and data sets grow rapidly, it is much more interesting to work with the application teams to design and optimize the data tier. Query tuning, schema design, and workflow analysis are much more interesting (and often beneficial) when compared to the basic setup. DBAs are often skilled at quickly identifying issues and understanding design issues before they become problems.

When an enterprise adopts a DBaaS model, this can free up the DBAs to work on more complex problems. They are also able to better engage and understand the applications they are supporting. A common comment I get when discussing complex tickets with clients is: “well, I have no idea what the application is doing, but we have an issue with XYZ”. If this could be replaced with a detailed understanding from the design phase to the production deployment, these discussions would be very different.

From an application development perspective, a DBaaS is appealing because new servers can be launched much faster. Ideally, with development or production deployment options, an application team can have the resources they need ready in minutes rather than days. It greatly speeds up the development life cycle and makes developers much more self-reliant.

DBaaS Options

While this isn’t an exhaustive list, the main options when looking to move to a DBaaS are:

  • Public cloud
    • Amazon RDS, Microsoft Azure SQL, etc
  • Private/Internal cloud
    • Kubernetes (Percona DBaaS), VMWare, etc
  • Custom provisioning/operations on bare-metal

Looking at public cloud options for a DBaaS, security and compliance are generally the first concern. While they are incredibly easy to launch and generally offer some pay-as-you-go options, managing access is a major consideration.

Large enterprises with existing hardware investments often want to explore a private DBaaS. I’ve seen clients work to create their own tooling within their existing infrastructure. While this is a viable option, it can be very time-consuming and require many development cycles. Another alternative is to use an existing DBaaS solution. For example, Percona currently has a DBaaS deployment as part of Percona Monitoring and Management in technical preview. The Percona DBaaS automates PXC deployments and management tasks on Kubernetes through a user-friendly UI.

Finally, a custom deployment is just what it sounds like. I have some clients that manage fleets (1000s of servers) of bare metal servers with heavy automation and custom scripting. To the end-user, it can look just like a normal DBaaS (an endpoint with all operations hidden). On the backend, the DBA team spends significant time just supporting the infrastructure.

How Can Percona help?

Percona works to meet your business where you are. If that is supporting Percona Server for MySQL on bare metal or a fleet of RDS instances, we can help. If your organization is leveraging Kubernetes for the data tier, the Percona Private DBaaS is a great option to standardize and simplify your deployments while following best practices. We can help from the design phase through the entire life cycle. Let us know how we can help!

Jan
14
2022
--

Help Percona Make Better Databases

Percona Make Better Databases

Hello everyone!  The new year is upon us and we are looking at planning some awesome new features, enhancements, and ideas in 2022.  One of the things we could really use your help on, however, is providing us with feedback on how we are doing and what we can do to improve our database products.  We have put together a few short surveys for each one of our main products (Supporting MySQL, MongoDB, PostgreSQL, Percona Monitoring and Management, and our Kubernetes Operators).  Each survey is a quick 5-7 questions and should take no more than a minute or two to complete.

Percona Server for MySQL & PXC https://per.co.na/lnsTqM
Percona Server for MongoDB https://per.co.na/7BGhFe
Percona Distribution for PostgreSQL https://per.co.na/1KfvRt
Percona Monitoring and Management https://per.co.na/JQ7yAt
Percona Operators (Either for PG, MYSQL, or MongoDB) https://per.co.na/Gxwcrj

Note, everyone who fills in a survey is eligible to be part of a drawing for a Percona T-Shirt. 10 people will randomly be selected. You don’t have to leave an email when you fill out the survey, but you do to be entered into the drawing.

Thank you for helping us out!

Jan
11
2022
--

Creating a Standby Cluster With the Percona Distribution for PostgreSQL Operator

Standby Cluster With the Percona Distribution for PostgreSQL Operator

A customer recently asked if our Percona Distribution for PostgreSQL Operator supports the deployment of a standby cluster, which they need as part of their Disaster Recovery (DR) strategy. The answer is yes – as long as you are making use of an object storage system for backups, such as AWS S3 or GCP Cloud Storage buckets, that can be accessed by the standby cluster. In a nutshell, it works like this:

  • The primary cluster is configured with pgBackRest to take backups and store them alongside archived WAL files in a remote repository;
  • The standby cluster is built from one of these backups and it is kept in sync with the primary cluster by consuming the WAL files that are copied from the remote repository.

Note that the primary node in the standby cluster is not a streaming replica from any of the nodes in the primary cluster and that it relies on archived WAL files to replicate events. For this reason, this approach cannot be used as a High Availability (HA) solution. Even though the primary use of a standby cluster in this context is DR, it can be also employed for migrations as well.

So, how can we create a standby cluster using the Percona operator? We will show you next. But first, let’s create a primary cluster for our example.

Creating a Primary PostgreSQL Cluster Using the Percona Operator

You will find a detailed procedure on how to deploy a PostgreSQL cluster using the Percona operator in our online documentation. Here we want to highlight the main steps involved, particularly regarding the configuration of object storage, which is a crucial requirement and should better be done during the initial deployment of the cluster. In the following example, we will deploy our clusters using the Google Kubernetes Engine (GKE) but you can find similar instructions for other environments in the previous link.

Considering you have a Google account configured as well as the gcloud (from the Google Cloud SDK suite) and kubectl command-line tools installed, authenticate yourself with gcloud auth login, and off we go!

Creating a GKE Cluster and Basic Configuration

The following command will create a default cluster named “cluster-1” and composed of three nodes. We are creating it in the us-central1-a zone using e2-standard-4 VMs but you may choose different options. In fact, you may also need to indicate the project name and other main settings if you do not have your gcloud environment pre-configured with them:

gcloud container clusters create cluster-1 --preemptible --machine-type e2-standard-4 --num-nodes=3 --zone us-central1-a

Once the cluster is created, use your IAM identity to control access to this new cluster:

kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value core/account)

Finally, create the pgo namespace:

kubectl create namespace pgo

and set the current context to refer to this new namespace:

kubectl config set-context $(kubectl config current-context) --namespace=pgo

Creating a Cloud Storage Bucket

Remember for this setup we need a Google Cloud Storage bucket configured as well as a Service Account created with the necessary privileges/roles to access it. The respective procedures to obtain these vary according to how your environment is configured so we won’t be covering them here. Please refer to the Google Cloud Storage documentation for the exact steps. The bucket we created for the example in this post was named cluster1-backups-and-wals.

Likewise, please refer to the Creating and managing service account keys documentation to learn how to create a Service Account and download the corresponding key in JSON format – we will need to provide it to the operator so our PostgreSQL clusters can access the storage bucket.

Creating the Kubernetes Secrets File to Access the Storage Bucket

Create a file named my-gcs-account-secret.yaml with the following structure:

apiVersion: v1
kind: Secret
metadata:
  name: cluster1-backrest-repo-config
type: Opaque
data:
  gcs-key: <VALUE>

replacing the <VALUE> placeholder by the output of the following command according to the OS you are using:

Linux:

base64 --wrap=0 your-service-account-key-file.json

macOS:

base64 your-service-account-key-file.json

Installing and Deploying the Operator

The most practical way to install our operator is by cloning the Git repository, and then moving inside its directory:

git clone -b v1.1.0 https://github.com/percona/percona-postgresql-operator
cd percona-postgresql-operator

The following command will deploy the operator:

kubectl apply -f deploy/operator.yaml

We have already prepared the secrets file to access the storage bucket so we can apply it now:

kubectl apply -f my-gcs-account-secret.yaml

Now, all that is left is to customize the storages options in the deploy/cr.yaml file to indicate the use of the GCS bucket as follows:

    storages:
      my-gcs:
        type: gcs
        bucket: cluster1-backups-and-wals

We can now deploy the primary PostgreSQL cluster (cluster1):

kubectl apply -f deploy/cr.yaml

Once the operator has been deployed, you can run the following command to do some housekeeping:

kubectl delete -f deploy/operator.yaml

Creating a Standby PostgreSQL Cluster Using the Percona Operator

After this long preamble, let’s look at what brought you here: how to deploy a standby cluster, which we will refer to as cluster2, that will replicate from the primary cluster.

Copying the Secrets Over

Considering you probably have customized the passwords you use in your primary cluster and that they differ from the default values found in the operator’s git repository, we need to make a copy of the secrets files, adjusted to the standby cluster’s name. The following procedure facilitates this task, saving the secrets files under /tmp/cluster1-cluster2-secrets (you can choose a different target directory):

NOTE: make sure you have the yq tool installed in your system.
mkdir -p /tmp/cluster1-cluster2-secrets/
export primary_cluster_name=cluster1
export standby_cluster_name=cluster2
export secrets="${primary_cluster_name}-users"
kubectl get secret/$secrets -o yaml \
| yq eval 'del(.metadata.creationTimestamp)' - \
| yq eval 'del(.metadata.uid)' - \
| yq eval 'del(.metadata.selfLink)' - \
| yq eval 'del(.metadata.resourceVersion)' - \
| yq eval 'del(.metadata.namespace)' - \
| yq eval 'del(.metadata.annotations."kubectl.kubernetes.io/last-applied-configuration")' - \
| yq eval '.metadata.name = "'"${secrets/$primary_cluster_name/$standby_cluster_name}"'"' - \
| yq eval '.metadata.labels.pg-cluster = "'"${standby_cluster_name}"'"' - \
>/tmp/cluster1-cluster2-secrets/${secrets/$primary_cluster_name/$standby_cluster_name}

Deploying the Standby Cluster: Fast Mode

Since we have already covered the procedure used to create the primary cluster in detail in a previous section, we will be presenting the essential steps to create the standby cluster below and provide additional comments only when necessary.

NOTE: the commands below are issued from inside the percona-postgresql-operator directory hosting the git repository for our operator.

Deploying a New GKE Cluster Named cluster-2

This time using the us-west1-b zone here:

gcloud container clusters create cluster-2 --preemptible --machine-type e2-standard-4 --num-nodes=3 --zone us-west1-b
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value core/account)
kubectl create namespace pgo
kubectl config set-context $(kubectl config current-context) --namespace=pgo
kubectl apply -f deploy/operator.yaml

Apply the Adjusted Kubernetes Secrets:

export standby_cluster_name=cluster2
export secrets="${standby_cluster_name}-users"
kubectl create -f /tmp/cluster1-cluster2-secrets/$secrets

The list above does not include the GCS secret file; the key contents remain the same but the backrest-repo pod name needs to be adjusted. Make a copy of that file:

cp my-gcs-account-secret.yaml my-gcs-account-secret-2.yaml

then edit the copy to indicate “cluster2-” instead of “cluster1-”:

name: cluster2-backrest-repo-config

You can apply it now:

kubectl apply -f my-gcs-account-secret-2.yaml

The cr.yaml file of the Standby Cluster

Let’s make a copy of the cr.yaml file we customized for the primary cluster:

cp deploy/cr.yaml deploy/cr-2.yaml

and edit the copy as follows:

1) Change all references (that are not commented) from cluster1 to cluster2  – including current-primary but excluding the bucket reference, which in our example is prefixed with “cluster1-”; the storage section must remain unchanged. (We know it’s not very practical to replace so many references, we still need to improve this part of the routine).

2) Enable the standby option:

standby: true

3) Provide a repoPath that points to the GCS bucket used by the primary cluster (just below the storages section, which should remain the same as in the primary cluster’s cr.yaml file):

repoPath: “/backrestrepo/cluster1-backrest-shared-repo”

And that’s it! All that is left now is to deploy the standby cluster:

kubectl apply -f deploy/cr-2.yaml

With everything working on the standby cluster, do some housekeeping:

kubectl delete -f deploy/operator.yaml

Verifying it all Works as Expected

Remember that the standby cluster is created from a backup and relies on archived WAL files to be continued in sync with the primary cluster. If you make a change in the primary cluster, such as adding a row to a table, that change won’t reach the standby cluster until the WAL file it has been recorded to is archived and consumed by the standby cluster.

When checking if all is working with the new setup, you can force the rotation of the WAL file (and subsequent archival of the previous one) in the primary node of the primary cluster to accelerate the sync process by issuing:

psql> SELECT pg_switch_wal();

The Percona Kubernetes Operators automate the creation, alteration, or deletion of members in your Percona Distribution for MySQL, MongoDB, or PostgreSQL environment.

Learn More About Percona Kubernetes Operators

Jan
11
2022
--

Online DDL With Group Replication in MySQL 8.0.27

Online DDL With Group Replication in MySQL 8.0.27

Online DDL With Group Replication in MySQL 8.0.27In April 2021, I wrote an article about Online DDL and Group Replication. At that time we were dealing with MySQL 8.0.23 and also opened a bug report which did not have the right answer to the case presented. 

Anyhow, in that article I have shown how an online DDL was de facto locking the whole cluster for a very long time even when using the consistency level set to EVENTUAL.

This article is to give justice to the work done by the MySQL/Oracle engineers to correct that annoying inconvenience. 

Before going ahead, let us remember how an Online DDL was propagated in a group replication cluster, and identify the differences with what happens now, all with the consistency level set to EVENTUAL (see).

In MySQL 8.0.23 we were having:

While in MySQL 8.0.27 we have:

As you can see from the images we have three different phases. Phase one is the same between version 8.0.23 and version 8.0.27. 

Phases two and three, instead, are quite different. In MySQL 8.0.23 after the DDL is applied on the Primary, it is propagated to the other nodes, but a metalock was also acquired and the control was NOT returned. The result was that not only the session executing the DDL was kept on hold, but also all the other sessions performing modifications. 

Only when the operation was over on all secondaries, the DDL was pushed to Binlog and disseminated for Asynchronous replication, lock raised and operation can restart.

Instead, in MySQL 8.0.27,  once the operation is over on the primary the DDL is pushed to binlog, disseminated to the secondaries and control returned. The result is that the write operations on primary have no interruption whatsoever and the DDL is distributed to secondary and Asynchronous replication at the same time. 

This is a fantastic improvement, available only with consistency level EVENTUAL, but still, fantastic.

Let’s See Some Numbers

To test the operation, I have used the same approach used in the previous tests in the article mentioned above.

Connection 1:
    ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
    ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;
    
Connection 2:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmill7 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connection 3:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "insert into windmill8  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmill7 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connections 4-5:
     while [ 1 = 1 ];do echo "$(date +'%T.%3N')";/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_large)"|grep -i -v localhost;sleep 1;done

Modifying a table with ~5 million rows:

node1-DC1 (root@localhost) [windmills_large]>select count(*) from  windmills_test;
+----------+
| count(*) |
+----------+
|  5002909 |
+----------+

The numbers below represent the time second/milliseconds taken by the operation to complete. While I was also catching the state of the ALTER on the other node I am not reporting it here given it is not relevant. 

EVENTUAL (on the primary only)
-------------------
Node 1 same table:
.184
.186 <--- no locking during alter on the same node
.184
<snip>
.184
.217 <--- moment of commit
.186
.186
.186
.185

Node 1 another table :
.189
.198 <--- no locking during alter on the same node
.188
<snip>
.191
.211  <--- moment of commit
.194

As you can see there is just a very small delay at the moment of commit, but other impacts.

Now if we compare this with the recent tests I have done for Percona XtraDB Cluster (PXC) Non-Blocking operation (see A Look Into Percona XtraDB Cluster Non-Blocking Operation for Online Schema Upgrade) with the same number of rows and same kind of table/data:

Action Group Replication PXC (NBO)
Time on hold for insert for altering table ~ 0.217 sec ~ 120 sec
Time on hold for insert for another table ~ 0.211 sec ~ 25 sec

However, yes there is a however, PXC was maintaining consistency between the different nodes during the DDL execution, while MySQL 8.0.27 with Group Replication was postponing consistency on the secondaries, thus Primary and Secondary were not in sync until full DDL finalization on the secondaries.

Conclusions

MySQL 8.0.27 comes with this nice fix that significantly reduces the impact of an online DDL operation on a busy server. But we can still observe a significant misalignment of the data between the nodes when a DDL is executing. 

On the other hand, PXC with NBO is a bit more “expensive” in time, but nodes remain aligned all the time.

In the end, is what is more important for you to choose one or the other solution, consistency vs. operational impact.

Great MySQL to all.

Jan
10
2022
--

MySQL 8.0 Functional Indexes

MySQL 8.0 Functional Indexes

MySQL 8.0 Functional IndexesWorking with hundreds of different customers I often face similar problems around running queries. One very common problem when trying to optimize a database environment is index usage. A query that cannot use an index is usually a long-running one, consuming more memory or triggering more disk iops.

A very common case is when a query uses a filter condition against a column that is involved in some kind of functional expression. An index on that column can not be used.

Starting from MySQL 8.0.13 functional indexes are supported. In this article, I’m going to show what they are and how they work.

The Well-Known Problem

As already mentioned, a very common problem about index usage is when you have a filter condition against one or more columns involved in some kind of functional expression.

Let’s see a simple example.

You have a table called products containing the details of your products, including a create_time TIMESTAMP column. If you would like to calculate the average price of your products on a specific month you could do the following:

mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10;
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+

The query returns the right value, but take a look at the EXPLAIN:

mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 99015
     filtered: 100.00
        Extra: Using where

 

The query triggers a full scan of the table. Let’s create an index on create_time and check again:

mysql> ALTER TABLE products ADD INDEX(create_time);
Query OK, 0 rows affected (0.71 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 99015
     filtered: 100.00
        Extra: Using where

 

A full scan again. The index we have created is not effective. Indeed any time an indexed column is involved in a function the index can not be used.

To optimize the query the workaround is rewriting it differently in order to isolate the indexed column from the function.

Let’s test the following equivalent query:

mysql> SELECT AVG(price) FROM products WHERE create_time BETWEEN '2019-10-01' AND '2019-11-01';
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+

mysql> EXPLAIN SELECT AVG(price) FROM products WHERE create_time BETWEEN '2019-10-01' AND '2019-11-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: range
possible_keys: create_time
          key: create_time
      key_len: 5
          ref: NULL
         rows: 182
     filtered: 100.00
        Extra: Using index condition

 

Cool, now the index is used. Then rewriting the query was the typical suggestion.

Quite a simple solution, but not all the times it was possible to change the application code for many valid reasons. So, what to do then?

 

MySQL 8.0 Functional Indexes

Starting from version 8.0.13, MySQL supports functional indexes. Instead of indexing a simple column, you can create the index on the result of any function applied to a column or multiple columns.

Long story short, now you can do the following:

mysql> ALTER TABLE products ADD INDEX((MONTH(create_time)));
Query OK, 0 rows affected (0.74 sec)
Records: 0  Duplicates: 0  Warnings: 0

Be aware of the double parentheses. The syntax is correct since the expression must be enclosed within parentheses to distinguish it from columns or column prefixes.

Indeed the following returns an error:

mysql> ALTER TABLE products ADD INDEX(MONTH(create_time));
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'create_time))' at line 1

Let’s check now our original query and see what happens to the EXPLAIN

mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10;
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+

 

mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ref
possible_keys: functional_index
          key: functional_index
      key_len: 5
          ref: const
         rows: 182
     filtered: 100.00
        Extra: NULL

 

The query is no longer a full scan and runs faster. The functional_index has been used, with only 182 rows examined. Awesome.

Thanks to the functional index we are no longer forced to rewrite the query.

Which Functional Indexes are Permitted

We have seen an example involving a simple function applied to a column, but you are granted to create more complex indexes.

A functional index may contain any kind of expressions, not only a single function. The following patterns are valid functional indexes:

INDEX( ( col1 + col2 ) )
INDEX( ( FUNC(col1) + col2 – col3 ) )

You can use ASC or DESC as well:

INDEX( ( MONTH(col1) ) DESC )

You can have multiple functional parts, each one included in parentheses:

INDEX( ( col1 + col2 ), ( FUNC(col2) ) )

You can mix functional with nonfunctional parts:

INDEX( (FUNC(col1)), col2, (col2 + col3), col4 )

There are also limitations you should be aware of:

  • A functional key can not contain a single column. The following is not permitted:
    INDEX( (col1), (col2) )
  • The primary key can not include a functional key part
  • The foreign key can not include a functional key part
  • SPATIAL and FULLTEXT indexes can not include functional key parts
  • A functional key part can not refer to a column prefix

At last, remember that the functional index is useful only to optimize the query that uses the exact same expression. An index created with nonfunctional parts can be used instead to solve multiple different queries.

For example, the following conditions can not rely on the functional index we have created:

WHERE YEAR(create_time) = 2019

WHERE create_time > ‘2019-10-01’

WHERE create_time BETWEEN ‘2019-10-01’ AND ‘2019-11-01’

WHERE MONTH(create_time+INTERVAL 1 YEAR)

All these will trigger a full scan.

Functional Index Internal

The functional indexes are implemented as hidden virtual generated columns. For this reason, you can emulate the same behavior even on MySQL 5.7 by explicitly creating the virtual column. We can test this, starting by dropping the indexes we have created so far.

mysql> SHOW CREATE TABLE products\G
*************************** 1. row ***************************
       Table: products
Create Table: CREATE TABLE `products` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `description` longtext,
  `price` decimal(8,2) DEFAULT NULL,
  `create_time` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `create_time` (`create_time`),
  KEY `functional_index` ((month(`create_time`)))
) ENGINE=InnoDB AUTO_INCREMENT=149960 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

 

mysql> ALTER TABLE products DROP INDEX `create_time`, DROP INDEX `functional_index`;
Query OK, 0 rows affected (0.03 sec)

We can try now to create the virtual generated column:

mysql> ALTER TABLE products ADD COLUMN create_month TINYINT GENERATED ALWAYS AS (MONTH(create_time)) VIRTUAL;
Query OK, 0 rows affected (0.04 sec)

Create the index on the virtual column:

mysql> ALTER TABLE products ADD INDEX(create_month);
Query OK, 0 rows affected (0.55 sec)

 

mysql> SHOW CREATE TABLE products\G
*************************** 1. row ***************************
       Table: products
Create Table: CREATE TABLE `products` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `description` longtext,
  `price` decimal(8,2) DEFAULT NULL,
  `create_time` timestamp NULL DEFAULT NULL,
  `create_month` tinyint GENERATED ALWAYS AS (month(`create_time`)) VIRTUAL,
  PRIMARY KEY (`id`),
  KEY `create_month` (`create_month`)
) ENGINE=InnoDB AUTO_INCREMENT=149960 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

 

We can now try our original query. We expect to see the same behavior as the functional index.

mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10;
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+

mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ref
possible_keys: create_month
          key: create_month
      key_len: 2
          ref: const
         rows: 182
     filtered: 100.00
        Extra: NULL

 

Indeed, the behavior is the same. The index on the virtual column can be used and the query is optimized.

The good news is that you can use this workaround to emulate a functional index even on 5.7, getting the same benefits. The advantage of MySQL 8.0 is that it is completely transparent, no need to create the virtual column.

Since the functional index is implemented as a hidden virtual column, there is no additional space needed for the data, only the index space will be added to the table.

By the way, this is the same technique used for creating indexes on JSON documents’ fields.

Conclusion

The functional index support is an interesting improvement you can find in MySQL 8.0. Some of the queries that required rewriting to get optimized don’t require that anymore. Just remember that only the queries having the same filter pattern can rely on the functional index. Then you need to create additional indexes or other functional indexes to improve other search patterns.

The same feature can be implemented on MySQL 5.7 with the explicit creation of a virtual generated column and the index.

For more detailed information, read the following page:

https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-functional-key-parts

Jan
04
2022
--

Percona Server for MySQL Encryption Options and Choices

Percona Server for MySQL Encryption

Percona Server for MySQL EncryptionSecurity will always be a main focal point of a company’s data. A common question I get from clients is, “how do I enable encryption?” Like every good consulting answer, it depends on what you are trying to encrypt. This post is a high-level summary of the different options available for encryption in Percona Server for MySQL.

Different certifications require different levels of encryption. For example, PCI requires both encryptions of data at rest and in transit. Here are the main facets of encryption for MySQL:

  • Data at Rest
    • Full disk encryption (at the OS level)
    • Transparent Data Encryption – TDE
    • Column/field-level encryption
  • Data in Transit
    • TLS Connections

Data at Rest

Data at rest is frequently the most asked about part of encryption. Data at rest encryption has multiple components, but at the core is simply ensuring that the data is encrypted at some level when stored. Here are the primary ways we can look at the encryption of data at rest.

Full Disk Encryption (FDE)

This is the easiest and most portable method of encrypting data at rest. When using full disk encryption, the main goal is to protect the hard drives in the event they are compromised. If a disk is removed from the server or the server is removed from a rack, the disk isn’t readable without the encryption key.

This can be managed in different ways, but the infrastructure team generally handles it. Frequently, enterprises already have disk encryption as part of the infrastructure stack. This makes FDE a relatively easy option for data at rest encryption. It also has the advantage of being portable. Regardless of which database technology you use, the encryption is managed at the server level.

The main disadvantage of FDE is that when the server is running, and the disk is mounted, all data is readable. It offers no protection against an attack on a running server once mounted.

Transparent Data Encryption (TDE)

Moving up the chain, the next option for data at rest encryption is Transparent Data Encryption (TDE). In contrast to FDE, this method encrypts the actual InnoDB data and log files. The main difference with database TDE is that the encryption is managed through the database, not at the server level. With this approach, the data and log files are encrypted on disk by the database. As data is read by MySQL/queries, the encrypted pages are read from disk and decrypted to be loaded into InnoDB’s buffer pool for execution.

For this method, the encryption keys are managed either through local files or a remote KMS (such as Hashicorp Vault) with the keyring_plugin. While this approach helps prevent any OS user from simply copying data files, the decrypted data does reside in memory which could be susceptible to a clever hacker. We must rely on OS-level memory protections for further assurance. It also adds a level of complexity for key management and backups that is now shifted to the DBA team.

Column Level Encryption

While the prior methods of at-rest encryption can help to meet various compliance requirements, both are limited when it comes to a running system. In either case, if a running system is compromised, the data stored is fully readable. Column level encryption works to protect the data in a running system without a key. Without a key, the data in the encrypted column is unreadable.

While this method protects selected data in a running system, it often requires application-level changes. Inserts are done with a specific encryption function (AES_ENCRYPT in MySQL, for example). To read the data, AES_DECRYPT with the specified key is required. The main risk with this approach is sending the plaintext values as part of the query. This can be sniffed if not using TLS or potentially leaked through log files. The better approach is to encrypt the data in the application BEFORE sending it to MySQL to ensure no plaintext is ever passed between systems.

In some cases, you can use a shared key for the entire application. Other approaches would be to use an envelope method and store a unique key alongside each encrypted value (protected by a separate master key).

Either way, it is important to understand one of the primary downsides to this approach – indexes and sort order can and will be impacted. For example, if you are encrypting the SSN number, you won’t be able to sort by SSN within MySQL. You would be able to look up a row using the SSN number but would need to pass the encrypted value.

Data in Transit

Now that we’ve discussed the different types of data-at-rest encryption, it is important to encrypt traffic to and from the database. Connecting to the server via TLS ensures that any sensitive sent to or from the server is encrypted. This can prevent data from leaking over the wire or via man-in-the-middle attacks.

This is a straightforward way to secure communication, and when combined with some at-rest encryption, serves to check a few more boxes towards various compliances.

Summary

Overall, there are several aspects of encryption in MySQL. This makes it possible to meet many common compliance requirements for different types of regulations. Security is a critical piece of the database tier, and these discussions are needed across teams in an organization. Ensuring that security, infrastructure, and the database team are on the same page is essential, especially during the design phase. Let our Professional Services team help you implement the approach that is best suited for your requirements – we are here to help!

Percona Distribution for MySQL is the most complete, stable, scalable, and secure, open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Download Percona Distribution for MySQL Today

Jan
03
2022
--

Updated Percona Distribution for MongoDB, Bug Fixes in Percona Server for MySQL: Release Roundup January 3, 2022

Percona Releases January 2022

It’s the first release roundup of the New Year!

Percona Releases January 2022Percona is a leading provider of unbiased open source database solutions that allow organizations to easily, securely, and affordably maintain business agility, minimize risks, and stay competitive.

Our Release Roundups showcase the latest Percona software updates, tools, and features to help you manage and deploy our software. It offers highlights and critical information, as well as links to the full release notes and direct links to the software or service itself to download.

Today’s post includes those releases and updates that have come out since December 20, 2021. Take a look!

 

Percona Distribution for MongoDB 5.0.5

December 29, 2021, saw the release of Percona Distribution for MongoDB 5.0.5. It is a freely available MongoDB database alternative, giving you a single solution that combines enterprise components from the open source community, designed and tested to work together. The aim of Percona Distribution for MongoDB is to enable you to run and operate your MongoDB efficiently with the data being consistently backed up. This release includes bug fixes provided by MongoDB and included in Percona Server for MongoDB, as well as improvements to Percona Backup for MongoDB such as added support for automated access to S3 buckets using an EC2 instance profile.

Download Percona Distribution for MongoDB 5.0.5

Percona Distribution for MongoDB Operator 1.11.0

On December 21, 2021, we released Percona Distribution for MongoDB Operator 1.11.0, automating the creation, alteration, or deletion of members in your Percona Distribution for MongoDB environment. It can be used to instantiate a new environment or to scale an existing environment. The Operator contains all necessary Kubernetes settings to provide a proper and consistent Percona Server for MongoDB instance.

Release highlights include the ability to now configure backups to use Microsoft Azure Blob storage, making the Operator fully compatible with Azure Cloud, and custom sidecar containers allowing users to customize Percona Distribution for MongoDB and other Operator components without changing the container images.

Download Percona Distribution for MongoDB Operator 1.11.0

Percona Server for MongoDB 5.0.5-4

On Wednesday, December 29, 2021, Percona Server for MongoDB 5.0.5-4 was released. It is an enhanced, source-available, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 5.0.5 Community Edition, supporting MongoDB 5.0.5 protocols and drivers. One of the main new features for MongoDB v5.0 is live resharding. Be aware that there have been multiple bugs related to this new feature through the releases to date (5.0.0 through 5.0.5). Please test thoroughly in your development and staging environments and use this feature on live production systems with caution.

Download Percona Server for MongoDB 5.0.5-4

Percona Server for MySQL 5.7.36-39

On December 22, 2021, Percona Server for MySQL 5.7.36-39 was released.  It includes all the features and bug fixes available in the MySQL 5.7.36 Community Edition, in addition to enterprise-grade features developed by Percona. Bug fixes for MySQL 5.7.36, provided by Oracle, are included in Percona Server for MySQL, including a fix for the possibility for a deadlock or failure when an undo log truncate operation is initiated after an upgrade from MySQL 5.6 to MySQL 5.7, a fix for when a parent table initiates a cascading SET NULL operation on the child table, and on a view, the query digest for each SELECT statement is now based on the SELECT statement and not the view definition, which was the case for earlier versions.

Download Percona Server for MySQL 5.7.36-39

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MongoDB, PostgreSQL, MariaDB, and other open source databases in on-premises and cloud environments.

Dec
30
2021
--

Percona Monitoring and Management 2 Test Drive Using VirtualBox and SSH Tunnels

PMM using VirtualBox and SSH

PMM using VirtualBox and SSHPercona Monitoring and Management 2 (PMM2) is the database monitoring suite assembled and developed by Percona. It is based on standard open source components and custom-made software integrations. PMM2 helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

This blog post will describe a method to test PMM2 using your laptop’s VirtualBox, ssh tunnels, and without installing any agents on the database servers. This is a testing and evaluation environment, not intended for production. If you want to perform a full-fledged test of PMM2, we recommend an environment as similar as possible to your final production setup: enterprise virtualization, docker containers, or AWS. We assume that your laptop doesn’t have direct connectivity to the databases, this is why we use ssh tunnels.

PMM2 architecture consists of 2+1 components:PMM2 High Level Architecture
Two components run on your infrastructure: PMM Agents and PMM Server. The agents gather the metrics at the database and operating system levels. PMM Server takes care of processing, storing, grouping, and displaying these metrics. It can also perform additional operations like capturing serverless databases metrics, backups, and sending alerts (the last two features are in technical preview as of this writing). The other component to complete the formula is the Percona Platform, which adds more features to PMM, from advisors to DBaaS. Disclaimer: Percona Platform is in preview release with limited functionality – suitable for early adopters, development, and testing. Besides the extended features added to PMM, the Percona Platform brings together distributions of MySQL, PostgreSQL, and MongoDB including a range of open-source tools for data backup, availability, and management. You can learn more about the Platform here.

To make setup easier, PMM2 Server can be run either as a docker container or importing an OVA image, executed using VMWare VSphere, VirtualBox, or any other hypervisor. If you run your infrastructure in AWS, you can deploy PMM from the AWS Marketplace.

To run the agents, you need a Linux box. We recommend running the agents and the database on the same node. PMM can also gather the metrics using a direct connection to a server-less database or running an operating system that does not support the agent.

Often, installing the agents is a stopper for some DBAs who would like to test PMM2. Also, while containers are frequent in large organizations, we find virtualization and containers relegated to development and quality assurance environments. These environments usually don’t have direct access to production databases.

TCP Forwarding Across SSH Connections

AllowTcpForwarding is the ssh daemon configuration option that allows forwarding TCP ports across the ssh connection. At first sight, this may seem a security risk, but as the ssh documentation states: “disabling TCP forwarding does not improve security unless users are also denied shell access, as they can always install their forwarders.”

If your system administrators do not allow TCP forwarding, other options available to accomplish the same results are socat or netcat. But we will not cover them here.

If your laptop has direct access to the databases, you can skip all the ssh tunnels and use the direct access method described later in this post.

Install PMM 2 Ova

Download the Open Virtualization Format compatible image from https://www.percona.com/downloads/pmm2/ or use the command line:

$ wget https://www.percona.com/downloads/pmm2/2.25.0/ova/pmm-server-2.25.0.ova

You can import the OVA file using the UI, with the import option from the file menu, or using the command line:

$ VBoxManage import pmm-server-2.25.0.ova --vsys 0 --vmname "PMM Testing"
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Interpreting /Users/pep/Downloads/pmm-server-2.25.0.ova...
OK.
Disks:
vmdisk1 42949672960 -1 http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized PMM2-Server-2021-12-13-1012-disk001.vmdk -1 -1
vmdisk2 429496729600 -1 http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized PMM2-Server-2021-12-13-1012-disk002.vmdk -1 -1

Virtual system 0:
0: Suggested OS type: "RedHat_64"
(change with "--vsys 0 --ostype "; use "list ostypes" to list all possible values)
1: VM name specified with --vmname: "PMM Testing"
2: Suggested VM group "/"
(change with "--vsys 0 --group ")
3: Suggested VM settings file name "/Users/Pep/VirtualBox VMs/PMM2-Server-2021-12-13-1012/PMM2-Server-2021-12-13-1012.vbox"
(change with "--vsys 0 --settingsfile ")
4: Suggested VM base folder "/Users/Pep/VirtualBox VMs"
(change with "--vsys 0 --basefolder ")
5: Product (ignored): Percona Monitoring and Management
6: Vendor (ignored): Percona
7: Version (ignored): 2021-12-13
8: ProductUrl (ignored): https://www.percona.com/software/database-tools/percona-monitoring-and-management
9: VendorUrl (ignored): https://www.percona.com
10: Description "Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB performance"
(change with "--vsys 0 --description ")
11: Number of CPUs: 1
(change with "--vsys 0 --cpus ")
12: Guest memory: 4096 MB
(change with "--vsys 0 --memory ")
13: Network adapter: orig NAT, config 3, extra slot=0;type=NAT
14: SCSI controller, type LsiLogic
(change with "--vsys 0 --unit 14 --scsitype {BusLogic|LsiLogic}";
disable with "--vsys 0 --unit 14 --ignore")
15: Hard disk image: source image=PMM2-Server-2021-12-13-1012-disk001.vmdk, target path=PMM2-Server-2021-12-13-1012-disk001.vmdk, controller=14;channel=0
(change target path with "--vsys 0 --unit 15 --disk path";
disable with "--vsys 0 --unit 15 --ignore")
16: Hard disk image: source image=PMM2-Server-2021-12-13-1012-disk002.vmdk, target path=PMM2-Server-2021-12-13-1012-disk002.vmdk, controller=14;channel=1
(change target path with "--vsys 0 --unit 16 --disk path";
disable with "--vsys 0 --unit 16 --ignore")
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Successfully imported the appliance.

Once the machine is imported, we will connect it to a host-only network. This network restricts network traffic only between the host and the virtual machines. But first, let’s find a suitable network:

$ VBoxManage list hostonlyifs
Name: vboxnet0
GUID: 786f6276-656e-4074-8000-0a0027000000
DHCP: Disabled
IPAddress: 192.168.56.1
NetworkMask: 255.255.255.0
IPV6Address:
IPV6NetworkMaskPrefixLength: 0
HardwareAddress: 0a:00:27:00:00:00
MediumType: Ethernet
Wireless: No
Status: Up
VBoxNetworkName: HostInterfaceNetworking-vboxnet0

Select the first one that has Status up, write down the name and IP address. Then make sure that there is a DHCP server assigned to that interface:

$ VBoxManage list dhcpservers
NetworkName: HostInterfaceNetworking-vboxnet0
Dhcpd IP: 192.168.56.100
LowerIPAddress: 192.168.56.101
UpperIPAddress: 192.168.56.254
NetworkMask: 255.255.255.0
Enabled: Yes
Global Configuration:
minLeaseTime: default
defaultLeaseTime: default
maxLeaseTime: default
Forced options: None
Suppressed opts.: None
1/legacy: 255.255.255.0
Groups: None
Individual Configs: None

Now we will assign two network interfaces to our PMM virtual machine. One is allocated to the internal network, and the other uses NAT to connect to the internet and, for example, check for upgrades.

$ VBoxManage modifyvm "PMM Testing" --nic1 hostonly --hostonlyadapter1 vboxnet0
$ VBoxManage modifyvm "PMM Testing" --nic2 natnetwork

Once networking is configured, we may start the virtual machine.

$ VBoxManage startvm "PMM Testing"

The next step is to retrieve the IP address assigned to our PMM box. First, we will obtain the MAC address of the network card we recently added:

$ VBoxManage showvminfo "PMM Testing" | grep -i vboxnet0
NIC 1: MAC: 08002772600D, Attachment: Host-only Interface 'vboxnet0', Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none

Using the retrieved MAC address we can look for the DHCP leases:

$ VBoxManage dhcpserver findlease --interface=vboxnet0 --mac-address=08002772600D
IP Address: 192.168.56.112
MAC Address: 08:00:27:72:60:0d
State: acked
Issued: 2021-12-21T22:15:54Z (1640124954)
Expire: 2021-12-21T22:25:54Z (1640125554)
TTL: 600 sec, currently 444 sec left

This is the IP address we will use to access the PMM server. Open a browser to connect to https://192.168.56.112 with the default credentials: admin/admin.

PMM2 Login window
The following step configures the tunnels to connect to the databases we monitor.

Set Up the SSH Tunnels

This is the topology of our network:

Private network topology
We will open two ssh connections per server we want to access from PMM. Open a terminal session and execute the following command, replacing it with the username you normally use to connect to your jump host:

$ ssh -L 192.168.56.1:3306:10.0.0.1:3306 @192.168.55.100

This creates a tunnel that connects the MySQL Server port 3306 with our local internal address in the same port. If you want to connect to more than one MySQL instance, you must use different ports. To open the tunnel for the MongoDB server, use the following command:

$ ssh -L 192.168.56.1:27017:10.0.0.2:27017 @192.168.55.100

Test the tunnel connectivity to the MySQL host using netcat:

$ nc -z 192.168.56.1 3306
Connection to 192.168.56.1 port 3306 [tcp/mysql] succeeded!

And also test the connectivity to the MongoDB host:

$ nc -z 192.168.56.1 27017
Connection to 192.168.56.1 port 27017 [tcp/*] succeeded!

This is the topology of our network including the ssh tunnels.
SSH tunnels

Configure Accounts

Follow the PMM documentation and create a MySQL account (or use an already existing account) with the required privileges:

CREATE USER 'pmm'@'10.0.0.100' IDENTIFIED BY '' WITH MAX_USER_CONNECTIONS 10;
GRANT SELECT, PROCESS, REPLICATION CLIENT, RELOAD, BACKUP_ADMIN ON *.* TO 'pmm'@'10.0.0.100';

Note that we need to use the internal IP address for the jump host. If you don’t know the IP address, use the wildcard ‘%’.

Add also the credentials for MongoDB, run this in a Mongo session:

db.getSiblingDB("admin").createRole({
role: "explainRole",
privileges: [{
resource: {
db: "",
collection: ""
},
actions: [
"listIndexes",
"listCollections",
"dbStats",
"dbHash",
"collStats",
"find"
]
}],
roles:[]
})

db.getSiblingDB("admin").createUser({
user: "pmm_mongodb",
pwd: "",
roles: [
{ role: "explainRole", db: "admin" },
{ role: "clusterMonitor", db: "admin" },
{ role: "read", db: "local" }
]
})

Add the Services to PMM

We can’t install the agents because we don’t have access to our PMM testing environment from the database servers. Instead, we will configure both services as remote instances. Go to the “Configuration” menu , select “PMM Inventory” , then “Add instance” . Then choose MySQL Add a remote instance.
Complete the following fields:
Hostname: 192.168.56.1 (This is the internal Host-Only VirtualBox address)
Service name: MySQL8
Port: 3306
Username: pmm
Password: <password>

And press the button. It will check the connectivity and, if everything is correct, the MySQL service will be added to the inventory. If there is an error, double-check that the ssh connection is still open and you entered the correct credentials. Make sure that the host that you specified to create the MySQL user is correct.

We will use a similar process for MongoDB:

These are the fields you have to complete with the correct information:
Hostname: 192.168.56.1 (Again, the internal Host-Only VirtualBox address)
Service name: MongoDB
Port: 27017
Username: pmm_mongodb
Password: <password>

And press the button. It will check the connectivity and, if everything is correct, the MongoDB service will be added to the inventory. If there is an error, double-check again that the ssh connection is open and you entered the correct credentials. You can use also the MongoDB client application to check access.

Once you added both services, you just need to wait for a few minutes to give time to collect data and start testing PMM2.

PMM2 Query Analyzer

Dec
27
2021
--

Backup Performance Comparison: mysqldump vs MySQL Shell Utilities vs mydumper vs mysqlpump vs XtraBackup

MySQL Backup Performance Comparison

MySQL Backup Performance ComparisonIn this blog post, we will compare the performance of performing a backup from a MySQL database using mysqldump, MySQL Shell feature called Instance Dump, mysqlpump, mydumper, and Percona XtraBackup. All these available options are open source and free to use for the entire community.

To start, let’s see the results of the test.

Benchmark Results

The benchmark was run on an m5dn.8xlarge instance, with 128GB RAM, 32 vCPU, and 2xNVMe disks of 600GB (one for backup and the other one for MySQL data). The MySQL version was 8.0.26 and configured with 89Gb of buffer pool, 20Gb of redo log, and a sample database of 177 GB (more details below).

We can observe the results in the chart below:

MySQL Backup Results

And if we analyze the chart only for the multi-threaded options:

multi-threaded options

As we can see, for each software, I’ve run each command three times in order to experiment using 16, 32, and 64 threads. The exception for this is mysqldump, which does not have a parallel option and only runs in a single-threaded mode.

We can observe interesting outcomes:

  1. When using zstd compression, mydumper really shines in terms of performance. This option was added not long ago (MyDumper 0.11.3).
  2. When mydumper is using gzip, MySQL Shell is the fastest backup option.
  3. In 3rd we have Percona XtraBackup.
  4. mysqlpump is the 4th fastest followed closer by mydumper when using gzip.
  5. mysqldump is the classic old-school style to perform dumps and is the slowest of the four tools.
  6. In a server with more CPUs, the potential parallelism increases, giving even more advantage to the tools that can benefit from multiple threads.

Hardware and Software Specs

These are the specs of the benchmark:

  • 32 CPUs
  • 128GB Memory
  • 2x NVMe disks 600 GB
  • Centos 7.9
  • MySQL 8.0.26
  • MySQL shell 8.0.26
  • mydumper 0.11.5 – gzip
  • mydumper 0.11.5 – zstd
  • Xtrabackup 8.0.26

The my.cnf configuration:

[mysqld]
innodb_buffer_pool_size = 89G
innodb_log_file_size = 10G

Performance Test

For the test, I used sysbench to populate MySQL. To load the data, I choose the tpcc method:

$ ./tpcc.lua  --mysql-user=sysbench --mysql-password='sysbench' --mysql-db=percona --time=300 --threads=64 --report-interval=1 --tables=10 --scale=100 --db-driver=mysql prepare

Before starting the comparison, I ran mysqldump once and discarded the results to warm up the cache, otherwise our test would be biased because the first backup would have to fetch data from the disk and not the cache.

With everything set, I started the mysqldump with the following options:

$ time mysqldump --all-databases --max-allowed-packet=4294967295 --single-transaction -R --master-data=2 --flush-logs | gzip > /backup/dump.dmp.gz

For the Shell utility:

$ mysqlsh
MySQL JS > shell.connect('root@localhost:3306');
MySQL localhost:3306 ssl test JS > util.dumpInstance("/backup", {ocimds: true, compatibility: ["strip_restricted_grants","ignore_missing_pks"],threads: 16})

For mydumper:

$ time mydumper  --threads=16  --trx-consistency-only --events --routines --triggers --compress --outputdir /backup/ --logfile /backup/log.out --verbose=2

PS: To use zstd, there are no changes in the command line, but you need to download the zstd binaries.

For mysqlpump:

$ time mysqlpump --default-parallelism=16 --all-databases > backup.out

For xtrabackup:

$ time xtrabackup --backup --parallel=16 --compress --compress-threads=16 --datadir=/mysql_data/ --target-dir=/backup/

Analyzing the Results

And what do the results tell us?

Parallel methods have similar performance throughput. The mydumper tool cut the execution time by 50% when using zstd instead of gzip, so the compression method makes a big difference when using mydumper.

For the util.dumpInstance utility, one advantage is that the tool stores data in both binary and text format and uses zstd compression by default. Like mydumper, it uses multiple files to store the data and has a good compression ratio. 

XtraBackup got third place with a few seconds of difference from MySQL shell. The main advantage of XtraBackup is its flexibility, providing PITR and encryption for example. 

Next, mysqlpump is more efficient than mydumper with gzip, but only by a small margin. Both are logical backup methods and works in the same way. I tested mysqlpump with zstd compression, but the results were the same, hence the reason I didn’t add it to the chart. One possibility is because mysqlpump streams the data to a single file.

Lastly, for mysqldump, we can say that it has the most predictable behavior and has similar execution times with different runs. The lack of parallelism and compression is a disadvantage for mysqldump; however, since it was present in the earliest MySQL versions, based on Percona cases, it is still used as a logical backup method.

Please leave in the comments below what you thought about this blog post, if I missed something, or if it helped you. I will be glad to discuss it!

Useful Resources

Finally, you can reach us through the social networks, our forum, or access our material using the links presented below:

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com