Mar 08, 2023

Dealing With “Too Many Connections” Error in MySQL 8

Over my years as a DBA, I have had to deal with all kinds of database problems. One of the most common is the well-known error “ERROR 1040 (08004): Too many connections”. A lot has been written about this error, yet users keep falling into this trap, whether because of a poorly configured database, a change in the application components, or simply a sudden increase in application connections. At some point, we all face this issue in our careers, and not just once. The main objective of this blog post is to highlight the administrative connections introduced in MySQL 8, as they can save us from restarting the instance when this happens.

Default behavior

We know that the number of connections allowed in the database is defined by the parameter “max_connections.” The default value for this parameter is 151, and it can be changed dynamically, meaning without a database restart. If the connections in the database are maxed out, we will hit the dreadful message “ERROR 1040 (08004): Too many connections”. It is important to remember that, out of the box, MySQL allows one extra connection; this connection is reserved for users with the “SUPER” privilege (now deprecated) or the CONNECTION_ADMIN privilege.

I’ll show an example of this behavior. For this example, I have an instance with “max_connections=20” and three users: user “monitor1” has only the PROCESS privilege, user “admin1” has the PROCESS and CONNECTION_ADMIN privileges, and user “admin2” has the (deprecated) SUPER privilege. We will see how MySQL treats these connections when an instance is maxed out on user connections:

-- execute all 20 concurrent connections
sysbench oltp_read_write --table-size=1000000 --db-driver=mysql --mysql-host=localhost --mysql-db=sbtest --mysql-user=root --mysql-password="***" --num-threads=20 --time=0 --report-interval=1 run
-- test with user monitor1 
[root@rocky-test1 ~]# mysql -u monitor1 -p
Enter password:
ERROR 1040 (08004): Too many connections

-- test with user admin1
[root@rocky-test1 ~]# mysql -u admin1 -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 144
Server version: 8.0.29-21 Percona Server (GPL), Release 21, Revision c59f87d2854

Copyright (c) 2009-2022 Percona LLC and/or its affiliates
Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show grants;
+-----------------------------------------------+
| Grants for admin1@%                           |
+-----------------------------------------------+
| GRANT PROCESS ON *.* TO `admin1`@`%`          |
| GRANT CONNECTION_ADMIN ON *.* TO `admin1`@`%` |
+-----------------------------------------------+
2 rows in set (0.00 sec)

mysql> select count(1) from information_schema.processlist;
+----------+
| count(1) |
+----------+
|       22 |
+----------+
1 row in set (0.00 sec)


-- test with user admin2 
[root@rocky-test1 ~]# mysql -u admin2 -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 145
Server version: 8.0.29-21 Percona Server (GPL), Release 21, Revision c59f87d2854

Copyright (c) 2009-2022 Percona LLC and/or its affiliates
Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show grants;
+------------------------------------+
| Grants for admin2@%                |
+------------------------------------+
| GRANT SUPER ON *.* TO `admin2`@`%` |
+------------------------------------+
1 row in set (0.00 sec)

mysql> select count(1) from information_schema.processlist;
+----------+
| count(1) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

As you can see, a single connection from a user with the “CONNECTION_ADMIN” or “SUPER” privilege is allowed; however, when user “monitor1” tried to connect, it was rejected because it has neither of those privileges. Once we gain access to the database, we can easily increase the connections by changing the variable “max_connections” online and then investigate the origin of the problem. It is important to remember that only one of these extra connections is allowed, so please don’t grant these privileges to every user, or you could still be locked out of your database.

-- trying a second connection with user admin1

[root@rocky-test1 ~]# mysql -u admin1 -p
Enter password:
ERROR 1040 (HY000): Too many connections
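Once inside through that reserved connection, the immediate triage might look like the following sketch (the new limit of 500 is only an illustration; size it for your workload):

```sql
-- raise the limit online (dynamic, no restart needed)
SET GLOBAL max_connections = 500;

-- then look for where the connections are coming from
SELECT user, host, COUNT(*) AS connections
FROM information_schema.processlist
GROUP BY user, host
ORDER BY connections DESC;
```

Note that SET GLOBAL only changes the running instance; persist the new value in the configuration file (or with SET PERSIST) once the root cause is understood.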

Usually, when this problem occurs and we cannot gain access to MySQL, the immediate solution is to restart the database and deal with all the consequences that causes, but hey… that is better than rejecting connections for several minutes during the business’s normal operating hours. There is another way to gain access to the database, which is by using GDB, although it is not always possible. Too many connections? No problem! is an article we wrote about this technique in the past; it is a bit old but still valid.

Side note for Percona Server for MySQL and MariaDB

Percona Server for MySQL, in versions before 8.0.14, had another way to access the database instance, similar to the feature introduced in version 8.0.14: enabling the variables “extra_port” and “extra_max_connections.” The usage of these variables is out of the scope of this blog post, but their objective was to allow connections to the database even when the maximum number of connections had been reached. Remember that those variables were removed in version 8.0.14; if found in the config file, the instance will not start, and an error will be shown. Like Percona Server for MySQL, MariaDB has a similar implementation with the same variables. Documentation for MariaDB can be found here.

New feature

Starting with MySQL 8.0.14, a new “Administrative Connections” (or “Administrative Network Interface”) feature was introduced. This feature allows connections to the database through a dedicated administrative port, and there is no limit on the number of administrative connections. The difference between this feature and the single extra connection shown in the previous example is that it uses a different port and is not limited to a single connection. This allows us to access the database when the user connections are maxed out and, from there, increase the connection limit or kill some of the application connections.

The easiest way to enable “Administrative Connections” is to define the “admin_address” variable, the IP address the administrative interface will listen on. For example, if you only want to allow local connections, set this variable to “127.0.0.1”; if you want to connect over the network, set it to the server’s IP address. This variable is not dynamic, which means changing it requires a database restart. By default, it is empty, meaning the administrative interface is disabled. Another related variable is “admin_port”, which defines the port MySQL will listen on for administrative connections; its default value is 33062. Once you define both variables and restart the database, you will see a message in the error log indicating the admin interface is ready for connections:

2023-02-28T14:42:44.383663Z 0 [System] [MY-013292] [Server] Admin interface ready for connections, address: '127.0.0.1'  port: 33062

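The two variables can be set in the MySQL configuration file; a minimal sketch (the file location and address are assumptions, adjust for your environment):

```ini
# my.cnf -- enable the administrative network interface
[mysqld]
admin_address = 127.0.0.1   # empty by default = admin interface disabled
admin_port    = 33062       # default admin port
```

Remember that a server restart is required for these settings to take effect.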
Now that the admin interface is configured, you need to define the users that can access this administrative connection. These users require the “SERVICE_CONNECTION_ADMIN” privilege; otherwise, they won’t be able to connect to it. Following our initial example, I have granted “SERVICE_CONNECTION_ADMIN” to user “admin1” but not to user “admin2”:

mysql> show grants for admin1;
+------------------------------------------------------------------------+
| Grants for admin1@%                                                    |
+------------------------------------------------------------------------+
| GRANT PROCESS ON *.* TO `admin1`@`%`                                   |
| GRANT CONNECTION_ADMIN,SERVICE_CONNECTION_ADMIN ON *.* TO `admin1`@`%` |
+------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> show grants for admin2;
+------------------------------------+
| Grants for admin2@%                |
+------------------------------------+
| GRANT SUPER ON *.* TO `admin2`@`%` |
+------------------------------------+
1 row in set (0.00 sec)

Testing connections to the admin interface, we see that only user “admin1” is allowed, while user “admin2” is rejected for lacking the “SERVICE_CONNECTION_ADMIN” privilege. We can also confirm user “admin1” is connected to port 33062, the port used for the admin interface.

-- testing user admin1

[root@rocky-test1 ~]# mysql -h 127.0.0.1 -P 33062 -u admin1 -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 23
Server version: 8.0.29-21 Percona Server (GPL), Release 21, Revision c59f87d2854

Copyright (c) 2009-2022 Percona LLC and/or its affiliates
Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> \s
--------------
mysql  Ver 8.0.29-21 for Linux on x86_64 (Percona Server (GPL), Release 21, Revision c59f87d2854)

Connection id:		23
Current database:
Current user:		admin1@localhost
SSL:			Cipher in use is TLS_AES_256_GCM_SHA384
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		8.0.29-21 Percona Server (GPL), Release 21, Revision c59f87d2854
Protocol version:	10
Connection:		127.0.0.1 via TCP/IP
Server characterset:	utf8mb4
Db     characterset:	utf8mb4
Client characterset:	utf8mb4
Conn.  characterset:	utf8mb4
TCP port:		33062
Binary data as:		Hexadecimal
Uptime:			50 min 27 sec

Threads: 3  Questions: 188  Slow queries: 0  Opens: 335  Flush tables: 3  Open tables: 269  Queries per second avg: 0.062
--------------

-- testing user admin2

[root@rocky-test1 ~]# mysql -h 127.0.0.1 -P 33062 -u admin2 -p
Enter password:
ERROR 1227 (42000): Access denied; you need (at least one of) the SERVICE_CONNECTION_ADMIN privilege(s) for this operation

Conclusion

If you are using MySQL 8.0.14 or higher, you should enable the admin interface. As we have seen, enabling this feature is easy, and it gives DBAs a way into the database in the event of “ERROR 1040 (08004): Too many connections”. This feature does not affect normal database performance and brings great power to DBAs. Please grant the “SERVICE_CONNECTION_ADMIN” privilege only to administrative users, not application users; the idea is not to abuse this feature. If you are still using an older version of Percona Server for MySQL, remember you can configure the variables “extra_port” and “extra_max_connections” to access your database if you face a max connections issue.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!

Feb 02, 2023

An Argument for Logical Failover Slots

Logical Failover Slots PostgreSQL

These days, the typical PostgreSQL cluster consists not only of a multi-node streaming replication cluster, ordinarily asynchronous, but can sometimes also include a logical replication component.

Recall that logical replication works on a publish/subscribe (PUB/SUB) model, where individual tables are published and then subscribed to by remotely connected databases. DML operations, such as INSERT, UPDATE, DELETE, and TRUNCATE, are then replicated from one or more publishers to one or more subscribers.

For the most part, it’s a pretty straightforward operation. The only real limitation is that one can only publish tables from a read-write server, such as the PRIMARY.
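As a minimal sketch of that PUB/SUB flow (the table, publication, subscription, and connection string below are hypothetical):

```sql
-- on the publisher (the read-write PRIMARY):
CREATE PUBLICATION my_pub FOR TABLE accounts;

-- on the subscriber:
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=primary1 dbname=appdb user=repuser'
    PUBLICATION my_pub;
```

Creating the subscription also creates a logical replication slot on the publisher, which is exactly the piece that complicates failover, as we’ll see next.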

 

There’s a problem, however, when it comes to executing failovers. While a STANDBY/REPLICA host can quickly take over via a simple SELECT pg_promote() SQL statement, the same cannot be said of logical replication. Failing over is, in fact, not a single action but a collection of discrete steps that must be performed in an exact and precise manner.

Consider the following steps typifying a failover:

  1. Promote the REPLICA, thus becoming the new PRIMARY.
  2. When present, redirect other REPLICAs pointing to the new PRIMARY.
  3. Failover the logical replication slot:
    1. Block all logical replication from the failed PRIMARY.
    2. Create a logical replication slot on the newly promoted PRIMARY.
    3. Alter the subscription and point to the newly promoted PRIMARY.
    4. Resume/restore logical replication between publisher and subscriber.

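In SQL terms, the steps above might be sketched roughly as follows (object and host names are hypothetical, and the exact sequence depends on your tooling):

```sql
-- step 1, on the REPLICA being promoted:
SELECT pg_promote();

-- step 3.2, on the newly promoted PRIMARY:
SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');

-- steps 3.3 and 3.4, on the subscriber:
ALTER SUBSCRIPTION my_sub DISABLE;
ALTER SUBSCRIPTION my_sub
    CONNECTION 'host=new-primary dbname=appdb user=repuser';
ALTER SUBSCRIPTION my_sub SET (slot_name = 'my_slot');
ALTER SUBSCRIPTION my_sub ENABLE;
```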
The challenge, of course, is reconstituting the logical replication process as quickly as possible, ideally without any data loss.

Two common solutions come to mind:

  1. Block all incoming connections before promoting the STANDBY while simultaneously installing the logical replication slot.
  2. Promote the STANDBY and then flush and recopy the entire contents of the subscribed table(s).

In both cases, these solutions make a fast recovery problematic, not to mention the additional effort required by the SRE/DBA/developer preparing the requisite instructions.

In an ideal world, the perfect implementation of a logical slot failover would have the slot already in place on the STANDBY, guaranteeing data consistency without fear of data loss no matter how fast or slow the actual promotion, or the update of the subscription’s connection parameters, may be.

The good news is that there are hacks that can speed up the entire recovery process, although they all have their own quirky limitations. Patroni, for example, implements one such technique by copying the file named state, located in subdirectory pg_replslot of the data cluster, over to the same location in the REPLICA’s data cluster.

# PGDATA on an Ubuntu install
/var/lib/postgresql/15/main/pg_replslot/pg3
└── state

However, the caveat is that it requires a full server restart of the REPLICA, in addition to its promotion, before the logical slot is fully active. And, of course, timing is critical.

So there you have it.

Too bad such a feature doesn’t exist… or does it?

Stay tuned, folks; the game is afoot.

For the curious, I’ve included some references in past blogs, courtesy of Jobin:

And here’s another set of references: a valiant attempt to incorporate this feature into PostgreSQL a few years ago.

 

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open source community in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

 

Dec 23, 2022

Diffing PostgreSQL Schema Changes

One of the routine operations when administering PostgreSQL is periodically updating the database schema. PostgreSQL does a good job of letting one update a schema: add types, functions, or triggers, or alter a table by adding and removing columns, updating column data types, etc., in a reliable manner. However, there is no built-in mechanism to help identify the differences between environments, let alone generate the SQL necessary to carry updates from development to production.

So let’s talk about possible approaches to schema changes.

Using logical dump manifests

The easiest way to identify changes between schemas from one database to another is to compare schema dump manifests.

The following example demonstrates an approach one can take when looking for differences between schemas on different databases:

EXAMPLE:

-- create the databases
create database db01;
create database db02;


-- db01: version 1
create table t1 (
    c1 int,
    c2 text,
    c4 date
);

create table t2(
    c1 int,
    c2 varchar(3),
    c3 timestamp,
    c4 date
);
-- db02: version 2
create table t1 (
    c1 serial primary key,
    c2 varchar(256),
    c3 date default now()
);

create table t2(
    c1 serial primary key,
    c2 varchar(3),
    c3 varchar(50),
    c4 timestamp with time zone default now(),
    c5 int references t1(c1)
);

create index on t2 (c5);

 

# generate schema dumps
pg_dump -s db01 -Fc > db01.db
pg_dump -s db02 -Fc > db02.db
# generate manifests
pg_restore -l db01.db > db01_manifest.ini
pg_restore -l db02.db > db02_manifest.ini

This snippet demonstrates looking for differences by comparing the md5 checksums:

# EX 1: generate checksums
md5sum \
<(tail +16 db01_manifest.ini | cut -d ' ' -f 4-) \
<(tail +16 db02_manifest.ini | cut -d ' ' -f 4-)

# output
9d76c028259f2d8bed966308c256943e  /dev/fd/63
ba124f9410ea623085c237dc4398388a  /dev/fd/62

This next snippet diffs the two manifests, identifying only those objects and attributes that have changed. Notice that the redundant header information (the first 15 lines) is skipped:

# EX 2: perform diff
diff \
<(tail +16 db01_manifest.ini | cut -d ' ' -f 4-) \
<(tail +16 db02_manifest.ini | cut -d ' ' -f 4-)

 

The resulting diff shows the changes made between the two schemas:

1a2,3
> SEQUENCE public t1_c1_seq postgres
> SEQUENCE OWNED BY public t1_c1_seq postgres
2a5,12
> SEQUENCE public t2_c1_seq postgres
> SEQUENCE OWNED BY public t2_c1_seq postgres
> DEFAULT public t1 c1 postgres
> DEFAULT public t2 c1 postgres
> CONSTRAINT public t1 t1_pkey postgres
> CONSTRAINT public t2 t2_pkey postgres
> INDEX public t2_c5_idx postgres
> FK CONSTRAINT public t2 t2_c5_fkey postgres

The good news is that a number of existing tools can reconcile differences between a proposed schema design and the target schema:

  1. Commercial offerings can diff schemas between databases in an elegant and efficient manner; a quick web search yields the most popular options.
  2. As for open source solutions, there are a number of projects capable of diffing Postgres database schemas.

Working with the apgdiff utility

The following is an example using the open source tool apgdiff.

Apgdiff can be found in the Postgres community repository. It compares two schema dump files and creates an SQL output file that is, for the most part, suitable for upgrading the old schema:

Package: apgdiff

Version: 2.7.0-1.pgdg18.04+1
Architecture: all
Maintainer: Debian PostgreSQL Maintainers <team+postgresql@tracker.debian.org>
Installed-Size: 173
Depends: default-jre-headless | java2-runtime-headless
Homepage: https://www.apgdiff.com/
Priority: optional
Section: database
Filename: pool/main/a/apgdiff/apgdiff_2.7.0-1.pgdg18.04+1_all.deb
Size: 154800
SHA256: 9a83fcf54aed00e1a28c3d00eabe1c166977af1e26e91035e15f88b5215b181b
SHA1: ea713acb55898f07374dadd1bebb09ec2fa4b589
MD5sum: e70a97903cb23b8df8a887da4c54e945

The following example demonstrates how one can update differences between the development environment and the production database schema using apgdiff.

EXAMPLE:

apt install -y apgdiff
# EX 1: dump as SQL statements
pg_dump -s db01 -Fp > db01.sql
pg_dump -s db02 -Fp > db02.sql
createdb db03 --template=db01

apgdiff --ignore-start-with db01.sql db02.sql > db01-db02.sql

# “psql -1” encapsulates statements within a transaction
psql -1 -f db01-db02.sql db03

 

# EX 2: uses logical dumps
# notice the dumps are standard logical dumps and include data
pg_dump db01 -Fc > db01.db
pg_dump db02 -Fc > db02.db
createdb db03 --template=db01

# this invocation assumes the resultant diff doesn’t require editing
apgdiff --ignore-start-with \
    <(pg_restore -s -f - db01.db) \
    <(pg_restore -s -f - db02.db) \
    | psql -1 db03

There’s more you can accomplish with these simple approaches. By incorporating variations of these, one can create fairly sophisticated shell scripts with little code and, with a little luck, not that much effort.

Dec
08
2022
--

Percona Live 2023 Call For Papers Open!

Percona Live - Call for Papers

Percona Live 2023 will be in Denver, Colorado, May 22nd through the 24th. This is the premier open source database conference, and you not only want to attend it, you also need to present at it!

We are seeking talks from speakers of all experience levels, in either 50-minute sessions or 3-hour tutorials, covering any and all open source database and data-related topics. Whether you are a MySQL, MongoDB, PostgreSQL, or MariaDB expert, Percona Live has you covered! If Kubernetes, Cloud, observability, or DevOps is your thing – we want you too! The deadline to submit is February 17th.

    • Breakout Sessions broadly cover a technology area using specific examples. Sessions should be up to 50 minutes in length (including Q&A).
    • Tutorial Sessions aim for a level between a training class and a conference breakout session. Attendees are encouraged to bring and use laptops for these detailed, hands-on presentations. Tutorials will be 3 hours in length (including Q&A).

We want proposals that cover the many aspects of application development using all open source databases, as well as new and interesting ways to monitor and manage database environments.

Accepted speakers for Breakout Sessions and Tutorials receive a free full conference pass. We’ll also widely promote the conference, and share tips on how you can promote your confirmed, rock star speaker status!

Call For Papers


If you are a first-time presenter or just want an extra set of eyes to look over your submissions, please do not hesitate to contact me or any of the other Developer Evangelists at Percona (DM @stoker on Twitter or email David.Stokes@Percona.com). The audience is warm and receptive and really wants you to succeed. So come tell us about your experience implementing something, a new technical trend, or how you solved a problem.

And watch for notices on when tickets go on sale if you decide not to submit a session.


Dec
02
2022
--

Using Liquibase as a Solution for Deploying and Tracking MySQL Schema Changes

Liquibase mysql schema changes

Database-as-code service is a new concept that has been gaining popularity in recent years. As we already know, we have deployment solutions for application code, and managing and tracking application changes is quite easy with tools like Git and Jenkins.

Now this concept is applied in the database domain as well, treating SQL as code to manage database changes (DDL, DML) the same way applications handle code. From a database standpoint, this allows tracing the history of modifications, so problems can be quickly detected and addressed.

What is Liquibase?

Liquibase is an open source, database-independent framework for deploying, managing, and tracking database schema changes. All the modifications or changes to the database are stored in text files (XML, YAML, JSON, or SQL) known as changesets. To specifically list database changes in order, Liquibase employs a changelog. The changelog serves as a record of changes and includes a list of changesets that Liquibase can execute on a target database.

Let’s see how we can set up Liquibase and perform some database changes with this tool.

Installing Liquibase

1. Download and extract Liquibase files.

shell> wget https://github.com/liquibase/liquibase/releases/download/v4.17.2/liquibase-4.17.2.tar.gz 
shell> mkdir /usr/local/bin/liquibase/ 
shell> sudo tar -xzvf liquibase-4.17.2.tar.gz --directory  /usr/local/bin/liquibase/

2. Add the installation directory to the PATH environment variable, and add the same line to the “~/.bashrc” file so it persists across sessions.

export PATH=$PATH:/usr/local/bin/liquibase

3. Validate the installation.

shell> liquibase --version

####################################################
## _ _ _ _ ##
## | | (_) (_) | ##
## | | _ __ _ _ _ _| |__ __ _ ___ ___ ##
## | | | |/ _` | | | | | '_ \ / _` / __|/ _ \ ##
## | |___| | (_| | |_| | | |_) | (_| \__ \ __/ ##
## \_____/_|\__, |\__,_|_|_.__/ \__,_|___/\___| ##
## | | ##
## |_| ##
## ## 
## Get documentation at docs.liquibase.com ##
## Get certified courses at learn.liquibase.com ## 
## Free schema change activity reports at ##
## https://hub.liquibase.com ##
## ##
####################################################
Starting Liquibase at 15:04:16 (version 4.17.2 #5255 built at 2022-11-01 18:07+0000)
Liquibase Home: /usr/local/bin/liquibase
Java Home /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre (Version 1.8.0_352)

Note – Java (JDK) needs to be set up on your system for Liquibase to function.

How to use Liquibase with MySQL

1. To use Liquibase with MySQL, we need the MySQL JDBC driver JAR file. Copy the JAR file into the Liquibase internal library directory.

shell> wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-j-8.0.31.tar.gz 
shell> tar -xzvf mysql-connector-j-8.0.31.tar.gz 
shell> cp mysql-connector-j-8.0.31/mysql-connector-j-8.0.31.jar  /usr/local/bin/liquibase/internal/lib/

2. Next, initialize a project with the options below.

shell> liquibase init project \ 
--project-dir=/home/vagrant/liquibase_mysql_project \ 
--changelog-file=file \ 
--format=sql \ 
--project-defaults-file=liquibase.properties \ 
--url=jdbc:mysql://localhost:3306/sbtest \ 
--username=root \ 
--password=Root@1234

Let’s understand what these parameters are.

--project-dir => project location where all files related to the project will be kept.
--changelog-file => file containing the deployment changes.
--format => format of the deployment file (.sql, .xml, etc.).
--project-defaults-file => Liquibase property file.
--url => MySQL database URL.
--username => database user name.
--password => database password.
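
After initialization, the generated defaults file holds these options so they don’t have to be repeated on every command line. A hypothetical rendering of liquibase.properties with the values from the init command above (your generated file may contain additional settings):

```
# liquibase.properties (illustrative)
changelog-file=changelog.sql
url=jdbc:mysql://localhost:3306/sbtest
username=root
password=Root@1234
```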

3. We can then manually create a changelog file (“changelog.sql”) in the project location and define the changeset. Other supported formats for defining deployment changes are .xml, .json, and .yaml.

CREATE TABLE test_table (test_id INT, test_column VARCHAR(256), PRIMARY KEY (test_id))

4. Validate the connection to the database is successful. Inside the Liquibase project folder, run the below command.

shell> liquibase --username=root --password=Root@1234 --changelog-file=changelog.sql status

5. Inspect the SQL before execution.

shell> liquibase --changelog-file=changelog.sql update-sql

6. Finally, deploy the changes.

shell> liquibase --changelog-file=changelog.sql update

Output:

Starting Liquibase at 16:56:44 (version 4.17.2 #5255 built at 2022-11-01 18:07+0000) 
Liquibase Version: 4.17.2 
Liquibase Community 4.17.2 by Liquibase 
Running Changeset: changelog.sql::raw::includeAll 
Liquibase command 'update' was executed successfully.

7. Validate the changes in the database. Additionally, you will observe the extra tables below in the same database, which capture a few more stats about each execution.

a)  Table: DATABASECHANGELOG

mysql> select * from DATABASECHANGELOG\G
*************************** 1. row ***************************
ID: raw
AUTHOR: includeAll
FILENAME: changelog.sql
DATEEXECUTED: 2022-11-08 16:10:36
ORDEREXECUTED: 1
EXECTYPE: EXECUTED
MD5SUM: 8:155d0d5f1f1cb1c0098df92a8e92372a
DESCRIPTION: sql
COMMENTS: 
TAG: NULL
LIQUIBASE: 4.17.2
CONTEXTS: NULL
LABELS: NULL
DEPLOYMENT_ID: 7923831699

Note – Each changeset is tracked in the table as a row and is identified by the id, author, and filename fields.

b) Table: DATABASECHANGELOGLOCK

mysql> select * from DATABASECHANGELOGLOCK\G
*************************** 1. row ***************************
ID: 1
LOCKED: 0x00
LOCKGRANTED: NULL
LOCKEDBY: NULL
1 row in set (0.00 sec)

Note – Liquibase employs the DATABASECHANGELOGLOCK table to guarantee that only one instance of Liquibase is running at once.

8. By running the command “liquibase history”, we can check past deployments as well.

Liquibase Version: 4.17.2
Liquibase Community 4.17.2 by Liquibase
Liquibase History for jdbc:mysql://localhost:3306/sbtest

- Database updated at 11/8/22 4:10 PM. Applied 1 changeset(s), DeploymentId: 7923831699
changelog.sql::raw::includeAll

So here we have successfully deployed the changes in the target database. Next, we will see how we can track the changes and perform rollback operations.

To roll back specific DDL or DML changes, we need to add rollback details to the changesets in the .sql file.

Let’s see the steps to perform the rollback operations:

1) Create the deployment file “deployment.sql” inside the project location with the below changeset details.

--liquibase formatted sql

--changeset AJ:1 labels:label1 context:context1
--comment: DDL creation

create table P1 (
id int primary key auto_increment not null,
name varchar(50) not null
);
--rollback DROP TABLE P1;

create table P2 (
id int primary key auto_increment not null,
name varchar(50) not null
);
--rollback DROP TABLE P2;

Option details:

AJ:1 => denotes author:id

labels:label1 => Specifies labels that are a general way to categorize changesets like contexts.

context:context1 => Executes the changeset if the given context is passed at runtime. Any string can be used for the context name.

2) Next, run the deployment. Run the below command inside the project location.

shell> liquibase --changelog-file=deployment.sql update

3) Now, add the tagging to manage rollback scenarios.

shell> liquibase tag version4

4) By default, the tag is applied to the most recent deployment. We can verify this in the table below.

mysql> select * from DATABASECHANGELOG\G

ID: 1
AUTHOR: AJ
FILENAME: deployment.sql
DATEEXECUTED: 2022-11-28 07:09:11
ORDEREXECUTED: 3
EXECTYPE: EXECUTED
MD5SUM: 8:2e4e38d36676981952c21ae0b51895ef
DESCRIPTION: sql
COMMENTS: DDL creation
TAG: version4
LIQUIBASE: 4.17.2
CONTEXTS: context1
LABELS: label1
DEPLOYMENT_ID: 9619351799

5) Let’s roll back the executed changes. This command will revert all changes made to the database after the specified tag.

shell> liquibase --changelog-file=deployment.sql rollback version4

Output:

Starting Liquibase at 07:38:36 (version 4.17.2 #5255 built at 2022-11-01 18:07+0000)
Liquibase Version: 4.17.2
Liquibase Community 4.17.2 by Liquibase
Rolling Back Changeset: deployment.sql::1::AJ
Liquibase command 'rollback' was executed successfully.

Note – After this activity, we no longer see those tables (P1, P2) in the database; as part of the rollback operation, they were dropped.

Alternatively, we can perform the rollback on the basis of timestamps as well. The command below reverts all changes made to the database after the date and time you specify.

For example:

shell> liquibase --changelog-file=deployment.sql  rollback-to-date 2022-11-27

Output:

Starting Liquibase at 07:18:26 (version 4.17.2 #5255 built at 2022-11-01 18:07+0000)
Liquibase Version: 4.17.2
Liquibase Community 4.17.2 by Liquibase
Rolling Back Changeset: deployment.sql::1::AJ
Liquibase command 'rollback-to-date' was executed successfully.

Liquibase integration with Percona Toolkit (pt-osc)

A Liquibase extension is available to enable the pt-online-schema-change feature of the Percona Toolkit. This extension replaces several default changes so that they are executed with pt-osc rather than plain SQL. With the pt-online-schema-change tool, you can alter a table without locking it.

Let’s see the steps to use pt-osc with the Liquibase extension:

1) Download the Percona Liquibase jar file.

shell> wget https://github.com/liquibase/liquibase-percona/releases/download/v4.17.1/liquibase-percona-4.17.1.jar

2)  Copy the jar file to Liquibase internal library folder.

shell> sudo cp liquibase-percona-4.17.1.jar /usr/local/bin/liquibase/internal/lib/

3) Add the below changeset to the changelog.xml file, which adds one column (“osc”) to the table liq1.

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
xmlns:pro="http://www.liquibase.org/xml/ns/pro"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
http://www.liquibase.org/xml/ns/pro http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">

<changeSet id="3" author="AJ">
<addColumn tableName="liq1"
xmlns:liquibasePercona="http://www.liquibase.org/xml/ns/dbchangelog-ext/liquibase-percona"
liquibasePercona:usePercona="true">
<column name="osc" type="varchar(255)"/>
</addColumn>
</changeSet>
</databaseChangeLog>

Note – Here we set liquibasePercona:usePercona="true", which enables DDL execution via pt-osc.

4) Validate the deployment changes before the actual implementation. Below we can see the pt-osc command that is going to be executed in the next phase.

shell>  liquibase --changelog-file=changelog.xml update-sql

Output:

Liquibase Community 4.17.2 by Liquibase
-- *********************************************************************
-- Update Database Script
-- *********************************************************************
-- Change Log: changelog.xml
-- Ran at: 11/28/22 8:28 AM
-- Against: root@localhost@jdbc:mysql://localhost:3306/liq
-- Liquibase version: 4.17.2
-- *********************************************************************

-- Lock Database
UPDATE liq.DATABASECHANGELOGLOCK SET `LOCKED` = 1, LOCKEDBY = 'localhost.localdomain (10.0.2.15)', LOCKGRANTED = NOW() WHERE ID = 1 AND `LOCKED` = 0;

-- Changeset changelog.xml::3::AJ
-- pt-online-schema-change --alter-foreign-keys-method=auto --nocheck-unique-key-change --alter="ADD COLUMN osc VARCHAR(255) NULL" --password=*** --execute h=localhost,P=3306,u=root,D=liq,t=liq1;

5) Finally, run the deployment.

shell>  liquibase --changelog-file=changelog.xml update

Summary

Apart from MySQL, Liquibase supports other popular databases like PostgreSQL, Cassandra, and MongoDB. Developers benefit greatly from this since they can collaborate to write their own scripts that relate to the database and then commit them as part of their code. It maintains the versions of all the changes like any other version control system and supports branching and merging of SQL code.


Nov
18
2022
--

Root Cause Analysis in Quality Engineering

Root Cause Analysis in Quality Engineering

In software development, dealing with bugs found by customers is inevitable. There is no perfect software. We’ve all been there: frantically trying to find a workaround, hurrying to release a hotfix, or trying to find an explanation for the relevant stakeholders. A vicious circle repeats itself with more or less intensity, depending on the bug’s severity or the magnitude of the affected customer. But it doesn’t have to be this way. We can turn bad experiences into productive ones by:

  • Learning about the actual real-world usage of the software that led to the discovery
  • Improving testing methods and processes based on these learnings
  • Assuring stakeholders that relevant learnings have been acquired and that actions have been taken to ensure that the problems don’t happen again

Root Cause Analysis

In science and engineering, Root Cause Analysis (RCA) is defined as the process of discovering the root cause of a problem to identify the appropriate solution. Its purpose is to solve the underlying cause of the problem, instead of the symptom, and to apply a definitive solution instead of putting out the fire.

Most tech organizations have well-established RCA processes, especially for security breaches or uptime incidents. We MUST take the same approach in engineering and apply it to our quality process: acknowledge, analyze, resolve, learn from, and remove gaps in the software development life cycle (SDLC) that cause escapes (i.e., bugs reported by users that were not found during the SDLC) and problems for our customers.


What will this accomplish?

  • We take a critical look at our own mistakes. Instead of taking the quick and easy route of fixing, testing, delivering, and forgetting, we pause for just a moment to think about why mistakes have happened. We become accountable;
  • We ensure that the appropriate tests are in place so that the reported problem does not happen again. Remember that we do not just aim to add “full coverage”, we need a representative test for the particular scenario, a test that is efficient and capable of catching the problem in all the next iterations;
  • In general, a Root Cause Analysis is a reactive activity. But by gathering, analyzing, and acting upon collected historical data, we become proactive in improving our processes and test procedures, before more feedback comes in;
  • Last but not least, this is a great opportunity to interact with the problems and the pains of our customers. We learn more about real-life production application usage; how to think and act like a real user of the software. 

How do we make this work?

Dealing with customer-reported bugs can be a time-sensitive issue. Therefore we must be mindful not to add bottlenecks. A simple asynchronous and easy exercise is a good way to start. 

We can streamline this process in several simple steps with the help of Jira workflow:

  1. Filter out bugs reported by paying customers and enforce the RCA workflow on them. Additionally, this activity can be expanded to other groups of reporters, such as communities; places like the Percona Forum can also be a good source of feedback.
  2. During bug resolution activities, the assigned developer investigates how the bug was introduced. When transitioning the ticket from a state like “In Progress” to a state like “In Review”, the developer posts a short description of the investigation in a mandatory text field*.
  3. During the testing of the fix provided for the respective bug, the assigned quality engineer investigates why the problem was missed during all SDLC testing stages. When transitioning the ticket from a state like “In QA” to a state like “Ready for Merge” the Quality Engineer posts a short description of the investigation in a mandatory text field*.
  4. Depending on the level of testing where we take the preventive action, either the developer or the quality engineer picks options from 3 dropdown type fields:
    1. Problem category (Requirements, Testing, Coding, Environment, etc.)
    2. Problem detail (Missing Test Case, Inadequate Requirements, Missing Documentation, Insufficient Unit Test coverage, etc.)
    3. Corrective action (Automated Test Cases, Test Case Added to Regression suite, Documentation Updated, Process Improvement, etc.)
  5. Analysis of the data – The categories where we do end up taking corrective actions the most are our weak spots. Details provide us with information about what exactly we do wrong.
  6. The corrective action historical data provides the opportunity to become proactive and take preventative actions to improve weak areas.

*Steps two and three do not have any measurable data or action tied to them, hence we use a free text field. The only purpose of these two steps is to make sure that engineers learn and convey the learnings to their teams. There is no place for naming or shaming in this activity.


Conclusion

In practice, this is an adapted version of the fishbone technique for Root Cause Analysis. The missed bug represents the problem statement, and we ask “Why did this happen?” at several points: Why did we introduce the bug? Why did we miss it in testing? What exactly is the problem? Based on the answers to these questions, we determine the actual root cause, and we take action to remove it definitively.

Applied diligently and in the long term, a Root Cause Analysis process does not only fill some gaps in the development and testing. It grows and matures the engineering organization. It encourages us to continuously scrutinize our methods and ask ourselves how we can do better. Let’s not miss this opportunity!

Nov
16
2022
--

How to Generate Test Data for MongoDB With Python

Generate Test Data for MongoDB With Python

For testing purposes, especially if you’re working on a project that uses any database technology to store information, you may need data to try out your project. In that case, you have two options:

  • Find a good dataset (e.g., on Kaggle), or
  • Use a library like Faker

Through this blog post, you will learn how to generate test data for MongoDB using Faker.

Requirements

Dependencies

Make sure all the dependencies are installed before creating the Python script that will generate the data for your project.

You can create a requirements.txt file with the following content:

pandas
pymongo
tqdm
faker

Once you have created this file, run the following command:

pip install -r requirements.txt

Or if you’re using Anaconda, create an environment.yml file:

name: percona
dependencies:
  - python=3.10
  - pandas
  - pymongo
  - tqdm
  - faker

You can change the Python version as this script has been proven to work with these versions of Python: 3.7, 3.8, 3.9, 3.10, and 3.11.

Run the following statement to configure the project environment:

conda env create -f environment.yml

Fake data with Faker

Faker is a Python library that can be used to generate fake data through properties defined in the package.

from faker import Faker

fake = Faker()
for _ in range(10):
    print(fake.name())

The above code will print ten names; each call to the name() method produces a new random value. name() is a property of the generator. Every property of this library is called a fake, and there are many of them packaged in providers.

Some providers and properties available in the Faker library include:

  • person – name(), first_name(), last_name()
  • address – address(), city(), country()
  • company – company()
  • job – job()
  • internet – email(), url()
  • lorem – text(), sentence()

You can find more information on bundled and community providers in the documentation.

Creating a Pandas DataFrame

Now that we know Faker and its properties, we need to create a modules directory, and inside it, a module named dataframe.py. This module will be imported later into our main script, and it is where we define the method that will generate the data.

from multiprocessing import cpu_count
import pandas as pd
from tqdm import tqdm
from faker import Faker

Multiprocessing is implemented for optimizing the execution time of the script, but this will be explained later. First, you need to import the required libraries:

  • pandas. Data generated with Faker will be stored in a Pandas DataFrame before being imported into the database.
  • tqdm(). Required for adding a progress bar to show the progress of the DataFrame creation.
  • Faker(). It’s the generator from the faker library.
  • cpu_count(). This is a method from the multiprocessing module that will return the number of cores available.

fake = Faker()
num_cores = cpu_count() - 1

Faker() creates and initializes a faker generator, which can generate data by accessing the properties.

num_cores is a variable that stores the value returned after calling the cpu_count() method.

def create_dataframe(arg):
    x = int(60000/num_cores)
    data = pd.DataFrame()
    for i in tqdm(range(x), desc='Creating DataFrame'):
        data.loc[i, 'first_name'] = fake.first_name()
        data.loc[i, 'last_name'] = fake.last_name()
        data.loc[i, 'job'] = fake.job()
        data.loc[i, 'company'] = fake.company()
        data.loc[i, 'address'] = fake.address()
        data.loc[i, 'city'] = fake.city()
        data.loc[i, 'country'] = fake.country()
        data.loc[i, 'email'] = fake.email()
    return data

Then we define the create_dataframe() function, where:

  • x is the variable that will determine the number of iterations of the for loop where the DataFrame is created.
  • data is an empty DataFrame that will later be fulfilled with data generated with Faker.
  • Pandas DataFrame.loc attribute provides access to a group of rows and columns by their label(s). In each iteration, a row of data is added to the DataFrame and this attribute allows assigning values to each column.

The DataFrame that is created after calling this function will have the following columns:

#   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   first_name  60000 non-null  object
 1   last_name   60000 non-null  object
 2   job         60000 non-null  object
 3   company     60000 non-null  object
 4   address     60000 non-null  object
 5   city        60000 non-null  object
 6   country     60000 non-null  object
 7   email       60000 non-null  object

Connection to the database

Before inserting the data previously generated with Faker, we need to establish a connection to the database, and for doing this the PyMongo library will be used.

from pymongo import MongoClient

uri = "mongodb://user:password@localhost:27017/"
client = MongoClient(uri)

From PyMongo, we import the MongoClient() method.

Don’t forget to replace user, password, localhost, and the port (27017) with your authentication details, then save this code in the modules directory as base.py.

What is multiprocessing?

Multiprocessing is a Python module that can be used to take advantage of the CPU cores available in the computer where the script is running. In Python, single-CPU use is caused by the global interpreter lock, which allows only one thread to hold the Python interpreter at any given time; for more information, see this blog post.

Imagine that you’re generating 60,000 records. Running the script on a single core will take more time than you might expect, since each record is generated one by one within the loop. By implementing multiprocessing, the whole workload is divided by the number of cores: if your CPU has 16 cores, each core will generate 4,000 records. This is because only 15 cores are used, as we leave one available to avoid freezing the computer.
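
The arithmetic of that split is easy to check in plain Python (the 16-core machine is just the example from the text):

```python
# Split the workload across all cores but one, as described above.
total_records = 60_000
available_cores = 16                 # hypothetical machine from the example
worker_cores = available_cores - 1   # leave one core free
records_per_core = total_records // worker_cores

print(worker_cores, "workers x", records_per_core, "records each")
```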

To understand better how to implement multiprocessing in Python, I recommend the following tutorials:

Generating your data

All the required modules are now ready to be imported into the main script so it’s time to create the mongodb.py script. First, import the required libraries:

from multiprocessing import Pool
from multiprocessing import cpu_count
import pandas as pd

From multiprocessing, Pool() and cpu_count() are required. The Pool class allows you to create and manage pools of worker processes.

Then, import the modules previously created:

from modules.dataframe import create_dataframe
from modules.base import client

Now we create the multiprocessing pool configured to use all available CPU cores minus one. Each core will call the create_dataframe() function and create a DataFrame with 4,000 records, and after each call to the function has finished, all the DataFrames created will be concatenated into a single one.

if __name__ == "__main__":
    num_cores = cpu_count() - 1
    with Pool(num_cores) as pool:
        data = pd.concat(pool.map(create_dataframe, range(num_cores)))
    data_dict = data.to_dict('records')
    db = client["company"]
    collection = db["employees"]
    collection.insert_many(data_dict)

After logging into the MongoDB server, we get the database and the collection where the data will be stored.

And finally, we will insert the DataFrame into MongoDB by calling the insert_many() method. All the data will be stored in a collection named employees.

Run the following statement to populate the database:

python mongodb.py

DataFrame creation with multiprocessing

It will take just a few seconds to generate the DataFrame with the 60,000 records, and that’s why multiprocessing was implemented.

CPU utilization on Percona Monitoring and Management

Once the script finishes, you can check the data in the database.

use company;
db.employees.count()

The count() function returns the number of documents in the employees collection.

60000

Or you can display the documents in the employees collection:

db.employees.find().pretty()

{
        "_id" : ObjectId("6363ceeeda5c972cabf558b4"),
        "first_name" : "Sherri",
        "last_name" : "Phelps",
        "job" : "Science writer",
        "company" : "House Inc",
        "address" : "06298 Mejia Streets Suite 742\nRobertland, WY 98585",
        "city" : "Thomasview",
        "country" : "Cote d'Ivoire",
        "email" : "michelle63@hotmail.com"
}

The code shown in this blog post can be found on my GitHub account in the data-generator repository.

Nov
15
2022
--

PMM v2.32: Backup Management for MongoDB in GA, New Home Dashboard, and More!

Percona Monitoring and Management v2.32

We are pleased to announce the general availability of Backup Management for MongoDB and other improvements in Percona Monitoring and Management (PMM) v2.32, released in November 2022. Details are in this blog post and in the PMM 2.32 Release Notes.

PMM now ships with a new Home Dashboard where you can quickly and easily check your databases’ health at a glance and detect anomalies. While there’s no one-size-fits-all approach, we created and released the new Home Dashboard to make it more user-friendly, even for users new to PMM.

You can get started with PMM in minutes using the PMM Demo to check out the latest version, PMM v2.32.

Let’s have a look at the highlights of PMM 2.32:

General availability of Backup Management for MongoDB

The Backup Management for MongoDB in PMM has reached General Availability and is no longer in Technical Preview.

Supported setups

MongoDB Backup Management now supports replica set setups for the following actions:

  • Create logical snapshot backups
  • Create logical Point In Time Recovery (PITR) backups
  • Create physical snapshot backups. This is available only with Percona Server for MongoDB
  • Restore logical snapshot backups.
  • Restore physical backups. This requires additional manual operations after the restore and is only available with Percona Server for MongoDB.
  • Restore logical PITR backups from S3

Current limitations

  • Restoring logical PITR backups only supports S3 storage type
  • Restoring physical backups requires manual post-restore actions
  • Restoring a MongoDB backup on a new cluster is not yet supported
  • Restoring physical backups for containerized MongoDB setups is not supported
  • Local storage for MySQL is not supported
  • Sharded cluster setups are not supported
  • Backups that are stored locally cannot be removed automatically
  • Retention is not supported for PITR artifacts

 

Quicker and easier database health overview with the new Home Dashboard

As mentioned and promised in previous release notes, we were investigating better approaches, methods, and user-friendly presentation of database health in the Home Dashboard, which is also the entry point to PMM. Finally, we are proud to release this finalized dashboard as the new Home Dashboard. Thank you for your feedback and collaboration during all iterations of the experimental versions.

Monitor hundreds of nodes without any performance issues

If you monitor hundreds of nodes with the same PMM instance, the original Home Dashboard could take a long time to load and might even leave the page unresponsive, because its design repeated panels for each node. With performance in mind, we re-designed the Home Dashboard with new logic that shows what is wrong, or what is OK, with your databases instead of showing every metric for every node.

PMM Home Dashboard_home_select multiple nodes


Anomaly detection

Many of you probably use dozens of tools for different purposes in your daily work, meetings, and projects. These tools should make your life easier, not harder. With monitoring tools, the sheer number of metrics can be daunting, so analyzing data and detecting anomalies that deviate from a database's normal behavior should be easy and fast. Functional Anomaly Detection panels, as opposed to separate graphs for each node, are a much better way to visualize and recognize problems with your databases that may require action.

  • If you spot an anomaly, you can click the node name on the panel to open the Node Overview Dashboard for that node, where you can see all the node's metrics you need to diagnose the problem.
  • All panels except Advisor Checks can be filtered by node and environment variables.
  • Graphs in the Anomaly Detection row show data for the top 20 nodes (e.g., the top 20 CPU anomalies).

PMM Anomaly Detection panels

Command Center panels

The primary motivation behind the new Home Dashboard is simplicity. While working on the new design, it was always hard to balance presenting the metrics everyone needs while keeping it clean, functional, and simple. So we decided to use Command Center panels, which are collapsed by default. If you see an anomaly such as memory usage above 90%, how do you know when it happened or started? The time-series graphs for the top 20 nodes in the Command Center panels help you see when the anomalies occurred: within the last hour, or over the last week?


PMM Command Center Panels on Home Dashboard

Enhanced main menu

We returned with two improvements we previously promised, announced for V2.32, for easier access to dashboards from the Main Menu. After the last changes, with every possible monitored service type represented as an icon, the Main Menu became crowded and overextended. In the latest version, you'll only see the icons of currently monitored services on the Main Menu. For example, if you're monitoring MongoDB, you will see the MongoDB Dashboard's icon on the main menu, as opposed to previous versions, which showed all database types PMM is capable of monitoring whether you had them in your system or not. When you start to monitor other services, such as MySQL, they will be automatically added to the Main Menu.

Another improvement on the Main Menu is the visibility of all other dashboards. PMM provides multiple dashboards with different levels of information for each service. Only some dashboards appear in the Main Menu; the rest live in folders, so users can miss them. Likewise, custom dashboards created by other users in your organization can remain invisible to you until you stumble on them in the folders by chance. So, we added Other Dashboards links to the sub-menu of each service, so that you can easily click and see all dashboards in the service's folder.

Quick access to other dashboards from the menu


What’s next?

  • We’ll improve the Vacuum Dashboard with more metrics. If you’d like to enhance it with us, you can share your feedback in the comments.
  • A health dashboard for MySQL is on the way. Please share your suggestions in the comments or forum if you’d like to be part of the group shaping PMM. 
  • We have started to work on two new and significant projects: High Availability in PMM and advanced Role-Based Access Control (RBAC). We’d love to hear your needs, use cases, and suggestions. You can quickly book a short call with the product team to collaborate with us. 
  • For Backup Management, we are planning to continue to iterate on the current limitations listed above and make the restore processes as seamless as possible for all database types.

Install PMM 2.32 now or upgrade your installation to V2.32 by checking our documentation for more information about upgrading.

Learn more about Percona Monitoring and Management 2.32

Thanks to Community and Perconians

We love our community and our team at Percona, who shape the future of PMM together and help us with all these changes.

You can also join us on our community forums to request new features, share your feedback, and ask for support.

Thank you for your collaboration on the new Home Dashboards:

Cihan Tunalı   @SmartMessage 

Tyson McPherson @Parts Authority

Paul Migdalen @IntelyCare

Nov
09
2022
--

MySQL: Using UNION, INTERSECT, & EXCEPT


MySQL 8.0.31 added INTERSECT and EXCEPT to augment the long-lived UNION operator. That is the good news. The bad news is that you have to be careful using the EXCEPT operator as there is a trick.

Let’s start with some simple tables and load some simple data.

SQL > create table a (id int, nbr int);
Query OK, 0 rows affected (0.0180 sec)
SQL > create table b (id int, nbr int);
Query OK, 0 rows affected (0.0199 sec)
SQL > insert into a (id,nbr) values (1,10),(3,30),(5,50),(7,70);
Query OK, 4 rows affected (0.0076 sec)

Records: 4  Duplicates: 0  Warnings: 0
SQL > insert into b (id,nbr) values (1,10),(2,20),(3,30),(4,40);
Query OK, 4 rows affected (0.0159 sec)

Records: 4  Duplicates: 0  Warnings: 0

So each table has four rows of data with two rows – (1,10) and (3,30) – appearing in both.

SQL > select id,nbr from a;
+----+-----+
| id | nbr |
+----+-----+
|  1 |  10 |
|  3 |  30 |
|  5 |  50 |
|  7 |  70 |
+----+-----+
4 rows in set (0.0011 sec)
SQL > select id,nbr from b;
+----+-----+
| id | nbr |
+----+-----+
|  1 |  10 |
|  2 |  20 |
|  3 |  30 |
|  4 |  40 |
+----+-----+
4 rows in set (0.0010 sec)

Since the two tables have identical structures (both have the columns id and nbr), we can use UNION to combine the contents of both. The two identical rows are not duplicated in the result, which is why we go from two tables of four rows each to a single result of six rows. If you need the duplicates to be kept, use UNION ALL instead of UNION. UNION DISTINCT is the default, so you do not have to specify DISTINCT.

SQL > select * from a union select * from b;
+----+-----+
| id | nbr |
+----+-----+
|  1 |  10 |
|  3 |  30 |
|  5 |  50 |
|  7 |  70 |
|  2 |  20 |
|  4 |  40 |
+----+-----+
6 rows in set (0.0010 sec)
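To see the difference, here is a sketch of the same query with UNION ALL, using the tables a and b defined above; the duplicate rows (1,10) and (3,30) are kept, so all eight source rows come back:

```sql
-- Sketch: UNION ALL keeps duplicates, so both copies of (1,10)
-- and (3,30) survive in the combined result.
select * from a union all select * from b;
-- Returns 8 rows: the four rows of a plus the four rows of b,
-- including (1,10) and (3,30) twice (row order is not guaranteed
-- without an ORDER BY).
```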

If we only need the rows that are in common in the two tables then we can use INTERSECT to find them.

SQL > select * from a intersect select * from b;
+----+-----+
| id | nbr |
+----+-----+
|  1 |  10 |
|  3 |  30 |
+----+-----+
2 rows in set (0.0010 sec)

Bonus: try the above and switch the order of the table names to see if there is a difference in the output. The answer is below.

What if we want the data from one of the tables that is not in the other table? Well, here we have to be careful. If we want the data from table B that is not in table A, we use EXCEPT and phrase the query with the tables in that order.

SQL > select * from b except select * from a;
+----+-----+
| id | nbr |
+----+-----+
|  2 |  20 |
|  4 |  40 |
+----+-----+
2 rows in set (0.0012 sec)

And to get only the values from table A that are not in table B, we need to be explicit in ordering the tables.

SQL > select * from a except select * from b;
+----+-----+
| id | nbr |
+----+-----+
|  5 |  50 |
|  7 |  70 |
+----+-----+
2 rows in set (0.0011 sec)

And be careful because if we exclude the values in table A that are in table A, we will get an empty set. If this seems to be a contrived example that could never happen, please be advised that this type of logic bomb does explode all the time. This is a syntactically valid query even if the logic is not up to par.

SQL > select * from a except select * from a;
Empty set (0.0015 sec)

Ideally, I would like to put parens around the two select statements when using UNION, INTERSECT, and EXCEPT, as seen in the following example, to make the query more explicit.

SQL > (select * from b order by id) union (select * from a order by id);
+----+-----+
| id | nbr |
+----+-----+
|  1 |  10 |
|  2 |  20 |
|  3 |  30 |
|  4 |  40 |
|  5 |  50 |
|  7 |  70 |
+----+-----+
6 rows in set (0.0010 sec)

You especially want to use the parens if you need a transaction or really want to make sure the parser knows what you desire to do with your query.
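For instance, here is a sketch of how parentheses make it explicit that a trailing ORDER BY (and optionally LIMIT) applies to the combined result rather than to the second SELECT alone, assuming the sample tables above:

```sql
-- Sketch: the trailing ORDER BY and LIMIT apply to the whole
-- UNION result, not just to the second SELECT.
(select * from a)
union
(select * from b)
order by id desc
limit 3;
-- With the sample data this should return (7,70), (5,50), (4,40).
```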

Bonus answer: The INTERSECT operator does not care if we match A with B or B with A; the same rows come back either way.

It takes a little time after each upstream release for the Percona engineers to add all the enterprise-level features Percona Server for MySQL is known for (Set Theory in MySQL 8.0: UNION and Now INTERSECT and EXCEPT), so please be patient for release 8.0.31 to appear on the Percona repos. It is on the way!

And be sure to read Set Theory in MySQL for more examples plus a guide for using these operators in CTEs.
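As a teaser for that guide, here is a hypothetical sketch of EXCEPT inside a CTE (requires MySQL 8.0.31+), using the tables above; the CTE name only_in_a is made up for illustration:

```sql
-- Hypothetical sketch: EXCEPT inside a CTE (MySQL 8.0.31+).
-- The CTE name only_in_a is illustrative, not from the guide.
with only_in_a as (
    select id, nbr from a
    except
    select id, nbr from b
)
select * from only_in_a order by id;
-- With the sample data this should return (5,50) and (7,70).
```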
