Oct 31, 2023

Resolving Data Drift in a Dual-Primary Topology With Replica in MySQL/MariaDB

Hello friends,

In Managed Services, we have the opportunity to see different technologies and various topologies, which makes the work fascinating and challenging at the same time.

This time, I’m going to tell you about a particular case: a client with a dual-primary topology plus a replica, as detailed below:

PS-primary-1=192.168.0.14 [RW]
  |___ PS-primary-2=192.168.0.59 [RW] (Slave_delay: 0)
  |___ PS-replica-1=192.168.0.99 [R] (Slave_delay: 0)

[RW] means Read/Write access. 

[R] means Read Only access.

We received an alert of this kind on PS-primary-2:

Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table foo.persons; Can't find record in 'persons', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000360, end_log_pos 58177

Seconds later, we noticed that someone (not us) had resolved the replication issue, possibly by skipping the error in some way (for example, with pt-slave-restart) or simply by executing something like:

mysql> STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

Or by injecting an empty transaction.

(Disclaimer: We won’t deeply analyze the root cause here, but rather the surprise we found during the analysis.)

Having seen this, the most advisable action is to run pt-table-checksum, and if any differences appear, use pt-table-sync to fix the data drift.

Therefore, we executed the pt-table-checksum in the following way:

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T12:40:31      0      1        6          1       1       0   0.193 foo.persons

(Second disclaimer: We have populated the table with a small amount of data to simplify and improve visibility in this case. With larger volumes, the same scenario also applies.)

(Credentials, for security reasons, are stored in the file /home/user/.my.cnf, which is why you don’t see them in any executed command. If you want to know how to create it, here’s an example)
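Before trusting the numbers, it helps to picture what pt-table-checksum is doing: it splits the table into chunks and compares a per-chunk checksum and row count between the source and each replica. Here is a loose Python sketch of that comparison, illustrative only; the real tool computes CRC32/BIT_XOR aggregates through replication, and the row sets below are just this post's example data.

```python
import binascii

def chunk_checksums(rows, chunk_size=2):
    """Split ordered rows into fixed-size chunks and compute a CRC32 per
    chunk -- a loose stand-in for pt-table-checksum's per-chunk checksums."""
    sums = []
    for i in range(0, len(rows), chunk_size):
        chunk = rows[i:i + chunk_size]
        payload = "|".join(",".join(map(str, r)) for r in chunk).encode()
        sums.append((i // chunk_size, len(chunk), binascii.crc32(payload)))
    return sums

def diff_chunks(source, target):
    """Return the chunk numbers whose row count or checksum differ."""
    return [s[0] for s, t in zip(source, target) if s[1:] != t[1:]]

# Row sets loosely mirroring the drift in this post
primary_rows = [(1, "Iglesias"), (2, "Rooster"), (4, "SoS"), (7, "Ewd")]
replica_rows = [(2, "Rooster"), (3, "Colonel"), (4, "SoS")]
print(diff_chunks(chunk_checksums(primary_rows), chunk_checksums(replica_rows)))
# [0, 1] -- chunks that differ are what show up under DIFFS in the report
```

A chunk differs if either its row count or its checksum differs, which is exactly the condition the percona.checksums queries below test with master_cnt/this_cnt and master_crc/this_crc.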

At a glance, there seem to be differences. Let’s check the other primary instance and the replica as well.

PS-primary-2> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 08:40:31 | foo | persons |          5 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

PS-replica-1> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 09:53:10 | foo | persons |          4 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

The next step is to execute pt-table-sync with the --print option to review what the utility would actually do. The following output appears:

$ pt-table-sync --print h=192.168.0.59 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='3' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;

A DELETE statement has appeared. How is it possible?

As this is a dual-primary topology, it is advisable to perform a checksum on the other primary server as well, which is PS-primary-2. This will help identify any discrepancies and ensure data consistency between both primary servers.

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T13:08:12      0      1        5          1       1       0   0.147 foo.persons

That’s when we notice that one server has six rows, and the other has four. How is it possible? Let’s review:

PS-primary-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

PS-replica-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

Let’s see what data the table currently contains:

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

As you can see, there is data drift everywhere. 

And how can this be possible? Data drift and the integrity of replica data are a complex subject. I'll enumerate several common ways it can happen (generally speaking); this is not an exhaustive list:

  • A statement was executed on a primary with SET SESSION sql_log_bin = OFF
  • A statement was executed directly on the replica; this can happen if:
    • the replica was not in super_read_only and a user with SUPER executed a write
    • the replica was not in read_only
  • A statement was executed on a replica, and the replica was later promoted to a primary without GTID in place
  • A primary server was not configured for full ACID compliance, and it crashed
  • At some point, the primary was not configured for row-based replication (even briefly)

These are the most common ways that data drift occurs, but certainly not the only ways that it can happen. More exotic cases can involve bugs, engine differences, version differences, etc.
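Several of the scenarios above boil down to the same thing: a write that reaches only one node. A toy Python model (illustrative, not MySQL internals) of how such a write creates drift:

```python
# Toy model: each "server" is a dict keyed by primary key.
# A statement applied through the "binlog" reaches every server;
# a statement run with sql_log_bin = OFF touches only the local copy.

def apply(servers, binlog_enabled, target, pk, row):
    if binlog_enabled:
        for s in servers.values():      # replicated to everyone
            s[pk] = row
    else:
        servers[target][pk] = row       # local-only write: drift is born

servers = {"primary-1": {}, "primary-2": {}, "replica-1": {}}
apply(servers, True, "primary-1", 2, "Rooster")    # normal replicated write
apply(servers, False, "primary-1", 1, "Iglesias")  # sql_log_bin = OFF
apply(servers, False, "primary-2", 3, "Colonel")   # direct write on the co-primary
print({name: sorted(s) for name, s in servers.items()})
# {'primary-1': [1, 2], 'primary-2': [2, 3], 'replica-1': [2]}
```

Each node ends up holding a different set of rows, which is exactly the pattern we found in foo.persons.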

Coming back to the issue, if we run pt-table-sync targeting the second primary instance, we will see the following result:

$ pt-table-sync --print h=192.168.0.14 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='1' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
DELETE FROM `foo`.`persons` WHERE `personid`='7' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;

Here, the output is less promising: it suggests deleting the rows with ID 1 and 7, and performing a REPLACE on ID 3.

Note that it asks us to delete rows, which is not correct in this case. So, what should we do? After a discussion with the customer (we shared with them everything we had gathered), we agreed on the following:

  • We need to add the row with ID=3 to the instances where it’s missing.
  • We need to add the row with ID=1 to the instances where it’s missing.
  • We need to add the row with ID=7 to the instances where it’s missing.
  • We DO NOT need to delete any rows; no DELETEs are necessary.

Important note: The client informed us that the data that exists on only one instance is necessary. The client also verified that the actions we proposed were correct. Remember: we, as DBAs, are the guardians of the data and its integrity, but not necessarily its owners, and for ethical reasons we may not even know its content.

How do we achieve this, then?

Unfortunately, in this case, using pt-table-sync is not recommended since, as we have seen with the --print option, it would perform DELETEs in both directions, regardless of which primary instance we consider the source of truth.

We also cannot simply execute the REPLACE on PS-primary-1, because the REPLACE command follows a specific logic, explained here:

MySQL uses the following algorithm for REPLACE (and LOAD DATA … REPLACE):

  1. Try to insert the new row into the table
  2. While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
    1. Delete from the table the conflicting row that has the duplicate key value
    2. Try again to insert the new row into the table
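The quoted algorithm can be sketched in a few lines of Python over an in-memory table (illustrative only; real MySQL also handles multiple unique indexes and storage-engine details):

```python
def mysql_replace(table, unique_keys, new_row):
    """Emulate REPLACE per the algorithm above: delete conflicting rows
    until the insert succeeds; return the affected-rows count."""
    deleted = 0
    while True:
        conflict = next((r for r in table
                         if any(r[k] == new_row[k] for k in unique_keys)), None)
        if conflict is None:
            table.append(new_row)       # step 1 / 2.2: the insert succeeds
            return deleted + 1          # deletes + one insert
        table.remove(conflict)          # step 2.1: drop the duplicate row
        deleted += 1                    # then retry the insert

persons = [{"personid": 3, "lastname": "Old"}]
affected = mysql_replace(persons, ["personid"],
                         {"personid": 3, "lastname": "Colonel"})
print(affected)   # 2 -- one delete plus one insert ("Query OK, 2 rows affected")
```

This is why a REPLACE on a row that already exists reports two affected rows, while a REPLACE on a missing row behaves as a plain insert and reports one.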

That means that if we execute a REPLACE on PS-primary-1 for a row that does not exist there but does exist on PS-primary-2 (for example, the row with ID 3), the statement succeeds locally as a plain insert, and the replicated event then fails on PS-primary-2 with a duplicate key error, which I’ll show you below:

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 09:20:13');
Query OK, 1 row affected (0.01 sec)

We connected to PS-primary-2, and indeed, we found the duplicate key error:

PS-primary-2 (none)> PAGER grep "Last_Err"; SHOW SLAVE STATUS\G
PAGER set to 'grep "Last_Err"'
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table foo.persons; Duplicate entry '3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000359, end_log_pos 26840
1 row in set (0.01 sec)
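The mechanics behind this error: with row-based replication, the successful REPLACE on PS-primary-1 is logged as a Write_rows event (a plain insert, as the error output above shows), and replaying a plain insert where the row already exists aborts the applier. A hedged Python sketch of that replay:

```python
class DuplicateKeyError(Exception):
    pass

def write_rows_event(table, row, pk="personid"):
    """Replay a row-based Write_rows event: a plain insert with no
    REPLACE semantics, so an existing key aborts the applier."""
    if any(r[pk] == row[pk] for r in table):
        raise DuplicateKeyError(f"Duplicate entry '{row[pk]}' for key 'PRIMARY'")
    table.append(row)

primary_1 = []                                        # row 3 missing here
primary_2 = [{"personid": 3, "lastname": "Colonel"}]  # row 3 already present

row = {"personid": 3, "lastname": "Colonel"}
write_rows_event(primary_1, row)      # REPLACE succeeds locally as an insert
try:
    write_rows_event(primary_2, row)  # the replicated event hits primary-2
except DuplicateKeyError as e:
    print(e)                          # Duplicate entry '3' for key 'PRIMARY'
```

The same write that repairs one node therefore breaks replication on the other, which is why the fix below keeps every statement local.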

The commands we need to execute are the following:

set sql_log_bin=0;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');

However, as we just observed, if these statements were allowed to replicate, the first one would break replication with a duplicate key error. So, how do we resolve this? The safest option, in my humble opinion, is to execute the three REPLACE statements on each of the instances, first disabling binary logging for the session with this command:

PS-primary-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-1 > Bye


PS-primary-2 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-2 > Bye


PS-replica-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > set global super_read_only=1; set global read_only=1;
Query OK, 0 rows affected (0.00 sec)
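One detail worth stressing: sql_log_bin is session-scoped, so the SET and the REPLACE statements must travel in the same connection on every instance. A small Python sketch that builds one such per-host repair script (the helper name is ours; the rows and hosts are from this example):

```python
ROWS = [
    ("3", "Colonel", "Vil", "PER", "Lima", "2023-07-27 07:10:10"),
    ("1", "Iglesias", "Par", "IND", "Jaipur", "2023-07-27 07:10:10"),
    ("7", "Ewd", "Woo", "CAN", "Vancouver", "2023-07-28 07:10:10"),
]

def fix_script():
    """One repair session: disable binary logging first (session scope!),
    then REPLACE the three rows so nothing replicates to the other nodes."""
    stmts = ["SET sql_log_bin = 0;"]
    for personid, last, first, addr, city, ts in ROWS:
        stmts.append(
            "REPLACE INTO `foo`.`persons` "
            "(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) "
            f"VALUES ('{personid}', '{last}', '{first}', '{addr}', '{city}', '{ts}');"
        )
    return "\n".join(stmts)

print(fix_script().splitlines()[0])   # SET sql_log_bin = 0;
```

You would then feed the same script to each instance in a single session, e.g. `mysql -h 192.168.0.14 < fix.sql`; opening a new connection between the SET and the REPLACEs would lose the session setting and let the writes replicate.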

Finally, we verify that all instances have the same data, either with the SELECT statement or the pt-table-checksum command.

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:32      0      0        7          0       1       0   0.185 foo.persons

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons
Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:50      0      0        7          0       1       0   0.148 foo.persons

Conclusion

Maintaining this type of topology (dual primaries) can be a headache; it is prone to exactly this kind of problem, and resolving it is not always as simple as it may seem. The tools pt-table-checksum and pt-table-sync are essential for resolving these issues in most cases and for efficient verification, as in this case.

Remember that if you have any questions or concerns, you can always contact us.

Lastly, don’t forget to run pt-table-sync with the --print option to verify, and share with the client or data owner whether the data to be corrected and the proposed corrections seem appropriate. Always.

Let’s stay synchronized, and I’ll see you in the next blog!

 

Oct 4, 2023

What’s New in Percona Toolkit 3.5.5


Percona Toolkit 3.5.5 was released on October 3, 2023. This blog post covers the main changes in this release.

pt-online-schema-change improvements

After nibble hook

This was contributed by Marcelo HP Ferreira at https://github.com/percona/percona-toolkit/pull/645. The feature allows you to define a custom plugin that will be triggered after copying each nibble.

Here are a few examples of what you can print with this hook:

  • Copy rate
  • Number of rows per nibble
  • Run time per nibble
  • Chunk size when using dynamic chunks

You can find an example plugin in the sample file t/pt-online-schema-change/samples/plugins/on_copy_rows_after_nibble.pm
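The idea of the hook, sketched in Python: a chunked copy loop that invokes a user-supplied callback after each nibble. The names here are ours for illustration; the real plugin is written in Perl (see the sample file above).

```python
import time

def copy_table(chunks, after_nibble=None):
    """Copy a table one chunk ("nibble") at a time, firing an optional
    callback after each nibble -- the shape of the new hook."""
    copied = 0
    for n, chunk in enumerate(chunks, start=1):
        start = time.monotonic()
        copied += len(chunk)            # stand-in for the actual row copy
        if after_nibble:
            after_nibble(nibble=n, rows=len(chunk),
                         seconds=time.monotonic() - start, total=copied)
    return copied

stats = []
copy_table([[1, 2, 3], [4, 5], [6]],
           after_nibble=lambda **kw: stats.append((kw["nibble"], kw["rows"])))
print(stats)   # [(1, 3), (2, 2), (3, 1)]
```

From a callback like this you can derive copy rate, rows per nibble, run time per nibble, and the chunk size when dynamic chunking is in use, which are exactly the metrics listed above.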

Using the option --skip-check-slave-lag multiple times

pt-online-schema-change has always accepted multiple invocations of the option --skip-check-slave-lag but practically used only the first one; the other replicas were still checked for lag, as reported in PT-2241. A contribution by PinoCao (PR-622) fixes this issue. Now, this option works as expected: you can specify multiple replicas for which you do not want to check lag.

No failure on replica disconnect

This is one more feature that was broken before. As described in the user manual for the --max-lag option:

The tool waits forever for replicas to stop lagging. If any replica is stopped, the tool waits forever until the replica is started. The data copy continues when all replicas are running and not lagging too much.

However, this was not the case. On replica disconnect, pt-online-schema-change simply stopped working. This was especially unfortunate if it happened after a few hours of data copying. This behavior was reported in PT-2168 and PT-2096.

The reason why this feature did not work lies in a few places. One is debugging code from the fix for bug PT-1760 that was not removed. As a result, the tool died on replica disconnect instead of handling the error.

But even after removing this debugging code, there were multiple places in the code where the tool could die. The fix for PT-2168 adds checks for these situations and, depending on the option --fail-on-stopped-replication, either dies or continues waiting until the replica is available again. Now, pt-online-schema-change should die less frequently.

pt-online-schema-change had been failing on replica loss for many years, and users may rely on this behavior. Therefore, the option --fail-on-stopped-replication is enabled by default. This means you need to disable it (--nofail-on-stopped-replication) if you want pt-online-schema-change to continue copying data after one of the replicas goes down.

pt-online-schema-change also updates the replica list while waiting. If you use the option --recursion-method set to processlist or hosts, this happens automatically. If you use --recursion-method=dsn, you need to update the DSN table manually so the tool re-reads it and updates the list of replicas it monitors.
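The fixed behavior can be sketched as a wait loop that, instead of dying when a replica disconnects, either aborts or keeps waiting depending on the option (a hedged model of the described behavior, not the actual Perl code):

```python
def wait_for_replicas(poll_replica, fail_on_stopped_replication, max_polls=10):
    """Poll replica state: 'ok' lets the copy continue, 'lagging' keeps
    waiting, 'stopped' either aborts or keeps waiting, per the option."""
    for _ in range(max_polls):
        state = poll_replica()
        if state == "ok":
            return "copy continues"
        if state == "stopped" and fail_on_stopped_replication:
            raise RuntimeError("replica stopped; aborting (default behavior)")
        # 'lagging', or 'stopped' with the option disabled: keep waiting
    raise TimeoutError("gave up waiting in this sketch")

states = iter(["lagging", "stopped", "stopped", "ok"])
print(wait_for_replicas(lambda: next(states), fail_on_stopped_replication=False))
# copy continues -- the tool survived the disconnect and resumed
```

With fail_on_stopped_replication=True (the default), the second poll would abort instead of waiting for the replica to come back.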

Kubernetes support

Percona Toolkit ships with the pt-k8s-debug-collector tool, which collects diagnostic data from Percona Kubernetes operators. It gathers system information together with the output of the pt-mysql-summary, pt-mongodb-summary, or pg_gather tools, depending on the operator. In version 3.5.5, support for version two of the PostgreSQL operator has been added. We plan to add new features to this tool in every release, making it a Swiss army knife for troubleshooting Percona Kubernetes Operators.

Quality improvements

Perl tools in Percona Toolkit come with a regression test suite that uses its own module to fire up a MySQL sandbox (lib/Sandbox.pm) and an extension of the Perl module Test::More (lib/PerconaTest.pm). Unfortunately, in the last few years not all tests were stable, and the suite was not run on each code change. After noticing this, we started fixing the existing tests so they do not fail due to changes in new MySQL versions or environmental issues. The fix for PT-2156 (Fix tests for lib) finishes this effort. We now run tests locally before releasing the product, and we plan to make regression tests part of the release process starting from the next version.

Another move in this direction is standardizing the output of common options, such as the format of the option --version. This was done as part of the fixes for PT-2235 (pt-mongodb-index-check does not support option --version) and PT-2236 (pt-secure-collect, pt-pg-summary do not follow the PT standard for option --version).

This release also fixes typos and trailing-whitespace issues. Viktor Szépe contributed these changes, and he also updated our GitHub workflows so that such issues should not show up in the future.

Quality control is not only about code fixes but also about testing. For the latest releases, we have received CVE reports for the Go tools from Kushal Haldar. These reports cover more vulnerabilities than the open source scanners we use can find, and they help us improve the quality of the Go-based tools.

Community contributions

Percona Toolkit continuously receives contributions. In the latest releases, we try to accept as many contributions as possible. At the same time, we started requiring minimal quality control, such as adding test cases and running regression tests before accepting the patch. There are still a few pull requests waiting for final resolution, but we hope we will process all of them by the end of the year.

For this release, we want to thank:

  • Kushal Haldar for his reports about vulnerabilities in Go-based tools
  • Viktor Szépe for pull requests improving the quality of our code
  • Marcelo HP Ferreira for the after-nibble hook in pt-online-schema-change
  • PinoCao for the --skip-check-slave-lag bug fix

Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL, MariaDB, MongoDB, and PostgreSQL server and system tasks that are too difficult or complex to perform manually.

 

Download Percona Toolkit

Oct 2, 2023

Percona Toolkit – A Quick Introduction to Summary



You can get summaries of your database servers, or verify replication lag on MySQL and PostgreSQL servers. You get thirty-eight scripts that can perform all manner of actions, and you will find them very valuable in your regular database work. And did I mention they are open source?!

Percona Toolkit may already be available in the default repositories of many Linux distros, such as Fedora and Debian, as well as through Homebrew. And you can get it from us at Percona.

Installation is easy since the software comes in popular package formats. Follow the guide to installing the Percona repositories and then run either sudo apt-get install percona-toolkit or sudo yum install percona-toolkit.

The one tricky part is installing the user-defined functions (UDFs) needed by a few of the tools. I ran into the common error message:

ERROR 1126 (HY000): Can't open shared library 'fnv_udf.so' (errno: 22 fnv_udf.so: cannot open shared object file: No such file or directory)

But it is very simple to get the shared objects copied to where they need to be located.

sudo cp -v /usr/lib/mysql/plugin/libfnv_udf.so /usr/lib
sudo cp -v /usr/lib/mysql/plugin/libfnv1a_udf.so /usr/lib
sudo cp -v /usr/lib/mysql/plugin/libmurmur_udf.so /usr/lib

Then, those shared objects need to be loaded into the server:

mysql -e "CREATE FUNCTION fnv1a_64 RETURNS INTEGER SONAME 'libfnv1a_udf.so'" -u root -p
mysql -e "CREATE FUNCTION fnv_64 RETURNS INTEGER SONAME 'libfnv_udf.so'" -u root -p
mysql -e "CREATE FUNCTION murmur_hash RETURNS INTEGER SONAME 'libmurmur_udf.so'" -u root -p
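As a sanity check after loading the UDFs, it can help to know what fnv1a_64 actually computes: the standard 64-bit FNV-1a hash. Here is a pure-Python reference of that algorithm (ours, for comparison; the UDF is the C implementation shipped with the toolkit):

```python
FNV64_OFFSET = 0xcbf29ce484222325   # FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001b3         # FNV-1a 64-bit prime

def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the
    prime, truncating to 64 bits at every step."""
    h = FNV64_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h

print(hex(fnv1a_64(b"")))   # 0xcbf29ce484222325 (the offset basis)
```

Comparing this function's output against `SELECT fnv1a_64(...)` on the server is a quick way to confirm the UDF loaded correctly.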

 

Pt-summary, pt-mysql-summary, and pt-pg-summary

For an example of what the Percona Toolkit offers, look at the output from pt-summary. This is a handy way to determine the basic parameters that the server is using.

$ pt-summary --user=root
# Percona Toolkit System Summary Report ######################
Date | 2023-09-25 14:43:34 UTC (local TZ: CDT -0500)
Hostname | testbox
Uptime | 44 min, 1 user, load average: 0.47, 0.67, 0.57
Platform | Linux
Release | Ubuntu 22.04.3 LTS (jammy)
Kernel | 6.2.0-33-generic
Architecture | CPU = 64-bit, OS = 64-bit
Threading | NPTL 2.35
SELinux | No SELinux detected
Virtualized | No virtualization detected
# Processor ##################################################
Processors | physical = 1, cores = 2, virtual = 4, hyperthreading = yes
Speeds | 1x3000.651, 1x3061.574, 1x3069.932, 1x3093.881
Models | 4xIntel(R) Core(TM) i7-6500U CPU @ 2.50GHz
Caches | 4x4096 KB
# Memory #####################################################
Total | 15.4G
Free | 9.4G
Used | physical = 2.2G, swap allocated = 2.0G, swap used = 0.0, virtual = 2.2G
Shared | 475.0M
Buffers | 3.8G
Caches | 12.4G
Dirty | 928 kB
UsedRSS | 5.8G
Swappiness | 60
DirtyPolicy | 20, 10
DirtyStatus | 0, 0
Locator Size Speed Form Factor Type Type Detail
========= ======== ================= ============= ============= ===========
# Mounted Filesystems ########################################
Filesystem Size Used Type Opts Mountpoint
/dev/nvme0n1p1 511M 2% vfat rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro /boot/efi
/dev/nvme0n1p2 457G 4% ext4 rw,relatime,errors=remount-ro /
tmpfs 1.6G 1% tmpfs rw,nosuid,nodev,inode64 /run
tmpfs 1.6G 1% tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 /run
tmpfs 1.6G 1% tmpfs rw,nosuid,nodev,noexec,relatime,size=1612688k,mode=755,inode64 /run
tmpfs 1.6G 1% tmpfs rw,nosuid,nodev,noexec,relatime,size=1612688k,mode=755,inode64 /run
tmpfs 1.6G 1% tmpfs rw,nosuid,nodev,relatime,size=1612684k,nr_inodes=403171,mode=700,uid=1000,gid=1000,inode64 /run
tmpfs 1.6G 1% tmpfs rw,nosuid,nodev,inode64 /run/user/1000
tmpfs 1.6G 1% tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 /run/user/1000
tmpfs 1.6G 1% tmpfs rw,nosuid,nodev,noexec,relatime,size=1612688k,mode=755,inode64 /run/user/1000
tmpfs 1.6G 1% tmpfs rw,nosuid,nodev,noexec,relatime,size=1612688k,mode=755,inode64 /run/user/1000
tmpfs 1.6G 1% tmpfs rw,nosuid,nodev,relatime,size=1612684k,nr_inodes=403171,mode=700,uid=1000,gid=1000,inode64 /run/user/1000
tmpfs 5.0M 1% tmpfs rw,nosuid,nodev,inode64 /run/lock
tmpfs 5.0M 1% tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 /run/lock
tmpfs 5.0M 1% tmpfs rw,nosuid,nodev,noexec,relatime,size=1612688k,mode=755,inode64 /run/lock
tmpfs 5.0M 1% tmpfs rw,nosuid,nodev,noexec,relatime,size=1612688k,mode=755,inode64 /run/lock
tmpfs 5.0M 1% tmpfs rw,nosuid,nodev,relatime,size=1612684k,nr_inodes=403171,mode=700,uid=1000,gid=1000,inode64 /run/lock
tmpfs 7.7G 2% tmpfs rw,nosuid,nodev,inode64 /dev/shm
tmpfs 7.7G 2% tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 /dev/shm
tmpfs 7.7G 2% tmpfs rw,nosuid,nodev,noexec,relatime,size=1612688k,mode=755,inode64 /dev/shm
tmpfs 7.7G 2% tmpfs rw,nosuid,nodev,noexec,relatime,size=1612688k,mode=755,inode64 /dev/shm
tmpfs 7.7G 2% tmpfs rw,nosuid,nodev,relatime,size=1612684k,nr_inodes=403171,mode=700,uid=1000,gid=1000,inode64 /dev/shm
# Disk Schedulers And Queue Size #############################
nvme0n1 | [none] 1023
sda | [mq-deadline] 2
# Disk Partioning ############################################
# Kernel Inode State #########################################
dentry-state | 324004 295487 45 0 21657 0
file-nr | 9792 0 9223372036854775807
inode-nr | 301794 491
# LVM Volumes ################################################
Unable to collect information
# LVM Volume Groups ##########################################
Unable to collect information
# RAID Controller ############################################
Controller | No RAID controller detected
# Network Config #############################################
Controller | Intel Corporation Ethernet Connection I219-V (rev 21)
FIN Timeout | 60
Port Range | 60999
# Interface Statistics #######################################
interface rx_bytes rx_packets rx_errors tx_bytes tx_packets tx_errors
========= ========= ========== ========== ========== ========== ==========
lo 400000 3500 0 400000 3500 0
enp0s31f6 350000000 250000 0 15000000 125000 0
wlp4s0 500000 3500 0 300000 2250 0
# Network Connections ########################################
Connections from remote IP addresses
18.238.132.68 1
52.55.200.219 1
104.17.24.14 1
104.18.28.243 1
142.251.4.188 2
147.135.54.159 1
162.247.241.2 1
172.67.218.245 1
192.168.1.105 1
Connections to local IP addresses
192.168.1.238 10
Connections to top 10 local ports
33332 1
35414 1
42744 1
47926 1
50922 1
52888 1
58112 1
60468 1
60748 1
60752 1
States of connections
ESTABLISHED 8
LISTEN 6
TIME_WAIT 2
# Top Processes ##############################################
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8582 stoker 20 0 13212 4096 3328 R 12.5 0.0 0:00.02 top
1 root 20 0 168020 12908 8044 S 0.0 0.1 0:04.41 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par+
5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 slub_fl+
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 netns
8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker+
10 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_perc+
# Notable Processes ##########################################
PID OOM COMMAND
? ? sshd doesn't appear to be running
# Simplified and fuzzy rounded vmstat (wait please) ##########
procs ---swap-- -----io---- ---system---- --------cpu--------
r b si so bi bo ir cs us sy il wa st
1 0 0 0 150 225 250 600 7 2 90 2 0
0 0 0 0 0 90 900 3000 3 4 92 1 0
0 0 0 0 0 350 600 1250 2 1 93 4 0
0 0 0 0 0 225 700 1500 3 2 93 2 0
0 0 0 0 0 100 800 1500 1 12 87 0 0
# Memory management ##########################################
Transparent huge pages are enabled.
# The End ####################################################

The specifics for a MySQL server instance, covering the details you will want to know about it, can easily be had by using pt-mysql-summary.

# Percona Toolkit MySQL Summary Report #######################
System time | 2023-09-25 15:50:16 UTC (local TZ: CDT -0500)
# Instances ##################################################
Port Data Directory Nice OOM Socket
===== ========================== ==== === ======
# MySQL Executable ###########################################
Path to executable | /usr/sbin/mysqld
Has symbols | 
# Slave Hosts ################################################
No slaves found
# Report On Port 3306 ########################################
User | root@localhost
Time | 2023-09-25 10:50:16 (CDT)
Hostname | testbox
Version | 8.0.33-25 Percona Server (GPL), Release '25', Revision '60c9e2c5'
Built On | Linux x86_64
Started | 2023-09-25 09:22 (up 0+01:28:02)
Databases | 4
Datadir | /var/lib/mysql/
Processes | 1 connected, 2 running
Replication | Is not a slave, has 0 slaves connected
Pidfile | /var/run/mysqld/mysqld.pid (exists)
# Processlist ################################################

Command COUNT(*) Working SUM(Time) MAX(Time)
------------------------------ -------- ------- --------- ---------
Daemon 1 1 5000 5000
Query 1 1 0 0

User COUNT(*) Working SUM(Time) MAX(Time)
------------------------------ -------- ------- --------- ---------
event_scheduler 1 1 5000 5000
root 1 1 0 0

Host COUNT(*) Working SUM(Time) MAX(Time)
------------------------------ -------- ------- --------- ---------
localhost 2 2 5000 5000

db COUNT(*) Working SUM(Time) MAX(Time)
------------------------------ -------- ------- --------- ---------
NULL 2 2 5000 5000

State COUNT(*) Working SUM(Time) MAX(Time)
------------------------------ -------- ------- --------- ---------
init 1 1 0 0
Waiting on empty queue 1 1 5000 5000

# Status Counters (Wait 10 Seconds) ##########################
Variable Per day Per second 11 secs
Aborted_connects 150 
Binlog_snapshot_position 12500 
Binlog_cache_use 50 
Bytes_received 200000 2 500
Bytes_sent 2250000 25 3000
Com_change_db 15 
Com_create_db 15 
Com_create_table 600 
Com_create_udf 100 
Com_flush 15 
Com_select 700 2
Com_set_option 70 
Com_show_binlogs 15 
Com_show_databases 15 
Com_show_engine_status 15 
Com_show_master_status 15 
Com_show_plugins 15 
Com_show_processlist 15 
Com_show_replicas 15 
Com_show_slave_hosts 15 
Com_show_replica_status 15 
Com_show_slave_status 15 
Com_show_status 80 
Com_show_storage_engines 15 
Com_show_variables 35 
Connections 800 1
Created_tmp_disk_tables 35 
Created_tmp_files 80 
Created_tmp_tables 175 
Error_log_buffered_bytes 50000 
Error_log_buffered_events 350 
Error_log_latest_write 30000000000000000 300000000000 
Flush_commands 50 
Handler_commit 10000 
Handler_delete 125 
Handler_external_lock 100000 1 2
Handler_prepare 200 
Handler_read_first 800 
Handler_read_key 30000 1
Handler_read_next 70000 
Handler_read_rnd_next 125000 1 100
Handler_rollback 50 
Handler_update 5000 
Handler_write 60000 50
Innodb_buffer_pool_bytes_data 250000000 3000 
Innodb_buffer_pool_pages_flushed 3500 
Innodb_buffer_pool_pages_made_not_young 35 
Innodb_buffer_pool_pages_made_young 250 
Innodb_buffer_pool_pages_old 6000 
Innodb_buffer_pool_read_requests 250000 3 7
Innodb_buffer_pool_reads 12500 
Innodb_buffer_pool_write_requests 35000 4
Innodb_checkpoint_max_age 1500000000 17500 
Innodb_data_fsyncs 2500 
Innodb_data_read 225000000 2500 
Innodb_data_reads 15000 
Innodb_data_writes 6000 
Innodb_data_written 60000000 700 1500
Innodb_dblwr_pages_written 1250 
Innodb_dblwr_writes 350 
Innodb_ibuf_segment_size 35 
Innodb_redo_log_uuid 17500000000 200000 
Innodb_redo_log_checkpoint_lsn 300000000 3500 
Innodb_redo_log_current_lsn 300000000 3500 
Innodb_redo_log_flushed_to_disk_lsn 300000000 3500 
Innodb_redo_log_logical_size 8000 
Innodb_redo_log_physical_size 50000000 600 
Innodb_redo_log_capacity_resized 1750000000 20000 
Innodb_log_write_requests 15000 
Innodb_log_writes 1000 
Innodb_lsn_current 300000000 3500 
Innodb_lsn_flushed 300000000 3500 
Innodb_lsn_last_checkpoint 300000000 3500 
Innodb_master_thread_active_loops 125 
Innodb_master_thread_idle_loops 90000 
Innodb_max_trx_id 35000 
Innodb_os_log_fsyncs 700 
Innodb_os_log_written 1250000 15 
Innodb_pages_created 2500 
Innodb_pages_read 12500 
Innodb_pages0_read 125 
Innodb_pages_written 3500 
Innodb_purge_trx_id 35000 
Innodb_rows_inserted 1500 4
Innodb_rows_read 1500 4
Innodb_system_rows_deleted 125 
Innodb_system_rows_inserted 175 
Innodb_system_rows_read 80000 2
Innodb_system_rows_updated 5000 
Innodb_num_open_files 250 
Innodb_undo_tablespaces_total 35 
Innodb_undo_tablespaces_implicit 35 
Innodb_undo_tablespaces_active 35 
Innodb_secondary_index_triggered_cluster_reads 35000 
Mysqlx_port 500000 6 
Mysqlx_ssl_ctx_verify_mode 80 
Mysqlx_worker_threads 35 
Net_buffer_length 250000 3 
Open_table_definitions 800 
Opened_files 35 
Opened_table_definitions 1250 
Opened_tables 3500 1
Performance_schema_session_connect_attrs_longest_seen 2000 
Queries 1500 4
Questions 1500 4
Select_full_join 50 
Select_scan 350 
Sort_rows 70 
Sort_scan 15 
Ssl_session_cache_timeout 5000 
Table_locks_immediate 125 
Table_open_cache_hits 50000 
Table_open_cache_misses 3500 1
Threads_created 35 
Uptime 90000 1 1
# Table cache ################################################
Size | 4000
Usage | 3%
# Key Percona Server features ################################
Table & Index Stats | Disabled
Multiple I/O Threads | Enabled
Corruption Resilient | Enabled
Durable Replication | Not Supported
Import InnoDB Tables | Not Supported
Fast Server Restarts | Not Supported
Enhanced Logging | Disabled
Replica Perf Logging | Disabled
Response Time Hist. | Not Supported
Smooth Flushing | Not Supported
HandlerSocket NoSQL | Not Supported
Fast Hash UDFs | Enabled
# Percona XtraDB Cluster #####################################
# Plugins ####################################################
InnoDB compression | ACTIVE
# Schema #####################################################
Specify --databases or --all-databases to dump and summarize schemas
# Noteworthy Technologies ####################################
SSL | No
Explicit LOCK TABLES | No
Delayed Insert | No
XA Transactions | No
NDB Cluster | No
Prepared Statements | No
Prepared statement count | 0
# InnoDB #####################################################
Version | 8.0.33-25
Buffer Pool Size | 128.0M
Buffer Pool Fill | 10%
Buffer Pool Dirty | 0%
File Per Table | ON
Page Size | 16k
Log File Size | 2 * 48.0M = 96.0M
Log Buffer Size | 16M
Flush Method | fsync
Flush Log At Commit | 1
XA Support | 
Checksums | 
Doublewrite | ON
R/W I/O Threads | 4 4
I/O Capacity | 200
Thread Concurrency | 0
Concurrency Tickets | 5000
Commit Concurrency | 0
Txn Isolation Level | 
Adaptive Flushing | ON
Adaptive Checkpoint | 
Checkpoint Age | 0
InnoDB Queue | 0 queries inside InnoDB, 0 queries in queue
Oldest Transaction | 0 Seconds
History List Len | 0
Read Views | 0
Undo Log Entries | 0 transactions, 0 total undo, 0 max undo
Pending I/O Reads | 0 buf pool reads, 0 normal AIO, 0 ibuf AIO, 0 preads
Pending I/O Writes | 0 buf pool (0 LRU, 0 flush list, 0 page); 0 AIO, 0 sync, 0 log IO (0 log, 0 chkp); 0 pwrites
Pending I/O Flushes | 0 buf pool, 0 log
Transaction States | 3xnot started
# MyISAM #####################################################
Key Cache | 8.0M
Pct Used | 20%
Unflushed | 0%
# Security ###################################################
Users | 1 users, 0 anon, 0 w/o pw, 1 old pw
Old Passwords | 
# Encryption #################################################
No keyring plugins found
# Binary Logging #############################################
Binlogs | 2
Zero-Sized | 0
Total Size | 1.3k
binlog_format | ROW
expire_logs_days | 0
sync_binlog | 1
server_id | 1
binlog_do_db | 
binlog_ignore_db | 
# Noteworthy Variables #######################################
Auto-Inc Incr/Offset | 1/1
default_storage_engine | InnoDB
flush_time | 0
init_connect | 
init_file | 
sql_mode | ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
join_buffer_size | 256k
sort_buffer_size | 256k
read_buffer_size | 128k
read_rnd_buffer_size | 256k
bulk_insert_buffer | 0.00
max_heap_table_size | 16M
tmp_table_size | 16M
max_allowed_packet | 64M
thread_stack | 1M
log | 
log_error | /var/log/mysql/error.log
log_warnings | 
log_slow_queries | 
log_queries_not_using_indexes | OFF
log_slave_updates | ON
# Configuration File #########################################
Config File | /etc/mysql/my.cnf
# /etc/mysql/conf.d/mysql.cnf

[mysqld]
# /etc/mysql/conf.d/mysqldump.cnf

[mysqldump]
quick
quote-names
max_allowed_packet = 16M
# /etc/mysql/mysql.conf.d/mysqld.cnf

[mysqld]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
datadir = /var/lib/mysql
log-error = /var/log/mysql/error.log
# Memory management library ##################################
jemalloc is not enabled in mysql config for process with id 6899
# The End ####################################################

And pt-pg-summary is the analog for PostgreSQL servers.

$ pt-pg-summary -U postgres
INFO[0000] Connecting to the database server using: sslmode=disable dbname=postgres 
INFO[0000] Connection OK 
INFO[0000] Detected PostgreSQL version: 16.0.0 
INFO[0000] Getting global information 
INFO[0000] Collecting global counters (1st pass) 
INFO[0000] Collecting Cluster information 
INFO[0000] Waiting 10 seconds to read counters 
INFO[0000] Collecting Connected Clients information 
INFO[0000] Collecting Database Wait Events information 
INFO[0000] Collecting Global Wait Events information 
INFO[0000] Collecting Port and Data Dir information 
INFO[0000] Collecting Tablespaces information 
INFO[0000] Collecting Instance Settings information 
INFO[0000] Collecting Slave Hosts (PostgreSQL 10+) 
INFO[0000] Waiting for counters information 
INFO[0010] Collecting global counters (2nd pass) 
INFO[0010] Collecting processes command line information 
INFO[0010] Finished collecting global information 
INFO[0010] Collecting per database information 
INFO[0010] Connecting to the "postgres" database 
INFO[0010] Collecting Table Access information 
INFO[0010] Collecting Table Cache Hit Ratio information 
INFO[0010] Collecting Index Cache Hit Ratio information 
INFO[0010] Connecting to the "test" database 
INFO[0010] Collecting Table Access information 
INFO[0010] Collecting Table Cache Hit Ratio information 
INFO[0010] Collecting Index Cache Hit Ratio information 
INFO[0010] Connecting to the "davetest" database 
INFO[0010] Collecting Table Access information 
INFO[0010] Collecting Table Cache Hit Ratio information 
INFO[0010] Collecting Index Cache Hit Ratio information 
INFO[0010] Connecting to the "postgis" database 
INFO[0010] Collecting Table Access information 
INFO[0010] Collecting Table Cache Hit Ratio information 
INFO[0010] Collecting Index Cache Hit Ratio information

##### --- Database Port and Data_Directory --- ####
+----------------------+----------------------------------------------------+
| Name | Setting |
+----------------------+----------------------------------------------------+
| data_directory | /var/lib/postgresql/16/main |
+----------------------+----------------------------------------------------+

##### --- List of Tablespaces ---- ######
+----------------------+----------------------+----------------------------------------------------+
| Name | Owner | Location |
+----------------------+----------------------+----------------------------------------------------+
| pg_default | postgres | |
| pg_global | postgres | |
+----------------------+----------------------+----------------------------------------------------+

##### --- Slave and the lag with Master --- ####
There are no slave hosts

##### --- Cluster Information --- ####
+------------------------------------------------------------------------------------------------------+
Usename : postgres
Time : 2023-09-25 10:23:17.068273 -0500 CDT
Client Address : 127.0.0.1
Client Hostname: 
Version : PostgreSQL 16.0 - Percona Distribution on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11....
Started : 2023-09-25 08:58:45.535777 -0500 CDT
Is Slave : false
+------------------------------------------------------------------------------------------------------+

##### --- Databases --- ####
+----------------------+------------+
| Dat Name | Size |
+----------------------+------------+
| postgres | 7508 kB |
| test | 7644 kB |
| template1 | 7580 kB |
| template0 | 7353 kB |
| davetest | 7596 kB |
| postgis | 7580 kB |
+----------------------+------------+

##### --- Index Cache Hit Ratios --- ####

Database: davetest
+----------------------+------------+
| Index Name | Ratio |
+----------------------+------------+
| index hit rate | 0.00 |
+----------------------+------------+

Database: postgis
+----------------------+------------+
| Index Name | Ratio |
+----------------------+------------+
| index hit rate | 0.00 |
+----------------------+------------+

Database: postgres
+----------------------+------------+
| Index Name | Ratio |
+----------------------+------------+
| index hit rate | 0.00 |
+----------------------+------------+

Database: test
+----------------------+------------+
| Index Name | Ratio |
+----------------------+------------+
| index hit rate | 0.00 |
+----------------------+------------+

##### --- Table Cache Hit Ratios --- ####
Database: davetest
+----------------------+------------+
| Index Name | Ratio |
+----------------------+------------+
| cache hit rate | 0.00 |
+----------------------+------------+
Database: postgis
+----------------------+------------+
| Index Name | Ratio |
+----------------------+------------+
| cache hit rate | 0.00 |
+----------------------+------------+
Database: postgres
+----------------------+------------+
| Index Name | Ratio |
+----------------------+------------+
| cache hit rate | 0.00 |
+----------------------+------------+
Database: test
+----------------------+------------+
| Index Name | Ratio |
+----------------------+------------+
| cache hit rate | 0.00 |
+----------------------+------------+

##### --- List of Wait_events for the entire Cluster - all-databases --- ####
+----------------------+----------------------+---------+
| Wait Event Type | Event | Count |
+----------------------+----------------------+---------+
| Activity | BgWriterHibernate | 1 |
| Activity | WalWriterMain | 1 |
| Activity | LogicalLauncherMain | 1 |
| Activity | CheckpointerMain | 1 |
| Activity | AutoVacuumMain | 1 |
+----------------------+----------------------+---------+

##### --- List of users and client_addr or client_hostname connected to --all-databases --- ####
+----------------------+------------+---------+----------------------+---------+
| Wait Event Type | Client | State | Count |
+----------------------+------------+---------+----------------------+---------+
| postgres | 127.0.0.1/32 | active | 1 |
+----------------------+------------+---------+----------------------+---------+

##### --- Counters diff after 10 seconds --- ####

+----------------------+-------------+------------+--------------+-------------+------------+-------------+------------+-------------+------------+------------+-----------+-----------+-----------+------------+
| Database | Numbackends | XactCommit | XactRollback | BlksRead | BlksHit | TupReturned | TupFetched | TupInserted | TupUpdated | TupDeleted | Conflicts | TempFiles | TempBytes | Deadlocks |
+----------------------+-------------+------------+--------------+-------------+------------+-------------+------------+-------------+------------+------------+-----------+-----------+-----------+------------+
| | 0 | 0 | 0 | 0 | 25 | 68 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| davetest | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| postgis | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| postgres | 0 | 13 | 0 | 0 | 1480 | 1155 | 1155 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| template0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| template1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| test | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+----------------------+-------------+------------+--------------+-------------+------------+-------------+------------+-------------+------------+------------+-----------+-----------+-----------+------------+

##### --- Table access per database --- ####
Database: davetest
+----------------------------------------------------+------+--------------------------------+---------+
| Relname | Kind | Datname | Count |
+----------------------------------------------------+------+--------------------------------+---------+
| pg_database_datname_index | i | | 1 |
| pg_database | r | | 1 |
| pg_class | r | davetest | 1 |
| pg_locks | v | davetest | 1 |
| pg_class_oid_index | i | davetest | 1 |
| pg_stat_database | v | davetest | 1 |
| pg_database_oid_index | i | | 1 |
| pg_class_relname_nsp_index | i | davetest | 1 |
| pg_class_tblspc_relfilenode_index | i | davetest | 1 |
+----------------------------------------------------+------+--------------------------------+---------+
Database: postgis
+----------------------------------------------------+------+--------------------------------+---------+
| Relname | Kind | Datname | Count |
+----------------------------------------------------+------+--------------------------------+---------+
| pg_locks | v | postgis | 1 |
| pg_class_tblspc_relfilenode_index | i | postgis | 1 |
| pg_class_relname_nsp_index | i | postgis | 1 |
| pg_database_datname_index | i | | 1 |
| pg_class | r | postgis | 1 |
| pg_database | r | | 1 |
| pg_stat_database | v | postgis | 1 |
| pg_class_oid_index | i | postgis | 1 |
| pg_database_oid_index | i | | 1 |
+----------------------------------------------------+------+--------------------------------+---------+
Database: postgres
+----------------------------------------------------+------+--------------------------------+---------+
| Relname | Kind | Datname | Count |
+----------------------------------------------------+------+--------------------------------+---------+
| pg_database_datname_index | i | | 1 |
| pg_class | r | postgres | 1 |
| pg_database | r | | 1 |
| pg_stat_database | v | postgres | 1 |
| pg_class_relname_nsp_index | i | postgres | 1 |
| pg_class_oid_index | i | postgres | 1 |
| pg_database_oid_index | i | | 1 |
| pg_class_tblspc_relfilenode_index | i | postgres | 1 |
| pg_locks | v | postgres | 1 |
+----------------------------------------------------+------+--------------------------------+---------+
Database: test
+----------------------------------------------------+------+--------------------------------+---------+
| Relname | Kind | Datname | Count |
+----------------------------------------------------+------+--------------------------------+---------+
| pg_database_datname_index | i | | 1 |
| pg_class_oid_index | i | test | 1 |
| pg_class_tblspc_relfilenode_index | i | test | 1 |
| pg_database | r | | 1 |
| pg_locks | v | test | 1 |
| pg_class | r | test | 1 |
| pg_database_oid_index | i | | 1 |
| pg_stat_database | v | test | 1 |
| pg_class_relname_nsp_index | i | test | 1 |
+----------------------------------------------------+------+--------------------------------+---------+

##### --- Instance settings --- ####
Setting Value 
allow_in_place_tablespaces : off
allow_system_table_mods : off
application_name : 
archive_cleanup_command : 
archive_command : (disabled)
archive_library : 
archive_mode : off
archive_timeout : 0
array_nulls : on
authentication_timeout : 60
autovacuum : on
autovacuum_analyze_scale_factor : 0.1
autovacuum_analyze_threshold : 50
autovacuum_freeze_max_age : 200000000
autovacuum_max_workers : 3
autovacuum_multixact_freeze_max_age : 400000000
autovacuum_naptime : 60
autovacuum_vacuum_cost_delay : 2
autovacuum_vacuum_cost_limit : -1
autovacuum_vacuum_insert_scale_factor : 0.2
autovacuum_vacuum_insert_threshold : 1000
autovacuum_vacuum_scale_factor : 0.2
autovacuum_vacuum_threshold : 50
autovacuum_work_mem : -1
backend_flush_after : 0
backslash_quote : safe_encoding
backtrace_functions : 
bgwriter_delay : 200
bgwriter_flush_after : 64
bgwriter_lru_maxpages : 100
bgwriter_lru_multiplier : 2
block_size : 8192
bonjour : off
bonjour_name : 
bytea_output : hex
check_function_bodies : on
checkpoint_completion_target : 0.9
checkpoint_flush_after : 32
checkpoint_timeout : 300
checkpoint_warning : 30
client_connection_check_interval : 0
client_encoding : UTF8
client_min_messages : notice
cluster_name : 16/main
commit_delay : 0
commit_siblings : 5
compute_query_id : auto
config_file : /etc/postgresql/16/main/postgresql.conf
constraint_exclusion : partition
cpu_index_tuple_cost : 0.005
cpu_operator_cost : 0.0025
cpu_tuple_cost : 0.01
createrole_self_grant : 
cursor_tuple_fraction : 0.1
data_checksums : off
data_directory : /var/lib/postgresql/16/main
data_directory_mode : 0700
data_sync_retry : off
DateStyle : ISO, MDY
db_user_namespace : off
deadlock_timeout : 1000
debug_assertions : off
debug_discard_caches : 0
debug_io_direct : 
debug_logical_replication_streaming : buffered
debug_parallel_query : off
debug_pretty_print : on
debug_print_parse : off
debug_print_plan : off
debug_print_rewritten : off
default_statistics_target : 100
default_table_access_method : heap
default_tablespace : 
default_text_search_config : pg_catalog.english
default_toast_compression : pglz
default_transaction_deferrable : off
default_transaction_isolation : read committed
default_transaction_read_only : off
dynamic_library_path : $libdir
dynamic_shared_memory_type : posix
effective_cache_size : 524288
effective_io_concurrency : 1
enable_async_append : on
enable_bitmapscan : on
enable_gathermerge : on
enable_hashagg : on
enable_hashjoin : on
enable_incremental_sort : on
enable_indexonlyscan : on
enable_indexscan : on
enable_material : on
enable_memoize : on
enable_mergejoin : on
enable_nestloop : on
enable_parallel_append : on
enable_parallel_hash : on
enable_partition_pruning : on
enable_partitionwise_aggregate : off
enable_partitionwise_join : off
enable_presorted_aggregate : on
enable_seqscan : on
enable_sort : on
enable_tidscan : on
escape_string_warning : on
event_source : PostgreSQL
exit_on_error : off
extension_destdir : 
external_pid_file : /var/run/postgresql/16-main.pid
extra_float_digits : 2
from_collapse_limit : 8
fsync : on
full_page_writes : on
geqo : on
geqo_effort : 5
geqo_generations : 0
geqo_pool_size : 0
geqo_seed : 0
geqo_selection_bias : 2
geqo_threshold : 12
gin_fuzzy_search_limit : 0
gin_pending_list_limit : 4096
gss_accept_delegation : off
hash_mem_multiplier : 2
hba_file : /etc/postgresql/16/main/pg_hba.conf
hot_standby : on
hot_standby_feedback : off
huge_page_size : 0
huge_pages : try
icu_validation_level : warning
ident_file : /etc/postgresql/16/main/pg_ident.conf
idle_in_transaction_session_timeout : 0
idle_session_timeout : 0
ignore_checksum_failure : off
ignore_invalid_pages : off
ignore_system_indexes : off
in_hot_standby : off
integer_datetimes : on
IntervalStyle : postgres
jit : on
jit_above_cost : 100000
jit_debugging_support : off
jit_dump_bitcode : off
jit_expressions : on
jit_inline_above_cost : 500000
jit_optimize_above_cost : 500000
jit_profiling_support : off
jit_provider : llvmjit
jit_tuple_deforming : on
join_collapse_limit : 8
krb_caseins_users : off
krb_server_keyfile : FILE:/etc/postgresql-common/krb5.keytab
lc_messages : en_US.UTF-8
lc_monetary : en_US.UTF-8
lc_numeric : en_US.UTF-8
lc_time : en_US.UTF-8
listen_addresses : localhost
lo_compat_privileges : off
local_preload_libraries : 
lock_timeout : 0
log_autovacuum_min_duration : 600000
log_checkpoints : on
log_connections : off
log_destination : stderr
log_directory : log
log_disconnections : off
log_duration : off
log_error_verbosity : default
log_executor_stats : off
log_file_mode : 0600
log_filename : postgresql-%Y-%m-%d_%H%M%S.log
log_hostname : off
log_line_prefix : %m [%p] %q%u@%d 
log_lock_waits : off
log_min_duration_sample : -1
log_min_duration_statement : -1
log_min_error_statement : error
log_min_messages : warning
log_parameter_max_length : -1
log_parameter_max_length_on_error : 0
log_parser_stats : off
log_planner_stats : off
log_recovery_conflict_waits : off
log_replication_commands : off
log_rotation_age : 1440
log_rotation_size : 10240
log_startup_progress_interval : 10000
log_statement : none
log_statement_sample_rate : 1
log_statement_stats : off
log_temp_files : -1
log_timezone : America/Chicago
log_transaction_sample_rate : 0
log_truncate_on_rotation : off
logging_collector : off
logical_decoding_work_mem : 65536
maintenance_io_concurrency : 10
maintenance_work_mem : 65536
max_connections : 100
max_files_per_process : 1000
max_function_args : 100
max_identifier_length : 63
max_index_keys : 32
max_locks_per_transaction : 64
max_logical_replication_workers : 16
max_parallel_apply_workers_per_subscription : 2
max_parallel_maintenance_workers : 2
max_parallel_workers : 8
max_parallel_workers_per_gather : 2
max_pred_locks_per_page : 2
max_pred_locks_per_relation : -2
max_pred_locks_per_transaction : 64
max_prepared_transactions : 0
max_replication_slots : 10
max_slot_wal_keep_size : -1
max_stack_depth : 2048
max_standby_archive_delay : 30000
max_standby_streaming_delay : 30000
max_sync_workers_per_subscription : 2
max_wal_senders : 10
max_wal_size : 1024
max_worker_processes : 8
min_dynamic_shared_memory : 0
min_parallel_index_scan_size : 64
min_parallel_table_scan_size : 1024
min_wal_size : 80
old_snapshot_threshold : -1
parallel_leader_participation : on
parallel_setup_cost : 1000
parallel_tuple_cost : 0.1
password_encryption : scram-sha-256
plan_cache_mode : auto
port : 5432
post_auth_delay : 0
pre_auth_delay : 0
primary_conninfo : 
primary_slot_name : 
quote_all_identifiers : off
random_page_cost : 4
recovery_end_command : 
recovery_init_sync_method : fsync
recovery_min_apply_delay : 0
recovery_prefetch : try
recovery_target : 
recovery_target_action : pause
recovery_target_inclusive : on
recovery_target_lsn : 
recovery_target_name : 
recovery_target_time : 
recovery_target_timeline : latest
recovery_target_xid : 
recursive_worktable_factor : 10
remove_temp_files_after_crash : on
reserved_connections : 0
restart_after_crash : on
restore_command : 
row_security : on
scram_iterations : 4096
search_path : "$user", public
segment_size : 131072
send_abort_for_crash : off
send_abort_for_kill : off
seq_page_cost : 1
server_encoding : UTF8
server_version : 16.0 - Percona Distribution
server_version_num : 160000
session_preload_libraries : 
session_replication_role : origin
shared_buffers : 16384
shared_memory_size : 143
shared_memory_size_in_huge_pages : 72
shared_memory_type : mmap
shared_preload_libraries : 
ssl : on
ssl_ca_file : 
ssl_cert_file : /etc/ssl/certs/ssl-cert-snakeoil.pem
ssl_ciphers : HIGH:MEDIUM:+3DES:!aNULL
ssl_crl_dir : 
ssl_crl_file : 
ssl_dh_params_file : 
ssl_ecdh_curve : prime256v1
ssl_key_file : /etc/ssl/private/ssl-cert-snakeoil.key
ssl_library : OpenSSL
ssl_max_protocol_version : 
ssl_min_protocol_version : TLSv1.2
ssl_passphrase_command : 
ssl_passphrase_command_supports_reload : off
ssl_prefer_server_ciphers : on
standard_conforming_strings : on
statement_timeout : 0
stats_fetch_consistency : cache
superuser_reserved_connections : 3
synchronize_seqscans : on
synchronous_commit : on
synchronous_standby_names : 
syslog_facility : local0
syslog_ident : postgres
syslog_sequence_numbers : on
syslog_split_messages : on
tcp_keepalives_count : 9
tcp_keepalives_idle : 7200
tcp_keepalives_interval : 75
tcp_user_timeout : 0
temp_buffers : 1024
temp_file_limit : -1
temp_tablespaces : 
TimeZone : America/Chicago
timezone_abbreviations : Default
trace_notify : off
trace_recovery_messages : log
trace_sort : off
track_activities : on
track_activity_query_size : 1024
track_commit_timestamp : off
track_counts : on
track_functions : none
track_io_timing : off
track_wal_io_timing : off
transaction_deferrable : off
transaction_isolation : read committed
transaction_read_only : off
transform_null_equals : off
unix_socket_directories : /var/run/postgresql
unix_socket_group : 
unix_socket_permissions : 0777
update_process_title : on
vacuum_buffer_usage_limit : 256
vacuum_cost_delay : 0
vacuum_cost_limit : 200
vacuum_cost_page_dirty : 20
vacuum_cost_page_hit : 1
vacuum_cost_page_miss : 2
vacuum_failsafe_age : 1600000000
vacuum_freeze_min_age : 50000000
vacuum_freeze_table_age : 150000000
vacuum_multixact_failsafe_age : 1600000000
vacuum_multixact_freeze_min_age : 5000000
vacuum_multixact_freeze_table_age : 150000000
wal_block_size : 8192
wal_buffers : 512
wal_compression : off
wal_consistency_checking : 
wal_decode_buffer_size : 524288
wal_init_zero : on
wal_keep_size : 0
wal_level : logical
wal_log_hints : off
wal_receiver_create_temp_slot : off
wal_receiver_status_interval : 10
wal_receiver_timeout : 60000
wal_recycle : on
wal_retrieve_retry_interval : 5000
wal_segment_size : 16777216
wal_sender_timeout : 60000
wal_skip_threshold : 2048
wal_sync_method : fdatasync
wal_writer_delay : 200
wal_writer_flush_after : 128
work_mem : 4096
xmlbinary : base64
xmloption : content
zero_damaged_pages : off

##### --- Processes start up command --- ####
PID : Command line
776 : /usr/lib/postgresql/16/bin/postgres -D /var/lib/postgresql/16/main -c config_file=/etc/postgresql/16/main/postgresql.conf

So why is this valuable?

All the information that pt-summary, pt-mysql-summary, and pt-pg-summary provide can be gathered with SQL queries and OS commands. But under pressure, will you remember all the needed statements? These scripts make it easy to get all the needed information in one swoop.

This is also a great source of information to archive for future reference. Need to know what the last known good configuration looked like? Were you running SELinux when you set up the server? This is the type of information that is invaluable to have on hand and frustrating to be missing when tracing issues.

In the future, we will look at more of the Percona Toolkit and its uses.

Free your DBAs to focus on higher priorities.

 

Download Percona Toolkit

Sep
20
2023
--

Avoid Surprises When Restarting MySQL — Ensure Dynamic Changes Won’t Be Lost


If you’re a DBA, one of your “easiest” tasks is to stop/start MySQL during a maintenance window, but even that can lead to unwanted scenarios if you have modified dynamic parameters at some point on your instance.

Here’s a brief story of how this could happen, to make it clearer:

You’re a DBA managing a few MySQL servers. The application using one of them starts having issues on a Friday night, right before it’s time to leave. After a quick check, you notice the app is requesting more connections, and the hotfix is to raise max_connections; you change it dynamically, the fire is out, and so are you. Let your future self worry about finding the root cause and fixing it properly the following Monday.

 

But life happens; Monday is here with new challenges, and you already forgot about the connections issue… A few months later, a restart for MySQL is required, and surprise, surprise, right after the restart, the issue is back ‘unexpectedly’; now you have to troubleshoot and lose time wondering what happened and fixing it.

Side note: This is not advice on how to fix things; the story only introduces stuff that can happen when restarting MySQL and how to prevent it. This could also be a tale of why doing that kind of “administration” is bad.

Introduction of pt-config-diff and the situation

As you might know, the Percona Toolkit is a collection of advanced command-line tools used to perform various MySQL, MongoDB, and system tasks that are too difficult or complex to perform manually.

One of those tools is pt-config-diff; this tool can show differences within MySQL configuration files and server variables.

You can use it to compare two config files (i.e., you want to ensure your new server has the same settings as the old one) or one config file and a running MySQL instance (this is what we’ll use here).

Below are the commands and output that lead to the different values in the story.

First, let’s see the connection settings in the configuration file.

$ grep conn /etc/my.cnf
max_connections=200

And before the dynamic change, MySQL is running with the same value.

$ mysql -e "SELECT @@max_connections"
+-------------------+
| @@max_connections |
+-------------------+
|               200 |
+-------------------+

That Friday night, the value was increased to 500:

$ mysql -vv -e "SET GLOBAL max_connections=500; SELECT @@GLOBAL.max_connections"

--------------
SET GLOBAL max_connections=500
--------------
Query OK, 0 rows affected (0.01 sec)


--------------
SELECT @@GLOBAL.max_connections
--------------
+--------------------------+
| @@GLOBAL.max_connections |
+--------------------------+
|                      500 |
+--------------------------+
1 row in set (0.00 sec)

Bye

At this point, the setting in the configuration file and the runtime value for max_connections differ, and the dynamic change will be lost after a MySQL restart.

Side note: In case you missed it, with MySQL 8, you can now persist dynamic changes as well to avoid manual modification to the cnf file. See more in this blog post: Using MySQL 8 Persisted System Variables.
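As a minimal sketch of that approach (assuming MySQL 8.0 or later), the Friday-night fix could have been persisted in one step instead of using SET GLOBAL:

```shell
# Sketch, assuming MySQL 8.0+: SET PERSIST applies the change at runtime AND
# writes it to mysqld-auto.cnf in the datadir, so it survives a restart
mysql -e "SET PERSIST max_connections = 500"

# Confirm where the effective value now comes from (VARIABLE_SOURCE = PERSISTED)
mysql -e "SELECT VARIABLE_NAME, VARIABLE_SOURCE, VARIABLE_PATH
          FROM performance_schema.variables_info
          WHERE VARIABLE_NAME = 'max_connections'"
```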

How to check dynamic changes

When it’s the moment to reboot MySQL, you can use pt-config-diff to validate if the runtime settings are the same as in the configuration file.

Using the same example, here’s the output:

$ pt-config-diff h=localhost /etc/my.cnf

1 config difference

Variable                  localhost.localdomain /etc/my.cnf
========================= ===================== ===========
max_connections           500                   200

Now you know the difference between the config file and your runtime settings and can prepare accordingly (modify the value in your config file or use SET PERSIST).

Let’s modify the configuration file manually:

$ grep conn /etc/my.cnf
max_connections=500

And re-checking with the same command as above, there’s now no output (meaning no differences were found).

$ pt-config-diff h=localhost /etc/my.cnf
$

Things to be aware of

The tool will only compare the values of parameters present in both sources.

Using the same example, if the max_connections parameter is not present in the configuration file (MySQL was running with the default value), pt-config-diff won’t show the value in the output.

Example:

The parameter is not present in the configuration file.

$ grep conn /etc/my.cnf
$

And MySQL is running with the default value:

$ mysql -e "SELECT @@max_connections"
+-------------------+
| @@max_connections |
+-------------------+
|               151 |
+-------------------+

After modifying the value dynamically (using the same commands as in the previous example) and running pt-config-diff, the output is empty.

$ mysql -e "SELECT @@max_connections"
+-------------------+
| @@max_connections |
+-------------------+
|               500 |
+-------------------+


$ pt-config-diff h=localhost /etc/my.cnf
$

As mentioned above, this happens because there’s nothing to compare to; the configuration file doesn’t have a value for the max_connections parameter.
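One way to guard against this blind spot is a quick pre-check that the variables you care about actually appear in the config file at all. Here is a rough sketch; the watch-list and the demo file are made up for illustration:

```shell
# Create a demo config file (stand-in for /etc/my.cnf in this sketch)
cnf=/tmp/demo-my.cnf
cat > "$cnf" <<'EOF'
[mysqld]
max_connections=500
EOF

# Any variable missing from the file is invisible to pt-config-diff
for v in max_connections innodb_buffer_pool_size; do
  grep -q "^${v}" "$cnf" || echo "not in config: ${v}"
done
```

Anything reported here has to be checked by hand (e.g., with SELECT @@variable), since the tool has nothing to compare it against.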

Another thing to keep in mind: if you have more than one config file (using !include or !includedir, or files in more than one of the default locations MySQL checks), you can’t compare all of them at once.

Ensure you compare all the required config files if that’s your case.
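A simple way to cover several files (the paths below are hypothetical) is to loop and diff each one against the running instance separately:

```shell
# Compare the running instance against each config file in turn;
# pt-config-diff prints nothing for a file with no differences
for f in /etc/my.cnf /etc/my.cnf.d/*.cnf; do
  echo "== $f =="
  pt-config-diff h=localhost "$f"
done
```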

Final thoughts

First, take this as a reminder to always persist your changes, either by adding them manually to your configuration files or by using SET PERSIST (note: this command creates/uses mysqld-auto.cnf inside the datadir).

That said, even after checking with pt-config-diff, some differences may remain that would be lost after a reboot; still, using this tool helps prevent unwanted surprises and highlights possible issues before a restart.

As always, even when performing the “simplest” task, it’s recommended to have some pre-checks to ensure the activity goes smoothly.

Also, if you identified yourself with the story at the beginning of this article and feel you could use a hand, remember you can always rely on our Managed Services offering, where you can have a team of DBAs managing the instances for you.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!

May
02
2023
--

How To Use pt-secure-collect for Capturing Data in a Secure Way From the OS and Database System


Sometimes crucial data sharing is avoided because of compliance rules, organizational policies, or numerous security concerns. The common use cases involve sharing pt-mysql-summary, pt-stalk, and other OS-related details to assist Support Engineers or any other third-party team in troubleshooting database-related issues.

In this context, pt-secure-collect is a very important utility from Percona, which helps capture the required information securely and also provides aid in masking the existing information.

Pt-secure-collect helps in collecting, sanitizing, and encrypting data from various sources. By default, this utility collects the output with the help of pt-stalk, pt-summary, and pt-mysql-summary.

Let’s see how this tool works.

Installation

The tool can be installed via the Percona official repositories:

sudo yum install percona-toolkit

Another option is to download the full Percona Toolkit package or just the specific tool directly:

shell> sudo wget https://downloads.percona.com/downloads/percona-toolkit/3.5.2/binary/redhat/7/x86_64/percona-toolkit-3.5.2-2.el7.x86_64.rpm 
shell> sudo yum install percona-toolkit-3.5.2-2.el7.x86_64.rpm

OR

shell> sudo wget percona.com/get/pt-secure-collect 
shell> sudo chmod +x pt-secure-collect

 Now, let’s run our first command to capture the OS/Database-related details from the tool.

shell> ./pt-secure-collect collect --bin-dir=/usr/bin/ --temp-dir=/home/vagrant/pt/ --mysql-host=localhost --mysql-port=3306 --mysql-user=root --mysql-password=Root@1234
Encryption password

Output:

INFO[2023-04-22 06:54:10] Temp directory is "/home/vagrant/pt/"
INFO[2023-04-22 06:54:10] Creating output file "/home/vagrant/pt/pt-stalk_2023-04-22_06_54_10.out"  
INFO[2023-04-22 06:54:10] Running pt-stalk --no-stalk --iterations=2 --sleep=30 --host=localhost --dest=/home/vagrant/pt/ --port=3306 --user=root --password=********  
INFO[2023-04-22 06:55:42] Creating output file "/home/vagrant/pt/pt-summary_2023-04-22_06_55_42.out"  
INFO[2023-04-22 06:55:42] Running pt-summary                            
INFO[2023-04-22 06:55:48] Creating output file "/home/vagrant/pt/pt-mysql-summary_2023-04-22_06_55_48.out"  
INFO[2023-04-22 06:55:48] Running pt-mysql-summary --host=localhost --port=3306 --user=root --password=********  
INFO[2023-04-22 06:56:01] Sanitizing output collected data              
INFO[2023-04-22 06:56:17] Creating tar file "/home/vagrant/pt/pt.tar.gz"  
INFO[2023-04-22 06:56:17] Encrypting "/home/vagrant/pt/pt.tar.gz" file into "/home/vagrant/pt/pt.tar.gz.aes"  
INFO[2023-04-22 06:56:17] Skipping encrypted file "pt.tar.gz.aes"   

So, the above command collected the data from the “pt*” tools securely. By default, it encrypts the data and prompts for an encryption password. However, we can skip that part with the --no-encrypt option.

Options:

--bin-dir => Directory having the Percona Toolkit binaries (pt* tools). 
--temp-dir => Temporary directory used for the data collection.

Note: In order to run the command successfully, all prerequisite binaries (pt-stalk, pt-summary, and pt-mysql-summary) must be present in the directory given by --bin-dir.
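A quick sanity check before collecting (a sketch; adjust the list to whatever your --bin-dir should contain) is to confirm the three binaries are actually reachable:

```shell
# Report any prerequisite Percona Toolkit binary missing from PATH
for t in pt-stalk pt-summary pt-mysql-summary; do
  command -v "$t" >/dev/null || echo "missing: $t"
done
```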

Let’s decrypt the file and observe the captured details:

shell> ./pt-secure-collect decrypt /home/vagrant/pt/pt.tar.gz.aes  --outfile=/home/vagrant/pt/pt.tar.gz
Encryption password:
INFO[2023-04-22 07:01:55] Decrypting file "/home/vagrant/pt/pt.tar.gz.aes" into "/home/vagrant/pt/pt.tar.gz" 

Note – Here, we need to provide the password which we used at the time of encryption.

--outfile => Write the output to this file. If omitted, the output file name will be the same as the input file, with the .aes extension removed.

Now, inside the path, we can see the decrypted file. Next, we can uncompress it to see the contents.

shell> ls -lh /home/vagrant/pt
-rw-------. 1 vagrant vagrant 500K Apr 22 07:01 pt.tar.gz

shell> tar -xzvf pt.tar.gz

Let’s look at a couple of examples where the sensitive data has been altered or masked.

  • With pt-secure-collect:
Hostname | hostname 
log_error | /var/log/hostname 
Config File | /etc/hostname 
pid-file        = /var/run/mysqld/hostname 
log-error     = /var/log/hostname 
socket        = /var/lib/mysql/hostname

  • Without pt-secure-collect:
Hostname | localhost.localdomain 
log_error | /var/log/mysqld.log 
Config File | /etc/my.cnf 
pid-file       = /var/run/mysqld/mysqld.pid 
log-error     = /var/log/mysqld.log 
socket        = /var/lib/mysql/mysql.sock

Note: We can clearly see differences between the two outputs. With pt-secure-collect, the information above was replaced with a generic value (“hostname”).

Now, let’s see how we can sanitize an existing file, “pt-mysql-summary.out”, masking the critical information as shown in the output section below.

shell> ./pt-secure-collect sanitize --input-file=/home/vagrant/pt-mysql-summary.out > /home/vagrant/pt-mysql-summary_sanitize.out

Output:

Hostname | hostname 
Pidfile | /var/run/mysqld/hostname (exists) 
log_error | /var/log/hostname 
Config File | /etc/hostname 
pid-file        = /var/run/mysqld/hostname 
log-error     = /var/log/hostname 
socket        = /var/lib/mysql/hostname 
log-error     = /var/log/mariadb/hostname
pid-file        = /var/run/mariadb/hostname

You can also control which information is skipped from masking with the --no-sanitize-hostnames and --no-sanitize-queries options.
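For example, to sanitize a file while leaving hostnames readable (a sketch using the options named above; the file paths reuse the earlier example):

```shell
# Mask queries but keep hostnames intact in the sanitized copy
./pt-secure-collect sanitize --no-sanitize-hostnames \
    --input-file=/home/vagrant/pt-mysql-summary.out > /home/vagrant/pt-mysql-summary_partial.out
```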

Here, we see one more example where the critical information, such as “hostname” details inside the OS log file (“/var/log/messages”), is masked/replaced by some other value.

shell> sudo ./pt-secure-collect sanitize --input-file=/var/log/messages > /home/vagrant/messages_sanitize.out

 

Output (without pt-secure-collect):

Apr 23 03:37:13 localhost pmm-agent: #033[31mERRO#033[0m[2023-04-23T03:37:13.547+00:00] time="2023-04-23T03:37:13Z" level=error msg="Error opening connection to ProxySQL: dial tcp 127.0.0.1:6032: connect: connection refused" source="exporter.go:169"  #033[31magentID#033[0m=/agent_id/04dd6ad8-5c2e-4c52-a624-eb3bc7357651 #033[31mcomponent#033[0m=agent-process #033[31mtype#033[0m=proxysql_exporter

Output (with pt-secure-collect):

Apr 23 03:37:13 localhost pmm-agent: #033[31mERRO#033[0m[2023-04-23T03:37:13.547+00:00] time="2023-04-23T03:37:13Z" level=error msg="Error opening connection to ProxySQL: dial tcp hostname:6032: connect: connection refused" source="hostname:169"  #033[31magentID#033[0m=/agent_id/04dd6ad8-5c2e-4c52-a624-eb3bc7357651 #033[31mcomponent#033[0m=agent-process #033[31mtype#033[0m=proxysql_exporte

 

Summary

With the help of this tool, both OS- and database-level information/logs can be encrypted or masked to hide sensitive data. It comes in handy when sharing critical troubleshooting data with third-party stakeholders while maintaining security and compliance practices.

Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL, MariaDB, MongoDB, and PostgreSQL server and system tasks that are too difficult or complex to perform manually.

 

Learn more about Percona Toolkit

Dec
12
2022
--

Using Percona Toolkit to Alter Database Tables Online: A Controlled Approach


Table modifications are a common task for database administrators. In this blog, I’ll explain how to alter tables online in a controlled manner that does not disrupt application users or cause application downtime.

One of the tools in Percona Toolkit is pt-online-schema-change, a utility that alters the structure of a table without interfering with the reads or writes. The tool creates an empty copy of the table to alter and modify as desired before copying the rows from the original table into the new one.

When the copying is finished, it removes the original table and replaces it with the new one. Any changes made to data in the original tables during the copy process will be reflected in the new table as the tool creates triggers on the original table to update the corresponding rows in the new table.

How to test the pt-online-schema-change command?

Before running the actual alter using the tool, perform a dry run to ensure the pt-online-schema-change command is functional. The --dry-run option creates and modifies the new table without adding triggers, copying data, or replacing the existing table.

The basic command for modifying a table is as follows; in a production environment, it may need to be tuned using options like --critical-load Threads_running, --max-load Threads_running, --chunk-size, --max-lag, and --max-flow-ctl (Percona XtraDB Cluster).

I’m using “ENGINE=InnoDB” for the first test case, which rebuilds the table; this is useful for removing fragmented spaces from the table.

Dry-run test:

$ pt-online-schema-change --dry-run --alter "ENGINE=InnoDB" h=172.31.92.72,D=mytestdb,t=authors;
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Starting a dry run.  `mytestdb`.`authors` will not be altered.  Specify --execute instead of --dry-run to alter the table.
Creating new table...
Created new table mytestdb._authors_new OK.
Altering new table...
Altered `mytestdb`.`_authors_new` OK.
Not creating triggers because this is a dry run.
Not copying rows because this is a dry run.
Not swapping tables because this is a dry run.
Not dropping old table because this is a dry run.
Not dropping triggers because this is a dry run.
2022-12-09T05:44:23 Dropping new table...
2022-12-09T05:44:23 Dropped new table OK.
Dry run complete.  `mytestdb`.`authors` was not altered.
$

How to run the ALTER TABLE?

It is recommended that you read the documentation before performing the task.

To run the alter, replace the –dry-run option with –execute.

$ pt-online-schema-change --execute --alter "ENGINE=InnoDB" h=172.31.92.72,D=mytestdb,t=authors;
Found 1 slaves:
ip-172-31-90-216.ec2.internal -> ip-172-31-90-216.ec2.internal:socket
Will check slave lag on:
ip-172-31-90-216.ec2.internal -> ip-172-31-90-216.ec2.internal:socket
..
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `mytestdb`.`authors`...
Creating new table...
Created new table mytestdb._authors_new OK.
Altering new table...
Altered `mytestdb`.`_authors_new` OK.
2022-12-09T05:57:10 Creating triggers...
2022-12-09T05:57:10 Created triggers OK.
2022-12-09T05:57:10 Copying approximately 10023 rows...
2022-12-09T05:57:10 Copied rows OK.
2022-12-09T05:57:10 Analyzing new table...
2022-12-09T05:57:10 Swapping tables...
2022-12-09T05:57:10 Swapped original and new tables OK.
2022-12-09T05:57:10 Dropping old table...
2022-12-09T05:57:10 Dropped old table `mytestdb`.`_authors_old` OK.
2022-12-09T05:57:10 Dropping triggers...
2022-12-09T05:57:10 Dropped triggers OK.
Successfully altered `mytestdb`.`authors`.
$

Can we pause the pt-online-schema-change execution? Yes!

The --pause-file=/tmp/pt-osc.pause option lets you pause the execution. While the file specified by this parameter exists, execution is paused; it resumes when the file is removed.

Note: I shortened the pt-osc log to make the result more readable.

$ pt-online-schema-change --pause-file=/tmp/pt-osc.pause --execute --alter "ENGINE=InnoDB" h=172.31.92.72,D=mytestdb,t=authors;
….
2022-12-10T15:42:01 Copying approximately 10023 rows...
Sleeping 60 seconds because /tmp/pt-osc.pause exists
Sleeping 60 seconds because /tmp/pt-osc.pause exists
Copying `mytestdb`.`authors`:  73% 00:44 remain
2022-12-10T15:44:04 Copied rows OK.
...
Successfully altered `mytestdb`.`authors`.
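Pausing and resuming is then just a matter of creating and removing the file from another terminal:

```shell
touch /tmp/pt-osc.pause   # pt-online-schema-change sleeps while this file exists
# ... inspect replication lag, server load, etc. ...
rm /tmp/pt-osc.pause      # copying resumes on the next wake-up
```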

Can we review the data and tables before swapping them? Yes!

The --no-swap-tables, --no-drop-old-table, --no-drop-new-table, and --no-drop-triggers options allow us to do the alter in a controlled manner.

That is, we let the tool handle the majority of the tasks, such as creating the new table, altering it, and copying the records, while the remaining table swap and trigger removal are done manually.

Caution: The --no-swap-tables option does not work if the table has foreign keys with associated child tables.
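Before relying on --no-swap-tables, you can check whether any child tables reference the target via foreign keys (the schema and table names below match this example; adapt as needed):

```shell
# Any rows returned mean child tables point at authors2, so --no-swap-tables
# cannot be used safely
mysql -e "SELECT TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME
          FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
          WHERE REFERENCED_TABLE_SCHEMA='mytestdb'
            AND REFERENCED_TABLE_NAME='authors2'"
```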

$ pt-online-schema-change --no-swap-tables --no-drop-triggers --no-drop-old-table --no-drop-new-table --execute --alter "CHARACTER SET = utf8mb4, COLLATE = utf8mb4_general_ci, MODIFY email varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,MODIFY name  varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL" h=172.31.92.72,D=mytestdb,t=authors2;
Found 1 slaves:
ip-172-31-90-216.ec2.internal -> ip-172-31-90-216.ec2.internal:socket
Will check slave lag on:
ip-172-31-90-216.ec2.internal -> ip-172-31-90-216.ec2.internal:socket
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `mytestdb`.`authors2`...
Creating new table...
Created new table mytestdb._authors2_new OK.
Altering new table...
Altered `mytestdb`.`_authors2_new` OK.
2022-12-09T09:16:28 Creating triggers...
2022-12-09T09:16:28 Created triggers OK.
2022-12-09T09:16:28 Copying approximately 10067 rows...
2022-12-09T09:16:29 Copied rows OK.
Not dropping old table because --no-drop-triggers was specified.
Not dropping triggers because --no-drop-triggers was specified.  To drop the triggers, execute:
DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_del`
DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_upd`
DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_ins`
Not dropping the new table `mytestdb`.`_authors2_new` because --no-drop-new-table was specified.  To drop the new table, execute:
DROP TABLE IF EXISTS `mytestdb`.`_authors2_new`;
Successfully altered `mytestdb`.`authors2`.
$

pt-online-schema-change has done the job and we now have two tables and three triggers. So we can safely review the table structure and data in the _authors2_new table, and once we’re sure everything is in order, we can swap and drop the triggers.

mysql> show tables like '%authors2%';
+---------------------------------+
| Tables_in_mytestdb (%authors2%) |
+---------------------------------+
| _authors2_new                   |
| authors2                        |
+---------------------------------+
2 rows in set (0.00 sec)

Find the TRIGGERS:

mysql> SELECT TRIGGER_NAME,EVENT_MANIPULATION FROM INFORMATION_SCHEMA.TRIGGERS WHERE TRIGGER_SCHEMA='mytestdb' and EVENT_OBJECT_TABLE like '%authors%' \G
*************************** 1. row ***************************
      TRIGGER_NAME: pt_osc_mytestdb_authors2_del
EVENT_MANIPULATION: DELETE
*************************** 2. row ***************************
      TRIGGER_NAME: pt_osc_mytestdb_authors2_upd
EVENT_MANIPULATION: UPDATE
*************************** 3. row ***************************
      TRIGGER_NAME: pt_osc_mytestdb_authors2_ins
EVENT_MANIPULATION: INSERT
3 rows in set (0.00 sec)

Run the following SQL to swap the tables.

RENAME TABLE
mytestdb.authors2 TO mytestdb._authors2_old,
mytestdb._authors2_new TO mytestdb.authors2;

Lastly, remove the triggers and the old table:

DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_del`;
DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_upd`;
DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_ins`;
DROP TABLE IF EXISTS mytestdb._authors2_old;

Wrap up

pt-online-schema-change is part of the Percona Toolkit for altering tables online, and we can customize it with the various options available based on our needs. MySQL’s online DDL with a direct ALTER is an option, particularly for dropping indexes and changing metadata, among other things. Where online DDL is not a choice, we can use pt-online-schema-change.

Caution: It is not recommended to run the tool directly on a replica instance, as the pt-osc operations will not produce a consistent table on the replicas. The statements coming via the binlog are not processed by the triggers, so any new data arriving via replication will be missing from the new table.

Aug
29
2022
--

Two Extremely Useful Tools (pt-upgrade and checkForServerUpgrade) for MySQL Upgrade Testing


My last blog, Percona Utilities That Make Major MySQL Version Upgrades Easier, detailed the tools available from the Percona Toolkit that assist us with major MySQL version upgrades. The pt-upgrade tool aids in testing application queries and generates reports on how each query performs on servers running various versions of MySQL.

MySQL Shell Upgrade Checker is a utility that helps with compatibility checks between MySQL 5.7 instances and MySQL 8.0 upgrades; it is part of the mysql-shell-utilities. The util.checkForServerUpgrade() function checks whether a MySQL 5.7 instance is ready for the MySQL 8.0 upgrade and generates a report with errors, warnings, and notices for preparing the current MySQL 5.7 setup for upgrading to MySQL 8.0.

We can run this Upgrade Checker Utility in the current MySQL 5.7 environment to generate the report; I would recommend running it on any of the replica instances that have the same configuration as the production.

The user account used to execute the upgrade checker tool must have ALL privileges up to MySQL Shell 8.0.20. As of MySQL Shell 8.0.21, the account requires the RELOAD, PROCESS, and SELECT privileges.

How to generate a report using Upgrade Checker Utility

To generate a report using the Upgrade Checker Utility, we can either log in to the shell prompt or execute it directly from the command prompt.

mysqlsh -- util checkForServerUpgrade 'root@localhost:3306' --target-version=8.0.29 --config-path=/etc/my.cnf > CheckForServerUpgrade_Report.txt
Please provide the password for 'mysqluser@localhost:3306':

$ mysqlsh
MySQL  JS > util.checkForServerUpgrade('root@localhost:3306', { "targetVersion":"8.0.29", "configPath":"/etc/my.cnf"})
Please provide the password for 'mysqluser@localhost:3306':

To quit the mysqlsh command prompt, type \exit.

MySQL  JS > \exit
Bye!

Do pt-upgrade and Upgrade Checker Utility do the same tests?  No!

Don’t confuse the Upgrade Checker Utility with the pt-upgrade tool, since they are used for different kinds of major version upgrade testing. The Upgrade Checker Utility performs a variety of tests on the selected MySQL server to determine whether the upgrade will be successful; however, it does not confirm whether the upgrade is compatible with the application’s queries or routines.

Does it check both my.cnf file and the MySQL server variables?

The utility can look for system variables declared in the configuration file (my.cnf) but removed in the target MySQL Server release, as well as system variables not defined in the configuration file but with a different default value in the target MySQL Server release. For these checks, you must give the path to the configuration file when executing checkForServerUpgrade(). However, the tool is unable to identify variables that have been removed from the my.cnf file but are still set on the MySQL server.

Let us remove query_cache_type from /etc/percona-server.conf.d/mysqld.cnf and run the command.

]# mysql -uroot -p -e "SHOW VARIABLES WHERE Variable_Name IN ('query_cache_type','query_cache_size')"
Enter password:
+------------------+---------+
| Variable_name    | Value   |
+------------------+---------+
| query_cache_size | 1048576 |
| query_cache_type | ON      |
+------------------+---------+

]# cat /etc/my.cnf
#
# The Percona Server 5.7 configuration file.
#
#
# * IMPORTANT: Additional settings that can override those from this file!
#   The files must end with '.cnf', otherwise they'll be ignored.
#   Please make any edits and changes to the appropriate sectional files
#   included below.
#
!includedir /etc/my.cnf.d/
!includedir /etc/percona-server.conf.d/
]#

Remove query_cache_type variable from mysqld.cnf:

]# sed -i '/query_cache_type/d' /etc/percona-server.conf.d/mysqld.cnf
]#

]# grep -i query /etc/my.cnf /etc/percona-server.conf.d/mysqld.cnf
/etc/percona-server.conf.d/mysqld.cnf:query_cache_size=5058320
]#

As the query_cache_type variable has been deleted from the configuration file, the tool is unable to detect it.

#  mysqlsh -- util checkForServerUpgrade 'root@localhost:3306' --target-version=8.0.29 --config-path=/etc/my.cnf | grep  -B 6  -i "query_cache"
15) Removed system variables
  Error: Following system variables that were detected as being used will be
    removed. Please update your system to not rely on them before the upgrade.
  More information:
    https://dev.mysql.com/doc/refman/8.0/en/added-deprecated-removed.html#optvars-removed

  query_cache_size - is set and will be removed
ERROR: 1 errors were found. Please correct these issues before upgrading to avoid compatibility issues.
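Since the checker only sees what is in the file you pass it, a rough manual cross-check (a sketch; the variable list and config paths are illustrative) is to compare runtime values against the config for anything you know was removed in 8.0:

```shell
# For each suspect variable: if it is absent from the config files but set on
# the server, the Upgrade Checker will not flag it, so print the runtime value
for v in query_cache_type query_cache_size; do
  grep -rq "^${v}" /etc/my.cnf /etc/percona-server.conf.d/ \
    || mysql -N -e "SHOW GLOBAL VARIABLES LIKE '${v}'"
done
```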

In JSON format, the report looks like this:

Note: To make the blog more readable, I shortened the report.

# mysqlsh -- util checkForServerUpgrade 'root@localhost:3306' --target-version=8.0.29 --config-path=/etc/my.cnf --output-format=JSON
{
    "serverAddress": "localhost:3306",
    "serverVersion": "5.7.39-42 - Percona Server (GPL), Release 42, Revision b0a7dc2da2e",
    "targetVersion": "8.0.29",
    "errorCount": 1,
    "warningCount": 27,
    "noticeCount": 1,
    "summary": "1 errors were found. Please correct these issues before upgrading to avoid compatibility issues.",
    "checksPerformed": [
        {
            "id": "oldTemporalCheck",
            "title": "Usage of old temporal type",
            "status": "OK",
            "detectedProblems": []
        },
        {
            "id": "reservedKeywordsCheck",
            "title": "Usage of db objects with names conflicting with new reserved keywords",
            "status": "OK",
            "detectedProblems": []
        },
…
        {
            "id": "sqlModeFlagCheck",
            "title": "Usage of obsolete sql_mode flags",
            "status": "OK",
            "description": "Notice: The following DB objects have obsolete options persisted for sql_mode, which will be cleared during upgrade to 8.0.",
            "documentationLink": "https://dev.mysql.com/doc/refman/8.0/en/mysql-nutshell.html#mysql-nutshell-removals",
            "detectedProblems": [
                {
                    "level": "Notice",
                    "dbObject": "global system variable sql_mode",
                    "description": "defined using obsolete NO_AUTO_CREATE_USER option"
                }
            ]
        },
        {
            "id": "enumSetElementLenghtCheck",
            "title": "ENUM/SET column definitions containing elements longer than 255 characters",
            "status": "OK",
            "detectedProblems": []
        },
…
        {
            "id": "removedSysVars",
            "title": "Removed system variables",
            "status": "OK",
            "description": "Error: Following system variables that were detected as being used will be removed. Please update your system to not rely on them before the upgrade.",
            "documentationLink": "https://dev.mysql.com/doc/refman/8.0/en/added-deprecated-removed.html#optvars-removed",
            "detectedProblems": [
                {
                    "level": "Error",
                    "dbObject": "query_cache_size",
                    "description": "is set and will be removed"
                }
            ]
        },
        {
            "id": "sysVarsNewDefaults",
            "title": "System variables with new default values",
            "status": "OK",
            "description": "Warning: Following system variables that are not defined in your configuration file will have new default values. Please review if you rely on their current values and if so define them before performing upgrade.",
            "documentationLink": "https://mysqlserverteam.com/new-defaults-in-mysql-8-0/",
            "detectedProblems": [
                {
                    "level": "Warning",
                    "dbObject": "back_log",
                    "description": "default value will change"
                },
                {
                    "level": "Warning",
                    "dbObject": "innodb_max_dirty_pages_pct",
                    "description": "default value will change from 75 (%)  90 (%)"
                }
            ]
        },
        {
            "id": "zeroDatesCheck",
            "title": "Zero Date, Datetime, and Timestamp values",
            "status": "OK",
            "detectedProblems": []
        },
…
    ],
    "manualChecks": [
        {
            "id": "defaultAuthenticationPlugin",
            "title": "New default authentication plugin considerations",
            "description": "Warning: The new default authentication plugin 'caching_sha2_password' offers more secure password hashing than previously used 'mysql_native_password' (and consequent improved client connection authentication). However, it also has compatibility implications that may affect existing MySQL installations.  If your MySQL installation must serve pre-8.0 clients and you encounter compatibility issues after upgrading, the simplest way to address those issues is to reconfigure the server to revert to the previous default authentication plugin (mysql_native_password). For example, use these lines in the server option file:\n\n[mysqld]\ndefault_authentication_plugin=mysql_native_password\n\nHowever, the setting should be viewed as temporary, not as a long term or permanent solution, because it causes new accounts created with the setting in effect to forego the improved authentication security.\nIf you are using replication please take time to understand how the authentication plugin changes may impact you.",
            "documentationLink": "https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html#upgrade-caching-sha2-password-compatibility-issues\nhttps://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html#upgrade-caching-sha2-password-replication"
        }
    ]
}
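The JSON output is handy for automation: you can filter the report down to Error-level findings with a JSON tool such as jq (assumed installed here). A sketch against a trimmed sample of the report above:

```shell
# Filter the Upgrade Checker JSON report down to Error-level problems.
# A trimmed sample report is inlined; in practice, pipe mysqlsh's JSON output into jq.
report='{"checksPerformed":[{"id":"removedSysVars","detectedProblems":[{"level":"Error","dbObject":"query_cache_size","description":"is set and will be removed"}]},{"id":"zeroDatesCheck","detectedProblems":[]}]}'
errors=$(echo "$report" | jq -r '.checksPerformed[].detectedProblems[] | select(.level=="Error") | "\(.dbObject): \(.description)"')
echo "$errors"
```

This kind of filter is useful in a pre-upgrade CI check that fails the pipeline whenever the report contains any Error-level item.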

Please read Daniel Guzmán Burgos’ blog post to find out more about the Upgrade Checker Utility, and follow the link to learn more about pt-upgrade testing.

Prior to a major version upgrade, application query testing and configuration checks are unavoidable tasks, and pt-upgrade and the Upgrade Checker Utility are quite helpful for both.
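For the sysVarsNewDefaults warnings, decide per variable whether your workload relies on the old default; if it does, pin the value explicitly in the option file before the upgrade. For example, to keep the 5.7 default flagged in the report above (a sketch; review each flagged variable for your own workload):

```ini
[mysqld]
# Keep the pre-8.0 default flagged by the checker (8.0 raises the default to 90)
innodb_max_dirty_pages_pct = 75
```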

Apr
20
2022
--

Finding Differences Between MySQL Servers


When one is responsible for promoting application development from Dev through the various environments such as QA, UAT, etc., through Production, it is often useful to ensure that configurations in test environments are comparable to the final production environment. This is especially true with systems where a team of DBAs manages the servers.

Obviously, a difference in performance could be due to differences in hardware, storage, networking, software configuration, etc. The question is: how does one quickly and efficiently find the differences without having to run a lot of different commands and compare the output? Fortunately, our Percona Toolkit has a couple of utilities that can make this much easier. When you are tasked with supporting large numbers of servers, efficiency is paramount, and this is where the toolkit can really help you!

You can find more information on the Percona Toolkit here: https://www.percona.com/software/database-tools/percona-toolkit

Let’s look at a few utilities that can help you spot differences in servers.  The first tool we will look at is pt-summary (https://www.percona.com/doc/percona-toolkit/LATEST/pt-summary.html).

The pt-summary tool will give you a summarized view of the server hardware.  It runs a series of Linux commands and formats the output into an easy-to-consume format.  You will get details on the following:

  • Operating System & Linux kernel
  • Virtualization
  • CPU (number of cores, speed, models, caching)
  • Memory consumption, swappiness, etc.
  • Mounted filesystems (mount points, size, usage, format, etc.)
  • Disk schedulers
  • Disk partitioning
  • Logical Volumes
  • RAID
  • Network configuration
  • Network traffic statistics
  • Network connections
  • Top processes
  • CPU utilization

By running the same report on the servers, you can spot-check differences in the hardware and Operating System.

When you want to find out differences in MySQL configuration, the toolkit has the perfect tool for this.  It is called pt-config-diff (https://www.percona.com/doc/percona-toolkit/LATEST/pt-config-diff.html).

This utility can connect to different servers and look at their configuration via MySQL Server Variables and then perform a “diff” on the values.  The tool creates a simple and concise output showing any differences between the servers.  Any differences will be immediately obvious.  Below is an example of the output:

2 config differences
Variable                  my.primary.cnf  my.replica.cnf
========================= =============== ===============
datadir                   /tmp/12345/data /tmp/12346/data
port                      12345           12346
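Under the hood, the comparison is conceptually a diff of each server's SHOW GLOBAL VARIABLES output. A rough sketch of that idea with two pre-captured variable dumps (pt-config-diff does this far more robustly, connecting to the servers and normalizing values for you):

```shell
# Two pre-captured variable dumps (name<TAB>value), as SHOW GLOBAL VARIABLES would emit them.
printf 'datadir\t/tmp/12345/data\nport\t12345\n' > /tmp/primary.vars
printf 'datadir\t/tmp/12346/data\nport\t12346\n' > /tmp/replica.vars
diff /tmp/primary.vars /tmp/replica.vars || true   # diff exits non-zero when files differ
```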

Usage of these two utilities will allow you to quickly and easily find differences in either hardware or MySQL configuration between two or more servers.  These are the two tools that I always go to when I need to understand why two servers that are supposedly the same do not perform similarly.

Oct
04
2021
--

Synchronize Tables on the Same Server with pt-table-sync


It is a common use case to synchronize data in two tables inside MySQL servers. This blog post describes one specific case: how to synchronize data between two different tables on the same MySQL server. This could be useful, for example, if you test DML query performance and do not want to affect production data. After a few experiments, tables get out of sync and you may need to update the test one to continue working on improving your queries. There are other use cases when you may need to synchronize the content of two different tables on the same server, and this blog will show you how to do it.

Table Content Synchronization

The industry-standard tool for table content synchronization – pt-table-sync – is designed to synchronize data between different MySQL servers and does not support bulk synchronization between two different databases on the same server yet. If you try it, you will receive an error message:

$ pt-table-sync D=db1 D=db2 --execute --no-check-slave
You specified a database but not a table in D=db1.  Are you trying to sync only tables in the 'db1' database?  If so, use '--databases db1' instead.

However, it is possible to synchronize two individual tables on the same server by providing table names as DSN parameters:

$ pt-table-sync D=db1,t=foo D=db2,t=foo --execute --verbose
# Syncing D=db2,t=foo
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      5      0 GroupBy   03:24:26 03:24:26 2    db1.foo

You may even synchronize two tables in the same database:

$ pt-table-sync D=db2,t=foo D=db2,t=bar --execute --verbose
# Syncing D=db2,t=bar
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      5      0 GroupBy   03:25:34 03:25:34 2    db2.foo

We can use this feature to perform bulk synchronization.

First, we need to prepare a list of tables we want to synchronize:

$ mysql --skip-column-names -se "SHOW TABLES IN db2" > db1-db2.sync

$ cat db1-db2.sync
bar
baz
foo

Then we can invoke the tool as follows:

$ for i in `cat db1-db2.sync`; do pt-table-sync D=db1,t=$i D=db2,t=$i --execute --verbose; done
# Syncing D=db2,t=bar
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 GroupBy   03:31:52 03:31:52 0    db1.bar
# Syncing D=db2,t=baz
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      5      0 GroupBy   03:31:52 03:31:52 2    db1.baz
# Syncing D=db2,t=foo
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 GroupBy   03:31:52 03:31:52 0    db1.foo

If you have multiple database pairs to sync, you can agree on the file name and parse it before looping through table names. For example, if you use pattern

SOURCE_DATABASE-TARGET_DATABASE.sync

you can use the following loop:

$ for tbls in `ls *.sync`
>   do dbs=`basename -s .sync $tbls`
>   source=${dbs%-*}
>   target=${dbs##*-}
>   for i in `cat $tbls`
>     do pt-table-sync D=$source,t=$i D=$target,t=$i --execute --verbose 
>   done
> done
# Syncing D=cookbook_copy,t=limbs
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      4      0 GroupBy   04:07:07 04:07:07 2    cookbook.limbs
# Syncing D=cookbook_copy,t=limbs_myisam
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      5       0      5      0 GroupBy   04:07:08 04:07:08 2    cookbook.limbs_myisam
# Syncing D=db2,t=bar
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      5      0 GroupBy   04:07:08 04:07:08 2    db1.bar
# Syncing D=db2,t=baz
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      5       0      5      0 GroupBy   04:07:08 04:07:08 2    db1.baz
# Syncing D=db2,t=foo
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      5       0      0      0 GroupBy   04:07:08 04:07:08 2    db1.foo
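The loop above splits the SOURCE-TARGET part of each file name with shell parameter expansion: %-* strips the shortest trailing dash-suffix, and ##*- strips the longest leading prefix up to the last dash. A quick check of how those expansions behave:

```shell
# Split "SOURCE-TARGET.sync" into source and target database names.
dbs=$(basename -s .sync "cookbook-cookbook_copy.sync")
source_db=${dbs%-*}     # shortest "-*" suffix removed  -> cookbook
target_db=${dbs##*-}    # longest "*-" prefix removed   -> cookbook_copy
echo "$source_db -> $target_db"
```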

Note that pt-table-sync synchronizes only tables that exist in both databases. It does not create tables that are missing in the target database, and it does not remove tables that do not exist in the source database. If your schema could be out of sync, you need to synchronize it first.
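If the schemas may have diverged, one way to build a sync list limited to tables present on both sides is comm -12 on two sorted table lists (a sketch with pre-captured lists; on a live server you would generate them with SHOW TABLES as above):

```shell
# Keep only the tables present in both databases (comm -12 = lines common to both sorted files).
printf 'bar\nbaz\nfoo\n' > /tmp/db1.tables   # tables in the source database
printf 'bar\nfoo\nqux\n' > /tmp/db2.tables   # tables in the target database
comm -12 /tmp/db1.tables /tmp/db2.tables > /tmp/db1-db2.sync
cat /tmp/db1-db2.sync
```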

I used the --verbose option in all my examples so you can see what the tool is doing. If you omit this option, the tool is still able to synchronize tables on the same server.


Oct
08
2020
--

Dangerous Edge Case Warning for Percona Toolkit and pt-online-schema-change


Recently I was dealing with an unexpected issue raised by our Support customer, in which data became inconsistent after a schema change was applied.

After some investigation, it turned out that the affected tables had a special word in the comments of some columns, which triggered an already known (and fixed) issue in the TableParser.pm library of Percona Toolkit. The problem was that the customer was using an outdated Toolkit version, in which pt-online-schema-change still relied on the buggy parser.

This bug applies only to Percona Toolkit versions up to 3.0.10, so if you already have 3.0.11 or newer installed, you can skip the rest of this post, as those versions are no longer affected.

I am writing this post to warn every user of pt-online-schema-change who has not upgraded the toolkit, as the problem is potentially very dangerous and can lead to silent data loss.
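A simple guard against running an affected release, for example in a wrapper script, is to compare the installed version against the fixed 3.0.11 using sort -V (a sketch; the version is hard-coded here, whereas in practice you would parse it from the output of pt-online-schema-change --version):

```shell
# Refuse to proceed when the installed Toolkit predates the 3.0.11 fix.
installed="3.0.10"   # illustrative; parse from: pt-online-schema-change --version
fixed="3.0.11"
oldest=$(printf '%s\n%s\n' "$fixed" "$installed" | sort -V | head -n1)
if [ "$oldest" != "$fixed" ]; then
  msg="Toolkit $installed is affected by PT-1570; upgrade to $fixed or newer"
  echo "$msg"
fi
```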

The problem can manifest in two ways. The first one, although confusing, is not really dangerous as the operation is canceled. It happens when the columns with culprit comments do not allow NULL values. For example:

CREATE TABLE `test_not_null` (
`id` int NOT NULL,
`add_id` int NOT NULL COMMENT 'my generated test case',
PRIMARY KEY (`id`)
) ENGINE=InnoDB;

Schema change operation on this one will look like this:

$ ./pt-online-schema-change-3.0.10 u=msandbox,p=msandbox,h=localhost,S=/tmp/mysql_sandbox5735.sock,D=test,t=test_not_null --print --alter "engine=InnoDB" --execute
(...)
Altering `test`.`test_not_null`...
Creating new table...
CREATE TABLE `test`.`_test_not_null_new` (
`id` int(11) NOT NULL,
`add_id` int(11) NOT NULL COMMENT 'my generated test case',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Created new table test._test_not_null_new OK.
Altering new table...
ALTER TABLE `test`.`_test_not_null_new` engine=InnoDB
Altered `test`.`_test_not_null_new` OK.
2020-09-30T21:25:22 Creating triggers...
2020-09-30T21:25:22 Created triggers OK.
2020-09-30T21:25:22 Copying approximately 3 rows...
INSERT LOW_PRIORITY IGNORE INTO `test`.`_test_not_null_new` (`id`) SELECT `id` FROM `test`.`test_not_null` LOCK IN SHARE MODE /*pt-online-schema-change 1438 copy table*/
2020-09-30T21:25:22 Dropping triggers...
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_not_null_del`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_not_null_upd`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_not_null_ins`
2020-09-30T21:25:22 Dropped triggers OK.
2020-09-30T21:25:22 Dropping new table...
DROP TABLE IF EXISTS `test`.`_test_not_null_new`;
2020-09-30T21:25:22 Dropped new table OK.
`test`.`test_not_null` was not altered.
2020-09-30T21:25:22 Error copying rows from `test`.`test_not_null` to `test`.`_test_not_null_new`: 2020-09-30T21:25:22 Copying rows caused a MySQL error 1364:
Level: Warning
Code: 1364
Message: Field 'add_id' doesn't have a default value
Query: INSERT LOW_PRIORITY IGNORE INTO `test`.`_test_not_null_new` (`id`) SELECT `id` FROM `test`.`test_not_null` LOCK IN SHARE MODE /*pt-online-schema-change 1438 copy table*/

So the reason for the failed operation may be unclear, but at least no data gets damaged. A much worse outcome occurs when the column with the comment allows NULL values:

CREATE TABLE `test_null` (
`id` int NOT NULL,
`add_id` int DEFAULT NULL COMMENT 'my generated test case',
PRIMARY KEY (`id`)
) ENGINE=InnoDB;

mysql [localhost:5735] {msandbox} (test) > select * from test_null;
+----+--------+
| id | add_id |
+----+--------+
| 1  |      1 |
| 2  |      2 |
| 3  |      3 |
+----+--------+
3 rows in set (0.01 sec)

For this one, the schema change command runs without any errors:

$ ./pt-online-schema-change-3.0.10 u=msandbox,p=msandbox,h=localhost,S=/tmp/mysql_sandbox5735.sock,D=test,t=test_null --print --alter "engine=InnoDB" --execute
(...)
Altering `test`.`test_null`...
Creating new table...
CREATE TABLE `test`.`_test_null_new` (
`id` int(11) NOT NULL,
`add_id` int(11) DEFAULT NULL COMMENT 'my generated test case',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Created new table test._test_null_new OK.
Altering new table...
ALTER TABLE `test`.`_test_null_new` engine=InnoDB
Altered `test`.`_test_null_new` OK.
2020-09-30T21:28:11 Creating triggers...
2020-09-30T21:28:11 Created triggers OK.
2020-09-30T21:28:11 Copying approximately 3 rows...
INSERT LOW_PRIORITY IGNORE INTO `test`.`_test_null_new` (`id`) SELECT `id` FROM `test`.`test_null` LOCK IN SHARE MODE /*pt-online-schema-change 3568 copy table*/
2020-09-30T21:28:11 Copied rows OK.
2020-09-30T21:28:11 Analyzing new table...
2020-09-30T21:28:11 Swapping tables...
RENAME TABLE `test`.`test_null` TO `test`.`_test_null_old`, `test`.`_test_null_new` TO `test`.`test_null`
2020-09-30T21:28:11 Swapped original and new tables OK.
2020-09-30T21:28:11 Dropping old table...
DROP TABLE IF EXISTS `test`.`_test_null_old`
2020-09-30T21:28:11 Dropped old table `test`.`_test_null_old` OK.
2020-09-30T21:28:11 Dropping triggers...
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_null_del`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_null_upd`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_null_ins`
2020-09-30T21:28:11 Dropped triggers OK.
Successfully altered `test`.`test_null`.

But… the table data is not the same after:

mysql [localhost:5735] {msandbox} (test) > select * from test_null;
+----+--------+
| id | add_id |
+----+--------+
|  1 |   NULL |
|  2 |   NULL |
|  3 |   NULL |
+----+--------+
3 rows in set (0.00 sec)

Summarizing, it is essential to make sure you are using an up-to-date Percona Toolkit, especially the pt-online-schema-change tool, to avoid potential disaster. The latest stable release as of this writing is version 3.2.1, and the fixed version for this particular bug, 3.0.11, was released in July 2018.

References:

https://www.percona.com/doc/percona-toolkit/3.0/release_notes.html
https://jira.percona.com/browse/PT-1570
