Jan
30
2023
--

Talking Drupal #384 – The Drop Times

Today we are talking about The Drop Times with Anoop John.

For show notes visit: www.talkingDrupal.com/384

Topics

  • Tell us about the Drop Times
  • Tell us a bit about yourself
  • Getting involved with Drupal
  • People behind Drop Times
  • Mission of the Drop Times
  • Content source
  • Current audience
  • Attribution issues
  • How has the beginning been
  • Funding
  • Roadmap
  • Non-profit
  • Contribution of content
  • How can we help

Resources

Guests

Anoop John – zyxware.com @anoopjohn

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan John Picozzi – www.epam.com @johnpicozzi Katherine Druckman – katherinedruckman.com @katherined

MOTW Correspondent

Martin Anderson-Clutz – @mandclu Content Planner Helps you to create, plan and manage content. It offers a dashboard, a content calendar and a todo list.

Jan
30
2023
--

Why PostgreSQL Needs Transparent Database Encryption (TDE)

PostgreSQL Needs Transparent Database Encryption

As Ibrar Ahmed noted in his blog post on Transparent Database Encryption (TDE), PostgreSQL is a surprising outlier when it comes to offering Transparent Database Encryption. Instead, PostgreSQL developers seem to be of the opinion that encryption is a storage-level problem, better solved at the filesystem or block device level. I do not share that opinion, and according to a poll I did late last year, many PostgreSQL community members don’t either.

There are two kinds of reasons why TDE implemented inside the database is essential: technical and non-technical. In my opinion, engineers tend to focus on the technical side and stop at “technical solutions exist,” while in practice what really matters is whether an organization is able to implement the solutions offered. This is where organizational issues (especially politics) play a big role, particularly in large organizations.

For example, only a fraction of Oracle (and other legacy database) workloads that could be migrated to PostgreSQL are actually migrated, even though migration offers substantial cost savings and additional benefits, all because of organizational resistance.

But first, where does the need for encryption really come from? At a high level, some of it comes from technologists who understand that unencrypted sensitive data is not a good idea. Increasingly, though, the industry does not rely on engineers having common sense but rather on defined standards that companies need to comply with (HIPAA, PCI DSS, SOC 2, ISO 27001, etc.).

Compliance requirements are typically written in rather general terms, and it is up to the compliance team (and compliance audit team) to translate them into technical requirements, many of which prefer (or even require) TDE.

Technical reasons for Transparent Database Encryption

Compared to storage-level encryption, encryption on the database side offers multiple benefits, including:

  • It offers more flexibility and granularity in the encryption of database objects (what is encrypted, what is not, and with what key).
  • If database files are copied or otherwise exposed in their raw form, the data is not exposed.  For example, physical backups stored in open S3 buckets are relatively common exposure vectors.
  • It does not require encryption capabilities on the storage level to provide data security.
  • It adds defense in depth. Even if the storage level encryption exists, having TDE plus storage level encryption gives an added layer of security.

Organizational (non-technical) reasons for TDE

Even if you find a way to address all the technical reasons you might prefer encryption inside your database engine (TDE), you may face additional friction in your organization when choosing a different path.  This might be a challenge with the security/compliance team (as I already mentioned above) or require additional coordination with the team responsible for storage to ensure proper encryption of the original data and all copies (including backups).

While this will look trivial for startups and small companies, where all of those are single-person or well-connected teams, you may be surprised by how expensive such coordination is in some large organizations!

Encryption in the cloud Database as a Service (DBaaS) space

You may argue that storage-level encryption seems to be good enough for cloud vendors.  AWS, for example, offers encryption in RDS, which appears to be good enough for most of its customers!   However, RDS is a rather restricted environment where many technical and organizational aspects are taken care of for you.  For example, if you have created an encrypted instance, backups will be automatically encrypted, too, eliminating the possibility of a “mistake.”

State of TDE in PostgreSQL

Interestingly, the challenges of implementing Transparent Database Encryption in PostgreSQL are not just technical but organizational, too.  As far as I remember, several implementations have been offered for inclusion in PostgreSQL, but none made it through.  The Cybertec encryption patch has existed for many years, there was also work by NTT submitted a few years ago, Highgo seems to have worked on TDE, too, and there were talks at PGCon over the years.  What does not happen, though, is a feature merge into PostgreSQL Core.

You may argue the PostgreSQL Core is conservative by design, and this is what made PostgreSQL such a stable product. I think, though, one needs to maintain a balance between innovation and stability; shifting too far to either side risks the project’s long-term future.

Note that I do not argue that including full TDE support in PostgreSQL Core is the right idea; it might be that at this stage, we need innovation to happen through the competition of several ideas and several implementations. What PostgreSQL Core might highly benefit from is a pluggable storage interface, as it could be used not only to implement encryption but also to store data somewhere other than a locally mounted POSIX file system, such as what Neon is doing.

Beyond Transparent Database Encryption in PostgreSQL

While I believe Transparent Database Encryption in PostgreSQL is important, I think it is just an illustration of a bigger question.  Is technical governance in PostgreSQL designed to maximize its success in the future, or is it more about sticking to the approaches that helped PostgreSQL reach its current level of success? For a project of such scale and influence, there seems to be surprisingly little user impact on PostgreSQL governance. The PostgreSQL Core Team consists of “seven long-time community members with various specializations” rather than having clearly electable positions, as many other open source organizations do.  The development process in PostgreSQL is based around a mailing list rather than more modern and organized issue tracking and pull-request-based development workflows.   Interested in PostgreSQL bugs? There is no bug database that lets you easily see, in a user-friendly way, which bugs are confirmed and in what version they were fixed. Instead, you need to dig through the bugs mailing list.

Do not get me wrong, PostgreSQL is a fantastic database, and it has rightfully been named Database of the Year more times than any other database. Yet, I believe there are opportunities to make PostgreSQL even more “future-proof” and even more successful in the future.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Jan
30
2023
--

Percona Backup for MongoDB 2.0.3, Updates to Percona Distribution for MySQL: Release Roundup January 30, 2023

Percona Releases Jan 30 2023

It’s time for the release roundup!

Percona is a leading provider of unbiased open source database solutions that allow organizations to easily, securely, and affordably maintain business agility, minimize risks, and stay competitive.

Our Release Roundups showcase the latest Percona software updates, tools, and features to help you manage and deploy our software. They offer highlights and critical information, as well as links to the full release notes and direct links to download the software or service itself.

Today’s post includes those releases and updates that have come out since January 9, 2023. Take a look!

 

Percona Distribution for MySQL (PS-Based Variant) 8.0.30 (Update)

An update to the release of Percona Distribution for MySQL (PS-based variant) 8.0.30 was made available on January 23, 2023. Percona Distribution for MySQL is a single solution with the most critical enterprise components from the MySQL open source community, designed and tested to work together. This update to the release of Percona Distribution for MySQL using the Percona Server for MySQL includes a new version of Percona Toolkit 3.5.1 that fixes the security vulnerability CVE-2022-32149.

Download Percona Distribution for MySQL (PS-based variant) 8.0.30 (Update)

 

Percona Distribution for MySQL (PS-Based Variant) 8.0.30 (Second Update)

The second update to Percona Distribution for MySQL (PS-based variant) 8.0.30 was released on January 23, 2023. This update to the release of Percona Distribution for MySQL using the Percona Server for MySQL includes a new version of Percona Toolkit 3.5.1 that fixes the security vulnerability CVE-2022-32149.

Download Percona Distribution for MySQL (PS-based variant) 8.0.30 (Second Update)

 

Percona Distribution for MySQL (PXC-Based Variant) 8.0.30 (Update)

An update to the release of Percona Distribution for MySQL (PXC-based variant) 8.0.30 was made available on January 23, 2023. This update to the release of Percona Distribution for MySQL using the Percona XtraDB Cluster includes the new version of Percona Toolkit 3.5.1 that fixes the security vulnerability CVE-2022-32149.

Download Percona Distribution for MySQL (PXC-based variant) 8.0.30 (Update)

 

Percona Backup for MongoDB 2.0.3

Percona Backup for MongoDB 2.0.3 was released on January 11, 2023. It is a distributed, low-impact solution for consistent backups of MongoDB sharded clusters and replica sets. This is a tool for creating consistent backups across a MongoDB sharded cluster (or a non-sharded replica set), and for restoring those backups to a specific point in time.

Release highlights include:

  • Incremental physical backups enable you to reduce storage costs and help keep business-critical data safe. Saving only the differences results in faster completion times and makes incremental backups much smaller than full backups, so you save on storage space and, for cloud setups, on data transfer. This is a technical preview feature, and we welcome your feedback to improve the functionality (see the short command sketch after this list).
  • Now you can selectively back up and restore unsharded collections in sharded clusters. This extends the data set to work with.
  • Added support for AWS IRSA (IAM Roles for Service Accounts) credentials allows Percona Backup for MongoDB running in a pod to access AWS storage using IAM roles. This increases the security of your cloud infrastructure and enables you to control access to it from a single place.
  • Percona Backup for MongoDB is now available on Red Hat Enterprise Linux 9 and compatible derivatives.
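
For those who want to try the incremental backups, the command sequence is roughly as follows. This is a hedged sketch based on the Percona Backup for MongoDB documentation as I recall it, so verify the flags with pbm help backup on your version.

$ pbm backup --type=incremental --base   # first, take the base incremental backup
$ pbm backup --type=incremental          # subsequent runs store only the changes
$ pbm status                             # list backups and check their state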

Download Percona Backup for MongoDB 2.0.3

 

Percona Toolkit 3.5.1

Percona Toolkit 3.5.1 was released on January 23, 2023. It is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL, MariaDB, MongoDB, and PostgreSQL server and system tasks that are too difficult or complex to perform manually – freeing your DBAs for work that helps you achieve your business goals. This release includes improvements, bug fixes, and support for PostgreSQL and MySQL operators.

Download Percona Toolkit 3.5.1

 

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MongoDB, PostgreSQL, MariaDB, and other open source databases in on-premises and cloud environments.

Jan
26
2023
--

Why You Need To Keep Track of Bugs

Keep Track of software Bugs

Most of us are lucky that software bugs are a small part of our lives. The ‘things just work’ attitude works for most situations, but occasionally something appears that seems trivial at first glance yet may be catastrophic.

A bug appeared in MySQL’s recently released 8.0.32 that really caught my attention. This is a prime example of how a minor bug could have a significant impact on your life and a detrimental impact on your company or project.

The title Wrong result for AVG() OVER(ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) is the first thing that caught my eye. A lot of developers heavily use the AVG() function. My first thought was, ‘uh-oh, AVG() being broken is a terrible thing,’ but then I realized it was thankfully not that drastic.

The OVER() clause indicates this is a problem in a Window Function. And odds are, if you are using OVER() for calculations, you are also using AVG() a great deal. I love Window Functions with AVG() a lot, so I was happy that the problem was constrained a little, but I was sitting squarely in the middle of those constraints.

I was curious, and also wondering if this could affect my code.

If you find yourself in a similar position, the first step is to read the entire bug report from top to bottom, paying attention to the details. At this point, it is still okay to wonder if this is an edge case that may not bother you, and you will be able to go on about your day happily.

Most MySQL bug reports come with an example of the problem. The author of this bug, Markus Winand, is well-known in the database world and wrote a beautiful, clear example of reporting a bug. He describes the problem, identifies the error, and provides a test script.

The problem, he states, is that there is a wrong answer when using AVG() with the window frame ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING. He then details the issue: the result is wrong for the second row (it seems to return the SUM without dividing by the count). And then comes an indication that he really has tested this issue.

Problem does not exist in 8.0.31 or earlier and also not in 8.0.32 when using RANGE instead of ROWS.

This may not be a big issue as MySQL 8.0.32 is very new and not widely used in production yet. So there may be a collective sigh of relief in the MySQL metaverse that this is not a problem that has been silently plaguing us for an unknown period of time. And those who have already upgraded are scrambling to check their code for ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING.

I would encourage you to test your non-8.0.32 versions by issuing the following just in case and as practice. The following code is from the bug report.

CREATE TABLE t (
n INTEGER
);

INSERT INTO t VALUES (1), (2), (3), (4);

SELECT n
, SUM(n) OVER(ORDER BY n ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS "sum(n)"
, COUNT(*) OVER(ORDER BY n ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS "count(n)"
, AVG(n) OVER(ORDER BY n ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS "avg(n)"
, SUM(n) OVER(ORDER BY n ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
/ COUNT(*) OVER(ORDER BY n ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS "sum(n)/count(n)"
FROM t
ORDER BY n;

+------+--------+----------+--------+-----------------+
| n    | sum(n) | count(n) | avg(n) | sum(n)/count(n) |
+------+--------+----------+--------+-----------------+
|    1 |      9 |        3 | 3.0000 |          3.0000 |
|    2 |      7 |        2 | 7.0000 |          3.5000 | <-- Column "avg(n)" is wrong.
|    3 |      4 |        1 | 4.0000 |          4.0000 |
|    4 |   NULL |        0 |   NULL |            NULL |
+------+--------+----------+--------+-----------------+

My 8.0.32 instance, not in production, reported the wrong number. My primary system reported the proper 3.5 in the avg(n) column. Whew! Hopefully, this will be an easy fix for 8.0.33.

Lessons learned

You need to peruse the list of bugs regularly, say once a week, just to keep apprised of potential problems on your horizon. You can search for bugs at https://bugs.mysql.com and filter by version and time period.

If the bug is a worry, try the test case. Yes, there are occasions when you are looking at an edge-case situation that may not bother you. Still, part of being a Database Administrator (DBA) is a healthy dose of paranoia that something will try and hurt your data. Test to make sure either way.

Lastly, it pays to check your critical calculations from time to time. Imagine you are an actuary at an insurance company, and this bug slipped past you! The numbers you rely on would be untrustworthy, and your company could be in ‘less than optimal circumstances.’

And a big thank you and ‘good catch’ to Markus Winand for finding this bug.

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Jan
25
2023
--

Identify Active Databases and Users in MySQL

Identify Active Databases and Users in MySQL

Database administrators often need to identify inactive databases and users to save resources.  This can be done using various methods to determine which databases and users are frequently accessed. We can save disk space and other resources by purging inactive databases and users. This task is part of regular maintenance and helps to manage disk space and other resources effectively.

This post will explain easy ways to identify active databases and users in MySQL.

Identifying active databases in MySQL

There are various approaches we can use to identify the active databases. We can identify them based on the timestamps of data file updates, entries in the slow log, scanning binary logs, or parsing the audit log or general log. We can ignore the general log in this investigation, though, as it consumes a lot of disk space, which is an overhead.

We have the below methods to identify active database objects:

  • Review slow logs to identify database objects being queried.
  • Find the updated timestamp on datadir files.
  • Scan binary logs for identifying tables that are being modified.
  • Scan audit logs for identifying database objects.

During our tests, we observed that the above methods helped us identify the active databases; however, we sometimes had to combine two or more of them to get the best result. Also, if there are tables that are only read or written monthly, the scope of the review increases, and you will have to retain the logs for a more extended period.

We also have two stats table features present in Percona Server for MySQL and MySQL Community versions as follows:

  • User Statistics
  • Sys-schema-table-statistics

User Statistics: Another way to identify the active databases, users, and other information is available in Percona Server for MySQL (and MariaDB). Both provide userstat, a plugin feature that is disabled by default and must be enabled by setting userstat to ON. This ensures that statistics collection doesn’t cause any extra load on the server unless desired.
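
As a rough sketch (assuming Percona Server for MySQL with the userstat feature available), enabling it and later reading back the accumulated statistics could look like this:

SET GLOBAL userstat=ON;

-- After the server has run under normal load for a while:
SELECT * FROM INFORMATION_SCHEMA.TABLE_STATISTICS ORDER BY ROWS_READ DESC LIMIT 10;
SELECT * FROM INFORMATION_SCHEMA.USER_STATISTICS ORDER BY TOTAL_CONNECTIONS DESC LIMIT 10;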

Sys-schema-table-statistics: sys.schema_table_statistics summarizes table statistics. By default, rows are sorted by descending total wait time. The sys schema statistics only cover the period since the last MySQL service restart; because the stats are lost on restart, we will not cover this approach in this blog.

We have a great utility called pt-table-usage from Percona Toolkit, which can identify database objects from logs.

Identify database tables using pt-table-usage

The pt-table-usage tool reads queries from a log and analyzes how they use tables. This tool can generate a report based on the slow query log and show you what tables were used in queries. Here you can explore more options for pt-table-usage.

To identify the databases, you must rely on the slow logs, where we can find the read queries. Make sure to configure the slow_query_log and long_query_time parameters as below:

SET GLOBAL slow_query_log=1;
SET GLOBAL long_query_time=0;
SET GLOBAL slow_query_log_use_global_control='long_query_time';

slow_query_log: MySQL’s slow query log feature enables you to log queries that exceed a predefined time limit. SET GLOBAL slow_query_log=1; enables the slow log, and setting it to 0 disables it.

long_query_time: When the slow_query_log is enabled, it logs any query that takes longer than long_query_time seconds to run. When configured with SET GLOBAL long_query_time=0;, all queries are logged.

slow_query_log_use_global_control: SET GLOBAL slow_query_log_use_global_control='long_query_time'; applies the global setting to currently running sessions in Percona Server for MySQL. You can review slow_query_log_use_global_control and its other options in the documentation.

So, the above parameter settings will log all queries in the slow log file. We can also add the parameters below if we want to reduce the overhead of logging every query and the disk space it consumes.

SET GLOBAL log_slow_rate_type = query;
SET GLOBAL log_slow_rate_limit = 10;

Note: If we set the value to 10, only about 10% of sessions/queries (every tenth one) will be logged, which reduces the overhead of logging every query and the disk space used. Once we have collected the slow log for a sufficient time, we can parse it and fetch the database objects that are actively used.

The tool prints a usage report for each table in every query, similar to the following:

$ sudo pt-table-usage /var/lib/mysql/ip-172-31-92-72-slow.log > table_usage.log

$ cat table_usage.log | head -10
Query_id: 0xA212AD93263CF26F.1
SELECT DUAL

Query_id: 0xC684EA1D78348D23.1
SELECT information_schema.innodb_metrics
WHERE information_schema.innodb_metrics

Query_id: 0x153F1CE7D660AE82.1
SELECT information_schema.processlist
WHERE information_schema.processlist

In the output above, pt-table-usage read the slow log (/var/lib/mysql/ip-172-31-92-72-slow.log), and its output was directed to a log file containing query and database object details.

Now parse the output table_usage.log using the below command to identify unique database objects.

$ cat table_usage.log | egrep "WHERE|JOIN" | grep -v "__SQ" | grep -v "DUAL" | awk '{print $2}' | sort | uniq -c | sort -nr
6703 information_schema.innodb_metrics
3352 performance_schema.table_lock_waits_summary_by_table
3352 information_schema.processlist
3352 INFORMATION_SCHEMA.PLUGINS
2790 information_schema.tables
558 performance_schema.table_io_waits_summary_by_table
558 performance_schema.table_io_waits_summary_by_index_usage
558 information_schema.schemata
5 authors
4 information_schema.table_statistics

Identify databases using slow logs

Once we have collected slow logs for a certain period with the above configuration capturing all queries, we can easily fetch the databases being read from the slow log file using Unix commands.

The command below will list all database names based on usage.

$ sudo grep Schema: /var/lib/mysql/mysql-slow.log| awk -F' ' '{print $4 $5}' | sort | uniq -c
2717 Schema:mytestdb
123 Schema:percona
762238 Schema:sbtest

Example for 8.0 Version:

$ sudo grep Schema: /var/lib/mysql/mysql-slow.log| awk -F' ' '{print $2 $3}' | sort | uniq -c
2717 Schema:mytestdb
273 Schema:percona
223567 Schema:sbtest

We can extract table names as well with further Unix grep or awk commands.

Identifying active users in MySQL

We have two methods to identify active users for a specific period. 

  1. Parse Audit Logs to identify active users.
  2. Identify active users from processlist logs.

Disclaimer: We do have the userstat feature for identifying active users in Percona Server for MySQL (and MariaDB), but the statistics in the INFORMATION_SCHEMA.CLIENT_STATISTICS table are only available since the last MySQL service restart.

Parse Audit Logs to identify active users

The Percona Audit Log Plugin monitors and logs connection and query activity performed on a specific server. Information about the activity is stored in a log file. The plugin ships with Percona Server for MySQL but is not enabled by default. To check whether the plugin is enabled, run the following command:

mysql> SELECT * FROM information_schema.PLUGINS WHERE PLUGIN_NAME LIKE '%audit%';

Follow this link to enable audit_log_plugin and other options in Percona Server for MySQL.
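
For reference, loading the plugin is typically a single statement, assuming the audit_log shared library ships with your Percona Server build (it does for the standard packages); the related variables can then be inspected as usual:

mysql> INSTALL PLUGIN audit_log SONAME 'audit_log.so';
mysql> SHOW GLOBAL VARIABLES LIKE 'audit_log%';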

We use audit_log_rotations and audit_log_rotate_on_size parameters to manage the disk space.

SET GLOBAL audit_log_rotate_on_size=1073741824;
SET GLOBAL audit_log_rotations=5;

audit_log_rotate_on_size: This variable is measured in bytes and specifies the maximum size of the audit log file. Upon reaching this size, the audit log will be rotated. The rotated log files are kept in the same directory as the current log file, with a sequence number appended to the file name upon rotation. For this variable to take effect, set the audit_log_handler variable to FILE.

audit_log_rotations: This variable specifies how many log files should be kept when audit_log_rotate_on_size is set to a non-zero value. It has an effect only when audit_log_handler is set to FILE.

Once these variables are set, we can fetch the user details from these logs. You can use more filters to get specific users, database objects, host details, etc.

$ sudo grep -w 'USER' /var/lib/mysql/audit.log | tail
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm"
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm"
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm"
USER="pmm"

Identify active users from processlist logs

To identify active users from the processlist, we need data collected over time to get the most accurate information possible. The processlist captures activity only at a specific instant; if a user is not executing a query at that moment, there is a high chance of missing them. Thus, the granularity of processlist collection is critical to getting an accurate picture of the users.

So, here we will collect the processlist every five seconds using cron, save it in a daily file, and analyze it (a hypothetical collection script is sketched below).
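
The collection script itself is not shown in this post, but a minimal, hypothetical sketch of the idea could look like the one below (invoked once a minute from cron; the loop is needed because cron cannot fire more often than once per minute, and authentication is assumed to be configured, e.g., via ~/.my.cnf):

#!/bin/bash
# Hypothetical collector: appends a timestamped status line and the processlist every 5 seconds
OUT=/var/log/mysql/processlist_DB1_$(date +%Y%m%d).log
for i in $(seq 1 12); do
    {
        echo "================================================================================"
        echo -n "$(date '+%Y-%m-%d %H:%M:%S'): "
        mysqladmin status
        mysql -N -e "SHOW FULL PROCESSLIST"
    } >> "$OUT"
    sleep 5
done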

processlist_DB1_20221215.log – Processlist file name collected for one day.

$ cat processlist_DB1_20221215.log | head
================================================================================
2022-12-15 00:00:05: Uptime: 8935629 Threads: 7 Questions: 5258388359 Slow queries: 96680 Opens: 4096 Flush tables: 1 Open tables: 1981
40 sampleuser 10.11.8.11:54540 sampledb Sleep 5 None 1 1 
41 sampleuser 10.11.8.11:54542 sampledb Sleep 5 None 38 38 
64 sbtest_user 10.11.8.11:54610 sbtest Sleep 5 None 3 3 
65 sbtest_user 10.11.8.11:54612 sbtest Sleep 5 None 0 544 
66 sbtest_user 10.11.8.11:54614 sbtest Sleep 5 None 1 15 
101 sampleuser 10.11.8.11:54736 sampledb Sleep 5 None 2 14 
106 percona 10.11.8.52:53386 percona Sleep 5 None 0 0

Once we have a collection of processlist logs for a particular period, we will fetch the active users information from them.

$ for f in `ls processlist_DB1*202212*`; do echo $f; cat $f | grep -e ^[1-9] | grep -vie "system\ user\|Uptime" | awk -F' ' '{print $2}' >> /tmp/USERZ;done; cat /tmp/USERZ | sort | uniq -c
11 mytestdb_user
42498 percona
8640 sbtest_user

processlist_DB1*202212* – Processlist file names collected for a specific period.

This command lists the active users on this node/database. From this output, we can filter for a user and then, with awk and sort, fetch the list of users accessing a specific database.

And last comes the general log, the ruler of all logs; yes, it logs everything that happens in MySQL.

General Log: The General Log is another option for retrieving information about database activity. It logs all activity on the instance, but it has some limitations. Enabling the General Log can consume a lot of disk space and generate a high I/O load due to the constant writing of operations to the log.

Therefore, it is generally better to avoid using the General Log and to use the methods described above instead. It would also be helpful if MySQL had a system table that stored active/inactive information about databases, users, and other items, even after MySQL restarts.

Conclusion

Following the process described above, we can identify the most active databases and users. We can collect information over a specific period and follow the same process to find the list of inactive databases. Any databases and users that do not appear in the output can be considered inactive. However, it is essential to note that we should consult with the application team before marking any databases or users as inactive, as some jobs may only run periodically (e.g., quarterly or semi-annually).

Percona Distribution for MySQL is the most complete, stable, scalable, and secure, open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Download Percona Distribution for MySQL Today

Jan
24
2023
--

WAL Compression in PostgreSQL and Recent Improvements in Version 15

WAL Compression in PostgreSQL

Attempts to compress PostgreSQL WAL at different levels have been around since the beginning. Some of the built-in features (wal_compression) have been there since 2016, and almost all backup tools compress WAL before taking it to the backup repository. But it is time to take another look at the built-in wal_compression because PostgreSQL 15 has much more to offer. There will be a considerable gain if we combine this with the other great improvements in WAL archiving in PostgreSQL 15, as discussed in the previous posts New WAL Archive Module/Library in PostgreSQL 15 and Speed Up of the WAL Archiving in PostgreSQL 15.

I am not planning to look at WAL segment file compression by the backup tools because it is external to PostgreSQL and is primarily a feature of backup tooling. Even if we are not using specialized backup tools, we can still compress the WAL segment files as part of archiving, which can give great rewards.

Let’s look at what PostgreSQL offers internally for WAL compression. The WAL compression feature inside PostgreSQL kicks in (if we opt in) when writing full page images into WAL, which can save a lot of I/O overhead. Reduced WAL segment size brings further gains in replication and backup because less data needs to be transmitted.

What is Full Page Writes?

Novice users might be wondering what “Full Page Writes” are, and we should remember that PostgreSQL uses 8k pages.

postgres=# show block_size ;
block_size
------------
8192
(1 row)

But the host machine might be dealing with pages of smaller size, say 4k pages.

$ getconf PAGESIZE
4096

PostgreSQL treats the 8k page as the “atomic unit” for reading and writing. But since the host machine has a smaller page size, it will split the 8k page and treat each OS page as an atomic unit. This causes a problem with abrupt disruptions: part of the 8k page can be saved, while another part might be lost because the host machine does not treat it as part of the original whole. This is generally referred to as “partial page writes” or “torn pages.”

Such “torn pages” are corruption from the database point of view. If a data file contains such a torn page, PostgreSQL loses the consistency of the page. This is not a problem for PostgreSQL alone; every database system needs to deal with it. For example, MySQL/InnoDB deals with this problem using a doublewrite buffer, from which a copy of the non-corrupt page can be obtained during recovery. PostgreSQL’s approach is slightly different: it writes a copy of the full page, when the page is modified for the first time after a checkpoint, into the WAL logs. Since WAL is synced frequently, and PostgreSQL can decide on the point up to which recovery should happen, it is a safe place to save the copy of the “full page.”
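
You can check whether this protection is enabled on your instance the same way we checked the block size above (it is on by default):

postgres=# show full_page_writes ;
 full_page_writes
------------------
 on
(1 row)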

The performance penalty of Full Page Writes

As mentioned above, when the database page is modified for the first time after a checkpoint, that needs to be safely written to WAL as the trustable reference. So during crash-recovery, PostgreSQL can safely apply the consistent pages from the WAL logs. But this comes with a considerable performance implication.

Now we know there is a high chance of too many full-page writes immediately after a checkpoint. This will be clearly visible in PostgreSQL performance benchmarks as a “Sawtooth wave” pattern observed by Vadim in his tests:

Sawtooth wave PostgreSQL

As we can see, the throughput suddenly drops after every checkpoint due to heavy WAL writing and gradually picks up until the next checkpoint.

Full Page compression and recent improvements

PostgreSQL 14 and older

Full page images are much bigger and self-contained, which makes it possible to compress them before writing into the WAL segments. This feature landed way back in PostgreSQL 9.5; it uses the built-in LZ compression implementation, often referred to as “pglz”. But it was never too popular because of the CPU overhead, so it has not been part of the general optimization advice. Users who opt in for full-page compression after analyzing their workload can just set the wal_compression parameter on all supported PostgreSQL versions and reload the configuration as a superuser:

ALTER SYSTEM SET wal_compression=ON;
SELECT pg_reload_conf();

PostgreSQL 15+

Modern compression algorithms offer much better compression while taking fewer CPU cycles; lz4 is a good example. PostgreSQL 15 changes the same wal_compression parameter to take enumerated values like pglz, lz4, and zstd, in addition to on and off, which are kept for backward compatibility. The boolean-equivalent values on, true, yes, and 1 map to “pglz.”
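
So, on PostgreSQL 15, assuming the server binaries were built with lz4 support, switching the algorithm is just a variation of the earlier example:

ALTER SYSTEM SET wal_compression='lz4';
SELECT pg_reload_conf();
SHOW wal_compression;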

Unlike pglz, which is a built-in compression library in PostgreSQL, the new compression methods are provided by external libraries, so they need to be enabled at build time. The configure flags are --with-lz4 and --with-zstd for lz4 and zstd, respectively.

For prebuilt packages installed from repositories, you can check as follows:

/usr/pgsql-15/bin/pg_config | grep "zstd\|lz4"
CONFIGURE =  '--enable-rpath' '--prefix=/usr/pgsql-15' '--includedir=/usr/pgsql-15/include' '--mandir=/usr/pgsql-15/share/man' '--datadir=/usr/pgsql-15/share' '--libdir=/usr/pgsql-15/lib' '--with-lz4' '--with-extra-version= - Percona Distribution' '--with-zstd' '--with-icu' '--with-llvm' '--with-perl' '--with-python' '--with-tcl' '--with-tclconfig=/usr/lib64' '--with-openssl' '--with-pam' '--with-gssapi' '--with-includes=/usr/include' '--with-libraries=/usr/lib64' '--enable-nls' '--enable-dtrace' '--with-uuid=e2fs' '--with-libxml' '--with-libxslt' '--with-ldap' '--with-selinux' '--with-systemd' '--with-system-tzdata=/usr/share/zoneinfo' '--sysconfdir=/etc/sysconfig/pgsql' '--docdir=/usr/pgsql-15/doc' '--htmldir=/usr/pgsql-15/doc/html' 'CFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,--as-needed' 'LLVM_CONFIG=/usr/bin/llvm-config' 'CLANG=/usr/bin/clang' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig' 'PYTHON=/usr/bin/python3'
LIBS = -lpgcommon -lpgport -lselinux -lzstd -llz4 -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lpthread -lrt -ldl -lm

Why compression of FP writes only?

Individual processes generate WAL records, and latency is very crucial for transactions. So a coordinated effort for compressing WAL record generation by multiple sessions may not add much value. But there are areas like indexing or bulk data load which could potentially benefit from WAL compression if such a feature is included in the future.

A quick look at compression options

When it comes to compression, the question is automatically about the savings in I/O. As we know, compression comes at the expense of CPU usage. My objective was to quickly check whether there is any benefit when CPU utilization is already high and whether there is any adverse (negative) effect on overall TPS.

There is considerable saving if there are a lot of full-page writes. I could artificially trigger checkpoints to see the end effect on total WAL generation.
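
The exact measurement script is not included here, but a rough way to see the WAL volume generated by a workload is to use the standard functions plus the pg_stat_wal view (available since PostgreSQL 14); the start LSN below is a placeholder you would substitute with the value noted before the test:

-- Before the test: note the current WAL position and full-page image counters
SELECT pg_current_wal_lsn();
SELECT wal_records, wal_fpi, wal_bytes FROM pg_stat_wal;

-- Run the workload, optionally forcing full-page writes with: CHECKPOINT;

-- After the test: WAL bytes generated since the recorded position
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '<start_lsn>'));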

WAL generation postgresql

Lz4 gives compression comparable to the default pglz at a much lower CPU cost. Zstd can provide the highest compression (30% more compared to lz4).

In a system already on the brink of too much WAL generation, the uncompressed WAL can trigger many more checkpoints, resulting in even more WAL generation.

Summary

Some of the key points/takeaways I have from the discussion in the community, as well as from my simple tests:

  • The compression method pglz available in the older version was not very efficient. It uses more CPU and can affect the TPS for specific workloads and machine configuration.
  • Modern compression algorithms and libraries are excellent. They do a much better job than what was available in PostgreSQL (pglz).
  • I couldn’t see any adverse effect on the TPS on quick tests. Instead, I could observe 10-15% better throughput with compression enabled, maybe because of less I/O wait.
  • Lz4 compression can be the choice if the database workload is CPU bound because it is light on the CPU. It can give compression very close to pglz without causing high CPU overhead.
  • Zstd can be chosen if the server load is not CPU bound because it can give us better compression at the expense of more CPU utilization.
  • An indirect benefit of WAL compression is that it reduces the chance of checkpoints being triggered by the volume of WAL generated (max_wal_size).

The actual benefit of compression depends on many factors: configured checkpoint frequency, the chance of checkpoints being triggered by WAL generation, storage performance, acceptable CPU overhead, CPU architecture, and more. With the introduction of the new options, the entry barrier is significantly reduced.

Word of caution

If you are using lz4 or zstd compression for all its greatness, please make sure that the PostgreSQL binaries on the standby are also built with the same compression support. This warning mainly applies to users building PostgreSQL from source code.

Thanks to community

Thanks to Michael Paquier, who took the major part in the development of PG 15 improvements,  Justin Pryzby, who played the key role in this development, Andrey Borodin, who initiated the discussion, and many others who participated in the discussions.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Jan
24
2023
--

Backup Databases on Kubernetes With VolumeSnapshots

Backup Databases on Kubernetes With VolumeSnapshots

Databases on Kubernetes continue their rising trend. We see growing adoption of our Percona Kubernetes Operators and demand to migrate workloads to the cloud-native platform. Our Operators provide built-in backup and restore capabilities, but some users are still looking for old-fashioned ways, like storage-level snapshots (i.e., AWS EBS Snapshots).

In this blog post, you will learn:

  1. How to back up and restore from storage snapshots using Percona Operators
  2. What the risks and limitations of such backups are

Overview

Volume Snapshots went GA in Kubernetes 1.20. Both your storage and your Container Storage Interface (CSI) driver must support snapshots. All major cloud providers support them but might require some steps to enable them. For example, on GKE, you must create a VolumeSnapshotClass resource first (an example follows below).
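
For illustration, a minimal VolumeSnapshotClass for GKE could look like the manifest below; the driver name assumes the GKE Persistent Disk CSI driver, so adjust it for your platform. The class name matches the one referenced in the snapshot example later in this post.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: gke-snapshotclass
driver: pd.csi.storage.gke.io    # CSI driver providing the snapshots
deletionPolicy: Delete           # delete the cloud snapshot when the object is removed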

At a high level, snapshotting on Kubernetes looks like this: just as a PersistentVolume represents the real storage volume, a VolumeSnapshot is the Kubernetes resource representing a volume snapshot in the cloud.

Getting ready for backups

First, we need to be sure that VolumeSnapshots are supported. For the major clouds, read the following docs:

Once you have CSI configured and Volume Snapshot Class is in place, proceed to create a backup.

Take the backup

Identify the PersistentVolumeClaims (PVCs) that you want to snapshot. For example, my MongoDB cluster has six PVCs: three replica set nodes and three config server nodes.

$ kubectl get pvc
NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongod-data-my-cluster-name-cfg-0   Bound    pvc-c9fb5afa-1fc9-41f9-88f3-4ed457f88e58   3Gi        RWO            standard-rwo   78m
mongod-data-my-cluster-name-cfg-1   Bound    pvc-b9253264-f79f-4fd0-8496-1d88105d84e5   3Gi        RWO            standard-rwo   77m
mongod-data-my-cluster-name-cfg-2   Bound    pvc-5d462005-4015-47ad-9269-c205b7a3dfcb   3Gi        RWO            standard-rwo   76m
mongod-data-my-cluster-name-rs0-0   Bound    pvc-410acf85-36ad-4bfc-a838-f311f9dfd40b   3Gi        RWO            standard-rwo   78m
mongod-data-my-cluster-name-rs0-1   Bound    pvc-a621dd8a-a671-4a35-bb3b-3f386550c101   3Gi        RWO            standard-rwo   77m
mongod-data-my-cluster-name-rs0-2   Bound    pvc-484bb835-0e2d-4a40-b5a3-1ba340ec0567   3Gi        RWO            standard-rwo   76m

Each PVC will have its own VolumeSnapshot. Example for mongod-data-my-cluster-name-cfg-0:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mongod-data-my-cluster-name-cfg-0-snap
spec:
  volumeSnapshotClassName: gke-snapshotclass
  source:
    persistentVolumeClaimName: mongod-data-my-cluster-name-cfg-0

I have listed all my VolumeSnapshots objects in one YAML manifest here.

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/volume-snapshots/mongo-volumesnapshots.yaml
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-cfg-0-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-cfg-1-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-cfg-2-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-rs0-0-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-rs0-1-snap created
volumesnapshot.snapshot.storage.k8s.io/mongod-data-my-cluster-name-rs0-2-snap created

A VolumeSnapshotContent object is created and bound to every VolumeSnapshot resource. Its status can tell you the name of the snapshot in the cloud and whether the snapshot is ready:

$ kubectl get volumesnapshotcontent snapcontent-0e67c3b5-551f-495b-b775-09d026ea3c8f -o yaml
…
status:
  creationTime: 1673260161919000000
  readyToUse: true
  restoreSize: 3221225472
  snapshotHandle: projects/percona-project/global/snapshots/snapshot-0e67c3b5-551f-495b-b775-09d026ea3c8f

  • snapshot-0e67c3b5-551f-495b-b775-09d026ea3c8f is the snapshot I have in GCP for the volume.
  • readyToUse: true indicates that the snapshot is ready.

Restore

The restoration process, in a nutshell, looks as follows:

  1. Create persistent volumes using the snapshots. The names of the volumes must match the standard that the Operator uses.
  2. Provision the cluster.

Like any other backup, it must have secrets in place: TLS and users.

You can use this restoration process to clone existing clusters as well; just make sure you change the cluster, PVC, and Secret names.

Create persistent volumes from the snapshots. It is the same as creating a regular PersistentVolumeClaim, but with a dataSource section that points to the snapshot:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongod-data-my-cluster-name-rs0-0
spec:
  dataSource:
    name: mongod-data-my-cluster-name-rs0-0-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: standard-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/volume-snapshots/mongo-pvc-restore.yaml
persistentvolumeclaim/mongod-data-my-cluster-name-cfg-0 created
persistentvolumeclaim/mongod-data-my-cluster-name-cfg-1 created
persistentvolumeclaim/mongod-data-my-cluster-name-cfg-2 created
persistentvolumeclaim/mongod-data-my-cluster-name-rs0-0 created
persistentvolumeclaim/mongod-data-my-cluster-name-rs0-1 created
persistentvolumeclaim/mongod-data-my-cluster-name-rs0-2 created

Once done, spin up the cluster as usual. The volumes you created earlier will be used automatically. Restoration is done.

Risks and limitations

Storage support

Both storage and the storage plugin in Kubernetes must support volume snapshots. This limits the choices. Apart from public clouds, there are open source solutions like Ceph (rook.io for k8s) that can provide snapshotting capabilities.

Point-in-time recovery

Point-in-time recovery (PITR) allows you to reduce your Recovery Point Objective (RPO) by restoring or rolling back the database to a specific transaction or time.

Volume snapshots in the clouds store data in increments. The first snapshot holds all the data, and the following ones only store the changes. This significantly reduces your cloud bill. But snapshots cannot provide you with the same RPO as native database mechanisms.

Data consistency and corruption

Snapshots are not data-aware. When a snapshot is taken, numerous transactions and data modifications can happen. For example, heavy write activity and simultaneous compound index creation in MongoDB might lead to snapshot corruption. The biggest problem is that you will learn about data corruption during restoration.

Locking or freezing a filesystem before the snapshot would help to avoid such issues. Solutions like Velero or Veeam make the first steps towards data awareness and can create consistent snapshots by automating file system freezes or stopping replication.
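
For MongoDB specifically, a hand-rolled version of that idea is to flush and lock writes on the node being snapshotted; this is only a sketch of the concept, not the Operators' built-in backup mechanism:

// In the mongo shell on the node whose volume is being snapshotted
db.fsyncLock()     // flush data to disk and block new writes

// ... create the VolumeSnapshot objects and wait for readyToUse: true ...

db.fsyncUnlock()   // release the lock once the snapshot is ready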

Percona Services teams use various tools to automate the snapshot creation safely. Please contact us here to ensure data safety.

Cost

Public clouds store snapshots on cheap object storage but charge you extra for convenience. For example, the AWS EBS snapshot is priced at $0.05/GB, whereas S3 is only $0.023. It is a 2x difference, which for giant data sets might significantly increase your bill.

Time to recover

It is not a risk or limitation but a common misconception I often see: recovery from snapshots takes only a few seconds. It does not. When you create an EBS volume from the snapshot, it takes a few seconds. But in reality, the volume you just created does not have any data. You can read more about the internals of EBS snapshots in this nice blog post.

Conclusion

Volume Snapshots on Kubernetes can be used for databases but come with certain limitations and risks. Data safety and consistency are the most important factors when choosing a backup solution. For Percona Operators, we strongly recommend using built-in solutions which guarantee data consistency and minimize your recovery time and point objectives.

Learn More About Percona Kubernetes Operators

Jan
23
2023
--

Talking Drupal #383 – Programming the Physical World

Today we are talking about Programming the Physical World with Stephen Cross.

For show notes visit: www.talkingDrupal.com/383

Topics

  • What is meant by Programming in the Physical World
  • How do people interact
  • How are we focusing today’s topic
  • What is a Microcontroller
  • What would you not use a microcontroller for
  • How do they get programmed and what language
  • How do you contain the device
  • Do you need to solder
  • How does this relate to Drupal
  • What have you used it for in the past
  • Where should I get started

Resources

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan John Picozzi – www.epam.com @johnpicozzi Stephen Cross – www.StephenCross.com @stephencross

MOTW Correspondent

Martin Anderson-Clutz – @mandclu Alexa Currently provides only a basic integration; a developer creates a customized handler module to implement the specific functionality they need. An alexa_demo submodule provides a reference.

Jan
23
2023
--

Working of MySQL Replication Filters When Using Statement-based and Row-based Replication

MySQL Replication Filters

A couple of days ago, I was creating an index on the source, and when I checked the replica side, it had not been replicated, so I wanted to explain how replication filters may increase the complexity of your DBA operations.

Replication occurs by reading events from the binary log of the source and then executing them on the replica. The events in the binary log are recorded in different formats, depending on the type of event. These formats are determined by the binary logging format used when the events were initially recorded on the source. The relationship between the binary logging formats and the terminology used during replication is as follows:

When using statement-based binary logging, the source writes SQL statements to the binary log. Replication from the source to the replica is performed by executing these SQL statements on the replica. This is known as statement-based replication (SBR), and it corresponds to the statement-based binary logging format in MySQL.

When using row-based binary logging, the source writes events to the binary log that show how individual rows in tables are changed. Replication from the source to the replica is done by copying these events, representing the changes in the table rows, to the replica. This is known as row-based replication (RBR), and it corresponds to the row-based binary logging format in MySQL.

Row-based logging is the default method.

You can also configure MySQL to use a mix of both statement-based and row-based logging, depending on which is most appropriate for the change to be logged. This is called mixed-format binary logging.

MySQL supports two types of replication filters: those that apply at the database level and those that apply at the table level.

There are filters at the database level, known as binlog-do-db and binlog-ignore-db, that control what is included in the binary log. However, it’s important to note that if events are filtered out of the binary log, they can no longer be used for point-in-time recovery involving those tables. It’s therefore recommended to use filters that apply on the replica rather than in the binary log.

When using ROW-based replication, the filter applies to the specific table the change is made on. However, when using STATEMENT-based replication, the database-level filters apply to the default database. And when using table-level filters with STATEMENT-based replication, the filter applies to the table being changed.

Mixing database and table-level filters creates more complex rules. First, the database-level filters are checked, and if the update qualifies, the table-level filters are also checked. This can result in different outcomes for STATEMENT-based and ROW-based replication when using non-default databases. Additionally, using the MIXED format for binary logging can also cause issues, as a small change in the query can change whether the statement is logged in the STATEMENT-based or ROW-based format. For this reason, it’s safer to use table-level replication filters rather than database-level filters.

See Determination of Safe and Unsafe Statements in Binary Logging for details of how it is decided whether a statement is logged using the STATEMENT-based or ROW-based format when using MIXED mode replication.
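
The examples below set the filters in the configuration file, but note that on the replica they can also be changed at runtime with CHANGE REPLICATION FILTER (a filter set this way is not persisted across restarts), for instance:

STOP REPLICA SQL_THREAD;      -- STOP SLAVE SQL_THREAD on older versions
CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ('databasefilter.t1%');
START REPLICA SQL_THREAD;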

Examples

All the following examples use the following schema:

mysql> CREATE DATABASE databasefilter; 
Query OK, 1 row affected (0.02 sec) 
mysql> CREATE DATABASE databasewithoutfilter; 
Query OK, 1 row affected (0.00 sec) 
mysql> use databasefilter; 
Database changed 
mysql> CREATE TABLE t1 (uid int unsigned NOT NULL PRIMARY KEY) ENGINE=InnoDB; 
Query OK, 0 rows affected (0.06 sec) 
mysql> CREATE TABLE t2 (uid int unsigned NOT NULL PRIMARY KEY) ENGINE=InnoDB; 
Query OK, 0 rows affected (0.04 sec)

 

Example one

Replication filter:

replicate-wild-do-table = databasefilter.t1\_%

Statements:

use databasefilter; 
INSERT INTO t1 VALUES (1);

Replicates?

Statement-based replication: Yes

Row-based replication: Yes

The replication filter is set to “replicate-wild-do-table = databasefilter.t1_%”, meaning that any table in the “databasefilter” database with a name starting with “t1_” will be replicated. If the statement “use databasefilter; INSERT INTO t1 VALUES (1);” is executed, it will be replicated using both statement-based and row-based replication methods.

Example two

Replication filter:

replicate-wild-do-table = databasefilter.t2\_%

Statements:

use databasefilter;
INSERT INTO t1 VALUES (1);

Replicates?

Statement-based replication: No

Row-based replication: No

The replication filter “replicate-wild-do-table = databasefilter.t2_%” is set, which means that only tables that match the pattern “databasefilter.t2_%” will be replicated. When the statement “use databasefilter;” is executed, the current database will be set to “databasefilter”.

However, when the statement “INSERT INTO t1 VALUES (1);” is executed, it will not replicate in either statement-based or row-based replication because the table “t1” does not match the pattern specified in the replication filter.

Example three

Replication filter:

replicate-do-db = databasefilter

Statements:

use databasefilter;
INSERT INTO t1 VALUES (1);

Replicates?

Statement-based replication: Yes

Row-based replication: Yes

The replication filter “replicate-do-db = databasefilter” is set, which means that only statements executed in the “databasefilter” database will be replicated. When the statement “use databasefilter;” is executed, the current database will be set to “databasefilter”.

After that, when the statement “INSERT INTO t1 VALUES (1);” is executed, it will replicate in both statement-based and row-based replication because it is executed in the “databasefilter” database that matches the pattern specified in the replication filter.

Example four

Replication filter:

replicate-do-db = databasefilter

Statements:

use databasewithoutfilter;
INSERT INTO databasefilter.t1 VALUES (1);

Replicates?

Statement-based replication: No

Row-based replication: Yes

The replication filter “replicate-do-db = databasefilter” is set, which means that only statements executed in the “databasefilter” database will be replicated. When the statement “use databasewithoutfilter;” is executed, the current database will be set to “databasewithoutfilter”, which does not match the pattern specified in the replication filter. However, when the statement “INSERT INTO databasefilter.t1 VALUES (1);” is executed, it will replicate in row-based replication but not in statement-based replication.

The reason is that with statement-based replication, the database-level filter is applied based on the default database (“databasewithoutfilter” here), which does not match the filter, so the statement is not replicated. Row-based replication, on the other hand, replicates the actual row change in databasefilter.t1 and does not care about the current default database.

Conclusion

MySQL replication filters can be used to control which events are replicated from the source to the replica. These filters can be applied at the database level or the table level and can increase the complexity of DBA operations. It is important to note that if events are filtered out of the binary log, they can no longer be used for point-in-time recovery involving those tables. When using ROW-based replication, the filter applies to the specific table the change is made on; when using STATEMENT-based replication, the database-level filters apply to the default database. Mixing database and table-level filters creates more complex rules and can cause issues when using the MIXED format for binary logging.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure, open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Download Percona Distribution for MySQL Today

Jan
23
2023
--

[BUG] Stopping Purge/Resuming Purge in Error Logs After Upgrade to MySQL 5.7.40

Resuming Purge in Error Logs After Upgrade to MySQL 5.7.40

We had a couple of cases where clients reported that the MySQL error log was flooded with the below note:

2023-01-18T13:07:56.946323Z 2 [Note] InnoDB: Stopping purge
2023-01-18T13:07:56.948621Z 2 [Note] InnoDB: Resuming purge
2023-01-18T13:08:27.229703Z 2 [Note] InnoDB: Stopping purge
2023-01-18T13:08:27.231552Z 2 [Note] InnoDB: Resuming purge
2023-01-18T13:08:28.581674Z 2 [Note] InnoDB: Stopping purge

One of my colleagues, Sami Ahlroos, found that whenever we trigger a truncate on any table, the truncate code stops the purge and then resumes it, which logs these notes each time.

Below are the steps to reproduce.

  1. Log verbosity needs to be set to 3 (the default value)
mysql> show variables like 'log_error_verbosity%';
+---------------------+-------+
| Variable_name       | Value |
+---------------------+-------+
| log_error_verbosity | 3     |
+---------------------+-------+

  2. Create a test table and run a truncate multiple times

mysql> create table t(id int primary key);
Query OK, 0 rows affected (0.01 sec)

mysql> truncate table t;
Query OK, 0 rows affected (0.01 sec)

mysql> truncate table t;
Query OK, 0 rows affected (0.00 sec)

mysql> truncate table t;
Query OK, 0 rows affected (0.00 sec)

mysql> truncate table t;
Query OK, 0 rows affected (0.01 sec)

  3. Check the error logs

2023-01-18T13:07:56.946323Z 2 [Note] InnoDB: Stopping purge
2023-01-18T13:07:56.948621Z 2 [Note] InnoDB: Resuming purge
2023-01-18T13:08:27.229703Z 2 [Note] InnoDB: Stopping purge
2023-01-18T13:08:27.231552Z 2 [Note] InnoDB: Resuming purge
2023-01-18T13:08:28.581674Z 2 [Note] InnoDB: Stopping purge
2023-01-18T13:08:28.583307Z 2 [Note] InnoDB: Resuming purge
2023-01-18T13:08:29.322114Z 2 [Note] InnoDB: Stopping purge
2023-01-18T13:08:29.323765Z 2 [Note] InnoDB: Resuming purge

The bug was reported to MySQL and confirmed as well.

Along with MySQL version 5.7.40, similar behavior is also noticed in Percona Server for MySQL 5.7.40 and Percona XtraDB Cluster 5.7.40. This behavior was not noticed in the 8.0 version.

NOTE: There is no real fix for this as of now. The only way to avoid it is to set log_error_verbosity to 2, so that notes are discarded from the error log (as shown below). If you are doing a lot of truncates, this can be a workaround to keep those notes out of the error logs.
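
A minimal example of the workaround (revert by setting the value back to 3, and persist it in my.cnf if you want it to survive a restart):

mysql> SET GLOBAL log_error_verbosity = 2;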

References:

https://bugs.mysql.com/bug.php?id=109552 

https://jira.percona.com/browse/PS-8575
