Dec
05
2022
--

Talking Drupal #376 – Burnout

Today we are talking about Burnout with Jono Bacon.

For show notes visit: www.talkingDrupal.com/376

Topics

  • What is burnout
  • Why is it so important to you
  • Have you suffered from burnout
  • Do different professions have different rates of burnout
  • Is it individual or teams / projects / community oriented
  • Is it only mental or can it be physical
  • What contributes to burnout as a contributor or maintainer
  • What can prevent burnout
  • How do you recover
  • First episode was Talking Drupal #265
  • Helping communities
  • Signs to watch out for
  • What is next

Resources

Guests

Jono Bacon – www.jonobacon.com @jonobacon

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan John Picozzi – www.epam.com @johnpicozzi Leslie Glynn – @leslieglynn

MOTW Correspondent

Martin Anderson-Clutz – @mandclu Token The Token module provides a centralized API for text substitution. Since Drupal 7, some token support has been built into core, but the module provides common and reusable token UI elements and tokens missing from core.

Dec
02
2022
--

Using Liquibase as a Solution for Deploying and Tracking MySQL Schema Changes


Database-as-code is a relatively new concept that has been gaining popularity in recent years. As we already know, we have deployment solutions for application code, and managing and tracking application changes is quite easy with tools like Git and Jenkins.

Now this concept is being applied in the database domain as well, treating SQL as code so that database changes (DDL, DML) are managed the same way applications handle code. From a database standpoint, this allows tracing the history of modifications, so problems can be quickly detected and addressed.

What is Liquibase?

Liquibase is an open source, database-independent framework for deploying, managing, and tracking database schema changes. All the modifications or changes to the database are stored in text files (XML, YAML, JSON, or SQL) known as changesets. To specifically list database changes in order, Liquibase employs a changelog. The changelog serves as a record of changes and includes a list of changesets that Liquibase can execute on a target database.
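As a quick preview of the syntax covered later in this post, a changeset in a formatted SQL changelog might look like the following minimal sketch (the author name, id, and table are illustrative):

--liquibase formatted sql

--changeset AJ:0
CREATE TABLE example_table (id INT PRIMARY KEY);
--rollback DROP TABLE example_table;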

Let’s see how we can set up Liquibase and perform some database changes with this tool.

Installing Liquibase

1. Download and extract Liquibase files.

shell> wget https://github.com/liquibase/liquibase/releases/download/v4.17.2/liquibase-4.17.2.tar.gz 
shell> sudo mkdir /usr/local/bin/liquibase/ 
shell> sudo tar -xzvf liquibase-4.17.2.tar.gz --directory  /usr/local/bin/liquibase/

2. Add the Liquibase installation directory to the PATH environment variable, and append the same line to the “~/.bashrc” file so it persists across sessions.

export PATH=$PATH:/usr/local/bin/liquibase

3. Validate the installation.

shell> liquibase --version

####################################################
## _ _ _ _ ##
## | | (_) (_) | ##
## | | _ __ _ _ _ _| |__ __ _ ___ ___ ##
## | | | |/ _` | | | | | '_ \ / _` / __|/ _ \ ##
## | |___| | (_| | |_| | | |_) | (_| \__ \ __/ ##
## \_____/_|\__, |\__,_|_|_.__/ \__,_|___/\___| ##
## | | ##
## |_| ##
## ## 
## Get documentation at docs.liquibase.com ##
## Get certified courses at learn.liquibase.com ## 
## Free schema change activity reports at ##
## https://hub.liquibase.com ##
## ##
####################################################
Starting Liquibase at 15:04:16 (version 4.17.2 #5255 built at 2022-11-01 18:07+0000)
Liquibase Home: /usr/local/bin/liquibase
Java Home /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre (Version 1.8.0_352)

Note – Java (JDK) must be installed on your system for Liquibase to function.

How to use Liquibase with MySQL

1. To use Liquibase with MySQL, we need the JDBC driver JAR file. Copy the JAR file into the Liquibase internal library directory.

shell> wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-j-8.0.31.tar.gz 
shell> tar -xzvf mysql-connector-j-8.0.31.tar.gz 
shell> cp mysql-connector-j-8.0.31/mysql-connector-j-8.0.31.jar  /usr/local/bin/liquibase/internal/lib/

2. Next, initialize a project with the options below.

shell> liquibase init project \ 
--project-dir=/home/vagrant/liquibase_mysql_project \ 
--changelog-file=file \ 
--format=sql \ 
--project-defaults-file=liquibase.properties \ 
--url=jdbc:mysql://localhost:3306/sbtest \ 
--username=root \ 
--password=Root@1234

Let’s understand what these parameters are.

--project-dir => project location where all files related to the project will be kept.
--changelog-file => file containing deployment changes.
--format => format of the deployment file (.sql, .xml, etc.).
--project-defaults-file => Liquibase property file.
--url => MySQL database URL.
--username => database username.
--password => database password.

3. We can then create a changelog file (“changelog.sql”) in the project location and define the changeset. Other supported formats for defining the deployment changes are .xml, .json, and .yaml.

CREATE TABLE test_table (test_id INT, test_column VARCHAR(256), PRIMARY KEY (test_id))

4. Validate that the connection to the database is successful. Inside the Liquibase project folder, run the below command.

shell> liquibase --username=root --password=Root@1234 --changelog-file=changelog.sql status

5. Inspect the SQL before execution.

shell> liquibase --changelog-file=changelog.sql update-sql

6. Finally, deploy the changes.

shell> liquibase --changelog-file=changelog.sql update

Output:

Starting Liquibase at 16:56:44 (version 4.17.2 #5255 built at 2022-11-01 18:07+0000) 
Liquibase Version: 4.17.2 
Liquibase Community 4.17.2 by Liquibase 
Running Changeset: changelog.sql::raw::includeAll 
Liquibase command 'update' was executed successfully.

7. Validate the changes in the database. Additionally, you will observe the extra tables below in the same database, which capture a few more details about the execution.

a)  Table: DATABASECHANGELOG

mysql> select * from DATABASECHANGELOG\G;
*************************** 1. row ***************************
ID: raw
AUTHOR: includeAll
FILENAME: changelog.sql
DATEEXECUTED: 2022-11-08 16:10:36
ORDEREXECUTED: 1
EXECTYPE: EXECUTED
MD5SUM: 8:155d0d5f1f1cb1c0098df92a8e92372a
DESCRIPTION: sql
COMMENTS: 
TAG: NULL
LIQUIBASE: 4.17.2
CONTEXTS: NULL
LABELS: NULL
DEPLOYMENT_ID: 7923831699

Note – Each changeset is tracked in the table as a row and is identified by the id, author, and filename fields.

b) Table: DATABASECHANGELOGLOCK

mysql> select * from DATABASECHANGELOGLOCK\G;
*************************** 1. row ***************************
ID: 1
LOCKED: 0x00
LOCKGRANTED: NULL
LOCKEDBY: NULL
1 row in set (0.00 sec)

Note – To guarantee that only one instance of Liquibase is running at once, Liquibase employs the DATABASECHANGELOGLOCK table.
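If a deployment is interrupted and leaves a stale lock row behind, Liquibase also provides a command to clear it. As a hedged example (the exact command name may be releaseLocks or release-locks depending on your Liquibase version), run it from the project directory so the connection settings are picked up from liquibase.properties:

shell> liquibase release-locks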

8. By running the command “liquibase history” we can check the past deployments as well.

Liquibase Version: 4.17.2
Liquibase Community 4.17.2 by Liquibase
Liquibase History for jdbc:mysql://localhost:3306/sbtest

- Database updated at 11/8/22 4:10 PM. Applied 1 changeset(s), DeploymentId: 7923831699
changelog.sql::raw::includeAll

So here we have successfully deployed the changes in the target database. Next, we will see how we can track the changes and perform rollback operations.

To perform rollbacks for specific DDL or DML statements, we need to add rollback details to the changesets in the .sql file.

Let’s see the steps to perform the rollback operations:

1) Create the deployment file “deployment.sql” inside the project location with the below changeset details.

--liquibase formatted sql

--changeset AJ:1 labels:label1 context:context1
--comment: DDL creation

create table P1 (
id int primary key auto_increment not null,
name varchar(50) not null
);
--rollback DROP TABLE P1;

create table P2 (
id int primary key auto_increment not null,
name varchar(50) not null
);
--rollback DROP TABLE P2;

Option details:

AJ:1 => denotes author:id

labels:label1 => Specifies labels, which are a general way to categorize changesets, similar to contexts.

context:context1 => Executes the change only if the particular context is passed at runtime. Any string can be used for the context name.
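To apply these filters at runtime, the update command accepts context and label arguments. A hedged example (on recent 4.x releases the label flag is --label-filter, while older versions use --labels):

shell> liquibase --changelog-file=deployment.sql update --contexts=context1 --label-filter=label1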

2) Next, run the deployment with the below command inside the project location.

shell> liquibase --changelog-file=deployment.sql update

3) Now, add a tag to manage rollback scenarios.

shell> liquibase tag version4

4) By default, the tag is applied to the most recent deployment. We can verify this in the table below.

mysql> select * from DATABASECHANGELOG\G;

ID: 1
AUTHOR: AJ
FILENAME: deployment.sql
DATEEXECUTED: 2022-11-28 07:09:11
ORDEREXECUTED: 3
EXECTYPE: EXECUTED
MD5SUM: 8:2e4e38d36676981952c21ae0b51895ef
DESCRIPTION: sql
COMMENTS: DDL creation
TAG: version4
LIQUIBASE: 4.17.2
CONTEXTS: context1
LABELS: label1
DEPLOYMENT_ID: 9619351799

5) Let’s roll back the executed changes. This command will revert all changes made to the database after the specified tag.

shell> liquibase --changelog-file=deployment.sql rollback version4

Output:

Starting Liquibase at 07:38:36 (version 4.17.2 #5255 built at 2022-11-01 18:07+0000)
Liquibase Version: 4.17.2
Liquibase Community 4.17.2 by Liquibase
Rolling Back Changeset: deployment.sql::1::AJ
Liquibase command 'rollback' was executed successfully.

Note – After this operation, we no longer see those tables (P1, P2) in the database; the rollback dropped them.

Alternatively, we can perform the rollback activity on the basis of timestamps as well. The below command is used to revert all changes made to the database from the current date to the date and time you specify. 

For example:

shell> liquibase --changelog-file=deployment.sql  rollback-to-date 2022-11-27

Output:

Starting Liquibase at 07:18:26 (version 4.17.2 #5255 built at 2022-11-01 18:07+0000)
Liquibase Version: 4.17.2
Liquibase Community 4.17.2 by Liquibase
Rolling Back Changeset: deployment.sql::1::AJ
Liquibase command 'rollback-to-date' was executed successfully.

Liquibase integration with Percona Toolkit (pt-osc)

A Liquibase extension is available to enable the pt-online-schema-change (pt-osc) feature of Percona Toolkit. This extension makes several default change types execute via pt-osc rather than plain SQL. With the pt-online-schema-change tool, you can alter tables without locking them.
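The extension relies on the pt-online-schema-change tool from Percona Toolkit being installed and available on the PATH of the machine running Liquibase. A quick sanity check before proceeding, assuming the toolkit is already installed:

shell> pt-online-schema-change --version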

Let’s see the steps to use pt-osc with the Liquibase extension:

1) Download the Percona Liquibase jar file.

shell> wget https://github.com/liquibase/liquibase-percona/releases/download/v4.17.1/liquibase-percona-4.17.1.jar

2)  Copy the jar file to Liquibase internal library folder.

shell> sudo cp liquibase-percona-4.17.1.jar /usr/local/bin/liquibase/internal/lib/

3) Add the below changeset to the changelog.xml file; it adds one column (“osc”) to the table liq1.

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
xmlns:pro="http://www.liquibase.org/xml/ns/pro"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
http://www.liquibase.org/xml/ns/pro http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">

<changeSet id="3" author="AJ">
<addColumn tableName="liq1"
xmlns:liquibasePercona="http://www.liquibase.org/xml/ns/dbchangelog-ext/liquibase-percona"
liquibasePercona:usePercona="true">
<column name="osc" type="varchar(255)"/>
</addColumn>
</changeSet>
</databaseChangeLog>

Note – here we specified usePercona="true", which enables DDL execution via pt-osc.

4) Validate the deployment changes before the actual implementation. Below we can see the pt-osc command that is going to be executed in the next phase.

shell>  liquibase --changelog-file=changelog.xml update-sql

Output:

Liquibase Community 4.17.2 by Liquibase
-- *********************************************************************
-- Update Database Script
-- *********************************************************************
-- Change Log: changelog.xml
-- Ran at: 11/28/22 8:28 AM
-- Against: root@localhost@jdbc:mysql://localhost:3306/liq
-- Liquibase version: 4.17.2
-- *********************************************************************

-- Lock Database
UPDATE liq.DATABASECHANGELOGLOCK SET `LOCKED` = 1, LOCKEDBY = 'localhost.localdomain (10.0.2.15)', LOCKGRANTED = NOW() WHERE ID = 1 AND `LOCKED` = 0;

-- Changeset changelog.xml::3::AJ
-- pt-online-schema-change --alter-foreign-keys-method=auto --nocheck-unique-key-change --alter="ADD COLUMN osc VARCHAR(255) NULL" --password=*** --execute h=localhost,P=3306,u=root,D=liq,t=liq1;

5) Finally, run the deployment.

shell>  liquibase --changelog-file=changelog.xml update

Summary

Apart from MySQL, Liquibase supports other popular databases such as PostgreSQL, Cassandra, and MongoDB. Developers benefit greatly from this since they can collaboratively write scripts that relate to the database and commit them as part of their code. Liquibase maintains versions of all the changes like any other version control system and supports branching and merging of SQL code.

Further reading

Dec
02
2022
--

Hacktoberfest Results – Percona Honors the Contributors


Participating in various community activities can be rewarding in different aspects. Many community members joined Hacktoberfest this year and contributed to various open source projects, and we would like to highlight the most active contributors to Percona software during this fantastic event. Each of them will receive a T-shirt and a mug with a unique design.


Let’s explore in detail the results of their exceptional work.

Aditya-Kapadiya is our secure-connection master! He changed HTTP links to HTTPS in documentation, README, and other files in the Percona Monitoring and Management (PMM), Community, MongoDB Exporter, and Percona Docker repositories.

PRs:

Azanul made two great contributions! First, he replaced a deprecated Azure SDK in PMM with a modern one. That change required some PMM code refactoring, and Azanul succeeded in that. As a result, we have fewer outdated dependencies in PMM, and we can use Dependabot to track Azure SDK updates. Secondly, he replaced the reviewdog binary with a GitHub Action in PMM CI. This tool is used for automated review comments in PRs, and it’s redundant in local setups, so replacing it with a CI action reduces the project dependency graph and speeds up local toolchain installs.

PRs:

daniel-shuy fixed the database migration test. It was tricky to figure out what actually went wrong with this test. In fact, the test was completely useless, but after this PR, the test is doing what it is supposed to.

PR: https://github.com/percona/pmm/pull/1299

fmbiete fixed the broken link in the Percona XtraBackup documentation. While this change is small, it’s still important. No one likes to follow broken links, do they?

PR: https://github.com/percona/pxb-docs

mintbomb27 helped us replace our GitHub Actions forks with the upstream implementations. Initially, we believed it would be a good idea to fork every Action we use in our CI for security reasons. But we soon realized that it takes too much effort to maintain those forks. Now we have more time for our products. Thank you, mintbomb27!

PR: https://github.com/percona/pmm/pull/1316

natastro did a great job fixing grammar and typos in the PMM documentation. There was a problem with the CLA signing, which is why the original PR was closed and reopened, but the changes were finally merged.

PR: https://github.com/percona/pmm-doc/pull/891

naughtyGitCat added an alternative option for specifying credentials. Instead of adding user and password to Mongo URI, you can pass them with dedicated flags.

PR: https://github.com/percona/mongodb_exporter/pull/560

neelshah2409 also helped us to improve grammar and punctuation in PMM documentation.

PR: https://github.com/percona/pmm-doc/pull/898

NikhilWaghmode09 fixed the HTML tag in the community repository.

PR: https://github.com/percona/community/pull/583

ramunas-omnisend polished the mongodb_mongos_sharding_balancer_enabled metric in the MongoDB exporter.

PR: https://github.com/percona/mongodb_exporter/pull/573

sarthakastic corrected the CSS style in the community repo.

PR: https://github.com/percona/community/pull/588

slim-codes added a blog post What is Open Source and Why Should You Care to the community repo.

PR: https://github.com/percona/community/pull/608

Yash-sudo-web corrected mistakes in the Percona XtraDB Cluster documentation. It had invalid SQL examples in the HOW TO section.

PR: https://github.com/percona/pxc-docs/pull/20

yokogawa-k fixed an incorrect description of --ignore-engines option in Percona Toolkit.

PR: https://github.com/percona/percona-toolkit/pull/555

zulh-civo added support for headless service in Percona Operator for MySQL.

PR: https://github.com/percona/percona-xtradb-cluster-operator/pull/1242

Thank you! We really appreciate all the effort, we already look forward to participating in Hacktoberfest next year, and we welcome everyone to join.

Dec
02
2022
--

New WAL Archive Module/Library in PostgreSQL 15


PostgreSQL traditionally uses shell commands to achieve continuous WAL archiving, which is essential for backups and stable standby replication.  In the past, we blogged about the inefficiency in that design and how some of the backup tools like PgBackRest solve that problem. It is a well-known problem in the PostgreSQL community, and many discussions happened in the past about the same.

It is also unfortunate that PostgreSQL documentation gives an unsafe command string, employing a combination of the test and cp commands as an example:

archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

Even though the documentation mentions it as an example to explain how the archive_command executes shell scripts/commands, we keep seeing many users copying this approach, falling into this documentation trap, and running into archiving issues.

The good news is that PostgreSQL 15 came up with a set of features to address the problems associated with WAL archiving. PostgreSQL can now do WAL archiving using loadable modules. Please refer to the following commit for the implementation details: https://git.postgresql.org/gitweb/?p=postgresql.git;h=5ef1eefd7. The ability to use an archive module/library is a revolutionary step in this area.

Specifying the archive_library

Specifying the archive_library can be performed while the system is up and running. Just send a SIGHUP signal afterward to get the configuration reloaded:

postgres=# alter system set archive_library='<Library>';
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)

It is important to keep an eye on the PostgreSQL logs because anything going wrong will not be displayed in the psql session, but it will be there in the logs:

2022-11-24 05:07:27.618 UTC [788] LOG:  received SIGHUP, reloading configuration files
2022-11-24 05:07:27.619 UTC [788] LOG:  parameter "archive_library" changed to "shell"
2022-11-24 05:07:27.620 UTC [1274] FATAL:  could not access file "shell": No such file or directory

We can remove the library specification if we want to use shell commands as was the case in PostgreSQL 14 or older.

ALTER SYSTEM SET archive_library='';
OR
ALTER SYSTEM RESET archive_library ;

And make sure that no archive_library specification is in effect:

postgres=# show archive_library ;
archive_library
-----------------

(1 row)

At this stage, PostgreSQL will use the plain old archive_command for WAL archiving, which you may specify as follows:

alter system set archive_command = '<shell command>';

So, in summary, the archive_command will be considered only if the archive_library spec is empty.

basic_archive: The sample archive library

PostgreSQL 15 provides a simple, sample archive library for a reference architecture, which is part of the contrib modules. If the contrib modules are already present, we can specify the archive_library like:

postgres=# ALTER SYSTEM SET archive_library = 'basic_archive';

But this library needs further input about where to place the archive files, which can be specified using the module-specific parameter basic_archive.archive_directory.

Otherwise, we may start seeing messages as follows in PostgreSQL logs:

WARNING:  archive_mode enabled, yet archiving is not configured

You may want to refer to the official documentation also.

However, this archive library will be loaded only by the archiver process, not by the backend processes of regular sessions. So any attempt to set or view the parameter from a regular session can give you errors:

postgres=# ALTER SYSTEM SET basic_archive.archive_directory='/home/postgres/arch2';
ERROR:  unrecognized configuration parameter "basic_archive.archive_directory"

postgres=# show basic_archive.archive_directory;
ERROR:  unrecognized configuration parameter "basic_archive.archive_directory"

But regular sessions can LOAD the library if they want.

postgres=# LOAD 'basic_archive';
LOAD

postgres=# ALTER SYSTEM SET basic_archive.archive_directory='/home/postgres/arch2';                                                                                                                                                                             
ALTER SYSTEM

postgres=# show basic_archive.archive_directory;                                                                                                                                                                                             
basic_archive.archive_directory
---------------------------------

(1 row)

postgres=# SELECT pg_reload_conf();                                                                                                                                                                                                          
pg_reload_conf
----------------
t
(1 row)

postgres=# show basic_archive.archive_directory;
basic_archive.archive_directory
---------------------------------
/home/postgres/arch2
(1 row)

postgres=#

Here are a couple of important observations:

1) Instead of loading the library into a specific session, one might use other methods like shared_preload_libraries, but that is not advisable.

2) The module-specific settings may not show up in pg_settings unless the module is loaded in the session.

If there are multiple archive libraries available, we can switch from one library to another while the system is live. The archiver process will restart behind the scenes, and the newly specified library will be loaded by the new archiver process. A message like the following might appear in the PostgreSQL log:

LOG:  restarting archiver process because value of "archive_library" was changed

If the attempt to archive the WAL by basic_archive fails, there will be corresponding ERROR entries in the PostgreSQL log:

2022-11-24 11:24:31.330 UTC [2323] ERROR:  could not create file "/home/postgres/arch2/archtemp.00000001000000000000007B.2323.1669289071330": No such file or directory
2022-11-24 11:24:31.330 UTC [2323] WARNING:  archiving write-ahead log file "00000001000000000000007B" failed too many times, will try again later

Anyone closely observing the above error message might wonder what the file archtemp.00000001000000000000007B.2323.1669289071330 mentioned there is, since the name doesn’t look like a regular WAL segment file. This happens because the basic_archive module first creates a temporary file in the destination with a name like that, then fsyncs it, and then moves the file to its final name. This ensures better protection during crashes.

When the archiver is terminated by a signal (other than SIGTERM, which is used as part of a server shutdown) or by a shell error with an exit status greater than 125 (such as command not found), the archiver process aborts and gets restarted by the postmaster. In such cases, the failures will be recorded in the pg_stat_archiver view. Errors in archive library execution can also be monitored using pg_stat_archiver.

postgres=# select * from pg_stat_archiver ;
-[ RECORD 1 ]------+------------------------------
archived_count     | 160
last_archived_wal  | 0000000100000000000000A1
last_archived_time | 2022-11-24 14:58:43.33668+00
failed_count       | 680
last_failed_wal    | 000000010000000000000087
last_failed_time   | 2022-11-24 14:49:51.716773+00
stats_reset        | 2022-11-15 08:58:34.939925+00

Advantage of basic_archive

The basic_archive module does pretty much the same thing as the shell command:

'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

But in a much better way. We already discussed one of the advantages:

It creates a temporary file, which is fsynced and then durably moved to the final archive file name in the destination. This durability is something cp cannot provide, and it drastically reduces the chance that a broken file in the archive destination results in archive failures or, in the worst case, threatens database recoverability.
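For comparison, the pattern basic_archive follows is roughly the classic "copy to a temporary name, flush to disk, then rename" approach. A hedged shell sketch of that pattern as an archive_command (not the module itself, and it assumes a coreutils sync that accepts a file argument; the module remains the better option because it also handles duplicate checks and error reporting):

archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/archtemp.%f && sync /mnt/server/archivedir/archtemp.%f && mv /mnt/server/archivedir/archtemp.%f /mnt/server/archivedir/%f'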

Another important functional advantage of basic_archive is its built-in feature to compare the file in the source (pg_wal) with the one in the archive destination when a file with the same name already exists. It compares the content of the files, verifies that they are exactly the same, and reports a “success” back to the archiver so that the archiver process can continue with the next WAL segment. This also reduces the chance of archive failures, because if the file was archived by the module but the server crashed before recording that fact, PostgreSQL will try to archive the same WAL segment again. The new sample basic_archive module silently succeeds on the second attempt if the file is already copied and has the same content.

Additionally, the basic_archive module has a custom exception handler. This allows the archiver to treat an ERROR as a normal failure and avoid restarting the archiver again.

Potential risks/problems of using archive library

Since archive modules are loaded into PostgreSQL, unlike shell commands executed by archive_command, they have access to server resources. So please pay attention and avoid modules from unknown or untrustworthy sources, as they pose a risk.

Module authors must also remember that the archiving callback runs in its own transient memory context. If the module needs to palloc() something that needs to stick around for a while, it will need to do so in a different memory context. One might refer to the basic developer documentation and the basic_archive source code.

The sample module basic_archive may leave archtemp.* files behind in the archive directory after crashes or other unusual failure cases. If that happens, users may have to clean them up before starting PostgreSQL again.
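A hedged example of such a cleanup, using the archive directory from earlier in this post; make sure the archiver is not running before removing the temporary files:

rm -f /home/postgres/arch2/archtemp.*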

What can be expected in future

We can see a bright future with all these improvements, as the default archive_command  can become yet another archive module in future versions of PostgreSQL.

I hope that basic_archive or new modules will come into existence with more features and configuration options like compression, because Compression of PostgreSQL WAL Archives is Becoming More Important.

I would expect PostgreSQL support vendors to develop many more powerful libraries and add to the PostgreSQL ecosystem.

Database as a Service (DBaaS) vendors will be one of the biggest beneficiaries as they will have a very specific place and method for WAL archiving, which a module can handle in a much more efficient way.

I would expect the backup tools and methods to evolve. Now a backup tool can remain in daemon mode, waiting for messages from the PostgreSQL archiver. walg_archive is an example in this direction.

I think it’s okay to expect restore_library support in future PostgreSQL versions as well.

Thanks to Community

Special thanks to Nathan Bossart, who took the major role in improving WAL archiving in PostgreSQL 15, and others like Robert Haas for active participation from the beginning till the final commit.

Thanks to the many others who were involved in the discussions and reviews, namely Julien Rouhaud, Andrey Borodin, Robert Haas, Stephen Frost, Magnus Hagander, David Steele, Fujii Masao, Michael Paquier, Alvaro Herrera, Tom Lane, Benoit Lobréau, Peter Eisentraut, Bharath Rupireddy, Kyotaro Horiguchi, and Ian Lawrence Barwick.

Great to see that PostgreSQL 15 started addressing many of the areas of inefficiency, starting from the removal of the stats collector and progress in asynchronous I/O.

Dec
01
2022
--

Data Masking With Percona Server for MySQL – An Enterprise Feature at a Community Price


Data masking is a handy tool to obscure sensitive information. Percona Server for MySQL is a free, fully compatible, enhanced, and open source drop-in replacement for any MySQL database. It provides superior performance, scalability, and instrumentation, including data masking. With Percona Server for MySQL, you get the enterprise features without the cost.

So how does data masking work? The server obscures the output so that the sensitive data is not displayed.  As you will see in later examples, there are many options to help you with your data masking needs.

Installation

Installation is easy: all you need to do is load a shared object library from a MySQL client program with INSTALL PLUGIN data_masking SONAME 'data_masking.so';

stoker@testbox:~/Downloads$ mysql -u root -p 
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 22
Server version: 8.0.30-22 Percona Server (GPL), Release '22', Revision '7e301439b65'

Copyright (c) 2009-2022 Percona LLC and/or its affiliates
Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> INSTALL PLUGIN data_masking SONAME 'data_masking.so';
Query OK, 0 rows affected (0.04 sec)

mysql>

Basic usage

We will start by looking at the MASK_INNER() and MASK_OUTER() functions. First, we will need some data to work with. The following creates a two-column table and populates it with some data.

mysql> create table sensative_data (id int, hushhush bigint);
Query OK, 0 rows affected (1.25 sec)

mysql> insert into sensative_data values (1,1234567890),(2,0987654321);
Query OK, 2 rows affected (0.20 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql>

Note that for the second row the leading zero will be stripped off, so ‘987654321’ is stored, not ‘0987654321’. This is why you occasionally see CHAR() data types where you would expect an INT().
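As a quick, hedged illustration of the difference (the table name here is just for demonstration), storing the same value in a CHAR column preserves the leading zero:

mysql> create table codes_demo (code char(10));
mysql> insert into codes_demo values ('0987654321');
mysql> select code from codes_demo;
+------------+
| code       |
+------------+
| 0987654321 |
+------------+
1 row in set (0.00 sec)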

MASK_INNER() masks the inner characters of a column, using either a character of your choice or the default of ‘X’. MASK_OUTER() masks the characters at either end of the column.

mysql> SELECT id, 
       hushhush as 'Original', 
       MASK_INNER(convert(hushhush using binary),2,3) as 'Inner', 
       MASK_OUTER(convert(hushhush using binary),3,3) as 'Outer' 
FROM sensative_data;
+------+------------+------------+------------+
| id   | Original   | Inner      | Outer      |
+------+------------+------------+------------+
|    1 | 1234567890 | 12XXXXX890 | XXX4567XXX |
|    2 | 987654321  | 98XXXX321  | XXX654XXX  |
+------+------------+------------+------------+
2 rows in set (0.00 sec)



mysql>

The CONVERT() function is used to ensure that multi-byte character sets do not overwhelm the receiving functions.

SSN numbers

Social Security Numbers (SSNs) are not supposed to be used as a general identifier, despite often being used that way. There is a special data mask just for SSNs. Again we need some sample data; please note that the mask will complain if it is not given eleven characters.

mysql> create table employee (id int, name char(15), ssn char(11));
Query OK, 0 rows affected (1.33 sec)

mysql> insert into employee values (1,"Moe","123-12-1234"), (2,"Larry","22-222-2222"),(3,'Curly','99-999-9999');
Query OK, 3 rows affected (0.14 sec)
Records: 3  Duplicates: 0  Warnings: 0

MASK_SSN() is like the previous data masks except that it reveals only the last four digits.

mysql> select id, 
              name, 
              mask_outer(name,1,1,'#') as 'masked', 
              mask_ssn(ssn) as 'Masked SSN' 
        from employee;
+------+-------+--------+-------------+
| id   | name  | masked | Masked SSN  |
+------+-------+--------+-------------+
|    1 | Moe   | #o#    | XXX-XX-1234 |
|    2 | Larry | #arr#  | XXX-XX-2222 |
|    3 | Curly | #url#  | XXX-XX-9999 |
+------+-------+--------+-------------+
3 rows in set (0.01 sec)

Credit card numbers

Credit card numbers get their own masking function. The credit card number itself, known as the Primary Account Number or PAN, is routinely masked to return only the last four digits. To see how this works, we alter our employee table to add a new column to house the 16-digit credit card, er, PAN, number.

SQL > alter table employee add column cc char(16);
Query OK, 0 rows affected (1.6026 sec)
Records: 0 Duplicates: 0 Warnings: 0
SQL > update employee set cc = "1234123412341234";
Query OK, 3 rows affected (0.0969 sec)

Rows matched: 3 Changed: 3 Warnings: 0

And now we can try MASK_PAN().

SQL > select mask_pan(cc) from employee;
+------------------+
| mask_pan(cc) |
+------------------+
| XXXXXXXXXXXX1234 |
| XXXXXXXXXXXX1234 |
| XXXXXXXXXXXX1234 |
+------------------+
3 rows in set (0.0011 sec)
SQL >

Using a view with a mask

You may be asking how you can mask sensitive data from users who have access to the underlying table. The traditional way to protect the raw data is with a VIEW. The VIEW can have different privileges than the person using it. Best of all, the VIEW can be treated like a table by users even if they have restricted access to the underlying data. And you can combine a VIEW with a mask to protect the data. The trick is that you have to set up, use, and maintain the VIEW.

To create a user that has no privileges but can see masked data, we need to start with a ‘bare bones’ account. The account ‘nopriv’ was set up with no grants, and if we try to access anything in the schema containing our sensitive data with this account, it will be unsuccessful.

stoker@testbox:~$ mysql -u nopriv -p test
Enter password: 
ERROR 1044 (42000): Access denied for user 'nopriv'@'%' to database 'test'
stoker@testbox:~$ mysql -u nopriv -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 8.0.30-22 Percona Server (GPL), Release '22', Revision '7e301439b65'

Copyright (c) 2009-2022 Percona LLC and/or its affiliates
Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select * from test.employee;
ERROR 1142 (42000): SELECT command denied to user 'nopriv'@'localhost' for table 'employee'
mysql>

Now as ‘root’ or some similarly privileged account we can create a VIEW in the schema we are using.

SQL > create view e1_cc as 
          SELECT id, 
                 name, 
                 mask_pan(cc) 
           from employee;
Query OK, 0 rows affected (0.1664 sec)
SQL > select * from e1_cc;
+----+-------+------------------+
| id | name  | mask_pan(cc)     |
+----+-------+------------------+
|  1 | Moe   | XXXXXXXXXXXX1234 |
|  2 | Larry | XXXXXXXXXXXX1234 |
|  3 | Curly | XXXXXXXXXXXX1234 |
+----+-------+------------------+
3 rows in set (0.0028 sec)
SQL >

Now we can grant the unprivileged account access to the VIEW we created. Remember to do this from a privileged account!

SQL> GRANT SELECT ON test.e1_cc TO 'nopriv'@'%';

Now we have to log out of our unprivileged account and log in again for the new privilege to take effect. You can now, as the unprivileged user, get masked data from the sensitive employee table. This account cannot see the actual data in the table, but it can get what is allowed via the view. This can be very handy.

SQL > select * from e1_cc;
+----+-------+------------------+
| id | name  | mask_pan(cc)     |
+----+-------+------------------+
|  1 | Moe   | XXXXXXXXXXXX1234 |
|  2 | Larry | XXXXXXXXXXXX1234 |
|  3 | Curly | XXXXXXXXXXXX1234 |
+----+-------+------------------+
3 rows in set (0.0028 sec)

Wrap up

Data masking is a powerful technique to keep sensitive data protected. And Percona provides this feature with the free, open source Percona Server for MySQL software. You will need to purchase a subscription from their vendors if you wish to use data masking with Oracle’s Enterprise Edition or MariaDB’s MaxScale.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure, open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Download Percona Distribution for MySQL Today

Nov
30
2022
--

Upload Ongoing MyDumper Backups to S3


If you are using MyDumper as your logical backup solution and you store your backups on S3, you need to take a local backup and then upload it to S3. But what if there is not enough space to hold the backup on the server where we are taking it? And even if we have enough disk space, we need to wait until the dump ends to start uploading the files, making the whole process longer.

MyDumper implemented stream backup in v0.11.3 and we have been polishing the code since then. We also implemented two ways of executing external commands:

--exec-per-thread: The worker that is getting the data from the database writes to and redirects into the STDIN of the external command. It is similar to executing cat FILE | command for every file that is written and closed.

--exec: In this case, the worker writes to local storage, and when the file is closed, the filename is enqueued. The exec threads pop from the queue and execute the command with the filename. FILENAME is a reserved word that is replaced in the command; for instance, --exec='/usr/bin/ls -l FILENAME' will execute ls -l on every single file. The command must be an absolute path.

Both implementations have different use cases, pros, and cons. We are going to use --exec, as the current --exec-per-thread implementation doesn’t allow us to dynamically change the command with the filename, which changes on each iteration.

Execution

For this example I created a table named myd_test.mydumper2S3 with millions of rows. You need to configure a valid AWS account, install the AWS CLI, and have a bucket available.

As I stated before, there are two ways of uploading the files; the main difference is the number of executions of the AWS command or the number of threads that you want to use. A stream will be only one process, but --exec can control the number of threads or executions with --exec-threads.

With stream

This might be the simplest way if you are familiar with piping your commands. In the example you will find the table name, the split-by-rows value, the path where the temporary files will reside, and finally the --stream option:

mydumper -T myd_test.mydumper2S3 -r 20000 \
  -o data --stream | aws s3 cp - s3://davidducos/mydumper_backup.sql --region us-east-1

On the AWS CLI command, we specify the S3 service and the cp command; the "-" means that it will read from STDIN, and then we give the location of the single file (s3://davidducos/mydumper_backup.sql) that is going to be uploaded.

In the log, you will see entries like this:

…
2022-11-13 21:18:09 [INFO] - Releasing FTWR lock
2022-11-13 21:18:09 [INFO] - Releasing binlog lock
2022-11-13 21:18:09 [INFO] - File data/myd_test-schema-create.sql transferred | Global: 0 MB/s
2022-11-13 21:18:09 [INFO] - File data/myd_test.mydumper2S3-schema.sql transferred | Global: 0 MB/s
2022-11-13 21:18:09 [INFO] - Thread 1 dumping data for `myd_test`.`mydumper2S3`  WHERE `id` IS NULL OR `id` = 1 OR( 1 < `id` AND `id` <= 2001)       into data/myd_test.mydumper2S3.00000.sql| Remaining jobs: -3
………
2022-11-13 21:18:10 [INFO] - Thread 4 dumping data for `myd_test`.`mydumper2S3`  WHERE ( 1740198 < `id` AND `id` <= 1760198)       into data/myd_test.mydumper2S3.00009.sql| Remaining jobs: 0
2022-11-13 21:18:10 [INFO] - File data/myd_test.mydumper2S3.00002.sql transferred | Global: 27 MB/s
2022-11-13 21:18:10 [INFO] - Thread 1 dumping data for `myd_test`.`mydumper2S3`  WHERE ( 2283598 < `id` AND `id` <= 2303598)       into data/myd_test.mydumper2S3.00003.sql| Remaining jobs: 0
………
2022-11-13 21:18:10 [INFO] - Thread 3 dumping data for `myd_test`.`mydumper2S3`  WHERE ( 2424197 < `id` AND `id` <= 2424797)       into data/myd_test.mydumper2S3.00007.sql| Remaining jobs: 1
2022-11-13 21:18:10 [INFO] - Thread 3: Table mydumper2S3 completed
2022-11-13 21:18:10 [INFO] - Thread 3 shutting down
2022-11-13 21:18:10 [INFO] - Releasing DDL lock
2022-11-13 21:18:10 [INFO] - Queue count: 0 0 0 0 0
2022-11-13 21:18:10 [INFO] - Main connection closed
2022-11-13 21:18:10 [INFO] - Finished dump at: 2022-11-13 21:18:10
2022-11-13 21:18:32 [INFO] - File data/myd_test.mydumper2S3.00009.sql transferred in 22 seconds at 0 MB/s | Global: 2 MB/s
2022-11-13 21:18:36 [INFO] - File data/myd_test.mydumper2S3.00003.sql transferred in 4 seconds at 4 MB/s | Global: 2 MB/s
2022-11-13 21:18:39 [INFO] - File data/myd_test.mydumper2S3.00001.sql transferred in 2 seconds at 9 MB/s | Global: 2 MB/s
2022-11-13 21:18:41 [INFO] - File data/myd_test.mydumper2S3.00007.sql transferred in 1 seconds at 4 MB/s | Global: 3 MB/s
2022-11-13 21:18:41 [INFO] - File data/myd_test.mydumper2S3-metadata transferred | Global: 3 MB/s
2022-11-13 21:18:41 [INFO] - File data/metadata transferred | Global: 3 MB/s
2022-11-13 21:18:41 [INFO] - All data transferred was 104055843 at a rate of 3 MB/s

As you see from the log, the files are being streamed as soon as they are closed. However, it took more than 30 seconds after the dump finished for all the files to be streamed. Finally, the command returned a couple of seconds after the “All data transferred…” entry, as the buffer needs to flush the data and upload it to S3.

With --exec

If you need to upload every single file individually, this is the option that you should use. For instance, you can use --load-data or the --csv option directly to allow another process to consume the files.

Let’s see the example:

mydumper -T myd_test.mydumper2S3 -o data -v 3 \
  --exec="/usr/bin/aws s3 cp FILENAME s3://davidducos/mydumper_backup/ --region us-east-1" --exec-threads=8

In this case, the AWS CLI will send the status of the files being uploaded to STDERR:

upload: data/myd_test-schema-create.sql to s3://davidducos/mydumper_backup/myd_test-schema-create.sql
upload: data/myd_test.mydumper2S3-schema.sql to s3://davidducos/mydumper_backup/myd_test.mydumper2S3-schema.sql
upload: data/myd_test.mydumper2S3.00042.sql to s3://davidducos/mydumper_backup/myd_test.mydumper2S3.00042.sql
upload: data/myd_test.mydumper2S3.00010.sql to s3://davidducos/mydumper_backup/myd_test.mydumper2S3.00010.sql
upload: data/myd_test.mydumper2S3.00026.sql to s3://davidducos/mydumper_backup/myd_test.mydumper2S3.00026.sql
upload: data/myd_test.mydumper2S3-metadata to s3://davidducos/mydumper_backup/myd_test.mydumper2S3-metadata
upload: data/metadata to s3://davidducos/mydumper_backup/metadata
upload: data/myd_test.mydumper2S3.00006.sql to s3://davidducos/mydumper_backup/myd_test.mydumper2S3.00006.sql
upload: data/myd_test.mydumper2S3.00000.sql to s3://davidducos/mydumper_backup/myd_test.mydumper2S3.00000.sql
upload: data/myd_test.mydumper2S3.00004.sql to s3://davidducos/mydumper_backup/myd_test.mydumper2S3.00004.sql
upload: data/myd_test.mydumper2S3.00005.sql to s3://davidducos/mydumper_backup/myd_test.mydumper2S3.00005.sql
upload: data/myd_test.mydumper2S3.00001.sql to s3://davidducos/mydumper_backup/myd_test.mydumper2S3.00001.sql
upload: data/myd_test.mydumper2S3.00002.sql to s3://davidducos/mydumper_backup/myd_test.mydumper2S3.00002.sql
upload: data/myd_test.mydumper2S3.00003.sql to s3://davidducos/mydumper_backup/myd_test.mydumper2S3.00003.sql

And the log will be the traditional mydumper log.

Conclusion

This is an example with S3, but it is also possible to use this approach with different vendors, or if you need encryption, just pipe to your encryption command and pipe again to AWS or any other command. I didn’t use ZSTD compression, which is another option that you should explore.
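For instance, a hedged sketch of adding symmetric encryption in the middle of the stream (the passphrase file path is illustrative, and the exact gpg flags may vary with your version):

mydumper -T myd_test.mydumper2S3 -r 20000 -o data --stream \
  | gpg --batch --symmetric --cipher-algo AES256 --passphrase-file /path/to/passphrase -o - \
  | aws s3 cp - s3://davidducos/mydumper_backup.sql.gpg --region us-east-1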

Nov
30
2022
--

PHP 8.0 Reaches End of Life


While the LAMP stack – Linux/Apache/MySQL/PHP – is not the dominant platform it was a decade ago, there are still many websites that depend on it. One of the pillars of this quartet hit a milestone last week when PHP 8.0 passed into End of Life status. While it will still have limited security support for another year, 8.0 should be phased out of your environments.

By the way, 8.1 has another year until it reaches End of Life status.

So please update your PHP 8.0 as soon as you can, double-check that the connector you are using to access your database is updated too, and make sure you are ready for 8.1’s future.

And make sure that if you are currently running MySQL 5.7 that you upgrade by October 2023 when it too reaches End of Life.

Nov
29
2022
--

MySQL Data Archival With Minimal Disruption


We all know that data is important, and some businesses need historical data to be available all the time. The problem is that queries on large tables perform poorly if they are not properly optimized. In Managed Services, we get many customer requests to purge/archive large tables, and to achieve this, we use pt-archiver.

Recently, we received a request to archive a large table, and the customer was worried about the downtime and performance issues during the archival.

We proposed a solution to the customer to archive the table using pt-archiver. The idea is to archive old data to another table and keep the latest data in the current table with minimal performance impact. All of the data remains available and can be queried anytime.

Percona Toolkit

In this blog, I will not explain how to use pt-archiver; instead, we will discuss a use case for it.

If you are unfamiliar with pt-archiver, please refer to “Want to archive tables? Use Percona Toolkit’s pt-archiver”; it explains how pt-archiver works and its various arguments.

We will test data archival to keep 100 days’ worth of data with a few simple steps for demonstration.

Note: This method uses timestamp datatype to filter the data.

  1. Create two dummy tables.
  2. Insert records in the source table.
  3. Archive the records from the source to the destination table using a --where condition per business requirements.
  4. Rename the tables.
  5. Add Pt-archiver as a cron.

This diagram better illustrates the process.

Remember, the date and --where condition in this example are just a reference. Use an archiving condition in pt-archiver that matches your business requirements.

Let’s create a source table and insert records using mysql_random_data_load:

Create Table: CREATE TABLE `blogpost` (

  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,

  `created_at` timestamp NULL DEFAULT NULL,

  `updated_at` timestamp NULL DEFAULT NULL,

  `deleted_at` timestamp NULL DEFAULT NULL,

  PRIMARY KEY (`id`),

  KEY `blogpost_created_at_index` (`created_at`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1

1 row in set (0.00 sec)

Let’s create the destination table;

(Using the following table for demonstration purposes only)

create table if not exists blogpost_new like blogpost;

Query OK, 0 rows affected (0.02 sec)

Create the following triggers, as generated by pt-online-schema-change.

Why triggers?

Any modifications to data in the original table during the copy will be reflected in the new table, because pt-online-schema-change creates triggers on the original table to update the corresponding rows in the new table.

(For demonstration purposes, I have added triggers created from the pt-online-schema-change test run)

# Event: DELETE

delimiter //

CREATE TRIGGER `pt_osc_test_blogpost_del` AFTER DELETE ON `test`.`blogpost` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1146 begin end; DELETE IGNORE FROM `test`.`blogpost_new` WHERE `test`.`blogpost_new`.`id` <=> OLD.`id`; END

//

delimiter ; //


# Event : UPDATE

delimiter //

CREATE TRIGGER `pt_osc_test_blogpost_upd` AFTER UPDATE ON `test`.`blogpost` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1146 begin end; DELETE IGNORE FROM `test`.`blogpost_new` WHERE !(OLD.`id` <=> NEW.`id`) AND `test`.`blogpost_new`.`id` <=> OLD.`id`; REPLACE INTO `test`.`blogpost_new` (`id`, `created_at`, `updated_at`, `deleted_at`) VALUES (NEW.`id`, NEW.`created_at`, NEW.`updated_at`, NEW.`deleted_at`); END

//

delimiter ; //

# Event : INSERT

delimiter //

CREATE TRIGGER `pt_osc_test_blogpost_ins` AFTER INSERT ON `test`.`blogpost` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1146 begin end; REPLACE INTO `test`.`blogpost_new` (`id`, `created_at`, `updated_at`, `deleted_at`) VALUES (NEW.`id`, NEW.`created_at`, NEW.`updated_at`, NEW.`deleted_at`);END

//

delimiter ; //

Verify that all triggers have been created.

SELECT * FROM INFORMATION_SCHEMA.TRIGGERS WHERE TRIGGER_SCHEMA='test' and EVENT_OBJECT_TABLE in ('blogpost')\G

The output should be something like 

+--------------------------+

| trigger_name             |

+--------------------------+

| pt_osc_test_blogpost_ins |

| pt_osc_test_blogpost_upd |

| pt_osc_test_blogpost_del |

+--------------------------+

3 rows in set (0.01 sec)

Copy the last 100 days of data using pt-archiver. Verify first with --dry-run.

(A screen session can be used to run pt-archiver if the table is large.)

pt-archiver --source h=localhost,D=test,t=blogpost,i=blogpost_created_at_index \

--dest h=localhost,D=test,t=blogpost_new \

--where "created_at >= date_sub(curdate(), interval 100 day)" --no-delete --no-check-columns \

--limit=10000 --progress=10000 --no-check-charset --dry-run

The output should be something like this:

SELECT /*!40001 SQL_NO_CACHE */ `id`,`created_at`,`updated_at`,`deleted_at` FROM `test`.`blogpost` FORCE INDEX(`blogpost_created_at_index`) WHERE (created_at >= date_sub(curdate(), interval 100 day)) ORDER BY `created_at` LIMIT 10000

SELECT /*!40001 SQL_NO_CACHE */ `id`,`created_at`,`updated_at`,`deleted_at` FROM `test`.`blogpost` FORCE INDEX(`blogpost_created_at_index`) WHERE (created_at >= date_sub(curdate(), interval 100 day)) AND (((? IS NULL AND `created_at` IS NOT NULL) OR (`created_at` > ?))) ORDER BY `created_at` LIMIT 10000

INSERT INTO `test`.`blogpost_new`(`id`,`created_at`,`updated_at`,`deleted_at`) VALUES (?,?,?,?)

Let’s execute the pt-archiver:

(The following pt-archiver command will copy 100 days’ worth of data to the _new table, and the triggers will keep the _new table up to date.)

pt-archiver --source h=localhost,D=test,t=blogpost,i=blogpost_created_at_index \

--dest h=localhost,D=test,t=blogpost_new \

--where "created_at >= date_sub(curdate(), interval 100 day)" --no-delete --no-check-columns \

--limit=10000 --replace --progress=10000 --no-check-charset

The pt-archiver output should be something like this:

TIME                ELAPSED   COUNT

2022-09-28T23:50:20       0       0

2022-09-28T23:50:44      24   10000

2022-09-28T23:51:08      48   20000

2022-09-28T23:51:25      65   27590

Once pt-archiver finishes, check its exit code:

echo $?

(should be 0)

The next step is to check if the rows have been inserted into the new table and compare the counts with the original table.

NOTE: The below results are just examples of tests; use the right date or where condition:

select max(created_at) from test.blogpost_new;

+---------------------+

| max(created_at)     |

+---------------------+

| 2022-09-27 02:11:00 |

+---------------------+

select count(1) from test.blogpost where created_at <= date_sub('2022-09-27', interval 100 day);

+----------+

| count(1) |

+----------+

|    65366 |

+----------+

1 row in set (0.02 sec)

select count(1) from test.blogpost_new where created_at >= date_sub('2022-09-26', interval 100 day);

+----------+

| count(1) |

+----------+

|    23758 |

+----------+

1 row in set (0.01 sec)

Why rename tables? 

The idea is to keep 100 days’ worth of data in the new table and rename it as the source table, while the original table, renamed with the _archive suffix, will have all the data, including the last 100 days.

RENAME TABLE blogpost TO blogpost_archive, blogpost_new to blogpost;

It should appear something like this:

+------------------+

| Tables_in_test   |

+------------------+

| blogpost         |

| blogpost_archive |

+------------------+

Drop the triggers created using pt-online-schema-change.

DROP TRIGGER IF EXISTS test.pt_osc_test_blogpost_ins;

DROP TRIGGER IF EXISTS test.pt_osc_test_blogpost_upd;

DROP TRIGGER IF EXISTS test.pt_osc_test_blogpost_del;

Once the table rename is completed, data from the blogpost table needs to be copied to the _archive table on an ongoing basis.

Let’s add the pt-archiver command to cron to make the process automatic. (It is advisable to create a script that wraps the below archiver command, and to test it.)

pt-archiver --source h=localhost,D=test,t=blogpost,i=blogpost_created_at_index \

--dest h=localhost,D=test,t=blogpost_archive,b=0,L=yes \

--where "created_at <= date_sub(curdate(), interval 100 day)" --no-check-columns \

--limit=10000 --progress=10000 --replace --bulk-insert --no-check-charset > path/to/logs 2>&1
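A hedged example of how that wrapper script might be scheduled in cron, assuming a hypothetical script name and log path; adjust the schedule to your retention requirements:

0 2 * * * /usr/local/bin/blogpost_archive.sh >> /var/log/blogpost_archive.log 2>&1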

Now it is time to check if the cron was successful.

Check that the archived rows have been copied to the test.blogpost_archive table and deleted from test.blogpost:

SELECT COUNT(1) FROM test.blogpost_archive WHERE created_at <= date_sub('2022-09-02', interval 100 day);

Verify that the blogpost table only has the last 100 days of data; the following query should return 0 rows:

SELECT COUNT(1) FROM test.blogpost WHERE created_at <= date_sub('2022-09-02', interval 100 day);

Hope you found this use case of the pt-archiver helpful when you need to purge/archive large tables!

Nov
29
2022
--

PMM, Federated Tables, Table Stats, and Lots of Connections!


Earlier in the year, I was working on an issue where one of my clients had reported a massive influx of connections on their hosts after enabling Percona Monitoring and Management (PMM). This was something I had not seen before, and after researching for a couple of days I discovered that if you monitor a MySQL instance with PMM configured to collect table statistics, and the tables it is gathering statistics from are Federated, it will generate a connection on the remote host for each Federated table in the instance. Let’s go over the details and provide some examples so we can understand this a bit better.

First, I’ll offer a reminder that a Federated table is simply a table in your MySQL instance that is empty locally and uses a network connection to get the data from another MySQL host when the table is queried. For example, if I have a normal table called peter_data on host mysql1, I can set up a Federated table on mysql2 that points to mysql1. Each time mysql2 queries the peter_data table, it connects to mysql1, gets the data, and then returns it locally. This feature is a lot less common now than it once was, given how MySQL replication has improved over time, but as you can see here in the MySQL reference guide, it’s still supported.
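For reference, a setup like the one used in the lab below is typically defined with a server entry plus a FEDERATED table that points at it. A hedged sketch using the same host, credentials, and names as the lab (your values will differ):

mysql> CREATE SERVER fedlink
       FOREIGN DATA WRAPPER mysql
       OPTIONS (HOST '10.0.2.12', PORT 3306, DATABASE 'ftest', USER 'root', PASSWORD 'password');

mysql> CREATE TABLE t1 (
         c1 int(11) NOT NULL,
         c2 int(11) DEFAULT NULL,
         c3 int(11) DEFAULT NULL,
         PRIMARY KEY (c1)
       ) ENGINE=FEDERATED DEFAULT CHARSET=latin1 CONNECTION='fedlink/t1';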

So how does this impact our issue where PMM was establishing so many connections? Let’s set this up in my lab and have a look!

Lab setup

Let’s start by setting up my first host centos7-1 as the “remote host”. This is the host that has the actual data on it.

[root@centos7-1 ~]# mysql
.....
mysql> use ftest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show create table t1 \G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `c1` int(11) NOT NULL,
  `c2` int(11) DEFAULT NULL,
  `c3` int(11) DEFAULT NULL,
  PRIMARY KEY (`c1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select * from t1;
+----+------+------+
| c1 | c2   | c3   |
+----+------+------+
|  1 |    1 |    1 |
|  2 |    2 |    2 |
|  3 |    3 |    3 |
+----+------+------+
3 rows in set (0.00 sec)

Now that’s done, I’ll set up my second host centos7-2 to act as the host that has the Federated table.

[root@centos7-2 ~]# mysql -u root -ppassword
mysql> select * from mysql.servers;
+-------------+-----------+-------+----------+----------+------+--------+---------+-------+
| Server_name | Host      | Db    | Username | Password | Port | Socket | Wrapper | Owner |
+-------------+-----------+-------+----------+----------+------+--------+---------+-------+
| fedlink     | 10.0.2.12 | ftest | root     | password | 3306 |        | mysql   |       |
+-------------+-----------+-------+----------+----------+------+--------+---------+-------+
1 row in set (0.00 sec)

mysql> use ftest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `c1` int(11) NOT NULL,
  `c2` int(11) DEFAULT NULL,
  `c3` int(11) DEFAULT NULL,
  PRIMARY KEY (`c1`)
) ENGINE=FEDERATED DEFAULT CHARSET=latin1 CONNECTION='fedlink/t1'
1 row in set (0.00 sec)

mysql> select * from t1;
+----+------+------+
| c1 | c2   | c3   |
+----+------+------+
|  1 |    1 |    1 |
|  2 |    2 |    2 |
|  3 |    3 |    3 |
+----+------+------+
3 rows in set (0.01 sec)
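
For reference, the fedlink entry shown in mysql.servers above can be created with a CREATE SERVER statement before the Federated table is defined. This is a sketch that reproduces the lab values from that row (host 10.0.2.12, database ftest), not a recommended configuration:

CREATE SERVER fedlink
FOREIGN DATA WRAPPER mysql
OPTIONS (HOST '10.0.2.12', DATABASE 'ftest', USER 'root', PASSWORD 'password', PORT 3306);

The Federated table itself then references it with ENGINE=FEDERATED and CONNECTION='fedlink/t1', as the SHOW CREATE TABLE output above shows.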

Recreating the issue

Now that we have our Federated table set up, let’s test and see how querying table metadata on centos7-2, the instance with the Federated table, impacts connections on centos7-1, the remote host. I connected to centos7-2, queried the information_schema.tables table much in the same way that PMM does, disconnected, and then connected a second time and ran the same query.

[root@centos7-2 ~]# mysql -u root -ppassword
...
mysql> select * from information_schema.tables where table_schema = 'ftest';
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE    | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME | UPDATE_TIME         | CHECK_TIME | TABLE_COLLATION   | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
| def           | ftest        | t1         | BASE TABLE | FEDERATED |      10 | Fixed      |          3 |           5461 |       16383 |               0 |            0 |         0 |           NULL | NULL        | 1969-12-31 19:33:42 | NULL       | latin1_swedish_ci |     NULL |                |               |
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
1 row in set (0.01 sec)

mysql> exit
Bye

[root@centos7-2 ~]# mysql -u root -ppassword
....
mysql> select * from information_schema.tables where table_schema = 'ftest';
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE    | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME | UPDATE_TIME         | CHECK_TIME | TABLE_COLLATION   | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
| def           | ftest        | t1         | BASE TABLE | FEDERATED |      10 | Fixed      |          3 |           5461 |       16383 |               0 |            0 |         0 |           NULL | NULL        | 1969-12-31 19:33:42 | NULL       | latin1_swedish_ci |     NULL |                |               |
+---------------+--------------+------------+------------+-----------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+---------------------+------------+-------------------+----------+----------------+---------------+
1 row in set (0.01 sec)

mysql> exit
Bye

As you can see below, this resulted in two connections on centos7-1 that did not drop despite disconnecting and reconnecting on centos7-2.

mysql> show processlist;
+----+------+-----------------+-------+---------+------+----------+------------------+-----------+---------------+
| Id | User | Host            | db    | Command | Time | State    | Info             | Rows_sent | Rows_examined |
+----+------+-----------------+-------+---------+------+----------+------------------+-----------+---------------+
| 23 | root | localhost       | NULL  | Query   |    0 | starting | show processlist |         0 |             0 |
| 25 | root | 10.0.2.13:33232 | ftest | Sleep   |  112 |          | NULL             |         1 |             1 |
| 27 | root | 10.0.2.13:33236 | ftest | Sleep   |   71 |          | NULL             |         1 |             1 |
+----+------+-----------------+-------+---------+------+----------+------------------+-----------+---------------+
3 rows in set (0.00 sec)

This doesn’t sound like that big of a deal, especially considering that PMM usually remains connected to the host. So if you have one Federated table in your system and PMM is monitoring table stats, it will only add one connection on the remote host, right? That’s true, but in my lab I expanded this to 145 Federated tables, and the result was that, despite only querying the information_schema.tables table, 145 connections were created on centos7-1.

[root@centos7-2 ~]# mysql -u root -ppassword
...
mysql> select * from information_schema.tables where table_schema = 'ftest';
....
145 rows in set (0.08 sec)

[root@centos7-1 ~]# mysql -u root -ppassword
mysql> select * from information_schema.processlist where substring_index(host,':',1) = '10.0.2.13' and user = 'root';
+------+------+------------------+-------+---------+------+-------+------+---------+-----------+---------------+
| ID   | USER | HOST             | DB    | COMMAND | TIME | STATE | INFO | TIME_MS | ROWS_SENT | ROWS_EXAMINED |
+------+------+------------------+-------+---------+------+-------+------+---------+-----------+---------------+
| 2120 | root | 10.0.2.13:60728  | ftest | Sleep   |    7 |       | NULL |    6477 |         1 |             1 |
| 2106 | root | 10.0.2.13:60700  | ftest | Sleep   |    7 |       | NULL |    6701 |         1 |             1 |
....
| 2117 | root | 10.0.2.13:60722  | ftest | Sleep   |    7 |       | NULL |    6528 |         1 |             1 |
| 2118 | root | 10.0.2.13:60724  | ftest | Sleep   |    7 |       | NULL |    6512 |         1 |             1 |
+------+------+------------------+-------+---------+------+-------+------+---------+-----------+---------------+
145 rows in set (0.00 sec)

This can be a big problem if you have a host that doesn’t support a lot of connections and you need those connections to be available for your app!
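
If you want a rough idea of how much headroom the remote host has before those extra connections become a problem, comparing max_connections with the current and peak connection counts is a quick sanity check (not part of the original test, just a general-purpose check):

SHOW GLOBAL VARIABLES LIKE 'max_connections';
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';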

Conclusion

Based on the lab testing above, we can see how PMM queries against the information_schema.tables table can create a large number of connections on a Federated remote host. This probably will not be a problem for most MySQL users, considering that Federated tables aren’t that common, but if you have Federated tables and are considering adding PMM monitoring, or any other monitoring that collects table statistics, be warned! The way Federated connections are kept open on a remote host is not a bug; this is how it’s supposed to behave.

If you have Federated tables and want to avoid this problem, make sure you use the --disable-tablestats flag when adding MySQL to your local PMM client with the “pmm-admin add mysql” command.
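
As a rough example of what that looks like with PMM 2’s pmm-admin (the service name, credentials, and address below are placeholders, not values from this test):

pmm-admin add mysql --username=pmm --password=<password> \
  --query-source=perfschema --disable-tablestats \
  mysql-with-federated 127.0.0.1:3306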

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

Download Percona Monitoring and Management Today

Nov
28
2022
--

Talking Drupal #375 – Being A Creative Director

Today we are talking about Being a Creative Director with Randy Oest.

For show notes visit: www.talkingDrupal.com/375

Topics

  • What is a Creative Director?
  • How is being a CD at a technical company different?
  • Do Drupal CDs need additional skills?
  • Sometimes things get lost in translation between design and development; how do you bridge that gap?
  • How do you mentor?
  • How do you interview for creative positions?
  • Do you hire developers too?
  • Optimal makeup for a team.
  • Guiding the Four Kitchens team
  • Inspiration

Resources

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan John Picozzi – www.epam.com @johnpicozzi Randy Oest – randyoest.com @amazingrando

MOTW Correspondent

Martin Anderson-Clutz – @mandclu ECA ECA is a powerful, versatile, and user-friendly rules engine for Drupal 9+. The core module is a processor that validates and executes event-condition-action plugins. Integrated with graphical user interfaces like BPMN.iO, Camunda, ECA Core Modeller or other possible future modellers, ECA is a robust system for building conditionally triggered action sets.
