Dec
22
2022
--

Talking Drupal #380 – Happy New Year

Today we share interviews conducted at NEDCamp in November 2022. This is an audio only episode.

For show notes visit: www.talkingDrupal.com/380

Topics / Guests

  1. John Picozzi
  2. Shane Thomas
  3. Donna Bungard
  4. Chris Wells
  5. Jacob Rockowitz
  6. Leslie Glynn
  7. Mike Anello
  8. Mike Miles
  9. Nic Laflin

Hosts

Stephen Cross @stephencross

MOTW

with Martin Anderson-Clutz – @mandclu

Bookable Calendar Module https://www.drupal.org/project/bookable_calendar A very easy-to-use Bookable Calendar module. Whether you’re giving lessons and want your students to be able to book a lesson, or you’re a business trying to stagger traffic into your building, this module aims to get you up and running as fast as possible.

Dec
22
2022
--

PostgreSQL Role Inheritance at a Glance

PostgreSQL manages database access permissions using the concept of roles. A role can be either a database user or a group of database users, depending on how the role is set up. Roles can own the database objects and assign privileges on those objects to other roles to control who has access to which objects. Furthermore, it is possible to grant membership in a role to another role, thus allowing the member role to use privileges assigned to another role.

PostgreSQL lets you grant permissions directly to database users. However, as a good practice for security and ease of user-account management, it is recommended that you create multiple roles with specific sets of permissions based on application and access requirements, and then assign the appropriate roles to each user. Such assignments can become complex when a role is granted to another role that is itself a parent of other roles.

To illustrate this in simple terms, consider a database with the following roles:

postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 A         |                                                            | {B}
 B         | Cannot login                                               | {E,D}
 C         |                                                            | {E,D,B}
 D         | Cannot login                                               | {}
 E         | Cannot login                                               | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

As you can see above, role A is a member of B, and B is a member of D and E, so A inherits permissions from B directly and from D and E indirectly. With a small set of roles we can quickly identify the inheritance by looking at the output of \du, but this becomes difficult with a large set of roles.
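
For reference, the hierarchy shown above can be reproduced with statements like the following (a sketch; the LOGIN/NOLOGIN attributes are inferred from the \du output):

CREATE ROLE "D" NOLOGIN;
CREATE ROLE "E" NOLOGIN;
CREATE ROLE "B" NOLOGIN;
CREATE ROLE "A" LOGIN;
CREATE ROLE "C" LOGIN;

GRANT "D", "E" TO "B";       -- B is a member of D and E
GRANT "B" TO "A";            -- A is a member of B, inheriting D and E indirectly
GRANT "B", "D", "E" TO "C";  -- C is a member of B, D, and E directly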

The following SQL query can be used to see the role inheritance/cascading:

WITH RECURSIVE cte AS (
    SELECT oid, oid AS member_of, 1 AS d, ''::name AS path
    FROM pg_roles r
    WHERE r.rolname !~ '^pg_'
    UNION
    SELECT m.roleid, m.member AS member_of, d + 1, path || '->' || pg_get_userbyid(cte.oid) AS path
    FROM cte
    JOIN pg_auth_members m ON m.member = cte.oid
    WHERE d < 15
)
SELECT DISTINCT pg_get_userbyid(member_of)::varchar AS username,
       pg_get_userbyid(oid)::varchar AS parent_role,
       d::int AS depth,
       substr(path::text || '->' || pg_get_userbyid(oid), 3) AS path
FROM cte
WHERE d > 1
ORDER BY depth;

username | parent_role | depth |  path
----------+-------------+-------+---------
A        | B           |     2 | A->B
B        | D           |     2 | B->D
B        | E           |     2 | B->E
C        | B           |     2 | C->B
C        | D           |     2 | C->D
C        | E           |     2 | C->E
B        | D           |     3 | A->B->D
B        | D           |     3 | C->B->D
B        | E           |     3 | A->B->E
B        | E           |     3 | C->B->E
(10 rows)

We can also create a function to get the inherited role details for the specific role:

CREATE OR REPLACE FUNCTION role_inheritance(usrname character varying)
RETURNS TABLE(username character varying, parent_role character varying, depth integer, inherit_path text)
LANGUAGE plpgsql
AS $$
begin
    return query
    WITH RECURSIVE cte AS (
        SELECT oid, oid AS member_of, 1 AS d, ''::name AS path
        FROM pg_roles r
        WHERE r.rolname = usrname
        UNION
        SELECT m.roleid, m.member AS member_of, d + 1, path || '->' || pg_get_userbyid(cte.oid) AS path
        FROM cte
        JOIN pg_auth_members m ON m.member = cte.oid
        WHERE d < 15
    )
    SELECT DISTINCT pg_get_userbyid(member_of)::varchar AS username,
           pg_get_userbyid(oid)::varchar AS parent_role,
           d::int AS depth,
           substr(path::text || '->' || pg_get_userbyid(oid), 3) AS path
    FROM cte
    WHERE d > 1
    ORDER BY 3;
end;
$$;

Execute the function for a specific role:

postgres=# select * from role_inheritance('A');
username | parent_role | depth | inherit_path
----------+-------------+-------+--------------
A        | B           |     2 | A->B
B        | D           |     3 | A->B->D
B        | E           |     3 | A->B->E
(3 rows)

Overall, PostgreSQL roles can be used very effectively to handle permissions if we know the impact of granting them. We encourage you to try our product Percona Distribution for PostgreSQL, trusted by numerous global brands across many industries, for a unified experience to monitor, manage, secure, and optimize database environments.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Dec
19
2022
--

Talking Drupal #378 – Acquia’s Drupal Acceleration Team

Today we are talking about Acquia’s Drupal Acceleration Team with Tim Plunkett.

For show notes visit: www.talkingDrupal.com/378

Topics

  • What is the Drupal Acceleration Team (DAT)
  • Responsibilities
  • Previous releases
  • Office of the CTO – OCTO
  • How big is the team
  • Direction
  • Priorities for new releases
  • Dries’ involvement
  • Contribution %
  • What are you working on now
  • Something you wish you were working on
  • R&D
  • Planning 2-5 years
  • Getting involved

Resources

Guests

Tim Plunkett – @timplunkett

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan
John Picozzi – www.epam.com @johnpicozzi
Leslie Glynn – redfinsolutions.com @leslieglynn

MOTW Correspondent

Martin Anderson-Clutz – @mandclu

Keysave Adds JavaScript to allow editors and admins to save an entity or config using command-s or control-s instead of clicking on the submit button.

Dec
19
2022
--

PMM V2.33: Offline Metric Collection, Guided Alerting Tour, Security Fixes, and More!

We are excited to announce the latest release of Percona Monitoring and Management (PMM) – V2.33. This release includes several new features and improvements that make PMM even more effective and user-friendly. Some of the key highlights of PMM V2.33 include:

  • Offline metric collection during PMM server outages or loss of PMM client-server network connectivity
  • A guided tour of Alerting, which helps new users get up to speed quickly and start using the alerting features of PMM
  • Easily restore your MongoDB databases to a previous state
  • Updated Grafana to version 9.2.5 to fix critical security vulnerabilities
  • Tab completion for the pmm-admin CLI command, which makes it easier to use the command line interface to manage PMM

You can get started with PMM in minutes using our PMM Quickstart guide and check out the latest version, V2.33.

Client-side caching minimizes potential for metrics loss

This new feature ensures that the PMM Client saves the monitoring data locally when a connection to the PMM server is lost, preventing gaps in the data. When the connection is restored, the data is sent to the PMM server, allowing the monitoring of your systems to continue without any data loss.

Note:

The client node is currently limited to storing only 1 GB of offline data. So, if your instance is down for three days and generates more than 1 GB of data during that time, not all of the data will be retrieved.

One of the core principles of our open-source philosophy is transparency, and we are committed to sharing our roadmap openly and transparently with our users. We are happy to share the roadmap for the implementation of PMM high availability (HA) in three stages, which has been a highly requested feature by our users. 

PMM HA will be rolled out in three stages. Stage one, which is included in PMM 2.33.0, implements a data loss prevention solution using VictoriaMetrics integration for short outages. This feature is now available in the latest release of PMM. Stages two and three of PMM HA will follow, adding features and enhancements to provide a complete high availability solution for PMM. We are excited to bring this much-anticipated feature to our users, and we look forward to sharing more details in the coming months.

 

Stages of PMM HA solutions provided:

  • Stage one (included in PMM 2.33.0): As an initial step toward preventing data loss, we have developed offline metric collection for short outages.
  • Stage two (to be rolled out in 2023): We plan to implement HA data sources. As part of stage two, we will let users use external data sources, thereby decreasing dependency on the file system.
  • Stage three (to be rolled out in 2023): We plan to implement HA clustered PMM Servers. Clustered PMM will be the focus of stage three. Detailed information will be included in the upcoming release notes.

 

Please feel free to book a 1:1 meeting with us to share your thoughts, needs, and feedback about PMM HA.

Tip: To improve the availability of the PMM Server until the general availability of PMM HA, PMM administrators can deploy it on Kubernetes via the Helm chart. The Kubernetes cluster can help ensure that PMM is available and able to handle different types of failures, such as the failure of a node or the loss of network connectivity.
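
For illustration, a Helm-based deployment could look something like the following (a minimal sketch using the Percona Helm charts; the release name and namespace are assumptions, so consult the PMM Helm documentation for values appropriate to your cluster):

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update
helm install pmm percona/pmm --namespace monitoring --create-namespace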

Critical security vulnerabilities fixed

In PMM 2.33.0, we have updated Grafana to version 9.2.5, which includes important security fixes. This upgrade addresses several critical and moderate vulnerabilities, including CVE-2022-39328, CVE-2022-39307, and CVE-2022-39306. For more details, please see the Grafana 9.2.5 release notes. We strongly recommend that all users upgrade to the latest version of PMM to ensure the security of their systems.

Guided tour on Alerting

In the 2.31.0 release of PMM, we added a new feature called Percona Alerting, which provides a streamlined alerting system. To help users get started with this new feature, we have added a short in-app tutorial that automatically pops up when you first open the Alerting page. This tutorial will guide you through the fundamentals of Percona Alerting, and help you explore the various features and options available. We hope this tutorial will make it easier for users to get started with Percona Alerting and take full advantage of its capabilities.

Restore MongoDB backups more easily

Building upon the significant improvements for MongoDB backup management introduced in the previous release, we are now simplifying the process for restoring physical MongoDB backups. Starting with this release, you can restore physical backups straight from the UI, and PMM will handle the process end-to-end. Prior to this, you would need to perform additional manual steps to restart your MongoDB database service so that your applications could make use of the restored data.

Improvements on the pmm-admin CLI command

pmm-admin is a command-line tool that is used to manage and configure PMM. It is part of the PMM Client toolset and can be used to perform various administrative tasks, such as managing inventory. We have added tab completion for the pmm-admin CLI command. This means that you no longer have to know the entire command when using pmm-admin. Instead, you can simply type the initial part of the command and press Tab, and the rest of the command will be automatically completed for you.  This new feature makes it easier to use the command line interface and ensures that you can quickly and easily access all of the powerful features of PMM. 

What’s next?

  • A Health dashboard for MySQL is on the way. Please share your suggestions in the comments or forum if you’d like to be part of the group shaping PMM. 
  • We have started to work on two new and significant projects: High Availability in PMM and advanced Role-Based Access Control (RBAC). We’d love to hear your needs, use cases, and suggestions. You can quickly book a short call with the product team to collaborate with us. 

Install PMM 2.33 now or upgrade your installation to V2.33 by checking our documentation for more information about upgrading.

Thanks to Community and Perconians

At Percona, we are grateful for our supportive community and dedicated team, who work together to shape the future of PMM. If you would like to be a part of this community, you can join us on our forums to request new features, share your feedback, and ask for support. We value the input of our community and welcome all members to participate in the ongoing development of PMM.

See PMM in action now!

Dec
15
2022
--

JSON and Relational Databases – Part One

JSON data and relational databases have traditionally been diametric opposites. One is very free-form, with minimal rules on formatting, and the other is very strict and formal. So many folks were surprised when relational databases started adding a JSON data type to their products.

In this multi-part blog series, we will start with an overview of using JSON in a relational database. Then we will dig deeper into the issues raised to see what, if any, progress has been made in resolving the possible conflicts and issues. It is obvious that neither RDBMSes nor JSON are going away, but can they coexist?

Why JSON?

JSON is popular. Very popular. It is the interchange format of choice for many developers, is easy to parse, and is ridiculously flexible. For some, the ability to encapsulate a JSON document in a row is a necessity. Others have found that JSON allows some schema flexibility or ‘fudge factor’ in situations where database schema changes are expensive.

Unstructured data has a lot of appeal in environments where defining the data itself, especially at the beginning of a project, is hard to do, or where there are too many known unknowns. But this approach of never-ending incrementalism is not optimal for database operations.

The Achilles’ heel of unstructured data is that it is hard to apply rigor to it. Making a key/value pair a required item, or of a certain data type or format, is not part of the JSON standard. And it is easy for a team to end up recording email addresses under inconsistent keys such as e-mail, EMail, or email, as there is no ‘native’ enforcement mechanism.

Why not JSON?

Structured data has a lot going for it and the relational model works well with most business models. JSON breaks normalization rules.

In the structured world of relational databases, all columns are carefully thought out, typed, and arranged. So the idea of a free-form column of unstructured data does not sit well with long-time database practices.

With a traditional relational database, you know that a column of integers is going to contain nothing but integers, which makes searching that column very easy if it is indexed. And if you try to put a string into that column of integers, the server will refuse to accept that row of data. It is a lot less expensive to keep bad data out than to go back later and clean it up.

But altering tables to add or qualify columns can be slow, expensive, and unsettling. The alter table process can be thought of as the server making a copy of the existing table (goodbye memory and/or disk during the process), making the change on the new version of the table, and then loading the data into the new table. The actual process has a lot more intricacies than that and of course, larger tables take more time to alter. Yup, the free-form JSON stuff looks a lot more attractive now, doesn’t it?

But you do not need a JSON data type!

Before JSON data types were introduced, it was possible to save a JSON document in a text string. It was very simple to put the entire document in a column and be done with it. Sadly this approach made it hard to search for data embedded in the document. Regular Expression searches are not exactly speedy. Changing one part of the JSON document meant the whole document had to be rewritten. And usually, you had to pull the entire document out of the table and parse it in an application. This approach works but not well.

JSON with relational

So all but the most devoted to the relational model can see the benefits of having a JSON column, or two, in a row of a table. In MySQL, you get one gigabyte of space for each JSON column – nothing to be sneezed at.

Both the PostgreSQL and MySQL implementations of JSON data types provide some extra benefits. First, you do not have to update the entire document when you make minor changes – just the changed bits. Second, the server provides many functions to extract data efficiently on the server side.
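
For example, in MySQL a partial update can modify a single key inside the document instead of rewriting the whole thing (a simple sketch; the customers table, doc column, and id value are made up for illustration):

-- Change only the "email" key inside the JSON column `doc`
UPDATE customers
   SET doc = JSON_SET(doc, '$.email', 'new.address@example.com')
 WHERE id = 42;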

There is a performance hit when searching JSON data (see the part on indexing below), but the extra flexibility is handy. For example, look at how MySQL stores dual passwords in the mysql.user table.

{"additional_password": "$A$005$;H7\u001b\u001b\u0006<`qFRU\tNRxT Z\u0003Ya/iej8Az8LoXGTv.dtf9K3RdJuaLFtXZHBs3/DntG2"}

In this case, you have a temporary second password on a user account, so you do not need a dedicated column in the table for this data. The information is present only for the accounts that need it.

JSON_TABLE() is a powerful command! Sadly, PostgreSQL 15 withdrew its JSON_TABLE() at the last minute, but hopefully it will be in version 16. Simply stated, JSON_TABLE() temporarily converts unstructured JSON data into structured data so it can be processed with Structured Query Language (SQL). This allows you to use CTEs, window functions, or aggregate functions on that data.

mysql> select country_name, IndyYear from countryinfo,
json_table(doc,"$" columns (country_name char(20) path "$.Name",
IndyYear int path "$.IndepYear")) as stuff
where IndyYear > 1992;
+----------------+----------+
| country_name   | IndyYear |
+----------------+----------+
| Czech Republic |     1993 |
| Eritrea        |     1993 |
| Palau          |     1994 |
| Slovakia       |     1993 |
+----------------+----------+
4 rows in set, 67 warnings (0.00 sec)

Generated columns and indexes

MySQL has a different approach to indexing JSON data that provides two choices. If you need to extract values from a JSON key/value pair into its own column then you can use generated columns.

ALTER TABLE car ADD car_type VARCHAR(30)
AS (JSON_UNQUOTE(car->"$.type")) STORED;

Then this column can be indexed for quick searches.

The second MySQL option is multi-valued indexes. These are secondary indexes defined on a column that stores an array of values. Usually there is a one-to-one relationship between an index entry and a row of data, but multi-valued indexes have a one-to-many relationship that is intended for indexing JSON arrays.
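
As a sketch of what that looks like in MySQL 8.0 (the customers table and the zipcode array key are hypothetical):

CREATE TABLE customers (
    id       BIGINT PRIMARY KEY,
    custinfo JSON
);

-- Index every element of the "zipcode" array inside the JSON document
ALTER TABLE customers
    ADD INDEX zips ( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );

Queries that filter on that array with MEMBER OF(), JSON_CONTAINS(), or JSON_OVERLAPS() can then use the multi-valued index.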

With PostgreSQL, you can simply create a GIN index on the values of a specified key in the JSON document.

CREATE INDEX books_tags_idx ON books USING gin (data->'tags');

Rigor

MySQL implemented a subset of the JSON-Schema.org proposal to ensure that your JSON data is what you want before it is inserted into the database. This way you can require certain fields, as well as check data types and data ranges. The JSON_SCHEMA_VALID() function checks an incoming JSON document against a supplied JSON Schema to see whether it complies. Used in a CHECK constraint, it lets you keep bad or incomplete data out of your JSON column.
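
As a rough sketch of how that can be wired up, reusing the car table from the generated-column example above (the JSON Schema shown here is hypothetical and deliberately minimal):

-- Reject documents that lack a string "type" key
ALTER TABLE car
    ADD CONSTRAINT car_doc_is_valid CHECK (
        JSON_SCHEMA_VALID(
            '{"type": "object",
              "required": ["type"],
              "properties": {"type": {"type": "string"}}}',
            car
        )
    );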

Conclusion

JSON will remain popular for data interchange for a long time, and relational databases will need to adapt to its usage. The old relational arguments about abstaining from JSON will fade as efforts like JSON-Schema.org provide the ability to ensure the completeness of data. Altering tables will never be completely instantaneous, even though features like MySQL’s instant ADD COLUMN have proven useful.

And the need to add or alter columns in a table will always be around. Being able to get some flexibility with a JSON column could provide relief in shops where continuous operations with zero downtime are the goal, and could be enough of a trade-off when using the relational model.

Dec
15
2022
--

Least Privilege for Kubernetes Resources and Percona Operators

Operators hide the complexity of the application and Kubernetes. Instead of dealing with Pods, StatefulSets, tons of YAML manifests, and various configuration files, the user talks to the Kubernetes API to provision a ready-to-use application. An Operator automatically provisions all the required resources and exposes the application. However, there is always a risk that a user will make manual changes that negatively affect the application and the Operator logic.

In this blog post, we will explain how to limit access scope for the user to avoid manual changes for database clusters deployed with Percona Operators. To do so, we will rely on Kubernetes Role-based Access Control (RBAC).

The goal

We are going to have two roles: Administrator and Developer. The Administrator will deploy the Operator and create the necessary roles and service accounts. Developers will be able to:

  1. Create, modify, and delete Custom Resources that Percona Operators use
  2. List all the resources – users might want to debug the issues

Developers will not be able to:

  1. Create, modify, or delete any other resource

Least Privilege for Kubernetes

As a result, the Developer will be able to deploy and manage database clusters through a Custom Resource, but will not be able to modify any operator-controlled resources, like Pods, Services, Persistent Volume Claims, etc.

Action

We will provide an example for Percona Operator for MySQL based on Percona XtraDB Cluster (PXC), which just had version 1.12.0 released.

Administrator

Create a dedicated namespace

We will allow users to manage clusters in a single namespace called prod-dbs:

$ kubectl create namespace prod-dbs

Deploy the operator

Use any of the ways described in our documentation to deploy the operator into the namespace. My personal favorite is a simple kubectl command:

$ kubectl apply -f https://raw.githubusercontent.com/percona/percona-xtradb-cluster-operator/v1.12.0/deploy/bundle.yaml

Create ClusterRole

The ClusterRole resource defines the permissions the user will have for specific resources in Kubernetes. You can find the full YAML in this GitHub repository.

    - apiGroups: ["pxc.percona.com"]
      resources: ["*"]
      verbs: ["*"]
    - apiGroups: [""]
      resources:
      - pods
      - pods/exec
      - pods/log
      - configmaps
      - services
      - persistentvolumeclaims
      - secrets
      verbs:
      - get
      - list
      - watch

As you can see, we allow any operation on pxc.percona.com resources but restrict other resources to get, list, and watch.

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/rbac-operators/clusterrole.yaml

Create ServiceAccount

We are going to generate a kubeconfig for this service account. This is what the user is going to use to connect to the Kubernetes API.

apiVersion: v1
kind: ServiceAccount
metadata:
    name: database-manager
    namespace: prod-dbs

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/rbac-operators/serviceaccount.yaml

Create ClusterRoleBinding

We need to assign the ClusterRole to the ServiceAccount; a ClusterRoleBinding acts as the relation between these two.

$ kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/rbac-operators/clusterrolebinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
    name: percona-database-manager-bind
    namespace: prod-dbs
roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: percona-pxc-rbac
subjects:
    - kind: ServiceAccount
      name: database-manager
      namespace: prod-dbs

Developer

Verify

The kubectl auth can-i command allows you to verify whether the service account can really do what we want it to do. Let’s try:

$ kubectl auth can-i create perconaxtradbclusters.pxc.percona.com --as=system:serviceaccount:prod-dbs:database-manager
yes

$ kubectl auth can-i delete service --as=system:serviceaccount:prod-dbs:database-manager
no

All good: I can create Percona XtraDB Clusters, but I can’t delete Services. Please note that sometimes it might be useful to allow Developers to delete Pods to force cluster recovery. If you feel that it is needed, modify the ClusterRole, for example as shown below.
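
For instance, a hypothetical tweak to the ClusterRole above that also lets Developers delete Pods could look like this (a sketch only; adjust the verbs to your own policy):

    - apiGroups: [""]
      resources: ["pods"]
      verbs:
      - get
      - list
      - watch
      - delete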

Apply

There are multiple ways to use this service account. You can read more about it in Kubernetes documentation. For a quick demonstration, we are going to generate a kubeconfig that we can share with our user. 

Run this script to generate the config (a rough sketch of such a script follows the command below). What it does:

  1. Gets the service account secret resource name
  2. Extracts Certificate Authority (ca.crt) contents from the secret
  3. Extracts the token from the secret
  4. Gets the endpoint of the Kubernetes API
  5. Generates the kubeconfig using all of the above
$ bash generate-kubeconfig.sh > tmp-kube.config
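
For reference, here is a rough sketch of what such a script might look like (the actual script in the linked repository may differ; this sketch assumes a Kubernetes version that still auto-creates a token Secret for the ServiceAccount):

#!/bin/bash
# Sketch of a generate-kubeconfig.sh for the database-manager ServiceAccount
SA=database-manager
NAMESPACE=prod-dbs

# 1. Service account secret resource name
SECRET=$(kubectl -n "$NAMESPACE" get sa "$SA" -o jsonpath='{.secrets[0].name}')

# 2. Certificate Authority (ca.crt) contents from the secret (kept base64-encoded)
CA=$(kubectl -n "$NAMESPACE" get secret "$SECRET" -o jsonpath='{.data.ca\.crt}')

# 3. Token from the secret
TOKEN=$(kubectl -n "$NAMESPACE" get secret "$SECRET" -o jsonpath='{.data.token}' | base64 -d)

# 4. Endpoint of the Kubernetes API
SERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')

# 5. Generate the kubeconfig using all of the above
cat <<EOF
apiVersion: v1
kind: Config
clusters:
- name: cluster
  cluster:
    certificate-authority-data: ${CA}
    server: ${SERVER}
contexts:
- name: ${SA}@cluster
  context:
    cluster: cluster
    namespace: ${NAMESPACE}
    user: ${SA}
current-context: ${SA}@cluster
users:
- name: ${SA}
  user:
    token: ${TOKEN}
EOF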

Now let’s see if it works as expected:

$ KUBECONFIG=tmp-kube.config kubectl -n prod-dbs apply -f https://raw.githubusercontent.com/percona/percona-xtradb-cluster-operator/v1.12.0/deploy/cr.yaml
perconaxtradbcluster.pxc.percona.com/cluster1 created

$ KUBECONFIG=tmp-kube.config kubectl -n prod-dbs get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   0          15m
cluster1-haproxy-1                                 2/2     Running   0          14m
cluster1-haproxy-2                                 2/2     Running   0          14m
cluster1-pxc-0                                     3/3     Running   0          15m
cluster1-pxc-1                                     3/3     Running   0          14m
cluster1-pxc-2                                     3/3     Running   0          13m
percona-xtradb-cluster-operator-77bf8b9df5-qglsg   1/1     Running   0          52m

I was able to create the Custom Resource and a cluster. Let’s try to delete the Pod:

$ KUBECONFIG=tmp-kube.config kubectl -n prod-dbs delete pods cluster1-haproxy-0
Error from server (Forbidden): pods "cluster1-haproxy-0" is forbidden: User "system:serviceaccount:prod-dbs:database-manager" cannot delete resource "pods" in API group "" in the namespace "prod-dbs"

What about the Custom Resource?

$ KUBECONFIG=tmp-kube.config kubectl -n prod-dbs delete pxc cluster1
perconaxtradbcluster.pxc.percona.com "cluster1" deleted

Conclusion

Percona Operators automate the deployment and management of databases in Kubernetes. The principle of least privilege should be applied to minimize the ability of inexperienced users to affect availability and data integrity. Kubernetes comes with sophisticated Role-Based Access Control capabilities which allow you to do just that, without the need to reinvent it in your platform or application.

Try Percona Operators

Dec
14
2022
--

Generate a pgBadger Report From PostgreSQL Flexible Server Logs

In one of our previous posts, Detailed Logging for Enterprise-Grade PostgreSQL, we discussed the parameters that enable detailed logging and the pgBadger log analyzer. In this blog post, we will configure a Microsoft Azure-provisioned PostgreSQL Flexible Server to populate logs and generate a pgBadger report.

The latest pgBadger utility supports JSON-format logs. Microsoft Azure PostgreSQL Flexible Server does not provide PostgreSQL logs the way a single server or an on-premises environment does; the logs are populated only after you enable them, in JSON format.

In this blog, we will configure and generate a pgBadger report using JSON-format logs and, if we are using an older version of the pgBadger utility, first convert them into regular logs.

Before downloading the logs, we need to tune the logging-related parameters in the postgresql.conf file and reload the configuration. You can download and install pgBadger from here.

Also, you can go here and navigate to the pgBadger section to learn more about it.

Configuration

From the Microsoft Azure Cloud console — https://portal.azure.com/#home  — we need to create a storage account as shown below:

Microsoft Azure Cloud console

Click on the CREATE option and fill in details like name and resource group as shown below:

Configure the existing PostgreSQL Flexible Server to use the storage account for generating PostgreSQL logs.

Select diagnostic settings and add the already-created storage account.

PostgreSQL storage account

 

Log in to Microsoft Azure Cloud and navigate to the storage account that was created for the PostgreSQL Flexible Server logs. Then navigate to the location of the logs as shown below:

Navigate to the date and time to choose the hourly JSON file(s) required. Right-click on the .json file and download the log, which will be in JSON format.

Sample .json logs look like the following:

 

Generate pgBadger report

Use the Jump server provisioned for pgBadger, and copy the JSON file from the local machine to the Jump server.

If you are using the latest pgBadger utility, you can pass the JSON format logs using the -f option to generate a pgBadger report:

pgbadger --prefix='%t-%c-' -f jsonlog PT1H.json -o pgbadger_report.html

If you do not have the option to use the latest pgBadger utility, then use the following Linux command to extract the PostgreSQL log messages from the JSON file and generate a postgresql.log file.

cut -f9- -d\: PT1H.json| cut -f1 -d\} | sed -e 's/^."//' -e 's/"$//' > postgresql.log

Generate the pgBadger report from the postgresql.log file and parse it into an HTML file:

pgbadger --prefix='%t-%c-' postgresql.log -o pgbadger_report.html

Copy the pgbadger_report.html from the Jump server to the local machine and review the pgBadger report.

Conclusion

The pgBadger utility keeps emerging as one of the best log analyzer tools with each release, as it adds more features and functionality. We can configure and generate pgBadger reports from Microsoft Azure Flexible Server logs, and it does make a DBA’s life easier!

Dec
14
2022
--

Powering PostgreSQL 15 With Columnar Tables

Prologue

This blog is the result of my investigation into column-wise tables. Hopefully, you will find it useful in your deliberations.

When I started writing, it was meant to be a “Good News” blurb. I was pretty optimistic that I’d truly have impressive numbers to share, but as it turns out, while there is potential for significant advantage, a detailed testing regime of the CITUS columnar store extension should be carried out before implementing it in your own environment.

The missing sizzle

Sizzle: A phrase used to show affirmation or approval. It references the sound that one hears when cooking certain delicious foods i.e. bacon or fajitas and therefore transfers those feelings to non-edible objects or ideas.

There’s a lot to take in and I’m often impressed with every new version of PostgreSQL that is released. Nevertheless, there’s been a long-standing feature that, to be honest, I’ve always felt was missing. This weakness is being addressed by the industry. Ironically, some of these implementations are in fact Postgres derivatives.

What is this sizzling feature that I speak of you ask?

Column-wise tables!

When to consider column-wise tables

Consider using the CITUS Columnar Extension under one or more of the following conditions:

  • In regards to table attributes:
    • they are wide with “many” columns
    • size should at a minimum be in the tens of GB
    • the typical row is byte “heavy”
  • System space is at a premium i.e. it’s worth your while to manage space utilization as much as possible.
  • OLAP is a major component of overall activity i.e. lots of different kinds of SELECT statements.
  • INSERT performance is not a priority.
  • Indexes are not feasible 
  • Creating EXPRESSION indexes on columnar tables is faster by orders of magnitude than on heap tables.

CAVEAT: You cannot perform UPDATE or DELETE operations on a columnar table.

OLTP vs OLAP

Let’s get back to basics. In the database world there are essentially two types of database operations:

    1. Online Transaction Processing, OLTP: Online transaction processing applications have high throughput and are insert- or update-intensive in database management. These applications are used concurrently by hundreds of users. The key goals of OLTP applications are availability, speed, concurrency, and recoverability.

    2. Online analytical processing, OLAP: Online analytical processing applications enable users to analyze multidimensional data interactively from multiple perspectives which consist of three basic analytical operations: 
      1. consolidation (roll-up)
      2. drill-down
      3. slicing and dicing

About columnar tables

As an RDBMS, PostgreSQL is geared more toward OLTP than OLAP operations. Manipulating data by inserting, updating, deleting, etc., it does well. But while eminently capable of performing OLAP, it’s not quite as efficient. The primary reason is actually a pretty common characteristic among most RDBMS i.e. it stores and processes collections of data as individual records, otherwise known as row-wise processing. 

Suppose you are performing a deep-dive analysis involving a table of records containing 50 columns. And further, suppose that your query only needs a single column. It still needs to read all of the data from those 50 columns per row in order to access that single column. And if the query processes 100 million rows, that’s definitely not trivial!

Now let’s consider a reorientation of the table i.e. column-wise. In this case, the query only reads one column and not 50. The result is that the query is now much lighter, requiring less IO and processing power yet achieving the same result in significantly less time.

Introducing the CITUS columnar extension

As a general-purpose database management system, Postgres can have its basic behaviors reconfigured for different purposes. As such, it is possible to enhance PostgreSQL with columnar tables by using the CITUS columnar table extension.

The columnar capability is just one part of the larger CITUS extension, which, when fully implemented, creates a fully scalable, distributed Postgres database system.

The CITUS columnar extension feature set includes:

  • Highly compressed tables:
    • Reduces storage requirements
    • Reduces the IO needed to scan the table
  • Projection Pushdown: Only those column(s) within the SELECT are returned, further reducing IO.
  • Chunk Group Filtering: Allows queries to skip over whole groups of unrelated data without processing them.

The complete CITUS feature set, which (except for the columnar storage component) is not covered in this blog, includes:

  • Distributed tables
  • Reference tables
  • A distributed query engine that routes and parallelizes SELECT, DML, and other operations
  • Columnar storage
    • compresses data
    • speeds up scans
    • supports fast projections
  • Query from any node

Getting it

This is a bit of a bad-news, good-news, excellent-news situation.

BAD-NEWS: At the time of writing this blog, the only publicly available packages on the Postgres repositories that I could find were the cstore foreign data wrappers which cover only version 12 and older versions of Postgres.

GOOD-NEWS: The CITUS extension is available on the Citus Data download page.

EXCELLENT-NEWS: As part of the process of writing this blog, custom DEB and RPM packages were created for PostgreSQL version 15 for CentOS 7 and 8 and Ubuntu 18.04 and 20.04, which are available for download from GitHub HERE.

First-time installation

Assuming you opted for the easy way, install the packages made for this blog:

  • Go to the PostgreSQL download page and configure your packaging system for your Linux distribution.
  • Download the appropriate custom-made columnar package for your version of Linux:
# Ubuntu 18.04
wget https://github.com/rbernierZulu/columnar-tables-extension/raw/main/postgresql-15-citus-columnar_11.1-1-UBUNTU1804_amd64.deb

# Ubuntu 20.04
wget https://github.com/rbernierZulu/columnar-tables-extension/raw/main/postgresql-15-citus-columnar_11.1-1-UBUNTU2004_amd64.deb

# Centos 7
wget https://github.com/rbernierZulu/columnar-tables-extension/raw/main/postgresql-15-citus_columnar-11.1-1-CENTOS7.x86_64.rpm

# Centos 8
wget https://github.com/rbernierZulu/columnar-tables-extension/raw/main/postgresql-15-citus_columnar-11.1-1-CENTOS8.x86_64.rpm

 

  • Install the package: note that all the dependent packages, including the Postgres server, will be automatically installed onto your host. Pay attention to CENTOS 7 as it requires the epel-release repository too.
# Ubuntu 18.04
apt update
apt-get install ./postgresql-15-citus-columnar_11.1-1-UBUNTU1804_amd64.deb

# Ubuntu 20.04
apt update
apt-get install ./postgresql-15-citus-columnar_11.1-1-UBUNTU2004_amd64.deb

# Centos 7
# ATTENTION: the epel package must be installed beforehand!
yum install epel-release
yum install ./postgresql-15-citus_columnar-11.1-1-CENTOS7.x86_64.rpm

# Centos 8
dnf -qy module disable postgresql
dnf install ./postgresql-15-citus_columnar-11.1-1-CENTOS8.x86_64.rpm
dnf check-update
  • Initialize the postgres data cluster and configure appropriately for your distribution of Linux in order to login to postgres.

 

A working example of columnar tables

What follows here are the results of my analysis. Of course, there’s always more that can be said. Hopefully, this will give you enough of an overview of the possibilities of this extension.

Installing the extension into your database is pretty much standard fare:

-- create your extension in the database:
create database db01;
\c db01
create extension citus_columnar;

Here are two tables, of type HEAP and COLUMNAR, that will be used for the initial investigations. You’ll notice that it took less time to create the regular HEAP accessed table than the columnar table:

drop table if exists t1,t2;

\timing

-- Time: 7628.612 ms (00:07.629)
create table if not exists t1(id,qty) 
using heap 
as 
select (random()*10)::int, (random()*1000)::int from generate_series(1,10e6);

-- Time:15635.768 ms (00:15.636)
create table if not exists t2(id,qty) 
using columnar 
as 
select (random()*10)::int, (random()*1000)::int from generate_series(1,10e6);
Table "public.t1"
Column |  Type   | Collation | Nullable | Default  
--------+---------+-----------+----------+---------
id     | integer |           |          |  
qty    | integer |           |          |  
Access method: heap

                Table "public.t2"
Column |  Type   | Collation | Nullable | Default  
--------+---------+-----------+----------+---------
id     | integer |           |          |  
qty    | integer |
Access method: columnar

The number of records is roughly 10 million:

with a(row_wise)    as (select count(*) from t1),
     b(column_wise) as (select count(*) from t2)
select row_wise,column_wise from a,b;

 row_wise | column_wise
----------+-------------
  9900001 |     9900001

This is very cool, the columnar compression does a great job of shrinking the table:

\dt+ t[12]
                                   List of relations
 Schema | Name | Type  |  Owner   | Persistence | Access method |  Size  |
--------+------+-------+----------+-------------+---------------+--------+
 public | t1   | table | postgres | permanent   | heap          | 346 MB |
 public | t2   | table | postgres | permanent   | columnar      | 27 MB  |

SQL statements, preliminary

Let’s begin by comparing basic administration and SELECT statements of a HEAP vs COLUMNAR table. 

Examining the timings below, you’ll see that columnar tables at best perform similarly to a HEAP table, and most of the time they take longer to execute the same operations.

Using a psql session, the following SQL statements are executed and examined for performance differences:

-- HEAP TABLE: 7.6s
create table if not exists t1(id,qty) using heap as select (random()*10)::int, (random()*1000)::int from generate_series(1,10e6);

-- COLUMNAR TABLE: 15.6s
create table if not exists t2(id,qty) using columnar as select (random()*10)::int, (random()*1000)::int from generate_series(1,10e6);

-- HEAP TABLE, adding 5 million records: 13.7s
do
$$
begin
    for i in 5.1e6..10e6
    loop
    insert into t1 values((random()*10)::int,(random()*1000)::int);
    end loop;
end
$$;

-- COLUMNAR TABLE, adding 5 million records: 18.5s
do
$$
begin
    for i in 5.1e6..10e6
    loop
    insert into t2 values((random()*10)::int,(random()*1000)::int);
    end loop;
end
$$;

-- HEAP TABLE, create index: 4.9s
create index on t1(id);

-- COLUMNAR TABLE, create index: 7.8s
create index on t2(id);

SQL statement query plans

Part one (10 million records, two column tables)

Using the aforementioned table definitions, the following metrics were generated with the runtime parameter max_parallel_workers_per_gather = 4.
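
For reference, that setting can be changed per session before running the queries:

SET max_parallel_workers_per_gather = 4;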

It’s quite evident that, at least for these two tables, there’s no performance benefit of a columnar accessed table over a regular heap accessed one:

-- HEAP TABLE: 742.411 ms
explain analyze select id,qty from t1;

-- COLUMNAR TABLE: 914.096 ms
explain analyze select id,qty from t2;

-- HEAP TABLE: 6441.425 ms
explain analyze select id,qty from t1 order by random();

-- COLUMNAR TABLE: 5871.620 ms
explain analyze select id,qty from t2 order by random();

-- HEAP TABLE: 329.562 ms
explain analyze select sum(qty) from t1;

-- COLUMNAR TABLE: 902.614 ms
explain analyze select sum(qty) from t2;

-- HEAP TABLE: 531.525 ms
explain analyze select id,sum(qty) from t1 group by id;

-- COLUMNAR TABLE: 1602.756 ms
explain analyze select id,sum(qty) from t2 group by id;

Part two (five million records, 100 column tables)

In order to get a better idea of performance differences, a second set of tables at a greater scale were created. However, this time, while the number of records was halved, the number of columns was increased from two to one hundred.

Even if most of the columns are simply copies of one another, the columnar table’s resultant compression is remarkable as the default size is reduced by a factor of 752X.

/* TABLES

                                                  Table "public.t[34]"
 Column |  Type  | Collation | Nullable |                                  Default
--------+--------+-----------+----------+----------------------------------------------------------------------------
 c1     | bigint |           | not null | nextval('t1_c1_seq'::regclass)
 c2     | bigint |           |          |
 c3     | text   |           |          | 'aowjfa fawjfawofjawofjawoifawevvaerarpfjkaofvaweawe[OJARGOIAJOAFWF'::text
 c4     | text   |           |          | 'aowjfa fawjfawofjawofjawoifawevvaerarpfjkaofvaweawe[OJARGOIAJOAFWF'::text
.
.
.
 c98    | text   |           |          | 'aowjfa fawjfawofjawofjawoifawevvaerarpfjkaofvaweawe[OJARGOIAJOAFWF'::text
 c99    | text   |           |          | 'aowjfa fawjfawofjawofjawoifawevvaerarpfjkaofvaweawe[OJARGOIAJOAFWF'::text
 c100   | text   |           |          | 'aowjfa fawjfawofjawofjawoifawevvaerarpfjkaofvaweawe[OJARGOIAJOAFWF'::text
Indexes:
    "t1_pkey" PRIMARY KEY, btree (c1)
    "t1_c2_idx" btree (c2)

List of relations
 Schema | Name | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+------+-------+----------+-------------+---------------+-------+-------------
 public | t3   | table | postgres | permanent   | heap          | 67 GB |
 public | t4   | table | postgres | permanent   | columnar      | 89 MB |
*/

Examining the indexes, one sees that they are the same size.

List of relations
Schema |     Name      | Type  |  Owner   | Table | Persistence | Access method |  Size  |
-------+---------------+-------+----------+-------+-------------+---------------+--------+
public | t3_c2_idx     | index | postgres | t3    | permanent   | btree         | 105 MB |  
public | t3_c2_idx1    | index | postgres | t3    | permanent   | btree         | 105 MB |  
public | t3_length_idx | index | postgres | t3    | permanent   | btree         | 33 MB  |  
public | t3_pkey       | index | postgres | t3    | permanent   | btree         | 107 MB |  
public | t4_c2_idx     | index | postgres | t4    | permanent   | btree         | 105 MB |  
public | t4_length_idx | index | postgres | t4    | permanent   | btree         | 33 MB  |  
public | t4_pkey       | index | postgres | t4    | permanent   | btree         | 107 MB |

Unlike the first set of query plans, these clearly demonstrate a significant performance improvement.

Curious to see what would change in the way of performance, I also varied max_parallel_workers_per_gather, but it doesn’t appear to have changed much.

-- HEAP TABLE without index: 9.6s (4 workers), 8.7s (1 worker)
explain analyze select sum(c2) from t3;

-- COLUMNAR TABLE without index: 590.176 ms (4 workers), 596.459 ms (1 worker)
explain analyze select sum(c2) from t4;

-- HEAP TABLE: 10.4s (4 workers), 8.8s (1 worker)
explain analyze select count(c3) from t3;

-- COLUMNAR TABLE: 509.209 ms (4 workers), 541.452 ms (1 worker)
explain analyze select count(c3) from t4;

-- HEAP TABLE: 1m34s (4 workers), 1m17s (1 worker)
explain analyze select max(length(c25)) from t3;

-- COLUMNAR TABLE: 1.1s (4 workers), 1.2s (1 worker)
explain analyze select max(length(c25)) from t4;

-- HEAP TABLE: 1m33s (4 workers), 1m18s (1 worker)
explain analyze select sum(length(c50)) from t3;

-- COLUMNAR TABLE: 1.2s (4 workers), 1.2s (1 worker)
explain analyze select sum(length(c50)) from t4;

Working with indexes

General observations: Btree indexes are similar in size between HEAP and COLUMNAR tables. Overall their performance also appears similar although the columnar table’s index is somewhat slower, no doubt due to the extra processing required to uncompress the table’s values. 

Regarding Expression Indexes: Creating an EXPRESSION index on COLUMNAR table is significantly faster than that of HEAP:

-- 1m17s
create index on t3(length(c90));
-- 14s
create index on t4(length(c90));

Regarding Runtime Parameter max parallel workers: Index performance varies considerably on HEAP tables depending upon the value set to max parallel workers.

The following results highlight that, depending upon the type of table used, this setting can become important when hardware resources and server costs are a consideration:

-- HEAP TABLE using BTREE index: 467.789 ms (4 workers), 748.939 ms (1 worker)
explain analyze select sum(c2) from t3;

-- COLUMNAR TABLE using BTREE index: 561.522 ms (4 workers), 599.629 ms (1 worker)
explain analyze select sum(c2) from t4;

-- HEAP TABLE using EXPRESSION index: 1.614 ms (4 workers), 2.346 ms (1 worker)
explain analyze select max(length(c90)) from t3;

-- COLUMNAR TABLE using EXPRESSION index: 31.980 ms (4 workers), 38.766 ms (1 worker)
explain analyze select max(length(c90)) from t4;

About table constraints and access methods

Overall, indexes, constraints, and access methods are still evolving with many of the features still to be implemented.

Let’s start with a big issue i.e. DELETE and UPDATE:

-- fails
delete from t2 where id=5;  
ERROR:  UPDATE and CTID scans not supported for ColumnarScan

-- fails
update t2 set id=5;   
ERROR:  UPDATE and CTID scans not supported for ColumnarScan

Creating indexes on a columnar table is restricted to btree indexes:

-- works
create index on t2 using btree (id);

-- fails
create index on t2 using columnar (id);
ERROR:  unsupported access method for the index on columnar table t2

Creating foreign key constraints isn’t supported:

select generate_series as id into t3 from generate_series(0,15);
alter table t3 add primary key(id);

-- works for our standard table t1
alter table t1 add foreign key(id) references t3(id);

-- fails with the columnar table t2
alter table t2 add foreign key(id) references t3(generate_series);
ERROR:  Foreign keys and AFTER ROW triggers are not supported for columnar tables
HINT:  Consider an AFTER STATEMENT trigger instead.

-- works after converting table t2 from COLUMNAR to HEAP
alter table t2 set access method heap;
alter table t2 add foreign key(id) references t3(generate_series);
alter table t2 set access method columnar;

 

Partitioning

Columnar tables can be used as partitions; a partitioned table can be made up of any combination of row and columnar partitions.

An excellent use case is insert-once, read-only table partitions, where one can leverage both columnar compression and better-performing OLAP-type queries for very large tables.
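
As a sketch of how that might look (the table and partition names are hypothetical), a range-partitioned table can mix a heap partition for recent, still-changing data with columnar partitions for older, insert-once data:

create table events (
    id      bigint,
    ts      timestamptz not null,
    payload text
) partition by range (ts);

-- current month: heap, so rows can still be updated or deleted
create table events_2022_12 partition of events
    for values from ('2022-12-01') to ('2023-01-01');

-- older, read-only month: columnar, compressed and fast to scan
create table events_2022_11 partition of events
    for values from ('2022-11-01') to ('2022-12-01')
    using columnar;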

 

Caveat

Columnar Extension Limitations, as of version 11.1:

  • It takes more time to create the table than standard heap access-based tables
  • The query performance is equal or slower with smallish tables compared to heap-based tables.
  • There is no update/delete possible in a columnar table.
  • The indexes are limited to btree, as of version 10.0.
  • There is no logical replication.

There’s actually more documented. Refer here for more information.

AUTHOR’S NOTE: Regarding the custom packages created for this blog: the entire citus suite is designed to enhance Postgres as a distributed database solution. Only the columnar table component was covered in this blog. The complete feature set is part of this package install and should work, but has not been tested.

Conclusion

Despite its current limitations, there are use cases where this extension can definitely make a difference. And it speaks well of its future as the team continues development by constantly improving and adding capabilities. Watch for updates on its GitHub source repository.

 

Dec
12
2022
--

Talking Drupal #377 – Drupal State

Today we are talking about the Drupal State module with Brian Perry.

For show notes visit: www.talkingDrupal.com/377

Topics

  • NedCamp
  • Drupal State Module
  • Who is involved
  • Drupal State – a module without a release
  • NPM
  • Decoupled menus
  • What problem does this solve
  • How is this different from GraphQl
  • In production
  • Roadmap
  • Getting involved
  • Which front-end
  • Individual Drupal ecosystems like commerce
  • Evaluating JS Frameworks
  • Stability

Resources

Guests

Brian Perry – Brianperry.dev @bricomedy

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan
John Picozzi – www.epam.com @johnpicozzi
Leslie Glynn – redfinsolutions.com @leslieglynn

MOTW Correspondent

Martin Anderson-Clutz – @mandclu

Christmas Snow Adds a JavaScript snow effect to your website.

Dec
12
2022
--

Using Percona Toolkit to Alter Database Tables Online: A Controlled Approach

Table modifications are a common task for database administrators. In this blog, I’ll explain how to alter tables online in a controlled manner that does not disrupt application users or cause application downtime.

One of the tools in Percona Toolkit is pt-online-schema-change, a utility that alters the structure of a table without interfering with reads or writes. The tool creates an empty copy of the table, alters it as desired, and then copies the rows from the original table into the new one.

When the copying is finished, it removes the original table and replaces it with the new one. Any changes made to data in the original table during the copy process will be reflected in the new table, as the tool creates triggers on the original table to update the corresponding rows in the new table.

How to test the pt-online-schema-change command?

Before running the actual alter using the tool, perform a dry run to ensure the pt-online-schema-change command is functional. The --dry-run option creates and modifies the new table without adding triggers, copying data, or replacing the existing table.

The basic command for modifying a table is as follows. In a production environment it may need to be tweaked with options such as --critical-load Threads_running, --max-load Threads_running, --chunk-size, --max-lag, --max-flow-ctl (Percona XtraDB Cluster), and so on.
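
For example, a more production-oriented invocation might combine several of those throttling options (the values shown are placeholders, not recommendations):

$ pt-online-schema-change --execute --alter "ENGINE=InnoDB" \
    --max-load Threads_running=50 \
    --critical-load Threads_running=100 \
    --chunk-size 1000 \
    --max-lag 5 \
    h=172.31.92.72,D=mytestdb,t=authors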

I’m using “ENGINE=InnoDB” for the first test case, which rebuilds the table; this is useful for removing fragmented spaces from the table.

Dry-run test:

$ pt-online-schema-change --dry-run --alter "ENGINE=InnoDB" h=172.31.92.72,D=mytestdb,t=authors;
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Starting a dry run.  `mytestdb`.`authors` will not be altered.  Specify --execute instead of --dry-run to alter the table.
Creating new table...
Created new table mytestdb._authors_new OK.
Altering new table...
Altered `mytestdb`.`_authors_new` OK.
Not creating triggers because this is a dry run.
Not copying rows because this is a dry run.
Not swapping tables because this is a dry run.
Not dropping old table because this is a dry run.
Not dropping triggers because this is a dry run.
2022-12-09T05:44:23 Dropping new table...
2022-12-09T05:44:23 Dropped new table OK.
Dry run complete.  `mytestdb`.`authors` was not altered.
$

How to run the ALTER TABLE?

It is recommended that you read the documentation before performing the task.

To run the alter, replace the --dry-run option with --execute.

$ pt-online-schema-change --execute --alter "ENGINE=InnoDB" h=172.31.92.72,D=mytestdb,t=authors;
Found 1 slaves:
ip-172-31-90-216.ec2.internal -> ip-172-31-90-216.ec2.internal:socket
Will check slave lag on:
ip-172-31-90-216.ec2.internal -> ip-172-31-90-216.ec2.internal:socket
..
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `mytestdb`.`authors`...
Creating new table...
Created new table mytestdb._authors_new OK.
Altering new table...
Altered `mytestdb`.`_authors_new` OK.
2022-12-09T05:57:10 Creating triggers...
2022-12-09T05:57:10 Created triggers OK.
2022-12-09T05:57:10 Copying approximately 10023 rows...
2022-12-09T05:57:10 Copied rows OK.
2022-12-09T05:57:10 Analyzing new table...
2022-12-09T05:57:10 Swapping tables...
2022-12-09T05:57:10 Swapped original and new tables OK.
2022-12-09T05:57:10 Dropping old table...
2022-12-09T05:57:10 Dropped old table `mytestdb`.`_authors_old` OK.
2022-12-09T05:57:10 Dropping triggers...
2022-12-09T05:57:10 Dropped triggers OK.
Successfully altered `mytestdb`.`authors`.
$

Can we pause the pt-online-schema-change execution? Yes!

The --pause-file=/tmp/pt-osc.pause option lets you pause the execution. While the file specified by this parameter is present, execution is paused; it resumes when the file is removed.

Note: I shortened the pt-osc log to make the result more readable.

$ pt-online-schema-change --pause-file=/tmp/pt-osc.pause --execute --alter "ENGINE=InnoDB" h=172.31.92.72,D=mytestdb,t=authors;
….
2022-12-10T15:42:01 Copying approximately 10023 rows...
Sleeping 60 seconds because /tmp/pt-osc.pause exists
Sleeping 60 seconds because /tmp/pt-osc.pause exists
Copying `mytestdb`.`authors`:  73% 00:44 remain
2022-12-10T15:44:04 Copied rows OK.
...
Successfully altered `mytestdb`.`authors`.

Can we review the data and tables before swapping them? Yes!

The --no-swap-tables --no-drop-old-table --no-drop-new-table --no-drop-triggers options allow us to do the alter in a controlled manner.

That is, we let the tool handle the majority of the tasks, such as creating the new table, altering it, and copying the records, while the remaining steps, swapping the tables and dropping the triggers, are done manually.

Caution: The --no-swap-tables option does not work if the table has foreign keys with associated child tables.

$ pt-online-schema-change --no-swap-tables --no-drop-triggers --no-drop-old-table --no-drop-new-table --execute --alter "CHARACTER SET = utf8mb4, COLLATE = utf8mb4_general_ci, MODIFY email varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,MODIFY name  varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL" h=172.31.92.72,D=mytestdb,t=authors2;
Found 1 slaves:
ip-172-31-90-216.ec2.internal -> ip-172-31-90-216.ec2.internal:socket
Will check slave lag on:
ip-172-31-90-216.ec2.internal -> ip-172-31-90-216.ec2.internal:socket
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `mytestdb`.`authors2`...
Creating new table...
Created new table mytestdb._authors2_new OK.
Altering new table...
Altered `mytestdb`.`_authors2_new` OK.
2022-12-09T09:16:28 Creating triggers...
2022-12-09T09:16:28 Created triggers OK.
2022-12-09T09:16:28 Copying approximately 10067 rows...
2022-12-09T09:16:29 Copied rows OK.
Not dropping old table because --no-drop-triggers was specified.
Not dropping triggers because --no-drop-triggers was specified.  To drop the triggers, execute:
DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_del`
DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_upd`
DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_ins`
Not dropping the new table `mytestdb`.`_authors2_new` because --no-drop-new-table was specified.  To drop the new table, execute:
DROP TABLE IF EXISTS `mytestdb`.`_authors2_new`;
Successfully altered `mytestdb`.`authors2`.
$

pt-online-schema-change has done the job, and we now have two tables and three triggers. So we can safely review the table structure and data in the _authors2_new table, and once we’re sure everything is in order, we can swap the tables and drop the triggers.

mysql> show tables like '%authors2%';
+---------------------------------+
| Tables_in_mytestdb (%authors2%) |
+---------------------------------+
| _authors2_new                   |
| authors2                        |
+---------------------------------+
2 rows in set (0.00 sec)

Find the TRIGGERS:

mysql> SELECT TRIGGER_NAME,EVENT_MANIPULATION FROM INFORMATION_SCHEMA.TRIGGERS WHERE TRIGGER_SCHEMA='mytestdb' and EVENT_OBJECT_TABLE like '%authors%' \G
*************************** 1. row ***************************
      TRIGGER_NAME: pt_osc_mytestdb_authors2_del
EVENT_MANIPULATION: DELETE
*************************** 2. row ***************************
      TRIGGER_NAME: pt_osc_mytestdb_authors2_upd
EVENT_MANIPULATION: UPDATE
*************************** 3. row ***************************
      TRIGGER_NAME: pt_osc_mytestdb_authors2_ins
EVENT_MANIPULATION: INSERT
3 rows in set (0.00 sec)

Run the following SQL to swap the tables:

RENAME TABLE
mytestdb.authors2 TO mytestdb._authors2_old,
mytestdb._authors2_new TO mytestdb.authors2;

Lastly, remove the triggers and the old table:

DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_del`;
DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_upd`;
DROP TRIGGER IF EXISTS `mytestdb`.`pt_osc_mytestdb_authors2_ins`;
DROP TABLE IF EXISTS mytestdb._authors2_old;

Wrap up

pt-online-schema-change is part of the Percona Toolkit for altering tables online, and we can customize it with the various options available based on our needs. MySQL’s online DDL with a direct ALTER is an option, particularly for dropping indexes and changing metadata, among other things. Where online DDL is not a choice, we can use pt-online-schema-change.

Caution: It is not recommended to run the tool directly on a replica instance, as pt-osc operations will not produce a consistent table on the replicas. The statements coming via the binlog will not be processed by the triggers, so any new data arriving via replication will be missing from the new table.
