Oct 31, 2022

Talking Drupal #371 – WebOps Enabled Accessibility

Today we are talking about WebOps Enabled Accessibility with Tearyne Almendariz and Blake Bertuccelli.

For show notes visit: www.talkingDrupal.com/371

Topics

  • What is WebOps Enabled Accessibility
  • Who should be concerned about it
  • Describe a few use cases
  • What types of tests are good to perform
  • If Accessibility is “built-in” why do I need to test for it
  • How much accessibility is content related
  • Where do you run the tests
  • Are they pass / fail or percentage based
  • What tools are most common
  • How does Pantheon support WebOps Accessibility
  • Equalify.app

Resources

Guests

Blake Bertuccelli-Booth – Equalify.app @bbertucc

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan
John Picozzi – www.epam.com @johnpicozzi
Tearyne Almendariz – @tearyneg

MOTW Correspondent

Martin Anderson-Clutz – @mandclu

Entity Comparison

  • Brief description
    • Generate a configurable comparison table for two or more Drupal entities: products, events, people, etc.
    • Not dependent on Commerce, but should be compatible
  • Brief history
    • Created in 2017
  • Current version
    • Current 4.0 release created in Aug 2022, ready for Drupal 10, and has security coverage
  • Actively maintained
  • Number of open issues
    • 3 open, 2 have patches
  • Usage stats
    • 292 sites
  • Module features and usage
    • Generates “Comparison” configuration entities, which specify which entity type and bundle will be used
    • Automatically creates a view mode for the selected bundle
    • Fields set to display will be included in the comparison table
    • Two custom blocks: one to show the link to add to or remove an entity from the comparison (via AJAX), and one to show a link to the comparison
    • Link for toggle of an entity’s inclusion is also available as a field
Oct 31, 2022

Coroot – The Next Level in Kubernetes Observability


To follow up on my previous Kubernetes articles, I would like to introduce a project that brings Kubernetes observability to the next level and provides major help in visualizing and diagnosing Kubernetes deployments.

The project is Coroot, and it comes in a Community version, Coroot Community Edition (source code at https://github.com/coroot/coroot), which you can install on your Kubernetes cluster. The installation using Helm charts is trivial:

helm repo add coroot https://coroot.github.io/helm-charts
helm repo update
helm install --namespace coroot --create-namespace coroot coroot/coroot
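
Before exposing the UI, it is worth confirming that the pods came up; this quick check is standard kubectl usage rather than something from the original post:

kubectl get pods -n coroot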

Enabling port forwarding to access Coroot through the web:

kubectl port-forward -n coroot service/coroot 8080:8080 --address='0.0.0.0'

The first thing you see is a graph representing the Kubernetes deployment. Why is it helpful? Let’s review what we can see with plain Kubernetes tooling:

List of pods:

kubectl get pods   
NAME                                             READY   STATUS    RESTARTS       AGE
percona-server-mysql-operator-7bb68f7b6d-8mg4j   1/1     Running   0              2d20h
cluster1-orc-0                                   2/2     Running   0              2d5h
cluster1-orc-1                                   2/2     Running   0              2d5h
cluster1-mysql-1                                 3/3     Running   1 (2d5h ago)   2d5h
cluster1-orc-2                                   2/2     Running   0              2d5h
cluster1-haproxy-0                               2/2     Running   0              2d5h
cluster1-haproxy-1                               2/2     Running   0              2d5h
cluster1-haproxy-2                               2/2     Running   0              2d5h
cluster1-mysql-2                                 3/3     Running   1 (2d5h ago)   2d5h
cluster1-mysql-0                                 3/3     Running   1 (23h ago)    23h
my-op-psmdb-operator-855dbfff5d-s8js5            1/1     Running   0              69m
cluster2-orc-0                                   2/2     Running   0              58m
cluster2-orc-1                                   2/2     Running   0              58m
cluster2-mysql-0                                 3/3     Running   0              58m
cluster2-haproxy-0                               2/2     Running   0              57m
cluster2-haproxy-1                               2/2     Running   0              57m
cluster2-haproxy-2                               2/2     Running   0              57m
cluster2-mysql-1                                 3/3     Running   1 (57m ago)    58m
cluster2-orc-2                                   2/2     Running   0              57m
cluster2-mysql-2                                 3/3     Running   1 (56m ago)    57m

And the list of services:

kubectl get svc  
NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                 AGE
kubernetes               ClusterIP   10.43.0.1       <none>        443/TCP                                 2d20h
cluster1-mysql-primary   ClusterIP   10.43.252.22    <none>        3306/TCP,33062/TCP,33060/TCP,6033/TCP   2d5h
cluster1-mysql-unready   ClusterIP   None            <none>        3306/TCP,33062/TCP,33060/TCP,6033/TCP   2d5h
cluster1-mysql           ClusterIP   None            <none>        3306/TCP,33062/TCP,33060/TCP,6033/TCP   2d5h
cluster1-orc             ClusterIP   None            <none>        3000/TCP,10008/TCP                      2d5h
cluster1-orc-0           ClusterIP   10.43.236.23    <none>        3000/TCP,10008/TCP                      2d5h
cluster1-orc-1           ClusterIP   10.43.12.1      <none>        3000/TCP,10008/TCP                      2d5h
cluster1-orc-2           ClusterIP   10.43.12.67     <none>        3000/TCP,10008/TCP                      2d5h
cluster1-haproxy         ClusterIP   10.43.196.248   <none>        3306/TCP,3307/TCP,3309/TCP              2d5h
cluster2-mysql-primary   ClusterIP   10.43.219.150   <none>        3306/TCP,33062/TCP,33060/TCP,6033/TCP   59m
cluster2-mysql-unready   ClusterIP   None            <none>        3306/TCP,33062/TCP,33060/TCP,6033/TCP   59m
cluster2-mysql           ClusterIP   None            <none>        3306/TCP,33062/TCP,33060/TCP,6033/TCP   59m
cluster2-orc             ClusterIP   None            <none>        3000/TCP,10008/TCP                      59m
cluster2-orc-0           ClusterIP   10.43.226.204   <none>        3000/TCP,10008/TCP                      59m
cluster2-orc-1           ClusterIP   10.43.122.217   <none>        3000/TCP,10008/TCP                      59m
cluster2-orc-2           ClusterIP   10.43.221.34    <none>        3000/TCP,10008/TCP                      59m
cluster2-haproxy         ClusterIP   10.43.41.95     <none>        3306/TCP,3307/TCP,3309/TCP              59m

I do not know how it is for you, but I have a hard time processing these lists, while Coroot offers visual help. Coroot is a web application, so there will be a lot of screenshots.

There is an at-a-glance overview of my deployments, including physical nodes:

MySQL Kubernetes deployments

And from there we can drill down into individual components, for example, cluster1-mysql:

cluster1-mysql

And there you can highlight important relations, like for HAProxy:

HAProxy coroot

We can see that it currently handles pods mysql-1 and mysql-2. The same view works for the MySQL setup itself:

handles pod mysql-1 and mysql-2

We can see mysql-2 is the primary and mysql-0 and mysql-1 are replicas. There is also a network layout:

coroot kubernetes network layout

This shows network connections between different physical nodes and how pods map to nodes. If we run a scenario with a MySQL primary failure (similar to what we did in Using Percona Kubernetes Operators With K3s Part 2: Percona Server for MySQL Operator), it can also be visualized.

Node failure:

coroot node failure

Primary failover from mysql-0 to mysql-2:

primary failover coroot

In general, Coroot offers visibility into a variety of problems you may encounter in Kubernetes, and there are more details on their Failurepedia page.

Summary

In my view, Coroot brings the visibility of Kubernetes deployments to the next level and is a must-have tool for the diagnostics and troubleshooting of Kubernetes workloads. If you run your application in Kubernetes, I recommend you give it a try!

Oct 31, 2022

Percona Distribution for PostgreSQL 15.0, Percona Backup for MongoDB 2.0.2: Release Roundup October 31, 2022

Percona Releases Oct 31 2022

It’s time for the release roundup!

Percona is a leading provider of unbiased open source database solutions that allow organizations to easily, securely, and affordably maintain business agility, minimize risks, and stay competitive.

Our Release Roundups showcase the latest Percona software updates, tools, and features to help you manage and deploy our software. They offer highlights and critical information, as well as links to the full release notes and direct links to download the software or service itself.

Today’s post includes those releases and updates that have come out since October 17, 2022. Take a look!

Percona Distribution for PostgreSQL 15.0

Percona Distribution for PostgreSQL 15.0, a collection of tools from the PostgreSQL community that are tested to work together and serve to assist you in deploying and managing PostgreSQL, was released on October 24, 2022. Release highlights include the new MERGE command, the ability to create a view with the permissions of the caller instead of the view creator, and a new jsonlog logging format that outputs log data in a defined JSON structure, allowing PostgreSQL logs to be processed by structured logging systems.
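
MERGE is easiest to see with a tiny example. The following is a minimal sketch against hypothetical inventory and stock_updates tables (not from the release notes), run through psql:

$ psql -d demo <<'SQL'
-- PostgreSQL 15 MERGE: upsert incoming stock counts into inventory
MERGE INTO inventory i
USING stock_updates s ON i.item_id = s.item_id
WHEN MATCHED THEN
  UPDATE SET qty = i.qty + s.qty
WHEN NOT MATCHED THEN
  INSERT (item_id, qty) VALUES (s.item_id, s.qty);
SQL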

Download Percona Distribution for PostgreSQL 15.0

 

Percona Backup for MongoDB 2.0.2

On October 27, 2022, we released Percona Backup for MongoDB 2.0.2, a distributed, low-impact solution for consistent backups of MongoDB sharded clusters and replica sets. It is a tool for creating consistent backups across a MongoDB sharded cluster (or a non-sharded replica set) and for restoring those backups to a specific point in time. This release fixes usability issues for applications that operate with Percona Backup for MongoDB by providing error messages in the JSON status output.

Download Percona Backup for MongoDB 2.0.2

 

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MongoDB, PostgreSQL, MariaDB, and other open source databases in on-premises and cloud environments.

Oct 28, 2022

Restore a Specific MongoDB Collection(s) From a Physical Backup


We all know how critical it is to get our data back as soon as possible. To achieve this, as you might be aware, we have two methods of restore available for MongoDB: logical and physical.

Of course, this depends on the type of backup we have configured. For large data sets, it is advisable to use physical backups which offer faster backup and restore times. I’ve used Percona Backup for MongoDB (PBM) to take a physical backup in this demo.

But here’s the catch: it’s very complicated to restore specific collection(s) from a physical backup, whatever the backup method: volume snapshot, cold rsync data file copies, Hot Backup, or Percona Backup for MongoDB.

The simplest approach, and the one almost everyone knows, is to restore the physical dump (the data files) onto a temporary mongod, take a logical dump of the collection(s) manually, and then do a logical restore to the original cluster. This is usually done with a combination of mongodump and mongorestore.

But there’s another way, where we can avoid both provisioning a temporary mongod and the generally slow logical dump (mongodump).

To achieve this, in this blog post we’ll be taking a look at the “wt” utility which will help us to perform a restore. 

As a prerequisite, we first need to build “wt”, either from the GitHub repo or from a tar archive of a specific version. Let’s jump straight into the steps.

In this blog, we’ll be building using the GitHub repo.

1. Clone the WiredTiger repo

$ git clone https://github.com/wiredtiger/wiredtiger.git 
$ cd wiredtiger

2. Check your mongod version and checkout the same branch version.

$ git branch --all | grep mongodb
$ git checkout remotes/origin/mongodb-5.0   # I was running version 5.0
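
If you are not sure which version is running, standard checks (not part of the original steps) are:

$ mongod --version
$ mongo --quiet --eval 'db.version()'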

3. Depending on the compression method in use, we install its library and configure the corresponding WiredTiger library extension. Snappy is the most common these days, so I’ve used it here. But whatever compression method you have, you can provide the corresponding library path after installing/configuring it.

$ ./configure --enable-zstd --enable-zlib --enable-lz4 --enable-snappy   # if this fails with 'configure: error', install the missing library first
$ yum install libzstd-devel zlib-devel lz4-devel snappy-devel
$ make -j $(nproc)

Once we have the necessary dependencies installed and verified, we can execute the commands below to dive straight into the action of restoring the dropped collection.

Restoration time

1. Find out the relevant URI of the collection you want to restore. This can be checked on the original cluster:

demo:PRIMARY> db.demo.stats().wiredTiger.uri
statistics:table:collection-36-8063732008498576985
demo:PRIMARY> db.adminCommand({'getCmdLineOpts':1}).parsed.storage.dbPath
/var/lib/mongo

2. It’s necessary to check the compression method of the collection. Accordingly, we need to use the respective WiredTiger library extension. Look for ‘block_compressor’:

demo:PRIMARY> db.demo.stats().wiredTiger.creationString
.........block_compressor=snappy.........  //truncated output

3. There are different ways to take a physical dump: Percona Backup for MongoDB, Hot Backup, a snapshot, or even an rsync to a separate volume/path. Let’s now take a physical backup using Percona Backup for MongoDB and then drop our collection.

========
FS  /backup
  Snapshots:
    2022-10-26T03:05:19Z 10.87MB <physical> [restore_to_time: 2022-10-26T03:05:21Z]


$ pbm describe-backup 2022-10-26T03:05:19Z
name: "2022-10-26T03:05:19Z"
opid: 6358a3ef2722ffe26150f98b
type: physical
last_write_time: "2022-10-26T03:05:21Z"
last_transition_time: "2022-10-26T03:05:26Z"
mongodb_version: 4.4.16-16
pbm_version: 2.0.1
status: done
size_h: 10.9 MiB
replsets:
- name: demo
  status: done
  last_write_time: "2022-10-26T03:05:21Z"
  last_transition_time: "2022-10-26T03:05:25Z"
  security: {}


$ ls -lrth /backup
total 12K
drwxr-xr-x. 3 mongod mongod  17 Oct 26 03:05 2022-10-26T03:05:19Z
-rw-r--r--. 1 mongod mongod 11K Oct 26 03:05 2022-10-26T03:05:19Z.pbm.json


$ ls -lrth /backup/2022-10-26T03:05:19Z/demo
total 28K
drwxr-xr-x. 4 mongod mongod   37 Oct 26 03:05 admin
drwxr-xr-x. 4 mongod mongod   37 Oct 26 03:05 local
-rw-r--r--. 1 mongod mongod 6.7K Oct 26 03:05 WiredTiger.backup.s2
drwxr-xr-x. 4 mongod mongod   37 Oct 26 03:05 config
-rw-r--r--. 1 mongod mongod 4.7K Oct 26 03:05 _mdb_catalog.wt.s2
-rw-r--r--. 1 mongod mongod 2.0K Oct 26 03:05 WiredTigerHS.wt.s2
drwxr-xr-x. 4 mongod mongod   37 Oct 26 03:05 percona
-rw-r--r--. 1 mongod mongod 1.9K Oct 26 03:05 sizeStorer.wt.s2
-rw-r--r--. 1 mongod mongod   68 Oct 26 03:05 WiredTiger.s2z
drwxr-xr-x. 2 mongod mongod   76 Oct 26 03:05 journal

As you can see, with PBM we can list all the underlying data files, which we’ll use for this demo.

demo:PRIMARY> use percona
switched to db percona
demo:PRIMARY> show collections
demo
demo:PRIMARY> db.demo.countDocuments({})
1000000
demo:PRIMARY>
demo:PRIMARY> db.demo.drop()  //accidentally dropped collection
true
demo:PRIMARY> show collections
demo:PRIMARY> db
percona

4. Take a “wt dump” of the URI file from Step 1 and convert it into MongoDB’s BSON format. Otherwise, the regular output of “wt dump” would be a raw binary hex string.

$ ./wt -v -h  /backup -C "extensions=[/root/wiredtiger/ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" dump -x file:collection-36-8063732008498576985.wt | tail -n +7 | awk 'NR%2 == 0 { print }' | xxd -r -p > /backup/percona.demo.bson

If you have got a different compression method for the collection or on a global level, you can use its respective WiredTiger library extension.

/clonedPath/wiredtiger/ext/compressors/<compressionMethod>/.libs/libwiredtiger_<compMethod>.so

Note: Don’t forget to append “.wt” to the end of the URI and add the prefix “table:” or “file:”, as within WiredTiger all collection files are addressed in the table or file format.

Let’s look at and understand the different flags used in the above command.

  • extensions: based on our compression method, we used the corresponding WiredTiger compression library extension
  • [-v, -x, dump, -h, -C]: WiredTiger binary flags to dump a URI file as a raw hex string, in verbose style, with the default or command-line-provided configuration via “-C”
  • tail is used only to trim the top seven header lines from the output of “wt dump”
  • awk filters out just the keys (line number NR%2 == 1) or values (line number NR%2 == 0) in the pretty-print or hex mode of “wt dump”
  • xxd, in conjunction with ‘-r’ and ‘-p’, converts the raw hex strings into MongoDB’s BSON format
  • Finally, we redirect the output to the file “percona.demo.bson”. It is important to name the output file after the WiredTiger catalog ident, which is simply the full namespace of the collection. In our case, that is “percona.demo”.
  • If needed, the ident can be validated using the below command:
$ ./wt -v -h  /backup -C "extensions=[/root/wiredtiger/ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" dump -x table:_mdb_catalog | tail -n +7 | awk 'NR%2 == 0 { print }' | xxd -r -p | bsondump --quiet | jq -r 'select(. | has("md")) | [.ident, .ns] | @tsv' | sort | grep percona | awk '{print $2}'

percona.demo

5. Finally, restore the BSON using the native mongorestore.

$ mongorestore --authenticationDatabase "admin" --port 37017 -d percona -c demo /backup/percona.demo.bson

2022-10-17T03:03:43.232+0000	checking for collection data in /backup/percona.demo.bson
2022-10-17T03:03:43.240+0000	restoring percona.demo from /backup/percona.demo.bson
2022-10-17T03:03:46.231+0000	[####################....]  percona.demo  106MB/128MB  (83.4%)
2022-10-17T03:03:46.848+0000	[########################]  percona.demo  128MB/128MB  (100.0%)
2022-10-17T03:03:46.848+0000	finished restoring percona.demo (1000000 documents, 0 failures)
2022-10-17T03:03:46.848+0000	1000000 document(s) restored successfully. 0 document(s) failed to restore.

$ mongo --quiet --port 37017 percona
demo:PRIMARY> show collections
demo
demo:PRIMARY> db.demo.countDocuments({})
1000000
demo:PRIMARY> db.demo.findOne()
{
	"_id" : ObjectId("634cba61d42256b5fb5c9033"),
	"name" : "Blanche Potter",
	"age" : 38,
	"emails" : [
		"filvo@av.hm",
		"pamjiiw@ewdofmik.sh",
		"opaibizaj@ha.ch"
	]
}
demo:PRIMARY>

SUMMARY

A few things to consider

1. You all must be thinking: what about indexes, right? The same process can be done by dumping the index*.wt files, but it’s more complex, as the dumped keys and values have slightly different formats. I’ll cover that in a separate blog soon. It’s also worth mentioning that WiredTiger maintains a separate index* URI file for every index, so it’s far easier to rebuild the indexes manually with the “createIndex” command.

demo:PRIMARY> db.demo.getIndexKeys()
[ { "_id" : 1 }, { "a" : 1 }, { "b" : 1 } ]

demo:PRIMARY> db.demo.stats({'indexDetails': true}).indexDetails['_id_'].uri
statistics:table:percona/index/35-2625234990440311433

demo:PRIMARY> db.demo.stats({'indexDetails': true}).indexDetails['a_1'].uri
statistics:table:percona/index/69-2625234990440311433

demo:PRIMARY> db.demo.stats({'indexDetails': true}).indexDetails['b_1'].uri
statistics:table:percona/index/71-2625234990440311433

2. In order to perform point-in-time recovery, incremental oplogs still need to be replayed on top of the restored backup, which we have covered previously; see the sketch after this list.

3. This method is applicable for sharded clusters as well (both unsharded and sharded collections) but there are a few additional steps that need to be taken. We’ll cover similarly detailed demos in upcoming blogs.
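
As a rough sketch of point 2, PBM can replay oplog slices on top of a restored backup; the timestamps below are hypothetical, and the exact flags should be verified against your PBM version:

$ pbm oplog-replay --start="2022-10-26T03:05:21" --end="2022-10-26T09:00:00"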

Before I wrap up, let’s talk about some drawbacks, since this is a somewhat risky and complicated approach. Test it out in your lab or test environment to get familiar with WiredTiger internals before jumping straight into production.

Cons

  1. Overall, it’s a complicated approach, as one has to have a clear understanding of the “wt” utility and its internals.
  2. It doesn’t restore indexes; they need to be built separately, as mentioned.

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

Oct 27, 2022

pt-archiver Misbehaving With Secondary Index


Not long ago, we found a long-running query in one of our clients on Managed Services. While reviewing this long-running query, we found it was generated by a pt-archiver operation. That doesn’t sound good, right?

I was able to reproduce the issue on my test environment. In this article, I will show you how to use pt-archiver to purge data from a table using a secondary index. First, we need to have the full picture before going any deeper. Here is a representation of the table structure:

Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=4000001 DEFAULT CHARSET=latin1

As you can see, this test table was generated with sysbench. The criterion defined for the purge process was to remove all rows where column “k” is lower than 1,500,000. In my test environment, this represents around 40K rows.

mysql> select count(1) from `dbtest`.`sbtest1` where (k < 1500000);
+----------+
| count(1) |
+----------+
|    41687 |
+----------+

Examining the pt-archiver operation, we found that the options were logical given the archiving criteria and the table structure. These are the pt-archiver options used:

pt-archiver \
--source=h=localhost,P=3306,D=dbtest,t=sbtest1,b=0,i=k_1 \
--where='k < 1500000' --limit=1000 \
--primary-key-only --bulk-delete --purge

As you can see, it suggests the usage of index “k_1”, which references the column used in the purging criteria, and it also uses the option “--primary-key-only”, which is suggested if you just want to purge data, as in our case. The option “--primary-key-only” makes pt-archiver include only the primary key columns in the SELECT clause to avoid fetching all columns from the table; this simple feature speeds up the purge process, especially on wide tables.

However, reviewing the statement executed in the database, I found that the running query was not using the suggested index:

SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `dbtest`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE (k < 1500000) AND (`id` < '4000000') ORDER BY `id` LIMIT 1000

Now, if we get the actual explain plan for this query, we will see it is not optimal at all, as it is fetching almost two million rows and forcing the usage of the “PRIMARY” index. No wonder this was a long-running query.

mysql> explain format=json SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `dbtest`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE (k < 1500000) AND (`id` < '4000000') ORDER BY `id` LIMIT 1000\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "783367.33"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "sbtest1",
        "access_type": "range",
        "possible_keys": [
          "PRIMARY"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "id"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 1928044,
        "rows_produced_per_join": 642617,
        "filtered": "33.33",
        "cost_info": {
          "read_cost": "654843.92",
          "eval_cost": "128523.41",
          "prefix_cost": "783367.33",
          "data_read_per_join": "117M"
        },
        "used_columns": [
          "id",
          "k"
        ],
        "attached_condition": "((`dbtest`.`sbtest1`.`k` < 1500000) and (`dbtest`.`sbtest1`.`id` < '4000000'))"
      }
    }
  }
}

But why is pt-archiver doing this? Why isn’t it using the suggested index? After playing around with pt-archiver options, I realized that the “--primary-key-only” option inhibits the usage of the suggested secondary index. This was reported initially in ticket PT-157, and it was marked as fixed. However, we can confirm that the most recent version, pt-archiver 3.4.0, still exhibits this behavior, so I opened ticket PT-2112 to follow up on the fix.

Anyway, we needed to fix the issue for our client, and to do so, we just removed the “--primary-key-only” option, which allows pt-archiver to use the suggested index. The downside is that it will fetch all columns from the table. We can see the SQL statements pt-archiver will generate by specifying the “--dry-run” option; this is a cool feature to check whether the generated statements are optimal.

michael@testbox1:~$ pt-archiver \
> --source=h=localhost,P=3306,D=dbtest,t=sbtest1,b=0,i=k_1 \
> --where='k < 1500000' --limit=1000 \
> --bulk-delete --purge --dry-run
SELECT /*!40001 SQL_NO_CACHE */ `id`,`k`,`c`,`pad` FROM `dbtest`.`sbtest1` FORCE INDEX(`k_1`) WHERE (k < 1500000) ORDER BY `k` LIMIT 1000
SELECT /*!40001 SQL_NO_CACHE */ `id`,`k`,`c`,`pad` FROM `dbtest`.`sbtest1` FORCE INDEX(`k_1`) WHERE (k < 1500000) AND ((`k` >= ?)) ORDER BY `k` LIMIT 1000
DELETE FROM `dbtest`.`sbtest1` WHERE (((`k` >= ?))) AND (((`k` <= ?))) AND (k < 1500000) LIMIT 1000

If we examine the explain plan for this generated statement, we can see that the query cost of the new statement is up to 7X better than the previous one, and we are fetching only 79K rows compared to the 2M from the previous query. Definitely a better query.

mysql> explain format=json SELECT /*!40001 SQL_NO_CACHE */ `id`,`k`,`c`,`pad` FROM `dbtest`.`sbtest1` FORCE INDEX(`k_1`) WHERE (k < 1500000) ORDER BY `k` LIMIT 1000\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "111060.21"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "sbtest1",
        "access_type": "range",
        "possible_keys": [
          "k_1"
        ],
        "key": "k_1",
        "used_key_parts": [
          "k"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 79328,
        "rows_produced_per_join": 79328,
        "filtered": "100.00",
        "index_condition": "(`dbtest`.`sbtest1`.`k` < 1500000)",
        "cost_info": {
          "read_cost": "95194.61",
          "eval_cost": "15865.60",
          "prefix_cost": "111060.21",
          "data_read_per_join": "14M"
        },
        "used_columns": [
          "id",
          "k",
          "c",
          "pad"
        ]
      }
    }
  }
}

Conclusion

Hopefully, we will see this fixed in a future release, but for now, we can use pt-archiver as shown above to take advantage of secondary indexes. Also, as a recommendation, always use the “--dry-run” option to test your pt-archiver command and make sure the generated query is optimal for your database.

Oct 25, 2022

Kubernetes: Getting Started With a Minimal MySQL Installation


This week is KubeCon in Detroit, and in preparation for attending I have been polishing up my Kubernetes skills. The big rush to put software in containers and have Kubernetes run everything is getting a lot of push in the industry. Many software applications run perfectly well in ready-made packages from a container made for ephemeral consumption, much like a can of Campbell’s tomato soup. But generally, relational databases like permanence, stability, and a consistent presence. Databases like to warm their caches and build statistics, so they are not at their best right after being started. This article is the first in a series for ‘us’ database folks to learn how to keep our databases happy in a containerized world.

With the trip to the Motor City on my calendar, it was time to answer the question: “What is the most minimal Kubernetes installation that I can use to run MySQL?” I am sharing my steps, hoping that others wondering the same thing will find this blog post helpful, if not more palatable than a recent can of Campbell’s tomato soup. I pulled out my trusty test mule laptop and installed Ubuntu 22.04.1 LTS with updates. My plan was to run everything needed for a single MySQL instance on the laptop itself.

Googling Kubernetes offerings was exhausting, as there are so many options. Orchestrating Docker containers is what Kubernetes does, and there are many choices, most of which are extremely complicated to set up. What I wanted was a simple installation with the smallest number of “moving parts” to get working. I ended up using Minikube, as it does very well running a single-node Kubernetes cluster in a virtual machine on a laptop.

Installing Minikube

Minikube is a “local Kubernetes engine” so we need minimal preparation time and hardware. There are some prerequisite software packages needed when installing Minikube.

$ sudo apt install -y curl wget apt-transport-https

We will also need Docker for our container.

$ sudo apt-get install ca-certificates curl gnupg lsb-release
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
$ echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
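
The original steps jump from adding the repository to starting Minikube; assuming Docker Engine itself still needs to be installed from that repository, the standard packages (names per Docker’s Ubuntu instructions) would be installed like this:

$ sudo apt-get update
$ sudo apt-get install -y docker-ce docker-ce-cli containerd.io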

Then we can get the latest Minikube and kubectl binaries.

$ wget https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
$ curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl

Those binaries need to be given a good home and made executable.

$ sudo cp minikube-linux-amd64 /usr/local/bin/minikube
$ sudo chmod +x /usr/local/bin/minikube
$ sudo cp kubectl /usr/local/bin
$ sudo chmod +x /usr/local/bin/kubectl

Now that the required software is in place, we start our cluster.

$ minikube start --driver=docker
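
Before deploying anything, it is worth confirming the cluster is healthy; these status checks are standard Minikube/kubectl usage rather than part of the original write-up:

$ minikube status
$ kubectl get nodes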

So it looks like all the various components are functional. Now the containerized applications can be deployed.

Okay, now how do we get MySQL to run?

Putting MySQL into a container will seem a bit odd for those used to the popular open source database. The first step is to create a file with the MySQL server’s root password. Kubernetes Secrets store values base64-encoded, so we encode a simple passphrase with the base64 program.

$ echo -n 'hidave' | base64
aGlkYXZl

And we have to put that password into a file named mysql-secret.yaml

$ cat mysql-secret.yaml 
apiVersion: v1
kind: Secret
metadata:
  name: mysql-pass
type: Opaque
data:
  password: aGlkYXZl

Double-check that the base64 value in your file matches what you generated. That yaml file is called a ‘manifest’, and it needs to be applied.
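
One quick way to verify is to decode the value back (standard base64 usage, not from the original article):

$ echo 'aGlkYXZl' | base64 --decode
hidave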

$ kubectl create -f mysql-secret.yaml

secret/mysql-pass created
$ kubectl get secrets
NAME         TYPE     DATA   AGE
mysql-pass   Opaque   1      97m

This creates the mysql-pass Secret object. Now we need a manifest for the MySQL pod that describes what is needed to run MySQL. Read through it to double-check paths, ports, and mount points.

apiVersion: v1
kind: Pod
metadata:
  name: k8s-mysql
  labels:
    name: lbl-k8s-mysql
spec:
  containers:
  - name: mysql
    image: mysql:latest
    env:
    - name: MYSQL_ROOT_PASSWORD
      valueFrom:
        secretKeyRef:
          name: mysql-pass
          key: password
    ports:
    - name: mysql
      containerPort: 3306
      protocol: TCP
    volumeMounts:
    - name: k8s-mysql-storage
      mountPath: /var/lib/mysql
  volumes:
  - name: k8s-mysql-storage
    emptyDir: {}

This manifest also needs to be applied.

$ kubectl create -f mysql-pod.yaml

pod/k8s-mysql created
$ kubectl get pod
NAME                                               READY   STATUS    RESTARTS       AGE
k8s-mysql                                          1/1     Running   0              99m

At this point, we can open a bash shell in the MySQL container.

$ kubectl exec k8s-mysql -it -- bash
bash-4.4# mysql --user=root --password=$MYSQL_ROOT_PASSWORD
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 8.0.31 MySQL Community Server - GPL

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

We will also need a Service object to allow other objects to reach MySQL. Create a mysql-service.yaml with the following contents to make the k8s-mysql pod reachable:

apiVersion: v1
kind: Service
metadata:
  name: mysql-service
  labels:
    name: lbl-k8s-mysql
spec:
  ports:
  - port: 3306
  selector:
    name: lbl-k8s-mysql
  type: ClusterIP

And activate it too.

$ kubectl create -f mysql-service.yaml

$ kubectl get svc
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes      ClusterIP   10.96.0.1       <none>        443/TCP    127m
mysql-service   ClusterIP   10.100.110.78   <none>        3306/TCP   3m10s
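
To verify that the Service actually routes traffic to MySQL, one option is a throwaway client pod; this check is my own addition, using the password from the Secret above:

$ kubectl run mysql-client --image=mysql:latest -it --rm --restart=Never -- \
    mysql -h mysql-service -uroot -phidave -e 'SELECT VERSION();'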

At this point, we could add something like a WordPress website and continue to improve our Kubernetes cluster (I do hope to build on this at a later date and do just that). But for now, we can look back at starting with nothing and building a small cluster using Minikube to support MySQL.

What I hope to do is build on this and discover how the various “tinker toys” communicate as well as how they are controlled. Then we can explore persistent data in a world of short-lived containers and how we can manage them.


For those who want to experiment more with this while I am in Detroit, I refer you to these two articles that I have borrowed liberally from and acknowledge them as great resources:

How to Install Minikube on Ubuntu 22.04 | 20.04

How to Deploy MySQL on Kubernetes

Oct 24, 2022

Talking Drupal #370 – A Smaller Core

Today we are talking about A Smaller Core with Théodore Biadala.

For show notes visit: www.talkingDrupal.com/370

Topics

  • What is meant by a smaller core
  • What modules have been removed already
  • What is the process
  • Chopping block terminology
  • Which modules are under discussion
  • When they go to contrib is there a maintainer first
  • What is the impact to users of the module
  • How long will they be maintained in contrib
  • Why is this important
  • What modules are next
  • What is the commitment to being a core system maintainer
  • Is there going to be a sub-release process

Resources

Guests

Théodore Biadala – read.theodoreb.net @nod_

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan
John Picozzi – www.epam.com @johnpicozzi
Tearyne Almendariz – @tearyneg

MOTW

Swagger UI for OpenAPI UI – Swagger UI is a JavaScript library that allows a user to explore the API documentation for a web services API. This module provides the Swagger UI library for displaying OpenAPI specifications within a Drupal site. OpenAPI UI is required to use the project, as it provides the underlying architecture.

Oct 19, 2022

Milestone Release for Percona Backup for MongoDB Brings Physical Backups to the MongoDB Open Source Community

MongoDB Logical Backup

Although MongoDB has a very robust high availability (HA) mechanism built into its core, automatic failover and data redundancy will not satisfy the security requirements of most enterprises. Not alone, at least.

Understanding the needs of the MongoDB community, Percona offers Percona Backup for MongoDB (PBM) — a free to use and fully open source backup tool. The tool is fully functional with Percona Server for MongoDB, and has some features not available for the MongoDB Community Edition due to the lack of important enterprise features there. With that in mind, we highly recommend the free-to-use Percona Server for MongoDB, which offers you everything MongoDB Community Edition does and more.

While Percona provides free and open source tools, it’s important to remember that the company also offers premium services, including top-of-the-market consultants, managed services, support, and training.

Backup and restore cut for larger data sets

Quite often we hear from our users that physical backups and restores are what they need. Logical dumps, the main backup tool available for MongoDB Community Edition, simply do not cut it for large data sets. Starting with Percona Backup for MongoDB 2.0.0, we are happy to announce that physical backups and restores are generally available and ready for production use.

Importantly, with data-at-rest encryption supported for physical backups and restores, you can safely use them with private data in production environments and still be compliant with data security requirements driven by privacy standards like HIPAA, PCI, GDPR, CCPA, or others required in your organization!

Recovery does not have to mean a full database!

Having built-in redundancy in the form of replica sets makes full restores obsolete to some degree. Of course, not totally obsolete, as the capability is still covered end to end for any situation that requires it, and it’s still a very convenient way of cloning databases.

These days, when I think about backups for MongoDB, more and more it’s human error that I want to be protected against. If no errors were made during the database architecture design and multiple availability zones are assigned to every replica set, then it’s a human who can still wreak havoc on a whole database or a single collection.

While hurting the database to the point of needing a full recovery may not be a very probable scenario, the selective restore of a particular collection is what our users need. That’s why, with PBM 2.0.0, selective backups and restores become available as a tech preview.
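
As a rough sketch of what invoking the tech preview looks like (the --ns flag per the PBM 2.0 documentation as I understand it; the namespace and backup name here are hypothetical, so verify against your version):

$ pbm backup --ns=mydb.mycollection
$ pbm restore 2022-10-19T10:00:00Z --ns=mydb.mycollection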

What tech preview means is (following our lifecycle page): 

*Tech Preview Features: RC/GA releases can include features that are not yet ready for enterprise use and are not included in support via SLA (supported by product/engineering on the best effort basis). These features will be fully documented and described as a tech preview so that customers can provide feedback prior to the full release of the feature in a future GA release (or removal of the feature is deemed not useful). This functionality can change (APIs, CLIs, etc.) from tech preview to GA, however, there is no guarantee of compatibility between different tech preview versions.

What this means to the community of users is that you have a voice in whether this is a feature you need and whether it’s in a shape you like. Please share your feedback with us and know that every voice counts!

What’s next

What’s good about open source software is that the messaging is more honest, transparent, and to the point. So in the spirit of open communication, we have to point out that PBM 2.0.0 has come out with some limitations that we are determined to lift in order to deliver better coverage and experience across the product. Our main focus right now is on:

  • Point in time recovery based on physical backups
  • Selective restores supporting sharding

You can also expect more integration with the freely available Percona Monitoring and Management (PMM) in the form of backup management and monitoring available from PMM.

As you are reading this blog post, a new patch release has come out: with PBM 2.0.1, we added some slight improvements and bug fixes. This tool is fully supported, and we are determined to deliver fixes and new features to everyone in the spirit of true open source, not only to our customers paying for premium support.

 

Oct 17, 2022

Talking Drupal #369 – PHP End of Life

Today we are talking about PHP End of Life with Josh Waihi & Matt Glaman.

For show notes visit: www.talkingDrupal.com/369

Topics

  • PHP End of Life (EoL)
  • Why does PHP go EoL
  • When will PHP 8.0 go EoL
  • How are dates scheduled
  • Who is in charge
  • How do you prepare as a developer
  • How do you prepare as a platform
  • Acquia is supporting for one more year
  • Pantheon has no forced upgrades
  • How do you find edge cases
  • Do you speak with PHP maintainers
  • What are the changes to be excited about 7 > 8
  • For D7 this is challenging, any tips?

Resources

Guests

Matt Glaman – @nmdmatt
Josh Waihi – @joshwaihi

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan
John Picozzi – www.epam.com @johnpicozzi
Tearyne Almendariz – @tearyneg

MOTW

HTTP Cache Control – The HTTP Cache Control module provides fine-grained control of Drupal’s Cache-Control headers.

Oct 17, 2022

MongoDB: Delayed Secondary Member of a Replica Set and How It Can Be Useful


In this blog post, we will discuss delayed members of a replica set and how they can be useful. A delayed member of a replica set is similar to the other members, but its data set reflects a delayed state of the data. For example, if a node is delayed by one hour and the current time is 11:00, then its data set reflects operations up to 10:00.

A delayed member copies and applies operations from the primary’s oplog with a delay. The amount of delay must be greater than or equal to the expected maintenance window duration, and it should be less than the oplog window. A delayed secondary cannot acknowledge a write before the configured slaveDelay (secondaryDelaySecs from v5.0 onwards) has passed.

Setting the delayed member’s votes to 0 and making it hidden can improve cluster performance. Consider a primary-secondary-delayed-secondary architecture where the delayed node is voting with a delay of 15 minutes. When the write concern is majority and a non-delayed member becomes unavailable, the application has to wait about 15 minutes for writes to be acknowledged. In this case, the majority commit point will start to lag. The storage engine keeps all the changes after the commit point on disk to retain a durable history, which increases I/O writes over time and, as a result, impacts writes and adds cache pressure.

Requirement for a node to be a delayed secondary member:

  • Priority zero: A delayed secondary node must have priority 0. This stops the node from ever becoming primary. Since it reflects a delayed state of the data set, it must be prevented from becoming the primary node.
  • Hidden: A delayed node must be hidden. As this node reflects a delayed state of the data set, applications should be prevented from seeing and querying the delayed node.
  • Oplog: The oplog window’s capacity must be greater than the specified amount of delay.

In sharded clusters, delayed secondary members have limited benefit if the balancer is running. Since delayed members replicate chunk migrations with a delay, the state of the delayed member will not be useful for recovering to a previous state of the sharded cluster if a migration occurred during the delay window.

One can configure a secondary node as a delayed member by setting its members[n].priority value to 0, its members[n].hidden value to true, and its members[n].slaveDelay value to the number of seconds to delay. For example, configure a node as delayed as below:

rs1:PRIMARY> cfg = rs.conf()
rs1:PRIMARY> cfg.members[2].priority = 0
rs1:PRIMARY> cfg.members[2].hidden = true
rs1:PRIMARY> cfg.members[2].slaveDelay = 900   //slaveDelay is in seconds and in v5.0 onwards, this parameter changed to secondaryDelaySecs.
rs1:PRIMARY> cfg.members[2].votes = 0
rs1:PRIMARY> rs.reconfig(cfg)

Note: Reconfiguring a replica set could lead to an election, so it is recommended to do the above in the scheduled maintenance window.

rs1:PRIMARY> rs.conf().members[2]
{
			"_id" : 3,
			"host" : "demo-node-3:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : true,
			"priority" : 0,
			"tags" : {

			},
			"slaveDelay" : NumberLong(900),
			"votes" : 0
	}
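
Once reconfigured, one way to confirm the member lags by roughly the configured delay is the standard replication info helper (rs.printSlaveReplicationInfo() on older versions); this check is not from the original post:

$ mongo --quiet --eval 'rs.printSecondaryReplicationInfo()'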

How can delayed secondary members of a replica set be useful?

Since delayed secondary members lag by a set amount of time, they can be useful if one needs to roll back to an earlier time, and they can act as a hot backup to shelter against mistakes like accidentally dropping an entire database or collection. Or consider an application upgrade that was unsuccessful or suffered operator errors; in that case, a delayed secondary member can be useful to recover from.

Conclusion

A delayed secondary can be useful in case of accidentally dropping an entire database or collection, as long as the drop has not yet been applied on the delayed member, i.e., within the specified delay time. You also need to make sure that you have a large enough oplog window to allow it to stay in sync with the primary. However, for a comprehensive recovery solution, we would recommend having a backup solution. To learn more about Percona’s backup solution, please visit Percona Backup for MongoDB.

We also encourage you to try our products for MongoDB like Percona Server for MongoDB or Percona Operator for MongoDB.
