August 22, 2022

Percona Private DBaaS with Free Kubernetes Cluster

We at Percona are committed to delivering software that enables users to run databases anywhere. Our Operators for databases and Percona Monitoring and Management (PMM) Database as a Service (DBaaS) confirm our commitment to Kubernetes. Kubernetes is not only the most popular container orchestrator but is also becoming the de facto standard for containerized workloads.

Even though we have an enterprise-grade solution to run and manage databases on Kubernetes, we still see that Kubernetes itself sometimes becomes a blocker for onboarding. We wrote a blog post some time ago about spinning up DBaaS in under 20 minutes. What if we could do it in two? This is why we partnered with a cloud-native service provider, Civo, to provide our users with a free temporary Kubernetes cluster. In this blog post, you will learn how to use it and try out our Private DBaaS solution without needing to be a database or Kubernetes expert.

How do I get the cluster?

  • Sign in to Percona Platform. If you don’t have an account yet, click Create one at the bottom of the sign-in form.
  • Find “Free Kubernetes” in the menu on the left:

Percona Portal

  • Click “Launch a new cluster”. It will take less than 90 seconds to create one.
  • Once the cluster is ready, you will be able to download kubeconfig – a file used to access the Civo Kubernetes cluster.

Percona DBaaS

Save this file somewhere on your computer; we will need it later to register the Kubernetes cluster in PMM DBaaS. That is it: the cluster is up and running.
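
If you want to verify that the cluster is reachable before moving on, a quick kubectl check looks like this (assuming kubectl is installed and the file was saved as kubeconfig.yaml; the filename is just an example):

export KUBECONFIG=$PWD/kubeconfig.yaml   # point kubectl at the downloaded file
kubectl get nodes                        # should list the three cluster nodes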

Limitations

  • The cluster will be automatically destroyed in three hours. It must not be used for any production workloads.
  • The cluster comes with three nodes (4 CPUs, 8 GB RAM each) and does not have autoscaling enabled. This is enough for deploying a database cluster and an application.

Try DBaaS in Percona Monitoring and Management

Install PMM server

If you already have a PMM server, skip this section. If not, we are going to deploy it using the quick install. You can also install PMM on Kubernetes with a Helm chart by following our documentation and this blog post.
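
If you go the Helm route, the steps look roughly like the sketch below. The repository URL and chart name follow Percona's percona-helm-charts project; treat them as assumptions and verify against the linked documentation:

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update
helm install pmm percona/pmm    # the release name 'pmm' is just an example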

Run the following command to install PMM server on your docker compatible *nix based machine (see quick start guide for more details):

curl -fsSL https://www.percona.com/get/pmm | /bin/bash

When the script is done, the CLI tool will print a list of IP-based URLs you can put in a browser to access the PMM UI. Copy and paste one into your favorite browser. You may receive a security warning; the script output includes instructions on how to bypass it if your browser does not offer a “proceed anyway” option.

DBaaS

You can find the necessary information about how to use DBaaS in our documentation or this video. In general, there are a few steps:

  1. At the time of writing this blog post, DBaaS is in technical preview. Do not forget to enable it in Settings -> Advanced Settings.
  2. Register the Kubernetes cluster in DBaaS using the kubeconfig generated in the Portal.
  3. Deploy your first database.

Your database will be ready in a few minutes, and you will get the endpoint to connect to along with the username and password. By default, the database is not exposed publicly and is reachable only within the Kubernetes cluster. You can change this in the Advanced Options when creating the database.
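
If you deployed a MySQL-compatible cluster, one quick way to test connectivity from inside the Kubernetes cluster is a temporary client pod; a sketch, where the image choice and the placeholders are examples rather than prescriptions:

kubectl run -it --rm mysql-client --image=percona:8.0 --restart=Never -- \
  mysql -h <endpoint> -u <username> -p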

With ‘Free Kubernetes’ we want to simplify PMM DBaaS onboarding and bring value to our community of users. This is the first version, and we plan to deliver more enhancements to make onboarding even smoother. It would be great if you could help us find those improvements by submitting your feedback to platform_portal@percona.com. Please spend a couple of minutes and let us know what problems or improvements you would like to see in your PMM DBaaS and Kubernetes journey.

August 22, 2022

Percona Distribution for MySQL (PS-Based Variant) 8.0.29, Percona Distribution for MongoDB 5.0.10-9: Release Roundup August 22, 2022

It’s time for the release roundup!

Percona is a leading provider of unbiased open source database solutions that allow organizations to easily, securely, and affordably maintain business agility, minimize risks, and stay competitive.

Our Release Roundups showcase the latest Percona software updates, tools, and features to help you manage and deploy our software. Each roundup offers highlights and critical information, as well as links to the full release notes and direct links to download the software or service itself.

Today’s post includes those releases and updates that have come out since August 8, 2022. Take a look!

Percona Distribution for MySQL (Percona Server-based variant) 8.0.29

On August 8, 2022, Percona Distribution for MySQL (PS-based variant) 8.0.29 was released. It is a single solution with the best and most critical enterprise components from the MySQL open-source community, designed and tested to work together. It provides two deployment variants: one is based on Percona Server for MySQL and another one is based on Percona XtraDB Cluster.

The following is a selection of the bug fixes for MySQL 8.0.29, provided by Oracle and included in Percona Server for MySQL and Percona Distribution for MySQL:

  • The Performance Schema tracks whether a query was processed on the PRIMARY engine, InnoDB, or a SECONDARY engine, HeatWave. An EXECUTION_ENGINE column, which indicates the engine used, was added to the Performance Schema statement event tables and to the sys.processlist and sys.x$processlist views.
  • Added support for the IF NOT EXISTS option for the CREATE FUNCTION, CREATE PROCEDURE, and CREATE TRIGGER statements.
  • Added support for ALTER TABLE ... DROP COLUMN ALGORITHM=INSTANT (see the sketch after this list).
  • Fixed an issue where an anonymous user with the PROCESS privilege was unable to select processlist table rows.
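
As an illustration of the instant DROP COLUMN support mentioned above, the statement looks like this; the table and column names are hypothetical:

ALTER TABLE orders DROP COLUMN legacy_flag, ALGORITHM=INSTANT;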

Download Percona Distribution for MySQL (PS-based variant) 8.0.29


Percona Server for MySQL 8.0.29-21

Percona Server for MySQL 8.0.29-21 was released on August 8, 2022. It is a free, fully compatible, enhanced, and open source drop-in replacement for any MySQL database. It provides superior performance, scalability, and instrumentation. It includes all the features and bug fixes available in the MySQL 8.0.29 Community Edition in addition to enterprise-grade features developed by Percona.

Download Percona Server for MySQL 8.0.29-21


Percona Server for MySQL 5.7.39-42

August 15, 2022, saw the release of Percona Server for MySQL 5.7.39-42, which includes all the features and bug fixes available in MySQL 5.7.39 Community Edition in addition to enterprise-grade features developed by Percona. Among the improvements, the SHOW PROCESSLIST statement now displays an extra TIME_MS field, which provides the time in milliseconds that the thread has been in its current state.

Download Percona Server for MySQL 5.7.39-42


Percona Distribution for MongoDB 5.0.10-9

On August 9, 2022, we released Percona Distribution for MongoDB 5.0.10-9, a freely available MongoDB database alternative, giving you a single solution that combines enterprise components from the open source community, designed and tested to work together. This release of Percona Distribution for MongoDB includes bug fixes and improvements provided by MongoDB and included in Percona Server for MongoDB. It specifically includes multiple fixes related to sharding and the resharding operation.

Download Percona Distribution for MongoDB 5.0.10-9


Percona Server for MongoDB 5.0.10-9

Percona Server for MongoDB 5.0.10-9 was released on August 9, 2022. It is an enhanced, source-available, and highly scalable database that is a fully compatible, drop-in replacement for MongoDB 5.0.10 Community Edition. It is rebased on MongoDB 5.0.10 Community Edition and supports MongoDB 5.0.10 protocols and drivers.

This release includes bug fixes and improvements provided by MongoDB and included in Percona Server for MongoDB. It specifically includes multiple fixes related to sharding and the resharding operation. A few of the bugs of note are the following:

  • SERVER-66418 – Fixed the issue with bad projection created during dependency analysis due to string order assumption. It resulted in the PathCollision error. The issue is fixed by improving dependency analysis for projections by folding dependencies into ancestor dependencies where possible.
  • SERVER-65821 – Fixed a deadlock in cross-shard transactions that could occur when the FCV (Feature Compatibility Version) was set after the “prepared” state of the transactions, which left both the setFCV thread and the TransactionCoordinator hung.

Download Percona Server for MongoDB 5.0.10-9


That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MongoDB, PostgreSQL, MariaDB, and other open source databases in on-premises and cloud environments.

August 18, 2022

PostgreSQL For MySQL DBAs Episode 9: References


Thank you for the many kind responses to this series. There are many out there who have MySQL experience and are seeking to expand into PostgreSQL for reasons ranging from simple curiosity to fortifying their professional skills. Now that we have covered the low-level basics, it is time to get to know some community resources in the PostgreSQL world.

When learning a new database, there comes a time when you have somewhat mastered the basics (startup, adding data, querying data, making backups), and hopefully by following this series you are at the point of transitioning from the equivalent of crawling to toddling. To aid in this transition, I would like to point you to some reference material. Accompanying Video.

PostgreSQL Administration Cookbook cover

The book PostgreSQL 14 Administration Cookbook by Simon Riggs and Gianni Ciolli is a handy reference. It has an extensive width and breadth of material, and I highly recommend the section on security, which is worth the price of the book alone. Since PostgreSQL is mechanically so different to operate from MySQL, take some time to study the maintenance and replication chapters too. I like books, this one has become a handy reference, and I will write a full review of it soon. If you do not find it in your local bookstore, you can find it on Amazon.

https://planet.postgresql.org is the equivalent of planet.mysql.com, and for novices, much of the material will still be a little over your head. But reading the content on this website will give you a glimpse of the activity in the community, such as the second half of an internals book being made available (for free!) and how to corrupt (on purpose) a PostgreSQL database. This is usually not a high-traffic site, and checking in every few days will greatly expand your viewpoint.

https://www.scalingpostgres.com posts once a week, and you can have a link to each new episode sent to your inbox. Each episode features a dozen or so posts from around the internet covering topics from high availability to the fine points of SQL syntax. Again, most of the content is above novice level, but it will help you build situational awareness as you transition to an intermediate skill level.

https://www.reddit.com/r/PostgreSQL/ suffers from a high noise-to-signal ratio, but there are good questions asked and links to valuable articles. However, there are also a lot of posts from students looking for help with homework.

And last but not least, https://forums.percona.com/ is where you can ask questions. This is usually not a high-volume site, but it affords you the chance to see the problems others have so that, hopefully, you can learn by observation (search the internet for Will Rogers on learning by observation). And if you can answer someone else’s question, Percona would love for you to contribute. Heck, you can even win awards for doing so.

You will find the companion videos for this blog post here.

There are other online resources, like the many PostgreSQL mailing lists, but generally you will find those too esoteric at this point on the learning curve.

The past videos for PostgreSQL for MySQL Database Administrators (DBA) can be found here: episode one, episode two, episode three, episode four, episode five, episode six, episode seven, and episode eight.

Have a topic suggestion?  If you are following this series and have a topic you want covered, please let me know.

Bonus!  Learn PostgreSQL in a browser at https://www.crunchydata.com/developers/tutorials

August 17, 2022

FTWRL on MyDumper Removed

The title is not entirely true, but ‘FTWRL on MyDumper is not needed anymore for consistent backups’ was a long title. One more time, I wanted to share a new feature in MyDumper. This one is related to an important piece: the locking mechanism that mydumper uses to sync all the threads.

MyDumper was born because, at that time, we didn’t have a tool that could take a consistent logical backup using multiple threads. Syncing all the threads was one of the problems, and it was solved with FLUSH TABLES WITH READ LOCK (FTWRL): the lock is held until all the threads execute START TRANSACTION WITH CONSISTENT SNAPSHOT (STWCS), then the FTWRL is released and all the threads are in sync. We all know that FTWRL is very expensive and difficult to acquire on some database workloads.

I started to think about alternatives to avoid using FTWRL, and my first thought was: why don’t we use SHOW ENGINE INNODB STATUS to check whether all the threads are at the same point in time? That is doable, but I didn’t like it, as the threads themselves already know at what point in time they are!

I asked internally at Percona and I got my answer: https://jira.percona.com/browse/PS-4464. Since Percona Server for MySQL versions 8.0.19-10 and 5.7.30-33, we have a status variable that shows the last GTID executed in our STWCS, which was all I needed.
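
Conceptually, what each dump thread does boils down to the two statements below; if every thread reports the same GTID set, their snapshots are consistent. This is a simplified sketch of the logic, not mydumper’s literal code:

START TRANSACTION WITH CONSISTENT SNAPSHOT;
SHOW STATUS LIKE 'binlog_snapshot_gtid_executed'; -- compare this value across threads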

I worked on the solution, which required some testing with high write traffic and forcing scenarios that are infrequent in real life.

How does it work?

In the next MyDumper release, when you use --no-locks, you can get a consistent backup anyway:

# ./mydumper -B myd_test -o data -v 4 --no-locks
...
** Message: 12:29:42.333: Thread 2: binlog_snapshot_gtid_executed_status_local succeeded with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2180547'.
** Message: 12:29:42.333: All threads in the same position. This will be a consistent backup.
** Message: 12:29:42.333: Thread 3: binlog_snapshot_gtid_executed_status_local succeeded with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2180547'.
** Message: 12:29:42.333: All threads in the same position. This will be a consistent backup.
** Message: 12:29:42.333: Thread 1: binlog_snapshot_gtid_executed_status_local succeeded with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2180547'.
** Message: 12:29:42.333: All threads in the same position. This will be a consistent backup.
** Message: 12:29:42.333: Thread 4: binlog_snapshot_gtid_executed_status_local succeeded with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2180547'.
** Message: 12:29:42.333: All threads in the same position. This will be a consistent backup.
...

In the log, you will see a line per thread letting you know the thread’s GTID position and confirming that all the threads are at the same point in time. We get the GTID from executing SHOW STATUS LIKE 'binlog_snapshot_gtid_executed' in each thread. This value is compared with a global shared variable; if they are not the same, the check fails, and mydumper will try up to five times to sync and inform you of the result:

...
** Message: 12:32:48.968: Thread 2: All threads in same pos check
** Message: 12:32:48.971: Thread 3: All threads in same pos check
** Message: 12:32:48.972: Thread 4: All threads in same pos check
** Message: 12:32:48.973: Thread 1: binlog_snapshot_gtid_executed_status_local failed with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2196157'.
** Message: 12:32:48.973: Thread 2: binlog_snapshot_gtid_executed_status_local failed with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2196164'.
** Message: 12:32:48.973: Thread 3: binlog_snapshot_gtid_executed_status_local failed with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2196165'.
** Message: 12:32:48.973: Thread 4: binlog_snapshot_gtid_executed_status_local failed with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2196165'.
** Message: 12:32:48.975: Thread 1: All threads in same pos check
** Message: 12:32:48.975: Thread 2: All threads in same pos check
** Message: 12:32:48.977: Thread 4: All threads in same pos check
** Message: 12:32:48.980: Thread 3: All threads in same pos check
** Message: 12:32:48.980: Thread 1: binlog_snapshot_gtid_executed_status_local succeeded with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2196167'.
** Message: 12:32:48.980: All threads in the same position. This will be a consistent backup.
** Message: 12:32:48.980: Thread 2: binlog_snapshot_gtid_executed_status_local succeeded with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2196167'.
** Message: 12:32:48.980: All threads in the same position. This will be a consistent backup.
** Message: 12:32:48.980: Thread 4: binlog_snapshot_gtid_executed_status_local succeeded with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2196167'.
** Message: 12:32:48.980: All threads in the same position. This will be a consistent backup.
** Message: 12:32:48.980: Thread 3: binlog_snapshot_gtid_executed_status_local succeeded with gtid: '498aa664-fd29-11ec-a793-0800275ff74d:1-2196167'.
** Message: 12:32:48.980: All threads in the same position. This will be a consistent backup.
...

Remember that you will need to enable GTID and the binlogs to get the value of binlog_snapshot_gtid_executed, which is the status variable that we use.
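
For reference, here is a minimal my.cnf sketch with the relevant settings. These are standard MySQL options, but the values are examples to adapt to your environment:

[mysqld]
server_id                = 1           # any non-zero id
log_bin                  = mysql-bin   # enable binary logging; the base name is an example
gtid_mode                = ON
enforce_gtid_consistency = ON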

Conclusion

With this new feature, you will be able to reduce the contention caused by mydumper’s use of FTWRL. So, starting from the next release, you will be able to use --no-locks if you don’t need any DDL locking mechanism.

August 17, 2022

MyRocks Use Case: Big Dataset

One of the questions I am often asked is in what cases I would prefer MyRocks over InnoDB. We have covered MyRocks on our blog previously:

MyRocks Performance – Percona Database Performance Blog

Saving With MyRocks in The Cloud – Percona Database Performance Blog

But it would be good to refresh some materials.

This time I want to take an interesting (and real) data set, which I also covered previously: the Reddit Comments dataset (see Big Dataset: All Reddit Comments – Analyzing with ClickHouse – Percona Database Performance Blog). The dataset is still available for download from http://files.pushshift.io/reddit/submissions/ and it includes comments up to June 2022.

The size of the dataset is what is interesting: for example, the comments for January 2022 alone are 118GB, and the total January-June 2022 dataset is 729GB.

What is also interesting about this dataset is that it comes in JSON format, and MySQL now provides broad capabilities to work with JSON documents directly.

For example, we can directly load the dataset as:

mysqlsh root@127.0.0.1/rs -- util importJson /storage/vadim/reddit/RS_2022-01 --table=stor --tableColumn=doc

Into a table created as:

CREATE TABLE stor (
  doc json DEFAULT NULL,
  id int NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (id)
)

Now we can compare load time into MyRocks vs InnoDB for the same files.

It is worth noting that I used fast NVMe storage for the MySQL database, which is beneficial for InnoDB (InnoDB performs better on fast storage; see Saving With MyRocks in The Cloud – Percona Database Performance Blog).

So loading six files into MyRocks storage engine:

Processed 124.26 GB in 32091070 documents in 1 hour 45 min 4.8562 sec (5.09K documents/s)
Processed 116.83 GB in 29843162 documents in 1 hour 40 min 45.0204 sec (4.94K documents/s)
Processed 128.81 GB in 32677372 documents in 1 hour 54 min 16.1496 sec (4.77K documents/s)
Processed 130.34 GB in 33002461 documents in 1 hour 56 min 43.0586 sec (4.71K documents/s)
Processed 142.88 GB in 34838318 documents in 2 hours 1 min 22.7340 sec (4.78K documents/s)
Processed 139.09 GB in 34395243 documents in 2 hours 4 min 49.8066 sec (4.59K documents/s)

And the final size in MyRocks is 238G.  MyRocks uses a mix of LZ4 + Zstandard compression to achieve this size (smaller than the original dataset).

The same for InnoDB storage engine:

Processed 124.26 GB in 32091070 documents in 2 hours 17 min 33.0404 sec (3.89K documents/s)
Processed 116.83 GB in 29843162 documents in 2 hours 8 min 6.3595 sec (3.88K documents/s)
Processed 128.81 GB in 32677372 documents in 2 hours 28 min 8.5292 sec (3.68K documents/s)
Processed 130.34 GB in 33002461 documents in 2 hours 31 min 25.8357 sec (3.63K documents/s)
Processed 142.88 GB in 34838318 documents in 2 hours 44 min 56.9327 sec (3.52K documents/s)
Processed 139.09 GB in 34395243 documents in 2 hours 39 min 53.3889 sec (3.59K documents/s)

The final size in InnoDB is 991G.

Note: InnoDB does not use compression in this test. Although the InnoDB engine has compression capabilities, it is not something we typically recommend, due to various issues.

The MyRocks average insert throughput is 4.81K documents/sec and InnoDB’s is 3.7K documents/sec, so there is a 24 percent gain in loading speed comparing MyRocks to InnoDB.

Now let’s compare some aggregation queries, which will also show how we can handle JSON documents in MySQL.

Find the top subreddits (by the number of comments) that are not marked as “Adult only” (I did not risk including “adult” subreddits in the output…):

SELECT doc->"$.subreddit",count(*) cnt FROM stor WHERE JSON_VALUE(doc, "$.over_18" RETURNING UNSIGNED) IS NOT TRUE GROUP BY 1 ORDER BY cnt DESC LIMIT 100;

The first lines of output:

+-----------------------+--------+
| doc->"$.subreddit"    | cnt    |
+-----------------------+--------+
| "AskReddit"           | 329683 |
| "teenagers"           | 131401 |
| "memes"               | 102118 |
| "AutoNewspaper"       |  82080 |
| "relationship_advice" |  66883 |
| "UltraAlgo"           |  64958 |
| "antiwork"            |  54697 |
| "NoStupidQuestions"   |  52531 |
| "NFTsMarketplace"     |  48647 |
| "CryptoCurrency"      |  46817 |

And execution times:

MyRocks: 100 rows in set (6 min 20.31 sec)
InnoDB: 100 rows in set (8 min 10.29 sec)

The not-so-good parts of MyRocks

Full disclosure: I also want to show a case where MyRocks performs worse than InnoDB. For this, I will use queries that extract the comments of a given author.

For this we need to create an index on “author”, and because this is all stored in a JSON document, we need to create a VIRTUAL column “author” as:

ALTER TABLE stor ADD COLUMN author VARCHAR(255) GENERATED ALWAYS AS ( doc->"$.author" );

And for InnoDB this is practically an instant operation:

Query OK, 0 rows affected (1 min 22.61 sec)

While for MyRocks it takes time:

Query OK, 61934232 rows affected (45 min 46.61 sec)

The difference is that MyRocks does not support any “INSTANT” DDL operations yet, while InnoDB does. However, we have seen that exactly these “INSTANT” operations cause major bugs and incompatibilities in InnoDB versions 8.0.29 and 8.0.30 (see Percona XtraBackup 8.0.29 and INSTANT ADD/DROP Columns – Percona Database Performance Blog), so I would ask for extra caution before using “INSTANT” operations in InnoDB.
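
As a side note, the -> operator returns a JSON-quoted value, so the virtual column above stores the author name with surrounding double quotes. If you prefer a plain string, the ->> (unquoting) operator is an alternative; a hypothetical variation, not what this test used:

ALTER TABLE stor ADD COLUMN author2 VARCHAR(255) GENERATED ALWAYS AS ( doc->>"$.author" );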

And now we can create an index on a virtual column:

MyRocks:

ALTER TABLE stor ADD KEY (author);

Query OK, 0 rows affected (14 min 37.34 sec)

InnoDB:

ALTER TABLE stor ADD KEY (author);

Query OK, 0 rows affected (40 min 4.39 sec)

MyRocks took much less time to create the new index on the column. To show where MyRocks is worse, I will use the poor man’s benchmark available in MySQL, namely the BENCHMARK command, such as:

SELECT BENCHMARK(50000000,(with t1 as (with q as (select FLOOR(1 + RAND()*(7685162 -1 + 1)) c1) select * from authors,q where id=q.c1) select count(*) cnt from stor,t1 where stor.author=t1.author limit 100));

Basically, I will execute, 50 million times, a query that extracts the comments of a random author (picked from a helper authors table whose id and author columns are implied by the query above).

For InnoDB execution times (less is better, three different attempts):

1 row in set (2 min 14.66 sec)
1 row in set (1 min 53.10 sec)
1 row in set (1 min 40.18 sec)

For MyRocks:

1 row in set (6 min 38.60 sec)
1 row in set (5 min 42.83 sec)
1 row in set (6 min 4.76 sec)

To put this into perspective, InnoDB achieved 442K queries/sec and MyRocks 137K queries/sec (compression and the write-optimized data structure play a role here). We need to highlight that these results are WITHOUT network communication. In normal circumstances, there will be added network latency, and the difference between the two engines will be a lot less.

Conclusion

MyRocks makes a good case for a big dataset (in our test, we loaded a ¾ TB dataset), providing a good insertion rate and a small compressed size for the final product.

As a drawback of compression and a write-optimized engine, MyRocks is behind InnoDB in quick “index lookup” queries, which is acceptable in my opinion, but you should evaluate this for your usage.

Appendix:

Hardware specifications and configurations:

2022-MyRocks-reddit/benchmarkspec.md at main · Percona-Lab-results/2022-MyRocks-reddit (github.com)

August 15, 2022

Talking Drupal #360 – Backdrop Case Study

Today we are talking about Backdrop CMS with Eric Toupin.

www.talkingDrupal.com/360

Topics

  • What is Backdrop
  • How did you hear about it
  • Tell us about Aten and your clients
  • What type of work is Aten doing with Stanford
  • Why was Backdrop CMS considered
    • How long was Backdrop out before you considered it
  • Are there features Backdrop has that Drupal does not have
  • What are some limitations of Backdrop
  • If someone has Drupal 7 what do you consider the criteria for Backdrop vs Drupal 9
  • Are you working on other Backdrop sites
  • Do you consider Backdrop its own CMS
  • Have you contributed anything back to Drupal from Backdrop
  • Does Aten consider Backdrop a service it provides

Resources

Guests

Eric Toupin – www.drupal.org/u/erictoupin

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan
John Picozzi – www.epam.com @johnpicozzi
Cathy Theys – @YesCT

MOTW

hreflang – The core Content Translation module adds hreflang tags only on content entity pages. This module, on the other hand, adds hreflang tags to all pages, and can be configured to defer to the Content Translation module on content entity pages. If for some reason you’d like to modify the hreflang tags on a page, you can do so by implementing

August 11, 2022

How to Build Percona XtraDB Cluster From Sources

Percona XtraDB Cluster (PXC) is a database clustering solution for MySQL. It ensures high availability, prevents downtime and data loss, and provides linear scalability for a growing environment.

Lately, the number of questions about how to build Percona software has increased. More and more people try to add their own patches and modifications and build the software by themselves. But this raises the question of how to do it the same way Percona does, as sometimes a compiler flag can have a drastic impact on the final binary. This is a really essential question, so let’s look at how we build Percona XtraDB Cluster here at Percona.

At first glance, the build process does not look difficult. Everyone knows the magic make install command and assumes that, as a result, everything will be built and installed on the system. But databases are essential to applications, and everything needs to be done carefully.

Let’s divide the build process into the stages:

  1. Prepare build environments
  2. Prepare source tarballs
  3. Prepare source package
  4. Prepare binary packages

We can skip the third stage, but in that case we wouldn’t be able to make repeatable builds, so I would suggest always creating source packages.

So let’s move forward and discuss each step.

One of the key points during a build is that the environment should be clean, with all needed dependencies installed, and so on. The dependency list is different for each version. How do you get the correct dependency list? You can get all build requirements from the spec file (on rpm-based systems) or from the control file (on deb-based systems).
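
For example, a rough way to pull the build requirements out of those files; the file paths are illustrative and depend on where the packaging files live in the source tree:

grep '^BuildRequires:' *.spec          # rpm-based systems
grep '^Build-Depends' debian/control   # deb-based systems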

As for the source tarball, you can easily get it in one of two ways:

  • Get source tarball from the website

We publish a source tarball for each release we issue, and you can easily get it for any released version (this is the link for the 8.0.28 version):

https://downloads.percona.com/downloads/Percona-XtraDB-Cluster-LATEST/Percona-XtraDB-Cluster-8.0.28/source/tarball/Percona-XtraDB-Cluster-8.0.28-19.tar.gz

  • Download tarball from GitHub

Here is the link:

https://github.com/percona/percona-xtradb-cluster/archive/refs/tags/Percona-XtraDB-Cluster-8.0.28-19.1.tar.gz

You can use build instructions from our website and proceed with the compilation process:

https://www.percona.com/doc/percona-xtradb-cluster/LATEST/install/compile.html#compile
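
In very rough terms, a manual compile follows the usual cmake flow sketched below. The exact cmake options matter a lot (that is the point of this post), so take the authoritative list from the documentation link above rather than from this sketch:

tar xzf Percona-XtraDB-Cluster-8.0.28-19.tar.gz
cd Percona-XtraDB-Cluster-8.0.28-19
mkdir bld && cd bld
cmake ..            # release builds pass many more options; see the docs
make -j"$(nproc)"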

So everything looks easy. But what is the process we use internally to prepare our release packages? As I mentioned earlier, each compiler option can have a significant effect.

Everyone tries to make life easier and automate all tasks. That is great, as automation is one of the keys to success: you can work on other tasks while the release build is in progress. So we have created a build script that we use for making builds. It can be found in the PXC GitHub repo and can be used by anyone to make their own builds.

This script can install all needed dependencies for the build environment and create source RPMs and debs, binary RPMs and debs, and binary tarballs, so it covers the whole build cycle. So how do we use it?

The script has various parameters:

--builddir=DIR       Absolute path to the directory where all actions will be performed

--get_sources        Download the source from GitHub

--build_src_rpm      If set to 1, the source RPM will be built

--build_source_deb   If set to 1, the source deb package will be built

--build_rpm          If set to 1, the RPMs will be built

--build_deb          If set to 1, the debs will be built

--build_tarball      If set to 1, the tarball will be built

--install_deps       Install build dependencies (root privileges are required)

--branch             Branch to build

--repo               Repo to build from

--rpm_release        RPM release number (default: 1)

--deb_release        DEB release number (default: 1)

--debug              Build a debug tarball

So let’s see how we can make the build process easier:

1. Download the build script:

wget https://raw.githubusercontent.com/percona/percona-xtradb-cluster/8.0/build-ps/pxc_builder.sh

2. Create a build directory where you are going to perform all build actions:

mkdir /tmp/BUILD_PXC

3. Install dependencies (please note root permissions are required):

sudo ./pxc_builder.sh --builddir=/tmp/BUILD_PXC --install_deps=1

4. Download source code:

– From the Percona repo (used by default):

./pxc_builder.sh --builddir=/tmp/BUILD_PXC --get_sources=1 --branch=Percona-XtraDB-Cluster-8.0.28-19

– From your own repo and branch:

./pxc_builder.sh --builddir=/tmp/BUILD_PXC --get_sources=1 --branch=<your_branch_name> --repo=<link_to_your_repo_on_github>

5. Prepare src rpm:

./pxc_builder.sh --builddir=/tmp/BUILD_PXC --build_src_rpm=1

Please note that if you already have your source tarball, you can just create a directory named source_tarball in the build directory and put the tarball there.

6. Prepare source deb:

./pxc_builder.sh --builddir=/tmp/BUILD_PXC --build_source_deb=1

The same note applies here: if you already have your source tarball, place it in the source_tarball directory.

7. Prepare rpm:

./pxc_builder.sh --builddir=/tmp/BUILD_PXC --build_rpm=1

8. Prepare deb:

./pxc_builder.sh --builddir=/tmp/BUILD_PXC --build_deb=1

9. Prepare tarball:

./pxc_builder.sh --builddir=/tmp/BUILD_PXC --build_tarball=1

So as you can see, the build procedure becomes easier, and you don’t need to think about which dependencies are needed, which cmake parameters should be used, etc. The build script does everything for you and uses all the build flags (unless you changed them in your sources) that we use for release builds.

August 11, 2022

Window Functions in MySQL 8.0

I have recently written an article for this blog presenting Window Functions for MongoDB 5.0. I used some public Italian COVID-19 data for a few real examples. Please have a look at it if you like.

Then I thought I should provide the same for a relational database like MySQL.

MySQL introduced Window Functions in version 8.0. This feature had been in demand by many developers for some time. In this article, I’m going to show the benefits and how you can use Window Functions in MySQL 8.0.

Load some public data

I have Percona Server for MySQL 8.0 running, and I got some public Italian data about COVID-19 infections, hospitalizations, and other info. The data are available on a per-day and per-region basis at the following link:

https://github.com/pcm-dpc/COVID-19/tree/master/dati-regioni.

I loaded just a few months of data spanning 2021 and 2022. The original data is labeled in Italian, so I created a similar, reduced table just for the needs of this article.

Here is a sample of the data:

mysql> DESC covid;
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | int unsigned | NO   | PRI | NULL    | auto_increment |
| day            | date         | YES  |     | NULL    |                |
| region         | varchar(50)  | YES  |     | NULL    |                |
| total_cases    | int unsigned | YES  |     | NULL    |                |
| deceased       | int unsigned | YES  |     | NULL    |                |
| hospitalized   | int unsigned | YES  |     | NULL    |                |
| intensive_care | int unsigned | YES  |     | NULL    |                |
| home_isolation | int unsigned | YES  |     | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+

mysql> SELECT * FROM covid LIMIT 5;
+----+------------+----------------+-------------+----------+--------------+----------------+----------------+
| id | day        | region         | total_cases | deceased | hospitalized | intensive_care | home_isolation |
+----+------------+----------------+-------------+----------+--------------+----------------+----------------+
|  1 | 2021-10-01 | Abruzzo        |       81281 |     2545 |           56 |              4 |           1677 |
|  2 | 2021-10-01 | Basilicata     |       30175 |      614 |           38 |              2 |           1214 |
|  3 | 2021-10-01 | Calabria       |       83918 |     1408 |          156 |             15 |           3359 |
|  4 | 2021-10-01 | Campania       |      456695 |     7944 |          233 |             21 |           6578 |
|  5 | 2021-10-01 | Emilia-Romagna |      424089 |    13477 |          416 |             46 |          13882 |
+----+------------+----------------+-------------+----------+--------------+----------------+----------------+


What are Window Functions

Window Functions run a window across sorted rows, producing a calculation at each step of the window. As a consequence, a Window Function result is provided for each row returned by the query.

The main advantage of Window Functions is that you can run aggregations on the fly in a more efficient and elegant way, without the need to create temporary tables or views for further queries.

Typical use cases are calculating rolling averages, correlation scores, or cumulative totals.

Two new clauses are used to define and use Window Functions: OVER and WINDOW. The second one is not mandatory. We’ll see some examples to show how you can use them.

The first example: hospitalizations in Central Italy area

Let’s focus our attention on the specific area of “Central Italy”, which includes the following five regions: Toscana, Umbria, Marche, Lazio, and Abruzzo.

We’d like to find out the number of hospitalizations per day in each region and also calculate the total hospitalizations for the entire area, for a specific period: the first week of December 2021.

mysql> SELECT day, region, hospitalized,
    -> SUM(hospitalized) OVER(PARTITION BY day) AS 'total_hospitalized'
    -> FROM covid
    -> WHERE region IN ('Toscana','Umbria','Lazio','Marche','Abruzzo')
    -> AND day BETWEEN '2021-12-01' AND '2021-12-06';
+------------+---------+--------------+--------------------+
| day        | region  | hospitalized | total_hospitalized |
+------------+---------+--------------+--------------------+
| 2021-12-01 | Lazio   |          792 |               1384 |
| 2021-12-01 | Abruzzo |          115 |               1384 |
| 2021-12-01 | Umbria  |           57 |               1384 |
| 2021-12-01 | Toscana |          307 |               1384 |
| 2021-12-01 | Marche  |          113 |               1384 |
| 2021-12-02 | Abruzzo |          118 |               1371 |
| 2021-12-02 | Lazio   |          785 |               1371 |
| 2021-12-02 | Marche  |          117 |               1371 |
| 2021-12-02 | Toscana |          298 |               1371 |
| 2021-12-02 | Umbria  |           53 |               1371 |
| 2021-12-03 | Abruzzo |          123 |               1408 |
| 2021-12-03 | Lazio   |          815 |               1408 |
| 2021-12-03 | Marche  |          123 |               1408 |
| 2021-12-03 | Toscana |          294 |               1408 |
| 2021-12-03 | Umbria  |           53 |               1408 |
| 2021-12-04 | Umbria  |           47 |               1383 |
| 2021-12-04 | Toscana |          295 |               1383 |
| 2021-12-04 | Marche  |          126 |               1383 |
| 2021-12-04 | Abruzzo |          112 |               1383 |
| 2021-12-04 | Lazio   |          803 |               1383 |
| 2021-12-05 | Abruzzo |          122 |               1408 |
| 2021-12-05 | Lazio   |          811 |               1408 |
| 2021-12-05 | Marche  |          126 |               1408 |
| 2021-12-05 | Toscana |          299 |               1408 |
| 2021-12-05 | Umbria  |           50 |               1408 |
| 2021-12-06 | Lazio   |          864 |               1472 |
| 2021-12-06 | Umbria  |           50 |               1472 |
| 2021-12-06 | Toscana |          300 |               1472 |
| 2021-12-06 | Marche  |          130 |               1472 |
| 2021-12-06 | Abruzzo |          128 |               1472 |
+------------+---------+--------------+--------------------+
30 rows in set (0.00 sec)


In this example, the SUM() function works as a window function that operates on a set of rows defined by the contents of the OVER clause. Many other aggregate functions you regularly use with GROUP BY can be used as window functions this way. The OVER clause does the magic by defining the window’s set of rows. Here, PARTITION BY day tells MySQL to consider as a window all the rows having the same value for the day column, and then calculate the SUM() over those rows only.

The main difference between using an aggregate function with GROUP BY and using it as a window function is that in the first case a single row is returned for each group, while in the second case all the rows are returned, with the aggregated value repeated on each row.

[Picture omitted: different colors highlight how the windows are formed for this query.]
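
To make the difference concrete, here is a minimal side-by-side sketch on the same table: the GROUP BY form collapses each day into one row, while the window form keeps every row and repeats the total on each of them.

-- GROUP BY: one row per day
SELECT day, SUM(hospitalized) AS total_hospitalized
FROM covid
WHERE region IN ('Toscana','Umbria','Lazio','Marche','Abruzzo')
GROUP BY day;

-- Window function: all rows kept, the daily total repeated on each row
SELECT day, region, hospitalized,
       SUM(hospitalized) OVER(PARTITION BY day) AS total_hospitalized
FROM covid
WHERE region IN ('Toscana','Umbria','Lazio','Marche','Abruzzo');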

Multiple window functions in one query

Here is another example to show that you can create multiple windows inside a query. Each one must have its own OVER clause to define its partition.

In the following query, we would like to return the average number of deceased for the entire country in January 2022. Also, we would like to return the maximum daily number of deceased for each region in the period.

mysql> SELECT day, region, deceased,
    -> AVG(deceased) OVER() AS 'country average deceased',
    -> MAX(deceased) OVER(PARTITION BY region) AS 'max daily deceased in the period'
    -> FROM covid
    -> WHERE day BETWEEN '2022-01-01' AND '2022-01-31'
    -> ORDER BY day, region;
+------------+-----------------------+----------+--------------------------+----------------------------------+
| day        | region                | deceased | country average deceased | max daily deceased in the period |
+------------+-----------------------+----------+--------------------------+----------------------------------+
| 2022-01-01 | Abruzzo               |     2640 |                6737.0000 |                             2811 |
| 2022-01-01 | Basilicata            |      635 |                6737.0000 |                              681 |
| 2022-01-01 | Calabria              |     1625 |                6737.0000 |                             1887 |
| 2022-01-01 | Campania              |     8471 |                6737.0000 |                             9139 |
| 2022-01-01 | Emilia-Romagna        |    14231 |                6737.0000 |                            15088 |
| 2022-01-01 | Friuli Venezia Giulia |     4225 |                6737.0000 |                             4487 |
| 2022-01-01 | Lazio                 |     9275 |                6737.0000 |                             9823 |
| 2022-01-01 | Liguria               |     4587 |                6737.0000 |                             4873 |
| 2022-01-01 | Lombardia             |    35095 |                6737.0000 |                            37184 |
| 2022-01-01 | Marche                |     3249 |                6737.0000 |                             3418 |
| 2022-01-01 | Molise                |      512 |                6737.0000 |                              528 |
| 2022-01-01 | P.A. Bolzano          |     1307 |                6737.0000 |                             1357 |
| 2022-01-01 | P.A. Trento           |     1423 |                6737.0000 |                             1473 |
| 2022-01-01 | Piemonte              |    12059 |                6737.0000 |                            12601 |
| 2022-01-01 | Puglia                |     6987 |                6737.0000 |                             7215 |
| 2022-01-01 | Sardegna              |     1729 |                6737.0000 |                             1844 |
| 2022-01-01 | Sicilia               |     7514 |                6737.0000 |                             8527 |
| 2022-01-01 | Toscana               |     7562 |                6737.0000 |                             8263 |
| 2022-01-01 | Umbria                |     1504 |                6737.0000 |                             1623 |
| 2022-01-01 | Valle d'Aosta         |      488 |                6737.0000 |                              507 |
| 2022-01-01 | Veneto                |    12395 |                6737.0000 |                            13169 |
| 2022-01-02 | Abruzzo               |     2642 |                6737.0000 |                             2811 |
| 2022-01-02 | Basilicata            |      635 |                6737.0000 |                              681 |
| 2022-01-02 | Calabria              |     1630 |                6737.0000 |                             1887 |
| 2022-01-02 | Campania              |     8474 |                6737.0000 |                             9139 |
| 2022-01-02 | Emilia-Romagna        |    14239 |                6737.0000 |                            15088 |
| 2022-01-02 | Friuli Venezia Giulia |     4228 |                6737.0000 |                             4487 |
| 2022-01-02 | Lazio                 |     9290 |                6737.0000 |                             9823 |
| 2022-01-02 | Liguria               |     4591 |                6737.0000 |                             4873 |
| 2022-01-02 | Lombardia             |    35140 |                6737.0000 |                            37184 |
| 2022-01-02 | Marche                |     3252 |                6737.0000 |                             3418 |
| 2022-01-02 | Molise                |      512 |                6737.0000 |                              528 |
| 2022-01-02 | P.A. Bolzano          |     1308 |                6737.0000 |                             1357 |
| 2022-01-02 | P.A. Trento           |     1423 |                6737.0000 |                             1473 |
| 2022-01-02 | Piemonte              |    12065 |                6737.0000 |                            12601 |
| 2022-01-02 | Puglia                |     6987 |                6737.0000 |                             7215 |
| 2022-01-02 | Sardegna              |     1731 |                6737.0000 |                             1844 |
| 2022-01-02 | Sicilia               |     7527 |                6737.0000 |                             8527 |
| 2022-01-02 | Toscana               |     7568 |                6737.0000 |                             8263 |
| 2022-01-02 | Umbria                |     1508 |                6737.0000 |                             1623 |
| 2022-01-02 | Valle d'Aosta         |      488 |                6737.0000 |                              507 |
| 2022-01-02 | Veneto                |    12408 |                6737.0000 |                            13169 |
...
too many lines. Truncated!
…

As you can notice, we defined two window functions. The special case of OVER() without any argument means the aggregate function is applied over the entire result set. The second OVER clause defines partitions based on the region field.

Calculate daily new cases: the non-aggregate functions

Since the table contains only the cumulative number of total cases, we would like to calculate the number of new cases on a per-day basis instead. This way, we can understand whether the pandemic is getting worse or improving.

We need to define a window function that spans the previous day’s row and calculates the difference between today’s total cases and yesterday’s total cases.

You can use the following query, where a new special “non-aggregate” function is used.

mysql> SELECT region, day, total_cases,
    -> total_cases - LAG(total_cases) OVER(PARTITION BY region ORDER BY day ASC) AS 'new_cases'
    -> FROM covid
    -> WHERE day BETWEEN '2022-02-01' AND '2022-02-09';
+-----------------------+------------+-------------+-----------+
| region                | day        | total_cases | new_cases |
+-----------------------+------------+-------------+-----------+
| Abruzzo               | 2022-02-01 |      219045 |      NULL |
| Abruzzo               | 2022-02-02 |      221638 |      2593 |
| Abruzzo               | 2022-02-03 |      224389 |      2751 |
| Abruzzo               | 2022-02-04 |      226689 |      2300 |
| Abruzzo               | 2022-02-05 |      228922 |      2233 |
| Abruzzo               | 2022-02-06 |      231001 |      2079 |
| Abruzzo               | 2022-02-07 |      232079 |      1078 |
| Abruzzo               | 2022-02-08 |      235013 |      2934 |
| Abruzzo               | 2022-02-09 |      237059 |      2046 |
| Basilicata            | 2022-02-01 |       65200 |      NULL |
| Basilicata            | 2022-02-02 |       66235 |      1035 |
| Basilicata            | 2022-02-03 |       67252 |      1017 |
| Basilicata            | 2022-02-04 |       68168 |       916 |
| Basilicata            | 2022-02-05 |       69209 |      1041 |
| Basilicata            | 2022-02-06 |       70007 |       798 |
| Basilicata            | 2022-02-07 |       70435 |       428 |
| Basilicata            | 2022-02-08 |       71610 |      1175 |
| Basilicata            | 2022-02-09 |       72521 |       911 |
| Calabria              | 2022-02-01 |      171780 |      NULL |
| Calabria              | 2022-02-02 |      173814 |      2034 |
| Calabria              | 2022-02-03 |      176029 |      2215 |
| Calabria              | 2022-02-04 |      177441 |      1412 |
| Calabria              | 2022-02-05 |      178709 |      1268 |
| Calabria              | 2022-02-06 |      180565 |      1856 |
| Calabria              | 2022-02-07 |      181634 |      1069 |
| Calabria              | 2022-02-08 |      184087 |      2453 |
| Calabria              | 2022-02-09 |      185666 |      1579 |
…
too many lines. Truncated
…

The LAG function is a “non-aggregate” function introduced for Window Functions usage only. It returns the value of its argument from the row lagging (preceding) the current row within the partition. If no offset is specified as an argument, the previous row is considered. That is exactly our case, because we need yesterday’s total cases to calculate the difference.

The LAG function can also be used as follows:

LAG(total_cases, 3)

In such a case, it returns the value of total_cases from the row that lags (precedes) the current row by three rows within its partition.
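
Symmetrically, MySQL also provides LEAD(), which looks forward instead of backward. As a sketch, the new-cases query could be written from the opposite direction, attributing the difference to the following day:

SELECT region, day,
       LEAD(total_cases) OVER(PARTITION BY region ORDER BY day ASC) - total_cases AS next_day_new_cases
FROM covid
WHERE day BETWEEN '2022-02-01' AND '2022-02-09';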

Non-aggregate functions

We have introduced the LAG() function as non-aggregate, but it is not the only one available. These functions perform, for each row from a query, a calculation using rows related to that row.

The following list comes from the official documentation; these are all the available non-aggregate functions you can use as Window Functions:

  • CUME_DIST(): cumulative distribution value
  • DENSE_RANK(): rank of the current row within its partition, without gaps
  • FIRST_VALUE(): value of the argument from the first row of the window frame
  • LAG(): value of the argument from the row lagging the current row within the partition
  • LAST_VALUE(): value of the argument from the last row of the window frame
  • LEAD(): value of the argument from the row leading the current row within the partition
  • NTH_VALUE(): value of the argument from the N-th row of the window frame
  • NTILE(): bucket number of the current row within its partition
  • PERCENT_RANK(): percentage rank value
  • RANK(): rank of the current row within its partition, with gaps
  • ROW_NUMBER(): number of the current row within its partition

Here is another example using some non-aggregate functions:

mysql> SELECT ROW_NUMBER() OVER(PARTITION BY region ORDER BY day ROWS UNBOUNDED PRECEDING) AS 'row number',
    -> region, day,
    -> FIRST_VALUE(home_isolation) OVER(PARTITION BY region ORDER BY day ROWS UNBOUNDED PRECEDING) AS 'first',
    -> LAST_VALUE(home_isolation) OVER(PARTITION BY region ORDER BY day ROWS UNBOUNDED PRECEDING) AS 'last'
    -> FROM covid
    -> WHERE day BETWEEN '2021-10-10' AND '2021-10-15';
+------------+-----------------------+------------+-------+-------+
| row number | region                | day        | first | last  |
+------------+-----------------------+------------+-------+-------+
|          1 | Abruzzo               | 2021-10-10 |  1439 |  1439 |
|          2 | Abruzzo               | 2021-10-11 |  1439 |  1360 |
|          3 | Abruzzo               | 2021-10-12 |  1439 |  1342 |
|          4 | Abruzzo               | 2021-10-13 |  1439 |  1332 |
|          5 | Abruzzo               | 2021-10-14 |  1439 |  1343 |
|          6 | Abruzzo               | 2021-10-15 |  1439 |  1374 |
|          1 | Basilicata            | 2021-10-10 |  1135 |  1135 |
|          2 | Basilicata            | 2021-10-11 |  1135 |  1117 |
|          3 | Basilicata            | 2021-10-12 |  1135 |  1104 |
|          4 | Basilicata            | 2021-10-13 |  1135 |  1102 |
|          5 | Basilicata            | 2021-10-14 |  1135 |  1055 |
|          6 | Basilicata            | 2021-10-15 |  1135 |  1028 |
|          1 | Calabria              | 2021-10-10 |  2818 |  2818 |
|          2 | Calabria              | 2021-10-11 |  2818 |  2752 |
|          3 | Calabria              | 2021-10-12 |  2818 |  2734 |
|          4 | Calabria              | 2021-10-13 |  2818 |  2707 |
|          5 | Calabria              | 2021-10-14 |  2818 |  2707 |
|          6 | Calabria              | 2021-10-15 |  2818 |  2738 |
|          1 | Campania              | 2021-10-10 |  6206 |  6206 |
|          2 | Campania              | 2021-10-11 |  6206 |  6024 |
|          3 | Campania              | 2021-10-12 |  6206 |  5843 |
|          4 | Campania              | 2021-10-13 |  6206 |  5797 |
|          5 | Campania              | 2021-10-14 |  6206 |  5881 |
|          6 | Campania              | 2021-10-15 |  6206 |  5900 |
…
too many lines. Truncated
…


For more details about non-aggregate functions, look at the official documentation: https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html


Named windows

Take a look at the last query executed. We used multiple non-aggregate functions and defined the same window multiple times. We can simplify the query by defining the window only once and giving it a specific name. Then we can use that name to reference the window without repeating the definition.

The previous query can then be rewritten as follows:

mysql> SELECT ROW_NUMBER() OVER wf AS 'row_number',
    -> region, day,
    -> FIRST_VALUE(home_isolation) OVER wf AS 'first',
    -> LAST_VALUE(home_isolation) OVER wf AS 'last'
    -> FROM covid
    -> WHERE day BETWEEN '2021-10-10' AND '2021-10-15'
    -> WINDOW wf AS (PARTITION BY region ORDER BY day ROWS UNBOUNDED PRECEDING);
+------------+-----------------------+------------+-------+-------+
| row_number | region                | day        | first | last  |
+------------+-----------------------+------------+-------+-------+
|          1 | Abruzzo               | 2021-10-10 |  1439 |  1439 |
|          2 | Abruzzo               | 2021-10-11 |  1439 |  1360 |
|          3 | Abruzzo               | 2021-10-12 |  1439 |  1342 |
|          4 | Abruzzo               | 2021-10-13 |  1439 |  1332 |
|          5 | Abruzzo               | 2021-10-14 |  1439 |  1343 |
|          6 | Abruzzo               | 2021-10-15 |  1439 |  1374 |
|          1 | Basilicata            | 2021-10-10 |  1135 |  1135 |
|          2 | Basilicata            | 2021-10-11 |  1135 |  1117 |
|          3 | Basilicata            | 2021-10-12 |  1135 |  1104 |
|          4 | Basilicata            | 2021-10-13 |  1135 |  1102 |
|          5 | Basilicata            | 2021-10-14 |  1135 |  1055 |
|          6 | Basilicata            | 2021-10-15 |  1135 |  1028 |
…
too many lines. Truncated
…


You create the named window using the WINDOW clause at the end of the query. You provide a name and the partition definition; then you can reference that window simply by name in the OVER clauses.

Multiple named windows can be created in the same query, as in the following example:

mysql> SELECT region, day,
    -> FIRST_VALUE(home_isolation) OVER wf_full AS 'first_full',
    -> FIRST_VALUE(home_isolation) OVER wf_last4days AS 'first_last4days'
    -> FROM covid
    -> WHERE day BETWEEN '2021-12-01' AND '2021-12-10'
    -> WINDOW wf_full AS (PARTITION BY region ORDER BY day ROWS UNBOUNDED PRECEDING),
    -> wf_last4days AS (PARTITION BY region ORDER BY day ROWS BETWEEN 4 PRECEDING AND 0 FOLLOWING);
+-----------------------+------------+------------+-----------------+
| region                | day        | first_full | first_last4days |
+-----------------------+------------+------------+-----------------+
| Abruzzo               | 2021-12-01 |       4116 |            4116 |
| Abruzzo               | 2021-12-02 |       4116 |            4116 |
| Abruzzo               | 2021-12-03 |       4116 |            4116 |
| Abruzzo               | 2021-12-04 |       4116 |            4116 |
| Abruzzo               | 2021-12-05 |       4116 |            4116 |
| Abruzzo               | 2021-12-06 |       4116 |            4376 |
| Abruzzo               | 2021-12-07 |       4116 |            4593 |
| Abruzzo               | 2021-12-08 |       4116 |            4776 |
| Abruzzo               | 2021-12-09 |       4116 |            4958 |
| Abruzzo               | 2021-12-10 |       4116 |            4920 |
| Basilicata            | 2021-12-01 |       1046 |            1046 |
| Basilicata            | 2021-12-02 |       1046 |            1046 |
| Basilicata            | 2021-12-03 |       1046 |            1046 |
| Basilicata            | 2021-12-04 |       1046 |            1046 |
| Basilicata            | 2021-12-05 |       1046 |            1046 |
| Basilicata            | 2021-12-06 |       1046 |            1050 |
| Basilicata            | 2021-12-07 |       1046 |            1094 |
| Basilicata            | 2021-12-08 |       1046 |            1101 |
| Basilicata            | 2021-12-09 |       1046 |            1125 |
| Basilicata            | 2021-12-10 |       1046 |            1120 |
| Calabria              | 2021-12-01 |       4374 |            4374 |
| Calabria              | 2021-12-02 |       4374 |            4374 |
| Calabria              | 2021-12-03 |       4374 |            4374 |
| Calabria              | 2021-12-04 |       4374 |            4374 |
| Calabria              | 2021-12-05 |       4374 |            4374 |
| Calabria              | 2021-12-06 |       4374 |            4537 |
| Calabria              | 2021-12-07 |       4374 |            4701 |
| Calabria              | 2021-12-08 |       4374 |            5050 |
| Calabria              | 2021-12-09 |       4374 |            5152 |
| Calabria              | 2021-12-10 |       4374 |            5233 |
… (output truncated)

 

Frame specification

As already shown in the queries we have tested so far, we can provide a frame clause when defining the Window Function. A frame is a subset of the current partition and the frame clause specifies how to define the subset.

Frames are determined with respect to the current row, which enables a frame to move within a partition depending on the location of the current row within its partition.

mysql> SELECT region, day,
    -> SUM(intensive_care) OVER(PARTITION BY region ORDER BY day ROWS UNBOUNDED PRECEDING) AS 'running_total',
    -> AVG(intensive_care) OVER(PARTITION BY region ORDER BY day ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS 'running_average'
    -> FROM covid
    -> WHERE day BETWEEN '2022-01-01' AND '2022-01-10';
+-----------------------+------------+---------------+-----------------+
| region                | day        | running_total | running_average |
+-----------------------+------------+---------------+-----------------+
| Abruzzo               | 2022-01-01 |            21 |         21.5000 |
| Abruzzo               | 2022-01-02 |            43 |         21.6667 |
| Abruzzo               | 2022-01-03 |            65 |         22.3333 |
| Abruzzo               | 2022-01-04 |            88 |         23.0000 |
| Abruzzo               | 2022-01-05 |           112 |         23.6667 |
| Abruzzo               | 2022-01-06 |           136 |         24.3333 |
| Abruzzo               | 2022-01-07 |           161 |         25.3333 |
| Abruzzo               | 2022-01-08 |           188 |         26.3333 |
| Abruzzo               | 2022-01-09 |           215 |         26.3333 |
| Abruzzo               | 2022-01-10 |           240 |         26.0000 |
| Basilicata            | 2022-01-01 |             1 |          1.5000 |
| Basilicata            | 2022-01-02 |             3 |          1.6667 |
| Basilicata            | 2022-01-03 |             5 |          2.3333 |
| Basilicata            | 2022-01-04 |             8 |          2.6667 |
| Basilicata            | 2022-01-05 |            11 |          2.6667 |
| Basilicata            | 2022-01-06 |            13 |          2.3333 |
| Basilicata            | 2022-01-07 |            15 |          2.0000 |
| Basilicata            | 2022-01-08 |            17 |          2.0000 |
| Basilicata            | 2022-01-09 |            19 |          2.0000 |
| Basilicata            | 2022-01-10 |            21 |          2.0000 |
| Calabria              | 2022-01-01 |            28 |         28.0000 |
| Calabria              | 2022-01-02 |            56 |         28.3333 |
| Calabria              | 2022-01-03 |            85 |         28.6667 |
| Calabria              | 2022-01-04 |           114 |         29.0000 |
| Calabria              | 2022-01-05 |           143 |         29.3333 |
| Calabria              | 2022-01-06 |           173 |         30.3333 |
| Calabria              | 2022-01-07 |           205 |         31.3333 |
| Calabria              | 2022-01-08 |           237 |         32.6667 |
| Calabria              | 2022-01-09 |           271 |         33.3333 |
| Calabria              | 2022-01-10 |           305 |         34.0000 |
… (output truncated)

 

The boundaries of the frame specification can include the following (a combined example follows the list):

  • UNBOUNDED PRECEDING: the frame starts at the first row of the partition.
  • N PRECEDING: the frame starts N rows before the current row. N can be a literal number or an expression that evaluates to a number.
  • CURRENT ROW: the frame starts or ends at the current row.
  • UNBOUNDED FOLLOWING: the frame ends at the last row of the partition.
  • N FOLLOWING: the frame ends N rows after the current row.
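
As an illustration, here is a sketch (using the same covid table) that combines two of these boundaries to sum each region's intensive care counts from the current day through the end of the period:

mysql> SELECT region, day,
    -> SUM(intensive_care) OVER(PARTITION BY region ORDER BY day
    -> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS 'remaining_total'
    -> FROM covid
    -> WHERE day BETWEEN '2022-01-01' AND '2022-01-10';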

 

Conclusion

Window Functions are an amazing feature introduced in MySQL 8.0. They permit us to solve specific problems with fewer and more efficient queries. Without window functions, you may be forced to create temporary tables or inefficient views and run multiple queries to achieve the same result.

For more details, you can take a look at the official documentation:
https://dev.mysql.com/doc/refman/8.0/en/window-functions.html

Test the Window Functions on Percona Server for MySQL 8.0.

Aug
10
2022
--

Building Percona Server for MySQL 8.0 with RocksDB Storage Engine on macOS

RocksDB Storage Engine on macOS

In Percona Server for MySQL 8.0.29-21, we added one more patch that helps us build the server code on macOS. To be precise, we could already do this before the patch, but only partially. Now it is possible to build the RocksDB Storage Engine as well.

A word of disclaimer: at the moment, by macOS we mean macOS on the Intel x86_64 architecture (the most recent ARM machines with Apple M1 / M2 processors are out of the scope of this blog post). Moreover, Percona does not provide official macOS packages for Percona Server for MySQL. It is just that a lot of our developers use Macs, and we care a lot about this platform. As proof, here is a link to our Azure Pipelines CI scripts with the latest CI changes that accompany this patch.

Prerequisites

In order to build Percona Server for MySQL, we need to install a number of dependencies. The most important one is Xcode, which can be downloaded and installed directly from the App Store. Make sure that you run it at least once after installing. You will be asked to install the Xcode Command Line Tools – please answer “yes”, as this is the component we are looking for.
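
If that prompt does not appear, you can also trigger the Command Line Tools installation manually from a terminal:

xcode-select --install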

We will also need to install a number of third-party libraries on which Percona Server for MySQL code depends and the easiest way to do this would be via the brew package manager. If you don’t already have it installed, please follow the instructions.

After that, install the following tools/libraries via brew:

brew install cmake zstd openssl@1.1 libevent lz4 icu4c protobuf libfido2

Please also note that macOS itself, the Xcode Command Line Tools, and the brew packages are constantly updated, so by the time you follow these instructions their versions may differ from those used when this blog post was written, and something may not go as expected. Just in case, here is an excerpt from the brew config output to give you an idea of how old those components were:

HOMEBREW_VERSION: 3.5.4-52-g0070591
ORIGIN: https://github.com/Homebrew/brew
HEAD: 007059160f1a9d7afba296e9aa30ab52d4ef29b7
Core tap ORIGIN: https://github.com/Homebrew/homebrew-core
Core tap HEAD: 83ce9a03239fe496e292f23a576bc7d1bcea4cca
Core tap branch: master
Clang: 13.1.6 build 1316
Git: 2.32.1 => /Library/Developer/CommandLineTools/usr/bin/git
Curl: 7.79.1 => /usr/bin/curl
macOS: 12.4-x86_64
Xcode: 13.4.1

Building

The first step is to create a workspace directory and make it the current directory:

mkdir ws
cd ws

Second, clone the Percona Server for MySQL source code from the git repository. In these instructions, we will check out the code marked with the Percona-Server-8.0.29-21 tag (just because it was the most recent release when this blog post was written), but feel free to experiment with the trunk (the head of the 8.0 branch).

git clone https://github.com/percona/percona-server.git
cd percona-server
git checkout -b current Percona-Server-8.0.29-21
git submodule init
git submodule update

Then, we create a separate directory, percona-build, for intermediate objects and the resulting libraries/executables.

cd ..
mkdir percona-build
cd percona-build

After that, we need to run cmake to configure the project and generate Unix Makefiles.

cmake ../percona-server \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DBUILD_CONFIG=mysql_release \
  -DMYSQL_MAINTAINER_MODE=OFF \
  -DDOWNLOAD_BOOST=ON \
  -DWITH_BOOST=../deps \
  -DWITH_SYSTEM_LIBS=ON \
  -DWITHOUT_TOKUDB=ON \
  -DWITH_ROCKSDB=ON

I am not going to describe every parameter here, as there is an official MySQL Server documentation page, but the most important ones are:

  • -DCMAKE_BUILD_TYPE=RelWithDebInfo – configure to build optimized binaries with Debug information
  • -DMYSQL_MAINTAINER_MODE=OFF – do not treat compiler warnings as errors
  • -DWITH_SYSTEM_LIBS=ON – use system libraries (those we installed via the brew) instead of their bundled version
  • -DWITHOUT_TOKUDB=ON – do not build TokuDB Storage Engine, it is deprecated and is incompatible with macOS
  • -DWITH_ROCKSDB=ON – build with RocksDB Storage Engine (the main reason for this blog post)

And finally, initiate the build process:

cmake --build . -- -j

Please be patient as this may take from 15 to 60 min depending on your hardware.

Testing

Just to make sure that everything has been built successfully, let us run a few MySQL tests via MTR:

cd mysql-test
./mysql-test-run.pl --debug-server rocksdb.1st

If you see something like this:

==============================================================================
                  TEST NAME                       RESULT  TIME (ms) COMMENT
------------------------------------------------------------------------------
[ 25%] rocksdb.1st 'write_prepared'              [ pass ]     89
[ 50%] rocksdb.1st 'write_unprepared'            [ pass ]     85
[ 75%] rocksdb.1st 'write_committed'             [ pass ]     85
[100%] shutdown_report                           [ pass ]
------------------------------------------------------------------------------

Congratulations, you have successfully built Percona Server for MySQL with RocksDB on macOS.

Aug
10
2022
--

MySQL 8.0 Dynamic Redo Log Sizing

Dynamic Redo Log Sizing

This blog post will discuss the newest feature available in MySQL 8.0.30: dynamic redo log sizing. After the InnoDB buffer pool size, a proper redo log size is arguably the most crucial setting for MySQL performance. There are numerous blog posts about how to calculate a good redo log size. One of our classic posts is this one from Baron: How to calculate a good InnoDB log file size (that post is from 2008, and its formula is still valid).

Now, what problem is this feature trying to solve?

In MySQL, the buffer pool and redo log size are settings that do not change frequently. Usually, these settings are set during database installation; after that, they are forgotten until an issue arises. The problem is that they used to be static, which means that you had to restart MySQL so the changes could take effect.

MySQL solved the buffer pool problem with dynamic resizing in MySQL 5.7. Now, MySQL 8.0.30 solves the redo log problem. Resizing is simple and straightforward:

mysql> SET GLOBAL innodb_redo_log_capacity = 2*1024*1024*1024;

This setting can resize the redo log both to a higher and to a lower value than the currently defined one. To make this work, the redo log files now reside in a new directory inside the datadir named #innodb_redo, unless a different directory is specified by the innodb_log_group_home_dir variable. There are two types of redo log files, ordinary and spare. Ordinary redo log files are those being used. Spare redo log files are those waiting to be used. InnoDB maintains 32 redo log files in total, with each file equal in size to 1/32 * innodb_redo_log_capacity; however, file sizes may differ for a time after modifying the innodb_redo_log_capacity setting.

Redo log files use a #ib_redoN naming convention, where N is the redo log file number. Spare redo log files are denoted by a _tmp suffix. The following example shows the redo log files in a #innodb_redo directory, where 18 active redo log files and 14 spare redo log files are numbered sequentially.

$ ls
#ib_redo31  #ib_redo33 #ib_redo35  #ib_redo37 #ib_redo39  #ib_redo41 #ib_redo43  #ib_redo45 #ib_redo47  #ib_redo49_tmp  #ib_redo51_tmp  #ib_redo53_tmp  #ib_redo55_tmp  #ib_redo57_tmp  #ib_redo59_tmp  #ib_redo61_tmp
#ib_redo32  #ib_redo34 #ib_redo36  #ib_redo38 #ib_redo40  #ib_redo42 #ib_redo44  #ib_redo46 #ib_redo48  #ib_redo50_tmp  #ib_redo52_tmp  #ib_redo54_tmp  #ib_redo56_tmp  #ib_redo58_tmp  #ib_redo60_tmp  #ib_redo62_tmp
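
You can also inspect the active redo log files from SQL via the performance_schema.innodb_redo_log_files table introduced in MySQL 8.0.30, for example:

mysql> SELECT FILE_NAME, START_LSN, END_LSN, SIZE_IN_BYTES
    -> FROM performance_schema.innodb_redo_log_files;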

Several status variables are provided for monitoring the redo log and redo log capacity resize operations; for example, you can query Innodb_redo_log_resize_status to view the status of a resize operation:

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_redo_log_resize_status';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
| Innodb_redo_log_resize_status |  OK   |
+-------------------------------+-------+

The Innodb_redo_log_capacity_resized status variable shows the current redo log capacity limit:

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_redo_log_capacity_resized';
+----------------------------------+------------+
| Variable_name                    | Value      |
+----------------------------------+------------+
| Innodb_redo_log_capacity_resized | 3221225472 |
+----------------------------------+------------+

Other status variables include Innodb_redo_log_checkpoint_lsn, Innodb_redo_log_current_lsn, Innodb_redo_log_flushed_to_disk_lsn, Innodb_redo_log_logical_size, and Innodb_redo_log_physical_size.

MySQL 8.0.30 also introduced some new error messages related to redo log resizing. In case you want to check for more, you can use this link. All redo log error-related messages use the ER_IB_MSG_LOG_WRITER_ naming convention.

Estimating the redo log capacity

With these new status variables, we can estimate the redo log size using the following queries, which sample the current LSN twice, 60 seconds apart:

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_redo_log_current_lsn'; SELECT SLEEP(60); SHOW GLOBAL STATUS LIKE 'Innodb_redo_log_current_lsn';

And with the two LSN values provided by the previous queries, we compute the amount of redo written per minute:

mysql> SELECT ABS(20641693317 - 20903377487) / 1024 / 1024 AS MB_per_min;
+--------------+
| MB_per_min   |
+--------------+
| 249.56147194 |
+--------------+
1 row in set (0.00 sec)

So, in this case, with around ~250MB per minute being written to the redo log files, a good value would be 250MB * 60 minutes (one hour of redo), which is roughly 15GB.
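
With that estimate in hand, applying the new capacity uses the same statement shown earlier, e.g.:

mysql> SET GLOBAL innodb_redo_log_capacity = 15*1024*1024*1024;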

Deprecated parameters

To make this work, MySQL deprecated two parameters: innodb_log_files_in_group and innodb_log_file_size.

If you try to set them in the my.cnf file, they will be ignored, and a warning will be printed in the error log:

2022-08-07T20:23:39.898370Z 0 [Warning] [MY-013869] [InnoDB] Ignored deprecated configuration parameter innodb_log_file_size. Used innodb_redo_log_capacity instead.
2022-08-07T20:23:39.898441Z 0 [Warning] [MY-013870] [InnoDB] Ignored deprecated configuration parameter innodb_log_files_in_group. Used innodb_redo_log_capacity instead.
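
If you want the setting to persist across restarts, a minimal my.cnf excerpt replacing the deprecated parameters could look like this (the value is illustrative):

[mysqld]
# replaces innodb_log_file_size and innodb_log_files_in_group
innodb_redo_log_capacity = 8G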

Conclusion

Dynamic redo log sizing brings more flexibility to DBAs who need to resize the redo log when the application cannot stop for a MySQL restart. At the time of this writing, there are no bug reports related to this feature. One piece of advice: check whether your backup tools are compatible with MySQL 8.0.30 and its new features. Lastly, always review your existing my.cnf for the deprecated parameters.

