Apr
27
2022
--

A Quick Peek At MySQL 8.0.29

review of MySQL 8.0.29

MySQL 8.0.29Oracle released MySQL Server 8.0.29 on April 26th and this is a quick review of the release notes.  I have put my own comments in italics.

So what is in the ’29 release of MySQL Server?  Does it come festooned with new, neat features or is it a big bug-fix bonanza?

The TL;DR

While this server release has some interesting stuff, there is no compelling feature that will necessitate an immediate upgrade.  Read through the release notes to see if anything in there that is a must for you but for most of us, MySQL 8.0.29 does not require an immediate update.  If this was a birthday or holiday present, ’29 is the equivalent of getting a fresh box of dental floss – useful but not thrilling.

The shell is evolving too and the new version for VS Code looks promising.

MySQL Server 8.0.29

UTF8MB3?

The server now makes extensive use of UTF8MB3 (yes 3, not 4)  for the output of SHOW commands, reporting on invalid strings, when populating data dictionary tables from built-in character sets

Goodbye Arbitrary Delimiters in DATETIME/TIMESTAMP

The use of any nonstandard or excess delimiter or whitespace characters now triggers a warning.  MySQL used to allow you to use arbitrary delimiters and use an arbitrary number of them in TIME, DATE, DATETIME, and TIMESTAMP literals plus an arbitrary number of whitespaces before, after, and between the date and time values in DATETIME and TIMESTAMP literals. Those halcyon days are now over.  Expect this ability to be removed in a future release.

Deprecated Variables & Options

The replica_parallel_type, query_prealloc_size, and transaction_prealloc_size variables are deprecated with the mysqld options of  –abort-slave-event-count and –disconnect-slave-event-count.

PRIMARY and SECONDARY Engine

The Performance Schema now tracks the PRIMARY and SECONDARY engine used to process a query. Usually, this will be InnoDB and HeatWave. There is now an EXECUTION_ENGINE column to record which engine did the work with a COUNT_SECONDARY column to know how many times that engine was used.

SSL Libraries

Version bumped from1.1.1l o 1.1.1.n.

IF NOT EXITS

We get IF NOT EXISTS checks on function, procedure, and triggers. We needed this for a l-o-n-g time.

Detached XA Transactions

Group replication had a problem where XA transactions prepared on another connection could not be committed.  A prepared XA transaction can now be committed or rolled back by another connection. The current session can also then initiate another XA or local transaction without waiting for the prepared XA transaction to complete.

Classic and X Protocol Lock Files

It was possible that the lock file from either the MySQL Classic or X Protocol, which have different formats, could keep a server from starting. But now they use the same format.

Instant Drop Table

ALTER TABLE … DROP COLUMN  ALGORITHM=INSTANT is now available.

Secure Persisted Variables

Rather than use a keyring, we can now have secure persisted variables. You still need a keyring component instead of the keyring plugin to hold the encrypted information. So the data stays with the operating system in a file, not the keyring.

Clone Delayed

Cloning data too after dropping the old data is problematic for some operating systems that free things up asynchronously. So the new clone_delay_after_data_drop variable lets you wait up to an hour before your clone arrives.  Too bad we don’t get an Uber-like ‘your clone arriving in x minutes’ status update.

MyISAM

The myisam_repair_threads variable and myisamchk –parallel-check option is deprecated.

Isolated Variables

Better code for parsing, resolution, and execution of SQL statements now better enforces the isolation of access to system variables.

Bug Fixes

There are many bug fixes that take care of things from anonymous users with the PROCESS privilege could not select rows from the processlist table to histograms now storing buckets in dynamic arrays instead to reduce space overhead and will speed up selectivity estimation due to reduced indirection when performing a binary search on the buckets. I recommend reading through the bug fixes with each new release because they can trigger an ‘ah-ha’ moment when you find something that is impacting your instances that were on the edge of your awareness or provide a flood of endorphins from the schadenfreude of others’ problems.

MySQL Shell

MySQL Shell has a  new diagnostics utility, util.debug.collectDiagnostics, that gathers diagnostic information from the connected MySQL server. You can get reports in TSV and YAML formats before them in a zip archive in the location of your choice. This utility enables you to retrieve diagnostic information from standalone servers, members of replication topologies, MySQL Database Service DB Systems, and so on.

And there is a shell, prompt() function to provide a way for scripts to interact with a user.

Apr
27
2022
--

Working With Large PostgreSQL Databases

Working With Large PostgreSQL Databases

It’s a funny thing when the topic of database sizes comes up. Calling one small, medium, large, or even huge isn’t as straightforward as you’d think. Distinguishing the size of a database is based upon a number of factors whose characteristics can be classified as either “tangible”, things that you can measure in an objective manner, or “intangible”, those attributes best expressed using the catch-all phrase “it depends”. For example, a 2TB database is, to many people, a large database. On the other hand, a veteran DBA could describe a PostgreSQL database cluster as large when it enters the realm of Peta-Bytes.

Here’s a recap of some of PostgreSQL’s basic capabilities:

database size

unlimited

number of databases

4,294,950,911

relations per database

1,431,650,303

table size

32TB

rows per table, defined by the number
of tuples that can fit onto the page

4,294,967,295 pages

field per table

1,600

field size

1GB

identifier length

63 bytes

indexes per table

unlimited

columns per index

32

partition keys

32

NB: Despite possible physical constraints one faces when creating large numbers of schema, there is no theoretical limitation to the number created in postgres.

I’ve come to differentiate a small database from a large one using the following caveats. And while it is true that some of the caveats for a large database can be applied to a small one, and vice-versa, the fact of the matter is that most of the setups out there in the wild follow these observations:

  1. Small databases are often administered by a single person
  2. Small databases can be managed manually.
  3. Small databases are minimally tuned.
  4. Small databases can typically tolerate production inefficiencies more than large ones.
  5. Large databases are managed using automated tools.
  6. Large databases must be constantly monitored and go through an active tuning life cycle.
  7. Large databases require rapid response to imminent and ongoing production issues to maintain optimal performance.
  8. Large databases are particularly sensitive to technical debt.

Large databases often bring up the following questions and issues:

  • Is the system performance especially sensitive to changes in production load?
  • Is the system performance especially sensitive to minor tuning effects?
  • Are there large amounts of data churn?
  • Does the database load system saturate your hardware’s capabilities?
  • Do the maintenance activities, such as logical backups and repacking tables, take several days or even more than a week?
  • Does your Disaster Recovery Protocol require having a very small Recovery Point Objective (RPO) or Recovery Time Objective (RTO)?

The key difference between a small vs large database is how they are administered:

  1. Whereas it is common that small databases are manually administered, albeit it’s not best practice, using automation is the industry default mode of operation in many of these situations for large databases.
  2. Because circumstances can change quickly, large databases are particularly sensitive to production issues.
  3. Tuning is constantly evolving; while it is true that newly installed architectures are often well-tuned, circumstances change as they age and large databases are especially vulnerable.

Good planning is your friend: addressing potential issues for a large database by anticipating future conditions is the goal i.e. testing the entire infrastructure before it goes into production. 

Scripting your build environment using tools such as Ansible, Puppet, Terraform, etc. mitigates human error when provisioning the underlying infrastructure. It’s important to be able to build in a consistent and repeatable manner.

Once a database is in production it must be monitored and wired with alerts for the various critical thresholds. Aside from the standard errors, consider configuring your monitoring solution to follow the “Rule Of Three”. Select and watch only three metrics that track and alert for a specific “change of state”. This is not to be confused with following a particular issue, rather it is meant to inform you that you should pay attention to your system in order to understand that something has changed from what is considered normal behavior. Depending on your preferences you may want to watch for known production issues or when the system is stable you might be more interested in trending alerts such as query performance which have slowed below a predefined threshold.

In regards to system tuning: while small databases can, after a fashion, perform in a satisfactory manner using the default values large databases cannot. Configuring initial tuning parameters such as the shared_buffers etc is de rigueur but you should also monitor the environment in order to trend issues such as for example bloat and long-term query performance. Remember, the most common problem experienced by an otherwise stable and well-thought-out architecture is table and index bloat. Addressing bloat by tuning the autovacuum characteristics is essential.

Monitoring, especially before and after maintenance windows, is required because they can catch potential problems to the update before becoming production issues.

Pay close attention to following the regular maintenance activities during the life-cycle of your system:

  • Logical backups and misc database redundancies
  • Architectural evolution:
    • application stack updates, upgrades, and rollbacks
    • application/database scaling
  • PostgreSQL server upgrades:
    • minor
    • major
  • Database migrations
  • Hardware upgrades
  • Scaling the cluster by adding and removing server nodes

Maintenance activities such as logical backups and PostgreSQL minor upgrades are performed at regular intervals.

Plan for space utilization requirements of logical dumps and WAL archives.

In regards to logical backups: it can be difficult to justify backing up an entire database when it can take a week. Alternatively, differential backups are a potential solution. Backing up tables that are updated and deleted regularly can be archived at a faster frequency than the slower changing tables which can be stored without changes for a longer period of time. This approach however requires the appropriate architectural design considerations such as using table partitioning. 

An alternative to logical backups is to consider Copy On Write (COW), or stacked file systems, such as ZFS and BTRFS. Environments within containers for example can leverage snapshots and clones allowing for near-instant recoveries in a disaster recovery scenario.

Complex operations, such as hardware and database scaling, encompass many sub-activities and can often involve working with several teams at the same time. In this case, maintaining reference documentation is critical. Activities such as these are best tracked and planned in a Kanban, or Scrum, environment.

In regards to Disaster Recovery (DR) consider automating the following operations:

  • Recovery via Logical backups
  • Failover of a PRIMARY to a REPLICA
  • Dropping/Building a new REPLICA
  • Point In Time Recovery (PITR): rebuilding a cluster to a point in time

As an aside to PITR: instead of rebuilding an entire data cluster from scratch to a particular point in time, one can instead create a STANDBY host that is replicated on a delay and can be recovered to a particular point in time or promoted in its current state. Refer to run-time parameter recovery_min_apply_delay for more information.

In conclusion, while small databases can be managed by administrating in an ad hoc manner, the administration of a large database must always be performed using a more rigorous and conscientious approach. And what you learn from administering a large database can be carried over to administering a small one.

REFERENCES:

Apr
26
2022
--

Tommy Bahama Kids Beach Chair

Summertime is right around the corner, and your little ones will be spending plenty of time outdoors. Make sure they’re prepared with quality kids’ beach chair. Tommy Bahama has you covered with their selection of beach chairs for kids.

These chairs are perfect for the beach, pool, or even the backyard. They’re lightweight and easy to carry, so your kids can take them anywhere. The chair also has a built-in cup holder, so they can keep their drinks close by. The Tommy Bahama Kids Beach Chair is available in a variety of fun, bright colors.

What is Tommy Bahama?

Tommy Bahama is a lifestyle brand that offers casual, sophisticated apparel and accessories for the whole family. The company was founded in 1993 and is headquartered in Seattle, Washington. Tommy Bahama is best known for its island-inspired clothing, furniture, and home decor. After all, the brand’s mission is to “bring the spirit of the islands to life.” Tommy Bahama products are available online and in retail stores across the United States.

The company offers various designs for kids’ beach chairs, each with its own unique features. The kids’ chairs are made from durable materials that can withstand the elements. The chairs are also easy to clean, so you don’t have to worry about getting dirty at the beach.

Best Tommy Bahama Kids Beach Chairs

Let’s look at some of the best Tommy Bahama kids’ beach chairs. All of those are easily available on Amazon, and you can be sure to find the perfect chair for your child at the right price.

Tommy Bahama 5 Position Classic Lay Flat Beach Chair

This Tommy Bahama beach chair is made from durable aluminum. Thanks to that, the chair is lightweight and easy to carry, making it perfect for the beach.

The chair has five different recline positions, so your child can find the perfect position for them. With this chair, it is even possible to lay completely flat, ideal for sunbathing or taking a nap.

The chair also has a cup holder, so your child can keep their drinks close by. The bar on the back of the chair doubles as a towel bar that keeps the beach towel off of the sand. That’s smart!

The Tommy Bahama 5 Position Classic Lay Flat Beach Chair is available in a variety of colors.

Tommy Bahama Hi-Boy 17″ Seat Height 4-Position Lace-Up Suspension Folding Backpack Beach Chair

This Tommy Bahama beach chair is a bit different from the others. It has a higher seat, making it perfect for taller kids or adults. The chair also has four different recline positions, so your child can be comfortable in any position.

The chair has a lace-up suspension system that makes it sturdy and comfortable. Other features of this beach are a drink holder and a storage pouch which is perfect for holding sunscreen, toys, or snacks.

Solid hardwood armrests are adjustable and provide added comfort. They are also safe, as they offer the patented no-pinch feature that protects fingers.

Like the previous kids’ beach chair, this one also includes a padded shoulder strap for easy carrying.

Tommy Bahama 5-Position Classic Lay Flat Folding Backpack Beach Chair

This Tommy Bahama beach chair seems to combine the features of the two previous chairs. However, there is one key difference: the low-seating option.

With this chair, your child can sit low to the ground, making it easier to get in and out of the chair. As with the other chairs, this one also has a cup holder and a towel bar. A storage pouch follows, this time with a built-in cooler.

Thanks to the padded shoulder straps, the Tommy Bahama beach chair can be carried like a backpack.

Looking for a kids’ beach chair?

If you are looking for a kids’ beach chair, Tommy Bahama is a great option. The chairs are made from durable materials, they are easy to clean and come in various fun, bright colors. Most importantly, they will help your child enjoy the beach to the fullest.

Not sure about Tommy Bahama beach chairs? Check out our guide to the best kids’ beach chairs for more options.

The post Tommy Bahama Kids Beach Chair appeared first on Comfy Bummy.

Apr
25
2022
--

Talking Drupal #344 – OTC – Cypress, New tools and more

Today we are talking about Cypress, New Tools, and more.

www.talkingDrupal.com/344

Topics

  • Cypress
  • Process
  • HackMD
  • Quant UX
  • Non Drupal
    • SvelteKit
    • Laravel
    • NodeJS
    • Native Web Components

Resources

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan John Picozzi – www.epam.com @johnpicozzi Chris Wells – redfinsolutions.com@chrisfromredfin

MOTW

Flush Plays a fun flushing sound when the cache is flushed. Sound is configurable in the admin settings under UI.

Apr
22
2022
--

Zero Impact on Index Creation with Amazon Aurora 3

Zero Impact on Index Creation with Aurora 3

Zero Impact on Index Creation with Aurora 3In the last quarter of 2021, AWS released Aurora version 3. This new version aligns Aurora with the latest MySQL 8 version, porting many of the advantages MySQL 8 has over previous versions.

While this brings a lot of new interesting features for Aurora, what we are going to cover here is to see how DDLs behave when using the ONLINE option. With a quick comparison with what happens in MySQL 8 standard and with Group Replication.

Tests

All tests were run on an Aurora instance r6g.large with a secondary availability zone. The test was composed of:

        Four connections

    • #1 to perform DDL
    • #2 to perform insert data in the table I am altering
    • #3 to perform insert data on a different table 
    • #4 checking the other node operations

In the Aurora instance, a sysbench schema with 10 tables and five million rows was created, just to get a bit of traffic. While the test table with 5ml rows as well was:

CREATE TABLE `windmills_test` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `uuid` char(36) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `millid` smallint NOT NULL,
  `kwatts_s` int NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `active` tinyint NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`,`active`),
  KEY `IDX_active` (`id`,`active`),
  KEY `kuuid_x` (`uuid`),
  KEY `millid_x` (`millid`),
  KEY `active_x` (`active`),
  KEY `idx_1` (`uuid`,`active`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC

The executed commands:

Connection 1:
    ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
    ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;
    
Connection 2:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connection 3:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills3  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connections 4:
     while [ 1 = 1 ];do echo "$(date +'%T.%3N')";mysql --defaults-file=./my.cnf -h <secondary aurora instance> -D windmills_large -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_large)"|grep -i -v localhost;sleep 1;done

Operations:
1) start inserts from connections
2) start commands in connections 4 – 5 on the other nodes
3) execute: DC1-1(root@localhost) [windmills_large]>ALTER TABLE windmills_test ADD INDEX idx_1 (uuid,active), ALGORITHM=INPLACE, LOCK=NONE;

With this, what I was looking to capture is the operation impact in doing a common action as creating an Index. My desired expectation is to have no impact when doing operations that are declared “ONLINE” such as creating an index, as well as data consistency between nodes.

Let us see what happened…

Results

While running the insert in the same table, performing the alter:

mysql>  ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
Query OK, 0 rows affected (16.51 sec)
Records: 0  Duplicates: 0  Warnings: 0

It is NOT stopping the operation in the same table or any other table in the Aurora instance.

We can only identify a minimal performance impact:

[root@ip-10-0-0-11 tmp]# while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done
.347
.283
.278
.297
.291
.317
.686  ? start
<Snip>
.512  ? end
.278
.284
.279

The secondary node is not affected at all, and this is because Aurora managed at the storage level the data replication. There is no such thing as Apply from Relaylog, as we have in standard MySQL asynchronous or data replicated with Group Replication.  

The result is that in Aurora 3, we can have zero impact index (or any other ONLINE/INSTANT) operation, with this I include the data replicated in the other instances for High Availability. 

If we compare this with Group replication (see blog):

GR         Aurora 3
Time on hold for insert for altering table   	~0.217 sec   ~0 sec
Time on hold for insert for another table   	~0.211 sec   ~0 sec

However, keep in mind that MySQL with Group Replication will still need to apply the data on the Secondaries. This means that if your alter was taking 10 hours to build the index, the Secondary nodes will be misaligned with the Source for approximately another 10 hours. 

With Aurora 3 or with Percona XtraDB Cluster (PXC), changes will be there when Source has completed the operation.    

What about PXC? Well, we have a different scenario:

PXC(NBO)     Aurora 3
Time on hold for insert for altering table   	~120 sec      ~0 sec
Time on hold for insert for another table   	~25  sec      ~0 sec

We will have a higher impact while doing the Alter operation, but the data will be on all nodes at the same time maintaining a high level of consistency in the cluster. 

Conclusion

Aurora is not for all uses, and not for all budgets. However, it has some very good aspects like the one we have just seen. The difference between standard MySQL and Aurora is not in the time of holding/locking (aka operation impact) but in the HA aspects. If I have my data/structure on all my Secondary at the same time as the Source, I will feel much more comfortable than having to wait an additional T time.

This is why PXC in that case is a better alternative if you can afford the locking time. If not, well, Aurora 3 is your solution, just do your math properly and be conservative with the instance resources.

Apr
21
2022
--

Some Things to Consider Before Moving Your Database to the Cloud

Moving Your Database to the Cloud

Moving Your Database to the CloudBefore you transition your database environment to the cloud, there are a few considerations you should consider first.  Some of the touted benefits of the cloud also carry with them some risks or negative impacts.  Let’s take a look at just a few of these.

First, consider whether or not you will face vendor lock-in.  Many people choose Open Source databases precisely to avoid this.  The interesting fact, however, is that you can actually get locked in without realizing it.  Many cloud providers have their own versions of popular database platforms such as MySQL, PostgreSQL, MongoDB, etc.  These versions may be utilizing heavily engineered versions of these database systems.

While it is often easy to migrate your data into the environments, applications are often modified to adapt to unique aspects of these database platforms.  In some cases, this may be utilizing specialized features that were added while other times it can even be dealing with a lack of features with extra development.  This can occur because often the cloud versions may be based upon older codebases which may not contain all of the latest feature sets of the DBMS.  This makes migrating off the cloud more challenging as it may require changes to the code to go back.  In this sense, you may pragmatically be locked into a particular cloud database without even realizing it.

Also, consider the additional cost of time and resources if you need to re-engineer your application to work with the cloud platform.  It may not be as simple as simply migrating over.  Instead, you will need to do extensive testing and perhaps rewrite code to make it all work as it should.

cloud save moneyA common reason many migrate to the cloud is to save cost.  Some cloud providers cite projected savings of 50% or more due to not needing so much infrastructure, staff, etc.  While this is certainly possible, it is also possible that your costs could rise.  With the ease of creating and configuring new servers, it is easy to spin up more instances very quickly.  Of course, each of these instances is increasing your costs.  Without proper oversight and someone managing the spend, that monthly bill could quickly cause some sticker shock!

Storage and networking are areas that can easily increase costs in addition to sheer server instance counts.  Although storage costs are relatively cheap nowadays, think about what happens when teams set up additional test servers and leave large backups and datasets lying around.  And, of course, you have to pay networking costs as these large data sets are transferred from server to server.  The inter-connected networking of servers that used to be essentially “free” in your on-prem Data Center, is now generating costs.  It can add up quickly.

Moreover, with the “cheap” storage costs, archiving becomes less of a concern.  This is a real double-edged sword as not only do your costs increase, but your performance decreases when data sets are not archived properly.  Databases typically lose performance querying these enormous data sets and the additional time to update indexes negatively impacts performance.

Also, consider the loss of control.  In your on-prem databases, you control the data entirely.  Security is completely your responsibility.  Without a doubt, cloud providers have built their systems around security controls and this can be a huge advantage.  The thing you have to consider is that you really don’t know who is managing the systems housing your data.  You lose insight into the human aspects and this cannot be discounted.

In addition, if you have components of your application that will be either in another cloud or will remain on-prem, you must think about the effects of network latency on your application.  No longer are the various components sitting in the same Data Center.  They may be geographically dispersed across the globe.  Again, this can be a benefit, but it also carries with it a cost in performance.

You also need to consider whether you will need to retrain your staff.  Certainly, today most people are familiar with the cloud but is almost certain you will have some holdouts who are not sold on the change in the way you manage your servers.  Without a doubt, the way they work day-to-day will change.  Done properly, this shift can prove beneficial by allowing your DBAs to focus more on database performance and less on setting up and configuring servers, managing backups, monitoring, etc.

Moving to the cloud is all the rage nowadays and this article is certainly not meant to dissuade you from doing so.  Instead, the idea is to consider some of the ramifications before making the move to determine if it is really in your best interest and help you avoid some of the pitfalls.

Apr
20
2022
--

Finding Differences Between MySQL Servers

Finding Differences Between MySQL Servers

Finding Differences Between MySQL ServersWhen one is responsible for promoting application development from Dev through the various environments such as QA, UAT, etc., through Production, it is often useful to ensure that configurations in test environments are comparable to the final production environment.  This is especially true with systems where a team of DBAs manage the servers.

Obviously, the difference in performance could be due to differences in hardware, storage, networking, software configuration, etc.  The question is how does one quickly and efficiently find the differences without having to run a lot of different commands and compare the output.  Fortunately, our Percona Toolkit has a couple of utilities that can make this much easier.  When you are tasked with supporting large numbers of servers, efficiency is paramount and this is where the toolkit can really help you!

You can find more information on the Percona Toolkit here: https://www.percona.com/software/database-tools/percona-toolkit

Let’s look at a few utilities that can help you spot differences in servers.  The first tool we will look at is pt-summary (https://www.percona.com/doc/percona-toolkit/LATEST/pt-summary.html).

The pt-summary tool will give you a summarized view of the server hardware.  It runs a series of Linux commands and formats the output into an easy-to-consume format.  You will get details on the following:

  • Operating System & Linux kernel
  • Virtualization
  • CPU (number of cores, speed, models, caching)
  • Memory consumption, swappiness, etc.
  • Mounted filesystems (mount points, size, usage, format, etc.)
  • Disk schedulers
  • Disk partitioning
  • Logical Volumes
  • RAID
  • Network configuration
  • Network traffic statistics
  • Network connections
  • Top processes
  • CPU utilization

By running the same report on the servers, you can spot-check differences in the hardware and Operating System.

When you want to find out differences in MySQL configuration, the toolkit has the perfect tool for this.  It is called pt-config-diff (https://www.percona.com/doc/percona-toolkit/LATEST/pt-config-diff.html).

This utility can connect to different servers and look at their configuration via MySQL Server Variables and then perform a “diff” on the values.  The tool creates a simple and concise output showing any differences between the servers.  Any differences will be immediately obvious.  Below is an example of the output:

2 config differences
Variable                  my.primary.cnf  my.replica.cnf
========================= =============== ===============
datadir                   /tmp/12345/data /tmp/12346/data
port                      12345           12346

Usage of these two utilities will allow you to quickly and easily find differences in either hardware or MySQL configuration between two or more servers.  These are the two tools that I always go to when I need to understand why two servers that are supposedly the same do not perform similarly.

Apr
19
2022
--

Make Database Management Easier With Percona Platform

Percona Platform

Percona Software PlatformDatabase management is hard. Not only must your database platform team work to provide consistent and reliable performance, availability, and security, but they also need to meet your developers’ ever-increasing demands to move faster. This strain has led to an environment of increasing complexity, in which you and your teams are tasked with managing hundreds or thousands of siloed databases and their supporting technologies. 

Then, of course, there’s the problem of vendor lock-in. Too often, you’re forced into non-portable platforms or environments, where it’s difficult to reclaim your data, massive and oppressive annual support obligations swell, and vendors are more than happy to hold you hostage over proprietary features. It can seem there’s little way out of the spiral. 

Fortunately, there’s a better way. 

Introducing Percona Platform 

Percona Platform

Percona is different. Since day one, we’ve stood for Keeping Open Source Open. Whatever your environment, any cloud or on-premises, and whatever “enterprise” features you need, we offer ultimate freedom and flexibility. It’s what you’d expect from a company committed to the ideals of free and open source software and why many of you have trusted us for years.

Today, we’re proud to announce that we’re bringing together many of your favorite Percona offerings into a new product called Percona Platform. 

Percona Platform combines our software and services into a subscription-based offering that gives you all the tools and expertise you need to succeed: Whether that’s our fully open, performance-optimized database servers, our Percona-curated distributions, which include best-of-breed solutions for high availability, security, and backup, or Percona Monitoring and Management (PMM).  And we’re backing it up with our world-class services, delivered however you need them — whether that’s support for self-managed installations, or providing you with fully managed database environments.

Take a look at everything included:

Percona software and services

This commitment to giving our customers greater control of availability and access to their data and saving them time on managing databases is why RedMonk wrote: 

“There has been an explosion of data tools available to enterprises, which in turn has led to rampant fragmentation and highly complex, disjointed workflows. Organizations that prize velocity are looking for ways to integrate various data tools into their broader development workflows and monitor them.  This is the opportunity that Percona Platform is built for. The combination of software and services in this space is intended to help customers better manage their database deployments.” –   Rachel Stephens, Sr. Analyst, RedMonk

But you really don’t have to look any further than our customers, who love the combination of Percona software and services. It’s why more than two-thirds of our Support customers run Percona Software, and 58% of our Managed Services customers choose to run Percona databases in addition to their Percona Software.

Here’s what some of them had to say: 

“Percona’s drop-in database software replacement was a big factor as migrations can be tricky! … We have more control with Percona.” – Carpages.ca

“Percona has the best engineers in the world.” –  Appsuite 

“Percona have provided us with the best solutions and identified opportunities for improvement in many areas of MySQL, Postgres, and MongoDB.” – Solera

Percona Platform is designed based on decades of experience from Percona experts, and its production-ready database software is battle-tested in mission-critical environments with the most demanding scale and performance needs.

Discover how the Percona Platform can make database management easier for you. 

 

Apr
19
2022
--

pt-archiver with Auto-Increment Column – Debunking a Blame

pt-archiver with Auto-Increment Column

pt-archiver with Auto-Increment ColumnAs a best practice before dropping a large table in MySQL, pt-archiver can be used to delete all the records in batches. This helps to avoid a situation where your server may get stalled under certain circumstances.

I recently read a comment from a user saying “The pt-archiver is not working as expected! It is skipping the last record, which seems like a bug.”. Let’s examine pt-archiver’s default behavior and understand why the author of this comment believes that pt-archiver is bugged (Spoiler: It’s not!).

But wait, before continuing on busting the blame, let me clarify why to use pt-archiver before dropping large tables.

When we drop a table in MySQL:

  • Table data/index (ibd) and definition (frm) files are removed.
  • Triggers are removed.
  • Table definition cache is updated by removing the table being dropped.
  • InnoDB buffer pool is scanned for associated pages to invalidate them.

Note that DROP is a DDL statement and it will require a Metadata Lock (MDL) to complete the task causing all the other threads to wait on that. This also creates additional pressure on the buffer pool for purging a large number of data pages associated with the table being dropped.

And finally, the table_definition_cache operation requires LOCK_open mutex to clean up and this causes all other threads to wait until the drop is complete.

To reduce the severity of this operation we can use pt-archiver to delete large data in small chunks, thus lowering the table size significantly. Once we have deleted records from the large table, the drop operation goes fast without creating a performance impact.

Coming back to debunking the blame. This behavior was noted by a community member that after the pt-archiver was completed, the table still had one row pending.

Generating a lab the scenarios proved that these allegations appear to be true:

# Created table
mysql> CREATE TABLE `tt1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` char(5) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB

# Poured random test data into it
mysql> call populate('test','att1',10000,'N');

# Purged data using pt-archiver
[root@centos_2 ~]# pt-archiver --source=h=localhost,D=test,t=tt1 --purge --where "1=1"

# Verifying count (expected 0, got 1)
mysql> select count(*) from test.tt1;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

The same happens when we use pt-archiver for data copy with –no-delete. Our tool, pt-archiver, appears not to copy the max value to the destination table.

[root@centos_2 ~]# pt-archiver --source=h=localhost,D=test,t=tt1 --dest=h=localhost,D=test,t=tt2 --no-delete --where "1=1"

mysql> select count(*) from tt2;
+----------+
| count(*) |
+----------+
|     5008 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from tt1;
+----------+
| count(*) |
+----------+
|     5009 |
+----------+
1 row in set (0.00 sec)

We have a bug report that was already in place https://jira.percona.com/browse/PT-837 noting this behavior. But is this really a bug?

Reading through the pt-archiver documentation, there’s an option –[no]safe-auto-increment which describes the usage: “Do not archive rows with max AUTO_INCREMENT.” 

Meaning, the option –safe-auto-increment (default) adds an extra WHERE clause to prevent pt-archiver from removing the newest row when ascending a single-column AUTO_INCREMENT as seen in the code section below:

https://github.com/percona/percona-toolkit/blob/3.x/bin/pt-archiver#L6449
   if ( $o->get('safe-auto-increment')
         && $sel_stmt->{index}
         && scalar(@{$src->{info}->{keys}->{$sel_stmt->{index}}->{cols}}) == 1
         && $src->{info}->{is_autoinc}->{
            $src->{info}->{keys}->{$sel_stmt->{index}}->{cols}->[0]
         }
   ) {
      my $col = $q->quote($sel_stmt->{scols}->[0]);
      my ($val) = $dbh->selectrow_array("SELECT MAX($col) FROM $src->{db_tbl}");
      $first_sql .= " AND ($col < " . $q->quote_val($val) . ")";
   }

 

Let’s see the difference between the two commands via dry-run output:

# With --no-safe-auto-increment
[root@centos_2 ~]# pt-archiver --source=h=localhost,D=test,t=tt1 --dest=h=localhost,D=test,t=tt2 --no-delete --where "1=1" --no-safe-auto-increment --dry-run
SELECT /*!40001 SQL_NO_CACHE */ `id`,`a` FROM `test`.`tt1` FORCE INDEX(`PRIMARY`) WHERE (1=1) ORDER BY `id` LIMIT 1
SELECT /*!40001 SQL_NO_CACHE */ `id`,`a` FROM `test`.`tt1` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND ((`id` > ?)) ORDER BY `id` LIMIT 1
INSERT INTO `test`.`tt2`(`id`,`a`) VALUES (?,?)

# Without --no-safe-auto-increment (default)
[root@centos_2 ~]# pt-archiver --source=h=localhost,D=test,t=tt1 --dest=h=localhost,D=test,t=tt2 --no-delete --where "1=1" --dry-run
SELECT /*!40001 SQL_NO_CACHE */ `id`,`a` FROM `test`.`tt1` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND (`id` < '5009') ORDER BY `id` LIMIT 1
SELECT /*!40001 SQL_NO_CACHE */ `id`,`a` FROM `test`.`tt1` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND (`id` < '5009') AND ((`id` > ?)) ORDER BY `id` LIMIT 1
INSERT INTO `test`.`tt2`(`id`,`a`) VALUES (?,?)

Did you note the additional clause “AND (id < ‘5009’)” above?

This option of –no-safe-auto-increment guards against re-using AUTO_INCREMENT values if the server restarts. Note that the extra WHERE clause contains the maximum value of the auto-increment column as of the beginning of the archive or purge job. If new rows are inserted while pt-archiver is running, pt-archiver will not see them.

Alright, now we know the “why” of the “blame”, but why? What is the issue around safety for AUTO_INCREMENT?

The AUTO_INCREMENT counter is stored in memory and as MySQL restarts (crashes or otherwise), the counter will reset to the max value. If this happens and the table is accepting writes, the AUTO_INCREMENT value will change.

# deleting everything from table
mysql> delete from tt1;
...
mysql> show table status like 'tt1'\G
*************************** 1. row ***************************
           Name: tt1
         Engine: InnoDB
...
 Auto_increment: 10019
...


# Restarting MySQL
[root@centos_2 ~]# systemctl restart mysql

# Verifying auto-increment counter
[root@centos_2 ~]# mysql test -e "show table status like 'tt1'\G"
*************************** 1. row ***************************
           Name: tt1
         Engine: InnoDB
...
 Auto_increment: 1
...

These observations convincingly tell us that the problem here was not really with pt-archiver but with the option chosen. It is important to understand the use –no-safe-auto-increment option, in the case of using pt-archiver while working on AUTO_INCREMENT columns.

Let’s just verify it with our lab data.

# Verifying the usage of –no-safe-auto-increment option
[root@centos_2 ~]# pt-archiver --source=h=localhost,D=test,t=tt1 --purge --where "1=1" --no-safe-auto-increment

mysql> select count(*) from test.tt1;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

The same goes for a copy operation with –no-delete option.

[root@centos_2 ~]# pt-archiver --source=h=localhost,D=test,t=tt1 --dest=h=localhost,D=test,t=tt2 --no-delete --where "1=1" --no-safe-auto-increment

mysql> select count(*) from tt1; select count(*) from tt2;
+----------+
| count(*) |
+----------+
|     5009 |
+----------+
1 row in set (0.00 sec)

+----------+
| count(*) |
+----------+
|     5009 |
+----------+
1 row in set (0.00 sec)

 

Now that we have established the understanding of pt-archiver’s –[no]safe-auto-increment option and before we conclude that everything is well and good; let us give the option itself some more thought.

  1. The –no-delete operation should by default include –no-safe-auto-increment option.
    • At present, safe-auto-increment is default behavior. When we use –no-delete option of pt-archiver, there are no delete operations. This means safe-auto-increment should not be a reason for concern.
    • Do you agree? Do you want to counter? Use comments.
  2. For MySQL 8.0, safe-auto-increment option is not required.
    • This thought comes from the knowledge that from MySQL 8.0 onwards, the auto-increment values are persistent and survives restarts or crashes. –Ref: MySQL worklog
    • And since MySQL 8.0 auto-increment is persisted via redo logs, this makes them a reason for not being a concern for our beloved pt-archiver. Thus we don’t need a safe-auto-increment option at all.
    • Do you agree? Do you want to counter?

Well, point #2 was too convincing until it was refuted by my friend Ananias with the following facts.

  • Reuse of previously allocated auto-increment counters cannot be guaranteed.

However, in the case of an unexpected server exit, reuse of a previously allocated auto-increment value cannot be guaranteed. Each time the current maximum auto-increment value is changed due to an INSERT or UPDATE operation, the new value is written to the redo log, but if the unexpected exit occurs before the redo log is flushed to disk, the previously allocated value could be reused when the auto-increment counter is initialized after the server is restarted” – MySQL Documentation

  • Yet another fact, as documentation reads “As of MySQL 8.0.21, you can disable redo logging”. Though this feature is specifically created for speeding up the data load to MySQL and is not to be used in production; “To err is human”.
  • That makes the “safe-auto-increment” option to stay in 8.0+. The only option we are left with is to understand the options.

Conclusion

The pt-archiver is a great tool for archiving MySQL data and it is important to be aware of all the options to have full control of what we want to achieve using it.

It’d be interesting to know if you have any thoughts while reading this; please use comments and share.

Apr
19
2022
--

PostgreSQL 14 B-Tree Index: Reduced Bloat with Bottom-Up Deletion

PostgreSQL 14 B-Tree Index

Concurrent access to data within PostgreSQL is managed with the Multiversion Concurrency Control (MVCC) model. Data snapshots are maintained for each SQL statement so that they always get consistent data, even if other transactions are modifying it concurrently. This leads to managing multiple versions of the same row when the row has been modified by one or more transactions. From a user perspective, there might only be a single row of data, but internally PostgreSQL may be maintaining one or more versions of that row.

Whether a row version is visible to a transaction is maintained with the row data in the heap. To optimize the fetching of visibility information PostgreSQL also maintains a “_vm” relation fork that keeps track of which pages contain only the tuples that are known to be visible to all active transactions.

Dead versions that are no longer visible to any transaction are cleaned up by the vacuum process. Until that happens the index and heap pages may contain a significant number of dead tuples (This really depends on the nature of your workload). For a very update-intensive workload, this could be a huge number!

It may seem innocuous at first sight, but this accumulation of dead index tuples creates a cascading effect that leads to significant performance degradation. After the deduplication work in PostgreSQL 13, the next logical step was to prevent btree index expansion by reducing page splits.

Physical Data Storage

PostgreSQL maintains data in fixed-sized storage units called pages. The size of a page is defined during the PostgreSQL server compilation process. The default page size is 8k, but this can be changed to a higher value. Though changing the page size complicates things as other tools may require recompilation or reconfiguration as well.

Each table and index is stored in an array of pages. When data is inserted in a table, the data is written to a page having enough free space. Otherwise, a new page is created.

Indexes however are a little different. The first page in an index is a meta page that contains control information about the index. There can also be special pages that maintain index-related information. In the case of a btree index, the data must be sorted based on the index columns and heap tuple ID (the physical location of the tuple within the table). Therefore insertion and updates must happen on the correct page to maintain the sorting order. If the page does not have enough space for the incoming tuple a new page must be created, and some items from the overflowing page are moved to the new page. Parent pages of these leaf pages are split recursively if needed.

Avoiding Page Splits

B-Tree index page splits occur when new tuples or new non-HOT tuple versions are to be added to the index. HOT is an abbreviation for “heap only tuple”. In basic terms, it is a way of removing dead rows on a given page (defragmentation) and thereby making space for new rows. By avoiding or delaying page splits, we can avoid or slow down index expansion and therefore reduce the bloat. Now that is exciting!

While there isn’t much that can be done for new tuples, updates can be managed such that obsolete versions of logically unchanged index tuples (i.e. unchanged index columns) can be incrementally removed to maintain free space for the new version. This process is aided by the planner which provides a hint, “index unchanged” to the index method. This is true if none of the index columns are changed as a result of this update.

The bottom-up deletion is done during an index operation when a “version churn page split” is anticipated (the “index unchanged” hint is true). Obsolete versions of logically unchanged index tuples are removed making space on the page for the newer version. This approach potentially avoids a page split.

Bottom-Up Delete in Action

To see the actual benefit of this approach let us dig a little deeper into the B-Tree index. We are going to compare the btree index sizes between PostgreSQL versions 13 and 14. For a more detailed inspection of index data, I’ll be using the “pageinspect” extension that is available in the contrib modules. The “pageinspect” extension allows us to see the low-level page contents of indexes or tables.

Let’s start by creating the pageinspect extension. You may need to install the contrib modules, or if you are building from the source make install it and then proceed on.

CREATE EXTENSION IF NOT EXISTS pageinspect;

 

Let’s now create a table “foo” with two columns, create two indexes with one covering index, and analyze the table as well.

DROP TABLE IF EXISTS foo;
CREATE TABLE foo WITH (autovacuum_enabled = false) AS (SELECT GENERATE_SERIES(1, 1000) AS col1, SUBSTR(MD5(RANDOM()::TEXT), 0, 25) AS value);
CREATE INDEX ON foo(col1);
CREATE INDEX ON foo(col1) INCLUDE(value);

 

It is time to inspect a number of pages, tuples, and relation sizes for the “foo” table.

SELECT  relname
        , relkind
        , relpages
        , reltuples
        , PG_SIZE_PRETTY(PG_RELATION_SIZE(oid))
FROM    pg_class
WHERE   relname LIKE '%foo%'
ORDER
BY      relkind DESC;

??      relname       | relkind | relpages | reltuples | pg_size_pretty 
--------------------+---------+----------+-----------+----------------
 foo                | r       |        8 |      1000 | 64 kB
 foo_col1_idx       | i       |        5 |      1000 | 40 kB
 foo_col1_value_idx | i       |        9 |      1000 | 72 kB
(3 rows)

 

Both 14.1 and 13.5 give the exact same output for the above query.

Disable sequential and bitmap scans to force index scans. This will force the queries in this example to use an index scan.

SET enable_seqscan = false;
SET enable_bitmapscan = false;

 

Create four new versions of tuples.

UPDATE foo SET value = value || 'x';
UPDATE foo SET value = value || 'x';
UPDATE foo SET value = value || 'x';
UPDATE foo SET value = value || 'x';

 

The above queries result in updating 1,000 rows each. “ANALYZE” the table to ensure our statistics are accurate. Also let us review the number of pages, tuples, and relation sizes for the “foo” table.

ANALYZE foo;

SELECT  relname
        , relkind
        , relpages
        , reltuples
        , PG_SIZE_PRETTY(PG_RELATION_SIZE(oid))
FROM    pg_class
WHERE   relname LIKE '%foo%'
ORDER
BY      relkind DESC;

--PostgreSQL 14.1
??      relname       | relkind | relpages | reltuples | pg_size_pretty 
--------------------+---------+----------+-----------+----------------
 foo                | r       |        8 |      1000 | 288 kB
 foo_col1_idx       | i       |        5 |      1000 | 88 kB
 foo_col1_value_idx | i       |        9 |      1000 | 216 kB
(3 rows)


--PostgreSQL 13.5
??--------------------+---------+----------+-----------+----------------
 foo                | r       |        8 |      1000 | 288 kB
 foo_col1_idx       | i       |        5 |      1000 | 104 kB
 foo_col1_value_idx | i       |        9 |      1000 | 360 kB
(3 rows)

 

The table size has increased by the same amount in both versions however, the indexes in 14.1 have significantly smaller sizes compared to 13.5. Great, but just to be sure let’s inspect the page contents to understand what has happened behind the scenes.

Reviewing the contents of the first index page (not the meta page) clearly shows how the bottom-up deletion is keeping index sizes small.

 

SELECT  itemoffset
        , ctid
        , itemlen
        , nulls
        , vars
        , dead
        , htid
FROM    bt_page_items('foo_col1_value_idx', 1)
LIMIT   15;

PostgreSQL 14.1
?? itemoffset |  ctid   | itemlen | nulls | vars | dead |  htid   
------------+---------+---------+-------+------+------+---------
          1 | (7,1)   |      16 | f     | f    |      | 
          2 | (7,181) |      40 | f     | t    | f    | (7,181)
          3 | (7,225) |      48 | f     | t    | f    | (7,225)
          4 | (7,182) |      40 | f     | t    | f    | (7,182)
          5 | (7,226) |      48 | f     | t    | f    | (7,226)
          6 | (7,183) |      40 | f     | t    | f    | (7,183)
          7 | (7,227) |      48 | f     | t    | f    | (7,227)
          8 | (7,184) |      40 | f     | t    | f    | (7,184)
          9 | (7,228) |      48 | f     | t    | f    | (7,228)
         10 | (7,185) |      40 | f     | t    | f    | (7,185)
         11 | (7,229) |      48 | f     | t    | f    | (7,229)
         12 | (7,186) |      40 | f     | t    | f    | (7,186)
         13 | (7,230) |      48 | f     | t    | f    | (7,230)
         14 | (7,187) |      40 | f     | t    | f    | (7,187)
         15 | (7,231) |      48 | f     | t    | f    | (7,231)
(15 rows)


PostgreSQL 13.5
?? itemoffset |  ctid   | itemlen | nulls | vars | dead |  htid   
------------+---------+---------+-------+------+------+---------
          1 | (0,1)   |      16 | f     | f    |      | 
          2 | (0,1)   |      40 | f     | t    | f    | (0,1)
          3 | (7,49)  |      40 | f     | t    | f    | (7,49)
          4 | (7,137) |      40 | f     | t    | f    | (7,137)
          5 | (7,181) |      40 | f     | t    | f    | (7,181)
          6 | (7,225) |      48 | f     | t    | f    | (7,225)
          7 | (0,2)   |      40 | f     | t    | f    | (0,2)
          8 | (7,50)  |      40 | f     | t    | f    | (7,50)
          9 | (7,138) |      40 | f     | t    | f    | (7,138)
         10 | (7,182) |      40 | f     | t    | f    | (7,182)
         11 | (7,226) |      48 | f     | t    | f    | (7,226)
         12 | (0,3)   |      40 | f     | t    | f    | (0,3)
         13 | (7,51)  |      40 | f     | t    | f    | (7,51)
         14 | (7,139) |      40 | f     | t    | f    | (7,139)
         15 | (7,183) |      40 | f     | t    | f    | (7,183)
(15 rows)

Looking at the “itemoffset” 2 to 3 for 14.1 and 2 to 6 for 13.5 tells us the entire story. 13.5 is carrying the entire set of tuple versions whereas 14.1 cleansed the dead tuples to make room. With fewer versions, there are fewer pages resulting in less bloating, and giving us a smaller index size.

Conclusion

Reduction in index size due to bottom deletion is a huge plus in PostgreSQL version 14. Btree indexes have a mechanism where plain index scans set the LP_DEAD flag. This is not set for bitmap index scans though. Once this is set, the space can be reclaimed without the need of vacuum. However, that’s a completely different class of dead tuples. In the long run, this bottom-up deletion strategy helps in significantly reducing a specific class of duplicates. It not only reduces the load on vacuum but also helps keep indexes healthier leading to faster access speeds. So if you have a high update workload, there will definitely be savings in resource utilization and costs while giving better performance.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com