Apr
28
2023
--

Add More Security to Your Percona Server for MongoDB With AWS IAM integration!


Did you notice that Percona Server for MongoDB 6.0.5-4 was released just a few days ago? This time around, it introduced improvements to the way we handle master key rotation for data at rest encryption as well as AWS IAM integration.

One key to rule them all — improvements to master key rotation

With the improvements introduced in Percona Server for MongoDB 6.0.5-4, one key path can be used for all servers in a clustered environment. This allows us to use one vault key namespace for all nodes in a deployment while at the same time preserving key versions and allowing each node to perform key rotation without impact to the other nodes.

Changes introduced with Percona Server for MongoDB 6.0.5-4 now allow using the same key for all members of a replica set, if the user chooses to, without any impact on functionality.
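
For illustration, a per-node rotation with a Vault-backed setup looks roughly like the sketch below. Treat it as a sketch only: the --vaultRotateMasterKey option and the config path reflect the Percona Server for MongoDB documentation as commonly described, so verify the exact procedure for your version before running it.

# Rotate the master key on one node at a time; the other members keep serving traffic.
sudo systemctl stop mongod
# mongod rotates the master key against the configured Vault endpoint and then exits.
mongod --config /etc/mongod.conf --vaultRotateMasterKey
# A normal start brings the node back up, now encrypted with the new key version.
sudo systemctl start mongod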

Why should you care about AWS IAM integration?

With all the systems users need to access daily, password management becomes a more pressing issue. The introduction of IAM systems has become somewhat of a security standard in large enterprises.

Our users approached us about integration with AWS IAM, which is commonly used in their organizations. It’s an integration missing from MongoDB Community Edition (CE) that is important for compliance with the enterprise security policies of many companies. Integration with AWS IAM allows users to authenticate with their existing AWS credentials instead of maintaining separate database passwords.


To set up this integration, follow the steps outlined in our documentation, and configure either the user or the role authentication. This will allow AWS Security Token Service (STS) to play the part of Identity Provider (IDP) in a SAML 2.0-based federation.
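
Once configured, clients authenticate with AWS credentials instead of a database password. Below is a minimal sketch of a user-credential connection; the host name is a placeholder, while authMechanism=MONGODB-AWS and authSource=$external are the standard MongoDB settings for this mechanism (for role authentication, the credentials are picked up from the instance metadata instead).

# Authenticate with an AWS IAM user's access key instead of a MongoDB password.
mongosh 'mongodb://psmdb.example.com:27017/?authSource=$external&authMechanism=MONGODB-AWS' \
  --username "$AWS_ACCESS_KEY_ID" \
  --password "$AWS_SECRET_ACCESS_KEY"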

Your feedback matters!

We take pride in being open to feedback at Percona. Please do not hesitate to contact us via the community forums or this contact form.

What’s next

We are looking into problems that affect large datastores and cause pain for our users. Please let us know if there are any particular issues you are struggling with in MongoDB; we are always open to suggestions!

Learn more about Percona Server for MongoDB

Apr
28
2023
--

Open Source vs. Proprietary Database Software: What To Choose?


For starters, yes, Percona has chosen sides. We are a company of open source proponents. We’re also dedicated and active participants in the global open source community. 

But the intent here is to be informative, not to declare a winner between the two. Both open source and proprietary options have advantages. At the same time, it’s hard to ignore key differences and how they affect customers. We also know that enterprise-grade features matter, so please read on. In election parlance, there might be a surprise third-party candidate that will win your vote.

Let’s start with a simple introductory comparison: With proprietary (closed source) database software, the public does not have access to the source code; only the company that owns it and those given access can modify it. With open source database software, anyone in the general public can access the source code, read it, and modify it.

Dispelling a couple of myths

Before expanding on the comparison, let’s dispel the most common myths about open source software:

Myth #1: Open source is less secure.

Sure, without the right protections, open source software can be vulnerable, but those protections exist and can be implemented. People assume that because the code is public, attackers can have at it and easily wreak havoc. That’s not so. 

With the code open to all, it’s truly transparent (DevOps.com, April 2021); security is scrutinized, and vulnerabilities are addressed by experts globally. Those most involved with open source software know the reality about security. Since the early 2000s, research has repeatedly shown that open source software is no more vulnerable than proprietary software. Those on the front lines echo those findings. A 2021 Red Hat survey showed that 89% of IT leaders see enterprise open source software as equally or more secure than proprietary software.

In contrast to the abundant support of the open source community, with proprietary systems, the software vendor must address and fix the problem, which often means providing a patch. Further, the developers of proprietary software aren’t always transparent about vulnerabilities, so software users might not know about threats until it’s too late. 

And about attackers: They often use hacking programs instead of attacking the code directly. They don’t need access to examine the code for hacking purposes (TA Digital, July 2021). Attackers find and exploit vulnerabilities in proprietary software all the time.

Myth #2: Proprietary databases are better and therefore more suitable for large enterprises.

Again, in the case of proprietary software, the developers and DBAs come from within one company or a limited team. Conversely, with open source, a global community contributes. 

The transparency of the open source model makes input inclusive and creates an abundance of scrutiny and support (Forbes, January 2022). The multitude of checks and balances places a premium on how the code affects performance. 

Another part of this myth is that open source can’t be enterprise-grade. With the right expertise, you can have an enterprise-grade open source solution. Here’s a small sampling of leading corporations that use open source software: Amazon, Audi, BMW, Google, IBM, and Facebook (CodeSnail, August 2022). The list goes on and on.

Now, myths aside, let’s get down to the brass tacks of database comparisons. Each of our subjects has its pros and cons.

Proprietary database software

Proprietary software can be beneficial for addressing immediate and/or focused database concerns. Sometimes a vendor will have an innovative way of solving a problem when there aren’t alternatives available on the market. A company will enter a relationship with that vendor because the vendor’s solution addresses present business objectives. Additionally, a single-vendor relationship can eliminate complexity; in some cases, the vendor’s solution can simplify the environment and ensure that all components work together. 

The benefits described above are often — and inaccurately — attributed solely to proprietary software. For some, proprietary is synonymous with business/enterprise-grade. Consequently, some organizations might not even consider using open source software, or they’ll quickly dismiss it because of a perceived lack of expertise, support, etc.

Those same customers, and others, often aren’t aware that proprietary software can limit creative options and the ability to scale. Those limitations can increasingly draw from a customer’s tech budget. As business objectives change, along with industry standards and technological advances, a customer can be stuck with software and upgrades that make more sense for the vendor’s bottom line than for addressing the customer’s changing needs. For example, the vendor might push a cloud-based solution when the customer prefers to keep its infrastructure on-premises. 

Additionally, with proprietary software, there can be drawbacks related to certifications. When you deploy software in a proprietary arrangement, the vendor might certify it only against a specific database or set of databases. Your apps, therefore, must run on one particular server.

Being stuck with a single vendor and its software can result in vendor lock-in that makes you susceptible to price hikes, paying for bundled technology with components you don’t need, and an inability to change software and infrastructure to meet unique business needs.

Open source database software

Upstream open source projects are free to download and use. There are no licensing or purchasing fees for reusing, modifying, or distributing the software. Beyond the obvious cost-efficiency, many IT leaders consider the quality of open source software on par with that of proprietary software. In fact, 32% of IT leaders in a Red Hat survey consider open source enterprise software to be of higher quality. 

Free of licensing restrictions and escalating costs that can come with proprietary software, developers can download open source software and use it to create new applications. Those freedoms help companies optimize limited tech budgets. They can more easily scale infrastructure — up or down — to meet economic conditions and changing business objectives. 

And there is the aforementioned online open source community. Whereas proprietary products exist at the whim of a single vendor, a strong open source community can help ensure a project carries on even if challenges arise for some of the project’s supporting companies. Additionally, with open source, companies can deploy their databases anywhere — in cloud, on-premises, or hybrid environments — and move them at any time.

A lack of readily available support and expertise, however, can offset the potential savings of open source database software. There must be careful implementation of the right protection to avoid vulnerabilities. And to achieve database objectives across the enterprise, a company that uses open source software often must either bolster its on-staff expertise or turn to outside support. Either option can be costly.

The best of both worlds — enterprise-grade open source software

Undoubtedly, you liked some of the attributes from each side. So how do you choose?

You don’t have to. There’s a third-party candidate offering the best of both worlds — open source database software with enterprise-grade features.

This option couples the cost-efficiency and scalability of open source with the simplicity (task-focused), cohesiveness (components work together), and security of proprietary software. With the right extensions and add-ons to make it enterprise-grade, an open source solution can replicate the applications a company uses and can handle the performance requirements of the company’s most critical workloads. A flexible, open source enterprise setup enables deployment and operation on-premises, in the cloud, or in a hybrid environment.

It’s important, however, to emphasize these words of caution: The phrase “enterprise-grade” is used a lot, but few vendors provide open source software that meets the demanding mix of enterprise needs related to integration, productivity, scalability, and security. And even when those needs are met, they’re soon to evolve. Therefore, enterprise-grade software — like community versions — still requires support. When seeking such support, it’s important to find a vendor that provides multi-database support, technology-agnostic expertise, and a flexible contract.

The search can be challenging, but vendors who provide true enterprise-grade open source software do exist. We happen to know of one.

You can learn more about the differences between open source and proprietary database software in The Ultimate Guide to Open Source Databases.

When you’re choosing a database, consider Percona

Percona is dedicated to making databases and applications run better through a combination of expertise and open source software. Our enterprise-grade distributions include the following:

  • Percona Distribution for MySQL: This single solution delivers optimized performance, greater scalability and availability, and enhanced backups — for even the most demanding workloads.
  • Percona Distribution for PostgreSQL: Put the best and most critical enterprise components from the open source community to work for you — in a single distribution, designed and tested to work together.
  • Percona Distribution for MongoDB: Ensure data availability while improving security and simplifying the development of new applications — in the most demanding public, private, and hybrid cloud environments.

Percona backs its enterprise-grade distributions with varying levels of support. We’ll provide support that best fits the needs of your company or organization — without a restrictive contract.

 

Watch Webinar: Optimize Costs With Open Source Software and Support

 

Get Open Source Support Services from Percona

 


Apr
26
2023
--

Open Source Software Security: Is Open Source Software Safe?


Even though open source software is firmly in the mainstream, used widely by businesses, governments, and everyone who owns a cell phone or computer, the question repeatedly arises: “Is open source software safe?” Broadly speaking, the answer is a resounding yes. But it’s worth examining what we mean by “safe,” contrasting open source software with proprietary software, and discussing when you should use caution.

Defining “safe” for software

Let’s start by defining what we mean by “safe” because it’s a non-specific term that might mean different things to different people.

Safe, here, encompasses security, stability, sustainability, and compliance.

Specifically, does open source software meet a reasonable security expectation comparable to or better than proprietary software? Is open source software as stable as other software? Is it sustainable, that is – will the software continue to be developed and maintained long-term, predictably, so you can depend on it? And, finally, does open source software carry any risks around legal compliance?

Let’s also clarify what we mean by the phrase “open source software.” Anybody can slap an open source license on some software and put it online. Our bar is higher than that. We’re not addressing hobby projects or those that don’t have an active community.

When discussing open source software, we’re talking about ongoing projects with a healthy community and substantial adoption. We will talk about how to assess that when choosing a project.

Let’s start with the big one, security.

Is open source software secure?

When a project’s source code is available to all, the question of security isn’t far behind. How can something be secure if anyone can examine the code and look for security flaws?

Would-be attackers can comb through source code to find security flaws. Sometimes they do! But it also allows “white hat” types to examine open source projects to try to find and fix vulnerabilities before attackers find them and use them. It allows organizations to identify potential vulnerabilities, report them, and apply fixes without depending on a single vendor.

The relative security of open source software has been examined repeatedly by researchers since the early 2000s. Open source software contains no more flaws on average than proprietary software. In some cases, it may have fewer vulnerabilities.

Security through obscurity – expecting software to be more secure if attackers can’t see the source code – doesn’t work. Attackers find and exploit vulnerabilities in proprietary software all the time. The Log4Shell vulnerability in Apache Log4j’s software made big headlines in 2021, but it wasn’t alone. Consider ProxyShell – a set of vulnerabilities in Microsoft Exchange that could result in Remote Code Execution (RCE).

That’s just one example. You can peruse Microsoft’s acknowledgments of security reports for a long list of vulnerabilities discovered by various researchers who found vulnerabilities in its software without access to source code. 

So, is open source software secure? In absolute terms, no software should be considered free of vulnerabilities. But, in relative terms, we’d say yes. Open source software is secure relative to proprietary software – and in some instances, we’d say more secure than proprietary software.

In all instances, open source software allows anyone to examine the software and attempt to provide fixes if they discover a vulnerability. Open source software does not depend on a single vendor that controls the software entirely.

Is open source software stable?

Users may also wonder if open source software is stable, whether it’s safe to use open source software in production environments, and that sort of thing.

Again, the answer is yes, but with a few caveats worth calling out. Let’s start with some prime examples of open source software being used where stability is crucial.

Watch the “Using Open Source Software to Optimize and Troubleshoot Your MySQL Environment” video sessions

Open source software powers the Internet. Linux, for example, is the most widely used operating system for running the services you use daily. All the major cloud providers run Linux; your cell phone company is likely using it to deliver phone calls; and it powers streaming services, social media companies, and so forth. If you’re an Android user, it’s in your phone.

That’s just the most obvious example. Open source databases, like MySQL and PostgreSQL, are among the most widely used and popular databases for workloads large and small.

There’s also WordPress and Drupal, content management systems (CMSes) that power millions of websites worldwide. (WordPress, in fact, powers this blog and uses a MySQL database to store its content.)

On the smaller side, you have tools like curl and SQLite embedded in millions of devices for various uses. Open source has even gone to Mars.

Caveats of open source software

One could write a book on the successful use of open source and how well it stacks up, stability-wise, next to proprietary software. What are the caveats?

You need to evaluate open source software the same way you’d evaluate any software. Look at how it’s produced, and the health of its community or vendor, and put it to the test in a proof-of-concept (POC) or otherwise evaluate it to verify it suits your needs.

The health of the community is a broad topic, and I won’t try to explore it fully here. But, in short, check out the history of the project: see how many contributors it has, whether there are vendors who can support it if you need support, and whether it’s still being actively maintained.

If you examine Linux, MySQL, PostgreSQL, Kubernetes, WordPress, Apache Kafka, and thousands of other projects, you’ll find projects with long histories, widespread adoption, and vendors who will provide support above and beyond just supplying the software.

That brings us to sustainability.

Is open source software sustainable?

Sustainable is a phrase used a lot to describe environmentally friendly concerns. But when we say “sustainable” here, we’re talking about whether the development process that produces the software is sustainable. To put it another way: Can we depend on that software to be here tomorrow, next month, or next year? Even longer?

This question isn’t unique to open source software! The same forces that cause software companies to go out of business or cancel projects can impact open source. 

Proprietary software goes away all the time, particularly in the age of Software-as-a-Service. Consider all the projects in Google’s graveyard, like Google Reader, Stadia, G+, and too many messaging apps to even try to recount.

Maintainers aren’t suppliers

However, open source has an added wrinkle, and we want to discuss it head-on. Open source projects are often powered by maintainers who aren’t paid directly to work on those projects. 

Maintainers are not the same thing as suppliers and vendors. An open source project is not necessarily the same thing as a product. 

For example, many of the Apache Software Foundation (ASF) projects have contributors from many different companies. Some may be paid to work on the project full time, and others may contribute as part of their day job where the software is used in their work, but they have other responsibilities. 

So if you evaluate an open source project to use in your business, you need to do some due diligence about the project’s health to verify that it has the longevity you want. Again, this is similar to doing due diligence on a software vendor. 

How to evaluate open source projects

You can feel confident that Microsoft will be around in 10 years and still support Windows and SQL Server. Likewise, Linux and PostgreSQL will almost certainly be around in 10 years. Apple is unlikely to go out of business and drop iOS anytime soon. WordPress has been chugging along for years and years and powers a huge chunk of the Internet, and it’ll still be used for blogs well into the future. 

Open source data management software survey

On the other hand, you can look at a lot of proprietary software that has hit end of life when its vendor was acquired or management changed. Microsoft killed off Visual Basic while it was still popular, for example. Twitter snapped up Vine and then shuttered it. Adobe has (largely) retired Flash, though you’ll find few people to mourn Flash and quite a few who were happy to see it go.

Open source software can reach its end of life too. The ASF, for example, has its “Attic” – a process and home for ASF projects that have become obsolete or failed to maintain a large enough community of maintainers.

How can you know if an open source project will be around for the long haul and receive updates? 

A good rule of thumb? Look for widely adopted open source software with a good track record, and it’s even better if multiple vendors work on and support the software.

If it can be picked up and developed by multiple vendors, it’s a much safer bet. MySQL and PostgreSQL, for example, are projects with product equivalents and support options comparable to proprietary software, without the downsides of being proprietary.

What about open source software compliance?

Finally, the question on many people’s minds is whether open source software is safe from a compliance perspective. That is, does open source software introduce any legal requirements?

I’m not a lawyer, nor do I play one on TV, so this isn’t to be confused with legal advice. If you need a genuine legal opinion, you’ll definitely want to consult a lawyer – the same as if you wanted legal advice on an End User License Agreement (EULA) with proprietary software. 

That said – licenses that meet the Open Source Definition (OSD) from the Open Source Initiative (OSI) have conditions triggered on distribution rather than use. If you install and run the software but don’t distribute it, then you don’t have any requirements to meet. Distribution is when you need to verify compliance.

What is open source distribution?

What is distribution? If your organization conveys software to other entities, that generally counts as distribution. For example, if your organization makes an electronic device with embedded software under open source licenses and sells them to customers, that’s distribution. Depending on the license, you may need to include a notice about the software, or you may need to make the source code available to customers on request. 

At least one open source license, the Affero GNU Public License (AGPL), extends the distribution concept to include interaction over a network. So, if you’re using AGPL’ed software in a SaaS offering, that may require you to distribute the source code or provide a mechanism for distributing the source code to users of that SaaS. 

So, if your organization ships software under open source licenses, then you need a plan to comply with the license requirements. If you simply use open source software (say, a bunch of servers running Linux and an open source database like MySQL) but don’t distribute it, then you don’t have any special requirements to worry about.

The most popular open source licenses

The OSI has approved quite a few licenses as OSD-compliant, but in practice, you’ll see only a handful of them in use. Most open source software uses one of four or five permissive licenses (Apache 2.0, MIT, BSD 2, or BSD 3 being most likely) or one of the reciprocal GPL variants. 

These licenses are well-understood. You can find ample guidance on working with them. 

The Ultimate Guide to Open Source Databases

EULAs, on the other hand, are non-standard and ever-changing. If you use Apple software, for instance, you’re probably familiar with having to agree to EULA changes every time you update your software. If you use proprietary enterprise software, it probably has restrictions and compliance requirements to keep track of as you deploy it. 

The good news about EULAs is that you don’t have to worry about modification or distribution – because you’re not allowed to do that, you don’t need to ask what to do if you make a modification and want to distribute it. Problem solved! 

So… is it safe?

The real answer is, of course, the disappointing but realistic “it depends.” Open source software is not inherently unsafe or less safe than proprietary software.

 

Percona Database Software Solutions

Apr
26
2023
--

Speeding Up Restores in Percona Backup for MongoDB


When you do a database restore, you want to have it done as soon as possible. In the case of disaster recovery, the situation is stressful enough on its own. And the database is unusable until the restore is done. So every minute matters. That becomes especially crucial with big datasets.

Bringing physical backups to Percona Backup for MongoDB (PBM) was a big step toward faster restores. A physical restore is essentially copying data files to the target nodes and starting the database on that data catalog, while a logical restore means copying data and running insert operations on the database, which adds overhead for parsing data, building indexes, etc. Our tests showed physical database restores to be up to 5x faster than logical ones. But can we do better? Let’s try.

The speed of a physical restore comes down to how fast we can copy (download) data from the remote storage, so we decided to try parallel (concurrent) downloads. In physical backups, PBM stores WiredTiger files pretty much as they are in the data directory, just with extra compression. What if we simply download different files in parallel? That won’t quite work: each MongoDB collection’s data is stored in a single file, so data doesn’t spread evenly across files, and big collections would become bottlenecks. The better approach is to download each file itself concurrently, in chunks.

PBM already downloads files in chunks, but that is done solely for retries: in case of a network failure, it retries only the most recent chunk rather than the whole file. The idea is to download these chunks concurrently. Here’s the problem: when reading out of order, we cannot write a chunk straight to the file (with a seek offset), as we have to decompress the data first (data in the backup is compressed by default). Hence, although we can read data out of order, we must write it sequentially. For that, we made a special memory buffer. Chunks can be put there concurrently and out of order, but consumers always read data in order.

The final design

The downloader starts a number of workers equal to the concurrency (the number of CPU cores by default). There is a preallocated arena in the arenas pool for each worker. An arena is basically a byte buffer with a bitmap of free slots. Each arena is split into spans, and the span size equals the download chunk size. When a worker wants to download a chunk, it first acquires a free span from its arena and downloads the data into it. When a consumer reads this data, the span is marked as free and can be reused for the next chunk. The worker doesn’t wait for the data to be read; once it has downloaded a chunk, it takes another chunk from the task queue, acquires the next free span, and downloads the data. To prevent uncontrolled memory consumption, the number of spans in each arena is limited, and if all spans are busy, the worker has to wait for a free one before downloading the next chunk.

On the other hand, for each file we keep track of what has already been given to the consumer (the offset of the last written byte). If a downloaded chunk is out of order, it is pushed onto a heap; otherwise, it is handed to the consumer. On the next iteration (the next downloaded chunk), we check the top of the heap and pop chunks off, handing them to the consumer for as long as they are in order.

See the commit with changes for more details.

Config options

A few new options were added to the PBM config file to tweak concurrent downloads. 

numDownloadWorkers – sets concurrency. Default: number of CPUs

downloadChunkMb – the size of the chunks to download in Mb. Default: 32

maxDownloadBufferMb – the upper limit of the memory that can be used for download buffers (arenas) in Mb. Default: numDownloadWorkers * downloadChunkMb * 16. If set, chunk size might be changed to fit the max requirements. It doesn’t mean that all of this memory will be used and actually allocated in the physical RAM.
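
For example, the options can be applied with pbm config. This is only a sketch: it assumes the keys live under the restore section of the configuration, as the option names above suggest; adjust the values to your instance size.

# Raise restore download concurrency and cap the buffer memory.
pbm config --set restore.numDownloadWorkers=8
pbm config --set restore.downloadChunkMb=32
pbm config --set restore.maxDownloadBufferMb=4096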

Results

PBM supports different storage types, but for this implementation, we decided to start with the most widely used – S3 compatible. We aim to port it to Azure Blob and FileSystem storage types in subsequent releases.

Our tests on AWS S3 show up to 19x improvements in the restore speed:

Concurrent download:

+---------------------------------+-------------+-------------+-----------+--------------+
| Instances                       | Backup size | Concurrency | Span size | Restore time |
+---------------------------------+-------------+-------------+-----------+--------------+
| i3en.xlarge (4vCPU, 16GB RAM)   | 500GB       | 4           | 32MB      | 45 min       |
| i3en.xlarge (4vCPU, 16GB RAM)   | 500GB       | 8           | 32MB      | 32 min       |
| i3en.3xlarge (12vCPU, 96GB RAM) | 5TB         | 12          | 32MB      | 168 min      |
| i3en.3xlarge (12vCPU, 96GB RAM) | 5TB         | 24          | 32MB      | 117 min      |
+---------------------------------+-------------+-------------+-----------+--------------+

Release v2.0.3:

+---------------------------------+-------------+--------------+
| Instances                       | Backup size | Restore time |
+---------------------------------+-------------+--------------+
| i3en.xlarge (4vCPU, 16GB RAM)   | 500GB       | 227 min      |
| i3en.3xlarge (12vCPU, 96GB RAM) | 5TB         | ~2280 min    |
+---------------------------------+-------------+--------------+

 


* Tests were made on AWS i3en instances with the S3 storage in the same region.

** We didn’t wait for the 5TB restore on v2.0.3 to finish and extrapolated the result from the time-per-GB ratio.

Try Percona Backup for MongoDB for faster restores

This is a significant improvement that arrives alongside other features in the new Percona Backup for MongoDB (PBM) release. Give it a try, and leave your feedback!

 

Get Percona Backup for MongoDB

Apr
24
2023
--

Talking Drupal #396 – Drupal Security

Today we are talking about Drupal Security with Mark Shropshire & Benji Fisher.

For show notes visit: www.talkingDrupal.com/396

Topics

  • Why do you care about security
  • Best tips for securing Drupal
  • Common Security Issues people have with Drupal
  • Convincing module maintainers to do full releases
  • Testing to ensure security
  • Guardr Drupal security distribution
  • What does the Drupal Security team do
  • Finding issues
  • Review compromised sites
  • Becoming a member
  • Process for writing security notices
  • Helping the security team

Resources

Guests

Benji Fisher – tag1consulting.com @benji17fisher
Mark Shropshire – shrop.dev @shrop

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan
John Picozzi – www.epam.com @johnpicozzi
Jordan Graham – @jordanlgraham

MOTW Correspondent

Martin Anderson-Clutz – @mandclu

CrowdSec – Integrates your Drupal site with the open source CrowdSec Security Engine, a collaborative malicious activity detection and remediation tool.

Apr
24
2023
--

PostgreSQL Indexes Can Hurt You: Negative Effects and the Costs Involved


Indexes are generally considered a panacea when it comes to SQL performance tuning, and PostgreSQL supports different types of indexes catering to different use cases. I keep seeing articles and talks on “tuning” that discuss how creating new indexes speeds up SQL, but rarely ones that discuss removing them. The urge to create more and more indexes causes severe damage in many systems. Often, removing indexes is what we should do first, for the benefit of the entire system, before considering any new ones. Surprised? Understanding the consequences and overhead of indexes can help you make an informed decision and potentially save the system from many problems.

At a very basic level, we should remember that indexes are not free of cost. The benefits come with a cost in terms of performance and resource consumption. The following is the list of ten problems/overheads that the excessive use of indexes can cause. This post is about PostgreSQL, but most of the problems also apply to other database systems.

1. Indexes penalize the transactions

We might see an improvement in the performance of a SELECT statement after adding an index. But we should not forget that the performance gains come with a cost to the transactions on the same table. Conceptually, every DML on a table needs to update all the indexes of the table. Even though there are a lot of optimizations for reducing the write amplification, it is a considerable overhead.

For example, let’s assume that there are five indexes on a table; every INSERT into the table will result in an INSERT of the index record on those five indexes. Logically, five index pages also will be updated. So effectively, the overhead is 5x.

2. Memory usage

Index pages must be in memory, regardless of whether any query uses them, because they need to be updated by transactions. Effectively, less memory remains available for the table’s own pages. The more indexes there are, the more memory is required for effective caching. If we don’t increase the available memory, this starts hurting the overall performance of the system.

3. Random writes: Updating indexes is more costly

Unlike inserting new records into a table, where consecutive rows tend to land on the same page, index entries are scattered, so they are far less likely to go into the same page. Indexes like B-Tree indexes are known to cause more random writes.

4. Indexes need more cache than tables

Due to random writes and reads, indexes need more pages to be in the cache. Cache requirements for indexes are generally much higher than associated tables.
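
One way to see this on a running system is to compare cache hit ratios for tables and their indexes. The query below is only a sketch against the standard pg_statio_user_tables view; interpret the numbers relative to your own workload.

-- Compare table vs. index buffer cache hit ratios per table.
SELECT relname,
       round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 2) AS table_hit_pct,
       round(100.0 * idx_blks_hit  / nullif(idx_blks_hit  + idx_blks_read, 0), 2)  AS index_hit_pct
FROM pg_statio_user_tables
ORDER BY idx_blks_read DESC
LIMIT 10;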

5. WAL generation

In addition to WAL records of the table updates, there will also be WAL records for indexes. This helps in crash recovery and replication. If you are using any wait event analysis tools/scripts like pg_gather, the overhead of the WAL generation will be clearly visible. The actual impact depends on the index type.

(Screenshot: pg_gather wait event analysis showing WAL-related wait events.)

This is a synthetic test case, but if WAL-related wait events appear as any of the top wait events, it is a matter of concern for a transaction system, and we should take every step to address it.

 

Download Percona Distribution for PostgreSQL Today!

 

6. More and more I/O

Not just WAL records are generated; we will have more pages dirtied, as well. As the index pages get dirtied, they must be written back to files, leading to more I/O again—the “DataFileWrite” wait event, as seen in the previous screenshot.

Another side effect is indexes increase the total Active-Dataset size. By “Active dataset,” I mean the tables and indexes which are frequently queried and used. As the size of the active dataset increases, the cache becomes less and less efficient. Less-effective cache results in more datafile read, so read I/O is increased. This is in addition to the read I/O required to bring the additional index pages from storage for specific queries.

Again the pg_gather report of another system with mainly select queries shows this problem. As the Active-Dataset increases, PostgreSQL has no choice but to bring the pages from storage.

(Screenshot: pg_gather report from a system with mainly SELECT queries, showing sustained DataFileRead wait events.)

A higher percentage of “DataFileRead” waits sustained over a long duration indicates that the active dataset is much bigger than what can be cached.

7. Impact on VACUUM/AUTOVACUUM

The overhead is not only for inserting or updating index pages, as discussed in the previous points. There is also maintenance overhead, since indexes need cleanup of old tuple references as well.

I have seen cases where autovacuum workers on a single table run for a very long time because of the size of the table and, most importantly, the excessive number of indexes on it. In fact, it is common for users to see an autovacuum worker that appears “stuck” for hours without showing any progress. This happens because index cleanup is an opaque stage of autovacuum: views like pg_stat_progress_vacuum show little about it beyond the phase being reported as “vacuuming indexes.”

Indexes can get bloated and become less efficient over time.  Periodic index maintenance (REINDEX) might be needed in many systems.
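
For example, on PostgreSQL 12 and newer, a bloated index can be rebuilt online; the index name below is just a placeholder.

-- Rebuild a bloated index without blocking concurrent writes (PostgreSQL 12+).
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;
-- Compare the index size before and after the rebuild.
SELECT pg_size_pretty(pg_relation_size('idx_orders_customer_id'));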

8. Tunnel vision while tuning

Tunnel vision is the loss of the wider field of view. A user may concentrate on a particular SQL statement in an attempt to “tune” it and decide to create indexes for it. By creating an index to tune one query, we shift more system resources toward that query. That may give more performance to that particular statement while penalizing others.

But as we keep creating more and more indexes to tune other queries, resources shift again toward those queries. This leads to a situation where the effort to tune every query penalizes every other query. Ultimately, everyone is hurt; there are only losers in this war. Someone trying to tune should consider how every part of the system can coexist (maximizing business value) rather than chasing the absolute maximum performance of a particular query.

9. Greater storage requirement

Almost every day, I see cases where indexes take more storage than tables.


This may sound too silly for those with more money to spend on storage, but we should remember that this has a cascading effect. The total database size grows to a multiple of the actual data. So obviously, backups take more time, storage, and network resources, and then the same backup can put more load on the host machine. This would also increase the time to restore a backup and recover it. Bigger databases affect many things, including more time to build standby instances.
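
A quick way to check this ratio on your own system is a query like the one below (a sketch using the standard size functions; the LIMIT is arbitrary).

-- Tables where the combined size of the indexes rivals or exceeds the table itself.
SELECT relname,
       pg_size_pretty(pg_table_size(relid))   AS table_size,
       pg_size_pretty(pg_indexes_size(relid)) AS indexes_size
FROM pg_stat_user_tables
ORDER BY pg_indexes_size(relid) DESC
LIMIT 10;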

10. Indexes are more prone to corruption

I am not just talking about rarely occurring index-related bugs like silent index corruption of PostgreSQL 14 or index corruption due to glibc collation change, which keeps popping up now and then and affects many environments even today. Over decades of working with databases, I have observed that index corruptions are reported more frequently. (I hope anyone involved in PostgreSQL for years and who has seen hundreds of cases will agree with me). As we increase the number of indexes, we increase the probability.

What should we do?

A set of critical questions should accompany any new index consideration: Is this index essential, or is it worth speeding up the query at the cost of yet another index? Is there a way to rewrite the query to get better performance? Is it OK to forgo the small gains and live without the index?

Existing indexes also require a critical review over a period of time. All unused indexes (those indexes with idx_scan as zero in pg_stat_user_indexes) should be considered for dropping. Scripts like the one from pgexperts can help to do more analysis.
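
A starting point is a query like the following sketch. It skips unique indexes, which may be backing constraints, and remember that idx_scan counters reset along with the statistics, so a zero only means “not used since the last stats reset.”

-- Candidate unused indexes: never scanned since statistics were last reset.
SELECT s.schemaname,
       s.relname       AS table_name,
       s.indexrelname  AS index_name,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;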

The upcoming PostgreSQL 16 adds one more column to pg_stat_user_indexes / pg_stat_all_indexes, last_idx_scan, which records the timestamp of the last time the index was used. This will help us take a well-informed look at all the indexes in the system.

Summary

The summary in simple words: Indexes are not cheap. There is a cost, and the cost can be manifold. Indexes are not always good, and sequential scans are not always bad, either. My humble advice is to avoid reaching for individual query improvements as the first step, because it is a slippery slope. A top-down approach to tuning the system yields better results, starting from the host machine, the operating system, PostgreSQL parameters, the schema, and so on. An objective cost-benefit analysis is important before creating an index.

Our world-class PostgreSQL training teaches you all about indexes: when to use them, when not to use them, how they impact your system, etc. Come take our intensive instructor-led training course.

 

Learn more about Percona Training

Apr
24
2023
--

How to Persist a Hashed Format Password Inside ProxySQL


In this blog post, we will see how to persist passwords inside the ProxySQL mysql_users table in hashed format only. We will also see how, even if someone stored a password in clear text, it can easily be converted to the hashed format.

We are highlighting a scenario we ran into while working on a client environment: the ProxySQL mysql_users table had more than 100 user entries, but some of them had been inserted with clear text passwords, whereas others were properly inserted as hashed entries.

Before explaining the simple commands used to quickly convert those clear text entries into hashed ones, let’s review some background on the ProxySQL mysql_users table and password formats.

Password formats inside ProxySQL

ProxySQL is capable of storing passwords in two different formats within the mysql_users.password field, whether in-memory or on-disk. These formats include plain text and hashed passwords.

Plain text passwords are extremely vulnerable to unauthorized access, as anyone with access to the database or configuration files can easily read them. While storing these files in a secure location can mitigate some security concerns, there is still a risk of data breaches. Hashed passwords, on the other hand, are stored in the same format as passwords in the MySQL server’s mysql.user.password column (in older MySQL versions) or mysql.user.authentication_string column (in current versions, with the mysql_native_password plugin), providing an added layer of security.

In ProxySQL, any password that begins with an asterisk (*) is considered to be a hashed password.

The Admin interface of ProxySQL lacks a PASSWORD() function. Therefore, any passwords stored within ProxySQL are preserved in the format in which they were originally inserted. This format may either be plain text or a hashed value.

Note: In general, ProxySQL doesn’t support users created with the caching_sha2_password plugin once the corresponding mysql.user.authentication_string is stored inside the mysql_users.password column. There is a workaround for such accounts: insert the clear text password into the ProxySQL mysql_users.password column. However, keeping clear text passwords inside ProxySQL is not recommended as a security best practice. Hence, ProxySQL-to-MySQL communication works best with users created with the mysql_native_password plugin inside the database.

 

For more details, please check this blog post ProxySQL Support for MySQL caching_sha2_password and the official ProxySQL documentation Information about MySQL 8.0 – ProxySQL .

So, to explain this scenario, here we created four different test DB user accounts inside the database with the mysql_native_password plugin.

From a database node: 

+----------+--------------------------+
| Username | Password (In Clear Text) |
+----------+--------------------------+
| test1    | test1                    |
| test2    | test2                    |
| test3    | test3                    |
| test4    | test4                    |
+----------+--------------------------+
mysql [localhost:8028] {msandbox} ((none)) > select user,host,authentication_string,plugin from mysql.user where user like 'test%'\G
*************************** 1. row ***************************
                 user: test1
                 host: localhost
authentication_string: *06C0BF5B64ECE2F648B5F048A71903906BA08E5C
               plugin: mysql_native_password
*************************** 2. row ***************************
                 user: test2
                 host: localhost
authentication_string: *7CEB3FDE5F7A9C4CE5FBE610D7D8EDA62EBE5F4E
               plugin: mysql_native_password
*************************** 3. row ***************************
                 user: test3
                 host: localhost
authentication_string: *F357E78CABAD76FD3F1018EF85D78499B6ACC431
               plugin: mysql_native_password
*************************** 4. row ***************************
                 user: test4
                 host: localhost
authentication_string: *D159BBDA31273BE3F4F00715B4A439925C6A0F2D
               plugin: mysql_native_password
4 rows in set (0.00 sec)

From ProxySQL: 

Here we will insert the user accounts into the mysql_users table in mixed formats: some with clear text passwords and some with hashed ones.

ProxySQL_Admin> INSERT INTO mysql_users(username,password) VALUES ('test1','test1'), ('test2','*7CEB3FDE5F7A9C4CE5FBE610D7D8EDA62EBE5F4E'),('test3','test3'), ('test4','*D159BBDA31273BE3F4F00715B4A439925C6A0F2D');
Query OK, 4 rows affected (0.00 sec)


ProxySQL_Admin> select distinct username,password from mysql_users where username like 'test%';
+----------+-------------------------------------------+
| username | password                                  |
+----------+-------------------------------------------+
| test1    | test1                                     |
| test2    | *7CEB3FDE5F7A9C4CE5FBE610D7D8EDA62EBE5F4E |
| test3    | test3                                     |
| test4    | *D159BBDA31273BE3F4F00715B4A439925C6A0F2D |
+----------+-------------------------------------------+
4 rows in set (0.00 sec)


ProxySQL_Admin> LOAD MYSQL USERS TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)


ProxySQL_Admin> SAVE MYSQL USERS TO DISK;
Query OK, 0 rows affected (0.01 sec)


ProxySQL_Admin> select distinct username,password from mysql_users where username like 'test%';
+----------+-------------------------------------------+
| username | password                                  |
+----------+-------------------------------------------+
| test1    | test1                                     |
| test2    | *7CEB3FDE5F7A9C4CE5FBE610D7D8EDA62EBE5F4E |
| test3    | test3                                     |
| test4    | *D159BBDA31273BE3F4F00715B4A439925C6A0F2D |
+----------+-------------------------------------------+
4 rows in set (0.00 sec)

Below is some additional ProxySQL configuration used to verify that the passwords in ProxySQL’s mysql_users table work for establishing connections to the MySQL database.

ProxySQL_Admin> select hostgroup_id,hostname,port,status from mysql_servers;
+--------------+-----------+------+--------+
| hostgroup_id | hostname  | port | status |
+--------------+-----------+------+--------+
| 10           | localhost | 8028 | ONLINE |
+--------------+-----------+------+--------+
1 row in set (0.00 sec)


ProxySQL_Admin> select rule_id,active,proxy_port,match_digest,destination_hostgroup,retries,apply from mysql_query_rules\G
*************************** 1. row ***************************
              rule_id: 1048
               active: 1
           proxy_port: 6033
         match_digest: ^SELECT.*FOR UPDATE
destination_hostgroup: 10
              retries: 3
                apply: 1
*************************** 2. row ***************************
              rule_id: 1050
               active: 1
           proxy_port: 6033
         match_digest: ^SELECT.*$
destination_hostgroup: 10
              retries: 3
                apply: 1
2 rows in set (0.00 sec)

Let’s check the database connectivity via ProxySQL using these DB user accounts.

for i in {1..4}; do mysql -h 127.0.0.1 -utest$i -ptest$i -P6033 -e"select current_user(),version();";done
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------+-----------+
| current_user()  | version() |
+-----------------+-----------+
| test1@localhost | 8.0.28    |
+-----------------+-----------+
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------+-----------+
| current_user()  | version() |
+-----------------+-----------+
| test2@localhost | 8.0.28    |
+-----------------+-----------+
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------+-----------+
| current_user()  | version() |
+-----------------+-----------+
| test3@localhost | 8.0.28    |
+-----------------+-----------+
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------+-----------+
| current_user()  | version() |
+-----------------+-----------+
| test4@localhost | 8.0.28    |
+-----------------+-----------+

As seen above, the main problem is that our mysql_users table has plain text passwords visible for some user accounts (test1 and test3), which we don’t want to keep in clear text. Instead, all password entries should be stored in the hashed format.

One way to fix this problem is to drop the clear text entries for those user accounts, generate the password hashes from the MySQL prompt using the PASSWORD() function, and then insert the actual hashed entries into the mysql_users table.
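
For reference, the hash that ProxySQL expects for mysql_native_password accounts can be generated as shown below. On MySQL 5.7 the PASSWORD() function does it directly; PASSWORD() was removed in MySQL 8.0, where the SHA1-based expression produces the same value ('test1' here is just the example password used above, so the result should match the hash shown earlier for the test1 account).

-- MySQL 5.7 and earlier:
SELECT PASSWORD('test1');

-- MySQL 8.0 equivalent of the mysql_native_password hash:
SELECT CONCAT('*', UPPER(SHA1(UNHEX(SHA1('test1')))));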

But as mentioned earlier, if our mysql_users table has a lot of entries (more than 100), fixing those passwords manually can be a tedious job.

Note: Here it is assumed we are not using the percona-scheduler-admin client, which can sync user accounts from the database nodes directly into the ProxySQL mysql_users table.

For this case, let’s see in the next section how the admin-hash_passwords variable helps us solve this problem and persist only hashed password entries inside ProxySQL’s mysql_users table.

ProxySQL’s admin-hash_passwords variable

ProxySQL version 1.2.3 has included a new global boolean variable called admin-hash_passwords, which is enabled by default to support hashed passwords. If admin-hash_passwords=true, passwords will be automatically hashed during runtime when executing the LOAD MYSQL USERS TO RUNTIME command. However, passwords stored in the mysql_users tables will not be automatically hashed.

Nevertheless, it is possible to hash these passwords both in-memory and on-disk by copying users back from RUNTIME with SAVE MYSQL USERS FROM RUNTIME after executing LOAD MYSQL USERS TO RUNTIME, and then saving the updated information with SAVE MYSQL USERS TO DISK.

Let’s persist the hashed password inside ProxySQL

ProxySQL_Admin> select @@admin-hash_passwords;
+------------------------+
| @@admin-hash_passwords |
+------------------------+
| true                   |
+------------------------+
1 row in set (0.00 sec)


ProxySQL_Admin> select distinct username,password from mysql_users where username like 'test%';
+----------+-------------------------------------------+
| username | password                                  |
+----------+-------------------------------------------+
| test1    | test1                                     |
| test2    | *7CEB3FDE5F7A9C4CE5FBE610D7D8EDA62EBE5F4E |
| test3    | test3                                     |
| test4    | *D159BBDA31273BE3F4F00715B4A439925C6A0F2D |
+----------+-------------------------------------------+
4 rows in set (0.00 sec)


ProxySQL_Admin> LOAD MYSQL USERS TO RUNTIME;
Query OK, 0 rows affected (0.01 sec)


ProxySQL_Admin> select distinct username,password from mysql_users where username like 'test%';
+----------+-------------------------------------------+
| username | password                                  |
+----------+-------------------------------------------+
| test1    | test1                                     |
| test2    | *7CEB3FDE5F7A9C4CE5FBE610D7D8EDA62EBE5F4E |
| test3    | test3                                     |
| test4    | *D159BBDA31273BE3F4F00715B4A439925C6A0F2D |
+----------+-------------------------------------------+
4 rows in set (0.00 sec)

Currently, passwords are hashed at RUNTIME, but they are not hashed in the mysql_users table. To hash them inside the mysql_users table as well, we need to run the SAVE MYSQL USERS FROM RUNTIME command.

ProxySQL_Admin> SAVE MYSQL USERS FROM RUNTIME;
Query OK, 0 rows affected (0.00 sec)


ProxySQL_Admin> select distinct username,password from mysql_users where username like 'test%';
+----------+-------------------------------------------+
| username | password                                  |
+----------+-------------------------------------------+
| test1    | *06C0BF5B64ECE2F648B5F048A71903906BA08E5C |
| test2    | *7CEB3FDE5F7A9C4CE5FBE610D7D8EDA62EBE5F4E |
| test3    | *F357E78CABAD76FD3F1018EF85D78499B6ACC431 |
| test4    | *D159BBDA31273BE3F4F00715B4A439925C6A0F2D |
+----------+-------------------------------------------+
4 rows in set (0.00 sec)

The command SAVE MYSQL USERS TO DISK can now be used to store/persist the hashed passwords on the disk.

ProxySQL_Admin> SAVE MYSQL USERS TO DISK;
Query OK, 0 rows affected (0.01 sec)


ProxySQL_Admin> select distinct username,password from mysql_users where username like 'test%';
+----------+-------------------------------------------+
| username | password                                  |
+----------+-------------------------------------------+
| test1    | *06C0BF5B64ECE2F648B5F048A71903906BA08E5C |
| test2    | *7CEB3FDE5F7A9C4CE5FBE610D7D8EDA62EBE5F4E |
| test3    | *F357E78CABAD76FD3F1018EF85D78499B6ACC431 |
| test4    | *D159BBDA31273BE3F4F00715B4A439925C6A0F2D |
+----------+-------------------------------------------+
4 rows in set (0.00 sec)

Let’s verify the database connectivity via ProxySQL using these DB user accounts.

for i in {1..4}; do mysql -h 127.0.0.1 -utest$i -ptest$i -P6033 -e"select current_user(),version();";done
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------+-----------+
| current_user()  | version() |
+-----------------+-----------+
| test1@localhost | 8.0.28    |
+-----------------+-----------+
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------+-----------+
| current_user()  | version() |
+-----------------+-----------+
| test2@localhost | 8.0.28    |
+-----------------+-----------+
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------+-----------+
| current_user()  | version() |
+-----------------+-----------+
| test3@localhost | 8.0.28    |
+-----------------+-----------+
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------+-----------+
| current_user()  | version() |
+-----------------+-----------+
| test4@localhost | 8.0.28    |
+-----------------+-----------+

Conclusion

Using the admin-hash_passwords feature can be extremely beneficial when there are mixed-format password entries in the mysql_users table. By saving the RUNTIME entries, which consist of hashed passwords, to disk and persisting only these entries in the mysql_users table of ProxySQL, we can easily simplify the management of hashed password entries. Furthermore, to ensure that only hashed password entries are stored within ProxySQL, it is imperative to create database user accounts using the mysql_native_password plugin.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!

Apr
24
2023
--

Percona XtraDB Cluster 8.0.32-24, Updated Percona Backup for MongoDB: Release Roundup April 24, 2023


You know what time it is… it’s time for the release roundup!

Percona is a leading provider of unbiased, performance-first, open source database solutions that allow organizations to easily, securely, and affordably maintain business agility, minimize risks, and stay competitive, free from vendor lock-in.

Our Release Roundups showcase the latest Percona software updates, tools, and features to help you manage and deploy our software. They offer highlights, critical information, links to the full release notes, and direct links to download the software or service itself.

Today’s post includes those releases and updates that have come out since April 10, 2023. Take a look!

Percona XtraDB Cluster 8.0.32-24

Percona XtraDB Cluster 8.0.32-24 was released on April 18, 2023. It supports critical business applications in your public, private, or hybrid cloud environment. Our free, open source, enterprise-grade solution includes the high availability and security features your business requires to meet your customer expectations and business goals. Percona XtraDB Cluster is based on Percona Server for MySQL. Find a complete list of improvements and bug fixes in the Percona Server for MySQL 8.0.32-24 (2023-03-20) release notes.

Download Percona XtraDB Cluster 8.0.32-24

Percona Backup for MongoDB 2.1.0

On April 18, 2023, we released Percona Backup for MongoDB 2.1.0. It is a distributed, low-impact solution for consistent backups of MongoDB sharded clusters and replica sets. This is a tool for creating consistent backups across a MongoDB sharded cluster (or a non-sharded replica set) and for restoring those backups to a specific point in time.

Release highlights include:

  • Incremental physical backups are now generally available, enabling you to use them in production environments. Note that due to the changes in metadata files required for the restore, backups made with previous PBM versions are incompatible for the restore with PBM 2.1.0 (typical commands for this and the selective backup feature are sketched after this list).
  • You can now selectively back up and restore sharded collections. This improves the management of a desired subset of data in sharded clusters and saves you extra costs on data storage and transfer. This is the tech preview feature due to some known limitations.
  • The improved handling of master keys for data at rest encryption in Percona Server for MongoDB and the retrieval of the key ID/secret path by PBM from a backup simplifies the environment preparation for the physical restore and improves the restore flow.
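Here is a minimal sketch of how these features are typically driven from the pbm command line; the namespace and backup name are placeholders, and the exact flags should be verified against the PBM 2.1.0 documentation for your deployment:

# Incremental physical backups: take a base backup first, then increments on top of it.
$ pbm backup --type=incremental --base
$ pbm backup --type=incremental

# Selective backup and restore of a single sharded collection (tech preview).
$ pbm backup --ns=mydb.mycollection
$ pbm restore <backup_name> --ns=mydb.mycollection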

Download Percona Backup for MongoDB 2.1.0

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MongoDB, PostgreSQL, MariaDB, and other open source databases in on-premises and cloud environments.

Apr
21
2023
--

Fixing Errant GTID With Orchestrator: The Easy Way Out

Fixing Errant GTID With Orchestrator

In this article, we will discuss errant transactions/GTIDs and how we can resolve them with the Orchestrator tool.

Orchestrator is a MySQL high availability and replication management tool that runs as a service and provides command-line access, an HTTP API, and a web interface. I will not go into the details of Orchestrator here but will explore one of its features that can help us resolve errant GTIDs in a replication topology.

What are errant transactions?

Simply stated, they are transactions executed directly on a replica. Thus they only exist on a specific replica. This could result from a mistake (the application wrote to a replica instead of writing to the source) or by design (you need additional tables for reports).

What problem can errant transactions cause?

The major problem arises during a planned change in the MySQL replication topology: when a replica holding an errant transaction is promoted, the other servers do not have that transaction and will request it from the new source; if it is no longer available in the binary logs, replication fails with an error.
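Before reaching for a tool, you can also spot errant transactions by hand: subtract the source's executed GTID set from the replica's, and anything left over is errant. A minimal sketch (the second argument is a placeholder for the source's @@global.gtid_executed value):

-- Run on the replica, pasting in the gtid_executed value taken from the source.
mysql> SELECT GTID_SUBTRACT(@@global.gtid_executed, '<source gtid_executed>') AS errant_transactions;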

So let’s jump into generating and fixing an errant transaction. Below is my current topology:

[root@monitor ~]# orchestrator-client -c topology-tabulated -alias testcluster | tr '|' '\t'
192.168.56.10:3306  0s ok 5.7.41-44-log rw ROW GTID
+ 192.168.56.20:3306 0s ok 5.7.41-44-log ro ROW GTID
+ 192.168.56.30:3306 0s ok 5.7.41-44-log ro ROW GTID

Now let’s make some changes on any of the replicas, which will generate an errant transaction. On 192.168.56.20:3306, I created a test database:

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

This will result in an errant transaction, so let’s see how the Orchestrator will show the topology.

[root@monitor ~]# orchestrator-client -c topology-tabulated -alias testcluster | tr '|' '\t'
192.168.56.10:3306   0s ok 5.7.41-44-log  rw ROW   GTID
+ 192.168.56.20:3306 0s ok 5.7.41-44-log  ro ROW   GTID:errant
+ 192.168.56.30:3306 0s ok 5.7.41-44-log  ro ROW   GTID

Now you can see we have an errant transaction; we can check in more detail by using the Orchestrator API as below:

[root@monitor ~]# orchestrator-client -c which-gtid-errant -i 192.168.56.20:3306
a71a855a-dcdc-11ed-99d7-080027e6334b:1

To find out which binlog contains this errant transaction, check with the command below:

[root@monitor ~]# orchestrator-client -c locate-gtid-errant -i 192.168.56.20:3306
mysqlbinlog.000001

Checking the binlogs is very important: we should know exactly what changes were made on the replica, and we can inspect that binlog for the specific GTID.
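For instance, you can dump just that transaction from the binlog to see exactly what it changed. A sketch using the GTID and file name reported above, run on the replica that owns the binlog (the host in the prompt is illustrative):

[root@replica ~]# mysqlbinlog --include-gtids='a71a855a-dcdc-11ed-99d7-080027e6334b:1' --verbose mysqlbinlog.000001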

We can also get this information from the replication-analysis API, and you can use this API feature in your custom code if you want to monitor the topology for errant transactions:

[root@monitor ~]# orchestrator-client -c api -path replication-analysis | jq . | grep -A2 -B2 "StructureAnalysis"
      "Analysis": "NoProblem",
      "Description": "",
      "StructureAnalysis": [
        "ErrantGTIDStructureWarning"

There is a more detailed way to compare ExecutedGtidSet and GtidErrant across the whole topology:

[root@monitor ~]# sudo orchestrator-client -c api -path cluster/testcluster | jq -C '.[] | {Port: .Key.Port, Hostname: .Key.Hostname,ServerUUID: .ServerUUID, ExecutedGtidSet: .ExecutedGtidSet, GtidErrant:.GtidErrant}'
{
  "Port": 3306,
  "Hostname": "192.168.56.10",
  "ServerUUID": "3b678bc9-dcdc-11ed-b9fc-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10",
  "GtidErrant": ""
}
{
  "Port": 3306,
  "Hostname": "192.168.56.20",
  "ServerUUID": "a71a855a-dcdc-11ed-99d7-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,\na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": "a71a855a-dcdc-11ed-99d7-080027e6334b:1"
}
{
  "Port": 3306,
  "Hostname": "192.168.56.30",
  "ServerUUID": "ea6c6af9-dcdc-11ed-9e09-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10",
  "GtidErrant": ""
}

Now that we know about the issue, let's fix it with Orchestrator.

The first way to fix it is to inject an empty transaction, which can be done as below:

[root@monitor ~]# orchestrator-client -c gtid-errant-inject-empty -i 192.168.56.20:3306
192.168.56.20:3306

[root@monitor ~]# orchestrator-client -c topology-tabulated -alias testcluster | tr '|' '\t'
192.168.56.10:3306   0s ok 5.7.41-44-log  rw ROW   GTID
+ 192.168.56.20:3306 0s ok 5.7.41-44-log  ro ROW   GTID
+ 192.168.56.30:3306 0s ok 5.7.41-44-log  ro ROW   GTID

The gtid-errant-inject-empty command reconciles errant Global Transaction Identifiers (GTIDs) by injecting empty transactions into the MySQL replication topology. GTIDs uniquely identify transactions in a MySQL cluster, and keeping them consistent across the topology is critical for maintaining data integrity.
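Behind the scenes, injecting an empty transaction amounts to the classic manual fix, which is worth knowing if Orchestrator is not available. A sketch, run on the source, using the errant GTID reported earlier:

mysql> SET gtid_next='a71a855a-dcdc-11ed-99d7-080027e6334b:1';
mysql> BEGIN; COMMIT;
mysql> SET gtid_next='AUTOMATIC';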

By injecting an empty transaction, Orchestrator writes the transaction at the top of the topology; it replicates down to every replica, and the replica that already executed that GTID simply ignores it. You can now see that the gtid_executed set has changed on every server and contains the GTID with the UUID of replica 192.168.56.20:3306.

[root@monitor ~]# sudo orchestrator-client -c api -path cluster/testcluster | jq -C '.[] | {Port: .Key.Port, Hostname: .Key.Hostname,ServerUUID: .ServerUUID, ExecutedGtidSet: .ExecutedGtidSet, GtidErrant: .GtidErrant}'
{
  "Port": 3306,
  "Hostname": "192.168.56.10",
  "ServerUUID": "3b678bc9-dcdc-11ed-b9fc-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,\na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}
{
  "Port": 3306,
  "Hostname": "192.168.56.20",
  "ServerUUID": "a71a855a-dcdc-11ed-99d7-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,\na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}
{
  "Port": 3306,
  "Hostname": "192.168.56.30",
  "ServerUUID": "ea6c6af9-dcdc-11ed-9e09-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,\na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}

Another way to fix this, which is DANGEROUS, is to reset the master.

Orchestrator has a command gtid-errant-reset-master, applied on an instance:

This command “fixes” errant GTID transactions via RESET MASTER; SET GLOBAL gtid_purged…

This command is, of course, destructive to the server’s binary logs. If binary logs are assumed to enable incremental restore, then this command is dangerous.
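For reference, what this does on the replica amounts roughly to the following sketch (the gtid_purged value is a placeholder and must be the replica's executed set minus the errant GTID; on MySQL 8.0.22 and later the STOP/START REPLICA forms apply instead):

mysql> STOP SLAVE;
mysql> RESET MASTER;
mysql> SET GLOBAL gtid_purged='<executed GTID set without the errant transaction>';
mysql> START SLAVE;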

So an example to fix an errant transaction is:

[root@monitor ~]# orchestrator-client -c topology-tabulated -alias testcluster | tr '|' '\t'
192.168.56.10:3306   0s ok 5.7.41-44-log  rw ROW   GTID
+ 192.168.56.20:3306 0s ok 5.7.41-44-log  ro ROW   GTID:errant
+ 192.168.56.30:3306 0s ok 5.7.41-44-log  ro ROW   GTID

[root@monitor ~]# orchestrator-client -c which-gtid-errant -i 192.168.56.20:3306
a71a855a-dcdc-11ed-99d7-080027e6334b:2

This is how it looks:

{
  "Port": 3306,
  "Hostname": "192.168.56.20",
  "ServerUUID": "a71a855a-dcdc-11ed-99d7-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,\na71a855a-dcdc-11ed-99d7-080027e6334b:1-2",
  "GtidErrant": "a71a855a-dcdc-11ed-99d7-080027e6334b:2"
}

Let’s reset the master.

[root@monitor ~]# orchestrator-client -c gtid-errant-reset-master -i 192.168.56.20:3306
192.168.56.20:3306

Now you can see that the ExecutedGtidSet is synced with the source's ExecutedGtidSet.

[root@monitor ~]# sudo orchestrator-client -c api -path cluster/testcluster | jq -C '.[] | {Port: .Key.Port, Hostname: .Key.Hostname,ServerUUID: .ServerUUID, ExecutedGtidSet: .ExecutedGtidSet, GtidErrant: .GtidErrant}'
{
  "Port": 3306,
  "Hostname": "192.168.56.10",
  "ServerUUID": "3b678bc9-dcdc-11ed-b9fc-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,\na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}
{
  "Port": 3306,
  "Hostname": "192.168.56.20",
  "ServerUUID": "a71a855a-dcdc-11ed-99d7-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,\na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}
{
  "Port": 3306,
  "Hostname": "192.168.56.30",
  "ServerUUID": "ea6c6af9-dcdc-11ed-9e09-080027e6334b",
  "ExecutedGtidSet": "3b678bc9-dcdc-11ed-b9fc-080027e6334b:1-10,\na71a855a-dcdc-11ed-99d7-080027e6334b:1",
  "GtidErrant": ""
}

This option is risky because the command actually purges the replica's binary logs. If any application is tailing those logs, or if the binary logs are relied on for incremental restores, this command is dangerous and not recommended. It is better to use gtid-errant-inject-empty. If you still want to use gtid-errant-reset-master on a busy replica, stop replication first, wait two or three minutes, and only then run gtid-errant-reset-master.

Conclusion

If you want to switch to GTID-based replication, make sure to check for errant transactions before any planned or unplanned replication topology change, and be especially careful if you use a tool that reconfigures replication for you. Whenever changes were made directly on a replica, it is also recommended to run pt-table-checksum and pt-table-sync to verify and restore data consistency.
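As a hedged sketch of that workflow (the checksum user and credentials are placeholders), you would first checksum from the source and then print, and only after review apply, the fixes on the drifted replica:

# Run against the source; results are written to the percona.checksums table
# and replicate down so differences can be detected on each replica.
[root@monitor ~]# pt-table-checksum --replicate=percona.checksums h=192.168.56.10,u=checksum_user,p=secret

# Print the statements needed to bring the replica back in sync with its source
# (swap --print for --execute once the output has been reviewed).
[root@monitor ~]# pt-table-sync --replicate=percona.checksums --sync-to-master h=192.168.56.20,u=checksum_user,p=secret --print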

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!

Apr
20
2023
--

Upgrading to MySQL 8: Tools That Can Help

upgrading to MySQL 8

As we approach the end of life for MySQL 5.7 later this year, many businesses are currently working toward upgrading to MySQL 8. Such major version upgrades are rarely simple, but thankfully there are tools that can help smooth the process and ensure a successful upgrade.

It should be noted that while the technical aspects of the upgrade process are beyond the scope of this blog post, it is crucial to create a testing environment to verify the upgrade before proceeding to upgrade your production servers, particularly with MySQL 8. 

As there is no procedure for downgrading from MySQL 8 other than restoring a backup, testing and validation are more critical than with previous major version upgrades.

With that disclaimer out of the way, let’s examine some of the tools that are available to simplify the upgrade process.

Percona Toolkit

Percona Toolkit is a collection of advanced open source command-line tools developed and used by the Percona technical staff that are engineered to perform a variety of MySQL server and system tasks that are too difficult or complex to perform manually. 

Several of these tools can help with upgrade planning, making the entire process much easier and less prone to downtime or issues. 

    • pt-upgrade
      • The pt-upgrade tool helps you run application SELECT queries and generates reports on how each query pattern performs on the servers across different versions of MySQL (example invocations for these tools are sketched after this list).
    • pt-query-digest
      • Best practice dictates gathering and testing all application queries by activating the slow query log for a period of time, which typically leaves companies with an enormous amount of slow log data. The pt-query-digest tool can assist in preparing the query digest for your upgrade testing.
    • pt-config-diff
      • The pt-config-diff tool helps determine the differences in MySQL settings between files and server variables. This allows a comparison of the upgraded version to the previous version, allowing validation of configuration differences. 
    • pt-show-grants
      • The pt-show-grants tool extracts, orders, and then prints grants for MySQL user accounts. This can help to export and back up your MySQL grants before an upgrade or allow you to easily replicate users from one server to another by simply extracting the grants from the first server and piping the output directly into another server.
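For orientation, here is a sketch of typical invocations of these four tools; host names, file names, and credentials are placeholders and the options should be checked against the Percona Toolkit documentation for your version:

# Replay a slow-log workload against the old and new servers and compare query results.
$ pt-upgrade h=old-57-host h=new-80-host slow.log > pt-upgrade-report.txt

# Condense a large slow log into a digest of unique query patterns.
$ pt-query-digest slow.log > slow-digest.txt

# Compare the new server's running configuration against the old my.cnf.
$ pt-config-diff h=new-80-host /etc/my.cnf.57

# Export user grants so they can be reviewed or replayed on the upgraded server.
$ pt-show-grants h=old-57-host > grants.sql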

Early last year, my colleague Arunjith Aravindan wrote a great blog post covering these Percona tools while also detailing the pt-upgrade process itself. You can read his post here: Percona Utilities That Make Major MySQL Version Upgrades Easier.

For more information or to download the Percona Toolkit, please visit: https://docs.percona.com/percona-toolkit/

MySQL Shell Upgrade Checker

In recent years, MySQL has continued to evolve and introduce new features to enhance its capabilities. One of the more relevant additions to the MySQL toolset is the MySQL Shell Upgrade Checker, which is a built-in upgrade tool that enables users to check whether their MySQL instances are compatible with a specific version before upgrading.

The MySQL Shell Upgrade Checker is designed to help users avoid potential issues that may arise during the upgrade process. This tool works by analyzing the structure and content of the existing database and then comparing it to the requirements of the new version. In doing so, it can detect any potential issues, such as deprecated syntax or incompatible data types, and provide guidance on resolving them.

The MySQL Shell Upgrade Checker examines the dataset for compatibility, looking for things such as:

    • Use of old temporal types
    • Use of database objects which conflict with reserved words
    • Removed system variables

All in all, the MySQL Shell Upgrade Checker contains around 21 total checks and will produce a detailed report of any errors, warnings, or notices.
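A typical invocation from the command line looks something like the sketch below; the connection string, target version, and config path are placeholders, and the same checks can be run from an interactive session via util.checkForServerUpgrade():

$ mysqlsh -- util check-for-server-upgrade root@localhost:3306 --target-version=8.0.32 --config-path=/etc/my.cnf --output-format=TEXT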

MySQL 8 & MySQL Shell Upgrade Checker vs. pt-upgrade

Don’t confuse the MySQL Shell Upgrade Checker Utility with the pt-upgrade tool since they are used for very different kinds of major version upgrade testing. 

The MySQL Upgrade Checker Utility performs a variety of tests on a selected MySQL server to ascertain whether the upgrade will be successful. This tool, however, does NOT confirm whether the upgrade will be compatible with the application queries or routines, which is where pt-upgrade would come in. 

For more information on the MySQL 8 Shell Upgrade Checker, please refer to the following:

https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-upgrade.html

Conclusion

As always, with anything database related, one of the most important tasks to complete prior to any upgrade is to test extensively. This limits the risk of unforeseen circumstances and allows time to address any unfavorable results before the upgrade. Downgrading a major version is not very feasible once writes are sent to the new MySQL 8 source, so proper planning beforehand is critical.

The tools listed in this post can assist with this planning and help negate any nasty surprises before they become a problem. Percona is always here to help as well, so if you’d like to discuss our Support or Professional Service options, please feel free to contact us!

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!
