To be honest, the comparison between the two MySQL distributions is not something that excited me a lot, mainly because, from my MySQL memories, I knew there is no real difference between the two distributions when talking about the code base. To my knowledge, the differences in the enterprise version are in the additional […]
MySQL Table Size Is Way Bigger After Adding a Simple Index; Why?
It is a known good practice to keep only the necessary indexes in order to reduce write-performance and disk space overhead. This simple rule is mentioned briefly in the official MySQL documentation: https://dev.mysql.com/doc/refman/8.0/en/optimization-indexes.html. However, in some cases, the overhead from adding a new index can be way above expectations! Recently, I’ve been analyzing a customer case like […]
Decoding Sequential Scans in PostgreSQL
In every database product, sequential scans or full table scans are often resource-consuming, so almost all developers and DBAs see such scans as a performance killer. In my opinion, this is a false perception or a biased view. In many cases, sequential scans prove to be a performance booster. However, due to a […]
Common PostgreSQL Upgrade Errors and How to Avoid Them
Everyone usually welcomes an upgrade, whether it’s a new operating system on your phone, a first-class seat on a flight, or a revamped kitchen in your home. Upgrades typically mean improvement, which is true for PostgreSQL, as each new version of PostgreSQL brings updates and enhancements essential for your application’s optimal performance, stability, and security. […]
Can’t We Assign a Default Value to the BLOB, TEXT, GEOMETRY, and JSON Data Types?
One of our customers wanted to create a table having a column of data type TEXT with a default value, but they encountered an error. It seems reasonable at first glimpse, as we know that each BLOB, TEXT, GEOMETRY, or JSON value is represented internally by a separately allocated object. This is in contrast […]
What’s New in Percona Toolkit 3.5.6
Percona Toolkit 3.5.6 was released on December 21, 2023. This blog post covers the main changes in this release.
New tool: pt-galera-log-explainer
We continue adding more tools for the Support teams.
pt-galera-log-explainer was written by my Percona Support colleague Yoann La Cancellera.
pt-galera-log-explainer filters, aggregates, and summarizes multiple Galera library logs together. The purpose of this tool is to help find useful information in Percona XtraDB Cluster/Galera library logs.
pt-galera-log-explainer takes raw log files, which are usually very verbose (like the one available in the regression test suite), and makes a short summary out of them:
$ ./bin/pt-galera-log-explainer list --all src/go/pt-galera-log-explainer/tests/logs/merge_rotated_daily/node1.20230315.log
identifier        node1
current path      .../tests/logs/merge_rotated_daily/node1.20230315.log
last known ip
last known name   node1
mysql version
2023-03-15T20:10:57.784904+02:00  node2 joined
2023-03-15T20:10:57.785568+02:00  node3 left
2023-03-15T20:10:57.791959+02:00  node3 left
2023-03-15T20:10:57.797221+02:00  PRIMARY(n=2)
2023-03-15T20:20:12.714291+02:00  node2 joined
2023-03-15T20:20:12.714331+02:00  node3 joined
2023-03-15T20:20:13.776977+02:00  PRIMARY(n=3)
2023-03-15T20:20:14.839684+02:00  local node will resync node3
2023-03-15T20:20:14.839723+02:00  SYNCED -> DONOR
2023-03-15T20:20:15.799020+02:00  IST will be used
2023-03-15T20:20:16.850525+02:00  finished sending IST to node3
2023-03-15T20:20:16.850549+02:00  DESYNCED -> JOINED
2023-03-15T20:20:16.865312+02:00  JOINED -> SYNCED
The tool can process logs from multiple nodes and draw a timeline. You can find a usage example with sample output in the user reference manual. You can also filter events to have a more compact output.
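As a minimal sketch of the multi-node case (the log file names below are placeholders), you can pass several node logs to the same list subcommand shown above to get a merged, chronologically ordered view:
# Summarize and merge events from three nodes' logs (file names are placeholders)
./bin/pt-galera-log-explainer list --all node1.log node2.log node3.log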
pt-galera-log-explainer is an excellent tool for analyzing the large log files that the Galera library produces.
Better macOS and ARM support
This release contains a few improvements for platforms and operating systems that Percona does not officially support.
PR-516, contributed by Ivan Kruglov, makes pt-online-schema-change and other tools respect case-insensitive lookup on Windows and macOS: the default behavior on these operating systems.
PR-720 simplifies the build process for the Percona Toolkit. Historically, Percona Toolkit was written in Perl and Shell programming languages. Build instructions for all code were simple:
perl Makefile.PL
make
(optionally) make test
make install
Since the first tool, written in the Go programming language, was introduced, package maintainers had to perform one extra step:
cd src/go
make <PLATFORM such as linux-amd64>
Now, this extra step is optional because the top-level Makefile has instructions for building Go tools on the current platform.
PR-712 adds the darwin-arm64 platform to the list of platforms for which Go binaries can be built. While we do not officially support ARM and macOS, you can build the tools yourself without extra effort.
To create macOS ARM binaries on the same platform, simply run the following from the top-level directory.
perl Makefile.PL
make
If you want to create binaries on another platform, change the directory to src/go, then run
make darwin-arm64
We also have unofficial ARM packages for Percona Toolkit and PMM Dump in our labs at https://github.com/Percona-Lab/percona-on-arm.
Tools improvements
Explain output for slow query in JSON report for pt-query-digest
Earlier, pt-query-digest did not print EXPLAIN output with the option --output=json. PR-471, sent by Ayush Goyal, adds this possibility.
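As a quick, hedged illustration (the slow query log path and output file are placeholders), the JSON report can now be produced like this:
# Generate the JSON report, which now includes EXPLAIN output for slow queries
pt-query-digest --output=json /var/lib/mysql/slow.log > digest.json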
Hook before_die for pt-online-schema-change
PR-509, sent by Ilaria Migliozzi, introduces a new hook for pt-online-schema-change. If pt-online-schema-change stops ungracefully, this hook lets you print diagnostic information about the failing operation. You can find an example plugin on GitHub.
Option --unstop for pt-archiver
pt-archiver supports the option --stop, which terminates running instances by creating a sentinel file. However, there was no option to remove this file and restart pt-archiver. PR-429 by fraff closes this gap by adding the option --unstop. If pt-archiver is called with this option, it removes the sentinel file and restarts the operation.
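A hedged sketch of how the two options pair up (the sentinel path, DSN, and archiving options are placeholders; check the tool documentation for the options your job actually needs):
# Stop a running pt-archiver instance by creating its sentinel file
pt-archiver --stop --sentinel /tmp/pt-archiver-sentinel
# Remove the sentinel file and start archiving again
pt-archiver --unstop --sentinel /tmp/pt-archiver-sentinel \
  --source h=127.0.0.1,D=test,t=t1 --purge --where "created_at < NOW() - INTERVAL 90 DAY"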
Quality improvements
In my last Percona Toolkit release blog, What’s New in Percona Toolkit 3.5.5, I wrote about how we plan to make regression tests part of the release process. This partially happened, and we are testing Percona Toolkit with Percona Server for MySQL 5.7 and 8.0 on all supported platforms now. There are still a few issues with tests, as reported at PT-2295. We are planning to fix them and then add support for other products.
Viktor Szépe continued to contribute to the quality of the Percona Toolkit source code. He introduced EditorConfig for our GitHub repository, fixed typos, and made style improvements for Go code.
In addition to Kushal Haldar’s reports about vulnerabilities in Go, we enabled better automation for our GitHub repository. This release was built with the latest version of Go and fixes all known module vulnerabilities.
Percona Toolkit Docker images
We released an official Docker image for the Percona Toolkit at https://hub.docker.com/r/percona/percona-toolkit. It can be installed using the following command.
docker pull percona/percona-toolkit
To call any tool using Docker, run the following.
docker run percona/percona-toolkit <TOOL NAME> <OPTIONS>
For example:
$ docker run --network="host" percona/percona-toolkit pt-online-schema-change
> h=127.0.0.1,P=12345,u=msandbox,p=msandbox,D=test,t=t1 --alter='ADD COLUMN f2 INT'
> --execute
Found 2 slaves:
s76 -> 127.0.0.1:12346
s76 -> 127.0.0.1:12347
Will check slave lag on:
s76 -> 127.0.0.1:12346
s76 -> 127.0.0.1:12347
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `test`.`t1`...
Creating new table...
Created new table test._t1_new OK.
Altering new table...
Altered `test`.`_t1_new` OK.
2023-12-21T15:29:22 Creating triggers...
2023-12-21T15:29:22 Created triggers OK.
2023-12-21T15:29:22 Copying approximately 1 rows...
2023-12-21T15:29:22 Copied rows OK.
2023-12-21T15:29:22 Analyzing new table...
2023-12-21T15:29:22 Swapping tables...
2023-12-21T15:29:22 Swapped original and new tables OK.
2023-12-21T15:29:22 Dropping old table...
2023-12-21T15:29:22 Dropped old table `test`.`_t1_old` OK.
2023-12-21T15:29:22 Dropping triggers...
2023-12-21T15:29:22 Dropped triggers OK.
Successfully altered `test`.`t1`
Docker images are useful not only for those who prefer to install software via Docker; they will also allow us to easily ship the Percona Toolkit together with other software, such as Percona Monitoring and Management or Percona Kubernetes Operators.
Community contributions
This release includes many contributions from community members. We want to thank:
- Ivan Kruglov for fixing case-sensitivity issues in pt-online-schema-change and other tools
- Chrys Swingler for fixing a bug with the option --skip-check-slave-lag in pt-online-schema-change and pt-table-checksum
- Ayush Goyal for adding EXPLAIN output for slow query in JSON report for pt-query-digest
- Ilaria Migliozzi for new hook after_die for pt-online-schema-change
- Jakob for fixing pt-archiver bug PT-2064
- Viktor Szépe for introducing EditorConfig, fixing typos and issues with the Go code
- fraff for the option --unstop for pt-archiver
- Zongzhi Chen for fixing pt-pmp bug PT-2211
- Jason Ng for fixing pt-visual-explain bug PT-2277
- Kushal Haldar for his reports about vulnerabilities in Go-based tools
Percona Toolkit: Free your DBAs with advanced open source command-line tools.
The Pros and Cons of Wildcard Indexes on MongoDB
MongoDB is a schemaless database that is extremely flexible. When you create a collection, you don’t have to specify a structure in advance by providing field names and data types. Just start inserting JSON documents, and MongoDB will store them, no matter which fields and data types you provide. As a consequence, a collection can store completely different documents.
MongoDB does not require any ALTER statement to modify the schema of a collection, like in a relational database. If you need at some point to add new fields, just do it. Start inserting new JSON documents with the additional fields. Very easy.
But how can we manage the creation of indexes on a collection where we cannot foresee which fields we will have? In this article, I’ll show wildcard indexes and their pros and cons.
Create a wildcard index on a single field
The idea of a wildcard index is simple: it provides the possibility to create an index without knowing in advance which fields we are expecting in the documents. You can store whatever you need, and MongoDB will index everything, no matter the field name, no matter the data type. The feature looks amazing, but it comes at a cost.
To test wildcard indexes, let’s create a small collection for storing our users’ details. We have some fixed fields like name, date_of_birth, and gender, but also we have a subdocument userMetadata for any other attribute we don’t know in advance. This way, we can store everything we need.
db.user.insert( { name: "John", date_of_birth: new ISODate("2001-02-05"), gender: 'M', userMetadata: { "likes" : [ "dogs", "cats" ] } } ) db.user.insert( { name: "Marie", date_of_birth: new ISODate("2008-03-12"), gender: 'F', userMetadata: { "dislikes" : "hamsters" } } ) db.user.insert( { name: "Tom", date_of_birth: new ISODate("1998-12-23"), gender: 'M', userMetadata: { "age" : 25 } } ) db.user.insert( { name: "Adrian", date_of_birth: new ISODate("1991-06-22"), gender: 'M', userMetadata: "inactive" } ) db.user.insert( { name: "Janice", date_of_birth: new ISODate("1995-09-04"), gender: 'F', userMetadata: { "shoeSize": 8, "likes": [ "horses", "dogs" ] } } ) db.user.insert( { name: "Peter", date_of_birth: new ISODate("2004-01-25"), gender: 'M', userMetadata: { "drivingLicense": { class: "A", "expirationDate": new ISODate("2030-05-05") } } } )
db.user.find()
[
  { _id: ObjectId('658452229a147dcb1198d9df'), name: 'John', date_of_birth: ISODate('2001-02-05T00:00:00.000Z'), gender: 'M', userMetadata: { likes: [ 'dogs', 'cats' ] } },
  { _id: ObjectId('658452289a147dcb1198d9e0'), name: 'Marie', date_of_birth: ISODate('2008-03-12T00:00:00.000Z'), gender: 'F', userMetadata: { dislikes: 'hamsters' } },
  { _id: ObjectId('6584522e9a147dcb1198d9e1'), name: 'Tom', date_of_birth: ISODate('1998-12-23T00:00:00.000Z'), gender: 'M', userMetadata: { age: 25 } },
  { _id: ObjectId('658452519a147dcb1198d9e2'), name: 'Adrian', date_of_birth: ISODate('1991-06-22T00:00:00.000Z'), gender: 'M', userMetadata: 'inactive' },
  { _id: ObjectId('658452d69a147dcb1198d9e3'), name: 'Janice', date_of_birth: ISODate('1995-09-04T00:00:00.000Z'), gender: 'F', userMetadata: { shoeSize: 8, likes: [ 'horses', 'dogs' ] } },
  { _id: ObjectId('658453a09a147dcb1198d9e4'), name: 'Peter', date_of_birth: ISODate('2004-01-25T00:00:00.000Z'), gender: 'M', userMetadata: { drivingLicense: { class: 'A', expirationDate: ISODate('2030-05-05T00:00:00.000Z') } } }
]
As you can see, the userMetadata subdocument contains different fields, but none of those fields are indexed. Let’s suppose our collection contains several million documents; how can we retrieve, for example, all users with a specific driving license class or a specific shoe size without triggering a full collection scan? We can create a wildcard index on the userMetadata field using the special syntax $**.
Let’s do it:
db.user.createIndex({ "userMetadata.$**" : 1 }) db.user.getIndexes() [ { v: 2, key: { _id: 1 }, name: '_id_' }, { v: 2, key: { 'userMetadata.$**': 1 }, name: 'userMetadata.$**_1' } ]
This tells MongoDB to create an entry in the index for every single field and for any array member inside userMetadata.
Now, we can benefit from the index to execute any kind of query.
db.user.find({ "userMetadata.likes": "dogs" }) [ { _id: ObjectId('658452229a147dcb1198d9df'), name: 'John', date_of_birth: ISODate('2001-02-05T00:00:00.000Z'), gender: 'M', userMetadata: { likes: [ 'dogs', 'cats' ] } }, { _id: ObjectId('658452d69a147dcb1198d9e3'), name: 'Janice', date_of_birth: ISODate('1995-09-04T00:00:00.000Z'), gender: 'F', userMetadata: { shoeSize: 8, likes: [ 'horses', 'dogs' ] } } ]
We can simply run explain() to verify that the wildcard index is used and that the query is executed as an IXSCAN.
db.user.find({ "userMetadata.likes": "dogs" }).explain() { explainVersion: '1', queryPlanner: { namespace: 'test.user', indexFilterSet: false, parsedQuery: { 'userMetadata.likes': { '$eq': 'dogs' } }, queryHash: 'E2BC0D70', planCacheKey: '7C6EEF39', maxIndexedOrSolutionsReached: false, maxIndexedAndSolutionsReached: false, maxScansToExplodeReached: false, winningPlan: { stage: 'FETCH', inputStage: { stage: 'IXSCAN', keyPattern: { '$_path': 1, 'userMetadata.likes': 1 }, indexName: 'userMetadata.$**_1', isMultiKey: true, multiKeyPaths: { '$_path': [], 'userMetadata.likes': [ 'userMetadata.likes' ] }, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: 'forward', indexBounds: { '$_path': [ '["userMetadata.likes", "userMetadata.likes"]' ], 'userMetadata.likes': [ '["dogs", "dogs"]' ] } } }, rejectedPlans: [] ... ...
For example, all the following queries can benefit from the same index. You can test them on your own using explain().
db.user.find( { "userMetadata.age" : { $gt: 20 } } ) db.user.find( { "userMetadata": "inactive" } ) db.user.find( { "userMetadata.drivingLicense.class": "A", "userMetadata.drivingLicense.expirationDate": { $lt: ISODate("2032-01-01") } } ) db.user.find( { "userMetadata.shoeSize": 8})
Create a wildcard index on the entire document
What about creating a wildcard index on the entire document? Is it possible?
Yes, that’s possible. We can do this if we don’t know anything in advance about the documents we’re going to get in the collection.
There is another special syntax for doing that. Use the $** again without specifying a field name. Let’s do it on our user collection.
db.user.createIndex( { "$**" : 1 } )
Again, you can test the same queries we ran before. You will notice that all fields of the document are now indexed.
db.user.find( { name: "Marie" } ) [ { _id: ObjectId('658452289a147dcb1198d9e0'), name: 'Marie', date_of_birth: ISODate('2008-03-12T00:00:00.000Z'), gender: 'F', userMetadata: { dislikes: 'hamsters' } } ] db.user.find( { name: "Marie" } ).explain() { explainVersion: '1', queryPlanner: { namespace: 'test.user', indexFilterSet: false, parsedQuery: { name: { '$eq': 'Marie' } }, queryHash: '64908032', planCacheKey: 'A6C0273F', maxIndexedOrSolutionsReached: false, maxIndexedAndSolutionsReached: false, maxScansToExplodeReached: false, winningPlan: { stage: 'FETCH', inputStage: { stage: 'IXSCAN', keyPattern: { '$_path': 1, name: 1 }, indexName: '$**_1', isMultiKey: false, multiKeyPaths: { '$_path': [], name: [] }, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: 'forward', indexBounds: { '$_path': [ '["name", "name"]' ], name: [ '["Marie", "Marie"]' ] } } }, rejectedPlans: [] ... ...
Pros and cons
The good thing about wildcard indexes is easy to state: the great flexibility they provide. Just index everything, even what you are not expecting.
The bad thing, instead, is the index size. As you can imagine, what could happen if I start to insert an impressive number of fields, or arrays with thousands of items, into userMetadata? Well, more and more index entries will be created. A single document can generate thousands of index entries. Am I supposed to use such a huge amount of disk and memory to manage this very large index structure?
Remember, indexes are most effective when they fit into memory. The size of a wildcard index can simply explode if we cannot control (or were not able to foresee) the amount of data we create.
Our collection is very small, so the numbers are not worrying yet. But think about what could happen in the case of very large collections: the size of the indexes can get out of control.
Let’s use a simple trick to increase the size of the collection. Each run of the following statement doubles the number of documents. Do it eight or ten times, depending on how many documents you want.
db.user.find( {}, {_id:0}).forEach(function (doc) { db.user.insertOne(doc); } )
In my case, I have a collection of 19K documents. Large enough for testing index size.
db.user.stats()
...
...
  size: 2388744,
  count: 19147,
  numOrphanDocs: 0,
  storageSize: 540672,
  totalIndexSize: 1343488,
  totalSize: 1884160,
  indexSizes: {
    _id_: 630784,
    'userMetadata.$**_1': 192512,
    '$**_1': 520192
  },
  avgObjSize: 124,
  ns: 'test.user',
  nindexes: 3,
...
The data in the collection is around 2.3 MB, and the overall size of the indexes is 1.3 MB. The indexes represent more than 50% of the data size. If this is not impressive enough, think about what the percentage could be if you have very large subdocuments. I have seen collections in production environments being over-indexed because of a single wildcard index.
The wildcard index, which was beneficial at the beginning for making things more flexible, instead ended up being a serious performance bottleneck, causing higher memory utilization and swapping.
Also, remember that most of the time, only a few fields are used by your most frequent queries. Using a wildcard index does not always make sense.
Conclusions
Wildcard indexes may be useful in some specific cases when you don’t know much about the documents you expect. The main suggestion is not to abuse wildcard indexes, since they can turn into serious bottlenecks. Single-field, compound, and multikey indexes are usually better choices than wildcard ones.
Monitor the index size for every collection on which you have a wildcard index. If you see the size increasing rapidly, drop the wildcard index and instead create indexes on the fields most frequently used in your queries.
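A minimal mongosh sketch for that kind of monitoring (it simply walks every collection in the current database and prints its index sizes; adapt the output handling to your own tooling):
// Print total and per-index sizes (in bytes) for every collection in the current database
db.getCollectionNames().forEach(function (coll) {
  const stats = db.getCollection(coll).stats();
  printjson({ collection: coll, totalIndexSize: stats.totalIndexSize, indexSizes: stats.indexSizes });
});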
There are some mitigations you can apply, like excluding some of the fields from the wildcard index. Have a look at the following page for more details: https://www.mongodb.com/docs/manual/core/indexes/index-types/index-wildcard/create-wildcard-index-multiple-fields/
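For example, a document-level wildcard index can carry a wildcardProjection that excludes paths you never query; here is a sketch against our user collection (the excluded path is only an illustration):
// Index everything except userMetadata.likes
db.user.createIndex(
  { "$**": 1 },
  { wildcardProjection: { "userMetadata.likes": 0 } }
)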
MongoDB 7.0 now supports the creation of compound wildcard indexes. Have a look at the following page for more details: https://www.mongodb.com/docs/manual/core/indexes/index-types/index-wildcard/index-wildcard-compound/
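As a sketch of what that looks like on MongoDB 7.0+ (the chosen fields are illustrative only), you can combine a regular key with a wildcard term in one index:
// Compound wildcard index: a regular field plus a wildcard term
db.user.createIndex( { gender: 1, "userMetadata.$**": 1 } )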
Further reading about indexes on MongoDB: MongoDB Indexes Explained: A Comprehensive Guide to Better MongoDB Performance.
Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.
Percona Monitoring and Management High Availability – A Proof of Concept
Percona Monitoring and Management (PMM) is a state-of-the-art piece of software that exists in part thanks to great open source projects like VictoriaMetrics, PostgreSQL, and ClickHouse. The integration of those projects, plus the years of Percona expertise in the database space, makes PMM one of the best database monitoring solutions on the market.
Being software composed of different, multiple technologies can add complexity to a well-known concept: High Availability (HA). Achieving HA for PMM, as a whole, has proved to be a challenge but not an impossible task.
The easy part
Setting up the PMM cluster is the easy part. All it needs is a reverse proxy in front of two or more PMM instances. The go-to proxy is HAProxy, configured for an active/passive topology, that is, without load distribution.
For the purpose of the PoC, a single HAProxy instance is used (running as a docker container). The configuration file looks like this:
global
    stats socket /var/run/api.sock user haproxy group haproxy mode 660 level admin expose-fd listeners
    log stdout format raw local0 info

defaults
    mode http
    timeout client 10s
    timeout connect 5s
    timeout server 10s
    timeout http-request 10s
    log global

frontend stats
    bind *:8404
    stats enable
    stats uri /
    stats refresh 10s

frontend pmmfrontend
    bind :80
    default_backend pmmservers

backend pmmservers
    option tcp-check
    server pmm1 172.31.12.174:80 check port 80
    server pmm2 172.31.11.132:80 check port 80 backup
The Docker container is run with this command:
docker run -d --name haproxy -v $(pwd):/usr/local/etc/haproxy:ro -p 80:80 -p 443:443 -p 8404:8404 haproxytech/haproxy-alpine:2.
The -v for the volume guarantees that the local copy of the haproxy.cfg file is the one used inside the container. Whenever you change the cfg file, just execute the following for the haproxy container to pick it up:
docker kill -s HUP haproxy
And to follow the haproxy logs:
docker logs -f haproxy
We have two frontends: one for the HAProxy stats and another for PMM itself. There is a single backend, where the “passive” PMM instance (the one that is a pure “read replica”) is marked as “backup” so that traffic is only routed there if the primary fails the health check.
For simplicity, the PMM instances are configured to listen on port 80 (HTTP) on the private IPs. This is done to avoid SSL certificates, since everything goes through the same VPC (everything runs on EC2 instances on AWS). The health check, then, can be a simple “tcp-check” against port 80.
As you can see, stats are available via port 8404. With this, the easy part is done.
For this example, the PMM SERVER endpoint will be where the HAProxy frontend is listening, and that’s the one used when registering a new PMM CLIENT.
And you can always access PMM using the same endpoint.
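As a hedged example of that registration (the hostname, port, and credentials are placeholders for your HAProxy frontend and PMM admin user), the client is configured exactly like a normal registration, just pointed at the proxy:
# Register the node against the HAProxy frontend instead of an individual PMM server
pmm-admin config --server-url=http://admin:admin@haproxy.example.com:80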
The not-so-easy part (made easy)
The proxy is configured to be aware of two different PMM instances — pmm1 and pmm2 — (using the private IPs 172.31.12.174 and 172.31.11.132 in this case), but we haven’t mentioned anything about those PMM instances.
And here is the not-so-easy part: One has to deploy at least two PMM instances on at least two different servers AND set up replicas. How to do it? This is the actual Proof of Concept: Enter the PMM HA script: https://github.com/nethalo/pmmha
The script will take care of installing PMM (if you already have it, you can skip this step), preparing the primary, and setting up the secondary. Simple as that. PMM itself remains a black box.
The requirements are:
Steps to run the script:
git clone https://github.com/nethalo/pmmha.git
cd pmmha
bash pmm.sh
Failover
The failover will be handled by HAProxy automatically when it detects that the current primary is no longer available. Traffic will be routed to the backup server from the backend which, if properly set up as a replica, will already have the historical data for metrics and QAN, and the inventory will be ready to continue data scraping from the exporters.
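A simple way to watch the failover in action (assuming the primary PMM runs as a Docker container named pmm-server on the pmm1 host; the container name is an assumption):
docker stop pmm-server   # on the pmm1 host: simulate a primary outage
docker logs -f haproxy   # on the proxy host: watch pmm1 go DOWN and traffic shift to the pmm2 backup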
Your feedback is key
We want to hear from you! Suggestions, feature requests, and comments in general are welcome, either in the comment section or via the GitHub repository.
Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.
Database Upgrades: DIY or Choose a Database Upgrade Consulting Service?
To maintain competitiveness and operational efficiency — not to mention ensure security and compliance when your database version reaches End of Life (EOL) – it’s crucial to upgrade your database systems from time to time. When it comes to undertaking this considerable (and potentially time-consuming) project, you generally have one of two options: DIY upgrades or a database upgrade service. Each approach has its own set of advantages and challenges that can significantly impact the success of your upgrade.
Do-it-yourself (DIY) database upgrades vs. database upgrade consulting services
DIY upgrades
A DIY approach to database upgrades is often tempting for organizations looking to minimize costs and maintain control over their systems. The DIY route provides a hands-on experience and a deeper understanding of the internal workings of your database. It can be a good fit for those with in-house expertise capable of managing the complexities involved in an upgrade. The potential for cost savings is also an advantage — DIY means not having to hire (or pay) anyone else.
However, the challenges of DIY upgrades are significant. They require a substantial time investment and a steep learning curve, especially for complex databases or those without an in-house IT team with the necessary skills. There’s also a higher risk of errors, which can lead to downtime, data loss, and security vulnerabilities. Plus, there’s the risk that if things go truly sideways, you can end up spending a lot more money than you saved by not hiring a database upgrade consulting service.
Database upgrade consulting services
Opting for a professional upgrade service, on the other hand, provides expertise and a structured approach to your upgrade. These services come with teams of experts who are well-versed in the latest technologies and can offer insights into best practices — and anticipate any potential challenges — for your specific database environment.
One of the most compelling reasons to choose a professional service is the reduced risk. Upgrade services often have proven methodologies, robust testing procedures, and backup strategies to ensure a smooth transition with minimal downtime. This can be particularly important for businesses that rely heavily on their database for daily operations.
Additionally, choosing a professional service can be more cost-effective in the long run. While there is an upfront cost, the expertise provided can prevent costly mistakes and inefficiencies. These services can also often complete upgrades faster than an internal team might, allowing your staff to focus on core business activities rather than the intricacies of a database upgrade.
Choosing the right database upgrade consulting service
The significance of choosing the right database upgrade consulting service cannot be overstated. It’s about finding a partner that understands the nuances of your data environment and has the expertise to enhance its function without interrupting your business operations. The reliability of the service provider should be your north star. This includes their track record, the robustness of their infrastructure, and their ability to deliver on promises without causing disruptions to your business.
Understanding your specific systems
Compatibility with existing systems is critical. An upgrade isn’t just about the latest features; it’s about ensuring those new features work harmoniously with your existing applications and software. The right service provider will have a deep understanding of different database architectures and will be adept at integrating new technologies with legacy systems. Better yet, choose a provider with multi-database expertise.
Minimizing downtime and performance disruptions
Minimizing downtime and performance disruptions is a top priority. The best database service providers use sophisticated tools and techniques to ensure that upgrades are seamless, often executed with zero downtime strategies.
Service reliability
Service reliability goes beyond mere uptime. It encompasses the provider’s ability to foresee challenges and mitigate them proactively. It’s the assurance that your database is in capable hands, maintained by experts who ensure it operates at peak efficiency.
Customer support and expertise
Excellent customer support and expertise are non-negotiable. The right database upgrade service offers access to a team of experts who can provide insightful consultations and rapid responses to your concerns. This ensures that any issue is resolved swiftly, keeping your database running smoothly.
Effective security measures
In today’s digital landscape, security measures in place are as crucial as the database’s performance. The upgrade service must demonstrate a comprehensive approach to security, safeguarding your data against all forms of vulnerabilities and ensuring compliance with global data protection regulations.
Why Percona stands out for database upgrade consulting services
At Percona, our unique blend of technical excellence, open source commitment, technology-agnostic expertise, and an unwavering focus on customer success sets us apart.
We’ve been successfully upgrading databases across various industries and database types for nearly two decades. Our approach reduces unnecessary expenditures, maximizes resource utilization, and minimizes downtime. You’ll benefit from a comprehensive upgrade service that aligns with your budget and goals.
Book Review: PostgreSQL 14 Internals by Egor Rogov
The book PostgreSQL 14 Internals has been available in PDF format for quite a while, but recently, the ability to order a printed copy became available (https://postgrespro.com/community/books/internals).
This 548-page tome from PostgresPro covers the spectrum from data organization to details on the many available indexing options. The book has five major sections: MVCC and Isolation (108 pages), Buffer Cache and WAL (53 pages), Locks (42 pages), Query Execution (154 pages), and the types of indexes (127 pages). I mentioned that this material was available in PDF format during a recent talk on PostgreSQL at Percona Live, and many folks wanted more details. The book arrived a few days after I placed my order (under $30).
I had been reading sections of the PDF version but immediately ordered a printed version when I found it was available. Books like this often require referring to another page for details while reading. I find that it is easier to do this with a book than on a screen. Plus, I tend to write notes in the margins, leave Post-It notes in certain areas, and dog ear pages for future reference, which is hard to do with PDF files.
Is this book an easy read?
Is this an easy read? The subject matter is well written and communicates the material very well. The illustrations are clear, and there are many reference pointers to the documentation or source code where needed. If you are new to the internals of PostgreSQL, this book, along with the source code and some caffeine, should give you a great start on the subject. But you do not need to refer to the source code, as the author covers the concepts with easy-to-read text and clear examples. This is a great tool for learning the intricacies of PostgreSQL.
How to read this book
The best way to read this book is from the first page to the last, as the author builds on previous definitions and explanations. For instance, chapter eight covers the situation where tuple vacuuming (removing the outdated copies of rows in a table) cannot return the reclaimed space to the operating system. Having previously covered the structure of tables and rows, this chapter explores how VACUUM FULL rebuilds tables and indexes. This operation blocks all reads and writes while it runs. The author shows how the pgstattuple extension can provide an estimated percentage of space filled with useful data. The author has done an excellent job of presenting all this information in a clear, concise fashion in small, easy-to-consume tidbits so that the reader does not feel swamped with details.
Version 14?
Some of you may be asking, “PostgreSQL 14? Isn’t 16 the current release?” The good news is that PostgreSQL’s once-a-year release cycle does not include that much change to the vast majority of the code, and changes are highlighted in the release notes. The material in this book teaches the concepts at a level that you will not be thrown off by minor changes.
For example, Chapter 10 covers the Write-Ahead Log, or WAL, which is an area of great concern and some mysticism. In eighteen pages, the subject of the WAL is covered with an overview of how it works. In case of a power failure or other catastrophic calamity, the server needs a way to retain all the essential information needed to repeat operations up to the time of the problem, and that is the purpose of the WAL. Since disk flushing lags behind what is in memory, those changes to the data are written to the WAL.
The author starts with an overview of the operations performed by the WAL and proceeds into the logical structure. The WAL buffers are set up as a ring buffer where newer entries overwrite the older ones, with the proviso that you make sure the data is synced to disk at an appropriate rate. The book has a wonderful example of how to track the position of the WAL by showing you how it changes as the data changes. Next, the author delves into the physical structure of the WAL and how to investigate that information. The rest of the chapter covers checkpointing and configuration. There is a lot of material in Chapter 10, but it is covered in a concise, clear fashion that illustrates even the most complex subjects very well.
Do you need a copy?
So, do you need a copy of the hardcopy book? If you are happy with the PDF version, then you probably do not need to spend the money. But if you are like me and tend to access sections randomly, scribble notes on pages, or just prefer the feel of a book in your hands, then the investment is a good one. Buy this book if you are looking for a deeper understanding of how the PostgreSQL server really works, and you will not be disappointed.
If you are new to PostgreSQL, possibly coming from another database, this book covers all the terminology you will stumble across, such as WAL. Chapter 10 fully explains the whys and hows of the Write-Ahead Log implementation, with wonderful examples of how it all works. This should be required reading for anyone interested in the fine points of PostgreSQL. So, if you are in this group, you need this book, either the hard copy version or the PDF.
Yes, you need a copy of this book if for no other reason than someone asks a specific question about PostgreSQL that you cannot answer (correctly) off the top of your head. In these cases, the book is invaluable for articulating what the database server is doing.
And a big thank you to Egor Rogov for writing this book.