Jun 30, 2022

Understanding Database High Availability


Ensuring that databases are highly available is not just a thing these days, it’s the thing

Downtime, whether scheduled or unplanned, is barely, if at all, tolerated by end users. The consequences of downtime can be severe and may include things like loss of customers, damage to your reputation, or penalties for not meeting Service Level Agreements (SLAs). Making your database environment highly available, then, is a top priority that you need to get right. The good news is, you can build a great high availability (HA) solution using open source databases. We’ll touch on that shortly, but let’s start with some basics.

What is high availability and how do you get it?

High availability refers to the continuous operation of a system so that services to end users are largely uninterrupted. A basic high availability database system provides failover (preferably automatic) from a primary database node to redundant nodes within a cluster. 

HA is sometimes confused with “fault tolerance.” Although the two are related, the key difference is that an HA system provides quick recovery of all system components to minimize downtime. Some disruption may occur but will be minimal. Fault tolerance aims for zero downtime and zero data loss. As such, fault tolerance is much more expensive to implement because it requires dedicated infrastructure that completely mirrors the primary system. It also demands a lot of resources to maintain. 

Achieving a database HA solution rests on putting three key principles into practice:

  • Single point of failure (SPOF) – Eliminating any single point of failure in the database environment, including the physical or virtual hardware the database system relies on, whose failure would bring the database down.
  • Redundancy – Ensuring sufficient redundancy of all components within the database environment and reliable crossover to these components in the event of failure.
  • Failure detection – Monitoring the entire database environment for failures.

How to measure high availability

HA does not guarantee 100% uptime, but it allows you to get pretty close. Within IT, the gold standard for high availability is 99.999%, or “five-nines” of availability, but the level of HA needed really depends on how much downtime you can bear. Streaming services, for example, run mission-critical systems in which excessive downtime could result in significant financial and reputational losses for the business. But many organizations can tolerate a few minutes of downtime without negatively impacting their end users.

The following table shows the amount of downtime for each level of availability from two to five nines. 

Availability %              | Downtime per year | Downtime per month | Downtime per week | Downtime per day
99% (“two nines”)           | 3.65 days         | 7.31 hours         | 1.68 hours        | 14.40 minutes
99.5% (“two nines five”)    | 1.83 days         | 3.65 hours         | 50.40 minutes     | 7.20 minutes
99.9% (“three nines”)       | 8.77 hours        | 43.83 minutes      | 10.08 minutes     | 1.44 minutes
99.95% (“three nines five”) | 4.38 hours        | 21.92 minutes      | 5.04 minutes      | 43.20 seconds
99.99% (“four nines”)       | 52.60 minutes     | 4.38 minutes       | 1.01 minutes      | 8.64 seconds
99.995% (“four nines five”) | 26.30 minutes     | 2.19 minutes       | 30.24 seconds     | 4.32 seconds
99.999% (“five nines”)      | 5.26 minutes      | 26.30 seconds      | 6.05 seconds      | 864.00 milliseconds
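
Each figure in the table follows from the same simple arithmetic, using a 365-day year:

downtime = (1 − availability) × period

For five nines, for example: (1 − 0.99999) × 525,600 minutes ≈ 5.26 minutes of downtime per year.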

How to make your database highly available

Not all companies are the same and neither are their requirements for HA. When planning your database HA architecture, the size of your company is a great place to start to assess your needs. For example, if you’re a small business, paying for a disaster recovery site outside your local data center is probably unnecessary and may cause you to spend more money than the data loss is worth. All companies regardless of size should consider how to strike a balance between availability goals and cost.                                                       

  • Startups and small businesses. Most startups and small businesses can achieve an effective HA infrastructure within a single data center on a local node. This base architecture keeps the database available for your applications in case the primary node goes down, whether that involves automatic failover in case of a disaster or planned switchover during a maintenance window.
  • Medium to large businesses. If you have a bit more budget, consider adding a disaster recovery site outside your local data center. This architecture spans data centers to add more layers of availability to the database cluster. It keeps your infrastructure available and your data safe and consistent even if a problem occurs in the primary data center. In addition to the disaster recovery site, this design includes an external layer of nodes so if communication between the sites is lost, the external node layer acts as a “source of truth” and decides which replica to promote as a primary. In doing so, it keeps the cluster healthy by preventing a split-brain scenario and keeps the infrastructure highly available.
  • Enterprises. An enterprise HA architecture adds another layer of insurance for  companies with additional resources; for whom downtime would mean devastating revenue and reputational losses; and who offer globally distributed services. It features two disaster recovery sites, adding more layers for the infrastructure to stay highly available and keep applications up and running. This architecture, which is based on tightly coupled database clusters spread across data centers and geographic availability zones, can offer 99.999% uptime when used with synchronous streaming replication, the same hardware configuration in all nodes, and fast internode connections. 

Which database is best for high availability?

A common question about database high availability is which database is best. If you’re planning to use proprietary databases, be aware that you’re opening yourself up to vendor lock-in due to burdensome contracts and the high costs associated with loss of data portability (e.g., exorbitant cloud egress fees).

HA is an important goal, but switching to a proprietary database solely for HA limits the other benefits of open source. In addition to enabling a strong database HA architecture, open source avoids costly licensing fees, offers data portability, gives you the freedom to deploy anywhere anytime you want, and delivers great software designed by a community of contributors who prioritize innovation and quality. 

Getting started with open source database high availability

Open source databases like Postgres, MariaDB, MySQL, and Redis are great options for HA but generally don’t include a built-in HA solution. That means you’ll need to carefully review the various extensions and tools available. These extensions and tools are excellent for enriching their respective databases but, as your environment scales, may not be able to keep up with evolving, more complex requirements.

Greater complexity means that you and your team will need certain skills and knowledge, for example, how to write application connectors, integrate multiple systems, and align business requirements with your HA solution. You should also understand open source databases, applications, and infrastructure. Consider the possibility that outside expertise may be necessary to help you manage your HA architecture. 

Learn more about database high availability

Determining which high availability solution is right for your database environment depends greatly on your goals. Find out how Percona high availability database support guarantees application uptime. 

If PostgreSQL is your database of choice for HA, learn how to build highly available PostgreSQL using only battle-tested open source components in our eBook,  Achieving High Availability on PostgreSQL With Open Source Tools.

Jun 30, 2022

PostgreSQL Sequences – Episode 5 of PostgreSQL for MySQL DBAs


Many MySQL DBAs have expressed interest in learning about PostgreSQL, and this series is a guided tour through the basics. One of the ‘Wow! That is Different!’ things for MySQL DBAs looking at PostgreSQL is sequences. Sequences can be used as a rough equivalent to MySQL’s AUTO_INCREMENT, but that is only part of the story: sequences are much more than a tool for auto-incrementing columns. The video for this episode can be found here.

SERIAL != SERIAL

First, PostgreSQL has SERIAL data types. They are defined as SMALLSERIAL, SERIAL, and BIGSERIAL, which consume two, four, or eight bytes of storage and count up to 32,767, 2,147,483,647, or 9,223,372,036,854,775,807, respectively.

MySQL has a keyword named SERIAL that is shorthand for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.
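
In other words, the two table definitions below would be equivalent in MySQL (a small illustration; the table names are arbitrary):

CREATE TABLE t1 (id SERIAL);
CREATE TABLE t2 (id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE);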

For the first example, we can create a simple table.

dvdrental=# CREATE SCHEMA test;
CREATE SCHEMA
dvdrental=# \c test
You are now connected to database "test" as user "percona".
test=# CREATE TABLE x (x SERIAL, y CHAR(20), z CHAR(20));
CREATE TABLE
test=# \d x
Table "public.x"
Column | Type | Collation | Nullable | Default
--------+---------------+-----------+----------+------------------------------
x | integer | | not null | nextval('x_x_seq'::regclass)
y | character(20) | | |
z | character(20) | | |

Column x gets a default value from a sequence named x_x_seq using the NEXTVAL() function.
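
Under the hood, declaring the column as SERIAL is roughly shorthand for creating the sequence and wiring it up as the column default. A sketch of the equivalent explicit definition, using the same table and sequence names as above:

CREATE SEQUENCE x_x_seq;
CREATE TABLE x (
    x integer NOT NULL DEFAULT nextval('x_x_seq'),
    y CHAR(20),
    z CHAR(20)
);
ALTER SEQUENCE x_x_seq OWNED BY x.x;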

Then we can add some simple data.

test=# INSERT INTO X (y,z) VALUES (100,200),(300,450);
INSERT 0 2
test=# SELECT * FROM x;
x | y | z
---+----------------------+----------------------
1 | 100 | 200
2 | 300 | 450
(2 rows)

Note that we did not enter any values for column x and that the server supplied the value from the x_x_seq sequence.  Use \d to see that the table and the sequence were created together.

test=# \d
List of relations
Schema | Name | Type | Owner
--------+---------+----------+---------
public | x | table | percona
public | x_x_seq | sequence | percona

Sequences by themselves

But sequences can be declared without a table.  In the following example, a sequence is created that will start at 1,001.  To fetch the next value from the sequence and increment that sequence, use the NEXTVAL() function.

test=# CREATE SEQUENCE order_id START 1001;
CREATE SEQUENCE
test=# SELECT NEXTVAL('order_id');
nextval
---------
1001
(1 row)

test=# SELECT NEXTVAL('order_id');
nextval
---------
1002
(1 row)

There are two ways to check the current count. The first is to query the sequence directly:

test=# select * from order_id;
last_value | log_cnt | is_called
------------+---------+-----------
1002 | 31 | t
(1 row)

Ignore all the columns except for last_value, at least for now. Alternatively, use the CURRVAL() function.

 

test=# select currval('order_id');
currval
---------
1003
(1 row)
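
If you ever need to reposition a sequence manually, the SETVAL() function can be used (a hedged example; be careful doing this on a production sequence). After the statement below, the next NEXTVAL('order_id') call would return 5001.

SELECT SETVAL('order_id', 5000);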

Sequences are not series

A series, as the name implies, is simply a series of numbers. Unlike a sequence, a series cannot be referenced to automatically populate a column’s value, but it is very handy for generating data.

test=# create table test1 as (select generate_series(1,100) as id);
SELECT 100
test=# \d test1
Table "public.test1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |

test=# select * from test1 limit 5;
id
----
1
2
3
4
5
(5 rows)

Wrap-around sequences

Sequences can start or end at numbers you desire and can also wrap around to start again if you use the CYCLE qualifier. Here we create a sequence that repeatedly cycles between one and two.

test=# create sequence wrap_seq as int minvalue 1 maxvalue 2 CYCLE;
CREATE SEQUENCE
test=# select NEXTVAL('wrap_seq');
nextval
---------
1
(1 row)
test=# select NEXTVAL('wrap_seq');
nextval
---------
2
(1 row)
test=# select NEXTVAL('wrap_seq');
nextval
---------
1
(1 row)
test=# select NEXTVAL('wrap_seq');
nextval
---------
2
(1 row)

The details

Information on the status of a sequence can be had by using \d.

test=# \d order_id;
Sequence "public.order_id"
Type | Start | Minimum | Maximum | Increment | Cycles? | Cache
--------+-------+---------+---------------------+-----------+---------+-------
bigint | 1001 | 1 | 9223372036854775807 | 1 | no | 1

test=# \d wrap_seq;
Sequence "public.wrap_seq"
Type | Start | Minimum | Maximum | Increment | Cycles? | Cache
---------+-------+---------+---------+-----------+---------+-------
integer | 1 | 1 | 2 | 1 | yes | 1
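
The same details are also exposed through the pg_sequences system view (available since PostgreSQL 10), which is handier for scripting than the \d output:

SELECT sequencename, start_value, min_value, max_value, increment_by, cycle, last_value
  FROM pg_sequences
 WHERE sequencename IN ('order_id', 'wrap_seq');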

Next episode — Explaining EXPLAIN

Stay tuned!

The past videos for PostgreSQL for MySQL Database Administrators (DBA) can be found here: episode one, episode two, episode three, and episode four.

Jun 30, 2022

Digital Signatures: Another Layer of Data Protection in Percona Server for MySQL


Imagine you need to design an online system for storing documents on a per-user basis where nobody, including database administrators, would be able to change the content of those documents without being noticed by the document owners.

In Percona Server for MySQL 8.0.28-20, we added a new component called Encryption UDFs – an open-source alternative to MySQL Enterprise Encryption that allows users to access a number of low-level OpenSSL encryption primitives directly from MySQL. This includes calculating digests (with a great variety of hash functions), asymmetric key generation (RSA, DSA), asymmetric encryption/decryption for RSA, and calculating/verifying digital signatures (RSA, DSA) as well as primitives for working with Diffie-Hellman (DH) key exchange algorithm.

Prerequisites

In contrast to MySQL Enterprise Encryption, to make the functions available, users of Percona Server for MySQL do not have to register them manually with

CREATE FUNCTION … SONAME …

for each individual function. All they have to do is invoke

INSTALL COMPONENT 'file://component_encryption_udf'
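
Once installed, the component should appear in the mysql.component table, which is a quick way to verify the installation:

SELECT * FROM mysql.component;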

Schema definition

Now, let us define a simple data schema in which we will store all the info required for our online document storage service.

It will have two tables: user and document.

CREATE TABLE user(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  login VARCHAR(128) NOT NULL,
  key_type VARCHAR(16) NOT NULL,
  public_key_pem TEXT NOT NULL,
  PRIMARY KEY(id),
  UNIQUE(login)
);

Here,
id – unique user identifier (numerical, for internal usage).
login – unique user identifier (lexical, for public usage).
key_type – type of the asymmetric key generated by the user (currently, either RSA or DSA).
public_key_pem – public component of the asymmetric key generated by the user in PEM format (“-----BEGIN PUBLIC KEY-----”).

CREATE TABLE document(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_ref INT UNSIGNED NOT NULL,
  name VARCHAR(128) NOT NULL,
  content BLOB NOT NULL,
  digest_type VARCHAR(16) NOT NULL,
  signature VARBINARY(2048) NOT NULL,
  PRIMARY KEY(id),
  FOREIGN KEY (user_ref) REFERENCES user(id),
  UNIQUE (user_ref, name)
);

Here,
id – unique document identifier (numerical, internal).
user_ref – the ID of the user who owns the document.
name – a name under which the document is stored for the user_ref user.
content – binary object that holds document content.
digest_type – the name of the hash function used to calculate the digital signature on content.
signature – digital signature calculated by signing the digest value (calculated with digest_type hash function on content) with a private key that belongs to user_ref.

User registration

So, the first step in registering a new user would be to generate an asymmetric key pair. Although in this example we will be using the RSA algorithm, our system is flexible enough and allows us to use DSA keys as well.

Important notice: for our use case, private keys must always be generated on the client machine and never leave local secure storage. Only public keys are allowed to be transferred via the network.

To generate an RSA private key, the user can run the standard openssl utility.

openssl genrsa -out private_key.pem 4096

The content of the generated 4096-bit RSA private key in PEM format will be written to the private_key.pem file.
Next, we need to extract the public component from the generated key.

openssl rsa -in private_key.pem -pubout -out public_key.pem

The extracted 4096-bit RSA public key in PEM format will be written to the public_key.pem file.

SET @public_key = '<public_key_pem_content>';

Here, <public_key_pem_content> is the content of the public_key.pem file (a public key in PEM format with the “-----BEGIN PUBLIC KEY-----” header).

Just for the simplicity of this blog post (once again, never use this approach in a real production system), RSA/DSA keys can also be generated by Percona Server:

SET @algorithm = 'RSA';
SET @private_key = create_asymmetric_priv_key(@algorithm, 4096); /* never do this in production */
SET @public_key = create_asymmetric_pub_key(@algorithm, @private_key); /* never do this in production */

Now, when we have a public key, the user registration is straightforward.

INSERT INTO user VALUES(DEFAULT, 'alice', 'RSA', @public_key);

Again, in production, @public_key must be set to the content of the public_key.pem generated locally (with openssl utility, for instance) rather than generated on the server.

Uploading documents

When a user wants to upload a new file to our online document storage, the first step would be to calculate its digital signature.

For instance, if we are going to upload a local file called secure_data.doc and we want to use, say, SHA256 as a hash function to calculate digest before signing with a previously generated 4096-bit RSA private key, execute the following

openssl dgst -sha256 -sign private_key.pem -out secure_data.binsig secure_data.doc

The signature in binary format will be written to the secure_data.binsig file.

In order to simplify copying the content of this file to SQL statements, let us also convert this signature to HEX format. We will be using the xxd utility to perform this operation (please notice that on some Linux distributions this utility is a part of the vim-common package).

xxd -p -u -c0 secure_data.binsig secure_data.hexsig

The signature in HEX format will be written to the secure_data.hexsig file. After that, the user is supposed to call the upload_document() stored procedure.

CALL upload_document('alice', 'secure_data.doc', <file_content>,
    'SHA256',  UNHEX(<file_signature_hex>), @upload_status);

Here,
alice – name of the document owner.
secure_data.doc – a name under which the document will be stored.
<file_content> – the content of the local secure_data.doc passed as binary data.
SHA256 – the name of the hash function used to calculate the file digest.
<file_signature_hex> – the file signature in HEX format (the content of the secure_data.hexsig file).
On the server, the upload_document() stored routine should do the following.
First, it needs to find @user_ref, @key_type, and @public_key_pem in the user table for the provided owner’s login (alice).

SELECT id, key_type, public_key_pem
    INTO user_id, user_key_type, user_public_key_pem
    FROM user
    WHERE login = user_login;

Second, it needs to calculate message digest @digest using the provided hash function name (SHA256) for the provided file data (file_content).

SET digest = create_digest(digest_type, file_content);

Then, the server code needs to verify the file signature provided by the user (file_signature) with the public key associated with the file owner (alice).

SET verification_result = asymmetric_verify(
    user_key_type, digest, file_signature,
    user_public_key_pem, digest_type);

After that, only if verification_result is equal to 1, we confirm the identity of the document owner and insert a new record into the document table.

INSERT INTO document VALUES(DEFAULT, @user_ref, 'secure_data.doc',
    file_content, 'SHA256', file_signature);

Here is what upload_document() may look like:

CREATE PROCEDURE upload_document(
  user_login VARCHAR(128), file_name VARCHAR(128),
  file_content BLOB, digest_type VARCHAR(16),
  file_signature VARBINARY(2048), OUT status INT)
l_return:
BEGIN
  DECLARE success INT DEFAULT 0;
  DECLARE error_login_not_found INT DEFAULT 1;
  DECLARE error_verification_failed INT DEFAULT 2;
  
  DECLARE user_id INT UNSIGNED DEFAULT 0;
  DECLARE user_key_type VARCHAR(16) DEFAULT NULL;
  DECLARE user_public_key_pem TEXT DEFAULT NULL;
  DECLARE verification_result INT DEFAULT 0;

  DECLARE digest VARBINARY(64) DEFAULT NULL;
  SELECT id, key_type, public_key_pem
    INTO user_id, user_key_type, user_public_key_pem
    FROM user
    WHERE login = user_login;

  IF user_id = 0 THEN
    SET status = error_login_not_found;
    LEAVE l_return;
  END IF;

  SET digest = create_digest(digest_type, file_content);
  SET verification_result = asymmetric_verify(
    user_key_type, digest, file_signature,
    user_public_key_pem, digest_type);
 
  IF verification_result = 0 THEN
    SET status = error_verification_failed;
    LEAVE l_return;
  END IF;

  INSERT INTO document VALUES(
    DEFAULT, user_id, file_name, file_content,
    digest_type, file_signature);

  SET status = success;
END

Downloading documents and verifying their integrity

In order to download a file from our online document storage, the first step will be getting its content along with digital signature metadata.

CALL download_document('alice', 'secure_data.doc', @downloaded_content,
    @downloaded_digest_type, @downloaded_signature, @download_status);

Here,
alice – name of the document owner.
secure_data.doc – the name of the file we want to download.
@downloaded_content – the content of the downloaded file will be put in this variable.
@downloaded_digest_type – the name of the hash function used to calculate file digest for this file will be put in this variable.
@downloaded_signature – the digital signature of the downloaded file will be put in this variable.
On the server, the download_document() stored routine should do the following.
First, it needs to find @user_ref, @key_type, and @public_key_pem in the user table for the provided owner’s login (alice).

SELECT id, key_type, public_key_pem
    INTO user_id, user_key_type, user_public_key_pem
    FROM user
    WHERE login = user_login;

Second, it needs to get the file content and digital signature metadata.

SELECT id, content, digest_type, signature
  INTO file_id, file_content, file_digest_type, file_signature
  FROM document
  WHERE user_ref = user_id AND name = file_name;

After that, we calculate the digest of the file_content using the file_digest_type hash function.

SET digest = create_digest(file_digest_type, file_content);

And finally, we verify the integrity of the document:

SET verification_result = asymmetric_verify(
  user_key_type, digest, file_signature,
  user_public_key_pem, file_digest_type);

Only if verification_result is equal to 1, do we confirm the integrity of the document and return successful status.
Here is what download_document() may look like:

CREATE PROCEDURE download_document(
  user_login VARCHAR(128), file_name VARCHAR(128),
  OUT file_content BLOB, OUT file_digest_type VARCHAR(16),
  OUT file_signature VARBINARY(2048), OUT status INT)
l_return:
BEGIN
  DECLARE success INT DEFAULT 0;
  DECLARE error_login_not_found INT DEFAULT 1;
  DECLARE error_file_not_found INT DEFAULT 2;
  DECLARE error_verification_failed INT DEFAULT 3;

  DECLARE user_id INT UNSIGNED DEFAULT 0;
  DECLARE user_key_type VARCHAR(16) DEFAULT NULL;
  DECLARE user_public_key_pem TEXT DEFAULT NULL;
  DECLARE verification_result INT DEFAULT 0;

  DECLARE file_id INT UNSIGNED DEFAULT 0;

  DECLARE digest VARBINARY(64) DEFAULT NULL;

  SELECT id, key_type, public_key_pem
    INTO user_id, user_key_type, user_public_key_pem
    FROM user
    WHERE login = user_login;

  IF user_id = 0 THEN
    SET status = error_login_not_found;
    LEAVE l_return;
  END IF;

  SELECT id, content, digest_type, signature
    INTO file_id, file_content, file_digest_type, file_signature
    FROM document
    WHERE user_ref = user_id AND name = file_name;

  IF file_id = 0 THEN
    SET status = error_file_not_found;
    LEAVE l_return;
  END IF;

  SET digest = create_digest(file_digest_type, file_content);
  SET verification_result = asymmetric_verify(
    user_key_type, digest, file_signature, user_public_key_pem, file_digest_type);

  IF verification_result = 0 THEN
    SET status = error_verification_failed;
    LEAVE l_return;
  END IF;

  SET status = success;
END

Although we included a digital signature verification code into the download_document() routine, it does not guarantee that the end-user (caller of the download_document() routine) will get the unmodified document. This code was added only as an additional step to detect integrity violations at earlier stages. The real digital signature verification must be performed on the client-side, not inside Percona Server.

Basically, after calling download_document(), we need to save the content of the @downloaded_content output variable to a local file (say, downloaded.doc). In addition, the content of the @downloaded_signature in HEX format (HEX(@downloaded_signature)) must be saved into a local file as well (say, downloaded.hexsig).
After that, we can convert the signature in HEX format into binary form.

xxd -r -p -c0 downloaded.hexsig downloaded.binsig

A digital signature in binary form will be written to the downloaded.binsig file. Now, all we have to do is verify the digital signature:

openssl dgst -sha256 -verify public_key.pem -signature downloaded.binsig downloaded.doc

And only if we see the desired Verified OK status line, we can be sure that the document we just downloaded has not been modified.

Conclusion

To begin with, I would like to highlight that a real production-ready online document storage system is far more complicated than the one we just described. For one thing, it would probably not use the OpenSSL command-line utility to perform cryptographic operations on the client side. Moreover, it would take into consideration a number of other security aspects that are outside the scope of this blog post.

Nevertheless, I still hope that the example with digital signatures shown here helped convince you that asymmetric cryptography is not rocket science and, with the help of the Encryption UDFs component for Percona Server for MySQL, can indeed be easy and straightforward. Check out the full documentation.

Jun 30, 2022

Database Corruption: An Overview


Though I am writing this post as a PostgreSQL DBA, it can be read by anyone because the concept of corruption is the same in every database.

After reading this blog post, one should understand what database corruption is and how it may occur.

As DBAs, corruption is something we never want to see in our systems; however, there are a number of systems where corruption recurs very frequently. Whenever it occurs in big databases, it becomes challenging to detect and repair because we may see no sign of it. In my 15 years of experience as a DBA, corruption has been the toughest nut to crack because the apparent reasons for corruption are not actually predictable. In other words, we may never know the actual cause of the issue; hence, it is quite difficult to arrive at a root cause analysis (RCA).

In this series of blogs, I am going to cover various types of corruption and methods to find and fix them.

This blog post will throw some light on the basics of database corruption.

Concept and analogy

To explain it in a simple manner, I will take an example of the Spanish language. Imagine a short piece of text written in Spanish.

For anyone who does not understand Spanish, is it possible for them to read it? The straightforward answer is “No”.

Anyone would ask, “How may a person without knowledge of Spanish read it?” To end the curiosity, let us say the text simply means “Goodbye” in English.

The same thing happens with software products as well. All software is programmed to read and write in its own pre-defined format, and it may not be able to do so in any other format supported by other software products. For example, Python code cannot be compiled, read, or executed by a C, Java, or Perl environment.

In the case of databases, it is about the format of data being stored on a disk. When a database product, such as PostgreSQL, MySQL, or MongoDB, writes on a disk, it performs the operation by using some format.

While reading from a disk, the database product expects the same format there; any portion of data on disk that is not in an appropriate format is CORRUPTION.

To summarize this, corruption is nothing but an improper format or a sequence of data.

How does it happen?

As mentioned in the last section, corruption is the result of an unreadable data format. As we know, data is stored in the form of bits on a disk. In the case of integers or numerics, the conversion is simple. But for characters, every machine is designed to convert data in the form of bytes – a group of eight bits – in such a way that every byte represents a character. A byte has 256 different combinations, from 00000000 (0) to 11111111 (255).

To read bytes in the form of characters, formats such as ASCII, EBCDIC, BCD, and so on were designed. They are also known as encoding schemes. Out of all these schemes, ASCII (American Standard Code for Information Interchange) is the most popular. In this format (together with its extended variants), every byte value (all 256 combinations) is assigned a particular character.

Like,

01000001(65) – A

00101100(44) – ,

Below is the link to view all the ASCII codes.

https://www.rapidtables.com/code/text/ascii-table.html

Here, if any byte is stored with an unexpected sequence of bits, the machine will read a different character.

For example,

Let’s say character A (65) is stored as 11000001 (193) instead of 01000001 (65), which is “Á” (not the English letter A).
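
A quick way to see this mapping in action is with PostgreSQL’s chr() and ascii() functions (a simple illustration; the third query assumes a UTF-8 or Latin-1 database):

SELECT ascii('A');   -- 65
SELECT chr(65);      -- A
SELECT chr(193);     -- Á, not the ASCII letter A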

Now, in these encoding schemes, some characters are human-readable and the rest are not. Another point to note is that not all software products are built to decipher all characters. So, if a byte at any position gets changed, it is possible that the database may not be able to read the character, and hence the data. Such unreadable or non-parsable data is deemed corrupted data.

For example,

If How are you? is stored as How are you¿, the character “¿” does not exist in English, so character sets that can only parse English may not be able to recognize it. The database will then not be able to read that text and throws an error marking it as unreadable. Here, only one character is unrecognizable, but the whole text will be marked as corrupted data.
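
Databases behave the same way at the byte level. As a rough illustration, asking PostgreSQL to interpret an invalid byte sequence as UTF-8 is rejected outright (a sketch; the exact error text may vary by version):

-- 0xC1 on its own is never a valid UTF-8 sequence,
-- so this fails with an "invalid byte sequence for encoding" error
SELECT convert_from('\xc1'::bytea, 'UTF8');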

Causes

It is truly mysterious because we may never know the actual reason for any kind of corruption. As mentioned above, corruption is attributed to changes in bits/bytes, but it is really difficult to be certain which process/thread led to that change. This is why test cases related to corruption are usually not reproducible. The only thing we can do is explore the possible causes.

Some of the possible causes are as below.

Hardware issue:

When RAID arrays are not designed properly or controllers are faulty, data may not be written correctly to the disks. With non-RAID disks, failing mechanical components can likewise cause bits to be stored incorrectly.

Corruption may also occur due to heavy I/O.

Bug in the operating system:

On occasion, due to a buggy kernel or code, the OS encodes data wrongly, and it is later written to the disk. The OS may also produce corrupted data when it is unable to handle the I/O load properly.

Bug in the database product:

In many cases, the database product itself stores wrong data on the disk or, due to inefficient algorithms, saves data in the wrong format.

Types of corruption

Every database comprises different types of files, such as data files, WAL files, commit logs, and so on. These files contain data for various database objects, e.g., tables, indexes, materialized views, WAL records, etc. When these database files become corrupt, some queries retrieve wrong data or return errors, or some operations (e.g., recovery, replay) may not work as expected. As a DBA, one needs to identify which particular objects are affected by that corruption. For ease of understanding, corruption is categorized into different types; some of them are listed below.

Index corruption:

In general, an index keeps a pointer(s) for a particular value(or a value set) in a column. Whenever an index is asked to return pointers (ctid in PostgreSQL, rowid in Oracle), it fetches those pointers and returns them to the requestor.

In the case of corruption, a wrong pointer for a value gets saved due to faulty bits on the disk. Consequently, the index returns the wrong record.
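
For PostgreSQL, one way to look for this kind of damage is the contrib amcheck extension, which can verify B-tree index structure (shown only as a sketch; the index name below is illustrative):

CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check('my_table_pkey'::regclass);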

Data corruption:

When data/TOAST pages store faulty data (in terms of format), the data may become unrecognizable when it is read back. Hence, the database errors out on it.

WAL corruption:

WAL/Redo/Transaction log files store data in a particular format, and while reading them, WAL entries are parsed and applied. In the case of WAL corruption, WAL entries are not parsable, which affects the WAL replay operation.

Page header corruption:

The lowest unit of storage in databases is the block/page, which stores the actual records. To maintain data integrity, some information is stored in a separate section called the page header. Any improper information in a page header is header corruption, and it affects data integrity.

Summary

Corruption results from changes in bits/bytes while storing data on the disk. When a database product (e.g. MySQL, PostgreSQL) does not get the data in an expected format, it is corruption.

The data in the database may get corrupted due to various reasons, such as faulty hardware or a buggy OS/kernel/database product. Owing to this, the data is accidentally changed before it is stored on the disk. Once stored incorrectly on disk, it affects various functions of the database; to make it easier to understand which particular areas are affected, corruption is classified into various types, such as index corruption, data corruption, and so on.

This is the first post in the series of database corruption blogs; other blogs will be posted soon. Stay tuned!

Jun 29, 2022

Percona Operator for MongoDB and LDAP: How Hard Can it Be?


As a member of the Percona cloud team, I run a lot of databases in Kubernetes, and our Percona Operator for MongoDB in particular. Most of us feel comfortable with a rather small and tidy infrastructure, which is easy to control and operate. However, the natural growth of an organization brings a lot of challenges to be tackled, especially in the context of access management. We’ve all been there – every new account requires platform-specific effort, adding to the operational cost of your infrastructure support. Such a burden can be addressed by LDAP-based software like OpenLDAP, Microsoft Active Directory, etc., acting as a source of truth for the authentication/authorization process. Let’s dive into the details of how Percona Distribution for MongoDB managed by the Kubernetes operator can be connected to an LDAP server.

Environment

Our scenario is based on the integration of the OpenLDAP server and Percona Distribution for MongoDB and the corresponding Kubernetes Operator. We are going to keep the setup as simple as possible thus no complicated domain relationships will be listed here.

OpenLDAP

On the OpenLDAP side the following settings may be used:

0-percona-ous.ldif: |-
   dn: ou=perconadba,dc=ldap,dc=local
   objectClass: organizationalUnit
   ou: perconadba
 1-percona-users.ldif: |-
   dn: uid=percona,ou=perconadba,dc=ldap,dc=local
   objectClass: top
   objectClass: account
   objectClass: posixAccount
   objectClass: shadowAccount
   cn: percona
   uid: percona
   uidNumber: 1100
   gidNumber: 100
   homeDirectory: /home/percona
   loginShell: /bin/bash
   gecos: percona
   userPassword: {crypt}x
   shadowLastChange: -1
   shadowMax: -1
   shadowWarning: -1

Also, a read-only user should be created for database-issued user lookups.

If everything is done correctly, the following command should work

$ ldappasswd -s percona -D "cn=admin,dc=ldap,dc=local" -w password -x "uid=percona,ou=perconadba,dc=ldap,dc=local"

Database

Percona Operator for MongoDB will do all the work inside Kubernetes. You can use any supported platform from the System Requirements.

In order to get MongoDB connected with OpenLDAP we need to configure both:

  • Mongod
  • Internal mongodb role

As for mongod you may use the following configuration snippet:

security:
  authorization: "enabled"
  ldap:
    authz:
      queryTemplate: 'ou=perconadba,dc=ldap,dc=local??sub?(&(objectClass=group)(uid={USER}))'
    servers: "openldap"
    transportSecurity: none
    bind:
      queryUser: "cn=readonly,dc=ldap,dc=local"
      queryPassword: "password"
    userToDNMapping:
      '[
          {
            match : "(.+)",
            ldapQuery: "OU=perconadba,DC=ldap,DC=local??sub?(uid={0})"
          }
    ]'
setParameter:
  authenticationMechanisms: 'PLAIN,SCRAM-SHA-1'

The internals of the snippet are a topic for another blog post, though. Basically, we are providing mongod with LDAP-specific parameters like the domain name of the LDAP server (‘servers’), an explicit lookup user, domain mapping rules, etc.

Put the snippet on your local machine and create a Kubernetes Secret object named after the MongoDB cluster name from Install Percona Server for MongoDB on Kubernetes.

$ kubectl create secret generic cluster1-rs0-mongod --from-file=mongod.conf=<path-to-mongod-ldap-configuration>

Percona Operator for MongoDB is able to pass through a custom configuration from these Kubernetes objects: Custom Resource, ConfigMap, Secret. Since we are to use some security-sensitive information, we’ve picked the Kubernetes Secret.

The next step is to start the MongoDB cluster up as it’s described in Install Percona Server for MongoDB on Kubernetes.

On successful completion of the steps from the doc, we can proceed with setting the LDAP users’ roles inside MongoDB. Let’s log in to MongoDB as administrator and execute:

var admin = db.getSiblingDB("admin")
admin.createRole(
  {
    role: "ou=perconadba,dc=ldap,dc=local",
    privileges: [],
    roles: [ "userAdminAnyDatabase" ]
  }
)

Now our percona user created inside OpenLDAP is able to log in to MongoDB as administrator. Please replace <mongodb-rs-endpoint> with a valid replica set domain name.

$ mongo --username percona --password 'percona' --authenticationMechanism 'PLAIN' --authenticationDatabase '$external' --host <mongodb-rs-endpoint> --port 27017

Limitations

At the time of writing this blog post, Percona Operator for MongoDB supports only unencrypted transport between MongoDB and the LDAP server. We’ll do our best to bring encrypted transport in the future; in the meantime, feel free to use this setup in security-relaxed scenarios.

Conclusion

This blog post describes only one possible case out of a huge variety of possible combinations. It is rather simple and does not touch on topics like TLS-secured transport setup, directory design, etc. We encourage you to try the MongoDB LDAP integration described here for a proof of concept, a development environment setup, etc., where security concerns are not so demanding. Don’t hesitate to add more complexity with sophisticated directory structures, user privilege schemes, etc. If you find your own configuration interesting and worth sharing with the community, please visit our Percona Community Forum. We’ll be glad to check your outcomes.

Have fun with Percona Operator for MongoDB!

Jun 29, 2022

Window Functions in MongoDB 5.0


I have already presented some of the new features available in MongoDB 5.0 in previous posts: resharding and time series collections. Please have a look if you missed them:

MongoDB 5.0 Time Series Collections

Resharding in MongoDB 5.0

In this article, I would like to present another new feature: window functions.

Window functions are quite popular in relational databases; they let you run a window across sorted documents, producing calculations at each step of the window. Typical use cases are calculating rolling averages, correlation scores, or cumulative totals. You can achieve the same result even with older versions of MongoDB or with databases where window functions are not available, but this comes at the cost of more complexity because multiple queries are usually required, and temporary data needs to be saved somewhere as well.

Instead, the window functions let you run a single query and get the expected results in a more efficient and elegant way.

Let’s see how the feature works on MongoDB 5.0.

The window functions

A new aggregation stage $setWindowFields is available on MongoDB 5.0. This is the one that provides the window functions capability.

The following is the syntax of the stage:

{
  $setWindowFields: {
    partitionBy: <expression>,
    sortBy: { 
      <sort field 1>: <sort order>,
      <sort field 2>: <sort order>,
      ...,
      <sort field n>: <sort order>
    },
    output: {
      <output field 1>: {
        <window operator>: <window operator parameters>,
        window: { 
          documents: [ <lower boundary>, <upper boundary> ],
          range: [ <lower boundary>, <upper boundary> ],
          unit: <time unit>
        }
      },
      <output field 2>: { ... }, 
      ...
      <output field n>: { ... }
    }
  }
}

  • partitionBy (optional): some expression to group the document. If omitted by default all the documents are grouped into a single partition
  • sortBy (required in some cases ): sorting the documents. Uses the $sort syntax
  • output (required): specifies the documents to append to the result set. Basically, this is the parameter that provides the result of the window function
  • window (optional): defines the inclusive window boundaries and how the boundaries should be used for the calculation of the window function result

Well, the definitions may look cryptic but a couple of simple examples will clarify how you can use them.

The test dataset

I have a Percona Server for MongoDB 5.0 running and I got some public data about COVID-19 infections, hospitalizations, and other info from Italy. The data are available on a per-day and per-region basis from the following link: https://github.com/pcm-dpc/COVID-19/tree/master/dati-regioni.

I loaded just a few months’ data spanning 2021 and 2022. Data is labeled in Italian, so I created a similar and reduced collection just for the needs of this article.

Here is a sample of the documents:

> db.covid.find({"region":"Lombardia"}).sort({"date":1}).limit(5)
{ "_id" : ObjectId("62ab5f7d017d030e4cb314e9"), "region" : "Lombardia", "total_cases" : 884125, "date" : ISODate("2021-10-01T15:00:00Z") }
{ "_id" : ObjectId("62ab5f7d017d030e4cb314fe"), "region" : "Lombardia", "total_cases" : 884486, "date" : ISODate("2021-10-02T15:00:00Z") }
{ "_id" : ObjectId("62ab5f7d017d030e4cb31516"), "region" : "Lombardia", "total_cases" : 884814, "date" : ISODate("2021-10-03T15:00:00Z") }
{ "_id" : ObjectId("62ab5f7d017d030e4cb31529"), "region" : "Lombardia", "total_cases" : 884920, "date" : ISODate("2021-10-04T15:00:00Z") }
{ "_id" : ObjectId("62ab5f7d017d030e4cb3153d"), "region" : "Lombardia", "total_cases" : 885208, "date" : ISODate("2021-10-05T15:00:00Z") }

Each document contains the daily number of total COVID infections from the beginning of the pandemic for a specific Italian region.

Calculate daily new cases

Let’s create our first window function.

Since we have in the collection only the number of total cases, we would like to calculate the number of new cases per day. This way we can understand if the status of the pandemic is getting worse or improving.

You can achieve that by issuing the following aggregation pipeline:

> db.covid.aggregate( [
{ $setWindowFields: {
    partitionBy : "$region",
    sortBy: { date: 1 },
    output: {
      previous: {
        $push: "$total_cases",
        window: {
          range: [-1, -1],
          unit: "day"
        }
      } 
    }
  }
}
,
{ $unwind:"$previous"},
{ $addFields: {
    new_cases: {
      $subtract: ["$total_cases","$previous"]
    }
  }
},
{ $match: { "region": "Lombardia" } },
{ $project: { _id:0, region:1, date:1, new_cases: 1}  }
] )
{ "region" : "Lombardia", "date" : ISODate("2021-10-02T15:00:00Z"), "new_cases" : 361 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-03T15:00:00Z"), "new_cases" : 328 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-04T15:00:00Z"), "new_cases" : 106 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-05T15:00:00Z"), "new_cases" : 288 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-06T15:00:00Z"), "new_cases" : 449 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-07T15:00:00Z"), "new_cases" : 295 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-08T15:00:00Z"), "new_cases" : 293 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-09T15:00:00Z"), "new_cases" : 284 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-10T15:00:00Z"), "new_cases" : 278 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-11T15:00:00Z"), "new_cases" : 87 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-12T15:00:00Z"), "new_cases" : 306 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-13T15:00:00Z"), "new_cases" : 307 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-14T15:00:00Z"), "new_cases" : 273 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-15T15:00:00Z"), "new_cases" : 288 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-16T15:00:00Z"), "new_cases" : 432 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-17T15:00:00Z"), "new_cases" : 297 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-18T15:00:00Z"), "new_cases" : 112 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-19T15:00:00Z"), "new_cases" : 412 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-20T15:00:00Z"), "new_cases" : 457 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-21T15:00:00Z"), "new_cases" : 383 }

 

The pipeline also contains stages to make the output more readable. Let’s focus on the $setWindowFields anyway.

In the first stage, we define the window function in order to create for each document a new field containing the total cases from the previous day. The field was obviously named previous.

Then we use this information in the following stages to calculate the difference between the total cases of “today” and “yesterday”, which gives us the daily increase.

Take a look at how the window function has been created. We used $push to fill the new field with the value of total_cases. In the window document, we defined the range as [-1, -1]. These numbers represent the lower and upper boundaries of the window, and they both correspond to the previous (-1) document in the window, so it spans only one document: yesterday. In this case, the usage of sortBy is relevant because it tells MongoDB in which order to consider the documents in the window. The trick of defining the range as [-1, -1] to get yesterday’s data is possible because the documents are properly sorted.

Calculate moving average

Let’s now calculate the moving average. We’ll consider the last week of data to calculate the average number of new cases on a daily basis. This metric was very popular during the peak of the pandemic, triggering a lot of discussion around forecasts and informing government decisions. That is a simplification, of course; there were other relevant parameters as well, but the moving average was one of them.

To calculate the moving average we need the daily new cases we calculated in the previous example. We can reuse those values in different ways: adding another $setWindowFields stage to the previous pipeline, adding the new_cases field to the existing documents, or creating another collection, as I did here for simplicity using the $out stage:

> db.covid.aggregate( [ { $setWindowFields: { partitionBy : "$region", sortBy: { date: 1 }, output: { previous: { $push: "$total_cases", window: { range: [-1, -1],  unit: "day" } } } } }, { $unwind:"$previous"},  { $addFields: { new_cases: { $subtract: ["$total_cases","$previous"] } } }, { $project: { region:1, date:1, new_cases: 1} }, { $out: "covid_daily"  }  ] )

Now we can calculate the moving average on the covid_daily collection. Let’s do it with the following aggregation:

> db.covid_daily.aggregate([
{ $setWindowFields: {
    partitionBy : "$region",
    sortBy : { date: 1 },
    output: {
      moving_average: {
        $avg: "$new_cases",
        window: {
          range: [-6, 0],
          unit: "day"
        }
      }
    }
  }
},
{ $project: { _id:0  } }
])
{ "region" : "Abruzzo", "date" : ISODate("2021-10-02T15:00:00Z"), "new_cases" : 49, "moving_average" : 49 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-03T15:00:00Z"), "new_cases" : 36, "moving_average" : 42.5 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-04T15:00:00Z"), "new_cases" : 14, "moving_average" : 33 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-05T15:00:00Z"), "new_cases" : 35, "moving_average" : 33.5 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-06T15:00:00Z"), "new_cases" : 61, "moving_average" : 39 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-07T15:00:00Z"), "new_cases" : 54, "moving_average" : 41.5 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-08T15:00:00Z"), "new_cases" : 27, "moving_average" : 39.42857142857143 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-09T15:00:00Z"), "new_cases" : 48, "moving_average" : 39.285714285714285 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-10T15:00:00Z"), "new_cases" : 19, "moving_average" : 36.857142857142854 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-11T15:00:00Z"), "new_cases" : 6, "moving_average" : 35.714285714285715 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-12T15:00:00Z"), "new_cases" : 55, "moving_average" : 38.57142857142857 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-13T15:00:00Z"), "new_cases" : 56, "moving_average" : 37.857142857142854 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-14T15:00:00Z"), "new_cases" : 45, "moving_average" : 36.57142857142857 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-15T15:00:00Z"), "new_cases" : 41, "moving_average" : 38.57142857142857 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-16T15:00:00Z"), "new_cases" : 26, "moving_average" : 35.42857142857143 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-17T15:00:00Z"), "new_cases" : 39, "moving_average" : 38.285714285714285 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-18T15:00:00Z"), "new_cases" : 3, "moving_average" : 37.857142857142854 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-19T15:00:00Z"), "new_cases" : 45, "moving_average" : 36.42857142857143 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-20T15:00:00Z"), "new_cases" : 54, "moving_average" : 36.142857142857146 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-21T15:00:00Z"), "new_cases" : 72, "moving_average" : 40 }

 

Note we have defined the range boundaries as [-6, 0] in order to span the last week of documents up to and including the current document.

Notes about window functions

We have used unit: “day” in the window definition, but this optional field can also take other values like year, quarter, month, week, hour, and so on.

There are multiple operators that can be used with $setWindowFields: $avg, $count, $first, $last, $max, $min, $derivative, $sum, $rank and many others you can check on the documentation.

There are a few restrictions about window functions usage. Please have a look at the official documentation in case you hit some of them.

Conclusion

The new window functions are a very good feature delivered in MongoDB 5.0. They could make life easier for a lot of developers.

For more details and to check the restrictions, have a look at the following page:

https://www.mongodb.com/docs/manual/reference/operator/aggregation/setWindowFields/

Percona Server for MongoDB 5.0 is a drop-in replacement for MongoDB Community. You can use it for free and you can rely on enterprise-class features like encryption at rest, LDAP authentication, auditing, and many others. You can also rely on all new features of MongoDB Community 5.0, including window functions.

Take a look at Percona Server for MongoDB.

Jun 29, 2022

Importance of PostgreSQL Vacuum Tuning and Custom Scheduled Vacuum Job


PostgreSQL’s built-in autovacuum – the housekeeper – is improving, version after version. It is becoming more capable while reducing its overhead and addressing edge cases. I think there is no PostgreSQL version that comes out without any autovacuum improvement, and no doubt that it is good enough for the majority of use cases.

But still, that is far from a perfect fit for every specific environment. While working with many customer environments, we keep seeing cases where the built-in logic is not sufficient. As I keep doing such fixes for many Percona customers, I thought of noting down the important points for everyone.

Common Problems/Limitations

  1. Tables become candidates for autovacuum during peak hours.
    The autovacuum settings are based on scale factors/thresholds. The chance that a table crosses these limits is highest when there is a high number of transactions on the table – which is during peak hours. Effectively, autovacuum kicks in at the very worst time.
  2. Starving tables.
    It is very common to see a few tables become candidates for vacuuming too frequently and occupy all the workers repeatedly, while other tables farther down the candidate list remain unvacuumed for a long time. The current autovacuum logic is not sufficient to understand which tables have starved the most and give them a better priority.
  3. No way to control the throttle of autovacuum workers dynamically.
    This is probably the worst limitation. Even an informed DBA who wants to adjust autovacuum_vacuum_cost_limit based on need or a time window cannot effectively signal PostgreSQL.
    For example:

    ALTER SYSTEM set autovacuum_vacuum_cost_limit = 2000;
    select pg_reload_conf();

    This has no effect on the currently running autovacuum workers; only workers started after the change will consider this setting. So this cannot be used to address the problem.
  4. The attempts by DBAs to tune parameters often backfire.
    After seeing aged and starving tables, desperate DBAs apply aggressive settings and a higher number of workers. Many times this pushes the system well beyond its limits, because all the vacuuming happens at the wrong time and with high aggression, when the system already has a high number of active sessions. Multiply that by the maintenance_work_mem allocation of each worker, and system performance suffers to a great extent. The worst I have seen is autovacuum workers occupying up to 50% of server resources.
  5. Autovacuum during the active time window defeats its own purpose.
    An autovacuum worker will be referring to an old xid/snapshot if it takes a long time to complete during a high-activity window. So effectively it won’t clean up the dead tuples generated during that same period, which defeats the very purpose of autovacuum.
  6. Starved tables trigger wraparound prevention autovacuum.
    It is very common to see tables that have been starved of autovacuum for a long duration reach autovacuum_freeze_max_age, triggering the aggressive wraparound-prevention vacuum (a query to spot such tables is sketched right after this list).
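
A quick way to spot the tables (and their TOAST tables) closest to that threshold is to check the age of relfrozenxid (a sketch; the scheduled job later in this post uses the same logic):

SELECT c.oid::regclass AS table_name,
       GREATEST(age(c.relfrozenxid), age(t.relfrozenxid)) AS xid_age
  FROM pg_class c
  LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
 WHERE c.relkind IN ('r', 'm')
 ORDER BY xid_age DESC
 LIMIT 10;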

Due to such ineffectiveness, we keep seeing DBAs tending to disable autovacuum altogether and inviting a bigger set of problems and even outages. At the very least, my request to anyone who is new to PostgreSQL is: please never try to turn off autovacuum. That is not the way to address autovacuum-related issues.

Tuning Autovacuum

Tuning autovacuum is obviously the first line of action.

Global level settings

The parameters autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay are the two main parameters that control the throttle of the autovacuum workers. autovacuum_max_workers controls how many workers will be working at a time on different tables. By default, autovacuum_vacuum_cost_limit is disabled (-1), which means the value of the other parameter, vacuum_cost_limit, is in effect. So the very first suggestion is to set a value for autovacuum_vacuum_cost_limit, which helps us control the autovacuum workers alone.

One common mistake I see across many installations is autovacuum_max_workers set to a very high value like 15, on the assumption that this makes autovacuum run faster. Please remember that autovacuum_vacuum_cost_limit is divided among all workers: the higher the number of workers, the slower each worker runs, and slower workers mean ineffective cleanup, as mentioned above. Moreover, each of them can occupy up to maintenance_work_mem. In general, the default value of autovacuum_max_workers, which is 3, is sufficient. Please consider increasing it only if it is absolutely necessary.
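
As a quick sanity check before any tuning, an illustrative query like the one below lists the current values of the throttle- and memory-related settings that the workers are actually running with:

-- Effective global settings that control autovacuum throttling and memory
SELECT name, setting, boot_val
FROM pg_settings
WHERE name IN ('autovacuum_vacuum_cost_limit', 'autovacuum_vacuum_cost_delay',
               'autovacuum_max_workers', 'autovacuum_work_mem', 'maintenance_work_mem');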

Table level analysis settings

The blanket tuning at the instance level might not work well for at least a few tables. These outliers need special treatment, and tuning the settings at the table level may become unavoidable. I would start with the tables that become candidates for autovacuum too frequently.

The PostgreSQL logs, with the log_autovacuum_min_duration setting enabled, give great detail about the tables that frequently become candidates and the autovacuum runs that took considerable time and effort. Personally, I prefer this as the starting point. A summary of autovacuum runs can also be obtained by comparing the autovacuum_count of pg_stat_all_tables at two different timestamps. We also need to consider HOT (Heap Only Tuple) updates and fillfactor: HOT-update information can be analyzed using n_tup_hot_upd from the same view (pg_stat_all_tables), and tuning fillfactor can bring down the vacuum requirements drastically.
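
For example, a starting-point sketch along these lines highlights the tables autovacuum visits most often and how much of their update traffic is HOT (take two snapshots at different times and compare them to get rates rather than lifetime totals):

-- Tables most frequently picked by autovacuum, with their HOT-update ratio
SELECT schemaname, relname,
       autovacuum_count,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 2) AS hot_update_pct
FROM pg_stat_all_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY autovacuum_count DESC
LIMIT 10;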

Equipped with this analysis, specific table-level settings can be adjusted. For example:

alter table t1 set (autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=130000, autovacuum_analyze_scale_factor=0.0, autovacuum_analyze_threshold=630000, autovacuum_enabled=true, fillfactor=82);

Supplementary Scheduled Vacuum Job

Our aim is not to disable autovacuum, but to supplement it with our knowledge about the system. It need not be complex at all. The simplest approach is to run a ‘VACUUM FREEZE’ on the tables with the highest xid age, for either the table itself or its TOAST.

For example, we can have a vacuumjob.sql file with the following content:

-- Tables currently being processed by autovacuum (parsed from pg_stat_activity)
WITH cur_vaccs AS (SELECT split_part(split_part(substring(query from '.*\..*'),'.',2),' ',1) as tab FROM pg_stat_activity WHERE query like 'autovacuum%')
-- Generate a VACUUM FREEZE statement for the 100 tables with the highest xid age
-- (of the table or its TOAST) that are not already being autovacuumed
select 'VACUUM FREEZE "'|| n.nspname ||'"."'|| c.relname ||'";'
  from pg_class c
  inner join pg_namespace n on c.relnamespace = n.oid
  left join pg_class t on c.reltoastrelid = t.oid and t.relkind = 't'
where c.relkind in ('r','m')
AND NOT EXISTS (SELECT * FROM cur_vaccs WHERE tab = c.relname)
order by GREATEST(age(c.relfrozenxid),age(t.relfrozenxid)) DESC
limit 100
\gexec

The query selects the 100 most-aged tables that are not currently being autovacuumed and runs a "VACUUM FREEZE" on each of them (the \gexec at the end executes each line of the query output as a statement).

This can be scheduled using cron in a low-activity window, for example:

20 11 * * * /full/path/to/psql -X -f /path/to/vacuumjob.sql > /tmp/vacuumjob.out 2>&1

If there are multiple low-impact windows, all of them can be used with multiple schedules.
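
For example, two hypothetical low-activity windows could be covered with two crontab entries pointing at the same script (the times, paths, and output files below are placeholders; add connection options as your environment requires):

30 01 * * * /full/path/to/psql -X -f /path/to/vacuumjob.sql > /tmp/vacuumjob_night.out 2>&1
30 14 * * * /full/path/to/psql -X -f /path/to/vacuumjob.sql > /tmp/vacuumjob_day.out 2>&1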

In practice, we have seen that supplementary scheduled vacuum jobs based on this table-age approach have the following positive effects:

  1. The chance of those tables becoming candidates again during the peak times is drastically reduced.
  2. Server resources are used very effectively during off-peak times for the vacuum and freeze operations.
  3. Since candidates are selected on completely different criteria (table age) than the defaults (scale factor and threshold), the chance of a few tables starving forever is eliminated, and the same table is far less likely to become a candidate for vacuum again and again.
  4. In customer/user environments, the wraparound prevention autovacuum is almost never reported again.

Summary

It is not rare to see systems where autovacuum remains untuned, or where poor settings are used all the way from the instance level down to the table level. To summarize:

  1. Default settings may not work well in many systems. Repeated autovacuum runs on a few tables while other tables starve for autovacuum is very common.
  2. Poor settings can result in autovacuum workers taking a considerable part of the server resources with little gain.
  3. Autovacuum has a natural tendency to start at the wrong time, when the system is undergoing heavy transactions.
  4. In practice, a scheduled vacuum job becomes necessary for systems that undergo heavy transactions on a large number of tables and that have spikes and peak-time periods of load.

Clear analysis and tuning are important, and it is always highly recommended to have a custom vacuum job that incorporates your knowledge about the system and its lowest-impact time windows.

Jun
28
2022
--

Upgrade Process of Percona Server for MongoDB (Replica Set and Shard Cluster)


Percona Server for MongoDB (PSMDB) upgrade, major and minor versions

In this blog, we will see the best practices for upgrading major and minor versions of the clusters (Replica Set and Shard Cluster) with no downtime, in a rolling manner.

Percona Server for MongoDB (PSMDB) has versions like A.B.C where A.B refers to the major version and C refers to the minor version or the revision number. Below is the way to identify the versioning and understand which one is suitable for production and development environments:

  • If B is an even number then it refers to a stable release and can be deployed to production.
  • If B is an odd number then it refers to a development release, which is not stable and is under testing.
  • C will be an incrementing number like 0, 1, 2, 3, and so on. Generally, it is for bug fixes and backward-compatible changes.

Before upgrading a Replica set or Sharded cluster, the below prerequisites should be taken care of.

Prerequisites:

  • Always take a full backup of the database before doing an upgrade.
  • Always check for any special considerations or compatibility issues specific to your PSMDB release from release notes (mostly in the Major version, but sometimes there is a rare minor-version bug fix that has backward incompatibility).
  • Please inform the application team and ask them to check the current driver compatibility with the version to which an upgrade is required.
  • Make sure all the nodes are in a healthy state (a quick shell check is sketched after this list).
  • Check the FCV (feature compatibility version); it should be the same as the current database version. Below is the command to check the current FCV:
    db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )

    Below is the command to set the current FCV:

    db.adminCommand( { setFeatureCompatibilityVersion: "version"} )
  • Plan the upgrade in a maintenance window. Also, before you plan to upgrade a production environment, it is advisable to upgrade a lower environment that reproduces your production environment, to ensure that your production configuration is compatible with all changes.
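
For the health check mentioned above, a minimal mongo shell sketch is to confirm the binary version of the node you are connected to and the state of every replica set member before touching anything:

// Server version of the node you are connected to
db.version()

// State and health of every replica set member
rs.status().members.forEach(function(m) {
    print(m.name + " : " + m.stateStr + " (health: " + m.health + ")");
});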

Upgrade major version in a Replica Set

Before starting the upgrade (in this blog, I will upgrade from 4.4 to 5.0), take the backup/copy of the PSMDB configuration file (mongod.conf) on every node, as it will be replaced with the default one.

We are using the default location of the configuration file in this blog for backup/copy:

$ cp /etc/mongod.conf /etc/mongod.conf_bkp

Start by upgrading the secondaries, one by one.

  • Stop the service on one secondary (if not logged in as the root user, make sure the user has sudo access):
    $ sudo service mongod stop
  • After the service is stopped successfully, remove the binaries:
    $ sudo yum remove percona-server-mongodb*
  • Enable the repository:
    $ sudo percona-release enable psmdb-50
  • To install a specific version of Percona Server for MongoDB, run the below command to see a list of available versions:
    $ sudo yum list percona-server-mongodb --showduplicates

    Below is a sample output of the above command:

    percona-server-mongodb.x86_64 5.0.2-1.el7 psmdb-50-release-x86_64
    percona-server-mongodb.x86_64 5.0.3-2.el7 psmdb-50-release-x86_64
    percona-server-mongodb.x86_64 5.0.4-3.el7 psmdb-50-release-x86_64
    percona-server-mongodb.x86_64 5.0.5-4.el7 psmdb-50-release-x86_64
    percona-server-mongodb.x86_64 5.0.6-5.el7 psmdb-50-release-x86_64
    percona-server-mongodb.x86_64 5.0.7-6.el7 psmdb-50-release-x86_64
    percona-server-mongodb.x86_64 5.0.8-7.el7 psmdb-50-release-x86_64
    percona-server-mongodb.x86_64 5.0.9-8.el7 psmdb-50-release-x86_64
  • Install specific version packages. For example, to install Percona Server for MongoDB 5.0.8-7, run the following command:
    $ sudo yum install percona-server-mongodb-5.0.8-7.el7

    The below command will install the latest version available:

    $ sudo yum install percona-server-mongodb
  • Before restoring, empty the newly created configuration file, then restore the original content from the backup:
    $ > /etc/mongod.conf
    $ cat /etc/mongod.conf_bkp > /etc/mongod.conf
  • Start the mongod service:
    $ sudo service mongod start
  • Check the installed version:
    $ mongo --version
  • Wait for the secondary to get in sync and come into the “SECONDARY” state.
  • Now perform the same steps (one through nine), including taking a backup/copy of the config file, on the other nodes.
  • Once both the secondaries are upgraded, log in to the primary node and step it down (a shell sketch for this follows the list). Once a new primary is elected, repeat the same steps one through nine on the old primary.
  • Wait for the sanity checks to be completed by the QA or application team.
  • Then, once everything is running fine, change the FCV to the new version.
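
For steps nine and eleven above, a minimal mongo shell sketch to confirm that the upgraded node is back in the "SECONDARY" state and to step down the current primary looks like this:

// Re-run until the upgraded member reports "SECONDARY"
rs.status().members.forEach(function(m) {
    print(m.name + " : " + m.stateStr);
});

// On the current primary, ask it to step down so a new primary is elected
rs.stepDown()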

Upgrade major version in Sharded Cluster

  • Stop the balancer.
    Run the below command to stop the balancer:

    mongos> sh.stopBalancer()

    Run the below command to check the balancer state:

    mongos> sh.getBalancerState()
    false
  • Upgrade the Config server replica set first. Refer to steps 1-11 from the upgrade major version in the replica set.
  • Upgrade the Shard replica sets next. Refer to the upgrade major version in a Replica Set section. You can also upgrade the shards in parallel in the same rolling manner; the steps are the same for all shards.
  • The mongos instances are upgraded last, in the same rolling manner (if you have only one mongos, please inform your application team about that particular downtime).
  • Start the balancer.
    mongos> sh.startBalancer()
  • Now follow steps 12 and 13 from the upgrade major version in the replica set once the cluster is upgraded.

Upgrade minor version in a Replica Set

Before starting the upgrade, take a backup/copy of the PSMDB configuration file (mongod.conf) on every node, as it will be replaced with the default one.

Using the default location of the configuration file here for backup/copy:

$ cp /etc/mongod.conf /etc/mongod.conf_bkp

Start by upgrading the secondaries, one by one.

    • Stop the service on one secondary (if not logged in as a root user, make sure the user has sudo access):
      $ sudo service mongod stop
    • After the service is stopped successfully, remove the binaries:
      $ sudo yum remove percona-server-mongodb*
    • Check the available minor versions:
      $ sudo yum list percona-server-mongodb --showduplicates

      Sample output:

      percona-server-mongodb.x86_64 5.0.2-1.el7 psmdb-50-release-x86_64
      percona-server-mongodb.x86_64 5.0.3-2.el7 psmdb-50-release-x86_64
      percona-server-mongodb.x86_64 5.0.4-3.el7 psmdb-50-release-x86_64
      percona-server-mongodb.x86_64 5.0.5-4.el7 psmdb-50-release-x86_64
      percona-server-mongodb.x86_64 5.0.6-5.el7 psmdb-50-release-x86_64
      percona-server-mongodb.x86_64 5.0.7-6.el7 psmdb-50-release-x86_64
      percona-server-mongodb.x86_64 5.0.8-7.el7 psmdb-50-release-x86_64
      percona-server-mongodb.x86_64 5.0.9-8.el7 psmdb-50-release-x86_64
    • Install the specific version to which you want to upgrade (I chose to upgrade from 5.0.3 to 5.0.7):
      $ sudo yum install percona-server-mongodb-5.0.7-6.el7
    • Before restoring, empty the newly created configuration file, then restore the original content from the backup:
      $ > /etc/mongod.conf
      $ cat /etc/mongod.conf_bkp > /etc/mongod.conf

  • Start the mongod service:
    $ sudo service mongod start
  • Check the new version:
    $ mongo --version
  • Wait for the secondary to get in sync and come into the “SECONDARY” state.
  • Now perform the same steps (one through eight), including taking a backup/copy of the config file, on the other nodes.
  • Once all the secondaries are upgraded, log in to the primary node and step it down. Once a new primary is elected, repeat the same steps on the old primary.
  • For a minor version upgrade, the FCV remains the same, as we are only changing C rather than A or B in A.B.C.

Upgrade minor version in Sharded Cluster

  • Stop the balancer.
    Run the below command to stop the balancer:

    mongos> sh.stopBalancer()

    Run the below command to check the balancer state:

    mongos> sh.getBalancerState()
    false
  • Upgrade the Config server replica set first. Refer to the steps from the upgrade minor version in a Replica Set section.
  • Upgrade the Shard replica sets next. Refer to the steps from the upgrade minor version in a Replica Set section. You can also upgrade the shards in parallel in the same rolling manner; the steps are the same for all shards.
  • The mongos instances are upgraded last, in the same rolling manner (if you have only one mongos, please inform your application team about that particular downtime).
  • Start the balancer.
    mongos> sh.startBalancer()

 

Summary

Percona Server for MongoDB is an open-source replacement for MongoDB Community Edition that combines all of the features and benefits of MongoDB Community Edition with enterprise-class features developed by Percona: LDAP Authentication and Authorization, Audit Logging, Kerberos Authentication, and hot backups.

To learn more about the enterprise-grade features available in the vendor lock-in-free Percona Server for MongoDB, we recommend going through our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered?

Jun
28
2022
--

Encryption Functions in Percona Server for MySQL 8.0.28, Multiple KMIP Servers in Percona Distribution for MongoDB 5.0.9: Release Roundup June 28, 2022


It’s time for the release roundup!

Percona is a leading provider of unbiased open source database solutions that allow organizations to easily, securely, and affordably maintain business agility, minimize risks, and stay competitive.

Our Release Roundups showcase the latest Percona software updates, tools, and features to help you manage and deploy our software. It offers highlights and critical information, as well as links to the full release notes and direct links to the software or service itself to download.

Today’s post includes those releases and updates that have come out since June 13, 2022. Take a look!

Percona Distribution for PostgreSQL 14.4

On June 27, 2022, Percona Distribution for PostgreSQL 14.4 was released.  This release of Percona Distribution for PostgreSQL is based on PostgreSQL 14.4. It fixes the issue with all PostgreSQL 14 versions that could cause silent data corruption when using the CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY commands. Learn how to detect and fix silent data corruption in your indexes in the upstream documentation.

Download Percona Distribution for PostgreSQL 14.4

 

Percona Distribution for MySQL (Percona Server for MySQL-based variant) 8.0.28

Percona Distribution for MySQL (PS-based variant) 8.0.28 was released on June 20, 2022. It is a single solution with the best and most critical enterprise components from the MySQL open source community, designed and tested to work together. Release highlights include:

  • Encryption functions and variables to manage the encryption range are added. The functions may take an algorithm argument. Encryption converts plaintext into ciphertext using a key and an encryption algorithm.
  • Support for the Amazon Key Management Service (AWS KMS) component.
  • Error messages are now logged to a memory buffer.
  • ZenFS file system plugin for RocksDB is updated to version 2.1.0.
  • Memory leak and errors detectors (Valgrind or AddressSanitizer) provide detailed stack traces from dynamic libraries (plugins and components). The detailed stack traces make it easier to identify and fix the issues.

Download Percona Distribution for MySQL (PS-based variant) 8.0.28

Percona Server for MySQL 8.0.28-20

On June 20, 2022, we released Percona Server for MySQL 8.0.28-20, a free, fully compatible, enhanced, and open source drop-in replacement for any MySQL database. It provides superior performance, scalability, and instrumentation. New features and improvements introduced include:

  • Percona Server for MySQL implements encryption functions and variables to manage the encryption range. The functions may take an algorithm argument. Encryption converts plaintext into ciphertext using a key and an encryption algorithm. You can also use the user-defined functions with the PEM format keys generated externally by the OpenSSL utility.
  • Percona Server for MySQL adds support for the Amazon Key Management Service (AWS KMS) component.
  • ZenFS file system plugin for RocksDB is updated to version 2.1.0.
  • Memory leak and error detectors (Valgrind or AddressSanitizer) provide detailed stack traces from dynamic libraries (plugins and components). The detailed stack traces make it easier to identify and fix the issues.

Download Percona Server for MySQL 8.0.28-20

 

Percona Distribution for MongoDB 5.0.9

On June 20, 2022, Percona Distribution for MongoDB 5.0.9 was released. It is a freely available MongoDB database alternative, giving you a single solution that combines enterprise components from the open source community, designed and tested to work together. The aim of Percona Distribution for MongoDB is to enable you to run and operate your MongoDB efficiently with the data being consistently backed up. Release highlights include:

  • The following improvements were made to data-at-rest encryption using the Key Management Interoperability Protocol (KMIP) (a tech preview feature):
    • support for multiple KMIP servers, which adds failover to your setup
    • the ability to set KMIP client certificate passwords through a flag, to simplify the migration from MongoDB Enterprise to Percona Server for MongoDB
  • Added support for Ubuntu 22.04 for Percona Server for MongoDB and Percona Backup for MongoDB.
  • Added improvements to initial syncs from a secondary sync source.

Download Percona Distribution for MongoDB 5.0.9

Percona Server for MongoDB 5.0.9-8

Percona Server for MongoDB 5.0.9-8 was released on June 20, 2022. It’s an enhanced, source-available, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 5.0.9 Community Edition. It supports MongoDB 5.0.9 protocols and drivers.

Download Percona Server for MongoDB 5.0.9-8

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MongoDB, PostgreSQL, MariaDB, and other open source databases in on-premises and cloud environments.

Jun
27
2022
--

Talking Drupal #353 – Native Web Components

Today we are talking about Web Components with Ron Northcutt.

www.talkingDrupal.com/353

Topics

  • What are native web components
  • How are they different from custom elements
  • How are they different from JS frameworks
  • Libraries
    • Elix
    • Bit
    • Polymer
    • Lit
  • ShadowDOM
  • Difference between Progressive Web App and NWC
  • Benefits of NWC
  • How do you use them
  • What are lifecycle callbacks
  • Styles
  • Html
  • Slots
    • named
    • anonymous
  • Support for NWCs
  • Use cases
  • Anything else?

Resources

Guests

Ron Northcutt – @rlnorthcutt

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan John Picozzi – www.epam.com @johnpicozzi Donna Bungard – @dbungard

MOTW

Menu block: This module provides configurable blocks of menu links with additional features not available in Drupal 8+ core.
