Mar 30, 2023

The Ultimate Guide to Open Source Databases


The use of open source databases has increased steadily in recent years. Past trepidation — about perceived vulnerabilities and performance issues — has faded as decision makers realize what an “open source database” really is and what it offers.

This comprehensive overview examines open source database architecture, types, pros and cons, uses by industry, and how open source databases compare with proprietary databases. This guide also covers key factors to consider when selecting the right open source database for your organization.

1. What is an open source database?

In simple terms, an open source database is a database whose source code is free and available to all. Public accessibility enables users to download, modify, and distribute the code.

Generally speaking, open source databases give businesses and organizations of all sizes a cost-effective, flexible alternative to proprietary commercial databases. And contrary to past misperceptions, “open source” is not synonymous with risk. In fact, the opposite is true, thanks to an active and vibrant global community of developers who work continuously to improve open source software. Their work produces higher-quality code and enables faster innovation, while maintaining high security standards.

Further, open source databases can be modified in infinite ways, enabling institutions to meet their specific needs for data storage, retrieval, and processing. Depending on those needs, an organization can choose between relational and nonrelational open source databases. Here’s a basic explanation of the differences:

  • Relational databases: Structured data is stored in tables consisting of columns and rows, with relationships between tables defined through keys. Relational databases — which include MySQL, PostgreSQL, and many others — are the most commonly used open source databases. They’re often preferred for storing and processing business intelligence data by organizations that require fast SQL queries.
  • Non-relational databases: Instead of tables, non-relational (NoSQL) databases use document-based storage, column-oriented storage, key-value stores, or graph structures. Non-relational databases — which include MongoDB, Apache Cassandra, and others — are favored for storing and processing unstructured data. For example, an analytics application working with unstructured, document-shaped data is often better served by a non-relational document database. (A quick illustration of the two models follows this list.)
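To make the distinction concrete, here is the same customer record shaped for each model, sketched in Python; the field names are purely illustrative.

# Relational model: fixed columns, one value per column, rows stored in a table.
relational_row = ("C-1001", "Jane Doe", "jane@example.com")

# Document model: a self-describing structure that can nest arrays
# and vary in shape from record to record.
document = {
    "customer_id": "C-1001",
    "name": "Jane Doe",
    "emails": ["jane@example.com", "jane.doe@example.org"],
}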

Closed source database vs. open source database

With closed source (proprietary) database software, the public does not have access to the source code; only the company that owns it and those given access can modify it. With open source database software, anyone in the general public can access the source code, read it, and modify it.

 

2. Advantages of open source databases for your organization

Developers, DBAs, and other decision makers are discovering the advantages of open source databases over proprietary commercial databases. Those advantages include:

Lower costs

Open source database software is free to download. There are no licensing or purchasing fees for reusing, modifying, or distributing the software. That’s in contrast to the annual licensing fees that commercial vendors charge. With open source databases, users have the freedom to modify and distribute the code as needed, reducing the overall cost of ownership.

No vendor lock-in

Open source software is free of proprietary restrictions that can come with vendor lock-in. Developers can customize the source code and try new applications without a big budget hit. Companies can more easily scale infrastructure — up or down — to meet economic conditions and changing business objectives. 

Relying on one vendor can address immediate concerns, reduce complexity, and provide a secure database. But vendor lock-in can occur, making a company susceptible to price hikes, paying for unnecessary technology, and being blocked from new technology that could be advantageous. With open source software, a business is not trapped into using one provider’s software, support, or services. Instead, the business may design and redesign systems as customer expectations change and business objectives evolve.

Faster innovation

Without prohibitive contracts and lengthy procurement processes, open source enables developers and DBAs to customize the source code and create new applications for addressing evolving needs. Plus, there’s a global community of dedicated volunteers driving the development of open source database technology. Open source standards and community support enable developers and DBAs to focus on accelerating feature creation and on enhancing availability, performance, scalability, and security. 

Quality control

The dedication of a mission-driven open source community also bolsters quality control. The community spans expertise and industries, which puts multitudes of eyes and fresh perspectives into the review and improvement of code. Bugs and security vulnerabilities are identified more quickly. The open source model can result in more robust and reliable database solutions.

Data portability

Open source freedom enables an organization to deploy databases anywhere and move them at any time — to cloud, on-premises, or hybrid environments. Free of licensing restrictions, developers may access and modify the code. It’s important to note that moving applications to the public cloud doesn’t necessarily eliminate data lock-in. Cloud providers can charge high egress fees that impede such movement, and the more data in the cloud, the more your applications must be cloud-based. With open source, organizations can move databases without paying penalties.

 

3. Popular open source database software

There are options for businesses and organizations that seek open source database software to best fit workloads and objectives. Those options include: relational databases, which contain data with predefined relationships, organized as a set of tables with columns and rows; and non-relational (or NoSQL) databases, which store data in non-tabular form and are based on data structures like documents. Here are some of the most widely used databases:

MySQL

MySQL — the SQL part stands for Structured Query Language — is one of the most popular open source relational databases. It is widely used for web applications, data warehousing, and online transaction processing. MySQL’s popularity is attributable to many factors: It’s a solid, quick, and dependable system; MySQL does not have a steep learning curve; it’s compatible with almost every operating system (OS) a DBA or developer will use; and the MySQL environment is conducive to scalability.  

PostgreSQL

PostgreSQL, an open source object-relational database system, has rapidly gained popularity among professional developers. Stack Overflow statistics show that 26% of developers preferred it in 2017, 34% in 2019, and 40% in 2021. Most recently, in Stack Overflow’s 2022 Developer Survey, PostgreSQL took a slight lead over MySQL (46.48% to 45.68%) as the most popular database platform among professional developers.

PostgreSQL is favored strongly for its complex data analysis, data science, graphing, and AI-related capabilities. PostgreSQL is known for powerful and advanced features, including asynchronous replication, full-text searches of the database, and native support for JSON-style storage, key-value storage, and XML. PostgreSQL is also highly extensible, enabling users to add custom functionality through plug-ins and extensions.

MariaDB

MariaDB, a fork of MySQL known for its reliability and stability, offers many of the same features and capabilities as MySQL. Created as a response to concerns about the future of MySQL, MariaDB is now a popular alternative for those looking for a high-performance, open source relational database. MariaDB uses storage engines that give it a speed boost and enable users to implement distributed storage and distributed transactions. Additionally, it enables dynamic rows for table columns, which bolsters flexibility. Like MySQL, MariaDB is widely used for web and mobile applications, as well as for data warehousing and data analysis.

MongoDB

MongoDB is a popular source-available* non-relational (NoSQL) database. It stores data in documents and collections, a design ideal for handling large amounts of unstructured data. Known for performance and scalability, it’s often used for high volumes of data and for real-time web applications. 

* Whereas many open source software offerings — like the community version of MySQL — use the GPL license, MongoDB has been under the AGPL license and more recently under the SSPL license (introduced by MongoDB itself). Many open source proponents, including the Open Source Initiative, do not consider software under SSPL to be open source.

 

4. Comparison of open source databases

This section provides one-on-one comparisons of popular open source database software. The intent is not to pick winners, but to identify similarities and differences.

 

MySQL vs. PostgreSQL

MySQL is a relational database management system with typical RDBMS features, including tables, views, foreign keys, and stored procedures. It’s well-suited for most online transaction processing (OLTP) workloads and works with some online analytical processing (OLAP) workloads. MySQL is especially effective with standard relational schemas and web-based workloads. Simple asynchronous replication allows for easy read-scaling and report query offloading. Synchronous replication supports high availability. MySQL is used commonly by businesses that host web-based transactions — e.g., banking, shopping, ordering, and registering. MySQL ranks as the second most popular database management system (DB-Engines, March 2023).

PostgreSQL is an object-relational database management system (ORDBMS) with all the standard RDBMS features plus support for complex objects, table inheritance, and additional data types such as JSON and arrays. Those additions allow PostgreSQL to support more complex workloads and schema designs. PostgreSQL has replication similar to MySQL’s for building out architectures. Being a more advanced database management system, PostgreSQL is well-suited for performing complex queries in a large environment quickly. Because it readily supports failover and full redundancy, it’s often preferred by financial institutions and manufacturers. It’s also preferred for use with geographic information systems (GIS) and geospatial data. PostgreSQL ranks as the fourth most popular database management system (DB-Engines, March 2023).

MySQL vs. MariaDB

MySQL, released in 1995, was acquired by Oracle in 2010. It’s an open source system, but there’s also proprietary code and a “premium” version available for paid users. Capable of handling a large volume of data, it is one of the most commonly used systems for storing, retrieving, and displaying data, and it’s used by millions of websites worldwide. Built-in data masking and dynamic columns make MySQL secure and fast. Due to its simple design and multiple storage engines, MySQL can deliver performant databases and continuous uptime. MySQL is a popular solution for designing database systems for eCommerce stores. As stated above, MySQL ranks as the second most popular database management system (DB-Engines, March 2023).

MariaDB was developed as a fork of MySQL soon after Oracle’s acquisition of MySQL. Like MySQL, MariaDB is an open source RDBMS. Similar in architecture and capabilities, MariaDB is designed to be a drop-in replacement for MySQL. Unlike MySQL, MariaDB does not have proprietary versions; it’s completely open source, with no proprietary code. MariaDB is compatible with older versions of itself (backward compatible), a helpful feature given that the software is constantly updated by the community. Also, with a dynamic thread pool that retires inactive threads, MariaDB offers improved speed, enhanced replication, and faster updates. It is used for the same types of applications as MySQL but can handle larger volumes of data. For instance, a company with an eCommerce store that must process exceptionally large volumes of data might opt for MariaDB. MariaDB ranks as the 13th most popular database management system (DB-Engines, March 2023).

MongoDB vs. Apache Cassandra

MongoDB is a non-relational (NoSQL) source-available database program that stores data in JSON-like documents with optional schemas. It’s highly scalable and ideal for real-time analytics and high-speed logging. MongoDB is a preferred system for analyzing data because documents are easily shared across multiple nodes, and because of its indexing, query-on-demand, and real-time aggregation capabilities. MongoDB replica sets enable data redundancy and automatic failover, setting the stage for high availability. MongoDB also provides strong encryption and firewall security. One drawback is that MongoDB requires substantial storage space. MongoDB is preferable for working with content management systems, mobile apps, and real-time analytics.

MongoDB is popular across industries — an enlyft survey of 90,240 companies using MongoDB listed the leading industries as Technology and Services (23%), Computer Software (16%), and Internet (6%). MongoDB ranks as the fifth most popular database management system and is No. 1 among non-relational systems (DB-Engines, March 2023).

Apache Cassandra is an open source distributed NoSQL database that also offers high scalability and availability. It manages unstructured data with thousands of writes every second. Fault tolerance and linear scalability make Cassandra popular for handling mission-critical data. But because it handles large amounts of data and multiple requests, transactions can be slower and there can be memory management issues. Apache Cassandra, with users across industries, ranks as the 12th most popular database management system (DB-Engines, March 2023).

 

5. Open source databases in action — by industry

Increasingly, organizations across industries are turning to open source solutions. DB-Engines statistics (March 2023), in fact, show open source database management systems outranking commercial systems in popularity. Organizations are making the move because open source provides flexibility, scalability, cost-effectiveness, and other attributes underdeveloped or missing in proprietary database offerings. Here are examples of how different organizations are using open source databases to meet industry-specific needs:

E-commerce

Open source databases are widely used for managing customer data, purchases, and other transactions, as well as inventory. With open source databases, e-commerce companies can process large amounts of data in real time. Open source solutions provide tools for ensuring high availability and maintaining performant applications, both of which are especially essential at peak shopping times.

Healthcare 

Strong security features make open source databases a logical choice for offices and institutions in which the vast majority of information is private. At the same time, open source gives authorized medical and mental health professionals reliable means of accessing and analyzing patient data. With health and lives at stake, such access cannot be compromised.

Government 

Agencies at all levels of government — local, regional, and national — use databases to provide services, conduct research, regulate industries, monitor environmental and personal well-being, improve infrastructure, protect citizens, and a lot more. With governments, maintaining highly resilient and available databases can be mission-critical, literally. Open source meets those essential needs, while delivering cost-effectiveness and scalability sought by agencies on limited budgets.

Nonprofit organizations 

Groups ranging from charities to educational institutions to churches use open source databases to track and manage donor information, identify deficiencies and needs, and organize volunteer activities. The flexible and customizable features of open source databases empower nonprofits — perpetually on limited budgets — to do more with less.

Financial services 

Banks, investment firms, payment services, and other financial institutions use open source databases to enable, track, and process customer transactions, ensure regulatory compliance, and perform risk assessments. Open source financial services databases provide the tools financial institutions need to build highly resilient fintech products and services that satisfy two disparate yet equally demanding groups: customers and regulatory agencies. The result is systems that can support real-time transactions, handle large amounts of data, and keep information secure.

High-tech/software companies 

Open source databases are used by high-tech and software companies to manage customer data, track sales, organize projects, and develop products. With their flexible and scalable architecture, open source databases are a popular choice for companies looking to streamline operations, support growth, and facilitate innovation. Open source enables software developers to collaborate with others in the global community.

 

6. The database debate: open source vs. proprietary  

In simple terms, open source software is free and available; anyone may access, use, and modify the code. With proprietary software, users cannot change the code; only the software owner/provider can do that. Each has merits and downsides. 

Proprietary database software

Proprietary software can be beneficial for addressing immediate and/or focused database concerns. Sometimes a vendor will have an innovative way of solving a problem when there aren’t alternatives available on the market. A company will enter a relationship with that vendor because the vendor’s solution addresses present business objectives. Additionally, a single-vendor relationship can eliminate complexity; the vendor’s solution simplifies the environment and ensures that all components work together. Support is simplified, too; the vendor provides a single-point-of-contact for addressing problems.

However, proprietary software can limit creative options and the ability to scale, and it can increasingly draw from a customer’s tech budget. As business objectives change, along with industry standards and technological advances, a customer can be stuck with software and upgrades that make more sense for the vendor’s bottom line than for addressing the customer’s changing needs. For example, the vendor might push a cloud-based solution when the customer prefers to keep its infrastructure on-premises. Being stuck with a single vendor and its software can result in vendor lock-in that makes you susceptible to price hikes, paying for bundled technology with components you don’t need, and an inability to change software and infrastructure to meet unique business needs.

Open source database software

Software that is truly open source is free to download. There are no licensing or purchasing fees for reusing, modifying, or distributing the software. Beyond cost-efficiency, the software itself is on par with, and sometimes better than, commercial options. Free of proprietary restrictions that can come with vendor lock-in, developers can customize the source code and try new applications without a big budget hit. Companies can more easily scale infrastructure — up or down — to meet economic conditions and changing business objectives. And there are strong, passionate online open source communities that can be a useful resource if challenges arise. Additionally, with open source, companies can deploy their databases anywhere — in cloud, on-premises, or hybrid environments — and move them at any time.

A lack of readily available support, however, can offset the potential savings of open source database software. Further, without the right built-in protections, open source databases can be more vulnerable than those of proprietary software. To achieve database objectives across the enterprise, a company that uses open source software often must either bolster its on-staff expertise or turn to third-party support. Either option can be costly.

Open source database software with enterprise-grade features

Open source database software with enterprise-grade features can deliver the best of both worlds — the cost-efficiency and scalability of open source coupled with the simplicity (task-focused), cohesiveness (components work together), and security of proprietary software. (A Red Hat survey showed that 89% of IT leaders see enterprise open source software as equally or more secure than proprietary software.) With the right extensions and add-ons to make it enterprise-grade, an open source solution can replicate the applications a company uses and can handle the performance requirements of the company’s most critical workloads. A flexible, open source enterprise setup enables deployment and operation on-premises, in the cloud, or in a hybrid environment.

The phrase “enterprise-grade” is used a lot, but few vendors provide true enterprise-grade open source software. Enterprise-level integration, productivity, scalability, and security must be included. And even when it’s all there, enterprise-grade software, like community versions, might still require some level of support. When seeking such support, it’s important to find a vendor that provides multi-database support, technology-agnostic expertise, and a flexible contract.

 

7. Choosing the right open source database for your organization

With so many open source databases to choose from, it can be difficult to determine which one is the best fit for your organization. Here are a few factors to consider when choosing an open source database:

Functionality

Some open source databases are better suited for processing customer transactions, others for business analysis, and still others for tracking assets and inventory. That’s just a sampling of general uses. It’s important to consider how your business or organization uses data and to select the database that provides the features, capabilities, and complexity that best support your activities and meet your objectives.

Scalability

You want to make sure that your database can grow along with the organization. Can your database store, retrieve, and process increasing amounts of data? Can it meet greater customer expectations and evolving compliance requirements? What about support for high availability and disaster recovery? You also should consider whether you might want to scale down your infrastructure amid challenging economic conditions. 

Community support

Open source software can amount to big savings, but not if you’re hit by slowdowns, shutdowns, or security breaches. In addition to strengthening and securing code available to all, a strong and active community can provide valuable resources, including tutorials, forums, and documentation for solving technical issues and maintaining performant operations. 

Cost

The software code is typically free to use and modify, but there are often costs associated with implementation, maintenance, and support of an open source database. It’s important to consider these costs when choosing an open source database and to factor them into your budget and long-term plans.

Technical skills

Finally, consider the technical skills and resources you have available in-house. Implementing and maintaining an open source database can require a significant investment of time and resources, so it’s important to choose a database that aligns with your existing technical skills and resources, with staff additions you’re willing to make, or with your willingness to seek outside help.

 

Enterprise-grade open source software and support from Percona 

When you’re choosing a database, consider Percona. We’re dedicated to making databases and applications run better through a combination of expertise and open source software. Our enterprise-grade distributions include the following:

  • Percona Distribution for MySQL: This single solution delivers optimized performance, greater scalability and availability, and enhanced backups — for even the most demanding workloads.
  • Percona Distribution for PostgreSQL: Put the best and most critical enterprise components from the open source community to work for you — in a single distribution, designed and tested to work together.
  • Percona Distribution for MongoDB: Ensure data availability while improving security and simplifying the development of new applications — in the most demanding public, private, and hybrid cloud environments.

Percona backs its enterprise-grade distributions with varying levels of support. We’ll provide support that best fits the needs of your company or organization — without a restrictive contract.

Mar 29, 2023

Compression Methods in MongoDB: Snappy vs. Zstd


Compression in any database is beneficial, as it reduces storage requirements and data transmission time, among other advantages.

Storage reduction alone results in significant cost savings, and we can save more data in the same space. As the amount of data grows, the need for efficient data compression becomes increasingly important to save storage space, reduce I/O overhead, and improve query performance.

In this blog, we will discuss both data and network-level compression offered in MongoDB. We will cover snappy and zstd for data block compression and zstd for network compression.

Percona Server for MongoDB (PSMDB) supports all types of compression and enterprise-grade features for free. I am using PSMDB 6.0.4 here.

Data compression

MongoDB offers various block compression methods used by the WiredTiger storage engine, like snappy, zlib, and zstd.

When data is written to disk, MongoDB compresses it with the specified block compression method. When a data block is read back, it is decompressed in memory and served to the incoming request.

Block compression is a type of compression that compresses data in blocks rather than compressing the entire data set at once. Block compression can improve performance by allowing data to be read and written in smaller chunks.

By default, MongoDB uses the snappy block compression method for storage and network communication.

Snappy, a compression library developed by Google, is designed to be fast and efficient regarding memory usage, making it a good fit for MongoDB workloads.

Benefits of snappy compression in MongoDB:

  1. Fast compression and decompression speeds
  2. Low CPU usage
  3. A streamable format that allows for quick processing
  4. Minimal impact on query performance

Zstandard compression, or zstd, is a newer block compression method provided by MongoDB starting from v4.2 that offers higher compression ratios. Zstd is a compression library developed by Facebook.

Zstd typically offers a higher compression ratio than snappy, meaning that it can compress data more effectively and achieve a smaller compressed size for the same input data.

Benefits of zstd compression in MongoDB:

  1. Higher compression ratios than Snappy
  2. Highly configurable compression levels
  3. Fast compression and decompression speeds
  4. Minimal impact on query performance

To enable zstd block compression, you need to specify the block compressor as “zstd” in the configuration file:

storage:
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: zstd
      blockCompressorQuality: 6 #(available since v5.0)
    engineConfig:
      cacheSizeGB: 4

 

In the above example, blockCompressorQuality is set to 6, which is the default.

blockCompressorQuality specifies the level of compression applied when using the zstd compressor. Values can range from 1 to 22.

The higher the specified compression level, the more compression is applied. It therefore becomes very important to test for the optimal level for your use case before implementing it in production.

Here, we are going to test snappy and zstd compression with the following configurations.

Host config: 4vCPU, 14 GB RAM

DB version: PSMDB 6.0.4

OS: CentOS Linux 7

I’ve used the mgeneratejs command to insert sample documents.

mgeneratejs '{"name": "$name", "age": "$age", "emails": {"$array": {"of": "$email", "number": 3}}}' -n 120000000 | mongoimport --uri mongodb://localhost:27017/<db> --collection <coll_name> --mode insert

Sample record:

_id: ObjectId("64195975e40cea62af1be510"),
name: 'Verna Grant',
age: 44,
emails: [ 'guzwev@gizusuzu.mv', 'ba@ewobisrut.tl', 'doz@bi.ag' ]

I’ve created a collection using the below command with a specific block compression method. This does not affect any existing collection or any new collection being created after this.

db.createCollection("user", {storageEngine: {wiredTiger: {configString: "block_compressor=zstd"}}})

If any new collection is created in the default manner, it will use snappy or whatever compression method is specified in the mongod config file.
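To confirm which block compressor an existing collection actually uses, you can inspect its storage statistics: for WiredTiger collections, the creation string returned by collStats includes the block_compressor setting. Here is a minimal pymongo sketch; the connection string and database name are placeholders, and the fields are looked up defensively because output details vary by version.

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["test"]  # replace with the database used for the import above

stats = db.command("collStats", "user")

# The WiredTiger creation string contains options such as "block_compressor=zstd".
creation = stats.get("wiredTiger", {}).get("creationString", "")
print([opt for opt in creation.split(",") if opt.startswith("block_compressor")])

# Logical data size vs. size on disk gives the effective compression ratio.
print("size:", stats.get("size"), "storageSize:", stats.get("storageSize"))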

At the time of insert ops, no other queries or DML ops were running in the database.

Snappy

Data size: 14.95GB

Data size after compression: 10.75GB

Avg latency: 12.22ms

Avg cpu usage: 34%

Avg insert ops rate: 16K/s

Time taken to import 120000000 document: 7292 seconds

[Chart: snappy compression in MongoDB]

Zstd (with default compression level 6)

Data size: 14.95GB

Data size after compression: 7.69GB

Avg latency: 12.52ms

Avg cpu usage: 31.72%

Avg insert ops rate: 14.8K/s

Time taken to import 120000000 document: 7412 seconds

[Chart: zstd compression in MongoDB]

We can see from the above comparison that we can save almost 3GB of disk space without impacting the CPU or memory.

Network compression

MongoDB also offers network compression.

This can further reduce the amount of data that needs to be transmitted between server and client over the network. This, in turn, requires less bandwidth and network resources, which can improve performance and reduce costs.

MongoDB supports the same compression algorithms for network compression, i.e., snappy, zstd, and zlib. All these compression algorithms have various compression ratios and CPU needs.

To enable network compression in mongod and mongos, you can specify the compression algorithm by adding the following line to the configuration file.

net:
  compression:
    compressors: snappy

We can also specify multiple compression algorithms:

net:
  compression:
    compressors: snappy,zstd,zlib

 

For data over the network to be compressed, the client must also enable at least one of the compression methods specified in the server config; otherwise, the data between the client and server remains uncompressed.

In the example below, I am using the Python driver to connect to my server, first with no compression and then with the zstd compression algorithm.

I am doing simple find ops on the sample record shown above.

This is the outbound data traffic without any compression method

Here we can see the data transmitted is around 2.33MB/s:

Now, I’ve enabled the zstd compression algorithm on both the server and client:

client = pymongo.MongoClient("mongodb://user:pwd@xx.xx.xx.xx:27017/?replicaSet=rs1&authSource=admin&compressors=zstd")

Here we can see the average outbound data transmission is around 1MB/s, which is almost a 50% reduction.
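Beyond watching the traffic graphs, you can also read the server-side compression counters. Recent MongoDB versions report per-compressor byte counts under network.compression in serverStatus; the pymongo sketch below assumes that section is present and looks the fields up defensively, since the exact layout can differ between versions. The URI is a placeholder.

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017/")
status = client.admin.command("serverStatus")

compression = status.get("network", {}).get("compression", {})
for name, stats in compression.items():
    compressor = stats.get("compressor", {})
    # bytesIn is the uncompressed size handed to the compressor;
    # bytesOut is what actually went over the wire.
    print(name, "bytesIn:", compressor.get("bytesIn"),
          "bytesOut:", compressor.get("bytesOut"))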

Note that network compression can have a significant impact on network performance and CPU usage. In my case, there was hardly anything else running, so I did not see any significant CPU usage.

Conclusion

Choosing between snappy and zstd compression depends on the specific use cases. By understanding the benefits of each algorithm and how they are implemented in MongoDB, you can choose the right compression setting for your specific use case and save some disk space.

It is important to choose the appropriate compression algorithm based on your specific requirements and resources. It’s also important to test your applications with and without network compression to determine the optimal configuration for your use case.

I also recommend using Percona Server for MongoDB, which provides enterprise-grade MongoDB features for free, with no license required. You can learn more about it in the blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered?

Percona also offers some more great products for MongoDB, like Percona Backup for MongoDB, Percona Kubernetes Operator for MongoDB, and Percona Monitoring and Management.

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

 

Download Percona Distribution for MongoDB Today!

Mar 29, 2023

Fixing Misplaced Rows in a Partitioned Table


A partitioned table in MySQL has its data separated into different tablespaces while still being viewed as a single table. Partitioning can be a useful approach in some cases when handling huge data sets. Deleting huge data sets can be sped up with a partitioned table, but if not handled properly, partition operations can misplace your data within the table. In this blog, I will share how to check and fix the data in such a table with minimal disruption to the table.

In this example, we use a table partitioned based on a date range.

mysql> show create table salaries\G
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50500 PARTITION BY RANGE  COLUMNS(from_date)
(PARTITION p01 VALUES LESS THAN ('1985-12-31') ENGINE = InnoDB,
 PARTITION p02 VALUES LESS THAN ('1986-12-31') ENGINE = InnoDB,
 PARTITION p03 VALUES LESS THAN ('1987-12-31') ENGINE = InnoDB,
 PARTITION p04 VALUES LESS THAN ('1988-12-31') ENGINE = InnoDB,
 PARTITION p05 VALUES LESS THAN ('1989-12-31') ENGINE = InnoDB,
 PARTITION p06 VALUES LESS THAN ('1990-12-31') ENGINE = InnoDB,
 PARTITION p07 VALUES LESS THAN ('1991-12-31') ENGINE = InnoDB,
 PARTITION p08 VALUES LESS THAN ('1992-12-31') ENGINE = InnoDB,
 PARTITION p09 VALUES LESS THAN ('1993-12-31') ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN ('1994-12-31') ENGINE = InnoDB,
 PARTITION p11 VALUES LESS THAN ('1995-12-31') ENGINE = InnoDB,
 PARTITION p12 VALUES LESS THAN ('1996-12-31') ENGINE = InnoDB,
 PARTITION p13 VALUES LESS THAN ('1997-12-31') ENGINE = InnoDB,
 PARTITION p14 VALUES LESS THAN ('1998-12-31') ENGINE = InnoDB,
 PARTITION p15 VALUES LESS THAN ('1999-12-31') ENGINE = InnoDB,
 PARTITION p16 VALUES LESS THAN ('2000-12-31') ENGINE = InnoDB,
 PARTITION p17 VALUES LESS THAN ('2001-12-31') ENGINE = InnoDB,
 PARTITION p18 VALUES LESS THAN ('2002-12-31') ENGINE = InnoDB,
 PARTITION p19 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */

If we query an arbitrary data set, MySQL returns it quickly and without any issue.    

mysql> SELECT * FROM salaries where emp_no=10001 and from_date='1999-06-23'; 
+--------+--------+------------+------------+ 
| emp_no | salary | from_date  | to_date    | 
+--------+--------+------------+------------+ 
|  10001 |  84917 | 1999-06-23 | 2000-06-22 | 
+--------+--------+------------+------------+ 
1 row in set (0.00 sec)

But once the data is misplaced, it does not return any rows.  

mysql> SELECT * FROM salaries where emp_no=10001 and from_date='1999-06-23';
Empty set (0.00 sec)

At first glance, it looks like the data has been deleted; even the EXPLAIN plan returns no result.

mysql> EXPLAIN SELECT * FROM salaries where emp_no=10001 and from_date='1999-06-23'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: no matching row in const table
1 row in set, 1 warning (0.00 sec)

But, the data still exists in the table; it’s just misplaced in a different partition.

mysql> SELECT * FROM salaries PARTITION(p08) LIMIT 1;
+--------+--------+------------+------------+
| emp_no | salary | from_date  | to_date    |
+--------+--------+------------+------------+
|  10001 |  84917 | 1999-06-23 | 2000-06-22 |
+--------+--------+------------+------------+
1 row in set (0.00 sec)

 

How to check if there are more misplaced rows?

To check for more misplaced rows in the table, we can execute the CHECK TABLE command.

mysql> CHECK TABLE salaries;
+--------------------+-------+----------+---------------------------------------------------------------------------------------+
| Table              | Op    | Msg_type | Msg_text                                                                              |
+--------------------+-------+----------+---------------------------------------------------------------------------------------+
| employees.salaries | check | error    | Found a misplaced row in part 7 should be in part 14:  from_date:1999-06-23           |
| employees.salaries | check | error    | Partition p08 returned error                                                          |
| employees.salaries | check | error    | Table upgrade required. Please do "REPAIR TABLE `salaries`" or dump/reload to fix it! |
+--------------------+-------+----------+---------------------------------------------------------------------------------------+
3 rows in set (0.83 sec)
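CHECK TABLE reports which partition holds misplaced rows, but not how many rows are affected per partition. A small script can compare each partition's contents against its declared range by reading information_schema.PARTITIONS. This is a minimal sketch; the connection details are placeholders, and it assumes the employees.salaries table from the example above.

import mysql.connector

# Placeholder connection details; point them at the server holding employees.salaries.
conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="employees")
cur = conn.cursor()

# Partition names and upper bounds, in partition order.
cur.execute("""
    SELECT PARTITION_NAME, PARTITION_DESCRIPTION
    FROM information_schema.PARTITIONS
    WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'salaries'
    ORDER BY PARTITION_ORDINAL_POSITION
""")

lower = None
for name, upper in cur.fetchall():
    upper = upper.strip("'")  # RANGE COLUMNS bounds may come back quoted
    conditions = []
    if lower is not None:
        conditions.append(f"from_date < '{lower}'")
    if upper != "MAXVALUE":
        conditions.append(f"from_date >= '{upper}'")
    if conditions:
        # Count rows sitting in this partition but outside its declared range.
        cur.execute(f"SELECT COUNT(*) FROM salaries PARTITION ({name}) "
                    f"WHERE {' OR '.join(conditions)}")
        misplaced = cur.fetchone()[0]
        if misplaced:
            print(f"{name}: {misplaced} misplaced rows")
    lower = None if upper == "MAXVALUE" else upper

cur.close()
conn.close()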

 

How can rows be misplaced?

The above observation raises two questions: how can MySQL allow this to happen, and how can it be fixed? The answer to the former is that MySQL allows users to exchange a partition with a separate table, as explained in my previous blog Quick Data Archival in MySQL Using Partitions. By default, it checks every row before placing the data into the partition.

mysql> ALTER TABLE salaries EXCHANGE PARTITION p15 WITH TABLE salaries_p15;
Query OK, 0 rows affected (0.04 sec)

In the above query, salaries_p15 is an empty table with the same structure as salaries, except that it has no partitions.

mysql> show create table salaries_p15\G
*************************** 1. row ***************************
       Table: salaries_p15
Create Table: CREATE TABLE `salaries_p15` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

After exchanging the partition, partition p15 is empty, and salaries_p15 has data of partition p15.

mysql> SELECT count(*) from salaries_p15;
+----------+
| count(*) |
+----------+
|   260926 |
+----------+
1 row in set (0.02 sec)


mysql> SELECT count(*) from salaries PARTITION(p15);
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.06 sec)

MySQL allows exchanging partitions without checking every row so that exchanging data can be sped up.

mysql> ALTER TABLE salaries EXCHANGE PARTITION p08 WITH TABLE salaries_p15 WITHOUT VALIDATION;
Query OK, 0 rows affected (0.04 sec)

The above query quickly exchanges the data in salaries_p15 with the empty partition p08, assuming the data is correct. In this case, it results in rows landing in the wrong partition.

mysql> SELECT * FROM salaries where emp_no=10001 and from_date='1999-06-23'; 
+--------+--------+------------+------------+ 
| emp_no | salary | from_date  | to_date    | 
+--------+--------+------------+------------+ 
|  10001 |  84917 | 1999-06-23 | 2000-06-22 | 
+--------+--------+------------+------------+ 
1 row in set (0.00 sec)

 

How to fix it

There are two ways to fix it: run the MySQL-recommended REPAIR TABLE command, or recreate the table by running an empty ALTER.

When REPAIR TABLE is executed, the misplaced rows are moved to their proper partitions by analyzing all the rows, but it acquires a SYSTEM LOCK on the table.

mysql> repair table salaries;
+--------------------+--------+----------+-----------------------------+
| Table              | Op     | Msg_type | Msg_text                    |
+--------------------+--------+----------+-----------------------------+
| employees.salaries | repair | warning  | Moved 260926 misplaced rows |
| employees.salaries | repair | status   | OK                          |
+--------------------+--------+----------+-----------------------------+
2 rows in set (5.30 sec)

The other way is to run an empty alter that will recreate the table.

One can run a direct alter on the table that will recreate the table and lock it until the alter is completed.

ALTER TABLE employees.salaries ENGINE=InnoDB;

One can also use pt-online-schema-change to recreate the table, which will not lock the table and has other benefits (Using Percona Toolkit to Alter Database Tables Online: A Controlled Approach) as well.

pt-online-schema-change --execute  --alter "ENGINE=InnoDb" h=localhost,D=employees,t=salaries

 

Conclusion

It is always recommended to be extra cautious while working with partitions and to avoid using WITHOUT VALIDATION in production environments to prevent misplaced data.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Try Percona Distribution for MySQL today!

Mar 28, 2023

WiredTiger Logging and Checkpoint Mechanism


In this blog post, we will dive deep into WiredTiger’s Logging and Checkpoint mechanism.

Every database system has to ensure durability and reliability. MongoDB implements classic write-ahead logging (WAL) using journals and checkpoints.

Starting with the basics, why is WAL needed in the first place? It’s to ensure that our data is durable after each write operation and to make it persistent and consistent without compromising the performance.

MongoDB achieves WAL and data durability using a combination of both journaling and checkpoints. Let’s understand both of them.

1. Journal

It’s a process where every write operation gets written (appended) from memory to a journal file, AKA the transaction log, which exists on disk, at a specific interval configured using “journalCommitIntervalMs.”

This step ensures durability: in case of crashes or power and hardware failures between checkpoints (see below), lost data can be recovered from the same journal files.

Here’s what the process looks like.

  • For each write operation, MongoDB writes the changes into Journal files, AKA transaction log files, which is basically a WAL mechanism used by MongoDB, as discussed above. This happens at every journalCommitIntervalMs.
  • The same data, in the form of pages inside the WiredTiger cache, is also marked dirty.

Example of journal file when exported using WiredTiger Binary (wt):

$ wt printlog -u -x

{ "lsn" : [15,256],
    "hdr_flags" : "",
    "rec_len" : 256,
    "mem_len" : 256,
    "type" : "commit",
    "txnid" : 3836082,
    "ops": [
      { "optype": "row_put",
        "fileid": 14 0xe,
        "key": "u00e8du001au0015bu00ffu00ffu00dfu00c1",
        "key-hex": "e8641a1562ffffdfc1",
        "value": "gu0000u0000u0000u0002o….,
        "value-hex": "67000000026f7….."
      }
    ]
  },
  { "lsn" : [15,512],
    "hdr_flags" : "",
    "rec_len" : 128,
    "mem_len" : 128,
    "type" : "commit",
    "txnid" : 3836083,
    "ops": [
      { "optype": "row_modify",
        "fileid": 6 0x6,
        "key": "u0081",
        "key-hex": "81",
        "value": "u0001u0000u0000u….",
        "value-hex": "010000000000000008000000000000003e0000000000000008000000000000000100000063151a64"
      }

The important part of it is the byte and offset information, which contains the data modifications that happened.

2. Checkpoint

The role of a checkpoint in durability and consistency is equally important. A checkpoint is equivalent to a log that records the changes in the related data files since the last checkpoint.

Each checkpoint consists of a root page, three lists of pages pointing to specific locations on the disk, and the file size on the disk.

At every checkpoint interval (60 seconds by default), MongoDB flushes the modified pages that are marked dirty in the cache to their respective data files (both collection-*.wt and index-*.wt).

Using the same “wt” utility, we can list the checkpoints and view the information they contain. The checkpoint information shown below is stored with respect to each data file (collection and index). These checkpoints are stored in the WiredTiger.wt file.

[Diagram: information contained in a WiredTiger checkpoint]

The above diagram shows the information present in a checkpoint, while the below output shows how it looks when listed using the “wt” utility.

$ wt list -c
WiredTigerCheckpoint.33: Sun Mar 26 08:35:59 2022 (size 8 KB)
        file-size: 8 KB, checkpoint-size: 4 KB
                     offset, size, checksum
        root      : 8192, 4096, 3824871989 (0xe3faea35)
        alloc     : 12288, 4096, 4074814944 (0xf2e0bde0)
        discard   : 0, 0, 0 (0)
        available : 0, 0, 0 (0)

WiredTigerCheckpoint.34: Sun Mar 26 08:35:59 2022 (size 8 KB)
        file-size: 8 KB, checkpoint-size: 4 KB
                     offset, size, checksum
        root      : 8192, 4096, 997122142 (0x3b6ee05e)
        alloc     : 12288, 4096, 4074814944 (0xf8e0cde0)
        discard   : 0, 0, 0 (0)
        available : 0, 0, 0 (0)

This key information resides inside each checkpoint and consists of the following:

  • Root page:
    • Contains the size (size) of the root page, the position in the file (offset), and the checksum (checksum). When a checkpoint is created, a new root page will be generated.
  • Internal page:
    • Only carries the keys. WiredTiger traverses through internal pages to look for the respective Leaf page.
  • Leaf page:
    • Contains the actual key-value pairs.
  • Allocated list pages: 
    • After the recent checkpoint, WiredTiger block manager keeps a record of newly allocated pages and their information, such as size, offset, and checksum.
  • Discarded list pages: 
    • Upon completion of the last checkpoint, associated pages will be discarded; however, key information such as size, offset, and the checksum of each such discarded page will be stored.
  • Available list pages: 
    • Pages allocated by the WiredTiger block manager but not yet used when this checkpoint is executed. When a previously created checkpoint is deleted, the available pages attached to it are merged into this checkpoint’s available list, and the size, offset, and checksum of each available page are recorded.
  • File size:
    • Information about the size of a data file on disk upon completion of a checkpoint.

Although both processes (involving disk) might look the same, they have different purposes. The journal, on the one hand, is an append-only operation to a journal file, AKA the transaction log file, present on disk. Checkpoints, on the other hand, deal with persisting the data to the respective data files, which involves a lot of overhead due to the complexity involved, especially random disk operations and reconciliation.

Generally, a checkpoint is triggered:

  • Every 60 seconds (the default), unless there’s a large amount of data that needs to be written, which creates a backlog due to I/O bottlenecks.
  • When eviction_dirty_target or eviction_dirty_trigger is reached (5% and 20% of the cache, respectively). However, this is not normal and only happens when there’s more write activity than the hardware can handle. (A small script for inspecting these cache counters follows this list.)
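To see how close a running instance is to those dirty-cache thresholds, you can read WiredTiger’s cache statistics from serverStatus. The minimal pymongo sketch below assumes a local mongod; the statistic names are looked up defensively because they can vary between WiredTiger versions.

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017/")
status = client.admin.command("serverStatus")

cache = status.get("wiredTiger", {}).get("cache", {})
dirty = cache.get("tracked dirty bytes in the cache", 0)
configured = cache.get("maximum bytes configured", 0)

if configured:
    # Compare against the 5% / 20% eviction thresholds mentioned above.
    print(f"dirty cache: {dirty / configured:.1%} of the configured cache")

# Number of checkpoints completed so far (field location varies by version).
txn = status.get("wiredTiger", {}).get("transaction", {})
print("checkpoints:", txn.get("transaction checkpoints", "n/a"))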

So, what happens when there’s an unexpected crash or hardware failure? Let’s take a look at the process when we start mongod.

  1. mongod attempts to go into crash recovery and looks for anything present in the journal files.

In the mongod log files, the trimmed output would look something like the one below.

{"t":{"$date":"2023-03-27T11:22:48.360+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1679916168:360670][9811:0x7f43b45d
7bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 15 through 16"}}
7bc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (1679916159, 1)"}}
{"t":{"$date":"2023-03-27T11:22:48.688+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1679916168:688481][9811:0x7f43b45d
7bc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global oldest timestamp: (1679916154, 1)"}}
{"t":{"$date":"2023-03-27T11:22:48.695+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1679916168:695497][9811:0x7f43b45d
7bc0], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 10, snapshot max: 10 snapshot count: 0, oldest timestamp: (1679916154, 1) , meta checkpoint timest
amp: (1679916159, 1) base write gen: 11982970"}}
{"t":{"$date":"2023-03-27T11:22:48.705+00:00"},"s":"I",  "c":"RECOVERY", "id":23987,   "ctx":"initandlisten","msg":"WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"
t":1679916159,"i":1}}}}

  2. Identifies the last successful checkpoint from the data files and recovers the uncommitted dirty data from the journal files back into the WiredTiger cache. The same pages will then again be marked as dirty.

Output is trimmed to show relevant information only.

file:demo/collection/108-2625234990440311433.wt
access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=snappy………checkpoint=(WiredTigerCheckpoint.33=(addr="018081e49e1d91ae9a81e4b44eefcd9b81e4be132c8a808080e30d3fc0e30c4fc0",order=33,time=1679897278,size=819200,newest_start_durable_ts=7215101747935576783,oldest_start_ts=0,......,checkpoint_backup_info=,checkpoint_lsn=(4294967295,2147483647)

  3. The data in these dirty pages will then be ready to be flushed out during the next checkpoint to their respective data files on disk. This is handled by the “WiredTiger Block Manager.” Unwanted journal entries will then be cleaned up accordingly after the checkpoint completes.

Voila!! We now have a durable and consistent data state even after a crash.


Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

Download Percona Distribution for MongoDB Today!

Mar 28, 2023

What if the Backup Server Is Down and a Backup Is Needed? (Multi-repo Functionality of PgBackRest)


Our previous blogs discussed configuring and setting up backups using pgBackRest. To briefly recap, pgBackRest is an open source backup tool that takes full, incremental, and differential backups of PostgreSQL databases.

A repository is the location/path on a server or in the cloud where the actual copy of the backup resides. In this blog, we will specifically discuss one of the important features of pgBackRest called multiple repositories (or, in short, multi-repo). This feature helps keep redundant copies of backups at multiple locations, whether remotely on different servers or locally on the same server.

Let’s discuss a few scenarios one by one.

Scenarios:
1. Behavior of pgBackRest with single repo
2. Behavior of pgBackRest with multiple repos
2.1. Configuring asynchronous archiving (archive-async=y)
2.2. Taking backup locally with multiple repos
2.3. Taking backup locally and remotely on the cloud
2.4. Take backup locally and in multiple clouds

Pre-configured setup:
  • PostgreSQL installed and configured on the database host.
  • pgBackRest installed and configured on a dedicated backup host and on the database host.

Scenario – 1: Behavior of pgBackRest with single repo

By default, pgBackRest takes backups in a single repository or location, which can reside locally on the same server, remote server, or cloud.

Below is the configuration file that is normally used for taking backups with a single repo:

Backup Host:

[global]
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user=postgres
log-level-console=info
log-level-file=debug
start-fast=y

[pgstanza]
pg1-path=/var/lib/postgresql/15/main
pg1-host=18.210.15.186

DB Host:

[global]
repo1-host=172.31.54.194
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user=postgres
log-level-console=info
log-level-file=debug

[pgstanza]
pg1-path=/var/lib/postgresql/15/main

 

This configuration file contains a global section with common parameters like repo details, log details, etc. Even though there is a single repository, the parameter names start with “repo1” so that additional repositories can be accommodated. Here, the backup will be stored on the local backup host at the /var/lib/pgbackrest_repo1 path, which has already been created with the proper user (in this case, postgres) and permissions.

[pgstanza] is the name of the stanza for which backup is taken. For the sake of simplicity, we are considering backup for one DB Server only.

Let’s take the backup using the pgbackrest command:

On the Backup Host:

postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup
2023-03-23 04:19:14.059 P00 INFO: backup command begin 2.44: --exec-id=157866-5813ef0e --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo1-path=/var/lib/pgbackrest_repo1 --repo1-retention-full=2 --stanza=pgstanza --start-fast --type=full
2023-03-23 04:19:15.315 P00 INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
2023-03-23 04:19:15.822 P00 INFO: backup start archive = 000000010000000000000039, lsn = 0/39000028
2023-03-23 04:19:15.822 P00 INFO: check archive for prior segment 000000010000000000000038
2023-03-23 04:19:23.184 P00 INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
2023-03-23 04:19:23.386 P00 INFO: backup stop archive = 000000010000000000000039, lsn = 0/39000138
2023-03-23 04:19:23.394 P00 INFO: check archive for segment(s) 000000010000000000000039:000000010000000000000039
2023-03-23 04:19:23.711 P00 INFO: new backup label = 20230323-041915F
2023-03-23 04:19:23.790 P00 INFO: full backup size = 22.0MB, file total = 961
2023-03-23 04:19:23.790 P00 INFO: backup command end: completed successfully (9733ms)
2023-03-23 04:19:23.791 P00 INFO: expire command begin 2.44: --exec-id=157866-5813ef0e --log-level-console=info --log-level-file=debug --repo1-path=/var/lib/pgbackrest_repo1 --repo1-retention-full=2 --stanza=pgstanza
2023-03-23 04:19:23.792 P00 INFO: repo1: expire full backup 20230323-040330F
2023-03-23 04:19:23.806 P00 INFO: repo1: remove expired backup 20230323-040330F
2023-03-23 04:19:23.829 P00 INFO: repo1: 15-1 remove archive, start = 000000010000000000000035, stop = 000000010000000000000036
2023-03-23 04:19:23.830 P00 INFO: expire command end: completed successfully (39ms)

As shown below, the backup goes into the local repository directory, and inside that, we have two directories:

  • backup – contains the backups taken when a FULL/INCR/DIFF backup runs
  • archive – contains the WAL archives that help in PITR

postgres@ip-172-31-54-194:~$ cd /var/lib/pgbackrest_repo1
postgres@ip-172-31-54-194:/var/lib/pgbackrest_repo1$ ls -ltr
total 8
drwxr-x--- 3 postgres postgres 4096 Feb 15 13:16 archive
drwxr-x--- 3 postgres postgres 4096 Feb 15 13:16 backup
postgres@ip-172-31-54-194:/var/lib/pgbackrest_repo1$
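To verify from a script what actually landed in the repository, the info command can emit JSON (--output=json). The sketch below parses it with Python; the exact JSON field names can differ between pgBackRest versions, so they are looked up defensively.

import json
import subprocess

# Ask pgBackRest for a machine-readable summary of the stanza.
raw = subprocess.run(
    ["pgbackrest", "--stanza=pgstanza", "info", "--output=json"],
    capture_output=True, text=True, check=True,
).stdout

for stanza in json.loads(raw):
    status = stanza.get("status", {}).get("message", "unknown")
    print("stanza:", stanza.get("name"), "status:", status)
    for backup in stanza.get("backup", []):
        print("  label:", backup.get("label"), "type:", backup.get("type"))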

Scenario – 2: Behavior of pgBackRest with multiple repos

The multiple repository (or multi-repo) functionality of pgBackRest supports different combinations of storing redundant backup copies. In this section, we discuss a few of the most useful combinations where multiple backup copies can be stored.

2.1 Configuring asynchronous archiving (archive-async=y)
WAL file redundancy across two different repos is possible by using asynchronous archiving.
The archive-async=y parameter allows the archive-push and archive-get commands to work asynchronously.

When this parameter is enabled, pgBackRest will copy the WAL files into both repos. The example below shows pgBackRest archiving the WAL files into two different repos.

--repo1-path=/var/lib/pgbackrest_repo1
--repo2-path=/var/lib/pgbackrest_repo2

If the first repo is unavailable but the second repo is reachable, pgBackRest will keep copying/archiving the WAL files to the second repo. However, the WAL files that could not be archived to the first repo will accumulate in the pg_wal directory.

The spool path (spool-path) parameter is required when asynchronous archiving is enabled; the current WAL archiving status is stored there.

We can track the activity of the asynchronous process in the [stanza]-archive-push-async.log file.
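For completeness, here is a minimal sketch of the DB host pieces that asynchronous archiving relies on; the paths and stanza name simply mirror the examples in this post, so adjust them to your environment:

# postgresql.conf – hand completed WAL segments to pgBackRest:
#   archive_mode = on
#   archive_command = 'pgbackrest --stanza=pgstanza archive-push %p'
# /etc/pgbackrest.conf, [global] section – enable async archiving and the spool dir:
#   archive-async=y
#   spool-path=/var/spool/pgbackrest
# the spool directory must exist and be writable by the postgres user:
sudo install -d -o postgres -g postgres /var/spool/pgbackrest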

On the Backup Host:
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info check
2023-03-23 04:35:59.074 P00 INFO: check command begin 2.44: --exec-id=158656-d4a8f71e --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --stanza=pgstanza
2023-03-23 04:35:59.920 P00 INFO: check repo1 configuration (primary)
2023-03-23 04:35:59.921 P00 INFO: check repo2 configuration (primary)
2023-03-23 04:36:00.124 P00 INFO: check repo1 archive for WAL (primary)
2023-03-23 04:36:01.327 P00 INFO: WAL segment 00000001000000000000003C successfully archived to '/var/lib/pgbackrest_repo1/archive/pgstanza/15-1/0000000100000000/00000001000000000000003C-6aa2de4dca50db51592d139010bdfb7a8c2c45ce.gz' on repo1
2023-03-23 04:36:01.328 P00 INFO: check repo2 archive for WAL (primary)
2023-03-23 04:36:01.328 P00 INFO: WAL segment 00000001000000000000003C successfully archived to '/var/lib/pgbackrest_repo2/archive/pgstanza/15-1/0000000100000000/00000001000000000000003C-6aa2de4dca50db51592d139010bdfb7a8c2c45ce.gz' on repo2
2023-03-23 04:36:01.430 P00 INFO: check command end: completed successfully (2358ms)
postgres@ip-172-31-54-194:~$

2.2 Taking backups locally with multiple repos:
In this example, we create two repositories on the local backup server itself, namely pgbackrest_repo1 and pgbackrest_repo2. The two repositories can also be configured on different storage; if one storage device is unavailable, the other still holds a backup, which can be helpful.

Here is an example of pgbackrest.conf with two local repositories, on the backup host and on the DB host:

Backup Host:

[global]
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user = postgres
repo2-path=/var/lib/pgbackrest_repo2
repo2-retention-full=2
repo2-host-user = postgres
archive-async=y
log-level-console=info
log-level-file=debug
start-fast=y
[pgstanza]
pg1-path=/var/lib/postgresql/15/main
pg1-host=18.210.15.186

DB Host:

[global]
repo1-host=172.31.54.194
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user = postgres
repo2-host=172.31.54.194
repo2-path=/var/lib/pgbackrest_repo2
repo2-retention-full=2
repo2-host-user = postgres
archive-async=y
spool-path=/var/spool/pgbackrest
log-level-console=info
log-level-file=debug
[pgstanza]
pg1-path=/var/lib/postgresql/15/main

Let’s rename pgbackrest_repo1 so that it becomes inaccessible, and then let’s try to take the backup:

On the Backup Host:

ubuntu@ip-172-31-54-194:~$ sudo mv /var/lib/pgbackrest_repo1 /var/lib/pgbackrest_repo1_bkp
ubuntu@ip-172-31-54-194:~$ sudo su - postgres
postgres@ip-172-31-54-194:~$
postgres@ip-172-31-54-194:~$
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup
2023-02-10 12:40:27.160 P00 INFO: backup command begin 2.44: --exec-id=23422-c65cc1d9 --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --repo1-retention-full=2 --repo2-retention-full=2 --stanza=pgstanza --start-fast --type=full
2023-02-10 12:40:27.161 P00 INFO: repo option not specified, defaulting to repo1
ERROR: [055]: unable to load info file '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info' or '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info.copy':
FileMissingError: unable to open missing file '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info' for read
FileMissingError: unable to open missing file '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info.copy' for read
HINT: backup.info cannot be opened and is required to perform a backup.
HINT: has a stanza-create been performed?
2023-02-10 12:40:27.162 P00 INFO: backup command end: aborted with exception [055]
postgres@ip-172-31-54-194:~$

As shown above, the backup cannot be taken, and it fails with the error message above, which is expected.

Let's try to take a backup in repo2 (using --repo=2) and check whether it allows us to do so.

On the Backup Host:

postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup --repo=2
2023-02-10 12:40:34.605 P00 INFO: backup command begin 2.44: --exec-id=23423-e840ad8d --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo=2 --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --repo1-retention-full=2 --repo2-retention-full=2 --stanza=pgstanza --start-fast --type=full
2023-02-10 12:40:35.949 P00 INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
2023-02-10 12:40:36.456 P00 INFO: backup start archive = 000000010000000000000028, lsn = 0/28000028
2023-02-10 12:40:36.456 P00 INFO: check archive for prior segment 000000010000000000000027
2023-02-10 12:40:43.993 P00 INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
2023-02-10 12:40:44.195 P00 INFO: backup stop archive = 000000010000000000000028, lsn = 0/28000138
2023-02-10 12:40:44.201 P00 INFO: check archive for segment(s) 000000010000000000000028:000000010000000000000028
2023-02-10 12:40:45.521 P00 INFO: new backup label = 20230210-124035F
2023-02-10 12:40:45.579 P00 INFO: full backup size = 22.0MB, file total = 961
2023-02-10 12:40:45.580 P00 INFO: backup command end: completed successfully (10978ms)
2023-02-10 12:40:45.580 P00 INFO: expire command begin 2.44: --exec-id=23423-e840ad8d --log-level-console=info --log-level-file=debug --repo=2 --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --repo1-retention-full=2 --repo2-retention-full=2 --stanza=pgstanza
2023-02-10 12:40:45.592 P00 INFO: repo2: 15-1 remove archive, start = 000000010000000000000020, stop = 000000010000000000000025
2023-02-10 12:40:45.592 P00 INFO: expire command end: completed successfully (12ms)
postgres@ip-172-31-54-194:~$

Excellent… the backup was successful for repo2. Now, let’s check the info and see what it says for repo1.

On the Backup Host:

postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info info
stanza: pgstanza
status: mixed
repo1: error (missing stanza path)
repo2: ok
cipher: none
db (current)
wal archive min/max (15): 000000010000000000000026/000000010000000000000028
full backup: 20230210-123819F
timestamp start/stop: 2023-02-10 12:38:19 / 2023-02-10 12:38:28
wal start/stop: 000000010000000000000026 / 000000010000000000000026
database size: 22.0MB, database backup size: 22.0MB
repo2: backup set size: 2.9MB, backup size: 2.9MB
full backup: 20230210-124035F
timestamp start/stop: 2023-02-10 12:40:35 / 2023-02-10 12:40:44
wal start/stop: 000000010000000000000028 / 000000010000000000000028
database size: 22.0MB, database backup size: 22.0MB
repo2: backup set size: 2.9MB, backup size: 2.9MB
postgres@ip-172-31-54-194:~$
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info info --repo=1
stanza: pgstanza
status: error (missing stanza path)
postgres@ip-172-31-54-194:~$
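To bring repo1 back to a healthy state in this test, it is enough to undo the rename and run a check again; a quick sketch (not part of the captured output above):

sudo mv /var/lib/pgbackrest_repo1_bkp /var/lib/pgbackrest_repo1
pgbackrest --stanza=pgstanza --log-level-console=info check
pgbackrest --stanza=pgstanza info --repo=1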

2.3 Taking backups locally and remotely in the cloud:
Let's consider a scenario where one repo is available locally on the dedicated backup server and another repo is in the cloud. The advantage is that if either the local repo or the cloud repo becomes unavailable, the other one can still be used. This combination lets us take advantage of both the cloud and on-prem machines.

Let’s check the main configuration needed in pgbackrest.conf:

Backup Host:

[global]
## Repo1: Local
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user=postgres
## Repo2: AWS S3
repo2-type=s3
repo2-path=/pgbackrest_repo2
repo2-retention-full=2
repo2-host-user=postgres
repo2-s3-bucket=s3bucket
repo2-s3-endpoint=s3.us-east-1.amazonaws.com
repo2-s3-key=accessKey2
repo2-s3-key-secret=verySecretKey2
repo2-s3-region=us-east-1
archive-async=y
log-level-console=info
log-level-file=debug
start-fast=y
[pgstanza]
pg1-path=/var/lib/postgresql/15/main
pg1-host=18.210.15.186

DB Host:

[global]
## Repo1: Local
repo1-host=172.31.54.194
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user=postgres
## Repo2: AWS S3
repo2-type=s3
repo2-path=/pgbackrest_repo2
repo2-retention-full=2
repo2-host-user=postgres
repo2-s3-bucket=s3bucket
repo2-s3-endpoint=s3.us-east-1.amazonaws.com
repo2-s3-key=accessKey2
repo2-s3-key-secret=verySecretKey2
repo2-s3-region=us-east-1
archive-async=y
spool-path=/var/spool/pgbackrest
log-level-console=info
log-level-file=debug
[pgstanza]
pg1-path=/var/lib/postgresql/15/main

 

As we can see, the repo1-related options store the backup in the local repository on the dedicated backup host at /var/lib/pgbackrest_repo1. A few important options for repo2, which lives in AWS S3: repo2-type is s3, indicating AWS S3 (it could be azure for Azure or gcs for Google Cloud), while the repo2-s3-bucket, repo2-s3-endpoint, repo2-s3-key, repo2-s3-key-secret, and repo2-s3-region attributes vary from cloud to cloud. A bucket (or the required repository) with the proper user and permissions must be created before configuring pgBackRest backups. More information can be found in the pgBackRest User Guide.
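Note that the S3 bucket referenced above has to exist before the stanza is created. If the AWS CLI is available, it could be created with something along these lines (bucket name and region are just the placeholders used in the configuration above):

aws s3 mb s3://s3bucket --region us-east-1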

2.4 Taking backups locally and in multiple clouds:
Another very useful scenario is creating repositories in multiple clouds plus one locally on the dedicated backup host. Even if one cloud provider is unavailable, a backup is still available from another cloud or from the local repository. In this case, the configuration uses repo1-type, repo2-type, repo3-type, and so on.

PostgreSQL backup locally and in multiple clouds

In the above diagram, four repositories have been created: one repository is local, and the others are in different clouds, namely AWS S3, Azure, and Google Cloud. In this case, the configuration on the backup host is as follows.

On the Backup Host:

postgres@ip-172-31-54-194:~$ cat /etc/pgbackrest.conf
[global]
## Repo1: Local
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user = postgres
## Repo2: AWS S3
repo2-type=s3
repo2-path=/pgbackrest_repo2
repo2-retention-full=2
repo2-host-user = postgres
repo2-s3-bucket=s3bucket
repo2-s3-endpoint=s3.us-east-1.amazonaws.com
repo2-s3-key=accessKey2
repo2-s3-key-secret=verySecretKey2
repo2-s3-region=us-east-1
## Repo3: Azure
repo3-type=azure
repo3-path=/pgbackrest_repo3
repo3-retention-full=2
repo3-azure-account=pgbackrest
repo3-azure-container=pgbackrest-container
repo3-azure-key=accessKey3
## Repo4: Google Cloud
repo4-type=gcs
repo4-path=/pgbackrest_repo4
repo4-retention-full=2
repo4-gcs-bucket=pgbackrest-bucket
repo4-gcs-key=/etc/pgbackrest/gcs-key.json
archive-async=y
log-level-console=info
log-level-file=debug
start-fast=y
[pgstanza]
pg1-path=/var/lib/postgresql/15/main
pg1-host=18.210.15.186

The DB host configuration is set up the same way as in the earlier sections, with the matching repository details for each cloud added.
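For reference, the DB host side could look roughly like the sketch below, combining the remote repo1 options from section 2.2 with the same cloud options used on the backup host (this is an illustration, not the exact file used in this test):

[global]
## Repo1: Local repository on the backup host
repo1-host=172.31.54.194
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user=postgres
## Repo2: AWS S3
repo2-type=s3
repo2-path=/pgbackrest_repo2
repo2-retention-full=2
repo2-s3-bucket=s3bucket
repo2-s3-endpoint=s3.us-east-1.amazonaws.com
repo2-s3-key=accessKey2
repo2-s3-key-secret=verySecretKey2
repo2-s3-region=us-east-1
## Repo3: Azure
repo3-type=azure
repo3-path=/pgbackrest_repo3
repo3-retention-full=2
repo3-azure-account=pgbackrest
repo3-azure-container=pgbackrest-container
repo3-azure-key=accessKey3
## Repo4: Google Cloud
repo4-type=gcs
repo4-path=/pgbackrest_repo4
repo4-retention-full=2
repo4-gcs-bucket=pgbackrest-bucket
repo4-gcs-key=/etc/pgbackrest/gcs-key.json
archive-async=y
spool-path=/var/spool/pgbackrest
log-level-console=info
log-level-file=debug
[pgstanza]
pg1-path=/var/lib/postgresql/15/main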

Conclusion

To conclude, the major advantage of pgBackRest's multi-repo functionality is that redundant backup copies can be kept. With archive-async=y, archives are pushed to all configured repositories, and if the default repo is unavailable, the second repository still receives the archive files automatically.

The only limitation of this feature is that, by default, the backup does not go to repo2 even though it is configured; the repository number has to be specified explicitly when a backup needs to be taken in repo2. Likewise, the backup command has to be run once per repository, mentioning the repo number each time, so that backups exist in repositories other than the default one, e.g., pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup --repo=2.
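Until that changes, a simple workaround in a scheduled backup script is to loop over the configured repositories explicitly; a sketch, assuming the two-repo setup above (extend the list for more repos):

for repo in 1 2; do
    pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup --repo=${repo}
done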

Despite this limitation, the multi-repo functionality makes it possible to keep backups in a secondary repository, so a backup is still available even if the dedicated backup server is unavailable.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Mar
27
2023
--

Talking Drupal #392 – Public Speaking

Today we are talking about Public Speaking with Marjorie Freeman & AmyJune Hineline.

For show notes visit: www.talkingDrupal.com/392

Topics

  • Why Public Speaking is important
  • Why people are afraid of Public Speaking
  • Tips to overcome
  • Imposter Syndrome
  • What makes a good experience
  • Never do’s
  • Favorite Public Speakers
  • Alternatives
  • Podcasting as Public Speaking
  • Writing Workshop
  • How to get started

Resources

Guests

Marjorie Freeman – @the_mjfreeman AmyJune Hineline – @volkswagenchick

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan John Picozzi – www.epam.com @johnpicozzi Kat Shaw – drupal.org/u/katannshaw @katannshaw

MOTW Correspondent

Stephen Cross – @stephencross
Module of the week: Environment Indicator – adds a visual indicator to identify which admin environment is being accessed.

Mar
27
2023
--

How To Fix Oplog Restore Error: E11000 Duplicate Key Error Collection Using Percona Backup for MongoDB

error E11000 Percona Backup for MongoDB

'E11000 duplicate key error' is an error that you might have encountered during the restore process. In this blog, we will talk about the scenario in which you might face this error when restoring the Oplog for PITR (point-in-time recovery).

You might wonder why this error occurs during PITR at all, since the operations in the Oplog are idempotent, meaning they always result in the same change to the database no matter how many times they are applied. Let's see in which scenario you might face this error while applying the Oplog.

I created a collection named book with four documents and a unique compound index. As per the application logic, a document is first inserted, then updated and deleted, and later a new document is re-inserted with the same values in the keys covered by the unique index.

Index:

{
                "v" : 2,
                "unique" : true,
                "key" : {
                        "number" : 1,
                        "author" : 1
                },
                "name" : "number_1_author_1"
        }
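For reference, an index with this definition could be created with a command like the following (a sketch; the collection lives in the london database used throughout this post):

mongo london --quiet --eval 'db.book.createIndex({ number: 1, author: 1 }, { unique: true })'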

So the application logic inserts, updates, deletes, and then inserts again a document with the same values in the number and author keys that the unique index covers. Below, we have already inserted four docs, and next we will update one of them.

  1. First insert:

    rs1:PRIMARY> db.book.find()
    { "_id" : ObjectId("641c11bfd0495f80ac5e610c"), "number" : 1, "title" : "harry potter", "author" : "shakespear" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610d"), "number" : 2, "title" : "game of thrones", "author" : "chetan" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610e"), "number" : 3, "title" : "2 states", "author" : "Peter" }
    { "_id" : ObjectId("641c11c1d0495f80ac5e610f"), "number" : 4, "title" : "a time to kill", "author" : "Graham" }
    rs1:PRIMARY> Date()
    Thu Mar 23 2023 08:45:55 GMT+0000 (UTC)
    rs1:PRIMARY>

    Corresponding ops in Oplog:

    rs1:PRIMARY> db.oplog.rs.find({ns:"london.book"})
    { "op" : "i", "ns" : "london.book", "ui" : UUID("9cb7ae0a-6974-4363-ae24-0de30f0cd915"), "o" : { "_id" : ObjectId("641c11bfd0495f80ac5e610c"), "number" : 1, "title" : "harry potter", "author" : "shakespear" }, "ts" : Timestamp(1679561151, 1), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T08:45:51.775Z") }
    { "op" : "i", "ns" : "london.book", "ui" : UUID("9cb7ae0a-6974-4363-ae24-0de30f0cd915"), "o" : { "_id" : ObjectId("641c11bfd0495f80ac5e610d"), "number" : 2, "title" : "game of thrones", "author" : "chetan" }, "ts" : Timestamp(1679561151, 2), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T08:45:51.778Z") }
    { "op" : "i", "ns" : "london.book", "ui" : UUID("9cb7ae0a-6974-4363-ae24-0de30f0cd915"), "o" : { "_id" : ObjectId("641c11bfd0495f80ac5e610e"), "number" : 3, "title" : "2 states", "author" : "Peter" }, "ts" : Timestamp(1679561151, 3), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T08:45:51.781Z") }
    { "op" : "i", "ns" : "london.book", "ui" : UUID("9cb7ae0a-6974-4363-ae24-0de30f0cd915"), "o" : { "_id" : ObjectId("641c11c1d0495f80ac5e610f"), "number" : 4, "title" : "a time to kill", "author" : "Graham" }, "ts" : Timestamp(1679561153, 1), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T08:45:53.302Z") }
    rs1:PRIMARY>
  2. After update:

    rs1:PRIMARY> db.book.update({"number":4},{$set: {title: "Minion"}})
    WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
    rs1:PRIMARY> Date()
    Thu Mar 23 2023 08:47:29 GMT+0000 (UTC)
    rs1:PRIMARY> db.book.find()
    { "_id" : ObjectId("641c11bfd0495f80ac5e610c"), "number" : 1, "title" : "harry potter", "author" : "shakespear" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610d"), "number" : 2, "title" : "game of thrones", "author" : "chetan" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610e"), "number" : 3, "title" : "2 states", "author" : "Peter" }
    { "_id" : ObjectId("641c11c1d0495f80ac5e610f"), "number" : 4, "title" : "Minion", "author" : "Graham" }
    rs1:PRIMARY>

    Corresponding op in Oplog:

    rs1:PRIMARY> db.oplog.rs.find({ns:"london.book"}).sort({$natural:-1}).limit(1)
    { "op" : "u", "ns" : "london.book", "ui" : UUID("9cb7ae0a-6974-4363-ae24-0de30f0cd915"), "o" : { "$v" : 1, "$set" : { "title" : "Minion" } }, "o2" : { "_id" : ObjectId("641c11c1d0495f80ac5e610f") }, "ts" : Timestamp(1679561243, 1), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T08:47:23.448Z") }
    rs1:PRIMARY>
  3. After remove:

    rs1:PRIMARY> db.book.remove({number: 4})
    WriteResult({ "nRemoved" : 1 })
    rs1:PRIMARY> Date()
    Thu Mar 23 2023 08:48:37 GMT+0000 (UTC)
    rs1:PRIMARY> db.book.find()
    { "_id" : ObjectId("641c11bfd0495f80ac5e610c"), "number" : 1, "title" : "harry potter", "author" : "shakespear" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610d"), "number" : 2, "title" : "game of thrones", "author" : "chetan" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610e"), "number" : 3, "title" : "2 states", "author" : "Peter" }
    rs1:PRIMARY>

    Corresponding op in Oplog:

    rs1:PRIMARY> db.oplog.rs.find({ns:"london.book"}).sort({$natural:-1}).limit(1)
    { "op" : "d", "ns" : "london.book", "ui" : UUID("9cb7ae0a-6974-4363-ae24-0de30f0cd915"), "o" : { "_id" : ObjectId("641c11c1d0495f80ac5e610f") }, "ts" : Timestamp(1679561315, 1), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T08:48:35.203Z") }
    rs1:PRIMARY>
  4. Below we will insert a new document:

    rs1:PRIMARY> db.book.insert({"number" : 4, "title" : "Stranger things", "author" : "Graham" })
    WriteResult({ "nInserted" : 1 })
    rs1:PRIMARY> Date()
    Thu Mar 23 2023 08:50:46 GMT+0000 (UTC)
    rs1:PRIMARY> db.book.find()
    { "_id" : ObjectId("641c11bfd0495f80ac5e610c"), "number" : 1, "title" : "harry potter", "author" : "shakespear" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610d"), "number" : 2, "title" : "game of thrones", "author" : "chetan" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610e"), "number" : 3, "title" : "2 states", "author" : "Peter" }
    { "_id" : ObjectId("641c12e4d0495f80ac5e6111"), "number" : 4, "title" : "Stranger things", "author" : "Graham" }
    rs1:PRIMARY>

    Corresponding op in Oplog:

    rs1:PRIMARY> db.oplog.rs.find({ns:"london.book"}).sort({$natural:-1}).limit(1)
    { "op" : "i", "ns" : "london.book", "ui" : UUID("9cb7ae0a-6974-4363-ae24-0de30f0cd915"), "o" : { "_id" : ObjectId("641c12e4d0495f80ac5e6111"), "number" : 4, "title" : "Stranger things", "author" : "Graham" }, "ts" : Timestamp(1679561444, 1), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T08:50:44.861Z") }
    rs1:PRIMARY>
  5. Take mongodump of database london:

    # mongodump
    2023-03-23T08:52:01.032+0000    writing admin.system.version to dump/admin/system.version.bson
    2023-03-23T08:52:01.033+0000    done dumping admin.system.version (1 document)
    2023-03-23T08:52:01.033+0000    writing london.book to dump/london/book.bson
    2023-03-23T08:52:01.034+0000    done dumping london.book (4 documents)
    #
  6. We will again insert a new doc:

    rs1:PRIMARY> db.book.insert({"number" : 5, "title" : "Stranger things", "author" : "Graham" })
    WriteResult({ "nInserted" : 1 })
    rs1:PRIMARY> Date()
    Thu Mar 23 2023 08:52:37 GMT+0000 (UTC)
    rs1:PRIMARY> db.book.find()
    { "_id" : ObjectId("641c11bfd0495f80ac5e610c"), "number" : 1, "title" : "harry potter", "author" : "shakespear" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610d"), "number" : 2, "title" : "game of thrones", "author" : "chetan" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610e"), "number" : 3, "title" : "2 states", "author" : "Peter" }
    { "_id" : ObjectId("641c12e4d0495f80ac5e6111"), "number" : 4, "title" : "Stranger things", "author" : "Graham" }
    { "_id" : ObjectId("641c1352673573868f17cacf"), "number" : 5, "title" : "Stranger things", "author" : "Graham" }
    rs1:PRIMARY>

    Corresponding op in Oplog:

    rs1:PRIMARY> db.oplog.rs.find({ns:"london.book"}).sort({$natural:-1}).limit(1)
    { "op" : "i", "ns" : "london.book", "ui" : UUID("9cb7ae0a-6974-4363-ae24-0de30f0cd915"), "o" : { "_id" : ObjectId("641c1352673573868f17cacf"), "number" : 5, "title" : "Stranger things", "author" : "Graham" }, "ts" : Timestamp(1679561554, 1), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T08:52:34.739Z") }
    rs1:PRIMARY>
  7. Take incremental Oplog backup.

    After first Oplog dump:

    # mongodump --db=local --collection=oplog.rs --query '{"ts":{"$gte":{"$timestamp": {"i":1, "t": 1679561120}}, "$lte":{"$timestamp": {"i":2, "t": 1679561151}}}}'
    2023-03-23T08:59:29.667+0000    writing local.oplog.rs to dump/local/oplog.rs.bson
    2023-03-23T08:59:29.668+0000    done dumping local.oplog.rs (7 documents)
    #
    # cd dump/local/
    # ls -ltrh
    total 8.0K
    -rw-r--r-- 1 root root  193 Mar 23 08:59 oplog.rs.metadata.json
    -rw-r--r-- 1 root root 1.1K Mar 23 08:59 oplog.rs.bson
    # mv oplog.rs.metadata.json oplog_1.rs.metadata.json
    # mv oplog.rs.bson oplog_1.rs.bson
    # ls -ltr
    total 8
    -rw-r--r-- 1 root root  193 Mar 23 08:59 oplog_1.rs.metadata.json
    -rw-r--r-- 1 root root 1.1K Mar 23 08:59 oplog_1.rs.bson
    #
    # bsondump oplog_1.rs.bson | tail -n 1
    
    2023-03-23T09:01:50.590+0000    7 objects found
    {"op":"i","ns":"london.book","ui":{"$binary":{"base64":"nLeuCml0Q2OuJA3jDwzZFQ==","subType":"04"}},"o":{"_id":{"$oid":"641c11bfd0495f80ac5e610d"},"number":{"$numberDouble":"2.0"},"title":"game of thrones","author":"chetan"},"ts":{"$timestamp":{"t":1679561151,"i":2}},"t":{"$numberLong":"1"},"v":{"$numberLong":"2"},"wall":{"$date":{"$numberLong":"1679561151778"}}}
    #

    Above is the last document of the first Oplog backup, i.e., till {“$timestamp”:{“t”:1679561151,”i”:2}}.

    After second Oplog dump (incremental):

    # mongodump --db=local --collection=oplog.rs --query '{"ts":{"$gt":{"$timestamp": {"i":2, "t": 1679561151}}, "$lte":{"$timestamp": {"i":1, "t": 1679562180}}}}'
    2023-03-23T09:03:27.070+0000    writing local.oplog.rs to dump/local/oplog.rs.bson
    2023-03-23T09:03:27.071+0000    done dumping local.oplog.rs (107 documents)
    #
    # ls -ltrh
    total 24K
    -rw-r--r-- 1 root root  193 Mar 23 08:59 oplog_1.rs.metadata.json
    -rw-r--r-- 1 root root 1.1K Mar 23 08:59 oplog_1.rs.bson
    -rw-r--r-- 1 root root  193 Mar 23 09:03 oplog.rs.metadata.json
    -rw-r--r-- 1 root root  12K Mar 23 09:03 oplog.rs.bson
    #
    # bsondump oplog.rs.bson | head -n 1
    2023-03-23T09:04:28.976+0000    107 objects found
    {"op":"i","ns":"london.book","ui":{"$binary":{"base64":"nLeuCml0Q2OuJA3jDwzZFQ==","subType":"04"}},"o":{"_id":{"$oid":"641c11bfd0495f80ac5e610e"},"number":{"$numberDouble":"3.0"},"title":"2 states","author":"Peter"},"ts":{"$timestamp":{"t":1679561151,"i":3}},"t":{"$numberLong":"1"},"v":{"$numberLong":"2"},"wall":{"$date":{"$numberLong":"1679561151781"}}}
    #

    Above is the starting document of the second/incremental Oplog backup, i.e., {"$timestamp":{"t":1679561151,"i":3}}.

     

  8. Drop database:

    rs1:PRIMARY> db.dropDatabase()
    {
            "dropped" : "london",
            "ok" : 1,
            "$clusterTime" : {
                    "clusterTime" : Timestamp(1679562380, 2),
                    "signature" : {
                            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                            "keyId" : NumberLong(0)
                    }
            },
            "operationTime" : Timestamp(1679562380, 2)
    }
    rs1:PRIMARY> Date()
    Thu Mar 23 2023 09:06:37 GMT+0000 (UTC)
    rs1:PRIMARY>

    Corresponding op in Oplog:

    rs1:PRIMARY> db.oplog.rs.find({ns:"london.$cmd"}).sort({$natural:-1}).limit(1)
    { "op" : "c", "ns" : "london.$cmd", "o" : { "dropDatabase" : 1 }, "ts" : Timestamp(1679562380, 2), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T09:06:20.121Z") }
    rs1:PRIMARY>

  9. First, we will restore the database from the dump, which we took in step five:

    # mongorestore dump/london/book.bson
    2023-03-23T09:09:05.433+0000    checking for collection data in dump/london/book.bson
    2023-03-23T09:09:05.433+0000    reading metadata for london.book from dump/london/book.metadata.json
    2023-03-23T09:09:05.448+0000    restoring london.book from dump/london/book.bson
    2023-03-23T09:09:05.490+0000    finished restoring london.book (4 documents, 0 failures)
    2023-03-23T09:09:05.490+0000    restoring indexes for collection london.book from metadata
    2023-03-23T09:09:05.490+0000    index: &idx.IndexDocument{Options:primitive.M{"name":"number_1_author_1", "unique":true, "v":2}, Key:primitive.D{primitive.E{Key:"number", Value:1}, primitive.E{Key:"author", Value:1}}, PartialFilterExpression:primitive.D(nil)}
    2023-03-23T09:09:05.526+0000    4 document(s) restored successfully. 0 document(s) failed to restore.
    #

    The above-restored documents match until step five before taking the dump of london database:

    rs1:PRIMARY> db.book.find()
    { "_id" : ObjectId("641c11bfd0495f80ac5e610c"), "number" : 1, "title" : "harry potter", "author" : "shakespear" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610d"), "number" : 2, "title" : "game of thrones", "author" : "chetan" }
    { "_id" : ObjectId("641c11bfd0495f80ac5e610e"), "number" : 3, "title" : "2 states", "author" : "Peter" }
    { "_id" : ObjectId("641c12e4d0495f80ac5e6111"), "number" : 4, "title" : "Stranger things", "author" : "Graham" }
    rs1:PRIMARY>

    Corresponding ops in Oplog after restore from dump:

    { "lsid" : { "id" : UUID("ce17fa90-9f06-4710-9b3b-3f4e9484e213"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "txnNumber" : NumberLong(1), "op" : "i", "ns" : "london.book", "ui" : UUID("9e30c316-b21a-4f43-bbcc-d80593001391"), "o" : { "_id" : ObjectId("641c12e4d0495f80ac5e6111"), "number" : 4, "title" : "Stranger things", "author" : "Graham" }, "ts" : Timestamp(1679562545, 5), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T09:09:05.451Z"), "stmtId" : 3, "prevOpTime" : { "ts" : Timestamp(1679562545, 4), "t" : NumberLong(1) } }
    
    { "lsid" : { "id" : UUID("ce17fa90-9f06-4710-9b3b-3f4e9484e213"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "txnNumber" : NumberLong(1), "op" : "i", "ns" : "london.book", "ui" : UUID("9e30c316-b21a-4f43-bbcc-d80593001391"), "o" : { "_id" : ObjectId("641c11bfd0495f80ac5e610e"), "number" : 3, "title" : "2 states", "author" : "Peter" }, "ts" : Timestamp(1679562545, 4), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T09:09:05.451Z"), "stmtId" : 2, "prevOpTime" : { "ts" : Timestamp(1679562545, 3), "t" : NumberLong(1) } }
    
    { "lsid" : { "id" : UUID("ce17fa90-9f06-4710-9b3b-3f4e9484e213"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "txnNumber" : NumberLong(1), "op" : "i", "ns" : "london.book", "ui" : UUID("9e30c316-b21a-4f43-bbcc-d80593001391"), "o" : { "_id" : ObjectId("641c11bfd0495f80ac5e610d"), "number" : 2, "title" : "game of thrones", "author" : "chetan" }, "ts" : Timestamp(1679562545, 3), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T09:09:05.451Z"), "stmtId" : 1, "prevOpTime" : { "ts" : Timestamp(1679562545, 2), "t" : NumberLong(1) } }
    
    { "lsid" : { "id" : UUID("ce17fa90-9f06-4710-9b3b-3f4e9484e213"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "txnNumber" : NumberLong(1), "op" : "i", "ns" : "london.book", "ui" : UUID("9e30c316-b21a-4f43-bbcc-d80593001391"), "o" : { "_id" : ObjectId("641c11bfd0495f80ac5e610c"), "number" : 1, "title" : "harry potter", "author" : "shakespear" }, "ts" : Timestamp(1679562545, 2), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T09:09:05.451Z"), "stmtId" : 0, "prevOpTime" : { "ts" : Timestamp(0, 0), "t" : NumberLong(-1) } }

  10. Now we check the Oplog backups to see up to which document the data has been recovered and from which Oplog file we need to apply ops. The documents covered by the first Oplog backup were already restored in step nine. To verify, below is the last op in the first Oplog backup:
    # bsondump oplog_1.rs.bson | tail -n 1
    2023-03-23T09:11:40.539+0000    7 objects found
    {"op":"i","ns":"london.book","ui":{"$binary":{"base64":"nLeuCml0Q2OuJA3jDwzZFQ==","subType":"04"}},"o":{"_id":{"$oid":"641c11bfd0495f80ac5e610d"},"number":{"$numberDouble":"2.0"},"title":"game of thrones","author":"chetan"},"ts":{"$timestamp":{"t":1679561151,"i":2}},"t":{"$numberLong":"1"},"v":{"$numberLong":"2"},"wall":{"$date":{"$numberLong":"1679561151778"}}}
    #

    Now we need to replay the second Oplog backup up to just before the drop command op (we already noted the time of the dropDatabase command in step eight) for PITR. The required ops are available below, but since we cannot split the BSON file by time or by op, we have to replay the full Oplog slice:

    # bsondump oplog.rs.bson | head -n 1
    2023-03-23T09:12:31.646+0000    107 objects found
    {"op":"i","ns":"london.book","ui":{"$binary":{"base64":"nLeuCml0Q2OuJA3jDwzZFQ==","subType":"04"}},"o":{"_id":{"$oid":"641c11bfd0495f80ac5e610e"},"number":{"$numberDouble":"3.0"},"title":"2 states","author":"Peter"},"ts":{"$timestamp":{"t":1679561151,"i":3}},"t":{"$numberLong":"1"},"v":{"$numberLong":"2"},"wall":{"$date":{"$numberLong":"1679561151781"}}}
    #
    #mongorestore --oplogFile=oplog.rs.bson --oplogReplay --oplogLimit 1679562370:1 /root/dump/local/
    2023-03-23T09:13:18.969+0000    preparing collections to restore from
    2023-03-23T09:13:18.969+0000    don't know what to do with file "/root/dump/local/oplog.rs.bson", skipping…
    2023-03-23T09:13:18.969+0000    don't know what to do with file "/root/dump/local/oplog.rs.metadata.json", skipping…
    2023-03-23T09:13:18.969+0000    don't know what to do with file "/root/dump/local/oplog_1.rs.bson", skipping…
    2023-03-23T09:13:18.969+0000    don't know what to do with file "/root/dump/local/oplog_1.rs.metadata.json", skipping…
    2023-03-23T09:13:18.969+0000    replaying oplog
    2023-03-23T09:13:18.978+0000    Failed: restore error: error applying oplog: applyOps: (DuplicateKey) E11000 duplicate key error collection: london.book index: number_1_author_1 dup key: { number: 4.0, author: "Graham" }
    2023-03-23T09:13:18.978+0000    0 document(s) restored successfully. 0 document(s) failed to restore.
    #

    We can see the Oplog replay failed due to a unique index constraint: a document associated with { number: 4.0, author: "Graham" } is already present in the database:

    { "_id" : ObjectId("641c12e4d0495f80ac5e6111"), "number" : 4, "title" : "Stranger things", "author" : "Graham" }

    Below are the ops from the incremental Oplog backup slice associated with { number: 4.0, author: "Graham" }. The first op below is an insert with a different _id ("o":{"_id":{"$oid":"641c11c1d0495f80ac5e610f"}}), which was the document inserted at the beginning. When the Oplog replay reaches this op, it finds that a document with a different _id already exists for { number: 4.0, author: "Graham" } and cannot apply the op because of the unique index violation. Thus the Oplog replay, and therefore the PITR, fails.

    # bsondump oplog.rs.bson | grep Graham 
    {"op":"i","ns":"london.book","ui":{"$binary":{"base64":"nLeuCml0Q2OuJA3jDwzZFQ==","subType":"04"}},"o":{"_id":{"$oid":"641c11c1d0495f80ac5e610f"},"number":{"$numberDouble":"4.0"},"title":"a time to kill","author":"Graham"},"ts":{"$timestamp":{"t":1679561153,"i":1}},"t":{"$numberLong":"1"},"v":{"$numberLong":"2"},"wall":{"$date":{"$numberLong":"1679561153302"}}}
    {"op":"i","ns":"london.book","ui":{"$binary":{"base64":"nLeuCml0Q2OuJA3jDwzZFQ==","subType":"04"}},"o":{"_id":{"$oid":"641c12e4d0495f80ac5e6111"},"number":{"$numberDouble":"4.0"},"title":"Stranger things","author":"Graham"},"ts":{"$timestamp":{"t":1679561444,"i":1}},"t":{"$numberLong":"1"},"v":{"$numberLong":"2"},"wall":{"$date":{"$numberLong":"1679561444861"}}}
    {"op":"i","ns":"london.book","ui":{"$binary":{"base64":"nLeuCml0Q2OuJA3jDwzZFQ==","subType":"04"}},"o":{"_id":{"$oid":"641c1352673573868f17cacf"},"number":{"$numberDouble":"5.0"},"title":"Stranger things","author":"Graham"},"ts":{"$timestamp":{"t":1679561554,"i":1}},"t":{"$numberLong":"1"},"v":{"$numberLong":"2"},"wall":{"$date":{"$numberLong":"1679561554739"}}}
    2023-03-23T09:14:34.118+0000    107 objects found
    #

     

There are two solutions for the above issue:

    1. Take the incremental Oplog backup so that it contains only the ops that come after the last op already reflected in the database backup (see the sketch after this list).
    2. Have Percona Backup for MongoDB (PBM) configured and let PBM handle all of the above manual steps (restoring the dump + applying the Oplog for PITR) automatically.
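As a sketch of the first option, using the timestamps from this walkthrough: the last london.book op already reflected in the mongodump taken in step five is the re-insert at Timestamp(1679561444, 1), so the incremental Oplog slice should start strictly after it:

# slice the oplog strictly *after* the last op covered by the base dump,
# so the replay never re-applies inserts that are already in the dump
mongodump --db=local --collection=oplog.rs \
  --query '{"ts":{"$gt":{"$timestamp": {"t": 1679561444, "i": 1}}, "$lte":{"$timestamp": {"t": 1679562180, "i": 1}}}}'

Replaying such a slice with --oplogReplay never re-applies the earlier insert of { number: 4.0, author: "Graham" } that the unique index rejects.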

To overcome the above issue, I configured PBM on the same replica set and took backups (both full and incremental Oplog). Here's how to install, set up, and configure PBM.

I then repeated the process from step one to step six with PBM in place; below are the corresponding ops in the Oplog:

rs1:PRIMARY> db.oplog.rs.find({ns:"london.book"})
{ "op" : "i", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c3b0a6eddc54ce4f75df8"), "number" : 1, "title" : "harry potter", "author" : "shakespear" }, "ts" : Timestamp(1679571722, 2), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T11:42:02.988Z") }
{ "op" : "i", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c3b0a6eddc54ce4f75df9"), "number" : 2, "title" : "game of thrones", "author" : "chetan" }, "ts" : Timestamp(1679571722, 3), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T11:42:02.990Z") }
{ "op" : "i", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c3b0a6eddc54ce4f75dfa"), "number" : 3, "title" : "2 states", "author" : "Peter" }, "ts" : Timestamp(1679571722, 4), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T11:42:02.992Z") }
{ "op" : "i", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c3b0b6eddc54ce4f75dfb"), "number" : 4, "title" : "a time to kill", "author" : "Graham" }, "ts" : Timestamp(1679571723, 1), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T11:42:03.840Z") }
{ "op" : "u", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "$v" : 1, "$set" : { "title" : "Minion" } }, "o2" : { "_id" : ObjectId("641c3b0b6eddc54ce4f75dfb") }, "ts" : Timestamp(1679571914, 1), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T11:45:14.026Z") }
{ "op" : "d", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c3b0b6eddc54ce4f75dfb") }, "ts" : Timestamp(1679573573, 1), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T12:12:53.817Z") }
{ "op" : "i", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c42519662ff210d35343e"), "number" : 4, "title" : "Stranger things", "author" : "Graham" }, "ts" : Timestamp(1679573585, 3), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T12:13:05.567Z") }
{ "op" : "i", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c433b474b755a080f10bb"), "number" : 5, "title" : "Stranger things", "author" : "Graham" }, "ts" : Timestamp(1679573819, 1), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T12:16:59.806Z") }

Below are the two full backups taken + incremental Oplog backup:

# pbm list
Backup snapshots:
  2023-03-23T11:52:05Z <logical> [restore_to_time: 2023-03-23T11:52:10Z]
  2023-03-23T12:14:04Z <logical> [restore_to_time: 2023-03-23T12:14:08Z]
PITR <off>:
  2023-03-23T11:52:11Z - 2023-03-23T13:06:45Z
#

Above, you can see the latest backup covers data up to 2023-03-23T12:14:08, and the incremental Oplog backup extends to 2023-03-23T13:06:45.
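For completeness, the continuous Oplog slicing that backs PBM's point-in-time recovery is switched on through PBM's configuration; a one-line sketch:

pbm config --set pitr.enabled=true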

Now we will drop the database:

rs1:PRIMARY> db.dropDatabase()
{
        "dropped" : "london",
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1679576445, 4),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1679576445, 4)
}
rs1:PRIMARY> Date()
Thu Mar 23 2023 13:00:51 GMT+0000 (UTC)

Now we will restore the database and perform PITR using PBM:

# pbm restore --time="2023-03-23T13:00:44"

Below are the logs for restore + PITR:

# pbm logs --tail=100 | grep -i restore
2023-03-23T13:07:37Z I [rs1/127.0.0.1:27017] got command pitrestore [name: 2023-03-23T13:07:37.410117048Z, point-in-time ts: 1679576444] <ts: 1679576857>
2023-03-23T13:07:37Z I [rs1/127.0.0.1:27017] [pitrestore/2023-03-23T13:07:37.410117048Z] to time: 2023-03-23T13:00:44Z
2023-03-23T13:07:37Z I [rs1/127.0.0.1:27017] [pitrestore/2023-03-23T13:07:37.410117048Z] recovery started
2023-03-23T13:07:38Z I [rs1/127.0.0.1:27017] [pitrestore/2023-03-23T13:07:37.410117048Z] moving to state running
2023-03-23T13:07:41Z I [rs1/127.0.0.1:27017] [pitrestore/2023-03-23T13:07:37.410117048Z] restoring users and roles
2023-03-23T13:07:41Z I [rs1/127.0.0.1:27017] [pitrestore/2023-03-23T13:07:37.410117048Z] moving to state dumpDone
2023-03-23T13:07:43Z I [rs1/127.0.0.1:27017] [pitrestore/2023-03-23T13:07:37.410117048Z] starting oplog replay
2023-03-23T13:07:43Z I [rs1/127.0.0.1:27017] [pitrestore/2023-03-23T13:07:37.410117048Z] oplog replay finished on {1679576440 2}
2023-03-23T13:07:44Z I [rs1/127.0.0.1:27017] [pitrestore/2023-03-23T13:07:37.410117048Z] recovery successfully finished

Below are the documents after restore + PITR via PBM:

rs1:PRIMARY> db.book.find()
{ "_id" : ObjectId("641c3b0a6eddc54ce4f75df8"), "number" : 1, "title" : "harry potter", "author" : "shakespear" }
{ "_id" : ObjectId("641c3b0a6eddc54ce4f75df9"), "number" : 2, "title" : "game of thrones", "author" : "chetan" }
{ "_id" : ObjectId("641c3b0a6eddc54ce4f75dfa"), "number" : 3, "title" : "2 states", "author" : "Peter" }
{ "_id" : ObjectId("641c42519662ff210d35343e"), "number" : 4, "title" : "Stranger things", "author" : "Graham" }
{ "_id" : ObjectId("641c433b474b755a080f10bb"), "number" : 5, "title" : "Stranger things", "author" : "Graham"
rs1:PRIMARY>

Below are the Oplog entries after PBM restore, and we can see that PBM restored the relevant base backup first and started applying Oplog after the last op in the base backup.

rs1:PRIMARY> db.oplog.rs.find({ns:/^london/}).sort({$natural:-1})
{ "op" : "i", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c433b474b755a080f10bb"), "number" : 5, "title" : "Stranger things", "author" : "Graham" }, "ts" : Timestamp(1679576863, 9), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T13:07:43.222Z") }
{ "op" : "c", "ns" : "london.$cmd", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "commitIndexBuild" : "book", "indexBuildUUID" : UUID("c95a669a-9e62-4a59-974a-a702c52127da"), "indexes" : [ { "v" : 2, "unique" : true, "key" : { "number" : 1, "author" : 1 }, "name" : "number_1_author_1" } ] }, "ts" : Timestamp(1679576861, 8), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T13:07:41.159Z") }
{ "op" : "c", "ns" : "london.$cmd", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "startIndexBuild" : "book", "indexBuildUUID" : UUID("c95a669a-9e62-4a59-974a-a702c52127da"), "indexes" : [ { "v" : 2, "unique" : true, "key" : { "number" : 1, "author" : 1 }, "name" : "number_1_author_1" } ] }, "ts" : Timestamp(1679576861, 6), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T13:07:41.152Z") }
{ "lsid" : { "id" : UUID("6ec94b0c-098b-4986-a971-078c36761cf5"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "txnNumber" : NumberLong(3), "op" : "i", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c42519662ff210d35343e"), "number" : 4, "title" : "Stranger things", "author" : "Graham" }, "ts" : Timestamp(1679576861, 4), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T13:07:41.006Z"), "stmtId" : 3, "prevOpTime" : { "ts" : Timestamp(1679576861, 3), "t" : NumberLong(1) } }
{ "lsid" : { "id" : UUID("6ec94b0c-098b-4986-a971-078c36761cf5"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "txnNumber" : NumberLong(3), "op" : "i", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c3b0a6eddc54ce4f75dfa"), "number" : 3, "title" : "2 states", "author" : "Peter" }, "ts" : Timestamp(1679576861, 3), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T13:07:41.006Z"), "stmtId" : 2, "prevOpTime" : { "ts" : Timestamp(1679576861, 2), "t" : NumberLong(1) } }
{ "lsid" : { "id" : UUID("6ec94b0c-098b-4986-a971-078c36761cf5"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "txnNumber" : NumberLong(3), "op" : "i", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c3b0a6eddc54ce4f75df9"), "number" : 2, "title" : "game of thrones", "author" : "chetan" }, "ts" : Timestamp(1679576861, 2), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T13:07:41.006Z"), "stmtId" : 1, "prevOpTime" : { "ts" : Timestamp(1679576861, 1), "t" : NumberLong(1) } }
{ "lsid" : { "id" : UUID("6ec94b0c-098b-4986-a971-078c36761cf5"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "txnNumber" : NumberLong(3), "op" : "i", "ns" : "london.book", "ui" : UUID("5959bd53-6986-4d0d-a72a-9fac81113934"), "o" : { "_id" : ObjectId("641c3b0a6eddc54ce4f75df8"), "number" : 1, "title" : "harry potter", "author" : "shakespear" }, "ts" : Timestamp(1679576861, 1), "t" : NumberLong(1), "v" : NumberLong(2), "wall" : ISODate("2023-03-23T13:07:41.006Z"), "stmtId" : 0, "prevOpTime" : { "ts" : Timestamp(0, 0), "t" : NumberLong(-1) } }


Above, you can see that PBM applied the latest backup and performed the PITR automatically. The reason we didn't face the 'E11000 duplicate key error' during PITR with PBM is that PBM automatically works out from which Oplog entry it needs to apply ops after restoring the full backup. PBM ensures consistency while restoring a full backup + incremental Oplog backup.

Here’s how Percona Backup for MongoDB works.

Conclusion

So we have seen how to avoid the 'E11000 duplicate key error' automatically by using PBM. The manual approach explained above is also possible, but why go with a manual process when PBM is open source, requires no license, and can handle it automatically?

Please check out our products for Percona Server for MongoDB, Percona Backup for MongoDB, and Percona Operator for MongoDB. We also recommend checking out our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered?

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

 

Download Percona Distribution for MongoDB Today!

Mar
27
2023
--

Impact of Querying Table Information From information_schema


On MySQL and Percona Server for MySQL, there is a schema called information_schema (I_S) which provides information about database tables, views, indexes, and more.

A lot of useful information can be retrieved from this schema, for example, table metadata and foreign key relations, but trying to query I_S can induce performance degradation if your server is under heavy load, as shown in the following example test.

Disclaimer: This blog post is meant to show a less-known problem but is not meant to be a serious benchmark. The percentage in degradation will vary depending on many factors {hardware, workload, number of tables, configuration, etc.}.

Test

The test compares a baseline of how the server behaves "under heavy load but with no queries against I_S" vs. "under heavy load + I_S queries" to check for performance degradation.

The same tests were executed on Percona Server for MySQL 5.7 and 8.0. The queries executed against I_S to measure the degradation retrieve information about some tables' FK (foreign key) relationships.

Setup

The setup consists of creating 10K tables with sysbench and adding 20 FKs to 20 tables.

Hardware

Number of CPUs: 12

Memory size: 12288 MB

Main Percona Server for MySQL configuration variables tuned in my.cnf:

open_files_limit = 65535
table_definition_cache = 4096
table_open_cache = 2000
table_open_cache_instances = 1
innodb_buffer_pool_size = 10G
innodb_buffer_pool_instances = 2

You can review the full my.cnf file here: https://github.com/ctutte/blog_IS/blob/master/my.cnf

Executed queries

To generate the database setup, I used sysbench with the following flags:

sysbench --db-driver=mysql --mysql-user=root --mysql-password=123456Ab! \
    --mysql-db=test --range_size=100 \
    --table_size=2250 --tables=10000 --threads=12 --events=0 --time=120 \
    --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua prepare

Then manually created 20 FKs:

ALTER TABLE sbtest1 ADD FOREIGN KEY (id) REFERENCES sbtest8001 (id);
ALTER TABLE sbtest2 ADD FOREIGN KEY (id) REFERENCES sbtest8002 (id);
…
{repeat 20 times}
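For reference, those 20 statements follow the pattern sbtestN → sbtest(8000+N), so they could be generated with a small loop like this (a sketch using the same credentials and schema as above):

for i in $(seq 1 20); do
    mysql -u root -p123456Ab! test \
        -e "ALTER TABLE sbtest${i} ADD FOREIGN KEY (id) REFERENCES sbtest$((8000 + i)) (id);"
done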

After the scenario was set up, I executed three runs of sysbench with the following command:

sysbench --db-driver=mysql --mysql-user=root --mysql-password=123456Ab! \
    --mysql-db=test --range_size=100 \
    --table_size=2250 --tables=1000 --threads=10 --events=0 --time=60 \
    --rand-type=uniform /usr/share/sysbench/oltp_read_write.lua --report-interval=1 run

To query I_S, I used the following bash one-liner, which checks the FKs in a while loop:

while true; do { mysql -u root -p123456Ab! -e "SELECT TABLE_NAME, COLUMN_NAME, CONSTRAINT_NAME, REFERENCED_TABLE_NAME, 
REFERENCED_COLUMN_NAME FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE WHERE REFERENCED_TABLE_SCHEMA IS NOT NULL"; }; done

Considering that the server has 12 vCPUs and sysbench is using 10, then there is some spare CPU capacity for running the queries against I_S, right? Let’s see the results.

Results for Percona Server for MySQL 5.7

Three runs of 60 seconds and 1000 tables:

                   Total queries   Queries per second
Without FK check   8337600         46320
With FK checks     6924900         38472

 

It can be seen that when running only sysbench, the server could execute 46320 queries per second.

In the scenario where sysbench ran together with the I_S queries continuously checking FKs, 38472 queries per second were executed. That is (46320 - 38472) * 100 / 46320 = 16.9% performance degradation.

If the I_S queries were purely CPU bound, the spare CPU capacity should have absorbed them and sysbench should have kept doing around 46320 QPS, but something else is going on: the QPS actually went down.

The reason for this performance degradation is that querying I_S requires opening the tables' .frm files, and the number of tables kept open is limited by the table_open_cache variable.

In my example test, when running only sysbench, the workload shows few Table_open_cache_misses (the first value is the initial counter value; each successive value is the per-second increase).

Table_open_cache_misses while NOT checking for FKs:

58437 28  15 13 7 5 7 6 1 0 0 1 0 3 8 10 5 0 0 0 0 0 0 1 0 5 0 6 5 7 5 4 0 2 2 4 6 1 0 0 0 0 0 0 0 0 0 0 0

Whereas when running sysbench + I_S queries, there is a much greater number of Table_open_cache_misses because MySQL/Percona Server for MySQL 5.7 has to open each table's .frm file (and in my test runs I purposely read a very high number of tables compared to the table_open_cache variable).

Table_open_cache_misses while checking for FKs:

68133 30   31 33 30 32 31 31 29 32 28 32 30 32 30 33 31 30 31 28 31 31 33 32 33 29 33 34 32 31 33 30 28 34 33 30 29 29 29 25 30 31 29 33 27 29 29 28 28

The above outputs were generated using pt-stalk + pt-mext from Percona Toolkit.

This can also be seen using Percona Monitoring and Management (PMM) and checking the “MySQL overview” dashboard ->“MySQL table open cache status” graphic.

When running only sysbench, there is a high number of cache hits (~99.98%) and a low number of “misses,” which means the table_cache is big enough for the workload.

MySQL table open cache status

Whereas while running sysbench + I_S queries, there is a drop in the number of cache hits from ~15:33:40, when the FK checks started, until ~15:35:00, along with an increase in table "Openings" and "Misses" for the duration of the test.

MySQL open cache

Performance degradation when querying I_S can be minimized if the table_open_cache is sized for the workload, or if the I_S queries are filtered down to only the tables that are actually needed, as discussed in the conclusion below.

Results for Percona Server for MySQL 8.0

For MySQL and Percona Server for MySQL 8, executing sysbench + I_S queries shows almost no cache misses, as can be seen in the following snapshot:

MySQL 8 improved data dictionary access, which avoids having to open every table's .frm file. This improvement is also present in Percona Server for MySQL 8.

Conclusion

In conclusion, checking table information from I_S in 5.7 relies on opening .frm files from disk to retrieve table information, which can cause a performance drop when the number of opened tables does not fit in the table_open_cache.

If you rely heavily on queries against information_schema, your workload will benefit from upgrading to MySQL/Percona Server for MySQL 8 because of the new data dictionary. While you are on 5.7, you can also consider increasing table_open_cache to avoid table open cache misses, or at least use filters in the WHERE clause so that not every .frm file has to be opened, limiting the query to a subset of the tables for a smaller performance impact.
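As a sketch of both mitigations on 5.7 (the values and table names are only examples taken from this test setup):

# 1) filter the I_S query so only the needed tables' .frm files are opened
mysql -u root -p123456Ab! -e "SELECT TABLE_NAME, COLUMN_NAME, CONSTRAINT_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
    FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
    WHERE REFERENCED_TABLE_SCHEMA = 'test' AND TABLE_NAME IN ('sbtest1', 'sbtest2')"
# 2) give the table open cache more room (size it for your own table count)
mysql -u root -p123456Ab! -e "SET GLOBAL table_open_cache = 16384"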

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Try Percona Distribution for MySQL today!

Mar
27
2023
--

Updated Percona Distributions for MySQL, Percona Distributions for MongoDB: Release Roundup March 27, 2023

Percona Releases

It’s time for the release roundup!

Percona is a leading provider of unbiased open source database solutions that allow organizations to easily, securely, and affordably maintain business agility, minimize risks, and stay competitive.

Our Release Roundups showcase the latest Percona software updates, tools, and features to help you manage and deploy our software. It offers highlights, critical information, links to the full release notes, and direct links to the software or service itself to download.

Today’s post includes those releases and updates that have come out since March 13, 2023. Take a look!

Percona Monitoring and Management 2.36

Percona Monitoring and Management 2.36 was released on March 23, 2023. It is an open source database monitoring, management, and observability solution for MySQL, PostgreSQL, and MongoDB. We are excited to announce PMM 2.36 now supports pg_stat_monitor 2.0 (PGSM 2.0) in QAN, a powerful PostgreSQL query performance monitoring tool. By downloading this update, you will have access to the latest improvements and fixes covered by PGSM2. Important: PostgreSQL versions 11 and 12, along with PGSM 2.0, will be supported in PMM 2.37.0.

Download Percona Monitoring and Management 2.36

Percona Distribution for MySQL (PXC-based variant) 8.0.31

On March 15, 2023, Percona Distribution for MySQL (Percona XtraDB Cluster-based variant) 8.0.31 was released. It is the most stable, scalable, and secure open-source MySQL distribution, with two download options: one based on Percona Server for MySQL and one based on Percona XtraDB Cluster. This release is focused on the Percona XtraDB Cluster-based deployment variation.

This release adds the GCache encryption and Write-Set cache encryption feature in tech preview, along with improvements and bug fixes introduced by Oracle for MySQL 8.0.31.

Download Percona Distribution for MySQL (PXC-based variant) 8.0.31 

Percona Distribution for MySQL (PS-based variation) 8.0.32

Percona Distribution for MySQL (Percona Server for MySQL-based variation) 8.0.32 was released on March 20, 2023. This release is focused on the Percona Server for MySQL-based deployment variation. It is based on Percona Server for MySQL 8.0.32-24. In this release, Percona decided to revert the following MySQL bug fix:

The data and the GTIDs backed up by mysqldump were inconsistent when the options --single-transaction and --set-gtid-purged=ON were both used. It was because in between the transaction started by mysqldump and the fetching of GTID_EXECUTED, GTIDs on the server could have increased already. With this fixed, a FLUSH TABLES WITH READ LOCK is performed before the fetching of GTID_EXECUTED to ensure its value is consistent with the snapshot taken by mysqldump.

The MySQL fix also added a requirement for the RELOAD privilege when using --single-transaction and executing FLUSH TABLES WITH READ LOCK. (Bug #33630199, Bug #105761)

Download Percona Distribution for MySQL (PS-based variation) 8.0.32

Percona Server for MySQL 8.0.32

March 20, 2023, saw the release of Percona Server for MySQL 8.0.32. It includes all the features and bug fixes available in the MySQL 8.0.32 Community Edition and enterprise-grade features developed by Percona. A release highlight is a decision to revert the following MySQL bug fix:

The data and the GTIDs backed up by mysqldump were inconsistent when the options --single-transaction and --set-gtid-purged=ON were both used. It was because in between the transaction started by mysqldump and the fetching of GTID_EXECUTED, GTIDs on the server could have increased already. With this fixed, a FLUSH TABLES WITH READ LOCK is performed before the fetching of GTID_EXECUTED to ensure its value is consistent with the snapshot taken by mysqldump.

The MySQL fix also added a requirement for the RELOAD privilege when using --single-transaction and executing FLUSH TABLES WITH READ LOCK. (Bug #33630199, Bug #105761)

Download Percona Server for MySQL 8.0.32

Percona XtraDB Cluster 8.0.31-23

Percona XtraDB Cluster 8.0.31-23 was released on March 14, 2023. It supports critical business applications in your public, private, or hybrid cloud environment. Our free, open source, enterprise-grade solution includes the high availability and security features your business requires to meet your customer expectations and business goals.

Percona XtraDB Cluster is based on Percona Server for MySQL. This release adds GCache and Write-Set cache encryption as a tech preview feature and includes improvements and bug fixes introduced by Oracle for MySQL 8.0.31.

Download Percona XtraDB Cluster 8.0.31-23

Percona Distribution for PostgreSQL 15.2

On March 20, 2023, Percona Distribution for PostgreSQL 15.2 was released. It is a collection of tools from the PostgreSQL community, tested to work together, that helps you deploy and manage PostgreSQL. The aim of Percona Distribution for PostgreSQL is to address operational issues that enterprises face, such as high availability, disaster recovery, security, performance, and scalability. Release highlights include:

  • A new extension, pgpool, is added: middleware between the PostgreSQL server and client applications that provides high availability, connection pooling, and load balancing.
  • Percona Distribution for PostgreSQL is now available on Red Hat Enterprise Linux 9 and compatible derivatives.

Download Percona Distribution for PostgreSQL 15.2

Percona Distribution for PostgreSQL 14.7

On March 23, 2023, we released Percona Distribution for PostgreSQL 14.7. This release of Percona Distribution for PostgreSQL is based on PostgreSQL 14.7. Release highlights include:

  • Added a new extension, pgpool: middleware between the PostgreSQL server and client applications that provides high availability, connection pooling, and load balancing.
  • Percona Distribution for PostgreSQL is now available on Red Hat Enterprise Linux 9 and compatible derivatives.

Download Percona Distribution for PostgreSQL 14.7

Percona Distribution for MongoDB 5.0.15

Percona Distribution for MongoDB 5.0.15 was released on March 16, 2023. It is a freely available MongoDB database alternative, giving you a single solution that combines enterprise components from the open source community, designed and tested to work together. The aim of Percona Distribution for MongoDB is to enable you to run and operate MongoDB efficiently, with your data consistently backed up. Warning: Due to critical issues identified in previous releases that may affect data integrity and performance, we recommend upgrading all production environments to the latest version, currently Percona Distribution for MongoDB 5.0.15. Find details about those issues in the MongoDB 5.0 Community Edition release notes.

Download Percona Distribution for MongoDB 5.0.15

Percona Distribution for MongoDB 4.4.19

Percona Distribution for MongoDB 4.4.19 was released on March 20, 2023. It includes the following components:

  • Percona Server for MongoDB is a fully compatible source-available, drop-in replacement for MongoDB.
  • Percona Backup for MongoDB is a distributed, low-impact solution for achieving consistent backups of MongoDB sharded clusters and replica sets.

This release of Percona Distribution for MongoDB is based on Percona Server for MongoDB 4.4.19-19 and Percona Backup for MongoDB 2.0.4. Please read the release notes for highlights, bug fixes, and improvements.

Download Percona Distribution for MongoDB 4.4.19

Percona Server for MongoDB 5.0.15-13

On March 16, 2023, we released Percona Server for MongoDB 5.0.15-13. It is an enhanced, source-available, highly scalable database that is a fully compatible drop-in replacement for MongoDB 5.0.15 Community Edition, supporting MongoDB 5.0.15 protocols and drivers.

Warning: Due to critical issues identified in previous releases that may affect data integrity and performance, we recommend upgrading all production environments to the latest version, currently Percona Server for MongoDB 5.0.15-13. Find details about those issues in the MongoDB 5.0 Community Edition release notes.

Download Percona Server for MongoDB 5.0.15-13

Percona Server for MongoDB 4.4.19-19

On March 20, 2023, we released Percona Server for MongoDB 4.4.19-19, a source-available, highly scalable database that is a fully compatible drop-in replacement for MongoDB 4.4.19 Community Edition, enhanced with enterprise-grade features. It supports MongoDB 4.4.19 protocols and drivers.

Release Highlights include:

  • Support for authentication using AWS IAM enables you to natively integrate Percona Server for MongoDB with AWS services, increase the security of your infrastructure by setting up password-less authentication, and offload your DBAs from managing different sets of secrets. This is a technical preview feature; see the connection sketch after this list.
  • Improved master key rotation for data-at-rest encryption with HashiCorp Vault enables you to use the same secret key path on every server in your entire deployment, significantly simplifying secrets management and the key rotation process.
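For illustration only: MongoDB-compatible servers typically expose AWS IAM authentication through the MONGODB-AWS mechanism against the $external database, with credentials resolved from the standard AWS sources (environment variables or an attached IAM role) rather than a stored password. A minimal connection sketch, with a placeholder host name:

    # Hypothetical host; credentials come from the environment or an IAM role,
    # so no password needs to be stored in the connection string.
    mongosh 'mongodb://psmdb.example.com:27017/?authSource=$external&authMechanism=MONGODB-AWS'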

Download Percona Server for MongoDB 4.4.19-19

Percona Backup for MongoDB 2.0.5

March 23, 2023, saw the release of Percona Backup for MongoDB 2.0.5, a distributed, low-impact solution for consistent backups of MongoDB sharded clusters and replica sets. This is a tool for creating consistent backups across a MongoDB sharded cluster (or a non-sharded replica set) and for restoring those backups to a specific point in time.
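As a rough sketch of that workflow (the timestamp and settings below are placeholders):

    # Take a consistent backup across the sharded cluster or replica set
    pbm backup
    # List completed backups and available restore points
    pbm list
    # Restore to a specific point in time (requires point-in-time recovery
    # slicing to be enabled beforehand, e.g., pbm config --set pitr.enabled=true)
    pbm restore --time="2023-03-23T10:00:00"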

This release introduces the following fixes:

  • A fix for the physical restore process for deployments where the mongod --dbpath option has a forward slash ('/') as the last character
  • A fix for security vulnerability CVE-2022-41723 by updating the Golang library dependencies to the latest versions

Download Percona Backup for MongoDB 2.0.5

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MongoDB, PostgreSQL, MariaDB, and other open source databases in on-premises and cloud environments.

Mar
24
2023
--

A Workaround for The “RELOAD/FLUSH_TABLES privilege required” Problem When Using Oracle mysqldump 8.0.32


In MySQL Server 8.0.32, Oracle fixed Bug #105761:

“mysqldump make a non-consistent backup with --single-transaction option”

which caused a wave of complaints from users who could no longer perform backups with the mysqldump utility because they lacked the required privileges:

  • Bug #109701 “Fix for #33630199 in 8.0.32 introduces regression when --set-gtid-purged=OFF”
  • Bug #109685 “mysqldump has incompatible change in MySQL 8.0.32”

Later, Oracle admitted the problem and even added a new paragraph to the MySQL Server 8.0.32 Release Notes.

Limitation: This fix adds a requirement for the RELOAD privilege when using --single-transaction to execute FLUSH TABLES WITH READ LOCK; the MySQL team is investigating a solution. (Bug #33630199, Bug #105761)

With the release of Percona Server for MySQL 8.0.32-24, we believe there is a better solution to this problem.

For some combinations of server Global Transaction Identifier (GTID) mode and mysqldump command line arguments, you can use the mysqldump utility from the Percona Server for MySQL 8.0.32 packages/tarballs.

Below you will find a detailed description of the mechanism that the mysqldump utility uses when creating dumps (thanks to Saikumar VS for conducting a series of experiments).
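One way to repeat such an experiment on a disposable test server is to enable the general query log around a dump and check which statements the utility actually sent; the log path and dump options below are placeholders:

    # TEST server only: the general log records every statement received
    mysql -e "SET GLOBAL general_log_file='/tmp/general.log'; SET GLOBAL general_log='ON';"
    mysqldump --single-transaction --set-gtid-purged=OFF --all-databases > /dev/null
    # Look for FTWRL and/or the consistent-snapshot transaction
    grep -E "FLUSH TABLES WITH READ LOCK|WITH CONSISTENT SNAPSHOT" /tmp/general.log
    mysql -e "SET GLOBAL general_log='OFF';"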

The following abbreviations are used below:

  • STWCS – START TRANSACTION WITH CONSISTENT SNAPSHOT
  • FTWRL – FLUSH TABLES WITH READ LOCK
  1. When the server has GTIDs enabled:

     | mysqldump command line arguments           | mysqldump from MySQL Server 8.0.31 | mysqldump from MySQL Server 8.0.32 | mysqldump from Percona Server for MySQL 8.0.32 |
     | --single-transaction --set-gtid-purged=OFF | STWCS                              | FTWRL + STWCS                      | STWCS                                          |
     | --single-transaction --set-gtid-purged=ON  | STWCS                              | FTWRL + STWCS                      | STWCS                                          |

  2. When the server has GTIDs disabled:

     | mysqldump command line arguments           | mysqldump from MySQL Server 8.0.31 | mysqldump from MySQL Server 8.0.32 | mysqldump from Percona Server for MySQL 8.0.32 |
     | --single-transaction                       | STWCS                              | FTWRL + STWCS                      | STWCS                                          |

To sum up, it is safe to use the mysqldump utility from Percona Server for MySQL 8.0.32 with the “--single-transaction” command line argument when the server is not using GTIDs. Also, when GTIDs are enabled on the server, it is safe to use the mysqldump utility from Percona Server for MySQL 8.0.32 with “--single-transaction --set-gtid-purged=OFF”. Creating these dumps does not rely on sending FTWRL to the server and, therefore, won’t require the unnecessary RELOAD/FLUSH_TABLES privilege for the user on whose behalf the backup is created.
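For example, with the mysqldump binary shipped in the Percona Server for MySQL 8.0.32 packages (connection options omitted; the database name is a placeholder), these invocations avoid FTWRL and therefore do not need the RELOAD/FLUSH_TABLES privilege:

    # GTIDs enabled on the server: keep purged GTIDs out of the dump
    mysqldump --single-transaction --set-gtid-purged=OFF mydb > mydb.sql
    # GTIDs disabled on the server: plain --single-transaction is enough
    mysqldump --single-transaction mydb > mydb.sql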

On the other hand, when you use the mysqldump utility from Percona Server for MySQL 8.0.32 with “--single-transaction --set-gtid-purged=ON”, just the STWCS statement (without FTWRL) will be sent to the server, which guarantees consistent results only when the server is Percona Server for MySQL (not Oracle MySQL Server). This is because of additional code patches that Percona Server for MySQL has and Oracle MySQL Server does not. In other words, if you use the mysqldump utility from Percona Server for MySQL 8.0.32 with “--single-transaction --set-gtid-purged=ON” against an instance of Oracle MySQL Server 8.0.32, you will get the old (faulty) 8.0.31 behavior, which does not guarantee that the snapshot taken and the purged GTID set will be in sync.

If you do need to run mysqldump with “--single-transaction --set-gtid-purged=ON” against an instance of Oracle MySQL Server and you want a consistent snapshot of your data, you have no choice but to use the mysqldump binary from MySQL Server 8.0.32. And yes, in this case, your user needs either the RELOAD or FLUSH_TABLES privilege.

Please also note that these recommendations are only temporary solutions to be used until Oracle fixes these bugs properly (the fixes are currently scheduled to be included in the upcoming MySQL Server 8.0.33).

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Try Percona Distribution for MySQL today!
