Oct 31, 2023

Percona Is Introducing Telemetry Mechanisms Into MySQL, PostgreSQL, and MongoDB

Percona has a mission to provide the best open source database software, support, and services so our users can innovate freely.

We are proud of how far we have come over the last 16+ years. Continuing this trajectory will require many decisions about the future development of our software products, and our hope is to make those decisions with as much useful data as possible. Data-informed decision-making is key to delivering products that users love, and it is also key to making the difficult choices about where to invest precious development resources and funds.

How we have used data for data-driven decisions in the past

This is not the first time we have collected data and based our investment decisions on it. 

Back in 2020, we set out to learn more about how Percona Monitoring and Management (PMM) was being used in order to provide a better product to the community. The following are a few examples of decisions that came from this data and some of what comes next in terms of additional telemetry in Percona products. 

Many know that the two M’s in PMM stand for Monitoring and Management. We were working on improving the Management side of PMM, so we started building in components that would allow users to not just monitor their environments but also maintain them. One such feature was managed backups. The feedback from users was that “backup management is a must” and “we need something that allows us to manage all of our database backups in one place, not server by server.” When we looked at adoption, we were surprised to see almost none; the little we did see was just us. We needed to decide whether to continue investing in this feature or pull the plug on it. This data enabled us to have very targeted conversations with users about where we were missing the mark. We poured their feedback into the product, and as we iterated, adoption took off!

This data also showed us that PMM is monitoring tens of thousands of MySQL instances, and when we looked at the breakdown of versions, we saw that, as of 10 months ago, the majority of those were still on version 5.7, which is scheduled for EOL at the end of October 2023. This insight allowed us to put two critical programs in place to help companies either make the transition to 8.0 faster or offer End-of-Life support to those who were unable to make the switch. Both programs have been met with great responses from new and existing customers who found themselves stuck!

These two examples of data-driven decisions at Percona are great illustrations of what we want to be able to do across all the products and deployment configurations.

The usage trends of today are the investment areas of tomorrow

To continue creating added benefits for our users and customers, we are adding telemetry directly to our core database products. With this mechanism in place, we will gain insight into how databases are being used: the environments, versions, and lifecycles involved, as well as which key features of the solutions are being used.

This will help us understand more about how these products are being used so we can provide the same level of data-informed decision-making on MySQL, PostgreSQL, and MongoDB.

While this isn’t new to the industry, it is new for Percona distributions for databases. In our ongoing effort to be as transparent with our customers and users as possible, we are outlining explicitly what kind of information we will collect, what we do with it, and how you can opt out.

We will share the data back

We believe in open source at Percona. This principle has led us to where we are today, and we want to apply it to our implementation of telemetry as well.

The process of gathering data will take some time, but we plan to periodically share the statistics we collect, such as breakdowns of the database software versions in use or the most popular operating systems and architectures. This blog post already includes some such data (a breakdown of the MySQL versions we observed via PMM over the last six months; please see above).

Opt-out mechanisms with no impact on functionality

Our telemetry will be enabled by default. All the statistics we collect are anonymized and thus cannot be traced back to their origins. We do not collect any proprietary information. That said, if you decide you do not want to share this data with Percona, you can easily opt out of the telemetry mechanisms. Read on to learn how to control telemetry. This information will also be provided in the Percona documentation pages for all products that include the telemetry module.

The decision not to share the telemetry data with Percona has absolutely no impact on database functionality.

Do you want to learn more?

All the details about what we collect can be found in the documentation for the Percona products involved. These articles will also help you understand how to opt out of telemetry, how it is sent to Percona, and how you can exercise your rights as a data owner.

The type and scope of the data we collect with telemetry will evolve over time. This blog contains the information specific to the first release of the telemetry module. Please be sure to check the associated product documentation to learn more about the telemetry collected for each product.

Here is an excerpt from Percona XtraDB Cluster 8.0.34 release documentation, which is the first database product to receive Percona telemetry:

What information is collected

At this time, telemetry is added only to the Percona packages and Docker images. Percona XtraDB Cluster collects only information about the installation environment. Future releases may add additional metrics.

Be assured that access to this raw data is rigorously controlled. Percona does not collect personal data. All data is anonymous and cannot be traced to a specific user. To learn more about our privacy practices, read our Percona Privacy statement.

An example of the data collected is the following:

[{"id" : "c416c3ee-48cd-471c-9733-37c2886f8231",
"product_family" : "PRODUCT_FAMILY_PXC",
"instanceId" : "6aef422e-56a7-4530-af9d-94cc02198343",
"createTime" : "2023-10-16T10:46:23Z",
"metrics":
[{"key" : "deployment","value" : "PACKAGE"},
{"key" : "pillar_version","value" : "8.0.34-26"},
{"key" : "OS","value" : "Oracle Linux Server 8.8"},
{"key" : "hardware_arch","value" : "x86_64 x86_64"}]}]
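To illustrate the shape of the payload, here is a short Python sketch (illustrative only, not Percona’s actual tooling) that parses the sample above and flattens the metrics list into a plain dictionary for inspection:

```python
import json

# The sample telemetry record shown above: one entry per database node.
payload = json.loads("""
[{"id": "c416c3ee-48cd-471c-9733-37c2886f8231",
  "product_family": "PRODUCT_FAMILY_PXC",
  "instanceId": "6aef422e-56a7-4530-af9d-94cc02198343",
  "createTime": "2023-10-16T10:46:23Z",
  "metrics": [
    {"key": "deployment", "value": "PACKAGE"},
    {"key": "pillar_version", "value": "8.0.34-26"},
    {"key": "OS", "value": "Oracle Linux Server 8.8"},
    {"key": "hardware_arch", "value": "x86_64 x86_64"}]}]
""")

record = payload[0]
# Flatten the key/value metric pairs into a simple dict.
metrics = {m["key"]: m["value"] for m in record["metrics"]}
print(metrics["pillar_version"])  # prints "8.0.34-26"
```

Note that every field is either a random identifier or a coarse environment attribute; nothing here points back to a specific host or user.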

Disable telemetry

Starting with Percona XtraDB Cluster 8.0.34-26-1, telemetry is enabled by default. If you decide not to send usage data to Percona, you can set the PERCONA_TELEMETRY_DISABLE=1 environment variable, either for the root user or system-wide, before installation. Setting this environment variable looks different depending on the deployment method; the upcoming documentation release will explain the process in more detail.
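Conceptually, the opt-out amounts to reading a single environment variable. A minimal Python sketch of such a check (a hypothetical helper mirroring the documented behavior, not Percona’s actual code) could look like this:

```python
import os

def telemetry_enabled() -> bool:
    # Hypothetical check mirroring the documented opt-out:
    # telemetry is on by default; PERCONA_TELEMETRY_DISABLE=1 turns it off.
    return os.environ.get("PERCONA_TELEMETRY_DISABLE") != "1"

os.environ["PERCONA_TELEMETRY_DISABLE"] = "1"
print(telemetry_enabled())  # prints False
```

The important property is the default: an absent or differently-valued variable leaves telemetry enabled, and only an explicit `1` disables it.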

Telemetry for Operators is collected via a different mechanism but can be disabled using instructions here (this is for Postgres Operator, but the same steps work for MySQL and MongoDB).

Please note that in the example above, the “id” field is the measurement id, and the “instanceId” field is the id of the database node the data is collected from. The “instanceId” is required by our backend software to differentiate data coming from different sources, but it is impossible for Percona to determine the actual origin or host information from this identifier.

Check our code

You can check the telemetry module’s code in our GitHub repository: https://github.com/percona-lab/telemetry-agent 

Do you have feedback?

Please share it through this Percona Community Forum thread I have created for that purpose.

Oct 31, 2023

Percona Monitoring and Management 2.40.1, Percona Distribution for MongoDB 6.0.11: Release Roundup October 31, 2023

Percona Releases

Percona is a leading provider of unbiased, performance-first, open source database solutions that allow organizations to easily, securely, and affordably maintain business agility, minimize risks, and stay competitive, free from vendor lock-in. Percona software is designed for peak performance, uncompromised security, limitless scalability, and disaster-proofed availability.

Our Release Roundups showcase the latest Percona software updates, tools, and features to help you manage and deploy our software. They offer highlights, critical information, links to the full release notes, and direct download links for the software or service itself.

Today’s post includes those releases and updates that have come out since October 16, 2023. Take a look.

Percona Monitoring and Management 2.40.1

Percona Monitoring and Management 2.40.1 (PMM) was released on October 20, 2023. PMM is an open source database monitoring, management, and observability solution for MySQL, PostgreSQL, and MongoDB. In this release, fixed issues include:

  • PMM-12592 – Fixed an issue where the Library Panels for the PMM dashboard were not working after upgrade to PMM 2.40.0.
  • PMM-12576 – After upgrading to PMM 2.40.0, changing the Admin user’s password from the terminal was not functioning. The issue has been resolved now.
  • PMM-12587 – After upgrading to PMM 2.40.0, some users may experience incorrect mappings between dashboards, folders, users, and groups. This can result in either a successful upgrade or a 500 internal server error. The issue has now been resolved.
  • PMM-12590 – CVE-2023-4911 is a vulnerability in the OS that PMM is based on. It has been fixed in the base OS, and the fix is available in PMM.

Download Percona Monitoring and Management 2.40.1

Percona Distribution for MongoDB 6.0.11

On October 19, 2023, Percona Distribution for MongoDB 6.0.11 was released. It is a freely available MongoDB database alternative, giving you a single solution that combines enterprise components from the open source community, designed and tested to work together. Percona Distribution for MongoDB includes the following components:

  • Percona Server for MongoDB is a fully compatible, source-available, drop-in replacement for MongoDB.
  • Percona Backup for MongoDB is a distributed, low-impact solution for achieving consistent backups of MongoDB sharded clusters and replica sets.

Download Percona Distribution for MongoDB 6.0.11

Percona Distribution for MongoDB 4.4.25

Percona Distribution for MongoDB 4.4.25 was released on October 16, 2023. The bug fixes and improvements provided by MongoDB and included in Percona Server for MongoDB are the following:

  • Improved the balancer behavior to stop iterating collections when there are no more available shards.
  • Improved performance of updating the routing table and prevented blocking of client requests during refresh for clusters with one million chunks.
  • Fixed commit point propagation for exhaust oplog cursors.
  • Disallowed the increase of the oldest ID during recovery and let all the history store records return to the rollback to stable irrespective of global visibility.
  • Disallowed saving the update chain when there are no updates to be written to the history store.

Download Percona Distribution for MongoDB 4.4.25

Percona Server for MongoDB 6.0.11-8

Percona Server for MongoDB 6.0.11-8 was released on October 19, 2023. It is an enhanced, source-available, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 6.0.10 Community Edition and MongoDB 6.0.11 Community Edition.

Download Percona Server for MongoDB 6.0.11-8

Percona Server for MongoDB 4.4.25-24

On October 16, 2023, we released Percona Server for MongoDB 4.4.25-24, a source available, highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 4.4.25 Community Edition enhanced with enterprise-grade features.

Download Percona Server for MongoDB 4.4.25-24

That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MongoDB, PostgreSQL, MariaDB, and other open source databases in on-premises and cloud environments and is trusted by global brands to unify, monitor, manage, secure, and optimize their database environments.

Oct 31, 2023

Resolving Data Drift in a Dual-Primary Topology With Replica in MySQL/MariaDB

Hello friends,

In Managed Services, we have the opportunity to see different technologies and various topologies, which makes the work fascinating and challenging at the same time.

This time, I’m going to tell you about a particular case: a client with a dual-primary topology plus a replica, as detailed below:

PS-primary-1=192.168.0.14 [RW]
  |___ PS-primary-2=192.168.0.59 [RW] (Slave_delay: 0)
  |___ PS-replica-1=192.168.0.99 [R] (Slave_delay: 0)

[RW] means Read/Write access. 

[R] means Read Only access.

We received an alert of this kind on PS-primary-2:

Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table foo.persons; Can't find record in 'persons', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000360, end_log_pos 58177

Seconds later, we noticed that someone (not us) resolved the replication issue, possibly by skipping the error in some way (using pt-slave-restart) or simply executing something like:

mysql> STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

Or by injecting an empty transaction.

(Disclaimer: We won’t deeply analyze the root cause here, but rather the surprise we found during the analysis.)

Having seen this, the most advisable action is to run pt-table-checksum, and if any differences appear, use pt-table-sync to fix the data drift.

Therefore, we executed the pt-table-checksum in the following way:

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T12:40:31      0      1        6          1       1       0   0.193 foo.persons

(Second disclaimer: We have populated the table with a small amount of data to simplify and improve visibility in this case. With larger volumes, the same scenario also applies.)

(For security reasons, credentials are stored in the file /home/user/.my.cnf, which is why you don’t see them in any executed command. If you want to know how to create it, here’s an example.)

At a glance, there seem to be differences. Let’s check the other primary instance and the replica as well.

PS-primary-2> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 08:40:31 | foo | persons |          5 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

PS-replica-1> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 09:53:10 | foo | persons |          4 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

The next step is to execute pt-table-sync with the --print option to review what the utility will actually do. The following output appears:

$ pt-table-sync --print h=192.168.0.59 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='3' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;

A DELETE statement has appeared. How is it possible?

As this is a dual-primary topology, it is advisable to perform a checksum on the other primary server as well, which is PS-primary-2. This will help identify any discrepancies and ensure data consistency between both primary servers.

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T13:08:12      0      1        5          1       1       0   0.147 foo.persons

That’s when we noticed that one primary has six rows, the other five, and the replica only four. How is it possible? Let’s review:

PS-primary-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

PS-replica-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

Let’s see what data the table currently contains:

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

As you can see, there is data drift everywhere. 

And how can this be possible? Data drift and the integrity of replica data are complex topics. Let’s outline some common ways this can happen (generally speaking; this is not an exhaustive list):

  • A statement was executed on a primary with SET SESSION sql_log_bin = OFF
  • A statement was executed directly on the replica. This can happen if:
    • the replica was not in super_read_only and a super user executed the statement
    • the replica was not in read_only
  • A statement was executed on a replica, and the replica was later promoted to a primary without GTID in place
  • A primary server was not configured for full ACID compliance, and it crashed
  • At some point, the primary was not configured for row-based replication (even briefly)

These are the most common ways that data drift occurs, but certainly not the only ways that it can happen. More exotic cases can involve bugs, engine differences, version differences, etc.

Coming back to the issue, if we run pt-table-sync targeting the second primary instance, we will see the following result:

$ pt-table-sync --print h=192.168.0.14 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='1' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
DELETE FROM `foo`.`persons` WHERE `personid`='7' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;

Here, the output is less promising: it suggests deleting the rows with IDs 1 and 7 and performing a REPLACE on ID 3.

Note that it asks us to delete rows, which is not correct. So, what should we do? After discussing it with the customer (we shared everything we had gathered), we agreed on the following:

  • We need to add the row with ID=3 to the instances where it’s missing.
  • We need to add the row with ID=1 to the instances where it’s missing.
  • We need to add the row with ID=7 to the instances where it’s missing.
  • We DO NOT need to delete any rows; no DELETES are necessary.

Important note: The client informed us that the data that exists only on one instance and not on the others is needed. The client also verified that the actions we proposed were correct. Remember: as DBAs, we are the guardians of the data and its integrity, but not necessarily its owners, and for ethical reasons we should not necessarily even know its contents.

How do we achieve this, then?

Unfortunately, in this case, using pt-table-sync is not recommended since, as we saw with the --print option, it would perform DELETEs in both cases, regardless of which primary instance we consider the source of truth.

We also cannot simply execute the REPLACE statements on PS-primary-1, because the REPLACE command follows a specific logic, explained here:

MySQL uses the following algorithm for REPLACE (and LOAD DATA … REPLACE):

  1. Try to insert the new row into the table.
  2. While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
    1. Delete from the table the conflicting row that has the duplicate key value.
    2. Try again to insert the new row into the table.
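The delete-then-insert behavior can be sketched with a small simulation (Python, illustrative only; a real MySQL server reports two affected rows when a conflicting row had to be deleted first):

```python
def replace_into(table: dict, row: dict, pk: str = "personid") -> int:
    """Simulate MySQL's REPLACE on an in-memory table keyed by primary key.

    Returns the affected-row count the server would report:
    1 for a plain insert, 2 when a conflicting row is deleted first."""
    affected = 1
    if row[pk] in table:      # duplicate-key conflict:
        del table[row[pk]]    # 1) delete the conflicting row
        affected += 1
    table[row[pk]] = row      # 2) insert the new row
    return affected

persons = {}
print(replace_into(persons, {"personid": 3, "lastname": "Colonel"}))  # prints 1
print(replace_into(persons, {"personid": 3, "lastname": "Vil"}))      # prints 2
```

The second call illustrates the problem below: on a node where the row already exists, REPLACE is effectively a DELETE followed by an INSERT, and the replicated row event can conflict on the other node.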

That means that if we execute a REPLACE on PS-primary-1 for a row that does not exist there but does exist on PS-primary-2 (the row with ID 3, for example), the statement will succeed locally but, once replicated, will fail with a duplicate key error on the other node, which I’ll show you below:

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 09:20:13');
Query OK, 1 row affected (0.01 sec)

We connected to PS-primary-2, and indeed, we found the duplicate key error:

PS-primary-2 (none)> PAGER grep "Last_Err"; SHOW SLAVE STATUS\G
PAGER set to 'grep "Last_Err"'
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table foo.persons; Duplicate entry '3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000359, end_log_pos 26840
1 row in set (0.01 sec)

The commands we should execute would be the following:

set sql_log_bin=0;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');

However, as we observed, if these statements are allowed to replicate, the first one will fail due to a duplicate key. So, how do we resolve this? The safest option, in my humble opinion, is to execute the three REPLACE statements on each of the instances, preceded on every instance by this command to disable binary logging for the session:

PS-primary-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-1 > Bye


PS-primary-2 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-2 > Bye


PS-replica-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)


Oct
31
2023
--

Resolving Data Drift in a Dual-Primary Topology With Replica in MySQL/MariaDB


Hello friends,

In Managed Services, we have the opportunity to see different technologies and various topologies, which makes the work fascinating and challenging at the same time.

This time, I’m going to tell you about a particular case: a client with a dual-primary topology plus a replica, as detailed below:

PS-primary-1=192.168.0.14 [RW]
  |___ PS-primary-2=192.168.0.59 [RW] (Slave_delay: 0)
  |___ PS-replica-1=192.168.0.99 [R] (Slave_delay: 0)

[RW] means Read/Write access. 

[R] means Read Only access.

We received an alert of this kind on PS-primary-2:

Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table foo.persons; Can't find record in 'persons', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000360, end_log_pos 58177

Seconds later, we noticed that someone (not us) resolved the replication issue, possibly by skipping the error in some way (using pt-slave-restart) or simply executing something like:

mysql> STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

Or by injecting an empty transaction.

(Disclaimer: We won’t deeply analyze the root cause here, but rather the surprise we found during the analysis.)

Having seen this, the most advisable action is to run pt-table-checksum, and if any differences appear, use pt-table-sync to fix the data drift.

Therefore, we executed the pt-table-checksum in the following way:

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T12:40:31      0      1        6          1       1       0   0.193 foo.persons

(Second disclaimer: We have populated the table with a small amount of data to simplify and improve visibility in this case. With larger volumes, the same scenario also applies.)

(For security reasons, credentials are stored in the file /home/user/.my.cnf, which is why you don’t see them in any of the executed commands. If you want to know how to create it, here’s an example.)

At a glance, there seem to be differences. Let’s check the other primary instance and the replica as well.

PS-primary-2> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 08:40:31 | foo | persons |          5 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

PS-replica-1> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 09:53:10 | foo | persons |          4 |      1 |
+---------------------+-----+---------+------------+--------+
1 row in set (0.00 sec)

The next step is to execute pt-table-sync with the --print option to review what the utility will actually do. The following output appears:

$ pt-table-sync --print h=192.168.0.59 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='3' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.14 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.59 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1519 user:percona host:localhost.localdomain*/;

A DELETE statement has appeared. How is that possible?

As this is a dual-primary topology, it is advisable to perform a checksum on the other primary server as well, which is PS-primary-2. This will help identify any discrepancies and ensure data consistency between both primary servers.

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T13:08:12      0      1        5          1       1       0   0.147 foo.persons

That’s when we notice that one primary has six rows while the other has only five (and the replica, as we’ll see, four). How is that possible? Let’s review:

PS-primary-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

PS-replica-1 (none)> SELECT max(ts), db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
+---------------------+-----+---------+------------+--------+
| max(ts)             | db  | tbl     | total_rows | chunks |
+---------------------+-----+---------+------------+--------+
| 2023-07-28 10:03:00 | foo | persons |          6 |      1 |
+---------------------+-----+---------+------------+--------+

Let’s see what data the table currently contains:

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

As you can see, there is data drift everywhere. 

And how can this be possible? Data drift and the integrity of replica data are complex topics. Let’s outline some of the most common ways this can happen (it is by no means an exhaustive list):

  • A statement was executed on a primary with SET SESSION sql_log_bin = OFF
  • A statement was executed directly on the replica; this can happen if the replica was not in super_read_only and a user with SUPER privileges wrote to it, or if it was not even in read_only
  • A statement was executed on a replica, and the replica was later promoted to a primary without GTID in place
  • A primary server was not configured for full ACID compliance, and it crashed
  • At some point, the primary was not configured for row-based replication (even briefly)

These are the most common ways that data drift occurs, but certainly not the only ways that it can happen. More exotic cases can involve bugs, engine differences, version differences, etc.

Coming back to the issue, if we run pt-table-sync targeting the second primary instance, we will see the following result:

$ pt-table-sync --print h=192.168.0.14 --sync-to-master --database=foo --table=persons --replicate percona.checksums

DELETE FROM `foo`.`persons` WHERE `personid`='1' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
DELETE FROM `foo`.`persons` WHERE `personid`='7' LIMIT 1 /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10') /*percona-toolkit src_db:foo src_tbl:persons src_dsn:P=3306,h=192.168.0.59 dst_db:foo dst_tbl:persons dst_dsn:h=192.168.0.14 lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:5732 user:percona host:localhost.localdomain*/;

Here, the output is less promising: it suggests deleting the rows with ID 1 and 7, and performing a REPLACE on ID 3.

Note that it asks us to delete rows, which is not correct here. So, what should we do? After discussing it with the customer (and sharing everything we had gathered), we agreed on the following:

  • We need to add the row with ID=3 to the instances where it’s missing.
  • We need to add the row with ID=1 to the instances where it’s missing.
  • We need to add the row with ID=7 to the instances where it’s missing.
  • We DO NOT need to delete any rows; no DELETES are necessary.

Important note: The client informed us that the rows that exist on only one instance (and not on the others) are needed, and they verified that the actions we proposed were correct. Remember: as DBAs, we are the guardians of the data and its integrity, but not necessarily its owners; for ethical reasons, we may not even know its content.
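That “add, never delete” reasoning can be sketched in a few lines of Python (purely illustrative; the helper name and data layout are hypothetical, not part of any Percona tool): take the union of rows across all instances and compute what each one is missing. This assumes, as the client confirmed here, that a given PersonID never holds conflicting data on two instances:

```python
def missing_rows(instances):
    """Given {host: {personid: row}}, return {host: sorted ids that host
    is missing relative to the union of rows seen anywhere}."""
    union = {}
    for rows in instances.values():
        union.update(rows)          # union of all rows, keyed by primary key
    return {host: sorted(k for k in union if k not in rows)
            for host, rows in instances.items()}

# With the row sets we saw above (last names only, for brevity),
# each host only needs additions -- no deletes anywhere:
plan = missing_rows({
    '192.168.0.14': {1: 'Iglesias', 2: 'Rooster', 4: 'SoS',
                     5: 'Al-lot', 6: 'The Crazy', 7: 'Ewd'},
    '192.168.0.59': {2: 'Rooster', 3: 'Colonel', 4: 'SoS',
                     5: 'Al-lot', 6: 'The Crazy'},
    '192.168.0.99': {2: 'Rooster', 4: 'SoS', 5: 'Al-lot', 6: 'The Crazy'},
})
print(plan)
# {'192.168.0.14': [3], '192.168.0.59': [1, 7], '192.168.0.99': [1, 3, 7]}
```

Each “missing” ID then maps to one REPLACE statement executed locally on that host.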

How do we achieve this, then?

Unfortunately, in this case, using pt-table-sync is not recommended since, as we have seen with the --print option, it would perform DELETEs in both cases, regardless of which primary instance we consider the source of truth.

We also cannot simply execute the REPLACE statements on PS-primary-1, because the REPLACE command follows a specific logic, explained here:

MySQL uses the following algorithm for REPLACE (and LOAD DATA … REPLACE):

  1. Try to insert the new row into the table
  2. While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
    1. Delete from the table the conflicting row that has the duplicate key value
    2. Try again to insert the new row into the table
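This two-step logic can be sketched in Python (an illustration only, not MySQL internals), modeling a table as a dict keyed by primary key:

```python
def replace_into(table, key, row):
    """Simulate MySQL's REPLACE algorithm: return rows affected --
    1 for a plain insert, 2 when a conflicting row with the same
    key had to be deleted first."""
    affected = 1
    if key in table:        # duplicate-key conflict on the primary key...
        del table[key]      # ...delete the conflicting row
        affected += 1       # MySQL counts the delete as an affected row too
    table[key] = row        # retry the insert, which now succeeds
    return affected

# On PS-primary-1, row 3 is absent, so REPLACE behaves as a plain insert
# and, with row-based replication, is binlogged as a write of a new row:
primary_1 = {1: 'Iglesias', 7: 'Ewd'}
print(replace_into(primary_1, 3, 'Colonel'))   # 1
```

Replaying that binlogged write on PS-primary-2, where row 3 already exists, is exactly what triggers the duplicate key error shown next.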

That means that if we execute a REPLACE on PS-primary-1 for a row that does not exist there but does exist on PS-primary-2 (such as the row with ID 3), the replicated event will result in a duplicate key error on PS-primary-2, which I’ll show you below (as the row already exists there):

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 09:20:13');
Query OK, 1 row affected (0.01 sec)

We connected to PS-primary-2, and indeed, we found the duplicate key error:

PS-primary-2 (none)> PAGER grep "Last_Err"; SHOW SLAVE STATUS\G
PAGER set to 'grep "Last_Err"'
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table foo.persons; Duplicate entry '3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000359, end_log_pos 26840
1 row in set (0.01 sec)

The commands we should execute would be the following:

set sql_log_bin=0;
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');

However, as we observed, the first one would fail with a duplicate key error once it replicates.  So, how do we resolve this?  The safest option, in my humble opinion, is to execute the three REPLACE statements on each of the instances individually, first disabling binary logging for the session with this command:

PS-primary-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-1 > Bye


PS-primary-2 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.00 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-primary-2 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)


PS-primary-2 > Bye


PS-replica-1 > set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('3', 'Colonel', 'Vil', 'PER', 'Lima', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('1', 'Iglesias', 'Par', 'IND', 'Jaipur', '2023-07-27 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > REPLACE INTO `foo`.`persons`(`personid`, `lastname`, `firstname`, `address`, `city`, `insert_timestamp`) VALUES ('7', 'Ewd', 'Woo', 'CAN', 'Vancouver', '2023-07-28 07:10:10');
Query OK, 1 row affected (0.01 sec)

PS-replica-1 > set global super_read_only=1; set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

Finally, we verify that all instances have the same data, either with the SELECT statement or the pt-table-checksum command.

$ for SERVE in 192.168.0.14 192.168.0.59 192.168.0.99; do echo ; echo $SERVE ; mysql -h $SERVE -e "select * from foo.persons"; done

192.168.0.14
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.59
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

192.168.0.99
+----------+-----------+-----------+---------+------------+---------------------+
| PersonID | LastName  | FirstName | Address | City       | Insert_timestamp    |
+----------+-----------+-----------+---------+------------+---------------------+
|        1 | Iglesias  | Par       | IND     | Jaipur     | 2023-07-27 07:10:10 |
|        2 | Rooster   | War       | IND     | Tamil Nadu | 2023-07-27 07:10:10 |
|        3 | Colonel   | Vil       | PER     | Lima       | 2023-07-27 07:10:10 |
|        4 | SoS       | Syl       | CAN     | Ontario    | 2023-07-27 07:10:10 |
|        5 | Al-lot    | Pat       | IND     | Delhi      | 2023-07-27 07:10:10 |
|        6 | The Crazy | Vai       | IND     | Vadodara   | 2023-07-27 07:10:10 |
|        7 | Ewd       | Woo       | CAN     | Vancouver  | 2023-07-28 07:10:10 |
+----------+-----------+-----------+---------+------------+---------------------+

$ pt-table-checksum h=192.168.0.14 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons

Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:32      0      0        7          0       1       0   0.185 foo.persons

$ pt-table-checksum h=192.168.0.59 --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate percona.checksums --recursion-method=hosts --max-load Threads_running=50 --max-lag=100 --databases=foo --tables=persons
Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
07-28T14:12:50      0      0        7          0       1       0   0.148 foo.persons

Conclusion

Maintaining this type of topology (dual primaries) can be a headache; it is prone to exactly this kind of problem, and resolving it is not always as simple as it may seem. In most cases, the tools pt-table-checksum and pt-table-sync are essential for detecting, verifying, and resolving these issues efficiently, as they were here.

Remember that if you have any questions or concerns, you can always contact us.

Lastly, don’t forget to run pt-table-sync with the --print option, and to verify and share with the client or data owner whether the data to be corrected and the proposed corrections seem appropriate. Always.

Let’s stay synchronized, and I’ll see you in the next blog!

 

Oct
30
2023
--

Talking Drupal #422 – Commerce Kickstart

Today we are talking about Commerce Kickstart, Commerce in General, and What’s new at Centarro with guest Ryan Szrama. We’ll also cover Navigation as our module of the week.

For show notes visit: www.talkingDrupal.com/422

Topics

  • High level overview of commerce kickstart
  • Is it a distribution
  • Will it move to recipes
  • Why use commerce directly over kickstart
  • Does kickstart lend itself to a specific type of site
  • Compare with Shopify
  • Do you ever recommend Shopify
  • Are there additional conditions or events being added
  • Can people contribute to kickstart
  • What is Centarro focused on now
  • What is the state of headless

Resources

Guests

Ryan Szrama – ryanszrama.com rszrama

Hosts

Nic Laflin – nLighteneddevelopment.com nicxvan John Picozzi – epam.com johnpicozzi Mark Casias – kanopi.com markie

MOTW Correspondent

Martin Anderson-Clutz – @mandclu Navigation

  • Brief description:
    • Would you like to try out and give feedback on a proposed overhaul to how Drupal’s administration menu works? There’s a module for that.
  • Brief history
    • How old: project created in 2004, but the namespace was taken over earlier this year as a place to work on the proposed new navigation in the contrib space
    • Versions available: No releases yet, so you need to clone the repo into your custom modules
  • Maintainership
    • Very active development, commits in the past day
  • Number of open issues:
    • 46, 14 of which are bugs
  • Usage stats:
    • Officially 1 site is using it, but not recommended for production anyway
  • Maintainer(s):
    • Include Cristina Chumillas, Sascha Eggenberger, Lauri Eskola, Mike Herchel, and more
  • Module features and usage
    • At this point, really a prototype, trying to define what the user experience should be
    • Worth noting that the latest release for the Gin admin theme also includes this new updated navigation as an experimental feature that can be updated, but still best to leave feedback on the Navigation project
    • The main idea is that instead of having dropdowns that fly out for deeper level menu items, the navigation is in a sidebar, with menu items that expand to reveal child items when clicked
    • It’s worth noting that dropdown menus with multiple levels handled in flyouts are a known usability pain point, and are often cited by industry experts like Jakob Nielsen as something to avoid
    • There are still some usability issues to be thought through with this approach, for example there is no longer a way to reach the top page of a section or subsection, because clicking on the link shows or hides the child items instead
    • This was a subject of some very active discussions at DrupalCon Europe last week, so I thought it would be good to cover it this week so our listeners can add their voices
Oct
23
2023
--

Talking Drupal #421 – The Future of Drupal

Today we are talking about The Future of Drupal article, What Challenges Drupal may have, and How we can overcome them together! with guest Ricardo Marcelino.

For show notes visit: www.talkingDrupal.com/421

Topics

  • Can you describe the article this topic is based on
  • Module percentages
  • Drupal’s challenges
  • Drupal’s unique selling proposition
  • How can Drupal expand relevance
  • Why is relevance important
  • How does Drupal move forward
  • What’s next for Drupal
  • Do you think this is a natural change for a project like Drupal
  • How do we ensure the future of Drupal

Resources

Guests

Ricardo Marcelino – omibee.com rfmarcelino

Hosts

Nic Laflin – nLighteneddevelopment.com nicxvan John Picozzi – epam.com johnpicozzi Mark Casias – kanopi.com markie

MOTW Correspondent

Martin Anderson-Clutz – @mandclu Next.js (https://www.drupal.org/project/next)

  • Brief description:
    • Have you ever wanted to build a website with a React-based front end, but with lots of the robust and mature CMS capabilities that Drupal provides? There’s a module for that.
  • Brief history
    • How old: created in Jan 2021 by shadcn, who continues to support it, including in the #nextjs channel in Drupal slack
  • Versions available:
    • 1.6.3 for Drupal 9 & 10
  • Maintainership
    • Actively maintained, though it uses a Github repo as its place for collaboration, including issues
  • Number of open issues:
    • 135 open, 12 of them bugs
  • Test coverage? Y
  • Usage stats:
    • 1,249 sites
  • Maintainer(s):
    • shadcn, who continues to support it, including in the #nextjs channel in Drupal slack
  • Module features and usage:
    • For anyone not familiar with Next.js, it’s a React framework for building front-end applications that gives developers a number of useful capabilities, such as dynamic routing, performance optimizations, integrations, and more
    • The Next.js module for Drupal optimizes your Drupal backend for use as the content repository for a Next.js front end
    • It adds functionality like headless preview of unpublished content, even across multiple front end apps
    • It also supports Incremental Static Regeneration, a best practice to ensure your front end site can serve static pages for the fastest possible delivery, but without the risk of serving stale content
    • Works with the JSON:API Menu Items and JSON:API Views Drupal modules to allow even more of your front end site to be managed within your Drupal backend
    • There’s even a Next.js Webform module that allows your Drupal site builder to create webforms, and have the React forms automatically created
    • There is also a starter project, “next-drupal-basic-starter”, also maintained by shadcn, that helps to get your Next.js front end working with Drupal more quickly
Oct
20
2023
--

Database Migration Plan: Avoiding Common Pitfalls in Open Source Migration


Many organizations are turning to open source solutions to streamline their operations and reduce costs. Open source migration can be a game-changer, offering flexibility, scalability, and cost-effectiveness. However, it’s not without its challenges. Below, we’ll explore the common pitfalls of open source migration and provide insights on how to avoid them.

Common pitfalls when migrating to open source

Lack of proper planning

Proper is the operative word here. Some level of planning is expected when migrating to open source. Heck, typing “database migration plan” into Google and reading this blog could constitute planning. But what separates proper planning from, well, “planning” planning?

In brief, a proper database migration plan includes: 

  • Establishing clear objectives: Define what you want to achieve and how open source will get you there. Are you looking to cut costs? Free yourself from lock-in? Provide self-service to developers?
  • Creating a detailed project plan: Develop a comprehensive project plan that outlines the tasks, timelines, and responsibilities of all stakeholders.
  • Taking stock of resources:  Do you have the right people with the necessary skills on board? Or is it necessary to call in an expert third party? Can you allocate an appropriate budget and time frame for the project?
  • Identifying and mitigating risks: Conduct a thorough risk assessment to identify potential obstacles. Develop strategies to mitigate these risks and have contingency plans in place.
  • Communication and training: Keep all stakeholders informed about the progress and changes throughout the migration process. Provide training to staff members, DBAs, and developers to ensure they are well-prepared for the transition.
  • Testing and QA: Test the open source solution thoroughly to identify and address any issues before the full-scale migration.

Bottom line: Avoid a rushed migration to open source. The last thing you want to do is struggle with project scope, timelines, and resource allocation.

Inadequate training

Transitioning to open source requires a team that understands your target database solution. Failure to provide adequate training to your operations and development staff — on database configuration, best practices, and backup and recovery techniques, to name a few key areas — can lead to frustration, resistance, and inefficiencies.

Neglecting compatibility issues

Compatibility between existing systems and open source software is critical. Neglecting this aspect can result in data loss, functionality issues, and integration challenges. To assess compatibility and ensure a successful database migration, we recommend doing the following: 

  • Assessment: Assess your existing database, applications, and infrastructure. Document the current state, including data schemas, stored procedures, and dependencies.
  • Data mapping and transformation: Create a detailed data map that identifies the structure and format of your existing data. Use ETL (Extract, Transform, Load) tools to assist with this process.
  • Testing: Create a testing environment that mirrors your production setup. Test the migration thoroughly, including data integrity, application functionality, and performance. 
  • Modify or refactor applications: If your applications are tightly coupled with the proprietary database, you may need to modify or refactor them to work with the open source database. This could involve changing SQL queries, database drivers, or data access layers.
  • Data validation and migration: Before migration, ensure that your data is clean and consistent. Remove duplicates, address data quality issues, and validate data to prevent issues during migration.
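
As a simplified illustration of the validation step above, the duplicate and consistency checks can be sketched in Python. This is only a sketch: the record fields (“id”, “email”) and the in-memory row list are hypothetical stand-ins for rows pulled from your source database.

```python
# Minimal sketch of pre-migration data validation: flag duplicate keys
# and missing required fields before loading rows into the target database.
# The field names ("id", "email") are hypothetical examples.

def validate_rows(rows, key_field, required_fields):
    """Return (clean_rows, issues) for a list of dict records."""
    seen_keys = set()
    clean, issues = [], []
    for row in rows:
        key = row.get(key_field)
        if key in seen_keys:
            issues.append((key, "duplicate key"))
            continue
        missing = [f for f in required_fields if not row.get(f)]
        if missing:
            issues.append((key, f"missing fields: {missing}"))
            continue
        seen_keys.add(key)
        clean.append(row)
    return clean, issues

rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": "a@example.com"},   # duplicate key
    {"id": 2, "email": ""},                # missing required field
    {"id": 3, "email": "c@example.com"},
]
clean, issues = validate_rows(rows, "id", ["email"])
print(len(clean), len(issues))  # 2 valid rows, 2 flagged
```

In practice, you would run equivalent checks as queries against the source database (or via your ETL tool) so that problem rows are fixed before migration, not after.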

Overlooking security

We wrote about this topic in greater depth in our blog post, Open Source Software Security: Is Open Source Software Safe?, but, in short, open source software is no more or less secure than proprietary software. Still, misconfigurations and lax security practices during migration can expose your organization to vulnerabilities. Take the time to ensure that the open source database meets your security and compliance requirements by conducting security audits and implementing the necessary access controls and encryption measures.

Underestimating customization efforts

Open source software often requires customization to meet specific business, security, and compliance needs. For example, PostgreSQL, a popular destination for organizations leaving proprietary databases like Oracle, is not enterprise-ready out of the box. It lacks many of the high availability and security features critical for production environments. Underestimating the effort required for customization can lead to project delays.

Poor communication

Effective communication among stakeholders is crucial. Miscommunication can lead to unclear expectations, misaligned priorities, scope creep, decision delays, and risk ignorance. It can also exacerbate any of the aforementioned problems. Clearly communicate project goals with all necessary stakeholders, provide regular updates, ensure transparency, and thoroughly define roles, responsibilities, and expectations. 

Budget and resource constraints

Cost savings are a key driver of open source adoption. Yet open source migration costs can escalate if not managed properly. Then there are the opportunity costs that can originate from a delayed (or botched) migration. What happens, for example, if you experience prolonged periods of downtime? Give serious consideration both to your in-house staff’s ability to execute a migration and to the need to budget for any unforeseen expenses.

Failure to conduct quality assurance and testing

Thorough database migration testing is essential to ensure that the open source solution functions correctly. Skipping this step can lead to post-migration issues. Specifically, quality assurance and testing should involve:

  • Functional testing: Ensure that all features and functions of the open source solution work as intended. Test various scenarios to validate functionality.
  • Data migration testing: Verify that data is accurately and securely migrated from the old system to the new one. Test data integrity, formats, and relationships.
  • Integration testing: Assess how the open source solution interacts with other systems, including data flow and synchronization.
  • Performance testing: Evaluate the system’s performance under different conditions, such as load testing, to ensure it can handle expected user loads.
  • Security testing: Perform security assessments to identify and address vulnerabilities.
  • Regression testing: After any changes or updates, conduct regression testing to ensure that new features or fixes do not introduce new issues.
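
The data migration testing step above can be sketched as a simple count-and-checksum comparison. This is an illustrative sketch, not a production tool: the row lists here are in-memory stand-ins for result sets you would fetch from the source and target databases.

```python
import hashlib

# Minimal sketch of data-migration verification: compare row counts and an
# order-independent checksum of the rows between source and target.

def table_checksum(rows):
    """Order-independent checksum over string-serialized rows."""
    digests = sorted(hashlib.sha256(repr(r).encode()).hexdigest() for r in rows)
    return hashlib.sha256("".join(digests).encode()).hexdigest()

def verify_migration(source_rows, target_rows):
    if len(source_rows) != len(target_rows):
        return False, "row count mismatch"
    if table_checksum(source_rows) != table_checksum(target_rows):
        return False, "checksum mismatch"
    return True, "ok"

source = [(1, "alice"), (2, "bob")]
target = [(2, "bob"), (1, "alice")]  # same data, different order
print(verify_migration(source, target))  # (True, 'ok')
```

Sorting the per-row digests makes the comparison order-independent, which matters because the two databases may return rows in different physical order.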

Documentation gaps

Maintaining comprehensive documentation is often overlooked. Over time, details about the migration process, configurations, and customizations can be lost. Documentation helps prevent knowledge erosion, onboarding hurdles, and troubleshooting delays.

Ignoring community support

Open source thrives on community support. A strong and active community can provide valuable resources, including tutorials, forums, and documentation for solving technical issues and maintaining performant operations. Ignoring this valuable resource can also limit your access to updates, patches, and best practices.

Database migration planning: Next steps

Our experts will help you create and validate an open source database migration strategy tailored to your unique business and technical requirements. Then, if you want, they can help you execute. You can learn more here. 

If you’re not yet ready — or wish to learn more about planning for a database migration —  check out our Database Migration Planning Checklist. It’s full of helpful tips and tricks to help you plan your move. 

 

Get your Database Migration Planning Checklist

Oct
20
2023
--

Scalable Solutions with Percona Distribution for PostgreSQL (Part 2): Using Citus

Scalable Solutions with Percona Distribution for PostgreSQL Using Citus

This blog post is part two of a multi-post series about using the Citus extension with Percona Distribution for PostgreSQL. You can read part one here: Scalable Solutions With Percona Distribution for PostgreSQL: Set Up Three PostgreSQL Database Instances.

Citus is an open-source extension for PostgreSQL that expands its capacity to manage databases of varying scales, ranging from individual nodes to extensively distributed database clusters. Citus maintains seamless integration with PostgreSQL.

Citus has many advantages, one of which is sharding, and that is what we are going to explore in this blog post using Percona Distribution for PostgreSQL.

This is how Citus defines sharding: “Sharding is a technique used in database systems and distributed computing to horizontally partition data across multiple servers or nodes. It involves breaking up a large database or dataset into smaller, more manageable parts called Shards. Each shard contains a subset of the data, and together, they form the complete dataset.”

postgresql sharding

Source: Citus official website, “Distribute Tables”

There are three types of tables in a Citus cluster, each serving a distinct purpose. We will be utilizing the Distributed Tables type.

“These appear to be normal tables to SQL statements but are horizontally partitioned across worker nodes.” – Citus

Percona Distribution for PostgreSQL is a collection of tools to assist you in managing your PostgreSQL database system: it installs PostgreSQL and complements it with a selection of extensions and components that enable solving essential practical tasks efficiently, such as HAProxy, Patroni, pgAudit, and more.

The goal of this tutorial is to show a simplified example of how you might use Citus with Percona Distribution for PostgreSQL 15.

Ready?

Let’s start with…

For this tutorial, I am using

  1. Operating System Ubuntu 20.04
  2. Percona Distribution for PostgreSQL 15 on Ubuntu

If you want to check the version of Percona PostgreSQL, you can use the ‘psql’ command-line utility for interacting with PostgreSQL databases. To do this, open a terminal and run the following SQL query: “SELECT version();” This will display the version of Percona PostgreSQL installed on your system.

sudo su postgres
psql

You will have this as an output:

psql (15.4 - Percona Distribution)
Type "help" for help.

postgres=# SELECT VERSION();
                                                           version
------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.4 - Percona Distribution on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

Installing Citus for Distributed PostgreSQL

Let’s install Citus version 12 in our three nodes of Percona Distribution for PostgreSQL.

# Let's download the deb.sh script to install Citus
curl https://install.citusdata.com/community/deb.sh > add-citus-repo.sh

# Execute the script
sudo bash add-citus-repo.sh

# Install Citus package
sudo apt-get -y install postgresql-15-citus-12.0

Add Citus as a PostgreSQL library

Once Citus is installed, we will add “citus” as a shared library to preload when the PostgreSQL server starts. This should be done on all three nodes of Percona Distribution for PostgreSQL.

sudo su postgres
psql
ALTER SYSTEM SET shared_preload_libraries=citus;

Restart Percona PostgreSQL to apply the changes.

sudo systemctl restart postgresql

Create the Citus extension

We will create and enable the Citus extension within the database. Make sure you run these commands with the “postgres” user, also on all three Percona Distribution for PostgreSQL nodes.

sudo su postgres
psql
CREATE EXTENSION citus;

Now, let’s check if the Citus extension has been correctly created.

# Checking Citus version
select citus_version();

You will see an output similar to this:

citus_version
----------------------------------------------------------------------------------------------------
 Citus 12.0.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

Set up a multi-node Citus cluster

We start by setting the coordinator host. This is used in a Citus distributed database environment to set the IP address and port of the coordinator node. We will take the first node we created as a coordinator host.

This coordinator node manages the overall cluster and coordinates queries and data distribution among other nodes.

sudo su postgres
psql
SELECT citus_set_coordinator_host('172.31.88.41', 5432);

It is time to add each node to the Citus cluster. Make sure to run these commands with “postgres” user.

SELECT * from citus_add_node('172.31.86.26', 5433);
SELECT * from citus_add_node('172.31.89.45', 5434);

Check distributed worker nodes

Now, let’s list the distributed nodes in the cluster, including the main node.

sudo su postgres
psql
postgres=# SELECT * FROM pg_dist_node;
 nodeid | groupid |   nodename    | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards
--------+---------+---------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------
      1 |       0 | 172.31.88.41 |     5432 | default  | t           | t        | primary  | default     | t              | f
      6 |       5 | 172.31.86.26 |     5433 | default  | t           | t        | primary  | default     | t              | t
      7 |       6 | 172.31.89.45 |     5434 | default  | t           | t        | primary  | default     | t              | t
(3 rows)

Or you can also list the active Postgres Citus nodes in the cluster:

postgres=# SELECT * FROM citus_get_active_worker_nodes();
 node_name | node_port
-----------+-----------
 172.31.86.26 |      5433
 172.31.89.45 |      5434
(2 rows)

Once this is ready, we will log in to the coordinator node and create our “users” and “events” tables.

Creating the database

```sql
-- Create basic users table
CREATE TABLE users (
    user_id serial PRIMARY KEY,
    user_name text,
    email text
);
-- Create basic events table
CREATE TABLE events (
    event_id bigserial,
    user_id int,
    event_time timestamp,
    data jsonb,
    PRIMARY KEY (user_id, event_id)
);
```

In this example, we create two tables:

“users” table with the following columns:

  • user_id: This column will be used for sharding.
  • user_name: Name of the user text type.
  • email:  Email of the user text type.

And  “events” table with the following columns:

  • event_id: A bigserial column that, together with user_id, forms the composite primary key to ensure unique event IDs.
  • user_id: The ID of the user who generated the event. This column will be used for sharding.
  • event_time: A timestamp to record when the event occurred.
  • data: A JSONB column for storing event-related data.

Creating distributed tables

Now, let’s distribute the “users” and “events” tables across shards placed locally or on the Postgres Citus nodes. In this case, we are distributing through the “user_id” field.

SELECT create_distributed_table('users', 'user_id');
SELECT create_distributed_table('events', 'user_id');

Insert data into our sharded tables

Now, insert data into the sharded table:

-- Insert user data
INSERT INTO users (user_name, email)
VALUES
    ('user1', 'user1@example.com'),
    ('user2', 'user2@example.com'),
    ('user3', 'user3@example.com');

-- Insert event data
INSERT INTO events ( user_id, event_time, data)
VALUES
    (1, '2023-09-01 10:00:00', '{"temperature": 25.5, "humidity": 62}'),
    (2, '2023-09-01 11:30:00', '{"temperature": 22.0, "humidity": 56}'),
    (1, '2023-09-02 08:15:00', '{"temperature": 24.0, "humidity": 59}'),
    (3, '2023-09-08 07:06:00', '{"temperature": 25.8, "humidity": 66}'),
    (1, '2023-09-10 04:00:00', '{"temperature": 20.0, "humidity": 54}'),
    (1, '2023-09-04 10:00:00', '{"temperature": 25.5, "humidity": 62}'),
    (2, '2023-09-05 11:30:00', '{"temperature": 22.0, "humidity": 50}'),
    (1, '2023-09-03 08:15:00', '{"temperature": 24.0, "humidity": 56}'),
    (3, '2023-09-06 07:06:00', '{"temperature": 25.8, "humidity": 65}'),
    (1, '2023-09-09 04:00:00', '{"temperature": 20.0, "humidity": 52}');

When you use user_id as the distribution column to create a distributed table in Citus, you are effectively “hash-partitioning” the data based on the user_id column. This means that events with the same user_id will be stored together on the same shard.

Citus takes the values in the distribution column, applies a hash function to each value, and assigns each row to a shard based on the hash value. Rows with the same distribution value always land on the same shard, while different values are spread across shards in a way that attempts to distribute the data evenly.
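
To make the idea concrete, here is a small Python sketch of hash partitioning. It is illustrative only: it does not use Citus’s actual hash function, and the shard count and event data are hypothetical. The point is that hashing the distribution column makes shard assignment deterministic, so all rows with the same user_id land in the same shard.

```python
import hashlib

# Illustrative sketch of hash partitioning (NOT Citus's real hash function):
# hash the distribution column value and take it modulo the shard count.

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(user_id, num_shards=NUM_SHARDS):
    """Deterministically map a distribution-column value to a shard number."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

events = [(1, "login"), (2, "click"), (1, "logout"), (3, "view")]
placement = {}
for user_id, event in events:
    placement.setdefault(shard_for(user_id), []).append((user_id, event))

# All events for user_id 1 end up in the same shard bucket.
print(placement)
```

Because the mapping depends only on the value, joins and lookups filtered by user_id can be routed to a single shard instead of touching the whole cluster.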

Testing and query analysis

Query the sharded table:

SET citus.explain_all_tasks TO on;
EXPLAIN ANALYZE SELECT * FROM events;

“SET citus.explain_all_tasks TO on”; is used to enable the “explain all tasks” feature in the Citus extension for PostgreSQL. This feature is particularly useful when you’re working with distributed queries in a Citus cluster.

“EXPLAIN ANALYZE”, is used to analyze and explain the execution plan of a SQL query in PostgreSQL. 

We can see all tasks, and it also shows all Postgres Citus nodes.

We will analyze the first task.

The output showcases the advantage of parallel processing. Citus demonstrated its ability to parallelize operations, distributing the workload across multiple nodes, in this case on the nodes on ports 5433 and 5434, resulting in remarkably fast query execution times. 

This was an example of how to use Citus to shard the “events” table. The main advantage of sharding with Citus is its ability to scale PostgreSQL databases horizontally, spreading data across multiple nodes to accommodate growing datasets and high workloads.

Note: Sharding may not be necessary for all applications; it’s typically most beneficial when dealing with large-scale datasets and high-concurrency workloads.

To learn more, check out Percona Distribution for PostgreSQL and multi-node Citus, and visit our Percona Community Forum page in case you have questions. We are happy to help!

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Oct
19
2023
--

Scalable Solutions With Percona Distribution for PostgreSQL (Part 1): Set Up Three PostgreSQL Database Instances

Scalable Solutions With Percona Distribution for PostgreSQL

Welcome to the first installment of our series: Scalable Solutions with Percona Distribution for PostgreSQL. In this guide, we will demonstrate how to establish a Citus database spanning multiple nodes by using Percona Distribution for PostgreSQL. This setup empowers your database to efficiently manage increased data volumes, enhance performance, and maintain availability.

This initial section will guide you through setting up Percona Distribution for PostgreSQL on three EC2 instances.

Then, in the next part of this series, Scalable Solutions with Percona Distribution for PostgreSQL (Part 2): Using Citus, we will use these worker nodes to establish a multi-node Citus cluster and implement sharding techniques.

This tutorial will use Ubuntu 20.04 and Percona Distribution for PostgreSQL 15.

What we have so far is this:

  • Three EC2 instances in the same subnet.

Three EC2 instances in the same subnet

  • A security group in AWS for PostgreSQL with inbound rules allowing traffic on ports 5432, 5433, and 5434 for PostgreSQL.

Installing Percona Distribution for PostgreSQL on the three instances

Let’s first install Percona Distribution for PostgreSQL on each EC2 instance. Use the following procedure on all three nodes.

# Download the Debian package file for Percona Release
wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb

# Install Percona Debian package
sudo dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb

# Refresh local package database
sudo apt update

# Add Percona repository for PostgreSQL to our system's list of repositories.
sudo percona-release setup ppg-15

# Install "percona-ppg-server-15" package
sudo apt install percona-ppg-server-15

Now, let’s verify if the service is operating correctly.

sudo systemctl status postgresql.service

You should see this output, indicating an ‘Active’ status in all the instances.

● postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
     Active: active (exited) since Mon 2023-09-18 13:36:07 UTC; 1h 15min ago
    Process: 706 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 706 (code=exited, status=0/SUCCESS)

If you want to check the version of Percona PostgreSQL, you can use the ‘psql’ command-line utility for interacting with PostgreSQL databases. To do this, open a terminal and run the following SQL query: “SELECT version();” This will display the version of Percona Distribution PostgreSQL installed on your system.

sudo su postgres
psql

You will have this as an output:

postgres=# SELECT VERSION();
                                                           version
------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.4 - Percona Distribution on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

Configuring postgresql.conf on three instances

Before starting the database on each node, modifying specific configuration files is essential. To do this, follow these steps:

  1. Navigate to the PostgreSQL Configuration Directory: Locate the ‘postgresql.conf’ file, typically found at the path ‘/etc/postgresql/15/main’. You will need to access this file to make the necessary adjustments.
  2. Log in as the ‘postgres’ User: To access and modify the ‘postgresql.conf’ file, you should log in using the “postgres” user account, ensuring the necessary permissions for configuration changes.
sudo su postgres

  3. Configure Essential Settings: You’ll configure vital settings within the ‘postgresql.conf’ file. These settings may include specifying each node’s IP address, port, and SSL certificate. Ensure that these values align with your desired database configuration.

Node main

# Ip address
listen_addresses = '172.31.88.41'

# Port for node1_data worker
port = 5432

# - SSL -
ssl = on
ssl_cert_file = '/etc/ssl/certs/ssl-cert-snakeoil.pem'
ssl_key_file = '/etc/ssl/private/ssl-cert-snakeoil.key'

Node worker01

# Ip address
listen_addresses = '172.31.86.26'

# Port for node1_data worker
port = 5433

# - SSL -
ssl = on
ssl_cert_file = '/etc/ssl/certs/ssl-cert-snakeoil.pem'
ssl_key_file = '/etc/ssl/private/ssl-cert-snakeoil.key'

Node worker02

# Ip address
listen_addresses = '172.31.89.45'

# Port for node1_data worker
port = 5434

# - SSL -
ssl = on
ssl_cert_file = '/etc/ssl/certs/ssl-cert-snakeoil.pem'
ssl_key_file = '/etc/ssl/private/ssl-cert-snakeoil.key'

Configuring pg_hba.conf for three instances

Now, let’s modify ‘pg_hba.conf,’ which can be found in the same directory as the ‘postgresql.conf’ file: ‘/etc/postgresql/15/main’.

Node main

# Database administrative login by Unix domain socket
local   all             postgres                                peer

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer

# IPv4 local connections:
host    all             all           127.0.0.1/32              trust

# IPv6 local connections:
host    all             all            ::1/128                 scram-sha-256

# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256

Node worker01

# Database administrative login by Unix domain socket
local   all             postgres                                peer

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer

# IPv4 local connections:
host    all             postgres        172.31.88.41/32         trust

# IPv6 local connections:
host    all             all           ::1/128                   scram-sha-256

# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256

Node worker02

# Database administrative login by Unix domain socket
local   all             postgres                                peer

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer

# IPv4 local connections:
host    all             postgres        172.31.88.41/32         trust

# IPv6 local connections:
host    all             all             ::1/128                 scram-sha-256

# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256

Note: Using “trust” in the settings is too permissive. See Citus notes about Increasing Worker Security. The PostgreSQL manual explains how to make them more restrictive.

Restart all PostgreSQL instances

Now, we will restart the PostgreSQL service on each instance.

sudo systemctl restart postgresql

The three Percona PostgreSQL nodes are now listening on ports 5432, 5433, and 5434.

Node main on port 5432

ubuntu@ip-172-31-88-41:~$ ss -tuln
Netid  State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port    Process
tcp    LISTEN   0       244   172.31.88.41:5432      0.0.0.0:*

Worker01 on port 5433

ubuntu@ip-172-31-86-26:~$ ss -tuln
Netid  State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port    Process
tcp    LISTEN   0      244    172.31.86.26:5433       0.0.0.0:*

Worker02 on port 5434

ubuntu@ip-172-31-89-45:~$ ss -tuln
Netid  State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port    Process
tcp    LISTEN   0      244    172.31.89.45:5434      0.0.0.0:*

Are you ready to see what we can do with these Percona Distribution for PostgreSQL nodes?

Check out Scalable Solutions with Percona Distribution for PostgreSQL (Part 2): Using Citus, and Percona Distribution for PostgreSQL 15: An Introduction to Scalable Database Solutions.

In the meantime, you can learn more about the Percona Distribution for PostgreSQL. Additionally, if you’re interested in deploying PostgreSQL on Kubernetes, you can explore information about the Percona Operator for PostgreSQL.

You can also contact Percona’s experts for assistance in maximizing your application’s performance. We offer support for open-source databases, managed services, and consulting services, or visit our Percona Community forum; we are happy to help.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Oct
19
2023
--

Our Kudos to All Percona Community Contributors

percona community

Percona stands for open source and keeps open source open for everyone. Open source software is impossible without the efforts of the community. Today, we want to personally thank the most active contributors. They not only contributed code but also reported issues, helped users on the forum, and wrote blog posts on the Community website. We highly appreciate all your effort and support.

Let’s meet the heroes!

Percona Monitoring and Management (PMM)

Naresh Chandra is very active on the forum, providing advice to other users. He has also filed more than 30 issues in Percona’s Jira, reporting bugs and suggesting PMM improvements. Our PMM team specifically highlighted Naresh’s contribution.

Marko Sutic reported issues related to Explain Plan.

Anish Rajan fixed the issue when some PRs failed on linters.

Jonas Genannt added the ability to receive freeStorage stats from dbstats.

naughtyGitCat fixed the issue where the exporter binary failed on an arbiter-role instance.

Leigh Ola, rikwasmus, Vishwas Sharma, and Marc Tudurí also did their part in improving PMM. Thank you!

Percona Kubernetes Operators

Vishal Anarase added support for a custom user secret name to Percona Operator for PostgreSQL and added changes to allow configuring HAProxy, HAProxyReplica, and ProxySQL service as headless using service annotation to Percona Operator for MySQL together with Yang Gang.

Jo Lyshoel resolved the issue where some traffic was leaving the service mesh in Percona Operator for MongoDB.

Vitaliy Orbidan added the ability to add external replicaset members when the replicaset is not exposed via k8s service to Percona Operator for MongoDB.

Anton Ivanov also contributed to Percona Operator for MongoDB and improved the security of cronjobs by removing extra verbose logging.

Rush Simonson added loadBalancerIP field to pgPrimary and pgBouncer to Percona Operator for PostgreSQL.

Big thanks to Denis Khachyan and Rodney Karemba, who also contributed to Percona Operators.

Percona HELM Charts

Dragos Boca added the DISABLE_TELEMETRY variable, used to enable or disable telemetry sending.

Igor Blackman added the option to skip ServiceAccounts and RBAC.

John Carew fixed the issue with gRPC that occurs when using an ingress.

Serge added securityContext to avoid warning messages during the deploy process, and podAnnotations to help Prometheus collect operator metrics.

Slava Utesinov added an advanced affinity option to a chart, alongside antiAffinityTopologyKey.

Thomas Petit, Hal Lesesne, and Christ-Jan Prinse also improved Percona HELM Charts.

Percona Toolkit

Daniël van Eeden added a template for pull requests for Percona Toolkit.

PinoCao fixed the issue where specifying --skip-check-slave-lag multiple times did not work as expected.

Viktor Szépe did a lot of work to clean up Percona Toolkit from different kinds of typos.

Marcelo HP Ferreira added a new hook to the pt-online-schema-change script that allows users to write a custom code to get information from table row_cnt, nibble_time, progress, and rate. Metrics can be submitted from that hook without further changes to the shared Progress class.

Kushal Haldar reported several vulnerability issues in Percona Toolkit. Special thanks go to Kushal from Percona’s Principal Support Engineering Coordinator, Sveta Smirnova.

We also would like to thank Ricardo and Hanzhongzi for their contributions.

Percona Server for MySQL

Coby Geralnik fixed an issue with the authentication_ldap_simple plugin where the LDAP query was not properly escaped when the user DN contained a double quote (") in the name.

Gena Makhomed reported several issues in Jira and suggested improvements.

Jean-François Gagné is an active community member. He not only suggested improvements in Percona Server for MySQL but also created the Planet for the MySQL Community blog aggregator and spoke at Percona Live.

Nikolay Yankin and Valeriy Solovyov also helped to improve Percona Server for MySQL.

Percona XtraDB Cluster

Nedzinskas Laimis and Tono Mao reported issues in Percona XtraDB Cluster.

Percona XtraBackup

We thank Wangbincmss, who added an innodb_redo_log archive option, and Ahmed Et-tanany for reporting issues.

PostgreSQL

Gert van den Berg added the version on the virtual package “Provides” for the Debian packages, which fixes third-party packages that depend on specific versions of the packages.

Community

Wayne Leutwyler did so much on the Percona Forum helping others. He also wrote blog posts for the Percona Community Blog describing his experience building Percona software on a Raspberry Pi.

Also, we would like to thank everyone who came to talks by Percona speakers or visited our booths at various conferences and events. It is such a pleasure to see your interest and feel your support. Check out all the upcoming events. We look forward to seeing you there!

We so much appreciate every contribution. There are no minor ones. We encourage you to continue supporting open source actively, especially these days when open source is under pressure. You make Percona software better, you make open source better, and, after all, you make the world a better place for everyone.
