Apr 16, 2021

Integrated Alerting Design in Percona Monitoring and Management

Percona Monitoring and Management 2.13 (PMM) introduced the Integrated Alerting feature as a technical preview. It adds a user-friendly way to set up and manage alerts for your databases. You can read more about how to use this feature in our announcement blog post and in our documentation; in this article we focus on design and implementation details.

Entities

There are four basic entities used for IA: Alert Rule Template, Alert Rule, Alert, and Notification Channel.

Everything starts with the alert rule template. You can see its YAML representation below:

---
templates:
 - name: pmm_mongodb_high_memory_usage
   version: 1
   summary: Memory used by MongoDB
   expr: |-
     sum by (node_name) (mongodb_ss_mem_resident * 1024 * 1024)
     / on (node_name) (node_memory_MemTotal_bytes)
     * 100
     > [[ .threshold ]]
   params:
     - name: threshold
       summary: A percentage from configured maximum
       unit: "%"
       type: float
       range: [0, 100]
       value: 80
   for: 5m
   severity: warning
   labels: 
     custom_label: demo
   annotations:
     summary: MongoDB high memory usage ({{ $labels.service_name }})
     description: |-
       {{ $value }}% of memory (more than [[ .threshold ]]%) is used
       by {{ $labels.service_name }} on {{ $labels.node_name }}.

A template serves as the base for alert rules. It defines several fields; let’s look at them:

  • name: uniquely identifies the template (required)
  • version: defines the template format version (required)
  • summary: a template description (required)
  • expr: a MetricsQL query string with parameter placeholders. MetricsQL is backward compatible with PromQL and provides some additional features. (required)
  • params: contains the parameter definitions required for the query. Each parameter has a name, type, and summary. It may also have a unit, an allowed range, and a default value.
  • for: the duration for which the expression must be true; the alert fires once the query has returned a positive result for this period of time (required)
  • severity: specifies the default alert severity level (required)
  • labels: additional labels to be added to generated alerts (optional)
  • annotations: additional annotations to be added to generated alerts (optional)

A template is designed to be reused as the basis for multiple alert rules, so from a single pmm_node_high_cpu_load template you can have alerts for production vs. non-production, warning vs. critical, etc.

Users can create alert rules from templates. An alert rule is what’s actually executed against metrics and what produces an alert. The rule can override default values specified in the template, add filters to apply the rule only to the required services/nodes/etc., and specify target notification channels, such as email, Slack, PagerDuty, or webhooks. If the rule doesn’t have any associated notification channels, its alerts will be available only via the PMM UI. It’s useful to note that after creation, a rule keeps its relation to the template, and any change in the template will affect all related rules.

Here is an alert rule example:

---
groups:
 - name: PMM Integrated Alerting
   rules:
     - alert: /rule_id/c8e5c559-ffba-43ed-847b-921f69c031a9
       rule: test
       expr: |-
         sum by (node_name) (mongodb_ss_mem_resident * 1024 * 1024)
         / on (node_name) (node_memory_MemTotal_bytes)
         * 100
         > 40
       for: 5s
       labels:
         ia: "1"
         rule_id: /rule_id/c8e5c559-ffba-43ed-847b-921f69c031a9
         severity: error
         template_name: pmm_mongodb_high_memory_usage
          custom_label: demo
        annotations:
          description: |-
            {{ $value }}% of memory (more than 40%) is used
            by {{ $labels.service_name }} on {{ $labels.node_name }}.
          summary: MongoDB high memory usage ({{ $labels.service_name }})

It follows the Prometheus alerting rule format.

How it Works

The Integrated Alerting feature is built on top of Prometheus Alertmanager, the VictoriaMetrics time series database (TSDB), and VMAlert.

VictoriaMetrics TSDB is the main metrics storage in PMM, VMAlert is responsible for alert rule execution, and Prometheus Alertmanager is responsible for alert delivery. VMAlert runs queries against the VictoriaMetrics TSDB, checks whether they are positive for the specified amount of time (for example, MySQL is down for 5 minutes), and triggers alerts. All alerts are forwarded to the PMM internal Alertmanager, but they can also be duplicated to an external Alertmanager (which can be set up on the PMM Settings page).

There are four available template sources:

  1. Built-in templates, shipped with the PMM distribution. They are embedded into the managed binary (a core component of PMM).
  2. Percona servers. This isn’t available yet, but it will be similar to the STT checks delivery mechanism (HTTPS + file signatures).
  3. Templates created by the user via PMM UI. We persist them in PMM’s database.
  4. Templates created by the user as files in the /srv/ia/templates directory.

During PMM startup, managed loads templates from all sources into memory.

Alert rules can be created via the PMM UI or just by putting rule files in the /srv/prometheus/rules directory. Alert rules created via the UI are persisted in PMM’s internal PostgreSQL database. For each alert rule from the DB, the managed binary creates a YAML file in /etc/ia/rules/ and asks VMAlert to reload the configuration and reread the rule files. VMAlert executes the query from each loaded alert rule every minute; once the rule condition is met (the query is positive for the specified amount of time), VMAlert produces an alert and passes it to Alertmanager. Please note that /etc/ia/rules/ is controlled by managed, and any manual changes in that directory will be lost.

Managed generates configuration for Alertmanager and updates it once any related entity changes.

Managed goes through the list of the existing rules and collects unique notification channel combinations. For example, if we have two rules and each of them has channels a, b, and c assigned, that is one unique channel combination. For each rule, managed generates a route, and for each unique channel combination, it generates a receiver in the Alertmanager configuration file. Each route has a target receiver and a filter by rule ID; it can also contain user-defined filters. If a rule has no assigned notification channels, a special empty receiver is used. Users can redefine the empty receiver via Alertmanager’s base configuration file /srv/alertmanager/alertmanager.base.yml. When a notification channel is disabled, managed recollects the unique channel combinations excluding the disabled channels and regenerates the receivers and routing rules. If a rule has only one channel specified and it was disabled, a special disabled receiver is used instead. Unlike the empty receiver, the disabled receiver can’t be redefined by the user and always means “do nothing”; this prevents unexpected behavior after channels are disabled. After each Alertmanager configuration update, managed asks Alertmanager to reload it. A simplified sketch of such a generated configuration is shown below.
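
To make the routing logic more concrete, here is a simplified, hypothetical sketch of what such a generated Alertmanager configuration could look like; the receiver names and channel settings are illustrative only (the rule ID is the one from the example rule above), not the exact output produced by managed:

route:
  receiver: empty                     # default receiver, used by rules without channels
  routes:
    - match:
        rule_id: /rule_id/c8e5c559-ffba-43ed-847b-921f69c031a9
      receiver: slack-ops-email-dba   # receiver for this rule's channel combination
receivers:
  - name: empty                       # intentionally has no notification configs
  - name: slack-ops-email-dba
    slack_configs:
      - channel: '#ops-alerts'        # slack api_url / smtp settings omitted for brevity
    email_configs:
      - to: 'dba-team@example.com'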

When Alertmanager receives an alert from VMAlert, it uses the routes to find an appropriate receiver and forwards the alert to the destination channels. The user can also observe alerts via the PMM UI; in that case, managed gets all available alerts from the Alertmanager API and applies the required filters before showing them.

Conclusion

The Integrated Alerting feature has many moving parts, and functionally it’s more about managing configuration for different components and making them work together. It provides a really nice way to be aware of important events in your system. While it’s still in a tech preview state, it’s already helpful. With the built-in templates, it’s easy to try without diving into documentation about Prometheus queries and other details. So please try it and tell us about your experience. What parameters of your system would you like to have covered with templates? What use cases do you have for alerting? We will be happy to receive any feedback.

Apr 16, 2021

Add Microsoft Azure Monitoring Within Percona Monitoring and Management 2.16.0

The Microsoft Azure SQL Database is among the most popular databases of 2020, according to DB-Engines’ DBMS of the Year award. It is also steadily climbing up the DB-Engines Ranking. The ranking is updated monthly and places database management systems according to their popularity. In case you didn’t know, DB-Engines is an initiative to collect and present information on database management systems (DBMS).

So we are excited to share that you can now monitor Azure instances in the Percona Monitoring and Management 2.16.0 (PMM) release. PMM can collect Azure DB metrics as well as available system metrics.

Only basic metrics are provided by the Azure Portal.


No disk, virtual CPU, or RAM data is available in PMM dashboards. Here is an example of the home page with a monitored Azure service; it’s shown in the middle row.

DB metrics are collected by exporters directly from the services, which lets you have all possible metrics. You can find some screenshots of MySQL and PostgreSQL dashboards at the end of this blog post.

Simple Steps to Add an Azure DB Service and Get Metrics in PMM

  • The feature is a technical preview and has to be enabled on the Settings page. Turning this feature OFF will not remove added services from monitoring; it will just hide the ability to discover and add new Microsoft Azure services.

Add an Azure DB Service PMM
This feature is a technical preview because we are releasing it as soon as possible to get feedback from users. We expect to do more work on this feature to make it more API- and resource-efficient.

  • Go to page “Add Instance” (Configuration … PMM Inventory … Add instance)

  • Press the button “Microsoft Azure MySQL or PostgreSQL” and fill in the requested Azure and DB credentials.

Microsoft Azure MySQL or PostgreSQL

Please follow the link “Where do I get the security credentials for my Azure DB instance” if some credential parameters are missing.

Also, please keep in mind that a separate node will be created for each service. It’s named after the service hostname and can’t be changed, but you may specify a service name when adding the service details. By default, the node and service names are equal.

That’s it. You may go to the list of dashboards and observe collected data.

If you are a Microsoft Azure user or are about to become one, please give Percona Monitoring and Management a test run. We are always open to suggestions and proposals. Please contact us, leave a message on our forum, or join our Slack channel.

Here are screenshots of the “MySQL Instance Summary” and “PostgreSQL Instance Summary” dashboards for Azure instances.

 

Read more about the release of Percona Monitoring and Management 2.16 and all the exciting new features included with it!

Apr 15, 2021

Platform End of Support Announcement for Ubuntu 16.04 LTS

The End of Support date for Ubuntu 16.04 LTS is coming soon. According to the Ubuntu Release Life Cycle, it will be at the end of April 2021. With this announcement come some implications for support of Percona software running on this operating system.

As a result, we will no longer be producing new packages and binary builds for Ubuntu 16.04.

We generally align our platform end of life/support dates with those of the upstream platform vendor. The platform end of life/support dates are published in advance on our website on the Percona Software support life cycle page.

According to our policies, Percona will continue to provide operational support for your databases on Ubuntu 16.04. However, we will be unable to provide any bug fixes, builds, or OS-level assistance if you encounter an issue outside the database itself.

Each platform vendor has a supported migration or upgrade path to their next major release. Please reach out to us if you need assistance in migrating your database to your vendor’s supported platform – Percona will be happy to assist you.

Apr 15, 2021

Tecton teams with founder of Feast open source machine learning feature store

Tecton, the company that pioneered the notion of the machine learning feature store, has teamed up with the founder of the open source feature store project called Feast. Today the company announced the release of version 0.10 of the open source tool.

The feature store is a concept that the Tecton founders came up with when they were engineers at Uber. Shortly thereafter, an engineer named Willem Pienaar read the founders’ Uber blog posts on building a feature store and went to work building Feast as an open source version of the concept.

“The idea of Tecton [involved bringing] feature stores to the industry, so we build basically the best in class, enterprise feature store. […] Feast is something that Willem created, which I think was inspired by some of the early designs that we published at Uber. And he built Feast and it evolved as kind of like the standard for open source feature stores, and it’s now part of the Linux Foundation,” Tecton co-founder and CEO Mike Del Balso explained.

Tecton later hired Pienaar, who is today an engineer at the company where he leads their open source team. While the company did not originally start off with a plan to build an open source product, the two products are closely aligned, and it made sense to bring Pienaar on board.

“The products are very similar in a lot of ways. So I think there’s a similarity there that makes this somewhat symbiotic, and there is no explicit convergence necessary. The Tecton product is a superset of what Feast has. So it’s an enterprise version with a lot more advanced functionality, but at Feast we have a battle-tested feature store that’s open source,” Pienaar said.

As we wrote in a December 2020 story on the company’s $35 million Series B, it describes a feature store as “an end-to-end machine learning management system that includes the pipelines to transform the data into what are called feature values, then it stores and manages all of that feature data and finally it serves a consistent set of data.”

Del Balso says that from a business perspective, contributing to the open source feature store exposes his company to a different group of users, and the commercial and open source products can feed off one another as they build the two products.

“What we really like, and what we feel is very powerful here, is that we’re deeply in the Feast community and get to learn from all of the interesting use cases […] to improve the Tecton product. And similarly, we can use the feedback that we’re hearing from our enterprise customers to improve the open source project. That’s the kind of cross learning, and ideally that feedback loop involved there,” he said.

The plan is for Tecton to continue being a primary contributor with a team inside Tecton dedicated to working on Feast. Today, the company is releasing version 0.10 of the project.

Apr 14, 2021

Win Percona Swag for Technical Product Reviews!

You can earn Percona swag by writing a short technical review about Percona software by May 30, 2021! The keyword here is technical. Percona is seeking reviews that describe Percona software use in particular environments or in solving particular issues, and why!

This means being specific about your Percona environment, architecture, and problem-solving. Usage of any size, from small to enterprise, is welcome. Relevant topics may include the operating system, CPU, memory, volumes of data, engines, programming languages, applications, versions, configurations, bugs, performance, obstacles, etc. Naturally, you should not disclose anything confidential or proprietary when you write.

App marketplaces collect hundreds of user reviews and help users make sound decisions. If Percona has been good for you, your testimonials and recommendations can also help others find similar success. Add your review to one of these well-known software marketplaces: Capterra, G2, and SourceForge, under one of these Percona product areas:

Percona Product                                        | Capterra | G2 | SourceForge
Percona Monitoring and Management                      | Capterra | G2 | SourceForge
Percona Server for MySQL                               | Capterra | G2 | SourceForge
Percona XtraDB Cluster                                 | Capterra | G2 | SourceForge
Percona XtraBackup                                     | Capterra | G2 | SourceForge
Percona Distribution for PostgreSQL                    | —        | G2 | SourceForge
Percona Backup for MongoDB                             | —        | G2 | SourceForge
Percona Server for MongoDB                             | —        | G2 | SourceForge
Percona Kubernetes Operator for Percona XtraDB Cluster | —        | G2 | SourceForge

 

Review authors will receive a Percona T-shirt and a mug shipped free almost anywhere in the world. To claim your swag, email community-team@percona.com after your review is published. Include the following:

  1. Link to your published review.
  2. Your postal address.
  3. Your phone number (for delivery use only, never for marketing).
  4. Your T-shirt size (Small, Medium, Large, or Extra Large).
  5. Your preferred shirt color (White, Black, or Blue).

All reviews are read by our team and gratefully received. Note however that generalized reviews without strong technical content are not eligible to receive swag. Percona reserves final judgment on which reviews qualify and which do not.  Thank you for your submission!

Apr 14, 2021

What You Can Do With Auto-Failover and Percona Distribution for MySQL (8.0.x)

Where x is >= 22 ?

The Problem

There are few things your data does not like. One is water and another is fire. Well, guess what:

OVH Fire

If you think that everything will be fine after all, take a look:



Given that my ISP had part of its management infrastructure on OVH, they were impacted by the incident.

As you can see from the highlight, the ticket number changed very little in three years (about 2k cases) and the date jumps from 2018 to 2021. On top of that, I have to mention that I had opened several tickets the month before, and those have disappeared.

So either my ISP was very lucky and had very few cases in three years and sent all my tickets to /dev/null… or they have lost THREE YEARS of data.   

Let us cut to the chase: they have lost their data, period.

After the fire at OVH, these guys did not have a good backup to use for restoring the data, and did not even have a decent Disaster Recovery solution. Their platform remained INACCESSIBLE for more than five days, during which they also lost visibility of their own network/access points/clients and so on.

Restoring the data brought them back online, but it took them more than a month to review and fix the internal management system and bring the service back to acceptable standards. Needless to say, complaints and more costly legal actions have been raised against them.

All this because they missed two basic Best Practices when designing a system:

  • Good backup/restore procedure
  • Always have a Disaster Recovery solution in place 

Yeah, I know… I should change ISP. 

Anyhow, a Disaster Recovery (DR) solution is a crucial element in any production system. It is weird that we still have to cover this in 2021, but apparently it is still underestimated and requires our attention.

This is why, in this (long) article, I will illustrate how to implement another, improved DR solution utilizing Percona Server for MySQL and standard MySQL features such as Group Replication and asynchronous replication automatic failover (AAF).

Asynchronous Replication Automatic Failover

I have already covered the new MySQL feature here (http://www.tusacentral.net/joomla/index.php/mysql-blogs/227-mysql-asynchronous-source-auto-failover) but let us recap.

From MySQL 8.0.22 and Percona Server for MySQL 8.0.22 you can take advantage of AAF when designing distributed solutions. What does this mean?

When using simple Async-replication you have this:

simple Async-replication

Here, a Highly Available (HA) solution in DC2 pulls data from another HA solution in DC1 with a 1:1 relation, meaning the connection is one node against another node.

If you have this:

Your data replication is interrupted and the two DCs diverge, and you need to manually (or by script) recover the interrupted link. With AAF, you can count on a significant improvement:

 

The link now is NOT 1:1, but a node in DC2 can count on AAF to recover the link on the other remaining nodes:

Asynchronous Replication Automatic Failover

If a node in DC2 (the replica side) fails, the link is broken again and requires manual intervention. This solves quite a large chunk of problems, but it does not fix everything, as I mentioned in the article above.

GR Failover

I was hoping to have this fixed in MySQL 8.0.23, but unfortunately, it is not. So I decided to develop a Proof Of Concept and see if it would fix the problem, and more importantly what needs to be done to do it safely. 

The result is a very basic (and I need to refine the code) Stored Procedure called grfailover, which manages the shift between primaries inside a Group Replication cluster:

 

I borrowed the concept from Yves’ Replication Manager for Percona XtraDB Cluster (https://github.com/y-trudeau/Mysql-tools/tree/master/PXC), but as we will see, for GR and this use case we need much less.

Why Can This Be a Simplified Version?

Because in GR we already have a lot of information and we also have the autofailover for async replication. Given that, what we need to do is only manage the start/stop of the Replica. Auto-failover will take care of the shift from one source to the other, while GR will take care of which node should be the preferred Replica (Primary on replica site). 

In short, the check just needs to see whether the node is a Primary and, if so, start the replication if it is not already active; if the node is NOT a Primary, it stops the replication instead, as sketched below.
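
As a rough, hypothetical illustration of that check (not the actual body of grfailover, which also handles the timeout and the management table), the role test and the start/stop actions boil down to statements like these:

-- Is the local node currently the Group Replication Primary?
SELECT MEMBER_ROLE = 'PRIMARY' AS i_am_primary
  FROM performance_schema.replication_group_members
 WHERE MEMBER_ID = @@server_uuid;

-- If it is the Primary and the channel is stopped, start it;
-- if it is NOT the Primary, make sure the channel is stopped.
START SLAVE FOR CHANNEL 'dc2_to_dc1';
STOP SLAVE FOR CHANNEL 'dc2_to_dc1';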

We can also maintain a table of what is going on, to be sure that we do not have two nodes replicating at the same time.

The definition will be something like this:

+--------------+---------------+------+-----+---------+-------+
| Field        | Type          | Null | Key | Default | Extra |
+--------------+---------------+------+-----+---------+-------+
| server_uuid  | char(36)      | NO   | PRI | NULL    |       |
| HOST         | varchar(255)  | NO   |     | NULL    |       |
| PORT         | int           | NO   |     | 3306    |       |
| channel_name | varchar(100)  | NO   |     | NULL    |       |
| gr_role      | varchar(30)   | NO   |     | NULL    |       |
| STATUS       | varchar(50)   | YES  |     | NULL    |       |
| started      | timestamp(6)  | YES  |     | NULL    |       |
| lastupdate   | timestamp(6)  | YES  |     | NULL    |       |
| active       | tinyint       | YES  |     | 0       |       |
| COMMENT      | varchar(2000) | YES  |     | NULL    |       |
+--------------+---------------+------+-----+---------+-------+

The full code can be found in GitHub here: https://github.com/Tusamarco/blogs/tree/master/asyncAutoFailOver.

How-To

The first thing you need to do is deploy Percona Distribution for MySQL (8.0.22 or greater) using Group Replication as the HA solution. To do so, refer to the extensive guide here: Percona Distribution for MySQL: High Availability with Group Replication Solution.

Once you have it running on both DCs, you can configure AAF on both DCs’ Primary nodes, following either MySQL 8.0.22: Asynchronous Replication Automatic Connection (IO Thread) Failover or this MySQL Asynchronous SOURCE auto failover.
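
For reference, and reusing the channel and host names that appear later in this post, the AAF setup on the DC2 Primary boils down to registering the DC1 members as potential sources and enabling automatic failover on the channel. This is only a hedged sketch; the replication user, weights, and the initial source host are examples you must adapt:

-- Register the DC1 Group Replication members as failover sources
SELECT asynchronous_connection_failover_add_source('dc1_to_dc2', 'gr1', 3306, '', 50);
SELECT asynchronous_connection_failover_add_source('dc1_to_dc2', 'gr2', 3306, '', 50);
SELECT asynchronous_connection_failover_add_source('dc1_to_dc2', 'gr3', 3306, '', 50);

-- Point the channel at the current DC1 Primary and enable automatic failover
CHANGE MASTER TO MASTER_USER='replica', MASTER_PASSWORD='replicapw',
  MASTER_HOST='gr2', MASTER_AUTO_POSITION=1,
  SOURCE_CONNECTION_AUTO_FAILOVER=1
  FOR CHANNEL 'dc1_to_dc2';
START SLAVE FOR CHANNEL 'dc1_to_dc2';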

Once you have the AAF replication up and running, it is time for you to create the procedure and the management table in your DC-Source Primary.

First of all, be sure you have a percona schema, and if not, create it:

Create schema percona;

Then create the table:

CREATE TABLE `group_replication_failover_manager` (
  `server_uuid` char(36) NOT NULL,
  `HOST` varchar(255) NOT NULL,
  `PORT` int NOT NULL DEFAULT '3306',
  `channel_name` varchar(100) NOT NULL,
  `gr_role` varchar(30) NOT NULL,
  `STATUS` varchar(50) DEFAULT NULL,
  `started` timestamp(6) NULL DEFAULT NULL,
  `lastupdate` timestamp(6) NULL DEFAULT NULL,
  `active` tinyint DEFAULT '0',
  `COMMENT` varchar(2000) DEFAULT NULL,
  PRIMARY KEY (`server_uuid`)
) ENGINE=InnoDB;

Last, create the procedure. Keep in mind you may need to change the DEFINER or simply remove it. The code will be replicated on all nodes. To be sure, run the command below on all nodes:

select ROUTINE_SCHEMA,ROUTINE_NAME,ROUTINE_TYPE from information_schema.ROUTINES where ROUTINE_SCHEMA ='percona' ;
+----------------+--------------+--------------+
| ROUTINE_SCHEMA | ROUTINE_NAME | ROUTINE_TYPE |
+----------------+--------------+--------------+
| percona        | grfailover   | PROCEDURE    |
+----------------+--------------+--------------+

You should get something like the above.

If not, check your replication; something probably needs to be fixed. If it all works out, you are ready to go.

To run the procedure, you can use any kind of approach you like; the only important thing is that you MUST run it FIRST on the current PRIMARY node of each DC.

This is because the PRIMARY node must be the first one to register in the management table. Personally, I like to run it from cron when in “production” (see the example after the parameter list below), and manually when testing:

For example: /opt/mysql_templates/PS-8P/bin/mysql -h 127.0.0.1 -P 3306 -D percona -e "call  grfailover(5,\"dc2_to_dc1\");"

Where:

  • grfailover is the name of the procedure.
  • 5 is the timeout in minutes after which the procedure will activate the replication in the Node.
  • dc2_to_dc1 is the name of the channel in the current node that the procedure needs to manage.
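
For instance, a cron entry such as the following (the schedule, paths, and log file are just examples) keeps the check running every minute:

* * * * * /opt/mysql_templates/PS-8P/bin/mysql -h 127.0.0.1 -P 3306 -D percona -e 'call grfailover(5,"dc2_to_dc1");' >> /var/log/grfailover.log 2>&1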

Given two clusters as:

DC1-1(root@localhost) [(none)]>SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | e891d1b4-9793-11eb-92ac-08002734ed50 | gr3         |        3306 | ONLINE       | SECONDARY   | 8.0.23         |
| group_replication_applier | ebff1ab8-9793-11eb-ba5f-08002734ed50 | gr1         |        3306 | ONLINE       | SECONDARY   | 8.0.23         |
| group_replication_applier | f47df54e-9793-11eb-a60b-08002734ed50 | gr2         |        3306 | ONLINE       | PRIMARY     | 8.0.23         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+

DC2-2(root@localhost) [percona]>SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 79ede65d-9797-11eb-9963-08002734ed50 | gr4         |        3306 | ONLINE       | SECONDARY   | 8.0.23         |
| group_replication_applier | 7e214802-9797-11eb-a0cf-08002734ed50 | gr6         |        3306 | ONLINE       | PRIMARY     | 8.0.23         |
| group_replication_applier | 7fddf04f-9797-11eb-a193-08002734ed50 | gr5         |        3306 | ONLINE       | SECONDARY   | 8.0.23         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+

If you query the management table after you have run the procedure ONLY on the two Primaries:

>select * from percona.group_replication_failover_manager order by host\G
*************************** 1. row ***************************
server_uuid: f47df54e-9793-11eb-a60b-08002734ed50
        HOST: gr2
        PORT: 3306
channel_name: dc2_to_dc1
     gr_role: PRIMARY
      STATUS: ONLINE
     started: 2021-04-08 10:22:40.000000
  lastupdate: 2021-04-08 10:22:53.000000
      active: 1
     COMMENT: Just inserted
*************************** 2. row ***************************
 server_uuid: 7e214802-9797-11eb-a0cf-08002734ed50
        HOST: gr6
        PORT: 3306
channel_name: dc1_to_dc2
     gr_role: PRIMARY
      STATUS: ONLINE
     started: 2021-04-08 09:17:50.000000
  lastupdate: 2021-04-08 09:17:50.000000
      active: 1
     COMMENT: Just inserted

Given that the replication link was already active, the nodes will report only “Just inserted” in the comment.

If instead one of the two channels was down and the node was NOT deactivated (deactivation is done by setting the active flag in the management table to 0), the comment will change to “REPLICA restarted for the channel <channel name>”.

At this point, you can run the procedure on the other nodes as well; after that, if you query the table by channel:

DC1-1(root@localhost) [(none)]>select * from percona.group_replication_failover_manager where channel_name ='dc2_to_dc1' order by host\G
*************************** 1. row ***************************
 server_uuid: ebff1ab8-9793-11eb-ba5f-08002734ed50
        HOST: gr1
        PORT: 3306
channel_name: dc2_to_dc1
     gr_role: SECONDARY
      STATUS: ONLINE
     started: NULL
  lastupdate: NULL
      active: 1
     COMMENT: Just inserted
*************************** 2. row ***************************
 server_uuid: f47df54e-9793-11eb-a60b-08002734ed50
        HOST: gr2
        PORT: 3306
channel_name: dc2_to_dc1
     gr_role: PRIMARY
      STATUS: ONLINE
     started: 2021-04-08 10:22:40.000000
  lastupdate: 2021-04-08 10:22:53.000000
      active: 1
     COMMENT: REPLICA restarted for the channel dc2_to_dc1
*************************** 3. row ***************************
 server_uuid: e891d1b4-9793-11eb-92ac-08002734ed50
        HOST: gr3
        PORT: 3306
channel_name: dc2_to_dc1
     gr_role: SECONDARY
      STATUS: ONLINE
     started: NULL
  lastupdate: NULL
      active: 1
     COMMENT: Just inserted
3 rows in set (0.00 sec)

What happens if I now change my Primary, or if the Primary goes down? Well, let’s say we “just” shift our PRIMARY:

stop slave for channel 'dc2_to_dc1';SELECT group_replication_set_as_primary('ebff1ab8-9793-11eb-ba5f-08002734ed50');
Query OK, 0 rows affected, 1 warning (0.01 sec)

+--------------------------------------------------------------------------+
| group_replication_set_as_primary('ebff1ab8-9793-11eb-ba5f-08002734ed50') |
+--------------------------------------------------------------------------+
| Primary server switched to: ebff1ab8-9793-11eb-ba5f-08002734ed50         |
+--------------------------------------------------------------------------+

Please note that given I have an ACTIVE replication channel, to successfully shift the primary, I MUST stop the replication channel first.

DC1-2(root@localhost) [percona]>SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | e891d1b4-9793-11eb-92ac-08002734ed50 | gr3         |        3306 | ONLINE       | SECONDARY   | 8.0.23         |
| group_replication_applier | ebff1ab8-9793-11eb-ba5f-08002734ed50 | gr1         |        3306 | ONLINE       | PRIMARY     | 8.0.23         |
| group_replication_applier | f47df54e-9793-11eb-a60b-08002734ed50 | gr2         |        3306 | ONLINE       | SECONDARY   | 8.0.23         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+

Reading the management table, we will see that grFailOver has started the shift:

DC1-1(root@localhost) [(none)]>select * from percona.group_replication_failover_manager where channel_name ='dc2_to_dc1' order by host\G
*************************** 1. row ***************************
 server_uuid: ebff1ab8-9793-11eb-ba5f-08002734ed50
        HOST: gr1
        PORT: 3306
channel_name: dc2_to_dc1
     gr_role: PRIMARY
      STATUS: ONLINE
     started: NULL
  lastupdate: NULL
      active: 1
     COMMENT: Need to wait 5 minutes, passed: 0
*************************** 2. row ***************************
 server_uuid: f47df54e-9793-11eb-a60b-08002734ed50
        HOST: gr2
        PORT: 3306
channel_name: dc2_to_dc1
     gr_role: PRIMARY
      STATUS: ONLINE
     started: 2021-04-08 10:22:40.000000
  lastupdate: 2021-04-08 10:22:53.000000
      active: 1
     COMMENT: REPLICA restarted for the channel dc2_to_dc1
*************************** 3. row ***************************
 server_uuid: e891d1b4-9793-11eb-92ac-08002734ed50
        HOST: gr3
        PORT: 3306
channel_name: dc2_to_dc1
     gr_role: SECONDARY
      STATUS: ONLINE
     started: NULL
  lastupdate: NULL
      active: 1
     COMMENT: Just inserted

Checking the new PRIMARY node gr1, we can see that:

  • Gr_role is PRIMARY
  • COMMENT reports the countdown (in minutes) the node waits

After the 5 minutes: 

DC1-1(root@localhost) [(none)]>select * from percona.group_replication_failover_manager where channel_name ='dc2_to_dc1' order by host\G
*************************** 1. row ***************************
 server_uuid: ebff1ab8-9793-11eb-ba5f-08002734ed50
        HOST: gr1
        PORT: 3306
channel_name: dc2_to_dc1
     gr_role: PRIMARY
      STATUS: ONLINE
     started: 2021-04-08 10:27:54.000000
  lastupdate: 2021-04-08 10:30:12.000000
      active: 1
     COMMENT: REPLICA restarted for the channel dc2_to_dc1
*************************** 2. row ***************************
 server_uuid: f47df54e-9793-11eb-a60b-08002734ed50
        HOST: gr2
        PORT: 3306
channel_name: dc2_to_dc1
     gr_role: SECONDARY
      STATUS: ONLINE
     started: NULL
  lastupdate: NULL
      active: 1
     COMMENT: Resetted by primary node ebff1ab8-9793-11eb-ba5f-08002734ed50 at 2021-04-08 10:27:53
*************************** 3. row ***************************
 server_uuid: e891d1b4-9793-11eb-92ac-08002734ed50
        HOST: gr3
        PORT: 3306
channel_name: dc2_to_dc1
     gr_role: SECONDARY
      STATUS: ONLINE
     started: NULL
  lastupdate: NULL
      active: 1
     COMMENT: Just inserted

Now, what we can see is:

  • Node gr1 has become active in replicating
    • It reports the time it started the replication
    • It reports the last time it checked that the replication was active
  • Node gr2 is marked SECONDARY
    • Its comment also reports when, and by which node, the replication was reset and restarted on the new REPLICA node

If for any reason the replication in the original node gr2 was restarted (like moving back the PRIMARY) while the countdown was still in place, grFailOver will stop any action and reset the gr1 status. 

In short, my two DCs can now rely on AAF to fail over to a different SOURCE, and on grFailOver to shift the replication following the GR Primary, or to fail over to another node when my Primary crashes.

Conclusion

I am sure Oracle is baking something around this, and I am sure we will see it come out soon, but in the meantime, I have to say that this simple solution works. It has improved the resiliency of my testing architecture A LOT.

And while I am still testing it and I am totally confident that the procedure can be written in a more efficient way, I am also sure bugs and errors are around the corner. 

BUT, this was a POC and I am happy with the outcome. It proves it is not so difficult to improve what we have, and it also proves that sometimes a small thing can have a HUGE impact.

It also proves we should not always wait for others to do what is required and that ANYONE can help. 

Finally, as mentioned above, this is a POC solution, but nothing prevents you from starting from it and making it a production solution, as my colleague Yves did for his Percona XtraDB Cluster Replication Manager.

It is just up to you! Great MySQL to all.

References

https://www.datacenterdynamics.com/en/news/fire-destroys-ovhclouds-sbg2-data-center-strasbourg/

http://www.tusacentral.net/joomla/index.php/mysql-blogs/227-mysql-asynchronous-source-auto-failover

https://github.com/y-trudeau/Mysql-tools/tree/master/PXC

https://www.percona.com/blog/2020/10/26/mysql-8-0-22-asynchronous-replication-automatic-connection-io-thread-failover/

Apr 14, 2021

Percona Distribution for MySQL: High Availability with Group Replication Solution

This blog provides high availability (HA) guidelines using group replication architecture and deployment recommendations in MySQL, based on our best practices.

Every architecture and deployment depends on the customer requirements and application demands for high availability and the estimated level of usage. For example, using high read or high write applications, or both, with a need for 99.999% availability.

Here, we give architecture and deployment recommendations along with a technical overview for a solution that provides a high level of availability and assumes the usage of high read/write applications (20k or more queries per second).

Layout

MySQL High Availability with Group Replication

Components

This architecture is composed of two main layers:

  • Connection and distribution layer
  • RDBMS (Relational Database Management System) layer

Connection Layer

The connection layer is composed of:

  • Application-to-proxy redirection mechanism, which can be anything from a Virtual IP managed by a Keepalived local service to a DNS resolution service like Amazon Route 53. Its function is to redirect the traffic to the active proxy node.
  • Proxy connection distribution, composed of two or more nodes. Its role is to redirect the traffic to the active nodes of the Group Replication cluster. In cases like ProxySQL, where the proxy is a layer 7 proxy and is able to perform read/write splitting, this layer is also in charge of redirecting writes to the Primary node and reads to the Replicas, and of HA to prevent a single point of failure.

Data Layer

The data layer is composed of:

  • Primary node serving writes (or source) – this is the node that will accept writes and DDL modifications. Data will be processed following the ACID paradigm (atomicity, consistency, isolation, durability) and replicated to all other nodes.
  • Replica nodes are the elements serving read requests. Some replica nodes can be elected Primary in case of Primary node failure. A replica node should be able to leave and join back a healthy cluster without impacting the service.
  • Replication mechanism to distribute changes across nodes; in this solution it is done with Group Replication. Group Replication is a tightly coupled solution, which means that the database cluster is based on a data-centric approach (single state of the data, distributed commit). In this case, the data is consistent in time across nodes, and replication requires a highly performant link. Given that, geographic distribution is strongly discouraged and Disaster Recovery (DR) is not implicitly supported by the main Group Replication mechanism.

The node characteristics (CPU/RAM/Storage) are not relevant to the main solution design.  They instead must reflect the estimated workload the solution will have to cover, which is a case-by-case identification. 

What is important to keep in mind is that all nodes that are part of the cluster must have the same characteristics.  If they don’t, the cluster will be imbalanced and service will be affected.

As a generic indication, we recommend using solutions with at least 8 cores and 16GB of RAM when in production.

High Availability

How do we measure availability and at what point does it become “high” availability?

Generally speaking, availability is measured by establishing a measurement time frame and dividing the time the service was available by that time frame. This ratio will rarely be 1, which would equal 100% availability. At Percona, we don’t consider a solution to be highly available if it is not at least 99% or “two nines” available.

Availability %              | Downtime per year | Downtime per month | Downtime per week | Downtime per day
99% (“two nines”)           | 3.65 days         | 7.31 hours         | 1.68 hours        | 14.40 minutes
99.5% (“two nines five”)    | 1.83 days         | 3.65 hours         | 50.40 minutes     | 7.20 minutes
99.9% (“three nines”)       | 8.77 hours        | 43.83 minutes      | 10.08 minutes     | 1.44 minutes
99.95% (“three nines five”) | 4.38 hours        | 21.92 minutes      | 5.04 minutes      | 43.20 seconds
99.99% (“four nines”)       | 52.60 minutes     | 4.38 minutes       | 1.01 minutes      | 8.64 seconds
99.995% (“four nines five”) | 26.30 minutes     | 2.19 minutes       | 30.24 seconds     | 4.32 seconds
99.999% (“five nines”)      | 5.26 minutes      | 26.30 seconds      | 6.05 seconds      | 864.00 milliseconds

How is High Availability Achieved?

There are three key components to high availability:

  1. Infrastructure – This is the physical or virtual hardware that database systems rely on to run. Without enough infrastructure (VM’s, networking, etc) there cannot be high availability. The easiest example is: there is no way to make a single server highly available.
  2. Topology Management – This is the software management related specifically to the database and managing its ability to stay consistent in the event of a failure. Many clustering or synchronous replication solutions offer this capability out of the box. However, for asynchronous replication, this is handled by additional software. 
  3. Connection Management – This is the software management related specifically to the networking and connectivity aspect of the database. Clustering solutions typically bundle with a connection manager, however in asynchronous clusters deploying a connection manager is mandatory for high availability.

This Solution Provides:

The proposed solution, based on a tightly coupled database cluster, offers an HA level of 99.995% when coupled with the Group replication setting group_replication_consistency=AFTER.

group replication
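
group_replication_consistency is a regular Group Replication system variable, so enabling this consistency level is straightforward; a minimal, hedged example (set it on every member, and persist it in your configuration as appropriate):

SET GLOBAL group_replication_consistency = 'AFTER';
-- or, to make it survive restarts:
SET PERSIST group_replication_consistency = 'AFTER';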

Failovers

If properly planned and architected, a database failure or configuration change that requires a restart shouldn’t affect the stability of the database infrastructure. Failovers are an integral part of a stability strategy and aligning the business requirements for availability and uptime with failover methodologies is critical to achieving those goals. Below are the 3 main types of failovers that can occur in database environments.

  • Planned Failover: A planned failover is a failover that has been scheduled in advance or occurs at a regular interval. There can be many reasons for planned failovers including patching, large data operations, retiring existing infrastructure, or simply testing the failover strategy.
  • Unplanned Failover: An unplanned failover is what occurs when a database unexpectedly becomes unresponsive or experiences instability. This could also include emergency changes that do not fall under the planned failover cadence or scheduling parameters. Unplanned failovers are generally considered higher-risk operations due to the high stress and high potential for either data corruption or data fragmentation.
  • Regional or Disaster Recovery Failover: Unplanned failovers still work with the assumption that additional database infrastructure is immediately available and in a usable state. In a regional or DR failover, we would be making the assumption that there is a large-scale infrastructure outage that requires the business to move its operations away from its current availability zone.
This solution covers both planned and unplanned failovers.

Maintenance Windows

Major vs Minor Maintenance: Although it may not be obvious at first, not all maintenance activities are created equal and do not have the same dependencies. It is good to separate maintenance that demands downtime or failover from maintenance that can be done without impacting those important stability metrics. When defining these maintenance dependencies there can be a change in the actual maintenance process that allows for a different cadence.

Maintenance Without Service Interruption: With rolling restarts and proper version upgrades, it is possible to cover both major and minor maintenance without service interruption.

Uptime

When referring to database stability, uptime is likely the largest indicator of stability and oftentimes is the most obvious symptom of an unstable database environment. Uptime is composed of 3 key components and, contrary to common perception, is based on what happens when the database software is not able to take incoming requests rather than maintaining the ability to take requests with errors.

Recovery Time Objective (RTO): This can be boiled down to a very simple question “How long can the business sustain a database outage?”. Once the business is aligned with a goal of a minimum viable recovery time objective, it is much more straightforward to plan and invest in the infrastructure required to meet that requirement. It is important to acknowledge that while everyone desires 100% uptime, there is a need for realistic expectations that align with the business needs and not a technical desire.

Recovery Point Objective (RPO): There is a big distinction between the Recovery Point and the Recovery Time for database infrastructure. The database can be available, but not in the exact state it was in when it became unavailable. That is where the Recovery Point comes in. The question we ask here is “How much data can the business lose during a database outage?”. All businesses have their own requirements here, and it is worth noting that the goal is always to never sustain any data loss. But this is framed as a worst-case scenario: how much data could be lost while the business maintains the ability to continue.

Disaster Recovery: While RTO and RPO are great for unplanned outages or small-scale hiccups to the infrastructure, when we talk about Disaster Recovery this is a major large-scale outage not strictly for the database infrastructure. How capable is the business of restarting operations with the assumption that all resources are completely unavailable in the main availability zone? The assumption here is that there is no viable restoration point or time that aligns with the business requirements. While each DR scenario is unique based on available infrastructure, backup strategy, and technology stack, there are some common threads for any scenario. 

This solution helps improve uptime:

Using this solution will help you to significantly reduce both RPO and RTO. Given the tightly coupled cluster solution approach, the failure of a single node will not result in service interruption.
Increasing the number of nodes will also improve the cluster resilience by the formula:
F = (N -1) / 2

Where:

F – Number of admissible failures

N – number of nodes in the cluster

Examples:

In a cluster of 5 nodes, F = (5 – 1)/2 = 2. 

The cluster can support up to 2 failures. 

In a cluster of 4 nodes, F = (4 – 1)/2 = 1.  

The cluster can support up to 1 failure. 

This solution also allows for a more restrictive backup policy, dedicating a node to the backup cycle, which will help in keeping RPO low. As previously mentioned, DR is not covered by default by the solution which will require an additional replication setup and controller. 

Measurement and Monitoring

To ensure database infrastructure is performing as intended or at its best, it is necessary to measure specific metrics and alert when some of these metrics are not in line with expectations. Periodic review of these measurements is also encouraged to promote stability and understand potential risks associated with the database infrastructure. Below are the 3 aspects of database performance measurement and monitoring.

Measurement: To understand how a database infrastructure is performing there is a need to measure multiple aspects of the infrastructure. With measurement, it’s important to understand the impact of the sample sizes, sample timing, and sample types.

Metrics: Metrics refer to the actual parts of the database infrastructure being measured. When we discuss metrics, more isn’t always better as it could introduce unintentional noise or allow for troubleshooting to become overly burdensome.

Alerting: When one or many metrics of the database infrastructure are not within a normal or acceptable range, an alert should be generated so that the team responsible for the appropriate portion of the database infrastructure can investigate and remedy it.

Monitoring for this solution is covered by:

Percona Monitoring and Management (https://www.percona.com/doc/percona-monitoring-and-management/2.x/introduction.html) has a specific dashboard to monitor the Group Replication state and cluster status as a whole.

How to Implement the Infrastructure

In this section, we are providing step-by-step instructions on how to implement the above solution.

The Elements

The following will be used:

  • 1 Virtual IP for ProxySQL failover – 192.168.4.194
  • 2 ProxySQL nodes
    • Proxy1 192.168.4.191
    • Proxy2 192.168.4.192
  • 4 MySQL nodes in Single Primary mode
    • Gr1 192.168.4.81 – Initial Primary
    • Gr2 192.168.4.82 – Replica / failover
    • Gr3 192.168.4.83 – Replica / failover
    • Gr4 192.168.4.84 – Replica / Backup
  • Ports. All ports must be open if a firewall is in place, or if any other restriction such as AppArmor or SELinux is active.
    • ProxySQL
      • 6033
      • 6032
      • 3306
    • MySQL – GR
      • 3306
      • 33060
      • 33061

Software Installation

First, you need to install the Percona Distribution for MySQL, the Percona Server for MySQL-based variant, on each node. Follow the instructions at https://www.percona.com/doc/percona-distribution-mysql/8.0/installing.html to install Percona Server for MySQL v8.0.
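
On a RHEL/CentOS-based node, the installation steps look roughly like the following; this is a sketch assuming the yum package manager and the ps80 repository name, so adapt it to apt on Debian/Ubuntu:

sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
sudo percona-release setup ps80
sudo yum install percona-server-server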

Configure the Nodes

Before anything, make sure that all the nodes use the same time-zone and time:

[root@gr1 ps8]# date
    Tue Aug 18 08:22:12 EDT 2020

Also check that the ntpd service is present and enabled. Be sure that each node resolves the other nodes by name:

[root@gr1 ps8]# for i in 1 2 3 4 ; do ping -c 1 gr$i > /dev/null;echo $?; done

If not able to resolve, add the entries in the /etc/hosts file.

Once instances are up and running check the Percona Server for MySQL version on each node:

(root@node1) [(none)]>\s
--------------
/opt/mysql_templates/PS-8P/bin/mysql  Ver 8.0.20-11 for Linux on x86_64 (Percona Server (GPL), Release 11, Revision 159f0eb)

Step 1

Create a proper user for administration:

CREATE user dba@localhost identified by 'dbapw';
CREATE user dba@'192.168.%' identified by 'dbapw'; 

GRANT ALL on *.* to dba@localhost with grant option;
GRANT ALL on *.* to dba@'192.168.%' with grant option;

Exit from the client as user root and login as user dba.

Be sure to have a good and unique SERVER_ID value:

(dba@node1) [(none)]>show global variables like 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     |     1 | <--- Not good given the same for all nodes
+---------------+-------+
1 row in set (0.01 sec)
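
A simple way to fix this is to give each node its own value in its my.cnf before going further; the numbers below are only an example, the important thing is that they are unique per node:

# node1 my.cnf
server-id = 1
# node2: server-id = 2, node3: server-id = 3, node4: server-id = 4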

It’s now time to add group replication settings to the instances.

Step 2

Stop all running nodes, then in the my.cnf add:

#####################
    #Replication + binlog settings
    #####################
    auto-increment-increment                                    =1
    auto-increment-offset                                       =1

    log-bin                                             =<path_to_logs>/binlog
    log-bin-index                                       =binlog.index
    binlog-checksum                                             =NONE
    binlog-format                                               =ROW
    binlog-row-image                                            =FULL
    log-slave-updates 						=1
    binlog-transaction-dependency-tracking                      =WRITESET


    enforce-gtid-consistency                                    =TRUE
    gtid-mode                                                   =ON

    master-info-file                                            =master.info
    master-info-repository                                      =TABLE
    relay_log_info_repository                                   =TABLE
    relay-log                                            =<path_to_logs>/relay

    sync-binlog                                                 =1

    ### SLAVE SECTION
    skip-slave-start
    slave-parallel-type                                        = LOGICAL_CLOCK
    slave-parallel-workers                                      = 4
    slave-preserve-commit-order                                 = 1

    ######################################
    #Group Replication
    ######################################
    plugin_load_add                                    ='group_replication.so'
    plugin-load-add                                    ='mysql_clone.so'
    group_replication_group_name       ="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" <-- Not good use something that will help you 
                                                                            to identify the GR transactions and from where they 
                                                                             come from IE "dc1euz1-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
    group_replication_start_on_boot                     =off
    group_replication_local_address                     = "192.168.4.81/2/3/4:33061"  <---- CHANGE THIS TO MATCH EACH NODE LOCAL IP
    group_replication_group_seeds                       = "192.168.4.81:33061,192.168.4.82:33061,192.168.4.83:33061,192.168.4.84:33061"
    group_replication_bootstrap_group                   = off
    transaction-write-set-extraction                    = XXHASH64

Restart all nodes and connect to them.

Step 3

Create a user for replication (on all nodes):

SET SQL_LOG_BIN=0;
    CREATE USER replica@'192.168.4.%' IDENTIFIED BY 'replicapw';   #<--- Please note the filter by IP is more restrictive 
    GRANT REPLICATION SLAVE ON *.* TO replica@'192.168.4.%';
    FLUSH PRIVILEGES;
    SET SQL_LOG_BIN=1;

Link the nodes with replication channel (on all nodes):

CHANGE MASTER TO MASTER_USER='replica', MASTER_PASSWORD='replicapw' FOR CHANNEL 'group_replication_recovery';

Check the current status:

(dba@node1) [(none)]>\u performance_schema
    (dba@node1) [performance_schema]>show tables like '%repl%';
    +-------------------------------------------+
    | Tables_in_performance_schema (%repl%)     |
    +-------------------------------------------+
    | replication_applier_configuration         |
    | replication_applier_filters               |
    | replication_applier_global_filters        |
    | replication_applier_status                |
    | replication_applier_status_by_coordinator |
    | replication_applier_status_by_worker      |
    | replication_connection_configuration      |
    | replication_connection_status             |
    | replication_group_member_stats            |
    | replication_group_members                 | <------------------------
    +-------------------------------------------+

   (dba@node1) [performance_schema]>select * from replication_group_members\G
CHANNEL_NAME: group_replication_applier
     MEMBER_ID: 
   MEMBER_HOST: 
   MEMBER_PORT: 
  MEMBER_STATE: 
   MEMBER_ROLE: OFFLINE
MEMBER_VERSION: 
1 row in set (0.00 sec)

At this stage, you should be able to start the first (Primary) cluster node.

Only on GR1:

(dba@node1)[none]> SET GLOBAL group_replication_bootstrap_group=ON;
(dba@node1)[none]> START GROUP_REPLICATION;
(dba@node1)[none]> SET GLOBAL group_replication_bootstrap_group=OFF;

And then check it:

(dba@node1) [none]>select * from performance_schema.replication_group_members\G
*************************** 1. row ***************************
  CHANNEL_NAME: group_replication_applier
     MEMBER_ID: 90a353b8-e6dc-11ea-98fa-08002734ed50
   MEMBER_HOST: gr1
   MEMBER_PORT: 3306
  MEMBER_STATE: ONLINE
   MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.20

Once the Primary is running, connect to the second node, GR2, and start Group Replication:

(dba@node2) [none]>START GROUP_REPLICATION;
Query OK, 0 rows affected (4.60 sec)

Check if it registered correctly:

(dba@node2) [performance_schema]>select * from replication_group_members\G
*************************** 1. row ***************************
  CHANNEL_NAME: group_replication_applier
     MEMBER_ID: 58ffd118-e6dc-11ea-8af8-08002734ed50
   MEMBER_HOST: gr2
   MEMBER_PORT: 3306
  MEMBER_STATE: ONLINE
   MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.20
*************************** 2. row ***************************
  CHANNEL_NAME: group_replication_applier
     MEMBER_ID: 90a353b8-e6dc-11ea-98fa-08002734ed50
   MEMBER_HOST: gr1
   MEMBER_PORT: 3306
  MEMBER_STATE: ONLINE
   MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.20

Test if replication works:

On GR1

(dba@node1) [performance_schema]>create schema test;
Query OK, 1 row affected (0.76 sec)

(dba@node1) [performance_schema]>\u test
Database changed

(dba@node1) [test]>create table test1 (`id` int auto_increment primary key);
Query OK, 0 rows affected (0.32 sec)

(dba@node1) [test]>insert into test1 values(null);
Query OK, 1 row affected (0.34 sec)

On GR2

(dba@node2) [performance_schema]>\u test
 Database changed
 (dba@node2) [test]>select * from test1;
 +----+
 | id |
 +----+
 |  1 |
 +----+
 1 row in set (0.00 sec)

Start Group Replication on the other two nodes, GR3 and GR4:

(dba@node3) [performance_schema]>START GROUP_REPLICATION;
(dba@node4) [performance_schema]>START GROUP_REPLICATION;
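
Before moving on to the proxy layer, it is worth confirming that all four members report ONLINE. A minimal check from any node (member IDs and hostnames will of course differ in your environment):

select MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
from performance_schema.replication_group_members
order by MEMBER_HOST;

You should see one PRIMARY and three SECONDARY members, all with MEMBER_STATE = ONLINE.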

Proxy Setup

Step 1

In our solution we will use two ProxySQL nodes:

  • Proxy1 192.168.4.191
  • Proxy2 192.168.4.192

First, you need to install ProxySQL on the nodes you have selected, in our case the two above.

To install the software, follow the instructions in How to Install ProxySQL From the Percona Repository. Once you have installed the software, we first need to grant the ProxySQL monitor user access to our Percona Server for MySQL nodes.

Create the monitor user on the MySQL Group Replication nodes:
create user monitor@'192.168.4.%' identified by 'monitor';
grant usage on *.* to 'monitor'@'192.168.4.%';
grant select on sys.* to 'monitor'@'192.168.4.%';

Then, from the ProxySQL admin interface, define some basic variables:

update global_variables set Variable_Value='admin:admin;cluster1:clusterpass'  where Variable_name='admin-admin_credentials';
update global_variables set variable_value='cluster1' where variable_name='admin-cluster_username';
update global_variables set variable_value='clusterpass' where variable_name='admin-cluster_password';
update global_variables set Variable_Value=0  where Variable_name='mysql-hostgroup_manager_verbose';
update global_variables set Variable_Value='true'  where Variable_name='mysql-query_digests_normalize_digest_text';
update global_variables set Variable_Value='8.0.20'  where Variable_name='mysql-server_version';
update global_variables set Variable_Value='utf8'  where Variable_name='mysql-default_charset';
update global_variables set Variable_Value=300  where Variable_name='mysql-tcp_keepalive_time';
update global_variables set Variable_Value='true'  where Variable_name='mysql-use_tcp_keepalive';
update global_variables set Variable_Value='true'  where Variable_name='mysql-verbose_query_error';
update global_variables set Variable_Value='true'  where Variable_name='mysql-show_processlist_extended';
update global_variables set Variable_Value=50000  where Variable_name='mysql-max_stmts_cache';
update global_variables set Variable_Value='false'  where Variable_name='admin-web_enabled';
update global_variables set Variable_Value='0'  where Variable_name='mysql-set_query_lock_on_hostgroup';

load admin variables to run;save admin variables to disk;
load mysql variables to run;save mysql variables to disk;

The user name and password need to reflect your standards. The ones used above are just an example. Then set up the nodes as a cluster:

INSERT INTO proxysql_servers (hostname,port,weight,comment) VALUES('192.168.4.191',6032,100,'PRIMARY');
INSERT INTO proxysql_servers (hostname,port,weight,comment) VALUES('192.168.4.192',6032,100,'SECONDARY');
load proxysql servers to run;save proxysql servers to disk;
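
At this point it is a good idea to verify, on both proxies, that the cluster definition was loaded into the runtime. A quick check from the ProxySQL admin interface (assuming the two entries above):

SELECT hostname, port, weight, comment FROM runtime_proxysql_servers;

Both 192.168.4.191 and 192.168.4.192 should be listed on each node.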

Step 2

Define the user(s), servers, and query rules needed to perform the read/write split. Create one or more valid users; for instance, a user named app_gr with the password test that has access to your Group Replication cluster:

insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent,comment) values ('app_gr','test',1,400,'mysql',1,'application test user GR');
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
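
To double-check that the user made it into the runtime configuration, you can query the runtime table from the ProxySQL admin interface (a quick sanity check, using the example user above):

SELECT username, active, default_hostgroup, transaction_persistent
FROM runtime_mysql_users
WHERE username = 'app_gr';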

Define servers:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.81',400,3306,10000,2000,'GR1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.81',401,3306,100,2000,'GR1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.82',401,3306,10000,2000,'GR2');    
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.83',401,3306,10000,2000,'GR3');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.84',401,3306,1,2000,'GR4');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

Define query rules to get read-write split:

INSERT INTO mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(4040,6033,'app_gr',400,1,3,'^SELECT.*FOR UPDATE',1);
INSERT INTO mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,multiplex,apply) values(4042,6033,'app_gr',401,1,3,'^SELECT.*$',1,1);
LOAD MYSQL QUERY RULES TO RUN;SAVE MYSQL QUERY RULES TO DISK;

Step 3

Once we have all the configuration ready, we need to create a special view in the SYS schema on our Percona Server for MySQL nodes. The view, which works for server version 8 and above, can be found here: https://github.com/Percona-Lab/group_replication_tools/blob/master/GR_sys_view_forProxysql_v1.sql

Run that SQL on the PRIMARY node of the Group Replication cluster.
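
After loading the script, you can verify that the view exists and returns data. The view name below is taken from the linked script (it is also the object ProxySQL's native Group Replication support queries), so treat it as an assumption if the script changes:

SELECT * FROM sys.gr_member_routing_candidate_status\G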

Step 4

Now we are ready to activate the native support for Group Replication in ProxySQL. We will use the following group definition:

Writer HG-> 400
Reader HG-> 401
BackupW HG-> 402
Offline HG-> 9401 

INSERT INTO mysql_group_replication_hostgroups (writer_hostgroup,backup_writer_hostgroup,reader_hostgroup, offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind) 
values (400,402,401,9401,1,1,1,100);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
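
Once the rule is loaded, ProxySQL starts placing the members into the hostgroups defined above. You can watch the result from the admin interface with something like:

SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers
ORDER BY hostgroup_id, hostname;

With the settings above, you should end up with a single node in hostgroup 400 (the writer), the readers in 401, and any additional writer candidates parked in 402.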

A few comments here about the parameters (for a full reference see https://proxysql.com/documentation/main-runtime#mysql_group_replication_hostgroups ). We recommend always setting the number of writers to 1, and writer_is_also_reader to 1 as well, to obtain the most reliable results.

max_writers: 1
writer_is_also_reader: 1

The max_transactions_behind is a subjective parameter that you should calculate based on your needs. If, for instance, you cannot tolerate stale reads, it is safer to set this value to a low number (i.e., 50) and to set the following on all Group Replication nodes:

set global group_replication_consistency=AFTER;

If, instead, you have no strict requirements about stale reads, you can relax the parameter and ignore the group_replication_consistency setting. Our recommended settings are group_replication_consistency=AFTER and max_transactions_behind: 100.
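
If you do decide to enforce the consistency level, keep in mind that SET GLOBAL does not survive a restart. On MySQL 8.0 you can persist it instead; a minimal sketch, assuming you want this on every node:

SET PERSIST group_replication_consistency = 'AFTER';
-- verify the current value
SELECT @@global.group_replication_consistency;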

Proxy HA

The final step is to enable High Availability for the ProxySQL layer. In this approach, we will use the well-known keepalived service. First, install the keepalived software using yum or apt-get on each ProxySQL node:

sudo yum install -y keepalived
Or
sudo apt-get install -y keepalived

Then modify the /etc/keepalived/keepalived.conf file according to your setup. In our case:

  • Proxy1 192.168.4.0/24 dev enp0s9 proto kernel scope link src 192.168.4.191
  • Proxy2 192.168.4.0/24 dev enp0s9 proto kernel scope link src 192.168.4.192
  • VIP       192.168.4.194

We want the primary to be Proxy1 and the failover node to be Proxy2. Given that, the config will look like this:

cat /etc/keepalived/keepalived.conf 
global_defs {
  # Keepalived process identifier
  router_id  proxy_HA
}
# Script used to check if Proxy is running
vrrp_script check_proxy {
  script "killall -0 proxysql"
  interval 2
  weight 2
}
# Virtual interface
# The priority specifies the order in which the assigned interface takes over in a failover
vrrp_instance VI_01 {
  state MASTER 
  interface enp0s9
  virtual_router_id 51
  priority 100  <----- This needs to be different for each ProxySQL node, like 100/99 
 
  # The virtual ip address shared between the two load balancers
  virtual_ipaddress {
   192.168.4.194  dev enp0s9
  }
  track_script {
    check_proxy
  }
}

Once done, start the keepalived service, and from now on the VIP will be associated with Proxy1 unless the service is down.

In the system log:

proxysql1 Keepalived_vrrp[17422]: VRRP sockpool: [ifindex(4), proto(112), unicast(0), fd(10,11)]
proxysql1 Keepalived_vrrp[17422]: VRRP_Script(check_proxy) succeeded
proxysql1 Keepalived_vrrp[17422]: VRRP_Instance(VI_01) Transition to MASTER STATE
proxysql1 Keepalived_vrrp[17422]: VRRP_Instance(VI_01) Received advert with higher priority 101, ours 100
proxysql1 Keepalived_vrrp[17422]: VRRP_Instance(VI_01) Entering BACKUP STATE
proxysql1 Keepalived_vrrp[17422]: VRRP_Instance(VI_01) Changing effective priority from 100 to 102
proxysql1 Keepalived_vrrp[17422]: VRRP_Instance(VI_01) forcing a new MASTER election
proxysql1 Keepalived_vrrp[17422]: VRRP_Instance(VI_01) Transition to MASTER STATE
proxysql1 Keepalived_vrrp[17422]: VRRP_Instance(VI_01) Entering MASTER STATE
proxysql1 Keepalived_vrrp[17422]: VRRP_Instance(VI_01) setting protocol VIPs.
proxysql1 Keepalived_vrrp[17422]: Sending gratuitous ARP on enp0s9 for 192.168.4.194
proxysql1 Keepalived_vrrp[17422]: VRRP_Instance(VI_01) Sending/queueing gratuitous ARPs on enp0s9 for 192.168.4.194
proxysql1 Keepalived_vrrp[17422]: Sending gratuitous ARP on enp0s9 for 192.168.4.194
..
proxysql1 Keepalived_vrrp[17422]: Sending gratuitous ARP on enp0s9 for 192.168.4.194
proxysql1 avahi-daemon[989]: Registering new address record for 192.168.4.194 on enp0s9.IPv4.

Disaster Recovery Implementation

The implementation of a DR (Disaster Recovery) site will follow the same direction provided for the main site. There are only some generic rules that should be followed:

  • A DR site should be located in a different geographic location than the main site (several hundred kilometers/miles away).
  • The connection link between the main site and the DR site can only be established using asynchronous replication (standard MySQL replication setup; a minimal placeholder example follows below).
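
As an illustration only, the asynchronous link is a standard replication channel configured on the DR site's primary and pulling from the main site's primary. The hostname, user, password, and channel name below are placeholders, not values from this setup:

-- on the DR site's primary, pointing at the main site's primary (placeholder values)
CHANGE MASTER TO
  MASTER_HOST = 'main-site-primary.example.com',
  MASTER_USER = 'replica_dr',
  MASTER_PASSWORD = 'replicadrpw',
  MASTER_AUTO_POSITION = 1
  FOR CHANNEL 'main_to_dr';
START SLAVE FOR CHANNEL 'main_to_dr';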

Monitoring

There are a few ways to monitor a Group Replication cluster. The easiest is to have Percona Monitoring and Management (version 2.10 or later) deployed to do it for you. For an easy installation of Percona Monitoring and Management, check out this quickstart.

Percona Monitoring and Management

The only important thing to remember is that when registering the Percona Server for MySQL node or the MySQL node, you should specify the --replication-set flag.

I.e.:  pmm-admin add mysql --username=pmm --password=pmm --query-source=perfschema --replication-set=gr_test_lab  group_rep4 127.0.0.1:3306

Then you can use the Group Replication Dashboard and monitor your cluster in great detail.

The sections are:

  • Overview (3 panels)
  • Replication Delay Details (3 panels)
  • Transactions (8 panels)
  • Conflicts

From Command Line

From the command line, you need to manually query the tables in the Performance Schema:

+----------------------------------------------+
| replication_applier_configuration            |
| replication_applier_filters                  |
| replication_applier_global_filters           |
| replication_applier_status                   |
| replication_applier_status_by_coordinator    |
| replication_applier_status_by_worker         |
| replication_connection_configuration         |
| replication_connection_status                |
| replication_group_member_stats               |
| replication_group_members                    |
+----------------------------------------------+

For instance, to get the lag in the number of transactions on a node:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(@@global.GTID_EXECUTED,':',-1),'-',-1) last_executed;
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-1),'-',-1) last_received
  FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;
+---------------+
| last_executed |
+---------------+
| 125624        |
+---------------+
1 row in set, 1 warning (0.03 sec)

+---------------+
| last_received |
+---------------+
| 125624        |
+---------------+
1 row in set, 1 warning (0.00 sec)

+----------+
| real_lag |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Or use a more composite query:

SELECT
  conn_status.channel_name as channel_name,
  conn_status.service_state as IO_thread,
  applier_status.service_state as SQL_thread,
  conn_status.LAST_QUEUED_TRANSACTION as last_queued_transaction,
  applier_status.LAST_APPLIED_TRANSACTION as last_applied_transaction,
  LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP - 
                            LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP 'rep delay (sec)',
  LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP - 
                           LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP 'transport time', 
  LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP - 
                           LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP 'time RL',
  LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP - 
                           LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP 'apply time',  
  if(GTID_SUBTRACT(LAST_QUEUED_TRANSACTION, LAST_APPLIED_TRANSACTION) = "","0" ,        abs(time_to_sec(if(time_to_sec(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP)=0,0,timediff(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,now()))))) `lag_in_sec` 
FROM
  performance_schema.replication_connection_status AS conn_status
JOIN performance_schema.replication_applier_status_by_worker AS applier_status
  ON applier_status.channel_name = conn_status.channel_name
ORDER BY lag_in_sec DESC\G

Which will provide information about each applier:

*************************** 1. row ***************************
channel_name: group_replication_applier
IO_thread: ON
SQL_thread: ON
last_queued_transaction: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:125624
last_applied_transaction: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:125621
rep delay (sec): 3.153038
transport time: 0.061327
time RL: 0.001005
apply time: 0.388680
lag_in_sec: 0

As you can see, Percona Monitoring and Management will give you a better view without compromising the details. 

Conclusions

Using these steps and recommendations, you can set up a database infrastructure with high availability based on Group Replication, and use Percona Monitoring and Management to monitor the infrastructure’s performance and health.

Keep in mind that we are constantly working on making our recommendations better. As such, what is illustrated here is subject to change and revision, especially as the adoption of Group Replication increases. The more GR is used, the more edge cases or deviations we will identify, and those are a significant help in refining our best practices.

Apr
07
2021
--

MySQL 101: Basic MySQL Server Triage

MySQL 101 Server Triage

So your MySQL server has crashed.  What do you do now?  When a server is down, in my opinion, there are two steps that are essential, and both are extremely important; neither should be neglected:

  1. Save diagnostic information for determining the root cause analysis (RCA).
  2. Get the server back up and running.

Too many people rush to Step #2 and lose pertinent diagnostics from Step #1.  Likewise, too many people will spend too much time on Step #1 and delay getting to Step #2 and restoring service.  The goal is to collect diagnostics as quickly as possible for later review while getting service restored as fast as possible.

As a Technical Account Manager (TAM) assisting on server restoration calls, I have seen both issues at play.  Technical resources have a tendency to get so bogged down in trying to understand the cause of the server outage that they forget the downtime is costing the business money.  The desire to crawl through server logs, review metrics, pore over system metrics, and so on, can be too tempting for some who are concerned that important diagnostic data will be lost when service is restored.  This is a valid concern, but there must be a middle ground.

Conversely, many, especially those in management, will demand service is restored immediately to continue business functions.  Of course, after the service is back up, the demand for an RCA will come.  Sadly, many metrics, and some logs, are lost when a server is bounced.  Below are basic guidelines on what metrics to collect for MySQL.  The steps are in no particular order.

  1. Save a copy of the MySQL Error Log.
    sudo cp /path/to/datadir/*.log /some/where/safe
  2. Make a copy of the MySQL configuration file.
    sudo cp /path/to/my.cnf /some/where/safe
  3. Make a copy of system logs and save them somewhere on persistent storage in a location that will not be overwritten.  Consider doing something like the following on Linux:
    sudo cp /var/log/syslog /some/where/safe/syslog
    sudo cp /var/log/messages /some/where/safe/messages
    sudo journalctl -e > /some/where/safe/journalctl.txt
  4. If MySQL is running still and you can log in, get some MySQL metrics.  You will want to save the output into files somewhere.
    sudo mysqladmin -i10 -c10 proc > /some/where/safe/mysql_procs.txt
    mysql> SHOW GLOBAL VARIABLES;
    sudo mysqladmin -i10 -c10 ext > /some/where/safe/mysql_ext.txt
    mysql> SHOW ENGINE INNODB STATUS\G
  5. If MySQL is running and you have Percona Toolkit, you should collect some pt-stalk output.
    sudo ./pt-stalk --no-stalk --iterations=2 --sleep=30 --dest=/some/where/safe -- --user=root --password=<mysql-root-pass>;
  6. If you have space and time, a copy of the database files (data directory in MySQL) could be helpful.  Certainly, for many installations, getting all of the data files will be impossible.  If it is a small database and space and time allow, it can be best to get all the files just in case.
    sudo cp -R /path/to/datadir /some/where/safe/datadir
  7. Copy database logs and save them somewhere safe for later review.  Systems like Percona XtraDB Cluster (PXC) will create GRA files during an issue which can be really helpful to look at to determine the root cause.  By combining the GRA header file with the contents of the GRA log files, you can use the mysqlbinlog command to get the records of transactions causing issues.  More information can be found in one of our older blogs here
    Percona XtraDB Cluster (PXC): what about GRA_*.log files?.

    sudo cp /path/to/data/dir/GRA* /some/where/safe/datadir/
  8. Save system metrics pertaining to CPU, I/O, and memory usage:
    sudo mpstat -A 1 60 > /some/where/safe/mpstat.txt
    sudo vmstat 1 60 > /some/where/safe/vmstat.txt
    sudo iostat -dmx 1 60 > /some/where/safe/iostat.txt
  9. Save system info.
    sudo cat /proc/cpuinfo > /some/where/safe/cpuinfo.txt
  10. If you have Percona Toolkit, the following would be very helpful:
    sudo pt-summary > /some/where/safe/pt-summary.txt
    sudo pt-mysql-summary > /some/where/safe/pt-mysql-summary.txt
  11. Get hardware diagnostics.
    # disk info
    sudo df -k > /some/where/safe/df_k.txt
    sudo lsblk -o KNAME,SCHED,SIZE,TYPE,ROTA > /some/where/safe/lsblk.txt
    sudo lsblk --all > /some/where/safe/lsblk-all.txt
    
    # lv/pv/vg only for systems with LVM
    sudo lvdisplay --all --maps > /some/where/safe/lvdisplau-all-maps.txt
    sudo pvdisplay --maps > /some/where/safe/pvdisplay-maps.txt
    sudo pvs -v > /some/where/safe/pvs_v.txt
    sudo vgdisplay > /some/where/safe/vgdisplay.txt
    
    # nfsstat for systems with NFS mounts 
    sudo nfsstat -m > /some/where/safe/nfsstat_m.txt
    sudo nfsiostat 1 120 > /some/where/safe/nfsiostat.txt
    
    # Collect hardware information 
    sudo dmesg > /some/where/safe/dmesg.txt
    sudo free -m > /some/where/safe/free_m.txt
    sudo dmesg -T > /some/where/safe/dmesg_t.txt
    ulimit -a > /some/where/safe/ulimit_a.txt
    sudo cat /proc/sys/vm/swappiness > /some/where/safe/swappiness 
    sudo numactl --hardware > /some/where/safe/numactl-hardware.txt

It goes without saying, it would be best to script the above into a useful bash script you can run when there is an issue.  Just be sure to test the script in advance of an issue.

Again, the goal is to preserve diagnostic data that could be useful for determining the root cause of the issue at a later time, after the service is restored.  Just don’t get caught up in looking through the above diagnostics!  Certainly, more data is better, but the above is a great starting point.  As time goes on, you may realize you wish you had other metrics and can add them to your script or Standard Operating Procedure (SOP).

Naturally, adding monitoring like Percona Monitoring and Management (PMM) would be a great option that can save you a lot of time and collect even more trends over time which can be extremely helpful.

With the above diagnostics, you would have a ton of information in the event of an issue to find the root cause.  Now, you can sort through the diagnostics.  Of course, if you need help with that, Percona can help you here as well.

Apr
06
2021
--

Monitoring OPNSense Firewall with Percona Monitoring and Management

OPNsense firewall Percona Monitoring Management

I try to use open source when a good solution exists, so for my home firewall, I’m using OPNSense – a very powerful FreeBSD-based firewall appliance with great features along with a powerful GUI.

One of the plugins available with OPNSense is  node_exporter, which exposes a lot of operating system metrics through the Prometheus protocol.

Installing this plugin will allow you to monitor your OPNSense based firewall with any Prometheus-compatible system including, as you have guessed,  Percona Monitoring and Management (PMM).

For best results, you will need PMM 2.14 or later, as it has improved support for external exporters.

Adding OPNSense to PMM for monitoring requires just one simple command:

 

pmm-admin add external-serverless --url=http://10.11.13.1:9100/metrics --external-name fw01 --group opnsense

 

Let’s break down what this command does:

  • We are adding this as a “serverless” exporter because there are no pmm-agent processes running on that node and the only access we have to it is through the Prometheus protocol.
  • 10.11.13.1 is the IP of the firewall.  Port 9100 is what OPNSense uses by default.
  • I chose to name this firewall “fw01” for the purpose of monitoring; this is how it will be identified in PMM.
  • We put it in the group “opnsense”, which will allow us to easily have dashboards that are focused on OPNSense firewalls only, without accidentally picking up data from other services.

If you prefer, you can also use your PMM installation’s UI instead (see PMM -> PMM Add Instance menu) and pick “External Service”.

PMM External Service

 

After this step, we will already have some information available in our PMM installation.

 

PMM dashboard

 

The Node Summary Dashboard will pick up some of the OS metrics; however, as this dashboard is built with a focus on Linux rather than FreeBSD, not all data will be populated or verified to be correct, and this should be seen as a lucky coincidence rather than an expected outcome.

The next step you can take is to look if there are any dashboards available for the system you’re looking to monitor.  A quick search located this dashboard on the Grafana website.

While this dashboard was a good start, it relied on very particular naming of the hosts in order to work and had some bugs which needed fixing.   If a given dashboard was not designed to work with PMM, you also often need to make some adjustments because PMM applies different labels to the metrics compared to a vanilla Prometheus installation.

I uploaded an updated dashboard back to the Grafana website.

This makes installing it with PMM very easy; just go to Import Dashboard and enter Dashboard ID 14150.

 

import dashboard percona monitoring and management

 

Once the dashboard is imported you will see a variety of data the OpnSense built-in node_exporter provides:

 

OpnSense built-in node_exporter

 

That’s it!


Percona Monitoring and Management is free to download and use. Try it today!

Apr
01
2021
--

Infinitely Scalable Storage with High Compression Feature

Infinitely Scalable Storage with High Compression Feature

It is no secret that compute and storage costs are the main drivers of cloud bills. Migration of data from the legacy data center to the cloud looks appealing at first as it significantly reduces capital expense (CapEx) and keeps operational expenses (OpEx) under control. But once you see the bill, the lift and shift project does not look that promising anymore. See Percona’s recent open source survey, which shows that many organizations saw unexpected growth around cloud and data.

Storage growth is an organic process for the expanding business: more customers store more data, and more data needs more backups and disaster recovery storage for low RTO.

Today, the Percona Innovation Team, which is part of the Engineering organization, is proud to announce a new feature – High Compression. With this feature enabled, your MySQL databases will have infinite storage at zero cost.

The Problem

Our research team was digging into the problem of storage growth. They found that the storage growth of a successful business inevitably leads to an increase in the cloud bill. After two years of research, we got the data we needed and the problem is now clear; you can see it in the chart below:

The correlation is clearly visible – the more data you have, the more you pay.

The Solution

Once our Innovation Team received the data, we started working day and night on the solution. The goal was to change the trend and break the correlation. That is how after two years, we are proud to share with the community the High Compression feature. You can see the comparison of the storage costs with and without this new feature below:

Option           | 100 TB AWS EBS | 100 TB AWS S3 for backups | 100 TB AWS EBS + High Compression | 100 TB AWS S3 for backups + High Compression
Annual run rate  | $120,000       | $25,200                   | $1.2                              | < $1

As you can see, it is a 100,000x difference! What is more interesting, the cost of the storage with the High Compression feature enabled always stays flat, and the chart now looks like this:

Theory

Not many people know, but data on disks is stored as bits, which are 0s and 1s. They form the binary sequences which are translated into normal data.

After thorough research, we came to the conclusion that we can replace the 1s with 0s easily. The formula is simple:

f(1) = 0

So instead of storing all these 1s, our High Compression feature stores zeroes only:

 

Implementation

The component which does the conversion is called the Nullifier, and every bit of data goes through it. We are first implementing this feature in Percona Operator for Percona XtraDB Cluster and below is the technical view of how it is implemented in Kubernetes:

As you see, all the data written by the user (all INSERT or UPDATE statements) goes through the Nullifier first, and only then is it stored on the Persistent Volume Claim (PVC). With the High Compression feature enabled, the size of the PVC can always be 1 GB.

Percona is an open source company and we are thrilled to share our code with everyone. You can see the Pull Request for the High Compression feature here. As you see in the PR, our feature provides the Nullifier through the underestimated and very powerful Blackhole engine.

  if [ "$HIGH_COMPRESSION" == 'yes' ]; then
        sed -r "s|^[#]?default_storage_engine=.*$|default_storage_engine=BLACKHOLE|" ${CFG} 1<>${CFG}
        grep -E -q "^[#]?pxc_strict_mode" "$CFG" || sed '/^\[mysqld\]/a pxc_strict_mode=PERMISSIVE\n' ${CFG} 1<>${CFG}
    fi

The High Compression feature will be enabled by default starting from PXC Operator version 1.8.0, but we have added a flag into cr.yaml to disable this feature if needed:

spec.pxc.highCompression: true

Backups with the High Compression feature are blazing fast and take seconds with any amount of data. The challenge our Engineering team is working on now is recovery. The Nullifier does the job, but recovering the data is hard. We are confident that the De-Nullifier will be released in 1.8.0 as well.

Conclusion

Percona is spearheading innovation in the database technology field. The High Compression feature solves the storage growth problem and as a result, reduces the cloud bill significantly. The release of the Percona Kubernetes Operator for Percona XtraDB Cluster 1.8.0 is planned for mid-April, but this feature is already available in Tech Preview.

As a quick peek at our roadmap, we are glad to share that the Innovation Team has already started working on the High-Density feature, which will drastically reduce the compute footprint required to run MySQL databases.
