RocksDB Variables in Percona Server for MySQL, Updates to Percona Server for MongoDB: Release Roundup August 31, 2020

It’s release roundup time here at Percona!

Our Release Roundups showcase the latest software updates, tools, and features to help you manage and deploy our software, with highlights and critical information, as well as links to the full release notes and direct links to the software or service itself.

Today’s post includes those releases and updates that have come out since August 17, including new features and improvements for Percona Server for MySQL 5.7.31-34 as well as updated versions and new features for our MongoDB software.


Percona Server for MySQL 5.7.31-34

On August 24, 2020, Percona Server for MySQL 5.7.31-34 was released. It includes all the features and bug fixes available in MySQL 5.7.31 Community Edition, in addition to enterprise-grade features developed by Percona. Along with some bug fixes, this release brings several new features and improvements, including documentation for the RocksDB variables rocksdb_max_background_compactions, rocksdb_max_background_flushes, and rocksdb_max_bottom_pri_background_compactions, the addition of coredumper functionality, and enhanced crash artifacts (core dumps and stack traces) that provide additional information to the operator.
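With these variables now documented, you can check their current values from any client session. A minimal sketch (the values returned depend entirely on your configuration):

```sql
-- List the newly documented MyRocks background-thread variables
SHOW GLOBAL VARIABLES LIKE 'rocksdb_max_background%';
```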

Download Percona Server for MySQL 5.7.31-34


Percona Backup for MongoDB 1.3.0

Percona Backup for MongoDB 1.3.0 was released on August 26, 2020. It is a distributed, low-impact solution for creating consistent backups across a MongoDB sharded cluster (or a single replica set), and for restoring those backups to a specific point in time. New features include an oplog archiver thread for point-in-time recovery (PITR), and the ability for “pbm restore” to accept an arbitrary point in time when PITR oplog archives are available.

Download Percona Backup for MongoDB 1.3.0


Percona Server for MongoDB 4.4.0-1

August 26, 2020, saw the release of Percona Server for MongoDB 4.4.0-1. It is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 4.4.0 Community Edition. It supports MongoDB 4.4.0 protocols and drivers.

This release includes all features of MongoDB 4.4.0 Community Edition and provides Enterprise-level enhancements for free.

Download Percona Server for MongoDB 4.4.0-1


Percona Distribution for MongoDB 4.4.0

Percona Distribution for MongoDB 4.4.0, a collection of solutions to run and operate your MongoDB efficiently with the data being consistently backed up, was released on August 26, 2020. It includes the following components:

  • Percona Server for MongoDB – a fully compatible open source, drop-in replacement for MongoDB.
  • Percona Backup for MongoDB – a distributed, low-impact solution for achieving consistent backups of MongoDB sharded clusters and replica sets.

This release of Percona Distribution for MongoDB is based on Percona Server for MongoDB 4.4.0-1 and Percona Backup for MongoDB 1.3.0.

Download Percona Distribution for MongoDB 4.4.0


That’s it for this roundup, and be sure to follow us on Twitter to stay up-to-date on the most recent releases! Percona is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training, and software for MySQL, MariaDB, MongoDB, PostgreSQL, and other open source databases in on-premises and cloud environments.

We understand that choosing open source software for your business can be a potential minefield. You need to select the best available options, which fully support and adapt to your changing needs. Choosing the right open source software can allow you access to enterprise-level features, without the associated costs.

In our white paper, we discuss the key features that make open source software attractive, and why Percona’s software might be the best option for your business.

Download: When is Percona Software the Right Choice?


Percona Monthly Bug Report: August 2020


At Percona, we operate on the premise that full transparency makes a product better. We strive to build the best open source database products, but also to help you manage any issues that arise in any of the databases that we support. And, in true open source form, we report back on any issues or bugs you might encounter along the way.

We constantly update our bug reports and monitor other boards to ensure we have the latest information, but we wanted to make it a little easier for you to keep track of the most critical ones. This monthly post is a central place to get information on the most noteworthy open and recently resolved bugs. 

In this edition of our monthly bug report, we have the following list of bugs:

Percona Server/MySQL Bugs

PS-7264(MySQL#83799):  [ERROR] InnoDB: dict_load_foreigns() returned 38 for ALTER TABLE

Affects Version/s: 5.6,5.7  [Tested/Reported version 5.6.49,5.7.30 ]

Critical, since a user can easily hit this bug when performing ALTER TABLE on a table with foreign keys in the cases specified in the bug report. With a debug build, it crashes the server.

MySQL#91977: Dropping Large Table Causes Semaphore Waits; No Other Work Possible

Affects Version/s: 5.7,8.0  [Tested/Reported version 5.7.23, 8.0.12]

Running DROP/ALTER on a large table can trigger this bug. The DROP/ALTER query gets stuck in the ‘checking permissions’ state, and mysqld may later crash due to a long semaphore wait. It’s critical since it can result in unplanned downtime. The issue is also evident with the pt-online-schema-change tool while performing ALTER operations on large tables.

PS-6990(MySQL#100118): Server doesn’t restart because of too many gaps in the mysql.gtid_executed table

Affects Version/s: 8.0  [Tested/Reported version 8.0.19, 8.0.20,8.0.21]

Fixed Version/s: 8.0.21 (Next release)

This issue was introduced by the Clone plugin replication coordinates worklog (WL#9211) changes for updating the mysql.gtid_executed table. With this bug, the replica becomes unusable: the mysql.gtid_executed table grows huge, the server crashes, and it never comes back because of how GTID compression works.

PS-7203:  audit plugin memory leak on replicas when opening tables.

Affects Version/s: 5.7,8.0  [Tested/Reported version 5.7.26-29, 5.7.29-32]

Fixed Version/s: 5.7.31, 8.0.21 (Next release)

The audit plugin could leak memory, resulting in an OOM on the replica if statement-based replication is used.

PS-7163 (MySQL#99286): Concurrent updates cause a crash in row_search_mvcc

Affects Version/s: 5.7  [Tested/Reported version 5.7.29]

Concurrent updates on the same record (DELETE/INSERT and UPDATE from different sessions) could lead to this crash. This bug is critical since such operations are common for many database environments.

Percona XtraDB Cluster

PXC-3373: [ERROR] WSREP: Certification exception: Unsupported key prefix: ^B: 71 (Protocol error) at galera/src/key_set.cpp:throw_bad_prefix():152

Affects Version/s: 5.7   [Tested/Reported 5.7.30]

IST to a lower-version node will fail with this error when a write load is running on the donor node. A rolling upgrade is the most commonly used upgrade method and leaves PXC nodes on different versions for a while; in such cases, a user can easily experience this issue.

Possible workarounds:

  • Upgrade the PXC nodes so the whole cluster runs the same version.
  • Stop the write load on the donor node while IST is running.

PXC-3352: Unexpected ERROR 1205 modifying a child table in an FK relationship

Affects Version/s: 5.7   [Tested/Reported 5.7.28]

Fixed Version/s: 5.7.31, 8.0.20 (Next release)

Affects users who use foreign keys with PXC. When deleting/updating from a child table in an FK relationship while the parent table’s referenced rows are locked, the operation on the child table failed with a lock wait timeout error even once the parent table was unlocked.

Percona XtraBackup

PXB-2237: PXB crashes during a backup when an encrypted table is updated

Affects Version/s:  2.4  [Tested/Reported version 2.4.20]

Databases with encrypted tables are affected by this bug. As a workaround, taking backups during off-peak hours can avoid this crash.

PXB-2178: Restoring datadir from partial backup results in an inconsistent data dictionary

Affects Version/s:  8.0  [Tested/Reported version 8.0.11]

As a result of this bug, after a restore you will see additional databases that were not part of the partial backup. The issue is evident only in XtraBackup 8.0, due to the new data dictionary implementation in MySQL 8.0; it is not reproducible with XtraBackup 2.4.

The workaround for this issue is to use “DROP DATABASE IF EXISTS” to drop the unwanted extra databases.
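For example (the database name below is purely illustrative, not one from the bug report):

```sql
-- Drop a leftover database that was not part of the partial backup
DROP DATABASE IF EXISTS leftover_db;
```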

Percona Toolkit

PT-1747: pt-online-schema-change was bringing the database into a broken state when applying the “rebuild_constraints” foreign keys modification method if any of the child tables were blocked by the metadata lock.

Affects Version/s:  3.x   [Tested/Reported version 3.0.13]

Critical bug since it can cause data inconsistency in the user environment. It potentially affects anyone who rebuilds tables with foreign keys.

PT-1853: pt-online-schema-change doesn’t handle self-referencing foreign keys properly

When using pt-osc to change a table that has a self-referencing FK, pt-osc creates the FK pointing to the old table instead of the _new table. Because of this, pt-osc needs to rebuild the FK after swapping the tables (dropping the FK and recreating it pointing to the _new table). This can cause issues because the INPLACE algorithm is only supported when foreign_key_checks is disabled; otherwise, only the COPY algorithm is supported.

Affects Version/s:  3.x  [Tested/Reported version 3.2.0]

Fixed Version/s: 3.2.1

Affects anyone who rebuilds tables with a self-referencing foreign key.

PMM  [Percona Monitoring and Management]

PMM-4547: MongoDB dashboard replication lag count incorrect

Affects Version/s:  2.9  [Tested/Reported version 2.0,2.9.0]

The replica lag metric takes its value from the delayed node, which makes it unreliable for all nodes, so PMM cannot be used to monitor delays.

PMM-6336: Incompatible pmm-admin options: ‘--disable-queryexamples’ and ‘--query-source=perfschema’

If a user doesn’t want to show query examples that contain sensitive data on the PMM Query Analytics UI, they can disable them with the --disable-queryexamples option in pmm-admin.

However, this option does not work as expected when a MySQL instance is added using the --query-source perfschema option: query examples remain visible with the original data on the PMM Query Analytics UI.

Workaround: the issue does not occur when using --query-source slowlog.

Affects Version/s:  2.9  [Tested/Reported version 2.9]

Fixed version/s: 2.10.0

PMM-5823: pmm-server log download and the API to get the version fail with a timeout

Affects Version/s:  2.8  [Tested/Reported version 2.2]

This occurs at irregular intervals and only affects PMM Docker installations that have no external internet access from the pmm-server Docker container. The issue is only visible for a while (around 5-10 minutes) after starting pmm-server; after that, you will not see it.


We welcome community input and feedback on all our products. If you find a bug or would like to suggest an improvement or a feature, learn how in our post, How to Report Bugs, Improvements, New Feature Requests for Percona Products.

For the most up-to-date information, be sure to follow us on Twitter, LinkedIn, and Facebook.

Quick References:

Percona JIRA  

MySQL Bug Report

Report a Bug in a Percona Product


About Percona:

As the only provider of distributions for all three of the most popular open source databases—PostgreSQL, MySQL, and MongoDB—Percona provides expertise, software, support, and services no matter the technology.

Whether it’s enabling developers or DBAs to realize value faster with tools, advice, and guidance, or making sure applications can scale and handle peak loads, Percona is here to help.

Percona is committed to being open source and preventing vendor lock-in. Percona contributes all changes to the upstream community for possible inclusion in future product releases.


Talking Drupal # 262 – Backwards Compatibility

In today’s show we discuss Jeff Geerling’s recent blog post, “Backwards Compatibility.” While we announced this post on the last show, we’ve had time to digest it. We are talking about it today, along with a few other things.


  • John – new door
  • Nic – Taskmasters
  • Stephen – My hero
  • People Powered
  • Community feedback
  • New England Drupal Camp
  • Layout Building and Translation
  • Backward Compatibility



People Powered

New England Drupal Camp

Did breaking backwards compatibility kill Drupal?


Stephen Cross – @stephencross

John Picozzi – @johnpicozzi

Nic Laflin – @nicxvan


Steno raises $3.5 million led by First Round to become an extension of law offices

The global legal services industry was worth $849 billion in 2017 and is expected to become a trillion-dollar industry by the end of next year. Little wonder that Steno, an LA-based startup, wants a piece.

Like most legal services outfits, what it offers are ways for law practices to run more smoothly, including in a world where fewer people are meeting in conference rooms and courthouses and operating instead from disparate locations.

Steno first launched with an offering that centers on court reporting. It lines up court reporters, as well as pays them, removing both potential headaches from lawyers’ to-do lists.

More recently, the startup has added offerings like a remote deposition videoconferencing platform that it insists is not only secure but can manage exhibit handling and other details in ways meant to meet specific legal needs.

It also, very notably, has a lending product that enables lawyers to take depositions without paying until a case is resolved, which can take a year or two. The idea is to free attorneys’ financial resources — including so they can take on other clients — until there’s a payout. Of course, the product is also a potentially lucrative one for Steno, as are most lending products.

We talked earlier this week with the company, which just closed on a $3.5 million seed round led by First Round Capital (it has now raised $5 million altogether).

Unsurprisingly, one of its founders is a lawyer named Dylan Ruga who works as a trial attorney at an LA-based law group and knows first-hand the biggest pain points for his peers.

More surprising is his co-founder, Gregory Hong, who previously co-founded the restaurant reservation platform Reserve, which was acquired by Resy, which was acquired by American Express. How did Hong make the leap from one industry to a seemingly very different one?

Hong says he might not have gravitated to the idea if not for Ruga, who was Resy’s trademark attorney and who happened to send Hong the pitch behind Steno to get Hong’s advice. He looked it over as a favor, then he asked to get involved. “I just thought, ‘This is a unique and interesting opportunity,’ and said, ‘Dylan, let me run this.’ ”

Today the 19-month-old startup has 20 full-time employees and another 10 part-time staffers. One major accelerant to the business has been the pandemic, suggests Hong. Turns out tech-enabled legal support services become even more attractive when lawyers and everyone else in the ecosystem is socially distancing.

Hong suggests that Steno’s idea to marry its services with financing is gaining adherents, too, including amid law groups like JML Law and Simon Law Group, both of which focus largely on personal injury cases.

Indeed, Steno charges — and provides financing — on a per-transaction basis right now, even while its revenue is “somewhat recurring,” in that its customers constantly have court cases.

Still, a subscription product is being considered, says Hong. So are other uses for its videoconferencing platform. In the meantime, says Hong, Steno’s tech is “built very well” for legal services, and that’s where it plans to remain focused.


ProxySQL Binary Search Solution for Rules

We sometimes receive challenging requests… this is a story about one of those times.

The customer had implemented a sharding solution and wanted us to review alternatives or improvements. We analyzed the possibility of using ProxySQL, as it looked like a simple implementation. However, since we had 200 shards, we had to implement 200 rules: the first shard didn’t have much overhead, but the last one had to go through all 200 rules and took longer.

My first idea was to use FLAGIN and FLAGOUT to create a B-Tree, but the performance was the same. Reviewing the code, I realized that the rules were implemented as a list, which means that, in the end, all the rules are processed until the right one is hit, and FLAGIN is used just to filter rules out.

At that point, I asked, what could I do? Is it possible to implement it differently? What is the performance impact?

One Problem, Two Solutions

It is worth clarifying again that I had to change the code because I found no performance gain with the current implementation. Writing the rules to take advantage of binary search took me halfway; implementing the rules with a Map delivered the expected performance gain, as now we jump to the right rule chain and skip the others.


I decided to change the ProxySQL code to use a different structure (a Map) to store the rules and, when a FLAGOUT is present, start down that path. This is 100% a proof of concept; do not use the code in this repo in production, as it is not thoroughly tested and might have several bugs. However, we can trust the behavior and results of the test under the scenario that I’m presenting.

Base case

Using ProxySQL without any changes and with 1 rule per shard will be our base case. This means it evaluates 1 rule for shard 1, but 200 evaluations are needed to reach the rule for shard 200.

In this case, the rules will be like this:

insert into mysql_query_rules (active,match_pattern,apply,destination_hostgroup) values (1,'\/\* 000',1,0);
insert into mysql_query_rules (active,match_pattern,apply,destination_hostgroup) values (1,'\/\* 001',1,0);
insert into mysql_query_rules (active,match_pattern,apply,destination_hostgroup) values (1,'\/\* 002',1,0);
insert into mysql_query_rules (active,match_pattern,apply,destination_hostgroup) values (1,'\/\* 003',1,0);

Binary search use case

In order to reduce the number of evaluations, I decided to use the divide and conquer idea. I created the rules in this way:

replace into mysql_query_rules (active,match_pattern,flagIN,flagOUT,apply,destination_hostgroup) values (1,'\/\* [0-1]',0,01,0,999);
replace into mysql_query_rules (active,match_pattern,flagIN,flagOUT,apply,destination_hostgroup) values (1,'\/\* 0' ,01, 0,1, 0);
replace into mysql_query_rules (active,match_pattern,flagIN,flagOUT,apply,destination_hostgroup) values (1,'\/\* 1' ,01, 0,1, 1);
replace into mysql_query_rules (active,match_pattern,flagIN,flagOUT,apply,destination_hostgroup) values (1,'\/\* 2' , 0, 0,1, 2);
replace into mysql_query_rules (active,match_pattern,flagIN,flagOUT,apply,destination_hostgroup) values (1,'\/\* 3' , 0, 0,1, 3);

There will be more rules to write, but the number of evaluations is lower and evenly distributed:

Shard | Amount of Evaluations
0     | 2
1     | 3
2     | 2
3     | 3

Rule evaluation

Take into account that evaluating a rule basically means reading its parameters and comparing them. This is not much work if you have a small number of rules, but we had 200 shards, so we needed at least 200 rules. Let’s compare how many evaluations are made in each case:

root@ProxySQL_git:~/git/proxysql# grep "Evaluating rule_id" /var/lib/proxysql/proxysql.log | wc -l
root@ProxySQL_git:~/git/proxysql# grep "Evaluating rule_id" /var/lib/proxysql/proxysql.log | wc -l

The first number is the number of evaluations ProxySQL needs using the list; the second is using the B-Tree and Map solution. As you can see, we are performing 5.3 times fewer evaluations.


For the test, I created 3 EC2 instances with these roles:

  • App simulator which is going to run a script that simulates 32 threads running 2M queries like this:
/* 000 */ select 'test' from dual;

  • ProxySQL server, which is going to run both versions, each with its best solution.
  • Percona Server

The original version of ProxySQL was able to execute 36k queries per second, while the version using Map and B-Tree was able to execute 61k queries per second, an increase in throughput of roughly 70%.

Another thing to consider is the load in the ProxySQL server for both tests:

In the first picture, we see the server reaching 90% CPU usage, while with Map and B-Tree it stays under 60%.


I think this proof of concept showed 3 important facts:

  1. ProxySQL is an amazing tool that is still growing.
  2. The performance penalty of using a large number of rules can be reduced.
  3. Writing rules with binary search in mind is not only a solution for sharding; it could also be used with query hashes for read-write splitting.

ProxySQL Overhead — Explained and Measured

ProxySQL brings a lot of value to your MySQL infrastructure, such as caching or connection multiplexing, but it does not come for free: your database traffic needs to go through additional processing, which adds some overhead. In this blog post, we’re going to discuss where this overhead comes from and measure it.

Types of Overhead and Where it Comes From 

There are two main types of overhead to consider when it comes to ProxySQL — Network Overhead and Processing Overhead. 

Network Overhead largely depends on where you locate ProxySQL. For example, in case you deploy ProxySQL on the separate host (or hosts) as in this diagram: 

The application will have added network latency for all requests, compared to accessing the MySQL servers directly. This latency can range from a fraction of a millisecond, if ProxySQL is deployed on the same local network, to much more than that if you make poor choices with ProxySQL placement.

I have seen exceptionally poor deployments with ProxySQL in a different region from MySQL and the application, causing a delay of tens of milliseconds (and more than 100% overhead for many queries).

Processing Overhead

The second kind of overhead is processing overhead: every request ProxySQL receives undergoes additional processing on the ProxySQL side (compared to talking to MySQL directly). If you have enough CPU power available (the CPU is not saturated), then the main drivers of the cost of such processing are the size of the query, the size of its result set, and your ProxySQL configuration. The more query rules you have, and the more complicated they are, the more processing overhead you should expect.

In the worst-case scenario, I’ve seen thousands of regular expression based query rules which can add very high overhead. 

Another reason for high processing overhead can be improper ProxySQL configuration. ProxySQL, as of version 2.0.10, defaults to a maximum of 4 processing threads (see the mysql-threads global variable), which limits it to using no more than 4 CPU cores. If you’re running ProxySQL on a server with many more CPU cores and see ProxySQL pegged on CPU usage, you may increase this number up to the number of your CPU cores.
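The thread count can be raised through the ProxySQL admin interface. A sketch (the value 16 is an example, not a recommendation; size it to your cores):

```sql
-- On the ProxySQL admin interface (typically port 6032):
UPDATE global_variables SET variable_value='16'
WHERE variable_name='mysql-threads';
SAVE MYSQL VARIABLES TO DISK;
-- mysql-threads is read at startup, so restart ProxySQL afterwards,
-- e.g. systemctl restart proxysql
```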

The Linux “top” tool is a good way to see if ProxySQL is starved for resources: if you have mysql-threads set to 4 and ProxySQL is showing 400% CPU usage, that is the problem.

Also watch for overall CPU utilization, especially if something else is running on the system beyond ProxySQL – oversubscribed CPU will cause additional processing delays. 

Reducing Overhead 

While this blog post looks at the additional overhead ProxySQL introduces, ProxySQL can also reduce overhead: the cost of establishing network connections (especially with TLS) can be drastically lower if you run ProxySQL local to the application instance and let it maintain persistent connections to a MySQL server.

Let’s Measure It!

I decided not to measure network overhead, because it is far too environment-specific, and instead to look at processing overhead when running MySQL, ProxySQL, and the benchmark client on the same box. We will try both TCP/IP and a Unix domain socket to connect to ProxySQL, because it makes quite a difference, and we will also look at Prepared Statements versus standard non-prepared statements. A Google Spreadsheet with all results and benchmark parameters is available here.

We use a plain ProxySQL setup with no query rules and only one MySQL server configured, so overhead is minimal in this regard.
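Such a minimal setup can be sketched through the admin interface as follows (hostname, port, and hostgroup are illustrative, not the benchmark’s actual values):

```sql
-- Register a single backend and leave mysql_query_rules empty
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (0, '127.0.0.1', 3306);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```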

To get stable results with single-thread tests we had to set up CPU affinity as described in this blog post.


Let’s start with the most non-scientific test: running mysqldump on a large table (some 2GB) and measuring how long it takes. This test exposes how expensive result processing is in ProxySQL, as the query routing work in this case is negligible.

We can see 20% longer times with ProxySQL (though, considering the result processing done by mysqldump itself, the actual query execution time difference is likely higher).

Another interesting way to think about it: we added 4.75 sec to process 10 million rows, meaning the ProxySQL overhead is 475ns per roughly 200-byte row, which is actually pretty good.

64 Concurrent Connections Workload 

For this workload, I’m using a server with 28 cores and 56 logical CPU threads, and I had to raise mysql-threads to 32 to make sure ProxySQL is not keeping itself on a diet.

There is a lot of interesting data here. First, we can see that disabling Prepared Statements gives us a 15% slowdown with a direct MySQL connection and about 13.5% when going through ProxySQL, which makes sense, as the processing overhead on the ProxySQL side should not increase as much when Prepared Statements are disabled.

The performance difference between a direct connection and going through ProxySQL is significant: going directly is almost 80% faster when Prepared Statements are in use, and over 75% faster when they are disabled.

If you think about these numbers, considering that sysbench itself takes some resources, for trivial primary key lookup queries the amount of resources ProxySQL requires is comparable to what MySQL Server itself needs to serve the query.

Single Connection Workload

Let’s now take a look at the performance of the same simple point lookup queries, but using only a single thread. We also pin MySQL, sysbench, and ProxySQL to different CPU cores so there is no significant contention for CPU resources and we can look at efficiency. In this test, all connections are made using a Unix socket, so we’re looking at the best-case scenario, and Prepared Statements are enabled.

The direct connection gives some 55% better throughput than ProxySQL. 

The other way we can do the math is to see how long it takes to serve the query directly versus with ProxySQL in the middle: it is 46 microseconds with MySQL directly and 71 microseconds when going through ProxySQL, meaning ProxySQL adds around 25 microseconds.

While 25 microseconds is a large portion of total query execution time in this single-host environment with trivial queries, it may be far less significant for more complicated queries and network-based deployments.

Unix Socket vs TCP/IP

As I recently wrote, there is quite a performance difference between using TCP/IP and a Unix socket for local MySQL connections. It is reasonable to assume the same applies to a ProxySQL deployment, only with ProxySQL we have two connections to take care of: the connection between ProxySQL and the MySQL server, and the one between ProxySQL and the application. In our single-host test, we can use a Unix socket in both cases. If you deploy ProxySQL as your application sidecar or on the MySQL server, you will be able to use a Unix socket for at least one of those connections.

The letters “U” and “T” correspond to the connection type: “UU” means a Unix socket was used for both connections, and “TT” means TCP/IP was used in both places.

The results are as expected: for best performance you should use a Unix socket, but even using a socket for one of the two connections improves performance.

Using TCP/IP for both connection types instead of Unix Socket reduces performance by more than 20%.

If we do the same math to compute how much latency going through TCP/IP adds, it is 20 microseconds, meaning ProxySQL over TCP/IP adds almost double the processing latency compared to ProxySQL over a Unix socket.


ProxySQL is quite efficient: 25-45 microseconds of added latency per request, and hundreds of nanoseconds per row of the result set, will be acceptable for most workloads and can more than pay for itself with the features ProxySQL brings to the table. Poor ProxySQL configuration, though, can yield much higher overhead. Want to be confident? Perform similar tests on your real ProxySQL deployment, with its full rules configuration, within its real deployment topology.


Box benefits from digital transformation as it raises its growth forecast

Box has always been a bit of an enigma for Wall Street, and perhaps for enterprise software in general. Unlike vendors who shifted tools like HR, CRM, or ERP to the cloud, Box has been building a way to manage content in the cloud. It’s been a little harder to understand than these other enterprise software stalwarts, but slowly and surely Box has shifted into a more efficient and, dare we say, profitable public company.

Yesterday the company filed its Q2 2021 earnings report, and it was solid. The company reported revenue of $192.3 million, an increase of 11% year over year, beating analysts’ expectations of $189.6 million, according to the company. Meanwhile, the guidance looked good too, moving from a range of $760 to $768 million for the year to a range of $767 to $770 million.

All of this points to a company that is finding its footing. Let’s not forget, Starboard Value bought a 7.5% stake in the company a year ago, yet the activist investor has mostly stayed quiet and Box seems to be rewarding its patience as the pandemic acts as a forcing function to move customers to the cloud faster — and that seems to be working in Box’s favor.

Let’s get profitable

Box CEO Aaron Levie has not been shy about talking about how the pandemic has pushed companies to move to the cloud much more quickly than they probably would have. He said as a digital company, he was able to move his employees to work from home and remain efficient because of tools like Slack, Zoom, Okta and, yes, Box were in place to help them do that.

All of that helped keep the business going, and even thriving, through the extremely difficult times the pandemic has wrought. “We’re fortunate about how we’ve been able to execute in this environment. It helps that we’re 100% SaaS, and we’ve got a great digital engine to perform the business,” he said.

He added, “And at the same time, as we’ve talked about, we’ve been driving greater profitability. So the efficiency of the businesses has also improved dramatically, and the result was that overall we had a very strong quarter with better growth than expected and better profitability than expected. As a result, we were able to raise our targets on both revenue growth and profitability for the rest of the year,” Levie told TechCrunch.

Let’s get digital

Box is seeing existing customers and new customers alike moving more rapidly to the cloud, and that’s working in its favor. Levie believes that companies are in the process of reassessing their short and longer term digital strategy right now, and looking at what workloads they’ll be moving to the cloud, whether that’s cloud infrastructure, security in the cloud or content.

“Really customers are going to be trying to find a way to be able to shift their most important data and their most important content to the cloud, and that’s what we’re seeing play out within our customer base,” Levie said.

He added, “It’s not really a question anymore if you’re going to go to the cloud, it’s which cloud are you going to go to. And we’ve obviously been very focused on trying to build that leading platform for companies that want to be able to move their data to a cloud environment and be able to manage it securely, drive workflows on it, integrate it across our applications and that’s what we’re seeing.”

That translated into a 60% increase quarter over quarter on the number of large deals over $100,000, and the company crossed 100,000 customers globally on the platform in the most recent quarter, so the approach seems to be working.

Let’s keep building

As with Salesforce a generation earlier, Box decided to build its product set on a platform of services. It enabled customers to tap into these base services like encryption, workflow and metadata and build their own customizations or even fully functional applications by taking advantage of the tools that Box has already built.

Much like Salesforce president and COO Bret Taylor told TechCrunch recently, that platform approach has been an integral part of its success, and Levie sees it similarly for Box, calling it fundamental to his company’s success as well.

“We would not be here without that platform strategy,” he said. “Because we think about Box as a platform architecture, and we’ve built more and more capabilities into that platform, that’s what is giving us this strategic advantage right now,” he said.

And that hasn’t just worked to help customers using Box, it also helps Box itself to develop new capabilities more rapidly, something that has been absolutely essential during this pandemic when the company has had to react quickly to rapidly changing customer requirements.

Levie is 15 years into his tenure as CEO of Box, but he still sees a company and a market that is just getting started. “The opportunity is only bigger, and it’s more addressable by our product and platform today than it has been at any point in our history. So I think we’re still in the very early stages of digital transformation, and we’re in the earliest stages for how document and content management works in this modern era.”


More on Checkpoints in InnoDB MySQL 8

Recently I posted about checkpointing in MySQL, where MySQL showed interesting “wave” behavior.

Soon after, Dimitri posted a solution on how to fix the “waves,” and I would like to dig a little deeper into the proposed suggestions, as there is some material to process.

This post will be very heavy on InnoDB configuration, so let’s start with the basic configuration for MySQL; but first, a few words about the environment.

I use MySQL version 8.0.21 on the hardware described here.

As for the storage, I am not using some “old dusty SSD” but a production-grade Intel SATA SSD D3-S4510. This SSD can handle a throughput of 468MiB/sec of random writes, or 30000 IOPS of random 16KiB writes.

So initial configuration for my test was:

datadir = /data/mysql8-8.0.21
bind_address =

server_id = 7

# general
table_open_cache = 200000

# files

# buffers
innodb_buffer_pool_size = 140G

innodb_flush_log_at_trx_commit = 1
innodb_doublewrite = 1
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1




There are a lot of parameters, so let’s highlight the most relevant for this test:

innodb_buffer_pool_size = 140G

The buffer pool size is enough to fit all the data, which is about 100GB in size.


Adaptive hash index is enabled (as it is in the default InnoDB config).


This is what the defaults provide, but I will increase it, following my previous post.



innodb_log_file_size = 10G
innodb_log_files_in_group = 2

These parameters define the limit of 20GB for our redo logs, and this is important, as our workload will be “redo-log” bounded, as we will see from the results.



innodb_io_capacity = 2000
innodb_io_capacity_max = 4000

You may ask: why do I use 2000 and 4000, while the storage can handle 30000 IOPS?

This is a valid point, and as we will see later, these values are not high enough for this workload; but that does not mean we should push them all the way up to 30000 either, as the results will show.

MySQL Manual says the following about innodb_io_capacity:

“The innodb_io_capacity variable defines the overall I/O capacity available to InnoDB. It should be set to approximately the number of I/O operations that the system can perform per second (IOPS). When innodb_io_capacity is set, InnoDB estimates the I/O bandwidth available for background tasks based on the set value.” 

From this, you may get the impression that if you set innodb_io_capacity to the I/O bandwidth of your storage, you should be fine. However, this part does not say what you should count as an I/O operation. For example, if your storage can perform 500MB/sec, then with 4KB block IO operations that is 125000 IO per second, and with 16KB IO it is roughly 31000 IO per second.

The MySQL manual leaves it up to your imagination, but as the typical InnoDB page size is 16KB, let’s assume we do 16KB block IO.
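As a back-of-the-envelope check, the bandwidth-to-IOPS conversion the manual leaves implicit can be sketched like this (using decimal units; binary KiB/MiB would shift the numbers slightly):

```python
def iops(bandwidth_kb_per_sec: int, block_kb: int) -> int:
    """IOPS implied by a given bandwidth and IO block size (decimal KB)."""
    return bandwidth_kb_per_sec // block_kb

bandwidth = 500_000  # the manual-style example: 500MB/sec, in KB/sec

print(iops(bandwidth, 4))   # 4KB blocks  -> 125000 IOPS
print(iops(bandwidth, 16))  # 16KB blocks -> 31250 IOPS
```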

However, later on that page, we can read:

“Ideally, keep the setting as low as practical, but not so low that background activities fall behind. If the value is too high, data is removed from the buffer pool and change buffer too quickly for caching to provide a significant benefit. For busy systems capable of higher I/O rates, you can set a higher value to help the server handle the background maintenance work associated with a high rate of row changes”


“Consider write workload when tuning innodb_io_capacity. Systems with large write workloads are likely to benefit from a higher setting. A lower setting may be sufficient for systems with a small write workload.”

I do not see that the manual provides much guidance about what value I should use, so we will test it.

Initial results

So if we benchmark with initial parameters, we can see the “wave” pattern.


As for why this is happening, let’s check Percona Monitoring and Management “InnoDB Checkpoint Age” chart:

Actually, InnoDB Flushing by Type in PMM does not show sync flushing yet, so I had to modify the chart a little to show “sync flushing” as the orange line:

And we immediately see that Uncheckpointed Bytes exceed the Max Checkpoint Age of 16.61GiB, which is derived from the 20GiB of InnoDB log files. 16.61GiB is less than 20GiB because InnoDB reserves some cushion for cases exactly like this: even if we exceed 16.61GiB, InnoDB still has an opportunity to flush data.

Also, we see that before Uncheckpointed Bytes exceeds Max Checkpoint Age, InnoDB flushes pages at a rate of 4000 IOPS, just as defined by innodb_io_capacity_max.

We should try to avoid the case when Uncheckpointed Bytes exceed Max Checkpoint Age, because when it happens, InnoDB gets into “emergency” flushing mode, and in fact, this is what causes the waves we see. I should have detected this in my previous post, mea culpa.
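In code form, the condition that triggers the waves is simply a threshold check; the numbers below are the ones PMM reported for this 20GiB redo configuration (a sketch, not InnoDB internals):

```python
LOG_CAPACITY_GIB = 20.0          # two 10GiB redo log files
MAX_CHECKPOINT_AGE_GIB = 16.61   # limit reported by PMM for this config

def in_emergency_flushing(uncheckpointed_gib: float) -> bool:
    """True once InnoDB would switch to aggressive 'sync' flushing."""
    return uncheckpointed_gib > MAX_CHECKPOINT_AGE_GIB

# InnoDB keeps roughly a 17% cushion of the redo capacity in reserve
cushion_pct = (1 - MAX_CHECKPOINT_AGE_GIB / LOG_CAPACITY_GIB) * 100
print(round(cushion_pct))           # -> 17
print(in_emergency_flushing(18.0))  # -> True: this is what causes the waves
print(in_emergency_flushing(13.7))  # -> False
```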

So the first conclusion: if InnoDB does not flush fast enough, what if we increase innodb_io_capacity_max? Let’s see. For simplicity, in the next experiments I will use

innodb_io_capacity = innodb_io_capacity_max, unless specified otherwise.

Next run with innodb_io_capacity = innodb_io_capacity_max = 7000

Not much improvement, and this is also confirmed by the InnoDB Checkpoint Age chart.

InnoDB tries to flush more pages per second, up to 5600 pages/sec, but it is not enough to avoid exceeding Max Checkpoint Age.

Why is this the case? The answer is the doublewrite buffer.

Even though MySQL improved the doublewrite buffer in MySQL 8.0.20, it does not perform well enough with the proposed defaults.

Well, at least the problem is finally addressed; previously, Oracle ran benchmarks with doublewrite disabled, effectively hiding and ignoring the issue. For an example, check this.

But let’s get back to our 8.0.21 and fixed doublewrite.

Dimitri mentions:

“the main config options for DBLWR in MySQL 8.0 are:

  • innodb_doublewrite_files = N
  • innodb_doublewrite_pages = M”

Let’s check the manual again:

“The innodb_doublewrite_files variable is intended for advanced performance tuning. The default setting should be suitable for most users.


The innodb_doublewrite_pages variable (introduced in MySQL 8.0.20) controls the maximum number of doublewrite pages per thread. If no value is specified, innodb_doublewrite_pages is set to the innodb_write_io_threads value. This variable is intended for advanced performance tuning. The default value should be suitable for most users.”

Was it wrong to assume that the defaults for innodb_doublewrite_files and innodb_doublewrite_pages provide values suitable for our use case?

But let’s try the values Dimitri recommended looking into; I will use

innodb_doublewrite_files=2 and innodb_doublewrite_pages=128
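In my.cnf form, the doublewrite settings under test would look like this (a sketch; everything else stays as in the base configuration above):

```ini
# doublewrite settings suggested by Dimitri
innodb_doublewrite_files = 2
innodb_doublewrite_pages = 128
```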

Results with innodb_doublewrite_files=2 and innodb_doublewrite_pages=128

The problem with waves is fixed! 

And InnoDB Checkpoint Age chart:

Now we are able to keep Uncheckpointed Bytes under Max Checkpoint Age, and this is what fixed the “wave” pattern.

We can say that parallel doublewrite is a welcome improvement, but the fact that one has to change innodb_doublewrite_pages in order to get improved performance is a design flaw, in my opinion.

But there are still a lot of variations at 1-sec resolution, and small drops. Before we get to them, let’s take a look at another suggestion: use --innodb_adaptive_hash_index=0 (that is, disable the Adaptive Hash Index). I will use AHI=0 on the charts to mark this setting.

Let’s take a look at the results with the improved settings and with --innodb_adaptive_hash_index=0.

Results with --innodb_adaptive_hash_index=0

To see the real improvement with --innodb_adaptive_hash_index=0, let’s compare the bar charts:

Or in numeric form:

settings | Avg tps, last 2000 sec
io_cap_max=7000,doublewrite=opt | 7578.69
io_cap_max=7000,doublewrite=opt,AHI=0 | 7996.33

So --innodb_adaptive_hash_index=0 really brings some improvement, about 5.5%, so I will use --innodb_adaptive_hash_index=0 for further experiments.
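For transparency, the improvement percentages quoted in this post are plain relative-throughput deltas over the result tables, e.g.:

```python
def improvement_pct(baseline_tps: float, new_tps: float) -> float:
    """Relative throughput change, in percent."""
    return (new_tps - baseline_tps) / baseline_tps * 100

# Avg tps over the last 2000 sec, from the table above
print(round(improvement_pct(7578.69, 7996.33), 1))  # AHI on vs AHI=0 -> 5.5
```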

Let’s see if increasing innodb_buffer_pool_instances to 32 helps to smooth the periodic variance.

Results with innodb_buffer_pool_instances=32

So indeed, using innodb_buffer_pool_instances=32 gives us fewer variations, keeping overall throughput about the same: 7936.28 tps in this case.

Now let’s review the parameter innodb_change_buffering=none, which Dimitri also suggests.

Results with innodb_change_buffering=none

There is NO practical difference if we disable the InnoDB change buffer.

And if we take a look at PMM change buffer chart:

We can see there is NO change buffer activity outside of the initial 20 minutes. I am not sure why Dimitri suggested disabling it; in fact, the change buffer can be quite useful, and I will show this in my benchmarks for different workloads.

Now let’s take a look at the suggested settings with innodb_io_capacity = innodb_io_capacity_max = 8000. That INCREASES innodb_io_capacity_max; let’s compare the results with innodb_io_capacity_max = 7000.

Or in tabular form:

settings | Avg tps, last 2000 sec
io_cap_max=7000,doublewrite=opt,AHI=0,BPI=32 | 7936.28
io_cap_max=8000,doublewrite=opt,AHI=0,BPI=32 | 7693.08

Actually, with innodb_io_capacity_max=8000 the throughput is LESS than with innodb_io_capacity_max=7000.

Can you guess why? 

Let’s compare InnoDB Checkpoint Age.

This is for innodb_io_capacity_max=8000 :

And this is for innodb_io_capacity_max=7000:

This is like a child’s game: Find the difference.

The difference is that with innodb_io_capacity_max=7000, Uncheckpointed Bytes is 13.66 GiB, while with innodb_io_capacity_max=8000 it is 12.51 GiB.

What does it mean? It means that with innodb_io_capacity_max=7000, InnoDB HAS to flush FEWER pages and still stays within Max Checkpoint Age.

In fact, if we try to push even further and use innodb_io_capacity_max = innodb_io_capacity = 6500, we get the following InnoDB Checkpoint Age chart:

Here Uncheckpointed Bytes reach 15.47 GiB. Does it improve throughput? Absolutely!

settings | Avg tps, last 2000 sec
io_cap_max=6500,doublewrite=opt,AHI=0,BPI=32 | 8233.628
io_cap_max=7000,doublewrite=opt,AHI=0,BPI=32 | 7936.283
io_cap_max=8000,doublewrite=opt,AHI=0,BPI=32 | 7693.084

The difference between innodb_io_capacity_max=6500 and innodb_io_capacity_max=8000 is 7%.

Now it becomes clear what the manual means in the part where it says:

“Ideally, keep the setting as low as practical, but not so low that background activities fall behind”

So we really need to raise innodb_io_capacity_max just to the level where Uncheckpointed Bytes stays under Max Checkpoint Age, but not much higher; otherwise InnoDB will do more work than is needed, and it will affect the throughput.

In my opinion, this is a serious design flaw in InnoDB Adaptive Flushing: you actually need to wiggle innodb_io_capacity_max to achieve appropriate results.

Inverse relationship between innodb_io_capacity_max and innodb_log_file_size

To show an even more complicated relationship between innodb_io_capacity_max and innodb_log_file_size, let’s consider the following experiment.

We will increase innodb_log_file_size from 10GB to 20GB, effectively doubling our redo-log capacity.

And now let’s check InnoDB Checkpoint Age with innodb_io_capacity_max=7000:

We can see there is a lot of space in the InnoDB logs that InnoDB does not use: only 22.58GiB of Uncheckpointed Bytes, while 33.24GiB is available.

So what happens if we decrease innodb_io_capacity_max to 4500?

 InnoDB Checkpoint Age with innodb_io_capacity_max=4500:

In this setup, we can push Uncheckpointed Bytes to 29.80 GiB, and it has a positive effect on the throughput.

Let’s compare the throughput:

settings | Avg tps, last 2000 sec
io_cap_max=4500,log_size=40GB,doublewrite=opt,AHI=0,BPI=32 | 9865.308
io_cap_max=7000,log_size=40GB,doublewrite=opt,AHI=0,BPI=32 | 9374.121

So by decreasing innodb_io_capacity_max from 7000 to 4500, we gain 5.2% in throughput.

Please note that we can’t keep decreasing innodb_io_capacity_max, because then Uncheckpointed Bytes risks exceeding Max Checkpoint Age, and that will lead to the negative effect of emergency flushing.

So again, in order to improve throughput, we should be DECREASING innodb_io_capacity_max, but only down to a certain threshold. We should not set innodb_io_capacity_max to 30000, which is what the SATA SSD can actually provide.

Again, for me, this is a major design flaw in the current InnoDB Adaptive Flushing. Please note this was a static workload; if your workload changes during the day, it is practically impossible to come up with an optimal value.
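The tuning loop I ended up following can be summarized as a heuristic; the 500-unit step and the 95% safety margin below are my own illustrative choices, not InnoDB internals:

```python
def next_io_capacity_max(current: int, uncheckpointed_gib: float,
                         max_checkpoint_age_gib: float,
                         step: int = 500, safety: float = 0.95) -> int:
    """Illustrative manual-tuning heuristic: flush less aggressively while we
    keep a safety margin under Max Checkpoint Age, and flush harder once we
    risk emergency ('sync') flushing."""
    if uncheckpointed_gib < safety * max_checkpoint_age_gib:
        return max(step, current - step)  # headroom left: flushing less improves tps
    return current + step                 # too close to the limit: avoid sync flushing

# With 40GB redo logs (Max Checkpoint Age ~ 33.24GiB):
print(next_io_capacity_max(7000, 22.58, 33.24))  # plenty of headroom -> 6500
print(next_io_capacity_max(4500, 32.90, 33.24))  # near the limit     -> 5000
```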


Trying to summarize all of the above, I want to highlight:

  • To fix the “wave” pattern, we need to tune innodb_io_capacity_max and innodb_doublewrite_pages.
  • InnoDB parallel doublewrite in MySQL 8.0.20 is definitely a positive improvement, but the default values seem poorly chosen, in contradiction with the manual. I wish Oracle/MySQL shipped features that work out of the box for most users.
  • The InnoDB Adaptive Hash Index is not helping here, and you get better performance by disabling it. I have observed the same in other workloads; the InnoDB Adaptive Hash Index might be another broken subsystem, which Oracle declines to fix and simply disables in its own benchmarks.
  • InnoDB Change Buffer has no effect on this workload, so you may or may not disable it — there is no difference. But I saw a positive effect from InnoDB Change Buffer in other workloads, so I do not recommend blindly disabling it.
  • Now about InnoDB Adaptive Flushing. In my opinion, InnoDB Adaptive Flushing relies too much on manual tuning of innodb_io_capacity_max, which in fact has little to do with the real storage IO capacity. Often you need to lower innodb_io_capacity_max to get better performance, but not make it too low, because at some point it will hurt performance. The best way to monitor this is the InnoDB Checkpoint Age chart in PMM.
  • I would encourage Oracle to fix the broken design of InnoDB Adaptive Flushing so that it detects IO capacity automatically and does not flush aggressively, but keeps Uncheckpointed Bytes just under Max Checkpoint Age. Let’s hope Oracle fixes this faster than it fixed the doublewrite buffer, because history shows that to force Oracle to make improvements in the InnoDB IO subsystem, we need to do it first in Percona Server for MySQL, as we did with the parallel doublewrite buffer. For reference, parallel doublewrite was first implemented in Percona Server for MySQL 5.7.11-4, released March 15th, 2016. Oracle implemented parallel doublewrite (with suboptimal default settings) in MySQL 8.0.20, released in April 2020, four years after Percona Server.

How Salesforce beat its own target to reach $20B run rate ahead of schedule

Salesforce launched in 1999, one of the early adherents to what would eventually be called SaaS and cloud computing. On Tuesday, the company reached a huge milestone when it surpassed $5 billion in quarterly revenue, putting the SaaS giant on a $20 billion run rate for the first time.

Salesforce revenue has been on a firm upward trajectory for years now, but when the company reached $10 billion in revenue in November 2017, CEO Marc Benioff set the goal of $20 billion right then and there, and less than three years later the company beat that goal pretty easily. Here’s what he said at the time:

In fact as the fastest growing enterprise software company ever to reach $10 billion, we are now targeting to grow the company organically to more than $20 billion by fiscal year 2022 and we plan to do that to be the fastest enterprise software company ever to get to $20 billion.

There are lots of elements that have led to that success. As the Salesforce platform evolved, the company also pursued an aggressive acquisition strategy, and companies are moving to the cloud faster than ever before. Yet Salesforce has been able to meet that lofty 2017 goal early, with Benioff practicing his own unique form of responsible capitalism in the midst of a pandemic.

The platform play

While there are many factors contributing to the company’s revenue growth, one big part of it is the platform. As a platform, it’s not only about providing a set of software tools like CRM, marketing automation and customer service, it’s also giving customers the ability to build solutions to meet their needs on top of that, taking advantage of the work that Salesforce has done to build its own software stack.

Bret Taylor, president and chief operating officer at Salesforce, says the platform has played a huge role in the company’s success. “Actually our platform is behind a huge part of Salesforce’s momentum in multiple ways. One, which is one thing we’ve talked a lot about, is just the technology characteristics of the platform, namely that it’s low code and fast time to value,” he said.

He added, “I would say that these low-code platforms and the ability to stand up solutions quickly is more relevant than ever before because our customers are going to have to respond to changes in their business faster than ever before.”

He pointed to nCino, a company built on top of Salesforce that went public last month as a prime example of this. The company was built on Salesforce, sold in the AppExchange marketplace and provides a way for banking customers to do business online, taking advantage of all that Salesforce has built to do that.

The acquisition strategy

Another big contributing factor to the company’s success is that beyond the core CRM product it brought to the table way back in 1999, it has built a broad set of marketing, sales and service tools and as it has done that, it has acquired many companies along the way to accelerate the product road map.

The biggest of those acquisitions by far was the $15.7 billion Tableau deal, which closed just about a year ago. Taylor sees data fueling the push to digital we are seeing during the pandemic, and Tableau is a key part of that.

“Tableau is so strategic, both from a revenue and also from a technology strategy perspective,” he said. That’s because as companies make the shift to digital, it becomes more important than ever to help them visualize and understand that data in order to understand their customers’ requirements better.

“Fundamentally when you look at what a company needs to do to thrive in an all-digital world, it needs to be able to respond to [rapid] changes, which means creating a culture around that data,” he said. This enables companies to respond more quickly to changes like new customer demands or shifts in the supply chain.

“All of that is about data, and I think the reason why Tableau grew so much this past quarter is that I think that the conversation around data when you’re digitizing your entire company and digitizing the entire economy, data is more strategic than it ever was,” he said.

With that purchase, combined with the $6.5 billion MuleSoft acquisition in 2018, the company feels like it has a way to capture and visualize data wherever it lives in the enterprise. “It’s worth noting how complementary MuleSoft and Tableau are together. I think of MuleSoft as unlocking all your enterprise data, whether it’s on a legacy system or a modern system, and Tableau enables us to understand it, and so it’s a really strategic overall value proposition because we can come up with a really complete solution around data,” Taylor said.

Capitalism with some heart

Benioff was happy to point out in an appearance on Mad Money Tuesday that even as he has made charity and volunteerism a core part of his organization, he has still delivered solid returns for his shareholders. He told Mad Money host Jim Cramer, “This is a victory for stakeholder capitalism. It shows you can do good and do well.” This is a statement he has made frequently in the past to show that you can be a good corporate citizen and give back to your community, while still making money.

Those values are what separate the company from the pack, says Paul Greenberg, founder and principal analyst at 56 Group and author of CRM at the Speed of Light. “Salesforce’s genius, and a large part of the reason I don’t expect any serious slowdown in that extraordinary growth, is that they manage to align the technology business with corporate social responsibility in a way that makes them stand out from any other company,” Greenberg told TechCrunch.

Yesterday’s numbers come after Q1 2021, in which the company offered softer guidance as it was giving some of its customers, suffering from the impact of the pandemic, more financial flexibility. As it turns out, that didn’t seem to hurt them, and the guidance for next quarter is looking good too: $5.24 billion to $5.25 billion, up approximately 16% year over year, according to the company.

It’s worth noting that while Benioff pledged no new layoffs for 90 days at the start of the pandemic, with that time now ending, The Wall Street Journal reported yesterday that the company was planning to eliminate 1,000 roles out of the organization’s 54,000 total employees, while giving those workers 60 days to find other roles in the company.

Getting to $20 billion

Certainly getting to that $20 billion run rate is significant, as is the speed with which they were able to achieve that goal, but Taylor sees an evolving company, one that is different than the one it was in 2017 when Benioff set that goal.

“I would say the reason we’ve been able to accelerate is through organic [growth], innovation and acquisitions to really build out this vision of a complete customer [picture]. I think it’s more important than ever before,” he said.

He says that when you look at the way the platform has changed, it’s been about bringing multiple customer experience capabilities together under a single umbrella, and giving customers the tools they need to build these out.

“I think we as a company have constantly redefined what customer relationship management means. It’s not just opportunity management for sales teams. It’s customer service, it’s e-commerce, it’s digital marketing, it’s B2B, it’s B2C. It’s all of the above,” he said.


MySQL 8.0.19 InnoDB ReplicaSet Configuration and Manual Switchover

Manual Switchover

InnoDB ReplicaSet was introduced in MySQL 8.0.19. It works on top of MySQL asynchronous replication. Generally, InnoDB ReplicaSet does not provide high availability on its own the way InnoDB Cluster does, because with InnoDB ReplicaSet we need to perform failover manually. AdminAPI includes support for InnoDB ReplicaSet, so we can operate the InnoDB ReplicaSet using MySQL Shell.

  • InnoDB Cluster is the combination of MySQL Shell, Group Replication, and MySQL Router
  • InnoDB ReplicaSet is the combination of MySQL Shell, traditional MySQL async replication, and MySQL Router

Why InnoDB ReplicaSet?

  • You can manually perform the switchover and failover with InnoDB ReplicaSet.
  • You can easily add a new node to your replication environment. InnoDB ReplicaSet helps with data provisioning (using the MySQL clone plugin) and setting up replication.

In this blog, I am going to explain the process involved in the following topics:

  • How to set up the InnoDB ReplicaSet in a fresh environment?
  • How to perform the manual switchover with ReplicaSet?

Before going into the topic, here is a summary of the points you should be aware of when working with InnoDB ReplicaSet:

  • ReplicaSet supports only GTID-based replication environments.
  • The MySQL version should be 8.0 or higher.
  • Only row-based replication is supported.
  • Replication filters are not supported with InnoDB ReplicaSet.
  • An InnoDB ReplicaSet has one primary node (master) and one or more secondary nodes (slaves). All the secondary nodes are configured under the primary node.
  • There is no limit on secondary nodes; you can configure many nodes under a ReplicaSet.
  • It supports only manual failover.
  • InnoDB ReplicaSet should be managed entirely with MySQL Shell.

How to set up the InnoDB ReplicaSet in a fresh environment?

I have created two servers (replicaset1, replicaset2) for testing purposes. My goal is to create the InnoDB ReplicaSet with one primary node and one secondary node. I installed Percona Server for MySQL 8.0.20 for my testing.

Step 1 :

Allow hostname-based communication. Make sure you configure this on all the servers participating in the ReplicaSet.

#vi /etc/hosts
replicaset1 replicaset1
replicaset2 replicaset2

Step 2 :

In this step, I am going to prepare the MySQL instances for InnoDB ReplicaSet. Below are the major tasks that need to be performed as part of this operation.

  • Create a dedicated user account to manage the ReplicaSet effectively. The account will be created automatically with sufficient privileges.
  • Update the MySQL parameters that need to change for InnoDB ReplicaSet (the settings are persisted).
  • Restart the MySQL instance to apply the changes.

Command : dba.configureReplicaSetInstance()

Connecting the shell,

[root@replicaset1 ~]# mysqlsh --uri root@localhost
Please provide the password for 'root@localhost': *************
Save password for 'root@localhost'? [Y]es/[N]o/Ne[v]er (default No): y
MySQL Shell 8.0.20

Configuring the instance,

Once you trigger the command, it will start to interact with you, and you have to choose the needed options.

MySQL  localhost:33060+ ssl  JS > dba.configureReplicaSetInstance()
Configuring local MySQL instance listening at port 3306 for use in an InnoDB ReplicaSet...
This instance reports its own address as replicaset1:3306
Clients and other cluster members will communicate with it through this address by default. If this is not correct, the report_host MySQL system variable should be changed.
ERROR: User 'root' can only connect from 'localhost'. New account(s) with proper source address specification to allow remote connection from all instances must be created to manage the cluster.

1) Create remotely usable account for 'root' with same grants and password
2) Create a new admin account for InnoDB ReplicaSet with minimal required grants
3) Ignore and continue
4) Cancel

Please select an option [1]: 2
Please provide an account name (e.g: icroot@%) to have it created with the necessary
privileges or leave empty and press Enter to cancel.
Account Name: InnodbReplicaSet
Password for new account: ********
Confirm password: ********

NOTE: Some configuration options need to be fixed:
| Variable                 | Current Value | Required Value | Note                                             |
| enforce_gtid_consistency | OFF           | ON             | Update read-only variable and restart the server |
| gtid_mode                | OFF           | ON             | Update read-only variable and restart the server |
| server_id                | 1             | <unique ID>    | Update read-only variable and restart the server |
Some variables need to be changed, but cannot be done dynamically on the server.
Do you want to perform the required configuration changes? [y/n]: y
Do you want to restart the instance after configuring it? [y/n]: y
Cluster admin user 'InnodbReplicaSet'@'%' created.
Configuring instance...
The instance 'replicaset1:3306' was configured to be used in an InnoDB ReplicaSet.
Restarting MySQL...
NOTE: MySQL server at replicaset1:3306 was restarted.

You can find the updated parameters in the file “mysqld-auto.cnf”. The blog by Marco Tusa has more details about the PERSIST configuration.

[root@replicaset1 mysql]# cat mysqld-auto.cnf 

{
  "Version" : 1,
  "mysql_server" : {
    "server_id" : { "Value" : "3391287398", "Metadata" : { "Timestamp" : 1598084590766958, "User" : "root", "Host" : "localhost" } },
    "read_only" : { "Value" : "OFF", "Metadata" : { "Timestamp" : 1598084718849667, "User" : "InnodbReplicaSet", "Host" : "localhost" } },
    "super_read_only" : { "Value" : "ON", "Metadata" : { "Timestamp" : 1598084898510380, "User" : "InnodbReplicaSet", "Host" : "localhost" } },
    "mysql_server_static_options" : {
      "enforce_gtid_consistency" : { "Value" : "ON", "Metadata" : { "Timestamp" : 1598084590757563, "User" : "root", "Host" : "localhost" } },
      "gtid_mode" : { "Value" : "ON", "Metadata" : { "Timestamp" : 1598084590766121, "User" : "root", "Host" : "localhost" } }
    }
  }
}
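Besides reading the file directly, the persisted settings can also be inspected from SQL in MySQL 8.0:

```sql
-- Variables written by SET PERSIST / PERSIST_ONLY
SELECT VARIABLE_NAME, VARIABLE_VALUE
  FROM performance_schema.persisted_variables;
```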

Note :

  • Make sure that this step is executed on all the MySQL instances which are going to participate in the ReplicaSet group.
  • Make sure that the cluster account name and password are the same on all MySQL instances.

Step 3 :

In this step, I am going to switch my login to the ReplicaSet account which was created in Step 2. 

MySQL  localhost:33060+ ssl  JS > \connect InnodbReplicaSet@replicaset1
Creating a session to 'InnodbReplicaSet@replicaset1'
Please provide the password for 'InnodbReplicaSet@replicaset1': ********
Save password for 'InnodbReplicaSet@replicaset1'? [Y]es/[N]o/Ne[v]er (default No): y
Fetching schema names for autocompletion... Press ^C to stop.
Closing old connection...
Your MySQL connection id is 8 (X protocol)
Server version: 8.0.20-11 Percona Server (GPL), Release 11, Revision 5b5a5d2

Step 4:

Now everything is set to create the ReplicaSet.

Command : dba.createReplicaSet('<ReplicaSet Name>')

MySQL  replicaset1:33060+ ssl  JS > dba.createReplicaSet('PerconaReplicaSet')
A new replicaset with instance 'replicaset1:3306' will be created.

* Checking MySQL instance at replicaset1:3306
This instance reports its own address as replicaset1:3306
replicaset1:3306: Instance configuration is suitable.

* Updating metadata...
ReplicaSet object successfully created for replicaset1:3306.
Use rs.addInstance() to add more asynchronously replicated instances to this replicaset and rs.status() to check its status.

The ReplicaSet is created with the name “PerconaReplicaSet”.

Step 5:

In this step, I am going to assign the ReplicaSet to a variable and check the ReplicaSet status. Assigning it to a variable can also be done while creating the ReplicaSet (i.e. var replicaset = dba.createReplicaSet('<ReplicaSet Name>')).

MySQL  replicaset1:33060+ ssl  JS > replicaset = dba.getReplicaSet()
You are connected to a member of replicaset 'PerconaReplicaSet'.
 MySQL  replicaset1:33060+ ssl  JS > 
 MySQL  replicaset1:33060+ ssl  JS > replicaset.status()
{
    "replicaSet": {
        "name": "PerconaReplicaSet",
        "primary": "replicaset1:3306",
        "status": "AVAILABLE",
        "statusText": "All instances available.",
        "topology": {
            "replicaset1:3306": {
                "address": "replicaset1:3306",
                "instanceRole": "PRIMARY",
                "mode": "R/W",
                "status": "ONLINE"
            }
        },
        "type": "ASYNC"
    }
}
The ReplicaSet status shows that the instance replicaset1 is operational and is the PRIMARY member.

Step 6:

Now, I need to add the secondary instance “replicaset2” to the ReplicaSet.

The new instance must satisfy all the ReplicaSet requirements before it can join. There are two recovery methods available when joining a new node.

Clone: It takes a snapshot from an ONLINE instance, builds the target node from that snapshot, and finally adds it to the ReplicaSet. This method is always recommended when adding fresh nodes.

Incremental: This method relies on MySQL replication and applies all the transactions that are missing on the new instance. It can be faster when the number of missing transactions is small.

Command: replicaset.addInstance('<instance name>:<port>')

MySQL  replicaset1:33060+ ssl  JS > replicaset.addInstance('replicaset2:3306')
Adding instance to the replicaset...
* Performing validation checks
This instance reports its own address as replicaset2:3306
replicaset2:3306: Instance configuration is suitable.
* Checking async replication topology...
* Checking transaction state of the instance...

NOTE: The target instance 'replicaset2:3306' has not been pre-provisioned (GTID set is empty). The Shell is unable to decide whether replication can completely recover its state.
The safest and most convenient way to provision a new instance is through automatic clone provisioning, which will completely overwrite the state of 'replicaset2:3306' with a physical snapshot from an existing replicaset member. To use this method by default, set the 'recoveryMethod' option to 'clone'.

WARNING: It should be safe to rely on replication to incrementally recover the state of the new instance if you are sure all updates ever executed in the replicaset were done with GTIDs enabled, there are no purged transactions and the new instance contains the same GTID set as the replicaset or a subset of it. To use this method by default, set the 'recoveryMethod' option to 'incremental'.
Please select a recovery method [C]lone/[I]ncremental recovery/[A]bort (default Clone): C
* Updating topology
Waiting for clone process of the new member to complete. Press ^C to abort the operation.
* Waiting for clone to finish...
NOTE: replicaset2:3306 is being cloned from replicaset1:3306
** Stage DROP DATA: Completed
** Clone Transfer  
    FILE COPY  ############################################################  100%  Completed
    PAGE COPY  ############################################################  100%  Completed
    REDO COPY  ############################################################  100%  Completed
** Stage RECOVERY: |
NOTE: replicaset2:3306 is shutting down...

* Waiting for server restart... ready
* replicaset2:3306 has restarted, waiting for clone to finish...
* Clone process has finished: 60.68 MB transferred in about 1 second (~60.68 MB/s)
** Configuring replicaset2:3306 to replicate from replicaset1:3306
** Waiting for new instance to synchronize with PRIMARY...
The instance 'replicaset2:3306' was added to the replicaset and is replicating from replicaset1:3306.

Here I have chosen the clone method for recovery. 
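To skip the interactive prompt, the recovery method can also be passed explicitly as an option to addInstance(). A minimal sketch, using the host names from this setup:

```js
// Non-interactive variant of the same step: pick the recovery method up front.
replicaset.addInstance('replicaset2:3306', {recoveryMethod: 'clone'});

// Or, when you are sure the missing transaction set is small and GTID-complete:
// replicaset.addInstance('replicaset2:3306', {recoveryMethod: 'incremental'});
```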

MySQL  replicaset1:33060+ ssl  JS > replicaset.status()
{
    "replicaSet": {
        "name": "PerconaReplicaSet",
        "primary": "replicaset1:3306",
        "status": "AVAILABLE",
        "statusText": "All instances available.",
        "topology": {
            "replicaset1:3306": {
                "address": "replicaset1:3306",
                "instanceRole": "PRIMARY",
                "mode": "R/W",
                "status": "ONLINE"
            },
            "replicaset2:3306": {
                "address": "replicaset2:3306",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            }
        },
        "type": "ASYNC"
    }
}

The second instance has been added to the ReplicaSet. 

How to perform a manual switchover with the ReplicaSet?

As per the current topology,

  • replicaset1 is the PRIMARY
  • replicaset2 is the SECONDARY

Requirement: For a maintenance activity, I am planning to remove the server “replicaset1” from the ReplicaSet. This needs to be performed safely, and the secondary instance “replicaset2” should remain available for application writes and reads.

  • First, I need to promote “replicaset2” as the PRIMARY.
  • Then, remove the “replicaset1” from the group.

Switching “replicaset2” to the PRIMARY role:

Command: replicaset.setPrimaryInstance('<host>:<port>')

MySQL  replicaset1:33060+ ssl  JS > replicaset.setPrimaryInstance('replicaset2:3306')
replicaset2:3306 will be promoted to PRIMARY of 'PerconaReplicaSet'.
The current PRIMARY is replicaset1:3306.

* Connecting to replicaset instances
** Connecting to replicaset1:3306
** Connecting to replicaset2:3306
** Connecting to replicaset1:3306
** Connecting to replicaset2:3306
* Performing validation checks
** Checking async replication topology...
** Checking transaction state of the instance...
* Synchronizing transaction backlog at replicaset2:3306
* Updating metadata
* Acquiring locks in replicaset instances
** Pre-synchronizing SECONDARIES
** Acquiring global lock at PRIMARY
** Acquiring global lock at SECONDARIES
* Updating replication topology
** Configuring replicaset1:3306 to replicate from replicaset2:3306
replicaset2:3306 was promoted to PRIMARY.

You can see that “replicaset2” has been promoted to PRIMARY.

MySQL  replicaset1:33060+ ssl  JS > replicaset.status()
{
    "replicaSet": {
        "name": "PerconaReplicaSet",
        "primary": "replicaset2:3306",
        "status": "AVAILABLE",
        "statusText": "All instances available.",
        "topology": {
            "replicaset1:3306": {
                "address": "replicaset1:3306",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            },
            "replicaset2:3306": {
                "address": "replicaset2:3306",
                "instanceRole": "PRIMARY",
                "mode": "R/W",
                "status": "ONLINE"
            }
        },
        "type": "ASYNC"
    }
}

Removing “replicaset1” from the group:

Command: replicaset.removeInstance('<host>:<port>')

MySQL  replicaset1:33060+ ssl  JS > replicaset.removeInstance('replicaset1:3306')
The instance 'replicaset1:3306' was removed from the replicaset.
MySQL  replicaset1:33060+ ssl  JS > 
MySQL  replicaset1:33060+ ssl  JS > replicaset.status()
{
    "replicaSet": {
        "name": "PerconaReplicaSet",
        "primary": "replicaset2:3306",
        "status": "AVAILABLE",
        "statusText": "All instances available.",
        "topology": {
            "replicaset2:3306": {
                "address": "replicaset2:3306",
                "instanceRole": "PRIMARY",
                "mode": "R/W",
                "status": "ONLINE"
            }
        },
        "type": "ASYNC"
    }
}

We can also perform a forced failover using “ReplicaSet.forcePrimaryInstance()”. This is dangerous and is only recommended for disaster scenarios, when the current PRIMARY is unreachable.
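For reference, a forced failover looks like the sketch below. It should only be run when the current PRIMARY is truly unavailable; the old PRIMARY is invalidated by this operation and must be removed or re-provisioned before it can rejoin:

```js
// Connect to a surviving SECONDARY, then force it to become the new PRIMARY.
var replicaset = dba.getReplicaSet();
replicaset.forcePrimaryInstance('replicaset2:3306');
```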

MySQL InnoDB ReplicaSet is a very good feature for managing MySQL asynchronous replication environments. It has CLONE plugin support, which greatly helps with data provisioning and setting up replication. However, it still has some limitations when compared with MySQL InnoDB Cluster.
