Oct 09, 2020

Amazon Aurora Multi-Primary First Impression

For what reason should I use a real multi-primary setup?

To be clear, I am not talking about a multi-writer solution where any node can become the active writer in case of need, as with Percona XtraDB Cluster (PXC) or Percona Server for MySQL using Group Replication. No, we are talking about a multi-primary setup where I can write at the same time on multiple nodes. I want to insist on this “why?”.

After excluding the solutions mentioned above, both of which cover the famous 99.995% availability (26.30 minutes of downtime in a year), what is left?

Disaster Recovery? Well, that is something I would love to have, but to be a real DR solution we need to put several kilometers (miles for imperial) in the middle.

And we know (see here and here) that aside from some misleading advertising, we cannot have a tightly coupled cluster solution across geographical regions.

So, what is left? I may need more HA, ok, that is a valid reason. Or I may need to scale the number of writes, which is a valid reason as well. This means, in the end, that I am looking at a multi-primary setup because it:

  • Scales writes (more nodes, more writes)
    • With consistent reads (what I write on A must be visible on B)
  • Gives me 0 (zero) downtime, or close to that (5 nines is a maximum downtime of 864 milliseconds per day!)
  • Allows me to shift the writer pointer at any time from A to B and vice versa, consistently.

Now, keeping myself bound to the MySQL ecosystem, my natural choice would be MySQL NDB Cluster. But my (virtual) boss was at AWS re:Invent and someone mentioned to him that Aurora Multi-Primary does what I was looking for. This (long) article is my voyage in discovering whether that is true or … not.

Given I am focused on the behavior first, and NOT interested in absolute numbers to shock the audience with millions of QPS, I will use low-level Aurora instances, and I will perform the tests from two EC2 instances in the same VPC/region as the nodes.

You can find the details about the tests on GitHub here.

Finally, I will test:

  • Connection speed
  • Stale read
  • Write single node for baseline
  • Write on both nodes:
    • Scaling, splitting the load by schema
    • Scaling on the same schema

Test Results

Let us start to have some real fun. The first test is …

Connection Speed

The purpose of this test is to evaluate the time taken to open a new connection and the time taken to close it. Opening and closing connections can be a very expensive operation, especially if applications do not use a connection pooling mechanism.
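
As a rough illustration of the kind of measurement involved (a hypothetical sketch, not the actual test harness; the endpoint, user, and password are placeholders), a loop like the following times the full open/query/close round trip from an EC2 client:

#!/bin/bash
# Hypothetical sketch: time N connection open/query/close cycles.
# HOST/credentials are placeholders; the real tests used the Aurora node,
# cluster, and ProxySQL endpoints.
HOST=aurora-node-1.example.amazonaws.com
N=100
start=$(date +%s%N)
for i in $(seq 1 $N); do
    mysql -h"$HOST" -P3306 -utest -ptest -e "SELECT 1" >/dev/null 2>&1
done
end=$(date +%s%N)
echo "average open+query+close: $(( (end - start) / N / 1000 )) microseconds"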

Amazon Aurora Multi-Primary


As we can see, ProxySQL proves to be the most efficient way to deal with opening connections, which was expected given the way it is designed to reuse open connections towards the backend.


The close connection operation is different; here ProxySQL seems to take a little bit longer.

As a global observation, we can say that by using ProxySQL we have more consistent behavior. Of course, this test is a simplistic one, and we are not checking the scalability (from 1 to N connections) but it is good enough to give us the initial feeling. Specific connection tests will be the focus of the next blog on Aurora MM.

Stale Reads

Aurora Multi-Primary uses the same mechanism as default Aurora to update the buffer pool:


It uses the page cache update, just in both directions. This means that the buffer pool of Node2 is updated with the modifications performed on Node1, and vice versa.

To verify whether an application would really be able to have consistent reads, I ran this test. It is meant to measure if, and how many, stale reads we get when writing on one node and reading from the other.
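
To give an idea of the mechanics (a simplified sketch with placeholder endpoints, credentials, and a hypothetical stale_check table, not the actual test code), the check boils down to writing a marker on one node and polling the other until it shows up:

#!/bin/bash
# Simplified sketch: write a marker row on Node1 and poll Node2 until it is
# visible. Endpoints, credentials, and the stale_check table are placeholders.
WRITER=node1.example.amazonaws.com
READER=node2.example.amazonaws.com
marker=$(date +%s%N)
mysql -h"$WRITER" -utest -ptest -D test -e "INSERT INTO stale_check (marker) VALUES ($marker);"
t0=$(date +%s%N)
# every iteration that does not find the marker is a stale read
until mysql -h"$READER" -utest -ptest -D test -N -e "SELECT 1 FROM stale_check WHERE marker=$marker" | grep -q 1; do
    :
done
echo "marker visible on the reader after $(( ($(date +%s%N) - t0) / 1000 )) microseconds"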

Amazon Aurora multi-primary has two consistency models:

Aurora consistency model
As an interesting fact, the result was that with the default consistency model (INSTANCE_RAW), we got 100% stale reads.
Given that, I focused on identifying the cost of using the other consistency model (REGIONAL_RAW), which allows an application to have consistent reads.

The results indicate an increase of 44% in total execution time, and of 95% (22 times slower) in write execution.

It is interesting to note that the time taken is in some way predictable and consistent between the two consistency models.

The graph below shows in yellow how long the application must wait to see the correct data on the reader node. In blue is the amount of time the application waits to get back the same consistent read because it must wait for the commit on the writer.

lag time in nanoseconds

As you can see, the two are more or less aligned.

Given the performance cost imposed by using REGIONAL_RAW,  all the other tests are done with the default INSTANCE_RAW, unless explicitly stated.
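
When REGIONAL_RAW was needed, it was set per session; a minimal sketch, assuming the aurora_mm_session_consistency_level session variable documented for Aurora multi-master (endpoint and credentials are placeholders):

# Sketch: switch one session to REGIONAL_RAW and verify it. The variable name
# is taken from the Aurora multi-master documentation; host/credentials are placeholders.
mysql -hnode1.example.amazonaws.com -utest -ptest -e "
SET SESSION aurora_mm_session_consistency_level = 'REGIONAL_RAW';
SELECT @@aurora_mm_session_consistency_level;"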

Writing Tests

All tests run in this section were done using sysbench-tpcc with the following settings:

sysbench ./tpcc.lua --mysql-host=<> --mysql-port=3306 --mysql-user=<> --mysql-password=<> --mysql-db=tpcc --time=300 --threads=32 --report-interval=1 --tables=10 --scale=15  --mysql_table_options=" CHARSET=utf8 COLLATE=utf8_bin"  --db-driver=mysql prepare

 sysbench /opt/tools/sysbench-tpcc/tpcc.lua --mysql-host=$mysqlhost --mysql-port=$port --mysql-user=<> --mysql-password=<> --mysql-db=tpcc --db-driver=mysql --tables=10 --scale=15 --time=$time  --rand-type=zipfian --rand-zipfian-exp=0 --report-interval=1 --mysql-ignore-errors=all --histogram  --report_csv=yes --stats_format=csv --db-ps-mode=disable --threads=$threads run

Write Single Node (Baseline)

Before starting the comparative analysis, I wanted to define the “limit” of traffic/load for this platform.

baseline reads/writes

From the graph above, we can see that this setup scales up to 128 threads and after that, the performance remains more or less steady.

Amazon claims that we can roughly double the performance when using both nodes in write mode and writing to different schemas to avoid conflicts.

aurora scalability

Once more, remember I am not interested in the absolute numbers here, but I am expecting the same behavior. Given that, our expectation is to see:

expected scalability

Write on Both Nodes, Different Schemas

So AWS recommends this as the scaling solution:


And I diligently followed the advice. I used two EC2 nodes in the same subnet as the Aurora nodes, each writing to a different schema (tpcc & tpcc2), as sketched below.
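
Concretely, the run looked like two sysbench processes with the same parameters as above, each pointed at its own node endpoint and schema (the hostnames below are placeholders):

# On EC2 #1, writing schema tpcc against Node1 (hostname is a placeholder):
sysbench /opt/tools/sysbench-tpcc/tpcc.lua --mysql-host=node1.example.amazonaws.com --mysql-port=3306 --mysql-user=<> --mysql-password=<> --mysql-db=tpcc --db-driver=mysql --tables=10 --scale=15 --time=$time --rand-type=zipfian --rand-zipfian-exp=0 --report-interval=1 --mysql-ignore-errors=all --histogram --report_csv=yes --stats_format=csv --db-ps-mode=disable --threads=$threads run

# On EC2 #2, writing schema tpcc2 against Node2 (hostname is a placeholder):
sysbench /opt/tools/sysbench-tpcc/tpcc.lua --mysql-host=node2.example.amazonaws.com --mysql-port=3306 --mysql-user=<> --mysql-password=<> --mysql-db=tpcc2 --db-driver=mysql --tables=10 --scale=15 --time=$time --rand-type=zipfian --rand-zipfian-exp=0 --report-interval=1 --mysql-ignore-errors=all --histogram --report_csv=yes --stats_format=csv --db-ps-mode=disable --threads=$threads run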

Overview

Let us make it short and go straight to the point. Did we get the expected scalability?

Well, no:


We got just a 26% increase, quite far from the expected 100%. Let us see what happened in detail (if not interested, just skip to the next test).

Node 1

Schema read writes Aurora

Node 2


As you can see, Node1 was (more or less) keeping up with expectations, staying close to the expected performance. But Node2 was just not keeping up, and performance there was simply terrible.

The graphs below show what happened.

While Node1 was (again, more or less) scaling up to the baseline expectations (128 threads), Node2 collapsed at 16 threads and was never able to scale up.

Reads

Node 1


Node1 is scaling reads as expected, but here and there we can see performance deterioration.

Node 2


Node2 is not scaling Reads at all.

Writes

Node 1


Same as Read.

Node 2


Same as read.

Now someone may think I made a mistake and was writing to the same schema. I assure you I was not. Check the next test to see what happened when using the same schema.

Write on Both Nodes,  Same Schema

Overview

Now, now, Marco, this is unfair. You know this will cause contention. Yes, I do! But nonetheless, I was curious to see what was going to happen and how the platform would deal with that level of contention.

My expectation was to see a lot of performance degradation and an increased number of locks. About conflicts, I was not wrong; after the test, Node2 reported:

+-------------+---------+-------------------------+
| table       | index   | PHYSICAL_CONFLICTS_HIST |
+-------------+---------+-------------------------+
| district9   | PRIMARY |                    3450 |
| district6   | PRIMARY |                    3361 |
| district2   | PRIMARY |                    3356 |
| district8   | PRIMARY |                    3271 |
| district4   | PRIMARY |                    3237 |
| district10  | PRIMARY |                    3237 |
| district7   | PRIMARY |                    3237 |
| district3   | PRIMARY |                    3217 |
| district5   | PRIMARY |                    3156 |
| district1   | PRIMARY |                    3072 |
| warehouse2  | PRIMARY |                    1867 |
| warehouse10 | PRIMARY |                    1850 |
| warehouse6  | PRIMARY |                    1808 |
| warehouse5  | PRIMARY |                    1781 |
| warehouse3  | PRIMARY |                    1773 |
| warehouse9  | PRIMARY |                    1769 |
| warehouse4  | PRIMARY |                    1745 |
| warehouse7  | PRIMARY |                    1736 |
| warehouse1  | PRIMARY |                    1735 |
| warehouse8  | PRIMARY |                    1635 |
+-------------+---------+-------------------------+

This is obviously a strong indication that something was not working right. In terms of performance gain, if we compare ONLY the results at 128 threads:


Even with the high level of conflict, we still get a 12% performance gain.

The problem is that, in general, the two nodes behave quite badly. If you check the graph below, you can see that the level of conflict is such that it prevents the nodes not only from scaling but also from acting consistently.

Node 1

Write on Both Nodes,  Same Schema

Node 2


Reads

In the following graphs, we can see how Node1 had issues and actually crashed three times during the tests with 32/64/512 threads. Node2 was always up, but its performance was very low.

Node 1


Node 2


Writes

Node 1


Node 2


Recovery From Crashed Node

Regarding recovery time, reading the AWS documentation and listening to presentations, I often heard that Aurora Multi-Primary is a zero-downtime solution, along with statements like: “in applications where you can’t afford even brief downtime for database write operations, a multi-master cluster can help to avoid an outage when a writer instance becomes unavailable. The multi-master cluster doesn’t use the failover mechanism, because it doesn’t need to promote another DB instance to have read/write capability”

To achieve this, the suggested solution I found was to have applications point directly to the node endpoints and not use the cluster endpoint.

In this context, the solution pointing to the Nodes should be able to failover within a second or so, while the cluster endpoint:

Recovery From Crashed Node

Personally, I think that designing an architecture where the application is responsible for the connection to the database and for failover is a leftover from 2001. But if you feel this is the way, well, go for it.

What I did for testing was to use ProxySQL, as plain as possible, with nothing else but the basic native monitor, as sketched below. I then compared the results with the tests using the cluster endpoint. In this way, I followed the advice of pointing directly at the nodes, but in a more current way.
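
For transparency, a minimal sketch of that kind of ProxySQL setup (hostnames, hostgroup id, and monitor credentials are placeholders; in a real deployment you would tune weights and monitor settings):

# Minimal sketch of a plain ProxySQL setup pointing at the two node endpoints.
# Hostnames, hostgroup id, and monitor credentials are placeholders.
mysql -uadmin -padmin -h127.0.0.1 -P6032 -e "
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (100, 'node1.example.amazonaws.com', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (100, 'node2.example.amazonaws.com', 3306);
UPDATE global_variables SET variable_value='monitor'
  WHERE variable_name IN ('mysql-monitor_username','mysql-monitor_password');
LOAD MYSQL SERVERS TO RUNTIME; LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL SERVERS TO DISK; SAVE MYSQL VARIABLES TO DISK;"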

The results are below and they confirm (more or less) the data coming from Amazon.


A downtime of seven seconds is quite a long time nowadays, especially if I am targeting a 5 nines solution, which, I want to remind you, means 864 ms of downtime per day. Using ProxySQL gets closer to that, but it is still too long to be called zero downtime.
I also had fail-back issues when using the AWS cluster endpoint, given that it was not able to move the connections to the joining node seamlessly.

Last but not least, when using the consistency level INSTANCE_RAW, I had some data issues as well as PK conflicts:
FATAL: mysql_drv_query() returned error 1062 (Duplicate entry ‘18828082’ for key ‘PRIMARY’)

Conclusions

As stated at the beginning of this long blog, the reasonable expectations for going with a multi-primary solution were to:

  • Scale writes (more nodes, more writes)
  • Get zero downtime, or close to that (5 nines is a maximum downtime of 864 milliseconds per day!)
  • Shift the writer pointer at any time from A to B and vice versa, consistently.

Honestly, I feel we have completely failed on the scaling point. Probably, if I used the largest Aurora instances I would get much better absolute numbers, and it would take longer to hit the same issues, but I would hit them. In any case, if the multi-primary solution is designed to provide that scalability, it should do so with any instance size.

I did not have zero downtime, but I was able to failover pretty quickly with ProxySQL.

Finally, unless the consistency model is REGIONAL_RAW, shifting from one node to the other is exposed to possible negative effects like stale reads. Given that, I consider this requirement not satisfied in full.

Given all the above, I think this solution could eventually be valid only for High Availability (close to 5 nines), but given it comes with some limitations, I do not feel comfortable preferring it over others just for that; in the end, default Aurora is already good enough as a highly available solution.

References

AWS re:Invent 2019: Amazon Aurora Multi-Master: Scaling out database write performance

Working with Aurora multi-master clusters

Improving Enterprises HA and Disaster Recovery Solutions Reviewed

Robust HA Solutions with ProxySQL

Limitations of multi-master clusters

Oct 02, 2020

Rate Limit (Throttle) for MySQL with ProxySQL

Maybe one of the more “obscure” operations when dealing with replica lag, or, in general, when one needs to control writes to the database, is the rate limit. It’s also lately one of the most popular conversations around the community.

But what is it? In plain words: holding up queries for a while, giving the replicas air to breathe and catch up. It is similar to Galera’s Flow Control mechanism, although flow control, when it kicks in, stops all the writes while the nodes catch up. With a throttle, no write is stopped, just delayed.

There are several ways to do this. A popular tool is Freno but this is also something that can be achieved with ProxySQL. Let’s see how.

Delay

ProxySQL has a variable called “mysql-default_query_delay”  that is pretty self-explanatory. It will add a delay, in milliseconds, to all the queries passing through ProxySQL.

Now, the trick is to monitor the replica lag and apply the delay. Something that in this case is solved with a small bash script:

#!/bin/bash

set -o pipefail

proxysqlhost=127.0.0.1
proxysqladmport=6032
proxysqluser=admin
proxysqlpass=admin

replicahost=mysql2
replicauser=pmm
replicapass=pmm

lagThreshold=3


function check_lag() {
         lag=$(mysql -h$replicahost -u$replicauser -p$replicapass -e "SHOW SLAVE STATUS\G" | grep  "Seconds_Behind_Master" | awk -F\: '{print $2}' 2>&1)
}

function modifyProxysqlVariable () {
        # use the admin port variable defined above instead of hardcoding it
        out=$(mysql -u$proxysqluser -p$proxysqlpass -h$proxysqlhost -P$proxysqladmport -e"set mysql-default_query_delay = $delay ; LOAD MYSQL VARIABLES TO RUNTIME;" 2>&1)
        echo $out
}

function destructor () {
        delay=0
        modifyProxysqlVariable
        echo "bye"
}

trap destructor EXIT INT TERM

while true; do

        check_lag
        echo $lag
        if [ $lag -ge $lagThreshold ]; then
                delay=1
                modifyProxysqlVariable
        elif [ $lag -eq 0 ]; then
                delay=0
                modifyProxysqlVariable
        fi
        sleep 0.001
done

Replica lag is monitored in a loop, and when it is bigger than the threshold, 1 ms of delay is added overall. Once the lag is back under control, the delay is removed.

For testing purposes, I have set up a basic Primary/Replica environment, running sysbench with a high-concurrency setup to make the replica lag on purpose. I collected some graphs with Percona Monitoring and Management, and here’s how it looks:

Rate Limit MySQL

The above graph shows the lag growing constantly until the script starts to run and the replica eventually catches up. Fantastic! Until you look at the Questions graph, which looks very… flat.

throttle MySQL

So what happened is that once the lag reached the threshold (in this case, 3 seconds), all the traffic was delayed. Once the lag was under control, traffic came back, but not at 100%, since the throttle script continues to run. So it keeps the replica up to date at the cost of rate-limiting the writes.

And the above graph shows the same (questions) but from the ProxySQL dashboard perspective.

Now, what’s the alternative?

Delay on Rules

ProxySQL query rules can be set with delays per rule. Since the idea is to add less overhead, a good approach is to check which query rule is the least used and see if by adding a delay to it we can control the lag.

Admin> SELECT hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply, delay FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+-----------+---------+--------------+-----------------------+-----------------+-----------+-------+-------+
| hits      | rule_id | match_digest | match_pattern         | replace_pattern | cache_ttl | apply | delay |
+-----------+---------+--------------+-----------------------+-----------------+-----------+-------+-------+
| 0         | 1       | NULL         | ^SELECT .* FOR UPDATE | NULL            | NULL      | 0     | NULL  |
| 609897074 | 2       | NULL         | ^SELECT .*            | NULL            | NULL      | 0     | NULL  |
| 87128148  | 3       | NULL         | ^UPDATE .*            | NULL            | NULL      | 0     | 0     |
| 43561905  | 4       | NULL         | ^INSERT .*            | NULL            | NULL      | 0     | 0     |
| 43562935  | 5       | NULL         | ^DELETE .*            | NULL            | NULL      | 0     | 0     |
| 0         | 6       | NULL         | ^REPLACE .*           | NULL            | NULL      | 0     | NULL  |
+-----------+---------+--------------+-----------------------+-----------------+-----------+-------+-------+
6 rows in set (0.01 sec)

From that query, we can see that the rule with id 5, the one that routes the DELETEs, is the least used of all the rules that hit the Primary.

We just need to make a small adjustment to the script to modify the mysql_query_rules table, by adding a proper function. The full script with the modification looks like this:

#!/bin/bash

set -o pipefail

proxysqlhost=127.0.0.1
proxysqladmport=6032
proxysqluser=admin
proxysqlpass=admin

replicahost=mysql2
replicauser=pmm
replicapass=pmm

lagThreshold=3

ruleId=5

function destructor () {
        delay=0
        modifyRuleDelay
        echo "bye"
}

function check_lag() {
         lag=$(mysql -h$replicahost -u$replicauser -p$replicapass -e "SHOW SLAVE STATUS\G" | grep  "Seconds_Behind_Master" | awk -F\: '{print $2}' 2>&1)
}

function modifyRuleDelay () {
        # use the admin port variable defined above instead of hardcoding it
        out=$(mysql -u$proxysqluser -p$proxysqlpass -h$proxysqlhost -P$proxysqladmport -e"update mysql_query_rules set delay = $delay where rule_id = $ruleId ; LOAD MYSQL QUERY RULES TO RUNTIME;" 2>&1)
}

trap destructor EXIT INT TERM

while true; do

        check_lag
        echo $lag
        if [ $lag -ge $lagThreshold ]; then
                delay=1
                modifyRuleDelay
        elif [ $lag -eq 0 ]; then
                delay=0
                modifyRuleDelay
        fi
        sleep 0.001
done

Now, running the script we can see that the same behavior happens. Lag is controlled:

rate limit for ProxySQL

What about the Questions? Well, now the drop is not complete: some traffic can still be routed, and when the replica catches up, the behavior is the same, with just a small reduction in throughput, which is the cost of keeping the lag controlled.

ProxySQL reports the same.

Now, we have talked about controlling lag, but pretty much any status variable can be monitored. Perhaps your issues are with contention (Threads_running) or with being IO-bound (InnoDB checkpoint age). Regardless of the variable, rate limiting is achievable with ProxySQL.
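
As an example of that generalization (a sketch that reuses the placeholder host and credentials from the scripts above; the threshold is hypothetical), the check_lag function can be swapped for one that watches Threads_running:

# Sketch: same throttle loop, but keyed on Threads_running instead of replica lag.
# The threshold is hypothetical; host and credentials are the same placeholders as above.
runningThreshold=50

function check_running() {
        running=$(mysql -h$replicahost -u$replicauser -p$replicapass -N -e "SHOW GLOBAL STATUS LIKE 'Threads_running'" | awk '{print $2}' 2>&1)
}

# ...and in the main loop, compare $running against $runningThreshold
# instead of $lag against $lagThreshold.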

Aug 28, 2020

ProxySQL Binary Search Solution for Rules

We sometimes receive challenging requests… this is a story about one of those times.

The customer had implemented a sharding solution and wanted us to review alternatives or improvements. We analyzed the possibility of using ProxySQL, as it looked like a simple implementation. However, as there were 200 shards, we had to implement 200 rules — the first shard didn’t have much overhead, but the last one had to go through 200 rules and took longer.

My first idea was to use FLAGIN and FLAGOUT, creating a B-Tree, but the performance was the same. Reviewing the code, I realized that the rules are implemented as a list, which means that, in the end, all the rules are processed until the right one is hit, and FLAGIN is used just to filter out.

At that point, I asked, what could I do? Is it possible to implement it differently? What is the performance impact?

One Problem, Two Solutions

I think it is worth clarifying again that I had to change the code because I found no performance gain with the current implementation. This means that writing the rules to take advantage of the binary search took me halfway, and implementing the rules with a Map allowed the expected performance gain, as now we are jumping to the right rule chain and skipping the others.

Solution

I decided to change the ProxySQL code to use a different structure (a Map) to store the rules and, when a FLAGOUT is there, start down that path. This is 100% a proof of concept; do not use the code in this repo in production, as it is not thoroughly tested and might have several bugs. However, we can trust the behavior and results of the test under the scenario that I’m presenting.

Base case

Using ProxySQL without any change and with 1 rule per shard will be our base case. This means, that it is going to evaluate 1 rule for shard-1 but 200 evaluations need to be made to reach the rule for shard-200.

In this case, the rules will be like this:

insert into mysql_query_rules (active,match_pattern,apply,destination_hostgroup) values (1,'\/\* 000',1,0);
insert into mysql_query_rules (active,match_pattern,apply,destination_hostgroup) values (1,'\/\* 001',1,0);
insert into mysql_query_rules (active,match_pattern,apply,destination_hostgroup) values (1,'\/\* 002',1,0);
insert into mysql_query_rules (active,match_pattern,apply,destination_hostgroup) values (1,'\/\* 003',1,0);
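
For 200 shards, nobody writes those inserts by hand; a small generator like the one below produces the full list (a sketch: the admin credentials and the shard-number-to-hostgroup mapping are assumptions, adjust them to your topology):

# Sketch: generate one rule per shard, matching the /* NNN */ comment prefix.
# Admin credentials and the shard-to-hostgroup mapping are assumptions.
for i in $(seq -f "%03g" 0 199); do
    hg=$((10#$i))
    mysql -uadmin -padmin -h127.0.0.1 -P6032 -e \
      "INSERT INTO mysql_query_rules (active,match_pattern,apply,destination_hostgroup) VALUES (1,'\/\* $i',1,$hg);"
done
mysql -uadmin -padmin -h127.0.0.1 -P6032 -e "LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;"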

Binary search use case

In order to reduce the number of evaluations, I decided to use the divide and conquer idea. I created the rules in this way:

replace into mysql_query_rules (active,match_pattern,flagIN,flagOUT,apply,destination_hostgroup) values (1,'\/\* [0-1]',0,01,0,999);
replace into mysql_query_rules (active,match_pattern,flagIN,flagOUT,apply,destination_hostgroup) values (1,'\/\* 0' ,01, 0,1, 0);
replace into mysql_query_rules (active,match_pattern,flagIN,flagOUT,apply,destination_hostgroup) values (1,'\/\* 1' ,01, 0,1, 1);
replace into mysql_query_rules (active,match_pattern,flagIN,flagOUT,apply,destination_hostgroup) values (1,'\/\* 2' , 0, 0,1, 2);
replace into mysql_query_rules (active,match_pattern,flagIN,flagOUT,apply,destination_hostgroup) values (1,'\/\* 3' , 0, 0,1, 3);

There will be more rules to write, but the number of evaluations is lower and evenly distributed:

Shard | Amount of Evaluations
0     | 2
1     | 3
2     | 2
3     | 3

Rule evaluation

Take into account that evaluating a rule basically means reading the parameters and comparing them. This might not be hard work if you have a few rules, but we had 200 shards, so we need at least 200 rules. Let’s compare how many evaluations are being made in each case:

root@ProxySQL_git:~/git/proxysql# grep "Evaluating rule_id" /var/lib/proxysql/proxysql.log | wc -l
202000
root@ProxySQL_git:~/git/proxysql# grep "Evaluating rule_id" /var/lib/proxysql/proxysql.log | wc -l
 37600

The first number is the number of evaluations ProxySQL needs using the list implementation; the second is using the B-Tree and Map solution. As you can see, we are doing 5.3 times fewer evaluations.

Tests

For the test, I created 3 EC2 instances with these roles:

  • An app simulator, which is going to run a script that simulates 32 threads running 2M queries like this (a rough sketch of such a script follows this list):
/* 000 */ select 'test' from dual;

  • A ProxySQL server, which is going to run both versions, each with its best rule configuration.
  • Percona Server
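
A rough sketch of such an app simulator, assuming the queries go through ProxySQL on its default 6033 port (host, credentials, and the shard distribution are placeholders; note the -c flag so the mysql client keeps the routing comment):

#!/bin/bash
# Rough sketch of the app simulator: 32 workers, ~2M comment-tagged queries in
# total. Host and credentials are placeholders.
PROXYSQL_HOST=10.0.0.10
QUERIES_PER_THREAD=62500   # 32 x 62500 = 2M
for t in $(seq 1 32); do
    (
        for q in $(seq 1 $QUERIES_PER_THREAD); do
            shard=$(printf "%03d" $((RANDOM % 200)))
            # -c keeps the /* NNN */ comment so ProxySQL can route on it
            mysql -c -h"$PROXYSQL_HOST" -P6033 -uapp -papp -e "/* $shard */ select 'test' from dual;" >/dev/null
        done
    ) &
done
wait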

The original version of ProxySQL was able to execute 36k queries per second, while the version using Map and B-Tree was able to execute 61k queries per second — roughly a 70% increase in throughput.

Another thing to consider is the load in the ProxySQL server for both tests:

In the first picture, we see that the server is reaching 90% CPU usage, but using Map and B-Tree it stays below 60%.

Conclusions

I think this proof of concept showed 3 important facts:

  1. ProxySQL is an amazing tool that is still growing.
  2. The performance penalty of using a large number of rules can be reduced.
  3. Writing rules with binary search in mind might not only be a solution for sharding; it could also be used with query hashes for read-write splitting.
Aug 28, 2020

ProxySQL Overhead — Explained and Measured

ProxySQL brings a lot of value to your MySQL infrastructure, such as caching or connection multiplexing, but it does not come for free — your database traffic needs to go through additional processing, which adds some overhead. In this blog post, we’re going to discuss where this overhead comes from and measure it.

Types of Overhead and Where it Comes From 

There are two main types of overhead to consider when it comes to ProxySQL — Network Overhead and Processing Overhead. 

Network Overhead largely depends on where you locate ProxySQL. For example, if you deploy ProxySQL on a separate host (or hosts), as in this diagram:

The application will have added network latency for all requests, compared to accessing the MySQL servers directly. This latency can range from a fraction of a millisecond if ProxySQL is deployed on the same local network, to much more than that if you make poor choices about ProxySQL’s location.

I have seen exceptionally poor deployments with ProxySQL in a different region from both MySQL and the application, causing a delay of tens of milliseconds (and more than 100% of overhead for many queries).

Processing Overhead

The second kind of overhead is Processing Overhead — every request which ProxySQL receives undergoes additional processing on the ProxySQL side (compared to talking to MySQL directly). If you have enough CPU power available (the CPU is not saturated), the main drivers of the cost of such processing will be the size of the query and of its result set, as well as your ProxySQL configuration. The more query rules you have, and the more complicated they are, the more processing overhead you should expect.

In the worst-case scenario, I’ve seen thousands of regular expression based query rules which can add very high overhead. 

Another reason for high processing overhead can be improper ProxySQL configuration. ProxySQL, as of version 2.0.10, defaults to a maximum of 4 processing threads (see the mysql-threads global variable), which limits it to using no more than 4 CPU cores. If you’re running ProxySQL on a server with a much larger number of CPU cores and see ProxySQL pegged on CPU, you may increase the number up to the number of your CPU cores.
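
A minimal sketch of checking and raising that setting via the admin interface (default admin credentials, and the thread count is just an example; to my knowledge mysql-threads only takes effect after a ProxySQL restart):

# Sketch: inspect and raise mysql-threads (example value), then restart,
# since this variable is not applied at runtime.
mysql -uadmin -padmin -h127.0.0.1 -P6032 -e "
SELECT variable_name, variable_value FROM global_variables WHERE variable_name='mysql-threads';
UPDATE global_variables SET variable_value='16' WHERE variable_name='mysql-threads';
SAVE MYSQL VARIABLES TO DISK;"
sudo systemctl restart proxysql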

The Linux “top” tool is a good way to see if ProxySQL is starved for resources — if you have mysql-threads set at 4 and it is showing 400% CPU usage, that is the problem.

Also watch for overall CPU utilization, especially if something else is running on the system beyond ProxySQL – oversubscribed CPU will cause additional processing delays. 

Reducing Overhead 

In this blog post we look at the additional overhead ProxySQL introduces, though it can also reduce overhead — the cost of establishing a network connection (especially with TLS) can be drastically lower if you run ProxySQL local to the application instance and let it maintain persistent connections to the MySQL server.

Let’s Measure It!

I decided not to measure network overhead, because it is way too environment-specific, and instead look at the processing overhead in the case where we run MySQL, ProxySQL, and the benchmark client on the same box. We will try using TCP/IP and a Unix domain socket to connect to ProxySQL, because it makes quite a difference, and we will also look at prepared statements versus standard non-prepared statements. A Google Spreadsheet with all results and benchmark parameters is available here.

We use a plain ProxySQL setup with no query rules and only one MySQL server configured, so overhead is minimal in this regard.

To get stable results with single-thread tests we had to set up CPU affinity as described in this blog post.
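
For reference, that kind of pinning can be done with taskset along these lines (a sketch: the core numbers and the sysbench invocation are arbitrary examples, not the exact commands used):

# Sketch: pin mysqld, proxysql, and the benchmark client to disjoint cores.
# Core ranges are arbitrary examples.
for p in $(pgrep -x mysqld);   do sudo taskset -cp 0-7  "$p"; done
for p in $(pgrep -x proxysql); do sudo taskset -cp 8-11 "$p"; done
taskset -c 12 sysbench oltp_point_select --mysql-socket=/tmp/proxysql.sock \
  --mysql-user=sbtest --mysql-password=sbtest --threads=1 --time=60 run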

MySQLDump

Let’s start with the most non-scientific test — running mysqldump on a large table (some 2GB) and measuring how long it takes. This test exposes how expensive result processing is in ProxySQL, as the query routing work in this case is negligible.
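
In practice this is just timing the same dump through both paths (schema/table names and credentials are placeholders; 6033 is ProxySQL's default MySQL port):

# Sketch: dump the same ~2GB table directly and through ProxySQL, discarding
# the output. Schema/table names and credentials are placeholders.
time mysqldump -h127.0.0.1 -P3306 -usbtest -psbtest sbtest sbtest1 > /dev/null   # direct to MySQL
time mysqldump -h127.0.0.1 -P6033 -usbtest -psbtest sbtest sbtest1 > /dev/null   # through ProxySQL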

We can see 20% longer times with ProxySQL (though, considering the result processing done by mysqldump itself, the actual query execution time difference is likely higher).

Another interesting way to think about it: we have 4.75 sec added to process 10 million rows, meaning the ProxySQL overhead is 475 ns per roughly 200-byte row, which is actually pretty good.

64 Concurrent Connections Workload 

For this workload, I’m using the server with 28 Cores and 56 logical CPU threads and I have to raise mysql-threads to 32 to make sure ProxySQL is not keeping itself on a diet.

There is a lot of interesting data here. First, we can see disabling Prepared Statements gives us a 15% slowdown with direct MySQL connection and about 13.5% when going through ProxySQL, which makes sense as Processing overhead on ProxySQL side should not increase as much when Prepared Statements are disabled.

The performance difference between a direct connection and going through ProxySQL is significant, though: going directly is almost 80% faster when Prepared Statements are in use and over 75% faster when Prepared Statements are disabled.

If you think about these numbers — considering sysbench itself is taking some resources, for trivial primary key lookup queries the number of resources ProxySQL requires is comparable to those needed by MySQL Server itself to serve the query.

Single Connection Workload

Let’s now take a look at the performance of the same simple point lookup queries but using only a single thread. We also schedule MySQL, Sysbench, ProxySQL to the different CPU cores so there is no significant contention for CPU resources and we can look at efficiency. In this test, all connections are done using UNIX Socket so we’re looking at best-case scenarios and Prepared Statements are Enabled.

The direct connection gives some 55% better throughput than ProxySQL. 

The other way we can do the math is to see how long it takes to serve the query directly and with ProxySQL in the middle — it is 46 microseconds with MySQL directly and 71 microseconds when going through ProxySQL, meaning ProxySQL adds around 25 microseconds.

While 25 microseconds is a large portion of total query execution in this single-host environment and trivial queries it may be a lot less significant for more complicated queries and network-based deployments.

Unix Socket vs TCP/IP

As I recently wrote — there is quite a performance difference between using TCP/IP or a Unix socket for a local MySQL connection. It is reasonable to assume that the same applies to a ProxySQL deployment, only with ProxySQL we have two connections to take care of — the connection between ProxySQL and the MySQL server and the one between ProxySQL and the application. In our single-host test, we can use a Unix socket in both cases. If you deploy ProxySQL as your application sidecar or on the MySQL server, you will be able to use a Unix socket for at least one of those connections.
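
On the application side, the two options look like this with sysbench (a sketch with placeholder credentials; /tmp/proxysql.sock is ProxySQL's default socket path, while the ProxySQL-to-MySQL leg is controlled by how the backend is registered in mysql_servers):

# "U" leg: application to ProxySQL over the default Unix socket
sysbench oltp_point_select --mysql-socket=/tmp/proxysql.sock \
  --mysql-user=sbtest --mysql-password=sbtest --threads=1 run

# "T" leg: application to ProxySQL over TCP/IP on the default 6033 port
sysbench oltp_point_select --mysql-host=127.0.0.1 --mysql-port=6033 \
  --mysql-user=sbtest --mysql-password=sbtest --threads=1 run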

Letters “U” and “T” correspond to connection type — the “UU” means Unix Socket was used for both connections and “TT” means TCP/IP was used in both places.

Results are quite expected — for best performance you should use Unix Socket, but even using Socket for one of the connection types improves performance.

Using TCP/IP for both connection types instead of Unix Socket reduces performance by more than 20%.

If we do the same math to compute how much latency going through TCP/IP adds — it is 20 microseconds, meaning ProxySQL through TCP/IP adds almost double processing latency compared to ProxySQL via Unix Socket.

Summary

ProxySQL is quite efficient — 25-45 microseconds of added latency per request and hundreds of nanoseconds per row of the result set is going to be acceptable for most workloads and can more than pay for itself with the features ProxySQL brings to the table. A poor ProxySQL configuration, though, can yield much higher overhead. Want to be confident? Perform similar tests on your real ProxySQL deployment, with its full rule configuration, within its real deployment topology.

Oct 23, 2019

ProxySQL 2.0.7 and proxysql-admin Tool Now Available

ProxySQL 2.0.7, released by ProxySQL, is now available for download in the Percona Repository along with Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and database servers in the MySQL ecosystem (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

This release includes ProxySQL 2.0.7 which introduces many new features and enhancements and also fixes a number of bugs.

The proxysql-admin tool now supports MariaDB 10.4.

New Features and Improvements

  • PSQLADM-195: A new option --with-stats-reset has been added to the proxysql-status script to display the  *_reset tables from the stats database. If this option is not specified, these tables are not displayed by default.

Bugs Fixed

  • PSQLADM-157: In some cases, the proxysql-status script used the cat command to display a file without checking if the file existed and was readable.
  • PSQLADM-181: When run with --update-cluster --write-node=<node_name>, the proxysql-admin tool now verifies that the writer nodes are not read-only.

The ProxySQL 2.0.7 source and binary packages available from the Percona download page for ProxySQL include ProxySQL Admin – a tool developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 2.0.7 are available as well. You can download the original ProxySQL from GitHub. GitHub hosts the documentation in the wiki format.

ProxySQL is available under Open Source license GPLv3.

Oct 03, 2019

Percona XtraDB Cluster 8.0 New Feature: wsrep_sst_auth Removal

The problem

In PXC 5.6 and 5.7, when using xtrabackup-v2 as the SST method, the DBA must create a user with the appropriate privileges for use by Percona XtraBackup (PXB). The username and password of this backup user are specified in the wsrep_sst_auth variable.

This is a problem because this username and password was being stored in plaintext and required that the configuration file be secured.

The PXC 8.0 solution

(This feature is still under development and may change before PXC 8.0 GA)

Because the wsrep_sst_auth is only needed on the donor side to take a backup, PXC 8.0 uses an internal user (created specifically for use by PXC) with a randomly generated password. Since this user is only needed on the donor, the plaintext password is not needed on the joiner node.

This password consists of 32 characters generated at random. A new password is generated for each SST request. The plaintext of the password is never saved and never leaves the node. The username/password is sent to the SST script via unnamed pipes (stdin).

New PXC internal user accounts

mysql.pxc.internal.session

The mysql.pxc.internal.session user account provides the appropriate security context to create and set up the other PXC accounts. This account has a limited set of privileges, just enough to create the mysql.pxc.sst.user.

This account is locked and cannot be used to login (the password field will not allow login).

mysql.pxc.sst.user

The mysql.pxc.sst.user is used by XtraBackup to perform the backup. This account has the full set of privileges needed by XtraBackup. It is created for an SST and is dropped at the end of the SST and also when the PXC node is shut down. The creation/provisioning of this user account is not written to the binlog and is not replicated to other nodes. However, this account is sent with the backup to the joiner node, so the joiner node also has to drop this user after the SST has finished.

mysql.pxc.sst.role

The mysql.pxc.sst.role is the MySQL role that provides the privileges needed for XtraBackup. This allows for easy addition/removal of privileges needed for an SST.

The experimental release of PXC is based on MySQL 8.0.15, and we have not implemented the role-based support due to issues found with MySQL 8.0.15. This will be revisited in future versions of PXC 8.0.

Program flow

  1. DONOR node receives SST request from the JOINER
  2. DONOR node generates a random password and creates the internal SST user
    SET SESSION sql_log_bin = OFF;
    DROP USER IF EXISTS 'mysql.pxc.sst.user'@localhost;
    CREATE USER 'mysql.pxc.sst.user'@localhost IDENTIFIED WITH 'mysql_native_password' BY 'XXXXXXXX' ACCOUNT LOCK;
    GRANT 'mysql.pxc.sst.role'@localhost TO 'mysql.pxc.sst.user'@localhost;
    SET DEFAULT ROLE 'mysql.pxc.sst.role'@localhost to 'mysql.pxc.sst.user'@localhost;
    ALTER USER 'mysql.pxc.sst.user'@localhost ACCOUNT UNLOCK;

    The code that uses roles is not used in the current release due to issues with MySQL 8.0.15. Currently, we create the user with all the needed permissions explicitly.

  3. Launch the SST script (passing the username/password via stdin)
  4. SST uses the username/password to perform the backup
  5. SST script exits
  6. The DONOR node drops the user.
  7. The JOINER node receives the backup and drops the user. Note that the JOINER node also contains the internal SST user!

As a precaution, the user is also dropped when the server is shut down.

Oct 01, 2019

Experimental Binary of Percona XtraDB Cluster 8.0

Percona is happy to announce the first experimental binary of Percona XtraDB Cluster 8.0 on October 1, 2019. This is a major step for tuning Percona XtraDB Cluster to be more cloud- and user-friendly. This release combines the updated and feature-rich Galera 4 with substantial improvements made by our development team.

Improvements and New Features

Galera 4, included in Percona XtraDB Cluster 8.0, has many new features. Here is a list of the most essential improvements:

  • Streaming replication supports large transactions
  • The synchronization functions allow action coordination (wsrep_last_seen_gtid, wsrep_last_written_gtid, wsrep_sync_wait_upto_gtid)
  • More granular and improved error logging. wsrep_debug is now a multi-valued variable to assist in controlling the logging, and logging messages have been significantly improved.
  • Some DML and DDL errors on a replicating node can either be ignored or suppressed. Use the wsrep_ignore_apply_errors variable to configure.
  • Multiple system tables help find out more about the state of the cluster.
  • The wsrep infrastructure of Galera 4 is more robust than that of Galera 3. It features a faster execution of code with better state handling, improved predictability, and error handling.

Percona XtraDB Cluster 8.0 has been reworked in order to improve security and reliability as well as to provide more information about your cluster:

  • There is no need to create a backup user or maintain the credentials in plain text (a security flaw). An internal SST user is created, with a random password for making a backup, and this user is discarded immediately once the backup is done.
  • Percona XtraDB Cluster 8.0 now automatically launches the upgrade as needed (even for minor releases). This avoids manual intervention and simplifies the operation in the cloud.
  • SST (State Snapshot Transfer) can roll back or fix an unwanted action. It is no longer just “a copy-only block” but a smart operation that makes the best use of the copy phase.
  • Additional visibility statistics are introduced in order to obtain more information about Galera internal objects. This enables easy tracking of the state of execution and flow control.

Installation

You can only install this release from a tarball and it, therefore, cannot be installed through a package management system, such as apt or yum. Note that this release is not ready for use in any production environment.

Percona XtraDB Cluster 8.0 is based on the following:

Please be aware that this release will not be supported in the future, and as such, neither the upgrade to this release nor the downgrade from higher versions is supported.

This release is also packaged with Percona XtraBackup 8.0.5. All Percona software is open-source and free.

In order to experiment with Percona XtraDB Cluster 8.0 in your environment, download and unpack the tarball for your platform.

Note

Be sure to check your system and make sure that the packages are installed which Percona XtraDB Cluster 8.0 depends on.

For Debian or Ubuntu:

$ sudo apt-get install -y \
socat libdbd-mysql-perl \
rsync libaio1 libc6 libcurl3 libev4 libgcc1 libgcrypt20 \
libgpg-error0 libssl1.1 libstdc++6 zlib1g libatomic1

For Red Hat Enterprise Linux or CentOS:

$ sudo yum install -y openssl socat  \
procps-ng chkconfig procps-ng coreutils shadow-utils \
grep libaio libev libcurl perl-DBD-MySQL perl-Digest-MD5 \
libgcc rsync libstdc++ libgcrypt libgpg-error zlib glibc openssl-libs

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Sep 27, 2019

Multiplexing (Mux) in ProxySQL: Use Case

Multiplexing Background

Historically it’s a technique used in networking to integrate multiple analog and digital signals via a shared medium. The goal of multiplexing over the network was to enable signals to be transmitted more efficiently for a given communication channel, thus achieving cost efficiency. 

Since the term multiplexing comes from telecommunications, the industry has heavily used devices called multiplexers – aka Mux. There was even a term called muxing, where analog and digital signals were combined on a single line.

The technique was developed in the early 1870s, with its origins in telegraphy, and it became a standard for digital telecommunications in the 20th century.

The following multiplexing methods are currently available:

  • Frequency Division Multiplexing (FDM)
  • Time Division Multiplexing (TDM)
  • Wavelength Division Multiplexing (WDM)

There is also the reverse operation in telco, which we aren’t going to get into too much, called demultiplexing. A demux separates a composite signal back into its original components.

How can we achieve Multiplexing for MySQL database connections?

We would basically need a Mux between the database and the application server. This means a proxy layer that can combine communication channels to the backend database. Again, the goal is to reduce the overhead of opening several connections and to maintain a minimal number of open connections to reduce the memory footprint.

Thread pooling is available in the following MySQL distributions:

  • MySQL Enterprise Edition 
  • Percona Server 
  • MariaDB 

The way they work is that the listener accepts connections and hands them over to a group of thread pools; this way, each client connection is handled by the same thread pool. The above implementations aren’t as efficient as ProxySQL’s implementation.

ProxySQL comes to help when we need to mux our communication channels to a database server that can often be flooded.

Main Use Cases

  • Any application with a persistent connection to the database
  • Java applications specifically, with built-in connection pools

Goals to Achieve 

  • Reduce connections similar to Aurora
  • Reduce huge number of connections from hardware 
  • Reduce context switching
  • Reduce mutex/contention 
  • Reduce CPU cache usage

ProxySQL Technique Used:

Threading Models 

  • One thread per connection
    • Easier to develop
    • Blocking I/O
  • Thread Pooling
    • Non-blocking I/O
    • Scalable architecture

Here’s what we can control using ProxySQL

Pros

  • Reduced number of connections and overhead to the backend database  
  • Control over database connections
  • Collect metrics of all database connections and monitor 
  • Can be enabled and disabled. 

Cons

  • Certain conditions and limitations apply to use
  • Can cause unexpected behavior in application logic.

How it works:  

Ad-hoc enable/disable of multiplexing

mysql_query_rules.multiplex allows enabling or disabling multiplexing based on matching criteria.

The field currently accepts these values:

  • 0: disable multiplex
  • 1: enable multiplex
  • 2: do not disable multiplex for this specific query containing @

Also, ProxySQL has some default behavior that can lead to unexpected results. In fact, ProxySQL purposely disables multiplexing on a connection in the following cases:

  • When a transaction is active
  • When you issued commands like LOCK TABLE, GET_LOCK() and others
  • When you created a temporary table using CREATE TEMPORARY TABLE
  • When the query uses session or user variables starting with the @ symbol

In the majority of cases, ProxySQL can return the connection to the pool, re-enabling multiplexing. But in the last case, when using session and user variables starting with @, the connections never return to the pool.

In a recent investigation on one of our client’s systems, we discovered a high number of connections and threads in use, despite the ProxySQL layer. The problem was related to queries containing the @ symbol issued by the Java connector when establishing a new connection. Within a matter of seconds, multiplexing was disabled for all the connections. The effect is having no multiplexing at all: for example, 2000 connections from the front ends were reflected in the same number of connections to the back ends. We needed a way to avoid the automatic disabling of multiplexing.

How to find on ProxySQL stats all the executed queries containing the @ symbol:

SELECT DISTINCT digest, digest_text FROM stats_mysql_query_digest WHERE digest_text LIKE '%@%';
*************************** 1. row ***************************
digest_text: select @@version_comment limit ?
*************************** 2. row ***************************
digest_text: SELECT @@session.auto_increment_increment AS auto_increment_increment, @@character_set_client AS character_set_client, @@character_set_connection AS character_set_connection, @@character_set_results AS character_set_results, @@character_set_server AS character_set_server, @@collation_server AS collation_server, @@init_connect AS init_connect, @@interactive_timeout AS interactive_timeout, @@license AS license, @@lower_case_table_names AS lower_case_table_names, @@max_allowed_packet AS max_allowed_packet, @@net_buffer_length AS net_buffer_length, @@net_write_timeout AS net_write_timeout, @@query_cache_size AS query_cache_size, @@query_cache_type AS query_cache_type, @@sql_mode AS sql_mode, @@system_time_zone AS system_time_zone, @@time_zone AS time_zone, @@tx_isolation AS tx_isolation, @@wait_timeout AS wait_timeout
*************************** 3. row ***************************
digest_text: SELECT @@session.auto_increment_increment AS auto_increment_increment, @@character_set_client AS character_set_client, @@character_set_connection AS character_set_connection, @@character_set_results AS character_set_results, @@character_set_server AS character_set_server, @@init_connect AS init_connect, @@interactive_timeout AS interactive_timeout, @@license AS license, @@lower_case_table_names AS lower_case_table_names, @@max_allowed_packet AS max_allowed_packet, @@net_buffer_length AS net_buffer_length, @@net_write_timeout AS net_write_timeout, @@query_cache_size AS query_cache_size, @@query_cache_type AS query_cache_type, @@sql_mode AS sql_mode, @@system_time_zone AS system_time_zone, @@time_zone AS time_zone, @@tx_isolation AS tx_isolation, @@wait_timeout AS wait_timeout

To create rules on ProxySQL:

INSERT INTO mysql_query_rules(active,digest,multiplex,apply) SELECT DISTINCT 1,digest,2,0 FROM stats_mysql_query_digest WHERE digest_text LIKE '%@%';

LOAD MYSQL QUERY RULES TO RUNTIME;

SAVE MYSQL QUERY RULES TO DISK;

And the result revealed huge savings in connections and thread pool usage, as directly captured from PMM.

Conclusion

In conclusion, ProxySQL’s multiplexing feature is something every high-traffic site would benefit from. What we always observe is that a lower number of connections to the backend lowers the overhead of both memory utilization and context switching. In order to fully benefit from multiplexing, be aware of the above limitations and keep investigating connection types even after implementing it.

References 

https://www.allaboutcircuits.com/technical-articles/an-intro-to-multiplexing-basis-of-telecommunications/

https://www.techopedia.com/definition/8472/multiplexing

Understanding Multiplexing in Telecommunications

https://medium.com/searce/reduce-mysql-memory-utilization-with-proxysql-multiplexing-cbe09da7921c

https://en.wikipedia.org/wiki/Multiplexing

Thanks to the following for their valuable input:

  • Rene Cannao
  • Marco Tusa
  • Daniel Guzman Burgos
  • Corrado Pandiani
  • Tom De Comman

Multiplexing diagram by The Anome, modified, CC BY-SA 3.0

Sep 18, 2019

Percona XtraDB Cluster 5.7.27-31.39 Is Now Available

Percona is happy to announce the release of Percona XtraDB Cluster 5.7.27-31.39 on September 18, 2019. Binaries are available from the downloads section or from our software repositories.

Percona XtraDB Cluster 5.7.27-31.39 is now the current release, based on the following:

All Percona software is open-source and free.

Bugs Fixed

  • PXC-2432: PXC was not updating the information_schema user/client statistics properly.
  • PXC-2555: SST initialization delay: fixed a bug where the SST process took too long to detect if a child process was running.
  • PXC-2557: Fixed a crash when a node goes NON-PRIMARY and SHOW STATUS is executed.
  • PXC-2592: PXC restarting automatically on data inconsistency.
  • PXC-2605: PXC could crash when log_slow_verbosity included InnoDB.  Fixed upstream PS-5820.
  • PXC-2639: Fixed an issue where a SQL admin command (like OPTIMIZE) could cause a deadlock.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Sep 17, 2019

Percona XtraDB Cluster 5.6.45-28.36 Is Now Available

Percona is glad to announce the release of Percona XtraDB Cluster 5.6.45-28.36 on September 17, 2019. Binaries are available from the downloads section or from our software repositories.

Percona XtraDB Cluster 5.6.45-28.36 is now the current release, based on the following:

All Percona software is open-source and free.

Bugs Fixed

  • PXC-2432: PXC was not updating the information schema user/client statistics properly.
  • PXC-2555: SST initialization delay: fixed a bug where the SST process took too long to detect if a child process was running.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!
