Apr
12
2017
--

ProxySQL Rules: Applying and Chaining the Rules

ProxySQL Rules

In this post, I am going to show you how you can minimize the performance impact of ProxySQL rules by using some finesse.

Apply Test

In my previous post, we could see the effect of the rules on ProxySQL performance. As we could also see, the “apply” option does not help with 1000 tables. Are we sure about this? Let’s consider: if we know 90% of our traffic won’t match any rules, it doesn’t matter if we have 10 or 500 rules – it has to check all of them. And this is going to have a serious effect on performance. How can we avoid that?

Let’s insert rule number ONE, which matches all queries, like this:

insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest,apply) values('testuser_rw',600,1,3,'(from|into|update|into table) sbtest([1-9]d{3,}|[1-9][0-9][1-9])b',1);

This rule matches all queries where table names > sbtest100. But again, this logic also can be applied on “userids” or any other keys. We just have to know our application and our query distribution.

With this rule, the 90% of the queries have to check only one rule (the first one):

Now we have 101 rules, but the performance is almost the same as when we had only ten rules! As we can see, creating the rules based on our query distribution has a huge impact!

But what if we don’t know which queries are the busiest, or every query has the same amount of hits? Can we do anything? Yes, we can.

Chaining

In my previous post, I mentioned the “flagIN”, “flagOUT” options. With these options we can chain the rules. But why is that good for us?

If we have 100 rules and 100 tables, even with applying, on average ProxySQL has to check 50 rules. But if we write rules like these:

insert into mysql_query_rules (flagin,flagout,username,active,retries,match_digest,apply) VALUES
(0,1000,'testuser_rw',1,3,'(from|into|update|into table) sbtest.b',0),
(0,1100,'testuser_rw',1,3,'(from|into|update|into table) sbtest1.b',0),
(0,1200,'testuser_rw',1,3,'(from|into|update|into table) sbtest2.b',0),
(0,1300,'testuser_rw',1,3,'(from|into|update|into table) sbtest3.b',0),
(0,1400,'testuser_rw',1,3,'(from|into|update|into table) sbtest4.b',0),
(0,1500,'testuser_rw',1,3,'(from|into|update|into table) sbtest5.b',0),
(0,1600,'testuser_rw',1,3,'(from|into|update|into table) sbtest6.b',0),
(0,1700,'testuser_rw',1,3,'(from|into|update|into table) sbtest7.b',0),
(0,1800,'testuser_rw',1,3,'(from|into|update|into table) sbtest8.b',0),
(0,1900,'testuser_rw',1,3,'(from|into|update|into table) sbtest9.b',0);
insert into mysql_query_rules (flagin,destination_hostgroup,active,match_digest,apply) VALUES
(1100,600,1,'(from|into|update|into table) sbtest11b',1),
(1100,600,1,'(from|into|update|into table) sbtest12b',1),
(1100,600,1,'(from|into|update|into table) sbtest13b',1),
(1100,600,1,'(from|into|update|into table) sbtest14b',1),
(1100,600,1,'(from|into|update|into table) sbtest15b',1),
(1100,600,1,'(from|into|update|into table) sbtest16b',1),
(1100,600,1,'(from|into|update|into table) sbtest17b',1),
(1100,600,1,'(from|into|update|into table) sbtest18b',1),
(1100,600,1,'(from|into|update|into table) sbtest19b',1);
...

We are going to have more than 100 rules, but first we match on the first digit after the second and then go on. With this approach ProxySQL has to only check 15 rules on average.

Let’s see the results:

As we can see, even with more rules, chaining is way faster than without chaining.

Tips

Hits

ProxySQL keeps statistics about a rule’s hits. When you add a rule you can see how many queries it applied to:

select * from stats_mysql_query_rules;
+---------+------+
| rule_id | hits |
+---------+------+
| 2 | 6860 |
| 3 | 6440 |
| 4 | 6880 |
| 5 | 6610 |
| 6 | 6850 |
| 7 | 7130 |
| 8 | 6620 |
| 9 | 7300 |
| 10 | 6750 |
| 11 | 7050 |
| 12 | 7280 |
| 13 | 6780 |
| 14 | 6670 |
...

Query_Processor_time_nsec

ProxySQL does not record how much time it spends on a rule (not yet, anyway: https://github.com/sysown/proxysql/issues/966), but it has a global stat:

select * from stats_mysql_global where Variable_name="Query_Processor_time_nsec";
+---------------------------+----------------+
| Variable_Name | Variable_Value |
+---------------------------+----------------+
| Query_Processor_time_nsec | 3184114671740 |
+---------------------------+----------------+

You can monitor this statistic, and if you see a huge increase after you added a rule, you might want to review it again.

Conclusion

ProxySQL can handle many rules, and of course they have some costs. But if you design your rules based on your workload and your query distribution, you can minimize this cost a lot.

Apr
10
2017
--

ProxySQL Rules: Do I Have Too Many?

In this blog post we are going to take a closer look at ProxySQL rules. How do they work, and how big is the performance impact of having many rules?

I would like to say thank you to Renè, who was willing to answer all my questions during my tests.

Overview

ProxySQL is heavily based on the query rules. We can set up ProxySQL without rules based only on the host groups, but if we want read/write splitting or sharding (or anything else) we need rules.

ProxySQL knows the SQL protocol and language, so we can easily create rules based on username, schema name and even on the query itself. We can write regular expressions that match the query digest. Let me show you an example:

insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('Testuser',601,1,3,'^SELECT');

This rule matches all the queries starting with “SELECT”, and sends them to host group 601.

After version 1.3.1, the default regex engine was RE2. Starting after version 1.4, the default regex engine will be PCRE.

I would like to highlight three options that can have a bigger impact on your rules than you think: flagINflagOUTapply.

With regards to the manual:

. . .these allow us to create “chains of rules” that get applied one after the other. An input flag value is set to 0, and only rules with flagIN=0 are considered at the beginning. When a matching rule is found for a specific query, flagOUT is evaluated and if NOT NULL the query will be flagged with the specified flag in flagOUT. If flagOUT differs from flagIN, the query will exit the current chain and enters a new chain of rules having flagIN as the new input flag. If flagOUT matches flagIN, the query will be re-evaluated again against the first rule with said flagIN. This happens until there are no more matching rules, or apply is set to 1 (which means this is the last rule to be applied)

You might not be sure what this means, but I will show you later.

As you can see, adding a rule is easy and we can add hundreds of rules, But is there any performance impact?

Test Case

We can write rules based on any part of the query (for example, “userid” or some “sharding key”). In these tests I wrote the rules based on table names because I can easily generate tables with “sysbench”, and run queries against these tables.

I created 1000 tables using sysbench, and I am going to test them with a direct MySQL connection, ProxySQL without rules, with ten rules and with 100 rules.

Time to do some tests to see if adding 100 or more rules have any effect on the performance?

I used two c4.4xlarge instances with SSDs, and I am going to share the steps so anybody can repeat my test and share/compare the results. NodeA is running MySQL 5.7.17 server, and NodeB is running “ProxySQL 1.3.4: and sysbench. During the test I increased the sysbench threads in the following steps:1,2,4,8,12,16,20,24.

I tried to use the simplest ProxySQL configuration as possible:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('10.10.10.243',600,3306,1000,0);
INSERT INTO mysql_replication_hostgroups VALUES (600,'','');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('testuser_rw','Testpass1.',1,600,'test');
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK; 

Only one server, one host group. I tried to measure the impact the rules had, so in all the test I sent the queries to the same host group. I only changed the rules (and some ProxySQL settings, as I will explain later).

As I mentioned, I am going to filter based on table names. Here are the 100 rules that I used:

insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('testuser_rw',600,1,3,'(from|into|update|into table) sbtest1b');
insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('testuser_rw',600,1,3,'(from|into|update|into table) sbtest2b'); ... insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('testuser_rw',600,1,3,'(from|into|update|into table) sbtest100b');First Test

First I ran tests with a direct MySQL connection, ProxySQL without rules, ProxySQL with ten rules and ProxySQL with 100 rules.

ProxySQL rules

ProxySQL itself has an impact on the performance, but there is a big difference between 10 and 100 rules. So adding more and more rules can have a negative effect on the performance.

That’s all? Can we do anything to speed things up? I used the default ProxySQL settings. Let’s have a look what can we tune.

Increasing the Number of Threads

Let’s go step by step. First we can increase the thread number inside ProxySQL (the default is 4). We will increase it to 8:

UPDATE global_variables SET variable_value='8' WHERE variable_name='mysql-threads';
SAVE MYSQL VARIABLES TO DISK;

ProxySQL has to be restarted after this changes.

ProxySQL rules

With this simple changes, we can improve the performance. As we can see, the difference is getting larger and larger as we increase the number of the sysbench threads.

Compiling

By compiling our own package, we can gain some extra performance. It is not clear why, so we opened a ticket for further investigation:

ProxySQL rules

I removed some of the columns because the graph got to busy.

ProxySQL 1.4

In ProxySQL 1.4 (which is not GA yet), we can change between the regex engines. However, even using the same engine (RE2) is faster in 1.4:

ProxySQL rules

Apply

As I mentioned, ProxySQL has a few important parameters like “apply”. With apply, if the query matches a rule it won’t check the remaining rules. In an ideal world, if you have 100 rules and 100 queries in random order which match only one rule, you only have to check 50 rules on average.

The new rules:

insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest,apply) values('testuser_rw',600,1,3,'(from|into|update|into table) sbtest1b',1);

ProxySQL rules

As you can see it didn’t help at all. But why? Because in this test we have 1000 tables, and we are running queries on all of the tables. This means 90% the queries have to check all the rules anyway. Let’s make a test with 100 tables to see if the “apply” helps or not:

ProxySQL rules

As we can see, with 100 tables we get a much better performance. But of course this is not a valid solution because we can’t just drop tables, “userids” or “sharding keys”. In the next post I will show you how to use “apply” in a more effective way.

Conclusion

So far, ProxySQL 1.4 with the PCRE engine and eight threads gives us the best performance with 100 rules and 1000 tables. As we can see, both the number of the rules and the query distribution matter. Both impact the performance. In my next blog post, I will show you how you can add some logic into your rules so that, even if you have more rules, you will get better performance.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com