In this blog post, I’ll present the ProxySQL query cache functionality. This is a query caching mechanism on top of ProxySQL. As there are already many how-tos regarding the ProxySQL prerequisites and installation process, we are going to skip these steps. For those who are already familiar with ProxySQL query cache configuration, let’s go directly to the query rules and the performance results.
Before talking about the ProxySQL query cache, let’s take a look at other caching mechanisms available for MySQL environments.
MySQL query cache is a query caching mechanism – deprecated as of MySQL 5.7.20 and removed in MySQL 8.0 – on top of MySQL itself (based on the official MySQL documentation).
The MySQL query cache stores the text of a SELECT statement together with the corresponding result sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.
Although MySQL query cache is supposed to improve performance, there are cases where MySQL query cache is not scaling well and can degrade performance due to its locking and invalidation algorithms.
You can find a really interesting post regarding MySQL query cache here.
There is also another method to cache results in a MySQL environment. It’s external caching (i.e., Memcached, Redis, etc.), but this also has some drawbacks. Introducing such a mechanism requires some changes on the application side.
But what is ProxySQL query cache if there are already the MySQL query cache and other external caching mechanisms? At the moment, although we’ve done some tests, we are not going to compare ProxySQL query cache performance against other caching mechanisms. We’ll address this in a future blog post. We will only focus on ProxySQL itself.
What is ProxySQL Query Cache
ProxySQL query cache is an in-memory key-value storage that uses:
- as key: a combination of username, schema and query text. It is a hash derived from username, schema name and the query itself. Combining these ensures that users access only their resultsets and for the correct schema.
- as value: the resultset returned by the backend (mysqld or another proxy).
There is some more metadata stored for each resultset:
- length: length of the resultset
- expire_ms: defines when the entry will expire
- access_ms: records the last time an entry was accessed
- ref_count: a reference count to identify resultset currently in use
Based on the configuration, the resultsets are cached on the wire while queries are executed, and the resultset is returned to the application. If the application re-executes the same query within the time slot defined by “expire_ms”, the resultset is returned by the embedded ProxySQL query cache.
The only way to invalidate entries from the ProxySQL query cache is through a time-to-live in milliseconds. This is in contrast to MySQL query cache, where the query cache gets invalidated each time a table gets updated. At the moment, it is only possible to tune the total amount of memory used by the query cache, using the variable “mysql-query_cache_size_MB”. The current implementation of mysql-query_cache_size_MB doesn’t impose a hard limit. Instead, it is used as an argument by the purging thread.
It’s obvious that it’s not easy to directly compare these two cache mechanisms, as each of them has its own way to invalidate results. Please also note a significant difference between MySQL and ProxySQL when query cache is enabled. ProxySQL query cache may serve stale resultsets due to the way it invalidates cached data (cached data are valid for the interval specified by “cache_ttl”, while MySQL’s cached data get invalidated each time data change). Every query that is cached may return stale data, and this may or may not be acceptable by the application.
How it Works
Before doing any benchmarks, I will try to give you a short description of how ProxySQL query cache gets enabled. Unlike MySQL query cache, where a common caching space exists for all tables, in ProxySQL query cache we have to define what traffic gets cached. This is done by defining query rules that match traffic that is going to be cached and setting a “cache_ttl” for the cached results. There are many ways to define matches for incoming traffic, either by query or digest using patterns. All we need to cache the resultset is to define the matching criteria and the TTL. If a query passed the matching criteria, the resultset is cached so the next requests are served directly from the ProxySQL instance instead of querying the hostgroup DB nodes (if cache_ttl has not expired).
Let’s use an example to make it clear how ProxySQL query cache is enabled.
In our setup we have three backend DB servers in a master-slave topology, with Percona Server for MySQL 5.7 and a ProxySQL ver. 1.4.3 instance with sysbench 1.0 installed. Backend servers are within the same reader hostgroup, and traffic is balanced among these servers using the same priority.
As I’ve already said, we won’t look at the ProxySQL installation. There are many topologies you can implement: deploying ProxySQL on each application server thus removing the “single point of failure” weakness, for example. But in our case, we will just present the ProxySQL query cache having a single instance. In general, you would expect to have better performance with the ProxySQL instance closer to the application.
Configuration
With the ProxySQL instance up and running, let’s confirm that all servers are OK. Querying ProxySQL admin shows that all servers are ONLINE:
Admin> select * from mysql_servers; +--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+ | hostgroup_id | hostname | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment | +--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+ | 2 | 10.0.2.12 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | | | 1 | 10.0.2.11 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | | | 2 | 10.0.2.13 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | | | 2 | 10.0.2.11 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | | +--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
As you can see, there are two hostgroups: the “1” used for WRITES and the “2” used for READS.
Some random connects proves that traffic is correctly routed to the DB backends:
[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname" +--------------+ | @@hostname | +--------------+ | db1-atsaloux | +--------------+ [RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname" +--------------+ | @@hostname | +--------------+ | db2-atsaloux | +--------------+ [RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname" +--------------+ | @@hostname | +--------------+ | db1-atsaloux | +--------------+ [RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname" +--------------+ | @@hostname | +--------------+ | db3-atsaloux | +--------------+
Let’s first take a look at some statistics. Before using sysbench, the “stats_mysql_query_digest” table (where digests are stored) is empty:
Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC; Empty set (0.00 sec)
The “stats_mysql_query_digest: table contains statistics related to the queries routed through the ProxySQL server. How many times each query was executed and the total execution time are two of the several provided statistics.
Before doing any benchmarks, I had to create some data. The following sysbench commands were used for selects by PK or by RANGE. For simplicity, we are not going to execute benchmarks inside transactions — although ProxySQL query cache is effective.
--threads
will be adjusted for each benchmark:
sysbench --threads=16 --max-requests=0 --time=60 --mysql-user=percona --mysql-password=percona --mysql-db=sbtest --mysql-host=127.0.0.1 --mysql-port=6033 --oltp-table-size=1000000 --oltp-read-only=on --oltp-skip-trx=on --oltp-test-mode=simple --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-point-selects=1 --oltp-simple-ranges=0 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua run sysbench --threads=16 --max-requests=0 --time=60 --mysql-user=percona --mysql-password=percona --mysql-db=sbtest --mysql-host=127.0.0.1 --mysql-port=6033 --oltp-table-size=1000000 --oltp-read-only=on --oltp-skip-trx=on --oltp-test-mode=simple --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-point-selects=0 --oltp-simple-ranges=1 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua run
Before running the full benchmark, a simple sysbench was run to get the queries digests that are used for the ProxySQL query cache configuration.
After running the first benchmark with ProxySQL query cache disabled, I queried the “stats_mysql_query_digest” table again and got the following results where it logs all executed queries.
Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC; +------------+-------------+-----------+--------------------+------------------------------------------------+ | count_star | sum_time | hostgroup | digest | digest_text | +------------+-------------+-----------+--------------------+------------------------------------------------+ | 301536 | 20962929791 | 2 | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? | | 3269 | 30200073 | 2 | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? | +------------+-------------+-----------+--------------------+------------------------------------------------+ 2 row in set (0.01 sec)
Add mysql_query_rules To Be Cached
Output above provides all the needed information in order to enable ProxySQL query cache. What we need to do now add the query rules that match the results that should be cached. In this case we use a matching pattern criteria and a cache_ttl of 5000ms. Taking this into consideration, we added the following rules:
Admin> INSERT INTO mysql_query_rules (rule_id,active,digest,cache_ttl,apply) VALUES (1,1,'0xBF001A0C13781C1D',5000,1); Query OK, 1 row affected (0.00 sec) Admin> INSERT INTO mysql_query_rules (rule_id,active,digest,cache_ttl,apply) VALUES (2,1,'0x290B92FD743826DA',5000,1); Query OK, 1 row affected (0.00 sec)
We shouldn’t forget that we must load query rules at runtime. If we don’t want to lose these rules (i.e., after a ProxySQL restart), we should also save to disk:
Admin> LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK; Query OK, 0 rows affected (0.00 sec)
Now let’s reset the stats_mysql_query_digest results:
Admin> SELECT 1 FROM stats_mysql_query_digest_reset LIMIT 1; -- we reset the counters +---+ | 1 | +---+ | 1 | +---+ 1 row in set (0.01 sec) ----------
And re-run the benchmarks with query cache enabled. To confirm what traffic was cached, we have to query the stats_mysql_query_digest once again:
Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC; +------------+------------+-----------+--------------------+------------------------------------------------+ | count_star | sum_time | hostgroup | digest | digest_text | +------------+------------+-----------+--------------------+------------------------------------------------+ | 108681 | 6304585632 | 2 | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? | | 343277 | 0 | -1 | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? | +------------+------------+-----------+--------------------+------------------------------------------------+ 2 rows in set (0.00 sec)
Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC; +------------+-----------+-----------+--------------------+----------------------------------+ | count_star | sum_time | hostgroup | digest | digest_text | +------------+-----------+-----------+--------------------+----------------------------------+ | 79629 | 857050510 | 2 | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? | | 441194 | 0 | -1 | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? | +------------+-----------+-----------+--------------------+----------------------------------+ 2 rows in set (0.00 sec)
Cached queries are the ones marked with a special hostgroup -1 (this means that these queries were not sent to any hostgroup), and the total execution time for the queries cached is 0 (this means that the request was served within the same events loop).
Below you can see the benchmark results. Let’s look at what happens for selects by PK and selects by RANGE:
Points of Interest
- In all cases, when threads and backend servers are increasing, ProxySQL performs better. This is achieved due to it’s connection pooling and multiplexing capabilities.
- Enabling ProxySQL query cache provides a significant performance boost.
- ProxySQL query cache can achieve a ~2X performance boost at a minimum.
- This boost can be considerably valuable in cases where MySQL performance may fall to 50% (i.e., select by RANGE).
- We shouldn’t forget that results are affected by hardware specs as well, but it’s obvious that ProxySQL with query cache enabled gives a really high throughput.
ProxySQL Query Cache Limitations
Current known limitations:
- It is not possible to define query cache invalidation other than with cache_ttl.
- There is no way to enforce query cache purge.
- mysql-query_cache_size_MB is not strictly enforced, but only used as a metric to trigger automatic purging of expired entries.
- Although access_ms is recorded, it is not used as a metric to expire an unused metric when mysql-query_cache_size_MB is achieved.
- Query cache does not support prepared statements.
- Query cache may serve stale data.
Conclusion
ProxySQL is generally a very powerful and easy-to-use tool.
With regards to query cache, it seems to scale very well and achieve a significant performance boost. Although having complex configs (not only for query cache) can add some extra overhead, it’s easy to maintain.
cache_ttl can be a limitation, but if it’s correctly configured in conjunction with max_replication_lag when configuring nodes in hostgroups, it does not add any significant drawback. In any case, it depends whether or not this is acceptable by the application.