Mar 26, 2018

The Linux Foundation launches a deep learning foundation

Despite its name, the Linux Foundation has long been about more than just Linux. These days, it’s a foundation that provides support to other open source foundations and projects like Cloud Foundry, the Automotive Grade Linux initiative and the Cloud Native Computing Foundation. Today, the Linux Foundation is adding yet another foundation to its stable: the LF Deep Learning Foundation.

The idea behind the LF Deep Learning Foundation is to “support and sustain open source innovation in artificial intelligence, machine learning, and deep learning while striving to make these critical new technologies available to developers and data scientists everywhere.”

The founding members of the new foundation include Amdocs, AT&T, B.Yond, Baidu, Huawei, Nokia, Tech Mahindra, Tencent, Univa and ZTE. Others will likely join in the future.

“We are excited to offer a deep learning foundation that can drive long-term strategy and support for a host of projects in the AI, machine learning, and deep learning ecosystems,” said Jim Zemlin, executive director of The Linux Foundation.

The foundation’s first official project is the Acumos AI Project, a collaboration between AT&T and Tech Mahindra that was already hosted by the Linux Foundation. Acumos AI is a platform for developing, discovering and sharing AI models and workflows.

Like similar Linux Foundation-based organizations, the LF Deep Learning Foundation will offer different membership levels for companies that want to support the project, as well as a membership level for non-profits. All LF Deep Learning members have to be Linux Foundation members, too.

Mar 13, 2018

Don’t Get Hit with a Database Disaster: Database Security Compliance

In this post, we discuss database security compliance, what you should be looking at and where to get more information.

As Percona’s Chief Customer Officer, I get the opportunity to talk with a lot of customers. Hearing first-hand about the problems their technical teams face, as well as the business challenges their companies experience, is incredibly valuable for understanding what the market is facing in general. Not every problem you see has a purely technical solution, and not every good technical solution solves the core business problem.

As database technology advances and data remains the lifeblood of most modern applications, DBAs will have a say in business-level strategic planning more than ever. This coincides with advances in technology and automation that make many classic manual DBA jobs and tasks obsolete. Traditional DBAs are evolving into a blend of system architect, data strategist and master database architect. I want to talk about the business problems that not only the C-suite cares about, but that DBAs as a whole need to care about in the near future.

Let’s start with one topic everyone should have near the top of their list: security.

We did a recent survey of our customers, and their biggest concern right now is security and compliance.

Not long ago, most DBAs I knew dismissed this topic as “someone else’s problem” (I remember being told that the database is only as secure as the network, so fix the network!). Long gone are the days when network security was enough. Even the DBAs who did worry about security only did so within the limited scope of what the database system could provide out of the box. Again, not enough.

So let me run an experiment:

Raise your hand if your company has some bigger security initiative this year. 

I’m betting a lot of you raised your hand!

Security is not new to the enterprise. It’s been a priority for years. However, it did not receive hyper-focus in the open source database space until the last three years or so. Why? There have been a number of high-profile database security breaches in the last year, all highlighting the need for better database security. This series of serious data breaches has exposed how fragile some companies’ security protocols are. If that were not enough, new government regulations and laws have made data protection non-optional. This means you have to take the security of your database seriously, or there could be fines and penalties.

Government regulations are nothing new, but their breadth and depth are growing, opening up a whole new challenge for database systems and administrators. GDPR was signed into law two years ago (you can read more here: https://en.wikipedia.org/wiki/General_Data_Protection_Regulation and https://www.dataiq.co.uk/blog/summary-eu-general-data-protection-regulation) and is scheduled to take effect on May 25, 2018. This has many businesses scrambling not only to understand the impact, but to figure out how they need to comply. These regulations redefine simple things, like what constitutes “personal data” (for instance, your anonymous buying preferences or location history, even without your name).

New requirements also mean some areas get a bit more complicated as they approach the gray area of definition. For instance, GDPR guarantees the right to be forgotten. What does this mean? In theory, it means end-users can request that all their personal information is removed from your systems as if they did not exist. Seems simple, but in reality, you can go as far down the rabbit hole as you want. Does your application support this already? What about legacy applications? Even if the apps can handle it, does this mean previously taken database backups have to forget you as well? There is a lot to process for sure.
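
To make the “right to be forgotten” concrete at the database level, here is a minimal sketch of what handling an erasure request might look like. The schema (customers, orders) and the anonymize-rather-than-delete approach are hypothetical illustrations, and a real implementation would have to cover every system that holds personal data, backups included:

-- Hypothetical erasure request for customer 4242: anonymize in place
-- so that referential integrity and aggregate reporting survive.
BEGIN;
UPDATE customers
   SET name = 'erased',
       email = CONCAT('erased-', id, '@invalid'),
       phone = NULL,
       date_of_birth = NULL
 WHERE id = 4242;
-- Order history may be retained for accounting, but stripped of
-- personal detail.
UPDATE orders
   SET shipping_address = NULL
 WHERE customer_id = 4242;
COMMIT;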

So what are the things you can do?

  1. Educate yourself and understand expectations, even if you weren’t involved in compliance discussions before.
  2. Start working on incremental improvements to your data security now. This is especially true in the areas where you have some control, without massive changes to the application. Encryption at rest is a great place to start if you don’t have it (see the sketch after this list).
  3. Start talking with others in the organization about how to identify and protect personal information.
  4. Look to increase security by default by getting involved in new applications early in the design phase.
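
As promised above, here is a minimal sketch of what that first increment might look like: enabling InnoDB encryption at rest for a single table in MySQL 5.7. The table name is a placeholder, and it assumes a keyring plugin (for example, keyring_file) is already loaded; choosing and securing the key store is its own project:

-- Prerequisite (my.cnf): early-plugin-load = keyring_file.so
-- Encrypt an existing InnoDB table's tablespace at rest.
ALTER TABLE customers ENCRYPTION='Y';
-- Verify which tables are encrypted.
SELECT TABLE_SCHEMA, TABLE_NAME, CREATE_OPTIONS
  FROM INFORMATION_SCHEMA.TABLES
 WHERE CREATE_OPTIONS LIKE '%ENCRYPTION%';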

The good news is you are not alone in tackling this challenge. Every company must address it. Because of this focus on security, we felt strongly about ensuring we had a security track at Percona Live 2018 this year. These talks from Fastly, Facebook, Percona, and others provide information on how companies around the globe are tackling these security issues. In true open source fashion, we are better when we learn and grow from one another.

What are the Percona Live 2018 security talks?

We have a ton of great security content this year at Percona Live, across a bunch of technologies and open source software. Some of the more interesting Percona Live 2018 security talks are:

Want to attend Percona Live 2018 security talks? Register for Percona Live 2018. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Mar 07, 2018

Percona Live 2018 Featured Talk: Securing Your Data on PostgreSQL with Payal Singh

Welcome to another interview blog for the rapidly-approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Payal Singh, DBA at OmniTI Computer Consulting Inc. Her talk is titled Securing Your Data on PostgreSQL. There is often a lack of understanding about how best to manage minimum basic application security features – especially with major security features being released with every major version of PostgreSQL. In our conversation, we discussed how Payal works to improve application security using Postgres:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Payal: I’m primarily a data addict. I fell in love with databases when it was first taught to me in high school. The declarative SQL syntax was intuitive to me, and efficient compared to other languages I had used (C and C++). I realized that if given the opportunity, I’d choose to become a database administrator. I joined OmniTI in summer of 2012 as a web engineer intern during my Masters, but grabbed the chance to work on an internal database migration project. Working with the DBA team gave me a lot of new insight and exposure, especially into open source databases. The more I learned, the more I loved my job. Right after completing my Masters I joined OmniTI as a full-time database administrator, and never looked back!

Percona: Your talk is titled “Securing Your Data on PostgreSQL”. Why do you think that security (or the lack of it) is such an issue?

Payal: Securing your data is critical. In my experience, the one reason people using commercial databases are apprehensive of switching to open source alternatives is a lack of exposure to security features. If you look at open source databases today, specifically PostgreSQL, it has the most advanced security features: data encryption, auditing and row-level security, to name a few. People don’t know about them, though. As a FOSS project, we don’t have a centralized marketing team to advertise these features to our potential user base, which makes it necessary to spread information through other channels. Speaking about it at a popular conference like Percona Live is one of them!

In addition to public awareness, Postgres is advancing at a lightning pace. With each new major version released every year, a bunch of new security feature additions and major improvements in existing security features are added. So much so that it becomes challenging to keep up with all these features, even for existing Postgres users. My talk on Postgres security aims to inform current as well as prospective Postgres users about the advanced security features that exist and their use case, useful tips to use them, the gotchas, what’s lacking and what’s currently under development.
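
For readers who have not yet seen one of the features Payal mentions, here is a minimal sketch of PostgreSQL row-level security, available since PostgreSQL 9.5. The table and policy names are made up for illustration:

-- Hypothetical multi-tenant table: each database user may only see
-- and modify its own rows.
CREATE TABLE accounts (id int, owner text, balance numeric);
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
CREATE POLICY accounts_owner_only ON accounts
    USING (owner = current_user)
    WITH CHECK (owner = current_user);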

Percona: Is PostgreSQL better or worse with security and security options than either MySQL or MongoDB? Why?

Payal: I may be a little biased, but I think Postgres is the best database from a security point of view. MySQL is pretty close though! There are quite a few reasons why I consider Postgres to be the best, but I’d like to save that discussion for my talk at Percona Live! For starters though, I think that Postgres’s authentication and role architecture is significantly clearer and more straightforward than MySQL’s implementation. Focusing strictly on security, I’d also say that access control and management is more granular and customizable in Postgres than it is in MySQL – although here I’d have to say MySQL’s ACL is easier and more intuitive to manage.

Percona: What is the biggest challenge for database security we are facing?

Payal: For all the databases? I’d say with the rapid growth of IoT, encrypted data processing is a huge requirement that none of the well-known databases currently provide. Even encryption of data at rest outside of the IoT context requires more attention. It is one of the few things that a DBMS can do as a last-ditch effort to protect its data in SQL injection attacks, if all other layers of security (network, application layer, etc.) have failed (which very often is the case).
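
While transparent encrypted data processing is still missing, one stopgap available in PostgreSQL today is application-level column encryption with the pgcrypto extension. A minimal sketch, with a hypothetical table and the key shown inline only for brevity (in practice the key must live outside the database):

CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Store a sensitive value encrypted with a symmetric key.
CREATE TABLE secrets (id serial PRIMARY KEY, payload bytea);
INSERT INTO secrets (payload)
VALUES (pgp_sym_encrypt('card=4111-1111-1111-1111', 'app-held-key'));
-- Decrypt on read; a SQL injection that dumps the table without the
-- key sees only ciphertext.
SELECT pgp_sym_decrypt(payload, 'app-held-key') FROM secrets;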

Percona: Why should people attend your talk? What do you hope people will take away from it? 

Payal: My talk is a run-through of all current and future Postgres security features, from the basic to the very advanced and niche. It is not an isolated talk that assumes Postgres is the only database in the world. I often compare and contrast other database implementations of similar security features as well. Not only is it a decent one-hour primer for people new and interested in Postgres, but also a good way to weigh the pros and cons among databases from a security viewpoint.

Percona: What are you looking forward to at Percona Live (besides your talk)?

Payal: I’m looking forward to all the great talks! I got a lot of information out of the talks at Percona Live last year. The tutorials on new MySQL features were especially great!

Want to find out more about this Percona Live 2018 featured talk, and Payal and PostgreSQL security? Register for Percona Live 2018, and see her talk Securing Your Data on PostgreSQL. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Feb 28, 2018

OpenStack gets support for virtual GPUs and new container features

OpenStack, the open-source infrastructure project that aims to give enterprises the equivalent of AWS for their private clouds, today announced the launch of its 17th release, dubbed “Queens.” After all of those releases, you’d think that there isn’t all that much new that the OpenStack community could add to the project, but just as the large public clouds keep adding… Read More

Feb 22, 2018

Feature Labs launches out of MIT to accelerate the development of machine learning algorithms

 Feature Labs, a startup with roots in research begun at MIT, officially launched today with a set of tools to help data scientists build machine learning algorithms more quickly. Co-founder and CEO Max Kanter says the company has developed a way to automate “feature engineering,” which is often a time consuming and manual process for data scientists. “Feature Labs helps… Read More

Feb 21, 2018

Percona Live 2018 Open Source Database Conference Full Schedule Now Available

The conference session schedule for the seventh annual Percona Live 2018 Open Source Database Conference, taking place April 23-25 at the Santa Clara Convention Center in Santa Clara, CA, is now live and available for review! Advance Registration Discounts can be purchased through March 4, 2018, 11:30 p.m. PST.

Percona Live Open Source Database Conference 2018 is the premier open source database event. With a theme of “Championing Open Source Databases,” the conference will feature multiple tracks, including MySQL, MongoDB, Cloud, PostgreSQL, Containers and Automation, Monitoring and Ops, and Database Security. Once again, Percona will be offering a low-cost database 101 track for beginning users who want to start learning how to use and operate open source databases.

Major areas of focus at the conference include:

  • Database operations and automation at scale, featuring speakers from Facebook, Slack, Github and more
  • Databases in the cloud – how database-as-a-service (DBaaS) is changing the DB Landscape, featuring speakers from AWS, Microsoft, Alibaba and more
  • Security and compliance – how GDPR and other government regulations are changing the way we manage databases, featuring speakers from Fastly, Facebook, Pythian, Percona and more
  • Bridging the gap between developers and DBAs – finding common ground, featuring speakers from Square, Oracle, Percona and more

Conference Session Schedule

Conference sessions take place April 24-25 and will feature 90+ in-depth talks by industry experts related to each of the key areas. Several sessions from Oracle and Percona will focus on how the new features and enhancements in the upcoming release of MySQL 8.0 will impact businesses. Conference session examples include:

Sponsorships

Sponsorship opportunities for Percona Live Open Source Database Conference 2018 are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

  • Diamond Sponsors – Continuent, VividCortex
  • Platinum – Microsoft
  • Gold Sponsors – Facebook, Grafana
  • Bronze Sponsors – Altinity, BlazingDB, SolarWinds, Timescale, TwinDB, Yelp
  • Other Sponsors – cPanel
  • Media Sponsors – Database Trends & Applications, Datanami, EnterpriseTech, HPCWire, ODBMS.org, Packt

Hyatt Regency Santa Clara & The Santa Clara Convention Center

Percona Live 2018 Open Source Database Conference is held at the Hyatt Regency Santa Clara & The Santa Clara Convention Center, at 5101 Great America Parkway Santa Clara, CA 95054.

The Hyatt Regency Santa Clara & The Santa Clara Convention Center is a prime location in the heart of the Silicon Valley. Enjoy this spacious venue with complimentary wifi, on-site expert staff and three great restaurants. You can reserve a room by booking through the Hyatt’s dedicated Percona Live reservation site.

Book your hotel using Percona’s special room block rate!

Feb 13, 2018

InfluxData scores $35 million Series C to expand time series database business

 In a world where sensors are capturing ever-increasing amounts of data, being able to collect that high volume and measure it over time becomes increasingly important. InfluxData, the startup built on top of the open source time series database platform, announced it has received a $35 million Series C investment today led by Sapphire Ventures, the investment arm of enterprise software… Read More

Feb 08, 2018

Sylabs launches Singularity Pro, a container platform for high performance computing

 Sylabs, the commercial company behind the open source Singularity container engine, announced its first commercial product today, Singularity Pro. Sylabs was launched in 2015 to create a container platform specifically designed for scientific and high performance computing use cases, two areas that founder and CEO Gregory Kurtzer, says were left behind in the containerization movement over… Read More

Feb 08, 2018

Tutorial Schedule for Percona Live 2018 Is Live

Percona has revealed the line-up of in-depth tutorials for the Percona Live 2018 Open Source Database Conference, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, Calif. Secure your spot now with Advanced Registration prices (available until March 4, 2018). Sponsorship opportunities for the conference are still available.

Percona Live 2018 Open Source Database Conference is the premier open source database event. The theme for the upcoming conference is “Championing Open Source Databases,” with a range of topics on MySQL, MongoDB and other open source databases, including time series databases, PostgreSQL and RocksDB. Session tracks include Developers, Operations and Business/Case Studies. 

Tutorials take place throughout the day on April 23, 2018. Tutorials provide practical, in-depth knowledge of critical open source database issues. Topics include:

Hyatt Regency Santa Clara & The Santa Clara Convention Center

Percona Live 2018 Open Source Database Conference will be held at the Hyatt Regency Santa Clara & The Santa Clara Convention Center, at 5101 Great America Parkway Santa Clara, CA 95054.

The Hyatt Regency Santa Clara & The Santa Clara Convention Center is a prime location in the heart of the Silicon Valley. Enjoy this spacious venue with complimentary wifi, on-site expert staff and three great restaurants. You can reserve a room by booking through the Hyatt’s dedicated Percona Live reservation site.

Book your hotel using Percona’s special room block rate!

Sponsorships

Sponsorship opportunities for Percona Live 2018 Open Source Database Conference are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

  • Diamond Sponsors – Continuent, VividCortex
  • Gold Sponsors – Facebook, Grafana
  • Bronze Sponsors – SolarWinds, TwinDB, Yelp
  • Media Sponsors – Datanami, EnterpriseTech, HPCWire, ODBMS.org

Feb 07, 2018

ProxySQL Query Cache: What It Is, How It Works

In this blog post, I’ll present the ProxySQL query cache functionality. This is a query caching mechanism on top of ProxySQL. As there are already many how-tos regarding the ProxySQL prerequisites and installation process, we are going to skip these steps. For those who are already familiar with ProxySQL query cache configuration, let’s go directly to the query rules and the performance results.

Before talking about the ProxySQL query cache, let’s take a look at other caching mechanisms available for MySQL environments.

MySQL query cache is a query caching mechanism – deprecated as of MySQL 5.7.20 and removed in MySQL 8.0 – on top of MySQL itself (based on the official MySQL documentation).

The MySQL query cache stores the text of a SELECT statement together with the corresponding result sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.

Although MySQL query cache is supposed to improve performance, there are cases where MySQL query cache is not scaling well and can degrade performance due to its locking and invalidation algorithms.
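
If you want to check how (or whether) the classic MySQL query cache is configured on a server before considering alternatives, the server variables and counters tell the story. A minimal sketch (MySQL 5.7 and earlier):

-- Is the query cache enabled, and how large is it?
SHOW VARIABLES LIKE 'query_cache%';
-- Hits, inserts and invalidations since startup.
SHOW GLOBAL STATUS LIKE 'Qcache%';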

You can find a really interesting post regarding MySQL query cache here.

There is also another method to cache results in a MySQL environment: external caching (e.g., Memcached, Redis), but this also has some drawbacks. Introducing such a mechanism requires changes on the application side.

But what is ProxySQL query cache if there are already the MySQL query cache and other external caching mechanisms? At the moment, although we’ve done some tests, we are not going to compare ProxySQL query cache performance against other caching mechanisms. We’ll address this in a future blog post. We will only focus on ProxySQL itself.

What is ProxySQL Query Cache

ProxySQL query cache is an in-memory key-value storage that uses:

  • as key: a combination of username, schema and query text. It is a hash derived from the username, the schema name and the query itself. Combining these ensures that users access only their own resultsets, and only for the correct schema.
  • as value: the resultset returned by the backend (mysqld or another proxy).

There is some more metadata stored for each resultset:

  • length: length of the resultset
  • expire_ms: defines when the entry will expire
  • access_ms: records the last time an entry was accessed
  • ref_count: a reference count to identify resultsets currently in use

Based on the configuration, the resultsets are cached on the wire while queries are executed, and the resultset is returned to the application. If the application re-executes the same query within the time slot defined by “expire_ms”, the resultset is returned by the embedded ProxySQL query cache.

The only way to invalidate entries from the ProxySQL query cache is through a time-to-live in milliseconds. This is in contrast to MySQL query cache, where the query cache gets invalidated each time a table gets updated. At the moment, it is only possible to tune the total amount of memory used by the query cache, using the variable “mysql-query_cache_size_MB”. The current implementation of mysql-query_cache_size_MB doesn’t impose a hard limit. Instead, it is used as an argument by the purging thread.
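
To watch how much memory the cache actually uses and how often it is hit, ProxySQL exposes counters in its admin interface. A quick sketch (variable names as found in ProxySQL 1.4):

Admin> SELECT * FROM stats_mysql_global
       WHERE variable_name LIKE 'Query_Cache%';
-- Rows include, among others:
--   Query_Cache_Memory_bytes  memory currently used by cached resultsets
--   Query_Cache_count_GET     lookups against the cache
--   Query_Cache_count_GET_OK  lookups served from the cache
--   Query_Cache_Purged        entries removed by the purging thread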

It’s obvious that it’s not easy to directly compare these two cache mechanisms, as each of them has its own way to invalidate results. Please also note a significant difference between MySQL and ProxySQL when query cache is enabled. ProxySQL query cache may serve stale resultsets due to the way it invalidates cached data (cached data are valid for the interval specified by “cache_ttl”, while MySQL’s cached data get invalidated each time data change). Every query that is cached may return stale data, and this may or may not be acceptable by the application.

How it Works

Before doing any benchmarks, I will try to give you a short description of how ProxySQL query cache gets enabled. Unlike MySQL query cache, where a common caching space exists for all tables, in ProxySQL query cache we have to define what traffic gets cached. This is done by defining query rules that match the traffic to be cached and setting a “cache_ttl” for the cached results. There are many ways to match incoming traffic, either by query text or by digest, using patterns. All we need in order to cache a resultset is to define the matching criteria and the TTL. If a query matches the criteria, the resultset is cached, so subsequent requests are served directly from the ProxySQL instance instead of querying the hostgroup DB nodes (as long as cache_ttl has not expired).
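
Besides matching an exact digest hash, as we do later in this post, a rule can also match a regular expression against the query digest via the match_digest column. A brief sketch (the rule_id and pattern are illustrative):

Admin> INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
       VALUES (10, 1, '^SELECT c FROM sbtest1 WHERE id=\?$', 5000, 1);
Admin> LOAD MYSQL QUERY RULES TO RUNTIME;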

Let’s use an example to make it clear how ProxySQL query cache is enabled.

In our setup we have three backend DB servers in a master-slave topology, with Percona Server for MySQL 5.7 and a ProxySQL ver. 1.4.3 instance with sysbench 1.0 installed. Backend servers are within the same reader hostgroup, and traffic is balanced among these servers using the same priority.

As I’ve already said, we won’t look at the ProxySQL installation. There are many topologies you can implement: for example, deploying ProxySQL on each application server, thus removing the “single point of failure” weakness. But in our case, we will present ProxySQL query cache using a single instance. In general, you would expect better performance with the ProxySQL instance closer to the application.

Configuration

With the ProxySQL instance up and running, let’s confirm that all servers are OK. Querying ProxySQL admin shows that all servers are ONLINE:

Admin> select * from mysql_servers;
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname  | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 2        	| 10.0.2.12 | 3306 | ONLINE | 1  	| 0       	| 1000        	| 0               	| 0   	| 0          	|     	|
| 1        	| 10.0.2.11 | 3306 | ONLINE | 1  	| 0       	| 1000        	| 0               	| 0   	| 0          	|     	|
| 2        	| 10.0.2.13 | 3306 | ONLINE | 1  	| 0       	| 1000            | 0               	| 0   	| 0          	|     	|
| 2        	| 10.0.2.11 | 3306 | ONLINE | 1  	| 0       	| 1000        	| 0               	| 0   	| 0          	|     	|
+--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+

As you can see, there are two hostgroups: the “1” used for WRITES and the “2” used for READS.
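
For reference, a read/write split like this is typically driven by ProxySQL’s replication hostgroup configuration, which for this topology would look roughly as follows (a sketch; the comment text is arbitrary):

Admin> INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
       VALUES (1, 2, 'PS 5.7 master/slaves');
Admin> LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;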

Some random connects prove that traffic is correctly routed to the DB backends:

[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db1-atsaloux |
+--------------+
[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db2-atsaloux |
+--------------+
[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db1-atsaloux |
+--------------+
[RDBA] percona@atsaloux-proxysql: ~ $ mysql -h 127.0.0.1 -P 6033 -e "select @@hostname"
+--------------+
| @@hostname   |
+--------------+
| db3-atsaloux |
+--------------+

Let’s first take a look at some statistics. Before using sysbench, the “stats_mysql_query_digest” table (where digests are stored) is empty:

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC;
Empty set (0.00 sec)

The “stats_mysql_query_digest” table contains statistics related to the queries routed through the ProxySQL server. How many times each query was executed and the total execution time are two of the several statistics provided.

Before doing any benchmarks, I had to create some data. The following sysbench commands were used for selects by PK or by RANGE. For simplicity, we are not going to execute benchmarks inside transactions, although ProxySQL query cache is effective there as well.

The --threads option will be adjusted for each benchmark:

sysbench --threads=16 --max-requests=0 --time=60 --mysql-user=percona --mysql-password=percona --mysql-db=sbtest --mysql-host=127.0.0.1 --mysql-port=6033 --oltp-table-size=1000000 --oltp-read-only=on --oltp-skip-trx=on --oltp-test-mode=simple --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-point-selects=1 --oltp-simple-ranges=0 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua run
sysbench --threads=16 --max-requests=0 --time=60 --mysql-user=percona --mysql-password=percona --mysql-db=sbtest --mysql-host=127.0.0.1 --mysql-port=6033 --oltp-table-size=1000000 --oltp-read-only=on --oltp-skip-trx=on --oltp-test-mode=simple --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-point-selects=0 --oltp-simple-ranges=1 /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua run

Before running the full benchmark, a simple sysbench was run to get the queries digests that are used for the ProxySQL query cache configuration.

After running the first benchmark with ProxySQL query cache disabled, I queried the “stats_mysql_query_digest” table again and got the following results, which show all executed queries:

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC;
+------------+-------------+-----------+--------------------+------------------------------------------------+
| count_star | sum_time    | hostgroup | digest             | digest_text                                    |
+------------+-------------+-----------+--------------------+------------------------------------------------+
| 301536     | 20962929791 | 2         | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=?               |
| 3269       | 30200073    | 2         | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
+------------+-------------+-----------+--------------------+------------------------------------------------+
2 rows in set (0.01 sec)

Add mysql_query_rules To Be Cached

The output above provides all the information needed to enable ProxySQL query cache. What we need to do now is add the query rules that match the queries whose results should be cached. In this case we match on the query digests and use a cache_ttl of 5000ms. Taking this into consideration, we added the following rules:

Admin> INSERT INTO mysql_query_rules (rule_id,active,digest,cache_ttl,apply) VALUES (1,1,'0xBF001A0C13781C1D',5000,1);
Query OK, 1 row affected (0.00 sec)
Admin> INSERT INTO mysql_query_rules (rule_id,active,digest,cache_ttl,apply) VALUES (2,1,'0x290B92FD743826DA',5000,1);
Query OK, 1 row affected (0.00 sec)

We shouldn’t forget that we must load query rules at runtime. If we don’t want to lose these rules (i.e., after a ProxySQL restart), we should also save to disk:

Admin> LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.00 sec)

Now let’s reset the stats_mysql_query_digest results:

Admin> SELECT 1 FROM stats_mysql_query_digest_reset LIMIT 1; -- we reset the counters
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)

And re-run the benchmarks with query cache enabled. To confirm what traffic was cached, we have to query the stats_mysql_query_digest once again:

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC;
+------------+------------+-----------+--------------------+------------------------------------------------+
| count_star | sum_time   | hostgroup | digest             | digest_text                                    |
+------------+------------+-----------+--------------------+------------------------------------------------+
| 108681     | 6304585632 | 2         | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
| 343277     | 0          | -1        | 0x290B92FD743826DA | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
+------------+------------+-----------+--------------------+------------------------------------------------+
2 rows in set (0.00 sec)

Admin> SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC;
+------------+-----------+-----------+--------------------+----------------------------------+
| count_star | sum_time  | hostgroup | digest             | digest_text                      |
+------------+-----------+-----------+--------------------+----------------------------------+
| 79629      | 857050510 | 2         | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? |
| 441194     | 0         | -1        | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? |
+------------+-----------+-----------+--------------------+----------------------------------+
2 rows in set (0.00 sec)

Cached queries are the ones marked with the special hostgroup -1 (this means that these queries were not sent to any hostgroup), and the total execution time for the cached queries is 0 (this means that the request was served within the same event loop).
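
So, to look at only the traffic that was answered from the cache, filtering on that special hostgroup is enough. A minimal sketch:

Admin> SELECT count_star, digest_text
       FROM stats_mysql_query_digest
       WHERE hostgroup = -1   -- resultsets served from the query cache
       ORDER BY count_star DESC;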

Below you can see the benchmark results. Let’s look at what happens for selects by PK and selects by RANGE:

[Benchmark charts: throughput for selects by PK and selects by RANGE, with ProxySQL query cache enabled and disabled]

Points of Interest

  • In all cases, when threads and backend servers are increasing, ProxySQL performs better. This is achieved due to its connection pooling and multiplexing capabilities.
  • Enabling ProxySQL query cache provides a significant performance boost.
  • ProxySQL query cache can achieve a ~2X performance boost at a minimum.
  • This boost can be particularly valuable in cases where MySQL performance may fall to 50% (e.g., selects by RANGE).
  • We shouldn’t forget that results are affected by hardware specs as well, but it’s obvious that ProxySQL with query cache enabled gives a really high throughput.

ProxySQL Query Cache Limitations

Current known limitations:

  • It is not possible to define query cache invalidation other than with cache_ttl.
  • There is no way to enforce query cache purge.
  • mysql-query_cache_size_MB is not strictly enforced, but only used as a metric to trigger automatic purging of expired entries.
  • Although access_ms is recorded, it is not used as a metric to expire unused entries when the mysql-query_cache_size_MB limit is reached.
  • Query cache does not support prepared statements.
  • Query cache may serve stale data.

Conclusion

ProxySQL is generally a very powerful and easy-to-use tool.

With regards to query cache, it seems to scale very well and achieves a significant performance boost. Although complex configurations (not only for query cache) can add some extra overhead, it is easy to maintain.

cache_ttl can be a limitation, but if it is correctly configured in conjunction with max_replication_lag when configuring nodes in hostgroups, it does not add any significant drawback. In any case, it depends on whether or not this is acceptable to the application.
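
For completeness, max_replication_lag is set per backend server in ProxySQL’s mysql_servers table. A sketch of pairing it with the cache_ttl used in this post (the 10-second threshold is an arbitrary example):

Admin> UPDATE mysql_servers SET max_replication_lag = 10
       WHERE hostgroup_id = 2;  -- shun readers lagging more than 10 seconds
Admin> LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;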
